I love the hello-samza project -- it's quite magical to run a bunch of commands 
and see real data flow through the example job. Great idea to use Wikipedia's 
IRC feed!

However, I feel the setup process is still a bit intimidating and fragile. I 
just wanted to bounce around some ideas about how we could make it quicker to 
get started:

• YARN is very heavyweight (100MB download). Could we avoid using YARN in 
hello-samza, in favour of LocalJobFactory? Does Kafka have a local mode for 
development that doesn't require ZooKeeper? The fewer dependencies the better.

• The Vagrant bootstrap script was quite broken -- I submitted a pull request 
(https://github.com/linkedin/hello-samza/pull/18) which should hopefully fix it.

• I somehow got my setup into a bad state (where YARN was running but its web 
UI wouldn't load); I think it happened because I ran `vagrant up` at the same 
time as `bin/grid bootstrap` outside of the VM, and the two processes trampled 
on each other. Deleting the 'deploy' directory and starting from a clean slate 
fixed it. Can we isolate Vagrant and local-OS bootstrap from each other?

• Can we make task logs go to stdout by default? Logs provide reassurance that 
something is happening, and at the moment you have to dig around somewhere in 
the deploy directory to find the log files.

• Can we shorten the commands? Having to unpack the .tar.gz file and then 
copy/paste a scary long run-job.sh line makes the process feel arcane, and 
obscures what is really happening. Perhaps a shell script wrapper around 
run-job.sh, or a Maven goal, would do it.

• Would it be possible to have Maven download the dependencies, rather than 
bin/grid calling curl on hard-coded URLs? It feels weird to have a script 
download and run arbitrary code off the internet (although of course every 
package manager does the same, so the unease is irrational). It would also 
avoid re-downloading everything if you decide to blow away the deploy directory.
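
To make the shorter-commands idea above concrete, here is a rough sketch of what a wrapper around run-job.sh could look like. Everything in it is an assumption on my part rather than something that exists in hello-samza today: the function name `run_example`, the `SAMZA_HOME` and `RUN_JOB_SH` variables, the layout under deploy/samza, and the config-factory class name (which may differ depending on the Samza version).

```shell
#!/bin/sh
# Hypothetical wrapper around run-job.sh (not part of hello-samza).
# Assumption: the job package has been unpacked under deploy/samza and each
# job's config lives at deploy/samza/config/<job-name>.properties.
SAMZA_HOME="${SAMZA_HOME:-$PWD/deploy/samza}"
RUN_JOB_SH="${RUN_JOB_SH:-$SAMZA_HOME/bin/run-job.sh}"

# run_example <job-name>: start a job by its short name instead of
# pasting the full run-job.sh command line.
run_example() {
  job_name="${1:-}"
  if [ -z "$job_name" ]; then
    echo "usage: run_example <job-name>" >&2
    return 1
  fi
  # The config-factory class name below is an assumption; adjust as needed.
  "$RUN_JOB_SH" \
    --config-factory=org.apache.samza.config.factories.PropertiesConfigFactory \
    --config-path="file://$SAMZA_HOME/config/$job_name.properties"
}
```

With something like this sourced into the shell, starting the example job would shrink to `run_example wikipedia-feed`, and the long command line stays visible in one place for anyone who wants to see what is really happening.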

What do you think? Please chime in. I'm happy to work on these things, just 
wanted to get a read on what people think first.

Martin
