I have instrumented word count to track how many machines the code runs
on. I use an accumulator to maintain a Set of MAC addresses. I find that
everything is done on a single machine. This is probably optimal for word
count, but not for the larger problems I am working on.
How do I force processing to be split into multiple tasks? How do I access
the task and attempt numbers to track which processing happens in which
attempt? Also, is using the MAC address a good way to determine which
machine is running the code?
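
A minimal sketch of the kind of per-machine tagging described above, assuming
Python and only the standard library. The name machine_tag is hypothetical and
this is not the instrumentation actually used. It records the hostname and
process id next to the MAC address, because uuid.getnode() may fall back to a
random 48-bit number when no hardware address can be found, so the MAC address
alone may not be a reliable machine identifier.

```python
import os
import socket
import uuid


def machine_tag():
    """Return a (hostname, mac, pid) tuple describing where this code runs.

    Hypothetical helper for illustration only.
    """
    # uuid.getnode() returns the hardware address as a 48-bit integer.
    # If no MAC address is available it returns a random 48-bit number,
    # so the hostname is kept alongside it as a cross-check.
    node = uuid.getnode()
    # Format the six bytes, most significant first, as aa:bb:cc:dd:ee:ff.
    mac = ":".join(f"{(node >> shift) & 0xFF:02x}" for shift in range(40, -8, -8))
    return socket.gethostname(), mac, os.getpid()


# Collect tags in a set, the way an accumulator of machine identifiers would.
# Calling this repeatedly in one process yields a single distinct tag.
seen = set()
for _ in range(3):
    seen.add(machine_tag())
```

If every record processed reports the same tag, all the work ran in one
process on one machine. Several tags that share a hostname would indicate
multiple processes on a single host.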
As far as I can tell, a simple word count is running in one thread on one
machine and the remainder of the cluster does nothing.
This is consistent with tests where I write to stdout from functions and see
little output on most machines in the cluster.
