I am running on a 15-node cluster and am trying to set partitioning to
balance the work across all nodes. I am using an Accumulator to track work
by MAC address, but I would prefer to use data already known to the Spark
environment - Executor ID and Function ID show up in the Spark UI, and Task
ID and Attempt ID (assuming these work like Hadoop's) would be useful.
Does anyone know how code running in a function can access these
parameters? I think I have asked this group several times about Task ID and
Attempt ID without getting a reply.
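
If this information is reachable from inside a task, I imagine something
along these lines would do it - just a sketch, assuming Spark 1.3 or later
where TaskContext.get() exposes stageId(), partitionId(), taskAttemptId()
and attemptNumber(), and where SparkEnv.get.executorId matches the Executor
ID shown in the UI. The object name TaskInfoSketch and the parallelized
input are my own placeholders, not my real job:

import org.apache.spark.{SparkConf, SparkContext, SparkEnv, TaskContext}

object TaskInfoSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("task-info-sketch"))

    // Placeholder input; in my job this would be the real records keyed by MAC address.
    val rdd = sc.parallelize(1 to 100000, 60)

    val perTaskCounts = rdd.mapPartitions { iter =>
      val tc = TaskContext.get()                // only valid inside a task; null on the driver
      val executorId = SparkEnv.get.executorId  // should match the Executor ID in the Spark UI
      val label = s"executor=$executorId stage=${tc.stageId()} " +
        s"partition=${tc.partitionId()} taskAttemptId=${tc.taskAttemptId()} " +
        s"attempt=${tc.attemptNumber()}"
      // Tag every record with the task that processed it
      iter.map(_ => (label, 1L))
    }.reduceByKey(_ + _)

    // One line per task: how many records each task actually handled,
    // which is the balance information I am trying to collect.
    perTaskCounts.collect().sortBy(_._1).foreach(println)

    sc.stop()
  }
}

Printing that per-task count after a run would show directly whether the
work is spread evenly across the 15 nodes, without the Accumulator.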

Incidentally, the data I collect suggests that my execution is not at all
balanced.
