Guys,

I'm currently looking into how to deliver computation to data in the most efficient way.
Basically, I need to iterate over all cache partitions on all grid nodes, compute some function on each key-value pair, and return an aggregated result to the caller. This is a job for the map-reduce API, but there seems to be no easy way to get automatic routing and failover of compute jobs to the data nodes that hold specific partitions.

I found an interesting paragraph in the javadoc for the @AffinityKeyMapped annotation:

  Collocating Computations And Data
  It is also possible to route computations to the nodes where the data is cached. This concept is otherwise known as Collocation Of Computations And Data ....

This makes a lot of sense to me, but in practice it does not work. Instead, we only have automatic routing (and failover) for special cases: affinityCall and affinityRun with an explicit partition. And it seems I can no longer use task sessions with them after the recent changes in the Compute API (removed withAsync support).

I don't think this is OK. We should allow jobs to be routed to data automatically if they carry an annotation specifying the partition and cache names, the same as for affinityCall/Run. We should probably also introduce a special task type for such workflows, something like AffinityComputeTask, without an explicit mapping phase, for convenience.

I'm willing to create a JIRA ticket for this.

Thoughts?

--
Best regards,
Alexei Scherbakov
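
P.S. To make the desired behaviour concrete, here is a rough toy model in plain Java. It is only a sketch of the semantics I have in mind (one job per partition, routed to a live owner, partial results reduced for the caller). It does not use the Ignite API at all: AffinityRoutingSketch, Node, cluster, route and mapReduce are hypothetical names, and the primary/backup placement rule is an assumption made up for the example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.ToIntBiFunction;

/** Toy model of partition-routed map-reduce. None of these types are Ignite classes. */
final class AffinityRoutingSketch {
    /** Hypothetical stand-in for a grid node that holds copies of some partitions. */
    static final class Node {
        boolean alive = true;
        // partition id -> entries of that partition stored on this node
        final Map<Integer, Map<String, Integer>> partitions = new HashMap<>();
    }

    /** Builds a toy cluster: partition p has a primary on node p % n and a backup on the next node. */
    static List<Node> cluster(int nodeCount, int partCount, Map<String, Integer> data) {
        List<Node> nodes = new ArrayList<>();
        for (int i = 0; i < nodeCount; i++)
            nodes.add(new Node());

        for (int p = 0; p < partCount; p++) {
            // Collect the entries whose key hashes into partition p.
            Map<String, Integer> entries = new HashMap<>();
            for (Map.Entry<String, Integer> e : data.entrySet())
                if (Math.floorMod(e.getKey().hashCode(), partCount) == p)
                    entries.put(e.getKey(), e.getValue());

            nodes.get(p % nodeCount).partitions.put(p, new HashMap<>(entries));
            nodes.get((p + 1) % nodeCount).partitions.put(p, new HashMap<>(entries));
        }
        return nodes;
    }

    /** Routing: picks the first live node that holds the partition. Dead nodes are skipped (failover). */
    static Node route(List<Node> nodes, int part) {
        for (Node n : nodes)
            if (n.alive && n.partitions.containsKey(part))
                return n;
        throw new IllegalStateException("No live owner for partition " + part);
    }

    /** One job per partition, executed against the owner's local copy, then summed for the caller. */
    static int mapReduce(List<Node> nodes, int partCount, ToIntBiFunction<String, Integer> fn) {
        int total = 0;
        for (int p = 0; p < partCount; p++) {
            Node owner = route(nodes, p);
            int partial = 0; // "map" result for this partition
            for (Map.Entry<String, Integer> e : owner.partitions.get(p).entrySet())
                partial += fn.applyAsInt(e.getKey(), e.getValue());
            total += partial; // "reduce" step
        }
        return total;
    }

    public static void main(String[] args) {
        Map<String, Integer> data = new HashMap<>();
        for (int i = 0; i < 100; i++)
            data.put("k" + i, i);

        List<Node> nodes = cluster(3, 8, data);
        System.out.println(mapReduce(nodes, 8, (k, v) -> v));

        nodes.get(0).alive = false; // simulate a node failure
        System.out.println(mapReduce(nodes, 8, (k, v) -> v));
    }
}
```

The caller never writes a mapping phase: it only supplies the per-entry function, and the result stays the same after one node is marked dead because each partition is re-routed to its backup copy. That is the behaviour I would expect from something like AffinityComputeTask.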