Proper collocation of computations and data.

Alexei Scherbakov Wed, 19 Apr 2017 10:10:04 -0700

Guys,

I'm currently looking into how to deliver computation to data in the most
efficient way.


Basically, I need to iterate over all cache partitions on all grid nodes,
compute some function on each key-value pair, and return an aggregated
result to the caller.

This is a job for the map-reduce API.

But it seems there is no easy way to manage automatic routing and failover
of compute jobs to the data nodes that contain specific partitions.

I found an interesting paragraph in the javadoc for the @AffinityKeyMapped
annotation:

Collocating Computations And Data
It is also possible to route computations to the nodes where the data is
cached. This concept is otherwise known as Collocation Of Computations And
Data ....

which makes a lot of sense to me.

But in fact this does not work. Instead, we only have automatic routing (and
failover) for special cases: affinityCall and affinityRun with an explicit
partition. And it seems I can no longer use task sessions with them after
the recent changes in the Compute API (withAsync support was removed).

I think this is not OK, and we should allow jobs to be routed automatically
to data if they carry an annotation specifying the partition and cache
names, the same as for affinityCall/Run. We should probably introduce a
special task type for such workflows, something like AffinityComputeTask,
without an explicit mapping phase, for convenience.
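
To make the idea more concrete, here is a rough, plain-Java model of the
workflow I have in mind. Nothing in it is Ignite API: the class name,
partitionOf, execute and the ownerOf resolver are all hypothetical
stand-ins, and failover is not modelled. It only shows the shape: the
caller supplies a per-entry function and a reducer, and the framework
creates one job per partition and routes it to the owning node.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BinaryOperator;
import java.util.function.Function;
import java.util.function.IntFunction;

/** Toy in-memory model of the proposal; none of these names are Ignite API. */
public class AffinityComputeSketch {
    /** Hypothetical affinity function: maps a key to one of 'parts' partitions. */
    public static int partitionOf(Object key, int parts) {
        return Math.floorMod(key.hashCode(), parts);
    }

    /**
     * Runs one job per partition and reduces the partial results.
     * There is no explicit map phase: the target node is resolved by
     * 'ownerOf' (partition -> node name) and only recorded in 'routingLog',
     * since this model runs everything in a single JVM.
     */
    public static <K, V, R> R execute(
        Map<Integer, Map<K, V>> partitions,
        IntFunction<String> ownerOf,
        Function<Map.Entry<K, V>, R> perEntry,
        BinaryOperator<R> reducer,
        R identity,
        List<String> routingLog
    ) {
        R total = identity;

        for (Map.Entry<Integer, Map<K, V>> p : partitions.entrySet()) {
            int part = p.getKey();

            // A real implementation would send the job to this node.
            routingLog.add("partition " + part + " -> " + ownerOf.apply(part));

            // The job: fold the per-entry function over the partition's data.
            R partial = identity;
            for (Map.Entry<K, V> e : p.getValue().entrySet())
                partial = reducer.apply(partial, perEntry.apply(e));

            total = reducer.apply(total, partial);
        }

        return total;
    }

    public static void main(String[] args) {
        int parts = 4;
        String[] keys = {"a", "b", "c", "d", "e"};

        // Distribute five entries (values 1..5) over the partitions.
        Map<Integer, Map<String, Integer>> partitions = new HashMap<>();
        for (int i = 0; i < keys.length; i++)
            partitions.computeIfAbsent(partitionOf(keys[i], parts), x -> new HashMap<>())
                .put(keys[i], i + 1);

        List<String> log = new ArrayList<>();
        int sum = execute(partitions, part -> "node-" + (part % 2),
            e -> e.getValue(), Integer::sum, 0, log);

        log.forEach(System.out::println);
        System.out.println("sum = " + sum); // 1 + 2 + 3 + 4 + 5 = 15
    }
}
```

In a real task type the ownerOf lookup would have to be repeated on every
retry, so that a job lands on the new owner after a topology change.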

I'm willing to make a JIRA ticket for this.

Thoughts?

-- 

Best regards,
Alexei Scherbakov
