Right. I'm familiar with the map/reduce process and the proposed improvements.

This part of the blog threw me off:

"as the map/reduce tasks now run in parallel over both the nodes in the cluster 
and within the same node (multiple threads)"

To me, it implies that there are now multiple map threads per node. Further, I 
thought that the map / reduce 'working set' was limited to what was in memory. 
I did not realize that map / reduce would iterate over all of the data both in 
memory and on disk. That is good to hear, though I'm curious if it will apply 
to all cache stores (e.g. LevelDB) and how ISPN map / reduce handles a data set 
that is greater than the available memory. A lot of in-memory stores face this 
limitation when backed by on-disk stores. If the data is retrieved one entry at 
a time, I don't see how multiple threads will help. However, if it is retrieved 
in bulk I can see how it might. Not entirely sure.

Shane

----- Original Message -----
From: "Vladimir Blagojevic" <[email protected]>
To: "infinispan -Dev List" <[email protected]>
Cc: "Shane Johnson" <[email protected]>
Sent: Tuesday, September 17, 2013 10:32:39 AM
Subject: Re: [infinispan-dev] blog on new cache store API

Shane,

When a MapReduce command arrives on an Infinispan node, it is executed on 
a single thread that carries the incoming message. I have done preliminary 
work on multithreaded execution [1] but I have not gotten around to 
completing it. The main idea is that the incoming thread submits a task to 
an executor that in turn splits the map/reduce work across multiple 
threads and executes it. Once the work is completed, the incoming thread 
is given the result so it can return the response.
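
To make the executor idea above a bit more concrete, here is a minimal 
sketch in plain Java (8 or later). It is not Infinispan code and it does 
not reflect what the t_2284 branch actually does. The class name, the 
method names and the word-count job are all made up for illustration, and 
a plain in-memory list stands in for the node-local entries.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: one "incoming" thread hands map/reduce work to an
// executor, waits for the workers, and then returns the combined result.
class ParallelMapReduceSketch {

    // Called on the incoming thread. `entries` stands in for node-local data.
    static Map<String, Integer> wordCount(List<String> entries, int threads) {
        ExecutorService executor = Executors.newFixedThreadPool(threads);
        try {
            // Split the entries into roughly equal slices, one task per slice.
            int chunk = Math.max(1, (entries.size() + threads - 1) / threads);
            List<Callable<Map<String, Integer>>> tasks = new ArrayList<>();
            for (int start = 0; start < entries.size(); start += chunk) {
                int end = Math.min(start + chunk, entries.size());
                List<String> slice = entries.subList(start, end);
                tasks.add(() -> mapAndCombine(slice));
            }

            // invokeAll blocks the incoming thread until every task is done.
            Map<String, Integer> result = new HashMap<>();
            for (Future<Map<String, Integer>> partial : executor.invokeAll(tasks)) {
                // Final reduce step: merge the per-thread partial counts.
                partial.get().forEach((word, n) -> result.merge(word, n, Integer::sum));
            }
            return result;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a map/reduce worker failed", e.getCause());
        } finally {
            executor.shutdown();
        }
    }

    // Runs on a worker thread: map each entry to words, count them locally.
    static Map<String, Integer> mapAndCombine(List<String> slice) {
        Map<String, Integer> partial = new HashMap<>();
        for (String value : slice) {
            for (String word : value.split("\\s+")) {
                if (!word.isEmpty()) {
                    partial.merge(word, 1, Integer::sum);
                }
            }
        }
        return partial;
    }

    public static void main(String[] args) {
        List<String> entries = new ArrayList<>();
        entries.add("map reduce map");
        entries.add("store cache store");
        entries.add("map store");
        System.out.println(wordCount(entries, 2));
    }
}
```

The incoming thread still blocks until the workers finish; what changes is 
that the slices are processed concurrently instead of one after another.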

I am not sure how Mircea implemented parallel iteration in stores, but it 
is definitely a different beast. I do agree with him that parallel 
reading from stores helps: the single thread mentioned above will spend 
much less time waiting on reads from the store.

Hope it all makes more sense now!

Regards,
Vladimir

[1] https://github.com/vblagoje/infinispan/tree/t_2284


On 13-09-16 4:59 PM, Shane Johnson wrote:
> But there are now multiple map threads per node? Or, is there one map thread 
> and multiple cache store threads? I'm not sure how a single map thread could 
> benefit from multiple cache store threads.
>
> Shane
>
> ----- Original Message -----
> From: "Mircea Markus" <[email protected]>
> To: "infinispan -Dev List" <[email protected]>
> Sent: Monday, September 16, 2013 3:31:01 PM
> Subject: Re: [infinispan-dev] blog on new cache store API
>
>
> On Sep 16, 2013, at 7:11 PM, Shane Johnson <[email protected]> wrote:
>
>> "parallel iteration: it is now possible to iterate over entries in the store 
>> with multiple threads in parallel. Map/Reduce tasks immediately benefit from 
>> this, as the map/reduce tasks now run in parallel over both the nodes in 
>> the cluster and within the same node (multiple threads)"
>>
>> Does this apply to entries in the cache as well (ISPN-2284)?
> no, only the entries in the store are iterated in parallel.
>
> Cheers,

_______________________________________________
infinispan-dev mailing list
[email protected]
https://lists.jboss.org/mailman/listinfo/infinispan-dev
