Gianmarco,
I am coming from the Storm community. I think PKG is very interesting,
and we can provide an implementation of Partitioner for PKG. Can you
open a JIRA for this?

-- 
Harsha

On April 3, 2015 at 4:49:15 AM, Gianmarco De Francisci Morales 
(g...@apache.org) wrote:

Hi,  

We have recently studied the problem of load balancing in distributed  
stream processing systems such as Samza [1].  
In particular, we focused on what happens under key grouping when the key  
distribution of the stream is skewed.  
We developed a new stream partitioning scheme (which we call Partial Key  
Grouping). It achieves better load balancing than hashing while being more  
scalable than round robin in terms of memory.  

In the paper we show a number of mining algorithms that are easy to  
implement with partial key grouping, and whose performance can benefit from  
it. We think that it might also be useful for a larger class of algorithms.  

PKG has already been integrated in Storm [2], and I would like to be able  
to use it in Samza as well. As far as I understand, Kafka producers are the  
ones that decide how to partition the stream (or Kafka topic). Even after  
doing a bit of reading, I am still not sure if I should be writing this  
email here or on the Samza dev list. Anyway, my first guess is Kafka.  

I do not have experience with Kafka. However, partial key grouping is very  
easy to implement: it requires just a few lines of code in Java when  
implemented as a custom grouping in Storm [3].  
I believe it should be very easy to integrate.  
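
To give a rough idea of the size of the change, here is a minimal sketch of
the core idea as described in [1]: each key is hashed with two different hash
functions, and the message goes to whichever of the two candidate partitions
this sender has routed fewer messages to so far. The class name
PartialKeyGrouper, its methods, and the seeded hash are hypothetical and
chosen only for illustration. This is not the code from [3], and it is not
written against the Kafka Partitioner interface.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of partial key grouping; names are illustrative only.
final class PartialKeyGrouper {
    // Number of messages this sender has routed to each partition so far.
    // The load estimate is purely local: no coordination between senders.
    private final long[] load;

    PartialKeyGrouper(int numPartitions) {
        if (numPartitions <= 0) {
            throw new IllegalArgumentException("numPartitions must be positive");
        }
        this.load = new long[numPartitions];
    }

    // Picks the less loaded of the two candidate partitions for this key.
    int choosePartition(String key) {
        byte[] bytes = key.getBytes(StandardCharsets.UTF_8);
        int first = Math.floorMod(hash(bytes, 17), load.length);
        int second = Math.floorMod(hash(bytes, 31), load.length);
        int chosen = load[first] <= load[second] ? first : second;
        load[chosen]++;
        return chosen;
    }

    // Returns a copy of the local load counters, for inspection.
    long[] loads() {
        return load.clone();
    }

    // Seeded FNV-1a style hash, used here only as a stand-in (assumption):
    // any pair of reasonably independent hash functions would do.
    private static int hash(byte[] data, int seed) {
        int h = 0x811c9dc5 ^ seed;
        for (byte b : data) {
            h ^= (b & 0xff);
            h *= 0x01000193;
        }
        return h;
    }
}
```

Because a key can land on either of its two candidates, a consumer that
aggregates per key has to merge at most two partial results for each key,
which is the trade-off that gives the scheme its name.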

For all these reasons, I believe it will be a nice addition to Kafka/Samza.  
If the community thinks it's a good idea, I will be happy to offer support  
in the porting.  

References:  
[1] https://melmeric.files.wordpress.com/2014/11/the-power-of-both-choices-practical-load-balancing-for-distributed-stream-processing-engines.pdf  
[2] https://issues.apache.org/jira/browse/STORM-632  
[3] https://github.com/gdfm/partial-key-grouping  
--  
Gianmarco  
