Hey guys,

I think the first step here would be to expose a partitioner interface for
the new producer that would make it easy to plug in these different
strategies. I filed a JIRA for this:
https://issues.apache.org/jira/browse/KAFKA-2091
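
To make the idea concrete, here is a rough sketch of what a pluggable
interface and a PKG-style strategy might look like. Everything named here
(KeyPartitioner, HashPartitioner, PartialKeyGroupingPartitioner, mix) is
hypothetical and is not an existing Kafka API. The sketch assumes PKG means
"hash the key two ways and send to whichever of the two candidate partitions
this producer has sent fewer messages to", with load estimated only from
local counts.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class PartialKeyGroupingSketch {

    /** Hypothetical pluggable interface; an assumption, not Kafka's API. */
    interface KeyPartitioner {
        int partition(byte[] key, int numPartitions);
    }

    /** Plain key grouping: every key maps to exactly one partition. */
    static class HashPartitioner implements KeyPartitioner {
        @Override
        public int partition(byte[] key, int numPartitions) {
            return Math.floorMod(Arrays.hashCode(key), numPartitions);
        }
    }

    /** PKG-style strategy: two candidate partitions per key, pick the less loaded. */
    static class PartialKeyGroupingPartitioner implements KeyPartitioner {
        // Messages this producer instance has sent to each partition (local estimate).
        private long[] sent = new long[0];

        @Override
        public synchronized int partition(byte[] key, int numPartitions) {
            if (sent.length != numPartitions) {
                sent = new long[numPartitions]; // partition count changed: reset counts
            }
            int h = Arrays.hashCode(key);
            int first = Math.floorMod(h, numPartitions);
            int second = Math.floorMod(mix(h), numPartitions);
            int chosen = sent[first] <= sent[second] ? first : second;
            sent[chosen]++;
            return chosen;
        }

        // Second hash derived by bit mixing (murmur3-style finalizer), for illustration.
        private static int mix(int h) {
            h ^= h >>> 16;
            h *= 0x85ebca6b;
            h ^= h >>> 13;
            h *= 0xc2b2ae35;
            h ^= h >>> 16;
            return h;
        }
    }

    public static void main(String[] args) {
        int numPartitions = 8;
        KeyPartitioner hash = new HashPartitioner();
        KeyPartitioner pkg = new PartialKeyGroupingPartitioner();
        long[] hashLoad = new long[numPartitions];
        long[] pkgLoad = new long[numPartitions];

        // Skewed stream: 90% of messages carry the same hot key.
        for (int i = 0; i < 10_000; i++) {
            String k = (i % 10 != 0) ? "hot" : "key-" + i;
            byte[] key = k.getBytes(StandardCharsets.UTF_8);
            hashLoad[hash.partition(key, numPartitions)]++;
            pkgLoad[pkg.partition(key, numPartitions)]++;
        }
        System.out.println("hash: " + Arrays.toString(hashLoad));
        System.out.println("pkg:  " + Arrays.toString(pkgLoad));
    }
}
```

Note that with a strategy like this a key can land on either of two
partitions, so consumers that aggregate per key would need to merge the two
partial results.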

-Jay

On Fri, Apr 3, 2015 at 9:36 AM, Harsha <ka...@harsha.io> wrote:

> Gianmarco,
> I come from the Storm community. I think PKG is very interesting, and we
> can provide an implementation of Partitioner for PKG. Can you open a JIRA
> for this?
>
> --
> Harsha
> Sent with Airmail
>
> On April 3, 2015 at 4:49:15 AM, Gianmarco De Francisci Morales (
> g...@apache.org) wrote:
>
> Hi,
>
> We have recently studied the problem of load balancing in distributed
> stream processing systems such as Samza [1].
> In particular, we focused on what happens when the key distribution of the
> stream is skewed when using key grouping.
> We developed a new stream partitioning scheme (which we call Partial Key
> Grouping). It achieves better load balancing than hashing while being more
> scalable than round robin in terms of memory.
>
> In the paper we show a number of mining algorithms that are easy to
> implement with partial key grouping, and whose performance can benefit from
> it. We think that it might also be useful for a larger class of algorithms.
>
> PKG has already been integrated in Storm [2], and I would like to be able
> to use it in Samza as well. As far as I understand, Kafka producers are the
> ones that decide how to partition the stream (or Kafka topic). Even after
> doing a bit of reading, I am still not sure if I should be writing this
> email here or on the Samza dev list. Anyway, my first guess is Kafka.
>
> I do not have experience with Kafka; however, partial key grouping is very
> easy to implement: it requires just a few lines of code in Java when
> implemented as a custom grouping in Storm [3].
> I believe it should be very easy to integrate.
>
> For all these reasons, I believe it will be a nice addition to Kafka/Samza.
> If the community thinks it's a good idea, I will be happy to offer support
> in the porting.
>
> References:
> [1] https://melmeric.files.wordpress.com/2014/11/the-power-of-both-choices-practical-load-balancing-for-distributed-stream-processing-engines.pdf
> [2] https://issues.apache.org/jira/browse/STORM-632
> [3] https://github.com/gdfm/partial-key-grouping
> --
> Gianmarco
>
