Gwen,

I'm curious about this use case. Given the Kafka -> HDFS flow, it obviously
relates to Copycat. More generally, this could be a problem even when
streaming data if your processing takes so long that your consumer simply
can't keep up with the rate at which messages are produced.
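
A rough back-of-the-envelope check for "can't keep up" is sketched below. It is a hypothetical helper (`min_partitions` is an illustrative name, and the rates are made-up example values, not numbers from this thread). It assumes throughput scales linearly with consumers, and it relies on the fact that within a consumer group each partition is read by at most one consumer, so the partition count caps useful parallelism.

```python
import math


def min_partitions(produce_rate: float, per_consumer_rate: float) -> int:
    """Smallest partition count that lets one consumer group keep up.

    Assumes throughput scales linearly with the number of consumers and that
    each partition is read by at most one consumer in the group, so the
    partition count is the ceiling on useful parallelism.
    """
    if per_consumer_rate <= 0:
        raise ValueError("per_consumer_rate must be positive")
    return max(1, math.ceil(produce_rate / per_consumer_rate))


# Illustrative numbers only: 50,000 msg/s produced, and each consumer
# (e.g. one writing to HDFS) sustains 8,000 msg/s.
print(min_partitions(50_000, 8_000))  # 7
```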

The "easy" solution would have been to use more partitions, since the
problem in both the batch and streaming cases is that you need more
processing throughput. In the case that required modifying Camus, was adding
partitions not an option simply because repartitioning was too painful
(i.e., if there had been more partitions to start with, the Camus change
might not have been needed at all), or because there were other constraints
on partitioning?
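
For reference, the Camus change Gwen describes further down (take a partition's start and end offsets and divide them into ranges, one per mapper) can be sketched roughly as follows. This is a hypothetical illustration, not Camus's actual code; `split_offset_range` is an illustrative name, and it assumes the end offset is exclusive (the next offset to be written).

```python
from typing import List, Tuple


def split_offset_range(start: int, end: int, num_splits: int) -> List[Tuple[int, int]]:
    """Divide the half-open offset range [start, end) into at most num_splits
    contiguous, non-overlapping sub-ranges of near-equal size.

    Each (lo, hi) pair could be handed to a separate mapper, which reads
    offsets lo <= offset < hi from the same partition.
    """
    if num_splits < 1:
        raise ValueError("num_splits must be >= 1")
    if end < start:
        raise ValueError("end must be >= start")
    total = end - start
    n = min(num_splits, total)  # never hand out an empty range
    if n == 0:
        return []
    base, extra = divmod(total, n)
    ranges = []
    lo = start
    for i in range(n):
        size = base + (1 if i < extra else 0)  # spread the remainder over the first ranges
        ranges.append((lo, lo + size))
        lo += size
    return ranges


print(split_offset_range(1000, 1010, 3))  # [(1000, 1004), (1004, 1007), (1007, 1010)]
```

Each mapper keeps order within its own range, but not across the whole partition, which is the trade-off Ashish raises further down the thread. Equal offset spans may also hold unequal amounts of data if message sizes vary.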

-Ewen

On Thu, Jul 23, 2015 at 2:45 PM, Gwen Shapira <gshap...@cloudera.com> wrote:

> Agree.
>
> On Thu, Jul 23, 2015 at 2:43 PM, Jiangjie Qin <j...@linkedin.com.invalid> wrote:
> > Ah, I see. Thanks for the use case, Gwen. I guess in that case it seems
> > like the time to use the low-level consumer.
> >
> > Jiangjie (Becket) Qin
> >
> > On Thu, Jul 23, 2015 at 9:52 AM, Gwen Shapira <gshap...@cloudera.com> wrote:
> >
> >> As crazy as it sounds, there is an actual use-case there.
> >>
> >> Writing to HDFS can be very slow, so if you do a batch dump from a
> >> topic to HDFS, you may want more consumers reading from the topic than
> >> for "normal" streaming use-cases. We ended up modifying Camus to split
> >> a partition between multiple mappers (take start and end offsets and
> >> divide into ranges) to solve this problem. Not exactly round-robin,
> >> but the same idea.
> >>
> >> I think that's what J A was referring to in "decoupling consumers" -
> >> different consumers have slightly different requirements.
> >>
> >> Gwen
> >>
> >> On Thu, Jul 23, 2015 at 9:44 AM, Jiangjie Qin <j...@linkedin.com.invalid> wrote:
> >> > J A,
> >> >
> >> > It looks to me like in your case you actually want to scale the topic,
> >> > right? Otherwise, wouldn't a single consumer be enough?
> >> >
> >> > Jiangjie (Becket) Qin
> >> >
> >> > On Wed, Jul 22, 2015 at 7:39 PM, J A <mbatth...@gmail.com> wrote:
> >> >
> >> >> Why have partitions at all if I don't need to scale the topic? Coupling
> >> >> topic scalability with consumer scalability just goes against the core
> >> >> principle of messaging systems: decoupling consumers and producers.
> >> >>
> >> >> On Wednesday, July 22, 2015, Aditya Auradkar <aaurad...@linkedin.com.invalid> wrote:
> >> >>
> >> >> > Hi,
> >> >> >
> >> >> > Why not simply have as many partitions as there are consumers you want
> >> >> > to round-robin across?
> >> >> >
> >> >> > Aditya
> >> >> >
> >> >> > On Wed, Jul 22, 2015 at 2:37 PM, Ashish Singh <asi...@cloudera.com> wrote:
> >> >> >
> >> >> > > Hey, don't you think that would go against the basic ordering
> >> >> > > guarantees Kafka provides?
> >> >> > >
> >> >> > > On Wed, Jul 22, 2015 at 2:14 PM, J A <mbatth...@gmail.com> wrote:
> >> >> > >
> >> >> > > > Hi, this is in reference to the Stack Overflow question
> >> >> > > > "http://stackoverflow.com/questions/31547216/kafka-log-deletion-and-load-balancing-across-consumers".
> >> >> > > > Since Kafka 0.8 already maintains a client offset, I would like to
> >> >> > > > request a feature where consumption of a single partition can be
> >> >> > > > round-robined across a set of consumers. The message delivery strategy
> >> >> > > > should be an option chosen by the consumer.
> >> >> > > >
> >> >> > >
> >> >> > >
> >> >> > >
> >> >> > > --
> >> >> > >
> >> >> > > Regards,
> >> >> > > Ashish
> >> >> > >
> >> >> >
> >> >>
> >>
>
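
To make J A's original request (round-robin consumption of a single partition) concrete, here is a hypothetical sketch; `round_robin_assign` and `committable_offset` are illustrative names, not an existing Kafka API. Dealing one partition's offsets out to consumers in turn means adjacent messages (e.g. 100 and 101) go to different consumers and may be processed out of order. It also hints at a complication: a consumer group commits a single offset per partition, which can only record a contiguous prefix of processed messages, so one slow consumer holds back the commit for everyone.

```python
from typing import Dict, Iterable, List, Set


def round_robin_assign(offsets: Iterable[int], num_consumers: int) -> Dict[int, List[int]]:
    """Deal one partition's offsets out to consumers in turn (0, 1, ..., 0, 1, ...)."""
    if num_consumers < 1:
        raise ValueError("num_consumers must be >= 1")
    assignment: Dict[int, List[int]] = {c: [] for c in range(num_consumers)}
    for i, offset in enumerate(offsets):
        assignment[i % num_consumers].append(offset)
    return assignment


def committable_offset(start: int, processed: Set[int]) -> int:
    """Next offset to commit: one past the contiguous run of processed offsets beginning at start."""
    nxt = start
    while nxt in processed:
        nxt += 1
    return nxt


assignment = round_robin_assign(range(100, 106), 2)
print(assignment)  # {0: [100, 102, 104], 1: [101, 103, 105]}

# Consumer 0 has finished everything it was given; consumer 1 has finished only 101.
done = set(assignment[0]) | {101}
print(committable_offset(100, done))  # 103: offset 104 is done but cannot be recorded yet
```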



-- 
Thanks,
Ewen
