Hi Rob,

About 24-35 messages per partition per second. I have tried playing
with fetch sizes, maxprocs, etc., but the differences were only
marginal. It seems that Heka itself is not the bottleneck, because it
doesn't matter whether the process serves 1 or 4 partitions.

Furthermore, it seems that with offset_method 'Oldest', Heka has a
throughput of about 80 msg/s per partition and then slows down.
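
For reference, a minimal sketch of where the settings mentioned above
live in the config quoted below. The maxprocs value is an illustrative
assumption, not a setting that was measured; the input section repeats
the quoted one with only offset_method added:

```toml
# Global hekad section; maxprocs = 4 is an illustrative value only.
[hekad]
maxprocs = 4

[ModgenKafkaInput0]
type = "KafkaInput"
topic = "logs"
addrs = ["kafka_tmp.modgen.net:9092"]
splitter = "KafkaSplitter"
decoder = "ProtobufDecoder"
group = "kafka-client-group01"
partition = 0
# Begin consuming at the oldest offset still retained by the broker.
offset_method = "Oldest"
# Fetch size in bytes, unchanged from the quoted config.
default_fetch_size = 65536
```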

I do see spikes of about 400 msg/s in the queue. Could the problem be
that Kafka needs to have more partitions? My understanding was that
Kafka's throughput should be at least an order of magnitude higher.

Thank you,

    Antonin


* Rob Miller <[email protected]> [2015-04-16 07:52] wrote:
> What sort of throughput are you seeing?
> 
> On 04/14/2015 10:53 PM, Antonin Kral wrote:
> >Hi Rob and Gil,
> >
> >I am trying to do something similar, albeit on a smaller scale. My
> >configuration looks something like:
> >
> >     [ModgenKafkaInput0]
> >     type = "KafkaInput"
> >     topic = "logs"
> >     addrs = ["kafka_tmp.modgen.net:9092"]
> >     splitter = "KafkaSplitter"
> >     decoder = "ProtobufDecoder"
> >     group = "kafka-client-group01"
> >     partition = 0
> >     event_buffer_size = 512
> >     max_open_reqests = 8
> >     default_fetch_size = 65536
> >
> >     [ModgenKafkaInput1]
> >     type = "KafkaInput"
> >     topic = "logs"
> >     addrs = ["kafka_tmp.modgen.net:9092"]
> >     splitter = "KafkaSplitter"
> >     decoder = "ProtobufDecoder"
> >     group = "kafka-client-group01"
> >     partition = 1
> >     event_buffer_size = 512
> >     max_open_reqests = 8
> >     default_fetch_size = 65536
> >
> >     [KafkaSplitter]
> >     type = "NullSplitter"
> >     use_message_bytes = true
> >
> >     [ESJsonEncoder]
> >     es_index_from_timestamp = true
> >     type_name = "%{Type}"
> >
> >     [ElasticSearchOutput]
> >     server = "http://localhost:9200"
> >     message_matcher = "Type !~ /^heka/"
> >     encoder = "ESJsonEncoder"
> >     flush_interval = 100 # in ms
> >     flush_count = 50
> >     use_buffering = true
> >     queue_max_buffer_size = 102400000
> >     queue_full_action = "shutdown"
> >
> >I am currently observing quite low throughput though. Not really sure
> >why, but it seems that the problem is between Kafka and Heka.
> >
> >     Best,
> >
> >         Antonin
> >
> >
> >* Rob Miller <[email protected]> [2015-04-15 07:41] wrote:
> >> On 04/14/2015 12:48 PM, Gil Fliker wrote:
> >>> Thx for the quick response,
> >>>
> >>> I am not yet familiar with all of heka's features, specifically with
> >>> "message_matcher".
> >> That's one of Heka's fundamental concepts, please see 
> >> http://hekad.readthedocs.org/en/v0.9.1/index.html, 
> >> http://hekad.readthedocs.org/en/v0.9.1/getting_started.html and 
> >> http://hekad.readthedocs.org/en/v0.9.1/message_matcher.html.
> >>> Let me just add that the reasoning behind the high number of partitions
> >>> is to enable parallelism to support the throughput needed.
> >>>
> >>> Can you please point me in a direction for a similar heka example ?
> >> Sorry, there's no existing example that I can point you to at the moment. 
> >> We're happy to answer specific questions, to the extent we're able, but 
> >> every massively parallel data processing infrastructure is going to be 
> >> different; you're going to have to get familiar with the building blocks 
> >> that Heka provides and drill down a bit before you'll be able to get a 
> >> useful response. :)
> >>
> >> -r
> >>
> >>>
> >>>
> >>> Thx
> >>>
> >>>
> >>>
> >>>
> >>> Gil Fliker
> >>>
> >>>
> >>> On Tue, Apr 14, 2015 at 3:22 PM, Rob Miller <[email protected]> wrote:
> >>>
> >>>     Yes, currently a single KafkaInput can only pull from a single Kafka
> >>>     partition. You can think of Heka's KafkaInput as analogous to a
> >>>     SimpleConsumer (see
> >>>     https://cwiki.apache.org/confluence/display/KAFKA/0.8.0+SimpleConsumer+Example).
> >>>
> >>>     If you want to manage inter-partition coordination, along the lines
> >>>     of what is described as a "High Level Consumer"
> >>>     (https://cwiki.apache.org/confluence/display/KAFKA/Consumer+Group+Example),
> >>>     you'd handle that at the filter layer. For instance, you might set
> >>>     up a filter plugin with a message_matcher constructed such that it
> >>>     catches all of the messages from a single topic, regardless of
> >>>     partition, and perform any necessary correlations therein. The
> >>>     delivery semantics to this filter would match that described on the
> >>>     consumer group example page linked above, i.e. all of the messages
> >>>     from a single partition will be received in the correct order, but
> >>>     the messages from across partitions would be non-deterministically
> >>>     interleaved.
> >>>
> >>>     If there are so many partitions carrying so much data that a single
> >>>     Heka instance can't handle them all, then you might have to have one
> >>>     box handling one subset of partitions, another box processing a
> >>>     different subset, and each of *those* in turn feeding into a third
> >>>     box that performs the next level of correlation.
> >>>
> >>>     In other words, the building blocks are there, but you have to
> >>>     actually use them to put together a more sophisticated system. We're
> >>>     unfortunately not yet at the point where there are higher level
> >>>     constructs that will automatically distribute load for you.
> >>>
> >>>     Hope this helps!
> >>>
> >>>     -r
> >>>
> >>>
> >>>
> >>>     On 04/10/2015 02:21 PM, Gil Fliker wrote:
> >>>
> >>>         Hi,
> >>>
> >>>         We are about to start a poc using Heka.
> >>>
> >>>         The plan is to pipe messages via Kafka transport, with Heka
> >>>         being the endpoints speaking HTTP with various producers and
> >>>         consumers.
> >>>
> >>>         I saw in the documentation that you have to specify a partition
> >>>         number, and only one partition number?
> >>>
> >>>         Our Kafka topic setup will be made of around 1000 partitions.
> >>>
> >>>         What is the best way to approach this ?
> >>>
> >>>
> >>>         Thx
> >>>
> >>>
> >>>         Gil Fliker
> >>>
> >>>         Outbrain Operations Manager
> >>>
> >>>         The above terms reflect a potential business arrangement, are
> >>>         provided solely as a basis for further discussion, and are not
> >>>         intended to be and do not constitute a legally binding
> >>>         obligation. No legally binding obligations will be created,
> >>>         implied, or inferred until an agreement in final form is
> >>>         executed in writing by all parties involved.
> >>>
> >>>         This email and any attachments hereto may be confidential or
> >>>         privileged. If you received this communication by mistake,
> >>>         please don't forward it to anyone else, please erase all copies
> >>>         and attachments, and please let me know that it has gone to the
> >>>         wrong person. Thanks.
> >>>
> >>>
> >>>         _______________________________________________
> >>>         Heka mailing list
> >>>         [email protected]
> >>>         https://mail.mozilla.org/listinfo/heka
> >>>
> >>>
> >>>
> >>>
> >>
> >>
> 