Hi Rob and Gil,

I am trying to do something similar, albeit on a smaller scale. My
configuration looks something like this:

    [ModgenKafkaInput0]
    type = "KafkaInput"
    topic = "logs"
    addrs = ["kafka_tmp.modgen.net:9092"]
    splitter = "KafkaSplitter"
    decoder = "ProtobufDecoder"
    group = "kafka-client-group01"
    partition = 0
    event_buffer_size = 512
    max_open_reqests = 8
    default_fetch_size = 65536

    [ModgenKafkaInput1]
    type = "KafkaInput"
    topic = "logs"
    addrs = ["kafka_tmp.modgen.net:9092"]
    splitter = "KafkaSplitter"
    decoder = "ProtobufDecoder"
    group = "kafka-client-group01"
    partition = 1
    event_buffer_size = 512
    max_open_reqests = 8
    default_fetch_size = 65536

    [KafkaSplitter]
    type = "NullSplitter"
    use_message_bytes = true

    [ESJsonEncoder]
    es_index_from_timestamp = true
    type_name = "%{Type}"

    [ElasticSearchOutput]
    server = "http://localhost:9200"
    message_matcher = "Type !~ /^heka/"
    encoder = "ESJsonEncoder"
    flush_interval = 100 # in ms
    flush_count = 50
    use_buffering = true
    queue_max_buffer_size = 102400000
    queue_full_action = "shutdown"

I am currently observing quite low throughput, though. I am not really
sure why, but the problem seems to be between Kafka and Heka.
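
One way to narrow this down might be to count the messages coming out of
the Kafka inputs independently of Elasticsearch. A minimal sketch,
assuming Heka's stock CounterFilter, LogOutput and PayloadEncoder
plugins; the section names KafkaRateCounter and KafkaRateLog are made up
for illustration:

```toml
# Counts every non-internal message, i.e. the same stream that the
# ElasticSearchOutput matcher above selects.
[KafkaRateCounter]
type = "CounterFilter"
message_matcher = "Type !~ /^heka/"
ticker_interval = 5  # seconds between counter messages

[PayloadEncoder]
append_newlines = false

# Prints the counter messages to hekad's stdout. They have type
# heka.counter-output, so the /^heka/ matchers keep them out of both
# Elasticsearch and the counter itself.
[KafkaRateLog]
type = "LogOutput"
message_matcher = "Type == 'heka.counter-output'"
encoder = "PayloadEncoder"
```

If the rate reported by the counter is about as low as what arrives in
Elasticsearch, the bottleneck is presumably upstream of the output (Kafka
fetch or decoding). This is only a rough indication, since a slow output
can also apply back-pressure to the inputs.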

    Best,

        Antonin


* Rob Miller <[email protected]> [2015-04-15 07:41] wrote:
> On 04/14/2015 12:48 PM, Gil Fliker wrote:
> >Thx for the quick response,
> >
> >I am not yet familiar with all of Heka's features, specifically with
> >"message_matcher".
> That's one of Heka's fundamental concepts, please see 
> http://hekad.readthedocs.org/en/v0.9.1/index.html, 
> http://hekad.readthedocs.org/en/v0.9.1/getting_started.html and 
> http://hekad.readthedocs.org/en/v0.9.1/message_matcher.html.
> >Let me just add that the reasoning behind the high number of partitions
> >is to enable parallelism to support the throughput needed.
> >
> >Can you please point me in a direction for a similar heka example ?
> Sorry, there's no existing example that I can point you to at the moment. 
> We're happy to answer specific questions, to the extent we're able, but every 
> massively parallel data processing infrastructure is going to be different;
> you're going to have to get familiar with the building blocks that Heka 
> provides and drill down a bit before you'll be able to get a useful response. 
> :)
> 
> -r
> 
> >
> >
> >Thx
> >
> >
> >
> >
> >Gil Fliker
> >
> >
> >On Tue, Apr 14, 2015 at 3:22 PM, Rob Miller <[email protected]> wrote:
> >
> >    Yes, currently a single KafkaInput can only pull from a single Kafka
> >    partition. You can think of Heka's KafkaInput as analogous to a
> >    SimpleConsumer (see
> >    https://cwiki.apache.org/confluence/display/KAFKA/0.8.0+SimpleConsumer+Example).
> >
> >    If you want to manage inter-partition coordination, along the lines
> >    of what is described as a "High Level Consumer"
> >    (https://cwiki.apache.org/confluence/display/KAFKA/Consumer+Group+Example),
> >    you'd handle that at the filter layer. For instance, you might set
> >    up a filter plugin with a message_matcher constructed such that it
> >    catches all of the messages from a single topic, regardless of
> >    partition, and perform any necessary correlations therein. The
> >    delivery semantics to this filter would match that described on the
> >    consumer group example page linked above, i.e. all of the messages
> >    from a single partition will be received in the correct order, but
> >    the messages from across partitions would be non-deterministically
> >    interleaved.
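
(Inline note: if I read this correctly, such a filter could be declared
roughly as below. This is only a sketch: the section name and the Lua
script path are hypothetical, and matching on Fields[Topic] assumes the
Topic field set by KafkaInput is still present on the message after
decoding, which I have not verified. Matching on Type would be the
fallback.)

```toml
[LogsTopicFilter]
type = "SandboxFilter"
# Hypothetical script that would hold the actual correlation logic.
filename = "lua_filters/logs_topic.lua"
# One matcher for the whole topic, regardless of which KafkaInput
# (and therefore which partition) a message came from.
message_matcher = "Fields[Topic] == 'logs'"
ticker_interval = 60
```
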
> >
> >    If there are so many partitions carrying so much data that a single
> >    Heka instance can't handle them all, then you might have to have one
> >    box handling one subset of partitions, another box processing a
> >    different subset, and each of *those* in turn feeding into a third
> >    box that performs the next level of correlation.
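
(Inline note: a rough sketch of how I understand this layout, assuming
the stock TcpOutput and TcpInput plugins carry the messages between
boxes. The host name, port and section names are hypothetical, and the
two halves would live in separate hekad configs.)

```toml
# On each edge box, next to the KafkaInputs for its subset of partitions:
[ToAggregator]
type = "TcpOutput"
address = "aggregator.example.net:5565"  # hypothetical host and port
message_matcher = "Type !~ /^heka/"

# On the aggregating box, which performs the next level of correlation:
[FromEdges]
type = "TcpInput"
address = ":5565"
splitter = "HekaFramingSplitter"
decoder = "ProtobufDecoder"
```
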
> >
> >    In other words, the building blocks are there, but you have to
> >    actually use them to put together a more sophisticated system. We're
> >    unfortunately not yet at the point where there are higher level
> >    constructs that will automatically distribute load for you.
> >
> >    Hope this helps!
> >
> >    -r
> >
> >
> >
> >    On 04/10/2015 02:21 PM, Gil Fliker wrote:
> >
> >        Hi,
> >
> >        We are about to start a PoC using Heka.
> >
> >        The plan is to pipe messages over Kafka as the transport, with Heka
> >        instances as the endpoints speaking HTTP to various producers and
> >        consumers.
> >
> >        I saw in the documentation that you have to specify a partition
> >        number, and only one partition number?
> >
> >        Our Kafka topic setup will be made of around 1000 partitions.
> >
> >        What is the best way to approach this ?
> >
> >
> >        Thx
> >
> >
> >        Gil Fliker
> >
> >        Outbrain Operations Manager
> >
> >        The above terms reflect a potential business arrangement, are
> >        provided
> >        solely as a basis for further discussion, and are not intended
> >        to be and
> >        do not constitute a legally binding obligation. No legally binding
> >        obligations will be created, implied, or inferred until an
> >        agreement in
> >        final form is executed in writing by all parties involved.
> >
> >        This email and any attachments hereto may be confidential or
> >        privileged.
> >           If you received this communication by mistake, please don't
> >        forward it
> >        to anyone else, please erase all copies and attachments, and
> >        please let
> >        me know that it has gone to the wrong person. Thanks.
> >
> >
> >        _________________________________________________
> >        Heka mailing list
> >        [email protected]
> >        https://mail.mozilla.org/listinfo/heka
> >
> >
> >
> >
> 
> 
