Thanks, David, for the quick response and your valuable suggestions. I've
added the list back.

I just want to be able to publish my data into Heka so that it can be used
by other systems, so it's the latter case. I appreciate the inspiration link
and will read and watch those.

You guys really helped me. Thank you a lot!

Emily


On Fri, Jan 8, 2016 at 11:10 AM, David Birdsong <david.birds...@gmail.com>
wrote:

>
>
> On Fri, Jan 8, 2016 at 10:28 AM, Emily Gu <77.e...@gmail.com> wrote:
>
>> Thank you very much, David.
>> By using your code snippet and the [RstEncoder], I can see messages.
>> This helps me understand how to interact with Heka programmatically.
>>
>> My project is a Go project and I have been with it for 4 months. I'm new
>> to Golang and Heka as well.
>>
>> Let's say my array of metrics data is as follows:
>>
>> type MetricType struct {
>>     Namespace          []string            `json:"namespace"`
>>     LastAdvertisedTime time.Time           `json:"last_advertised_time"`
>>     Version            int                 `json:"version"`
>>     Config             *cda.ConfigDataNode `json:"config"`
>>     Data               interface{}         `json:"data"`
>>     Labels             []core.Label        `json:"labels"`
>>     Tags               map[string]string   `json:"tags"`
>>     Source             string              `json:"source"`
>>     Timestamp          time.Time           `json:"timestamp"`
>> }
>>
>> I can see two ways to interact with Heka:
>>
>> 1. Convert each metric data point to a Heka message.Message and send it
>> over TCP to a TcpInput.
>> 2. Write our own Heka plugin.
>>
>> Please comment on the performance and scalability of each for real-time,
>> large-scale data collection.
>>
>> I really appreciate all your help!
>>
>
> Maybe I'd start by asking the question of what you want heka to do for
> you. The native Heka protobuf format maps directly to the in-memory
> structure of a given message as it traverses through other phases. So a
> simple question is, do you want to operate on the message in heka itself or
> do you just want heka to ship your data to another system?
>
> If the latter, then simply embedding an opaque set of bytes as the payload
> might be the way to go. In that case, heka is just a bit-shipper throwing
> the payload bytes at things like Elasticsearch, InfluxDB, Kafka, a random
> HTTP endpoint, etc.
>
> Heka is interesting when you use it to operate on your data, though. In
> this case, by using the native format, you bypass the need to think about
> input and decoding and can get to work on interesting filters. Check out
> Rob's talk at Monitorama for some inspiration on what's possible:
> https://egustafson.github.io/monitorama-2015.html#heka-workshop
>
> ..hope that helps.
>
>> Emily
>>
>>
>>
>> On Fri, Jan 8, 2016 at 2:13 AM, David Birdsong <david.birds...@gmail.com>
>> wrote:
>>
>>>
>>>
>>> On Fri, Jan 8, 2016 at 12:14 AM, Emily Gu <77.e...@gmail.com> wrote:
>>>
>>>> Hi Rob & David,
>>>>
>>>> We have our own data collectors and publishers. We would like to
>>>> publish data into Heka using TCP. Questions are:
>>>>
>>>> 1. If I publish data directly through the existing TcpInput plugin, what
>>>> generic decoder and splitter can I use to examine the data and make sure
>>>> everything is correct?
>>>>
>>>
>>> you don't need a decoder or splitter to examine the data. you need an
>>> output plugin and encoder. this is what i use when i'm debugging:
>>>
>>> [debug]
>>> type = "LogOutput"
>>> encoder = "RstEncoder"
>>> message_matcher = 'TRUE' # or any suitable message_matcher
>>> [RstEncoder]
>>>
>>>> 2. If I write my own custom input plugin, do I need to write my own
>>>> decoder and output plugins as well?
>>>>
>>>
>>> it depends, but probably not. if your application is already a go
>>> project, it's tempting to use the heka native protobuf plus the framing. if
>>> you follow either snippet i sent you, it should work.
>>>
>>>
>>>>
>>>> Currently, when I send data to TcpInput programmatically, I can never
>>>> examine the data I sent, so I don't know what's going on. If possible, can
>>>> we have a meeting tomorrow so that I can show you what I need?
>>>>
>>>
>>> try the above. if you need more help, i'm happy to have a quick chat if
>>> you'd like.
>>>
>>>
>>>
>>>>
>>>> Thank you very much!
>>>> Emily
>>>>
>>>>
>>>> On Thu, Jan 7, 2016 at 4:23 PM, Emily Gu <77.e...@gmail.com> wrote:
>>>>
>>>>> Thanks for the pointer, David. I'll take a look.
>>>>>
>>>>> Thanks,
>>>>> Emily
>>>>>
>>>>> On Thu, Jan 7, 2016 at 4:07 PM, David Birdsong <
>>>>> david.birds...@gmail.com> wrote:
>>>>>
>>>>>> here's a shorter, more succinct gist:
>>>>>> https://gist.github.com/davidbirdsong/e2a829c9519790e8d9df
>>>>>>
>>>>>> On Thu, Jan 7, 2016 at 4:04 PM, David Birdsong <
>>>>>> david.birds...@gmail.com> wrote:
>>>>>>
>>>>>>> great, that's the info we needed.
>>>>>>>
>>>>>>> so you can drop the TcpOutput for now since it appears that you are
>>>>>>> trying to write in the native heka format to a heka process from your
>>>>>>> own app. having heka write to another endpoint might be useful later, but it
>>>>>>> doesn't need to write to itself or the LogOutput plugin to display your
>>>>>>> messages in stdout.
>>>>>>>
>>>>>>> i'm not sure what's in message_bytes, but here's a snippet that
>>>>>>> you can use as a reference.
>>>>>>>
>>>>>>> https://github.com/imgix/hekametrics/blob/master/hekalogger.go
>>>>>>>
>>>>>>>
>>>>>>> On Thu, Jan 7, 2016 at 3:48 PM, Emily Gu <77.e...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Thank you both, Rob and David, very much!
>>>>>>>>
>>>>>>>> Where do I need to define "base_dir"?
>>>>>>>>
>>>>>>>> I'm going to write a Heka plugin to pass our metrics data into Heka.
>>>>>>>>
>>>>>>>> For now, I'm having a hard time seeing the data that I send
>>>>>>>> programmatically over TCP to TcpInput show up in the output.log file.
>>>>>>>> I don't see any output. The configs are:
>>>>>>>>
>>>>>>>> tcp_input.toml
>>>>>>>> ==============
>>>>>>>>
>>>>>>>> [hekad]
>>>>>>>> maxprocs = 1
>>>>>>>> share_dir = "/Users/egu/heka/share/heka"
>>>>>>>>
>>>>>>>> [tcp_in:3242]
>>>>>>>> type = "TcpInput"
>>>>>>>> splitter = "HekaFramingSplitter"
>>>>>>>> decoder = "ProtobufDecoder"
>>>>>>>> address = ":3242"
>>>>>>>>
>>>>>>>> tcp_output.toml
>>>>>>>> ===============
>>>>>>>>
>>>>>>>> [hekad]
>>>>>>>> maxprocs = 1
>>>>>>>> share_dir = "/Users/egu/heka/share/heka"
>>>>>>>>
>>>>>>>> [tcp_out:3242]
>>>>>>>> type = "TcpOutput"
>>>>>>>> message_matcher = "TRUE"
>>>>>>>> address = "127.0.0.1:3242"
>>>>>>>>
>>>>>>>> [tcp_heka_output_log]
>>>>>>>> type = "FileOutput"
>>>>>>>> message_matcher = "TRUE"
>>>>>>>> path = "/tmp/output.log"
>>>>>>>> perm = "664"
>>>>>>>> encoder = "tcp_heka_output_encoder"
>>>>>>>>
>>>>>>>> [tcp_heka_output_encoder]
>>>>>>>> type = "PayloadEncoder"
>>>>>>>> append_newlines = false
>>>>>>>>
>>>>>>>>
>>>>>>>> The client:
>>>>>>>>
>>>>>>>> package main
>>>>>>>>
>>>>>>>> import (
>>>>>>>>     "fmt"
>>>>>>>>     "github.com/mozilla-services/heka/client"
>>>>>>>> )
>>>>>>>>
>>>>>>>> func main() {
>>>>>>>>     message_bytes := []byte{100}
>>>>>>>>
>>>>>>>>     sender, err := client.NewNetworkSender("tcp", "127.0.0.1:3242")
>>>>>>>>     if err != nil {
>>>>>>>>         fmt.Println("Could not connect to", "127.0.0.1:3242")
>>>>>>>>         return
>>>>>>>>     }
>>>>>>>>     fmt.Println("Connected")
>>>>>>>>
>>>>>>>>     var i int
>>>>>>>>     for i = 0; i < 10; i++ {
>>>>>>>>         fmt.Println("message byte:", string(message_bytes))
>>>>>>>>         err = sender.SendMessage(message_bytes)
>>>>>>>>         if err != nil {
>>>>>>>>             break
>>>>>>>>         }
>>>>>>>>     }
>>>>>>>>     fmt.Println("sent", i, "messages")
>>>>>>>> }
>>>>>>>>
>>>>>>>>
>>>>>>>> Please let me know what else I need to change.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>>
>>>>>>>> Emily
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Thu, Jan 7, 2016 at 3:28 PM, David Birdsong <
>>>>>>>> david.birds...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Thu, Jan 7, 2016 at 3:22 PM, Rob Miller <rmil...@mozilla.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> On 01/07/2016 03:09 PM, Emily Gu wrote:
>>>>>>>>>>
>>>>>>>>>>> Thanks David for all the help! I'll give it a try.
>>>>>>>>>>>
>>>>>>>>>>> Please bear with me, as there are some parts I still don't
>>>>>>>>>>> understand.
>>>>>>>>>>>
>>>>>>>>>>> 1. Why do I have to run two Heka instances, one for input and
>>>>>>>>>>> another for output?
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Because if you send the output from a Heka instance back into
>>>>>>>>>> itself, then you're likely setting up an infinite loop of traffic
>>>>>>>>>> that will spin out of control.
>>>>>>>>>>
>>>>>>>>>>> 2. Did you mean I need to specify different share_dirs in the input
>>>>>>>>>>> and output TOML configs?
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> If you're running multiple Heka instances on a single machine, it
>>>>>>>>>> *should* be fine for them to use the same share_dir, which is
>>>>>>>>>> read-only. It's very important that each specifies a unique base_dir,
>>>>>>>>>> however, since that's used by Heka for internal bookkeeping data. Two
>>>>>>>>>> Hekas using the same base_dir is asking for trouble.
>>>>>>>>>>
>>>>>>>>>>> 3. Do I need both TcpOutput and FileOutput in order for me to see
>>>>>>>>>>> messages inside an output file? What if I didn't specify
>>>>>>>>>>> TcpOutput?
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Um, TcpOutput sends output data over a TCP connection. It expects
>>>>>>>>>> that there is a listener on the other side which will accept that TCP
>>>>>>>>>> connection, and will know how to correctly handle the data that Heka
>>>>>>>>>> is sending over the TCP connection.
>>>>>>>>>>
>>>>>>>>>> FileOutput sends data to a file on the local file system.
>>>>>>>>>>
>>>>>>>>>> It's of course fine to specify a FileOutput without specifying a
>>>>>>>>>> TcpOutput.
>>>>>>>>>>
>>>>>>>>>> -r
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> whoops, yes, I meant base_dir, which is where heka writes various
>>>>>>>>> internal state information.
>>>>>>>>>
>>>>>>>>> Emily,
>>>>>>>>>
>>>>>>>>> Maybe you could share what data you're trying to read into heka
>>>>>>>>> and what you would like to do with it and we could help get you going.
>>>>>>>>>
>>>>>>>>> Heka is intended to be a unidirectional pipeline. It can read data
>>>>>>>>> in from many places in various formats, aggregate it into interesting
>>>>>>>>> new formats, and finally emit data somewhere else.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
_______________________________________________
Heka mailing list
Heka@mozilla.org
https://mail.mozilla.org/listinfo/heka
