Thanks David for the quick response and your valuable suggestions. Added back to the list.
I just want to be able to publish my data into Heka so that it can be used by
other systems, so it's the latter case. I appreciate the inspiration link and
will read and watch those.

You guys really helped me. Thank you a lot!

Emily

On Fri, Jan 8, 2016 at 11:10 AM, David Birdsong <david.birds...@gmail.com> wrote:

>
>
> On Fri, Jan 8, 2016 at 10:28 AM, Emily Gu <77.e...@gmail.com> wrote:
>
>> Thank you very much, David.
>> By using your code snippet and the [RstEncoder], I can see messages.
>> This helps me to understand how to interact with Heka programmatically.
>>
>> My project is a Go project and I have been with it for 4 months. I'm new
>> to Golang and Heka as well.
>>
>> Let's say my array of metrics data is as follows:
>>
>> type MetricType struct {
>>     Namespace          []string            `json:"namespace"`
>>     LastAdvertisedTime time.Time           `json:"last_advertised_time"`
>>     Version            int                 `json:"version"`
>>     Config             *cda.ConfigDataNode `json:"config"`
>>     Data               interface{}         `json:"data"`
>>     Labels             []core.Label        `json:"labels"`
>>     Tags               map[string]string   `json:"tags"`
>>     Source             string              `json:"source"`
>>     Timestamp          time.Time           `json:"timestamp"`
>> }
>>
>> I can see two ways to interact with Heka.
>>
>> 1. Convert each metric data point to Heka message.Message and send them
>> over using TcpInput.
>> 2. Write our own Heka plugin
>>
>> Please comment on the performance and scalability in terms of real time
>> large data collecting.
>>
>> I really appreciate all your help!
>>
>
> Maybe I'd start by asking the question of what you want heka to do for
> you. The native Heka protobuf format maps directly to the in-memory
> structure of a given message as it traverses through other phases. So a
> simple question is, do you want to operate on the message in heka itself or
> do you just want heka to ship your data to another system?
>
> If the latter, then simply embedding an opaque set of bytes as the payload
> might be the way to go.
> In that case, heka is just a bit-shipper throwing
> the payload bytes at things like Elasticsearch, InfluxDB, Kafka, random
> http endpoint etc..
>
> Heka is interesting when you use it to operate on your data though. In
> this case, by using the native format, you've bypassed the need to think
> about input and decode and get to work on interesting filters. Check out
> Rob's talk at Monitorama for some inspiration on what's possible:
> https://egustafson.github.io/monitorama-2015.html#heka-workshop
>
> ..hope that helps.
>
>> Emily
>>
>>
>>
>> On Fri, Jan 8, 2016 at 2:13 AM, David Birdsong <david.birds...@gmail.com>
>> wrote:
>>
>>>
>>>
>>> On Fri, Jan 8, 2016 at 12:14 AM, Emily Gu <77.e...@gmail.com> wrote:
>>>
>>>> Hi Rob & David,
>>>>
>>>> We have our own data collectors and publishers. We would like to
>>>> publish data into Heka using TCP. Questions are:
>>>>
>>>> 1. If I publish data directly through the existing TcpInput plugin, what
>>>> generic decoder and splitter may I use to examine the data and make sure
>>>> all is correct?
>>>>
>>>
>>> you don't need a decoder or splitter to examine the data. you need an
>>> output plugin and encoder. this is what i use when i'm debugging:
>>>
>>> [debug]
>>> type = "LogOutput"
>>> encoder = "RstEncoder"
>>> message_matcher = 'TRUE' # or any suitable message_matcher
>>> [RstEncoder]
>>>
>>>> 2. If I write my own custom input plugin, do I need to write my own
>>>> decoder, and output plugins as well?
>>>>
>>>
>>> it depends, but probably not. if your application is already a go
>>> project, it's tempting to use the heka native protobuf plus the framing. if
>>> you follow either snippet i sent you, it should work.
>>>
>>>
>>>>
>>>> Currently, I programmatically send data over to TcpInput, but I can never
>>>> examine the data I sent and don't know what's going on. If it's possible we
>>>> can have a meeting tomorrow so that I can show you what I need.
>>>>
>>>
>>> try the above.
>>> if you need more help, i'm happy to have a quick chat if
>>> you'd like.
>>>
>>>
>>>
>>>>
>>>> Thank you very much!
>>>> Emily
>>>>
>>>>
>>>> On Thu, Jan 7, 2016 at 4:23 PM, Emily Gu <77.e...@gmail.com> wrote:
>>>>
>>>>> Thanks for the pointer, David. I'll take a look.
>>>>>
>>>>> Thanks,
>>>>> Emily
>>>>>
>>>>> On Thu, Jan 7, 2016 at 4:07 PM, David Birdsong <
>>>>> david.birds...@gmail.com> wrote:
>>>>>
>>>>>> here's a shorter, more succinct gist:
>>>>>> https://gist.github.com/davidbirdsong/e2a829c9519790e8d9df
>>>>>>
>>>>>> On Thu, Jan 7, 2016 at 4:04 PM, David Birdsong <
>>>>>> david.birds...@gmail.com> wrote:
>>>>>>
>>>>>>> great, that's the info we needed.
>>>>>>>
>>>>>>> so you can drop the TcpOutput for now since it appears that you are
>>>>>>> trying to write in the native heka format to a heka process from your
>>>>>>> own app. having heka write to another endpoint might be useful later,
>>>>>>> but it doesn't need to write to itself or the LogOutput plugin to
>>>>>>> display your messages in stdout.
>>>>>>>
>>>>>>> i'm not sure what's in message_bytes, but here's a snippet that
>>>>>>> you can use as a reference.
>>>>>>>
>>>>>>> https://github.com/imgix/hekametrics/blob/master/hekalogger.go
>>>>>>>
>>>>>>>
>>>>>>> On Thu, Jan 7, 2016 at 3:48 PM, Emily Gu <77.e...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Thank you both, Rob and David, very much!
>>>>>>>>
>>>>>>>> Not sure where I need to define "base_dir"?
>>>>>>>>
>>>>>>>> I'm going to write a Heka plugin to pass our metrics data into Heka.
>>>>>>>>
>>>>>>>> For now, I have a hard time seeing the data I send in
>>>>>>>> programmatically through TcpInput in the output.log file.
>>>>>>>> I don't see any output.
>>>>>>>> The configs are:
>>>>>>>>
>>>>>>>> tcp_input.toml
>>>>>>>> ============
>>>>>>>>
>>>>>>>> [hekad]
>>>>>>>> maxprocs = 1
>>>>>>>> share_dir = "/Users/egu/heka/share/heka"
>>>>>>>>
>>>>>>>> [tcp_in:3242]
>>>>>>>> type = "TcpInput"
>>>>>>>> splitter = "HekaFramingSplitter"
>>>>>>>> decoder = "ProtobufDecoder"
>>>>>>>> address = ":3242"
>>>>>>>>
>>>>>>>>
>>>>>>>> tcp_output.toml
>>>>>>>> ==============
>>>>>>>>
>>>>>>>> [hekad]
>>>>>>>> maxprocs = 1
>>>>>>>> share_dir = "/Users/egu/heka/share/heka"
>>>>>>>>
>>>>>>>> [tcp_out:3242]
>>>>>>>> type = "TcpOutput"
>>>>>>>> message_matcher = "TRUE"
>>>>>>>> address = "127.0.0.1:3242"
>>>>>>>>
>>>>>>>> [tcp_heka_output_log]
>>>>>>>> type = "FileOutput"
>>>>>>>> message_matcher = "TRUE"
>>>>>>>> path = "/tmp/output.log"
>>>>>>>> perm = "664"
>>>>>>>> encoder = "tcp_heka_output_encoder"
>>>>>>>>
>>>>>>>> [tcp_heka_output_encoder]
>>>>>>>> type = "PayloadEncoder"
>>>>>>>> append_newlines = false
>>>>>>>>
>>>>>>>>
>>>>>>>> The client:
>>>>>>>>
>>>>>>>> package main
>>>>>>>>
>>>>>>>> import (
>>>>>>>>     "fmt"
>>>>>>>>     "github.com/mozilla-services/heka/client"
>>>>>>>> )
>>>>>>>>
>>>>>>>> func main() {
>>>>>>>>     message_bytes := []byte {100}
>>>>>>>>
>>>>>>>>     sender, err := client.NewNetworkSender("tcp", "127.0.0.1:3242")
>>>>>>>>     if err != nil {
>>>>>>>>         fmt.Println("Could not connect to", "127.0.0.1:3242")
>>>>>>>>         return
>>>>>>>>     }
>>>>>>>>     fmt.Println("Connected")
>>>>>>>>
>>>>>>>>     var i int
>>>>>>>>     for i = 0; i < 10; i++ {
>>>>>>>>         fmt.Println("message byte:", string(message_bytes))
>>>>>>>>         err = sender.SendMessage(message_bytes)
>>>>>>>>         if err != nil {
>>>>>>>>             break
>>>>>>>>         }
>>>>>>>>     }
>>>>>>>>     fmt.Println("sent", i, "messages")
>>>>>>>> }
>>>>>>>>
>>>>>>>>
>>>>>>>> Please let me know what else I need to change.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Emily
>>>>>>>>
>>>>>>>>
>>>>>>>> On Thu, Jan 7, 2016 at 3:28 PM, David Birdsong <
>>>>>>>> david.birds...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Thu, Jan 7, 2016 at 3:22 PM, Rob Miller <rmil...@mozilla.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> On 01/07/2016 03:09 PM, Emily Gu wrote:
>>>>>>>>>>
>>>>>>>>>>> Thanks David for all the help! I'll give it a try.
>>>>>>>>>>>
>>>>>>>>>>> Please bear with me as there are some parts I still do not
>>>>>>>>>>> understand.
>>>>>>>>>>>
>>>>>>>>>>> 1. Why do I have to run two Heka instances, one for input and
>>>>>>>>>>> another for output?
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Because if you send the output from a Heka instance back into
>>>>>>>>>> itself, then you're likely setting up an infinite loop of traffic
>>>>>>>>>> that will spin out of control.
>>>>>>>>>>
>>>>>>>>>>> 2. Did you mean I need to specify different share_dirs in input
>>>>>>>>>>> and output Toml configs?
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> If you're running multiple Heka instances on a single machine, it
>>>>>>>>>> *should* be fine for them to use the same share_dir, which is
>>>>>>>>>> read-only. It's very important that each specifies a unique
>>>>>>>>>> base_dir, however, since that's used by Heka for internal
>>>>>>>>>> bookkeeping data. Two Hekas using the same base_dir is asking for
>>>>>>>>>> trouble.
>>>>>>>>>>
>>>>>>>>>>> 3. Do I need both TcpOutput and FileOutput in order for me to see
>>>>>>>>>>> messages inside an output file? What if I didn't specify
>>>>>>>>>>> TcpOutput?
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Um, TcpOutput sends output data over a TCP connection.
>>>>>>>>>> It expects
>>>>>>>>>> that there is a listener on the other side which will accept that TCP
>>>>>>>>>> connection, and will know how to correctly handle the data that Heka
>>>>>>>>>> is sending over the TCP connection.
>>>>>>>>>>
>>>>>>>>>> FileOutput sends data to a file on the local file system.
>>>>>>>>>>
>>>>>>>>>> It's of course fine to specify a FileOutput without specifying a
>>>>>>>>>> TcpOutput.
>>>>>>>>>>
>>>>>>>>>> -r
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> whoops, yes I meant base_dir for where heka writes various
>>>>>>>>> internal state information to.
>>>>>>>>>
>>>>>>>>> Emily,
>>>>>>>>>
>>>>>>>>> Maybe you could share what data you're trying to read into heka
>>>>>>>>> and what you would like to do with it and we could help get you going.
>>>>>>>>>
>>>>>>>>> Heka is intended to be a uni-directional pipeline. It can read data
>>>>>>>>> in from many places into various formats, aggregate into interesting
>>>>>>>>> new formats, and finally emit data somewhere else.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
_______________________________________________ Heka mailing list Heka@mozilla.org https://mail.mozilla.org/listinfo/heka