On 17 March 2014 21:32, Rob Miller <[email protected]> wrote:

> On Mon 17 Mar 2014 01:30:10 PM PDT, Dan wrote:
>
>> Thanks for the reply. We're starting with just focusing on output to
>> Carbon, but we'll look at something like this when we get there, now
>> that we know what's possible.
>>
>> Another related issue we've found is the JSON format used by the
>> FileOutput plugin. I think it exports JSON that mirrors the internal
>> data structures, but it should be possible to output cleaner JSON.
>>
>> Have you thought about having output encoders at all? I've seen code
>> in the Elasticsearch output that exports simple JSON which I think
>> might work, so it could be good to share a JSON encoder between that,
>> the FileOutput, and a possible S3 output plugin.
>>
>
> Yes, absolutely. This has been on our radar for quite some time, as
> evidenced by this ticket I opened 6 months ago:
>
> https://github.com/mozilla-services/heka/issues/417
>
> We've been thinking about this a lot recently, and our plan is to
> introduce an Encoder plugin type that is analogous to the Decoder type.
> Encoders will take Heka messages as input and will emit raw bytes as
> output. They'll be encapsulated within Outputs the way that Decoders are
> encapsulated within Inputs. There will be a Go interface defined for the
> Encoder plugin type, of course, but we'll provide a SandboxEncoder so you
> can use Lua to do whatever you want, and we'll do most of our work there.
>
> This will give us some additional advantages. Right now, for instance, the
> CarbonOutput manages its own TCP connections, and has rudimentary
> keep-alive and reconnection support. But we already have a TcpOutput that
> has much more robust reconnection support, including the use of disk queues
> to ensure we don't lose messages through the disconnect / reconnect cycle.
> Ideally the CarbonOutput would go away, and instead you'd use a
> CarbonEncoder coupled with a TcpOutput, or a UdpOutput, or a
> <Whatever>Output, so all of the transport layer complexity only has to be
> gotten right once.
>
> That ticket has been open for a while, but this is on our short list of
> what's coming next. Not everything in our 0.6 milestone (see
> http://is.gd/azbUSB) will actually make it into the 0.6 release, but that
> one definitely will.
>
>
That sounds great. We've not decided how we're going to output and store
data yet, but that flexibility would be good.
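The Encoder plan described above might look roughly like this in Go. This is a minimal sketch only: the `Message` struct, the `Encode` signature, and the `CarbonEncoder` here are illustrative assumptions for this thread, not Heka's actual API.

```go
package main

import "fmt"

// Message is a stand-in for Heka's internal message type (an assumption;
// the real type lives in Heka's pipeline package and carries more fields).
type Message struct {
	Timestamp int64  // Unix seconds
	Payload   string // e.g. "stats.nginx.hits 42" (hypothetical convention)
}

// Encoder mirrors the proposed plugin type: a Heka message in, raw bytes out,
// analogous to how Decoders turn raw bytes into messages.
type Encoder interface {
	Encode(msg *Message) ([]byte, error)
}

// CarbonEncoder renders a message as a Carbon plaintext line:
// "<metric.path> <value> <timestamp>\n".
type CarbonEncoder struct{}

func (e *CarbonEncoder) Encode(msg *Message) ([]byte, error) {
	return []byte(fmt.Sprintf("%s %d\n", msg.Payload, msg.Timestamp)), nil
}

func main() {
	var enc Encoder = &CarbonEncoder{}
	b, err := enc.Encode(&Message{Timestamp: 1395000000, Payload: "stats.nginx.hits 42"})
	if err != nil {
		panic(err)
	}
	fmt.Print(string(b)) // stats.nginx.hits 42 1395000000
}
```

The point of the interface is the decoupling Rob describes: the same `CarbonEncoder` output could then be handed to a TcpOutput, UdpOutput, or any other transport.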


>
>  Are you using Heka to archive log data at Mozilla? If so, what format
>> are you storing it in?
>>
>
> Yes, we're using Heka to parse nginx and rsyslog logs into JSON (we ship
> w/ decoders for these formats: http://is.gd/B2F6qv and http://is.gd/sUiE8b)
> which we're then feeding into Elasticsearch. Unfortunately, we're finding
> that ES is having a hard time keeping up. A single machine running both
> nginx and Heka can produce and parse more log data than a cluster of 3 ES
> nodes on the same hardware can keep up with. ES is great, easy to use, and
> Kibana is awesome, but it may not be up to the scale that we need. Or we
> may be able to find a way to have Heka do more aggregation and pre-calc so
> that we don't have to slam ES so hard. Hard to say at this point.
>
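The nginx-to-Elasticsearch pipeline described above could be wired up with a config along these lines. Treat this as a sketch: the `LogstreamerInput`, `SandboxDecoder`, and `ElasticSearchOutput` plugin names match the Heka docs of that era, but the exact option names and values here are assumptions and vary by Heka version.

```toml
# Tail nginx access logs, parse them with the shipped Lua decoder
# (the one linked above), and ship the resulting JSON documents to
# Elasticsearch. Paths, matchers, and options are illustrative.
[nginx_access_logs]
type = "LogstreamerInput"
log_directory = "/var/log/nginx"
file_match = 'access\.log'
decoder = "nginx_access_decoder"

[nginx_access_decoder]
type = "SandboxDecoder"
filename = "lua_decoders/nginx_access.lua"

    [nginx_access_decoder.config]
    log_format = '$remote_addr - $remote_user [$time_local] "$request" $status $body_bytes_sent'

[ElasticSearchOutput]
message_matcher = "Logger == 'nginx_access_logs'"
server = "http://localhost:9200"
flush_interval = 5000
```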
>
We've started with our nginx logs too; we're currently just parsing stats
from them to send to Carbon, so we're not otherwise persisting the logs via
Heka. It's working well for us so far, but we've not pushed it very hard yet.

We've thought about ES too, so that's interesting to hear. I've heard from
people before that they've managed to get it to handle quite a lot of
throughput, but I'm not sure of the specifics. At the end of the day,
though, you are shoving everything into a Lucene index, which is quite a
bit of computation!

Another possibility we're thinking about is Kafka integration, so we can
use that as the main message bus out of our applications and then combine
it with other sources like nginx to archive everything together into
Graphite, ES, S3, etc.


>  Thanks,
>>
>
> You're welcome!
>
> -r
>
>
>
>  Dan
>>
>> On 4 March 2014 17:44, Rob Miller <[email protected]
>> <mailto:[email protected]>> wrote:
>>
>>     On Tue 04 Mar 2014 06:26:16 AM PST, Dan wrote:
>>
>>         Hi,
>>
>>
>>     Hi back!
>>
>>
>>         We are just evaluating Heka for use as our log and metrics
>>         aggregation
>>         system.
>>
>>
>>     Great! Hope you like what you find.
>>
>>
>>         We would like to archive our logs in S3 so it would be good if
>>         Heka could also store batches directly into a bucket.
>>
>>         Is anyone working on a S3 output plugin for Heka? If not we
>>         might look
>>         at starting to write one.
>>
>>
>>     I'm not aware of anyone actively working on an S3 output at the
>>     moment, no. We have, however, built Cloudwatch plugins, both an
>>     input and an output:
>>
>>     https://github.com/mozilla-services/heka-mozsvc-plugins/blob/dev/cloudwatch.go
>>
>>     Those use the crowdmob fork of Canonical's goamz package to handle
>>     the details of interfacing w/ Amazon's API authentication
>>     framework. You should be able to use that code as a model to get
>>     something bootstrapped pretty easily.
>>
>>     Our Cloudwatch plugins aren't in the Heka core, they're in a
>>     separate repo we set up for plugins that we think would be less
>>     widely used. Ultimately we'll probably create a separate repo
>>     specifically for AWS related plugins, so the Cloudwatch, S3, and
>>     any other Amazon-related plugins that get developed could have a
>>     nice cozy home together.
>>
>>     Hope this helps!
>>
>>     -r
>>
>>
>>
_______________________________________________
Heka mailing list
[email protected]
https://mail.mozilla.org/listinfo/heka
