On Mon 17 Mar 2014 01:30:10 PM PDT, Dan wrote:
> Thanks for the reply. We're starting by focusing on output to
> Carbon, but now that we know what's possible we'll look at
> something like this when we get there.
> Another related question concerns the JSON format used by the
> FileOutput plugin. I think it exports JSON that resembles the
> internal data structures, but it should be possible to output
> cleaner JSON. Have you thought about having output encoders at
> all? I've seen code in the Elasticsearch output that exports
> simple JSON which I think might work, so it could be good to share
> a JSON encoder between that, the FileOutput, and a possible S3
> output plugin.
Yes, absolutely. This has been on our radar for quite some time, as
evidenced by this ticket I opened 6 months ago:
https://github.com/mozilla-services/heka/issues/417
We've been thinking about this a lot recently, and our plan is to
introduce an Encoder plugin type that is analogous to the Decoder type.
Encoders will take Heka messages as input and will emit raw bytes as
output. They'll be encapsulated within Outputs the way that Decoders
are encapsulated within Inputs. There will be a Go interface defined
for the Encoder plugin type, of course, but we'll provide a
SandboxEncoder so you can use Lua to do whatever you want, and we'll do
most of our work there.
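To make that concrete, here's a rough sketch of what the Encoder interface might look like. All of the names here are hypothetical (the `Message` struct is a stand-in for Heka's real message type, and the actual signatures will be settled when the plugin type lands), but it shows the messages-in, raw-bytes-out shape:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Minimal stand-in for Heka's message type; a real encoder would
// receive the pipeline's *message.Message instead.
type Message struct {
	Type    string `json:"type"`
	Logger  string `json:"logger"`
	Payload string `json:"payload"`
}

// Hypothetical Encoder interface: takes a Heka message as input and
// emits raw bytes, mirroring how Decoders turn raw bytes into messages.
type Encoder interface {
	Encode(msg *Message) ([]byte, error)
}

// A trivial encoder that emits clean JSON rather than dumping the
// internal data structures.
type JsonEncoder struct{}

func (e *JsonEncoder) Encode(msg *Message) ([]byte, error) {
	return json.Marshal(msg)
}

func main() {
	var enc Encoder = &JsonEncoder{}
	b, err := enc.Encode(&Message{
		Type:    "nginx.access",
		Logger:  "web1",
		Payload: "GET / 200",
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(string(b))
}
```

The same JsonEncoder could then be shared by the FileOutput, the Elasticsearch output, and an S3 output, which is exactly the reuse you're describing.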
This will give us some additional advantages. Right now, for instance,
the CarbonOutput manages its own TCP connections, and has rudimentary
keep-alive and reconnection support. But we already have a TcpOutput
that has much more robust reconnection support, including the use of
disk queues to ensure we don't lose messages through the disconnect /
reconnect cycle. Ideally the CarbonOutput would go away, and instead
you'd use a CarbonEncoder coupled with a TcpOutput, or a UdpOutput, or
a <Whatever>Output, so all of the transport layer complexity only has
to be gotten right once.
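In config terms, the idea is that something like the following would replace a dedicated CarbonOutput. This is purely illustrative: the `CarbonEncoder` section, the `carbon.lua` filename, and the `encoder` option don't exist yet, and the final names may differ.

```toml
# Hypothetical: a SandboxEncoder that renders stat messages into
# Carbon's plaintext protocol...
[CarbonEncoder]
type = "SandboxEncoder"
filename = "lua_encoders/carbon.lua"

# ...while the existing TcpOutput handles connections, reconnects,
# and disk buffering.
[TcpOutput]
address = "graphite.example.com:2003"
encoder = "CarbonEncoder"
```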
That ticket has been open for a while, but this is on our short list of
what's coming next. Not everything in our 0.6 milestone (see
http://is.gd/azbUSB) will actually make it into the 0.6 release, but
that one definitely will.
> Are you using Heka to archive log data at Mozilla? If so, what
> format are you storing it in?
Yes, we're using Heka to parse nginx and rsyslog logs into JSON (we
ship w/ decoders for these formats: http://is.gd/B2F6qv and
http://is.gd/sUiE8b) which we're then feeding into ElasticSearch.
Unfortunately, we're finding that ES is having a hard time keeping up.
A single machine running both nginx and Heka can produce and parse more
log data than a cluster of 3 ES nodes on the same hardware can keep up
with. ES is great, easy to use, and Kibana is awesome, but it may not
be up to the scale that we need. Or we may be able to find a way to
have Heka do more aggregation and pre-calc so that we don't have to
slam ES so hard. Hard to say at this point.
> Thanks,
You're welcome!
-r
> Dan
On 4 March 2014 17:44, Rob Miller <[email protected]> wrote:
On Tue 04 Mar 2014 06:26:16 AM PST, Dan wrote:
Hi,
Hi back!
We are just evaluating Heka for use as our log and metrics
aggregation system.
Great! Hope you like what you find.
We would like to archive our logs in S3, so it would be good if
Heka could also store batches directly into a bucket.
Is anyone working on an S3 output plugin for Heka? If not, we
might look at starting to write one.
I'm not aware of anyone actively working on an S3 output at the
moment, no. We have, however, built Cloudwatch plugins, both an
input and an output:
https://github.com/mozilla-services/heka-mozsvc-plugins/blob/dev/cloudwatch.go
Those use the crowdmob fork of Canonical's goamz package to handle
the details of interfacing w/ Amazon's API authentication
framework. You should be able to use that code as a model to get
something bootstrapped pretty easily.
Our Cloudwatch plugins aren't in the Heka core, they're in a
separate repo we set up for plugins that we think would be less
widely used. Ultimately we'll probably create a separate repo
specifically for AWS related plugins, so the Cloudwatch, S3, and
any other Amazon-related plugins that get developed could have a
nice cozy home together.
Hope this helps!
-r
_______________________________________________
Heka mailing list
[email protected]
https://mail.mozilla.org/listinfo/heka