On 12 August 2015 at 18:16, Michael Trinkala <mtrink...@mozilla.com> wrote:
> Most looping requirements can be flattened out: i.e., alerting can be
> handled in the output plugins in your example and
> aggregation/sessionization etc. can be handled in the inputs. As for
> sharing: things fall into place when you start thinking about it from a
> module level instead of an individual plugin level.
>
> The locations are configurable ('output_path'), so you can put the output
> files anywhere you want.
>

Good, I will set it somewhere on the /tmp/ partition, which is already
stored in RAM.

> I have some plugins (like a stdin, simple file, TCP, and a pruning (cleans
> up the output files when everyone is done with them) inputs;
>

I'm interested in this pruning plugin: could you share it? Or don't you
think it would make sense for Hindsight to provide an option to
automatically clean up output files?

> heka protobuf and a payload outputs etc., but they haven't been committed
> yet). As for ProcessInput, the Input sandbox has access to os.execute, so
> there won't be a generic version; you can just call what you want and
> handle its output directly (Hindsight already supports run once, polling,
> and continuous input plugins).
>

Can you explain briefly how to configure input plugins for these 3 options
(run once, polling, or continuous)?

> The output files will grow until 'output_size' (defaults to 64MiB) before
> they are rolled (they are not deleted by default).
>

Ok. Would it make sense to introduce an option to delete them by default?

> I would not make it too small unless you need to prune really quickly;
> generally I run with a config of 1GiB (on some of our systems that rolls
> several times a minute). Space permitting, I would just roll them after
> several minutes of what would be average data flow on your system.
>

I plan to use a ram disk (tmpfs) to store these files (to avoid writing too
much on the sd card), so I would like them to stay small in order to avoid
eating too much RAM: does 5MB to 10MB seem reasonable to you?
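One rough way to reason about the 5MB to 10MB question is to compute how long a file of a given 'output_size' takes to roll at a steady data rate, and how much RAM the files that have not been pruned yet could occupy. This is only a back-of-the-envelope sketch: the 2 KiB/s data rate, the two output streams (input and analysis) and the number of rolled files kept before pruning are assumptions chosen for illustration, not values taken from Hindsight.

```python
MIB = 1024 * 1024


def roll_interval_seconds(output_size_bytes, bytes_per_second):
    """Time for one output file to reach output_size at a steady data rate."""
    return output_size_bytes / bytes_per_second


def worst_case_ram_bytes(output_size_bytes, streams, rolled_files_kept):
    """Upper bound on tmpfs usage: per stream, the file being written plus
    the rolled files that have not been pruned yet."""
    return output_size_bytes * streams * (rolled_files_kept + 1)


if __name__ == "__main__":
    rate = 2 * 1024   # ASSUMPTION: 2 KiB/s average data flow
    streams = 2       # ASSUMPTION: one input stream and one analysis stream
    kept = 1          # ASSUMPTION: one rolled file per stream awaits pruning
    for size_mib in (5, 10, 64):
        size = size_mib * MIB
        minutes = roll_interval_seconds(size, rate) / 60
        ram_mib = worst_case_ram_bytes(size, streams, kept) / MIB
        print(f"output_size={size_mib}MiB: rolls every {minutes:.1f} min, "
              f"worst case {ram_mib:.0f}MiB of tmpfs")
```

Plugging in a measured data rate instead of the assumed one shows whether a small 'output_size' still rolls only every several minutes, which is the guideline quoted above.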
Also, do you plan to provide a debian package for Hindsight to ease
installation?

Thanks for the great help!
Bruno

>
> Trink
>
> On Wed, Aug 12, 2015 at 1:55 AM, bruno binet <bruno.bi...@gmail.com>
> wrote:
>
>> On 12 August 2015 at 10:19, bruno binet <bruno.bi...@gmail.com> wrote:
>>
>>> Thanks for all this valuable information.
>>>
>>> On 11 August 2015 at 17:17, Michael Trinkala <mtrink...@mozilla.com>
>>> wrote:
>>>
>>>> There are a few intentional changes between Heka and Hindsight.
>>>> Looping messages in Heka has always been a bad idea so it was removed.
>>>>
>>>
>>> Personally I like the looping messages feature in Heka as it is very
>>> flexible and could be useful to share ready-to-use plugins. Also it
>>> supports processing messages through multiple ticker_interval which can be
>>> useful (alerting, aggregations).
>>>
>>>> There are a few API enhancements such as a protobuf stream reader and
>>>> writer. Checkpoints are all managed by the Hindsight infrastructure (so
>>>> much of the burden is removed from the plugin writer, this also alters the
>>>> plugin API slightly). The write_message hack for Go has been removed since
>>>> messages are immutable. read_config now has access to all related sandbox
>>>> config options (standard and user defined). read_next_field is not
>>>> supported (this will also be removed from Heka in 0.11).
>>>>
>>>> In most cases you will find the Hindsight IOPS lower than Heka due to
>>>> the much more efficient check pointing (btw Heka 0.11 is moving to a disk
>>>> buffer everywhere).
>>>>
>>>
>>> Great, that is good to know.
>>>
>>>> output_hl/input/* - contains the output from all of the input plugins
>>>> output_hl/analysis/* - contains the output from all of the analysis
>>>> plugins
>>>>
>>
>> Are the above files always growing?
>> I suppose the output_limit configuration allows us to limit their size:
>> what are the implications if I limit their size to a few KB? Will it
>> reduce Hindsight performance?
>>
>>>> hindsight.cp - is the checkpoint file for all I/O (inputs, analysis, and
>>>> output plugins)
>>>> hindsight.tsv - is the self monitoring performance stats
>>>>
>>>> These files are all mandatory. They are the reason Hindsight has an at
>>>> least once delivery guarantee and they provide valuable insight on system
>>>> operation and performance.
>>>>
>>>
>>> If I don't need delivery guarantee, do you think it could make sense to
>>> move these files to a ramdisk (tmpfs) partition in order to preserve the
>>> flash sd/usb card?
>>>
>>>> decode_message needs to be turned on for analysis and output plugins, I
>>>> will enable it.
>>>>
>>>
>>> Ok, thank you: this is now working as expected.
>>>
>>> Also, do you plan to implement some additional lua modules to help build
>>> input sandboxes similar to Heka input plugins (like the FilePollingInput,
>>> the ProcessInput, or the LogstreamerInput)?
>>>
>>>>
>>>> Trink
>>>>
>>>> On Tue, Aug 11, 2015 at 1:54 AM, bruno binet <bruno.bi...@gmail.com>
>>>> wrote:
>>>>
>>>>> I see, so I need to investigate how I can merge my multiple lua
>>>>> sandbox filters into a single one.
>>>>>
>>>>> This makes me wonder if there are some other differences between
>>>>> Hindsight and Heka?
>>>>> Is the fact that one analysis plugin cannot consume the output of
>>>>> another analysis plugin the only difference between Hindsight analysis
>>>>> plugins and Heka filter sandbox plugins?
>>>>>
>>>>> Also I saw in another thread that Hindsight uses disk buffers at every
>>>>> stage, so there's only ever one message in memory at every step of the
>>>>> pipeline: does it mean Hindsight will write much more frequently to the
>>>>> disk than Heka? This may be an issue for me since we use a raspberry pi
>>>>> whose disk is an sdcard or usb flash key.
>>>>>
>>>>> I see that some data is written to the output_path (output_hl/
>>>>> directory in my case): can you explain what are all these files:
>>>>> $ tree output_hl/
>>>>> output_hl/
>>>>> |-- analysis
>>>>> |   `-- 0.log
>>>>> |-- hindsight.cp
>>>>> |-- hindsight.tsv
>>>>> `-- input
>>>>>     `-- 0.log
>>>>>
>>>>> Can we avoid generating all these files?
>>>>>
>>>>> Last question: I don't manage to use the "read_next_field" or
>>>>> "decode_message" api function from the output plugin, are they available?
>>>>>
>>>>> The following error is returned:
>>>>> 1439280624615780495 [error] output_plugins terminated:
>>>>> output/encode_metric.cfg msg: process_message()
>>>>> _hl/output/encode_metric.lua:16: attempt to call global 'read_next_field'
>>>>> (a nil value)
>>>>>
>>>>> or when I change my output plugin to use the decode_message api
>>>>> function:
>>>>> 1439282566555139226 [error] output_plugins terminated:
>>>>> output/encode_metric.cfg msg: process_message()
>>>>> _hl/output/encode_metric.lua:15: attempt to call global 'decode_message'
>>>>> (a nil value)
>>>>>
>>>>> Thanks,
>>>>> Bruno
>>>>>
>>>>> On 10 August 2015 at 19:19, Michael Trinkala <mtrink...@mozilla.com>
>>>>> wrote:
>>>>>
>>>>>> There is no message looping in Hindsight (one analysis plugin cannot
>>>>>> consume the output of another analysis plugin). In your example the
>>>>>> decoding should happen in the input. Heka has Inputs, splitters, and
>>>>>> decoder (in Hindsight it is just an Input and common functionality can be
>>>>>> split into modules for code reuse). This in general simplifies the
>>>>>> configuration, is easier to follow (since everything is in one place) and
>>>>>> has performance benefits as well.
>>>>>>
>>>>>> Trink
>>>>>>
>>>>>> On Mon, Aug 10, 2015 at 9:23 AM, bruno binet <bruno.bi...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Back from vacations, I'm now playing again with Hindsight on a
>>>>>>> raspberry pi.
>>>>>>> As reported on github
>>>>>>> https://github.com/trink/hindsight/issues/1#issuecomment-119593775
>>>>>>> the compilation now succeeds.
>>>>>>>
>>>>>>> So getting inspiration from the examples in the benchmarks
>>>>>>> directory, I tried to create a Hindsight configuration to use my own
>>>>>>> lua sandboxes: I can successfully read data from udp and use a filter
>>>>>>> to decode data, then I would like to use another filter to handle
>>>>>>> generated messages, but I can't get any message in the second filter.
>>>>>>> Does Hindsight support more than one filter like Heka?
>>>>>>>
>>>>>>> Here is the Hindsight configuration, Lua sandboxes and output
>>>>>>> directory generated by Hindsight:
>>>>>>> https://github.com/bbinet/hindsight_hl_test
>>>>>>>
>>>>>>> Do you see anything wrong? Do I use hindsight correctly?
>>>>>>>
>>>>>>> Cheers,
>>>>>>> Bruno
>>>>>>>
>>>>>>> On 8 July 2015 at 09:44, bruno binet <bruno.bi...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Sure, I will try your branch and report possible new compilation
>>>>>>>> issues in github.
>>>>>>>>
>>>>>>>> Cheers,
>>>>>>>> Bruno
>>>>>>>>
>>>>>>>> On 7 July 2015 at 18:26, Michael Trinkala <mtrink...@mozilla.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> I changed the checkpoint id to an unsigned long long. Can you test
>>>>>>>>> out the branch and add any other compilation errors to the issue
>>>>>>>>> (closing out this email thread). I am also taking
>>>>>>>>> suggestions/recommendations for a CI build system that supports
>>>>>>>>> multiple platforms. TravisCI adds almost no value since I am
>>>>>>>>> already building on a Debian based box.
>>>>>>>>>
>>>>>>>>> https://github.com/trink/hindsight/tree/issue_1
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Trink
>>>>>>>>>
>>>>>>>>> On Tue, Jul 7, 2015 at 8:21 AM, bruno binet <bruno.bi...@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Ok, thanks.
>>>>>>>>>> And sorry, but I don't have a patch (don't know how to fix this
>>>>>>>>>> kind of compilation issue).
>>>>>>>>>>
>>>>>>>>>> On 7 July 2015 at 16:17, Michael Trinkala <mtrink...@mozilla.com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Yeah, I have only been building on Ubuntu and haven't done any
>>>>>>>>>>> cross platform clean-up. Thanks for the build output I will fix
>>>>>>>>>>> those errors (unless you already have a patch).
>>>>>>>>>>>
>>>>>>>>>>> Trink
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Jul 7, 2015 at 5:57 AM, bruno binet
>>>>>>>>>>> <bruno.bi...@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> I now have some time to do a few tests with Hindsight, so I
>>>>>>>>>>>> tried to compile it on our targeted arm platform (raspberry pi),
>>>>>>>>>>>> but I get the following error:
>>>>>>>>>>>>
>>>>>>>>>>>> root@hl-mc-9999-dev:~/hindsight/release# cmake -DCMAKE_BUILD_TYPE=release ..
>>>>>>>>>>>> -- The C compiler identification is GNU 4.7.2
>>>>>>>>>>>> -- The CXX compiler identification is GNU 4.7.2
>>>>>>>>>>>> -- Check for working C compiler: /usr/bin/gcc
>>>>>>>>>>>> -- Check for working C compiler: /usr/bin/gcc -- works
>>>>>>>>>>>> -- Detecting C compiler ABI info
>>>>>>>>>>>> -- Detecting C compiler ABI info - done
>>>>>>>>>>>> -- Detecting C compile features
>>>>>>>>>>>> -- Detecting C compile features - done
>>>>>>>>>>>> -- Check for working CXX compiler: /usr/bin/g++
>>>>>>>>>>>> -- Check for working CXX compiler: /usr/bin/g++ -- works
>>>>>>>>>>>> -- Detecting CXX compiler ABI info
>>>>>>>>>>>> -- Detecting CXX compiler ABI info - done
>>>>>>>>>>>> -- Detecting CXX compile features
>>>>>>>>>>>> -- Detecting CXX compile features - done
>>>>>>>>>>>> -- Found LUASANDBOX: /usr/local/lib/libluasandbox.so
>>>>>>>>>>>> -- Configuring done
>>>>>>>>>>>> -- Generating done
>>>>>>>>>>>> -- Build files have been written to: /root/hindsight/release
>>>>>>>>>>>>
>>>>>>>>>>>> root@hl-mc-9999-dev:~/hindsight/release# make
>>>>>>>>>>>> Scanning dependencies of target hindsight
>>>>>>>>>>>> [ 2%] Building C object src/CMakeFiles/hindsight.dir/hindsight.c.o
>>>>>>>>>>>> [ 4%] Building C object src/CMakeFiles/hindsight.dir/hs_analysis_plugins.c.o
>>>>>>>>>>>> [ 6%] Building C object src/CMakeFiles/hindsight.dir/hs_checkpoint_reader.c.o
>>>>>>>>>>>> /root/hindsight/src/hs_checkpoint_reader.c: In function 'find_first_id':
>>>>>>>>>>>> /root/hindsight/src/hs_checkpoint_reader.c:46:3: error: large integer implicitly truncated to unsigned type [-Werror=overflow]
>>>>>>>>>>>> /root/hindsight/src/hs_checkpoint_reader.c:55:3: error: comparison is always false due to limited range of data type [-Werror=type-limits]
>>>>>>>>>>>> cc1: all warnings being treated as errors
>>>>>>>>>>>> src/CMakeFiles/hindsight.dir/build.make:100: recipe for target 'src/CMakeFiles/hindsight.dir/hs_checkpoint_reader.c.o' failed
>>>>>>>>>>>> make[2]: *** [src/CMakeFiles/hindsight.dir/hs_checkpoint_reader.c.o] Error 1
>>>>>>>>>>>> CMakeFiles/Makefile2:947: recipe for target 'src/CMakeFiles/hindsight.dir/all' failed
>>>>>>>>>>>> make[1]: *** [src/CMakeFiles/hindsight.dir/all] Error 2
>>>>>>>>>>>> Makefile:146: recipe for target 'all' failed
>>>>>>>>>>>> make: *** [all] Error 2
>>>>>>>>>>>>
>>>>>>>>>>>> Do you know what is going on here? I guess this is an issue
>>>>>>>>>>>> with the arm platform only?
>>>>>>>>>>>>
>>>>>>>>>>>> Cheers,
>>>>>>>>>>>> Bruno
>>>>>>>>>>>>
>>>>>>>>>>>> On 10 June 2015 at 18:41, bruno binet <bruno.bi...@gmail.com>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks a lot for your answers.
>>>>>>>>>>>>>
>>>>>>>>>>>>> And yes, I'm very interested in bootstrapping a first
>>>>>>>>>>>>> prototype of my own data pipeline based on Hindsight so that I
>>>>>>>>>>>>> can compare the performance on a raspberry pi.
>>>>>>>>>>>>> (here is the current state of our Heka-based data pipeline:
>>>>>>>>>>>>> https://bitbucket.org/helioslite/heka-hl-sandboxes)
>>>>>>>>>>>>> So it would be great if you can give me the first instructions
>>>>>>>>>>>>> on how to build and setup Hindsight.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks.
>>>>>>>>>>>>> Bruno
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 10 June 2015 at 18:18, Michael Trinkala
>>>>>>>>>>>>> <mtrink...@mozilla.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> - It is usable and being actively developed with the intent
>>>>>>>>>>>>>>   to move it into production later this year.
>>>>>>>>>>>>>> - We are currently running production data through it for
>>>>>>>>>>>>>>   testing but it is not deployed in an official capacity. It
>>>>>>>>>>>>>>   has been very stable but until a more robust set of tests
>>>>>>>>>>>>>>   has been built out I will not consider it production ready.
>>>>>>>>>>>>>> - Yes, it can decode/encode Heka protobuf format
>>>>>>>>>>>>>> - Yes, the router/message matcher is complete. The only
>>>>>>>>>>>>>>   difference is that it supports Lua string pattern matching
>>>>>>>>>>>>>>   instead of re2 regexp (Heka 'Hostname =~ /^foo/' vs
>>>>>>>>>>>>>>   Hindsight 'Hostname =~ "^foo"')
>>>>>>>>>>>>>> - Yes, but you would need a lua-socket input and output
>>>>>>>>>>>>>>   sandbox (see benchmarks/hsr_run for related examples)
>>>>>>>>>>>>>> - No documentation yet, only examples in the benchmarks
>>>>>>>>>>>>>>   directory. I could have you bootstrapped in about 30 minutes
>>>>>>>>>>>>>>   (and hopefully turn that into a getting started guide) if
>>>>>>>>>>>>>>   you are interested.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Implementation wise the only missing piece is support for
>>>>>>>>>>>>>> dynamically loading plugins. The actual code to accomplish it
>>>>>>>>>>>>>> is very small (just detecting files in the load directory and
>>>>>>>>>>>>>> moving them to the run directory) but ideally it would be
>>>>>>>>>>>>>> fronted by a web server and a GUI with access control and
>>>>>>>>>>>>>> validation (a much larger effort and actually a separate
>>>>>>>>>>>>>> project).
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Trink
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Wed, Jun 10, 2015 at 8:15 AM, bruno binet
>>>>>>>>>>>>>> <bruno.bi...@gmail.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I recently discovered the work pushed into the Hindsight
>>>>>>>>>>>>>>> repository (https://github.com/trink/hindsight) which seems
>>>>>>>>>>>>>>> to be a lightweight alternative to Heka, based on the lua
>>>>>>>>>>>>>>> sandbox.
>>>>>>>>>>>>>>> The Hindsight vs Heka benchmarks are quite impressive.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I'm currently running Heka on the raspberry pi (not so
>>>>>>>>>>>>>>> powerful) device and the load average quickly increases and
>>>>>>>>>>>>>>> exceeds 1 when Heka is ingesting data, so Hindsight could be
>>>>>>>>>>>>>>> a good fit for us if it can perform better than Heka in terms
>>>>>>>>>>>>>>> of CPU cycles.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> What is the current status of Hindsight? Is it just a
>>>>>>>>>>>>>>> temporary experiment or will it be maintained and actually
>>>>>>>>>>>>>>> used in production?
>>>>>>>>>>>>>>> Is it currently usable and stable?
>>>>>>>>>>>>>>> Is Hindsight able to decode and encode Heka protobuf format?
>>>>>>>>>>>>>>> Does Hindsight have a complete router implementation to
>>>>>>>>>>>>>>> dispatch messages to sandboxes like in Heka?
>>>>>>>>>>>>>>> My use case is basically to read raw text data from UDP
>>>>>>>>>>>>>>> socket, parse text data with lua patterns or lpeg, process
>>>>>>>>>>>>>>> data through a few lua sandbox filters, then write output
>>>>>>>>>>>>>>> messages both to a file (protobuf heka format) and a HTTP
>>>>>>>>>>>>>>> server (json format): can this be easily accomplished with
>>>>>>>>>>>>>>> Hindsight?
>>>>>>>>>>>>>>> Is there any documentation somewhere to get started with
>>>>>>>>>>>>>>> Hindsight?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>> Bruno
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>>>>> Heka mailing list
>>>>>>>>>>>>>>> Heka@mozilla.org
>>>>>>>>>>>>>>> https://mail.mozilla.org/listinfo/heka
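
A note on the ARM build failure quoted earlier in this thread: the two diagnostics in hs_checkpoint_reader.c (-Werror=overflow and -Werror=type-limits) are consistent with a value wider than 32 bits being stored in, and compared against, an unsigned type that is 64 bits wide on an x86-64 Linux build host but only 32 bits wide on the raspberry pi. The fix reported in the thread was to change the checkpoint id to an unsigned long long. The Python sketch below only models that arithmetic; it is not Hindsight code, the actual expressions at lines 46 and 55 are not shown in the thread, and the sentinel value and helper names are hypothetical.

```python
def store_unsigned(value, bits):
    """Model C's conversion to an unsigned type of the given width:
    the value is reduced modulo 2**bits."""
    return value & ((1 << bits) - 1)


# HYPOTHETICAL sentinel: the largest 64-bit unsigned value.
SENTINEL = 2**64 - 1


def sentinel_survives(bits):
    """True if a variable of this width can hold SENTINEL unchanged, i.e. a
    later comparison against SENTINEL is not always false."""
    return store_unsigned(SENTINEL, bits) == SENTINEL


if __name__ == "__main__":
    for name, bits in (("32-bit unsigned type", 32),
                       ("64-bit unsigned long long", 64)):
        stored = store_unsigned(SENTINEL, bits)
        print(f"{name}: stored={stored:#x} survives={sentinel_survives(bits)}")
```

With a 32-bit type the stored value is silently truncated and can never equal the 64-bit constant, which matches the shape of both compiler messages; a type that is 64 bits wide on every platform avoids both.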