Yes, Mike. I understand that it is shipped with a product that uses it for that purpose. To be honest, I have used Flume in 3 different projects so far and none of them have integrated with Hadoop. I do have an upcoming project that probably will, although Hadoop will probably only be one of the destinations the data is delivered to. The others might be a third-party SIEM product as well as some kind of ELK stack, so even in that case Hadoop wouldn’t be the primary “selling” point.
No, I haven’t done profiling yet. At this point my main focus is Log4j. Once I get past that I can take a pass at profiling. It is possible the problem might be in Log4j, but since the embedded Appender just constructs the event and passes it to the Flume Embedded Agent I would be surprised if it is in Log4j. However, while testing I did find one bug already in Log4j that was causing a performance hit with Flume and have corrected that.

Ralph

> On Apr 28, 2019, at 11:42 AM, Mike Percy <mpe...@apache.org> wrote:
>
> I’d certainly be in favor of updating the project description to be more
> general. That said, part of Flume’s value proposition is integration with a
> bunch of components off the shelf and the main ones it ships are Hadoop
> ecosystem components, so we shouldn’t completely ignore that when describing
> the project.
>
> Regarding the memory channel perf issues you observed, did you do any
> profiling? Do you think part of the issue could be Java GC? The memory
> channel tends to allocate and reclaim a lot of memory in a short period of
> time.
>
> Mike
>
> Sent from my iPhone
>
>> On Apr 28, 2019, at 11:35 AM, Ralph Goers <ralph.go...@dslextreme.com> wrote:
>>
>> What I am seeing is that people go to the home page and cut the first
>> paragraph as a description of Flume. All I am really proposing is that we
>> change that to more effectively describe Flume. The description that is
>> there is accurate but minimal. I would just like to rephrase that paragraph
>> to give a more complete description of what Flume can be used for.
>>
>> As an aside, I have been working on Log4j, Spring-Cloud-Config and docker.
>> In doing that I have done some crude benchmarking which you can see at
>> http://rgoers.github.io/log4j2-site/manual/cloud.html#Appender_Performance.
>> I was quite surprised by the performance of the Flume Embedded Appender with a
>> memory channel.
>> I would have expected it to be more in line with the Async
>> Loggers and at the most in line with the Rolling File Appender since the
>> event is essentially handed to another thread to be processed. It would be
>> nice to see Flume be able to be recommended for use as a log
>> forwarder/aggregator for all apps with Docker instead of just when
>> guaranteed delivery is required and I would love to upgrade the Flume
>> documentation to describe how to do that.
>>
>> Ralph
>>
>>> On Apr 28, 2019, at 9:58 AM, Bessenyei Balázs Donát <bes...@apache.org>
>>> wrote:
>>>
>>> I agree that marketing could be improved and I support finding a
>>> slogan that represents best what Flume is today.
>>> I am not sure about the wording that has been proposed, though. Can
>>> you please elaborate, Ralph?
>>>
>>>
>>> Thank you,
>>>
>>> Donat
>>>
>>>> On Sun, Apr 28, 2019 at 6:19 PM Ralph Goers <ralph.go...@dslextreme.com>
>>>> wrote:
>>>>
>>>> When I read sites like
>>>> https://www.slant.co/versus/959/960/~fluentd_vs_flume I get a bit
>>>> discouraged at how people misunderstand Flume. Even a site like
>>>> https://www.predictiveanalyticstoday.com/data-ingestion-tools/ is
>>>> misleading by copying our home page by just saying "Flume is a
>>>> distributed, reliable, and available service for efficiently collecting,
>>>> aggregating, and moving large amounts of log data" and then copying the
>>>> image. This leads users to believe that Flume is only useful in a small
>>>> set of use cases and is intimately tied to Hadoop.
>>>>
>>>> I believe the home page should be changed to say that "Flume is a
>>>> distributed, reliable, and available service for efficiently collecting,
>>>> aggregating, and streaming large amounts of data", and then following up
>>>> to indicate that it is appropriate to use to move any kind of streaming
>>>> data such as application, audit, or system logs, real time events such as
>>>> stock quotes, or user transaction records.
>>>>
>>>> The second sentence should also be modified to say "It is robust and fault
>>>> tolerant with tunable reliability mechanisms that can ensure guaranteed
>>>> delivery and many failover and recovery mechanisms".
>>>>
>>>> I also think the very first image should be modified to not show just a
>>>> web application and HDFS as it seems to give people the impression that
>>>> Flume is only usable with Hadoop or in web applications. Unfortunately,
>>>> only the png seems to have been committed so redoing the diagram will mean
>>>> starting from scratch.
>>>>
>>>> Thoughts?
>>>>
>>>> Ralph