Yes, Mike. I understand that it is shipped with a product that uses it for that 
purpose. To be honest, I have used Flume in 3 different projects so far and 
none of them have integrated with Hadoop. I do have an upcoming project that 
probably will, although Hadoop will probably only be one of the destinations 
the data is delivered to. The others might be a third party SIEM product as 
well as some kind of ELK stack, so even in that case Hadoop wouldn’t be the 
primary “selling” point.

No, I haven’t done profiling yet. At this point my main focus is Log4j. Once I 
get past that I can take a pass at profiling. It is possible the problem is 
in Log4j, but since the embedded Appender just constructs the event and 
passes it to the Flume Embedded Agent, I would be surprised if it were. 
However, while testing I did already find one bug in Log4j that was causing a 
performance hit with Flume, and I have corrected it. 

Ralph

> On Apr 28, 2019, at 11:42 AM, Mike Percy <mpe...@apache.org> wrote:
> 
> I’d certainly be in favor of updating the project description to be more 
> general. That said, part of Flume’s value proposition is integration with a 
> bunch of components off the shelf and the main ones it ships are Hadoop 
> ecosystem components, so we shouldn’t completely ignore that when describing 
> the project.
> 
> Regarding the memory channel perf issues you observed, did you do any 
> profiling? Do you think part of the issue could be Java GC? The memory 
> channel tends to allocate and reclaim a lot of memory in a short period of 
> time.
> 
> Mike
> 
>> On Apr 28, 2019, at 11:35 AM, Ralph Goers <ralph.go...@dslextreme.com> wrote:
>> 
>> What I am seeing is that people go to the home page and copy the first 
>> paragraph as a description of Flume. All I am really proposing is that we 
>> change that to more effectively describe Flume. The description that is 
>> there is accurate but minimal. I would just like to rephrase that paragraph 
>> to give a more complete description of what Flume can be used for.
>> 
>> As an aside, I have been working on Log4j, Spring-Cloud-Config and Docker. 
>> In doing that I have done some crude benchmarking, which you can see at 
>> <http://rgoers.github.io/log4j2-site/manual/cloud.html#Appender_Performance>.
>> I was quite surprised by the performance of the Flume Embedded Appender with 
>> a memory channel. I would have expected it to be more in line with the Async 
>> Loggers, and at worst in line with the Rolling File Appender, since the 
>> event is essentially handed to another thread to be processed. It would be 
>> nice to see Flume recommended as a log forwarder/aggregator for all apps 
>> running in Docker, instead of just when guaranteed delivery is required, and 
>> I would love to upgrade the Flume documentation to describe how to do that.
>> 
>> Ralph
>> 
>>> On Apr 28, 2019, at 9:58 AM, Bessenyei Balázs Donát <bes...@apache.org> 
>>> wrote:
>>> 
>>> I agree that marketing could be improved and I support finding a
>>> slogan that represents best what Flume is today.
>>> I am not sure about the wording that has been proposed, though. Can
>>> you please elaborate, Ralph?
>>> 
>>> 
>>> Thank you,
>>> 
>>> Donat
>>> 
>>>> On Sun, Apr 28, 2019 at 6:19 PM Ralph Goers <ralph.go...@dslextreme.com> 
>>>> wrote:
>>>> 
>>>> When I read sites like 
>>>> <https://www.slant.co/versus/959/960/~fluentd_vs_flume> I get a bit 
>>>> discouraged at how people misunderstand Flume. Even a site like 
>>>> <https://www.predictiveanalyticstoday.com/data-ingestion-tools/> is 
>>>> misleading, because it just copies our home page text, "Flume is a 
>>>> distributed, reliable, and available service for efficiently collecting, 
>>>> aggregating, and moving large amounts of log data", and then copies the 
>>>> image. This leads users to believe that Flume is only useful in a small 
>>>> set of use cases and is intimately tied to Hadoop.
>>>> 
>>>> I believe the home page should be changed to say that "Flume is a 
>>>> distributed, reliable, and available service for efficiently collecting, 
>>>> aggregating, and streaming large amounts of data", followed by a note 
>>>> that it is appropriate for moving any kind of streaming data, such as 
>>>> application, audit, or system logs, real-time events such as stock 
>>>> quotes, or user transaction records.
>>>> 
>>>> The second sentence should also be modified to say "It is robust and fault 
>>>> tolerant with tunable reliability mechanisms that can ensure guaranteed 
>>>> delivery and many failover and recovery mechanisms".
>>>> 
>>>> I also think the very first image should be modified so that it does not 
>>>> show just a web application and HDFS, as it seems to give people the 
>>>> impression that Flume is only usable with Hadoop or in web applications. 
>>>> Unfortunately, only the PNG seems to have been committed, so redoing the 
>>>> diagram will mean starting from scratch.
>>>> 
>>>> Thoughts?
>>>> 
>>>> Ralph
>>> 
>> 
> 
> 

