I’d certainly be in favor of updating the project description to be more general. That said, part of Flume’s value proposition is integration with a bunch of components off the shelf and the main ones it ships are Hadoop ecosystem components, so we shouldn’t completely ignore that when describing the project.
Regarding the memory channel perf issues you observed, did you do any profiling? Do you think part of the issue could be Java GC? The memory channel tends to allocate and reclaim a lot of memory in a short period of time.

Mike

> On Apr 28, 2019, at 11:35 AM, Ralph Goers <ralph.go...@dslextreme.com> wrote:
>
> What I am seeing is that people go to the home page and cut the first
> paragraph as a description of Flume. All I am really proposing is that we
> change that to more effectively describe Flume. The description that is there
> is accurate but minimal. I would just like to rephrase that paragraph to give
> a more complete description of what Flume can be used for.
>
> As an aside, I have been working on Log4j, Spring-Cloud-Config and docker. In
> doing that I have done some crude benchmarking which you can see at
> http://rgoers.github.io/log4j2-site/manual/cloud.html#Appender_Performance.
> I was quite surprised by the performance of the Flume Embedded Appender with a
> memory channel. I would have expected it to be more in line with the Async
> Loggers and at the most in line with the Rolling File Appender since the
> event is essentially handed to another thread to be processed. It would be
> nice to see Flume be recommended for use as a log forwarder/aggregator for
> all apps with Docker instead of just when guaranteed delivery is required,
> and I would love to upgrade the Flume documentation to describe how to do
> that.
>
> Ralph
>
>> On Apr 28, 2019, at 9:58 AM, Bessenyei Balázs Donát <bes...@apache.org>
>> wrote:
>>
>> I agree that marketing could be improved and I support finding a
>> slogan that represents best what Flume is today.
>> I am not sure about the wording that has been proposed, though. Can
>> you please elaborate, Ralph?
>>
>> Thank you,
>>
>> Donat
>>
>>> On Sun, Apr 28, 2019 at 6:19 PM Ralph Goers <ralph.go...@dslextreme.com>
>>> wrote:
>>>
>>> When I read sites like
>>> https://www.slant.co/versus/959/960/~fluentd_vs_flume I get a bit
>>> discouraged at how people misunderstand Flume. Even a site like
>>> https://www.predictiveanalyticstoday.com/data-ingestion-tools/ is
>>> misleading by copying our home page by just saying "Flume is a distributed,
>>> reliable, and available service for efficiently collecting, aggregating,
>>> and moving large amounts of log data" and then copying the image. This
>>> leads users to believe that Flume is only useful in a small set of use
>>> cases and is intimately tied to Hadoop.
>>>
>>> I believe the home page should be changed to say that "Flume is a
>>> distributed, reliable, and available service for efficiently collecting,
>>> aggregating, and streaming large amounts of data", and then following up to
>>> indicate that it is appropriate to use to move any kind of streaming data
>>> such as application, audit, or system logs, real time events such as stock
>>> quotes, or user transaction records.
>>>
>>> The second sentence should also be modified to say "It is robust and fault
>>> tolerant with tunable reliability mechanisms that can ensure guaranteed
>>> delivery and many failover and recovery mechanisms".
>>>
>>> I also think the very first image should be modified to not show just a web
>>> application and HDFS as it seems to give people the impression that Flume
>>> is only usable with Hadoop or in web applications. Unfortunately, only the
>>> png seems to have been committed so redoing the diagram will mean starting
>>> from scratch.
>>>
>>> Thoughts?
>>>
>>> Ralph