I may be biased, but I wouldn’t consider Volkan a random GitHub user! However, that does raise an interesting point: we could link to third-party plugins and such to help curate it.
> On Dec 1, 2023, at 5:00 AM, Gary Gregory <garydgreg...@gmail.com> wrote:
>
> Hi all,
>
> (Don't take this the wrong way, Volkan ;-)
> Assuming that I don't know who you are, why would I pick a random GitHub
> user's custom appender instead of an official Log4j appender? If your
> appender is "battle-tested", why not move it to Log4j (or Redis)?
>
> Gary
>
> On Fri, Dec 1, 2023, 4:08 AM Volkan Yazıcı <vol...@yazi.ci> wrote:
>
>> I appreciate your thoughts on this subject. We can eventually convert this
>> into a chapter in the Log4j manual. My goal is to be able to make a
>> statement as follows:
>>
>> *When Log4j is configured with X, Y, Z settings, it can provide guaranteed
>> delivery against certain types of log sinks such as A, B, C.*
>>
>> *A – You need to make sure A has ... feature enabled. Further, it has ...
>> caveat.*
>> *B – You need to make sure B has ... feature enabled and ...*
>> *C – ...*
>>
>> That is, a cookbook for users with recipes for guaranteed delivery.
>>
>> [I respond to your message below inline.]
>>
>> On Thu, Nov 30, 2023 at 9:34 PM Ralph Goers <ralph.go...@dslextreme.com>
>> wrote:
>>
>>> Notice that neither of the links you have provided uses the term
>>> “guaranteed delivery”. That is because that is not really what they are
>>> providing. In addition, notice that Logstash says "Input plugins that do
>>> not use a request-response protocol cannot be protected from data loss”,
>>
>> But see the rest of that statement
>> <https://www.elastic.co/guide/en/logstash/current/persistent-queues.html#persistent-queues-limitations>:
>> *"Plugins such as beats and http, which do have an acknowledgement
>> capability, are well protected by this [Logstash persistent] queue."*
>>
>>> and "Data may be lost if an abnormal shutdown occurs before the checkpoint
>>> file has been committed”.
>>
>> See the following statement further down on that page: *"To avoid losing
>> data in the persistent queue, you can set `queue.checkpoint.writes: 1` to
>> force a checkpoint after each event is written."*
>>
>> These two make me conclude that, if configured correctly (e.g., using the
>> `http` plugin in combination with `queue.checkpoint.writes: 1`), Logstash
>> can provide guaranteed delivery. Am I mistaken?
>>
>>> As for using Google Cloud, that would defeat the whole point. If your data
>>> center has lost contact with the outside world, it won’t be able to get to
>>> Google Cloud.
>>
>> But that cannot be an argument against using Google Cloud as a log sink
>> with guaranteed delivery. An in-house Flume server can go down too. Let me
>> know if I am missing your point here.
>>
>>> While Redis would work, it would require a) an application component that
>>> interacts with Redis, such as a Redis Appender (which we don’t have), b) a
>>> Redis deployment, and c) a Logstash (or some other Redis consumer) to
>>> forward the event. It is a lot simpler to configure Flume than to do all
>>> of that.
>>
>> For one, there is a battle-tested Log4j Redis Appender
>> <https://github.com/vy/log4j2-redis-appender>, which enabled us to remove
>> `log4j-redis` in `main`.
>>
>> Indeed, Flume can deliver what Redis+Logstash do. Though my point is not
>> that Redis has a magical feature set, but that there *are* several log sink
>> stacks one can build using modern stock components that provide guaranteed
>> delivery. I would like to document some of those, if not best practices,
>> then known-to-work solutions.
>> This way we can enable our users to make a well-informed decision and
>> pick the best approach that fits into their existing stack.
>>
>> On Thu, Nov 30, 2023 at 9:34 PM Ralph Goers <ralph.go...@dslextreme.com>
>> wrote:
>>
>>> Volkan,
>>>
>>> Notice that neither of the links you have provided uses the term
>>> “guaranteed delivery”. That is because that is not really what they are
>>> providing. In addition, notice that Logstash says "Input plugins that do
>>> not use a request-response protocol cannot be protected from data loss”,
>>> and "Data may be lost if an abnormal shutdown occurs before the checkpoint
>>> file has been committed”. Note that Flume’s FileChannel does not face the
>>> second issue, while the first would also be a problem if it is using a
>>> source that doesn’t support acknowledgements. However, Log4j’s
>>> FlumeAppender always gets acks.
>>>
>>> To make this clearer, let me review the architecture for my implementation
>>> again.
>>>
>>> First, the phone system maintains a list of IP addresses that can handle
>>> Radius accounting records. We host 2 Flume servers in the same data center
>>> as the phone system and configure the phone system with their IP
>>> addresses. The Radius records will be sent to those Flume servers, which
>>> will accept them with our custom Radius Source. That converts them to JSON
>>> and passes the JSON to the File Channel. Once the File Channel has written
>>> them to disk, the source responds back to the phone system with an ACK
>>> that the record was received. If the record is not processed quickly
>>> enough (I believe it is 100 ms), then the phone system will try a
>>> different IP address, assuming it couldn’t be delivered by the first
>>> server. Another thread reads the records from the File Channel and sends
>>> them to a Flume in a different data center for processing. This follows
>>> the same pattern. The Avro Sink serializes the record and sends it to the
>>> target Flume. That Flume writes the record to a File Channel and the Avro
>>> Source responds with an ACK that the record was received, at which point
>>> the originating Flume will remove it from the File Channel.
>>>
>>> If you will notice, the application itself knows that delivery is
>>> guaranteed because it gets an ACK to say so. Due to this, Filebeat cannot
>>> possibly implement guaranteed delivery. The application will expect that
>>> once it writes to a file or to System.out delivery is guaranteed, which
>>> really cannot be true.
>>>
>>> As for using Google Cloud, that would defeat the whole point. If your data
>>> center has lost contact with the outside world, it won’t be able to get to
>>> Google Cloud.
>>>
>>> While Redis would work, it would require a) an application component that
>>> interacts with Redis, such as a Redis Appender (which we don’t have), b) a
>>> Redis deployment, and c) a Logstash (or some other Redis consumer) to
>>> forward the event. It is a lot simpler to configure Flume than to do all
>>> of that.
>>>
>>> Ralph
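
[For readers following Ralph's description, the first hop of that pipeline maps onto a Flume agent configuration roughly like the sketch below. The FileChannel and Avro Sink properties are standard Flume settings; the Radius source class name, directories, hosts, and ports are hypothetical placeholders, and the receiving data center would run a mirror-image agent.]

    # flume.properties -- originating data center agent (illustrative sketch)
    agent.sources = radius
    agent.channels = fc
    agent.sinks = avro

    # Custom source (hypothetical class name); ACKs the phone system only
    # after the event has been committed to the channel
    agent.sources.radius.type = com.example.flume.RadiusSource
    agent.sources.radius.channels = fc

    # Durable FileChannel: events survive restarts once written to disk
    agent.channels.fc.type = file
    agent.channels.fc.checkpointDir = /var/lib/flume/checkpoint
    agent.channels.fc.dataDirs = /var/lib/flume/data

    # Forward to the Flume in the other data center; the event is removed
    # from the local FileChannel only after the remote Avro Source ACKs
    agent.sinks.avro.type = avro
    agent.sinks.avro.channel = fc
    agent.sinks.avro.hostname = flume.remote-dc.example.com
    agent.sinks.avro.port = 4141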
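[Similarly, the Logstash settings Volkan cites would look roughly like this. `queue.type` and `queue.checkpoint.writes` are documented Logstash options and `http` is a stock, acknowledging input plugin; the port and the Elasticsearch output are placeholders, not a verified configuration.]

    # logstash.yml -- enable the persistent queue and checkpoint every write
    queue.type: persisted
    queue.checkpoint.writes: 1

    # pipeline.conf -- use an input with acknowledgement capability
    input {
      http {
        port => 8080   # placeholder; the client gets a response only after
                       # the event has been accepted into the queue
      }
    }
    output {
      elasticsearch {
        hosts => ["http://localhost:9200"]   # placeholder sink
      }
    }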
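[And the Redis leg of the Redis+Logstash stack might be wired up along these lines, assuming the log4j2-redis-appender plugin linked above is on the classpath; the element and attribute names follow that project's README, so consult it for the authoritative syntax and full option set.]

    <?xml version="1.0" encoding="UTF-8"?>
    <Configuration>
      <Appenders>
        <!-- Pushes serialized events onto a Redis list;
             key, host, and port are placeholders -->
        <RedisAppender name="REDIS" key="app-logs" host="localhost" port="6379">
          <PatternLayout pattern="%m%n"/>
        </RedisAppender>
      </Appenders>
      <Loggers>
        <Root level="INFO">
          <AppenderRef ref="REDIS"/>
        </Root>
      </Loggers>
    </Configuration>

[Logstash's stock `redis` input plugin can then drain that list into Elasticsearch.]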
>>>> On Nov 30, 2023, at 4:32 AM, Volkan Yazıcı <vol...@yazi.ci> wrote:
>>>>
>>>> Ralph, could you elaborate on your response, please? AFAIK, Logstash and
>>>> Filebeat provide guaranteed delivery, if configured correctly. As a
>>>> matter of fact, they have docs (here and here) explaining how to do it –
>>>> actually, there are several ways to do it. What makes you think they
>>>> don't provide guaranteed delivery?
>>>>
>>>> I have implemented two different types of logging pipelines with
>>>> guaranteed delivery:
>>>>
>>>> • Using a Google Cloud BigQuery appender
>>>> • Using a Redis appender (the Redis queue is ingested into Elasticsearch
>>>>   through Logstash)
>>>>
>>>> I want to learn where I can potentially violate the delivery guarantee.
>>>>
>>>> On Thu, Nov 30, 2023 at 5:54 AM Ralph Goers <ralph.go...@dslextreme.com>
>>>> wrote:
>>>>
>>>>> Fluentbit, Fluentd, Logstash, and Filebeat are the main tools used for
>>>>> log forwarding. While they all have some amount of pluggability, none
>>>>> of them are as flexible as Flume. In addition, as I have mentioned
>>>>> before, none of them provide guaranteed delivery, so I would never
>>>>> recommend them for forwarding audit logs.
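
[To close the loop on the application side Ralph mentions, the FlumeAppender is configured along these lines. This is a minimal sketch adapted from the Log4j manual's Flume Appender examples; the hosts, ports, and layout values are placeholders. Listing two agents gives the failover behavior described in Ralph's architecture, and with the Avro type the appender receives an ack from the Flume agent.]

    <?xml version="1.0" encoding="UTF-8"?>
    <Configuration>
      <Appenders>
        <!-- Sends events to Flume over Avro; fails over to the second
             agent if the first does not accept the event -->
        <Flume name="FLUME" type="Avro" compress="true">
          <Agent host="flume1.example.com" port="8800"/>
          <Agent host="flume2.example.com" port="8800"/>
          <RFC5424Layout enterpriseNumber="18060" includeMDC="true" appName="MyApp"/>
        </Flume>
      </Appenders>
      <Loggers>
        <Root level="INFO">
          <AppenderRef ref="FLUME"/>
        </Root>
      </Loggers>
    </Configuration>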