Of course RE Volkan, my first sentence must have been poorly written. I
meant to say that I know who Volkan is and that he is great, but that's
not the point. I was trying to say that a personal GitHub account, in
general, can't be compared to an official Apache account.

Gary

On Fri, Dec 1, 2023, 12:25 PM Matt Sicker <m...@musigma.org> wrote:

> I may be biased, but I wouldn’t consider Volkan a random GitHub user!
> However, that does raise an interesting point: we could link to
> third-party plugins and such to help curate it.
>
> > On Dec 1, 2023, at 5:00 AM, Gary Gregory <garydgreg...@gmail.com> wrote:
> >
> > Hi all,
> >
> > (Don't take this the wrong way, Volkan ;-)
> > Assuming that I don't know who you are, why would I pick a random
> > GitHub user's custom appender instead of an official Log4j appender?
> > If your appender is "battle-tested", why not move it to Log4j (or
> > Redis?)
> >
> > Gary
> >
> >
> > On Fri, Dec 1, 2023, 4:08 AM Volkan Yazıcı <vol...@yazi.ci> wrote:
> >
> >> I appreciate your thoughts on this subject. We can eventually
> >> convert this into a chapter in the Log4j manual. My goal is to be
> >> able to make a statement as follows:
> >>
> >> *When Log4j is configured with X, Y, Z settings, it can provide
> >> guaranteed delivery against certain types of log sinks such as A,
> >> B, C.*
> >>
> >> *A – You need to make sure A has ... feature enabled. Further, it
> >> has ... caveat.*
> >> *B – You need to make sure B has ... feature enabled and ...*
> >> *C – ...*
> >>
> >>
> >> That is, a cookbook for users with recipes for guaranteed delivery.
> >>
> >> [I respond to your message inline below.]
> >>
> >> On Thu, Nov 30, 2023 at 9:34 PM Ralph Goers
> >> <ralph.go...@dslextreme.com> wrote:
> >>
> >>> Notice that neither of the links you have provided use the term
> >>> “guaranteed delivery”. That is because that is not really what they
> >>> are providing. In addition, notice that Logstash says "Input
> >>> plugins that do not use a request-response protocol cannot be
> >>> protected from data loss”,
> >>
> >>
> >> But see the rest of that statement
> >> <https://www.elastic.co/guide/en/logstash/current/persistent-queues.html#persistent-queues-limitations>:
> >> *"Plugins such as beats and http, which do have an acknowledgement
> >> capability, are well protected by this [Logstash persistent] queue."*
> >>
> >>
> >>> and "Data may be lost if an abnormal shutdown occurs before the
> >>> checkpoint file has been committed”.
> >>
> >>
> >> See the following statement further down on that page: *"To avoid
> >> losing data in the persistent queue, you can set
> >> `queue.checkpoint.writes: 1` to force a checkpoint after each event
> >> is written."*
> >>
> >> These two make me conclude that, if configured correctly (e.g.,
> >> using the `http` plugin in combination with
> >> `queue.checkpoint.writes: 1`), Logstash can provide guaranteed
> >> delivery. Am I mistaken?
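> >>
> >> For illustration, something like the following is what I have in
> >> mind – an untested sketch, where the port and host are made up and
> >> the settings should be double-checked against the Logstash docs:
> >>
> >>   # logstash.yml – durable queue, checkpoint after every event
> >>   queue.type: persisted
> >>   queue.checkpoint.writes: 1
> >>
> >>   # pipeline.conf – a request-response input, so senders get an ACK
> >>   input {
> >>     http {
> >>       port => 8080                             # hypothetical port
> >>     }
> >>   }
> >>   output {
> >>     elasticsearch {
> >>       hosts => ["https://es.example.org:9200"] # hypothetical host
> >>     }
> >>   }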
> >>
> >>
> >>> As for using Google Cloud, that would defeat the whole point. If
> >>> your data center has lost contact with the outside world, it won’t
> >>> be able to get to Google Cloud.
> >>>
> >>
> >> But that cannot be an argument against using Google Cloud as a log
> >> sink with guaranteed delivery. An in-house Flume server can go down
> >> too. Let me know if I am missing your point here.
> >>
> >>
> >>> While Redis would work, it would require a) an application
> >>> component that interacts with Redis, such as a Redis Appender
> >>> (which we don’t have), b) a Redis deployment, and c) a Logstash (or
> >>> some other Redis consumer) to forward the event. It is a lot
> >>> simpler to configure Flume than to do all of that.
> >>>
> >>
> >> For one, there is a battle-tested Log4j Redis Appender
> >> <https://github.com/vy/log4j2-redis-appender>, which enabled us to
> >> remove `log4j-redis` in `main`.
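> >>
> >> For reference, wiring it up looks roughly as follows – a sketch from
> >> memory, so please verify the exact element and attribute names
> >> against the project README; the key, host, and port values are made
> >> up:
> >>
> >>   <Configuration>
> >>     <Appenders>
> >>       <RedisAppender name="REDIS" key="app:logs"
> >>                      host="redis.example.org" port="6379">
> >>         <PatternLayout pattern="%level %msg"/>
> >>       </RedisAppender>
> >>     </Appenders>
> >>     <Loggers>
> >>       <Root level="INFO">
> >>         <AppenderRef ref="REDIS"/>
> >>       </Root>
> >>     </Loggers>
> >>   </Configuration>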
> >>
> >> Indeed, Flume can deliver what Redis+Logstash do. Though my point is
> >> not that Redis has a magical feature set, but that there *are*
> >> several log sink stacks one can build using modern stock components
> >> that provide guaranteed delivery. I would like to document some of
> >> those as, if not best practices, at least known-to-work solutions.
> >> This way we can enable our users to make a well-informed decision
> >> and pick the approach that best fits their existing stack.
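> >>
> >> For instance, the Logstash leg of the Redis stack boils down to a
> >> pipeline along these lines (again a sketch; the host and key are
> >> placeholders, and the key must match what the appender pushes to):
> >>
> >>   input {
> >>     redis {
> >>       host      => "redis.example.org"  # hypothetical host
> >>       data_type => "list"               # consume a Redis list (queue)
> >>       key       => "app:logs"           # must match the appender key
> >>     }
> >>   }
> >>   output {
> >>     elasticsearch {
> >>       hosts => ["https://es.example.org:9200"]
> >>     }
> >>   }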
> >>
> >> On Thu, Nov 30, 2023 at 9:34 PM Ralph Goers
> >> <ralph.go...@dslextreme.com> wrote:
> >>
> >>> Volkan,
> >>>
> >>> Notice that neither of the links you have provided use the term
> >>> “guaranteed delivery”. That is because that is not really what they
> >>> are providing. In addition, notice that Logstash says "Input
> >>> plugins that do not use a request-response protocol cannot be
> >>> protected from data loss”, and "Data may be lost if an abnormal
> >>> shutdown occurs before the checkpoint file has been committed”.
> >>> Note that Flume’s FileChannel does not face the second issue, while
> >>> the first would also be a problem if it is using a source that
> >>> doesn’t support acknowledgements. However, Log4j’s FlumeAppender
> >>> always gets acks.
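> >>>
> >>> To give a rough idea, a FlumeAppender configuration looks something
> >>> like the following – the hosts and ports are made up, and the Log4j
> >>> manual has the full set of attributes:
> >>>
> >>>   <Flume name="FlumeAudit" compress="true">
> >>>     <Agent host="flume1.example.org" port="8800"/>
> >>>     <Agent host="flume2.example.org" port="8800"/>
> >>>   </Flume>
> >>>
> >>> The second Agent is a failover target: if the primary agent cannot
> >>> accept the event, the appender tries the next one in order.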
> >>>
> >>> To make this clearer, let me review the architecture for my
> >>> implementation again.
> >>>
> >>> First, the phone system maintains a list of IP addresses that can
> >>> handle Radius accounting records. We host 2 Flume servers in the
> >>> same data center as the phone system and configure the phone system
> >>> with their IP addresses. The Radius records will be sent to those
> >>> Flume servers, which will accept them with our custom Radius
> >>> Source. That converts them to JSON and passes the JSON to the File
> >>> Channel. Once the File Channel has written them to disk, the source
> >>> responds back to the phone system with an ACK that the record was
> >>> received. If the record is not processed quickly enough (I believe
> >>> it is 100ms), then the phone system will try a different IP
> >>> address, assuming it couldn’t be delivered by the first server.
> >>> Another thread reads the records from the File Channel and sends
> >>> them to a Flume in a different data center for processing. This
> >>> follows the same pattern. The Avro Sink serializes the record and
> >>> sends it to the target Flume. That Flume writes the record to a
> >>> File Channel and the Avro Source responds with an ACK that the
> >>> record was received, at which point the originating Flume will
> >>> remove it from the File Channel.
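> >>>
> >>> In Flume configuration terms, the first hop would look roughly like
> >>> this (a sketch only – the Radius source class name and all paths,
> >>> hosts, and ports are made up):
> >>>
> >>>   a1.sources  = radius
> >>>   a1.channels = fileCh
> >>>   a1.sinks    = avroOut
> >>>
> >>>   # Custom source; it ACKs only after the channel write succeeds
> >>>   a1.sources.radius.type = com.example.flume.RadiusSource
> >>>   a1.sources.radius.channels = fileCh
> >>>
> >>>   # Durable file channel
> >>>   a1.channels.fileCh.type = file
> >>>   a1.channels.fileCh.checkpointDir = /var/flume/checkpoint
> >>>   a1.channels.fileCh.dataDirs = /var/flume/data
> >>>
> >>>   # Forward to the remote data center; events leave the channel
> >>>   # only after the remote Avro source ACKs them
> >>>   a1.sinks.avroOut.type = avro
> >>>   a1.sinks.avroOut.hostname = flume.remote-dc.example.org
> >>>   a1.sinks.avroOut.port = 4545
> >>>   a1.sinks.avroOut.channel = fileCh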
> >>>
> >>> If you will notice, the application itself knows that delivery is
> >>> guaranteed because it gets an ACK to say so. Due to this, Filebeat
> >>> cannot possibly implement guaranteed delivery. The application will
> >>> expect that once it writes to a file or to System.out delivery is
> >>> guaranteed, which really cannot be true.
> >>>
> >>> As for using Google Cloud, that would defeat the whole point. If
> >>> your data center has lost contact with the outside world, it won’t
> >>> be able to get to Google Cloud.
> >>>
> >>> While Redis would work, it would require a) an application
> >>> component that interacts with Redis, such as a Redis Appender
> >>> (which we don’t have), b) a Redis deployment, and c) a Logstash (or
> >>> some other Redis consumer) to forward the event. It is a lot
> >>> simpler to configure Flume than to do all of that.
> >>>
> >>> Ralph
> >>>
> >>>
> >>>> On Nov 30, 2023, at 4:32 AM, Volkan Yazıcı <vol...@yazi.ci> wrote:
> >>>>
> >>>> Ralph, could you elaborate on your response, please? AFAIK,
> >>>> Logstash and Filebeat provide guaranteed delivery, if configured
> >>>> correctly. As a matter of fact, they have docs (here and here)
> >>>> explaining how to do it – actually, there are several ways to do
> >>>> it. What makes you think they don't provide guaranteed delivery?
> >>>>
> >>>> I have implemented two different types of logging pipelines with
> >>>> guaranteed delivery:
> >>>>    • Using a Google Cloud BigQuery appender
> >>>>    • Using a Redis appender (the Redis queue is ingested into
> >>>>      Elasticsearch through Logstash)
> >>>> I want to learn where I can potentially violate the delivery
> >>>> guarantee.
> >>>>
> >>>> On Thu, Nov 30, 2023 at 5:54 AM Ralph Goers
> >>>> <ralph.go...@dslextreme.com> wrote:
> >>>> Fluentbit, Fluentd, Logstash, and Filebeat are the main tools used
> >>>> for log forwarding. While they all have some amount of
> >>>> pluggability, none of them are as flexible as Flume. In addition,
> >>>> as I have mentioned before, none of them provide guaranteed
> >>>> delivery, so I would never recommend them for forwarding audit
> >>>> logs.
> >>>
> >>>
> >>
>
>
