I appreciate your thoughts on this subject. We can eventually convert this
into a chapter in the Log4j manual. My goal is to be able to make a
statement as follows:

*When Log4j is configured with X, Y, Z settings, it can provide guaranteed
delivery against certain types of log sinks such as A, B, C.*

*A – You need to make sure A has ... feature enabled. Further, it has ...
caveat.*
*B – You need to make sure B has ... feature enabled and ...*
*C – ...*


That is, a cookbook for users with recipes for guaranteed delivery.

[I respond to your message below inline.]

On Thu, Nov 30, 2023 at 9:34 PM Ralph Goers <ralph.go...@dslextreme.com>
wrote:

> Notice that neither of the links you have provided use the term
> “guaranteed delivery”. That is because that is not really what they are
> providing. In addition, notice that Logstash says "Input plugins that do
> not use a request-response protocol cannot be protected from data loss”,


But see the rest of that statement
<https://www.elastic.co/guide/en/logstash/current/persistent-queues.html#persistent-queues-limitations>
: *"Plugins such as beats and http, which do have an acknowledgement
capability, are well protected by this [Logstash persistent] queue."*


> and "Data may be lost if an abnormal shutdown occurs before the checkpoint
> file has been committed”.


See the following statement further down in that page: *"To avoid losing
data in the persistent queue, you can set `queue.checkpoint.writes: 1` to
force a checkpoint after each event is written."*

These two make me conclude that, if configured correctly (e.g., using the
`http` plugin in combination with `queue.checkpoint.writes: 1`), Logstash
can provide guaranteed delivery. Am I mistaken?
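For concreteness, here is the kind of setup I have in mind – a sketch only,
with made-up ports and hosts, not a tested pipeline:

```yaml
# logstash.yml – enable the persistent queue and checkpoint after every
# written event (trades throughput for durability)
queue.type: persisted
queue.checkpoint.writes: 1
```

```
# pipeline.conf – use an input with a request-response protocol (http),
# so the sender receives a response only after the event is enqueued
input {
  http {
    port => 8080
  }
}
output {
  elasticsearch {
    hosts => ["http://localhost:9200"]
  }
}
```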


> As for using Google Cloud that would defeat the whole point. If your data
> center has lost contact with the outside world it won’t be able to get to
> Google Cloud.
>

But that cannot be an argument against using Google Cloud as a log sink
with guaranteed delivery. An in-house Flume server can go down too. Let me
know if I am missing your point here.


> While Redis would work it would require a) an application component that
> interacts with Redis such as a Redis Appender (which we don’t have) b) a
> Redis deployment c) a Logstash (or some other Redis consumer) to forward
> the event. It is a lot simpler to configure Flume than to do all of that.
>

For one, there is a battle-tested Log4j Redis Appender
<https://github.com/vy/log4j2-redis-appender>, which enabled us to remove
`log4j-redis` in `main`.
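For illustration, wiring it up looks roughly like this – a sketch based on
that plugin's README, with element and attribute names recalled from memory,
so please verify against its docs:

```xml
<!-- log4j2.xml – events are appended to a Redis list, which Logstash
     (or any other consumer) can drain into Elasticsearch -->
<Configuration>
  <Appenders>
    <RedisAppender name="REDIS" key="app-logs" host="localhost" port="6379">
      <!-- any layout works; JSON eases downstream parsing -->
      <JsonTemplateLayout/>
    </RedisAppender>
  </Appenders>
  <Loggers>
    <Root level="INFO">
      <AppenderRef ref="REDIS"/>
    </Root>
  </Loggers>
</Configuration>
```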

Indeed, Flume can deliver what Redis+Logstash does. My point is not that
Redis has a magical feature set, but that there *are* several log sink
stacks one can build from modern stock components that provide guaranteed
delivery. I would like to document some of those – if not best practices,
then known-to-work solutions. This way we can enable our users to make a
well-informed decision and pick the approach that best fits their existing
stack.
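For comparison, the Flume pipeline you describe below boils down to an agent
definition along these lines – again a sketch with made-up hosts and paths,
using the stock `avro` source in place of your custom Radius source:

```properties
# flume.properties – durable hop: the source ACKs only after the file
# channel has persisted the event to disk
agent.sources = in
agent.channels = ch
agent.sinks = out

# Acknowledging Avro source
agent.sources.in.type = avro
agent.sources.in.bind = 0.0.0.0
agent.sources.in.port = 4141
agent.sources.in.channels = ch

# File channel: events survive crashes and restarts
agent.channels.ch.type = file
agent.channels.ch.checkpointDir = /var/lib/flume/checkpoint
agent.channels.ch.dataDirs = /var/lib/flume/data

# Avro sink: removes an event from the channel only after the next hop ACKs
agent.sinks.out.type = avro
agent.sinks.out.hostname = flume.other-dc.example.com
agent.sinks.out.port = 4142
agent.sinks.out.channel = ch
```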

On Thu, Nov 30, 2023 at 9:34 PM Ralph Goers <ralph.go...@dslextreme.com>
wrote:

> Volkan,
>
> Notice that neither of the links you have provided use the term
> “guaranteed delivery”. That is because that is not really what they are
> providing. In addition, notice that Logstash says "Input plugins that do
> not use a request-response protocol cannot be protected from data loss”,
> and "Data may be lost if an abnormal shutdown occurs before the checkpoint
> file has been committed”. Note that Flume’s FileChannel does not face the
> second issue while the first would also be a problem if it is using a
> source that doesn’t support acknowledgements. However, Log4j’s FlumeAppender
> always gets acks.
>
> To make this clearer let me review the architecture for my implementation
> again.
>
> First the phone system maintains a list of ip addresses that can handle
> Radius accounting records. We host 2 Flume servers in the same data center
> as the phone system and configure the phone system with their ip addresses.
> The Radius records will be sent to those Flume servers which will accept
> them with our custom Radius Source. That converts them to JSON and passes
> the JSON to the File Channel. Once the File Channel has written them to
> disk the source responds back to the phone system with an ACK that the
> record was received. If the record is not processed quickly enough (I
> believe it is 100ms) then the phone system will try a different ip address
> assuming it couldn’t be delivered by the first server. Another thread
> reads the records from the File Channel and sends them to a Flume in a
> different data center for processing. This follows the same pattern. The
> Avro Sink serializes the record and sends it to the target Flume. That
> Flume writes the record to a File channel and the Avro Source responds with
> an ACK that the record was received, at which point the originating Flume
> will remove it from the File Channel.
>
> If you will notice, the application itself knows that delivery is
> guaranteed because it gets an ACK to say so. Due to this, Filebeat cannot
> possibly implement guaranteed delivery. The application will expect that
> once it writes to a file or to System.out delivery is guaranteed, which
> really cannot be true.
>
> As for using Google Cloud that would defeat the whole point. If your data
> center has lost contact with the outside world it won’t be able to get to
> Google Cloud.
>
> While Redis would work it would require a) an application component that
> interacts with Redis such as a Redis Appender (which we don’t have) b) a
> Redis deployment c) a Logstash (or some other Redis consumer) to forward
> the event. It is a lot simpler to configure Flume than to do all of that.
>
> Ralph
>
>
> > On Nov 30, 2023, at 4:32 AM, Volkan Yazıcı <vol...@yazi.ci> wrote:
> >
> > Ralph, could you elaborate on your response, please? AFAIK, Logstash and
> > Filebeat provide guaranteed delivery, if configured correctly. As a matter
> > of fact, they have docs (here and here) explaining how to do it – actually,
> > there are several ways to do it. What makes you think they don't provide
> > guaranteed delivery?
> >
> > I have implemented two different types of logging pipelines with
> > guaranteed delivery:
> >     • Using a Google Cloud BigQuery appender
> >     • Using a Redis appender (Redis queue is ingested into Elasticsearch
> >       through Logstash)
> > I want to learn where I can potentially violate the delivery guarantee.
> >
> > On Thu, Nov 30, 2023 at 5:54 AM Ralph Goers <ralph.go...@dslextreme.com>
> > wrote:
> > > Fluentbit, Fluentd, Logstash, and Filebeat are the main tools used for
> > > log forwarding. While they all have some amount of pluggability, none of
> > > them are as flexible as Flume. In addition, as I have mentioned before,
> > > none of them provide guaranteed delivery, so I would never recommend
> > > them for forwarding audit logs.
>
>
