I may be biased, but I wouldn’t consider Volkan a random GitHub user! However, that does raise an interesting point: we could link to third-party plugins and such to help curate it.
> On Dec 1, 2023, at 5:00 AM, Gary Gregory <garydgreg...@gmail.com> wrote:
>
> Hi all,
>
> (Don't take this the wrong way, Volkan ;-)
> Assuming that I don't know who you are, why would I pick a random GitHub
> user's custom appender instead of an official Log4j appender? If your
> appender is "battle-tested", why not move it to Log4j (or Redis)?
>
> Gary
>
> On Fri, Dec 1, 2023, 4:08 AM Volkan Yazıcı <vol...@yazi.ci> wrote:
>
>> I appreciate your thoughts on this subject. We can eventually convert this
>> into a chapter in the Log4j manual. My goal is to be able to make a
>> statement as follows:
>>
>> *When Log4j is configured with X, Y, Z settings, it can provide guaranteed
>> delivery against certain types of log sinks such as A, B, C.*
>>
>> *A – You need to make sure A has ... feature enabled. Further, it has ...
>> caveat.*
>> *B – You need to make sure B has ... feature enabled and ...*
>> *C – ...*
>>
>> That is, a cookbook for users with recipes for guaranteed delivery.
>>
>> [I respond to your message below inline.]
>>
>> On Thu, Nov 30, 2023 at 9:34 PM Ralph Goers <ralph.go...@dslextreme.com>
>> wrote:
>>
>>> Notice that neither of the links you have provided uses the term
>>> “guaranteed delivery”. That is because that is not really what they are
>>> providing. In addition, notice that Logstash says "Input plugins that do
>>> not use a request-response protocol cannot be protected from data loss”,
>>
>> But see the rest of that statement
>> <https://www.elastic.co/guide/en/logstash/current/persistent-queues.html#persistent-queues-limitations>:
>> *"Plugins such as beats and http, which do have an acknowledgement
>> capability, are well protected by this [Logstash persistent] queue."*
>>
>>> and "Data may be lost if an abnormal shutdown occurs before the checkpoint
>>> file has been committed”.
>>
>> See the following statement further down on that page: *"To avoid losing
>> data in the persistent queue, you can set `queue.checkpoint.writes: 1` to
>> force a checkpoint after each event is written."*
>>
>> These two make me conclude that, if configured correctly (e.g., using the
>> `http` plugin in combination with `queue.checkpoint.writes: 1`), Logstash
>> can provide guaranteed delivery. Am I mistaken?
>>
>>> As for using Google Cloud, that would defeat the whole point. If your data
>>> center has lost contact with the outside world, it won’t be able to get to
>>> Google Cloud.
>>
>> But that cannot be an argument against using Google Cloud as a log sink
>> with guaranteed delivery. An in-house Flume server can go down too. Let me
>> know if I am missing your point here.
>>
>>> While Redis would work, it would require a) an application component that
>>> interacts with Redis, such as a Redis Appender (which we don’t have), b) a
>>> Redis deployment, and c) a Logstash (or some other Redis consumer) to
>>> forward the event. It is a lot simpler to configure Flume than to do all
>>> of that.
>>
>> For one, there is a battle-tested Log4j Redis Appender
>> <https://github.com/vy/log4j2-redis-appender>, which enabled us to remove
>> `log4j-redis` in `main`.
>>
>> Indeed, Flume can deliver what Redis+Logstash do. Though my point is not
>> that Redis has a magical feature set, but that there *are* several log sink
>> stacks one can build using modern stock components that provide guaranteed
>> delivery. I would like to document some of those, if not best practices,
>> then known-to-work solutions.
>> This way we can enable our users to make a well-informed decision and
>> pick the best approach that fits into their existing stack.
>>
>> On Thu, Nov 30, 2023 at 9:34 PM Ralph Goers <ralph.go...@dslextreme.com>
>> wrote:
>>
>>> Volkan,
>>>
>>> Notice that neither of the links you have provided uses the term
>>> “guaranteed delivery”. That is because that is not really what they are
>>> providing. In addition, notice that Logstash says "Input plugins that do
>>> not use a request-response protocol cannot be protected from data loss”,
>>> and "Data may be lost if an abnormal shutdown occurs before the checkpoint
>>> file has been committed”. Note that Flume’s FileChannel does not face the
>>> second issue, while the first would also be a problem if it is using a
>>> source that doesn’t support acknowledgements. However, Log4j’s
>>> FlumeAppender always gets acks.
>>>
>>> To make this clearer, let me review the architecture for my implementation
>>> again.
>>>
>>> First, the phone system maintains a list of IP addresses that can handle
>>> Radius accounting records. We host 2 Flume servers in the same data center
>>> as the phone system and configure the phone system with their IP
>>> addresses. The Radius records will be sent to those Flume servers, which
>>> will accept them with our custom Radius Source. That converts them to JSON
>>> and passes the JSON to the File Channel. Once the File Channel has written
>>> them to disk, the source responds back to the phone system with an ACK
>>> that the record was received. If the record is not processed quickly
>>> enough (I believe it is 100 ms), then the phone system will try a
>>> different IP address, assuming it couldn’t be delivered by the first
>>> server. Another thread reads the records from the File Channel and sends
>>> them to a Flume in a different data center for processing. This follows
>>> the same pattern. The Avro Sink serializes the record and sends it to the
>>> target Flume. That Flume writes the record to a File Channel and the Avro
>>> Source responds with an ACK that the record was received, at which point
>>> the originating Flume will remove it from the File Channel.
>>>
>>> If you will notice, the application itself knows that delivery is
>>> guaranteed because it gets an ACK to say so. Due to this, Filebeat cannot
>>> possibly implement guaranteed delivery. The application will expect that
>>> once it writes to a file or to System.out delivery is guaranteed, which
>>> really cannot be true.
>>>
>>> As for using Google Cloud, that would defeat the whole point. If your data
>>> center has lost contact with the outside world, it won’t be able to get to
>>> Google Cloud.
>>>
>>> While Redis would work, it would require a) an application component that
>>> interacts with Redis, such as a Redis Appender (which we don’t have), b) a
>>> Redis deployment, and c) a Logstash (or some other Redis consumer) to
>>> forward the event. It is a lot simpler to configure Flume than to do all
>>> of that.
>>>
>>> Ralph
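
[For readers following Ralph's description, the first hop of that pipeline maps onto a Flume agent configuration roughly like the sketch below. The FileChannel and Avro Sink properties are standard Flume settings; the Radius source class name, directories, hosts, and ports are hypothetical placeholders, and the receiving data center would run a mirror-image agent.]

    # flume.properties -- originating data center agent (illustrative sketch)
    agent.sources = radius
    agent.channels = fc
    agent.sinks = avro

    # Custom source (hypothetical class name); ACKs the phone system only
    # after the event has been committed to the channel
    agent.sources.radius.type = com.example.flume.RadiusSource
    agent.sources.radius.channels = fc

    # Durable FileChannel: events survive restarts once written to disk
    agent.channels.fc.type = file
    agent.channels.fc.checkpointDir = /var/lib/flume/checkpoint
    agent.channels.fc.dataDirs = /var/lib/flume/data

    # Forward to the Flume in the other data center; the event is removed
    # from the local FileChannel only after the remote Avro Source ACKs
    agent.sinks.avro.type = avro
    agent.sinks.avro.channel = fc
    agent.sinks.avro.hostname = flume.remote-dc.example.com
    agent.sinks.avro.port = 4141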
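[Similarly, the Logstash settings Volkan cites would look roughly like this. `queue.type` and `queue.checkpoint.writes` are documented Logstash options and `http` is a stock, acknowledging input plugin; the port and the Elasticsearch output are placeholders, not a verified configuration.]

    # logstash.yml -- enable the persistent queue and checkpoint every write
    queue.type: persisted
    queue.checkpoint.writes: 1

    # pipeline.conf -- use an input with acknowledgement capability
    input {
      http {
        port => 8080   # placeholder; the client gets a response only after
                       # the event has been accepted into the queue
      }
    }
    output {
      elasticsearch {
        hosts => ["http://localhost:9200"]   # placeholder sink
      }
    }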
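[And the Redis leg of the Redis+Logstash stack might be wired up along these lines, assuming the log4j2-redis-appender plugin linked above is on the classpath; the element and attribute names follow that project's README, so consult it for the authoritative syntax and full option set.]

    <?xml version="1.0" encoding="UTF-8"?>
    <Configuration>
      <Appenders>
        <!-- Pushes serialized events onto a Redis list;
             key, host, and port are placeholders -->
        <RedisAppender name="REDIS" key="app-logs" host="localhost" port="6379">
          <PatternLayout pattern="%m%n"/>
        </RedisAppender>
      </Appenders>
      <Loggers>
        <Root level="INFO">
          <AppenderRef ref="REDIS"/>
        </Root>
      </Loggers>
    </Configuration>

[Logstash's stock `redis` input plugin can then drain that list into Elasticsearch.]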
>>>> On Nov 30, 2023, at 4:32 AM, Volkan Yazıcı <vol...@yazi.ci> wrote:
>>>>
>>>> Ralph, could you elaborate on your response, please? AFAIK, Logstash and
>>>> Filebeat provide guaranteed delivery, if configured correctly. As a
>>>> matter of fact, they have docs (here and here) explaining how to do it –
>>>> actually, there are several ways to do it. What makes you think they
>>>> don't provide guaranteed delivery?
>>>>
>>>> I have implemented two different types of logging pipelines with
>>>> guaranteed delivery:
>>>>
>>>> • Using a Google Cloud BigQuery appender
>>>> • Using a Redis appender (the Redis queue is ingested into Elasticsearch
>>>>   through Logstash)
>>>>
>>>> I want to learn where I can potentially violate the delivery guarantee.
>>>>
>>>> On Thu, Nov 30, 2023 at 5:54 AM Ralph Goers <ralph.go...@dslextreme.com>
>>>> wrote:
>>>>
>>>>> Fluentbit, Fluentd, Logstash, and Filebeat are the main tools used for
>>>>> log forwarding. While they all have some amount of pluggability, none
>>>>> of them are as flexible as Flume. In addition, as I have mentioned
>>>>> before, none of them provide guaranteed delivery, so I would never
>>>>> recommend them for forwarding audit logs.
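
[To close the loop on the application side Ralph mentions, the FlumeAppender is configured along these lines. This is a minimal sketch adapted from the Log4j manual's Flume Appender examples; the hosts, ports, and layout values are placeholders. Listing two agents gives the failover behavior described in Ralph's architecture, and with the Avro type the appender receives an ack from the Flume agent.]

    <?xml version="1.0" encoding="UTF-8"?>
    <Configuration>
      <Appenders>
        <!-- Sends events to Flume over Avro; fails over to the second
             agent if the first does not accept the event -->
        <Flume name="FLUME" type="Avro" compress="true">
          <Agent host="flume1.example.com" port="8800"/>
          <Agent host="flume2.example.com" port="8800"/>
          <RFC5424Layout enterpriseNumber="18060" includeMDC="true" appName="MyApp"/>
        </Flume>
      </Appenders>
      <Loggers>
        <Root level="INFO">
          <AppenderRef ref="FLUME"/>
        </Root>
      </Loggers>
    </Configuration>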