Re: kafka-streams configuration for high resilience (crashing when broker crashing) ?

2017-10-05 Thread Dmitriy Vsekhvalnov
Ok, we can try that.

Any other settings to try?



Re: kafka-streams configuration for high resilience (crashing when broker crashing) ?

2017-10-05 Thread Stas Chizhov
I would set it to Integer.MAX_VALUE
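
For illustration, a minimal sketch of that suggestion, assuming a
0.11.x-era Kafka Streams API (the app id and bootstrap servers are
placeholders, not values from this thread):

    import java.util.Properties;
    import org.apache.kafka.clients.producer.ProducerConfig;
    import org.apache.kafka.streams.StreamsConfig;

    public class ResilientStreamsConfig {
        public static Properties build() {
            Properties props = new Properties();
            props.put(StreamsConfig.APPLICATION_ID_CONFIG, "my-streams-app");
            props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092");
            // Forwarded to the embedded producers: retry sends effectively
            // forever instead of giving up while a broker is down, so the
            // producer keeps retrying until partition leadership fails over.
            props.put(StreamsConfig.producerPrefix(ProducerConfig.RETRIES_CONFIG),
                      Integer.MAX_VALUE);
            return props;
        }
    }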



Re: kafka-streams configuration for high resilience (crashing when broker crashing) ?

2017-10-05 Thread Dmitriy Vsekhvalnov
I see, but producer.retries is set to 10 by default.

What value would you recommend to survive random broker crashes?



Re: kafka-streams configuration for high resilience (crashing when broker crashing) ?

2017-10-05 Thread Stas Chizhov
It is a producer config:
https://docs.confluent.io/current/streams/developer-guide.html#non-streams-configuration-parameters
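
As a sketch of what the linked page describes, assuming a 0.11.x-era API
(100 is an arbitrary example value): producer properties go straight on
the same Properties object as the Streams settings, plain or prefixed,
and are forwarded to the embedded producer.

    import java.util.Properties;
    import org.apache.kafka.clients.producer.ProducerConfig;
    import org.apache.kafka.streams.StreamsConfig;

    public class ProducerPassThrough {
        public static void main(String[] args) {
            Properties props = new Properties();
            // Plain producer property name: forwarded to the embedded
            // producer, because "retries" is not a Streams-level setting.
            props.put(ProducerConfig.RETRIES_CONFIG, 100);
            // Prefixed form ("producer.retries"): unambiguously scopes
            // the setting to the producer only.
            props.put(StreamsConfig.producerPrefix(ProducerConfig.RETRIES_CONFIG), 100);
            System.out.println(props);
        }
    }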



Re: kafka-streams configuration for high resilience (crashing when broker crashing) ?

2017-10-05 Thread Dmitriy Vsekhvalnov
replication.factor is set to match the source topics (3 in our case).

What do you mean by retries? I don't see a retries property in the
StreamsConfig class.
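
That mismatch is the source of the confusion; a sketch of where each
setting lives, assuming a 0.11.x-era API: replication.factor is a
first-class Streams setting for the internal topics, while retries
exists only on the producer and is merely forwarded, which is why
StreamsConfig has no constant for it.

    import java.util.Properties;
    import org.apache.kafka.streams.StreamsConfig;

    public class ReplicationVsRetries {
        public static void main(String[] args) {
            Properties props = new Properties();
            // replication.factor IS a Streams-level setting: it controls
            // the internal changelog/repartition topics (3 here, matching
            // the source topics in this thread).
            props.put(StreamsConfig.REPLICATION_FACTOR_CONFIG, 3);
            // "retries" has no StreamsConfig constant; it is a plain
            // producer setting that the Streams config only forwards
            // (see the developer-guide link in Stas's reply).
            props.put("retries", 10);
            System.out.println(props);
        }
    }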



Re: kafka-streams configuration for high resilience (crashing when broker crashing) ?

2017-10-05 Thread Stas Chizhov
Hi

Have you set the replication.factor and retries properties?

BR

On Thu, Oct 5, 2017 at 18:45, Dmitriy Vsekhvalnov wrote:

> Hi all,
>
> We were testing Kafka cluster outages by randomly crashing broker nodes (1
> of 3, for instance) while still keeping a majority of replicas available.
>
> From time to time our kafka-streams app crashes with this exception:
>
> [ERROR] [StreamThread-1] [org.apache.kafka.streams.processor.internals.StreamThread] [stream-thread [StreamThread-1] Failed while executing StreamTask 0_1 due to flush state: ]
> org.apache.kafka.streams.errors.StreamsException: task [0_1] exception caught when producing
>     at org.apache.kafka.streams.processor.internals.RecordCollectorImpl.checkForException(RecordCollectorImpl.java:121)
>     at org.apache.kafka.streams.processor.internals.RecordCollectorImpl.flush(RecordCollectorImpl.java:129)
>     at org.apache.kafka.streams.processor.internals.StreamTask.flushState(StreamTask.java:422)
>     at org.apache.kafka.streams.processor.internals.StreamThread$4.apply(StreamThread.java:555)
>     at org.apache.kafka.streams.processor.internals.StreamThread.performOnTasks(StreamThread.java:501)
>     at org.apache.kafka.streams.processor.internals.StreamThread.flushAllState(StreamThread.java:551)
>     at org.apache.kafka.streams.processor.internals.StreamThread.shutdownTasksAndState(StreamThread.java:449)
>     at org.apache.kafka.streams.processor.internals.StreamThread.shutdown(StreamThread.java:391)
>     at org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:372)
> Caused by: org.apache.kafka.common.errors.TimeoutException: Expiring 1 record(s) for audit-metrics-collector-store_context.operation-message.bucketStartMs-message.dc-repartition-0: 30026 ms has passed since batch creation plus linger time
>
> After that, clearly only a restart of the app gets it processing again.
>
> We believe this is correlated with our outage testing. The question is: what
> are the recommended options for making a kafka-streams application more
> resilient to broker crashes?
>
> Thank you.
>