Re: does kafka streams guarantee EOS with external producer used in application?

2021-07-15 Thread Matthias J. Sax
The app cannot know. That is why you need to use sync writes. Kafka
Streams won't commit offsets as long as custom code is executing, thus, if
you call `producer.send()`, wait for the ack, and block within
`Processor.process()`, you can be sure that no commit happens in between,
i.e., you can be sure that your write has succeeded before the next
commit may happen.
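
As a minimal sketch of that pattern (not code from this thread; the topic
name, types, and the way the external producer is handed to the processor
are placeholders, and it uses the newer `Processor` API -- the older
`org.apache.kafka.streams.processor.Processor` works the same way):

import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.streams.processor.api.Processor;
import org.apache.kafka.streams.processor.api.Record;

public class SyncForwardingProcessor implements Processor<String, String, Void, Void> {

    private final Producer<String, String> producer; // external producer created via the producer API
    private final String destinationTopic;           // e.g. "output-topic" (placeholder)

    public SyncForwardingProcessor(Producer<String, String> producer, String destinationTopic) {
        this.producer = producer;
        this.destinationTopic = destinationTopic;
    }

    @Override
    public void process(Record<String, String> record) {
        try {
            // Block until the broker acks the write. Streams cannot commit the
            // input offset while process() is still running, so the offset is
            // only committed after this write has succeeded.
            producer.send(new ProducerRecord<>(destinationTopic, record.key(), record.value())).get();
        } catch (Exception e) {
            // Re-throw so the failure is not swallowed and the offset is not committed past it.
            throw new RuntimeException("Synchronous write failed", e);
        }
    }
}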

Yes, there can be multiple sink nodes.
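
For the conditional routing you describe below, a rough sketch using the
DSL's `to()` with a `TopicNameExtractor` (the topic names and the routing
condition are made up for illustration; alternatively you can branch the
stream and call `to()`/`addSink()` once per branch, which gives you
multiple sink nodes in the topology):

import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.Topology;
import org.apache.kafka.streams.kstream.KStream;

public class RoutingTopologySketch {
    static Topology build() {
        StreamsBuilder builder = new StreamsBuilder();
        // Placeholder source topic; uses the default serdes.
        KStream<String, String> processed = builder.stream("source-topic");

        // The target topic is chosen per record via a TopicNameExtractor lambda.
        processed.to((key, value, recordContext) ->
                value.contains("fieldA") ? "topic-A" : "topic-B"); // placeholder condition/topics

        return builder.build();
    }
}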


-Matthias


On 7/14/21 10:09 PM, Pushkar Deole wrote:
> That's exactly what my question was: since there is an external producer in
> the application without a sink node in the topology, how will Streams know
> that the task is completed before committing the offset, or will it not
> know at all?
> 
> Second question: can there be multiple sink nodes if the record is to be
> produced to different topics based on some condition (e.g., as a simplified
> example: if the record contains field A then produce to topic A, if it
> contains field B then produce to topic B, etc.)?
> 
> On Thu, Jul 15, 2021 at 4:29 AM Matthias J. Sax  wrote:
> 
>> Yes, if you use async writes, it could lead to data loss in case of
>> failure, as offsets could have been committed before the write succeeded.
>> Only sync writes guard you from data loss.
>>
>> Note though that in Kafka Streams there is no manual commit anyway.
>> Commits happen based on the `commit.interval.ms` config. Calling
>> `context.commit()` only schedules an earlier commit; after the call
>> returns, no commit has happened yet (just a request to commit as soon
>> as possible has been registered).
>>
>>
>> -Matthias
>>
>> On 7/14/21 12:00 AM, Pushkar Deole wrote:
>>> If async writes are used with either manual or auto commit, where the
>>> record is sent for an async write and the consumer thread commits the
>>> offset immediately, but the async write then fails,
>>>
>>> could this cause the loss of the event?
>>>
>>> On Wed, Jul 14, 2021 at 12:20 AM Matthias J. Sax 
>> wrote:
>>>
 If you want to use EOS, you cannot use your own producer, but you need
 to use a "sink node" (via `addSink()` or `to()`).

 For at-least-once, if you use sync-writes, you should still get
 at-least-once guarantees.

 -Matthias

 On 7/9/21 4:12 AM, Pushkar Deole wrote:
> Matthias,
>
> Do you have any inputs on above queries?
>
> On Wed, Jun 30, 2021 at 7:15 PM Pushkar Deole 
 wrote:
>
>> Hi,
>>
>> Our application uses Kafka Streams to read from a source topic, do some
>> processing on the records, and produce the processed records to a
>> destination topic through the use of an external producer, i.e., a
>> producer created via the Kafka producer API.
>>
>> Does this model still guarantee exactly-once semantics, or does it not?
>>
>> Currently we are using at_least_once; however, the question is how
>> Streams handles offset commits here.
>> Though the event is produced using the synchronous API, could there be a
>> possibility of event loss in case Streams commits the offset before the
>> external producer has produced the event to the destination topic?
>>
>

>>>
>>
> 


Re: Confluent's parallel consumer

2021-07-15 Thread Israel Ekpo
Hi Pushkar,

Based on what I understand about the library, I don't think you need to
worry about data loss because there are mechanisms in place to track which
offsets have been processed in the event something goes wrong during
processing.

If the processing client goes offline or is unresponsive and has to be
resumed or restarted, it should restart from where it left off since the
offsets are being tracked.

https://www.confluent.io/events/kafka-summit-europe-2021/introducing-confluent-labs-parallel-consumer-client/

This should be true regardless of the ordering strategy you select.

Take a look at the talk from the most recent Kafka Summit for additional
details on this.

The GitHub documentation for the project also covers this briefly:

https://github.com/confluentinc/parallel-consumer/blob/master/README.adoc#commit-mode
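
To make that concrete, here is a minimal sketch based on my reading of the
README: it uses key-level ordering with the periodic synchronous commit
mode, so offsets are only committed up to records that have actually
finished processing. The class, enum and method names are taken from the
project's documentation but may differ between library versions, and the
broker address, group id and topic are placeholders -- treat this as an
illustration, not a reference.

import io.confluent.parallelconsumer.ParallelConsumerOptions;
import io.confluent.parallelconsumer.ParallelConsumerOptions.CommitMode;
import io.confluent.parallelconsumer.ParallelConsumerOptions.ProcessingOrder;
import io.confluent.parallelconsumer.ParallelStreamProcessor;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

import java.util.Collections;
import java.util.Properties;

public class KeyOrderedConsumerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "parallel-consumer-demo");  // placeholder
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
        props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false"); // the library manages commits

        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);

        ParallelConsumerOptions<String, String> options = ParallelConsumerOptions.<String, String>builder()
                .consumer(consumer)
                .ordering(ProcessingOrder.KEY)                 // key-level ordering
                .commitMode(CommitMode.PERIODIC_CONSUMER_SYNC) // synchronous periodic commits
                .build();

        ParallelStreamProcessor<String, String> processor =
                ParallelStreamProcessor.createEosStreamProcessor(options);
        processor.subscribe(Collections.singletonList("input-topic")); // placeholder topic

        // The callback runs concurrently per key; the library tracks per-record
        // completion and only commits offsets that are safe to commit.
        processor.poll(record -> System.out.println("processed key " + record.key()));
    }
}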

I hope this was helpful. If you still have additional questions, please
continue to bring them up.

Thanks.

On Thu, Jul 15, 2021 at 11:21 AM Pushkar Deole  wrote:

> It is consumer client node that has received events and is processing
> those...
>
> On Thu, Jul 15, 2021 at 8:49 PM Israel Ekpo  wrote:
>
> > Hi Pushkar,
> >
> > When you use the term “node/instance” are you referring to the Kafka
> > Brokers or the consuming clients that are retrieving events from the
> > broker?
> >
> > Please could you elaborate/clarify?
> >
> > On Thu, Jul 15, 2021 at 10:00 AM Pushkar Deole 
> > wrote:
> >
> > > Well... with key-level ordering, i am mainly concerned about event
> loss,
> > if
> > > any, in below mentioned scenario:
> > >
> > > 1. since event1 with key1 and event2 with key2 are both part of the
> same
> > > partition1
> > > 2. key1 event has offset 30 while key2 has offset 40
> > > 3. key2 is processed by background thread and offset committed which is
> > 40
> > > 4. before key1 gets processed by background thread, the instance/node
> > goes
> > > down
> > > 5. partition1 gets rebalanced to node2 and start processing ahead of
> > offset
> > > 40, thus losing key1
> > >
> > >
> > >
> > > On Thu, Jul 15, 2021 at 7:18 PM Israel Ekpo 
> > wrote:
> > >
> > > > Hi Pushkar,
> > > >
> > > > If you are selecting key-based ordering, you should not be concerned
> > > about
> > > > the other keys from the same partitions being committed first
> > > >
> > > > If that is a concern for your use cases then you should go with
> > partition
> > > > based ordering to ensure that the events are processed in the
> sequence
> > > they
> > > > are picked up from the topic partition.
> > > >
> > > > For commit mode, you have the asynchronous, synchronous and
> > transactional
> > > > modes. I think if you are concerned with the order of commits you
> > should
> > > > look into the last two modes.
> > > >
> > > > My recommendation would be to go with the partition based ordering
> with
> > > > synchronous commits to start with.
> > > >
> > > >
> > > >
> > > > On Thu, Jul 15, 2021 at 7:36 AM Pushkar Deole 
> > > > wrote:
> > > >
> > > > > Hi All, and Antony (author of below article)
> > > > >
> > > > > i came across this article which seemed interesting: Introducing
> > > > > Confluent’s Parallel Consumer Message Processing Client
> > > > > <
> > > > >
> > > >
> > >
> >
> https://www.confluent.io/blog/introducing-confluent-parallel-message-processing-client/
> > > > > >
> > > > >
> > > > > I would like to use the key-level ordering strategy mentioned in
> the
> > > > > article to scale my consumers, however I would like to check how
> the
> > > > offset
> > > > > commits are handled in this strategy
> > > > > e.g. on partition 1, key1 has offsets 20 and 30 respectively and on
> > the
> > > > > same partition key2 has offset 40. With key-level ordering model,
> > key2
> > > > will
> > > > > be processed by a different thread in background and might gets
> > > processed
> > > > > before events related to key1, in this case offset for key2 will be
> > > > > committed before key1 gets processed ? How is this handled?
> > > > >
> > > >
> > >
> >
>


Re: Confluent's parallel consumer

2021-07-15 Thread Pushkar Deole
It is the consumer client node that has received the events and is
processing them...

On Thu, Jul 15, 2021 at 8:49 PM Israel Ekpo  wrote:

> Hi Pushkar,
>
> When you use the term “node/instance” are you referring to the Kafka
> Brokers or the consuming clients that are retrieving events from the
> broker?
>
> Please could you elaborate/clarify?
>
> On Thu, Jul 15, 2021 at 10:00 AM Pushkar Deole 
> wrote:
>
> > Well... with key-level ordering, i am mainly concerned about event loss,
> if
> > any, in below mentioned scenario:
> >
> > 1. since event1 with key1 and event2 with key2 are both part of the same
> > partition1
> > 2. key1 event has offset 30 while key2 has offset 40
> > 3. key2 is processed by background thread and offset committed which is
> 40
> > 4. before key1 gets processed by background thread, the instance/node
> goes
> > down
> > 5. partition1 gets rebalanced to node2 and start processing ahead of
> offset
> > 40, thus losing key1
> >
> >
> >
> > On Thu, Jul 15, 2021 at 7:18 PM Israel Ekpo 
> wrote:
> >
> > > Hi Pushkar,
> > >
> > > If you are selecting key-based ordering, you should not be concerned
> > about
> > > the other keys from the same partitions being committed first
> > >
> > > If that is a concern for your use cases then you should go with
> partition
> > > based ordering to ensure that the events are processed in the sequence
> > they
> > > are picked up from the topic partition.
> > >
> > > For commit mode, you have the asynchronous, synchronous and
> transactional
> > > modes. I think if you are concerned with the order of commits you
> should
> > > look into the last two modes.
> > >
> > > My recommendation would be to go with the partition based ordering with
> > > synchronous commits to start with.
> > >
> > >
> > >
> > > On Thu, Jul 15, 2021 at 7:36 AM Pushkar Deole 
> > > wrote:
> > >
> > > > Hi All, and Antony (author of below article)
> > > >
> > > > i came across this article which seemed interesting: Introducing
> > > > Confluent’s Parallel Consumer Message Processing Client
> > > > <
> > > >
> > >
> >
> https://www.confluent.io/blog/introducing-confluent-parallel-message-processing-client/
> > > > >
> > > >
> > > > I would like to use the key-level ordering strategy mentioned in the
> > > > article to scale my consumers, however I would like to check how the
> > > offset
> > > > commits are handled in this strategy
> > > > e.g. on partition 1, key1 has offsets 20 and 30 respectively and on
> the
> > > > same partition key2 has offset 40. With key-level ordering model,
> key2
> > > will
> > > > be processed by a different thread in background and might gets
> > processed
> > > > before events related to key1, in this case offset for key2 will be
> > > > committed before key1 gets processed ? How is this handled?
> > > >
> > >
> >
>


Re: Confluent's parallel consumer

2021-07-15 Thread Israel Ekpo
Hi Pushkar,

When you use the term “node/instance”, are you referring to the Kafka
brokers or the consuming clients that are retrieving events from the broker?

Please could you elaborate/clarify?

On Thu, Jul 15, 2021 at 10:00 AM Pushkar Deole  wrote:

> Well... with key-level ordering, i am mainly concerned about event loss, if
> any, in below mentioned scenario:
>
> 1. since event1 with key1 and event2 with key2 are both part of the same
> partition1
> 2. key1 event has offset 30 while key2 has offset 40
> 3. key2 is processed by background thread and offset committed which is 40
> 4. before key1 gets processed by background thread, the instance/node goes
> down
> 5. partition1 gets rebalanced to node2 and start processing ahead of offset
> 40, thus losing key1
>
>
>
> On Thu, Jul 15, 2021 at 7:18 PM Israel Ekpo  wrote:
>
> > Hi Pushkar,
> >
> > If you are selecting key-based ordering, you should not be concerned
> about
> > the other keys from the same partitions being committed first
> >
> > If that is a concern for your use cases then you should go with partition
> > based ordering to ensure that the events are processed in the sequence
> they
> > are picked up from the topic partition.
> >
> > For commit mode, you have the asynchronous, synchronous and transactional
> > modes. I think if you are concerned with the order of commits you should
> > look into the last two modes.
> >
> > My recommendation would be to go with the partition based ordering with
> > synchronous commits to start with.
> >
> >
> >
> > On Thu, Jul 15, 2021 at 7:36 AM Pushkar Deole 
> > wrote:
> >
> > > Hi All, and Antony (author of below article)
> > >
> > > i came across this article which seemed interesting: Introducing
> > > Confluent’s Parallel Consumer Message Processing Client
> > > <
> > >
> >
> https://www.confluent.io/blog/introducing-confluent-parallel-message-processing-client/
> > > >
> > >
> > > I would like to use the key-level ordering strategy mentioned in the
> > > article to scale my consumers, however I would like to check how the
> > offset
> > > commits are handled in this strategy
> > > e.g. on partition 1, key1 has offsets 20 and 30 respectively and on the
> > > same partition key2 has offset 40. With key-level ordering model, key2
> > will
> > > be processed by a different thread in background and might gets
> processed
> > > before events related to key1, in this case offset for key2 will be
> > > committed before key1 gets processed ? How is this handled?
> > >
> >
>


Re: Confluent's parallel consumer

2021-07-15 Thread Pushkar Deole
Well... with key-level ordering, I am mainly concerned about event loss, if
any, in the scenario below:

1. event1 with key1 and event2 with key2 are both part of the same
partition, partition1
2. the key1 event has offset 30 while the key2 event has offset 40
3. key2 is processed by a background thread and its offset, 40, is committed
4. before key1 gets processed by a background thread, the instance/node goes
down
5. partition1 gets rebalanced to node2, which starts processing after offset
40, thus losing key1



On Thu, Jul 15, 2021 at 7:18 PM Israel Ekpo  wrote:

> Hi Pushkar,
>
> If you are selecting key-based ordering, you should not be concerned about
> the other keys from the same partitions being committed first
>
> If that is a concern for your use cases then you should go with partition
> based ordering to ensure that the events are processed in the sequence they
> are picked up from the topic partition.
>
> For commit mode, you have the asynchronous, synchronous and transactional
> modes. I think if you are concerned with the order of commits you should
> look into the last two modes.
>
> My recommendation would be to go with the partition based ordering with
> synchronous commits to start with.
>
>
>
> On Thu, Jul 15, 2021 at 7:36 AM Pushkar Deole 
> wrote:
>
> > Hi All, and Antony (author of below article)
> >
> > i came across this article which seemed interesting: Introducing
> > Confluent’s Parallel Consumer Message Processing Client
> > <
> >
> https://www.confluent.io/blog/introducing-confluent-parallel-message-processing-client/
> > >
> >
> > I would like to use the key-level ordering strategy mentioned in the
> > article to scale my consumers, however I would like to check how the
> offset
> > commits are handled in this strategy
> > e.g. on partition 1, key1 has offsets 20 and 30 respectively and on the
> > same partition key2 has offset 40. With key-level ordering model, key2
> will
> > be processed by a different thread in background and might gets processed
> > before events related to key1, in this case offset for key2 will be
> > committed before key1 gets processed ? How is this handled?
> >
>


Re: Confluent's parallel consumer

2021-07-15 Thread Israel Ekpo
Hi Pushkar,

If you are selecting key-based ordering, you should not be concerned about
the other keys from the same partition being committed first.

If that is a concern for your use case, then you should go with
partition-based ordering to ensure that the events are processed in the
sequence in which they are picked up from the topic partition.

For commit mode, you have the asynchronous, synchronous and transactional
modes. I think if you are concerned with the order of commits, you should
look into the last two modes.

My recommendation would be to go with partition-based ordering with
synchronous commits to start with.
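
As a rough sketch of what that recommendation looks like in code (option
and enum names are taken from the project's README and may differ between
versions; the consumer is assumed to be an already-configured
KafkaConsumer):

import io.confluent.parallelconsumer.ParallelConsumerOptions;
import io.confluent.parallelconsumer.ParallelConsumerOptions.CommitMode;
import io.confluent.parallelconsumer.ParallelConsumerOptions.ProcessingOrder;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class PartitionOrderedOptionsSketch {
    // Builds options for partition-level ordering with synchronous periodic commits.
    static ParallelConsumerOptions<String, String> build(KafkaConsumer<String, String> consumer) {
        return ParallelConsumerOptions.<String, String>builder()
                .consumer(consumer)
                .ordering(ProcessingOrder.PARTITION)           // preserve per-partition order
                .commitMode(CommitMode.PERIODIC_CONSUMER_SYNC) // synchronous commits
                .build();
    }
}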



On Thu, Jul 15, 2021 at 7:36 AM Pushkar Deole  wrote:

> Hi All, and Antony (author of below article)
>
> i came across this article which seemed interesting: Introducing
> Confluent’s Parallel Consumer Message Processing Client
> <
> https://www.confluent.io/blog/introducing-confluent-parallel-message-processing-client/
> >
>
> I would like to use the key-level ordering strategy mentioned in the
> article to scale my consumers, however I would like to check how the offset
> commits are handled in this strategy
> e.g. on partition 1, key1 has offsets 20 and 30 respectively and on the
> same partition key2 has offset 40. With key-level ordering model, key2 will
> be processed by a different thread in background and might gets processed
> before events related to key1, in this case offset for key2 will be
> committed before key1 gets processed ? How is this handled?
>


Confluent's parallel consumer

2021-07-15 Thread Pushkar Deole
Hi All, and Antony (author of the article below),

I came across this article, which seemed interesting: Introducing
Confluent’s Parallel Consumer Message Processing Client
<https://www.confluent.io/blog/introducing-confluent-parallel-message-processing-client/>

I would like to use the key-level ordering strategy mentioned in the
article to scale my consumers; however, I would like to check how offset
commits are handled in this strategy.
E.g., on partition 1, key1 has offsets 20 and 30, and on the same partition
key2 has offset 40. With the key-level ordering model, key2 will be
processed by a different thread in the background and might get processed
before the events related to key1; in this case, will the offset for key2
be committed before key1 gets processed? How is this handled?


configure brokers with 2 listeners

2021-07-15 Thread Claudia Kesslau
Hi,
I tried to add a second listener to my Kafka brokers running in Docker,
like this:

listeners=INTERNAL://{{getenv "KAFKA_SERVER_IP"}}:{{getenv "KAFKA_PORT" "9092"}},EXTERNAL://0.0.0.0:{{getenv "KAFKA_PORT_EXTERNAL" "9096"}}
listener.security.protocol.map=INTERNAL:PLAINTEXT,EXTERNAL:PLAINTEXT
inter.broker.listener.name=INTERNAL
advertised.listeners=INTERNAL://{{getenv "KAFKA_SERVER_IP"}}:{{getenv "KAFKA_PORT" "9092"}},EXTERNAL://{{getenv "KAFKA_SERVER_IP_EXTERNAL"}}:{{getenv "KAFKA_PORT_EXTERNAL" "9096"}}

I tried to follow the explanations here: 
https://www.confluent.io/blog/kafka-listeners-explained/

When I start one of the brokers with this configuration, it seems to boot
fine, but then it does not take up any work again. My guess is that it is
no longer connected to the other brokers.
Did I miss some vital configuration to start a broker with 2 listeners?
Is there a way to check which listener the brokers use for their internal
cluster communication?

Thanks,
Claudia