Which record is processed when the topology restarts?

2018-02-05 Thread Saïd Bouras
Hi everyone,

I have a question about the record that is currently being processed by a
Kafka Streams app when the app stops (suddenly or not).

When the app restarts, the last record that was being processed before the
shutdown is replayed, but I noticed that the topology doesn't replay the
entire DAG for that record: it doesn't replay from the input topics of
the topology, but from the last internal topic associated with a processor node.

Example :
*leftJoin -> selectKey -> flatMap -> someOtherOperation -> ...*

At the *flatMap* processor we have an internal topic with a *"repartition"*
suffix, and if the topology is stopped at that moment, then when it
restarts, the last record is picked up from the repartition topic and only
the operations after the *flatMap* are executed.
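To make the shape concrete, here is a hypothetical DSL sketch of such a topology (topic names, types, and the join/split logic are made up for the example; only the operator sequence matches the thread):

```java
// Hypothetical topology with the same shape as the example above.
// The selectKey() forces a "-repartition" internal topic before flatMapValues().
StreamsBuilder builder = new StreamsBuilder();
KStream<String, String> events = builder.stream("events-input");
KTable<String, String> users = builder.table("users-input");

events
    // join result reflects the KTable state *at processing time*
    .leftJoin(users, (event, user) -> event + "|" + user)
    // changing the key marks the stream for repartitioning
    .selectKey((key, value) -> value.split("\\|")[0])
    // after a restart, processing resumes from the repartition topic here,
    // not from "events-input" -- the leftJoin is not re-evaluated
    .flatMapValues(value -> Arrays.asList(value.split("\\|")))
    .to("output");
```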

The behavior I was expecting was that the message would be picked up from
the input topics and the entire topology replayed for that record.

In the example, at the beginning we join with another KTable; the result
of that join is processed and written to the internal topic. Say the app
then crashes, and in the meantime the state of the KTable changes: when the
topology restarts, the message from the internal topic is processed with an
outdated left-join result from before the crash.
Furthermore, if the app crashed because of that particular record, when the
topology restarts it will crash again...

The actual behavior is of course linked to the current implementation of
consumers and producers in Kafka Streams (consumers resume reading from the
last committed offset of the internal topic when the topology restarts).

Is the actual behavior incorrect, or am I missing something here?

Thanks, regards
-- 
*Saïd Bouras*


Re: [DISCUSS] KIP-247: Add public test utils for Kafka Streams

2018-01-17 Thread Saïd Bouras
Matthias,

What about testing topologies that use Avro schemas? Have you read my
previous response?

Thanks.

On Wed, Jan 17, 2018 at 3:34 AM Matthias J. Sax 
wrote:

> Colin,
>
> the TopologyTestDriver does not connect to any broker and simulates
> processing of single-partitioned input topics purely in-memory (the
> driver is basically a mock for a StreamThread). This is sufficient to
> test basic business logic. For more complex topologies that are actually
> divided into sub-topologies and connected via topics, the driver detects
> this case and does an in-memory forward.
>
>
> -Matthias
>
> On 1/16/18 10:08 AM, Colin McCabe wrote:
> > Thanks, Matthias, this looks great.
> >
> > It seems like these APIs could either be used against mock objects, or
> against real brokers running in the same process.  Is there a way for the
> user to select which they want when using the API?  Sorry if it's in the
> KIP and I missed it.
> >
> > cheers,
> > Colin
> >
> >
> > On Thu, Jan 11, 2018, at 18:06, Matthias J. Sax wrote:
> >> Dear Kafka community,
> >>
> >> I want to propose KIP-247 to add public test utils to the Streams API.
> >> The goal is to simplify testing of Kafka Streams applications.
> >>
> >> Please find details in the wiki:
> >>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-247%3A+Add+public+test+utils+for+Kafka+Streams
> >>
> >> This is an initial KIP, and we hope to add more utility functions later.
> >> Thus, this KIP is not comprehensive but a first step. Of course, we can
> >> enrich this initial KIP if we think it falls too short. But we should
> >> not aim to be comprehensive to keep the scope manageable.
> >>
> >> In fact, I think we should add some more helpers to simplify result
> >> verification. I will update the KIP with this asap. Just wanted to start
> >> the discussion early on.
> >>
> >> An initial WIP PR can be found here:
> >> https://github.com/apache/kafka/pull/4402
> >>
> >> I also included the user-list (please hit "reply-all" to include both
> >> lists in this KIP discussion).
> >>
> >> Thanks a lot.
> >>
> >>
> >> -Matthias
> >>
> >>
>
>

-- 
*Saïd Bouras*


Re: [DISCUSS] KIP-247: Add public test utils for Kafka Streams

2018-01-15 Thread Saïd Bouras
Hi Matthias,

I read the KIP and it will be very helpful thanks to the changes. I don't
see a part that handles topologies using Avro schemas, though; is that in
the scope of the KIP?

I opened an issue two months ago in the schema-registry repo,
https://github.com/confluentinc/schema-registry/issues/651, which explains
that when testing topologies that use the schema registry, the schema
registry client mock is not thread safe, so deserialization fails in the
different processor nodes...

In my unit tests I wrapped the mock schema registry client in a singleton,
but this solution is not satisfying enough.
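For reference, the workaround I use looks roughly like this (a sketch; the wrapper class name is made up, only `MockSchemaRegistryClient` and `SchemaRegistryClient` come from the Confluent client library):

```java
// Share one MockSchemaRegistryClient across all serdes in the test so that
// every processor node sees the same registered schemas. The holder class
// itself is hypothetical; only the Confluent types are real.
public final class SharedMockSchemaRegistry {
    private static final SchemaRegistryClient INSTANCE = new MockSchemaRegistryClient();

    private SharedMockSchemaRegistry() {}

    public static SchemaRegistryClient instance() {
        // every (de)serializer built from this instance shares schema state
        return INSTANCE;
    }
}
```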

Thanks in advance, regards :-)


On Fri, Jan 12, 2018 at 3:06 AM Matthias J. Sax 
wrote:

> Dear Kafka community,
>
> I want to propose KIP-247 to add public test utils to the Streams API.
> The goal is to simplify testing of Kafka Streams applications.
>
> Please find details in the wiki:
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-247%3A+Add+public+test+utils+for+Kafka+Streams
>
> This is an initial KIP, and we hope to add more utility functions later.
> Thus, this KIP is not comprehensive but a first step. Of course, we can
> enrich this initial KIP if we think it falls too short. But we should
> not aim to be comprehensive to keep the scope manageable.
>
> In fact, I think we should add some more helpers to simplify result
> verification. I will update the KIP with this asap. Just wanted to start
> the discussion early on.
>
> An initial WIP PR can be found here:
> https://github.com/apache/kafka/pull/4402
>
> I also included the user-list (please hit "reply-all" to include both
> lists in this KIP discussion).
>
> Thanks a lot.
>
>
> -Matthias
>
>
>

-- 

Saïd BOURAS

Consultant Big Data
Mobile: 0662988731
Zenika Paris
10 rue de Milan 75009 Paris
Standard : +33(0)1 45 26 19 15 - Fax : +33(0)1 72 70 45 10


Re: Consumer Offsets Being Deleted by Broker

2017-12-10 Thread Saïd Bouras
Hi M. Musheer,

Can you share some code? It's difficult to tell without it.
Are you using synchronous or asynchronous commits?
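As a side note, in case it turns out to be a retention issue: the broker-side setting that controls how long committed offsets are kept is, if I remember correctly, the following (shown with what I believe is the default in 1.0.0):

```properties
# server.properties (broker side)
# Committed offsets for a group are removed after this long (default 24h in 1.0.0)
offsets.retention.minutes=1440
```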

Thanks,

Best regards

On Sun, Dec 10, 2017 at 7:13 AM, M. Musheer  wrote:

> Hi,
> We are using "kafka_2.11-1.0.0" kafka version with default "offset" related
> configurations.
> Issue:
> Consumer offsets are being deleted and we are not using auto commits at
> consumer side.
> Is there any configuration we need to add for consumer offset retention ??
>
> Please help us.
>
>
>
> Thanks,
> Musheer
>
-- 
Saïd BOURAS


Re: Kafka Consumer Committing Offset Even After Re-Assignment

2017-12-08 Thread Saïd Bouras
Hi Praveen,

I don't know if that's your case, but if you know that consumers will
lose ownership of partitions, you have to use a
*ConsumerRebalanceListener* to commit the offset of the last record
processed in a clean way.

If you don't do that, the rebalance will only start once the GroupCoordinator
stops receiving heartbeats from the consumer.
I don't know if that will solve your problem, but after committing its last
offset the consumer will leave the consumer group.
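A minimal sketch of the idea (against the 0.9+ consumer API; the topic name, the `process()` helper, and the offset bookkeeping are simplified placeholders):

```java
// Track what we have processed, and flush it when partitions are revoked.
final Map<TopicPartition, OffsetAndMetadata> processed = new HashMap<>();

consumer.subscribe(Collections.singletonList("my-topic"), new ConsumerRebalanceListener() {
    @Override
    public void onPartitionsRevoked(Collection<TopicPartition> partitions) {
        // Commit before losing ownership, so the new owner does not get
        // overwritten later by a stale commit from this consumer.
        consumer.commitSync(processed);
        processed.clear();
    }

    @Override
    public void onPartitionsAssigned(Collection<TopicPartition> partitions) {
        // nothing special needed for this example
    }
});

while (running) {
    for (ConsumerRecord<String, String> record : consumer.poll(100)) {
        process(record); // hypothetical processing step
        processed.put(new TopicPartition(record.topic(), record.partition()),
                      new OffsetAndMetadata(record.offset() + 1));
    }
}
```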

Regards


On Thu, Dec 7, 2017 at 9:58 PM Praveen  wrote:

> I have 4 consumers on 2 boxes (running two consumers each) and 16
> partitions. Each consumer takes 4 partitions.
>
> In Kafka 0.9.0.1, I'm noticing that even when a consumer is no longer
> assigned the partition, it is able to commit offset to it.
>
> *Box 1 Started*
> t1 - Box 1, Consumer 1 - Owns 8 partitions
>   Box 1, Consumer 2 - Owns 8 partitions
>
>   Consumers start polling and are submitting tasks to a task pool for
> processing.
>
> *Box 2 Started*
> t2 - Box 1, Consumer 1 - Owns 4 partitions
>   Box 1, Consumer 2 - Owns 4 partitions
>   Box 2, Consumer 1 - Owns 4 partitions
>   Box 2, Consumer 2 - Owns 4 partitions
>
>   Partition-1 is now reassigned to Box 2, Consumer 1.
>   But Box 1, Consumer 1 already submitted some of the records for
> processing when it owned the partition earlier.
>
> t3 - Box 1, Consumer 1 - After the tasks finish executing, even tho it
> longer owns the partition, it is still able to commit the offset
>
> t4 - Box 2, Consumer 1 - Commits offsets as well, overwriting offset
> committed by Box 1, Consumer 1.
>
> Is this expected? Should I be using the ConsumerRebalanceListener to
> prevent commits to partitions not owned by the consumer?
>
> - Praveen
>
-- 
Saïd BOURAS


Re: Kafka Streams app error while rebalancing

2017-12-06 Thread Saïd Bouras
;   at
> > > org.apache.kafka.clients.consumer.internals.AbstractCoordinator.
> > ensureActiveGroup(AbstractCoordinator.java:303)
> > >   at
> > > org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.poll(
> > ConsumerCoordinator.java:290)
> > >   at
> > > org.apache.kafka.clients.consumer.KafkaConsumer.
> > pollOnce(KafkaConsumer.java:1029)
> > >   at
> > > org.apache.kafka.clients.consumer.KafkaConsumer.poll(
> > KafkaConsumer.java:995)
> > >   at
> > > org.apache.kafka.streams.processor.internals.StreamThread.runLoop(
> > StreamThread.java:592)
> > >   at
> > > org.apache.kafka.streams.processor.internals.
> > StreamThread.run(StreamThread.java:361)
> > > 17/12/04 18:34:58 WARN StreamThread: Could not create task 0_8. Will
> > retry.
> > > org.apache.kafka.streams.errors.LockException: task [0_8] Failed to
> lock
> > > the state directory for task 0_8
> > >
> > >
> > > kafka-consumer-groups --new-consumer --bootstrap-server <...>
> --describe
> > > --group GeoTest
> > > Note: This will only show information about consumers that use the Java
> > > consumer API (non-ZooKeeper-based consumers).
> > >
> > > Warning: Consumer group 'GeoTest' is rebalancing.
> > >
> > > I keep seeing the above lock exception continuously and app is not
> making
> > > any progress. Any idea why it is stuck?
> > > I read a few suggestions that required me to manually delete state
> > > directory. I'd like to avoid that.
> > >
> > > Thanks,
> > > Srikanth
> > >
> >
> >
>
-- 
Saïd BOURAS


Re: Kafka Performance testing

2017-11-30 Thread Saïd Bouras
Hi,

Depending on what you want, you can look at:
1) https://cwiki.apache.org/confluence/display/KAFKA/Performance+testing
which explains the procedure to set up performance tests and how to use
*kafka-consumer-perf-test.sh* and *kafka-producer-perf-test.sh*, among
other useful scripts.

2) https://github.com/confluentinc/ducktape
It's a system-integration testing framework, but it has features to test
performance as well, and it's a tool developed by Confluent.

I don't use JMeter, so I don't know if it will fit your needs, but I found
a Kafka extension for JMeter that seems to be somewhat popular:
https://github.com/BrightTag/kafkameter
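For example, a basic producer/consumer benchmark with the bundled scripts looks roughly like this (broker address, topic name, and record counts are placeholders):

```shell
# Producer side: write 1M records of 100 bytes, unthrottled (-1 = no limit)
bin/kafka-producer-perf-test.sh \
  --topic perf-test \
  --num-records 1000000 \
  --record-size 100 \
  --throughput -1 \
  --producer-props bootstrap.servers=localhost:9092

# Consumer side: read the same 1M messages back and report throughput
bin/kafka-consumer-perf-test.sh \
  --broker-list localhost:9092 \
  --topic perf-test \
  --messages 1000000
```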

Best regards


On Thu, Nov 30, 2017 at 5:17 PM Ranganath 
wrote:

> I am new to Kafka world, wanted to do performance testing on kafka.
> I googled around, but dint find any useful document from which i can start
> with.
>
> Could you explain or point me to document it would be great.
> Also which performance loading tool would be best to test kafka, I prefer
> Jmeter though.
>
> regards,
>
-- 
Saïd BOURAS