Re: Moving data from one cluster to another with Kafka Streams

2018-02-04 Thread Geoffrey Holmes

> Kafka Streams only works with a single cluster.

Ok, that’s what I was thinking after I looked at it more.

> Thus, you would need to either transform the data first and replicate
> the output topic to the target cluster, or replicate first and transform
> within the target cluster.

I don’t control the source cluster, so the second is the only possibility.

> Note, for the "intermediate" topic you need, you can set a low retention
> time to reduce the storage footprint, as it only acts as a temporary topic
> and the actual data is safely stored in the original source and final
> target topics.

Makes sense.
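Something like this topic-level override should do it (the topic name and values below are just placeholders, not anything prescribed):

```properties
# Topic-level overrides for the short-lived intermediate topic
# (name and values hypothetical), e.g. passed at creation time via
#   kafka-topics.sh --create --topic intermediate-topic --config retention.ms=60000 ...
retention.ms=60000      # keep records for ~1 minute
segment.bytes=10485760  # retention deletes whole closed segments, so smaller
                        # segments make old data eligible for deletion sooner
```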

> As an alternative, you might want to check out
> "single-message-transforms" (SMT) using Kafka Connect. Those allow you
> to do simple transformation on the fly while copying data around. If you
> don't need advanced transformations like aggregations or joins, SMT
> might be sufficient and you don't need to use Kafka Streams.

I started to look into that. The transformation I need to do is really
light-weight. What I don’t quite understand is: do I have to write source and
sink connectors to get records from the one Kafka topic and write the
transformed records to the destination Kafka topic? Or can I write a consumer
and a producer and incorporate SMTs that way?
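From what I can tell, SMTs are declared in a connector's configuration rather than on a plain consumer/producer, so some connector has to do the copying. A sketch of how a transform chain is declared (the connector class here is a placeholder, since reading from a remote Kafka cluster needs a third-party source connector; the transform type is Kafka's built-in InsertField):

```properties
name=copy-and-transform
# Placeholder: some source connector able to read from the remote cluster
connector.class=com.example.KafkaSourceConnector
topics=source-topic

# SMT chain: each alias listed in "transforms" gets its own config keys
transforms=tagOrigin
transforms.tagOrigin.type=org.apache.kafka.connect.transforms.InsertField$Value
transforms.tagOrigin.static.field=origin
transforms.tagOrigin.static.value=cluster-a
```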

> -Matthias

>> On 2/2/18 2:54 PM, Geoffrey Holmes wrote:
>> I need to get messages from a topic in one Kafka cluster, transform the 
>> message payload,
>> and put the messages into topics in another Kafka cluster. Is it possible to 
>> do this with
>> Kafka Streams? I don’t see how I can configure the stream to use one cluster 
>> for the input
>> and another cluster for output.



Re: Kafka Consumer Offsets unavailable during rebalancing

2018-02-04 Thread Hans Jespersen
Do the consumers in consumer group ‘X’ have a regex subscription that matches 
the newly created topic ‘C’?

If they do then they will only discover this new topic once their 
‘metadata.max.age.ms’  metadata refresh interval has passed, which defaults to 
5 minutes.

metadata.max.age.ms: The period of time in milliseconds after which we force
a refresh of metadata even if we haven't seen any partition leadership changes,
to proactively discover any new brokers or partitions.
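To pick up regex-matched topics faster, that refresh interval can be lowered in the consumer config (the values here are illustrative, not recommendations):

```properties
# consumer.properties (illustrative values)
bootstrap.servers=localhost:9092
group.id=X
# Force a metadata refresh every 30s instead of the 5-minute default,
# so newly created topics matching the regex subscription are found sooner
metadata.max.age.ms=30000
```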
-hans 


> On Feb 4, 2018, at 2:16 PM, Wouter Bancken  wrote:
> 
> Hi Hans,
> 
> Thanks for the response!
> 
> However, I get this result for all topics, not just for the newly created
> topic.
> 
> Situation sketch:
> 1. I have a consumer group 'X' subscribed to topics 'A' and 'B' with
> partition assignments and lag information. Consumer group 'X' is "Stable".
> 2a. Topic 'C' is (being) created.
> 2b. During this creation, I do not have a partition assignment for consumer
> group 'X' for topics 'A' and 'B' but the consumer group is still "Stable".
> 3. A second later: I have a partition assignment for consumer group 'X' for
> topics 'A' and 'B' again and the consumer group is still "Stable".
> 
> I expected the state of consumer group 'X' during step 2b to be
> "PreparingRebalance" or "AwaitingSync".
> 
> Best regards,
> Wouter
> 
>> On 4 February 2018 at 21:25, Hans Jespersen  wrote:
>> 
>> I believe this is expected behavior.
>> 
>> If there are no subscriptions to a new topic, and therefore no partition
>> assignments, and definitely no committed offsets, then lag is an undefined
>> concept. When the consumers subscribe to this new topic they may choose to
>> start at the beginning or end of the commit log so the lag cannot be
>> predicted in advance.
>> 
>> -hans
>> 
>>> On Feb 4, 2018, at 11:51 AM, Wouter Bancken 
>> wrote:
>>> 
>>> Can anyone clarify if this is a bug in Kafka or the expected behavior?
>>> 
>>> Best regards,
>>> Wouter
>>> 
>>> 
>>> On 30 January 2018 at 21:04, Wouter Bancken 
>>> wrote:
>>> 
 Hi,
 
 I'm trying to write an external tool to monitor consumer lag on Apache
 Kafka.
 
 For this purpose, I'm using the kafka-consumer-groups tool to fetch the
 consumer offsets.
 
 When using this tool, partition assignments seem to be unavailable
 temporarily during the creation of a new topic even if the consumer
>> group
 has no subscription on this new topic. This seems to match the
 documentation
 > Kafka+Client-side+Assignment+Proposal>
 saying *"Topic metadata changes which have no impact on subscriptions
 cause resync"*.
 
 However, when this occurs I'd expect the state of the consumer to be
 "PreparingRebalance" or "AwaitingSync" but it is simply "Stable".
 
 Is this a bug in the tooling or is there a different way to obtain the
 correct offsets for a consumer group during a rebalance?
 
 I'm using Kafka 0.10.2.1 but I haven't found any related issues in recent
 changelogs.
 Best regards,
 Wouter
 
>> 


Re: Kafka Consumer Offsets unavailable during rebalancing

2018-02-04 Thread Wouter Bancken
Hi Hans,

Thanks for the response!

However, I get this result for all topics, not just for the newly created
topic.

Situation sketch:
1. I have a consumer group 'X' subscribed to topics 'A' and 'B' with
partition assignments and lag information. Consumer group 'X' is "Stable".
2a. Topic 'C' is (being) created.
2b. During this creation, I do not have a partition assignment for consumer
group 'X' for topics 'A' and 'B' but the consumer group is still "Stable".
3. A second later: I have a partition assignment for consumer group 'X' for
topics 'A' and 'B' again and the consumer group is still "Stable".

I expected the state of consumer group 'X' during step 2b to be
"PreparingRebalance" or "AwaitingSync".

Best regards,
Wouter

On 4 February 2018 at 21:25, Hans Jespersen  wrote:

> I believe this is expected behavior.
>
> If there are no subscriptions to a new topic, and therefore no partition
> assignments, and definitely no committed offsets, then lag is an undefined
> concept. When the consumers subscribe to this new topic they may choose to
> start at the beginning or end of the commit log so the lag cannot be
> predicted in advance.
>
> -hans
>
> > On Feb 4, 2018, at 11:51 AM, Wouter Bancken 
> wrote:
> >
> > Can anyone clarify if this is a bug in Kafka or the expected behavior?
> >
> > Best regards,
> > Wouter
> >
> >
> > On 30 January 2018 at 21:04, Wouter Bancken 
> > wrote:
> >
> >> Hi,
> >>
> >> I'm trying to write an external tool to monitor consumer lag on Apache
> >> Kafka.
> >>
> >> For this purpose, I'm using the kafka-consumer-groups tool to fetch the
> >> consumer offsets.
> >>
> >> When using this tool, partition assignments seem to be unavailable
> >> temporarily during the creation of a new topic even if the consumer
> group
> >> has no subscription on this new topic. This seems to match the
> >> documentation
> >>  Kafka+Client-side+Assignment+Proposal>
> >> saying *"Topic metadata changes which have no impact on subscriptions
> >> cause resync"*.
> >>
> >> However, when this occurs I'd expect the state of the consumer to be
> >> "PreparingRebalance" or "AwaitingSync" but it is simply "Stable".
> >>
> >> Is this a bug in the tooling or is there a different way to obtain the
> >> correct offsets for a consumer group during a rebalance?
> >>
> >> I'm using Kafka 0.10.2.1 but I haven't found any related issues in recent
> >> changelogs.
> >> Best regards,
> >> Wouter
> >>
>


Re: Kafka Consumer Offsets unavailable during rebalancing

2018-02-04 Thread Hans Jespersen
I believe this is expected behavior.

If there are no subscriptions to a new topic, and therefore no partition
assignments, and definitely no committed offsets, then lag is an undefined
concept. When the consumers subscribe to this new topic they may choose to start
at the beginning or end of the commit log, so the lag cannot be predicted in
advance.

-hans

> On Feb 4, 2018, at 11:51 AM, Wouter Bancken  wrote:
> 
> Can anyone clarify if this is a bug in Kafka or the expected behavior?
> 
> Best regards,
> Wouter
> 
> 
> On 30 January 2018 at 21:04, Wouter Bancken 
> wrote:
> 
>> Hi,
>> 
>> I'm trying to write an external tool to monitor consumer lag on Apache
>> Kafka.
>> 
>> For this purpose, I'm using the kafka-consumer-groups tool to fetch the
>> consumer offsets.
>> 
>> When using this tool, partition assignments seem to be unavailable
>> temporarily during the creation of a new topic even if the consumer group
>> has no subscription on this new topic. This seems to match the
>> documentation
>> 
>> saying *"Topic metadata changes which have no impact on subscriptions
>> cause resync"*.
>> 
>> However, when this occurs I'd expect the state of the consumer to be
>> "PreparingRebalance" or "AwaitingSync" but it is simply "Stable".
>> 
>> Is this a bug in the tooling or is there a different way to obtain the
>> correct offsets for a consumer group during a rebalance?
>> 
>> I'm using Kafka 0.10.2.1 but I haven't found any related issues in recent
>> changelogs.
>> Best regards,
>> Wouter
>> 


Re: Kafka Consumer Offsets unavailable during rebalancing

2018-02-04 Thread Wouter Bancken
Can anyone clarify if this is a bug in Kafka or the expected behavior?

Best regards,
Wouter


On 30 January 2018 at 21:04, Wouter Bancken 
wrote:

> Hi,
>
> I'm trying to write an external tool to monitor consumer lag on Apache
> Kafka.
>
> For this purpose, I'm using the kafka-consumer-groups tool to fetch the
> consumer offsets.
>
> When using this tool, partition assignments seem to be unavailable
> temporarily during the creation of a new topic even if the consumer group
> has no subscription on this new topic. This seems to match the
> documentation
> 
> saying *"Topic metadata changes which have no impact on subscriptions
> cause resync"*.
>
> However, when this occurs I'd expect the state of the consumer to be
> "PreparingRebalance" or "AwaitingSync" but it is simply "Stable".
>
> Is this a bug in the tooling or is there a different way to obtain the
> correct offsets for a consumer group during a rebalance?
>
> I'm using Kafka 0.10.2.1 but I haven't found any related issues in recent
> changelogs.
> Best regards,
> Wouter
>


RE: Strange Topic ...

2018-02-04 Thread adrien ruffie
Hello Ted,


I use kafka_2.11-1.0.0.

In the log file I only have this:

[2018-01-27 23:12:09,545] INFO Shutting down the log cleaner. 
(kafka.log.LogCleaner)
[2018-01-27 23:12:09,545] INFO [kafka-log-cleaner-thread-0]: Shutting down 
(kafka.log.LogCleaner)
[2018-01-27 23:12:09,546] INFO [kafka-log-cleaner-thread-0]: Stopped 
(kafka.log.LogCleaner)
[2018-01-27 23:12:09,546] INFO [kafka-log-cleaner-thread-0]: Shutdown completed 
(kafka.log.LogCleaner)


The topic appeared this morning, and the log-cleaner.log file doesn't log
anything after 01-27 ...


From: Ted Yu
Sent: Sunday, 4 February 2018 20:37:59
To: users@kafka.apache.org
Subject: Re: Strange Topic ...

Which Kafka version are you using ?
Older versions of kafka (0.10 and prior) had some bugs in the log-cleaner
thread that might sometimes cause it to crash.

Please check the log-cleaner.log file to see if there was some clue.

Cheers

On Sun, Feb 4, 2018 at 11:14 AM, adrien ruffie 
wrote:

> Hello all,
>
>
> I'm a beginner with Kafka. This morning I was trying some tests, running
> the following command:
>
> ./bin/kafka-topics.sh --zookeeper localhost:2181 --describe
>
>
> I see the 3 topics I created: "customer-topic",
> "streams-plaintext-input", and "streams-wordcount-output".
>
>
> But I also get the following output. Why does __consumer_offsets have 50
> partitions? I never created it ... do you know this behavior?
>
>
> Topic:__consumer_offsetsPartitionCount:50
>  ReplicationFactor:1 Configs:segment.bytes=104857600,cleanup.policy=
> compact,compression.type=produ$
> Topic: __consumer_offsets   Partition: 0Leader: 0
>  Replicas: 0 Isr: 0
> Topic: __consumer_offsets   Partition: 1Leader: 0
>  Replicas: 0 Isr: 0
> Topic: __consumer_offsets   Partition: 2Leader: 0
>  Replicas: 0 Isr: 0
> Topic: __consumer_offsets   Partition: 3Leader: 0
>  Replicas: 0 Isr: 0
> Topic: __consumer_offsets   Partition: 4Leader: 0
>  Replicas: 0 Isr: 0
> Topic: __consumer_offsets   Partition: 5Leader: 0
>  Replicas: 0 Isr: 0
> 
> Topic: __consumer_offsets   Partition: 49   Leader: 0
>  Replicas: 0 Isr: 0
>
>
> Topic:customer-topicPartitionCount:1ReplicationFactor:1
>  Configs:
> Topic: customer-topic   Partition: 0Leader: 0   Replicas:
> 0 Isr: 0
> Topic:streams-plaintext-input   PartitionCount:1
> ReplicationFactor:1 Configs:
> Topic: streams-plaintext-input  Partition: 0Leader: 0
>  Replicas: 0 Isr: 0
> Topic:streams-wordcount-output  PartitionCount:1
> ReplicationFactor:1 Configs:cleanup.policy=compact
> Topic: streams-wordcount-output Partition: 0Leader: 0
>  Replicas: 0 Isr: 0
>
>
> Thanks and best regards,
>
> Adrien
>
>
>
>


Re: Strange Topic ...

2018-02-04 Thread naresh Goud
This is the topic Kafka creates and uses internally to store consumer
offsets while consumer programs are running.
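The partition count and replication factor of that internal topic come from broker settings; if memory serves, the quickstart server.properties looks roughly like this (do check your own config):

```properties
# server.properties (broker): settings that shape the auto-created
# __consumer_offsets topic; 50 is the stock default partition count
offsets.topic.num.partitions=50
# The quickstart config ships with 1 for single-broker setups
# (the out-of-the-box broker default is 3)
offsets.topic.replication.factor=1
```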

Thank you,
Naresh

On Sun, Feb 4, 2018 at 1:38 PM Ted Yu  wrote:

> Which Kafka version are you using ?
> Older versions of kafka (0.10 and prior) had some bugs in the log-cleaner
> thread that might sometimes cause it to crash.
>
> Please check the log-cleaner.log file to see if there was some clue.
>
> Cheers
>
> On Sun, Feb 4, 2018 at 11:14 AM, adrien ruffie 
> wrote:
>
> > Hello all,
> >
> >
> > I'm a beginner with Kafka. This morning I was trying some tests, running
> > the following command:
> >
> > ./bin/kafka-topics.sh --zookeeper localhost:2181 --describe
> >
> >
> > I see the 3 topics I created: "customer-topic",
> > "streams-plaintext-input", and "streams-wordcount-output".
> >
> >
> > But I also get the following output. Why does __consumer_offsets have 50
> > partitions? I never created it ... do you know this behavior?
> >
> >
> > Topic:__consumer_offsetsPartitionCount:50
> >  ReplicationFactor:1 Configs:segment.bytes=104857600,cleanup.policy=
> > compact,compression.type=produ$
> > Topic: __consumer_offsets   Partition: 0Leader: 0
> >  Replicas: 0 Isr: 0
> > Topic: __consumer_offsets   Partition: 1Leader: 0
> >  Replicas: 0 Isr: 0
> > Topic: __consumer_offsets   Partition: 2Leader: 0
> >  Replicas: 0 Isr: 0
> > Topic: __consumer_offsets   Partition: 3Leader: 0
> >  Replicas: 0 Isr: 0
> > Topic: __consumer_offsets   Partition: 4Leader: 0
> >  Replicas: 0 Isr: 0
> > Topic: __consumer_offsets   Partition: 5Leader: 0
> >  Replicas: 0 Isr: 0
> > 
> > Topic: __consumer_offsets   Partition: 49   Leader: 0
> >  Replicas: 0 Isr: 0
> >
> >
> > Topic:customer-topicPartitionCount:1ReplicationFactor:1
> >  Configs:
> > Topic: customer-topic   Partition: 0Leader: 0   Replicas:
> > 0 Isr: 0
> > Topic:streams-plaintext-input   PartitionCount:1
> > ReplicationFactor:1 Configs:
> > Topic: streams-plaintext-input  Partition: 0Leader: 0
> >  Replicas: 0 Isr: 0
> > Topic:streams-wordcount-output  PartitionCount:1
> > ReplicationFactor:1 Configs:cleanup.policy=compact
> > Topic: streams-wordcount-output Partition: 0Leader: 0
> >  Replicas: 0 Isr: 0
> >
> >
> > Thanks and best regards,
> >
> > Adrien
> >
> >
> >
> >
>


Re: Strange Topic ...

2018-02-04 Thread Ted Yu
Which Kafka version are you using ?
Older versions of kafka (0.10 and prior) had some bugs in the log-cleaner
thread that might sometimes cause it to crash.

Please check the log-cleaner.log file to see if there was some clue.

Cheers

On Sun, Feb 4, 2018 at 11:14 AM, adrien ruffie 
wrote:

> Hello all,
>
>
> I'm a beginner with Kafka. This morning I was trying some tests, running
> the following command:
>
> ./bin/kafka-topics.sh --zookeeper localhost:2181 --describe
>
>
> I see the 3 topics I created: "customer-topic",
> "streams-plaintext-input", and "streams-wordcount-output".
>
>
> But I also get the following output. Why does __consumer_offsets have 50
> partitions? I never created it ... do you know this behavior?
>
>
> Topic:__consumer_offsetsPartitionCount:50
>  ReplicationFactor:1 Configs:segment.bytes=104857600,cleanup.policy=
> compact,compression.type=produ$
> Topic: __consumer_offsets   Partition: 0Leader: 0
>  Replicas: 0 Isr: 0
> Topic: __consumer_offsets   Partition: 1Leader: 0
>  Replicas: 0 Isr: 0
> Topic: __consumer_offsets   Partition: 2Leader: 0
>  Replicas: 0 Isr: 0
> Topic: __consumer_offsets   Partition: 3Leader: 0
>  Replicas: 0 Isr: 0
> Topic: __consumer_offsets   Partition: 4Leader: 0
>  Replicas: 0 Isr: 0
> Topic: __consumer_offsets   Partition: 5Leader: 0
>  Replicas: 0 Isr: 0
> 
> Topic: __consumer_offsets   Partition: 49   Leader: 0
>  Replicas: 0 Isr: 0
>
>
> Topic:customer-topicPartitionCount:1ReplicationFactor:1
>  Configs:
> Topic: customer-topic   Partition: 0Leader: 0   Replicas:
> 0 Isr: 0
> Topic:streams-plaintext-input   PartitionCount:1
> ReplicationFactor:1 Configs:
> Topic: streams-plaintext-input  Partition: 0Leader: 0
>  Replicas: 0 Isr: 0
> Topic:streams-wordcount-output  PartitionCount:1
> ReplicationFactor:1 Configs:cleanup.policy=compact
> Topic: streams-wordcount-output Partition: 0Leader: 0
>  Replicas: 0 Isr: 0
>
>
> Thanks and best regards,
>
> Adrien
>
>
>
>


Strange Topic ...

2018-02-04 Thread adrien ruffie
Hello all,


I'm a beginner with Kafka. This morning I was trying some tests, running the
following command:

./bin/kafka-topics.sh --zookeeper localhost:2181 --describe


I see the 3 topics I created: "customer-topic",
"streams-plaintext-input", and "streams-wordcount-output".


But I also get the following output. Why does __consumer_offsets have 50
partitions? I never created it ... do you know this behavior?


Topic:__consumer_offsets    PartitionCount:50    ReplicationFactor:1    Configs:segment.bytes=104857600,cleanup.policy=compact,compression.type=produ$
    Topic: __consumer_offsets    Partition: 0     Leader: 0    Replicas: 0    Isr: 0
    Topic: __consumer_offsets    Partition: 1     Leader: 0    Replicas: 0    Isr: 0
    Topic: __consumer_offsets    Partition: 2     Leader: 0    Replicas: 0    Isr: 0
    Topic: __consumer_offsets    Partition: 3     Leader: 0    Replicas: 0    Isr: 0
    Topic: __consumer_offsets    Partition: 4     Leader: 0    Replicas: 0    Isr: 0
    Topic: __consumer_offsets    Partition: 5     Leader: 0    Replicas: 0    Isr: 0
    ...
    Topic: __consumer_offsets    Partition: 49    Leader: 0    Replicas: 0    Isr: 0

Topic:customer-topic    PartitionCount:1    ReplicationFactor:1    Configs:
    Topic: customer-topic    Partition: 0    Leader: 0    Replicas: 0    Isr: 0
Topic:streams-plaintext-input    PartitionCount:1    ReplicationFactor:1    Configs:
    Topic: streams-plaintext-input    Partition: 0    Leader: 0    Replicas: 0    Isr: 0
Topic:streams-wordcount-output    PartitionCount:1    ReplicationFactor:1    Configs:cleanup.policy=compact
    Topic: streams-wordcount-output    Partition: 0    Leader: 0    Replicas: 0    Isr: 0


Thanks and best regards,

Adrien