Re: Replicas not equally distributed within rack
Yes, it's similar. Replicas are evenly distributed among racks but not among brokers within a rack, even if the number of brokers is the same in all racks. Is there a workaround for this?

On Wed, 27 Mar 2024 at 5:36 PM, Chia-Ping Tsai wrote:
> hi Abhishek
>
> Is this issue similar to the unbalance you had met?
>
> https://issues.apache.org/jira/browse/KAFKA-10368
>
> best,
> chia-ping
Re: Replicas not equally distributed within rack
hi Abhishek

Is this issue similar to the unbalance you had met?

https://issues.apache.org/jira/browse/KAFKA-10368

best,
chia-ping

On 2024/03/23 21:06:59 Abhishek Singla wrote:
> Hi Team,
>
> Kafka version: 2_2.12-2.6.0
> Zookeeper version: 3.8.x
>
> We have a Kafka Cluster of 12 brokers spread equally across 3 racks. Topic
> gets auto created with default num.partitions=6 and replication_factor=3.
> It is observed that replicas are equally distributed over racks but within
> the rack the replicas are randomly distributed, like sometimes 3,3,0,0,
> sometimes 3:2:1, or sometimes 2,2,1,1.
>
> Is there a configuration to evenly distribute replicas across brokers
> within a rack, maybe some sort of round-robin strategy yielding 2,2,1,1?
>
> And it is also observed that over time one broker ends up having far more
> replicas across topics than the other brokers in the same rack. Is there a
> config for even distribution of replicas across topics as well?
>
> Regards,
> Abhishek Singla
Re: Messages disappearing from Kafka Streams topology
Hey Karsten,

You don't need to do any other configuration to enable EOS. See here:
https://docs.confluent.io/platform/current/streams/concepts.html#processing-guarantees

It mentions that the producer will be idempotent. That also means acks=all will be used. Note that if you have any other acks value in the config, it will be ignored in favour of exactly-once.

Do let me know if that solves your problem, I am curious. If yes, then I would ask you to create an issue.

Regards,
Mangat

On Wed, Mar 27, 2024 at 10:49 AM Karsten Stöckmann
<karsten.stoeckm...@gmail.com> wrote:
> Hi Mangat,
>
> thanks for the clarification. So to my knowledge exactly-once is
> configured using the 'processing.guarantee=exactly_once_v2' setting? Is
> the configuration setting 'acks=all' somehow related, and would you
> advise setting that as well?
>
> Best wishes
> Karsten
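For reference, the setting discussed above is plain client configuration; a minimal sketch might look like the following (the application id and bootstrap address are placeholder values, not taken from the thread):

```java
import java.util.Properties;

public class EosConfig {
    // Minimal sketch of the Streams settings relevant to the discussion.
    // "kstreams-folder-aggregator" and "localhost:9092" are placeholders.
    public static Properties streamsProps() {
        Properties p = new Properties();
        p.put("application.id", "kstreams-folder-aggregator");
        p.put("bootstrap.servers", "localhost:9092");
        // Enables exactly-once v2; the embedded producer becomes idempotent
        // and acks=all is used regardless of any acks override in the config.
        p.put("processing.guarantee", "exactly_once_v2");
        return p;
    }

    public static void main(String[] args) {
        System.out.println(streamsProps().getProperty("processing.guarantee"));
    }
}
```

No separate `acks` entry is set, matching the advice above that any explicit acks value would be ignored under exactly-once anyway.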
Re: Messages disappearing from Kafka Streams topology
Hi Mangat,

thanks for the clarification. So to my knowledge exactly-once is configured using the 'processing.guarantee=exactly_once_v2' setting? Is the configuration setting 'acks=all' somehow related, and would you advise setting that as well?

Best wishes
Karsten

mangat rai wrote on Tue, 26 March 2024, 15:44:
> Hey Karsten,
>
> So if a topic has not been created yet, the Streams app will keep the data
> in memory and write it later when the topic is available. If your app is
> restarted (or a thread is killed), you may lose data, but that depends on
> whether the app has committed in the source topics. If there are no
> errors, the data should be persisted eventually.
>
> However, overall exactly-once provides much tighter and better commit
> control. If you don't have a scaling issue, I would strongly advise you to
> use EOS.
>
> Thanks,
> Mangat
>
> On Tue, Mar 26, 2024 at 3:33 PM Karsten Stöckmann
> <karsten.stoeckm...@gmail.com> wrote:
> > Hi Mangat,
> >
> > thanks for your thoughts. I had actually considered exactly-once
> > semantics already, was unsure whether it would help, and left it aside
> > for the time being. I'll try that immediately when I get back to work.
> >
> > About snapshots and deserialization: I doubt that the issue is caused by
> > deserialization failures, because when taking another (i.e. at a later
> > point in time) snapshot of the exact same data, all messages fed into
> > the input topic pass the pipeline as expected.
> >
> > Logs of both Kafka and Kafka Streams show no signs of notable issues as
> > far as I can tell, apart from these (when initially starting up, with
> > the intermediate topics not existing yet):
> >
> > 2024-03-22 22:36:11,386 WARN [org.apa.kaf.cli.NetworkClient]
> > (kstreams-folder-aggregator-a38397c2-d30a-437e-9817-baa605d49e23-StreamThread-4)
> > [Consumer clientId=kstreams-folder-aggregator-a38397c2-d30a-437e-9817-baa605d49e23-StreamThread-4-consumer,
> > groupId=kstreams-folder-aggregator] Error while fetching metadata with
> > correlation id 69 :
> > {kstreams-folder-aggregator-folder-to-agency-subscription-response-topic=UNKNOWN_TOPIC_OR_PARTITION}
> >
> > Best wishes
> > Karsten
> >
> > mangat rai wrote on Tue, 26 March 2024, 11:06:
> > > Hey Karsten,
> > >
> > > There could be several reasons this could happen.
> > > 1. Did you check the error logs? There are several reasons why the
> > > Kafka Streams app may drop incoming messages. Use exactly-once
> > > semantics to limit such cases.
> > > 2. Are you sure there was no error when deserializing the records
> > > from `folderTopicName`? You mentioned that it happens only when you
> > > start processing and that another table snapshot works fine. This
> > > gives me a feeling that the first records in the topic might not be
> > > deserialized properly.
> > >
> > > Regards,
> > > Mangat
> > >
> > > On Tue, Mar 26, 2024 at 8:45 AM Karsten Stöckmann
> > > <karsten.stoeckm...@gmail.com> wrote:
> > > > Hi,
> > > >
> > > > thanks for getting back. I'll try and illustrate the issue.
> > > >
> > > > I've got an input topic 'folderTopicName' fed by a database CDC
> > > > system. Messages then pass a series of FK left joins and are
> > > > eventually sent to an output topic like this ('agencies' and
> > > > 'documents' being KTables):
> > > >
> > > > streamsBuilder //
> > > >     .table( //
> > > >         folderTopicName, //
> > > >         Consumed.with( //
> > > >             folderKeySerde, //
> > > >             folderSerde)) //
> > > >     .leftJoin( //
> > > >         agencies, //
> > > >         Folder::agencyIdValue, //
> > > >         AggregateFolder::new, //
> > > >         TableJoined.as("folder-to-agency"), //
> > > >         Materialized //
> > > >             .<..., AggregateFolder>named("folder-to-agency-materialized") //
> > > >             .withKeySerde(folderKeySerde) //
> > > >             .withValueSerde(aggregateFolderSerde)) //
> > > >     .leftJoin( //
> > > >         documents, //
> > > >     .toStream(...
> > > >     .to(...
> > > >
> > > > ...
> > > >
> > > > As far as I understand, left join semantics should be similar to
> > > > those of relational databases, i.e. the left-hand value always
> > > > passes the join, with the right-hand value set to null if not
> > > > present. Whereas what I am observing is this: lots of messages on
> > > > the input topic are even absent from the first left join changelog
> > > > topic ('folder-to-agency-materialized-changelog'). But: this seems
> > > > to happen only in case the Streams application is fired up for the
> > > > first time, i.e.
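The left-join expectation described in the thread (every left-hand record survives the join, with a null right-hand side when the foreign key has no match) can be illustrated with a plain-Java sketch. This is only an analogy for the relational-style semantics, not the Kafka Streams implementation, and all names in it are made up:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class FkLeftJoinSketch {
    // Illustrative foreign-key left join: every entry of `left` yields an
    // output entry; when its FK has no match in `right`, the value is null.
    public static Map<String, String> leftJoin(
            Map<String, String> left,    // record key -> foreign key
            Map<String, String> right) { // foreign key -> joined value
        Map<String, String> out = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : left.entrySet()) {
            out.put(e.getKey(), right.get(e.getValue())); // null if FK missing
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical data: folder f2 references a non-existent agency a9.
        Map<String, String> folders = new LinkedHashMap<>();
        folders.put("f1", "a1");
        folders.put("f2", "a9");
        Map<String, String> agencies = Map.of("a1", "Agency One");
        // f1 joins to "Agency One"; f2 still appears, with a null agency.
        System.out.println(leftJoin(folders, agencies));
    }
}
```

Under these semantics no input key should ever be missing from the join result, which is why records absent from the first changelog topic point at a problem upstream of the join itself.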
Community Over Code NA 2024 Travel Assistance Applications now open!
Hello to all users, contributors and Committers!

[ You are receiving this email as a subscriber to one or more ASF project dev or user mailing lists; it is not being sent to you directly. It is important that we reach all of our users and contributors/committers so that they may get a chance to benefit from this. We apologise in advance if this doesn't interest you, but it is on topic for the mailing lists of the Apache Software Foundation, and it is important that you do not mark this as spam in your email client. Thank you! ]

The Travel Assistance Committee (TAC) is pleased to announce that travel assistance applications for Community over Code NA 2024 are now open!

We will be supporting Community over Code NA in Denver, Colorado, October 7th to 10th, 2024.

TAC exists to help those that would like to attend Community over Code events but are unable to do so for financial reasons. For more info on this year's applications and qualifying criteria, please visit the TAC website at <https://tac.apache.org/>. Applications are already open on https://tac-apply.apache.org/, so don't delay!

The Apache Travel Assistance Committee will only be accepting applications from those people that are able to attend the full event.

Important: Applications close on Monday 6th May, 2024.

Applicants have until the closing date above to submit their applications (which should contain as much supporting material as required to efficiently and accurately process their request); this will enable TAC to announce successful applications shortly afterwards.

As usual, TAC expects to deal with a range of applications from a diverse range of backgrounds; therefore, we encourage (as always) anyone thinking about sending in an application to do so ASAP.

For those that will need a visa to enter the country, we advise you to apply now so that you have enough time in case of interview delays. So do not wait until you know whether you have been accepted or not.

We look forward to greeting many of you in Denver, Colorado, October 2024!

Kind Regards,
Gavin

(On behalf of the Travel Assistance Committee)
Re: Replicas not equally distributed within rack
Hi Team,

Could someone help me with how to distribute Kafka topic replicas evenly across brokers to avoid data skew (disk utilisation)?

Regards,
Abhishek Singla

On Sun, Mar 24, 2024 at 2:36 AM Abhishek Singla wrote:
> Hi Team,
>
> Kafka version: 2_2.12-2.6.0
> Zookeeper version: 3.8.x
>
> We have a Kafka Cluster of 12 brokers spread equally across 3 racks. Topic
> gets auto created with default num.partitions=6 and replication_factor=3.
> It is observed that replicas are equally distributed over racks but within
> the rack the replicas are randomly distributed, like sometimes 3,3,0,0,
> sometimes 3:2:1, or sometimes 2,2,1,1.
>
> Is there a configuration to evenly distribute replicas across brokers
> within a rack, maybe some sort of round-robin strategy yielding 2,2,1,1?
>
> And it is also observed that over time one broker ends up having far more
> replicas across topics than the other brokers in the same rack. Is there a
> config for even distribution of replicas across topics as well?
>
> Regards,
> Abhishek Singla
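One possible workaround, since the built-in assignment only guarantees rack-level balance, is to compute an explicit replica assignment yourself and pass it to `kafka-topics.sh --replica-assignment` when creating the topic (or to a reassignment JSON for existing topics). The sketch below assumes brokers 0-3, 4-7 and 8-11 form the three racks; that layout is an assumption for illustration and must be adjusted to the actual cluster:

```java
import java.util.StringJoiner;

public class RackRoundRobinAssignment {
    // Builds an explicit assignment string that round-robins replicas over
    // the brokers of each rack, so each rack's four brokers end up with a
    // 2,2,1,1 spread instead of 3,3,0,0. Output format matches the
    // kafka-topics.sh --replica-assignment option (replicas ':'-separated,
    // partitions ','-separated).
    public static String assignment(int partitions, int rf, int[][] racks) {
        StringJoiner topic = new StringJoiner(",");
        for (int p = 0; p < partitions; p++) {
            StringJoiner replicas = new StringJoiner(":");
            for (int r = 0; r < rf; r++) {
                int[] rack = racks[(p + r) % racks.length];   // spread leaders across racks
                replicas.add(String.valueOf(rack[p % rack.length])); // round-robin inside the rack
            }
            topic.add(replicas.toString());
        }
        return topic.toString();
    }

    public static void main(String[] args) {
        // Assumed layout: brokers 0-3 in rack A, 4-7 in rack B, 8-11 in rack C.
        int[][] racks = {{0, 1, 2, 3}, {4, 5, 6, 7}, {8, 9, 10, 11}};
        // Feed the result to:
        //   kafka-topics.sh --create --topic <name> --replica-assignment <output>
        System.out.println(assignment(6, 3, racks));
    }
}
```

The trade-off is that explicitly assigned topics bypass auto-creation, so this only helps for topics you create (or reassign) yourself.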
Re: [ANNOUNCE] New committer: Christo Lolov
Congrats!

On 3/26/24 9:39 PM, Christo Lolov wrote:

Thank you everyone! It wouldn't have been possible without quite a lot of reviews and extremely helpful inputs from you and the rest of the community! I am looking forward to working more closely with you going forward :)

On Tue, 26 Mar 2024 at 14:31, Kirk True wrote: Congratulations Christo!

On Mar 26, 2024, at 7:27 AM, Satish Duggana wrote: Congratulations Christo!

On Tue, 26 Mar 2024 at 19:20, Ivan Yurchenko wrote: Congrats!

On Tue, Mar 26, 2024, at 14:48, Lucas Brutschy wrote: Congrats!

On Tue, Mar 26, 2024 at 2:44 PM Federico Valeri wrote: Congrats!

On Tue, Mar 26, 2024 at 2:27 PM Mickael Maison <mickael.mai...@gmail.com> wrote: Congratulations Christo!

On Tue, Mar 26, 2024 at 2:26 PM Chia-Ping Tsai wrote: Congrats Christo!

Chia-Ping