Re: How to balance messages in kafka topics with newly added partitions?
Yes, but I find this even easier to do with KSQL:

    CREATE STREAM OUTPUTTOPIC AS SELECT * FROM INPUTTOPIC;

There are similar examples that also filter messages or change the message format while copying on the KSQL recipe page here: https://www.confluent.io/stream-processing-cookbook/

There is even an example for repartitioning topics using the PARTITIONS parameter:

    CREATE STREAM clickstream_new WITH (PARTITIONS=5) AS SELECT * FROM clickstream_raw;

-hans

> On Jan 27, 2019, at 9:24 AM, Ryanne Dolan wrote:
>
> You can use MirrorMaker to copy data between topics.
>
> Ryanne
Re: How to balance messages in kafka topics with newly added partitions?
You can use MirrorMaker to copy data between topics.

Ryanne

On Sun, Jan 27, 2019, 7:12 AM jaaz jozz wrote:

> Thanks, Sönke
> Is there any available kafka tool to move messages between topics?
Re: How to balance messages in kafka topics with newly added partitions?
Thanks, Sönke.

Is there any available Kafka tool to move messages between topics?
Re: How to balance messages in kafka topics with newly added partitions?
Hi Jazz,

I'm afraid the only way of rebalancing old messages is indeed to rewrite them to the topic, thus creating duplication. Once Kafka has written a message to a partition, that assignment is final; there is no way of moving it to another partition.

Changing the partition count of a topic at a later time can be a huge headache if you depend on partitioning. For this exact reason the general recommendation is to overpartition your topics a little when creating them, so that you can add consumers as the data volume increases.

In your case the best solution might be to delete and then recreate the topic with more partitions. You can then rewrite all your data, and it will result in clean partitioning.

Hope this helps a little. Feel free to get back to us if you have more questions!

Best regards,
Sönke

--
Sönke Liebau
Partner
Tel. +49 179 7940878
OpenCore GmbH & Co. KG - Thomas-Mann-Straße 8 - 22880 Wedel - Germany
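To illustrate why the backlog stays put and why a rewrite into a topic with the final partition count gives clean partitioning, here is a small in-memory sketch. It does not talk to Kafka at all. It assumes keyed messages that are placed by "hash of key modulo partition count", uses crc32 as a stand-in for the real hash, and all names in it (pick_partition, produce, partitions_of, the user-N keys, the counts 3 and 6) are made up for the illustration.

```python
import zlib
from collections import defaultdict


def pick_partition(key: str, num_partitions: int) -> int:
    """Map a key to a partition by hashing it modulo the partition count."""
    # Assumption: crc32 stands in for the hash used by Kafka's default
    # partitioner (murmur2); only the "hash mod N" shape matters here.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def produce(topic, key, value, num_partitions):
    """Append a record to the partition chosen for its key."""
    topic[pick_partition(key, num_partitions)].append((key, value))


def partitions_of(topic, key):
    """Return the set of partitions holding at least one record for key."""
    return {p for p, records in topic.items()
            if any(k == key for k, _ in records)}


keys = [f"user-{i}" for i in range(20)]  # hypothetical message keys

# 1. The backlog is produced while the topic has 3 partitions.
old_topic = defaultdict(list)
for key in keys:
    produce(old_topic, key, "backlog", 3)

# 2. The topic grows to 6 partitions. Records already written are not
#    touched; only records produced from now on see the larger count.
for key in keys:
    produce(old_topic, key, "new", 6)

backlog_partitions = {p for p, records in old_topic.items()
                      if any(v == "backlog" for _, v in records)}
split_keys = [k for k in keys if len(partitions_of(old_topic, k)) > 1]

# 3. Rewrite everything into a fresh topic that has 6 partitions from
#    the start, so every record of a key is placed with the same count.
new_topic = defaultdict(list)
for records in list(old_topic.values()):
    for key, value in records:
        produce(new_topic, key, value, 6)

print("partitions holding backlog:", sorted(backlog_partitions))
print("keys spread over two partitions in the old topic:", len(split_keys))
print("records per partition in the new topic:",
      {p: len(r) for p, r in sorted(new_topic.items())})
```

In the old topic the backlog never leaves the first three partitions, and a key can end up with records in two different partitions. In the freshly written topic each key lives in exactly one partition, which is what consumers that depend on partitioning need.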
How to balance messages in kafka topics with newly added partitions?
Hello,

I have a Kafka cluster with a topic that had too few partitions, so a large backlog of messages built up. After I added partitions, only newly produced messages were balanced across all the partitions.

What is the preferred way to balance the "old" backlog of messages inside the original partitions across all the new partitions?

I thought of reading the whole backlog and writing it to this topic again, then updating the offsets accordingly, but that will duplicate messages for any new consumer group that starts consuming from the beginning of this topic.

How can I solve this?

Thanks.