Re: Something like a unique key to prevent same record from being inserted twice?
And to share my experience of doing something similar: certain messages on our system must not be duplicated, but because they are bounced back to us by third parties, duplication is inevitable. So I deduplicate them using Spark Structured Streaming's flatMapGroupsWithState, keyed on a business key derived from each message.

Kind regards,

Liam Clarke

On Thu, Apr 4, 2019 at 4:09 AM Hans Jespersen wrote:
> Ok, what you are describing is different from the accidental-duplicate
> pruning that the idempotent publish feature performs.
>
> You are describing a situation where multiple independent messages just
> happen to have the same contents (both key and value).
>
> Removing those messages is an application-specific function, as you can
> imagine applications that would not want independent but identical
> messages to be removed (for example temperature sensor readings,
> heartbeat messages, or other telemetry with repeated but independent
> values).
>
> Your best bet is to write a simple intermediate processor that implements
> your message-pruning algorithm of choice and republishes (or not) to
> another topic that your consumers read from. It's a stateful app because
> it needs to remember one or more past messages, but that can be done with
> the Kafka Streams Processor API and the embedded RocksDB state store that
> ships with Kafka Streams (or as a UDF in KSQL).
>
> You can alternatively have your consuming apps implement similar
> message-pruning functionality themselves and avoid one extra component
> in the end-to-end architecture.
>
> -hans
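The per-key stateful dedup described above can be sketched in plain Java with no Spark dependency: a HashSet stands in for the per-group state that flatMapGroupsWithState would manage, and the business-key derivation from two hypothetical message fields is purely illustrative.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class BusinessKeyDedup {
    // Derive a business key from the fields that identify a message.
    // The choice of fields here is a placeholder, not from the original post.
    static String businessKey(String orderId, String eventType) {
        return orderId + "|" + eventType;
    }

    // Emit only messages whose business key has not been seen before.
    // The Set stands in for the state a streaming engine would manage
    // and checkpoint for you.
    static List<String> dedupe(List<String[]> messages) {
        Set<String> seen = new HashSet<>();
        List<String> out = new ArrayList<>();
        for (String[] m : messages) {
            String key = businessKey(m[0], m[1]);
            if (seen.add(key)) {   // add() returns false for duplicates
                out.add(m[2]);     // m[2] is the payload
            }
        }
        return out;
    }
}
```

In a real Spark job the "seen" state would also need an expiry (e.g. a state timeout) so it does not grow without bound.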
Re: Something like a unique key to prevent same record from being inserted twice?
Ok, what you are describing is different from the accidental-duplicate pruning that the idempotent publish feature performs.

You are describing a situation where multiple independent messages just happen to have the same contents (both key and value).

Removing those messages is an application-specific function, as you can imagine applications that would not want independent but identical messages to be removed (for example temperature sensor readings, heartbeat messages, or other telemetry with repeated but independent values).

Your best bet is to write a simple intermediate processor that implements your message-pruning algorithm of choice and republishes (or not) to another topic that your consumers read from. It's a stateful app because it needs to remember one or more past messages, but that can be done with the Kafka Streams Processor API and the embedded RocksDB state store that ships with Kafka Streams (or as a UDF in KSQL).

You can alternatively have your consuming apps implement similar message-pruning functionality themselves and avoid one extra component in the end-to-end architecture.

-hans

> On Apr 2, 2019, at 7:28 PM, jim.me...@concept-solutions.com wrote:
>
> Also... the reason for my question is that we are going to have two JMS
> topics with nearly redundant data in them, and their UNION will be
> written to Kafka for further processing.
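A minimal sketch of the intermediate pruning processor Hans describes, with a plain HashMap standing in for the RocksDB-backed state store and a list standing in for the downstream topic. In a real Kafka Streams app this would be a Processor with a persistent KeyValueStore; here the store remembers one past message per key, matching the "both key and value" duplicate definition above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PruningProcessor {
    // Stand-in for the RocksDB-backed KeyValueStore a real Kafka Streams
    // processor would use (which survives restarts via its changelog topic).
    private final Map<String, String> store = new HashMap<>();
    // Stand-in for producing to the downstream topic.
    private final List<String> downstream = new ArrayList<>();

    // Mirrors a Processor's per-record callback: republish only records
    // that differ from the last remembered record for the same key.
    public void process(String key, String value) {
        if (value.equals(store.get(key))) {
            return; // duplicate of the remembered record: prune it
        }
        store.put(key, value);
        downstream.add(key + "=" + value); // stand-in for forward()
    }

    public List<String> downstream() {
        return downstream;
    }
}
```

Note this only remembers the single most recent message per key; pruning across a longer window would need a windowed store and an eviction policy.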
Re: Something like a unique key to prevent same record from being inserted twice?
I've done this using Kafka Streams: specifically, I created a processor and used a key-value state store (a Kafka Streams facility) to save and check for keys, forwarding only messages whose keys were not already in the store. Since the store is local - in memory and backed by the local filesystem on the node the processor is running on - you avoid the network latency you'd have with an external store like Cassandra.

I think you'll have to use a similar approach to dedupe. You don't necessarily need to use Streams; you can handle it directly in your consumer, but then you'll have to solve a lot of problems Streams already handles, such as what happens if your node is shut down or crashes, etc.

On Wed, Apr 3, 2019 at 9:22 AM Vincent Maurin wrote:
> Hi,
>
> The idempotence flag guarantees that a message is produced exactly once
> on the topic, i.e. that running your command a single time will produce
> a single message. It is not a uniqueness enforcement on the message key;
> there is no such thing in Kafka.
>
> In Kafka, a topic contains the "history" of values for a given key. That
> means a consumer needs to consume the whole topic and keep only the last
> value for a given key, so uniqueness is meant to be handled on the
> consumer side. Additionally, Kafka can perform log compaction to keep
> only the last value and preserve disk space (but consumers can still
> receive duplicates).
>
> Best
Re: Something like a unique key to prevent same record from being inserted twice?
Hi,

The idempotence flag guarantees that a message is produced exactly once on the topic, i.e. that running your command a single time will produce a single message. It is not a uniqueness enforcement on the message key; there is no such thing in Kafka.

In Kafka, a topic contains the "history" of values for a given key. That means a consumer needs to consume the whole topic and keep only the last value for a given key, so uniqueness is meant to be handled on the consumer side. Additionally, Kafka can perform log compaction to keep only the last value and preserve disk space (but consumers can still receive duplicates).

Best

On Wed, Apr 3, 2019 at 1:28 AM jim.me...@concept-solutions.com wrote:
> By the way, I tried this...
> echo "key1:value1" | ~/kafka/bin/kafka-console-producer.sh --broker-list localhost:9092 --topic TestTopic --property "parse.key=true" --property "key.separator=:" --property "enable.idempotence=true" > /dev/null
>
> And... that didn't seem to do the trick - after running that command
> multiple times I received key1 value1 as many times as I had run it.
>
> Maybe it is the way I am setting the flags...
> Recently I saw that someone did this...
> bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test --producer-property enable.idempotence=true --request-required-acks -1
>
> Also... the reason for my question is that we are going to have two JMS
> topics with nearly redundant data in them, and their UNION will be
> written to Kafka for further processing.
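The consumer-side "last value wins" reading described above, which is also the end state log compaction converges to, can be sketched like this (plain Java; the record list stands in for a consumed topic, oldest record first):

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class LastValueWins {
    // Replay a topic (key/value records, oldest first) and keep only the
    // latest value per key -- what a consumer must do itself, since even a
    // compacted topic can still deliver older duplicates before compaction
    // has run.
    public static Map<String, String> materialize(List<String[]> records) {
        Map<String, String> latest = new LinkedHashMap<>();
        for (String[] r : records) {
            latest.put(r[0], r[1]); // later records overwrite earlier ones
        }
        return latest;
    }
}
```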
Re: Something like a unique key to prevent same record from being inserted twice?
On 2019/04/02 22:43:31, jim.me...@concept-solutions.com wrote:
> By the way, I tried this...
> echo "key1:value1" | ~/kafka/bin/kafka-console-producer.sh --broker-list localhost:9092 --topic TestTopic --property "parse.key=true" --property "key.separator=:" --property "enable.idempotence=true" > /dev/null
>
> And... that didn't seem to do the trick - after running that command
> multiple times I received key1 value1 as many times as I had run it.
>
> Maybe it is the way I am setting the flags...
> Recently I saw that someone did this...
> bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test --producer-property enable.idempotence=true --request-required-acks -1

Also... the reason for my question is that we are going to have two JMS topics with nearly redundant data in them, and their UNION will be written to Kafka for further processing.
Re: Something like a unique key to prevent same record from being inserted twice?
On 2019/04/02 22:25:16, jim.me...@concept-solutions.com wrote:
> Hi Hans,
>
> Is there some documentation or an example with source code where I can
> learn more about this feature and how it is implemented?
>
> Thanks,
> Jim

By the way, I tried this...

echo "key1:value1" | ~/kafka/bin/kafka-console-producer.sh --broker-list localhost:9092 --topic TestTopic --property "parse.key=true" --property "key.separator=:" --property "enable.idempotence=true" > /dev/null

And... that didn't seem to do the trick - after running that command multiple times I received key1 value1 as many times as I had run the prior command.

Maybe it is the way I am setting the flags...
Recently I saw that someone did this...

bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test --producer-property enable.idempotence=true --request-required-acks -1
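One likely issue with the first command above: in kafka-console-producer.sh, --property configures the console's message reader (that is why parse.key and key.separator belong there), while producer configs such as enable.idempotence must be passed via --producer-property, as in the second command. Even then, idempotence only suppresses duplicates created by the producer's internal retries; each run of the command starts a fresh producer session and will still append a new record. A sketch of the producer configuration idempotence requires (the broker address and serializers are placeholder assumptions):

```java
import java.util.Properties;

public class IdempotentProducerProps {
    // Builds the configuration for Kafka's idempotent publish, which
    // suppresses broker-side duplicates caused by internal send retries.
    public static Properties build() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder
        props.put("enable.idempotence", "true");
        // Idempotence requires acks=all and at most 5 in-flight requests.
        props.put("acks", "all");
        props.put("max.in.flight.requests.per.connection", "5");
        props.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        return props;
    }
}
```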
Re: Something like a unique key to prevent same record from being inserted twice?
On 2019/04/02 21:59:21, Hans Jespersen wrote:
> yes. Idempotent publish uses a unique messageID to discard potential
> duplicate messages caused by failure conditions when publishing.
>
> -hans

Hi Hans,

Is there some documentation or an example with source code where I can learn more about this feature and how it is implemented?

Thanks,
Jim
Re: Something like a unique key to prevent same record from being inserted twice?
Yes. Idempotent publish uses a unique messageID to discard potential duplicate messages caused by failure conditions when publishing.

-hans

> On Apr 1, 2019, at 9:49 PM, jim.me...@concept-solutions.com wrote:
>
> Does Kafka have something that behaves like a unique key so a producer
> can’t write the same value to a topic twice?
Something like a unique key to prevent same record from being inserted twice?
Does Kafka have something that behaves like a unique key so a producer can’t write the same value to a topic twice?