Mirror Maker2 Event filter capability from topic before replication

2021-05-07 Thread Anup Tiwari
Hi Team,

I have a topic which contains JSON data and multiple user click events. I
just wanted to know if we can filter out events based on keys of JSON
before replicating it to some other kafka.

Regards,
Anup Tiwari


Re: Mirror Maker2 Event filter capability from topic before replication

2021-05-07 Thread Tom Bentley
Hi Anup,

This should be possible using the Filter SMT in Kafka Connect together with
a custom Predicate which you'd have to write yourself and provide on the
plugin path.

See http://kafka.apache.org/documentation.html#connect_predicates and
https://rmoff.net/2020/12/22/twelve-days-of-smt-day-11-predicate-and-filter/
for info about Filter and predicates, and
https://stackoverflow.com/questions/61742209/smt-does-not-work-when-added-to-connect-mirror-maker-properties-mirrormaker-2
for a MM2-specific example of SMTs

Kind regards,

Tom

On Fri, May 7, 2021 at 8:49 AM Anup Tiwari  wrote:

> Hi Team,
>
> I have a topic which contains JSON data and multiple user click events. I
> just wanted to know if we can filter out events based on keys of JSON
> before replicating it to some other kafka.
>
> Regards,
> Anup Tiwari
>


Re: Mirror Maker2 Event filter capability from topic before replication

2021-05-07 Thread Anup Tiwari
Hi Tom,

Thanks for quick reply. As per your comments, it seems we will have to
build something of our own together to work with MM2.
So just wanted to confirm that it is not an inbuilt feature.. right? Like
confluent replicator provide this.
Also are we planning to add this in near future?


On Fri, 7 May 2021 13:33 Tom Bentley,  wrote:

> Hi Anup,
>
> This should be possible using the Filter SMT in Kafka Connect together with
> a custom Predicate which you'd have to write yourself and provide on the
> plugin path.
>
> See http://kafka.apache.org/documentation.html#connect_predicates and
>
> https://rmoff.net/2020/12/22/twelve-days-of-smt-day-11-predicate-and-filter/
> for info about Filter and predicates, and
>
> https://stackoverflow.com/questions/61742209/smt-does-not-work-when-added-to-connect-mirror-maker-properties-mirrormaker-2
> for a MM2-specific example of SMTs
>
> Kind regards,
>
> Tom
>
> On Fri, May 7, 2021 at 8:49 AM Anup Tiwari  wrote:
>
> > Hi Team,
> >
> > I have a topic which contains JSON data and multiple user click events. I
> > just wanted to know if we can filter out events based on keys of JSON
> > before replicating it to some other kafka.
> >
> > Regards,
> > Anup Tiwari
> >
>


Re: Mirror Maker2 Event filter capability from topic before replication

2021-05-07 Thread Tom Bentley
Just to be clear: It's only the Predicate implementation for filtering on a
JSON key that you'd need to write. MM2 already supports SMTs and
predicates.

Alternatively
https://rmoff.net/2020/12/22/twelve-days-of-smt-day-11-predicate-and-filter/#_filtering_based_on_the_contents_of_a_message
might work for you, but it's not an Apache Kafka feature currently. I'm not
aware that anyone is currently planning on contributing a JSONPath-based
Predicate to Apache Kafka.

On Fri, May 7, 2021 at 10:06 AM Anup Tiwari  wrote:

> Hi Tom,
>
> Thanks for quick reply. As per your comments, it seems we will have to
> build something of our own together to work with MM2.
> So just wanted to confirm that it is not an inbuilt feature.. right? Like
> confluent replicator provide this.
> Also are we planning to add this in near future?
>
>
> On Fri, 7 May 2021 13:33 Tom Bentley,  wrote:
>
> > Hi Anup,
> >
> > This should be possible using the Filter SMT in Kafka Connect together
> with
> > a custom Predicate which you'd have to write yourself and provide on the
> > plugin path.
> >
> > See http://kafka.apache.org/documentation.html#connect_predicates and
> >
> >
> https://rmoff.net/2020/12/22/twelve-days-of-smt-day-11-predicate-and-filter/
> > for info about Filter and predicates, and
> >
> >
> https://stackoverflow.com/questions/61742209/smt-does-not-work-when-added-to-connect-mirror-maker-properties-mirrormaker-2
> > for a MM2-specific example of SMTs
> >
> > Kind regards,
> >
> > Tom
> >
> > On Fri, May 7, 2021 at 8:49 AM Anup Tiwari 
> wrote:
> >
> > > Hi Team,
> > >
> > > I have a topic which contains JSON data and multiple user click
> events. I
> > > just wanted to know if we can filter out events based on keys of JSON
> > > before replicating it to some other kafka.
> > >
> > > Regards,
> > > Anup Tiwari
> > >
> >
>


Re: Mirror Maker2 Event filter capability from topic before replication

2021-05-07 Thread Robin Moffatt
There's also this
https://forum.confluent.io/t/kafka-connect-jmespath-expressive-content-based-record-filtering/1104
which is worth checking out - I came across it after recording that video
unfortunately but it looks like a useful predicate implementation.


-- 

Robin Moffatt | Senior Developer Advocate | ro...@confluent.io | @rmoff


On Fri, 7 May 2021 at 11:30, Tom Bentley  wrote:

> Just to be clear: It's only the Predicate implementation for filtering on a
> JSON key that you'd need to write. MM2 already supports SMTs and
> predicates.
>
> Alternatively
>
> https://rmoff.net/2020/12/22/twelve-days-of-smt-day-11-predicate-and-filter/#_filtering_based_on_the_contents_of_a_message
> might work for you, but it's not an Apache Kafka feature currently. I'm not
> aware that anyone is currently planning on contributing a JSONPath-based
> Predicate to Apache Kafka.
>
> On Fri, May 7, 2021 at 10:06 AM Anup Tiwari 
> wrote:
>
> > Hi Tom,
> >
> > Thanks for quick reply. As per your comments, it seems we will have to
> > build something of our own together to work with MM2.
> > So just wanted to confirm that it is not an inbuilt feature.. right? Like
> > confluent replicator provide this.
> > Also are we planning to add this in near future?
> >
> >
> > On Fri, 7 May 2021 13:33 Tom Bentley,  wrote:
> >
> > > Hi Anup,
> > >
> > > This should be possible using the Filter SMT in Kafka Connect together
> > with
> > > a custom Predicate which you'd have to write yourself and provide on
> the
> > > plugin path.
> > >
> > > See http://kafka.apache.org/documentation.html#connect_predicates and
> > >
> > >
> >
> https://rmoff.net/2020/12/22/twelve-days-of-smt-day-11-predicate-and-filter/
> > > for info about Filter and predicates, and
> > >
> > >
> >
> https://stackoverflow.com/questions/61742209/smt-does-not-work-when-added-to-connect-mirror-maker-properties-mirrormaker-2
> > > for a MM2-specific example of SMTs
> > >
> > > Kind regards,
> > >
> > > Tom
> > >
> > > On Fri, May 7, 2021 at 8:49 AM Anup Tiwari 
> > wrote:
> > >
> > > > Hi Team,
> > > >
> > > > I have a topic which contains JSON data and multiple user click
> > events. I
> > > > just wanted to know if we can filter out events based on keys of JSON
> > > > before replicating it to some other kafka.
> > > >
> > > > Regards,
> > > > Anup Tiwari
> > > >
> > >
> >
>


Re: KafkaStreams aggregation with multiple instance

2021-05-07 Thread Alex Craig
1.  The aggregation is done based on the key to the message.  So for a
silly example, if your messages were data about new car sales and you
wanted to count how many cars sold by color, you could consume the messages
and then "re-key" them so that the key to the message was the color.  Then
later in your streams topology, you would aggregate (count) based on that
new key.  Because kafka will guarantee that the same key will always wind
up in the same partition, you won't have a scenario where messages with the
key "red" will end up being consumed by more than 1 instance.  "Red" might
always be getting consumed/aggregated on instance A, "blue" on instance B,
etc etc.

2.  You can use other data stores as state stores and the documentation
describes how to do this, however my opinion is that unless you can a good
reason to NOT use RocksDB, I would use RocksDB - especially to start with.

Hope that helps!

Alex

On Fri, May 7, 2021 at 12:59 AM Pietro Galassi 
wrote:

> Hi Neeraj,
>
> 1) I have multiple instance reading from orderTopic and using aggregate
> (sum). So if instance A reads and do a +1 and instance B reads and do a +1
> at the same time can i have wrong count numbers (some +1 may be lost ?).
> Yes i'm using messageKeys and multiple partitions.
>
> 2) What state store can i use ? I'm actually using spring kafka and it
> relays on RockDB it seems.
>
> Regards,
> Pietro
>
> On Fri, May 7, 2021 at 12:39 AM Neeraj Vaidya
>  wrote:
>
> >  Hi Pietro,
> > 1) What do you mean by problems in counts due to multiple instances ?
> > Also, do you use Keys in your messages ?
> > 2) If you want to maintain state and refer to that state when processing
> > each message, then yes you will need a state store. A state store will
> also
> > be needed if you want to I guess query that state externally.
> >
> > Regards,
> > Neeraj
> >
> >
> >  On Friday, 7 May, 2021, 01:47:59 am GMT+10, Pietro Galassi <
> > pietro.gala...@gmail.com> wrote:
> >
> >  Hi all,
> > hi have hope you can help me figure out this scenario.
> >
> > I have a multiinstance microservice that consumes from a topic
> > (ordersTopic) all of them use the same consumer_group.
> >
> > This microservice uses a KStream to aggregate (sum) topic events and
> > produces results on another topic (countTopic).
> >
> > Have two questions:
> >
> > 1) Can i have problems on counts due to multiple instance of the same
> > microservies ?
> > 2) I need rockDB and materialized view in order to store data ?
> >
> > Thanks a lot.
> > Regards,
> > Pietro Galassi
> >
>


Multiple producers using same message key

2021-05-07 Thread Neeraj Vaidya
Hi all,
I think I kind of know the answer but wanted to confirm.
If I have multiple producers sending messages with the same key, will they end 
up in the same partition (assuming I am using the default partitioner) ?

Regards,
Neeraj

Sent from my iPhone


Re: State Store Data Retention

2021-05-07 Thread Navneeth Krishnan
Hi Bruno/All,

I have a follow up question regarding the same topic. As per you had
mentioned there will be no impact to key value stores even when retention.ms
and clean up policy is provided. Does that mean the change log topic will
not clear the data in the broker even after the retention period is over?

I agree the local state stores will not be able to delete the data but when
there is any reallocation then the state replay would just have to replay
the data for the given retention time. Is this understanding correct?

Thanks

On Mon, Apr 19, 2021 at 1:57 AM Bruno Cadonna  wrote:

> Hi Upesh,
>
> The answers to your questions are:
>
> 1.
> The configs cleanup.policy and retention.ms are topic configs. Hence,
> they only affect the changelog of a state store, not the local state
> store in a Kafka Streams client.
>
> Locally, window and session stores remove data they do not need anymore.
> Window and session stores are segmented stores. That means they consist
> of segments that are ordered by the windows they contain. Once the
> segment that contains the oldest windows is not needed anymore, i.e.,
> the data exceeded the retention time of the state store, the segment is
> removed.
>
> Non-windowed state store will not remove data.
>
> Worth noting here: If you change retention.ms directly on the brokers,
> it will not affect the behavior of local state stores.
>
> 2.
> Yes, this behavior is the same for in-memory state stores and persistent
> state stores.
>
> 3.
> Window and session state stores do remove data.
>
>
> Best,
> Bruno
>
>
>
> On 18.04.21 18:18, Upesh Desai wrote:
> > Hello, I have not been able to find a concrete answer on if/how state
> > stores on a running kafka streams instance remove data when it has
> > passed the configured retention.ms config. So a couple clarification
> > questions:
> >
> >  1. If the stores are configured with: cleanup.policy=compact,delete AND
> > retention.ms=N, will the stores remove data automatically over time
> > in the running stream instance stores?
> >  2. Is this behavior the same for in-memory stores and persistent
> > rocksdb stores?
> >  3. If they do not remove data that has passed the retention.ms period,
> > is there a different way to periodically remove old data from the
> > stores?
> >
> > I’m using kafka 2.7.0 components across the board (broker, connect,
> etc.).
> >
> > Thanks in advance,
> > Upesh
> >
> > 
> >
> >
> > Upesh Desai​
> > Senior Software Developer
> >
> > *ude...@itrsgroup.com* 
> > *www.itrsgroup.com* 
> >
> > Internet communications are not secure and therefore the ITRS Group does
> > not accept legal responsibility for the contents of this message. Any
> > view or opinions presented are solely those of the author and do not
> > necessarily represent those of the ITRS Group unless otherwise
> > specifically stated.
> >
> > [itrs.email.signature]
> >
> >
> >
> > *Disclaimer*
> >
> > The information contained in this communication from the sender is
> > confidential. It is intended solely for use by the recipient and others
> > authorized to receive it. If you are not the recipient, you are hereby
> > notified that any disclosure, copying, distribution or taking action in
> > relation of the contents of this information is strictly prohibited and
> > may be unlawful.
> >
> > This email has been scanned for viruses and malware, and may have been
> > automatically archived by *Mimecast Ltd*, an innovator in Software as a
> > Service (SaaS) for business. Providing a *safer* and *more useful* place
> > for your human generated data. Specializing in; Security, archiving and
> > compliance.
> >
>


Re: Multiple producers using same message key

2021-05-07 Thread sunil chaudhari
Hi Neeraj,
I dont think there is relation of key and the partition in that sense..



On Sat, 8 May 2021 at 3:16 AM, Neeraj Vaidya
 wrote:

> Hi all,
> I think I kind of know the answer but wanted to confirm.
> If I have multiple producers sending messages with the same key, will they
> end up in the same partition (assuming I am using the default partitioner) ?
>
> Regards,
> Neeraj
>
> Sent from my iPhone
>


The most appropriate version in the production

2021-05-07 Thread luoc
Hi all,
  Could you please tell me which version is the most appropriate in the 
production? There are our several requirements :
  1. Most stable
  2. More latest
  3. Easy to upgrade
  4. Support the 500GB every day in the production (not the test).
  5. 2.2.2 or 2.6.2 and more?

Thanks for your time.

- Luo

Re: Multiple producers using same message key

2021-05-07 Thread Daniel Hinojosa
Yes, murmur2(key) % number of partitions.

On Fri, May 7, 2021 at 7:24 PM sunil chaudhari 
wrote:

> Hi Neeraj,
> I dont think there is relation of key and the partition in that sense..
>
>
>
> On Sat, 8 May 2021 at 3:16 AM, Neeraj Vaidya
>  wrote:
>
> > Hi all,
> > I think I kind of know the answer but wanted to confirm.
> > If I have multiple producers sending messages with the same key, will
> they
> > end up in the same partition (assuming I am using the default
> partitioner) ?
> >
> > Regards,
> > Neeraj
> >
> > Sent from my iPhone
> >
>