[jira] [Commented] (KAFKA-6020) Broker side filtering
    [ https://issues.apache.org/jira/browse/KAFKA-6020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17783173#comment-17783173 ]

Murari Goswami commented on KAFKA-6020:
---------------------------------------

Valid point. I vote for this feature to be picked up, as many would benefit from it.

> Broker side filtering
> ---------------------
>
>                 Key: KAFKA-6020
>                 URL: https://issues.apache.org/jira/browse/KAFKA-6020
>             Project: Kafka
>          Issue Type: New Feature
>          Components: consumer
>            Reporter: Pavel Micka
>            Priority: Major
>              Labels: needs-kip
>
> Currently, it is not possible to filter messages on broker side. Filtering
> messages on broker side is convenient for filter with very low selectivity
> (one message in few thousands). In my case it means to transfer several GB of
> data to consumer, throw it away, take one message and do it again...
> While I understand that filtering by message body is not feasible (for
> performance reasons), I propose to filter just by message key prefix. This
> can be achieved even without any deserialization, as the prefix to be matched
> can be passed as an array (hence the broker would do just array prefix
> compare).

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
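The "array prefix compare" from the issue description can be illustrated with a minimal sketch. This is not Kafka code: `key_has_prefix` and `filter_by_key_prefix` are hypothetical names, and records are modelled as plain `(key, value)` byte pairs. It only shows that the match needs no deserialization, just a comparison of the leading bytes of the key.

```python
from typing import Iterable, Iterator, Optional, Tuple

# Simplified record model (assumption): raw key bytes (possibly absent) and raw value bytes.
Record = Tuple[Optional[bytes], bytes]


def key_has_prefix(key: Optional[bytes], prefix: bytes) -> bool:
    """Return True if the raw key bytes start with `prefix`.

    Both arguments are opaque byte arrays, so nothing is deserialized.
    A record without a key never matches (an assumption of this sketch).
    """
    if key is None:
        return False
    return key[:len(prefix)] == prefix


def filter_by_key_prefix(records: Iterable[Record], prefix: bytes) -> Iterator[Record]:
    """Yield only the records whose key starts with `prefix`."""
    for key, value in records:
        if key_has_prefix(key, prefix):
            yield key, value


records = [(b"order-1", b"a"), (b"user-7", b"b"), (None, b"c"), (b"order-2", b"d")]
print(list(filter_by_key_prefix(records, b"order-")))
# prints [(b'order-1', b'a'), (b'order-2', b'd')]
```

The cost per record is bounded by the prefix length, which is why the description treats this as cheaper than filtering on the message body.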
[jira] [Commented] (KAFKA-6020) Broker side filtering
    [ https://issues.apache.org/jira/browse/KAFKA-6020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17783143#comment-17783143 ]

Vincent Bernardi commented on KAFKA-6020:
-----------------------------------------

[~murari.it] note that this could be restricted to deserialization of headers and not of the payload, and maybe also to standard deserializers (byte[], String, JSON), if it helps lighten the load on the broker. Basically _any_ compromise would be acceptable to us to spare the bandwidth.
[jira] [Commented] (KAFKA-6020) Broker side filtering
    [ https://issues.apache.org/jira/browse/KAFKA-6020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17782515#comment-17782515 ]

Murari Goswami commented on KAFKA-6020:
---------------------------------------

We also suffer from moving a large amount of event data to the consumer side and then throwing away the majority of the records, because one of the consumers is interested only in specific data. A broker-side filter would help a lot in reducing the network load caused by moving this unwanted data. It would be a great help if this could become a Kafka feature in the next release.
[jira] [Commented] (KAFKA-6020) Broker side filtering
    [ https://issues.apache.org/jira/browse/KAFKA-6020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17770509#comment-17770509 ]

Alexander Grzesik commented on KAFKA-6020:
------------------------------------------

It would definitely help several of our use cases if a consumer could define, based on header filters, which messages it consumes. Currently we either filter on the consumer side, with increased bandwidth needs and some security concerns, or use the streaming API to split general topics into more specific ones, which in some cases drastically increases the number of topics. We would be very happy if the proposed filter feature could become part of the Kafka core.
[jira] [Commented] (KAFKA-6020) Broker side filtering
    [ https://issues.apache.org/jira/browse/KAFKA-6020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17768362#comment-17768362 ]

Barak commented on KAFKA-6020:
------------------------------

Same comment as [~Enzo90910] - Broker side filtering would help a lot in multiple use-cases.
[jira] [Commented] (KAFKA-6020) Broker side filtering
    [ https://issues.apache.org/jira/browse/KAFKA-6020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17768258#comment-17768258 ]

Vincent Bernardi commented on KAFKA-6020:
-----------------------------------------

My company is currently wasting a non-negligible amount of bandwidth, with several consumer groups reading a whole topic only to process 1% of its messages. Now I’m guessing that this resource waste impacts the community very unequally, with some providers actually benefiting from it, which may explain why this issue hasn’t gained more traction.
[jira] [Commented] (KAFKA-6020) Broker side filtering
    [ https://issues.apache.org/jira/browse/KAFKA-6020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17768241#comment-17768241 ]

Fiachra Corcoran commented on KAFKA-6020:
-----------------------------------------

Looking to revive this topic to gauge the level of interest. Is this something that the community would see as beneficial?

See also https://issues.apache.org/jira/browse/KAFKA-10280
[jira] [Commented] (KAFKA-6020) Broker side filtering
    [ https://issues.apache.org/jira/browse/KAFKA-6020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17580330#comment-17580330 ]

Vincent Bernardi commented on KAFKA-6020:
-----------------------------------------

From my PoV, it's quite clear the right way to do this would be to enable limited-CPU-usage filters (i.e. either exact string match, substring, or at most non-extended regex match) on message headers. I hope this is considered for a future KIP.
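To make the "limited-CPU" header filters above concrete, here is a minimal sketch of the two cheapest modes mentioned, exact match and substring match, on raw header bytes. `HeaderFilter` and its fields are hypothetical names, not part of any Kafka API, and the regex mode is deliberately left out. Headers are modelled as ordered `(name, value)` pairs in which a name may repeat and a value may be null.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Simplified header model (assumption): ordered (name, value) pairs, names may repeat.
Headers = List[Tuple[str, Optional[bytes]]]


@dataclass(frozen=True)
class HeaderFilter:
    name: str       # header name to inspect
    pattern: bytes  # raw bytes to compare against the header value
    mode: str       # "exact" or "substring"

    def __post_init__(self) -> None:
        if self.mode not in ("exact", "substring"):
            raise ValueError(f"unsupported mode: {self.mode!r}")

    def matches(self, headers: Headers) -> bool:
        """Return True if any header called `name` satisfies the filter."""
        for name, value in headers:
            if name != self.name or value is None:
                continue
            if self.mode == "exact" and value == self.pattern:
                return True
            if self.mode == "substring" and self.pattern in value:
                return True
        return False


headers = [("region", b"eu-west-1"), ("type", b"order.created")]
print(HeaderFilter("region", b"eu-west-1", "exact").matches(headers))   # prints True
print(HeaderFilter("type", b"order.", "substring").matches(headers))    # prints True
print(HeaderFilter("type", b"payment", "substring").matches(headers))   # prints False
```

Both modes compare bytes only, so the work per record stays proportional to the size of the headers rather than the payload.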
[jira] [Commented] (KAFKA-6020) Broker side filtering
    [ https://issues.apache.org/jira/browse/KAFKA-6020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17536490#comment-17536490 ]

flavio livide commented on KAFKA-6020:
--------------------------------------

An interesting extension of this feature would be enabling ABAC. Imagine the broker is capable of taking a header-based filter. The same type of filter could be specified in an ACL, enabling a client to "see" only messages with certain header fields. At the moment the only way to achieve this is to have producers split the data into different topics, potentially duplicating a large amount of it.

--
This message was sent by Atlassian Jira
(v8.20.7#820007)
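A rough sketch of the ABAC idea, assuming a purely hypothetical ACL shape: each principal is mapped to the header values a record must carry to be visible to it. Nothing here corresponds to Kafka's actual ACL model; `HeaderAcl` and `visible_records` are illustrative names, and the deny-by-default choice is an assumption of the sketch.

```python
from typing import Dict, List, Tuple

Headers = List[Tuple[str, bytes]]
Record = Tuple[Headers, bytes]  # (headers, payload), simplified

# Hypothetical ACL: principal -> header name/value pairs a record must carry.
HeaderAcl = Dict[str, Dict[str, bytes]]


def visible_records(principal: str, acl: HeaderAcl, records: List[Record]) -> List[Record]:
    """Return the records the principal is allowed to see.

    A principal without an ACL entry sees nothing (deny by default).
    """
    required = acl.get(principal)
    if required is None:
        return []
    result = []
    for headers, payload in records:
        present = set(headers)
        if all((name, value) in present for name, value in required.items()):
            result.append((headers, payload))
    return result


acl = {"billing-app": {"domain": b"billing"}}
records = [
    ([("domain", b"billing")], b"invoice"),
    ([("domain", b"shipping")], b"parcel"),
]
print(visible_records("billing-app", acl, records))
# prints [([('domain', b'billing')], b'invoice')]
print(visible_records("unknown-app", acl, records))
# prints []
```

The same matching step serves both purposes: as a filter it is chosen by the consumer, as an ACL it is imposed on the consumer.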
[jira] [Commented] (KAFKA-6020) Broker side filtering
    [ https://issues.apache.org/jira/browse/KAFKA-6020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17419571#comment-17419571 ]

King Jin commented on KAFKA-6020:
---------------------------------

How about broker-side filtering by Kafka headers? The consumer would send a filtering header when requesting more messages from the broker, and the broker would filter the messages by header before sending the response back to the consumer.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Commented] (KAFKA-6020) Broker side filtering
    [ https://issues.apache.org/jira/browse/KAFKA-6020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17248887#comment-17248887 ]

flavio livide commented on KAFKA-6020:
--------------------------------------

We have use cases where some consumers need the whole topic and others only a small subset. The setup is quite dynamic, so creating dedicated topics for individual consumers becomes quite complicated to manage. A very simple form of broker-side filter would be a big improvement. It could be a key prefix, header based, or as brutal as letting a producer set a number from 1 to 1024 and using a bitmask to filter for clients.
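One possible reading of the "number from 1 to 1024 plus bitmask" idea, sketched under assumptions: the producer tags each message with a single number, and each client sends a 1024-bit mask in which bit `tag - 1` is set for every tag it wants. `mask_for` and `wanted` are hypothetical helper names, and the comment does not specify the exact semantics, so this is only one interpretation.

```python
from typing import Iterable

MAX_TAG = 1024  # upper bound taken from the comment above


def mask_for(tags: Iterable[int]) -> int:
    """Build a bitmask with bit (tag - 1) set for every requested tag."""
    mask = 0
    for tag in tags:
        if not 1 <= tag <= MAX_TAG:
            raise ValueError(f"tag out of range: {tag}")
        mask |= 1 << (tag - 1)
    return mask


def wanted(tag: int, mask: int) -> bool:
    """Return True if the message tag is selected by the client's mask."""
    if not 1 <= tag <= MAX_TAG:
        return False
    return ((mask >> (tag - 1)) & 1) == 1


client_mask = mask_for([1, 3, 1024])
print(wanted(3, client_mask))     # prints True
print(wanted(2, client_mask))     # prints False
print(wanted(1024, client_mask))  # prints True
```

The per-message check is a shift and an AND, which is about as cheap as a broker-side test can get; the price is that producers and consumers must agree on what each number means.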
[jira] [Commented] (KAFKA-6020) Broker side filtering
    [ https://issues.apache.org/jira/browse/KAFKA-6020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17026910#comment-17026910 ]

DEAN JAIN commented on KAFKA-6020:
----------------------------------

It's been almost a year and we are looking forward to this feature. Are there any updates? Please let us know whether there is any plan for this in the near future.
[jira] [Commented] (KAFKA-6020) Broker side filtering
    [ https://issues.apache.org/jira/browse/KAFKA-6020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16786242#comment-16786242 ]

Yiming Zang commented on KAFKA-6020:
------------------------------------

Any updates on this? We have similar needs on our side and strongly support the idea of broker-side filtering.

Our use case comes from N-DC replication. Imagine you have 5 data centers and need to replicate data everywhere: typically you have to run N*(N-1), that is 20, mirror-maker jobs in order to replicate the messages in each local data center to all remote data centers. Each mirror maker has to read all 5 copies of the events, do some processing, and replicate only one fifth of them. This is a huge waste of network bandwidth and CPU resources.

If we had a way to pre-filter the events on the broker side, the mirror maker would no longer need to read all 5 copies of the events, which would be a huge saving once we have even more data centers in the future.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
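The arithmetic in the comment above can be checked with a few lines. The sketch assumes that every data center produces the same volume (one "unit") and that each cluster ends up holding the events of all N data centers, so an unfiltered mirror-maker job reads N units to forward one. The function names are illustrative only.

```python
def mirror_maker_jobs(num_dcs: int) -> int:
    """One job per ordered (source, destination) pair of distinct data centers."""
    return num_dcs * (num_dcs - 1)


def total_units_read(num_dcs: int, broker_side_filter: bool) -> int:
    """Units read by all jobs together, one unit being one data center's events.

    Without filtering each job reads all N units held by its source cluster;
    with broker-side filtering it reads only the one unit it forwards.
    """
    per_job = 1 if broker_side_filter else num_dcs
    return mirror_maker_jobs(num_dcs) * per_job


print(mirror_maker_jobs(5))                           # prints 20
print(total_units_read(5, broker_side_filter=False))  # prints 100
print(total_units_read(5, broker_side_filter=True))   # prints 20
```

Under these assumptions the read volume drops by a factor of N, so the saving grows with every data center added.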
[jira] [Commented] (KAFKA-6020) Broker side filtering
    [ https://issues.apache.org/jira/browse/KAFKA-6020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16597415#comment-16597415 ]

Geoffray Adde commented on KAFKA-6020:
--------------------------------------

Hello. I strongly support the idea of message filtering on the broker side.

I am using Kafka as a message streaming system. I am not committing anything and I am not even keeping track of the offsets. On the other hand, I have to deal with a vast variety of distinct objects: hundreds of thousands, up to millions. Obviously, I cannot have that many topics, but with broker-side filtering I could get some sort of sub-topics.

To me, the idea would be to send, with a fetch request, a mask to test a custom header. If the custom header is not present, or if it does not match the mask, the message is not sent as part of the reply to the fetch request.

The concept seems simple, but I have no clue how much work it is to implement.
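The fetch-time rule described above (drop the record if the custom header is absent or does not match the mask) could look roughly like this. It is a sketch under assumptions, not the Kafka fetch path: `fetch_filtered` is a hypothetical function, headers are simplified to one value per name, the header value is read as a big-endian unsigned integer, and "matches" is taken to mean that the value shares at least one set bit with the mask.

```python
from typing import Dict, List, Tuple

# Simplified record model (assumption): one value per header name, plus raw payload.
Record = Tuple[Dict[str, bytes], bytes]


def fetch_filtered(log: List[Record], header_name: str, mask: int) -> List[Record]:
    """Build a fetch reply containing only the records that pass the mask test."""
    reply = []
    for headers, payload in log:
        raw = headers.get(header_name)
        if raw is None:
            continue  # custom header absent: record is left out of the reply
        if (int.from_bytes(raw, "big") & mask) == 0:
            continue  # header present but no bit in common with the mask
        reply.append((headers, payload))
    return reply


log = [
    ({"sub": b"\x01"}, b"object A"),
    ({"sub": b"\x02"}, b"object B"),
    ({}, b"untagged"),
    ({"sub": b"\x03"}, b"object A+B"),
]
print([payload for _, payload in fetch_filtered(log, "sub", 0b01)])
# prints [b'object A', b'object A+B']
```

A real design would also have to decide how the consumer's position advances past records that were skipped, which this sketch does not address.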
[jira] [Commented] (KAFKA-6020) Broker side filtering
    [ https://issues.apache.org/jira/browse/KAFKA-6020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16195223#comment-16195223 ]

Ted Yu commented on KAFKA-6020:
-------------------------------

This needs a KIP, right?

> Broker side filtering
> ---------------------
>
>                 Key: KAFKA-6020
>                 URL: https://issues.apache.org/jira/browse/KAFKA-6020
>             Project: Kafka
>          Issue Type: Improvement
>          Components: consumer
>            Reporter: Pavel Micka
>
> Currently, it is not possible to filter messages on broker side. Filtering
> messages on broker side is convenient for filter with very low selectivity
> (one message in few thousands). In my case it means to transfer several GB of
> data to consumer, throw it away, take one message and do it again...
> While I understand that filtering by message body is not feasible (for
> performance reasons), I propose to filter just by message key prefix. This
> can be achieved even without any deserialization, as the prefix to be matched
> can be passed as an array (hence the broker would do just array prefix
> compare).

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)