[jira] [Commented] (KAFKA-10448) Preserve Source Partition in Kafka Streams from context
[ https://issues.apache.org/jira/browse/KAFKA-10448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17189790#comment-17189790 ]

satya commented on KAFKA-10448:
---
Thank you. I submitted KIP-669. Please let me know if you have any suggestions or comments.

> Preserve Source Partition in Kafka Streams from context
> ---
>
> Key: KAFKA-10448
> URL: https://issues.apache.org/jira/browse/KAFKA-10448
> Project: Kafka
> Issue Type: Improvement
> Components: streams
> Affects Versions: 2.5.0
> Reporter: satya
> Priority: Minor
> Labels: needs-kip
>
> Currently, Kafka Streams sink nodes use the default partitioner or allow a
> custom partitioner, which must depend on the key/value. I am looking for an
> enhancement of the sink node so that the source partition is preserved
> instead of being derived again from the key/value. One of our use cases has
> producers with custom partitioners that we don't have access to, as it is a
> third-party application. Simply preserving the partition through
> context.partition() would be helpful.

--
This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (KAFKA-10448) Preserve Source Partition in Kafka Streams from context
[ https://issues.apache.org/jira/browse/KAFKA-10448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17188447#comment-17188447 ]

satya commented on KAFKA-10448:
---
[~mjsax] I would like to submit a request for a KIP as you suggested, and I need some guidance, please. I have requested access. I am a newbie here, but I would like to contribute in any way possible.
[jira] [Updated] (KAFKA-10448) Preserve Source Partition in Kafka Streams from context
[ https://issues.apache.org/jira/browse/KAFKA-10448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

satya updated KAFKA-10448:
--
Description:
Currently, Kafka Streams sink nodes use the default partitioner or allow a custom partitioner, which must depend on the key/value. I am looking for an enhancement of the sink node so that the source partition is preserved instead of being derived again from the key/value. One of our use cases has producers with custom partitioners that we don't have access to, as it is a third-party application. Simply preserving the partition through context.partition() would be helpful.

(was: Currently Kafka streams Sink Nodes use default partitioner or has the provision of using a custom partitioner which has to be dependent on key/value. I am looking for an enhancement of Sink Node to ensure source partition is preserved instead of deriving the partition again using key/value. One of our use case has producers which have customer partitioners that we dont have access to as it is a third-party application. By simply preserving the partition through context.partition() would be helpful.)

> Preserve Source Partition in Kafka Streams from context
> ---
>
> Key: KAFKA-10448
> URL: https://issues.apache.org/jira/browse/KAFKA-10448
> Project: Kafka
> Issue Type: Improvement
> Components: streams
> Affects Versions: 2.5.0
> Reporter: satya
> Priority: Critical
[jira] [Created] (KAFKA-10448) Preserve Source Partition in Kafka Streams from context
satya created KAFKA-10448:
--
Summary: Preserve Source Partition in Kafka Streams from context
Key: KAFKA-10448
URL: https://issues.apache.org/jira/browse/KAFKA-10448
Project: Kafka
Issue Type: Improvement
Components: streams
Affects Versions: 2.5.0
Reporter: satya

Currently, Kafka Streams sink nodes use the default partitioner or allow a custom partitioner, which must depend on the key/value. I am looking for an enhancement of the sink node so that the source partition is preserved instead of being derived again from the key/value. One of our use cases has producers with custom partitioners that we don't have access to, as it is a third-party application. Simply preserving the partition through context.partition() would be helpful.
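Until such an enhancement exists, one possible workaround in the Processor API is to capture context.partition() in a Transformer, carry it alongside the value, and hand the sink a StreamPartitioner that returns the captured partition instead of hashing the key/value. The sketch below shows only the plain-Java carrier and selection logic; PartitionPreserving, PartitionAware, and choosePartition are illustrative names, not part of the Kafka Streams API.

```java
// Sketch of the workaround: carry the source partition with each record and
// reuse it at the sink. PartitionAware and choosePartition are illustrative
// names introduced here, not Kafka Streams API classes.
public class PartitionPreserving {

    // Wrapper pairing a value with the partition it was consumed from, as
    // captured via ProcessorContext#partition() inside a Transformer.
    public static final class PartitionAware<V> {
        public final int sourcePartition;
        public final V value;

        public PartitionAware(int sourcePartition, V value) {
            this.sourcePartition = sourcePartition;
            this.value = value;
        }
    }

    // The decision a StreamPartitioner#partition implementation would
    // delegate to: reuse the captured partition, wrapping around in case the
    // sink topic has fewer partitions than the source topic.
    public static <V> int choosePartition(PartitionAware<V> record, int numPartitions) {
        return record.sourcePartition % numPartitions;
    }
}
```

Wiring this up would look roughly like `stream.transform(...)` (wrapping each value with `context.partition()`), followed by `to(topic, Produced.with(keySerde, valueSerde, (t, k, v, n) -> choosePartition(v, n)))`; if the sink topic has the same partition count as the source, the record lands on its original partition.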
[jira] [Created] (KAFKA-7644) Worker rebalance enhancements
satya created KAFKA-7644:
--
Summary: Worker rebalance enhancements
Key: KAFKA-7644
URL: https://issues.apache.org/jira/browse/KAFKA-7644
Project: Kafka
Issue Type: Improvement
Components: KafkaConnect
Reporter: satya

Currently, the Kafka Connect distributed worker triggers a rebalance any time a new connector/task is added, irrespective of whether it is a source connector or a sink connector. My understanding has been that the worker rebalance should be identical to a consumer-group rebalance. That said, shouldn't source connectors be immune to the rebalance? Are we not supposed to use source connectors with distributed workers? It appears to me there is some caveat in how the worker rebalance works, and it needs an enhancement to avoid triggering unwanted rebalances that restart all tasks. Connectors should have a way to keep their existing partition assignment without restarting when the rebalance trigger relates to a different connector.
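For reference, Kafka Connect 2.3+ introduced incremental cooperative rebalancing (KIP-415), which addresses part of this request by letting unaffected tasks keep their assignments during a rebalance. A sketch of the relevant distributed-worker settings; the setting names are real, but the values shown are illustrative:

```properties
# connect-distributed.properties (Kafka Connect 2.3+); values are illustrative
connect.protocol=compatible
scheduled.rebalance.max.delay.ms=300000
```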
[jira] [Created] (KAFKA-7642) Kafka Connect - graceful shutdown of distributed worker
satya created KAFKA-7642:
--
Summary: Kafka Connect - graceful shutdown of distributed worker
Key: KAFKA-7642
URL: https://issues.apache.org/jira/browse/KAFKA-7642
Project: Kafka
Issue Type: Bug
Components: KafkaConnect
Reporter: satya

Currently I don't see any way to gracefully shut down a distributed worker other than killing the process. Could you add a shutdown option to the workers?
[jira] [Created] (KAFKA-7643) Connectors do not unload even when tasks fail in Kafka Connect
satya created KAFKA-7643:
--
Summary: Connectors do not unload even when tasks fail in Kafka Connect
Key: KAFKA-7643
URL: https://issues.apache.org/jira/browse/KAFKA-7643
Project: Kafka
Issue Type: Bug
Components: KafkaConnect
Reporter: satya

If there are issues in the tasks associated with a connector, I have often seen that even though the tasks fail, the connector itself is not released. The only way out is to submit an explicit request to delete the connector. There should be a way to shut down connectors gracefully when a task hits exceptions that are not retriable. In addition, Kafka Connect does not have a graceful-exit option. There are situations, such as server maintenance and outages, where it would be prudent to shut down the connectors gracefully rather than issuing a DELETE through the REST endpoint.
[jira] [Commented] (KAFKA-3726) Enable cold storage option
[ https://issues.apache.org/jira/browse/KAFKA-3726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16173020#comment-16173020 ]

Satya commented on KAFKA-3726:
--
An ideal approach would be to use the Kafka Connect HDFS source/sink connectors to back up the Kafka segment files, which can then be replayed when required.

> Enable cold storage option
> --
>
> Key: KAFKA-3726
> URL: https://issues.apache.org/jira/browse/KAFKA-3726
> Project: Kafka
> Issue Type: Wish
> Reporter: Radoslaw Gruchalski
> Attachments: kafka-cold-storage.txt
>
> This JIRA builds on the cold storage article I have published on Medium; a copy of the article is attached here.
> The need for cold storage or an "indefinite" log seems to be discussed quite often on the user mailing list.
> The cold storage idea would give the operator the opportunity to keep the raw Kafka segment files in third-party storage and to retrieve the data back for re-consumption.
> The two possible options for enabling such functionality are, from the article:
> First approach: if Kafka provided a notification mechanism and could trigger a program when a segment file is about to be discarded, it would become feasible to provide a standard method of moving data to cold storage in reaction to those events. Once the program finishes backing the segments up, it could tell Kafka "it is now safe to delete these segments".
> The second option is to provide an additional value for the log.cleanup.policy setting, call it cold-storage. With this value, Kafka would move the segment files that would otherwise be deleted to another destination on the server. They can be picked up from there and moved to the cold storage.
> Both have their limitations. The former is simply a mechanism exposed to allow the operator to build the tooling necessary to enable this. Events could be published in a manner similar to the Mesos Event Bus (https://mesosphere.github.io/marathon/docs/event-bus.html), or Kafka itself could provide a control topic on which such info would be published. The outcome is that the operator can subscribe to the event bus and get notified about, at least, two events:
> - log segment is complete and can be backed up
> - partition leader changed
> These two, together with an option to keep the log segment safe from compaction for a certain amount of time, would be sufficient to reliably implement cold storage.
> The latter option, the {{log.cleanup.policy}} setting, would be a more complete feature, but it is also much more difficult to implement. All brokers would have to keep a backup of the data in the cold storage, significantly increasing the size requirements; also, de-duplication of the replicated data would be left completely to the operator.
> In any case, the thing to stay away from is having Kafka deal with the physical aspect of moving the data to and back from the cold storage. That is not Kafka's task. The intent is to provide a method for reliable cold storage.
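The second option described in the issue could look roughly like the broker configuration below. This is a hypothetical sketch of the proposal only: cold-storage is not an actual log.cleanup.policy value, and the destination setting name is invented here for illustration.

```properties
# Hypothetical sketch of the proposed policy; neither the cold-storage value
# nor log.cold.storage.dir exists in Kafka.
log.cleanup.policy=cold-storage
log.cold.storage.dir=/var/kafka/cold-staging
```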