[jira] [Created] (KAFKA-5478) TransactionsTest.testFencingOnAddPartitions Timeout
Jason Gustafson created KAFKA-5478: -- Summary: TransactionsTest.testFencingOnAddPartitions Timeout Key: KAFKA-5478 URL: https://issues.apache.org/jira/browse/KAFKA-5478 Project: Kafka Issue Type: Sub-task Reporter: Jason Gustafson Assignee: Jason Gustafson Spotted here: https://builds.apache.org/job/kafka-pr-jdk8-scala2.12/5468/console. {code} kafka.api.TransactionsTest > testFencingOnAddPartitions STARTED Build timed out (after 180 minutes). Marking the build as aborted. Build was aborted {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (KAFKA-5472) Connector validate REST endpoint returning duplicate entries in "groups"
[ https://issues.apache.org/jira/browse/KAFKA-5472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Randall Hauch updated KAFKA-5472: - Fix Version/s: 0.11.0.0 > Connector validate REST endpoint returning duplicate entries in "groups" > > > Key: KAFKA-5472 > URL: https://issues.apache.org/jira/browse/KAFKA-5472 > Project: Kafka > Issue Type: Bug > Components: KafkaConnect >Affects Versions: 0.11.0.0 >Reporter: Randall Hauch >Assignee: Randall Hauch >Priority: Blocker > Fix For: 0.11.0.0 > > > When calling the REST endpoint to validate a connector configuration, the > response includes duplicate values for the {{groups}} field: > {code:javascript} > { > // ... > groups: ["Common", "Transforms", "Common", "Transforms"] > } > {code} > This is a regression and may cause issues for any tooling that is using the > Kafka Connect REST API. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (KAFKA-5472) Connector validate REST endpoint returning duplicate entries in "groups"
[ https://issues.apache.org/jira/browse/KAFKA-5472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16055019#comment-16055019 ] ASF GitHub Bot commented on KAFKA-5472: --- GitHub user rhauch opened a pull request: https://github.com/apache/kafka/pull/3379 KAFKA-5472 Eliminated duplicate group names when validating connector results Kafka Connect was adding duplicate group names in the response from the REST API's validation of connector configurations. This fixes the duplicates and maintains the order of the `ConfigDef` objects so that the `ConfigValue` results are in the same order. This is a blocker and should be merged to 0.11.0. You can merge this pull request into a Git repository by running: $ git pull https://github.com/rhauch/kafka KAFKA-5472 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/kafka/pull/3379.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3379 commit d8e6168584cab68ff0844b6d18d8cdb79792427b Author: Randall Hauch Date: 2017-06-20T00:52:28Z KAFKA-5472 Eliminated duplicate group names when validating connector results > Connector validate REST endpoint returning duplicate entries in "groups" > > > Key: KAFKA-5472 > URL: https://issues.apache.org/jira/browse/KAFKA-5472 > Project: Kafka > Issue Type: Bug > Components: KafkaConnect >Affects Versions: 0.11.0.0 >Reporter: Randall Hauch >Assignee: Randall Hauch >Priority: Blocker > > When calling the REST endpoint to validate a connector configuration, the > response includes duplicate values for the {{groups}} field: > {code:javascript} > { > // ... > groups: ["Common", "Transforms", "Common", "Transforms"] > } > {code} > This is a regression and may cause issues for any tooling that is using the > Kafka Connect REST API. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
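The fix in this PR boils down to deduplicating group names while preserving their first-seen order. A minimal sketch of that idea (the class and method names below are hypothetical, not the actual Kafka Connect code from the patch):

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;

// Hypothetical sketch of the dedup-while-preserving-order idea behind the fix;
// not the actual code from PR 3379.
public class GroupDedup {
    static List<String> dedupPreservingOrder(List<String> groups) {
        // LinkedHashSet drops duplicates while keeping first-seen order
        return new ArrayList<>(new LinkedHashSet<>(groups));
    }

    public static void main(String[] args) {
        List<String> groups = List.of("Common", "Transforms", "Common", "Transforms");
        System.out.println(dedupPreservingOrder(groups)); // [Common, Transforms]
    }
}
```

With this approach the duplicated {{groups}} field from the bug report collapses back to ["Common", "Transforms"] without reordering anything.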
[jira] [Commented] (KAFKA-5477) TransactionalProducer sleeps unnecessarily long during back to back transactions
[ https://issues.apache.org/jira/browse/KAFKA-5477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16054989#comment-16054989 ] ASF GitHub Bot commented on KAFKA-5477: --- GitHub user apurvam opened a pull request: https://github.com/apache/kafka/pull/3377 KAFKA-5477: Lower retryBackoff for AddPartitionsRequest This patch lowers the retry backoff when receiving a CONCURRENT_TRANSACTIONS error from an AddPartitions request. The default of 100ms would mean that back to back transactions would be at least 100ms long, making things too slow. You can merge this pull request into a Git repository by running: $ git pull https://github.com/apurvam/kafka HOTFIX-lower-retry-for-add-partitions Alternatively you can review and apply these changes as the patch at: https://github.com/apache/kafka/pull/3377.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3377 commit 0d676688e7ed9a8d63189eb704143e62752707cc Author: Apurva Mehta Date: 2017-06-20T00:36:28Z Lower retryBackoff when receiving a CONCURRENT_TRANSACTIONS error from an AddPartitions request > TransactionalProducer sleeps unnecessarily long during back to back > transactions > > > Key: KAFKA-5477 > URL: https://issues.apache.org/jira/browse/KAFKA-5477 > Project: Kafka > Issue Type: Bug >Affects Versions: 0.11.0.0 >Reporter: Apurva Mehta >Assignee: Apurva Mehta > > I am running some perf tests for EOS and there is a severe perf impact with > our default configs. > Here is the issue. > # When we do a commit transaction, the producer sends an `EndTxn` request to > the coordinator. The coordinator writes the `PrepareCommit` message to the > transaction log and then returns the response to the client. It writes the > transaction markers and the final 'CompleteCommit' message asynchronously. 
> # In the meantime, if the client starts another transaction, it will send an > `AddPartitions` request on the next `Sender.run` loop. If the markers haven't > been written yet, then the coordinator will return a retriable > `CONCURRENT_TRANSACTIONS` error to the client. > # The current behavior in the producer is to sleep for `retryBackoffMs` > before retrying the request. The current default for this is 100ms. So the > producer will sleep for 100ms before sending the `AddPartitions` again. This > puts a floor on the latency for back to back transactions. > The impact: Back to back transactions (the typical use case for streams) would > have a latency floor of 100ms. > Ideally, we don't want to sleep the full 100ms in this particular case, > because the retry is 'expected'. > The options are: > # do nothing, let streams override the retry.backoff.ms in their producer to > 10 when EOS is enabled (since they have a HOTFIX patch out anyway). > # Introduce a special 'transactionRetryBackoffMs' non-configurable variable > and hard code that to a low value which applies to all transactional requests. > # do nothing and fix it properly in 0.11.0.1 > Option 2 as stated is a 1 line fix. If we want to lower the retry just for > this particular error, it would be a slightly bigger change (10-15 lines). -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (KAFKA-5477) TransactionalProducer sleeps unnecessarily long during back to back transactions
[ https://issues.apache.org/jira/browse/KAFKA-5477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apurva Mehta updated KAFKA-5477: Description: I am running some perf tests for EOS and there is a severe perf impact with our default configs. Here is the issue. # When we do a commit transaction, the producer sends an `EndTxn` request to the coordinator. The coordinator writes the `PrepareCommit` message to the transaction log and then returns the response to the client. It writes the transaction markers and the final 'CompleteCommit' message asynchronously. # In the meantime, if the client starts another transaction, it will send an `AddPartitions` request on the next `Sender.run` loop. If the markers haven't been written yet, then the coordinator will return a retriable `CONCURRENT_TRANSACTIONS` error to the client. # The current behavior in the producer is to sleep for `retryBackoffMs` before retrying the request. The current default for this is 100ms. So the producer will sleep for 100ms before sending the `AddPartitions` again. This puts a floor on the latency for back to back transactions. The impact: Back to back transactions (the typical use case for streams) would have a latency floor of 100ms. Ideally, we don't want to sleep the full 100ms in this particular case, because the retry is 'expected'. The options are: # do nothing, let streams override the retry.backoff.ms in their producer to 10 when EOS is enabled (since they have a HOTFIX patch out anyway). # Introduce a special 'transactionRetryBackoffMs' non-configurable variable and hard code that to a low value which applies to all transactional requests. # do nothing and fix it properly in 0.11.0.1 Option 2 as stated is a 1 line fix. If we want to lower the retry just for this particular error, it would be a slightly bigger change (10-15 lines). was: I am running some perf tests for EOS and there is a severe perf impact with our default configs. Here is the issue. 
# When we do a commit transaction, the producer sends an `EndTxn` request to the coordinator. The coordinator writes the `PrepareCommit` message to the transaction log and then returns the response to the client. It writes the transaction markers and the final 'CompleteCommit' message asynchronously. # In the meantime, if the client starts another transaction, it will send an `AddPartitions` request on the next `Sender.run` loop. If the markers haven't been written yet, then the coordinator will return a retriable `CONCURRENT_TRANSACTIONS` error to the client. # The current behavior in the producer is to sleep for `retryBackoffMs` before retrying the request. The current default for this is 100ms. So the producer will sleep for 100ms before sending the `AddPartitions` again. This puts a floor on the latency for back to back transactions. Ideally, we don't want to sleep the full 100ms in this particular case, because the retry is 'expected'. The options are: # do nothing, let streams override the retry.backoff.ms in their producer to 10 when EOS is enabled (since they have a HOTFIX patch out anyway). # Introduce a special 'transactionRetryBackoffMs' non-configurable variable and hard code that to a low value which applies to all transactional requests. # do nothing and fix it properly in 0.11.0.1 > TransactionalProducer sleeps unnecessarily long during back to back > transactions > > > Key: KAFKA-5477 > URL: https://issues.apache.org/jira/browse/KAFKA-5477 > Project: Kafka > Issue Type: Bug >Affects Versions: 0.11.0.0 >Reporter: Apurva Mehta >Assignee: Apurva Mehta > > I am running some perf tests for EOS and there is a severe perf impact with > our default configs. > Here is the issue. > # When we do a commit transaction, the producer sends an `EndTxn` request to > the coordinator. The coordinator writes the `PrepareCommit` message to the > transaction log and then returns the response to the client. 
It writes the > transaction markers and the final 'CompleteCommit' message asynchronously. > # In the meantime, if the client starts another transaction, it will send an > `AddPartitions` request on the next `Sender.run` loop. If the markers haven't > been written yet, then the coordinator will return a retriable > `CONCURRENT_TRANSACTIONS` error to the client. > # The current behavior in the producer is to sleep for `retryBackoffMs` > before retrying the request. The current default for this is 100ms. So the > producer will sleep for 100ms before sending the `AddPartitions` again. This > puts a floor on the latency for back to back transactions. > The impact: Back to back transactions (the typical use case for streams) would > have a latency floor of 100ms. > Ideally, we don't want to sleep the full 100ms
[jira] [Updated] (KAFKA-5477) TransactionalProducer sleeps unnecessarily long during back to back transactions
[ https://issues.apache.org/jira/browse/KAFKA-5477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apurva Mehta updated KAFKA-5477: Description: I am running some perf tests for EOS and there is a severe perf impact with our default configs. Here is the issue. # When we do a commit transaction, the producer sends an `EndTxn` request to the coordinator. The coordinator writes the `PrepareCommit` message to the transaction log and then returns the response to the client. It writes the transaction markers and the final 'CompleteCommit' message asynchronously. # In the meantime, if the client starts another transaction, it will send an `AddPartitions` request on the next `Sender.run` loop. If the markers haven't been written yet, then the coordinator will return a retriable `CONCURRENT_TRANSACTIONS` error to the client. # The current behavior in the producer is to sleep for `retryBackoffMs` before retrying the request. The current default for this is 100ms. So the producer will sleep for 100ms before sending the `AddPartitions` again. This puts a floor on the latency for back to back transactions. Ideally, we don't want to sleep the full 100ms in this particular case, because the retry is 'expected'. The options are: # do nothing, let streams override the retry.backoff.ms in their producer to 10 when EOS is enabled (since they have a HOTFIX patch out anyway). # Introduce a special 'transactionRetryBackoffMs' non-configurable variable and hard code that to a low value which applies to all transactional requests. # do nothing and fix it properly in 0.11.0.1 was: I am running some perf tests for EOS and there is a severe perf impact with our default configs. Here is the issue. # When we do a commit transaction, the producer sends an `EndTxn` request to the coordinator. The coordinator writes the `PrepareCommit` message to the transaction log and then returns the response to the client. 
It writes the transaction markers and the final 'CompleteCommit' message asynchronously. # In the meantime, if the client starts another transaction, it will send an `AddPartitions` request on the next `Sender.run` loop. If the markers haven't been written yet, then the coordinator will return a retriable `CONCURRENT_TRANSACTIONS` error to the client. # The current behavior in the producer is to sleep for `retryBackoffMs` before retrying the request. The current default for this is 100ms. So the producer will sleep for 100ms before sending the `AddPartitions` again. This puts a floor on the latency for transactions. Ideally, we don't want to sleep the full 100ms in this particular case, because the retry is 'expected'. The options are: # do nothing, let streams override the retry.backoff.ms in their producer to 10 when EOS is enabled (since they have a HOTFIX patch out anyway). # Introduce a special 'transactionRetryBackoffMs' non-configurable variable and hard code that to a low value which applies to all transactional requests. # do nothing and fix it properly in 0.11.0.1 > TransactionalProducer sleeps unnecessarily long during back to back > transactions > > > Key: KAFKA-5477 > URL: https://issues.apache.org/jira/browse/KAFKA-5477 > Project: Kafka > Issue Type: Bug >Affects Versions: 0.11.0.0 >Reporter: Apurva Mehta >Assignee: Apurva Mehta > > I am running some perf tests for EOS and there is a severe perf impact with > our default configs. > Here is the issue. > # When we do a commit transaction, the producer sends an `EndTxn` request to > the coordinator. The coordinator writes the `PrepareCommit` message to the > transaction log and then returns the response to the client. It writes the > transaction markers and the final 'CompleteCommit' message asynchronously. > # In the meantime, if the client starts another transaction, it will send an > `AddPartitions` request on the next `Sender.run` loop. 
If the markers haven't > been written yet, then the coordinator will return a retriable > `CONCURRENT_TRANSACTIONS` error to the client. > # The current behavior in the producer is to sleep for `retryBackoffMs` > before retrying the request. The current default for this is 100ms. So the > producer will sleep for 100ms before sending the `AddPartitions` again. This > puts a floor on the latency for back to back transactions. > Ideally, we don't want to sleep the full 100ms in this particular case, > because the retry is 'expected'. > The options are: > # do nothing, let streams override the retry.backoff.ms in their producer to > 10 when EOS is enabled (since they have a HOTFIX patch out anyway). > # Introduce a special 'transactionRetryBackoffMs' non-configurable variable > and hard code that to a low value which applies to all transactional requests
[jira] [Created] (KAFKA-5477) TransactionalProducer sleeps unnecessarily long during back to back transactions
Apurva Mehta created KAFKA-5477: --- Summary: TransactionalProducer sleeps unnecessarily long during back to back transactions Key: KAFKA-5477 URL: https://issues.apache.org/jira/browse/KAFKA-5477 Project: Kafka Issue Type: Bug Affects Versions: 0.11.0.0 Reporter: Apurva Mehta Assignee: Apurva Mehta I am running some perf tests for EOS and there is a severe perf impact with our default configs. Here is the issue. # When we do a commit transaction, the producer sends an `EndTxn` request to the coordinator. The coordinator writes the `PrepareCommit` message to the transaction log and then returns the response to the client. It writes the transaction markers and the final 'CompleteCommit' message asynchronously. # In the meantime, if the client starts another transaction, it will send an `AddPartitions` request on the next `Sender.run` loop. If the markers haven't been written yet, then the coordinator will return a retriable `CONCURRENT_TRANSACTIONS` error to the client. # The current behavior in the producer is to sleep for `retryBackoffMs` before retrying the request. The current default for this is 100ms. So the producer will sleep for 100ms before sending the `AddPartitions` again. This puts a floor on the latency for transactions. Ideally, we don't want to sleep the full 100ms in this particular case, because the retry is 'expected'. The options are: # do nothing, let streams override the retry.backoff.ms in their producer to 10 when EOS is enabled (since they have a HOTFIX patch out anyway). # Introduce a special 'transactionRetryBackoffMs' non-configurable variable and hard code that to a low value which applies to all transactional requests. # do nothing and fix it properly in 0.11.0.1 -- This message was sent by Atlassian JIRA (v6.4.14#64029)
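Option 1 above is a plain client-side config override. A hedged sketch of what that override would look like on a transactional producer, using the standard producer property names mentioned in the ticket (the broker address and transactional.id values are placeholders):

```java
import java.util.Properties;

// Sketch of option 1: lower retry.backoff.ms so the expected
// CONCURRENT_TRANSACTIONS retry between back-to-back transactions
// waits ~10ms instead of the 100ms default.
public class TxnProducerProps {
    static Properties build() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder broker
        props.put("transactional.id", "my-txn-app");      // placeholder id
        props.put("enable.idempotence", "true");          // required for transactions
        props.put("retry.backoff.ms", "10");              // default is 100
        return props;
    }

    public static void main(String[] args) {
        System.out.println(build().getProperty("retry.backoff.ms")); // 10
    }
}
```

These properties would then be passed to the KafkaProducer constructor; the only change relative to defaults is the lower backoff.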
[jira] [Created] (KAFKA-5476) Implement a system test that creates network partitions
Colin P. McCabe created KAFKA-5476: -- Summary: Implement a system test that creates network partitions Key: KAFKA-5476 URL: https://issues.apache.org/jira/browse/KAFKA-5476 Project: Kafka Issue Type: Test Reporter: Colin P. McCabe Assignee: Colin P. McCabe Implement a system test that creates network partitions -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (KAFKA-5473) handle ZK session expiration properly when a new session can't be established
[ https://issues.apache.org/jira/browse/KAFKA-5473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16054960#comment-16054960 ] Jun Rao commented on KAFKA-5473: Yes, the issue has been there since KAFKA-2405 was committed. So, it's affecting all versions since 0.9.0.0. > handle ZK session expiration properly when a new session can't be established > - > > Key: KAFKA-5473 > URL: https://issues.apache.org/jira/browse/KAFKA-5473 > Project: Kafka > Issue Type: Sub-task >Affects Versions: 0.9.0.0 >Reporter: Jun Rao > > In https://issues.apache.org/jira/browse/KAFKA-2405, we changed the logic in > handling ZK session expiration a bit. If a new ZK session can't be > established after session expiration, we just log an error and continue. > However, this can leave the broker in a bad state since it's up, but not > registered from the controller's perspective. Replicas on this broker may > never be in sync. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (KAFKA-5465) FetchResponse v0 does not return any messages when max_bytes smaller than v2 message set
[ https://issues.apache.org/jira/browse/KAFKA-5465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16054956#comment-16054956 ] Jeff Widman commented on KAFKA-5465: I agree with the "WontFix" resolution; however, I'm scared that I'll forget about this when upgrading our cluster. Can a link to this issue be added to the Upgrade instructions for 0.11? > FetchResponse v0 does not return any messages when max_bytes smaller than v2 > message set > - > > Key: KAFKA-5465 > URL: https://issues.apache.org/jira/browse/KAFKA-5465 > Project: Kafka > Issue Type: Bug >Affects Versions: 0.11.0.0 >Reporter: Dana Powers >Assignee: Jason Gustafson >Priority: Blocker > Fix For: 0.11.0.1 > > > In prior releases, when consuming uncompressed messages, FetchResponse v0 > will return a message if it is smaller than the max_bytes sent in the > FetchRequest. In 0.11.0.0 RC0, when messages are stored as v2 internally, the > response will be empty unless the full MessageSet is smaller than max_bytes. > In some configurations, this may cause old consumers to get stuck on large > messages where previously they were able to make progress one message at a > time. > For example, when I produce 10 5KB messages using ProduceRequest v0 and then > attempt FetchRequest v0 with partition max bytes = 6KB (larger than a single > message but smaller than all 10 messages together), I get an empty message > set from 0.11.0.0. Previous brokers would have returned a single message. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
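The arithmetic behind Dana's example can be spelled out directly. This is only an illustration of the semantics described in the report (sizes taken from it), not broker code:

```java
// 10 messages of 5KB each, partition max_bytes of 6KB. Under the old v0
// semantics a single message that fits could be returned; with v2 message
// sets the whole batch must fit, so the fetch response comes back empty.
public class FetchSizeCheck {
    public static void main(String[] args) {
        int messageSize = 5 * 1024;
        int numMessages = 10;
        int maxBytes = 6 * 1024;

        // old behavior: the first message fits, so one message is returned
        boolean singleMessageFits = messageSize <= maxBytes;
        // new behavior: the full 50KB message set does not fit, so nothing is returned
        boolean wholeBatchFits = (numMessages * messageSize) <= maxBytes;

        System.out.println("single message fits: " + singleMessageFits);
        System.out.println("whole v2 batch fits: " + wholeBatchFits);
    }
}
```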
[jira] [Commented] (KAFKA-5473) handle ZK session expiration properly when a new session can't be established
[ https://issues.apache.org/jira/browse/KAFKA-5473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16054955#comment-16054955 ] Jeff Widman commented on KAFKA-5473: Saw this was tagged as 0.9.0... Does this affect any of the 0.10 or 0.11 releases? > handle ZK session expiration properly when a new session can't be established > - > > Key: KAFKA-5473 > URL: https://issues.apache.org/jira/browse/KAFKA-5473 > Project: Kafka > Issue Type: Sub-task >Affects Versions: 0.9.0.0 >Reporter: Jun Rao > > In https://issues.apache.org/jira/browse/KAFKA-2405, we changed the logic in > handling ZK session expiration a bit. If a new ZK session can't be > established after session expiration, we just log an error and continue. > However, this can leave the broker in a bad state since it's up, but not > registered from the controller's perspective. Replicas on this broker may > never be in sync. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (KAFKA-4201) Add an --assignment-strategy option to new-consumer-based Mirror Maker
[ https://issues.apache.org/jira/browse/KAFKA-4201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16054915#comment-16054915 ] Vahid Hashemian commented on KAFKA-4201: [~johnma] It seems to me that the focus of [KAFKA-2111|https://issues.apache.org/jira/browse/KAFKA-2111] is on existing arguments of Kafka tools. [This JIRA|https://issues.apache.org/jira/browse/KAFKA-4201] however introduces a new argument. I would suggest handling existing and approved arguments in [KAFKA-2111|https://issues.apache.org/jira/browse/KAFKA-2111]. If necessary, the [existing PR for this JIRA|https://github.com/apache/kafka/pull/1912] can later be rebased to comply with any standardization introduced in [KAFKA-2111|https://issues.apache.org/jira/browse/KAFKA-2111]. I hope it makes sense. Please advise if you disagree. Thanks. > Add an --assignment-strategy option to new-consumer-based Mirror Maker > -- > > Key: KAFKA-4201 > URL: https://issues.apache.org/jira/browse/KAFKA-4201 > Project: Kafka > Issue Type: Improvement >Reporter: Vahid Hashemian >Assignee: Vahid Hashemian > Fix For: 0.11.1.0 > > > The default assignment strategy in mirror maker will be changed from range to > round robin in an upcoming release ([see > KAFKA-3818|https://issues.apache.org/jira/browse/KAFKA-3818]). In order to > make it easier for users to change the assignment strategy, add an > {{--assignment-strategy}} option to Mirror Maker command line tool. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
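For context, the proposed option would ultimately map to the new consumer's partition.assignment.strategy setting. A hedged sketch of the consumer-side property involved (the broker address and group id are placeholders; RoundRobinAssignor is the standard round-robin assignor class in the Java consumer):

```java
import java.util.Properties;

// Illustrative only: the consumer property a Mirror Maker
// --assignment-strategy option would control.
public class MirrorConsumerProps {
    static Properties build() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "source-cluster:9092"); // placeholder
        props.put("group.id", "mirror-maker-group");           // placeholder
        props.put("partition.assignment.strategy",
                "org.apache.kafka.clients.consumer.RoundRobinAssignor");
        return props;
    }

    public static void main(String[] args) {
        System.out.println(build().getProperty("partition.assignment.strategy"));
    }
}
```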
[jira] [Created] (KAFKA-5475) Connector config validation REST API endpoint not including fields for transformations
Ewen Cheslack-Postava created KAFKA-5475: Summary: Connector config validation REST API endpoint not including fields for transformations Key: KAFKA-5475 URL: https://issues.apache.org/jira/browse/KAFKA-5475 Project: Kafka Issue Type: Bug Components: KafkaConnect Reporter: Ewen Cheslack-Postava Priority: Blocker Fix For: 0.11.0.0 An issue with how embedded transformation configurations are included seems to have been introduced during 0.11.0.0. We are no longer seeing the `transforms..type` being included in the validation output. {code} curl -X PUT -H "Content-Type: application/json" --data '{"connector.class": "org.apache.kafka.connect.file.FileStreamSourceConnector", "transforms": "foo,bar"}' http://localhost:8083/connector-plugins/org.apache.kafka.connect.file.FileStreamSourceConnector/config/validate | jq % Total% Received % Xferd Average Speed TimeTime Time Current Dload Upload Total SpentLeft Speed 100 3428 100 3325 100 103 344k 10917 --:--:-- --:--:-- --:--:-- 360k { "name": "org.apache.kafka.connect.file.FileStreamSourceConnector", "error_count": 1, "groups": [ "Common", "Transforms" ], "configs": [ { "definition": { "name": "value.converter", "type": "CLASS", "required": false, "default_value": null, "importance": "LOW", "documentation": "Converter class used to convert between Kafka Connect format and the serialized form that is written to Kafka. This controls the format of the values in messages written to or read from Kafka, and since this is independent of connectors it allows any connector to work with any serialization format. 
Examples of common formats include JSON and Avro.", "group": "Common", "width": "SHORT", "display_name": "Value converter class", "dependents": [], "order": 5 }, "value": { "name": "value.converter", "value": null, "recommended_values": [], "errors": [], "visible": true } }, { "definition": { "name": "name", "type": "STRING", "required": true, "default_value": null, "importance": "HIGH", "documentation": "Globally unique name to use for this connector.", "group": "Common", "width": "MEDIUM", "display_name": "Connector name", "dependents": [], "order": 1 }, "value": { "name": "name", "value": null, "recommended_values": [], "errors": [ "Missing required configuration \"name\" which has no default value." ], "visible": true } }, { "definition": { "name": "tasks.max", "type": "INT", "required": false, "default_value": "1", "importance": "HIGH", "documentation": "Maximum number of tasks to use for this connector.", "group": "Common", "width": "SHORT", "display_name": "Tasks max", "dependents": [], "order": 3 }, "value": { "name": "tasks.max", "value": "1", "recommended_values": [], "errors": [], "visible": true } }, { "definition": { "name": "connector.class", "type": "STRING", "required": true, "default_value": null, "importance": "HIGH", "documentation": "Name or alias of the class for this connector. Must be a subclass of org.apache.kafka.connect.connector.Connector. 
If the connector is org.apache.kafka.connect.file.FileStreamSinkConnector, you can either specify this full name, or use \"FileStreamSink\" or \"FileStreamSinkConnector\" to make the configuration a bit shorter", "group": "Common", "width": "LONG", "display_name": "Connector class", "dependents": [], "order": 2 }, "value": { "name": "connector.class", "value": "org.apache.kafka.connect.file.FileStreamSourceConnector", "recommended_values": [], "errors": [], "visible": true } }, { "definition": { "name": "key.converter", "type": "CLASS", "required": false, "default_value": null, "importance": "LOW", "documentation": "Converter class used to convert between Kafka Connect format and the serialized form that is written to Kafka. This controls the format of the keys in messages written to or read from Kafka, and since this is independent of connectors it allows any connector to work with any serialization format. Examples of common formats include JSON and Avro.", "group": "Common", "width": "SHORT", "display_name
[jira] [Commented] (KAFKA-4437) Incremental Batch Processing for Kafka Streams
[ https://issues.apache.org/jira/browse/KAFKA-4437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16054906#comment-16054906 ] Matthias J. Sax commented on KAFKA-4437: Thanks [~avi_parkassist]. Updated :) > Incremental Batch Processing for Kafka Streams > -- > > Key: KAFKA-4437 > URL: https://issues.apache.org/jira/browse/KAFKA-4437 > Project: Kafka > Issue Type: New Feature > Components: streams >Reporter: Matthias J. Sax >Assignee: Matthias J. Sax > > We want to add an “auto stop” feature that terminates a stream application > when it has processed all the data that was newly available at the time the > application started (up to the current end-of-log, i.e., the current high watermark). > This allows chopping the (infinite) log into finite chunks, where each run of > the application processes one chunk. This feature allows for incremental > batch-like processing; think "start-process-stop-restart-process-stop-..." > For details see KIP-95: > https://cwiki.apache.org/confluence/display/KAFKA/KIP-95%3A+Incremental+Batch+Processing+for+Kafka+Streams -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (KAFKA-4857) Use AdminClient in Kafka Streams
[ https://issues.apache.org/jira/browse/KAFKA-4857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16054904#comment-16054904 ] Matthias J. Sax commented on KAFKA-4857: Doing the KIP is part of the JIRA, so it's your call to do it :) Sorry if this was not clear. Please have a look at https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals Let us know if you need any help. However, as we have not yet made the switch to {{AdminClient}}, I would recommend deferring this JIRA until that internal change is done. Otherwise, we would need to update {{KafkaClientSupplier}} twice. > Use AdminClient in Kafka Streams > > > Key: KAFKA-4857 > URL: https://issues.apache.org/jira/browse/KAFKA-4857 > Project: Kafka > Issue Type: Bug > Components: streams >Reporter: Matthias J. Sax >Assignee: Sharad > Labels: needs-kip > > Streams uses {{KafkaClientSupplier}} to get > consumer/restore-consumer/producer clients. Streams also uses one more client > for admin purposes, namely {{StreamsKafkaClient}}, which is instantiated > "manually". > With the newly upcoming {{AdminClient}} from KIP-117, we can simplify (or > even replace) {{StreamsKafkaClient}} with the new {{AdminClient}}. We > furthermore want to unify how the client is generated and extend > {{KafkaClientSupplier}} with a method that returns this client. > As this is a public API change, a KIP is required. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (KAFKA-4201) Add an --assignment-strategy option to new-consumer-based Mirror Maker
[ https://issues.apache.org/jira/browse/KAFKA-4201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16054902#comment-16054902 ] Mariam John commented on KAFKA-4201: Vahid, Can I assign this to myself since I am working on KAFKA-2111 which is related to this? Thanks. > Add an --assignment-strategy option to new-consumer-based Mirror Maker > -- > > Key: KAFKA-4201 > URL: https://issues.apache.org/jira/browse/KAFKA-4201 > Project: Kafka > Issue Type: Improvement >Reporter: Vahid Hashemian >Assignee: Vahid Hashemian > Fix For: 0.11.1.0 > > > The default assignment strategy in mirror maker will be changed from range to > round robin in an upcoming release ([see > KAFKA-3818|https://issues.apache.org/jira/browse/KAFKA-3818]). In order to > make it easier for users to change the assignment strategy, add an > {{--assignment-strategy}} option to Mirror Maker command line tool. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (KAFKA-4437) Incremental Batch Processing for Kafka Streams
[ https://issues.apache.org/jira/browse/KAFKA-4437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16054888#comment-16054888 ] Avi Flax commented on KAFKA-4437: - The KIP page says: bq. Released: 0.10.2.0 Seems like it should be updated. > Incremental Batch Processing for Kafka Streams > -- > > Key: KAFKA-4437 > URL: https://issues.apache.org/jira/browse/KAFKA-4437 > Project: Kafka > Issue Type: New Feature > Components: streams >Reporter: Matthias J. Sax >Assignee: Matthias J. Sax > > We want to add an “auto stop” feature that terminates a stream application > when it has processed all the data that was newly available at the time the > application started (i.e., up to the current end-of-log / high watermark). > This allows chopping the (infinite) log into finite chunks, where each run of > the application processes one chunk. This feature allows for incremental > batch-like processing; think "start-process-stop-restart-process-stop-..." > For details see KIP-95: > https://cwiki.apache.org/confluence/display/KAFKA/KIP-95%3A+Incremental+Batch+Processing+for+Kafka+Streams -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (KAFKA-5474) Streams StandbyTask should no checkpoint on commit if EOS is enabled
[ https://issues.apache.org/jira/browse/KAFKA-5474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16054804#comment-16054804 ] ASF GitHub Bot commented on KAFKA-5474: --- GitHub user mjsax opened a pull request: https://github.com/apache/kafka/pull/3375 KAFKA-5474: Streams StandbyTask should no checkpoint on commit if EOS is enabled - actual fix for `StandbyTask#commit()` Additionally (for debugging): - EOS test, does not report "expected" value correctly - `StreamThread` does not report "standby tasks" correctly - add `IntegerDecoder` (to be use with `kafka.tools.DumpLogSegments`) You can merge this pull request into a Git repository by running: $ git pull https://github.com/mjsax/kafka kafka-5474-eos-standby-task Alternatively you can review and apply these changes as the patch at: https://github.com/apache/kafka/pull/3375.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3375 commit 89ea5aa85659177907b6ab14abce1a37cb24f21a Author: Matthias J. Sax Date: 2017-06-19T21:47:48Z KAFKA-5474: Streams StandbyTask should no checkpoint on commit if EOS is enabled commit 2b6349d61c2959c671d3f5157fbc4aa40c0f47d1 Author: Matthias J. Sax Date: 2017-06-19T21:49:28Z Debug, cleanup, and minor fixes > Streams StandbyTask should no checkpoint on commit if EOS is enabled > > > Key: KAFKA-5474 > URL: https://issues.apache.org/jira/browse/KAFKA-5474 > Project: Kafka > Issue Type: Bug > Components: streams >Affects Versions: 0.11.0.0 >Reporter: Matthias J. Sax >Assignee: Matthias J. Sax >Priority: Blocker > Labels: exactly-once > Fix For: 0.11.0.0 > > > Discovered by system test {{streams_eos_test#test_failure_and_recovery}} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (KAFKA-5474) Streams StandbyTask should no checkpoint on commit if EOS is enabled
Matthias J. Sax created KAFKA-5474: -- Summary: Streams StandbyTask should no checkpoint on commit if EOS is enabled Key: KAFKA-5474 URL: https://issues.apache.org/jira/browse/KAFKA-5474 Project: Kafka Issue Type: Bug Components: streams Affects Versions: 0.11.0.0 Reporter: Matthias J. Sax Assignee: Matthias J. Sax Priority: Blocker Fix For: 0.11.0.0 Discovered by system test {{streams_eos_test#test_failure_and_recovery}} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (KAFKA-5473) handle ZK session expiration properly when a new session can't be established
[ https://issues.apache.org/jira/browse/KAFKA-5473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16054781#comment-16054781 ] Jun Rao commented on KAFKA-5473: One thing to note is that the construction of the ZooKeeper object can throw an exception. org.I0Itec.zkclient.exception.ZkException: Unable to connect to zookeeper.host.xxx.xxx:2181 at org.I0Itec.zkclient.ZkConnection.connect(ZkConnection.java:72) at org.I0Itec.zkclient.ZkClient.reconnect(ZkClient.java:1289) at org.I0Itec.zkclient.ZkClient.processStateChanged(ZkClient.java:721) at org.I0Itec.zkclient.ZkClient.process(ZkClient.java:632) at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:530) at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:505) Caused by: java.net.UnknownHostException: zookeeper.host.xxx.xxx at java.net.InetAddress.getAllByName0(InetAddress.java:1280) at java.net.InetAddress.getAllByName(InetAddress.java:1192) at java.net.InetAddress.getAllByName(InetAddress.java:1126) at org.apache.zookeeper.client.StaticHostProvider.<init>(StaticHostProvider.java:61) at org.apache.zookeeper.ZooKeeper.<init>(ZooKeeper.java:445) at org.apache.zookeeper.ZooKeeper.<init>(ZooKeeper.java:380) at org.I0Itec.zkclient.ZkConnection.connect(ZkConnection.java:70) > handle ZK session expiration properly when a new session can't be established > - > > Key: KAFKA-5473 > URL: https://issues.apache.org/jira/browse/KAFKA-5473 > Project: Kafka > Issue Type: Sub-task >Affects Versions: 0.9.0.0 >Reporter: Jun Rao > > In https://issues.apache.org/jira/browse/KAFKA-2405, we changed the logic for > handling ZK session expiration a bit. If a new ZK session can't be > established after session expiration, we just log an error and continue. > However, this can leave the broker in a bad state since it's up, but not > registered from the controller's perspective. Replicas on this broker may > never be in sync. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (KAFKA-2405) KafkaHealthCheck kills the JVM in handleSessionEstablishmentError
[ https://issues.apache.org/jira/browse/KAFKA-2405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16054764#comment-16054764 ] Jun Rao commented on KAFKA-2405: Just realized this change. This seems to cause the ZK session expiration handling to be worse than before. Filed https://issues.apache.org/jira/browse/KAFKA-5473 to address this. > KafkaHealthCheck kills the JVM in handleSessionEstablishmentError > - > > Key: KAFKA-2405 > URL: https://issues.apache.org/jira/browse/KAFKA-2405 > Project: Kafka > Issue Type: Bug > Components: core >Reporter: jaikiran pai >Assignee: jaikiran pai > Fix For: 0.9.0.0 > > > The current code in KafkaHealthCheck in trunk does this: > {code} > override def handleSessionEstablishmentError(error: Throwable): Unit = { > fatal("Could not establish session with zookeeper", error) > System.exit(-1) > } > {code} > thus terminating the JVM. A session establishment error shouldn't cause the > JVM to terminate. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (KAFKA-5032) Think through implications of max.message.size affecting record batches in message format V2
[ https://issues.apache.org/jira/browse/KAFKA-5032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16054763#comment-16054763 ] ASF GitHub Bot commented on KAFKA-5032: --- GitHub user apurvam opened a pull request: https://github.com/apache/kafka/pull/3374 KAFKA-5032: Update the docs for message size configs across the board Before 0.11, we used to have limits for maximum message size on the producer, broker, and consumer side. From 0.11 onward, these limits apply to record batches as a whole. This patch updates the documentation of the configs to make this explicit. A separate patch will have more extensive upgrade notes to tie all the changes together in one narrative. You can merge this pull request into a Git repository by running: $ git pull https://github.com/apurvam/kafka KAFKA-5032-message-size-docs Alternatively you can review and apply these changes as the patch at: https://github.com/apache/kafka/pull/3374.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3374 commit b8e1379a54d21141a22694b2aa6d422709bfb89f Author: Apurva Mehta Date: 2017-06-19T21:20:03Z Change references to 'message' in the size options to 'record batch', since everything is written and read in batches in the current version. > Think through implications of max.message.size affecting record batches in > message format V2 > > > Key: KAFKA-5032 > URL: https://issues.apache.org/jira/browse/KAFKA-5032 > Project: Kafka > Issue Type: Sub-task > Components: clients, core, producer >Reporter: Ismael Juma >Assignee: Apurva Mehta >Priority: Critical > Labels: documentation, exactly-once > Fix For: 0.11.0.0 > > > It's worth noting that the new behaviour for uncompressed messages is the > same as the existing behaviour for compressed messages. > A few things to think about: > 1. 
Do the producer settings max.request.size and batch.size still make sense > and do we need to update the documentation? My conclusion is that things are > still fine, but we may need to revise the docs. > 2. (Seems like we don't need to do this) Consider changing default max > message set size to include record batch overhead. This is currently defined > as: > {code} > val MessageMaxBytes = 100 + MessageSet.LogOverhead > {code} > We should consider changing it to (I haven't thought it through though): > {code} > val MessageMaxBytes = 100 + DefaultRecordBatch.RECORD_BATCH_OVERHEAD > {code} > 3. When a record batch is too large, we throw RecordTooLargeException, which > is confusing because there's also a RecordBatchTooLargeException. We should > consider renaming these exceptions to make the behaviour clearer. > 4. We should consider deprecating max.message.bytes (server config) and > message.max.bytes (topic config) in favour of configs that make it clear that > we are talking about record batches instead of individual messages. > Part of the work in this JIRA is working out what should be done for 0.11.0.0 > and what can be done later. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Assigned] (KAFKA-5472) Connector validate REST endpoint returning duplicate entries in "groups"
[ https://issues.apache.org/jira/browse/KAFKA-5472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Randall Hauch reassigned KAFKA-5472: Assignee: Randall Hauch Affects Version/s: 0.11.0.0 Priority: Blocker (was: Major) Component/s: KafkaConnect The logic changed recently as [part of the classloader isolation changes|https://github.com/apache/kafka/commit/45f2261763eac5caaebf860daab32ef5337c9293#diff-fad18b064f6a9aa922ad13e6d7e60df3R252], and it looks to me like this introduced duplicate validation of the basic config definitions that contain properties in the "Common" and "Transforms" groups. Prior to the classloading feature, the connector-independent (i.e., basic) configuration definitions were validated, then the connector-specific configurations were validated, and all groups were added to the list of group names. With the classloading feature, however, this logic was changed to validate the connector-independent (i.e., basic) configuration first; the connector-specific configurations are then enriched with the basic configuration definitions, and all of these are validated. That leads to validating the basic configuration definitions twice. There are several approaches to correct this: * Simply remove duplicates from the list of all group names. This would still leave the basic configuration definitions being validated twice, but it is simple and low-risk. * Remove [lines 245-252|https://github.com/apache/kafka/blob/45f2261763eac5caaebf860daab32ef5337c9293/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/AbstractHerder.java#L245-L252] that validate just the basic config definitions, since the remaining code would enrich the connector-specific configuration definitions with the basic config defs and validate all of them. This approach is probably more correct, as long as we ensure that the basic config defs are validated first. 
* Remove the enrichment process so that the second validation step only validates the connector-specific configuration properties. This is closest to the previous behavior. > Connector validate REST endpoint returning duplicate entries in "groups" > > > Key: KAFKA-5472 > URL: https://issues.apache.org/jira/browse/KAFKA-5472 > Project: Kafka > Issue Type: Bug > Components: KafkaConnect >Affects Versions: 0.11.0.0 >Reporter: Randall Hauch >Assignee: Randall Hauch >Priority: Blocker > > When calling the REST endpoint to validate a connector configuration, the > response includes duplicate values for the {{groups}} field: > {code:javascript} > { > // ... > groups: ["Common", "Transforms", "Common", "Transforms"] > } > {code} > This is a regression and may cause issues for any tooling that is using the > Kafka Connect REST API. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
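The first, low-risk option above — dropping duplicates while preserving order — amounts to a one-liner. A minimal sketch (illustrative class and method names, not the actual {{AbstractHerder}} code):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;

class GroupDedup {
    // Drop duplicate group names while keeping first-seen order, so the
    // REST validation response lists each group exactly once.
    static List<String> dedup(List<String> groups) {
        return new ArrayList<>(new LinkedHashSet<>(groups));
    }

    public static void main(String[] args) {
        List<String> groups = Arrays.asList("Common", "Transforms", "Common", "Transforms");
        System.out.println(dedup(groups)); // prints [Common, Transforms]
    }
}
```

Note that this only masks the duplicate work in the response; the second and third options above avoid validating the basic config definitions twice in the first place.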
[jira] [Commented] (KAFKA-5473) handle ZK session expiration properly when a new session can't be established
[ https://issues.apache.org/jira/browse/KAFKA-5473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16054757#comment-16054757 ] Jun Rao commented on KAFKA-5473: The simplest solution is to just fail the broker if the new ZK session can't be established. A more complicated solution is to retry for some time. > handle ZK session expiration properly when a new session can't be established > - > > Key: KAFKA-5473 > URL: https://issues.apache.org/jira/browse/KAFKA-5473 > Project: Kafka > Issue Type: Sub-task >Affects Versions: 0.9.0.0 >Reporter: Jun Rao > > In https://issues.apache.org/jira/browse/KAFKA-2405, we changed the logic for > handling ZK session expiration a bit. If a new ZK session can't be > established after session expiration, we just log an error and continue. > However, this can leave the broker in a bad state since it's up, but not > registered from the controller's perspective. Replicas on this broker may > never be in sync. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
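The "retry for some time" variant could be sketched roughly as below. All names here are hypothetical — this is not the broker's actual session-handling code, just the shape of the idea:

```java
class ZkReconnectSketch {
    // Hypothetical stand-in for whatever establishes a new ZK session.
    interface SessionFactory {
        boolean tryConnect();
    }

    // Retry a bounded number of times with a fixed backoff. If this returns
    // false, the caller should fail the broker rather than keep running while
    // unregistered from the controller's perspective.
    static boolean reconnectWithRetry(SessionFactory factory, int maxAttempts, long backoffMs) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (factory.tryConnect()) {
                return true;
            }
            try {
                Thread.sleep(backoffMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // A factory that never connects: after maxAttempts we give up.
        System.out.println(reconnectWithRetry(() -> false, 3, 10L)); // false
    }
}
```

Bounding the retries keeps the failure visible: the broker either re-registers or dies, instead of silently staying up with out-of-sync replicas.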
[jira] [Created] (KAFKA-5473) handle ZK session expiration properly when a new session can't be established
Jun Rao created KAFKA-5473: -- Summary: handle ZK session expiration properly when a new session can't be established Key: KAFKA-5473 URL: https://issues.apache.org/jira/browse/KAFKA-5473 Project: Kafka Issue Type: Sub-task Affects Versions: 0.9.0.0 Reporter: Jun Rao In https://issues.apache.org/jira/browse/KAFKA-2405, we changed the logic for handling ZK session expiration a bit. If a new ZK session can't be established after session expiration, we just log an error and continue. However, this can leave the broker in a bad state since it's up, but not registered from the controller's perspective. Replicas on this broker may never be in sync. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (KAFKA-5472) Connector validate REST endpoint returning duplicate entries in "groups"
Randall Hauch created KAFKA-5472: Summary: Connector validate REST endpoint returning duplicate entries in "groups" Key: KAFKA-5472 URL: https://issues.apache.org/jira/browse/KAFKA-5472 Project: Kafka Issue Type: Bug Reporter: Randall Hauch When calling the REST endpoint to validate a connector configuration, the response includes duplicate values for the {{groups}} field: {code:javascript} { // ... groups: ["Common", "Transforms", "Common", "Transforms"] } {code} This is a regression and may cause issues for any tooling that is using the Kafka Connect REST API. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (KAFKA-5452) Aggressive log compaction ratio appears to have no negative effect on log-compacted topics
[ https://issues.apache.org/jira/browse/KAFKA-5452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeff Chao updated KAFKA-5452: - Description: Some of our users are seeing unintuitive/unexpected behavior with log-compacted topics where they receive multiple records for the same key when consuming. This is a result of low throughput on log-compacted topics such that conditions ({{min.cleanable.dirty.ratio = 0.5}}, default) aren't met for compaction to kick in. This prompted us to test and tune {{min.cleanable.dirty.ratio}} in our clusters. It appears that more aggressive log compaction ratios don't have negative effects on CPU and memory utilization. If this is truly the case, we should consider changing the default from {{0.5}} to something more aggressive. Setup: # 8 brokers # 5 zk nodes # 32 partitions on a topic # replication factor 3 # log roll 3 hours # log segment bytes 1 GB # log retention 24 hours # all messages to a single key # all messages to a unique key # all messages to a bounded key range [0, 999] # {{min.cleanable.dirty.ratio}} per topic = {{0}}, {{0.5}}, and {{1}} # 200 MB/s sustained, produce and consume traffic Observations: We were able to verify log cleaner threads were performing work by checking the logs and verifying the {{cleaner-offset-checkpoint}} file for all topics. We also observed the log cleaner's {{time-since-last-run-ms}} metric was normal, never going above the default of 15 seconds. Under-replicated partitions stayed steady, same for replication lag. Here's an example test run where we try out {{min.cleanable.dirty.ratio = 0}}, {{min.cleanable.dirty.ratio = 1}}, and {{min.cleanable.dirty.ratio = 0.5}}. Troughs in between the peaks represent zero traffic and reconfiguring of topics. (200mbs-dirty-0-dirty1-dirty05.png attached) !200mbs-dirty0-dirty-1-dirty05.png|thumbnail! Memory utilization is fine, but more interestingly, CPU doesn't appear to show much difference. 
To get more detail, here is a flame graph (raw svg attached) of the run for {{min.cleanable.dirty.ratio = 0}}. The conservative and default ratio flame graphs are equivalent. (flame-graph-200mbs-dirty0.png attached) !flame-graph-200mbs-dirty0.png|thumbnail! Notice that the majority of CPU is coming from: # SSL operations (on reads/writes) # KafkaApis::handleFetchRequest (ReplicaManager::fetchMessages) # KafkaApis::handleOffsetFetchRequest We also have examples from small scale test runs which show similar behavior but with scaled down CPU usage. It seems counterintuitive that there's no apparent difference in CPU whether it be aggressive or conservative compaction ratios, so we'd like to get some thoughts from the community. We're looking for feedback on whether or not anyone else has experienced this behavior before as well or, if CPU isn't affected, has anyone seen something related instead. If this is true, then we'd be happy to discuss further and provide a patch. was: Some of our users are seeing unintuitive/unexpected behavior with log-compacted topics where they receive multiple records for the same key when consuming. This is a result of low throughput on log-compacted topics such that conditions ({{min.cleanable.dirty.ratio = 0.5}}, default) aren't met for compaction to kick in. This prompted us to test and tune {{min.cleanable.dirty.ratio}} in our clusters. It appears that having more aggressive log compaction ratios don't have negative effects on CPU and memory utilization. If this is truly the case, we should consider changing the default from {{0.5}} to something more aggressive. Setup: # 1. 8 brokers # 2. 5 zk nodes # 3. 32 partitions on a topic # 4. replication factor 3 # 5. log roll 3 hours # 6. log segment bytes 1 GB # 7. log retention 24 hours # 8. all messages to a single key # 9. all messages to a unique key # 10. all messages to a bounded key range [0, 999] # 11. {{min.cleanable.dirty.ratio}} per topic = {{0}}, {{0.5}}, and {{1}} # 12. 
200 MB/s sustained, produce and consume traffic Observations: We were able to verify log cleaner threads were performing work by checking the logs and verifying the {{cleaner-offset-checkpoint}} file for all topics. We also observed the log cleaner's {{time-since-last-run-ms}} metric was normal, never going above the default of 15 seconds. Under-replicated partitions stayed steady, same for replication lag. Here's an example test run where we try out {{min.cleanable.dirty.ratio = 0}}, {{min.cleanable.dirty.ratio = 1}}, and {{min.cleanable.dirty.ratio = 0.5}}. Troughs in between the peaks represent zero traffic and reconfiguring of topics. (200mbs-dirty-0-dirty1-dirty05.png attached) !200mbs-dirty0-dirty-1-dirty05.png|thumbnail! Memory utilization is fine, but more interestingly, CPU doesn't appear to have much difference. To get more detail, here is a flame graph (raw svg attached) of the run for {{min.cleanable.dirty.ratio
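For readers unfamiliar with the knob being tuned here: the cleaner only picks a log once its dirty ratio reaches {{min.cleanable.dirty.ratio}}. A toy sketch of that eligibility check (illustrative names, not the {{LogCleaner}} source) shows why a low-throughput compacted topic can sit below the default 0.5 indefinitely:

```java
class DirtyRatioSketch {
    // A log becomes eligible for compaction once the fraction of "dirty"
    // bytes (written since the last cleaner checkpoint) over total bytes
    // reaches the configured min.cleanable.dirty.ratio.
    static boolean isCleanable(long cleanBytes, long dirtyBytes, double minDirtyRatio) {
        long total = cleanBytes + dirtyBytes;
        return total > 0 && (double) dirtyBytes / total >= minDirtyRatio;
    }

    public static void main(String[] args) {
        // 900 units already cleaned, 50 units of new writes: ratio ~0.05,
        // so the default 0.5 never triggers, while a ratio of 0 always does.
        System.out.println(isCleanable(900, 50, 0.5)); // false
        System.out.println(isCleanable(900, 50, 0.0)); // true
    }
}
```

This is why consumers of slow compacted topics keep seeing multiple records per key under the default: the dirty ratio never crosses the threshold, so compaction simply never runs.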
[jira] [Commented] (KAFKA-5431) LogCleaner stopped due to org.apache.kafka.common.errors.CorruptRecordException
[ https://issues.apache.org/jira/browse/KAFKA-5431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16054686#comment-16054686 ] Jun Rao commented on KAFKA-5431: [~crietz], in Log.roll(), we call LogSegment.trim() to reset the size of the log file to the actual size, which eventually calls FileRecords.truncateTo(). If this is reproducible, could you add some instrumentation in FileRecords.truncateTo() to see if the logic is actually called during log rolling? > LogCleaner stopped due to > org.apache.kafka.common.errors.CorruptRecordException > --- > > Key: KAFKA-5431 > URL: https://issues.apache.org/jira/browse/KAFKA-5431 > Project: Kafka > Issue Type: Bug > Components: core >Affects Versions: 0.10.2.1 >Reporter: Carsten Rietz > Labels: reliability > Fix For: 0.11.0.1 > > > Hey all, > I have a strange problem with our UAT cluster of 3 Kafka brokers. > The __consumer_offsets topic was replicated to two instances and our disks > ran full due to a wrong configuration of the log cleaner. We fixed the > configuration and updated from 0.10.1.1 to 0.10.2.1. > Today I increased the replication of the __consumer_offsets topic to 3 and > triggered replication to the third broker via kafka-reassign-partitions.sh. > That went well but I get many errors like > {code} > [2017-06-12 09:59:50,342] ERROR Found invalid messages during fetch for > partition [__consumer_offsets,18] offset 0 error Record size is less than the > minimum record overhead (14) (kafka.server.ReplicaFetcherThread) > [2017-06-12 09:59:50,342] ERROR Found invalid messages during fetch for > partition [__consumer_offsets,24] offset 0 error Record size is less than the > minimum record overhead (14) (kafka.server.ReplicaFetcherThread) > {code} > which I think are due to the full disk event. 
> The log cleaner threads died on these wrong messages: > {code} > [2017-06-12 09:59:50,722] ERROR [kafka-log-cleaner-thread-0], Error due to > (kafka.log.LogCleaner) > org.apache.kafka.common.errors.CorruptRecordException: Record size is less > than the minimum record overhead (14) > [2017-06-12 09:59:50,722] INFO [kafka-log-cleaner-thread-0], Stopped > (kafka.log.LogCleaner) > {code} > Looking at the files I see that some are truncated and some are just empty: > $ ls -lsh 00594653.log > 0 -rw-r--r-- 1 user user 100M Jun 12 11:00 00594653.log > Sadly I do not have the logs anymore from the disk full event itself. > I have three questions: > * What is the best way to clean this up? Deleting the old log files and > restarting the brokers? > * Why did Kafka not handle the disk full event well? Is this only affecting > the cleanup or may we also lose data? > * Is this maybe caused by the combination of the upgrade and the disk full event? > And last but not least: Keep up the good work. Kafka is really performing > well while being easy to administer and has good documentation! -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Assigned] (KAFKA-5032) Think through implications of max.message.size affecting record batches in message format V2
[ https://issues.apache.org/jira/browse/KAFKA-5032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apurva Mehta reassigned KAFKA-5032: --- Assignee: Apurva Mehta > Think through implications of max.message.size affecting record batches in > message format V2 > > > Key: KAFKA-5032 > URL: https://issues.apache.org/jira/browse/KAFKA-5032 > Project: Kafka > Issue Type: Sub-task > Components: clients, core, producer >Reporter: Ismael Juma >Assignee: Apurva Mehta >Priority: Critical > Labels: documentation, exactly-once > Fix For: 0.11.0.0 > > > It's worth noting that the new behaviour for uncompressed messages is the > same as the existing behaviour for compressed messages. > A few things to think about: > 1. Do the producer settings max.request.size and batch.size still make sense > and do we need to update the documentation? My conclusion is that things are > still fine, but we may need to revise the docs. > 2. (Seems like we don't need to do this) Consider changing default max > message set size to include record batch overhead. This is currently defined > as: > {code} > val MessageMaxBytes = 100 + MessageSet.LogOverhead > {code} > We should consider changing it to (I haven't thought it through though): > {code} > val MessageMaxBytes = 100 + DefaultRecordBatch.RECORD_BATCH_OVERHEAD > {code} > 3. When a record batch is too large, we throw RecordTooLargeException, which > is confusing because there's also a RecordBatchTooLargeException. We should > consider renaming these exceptions to make the behaviour clearer. > 4. We should consider deprecating max.message.bytes (server config) and > message.max.bytes (topic config) in favour of configs that make it clear that > we are talking about record batches instead of individual messages. > Part of the work in this JIRA is working out what should be done for 0.11.0.0 > and what can be done later. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (KAFKA-5348) kafka-consumer-groups.sh refuses to remove groups without ids
[ https://issues.apache.org/jira/browse/KAFKA-5348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16054545#comment-16054545 ] Vahid Hashemian commented on KAFKA-5348: [~bobrik] Then perhaps ZooKeeper data for this group is somehow corrupted?! Because normally, even if all consumers in the group die the {{/ids}} path would still be there. Is this something that you can easily reproduce? > kafka-consumer-groups.sh refuses to remove groups without ids > - > > Key: KAFKA-5348 > URL: https://issues.apache.org/jira/browse/KAFKA-5348 > Project: Kafka > Issue Type: Bug > Components: admin >Affects Versions: 0.10.2.0 >Reporter: Ivan Babrou >Assignee: Vahid Hashemian > > In zookeeper I have: > {noformat} > [zk: foo(CONNECTED) 37] ls /kafka/logs/consumers/console-consumer-4107 > [offsets] > {noformat} > This consumer group also shows up when I list consumer groups: > {noformat} > $ /usr/local/kafka/bin/kafka-consumer-groups.sh --zookeeper > foo:2181/kafka/logs --list | fgrep console-consumer-4107 > Note: This will only show information about consumers that use ZooKeeper (not > those using the Java consumer API). > console-consumer-4107 > {noformat} > But I cannot remove this group: > {noformat} > $ /usr/local/kafka/bin/kafka-consumer-groups.sh --zookeeper > 36zk1.in.pdx.cfdata.org:2181/kafka/logs --delete --group console-consumer-4107 > Note: This will only show information about consumers that use ZooKeeper (not > those using the Java consumer API). > Error: Delete for group 'console-consumer-4107' failed because group does not > exist. > {noformat} > I ran tcpdump and it turns out that /ids path is checked: > {noformat} > $.e.P.fP...&..<...//kafka/logs/consumers/console-consumer-4107/ids. > {noformat} > I think kafka should not check for /ids, it should check for / instead here. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (KAFKA-5465) FetchResponse v0 does not return any messages when max_bytes smaller than v2 message set
[ https://issues.apache.org/jira/browse/KAFKA-5465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16054422#comment-16054422 ] Dana Powers commented on KAFKA-5465: Agree on won't fix. > FetchResponse v0 does not return any messages when max_bytes smaller than v2 > message set > - > > Key: KAFKA-5465 > URL: https://issues.apache.org/jira/browse/KAFKA-5465 > Project: Kafka > Issue Type: Bug >Affects Versions: 0.11.0.0 >Reporter: Dana Powers >Assignee: Jason Gustafson >Priority: Blocker > Fix For: 0.11.0.1 > > > In prior releases, when consuming uncompressed messages, FetchResponse v0 > will return a message if it is smaller than the max_bytes sent in the > FetchRequest. In 0.11.0.0 RC0, when messages are stored as v2 internally, the > response will be empty unless the full MessageSet is smaller than max_bytes. > In some configurations, this may cause old consumers to get stuck on large > messages where previously they were able to make progress one message at a > time. > For example, when I produce 10 5KB messages using ProduceRequest v0 and then > attempt FetchRequest v0 with partition max bytes = 6KB (larger than a single > message but smaller than all 10 messages together), I get an empty message > set from 0.11.0.0. Previous brokers would have returned a single message. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Resolved] (KAFKA-5465) FetchResponse v0 does not return any messages when max_bytes smaller than v2 message set
[ https://issues.apache.org/jira/browse/KAFKA-5465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Gustafson resolved KAFKA-5465. Resolution: Won't Fix For the reasons mentioned by Ismael above, we've decided not to fix this for now. If anyone has concerns, please leave a comment and we will reconsider. > FetchResponse v0 does not return any messages when max_bytes smaller than v2 > message set > - > > Key: KAFKA-5465 > URL: https://issues.apache.org/jira/browse/KAFKA-5465 > Project: Kafka > Issue Type: Bug >Affects Versions: 0.11.0.0 >Reporter: Dana Powers >Assignee: Jason Gustafson >Priority: Blocker > Fix For: 0.11.0.1 > > > In prior releases, when consuming uncompressed messages, FetchResponse v0 > will return a message if it is smaller than the max_bytes sent in the > FetchRequest. In 0.11.0.0 RC0, when messages are stored as v2 internally, the > response will be empty unless the full MessageSet is smaller than max_bytes. > In some configurations, this may cause old consumers to get stuck on large > messages where previously they were able to make progress one message at a > time. > For example, when I produce 10 5KB messages using ProduceRequest v0 and then > attempt FetchRequest v0 with partition max bytes = 6KB (larger than a single > message but smaller than all 10 messages together), I get an empty message > set from 0.11.0.0. Previous brokers would have returned a single message. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
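The report's numbers make the regression concrete: ten 5 KB messages against a 6 KB partition limit. A hedged sketch (illustrative only, not broker code) of the old per-message sizing versus the v2 all-or-nothing batch sizing:

```java
class FetchSizeSketch {
    // Pre-v2 behavior: the broker could return any prefix of individual
    // messages that fits in maxBytes, so consumers made progress one
    // message at a time.
    static int messagesReturnedOld(int[] messageSizes, int maxBytes) {
        int used = 0, count = 0;
        for (int size : messageSizes) {
            if (used + size > maxBytes) break;
            used += size;
            count++;
        }
        return count;
    }

    // v2 storage: the messages live in a single record batch, so the whole
    // batch must fit in maxBytes or the FetchResponse v0 comes back empty.
    static int messagesReturnedNew(int[] messageSizes, int maxBytes) {
        int batchSize = 0;
        for (int size : messageSizes) batchSize += size;
        return batchSize <= maxBytes ? messageSizes.length : 0;
    }

    public static void main(String[] args) {
        int[] sizes = new int[10];
        java.util.Arrays.fill(sizes, 5 * 1024); // ten 5 KB messages, as in the report
        System.out.println(messagesReturnedOld(sizes, 6 * 1024)); // 1
        System.out.println(messagesReturnedNew(sizes, 6 * 1024)); // 0
    }
}
```

This simplified model ignores per-message and per-batch framing overhead, but it captures why an old consumer that previously inched forward one message per fetch can now stall entirely.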
[jira] [Commented] (KAFKA-2967) Move Kafka documentation to ReStructuredText
[ https://issues.apache.org/jira/browse/KAFKA-2967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16054289#comment-16054289 ] Tom Bentley commented on KAFKA-2967: Any progress on this, [~ceposta], [~gwenshap]? I would like to help improve the documentation, and not having to edit the raw HTML would make that a nicer experience. > Move Kafka documentation to ReStructuredText > > > Key: KAFKA-2967 > URL: https://issues.apache.org/jira/browse/KAFKA-2967 > Project: Kafka > Issue Type: Bug >Reporter: Gwen Shapira >Assignee: Gwen Shapira > > Storing documentation as HTML is kind of BS :) > * Formatting is a pain, and making it look good is even worse > * It's just HTML, so we can't generate PDFs > * Reading and editing is painful > * Validating changes is hard because our formatting relies on all kinds of > Apache Server features. > I suggest: > * Move to RST > * Generate HTML and PDF during build using the Sphinx plugin for Gradle. > Lots of Apache projects are doing this. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (KAFKA-5412) Using connect-console-sink/source.properties raises an exception related to "file" property not found
[ https://issues.apache.org/jira/browse/KAFKA-5412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16054278#comment-16054278 ] ASF GitHub Bot commented on KAFKA-5412: --- Github user asfgit closed the pull request at: https://github.com/apache/kafka/pull/3279 > Using connect-console-sink/source.properties raises an exception related to > "file" property not found > - > > Key: KAFKA-5412 > URL: https://issues.apache.org/jira/browse/KAFKA-5412 > Project: Kafka > Issue Type: Bug > Components: KafkaConnect >Affects Versions: 0.10.2.1 >Reporter: Paolo Patierno >Assignee: Paolo Patierno > Fix For: 0.11.1.0 > > > With the latest 0.11.1.0-SNAPSHOT, the Kafka Connect example > using connect-console-sink/source.properties no longer works because the > required "file" property isn't found. This is because the underlying > FileStreamSink/Source connector and task define a ConfigDef with "file" > as a mandatory parameter. In the case of the console example we want > file=null so that stdin and stdout are used. > One possible workaround is to set "file=" inside the provided > connect-console-sink/source.properties. The other option is to modify the > FileStreamSink/Source source code to remove the "file" definition > from the ConfigDef. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
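A toy illustration (plain Python, not Connect's actual ConfigDef API) of why a definition with a mandatory key fails for the console example, and why the `file=` workaround passes validation with an empty value:

```python
class ConfigException(Exception):
    pass

def validate(config, required_keys):
    """Reject configs that omit a required key entirely; an explicitly
    empty value (file=) counts as present, which is the workaround."""
    for key in required_keys:
        if key not in config:
            raise ConfigException(f"Missing required configuration {key!r}")
    return config

# connect-console-source.properties without a "file" entry fails:
try:
    validate({"topic": "connect-test"}, required_keys=["file"])
except ConfigException as e:
    print(e)

# The workaround: declare the key with an empty value so stdin is used.
ok = validate({"topic": "connect-test", "file": ""}, required_keys=["file"])
print(ok["file"] == "")   # True
```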
[jira] [Comment Edited] (KAFKA-5470) Replace -XX:+DisableExplicitGC with -XX:+ExplicitGCInvokesConcurrent in kafka-run-class
[ https://issues.apache.org/jira/browse/KAFKA-5470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16054216#comment-16054216 ] Hongyuan Li edited comment on KAFKA-5470 at 6/19/17 3:32 PM: - LGTM. That decreases the rate of invoking oom. Thanks, [~ijuma]. was (Author: hongyuan li): LGTM. That decreases the rate of invoking oom. Thanks, Ismael Juma. > Replace -XX:+DisableExplicitGC with -XX:+ExplicitGCInvokesConcurrent in > kafka-run-class > --- > > Key: KAFKA-5470 > URL: https://issues.apache.org/jira/browse/KAFKA-5470 > Project: Kafka > Issue Type: Improvement >Reporter: Ismael Juma >Assignee: Ismael Juma >Priority: Critical > Fix For: 0.11.0.1 > > > This is important because Bits.reserveMemory calls System.gc() hoping to free > native memory in order to avoid throwing an OutOfMemoryException. This call > is currently a no-op due to -XX:+DisableExplicitGC. > It's worth mentioning that -XX:MaxDirectMemorySize can be used to increase > the amount of native memory available for allocation of direct byte buffers. > Hongyuan Li pointed out the issue with the usage of -XX:+DisableExplicitGC > in: > https://issues.apache.org/jira/browse/KAFKA-5444?focusedCommentId=16054129&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16054129 -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (KAFKA-5470) Replace -XX:+DisableExplicitGC with -XX:+ExplicitGCInvokesConcurrent in kafka-run-class
[ https://issues.apache.org/jira/browse/KAFKA-5470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16054216#comment-16054216 ] Hongyuan Li commented on KAFKA-5470: LGTM. That decreases the rate of invoking oom. Thanks, Ismael Juma. > Replace -XX:+DisableExplicitGC with -XX:+ExplicitGCInvokesConcurrent in > kafka-run-class > --- > > Key: KAFKA-5470 > URL: https://issues.apache.org/jira/browse/KAFKA-5470 > Project: Kafka > Issue Type: Improvement >Reporter: Ismael Juma >Assignee: Ismael Juma >Priority: Critical > Fix For: 0.11.0.1 > > > This is important because Bits.reserveMemory calls System.gc() hoping to free > native memory in order to avoid throwing an OutOfMemoryException. This call > is currently a no-op due to -XX:+DisableExplicitGC. > It's worth mentioning that -XX:MaxDirectMemorySize can be used to increase > the amount of native memory available for allocation of direct byte buffers. > Hongyuan Li pointed out the issue with the usage of -XX:+DisableExplicitGC > in: > https://issues.apache.org/jira/browse/KAFKA-5444?focusedCommentId=16054129&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16054129 -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (KAFKA-5471) Original Kafka paper link broken
[ https://issues.apache.org/jira/browse/KAFKA-5471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16054211#comment-16054211 ] Jeremy Hanna commented on KAFKA-5471: - I can get it done if I could get confluence access for my user - jeromatron. > Original Kafka paper link broken > > > Key: KAFKA-5471 > URL: https://issues.apache.org/jira/browse/KAFKA-5471 > Project: Kafka > Issue Type: Bug > Components: documentation >Reporter: Jeremy Hanna >Priority: Trivial > > Currently on the [Kafka papers and presentations > site|https://cwiki.apache.org/confluence/display/KAFKA/Kafka+papers+and+presentations] > the original Kafka paper is linked but it's a broken link. > Currently it links to > [here|http://research.microsoft.com/en-us/um/people/srikanth/netdb11/netdb11papers/netdb11-final12.pdf] > but that person may have taken the paper down. I found it > [here|http://notes.stephenholiday.com/Kafka.pdf] but that could have a > similar problem in the future. We should be able to put the file as an > attachment in the confluence wiki to make it a more permanent link. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (KAFKA-5471) Original Kafka paper link broken
Jeremy Hanna created KAFKA-5471: --- Summary: Original Kafka paper link broken Key: KAFKA-5471 URL: https://issues.apache.org/jira/browse/KAFKA-5471 Project: Kafka Issue Type: Bug Components: documentation Reporter: Jeremy Hanna Priority: Trivial Currently on the [Kafka papers and presentations site|https://cwiki.apache.org/confluence/display/KAFKA/Kafka+papers+and+presentations] the original Kafka paper is linked but it's a broken link. Currently it links to [here|http://research.microsoft.com/en-us/um/people/srikanth/netdb11/netdb11papers/netdb11-final12.pdf] but that person may have taken the paper down. I found it [here|http://notes.stephenholiday.com/Kafka.pdf] but that could have a similar problem in the future. We should be able to put the file as an attachment in the confluence wiki to make it a more permanent link. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (KAFKA-5444) Producer.send() will hang 8+ hours
[ https://issues.apache.org/jira/browse/KAFKA-5444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16054185#comment-16054185 ] Ismael Juma commented on KAFKA-5444: That's a good point [~Hongyuan Li]; I filed KAFKA-5470 with a PR that replaces -XX:+DisableExplicitGC with -XX:+ExplicitGCInvokesConcurrent > Producer.send() will hang 8+ hours > -- > > Key: KAFKA-5444 > URL: https://issues.apache.org/jira/browse/KAFKA-5444 > Project: Kafka > Issue Type: Bug >Reporter: Hongyuan Li > > Frequently opening and closing the old Kafka Producer will cause the server > to hang, with lots of error messages logged in server.log. In our case, we may > frequently open and close a Kafka producer; the procedure is like the code below: > {code} > Producer producer = …… > producer.send(List lists); > producer.close(); > {code} > the error is below: > {code} > 2017-06-13 00:00:00,084] ERROR Processor got uncaught exception. > (kafka.network.Processor) > java.lang.OutOfMemoryError: Direct buffer memory > at java.nio.Bits.reserveMemory(Bits.java:658) > at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:123) > at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:306) > at sun.nio.ch.Util.getTemporaryDirectBuffer(Util.java:174) > at sun.nio.ch.IOUtil.read(IOUtil.java:195) > at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379) > at > org.apache.kafka.common.network.PlaintextTransportLayer.read(PlaintextTransportLayer.java:110) > at > org.apache.kafka.common.network.NetworkReceive.readFromReadableChannel(NetworkReceive.java:97) > at > org.apache.kafka.common.network.NetworkReceive.readFrom(NetworkReceive.java:71) > at > org.apache.kafka.common.network.KafkaChannel.receive(KafkaChannel.java:154) > at > org.apache.kafka.common.network.KafkaChannel.read(KafkaChannel.java:135) > at > org.apache.kafka.common.network.Selector.pollSelectionKeys(Selector.java:343) > at org.apache.kafka.common.network.Selector.poll(Selector.java:291) > at kafka.network.Processor.poll(SocketServer.scala:476) > at kafka.network.Processor.run(SocketServer.scala:416) > at java.lang.Thread.run(Thread.java:745) > {code} > As seen from all existing logs, every error is a repeat of the one above. > Any good idea how to solve this? -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (KAFKA-5470) Replace -XX:+DisableExplicitGC with -XX:+ExplicitGCInvokesConcurrent in kafka-run-class
[ https://issues.apache.org/jira/browse/KAFKA-5470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16054179#comment-16054179 ] ASF GitHub Bot commented on KAFKA-5470: --- GitHub user ijuma opened a pull request: https://github.com/apache/kafka/pull/3371 KAFKA-5470: Replace -XX:+DisableExplicitGC with -XX:+ExplicitGCInvokesConcurrent in kafka-run-class You can merge this pull request into a Git repository by running: $ git pull https://github.com/ijuma/kafka kafka-5470-explicit-gc-invokes-concurrent Alternatively you can review and apply these changes as the patch at: https://github.com/apache/kafka/pull/3371.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3371 commit da81116ca8d5c9c75ea447c0febe11f97a67957a Author: Ismael Juma Date: 2017-06-19T15:17:46Z KAFKA-5470: Replace -XX:+DisableExplicitGC with -XX:+ExplicitGCInvokesConcurrent in kafka-run-class > Replace -XX:+DisableExplicitGC with -XX:+ExplicitGCInvokesConcurrent in > kafka-run-class > --- > > Key: KAFKA-5470 > URL: https://issues.apache.org/jira/browse/KAFKA-5470 > Project: Kafka > Issue Type: Improvement >Reporter: Ismael Juma >Assignee: Ismael Juma >Priority: Critical > Fix For: 0.11.0.1 > > > This is important because Bits.reserveMemory calls System.gc() hoping to free > native memory in order to avoid throwing an OutOfMemoryException. This call > is currently a no-op due to -XX:+DisableExplicitGC. > It's worth mentioning that -XX:MaxDirectMemorySize can be used to increase > the amount of native memory available for allocation of direct byte buffers. > Hongyuan Li pointed out the issue with the usage of -XX:+DisableExplicitGC > in: > https://issues.apache.org/jira/browse/KAFKA-5444?focusedCommentId=16054129&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16054129 -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Assigned] (KAFKA-5470) Replace -XX:+DisableExplicitGC with -XX:+ExplicitGCInvokesConcurrent in kafka-run-class
[ https://issues.apache.org/jira/browse/KAFKA-5470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ismael Juma reassigned KAFKA-5470: -- Assignee: Ismael Juma Fix Version/s: 0.11.0.1 > Replace -XX:+DisableExplicitGC with -XX:+ExplicitGCInvokesConcurrent in > kafka-run-class > --- > > Key: KAFKA-5470 > URL: https://issues.apache.org/jira/browse/KAFKA-5470 > Project: Kafka > Issue Type: Improvement >Reporter: Ismael Juma >Assignee: Ismael Juma >Priority: Critical > Fix For: 0.11.0.1 > > > This is important because Bits.reserveMemory calls System.gc() hoping to free > native memory in order to avoid throwing an OutOfMemoryException. This call > is currently a no-op due to -XX:+DisableExplicitGC. > It's worth mentioning that -XX:MaxDirectMemorySize can be used to increase > the amount of native memory available for allocation of direct byte buffers. > Hongyuan Li pointed out the issue with the usage of -XX:+DisableExplicitGC > in: > https://issues.apache.org/jira/browse/KAFKA-5444?focusedCommentId=16054129&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16054129 -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (KAFKA-5470) Replace -XX:+DisableExplicitGC with -XX:+ExplicitGCInvokesConcurrent in kafka-run-class
Ismael Juma created KAFKA-5470: -- Summary: Replace -XX:+DisableExplicitGC with -XX:+ExplicitGCInvokesConcurrent in kafka-run-class Key: KAFKA-5470 URL: https://issues.apache.org/jira/browse/KAFKA-5470 Project: Kafka Issue Type: Improvement Reporter: Ismael Juma Priority: Critical This is important because Bits.reserveMemory calls System.gc() hoping to free native memory in order to avoid throwing an OutOfMemoryException. This call is currently a no-op due to -XX:+DisableExplicitGC. It's worth mentioning that -XX:MaxDirectMemorySize can be used to increase the amount of native memory available for allocation of direct byte buffers. Hongyuan Li pointed out the issue with the usage of -XX:+DisableExplicitGC in: https://issues.apache.org/jira/browse/KAFKA-5444?focusedCommentId=16054129&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16054129 -- This message was sent by Atlassian JIRA (v6.4.14#64029)
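The interaction described above can be modeled roughly (plain Python, not the JDK source; names are illustrative): `Bits.reserveMemory` falls back to `System.gc()` to free unreferenced direct buffers before giving up, and `-XX:+DisableExplicitGC` turns that fallback into a no-op.

```python
class DirectMemoryPool:
    """Toy model of JDK direct-memory accounting (illustrative only)."""

    def __init__(self, limit, disable_explicit_gc):
        self.limit = limit                 # -XX:MaxDirectMemorySize analogue
        self.reserved = 0
        self.reclaimable = 0               # dropped buffers awaiting GC
        self.disable_explicit_gc = disable_explicit_gc

    def system_gc(self):
        # With -XX:+DisableExplicitGC this is a no-op; with
        # -XX:+ExplicitGCInvokesConcurrent it still reclaims memory.
        if not self.disable_explicit_gc:
            self.reserved -= self.reclaimable
            self.reclaimable = 0

    def reserve(self, size):
        if self.reserved + size > self.limit:
            self.system_gc()               # Bits.reserveMemory's fallback
        if self.reserved + size > self.limit:
            raise MemoryError("Direct buffer memory")
        self.reserved += size
        return size

pool = DirectMemoryPool(limit=100, disable_explicit_gc=True)
pool.reserve(80)
pool.reclaimable = 80                      # buffers dropped, not yet GC'd
try:
    pool.reserve(40)                       # OOM: System.gc() was disabled
except MemoryError as e:
    print("OOM:", e)
```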
[jira] [Commented] (KAFKA-5444) Producer.send() will hang 8+ hours
[ https://issues.apache.org/jira/browse/KAFKA-5444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16054129#comment-16054129 ] Hongyuan Li commented on KAFKA-5444: I think this problem is associated with direct buffer allocation; using the new producer API may run into a similar problem, though it may not hang. I think we can solve this in two steps: 1. remove {{-XX:+DisableExplicitGC}} from the default {{kafka-run-class.sh}} 2. use {{sun.misc.Cleaner}} to clean the direct buffer, or use Netty's {{PlatformDependent}}#{{freeDirectBuffer}} to free it. Is that right? > Producer.send() will hang 8+ hours > -- > > Key: KAFKA-5444 > URL: https://issues.apache.org/jira/browse/KAFKA-5444 > Project: Kafka > Issue Type: Bug >Reporter: Hongyuan Li > > Frequently opening and closing the old Kafka Producer will cause the server > to hang, with lots of error messages logged in server.log. In our case, we may > frequently open and close a Kafka producer; the procedure is like the code below: > {code} > Producer producer = …… > producer.send(List lists); > producer.close(); > {code} > the error is below: > {code} > 2017-06-13 00:00:00,084] ERROR Processor got uncaught exception. > (kafka.network.Processor) > java.lang.OutOfMemoryError: Direct buffer memory > at java.nio.Bits.reserveMemory(Bits.java:658) > at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:123) > at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:306) > at sun.nio.ch.Util.getTemporaryDirectBuffer(Util.java:174) > at sun.nio.ch.IOUtil.read(IOUtil.java:195) > at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379) > at > org.apache.kafka.common.network.PlaintextTransportLayer.read(PlaintextTransportLayer.java:110) > at > org.apache.kafka.common.network.NetworkReceive.readFromReadableChannel(NetworkReceive.java:97) > at > org.apache.kafka.common.network.NetworkReceive.readFrom(NetworkReceive.java:71) > at > org.apache.kafka.common.network.KafkaChannel.receive(KafkaChannel.java:154) > at > org.apache.kafka.common.network.KafkaChannel.read(KafkaChannel.java:135) > at > org.apache.kafka.common.network.Selector.pollSelectionKeys(Selector.java:343) > at org.apache.kafka.common.network.Selector.poll(Selector.java:291) > at kafka.network.Processor.poll(SocketServer.scala:476) > at kafka.network.Processor.run(SocketServer.scala:416) > at java.lang.Thread.run(Thread.java:745) > {code} > As seen from all existing logs, every error is a repeat of the one above. > Any good idea how to solve this? -- This message was sent by Atlassian JIRA (v6.4.14#64029)
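Independent of the GC flags, the open/close-per-send pattern in the report is itself the resource churn that exhausts direct memory. A sketch of the contrast, using a stub class (not the real Kafka client) so the counting is visible:

```python
class StubProducer:
    """Stand-in for a Kafka producer; each instance models the native
    (direct-buffer) resources a real client pins until GC runs."""
    live_instances = 0

    def __init__(self):
        StubProducer.live_instances += 1
        self.closed = False

    def send(self, records):
        return len(records)

    def close(self):
        # close() releases sockets, but direct buffers linger until GC.
        self.closed = True

# Anti-pattern from the report: a new producer per batch.
for _ in range(1000):
    p = StubProducer()
    p.send(["msg"])
    p.close()
assert StubProducer.live_instances == 1000   # 1000 instances churned

# Preferred: one long-lived producer (the real client is thread-safe),
# reused for all sends and closed once at shutdown.
StubProducer.live_instances = 0
producer = StubProducer()
for _ in range(1000):
    producer.send(["msg"])
producer.close()
assert StubProducer.live_instances == 1
```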
[jira] [Commented] (KAFKA-5021) Update Message Delivery Semantics section to take into account KIP-98
[ https://issues.apache.org/jira/browse/KAFKA-5021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16054046#comment-16054046 ] Ismael Juma commented on KAFKA-5021: Moving back to 0.11.0.0. This is a doc-only change and does not require a new RC. > Update Message Delivery Semantics section to take into account KIP-98 > - > > Key: KAFKA-5021 > URL: https://issues.apache.org/jira/browse/KAFKA-5021 > Project: Kafka > Issue Type: Sub-task > Components: clients, core, producer >Reporter: Ismael Juma >Assignee: Jason Gustafson >Priority: Critical > Labels: documentation, exactly-once > Fix For: 0.11.0.0 > > > Reference: > https://kafka.apache.org/documentation/#semantics -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (KAFKA-5020) Update protocol documentation to mention message format v2
[ https://issues.apache.org/jira/browse/KAFKA-5020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16054047#comment-16054047 ] Ismael Juma commented on KAFKA-5020: Moving back to 0.11.0.0. This is a doc-only change and does not require a new RC. > Update protocol documentation to mention message format v2 > -- > > Key: KAFKA-5020 > URL: https://issues.apache.org/jira/browse/KAFKA-5020 > Project: Kafka > Issue Type: Sub-task > Components: clients, core, producer >Reporter: Ismael Juma >Assignee: Apurva Mehta >Priority: Critical > Labels: documentation, exactly-once > Fix For: 0.11.0.0 > > > Sections 5.3, 5.4 and 5.5 should be updated: > https://kafka.apache.org/documentation/#messages > We may want to mention record batches along with message sets here: > https://kafka.apache.org/protocol#protocol_message_sets > And we should update the wiki page linked from the protocol documentation: > https://cwiki.apache.org/confluence/display/KAFKA/A+Guide+To+The+Kafka+Protocol#AGuideToTheKafkaProtocol-Messagesets -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (KAFKA-5020) Update protocol documentation to mention message format v2
[ https://issues.apache.org/jira/browse/KAFKA-5020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ismael Juma updated KAFKA-5020: --- Fix Version/s: (was: 0.11.0.1) 0.11.0.0 > Update protocol documentation to mention message format v2 > -- > > Key: KAFKA-5020 > URL: https://issues.apache.org/jira/browse/KAFKA-5020 > Project: Kafka > Issue Type: Sub-task > Components: clients, core, producer >Reporter: Ismael Juma >Assignee: Apurva Mehta >Priority: Critical > Labels: documentation, exactly-once > Fix For: 0.11.0.0 > > > Sections 5.3, 5.4 and 5.5 should be updated: > https://kafka.apache.org/documentation/#messages > We may want to mention record batches along with message sets here: > https://kafka.apache.org/protocol#protocol_message_sets > And we should update the wiki page linked from the protocol documentation: > https://cwiki.apache.org/confluence/display/KAFKA/A+Guide+To+The+Kafka+Protocol#AGuideToTheKafkaProtocol-Messagesets -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (KAFKA-5021) Update Message Delivery Semantics section to take into account KIP-98
[ https://issues.apache.org/jira/browse/KAFKA-5021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ismael Juma updated KAFKA-5021: --- Fix Version/s: (was: 0.11.0.1) 0.11.0.0 > Update Message Delivery Semantics section to take into account KIP-98 > - > > Key: KAFKA-5021 > URL: https://issues.apache.org/jira/browse/KAFKA-5021 > Project: Kafka > Issue Type: Sub-task > Components: clients, core, producer >Reporter: Ismael Juma >Assignee: Jason Gustafson >Priority: Critical > Labels: documentation, exactly-once > Fix For: 0.11.0.0 > > > Reference: > https://kafka.apache.org/documentation/#semantics -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (KAFKA-5032) Think through implications of max.message.size affecting record batches in message format V2
[ https://issues.apache.org/jira/browse/KAFKA-5032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16054045#comment-16054045 ] Ismael Juma commented on KAFKA-5032: Moving back to 0.11.0.0. This is a doc-only change and does not require a new RC. > Think through implications of max.message.size affecting record batches in > message format V2 > > > Key: KAFKA-5032 > URL: https://issues.apache.org/jira/browse/KAFKA-5032 > Project: Kafka > Issue Type: Sub-task > Components: clients, core, producer >Reporter: Ismael Juma >Priority: Critical > Labels: documentation, exactly-once > Fix For: 0.11.0.0 > > > It's worth noting that the new behaviour for uncompressed messages is the > same as the existing behaviour for compressed messages. > A few things to think about: > 1. Do the producer settings max.request.size and batch.size still make sense > and do we need to update the documentation? My conclusion is that things are > still fine, but we may need to revise the docs. > 2. (Seems like we don't need to do this) Consider changing default max > message set size to include record batch overhead. This is currently defined > as: > {code} > val MessageMaxBytes = 100 + MessageSet.LogOverhead > {code} > We should consider changing it to (I haven't thought it through though): > {code} > val MessageMaxBytes = 100 + DefaultRecordBatch.RECORD_BATCH_OVERHEAD > {code} > 3. When a record batch is too large, we throw RecordTooLargeException, which > is confusing because there's also a RecordBatchTooLargeException. We should > consider renaming these exceptions to make the behaviour clearer. > 4. We should consider deprecating max.message.bytes (server config) and > message.max.bytes (topic config) in favour of configs that make it clear that > we are talking about record batches instead of individual messages. > Part of the work in this JIRA is working out what should be done for 0.11.0.0 > and what can be done later. 
-- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (KAFKA-5032) Think through implications of max.message.size affecting record batches in message format V2
[ https://issues.apache.org/jira/browse/KAFKA-5032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ismael Juma updated KAFKA-5032: --- Fix Version/s: (was: 0.11.0.1) 0.11.0.0 > Think through implications of max.message.size affecting record batches in > message format V2 > > > Key: KAFKA-5032 > URL: https://issues.apache.org/jira/browse/KAFKA-5032 > Project: Kafka > Issue Type: Sub-task > Components: clients, core, producer >Reporter: Ismael Juma >Priority: Critical > Labels: documentation, exactly-once > Fix For: 0.11.0.0 > > > It's worth noting that the new behaviour for uncompressed messages is the > same as the existing behaviour for compressed messages. > A few things to think about: > 1. Do the producer settings max.request.size and batch.size still make sense > and do we need to update the documentation? My conclusion is that things are > still fine, but we may need to revise the docs. > 2. (Seems like we don't need to do this) Consider changing default max > message set size to include record batch overhead. This is currently defined > as: > {code} > val MessageMaxBytes = 100 + MessageSet.LogOverhead > {code} > We should consider changing it to (I haven't thought it through though): > {code} > val MessageMaxBytes = 100 + DefaultRecordBatch.RECORD_BATCH_OVERHEAD > {code} > 3. When a record batch is too large, we throw RecordTooLargeException, which > is confusing because there's also a RecordBatchTooLargeException. We should > consider renaming these exceptions to make the behaviour clearer. > 4. We should consider deprecating max.message.bytes (server config) and > message.max.bytes (topic config) in favour of configs that make it clear that > we are talking about record batches instead of individual messages. > Part of the work in this JIRA is working out what should be done for 0.11.0.0 > and what can be done later. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
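To make point 2 concrete, here is the arithmetic as a sketch. The constants are assumptions read from the Kafka source (the `100 +` in the quoted snippet appears to be a truncated 1,000,000 base): `MessageSet.LogOverhead` is the 12-byte offset+size prefix, while the v2 `DefaultRecordBatch.RECORD_BATCH_OVERHEAD` header is 61 bytes.

```python
# Assumed constants (check against the Kafka source before relying on them).
LOG_OVERHEAD = 12                  # MessageSet.LogOverhead: 8B offset + 4B size
RECORD_BATCH_OVERHEAD = 61         # DefaultRecordBatch.RECORD_BATCH_OVERHEAD

ONE_MB_BASE = 1000 * 1000          # assumed base of the default max message size

current_default = ONE_MB_BASE + LOG_OVERHEAD            # 1000012
proposed_default = ONE_MB_BASE + RECORD_BATCH_OVERHEAD  # 1000061

# Switching the overhead term would grow the default by only 49 bytes.
print(current_default, proposed_default, proposed_default - current_default)
```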
[jira] [Commented] (KAFKA-4985) kafka-acls should resolve dns names and accept ip ranges
[ https://issues.apache.org/jira/browse/KAFKA-4985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16053942#comment-16053942 ] Tom Bentley commented on KAFKA-4985: > The problem with resolving hostnames client-side is that it would cause a lot > of confusion when resolution happened differently client-side versus > server-side. That argument could be applied to practically any use of DNS, so I'm not convinced it makes a good reason not to do this. > kafka-acls should resolve dns names and accept ip ranges > > > Key: KAFKA-4985 > URL: https://issues.apache.org/jira/browse/KAFKA-4985 > Project: Kafka > Issue Type: Improvement > Components: security >Reporter: Ryan P > > Per KAFKA-2869 it looks like a conscious decision was made to move away from > using hostnames for authorization purposes. > This is fine; however, IP addresses are terribly inconvenient compared to > hostnames with regard to configuring ACLs. > I'd like to propose the following two improvements to make managing these > ACLs easier for end-users. > 1. Allow for simple patterns to be matched > e.g. --allow-host 10.17.81.11[1-9] > 2. Allow for hostnames to be used even if they are resolved on the client > side. Simple pattern matching on hostnames would be a welcome addition as well > e.g. --allow-host host.name.com > Accepting a comma-delimited list of hostnames and ip addresses would also be > helpful. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
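A sketch of what proposal 1 could look like, in plain Python using `fnmatch` (kafka-acls has no such option today; the CLI behavior here is hypothetical):

```python
from fnmatch import fnmatch

def host_allowed(ip, patterns):
    """Return True if the IP matches any allow-host pattern; fnmatch
    already supports the simple [a-b] ranges proposed above."""
    return any(fnmatch(ip, pat) for pat in patterns)

# --allow-host 10.17.81.11[1-9]  (with a comma-delimited list split client-side)
allow = "10.17.81.11[1-9],192.168.1.10".split(",")

print(host_allowed("10.17.81.115", allow))   # True: inside the range
print(host_allowed("10.17.81.110", allow))   # False: 0 is not in [1-9]
print(host_allowed("192.168.1.10", allow))   # True: exact match
```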
[jira] [Commented] (KAFKA-5458) SASL handshake fails for METADATA requests
[ https://issues.apache.org/jira/browse/KAFKA-5458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16053938#comment-16053938 ] Ismael Juma commented on KAFKA-5458: Yes, the client config for JAAS is only supported by the 0.10.2 Java clients. > SASL handshake fails for METADATA requests > -- > > Key: KAFKA-5458 > URL: https://issues.apache.org/jira/browse/KAFKA-5458 > Project: Kafka > Issue Type: Bug > Components: KafkaConnect >Affects Versions: 0.10.1.1 >Reporter: Sagar Lad > Attachments: kafka_client_jaas.conf, kafka_server_jaas.conf, > producer.properties, server.properties > > > I have enabled SASL authentication for my Kafka with > {code:java} > listeners=SASL_PLAINTEXT://0.0.0.0:9092 > advertised.listeners=SASL_PLAINTEXT://0.0.0.0:9092 > sasl.enabled.mechanisms=PLAIN > security.protocol=SASL_PLAINTEXT > security.inter.broker.protocol=SASL_PLAINTEXT > sasl.mechanism=PLAIN > sasl.mechanism.inter.broker.protocol=PLAIN > {code} > but in kafkaServer.out I am getting > {code:java} > [2017-06-16 11:21:12,167] DEBUG Set SASL server state to HANDSHAKE_REQUEST > (org.apache.kafka.common.security.authenticator.SaslServerAuthenticator) > [2017-06-16 11:21:12,167] DEBUG Handle Kafka request METADATA > (org.apache.kafka.common.security.authenticator.SaslServerAuthenticator) > [2017-06-16 11:21:12,167] DEBUG Set SASL server state to FAILED > (org.apache.kafka.common.security.authenticator.SaslServerAuthenticator) > [2017-06-16 11:21:12,167] DEBUG Connection with /127.0.0.1 disconnected > (org.apache.kafka.common.network.Selector) > java.io.IOException: > org.apache.kafka.common.errors.IllegalSaslStateException: Unexpected Kafka > request of type METADATA during SASL handshake. > at > org.apache.kafka.common.security.authenticator.SaslServerAuthenticator.authenticate(SaslServerAuthenticator.java:243) > at > org.apache.kafka.common.network.KafkaChannel.prepare(KafkaChannel.java:64) > at > org.apache.kafka.common.network.Selector.pollSelectionKeys(Selector.java:338) > at org.apache.kafka.common.network.Selector.poll(Selector.java:291) > at kafka.network.Processor.poll(SocketServer.scala:476) > at kafka.network.Processor.run(SocketServer.scala:416) > at java.lang.Thread.run(Thread.java:745) > {code} > whereas in the client log I observed > {code:java} > 2017-06-16 11:33:33,501 WARN [kafka-producer-network-thread | > producer-7] NetworkClient.maybeHandleDisconnection (NetworkClient.java:568) > - Bootstrap broker 127.0.0.1:9092 disconnected > 2017-06-16 11:33:33,607 WARN [kafka-producer-network-thread | > producer-7] NetworkClient.maybeHandleDisconnection (NetworkClient.java:568) > - Bootstrap broker 127.0.0.1:9092 disconnected > 2017-06-16 11:33:33,721 WARN [kafka-producer-network-thread | > producer-7] NetworkClient.maybeHandleDisconnection (NetworkClient.java:568) > - Bootstrap broker 127.0.0.1:9092 disconnected > 2017-06-16 11:33:33,834 WARN [kafka-producer-network-thread | > producer-7] NetworkClient.maybeHandleDisconnection (NetworkClient.java:568) > - Bootstrap broker 127.0.0.1:9092 disconnected > 2017-06-16 11:33:33,952 WARN [kafka-producer-network-thread | > producer-7] NetworkClient.maybeHandleDisconnection (NetworkClient.java:568) > - Bootstrap broker 127.0.0.1:9092 disconnected > 2017-06-16 11:33:34,059 WARN [kafka-producer-network-thread | > producer-7] NetworkClient.maybeHandleDisconnection (NetworkClient.java:568) > - Bootstrap broker 127.0.0.1:9092 disconnected > 2017-06-16 11:33:34,162 WARN [kafka-producer-network-thread | > producer-7] NetworkClient.maybeHandleDisconnection (NetworkClient.java:568) > - Bootstrap broker 127.0.0.1:9092 disconnected > ### > {code} > metadata requests are being sent every 100 ms > *Problem*: > how can we make the SASL handshake support METADATA requests? -- This message was sent by Atlassian JIRA (v6.4.14#64029)
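For reference, the client-side JAAS configuration Ismael refers to (the `sasl.jaas.config` client property added in 0.10.2 via KIP-85) looks roughly like this; the credentials are placeholders:

```properties
# producer.properties / consumer.properties (0.10.2+ Java clients only)
security.protocol=SASL_PLAINTEXT
sasl.mechanism=PLAIN
sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required \
  username="alice" \
  password="alice-secret";
```

On 0.10.1.x clients, this property does not exist, so the JAAS file must instead be passed via -Djava.security.auth.login.config.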
[jira] [Commented] (KAFKA-5458) SASL handshake fails for METADATA requests
[ https://issues.apache.org/jira/browse/KAFKA-5458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16053934#comment-16053934 ] Sagar Lad commented on KAFKA-5458: -- Is that supported in Kafka 0.10.2? Because I am using 0.10.1.1 > SASL handshake fails for METADATA requests > -- > > Key: KAFKA-5458 > URL: https://issues.apache.org/jira/browse/KAFKA-5458 > Project: Kafka > Issue Type: Bug > Components: KafkaConnect >Affects Versions: 0.10.1.1 >Reporter: Sagar Lad > Attachments: kafka_client_jaas.conf, kafka_server_jaas.conf, > producer.properties, server.properties > > > I have enabled SASL authentication for my Kafka with > {code:java} > listeners=SASL_PLAINTEXT://0.0.0.0:9092 > advertised.listeners=SASL_PLAINTEXT://0.0.0.0:9092 > sasl.enabled.mechanisms=PLAIN > security.protocol=SASL_PLAINTEXT > security.inter.broker.protocol=SASL_PLAINTEXT > sasl.mechanism=PLAIN > sasl.mechanism.inter.broker.protocol=PLAIN > {code} > but in kafkaServer.out I am getting > {code:java} > [2017-06-16 11:21:12,167] DEBUG Set SASL server state to HANDSHAKE_REQUEST > (org.apache.kafka.common.security.authenticator.SaslServerAuthenticator) > [2017-06-16 11:21:12,167] DEBUG Handle Kafka request METADATA > (org.apache.kafka.common.security.authenticator.SaslServerAuthenticator) > [2017-06-16 11:21:12,167] DEBUG Set SASL server state to FAILED > (org.apache.kafka.common.security.authenticator.SaslServerAuthenticator) > [2017-06-16 11:21:12,167] DEBUG Connection with /127.0.0.1 disconnected > (org.apache.kafka.common.network.Selector) > java.io.IOException: > org.apache.kafka.common.errors.IllegalSaslStateException: Unexpected Kafka > request of type METADATA during SASL handshake. > at > org.apache.kafka.common.security.authenticator.SaslServerAuthenticator.authenticate(SaslServerAuthenticator.java:243) > at > org.apache.kafka.common.network.KafkaChannel.prepare(KafkaChannel.java:64) > at > org.apache.kafka.common.network.Selector.pollSelectionKeys(Selector.java:338) > at org.apache.kafka.common.network.Selector.poll(Selector.java:291) > at kafka.network.Processor.poll(SocketServer.scala:476) > at kafka.network.Processor.run(SocketServer.scala:416) > at java.lang.Thread.run(Thread.java:745) > {code} > whereas in the client log I observed > {code:java} > 2017-06-16 11:33:33,501 WARN [kafka-producer-network-thread | > producer-7] NetworkClient.maybeHandleDisconnection (NetworkClient.java:568) > - Bootstrap broker 127.0.0.1:9092 disconnected > 2017-06-16 11:33:33,607 WARN [kafka-producer-network-thread | > producer-7] NetworkClient.maybeHandleDisconnection (NetworkClient.java:568) > - Bootstrap broker 127.0.0.1:9092 disconnected > 2017-06-16 11:33:33,721 WARN [kafka-producer-network-thread | > producer-7] NetworkClient.maybeHandleDisconnection (NetworkClient.java:568) > - Bootstrap broker 127.0.0.1:9092 disconnected > 2017-06-16 11:33:33,834 WARN [kafka-producer-network-thread | > producer-7] NetworkClient.maybeHandleDisconnection (NetworkClient.java:568) > - Bootstrap broker 127.0.0.1:9092 disconnected > 2017-06-16 11:33:33,952 WARN [kafka-producer-network-thread | > producer-7] NetworkClient.maybeHandleDisconnection (NetworkClient.java:568) > - Bootstrap broker 127.0.0.1:9092 disconnected > 2017-06-16 11:33:34,059 WARN [kafka-producer-network-thread | > producer-7] NetworkClient.maybeHandleDisconnection (NetworkClient.java:568) > - Bootstrap broker 127.0.0.1:9092 disconnected > 2017-06-16 11:33:34,162 WARN [kafka-producer-network-thread | > producer-7] NetworkClient.maybeHandleDisconnection (NetworkClient.java:568) > - Bootstrap broker 127.0.0.1:9092 disconnected > ### > {code} > metadata requests are being sent every 100 ms > *Problem*: > how can we make the SASL handshake support METADATA requests? -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (KAFKA-5431) LogCleaner stopped due to org.apache.kafka.common.errors.CorruptRecordException
[ https://issues.apache.org/jira/browse/KAFKA-5431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16053936#comment-16053936 ] Carsten Rietz commented on KAFKA-5431: -- Yes, the preallocation size is 100MB. I set the roll time to 1s to provoke many log rolls and more easily trigger this problem, so only one offset makes it into a new log. The segment size is normal (log.segment.bytes=10485760). > LogCleaner stopped due to > org.apache.kafka.common.errors.CorruptRecordException > --- > > Key: KAFKA-5431 > URL: https://issues.apache.org/jira/browse/KAFKA-5431 > Project: Kafka > Issue Type: Bug > Components: core >Affects Versions: 0.10.2.1 >Reporter: Carsten Rietz > Labels: reliability > Fix For: 0.11.0.1 > > > Hey all, > I have a strange problem with our UAT cluster of 3 Kafka brokers. > The __consumer_offsets topic was replicated to two instances and our disks > ran full due to a wrong configuration of the log cleaner. We fixed the > configuration and updated from 0.10.1.1 to 0.10.2.1. > Today I increased the replication of the __consumer_offsets topic to 3 and > triggered replication to the third broker via kafka-reassign-partitions.sh. > That went well, but I get many errors like > {code} > [2017-06-12 09:59:50,342] ERROR Found invalid messages during fetch for > partition [__consumer_offsets,18] offset 0 error Record size is less than the > minimum record overhead (14) (kafka.server.ReplicaFetcherThread) > [2017-06-12 09:59:50,342] ERROR Found invalid messages during fetch for > partition [__consumer_offsets,24] offset 0 error Record size is less than the > minimum record overhead (14) (kafka.server.ReplicaFetcherThread) > {code} > which I think are due to the full-disk event.
> The log cleaner threads died on these corrupt messages: > {code} > [2017-06-12 09:59:50,722] ERROR [kafka-log-cleaner-thread-0], Error due to > (kafka.log.LogCleaner) > org.apache.kafka.common.errors.CorruptRecordException: Record size is less > than the minimum record overhead (14) > [2017-06-12 09:59:50,722] INFO [kafka-log-cleaner-thread-0], Stopped > (kafka.log.LogCleaner) > {code} > Looking at the files I see that some are truncated and some are just empty: > $ ls -lsh 00594653.log > 0 -rw-r--r-- 1 user user 100M Jun 12 11:00 00594653.log > Sadly I no longer have the logs from the disk-full event itself. > I have three questions: > * What is the best way to clean this up? Deleting the old log files and > restarting the brokers? > * Why did Kafka not handle the disk-full event well? Is this only affecting > the cleanup, or might we also lose data? > * Is this maybe caused by the combination of the upgrade and the full disk? > And last but not least: keep up the good work. Kafka is really performing > well while being easy to administer and has good documentation! -- This message was sent by Atlassian JIRA (v6.4.14#64029)
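The broker settings described in this report can be summarized as a server.properties fragment. This is an illustrative sketch, not a complete configuration: only the segment size and roll time are stated explicitly in the comments, and enabling preallocation via {{log.preallocate}} is an assumption inferred from the reported 100MB preallocation size and the 100M file in the ls output.

{code}
# Illustrative excerpt based on the values reported in KAFKA-5431
log.segment.bytes=10485760   # 10 MB segments, described as "normal" in the comment
log.roll.ms=1000             # 1 s roll time, set deliberately to provoke frequent log rolls
log.preallocate=true         # assumed: preallocation on, files appear as 100M on disk
{code}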
[jira] [Commented] (KAFKA-5458) SASL handshake fails for METADATA requests
[ https://issues.apache.org/jira/browse/KAFKA-5458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16053896#comment-16053896 ] Edoardo Comar commented on KAFKA-5458: -- Looks like your properties file does not correctly set the property {{sasl.jaas.config}}. Note that when using it, you must not set the JVM system property. > SASL handshake fails for METADATA requests > -- > > Key: KAFKA-5458 > URL: https://issues.apache.org/jira/browse/KAFKA-5458 > Project: Kafka > Issue Type: Bug > Components: KafkaConnect >Affects Versions: 0.10.1.1 >Reporter: Sagar Lad > Attachments: kafka_client_jaas.conf, kafka_server_jaas.conf, > producer.properties, server.properties > > > I have enabled SASL authentication for my Kafka with > {code:java} > listeners=SASL_PLAINTEXT://0.0.0.0:9092 > advertised.listeners=SASL_PLAINTEXT://0.0.0.0:9092 > sasl.enabled.mechanisms=PLAIN > security.protocol=SASL_PLAINTEXT > security.inter.broker.protocol=SASL_PLAINTEXT > sasl.mechanism=PLAIN > sasl.mechanism.inter.broker.protocol=PLAIN > {code} > but in kafkaServer.out I am getting > {code:java} > [2017-06-16 11:21:12,167] DEBUG Set SASL server state to HANDSHAKE_REQUEST > (org.apache.kafka.common.security.authenticator.SaslServerAuthenticator) > [2017-06-16 11:21:12,167] DEBUG Handle Kafka request METADATA > (org.apache.kafka.common.security.authenticator.SaslServerAuthenticator) > [2017-06-16 11:21:12,167] DEBUG Set SASL server state to FAILED > (org.apache.kafka.common.security.authenticator.SaslServerAuthenticator) > [2017-06-16 11:21:12,167] DEBUG Connection with /127.0.0.1 disconnected > (org.apache.kafka.common.network.Selector) > java.io.IOException: > org.apache.kafka.common.errors.IllegalSaslStateException: Unexpected Kafka > request of type METADATA during SASL handshake. 
> at > org.apache.kafka.common.security.authenticator.SaslServerAuthenticator.authenticate(SaslServerAuthenticator.java:243) > at > org.apache.kafka.common.network.KafkaChannel.prepare(KafkaChannel.java:64) > at > org.apache.kafka.common.network.Selector.pollSelectionKeys(Selector.java:338) > at org.apache.kafka.common.network.Selector.poll(Selector.java:291) > at kafka.network.Processor.poll(SocketServer.scala:476) > at kafka.network.Processor.run(SocketServer.scala:416) > at java.lang.Thread.run(Thread.java:745) > {code} > whereas in the client log I observed > {code:java} > 2017-06-16 11:33:33,501 WARN [kafka-producer-network-thread | > producer-7] NetworkClient.maybeHandleDisconnection (NetworkClient.java:568) > - Bootstrap broker 127.0.0.1:9092 disconnected > 2017-06-16 11:33:33,607 WARN [kafka-producer-network-thread | > producer-7] NetworkClient.maybeHandleDisconnection (NetworkClient.java:568) > - Bootstrap broker 127.0.0.1:9092 disconnected > 2017-06-16 11:33:33,721 WARN [kafka-producer-network-thread | > producer-7] NetworkClient.maybeHandleDisconnection (NetworkClient.java:568) > - Bootstrap broker 127.0.0.1:9092 disconnected > 2017-06-16 11:33:33,834 WARN [kafka-producer-network-thread | > producer-7] NetworkClient.maybeHandleDisconnection (NetworkClient.java:568) > - Bootstrap broker 127.0.0.1:9092 disconnected > 2017-06-16 11:33:33,952 WARN [kafka-producer-network-thread | > producer-7] NetworkClient.maybeHandleDisconnection (NetworkClient.java:568) > - Bootstrap broker 127.0.0.1:9092 disconnected > 2017-06-16 11:33:34,059 WARN [kafka-producer-network-thread | > producer-7] NetworkClient.maybeHandleDisconnection (NetworkClient.java:568) > - Bootstrap broker 127.0.0.1:9092 disconnected > 2017-06-16 11:33:34,162 WARN [kafka-producer-network-thread | > producer-7] NetworkClient.maybeHandleDisconnection (NetworkClient.java:568) > - Bootstrap broker 127.0.0.1:9092 disconnected > ### > {code} > Metadata requests are being sent every 100 ms. > *Problem*: > how can we make the SASL handshake support METADATA-type requests? -- This message was sent by Atlassian JIRA (v6.4.14#64029)
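Edoardo's suggestion, configuring JAAS through the client property rather than a separate JAAS file, can be sketched as follows. The credentials are placeholders, and note that the {{sasl.jaas.config}} client property only exists from Kafka 0.10.2.0 onward (KIP-85); a 0.10.1.1 client, as reported on this issue, still needs the JAAS-file approach shown second.

{code}
# producer.properties sketch for clients >= 0.10.2.0 (placeholder credentials)
security.protocol=SASL_PLAINTEXT
sasl.mechanism=PLAIN
sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required \
  username="alice" \
  password="alice-secret";
{code}

{code}
// kafka_client_jaas.conf sketch for 0.10.1.x clients, passed via
// -Djava.security.auth.login.config=/path/to/kafka_client_jaas.conf
KafkaClient {
  org.apache.kafka.common.security.plain.PlainLoginModule required
  username="alice"
  password="alice-secret";
};
{code}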
[jira] [Comment Edited] (KAFKA-5458) SASL handshake fails for METADATA requests
[ https://issues.apache.org/jira/browse/KAFKA-5458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16053452#comment-16053452 ] Sagar Lad edited comment on KAFKA-5458 at 6/19/17 12:14 PM: Hi [~ecomar], I tried using a properties file as well instead of jaas.conf, but now I am getting: {code:java} Caused by: org.apache.kafka.common.KafkaException: java.lang.IllegalArgumentException: Could not find a 'KafkaClient' entry in the JAAS configuration. System property 'java.security.auth.login.config' is not set at org.apache.kafka.common.network.SaslChannelBuilder.configure(SaslChannelBuilder.java:86)[352:wrap_mvn_org.apache.kafka_kafka-clients_0.10.1.1:0.0.0] at org.apache.kafka.common.network.ChannelBuilders.create(ChannelBuilders.java:70)[352:wrap_mvn_org.apache.kafka_kafka-clients_0.10.1.1:0.0.0] at org.apache.kafka.clients.ClientUtils.createChannelBuilder(ClientUtils.java:85)[352:wrap_mvn_org.apache.kafka_kafka-clients_0.10.1.1:0.0.0] at org.apache.kafka.clients.producer.KafkaProducer.(KafkaProducer.java:302)[352:wrap_mvn_org.apache.kafka_kafka-clients_0.10.1.1:0.0.0] ... 74 more {code} was (Author: sagar lad): I tried using a properties file as well instead of jaas.conf, but now I am getting: {code:java} Caused by: org.apache.kafka.common.KafkaException: java.lang.IllegalArgumentException: Could not find a 'KafkaClient' entry in the JAAS configuration. 
System property 'java.security.auth.login.config' is not set at org.apache.kafka.common.network.SaslChannelBuilder.configure(SaslChannelBuilder.java:86)[352:wrap_mvn_org.apache.kafka_kafka-clients_0.10.1.1:0.0.0] at org.apache.kafka.common.network.ChannelBuilders.create(ChannelBuilders.java:70)[352:wrap_mvn_org.apache.kafka_kafka-clients_0.10.1.1:0.0.0] at org.apache.kafka.clients.ClientUtils.createChannelBuilder(ClientUtils.java:85)[352:wrap_mvn_org.apache.kafka_kafka-clients_0.10.1.1:0.0.0] at org.apache.kafka.clients.producer.KafkaProducer.(KafkaProducer.java:302)[352:wrap_mvn_org.apache.kafka_kafka-clients_0.10.1.1:0.0.0] ... 74 more {code} > SASL handshake fails for METADATA requests > -- > > Key: KAFKA-5458 > URL: https://issues.apache.org/jira/browse/KAFKA-5458 > Project: Kafka > Issue Type: Bug > Components: KafkaConnect >Affects Versions: 0.10.1.1 >Reporter: Sagar Lad > Attachments: kafka_client_jaas.conf, kafka_server_jaas.conf, > producer.properties, server.properties > > > I have enabled SASL authentication for my Kafka with > {code:java} > listeners=SASL_PLAINTEXT://0.0.0.0:9092 > advertised.listeners=SASL_PLAINTEXT://0.0.0.0:9092 > sasl.enabled.mechanisms=PLAIN > security.protocol=SASL_PLAINTEXT > security.inter.broker.protocol=SASL_PLAINTEXT > sasl.mechanism=PLAIN > sasl.mechanism.inter.broker.protocol=PLAIN > {code} > but in kafkaServer.out I am getting > {code:java} > [2017-06-16 11:21:12,167] DEBUG Set SASL server state to HANDSHAKE_REQUEST > (org.apache.kafka.common.security.authenticator.SaslServerAuthenticator) > [2017-06-16 11:21:12,167] DEBUG Handle Kafka request METADATA > (org.apache.kafka.common.security.authenticator.SaslServerAuthenticator) > [2017-06-16 11:21:12,167] DEBUG Set SASL server state to FAILED > (org.apache.kafka.common.security.authenticator.SaslServerAuthenticator) > [2017-06-16 11:21:12,167] DEBUG Connection with /127.0.0.1 disconnected > (org.apache.kafka.common.network.Selector) > java.io.IOException: > 
org.apache.kafka.common.errors.IllegalSaslStateException: Unexpected Kafka > request of type METADATA during SASL handshake. > at > org.apache.kafka.common.security.authenticator.SaslServerAuthenticator.authenticate(SaslServerAuthenticator.java:243) > at > org.apache.kafka.common.network.KafkaChannel.prepare(KafkaChannel.java:64) > at > org.apache.kafka.common.network.Selector.pollSelectionKeys(Selector.java:338) > at org.apache.kafka.common.network.Selector.poll(Selector.java:291) > at kafka.network.Processor.poll(SocketServer.scala:476) > at kafka.network.Processor.run(SocketServer.scala:416) > at java.lang.Thread.run(Thread.java:745) > {code} > whereas in the client log I observed > {code:java} > 2017-06-16 11:33:33,501 WARN [kafka-producer-network-thread | > producer-7] NetworkClient.maybeHandleDisconnection (NetworkClient.java:568) > - Bootstrap broker 127.0.0.1:9092 disconnected > 2017-06-16 11:33:33,607 WARN [kafka-producer-network-thread | > producer-7] NetworkClient.maybeHandleDisconnection (NetworkClient.java:568) > - Bootstrap broker 127.0.0.1:9092 disconnected > 2017-06-16 11:33:33,721 WARN [kafka-producer-network-thread | > producer-7] NetworkClient.maybeHandleDisconnection (NetworkClient.java:
[jira] [Commented] (KAFKA-4900) Brokers stuck in controller re-election loop after failing to register metrics
[ https://issues.apache.org/jira/browse/KAFKA-4900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16053852#comment-16053852 ] Pengwei commented on KAFKA-4900: [~ijuma] OK. I see the code in trunk: after a send fails it calls handleIllegalState, which then calls onControllerResignation() before setting the leaderId to -1, so it seems this PR can fix this issue. > Brokers stuck in controller re-election loop after failing to register metrics > -- > > Key: KAFKA-4900 > URL: https://issues.apache.org/jira/browse/KAFKA-4900 > Project: Kafka > Issue Type: Bug > Components: controller, core >Affects Versions: 0.10.1.1 >Reporter: Nick Travers > > We hit this today in one of our three-node staging clusters. The exception > continues to occur on all three nodes. > {code} > 2017-03-15 02:17:30,677 ERROR > [ZkClient-EventThread-35-samsa-zkserver.stage.sjc1.square:26101/samsa] > server.ZookeeperLeaderElector - Error while electing or becoming leader on > broker 9 > java.lang.IllegalArgumentException: A metric named 'MetricName > [name=connection-close-rate, > group=controller-channel-metrics,description=Connections closed per second in > the window., tags={broker-id=10}]' already exists, can't register another one. 
> at > org.apache.kafka.common.metrics.Metrics.registerMetric(Metrics.java:380) > at org.apache.kafka.common.metrics.Sensor.add(Sensor.java:179) > at org.apache.kafka.common.metrics.Sensor.add(Sensor.java:164) > at > org.apache.kafka.common.network.Selector$SelectorMetrics.(Selector.java:617) > at org.apache.kafka.common.network.Selector.(Selector.java:138) > at > kafka.controller.ControllerChannelManager.kafka$controller$ControllerChannelManager$$addNewBroker(ControllerChannelManager.scala:101) > at > kafka.controller.ControllerChannelManager$$anonfun$1.apply(ControllerChannelManager.scala:45) > at > kafka.controller.ControllerChannelManager$$anonfun$1.apply(ControllerChannelManager.scala:45) > at scala.collection.immutable.Set$Set3.foreach(Set.scala:163) > at > kafka.controller.ControllerChannelManager.(ControllerChannelManager.scala:45) > at > kafka.controller.KafkaController.startChannelManager(KafkaController.scala:814) > at > kafka.controller.KafkaController.initializeControllerContext(KafkaController.scala:742) > at > kafka.controller.KafkaController.onControllerFailover(KafkaController.scala:334) > at > kafka.controller.KafkaController$$anonfun$1.apply$mcV$sp(KafkaController.scala:167) > at > kafka.server.ZookeeperLeaderElector.elect(ZookeeperLeaderElector.scala:84) > at > kafka.server.ZookeeperLeaderElector$LeaderChangeListener$$anonfun$handleDataDeleted$1.apply$mcZ$sp(ZookeeperLeaderElector.scala:146) > at > kafka.server.ZookeeperLeaderElector$LeaderChangeListener$$anonfun$handleDataDeleted$1.apply(ZookeeperLeaderElector.scala:141) > at > kafka.server.ZookeeperLeaderElector$LeaderChangeListener$$anonfun$handleDataDeleted$1.apply(ZookeeperLeaderElector.scala:141) > at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:234) > at > kafka.server.ZookeeperLeaderElector$LeaderChangeListener.handleDataDeleted(ZookeeperLeaderElector.scala:141) > at org.I0Itec.zkclient.ZkClient$9.run(ZkClient.java:824) > at org.I0Itec.zkclient.ZkEventThread.run(ZkEventThread.java:71) > 
{code} > We observe a tight loop of controller (re-)election, i.e. one node hits this > exception, and leadership transitions to the next, which then hits the > exception, ad infinitum. > Producers and consumers appear to be connecting ok, and are able to produce > and consume messages. > Relevant data points: > - prior to this cluster restart, a partition reassignment was attempted for > a number of topics, which appeared to get stuck in the "in progress" state > (on the order of days) > - these topics were subsequently deleted > - a rolling restart of the cluster was performed to turn on > broker-to-broker SSL communication > - the SSL change has subsequently been _rolled back_ after we observed these > exceptions > - the entire cluster was shut down, and nodes brought back one at a time in > an attempt to clear the exception. We were able to restart the cluster, but > we continue to see the exceptions > We also observed, during the same time as the exception above, the following > exception on all hosts: > {code} > 2017-03-15 01:44:04,572 ERROR > [ZkClient-EventThread-36-samsa-zkserver.stage.sjc1.square:26101/samsa] > controller.ReplicaStateMachine$BrokerChangeListener - [BrokerChangeListener > on Controller 10]: Error while handling broker changes > java.lang.ClassCastException: java
[jira] [Commented] (KAFKA-4900) Brokers stuck in controller re-election loop after failing to register metrics
[ https://issues.apache.org/jira/browse/KAFKA-4900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16053830#comment-16053830 ] Ismael Juma commented on KAFKA-4900: No, #2935 is only in the trunk and 0.11.0 branches, and it's not for KAFKA-3761 (though it did remove something that had been left behind by KAFKA-3761). > Brokers stuck in controller re-election loop after failing to register metrics > -- > > Key: KAFKA-4900 > URL: https://issues.apache.org/jira/browse/KAFKA-4900 > Project: Kafka > Issue Type: Bug > Components: controller, core >Affects Versions: 0.10.1.1 >Reporter: Nick Travers > > We hit this today in one of our three-node staging clusters. The exception > continues to occur on all three nodes. > {code} > 2017-03-15 02:17:30,677 ERROR > [ZkClient-EventThread-35-samsa-zkserver.stage.sjc1.square:26101/samsa] > server.ZookeeperLeaderElector - Error while electing or becoming leader on > broker 9 > java.lang.IllegalArgumentException: A metric named 'MetricName > [name=connection-close-rate, > group=controller-channel-metrics,description=Connections closed per second in > the window., tags={broker-id=10}]' already exists, can't register another one. 
> at > org.apache.kafka.common.metrics.Metrics.registerMetric(Metrics.java:380) > at org.apache.kafka.common.metrics.Sensor.add(Sensor.java:179) > at org.apache.kafka.common.metrics.Sensor.add(Sensor.java:164) > at > org.apache.kafka.common.network.Selector$SelectorMetrics.(Selector.java:617) > at org.apache.kafka.common.network.Selector.(Selector.java:138) > at > kafka.controller.ControllerChannelManager.kafka$controller$ControllerChannelManager$$addNewBroker(ControllerChannelManager.scala:101) > at > kafka.controller.ControllerChannelManager$$anonfun$1.apply(ControllerChannelManager.scala:45) > at > kafka.controller.ControllerChannelManager$$anonfun$1.apply(ControllerChannelManager.scala:45) > at scala.collection.immutable.Set$Set3.foreach(Set.scala:163) > at > kafka.controller.ControllerChannelManager.(ControllerChannelManager.scala:45) > at > kafka.controller.KafkaController.startChannelManager(KafkaController.scala:814) > at > kafka.controller.KafkaController.initializeControllerContext(KafkaController.scala:742) > at > kafka.controller.KafkaController.onControllerFailover(KafkaController.scala:334) > at > kafka.controller.KafkaController$$anonfun$1.apply$mcV$sp(KafkaController.scala:167) > at > kafka.server.ZookeeperLeaderElector.elect(ZookeeperLeaderElector.scala:84) > at > kafka.server.ZookeeperLeaderElector$LeaderChangeListener$$anonfun$handleDataDeleted$1.apply$mcZ$sp(ZookeeperLeaderElector.scala:146) > at > kafka.server.ZookeeperLeaderElector$LeaderChangeListener$$anonfun$handleDataDeleted$1.apply(ZookeeperLeaderElector.scala:141) > at > kafka.server.ZookeeperLeaderElector$LeaderChangeListener$$anonfun$handleDataDeleted$1.apply(ZookeeperLeaderElector.scala:141) > at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:234) > at > kafka.server.ZookeeperLeaderElector$LeaderChangeListener.handleDataDeleted(ZookeeperLeaderElector.scala:141) > at org.I0Itec.zkclient.ZkClient$9.run(ZkClient.java:824) > at org.I0Itec.zkclient.ZkEventThread.run(ZkEventThread.java:71) > 
{code} > We observe a tight loop of controller (re-)election, i.e. one node hits this > exception, and leadership transitions to the next, which then hits the > exception, ad infinitum. > Producers and consumers appear to be connecting ok, and are able to produce > and consume messages. > Relevant data points: > - prior to this cluster restart, a partition reassignment was attempted for > a number of topics, which appeared to get stuck in the "in progress" state > (on the order of days) > - these topics were subsequently deleted > - a rolling restart of the cluster was performed to turn on > broker-to-broker SSL communication > - the SSL change has subsequently been _rolled back_ after we observed these > exceptions > - the entire cluster was shut down, and nodes brought back one at a time in > an attempt to clear the exception. We were able to restart the cluster, but > we continue to see the exceptions > We also observed, during the same time as the exception above, the following > exception on all hosts: > {code} > 2017-03-15 01:44:04,572 ERROR > [ZkClient-EventThread-36-samsa-zkserver.stage.sjc1.square:26101/samsa] > controller.ReplicaStateMachine$BrokerChangeListener - [BrokerChangeListener > on Controller 10]: Error while handling broker changes > java.lang.ClassCastException: java.lang.String cannot be cast to > java.lang.Integer > a
[jira] [Commented] (KAFKA-4900) Brokers stuck in controller re-election loop after failing to register metrics
[ https://issues.apache.org/jira/browse/KAFKA-4900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16053813#comment-16053813 ] Pengwei commented on KAFKA-4900: [~ijuma] This PR is for https://issues.apache.org/jira/browse/KAFKA-3761, and KAFKA-3761 was fixed in 0.10.1.0. But I found that in 0.10.2.0, ZookeeperLeaderElector.resign is still setting the leaderId to -1 > Brokers stuck in controller re-election loop after failing to register metrics > -- > > Key: KAFKA-4900 > URL: https://issues.apache.org/jira/browse/KAFKA-4900 > Project: Kafka > Issue Type: Bug > Components: controller, core >Affects Versions: 0.10.1.1 >Reporter: Nick Travers > > We hit this today in one of our three-node staging clusters. The exception > continues to occur on all three nodes. > {code} > 2017-03-15 02:17:30,677 ERROR > [ZkClient-EventThread-35-samsa-zkserver.stage.sjc1.square:26101/samsa] > server.ZookeeperLeaderElector - Error while electing or becoming leader on > broker 9 > java.lang.IllegalArgumentException: A metric named 'MetricName > [name=connection-close-rate, > group=controller-channel-metrics,description=Connections closed per second in > the window., tags={broker-id=10}]' already exists, can't register another one. 
> at > org.apache.kafka.common.metrics.Metrics.registerMetric(Metrics.java:380) > at org.apache.kafka.common.metrics.Sensor.add(Sensor.java:179) > at org.apache.kafka.common.metrics.Sensor.add(Sensor.java:164) > at > org.apache.kafka.common.network.Selector$SelectorMetrics.(Selector.java:617) > at org.apache.kafka.common.network.Selector.(Selector.java:138) > at > kafka.controller.ControllerChannelManager.kafka$controller$ControllerChannelManager$$addNewBroker(ControllerChannelManager.scala:101) > at > kafka.controller.ControllerChannelManager$$anonfun$1.apply(ControllerChannelManager.scala:45) > at > kafka.controller.ControllerChannelManager$$anonfun$1.apply(ControllerChannelManager.scala:45) > at scala.collection.immutable.Set$Set3.foreach(Set.scala:163) > at > kafka.controller.ControllerChannelManager.(ControllerChannelManager.scala:45) > at > kafka.controller.KafkaController.startChannelManager(KafkaController.scala:814) > at > kafka.controller.KafkaController.initializeControllerContext(KafkaController.scala:742) > at > kafka.controller.KafkaController.onControllerFailover(KafkaController.scala:334) > at > kafka.controller.KafkaController$$anonfun$1.apply$mcV$sp(KafkaController.scala:167) > at > kafka.server.ZookeeperLeaderElector.elect(ZookeeperLeaderElector.scala:84) > at > kafka.server.ZookeeperLeaderElector$LeaderChangeListener$$anonfun$handleDataDeleted$1.apply$mcZ$sp(ZookeeperLeaderElector.scala:146) > at > kafka.server.ZookeeperLeaderElector$LeaderChangeListener$$anonfun$handleDataDeleted$1.apply(ZookeeperLeaderElector.scala:141) > at > kafka.server.ZookeeperLeaderElector$LeaderChangeListener$$anonfun$handleDataDeleted$1.apply(ZookeeperLeaderElector.scala:141) > at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:234) > at > kafka.server.ZookeeperLeaderElector$LeaderChangeListener.handleDataDeleted(ZookeeperLeaderElector.scala:141) > at org.I0Itec.zkclient.ZkClient$9.run(ZkClient.java:824) > at org.I0Itec.zkclient.ZkEventThread.run(ZkEventThread.java:71) > 
{code} > We observe a tight loop of controller (re-)election, i.e. one node hits this > exception, and leadership transitions to the next, which then hits the > exception, ad infinitum. > Producers and consumers appear to be connecting ok, and are able to produce > and consume messages. > Relevant data points: > - prior to this cluster restart, a partition reassignment was attempted for > a number of topics, which appeared to get stuck in the "in progress" state > (on the order of days) > - these topics were subsequently deleted > - a rolling restart of the cluster was performed to turn on > broker-to-broker SSL communication > - the SSL change has subsequently been _rolled back_ after we observed these > exceptions > - the entire cluster was shut down, and nodes brought back one at a time in > an attempt to clear the exception. We were able to restart the cluster, but > we continue to see the exceptions > We also observed, during the same time as the exception above, the following > exception on all hosts: > {code} > 2017-03-15 01:44:04,572 ERROR > [ZkClient-EventThread-36-samsa-zkserver.stage.sjc1.square:26101/samsa] > controller.ReplicaStateMachine$BrokerChangeListener - [BrokerChangeListener > on Controller 10]: Error while handling broker changes > java.lang.ClassCastException:
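The failure mode discussed in the comments above — metrics registered by the previous controller session are never cleaned up, so the next election attempt dies on the duplicate-name check and leadership bounces to the next broker — can be illustrated with a toy registry. All class, method, and metric names below are hypothetical stand-ins, not Kafka's actual classes; only the duplicate-name behavior mirrors the {{Metrics.registerMetric}} check in the stack trace.

```java
import java.util.HashMap;
import java.util.Map;

// Toy sketch of the KAFKA-4900 loop; names are hypothetical stand-ins.
public class ReelectionSketch {

    static final Map<String, Object> METRICS = new HashMap<>();

    // Mirrors the duplicate-name check seen in the stack trace above.
    static void registerMetric(String name) {
        if (METRICS.containsKey(name)) {
            throw new IllegalArgumentException("A metric named '" + name
                    + "' already exists, can't register another one.");
        }
        METRICS.put(name, new Object());
    }

    // The cleanup that resignation is supposed to run.
    static void onControllerResignation(String name) {
        METRICS.remove(name);
    }

    static String demo() {
        try {
            registerMetric("connection-close-rate"); // first election: succeeds
        } catch (IllegalArgumentException alreadyRegistered) {
            // ignore: left over from a previous run of this demo
        }
        try {
            // re-election without cleaning up the previous session's metrics
            registerMetric("connection-close-rate");
            return "unexpected: duplicate registration did not fail";
        } catch (IllegalArgumentException e) {
            // resigning properly removes the stale metric...
            onControllerResignation("connection-close-rate");
            // ...so the next election attempt can register it again
            registerMetric("connection-close-rate");
            return "recovered";
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Per the comments, the fix in trunk amounts to making the resignation path run this kind of cleanup before the leaderId is reset, so a subsequent election starts from a clean registry instead of looping.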
[jira] [Commented] (KAFKA-5469) Created state changelog topics not logged correctly
[ https://issues.apache.org/jira/browse/KAFKA-5469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16053801#comment-16053801 ] ASF GitHub Bot commented on KAFKA-5469: --- GitHub user ppatierno opened a pull request: https://github.com/apache/kafka/pull/3368 KAFKA-5469: Created state changelog topics not logged correctly Fixed debug logging for the created state changelog topics Added toString() for InternalTopicMetadata and InternalTopicConfig for above debug logging You can merge this pull request into a Git repository by running: $ git pull https://github.com/ppatierno/kafka kafka-5469 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/kafka/pull/3368.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3368 commit 33d1ce3abdd99c98a8d878d266849347d1f20807 Author: ppatierno Date: 2017-06-19T10:56:35Z Fixed debug logging for the created state changelog topics Added toString() for InternalTopicMetadata and InternalTopicConfig for above debug logging > Created state changelog topics not logged correctly > --- > > Key: KAFKA-5469 > URL: https://issues.apache.org/jira/browse/KAFKA-5469 > Project: Kafka > Issue Type: Bug > Components: streams >Reporter: Paolo Patierno >Assignee: Paolo Patierno > > > In the StreamPartitionAssignor class, the created state changelog topics > aren't logged correctly if the DEBUG log level is set. > {code} > [2017-06-19 12:18:44,186] DEBUG stream-thread > [streams-temperature-6d25c7ff-0927-4469-8ab3-2685e7d8-StreamThread-1] > Created state changelog topics > {streams-temperature-KSTREAM-REDUCE-STATE-STORE-02-changelog=org.apache.kafka.streams.processor.internals.StreamPartitionAssignor$InternalTopicMetadata@65ae7693} > from the parsed topology. 
> (org.apache.kafka.streams.processor.internals.StreamPartitionAssignor:477) > {code} > in contrast to repartition topics, which are logged correctly: > {code} > [2017-06-19 12:18:37,871] DEBUG stream-thread > [streams-temperature-6d25c7ff-0927-4469-8ab3-2685e7d8-StreamThread-1] > Created repartition topics [Partition(topic = > streams-temperature-KSTREAM-REDUCE-STATE-STORE-02-repartition, > partition = 0, leader = none, replicas = [], isr = [])] from the parsed > topology. > (org.apache.kafka.streams.processor.internals.StreamPartitionAssignor:402) > {code} > At the same time, if source topics are not created before launching the streams > application, the state changelog topics log shows just {} (the placeholder), > while for repartition topics it shows [], which is right because it's an empty list. > It seems that there are two problems: for state changelog topics values() > is not used, and InternalTopicMetadata doesn't have a toString() to produce > well-formatted output. > I'm already working on that. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
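The logging symptom itself is easy to reproduce in isolation: Java's default {{Object.toString()}} renders map values as ClassName@hashcode, exactly like the {{...InternalTopicMetadata@65ae7693}} in the DEBUG line above, until a toString() override is added. A minimal sketch follows; {{TopicMetadata}} and the topic name are hypothetical stand-ins, not Kafka's actual classes.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Minimal sketch of the KAFKA-5469 symptom. TopicMetadata is a hypothetical
// stand-in for InternalTopicMetadata, which lacked a toString() in this report.
public class ChangelogLogSketch {

    static class TopicMetadata {
        final String name;
        final int partitions;
        TopicMetadata(String name, int partitions) {
            this.name = name;
            this.partitions = partitions;
        }
        // The fix: override toString() so the logged map renders readably.
        @Override
        public String toString() {
            return "TopicMetadata(name=" + name + ", partitions=" + partitions + ")";
        }
    }

    // Without a toString() override, map values print as ClassName@hashcode.
    static String brokenLogLine() {
        Map<String, Object> changelogs = new LinkedHashMap<>();
        changelogs.put("app-store-changelog", new Object());
        return "Created state changelog topics " + changelogs + " from the parsed topology.";
    }

    static String fixedLogLine() {
        Map<String, TopicMetadata> changelogs = new LinkedHashMap<>();
        changelogs.put("app-store-changelog", new TopicMetadata("app-store-changelog", 1));
        return "Created state changelog topics " + changelogs + " from the parsed topology.";
    }

    public static void main(String[] args) {
        System.out.println(brokenLogLine()); // value renders as java.lang.Object@<hashcode>
        System.out.println(fixedLogLine());  // value renders via the toString() override
    }
}
```

This contrast is also consistent with the repartition-topics line in the description looking fine: the logged Partition values evidently already have a useful toString().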
[jira] [Updated] (KAFKA-5469) Created state changelog topics not logged correctly
[ https://issues.apache.org/jira/browse/KAFKA-5469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paolo Patierno updated KAFKA-5469: -- Description: In the StreamPartitionAssignor class, the created state changelog topics aren't logged correctly if the DEBUG log level is set. {code} [2017-06-19 12:18:44,186] DEBUG stream-thread [streams-temperature-6d25c7ff-0927-4469-8ab3-2685e7d8-StreamThread-1] Created state changelog topics {streams-temperature-KSTREAM-REDUCE-STATE-STORE-02-changelog=org.apache.kafka.streams.processor.internals.StreamPartitionAssignor$InternalTopicMetadata@65ae7693} from the parsed topology. (org.apache.kafka.streams.processor.internals.StreamPartitionAssignor:477) {code} in contrast to repartition topics, which are logged correctly: {code} [2017-06-19 12:18:37,871] DEBUG stream-thread [streams-temperature-6d25c7ff-0927-4469-8ab3-2685e7d8-StreamThread-1] Created repartition topics [Partition(topic = streams-temperature-KSTREAM-REDUCE-STATE-STORE-02-repartition, partition = 0, leader = none, replicas = [], isr = [])] from the parsed topology. (org.apache.kafka.streams.processor.internals.StreamPartitionAssignor:402) {code} At the same time, if source topics are not created before launching the streams application, the state changelog topics log shows just {} (the placeholder), while for repartition topics it shows [], which is right because it's an empty list. It seems that there are two problems: for state changelog topics values() is not used, and InternalTopicMetadata doesn't have a toString() to produce well-formatted output. I'm already working on that. was: In the StreamPartitionAssignor class, the created state changelog topics aren't logged correctly if the DEBUG log level is set. 
[2017-06-19 12:18:44,186] DEBUG stream-thread [streams-temperature-6d25c7ff-0927-4469-8ab3-2685e7d8-StreamThread-1] Created state changelog topics {streams-temperature-KSTREAM-REDUCE-STATE-STORE-02-changelog=org.apache.kafka.streams.processor.internals.StreamPartitionAssignor$InternalTopicMetadata@65ae7693} from the parsed topology. (org.apache.kafka.streams.processor.internals.StreamPartitionAssignor:477) in contrast to repartition topics, which are logged correctly: [2017-06-19 12:18:37,871] DEBUG stream-thread [streams-temperature-6d25c7ff-0927-4469-8ab3-2685e7d8-StreamThread-1] Created repartition topics [Partition(topic = streams-temperature-KSTREAM-REDUCE-STATE-STORE-02-repartition, partition = 0, leader = none, replicas = [], isr = [])] from the parsed topology. (org.apache.kafka.streams.processor.internals.StreamPartitionAssignor:402) At the same time, if source topics are not created before launching the streams application, the state changelog topics log shows just {} (the placeholder), while for repartition topics it shows [], which is right because it's an empty list. It seems that there are two problems: for state changelog topics values() is not used, and InternalTopicMetadata doesn't have a toString() to produce well-formatted output. I'm already working on that. > Created state changelog topics not logged correctly > --- > > Key: KAFKA-5469 > URL: https://issues.apache.org/jira/browse/KAFKA-5469 > Project: Kafka > Issue Type: Bug > Components: streams >Reporter: Paolo Patierno >Assignee: Paolo Patierno > > > In the StreamPartitionAssignor class, the created state changelog topics > aren't logged correctly if the DEBUG log level is set. 
> {code} > [2017-06-19 12:18:44,186] DEBUG stream-thread > [streams-temperature-6d25c7ff-0927-4469-8ab3-2685e7d8-StreamThread-1] > Created state changelog topics > {streams-temperature-KSTREAM-REDUCE-STATE-STORE-02-changelog=org.apache.kafka.streams.processor.internals.StreamPartitionAssignor$InternalTopicMetadata@65ae7693} > from the parsed topology. > (org.apache.kafka.streams.processor.internals.StreamPartitionAssignor:477) > {code} > unlike the repartition topics, which are logged correctly: > {code} > [2017-06-19 12:18:37,871] DEBUG stream-thread > [streams-temperature-6d25c7ff-0927-4469-8ab3-2685e7d8-StreamThread-1] > Created repartition topics [Partition(topic = > streams-temperature-KSTREAM-REDUCE-STATE-STORE-02-repartition, > partition = 0, leader = none, replicas = [], isr = [])] from the parsed > topology. > (org.apache.kafka.streams.processor.internals.StreamPartitionAssignor:402) > {code} > At the same time, if the source topics are not created before launching the streams > application, the state changelog log line shows just {} (the map placeholder) > while the repartition line shows [], which is correct because it's an empty list. > There seem to be two problems: for state changelog topics, values() > is not used, and InternalTopicMetadata doesn't have a toString() to produce well-formatted output.
[jira] [Created] (KAFKA-5469) Created state changelog topics not logged correctly
Paolo Patierno created KAFKA-5469: - Summary: Created state changelog topics not logged correctly Key: KAFKA-5469 URL: https://issues.apache.org/jira/browse/KAFKA-5469 Project: Kafka Issue Type: Bug Components: streams Reporter: Paolo Patierno Assignee: Paolo Patierno In the StreamPartitionAssignor class, the created state changelog topics aren't logged correctly when the DEBUG log level is set. [2017-06-19 12:18:44,186] DEBUG stream-thread [streams-temperature-6d25c7ff-0927-4469-8ab3-2685e7d8-StreamThread-1] Created state changelog topics {streams-temperature-KSTREAM-REDUCE-STATE-STORE-02-changelog=org.apache.kafka.streams.processor.internals.StreamPartitionAssignor$InternalTopicMetadata@65ae7693} from the parsed topology. (org.apache.kafka.streams.processor.internals.StreamPartitionAssignor:477) unlike the repartition topics, which are logged correctly: [2017-06-19 12:18:37,871] DEBUG stream-thread [streams-temperature-6d25c7ff-0927-4469-8ab3-2685e7d8-StreamThread-1] Created repartition topics [Partition(topic = streams-temperature-KSTREAM-REDUCE-STATE-STORE-02-repartition, partition = 0, leader = none, replicas = [], isr = [])] from the parsed topology. (org.apache.kafka.streams.processor.internals.StreamPartitionAssignor:402) At the same time, if the source topics are not created before launching the streams application, the state changelog log line shows just {} (the map placeholder), while the repartition line shows [], which is correct because it's an empty list. There seem to be two problems: for state changelog topics, values() is not used, and InternalTopicMetadata doesn't have a toString() to produce well-formatted output. I'm already working on that. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
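The `@65ae7693` in the log line above is the default `Object.toString()` output. A small illustrative sketch (the class names are hypothetical stand-ins, not the real `InternalTopicMetadata`) shows why a map of values without a `toString()` override logs that way, and how overriding it fixes the log line:

```java
import java.util.HashMap;
import java.util.Map;

public class ToStringDemo {
    // Stand-in for a metadata class WITHOUT a toString() override:
    // Map.toString() falls back to Object's "ClassName@hashCode" form.
    static class WithoutToString {
        final String topic;
        WithoutToString(String topic) { this.topic = topic; }
    }

    // Same class WITH a toString() override, as the fix proposes.
    static class WithToString {
        final String topic;
        WithToString(String topic) { this.topic = topic; }
        @Override
        public String toString() { return "InternalTopicMetadata(topic=" + topic + ")"; }
    }

    public static void main(String[] args) {
        Map<String, WithoutToString> bad = new HashMap<>();
        bad.put("changelog", new WithoutToString("changelog"));
        System.out.println(bad); // e.g. {changelog=ToStringDemo$WithoutToString@1b6d3586}

        Map<String, WithToString> good = new HashMap<>();
        good.put("changelog", new WithToString("changelog"));
        System.out.println(good); // {changelog=InternalTopicMetadata(topic=changelog)}
    }
}
```

This also explains the `{}` vs `[]` observation in the report: logging the map directly prints `{}` when empty, while the repartition code path logs a collection of `Partition` objects (which do format themselves) as `[]`.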
[jira] [Commented] (KAFKA-4900) Brokers stuck in controller re-election loop after failing to register metrics
[ https://issues.apache.org/jira/browse/KAFKA-4900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16053750#comment-16053750 ] Ismael Juma commented on KAFKA-4900: [~pengwei], can you please double-check that the following PR has not fixed the issue you are describing? https://github.com/apache/kafka/pull/2935 > Brokers stuck in controller re-election loop after failing to register metrics > -- > > Key: KAFKA-4900 > URL: https://issues.apache.org/jira/browse/KAFKA-4900 > Project: Kafka > Issue Type: Bug > Components: controller, core >Affects Versions: 0.10.1.1 >Reporter: Nick Travers > > We hit this today in one of our three-node staging clusters. The exception > continues to occur on all three nodes. > {code} > 2017-03-15 02:17:30,677 ERROR > [ZkClient-EventThread-35-samsa-zkserver.stage.sjc1.square:26101/samsa] > server.ZookeeperLeaderElector - Error while electing or becoming leader on > broker 9 > java.lang.IllegalArgumentException: A metric named 'MetricName > [name=connection-close-rate, > group=controller-channel-metrics,description=Connections closed per second in > the window., tags={broker-id=10}]' already exists, can't register another one. 
> at > org.apache.kafka.common.metrics.Metrics.registerMetric(Metrics.java:380) > at org.apache.kafka.common.metrics.Sensor.add(Sensor.java:179) > at org.apache.kafka.common.metrics.Sensor.add(Sensor.java:164) > at > org.apache.kafka.common.network.Selector$SelectorMetrics.<init>(Selector.java:617) > at org.apache.kafka.common.network.Selector.<init>(Selector.java:138) > at > kafka.controller.ControllerChannelManager.kafka$controller$ControllerChannelManager$$addNewBroker(ControllerChannelManager.scala:101) > at > kafka.controller.ControllerChannelManager$$anonfun$1.apply(ControllerChannelManager.scala:45) > at > kafka.controller.ControllerChannelManager$$anonfun$1.apply(ControllerChannelManager.scala:45) > at scala.collection.immutable.Set$Set3.foreach(Set.scala:163) > at > kafka.controller.ControllerChannelManager.<init>(ControllerChannelManager.scala:45) > at > kafka.controller.KafkaController.startChannelManager(KafkaController.scala:814) > at > kafka.controller.KafkaController.initializeControllerContext(KafkaController.scala:742) > at > kafka.controller.KafkaController.onControllerFailover(KafkaController.scala:334) > at > kafka.controller.KafkaController$$anonfun$1.apply$mcV$sp(KafkaController.scala:167) > at > kafka.server.ZookeeperLeaderElector.elect(ZookeeperLeaderElector.scala:84) > at > kafka.server.ZookeeperLeaderElector$LeaderChangeListener$$anonfun$handleDataDeleted$1.apply$mcZ$sp(ZookeeperLeaderElector.scala:146) > at > kafka.server.ZookeeperLeaderElector$LeaderChangeListener$$anonfun$handleDataDeleted$1.apply(ZookeeperLeaderElector.scala:141) > at > kafka.server.ZookeeperLeaderElector$LeaderChangeListener$$anonfun$handleDataDeleted$1.apply(ZookeeperLeaderElector.scala:141) > at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:234) > at > kafka.server.ZookeeperLeaderElector$LeaderChangeListener.handleDataDeleted(ZookeeperLeaderElector.scala:141) > at org.I0Itec.zkclient.ZkClient$9.run(ZkClient.java:824) > at org.I0Itec.zkclient.ZkEventThread.run(ZkEventThread.java:71) > 
{code} > We observe a tight loop of controller (re-)election, i.e. one node hits this > exception, and leadership transitions to the next, which then hits the > exception, ad infinitum. > Producers and consumers appear to be connecting ok, and are able to produce > and consume messages. > Relevant data points: > - prior to this cluster restart, a partition reassignment was attempted for > a number of topics, which appeared to get stuck in the "in progress" state > (on the order of days) > - these topics were subsequently deleted > - a rolling restart of the cluster was performed to turn on > broker-to-broker SSL communication > - the SSL change has subsequently been _rolled back_ after we observed these > exceptions > - the entire cluster was shut down, and nodes brought back one at a time in > an attempt to clear the exception. We were able to restart the cluster, but > we continue to see the exceptions > We also observed, during the same time as the exception above, the following > exception on all hosts: > {code} > 2017-03-15 01:44:04,572 ERROR > [ZkClient-EventThread-36-samsa-zkserver.stage.sjc1.square:26101/samsa] > controller.ReplicaStateMachine$BrokerChangeListener - [BrokerChangeListener > on Controller 10]: Error while handling broker changes > java.lang.ClassCastException: java.lang.String cannot be cast to > java.lang.Integer >
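The IllegalArgumentException in the stack trace comes from registering a metric name that already exists. The following is a self-contained sketch of that registry behavior (it mimics the message of, but is not, Kafka's actual org.apache.kafka.common.metrics.Metrics class); it illustrates why re-creating per-broker Selectors during re-election only works if the old ones are closed and unregistered first:

```java
import java.util.HashMap;
import java.util.Map;

public class MetricRegistry {
    private final Map<String, Object> metrics = new HashMap<>();

    // Registering a name twice fails, mirroring the exception in the report.
    public void register(String name, Object metric) {
        if (metrics.containsKey(name)) {
            throw new IllegalArgumentException("A metric named '" + name
                    + "' already exists, can't register another one.");
        }
        metrics.put(name, metric);
    }

    // The step that is effectively skipped in the bug: metrics from the old
    // controller channel manager must be unregistered (the manager closed)
    // before a new election registers the same names again.
    public void unregister(String name) {
        metrics.remove(name);
    }
}
```

With this model, register → register throws, while register → unregister → register succeeds, which is exactly the difference between a clean controller failover and the re-election loop described here.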
[jira] [Commented] (KAFKA-3741) KStream config for changelog min.in.sync.replicas
[ https://issues.apache.org/jira/browse/KAFKA-3741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16053728#comment-16053728 ] ASF GitHub Bot commented on KAFKA-3741: --- Github user dguy closed the pull request at: https://github.com/apache/kafka/pull/3255 > KStream config for changelog min.in.sync.replicas > - > > Key: KAFKA-3741 > URL: https://issues.apache.org/jira/browse/KAFKA-3741 > Project: Kafka > Issue Type: Improvement > Components: streams >Affects Versions: 0.10.0.0 >Reporter: Roger Hoover >Assignee: Damian Guy > Labels: api > > Kafka Streams currently allows you to specify a replication factor for > changelog and repartition topics that it creates. It should also allow you > to specify min.in.sync.replicas. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
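The ticket asks for min.insync.replicas to be configurable on the internal topics Kafka Streams creates. A minimal sketch of what such configuration could look like, using a hypothetical "topic." prefix convention for forwarding per-topic overrides (everything beyond replication.factor here is an assumption for illustration, not the 0.10.x API):

```java
import java.util.Properties;

public class StreamsTopicConfigSketch {
    // Hypothetical prefix marking properties to forward to internal topics.
    static final String TOPIC_PREFIX = "topic.";

    public static Properties streamsConfig() {
        Properties props = new Properties();
        props.put("application.id", "my-streams-app");
        props.put("replication.factor", "3"); // already supported for internal topics
        // What the ticket requests: forward min.insync.replicas to the
        // changelog and repartition topics that Streams creates.
        props.put(TOPIC_PREFIX + "min.insync.replicas", "2");
        return props;
    }
}
```

The prefix approach keeps the Streams config surface small: any broker-side topic-level property can be passed through without Streams defining a dedicated config key for each one.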
[jira] [Commented] (KAFKA-4900) Brokers stuck in controller re-election loop after failing to register metrics
[ https://issues.apache.org/jira/browse/KAFKA-4900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16053661#comment-16053661 ] Pengwei commented on KAFKA-4900: This issue seems to happen because, when the controller fails to send a message, it calls controllerElector.resign(). resign() sets leaderId = -1 and then tries to delete the ZK node. Since leaderId has already been set to -1, when the ZK client handles LeaderChangeListener.handleDataDeleted it concludes that the previous controller was not this broker, and triggers a new election. The previous controller-to-broker send threads are therefore never closed, and starting the new controller-to-broker send threads then hits this issue. > Brokers stuck in controller re-election loop after failing to register metrics > -- > > Key: KAFKA-4900 > URL: https://issues.apache.org/jira/browse/KAFKA-4900 > Project: Kafka > Issue Type: Bug > Components: controller, core >Affects Versions: 0.10.1.1 >Reporter: Nick Travers
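Pengwei's sequence can be modeled with a minimal sketch (the class, field, and method names are hypothetical stand-ins, not Kafka's actual ZookeeperLeaderElector): clearing leaderId before the ZK node deletion makes the delete handler believe this broker was never the leader, so teardown is skipped and each election leaks a set of send threads, whose metrics then collide on registration.

```java
public class ElectorSketch {
    final int brokerId = 9;
    int leaderId = 9;              // this broker currently holds leadership
    boolean channelManagerClosed = false;
    int sendThreadSets = 1;        // one set of controller-to-broker send threads

    // resign() clears leaderId *before* the ZK leader node is deleted.
    void resign() {
        leaderId = -1;
        deleteLeaderNode();        // triggers handleDataDeleted via the ZK event thread
    }

    void deleteLeaderNode() {
        handleDataDeleted();       // synchronous stand-in for the ZK callback
    }

    // Because leaderId is already -1, the "was I the leader?" check fails,
    // so the old channel manager (send threads + their metrics) is never closed.
    void handleDataDeleted() {
        boolean wasLeader = (leaderId == brokerId);
        if (wasLeader) {
            channelManagerClosed = true;
        }
        elect();
    }

    void elect() {
        leaderId = brokerId;
        sendThreadSets++;          // leaked old set + new set -> duplicate metric names
    }
}
```

Running resign() on this model leaves channelManagerClosed false and two sets of send threads alive, which is the state in which registering "connection-close-rate" a second time throws the IllegalArgumentException from the stack trace.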