[jira] [Resolved] (KAFKA-12407) Document omitted Controller Health Metrics
[ https://issues.apache.org/jira/browse/KAFKA-12407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dong Lin resolved KAFKA-12407. -- Resolution: Fixed > Document omitted Controller Health Metrics > -- > > Key: KAFKA-12407 > URL: https://issues.apache.org/jira/browse/KAFKA-12407 > Project: Kafka > Issue Type: Improvement > Components: documentation >Reporter: Dongjin Lee >Assignee: Dongjin Lee >Priority: Major > > [KIP-237|https://cwiki.apache.org/confluence/display/KAFKA/KIP-237%3A+More+Controller+Health+Metrics] > introduced the following 3 controller health metrics, but none of them > are documented. > * kafka.controller:type=ControllerEventManager,name=EventQueueSize > * kafka.controller:type=ControllerEventManager,name=EventQueueTimeMs > * > kafka.controller:type=ControllerChannelManager,name=RequestRateAndQueueTimeMs,brokerId=\{broker-id} > -- This message was sent by Atlassian Jira (v8.3.4#803005)
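For reference, these MBeans can be read over JMX like any other Kafka metric. Below is a minimal Java sketch, assuming the broker exposes JMX on localhost:9999 and that the gauge exposes a "Value" attribute (both are illustrative assumptions, not part of the ticket), showing how the EventQueueSize metric listed above might be inspected:

{code}
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class ControllerMetricsProbe {
    public static void main(String[] args) throws Exception {
        // Assumes the broker was started with JMX enabled, e.g. JMX_PORT=9999.
        JMXServiceURL url =
                new JMXServiceURL("service:jmx:rmi:///jndi/rmi://localhost:9999/jmxrmi");
        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection conn = connector.getMBeanServerConnection();
            // One of the KIP-237 metrics listed above; gauges expose a "Value" attribute.
            ObjectName queueSize = new ObjectName(
                    "kafka.controller:type=ControllerEventManager,name=EventQueueSize");
            System.out.println("EventQueueSize = " + conn.getAttribute(queueSize, "Value"));
        }
    }
}
{code}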
[jira] [Resolved] (KAFKA-9016) Warn when log dir stopped serving replicas
[ https://issues.apache.org/jira/browse/KAFKA-9016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dong Lin resolved KAFKA-9016. - Resolution: Fixed > Warn when log dir stopped serving replicas > -- > > Key: KAFKA-9016 > URL: https://issues.apache.org/jira/browse/KAFKA-9016 > Project: Kafka > Issue Type: Improvement > Components: log, log cleaner >Reporter: Viktor Somogyi-Vass >Assignee: kumar uttpal >Priority: Major > Labels: easy, newbie > > Kafka should log a warning when a log directory stops serving replicas, since this is usually > due to an error and is much more visible at the WARN level. > {noformat} > 2019-09-19 12:39:54,194 ERROR kafka.server.LogDirFailureChannel: Error while > writing to checkpoint file /kafka/data/diskX/replication-offset-checkpoint > java.io.SyncFailedException: sync failed > .. > 2019-09-19 12:39:54,197 INFO kafka.server.ReplicaManager: [ReplicaManager > broker=638] Stopping serving replicas in dir /kafka/data/diskX > .. > 2019-09-19 12:39:54,205 INFO kafka.server.ReplicaFetcherManager: > [ReplicaFetcherManager on broker 638] Removed fetcher for partitions > Set(test1-0, test2-2, test-0, test2-2, test4-14, test4-0, test2-6) > 2019-09-19 12:39:54,206 INFO kafka.server.ReplicaAlterLogDirsManager: > [ReplicaAlterLogDirsManager on broker 638] Removed fetcher for partitions > Set(test1-0, test2-2, test-0, test3-2, test4-14, test4-0, test2-6) > 2019-09-19 12:39:54,222 INFO kafka.server.ReplicaManager: [ReplicaManager > broker=638] Broker 638 stopped fetcher for partitions > test1-0,test2-2,test-0,test3-2,test4-14,test4-0,test2-6 and stopped moving > logs for partitions because they are in the failed log directory > /kafka/data/diskX. > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005)
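The requested change is essentially a log-level adjustment. A minimal, hypothetical SLF4J-style sketch of the idea (the class, method, and message below are illustrative placeholders, not the actual ReplicaManager code):

{code}
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class LogDirFailureLogging {
    private static final Logger log = LoggerFactory.getLogger(LogDirFailureLogging.class);

    void onLogDirFailure(String dir) {
        // Previously this kind of message was logged at INFO; WARN makes the condition
        // easier to notice, since it is usually caused by an underlying I/O error.
        log.warn("Stopping serving replicas in dir {}", dir);
    }
}
{code}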
[jira] [Assigned] (KAFKA-9016) Warn when log dir stopped serving replicas
[ https://issues.apache.org/jira/browse/KAFKA-9016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dong Lin reassigned KAFKA-9016: --- Assignee: kumar uttpal > Warn when log dir stopped serving replicas > -- > > Key: KAFKA-9016 > URL: https://issues.apache.org/jira/browse/KAFKA-9016 > Project: Kafka > Issue Type: Improvement > Components: log, log cleaner >Reporter: Viktor Somogyi-Vass >Assignee: kumar uttpal >Priority: Major > Labels: easy, newbie > > Kafka should log a warning when a log directory stops serving replicas, since this is usually > due to an error and is much more visible at the WARN level. > {noformat} > 2019-09-19 12:39:54,194 ERROR kafka.server.LogDirFailureChannel: Error while > writing to checkpoint file /kafka/data/diskX/replication-offset-checkpoint > java.io.SyncFailedException: sync failed > .. > 2019-09-19 12:39:54,197 INFO kafka.server.ReplicaManager: [ReplicaManager > broker=638] Stopping serving replicas in dir /kafka/data/diskX > .. > 2019-09-19 12:39:54,205 INFO kafka.server.ReplicaFetcherManager: > [ReplicaFetcherManager on broker 638] Removed fetcher for partitions > Set(test1-0, test2-2, test-0, test2-2, test4-14, test4-0, test2-6) > 2019-09-19 12:39:54,206 INFO kafka.server.ReplicaAlterLogDirsManager: > [ReplicaAlterLogDirsManager on broker 638] Removed fetcher for partitions > Set(test1-0, test2-2, test-0, test3-2, test4-14, test4-0, test2-6) > 2019-09-19 12:39:54,222 INFO kafka.server.ReplicaManager: [ReplicaManager > broker=638] Broker 638 stopped fetcher for partitions > test1-0,test2-2,test-0,test3-2,test4-14,test4-0,test2-6 and stopped moving > logs for partitions because they are in the failed log directory > /kafka/data/diskX. > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (KAFKA-7837) maybeShrinkIsr may not reflect OfflinePartitions immediately
[ https://issues.apache.org/jira/browse/KAFKA-7837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16745832#comment-16745832 ] Dong Lin commented on KAFKA-7837: - [~junrao] Currently `partition.maybeShrinkIsr(...)` reads `leaderReplicaIfLocal` to determine whether the partition is still the leader. When there is a disk failure, we can also do `partition.leaderReplicaIfLocal = None` in `replicaManager.handleLogDirFailure(...)` for every partition on the offline disk, so that the broker will no longer shrink the ISR for these partitions. I personally feel this approach is probably simpler than accessing ReplicaManager.allPartitions(). > maybeShrinkIsr may not reflect OfflinePartitions immediately > > > Key: KAFKA-7837 > URL: https://issues.apache.org/jira/browse/KAFKA-7837 > Project: Kafka > Issue Type: Improvement >Reporter: Jun Rao >Priority: Major > > When a partition is marked offline due to a failed disk, the leader is > supposed to not shrink its ISR any more. In ReplicaManager.maybeShrinkIsr(), > we iterate through all non-offline partitions to shrink the ISR. If an ISR > needs to shrink, we need to write the new ISR to ZK, which can take a bit of > time. In this window, some partitions could now be marked as offline, but may > not be picked up by the iterator since it only reflects the state at that > point. This can cause all in-sync followers to be dropped out of ISR > unnecessarily and prevents a clean leader election. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
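A rough, hypothetical Java sketch of the idea in the comment above (the types and fields are simplified placeholders for the Scala broker code, not the real Partition/ReplicaManager classes): clearing the local-leader reference of every partition on the failed disk makes the periodic ISR-shrink pass skip those partitions.

{code}
import java.util.List;
import java.util.Optional;
import java.util.concurrent.atomic.AtomicReference;

class PartitionSketch {
    // Placeholder for Partition.leaderReplicaIfLocal in the broker.
    final AtomicReference<Optional<Integer>> leaderReplicaIfLocal =
            new AtomicReference<>(Optional.of(0));

    void maybeShrinkIsr() {
        // Only the local leader shrinks the ISR; partitions on a failed disk are skipped.
        leaderReplicaIfLocal.get().ifPresent(leaderId -> {
            // ... compute and persist the shrunk ISR ...
        });
    }
}

class ReplicaManagerSketch {
    void handleLogDirFailure(List<PartitionSketch> partitionsOnFailedDir) {
        // Clearing the reference turns maybeShrinkIsr() into a no-op for these partitions,
        // so their ISRs are not shrunk spuriously while the disk is offline.
        for (PartitionSketch p : partitionsOnFailedDir) {
            p.leaderReplicaIfLocal.set(Optional.empty());
        }
    }
}
{code}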
[jira] [Commented] (KAFKA-7836) The propagation of log dir failure can be delayed due to slowness in closing the file handles
[ https://issues.apache.org/jira/browse/KAFKA-7836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16745816#comment-16745816 ] Dong Lin commented on KAFKA-7836: - [~junrao] This solution sounds good to me. > The propagation of log dir failure can be delayed due to slowness in closing > the file handles > - > > Key: KAFKA-7836 > URL: https://issues.apache.org/jira/browse/KAFKA-7836 > Project: Kafka > Issue Type: Improvement >Reporter: Jun Rao >Priority: Major > > In ReplicaManager.handleLogDirFailure(), we call > zkClient.propagateLogDirEvent after logManager.handleLogDirFailure. The > latter closes the file handles of the offline replicas, which could take time > when the disk is bad. This will delay the new leader election by the > controller. In one incident, we have seen the closing of file handles of > multiple replicas taking more than 20 seconds. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
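The ticket does not spell out the agreed solution here, but one possible shape of a mitigation, sketched below purely as an illustration (the method names are placeholders and this is not necessarily the fix referred to above): notify ZooKeeper of the offline log directory before, or concurrently with, the potentially slow closing of file handles, so the controller can start leader election without waiting.

{code}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class LogDirFailureHandlerSketch {
    private final ExecutorService background = Executors.newSingleThreadExecutor();

    void handleLogDirFailure(String dir) {
        // Propagate the failure first so the controller can elect new leaders promptly.
        propagateLogDirEvent(dir);
        // Closing file handles on a bad disk can take tens of seconds; keep it off the critical path.
        background.submit(() -> closeFileHandles(dir));
    }

    void propagateLogDirEvent(String dir) { /* e.g. update the log-dir-event znode */ }

    void closeFileHandles(String dir) { /* close file handles of the offline replicas */ }
}
{code}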
[jira] [Updated] (KAFKA-7829) Javadoc should show that AdminClient.alterReplicaLogDirs() is supported in Kafka 1.1.0 or later
[ https://issues.apache.org/jira/browse/KAFKA-7829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dong Lin updated KAFKA-7829: Summary: Javadoc should show that AdminClient.alterReplicaLogDirs() is supported in Kafka 1.1.0 or later (was: Javadoc should show that alterReplicaLogDirs() is supported in Kafka 1.1.0 or later) > Javadoc should show that AdminClient.alterReplicaLogDirs() is supported in > Kafka 1.1.0 or later > --- > > Key: KAFKA-7829 > URL: https://issues.apache.org/jira/browse/KAFKA-7829 > Project: Kafka > Issue Type: Improvement >Reporter: Jun Rao >Assignee: Dong Lin >Priority: Major > > In ReassignPartitionsCommand, the --reassignment-json-file option says "...If > absolute log directory path is specified, it is currently required that the > replica has not already been created on that broker...". This is inaccurate > since we support moving existing replicas to new log dirs. > In addition, the Javadoc of AdminClient.alterReplicaLogDirs(...) should show > the API is supported by brokers with version 1.1.0 or later. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
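For reference, a minimal usage sketch of the API being documented (the bootstrap address, topic, broker id, and log directory path below are made-up values): AdminClient.alterReplicaLogDirs() can move a replica that already exists on a broker to another log directory, and it requires brokers running 1.1.0 or later.

{code}
import java.util.Collections;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.common.TopicPartitionReplica;

public class AlterReplicaLogDirsExample {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        try (AdminClient admin = AdminClient.create(props)) {
            // Move the replica of test-topic partition 0 on broker 1 to /kafka/data/disk2.
            // The replica may already exist on broker 1; it is then moved between log dirs.
            TopicPartitionReplica replica = new TopicPartitionReplica("test-topic", 0, 1);
            Map<TopicPartitionReplica, String> assignment =
                    Collections.singletonMap(replica, "/kafka/data/disk2");
            admin.alterReplicaLogDirs(assignment).all().get();
        }
    }
}
{code}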
[jira] [Updated] (KAFKA-7829) Javadoc should show that alterReplicaLogDirs() is supported in Kafka 1.1.0 or later
[ https://issues.apache.org/jira/browse/KAFKA-7829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dong Lin updated KAFKA-7829: Description: In ReassignPartitionsCommand, the --reassignment-json-file option says "...If absolute log directory path is specified, it is currently required that the replica has not already been created on that broker...". This is inaccurate since we support moving existing replicas to new log dirs. In addition, the Javadoc of AdminClient.alterReplicaLogDirs(...) should show the API is supported by brokers with version 1.1.0 or later. was: In ReassignPartitionsCommand, the --reassignment-json-file option has the following. This seems inaccurate since we support moving existing replicas to new log dirs. If absolute log directory path is specified, it is currently required that " + "the replica has not already been created on that broker. The replica will then be created in the specified log directory on the broker later > Javadoc should show that alterReplicaLogDirs() is supported in Kafka 1.1.0 or > later > --- > > Key: KAFKA-7829 > URL: https://issues.apache.org/jira/browse/KAFKA-7829 > Project: Kafka > Issue Type: Improvement >Reporter: Jun Rao >Assignee: Dong Lin >Priority: Major > > In ReassignPartitionsCommand, the --reassignment-json-file option says "...If > absolute log directory path is specified, it is currently required that the > replica has not already been created on that broker...". This is inaccurate > since we support moving existing replicas to new log dirs. > In addition, the Javadoc of AdminClient.alterReplicaLogDirs(...) should show > the API is supported by brokers with version 1.1.0 or later. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (KAFKA-7829) Javadoc should show that alterReplicaLogDirs() is supported in Kafka 1.1.0 or later
[ https://issues.apache.org/jira/browse/KAFKA-7829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dong Lin updated KAFKA-7829: Summary: Javadoc should show that alterReplicaLogDirs() is supported in Kafka 1.1.0 or later (was: Inaccurate description for --reassignment-json-file option in ReassignPartitionsCommand) > Javadoc should show that alterReplicaLogDirs() is supported in Kafka 1.1.0 or > later > --- > > Key: KAFKA-7829 > URL: https://issues.apache.org/jira/browse/KAFKA-7829 > Project: Kafka > Issue Type: Improvement >Reporter: Jun Rao >Assignee: Dong Lin >Priority: Major > > In ReassignPartitionsCommand, the --reassignment-json-file option has the > following. This seems inaccurate since we support moving existing replicas to > new log dirs. > If absolute log directory path is specified, it is currently required that " + > "the replica has not already been created on that broker. The replica will > then be created in the specified log directory on the broker later -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (KAFKA-7829) Inaccurate description for --reassignment-json-file option in ReassignPartitionsCommand
[ https://issues.apache.org/jira/browse/KAFKA-7829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16744321#comment-16744321 ] Dong Lin commented on KAFKA-7829: - [~junrao] Yeah, this is inaccurate now that https://github.com/apache/kafka/pull/3874 has been merged. I will submit a PR to fix this. > Inaccurate description for --reassignment-json-file option in > ReassignPartitionsCommand > --- > > Key: KAFKA-7829 > URL: https://issues.apache.org/jira/browse/KAFKA-7829 > Project: Kafka > Issue Type: Improvement >Reporter: Jun Rao >Priority: Major > > In ReassignPartitionsCommand, the --reassignment-json-file option has the > following. This seems inaccurate since we support moving existing replicas to > new log dirs. > If absolute log directory path is specified, it is currently required that " + > "the replica has not already been created on that broker. The replica will > then be created in the specified log directory on the broker later -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (KAFKA-7829) Inaccurate description for --reassignment-json-file option in ReassignPartitionsCommand
[ https://issues.apache.org/jira/browse/KAFKA-7829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dong Lin reassigned KAFKA-7829: --- Assignee: Dong Lin > Inaccurate description for --reassignment-json-file option in > ReassignPartitionsCommand > --- > > Key: KAFKA-7829 > URL: https://issues.apache.org/jira/browse/KAFKA-7829 > Project: Kafka > Issue Type: Improvement >Reporter: Jun Rao >Assignee: Dong Lin >Priority: Major > > In ReassignPartitionsCommand, the --reassignment-json-file option has the > following. This seems inaccurate since we support moving existing replicas to > new log dirs. > If absolute log directory path is specified, it is currently required that " + > "the replica has not already been created on that broker. The replica will > then be created in the specified log directory on the broker later -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (KAFKA-7297) Both read/write access to Log.segments should be protected by lock
[ https://issues.apache.org/jira/browse/KAFKA-7297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16723236#comment-16723236 ] Dong Lin commented on KAFKA-7297: - [~ijuma] Hmm.. I think we may have different understandings. Jun originally mentioned using a materialized view (which involves copying) to address the possibility that the underlying map may be changed. In the latest discussion, we agreed that it is OK for the underlying map to change because ConcurrentSkipListMap supports weak consistency. So what we are trying to address in this ticket is the issue in the JIRA description, i.e. an additional entry may be returned by `Log.logSegments`. There are two solutions to this problem. One solution is to use an atomic update, which involves an extra copy for each write operation. The other solution is to use a read/write lock, which requires a lock for read operations but does not need a copy for read/write operations. Is this understanding correct? > Both read/write access to Log.segments should be protected by lock > -- > > Key: KAFKA-7297 > URL: https://issues.apache.org/jira/browse/KAFKA-7297 > Project: Kafka > Issue Type: Improvement >Reporter: Dong Lin >Assignee: Zhanxiang (Patrick) Huang >Priority: Major > > Log.replaceSegments() updates segments in two steps. It first adds new > segments and then remove old segments. Though this operation is protected by > a lock, other read access such as Log.logSegments does not grab lock and thus > these methods may return an inconsistent view of the segments. > As an example, say Log.replaceSegments() intends to replace segments [0, > 100), [100, 200) with a new segment [0, 200). In this case if Log.logSegments > is called right after the new segments are added, the method may return > segments [0, 200), [100, 200) and messages in the range [100, 200) may be > duplicated if caller choose to enumerate all messages in all segments > returned by the method. > The solution is probably to protect read/write access to Log.segments with > read/write lock. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
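To make the trade-off concrete, here is a simplified, hypothetical sketch of the read/write-lock option (placeholder types, not the actual Log.scala code): replaceSegments() performs its add-then-remove under the write lock, and logSegments() copies the current view under the read lock, so callers never observe the intermediate overlapping state described above.

{code}
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentNavigableMap;
import java.util.concurrent.ConcurrentSkipListMap;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

class LockedSegmentsSketch {
    // Keyed by segment base offset; the String value stands in for a LogSegment.
    private final ConcurrentNavigableMap<Long, String> segments = new ConcurrentSkipListMap<>();
    private final ReadWriteLock lock = new ReentrantReadWriteLock();

    // e.g. replace [0,100) and [100,200) with a single [0,200) segment atomically.
    void replaceSegments(List<Long> oldBaseOffsets, long newBaseOffset, String newSegment) {
        lock.writeLock().lock();
        try {
            segments.put(newBaseOffset, newSegment);
            for (Long old : oldBaseOffsets) {
                if (old.longValue() != newBaseOffset) segments.remove(old);
            }
        } finally {
            lock.writeLock().unlock();
        }
    }

    // Readers see either the old or the new segment set, never an overlapping mix.
    List<String> logSegments() {
        lock.readLock().lock();
        try {
            return new ArrayList<>(segments.values());
        } finally {
            lock.readLock().unlock();
        }
    }
}
{code}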
[jira] [Commented] (KAFKA-7297) Both read/write access to Log.segments should be protected by lock
[ https://issues.apache.org/jira/browse/KAFKA-7297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16722347#comment-16722347 ] Dong Lin commented on KAFKA-7297: - [~ijuma] Atomic update is another good alternative to the read/write lock discussed above. Note that an atomic update still requires a lock to avoid race conditions between concurrent mutations. If mutations are rare compared to reads, then both solutions should have low performance overhead. Atomic update has a bit of extra memory overhead (due to copying the segments) whereas the read/write lock solution has a bit of extra locking overhead (due to contention between concurrent mutations and read operations). Could you share a bit more detail on why atomic update is better in this case? > Both read/write access to Log.segments should be protected by lock > -- > > Key: KAFKA-7297 > URL: https://issues.apache.org/jira/browse/KAFKA-7297 > Project: Kafka > Issue Type: Improvement >Reporter: Dong Lin >Assignee: Zhanxiang (Patrick) Huang >Priority: Major > > Log.replaceSegments() updates segments in two steps. It first adds new > segments and then remove old segments. Though this operation is protected by > a lock, other read access such as Log.logSegments does not grab lock and thus > these methods may return an inconsistent view of the segments. > As an example, say Log.replaceSegments() intends to replace segments [0, > 100), [100, 200) with a new segment [0, 200). In this case if Log.logSegments > is called right after the new segments are added, the method may return > segments [0, 200), [100, 200) and messages in the range [100, 200) may be > duplicated if caller choose to enumerate all messages in all segments > returned by the method. > The solution is probably to protect read/write access to Log.segments with > read/write lock. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
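For comparison, a hypothetical sketch of the atomic-update (copy-on-write) alternative discussed above: mutations are serialized by a plain lock and publish a fresh copy through a volatile reference, so reads are lock-free at the cost of one copy per mutation.

{code}
import java.util.Collection;
import java.util.List;
import java.util.TreeMap;

class CopyOnWriteSegmentsSketch {
    // Readers dereference this volatile field without taking any lock.
    private volatile TreeMap<Long, String> segments = new TreeMap<>();
    private final Object writeLock = new Object();

    void replaceSegments(List<Long> oldBaseOffsets, long newBaseOffset, String newSegment) {
        synchronized (writeLock) {                 // serialize concurrent mutations
            TreeMap<Long, String> copy = new TreeMap<>(segments);
            copy.put(newBaseOffset, newSegment);
            for (Long old : oldBaseOffsets) {
                if (old.longValue() != newBaseOffset) copy.remove(old);
            }
            segments = copy;                       // publish a consistent snapshot atomically
        }
    }

    Collection<String> logSegments() {
        return segments.values();                  // always a consistent view; costs a copy per write
    }
}
{code}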
[jira] [Commented] (KAFKA-7297) Both read/write access to Log.segments should be protected by lock
[ https://issues.apache.org/jira/browse/KAFKA-7297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16722331#comment-16722331 ] Dong Lin commented on KAFKA-7297: - [~junrao] Yeah, I agree. [~hzxa21] does this solution sound reasonable? > Both read/write access to Log.segments should be protected by lock > -- > > Key: KAFKA-7297 > URL: https://issues.apache.org/jira/browse/KAFKA-7297 > Project: Kafka > Issue Type: Improvement >Reporter: Dong Lin >Assignee: Zhanxiang (Patrick) Huang >Priority: Major > > Log.replaceSegments() updates segments in two steps. It first adds new > segments and then remove old segments. Though this operation is protected by > a lock, other read access such as Log.logSegments does not grab lock and thus > these methods may return an inconsistent view of the segments. > As an example, say Log.replaceSegments() intends to replace segments [0, > 100), [100, 200) with a new segment [0, 200). In this case if Log.logSegments > is called right after the new segments are added, the method may return > segments [0, 200), [100, 200) and messages in the range [100, 200) may be > duplicated if caller choose to enumerate all messages in all segments > returned by the method. > The solution is probably to protect read/write access to Log.segments with > read/write lock. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (KAFKA-7297) Both read/write access to Log.segments should be protected by lock
[ https://issues.apache.org/jira/browse/KAFKA-7297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16722044#comment-16722044 ] Dong Lin commented on KAFKA-7297: - [~junrao] After going through the code and the Javadoc, it appears that currently the only "correctness" concerns due to this issue are 1) Log.deleteRetentionSizeBreachedSegments() may delete a segment that otherwise would not be deleted, and 2) the size computed by ReplicaManager.describeLogDirs(), and thus the resulting DescribeLogDirsResponse, may be larger than the actual value, which causes overestimation on the user side. The probability of this happening is very low and the impact seems minor. I agree it may not be causing problems now. Instead of documenting that the iterator may return overlapping segments, would it be better to fix the issue described in this JIRA so that we will not see overlapping segments in the iterator? As you mentioned in the earlier comment, since all callers of Log.logSegments seem to either hold the lock already or be infrequent, the overhead seems OK, right? > Both read/write access to Log.segments should be protected by lock > -- > > Key: KAFKA-7297 > URL: https://issues.apache.org/jira/browse/KAFKA-7297 > Project: Kafka > Issue Type: Improvement >Reporter: Dong Lin >Assignee: Zhanxiang (Patrick) Huang >Priority: Major > > Log.replaceSegments() updates segments in two steps. It first adds new > segments and then remove old segments. Though this operation is protected by > a lock, other read access such as Log.logSegments does not grab lock and thus > these methods may return an inconsistent view of the segments. > As an example, say Log.replaceSegments() intends to replace segments [0, > 100), [100, 200) with a new segment [0, 200). In this case if Log.logSegments > is called right after the new segments are added, the method may return > segments [0, 200), [100, 200) and messages in the range [100, 200) may be > duplicated if caller choose to enumerate all messages in all segments > returned by the method. > The solution is probably to protect read/write access to Log.segments with > read/write lock. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (KAFKA-7297) Both read/write access to Log.segments should be protected by lock
[ https://issues.apache.org/jira/browse/KAFKA-7297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16720876#comment-16720876 ] Dong Lin commented on KAFKA-7297: - [~junrao] Ah I would not be able to work on this. Is this an urgent issue? [~hzxa21] is interested to work on this issue. > Both read/write access to Log.segments should be protected by lock > -- > > Key: KAFKA-7297 > URL: https://issues.apache.org/jira/browse/KAFKA-7297 > Project: Kafka > Issue Type: Improvement >Reporter: Dong Lin >Assignee: Dong Lin >Priority: Major > > Log.replaceSegments() updates segments in two steps. It first adds new > segments and then remove old segments. Though this operation is protected by > a lock, other read access such as Log.logSegments does not grab lock and thus > these methods may return an inconsistent view of the segments. > As an example, say Log.replaceSegments() intends to replace segments [0, > 100), [100, 200) with a new segment [0, 200). In this case if Log.logSegments > is called right after the new segments are added, the method may return > segments [0, 200), [100, 200) and messages in the range [100, 200) may be > duplicated if caller choose to enumerate all messages in all segments > returned by the method. > The solution is probably to protect read/write access to Log.segments with > read/write lock. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (KAFKA-7297) Both read/write access to Log.segments should be protected by lock
[ https://issues.apache.org/jira/browse/KAFKA-7297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dong Lin reassigned KAFKA-7297: --- Assignee: Zhanxiang (Patrick) Huang (was: Dong Lin) > Both read/write access to Log.segments should be protected by lock > -- > > Key: KAFKA-7297 > URL: https://issues.apache.org/jira/browse/KAFKA-7297 > Project: Kafka > Issue Type: Improvement >Reporter: Dong Lin >Assignee: Zhanxiang (Patrick) Huang >Priority: Major > > Log.replaceSegments() updates segments in two steps. It first adds new > segments and then remove old segments. Though this operation is protected by > a lock, other read access such as Log.logSegments does not grab lock and thus > these methods may return an inconsistent view of the segments. > As an example, say Log.replaceSegments() intends to replace segments [0, > 100), [100, 200) with a new segment [0, 200). In this case if Log.logSegments > is called right after the new segments are added, the method may return > segments [0, 200), [100, 200) and messages in the range [100, 200) may be > duplicated if caller choose to enumerate all messages in all segments > returned by the method. > The solution is probably to protect read/write access to Log.segments with > read/write lock. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (KAFKA-7297) Both read/write access to Log.segments should be protected by lock
[ https://issues.apache.org/jira/browse/KAFKA-7297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dong Lin updated KAFKA-7297: Reporter: Dong Lin (was: Zhanxiang (Patrick) Huang) > Both read/write access to Log.segments should be protected by lock > -- > > Key: KAFKA-7297 > URL: https://issues.apache.org/jira/browse/KAFKA-7297 > Project: Kafka > Issue Type: Improvement >Reporter: Dong Lin >Assignee: Dong Lin >Priority: Major > > Log.replaceSegments() updates segments in two steps. It first adds new > segments and then remove old segments. Though this operation is protected by > a lock, other read access such as Log.logSegments does not grab lock and thus > these methods may return an inconsistent view of the segments. > As an example, say Log.replaceSegments() intends to replace segments [0, > 100), [100, 200) with a new segment [0, 200). In this case if Log.logSegments > is called right after the new segments are added, the method may return > segments [0, 200), [100, 200) and messages in the range [100, 200) may be > duplicated if caller choose to enumerate all messages in all segments > returned by the method. > The solution is probably to protect read/write access to Log.segments with > read/write lock. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (KAFKA-7297) Both read/write access to Log.segments should be protected by lock
[ https://issues.apache.org/jira/browse/KAFKA-7297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dong Lin updated KAFKA-7297: Reporter: Zhanxiang (Patrick) Huang (was: Dong Lin) > Both read/write access to Log.segments should be protected by lock > -- > > Key: KAFKA-7297 > URL: https://issues.apache.org/jira/browse/KAFKA-7297 > Project: Kafka > Issue Type: Improvement >Reporter: Zhanxiang (Patrick) Huang >Assignee: Dong Lin >Priority: Major > > Log.replaceSegments() updates segments in two steps. It first adds new > segments and then remove old segments. Though this operation is protected by > a lock, other read access such as Log.logSegments does not grab lock and thus > these methods may return an inconsistent view of the segments. > As an example, say Log.replaceSegments() intends to replace segments [0, > 100), [100, 200) with a new segment [0, 200). In this case if Log.logSegments > is called right after the new segments are added, the method may return > segments [0, 200), [100, 200) and messages in the range [100, 200) may be > duplicated if caller choose to enumerate all messages in all segments > returned by the method. > The solution is probably to protect read/write access to Log.segments with > read/write lock. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (KAFKA-5335) Controller should batch updatePartitionReassignmentData() operation
[ https://issues.apache.org/jira/browse/KAFKA-5335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dong Lin resolved KAFKA-5335. - Resolution: Won't Do > Controller should batch updatePartitionReassignmentData() operation > --- > > Key: KAFKA-5335 > URL: https://issues.apache.org/jira/browse/KAFKA-5335 > Project: Kafka > Issue Type: Bug >Reporter: Dong Lin >Assignee: Dong Lin >Priority: Major > > Currently the controller will update the partition reassignment data every time a > partition in the reassignment is completed. It means that if a user specifies a > huge reassignment znode of size 1 MB to move 10K partitions, the controller will > need to write roughly 0.5 MB * 10000 = 5 GB of data to ZooKeeper in order to > complete this reassignment. This is because the controller needs to write the > remaining partitions to the znode every time a partition is completely moved. > This is problematic because such a huge reassignment may greatly slow down the > Kafka controller. Note that partition reassignment doesn't necessarily cause > data movement between brokers because we may use it only to reorder the > replica list of partitions to evenly distribute preferred leaders. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
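For clarity on the 5 GB figure above: after each of the roughly 10,000 partition completions the controller rewrites the still-remaining portion of the 1 MB reassignment znode, which is about 0.5 MB on average, so the total written to ZooKeeper is roughly 0.5 MB * 10,000 = 5,000 MB, i.e. about 5 GB.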
[jira] [Updated] (KAFKA-7647) Flaky test LogCleanerParameterizedIntegrationTest.testCleansCombinedCompactAndDeleteTopic
[ https://issues.apache.org/jira/browse/KAFKA-7647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dong Lin updated KAFKA-7647: Description: {code} kafka.log.LogCleanerParameterizedIntegrationTest > testCleansCombinedCompactAndDeleteTopic[3] FAILED java.lang.AssertionError: Contents of the map shouldn't change expected: (340,340), 5 -> (345,345), 10 -> (350,350), 14 -> (354,354), 1 -> (341,341), 6 -> (346,346), 9 -> (349,349), 13 -> (353,353), 2 -> (342,342), 17 -> (357,357), 12 -> (352,352), 7 -> (347,347), 3 -> (343,343), 18 -> (358,358), 16 -> (356,356), 11 -> (351,351), 8 -> (348,348), 19 -> (359,359), 4 -> (344,344), 15 -> (355,355))> but was: (340,340), 5 -> (345,345), 10 -> (350,350), 14 -> (354,354), 1 -> (341,341), 6 -> (346,346), 9 -> (349,349), 13 -> (353,353), 2 -> (342,342), 17 -> (357,357), 12 -> (352,352), 7 -> (347,347), 3 -> (343,343), 18 -> (358,358), 16 -> (356,356), 11 -> (351,351), 99 -> (299,299), 8 -> (348,348), 19 -> (359,359), 4 -> (344,344), 15 -> (355,355))> at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.failNotEquals(Assert.java:834) at org.junit.Assert.assertEquals(Assert.java:118) at kafka.log.LogCleanerParameterizedIntegrationTest.testCleansCombinedCompactAndDeleteTopic(LogCleanerParameterizedIntegrationTest.scala:129) {code} was: kafka.log.LogCleanerParameterizedIntegrationTest > testCleansCombinedCompactAndDeleteTopic[3] FAILED java.lang.AssertionError: Contents of the map shouldn't change expected: (340,340), 5 -> (345,345), 10 -> (350,350), 14 -> (354,354), 1 -> (341,341), 6 -> (346,346), 9 -> (349,349), 13 -> (353,353), 2 -> (342,342), 17 -> (357,357), 12 -> (352,352), 7 -> (347,347), 3 -> (343,343), 18 -> (358,358), 16 -> (356,356), 11 -> (351,351), 8 -> (348,348), 19 -> (359,359), 4 -> (344,344), 15 -> (355,355))> but was: (340,340), 5 -> (345,345), 10 -> (350,350), 14 -> (354,354), 1 -> (341,341), 6 -> (346,346), 9 -> (349,349), 13 -> (353,353), 2 -> (342,342), 17 -> (357,357), 12 -> (352,352), 7 -> (347,347), 3 -> (343,343), 18 -> (358,358), 16 -> (356,356), 11 -> (351,351), 99 -> (299,299), 8 -> (348,348), 19 -> (359,359), 4 -> (344,344), 15 -> (355,355))> at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.failNotEquals(Assert.java:834) at org.junit.Assert.assertEquals(Assert.java:118) at kafka.log.LogCleanerParameterizedIntegrationTest.testCleansCombinedCompactAndDeleteTopic(LogCleanerParameterizedIntegrationTest.scala:129) > Flaky test > LogCleanerParameterizedIntegrationTest.testCleansCombinedCompactAndDeleteTopic > - > > Key: KAFKA-7647 > URL: https://issues.apache.org/jira/browse/KAFKA-7647 > Project: Kafka > Issue Type: Sub-task >Reporter: Dong Lin >Priority: Major > > {code} > kafka.log.LogCleanerParameterizedIntegrationTest > > testCleansCombinedCompactAndDeleteTopic[3] FAILED > java.lang.AssertionError: Contents of the map shouldn't change > expected: (340,340), 5 -> (345,345), 10 -> (350,350), 14 -> > (354,354), 1 -> (341,341), 6 -> (346,346), 9 -> (349,349), 13 -> (353,353), > 2 -> (342,342), 17 -> (357,357), 12 -> (352,352), 7 -> (347,347), 3 -> > (343,343), 18 -> (358,358), 16 -> (356,356), 11 -> (351,351), 8 -> > (348,348), 19 -> (359,359), 4 -> (344,344), 15 -> (355,355))> but > was: (340,340), 5 -> (345,345), 10 -> (350,350), 14 -> (354,354), > 1 -> (341,341), 6 -> (346,346), 9 -> (349,349), 13 -> (353,353), 2 -> > (342,342), 17 -> (357,357), 12 -> (352,352), 7 -> (347,347), 3 -> > (343,343), 18 -> (358,358), 16 -> (356,356), 11 -> (351,351), 99 -> > (299,299), 8 -> (348,348), 19 -> 
(359,359), 4 -> (344,344), 15 -> > (355,355))> > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:834) > at org.junit.Assert.assertEquals(Assert.java:118) > at > kafka.log.LogCleanerParameterizedIntegrationTest.testCleansCombinedCompactAndDeleteTopic(LogCleanerParameterizedIntegrationTest.scala:129) > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (KAFKA-7646) Flaky test SaslOAuthBearerSslEndToEndAuthorizationTest.testNoConsumeWithDescribeAclViaSubscribe
[ https://issues.apache.org/jira/browse/KAFKA-7646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16690174#comment-16690174 ] Dong Lin commented on KAFKA-7646: - This issue is not captured by the automatic test runs in https://builds.apache.org/job/kafka-2.0-jdk8. It is hard to debug this without a stacktrace. Given that the test is related to SASL and it passes most of the time, for the same reason as explained in https://issues.apache.org/jira/browse/KAFKA-7651, this does not appear to be a blocking issue for the 2.1.0 release. > Flaky test > SaslOAuthBearerSslEndToEndAuthorizationTest.testNoConsumeWithDescribeAclViaSubscribe > --- > > Key: KAFKA-7646 > URL: https://issues.apache.org/jira/browse/KAFKA-7646 > Project: Kafka > Issue Type: Sub-task >Reporter: Dong Lin >Priority: Major > > This test is reported to fail by [~enothereska] as part of 2.1.0 RC0 release > certification. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (KAFKA-7651) Flaky test SaslSslAdminClientIntegrationTest.testMinimumRequestTimeouts
[ https://issues.apache.org/jira/browse/KAFKA-7651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dong Lin updated KAFKA-7651: Issue Type: Sub-task (was: Task) Parent: KAFKA-7645 > Flaky test SaslSslAdminClientIntegrationTest.testMinimumRequestTimeouts > --- > > Key: KAFKA-7651 > URL: https://issues.apache.org/jira/browse/KAFKA-7651 > Project: Kafka > Issue Type: Sub-task >Reporter: Dong Lin >Priority: Major > > Here is stacktrace from > https://builds.apache.org/job/kafka-2.1-jdk8/51/testReport/junit/kafka.api/SaslSslAdminClientIntegrationTest/testMinimumRequestTimeouts/ > {code} > Error Message > java.lang.AssertionError: Expected an exception of type > org.apache.kafka.common.errors.TimeoutException; got type > org.apache.kafka.common.errors.SslAuthenticationException > Stacktrace > java.lang.AssertionError: Expected an exception of type > org.apache.kafka.common.errors.TimeoutException; got type > org.apache.kafka.common.errors.SslAuthenticationException > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.assertTrue(Assert.java:41) > at > kafka.utils.TestUtils$.assertFutureExceptionTypeEquals(TestUtils.scala:1404) > at > kafka.api.AdminClientIntegrationTest.testMinimumRequestTimeouts(AdminClientIntegrationTest.scala:1080) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at > org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) > at > org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) > at > org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298) > at > org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at java.lang.Thread.run(Thread.java:748) > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (KAFKA-7651) Flaky test SaslSslAdminClientIntegrationTest.testMinimumRequestTimeouts
[ https://issues.apache.org/jira/browse/KAFKA-7651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16690173#comment-16690173 ] Dong Lin commented on KAFKA-7651: - The test failure is related to the SSL handshake. In general, the SSL handshake is a stateful operation without a timeout. Thus it is understandable that the SSL logic in the test may be flaky if a GC pause is long or there is an ephemeral network issue in the test. The same test failure occurred on the 2.0 branch in https://builds.apache.org/job/kafka-2.0-jdk8/183/testReport/junit/kafka.api/SaslSslAdminClientIntegrationTest/testMinimumRequestTimeouts. Since users have been running Kafka 2.0.0 without major issues, the test failure here should not be a blocking issue for the 2.1.0 release. > Flaky test SaslSslAdminClientIntegrationTest.testMinimumRequestTimeouts > --- > > Key: KAFKA-7651 > URL: https://issues.apache.org/jira/browse/KAFKA-7651 > Project: Kafka > Issue Type: Sub-task >Reporter: Dong Lin >Priority: Major > > Here is stacktrace from > https://builds.apache.org/job/kafka-2.1-jdk8/51/testReport/junit/kafka.api/SaslSslAdminClientIntegrationTest/testMinimumRequestTimeouts/ > {code} > Error Message > java.lang.AssertionError: Expected an exception of type > org.apache.kafka.common.errors.TimeoutException; got type > org.apache.kafka.common.errors.SslAuthenticationException > Stacktrace > java.lang.AssertionError: Expected an exception of type > org.apache.kafka.common.errors.TimeoutException; got type > org.apache.kafka.common.errors.SslAuthenticationException > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.assertTrue(Assert.java:41) > at > kafka.utils.TestUtils$.assertFutureExceptionTypeEquals(TestUtils.scala:1404) > at > kafka.api.AdminClientIntegrationTest.testMinimumRequestTimeouts(AdminClientIntegrationTest.scala:1080) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at > org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) > at > org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) > at > org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298) > at > org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at java.lang.Thread.run(Thread.java:748) > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Issue Comment Deleted] (KAFKA-7312) Transient failure in kafka.api.AdminClientIntegrationTest.testMinimumRequestTimeouts
[ https://issues.apache.org/jira/browse/KAFKA-7312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dong Lin updated KAFKA-7312: Comment: was deleted (was: Here is another stacktrace from https://builds.apache.org/job/kafka-2.1-jdk8/51/testReport/junit/kafka.api/SaslSslAdminClientIntegrationTest/testMinimumRequestTimeouts/ {code} Error Message java.lang.AssertionError: Expected an exception of type org.apache.kafka.common.errors.TimeoutException; got type org.apache.kafka.common.errors.SslAuthenticationException Stacktrace java.lang.AssertionError: Expected an exception of type org.apache.kafka.common.errors.TimeoutException; got type org.apache.kafka.common.errors.SslAuthenticationException at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.assertTrue(Assert.java:41) at kafka.utils.TestUtils$.assertFutureExceptionTypeEquals(TestUtils.scala:1404) at kafka.api.AdminClientIntegrationTest.testMinimumRequestTimeouts(AdminClientIntegrationTest.scala:1080) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298) at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.lang.Thread.run(Thread.java:748) {code} ) > Transient failure in > kafka.api.AdminClientIntegrationTest.testMinimumRequestTimeouts > > > Key: KAFKA-7312 > URL: https://issues.apache.org/jira/browse/KAFKA-7312 > Project: Kafka > Issue Type: Bug > Components: unit tests >Reporter: Guozhang Wang >Priority: Major > > {code} > Error Message > org.junit.runners.model.TestTimedOutException: test timed out after 12 > milliseconds > Stacktrace > org.junit.runners.model.TestTimedOutException: test timed out after 12 > milliseconds > at java.lang.Object.wait(Native Method) > at java.lang.Object.wait(Object.java:502) > at > org.apache.kafka.common.internals.KafkaFutureImpl$SingleWaiter.await(KafkaFutureImpl.java:92) > at > org.apache.kafka.common.internals.KafkaFutureImpl.get(KafkaFutureImpl.java:262) > at > kafka.utils.TestUtils$.assertFutureExceptionTypeEquals(TestUtils.scala:1345) > at > kafka.api.AdminClientIntegrationTest.testMinimumRequestTimeouts(AdminClientIntegrationTest.scala:1080) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at > org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) > at > org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) > at > org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298) > at > org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at java.lang.Thread.run(Thread.java:748) > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (KAFKA-7312) Transient failure in kafka.api.AdminClientIntegrationTest.testMinimumRequestTimeouts
[ https://issues.apache.org/jira/browse/KAFKA-7312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dong Lin updated KAFKA-7312: Issue Type: Bug (was: Sub-task) Parent: (was: KAFKA-7645) > Transient failure in > kafka.api.AdminClientIntegrationTest.testMinimumRequestTimeouts > > > Key: KAFKA-7312 > URL: https://issues.apache.org/jira/browse/KAFKA-7312 > Project: Kafka > Issue Type: Bug > Components: unit tests >Reporter: Guozhang Wang >Priority: Major > > {code} > Error Message > org.junit.runners.model.TestTimedOutException: test timed out after 12 > milliseconds > Stacktrace > org.junit.runners.model.TestTimedOutException: test timed out after 12 > milliseconds > at java.lang.Object.wait(Native Method) > at java.lang.Object.wait(Object.java:502) > at > org.apache.kafka.common.internals.KafkaFutureImpl$SingleWaiter.await(KafkaFutureImpl.java:92) > at > org.apache.kafka.common.internals.KafkaFutureImpl.get(KafkaFutureImpl.java:262) > at > kafka.utils.TestUtils$.assertFutureExceptionTypeEquals(TestUtils.scala:1345) > at > kafka.api.AdminClientIntegrationTest.testMinimumRequestTimeouts(AdminClientIntegrationTest.scala:1080) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at > org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) > at > org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) > at > org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298) > at > org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at java.lang.Thread.run(Thread.java:748) > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (KAFKA-7651) Flaky test SaslSslAdminClientIntegrationTest.testMinimumRequestTimeouts
Dong Lin created KAFKA-7651: --- Summary: Flaky test SaslSslAdminClientIntegrationTest.testMinimumRequestTimeouts Key: KAFKA-7651 URL: https://issues.apache.org/jira/browse/KAFKA-7651 Project: Kafka Issue Type: Task Reporter: Dong Lin Here is stacktrace from https://builds.apache.org/job/kafka-2.1-jdk8/51/testReport/junit/kafka.api/SaslSslAdminClientIntegrationTest/testMinimumRequestTimeouts/ {code} Error Message java.lang.AssertionError: Expected an exception of type org.apache.kafka.common.errors.TimeoutException; got type org.apache.kafka.common.errors.SslAuthenticationException Stacktrace java.lang.AssertionError: Expected an exception of type org.apache.kafka.common.errors.TimeoutException; got type org.apache.kafka.common.errors.SslAuthenticationException at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.assertTrue(Assert.java:41) at kafka.utils.TestUtils$.assertFutureExceptionTypeEquals(TestUtils.scala:1404) at kafka.api.AdminClientIntegrationTest.testMinimumRequestTimeouts(AdminClientIntegrationTest.scala:1080) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298) at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.lang.Thread.run(Thread.java:748) {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Comment Edited] (KAFKA-7312) Transient failure in kafka.api.AdminClientIntegrationTest.testMinimumRequestTimeouts
[ https://issues.apache.org/jira/browse/KAFKA-7312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16690150#comment-16690150 ] Dong Lin edited comment on KAFKA-7312 at 11/17/18 12:36 AM: Here is another stacktrace from https://builds.apache.org/job/kafka-2.1-jdk8/51/testReport/junit/kafka.api/SaslSslAdminClientIntegrationTest/testMinimumRequestTimeouts/ {code} Error Message java.lang.AssertionError: Expected an exception of type org.apache.kafka.common.errors.TimeoutException; got type org.apache.kafka.common.errors.SslAuthenticationException Stacktrace java.lang.AssertionError: Expected an exception of type org.apache.kafka.common.errors.TimeoutException; got type org.apache.kafka.common.errors.SslAuthenticationException at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.assertTrue(Assert.java:41) at kafka.utils.TestUtils$.assertFutureExceptionTypeEquals(TestUtils.scala:1404) at kafka.api.AdminClientIntegrationTest.testMinimumRequestTimeouts(AdminClientIntegrationTest.scala:1080) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298) at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.lang.Thread.run(Thread.java:748) {code} was (Author: lindong): Here is another stacktrace: {code} Error Message java.lang.AssertionError: Expected an exception of type org.apache.kafka.common.errors.TimeoutException; got type org.apache.kafka.common.errors.SslAuthenticationException Stacktrace java.lang.AssertionError: Expected an exception of type org.apache.kafka.common.errors.TimeoutException; got type org.apache.kafka.common.errors.SslAuthenticationException at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.assertTrue(Assert.java:41) at kafka.utils.TestUtils$.assertFutureExceptionTypeEquals(TestUtils.scala:1404) at kafka.api.AdminClientIntegrationTest.testMinimumRequestTimeouts(AdminClientIntegrationTest.scala:1080) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298) at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.lang.Thread.run(Thread.java:748) {code} > Transient failure in > kafka.api.AdminClientIntegrationTest.testMinimumRequestTimeouts > > > Key: KAFKA-7312 > URL: https://issues.apache.org/jira/browse/KAFKA-7312 > Project: Kafka > Issue Type: Sub-task > Components: unit tests >Reporter: Guozhang Wang >Priority: Major > > {code} > Error Message > org.junit.runners.model.TestTimedOutException: test timed out after 12 > milliseconds > Stacktrace > org.junit.runners.model.TestTimedOutException: test timed out afte
[jira] [Commented] (KAFKA-7649) Flaky test SslEndToEndAuthorizationTest.testNoProduceWithDescribeAcl
[ https://issues.apache.org/jira/browse/KAFKA-7649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16690168#comment-16690168 ] Dong Lin commented on KAFKA-7649: - There is an error in the log that says "No JAAS configuration section named 'Client' was found in specified JAAS configuration file: '/tmp/kafka7346315539242944484.tmp'. Will continue connection to Zookeeper server without SASL authentication, if Zookeeper server allows it. (org.apache.zookeeper.ClientCnxn:1011)". The source code confirms that the broker will fail to start with "java.lang.SecurityException: zookeeper.set.acl is true, but the verification of the JAAS login file failed." if it cannot find the JAAS configuration file. So the question is why the broker fails to find the JAAS configuration file even though "startSasl(jaasSections(kafkaServerSaslMechanisms, Option(kafkaClientSaslMechanism), Both))" in SaslEndToEndAuthorizationTest.setUp() should have created it. I have not found the root cause yet. Since this happens rarely in the integration tests and the issue depends on the existence of a configuration file during broker initialization, my guess is that the bug is related to the test setup, or maybe the temporary file `/tmp/kafka7346315539242944484.tmp` is somehow cleaned up by the test machine. Though I am not 100% sure, my opinion is that this is not a blocking issue for the 2.1.0 release. > Flaky test SslEndToEndAuthorizationTest.testNoProduceWithDescribeAcl > > > Key: KAFKA-7649 > URL: https://issues.apache.org/jira/browse/KAFKA-7649 > Project: Kafka > Issue Type: Sub-task >Reporter: Dong Lin >Priority: Major > > Observed in > https://builds.apache.org/job/kafka-2.1-jdk8/49/testReport/junit/kafka.api/SslEndToEndAuthorizationTest/testNoProduceWithDescribeAcl/ > {code} > Error Message > java.lang.SecurityException: zookeeper.set.acl is true, but the verification > of the JAAS login file failed. > Stacktrace > java.lang.SecurityException: zookeeper.set.acl is true, but the verification > of the JAAS login file failed.
> at kafka.server.KafkaServer.initZkClient(KafkaServer.scala:361) > at kafka.server.KafkaServer.startup(KafkaServer.scala:202) > at kafka.utils.TestUtils$.createServer(TestUtils.scala:135) > at > kafka.integration.KafkaServerTestHarness$$anonfun$setUp$1.apply(KafkaServerTestHarness.scala:101) > at > kafka.integration.KafkaServerTestHarness$$anonfun$setUp$1.apply(KafkaServerTestHarness.scala:100) > at scala.collection.Iterator$class.foreach(Iterator.scala:891) > at scala.collection.AbstractIterator.foreach(Iterator.scala:1334) > at scala.collection.IterableLike$class.foreach(IterableLike.scala:72) > at scala.collection.AbstractIterable.foreach(Iterable.scala:54) > at > kafka.integration.KafkaServerTestHarness.setUp(KafkaServerTestHarness.scala:100) > at > kafka.api.IntegrationTestHarness.doSetup(IntegrationTestHarness.scala:81) > at > kafka.api.IntegrationTestHarness.setUp(IntegrationTestHarness.scala:73) > at > kafka.api.EndToEndAuthorizationTest.setUp(EndToEndAuthorizationTest.scala:180) > at > kafka.api.SslEndToEndAuthorizationTest.setUp(SslEndToEndAuthorizationTest.scala:72) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) > at > org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:24) > at > org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) > at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57) > at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290) > at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71) > at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288) > at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58) > at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268) > at > org.junit.internal.runners.statements.RunBefores.evaluate(R
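For context on the error quoted above: the missing 'Client' section is the ZooKeeper-client login section of the JAAS file that the test harness writes to a temporary location. A typical section looks roughly like the following; the login module shown is the one commonly used for ZooKeeper DIGEST-MD5 authentication, and the credentials are made-up placeholders rather than what the failing run actually contained:

{code}
Client {
  org.apache.zookeeper.server.auth.DigestLoginModule required
  username="zk-client"
  password="zk-client-secret";
};
{code}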
[jira] [Commented] (KAFKA-7312) Transient failure in kafka.api.AdminClientIntegrationTest.testMinimumRequestTimeouts
[ https://issues.apache.org/jira/browse/KAFKA-7312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16690150#comment-16690150 ] Dong Lin commented on KAFKA-7312: - Here is another stacktrace: {code} Error Message java.lang.AssertionError: Expected an exception of type org.apache.kafka.common.errors.TimeoutException; got type org.apache.kafka.common.errors.SslAuthenticationException Stacktrace java.lang.AssertionError: Expected an exception of type org.apache.kafka.common.errors.TimeoutException; got type org.apache.kafka.common.errors.SslAuthenticationException at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.assertTrue(Assert.java:41) at kafka.utils.TestUtils$.assertFutureExceptionTypeEquals(TestUtils.scala:1404) at kafka.api.AdminClientIntegrationTest.testMinimumRequestTimeouts(AdminClientIntegrationTest.scala:1080) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298) at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.lang.Thread.run(Thread.java:748) {code} > Transient failure in > kafka.api.AdminClientIntegrationTest.testMinimumRequestTimeouts > > > Key: KAFKA-7312 > URL: https://issues.apache.org/jira/browse/KAFKA-7312 > Project: Kafka > Issue Type: Sub-task > Components: unit tests >Reporter: Guozhang Wang >Priority: Major > > {code} > Error Message > org.junit.runners.model.TestTimedOutException: test timed out after 12 > milliseconds > Stacktrace > org.junit.runners.model.TestTimedOutException: test timed out after 12 > milliseconds > at java.lang.Object.wait(Native Method) > at java.lang.Object.wait(Object.java:502) > at > org.apache.kafka.common.internals.KafkaFutureImpl$SingleWaiter.await(KafkaFutureImpl.java:92) > at > org.apache.kafka.common.internals.KafkaFutureImpl.get(KafkaFutureImpl.java:262) > at > kafka.utils.TestUtils$.assertFutureExceptionTypeEquals(TestUtils.scala:1345) > at > kafka.api.AdminClientIntegrationTest.testMinimumRequestTimeouts(AdminClientIntegrationTest.scala:1080) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) > at > 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at > org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) > at > org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) > at > org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298) > at > org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at java.lang.Thread.run(Thread.java:748) > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (KAFKA-7648) Flaky test DeleteTopicsRequestTest.testValidDeleteTopicRequests
[ https://issues.apache.org/jira/browse/KAFKA-7648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16690140#comment-16690140 ] Dong Lin commented on KAFKA-7648: - Currently TestUtils.createTopic(...) will re-send the znode creation request to the ZooKeeper service if the previous response shows Code.CONNECTIONLOSS; see KafkaZkClient.retryRequestsUntilConnected() for the related logic. This means that the test will fail if ZooKeeper has created the znode upon the first request, the response to the first request is lost or timed out, the second request is sent, and the response to the second request shows Code.NODEEXISTS. In order to fix this flaky test, we should probably implement logic similar to KafkaZkClient.CheckedEphemeral() to check whether the znode has been created within the same session after receiving Code.NODEEXISTS. > Flaky test DeleteTopicsRequestTest.testValidDeleteTopicRequests > --- > > Key: KAFKA-7648 > URL: https://issues.apache.org/jira/browse/KAFKA-7648 > Project: Kafka > Issue Type: Sub-task >Reporter: Dong Lin >Priority: Major > > Observed in > [https://builds.apache.org/job/kafka-2.1-jdk8/52/testReport/junit/kafka.server/DeleteTopicsRequestTest/testValidDeleteTopicRequests/] > > {code} > Error Message > org.apache.kafka.common.errors.TopicExistsException: Topic 'topic-4' already > exists. > h3. Stacktrace > org.apache.kafka.common.errors.TopicExistsException: Topic 'topic-4' already > exists. > h3. Standard Output > [2018-11-07 17:53:10,812] ERROR [ReplicaFetcher replicaId=2, leaderId=0, > fetcherId=0] Error for partition topic-3-3 at offset 0 > (kafka.server.ReplicaFetcherThread:76) > org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server > does not host this topic-partition. [2018-11-07 17:53:10,812] ERROR > [ReplicaFetcher replicaId=1, leaderId=0, fetcherId=0] Error for partition > topic-3-0 at offset 0 (kafka.server.ReplicaFetcherThread:76) > org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server > does not host this topic-partition. 
[2018-11-07 17:53:14,805] WARN Client > session timed out, have not heard from server in 4000ms for sessionid > 0x10051eebf480003 (org.apache.zookeeper.ClientCnxn:1112) [2018-11-07 > 17:53:14,806] WARN Unable to read additional data from client sessionid > 0x10051eebf480003, likely client has closed socket > (org.apache.zookeeper.server.NIOServerCnxn:376) [2018-11-07 17:53:14,807] > WARN Client session timed out, have not heard from server in 4002ms for > sessionid 0x10051eebf480002 (org.apache.zookeeper.ClientCnxn:1112) > [2018-11-07 17:53:14,807] WARN Unable to read additional data from client > sessionid 0x10051eebf480002, likely client has closed socket > (org.apache.zookeeper.server.NIOServerCnxn:376) [2018-11-07 17:53:14,823] > WARN Client session timed out, have not heard from server in 4002ms for > sessionid 0x10051eebf480001 (org.apache.zookeeper.ClientCnxn:1112) > [2018-11-07 17:53:14,824] WARN Unable to read additional data from client > sessionid 0x10051eebf480001, likely client has closed socket > (org.apache.zookeeper.server.NIOServerCnxn:376) [2018-11-07 17:53:15,423] > WARN Client session timed out, have not heard from server in 4002ms for > sessionid 0x10051eebf48 (org.apache.zookeeper.ClientCnxn:1112) > [2018-11-07 17:53:15,423] WARN Unable to read additional data from client > sessionid 0x10051eebf48, likely client has closed socket > (org.apache.zookeeper.server.NIOServerCnxn:376) [2018-11-07 17:53:15,879] > WARN fsync-ing the write ahead log in SyncThread:0 took 4456ms which will > adversely effect operation latency. See the ZooKeeper troubleshooting guide > (org.apache.zookeeper.server.persistence.FileTxnLog:338) [2018-11-07 > 17:53:16,831] ERROR [ReplicaFetcher replicaId=0, leaderId=2, fetcherId=0] > Error for partition topic-4-0 at offset 0 > (kafka.server.ReplicaFetcherThread:76) > org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server > does not host this topic-partition. [2018-11-07 17:53:23,087] ERROR > [ReplicaFetcher replicaId=1, leaderId=0, fetcherId=0] Error for partition > invalid-timeout-1 at offset 0 (kafka.server.ReplicaFetcherThread:76) > org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server > does not host this topic-partition. [2018-11-07 17:53:23,088] ERROR > [ReplicaFetcher replicaId=1, leaderId=2, fetcherId=0] Error for partition > invalid-timeout-3 at offset 0 (kafka.server.ReplicaFetcherThread:76) > org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server > does not host this topic-partition. [2018-11-07 17:53:23,137] ERROR > [ReplicaFetcher replicaId=0, leaderId=2, fetcherId=0] Error for partition > invalid-timeo
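As an illustration of the fix suggested in the comment on KAFKA-7648 above, here is a hedged sketch against the raw ZooKeeper client API (the helper name and the payload-equality check are hypothetical, not KafkaZkClient code): when NODEEXISTS is returned after a CONNECTIONLOSS retry, read the existing znode back and treat the create as successful if its payload matches what we tried to write.
{code}
import java.util.Arrays
import org.apache.zookeeper.{CreateMode, KeeperException, ZooDefs, ZooKeeper}
import org.apache.zookeeper.KeeperException.Code

// Hypothetical sketch only; the real fix would live in KafkaZkClient's retry path.
def createOrVerify(zk: ZooKeeper, path: String, payload: Array[Byte]): Unit = {
  try {
    zk.create(path, payload, ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT)
  } catch {
    case e: KeeperException if e.code() == Code.NODEEXISTS =>
      // The znode may have been created by our own earlier request whose response was lost.
      val existing = zk.getData(path, false, null)
      if (!Arrays.equals(existing, payload))
        throw e // genuinely created by someone else, so keep surfacing TopicExistsException
      // otherwise: the first attempt succeeded, so swallow the NODEEXISTS
  }
}
{code}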
[jira] [Comment Edited] (KAFKA-7648) Flaky test DeleteTopicsRequestTest.testValidDeleteTopicRequests
[ https://issues.apache.org/jira/browse/KAFKA-7648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16690140#comment-16690140 ] Dong Lin edited comment on KAFKA-7648 at 11/16/18 11:47 PM: Currently TestUtils.createTopic(...) will re-send the znode creation request to the ZooKeeper service if the previous response shows Code.CONNECTIONLOSS; see KafkaZkClient.retryRequestsUntilConnected() for the related logic. This means that the test will fail if ZooKeeper has created the znode upon the first request, the response to the first request is lost or timed out, the second request is sent, and the response to the second request shows Code.NODEEXISTS. In order to fix this flaky test, we should probably implement logic similar to KafkaZkClient.CheckedEphemeral() to check whether the znode has been created within the same session after receiving Code.NODEEXISTS. Given the above understanding and the fact that the test passes with high probability, this flaky test does not indicate a bug and should not be a blocking issue for the 2.1.0 release. was (Author: lindong): Currently TestUtils.createTopic(...) will re-send the znode creation request to the ZooKeeper service if the previous response shows Code.CONNECTIONLOSS; see KafkaZkClient.retryRequestsUntilConnected() for the related logic. This means that the test will fail if ZooKeeper has created the znode upon the first request, the response to the first request is lost or timed out, the second request is sent, and the response to the second request shows Code.NODEEXISTS. In order to fix this flaky test, we should probably implement logic similar to KafkaZkClient.CheckedEphemeral() to check whether the znode has been created within the same session after receiving Code.NODEEXISTS. > Flaky test DeleteTopicsRequestTest.testValidDeleteTopicRequests > --- > > Key: KAFKA-7648 > URL: https://issues.apache.org/jira/browse/KAFKA-7648 > Project: Kafka > Issue Type: Sub-task >Reporter: Dong Lin >Priority: Major > > Observed in > [https://builds.apache.org/job/kafka-2.1-jdk8/52/testReport/junit/kafka.server/DeleteTopicsRequestTest/testValidDeleteTopicRequests/] > > {code} > Error Message > org.apache.kafka.common.errors.TopicExistsException: Topic 'topic-4' already > exists. > h3. Stacktrace > org.apache.kafka.common.errors.TopicExistsException: Topic 'topic-4' already > exists. > h3. Standard Output > [2018-11-07 17:53:10,812] ERROR [ReplicaFetcher replicaId=2, leaderId=0, > fetcherId=0] Error for partition topic-3-3 at offset 0 > (kafka.server.ReplicaFetcherThread:76) > org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server > does not host this topic-partition. [2018-11-07 17:53:10,812] ERROR > [ReplicaFetcher replicaId=1, leaderId=0, fetcherId=0] Error for partition > topic-3-0 at offset 0 (kafka.server.ReplicaFetcherThread:76) > org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server > does not host this topic-partition. 
[2018-11-07 17:53:14,805] WARN Client > session timed out, have not heard from server in 4000ms for sessionid > 0x10051eebf480003 (org.apache.zookeeper.ClientCnxn:1112) [2018-11-07 > 17:53:14,806] WARN Unable to read additional data from client sessionid > 0x10051eebf480003, likely client has closed socket > (org.apache.zookeeper.server.NIOServerCnxn:376) [2018-11-07 17:53:14,807] > WARN Client session timed out, have not heard from server in 4002ms for > sessionid 0x10051eebf480002 (org.apache.zookeeper.ClientCnxn:1112) > [2018-11-07 17:53:14,807] WARN Unable to read additional data from client > sessionid 0x10051eebf480002, likely client has closed socket > (org.apache.zookeeper.server.NIOServerCnxn:376) [2018-11-07 17:53:14,823] > WARN Client session timed out, have not heard from server in 4002ms for > sessionid 0x10051eebf480001 (org.apache.zookeeper.ClientCnxn:1112) > [2018-11-07 17:53:14,824] WARN Unable to read additional data from client > sessionid 0x10051eebf480001, likely client has closed socket > (org.apache.zookeeper.server.NIOServerCnxn:376) [2018-11-07 17:53:15,423] > WARN Client session timed out, have not heard from server in 4002ms for > sessionid 0x10051eebf48 (org.apache.zookeeper.ClientCnxn:1112) > [2018-11-07 17:53:15,423] WARN Unable to read additional data from client > sessionid 0x10051eebf48, likely client has closed socket > (org.apache.zookeeper.server.NIOServerCnxn:376) [2018-11-07 17:53:15,879] > WARN fsync-ing the write ahead log in SyncThread:0 took 4456ms which will > adversely effect operation latency. See the ZooKeeper troubleshooting guide > (org.apache.zookeeper.server.persistence.FileTxnLog:338) [2018-11-07 > 17:53:16,831] ERROR [ReplicaFetcher replicaId=0, leaderId
[jira] [Comment Edited] (KAFKA-7486) Flaky test `DeleteTopicTest.testAddPartitionDuringDeleteTopic`
[ https://issues.apache.org/jira/browse/KAFKA-7486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16690115#comment-16690115 ] Dong Lin edited comment on KAFKA-7486 at 11/16/18 11:13 PM: The test calls `AdminUtils.addPartitions` to add a partition to the topic after the async topic deletion is triggered. It assumes that the topic deletion has not completed by the time `AdminUtils.addPartitions` is executed, but this is not guaranteed. This is the root cause of the test failure. So it does not indicate any bug in the code, and thus this is not a blocking issue for the 2.1.0 release. was (Author: lindong): The test calls `AdminUtils.addPartitions` to add a partition to the topic after the async topic deletion is triggered. It assumes that the topic deletion has not completed by the time `AdminUtils.addPartitions` is executed, but this is not guaranteed. This is the root cause of the test failure. So it does not indicate any bug in the code, and thus this is not a blocking issue for the 2.1 release. > Flaky test `DeleteTopicTest.testAddPartitionDuringDeleteTopic` > -- > > Key: KAFKA-7486 > URL: https://issues.apache.org/jira/browse/KAFKA-7486 > Project: Kafka > Issue Type: Sub-task >Reporter: Jason Gustafson >Priority: Major > > Starting to see more of this recently: > {code} > 10:06:28 kafka.admin.DeleteTopicTest > testAddPartitionDuringDeleteTopic > FAILED > 10:06:28 kafka.admin.AdminOperationException: > org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = > NoNode for /brokers/topics/test > 10:06:28 at > kafka.zk.AdminZkClient.writeTopicPartitionAssignment(AdminZkClient.scala:162) > 10:06:28 at > kafka.zk.AdminZkClient.createOrUpdateTopicPartitionAssignmentPathInZK(AdminZkClient.scala:102) > 10:06:28 at > kafka.zk.AdminZkClient.addPartitions(AdminZkClient.scala:229) > 10:06:28 at > kafka.admin.DeleteTopicTest.testAddPartitionDuringDeleteTopic(DeleteTopicTest.scala:266) > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
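To make the race described in the comment above concrete, here is a hedged sketch of one way the test step could tolerate both interleavings (the helper names addPartitions and topicStillExists are hypothetical placeholders, not the actual DeleteTopicTest helpers): only treat the NoNode failure as fatal if the topic still exists.
{code}
import kafka.admin.AdminOperationException

// Hypothetical sketch of a more tolerant test step, not the actual DeleteTopicTest code.
def addPartitionDuringDelete(addPartitions: () => Unit, topicStillExists: () => Boolean): Unit = {
  try {
    addPartitions()
  } catch {
    // If the async deletion already completed, the NoNode error is an expected outcome of
    // the race rather than a product bug, so the test should not fail on it.
    case _: AdminOperationException if !topicStillExists() => ()
  }
}
{code}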
[jira] [Commented] (KAFKA-7541) Transient Failure: kafka.server.DynamicBrokerReconfigurationTest.testUncleanLeaderElectionEnable
[ https://issues.apache.org/jira/browse/KAFKA-7541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16690130#comment-16690130 ] Dong Lin commented on KAFKA-7541: - According to the source code of DynamicBrokerReconfigurationTest.testUncleanLeaderElectionEnable(), the test will fail with the exception above if leader election is not completed within 15 seconds. Thus the test may fail if there is a long GC pause. We can reduce the chance of test failure by increasing the wait time. Given the above understanding and the fact that the test passes with high probability, this flaky test does not indicate a bug and should not be a blocking issue for the 2.1.0 release. > Transient Failure: > kafka.server.DynamicBrokerReconfigurationTest.testUncleanLeaderElectionEnable > > > Key: KAFKA-7541 > URL: https://issues.apache.org/jira/browse/KAFKA-7541 > Project: Kafka > Issue Type: Sub-task >Reporter: John Roesler >Priority: Major > > Observed on Java 11: > [https://builds.apache.org/job/kafka-pr-jdk11-scala2.12/264/testReport/junit/kafka.server/DynamicBrokerReconfigurationTest/testUncleanLeaderElectionEnable/] > > Stacktrace: > {noformat} > java.lang.AssertionError: Unclean leader not elected > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.assertTrue(Assert.java:41) > at > kafka.server.DynamicBrokerReconfigurationTest.testUncleanLeaderElectionEnable(DynamicBrokerReconfigurationTest.scala:487) > at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.base/java.lang.reflect.Method.invoke(Method.java:566) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at > org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) > at > org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) > at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57) > at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290) > at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71) > at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288) > at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58) > at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268) > at > org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) > at > org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) > at org.junit.runners.ParentRunner.run(ParentRunner.java:363) > at > org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.runTestClass(JUnitTestClassExecutor.java:106) > at > org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:58) > at > org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:38) > at > 
org.gradle.api.internal.tasks.testing.junit.AbstractJUnitTestClassProcessor.processTestClass(AbstractJUnitTestClassProcessor.java:66) > at > org.gradle.api.internal.tasks.testing.SuiteTestClassProcessor.processTestClass(SuiteTestClassProcessor.java:51) > at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.base/java.lang.reflect.Method.invoke(Method.java:566) > at > org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:35) > at > org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24) > at > org.gradle.internal.dispatch.ContextClassLoaderDispatch.dispatch(ContextClassLoaderDispatch.java:32) > at > org.gradle.internal.dispatch.ProxyDispatchAdapter$DispatchingInvocationHandler.invoke(Pro
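A minimal sketch of the mitigation suggested in the comment on KAFKA-7541 above: a generic polling helper with an adjustable deadline (the helper name, the 60-second default, and the uncleanLeaderElected predicate are illustrative assumptions, not the test's actual constants or helpers).
{code}
// Hypothetical sketch: poll for the condition with a longer, configurable deadline so a long
// GC pause does not turn into a hard assertion failure after a fixed 15 seconds.
def waitForCondition(condition: () => Boolean, timeoutMs: Long = 60000L, pauseMs: Long = 100L): Boolean = {
  val deadline = System.currentTimeMillis() + timeoutMs
  var satisfied = condition()
  while (!satisfied && System.currentTimeMillis() < deadline) {
    Thread.sleep(pauseMs)
    satisfied = condition()
  }
  satisfied
}

// Usage sketch, where uncleanLeaderElected is a hypothetical predicate for the test's condition:
// assert(waitForCondition(() => uncleanLeaderElected), "Unclean leader not elected")
{code}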
[jira] [Commented] (KAFKA-7486) Flaky test `DeleteTopicTest.testAddPartitionDuringDeleteTopic`
[ https://issues.apache.org/jira/browse/KAFKA-7486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16690115#comment-16690115 ] Dong Lin commented on KAFKA-7486: - The test calls `AdminUtils.addPartitions` to add a partition to the topic after the async topic deletion is triggered. It assumes that the topic deletion has not completed by the time `AdminUtils.addPartitions` is executed, but this is not guaranteed. This is the root cause of the test failure. So it does not indicate any bug in the code, and thus this is not a blocking issue for the 2.1 release. > Flaky test `DeleteTopicTest.testAddPartitionDuringDeleteTopic` > -- > > Key: KAFKA-7486 > URL: https://issues.apache.org/jira/browse/KAFKA-7486 > Project: Kafka > Issue Type: Sub-task >Reporter: Jason Gustafson >Priority: Major > > Starting to see more of this recently: > {code} > 10:06:28 kafka.admin.DeleteTopicTest > testAddPartitionDuringDeleteTopic > FAILED > 10:06:28 kafka.admin.AdminOperationException: > org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = > NoNode for /brokers/topics/test > 10:06:28 at > kafka.zk.AdminZkClient.writeTopicPartitionAssignment(AdminZkClient.scala:162) > 10:06:28 at > kafka.zk.AdminZkClient.createOrUpdateTopicPartitionAssignmentPathInZK(AdminZkClient.scala:102) > 10:06:28 at > kafka.zk.AdminZkClient.addPartitions(AdminZkClient.scala:229) > 10:06:28 at > kafka.admin.DeleteTopicTest.testAddPartitionDuringDeleteTopic(DeleteTopicTest.scala:266) > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (KAFKA-7541) Transient Failure: kafka.server.DynamicBrokerReconfigurationTest.testUncleanLeaderElectionEnable
[ https://issues.apache.org/jira/browse/KAFKA-7541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dong Lin updated KAFKA-7541: Issue Type: Sub-task (was: Bug) Parent: KAFKA-7645 > Transient Failure: > kafka.server.DynamicBrokerReconfigurationTest.testUncleanLeaderElectionEnable > > > Key: KAFKA-7541 > URL: https://issues.apache.org/jira/browse/KAFKA-7541 > Project: Kafka > Issue Type: Sub-task >Reporter: John Roesler >Priority: Major > > Observed on Java 11: > [https://builds.apache.org/job/kafka-pr-jdk11-scala2.12/264/testReport/junit/kafka.server/DynamicBrokerReconfigurationTest/testUncleanLeaderElectionEnable/] > > Stacktrace: > {noformat} > java.lang.AssertionError: Unclean leader not elected > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.assertTrue(Assert.java:41) > at > kafka.server.DynamicBrokerReconfigurationTest.testUncleanLeaderElectionEnable(DynamicBrokerReconfigurationTest.scala:487) > at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.base/java.lang.reflect.Method.invoke(Method.java:566) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at > org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) > at > org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) > at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57) > at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290) > at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71) > at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288) > at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58) > at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268) > at > org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) > at > org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) > at org.junit.runners.ParentRunner.run(ParentRunner.java:363) > at > org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.runTestClass(JUnitTestClassExecutor.java:106) > at > org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:58) > at > org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:38) > at > org.gradle.api.internal.tasks.testing.junit.AbstractJUnitTestClassProcessor.processTestClass(AbstractJUnitTestClassProcessor.java:66) > at > org.gradle.api.internal.tasks.testing.SuiteTestClassProcessor.processTestClass(SuiteTestClassProcessor.java:51) > at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > 
java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.base/java.lang.reflect.Method.invoke(Method.java:566) > at > org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:35) > at > org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24) > at > org.gradle.internal.dispatch.ContextClassLoaderDispatch.dispatch(ContextClassLoaderDispatch.java:32) > at > org.gradle.internal.dispatch.ProxyDispatchAdapter$DispatchingInvocationHandler.invoke(ProxyDispatchAdapter.java:93) > at com.sun.proxy.$Proxy2.processTestClass(Unknown Source) > at > org.gradle.api.internal.tasks.testing.worker.TestWorker.processTestClass(TestWorker.java:117) > at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(Delegati
[jira] [Created] (KAFKA-7646) Flaky test SaslOAuthBearerSslEndToEndAuthorizationTest.testNoConsumeWithDescribeAclViaSubscribe
Dong Lin created KAFKA-7646: --- Summary: Flaky test SaslOAuthBearerSslEndToEndAuthorizationTest.testNoConsumeWithDescribeAclViaSubscribe Key: KAFKA-7646 URL: https://issues.apache.org/jira/browse/KAFKA-7646 Project: Kafka Issue Type: Task Reporter: Dong Lin This test is reported to fail by [~enothereska] as part of 2.1.0 RC0 release certification. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (KAFKA-7649) Flaky test SslEndToEndAuthorizationTest.testNoProduceWithDescribeAcl
[ https://issues.apache.org/jira/browse/KAFKA-7649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dong Lin updated KAFKA-7649: Issue Type: Sub-task (was: Task) Parent: KAFKA-7645 > Flaky test SslEndToEndAuthorizationTest.testNoProduceWithDescribeAcl > > > Key: KAFKA-7649 > URL: https://issues.apache.org/jira/browse/KAFKA-7649 > Project: Kafka > Issue Type: Sub-task >Reporter: Dong Lin >Priority: Major > > Observed in > https://builds.apache.org/job/kafka-2.1-jdk8/49/testReport/junit/kafka.api/SslEndToEndAuthorizationTest/testNoProduceWithDescribeAcl/ > {code} > Error Message > java.lang.SecurityException: zookeeper.set.acl is true, but the verification > of the JAAS login file failed. > Stacktrace > java.lang.SecurityException: zookeeper.set.acl is true, but the verification > of the JAAS login file failed. > at kafka.server.KafkaServer.initZkClient(KafkaServer.scala:361) > at kafka.server.KafkaServer.startup(KafkaServer.scala:202) > at kafka.utils.TestUtils$.createServer(TestUtils.scala:135) > at > kafka.integration.KafkaServerTestHarness$$anonfun$setUp$1.apply(KafkaServerTestHarness.scala:101) > at > kafka.integration.KafkaServerTestHarness$$anonfun$setUp$1.apply(KafkaServerTestHarness.scala:100) > at scala.collection.Iterator$class.foreach(Iterator.scala:891) > at scala.collection.AbstractIterator.foreach(Iterator.scala:1334) > at scala.collection.IterableLike$class.foreach(IterableLike.scala:72) > at scala.collection.AbstractIterable.foreach(Iterable.scala:54) > at > kafka.integration.KafkaServerTestHarness.setUp(KafkaServerTestHarness.scala:100) > at > kafka.api.IntegrationTestHarness.doSetup(IntegrationTestHarness.scala:81) > at > kafka.api.IntegrationTestHarness.setUp(IntegrationTestHarness.scala:73) > at > kafka.api.EndToEndAuthorizationTest.setUp(EndToEndAuthorizationTest.scala:180) > at > kafka.api.SslEndToEndAuthorizationTest.setUp(SslEndToEndAuthorizationTest.scala:72) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) > at > org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:24) > at > org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) > at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57) > at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290) > at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71) > at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288) > at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58) > at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268) > at > org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) > at > org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) > at org.junit.runners.ParentRunner.run(ParentRunner.java:363) > at > 
org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.runTestClass(JUnitTestClassExecutor.java:106) > at > org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:58) > at > org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:38) > at > org.gradle.api.internal.tasks.testing.junit.AbstractJUnitTestClassProcessor.processTestClass(AbstractJUnitTestClassProcessor.java:66) > at > org.gradle.api.internal.tasks.testing.SuiteTestClassProcessor.processTestClass(SuiteTestClassProcessor.java:51) > at sun.reflect.GeneratedMethodAccessor27.invoke(Unknown Source) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:35) > at > org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24) > a
[jira] [Created] (KAFKA-7649) Flaky test SslEndToEndAuthorizationTest.testNoProduceWithDescribeAcl
Dong Lin created KAFKA-7649: --- Summary: Flaky test SslEndToEndAuthorizationTest.testNoProduceWithDescribeAcl Key: KAFKA-7649 URL: https://issues.apache.org/jira/browse/KAFKA-7649 Project: Kafka Issue Type: Task Reporter: Dong Lin Observed in https://builds.apache.org/job/kafka-2.1-jdk8/49/testReport/junit/kafka.api/SslEndToEndAuthorizationTest/testNoProduceWithDescribeAcl/ {code} Error Message java.lang.SecurityException: zookeeper.set.acl is true, but the verification of the JAAS login file failed. Stacktrace java.lang.SecurityException: zookeeper.set.acl is true, but the verification of the JAAS login file failed. at kafka.server.KafkaServer.initZkClient(KafkaServer.scala:361) at kafka.server.KafkaServer.startup(KafkaServer.scala:202) at kafka.utils.TestUtils$.createServer(TestUtils.scala:135) at kafka.integration.KafkaServerTestHarness$$anonfun$setUp$1.apply(KafkaServerTestHarness.scala:101) at kafka.integration.KafkaServerTestHarness$$anonfun$setUp$1.apply(KafkaServerTestHarness.scala:100) at scala.collection.Iterator$class.foreach(Iterator.scala:891) at scala.collection.AbstractIterator.foreach(Iterator.scala:1334) at scala.collection.IterableLike$class.foreach(IterableLike.scala:72) at scala.collection.AbstractIterable.foreach(Iterable.scala:54) at kafka.integration.KafkaServerTestHarness.setUp(KafkaServerTestHarness.scala:100) at kafka.api.IntegrationTestHarness.doSetup(IntegrationTestHarness.scala:81) at kafka.api.IntegrationTestHarness.setUp(IntegrationTestHarness.scala:73) at kafka.api.EndToEndAuthorizationTest.setUp(EndToEndAuthorizationTest.scala:180) at kafka.api.SslEndToEndAuthorizationTest.setUp(SslEndToEndAuthorizationTest.scala:72) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:24) at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57) at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290) at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71) at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288) at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58) at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268) at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) at org.junit.runners.ParentRunner.run(ParentRunner.java:363) at org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.runTestClass(JUnitTestClassExecutor.java:106) at org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:58) at org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:38) at 
org.gradle.api.internal.tasks.testing.junit.AbstractJUnitTestClassProcessor.processTestClass(AbstractJUnitTestClassProcessor.java:66) at org.gradle.api.internal.tasks.testing.SuiteTestClassProcessor.processTestClass(SuiteTestClassProcessor.java:51) at sun.reflect.GeneratedMethodAccessor27.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:35) at org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24) at org.gradle.internal.dispatch.ContextClassLoaderDispatch.dispatch(ContextClassLoaderDispatch.java:32) at org.gradle.internal.dispatch.ProxyDispatchAdapter$DispatchingInvocationHandler.invoke(ProxyDispatchAdapter.java:93) at com.sun.proxy.$Proxy2.processTestClass(Unknown Source) at org.gradle.api.internal.tasks.testing.wo
[jira] [Updated] (KAFKA-7648) Flaky test DeleteTopicsRequestTest.testValidDeleteTopicRequests
[ https://issues.apache.org/jira/browse/KAFKA-7648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dong Lin updated KAFKA-7648: Description: Observed in [https://builds.apache.org/job/kafka-2.1-jdk8/52/testReport/junit/kafka.server/DeleteTopicsRequestTest/testValidDeleteTopicRequests/] {code} Error Message org.apache.kafka.common.errors.TopicExistsException: Topic 'topic-4' already exists. h3. Stacktrace org.apache.kafka.common.errors.TopicExistsException: Topic 'topic-4' already exists. h3. Standard Output [2018-11-07 17:53:10,812] ERROR [ReplicaFetcher replicaId=2, leaderId=0, fetcherId=0] Error for partition topic-3-3 at offset 0 (kafka.server.ReplicaFetcherThread:76) org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server does not host this topic-partition. [2018-11-07 17:53:10,812] ERROR [ReplicaFetcher replicaId=1, leaderId=0, fetcherId=0] Error for partition topic-3-0 at offset 0 (kafka.server.ReplicaFetcherThread:76) org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server does not host this topic-partition. [2018-11-07 17:53:14,805] WARN Client session timed out, have not heard from server in 4000ms for sessionid 0x10051eebf480003 (org.apache.zookeeper.ClientCnxn:1112) [2018-11-07 17:53:14,806] WARN Unable to read additional data from client sessionid 0x10051eebf480003, likely client has closed socket (org.apache.zookeeper.server.NIOServerCnxn:376) [2018-11-07 17:53:14,807] WARN Client session timed out, have not heard from server in 4002ms for sessionid 0x10051eebf480002 (org.apache.zookeeper.ClientCnxn:1112) [2018-11-07 17:53:14,807] WARN Unable to read additional data from client sessionid 0x10051eebf480002, likely client has closed socket (org.apache.zookeeper.server.NIOServerCnxn:376) [2018-11-07 17:53:14,823] WARN Client session timed out, have not heard from server in 4002ms for sessionid 0x10051eebf480001 (org.apache.zookeeper.ClientCnxn:1112) [2018-11-07 17:53:14,824] WARN Unable to read additional data from client sessionid 0x10051eebf480001, likely client has closed socket (org.apache.zookeeper.server.NIOServerCnxn:376) [2018-11-07 17:53:15,423] WARN Client session timed out, have not heard from server in 4002ms for sessionid 0x10051eebf48 (org.apache.zookeeper.ClientCnxn:1112) [2018-11-07 17:53:15,423] WARN Unable to read additional data from client sessionid 0x10051eebf48, likely client has closed socket (org.apache.zookeeper.server.NIOServerCnxn:376) [2018-11-07 17:53:15,879] WARN fsync-ing the write ahead log in SyncThread:0 took 4456ms which will adversely effect operation latency. See the ZooKeeper troubleshooting guide (org.apache.zookeeper.server.persistence.FileTxnLog:338) [2018-11-07 17:53:16,831] ERROR [ReplicaFetcher replicaId=0, leaderId=2, fetcherId=0] Error for partition topic-4-0 at offset 0 (kafka.server.ReplicaFetcherThread:76) org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server does not host this topic-partition. [2018-11-07 17:53:23,087] ERROR [ReplicaFetcher replicaId=1, leaderId=0, fetcherId=0] Error for partition invalid-timeout-1 at offset 0 (kafka.server.ReplicaFetcherThread:76) org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server does not host this topic-partition. 
[2018-11-07 17:53:23,088] ERROR [ReplicaFetcher replicaId=1, leaderId=2, fetcherId=0] Error for partition invalid-timeout-3 at offset 0 (kafka.server.ReplicaFetcherThread:76) org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server does not host this topic-partition. [2018-11-07 17:53:23,137] ERROR [ReplicaFetcher replicaId=0, leaderId=2, fetcherId=0] Error for partition invalid-timeout-0 at offset 0 (kafka.server.ReplicaFetcherThread:76) org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server does not host this topic-partition. {code} was: Observed in [https://builds.apache.org/job/kafka-2.1-jdk8/52/testReport/junit/kafka.server/DeleteTopicsRequestTest/testValidDeleteTopicRequests/] h3. Error Message org.apache.kafka.common.errors.TopicExistsException: Topic 'topic-4' already exists. h3. Stacktrace org.apache.kafka.common.errors.TopicExistsException: Topic 'topic-4' already exists. h3. Standard Output [2018-11-07 17:53:10,812] ERROR [ReplicaFetcher replicaId=2, leaderId=0, fetcherId=0] Error for partition topic-3-3 at offset 0 (kafka.server.ReplicaFetcherThread:76) org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server does not host this topic-partition. [2018-11-07 17:53:10,812] ERROR [ReplicaFetcher replicaId=1, leaderId=0, fetcherId=0] Error for partition topic-3-0 at offset 0 (kafka.server.ReplicaFetcherThread:76) org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server does not host this topic-partition. [2018-11-07 17:53:14,805] WARN Client session timed out, have not heard f
[jira] [Updated] (KAFKA-7312) Transient failure in kafka.api.AdminClientIntegrationTest.testMinimumRequestTimeouts
[ https://issues.apache.org/jira/browse/KAFKA-7312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dong Lin updated KAFKA-7312: Issue Type: Sub-task (was: Improvement) Parent: KAFKA-7645 > Transient failure in > kafka.api.AdminClientIntegrationTest.testMinimumRequestTimeouts > > > Key: KAFKA-7312 > URL: https://issues.apache.org/jira/browse/KAFKA-7312 > Project: Kafka > Issue Type: Sub-task > Components: unit tests >Reporter: Guozhang Wang >Priority: Major > > {code} > Error Message > org.junit.runners.model.TestTimedOutException: test timed out after 12 > milliseconds > Stacktrace > org.junit.runners.model.TestTimedOutException: test timed out after 12 > milliseconds > at java.lang.Object.wait(Native Method) > at java.lang.Object.wait(Object.java:502) > at > org.apache.kafka.common.internals.KafkaFutureImpl$SingleWaiter.await(KafkaFutureImpl.java:92) > at > org.apache.kafka.common.internals.KafkaFutureImpl.get(KafkaFutureImpl.java:262) > at > kafka.utils.TestUtils$.assertFutureExceptionTypeEquals(TestUtils.scala:1345) > at > kafka.api.AdminClientIntegrationTest.testMinimumRequestTimeouts(AdminClientIntegrationTest.scala:1080) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at > org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) > at > org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) > at > org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298) > at > org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at java.lang.Thread.run(Thread.java:748) > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (KAFKA-7648) Flaky test DeleteTopicsRequestTest.testValidDeleteTopicRequests
Dong Lin created KAFKA-7648: --- Summary: Flaky test DeleteTopicsRequestTest.testValidDeleteTopicRequests Key: KAFKA-7648 URL: https://issues.apache.org/jira/browse/KAFKA-7648 Project: Kafka Issue Type: Task Reporter: Dong Lin Observed in [https://builds.apache.org/job/kafka-2.1-jdk8/52/testReport/junit/kafka.server/DeleteTopicsRequestTest/testValidDeleteTopicRequests/] h3. Error Message org.apache.kafka.common.errors.TopicExistsException: Topic 'topic-4' already exists. h3. Stacktrace org.apache.kafka.common.errors.TopicExistsException: Topic 'topic-4' already exists. h3. Standard Output [2018-11-07 17:53:10,812] ERROR [ReplicaFetcher replicaId=2, leaderId=0, fetcherId=0] Error for partition topic-3-3 at offset 0 (kafka.server.ReplicaFetcherThread:76) org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server does not host this topic-partition. [2018-11-07 17:53:10,812] ERROR [ReplicaFetcher replicaId=1, leaderId=0, fetcherId=0] Error for partition topic-3-0 at offset 0 (kafka.server.ReplicaFetcherThread:76) org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server does not host this topic-partition. [2018-11-07 17:53:14,805] WARN Client session timed out, have not heard from server in 4000ms for sessionid 0x10051eebf480003 (org.apache.zookeeper.ClientCnxn:1112) [2018-11-07 17:53:14,806] WARN Unable to read additional data from client sessionid 0x10051eebf480003, likely client has closed socket (org.apache.zookeeper.server.NIOServerCnxn:376) [2018-11-07 17:53:14,807] WARN Client session timed out, have not heard from server in 4002ms for sessionid 0x10051eebf480002 (org.apache.zookeeper.ClientCnxn:1112) [2018-11-07 17:53:14,807] WARN Unable to read additional data from client sessionid 0x10051eebf480002, likely client has closed socket (org.apache.zookeeper.server.NIOServerCnxn:376) [2018-11-07 17:53:14,823] WARN Client session timed out, have not heard from server in 4002ms for sessionid 0x10051eebf480001 (org.apache.zookeeper.ClientCnxn:1112) [2018-11-07 17:53:14,824] WARN Unable to read additional data from client sessionid 0x10051eebf480001, likely client has closed socket (org.apache.zookeeper.server.NIOServerCnxn:376) [2018-11-07 17:53:15,423] WARN Client session timed out, have not heard from server in 4002ms for sessionid 0x10051eebf48 (org.apache.zookeeper.ClientCnxn:1112) [2018-11-07 17:53:15,423] WARN Unable to read additional data from client sessionid 0x10051eebf48, likely client has closed socket (org.apache.zookeeper.server.NIOServerCnxn:376) [2018-11-07 17:53:15,879] WARN fsync-ing the write ahead log in SyncThread:0 took 4456ms which will adversely effect operation latency. See the ZooKeeper troubleshooting guide (org.apache.zookeeper.server.persistence.FileTxnLog:338) [2018-11-07 17:53:16,831] ERROR [ReplicaFetcher replicaId=0, leaderId=2, fetcherId=0] Error for partition topic-4-0 at offset 0 (kafka.server.ReplicaFetcherThread:76) org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server does not host this topic-partition. [2018-11-07 17:53:23,087] ERROR [ReplicaFetcher replicaId=1, leaderId=0, fetcherId=0] Error for partition invalid-timeout-1 at offset 0 (kafka.server.ReplicaFetcherThread:76) org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server does not host this topic-partition. 
[2018-11-07 17:53:23,088] ERROR [ReplicaFetcher replicaId=1, leaderId=2, fetcherId=0] Error for partition invalid-timeout-3 at offset 0 (kafka.server.ReplicaFetcherThread:76) org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server does not host this topic-partition. [2018-11-07 17:53:23,137] ERROR [ReplicaFetcher replicaId=0, leaderId=2, fetcherId=0] Error for partition invalid-timeout-0 at offset 0 (kafka.server.ReplicaFetcherThread:76) org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server does not host this topic-partition. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (KAFKA-7648) Flaky test DeleteTopicsRequestTest.testValidDeleteTopicRequests
[ https://issues.apache.org/jira/browse/KAFKA-7648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dong Lin updated KAFKA-7648: Issue Type: Sub-task (was: Task) Parent: KAFKA-7645 > Flaky test DeleteTopicsRequestTest.testValidDeleteTopicRequests > --- > > Key: KAFKA-7648 > URL: https://issues.apache.org/jira/browse/KAFKA-7648 > Project: Kafka > Issue Type: Sub-task >Reporter: Dong Lin >Priority: Major > > Observed in > [https://builds.apache.org/job/kafka-2.1-jdk8/52/testReport/junit/kafka.server/DeleteTopicsRequestTest/testValidDeleteTopicRequests/] > > h3. Error Message > org.apache.kafka.common.errors.TopicExistsException: Topic 'topic-4' already > exists. > h3. Stacktrace > org.apache.kafka.common.errors.TopicExistsException: Topic 'topic-4' already > exists. > h3. Standard Output > [2018-11-07 17:53:10,812] ERROR [ReplicaFetcher replicaId=2, leaderId=0, > fetcherId=0] Error for partition topic-3-3 at offset 0 > (kafka.server.ReplicaFetcherThread:76) > org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server > does not host this topic-partition. [2018-11-07 17:53:10,812] ERROR > [ReplicaFetcher replicaId=1, leaderId=0, fetcherId=0] Error for partition > topic-3-0 at offset 0 (kafka.server.ReplicaFetcherThread:76) > org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server > does not host this topic-partition. [2018-11-07 17:53:14,805] WARN Client > session timed out, have not heard from server in 4000ms for sessionid > 0x10051eebf480003 (org.apache.zookeeper.ClientCnxn:1112) [2018-11-07 > 17:53:14,806] WARN Unable to read additional data from client sessionid > 0x10051eebf480003, likely client has closed socket > (org.apache.zookeeper.server.NIOServerCnxn:376) [2018-11-07 17:53:14,807] > WARN Client session timed out, have not heard from server in 4002ms for > sessionid 0x10051eebf480002 (org.apache.zookeeper.ClientCnxn:1112) > [2018-11-07 17:53:14,807] WARN Unable to read additional data from client > sessionid 0x10051eebf480002, likely client has closed socket > (org.apache.zookeeper.server.NIOServerCnxn:376) [2018-11-07 17:53:14,823] > WARN Client session timed out, have not heard from server in 4002ms for > sessionid 0x10051eebf480001 (org.apache.zookeeper.ClientCnxn:1112) > [2018-11-07 17:53:14,824] WARN Unable to read additional data from client > sessionid 0x10051eebf480001, likely client has closed socket > (org.apache.zookeeper.server.NIOServerCnxn:376) [2018-11-07 17:53:15,423] > WARN Client session timed out, have not heard from server in 4002ms for > sessionid 0x10051eebf48 (org.apache.zookeeper.ClientCnxn:1112) > [2018-11-07 17:53:15,423] WARN Unable to read additional data from client > sessionid 0x10051eebf48, likely client has closed socket > (org.apache.zookeeper.server.NIOServerCnxn:376) [2018-11-07 17:53:15,879] > WARN fsync-ing the write ahead log in SyncThread:0 took 4456ms which will > adversely effect operation latency. See the ZooKeeper troubleshooting guide > (org.apache.zookeeper.server.persistence.FileTxnLog:338) [2018-11-07 > 17:53:16,831] ERROR [ReplicaFetcher replicaId=0, leaderId=2, fetcherId=0] > Error for partition topic-4-0 at offset 0 > (kafka.server.ReplicaFetcherThread:76) > org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server > does not host this topic-partition. 
[2018-11-07 17:53:23,087] ERROR > [ReplicaFetcher replicaId=1, leaderId=0, fetcherId=0] Error for partition > invalid-timeout-1 at offset 0 (kafka.server.ReplicaFetcherThread:76) > org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server > does not host this topic-partition. [2018-11-07 17:53:23,088] ERROR > [ReplicaFetcher replicaId=1, leaderId=2, fetcherId=0] Error for partition > invalid-timeout-3 at offset 0 (kafka.server.ReplicaFetcherThread:76) > org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server > does not host this topic-partition. [2018-11-07 17:53:23,137] ERROR > [ReplicaFetcher replicaId=0, leaderId=2, fetcherId=0] Error for partition > invalid-timeout-0 at offset 0 (kafka.server.ReplicaFetcherThread:76) > org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server > does not host this topic-partition. > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (KAFKA-7647) Flaky test LogCleanerParameterizedIntegrationTest.testCleansCombinedCompactAndDeleteTopic
[ https://issues.apache.org/jira/browse/KAFKA-7647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dong Lin updated KAFKA-7647: Issue Type: Sub-task (was: Task) Parent: KAFKA-7645 > Flaky test > LogCleanerParameterizedIntegrationTest.testCleansCombinedCompactAndDeleteTopic > - > > Key: KAFKA-7647 > URL: https://issues.apache.org/jira/browse/KAFKA-7647 > Project: Kafka > Issue Type: Sub-task >Reporter: Dong Lin >Priority: Major > > kafka.log.LogCleanerParameterizedIntegrationTest > > testCleansCombinedCompactAndDeleteTopic[3] FAILED > java.lang.AssertionError: Contents of the map shouldn't change > expected: (340,340), 5 -> (345,345), 10 -> (350,350), 14 -> > (354,354), 1 -> (341,341), 6 -> (346,346), 9 -> (349,349), 13 -> (353,353), > 2 -> (342,342), 17 -> (357,357), 12 -> (352,352), 7 -> (347,347), 3 -> > (343,343), 18 -> (358,358), 16 -> (356,356), 11 -> (351,351), 8 -> > (348,348), 19 -> (359,359), 4 -> (344,344), 15 -> (355,355))> but > was: (340,340), 5 -> (345,345), 10 -> (350,350), 14 -> (354,354), > 1 -> (341,341), 6 -> (346,346), 9 -> (349,349), 13 -> (353,353), 2 -> > (342,342), 17 -> (357,357), 12 -> (352,352), 7 -> (347,347), 3 -> > (343,343), 18 -> (358,358), 16 -> (356,356), 11 -> (351,351), 99 -> > (299,299), 8 -> (348,348), 19 -> (359,359), 4 -> (344,344), 15 -> > (355,355))> > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:834) > at org.junit.Assert.assertEquals(Assert.java:118) > at > kafka.log.LogCleanerParameterizedIntegrationTest.testCleansCombinedCompactAndDeleteTopic(LogCleanerParameterizedIntegrationTest.scala:129) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (KAFKA-7647) Flaky test LogCleanerParameterizedIntegrationTest.testCleansCombinedCompactAndDeleteTopic
Dong Lin created KAFKA-7647: --- Summary: Flaky test LogCleanerParameterizedIntegrationTest.testCleansCombinedCompactAndDeleteTopic Key: KAFKA-7647 URL: https://issues.apache.org/jira/browse/KAFKA-7647 Project: Kafka Issue Type: Task Reporter: Dong Lin kafka.log.LogCleanerParameterizedIntegrationTest > testCleansCombinedCompactAndDeleteTopic[3] FAILED java.lang.AssertionError: Contents of the map shouldn't change expected: (340,340), 5 -> (345,345), 10 -> (350,350), 14 -> (354,354), 1 -> (341,341), 6 -> (346,346), 9 -> (349,349), 13 -> (353,353), 2 -> (342,342), 17 -> (357,357), 12 -> (352,352), 7 -> (347,347), 3 -> (343,343), 18 -> (358,358), 16 -> (356,356), 11 -> (351,351), 8 -> (348,348), 19 -> (359,359), 4 -> (344,344), 15 -> (355,355))> but was: (340,340), 5 -> (345,345), 10 -> (350,350), 14 -> (354,354), 1 -> (341,341), 6 -> (346,346), 9 -> (349,349), 13 -> (353,353), 2 -> (342,342), 17 -> (357,357), 12 -> (352,352), 7 -> (347,347), 3 -> (343,343), 18 -> (358,358), 16 -> (356,356), 11 -> (351,351), 99 -> (299,299), 8 -> (348,348), 19 -> (359,359), 4 -> (344,344), 15 -> (355,355))> at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.failNotEquals(Assert.java:834) at org.junit.Assert.assertEquals(Assert.java:118) at kafka.log.LogCleanerParameterizedIntegrationTest.testCleansCombinedCompactAndDeleteTopic(LogCleanerParameterizedIntegrationTest.scala:129) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (KAFKA-7646) Flaky test SaslOAuthBearerSslEndToEndAuthorizationTest.testNoConsumeWithDescribeAclViaSubscribe
[ https://issues.apache.org/jira/browse/KAFKA-7646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dong Lin updated KAFKA-7646: Issue Type: Sub-task (was: Task) Parent: KAFKA-7645 > Flaky test > SaslOAuthBearerSslEndToEndAuthorizationTest.testNoConsumeWithDescribeAclViaSubscribe > --- > > Key: KAFKA-7646 > URL: https://issues.apache.org/jira/browse/KAFKA-7646 > Project: Kafka > Issue Type: Sub-task >Reporter: Dong Lin >Priority: Major > > This test is reported to fail by [~enothereska] as part of 2.1.0 RC0 release > certification. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (KAFKA-7486) Flaky test `DeleteTopicTest.testAddPartitionDuringDeleteTopic`
[ https://issues.apache.org/jira/browse/KAFKA-7486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dong Lin updated KAFKA-7486: Issue Type: Sub-task (was: Bug) Parent: KAFKA-7645 > Flaky test `DeleteTopicTest.testAddPartitionDuringDeleteTopic` > -- > > Key: KAFKA-7486 > URL: https://issues.apache.org/jira/browse/KAFKA-7486 > Project: Kafka > Issue Type: Sub-task >Reporter: Jason Gustafson >Priority: Major > > Starting to see more of this recently: > {code} > 10:06:28 kafka.admin.DeleteTopicTest > testAddPartitionDuringDeleteTopic > FAILED > 10:06:28 kafka.admin.AdminOperationException: > org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = > NoNode for /brokers/topics/test > 10:06:28 at > kafka.zk.AdminZkClient.writeTopicPartitionAssignment(AdminZkClient.scala:162) > 10:06:28 at > kafka.zk.AdminZkClient.createOrUpdateTopicPartitionAssignmentPathInZK(AdminZkClient.scala:102) > 10:06:28 at > kafka.zk.AdminZkClient.addPartitions(AdminZkClient.scala:229) > 10:06:28 at > kafka.admin.DeleteTopicTest.testAddPartitionDuringDeleteTopic(DeleteTopicTest.scala:266) > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (KAFKA-7645) Fix flaky unit test for 2.1 branch
Dong Lin created KAFKA-7645: --- Summary: Fix flaky unit test for 2.1 branch Key: KAFKA-7645 URL: https://issues.apache.org/jira/browse/KAFKA-7645 Project: Kafka Issue Type: Task Reporter: Dong Lin Assignee: Dong Lin -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (KAFKA-7486) Flaky test `DeleteTopicTest.testAddPartitionDuringDeleteTopic`
[ https://issues.apache.org/jira/browse/KAFKA-7486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16690099#comment-16690099 ] Dong Lin commented on KAFKA-7486: - Hey [~chia7712], I think [~hachikuji] probably missed your message. I am sure [~hachikuji] (and I) are happy for you to help take this JIRA. > Flaky test `DeleteTopicTest.testAddPartitionDuringDeleteTopic` > -- > > Key: KAFKA-7486 > URL: https://issues.apache.org/jira/browse/KAFKA-7486 > Project: Kafka > Issue Type: Bug >Reporter: Jason Gustafson >Priority: Major > > Starting to see more of this recently: > {code} > 10:06:28 kafka.admin.DeleteTopicTest > testAddPartitionDuringDeleteTopic > FAILED > 10:06:28 kafka.admin.AdminOperationException: > org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = > NoNode for /brokers/topics/test > 10:06:28 at > kafka.zk.AdminZkClient.writeTopicPartitionAssignment(AdminZkClient.scala:162) > 10:06:28 at > kafka.zk.AdminZkClient.createOrUpdateTopicPartitionAssignmentPathInZK(AdminZkClient.scala:102) > 10:06:28 at > kafka.zk.AdminZkClient.addPartitions(AdminZkClient.scala:229) > 10:06:28 at > kafka.admin.DeleteTopicTest.testAddPartitionDuringDeleteTopic(DeleteTopicTest.scala:266) > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (KAFKA-7549) Old ProduceRequest with zstd compression does not return error to client
[ https://issues.apache.org/jira/browse/KAFKA-7549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16680509#comment-16680509 ] Dong Lin commented on KAFKA-7549: - [~ijuma] [~hachikuji] Given the reasoning provided above, are you OK with moving this issue out of 2.1 release? > Old ProduceRequest with zstd compression does not return error to client > > > Key: KAFKA-7549 > URL: https://issues.apache.org/jira/browse/KAFKA-7549 > Project: Kafka > Issue Type: Bug > Components: compression >Reporter: Magnus Edenhill >Assignee: Lee Dongjin >Priority: Major > Fix For: 2.1.0 > > > Kafka broker v2.1.0rc0. > > KIP-110 states that: > "Zstd will only be allowed for the bumped produce API. That is, for older > version clients(=below KAFKA_2_1_IV0), we return UNSUPPORTED_COMPRESSION_TYPE > regardless of the message format." > > However, sending a ProduceRequest V3 with zstd compression (which is a client > side bug) closes the connection with the following exception rather than > returning UNSUPPORTED_COMPRESSION_TYPE in the ProduceResponse: > > {noformat} > [2018-10-25 11:40:31,813] ERROR Exception while processing request from > 127.0.0.1:60723-127.0.0.1:60656-94 (kafka.network.Processor) > org.apache.kafka.common.errors.InvalidRequestException: Error getting request > for apiKey: PRODUCE, apiVersion: 3, connectionId: > 127.0.0.1:60723-127.0.0.1:60656-94, listenerName: ListenerName(PLAINTEXT), > principal: User:ANONYMOUS > Caused by: org.apache.kafka.common.record.InvalidRecordException: Produce > requests with version 3 are note allowed to use ZStandard compression > {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
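For readers reproducing the scenario above: zstd compression is only accepted by brokers that support the bumped produce API, i.e. a 2.1.0+ broker together with a client that sends the newer ProduceRequest version. The snippet below is a minimal, illustrative producer configuration that enables zstd; the bootstrap address and topic name are placeholders, not values taken from this ticket.

{code:java}
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class ZstdProducerExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder address
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        // zstd requires a 2.1.0+ client and broker; an older ProduceRequest version
        // carrying zstd data is rejected by the broker, as discussed in this ticket.
        props.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, "zstd");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("test-topic", "key", "value")); // placeholder topic
            producer.flush();
        }
    }
}
{code}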
[jira] [Commented] (KAFKA-7603) Producer should negotiate message format version with broker
[ https://issues.apache.org/jira/browse/KAFKA-7603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16678711#comment-16678711 ] Dong Lin commented on KAFKA-7603: - [~ijuma] Yes. I have not thought about how to do this yet. I just want to create a ticket to document the possible improvement here. > Producer should negotiate message format version with broker > > > Key: KAFKA-7603 > URL: https://issues.apache.org/jira/browse/KAFKA-7603 > Project: Kafka > Issue Type: Improvement >Reporter: Dong Lin >Assignee: Dong Lin >Priority: Major > > Currently Producer will always send the record with the highest magic format > version that is supported by both the producer and broker libraries regardless > of the log.message.format.version config in the broker. > This causes unnecessary message down-conversion overhead if > log.message.format.version has not been upgraded and the producer/broker library > has been upgraded. It is preferred for the producer to produce messages with a format > version no higher than the log.message.format.version configured in the > broker. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
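As background for the negotiation idea discussed above, a broker's log.message.format.version can already be read over the Admin API, which is the value such a producer would need to honor. The sketch below only illustrates that lookup, not the negotiation mechanism itself; the bootstrap address and the broker id "0" are assumptions made for the example.

{code:java}
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.Config;
import org.apache.kafka.common.config.ConfigResource;

public class BrokerFormatVersionLookup {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder address

        try (AdminClient admin = AdminClient.create(props)) {
            // Describe the configuration of broker 0 (the broker id is an assumption).
            ConfigResource broker = new ConfigResource(ConfigResource.Type.BROKER, "0");
            Config config = admin.describeConfigs(Collections.singleton(broker))
                                 .all().get().get(broker);
            // This is the format version a negotiating producer would not want to exceed.
            System.out.println(config.get("log.message.format.version").value());
        }
    }
}
{code}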
[jira] [Resolved] (KAFKA-6262) KIP-232: Detect outdated metadata by adding ControllerMetadataEpoch field
[ https://issues.apache.org/jira/browse/KAFKA-6262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dong Lin resolved KAFKA-6262. - Resolution: Duplicate The design in this Jira has been moved to KIP-320. > KIP-232: Detect outdated metadata by adding ControllerMetadataEpoch field > - > > Key: KAFKA-6262 > URL: https://issues.apache.org/jira/browse/KAFKA-6262 > Project: Kafka > Issue Type: Bug >Reporter: Dong Lin >Assignee: Dong Lin >Priority: Major > > Currently the following sequence of events may happen that causes the consumer to > rewind back to the earliest offset even if there is no log truncation in > Kafka. This can be a problem for MM by forcing MM to lag behind significantly > and duplicate a large amount of data. > - Say there are three brokers 1,2,3 for a given partition P. Broker 1 is the > leader. Initially they are all in ISR. HW and LEO are both 10. > - SRE does controlled shutdown for broker 1. Controller sends > LeaderAndIsrRequest to all three brokers so that leader = broker 2 and > isr_set = [broker 2, broker 3]. > - Broker 2 and 3 receive and process the LeaderAndIsrRequest almost > instantaneously. Now broker 2 and broker 3 can accept ProduceRequest and > FetchRequest for the partition P. > However, broker 1 has not processed this LeaderAndIsrRequest due to backlog > in its request queue. So broker 1 still thinks it is the leader for the partition > P. > - Because there is leadership movement, a consumer receives > NotLeaderForPartitionException, which triggers this consumer to send > MetadataRequest to a randomly selected broker, say broker 2. Broker 2 tells > the consumer that it is the leader for partition P. The consumer fetches data of > partition P from broker 2. The latest data has offset 20. > - Later this consumer receives NotLeaderForPartitionException for another > partition. It sends MetadataRequest to a randomly selected broker again. This > time it sends MetadataRequest to broker 1, which tells the consumer that > it is the leader for partition P. > - This consumer issues FetchRequest for the partition P at offset 21. Broker > 1 returns OffsetOutOfRangeException because it thinks the LogEndOffset for > this partition is 10. > There are two possible solutions for this problem. The long term solution is > probably to include a version in the MetadataResponse so that the consumer knows > whether the metadata is outdated. This requires a KIP. > The short term solution, which should solve the problem in most cases, is to > let the consumer keep fetching metadata from the same (initially randomly picked) > broker until the connection to this broker is disconnected. The metadata > version will not go back in time if the consumer keeps fetching metadata from the > same broker. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (KAFKA-7603) Producer should negotiate message format version with broker
Dong Lin created KAFKA-7603: --- Summary: Producer should negotiate message format version with broker Key: KAFKA-7603 URL: https://issues.apache.org/jira/browse/KAFKA-7603 Project: Kafka Issue Type: Improvement Reporter: Dong Lin Currently Producer will always send the record with the highest magic format version that is supported by both the producer and broker libraries regardless of the log.message.format.version config in the broker. This causes unnecessary message down-conversion overhead if log.message.format.version has not been upgraded and the producer/broker library has been upgraded. It is preferred for the producer to produce messages with a format version no higher than the log.message.format.version configured in the broker. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (KAFKA-7603) Producer should negotiate message format version with broker
[ https://issues.apache.org/jira/browse/KAFKA-7603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dong Lin reassigned KAFKA-7603: --- Assignee: Dong Lin > Producer should negotiate message format version with broker > > > Key: KAFKA-7603 > URL: https://issues.apache.org/jira/browse/KAFKA-7603 > Project: Kafka > Issue Type: Improvement >Reporter: Dong Lin >Assignee: Dong Lin >Priority: Major > > Currently Producer will always send the record with the highest magic format > version that is supported by both the producer and broker libraries regardless > of the log.message.format.version config in the broker. > This causes unnecessary message down-conversion overhead if > log.message.format.version has not been upgraded and the producer/broker library > has been upgraded. It is preferred for the producer to produce messages with a format > version no higher than the log.message.format.version configured in the > broker. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Comment Edited] (KAFKA-7549) Old ProduceRequest with zstd compression does not return error to client
[ https://issues.apache.org/jira/browse/KAFKA-7549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16678602#comment-16678602 ] Dong Lin edited comment on KAFKA-7549 at 11/7/18 6:25 PM: -- I actually think it is reasonable for the client to receive InvalidRequestException if the client sends ProduceRequest V3 with the zstd codec. We can not send UnsupportedCompressionTypeException (the UNSUPPORTED_COMPRESSION_TYPE error code) because if ProduceRequest is V3 then there is no guarantee that the client library can understand the error code 76. The meaning of UnsupportedVersionException is currently "The version of API is not supported", which does not match the scenario here because ProduceRequest V3 is actually supported by the broker. InvalidRequestException is reasonable because the issue here is that ProduceRequest V3 is used with the zstd codec, which makes the entire request invalid. [~edenhill] [~dongjin] By saying "client side bug" in the Jira description, is this a bug in Apache Kafka or in another custom client library? If it is in Apache Kafka, is there a Jira that tracks this issue? [~ijuma] [~hachikuji] Does this sound reasonable? was (Author: lindong): I actually think it is reasonable for the client to receive InvalidRequestException if the client sends ProduceRequest V3 with the zstd codec. We can not send UnsupportedCompressionTypeException (the UNSUPPORTED_COMPRESSION_TYPE error code) because if ProduceRequest is V3 then there is no guarantee that the client library can understand the error code 76. The meaning of UnsupportedVersionException is currently "The version of API is not supported", which does not match the scenario here because ProduceRequest V3 is actually supported by the broker. InvalidRequestException is reasonable because the issue here is that ProduceRequest V3 is used with the zstd codec, which makes the entire request invalid. [~edenhill] [~dongjin] By saying "client side bug" in the Jira description, is this a bug in Apache Kafka or in another custom client library? If it is in Apache Kafka, is there a Jira that tracks this issue? > Old ProduceRequest with zstd compression does not return error to client > > > Key: KAFKA-7549 > URL: https://issues.apache.org/jira/browse/KAFKA-7549 > Project: Kafka > Issue Type: Bug > Components: compression >Reporter: Magnus Edenhill >Assignee: Lee Dongjin >Priority: Major > Fix For: 2.1.0 > > > Kafka broker v2.1.0rc0. > > KIP-110 states that: > "Zstd will only be allowed for the bumped produce API. That is, for older > version clients(=below KAFKA_2_1_IV0), we return UNSUPPORTED_COMPRESSION_TYPE > regardless of the message format." > > However, sending a ProduceRequest V3 with zstd compression (which is a client > side bug) closes the connection with the following exception rather than > returning UNSUPPORTED_COMPRESSION_TYPE in the ProduceResponse: > > {noformat} > [2018-10-25 11:40:31,813] ERROR Exception while processing request from > 127.0.0.1:60723-127.0.0.1:60656-94 (kafka.network.Processor) > org.apache.kafka.common.errors.InvalidRequestException: Error getting request > for apiKey: PRODUCE, apiVersion: 3, connectionId: > 127.0.0.1:60723-127.0.0.1:60656-94, listenerName: ListenerName(PLAINTEXT), > principal: User:ANONYMOUS > Caused by: org.apache.kafka.common.record.InvalidRecordException: Produce > requests with version 3 are note allowed to use ZStandard compression > {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (KAFKA-7549) Old ProduceRequest with zstd compression does not return error to client
[ https://issues.apache.org/jira/browse/KAFKA-7549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16678602#comment-16678602 ] Dong Lin commented on KAFKA-7549: - I actually think it is reasonable for the client to receive InvalidRequestException if the client sends ProduceRequest V3 with the zstd codec. We can not send UnsupportedCompressionTypeException (the UNSUPPORTED_COMPRESSION_TYPE error code) because if ProduceRequest is V3 then there is no guarantee that the client library can understand the error code 76. The meaning of UnsupportedVersionException is currently "The version of API is not supported", which does not match the scenario here because ProduceRequest V3 is actually supported by the broker. InvalidRequestException is reasonable because the issue here is that ProduceRequest V3 is used with the zstd codec, which makes the entire request invalid. [~edenhill] [~dongjin] By saying "client side bug" in the Jira description, is this a bug in Apache Kafka or in another custom client library? If it is in Apache Kafka, is there a Jira that tracks this issue? > Old ProduceRequest with zstd compression does not return error to client > > > Key: KAFKA-7549 > URL: https://issues.apache.org/jira/browse/KAFKA-7549 > Project: Kafka > Issue Type: Bug > Components: compression >Reporter: Magnus Edenhill >Assignee: Lee Dongjin >Priority: Major > Fix For: 2.1.0 > > > Kafka broker v2.1.0rc0. > > KIP-110 states that: > "Zstd will only be allowed for the bumped produce API. That is, for older > version clients(=below KAFKA_2_1_IV0), we return UNSUPPORTED_COMPRESSION_TYPE > regardless of the message format." > > However, sending a ProduceRequest V3 with zstd compression (which is a client > side bug) closes the connection with the following exception rather than > returning UNSUPPORTED_COMPRESSION_TYPE in the ProduceResponse: > > {noformat} > [2018-10-25 11:40:31,813] ERROR Exception while processing request from > 127.0.0.1:60723-127.0.0.1:60656-94 (kafka.network.Processor) > org.apache.kafka.common.errors.InvalidRequestException: Error getting request > for apiKey: PRODUCE, apiVersion: 3, connectionId: > 127.0.0.1:60723-127.0.0.1:60656-94, listenerName: ListenerName(PLAINTEXT), > principal: User:ANONYMOUS > Caused by: org.apache.kafka.common.record.InvalidRecordException: Produce > requests with version 3 are note allowed to use ZStandard compression > {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (KAFKA-7313) StopReplicaRequest should attempt to remove future replica for the partition only if future replica exists
[ https://issues.apache.org/jira/browse/KAFKA-7313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dong Lin resolved KAFKA-7313. - Resolution: Fixed > StopReplicaRequest should attempt to remove future replica for the partition > only if future replica exists > -- > > Key: KAFKA-7313 > URL: https://issues.apache.org/jira/browse/KAFKA-7313 > Project: Kafka > Issue Type: Improvement >Reporter: Dong Lin >Assignee: Dong Lin >Priority: Major > Fix For: 2.0.1, 2.1.0 > > > This patch fixes two issues: > 1) Currently if a broker receives StopReplicaRequest with delete=true twice for the > same offline replica, the first StopReplicaRequest will show > KafkaStorageException and the second StopReplicaRequest will show > ReplicaNotAvailableException. This is because the first StopReplicaRequest > will remove the mapping (tp -> ReplicaManager.OfflinePartition) from > ReplicaManager.allPartitions before returning KafkaStorageException, thus the > second StopReplicaRequest will not find this partition as offline. > This result appears to be inconsistent. And since the replica is already > offline and the broker will not be able to delete files for this replica, the > StopReplicaRequest should fail without making any change and the broker should > still remember that this replica is offline. > 2) Currently if a broker receives StopReplicaRequest with delete=true, the > broker will attempt to remove the future replica for the partition, which will > cause KafkaStorageException in the StopReplicaResponse if this replica does > not have a future replica. It is problematic to always return > KafkaStorageException in the response if the future replica does not exist. > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (KAFKA-7481) Consider options for safer upgrade of offset commit value schema
[ https://issues.apache.org/jira/browse/KAFKA-7481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dong Lin updated KAFKA-7481: Fix Version/s: (was: 2.1.0) 2.2.0 > Consider options for safer upgrade of offset commit value schema > > > Key: KAFKA-7481 > URL: https://issues.apache.org/jira/browse/KAFKA-7481 > Project: Kafka > Issue Type: Bug >Reporter: Jason Gustafson >Assignee: Jason Gustafson >Priority: Blocker > Fix For: 2.2.0 > > > KIP-211 and KIP-320 add new versions of the offset commit value schema. The > use of the new schema version is controlled by the > `inter.broker.protocol.version` configuration. Once the new inter-broker > version is in use, it is not possible to downgrade since the older brokers > will not be able to parse the new schema. > The options at the moment are the following: > 1. Do nothing. Users can try the new version and keep > `inter.broker.protocol.version` locked to the old release. Downgrade will > still be possible, but users will not be able to test new capabilities which > depend on inter-broker protocol changes. > 2. Instead of using `inter.broker.protocol.version`, we could use > `message.format.version`. This would basically extend the use of this config > to apply to all persistent formats. The advantage is that it allows users to > upgrade the broker and begin using the new inter-broker protocol while still > allowing downgrade. But features which depend on the persistent format could > not be tested. > Any other options? -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Reopened] (KAFKA-7481) Consider options for safer upgrade of offset commit value schema
[ https://issues.apache.org/jira/browse/KAFKA-7481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dong Lin reopened KAFKA-7481: - > Consider options for safer upgrade of offset commit value schema > > > Key: KAFKA-7481 > URL: https://issues.apache.org/jira/browse/KAFKA-7481 > Project: Kafka > Issue Type: Bug >Reporter: Jason Gustafson >Assignee: Jason Gustafson >Priority: Blocker > Fix For: 2.1.0 > > > KIP-211 and KIP-320 add new versions of the offset commit value schema. The > use of the new schema version is controlled by the > `inter.broker.protocol.version` configuration. Once the new inter-broker > version is in use, it is not possible to downgrade since the older brokers > will not be able to parse the new schema. > The options at the moment are the following: > 1. Do nothing. Users can try the new version and keep > `inter.broker.protocol.version` locked to the old release. Downgrade will > still be possible, but users will not be able to test new capabilities which > depend on inter-broker protocol changes. > 2. Instead of using `inter.broker.protocol.version`, we could use > `message.format.version`. This would basically extend the use of this config > to apply to all persistent formats. The advantage is that it allows users to > upgrade the broker and begin using the new inter-broker protocol while still > allowing downgrade. But features which depend on the persistent format could > not be tested. > Any other options? -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (KAFKA-7481) Consider options for safer upgrade of offset commit value schema
[ https://issues.apache.org/jira/browse/KAFKA-7481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dong Lin resolved KAFKA-7481. - Resolution: Fixed > Consider options for safer upgrade of offset commit value schema > > > Key: KAFKA-7481 > URL: https://issues.apache.org/jira/browse/KAFKA-7481 > Project: Kafka > Issue Type: Bug >Reporter: Jason Gustafson >Assignee: Jason Gustafson >Priority: Blocker > Fix For: 2.1.0 > > > KIP-211 and KIP-320 add new versions of the offset commit value schema. The > use of the new schema version is controlled by the > `inter.broker.protocol.version` configuration. Once the new inter-broker > version is in use, it is not possible to downgrade since the older brokers > will not be able to parse the new schema. > The options at the moment are the following: > 1. Do nothing. Users can try the new version and keep > `inter.broker.protocol.version` locked to the old release. Downgrade will > still be possible, but users will not be able to test new capabilities which > depend on inter-broker protocol changes. > 2. Instead of using `inter.broker.protocol.version`, we could use > `message.format.version`. This would basically extend the use of this config > to apply to all persistent formats. The advantage is that it allows users to > upgrade the broker and begin using the new inter-broker protocol while still > allowing downgrade. But features which depend on the persistent format could > not be tested. > Any other options? -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (KAFKA-7481) Consider options for safer upgrade of offset commit value schema
[ https://issues.apache.org/jira/browse/KAFKA-7481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dong Lin reassigned KAFKA-7481: --- Assignee: Jason Gustafson > Consider options for safer upgrade of offset commit value schema > > > Key: KAFKA-7481 > URL: https://issues.apache.org/jira/browse/KAFKA-7481 > Project: Kafka > Issue Type: Bug >Reporter: Jason Gustafson >Assignee: Jason Gustafson >Priority: Blocker > Fix For: 2.1.0 > > > KIP-211 and KIP-320 add new versions of the offset commit value schema. The > use of the new schema version is controlled by the > `inter.broker.protocol.version` configuration. Once the new inter-broker > version is in use, it is not possible to downgrade since the older brokers > will not be able to parse the new schema. > The options at the moment are the following: > 1. Do nothing. Users can try the new version and keep > `inter.broker.protocol.version` locked to the old release. Downgrade will > still be possible, but users will not be able to test new capabilities which > depend on inter-broker protocol changes. > 2. Instead of using `inter.broker.protocol.version`, we could use > `message.format.version`. This would basically extend the use of this config > to apply to all persistent formats. The advantage is that it allows users to > upgrade the broker and begin using the new inter-broker protocol while still > allowing downgrade. But features which depend on the persistent format could > not be tested. > Any other options? -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (KAFKA-7313) StopReplicaRequest should attempt to remove future replica for the partition only if future replica exists
[ https://issues.apache.org/jira/browse/KAFKA-7313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dong Lin updated KAFKA-7313: Fix Version/s: (was: 2.1.1) 2.1.0 2.0.1 > StopReplicaRequest should attempt to remove future replica for the partition > only if future replica exists > -- > > Key: KAFKA-7313 > URL: https://issues.apache.org/jira/browse/KAFKA-7313 > Project: Kafka > Issue Type: Improvement >Reporter: Dong Lin >Assignee: Dong Lin >Priority: Major > Fix For: 2.0.1, 2.1.0 > > > This patch fixes two issues: > 1) Currently if a broker receives StopReplicaRequest with delete=true twice for the > same offline replica, the first StopReplicaRequest will show > KafkaStorageException and the second StopReplicaRequest will show > ReplicaNotAvailableException. This is because the first StopReplicaRequest > will remove the mapping (tp -> ReplicaManager.OfflinePartition) from > ReplicaManager.allPartitions before returning KafkaStorageException, thus the > second StopReplicaRequest will not find this partition as offline. > This result appears to be inconsistent. And since the replica is already > offline and the broker will not be able to delete files for this replica, the > StopReplicaRequest should fail without making any change and the broker should > still remember that this replica is offline. > 2) Currently if a broker receives StopReplicaRequest with delete=true, the > broker will attempt to remove the future replica for the partition, which will > cause KafkaStorageException in the StopReplicaResponse if this replica does > not have a future replica. It is problematic to always return > KafkaStorageException in the response if the future replica does not exist. > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (KAFKA-7559) ConnectStandaloneFileTest system tests do not pass
[ https://issues.apache.org/jira/browse/KAFKA-7559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dong Lin resolved KAFKA-7559. - Resolution: Fixed > ConnectStandaloneFileTest system tests do not pass > -- > > Key: KAFKA-7559 > URL: https://issues.apache.org/jira/browse/KAFKA-7559 > Project: Kafka > Issue Type: Improvement >Affects Versions: 2.1.0 >Reporter: Stanislav Kozlovski >Assignee: Randall Hauch >Priority: Major > Fix For: 2.0.1, 2.1.0 > > > Both tests `test_skip_and_log_to_dlq` and `test_file_source_and_sink` under > `kafkatest.tests.connect.connect_test.ConnectStandaloneFileTest` fail with > error messages similar to: > "TimeoutError: Kafka Connect failed to start on node: ducker@ducker04 in > condition mode: LISTEN" -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (KAFKA-7559) ConnectStandaloneFileTest system tests do not pass
[ https://issues.apache.org/jira/browse/KAFKA-7559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dong Lin updated KAFKA-7559: Fix Version/s: 2.1.0 2.0.1 > ConnectStandaloneFileTest system tests do not pass > -- > > Key: KAFKA-7559 > URL: https://issues.apache.org/jira/browse/KAFKA-7559 > Project: Kafka > Issue Type: Improvement >Affects Versions: 2.1.0 >Reporter: Stanislav Kozlovski >Assignee: Randall Hauch >Priority: Major > Fix For: 2.0.1, 2.1.0 > > > Both tests `test_skip_and_log_to_dlq` and `test_file_source_and_sink` under > `kafkatest.tests.connect.connect_test.ConnectStandaloneFileTest` fail with > error messages similar to: > "TimeoutError: Kafka Connect failed to start on node: ducker@ducker04 in > condition mode: LISTEN" -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (KAFKA-7560) PushHttpMetricsReporter should not convert metric value to double
[ https://issues.apache.org/jira/browse/KAFKA-7560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dong Lin updated KAFKA-7560: Description: Currently PushHttpMetricsReporter will convert the value from KafkaMetric.metricValue() to a double. This will not work for non-numerical metrics such as version in AppInfoParser whose value can be a string. This has caused an issue for PushHttpMetricsReporter, which in turn caused the system test kafkatest.tests.client.quota_test.QuotaTest.test_quota to fail with the following exception: {code:java} File "/opt/kafka-dev/tests/kafkatest/tests/client/quota_test.py", line 196, in validate metric.value for k, metrics in producer.metrics(group='producer-metrics', name='outgoing-byte-rate', client_id=producer.client_id) for metric in metrics ValueError: max() arg is an empty sequence {code} Since we allow the metric value to be an object, PushHttpMetricsReporter should also read the metric value as an object and pass it to the HTTP server. was: Currently metricValue The test under `kafkatest.tests.client.quota_test.QuotaTest.test_quota` fails when I run it locally. It produces the following error message: {code:java} File "/opt/kafka-dev/tests/kafkatest/tests/client/quota_test.py", line 196, in validate metric.value for k, metrics in producer.metrics(group='producer-metrics', name='outgoing-byte-rate', client_id=producer.client_id) for metric in metrics ValueError: max() arg is an empty sequence {code} I assume it cannot find the metric it's searching for > PushHttpMetricsReporter should not convert metric value to double > - > > Key: KAFKA-7560 > URL: https://issues.apache.org/jira/browse/KAFKA-7560 > Project: Kafka > Issue Type: Bug >Affects Versions: 2.1.0 >Reporter: Stanislav Kozlovski >Assignee: Dong Lin >Priority: Blocker > > Currently PushHttpMetricsReporter will convert the value from > KafkaMetric.metricValue() to a double. This will not work for non-numerical > metrics such as version in AppInfoParser whose value can be a string. This has > caused an issue for PushHttpMetricsReporter, which in turn caused the system test > kafkatest.tests.client.quota_test.QuotaTest.test_quota to fail with the > following exception: > {code:java} > File "/opt/kafka-dev/tests/kafkatest/tests/client/quota_test.py", line 196, > in validate metric.value for k, metrics in > producer.metrics(group='producer-metrics', name='outgoing-byte-rate', > client_id=producer.client_id) for metric in metrics ValueError: max() arg is > an empty sequence > {code} > Since we allow the metric value to be an object, PushHttpMetricsReporter should also > read the metric value as an object and pass it to the HTTP server. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
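As a rough sketch of the direction described in the updated description (and not the actual PushHttpMetricsReporter code), a metrics reporter can keep the value returned by KafkaMetric.metricValue() as an Object and only treat it as a number when it actually is one:

{code:java}
import java.util.List;
import java.util.Map;
import org.apache.kafka.common.metrics.KafkaMetric;
import org.apache.kafka.common.metrics.MetricsReporter;

/**
 * Minimal illustrative reporter that does not assume every metric value is a double.
 */
public class ObjectValueMetricsReporter implements MetricsReporter {

    @Override
    public void configure(Map<String, ?> configs) {}

    @Override
    public void init(List<KafkaMetric> metrics) {}

    @Override
    public void metricChange(KafkaMetric metric) {
        // metricValue() returns Object: numeric for measurable metrics, but a String
        // for gauges such as the app-info "version" metric mentioned above.
        Object value = metric.metricValue();
        if (value instanceof Number) {
            System.out.println(metric.metricName().name() + " = " + ((Number) value).doubleValue());
        } else {
            System.out.println(metric.metricName().name() + " = " + value);
        }
    }

    @Override
    public void metricRemoval(KafkaMetric metric) {}

    @Override
    public void close() {}
}
{code}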
[jira] [Updated] (KAFKA-7560) PushHttpMetricsReporter should not convert metric value to double
[ https://issues.apache.org/jira/browse/KAFKA-7560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dong Lin updated KAFKA-7560: Description: Currently metricValue The test under `kafkatest.tests.client.quota_test.QuotaTest.test_quota` fails when I run it locally. It produces the following error message: {code:java} File "/opt/kafka-dev/tests/kafkatest/tests/client/quota_test.py", line 196, in validate metric.value for k, metrics in producer.metrics(group='producer-metrics', name='outgoing-byte-rate', client_id=producer.client_id) for metric in metrics ValueError: max() arg is an empty sequence {code} I assume it cannot find the metric it's searching for was: The test under `kafkatest.tests.client.quota_test.QuotaTest.test_quota` fails when I run it locally. It produces the following error message: {code:java} File "/opt/kafka-dev/tests/kafkatest/tests/client/quota_test.py", line 196, in validate metric.value for k, metrics in producer.metrics(group='producer-metrics', name='outgoing-byte-rate', client_id=producer.client_id) for metric in metrics ValueError: max() arg is an empty sequence {code} I assume it cannot find the metric it's searching for > PushHttpMetricsReporter should not convert metric value to double > - > > Key: KAFKA-7560 > URL: https://issues.apache.org/jira/browse/KAFKA-7560 > Project: Kafka > Issue Type: Bug >Affects Versions: 2.1.0 >Reporter: Stanislav Kozlovski >Assignee: Dong Lin >Priority: Blocker > > Currently metricValue > > The test under `kafkatest.tests.client.quota_test.QuotaTest.test_quota` fails > when I run it locally. It produces the following error message: > {code:java} > File "/opt/kafka-dev/tests/kafkatest/tests/client/quota_test.py", line 196, > in validate metric.value for k, metrics in > producer.metrics(group='producer-metrics', name='outgoing-byte-rate', > client_id=producer.client_id) for metric in metrics ValueError: max() arg is > an empty sequence > {code} > I assume it cannot find the metric it's searching for -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (KAFKA-7560) PushHttpMetricsReporter should not convert metric value to double
[ https://issues.apache.org/jira/browse/KAFKA-7560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dong Lin updated KAFKA-7560: Summary: PushHttpMetricsReporter should not convert metric value to double (was: Client Quota - system test failure) > PushHttpMetricsReporter should not convert metric value to double > - > > Key: KAFKA-7560 > URL: https://issues.apache.org/jira/browse/KAFKA-7560 > Project: Kafka > Issue Type: Bug >Affects Versions: 2.1.0 >Reporter: Stanislav Kozlovski >Assignee: Dong Lin >Priority: Blocker > > The test under `kafkatest.tests.client.quota_test.QuotaTest.test_quota` fails > when I run it locally. It produces the following error message: > {code:java} > File "/opt/kafka-dev/tests/kafkatest/tests/client/quota_test.py", line 196, > in validate metric.value for k, metrics in > producer.metrics(group='producer-metrics', name='outgoing-byte-rate', > client_id=producer.client_id) for metric in metrics ValueError: max() arg is > an empty sequence > {code} > I assume it cannot find the metric it's searching for -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (KAFKA-7560) Client Quota - system test failure
[ https://issues.apache.org/jira/browse/KAFKA-7560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16677126#comment-16677126 ] Dong Lin commented on KAFKA-7560: - Never mind. I just found the issue. > Client Quota - system test failure > -- > > Key: KAFKA-7560 > URL: https://issues.apache.org/jira/browse/KAFKA-7560 > Project: Kafka > Issue Type: Bug >Affects Versions: 2.1.0 >Reporter: Stanislav Kozlovski >Assignee: Dong Lin >Priority: Blocker > > The test under `kafkatest.tests.client.quota_test.QuotaTest.test_quota` fails > when I run it locally. It produces the following error message: > {code:java} > File "/opt/kafka-dev/tests/kafkatest/tests/client/quota_test.py", line 196, > in validate metric.value for k, metrics in > producer.metrics(group='producer-metrics', name='outgoing-byte-rate', > client_id=producer.client_id) for metric in metrics ValueError: max() arg is > an empty sequence > {code} > I assume it cannot find the metric it's searching for -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (KAFKA-7560) Client Quota - system test failure
[ https://issues.apache.org/jira/browse/KAFKA-7560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16676512#comment-16676512 ] Dong Lin commented on KAFKA-7560: - Initially I thought the test failure was related to the quota logic in this test and thus I would be one of the best people to debug it. Now it seems that the test failed because the test suite is not able to read metrics from the producer using the solution developed in [https://github.com/apache/kafka/pull/4072.] More specifically, the log message shows that 5 messages are successfully produced and consumed. But do_POST in http.py is never called and thus we have the exception shown in the Jira description. [~ewencp] [~apurva] could you take a look, since you are probably more familiar with the HTTP-based approach of sending metrics here? I will also try to debug further. > Client Quota - system test failure > -- > > Key: KAFKA-7560 > URL: https://issues.apache.org/jira/browse/KAFKA-7560 > Project: Kafka > Issue Type: Bug >Affects Versions: 2.1.0 >Reporter: Stanislav Kozlovski >Assignee: Dong Lin >Priority: Blocker > > The test under `kafkatest.tests.client.quota_test.QuotaTest.test_quota` fails > when I run it locally. It produces the following error message: > {code:java} > File "/opt/kafka-dev/tests/kafkatest/tests/client/quota_test.py", line 196, > in validate metric.value for k, metrics in > producer.metrics(group='producer-metrics', name='outgoing-byte-rate', > client_id=producer.client_id) for metric in metrics ValueError: max() arg is > an empty sequence > {code} > I assume it cannot find the metric it's searching for -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (KAFKA-7560) Client Quota - system test failure
[ https://issues.apache.org/jira/browse/KAFKA-7560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dong Lin updated KAFKA-7560: Priority: Blocker (was: Major) > Client Quota - system test failure > -- > > Key: KAFKA-7560 > URL: https://issues.apache.org/jira/browse/KAFKA-7560 > Project: Kafka > Issue Type: Bug >Affects Versions: 2.1.0 >Reporter: Stanislav Kozlovski >Assignee: Dong Lin >Priority: Blocker > > The test under `kafkatest.tests.client.quota_test.QuotaTest.test_quota` fails > when I run it locally. It produces the following error message: > {code:java} > File "/opt/kafka-dev/tests/kafkatest/tests/client/quota_test.py", line 196, > in validate metric.value for k, metrics in > producer.metrics(group='producer-metrics', name='outgoing-byte-rate', > client_id=producer.client_id) for metric in metrics ValueError: max() arg is > an empty sequence > {code} > I assume it cannot find the metric it's searching for -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (KAFKA-7481) Consider options for safer upgrade of offset commit value schema
[ https://issues.apache.org/jira/browse/KAFKA-7481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16669391#comment-16669391 ] Dong Lin commented on KAFKA-7481: - [~ijuma] Yeah this works and this is why I think we need two separate upgrade states. Currently there are three possible upgrade states, i.e. the binary version is upgraded, the binary version + inter.broker.protocol.version are upgraded, and the binary version + inter.broker.protocol.version + message.format.version are upgraded. I guess my point is that it is reasonable to keep these three states in the long term. > Consider options for safer upgrade of offset commit value schema > > > Key: KAFKA-7481 > URL: https://issues.apache.org/jira/browse/KAFKA-7481 > Project: Kafka > Issue Type: Bug >Reporter: Jason Gustafson >Priority: Blocker > Fix For: 2.1.0 > > > KIP-211 and KIP-320 add new versions of the offset commit value schema. The > use of the new schema version is controlled by the > `inter.broker.protocol.version` configuration. Once the new inter-broker > version is in use, it is not possible to downgrade since the older brokers > will not be able to parse the new schema. > The options at the moment are the following: > 1. Do nothing. Users can try the new version and keep > `inter.broker.protocol.version` locked to the old release. Downgrade will > still be possible, but users will not be able to test new capabilities which > depend on inter-broker protocol changes. > 2. Instead of using `inter.broker.protocol.version`, we could use > `message.format.version`. This would basically extend the use of this config > to apply to all persistent formats. The advantage is that it allows users to > upgrade the broker and begin using the new inter-broker protocol while still > allowing downgrade. But features which depend on the persistent format could > not be tested. > Any other options? -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (KAFKA-7481) Consider options for safer upgrade of offset commit value schema
[ https://issues.apache.org/jira/browse/KAFKA-7481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16669323#comment-16669323 ] Dong Lin commented on KAFKA-7481: - [~ijuma] Regarding `I am still not sure if there's a lot of value in having two separate upgrade states`, I think we need at least one separate upgrade state for changes that can not be rolled back, since it seems weird not to be able to downgrade if there is only a minor version change in Kafka. And the rationale for the second separate upgrade state is that there are two categories of changes that prevent downgrade, i.e. those that change the topic schema and those that change the message format. It is common for users to be willing to pick up the first category of change very soon, and only pick up the second category much later, after the client library has been upgraded, to reduce the performance cost in the broker. > Consider options for safer upgrade of offset commit value schema > > > Key: KAFKA-7481 > URL: https://issues.apache.org/jira/browse/KAFKA-7481 > Project: Kafka > Issue Type: Bug >Reporter: Jason Gustafson >Priority: Blocker > Fix For: 2.1.0 > > > KIP-211 and KIP-320 add new versions of the offset commit value schema. The > use of the new schema version is controlled by the > `inter.broker.protocol.version` configuration. Once the new inter-broker > version is in use, it is not possible to downgrade since the older brokers > will not be able to parse the new schema. > The options at the moment are the following: > 1. Do nothing. Users can try the new version and keep > `inter.broker.protocol.version` locked to the old release. Downgrade will > still be possible, but users will not be able to test new capabilities which > depend on inter-broker protocol changes. > 2. Instead of using `inter.broker.protocol.version`, we could use > `message.format.version`. This would basically extend the use of this config > to apply to all persistent formats. The advantage is that it allows users to > upgrade the broker and begin using the new inter-broker protocol while still > allowing downgrade. But features which depend on the persistent format could > not be tested. > Any other options? -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (KAFKA-7403) Offset commit failure after upgrading brokers past KIP-211/KAFKA-4682
[ https://issues.apache.org/jira/browse/KAFKA-7403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dong Lin reassigned KAFKA-7403: --- Assignee: Vahid Hashemian > Offset commit failure after upgrading brokers past KIP-211/KAFKA-4682 > - > > Key: KAFKA-7403 > URL: https://issues.apache.org/jira/browse/KAFKA-7403 > Project: Kafka > Issue Type: Bug > Components: core >Reporter: Jon Lee >Assignee: Vahid Hashemian >Priority: Blocker > Fix For: 2.1.0 > > > I am currently trying broker upgrade from 0.11 to 2.0 with some patches > including KIP-211/KAFKA-4682. After the upgrade, however, applications with > 0.10.2 Kafka clients failed with the following error: > {code:java} > 2018/09/11 19:34:52.814 ERROR Failed to commit offsets. Exiting. > org.apache.kafka.common.KafkaException: Unexpected error in commit: The > server experienced an unexpected error when processing the request at > org.apache.kafka.clients.consumer.internals.ConsumerCoordinator$OffsetCommitResponseHandler.handle(ConsumerCoordinator.java:784) > ~[kafka-clients-0.10.2.86.jar:?] at > org.apache.kafka.clients.consumer.internals.ConsumerCoordinator$OffsetCommitResponseHandler.handle(ConsumerCoordinator.java:722) > ~[kafka-clients-0.10.2.86.jar:?] at > org.apache.kafka.clients.consumer.internals.AbstractCoordinator$CoordinatorResponseHandler.onSuccess(AbstractCoordinator.java:784) > ~[kafka-clients-0.10.2.86.jar:?] at > org.apache.kafka.clients.consumer.internals.AbstractCoordinator$CoordinatorResponseHandler.onSuccess(AbstractCoordinator.java:765) > ~[kafka-clients-0.10.2.86.jar:?] at > org.apache.kafka.clients.consumer.internals.RequestFuture$1.onSuccess(RequestFuture.java:186) > ~[kafka-clients-0.10.2.86.jar:?] at > org.apache.kafka.clients.consumer.internals.RequestFuture.fireSuccess(RequestFuture.java:149) > ~[kafka-clients-0.10.2.86.jar:?] at > org.apache.kafka.clients.consumer.internals.RequestFuture.complete(RequestFuture.java:116) > ~[kafka-clients-0.10.2.86.jar:?] at > org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient$RequestFutureCompletionHandler.fireCompletion(ConsumerNetworkClient.java:493) > ~[kafka-clients-0.10.2.86.jar:?] at > org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.firePendingCompletedRequests(ConsumerNetworkClient.java:322) > ~[kafka-clients-0.10.2.86.jar:?] at > org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:253) > ~[kafka-clients-0.10.2.86.jar:?] > {code} > From my reading of the code, it looks like the following happened: > # The 0.10.2 client sends a v2 OffsetCommitRequest to the broker. It sets > the retentionTime field of the OffsetCommitRequest to DEFAULT_RETENTION_TIME. > # In the 2.0 broker code, upon receiving an OffsetCommitRequest with > DEFAULT_RETENTION_TIME, KafkaApis.handleOffsetCommitRequest() sets the > "expireTimestamp" field of OffsetAndMetadata to None. > # Later in the code path, GroupMetadataManager.offsetCommitValue() expects > OffsetAndMetadata to have a non-empty "expireTimestamp" field if the > inter.broker.protocol.version is < KAFKA_2_1_IV0. 
> # However, the inter.broker.protocol.version was set to "1.0" prior to the > upgrade, and as a result, the following code in offsetCommitValue() raises an > error because expireTimestamp is None: > {code:java} > value.set(OFFSET_VALUE_EXPIRE_TIMESTAMP_FIELD_V1, > offsetAndMetadata.expireTimestamp.get){code} > > Here is the stack trace for the broker side error > {code:java} > java.util.NoSuchElementException: None.get > at scala.None$.get(Option.scala:347) ~[scala-library-2.11.12.jar:?] > at scala.None$.get(Option.scala:345) ~[scala-library-2.11.12.jar:?] > at > kafka.coordinator.group.GroupMetadataManager$.offsetCommitValue(GroupMetadataManager.scala:1109) > ~[kafka_2.11-2.0.0.10.jar:?] > at > kafka.coordinator.group.GroupMetadataManager$$anonfun$7.apply(GroupMetadataManager.scala:326) > ~[kafka_2.11-2.0.0.10.jar:?] > at > kafka.coordinator.group.GroupMetadataManager$$anonfun$7.apply(GroupMetadataManager.scala:324) > ~[kafka_2.11-2.0.0.10.jar:?] > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) > ~[scala-library-2.11.12.jar:?] > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) > ~[scala-library-2.11.12.jar:?] > at scala.collection.immutable.HashMap$HashMap1.foreach(HashMap.scala:221) > ~[scala-library-2.11.12.jar:?] > at scala.collection.immutable.HashMap$HashTrieMap.foreach(HashMap.scala:428) > ~[scala-library-2.11.12.jar:?] > at scala.collection.TraversableLike$class.map(TraversableLike.scala:234) > ~[scala-library-2.11.12.jar:?] > at scala.collection.AbstractTraversable.map(Traversable.scala:104) > ~[scala-
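The broker-side code involved here is Scala, but the failure pattern in the analysis above is generic: calling get on an option that is empty for commits from old clients. The Java sketch below only illustrates the kind of guard that avoids it; the method and parameter names are hypothetical and are not Kafka's actual fields or the actual fix.

{code:java}
import java.util.Optional;

public class OffsetExpireTimestampExample {

    // Hypothetical illustration: when writing the pre-2.1 offset commit value schema,
    // derive a fallback expire timestamp instead of calling get() on an empty Optional.
    static long expireTimestampForOldSchema(Optional<Long> expireTimestamp,
                                            long commitTimestamp,
                                            long defaultRetentionMs) {
        return expireTimestamp.orElse(commitTimestamp + defaultRetentionMs);
    }

    public static void main(String[] args) {
        long commitTs = System.currentTimeMillis();
        // Old clients send DEFAULT_RETENTION_TIME, so the broker ends up without an explicit expire timestamp.
        System.out.println(expireTimestampForOldSchema(Optional.empty(), commitTs, 7L * 24 * 60 * 60 * 1000));
    }
}
{code}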
[jira] [Assigned] (KAFKA-7560) Client Quota - system test failure
[ https://issues.apache.org/jira/browse/KAFKA-7560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dong Lin reassigned KAFKA-7560: --- Assignee: Dong Lin > Client Quota - system test failure > -- > > Key: KAFKA-7560 > URL: https://issues.apache.org/jira/browse/KAFKA-7560 > Project: Kafka > Issue Type: Bug >Affects Versions: 2.1.0 >Reporter: Stanislav Kozlovski >Assignee: Dong Lin >Priority: Major > > The test under `kafkatest.tests.client.quota_test.QuotaTest.test_quota` fails > when I run it locally. It produces the following error message: > {code:java} > File "/opt/kafka-dev/tests/kafkatest/tests/client/quota_test.py", line 196, > in validate metric.value for k, metrics in > producer.metrics(group='producer-metrics', name='outgoing-byte-rate', > client_id=producer.client_id) for metric in metrics ValueError: max() arg is > an empty sequence > {code} > I assume it cannot find the metric it's searching for -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (KAFKA-7481) Consider options for safer upgrade of offset commit value schema
[ https://issues.apache.org/jira/browse/KAFKA-7481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16663274#comment-16663274 ] Dong Lin commented on KAFKA-7481: - Hey [~ewencp], regarding the change for the 2.1.0 release, if we don't make further code changes for this issue, I agree it is good to clarify in the upgrade note that inter.broker.protocol.version can not be reverted after it is bumped to 2.1.0. Regarding the short term solution, I also prefer not to make a big code change, e.g. using the KIP-35 idea, to solve the issue here. I would prefer to just clarify in the upgrade note that inter.broker.protocol.version can not be reverted after it is bumped to 2.1.0. Also, since we currently do not mention anything about downgrade in the upgrade note, and the other config log.message.format.version can not be downgraded, I am not sure users actually expect to be able to downgrade inter.broker.protocol.version. So I feel this short term solution is OK and strictly speaking it does not break any semantic guarantee. Regarding the long term solution, it seems that we actually want users to manually manage the protocol version config in order to pick up any new feature that can change the data format on disk. Otherwise, if we always make things work with one rolling bounce, then whenever there is a feature that changes the data format on disk, we will have to bump the major version for the next Kafka release to indicate that it can not be downgraded, which delays acceptance of the release. Also, if we automatically bump up message.format.version for the new broker version, the broker performance will degrade significantly because most users in the organization won't even have had time to upgrade the client library version. > Consider options for safer upgrade of offset commit value schema > > > Key: KAFKA-7481 > URL: https://issues.apache.org/jira/browse/KAFKA-7481 > Project: Kafka > Issue Type: Bug >Reporter: Jason Gustafson >Priority: Blocker > Fix For: 2.1.0 > > > KIP-211 and KIP-320 add new versions of the offset commit value schema. The > use of the new schema version is controlled by the > `inter.broker.protocol.version` configuration. Once the new inter-broker > version is in use, it is not possible to downgrade since the older brokers > will not be able to parse the new schema. > The options at the moment are the following: > 1. Do nothing. Users can try the new version and keep > `inter.broker.protocol.version` locked to the old release. Downgrade will > still be possible, but users will not be able to test new capabilities which > depend on inter-broker protocol changes. > 2. Instead of using `inter.broker.protocol.version`, we could use > `message.format.version`. This would basically extend the use of this config > to apply to all persistent formats. The advantage is that it allows users to > upgrade the broker and begin using the new inter-broker protocol while still > allowing downgrade. But features which depend on the persistent format could > not be tested. > Any other options? -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (KAFKA-7313) StopReplicaRequest should attempt to remove future replica for the partition only if future replica exists
[ https://issues.apache.org/jira/browse/KAFKA-7313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dong Lin updated KAFKA-7313: Fix Version/s: 2.1.1 > StopReplicaRequest should attempt to remove future replica for the partition > only if future replica exists > -- > > Key: KAFKA-7313 > URL: https://issues.apache.org/jira/browse/KAFKA-7313 > Project: Kafka > Issue Type: Improvement >Reporter: Dong Lin >Assignee: Dong Lin >Priority: Major > Fix For: 2.1.1 > > > This patch fixes two issues: > 1) Currently if a broker receives StopReplicaRequest with delete=true twice for the > same offline replica, the first StopReplicaRequest will show > KafkaStorageException and the second StopReplicaRequest will show > ReplicaNotAvailableException. This is because the first StopReplicaRequest > will remove the mapping (tp -> ReplicaManager.OfflinePartition) from > ReplicaManager.allPartitions before returning KafkaStorageException, thus the > second StopReplicaRequest will not find this partition as offline. > This result appears to be inconsistent. And since the replica is already > offline and the broker will not be able to delete files for this replica, the > StopReplicaRequest should fail without making any change and the broker should > still remember that this replica is offline. > 2) Currently if a broker receives StopReplicaRequest with delete=true, the > broker will attempt to remove the future replica for the partition, which will > cause KafkaStorageException in the StopReplicaResponse if this replica does > not have a future replica. It is problematic to always return > KafkaStorageException in the response if the future replica does not exist. > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (KAFKA-7535) KafkaConsumer doesn't report records-lag if isolation.level is read_committed
[ https://issues.apache.org/jira/browse/KAFKA-7535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16662916#comment-16662916 ] Dong Lin commented on KAFKA-7535: - [~ijuma] Thanks for the notice. Yeah I would like to have this issue fixed in 2.1.0. > KafkaConsumer doesn't report records-lag if isolation.level is read_committed > - > > Key: KAFKA-7535 > URL: https://issues.apache.org/jira/browse/KAFKA-7535 > Project: Kafka > Issue Type: Bug > Components: clients >Affects Versions: 2.0.0 >Reporter: Alexey Vakhrenev >Assignee: lambdaliu >Priority: Major > Labels: regression > Fix For: 2.1.0, 2.1.1, 2.0.2 > > > Starting from 2.0.0, {{KafkaConsumer}} doesn't report {{records-lag}} if > {{isolation.level}} is {{read_committed}}. The last version where it works > is {{1.1.1}}. > The issue can be easily reproduced in {{kafka.api.PlaintextConsumerTest}} by > adding {{consumerConfig.setProperty("isolation.level", "read_committed")}} > within the related tests: > - {{testPerPartitionLagMetricsCleanUpWithAssign}} > - {{testPerPartitionLagMetricsCleanUpWithSubscribe}} > - {{testPerPartitionLagWithMaxPollRecords}} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
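For anyone verifying the fix, the consumer's lag metrics are exposed through its metrics() map. The sketch below polls once with read_committed and prints any lag-related metrics it finds; the bootstrap address, group id, and topic name are placeholder assumptions, and on the affected versions no such metrics show up.

{code:java}
import java.time.Duration;
import java.util.Collections;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.Metric;
import org.apache.kafka.common.MetricName;
import org.apache.kafka.common.serialization.StringDeserializer;

public class ReadCommittedLagCheck {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder address
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "lag-check");               // placeholder group id
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.ISOLATION_LEVEL_CONFIG, "read_committed");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("test-topic")); // placeholder topic
            consumer.poll(Duration.ofSeconds(5));
            // Print whatever lag metrics the fetcher has registered.
            for (Map.Entry<MetricName, ? extends Metric> entry : consumer.metrics().entrySet()) {
                if (entry.getKey().name().contains("records-lag")) {
                    System.out.println(entry.getKey().name() + " = " + entry.getValue().metricValue());
                }
            }
        }
    }
}
{code}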
[jira] [Updated] (KAFKA-7535) KafkaConsumer doesn't report records-lag if isolation.level is read_committed
[ https://issues.apache.org/jira/browse/KAFKA-7535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dong Lin updated KAFKA-7535: Fix Version/s: 2.1.0 > KafkaConsumer doesn't report records-lag if isolation.level is read_committed > - > > Key: KAFKA-7535 > URL: https://issues.apache.org/jira/browse/KAFKA-7535 > Project: Kafka > Issue Type: Bug > Components: clients >Affects Versions: 2.0.0 >Reporter: Alexey Vakhrenev >Assignee: lambdaliu >Priority: Major > Labels: regression > Fix For: 2.1.0, 2.1.1, 2.0.2 > > > Starting from 2.0.0, {{KafkaConsumer}} doesn't report {{records-lag}} if > {{isolation.level}} is {{read_committed}}. The last version where it works > is {{1.1.1}}. > The issue can be easily reproduced in {{kafka.api.PlaintextConsumerTest}} by > adding {{consumerConfig.setProperty("isolation.level", "read_committed")}} > within the related tests: > - {{testPerPartitionLagMetricsCleanUpWithAssign}} > - {{testPerPartitionLagMetricsCleanUpWithSubscribe}} > - {{testPerPartitionLagWithMaxPollRecords}} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (KAFKA-7481) Consider options for safer upgrade of offset commit value schema
[ https://issues.apache.org/jira/browse/KAFKA-7481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16662619#comment-16662619 ] Dong Lin commented on KAFKA-7481: - BTW, if we agree this is the long term solution, then this issue is no longer a blocker for the 2.1.0 release because we will tell users that both inter.broker.protocol.version and message.format.version involve disk format changes and thus can not be rolled back. > Consider options for safer upgrade of offset commit value schema > > > Key: KAFKA-7481 > URL: https://issues.apache.org/jira/browse/KAFKA-7481 > Project: Kafka > Issue Type: Bug >Reporter: Jason Gustafson >Priority: Blocker > Fix For: 2.1.0 > > > KIP-211 and KIP-320 add new versions of the offset commit value schema. The > use of the new schema version is controlled by the > `inter.broker.protocol.version` configuration. Once the new inter-broker > version is in use, it is not possible to downgrade since the older brokers > will not be able to parse the new schema. > The options at the moment are the following: > 1. Do nothing. Users can try the new version and keep > `inter.broker.protocol.version` locked to the old release. Downgrade will > still be possible, but users will not be able to test new capabilities which > depend on inter-broker protocol changes. > 2. Instead of using `inter.broker.protocol.version`, we could use > `message.format.version`. This would basically extend the use of this config > to apply to all persistent formats. The advantage is that it allows users to > upgrade the broker and begin using the new inter-broker protocol while still > allowing downgrade. But features which depend on the persistent format could > not be tested. > Any other options? -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (KAFKA-7481) Consider options for safer upgrade of offset commit value schema
[ https://issues.apache.org/jira/browse/KAFKA-7481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16662610#comment-16662610 ] Dong Lin commented on KAFKA-7481: - After reading through the discussion and thinking through the problem more, I have the following alternative solution. Basically we have three categories of features: 1) Features that require only a protocol change. These features can be rolled back. 2) Features that require both a protocol and a disk format change without performance impact. These features cannot be rolled back. 3) Features that require both a protocol and a disk format change and have performance impact. These features cannot be rolled back. My proposed solution is this: 1) For features that require only a protocol change, let the broker automatically detect the protocol version (e.g. KIP-35) as the lower version of the two brokers in communication, instead of controlling the version explicitly using inter.broker.protocol.version. 2) For features that require both a protocol and a disk format change without performance impact, the version can be specified explicitly using the existing inter.broker.protocol.version config, and we tell users that this config cannot be rolled back after it is bumped. 3) For features that require both a protocol and a disk format change and have performance impact, the version will be specified explicitly using message.format.version as we are currently doing. There is no change in this category. This solution does not increase the config or testing surface area, which meets Ismael's goal. It also minimizes the cases in which we do not allow broker version downgrade, which meets Jason's and Ewen's goal. One additional benefit, as mentioned by Ewen regarding KIP-35, is that for features that require only a protocol change, which is the common case, users only need one rolling bounce to pick up the feature. Does this sound good? > Consider options for safer upgrade of offset commit value schema > > > Key: KAFKA-7481 > URL: https://issues.apache.org/jira/browse/KAFKA-7481 > Project: Kafka > Issue Type: Bug >Reporter: Jason Gustafson >Priority: Blocker > Fix For: 2.1.0 > > > KIP-211 and KIP-320 add new versions of the offset commit value schema. The > use of the new schema version is controlled by the > `inter.broker.protocol.version` configuration. Once the new inter-broker > version is in use, it is not possible to downgrade since the older brokers > will not be able to parse the new schema. > The options at the moment are the following: > 1. Do nothing. Users can try the new version and keep > `inter.broker.protocol.version` locked to the old release. Downgrade will > still be possible, but users will not be able to test new capabilities which > depend on inter-broker protocol changes. > 2. Instead of using `inter.broker.protocol.version`, we could use > `message.format.version`. This would basically extend the use of this config > to apply to all persistent formats. The advantage is that it allows users to > upgrade the broker and begin using the new inter-broker protocol while still > allowing downgrade. But features which depend on the persistent format could > not be tested. > Any other options? -- This message was sent by Atlassian JIRA (v7.6.3#76005)
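To make the three-category proposal above easier to follow, the two broker configs being discussed are shown in the hedged server.properties fragment below. The version values are arbitrary examples, not a recommendation from this thread, and the automatic KIP-35-style negotiation proposed for protocol-only features is not an existing config.

{noformat}
# Illustrative server.properties fragment; values are examples only.

# Governs inter-broker RPC versions and, today, some persistent metadata such as the
# offset commit value schema. Once bumped, downgrading below this version is unsafe.
inter.broker.protocol.version=2.1

# Governs the on-disk message format (the "performance impact" category above), since
# older clients may require down-conversion when this is bumped.
log.message.format.version=2.0
{noformat}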
[jira] [Updated] (KAFKA-7481) Consider options for safer upgrade of offset commit value schema
[ https://issues.apache.org/jira/browse/KAFKA-7481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dong Lin updated KAFKA-7481: Priority: Blocker (was: Critical) > Consider options for safer upgrade of offset commit value schema > > > Key: KAFKA-7481 > URL: https://issues.apache.org/jira/browse/KAFKA-7481 > Project: Kafka > Issue Type: Bug >Reporter: Jason Gustafson >Priority: Blocker > Fix For: 2.1.0 > > > KIP-211 and KIP-320 add new versions of the offset commit value schema. The > use of the new schema version is controlled by the > `inter.broker.protocol.version` configuration. Once the new inter-broker > version is in use, it is not possible to downgrade since the older brokers > will not be able to parse the new schema. > The options at the moment are the following: > 1. Do nothing. Users can try the new version and keep > `inter.broker.protocol.version` locked to the old release. Downgrade will > still be possible, but users will not be able to test new capabilities which > depend on inter-broker protocol changes. > 2. Instead of using `inter.broker.protocol.version`, we could use > `message.format.version`. This would basically extend the use of this config > to apply to all persistent formats. The advantage is that it allows users to > upgrade the broker and begin using the new inter-broker protocol while still > allowing downgrade. But features which depend on the persistent format could > not be tested. > Any other options? -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (KAFKA-7481) Consider options for safer upgrade of offset commit value schema
[ https://issues.apache.org/jira/browse/KAFKA-7481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16662555#comment-16662555 ] Dong Lin commented on KAFKA-7481: - Thanks [~ewencp] for the suggestion. I agree. Will do this in the future. > Consider options for safer upgrade of offset commit value schema > > > Key: KAFKA-7481 > URL: https://issues.apache.org/jira/browse/KAFKA-7481 > Project: Kafka > Issue Type: Bug >Reporter: Jason Gustafson >Priority: Critical > Fix For: 2.1.0 > > > KIP-211 and KIP-320 add new versions of the offset commit value schema. The > use of the new schema version is controlled by the > `inter.broker.protocol.version` configuration. Once the new inter-broker > version is in use, it is not possible to downgrade since the older brokers > will not be able to parse the new schema. > The options at the moment are the following: > 1. Do nothing. Users can try the new version and keep > `inter.broker.protocol.version` locked to the old release. Downgrade will > still be possible, but users will not be able to test new capabilities which > depend on inter-broker protocol changes. > 2. Instead of using `inter.broker.protocol.version`, we could use > `message.format.version`. This would basically extend the use of this config > to apply to all persistent formats. The advantage is that it allows users to > upgrade the broker and begin using the new inter-broker protocol while still > allowing downgrade. But features which depend on the persistent format could > not be tested. > Any other options? -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (KAFKA-7481) Consider options for safer upgrade of offset commit value schema
[ https://issues.apache.org/jira/browse/KAFKA-7481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dong Lin updated KAFKA-7481: Priority: Critical (was: Blocker) > Consider options for safer upgrade of offset commit value schema > > > Key: KAFKA-7481 > URL: https://issues.apache.org/jira/browse/KAFKA-7481 > Project: Kafka > Issue Type: Bug >Reporter: Jason Gustafson >Priority: Critical > Fix For: 2.1.0 > > > KIP-211 and KIP-320 add new versions of the offset commit value schema. The > use of the new schema version is controlled by the > `inter.broker.protocol.version` configuration. Once the new inter-broker > version is in use, it is not possible to downgrade since the older brokers > will not be able to parse the new schema. > The options at the moment are the following: > 1. Do nothing. Users can try the new version and keep > `inter.broker.protocol.version` locked to the old release. Downgrade will > still be possible, but users will not be able to test new capabilities which > depend on inter-broker protocol changes. > 2. Instead of using `inter.broker.protocol.version`, we could use > `message.format.version`. This would basically extend the use of this config > to apply to all persistent formats. The advantage is that it allows users to > upgrade the broker and begin using the new inter-broker protocol while still > allowing downgrade. But features which depend on the persistent format could > not be tested. > Any other options? -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (KAFKA-7481) Consider options for safer upgrade of offset commit value schema
[ https://issues.apache.org/jira/browse/KAFKA-7481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dong Lin updated KAFKA-7481: Fix Version/s: 2.1.0 > Consider options for safer upgrade of offset commit value schema > > > Key: KAFKA-7481 > URL: https://issues.apache.org/jira/browse/KAFKA-7481 > Project: Kafka > Issue Type: Bug >Reporter: Jason Gustafson >Priority: Blocker > Fix For: 2.1.0 > > > KIP-211 and KIP-320 add new versions of the offset commit value schema. The > use of the new schema version is controlled by the > `inter.broker.protocol.version` configuration. Once the new inter-broker > version is in use, it is not possible to downgrade since the older brokers > will not be able to parse the new schema. > The options at the moment are the following: > 1. Do nothing. Users can try the new version and keep > `inter.broker.protocol.version` locked to the old release. Downgrade will > still be possible, but users will not be able to test new capabilities which > depend on inter-broker protocol changes. > 2. Instead of using `inter.broker.protocol.version`, we could use > `message.format.version`. This would basically extend the use of this config > to apply to all persistent formats. The advantage is that it allows users to > upgrade the broker and begin using the new inter-broker protocol while still > allowing downgrade. But features which depend on the persistent format could > not be tested. > Any other options? -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (KAFKA-7481) Consider options for safer upgrade of offset commit value schema
[ https://issues.apache.org/jira/browse/KAFKA-7481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16661482#comment-16661482 ] Dong Lin commented on KAFKA-7481: - I removed the 2.1.0 tag so that release.py would generate a release for 2.1.0-RC0. The goal is for open source users to be able to help test this release even before we have addressed all issues. We can add the tag back later. > Consider options for safer upgrade of offset commit value schema > > > Key: KAFKA-7481 > URL: https://issues.apache.org/jira/browse/KAFKA-7481 > Project: Kafka > Issue Type: Bug >Reporter: Jason Gustafson >Priority: Blocker > > > KIP-211 and KIP-320 add new versions of the offset commit value schema. The > use of the new schema version is controlled by the > `inter.broker.protocol.version` configuration. Once the new inter-broker > version is in use, it is not possible to downgrade since the older brokers > will not be able to parse the new schema. > The options at the moment are the following: > 1. Do nothing. Users can try the new version and keep > `inter.broker.protocol.version` locked to the old release. Downgrade will > still be possible, but users will not be able to test new capabilities which > depend on inter-broker protocol changes. > 2. Instead of using `inter.broker.protocol.version`, we could use > `message.format.version`. This would basically extend the use of this config > to apply to all persistent formats. The advantage is that it allows users to > upgrade the broker and begin using the new inter-broker protocol while still > allowing downgrade. But features which depend on the persistent format could > not be tested. > Any other options? -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (KAFKA-7481) Consider options for safer upgrade of offset commit value schema
[ https://issues.apache.org/jira/browse/KAFKA-7481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dong Lin updated KAFKA-7481: Fix Version/s: (was: 2.1.0) > Consider options for safer upgrade of offset commit value schema > > > Key: KAFKA-7481 > URL: https://issues.apache.org/jira/browse/KAFKA-7481 > Project: Kafka > Issue Type: Bug >Reporter: Jason Gustafson >Priority: Blocker > > KIP-211 and KIP-320 add new versions of the offset commit value schema. The > use of the new schema version is controlled by the > `inter.broker.protocol.version` configuration. Once the new inter-broker > version is in use, it is not possible to downgrade since the older brokers > will not be able to parse the new schema. > The options at the moment are the following: > 1. Do nothing. Users can try the new version and keep > `inter.broker.protocol.version` locked to the old release. Downgrade will > still be possible, but users will not be able to test new capabilities which > depend on inter-broker protocol changes. > 2. Instead of using `inter.broker.protocol.version`, we could use > `message.format.version`. This would basically extend the use of this config > to apply to all persistent formats. The advantage is that it allows users to > upgrade the broker and begin using the new inter-broker protocol while still > allowing downgrade. But features which depend on the persistent format could > not be tested. > Any other options? -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (KAFKA-6045) All access to log should fail if log is closed
[ https://issues.apache.org/jira/browse/KAFKA-6045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dong Lin updated KAFKA-6045: Fix Version/s: (was: 2.1.0) > All access to log should fail if log is closed > -- > > Key: KAFKA-6045 > URL: https://issues.apache.org/jira/browse/KAFKA-6045 > Project: Kafka > Issue Type: Bug >Reporter: Dong Lin >Priority: Major > > After log.close() or log.closeHandlers() is called for a given log, all uses > of the Log's API should fail with proper exception. For example, > log.appendAsLeader() should throw KafkaStorageException. APIs such as > Log.activeProducersWithLastSequence() should also fail but not necessarily > with KafkaStorageException, since the KafkaStorageException indicates failure > to access disk. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (KAFKA-5857) Excessive heap usage on controller node during reassignment
[ https://issues.apache.org/jira/browse/KAFKA-5857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dong Lin updated KAFKA-5857: Fix Version/s: (was: 2.1.0) > Excessive heap usage on controller node during reassignment > --- > > Key: KAFKA-5857 > URL: https://issues.apache.org/jira/browse/KAFKA-5857 > Project: Kafka > Issue Type: Bug > Components: controller >Affects Versions: 0.11.0.0 > Environment: CentOs 7, Java 1.8 >Reporter: Raoufeh Hashemian >Priority: Major > Labels: reliability > Attachments: CPU.png, disk_write_x.png, memory.png, > reassignment_plan.txt > > > I was trying to expand our kafka cluster of 6 broker nodes to 12 broker > nodes. > Before expansion, we had a single topic with 960 partitions and a replication > factor of 3. So each node had 480 partitions. The size of data in each node > was 3TB . > To do the expansion, I submitted a partition reassignment plan (see attached > file for the current/new assignments). The plan was optimized to minimize > data movement and be rack aware. > When I submitted the plan, it took approximately 3 hours for moving data from > old to new nodes to complete. After that, it started deleting source > partitions (I say this based on the number of file descriptors) and > rebalancing leaders which has not been successful. Meanwhile, the heap usage > in the controller node started to go up with a large slope (along with long > GC times) and it took 5 hours for the controller to go out of memory and > another controller started to have the same behaviour for another 4 hours. At > this time the zookeeper ran out of disk and the service stopped. > To recover from this condition: > 1) Removed zk logs to free up disk and restarted all 3 zk nodes > 2) Deleted /kafka/admin/reassign_partitions node from zk > 3) Had to do unclean restarts of kafka service on oom controller nodes which > took 3 hours to complete . After this stage there was still 676 under > replicated partitions. > 4) Do a clean restart on all 12 broker nodes. > After step 4 , number of under replicated nodes went to 0. > So I was wondering if this memory footprint from controller is expected for > 1k partitions ? Did we do sth wrong or it is a bug? > Attached are some resource usage graph during this 30 hours event and the > reassignment plan. I'll try to add log files as well -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (KAFKA-5403) Transactions system test should dedup consumed messages by offset
[ https://issues.apache.org/jira/browse/KAFKA-5403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dong Lin updated KAFKA-5403: Fix Version/s: (was: 2.1.0) > Transactions system test should dedup consumed messages by offset > - > > Key: KAFKA-5403 > URL: https://issues.apache.org/jira/browse/KAFKA-5403 > Project: Kafka > Issue Type: Bug >Affects Versions: 0.11.0.0 >Reporter: Apurva Mehta >Assignee: Apurva Mehta >Priority: Major > > In KAFKA-5396, we saw that the consumers which verify the data in multiple > topics could read the same offsets multiple times, for instance when a > rebalance happens. > This would detect spurious duplicates, causing the test to fail. We should > dedup the consumed messages by offset and only fail the test if we have > duplicate values for a unique set of offsets. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
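The dedup rule described above is simple enough to sketch. The system tests themselves are Python/ducktape, so the following Java helper is only an illustration of the idea (the class and method names are hypothetical): re-reading the same (topic, partition, offset), for example after a rebalance, is harmless, and only a different value at an already-seen offset should count as a real duplicate.

{code:java}
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical helper illustrating the dedup rule: ignore re-reads of the same offset
// and flag only conflicting values observed at a given (topic, partition, offset).
public final class OffsetDeduper {
    private final Map<String, String> seenValues = new HashMap<>();

    /** Returns true only when (topic, partition, offset) was seen before with a different value. */
    public boolean isConflictingDuplicate(String topic, int partition, long offset, String value) {
        String key = topic + "-" + partition + "@" + offset;
        String previous = seenValues.putIfAbsent(key, value);
        return previous != null && !Objects.equals(previous, value);
    }
}
{code}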
[jira] [Updated] (KAFKA-4690) IllegalStateException using DeleteTopicsRequest when delete.topic.enable=false
[ https://issues.apache.org/jira/browse/KAFKA-4690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dong Lin updated KAFKA-4690: Fix Version/s: (was: 2.1.0) > IllegalStateException using DeleteTopicsRequest when delete.topic.enable=false > -- > > Key: KAFKA-4690 > URL: https://issues.apache.org/jira/browse/KAFKA-4690 > Project: Kafka > Issue Type: Bug > Components: core >Affects Versions: 0.10.2.0 > Environment: OS X >Reporter: Jon Chiu >Assignee: Manikumar >Priority: Major > Attachments: delete-topics-request.java > > > There is no indication as to why the delete request fails. Perhaps an error > code? > This can be reproduced with the following steps: > 1. Start ZK and 1 broker (with default {{delete.topic.enable=false}}) > 2. Create a topic test > {noformat} > bin/kafka-topics.sh --zookeeper localhost:2181 \ > --create --topic test --partition 1 --replication-factor 1 > {noformat} > 3. Delete topic by sending a DeleteTopicsRequest > 4. An error is returned > {noformat} > org.apache.kafka.common.errors.DisconnectException > {noformat} > or > {noformat} > java.lang.IllegalStateException: Attempt to retrieve exception from future > which hasn't failed > at > org.apache.kafka.clients.consumer.internals.RequestFuture.exception(RequestFuture.java:99) > at > io.confluent.adminclient.KafkaAdminClient.send(KafkaAdminClient.java:195) > at > io.confluent.adminclient.KafkaAdminClient.deleteTopic(KafkaAdminClient.java:152) > {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
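For readers trying to reproduce the report above with current clients: the attachment uses an early Confluent admin-client prototype, but the same DeleteTopicsRequest can be issued with the Apache Kafka AdminClient introduced in 0.11. The sketch below is hedged (the broker address and topic name are placeholders); the point is only that the failure surfaces as the cause of an ExecutionException, whose type and message depend on the broker version and on the delete.topic.enable setting.

{code:java}
import java.util.Collections;
import java.util.Properties;
import java.util.concurrent.ExecutionException;

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;

public class DeleteTopicExample {
    public static void main(String[] args) throws InterruptedException {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        try (AdminClient admin = AdminClient.create(props)) {
            try {
                // Sends a DeleteTopicsRequest for topic "test" and waits for the result.
                admin.deleteTopics(Collections.singletonList("test")).all().get();
                System.out.println("Topic deleted");
            } catch (ExecutionException e) {
                // With delete.topic.enable=false the interesting part is the cause returned
                // by the broker (if any), rather than a client-side IllegalStateException.
                System.out.println("Delete failed: " + e.getCause());
            }
        }
    }
}
{code}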
[jira] [Updated] (KAFKA-7464) Fail to shutdown ReplicaManager during broker cleaned shutdown
[ https://issues.apache.org/jira/browse/KAFKA-7464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dong Lin updated KAFKA-7464: Fix Version/s: 2.0.1 > Fail to shutdown ReplicaManager during broker cleaned shutdown > -- > > Key: KAFKA-7464 > URL: https://issues.apache.org/jira/browse/KAFKA-7464 > Project: Kafka > Issue Type: Bug >Affects Versions: 2.0.0 >Reporter: Zhanxiang (Patrick) Huang >Assignee: Zhanxiang (Patrick) Huang >Priority: Critical > Fix For: 2.0.1, 2.1.0 > > > In 2.0 deployment, we saw the following log when shutting down the > ReplicaManager in broker cleaned shutdown: > {noformat} > 2018/09/27 08:22:18.699 WARN [CoreUtils$] [Thread-1] [kafka-server] [] null > java.lang.IllegalArgumentException: null > at java.nio.Buffer.position(Buffer.java:244) ~[?:1.8.0_121] > at sun.nio.ch.IOUtil.write(IOUtil.java:68) ~[?:1.8.0_121] > at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:471) > ~[?:1.8.0_121] > at > org.apache.kafka.common.network.SslTransportLayer.flush(SslTransportLayer.java:214) > ~[kafka-clients-2.0.0.22.jar:?] > at > org.apache.kafka.common.network.SslTransportLayer.close(SslTransportLayer.java:164) > ~[kafka-clients-2.0.0.22.jar:?] > at org.apache.kafka.common.utils.Utils.closeAll(Utils.java:806) > ~[kafka-clients-2.0.0.22.jar:?] > at > org.apache.kafka.common.network.KafkaChannel.close(KafkaChannel.java:107) > ~[kafka-clients-2.0.0.22.jar:?] > at > org.apache.kafka.common.network.Selector.doClose(Selector.java:751) > ~[kafka-clients-2.0.0.22.jar:?] > at org.apache.kafka.common.network.Selector.close(Selector.java:739) > ~[kafka-clients-2.0.0.22.jar:?] > at org.apache.kafka.common.network.Selector.close(Selector.java:701) > ~[kafka-clients-2.0.0.22.jar:?] > at org.apache.kafka.common.network.Selector.close(Selector.java:315) > ~[kafka-clients-2.0.0.22.jar:?] > at > org.apache.kafka.clients.NetworkClient.close(NetworkClient.java:595) > ~[kafka-clients-2.0.0.22.jar:?] > at > kafka.server.ReplicaFetcherBlockingSend.close(ReplicaFetcherBlockingSend.scala:107) > ~[kafka_2.11-2.0.0.22.jar:?] > at > kafka.server.ReplicaFetcherThread.initiateShutdown(ReplicaFetcherThread.scala:108) > ~[kafka_2.11-2.0.0.22.jar:?] > at > kafka.server.AbstractFetcherManager$$anonfun$closeAllFetchers$2.apply(AbstractFetcherManager.scala:183) > ~[kafka_2.11-2.0.0.22.jar:?] > at > kafka.server.AbstractFetcherManager$$anonfun$closeAllFetchers$2.apply(AbstractFetcherManager.scala:182) > ~[kafka_2.11-2.0.0.22.jar:?] > at > scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:733) > ~[scala-library-2.11.12.jar:?] > at > scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:130) > ~[scala-library-2.11.12.jar:?] > at > scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:130) > ~[scala-library-2.11.12.jar:?] > at > scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:236) > ~[scala-library-2.11.12.jar:?] > at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:40) > ~[scala-library-2.11.12.jar:?] > at scala.collection.mutable.HashMap.foreach(HashMap.scala:130) > ~[scala-library-2.11.12.jar:?] > at > scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:732) > ~[scala-library-2.11.12.jar:?] > at > kafka.server.AbstractFetcherManager.closeAllFetchers(AbstractFetcherManager.scala:182) > ~[kafka_2.11-2.0.0.22.jar:?] > at > kafka.server.ReplicaFetcherManager.shutdown(ReplicaFetcherManager.scala:37) > ~[kafka_2.11-2.0.0.22.jar:?] 
> at kafka.server.ReplicaManager.shutdown(ReplicaManager.scala:1471) > ~[kafka_2.11-2.0.0.22.jar:?] > at > kafka.server.KafkaServer$$anonfun$shutdown$12.apply$mcV$sp(KafkaServer.scala:616) > ~[kafka_2.11-2.0.0.22.jar:?] > at kafka.utils.CoreUtils$.swallow(CoreUtils.scala:86) > ~[kafka_2.11-2.0.0.22.jar:?] > at kafka.server.KafkaServer.shutdown(KafkaServer.scala:616) > ~[kafka_2.11-2.0.0.22.jar:?] > {noformat} > After that, we noticed that some of the replica fetcher thread fail to > shutdown: > {noformat} > 2018/09/27 08:22:46.176 ERROR [LogDirFailureChannel] > [ReplicaFetcherThread-26-13085] [kafka-server] [] Error while rolling log > segment for video-social-gestures-30 in dir /export/content/kafka/i001_caches > java.nio.channels.ClosedChannelException: null > at sun.nio.ch.FileChannelImpl.ensureOpen(FileChannelImpl.java:110) > ~[?:1.8.0_121] > at sun.nio.ch.FileChannelImpl.size(FileChannelImpl.java:300) > ~[?:1.8.0_121] > at > org.apache.kafka.c
[jira] [Commented] (KAFKA-7519) Transactional Ids Left in Pending State by TransactionStateManager During Transactional Id Expiration Are Unusable
[ https://issues.apache.org/jira/browse/KAFKA-7519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16657983#comment-16657983 ] Dong Lin commented on KAFKA-7519: - [~ijuma] Thanks for updating the state. I would like to help review it. But it seems more related to the stream processing and transaction semantics. So it may be safer if someone with more expertise in these two areas can take a look :) > Transactional Ids Left in Pending State by TransactionStateManager During > Transactional Id Expiration Are Unusable > -- > > Key: KAFKA-7519 > URL: https://issues.apache.org/jira/browse/KAFKA-7519 > Project: Kafka > Issue Type: Bug > Components: core, producer >Affects Versions: 2.0.0 >Reporter: Bridger Howell >Priority: Blocker > Fix For: 2.1.0 > > Attachments: KAFKA-7519.patch, image-2018-10-18-13-02-22-371.png > > > > After digging into a case where an exactly-once streams process was bizarrely > unable to process incoming data, we observed the following: > * StreamThreads stalling while creating a producer, eventually resulting in > no consumption by that streams process. Looking into those threads, we found > they were stuck in a loop, sending InitProducerIdRequests and always > receiving back the retriable error CONCURRENT_TRANSACTIONS and trying again. > These requests always had the same transactional id. > * After changing the streams process to not use exactly-once, it was able to > process messages with no problems. > * Alternatively, changing the applicationId for that streams process, it was > able to process with no problems. > * Every hour, every broker would fail the task `transactionalId-expiration` > with the following error: > ** > {code:java} > {"exception":{"stacktrace":"java.lang.IllegalStateException: Preparing > transaction state transition to Dead while it already a pending sta > te Dead > at > kafka.coordinator.transaction.TransactionMetadata.prepareTransitionTo(TransactionMetadata.scala:262) > at kafka.coordinator > .transaction.TransactionMetadata.prepareDead(TransactionMetadata.scala:237) > at kafka.coordinator.transaction.TransactionStateManager$$a > nonfun$enableTransactionalIdExpiration$1$$anonfun$apply$mcV$sp$1$$anonfun$2$$anonfun$apply$9$$anonfun$3.apply(TransactionStateManager.scal > a:151) > at > kafka.coordinator.transaction.TransactionStateManager$$anonfun$enableTransactionalIdExpiration$1$$anonfun$apply$mcV$sp$1$$ano > nfun$2$$anonfun$apply$9$$anonfun$3.apply(TransactionStateManager.scala:151) > at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:251) > at > > kafka.coordinator.transaction.TransactionMetadata.inLock(TransactionMetadata.scala:172) > at kafka.coordinator.transaction.TransactionSt > ateManager$$anonfun$enableTransactionalIdExpiration$1$$anonfun$apply$mcV$sp$1$$anonfun$2$$anonfun$apply$9.apply(TransactionStateManager.sc > ala:150) > at > kafka.coordinator.transaction.TransactionStateManager$$anonfun$enableTransactionalIdExpiration$1$$anonfun$apply$mcV$sp$1$$a > nonfun$2$$anonfun$apply$9.apply(TransactionStateManager.scala:149) > at scala.collection.TraversableLike$$anonfun$map$1.apply(Traversable > Like.scala:234) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) > at scala.collection.immutable.Li > st.foreach(List.scala:392) > at scala.collection.TraversableLike$class.map(TraversableLike.scala:234) > at scala.collection.immutable.Li > st.map(List.scala:296) > at > 
kafka.coordinator.transaction.TransactionStateManager$$anonfun$enableTransactionalIdExpiration$1$$anonfun$app > ly$mcV$sp$1$$anonfun$2.apply(TransactionStateManager.scala:149) > at kafka.coordinator.transaction.TransactionStateManager$$anonfun$enabl > eTransactionalIdExpiration$1$$anonfun$apply$mcV$sp$1$$anonfun$2.apply(TransactionStateManager.scala:142) > at scala.collection.Traversabl > eLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241) > at > scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike. > scala:241) > at > scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:130) > at scala.collection.mutable.HashMap$$anon > fun$foreach$1.apply(HashMap.scala:130) > at > scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:236) > at scala.collec > tion.mutable.HashMap.foreachEntry(HashMap.scala:40) > at scala.collection.mutable.HashMap.foreach(HashMap.scala:130) > at scala.collecti > on.TraversableLike$class.flatMap(TraversableLike.scala:241) > at scala.collection.AbstractTraversable.flatMap(Traversable.scala:104) > a > t > kafka.coordinator.transaction.TransactionStateManager
[jira] [Commented] (KAFKA-7481) Consider options for safer upgrade of offset commit value schema
[ https://issues.apache.org/jira/browse/KAFKA-7481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16657754#comment-16657754 ] Dong Lin commented on KAFKA-7481: - [~hachikuji] Thanks for the quick work on the KIP! The KIP looks good overall. I may have some questions about how to make it easier for users to understand the use of inter.broker.protocol.version, message.format.version and persistent.metadata.version, and how to document it in both the Kafka wiki and the KIP. This can be discussed in more detail in the KIP discussion thread. Overall I think this KIP is the right long-term solution for the problem. I am inclined not to block the 2.1 release on this KIP, given that it is an existing issue that has affected previous releases. But I am open to the other choice if other committers prefer that. > Consider options for safer upgrade of offset commit value schema > > > Key: KAFKA-7481 > URL: https://issues.apache.org/jira/browse/KAFKA-7481 > Project: Kafka > Issue Type: Bug >Reporter: Jason Gustafson >Priority: Blocker > Fix For: 2.1.0 > > > KIP-211 and KIP-320 add new versions of the offset commit value schema. The > use of the new schema version is controlled by the > `inter.broker.protocol.version` configuration. Once the new inter-broker > version is in use, it is not possible to downgrade since the older brokers > will not be able to parse the new schema. > The options at the moment are the following: > 1. Do nothing. Users can try the new version and keep > `inter.broker.protocol.version` locked to the old release. Downgrade will > still be possible, but users will not be able to test new capabilities which > depend on inter-broker protocol changes. > 2. Instead of using `inter.broker.protocol.version`, we could use > `message.format.version`. This would basically extend the use of this config > to apply to all persistent formats. The advantage is that it allows users to > upgrade the broker and begin using the new inter-broker protocol while still > allowing downgrade. But features which depend on the persistent format could > not be tested. > Any other options? -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (KAFKA-7481) Consider options for safer upgrade of offset commit value schema
[ https://issues.apache.org/jira/browse/KAFKA-7481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16657635#comment-16657635 ] Dong Lin commented on KAFKA-7481: - Previously I thought that a major version bump indicates 1) major feature additions for marketing purposes and 2) backward-incompatible changes. I was not aware that users in general also associate a major version bump with 1) irreversible disk changes and 2) the inability to downgrade. > Consider options for safer upgrade of offset commit value schema > > > Key: KAFKA-7481 > URL: https://issues.apache.org/jira/browse/KAFKA-7481 > Project: Kafka > Issue Type: Bug >Reporter: Jason Gustafson >Priority: Blocker > Fix For: 2.1.0 > > > KIP-211 and KIP-320 add new versions of the offset commit value schema. The > use of the new schema version is controlled by the > `inter.broker.protocol.version` configuration. Once the new inter-broker > version is in use, it is not possible to downgrade since the older brokers > will not be able to parse the new schema. > The options at the moment are the following: > 1. Do nothing. Users can try the new version and keep > `inter.broker.protocol.version` locked to the old release. Downgrade will > still be possible, but users will not be able to test new capabilities which > depend on inter-broker protocol changes. > 2. Instead of using `inter.broker.protocol.version`, we could use > `message.format.version`. This would basically extend the use of this config > to apply to all persistent formats. The advantage is that it allows users to > upgrade the broker and begin using the new inter-broker protocol while still > allowing downgrade. But features which depend on the persistent format could > not be tested. > Any other options? -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (KAFKA-7481) Consider options for safer upgrade of offset commit value schema
[ https://issues.apache.org/jira/browse/KAFKA-7481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16656026#comment-16656026 ] Dong Lin commented on KAFKA-7481: - Thanks for the detailed explanation [~hachikuji]. I agree with your short-term and long-term solutions. I assume that previously, when we upgraded the consumer offset topic schema version, we had the same issue of not being able to downgrade the Kafka broker version after the schema version had been upgraded. So this is the status quo. I took a look at upgrade.html. It seems that we currently don't have a downgrade note. Maybe we need to additionally note in upgrade.html that after users bump inter.broker.protocol.version to 2.1.0, they can no longer downgrade the server version to below 2.1.0. > Consider options for safer upgrade of offset commit value schema > > > Key: KAFKA-7481 > URL: https://issues.apache.org/jira/browse/KAFKA-7481 > Project: Kafka > Issue Type: Bug >Reporter: Jason Gustafson >Priority: Blocker > Fix For: 2.1.0 > > > KIP-211 and KIP-320 add new versions of the offset commit value schema. The > use of the new schema version is controlled by the > `inter.broker.protocol.version` configuration. Once the new inter-broker > version is in use, it is not possible to downgrade since the older brokers > will not be able to parse the new schema. > The options at the moment are the following: > 1. Do nothing. Users can try the new version and keep > `inter.broker.protocol.version` locked to the old release. Downgrade will > still be possible, but users will not be able to test new capabilities which > depend on inter-broker protocol changes. > 2. Instead of using `inter.broker.protocol.version`, we could use > `message.format.version`. This would basically extend the use of this config > to apply to all persistent formats. The advantage is that it allows users to > upgrade the broker and begin using the new inter-broker protocol while still > allowing downgrade. But features which depend on the persistent format could > not be tested. > Any other options? -- This message was sent by Atlassian JIRA (v7.6.3#76005)
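The downgrade caveat discussed above follows from the rolling-upgrade procedure in the existing upgrade notes: the broker binaries are upgraded first while the protocol version stays pinned, and only the final step of bumping the protocol version is one-way. A hedged illustration of the two phases (version numbers are examples only):

{noformat}
# Phase 1: upgrade broker binaries to 2.1.x but keep the previous protocol version pinned.
# Downgrading the binaries back to 2.0.x is still possible at this point.
inter.broker.protocol.version=2.0

# Phase 2: once the whole cluster runs 2.1.x, bump the protocol version in a second
# rolling restart. After this step, downgrading below 2.1.0 is no longer supported.
inter.broker.protocol.version=2.1
{noformat}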
[jira] [Updated] (KAFKA-7464) Fail to shutdown ReplicaManager during broker cleaned shutdown
[ https://issues.apache.org/jira/browse/KAFKA-7464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dong Lin updated KAFKA-7464: Fix Version/s: (was: 2.0.1) > Fail to shutdown ReplicaManager during broker cleaned shutdown > -- > > Key: KAFKA-7464 > URL: https://issues.apache.org/jira/browse/KAFKA-7464 > Project: Kafka > Issue Type: Bug >Affects Versions: 2.0.0 >Reporter: Zhanxiang (Patrick) Huang >Assignee: Zhanxiang (Patrick) Huang >Priority: Critical > Fix For: 2.1.0 > > > In 2.0 deployment, we saw the following log when shutting down the > ReplicaManager in broker cleaned shutdown: > {noformat} > 2018/09/27 08:22:18.699 WARN [CoreUtils$] [Thread-1] [kafka-server] [] null > java.lang.IllegalArgumentException: null > at java.nio.Buffer.position(Buffer.java:244) ~[?:1.8.0_121] > at sun.nio.ch.IOUtil.write(IOUtil.java:68) ~[?:1.8.0_121] > at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:471) > ~[?:1.8.0_121] > at > org.apache.kafka.common.network.SslTransportLayer.flush(SslTransportLayer.java:214) > ~[kafka-clients-2.0.0.22.jar:?] > at > org.apache.kafka.common.network.SslTransportLayer.close(SslTransportLayer.java:164) > ~[kafka-clients-2.0.0.22.jar:?] > at org.apache.kafka.common.utils.Utils.closeAll(Utils.java:806) > ~[kafka-clients-2.0.0.22.jar:?] > at > org.apache.kafka.common.network.KafkaChannel.close(KafkaChannel.java:107) > ~[kafka-clients-2.0.0.22.jar:?] > at > org.apache.kafka.common.network.Selector.doClose(Selector.java:751) > ~[kafka-clients-2.0.0.22.jar:?] > at org.apache.kafka.common.network.Selector.close(Selector.java:739) > ~[kafka-clients-2.0.0.22.jar:?] > at org.apache.kafka.common.network.Selector.close(Selector.java:701) > ~[kafka-clients-2.0.0.22.jar:?] > at org.apache.kafka.common.network.Selector.close(Selector.java:315) > ~[kafka-clients-2.0.0.22.jar:?] > at > org.apache.kafka.clients.NetworkClient.close(NetworkClient.java:595) > ~[kafka-clients-2.0.0.22.jar:?] > at > kafka.server.ReplicaFetcherBlockingSend.close(ReplicaFetcherBlockingSend.scala:107) > ~[kafka_2.11-2.0.0.22.jar:?] > at > kafka.server.ReplicaFetcherThread.initiateShutdown(ReplicaFetcherThread.scala:108) > ~[kafka_2.11-2.0.0.22.jar:?] > at > kafka.server.AbstractFetcherManager$$anonfun$closeAllFetchers$2.apply(AbstractFetcherManager.scala:183) > ~[kafka_2.11-2.0.0.22.jar:?] > at > kafka.server.AbstractFetcherManager$$anonfun$closeAllFetchers$2.apply(AbstractFetcherManager.scala:182) > ~[kafka_2.11-2.0.0.22.jar:?] > at > scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:733) > ~[scala-library-2.11.12.jar:?] > at > scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:130) > ~[scala-library-2.11.12.jar:?] > at > scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:130) > ~[scala-library-2.11.12.jar:?] > at > scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:236) > ~[scala-library-2.11.12.jar:?] > at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:40) > ~[scala-library-2.11.12.jar:?] > at scala.collection.mutable.HashMap.foreach(HashMap.scala:130) > ~[scala-library-2.11.12.jar:?] > at > scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:732) > ~[scala-library-2.11.12.jar:?] > at > kafka.server.AbstractFetcherManager.closeAllFetchers(AbstractFetcherManager.scala:182) > ~[kafka_2.11-2.0.0.22.jar:?] > at > kafka.server.ReplicaFetcherManager.shutdown(ReplicaFetcherManager.scala:37) > ~[kafka_2.11-2.0.0.22.jar:?] 
> at kafka.server.ReplicaManager.shutdown(ReplicaManager.scala:1471) > ~[kafka_2.11-2.0.0.22.jar:?] > at > kafka.server.KafkaServer$$anonfun$shutdown$12.apply$mcV$sp(KafkaServer.scala:616) > ~[kafka_2.11-2.0.0.22.jar:?] > at kafka.utils.CoreUtils$.swallow(CoreUtils.scala:86) > ~[kafka_2.11-2.0.0.22.jar:?] > at kafka.server.KafkaServer.shutdown(KafkaServer.scala:616) > ~[kafka_2.11-2.0.0.22.jar:?] > {noformat} > After that, we noticed that some of the replica fetcher thread fail to > shutdown: > {noformat} > 2018/09/27 08:22:46.176 ERROR [LogDirFailureChannel] > [ReplicaFetcherThread-26-13085] [kafka-server] [] Error while rolling log > segment for video-social-gestures-30 in dir /export/content/kafka/i001_caches > java.nio.channels.ClosedChannelException: null > at sun.nio.ch.FileChannelImpl.ensureOpen(FileChannelImpl.java:110) > ~[?:1.8.0_121] > at sun.nio.ch.FileChannelImpl.size(FileChannelImpl.java:300) > ~[?:1.8.0_121] > at > org.apache.kaf
[jira] [Resolved] (KAFKA-7464) Fail to shutdown ReplicaManager during broker cleaned shutdown
[ https://issues.apache.org/jira/browse/KAFKA-7464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dong Lin resolved KAFKA-7464. - Resolution: Fixed > Fail to shutdown ReplicaManager during broker cleaned shutdown > -- > > Key: KAFKA-7464 > URL: https://issues.apache.org/jira/browse/KAFKA-7464 > Project: Kafka > Issue Type: Bug >Affects Versions: 2.0.0 >Reporter: Zhanxiang (Patrick) Huang >Assignee: Zhanxiang (Patrick) Huang >Priority: Critical > Fix For: 2.1.0 > > > In 2.0 deployment, we saw the following log when shutting down the > ReplicaManager in broker cleaned shutdown: > {noformat} > 2018/09/27 08:22:18.699 WARN [CoreUtils$] [Thread-1] [kafka-server] [] null > java.lang.IllegalArgumentException: null > at java.nio.Buffer.position(Buffer.java:244) ~[?:1.8.0_121] > at sun.nio.ch.IOUtil.write(IOUtil.java:68) ~[?:1.8.0_121] > at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:471) > ~[?:1.8.0_121] > at > org.apache.kafka.common.network.SslTransportLayer.flush(SslTransportLayer.java:214) > ~[kafka-clients-2.0.0.22.jar:?] > at > org.apache.kafka.common.network.SslTransportLayer.close(SslTransportLayer.java:164) > ~[kafka-clients-2.0.0.22.jar:?] > at org.apache.kafka.common.utils.Utils.closeAll(Utils.java:806) > ~[kafka-clients-2.0.0.22.jar:?] > at > org.apache.kafka.common.network.KafkaChannel.close(KafkaChannel.java:107) > ~[kafka-clients-2.0.0.22.jar:?] > at > org.apache.kafka.common.network.Selector.doClose(Selector.java:751) > ~[kafka-clients-2.0.0.22.jar:?] > at org.apache.kafka.common.network.Selector.close(Selector.java:739) > ~[kafka-clients-2.0.0.22.jar:?] > at org.apache.kafka.common.network.Selector.close(Selector.java:701) > ~[kafka-clients-2.0.0.22.jar:?] > at org.apache.kafka.common.network.Selector.close(Selector.java:315) > ~[kafka-clients-2.0.0.22.jar:?] > at > org.apache.kafka.clients.NetworkClient.close(NetworkClient.java:595) > ~[kafka-clients-2.0.0.22.jar:?] > at > kafka.server.ReplicaFetcherBlockingSend.close(ReplicaFetcherBlockingSend.scala:107) > ~[kafka_2.11-2.0.0.22.jar:?] > at > kafka.server.ReplicaFetcherThread.initiateShutdown(ReplicaFetcherThread.scala:108) > ~[kafka_2.11-2.0.0.22.jar:?] > at > kafka.server.AbstractFetcherManager$$anonfun$closeAllFetchers$2.apply(AbstractFetcherManager.scala:183) > ~[kafka_2.11-2.0.0.22.jar:?] > at > kafka.server.AbstractFetcherManager$$anonfun$closeAllFetchers$2.apply(AbstractFetcherManager.scala:182) > ~[kafka_2.11-2.0.0.22.jar:?] > at > scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:733) > ~[scala-library-2.11.12.jar:?] > at > scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:130) > ~[scala-library-2.11.12.jar:?] > at > scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:130) > ~[scala-library-2.11.12.jar:?] > at > scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:236) > ~[scala-library-2.11.12.jar:?] > at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:40) > ~[scala-library-2.11.12.jar:?] > at scala.collection.mutable.HashMap.foreach(HashMap.scala:130) > ~[scala-library-2.11.12.jar:?] > at > scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:732) > ~[scala-library-2.11.12.jar:?] > at > kafka.server.AbstractFetcherManager.closeAllFetchers(AbstractFetcherManager.scala:182) > ~[kafka_2.11-2.0.0.22.jar:?] > at > kafka.server.ReplicaFetcherManager.shutdown(ReplicaFetcherManager.scala:37) > ~[kafka_2.11-2.0.0.22.jar:?] 
> at kafka.server.ReplicaManager.shutdown(ReplicaManager.scala:1471) > ~[kafka_2.11-2.0.0.22.jar:?] > at > kafka.server.KafkaServer$$anonfun$shutdown$12.apply$mcV$sp(KafkaServer.scala:616) > ~[kafka_2.11-2.0.0.22.jar:?] > at kafka.utils.CoreUtils$.swallow(CoreUtils.scala:86) > ~[kafka_2.11-2.0.0.22.jar:?] > at kafka.server.KafkaServer.shutdown(KafkaServer.scala:616) > ~[kafka_2.11-2.0.0.22.jar:?] > {noformat} > After that, we noticed that some of the replica fetcher thread fail to > shutdown: > {noformat} > 2018/09/27 08:22:46.176 ERROR [LogDirFailureChannel] > [ReplicaFetcherThread-26-13085] [kafka-server] [] Error while rolling log > segment for video-social-gestures-30 in dir /export/content/kafka/i001_caches > java.nio.channels.ClosedChannelException: null > at sun.nio.ch.FileChannelImpl.ensureOpen(FileChannelImpl.java:110) > ~[?:1.8.0_121] > at sun.nio.ch.FileChannelImpl.size(FileChannelImpl.java:300) > ~[?:1.8.0_121] > at > org.apache.kafka.common.re
[jira] [Commented] (KAFKA-7481) Consider options for safer upgrade of offset commit value schema
[ https://issues.apache.org/jira/browse/KAFKA-7481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16654759#comment-16654759 ] Dong Lin commented on KAFKA-7481: - BTW, this is currently the only blocking issue for the 2.1.0 release. It will be great to fix it and unblock the 2.1.0 release. > Consider options for safer upgrade of offset commit value schema > > > Key: KAFKA-7481 > URL: https://issues.apache.org/jira/browse/KAFKA-7481 > Project: Kafka > Issue Type: Bug >Reporter: Jason Gustafson >Priority: Blocker > Fix For: 2.1.0 > > > KIP-211 and KIP-320 add new versions of the offset commit value schema. The > use of the new schema version is controlled by the > `inter.broker.protocol.version` configuration. Once the new inter-broker > version is in use, it is not possible to downgrade since the older brokers > will not be able to parse the new schema. > The options at the moment are the following: > 1. Do nothing. Users can try the new version and keep > `inter.broker.protocol.version` locked to the old release. Downgrade will > still be possible, but users will not be able to test new capabilities which > depend on inter-broker protocol changes. > 2. Instead of using `inter.broker.protocol.version`, we could use > `message.format.version`. This would basically extend the use of this config > to apply to all persistent formats. The advantage is that it allows users to > upgrade the broker and begin using the new inter-broker protocol while still > allowing downgrade. But features which depend on the persistent format could > not be tested. > Any other options? -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (KAFKA-7481) Consider options for safer upgrade of offset commit value schema
[ https://issues.apache.org/jira/browse/KAFKA-7481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16654757#comment-16654757 ] Dong Lin commented on KAFKA-7481: - Option 2 sounds good to me. For option 2, it is true that "features which depend on the persistent format could not be tested". This is anyway the case for all other features (e.g. transaction semantics) that depend on the newer message format, right? So this is the existing state, and I probably would not call it a downside of option 2. > Consider options for safer upgrade of offset commit value schema > > > Key: KAFKA-7481 > URL: https://issues.apache.org/jira/browse/KAFKA-7481 > Project: Kafka > Issue Type: Bug >Reporter: Jason Gustafson >Priority: Blocker > Fix For: 2.1.0 > > > KIP-211 and KIP-320 add new versions of the offset commit value schema. The > use of the new schema version is controlled by the > `inter.broker.protocol.version` configuration. Once the new inter-broker > version is in use, it is not possible to downgrade since the older brokers > will not be able to parse the new schema. > The options at the moment are the following: > 1. Do nothing. Users can try the new version and keep > `inter.broker.protocol.version` locked to the old release. Downgrade will > still be possible, but users will not be able to test new capabilities which > depend on inter-broker protocol changes. > 2. Instead of using `inter.broker.protocol.version`, we could use > `message.format.version`. This would basically extend the use of this config > to apply to all persistent formats. The advantage is that it allows users to > upgrade the broker and begin using the new inter-broker protocol while still > allowing downgrade. But features which depend on the persistent format could > not be tested. > Any other options? -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (KAFKA-7464) Fail to shutdown ReplicaManager during broker cleaned shutdown
[ https://issues.apache.org/jira/browse/KAFKA-7464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16652812#comment-16652812 ] Dong Lin commented on KAFKA-7464: - [~ijuma] Previously I marked this issue as a blocker because the effect looked very concerning: the broker will fail to shut down cleanly. After taking a closer look at the issue, it seems that it happens very rarely and should not be a blocker. The bug has been present since Oct 2017, and the Kafka brokers at LinkedIn have been running well without a fix for this issue. I have updated the JIRA to mark it as critical instead of blocker. It is still good to fix this issue in Apache Kafka. > Fail to shutdown ReplicaManager during broker cleaned shutdown > -- > > Key: KAFKA-7464 > URL: https://issues.apache.org/jira/browse/KAFKA-7464 > Project: Kafka > Issue Type: Bug >Affects Versions: 2.0.0 >Reporter: Zhanxiang (Patrick) Huang >Assignee: Zhanxiang (Patrick) Huang >Priority: Critical > Fix For: 2.0.1, 2.1.0 > > > In 2.0 deployment, we saw the following log when shutting down the > ReplicaManager in broker cleaned shutdown: > {noformat} > 2018/09/27 08:22:18.699 WARN [CoreUtils$] [Thread-1] [kafka-server] [] null > java.lang.IllegalArgumentException: null > at java.nio.Buffer.position(Buffer.java:244) ~[?:1.8.0_121] > at sun.nio.ch.IOUtil.write(IOUtil.java:68) ~[?:1.8.0_121] > at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:471) > ~[?:1.8.0_121] > at > org.apache.kafka.common.network.SslTransportLayer.flush(SslTransportLayer.java:214) > ~[kafka-clients-2.0.0.22.jar:?] > at > org.apache.kafka.common.network.SslTransportLayer.close(SslTransportLayer.java:164) > ~[kafka-clients-2.0.0.22.jar:?] > at org.apache.kafka.common.utils.Utils.closeAll(Utils.java:806) > ~[kafka-clients-2.0.0.22.jar:?] > at > org.apache.kafka.common.network.KafkaChannel.close(KafkaChannel.java:107) > ~[kafka-clients-2.0.0.22.jar:?] > at > org.apache.kafka.common.network.Selector.doClose(Selector.java:751) > ~[kafka-clients-2.0.0.22.jar:?] > at org.apache.kafka.common.network.Selector.close(Selector.java:739) > ~[kafka-clients-2.0.0.22.jar:?] > at org.apache.kafka.common.network.Selector.close(Selector.java:701) > ~[kafka-clients-2.0.0.22.jar:?] > at org.apache.kafka.common.network.Selector.close(Selector.java:315) > ~[kafka-clients-2.0.0.22.jar:?] > at > org.apache.kafka.clients.NetworkClient.close(NetworkClient.java:595) > ~[kafka-clients-2.0.0.22.jar:?] > at > kafka.server.ReplicaFetcherBlockingSend.close(ReplicaFetcherBlockingSend.scala:107) > ~[kafka_2.11-2.0.0.22.jar:?] > at > kafka.server.ReplicaFetcherThread.initiateShutdown(ReplicaFetcherThread.scala:108) > ~[kafka_2.11-2.0.0.22.jar:?] > at > kafka.server.AbstractFetcherManager$$anonfun$closeAllFetchers$2.apply(AbstractFetcherManager.scala:183) > ~[kafka_2.11-2.0.0.22.jar:?] > at > kafka.server.AbstractFetcherManager$$anonfun$closeAllFetchers$2.apply(AbstractFetcherManager.scala:182) > ~[kafka_2.11-2.0.0.22.jar:?] > at > scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:733) > ~[scala-library-2.11.12.jar:?] > at > scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:130) > ~[scala-library-2.11.12.jar:?] > at > scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:130) > ~[scala-library-2.11.12.jar:?] > at > scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:236) > ~[scala-library-2.11.12.jar:?] 
> at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:40) > ~[scala-library-2.11.12.jar:?] > at scala.collection.mutable.HashMap.foreach(HashMap.scala:130) > ~[scala-library-2.11.12.jar:?] > at > scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:732) > ~[scala-library-2.11.12.jar:?] > at > kafka.server.AbstractFetcherManager.closeAllFetchers(AbstractFetcherManager.scala:182) > ~[kafka_2.11-2.0.0.22.jar:?] > at > kafka.server.ReplicaFetcherManager.shutdown(ReplicaFetcherManager.scala:37) > ~[kafka_2.11-2.0.0.22.jar:?] > at kafka.server.ReplicaManager.shutdown(ReplicaManager.scala:1471) > ~[kafka_2.11-2.0.0.22.jar:?] > at > kafka.server.KafkaServer$$anonfun$shutdown$12.apply$mcV$sp(KafkaServer.scala:616) > ~[kafka_2.11-2.0.0.22.jar:?] > at kafka.utils.CoreUtils$.swallow(CoreUtils.scala:86) > ~[kafka_2.11-2.0.0.22.jar:?] > at kafka.server.KafkaServer.shutdown(KafkaServer.scala:616) > ~[kafka_2.11-2.0.0.22.jar:?] > {noformat} > After that, we noticed th
[jira] [Updated] (KAFKA-7464) Fail to shutdown ReplicaManager during broker cleaned shutdown
[ https://issues.apache.org/jira/browse/KAFKA-7464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dong Lin updated KAFKA-7464: Priority: Critical (was: Blocker) > Fail to shutdown ReplicaManager during broker cleaned shutdown > -- > > Key: KAFKA-7464 > URL: https://issues.apache.org/jira/browse/KAFKA-7464 > Project: Kafka > Issue Type: Bug >Affects Versions: 2.0.0 >Reporter: Zhanxiang (Patrick) Huang >Assignee: Zhanxiang (Patrick) Huang >Priority: Critical > Fix For: 2.0.1, 2.1.0 > > > In 2.0 deployment, we saw the following log when shutting down the > ReplicaManager in broker cleaned shutdown: > {noformat} > 2018/09/27 08:22:18.699 WARN [CoreUtils$] [Thread-1] [kafka-server] [] null > java.lang.IllegalArgumentException: null > at java.nio.Buffer.position(Buffer.java:244) ~[?:1.8.0_121] > at sun.nio.ch.IOUtil.write(IOUtil.java:68) ~[?:1.8.0_121] > at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:471) > ~[?:1.8.0_121] > at > org.apache.kafka.common.network.SslTransportLayer.flush(SslTransportLayer.java:214) > ~[kafka-clients-2.0.0.22.jar:?] > at > org.apache.kafka.common.network.SslTransportLayer.close(SslTransportLayer.java:164) > ~[kafka-clients-2.0.0.22.jar:?] > at org.apache.kafka.common.utils.Utils.closeAll(Utils.java:806) > ~[kafka-clients-2.0.0.22.jar:?] > at > org.apache.kafka.common.network.KafkaChannel.close(KafkaChannel.java:107) > ~[kafka-clients-2.0.0.22.jar:?] > at > org.apache.kafka.common.network.Selector.doClose(Selector.java:751) > ~[kafka-clients-2.0.0.22.jar:?] > at org.apache.kafka.common.network.Selector.close(Selector.java:739) > ~[kafka-clients-2.0.0.22.jar:?] > at org.apache.kafka.common.network.Selector.close(Selector.java:701) > ~[kafka-clients-2.0.0.22.jar:?] > at org.apache.kafka.common.network.Selector.close(Selector.java:315) > ~[kafka-clients-2.0.0.22.jar:?] > at > org.apache.kafka.clients.NetworkClient.close(NetworkClient.java:595) > ~[kafka-clients-2.0.0.22.jar:?] > at > kafka.server.ReplicaFetcherBlockingSend.close(ReplicaFetcherBlockingSend.scala:107) > ~[kafka_2.11-2.0.0.22.jar:?] > at > kafka.server.ReplicaFetcherThread.initiateShutdown(ReplicaFetcherThread.scala:108) > ~[kafka_2.11-2.0.0.22.jar:?] > at > kafka.server.AbstractFetcherManager$$anonfun$closeAllFetchers$2.apply(AbstractFetcherManager.scala:183) > ~[kafka_2.11-2.0.0.22.jar:?] > at > kafka.server.AbstractFetcherManager$$anonfun$closeAllFetchers$2.apply(AbstractFetcherManager.scala:182) > ~[kafka_2.11-2.0.0.22.jar:?] > at > scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:733) > ~[scala-library-2.11.12.jar:?] > at > scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:130) > ~[scala-library-2.11.12.jar:?] > at > scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:130) > ~[scala-library-2.11.12.jar:?] > at > scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:236) > ~[scala-library-2.11.12.jar:?] > at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:40) > ~[scala-library-2.11.12.jar:?] > at scala.collection.mutable.HashMap.foreach(HashMap.scala:130) > ~[scala-library-2.11.12.jar:?] > at > scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:732) > ~[scala-library-2.11.12.jar:?] > at > kafka.server.AbstractFetcherManager.closeAllFetchers(AbstractFetcherManager.scala:182) > ~[kafka_2.11-2.0.0.22.jar:?] > at > kafka.server.ReplicaFetcherManager.shutdown(ReplicaFetcherManager.scala:37) > ~[kafka_2.11-2.0.0.22.jar:?] 
> at kafka.server.ReplicaManager.shutdown(ReplicaManager.scala:1471) > ~[kafka_2.11-2.0.0.22.jar:?] > at > kafka.server.KafkaServer$$anonfun$shutdown$12.apply$mcV$sp(KafkaServer.scala:616) > ~[kafka_2.11-2.0.0.22.jar:?] > at kafka.utils.CoreUtils$.swallow(CoreUtils.scala:86) > ~[kafka_2.11-2.0.0.22.jar:?] > at kafka.server.KafkaServer.shutdown(KafkaServer.scala:616) > ~[kafka_2.11-2.0.0.22.jar:?] > {noformat} > After that, we noticed that some of the replica fetcher thread fail to > shutdown: > {noformat} > 2018/09/27 08:22:46.176 ERROR [LogDirFailureChannel] > [ReplicaFetcherThread-26-13085] [kafka-server] [] Error while rolling log > segment for video-social-gestures-30 in dir /export/content/kafka/i001_caches > java.nio.channels.ClosedChannelException: null > at sun.nio.ch.FileChannelImpl.ensureOpen(FileChannelImpl.java:110) > ~[?:1.8.0_121] > at sun.nio.ch.FileChannelImpl.size(FileChannelImpl.java:300) > ~[?:1.8.0_121] > at > org.