[jira] [Created] (KAFKA-7575) 'Error while writing to checkpoint file' Issue

2018-10-30 Thread Dasun Nirmitha (JIRA)
Dasun Nirmitha created KAFKA-7575:
-

 Summary: 'Error while writing to checkpoint file' Issue
 Key: KAFKA-7575
 URL: https://issues.apache.org/jira/browse/KAFKA-7575
 Project: Kafka
  Issue Type: Bug
  Components: producer 
Affects Versions: 1.1.1
 Environment: Windows 10, Kafka 1.1.1
Reporter: Dasun Nirmitha
 Attachments: Dry run error.rar

I'm currently testing a Java Kafka producer application that retrieves a value 
from a local MySQL database and produces it to a single topic. Locally I've got 
a Zookeeper server and a single Kafka broker running.
My issue is that I need to produce this value each second, and that works for 
around 2 hours until the broker throws an 'Error while writing to checkpoint 
file' and shuts down. Producing at a 1-minute interval works with no issues, 
but unfortunately I need the produce interval to be 1 second.
I have attached a rar containing screenshots of the errors thrown from the 
broker and my application.
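
For reference, here is a minimal sketch of the kind of one-second produce loop 
described above. The broker address, topic name, serializers, and the 
fetchValueFromDb() helper are placeholders for this sketch, not taken from the 
attached application:

{code}
import java.util.Properties;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class OneSecondProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder broker address
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        KafkaProducer<String, String> producer = new KafkaProducer<>(props);
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();

        // Send one record per second, i.e. the interval that triggers the failure above.
        scheduler.scheduleAtFixedRate(() -> {
            String value = fetchValueFromDb(); // placeholder for the real MySQL lookup
            producer.send(new ProducerRecord<>("db-values", value));
        }, 0, 1, TimeUnit.SECONDS);
    }

    private static String fetchValueFromDb() {
        return "42"; // stand-in for the actual JDBC query
    }
}
{code}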



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (KAFKA-7574) Kafka unable to delete segment file in Linux

2018-10-30 Thread Ben Maas (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-7574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ben Maas updated KAFKA-7574:

Description: 
The initial error is that Kafka believes it is unable to delete a log segment 
file. This causes Kafka to mark the log directory as unavailable and eventually 
shut down without flushing data to disk (which is probably the right thing to 
do). There are no indications in the OS logs of a failed filesystem or any 
other OS-level issue. We have verified that the filesystem is consistent.

Is there an I/O timeout of some kind that can be adjusted, or is something else 
happening? Potential duplicate of the race condition seen in KAFKA-6194.

See attached files for the config and an example log pattern.

  was:
The initial error is that Kafka believes it is unable to delete a log segment 
file. This causes Kafka to mark the log directory as unavailable and eventually 
shut down without flushing data to disk. There are no indications in the OS 
logs of a failed filesystem or any other OS-level issue. We have verified that 
the filesystem is consistent.

Is there an I/O timeout of some kind that can be adjusted, or is something else 
happening? Potential duplicate of the race condition seen in KAFKA-6194.

See attached files for the config and an example log pattern.


> Kafka unable to delete segment file in Linux
> 
>
> Key: KAFKA-7574
> URL: https://issues.apache.org/jira/browse/KAFKA-7574
> Project: Kafka
>  Issue Type: Bug
>  Components: core, log
>Affects Versions: 1.0.0
> Environment: Debian Stretch
>Reporter: Ben Maas
>Priority: Major
> Attachments: kafka-config.log, kafka-delete_error_pattern.log
>
>
> The initial error is that Kafka believes it is unable to delete a log segment 
> file. This causes Kafka to mark the log directory as unavailable and 
> eventually shut down without flushing data to disk (which is probably the 
> right thing to do). There are no indications in the OS logs of a failed 
> filesystem or any other OS-level issue. We have verified that the filesystem 
> is consistent.
> Is there an I/O timeout of some kind that can be adjusted, or is something else 
> happening? Potential duplicate of the race condition seen in KAFKA-6194.
> See attached files for the config and an example log pattern.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (KAFKA-7574) Kafka unable to delete segment file in Linux

2018-10-30 Thread Ben Maas (JIRA)
Ben Maas created KAFKA-7574:
---

 Summary: Kafka unable to delete segment file in Linux
 Key: KAFKA-7574
 URL: https://issues.apache.org/jira/browse/KAFKA-7574
 Project: Kafka
  Issue Type: Bug
  Components: core, log
Affects Versions: 1.0.0
 Environment: Debian Stretch
Reporter: Ben Maas
 Attachments: kafka-config.log, kafka-delete_error_pattern.log

The initial error is that Kafka believes it is unable to delete a log segment 
file. This causes Kafka to mark the log directory as unavailable and eventually 
shut down without flushing data to disk. There are no indications in the OS 
logs of a failed filesystem or any other OS-level issue. We have verified that 
the filesystem is consistent.

Is there an I/O timeout of some kind that can be adjusted, or is something else 
happening? Potential duplicate of the race condition seen in KAFKA-6194.

See attached files for the config and an example log pattern.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (KAFKA-7481) Consider options for safer upgrade of offset commit value schema

2018-10-30 Thread Dong Lin (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-7481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16669391#comment-16669391
 ] 

Dong Lin commented on KAFKA-7481:
-

[~ijuma] Yeah, this works, and this is why I think we need two separate upgrade 
states. Currently there are three possible upgrade states: the binary version 
is upgraded; the binary version + inter.broker.protocol.version are upgraded; 
and the binary version + inter.broker.protocol.version + message.format.version 
are upgraded. I guess my point is that it is reasonable to keep these three 
states in the long term.

> Consider options for safer upgrade of offset commit value schema
> 
>
> Key: KAFKA-7481
> URL: https://issues.apache.org/jira/browse/KAFKA-7481
> Project: Kafka
>  Issue Type: Bug
>Reporter: Jason Gustafson
>Priority: Blocker
> Fix For: 2.1.0
>
>
> KIP-211 and KIP-320 add new versions of the offset commit value schema. The 
> use of the new schema version is controlled by the 
> `inter.broker.protocol.version` configuration.  Once the new inter-broker 
> version is in use, it is not possible to downgrade since the older brokers 
> will not be able to parse the new schema. 
> The options at the moment are the following:
> 1. Do nothing. Users can try the new version and keep 
> `inter.broker.protocol.version` locked to the old release. Downgrade will 
> still be possible, but users will not be able to test new capabilities which 
> depend on inter-broker protocol changes.
> 2. Instead of using `inter.broker.protocol.version`, we could use 
> `message.format.version`. This would basically extend the use of this config 
> to apply to all persistent formats. The advantage is that it allows users to 
> upgrade the broker and begin using the new inter-broker protocol while still 
> allowing downgrade. But features which depend on the persistent format could 
> not be tested.
> Any other options?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (KAFKA-7481) Consider options for safer upgrade of offset commit value schema

2018-10-30 Thread Ismael Juma (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-7481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16669360#comment-16669360
 ] 

Ismael Juma commented on KAFKA-7481:


The state where you can roll back is "upgrade binaries" without bumping either 
of the configs. :)

> Consider options for safer upgrade of offset commit value schema
> 
>
> Key: KAFKA-7481
> URL: https://issues.apache.org/jira/browse/KAFKA-7481
> Project: Kafka
>  Issue Type: Bug
>Reporter: Jason Gustafson
>Priority: Blocker
> Fix For: 2.1.0
>
>
> KIP-211 and KIP-320 add new versions of the offset commit value schema. The 
> use of the new schema version is controlled by the 
> `inter.broker.protocol.version` configuration.  Once the new inter-broker 
> version is in use, it is not possible to downgrade since the older brokers 
> will not be able to parse the new schema. 
> The options at the moment are the following:
> 1. Do nothing. Users can try the new version and keep 
> `inter.broker.protocol.version` locked to the old release. Downgrade will 
> still be possible, but users will not be able to test new capabilities which 
> depend on inter-broker protocol changes.
> 2. Instead of using `inter.broker.protocol.version`, we could use 
> `message.format.version`. This would basically extend the use of this config 
> to apply to all persistent formats. The advantage is that it allows users to 
> upgrade the broker and begin using the new inter-broker protocol while still 
> allowing downgrade. But features which depend on the persistent format could 
> not be tested.
> Any other options?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (KAFKA-7481) Consider options for safer upgrade of offset commit value schema

2018-10-30 Thread Dong Lin (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-7481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16669323#comment-16669323
 ] 

Dong Lin commented on KAFKA-7481:
-

[~ijuma] Regarding `I am still not sure if there's a lot of value in having two 
separate upgrade states`: I think we need at least one separate upgrade state 
for changes that cannot be rolled back, since it seems weird not to be able to 
downgrade if there is only a minor version change in Kafka. The rationale for 
the second separate upgrade state is that there are two categories of changes 
that prevent downgrade, e.g. those that change the topic schema and those that 
change the message format. It is common for users to be willing to pick up the 
first category of change very soon, and to pick up the second category of 
change much later, after the client libraries have been upgraded, to reduce the 
performance cost in the broker.

> Consider options for safer upgrade of offset commit value schema
> 
>
> Key: KAFKA-7481
> URL: https://issues.apache.org/jira/browse/KAFKA-7481
> Project: Kafka
>  Issue Type: Bug
>Reporter: Jason Gustafson
>Priority: Blocker
> Fix For: 2.1.0
>
>
> KIP-211 and KIP-320 add new versions of the offset commit value schema. The 
> use of the new schema version is controlled by the 
> `inter.broker.protocol.version` configuration.  Once the new inter-broker 
> version is in use, it is not possible to downgrade since the older brokers 
> will not be able to parse the new schema. 
> The options at the moment are the following:
> 1. Do nothing. Users can try the new version and keep 
> `inter.broker.protocol.version` locked to the old release. Downgrade will 
> still be possible, but users will not be able to test new capabilities which 
> depend on inter-broker protocol changes.
> 2. Instead of using `inter.broker.protocol.version`, we could use 
> `message.format.version`. This would basically extend the use of this config 
> to apply to all persistent formats. The advantage is that it allows users to 
> upgrade the broker and begin using the new inter-broker protocol while still 
> allowing downgrade. But features which depend on the persistent format could 
> not be tested.
> Any other options?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (KAFKA-7553) Jenkins PR tests hung

2018-10-30 Thread Jason Gustafson (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-7553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16669285#comment-16669285
 ] 

Jason Gustafson commented on KAFKA-7553:


I investigated a few additional hanging builds and also observed 
`testLowMaxFetchSizeForRequestAndPartition` hanging. Curiously it only seems to 
happen when the PR builder runs (AFAICT).

> Jenkins PR tests hung
> -
>
> Key: KAFKA-7553
> URL: https://issues.apache.org/jira/browse/KAFKA-7553
> Project: Kafka
>  Issue Type: Bug
>Reporter: John Roesler
>Priority: Minor
> Attachments: consoleText.txt
>
>
> I wouldn't worry about this unless it continues to happen, but I wanted to 
> document it.
> This was a Java 11 build: 
> [https://builds.apache.org/job/kafka-pr-jdk11-scala2.12/266/]
> It was for this PR: [https://github.com/apache/kafka/pull/5795]
> And this commit: 
> [https://github.com/apache/kafka/pull/5795/commits/5bdcd0e023c6f406d585155399f6541bb6a9f9c2]
>  
> It looks like the tests just hung after 46 minutes, until the build timed out 
> at 180 minutes.
> End of the output:
> {noformat}
> ...
> 00:46:27.275 kafka.server.ServerGenerateBrokerIdTest > 
> testConsistentBrokerIdFromUserConfigAndMetaProps STARTED
> 00:46:29.775 
> 00:46:29.775 kafka.server.ServerGenerateBrokerIdTest > 
> testConsistentBrokerIdFromUserConfigAndMetaProps PASSED
> 03:00:51.124 Build timed out (after 180 minutes). Marking the build as 
> aborted.
> 03:00:51.440 Build was aborted
> 03:00:51.492 [FINDBUGS] Skipping publisher since build result is ABORTED
> 03:00:51.492 Recording test results
> 03:00:51.495 Setting GRADLE_4_8_1_HOME=/home/jenkins/tools/gradle/4.8.1
> 03:00:58.017 Setting GRADLE_4_8_1_HOME=/home/jenkins/tools/gradle/4.8.1
> 03:00:59.330 Setting GRADLE_4_8_1_HOME=/home/jenkins/tools/gradle/4.8.1
> 03:00:59.331 Adding one-line test results to commit status...
> 03:00:59.332 Setting GRADLE_4_8_1_HOME=/home/jenkins/tools/gradle/4.8.1
> 03:00:59.334 Setting GRADLE_4_8_1_HOME=/home/jenkins/tools/gradle/4.8.1
> 03:00:59.335 Setting status of 5bdcd0e023c6f406d585155399f6541bb6a9f9c2 to 
> FAILURE with url https://builds.apache.org/job/kafka-pr-jdk11-scala2.12/266/ 
> and message: 'FAILURE
> 03:00:59.335  9053 tests run, 1 skipped, 0 failed.'
> 03:00:59.335 Using context: JDK 11 and Scala 2.12
> 03:00:59.541 Setting GRADLE_4_8_1_HOME=/home/jenkins/tools/gradle/4.8.1
> 03:00:59.542 Finished: ABORTED{noformat}
>  
> I did find one test that started but did not finish:
> {noformat}
> 00:23:29.576 kafka.api.PlaintextConsumerTest > 
> testLowMaxFetchSizeForRequestAndPartition STARTED
> {noformat}
> But note that the tests continued to run for another 23 minutes after this 
> one started.
>  
> Just for completeness, there were 4 failures:
> {noformat}
> 00:22:06.875 kafka.admin.ResetConsumerGroupOffsetTest > 
> testResetOffsetsNotExistingGroup FAILED
> 00:22:06.875 java.util.concurrent.ExecutionException: 
> org.apache.kafka.common.errors.CoordinatorNotAvailableException: The 
> coordinator is not available.
> 00:22:06.875 at 
> org.apache.kafka.common.internals.KafkaFutureImpl.wrapAndThrow(KafkaFutureImpl.java:45)
> 00:22:06.875 at 
> org.apache.kafka.common.internals.KafkaFutureImpl.access$000(KafkaFutureImpl.java:32)
> 00:22:06.875 at 
> org.apache.kafka.common.internals.KafkaFutureImpl$SingleWaiter.await(KafkaFutureImpl.java:89)
> 00:22:06.876 at 
> org.apache.kafka.common.internals.KafkaFutureImpl.get(KafkaFutureImpl.java:262)
> 00:22:06.876 at 
> kafka.admin.ConsumerGroupCommand$ConsumerGroupService.resetOffsets(ConsumerGroupCommand.scala:307)
> 00:22:06.876 at 
> kafka.admin.ResetConsumerGroupOffsetTest.testResetOffsetsNotExistingGroup(ResetConsumerGroupOffsetTest.scala:89)
> 00:22:06.876 
> 00:22:06.876 Caused by:
> 00:22:06.876 
> org.apache.kafka.common.errors.CoordinatorNotAvailableException: The 
> coordinator is not available.{noformat}
>  
> {noformat}
> 00:25:22.175 kafka.api.CustomQuotaCallbackTest > testCustomQuotaCallback 
> FAILED
> 00:25:22.175 java.lang.AssertionError: Partition [group1_largeTopic,69] 
> metadata not propagated after 15000 ms
> 00:25:22.176 at kafka.utils.TestUtils$.fail(TestUtils.scala:351)
> 00:25:22.176 at 
> kafka.utils.TestUtils$.waitUntilTrue(TestUtils.scala:741)
> 00:25:22.176 at 
> kafka.utils.TestUtils$.waitUntilMetadataIsPropagated(TestUtils.scala:831)
> 00:25:22.176 at 
> kafka.utils.TestUtils$$anonfun$createTopic$2.apply(TestUtils.scala:330)
> 00:25:22.176 at 
> kafka.utils.TestUtils$$anonfun$createTopic$2.apply(TestUtils.scala:329)
> 00:25:22.176 at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
> 00:25:22.176 at 
> 

[jira] [Resolved] (KAFKA-7567) Clean up internal metadata usage for consistency and extensibility

2018-10-30 Thread Jason Gustafson (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-7567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gustafson resolved KAFKA-7567.

   Resolution: Fixed
Fix Version/s: 2.2.0

> Clean up internal metadata usage for consistency and extensibility
> --
>
> Key: KAFKA-7567
> URL: https://issues.apache.org/jira/browse/KAFKA-7567
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Jason Gustafson
>Assignee: Jason Gustafson
>Priority: Major
> Fix For: 2.2.0
>
>
> This refactor has two objectives to improve metadata handling logic and 
> testing:
> 1. We want to reduce dependence on the public object `Cluster` for internal 
> metadata propagation since it is not easy to evolve. As an example, we need 
> to propagate leader epochs from the metadata response to `Metadata`, but it 
> is not straightforward to do this without exposing it in `PartitionInfo` 
> since that is what `Cluster` uses internally. By making this change, we are 
> able to remove some redundant `Cluster` building logic.
> 2. We want to make the metadata handling in `MockClient` simpler and more 
> consistent. Currently we have a mix of metadata update mechanisms which are 
> internally inconsistent with each other and also do not match the 
> implementation in `NetworkClient`.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (KAFKA-7573) Add an interface that allows broker to intercept every request/response pair

2018-10-30 Thread Lincong Li (JIRA)
Lincong Li created KAFKA-7573:
-

 Summary: Add an interface that allows broker to intercept every 
request/response pair
 Key: KAFKA-7573
 URL: https://issues.apache.org/jira/browse/KAFKA-7573
 Project: Kafka
  Issue Type: Improvement
  Components: core
Affects Versions: 2.0.0
Reporter: Lincong Li
Assignee: Lincong Li


This interface is called "observer" and it opens up several opportunities. One 
major opportunity is that it enables an auditing system to be built for a Kafka 
deployment. Details are discussed in a KIP.
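
For illustration only, here is a hypothetical sketch of what such a broker-side 
hook could look like. The name, method signatures, and parameter types below are 
assumptions made for this sketch, not the interface defined in the KIP:

{code}
import java.util.Map;

// Hypothetical sketch only -- not the interface proposed in the KIP.
public interface Observer {

    // Called once with the broker's observer-related configuration.
    void configure(Map<String, ?> configs);

    // Called for every completed request/response pair, e.g. to feed an audit
    // log. Implementations should be fast and non-blocking since this sits on
    // the request-handling path.
    void observe(Object request, Object response);

    // Called when the broker shuts down.
    void close();
}
{code}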



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (KAFKA-7572) Producer should not send requests with negative partition id

2018-10-30 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-7572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16669268#comment-16669268
 ] 

ASF GitHub Bot commented on KAFKA-7572:
---

yaodong66 opened a new pull request #5858: KAFKA-7572: Producer should not send 
requests with negative partition id
URL: https://github.com/apache/kafka/pull/5858
 
 
   Partition id should never be a negative value.
   This commit will make debugging easier when a custom Partitioner generates 
an invalid negative partition id.
   
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Producer should not send requests with negative partition id
> 
>
> Key: KAFKA-7572
> URL: https://issues.apache.org/jira/browse/KAFKA-7572
> Project: Kafka
>  Issue Type: Bug
>  Components: clients
>Affects Versions: 1.0.1
>Reporter: Yaodong Yang
>Priority: Major
>
> h3. Issue:
> In one Kafka producer log from our users, we found the following weird entry:
> timestamp="2018-10-09T17:37:41,237-0700",level="ERROR", Message="Write to 
> Kafka failed with: ",exception="java.util.concurrent.ExecutionException: 
> org.apache.kafka.common.errors.TimeoutException: Expiring 1 record(s) for 
> topicName--2: 30042 ms has passed since batch creation plus linger time
>  at 
> org.apache.kafka.clients.producer.internals.FutureRecordMetadata.valueOrError(FutureRecordMetadata.java:94)
>  at 
> org.apache.kafka.clients.producer.internals.FutureRecordMetadata.get(FutureRecordMetadata.java:64)
>  at 
> org.apache.kafka.clients.producer.internals.FutureRecordMetadata.get(FutureRecordMetadata.java:29)
>  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>  at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
>  at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.kafka.common.errors.TimeoutException: Expiring 1 
> record(s) for topicName--2: 30042 ms has passed since batch creation plus 
> linger time"
> After a few hours of debugging, we finally understood the root cause of this 
> issue:
>  # The producer used a buggy custom Partitioner, which sometimes generates 
> negative partition ids for new records.
>  # The corresponding produce requests were rejected by brokers, because it's 
> illegal to have a partition with a negative id.
>  # The client kept refreshing its local cluster metadata, but could not send 
> produce requests successfully.
>  # From the above log, we found a suspicious string "topicName--2":
>  # According to the source code, the format of this string in the log is 
> TopicName+"-"+PartitionId.
>  # It's not easy to notice that there were two consecutive dashes in the above 
> log.
>  # Eventually, we found that the second dash was a negative sign. Therefore, 
> the partition id is -2, rather than 2.
>  # The bug is in the custom Partitioner.
> h3. Proposal:
>  # Producer code should check the partitionId before sending requests to 
> brokers.
>  # If there is a negative partition id, just throw an IllegalStateException.
>  # Such a quick check can save lots of time for people debugging their 
> producer code.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (KAFKA-7572) Producer should not send requests with negative partition id

2018-10-30 Thread Yaodong Yang (JIRA)
Yaodong Yang created KAFKA-7572:
---

 Summary: Producer should not send requests with negative partition 
id
 Key: KAFKA-7572
 URL: https://issues.apache.org/jira/browse/KAFKA-7572
 Project: Kafka
  Issue Type: Bug
  Components: clients
Affects Versions: 1.0.1
Reporter: Yaodong Yang


h3. Issue:

In one Kafka producer log from our users, we found the following weird entry:

timestamp="2018-10-09T17:37:41,237-0700",level="ERROR", Message="Write to Kafka 
failed with: ",exception="java.util.concurrent.ExecutionException: 
org.apache.kafka.common.errors.TimeoutException: Expiring 1 record(s) for 
topicName--2: 30042 ms has passed since batch creation plus linger time
 at 
org.apache.kafka.clients.producer.internals.FutureRecordMetadata.valueOrError(FutureRecordMetadata.java:94)
 at 
org.apache.kafka.clients.producer.internals.FutureRecordMetadata.get(FutureRecordMetadata.java:64)
 at 
org.apache.kafka.clients.producer.internals.FutureRecordMetadata.get(FutureRecordMetadata.java:29)
 at java.util.concurrent.FutureTask.run(FutureTask.java:266)
 at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
 at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
 at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
 at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
 at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.kafka.common.errors.TimeoutException: Expiring 1 
record(s) for topicName--2: 30042 ms has passed since batch creation plus 
linger time"

After a few hours of debugging, we finally understood the root cause of this issue:
 # The producer used a buggy custom Partitioner, which sometimes generates 
negative partition ids for new records.
 # The corresponding produce requests were rejected by brokers, because it's 
illegal to have a partition with a negative id.
 # The client kept refreshing its local cluster metadata, but could not send 
produce requests successfully.
 # From the above log, we found a suspicious string "topicName--2":
 # According to the source code, the format of this string in the log is 
TopicName+"-"+PartitionId.
 # It's not easy to notice that there were two consecutive dashes in the above log.
 # Eventually, we found that the second dash was a negative sign. Therefore, 
the partition id is -2, rather than 2.
 # The bug is in the custom Partitioner.

h3. Proposal:
 # Producer code should check the partitionId before sending requests to 
brokers.
 # If there is a negative partition id, just throw an IllegalStateException.
 # Such a quick check can save lots of time for people debugging their producer 
code.
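
A minimal sketch of the proposed guard follows. The class and method names are 
illustrative only; this is not the actual KafkaProducer code, it only shows 
where such a check could fail fast:

{code}
// Illustrative sketch of the proposed client-side check (names are hypothetical,
// not taken from the Kafka code base).
final class PartitionIdValidator {

    private PartitionIdValidator() {
    }

    static int requireValidPartition(String topic, int partition) {
        if (partition < 0) {
            // Surfacing the bad value here makes a buggy custom Partitioner obvious,
            // instead of the opaque "topicName--2" timeout seen in the log above.
            throw new IllegalStateException("Partitioner returned invalid partition "
                    + partition + " for topic '" + topic + "'; partition ids must be >= 0");
        }
        return partition;
    }
}
{code}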



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (KAFKA-7481) Consider options for safer upgrade of offset commit value schema

2018-10-30 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-7481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16669203#comment-16669203
 ] 

ASF GitHub Bot commented on KAFKA-7481:
---

hachikuji opened a new pull request #5857: KAFKA-7481; Add upgrade/downgrade 
notes for 2.1.x
URL: https://github.com/apache/kafka/pull/5857
 
 
   We seemed to be missing the usual rolling upgrade instructions so I've added 
them and emphasized the impact for downgrades after bumping the inter-broker 
protocol version.
   
   ### Committer Checklist (excluded from commit message)
   - [ ] Verify design and implementation 
   - [ ] Verify test coverage and CI build status
   - [ ] Verify documentation (including upgrade notes)
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Consider options for safer upgrade of offset commit value schema
> 
>
> Key: KAFKA-7481
> URL: https://issues.apache.org/jira/browse/KAFKA-7481
> Project: Kafka
>  Issue Type: Bug
>Reporter: Jason Gustafson
>Priority: Blocker
> Fix For: 2.1.0
>
>
> KIP-211 and KIP-320 add new versions of the offset commit value schema. The 
> use of the new schema version is controlled by the 
> `inter.broker.protocol.version` configuration.  Once the new inter-broker 
> version is in use, it is not possible to downgrade since the older brokers 
> will not be able to parse the new schema. 
> The options at the moment are the following:
> 1. Do nothing. Users can try the new version and keep 
> `inter.broker.protocol.version` locked to the old release. Downgrade will 
> still be possible, but users will not be able to test new capabilities which 
> depend on inter-broker protocol changes.
> 2. Instead of using `inter.broker.protocol.version`, we could use 
> `message.format.version`. This would basically extend the use of this config 
> to apply to all persistent formats. The advantage is that it allows users to 
> upgrade the broker and begin using the new inter-broker protocol while still 
> allowing downgrade. But features which depend on the persistent format could 
> not be tested.
> Any other options?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (KAFKA-7481) Consider options for safer upgrade of offset commit value schema

2018-10-30 Thread Jason Gustafson (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-7481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16669120#comment-16669120
 ] 

Jason Gustafson commented on KAFKA-7481:


I am also in favor of option 1 for the time being. While writing the KIP above, 
I felt the actual benefit of having a separate configuration was marginal and 
probably not worth the upgrade complexity that it adds. Note that I filed 
KAFKA-7570 to explore better options for managing the compatibility of the 
internal topic schemas. If we could figure that out, then we may not ultimately 
need the separate configuration. This discussion has also made it clear that we 
need a consistent and tested approach for downgrading Kafka. I also filed 
KAFKA-7571 to add system tests, which are lacking at the moment. Then we just 
need to agree on the long-term approach and document it.

For now, I will go ahead and submit a PR to add some additional upgrade notes 
mentioning the incompatible changes to the offsets schema and the impact for 
downgrades.

> Consider options for safer upgrade of offset commit value schema
> 
>
> Key: KAFKA-7481
> URL: https://issues.apache.org/jira/browse/KAFKA-7481
> Project: Kafka
>  Issue Type: Bug
>Reporter: Jason Gustafson
>Priority: Blocker
> Fix For: 2.1.0
>
>
> KIP-211 and KIP-320 add new versions of the offset commit value schema. The 
> use of the new schema version is controlled by the 
> `inter.broker.protocol.version` configuration.  Once the new inter-broker 
> version is in use, it is not possible to downgrade since the older brokers 
> will not be able to parse the new schema. 
> The options at the moment are the following:
> 1. Do nothing. Users can try the new version and keep 
> `inter.broker.protocol.version` locked to the old release. Downgrade will 
> still be possible, but users will not be able to test new capabilities which 
> depend on inter-broker protocol changes.
> 2. Instead of using `inter.broker.protocol.version`, we could use 
> `message.format.version`. This would basically extend the use of this config 
> to apply to all persistent formats. The advantage is that it allows users to 
> upgrade the broker and begin using the new inter-broker protocol while still 
> allowing downgrade. But features which depend on the persistent format could 
> not be tested.
> Any other options?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (KAFKA-7571) Add system tests for downgrading Kafka

2018-10-30 Thread Jason Gustafson (JIRA)
Jason Gustafson created KAFKA-7571:
--

 Summary: Add system tests for downgrading Kafka
 Key: KAFKA-7571
 URL: https://issues.apache.org/jira/browse/KAFKA-7571
 Project: Kafka
  Issue Type: Improvement
  Components: system tests
Reporter: Jason Gustafson


We have system tests which verify client behavior when upgrading Kafka. We 
should add similar tests for supported downgrades.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (KAFKA-7570) Make internal offsets/transaction schemas forward compatible

2018-10-30 Thread Jason Gustafson (JIRA)
Jason Gustafson created KAFKA-7570:
--

 Summary: Make internal offsets/transaction schemas forward 
compatible
 Key: KAFKA-7570
 URL: https://issues.apache.org/jira/browse/KAFKA-7570
 Project: Kafka
  Issue Type: Improvement
Reporter: Jason Gustafson


Currently, changes to the data stored in the internal topics (__consumer_offsets 
and __transaction_state) are not forward compatible. This means that once users 
have upgraded to a Kafka version which includes a bumped schema, it is no 
longer possible to downgrade. The changes to these schemas tend to be 
incremental, so we should consider options that at least allow new fields to be 
added without breaking downgrades.
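
To make the idea concrete, here is a toy sketch of the forward-compatibility 
approach. It is not Kafka's actual schema code and the field names are invented; 
it only illustrates a reader that tolerates fields appended by a newer writer:

{code}
import java.nio.ByteBuffer;

// Toy illustration only -- not Kafka's internal schema handling.
final class ToyOffsetValue {
    final long offset;
    final long timestamp;

    private ToyOffsetValue(long offset, long timestamp) {
        this.offset = offset;
        this.timestamp = timestamp;
    }

    static ToyOffsetValue parse(ByteBuffer buffer) {
        long offset = buffer.getLong();
        long timestamp = buffer.getLong();
        // Any remaining bytes belong to fields this version does not know about.
        // Skipping them instead of failing is what keeps the format forward
        // compatible, so a downgraded broker can still read newer records.
        buffer.position(buffer.limit());
        return new ToyOffsetValue(offset, timestamp);
    }
}
{code}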



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (KAFKA-7192) State-store can desynchronise with changelog

2018-10-30 Thread Matthias J. Sax (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-7192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16669064#comment-16669064
 ] 

Matthias J. Sax commented on KAFKA-7192:


I hope so. I did not have time yet to port the fix to the 1.1 branch.

> State-store can desynchronise with changelog
> 
>
> Key: KAFKA-7192
> URL: https://issues.apache.org/jira/browse/KAFKA-7192
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 0.11.0.3, 1.0.2, 1.1.1, 2.0.0
>Reporter: Jon Bates
>Assignee: Guozhang Wang
>Priority: Critical
>  Labels: bugs
> Fix For: 0.11.0.4, 1.0.3, 2.0.1, 2.1.0
>
>
> n.b. this bug has been verified with exactly-once processing enabled
> Consider the following scenario:
>  * A record, N is read into a Kafka topology
>  * the state store is updated
>  * the topology crashes
> h3. *Expected behaviour:*
>  # Node is restarted
>  # Offset was never updated, so record N is reprocessed
>  # State-store is reset to position N-1
>  # Record is reprocessed
> h3. *Actual Behaviour*
>  # Node is restarted
>  # Record N is reprocessed (good)
>  # The state store has the state from the previous processing
> I'd consider this a corruption of the state-store, hence the critical 
> Priority, although High may be more appropriate.
> I wrote a proof-of-concept here, which demonstrates the problem on Linux:
> [https://github.com/spadger/kafka-streams-sad-state-store]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (KAFKA-7192) State-store can desynchronise with changelog

2018-10-30 Thread Tobias Johansson (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-7192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16668859#comment-16668859
 ] 

Tobias Johansson commented on KAFKA-7192:
-

Will this be fixed in Kafka v1.1.2?

> State-store can desynchronise with changelog
> 
>
> Key: KAFKA-7192
> URL: https://issues.apache.org/jira/browse/KAFKA-7192
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 0.11.0.3, 1.0.2, 1.1.1, 2.0.0
>Reporter: Jon Bates
>Assignee: Guozhang Wang
>Priority: Critical
>  Labels: bugs
> Fix For: 0.11.0.4, 1.0.3, 2.0.1, 2.1.0
>
>
> n.b. this bug has been verified with exactly-once processing enabled
> Consider the following scenario:
>  * A record, N is read into a Kafka topology
>  * the state store is updated
>  * the topology crashes
> h3. *Expected behaviour:*
>  # Node is restarted
>  # Offset was never updated, so record N is reprocessed
>  # State-store is reset to position N-1
>  # Record is reprocessed
> h3. *Actual Behaviour*
>  # Node is restarted
>  # Record N is reprocessed (good)
>  # The state store has the state from the previous processing
> I'd consider this a corruption of the state-store, hence the critical 
> Priority, although High may be more appropriate.
> I wrote a proof-of-concept here, which demonstrates the problem on Linux:
> [https://github.com/spadger/kafka-streams-sad-state-store]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (KAFKA-7165) Error while creating ephemeral at /brokers/ids/BROKER_ID

2018-10-30 Thread Jonathan Santilli (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-7165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16668672#comment-16668672
 ] 

Jonathan Santilli edited comment on KAFKA-7165 at 10/30/18 12:59 PM:
-

Thanks for your reply [~junrao], so far that was the only time we hit the 
_org.apache.zookeeper.KeeperException$SessionExpiredException_ issue.

Now we are on Kafka version 2.0 and from time to time we see the *NODEEXISTS* 
issue.

Maybe the errors were related, but it is difficult to be sure; hopefully with 
the fix we can get rid of the *NODEEXISTS* error.

 

Cheers!


was (Author: pachilo):
Thanks for your reply [~junrao] , that was the only time we had that issue of 
the _org.apache.zookeeper.KeeperException$SessionExpiredException_ so far.

Now we are in version 2.0 of Kafka and from time to time suffering the 
*NODEEXISTS* issue.

Maybe the errors were related but difficult to ensure that, hopefully with the 
fix, we can get rid of the *NODEEXISTS* error*.*

 

Cheers!

> Error while creating ephemeral at /brokers/ids/BROKER_ID
> 
>
> Key: KAFKA-7165
> URL: https://issues.apache.org/jira/browse/KAFKA-7165
> Project: Kafka
>  Issue Type: Bug
>  Components: core, zkclient
>Affects Versions: 1.1.0
>Reporter: Jonathan Santilli
>Assignee: Jonathan Santilli
>Priority: Major
>
> Kafka version: 1.1.0
> Zookeeper version: 3.4.12
> 4 Kafka Brokers
> 4 Zookeeper servers
>  
> In one of the 4 brokers of the cluster, we detected the following error:
> [2018-07-14 04:38:23,784] INFO Unable to read additional data from server 
> sessionid 0x3000c2420cb458d, likely server has closed socket, closing socket 
> connection and attempting reconnect (org.apache.zookeeper.ClientCnxn)
>  [2018-07-14 04:38:24,509] INFO Opening socket connection to server 
> *ZOOKEEPER_SERVER_1:PORT*. Will not attempt to authenticate using SASL 
> (unknown error) (org.apache.zookeeper.ClientCnxn)
>  [2018-07-14 04:38:24,510] INFO Socket connection established to 
> *ZOOKEEPER_SERVER_1:PORT*, initiating session 
> (org.apache.zookeeper.ClientCnxn)
>  [2018-07-14 04:38:24,513] INFO Unable to read additional data from server 
> sessionid 0x3000c2420cb458d, likely server has closed socket, closing socket 
> connection and attempting reconnect (org.apache.zookeeper.ClientCnxn)
>  [2018-07-14 04:38:25,287] INFO Opening socket connection to server 
> *ZOOKEEPER_SERVER_2:PORT*. Will not attempt to authenticate using SASL 
> (unknown error) (org.apache.zookeeper.ClientCnxn)
>  [2018-07-14 04:38:25,287] INFO Socket connection established to 
> *ZOOKEEPER_SERVER_2:PORT*, initiating session 
> (org.apache.zookeeper.ClientCnxn)
>  [2018-07-14 04:38:25,954] INFO [Partition TOPIC_NAME-PARTITION-# broker=1|#* 
> broker=1] Shrinking ISR from 1,3,4,2 to 1,4,2 (kafka.cluster.Partition)
>  [2018-07-14 04:38:26,444] WARN Unable to reconnect to ZooKeeper service, 
> session 0x3000c2420cb458d has expired (org.apache.zookeeper.ClientCnxn)
>  [2018-07-14 04:38:26,444] INFO Unable to reconnect to ZooKeeper service, 
> session 0x3000c2420cb458d has expired, closing socket connection 
> (org.apache.zookeeper.ClientCnxn)
>  [2018-07-14 04:38:26,445] INFO EventThread shut down for session: 
> 0x3000c2420cb458d (org.apache.zookeeper.ClientCnxn)
>  [2018-07-14 04:38:26,446] INFO [ZooKeeperClient] Session expired. 
> (kafka.zookeeper.ZooKeeperClient)
>  [2018-07-14 04:38:26,459] INFO [ZooKeeperClient] Initializing a new session 
> to 
> *ZOOKEEPER_SERVER_1:PORT*,*ZOOKEEPER_SERVER_2:PORT*,*ZOOKEEPER_SERVER_3:PORT*,*ZOOKEEPER_SERVER_4:PORT*.
>  (kafka.zookeeper.ZooKeeperClient)
>  [2018-07-14 04:38:26,459] INFO Initiating client connection, 
> connectString=*ZOOKEEPER_SERVER_1:PORT*,*ZOOKEEPER_SERVER_2:PORT*,*ZOOKEEPER_SERVER_3:PORT*,*ZOOKEEPER_SERVER_4:PORT*
>  sessionTimeout=6000 
> watcher=kafka.zookeeper.ZooKeeperClient$ZooKeeperClientWatcher$@44821a96 
> (org.apache.zookeeper.ZooKeeper)
>  [2018-07-14 04:38:26,465] INFO Opening socket connection to server 
> *ZOOKEEPER_SERVER_1:PORT*. Will not attempt to authenticate using SASL 
> (unknown error) (org.apache.zookeeper.ClientCnxn)
>  [2018-07-14 04:38:26,477] INFO Socket connection established to 
> *ZOOKEEPER_SERVER_1:PORT*, initiating session 
> (org.apache.zookeeper.ClientCnxn)
>  [2018-07-14 04:38:26,484] INFO Session establishment complete on server 
> *ZOOKEEPER_SERVER_1:PORT*, sessionid = 0x4005b59eb6a, negotiated timeout 
> = 6000 (org.apache.zookeeper.ClientCnxn)
>  [2018-07-14 04:38:26,496] *INFO Creating /brokers/ids/1* (is it secure? 
> false) (kafka.zk.KafkaZkClient)
>  [2018-07-14 04:38:26,500] INFO Processing notification(s) to /config/changes 
> (kafka.common.ZkNodeChangeNotificationListener)
>  *[2018-07-14 04:38:26,547] ERROR Error while 

[jira] [Commented] (KAFKA-7165) Error while creating ephemeral at /brokers/ids/BROKER_ID

2018-10-30 Thread Jonathan Santilli (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-7165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16668672#comment-16668672
 ] 

Jonathan Santilli commented on KAFKA-7165:
--

Thanks for your reply [~junrao], so far that was the only time we hit the 
_org.apache.zookeeper.KeeperException$SessionExpiredException_ issue.

Now we are on Kafka version 2.0 and from time to time we see the *NODEEXISTS* 
issue.

Maybe the errors were related, but it is difficult to be sure; hopefully with 
the fix we can get rid of the *NODEEXISTS* error.

 

Cheers!

> Error while creating ephemeral at /brokers/ids/BROKER_ID
> 
>
> Key: KAFKA-7165
> URL: https://issues.apache.org/jira/browse/KAFKA-7165
> Project: Kafka
>  Issue Type: Bug
>  Components: core, zkclient
>Affects Versions: 1.1.0
>Reporter: Jonathan Santilli
>Assignee: Jonathan Santilli
>Priority: Major
>
> Kafka version: 1.1.0
> Zookeeper version: 3.4.12
> 4 Kafka Brokers
> 4 Zookeeper servers
>  
> In one of the 4 brokers of the cluster, we detected the following error:
> [2018-07-14 04:38:23,784] INFO Unable to read additional data from server 
> sessionid 0x3000c2420cb458d, likely server has closed socket, closing socket 
> connection and attempting reconnect (org.apache.zookeeper.ClientCnxn)
>  [2018-07-14 04:38:24,509] INFO Opening socket connection to server 
> *ZOOKEEPER_SERVER_1:PORT*. Will not attempt to authenticate using SASL 
> (unknown error) (org.apache.zookeeper.ClientCnxn)
>  [2018-07-14 04:38:24,510] INFO Socket connection established to 
> *ZOOKEEPER_SERVER_1:PORT*, initiating session 
> (org.apache.zookeeper.ClientCnxn)
>  [2018-07-14 04:38:24,513] INFO Unable to read additional data from server 
> sessionid 0x3000c2420cb458d, likely server has closed socket, closing socket 
> connection and attempting reconnect (org.apache.zookeeper.ClientCnxn)
>  [2018-07-14 04:38:25,287] INFO Opening socket connection to server 
> *ZOOKEEPER_SERVER_2:PORT*. Will not attempt to authenticate using SASL 
> (unknown error) (org.apache.zookeeper.ClientCnxn)
>  [2018-07-14 04:38:25,287] INFO Socket connection established to 
> *ZOOKEEPER_SERVER_2:PORT*, initiating session 
> (org.apache.zookeeper.ClientCnxn)
>  [2018-07-14 04:38:25,954] INFO [Partition TOPIC_NAME-PARTITION-# broker=1|#* 
> broker=1] Shrinking ISR from 1,3,4,2 to 1,4,2 (kafka.cluster.Partition)
>  [2018-07-14 04:38:26,444] WARN Unable to reconnect to ZooKeeper service, 
> session 0x3000c2420cb458d has expired (org.apache.zookeeper.ClientCnxn)
>  [2018-07-14 04:38:26,444] INFO Unable to reconnect to ZooKeeper service, 
> session 0x3000c2420cb458d has expired, closing socket connection 
> (org.apache.zookeeper.ClientCnxn)
>  [2018-07-14 04:38:26,445] INFO EventThread shut down for session: 
> 0x3000c2420cb458d (org.apache.zookeeper.ClientCnxn)
>  [2018-07-14 04:38:26,446] INFO [ZooKeeperClient] Session expired. 
> (kafka.zookeeper.ZooKeeperClient)
>  [2018-07-14 04:38:26,459] INFO [ZooKeeperClient] Initializing a new session 
> to 
> *ZOOKEEPER_SERVER_1:PORT*,*ZOOKEEPER_SERVER_2:PORT*,*ZOOKEEPER_SERVER_3:PORT*,*ZOOKEEPER_SERVER_4:PORT*.
>  (kafka.zookeeper.ZooKeeperClient)
>  [2018-07-14 04:38:26,459] INFO Initiating client connection, 
> connectString=*ZOOKEEPER_SERVER_1:PORT*,*ZOOKEEPER_SERVER_2:PORT*,*ZOOKEEPER_SERVER_3:PORT*,*ZOOKEEPER_SERVER_4:PORT*
>  sessionTimeout=6000 
> watcher=kafka.zookeeper.ZooKeeperClient$ZooKeeperClientWatcher$@44821a96 
> (org.apache.zookeeper.ZooKeeper)
>  [2018-07-14 04:38:26,465] INFO Opening socket connection to server 
> *ZOOKEEPER_SERVER_1:PORT*. Will not attempt to authenticate using SASL 
> (unknown error) (org.apache.zookeeper.ClientCnxn)
>  [2018-07-14 04:38:26,477] INFO Socket connection established to 
> *ZOOKEEPER_SERVER_1:PORT*, initiating session 
> (org.apache.zookeeper.ClientCnxn)
>  [2018-07-14 04:38:26,484] INFO Session establishment complete on server 
> *ZOOKEEPER_SERVER_1:PORT*, sessionid = 0x4005b59eb6a, negotiated timeout 
> = 6000 (org.apache.zookeeper.ClientCnxn)
>  [2018-07-14 04:38:26,496] *INFO Creating /brokers/ids/1* (is it secure? 
> false) (kafka.zk.KafkaZkClient)
>  [2018-07-14 04:38:26,500] INFO Processing notification(s) to /config/changes 
> (kafka.common.ZkNodeChangeNotificationListener)
>  *[2018-07-14 04:38:26,547] ERROR Error while creating ephemeral at 
> /brokers/ids/1, node already exists and owner '216186131422332301' does not 
> match current session '288330817911521280' 
> (kafka.zk.KafkaZkClient$CheckedEphemeral)*
>  [2018-07-14 04:38:26,547] *INFO Result of znode creation at /brokers/ids/1 
> is: NODEEXISTS* (kafka.zk.KafkaZkClient)
>  [2018-07-14 04:38:26,559] ERROR Uncaught exception in scheduled task 
> 'isr-expiration' (kafka.utils.KafkaScheduler)
> 

[jira] [Created] (KAFKA-7569) Kafka doesn't appear to clean up dangling partitions

2018-10-30 Thread Dao Quang Minh (JIRA)
Dao Quang Minh created KAFKA-7569:
-

 Summary: Kafka doesn't appear to clean up dangling partitions
 Key: KAFKA-7569
 URL: https://issues.apache.org/jira/browse/KAFKA-7569
 Project: Kafka
  Issue Type: Bug
Affects Versions: 1.0.0
Reporter: Dao Quang Minh


In our current cluster running Kafka 1.0.0, we recently observed that Kafka 
doesn't clean up dangling partitions (i.e. partition data is on disk, but the 
partition is not assigned to the current broker anymore).

As an example of the dangling partition data, we have:

{code}
total 26G
-rw-r--r-- 1 kafka kafka 1.9G Aug 6 16:19 352433304663.log
-rw-r--r-- 1 kafka kafka 1.9G Aug 6 16:16 352414164340.log
-rw-r--r-- 1 kafka kafka 1.9G Aug 6 16:24 352466972892.log
-rw-r--r-- 1 kafka kafka 1.9G Aug 6 16:23 352457368236.log
-rw-r--r-- 1 kafka kafka 1.9G Aug 6 16:17 352423709566.log
-rw-r--r-- 1 kafka kafka 1.9G Aug 6 16:21 352447702369.log
-rw-r--r-- 1 kafka kafka 1.9G Aug 6 16:20 352442921890.log
-rw-r--r-- 1 kafka kafka 1.9G Aug 6 16:22 352452551548.log
-rw-r--r-- 1 kafka kafka 1.9G Aug 6 16:17 352418945305.log
-rw-r--r-- 1 kafka kafka 1.9G Aug 6 16:18 352428477361.log
-rw-r--r-- 1 kafka kafka 1.9G Aug 6 16:15 352409416538.log
-rw-r--r-- 1 kafka kafka 1.9G Aug 6 16:24 352462192103.log
-rw-r--r-- 1 kafka kafka 1.9G Aug 6 16:20 352438136012.log
-rw-r--r-- 1 kafka kafka 1.8G Aug 6 17:43 352471757311.log
-rw-r--r-- 1 kafka kafka 10M Oct 16 21:44 352471757311.index
-rw-r--r-- 1 kafka kafka 10M Oct 16 21:44 352471757311.timeindex
drwxr-xr-x 2 kafka kafka 4.0K Oct 8 15:27 .
drwxr-xr-x 49 kafka kafka 4.0K Oct 30 11:21 ..
-rw-r--r-- 1 kafka kafka 2.3K Oct 16 21:44 352414164340.timeindex
-rw-r--r-- 1 kafka kafka 2.3K Oct 16 21:44 352423709566.timeindex
-rw-r--r-- 1 kafka kafka 2.3K Oct 16 21:44 352433304663.timeindex
-rw-r--r-- 1 kafka kafka 2.3K Oct 16 21:44 352447702369.timeindex
-rw-r--r-- 1 kafka kafka 2.3K Oct 16 21:44 352457368236.timeindex
-rw-r--r-- 1 kafka kafka 2.3K Oct 16 21:44 352466972892.timeindex
-rw-r--r-- 1 kafka kafka 2.3K Oct 16 21:44 352409416538.timeindex
-rw-r--r-- 1 kafka kafka 2.3K Oct 16 21:44 352418945305.timeindex
-rw-r--r-- 1 kafka kafka 2.3K Oct 16 21:44 352428477361.timeindex
-rw-r--r-- 1 kafka kafka 2.3K Oct 16 21:44 352438136012.timeindex
-rw-r--r-- 1 kafka kafka 2.3K Oct 16 21:44 352442921890.timeindex
-rw-r--r-- 1 kafka kafka 2.3K Oct 16 21:44 352452551548.timeindex
-rw-r--r-- 1 kafka kafka 2.3K Oct 16 21:44 352462192103.timeindex
-rw-r--r-- 1 kafka kafka 1.5K Oct 16 21:44 352414164340.index
-rw-r--r-- 1 kafka kafka 1.5K Oct 16 21:44 352423709566.index
-rw-r--r-- 1 kafka kafka 1.5K Oct 16 21:44 352433304663.index
-rw-r--r-- 1 kafka kafka 1.5K Oct 16 21:44 352447702369.index
-rw-r--r-- 1 kafka kafka 1.5K Oct 16 21:44 352457368236.index
-rw-r--r-- 1 kafka kafka 1.5K Oct 16 21:44 352466972892.index
-rw-r--r-- 1 kafka kafka 1.5K Oct 16 21:44 352409416538.index
-rw-r--r-- 1 kafka kafka 1.5K Oct 16 21:44 352418945305.index
-rw-r--r-- 1 kafka kafka 1.5K Oct 16 21:44 352428477361.index
-rw-r--r-- 1 kafka kafka 1.5K Oct 16 21:44 352438136012.index
-rw-r--r-- 1 kafka kafka 1.5K Oct 16 21:44 352442921890.index
-rw-r--r-- 1 kafka kafka 1.5K Oct 16 21:44 352452551548.index
-rw-r--r-- 1 kafka kafka 1.5K Oct 16 21:44 352462192103.index
-rw-r--r-- 1 kafka kafka 20 Aug 6 16:23 leader-epoch-checkpoint
-rw-r--r-- 1 kafka kafka 10 Aug 6 16:24 352466972892.snapshot
-rw-r--r-- 1 kafka kafka 10 Aug 6 16:24 352471757311.snapshot
-rw-r--r-- 1 kafka kafka 10 Oct 8 15:27 352476186724.snapshot
{code}

I'm unsure how we ended up in this situation, as partition data should be marked 
as removed and eventually removed when it's no longer assigned to the broker. 
But in this edge case, should Kafka detect that automatically when it loads the 
partition and re-mark it for deletion?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (KAFKA-2607) Review `Time` interface and its usage

2018-10-30 Thread Aleksei Kogan (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-2607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16668325#comment-16668325
 ] 

Aleksei Kogan commented on KAFKA-2607:
--

[~ijuma] Hello! I would like to help with this task, as the existing pull request 
is completely outdated. Could you assign it to me?

> Review `Time` interface and its usage
> -
>
> Key: KAFKA-2607
> URL: https://issues.apache.org/jira/browse/KAFKA-2607
> Project: Kafka
>  Issue Type: Improvement
>Affects Versions: 0.11.0.0, 1.0.0
>Reporter: Ismael Juma
>Priority: Major
>  Labels: newbie
>
> Two of `Time` interface's methods are `milliseconds` and `nanoseconds` which 
> are implemented in `SystemTime` as follows:
> {code}
> @Override
> public long milliseconds() {
> return System.currentTimeMillis();
> }
> @Override
> public long nanoseconds() {
> return System.nanoTime();
> }
> {code}
> The issue with this interface is that it makes it seem that the difference is 
> about the unit (`ms` versus `ns`) whereas it's much more than that:
> https://blogs.oracle.com/dholmes/entry/inside_the_hotspot_vm_clocks
> We should probably change the names of the methods and review our usage to 
> see if we're using the right one in the various places.
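
For illustration (this snippet is not from the Kafka code base), here is a 
minimal example of the semantic difference the current names hide: 
`nanoseconds()` maps to a monotonic clock that is only meaningful for measuring 
elapsed intervals, while `milliseconds()` maps to wall-clock time that can jump:

{code}
public class ClockComparison {
    public static void main(String[] args) throws InterruptedException {
        long wallStart = System.currentTimeMillis(); // wall clock; can jump on NTP or manual adjustment
        long monoStart = System.nanoTime();          // monotonic; only valid for elapsed-time deltas

        Thread.sleep(100);

        long elapsedMono = (System.nanoTime() - monoStart) / 1_000_000;
        long elapsedWall = System.currentTimeMillis() - wallStart;

        // elapsedMono is always a sane elapsed interval; elapsedWall can be wrong
        // (even negative) if the system clock was adjusted during the sleep.
        System.out.println("monotonic: " + elapsedMono + " ms, wall clock: " + elapsedWall + " ms");
    }
}
{code}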



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (KAFKA-7568) Return leader epoch in ListOffsets responses

2018-10-30 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-7568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16668262#comment-16668262
 ] 

ASF GitHub Bot commented on KAFKA-7568:
---

hachikuji opened a new pull request #5855: KAFKA-7568; Return leader epoch in 
ListOffsets response
URL: https://github.com/apache/kafka/pull/5855
 
 
   As part of KIP-320, the ListOffsets API should return the leader epoch of 
any fetched offset. We either get this epoch from the log itself for a 
timestamp query or from the epoch cache if we are searching the earliest or 
latest offset in the log. When handling queries for the latest offset, we have 
elected to choose the current leader epoch, which is consistent with other 
handling (e.g. OffsetsForTimes).
   
   ### Committer Checklist (excluded from commit message)
   - [ ] Verify design and implementation 
   - [ ] Verify test coverage and CI build status
   - [ ] Verify documentation (including upgrade notes)
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Return leader epoch in ListOffsets responses
> 
>
> Key: KAFKA-7568
> URL: https://issues.apache.org/jira/browse/KAFKA-7568
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Jason Gustafson
>Assignee: Jason Gustafson
>Priority: Major
>
> This is part of KIP-320. The changes to the API have already been made, but 
> currently we return an unknown epoch. We need to update the logic to search for 
> the epoch corresponding to a fetched offset in the leader epoch cache.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)