Re: BufferOverflowException on rolling new segment after upgrading Kafka from 1.1.0 to 2.3.1

2019-11-27 Thread Daniyar Kulakhmetov
Bumping this up with a new update:

I've investigated another occurrence of this exception.

For the analysis, I used:
1) a memory dump taken from the broker
2) the Kafka log file
3) the Kafka state-change log
4) the log, index, and time-index files of a failed segment
5) the Kafka source code, versions 2.3.1 and 1.1.0

Here's what the exception looks like in the Kafka log:

2019/11/19 16:03:00 INFO [ProducerStateManager partition=ad_group_metrics-62] Writing producer snapshot at offset 13886052 (kafka.log.ProducerStateManager)
2019/11/19 16:03:00 INFO [Log partition=ad_group_metrics-62, dir=/mnt/kafka] Rolled new log segment at offset 13886052 in 1 ms. (kafka.log.Log)
2019/11/19 16:03:00 ERROR [ReplicaManager broker=17] Error processing append operation on partition ad_group_metrics-62 (kafka.server.ReplicaManager)
2019/11/19 16:03:00 java.nio.BufferOverflowException
2019/11/19 16:03:00 at java.nio.Buffer.nextPutIndex(Buffer.java:527)
2019/11/19 16:03:00 at java.nio.DirectByteBuffer.putLong(DirectByteBuffer.java:797)
2019/11/19 16:03:00 at kafka.log.TimeIndex.$anonfun$maybeAppend$1(TimeIndex.scala:134)
2019/11/19 16:03:00 at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
2019/11/19 16:03:00 at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:253)
2019/11/19 16:03:00 at kafka.log.TimeIndex.maybeAppend(TimeIndex.scala:114)
2019/11/19 16:03:00 at kafka.log.LogSegment.onBecomeInactiveSegment(LogSegment.scala:520)
2019/11/19 16:03:00 at kafka.log.Log.$anonfun$roll$8(Log.scala:1690)
2019/11/19 16:03:00 at kafka.log.Log.$anonfun$roll$8$adapted(Log.scala:1690)
2019/11/19 16:03:00 at scala.Option.foreach(Option.scala:407)
2019/11/19 16:03:00 at kafka.log.Log.$anonfun$roll$2(Log.scala:1690)
2019/11/19 16:03:00 at kafka.log.Log.maybeHandleIOException(Log.scala:2085)
2019/11/19 16:03:00 at kafka.log.Log.roll(Log.scala:1654)
2019/11/19 16:03:00 at kafka.log.Log.maybeRoll(Log.scala:1639)
2019/11/19 16:03:00 at kafka.log.Log.$anonfun$append$2(Log.scala:966)
2019/11/19 16:03:00 at kafka.log.Log.maybeHandleIOException(Log.scala:2085)
2019/11/19 16:03:00 at kafka.log.Log.append(Log.scala:850)
2019/11/19 16:03:00 at kafka.log.Log.appendAsLeader(Log.scala:819)
2019/11/19 16:03:00 at kafka.cluster.Partition.$anonfun$appendRecordsToLeader$1(Partition.scala:772)
2019/11/19 16:03:00 at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:253)
2019/11/19 16:03:00 at kafka.utils.CoreUtils$.inReadLock(CoreUtils.scala:259)
2019/11/19 16:03:00 at kafka.cluster.Partition.appendRecordsToLeader(Partition.scala:759)
2019/11/19 16:03:00 at kafka.server.ReplicaManager.$anonfun$appendToLocalLog$2(ReplicaManager.scala:763)
2019/11/19 16:03:00 at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238)
2019/11/19 16:03:00 at scala.collection.mutable.HashMap.$anonfun$foreach$1(HashMap.scala:149)
2019/11/19 16:03:00 at scala.collection.mutable.HashTable.foreachEntry(HashTable.scala:237)
2019/11/19 16:03:00 at scala.collection.mutable.HashTable.foreachEntry$(HashTable.scala:230)
2019/11/19 16:03:00 at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:44)
2019/11/19 16:03:00 at scala.collection.mutable.HashMap.foreach(HashMap.scala:149)
2019/11/19 16:03:00 at scala.collection.TraversableLike.map(TraversableLike.scala:238)
2019/11/19 16:03:00 at scala.collection.TraversableLike.map$(TraversableLike.scala:231)
2019/11/19 16:03:00 at scala.collection.AbstractTraversable.map(Traversable.scala:108)
2019/11/19 16:03:00 at kafka.server.ReplicaManager.appendToLocalLog(ReplicaManager.scala:751)
2019/11/19 16:03:00 at kafka.server.ReplicaManager.appendRecords(ReplicaManager.scala:492)
2019/11/19 16:03:00 at kafka.server.KafkaApis.handleProduceRequest(KafkaApis.scala:544)
2019/11/19 16:03:00 at kafka.server.KafkaApis.handle(KafkaApis.scala:113)
2019/11/19 16:03:00 at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:69)
2019/11/19 16:03:00 at java.lang.Thread.run(Thread.java:748)
...


What we see here is that a new segment was rolled at offset 13886052, and
then the exception was thrown while marking *some* segment as inactive
(`onBecomeInactiveSegment`) during an append of new messages to the Log. The
exact timing between rolling a new segment and appending new messages
doesn't seem to matter: in many other similar cases, the exception occurs a
few seconds after a new segment was rolled.
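
The top two frames of the trace are plain NIO: a relative `putLong` into a
direct buffer that has no room for eight more bytes. A minimal snippet of my
own (not Kafka code) that raises exactly that exception:

import java.nio.ByteBuffer

object OverflowDemo {
  def main(args: Array[String]): Unit = {
    // Stand-in for an empty index mmap: any relative put of 8 bytes fails,
    // because Buffer.nextPutIndex sees position (0) + 8 > limit (0).
    val buf = ByteBuffer.allocateDirect(0)
    buf.putLong(1574179380000L) // throws java.nio.BufferOverflowException
  }
}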

I managed to find the `LogSegment` object for offset 13886052 in the
memory dump. Following the source code logic and cross-checking the
LogSegment state against the Kafka logs, I found that the `TimeIndex` object
had somehow ended up with 0 entries and 0 maximum entries (and an empty
memory map). Having 0 entries is normal for a TimeIndex or OffsetIndex even
when the Log contains records, as long as their size hasn't passed a certain
threshold. But 0 maximum entries together with 0 entries made the TimeIndex
count as full (0 entries == 0 max entries), which kept triggering the roll
of a new segment. The Log was
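
To make the broken state concrete, here is a hypothetical reconstruction of
it (names and structure are mine; only the entries/maxEntries arithmetic
mirrors my reading of the 2.3.1 index code):

import java.nio.ByteBuffer

object BrokenTimeIndexState {
  // Each time-index entry is an 8-byte timestamp plus a 4-byte relative offset.
  val entrySize = 12

  // The memory map we found in the heap dump was empty, i.e. zero capacity.
  val mmap: ByteBuffer = ByteBuffer.allocateDirect(0)

  val maxEntries: Int = mmap.limit() / entrySize // = 0
  val entries: Int = 0

  // 0 >= 0, so the index reports full immediately, and every append
  // decides the active segment must be rolled.
  def isFull: Boolean = entries >= maxEntries

  def main(args: Array[String]): Unit = {
    println(s"isFull = $isFull") // true
    // Rolling marks the old segment inactive and writes its largest
    // timestamp into this same empty index, skipping the full check:
    mmap.putLong(1574179380000L) // java.nio.BufferOverflowException
  }
}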

Re: BufferOverflowException on rolling new segment after upgrading Kafka from 1.1.0 to 2.3.1

2019-11-19 Thread Daniyar Kulakhmetov
Filed a JIRA bug:
https://issues.apache.org/jira/browse/KAFKA-9213



On Tue, Nov 19, 2019 at 2:58 PM Ismael Juma  wrote:

> Can you please file a JIRA?
>
> Ismael
>
> On Tue, Nov 19, 2019 at 11:52 AM Daniyar Kulakhmetov <
> dkulakhme...@liftoff.io> wrote:
>
> > Hi Kafka users,
> >
> > We updated our Kafka cluster from version 1.1.0 to 2.3.1.
> > Message format and inter-broker protocol versions were left the same:
> >
> > inter.broker.protocol.version=1.1
> > log.message.format.version=1.1
> >
> > After upgrading, we started to get some occasional exceptions:
> >
> > [log excerpt and stack trace elided; identical to the original message below]
> >
> >
> > This error persists until the broker gets restarted (or leadership gets
> > moved to another broker).
> >
> > What could be the issue, and how can we solve it?
> >
> > Thank you!
> >
> > Best regards,
> > Daniyar.
> >
>


Re: BufferOverflowException on rolling new segment after upgrading Kafka from 1.1.0 to 2.3.1

2019-11-19 Thread Ismael Juma
Can you please file a JIRA?

Ismael

On Tue, Nov 19, 2019 at 11:52 AM Daniyar Kulakhmetov <
dkulakhme...@liftoff.io> wrote:

> Hi Kafka users,
>
> We updated our Kafka cluster from version 1.1.0 to 2.3.1.
> Message format and inter-broker protocol versions were left the same:
>
> inter.broker.protocol.version=1.1
> log.message.format.version=1.1
>
> After upgrading, we started to get some occasional exceptions:
>
> [log excerpt and stack trace elided; identical to the original message below]
>
>
> This error persists until the broker gets restarted (or leadership gets
> moved to another broker).
>
> What could be the issue, and how can we solve it?
>
> Thank you!
>
> Best regards,
> Daniyar.
>


Re: BufferOverflowException on rolling new segment after upgrading Kafka from 1.1.0 to 2.3.1

2019-11-19 Thread Daniyar Kulakhmetov
Hi,

We followed the upgrade instructions (
https://kafka.apache.org/documentation/#upgrade) up to step 2. As step 3
says, "Once the cluster's behavior and performance has been verified, bump
the protocol version by editing", we were still verifying the cluster's
behavior.
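
For reference, here is roughly what the two stages look like in our
server.properties (our values; the commented-out line is the step-3 bump we
have not applied yet):

# Step 2: brokers already run the 2.3.1 binaries but still speak the old
# protocol and message format.
inter.broker.protocol.version=1.1
log.message.format.version=1.1

# Step 3, after verification: bump the protocol and do one more rolling
# restart. The message format version can be bumped later, once clients
# are upgraded.
# inter.broker.protocol.version=2.3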

Thanks,

On Tue, Nov 19, 2019 at 12:19 PM M. Manna  wrote:

> Hi,
>
> Is there any reason why you haven’t performed the upgrade based on the
> official docs? Or is this something you’re planning to do now?
>
> Thanks,
>
> On Tue, 19 Nov 2019 at 19:52, Daniyar Kulakhmetov  >
> wrote:
>
> > Hi Kafka users,
> >
> > We updated our Kafka cluster from version 1.1.0 to 2.3.1.
> > Message format and inter-broker protocol versions were left the same:
> >
> > inter.broker.protocol.version=1.1
> > log.message.format.version=1.1
> >
> > After upgrading, we started to get some occasional exceptions:
> >
> > [log excerpt and stack trace elided; identical to the original message below]
> >
> >
> > This error persists until the broker gets

Re: BufferOverflowException on rolling new segment after upgrading Kafka from 1.1.0 to 2.3.1

2019-11-19 Thread M. Manna
Hi,

Is there any reason why you haven’t performed the upgrade based on the
official docs? Or is this something you’re planning to do now?

Thanks,

On Tue, 19 Nov 2019 at 19:52, Daniyar Kulakhmetov 
wrote:

> Hi Kafka users,
>
> We updated our Kafka cluster from version 1.1.0 to 2.3.1.
> Message format and inter-broker protocol versions were left the same:
>
> inter.broker.protocol.version=1.1
> log.message.format.version=1.1
>
> After upgrading, we started to get some occasional exceptions:
>
> [log excerpt and stack trace elided; identical to the original message below]
>
>
> This error persists until the broker gets restarted (or leadership gets
> moved to another broker).
>
> What could be the issue, and how can we solve it?
>
> Thank you!
>
> Best regards,
> Daniyar.
>


BufferOverflowException on rolling new segment after upgrading Kafka from 1.1.0 to 2.3.1

2019-11-19 Thread Daniyar Kulakhmetov
Hi Kafka users,

We updated our Kafka cluster from version 1.1.0 to 2.3.1.
Message format and inter-broker protocol versions were left the same:

inter.broker.protocol.version=1.1
log.message.format.version=1.1

After upgrading, we started to get some occasional exceptions:

2019/11/19 05:30:53 INFO [ProducerStateManager
partition=matchmaker_retry_clicks_15m-2] Writing producer snapshot at
offset 788532 (kafka.log.ProducerStateManager)
2019/11/19 05:30:53 INFO [Log partition=matchmaker_retry_clicks_15m-2,
dir=/mnt/kafka] Rolled new log segment at offset 788532 in 1 ms.
(kafka.log.Log)
2019/11/19 05:31:01 ERROR [ReplicaManager broker=0] Error processing append
operation on partition matchmaker_retry_clicks_15m-2
(kafka.server.ReplicaManager)
2019/11/19 05:31:01 java.nio.BufferOverflowException
2019/11/19 05:31:01 at java.nio.Buffer.nextPutIndex(Buffer.java:527)
2019/11/19 05:31:01 at
java.nio.DirectByteBuffer.putLong(DirectByteBuffer.java:797)
2019/11/19 05:31:01 at
kafka.log.TimeIndex.$anonfun$maybeAppend$1(TimeIndex.scala:134)
2019/11/19 05:31:01 at
scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
2019/11/19 05:31:01 at
kafka.utils.CoreUtils$.inLock(CoreUtils.scala:253)
2019/11/19 05:31:01 at
kafka.log.TimeIndex.maybeAppend(TimeIndex.scala:114)
2019/11/19 05:31:01 at
kafka.log.LogSegment.onBecomeInactiveSegment(LogSegment.scala:520)
2019/11/19 05:31:01 at kafka.log.Log.$anonfun$roll$8(Log.scala:1690)
2019/11/19 05:31:01 at
kafka.log.Log.$anonfun$roll$8$adapted(Log.scala:1690)
2019/11/19 05:31:01 at scala.Option.foreach(Option.scala:407)
2019/11/19 05:31:01 at kafka.log.Log.$anonfun$roll$2(Log.scala:1690)
2019/11/19 05:31:01 at
kafka.log.Log.maybeHandleIOException(Log.scala:2085)
2019/11/19 05:31:01 at kafka.log.Log.roll(Log.scala:1654)
2019/11/19 05:31:01 at kafka.log.Log.maybeRoll(Log.scala:1639)
2019/11/19 05:31:01 at kafka.log.Log.$anonfun$append$2(Log.scala:966)
2019/11/19 05:31:01 at
kafka.log.Log.maybeHandleIOException(Log.scala:2085)
2019/11/19 05:31:01 at kafka.log.Log.append(Log.scala:850)
2019/11/19 05:31:01 at kafka.log.Log.appendAsLeader(Log.scala:819)
2019/11/19 05:31:01 at
kafka.cluster.Partition.$anonfun$appendRecordsToLeader$1(Partition.scala:772)
2019/11/19 05:31:01 at
kafka.utils.CoreUtils$.inLock(CoreUtils.scala:253)
2019/11/19 05:31:01 at
kafka.utils.CoreUtils$.inReadLock(CoreUtils.scala:259)
2019/11/19 05:31:01 at
kafka.cluster.Partition.appendRecordsToLeader(Partition.scala:759)
2019/11/19 05:31:01 at
kafka.server.ReplicaManager.$anonfun$appendToLocalLog$2(ReplicaManager.scala:763)
2019/11/19 05:31:01 at
scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238)
2019/11/19 05:31:01 at
scala.collection.mutable.HashMap.$anonfun$foreach$1(HashMap.scala:149)
2019/11/19 05:31:01 at
scala.collection.mutable.HashTable.foreachEntry(HashTable.scala:237)
2019/11/19 05:31:01 at
scala.collection.mutable.HashTable.foreachEntry$(HashTable.scala:230)
2019/11/19 05:31:01 at
scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:44)
2019/11/19 05:31:01 at
scala.collection.mutable.HashMap.foreach(HashMap.scala:149)
2019/11/19 05:31:01 at
scala.collection.TraversableLike.map(TraversableLike.scala:238)
2019/11/19 05:31:01 at
scala.collection.TraversableLike.map$(TraversableLike.scala:231)
2019/11/19 05:31:01 at
scala.collection.AbstractTraversable.map(Traversable.scala:108)
2019/11/19 05:31:01 at
kafka.server.ReplicaManager.appendToLocalLog(ReplicaManager.scala:751)
2019/11/19 05:31:01 at
kafka.server.ReplicaManager.appendRecords(ReplicaManager.scala:492)
2019/11/19 05:31:01 at
kafka.server.KafkaApis.handleProduceRequest(KafkaApis.scala:544)
2019/11/19 05:31:01 at
kafka.server.KafkaApis.handle(KafkaApis.scala:113)
2019/11/19 05:31:01 at
kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:69)
2019/11/19 05:31:01 at java.lang.Thread.run(Thread.java:748)


This error persists until the broker gets restarted (or leadership gets
moved to another broker).

What could be the issue, and how can we solve it?

Thank you!

Best regards,
Daniyar.