[jira] [Resolved] (KAFKA-6486) TimeWindows causes unordered calls to windowed aggregation functions

2018-03-18 Thread Matthias J. Sax (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-6486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias J. Sax resolved KAFKA-6486.

Resolution: Fixed

> TimeWindows causes unordered calls to windowed aggregation functions
> 
>
> Key: KAFKA-6486
> URL: https://issues.apache.org/jira/browse/KAFKA-6486
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Affects Versions: 1.0.0
>Reporter: Valentino Proietti
>Assignee: Asutosh Pandya
>Priority: Minor
> Fix For: 1.2.0
>
> Attachments: KAFKA-6486.patch
>
>
> This is not a real bug, but it causes some surprising behaviour, at least in 
> my opinion.
> TimeWindows has a method called windowsFor() that builds and returns a 
> HashMap:
>     @Override
>     public Map<Long, TimeWindow> windowsFor(final long timestamp) {
>         long windowStart = (Math.max(0, timestamp - sizeMs + advanceMs) / advanceMs) * advanceMs;
>         final Map<Long, TimeWindow> windows = new HashMap<>();
> Because a HashMap does not preserve insertion order, any Streams windowed 
> aggregation function ends up being called in an order that is not sorted by 
> window time, as I would expect.
> A simple solution is to replace the HashMap with a LinkedHashMap, which is 
> what I did.
> Replacing it directly in the Kafka code can save others hours of debugging to 
> understand what is happening.
> Thank you
>  
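For context, a minimal, hypothetical sketch (the class name and the printing loop are illustrative only) of the window computation described above, using the same size, advance, and timestamp as the unit test added by the pull request quoted further down; with the LinkedHashMap fix the window starts iterate in ascending order (10, 15, 20):

{code}
import java.util.Map;

import org.apache.kafka.streams.kstream.TimeWindows;
import org.apache.kafka.streams.kstream.internals.TimeWindow;

public class WindowOrderCheck {
    public static void main(final String[] args) {
        // Hopping windows: size 12 ms, advancing by 5 ms.
        final TimeWindows windows = TimeWindows.of(12L).advanceBy(5L);

        // A record with timestamp 21 falls into the windows starting at 10, 15 and 20.
        final Map<Long, TimeWindow> matched = windows.windowsFor(21L);

        // With the original HashMap the iteration order of the window starts is arbitrary;
        // with the LinkedHashMap introduced by this ticket it is ascending: 10, 15, 20.
        for (final Map.Entry<Long, TimeWindow> entry : matched.entrySet()) {
            System.out.println(entry.getKey() + " -> [" + entry.getValue().start()
                    + ", " + entry.getValue().end() + ")");
        }
    }
}
{code}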



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (KAFKA-6486) TimeWindows causes unordered calls to windowed aggregation functions

2018-03-18 Thread Matthias J. Sax (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-6486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias J. Sax reassigned KAFKA-6486:
--

Assignee: Asutosh Pandya

> TimeWindows causes unordered calls to windowed aggregation functions
> 
>
> Key: KAFKA-6486
> URL: https://issues.apache.org/jira/browse/KAFKA-6486
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Affects Versions: 1.0.0
>Reporter: Valentino Proietti
>Assignee: Asutosh Pandya
>Priority: Minor
> Fix For: 1.2.0
>
> Attachments: KAFKA-6486.patch
>
>
> This is not a real bug, but it causes some surprising behaviour, at least in 
> my opinion.
> TimeWindows has a method called windowsFor() that builds and returns a 
> HashMap:
>     @Override
>     public Map<Long, TimeWindow> windowsFor(final long timestamp) {
>         long windowStart = (Math.max(0, timestamp - sizeMs + advanceMs) / advanceMs) * advanceMs;
>         final Map<Long, TimeWindow> windows = new HashMap<>();
> Because a HashMap does not preserve insertion order, any Streams windowed 
> aggregation function ends up being called in an order that is not sorted by 
> window time, as I would expect.
> A simple solution is to replace the HashMap with a LinkedHashMap, which is 
> what I did.
> Replacing it directly in the Kafka code can save others hours of debugging to 
> understand what is happening.
> Thank you
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (KAFKA-6486) TimeWindows causes unordered calls to windowed aggregation functions

2018-03-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-6486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16404345#comment-16404345
 ] 

ASF GitHub Bot commented on KAFKA-6486:
---

mjsax closed pull request #4628: KAFKA-6486: Implemented LinkedHashMap in 
TimeWindows
URL: https://github.com/apache/kafka/pull/4628
 
 
   

This is a PR merged from a forked repository. As GitHub hides the original
diff on merge, it is displayed below for the sake of provenance:

diff --git a/streams/src/main/java/org/apache/kafka/streams/kstream/TimeWindows.java b/streams/src/main/java/org/apache/kafka/streams/kstream/TimeWindows.java
index f9090c594cb..c2b910df5b1 100644
--- a/streams/src/main/java/org/apache/kafka/streams/kstream/TimeWindows.java
+++ b/streams/src/main/java/org/apache/kafka/streams/kstream/TimeWindows.java
@@ -19,7 +19,7 @@
 import org.apache.kafka.streams.kstream.internals.TimeWindow;
 import org.apache.kafka.streams.processor.TimestampExtractor;
 
-import java.util.HashMap;
+import java.util.LinkedHashMap;
 import java.util.Map;
 
 /**
@@ -105,7 +105,7 @@ public TimeWindows advanceBy(final long advanceMs) {
     @Override
     public Map<Long, TimeWindow> windowsFor(final long timestamp) {
         long windowStart = (Math.max(0, timestamp - sizeMs + advanceMs) / advanceMs) * advanceMs;
-        final Map<Long, TimeWindow> windows = new HashMap<>();
+        final Map<Long, TimeWindow> windows = new LinkedHashMap<>();
         while (windowStart <= timestamp) {
             final TimeWindow window = new TimeWindow(windowStart, windowStart + sizeMs);
             windows.put(windowStart, window);
diff --git a/streams/src/test/java/org/apache/kafka/streams/kstream/internals/TimeWindowTest.java b/streams/src/test/java/org/apache/kafka/streams/kstream/internals/TimeWindowTest.java
index 09ac173b737..f260bee62ef 100644
--- a/streams/src/test/java/org/apache/kafka/streams/kstream/internals/TimeWindowTest.java
+++ b/streams/src/test/java/org/apache/kafka/streams/kstream/internals/TimeWindowTest.java
@@ -16,8 +16,12 @@
  */
 package org.apache.kafka.streams.kstream.internals;
 
+import org.apache.kafka.streams.kstream.TimeWindows;
 import org.junit.Test;
 
+import java.util.Map;
+
+import static org.junit.Assert.assertEquals;
 import static org.junit.Assert.assertFalse;
 import static org.junit.Assert.assertTrue;
 
@@ -117,4 +121,15 @@ public void shouldNotOverlapIsOtherWindowIsAfterThisWindow() {
     public void cannotCompareTimeWindowWithDifferentWindowType() {
         window.overlap(sessionWindow);
     }
+
+    @Test
+    public void shouldReturnMatchedWindowsOrderedByTimestamp() {
+        final TimeWindows windows = TimeWindows.of(12L).advanceBy(5L);
+        final Map<Long, TimeWindow> matched = windows.windowsFor(21L);
+
+        final Long[] expected = matched.keySet().toArray(new Long[matched.size()]);
+        assertEquals(expected[0].longValue(), 10L);
+        assertEquals(expected[1].longValue(), 15L);
+        assertEquals(expected[2].longValue(), 20L);
+    }
 }
\ No newline at end of file


 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> TimeWindows causes unordered calls to windowed aggregation functions
> 
>
> Key: KAFKA-6486
> URL: https://issues.apache.org/jira/browse/KAFKA-6486
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Affects Versions: 1.0.0
>Reporter: Valentino Proietti
>Priority: Minor
> Fix For: 1.2.0
>
> Attachments: KAFKA-6486.patch
>
>
> This is not a real bug, but it causes some surprising behaviour, at least in 
> my opinion.
> TimeWindows has a method called windowsFor() that builds and returns a 
> HashMap:
>     @Override
>     public Map<Long, TimeWindow> windowsFor(final long timestamp) {
>         long windowStart = (Math.max(0, timestamp - sizeMs + advanceMs) / advanceMs) * advanceMs;
>         final Map<Long, TimeWindow> windows = new HashMap<>();
> Because a HashMap does not preserve insertion order, any Streams windowed 
> aggregation function ends up being called in an order that is not sorted by 
> window time, as I would expect.
> A simple solution is to replace the HashMap with a LinkedHashMap, which is 
> what I did.
> Replacing it directly in the Kafka code can save others hours of debugging to 
> understand what is happening.
> Thank you
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Updated] (KAFKA-6486) TimeWindows causes unordered calls to windowed aggregation functions

2018-03-18 Thread Matthias J. Sax (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-6486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias J. Sax updated KAFKA-6486:
---
Fix Version/s: 1.2.0

> TimeWindows causes unordered calls to windowed aggregation functions
> 
>
> Key: KAFKA-6486
> URL: https://issues.apache.org/jira/browse/KAFKA-6486
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Affects Versions: 1.0.0
>Reporter: Valentino Proietti
>Priority: Minor
> Fix For: 1.2.0
>
> Attachments: KAFKA-6486.patch
>
>
> This is not a real bug, but it causes some surprising behaviour, at least in 
> my opinion.
> TimeWindows has a method called windowsFor() that builds and returns a 
> HashMap:
>     @Override
>     public Map<Long, TimeWindow> windowsFor(final long timestamp) {
>         long windowStart = (Math.max(0, timestamp - sizeMs + advanceMs) / advanceMs) * advanceMs;
>         final Map<Long, TimeWindow> windows = new HashMap<>();
> Because a HashMap does not preserve insertion order, any Streams windowed 
> aggregation function ends up being called in an order that is not sorted by 
> window time, as I would expect.
> A simple solution is to replace the HashMap with a LinkedHashMap, which is 
> what I did.
> Replacing it directly in the Kafka code can save others hours of debugging to 
> understand what is happening.
> Thank you
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (KAFKA-6520) When a Kafka Stream can't communicate with the server, it's Status stays RUNNING

2018-03-18 Thread Matthias J. Sax (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-6520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16404326#comment-16404326
 ] 

Matthias J. Sax commented on KAFKA-6520:


This issue targets Kafka's Streams API, i.e., the case where one starts a 
{{KafkaStreams}} instance. Check the docs for details about the Streams API: 
[https://kafka.apache.org/10/documentation/streams/] Hope this helps.
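For readers new to the Streams API: a minimal, hypothetical sketch (application id, broker address, and topic names are placeholders) of starting a {{KafkaStreams}} instance and observing its state transitions through a state listener, which is where the requested DISCONNECTED status would surface:

{code}
import java.util.Properties;

import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;

public class StateWatcher {
    public static void main(final String[] args) {
        final Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "state-watcher");      // placeholder app id
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");  // placeholder broker

        final StreamsBuilder builder = new StreamsBuilder();
        builder.stream("input-topic").to("output-topic");                     // placeholder topology

        final KafkaStreams streams = new KafkaStreams(builder.build(), props);

        // Log every state transition; as described in the ticket, a lost broker
        // connection alone does not move the state away from RUNNING today.
        streams.setStateListener((newState, oldState) ->
                System.out.println("Streams state: " + oldState + " -> " + newState));

        streams.start();
    }
}
{code}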

> When a Kafka Stream can't communicate with the server, it's Status stays 
> RUNNING
> 
>
> Key: KAFKA-6520
> URL: https://issues.apache.org/jira/browse/KAFKA-6520
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Reporter: Michael Kohout
>Assignee: Milind Jain
>Priority: Major
>  Labels: newbie, user-experience
>
> When you execute the following scenario, the application always stays in the 
> RUNNING state:
>   
>  1) start Kafka
>  2) start the app; the app connects to Kafka and starts processing
>  3) kill Kafka (stop the docker container)
>  4) the application gives no indication that it is no longer 
> connected (the stream state is still RUNNING, and the uncaught exception 
> handler isn't invoked)
>   
>   
>  It would be useful if the Stream State had a DISCONNECTED status.
>   
> See 
> [this|https://groups.google.com/forum/#!topic/confluent-platform/nQh2ohgdrIQ] 
> for a discussion from the google user forum.  
> [This|https://issues.apache.org/jira/browse/KAFKA-4564] is a link to a 
> related issue.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (KAFKA-6659) Improve error message if state store is not found

2018-03-18 Thread Matthias J. Sax (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-6659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16404323#comment-16404323
 ] 

Matthias J. Sax commented on KAFKA-6659:


Thanks for picking this up.

I think it would be helpful to explain what users need to do to fix the issue. 
Something like "did you connect the store to the processor?".
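For reference, a minimal, hypothetical sketch (topic, store, and processor names are illustrative; the store name merely echoes the one in the quoted error) showing the step such a hint would point at, namely connecting the store to the processor via Topology#addStateStore:

{code}
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.Topology;
import org.apache.kafka.streams.processor.AbstractProcessor;
import org.apache.kafka.streams.processor.ProcessorContext;
import org.apache.kafka.streams.state.KeyValueStore;
import org.apache.kafka.streams.state.StoreBuilder;
import org.apache.kafka.streams.state.Stores;

public class ConnectStoreExample {

    // Minimal processor that expects the store to be connected to it.
    static class AwaitingAnswersProcessor extends AbstractProcessor<String, String> {
        private KeyValueStore<String, String> store;

        @Override
        @SuppressWarnings("unchecked")
        public void init(final ProcessorContext context) {
            super.init(context);
            // This is the call that fails with "has no access to StateStore ..."
            // if the store was never connected to the processor.
            store = (KeyValueStore<String, String>) context.getStateStore("questions-awaiting-answers-store");
        }

        @Override
        public void process(final String key, final String value) {
            store.put(key, value);
        }
    }

    public static void main(final String[] args) {
        final StoreBuilder<KeyValueStore<String, String>> storeBuilder = Stores.keyValueStoreBuilder(
                Stores.persistentKeyValueStore("questions-awaiting-answers-store"),
                Serdes.String(), Serdes.String());

        final Topology topology = new Topology();
        topology.addSource("source", "questions");
        topology.addProcessor("answers-processor", AwaitingAnswersProcessor::new, "source");

        // The step the suggested hint refers to: connect the store to the processor.
        topology.addStateStore(storeBuilder, "answers-processor");
    }
}
{code}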

> Improve error message if state store is not found
> -
>
> Key: KAFKA-6659
> URL: https://issues.apache.org/jira/browse/KAFKA-6659
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Reporter: Matthias J. Sax
>Priority: Trivial
>  Labels: beginner, easy-fix, newbie
>
> If a processor tries to access a store but the store is not connected to the 
> processor, Streams fails with
> {quote}Caused by: org.apache.kafka.streams.errors.TopologyBuilderException: 
> Invalid topology building: Processor KSTREAM-TRANSFORM-36 has no 
> access to StateStore questions-awaiting-answers-store
> {quote}
> We should improve this error message and give the user a hint on how to fix 
> the issue.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (KAFKA-5846) Use singleton NoOpConsumerRebalanceListener in subscribe() call where listener is not specified

2018-03-18 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16191405#comment-16191405
 ] 

Ted Yu edited comment on KAFKA-5846 at 3/19/18 1:28 AM:


Patch looks good to me.


was (Author: yuzhih...@gmail.com):
Patch looks good to me.

> Use singleton NoOpConsumerRebalanceListener in subscribe() call where 
> listener is not specified
> ---
>
> Key: KAFKA-5846
> URL: https://issues.apache.org/jira/browse/KAFKA-5846
> Project: Kafka
>  Issue Type: Task
>  Components: clients
>Reporter: Ted Yu
>Assignee: Kamal Chandraprakash
>Priority: Minor
>
> Currently KafkaConsumer creates instance of NoOpConsumerRebalanceListener for 
> each subscribe() call where ConsumerRebalanceListener is not specified:
> {code}
> public void subscribe(Pattern pattern) {
> subscribe(pattern, new NoOpConsumerRebalanceListener());
> {code}
> We can create a singleton NoOpConsumerRebalanceListener to be used in such 
> scenarios.
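For context, a minimal sketch of the proposed change (the class name and field are hypothetical; the real change would live inside KafkaConsumer itself):

{code}
import java.util.regex.Pattern;

import org.apache.kafka.clients.consumer.ConsumerRebalanceListener;
import org.apache.kafka.clients.consumer.internals.NoOpConsumerRebalanceListener;

// Sketch of the idea only, not the actual Kafka patch.
class SubscribeExample {
    // One shared, stateless listener instead of a fresh instance per subscribe() call.
    private static final ConsumerRebalanceListener NO_OP_LISTENER = new NoOpConsumerRebalanceListener();

    public void subscribe(final Pattern pattern) {
        subscribe(pattern, NO_OP_LISTENER);
    }

    public void subscribe(final Pattern pattern, final ConsumerRebalanceListener listener) {
        // ... the real implementation lives in KafkaConsumer ...
    }
}
{code}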



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (KAFKA-5946) Give connector method parameter better name

2018-03-18 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16392411#comment-16392411
 ] 

Ted Yu edited comment on KAFKA-5946 at 3/19/18 1:27 AM:


Thanks for taking it.


was (Author: yuzhih...@gmail.com):
Thanks for taking it.

> Give connector method parameter better name
> ---
>
> Key: KAFKA-5946
> URL: https://issues.apache.org/jira/browse/KAFKA-5946
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Ted Yu
>Assignee: Tanvi Jaywant
>Priority: Major
>  Labels: connector, newbie
>
> During the development of KAFKA-5657, there were several iterations where the 
> method call didn't match what the connector parameter actually represents.
> [~ewencp] had used connType as equivalent to connClass because "Type" wasn't 
> used to differentiate source vs sink.
> [~ewencp] proposed the following:
> {code}
> It would help to convert all the uses of connType to connClass first, then 
> standardize on class == java class, type == source/sink, name == 
> user-specified name.
> {code}
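For illustration, a tiny hypothetical sketch of the proposed convention (all values are made up):

{code}
// class == the Java class, type == source/sink, name == the user-specified connector name.
public class ConnectorNamingExample {
    public static void main(final String[] args) {
        final String connClass = "org.apache.kafka.connect.file.FileStreamSourceConnector";
        final String connType  = "source";
        final String connName  = "local-file-source";
        System.out.println(connName + " is a " + connType + " connector backed by " + connClass);
    }
}
{code}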



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (KAFKA-6520) When a Kafka Stream can't communicate with the server, it's Status stays RUNNING

2018-03-18 Thread Milind Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-6520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16404180#comment-16404180
 ] 

Milind Jain commented on KAFKA-6520:


Hi [~mjsax] [~mwkohout]

I tried reproducing the issue.

When using the Storm Kafka spout, I get the exception below on stopping the 
broker:

{noformat}
Caused by: java.nio.channels.ClosedChannelException
    at kafka.network.BlockingChannel.send(BlockingChannel.scala:100) ~[kafka_2.10-0.8.2.0.jar:?]
    at kafka.consumer.SimpleConsumer.liftedTree1$1(SimpleConsumer.scala:78) ~[kafka_2.10-0.8.2.0.jar:?]
    at kafka.consumer.SimpleConsumer.kafka$consumer$SimpleConsumer$$sendRequest(SimpleConsumer.scala:68) ~[kafka_2.10-0.8.2.0.jar:?]
    at kafka.consumer.SimpleConsumer$$anonfun$fetch$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(SimpleConsumer.scala:112) ~[kafka_2.10-0.8.2.0.jar:?]
    at kafka.consumer.SimpleConsumer$$anonfun$fetch$1$$anonfun$apply$mcV$sp$1.apply(SimpleConsumer.scala:112) ~[kafka_2.10-0.8.2.0.jar:?]
    at kafka.consumer.SimpleConsumer$$anonfun$fetch$1$$anonfun$apply$mcV$sp$1.apply(SimpleConsumer.scala:112) ~[kafka_2.10-0.8.2.0.jar:?]
    at kafka.metrics.KafkaTimer.time(KafkaTimer.scala:33) ~[kafka_2.10-0.8.2.0.jar:?]
    at kafka.consumer.SimpleConsumer$$anonfun$fetch$1.apply$mcV$sp(SimpleConsumer.scala:111) ~[kafka_2.10-0.8.2.0.jar:?]
    at kafka.consumer.SimpleConsumer$$anonfun$fetch$1.apply(SimpleConsumer.scala:111) ~[kafka_2.10-0.8.2.0.jar:?]
    at kafka.consumer.SimpleConsumer$$anonfun$fetch$1.apply(SimpleConsumer.scala:111) ~[kafka_2.10-0.8.2.0.jar:?]
    at kafka.metrics.KafkaTimer.time(KafkaTimer.scala:33) ~[kafka_2.10-0.8.2.0.jar:?]
    at kafka.consumer.SimpleConsumer.fetch(SimpleConsumer.scala:110) ~[kafka_2.10-0.8.2.0.jar:?]
    at kafka.javaapi.consumer.SimpleConsumer.fetch(SimpleConsumer.scala:47) ~[kafka_2.10-0.8.2.0.jar:?]
    at org.apache.storm.kafka.KafkaUtils.fetchMessages(KafkaUtils.java:191) ~[storm-kafka-1.0.3.jar:1.0.3]
{noformat}

And in the console consumer/producer I am getting:

{noformat}
WARN Connection to node 0 could not be established. Broker may not be available. (org.apache.kafka.clients.NetworkClient)
{noformat}

So what should I try doing next?

> When a Kafka Stream can't communicate with the server, it's Status stays 
> RUNNING
> 
>
> Key: KAFKA-6520
> URL: https://issues.apache.org/jira/browse/KAFKA-6520
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Reporter: Michael Kohout
>Assignee: Milind Jain
>Priority: Major
>  Labels: newbie, user-experience
>
> When you execute the following scenario, the application always stays in the 
> RUNNING state:
>   
>  1) start Kafka
>  2) start the app; the app connects to Kafka and starts processing
>  3) kill Kafka (stop the docker container)
>  4) the application gives no indication that it is no longer 
> connected (the stream state is still RUNNING, and the uncaught exception 
> handler isn't invoked)
>   
>   
>  It would be useful if the Stream State had a DISCONNECTED status.
>   
> See 
> [this|https://groups.google.com/forum/#!topic/confluent-platform/nQh2ohgdrIQ] 
> for a discussion from the google user forum.  
> [This|https://issues.apache.org/jira/browse/KAFKA-4564] is a link to a 
> related issue.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (KAFKA-6679) Random corruption (CRC validation issues)

2018-03-18 Thread Ari Uka (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-6679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ari Uka updated KAFKA-6679:
---
Description: 
I'm running into a really strange issue in production. I have 3 brokers, and 
consumers will randomly start to fail with an error message saying the CRC does 
not match. The brokers are all on 1.0.1; the issue started on 0.10.2, and we 
upgraded in the hope that it would fix the issue.

On the Kafka side, I see errors related to this across all 3 brokers:

{noformat}
[2018-03-17 20:59:58,967] ERROR [ReplicaFetcher replicaId=3, leaderId=1, fetcherId=0] Error for partition topic-a-0 to broker 1:org.apache.kafka.common.errors.CorruptRecordException: This message has failed its CRC checksum, exceeds the valid size, or is otherwise corrupt. (kafka.server.ReplicaFetcherThread)
[2018-03-17 20:59:59,411] ERROR [ReplicaManager broker=3] Error processing fetch operation on partition topic-b-0, offset 23848795 (kafka.server.ReplicaManager)
org.apache.kafka.common.errors.CorruptRecordException: Record size is smaller than minimum record overhead (14).
[2018-03-17 20:59:59,411] ERROR [ReplicaManager broker=3] Error processing fetch operation on partition topic-b-0, offset 23848795 (kafka.server.ReplicaManager)
org.apache.kafka.common.errors.CorruptRecordException: Record size is smaller than minimum record overhead (14)
[2018-03-17 20:59:59,490] ERROR [ReplicaFetcher replicaId=3, leaderId=2, fetcherId=0] Error for partition topic-c-2 to broker 2:org.apache.kafka.common.errors.CorruptRecordException: This message has failed its CRC checksum, exceeds the valid size, or is otherwise corrupt. (kafka.server.ReplicaFetcherThread)
{noformat}

To fix this, I have to use the kafka-consumer-groups.sh command-line tool and 
do a binary search until I can find a non-corrupt message and push the offsets 
forward. It's annoying because I can't push to a specific date either: 
kafka-consumer-groups.sh starts to emit the same error, ErrInvalidMessage, CRC 
does not match.

The error popped up again the next day after fixing it, though, so I'm trying to 
find the root cause.

I'm using the Go consumer [https://github.com/Shopify/sarama] and 
[https://github.com/bsm/sarama-cluster].

At first, I thought it could be the consumer libraries, but the error happens 
with kafka-console-consumer.sh as well when a specific message is corrupted in 
Kafka. I don't think it's possible for Kafka producers to push corrupt 
messages to Kafka and then cause all consumers to break, right? I assume Kafka 
would reject corrupt messages, so I'm not sure what's going on here.

Should I just re-create the cluster? I don't think it's hardware failure across 
the 3 machines, though.

  was:
I'm running into a really strange issue on production. I have 3 brokers and 
randomly consumers will start to fail with an error message saying the CRC does 
not match. The brokers are all on 1.0.1, but the issue started on 0.10.2 with 
the hope that upgrading would help fix the issue.

On the kafka side, I see errors related to this across all 3 brokers:

```

[2018-03-17 20:59:58,967] ERROR [ReplicaFetcher replicaId=3, leaderId=1, 
fetcherId=0] Error for partition topic-a-0 to broker 
1:org.apache.kafka.common.errors.CorruptRecordException: This message has 
failed its CRC checksum, exceeds the valid size, or is otherwise corrupt. 
(kafka.server.ReplicaFetcherThread)

[2018-03-17 20:59:59,411] ERROR [ReplicaManager broker=3] Error processing 
fetch operation on partition topic-b-0, offset 23848795 
(kafka.server.ReplicaManager)

org.apache.kafka.common.errors.CorruptRecordException: Record size is smaller 
than minimum record overhead (14).

[2018-03-17 20:59:59,411] ERROR [ReplicaManager broker=3] Error processing 
fetch operation on partition topic-b-0, offset 23848795 
(kafka.server.ReplicaManager)

org.apache.kafka.common.errors.CorruptRecordException: Record size is smaller 
than minimum record overhead (14)

[2018-03-17 20:59:59,490] ERROR [ReplicaFetcher replicaId=3, leaderId=2, 
fetcherId=0] Error for partition topic-c-2 to broker 
2:org.apache.kafka.common.errors.CorruptRecordException: This message has 
failed its CRC checksum, exceeds the valid size, or is otherwise corrupt. 
(kafka.server.ReplicaFetcherThread)

```

 

To fix this, I have to use the kafka-consumer-groups.sh command line tool and 
do a binary search until I can find a non corrupt message and push the offsets 
forward. It's annoying because I can't actually push to a specific date because 
kafka-consumer-groups.sh starts to emit the same error, ErrInvalidMessage, CRC 
does not match.

I'm using the Go consumer [https://github.com/Shopify/sarama] and 
[https://github.com/bsm/sarama-cluster]. 

At first, I thought it could be the consumer libraries, but the error happens 
with kafka-console-consumer.sh as well when a specific message is corrupted in 
Kafka. I don't think it's 

[jira] [Updated] (KAFKA-6679) Random corruption (CRC validation issues)

2018-03-18 Thread Ari Uka (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-6679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ari Uka updated KAFKA-6679:
---
Description: 
I'm running into a really strange issue on production. I have 3 brokers and 
randomly consumers will start to fail with an error message saying the CRC does 
not match. The brokers are all on 1.0.1, but the issue started on 0.10.2 with 
the hope that upgrading would help fix the issue.

On the kafka side, I see errors related to this across all 3 brokers:

```

[2018-03-17 20:59:58,967] ERROR [ReplicaFetcher replicaId=3, leaderId=1, 
fetcherId=0] Error for partition topic-a-0 to broker 
1:org.apache.kafka.common.errors.CorruptRecordException: This message has 
failed its CRC checksum, exceeds the valid size, or is otherwise corrupt. 
(kafka.server.ReplicaFetcherThread)

[2018-03-17 20:59:59,411] ERROR [ReplicaManager broker=3] Error processing 
fetch operation on partition topic-b-0, offset 23848795 
(kafka.server.ReplicaManager)

org.apache.kafka.common.errors.CorruptRecordException: Record size is smaller 
than minimum record overhead (14).

[2018-03-17 20:59:59,411] ERROR [ReplicaManager broker=3] Error processing 
fetch operation on partition topic-b-0, offset 23848795 
(kafka.server.ReplicaManager)

org.apache.kafka.common.errors.CorruptRecordException: Record size is smaller 
than minimum record overhead (14)

[2018-03-17 20:59:59,490] ERROR [ReplicaFetcher replicaId=3, leaderId=2, 
fetcherId=0] Error for partition topic-c-2 to broker 
2:org.apache.kafka.common.errors.CorruptRecordException: This message has 
failed its CRC checksum, exceeds the valid size, or is otherwise corrupt. 
(kafka.server.ReplicaFetcherThread)

```

 

To fix this, I have to use the kafka-consumer-groups.sh command line tool and 
do a binary search until I can find a non corrupt message and push the offsets 
forward. It's annoying because I can't actually push to a specific date because 
kafka-consumer-groups.sh starts to emit the same error, ErrInvalidMessage, CRC 
does not match.

I'm using the Go consumer [https://github.com/Shopify/sarama] and 
[https://github.com/bsm/sarama-cluster]. 

At first, I thought it could be the consumer libraries, but the error happens 
with kafka-console-consumer.sh as well when a specific message is corrupted in 
Kafka. I don't think it's possible for Kafka producers to actually push corrupt 
messages to Kafka and then cause all consumers to break right? I assume Kafka 
would reject corrupt messages, so I'm not sure what's going on here.

  was:
I'm running into a really strange issue on production. I have 3 brokers and 
randomly consumers will start to fail with an error message saying the CRC does 
not match. The brokers are all on 1.0.1, but the issue started on 0.10.2 with 
the hope that upgrading would help fix the issue.

On the kafka side, I see errors related to this across all 3 brokers:

```

[2018-03-17 20:59:58,967] ERROR [ReplicaFetcher replicaId=3, leaderId=1, 
fetcherId=0] Error for partition topic-a-0 to broker 
1:org.apache.kafka.common.errors.CorruptRecordException: This message has 
failed its CRC checksum, exceeds the valid size, or is otherwise corrupt. 
(kafka.server.ReplicaFetcherThread)

[2018-03-17 20:59:59,411] ERROR [ReplicaManager broker=3] Error processing 
fetch operation on partition topic-b-0, offset 23848795 
(kafka.server.ReplicaManager)

org.apache.kafka.common.errors.CorruptRecordException: Record size is smaller 
than minimum record overhead (14).

[2018-03-17 20:59:59,411] ERROR [ReplicaManager broker=3] Error processing 
fetch operation on partition topic-b-0, offset 23848795 
(kafka.server.ReplicaManager)

org.apache.kafka.common.errors.CorruptRecordException: Record size is smaller 
than minimum record overhead (14)

[2018-03-17 20:59:59,490] ERROR [ReplicaFetcher replicaId=3, leaderId=2, 
fetcherId=0] Error for partition topic-c-2 to broker 
2:org.apache.kafka.common.errors.CorruptRecordException: This message has 
failed its CRC checksum, exceeds the valid size, or is otherwise corrupt. 
(kafka.server.ReplicaFetcherThread)

```

 

To fix this, I have to use the kafka-consumer-groups.sh command line tool and 
do a binary search until I can find a non corrupt message and push the offsets 
forward. It's annoying because I can't actually push to a specific date because 
kafka-consumer-groups.sh starts to emit the same error, ErrInvalidMessage, CRC 
does not match.

I'm using the Go consumer [https://github.com/Shopify/sarama] and 
[https://github.com/bsm/sarama-cluster]


> Random corruption (CRC validation issues) 
> --
>
> Key: KAFKA-6679
> URL: https://issues.apache.org/jira/browse/KAFKA-6679
> Project: Kafka
>  Issue Type: Bug
>  Components: consumer, replication
>Affects Versions: 0.10.2.0, 1.0.1
> Environment: FreeBSD 11.0-RELEASE-p8
>

[jira] [Updated] (KAFKA-6679) Random corruption (CRC validation issues)

2018-03-18 Thread Ari Uka (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-6679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ari Uka updated KAFKA-6679:
---
Description: 
I'm running into a really strange issue on production. I have 3 brokers and 
randomly consumers will start to fail with an error message saying the CRC does 
not match. The brokers are all on 1.0.1, but the issue started on 0.10.2 with 
the hope that upgrading would help fix the issue.

On the kafka side, I see errors related to this across all 3 brokers:

```

[2018-03-17 20:59:58,967] ERROR [ReplicaFetcher replicaId=3, leaderId=1, 
fetcherId=0] Error for partition topic-a-0 to broker 
1:org.apache.kafka.common.errors.CorruptRecordException: This message has 
failed its CRC checksum, exceeds the valid size, or is otherwise corrupt. 
(kafka.server.ReplicaFetcherThread)

[2018-03-17 20:59:59,411] ERROR [ReplicaManager broker=3] Error processing 
fetch operation on partition topic-b-0, offset 23848795 
(kafka.server.ReplicaManager)

org.apache.kafka.common.errors.CorruptRecordException: Record size is smaller 
than minimum record overhead (14).

[2018-03-17 20:59:59,411] ERROR [ReplicaManager broker=3] Error processing 
fetch operation on partition topic-b-0, offset 23848795 
(kafka.server.ReplicaManager)

org.apache.kafka.common.errors.CorruptRecordException: Record size is smaller 
than minimum record overhead (14)

[2018-03-17 20:59:59,490] ERROR [ReplicaFetcher replicaId=3, leaderId=2, 
fetcherId=0] Error for partition topic-c-2 to broker 
2:org.apache.kafka.common.errors.CorruptRecordException: This message has 
failed its CRC checksum, exceeds the valid size, or is otherwise corrupt. 
(kafka.server.ReplicaFetcherThread)

```

 

To fix this, I have to use the kafka-consumer-groups.sh command line tool and 
do a binary search until I can find a non corrupt message and push the offsets 
forward. It's annoying because I can't actually push to a specific date because 
kafka-consumer-groups.sh starts to emit the same error, ErrInvalidMessage, CRC 
does not match.

I'm using the Go consumer [https://github.com/Shopify/sarama] and 
[https://github.com/bsm/sarama-cluster]

  was:
I'm running into a really strange issue on production. I have 3 brokers and 
randomly consumers will start to fail with an error message saying the CRC does 
not match. The brokers are all on 1.0.1, but the issue started on 0.10.2 with 
the hope that upgrading would help fix the issue.

On the kafka side, I see errors related to this across all 3 brokers:

```

[2018-03-17 20:59:58,967] ERROR [ReplicaFetcher replicaId=3, leaderId=1, 
fetcherId=0] Error for partition topic-a-0 to broker 
1:org.apache.kafka.common.errors.CorruptRecordException: This message has 
failed its CRC checksum, exceeds the valid size, or is otherwise corrupt. 
(kafka.server.ReplicaFetcherThread)

[2018-03-17 20:59:59,411] ERROR [ReplicaManager broker=3] Error processing 
fetch operation on partition topic-b-0, offset 23848795 
(kafka.server.ReplicaManager)

org.apache.kafka.common.errors.CorruptRecordException: Record size is smaller 
than minimum record overhead (14).

[2018-03-17 20:59:59,411] ERROR [ReplicaManager broker=3] Error processing 
fetch operation on partition telemetry-b-0, offset 23848795 
(kafka.server.ReplicaManager)

org.apache.kafka.common.errors.CorruptRecordException: Record size is smaller 
than minimum record overhead (14)

[2018-03-17 20:59:59,490] ERROR [ReplicaFetcher replicaId=3, leaderId=2, 
fetcherId=0] Error for partition topic-c-2 to broker 
2:org.apache.kafka.common.errors.CorruptRecordException: This message has 
failed its CRC checksum, exceeds the valid size, or is otherwise corrupt. 
(kafka.server.ReplicaFetcherThread)

```

 

To fix this, I have to use the kafka-consumer-groups.sh command line tool and 
do a binary search until I can find a non corrupt message and push the offsets 
forward. It's annoying because I can't actually push to a specific date because 
kafka-consumer-groups.sh starts to emit the same error, ErrInvalidMessage, CRC 
does not match.


I'm using the Go consumer [https://github.com/Shopify/sarama] and 
[https://github.com/bsm/sarama-cluster]


> Random corruption (CRC validation issues) 
> --
>
> Key: KAFKA-6679
> URL: https://issues.apache.org/jira/browse/KAFKA-6679
> Project: Kafka
>  Issue Type: Bug
>  Components: consumer, replication
>Affects Versions: 0.10.2.0, 1.0.1
> Environment: FreeBSD 11.0-RELEASE-p8
>Reporter: Ari Uka
>Priority: Major
>
> I'm running into a really strange issue on production. I have 3 brokers and 
> randomly consumers will start to fail with an error message saying the CRC 
> does not match. The brokers are all on 1.0.1, but the issue started on 0.10.2 
> with the hope that upgrading would help fix the issue.
> On the kafka side, I see errors 

[jira] [Updated] (KAFKA-6679) Random corruption (CRC validation issues)

2018-03-18 Thread Ari Uka (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-6679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ari Uka updated KAFKA-6679:
---
Description: 
I'm running into a really strange issue on production. I have 3 brokers and 
randomly consumers will start to fail with an error message saying the CRC does 
not match. The brokers are all on 1.0.1, but the issue started on 0.10.2 with 
the hope that upgrading would help fix the issue.

On the kafka side, I see errors related to this across all 3 brokers:

```

[2018-03-17 20:59:58,967] ERROR [ReplicaFetcher replicaId=3, leaderId=1, 
fetcherId=0] Error for partition topic-a-0 to broker 
1:org.apache.kafka.common.errors.CorruptRecordException: This message has 
failed its CRC checksum, exceeds the valid size, or is otherwise corrupt. 
(kafka.server.ReplicaFetcherThread)

[2018-03-17 20:59:59,411] ERROR [ReplicaManager broker=3] Error processing 
fetch operation on partition topic-b-0, offset 23848795 
(kafka.server.ReplicaManager)

org.apache.kafka.common.errors.CorruptRecordException: Record size is smaller 
than minimum record overhead (14).

[2018-03-17 20:59:59,411] ERROR [ReplicaManager broker=3] Error processing 
fetch operation on partition telemetry-b-0, offset 23848795 
(kafka.server.ReplicaManager)

org.apache.kafka.common.errors.CorruptRecordException: Record size is smaller 
than minimum record overhead (14)

[2018-03-17 20:59:59,490] ERROR [ReplicaFetcher replicaId=3, leaderId=2, 
fetcherId=0] Error for partition topic-c-2 to broker 
2:org.apache.kafka.common.errors.CorruptRecordException: This message has 
failed its CRC checksum, exceeds the valid size, or is otherwise corrupt. 
(kafka.server.ReplicaFetcherThread)

```

 

To fix this, I have to use the kafka-consumer-groups.sh command line tool and 
do a binary search until I can find a non corrupt message and push the offsets 
forward. It's annoying because I can't actually push to a specific date because 
kafka-consumer-groups.sh starts to emit the same error, ErrInvalidMessage, CRC 
does not match.


I'm using the Go consumer [https://github.com/Shopify/sarama] and 
[https://github.com/bsm/sarama-cluster]

  was:
I'm running into a really strange issue on production. I have 3 brokers and 
randomly consumers will start to fail with an error message saying the CRC does 
not match. The brokers are all on 1.0.1, but the issue started on 0.10.2 with 
the hope that upgrading would help fix the issue.

On the kafka side, I see errors related to this across all 3 brokers:
{noformat}
[2018-03-17 20:59:58,967] ERROR [ReplicaFetcher replicaId=3, leaderId=1, 
fetcherId=0] Error for partition topic-a-0 to broker 
1:org.apache.kafka.common.errors.CorruptRecordException: This message has 
failed its CRC checksum, exceeds the valid size, or is otherwise corrupt. 
(kafka.server.ReplicaFetcherThread)
[2018-03-17 20:59:59,411] ERROR [ReplicaManager broker=3] Error processing 
fetch operation on partition topic-b-0, offset 23848795 
(kafka.server.ReplicaManager)
org.apache.kafka.common.errors.CorruptRecordException: Record size is smaller 
than minimum record overhead (14).
[2018-03-17 20:59:59,411] ERROR [ReplicaManager broker=3] Error processing 
fetch operation on partition telemetry-b-0, offset 23848795 
(kafka.server.ReplicaManager)
org.apache.kafka.common.errors.CorruptRecordException: Record size is smaller 
than minimum record overhead (14)
[2018-03-17 20:59:59,490] ERROR [ReplicaFetcher replicaId=3, leaderId=2, 
fetcherId=0] Error for partition topic-c-2 to broker 
2:org.apache.kafka.common.errors.CorruptRecordException: This message has 
failed its CRC checksum, exceeds the valid size, or is otherwise corrupt. 
(kafka.server.ReplicaFetcherThread) 
{noformat}
 

To fix this, I have to use the kafka-consumer-groups.sh command line tool and 
do a binary search until I can find a non corrupt message and push the offsets 
forward. It's annoying because I can't actually push to a specific date because 
kafka-consumer-groups.sh starts to emit the same error, ErrInvalidMessage, CRC 
does not match.

After pushing the offsets forward again, the issue came up again a few days 
later. I'm unsure of what to do here, there doesn't appear to be a tool to go 
through the logs and scan for corruption and fix it, has anyone ever run into 
this before?

I'm using the Go consumer [https://github.com/Shopify/sarama] and 
[https://github.com/bsm/sarama-cluster]. Is it even possible for Kafka 
producers to push messages to topics with corrupt messages. I thought perhaps 
the consumer logic was broken on my libraries, but the CRC issue also happens 
with the kafka-console-consumer,sh and other command line tools when it happens.


> Random corruption (CRC validation issues) 
> --
>
> Key: KAFKA-6679
> URL: https://issues.apache.org/jira/browse/KAFKA-6679
> Project: Kafka
>  Issue 

[jira] [Updated] (KAFKA-6679) Random corruption (CRC validation issues)

2018-03-18 Thread Ari Uka (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-6679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ari Uka updated KAFKA-6679:
---
Description: 
I'm running into a really strange issue on production. I have 3 brokers and 
randomly consumers will start to fail with an error message saying the CRC does 
not match. The brokers are all on 1.0.1, but the issue started on 0.10.2 with 
the hope that upgrading would help fix the issue.

On the kafka side, I see errors related to this across all 3 brokers:

```

[2018-03-17 20:59:58,967] ERROR [ReplicaFetcher replicaId=3, leaderId=1, 
fetcherId=0] Error for partition topic-a-0 to broker 
1:org.apache.kafka.common.errors.CorruptRecordException: This message has 
failed its CRC checksum, exceeds the valid size, or is otherwise corrupt. 
(kafka.server.ReplicaFetcherThread)

[2018-03-17 20:59:59,411] ERROR [ReplicaManager broker=3] Error processing 
fetch operation on partition topic-b-0, offset 23848795 
(kafka.server.ReplicaManager)

org.apache.kafka.common.errors.CorruptRecordException: Record size is smaller 
than minimum record overhead (14).

[2018-03-17 20:59:59,411] ERROR [ReplicaManager broker=3] Error processing 
fetch operation on partition telemetry-b-0, offset 23848795 
(kafka.server.ReplicaManager)

org.apache.kafka.common.errors.CorruptRecordException: Record size is smaller 
than minimum record overhead (14)

[2018-03-17 20:59:59,490] ERROR [ReplicaFetcher replicaId=3, leaderId=2, 
fetcherId=0] Error for partition topic-c-2 to broker 
2:org.apache.kafka.common.errors.CorruptRecordException: This message has 
failed its CRC checksum, exceeds the valid size, or is otherwise corrupt. 
(kafka.server.ReplicaFetcherThread)

```

 

To fix this, I have to use the kafka-consumer-groups.sh command line tool and 
do a binary search until I can find a non corrupt message and push the offsets 
forward. It's annoying because I can't actually push to a specific date because 
kafka-consumer-groups.sh starts to emit the same error, ErrInvalidMessage, CRC 
does not match.

After pushing the offsets forward again, the issue came up again a few days 
later. I'm unsure of what to do here, there doesn't appear to be a tool to go 
through the logs and scan for corruption and fix it, has anyone ever run into 
this before?


I'm using the Go consumer [https://github.com/Shopify/sarama] and 
[https://github.com/bsm/sarama-cluster]. Is it even possible for Kafka 
producers to push messages to topics with corrupt messages. I thought perhaps 
the consumer logic was broken on my libraries, but the CRC issue also happens 
with the kafka-console-consumer,sh and other command line tools when it happens.

> Random corruption (CRC validation issues) 
> --
>
> Key: KAFKA-6679
> URL: https://issues.apache.org/jira/browse/KAFKA-6679
> Project: Kafka
>  Issue Type: Bug
>  Components: consumer, replication
>Affects Versions: 0.10.2.0, 1.0.1
> Environment: FreeBSD 11.0-RELEASE-p8
>Reporter: Ari Uka
>Priority: Major
>
> I'm running into a really strange issue on production. I have 3 brokers and 
> randomly consumers will start to fail with an error message saying the CRC 
> does not match. The brokers are all on 1.0.1, but the issue started on 0.10.2 
> with the hope that upgrading would help fix the issue.
> On the kafka side, I see errors related to this across all 3 brokers:
> ```
> [2018-03-17 20:59:58,967] ERROR [ReplicaFetcher replicaId=3, leaderId=1, 
> fetcherId=0] Error for partition topic-a-0 to broker 
> 1:org.apache.kafka.common.errors.CorruptRecordException: This message has 
> failed its CRC checksum, exceeds the valid size, or is otherwise corrupt. 
> (kafka.server.ReplicaFetcherThread)
> [2018-03-17 20:59:59,411] ERROR [ReplicaManager broker=3] Error processing 
> fetch operation on partition topic-b-0, offset 23848795 
> (kafka.server.ReplicaManager)
> org.apache.kafka.common.errors.CorruptRecordException: Record size is smaller 
> than minimum record overhead (14).
> [2018-03-17 20:59:59,411] ERROR [ReplicaManager broker=3] Error processing 
> fetch operation on partition telemetry-b-0, offset 23848795 
> (kafka.server.ReplicaManager)
> org.apache.kafka.common.errors.CorruptRecordException: Record size is smaller 
> than minimum record overhead (14)
> [2018-03-17 20:59:59,490] ERROR [ReplicaFetcher replicaId=3, leaderId=2, 
> fetcherId=0] Error for partition topic-c-2 to broker 
> 2:org.apache.kafka.common.errors.CorruptRecordException: This message has 
> failed its CRC checksum, exceeds the valid size, or is otherwise corrupt. 
> (kafka.server.ReplicaFetcherThread)
> ```
>  
> To fix this, I have to use the kafka-consumer-groups.sh command line tool and 
> do a binary search until I can find a non corrupt message and push the 
> offsets forward. It's annoying because I can't actually 

[jira] [Updated] (KAFKA-6679) Random corruption (CRC validation issues)

2018-03-18 Thread Ari Uka (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-6679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ari Uka updated KAFKA-6679:
---
Description: 
I'm running into a really strange issue on production. I have 3 brokers and 
randomly consumers will start to fail with an error message saying the CRC does 
not match. The brokers are all on 1.0.1, but the issue started on 0.10.2 with 
the hope that upgrading would help fix the issue.

On the kafka side, I see errors related to this across all 3 brokers:
{noformat}
[2018-03-17 20:59:58,967] ERROR [ReplicaFetcher replicaId=3, leaderId=1, 
fetcherId=0] Error for partition topic-a-0 to broker 
1:org.apache.kafka.common.errors.CorruptRecordException: This message has 
failed its CRC checksum, exceeds the valid size, or is otherwise corrupt. 
(kafka.server.ReplicaFetcherThread)
[2018-03-17 20:59:59,411] ERROR [ReplicaManager broker=3] Error processing 
fetch operation on partition topic-b-0, offset 23848795 
(kafka.server.ReplicaManager)
org.apache.kafka.common.errors.CorruptRecordException: Record size is smaller 
than minimum record overhead (14).
[2018-03-17 20:59:59,411] ERROR [ReplicaManager broker=3] Error processing 
fetch operation on partition telemetry-b-0, offset 23848795 
(kafka.server.ReplicaManager)
org.apache.kafka.common.errors.CorruptRecordException: Record size is smaller 
than minimum record overhead (14)
[2018-03-17 20:59:59,490] ERROR [ReplicaFetcher replicaId=3, leaderId=2, 
fetcherId=0] Error for partition topic-c-2 to broker 
2:org.apache.kafka.common.errors.CorruptRecordException: This message has 
failed its CRC checksum, exceeds the valid size, or is otherwise corrupt. 
(kafka.server.ReplicaFetcherThread) 
{noformat}
 

To fix this, I have to use the kafka-consumer-groups.sh command line tool and 
do a binary search until I can find a non corrupt message and push the offsets 
forward. It's annoying because I can't actually push to a specific date because 
kafka-consumer-groups.sh starts to emit the same error, ErrInvalidMessage, CRC 
does not match.

After pushing the offsets forward again, the issue came up again a few days 
later. I'm unsure of what to do here, there doesn't appear to be a tool to go 
through the logs and scan for corruption and fix it, has anyone ever run into 
this before?

I'm using the Go consumer [https://github.com/Shopify/sarama] and 
[https://github.com/bsm/sarama-cluster]. Is it even possible for Kafka 
producers to push messages to topics with corrupt messages. I thought perhaps 
the consumer logic was broken on my libraries, but the CRC issue also happens 
with the kafka-console-consumer,sh and other command line tools when it happens.

  was:
I'm running into a really strange issue on production. I have 3 brokers and 
randomly consumers will start to fail with an error message saying the CRC does 
not match. The brokers are all on 1.0.1, but the issue started on 0.10.2 with 
the hope that upgrading would help fix the issue.

On the kafka side, I see errors related to this across all 3 brokers:

```

[2018-03-17 20:59:58,967] ERROR [ReplicaFetcher replicaId=3, leaderId=1, 
fetcherId=0] Error for partition topic-a-0 to broker 
1:org.apache.kafka.common.errors.CorruptRecordException: This message has 
failed its CRC checksum, exceeds the valid size, or is otherwise corrupt. 
(kafka.server.ReplicaFetcherThread)

[2018-03-17 20:59:59,411] ERROR [ReplicaManager broker=3] Error processing 
fetch operation on partition topic-b-0, offset 23848795 
(kafka.server.ReplicaManager)

org.apache.kafka.common.errors.CorruptRecordException: Record size is smaller 
than minimum record overhead (14).

[2018-03-17 20:59:59,411] ERROR [ReplicaManager broker=3] Error processing 
fetch operation on partition telemetry-b-0, offset 23848795 
(kafka.server.ReplicaManager)

org.apache.kafka.common.errors.CorruptRecordException: Record size is smaller 
than minimum record overhead (14)

[2018-03-17 20:59:59,490] ERROR [ReplicaFetcher replicaId=3, leaderId=2, 
fetcherId=0] Error for partition topic-c-2 to broker 
2:org.apache.kafka.common.errors.CorruptRecordException: This message has 
failed its CRC checksum, exceeds the valid size, or is otherwise corrupt. 
(kafka.server.ReplicaFetcherThread)

```

 

To fix this, I have to use the kafka-consumer-groups.sh command line tool and 
do a binary search until I can find a non corrupt message and push the offsets 
forward. It's annoying because I can't actually push to a specific date because 
kafka-consumer-groups.sh starts to emit the same error, ErrInvalidMessage, CRC 
does not match.

After pushing the offsets forward again, the issue came up again a few days 
later. I'm unsure of what to do here, there doesn't appear to be a tool to go 
through the logs and scan for corruption and fix it, has anyone ever run into 
this before?


I'm using the Go consumer [https://github.com/Shopify/sarama] and 

[jira] [Created] (KAFKA-6679) Random corruption (CRC validation issues)

2018-03-18 Thread Ari Uka (JIRA)
Ari Uka created KAFKA-6679:
--

 Summary: Random corruption (CRC validation issues) 
 Key: KAFKA-6679
 URL: https://issues.apache.org/jira/browse/KAFKA-6679
 Project: Kafka
  Issue Type: Bug
  Components: consumer, replication
Affects Versions: 1.0.1, 0.10.2.0
 Environment: FreeBSD 11.0-RELEASE-p8
Reporter: Ari Uka






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
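For readers who need the workaround described in the reports above: a minimal, hypothetical sketch of "pushing the offsets forward" with the plain Java consumer, as an alternative to the kafka-consumer-groups.sh binary search. The broker address, group id, topic, partition, and target offset are placeholders loosely based on the quoted logs, not values from the ticket's resolution.

{code}
import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;

public class SkipCorruptRecord {
    public static void main(final String[] args) {
        final Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");  // placeholder broker
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "my-consumer-group");        // placeholder group id
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.ByteArrayDeserializer");
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.ByteArrayDeserializer");

        // Partition and offset modelled on the broker logs quoted above.
        final TopicPartition tp = new TopicPartition("topic-b", 0);

        try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props)) {
            consumer.assign(Collections.singletonList(tp));
            // Move the group's committed offset just past the record that fails CRC validation.
            consumer.commitSync(Collections.singletonMap(tp, new OffsetAndMetadata(23848796L)));
        }
    }
}
{code}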


[jira] [Updated] (KAFKA-6678) Upgrade dependencies with later release versions

2018-03-18 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-6678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated KAFKA-6678:
--
Attachment: k-update.txt

> Upgrade dependencies with later release versions
> 
>
> Key: KAFKA-6678
> URL: https://issues.apache.org/jira/browse/KAFKA-6678
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Ted Yu
>Priority: Major
> Attachments: k-update.txt
>
>
> {code}
> The following dependencies have later release versions:
>  - net.sourceforge.argparse4j:argparse4j [0.7.0 -> 0.8.1]
>  - org.bouncycastle:bcpkix-jdk15on [1.58 -> 1.59]
>  - com.puppycrawl.tools:checkstyle [6.19 -> 8.8]
>  - org.owasp:dependency-check-gradle [3.0.2 -> 3.1.1]
>  - org.ajoberstar:grgit [1.9.3 -> 2.1.1]
>  - org.glassfish.jersey.containers:jersey-container-servlet [2.25.1 -> 2.26]
>  - org.eclipse.jetty:jetty-client [9.2.24.v20180105 -> 9.4.8.v20171121]
>  - org.eclipse.jetty:jetty-server [9.2.24.v20180105 -> 9.4.8.v20171121]
>  - org.eclipse.jetty:jetty-servlet [9.2.24.v20180105 -> 9.4.8.v20171121]
>  - org.eclipse.jetty:jetty-servlets [9.2.24.v20180105 -> 9.4.8.v20171121]
>  - org.openjdk.jmh:jmh-core [1.19 -> 1.20]
>  - org.openjdk.jmh:jmh-core-benchmarks [1.19 -> 1.20]
>  - org.openjdk.jmh:jmh-generator-annprocess [1.19 -> 1.20]
>  - org.lz4:lz4-java [1.4 -> 1.4.1]
>  - org.apache.maven:maven-artifact [3.5.2 -> 3.5.3]
>  - org.jacoco:org.jacoco.agent [0.7.9 -> 0.8.0]
>  - org.jacoco:org.jacoco.ant [0.7.9 -> 0.8.0]
>  - org.rocksdb:rocksdbjni [5.7.3 -> 5.11.3]
>  - org.scala-lang:scala-library [2.11.12 -> 2.12.4]
>  - com.typesafe.scala-logging:scala-logging_2.11 [3.7.2 -> 3.8.0]
>  - org.scala-lang:scala-reflect [2.11.12 -> 2.12.4]
>  - org.scalatest:scalatest_2.11 [3.0.4 -> 3.0.5]
> {code}
> Looks like we can consider upgrading scalatest, jmh-core and checkstyle



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (KAFKA-6678) Upgrade dependencies with later release versions

2018-03-18 Thread Ted Yu (JIRA)
Ted Yu created KAFKA-6678:
-

 Summary: Upgrade dependencies with later release versions
 Key: KAFKA-6678
 URL: https://issues.apache.org/jira/browse/KAFKA-6678
 Project: Kafka
  Issue Type: Improvement
Reporter: Ted Yu


{code}
The following dependencies have later release versions:
 - net.sourceforge.argparse4j:argparse4j [0.7.0 -> 0.8.1]
 - org.bouncycastle:bcpkix-jdk15on [1.58 -> 1.59]
 - com.puppycrawl.tools:checkstyle [6.19 -> 8.8]
 - org.owasp:dependency-check-gradle [3.0.2 -> 3.1.1]
 - org.ajoberstar:grgit [1.9.3 -> 2.1.1]
 - org.glassfish.jersey.containers:jersey-container-servlet [2.25.1 -> 2.26]
 - org.eclipse.jetty:jetty-client [9.2.24.v20180105 -> 9.4.8.v20171121]
 - org.eclipse.jetty:jetty-server [9.2.24.v20180105 -> 9.4.8.v20171121]
 - org.eclipse.jetty:jetty-servlet [9.2.24.v20180105 -> 9.4.8.v20171121]
 - org.eclipse.jetty:jetty-servlets [9.2.24.v20180105 -> 9.4.8.v20171121]
 - org.openjdk.jmh:jmh-core [1.19 -> 1.20]
 - org.openjdk.jmh:jmh-core-benchmarks [1.19 -> 1.20]
 - org.openjdk.jmh:jmh-generator-annprocess [1.19 -> 1.20]
 - org.lz4:lz4-java [1.4 -> 1.4.1]
 - org.apache.maven:maven-artifact [3.5.2 -> 3.5.3]
 - org.jacoco:org.jacoco.agent [0.7.9 -> 0.8.0]
 - org.jacoco:org.jacoco.ant [0.7.9 -> 0.8.0]
 - org.rocksdb:rocksdbjni [5.7.3 -> 5.11.3]
 - org.scala-lang:scala-library [2.11.12 -> 2.12.4]
 - com.typesafe.scala-logging:scala-logging_2.11 [3.7.2 -> 3.8.0]
 - org.scala-lang:scala-reflect [2.11.12 -> 2.12.4]
 - org.scalatest:scalatest_2.11 [3.0.4 -> 3.0.5]
{code}
Looks like we can consider upgrading scalatest, jmh-core and checkstyle



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (KAFKA-6659) Improve error message if state store is not found

2018-03-18 Thread Stuart (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-6659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16403894#comment-16403894
 ] 

Stuart commented on KAFKA-6659:
---

Hi, I would like to pick this Jira up. I have found the relevant code section in 
ProcessorContextImpl and wanted some advice on the kind of message we want. Do we 
want to enhance the message to something like "... has no access to StateStore 
questions-awaiting-answers-store, as the store is not connected to the 
processor"?

Any help on the wording of the hint would be great.

 

> Improve error message if state store is not found
> -
>
> Key: KAFKA-6659
> URL: https://issues.apache.org/jira/browse/KAFKA-6659
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Reporter: Matthias J. Sax
>Priority: Trivial
>  Labels: beginner, easy-fix, newbie
>
> If a processor tries to access a store but the store is not connected to the 
> processor, Streams fails with
> {quote}Caused by: org.apache.kafka.streams.errors.TopologyBuilderException: 
> Invalid topology building: Processor KSTREAM-TRANSFORM-36 has no 
> access to StateStore questions-awaiting-answers-store
> {quote}
> We should improve this error message and give the user a hint on how to fix 
> the issue.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)