[jira] [Commented] (KAFKA-1466) Kafka server is hung after throwing "Attempt to swap the new high watermark file with the old one failed"
[ https://issues.apache.org/jira/browse/KAFKA-1466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15606600#comment-15606600 ] Arup Malakar commented on KAFKA-1466:
--------------------------------------

I think we may have found the cause of this issue on our systems (after two and a half years). The chef recipe used to deploy the server had a bug: it was deleting the configuration directory and recreating it. Luckily, thanks to Kafka's replication, it wasn't a problem, and we didn't see issues beyond the occasional exception in our logs.
[jira] [Commented] (KAFKA-1479) Logs filling up while Kafka ReplicaFetcherThread tries to retrieve partition info for deleted topics
[ https://issues.apache.org/jira/browse/KAFKA-1479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14249071#comment-14249071 ] Arup Malakar commented on KAFKA-1479: - For people who may stumble upon this JIRA, the steps mentioned by [~manasi] in https://issues.apache.org/jira/browse/KAFKA-1479?focusedCommentId=14017044&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14017044 worked for me as well. > Logs filling up while Kafka ReplicaFetcherThread tries to retrieve partition > info for deleted topics > > > Key: KAFKA-1479 > URL: https://issues.apache.org/jira/browse/KAFKA-1479 > Project: Kafka > Issue Type: Bug > Components: log >Affects Versions: 0.8.1 > Environment: CentOS >Reporter: Manasi Manasi > > Started noticing that logs are filling up fast with lines like this: > {quote} > [2014-06-01 15:18:08,218] WARN [KafkaApi-2] Fetch request with correlation id > 10049 from client ReplicaFetcherThread-0-2 on partition [sams_2014-05-27,26] > failed due to Topic sams_2014-05-27 either doesn't exist or is in the process > of being deleted (kafka.server.KafkaApis) > [2014-06-01 15:18:08,218] WARN [KafkaApi-2] Fetch request with correlation id > 10049 from client ReplicaFetcherThread-0-2 on partition [sams_2014-05-28,38] > failed due to Topic sams_2014-05-28 either doesn't exist or is in the process > of being deleted (kafka.server.KafkaApis) > [2014-06-01 15:18:08,219] WARN [KafkaApi-2] Fetch request with correlation id > 10049 from client ReplicaFetcherThread-0-2 on partition [sams_2014-05-30,20] > failed due to Topic sams_2014-05-30 either doesn't exist or is in the process > of being deleted (kafka.server.KafkaApis) > [2014-06-01 15:18:08,219] WARN [KafkaApi-2] Fetch request with correlation id > 10049 from client ReplicaFetcherThread-0-2 on partition [sams_2014-05-22,46] > failed due to Topic sams_2014-05-22 either doesn't exist or is in the process > of being deleted (kafka.server.KafkaApis) > [2014-06-01 15:18:08,219] WARN [KafkaApi-2] Fetch request with correlation id > 10049 from client ReplicaFetcherThread-0-2 on partition [sams_2014-05-27,8] > failed due to Topic sams_2014-05-27 either doesn't exist or is in the process > of being deleted (kafka.server.KafkaApis) > {quote} > The above is from kafkaServer.out. 
Also seeing errors in server.log: > {quote} > [2014-06-01 15:23:52,788] ERROR [ReplicaFetcherThread-0-0], Error for > partition [sams_2014-05-26,19] to broker 0:class > kafka.common.UnknownTopicOrPartitionException > (kafka.server.ReplicaFetcherThread) > [2014-06-01 15:23:52,788] WARN [KafkaApi-2] Fetch request with correlation id > 10887 from client ReplicaFetcherThread-0-2 on partition [sams_2014-05-30,4] > failed due to Topic sams_2014-05-30 either doesn't exist or is in the process > of being deleted (kafka.server.KafkaApis) > [2014-06-01 15:23:52,788] ERROR [ReplicaFetcherThread-0-0], Error for > partition [sams_2014-05-24,34] to broker 0:class > kafka.common.UnknownTopicOrPartitionException > (kafka.server.ReplicaFetcherThread) > [2014-06-01 15:23:52,788] ERROR [ReplicaFetcherThread-0-0], Error for > partition [sams_2014-05-26,41] to broker 0:class > kafka.common.UnknownTopicOrPartitionException > (kafka.server.ReplicaFetcherThread) > [2014-06-01 15:23:52,788] WARN [KafkaApi-2] Fetch request with correlation id > 10887 from client ReplicaFetcherThread-0-2 on partition [2014-05-21,0] failed > due to Topic 2014-05-21 either doesn't exist or is in the process of being > deleted (kafka.server.KafkaApis) > [2014-06-01 15:23:52,788] ERROR [ReplicaFetcherThread-0-0], Error for > partition [sams_2014-05-28,42] to broker 0:class > kafka.common.UnknownTopicOrPartitionException > (kafka.server.ReplicaFetcherThread) > [2014-06-01 15:23:52,788] ERROR [ReplicaFetcherThread-0-0], Error for > partition [sams_2014-05-22,21] to broker 0:class > kafka.common.UnknownTopicOrPartitionException > (kafka.server.ReplicaFetcherThread) > [2014-06-01 15:23:52,788] WARN [KafkaApi-2] Fetch request with correlation id > 10887 from client ReplicaFetcherThread-0-2 on partition [sams_2014-05-20,26] > failed due to Topic sams_2014-05-20 either doesn't exist or is in the process > of being deleted (kafka.server.KafkaApis) > {quote} > All these partitions belong to deleted topics. Nothing changed on our end > when we started noticing these logs filling up. Any ideas what is going on? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
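The steps in the linked comment are not reproduced above. For orientation only: the manual cleanup usually suggested for phantom deleted topics on 0.8.1 boils down to removing the stale topic znodes from ZooKeeper so the replica fetchers stop asking for those partitions. A minimal sketch of that idea, assuming the I0Itec ZkClient that Kafka 0.8 ships with; the ensemble address and topic name are placeholders, and this is not necessarily what the linked comment describes:

{code}
import org.I0Itec.zkclient.ZkClient
import org.I0Itec.zkclient.serialize.BytesPushThroughSerializer

// Hedged sketch only. Deleting znodes by hand is destructive; stop the
// affected consumers/brokers as appropriate and back up first.
object PhantomTopicCleanup {
  def main(args: Array[String]): Unit = {
    val zk = new ZkClient("localhost:2181", 30000, 30000, new BytesPushThroughSerializer) // placeholder ensemble
    try {
      val topic = "sams_2014-05-27" // one of the deleted topics from the log above
      // Remove the topic's metadata so fetchers stop requesting its partitions...
      zk.deleteRecursive("/brokers/topics/" + topic)
      // ...and any marker left behind by the unfinished pre-0.8.2 delete-topic path.
      zk.deleteRecursive("/admin/delete_topics/" + topic)
    } finally {
      zk.close()
    }
  }
}
{code}

A rolling restart of the brokers after the cleanup is typically what makes the warnings stop.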
[jira] [Commented] (KAFKA-1466) Kafka server is hung after throwing "Attempt to swap the new high watermark file with the old one failed"
[ https://issues.apache.org/jira/browse/KAFKA-1466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14012677#comment-14012677 ] Arup Malakar commented on KAFKA-1466:
--------------------------------------

[~junrao] No, it wasn't wrapped in any container. Unfortunately I couldn't take a thread dump before the server was restarted. It looks like we don't have any more info to proceed; one option could be to try to reproduce it by playing with the file permissions of the watermark file. I will update the JIRA if I can get to it. Hopefully the discussion here will be useful to others who happen to face the same issue.
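On the reproduce-by-permissions idea: on POSIX systems a rename fails when the process lacks write permission on the containing directory, not on the file itself. A hedged sketch of forcing File.renameTo to return false the way the broker's checkpoint swap did; the paths are hypothetical and it assumes a non-root user (root bypasses the permission check):

{code}
import java.io.File

// Repro sketch, not Kafka code: make the directory unwritable so the swap fails.
object RenameFailureRepro {
  def main(args: Array[String]): Unit = {
    val dir = new File("/tmp/hw-repro") // hypothetical directory
    dir.mkdirs()
    val tmp = new File(dir, "highwatermark.tmp")
    val hw = new File(dir, "highwatermark")
    tmp.createNewFile()
    hw.createNewFile()
    dir.setWritable(false) // rename(2) needs write permission on the directory
    val swapped = tmp.renameTo(hw) // expect false -> the FATAL path in KAFKA-1466
    dir.setWritable(true)
    println("swap succeeded: " + swapped)
  }
}
{code}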
[jira] [Commented] (KAFKA-1466) Kafka server is hung after throwing "Attempt to swap the new high watermark file with the old one failed"
[ https://issues.apache.org/jira/browse/KAFKA-1466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14011913#comment-14011913 ] Arup Malakar commented on KAFKA-1466:
--------------------------------------

[~jkreps] I am running kafka-0.8, so I assume it is using HighwaterMarkCheckpoint.scala. In the code snippet you can see that the process is supposed to exit when the swap fails:
{code:java}
if(!tempHwFile.renameTo(hwFile)) {
  // renameTo() fails on Windows if the destination file exists.
  hwFile.delete()
  if(!tempHwFile.renameTo(hwFile)) {
    fatal("Attempt to swap the new high watermark file with the old one failed")
    System.exit(1)
  }
}
{code}
It is quite possible that the disk was full, so throwing the error was fine. What concerned me was that the process didn't die and that the rest of the cluster didn't continue working.
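For comparison only (this is not Kafka's code): on JDK 7+ the swap can be expressed as a single atomic rename via java.nio, which avoids the delete-then-retry window in the snippet above. A minimal sketch with placeholder paths:

{code}
import java.nio.file.{AtomicMoveNotSupportedException, Files, Paths, StandardCopyOption}

object AtomicSwapSketch {
  def main(args: Array[String]): Unit = {
    val tmp = Paths.get("/tmp/highwatermark.tmp") // placeholder paths
    val hw = Paths.get("/tmp/highwatermark")
    Files.write(tmp, "0\n".getBytes("UTF-8"))
    try {
      // On POSIX this is one rename(2): it replaces the target atomically,
      // so there is no window in which the checkpoint file does not exist.
      Files.move(tmp, hw, StandardCopyOption.ATOMIC_MOVE)
    } catch {
      case _: AtomicMoveNotSupportedException =>
        // e.g. tmp and hw on different file systems: fall back to a plain replace.
        Files.move(tmp, hw, StandardCopyOption.REPLACE_EXISTING)
    }
  }
}
{code}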
[jira] [Updated] (KAFKA-1466) Kafka server is hung after throwing "Attempt to swap the new high watermark file with the old one failed"
[ https://issues.apache.org/jira/browse/KAFKA-1466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arup Malakar updated KAFKA-1466:
--------------------------------

    Attachment: kafka.log.1

[~jjkoshy] Some more info:

1. The topic we use for actual prod messages is not test_topic and has *two replicas*. We were unable to push messages to that topic, so the cluster was indeed unavailable:
{code}
topic: staging_thrift_streaming  partition: 0  leader: 2  replicas: 4,2  isr: 2,4
topic: staging_thrift_streaming  partition: 1  leader: 1  replicas: 1,3  isr: 1,3
topic: staging_thrift_streaming  partition: 2  leader: 2  replicas: 2,4  isr: 2,4
..
{code}

2. More info:

Java version:
{code}
java -version
java version "1.7.0_51"
Java(TM) SE Runtime Environment (build 1.7.0_51-b13)
Java HotSpot(TM) 64-Bit Server VM (build 24.51-b03, mixed mode)
{code}

OS version:
{code}
~# uname -a
Linux ip-X-X-X-X 3.2.0-51-virtual #77-Ubuntu SMP Wed Jul 24 20:38:32 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux
~# lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 12.04 LTS
Release:        12.04
Codename:       precise
{code}

I couldn't find anything strange in the kernel logs, though. I am attaching the kafka logs here.
[jira] [Commented] (KAFKA-1466) Kafka server is hung after throwing "Attempt to swap the new high watermark file with the old one failed"
[ https://issues.apache.org/jira/browse/KAFKA-1466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14007651#comment-14007651 ] Arup Malakar commented on KAFKA-1466:
--------------------------------------

[~fancyrao] I couldn't reproduce the error. It went away the moment I restarted the kafka server, and I haven't seen it since. There are two parts to the issue:
- We expected the rest of the kafka cluster to continue functioning despite the failure of one node, but it didn't; we were unable to push any message to the cluster.
- The kafka node which threw the error should have died after logging the fatal error so that it would be restarted by upstart/init.d. Instead it was just hung, and upstart was not aware of it.

I would be happy to provide any other details that may be helpful in finding out the issue.
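One plausible reading of "FATAL logged, shutdown messages printed, process still alive" is that System.exit(1) ran the JVM shutdown hooks and one of them blocked, so the VM never terminated. That is an assumption, not a confirmed diagnosis, and the sketch below is not Kafka's code; it just shows the failure mode and a fatal path that cannot hang this way:

{code}
object FatalExitSketch {
  // System.exit runs registered shutdown hooks and waits for them; a hook
  // that blocks (for example, waiting on a thread that is itself waiting on
  // the exiting thread) leaves the process alive exactly as described above.
  // Runtime.halt skips the hooks entirely, so init/upstart sees the death.
  def fatal(message: String): Nothing = {
    System.err.println("FATAL: " + message)
    Runtime.getRuntime.halt(1)
    throw new IllegalStateException("unreachable") // halt never returns
  }

  def main(args: Array[String]): Unit = {
    Runtime.getRuntime.addShutdownHook(new Thread(new Runnable {
      def run(): Unit = Thread.sleep(Long.MaxValue) // a hook like this would hang System.exit
    }))
    fatal("Attempt to swap the new high watermark file with the old one failed")
  }
}
{code}

The trade-off is that halt skips legitimate cleanup too, which is why it is usually reserved for exactly this kind of last-resort fatal path.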
[jira] [Commented] (KAFKA-1466) Kafka server is hung after throwing "Attempt to swap the new high watermark file with the old one failed"
[ https://issues.apache.org/jira/browse/KAFKA-1466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14006813#comment-14006813 ] Arup Malakar commented on KAFKA-1466:
--------------------------------------

No, it is on Linux.
[jira] [Created] (KAFKA-1466) Kafka server is hung after throwing "Attempt to swap the new high watermark file with the old one failed"
Arup Malakar created KAFKA-1466:
--------------------------------

    Summary: Kafka server is hung after throwing "Attempt to swap the new high watermark file with the old one failed"
    Key: KAFKA-1466
    URL: https://issues.apache.org/jira/browse/KAFKA-1466
    Project: Kafka
    Issue Type: Bug
    Affects Versions: 0.8.0
    Reporter: Arup Malakar

We have a kafka cluster of four nodes. The cluster went down after one of the nodes threw the following error: 2014-05-21 23:19:44 FATAL [highwatermark-checkpoint-thread1]: HighwaterMarkCheckpoint:109 - Attempt to swap the new high watermark file with the old one failed. I saw the following messages in the log file of the failed node:
{code}
2014-05-21 23:19:44 FATAL [highwatermark-checkpoint-thread1]: HighwaterMarkCheckpoint:109 - Attempt to swap the new high watermark file with the old one failed
2014-05-21 23:19:44 INFO [Thread-1]: KafkaServer:67 - [Kafka Server 4], Shutting down
2014-05-21 23:19:44 INFO [Thread-1]: KafkaZooKeeper:67 - Closing zookeeper client...
2014-05-21 23:19:44 INFO [ZkClient-EventThread-21-zoo-c2n1.us-east-1.ooyala.com,zoo-c2n2.us-east-1.ooyala.com,zoo-c2n3.us-east-1.ooyala.com,zoo-c2n4.us-east-1.ooyala.com,zoo-c2n5.us-east-1.ooyala.com]: ZkEventThread:82 - Terminate ZkClient event thread.
2014-05-21 23:19:44 INFO [main-EventThread]: ClientCnxn:521 - EventThread shut down
2014-05-21 23:19:44 INFO [Thread-1]: ZooKeeper:544 - Session: 0x1456b562865b172 closed
2014-05-21 23:19:44 INFO [kafka-processor-9092-0]: Processor:67 - Closing socket connection to /10.245.173.136.
2014-05-21 23:19:44 INFO [Thread-1]: SocketServer:67 - [Socket Server on Broker 4], Shutting down
2014-05-21 23:19:44 INFO [Thread-1]: SocketServer:67 - [Socket Server on Broker 4], Shutdown completed
2014-05-21 23:19:44 INFO [Thread-1]: KafkaRequestHandlerPool:67 - [Kafka Request Handler on Broker 4], shutting down
2014-05-21 23:19:44 INFO [Thread-1]: KafkaRequestHandlerPool:67 - [Kafka Request Handler on Broker 4], shutted down completely
2014-05-21 23:19:44 INFO [Thread-1]: KafkaScheduler:67 - Shutdown Kafka scheduler
2014-05-21 23:19:45 INFO [Thread-1]: ReplicaManager:67 - [Replica Manager on Broker 4]: Shut down
2014-05-21 23:19:45 INFO [Thread-1]: ReplicaFetcherManager:67 - [ReplicaFetcherManager on broker 4] shutting down
2014-05-21 23:19:45 INFO [Thread-1]: ReplicaFetcherThread:67 - [ReplicaFetcherThread-0-3], Shutting down
2014-05-21 23:19:45 INFO [ReplicaFetcherThread-0-3]: ReplicaFetcherThread:67 - [ReplicaFetcherThread-0-3], Stopped
2014-05-21 23:19:45 INFO [Thread-1]: ReplicaFetcherThread:67 - [ReplicaFetcherThread-0-3], Shutdown completed
2014-05-21 23:19:45 INFO [Thread-1]: ReplicaFetcherThread:67 - [ReplicaFetcherThread-0-2], Shutting down
2014-05-21 23:19:45 INFO [ReplicaFetcherThread-0-2]: ReplicaFetcherThread:67 - [ReplicaFetcherThread-0-2], Stopped
2014-05-21 23:19:45 INFO [Thread-1]: ReplicaFetcherThread:67 - [ReplicaFetcherThread-0-2], Shutdown completed
2014-05-21 23:19:45 INFO [Thread-1]: ReplicaFetcherManager:67 - [ReplicaFetcherManager on broker 4] shutdown completed
{code}
I noticed that after this error was logged there were no further entries in the log file, but the process was still alive, so I guess it was hung. The other nodes in the cluster were not able to recover from this error. The partitions owned by the failed node had their leader set to -1:
{code}
topic: test_topic  partition: 8  leader: -1  replicas: 4  isr:
{code}
And the other nodes were continuously logging the following errors in the log file:
{code}
2014-05-22 20:03:28 ERROR [kafka-request-handler-7]: KafkaApis:102 - [KafkaApi-3] Error while fetching metadata for partition [test_topic,8]
kafka.common.LeaderNotAvailableException: Leader not available for partition [test_topic,8]
        at kafka.server.KafkaApis$$anonfun$17$$anonfun$20.apply(KafkaApis.scala:474)
        at kafka.server.KafkaApis$$anonfun$17$$anonfun$20.apply(KafkaApis.scala:462)
        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:206)
        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:206)
        at scala.collection.LinearSeqOptimized$class.foreach(LinearSeqOptimized.scala:61)
        at scala.collection.immutable.List.foreach(List.scala:45)
        at scala.collection.TraversableLike$class.map(TraversableLike.scala:206)
        at scala.collection.immutable.List.map(List.scala:45)
        at kafka.server.KafkaApis$$anonfun$17.apply(KafkaApis.scala:462)
        at kafka.server.KafkaApis$$anonfun$17.apply(KafkaApis.scala:458)
        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:206)
        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:206)
        at scala.collection.immutable.HashSet$HashSet1.foreach(HashSet.scala:123)
        at scala.co
{code}
[jira] [Created] (KAFKA-1455) Expose ConsumerOffsetChecker as an api instead of being command line only
Arup Malakar created KAFKA-1455:
--------------------------------

    Summary: Expose ConsumerOffsetChecker as an api instead of being command line only
    Key: KAFKA-1455
    URL: https://issues.apache.org/jira/browse/KAFKA-1455
    Project: Kafka
    Issue Type: Improvement
    Components: tools
    Reporter: Arup Malakar
    Priority: Minor

I find ConsumerOffsetChecker very useful when it comes to checking the offset/lag for a consumer group. It would be nice if it could be exposed as a class that could be used from other programs instead of being only a command line tool.

-- This message was sent by Atlassian JIRA (v6.2#6252)
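Until such an API exists, reading a group's consumed offsets programmatically is straightforward against the 0.8 ZooKeeper layout (/consumers/<group>/offsets/<topic>/<partition>). A rough sketch, not ConsumerOffsetChecker itself; the ensemble, group, and topic names are placeholders, and computing lag would additionally need the log-end offsets from an OffsetRequest:

{code}
import org.I0Itec.zkclient.ZkClient
import org.I0Itec.zkclient.serialize.BytesPushThroughSerializer
import scala.collection.JavaConverters._

// Hedged sketch of an "offset checker as a library" under the assumptions above.
object OffsetCheckSketch {
  def main(args: Array[String]): Unit = {
    val zk = new ZkClient("localhost:2181", 30000, 30000, new BytesPushThroughSerializer)
    try {
      val group = "my-group" // placeholder
      val topic = "my-topic" // placeholder
      val base = "/consumers/%s/offsets/%s".format(group, topic)
      for (partition <- zk.getChildren(base).asScala) {
        // Offsets are stored as plain ASCII numbers in the znode data.
        val bytes: Array[Byte] = zk.readData(base + "/" + partition)
        println("partition %s -> consumed offset %s".format(partition, new String(bytes, "UTF-8")))
      }
    } finally {
      zk.close()
    }
  }
}
{code}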
[jira] [Updated] (KAFKA-1146) toString() on KafkaStream gets stuck indefinitely
[ https://issues.apache.org/jira/browse/KAFKA-1146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arup Malakar updated KAFKA-1146:
--------------------------------

    Attachment: KAFKA-1146.patch

-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (KAFKA-1146) toString() on KafkaStream gets stuck indefinitely
[ https://issues.apache.org/jira/browse/KAFKA-1146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arup Malakar updated KAFKA-1146:
--------------------------------

    Status: Patch Available (was: Open)

-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (KAFKA-1146) toString() on KafkaStream gets stuck indefinitely
[ https://issues.apache.org/jira/browse/KAFKA-1146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13833003#comment-13833003 ] Arup Malakar commented on KAFKA-1146:
--------------------------------------

[~jjkoshy] Yes, overriding it would definitely be beneficial. I can submit a patch for this. Any suggestions on what I should put in the toString method?

-- This message was sent by Atlassian JIRA (v6.1#6144)
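One possible shape for it (a sketch, not the committed patch): report the stream's identifying fields and never touch the iterator, since the inherited Iterable.toString is what iterates and blocks. The stand-in class below makes that concrete; its class shape and field names are assumptions:

{code}
// Stand-in for kafka.consumer.KafkaStream, just to illustrate the idea.
class BlockingStream[T](val topic: String, val clientId: String) extends Iterable[T] {
  // The real stream's iterator blocks until a message arrives; Iterable's
  // default toString iterates (mkString), which is why printing hung.
  def iterator: Iterator[T] = new Iterator[T] {
    def hasNext: Boolean = { Thread.sleep(Long.MaxValue); true } // blocks like the real one
    def next(): T = throw new NoSuchElementException("no message")
  }

  // Describe the stream without consuming from it.
  override def toString: String = "KafkaStream(topic=%s, clientId=%s)".format(topic, clientId)
}

object ToStringDemo {
  def main(args: Array[String]): Unit = {
    val stream = new BlockingStream[String]("test_topic", "console-consumer")
    println("Current stream: " + stream) // returns immediately with the override
  }
}
{code}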
[jira] [Created] (KAFKA-1146) toString() on KafkaStream gets stuck indefinitely
Arup Malakar created KAFKA-1146:
--------------------------------

    Summary: toString() on KafkaStream gets stuck indefinitely
    Key: KAFKA-1146
    URL: https://issues.apache.org/jira/browse/KAFKA-1146
    Project: Kafka
    Issue Type: Bug
    Components: consumer
    Affects Versions: 0.8
    Reporter: Arup Malakar
    Assignee: Neha Narkhede
    Priority: Trivial

There is no toString implementation for KafkaStream, so if a user tries to print the stream it falls back to the default toString implementation, which tries to iterate over the collection and gets stuck indefinitely as it awaits messages. KafkaStream could instead override toString and return a verbose description of the stream with the topic name etc.

println("Current stream: " + stream) // This call never returns

-- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (KAFKA-1110) Unable to produce messages with snappy/gzip compression
[ https://issues.apache.org/jira/browse/KAFKA-1110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13817016#comment-13817016 ] Arup Malakar commented on KAFKA-1110:
--------------------------------------

Maybe Evan would be able to provide more information, but gzip is not working either.

-- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (KAFKA-1110) Unable to produce messages with snappy/gzip compression
[ https://issues.apache.org/jira/browse/KAFKA-1110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arup Malakar updated KAFKA-1110:
--------------------------------

    Description:

Sarama[1] (a golang kafka library: https://github.com/Shopify/sarama) is following the spec as defined in https://cwiki.apache.org/confluence/display/KAFKA/A+Guide+To+The+Kafka+Protocol, but messages are not getting into the kafka log file and consumers never see them when gzip/snappy is used. Without compression it works fine.

A few observations we made:

1. The Kafka service does have the jars required to interpret snappy messages. When I modified ConsoleProducer to produce messages using SnappyCompressionCodec instead of the default GZip codec, I was able to produce/consume messages. Looking at the kafka log files I see that snappy compression was indeed being used:

% bin/kafka-run-class.sh kafka.tools.DumpLogSegments --files /tmp/kafka-logs/aruptest-0/.log | tail -1
offset: 15 position: 18763 isvalid: true payloadsize: 52 magic: 0 compresscodec: SnappyCompressionCodec crc: 1602790249

So I don't think it is a case of missing jars on the kafka server.

2. Kafka doesn't return any error if the message doesn't make it to the log file. This seems pretty serious, as I would expect kafka to throw an error if I am using WaitForLocal/WaitForAll.

3. We inspected the tcp packets to see the difference between what ConsoleProducer sends vs what sarama sends (the following is a copy/paste from a github issue):

[~eapache] : "So I have no idea what the ConsoleProducer is actually sending in this case. The outer protocol layers in both cases look identical, but if you compare the actual message value:

a. Sarama sends two bytes of snappy header and then "" (since Snappy decides it's too short to properly encode, so makes it a literal). Pretty straightforward.

b. ConsoleProducer sends 0x82 then the string literal SNAPPY\0 then what appears to be a complete embedded produce request without any compression. This is neither valid snappy nor valid Kafka according to anything I've seen, so I'm pretty confused. It looks almost like an incorrect version of [1] but it's missing several key fields and the case of the identifying string is wrong.

1: http://code.google.com/p/snappy/source/browse/trunk/framing_format.txt "

Let us know if recent changes in the codebase make the protocol page obsolete; in that case, if the protocol page is updated, we could update our client to use the new spec.

More information can be found in the following github issue: https://github.com/Shopify/sarama/issues/32
[jira] [Updated] (KAFKA-1110) Unable to produce messages with snappy/gzip compression
[ https://issues.apache.org/jira/browse/KAFKA-1110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arup Malakar updated KAFKA-1110:
--------------------------------

    Attachment: kafka_producer_snappy_pkt_63.pcapng
                sarama_producer_snappy_pkt_1.pcapng

TCP dumps of the sarama and ConsoleProducer packets.

-- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Created] (KAFKA-1110) Unable to produce messages with snappy/gzip compression
Arup Malakar created KAFKA-1110:
--------------------------------

    Summary: Unable to produce messages with snappy/gzip compression
    Key: KAFKA-1110
    URL: https://issues.apache.org/jira/browse/KAFKA-1110
    Project: Kafka
    Issue Type: Bug
    Affects Versions: 0.8
    Environment: Kafka version: kafka-0.8.0-beta1
    OS version: Darwin 12.4.1 Darwin Kernel Version 12.4.1: Tue May 21 17:04:50 PDT 2013; root:xnu-2050.40.51~1/RELEASE_X86_64 x86_64
    Reporter: Arup Malakar

Sarama[1] (a golang kafka library: https://github.com/Shopify/sarama) is following the spec as defined in https://cwiki.apache.org/confluence/display/KAFKA/A+Guide+To+The+Kafka+Protocol, but messages are not getting into the kafka log file and consumers never see them when gzip/snappy is used. Without compression it works fine.

A few observations we made:

1. The Kafka service does have the jars required to interpret snappy messages. When I modified ConsoleProducer to produce messages using SnappyCompressionCodec instead of the default GZip codec, I was able to produce/consume messages. Looking at the kafka log files I see that snappy compression was indeed being used:

% bin/kafka-run-class.sh kafka.tools.DumpLogSegments --files /tmp/kafka-logs/aruptest-0/.log | tail -1
offset: 15 position: 18763 isvalid: true payloadsize: 52 magic: 0 compresscodec: SnappyCompressionCodec crc: 1602790249

So I don't think it is a case of missing jars on the kafka server.

2. Kafka doesn't return any error if the message doesn't make it to the log file. This seems pretty serious, as I would expect kafka to throw an error if I am using WaitForLocal/WaitForAll.

3. We inspected the tcp packets to see the difference between what ConsoleProducer sends vs what sarama sends (the following is a copy/paste from a github issue):

[~eapache] : So I have no idea what the ConsoleProducer is actually sending in this case. The outer protocol layers in both cases look identical, but if you compare the actual message value:

a. Sarama sends two bytes of snappy header and then "" (since Snappy decides it's too short to properly encode, so makes it a literal). Pretty straightforward.

b. ConsoleProducer sends 0x82 then the string literal SNAPPY\0 then what appears to be a complete embedded produce request without any compression. This is neither valid snappy nor valid Kafka according to anything I've seen, so I'm pretty confused. It looks almost like an incorrect version of [1] but it's missing several key fields and the case of the identifying string is wrong.

Let us know if recent changes in the codebase make the protocol page obsolete; in that case, if the protocol page is updated, we could update our client to use the new spec.

More information can be found in the following github issue: https://github.com/Shopify/sarama/issues/32

-- This message was sent by Atlassian JIRA (v6.1#6144)
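The 0x82 + SNAPPY\0 prefix described above matches the stream header written by the xerial snappy-java SnappyOutputStream, which is its own framing format, distinct from both raw snappy blocks and Google's framing_format.txt. A small sketch that prints the header, assuming snappy-java is on the classpath:

{code}
import java.io.ByteArrayOutputStream
import org.xerial.snappy.SnappyOutputStream

object SnappyHeaderSketch {
  def main(args: Array[String]): Unit = {
    val buf = new ByteArrayOutputStream()
    val out = new SnappyOutputStream(buf)
    out.write("hello kafka".getBytes("UTF-8"))
    out.close()
    // The first 8 bytes are snappy-java's stream magic:
    // 0x82 'S' 'N' 'A' 'P' 'P' 'Y' 0x00 -- i.e. the 0x82 + "SNAPPY\0"
    // observed in the ConsoleProducer capture above.
    println(buf.toByteArray.take(8).map(b => "%02x".format(b & 0xff)).mkString(" "))
  }
}
{code}

This is why a client has to emit and consume the xerial stream framing, not raw snappy blocks, to interoperate with Kafka's snappy codec.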