[jira] [Commented] (KAFKA-1466) Kafka server is hung after throwing "Attempt to swap the new high watermark file with the old one failed"

2016-10-25 Thread Arup Malakar (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15606600#comment-15606600
 ] 

Arup Malakar commented on KAFKA-1466:
-

I think we may have found the cause of this issue on our systems (after two 
and a half years). The Chef recipe used to deploy the server had a bug and was 
deleting the configuration directory and recreating it. Luckily, with Kafka's 
replication it wasn't a problem, and we didn't see any issues beyond the 
occasional exception in our logs.

> Kafka server is hung after throwing "Attempt to swap the new high watermark 
> file with the old one failed"
> -
>
> Key: KAFKA-1466
> URL: https://issues.apache.org/jira/browse/KAFKA-1466
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.8.0
>Reporter: Arup Malakar
> Attachments: kafka.log.1
>
>
> We have a kafka cluster of four nodes. The cluster was down after one of the 
> nodes threw the following error:
> 2014-05-21 23:19:44 FATAL [highwatermark-checkpoint-thread1]: 
> HighwaterMarkCheckpoint:109 - Attempt to swap the new high watermark file 
> with the old one failed. I saw the following message in the log file of the 
> failed node:
> {code}
> 2014-05-21 23:19:44 FATAL [highwatermark-checkpoint-thread1]: 
> HighwaterMarkCheckpoint:109 - Attempt to swap the new high watermark file 
> with the old one failed
> 2014-05-21 23:19:44 INFO  [Thread-1]: KafkaServer:67 - [Kafka Server 4], 
> Shutting down
> 2014-05-21 23:19:44 INFO  [Thread-1]: KafkaZooKeeper:67 - Closing zookeeper 
> client...
> 2014-05-21 23:19:44 INFO  
> [ZkClient-EventThread-21-zoo-c2n1.us-east-1.ooyala.com,zoo-c2n2.us-east-1.ooyala.com,zoo-c2n3.us-east-1.ooyala.com,zoo-c2n4.us-east-1.ooyala.com,zoo-c2n5.u
> s-east-1.ooyala.com]: ZkEventThread:82 - Terminate ZkClient event thread.
> 2014-05-21 23:19:44 INFO  [main-EventThread]: ClientCnxn:521 - EventThread 
> shut down
> 2014-05-21 23:19:44 INFO  [Thread-1]: ZooKeeper:544 - Session: 
> 0x1456b562865b172 closed
> 2014-05-21 23:19:44 INFO  [kafka-processor-9092-0]: Processor:67 - Closing 
> socket connection to /10.245.173.136.
> 2014-05-21 23:19:44 INFO  [Thread-1]: SocketServer:67 - [Socket Server on 
> Broker 4], Shutting down
> 2014-05-21 23:19:44 INFO  [Thread-1]: SocketServer:67 - [Socket Server on 
> Broker 4], Shutdown completed
> 2014-05-21 23:19:44 INFO  [Thread-1]: KafkaRequestHandlerPool:67 - [Kafka 
> Request Handler on Broker 4], shutting down
> 2014-05-21 23:19:44 INFO  [Thread-1]: KafkaRequestHandlerPool:67 - [Kafka 
> Request Handler on Broker 4], shutted down completely
> 2014-05-21 23:19:44 INFO  [Thread-1]: KafkaScheduler:67 - Shutdown Kafka 
> scheduler
> 2014-05-21 23:19:45 INFO  [Thread-1]: ReplicaManager:67 - [Replica Manager on 
> Broker 4]: Shut down
> 2014-05-21 23:19:45 INFO  [Thread-1]: ReplicaFetcherManager:67 - 
> [ReplicaFetcherManager on broker 4] shutting down
> 2014-05-21 23:19:45 INFO  [Thread-1]: ReplicaFetcherThread:67 - 
> [ReplicaFetcherThread-0-3], Shutting down
> 2014-05-21 23:19:45 INFO  [ReplicaFetcherThread-0-3]: ReplicaFetcherThread:67 
> - [ReplicaFetcherThread-0-3], Stopped
> 2014-05-21 23:19:45 INFO  [Thread-1]: ReplicaFetcherThread:67 - 
> [ReplicaFetcherThread-0-3], Shutdown completed
> 2014-05-21 23:19:45 INFO  [Thread-1]: ReplicaFetcherThread:67 - 
> [ReplicaFetcherThread-0-2], Shutting down
> 2014-05-21 23:19:45 INFO  [ReplicaFetcherThread-0-2]: ReplicaFetcherThread:67 
> - [ReplicaFetcherThread-0-2], Stopped
> 2014-05-21 23:19:45 INFO  [Thread-1]: ReplicaFetcherThread:67 - 
> [ReplicaFetcherThread-0-2], Shutdown completed
> 2014-05-21 23:19:45 INFO  [Thread-1]: ReplicaFetcherManager:67 - 
> [ReplicaFetcherManager on broker 4] shutdown completed
> {code} 
> I notice that since this error was logged there weren't any more logs in the 
> log file but the process was still alive, so I guess it was hung.
> The other nodes in the cluster was not able to recover from this error. The 
> partitions owned by this failed node had its leader set to -1:
> {code}
> topic: test_topic partition: 8leader: -1  replicas: 4 isr:
> {code}
> And other nodes were continuously logging the following errors in the log 
> file:
> {code}
> 2014-05-22 20:03:28 ERROR [kafka-request-handler-7]: KafkaApis:102 - 
> [KafkaApi-3] Error while fetching metadata for partition [test_topic,8]
> kafka.common.LeaderNotAvailableException: Leader not available for partition 
> [test_topic,8]
>   at 
> kafka.server.KafkaApis$$anonfun$17$$anonfun$20.apply(KafkaApis.scala:474)
>   at 
> kafka.server.KafkaApis$$anonfun$17$$anonfun$20.apply(KafkaApis.scala:462)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:206)
>   at 
> scala.collection.Traversab

[jira] [Commented] (KAFKA-1479) Logs filling up while Kafka ReplicaFetcherThread tries to retrieve partition info for deleted topics

2014-12-16 Thread Arup Malakar (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14249071#comment-14249071
 ] 

Arup Malakar commented on KAFKA-1479:
-

For people who may stumble upon this JIRA, the steps mentioned by [~manasi] in 
https://issues.apache.org/jira/browse/KAFKA-1479?focusedCommentId=14017044&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14017044
 worked for me as well.

> Logs filling up while Kafka ReplicaFetcherThread tries to retrieve partition 
> info for deleted topics
> 
>
> Key: KAFKA-1479
> URL: https://issues.apache.org/jira/browse/KAFKA-1479
> Project: Kafka
>  Issue Type: Bug
>  Components: log
>Affects Versions: 0.8.1
> Environment: CentOS
>Reporter: Manasi Manasi
>
> Started noticing that logs are filling up fast with lines like this:
> {quote}
> [2014-06-01 15:18:08,218] WARN [KafkaApi-2] Fetch request with correlation id 
> 10049 from client ReplicaFetcherThread-0-2 on partition [sams_2014-05-27,26] 
> failed due to Topic sams_2014-05-27 either doesn't exist or is in the process 
> of being deleted (kafka.server.KafkaApis)
> [2014-06-01 15:18:08,218] WARN [KafkaApi-2] Fetch request with correlation id 
> 10049 from client ReplicaFetcherThread-0-2 on partition [sams_2014-05-28,38] 
> failed due to Topic sams_2014-05-28 either doesn't exist or is in the process 
> of being deleted (kafka.server.KafkaApis)
> [2014-06-01 15:18:08,219] WARN [KafkaApi-2] Fetch request with correlation id 
> 10049 from client ReplicaFetcherThread-0-2 on partition [sams_2014-05-30,20] 
> failed due to Topic sams_2014-05-30 either doesn't exist or is in the process 
> of being deleted (kafka.server.KafkaApis)
> [2014-06-01 15:18:08,219] WARN [KafkaApi-2] Fetch request with correlation id 
> 10049 from client ReplicaFetcherThread-0-2 on partition [sams_2014-05-22,46] 
> failed due to Topic sams_2014-05-22 either doesn't exist or is in the process 
> of being deleted (kafka.server.KafkaApis)
> [2014-06-01 15:18:08,219] WARN [KafkaApi-2] Fetch request with correlation id 
> 10049 from client ReplicaFetcherThread-0-2 on partition [sams_2014-05-27,8] 
> failed due to Topic sams_2014-05-27 either doesn't exist or is in the process 
> of being deleted (kafka.server.KafkaApis)
> {quote}
> The above is from kafkaServer.out. Also seeing errors in server.log:
> {quote}
> [2014-06-01 15:23:52,788] ERROR [ReplicaFetcherThread-0-0], Error for 
> partition [sams_2014-05-26,19] to broker 0:class 
> kafka.common.UnknownTopicOrPartitionException 
> (kafka.server.ReplicaFetcherThread)
> [2014-06-01 15:23:52,788] WARN [KafkaApi-2] Fetch request with correlation id 
> 10887 from client ReplicaFetcherThread-0-2 on partition [sams_2014-05-30,4] 
> failed due to Topic sams_2014-05-30 either doesn't exist or is in the process 
> of being deleted (kafka.server.KafkaApis)
> [2014-06-01 15:23:52,788] ERROR [ReplicaFetcherThread-0-0], Error for 
> partition [sams_2014-05-24,34] to broker 0:class 
> kafka.common.UnknownTopicOrPartitionException 
> (kafka.server.ReplicaFetcherThread)
> [2014-06-01 15:23:52,788] ERROR [ReplicaFetcherThread-0-0], Error for 
> partition [sams_2014-05-26,41] to broker 0:class 
> kafka.common.UnknownTopicOrPartitionException 
> (kafka.server.ReplicaFetcherThread)
> [2014-06-01 15:23:52,788] WARN [KafkaApi-2] Fetch request with correlation id 
> 10887 from client ReplicaFetcherThread-0-2 on partition [2014-05-21,0] failed 
> due to Topic 2014-05-21 either doesn't exist or is in the process of being 
> deleted (kafka.server.KafkaApis)
> [2014-06-01 15:23:52,788] ERROR [ReplicaFetcherThread-0-0], Error for 
> partition [sams_2014-05-28,42] to broker 0:class 
> kafka.common.UnknownTopicOrPartitionException 
> (kafka.server.ReplicaFetcherThread)
> [2014-06-01 15:23:52,788] ERROR [ReplicaFetcherThread-0-0], Error for 
> partition [sams_2014-05-22,21] to broker 0:class 
> kafka.common.UnknownTopicOrPartitionException 
> (kafka.server.ReplicaFetcherThread)
> [2014-06-01 15:23:52,788] WARN [KafkaApi-2] Fetch request with correlation id 
> 10887 from client ReplicaFetcherThread-0-2 on partition [sams_2014-05-20,26] 
> failed due to Topic sams_2014-05-20 either doesn't exist or is in the process 
> of being deleted (kafka.server.KafkaApis)
> {quote}
> All these partitions belong to deleted topics. Nothing changed on our end 
> when we started noticing these logs filling up. Any ideas what is going on?





[jira] [Commented] (KAFKA-1466) Kafka server is hung after throwing "Attempt to swap the new high watermark file with the old one failed"

2014-05-29 Thread Arup Malakar (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14012677#comment-14012677
 ] 

Arup Malakar commented on KAFKA-1466:
-

[~junrao] No, it wasn't wrapped in any container. Unfortunately I couldn't take 
a thread dump before the server was restarted. It looks like we don't have any 
more information to proceed on; one option could be to try to reproduce it by 
playing with the file permissions of the watermark file. I will update the JIRA 
if I can get to it. Hopefully the discussion here will be useful to others who 
happen to face the same issue.


[jira] [Commented] (KAFKA-1466) Kafka server is hung after throwing "Attempt to swap the new high watermark file with the old one failed"

2014-05-28 Thread Arup Malakar (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14011913#comment-14011913
 ] 

Arup Malakar commented on KAFKA-1466:
-

[~jkreps] I am running Kafka 0.8, so I assume it is using 
HighwaterMarkCheckpoint.scala. In the code snippet below you can see that the 
process is supposed to exit after this error: 

{code:java}
if (!tempHwFile.renameTo(hwFile)) {
  // renameTo() fails on Windows if the destination file exists.
  hwFile.delete()
  if (!tempHwFile.renameTo(hwFile)) {
    fatal("Attempt to swap the new high watermark file with the old one failed")
    System.exit(1)
  }
}
{code}

It is quite possible that the disk was full, so throwing the error was fine. 
What concerned me is that the process didn't die, and that the rest of the 
cluster didn't continue working.
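
Purely as a generic illustration (not a claim about what Kafka's shutdown hook 
actually does), one way a JVM can log a fatal error, call System.exit(1) and 
still never die is when exit is invoked on a thread that a shutdown hook later 
tries to join. A minimal, self-contained sketch of that deadlock, with made-up 
class and thread names:

{code:java}
object ExitDeadlockSketch {
  def main(args: Array[String]): Unit = {
    // Hypothetical stand-in for the checkpoint thread that hits the fatal error.
    val checkpointThread = new Thread(new Runnable {
      def run(): Unit = {
        println("checkpoint-thread: fatal error, calling System.exit(1)")
        // System.exit blocks until all shutdown hooks have finished.
        System.exit(1)
      }
    }, "checkpoint-thread")

    // Hypothetical stand-in for a shutdown hook that waits on that same thread.
    Runtime.getRuntime.addShutdownHook(new Thread(new Runnable {
      def run(): Unit = {
        println("shutdown-hook: waiting for checkpoint-thread")
        checkpointThread.join() // never returns: that thread is stuck inside System.exit
      }
    }, "shutdown-hook"))

    checkpointThread.start()
  }
}
{code}

If something along those lines were happening, it would match the symptom of a 
FATAL log followed by a partial shutdown sequence and then silence.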





[jira] [Updated] (KAFKA-1466) Kafka server is hung after throwing "Attempt to swap the new high watermark file with the old one failed"

2014-05-28 Thread Arup Malakar (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arup Malakar updated KAFKA-1466:


Attachment: kafka.log.1

[~jjkoshy] Some more info:

1. The topic we use for actual prod messages is not test_topic and has *two 
replicas*. We were unable to push messages to that topic, so the cluster was 
indeed unavailable:
{code}
topic: staging_thrift_streaming  partition: 0  leader: 2  replicas: 4,2  isr: 2,4
topic: staging_thrift_streaming  partition: 1  leader: 1  replicas: 1,3  isr: 1,3
topic: staging_thrift_streaming  partition: 2  leader: 2  replicas: 2,4  isr: 2,4
..
{code}

2. Environment details:

Java version:
{code}
java -version
java version "1.7.0_51"
Java(TM) SE Runtime Environment (build 1.7.0_51-b13)
Java HotSpot(TM) 64-Bit Server VM (build 24.51-b03, mixed mode)
{code}

OS Version:
{code}
~# uname -a
Linux ip-X-X-X-X 3.2.0-51-virtual #77-Ubuntu SMP Wed Jul 24 20:38:32 UTC 2013 
x86_64 x86_64 x86_64 GNU/Linux
~# lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:Ubuntu 12.04 LTS
Release:12.04
Codename:   precise
{code}

I couldn't find anything strange in the kernel logs though. I am attaching the 
kafka logs here.


[jira] [Commented] (KAFKA-1466) Kafka server is hung after throwing "Attempt to swap the new high watermark file with the old one failed"

2014-05-23 Thread Arup Malakar (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14007651#comment-14007651
 ] 

Arup Malakar commented on KAFKA-1466:
-

[~fancyrao] I couldn't reproduce the error. It went away the moment I restarted 
the Kafka server, and I haven't seen it since. 

There are two parts to the issue:
- We expected the rest of the Kafka cluster to continue functioning despite the 
failure of one node, but it didn't. We were unable to push any messages to the 
cluster.
- The Kafka node that threw the error should have died after logging the fatal 
error so that it would get restarted by upstart/init.d. Instead it was just 
hung, and upstart was not aware of it.

I would be happy to provide any other details that may be helpful in tracking 
down the issue.


[jira] [Commented] (KAFKA-1466) Kafka server is hung after throwing "Attempt to swap the new high watermark file with the old one failed"

2014-05-22 Thread Arup Malakar (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14006813#comment-14006813
 ] 

Arup Malakar commented on KAFKA-1466:
-

No, it is on Linux.


[jira] [Created] (KAFKA-1466) Kafka server is hung after throwing "Attempt to swap the new high watermark file with the old one failed"

2014-05-22 Thread Arup Malakar (JIRA)
Arup Malakar created KAFKA-1466:
---

 Summary: Kafka server is hung after throwing "Attempt to swap the 
new high watermark file with the old one failed"
 Key: KAFKA-1466
 URL: https://issues.apache.org/jira/browse/KAFKA-1466
 Project: Kafka
  Issue Type: Bug
Affects Versions: 0.8.0
Reporter: Arup Malakar


We have a Kafka cluster of four nodes. The cluster was down after one of the 
nodes threw the FATAL error "Attempt to swap the new high watermark file with 
the old one failed". I saw the following messages in the log file of the failed 
node:
{code}
2014-05-21 23:19:44 FATAL [highwatermark-checkpoint-thread1]: 
HighwaterMarkCheckpoint:109 - Attempt to swap the new high watermark file with 
the old one failed
2014-05-21 23:19:44 INFO  [Thread-1]: KafkaServer:67 - [Kafka Server 4], 
Shutting down
2014-05-21 23:19:44 INFO  [Thread-1]: KafkaZooKeeper:67 - Closing zookeeper 
client...
2014-05-21 23:19:44 INFO  
[ZkClient-EventThread-21-zoo-c2n1.us-east-1.ooyala.com,zoo-c2n2.us-east-1.ooyala.com,zoo-c2n3.us-east-1.ooyala.com,zoo-c2n4.us-east-1.ooyala.com,zoo-c2n5.u
s-east-1.ooyala.com]: ZkEventThread:82 - Terminate ZkClient event thread.
2014-05-21 23:19:44 INFO  [main-EventThread]: ClientCnxn:521 - EventThread shut 
down
2014-05-21 23:19:44 INFO  [Thread-1]: ZooKeeper:544 - Session: 
0x1456b562865b172 closed
2014-05-21 23:19:44 INFO  [kafka-processor-9092-0]: Processor:67 - Closing 
socket connection to /10.245.173.136.
2014-05-21 23:19:44 INFO  [Thread-1]: SocketServer:67 - [Socket Server on 
Broker 4], Shutting down
2014-05-21 23:19:44 INFO  [Thread-1]: SocketServer:67 - [Socket Server on 
Broker 4], Shutdown completed
2014-05-21 23:19:44 INFO  [Thread-1]: KafkaRequestHandlerPool:67 - [Kafka 
Request Handler on Broker 4], shutting down
2014-05-21 23:19:44 INFO  [Thread-1]: KafkaRequestHandlerPool:67 - [Kafka 
Request Handler on Broker 4], shutted down completely
2014-05-21 23:19:44 INFO  [Thread-1]: KafkaScheduler:67 - Shutdown Kafka 
scheduler
2014-05-21 23:19:45 INFO  [Thread-1]: ReplicaManager:67 - [Replica Manager on 
Broker 4]: Shut down
2014-05-21 23:19:45 INFO  [Thread-1]: ReplicaFetcherManager:67 - 
[ReplicaFetcherManager on broker 4] shutting down
2014-05-21 23:19:45 INFO  [Thread-1]: ReplicaFetcherThread:67 - 
[ReplicaFetcherThread-0-3], Shutting down
2014-05-21 23:19:45 INFO  [ReplicaFetcherThread-0-3]: ReplicaFetcherThread:67 - 
[ReplicaFetcherThread-0-3], Stopped
2014-05-21 23:19:45 INFO  [Thread-1]: ReplicaFetcherThread:67 - 
[ReplicaFetcherThread-0-3], Shutdown completed
2014-05-21 23:19:45 INFO  [Thread-1]: ReplicaFetcherThread:67 - 
[ReplicaFetcherThread-0-2], Shutting down
2014-05-21 23:19:45 INFO  [ReplicaFetcherThread-0-2]: ReplicaFetcherThread:67 - 
[ReplicaFetcherThread-0-2], Stopped
2014-05-21 23:19:45 INFO  [Thread-1]: ReplicaFetcherThread:67 - 
[ReplicaFetcherThread-0-2], Shutdown completed
2014-05-21 23:19:45 INFO  [Thread-1]: ReplicaFetcherManager:67 - 
[ReplicaFetcherManager on broker 4] shutdown completed
{code} 

I noticed that after this error was logged there were no further entries in the 
log file, but the process was still alive, so I assume it was hung.

The other nodes in the cluster were not able to recover from this error. The 
partitions owned by the failed node had their leader set to -1:

{code}
topic: test_topic  partition: 8  leader: -1  replicas: 4  isr:
{code}

And the other nodes were continuously logging the following errors in their log files:
{code}
2014-05-22 20:03:28 ERROR [kafka-request-handler-7]: KafkaApis:102 - 
[KafkaApi-3] Error while fetching metadata for partition [test_topic,8]
kafka.common.LeaderNotAvailableException: Leader not available for partition 
[test_topic,8]
at 
kafka.server.KafkaApis$$anonfun$17$$anonfun$20.apply(KafkaApis.scala:474)
at 
kafka.server.KafkaApis$$anonfun$17$$anonfun$20.apply(KafkaApis.scala:462)
at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:206)
at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:206)
at 
scala.collection.LinearSeqOptimized$class.foreach(LinearSeqOptimized.scala:61)
at scala.collection.immutable.List.foreach(List.scala:45)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:206)
at scala.collection.immutable.List.map(List.scala:45)
at kafka.server.KafkaApis$$anonfun$17.apply(KafkaApis.scala:462)
at kafka.server.KafkaApis$$anonfun$17.apply(KafkaApis.scala:458)
at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:206)
at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:206)
at 
scala.collection.immutable.HashSet$HashSet1.foreach(HashSet.scala:123)
at 
scala.co

[jira] [Created] (KAFKA-1455) Expose ConsumerOffsetChecker as an api instead of being command line only

2014-05-16 Thread Arup Malakar (JIRA)
Arup Malakar created KAFKA-1455:
---

 Summary: Expose ConsumerOffsetChecker as an api instead of being 
command line only
 Key: KAFKA-1455
 URL: https://issues.apache.org/jira/browse/KAFKA-1455
 Project: Kafka
  Issue Type: Improvement
  Components: tools
Reporter: Arup Malakar
Priority: Minor


I find ConsumerOffsetChecker very useful when it comes to checking the 
offset/lag for a consumer group. It would be nice if it could be exposed as a 
class that could be used from other programs instead of being only a 
command-line tool.
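
To illustrate the kind of programmatic surface I have in mind, here is a purely 
hypothetical sketch; none of these types exist in Kafka, and the names are 
invented:

{code:java}
// Hypothetical API shape only; not an existing Kafka class.
case class PartitionLag(topic: String, partition: Int, logEndOffset: Long, consumerOffset: Long) {
  def lag: Long = logEndOffset - consumerOffset
}

trait ConsumerOffsetCheckerApi {
  // Returns per-partition offsets and lag for the given consumer group.
  def check(zkConnect: String, group: String): Seq[PartitionLag]
}
{code}

Something along those lines would let monitoring code query offsets/lag 
directly instead of parsing command-line output.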





[jira] [Updated] (KAFKA-1146) toString() on KafkaStream gets stuck indefinitely

2014-05-01 Thread Arup Malakar (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arup Malakar updated KAFKA-1146:


Attachment: KAFKA-1146.patch

> toString() on KafkaStream gets stuck indefinitely
> -
>
> Key: KAFKA-1146
> URL: https://issues.apache.org/jira/browse/KAFKA-1146
> Project: Kafka
>  Issue Type: Bug
>  Components: consumer
>Affects Versions: 0.8.0
>Reporter: Arup Malakar
>Priority: Trivial
>  Labels: newbie
> Fix For: 0.9.0
>
> Attachments: KAFKA-1146.patch
>
>
> There is no toString implementation for KafkaStream, so if a user tries to 
> print the stream it falls back to default toString implementation which tries 
> to iterate over the collection and gets stuck indefinitely as it awaits 
> messages. KafkaStream could instead override the toString and return a 
> verbose description of the stream with topic name etc.
> println("Current stream: " + stream) // This call never returns





[jira] [Updated] (KAFKA-1146) toString() on KafkaStream gets stuck indefinitely

2014-05-01 Thread Arup Malakar (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arup Malakar updated KAFKA-1146:


Status: Patch Available  (was: Open)






[jira] [Commented] (KAFKA-1146) toString() on KafkaStream gets stuck indefinitely

2013-11-26 Thread Arup Malakar (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13833003#comment-13833003
 ] 

Arup Malakar commented on KAFKA-1146:
-

[~jjkoshy] Yes, overriding it would definitely be beneficial. I can submit a 
patch for this. Any suggestions on what I should put in the toString method?
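
Something along these lines is what I had in mind (illustrative only; the field 
names are made up and the real KafkaStream internals may differ):

{code:java}
// Illustrative stand-in class showing the kind of override being proposed.
class StreamLike[T](topic: String, clientId: String, underlying: Iterable[T]) extends Iterable[T] {
  // The real stream's iterator blocks waiting for new messages.
  def iterator: Iterator[T] = underlying.iterator

  // Describe the stream instead of iterating it.
  override def toString: String =
    "KafkaStream(topic = " + topic + ", clientId = " + clientId + ")"
}
{code}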

> toString() on KafkaStream gets stuck indefinitely
> -
>
> Key: KAFKA-1146
> URL: https://issues.apache.org/jira/browse/KAFKA-1146
> Project: Kafka
>  Issue Type: Bug
>  Components: consumer
>Affects Versions: 0.8
>Reporter: Arup Malakar
>Assignee: Neha Narkhede
>Priority: Trivial
> Fix For: 0.8.1
>
>
> There is no toString implementation for KafkaStream, so if a user tries to 
> print the stream it falls back to default toString implementation which tries 
> to iterate over the collection and gets stuck indefinitely as it awaits 
> messages. KafkaStream could instead override the toString and return a 
> verbose description of the stream with topic name etc.
> println("Current stream: " + stream) // This call never returns





[jira] [Created] (KAFKA-1146) toString() on KafkaStream gets stuck indefinitely

2013-11-26 Thread Arup Malakar (JIRA)
Arup Malakar created KAFKA-1146:
---

 Summary: toString() on KafkaStream gets stuck indefinitely
 Key: KAFKA-1146
 URL: https://issues.apache.org/jira/browse/KAFKA-1146
 Project: Kafka
  Issue Type: Bug
  Components: consumer
Affects Versions: 0.8
Reporter: Arup Malakar
Assignee: Neha Narkhede
Priority: Trivial


There is no toString implementation for KafkaStream, so if a user tries to 
print the stream it falls back to the default toString implementation, which 
tries to iterate over the collection and gets stuck indefinitely waiting for 
messages. KafkaStream could instead override toString and return a verbose 
description of the stream with the topic name etc.

println("Current stream: " + stream) // This call never returns





[jira] [Commented] (KAFKA-1110) Unable to produce messages with snappy/gzip compression

2013-11-07 Thread Arup Malakar (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13817016#comment-13817016
 ] 

Arup Malakar commented on KAFKA-1110:
-

Maybe Evan would be able to provide more information, but gzip is not working 
either.

> Unable to produce messages with snappy/gzip compression
> ---
>
> Key: KAFKA-1110
> URL: https://issues.apache.org/jira/browse/KAFKA-1110
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.8
> Environment: Kafka version: kafka-0.8.0-beta1
> OS version: Darwin 12.4.1 Darwin Kernel Version 12.4.1: Tue May 21 17:04:50 
> PDT 2013; root:xnu-2050.40.51~1/RELEASE_X86_64 x86_64
>Reporter: Arup Malakar
> Attachments: kafka_producer_snappy_pkt_63.pcapng, 
> sarama_producer_snappy_pkt_1.pcapng
>
>
> Sarama[1] (A golang kafka library: https://github.com/Shopify/sarama) is 
> following the specs as defined in: 
> https://cwiki.apache.org/confluence/display/KAFKA/A+Guide+To+The+Kafka+Protocol
>  but messages are not getting into the kafka log file and consumers never see 
> them when gzip/snappy is used. Without compression it works fine though.
> Few observations we made:
> 1. Kafka service does have required jars to be able to interpret snappy 
> messages. When I modify ConsoleProducer to produce messages using   
> SnappyCompressionCodec instead of default GZip codec. I was able to 
> produce/consume messages. Looking at the kafka log files I see that Snappy 
> Compression was indeed getting used:
> % bin/kafka-run-class.sh kafka.tools.DumpLogSegments --files 
> /tmp/kafka-logs/aruptest-0/.log | tail -1
> offset: 15 position: 18763 isvalid: true payloadsize: 52 magic: 0 
> compresscodec: SnappyCompressionCodec crc: 1602790249
> So I don't think it would be a case of missing jars in kafka server.
> 2. Kafka doesn't return any error if the message doesn't make it to the log 
> file. This seems pretty serious, as I would expect kafka to throw an error if 
> I am using WaitForLocal/WaitForAll.
> 3. We did an inspection of the tcp packet to see the difference between what 
> ConsoleProducer sends vs what sarama sends
> (Following is a copy/paste from a github issue):
> [~eapache] : "So I have no idea what the ConsoleProducer is actually sending 
> in this case. The outer protocol layers in both cases look identical, but if 
> you compare the actual message value:
> a. Sarama sends two bytes of snappy header and then "" (since 
> Snappy decides it's too short to properly encode, so makes it a literal). 
> Pretty straightforward.
> b. ConsoleProducer sends 0x82 then the string literal SNAPPY\0 then what 
> appears to be a complete embedded produce request without any compression. 
> This is neither valid snappy nor valid Kafka according to anything I've seen, 
> so I'm pretty confused. It looks almost like an incorrect version of [1] but 
> it's missing several key fields and the case of the identifying string is 
> wrong.
> 1: http://code.google.com/p/snappy/source/browse/trunk/framing_format.txt "
> Let us know if recent changes in the codebase makes the protocol page 
> obsolete, in that case if the protocol page is updated we could update our 
> client to use the new spec.
> More information could be found in the following github issue: 
> https://github.com/Shopify/sarama/issues/32





[jira] [Updated] (KAFKA-1110) Unable to produce messages with snappy/gzip compression

2013-10-30 Thread Arup Malakar (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arup Malakar updated KAFKA-1110:


Description: 
Sarama[1] (A golang kafka library: https://github.com/Shopify/sarama) is 
following the specs as defined in: 
https://cwiki.apache.org/confluence/display/KAFKA/A+Guide+To+The+Kafka+Protocol 
but messages are not getting into the kafka log file and consumers never see 
them when gzip/snappy is used. Without compression it works fine though.

A few observations we made:
1. The Kafka service does have the required jars to be able to interpret snappy 
messages. When I modified ConsoleProducer to produce messages using 
SnappyCompressionCodec instead of the default GZip codec, I was able to 
produce/consume messages. Looking at the Kafka log files, I see that Snappy 
compression was indeed being used:

% bin/kafka-run-class.sh kafka.tools.DumpLogSegments --files 
/tmp/kafka-logs/aruptest-0/.log | tail -1

offset: 15 position: 18763 isvalid: true payloadsize: 52 magic: 0 
compresscodec: SnappyCompressionCodec crc: 1602790249

So I don't think it would be a case of missing jars in kafka server.

2. Kafka doesn't return any error if the message doesn't make it to the log 
file. This seems pretty serious, as I would expect kafka to throw an error if I 
am using WaitForLocal/WaitForAll.

3. We did an inspection of the tcp packet to see the difference between what 
ConsoleProducer sends vs what sarama sends
(Following is a copy/paste from a github issue):
[~eapache] : "So I have no idea what the ConsoleProducer is actually sending in 
this case. The outer protocol layers in both cases look identical, but if you 
compare the actual message value:

a. Sarama sends two bytes of snappy header and then "" (since 
Snappy decides it's too short to properly encode, so makes it a literal). 
Pretty straightforward.
b. ConsoleProducer sends 0x82 then the string literal SNAPPY\0 then what 
appears to be a complete embedded produce request without any compression. This 
is neither valid snappy nor valid Kafka according to anything I've seen, so I'm 
pretty confused. It looks almost like an incorrect version of [1] but it's 
missing several key fields and the case of the identifying string is wrong.

1: http://code.google.com/p/snappy/source/browse/trunk/framing_format.txt "

Let us know if recent changes in the codebase make the protocol page obsolete; 
in that case, once the protocol page is updated we can update our client to use 
the new spec.

More information could be found in the following github issue: 
https://github.com/Shopify/sarama/issues/32

  was:
Sarama[1] (A golang kafka library: https://github.com/Shopify/sarama) is 
following the specs as defined in: 
https://cwiki.apache.org/confluence/display/KAFKA/A+Guide+To+The+Kafka+Protocol 
but messages are not getting into the kafka log file and consumers never see 
them when gzip/snappy is used. Without compression it works fine though.

Few observations we made:
1. Kafka service does have required jars to be able to interpret snappy 
messages. When I modify ConsoleProducer to produce messages using   
SnappyCompressionCodec instead of default GZip codec. I was able to 
produce/consume messages. Looking at the kafka log files I see that Snappy 
Compression was indeed getting used:

% bin/kafka-run-class.sh kafka.tools.DumpLogSegments --files 
/tmp/kafka-logs/aruptest-0/.log | tail -1

offset: 15 position: 18763 isvalid: true payloadsize: 52 magic: 0 
compresscodec: SnappyCompressionCodec crc: 1602790249

So I don't think it would be a case of missing jars in kafka server.

2. Kafka doesn't return any error if the message doesn't make it to the log 
file. This seems pretty serious, as I would expect kafka to throw an error if I 
am using WaitForLocal/WaitForAll.

3. We did an inspection of the tcp packet to see the difference between what 
ConsoleProducer sends vs what sarama sends
(Following is a copy/paste from a github issue):
[~eapache] : 
So I have no idea what the ConsoleProducer is actually sending in this case. 
The outer protocol layers in both cases look identical, but if you compare the 
actual message value:

a. Sarama sends two bytes of snappy header and then "" (since 
Snappy decides it's too short to properly encode, so makes it a literal). 
Pretty straightforward.
b. ConsoleProducer sends 0x82 then the string literal SNAPPY\0 then what 
appears to be a complete embedded produce request without any compression. This 
is neither valid snappy nor valid Kafka according to anything I've seen, so I'm 
pretty confused. It looks almost like an incorrect version of [1] but it's 
missing several key fields and the case of the identifying string is wrong.

Let us know if recent changes in the codebase makes the protocol page obsolete, 
in that case if the protocol page is updated we could update our client to use 
the new spec.

M

[jira] [Updated] (KAFKA-1110) Unable to produce messages with snappy/gzip compression

2013-10-30 Thread Arup Malakar (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arup Malakar updated KAFKA-1110:


Attachment: kafka_producer_snappy_pkt_63.pcapng
sarama_producer_snappy_pkt_1.pcapng

TCP dumps of the sarama and ConsoleProducer packets.






[jira] [Created] (KAFKA-1110) Unable to produce messages with snappy/gzip compression

2013-10-30 Thread Arup Malakar (JIRA)
Arup Malakar created KAFKA-1110:
---

 Summary: Unable to produce messages with snappy/gzip compression
 Key: KAFKA-1110
 URL: https://issues.apache.org/jira/browse/KAFKA-1110
 Project: Kafka
  Issue Type: Bug
Affects Versions: 0.8
 Environment: Kafka version: kafka-0.8.0-beta1
OS version: Darwin 12.4.1 Darwin Kernel Version 12.4.1: Tue May 21 17:04:50 PDT 
2013; root:xnu-2050.40.51~1/RELEASE_X86_64 x86_64
Reporter: Arup Malakar


Sarama[1] (A golang kafka library: https://github.com/Shopify/sarama) is 
following the specs as defined in: 
https://cwiki.apache.org/confluence/display/KAFKA/A+Guide+To+The+Kafka+Protocol 
but messages are not getting into the kafka log file and consumers never see 
them when gzip/snappy is used. Without compression it works fine though.

A few observations we made:
1. The Kafka service does have the required jars to be able to interpret snappy 
messages. When I modified ConsoleProducer to produce messages using 
SnappyCompressionCodec instead of the default GZip codec, I was able to 
produce/consume messages. Looking at the Kafka log files, I see that Snappy 
compression was indeed being used:

% bin/kafka-run-class.sh kafka.tools.DumpLogSegments --files 
/tmp/kafka-logs/aruptest-0/.log | tail -1

offset: 15 position: 18763 isvalid: true payloadsize: 52 magic: 0 
compresscodec: SnappyCompressionCodec crc: 1602790249

So I don't think it would be a case of missing jars in kafka server.

2. Kafka doesn't return any error if the message doesn't make it to the log 
file. This seems pretty serious, as I would expect kafka to throw an error if I 
am using WaitForLocal/WaitForAll.

3. We did an inspection of the tcp packet to see the difference between what 
ConsoleProducer sends vs what sarama sends
(Following is a copy/paste from a github issue):
[~eapache] : 
So I have no idea what the ConsoleProducer is actually sending in this case. 
The outer protocol layers in both cases look identical, but if you compare the 
actual message value:

a. Sarama sends two bytes of snappy header and then "" (since 
Snappy decides it's too short to properly encode, so makes it a literal). 
Pretty straightforward.
b. ConsoleProducer sends 0x82 then the string literal SNAPPY\0 then what 
appears to be a complete embedded produce request without any compression. This 
is neither valid snappy nor valid Kafka according to anything I've seen, so I'm 
pretty confused. It looks almost like an incorrect version of [1] but it's 
missing several key fields and the case of the identifying string is wrong.

Let us know if recent changes in the codebase make the protocol page obsolete; 
in that case, once the protocol page is updated we can update our client to use 
the new spec.

More information could be found in the following github issue: 
https://github.com/Shopify/sarama/issues/32
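
For reference, the 0x82 followed by the literal SNAPPY\0 described above looks 
like the stream header written by snappy-java's SnappyOutputStream (the xerial 
framing) rather than raw snappy blocks. A small sketch to dump that header, 
assuming snappy-java is on the classpath; the payload string is arbitrary:

{code:java}
import java.io.ByteArrayOutputStream
import org.xerial.snappy.SnappyOutputStream

object SnappyHeaderDump {
  def main(args: Array[String]): Unit = {
    val buf = new ByteArrayOutputStream()
    val out = new SnappyOutputStream(buf)
    out.write("hello kafka".getBytes("UTF-8"))
    out.close()
    // Expect the xerial stream header first: 0x82 'S' 'N' 'A' 'P' 'P' 'Y' 0x00 ...
    println(buf.toByteArray.take(16).map(b => "%02x".format(b & 0xff)).mkString(" "))
  }
}
{code}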


