[ 
https://issues.apache.org/jira/browse/KAFKA-3689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15279945#comment-15279945
 ] 

Rajini Sivaram commented on KAFKA-3689:
---------------------------------------

[~buvana.rama...@nokia.com] Were there any exceptions/errors in the logs before 
the first occurrence of this sequence of errors? Am I right in assuming that 
you are using PLAINTEXT?

[~ijuma] Looking at the code, I am not sure that an issue with 
{{SocketServer}} alone would cause what looks like a tight loop of errors. The 
errors seem too close together to correspond to a new connection each time. If 
one connection resulted in a loop of millions of errors, I wonder whether 
{{Selector}} got itself into an inconsistent state. Since 
{{Selector.disconnected}} is cleared on every poll, and every addition to 
{{Selector.disconnected}} is accompanied by a {{channel.close}} that cancels 
the key, I wonder whether some exception during close resulted in this state. 
{{PlaintextTransportLayer.close()}} doesn't cancel the key if the socket close 
throws an IOException, which we should probably fix anyway. But since an 
exception would have been logged in that case, it would be good to know whether 
there were any errors in the logs prior to this exception sequence.
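
To make the close/cancel concern concrete, here is a minimal Java sketch of a close that cancels the selection key even when closing the socket throws. It is not the actual {{PlaintextTransportLayer}} code: the class {{CloseAndCancelSketch}}, the helper {{closeAndCancel}} and the simulated failing {{Closeable}} are hypothetical names used only for illustration.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.SocketChannel;

public class CloseAndCancelSketch {

    // Hypothetical helper (not Kafka's code): close the socket and cancel the
    // key in a finally block, so the key is cancelled even if close() throws.
    static void closeAndCancel(Closeable socket, SelectionKey key) throws IOException {
        try {
            socket.close();
        } finally {
            key.attach(null);
            key.cancel();
        }
    }

    // Registers an unconnected channel with a selector, simulates a close that
    // throws, and reports whether the key ended up cancelled anyway.
    static boolean keyCancelledAfterFailedClose() {
        try (Selector selector = Selector.open();
             SocketChannel channel = SocketChannel.open()) {
            channel.configureBlocking(false);
            SelectionKey key = channel.register(selector, SelectionKey.OP_READ);

            // Assumption for the demo: a Closeable standing in for a socket
            // whose close() fails with an IOException.
            Closeable failing = () -> {
                throw new IOException("simulated close failure");
            };

            try {
                closeAndCancel(failing, key);
            } catch (IOException expected) {
                // A real caller would log this exception.
            }
            return !key.isValid();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("key cancelled after failed close: "
                + keyCancelledAfterFailedClose());
    }
}
```

Without the finally block, the exception from close() would skip the cancel and leave a valid key registered for a channel that the caller already considers closed.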

> ERROR Processor got uncaught exception. (kafka.network.Processor)
> -----------------------------------------------------------------
>
>                 Key: KAFKA-3689
>                 URL: https://issues.apache.org/jira/browse/KAFKA-3689
>             Project: Kafka
>          Issue Type: Bug
>          Components: network
>    Affects Versions: 0.9.0.1
>         Environment: ubuntu 14.04,
> java version "1.7.0_95"
> OpenJDK Runtime Environment (IcedTea 2.6.4) (7u95-2.6.4-0ubuntu0.14.04.2)
> OpenJDK 64-Bit Server VM (build 24.95-b01, mixed mode)
> 3 broker cluster (all 3 servers identical -  Intel Xeon E5-2670 @2.6GHz, 
> 8cores, 16 threads 64 GB RAM & 1 TB Disk)
> Kafka Cluster is managed by 3 server ZK cluster (these servers are different 
> from Kafka broker servers). All 6 servers are connected via 10G switch. 
> Producers run from external servers.
>            Reporter: Buvaneswari Ramanan
>            Assignee: Jun Rao
>            Priority: Minor
>             Fix For: 0.10.1.0, 0.9.0.1, 0.10.0.0, 0.11.0.0, 0.10.0.1, 0.9.0.2
>
>   Original Estimate: 72h
>  Remaining Estimate: 72h
>
> As per Ismael Juma's suggestion in email thread to us...@kafka.apache.org 
> with the same subject, I am creating this bug report.
> The following error occurs in one of the brokers in our 3 broker cluster, 
> which serves about 8000 topics. These topics are single partitioned with a 
> replication factor = 3. Each topic gets data at a low rate  – 200 bytes per 
> sec.  Leaders are balanced across the topics.
> Producers run from external servers (4 Ubuntu servers with same config as the 
> brokers), each producing to 2000 topics utilizing kafka-python library.
> This error message occurs repeatedly in one of the servers. Between the hours 
> of 10:30am and 1:30pm on 5/9/16, there were about 10 Million such 
> occurrences. This was right after a cluster restart.
> This is not the first time we have seen this error on this broker. In the 
> earlier instances, the error occurred hours or days after a cluster restart.
> =====================================================
> [2016-05-09 10:38:43,932] ERROR Processor got uncaught exception. 
> (kafka.network.Processor)
> java.lang.IllegalArgumentException: Attempted to decrease connection count 
> for address with no connections, address: /X.Y.Z.144 (actual network address 
> masked)
>         at 
> kafka.network.ConnectionQuotas$$anonfun$9.apply(SocketServer.scala:565)
>         at 
> kafka.network.ConnectionQuotas$$anonfun$9.apply(SocketServer.scala:565)
>         at scala.collection.MapLike$class.getOrElse(MapLike.scala:128)
>         at scala.collection.AbstractMap.getOrElse(Map.scala:59)
>         at kafka.network.ConnectionQuotas.dec(SocketServer.scala:564)
>         at 
> kafka.network.Processor$$anonfun$run$13.apply(SocketServer.scala:450)
>         at 
> kafka.network.Processor$$anonfun$run$13.apply(SocketServer.scala:445)
>         at scala.collection.Iterator$class.foreach(Iterator.scala:742)
>         at scala.collection.AbstractIterator.foreach(Iterator.scala:1194)
>         at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
>         at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
>         at kafka.network.Processor.run(SocketServer.scala:445)
>         at java.lang.Thread.run(Thread.java:745)
> [2016-05-09 10:38:43,932] ERROR Processor got uncaught exception. 
> (kafka.network.Processor)
> java.lang.IllegalArgumentException: Attempted to decrease connection count 
> for address with no connections, address: /X.Y.Z.144
>         at 
> kafka.network.ConnectionQuotas$$anonfun$9.apply(SocketServer.scala:565)
>         at 
> kafka.network.ConnectionQuotas$$anonfun$9.apply(SocketServer.scala:565)
>         at scala.collection.MapLike$class.getOrElse(MapLike.scala:128)
>         at scala.collection.AbstractMap.getOrElse(Map.scala:59)
>         at kafka.network.ConnectionQuotas.dec(SocketServer.scala:564)
>         at 
> kafka.network.Processor$$anonfun$run$13.apply(SocketServer.scala:450)
>         at 
> kafka.network.Processor$$anonfun$run$13.apply(SocketServer.scala:445)
>         at scala.collection.Iterator$class.foreach(Iterator.scala:742)
>         at scala.collection.AbstractIterator.foreach(Iterator.scala:1194)
>         at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
>         at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
>         at kafka.network.Processor.run(SocketServer.scala:445)
>         at java.lang.Thread.run(Thread.java:745)
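
The trace above shows the exception being thrown from {{ConnectionQuotas.dec}} when the address has no recorded connections. A minimal Java sketch of that kind of per-address counter, modelled only on the error message and not copied from Kafka's Scala code, shows why decrementing twice for a single connection fails. The class {{ConnectionCountSketch}} and its methods are hypothetical.

```java
import java.net.InetAddress;
import java.util.HashMap;
import java.util.Map;

public class ConnectionCountSketch {

    // Per-address connection counts; an address with no connections has no entry.
    private final Map<InetAddress, Integer> counts = new HashMap<>();

    // Record one more connection from the given address.
    synchronized void inc(InetAddress address) {
        counts.merge(address, 1, Integer::sum);
    }

    // Record one fewer connection; fail if the address has no entry.
    synchronized void dec(InetAddress address) {
        Integer count = counts.get(address);
        if (count == null) {
            throw new IllegalArgumentException(
                "Attempted to decrease connection count for address with no connections, address: "
                    + address);
        }
        if (count == 1) {
            counts.remove(address);
        } else {
            counts.put(address, count - 1);
        }
    }

    // One connection is opened and closed, then "closed" a second time, as
    // would happen if the same disconnect were processed more than once.
    static boolean secondDecThrows() {
        ConnectionCountSketch quotas = new ConnectionCountSketch();
        InetAddress address = InetAddress.getLoopbackAddress();
        quotas.inc(address);
        quotas.dec(address);
        try {
            quotas.dec(address);
            return false;
        } catch (IllegalArgumentException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("second dec throws: " + secondDecThrows());
    }
}
```

If the same disconnected connection were reported on every poll, each iteration would hit this failure path again, which would be consistent with the tight loop of errors in the log.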



