[ https://issues.apache.org/jira/browse/KAFKA-5038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Eno Thereska reassigned KAFKA-5038: ----------------------------------- Assignee: Eno Thereska > running multiple kafka streams instances causes one or more instance to get > into file contention > ------------------------------------------------------------------------------------------------ > > Key: KAFKA-5038 > URL: https://issues.apache.org/jira/browse/KAFKA-5038 > Project: Kafka > Issue Type: Bug > Components: streams > Affects Versions: 0.10.2.0 > Environment: 3 Kafka broker machines and 3 kafka streams machines. > Each machine is Linux 64 bit, CentOS 6.5 with 64GB memory, 8 vCPUs running in > AWS > 31GB java heap space allocated to each KafkaStreams instance and 4GB > allocated to each Kafka broker. > Reporter: Bharad Tirumala > Assignee: Eno Thereska > > Having multiple kafka streams application instances causes one or more > instances to get get into file lock contention and the instance(s) become > unresponsive with uncaught exception. > The exception is below: > 22:14:37.621 [StreamThread-7] WARN o.a.k.s.p.internals.StreamThread - > Unexpected state transition from RUNNING to NOT_RUNNING > 22:14:37.621 [StreamThread-13] WARN o.a.k.s.p.internals.StreamThread - > Unexpected state transition from RUNNING to NOT_RUNNING > 22:14:37.623 [StreamThread-18] WARN o.a.k.s.p.internals.StreamThread - > Unexpected state transition from RUNNING to NOT_RUNNING > 22:14:37.625 [StreamThread-7] ERROR n.a.a.k.t.KStreamTopologyBase - Uncaught > Exception:org.apache.kafka.streams.errors.ProcessorStateException: task > directory [/data/kafka-streams/rtp-kstreams-metrics/0_119] doesn't exist and > couldn't be created > at > org.apache.kafka.streams.processor.internals.StateDirectory.directoryForTask(StateDirectory.java:75) > at > org.apache.kafka.streams.processor.internals.StateDirectory.lock(StateDirectory.java:102) > at > org.apache.kafka.streams.processor.internals.StateDirectory.cleanRemovedTasks(StateDirectory.java:205) > at > org.apache.kafka.streams.processor.internals.StreamThread.maybeClean(StreamThread.java:753) > at > org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:664) > at > org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:368) > This happens within couple of minutes after the instances are up and there is > NO data being sent to the broker yet and the streams app is started with > auto.offset.reset set to "latest". > Please note that there are no permissions or capacity issues. This may have > nothing to do with number of instances, but I could easily reproduce it when > I've 3 stream instances running. This is similar to the (and may be the same) > bug as [KAFKA-3758] > Here are some relevant configuration info: > 3 kafka brokers have one topic with 128 partitions and 1 replication > 3 kafka streams applications (running on 3 machines) have a single processor > topology and this processor is not doing anything (the process() method just > returns and the punctuate method just commits) > There is no data flowing yet, so the process() and puctuate() methods are not > even called yet. > The 3 kafka stream instances have 43, 43 and 42 threads each respectively > (totally making up to 128 threads, so one task per thread distributed across > three streams instances on 3 machines). > Here are the configurations that I'd played around with: > session.timeout.ms=300000 > heartbeat.interval.ms=60000 > max.poll.records=100 > num.standby.replicas=1 > commit.interval.ms=10000 > poll.ms=100 > When punctuate is scheduled to be called every 1000ms or 3000ms, the problem > happens every time. If punctuate is scheduled for 5000ms, I didn't see the > problem in my test scenario (described above), but it happened in my real > application. But this may have nothing to do with the issue, since punctuate > is not even called as there are no messages streaming through yet. -- This message was sent by Atlassian JIRA (v6.3.15#6346)