[jira] [Commented] (KAFKA-5038) running multiple kafka streams instances causes one or more instance to get into file contention

2017-04-12 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15966069#comment-15966069
 ] 

ASF GitHub Bot commented on KAFKA-5038:
---

Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/2848


> running multiple kafka streams instances causes one or more instance to get 
> into file contention
> 
>
> Key: KAFKA-5038
> URL: https://issues.apache.org/jira/browse/KAFKA-5038
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 0.10.2.0
> Environment: 3 Kafka broker machines and 3 kafka streams machines.
> Each machine is Linux 64 bit, CentOS 6.5 with 64GB memory, 8 vCPUs running in 
> AWS
> 31GB java heap space allocated to each KafkaStreams instance and 4GB 
> allocated to each Kafka broker.
>Reporter: Bharad Tirumala
>Assignee: Eno Thereska
> Fix For: 0.11.0.0, 0.10.2.1
>
>
> Having multiple kafka streams application instances causes one or more 
> instances to get into file lock contention and the instance(s) become 
> unresponsive with an uncaught exception.
> The exception is below:
> 22:14:37.621 [StreamThread-7] WARN  o.a.k.s.p.internals.StreamThread - 
> Unexpected state transition from RUNNING to NOT_RUNNING
> 22:14:37.621 [StreamThread-13] WARN  o.a.k.s.p.internals.StreamThread - 
> Unexpected state transition from RUNNING to NOT_RUNNING
> 22:14:37.623 [StreamThread-18] WARN  o.a.k.s.p.internals.StreamThread - 
> Unexpected state transition from RUNNING to NOT_RUNNING
> 22:14:37.625 [StreamThread-7] ERROR n.a.a.k.t.KStreamTopologyBase - Uncaught 
> Exception:org.apache.kafka.streams.errors.ProcessorStateException: task 
> directory [/data/kafka-streams/rtp-kstreams-metrics/0_119] doesn't exist and 
> couldn't be created
>   at 
> org.apache.kafka.streams.processor.internals.StateDirectory.directoryForTask(StateDirectory.java:75)
>   at 
> org.apache.kafka.streams.processor.internals.StateDirectory.lock(StateDirectory.java:102)
>   at 
> org.apache.kafka.streams.processor.internals.StateDirectory.cleanRemovedTasks(StateDirectory.java:205)
>   at 
> org.apache.kafka.streams.processor.internals.StreamThread.maybeClean(StreamThread.java:753)
>   at 
> org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:664)
>   at 
> org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:368)
> This happens within a couple of minutes after the instances are up; there is 
> NO data being sent to the broker yet, and the streams app is started with 
> auto.offset.reset set to "latest".
> Please note that there are no permissions or capacity issues. This may have 
> nothing to do with the number of instances, but I could easily reproduce it 
> with 3 stream instances running. This is similar to (and may be the same 
> as) the bug in [KAFKA-3758].
> Here is some relevant configuration info:
> 3 kafka brokers have one topic with 128 partitions and 1 replication
> 3 kafka streams applications (running on 3 machines) have a single processor 
> topology and this processor is not doing anything (the process() method just 
> returns and the punctuate method just commits)
> There is no data flowing yet, so the process() and punctuate() methods are not 
> even called yet.
> The 3 kafka stream instances have 43, 43 and 42 threads respectively 
> (128 threads in total, so one task per thread distributed across 
> three streams instances on 3 machines).
> Here are the configurations that I'd played around with:
> session.timeout.ms=30
> heartbeat.interval.ms=6
> max.poll.records=100
> num.standby.replicas=1
> commit.interval.ms=1
> poll.ms=100
> When punctuate is scheduled to be called every 1000ms or 3000ms, the problem 
> happens every time. If punctuate is scheduled for 5000ms, I didn't see the 
> problem in my test scenario (described above), but it happened in my real 
> application. But this may have nothing to do with the issue, since punctuate 
> is not even called as there are no messages streaming through yet.
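The stack trace above ends in directoryForTask() failing to create a task directory. One plausible way for that to happen under concurrency is a check-then-create sequence (test exists(), then mkdir()) racing with another thread that creates or removes the same directory. The sketch below is not Kafka's StateDirectory code; the class and the ensureTaskDir helper are hypothetical names used only to illustrate an idempotent alternative, assuming plain java.nio is acceptable.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class TaskDirSketch {

    // Hypothetical helper, not part of Kafka. Files.createDirectories does not
    // throw when the directory already exists, so two threads asking for the
    // same task directory both get a usable path instead of one of them failing.
    static Path ensureTaskDir(Path stateDir, String taskId) throws IOException {
        return Files.createDirectories(stateDir.resolve(taskId));
    }

    public static void main(String[] args) throws Exception {
        Path base = Files.createTempDirectory("kstreams-sketch");
        Path first = ensureTaskDir(base, "0_119");
        Path second = ensureTaskDir(base, "0_119"); // second call is a no-op
        System.out.println(Files.isDirectory(first) && first.equals(second));
    }
}
```

Whether the real failure came from this exact interleaving is not established by the report; the sketch only shows why idempotent creation removes one class of spurious "couldn't be created" errors.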



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (KAFKA-5038) running multiple kafka streams instances causes one or more instance to get into file contention

2017-04-12 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15965971#comment-15965971
 ] 

ASF GitHub Bot commented on KAFKA-5038:
---

GitHub user enothereska opened a pull request:

https://github.com/apache/kafka/pull/2848

KAFKA-5038: Catch exception

Porting from 0.10.2 PR https://github.com/apache/kafka/pull/2841

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/enothereska/kafka KAFKA-5038-trunk

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/2848.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2848


commit 2f2be04147d2bc844cdfa56a4fe6a5235733f07b
Author: Eno Thereska 
Date:   2017-04-12T14:36:16Z

Porting from 0.10.2




> running multiple kafka streams instances causes one or more instance to get 
> into file contention
> 
>
> Key: KAFKA-5038
> URL: https://issues.apache.org/jira/browse/KAFKA-5038
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 0.10.2.0
> Environment: 3 Kafka broker machines and 3 kafka streams machines.
> Each machine is Linux 64 bit, CentOS 6.5 with 64GB memory, 8 vCPUs running in 
> AWS
> 31GB java heap space allocated to each KafkaStreams instance and 4GB 
> allocated to each Kafka broker.
>Reporter: Bharad Tirumala
>Assignee: Eno Thereska
>

[jira] [Commented] (KAFKA-5038) running multiple kafka streams instances causes one or more instance to get into file contention

2017-04-12 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15965958#comment-15965958
 ] 

ASF GitHub Bot commented on KAFKA-5038:
---

Github user enothereska closed the pull request at:

https://github.com/apache/kafka/pull/2841


> running multiple kafka streams instances causes one or more instance to get 
> into file contention
> 
>
> Key: KAFKA-5038
> URL: https://issues.apache.org/jira/browse/KAFKA-5038
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 0.10.2.0
> Environment: 3 Kafka broker machines and 3 kafka streams machines.
> Each machine is Linux 64 bit, CentOS 6.5 with 64GB memory, 8 vCPUs running in 
> AWS
> 31GB java heap space allocated to each KafkaStreams instance and 4GB 
> allocated to each Kafka broker.
>Reporter: Bharad Tirumala
>Assignee: Eno Thereska
>


[jira] [Commented] (KAFKA-5038) running multiple kafka streams instances causes one or more instance to get into file contention

2017-04-11 Thread Bharad Tirumala (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15965448#comment-15965448
 ] 

Bharad Tirumala commented on KAFKA-5038:


[~enothereska], I tried the fix and it seems to be working. Appreciate the 
quick help.


> running multiple kafka streams instances causes one or more instance to get 
> into file contention
> 
>
> Key: KAFKA-5038
> URL: https://issues.apache.org/jira/browse/KAFKA-5038
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 0.10.2.0
> Environment: 3 Kafka broker machines and 3 kafka streams machines.
> Each machine is Linux 64 bit, CentOS 6.5 with 64GB memory, 8 vCPUs running in 
> AWS
> 31GB java heap space allocated to each KafkaStreams instance and 4GB 
> allocated to each Kafka broker.
>Reporter: Bharad Tirumala
>Assignee: Eno Thereska
>


[jira] [Commented] (KAFKA-5038) running multiple kafka streams instances causes one or more instance to get into file contention

2017-04-11 Thread Eno Thereska (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15964883#comment-15964883
 ] 

Eno Thereska commented on KAFKA-5038:
-

[~btirumala] thank you for posting. Looks like it could be a bug. I've opened 
a PR: https://github.com/apache/kafka/pull/2841. Would you mind trying to see 
if it fixes the problem? Thanks.
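The PR is titled "Catch exception", and its diff is not reproduced in this thread. As a rough illustration of that general idea only, the sketch below shows a cleanup pass that catches a failure for one task and moves on instead of letting it escape and stop the thread. The class, the lockableTasks method, and the tryLock callback are hypothetical; in Kafka the exception in question would be ProcessorStateException, which is a RuntimeException.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

public class CleanupSketch {

    // Hypothetical cleanup pass: try to lock each task id, and skip any id
    // whose lock attempt throws, rather than propagating the exception.
    static List<String> lockableTasks(List<String> taskIds, Predicate<String> tryLock) {
        List<String> locked = new ArrayList<>();
        for (String id : taskIds) {
            try {
                if (tryLock.test(id)) {
                    locked.add(id);
                }
            } catch (RuntimeException e) {
                // Log and continue; the task can be retried on the next pass.
                System.err.println("skipping " + id + ": " + e.getMessage());
            }
        }
        return locked;
    }

    public static void main(String[] args) {
        List<String> result = lockableTasks(Arrays.asList("0_1", "0_119", "0_2"), id -> {
            if (id.equals("0_119")) {
                throw new IllegalStateException("task directory couldn't be created");
            }
            return true;
        });
        System.out.println(result);
    }
}
```

The design choice illustrated is simply that a failure for one task directory during periodic cleanup is treated as recoverable, so the stream thread keeps running.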


> running multiple kafka streams instances causes one or more instance to get 
> into file contention
> 
>
> Key: KAFKA-5038
> URL: https://issues.apache.org/jira/browse/KAFKA-5038
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 0.10.2.0
> Environment: 3 Kafka broker machines and 3 kafka streams machines.
> Each machine is Linux 64 bit, CentOS 6.5 with 64GB memory, 8 vCPUs running in 
> AWS
> 31GB java heap space allocated to each KafkaStreams instance and 4GB 
> allocated to each Kafka broker.
>Reporter: Bharad Tirumala
>


[jira] [Commented] (KAFKA-5038) running multiple kafka streams instances causes one or more instance to get into file contention

2017-04-11 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15964881#comment-15964881
 ] 

ASF GitHub Bot commented on KAFKA-5038:
---

GitHub user enothereska opened a pull request:

https://github.com/apache/kafka/pull/2841

KAFKA-5038: Catch exception



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/enothereska/kafka KAFKA-5038

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/2841.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2841


commit b484904a74458dd20d3b7b0deb26f822feca9140
Author: Eno Thereska 
Date:   2017-04-11T19:49:28Z

Catch exception




> running multiple kafka streams instances causes one or more instance to get 
> into file contention
> 
>
> Key: KAFKA-5038
> URL: https://issues.apache.org/jira/browse/KAFKA-5038
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 0.10.2.0
> Environment: 3 Kafka broker machines and 3 kafka streams machines.
> Each machine is Linux 64 bit, CentOS 6.5 with 64GB memory, 8 vCPUs running in 
> AWS
> 31GB java heap space allocated to each KafkaStreams instance and 4GB 
> allocated to each Kafka broker.
>Reporter: Bharad Tirumala
>


[jira] [Commented] (KAFKA-5038) running multiple kafka streams instances causes one or more instance to get into file contention

2017-04-06 Thread Bharad Tirumala (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15960352#comment-15960352
 ] 

Bharad Tirumala commented on KAFKA-5038:


btw, this happens with as few as 5 threads per streams instance. The tasks get 
distributed across all 15 threads in the 3 instances before this file lock 
contention occurs (mkdir in directoryForTask() failing).

I've uploaded the test program with the relevant properties file, and also the 
logs from the instance when the issue happened.
(I've masked the actual server urls in the logs and in the properties file)

test code and properties file at:
https://gist.github.com/btirumala/e996b2d09415f3ba42ad58dac4a64507

The debug logs of a failed instance at:
https://gist.github.com/btirumala/4516b3cd31271fab3eaf390a483f07fb


> running multiple kafka streams instances causes one or more instance to get 
> into file contention
> 
>
> Key: KAFKA-5038
> URL: https://issues.apache.org/jira/browse/KAFKA-5038
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 0.10.2.0
> Environment: 3 Kafka broker machines and 3 kafka streams machines.
> Each machine is Linux 64 bit, CentOS 6.5 with 64GB memory, 8 vCPUs running in 
> AWS
> 31GB java heap space allocated to each KafkaStreams instance and 4GB 
> allocated to each Kafka broker.
>Reporter: Bharad Tirumala
>


[jira] [Commented] (KAFKA-5038) running multiple kafka streams instances causes one or more instance to get into file contention

2017-04-06 Thread Bharad Tirumala (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15960123#comment-15960123
 ] 

Bharad Tirumala commented on KAFKA-5038:


Tried it with 0.10.2.1 build with PR #2793 from enothereska/KAFKA-4916-0.10.2
and the issue happened again...

> running multiple kafka streams instances causes one or more instance to get 
> into file contention
> 
>
> Key: KAFKA-5038
> URL: https://issues.apache.org/jira/browse/KAFKA-5038
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 0.10.2.0
> Environment: 3 Kafka broker machines and 3 kafka streams machines.
> Each machine is Linux 64 bit, CentOS 6.5 with 64GB memory, 8 vCPUs running in 
> AWS
> 31GB java heap space allocated to each KafkaStreams instance and 4GB 
> allocated to each Kafka broker.
>Reporter: Bharad Tirumala
> Fix For: 0.10.2.0
>
>