[ 
https://issues.apache.org/jira/browse/KAFKA-8970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16944835#comment-16944835
 ] 

Sophie Blee-Goldman commented on KAFKA-8970:
--------------------------------------------

What is the motivation for running two instances instead of using two threads? 
(I assume they're both running with the same app.id, or is that incorrect?) I 
agree that having each use a separate state directory is not ideal, since they 
then won't be able to share state and will have to rebuild from scratch if a 
task is migrated from one instance to the other. 

That said, I agree it is actually not safe to share the state dir between 
different KafkaStreams instances. The reason is that each task has a 
subdirectory that is locked by the owning thread to prevent access by others, 
and each KafkaStreams has a StateDirectory object keeping track of these locks 
with a Map<TaskId, OwnerAndLock>. If you have two KafkaStreams, they will each 
have a separate StateDirectory object (even if the actual underlying state.dir 
is the same), and each one will be completely unaware of the locks held by the 
threads of the other KafkaStreams.
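
To illustrate the point about per-instance lock tables, here is a minimal 
sketch. ToyStateDirectory, its lock() method, and the owner strings are 
hypothetical stand-ins made up for this example; this is not the real 
StateDirectory code, and it models only the in-memory map, not any file locks. 
It assumes both objects are pointed at the same state.dir.

```java
import java.util.HashMap;
import java.util.Map;

public class LockTableSketch {

    // Hypothetical stand-in for a per-instance lock table (NOT the real StateDirectory).
    static final class ToyStateDirectory {
        private final String stateDir;
        // Each object keeps its own map; nothing is shared between objects.
        private final Map<String, String> ownerByTask = new HashMap<>();

        ToyStateDirectory(String stateDir) {
            this.stateDir = stateDir;
        }

        String stateDir() {
            return stateDir;
        }

        // Returns true if, according to THIS object's map, the task directory
        // is free or already owned by the caller.
        synchronized boolean lock(String taskId, String owner) {
            String current = ownerByTask.putIfAbsent(taskId, owner);
            return current == null || current.equals(owner);
        }
    }

    // Two toy objects over the same path, as with two KafkaStreams in one JVM.
    static boolean[] demo() {
        ToyStateDirectory a = new ToyStateDirectory("/tmp/kafka-streams");
        ToyStateDirectory b = new ToyStateDirectory("/tmp/kafka-streams");

        boolean first = a.lock("0_0", "A-thread-1");         // A's map is empty
        boolean sameInstance = a.lock("0_0", "A-thread-2");  // A's map sees the owner
        boolean otherInstance = b.lock("0_0", "B-thread-1"); // B's map knows nothing of A
        return new boolean[] {first, sameInstance, otherInstance};
    }

    public static void main(String[] args) {
        boolean[] r = demo();
        System.out.println("first=" + r[0]
                + " sameInstance=" + r[1]
                + " otherInstance=" + r[2]);
    }
}
```

In this sketch a second thread of the same instance is refused, while a thread 
of the other instance is granted the "lock" on the same task directory, because 
the two maps never see each other.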

> StateDirectory creation fails with Exception
> --------------------------------------------
>
>                 Key: KAFKA-8970
>                 URL: https://issues.apache.org/jira/browse/KAFKA-8970
>             Project: Kafka
>          Issue Type: Bug
>          Components: streams
>            Reporter: Nishkam Ravi
>            Priority: Major
>
> When two threads try to create KafkaStreams simultaneously, one of them 
> succeeds while the other fails with the following exception:
> org.apache.kafka.streams.errors.StreamsException: 
> org.apache.kafka.streams.errors.ProcessorStateException: base state directory 
> [/tmp/kafka-streams] doesn't exist and couldn't be created
> Quick investigation suggests that this is because the code at/around:
> [https://github.com/apache/kafka/blob/trunk/streams/src/main/java/org/apache/kafka/streams/processor/internals/StateDirectory.java#L82]
> is not synchronized and can lead to race conditions.
> Specifying different values for state.dir can be a workaround for this issue 
> but a bit cumbersome. Can we just make this synchronized?
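
For reference, the kind of race the report describes can be sketched as 
follows. This is an illustration under assumptions, not the actual 
StateDirectory code: it assumes a check-then-create pattern of the form 
"if (!dir.exists() && !dir.mkdirs()) throw ...", which can fail when another 
thread creates the directory between the two calls, since File.mkdirs() 
returns false if the directory already exists. The class and method names 
below (CreateDirSketch, ensureDir, raceOnce) are made up for the example, 
which shows an idempotent alternative using Files.createDirectories rather 
than adding synchronization.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class CreateDirSketch {

    // Idempotent creation: Files.createDirectories does not throw if the
    // directory already exists, so there is no separate exists() check to race on.
    static Path ensureDir(Path dir) throws IOException {
        return Files.createDirectories(dir);
    }

    // Starts several threads that all try to create the same directory at once
    // and returns how many of them completed without an exception.
    static int raceOnce(int threads) throws Exception {
        Path parent = Files.createTempDirectory("state-dir-sketch");
        Path target = parent.resolve("kafka-streams");

        ExecutorService pool = Executors.newFixedThreadPool(threads);
        CountDownLatch start = new CountDownLatch(1);
        List<Future<Path>> results = new ArrayList<>();
        try {
            for (int i = 0; i < threads; i++) {
                results.add(pool.submit(() -> {
                    start.await();            // line all threads up
                    return ensureDir(target); // every caller should succeed
                }));
            }
            start.countDown();

            int ok = 0;
            for (Future<Path> f : results) {
                f.get(); // rethrows any failure as ExecutionException
                ok++;
            }
            return ok;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("successful callers: " + raceOnce(8));
    }
}
```

Note that this only addresses creation of the base directory; it does not make 
it safe for two KafkaStreams to share the same state.dir.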



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
