[jira] [Commented] (KAFKA-8042) Kafka Streams creates many segment stores on state restore
[ https://issues.apache.org/jira/browse/KAFKA-8042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16802020#comment-16802020 ] Adrian McCague commented on KAFKA-8042: --- This definitely applies to the 2.1 release. KAFKA-7934 does look like a path to avoid this issue, but I feel like there is another problem here with regard to the number of segments being created and not removed. It appears the total number that could be created during rebalancing is arbitrary, implying to me that there is a degree of redundancy here. As it stands moving to this version introduces a risk of filling up our state volumes. > Kafka Streams creates many segment stores on state restore > -- > > Key: KAFKA-8042 > URL: https://issues.apache.org/jira/browse/KAFKA-8042 > Project: Kafka > Issue Type: Bug > Components: streams >Affects Versions: 2.1.0, 2.1.1 >Reporter: Adrian McCague >Priority: Major > Attachments: StateStoreSegments-StreamsConfig.txt > > > Note that this is from the perspective of one instance of an application, > where there are 8 instances total, with partition count 8 for all topics and > of course stores. Standby replicas = 1. > In the process there are multiple instances of {{KafkaStreams}} so the below > detail is from one of these. > h2. Actual Behaviour > During state restore of an application, many segment stores are created (I am > using MANIFEST files as a marker since they preallocate 4MB each). As can be > seen this topology has 5 joins - which is the extent of its state. > {code:java} > bash-4.2# pwd > /data/fooapp/0_7 > bash-4.2# for dir in $(find . 
-maxdepth 1 -type d); do echo "${dir}: $(find > ${dir} -type f -name 'MANIFEST-*' -printf x | wc -c)"; done > .: 8058 > ./KSTREAM-JOINOTHER-25-store: 851 > ./KSTREAM-JOINOTHER-40-store: 819 > ./KSTREAM-JOINTHIS-24-store: 851 > ./KSTREAM-JOINTHIS-29-store: 836 > ./KSTREAM-JOINOTHER-35-store: 819 > ./KSTREAM-JOINOTHER-30-store: 819 > ./KSTREAM-JOINOTHER-45-store: 745 > ./KSTREAM-JOINTHIS-39-store: 819 > ./KSTREAM-JOINTHIS-44-store: 685 > ./KSTREAM-JOINTHIS-34-store: 819 > There are many (x800 as above) of these segment files: > ./KSTREAM-JOINOTHER-25-store.155146629 > ./KSTREAM-JOINOTHER-25-store.155155902 > ./KSTREAM-JOINOTHER-25-store.155149269 > ./KSTREAM-JOINOTHER-25-store.155154879 > ./KSTREAM-JOINOTHER-25-store.155169861 > ./KSTREAM-JOINOTHER-25-store.155153064 > ./KSTREAM-JOINOTHER-25-store.155148444 > ./KSTREAM-JOINOTHER-25-store.155155671 > ./KSTREAM-JOINOTHER-25-store.155168673 > ./KSTREAM-JOINOTHER-25-store.155159565 > ./KSTREAM-JOINOTHER-25-store.155175735 > ./KSTREAM-JOINOTHER-25-store.155168574 > ./KSTREAM-JOINOTHER-25-store.155163525 > ./KSTREAM-JOINOTHER-25-store.155165241 > ./KSTREAM-JOINOTHER-25-store.155146662 > ./KSTREAM-JOINOTHER-25-store.155178177 > ./KSTREAM-JOINOTHER-25-store.155158740 > ./KSTREAM-JOINOTHER-25-store.155168145 > ./KSTREAM-JOINOTHER-25-store.155166231 > ./KSTREAM-JOINOTHER-25-store.155172171 > ./KSTREAM-JOINOTHER-25-store.155175075 > ./KSTREAM-JOINOTHER-25-store.155163096 > ./KSTREAM-JOINOTHER-25-store.155161512 > ./KSTREAM-JOINOTHER-25-store.155179233 > ./KSTREAM-JOINOTHER-25-store.155146266 > ./KSTREAM-JOINOTHER-25-store.155153691 > ./KSTREAM-JOINOTHER-25-store.155159235 > ./KSTREAM-JOINOTHER-25-store.155152734 > ./KSTREAM-JOINOTHER-25-store.155160687 > ./KSTREAM-JOINOTHER-25-store.155174415 > ./KSTREAM-JOINOTHER-25-store.155150820 > ./KSTREAM-JOINOTHER-25-store.155148642 > ... 
etc > {code} > Once re-balancing and state restoration is complete - the redundant segment > files are deleted and the segment count drops to 508 total (where the above > mentioned state directory is one of many). > We have seen the number of these segment stores grow to as many as 15000 over > the baseline 508 which can fill smaller volumes. *This means that a state > volume that would normally have ~300MB total disk usage would use in excess > of 30GB during rebalancing*, mostly preallocated MANIFEST files. > h2. Expected Behaviour > For this particular application we expect 508 segment folders total to be > active and existing throughout rebalancing. Give or take migrated tasks that > are subject to the {{state.cleanup.delay.ms}}. > h2. Preliminary investigation > * This does not appear to be the case in v1.1.0. With our application the > number
[jira] [Updated] (KAFKA-8042) Kafka Streams creates many segment stores on state restore
[ https://issues.apache.org/jira/browse/KAFKA-8042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adrian McCague updated KAFKA-8042: -- Description: Note that this is from the perspective of one instance of an application, where there are 8 instances total, with partition count 8 for all topics and of course stores. Standby replicas = 1. In the process there are multiple instances of {{KafkaStreams}} so the below detail is from one of these. h2. Actual Behaviour During state restore of an application, many segment stores are created (I am using MANIFEST files as a marker since they preallocate 4MB each). As can be seen this topology has 5 joins - which is the extent of its state. {code:java} bash-4.2# pwd /data/fooapp/0_7 bash-4.2# for dir in $(find . -maxdepth 1 -type d); do echo "${dir}: $(find ${dir} -type f -name 'MANIFEST-*' -printf x | wc -c)"; done .: 8058 ./KSTREAM-JOINOTHER-25-store: 851 ./KSTREAM-JOINOTHER-40-store: 819 ./KSTREAM-JOINTHIS-24-store: 851 ./KSTREAM-JOINTHIS-29-store: 836 ./KSTREAM-JOINOTHER-35-store: 819 ./KSTREAM-JOINOTHER-30-store: 819 ./KSTREAM-JOINOTHER-45-store: 745 ./KSTREAM-JOINTHIS-39-store: 819 ./KSTREAM-JOINTHIS-44-store: 685 ./KSTREAM-JOINTHIS-34-store: 819 There are many (x800 as above) of these segment files: ./KSTREAM-JOINOTHER-25-store.155146629 ./KSTREAM-JOINOTHER-25-store.155155902 ./KSTREAM-JOINOTHER-25-store.155149269 ./KSTREAM-JOINOTHER-25-store.155154879 ./KSTREAM-JOINOTHER-25-store.155169861 ./KSTREAM-JOINOTHER-25-store.155153064 ./KSTREAM-JOINOTHER-25-store.155148444 ./KSTREAM-JOINOTHER-25-store.155155671 ./KSTREAM-JOINOTHER-25-store.155168673 ./KSTREAM-JOINOTHER-25-store.155159565 ./KSTREAM-JOINOTHER-25-store.155175735 ./KSTREAM-JOINOTHER-25-store.155168574 ./KSTREAM-JOINOTHER-25-store.155163525 ./KSTREAM-JOINOTHER-25-store.155165241 ./KSTREAM-JOINOTHER-25-store.155146662 ./KSTREAM-JOINOTHER-25-store.155178177 ./KSTREAM-JOINOTHER-25-store.155158740 ./KSTREAM-JOINOTHER-25-store.155168145 
./KSTREAM-JOINOTHER-25-store.155166231 ./KSTREAM-JOINOTHER-25-store.155172171 ./KSTREAM-JOINOTHER-25-store.155175075 ./KSTREAM-JOINOTHER-25-store.155163096 ./KSTREAM-JOINOTHER-25-store.155161512 ./KSTREAM-JOINOTHER-25-store.155179233 ./KSTREAM-JOINOTHER-25-store.155146266 ./KSTREAM-JOINOTHER-25-store.155153691 ./KSTREAM-JOINOTHER-25-store.155159235 ./KSTREAM-JOINOTHER-25-store.155152734 ./KSTREAM-JOINOTHER-25-store.155160687 ./KSTREAM-JOINOTHER-25-store.155174415 ./KSTREAM-JOINOTHER-25-store.155150820 ./KSTREAM-JOINOTHER-25-store.155148642 ... etc {code} Once re-balancing and state restoration is complete - the redundant segment files are deleted and the segment count drops to 508 total (where the above mentioned state directory is one of many). We have seen the number of these segment stores grow to as many as 15000 over the baseline 508 which can fill smaller volumes. *This means that a state volume that would normally have ~300MB total disk usage would use in excess of 30GB during rebalancing*, mostly preallocated MANIFEST files. h2. Expected Behaviour For this particular application we expect 508 segment folders total to be active and existing throughout rebalancing. Give or take migrated tasks that are subject to the {{state.cleanup.delay.ms}}. h2. Preliminary investigation * This does not appear to be the case in v1.1.0. With our application the number of state directories only grows to 670 (over the base line 508) * The MANIFEST files were not preallocated to 4MB in v1.1.0 they are now in v2.1.x, this appears to be expected RocksDB behaviour, but exacerbates the many segment stores. * Suspect https://github.com/apache/kafka/pull/5253 to be the source of this change of behaviour. A workaround is to use {{rocksdb.config.setter}} and set the preallocated amount for MANIFEST files to a lower value such as 64KB, however the number of segment stores appears to be unbounded so disk volumes may still fill up for a heavier application. 
was: Note that this from the perspective of one instance of an application, where there are 8 instances total, with partition count 8 for all topics and of course stores. Standby replicas = 1. In the process there are multiple instances of {{KafkaStreams}} so the below detail is from one of these. h2. Actual Behaviour During state restore of an application, many segment stores are created (I am using MANIFEST files as a marker since they preallocate 4MB each). As can be seen this
[jira] [Created] (KAFKA-8042) Kafka Streams creates many segment stores on state restore
Adrian McCague created KAFKA-8042: - Summary: Kafka Streams creates many segment stores on state restore Key: KAFKA-8042 URL: https://issues.apache.org/jira/browse/KAFKA-8042 Project: Kafka Issue Type: Bug Components: streams Affects Versions: 2.1.1, 2.1.0 Reporter: Adrian McCague Attachments: StateStoreSegments-StreamsConfig.txt Note that this is from the perspective of one instance of an application, where there are 8 instances total, with partition count 8 for all topics and of course stores. Standby replicas = 1. h2. Actual Behaviour During state restore of an application, many segment stores are created (I am using MANIFEST files as a marker since they preallocate 4MB each): {code:java} bash-4.2# pwd /data/fooapp/0_7 bash-4.2# for dir in $(find . -maxdepth 1 -type d); do echo "${dir}: $(find ${dir} -type f -name 'MANIFEST-*' -printf x | wc -c)"; done .: 8058 ./KSTREAM-JOINOTHER-25-store: 851 ./KSTREAM-JOINOTHER-40-store: 819 ./KSTREAM-JOINTHIS-24-store: 851 ./KSTREAM-JOINTHIS-29-store: 836 ./KSTREAM-JOINOTHER-35-store: 819 ./KSTREAM-JOINOTHER-30-store: 819 ./KSTREAM-JOINOTHER-45-store: 745 ./KSTREAM-JOINTHIS-39-store: 819 ./KSTREAM-JOINTHIS-44-store: 685 ./KSTREAM-JOINTHIS-34-store: 819 There are many (x800 as above) of these segment files: ./KSTREAM-JOINOTHER-25-store.155146629 ./KSTREAM-JOINOTHER-25-store.155155902 ./KSTREAM-JOINOTHER-25-store.155149269 ./KSTREAM-JOINOTHER-25-store.155154879 ./KSTREAM-JOINOTHER-25-store.155169861 ./KSTREAM-JOINOTHER-25-store.155153064 ./KSTREAM-JOINOTHER-25-store.155148444 ./KSTREAM-JOINOTHER-25-store.155155671 ./KSTREAM-JOINOTHER-25-store.155168673 ./KSTREAM-JOINOTHER-25-store.155159565 ./KSTREAM-JOINOTHER-25-store.155175735 ./KSTREAM-JOINOTHER-25-store.155168574 ./KSTREAM-JOINOTHER-25-store.155163525 ./KSTREAM-JOINOTHER-25-store.155165241 ./KSTREAM-JOINOTHER-25-store.155146662 ./KSTREAM-JOINOTHER-25-store.155178177 ./KSTREAM-JOINOTHER-25-store.155158740 ./KSTREAM-JOINOTHER-25-store.155168145 
./KSTREAM-JOINOTHER-25-store.155166231 ./KSTREAM-JOINOTHER-25-store.155172171 ./KSTREAM-JOINOTHER-25-store.155175075 ./KSTREAM-JOINOTHER-25-store.155163096 ./KSTREAM-JOINOTHER-25-store.155161512 ./KSTREAM-JOINOTHER-25-store.155179233 ./KSTREAM-JOINOTHER-25-store.155146266 ./KSTREAM-JOINOTHER-25-store.155153691 ./KSTREAM-JOINOTHER-25-store.155159235 ./KSTREAM-JOINOTHER-25-store.155152734 ./KSTREAM-JOINOTHER-25-store.155160687 ./KSTREAM-JOINOTHER-25-store.155174415 ./KSTREAM-JOINOTHER-25-store.155150820 ./KSTREAM-JOINOTHER-25-store.155148642 {code} Once re-balancing and state restoration is complete - the redundant segment files are deleted and the segment count drops to 508. We have seen the number of these segment stores grow to as many as 15000 over the baseline 508 which can fill smaller volumes. *This means that a state volume that would normally have ~300MB total disk usage would use in excess of 30GB during rebalancing*, mostly preallocated MANIFEST files. h2. Expected Behaviour For this particular application we expect 508 segment folders total to be active and existing throughout rebalancing. Give or take migrated tasks that are subject to the {{state.cleanup.delay.ms}}. h2. Preliminary investigation * This does not appear to be the case in v1.1.0. With our application the number of state directories only grows to 670 (over the base line 508) * The MANIFEST files were not preallocated to 4MB in v1.1.0 they are now in v2.1.x, this appears to be expected RocksDB behaviour, but exacerbates the many segment stores. * Suspect https://github.com/apache/kafka/pull/5253 to be the source of this change of behaviour. A workaround is to use {{rocksdb.config.setter}} and set the preallocated amount for MANIFEST files to a lower value such as 64KB, however the number of segment stores appears to be unbounded so disk volumes may still fill up for a heavier application. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
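The {{rocksdb.config.setter}} workaround described in the issue could be sketched roughly as follows. This is an illustrative sketch only, not a fix from the Kafka project: the class name is hypothetical, and it assumes the RocksDB Java API's manifest preallocation option is honored by the version bundled with Streams.

```java
import java.util.Map;
import org.apache.kafka.streams.state.RocksDBConfigSetter;
import org.rocksdb.Options;

// Hypothetical workaround sketch: shrink RocksDB's MANIFEST preallocation
// from the 4MB default to 64KB so the transient restore-time segment stores
// waste far less disk. Register it under the "rocksdb.config.setter"
// streams config (StreamsConfig.ROCKSDB_CONFIG_SETTER_CLASS_CONFIG).
public class ManifestPreallocationSetter implements RocksDBConfigSetter {
    @Override
    public void setConfig(final String storeName, final Options options,
                          final Map<String, Object> configs) {
        // 64KB per MANIFEST file instead of 4MB.
        options.setManifestPreallocationSize(64 * 1024L);
    }
}
```

Note this only caps the per-file preallocation; since the number of segment stores created during restore appears unbounded, it mitigates rather than removes the risk of filling the state volume.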
[jira] [Updated] (KAFKA-5998) /.checkpoint.tmp Not found exception
[ https://issues.apache.org/jira/browse/KAFKA-5998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adrian McCague updated KAFKA-5998: -- Affects Version/s: 2.1.1 > /.checkpoint.tmp Not found exception > > > Key: KAFKA-5998 > URL: https://issues.apache.org/jira/browse/KAFKA-5998 > Project: Kafka > Issue Type: Bug > Components: streams >Affects Versions: 0.11.0.0, 0.11.0.1, 2.1.1 >Reporter: Yogesh BG >Priority: Major > Attachments: 5998.v1.txt, 5998.v2.txt, Topology.txt, exc.txt, > props.txt, streams.txt > > > I have one Kafka broker and one Kafka Streams instance running. It has been > running for two days under a load of around 2500 msgs per second. On the third day I am > getting the below exception for some of the partitions; I have 16 partitions and only > 0_0 and 0_1 give this error > {{09:43:25.955 [ks_0_inst-StreamThread-6] WARN > o.a.k.s.p.i.ProcessorStateManager - Failed to write checkpoint file to > /data/kstreams/rtp-kafkastreams/0_1/.checkpoint: > java.io.FileNotFoundException: > /data/kstreams/rtp-kafkastreams/0_1/.checkpoint.tmp (No such file or > directory) > at java.io.FileOutputStream.open(Native Method) ~[na:1.7.0_111] > at java.io.FileOutputStream.<init>(FileOutputStream.java:221) > ~[na:1.7.0_111] > at java.io.FileOutputStream.<init>(FileOutputStream.java:171) > ~[na:1.7.0_111] > at > org.apache.kafka.streams.state.internals.OffsetCheckpoint.write(OffsetCheckpoint.java:73) > ~[rtp-kafkastreams-1.0-SNAPSHOT-jar-with-dependencies.jar:na] > at > org.apache.kafka.streams.processor.internals.ProcessorStateManager.checkpoint(ProcessorStateManager.java:324) > ~[rtp-kafkastreams-1.0-SNAPSHOT-jar-with-dependencies.jar:na] > at > org.apache.kafka.streams.processor.internals.StreamTask$1.run(StreamTask.java:267) > [rtp-kafkastreams-1.0-SNAPSHOT-jar-with-dependencies.jar:na] > at > org.apache.kafka.streams.processor.internals.StreamsMetricsImpl.measureLatencyNs(StreamsMetricsImpl.java:201) > [rtp-kafkastreams-1.0-SNAPSHOT-jar-with-dependencies.jar:na] > at > 
org.apache.kafka.streams.processor.internals.StreamTask.commit(StreamTask.java:260) > [rtp-kafkastreams-1.0-SNAPSHOT-jar-with-dependencies.jar:na] > at > org.apache.kafka.streams.processor.internals.StreamTask.commit(StreamTask.java:254) > [rtp-kafkastreams-1.0-SNAPSHOT-jar-with-dependencies.jar:na] > at > org.apache.kafka.streams.processor.internals.AssignedTasks$1.apply(AssignedTasks.java:322) > [rtp-kafkastreams-1.0-SNAPSHOT-jar-with-dependencies.jar:na] > at > org.apache.kafka.streams.processor.internals.AssignedTasks.applyToRunningTasks(AssignedTasks.java:415) > [rtp-kafkastreams-1.0-SNAPSHOT-jar-with-dependencies.jar:na] > at > org.apache.kafka.streams.processor.internals.AssignedTasks.commit(AssignedTasks.java:314) > [rtp-kafkastreams-1.0-SNAPSHOT-jar-with-dependencies.jar:na] > at > org.apache.kafka.streams.processor.internals.StreamThread.commitAll(StreamThread.java:700) > [rtp-kafkastreams-1.0-SNAPSHOT-jar-with-dependencies.jar:na] > at > org.apache.kafka.streams.processor.internals.StreamThread.maybeCommit(StreamThread.java:683) > [rtp-kafkastreams-1.0-SNAPSHOT-jar-with-dependencies.jar:na] > at > org.apache.kafka.streams.processor.internals.StreamThread.runOnce(StreamThread.java:523) > [rtp-kafkastreams-1.0-SNAPSHOT-jar-with-dependencies.jar:na] > at > org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:480) > [rtp-kafkastreams-1.0-SNAPSHOT-jar-with-dependencies.jar:na] > at > org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:457) > [rtp-kafkastreams-1.0-SNAPSHOT-jar-with-dependencies.jar:na] > 09:43:25.974 [ks_0_inst-StreamThread-15] WARN > o.a.k.s.p.i.ProcessorStateManager - Failed to write checkpoint file to > /data/kstreams/rtp-kafkastreams/0_0/.checkpoint: > java.io.FileNotFoundException: > /data/kstreams/rtp-kafkastreams/0_0/.checkpoint.tmp (No such file or > directory) > at java.io.FileOutputStream.open(Native Method) ~[na:1.7.0_111] > at 
java.io.FileOutputStream.<init>(FileOutputStream.java:221) > ~[na:1.7.0_111] > at java.io.FileOutputStream.<init>(FileOutputStream.java:171) > ~[na:1.7.0_111] > at > org.apache.kafka.streams.state.internals.OffsetCheckpoint.write(OffsetCheckpoint.java:73) > ~[rtp-kafkastreams-1.0-SNAPSHOT-jar-with-dependencies.jar:na] > at > org.apache.kafka.streams.processor.internals.ProcessorStateManager.checkpoint(ProcessorStateManager.java:324) > ~[rtp-kafkastreams-1.0-SNAPSHOT-jar-with-dependencies.jar:na] > at > org.apache.kafka.streams.processor.internals.StreamTask$1.run(StreamTask.java:267) > [rtp-kafkastreams-1.0-SNAPSHOT-jar-with-dependencies.jar:na] >
[jira] [Commented] (KAFKA-6767) OffsetCheckpoint write assumes parent directory exists
[ https://issues.apache.org/jira/browse/KAFKA-6767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16554541#comment-16554541 ] Adrian McCague commented on KAFKA-6767: --- Also, perhaps a coincidence, but the topology was running fine for a month before a rebalance occurred, at which point these errors started cropping up. > OffsetCheckpoint write assumes parent directory exists > -- > > Key: KAFKA-6767 > URL: https://issues.apache.org/jira/browse/KAFKA-6767 > Project: Kafka > Issue Type: Bug > Components: streams >Affects Versions: 1.1.0 >Reporter: Steven Schlansker >Priority: Minor > > We run Kafka Streams with RocksDB state stores on ephemeral disks (i.e. if an > instance dies it is created from scratch, rather than reusing the existing > RocksDB.) > We routinely see: > {code:java} > 2018-04-09T19:14:35.004Z WARN <> > [chat-0319e3c3-d8b2-4c60-bd69-a8484d8d4435-StreamThread-1] > o.a.k.s.p.i.ProcessorStateManager - task [0_11] Failed to write offset > checkpoint file to /mnt/mesos/sandbox/storage/chat/0_11/.checkpoint: {} > java.io.FileNotFoundException: > /mnt/mesos/sandbox/storage/chat/0_11/.checkpoint.tmp (No such file or > directory) > at java.io.FileOutputStream.open0(Native Method) > at java.io.FileOutputStream.open(FileOutputStream.java:270) > at java.io.FileOutputStream.<init>(FileOutputStream.java:213) > at java.io.FileOutputStream.<init>(FileOutputStream.java:162) > at > org.apache.kafka.streams.state.internals.OffsetCheckpoint.write(OffsetCheckpoint.java:78) > at > org.apache.kafka.streams.processor.internals.ProcessorStateManager.checkpoint(ProcessorStateManager.java:320) > at > org.apache.kafka.streams.processor.internals.StreamTask$1.run(StreamTask.java:314) > at > org.apache.kafka.streams.processor.internals.StreamsMetricsImpl.measureLatencyNs(StreamsMetricsImpl.java:208) > at > org.apache.kafka.streams.processor.internals.StreamTask.commit(StreamTask.java:307) > at > 
org.apache.kafka.streams.processor.internals.StreamTask.commit(StreamTask.java:297) > at > org.apache.kafka.streams.processor.internals.AssignedTasks$1.apply(AssignedTasks.java:67) > at > org.apache.kafka.streams.processor.internals.AssignedTasks.applyToRunningTasks(AssignedTasks.java:357) > at > org.apache.kafka.streams.processor.internals.AssignedTasks.commit(AssignedTasks.java:347) > at > org.apache.kafka.streams.processor.internals.TaskManager.commitAll(TaskManager.java:403) > at > org.apache.kafka.streams.processor.internals.StreamThread.maybeCommit(StreamThread.java:994) > at > org.apache.kafka.streams.processor.internals.StreamThread.runOnce(StreamThread.java:811) > at > org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:750) > at > org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:720){code} > Inspecting the state store directory, I can indeed see that {{chat/0_11}} > does not exist (although many other partitions do). > > Looking at the OffsetCheckpoint write method, it seems to try to open a new > checkpoint file without first ensuring that the parent directory exists. > > {code:java} > public void write(final Map<TopicPartition, Long> offsets) throws > IOException { > // if there is no offsets, skip writing the file to save disk IOs > if (offsets.isEmpty()) { > return; > } > synchronized (lock) { > // write to temp file and then swap with the existing file > final File temp = new File(file.getAbsolutePath() + ".tmp");{code} > > Either the OffsetCheckpoint class should initialize the directories if > needed, or some precondition of it being called should ensure that is the > case. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
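The first of the two suggested remedies (have the writer initialize the directory) can be sketched with only the JDK. This is a hypothetical helper, not the actual Kafka patch: it creates the parent directory before opening the temp file, then swaps the temp file into place, mirroring the write-then-rename strategy quoted above.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical sketch of a defensive OffsetCheckpoint-style write:
// ensure the task directory exists before the temp file is opened,
// then rename the temp file over the checkpoint.
public class CheckpointWriteSketch {

    public static Path writeAtomically(final Path checkpoint, final String contents)
            throws IOException {
        // Guard for KAFKA-6767: the parent (task) directory may be missing,
        // e.g. after a cleanup raced with a commit.
        Files.createDirectories(checkpoint.getParent());

        // Write to a ".tmp" sibling first, as OffsetCheckpoint.write does.
        final Path tmp = checkpoint.resolveSibling(checkpoint.getFileName() + ".tmp");
        Files.write(tmp, contents.getBytes(StandardCharsets.UTF_8));

        // Swap the temp file into place.
        return Files.move(tmp, checkpoint,
                StandardCopyOption.REPLACE_EXISTING, StandardCopyOption.ATOMIC_MOVE);
    }

    public static void main(final String[] args) throws IOException {
        // Simulate a task directory ("0_11") that does not exist yet.
        final Path taskDir = Files.createTempDirectory("state").resolve("0_11");
        final Path written = writeAtomically(taskDir.resolve(".checkpoint"), "0\n");
        System.out.println(Files.exists(written)); // prints "true"
    }
}
```

The same effect could come from the alternative precondition mentioned in the issue (the caller guaranteeing the directory exists); the guard shown here simply makes the writer self-sufficient.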
[jira] [Commented] (KAFKA-6767) OffsetCheckpoint write assumes parent directory exists
[ https://issues.apache.org/jira/browse/KAFKA-6767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16554059#comment-16554059 ] Adrian McCague commented on KAFKA-6767: --- Hi [~guozhang], we are seeing this issue as well on a relatively frequent basis (Streams 1.1.0); here is my case: {code:java} task [1_1] Failed to write offset checkpoint file to /data/my-topology/1_1/.checkpoint: {} java.io.FileNotFoundException: /data/my-topology/1_1/.checkpoint.tmp (No such file or directory) at java.io.FileOutputStream.open0(FileOutputStream.java) at java.io.FileOutputStream.open(FileOutputStream.java:270) at java.io.FileOutputStream.<init>(FileOutputStream.java:213) at java.io.FileOutputStream.<init>(FileOutputStream.java:162) at org.apache.kafka.streams.state.internals.OffsetCheckpoint.write(OffsetCheckpoint.java:78) at org.apache.kafka.streams.processor.internals.ProcessorStateManager.checkpoint(ProcessorStateManager.java:320) at org.apache.kafka.streams.processor.internals.StreamTask$1.run(StreamTask.java:314) at org.apache.kafka.streams.processor.internals.StreamsMetricsImpl.measureLatencyNs(StreamsMetricsImpl.java:208) at org.apache.kafka.streams.processor.internals.StreamTask.commit(StreamTask.java:307) at org.apache.kafka.streams.processor.internals.StreamTask.commit(StreamTask.java:297) at org.apache.kafka.streams.processor.internals.AssignedTasks$1.apply(AssignedTasks.java:67) at org.apache.kafka.streams.processor.internals.AssignedTasks.applyToRunningTasks(AssignedTasks.java:357) at org.apache.kafka.streams.processor.internals.AssignedTasks.commit(AssignedTasks.java:347) at org.apache.kafka.streams.processor.internals.TaskManager.commitAll(TaskManager.java:403) at org.apache.kafka.streams.processor.internals.StreamThread.maybeCommit(StreamThread.java:994) at org.apache.kafka.streams.processor.internals.StreamThread.runOnce(StreamThread.java:811) at org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:750) at 
org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:720){code} This is the sub-topology: {code:java} Sub-topology: 1 Source: KSTREAM-SOURCE-11 (topics: [x]) --> KSTREAM-PEEK-12 Source: KSTREAM-SOURCE-15 (topics: [y]) --> KSTREAM-PEEK-16 Processor: KSTREAM-PEEK-12 (stores: []) --> KSTREAM-KEY-SELECT-13 <-- KSTREAM-SOURCE-11 Processor: KSTREAM-PEEK-16 (stores: []) --> KSTREAM-KEY-SELECT-17 <-- KSTREAM-SOURCE-15 Processor: KSTREAM-KEY-SELECT-13 (stores: []) --> KSTREAM-MAPVALUES-14 <-- KSTREAM-PEEK-12 Processor: KSTREAM-KEY-SELECT-17 (stores: []) --> KSTREAM-MAPVALUES-18 <-- KSTREAM-PEEK-16 Processor: KSTREAM-MAPVALUES-14 (stores: []) --> KSTREAM-MERGE-19 <-- KSTREAM-KEY-SELECT-13 Processor: KSTREAM-MAPVALUES-18 (stores: []) --> KSTREAM-MERGE-19 <-- KSTREAM-KEY-SELECT-17 Processor: KSTREAM-MERGE-19 (stores: []) --> KSTREAM-FILTER-22 <-- KSTREAM-MAPVALUES-14, KSTREAM-MAPVALUES-18 Processor: KSTREAM-FILTER-22 (stores: []) --> KSTREAM-SINK-21 <-- KSTREAM-MERGE-19 Sink: KSTREAM-SINK-21 (topic: z-store-repartition) <-- KSTREAM-FILTER-22{code} So I believe this supports your theory that stateless tasks are attempting to checkpoint. In this case it appears the final sink is related to a repartition before a DSL Aggregate, which may hint towards the bug. > OffsetCheckpoint write assumes parent directory exists > -- > > Key: KAFKA-6767 > URL: https://issues.apache.org/jira/browse/KAFKA-6767 > Project: Kafka > Issue Type: Bug > Components: streams >Affects Versions: 1.1.0 >Reporter: Steven Schlansker >Priority: Minor > > We run Kafka Streams with RocksDB state stores on ephemeral disks (i.e. if an > instance dies it is created from scratch, rather than reusing the existing > RocksDB.) 
> We routinely see:
> {code:java}
> 2018-04-09T19:14:35.004Z WARN <> [chat-0319e3c3-d8b2-4c60-bd69-a8484d8d4435-StreamThread-1] o.a.k.s.p.i.ProcessorStateManager - task [0_11] Failed to write offset checkpoint file to /mnt/mesos/sandbox/storage/chat/0_11/.checkpoint: {}
> java.io.FileNotFoundException: /mnt/mesos/sandbox/storage/chat/0_11/.checkpoint.tmp (No such file or directory)
> at java.io.FileOutputStream.open0(Native Method)
> at java.io.FileOutputStream.open(FileOutputStream.java:270)
> at java.io.FileOutputStream.<init>(FileOutputStream.java:213)
> at java.io.FileOutputStream.<init>(FileOutputStream.java:162)
> at org.apache.kafka.streams.state.internals.OffsetCheckpoint.write(OffsetCheckpoint.java:78)
> at
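Both traces fail at the same spot: opening {{.checkpoint.tmp}} inside a task directory that no longer exists (or was never created). A minimal sketch of the usual defensive pattern is below: create the parent directory first, write a temp file, then atomically rename it. This is illustrative only, not Kafka's actual {{OffsetCheckpoint}} implementation; the class name and the simplified key-value file format are assumptions.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Collections;
import java.util.Map;

// Illustrative sketch only -- NOT Kafka's actual OffsetCheckpoint class.
// It mirrors the tmp-file-then-rename pattern visible in the stack trace,
// but first ensures the parent (task) directory exists, so opening
// ".checkpoint.tmp" cannot fail with FileNotFoundException.
public class SafeOffsetCheckpoint {

    public static void write(Path checkpointFile, Map<String, Long> offsets) throws IOException {
        // Guard against the race reported above: the task directory may have
        // been deleted (or never created) by the time the task commits.
        Files.createDirectories(checkpointFile.toAbsolutePath().getParent());

        Path tmp = checkpointFile.resolveSibling(checkpointFile.getFileName() + ".tmp");
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Long> e : offsets.entrySet()) {
            sb.append(e.getKey()).append(' ').append(e.getValue()).append('\n');
        }
        Files.write(tmp, sb.toString().getBytes(StandardCharsets.UTF_8));
        // Atomic rename: readers never observe a half-written checkpoint.
        Files.move(tmp, checkpointFile, StandardCopyOption.REPLACE_EXISTING,
                StandardCopyOption.ATOMIC_MOVE);
    }

    public static void main(String[] args) throws IOException {
        // Demo: the "0_11" task directory does not exist yet, and the write
        // still succeeds because createDirectories runs first.
        Path task = Files.createTempDirectory("streams-state").resolve("0_11");
        write(task.resolve(".checkpoint"), Collections.singletonMap("chat-11", 123L));
        System.out.println(Files.readAllLines(task.resolve(".checkpoint")));
    }
}
```

Note that this only papers over whichever code path removed the task directory; for stateless tasks the cleaner fix discussed in the ticket is to not write a checkpoint at all.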
[jira] [Commented] (KAFKA-6167) Timestamp on streams directory contains a colon, which is an illegal character
[ https://issues.apache.org/jira/browse/KAFKA-6167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16237875#comment-16237875 ] Adrian McCague commented on KAFKA-6167: --- `.` feels quite standard here
> Timestamp on streams directory contains a colon, which is an illegal character
> --
>
> Key: KAFKA-6167
> URL: https://issues.apache.org/jira/browse/KAFKA-6167
> Project: Kafka
> Issue Type: Bug
> Components: streams
> Affects Versions: 1.0.0
> Environment: AK 1.0.0
> Kubernetes
> CoreOS
> JDK 1.8
> Windows
> Reporter: Justin Manchester
> Priority: Normal
>
> Problem:
> Development on Windows, which is not fully supported, however still a bug that should be corrected.
> It looks like a timestamp was added to the streams directory using a colon as separator. I believe this is an illegal character and potentially the cause for the exception below.
> Error Stack:
> 2017-11-02 16:06:41 ERROR [StreamDeduplicatorAcceptanceTest1-a3ae0ac6-a024-4006-bcb1-01ff0f433f6e-StreamThread-1] org.apache.kafka.streams.processor.internals.AssignedTasks:301 - stream-thread [StreamDeduplicatorAcceptanceTest1-a3ae0ac6-a024-4006-bcb1-01ff0f433f6e-StreamThread-1] Failed to process stream task 0_0 due to the following error:
> org.apache.kafka.streams.errors.StreamsException: Exception caught in process. taskId=0_0, processor=KSTREAM-SOURCE-00, topic=input-a_1, partition=0, offset=0
> at org.apache.kafka.streams.processor.internals.StreamTask.process(StreamTask.java:232) ~[kafka-streams-1.0.0.jar:?]
> at org.apache.kafka.streams.processor.internals.AssignedTasks.process(AssignedTasks.java:403) [kafka-streams-1.0.0.jar:?]
> at org.apache.kafka.streams.processor.internals.TaskManager.process(TaskManager.java:317) [kafka-streams-1.0.0.jar:?]
> at org.apache.kafka.streams.processor.internals.StreamThread.processAndMaybeCommit(StreamThread.java:942) [kafka-streams-1.0.0.jar:?]
> at org.apache.kafka.streams.processor.internals.StreamThread.runOnce(StreamThread.java:822) [kafka-streams-1.0.0.jar:?]
> at org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:774) [kafka-streams-1.0.0.jar:?]
> at org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:744) [kafka-streams-1.0.0.jar:?]
> Caused by: org.apache.kafka.streams.errors.ProcessorStateException: Error opening store KSTREAM-JOINTHIS-04-store:150962400 at location C:\Users\ADRIAN~1.MCC\AppData\Local\Temp\kafka3548813472740086814\StreamDeduplicatorAcceptanceTest1\0_0\KSTREAM-JOINTHIS-04-store\KSTREAM-JOINTHIS-04-store:150962400
> at org.apache.kafka.streams.state.internals.RocksDBStore.openDB(RocksDBStore.java:204) ~[kafka-streams-1.0.0.jar:?]
> at org.apache.kafka.streams.state.internals.RocksDBStore.openDB(RocksDBStore.java:174) ~[kafka-streams-1.0.0.jar:?]
> at org.apache.kafka.streams.state.internals.Segment.openDB(Segment.java:40) ~[kafka-streams-1.0.0.jar:?]
> at org.apache.kafka.streams.state.internals.Segments.getOrCreateSegment(Segments.java:89) ~[kafka-streams-1.0.0.jar:?]
> at org.apache.kafka.streams.state.internals.RocksDBSegmentedBytesStore.put(RocksDBSegmentedBytesStore.java:81) ~[kafka-streams-1.0.0.jar:?]
> at org.apache.kafka.streams.state.internals.RocksDBWindowStore$RocksDBWindowBytesStore.put(RocksDBWindowStore.java:43) ~[kafka-streams-1.0.0.jar:?]
> at org.apache.kafka.streams.state.internals.RocksDBWindowStore$RocksDBWindowBytesStore.put(RocksDBWindowStore.java:34) ~[kafka-streams-1.0.0.jar:?]
> at org.apache.kafka.streams.state.internals.ChangeLoggingWindowBytesStore.put(ChangeLoggingWindowBytesStore.java:67) ~[kafka-streams-1.0.0.jar:?]
> at org.apache.kafka.streams.state.internals.ChangeLoggingWindowBytesStore.put(ChangeLoggingWindowBytesStore.java:33) ~[kafka-streams-1.0.0.jar:?]
> at org.apache.kafka.streams.state.internals.MeteredWindowStore.put(MeteredWindowStore.java:96) ~[kafka-streams-1.0.0.jar:?]
> at org.apache.kafka.streams.state.internals.MeteredWindowStore.put(MeteredWindowStore.java:89) ~[kafka-streams-1.0.0.jar:?]
> at org.apache.kafka.streams.kstream.internals.KStreamJoinWindow$KStreamJoinWindowProcessor.process(KStreamJoinWindow.java:64) ~[kafka-streams-1.0.0.jar:?]
> at org.apache.kafka.streams.processor.internals.ProcessorNode$1.run(ProcessorNode.java:46) ~[kafka-streams-1.0.0.jar:?]
> at org.apache.kafka.streams.processor.internals.StreamsMetricsImpl.measureLatencyNs(StreamsMetricsImpl.java:208) ~[kafka-streams-1.0.0.jar:?]
> at org.apache.kafka.streams.processor.internals.ProcessorNode.process(ProcessorNode.java:124)
>
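The store location in the {{ProcessorStateException}} above shows the actual problem: the segment directory is named {{<storeName>:<segmentId>}} (here {{KSTREAM-JOINTHIS-04-store:150962400}}), and {{:}} is a reserved character in Windows file names. A minimal sketch of the separator change suggested in the comment, swapping {{:}} for {{.}}, follows; the class and method names are hypothetical, not Kafka's internal {{Segments}} API.

```java
// Hypothetical helper (not Kafka's internal Segments API) illustrating the
// naming fix discussed above: ':' is an illegal character in Windows file
// names, so a segment directory named "<store>:<segmentId>" cannot be
// created there, while '.' is portable across platforms.
public class SegmentNames {

    /** The scheme from the stack trace, which is broken on Windows. */
    public static String legacyName(String storeName, long segmentId) {
        return storeName + ":" + segmentId;
    }

    /** Portable scheme using '.' as the separator. */
    public static String portableName(String storeName, long segmentId) {
        return storeName + "." + segmentId;
    }

    public static void main(String[] args) {
        System.out.println(legacyName("KSTREAM-JOINTHIS-04-store", 150962400L));
        System.out.println(portableName("KSTREAM-JOINTHIS-04-store", 150962400L));
    }
}
```

Changing the separator does mean existing on-disk segments named with {{:}} are no longer recognized, which is why a rename like this is effectively a state-format change.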
[jira] [Comment Edited] (KAFKA-6075) Kafka cannot recover after an unclean shutdown on Windows
[ https://issues.apache.org/jira/browse/KAFKA-6075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16218969#comment-16218969 ] Adrian McCague edited comment on KAFKA-6075 at 10/25/17 10:49 PM: -- We have witnessed this issue as well; we develop on Windows machines and our acceptance tests run against the {{KafkaEmbedded}} instances.
{code:java}
java.nio.file.FileSystemException: C:\Users\..\Local\Temp\junit7769464363651469214\junit8481958736421123475\_schemas-0\.timeindex: The process cannot access the file because it is being used by another process.
	at sun.nio.fs.WindowsException.translateToIOException(WindowsException.java:86)
	at sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.java:97)
	at sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.java:102)
	at sun.nio.fs.WindowsFileSystemProvider.implDelete(WindowsFileSystemProvider.java:269)
	at sun.nio.fs.AbstractFileSystemProvider.delete(AbstractFileSystemProvider.java:103)
	at java.nio.file.Files.delete(Files.java:1126)
	at org.apache.kafka.common.utils.Utils$2.visitFile(Utils.java:591)
	at org.apache.kafka.common.utils.Utils$2.visitFile(Utils.java:580)
	at java.nio.file.Files.walkFileTree(Files.java:2670)
	at java.nio.file.Files.walkFileTree(Files.java:2742)
	at org.apache.kafka.common.utils.Utils.delete(Utils.java:580)
	at kafka.utils.CoreUtils$$anonfun$delete$1.apply(CoreUtils.scala:88)
	at kafka.utils.CoreUtils$$anonfun$delete$1.apply(CoreUtils.scala:88)
	at scala.collection.Iterator$class.foreach(Iterator.scala:891)
	at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
	at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
	at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
	at kafka.utils.CoreUtils$.delete(CoreUtils.scala:88)
	at kafka.utils.CoreUtils.delete(CoreUtils.scala)
	at org.apache.kafka.streams.integration.utils.KafkaEmbedded.stop(KafkaEmbedded.java:135)
	at org.apache.kafka.streams.integration.utils.EmbeddedKafkaCluster.stop(EmbeddedKafkaCluster.java:93){code}
This is also a timeindex file that the {{java.nio}} utility cannot delete; potentially not all handles on this file were closed when Kafka thought it had shut down.
> Kafka cannot recover after an unclean shutdown on Windows
> -
>
> Key: KAFKA-6075
> URL: https://issues.apache.org/jira/browse/KAFKA-6075
> Project: Kafka
> Issue Type: Bug
>
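Windows refuses to delete a file while any handle to it (including a memory map) is still open, which is exactly what the {{FileSystemException}} above reports. A retry loop such as the hedged sketch below is sometimes used as a workaround in tests, but it only masks the root cause; the real fix is closing every index handle before deletion. The class and method names are hypothetical, not Kafka's actual {{Utils.delete}}.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hedged sketch of a deletion retry loop occasionally used to work around
// Windows' "file in use" semantics: Files.delete fails while any handle
// (including a memory-mapped .timeindex) is still open. This does not fix
// the root cause from the ticket -- handles must actually be closed before
// shutdown -- it only retries in case the close is racing the delete.
public class RetryingDelete {

    public static boolean deleteWithRetry(Path path, int attempts, long backoffMs)
            throws InterruptedException {
        for (int i = 0; i < attempts; i++) {
            try {
                // No-op if the file is already gone, so a retry after a
                // successful concurrent delete still reports success.
                Files.deleteIfExists(path);
                return true;
            } catch (IOException e) {
                // On Windows this is typically FileSystemException: "The
                // process cannot access the file because it is being used
                // by another process."
                Thread.sleep(backoffMs);
            }
        }
        return false;
    }

    public static void main(String[] args) throws Exception {
        Path f = Files.createTempFile("demo", ".timeindex");
        System.out.println("deleted=" + deleteWithRetry(f, 5, 100L));
    }
}
```

On Linux this always succeeds on the first attempt (POSIX allows unlinking open files), which is why the bug only surfaces on Windows developer machines.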
[jira] [Commented] (KAFKA-6075) Kafka cannot recover after an unclean shutdown on Windows
[ https://issues.apache.org/jira/browse/KAFKA-6075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16218969#comment-16218969 ] Adrian McCague commented on KAFKA-6075: --- We have witnessed this issue as well; we develop on Windows machines and our acceptance tests run against the embedded Kafka instances. We are running 0.10.2.1 and do not see this issue; if we attempt to upgrade to 0.11.0.0 then we see this for any and all tests that use the embedded Kafka instance, which is currently blocking our upgrade path until we figure out a workaround.
> Kafka cannot recover after an unclean shutdown on Windows
> -
>
> Key: KAFKA-6075
> URL: https://issues.apache.org/jira/browse/KAFKA-6075
> Project: Kafka
> Issue Type: Bug
> Affects Versions: 0.11.0.1
> Reporter: Vahid Hashemian
>
> An unclean shutdown of broker on Windows cannot be recovered by Kafka. Steps to reproduce from a fresh build:
> # Start zookeeper
> # Start a broker
> # Create a topic {{test}}
> # Do an unclean shutdown of broker (find the process id by {{wmic process where "caption = 'java.exe' and commandline like '%server.properties%'" get processid}}), then kill the process by {{taskkill /pid /f}}
> # Start the broker again
> This leads to the following errors:
> {code}
> [2017-10-17 17:13:24,819] ERROR Error while loading log dir C:\tmp\kafka-logs (kafka.log.LogManager)
> java.nio.file.FileSystemException: C:\tmp\kafka-logs\test-0\.timeindex: The process cannot access the file because it is being used by another process.
> at sun.nio.fs.WindowsException.translateToIOException(WindowsException.java:86)
> at sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.java:97)
> at sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.java:102)
> at sun.nio.fs.WindowsFileSystemProvider.implDelete(WindowsFileSystemProvider.java:269)
> at sun.nio.fs.AbstractFileSystemProvider.deleteIfExists(AbstractFileSystemProvider.java:108)
> at java.nio.file.Files.deleteIfExists(Files.java:1165)
> at kafka.log.Log$$anonfun$loadSegmentFiles$3.apply(Log.scala:333)
> at kafka.log.Log$$anonfun$loadSegmentFiles$3.apply(Log.scala:295)
> at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:733)
> at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
> at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
> at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:732)
> at kafka.log.Log.loadSegmentFiles(Log.scala:295)
> at kafka.log.Log.loadSegments(Log.scala:404)
> at kafka.log.Log.<init>(Log.scala:201)
> at kafka.log.Log$.apply(Log.scala:1729)
> at kafka.log.LogManager.kafka$log$LogManager$$loadLog(LogManager.scala:221)
> at kafka.log.LogManager$$anonfun$loadLogs$2$$anonfun$8$$anonfun$apply$16$$anonfun$apply$2.apply$mcV$sp(LogManager.scala:292)
> at kafka.utils.CoreUtils$$anon$1.run(CoreUtils.scala:61)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> [2017-10-17 17:13:24,819] ERROR Error while deleting the clean shutdown file in dir C:\tmp\kafka-logs (kafka.server.LogDirFailureChannel)
> java.nio.file.FileSystemException: C:\tmp\kafka-logs\test-0\.timeindex: The process cannot access the file because it is being used by another process.
> at sun.nio.fs.WindowsException.translateToIOException(WindowsException.java:86)
> at sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.java:97)
> at sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.java:102)
> at sun.nio.fs.WindowsFileSystemProvider.implDelete(WindowsFileSystemProvider.java:269)
> at sun.nio.fs.AbstractFileSystemProvider.deleteIfExists(AbstractFileSystemProvider.java:108)
> at java.nio.file.Files.deleteIfExists(Files.java:1165)
> at kafka.log.Log$$anonfun$loadSegmentFiles$3.apply(Log.scala:333)
> at kafka.log.Log$$anonfun$loadSegmentFiles$3.apply(Log.scala:295)
> at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:733)
> at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
> at