[jira] [Commented] (KAFKA-8042) Kafka Streams creates many segment stores on state restore
[ https://issues.apache.org/jira/browse/KAFKA-8042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16802020#comment-16802020 ] Adrian McCague commented on KAFKA-8042: --- This definitely applies to the 2.1 release. KAFKA-7934 does look like a path to avoid this issue, but I feel like there is another problem here with regard to the number of segments being created and not removed. It appears the total number that could be created during rebalancing is arbitrary, implying to me that there is a degree of redundancy here. As it stands moving to this version introduces a risk of filling up our state volumes. > Kafka Streams creates many segment stores on state restore > -- > > Key: KAFKA-8042 > URL: https://issues.apache.org/jira/browse/KAFKA-8042 > Project: Kafka > Issue Type: Bug > Components: streams >Affects Versions: 2.1.0, 2.1.1 >Reporter: Adrian McCague >Priority: Major > Attachments: StateStoreSegments-StreamsConfig.txt > > > Note that this is from the perspective of one instance of an application, > where there are 8 instances total, with partition count 8 for all topics and > of course stores. Standby replicas = 1. > In the process there are multiple instances of {{KafkaStreams}} so the below > detail is from one of these. > h2. Actual Behaviour > During state restore of an application, many segment stores are created (I am > using MANIFEST files as a marker since they preallocate 4MB each). As can be > seen this topology has 5 joins - which is the extent of its state. > {code:java} > bash-4.2# pwd > /data/fooapp/0_7 > bash-4.2# for dir in $(find . 
-maxdepth 1 -type d); do echo "${dir}: $(find > ${dir} -type f -name 'MANIFEST-*' -printf x | wc -c)"; done > .: 8058 > ./KSTREAM-JOINOTHER-25-store: 851 > ./KSTREAM-JOINOTHER-40-store: 819 > ./KSTREAM-JOINTHIS-24-store: 851 > ./KSTREAM-JOINTHIS-29-store: 836 > ./KSTREAM-JOINOTHER-35-store: 819 > ./KSTREAM-JOINOTHER-30-store: 819 > ./KSTREAM-JOINOTHER-45-store: 745 > ./KSTREAM-JOINTHIS-39-store: 819 > ./KSTREAM-JOINTHIS-44-store: 685 > ./KSTREAM-JOINTHIS-34-store: 819 > There are many (x800 as above) of these segment files: > ./KSTREAM-JOINOTHER-25-store.155146629 > ./KSTREAM-JOINOTHER-25-store.155155902 > ./KSTREAM-JOINOTHER-25-store.155149269 > ./KSTREAM-JOINOTHER-25-store.155154879 > ./KSTREAM-JOINOTHER-25-store.155169861 > ./KSTREAM-JOINOTHER-25-store.155153064 > ./KSTREAM-JOINOTHER-25-store.155148444 > ./KSTREAM-JOINOTHER-25-store.155155671 > ./KSTREAM-JOINOTHER-25-store.155168673 > ./KSTREAM-JOINOTHER-25-store.155159565 > ./KSTREAM-JOINOTHER-25-store.155175735 > ./KSTREAM-JOINOTHER-25-store.155168574 > ./KSTREAM-JOINOTHER-25-store.155163525 > ./KSTREAM-JOINOTHER-25-store.155165241 > ./KSTREAM-JOINOTHER-25-store.155146662 > ./KSTREAM-JOINOTHER-25-store.155178177 > ./KSTREAM-JOINOTHER-25-store.155158740 > ./KSTREAM-JOINOTHER-25-store.155168145 > ./KSTREAM-JOINOTHER-25-store.155166231 > ./KSTREAM-JOINOTHER-25-store.155172171 > ./KSTREAM-JOINOTHER-25-store.155175075 > ./KSTREAM-JOINOTHER-25-store.155163096 > ./KSTREAM-JOINOTHER-25-store.155161512 > ./KSTREAM-JOINOTHER-25-store.155179233 > ./KSTREAM-JOINOTHER-25-store.155146266 > ./KSTREAM-JOINOTHER-25-store.155153691 > ./KSTREAM-JOINOTHER-25-store.155159235 > ./KSTREAM-JOINOTHER-25-store.155152734 > ./KSTREAM-JOINOTHER-25-store.155160687 > ./KSTREAM-JOINOTHER-25-store.155174415 > ./KSTREAM-JOINOTHER-25-store.155150820 > ./KSTREAM-JOINOTHER-25-store.155148642 > ... 
etc > {code} > Once re-balancing and state restoration is complete - the redundant segment > files are deleted and the segment count drops to 508 total (where the above > mentioned state directory is one of many). > We have seen the number of these segment stores grow to as many as 15000 over > the baseline 508 which can fill smaller volumes. *This means that a state > volume that would normally have ~300MB total disk usage would use in excess > of 30GB during rebalancing*, mostly preallocated MANIFEST files. > h2. Expected Behaviour > For this particular application we expect 508 segment folders total to be > active and existing throughout rebalancing. Give or take migrated tasks that > are subject to the {{state.cleanup.delay.ms}}. > h2. Preliminary investigation > * This does not appear to be the case in v1.1.0. With our application the > number
[jira] [Updated] (KAFKA-8042) Kafka Streams creates many segment stores on state restore
[ https://issues.apache.org/jira/browse/KAFKA-8042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adrian McCague updated KAFKA-8042: -- Description: Note that this is from the perspective of one instance of an application, where there are 8 instances total, with partition count 8 for all topics and of course stores. Standby replicas = 1. In the process there are multiple instances of {{KafkaStreams}} so the below detail is from one of these. h2. Actual Behaviour During state restore of an application, many segment stores are created (I am using MANIFEST files as a marker since they preallocate 4MB each). As can be seen this topology has 5 joins - which is the extent of its state. {code:java} bash-4.2# pwd /data/fooapp/0_7 bash-4.2# for dir in $(find . -maxdepth 1 -type d); do echo "${dir}: $(find ${dir} -type f -name 'MANIFEST-*' -printf x | wc -c)"; done .: 8058 ./KSTREAM-JOINOTHER-25-store: 851 ./KSTREAM-JOINOTHER-40-store: 819 ./KSTREAM-JOINTHIS-24-store: 851 ./KSTREAM-JOINTHIS-29-store: 836 ./KSTREAM-JOINOTHER-35-store: 819 ./KSTREAM-JOINOTHER-30-store: 819 ./KSTREAM-JOINOTHER-45-store: 745 ./KSTREAM-JOINTHIS-39-store: 819 ./KSTREAM-JOINTHIS-44-store: 685 ./KSTREAM-JOINTHIS-34-store: 819 There are many (x800 as above) of these segment files: ./KSTREAM-JOINOTHER-25-store.155146629 ./KSTREAM-JOINOTHER-25-store.155155902 ./KSTREAM-JOINOTHER-25-store.155149269 ./KSTREAM-JOINOTHER-25-store.155154879 ./KSTREAM-JOINOTHER-25-store.155169861 ./KSTREAM-JOINOTHER-25-store.155153064 ./KSTREAM-JOINOTHER-25-store.155148444 ./KSTREAM-JOINOTHER-25-store.155155671 ./KSTREAM-JOINOTHER-25-store.155168673 ./KSTREAM-JOINOTHER-25-store.155159565 ./KSTREAM-JOINOTHER-25-store.155175735 ./KSTREAM-JOINOTHER-25-store.155168574 ./KSTREAM-JOINOTHER-25-store.155163525 ./KSTREAM-JOINOTHER-25-store.155165241 ./KSTREAM-JOINOTHER-25-store.155146662 ./KSTREAM-JOINOTHER-25-store.155178177 ./KSTREAM-JOINOTHER-25-store.155158740 ./KSTREAM-JOINOTHER-25-store.155168145 
./KSTREAM-JOINOTHER-25-store.155166231 ./KSTREAM-JOINOTHER-25-store.155172171 ./KSTREAM-JOINOTHER-25-store.155175075 ./KSTREAM-JOINOTHER-25-store.155163096 ./KSTREAM-JOINOTHER-25-store.155161512 ./KSTREAM-JOINOTHER-25-store.155179233 ./KSTREAM-JOINOTHER-25-store.155146266 ./KSTREAM-JOINOTHER-25-store.155153691 ./KSTREAM-JOINOTHER-25-store.155159235 ./KSTREAM-JOINOTHER-25-store.155152734 ./KSTREAM-JOINOTHER-25-store.155160687 ./KSTREAM-JOINOTHER-25-store.155174415 ./KSTREAM-JOINOTHER-25-store.155150820 ./KSTREAM-JOINOTHER-25-store.155148642 ... etc {code} Once re-balancing and state restoration is complete - the redundant segment files are deleted and the segment count drops to 508 total (where the above mentioned state directory is one of many). We have seen the number of these segment stores grow to as many as 15000 over the baseline 508 which can fill smaller volumes. *This means that a state volume that would normally have ~300MB total disk usage would use in excess of 30GB during rebalancing*, mostly preallocated MANIFEST files. h2. Expected Behaviour For this particular application we expect 508 segment folders total to be active and existing throughout rebalancing. Give or take migrated tasks that are subject to the {{state.cleanup.delay.ms}}. h2. Preliminary investigation * This does not appear to be the case in v1.1.0. With our application the number of state directories only grows to 670 (over the base line 508) * The MANIFEST files were not preallocated to 4MB in v1.1.0 they are now in v2.1.x, this appears to be expected RocksDB behaviour, but exacerbates the many segment stores. * Suspect https://github.com/apache/kafka/pull/5253 to be the source of this change of behaviour. A workaround is to use {{rocksdb.config.setter}} and set the preallocated amount for MANIFEST files to a lower value such as 64KB, however the number of segment stores appears to be unbounded so disk volumes may still fill up for a heavier application. 
was: Note that this from the perspective of one instance of an application, where there are 8 instances total, with partition count 8 for all topics and of course stores. Standby replicas = 1. In the process there are multiple instances of {{KafkaStreams}} so the below detail is from one of these. h2. Actual Behaviour During state restore of an application, many segment stores are created (I am using MANIFEST files as a marker since they preallocate 4MB each). As can be seen this
[jira] [Created] (KAFKA-8042) Kafka Streams creates many segment stores on state restore
Adrian McCague created KAFKA-8042: - Summary: Kafka Streams creates many segment stores on state restore Key: KAFKA-8042 URL: https://issues.apache.org/jira/browse/KAFKA-8042 Project: Kafka Issue Type: Bug Components: streams Affects Versions: 2.1.1, 2.1.0 Reporter: Adrian McCague Attachments: StateStoreSegments-StreamsConfig.txt Note that this is from the perspective of one instance of an application, where there are 8 instances total, with partition count 8 for all topics and of course stores. Standby replicas = 1. h2. Actual Behaviour During state restore of an application, many segment stores are created (I am using MANIFEST files as a marker since they preallocate 4MB each): {code:java} bash-4.2# pwd /data/fooapp/0_7 bash-4.2# for dir in $(find . -maxdepth 1 -type d); do echo "${dir}: $(find ${dir} -type f -name 'MANIFEST-*' -printf x | wc -c)"; done .: 8058 ./KSTREAM-JOINOTHER-25-store: 851 ./KSTREAM-JOINOTHER-40-store: 819 ./KSTREAM-JOINTHIS-24-store: 851 ./KSTREAM-JOINTHIS-29-store: 836 ./KSTREAM-JOINOTHER-35-store: 819 ./KSTREAM-JOINOTHER-30-store: 819 ./KSTREAM-JOINOTHER-45-store: 745 ./KSTREAM-JOINTHIS-39-store: 819 ./KSTREAM-JOINTHIS-44-store: 685 ./KSTREAM-JOINTHIS-34-store: 819 There are many (x800 as above) of these segment files: ./KSTREAM-JOINOTHER-25-store.155146629 ./KSTREAM-JOINOTHER-25-store.155155902 ./KSTREAM-JOINOTHER-25-store.155149269 ./KSTREAM-JOINOTHER-25-store.155154879 ./KSTREAM-JOINOTHER-25-store.155169861 ./KSTREAM-JOINOTHER-25-store.155153064 ./KSTREAM-JOINOTHER-25-store.155148444 ./KSTREAM-JOINOTHER-25-store.155155671 ./KSTREAM-JOINOTHER-25-store.155168673 ./KSTREAM-JOINOTHER-25-store.155159565 ./KSTREAM-JOINOTHER-25-store.155175735 ./KSTREAM-JOINOTHER-25-store.155168574 ./KSTREAM-JOINOTHER-25-store.155163525 ./KSTREAM-JOINOTHER-25-store.155165241 ./KSTREAM-JOINOTHER-25-store.155146662 ./KSTREAM-JOINOTHER-25-store.155178177 ./KSTREAM-JOINOTHER-25-store.155158740 ./KSTREAM-JOINOTHER-25-store.155168145 
./KSTREAM-JOINOTHER-25-store.155166231 ./KSTREAM-JOINOTHER-25-store.155172171 ./KSTREAM-JOINOTHER-25-store.155175075 ./KSTREAM-JOINOTHER-25-store.155163096 ./KSTREAM-JOINOTHER-25-store.155161512 ./KSTREAM-JOINOTHER-25-store.155179233 ./KSTREAM-JOINOTHER-25-store.155146266 ./KSTREAM-JOINOTHER-25-store.155153691 ./KSTREAM-JOINOTHER-25-store.155159235 ./KSTREAM-JOINOTHER-25-store.155152734 ./KSTREAM-JOINOTHER-25-store.155160687 ./KSTREAM-JOINOTHER-25-store.155174415 ./KSTREAM-JOINOTHER-25-store.155150820 ./KSTREAM-JOINOTHER-25-store.155148642 {code} Once re-balancing and state restoration is complete - the redundant segment files are deleted and the segment count drops to 508. We have seen the number of these segment stores grow to as many as 15000 over the baseline 508 which can fill smaller volumes. *This means that a state volume that would normally have ~300MB total disk usage would use in excess of 30GB during rebalancing*, mostly preallocated MANIFEST files. h2. Expected Behaviour For this particular application we expect 508 segment folders total to be active and existing throughout rebalancing. Give or take migrated tasks that are subject to the {{state.cleanup.delay.ms}}. h2. Preliminary investigation * This does not appear to be the case in v1.1.0. With our application the number of state directories only grows to 670 (over the base line 508) * The MANIFEST files were not preallocated to 4MB in v1.1.0 they are now in v2.1.x, this appears to be expected RocksDB behaviour, but exacerbates the many segment stores. * Suspect https://github.com/apache/kafka/pull/5253 to be the source of this change of behaviour. A workaround is to use {{rocksdb.config.setter}} and set the preallocated amount for MANIFEST files to a lower value such as 64KB, however the number of segment stores appears to be unbounded so disk volumes may still fill up for a heavier application. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
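The {{rocksdb.config.setter}} workaround described in the issue could be sketched roughly as follows. This is an illustrative sketch only, not a fix from the Kafka project: the class name is hypothetical, and it assumes the RocksDB Java API's manifest preallocation option is honored by the version bundled with Streams.

```java
import java.util.Map;
import org.apache.kafka.streams.state.RocksDBConfigSetter;
import org.rocksdb.Options;

// Hypothetical workaround sketch: shrink RocksDB's MANIFEST preallocation
// from the 4MB default to 64KB so the transient restore-time segment stores
// waste far less disk. Register it under the "rocksdb.config.setter"
// streams config (StreamsConfig.ROCKSDB_CONFIG_SETTER_CLASS_CONFIG).
public class ManifestPreallocationSetter implements RocksDBConfigSetter {
    @Override
    public void setConfig(final String storeName, final Options options,
                          final Map<String, Object> configs) {
        // 64KB per MANIFEST file instead of 4MB.
        options.setManifestPreallocationSize(64 * 1024L);
    }
}
```

Note this only caps the per-file preallocation; since the number of segment stores created during restore appears unbounded, it mitigates rather than removes the risk of filling the state volume.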
[jira] [Updated] (KAFKA-5998) /.checkpoint.tmp Not found exception
[ https://issues.apache.org/jira/browse/KAFKA-5998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adrian McCague updated KAFKA-5998: -- Affects Version/s: 2.1.1 > /.checkpoint.tmp Not found exception > > > Key: KAFKA-5998 > URL: https://issues.apache.org/jira/browse/KAFKA-5998 > Project: Kafka > Issue Type: Bug > Components: streams >Affects Versions: 0.11.0.0, 0.11.0.1, 2.1.1 >Reporter: Yogesh BG >Priority: Major > Attachments: 5998.v1.txt, 5998.v2.txt, Topology.txt, exc.txt, > props.txt, streams.txt > > > I have one Kafka broker and one Kafka Streams instance running. It has been > running for two days under a load of around 2500 msgs per second. On the third day I am > getting the below exception for some of the partitions; I have 16 partitions and only > 0_0 and 0_1 give this error > {{09:43:25.955 [ks_0_inst-StreamThread-6] WARN > o.a.k.s.p.i.ProcessorStateManager - Failed to write checkpoint file to > /data/kstreams/rtp-kafkastreams/0_1/.checkpoint: > java.io.FileNotFoundException: > /data/kstreams/rtp-kafkastreams/0_1/.checkpoint.tmp (No such file or > directory) > at java.io.FileOutputStream.open(Native Method) ~[na:1.7.0_111] > at java.io.FileOutputStream.<init>(FileOutputStream.java:221) > ~[na:1.7.0_111] > at java.io.FileOutputStream.<init>(FileOutputStream.java:171) > ~[na:1.7.0_111] > at > org.apache.kafka.streams.state.internals.OffsetCheckpoint.write(OffsetCheckpoint.java:73) > ~[rtp-kafkastreams-1.0-SNAPSHOT-jar-with-dependencies.jar:na] > at > org.apache.kafka.streams.processor.internals.ProcessorStateManager.checkpoint(ProcessorStateManager.java:324) > ~[rtp-kafkastreams-1.0-SNAPSHOT-jar-with-dependencies.jar:na] > at > org.apache.kafka.streams.processor.internals.StreamTask$1.run(StreamTask.java:267) > [rtp-kafkastreams-1.0-SNAPSHOT-jar-with-dependencies.jar:na] > at > org.apache.kafka.streams.processor.internals.StreamsMetricsImpl.measureLatencyNs(StreamsMetricsImpl.java:201) > [rtp-kafkastreams-1.0-SNAPSHOT-jar-with-dependencies.jar:na] > at > 
org.apache.kafka.streams.processor.internals.StreamTask.commit(StreamTask.java:260) > [rtp-kafkastreams-1.0-SNAPSHOT-jar-with-dependencies.jar:na] > at > org.apache.kafka.streams.processor.internals.StreamTask.commit(StreamTask.java:254) > [rtp-kafkastreams-1.0-SNAPSHOT-jar-with-dependencies.jar:na] > at > org.apache.kafka.streams.processor.internals.AssignedTasks$1.apply(AssignedTasks.java:322) > [rtp-kafkastreams-1.0-SNAPSHOT-jar-with-dependencies.jar:na] > at > org.apache.kafka.streams.processor.internals.AssignedTasks.applyToRunningTasks(AssignedTasks.java:415) > [rtp-kafkastreams-1.0-SNAPSHOT-jar-with-dependencies.jar:na] > at > org.apache.kafka.streams.processor.internals.AssignedTasks.commit(AssignedTasks.java:314) > [rtp-kafkastreams-1.0-SNAPSHOT-jar-with-dependencies.jar:na] > at > org.apache.kafka.streams.processor.internals.StreamThread.commitAll(StreamThread.java:700) > [rtp-kafkastreams-1.0-SNAPSHOT-jar-with-dependencies.jar:na] > at > org.apache.kafka.streams.processor.internals.StreamThread.maybeCommit(StreamThread.java:683) > [rtp-kafkastreams-1.0-SNAPSHOT-jar-with-dependencies.jar:na] > at > org.apache.kafka.streams.processor.internals.StreamThread.runOnce(StreamThread.java:523) > [rtp-kafkastreams-1.0-SNAPSHOT-jar-with-dependencies.jar:na] > at > org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:480) > [rtp-kafkastreams-1.0-SNAPSHOT-jar-with-dependencies.jar:na] > at > org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:457) > [rtp-kafkastreams-1.0-SNAPSHOT-jar-with-dependencies.jar:na] > 09:43:25.974 [ks_0_inst-StreamThread-15] WARN > o.a.k.s.p.i.ProcessorStateManager - Failed to write checkpoint file to > /data/kstreams/rtp-kafkastreams/0_0/.checkpoint: > java.io.FileNotFoundException: > /data/kstreams/rtp-kafkastreams/0_0/.checkpoint.tmp (No such file or > directory) > at java.io.FileOutputStream.open(Native Method) ~[na:1.7.0_111] > at 
java.io.FileOutputStream.<init>(FileOutputStream.java:221) > ~[na:1.7.0_111] > at java.io.FileOutputStream.<init>(FileOutputStream.java:171) > ~[na:1.7.0_111] > at > org.apache.kafka.streams.state.internals.OffsetCheckpoint.write(OffsetCheckpoint.java:73) > ~[rtp-kafkastreams-1.0-SNAPSHOT-jar-with-dependencies.jar:na] > at > org.apache.kafka.streams.processor.internals.ProcessorStateManager.checkpoint(ProcessorStateManager.java:324) > ~[rtp-kafkastreams-1.0-SNAPSHOT-jar-with-dependencies.jar:na] > at > org.apache.kafka.streams.processor.internals.StreamTask$1.run(StreamTask.java:267) > [rtp-kafkastreams-1.0-SNAPSHOT-jar-with-dependencies.jar:na] >
[jira] [Commented] (KAFKA-6767) OffsetCheckpoint write assumes parent directory exists
[ https://issues.apache.org/jira/browse/KAFKA-6767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16554541#comment-16554541 ] Adrian McCague commented on KAFKA-6767: --- Also, perhaps a coincidence, but the topology was running fine for a month before a rebalance occurred, at which point these errors started cropping up. > OffsetCheckpoint write assumes parent directory exists > -- > > Key: KAFKA-6767 > URL: https://issues.apache.org/jira/browse/KAFKA-6767 > Project: Kafka > Issue Type: Bug > Components: streams >Affects Versions: 1.1.0 >Reporter: Steven Schlansker >Priority: Minor > > We run Kafka Streams with RocksDB state stores on ephemeral disks (i.e. if an > instance dies it is created from scratch, rather than reusing the existing > RocksDB.) > We routinely see: > {code:java} > 2018-04-09T19:14:35.004Z WARN <> > [chat-0319e3c3-d8b2-4c60-bd69-a8484d8d4435-StreamThread-1] > o.a.k.s.p.i.ProcessorStateManager - task [0_11] Failed to write offset > checkpoint file to /mnt/mesos/sandbox/storage/chat/0_11/.checkpoint: {} > java.io.FileNotFoundException: > /mnt/mesos/sandbox/storage/chat/0_11/.checkpoint.tmp (No such file or > directory) > at java.io.FileOutputStream.open0(Native Method) > at java.io.FileOutputStream.open(FileOutputStream.java:270) > at java.io.FileOutputStream.<init>(FileOutputStream.java:213) > at java.io.FileOutputStream.<init>(FileOutputStream.java:162) > at > org.apache.kafka.streams.state.internals.OffsetCheckpoint.write(OffsetCheckpoint.java:78) > at > org.apache.kafka.streams.processor.internals.ProcessorStateManager.checkpoint(ProcessorStateManager.java:320) > at > org.apache.kafka.streams.processor.internals.StreamTask$1.run(StreamTask.java:314) > at > org.apache.kafka.streams.processor.internals.StreamsMetricsImpl.measureLatencyNs(StreamsMetricsImpl.java:208) > at > org.apache.kafka.streams.processor.internals.StreamTask.commit(StreamTask.java:307) > at > 
org.apache.kafka.streams.processor.internals.StreamTask.commit(StreamTask.java:297) > at > org.apache.kafka.streams.processor.internals.AssignedTasks$1.apply(AssignedTasks.java:67) > at > org.apache.kafka.streams.processor.internals.AssignedTasks.applyToRunningTasks(AssignedTasks.java:357) > at > org.apache.kafka.streams.processor.internals.AssignedTasks.commit(AssignedTasks.java:347) > at > org.apache.kafka.streams.processor.internals.TaskManager.commitAll(TaskManager.java:403) > at > org.apache.kafka.streams.processor.internals.StreamThread.maybeCommit(StreamThread.java:994) > at > org.apache.kafka.streams.processor.internals.StreamThread.runOnce(StreamThread.java:811) > at > org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:750) > at > org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:720){code} > Inspecting the state store directory, I can indeed see that {{chat/0_11}} > does not exist (although many other partitions do). > > Looking at the OffsetCheckpoint write method, it seems to try to open a new > checkpoint file without first ensuring that the parent directory exists. > > {code:java} > public void write(final Map<TopicPartition, Long> offsets) throws > IOException { > // if there is no offsets, skip writing the file to save disk IOs > if (offsets.isEmpty()) { > return; > } > synchronized (lock) { > // write to temp file and then swap with the existing file > final File temp = new File(file.getAbsolutePath() + ".tmp");{code} > > Either the OffsetCheckpoint class should initialize the directories if > needed, or some precondition of it being called should ensure that is the > case. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
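The first of the two suggested remedies (have the writer initialize the directory) can be sketched with only the JDK. This is a hypothetical helper, not the actual Kafka patch: it creates the parent directory before opening the temp file, then swaps the temp file into place, mirroring the write-then-rename strategy quoted above.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical sketch of a defensive OffsetCheckpoint-style write:
// ensure the task directory exists before the temp file is opened,
// then rename the temp file over the checkpoint.
public class CheckpointWriteSketch {

    public static Path writeAtomically(final Path checkpoint, final String contents)
            throws IOException {
        // Guard for KAFKA-6767: the parent (task) directory may be missing,
        // e.g. after a cleanup raced with a commit.
        Files.createDirectories(checkpoint.getParent());

        // Write to a ".tmp" sibling first, as OffsetCheckpoint.write does.
        final Path tmp = checkpoint.resolveSibling(checkpoint.getFileName() + ".tmp");
        Files.write(tmp, contents.getBytes(StandardCharsets.UTF_8));

        // Swap the temp file into place.
        return Files.move(tmp, checkpoint,
                StandardCopyOption.REPLACE_EXISTING, StandardCopyOption.ATOMIC_MOVE);
    }

    public static void main(final String[] args) throws IOException {
        // Simulate a task directory ("0_11") that does not exist yet.
        final Path taskDir = Files.createTempDirectory("state").resolve("0_11");
        final Path written = writeAtomically(taskDir.resolve(".checkpoint"), "0\n");
        System.out.println(Files.exists(written)); // prints "true"
    }
}
```

The same effect could come from the alternative precondition mentioned in the issue (the caller guaranteeing the directory exists); the guard shown here simply makes the writer self-sufficient.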
[jira] [Commented] (KAFKA-6767) OffsetCheckpoint write assumes parent directory exists
[ https://issues.apache.org/jira/browse/KAFKA-6767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16554059#comment-16554059 ] Adrian McCague commented on KAFKA-6767: --- Hi [~guozhang], we are seeing this issue as well on a relatively frequent basis (Streams 1.1.0); here is my case: {code:java} task [1_1] Failed to write offset checkpoint file to /data/my-topology/1_1/.checkpoint: {} java.io.FileNotFoundException: /data/my-topology/1_1/.checkpoint.tmp (No such file or directory) at java.io.FileOutputStream.open0(FileOutputStream.java) at java.io.FileOutputStream.open(FileOutputStream.java:270) at java.io.FileOutputStream.<init>(FileOutputStream.java:213) at java.io.FileOutputStream.<init>(FileOutputStream.java:162) at org.apache.kafka.streams.state.internals.OffsetCheckpoint.write(OffsetCheckpoint.java:78) at org.apache.kafka.streams.processor.internals.ProcessorStateManager.checkpoint(ProcessorStateManager.java:320) at org.apache.kafka.streams.processor.internals.StreamTask$1.run(StreamTask.java:314) at org.apache.kafka.streams.processor.internals.StreamsMetricsImpl.measureLatencyNs(StreamsMetricsImpl.java:208) at org.apache.kafka.streams.processor.internals.StreamTask.commit(StreamTask.java:307) at org.apache.kafka.streams.processor.internals.StreamTask.commit(StreamTask.java:297) at org.apache.kafka.streams.processor.internals.AssignedTasks$1.apply(AssignedTasks.java:67) at org.apache.kafka.streams.processor.internals.AssignedTasks.applyToRunningTasks(AssignedTasks.java:357) at org.apache.kafka.streams.processor.internals.AssignedTasks.commit(AssignedTasks.java:347) at org.apache.kafka.streams.processor.internals.TaskManager.commitAll(TaskManager.java:403) at org.apache.kafka.streams.processor.internals.StreamThread.maybeCommit(StreamThread.java:994) at org.apache.kafka.streams.processor.internals.StreamThread.runOnce(StreamThread.java:811) at org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:750) at 
org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:720){code} This is the sub-topology: {code:java} Sub-topology: 1 Source: KSTREAM-SOURCE-11 (topics: [x]) --> KSTREAM-PEEK-12 Source: KSTREAM-SOURCE-15 (topics: [y]) --> KSTREAM-PEEK-16 Processor: KSTREAM-PEEK-12 (stores: []) --> KSTREAM-KEY-SELECT-13 <-- KSTREAM-SOURCE-11 Processor: KSTREAM-PEEK-16 (stores: []) --> KSTREAM-KEY-SELECT-17 <-- KSTREAM-SOURCE-15 Processor: KSTREAM-KEY-SELECT-13 (stores: []) --> KSTREAM-MAPVALUES-14 <-- KSTREAM-PEEK-12 Processor: KSTREAM-KEY-SELECT-17 (stores: []) --> KSTREAM-MAPVALUES-18 <-- KSTREAM-PEEK-16 Processor: KSTREAM-MAPVALUES-14 (stores: []) --> KSTREAM-MERGE-19 <-- KSTREAM-KEY-SELECT-13 Processor: KSTREAM-MAPVALUES-18 (stores: []) --> KSTREAM-MERGE-19 <-- KSTREAM-KEY-SELECT-17 Processor: KSTREAM-MERGE-19 (stores: []) --> KSTREAM-FILTER-22 <-- KSTREAM-MAPVALUES-14, KSTREAM-MAPVALUES-18 Processor: KSTREAM-FILTER-22 (stores: []) --> KSTREAM-SINK-21 <-- KSTREAM-MERGE-19 Sink: KSTREAM-SINK-21 (topic: z-store-repartition) <-- KSTREAM-FILTER-22{code} So I believe this supports your theory that stateless tasks are attempting to checkpoint. In this case it appears the final sink is related to a repartition before a DSL Aggregate, which may hint towards the bug. > OffsetCheckpoint write assumes parent directory exists > -- > > Key: KAFKA-6767 > URL: https://issues.apache.org/jira/browse/KAFKA-6767 > Project: Kafka > Issue Type: Bug > Components: streams >Affects Versions: 1.1.0 >Reporter: Steven Schlansker >Priority: Minor > > We run Kafka Streams with RocksDB state stores on ephemeral disks (i.e. if an > instance dies it is created from scratch, rather than reusing the existing > RocksDB.) 
> We routinely see:
> {code:java}
> 2018-04-09T19:14:35.004Z WARN <> [chat-0319e3c3-d8b2-4c60-bd69-a8484d8d4435-StreamThread-1] o.a.k.s.p.i.ProcessorStateManager - task [0_11] Failed to write offset checkpoint file to /mnt/mesos/sandbox/storage/chat/0_11/.checkpoint: {}
> java.io.FileNotFoundException: /mnt/mesos/sandbox/storage/chat/0_11/.checkpoint.tmp (No such file or directory)
> at java.io.FileOutputStream.open0(Native Method)
> at java.io.FileOutputStream.open(FileOutputStream.java:270)
> at java.io.FileOutputStream.<init>(FileOutputStream.java:213)
> at java.io.FileOutputStream.<init>(FileOutputStream.java:162)
> at org.apache.kafka.streams.state.internals.OffsetCheckpoint.write(OffsetCheckpoint.java:78)
> at
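Both traces fail at the same spot: opening {{.checkpoint.tmp}} inside a task directory that no longer exists (or was never created). A minimal sketch of the usual defensive pattern is below: create the parent directory first, write a temp file, then atomically rename it. This is illustrative only, not Kafka's actual {{OffsetCheckpoint}} implementation; the class name and the simplified key-value file format are assumptions.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Collections;
import java.util.Map;

// Illustrative sketch only -- NOT Kafka's actual OffsetCheckpoint class.
// It mirrors the tmp-file-then-rename pattern visible in the stack trace,
// but first ensures the parent (task) directory exists, so opening
// ".checkpoint.tmp" cannot fail with FileNotFoundException.
public class SafeOffsetCheckpoint {

    public static void write(Path checkpointFile, Map<String, Long> offsets) throws IOException {
        // Guard against the race reported above: the task directory may have
        // been deleted (or never created) by the time the task commits.
        Files.createDirectories(checkpointFile.toAbsolutePath().getParent());

        Path tmp = checkpointFile.resolveSibling(checkpointFile.getFileName() + ".tmp");
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Long> e : offsets.entrySet()) {
            sb.append(e.getKey()).append(' ').append(e.getValue()).append('\n');
        }
        Files.write(tmp, sb.toString().getBytes(StandardCharsets.UTF_8));
        // Atomic rename: readers never observe a half-written checkpoint.
        Files.move(tmp, checkpointFile, StandardCopyOption.REPLACE_EXISTING,
                StandardCopyOption.ATOMIC_MOVE);
    }

    public static void main(String[] args) throws IOException {
        // Demo: the "0_11" task directory does not exist yet, and the write
        // still succeeds because createDirectories runs first.
        Path task = Files.createTempDirectory("streams-state").resolve("0_11");
        write(task.resolve(".checkpoint"), Collections.singletonMap("chat-11", 123L));
        System.out.println(Files.readAllLines(task.resolve(".checkpoint")));
    }
}
```

Note that this only papers over whichever code path removed the task directory; for stateless tasks the cleaner fix discussed in the ticket is to not write a checkpoint at all.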
[jira] [Commented] (KAFKA-6167) Timestamp on streams directory contains a colon, which is an illegal character
[ https://issues.apache.org/jira/browse/KAFKA-6167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16237875#comment-16237875 ] Adrian McCague commented on KAFKA-6167: --- `.` feels quite standard here
> Timestamp on streams directory contains a colon, which is an illegal character
> --
>
> Key: KAFKA-6167
> URL: https://issues.apache.org/jira/browse/KAFKA-6167
> Project: Kafka
> Issue Type: Bug
> Components: streams
> Affects Versions: 1.0.0
> Environment: AK 1.0.0
> Kubernetes
> CoreOS
> JDK 1.8
> Windows
> Reporter: Justin Manchester
> Priority: Normal
>
> Problem:
> Development on Windows, which is not fully supported, however still a bug that should be corrected.
> It looks like a timestamp was added to the streams directory using a colon as separator. I believe this is an illegal character and potentially the cause for the exception below.
> Error Stack:
> 2017-11-02 16:06:41 ERROR [StreamDeduplicatorAcceptanceTest1-a3ae0ac6-a024-4006-bcb1-01ff0f433f6e-StreamThread-1] org.apache.kafka.streams.processor.internals.AssignedTasks:301 - stream-thread [StreamDeduplicatorAcceptanceTest1-a3ae0ac6-a024-4006-bcb1-01ff0f433f6e-StreamThread-1] Failed to process stream task 0_0 due to the following error:
> org.apache.kafka.streams.errors.StreamsException: Exception caught in process. taskId=0_0, processor=KSTREAM-SOURCE-00, topic=input-a_1, partition=0, offset=0
> at org.apache.kafka.streams.processor.internals.StreamTask.process(StreamTask.java:232) ~[kafka-streams-1.0.0.jar:?]
> at org.apache.kafka.streams.processor.internals.AssignedTasks.process(AssignedTasks.java:403) [kafka-streams-1.0.0.jar:?]
> at org.apache.kafka.streams.processor.internals.TaskManager.process(TaskManager.java:317) [kafka-streams-1.0.0.jar:?]
> at org.apache.kafka.streams.processor.internals.StreamThread.processAndMaybeCommit(StreamThread.java:942) [kafka-streams-1.0.0.jar:?]
> at org.apache.kafka.streams.processor.internals.StreamThread.runOnce(StreamThread.java:822) [kafka-streams-1.0.0.jar:?]
> at org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:774) [kafka-streams-1.0.0.jar:?]
> at org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:744) [kafka-streams-1.0.0.jar:?]
> Caused by: org.apache.kafka.streams.errors.ProcessorStateException: Error opening store KSTREAM-JOINTHIS-04-store:150962400 at location C:\Users\ADRIAN~1.MCC\AppData\Local\Temp\kafka3548813472740086814\StreamDeduplicatorAcceptanceTest1\0_0\KSTREAM-JOINTHIS-04-store\KSTREAM-JOINTHIS-04-store:150962400
> at org.apache.kafka.streams.state.internals.RocksDBStore.openDB(RocksDBStore.java:204) ~[kafka-streams-1.0.0.jar:?]
> at org.apache.kafka.streams.state.internals.RocksDBStore.openDB(RocksDBStore.java:174) ~[kafka-streams-1.0.0.jar:?]
> at org.apache.kafka.streams.state.internals.Segment.openDB(Segment.java:40) ~[kafka-streams-1.0.0.jar:?]
> at org.apache.kafka.streams.state.internals.Segments.getOrCreateSegment(Segments.java:89) ~[kafka-streams-1.0.0.jar:?]
> at org.apache.kafka.streams.state.internals.RocksDBSegmentedBytesStore.put(RocksDBSegmentedBytesStore.java:81) ~[kafka-streams-1.0.0.jar:?]
> at org.apache.kafka.streams.state.internals.RocksDBWindowStore$RocksDBWindowBytesStore.put(RocksDBWindowStore.java:43) ~[kafka-streams-1.0.0.jar:?]
> at org.apache.kafka.streams.state.internals.RocksDBWindowStore$RocksDBWindowBytesStore.put(RocksDBWindowStore.java:34) ~[kafka-streams-1.0.0.jar:?]
> at org.apache.kafka.streams.state.internals.ChangeLoggingWindowBytesStore.put(ChangeLoggingWindowBytesStore.java:67) ~[kafka-streams-1.0.0.jar:?]
> at org.apache.kafka.streams.state.internals.ChangeLoggingWindowBytesStore.put(ChangeLoggingWindowBytesStore.java:33) ~[kafka-streams-1.0.0.jar:?]
> at org.apache.kafka.streams.state.internals.MeteredWindowStore.put(MeteredWindowStore.java:96) ~[kafka-streams-1.0.0.jar:?]
> at org.apache.kafka.streams.state.internals.MeteredWindowStore.put(MeteredWindowStore.java:89) ~[kafka-streams-1.0.0.jar:?]
> at org.apache.kafka.streams.kstream.internals.KStreamJoinWindow$KStreamJoinWindowProcessor.process(KStreamJoinWindow.java:64) ~[kafka-streams-1.0.0.jar:?]
> at org.apache.kafka.streams.processor.internals.ProcessorNode$1.run(ProcessorNode.java:46) ~[kafka-streams-1.0.0.jar:?]
> at org.apache.kafka.streams.processor.internals.StreamsMetricsImpl.measureLatencyNs(StreamsMetricsImpl.java:208) ~[kafka-streams-1.0.0.jar:?]
> at org.apache.kafka.streams.processor.internals.ProcessorNode.process(ProcessorNode.java:124)
>
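The store location in the {{ProcessorStateException}} above shows the actual problem: the segment directory is named {{<storeName>:<segmentId>}} (here {{KSTREAM-JOINTHIS-04-store:150962400}}), and {{:}} is a reserved character in Windows file names. A minimal sketch of the separator change suggested in the comment, swapping {{:}} for {{.}}, follows; the class and method names are hypothetical, not Kafka's internal {{Segments}} API.

```java
// Hypothetical helper (not Kafka's internal Segments API) illustrating the
// naming fix discussed above: ':' is an illegal character in Windows file
// names, so a segment directory named "<store>:<segmentId>" cannot be
// created there, while '.' is portable across platforms.
public class SegmentNames {

    /** The scheme from the stack trace, which is broken on Windows. */
    public static String legacyName(String storeName, long segmentId) {
        return storeName + ":" + segmentId;
    }

    /** Portable scheme using '.' as the separator. */
    public static String portableName(String storeName, long segmentId) {
        return storeName + "." + segmentId;
    }

    public static void main(String[] args) {
        System.out.println(legacyName("KSTREAM-JOINTHIS-04-store", 150962400L));
        System.out.println(portableName("KSTREAM-JOINTHIS-04-store", 150962400L));
    }
}
```

Changing the separator does mean existing on-disk segments named with {{:}} are no longer recognized, which is why a rename like this is effectively a state-format change.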
[jira] [Comment Edited] (KAFKA-6075) Kafka cannot recover after an unclean shutdown on Windows
[ https://issues.apache.org/jira/browse/KAFKA-6075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16218969#comment-16218969 ] Adrian McCague edited comment on KAFKA-6075 at 10/25/17 10:49 PM: -- We have witnessed this issue as well; we develop on Windows machines and our acceptance tests run against the {{KafkaEmbedded}} instances.
{code:java}
java.nio.file.FileSystemException: C:\Users\..\Local\Temp\junit7769464363651469214\junit8481958736421123475\_schemas-0\.timeindex: The process cannot access the file because it is being used by another process.
	at sun.nio.fs.WindowsException.translateToIOException(WindowsException.java:86)
	at sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.java:97)
	at sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.java:102)
	at sun.nio.fs.WindowsFileSystemProvider.implDelete(WindowsFileSystemProvider.java:269)
	at sun.nio.fs.AbstractFileSystemProvider.delete(AbstractFileSystemProvider.java:103)
	at java.nio.file.Files.delete(Files.java:1126)
	at org.apache.kafka.common.utils.Utils$2.visitFile(Utils.java:591)
	at org.apache.kafka.common.utils.Utils$2.visitFile(Utils.java:580)
	at java.nio.file.Files.walkFileTree(Files.java:2670)
	at java.nio.file.Files.walkFileTree(Files.java:2742)
	at org.apache.kafka.common.utils.Utils.delete(Utils.java:580)
	at kafka.utils.CoreUtils$$anonfun$delete$1.apply(CoreUtils.scala:88)
	at kafka.utils.CoreUtils$$anonfun$delete$1.apply(CoreUtils.scala:88)
	at scala.collection.Iterator$class.foreach(Iterator.scala:891)
	at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
	at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
	at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
	at kafka.utils.CoreUtils$.delete(CoreUtils.scala:88)
	at kafka.utils.CoreUtils.delete(CoreUtils.scala)
	at org.apache.kafka.streams.integration.utils.KafkaEmbedded.stop(KafkaEmbedded.java:135)
	at org.apache.kafka.streams.integration.utils.EmbeddedKafkaCluster.stop(EmbeddedKafkaCluster.java:93){code}
This is also a timeindex file that the {{java.nio}} utility cannot delete; potentially not all handles on this file were closed when Kafka thought it had shut down.
> Kafka cannot recover after an unclean shutdown on Windows
> -
>
> Key: KAFKA-6075
> URL: https://issues.apache.org/jira/browse/KAFKA-6075
> Project: Kafka
> Issue Type: Bug
>
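Windows refuses to delete a file while any handle to it (including a memory map) is still open, which is exactly what the {{FileSystemException}} above reports. A retry loop such as the hedged sketch below is sometimes used as a workaround in tests, but it only masks the root cause; the real fix is closing every index handle before deletion. The class and method names are hypothetical, not Kafka's actual {{Utils.delete}}.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hedged sketch of a deletion retry loop occasionally used to work around
// Windows' "file in use" semantics: Files.delete fails while any handle
// (including a memory-mapped .timeindex) is still open. This does not fix
// the root cause from the ticket -- handles must actually be closed before
// shutdown -- it only retries in case the close is racing the delete.
public class RetryingDelete {

    public static boolean deleteWithRetry(Path path, int attempts, long backoffMs)
            throws InterruptedException {
        for (int i = 0; i < attempts; i++) {
            try {
                // No-op if the file is already gone, so a retry after a
                // successful concurrent delete still reports success.
                Files.deleteIfExists(path);
                return true;
            } catch (IOException e) {
                // On Windows this is typically FileSystemException: "The
                // process cannot access the file because it is being used
                // by another process."
                Thread.sleep(backoffMs);
            }
        }
        return false;
    }

    public static void main(String[] args) throws Exception {
        Path f = Files.createTempFile("demo", ".timeindex");
        System.out.println("deleted=" + deleteWithRetry(f, 5, 100L));
    }
}
```

On Linux this always succeeds on the first attempt (POSIX allows unlinking open files), which is why the bug only surfaces on Windows developer machines.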
[jira] [Commented] (KAFKA-6075) Kafka cannot recover after an unclean shutdown on Windows
[ https://issues.apache.org/jira/browse/KAFKA-6075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16218969#comment-16218969 ] Adrian McCague commented on KAFKA-6075: --- We have witnessed this issue as well; we develop on Windows machines and our acceptance tests run against the embedded Kafka instances. We are running 0.10.2.1 and do not see this issue; if we attempt to upgrade to 0.11.0.0 then we see this for any and all tests that use the embedded Kafka instance, which is currently blocking our upgrade path until we figure out a workaround.
> Kafka cannot recover after an unclean shutdown on Windows
> -
>
> Key: KAFKA-6075
> URL: https://issues.apache.org/jira/browse/KAFKA-6075
> Project: Kafka
> Issue Type: Bug
> Affects Versions: 0.11.0.1
> Reporter: Vahid Hashemian
>
> An unclean shutdown of broker on Windows cannot be recovered by Kafka. Steps to reproduce from a fresh build:
> # Start zookeeper
> # Start a broker
> # Create a topic {{test}}
> # Do an unclean shutdown of broker (find the process id by {{wmic process where "caption = 'java.exe' and commandline like '%server.properties%'" get processid}}), then kill the process by {{taskkill /pid /f}}
> # Start the broker again
> This leads to the following errors:
> {code}
> [2017-10-17 17:13:24,819] ERROR Error while loading log dir C:\tmp\kafka-logs (kafka.log.LogManager)
> java.nio.file.FileSystemException: C:\tmp\kafka-logs\test-0\.timeindex: The process cannot access the file because it is being used by another process.
> at sun.nio.fs.WindowsException.translateToIOException(WindowsException.java:86)
> at sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.java:97)
> at sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.java:102)
> at sun.nio.fs.WindowsFileSystemProvider.implDelete(WindowsFileSystemProvider.java:269)
> at sun.nio.fs.AbstractFileSystemProvider.deleteIfExists(AbstractFileSystemProvider.java:108)
> at java.nio.file.Files.deleteIfExists(Files.java:1165)
> at kafka.log.Log$$anonfun$loadSegmentFiles$3.apply(Log.scala:333)
> at kafka.log.Log$$anonfun$loadSegmentFiles$3.apply(Log.scala:295)
> at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:733)
> at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
> at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
> at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:732)
> at kafka.log.Log.loadSegmentFiles(Log.scala:295)
> at kafka.log.Log.loadSegments(Log.scala:404)
> at kafka.log.Log.<init>(Log.scala:201)
> at kafka.log.Log$.apply(Log.scala:1729)
> at kafka.log.LogManager.kafka$log$LogManager$$loadLog(LogManager.scala:221)
> at kafka.log.LogManager$$anonfun$loadLogs$2$$anonfun$8$$anonfun$apply$16$$anonfun$apply$2.apply$mcV$sp(LogManager.scala:292)
> at kafka.utils.CoreUtils$$anon$1.run(CoreUtils.scala:61)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> [2017-10-17 17:13:24,819] ERROR Error while deleting the clean shutdown file in dir C:\tmp\kafka-logs (kafka.server.LogDirFailureChannel)
> java.nio.file.FileSystemException: C:\tmp\kafka-logs\test-0\.timeindex: The process cannot access the file because it is being used by another process.
> at sun.nio.fs.WindowsException.translateToIOException(WindowsException.java:86)
> at sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.java:97)
> at sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.java:102)
> at sun.nio.fs.WindowsFileSystemProvider.implDelete(WindowsFileSystemProvider.java:269)
> at sun.nio.fs.AbstractFileSystemProvider.deleteIfExists(AbstractFileSystemProvider.java:108)
> at java.nio.file.Files.deleteIfExists(Files.java:1165)
> at kafka.log.Log$$anonfun$loadSegmentFiles$3.apply(Log.scala:333)
> at kafka.log.Log$$anonfun$loadSegmentFiles$3.apply(Log.scala:295)
> at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:733)
> at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
> at