[GitHub] [kafka] dongjinleekr commented on pull request #9414: KAFKA-10585: Kafka Streams should clean up the state store directory from cleanup

2021-06-10 Thread GitBox


dongjinleekr commented on pull request #9414:
URL: https://github.com/apache/kafka/pull/9414#issuecomment-858453218


   Rebased onto the latest trunk. cc/ @vvcephei


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [kafka] dongjinleekr commented on pull request #9414: KAFKA-10585: Kafka Streams should clean up the state store directory from cleanup

2021-05-25 Thread GitBox


dongjinleekr commented on pull request #9414:
URL: https://github.com/apache/kafka/pull/9414#issuecomment-847886290


   Hi @guozhangwang @vvcephei,
   Could you have a look now? :smiley: 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [kafka] dongjinleekr commented on pull request #9414: KAFKA-10585: Kafka Streams should clean up the state store directory from cleanup

2021-05-07 Thread GitBox


dongjinleekr commented on pull request #9414:
URL: https://github.com/apache/kafka/pull/9414#issuecomment-834361327


   Hi @vvcephei, Could you review this PR now? :bow:


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [kafka] dongjinleekr commented on pull request #9414: KAFKA-10585: Kafka Streams should clean up the state store directory from cleanup

2021-04-19 Thread GitBox


dongjinleekr commented on pull request #9414:
URL: https://github.com/apache/kafka/pull/9414#issuecomment-822991690


   > I know @vvcephei was quite swamped in the past months with pretty heavy 
release management duties.
   
   Totally agree. It's also why I did not press him. :smile: I rebased the PR 
into the latest trunk and checked it passes all tests!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [kafka] dongjinleekr commented on pull request #9414: KAFKA-10585: Kafka Streams should clean up the state store directory from cleanup

2021-04-19 Thread GitBox


dongjinleekr commented on pull request #9414:
URL: https://github.com/apache/kafka/pull/9414#issuecomment-822452651


   Hi @guozhangwang,
   
   Thanks for your mention. I completed applying @vvcephei's comment (on March 
25, 2021) and awaiting review and merging, with maintaining the PR to the 
latest trunk.
   
   As you can see here, this issue was started much earlier but It seems like I 
misunderstood @vvcephei's intention.  And I recognized it this March.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [kafka] dongjinleekr commented on pull request #9414: KAFKA-10585: Kafka Streams should clean up the state store directory from cleanup

2021-03-25 Thread GitBox


dongjinleekr commented on pull request #9414:
URL: https://github.com/apache/kafka/pull/9414#issuecomment-806739608


   Add to this, I also updated `EOSUncleanShutdownIntegrationTest` also. I 
inspected the history of this test suite precisely and found the following:
   
   1. This suite was introduced in c2ec974 by @abbccdda. It asserts that the 
Task's statestore directory (not root one in KafkaStreams's `state.dir` 
property) is deleted after an unclean shutdown. However, IMHO this assertion is 
a little bit inaccurate since the Task's statestore directory may not be 
deleted, for example, with an empty checkpoint file.
   2. A comment is added on L160 in d3c067f by @guozhangwang - 'the state 
directory should still exist with the empty checkpoint file'. However, IMHO 
this comment seems to be a mistake, since `assertFalse(stateDir.exists());` on 
L161 asserts that the Task's statestore directory does not exist.
   
   There are some other commits on this file later, but all of them are just 
minor improvements, not changing the testing logic.
   
   With this PR, Kafka Streams now deletes the empty application statestore 
directory (i.e., `{state.dir}/{application.id}`). So, I improved this suite 
like this:
   
   1. After initialization, wait until the Task's statestore directory is 
populated. (i.e., `{state.dir}/{application.id}/0_0/*`)
   2. Assert that one of the following is satisfied:
   
 - The Task's statestore directory is empty, so deleted.
 - The Task's statestore directory is not empty but without a checkpoint 
file.
 - The Task's statestore directory is not empty but with an empty 
checkpoint file.
   
   (Wait, then should we separate the modifications on 
`EOSUncleanShutdownIntegrationTest` into an independent PR?)
   
   Please have a look when you are free. Thanks again for reviewing my PR! 
:smiley:


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [kafka] dongjinleekr commented on pull request #9414: KAFKA-10585: Kafka Streams should clean up the state store directory from cleanup

2021-03-25 Thread GitBox


dongjinleekr commented on pull request #9414:
URL: https://github.com/apache/kafka/pull/9414#issuecomment-806736975


   Hi @vvcephei,
   
   It seems like I have failed to understand your intention correctly. Here is 
the fix. I reorganized the PR into two commits, one for `StateDirectory` and 
the other one for `EOSUncleanShutdownIntegrationTest`. The following summarizes 
the updates:
   
   1. Rebase the PR onto the latest trunk, resolving all broken syntaxes.
   2. Remove redundant assertion in 
`StateDirectoryTest#shouldNotDeleteAppDirWhenCleanUpIfNotEmpty`; it now 
validates logging message only.
   3. Make `StateDirectoryIntegrationTest` to use default timeout 
(`IntegrationTestUtils.DEFAULT_TIMEOUT`) with its latches, keeping the suite 
from being flaky.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [kafka] dongjinleekr commented on pull request #9414: KAFKA-10585: Kafka Streams should clean up the state store directory from cleanup

2021-01-18 Thread GitBox


dongjinleekr commented on pull request #9414:
URL: https://github.com/apache/kafka/pull/9414#issuecomment-762079038


   Rebased onto the latest trunk. cc/ @vvcephei @mjsax



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [kafka] dongjinleekr commented on pull request #9414: KAFKA-10585: Kafka Streams should clean up the state store directory from cleanup

2020-12-04 Thread GitBox


dongjinleekr commented on pull request #9414:
URL: https://github.com/apache/kafka/pull/9414#issuecomment-739132524


   Rebased onto the latest trunk.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [kafka] dongjinleekr commented on pull request #9414: KAFKA-10585: Kafka Streams should clean up the state store directory from cleanup

2020-11-24 Thread GitBox


dongjinleekr commented on pull request #9414:
URL: https://github.com/apache/kafka/pull/9414#issuecomment-733525231


   Retest this please.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [kafka] dongjinleekr commented on pull request #9414: KAFKA-10585: Kafka Streams should clean up the state store directory from cleanup

2020-11-20 Thread GitBox


dongjinleekr commented on pull request #9414:
URL: https://github.com/apache/kafka/pull/9414#issuecomment-731208645


   Hi @vvcephei,
   
   I re-based the branch onto the latest trunk, reorganized the changes into 
two commits, and fixed 
`EOSUncleanShutdownIntegrationTest#shouldWorkWithUncleanShutdownWipeOutStateStore`
 - Sure, its contents do not follow the description of its name. :wink: 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [kafka] dongjinleekr commented on pull request #9414: KAFKA-10585: Kafka Streams should clean up the state store directory from cleanup

2020-11-10 Thread GitBox


dongjinleekr commented on pull request #9414:
URL: https://github.com/apache/kafka/pull/9414#issuecomment-724725724


   Hi @vvcephei,
   
   Here is the fix. The reason for the broken integration tests was: 
`KafkaStreams#cleanUp` can be called regardless of the `StreamThread`s are 
terminated.
   
   That is, if `KafkaStreams#cleanUp` is called when the `KafkaStreams` is 
still running, the `{state-store-directory}/{application-id}` directory is 
deleted and  `StreamThread` crashes with the exception from 
`StateDirectory#directoryForTask` - since it fails to create the task state 
store directory, i.e., `{state-store-directory}/{application-id}/{task-id}`.
   
   Moreover, `StateDirectory#directoryForTask` method has two additional 
vulnerabilities:
   
   1. When it creates the task state store directory, it does not create its 
parent directories automatically - if there is not 
`{state-store-directory}/{application-id}`, `taskDir.mkdir()` returns false and 
it throws an exception. For this behavior breaks the integration tests, I 
modified `taskDir.mkdir()` to `taskDir.mkdirs()` to create 
`{state-store-directory}/{application-id}` automatically. 
   2. This method only checks whether there is a `File` at 
`{state-store-directory}/{application-id}/{task-id}`, regardless of it is 
actually a directory or not. I added an additional check condition for this 
case and `StateDirectoryTest#shouldThrowProcessorStateException` is updated 
accordingly.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [kafka] dongjinleekr commented on pull request #9414: KAFKA-10585: Kafka Streams should clean up the state store directory from cleanup

2020-11-03 Thread GitBox


dongjinleekr commented on pull request #9414:
URL: https://github.com/apache/kafka/pull/9414#issuecomment-721135460


   Hi @vvcephei,
   
   Sorry for being late. Defending myself, I have been too busy to finalize my 
project last week, and it ended yesterday. Sure, I am now resolving the broken 
tests. There are 21 broken tests with the latest trunk, and I just resolved 16 
of them. It will be completed soon.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [kafka] dongjinleekr commented on pull request #9414: KAFKA-10585: Kafka Streams should clean up the state store directory from cleanup

2020-10-22 Thread GitBox


dongjinleekr commented on pull request #9414:
URL: https://github.com/apache/kafka/pull/9414#issuecomment-714293384


   @vvcephei Here it is; I deleted a outdated condition statement in 
`EOSUncleanShutdownIntegrationTest`.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [kafka] dongjinleekr commented on pull request #9414: KAFKA-10585: Kafka Streams should clean up the state store directory from cleanup

2020-10-20 Thread GitBox


dongjinleekr commented on pull request #9414:
URL: https://github.com/apache/kafka/pull/9414#issuecomment-713251214


   @vvcephei Here it is; I updated the timeout to 
`IntegrationTestUtils.DEFAULT_TIMEOUT` and rebased onto the latest trunk also.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [kafka] dongjinleekr commented on pull request #9414: KAFKA-10585: Kafka Streams should clean up the state store directory from cleanup

2020-10-12 Thread GitBox


dongjinleekr commented on pull request #9414:
URL: https://github.com/apache/kafka/pull/9414#issuecomment-707498871


   Hi @vvcephei,
   
   1. Sure. I added some validations in `StateDirectoryTest`.
   
 - If `StateDirectory#clean` is called, the empty `appDir` is also deleted. 
(see `StateDirectoryTest#shouldLogManualUserCallMessage`.)
 - If `StateDirectory#clean` is not called, the global state directory and 
it parent, `appDir` is not deleted. (see 
`StateDirectoryTest#shouldLogStateDirCleanerMessage`.)
   
   Please note the difference in 
`StateDirectoryTest#shouldCleanupAllTaskDirectoriesIncludingGlobalOne`; the 
`appDir` was an empty directory before but it is now deleted.
   
   2. Since the goal of this modification is deleting the empty directory, we 
don't need a recursive delete; it is also why I called `File#delete` here, 
since it works and returns `true` iff the target directory is empty.
   
   And one more thing: I added an exception handling for `SecurityException`. 
:smile:



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org