----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/52492/#review152330 -----------------------------------------------------------
More thorough review this time. Feedback below. Also, we should add documentation for this monitor, particularly if it is going to be enabled by default. The documentation should clearly explain what the monitor does and how it's related to delete retention in a log compacted changelog. samza-rest/src/main/java/org/apache/samza/monitor/YarnLocalStoreMonitor.java (line 82) <https://reviews.apache.org/r/52492/#comment221325> I see config.getLocalStoreBaseDir called twice in quick succession. It implies that the paths are related. Suggestion: Since getHostAffinityEnabledJobs() creates a file for the baseDir, why not 1. create that file before the for-loop 2. pass that file into getHostAffinityEnabledJobs 3. use that file as the first argument (parent) to create the jobDir File. samza-rest/src/main/java/org/apache/samza/monitor/YarnLocalStoreMonitor.java (lines 85 - 91) <https://reviews.apache.org/r/52492/#comment221333> This code doesn't ensure that it will only delete the store for the appropriate task. This could be a problem if there were asymmetry in the stores across tasks So the file system looks like: .../job1-1/myStoreName1/task1 .../job1-1/myStoreName2/task2 task.getStoreNames().contains("myStoreName1") will be false for task2. task.getStoreNames().contains("myStoreName2") will be false for task1. And so they will delete each other's stores. This might be a stretch, but blindly deleting all the task dirs under a particular store dir seems like a bug waiting to happen. It would be useful to have some tests to cover these scenarios. samza-rest/src/main/java/org/apache/samza/monitor/YarnLocalStoreMonitor.java (line 94) <https://reviews.apache.org/r/52492/#comment221342> 3 issues with this section: 1. The big one: The way this code reads, it will delete the store for any non-running job, completely defeating the purpose of host affinity. After staring at it for a while, I realized the deleteTaskStore() method doesn't always delete. This is confusing and makes the code seem destructive. I'd highly recommend a more accurate name, like "markSweepIfStale()" 2. Do we want to delete the store for a job with status == STARTING? Feels like a race condition. see JobStatus.hasBeenStarted() 3. I didn't see any unit tests covering this logic. samza-rest/src/main/java/org/apache/samza/monitor/YarnLocalStoreMonitor.java (line 145) <https://reviews.apache.org/r/52492/#comment221346> Thanks for the careful logging. samza-rest/src/main/java/org/apache/samza/monitor/YarnLocalStoreMonitor.java (line 158) <https://reviews.apache.org/r/52492/#comment221324> I'd suggest renaming this to JobsClient for 2 reasons. 1. It might be something we refactor out to make it easier for users. 2. "Util" classes are like misc. buckets. They invite developers to lazily throw methods in that bucket rather than diligently planning them, naming them, and putting them where they belong. This almost always turns into a mess over time. samza-rest/src/main/java/org/apache/samza/monitor/YarnLocalStoreMonitor.java (line 179) <https://reviews.apache.org/r/52492/#comment221349> This seems wrong. The samza-rest service should not be using the RM port. This request should have nothing to do with YARN. samza-rest/src/main/java/org/apache/samza/monitor/YarnLocalStoreMonitor.java (line 192) <https://reviews.apache.org/r/52492/#comment221350> This seems wrong. The samza-rest service should not be using the RM port. This request should have nothing to do with YARN. samza-rest/src/main/java/org/apache/samza/monitor/YarnLocalStoreMonitor.java (line 206) <https://reviews.apache.org/r/52492/#comment221351> Why don't we just have a config "job status servers" that has a list of host:port to samza-rest servers that can provide job status? This would be analogous to kafka's bootstrap.servers config. Then we wouldn't need any Yarn dependency. samza-rest/src/main/java/org/apache/samza/monitor/YarnLocalStoreMonitorConfig.java (line 46) <https://reviews.apache.org/r/52492/#comment221352> This must not be larger than delete.retention.ms! Slightly lower is better. So if we're defaulting delete.retention.ms to 24 hrs, I'd set this to 23-23.5 Javadocs should also clearly warn of this association. - Jake Maes On Oct. 11, 2016, 12:53 a.m., Shanthoosh Venkataraman wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/52492/ > ----------------------------------------------------------- > > (Updated Oct. 11, 2016, 12:53 a.m.) > > > Review request for samza. > > > Repository: samza > > > Description > ------- > > This patch contains the samza-rest monitor that periodically cleans up the > stale local stores of dead jobs/tasks. It performs the store deletion in two > phases. Initially it deletes the offset file in the local task stores if the > following condition is true. ((jobIsNotRunning || preferedHost != nmHost) && > offsetFilelastModifiedTime is greater than deleteRetention). During the > subsequent run, it deletes the local task stores if it does not contain > offset file. Please refer to the design doc of SAMZA-656 > (https://issues.apache.org/jira/secure/attachment/12828083/DESIGN-SAMZA-656.pdf) > for more details. > > > Diffs > ----- > > build.gradle 2bea27b75288d3103178bc3762b9556f6e69cdd1 > > samza-rest/src/main/java/org/apache/samza/monitor/YarnLocalStoreMonitor.java > PRE-CREATION > > samza-rest/src/main/java/org/apache/samza/monitor/YarnLocalStoreMonitorConfig.java > PRE-CREATION > > samza-rest/src/main/java/org/apache/samza/monitor/YarnLocalStoreMonitorFactory.java > PRE-CREATION > samza-rest/src/main/java/org/apache/samza/rest/resources/JobsResource.java > a566db598c284d69ea61af88fdc0851483d5a089 > > samza-rest/src/main/java/org/apache/samza/rest/resources/ResourceConstants.java > PRE-CREATION > > samza-rest/src/test/java/org/apache/samza/monitor/TestYarnLocalStoreMonitor.java > PRE-CREATION > > Diff: https://reviews.apache.org/r/52492/diff/ > > > Testing > ------- > > Unit testing and manual testing are done to verify the functionality. > > > Thanks, > > Shanthoosh Venkataraman > >