> On June 7, 2016, 1:44 a.m., Mike Percy wrote: > > flume-ng-sources/flume-taildir-source/src/main/java/org/apache/flume/source/taildir/TaildirMatcher.java, > > line 161 > > <https://reviews.apache.org/r/48161/diff/1/?file=1404557#file1404557line161> > > > > nit: spurious parenthesis before lastSeenParentDirMTime > > Attila Simon wrote: > the condition was described in the javadoc, unfortunately it is ugly but > needed > > Mike Percy wrote: > How about this? > > List<File> getMatchingFiles() { > long now = System.currentTimeMillis(); > long currentParentDirMTime = parentDir.lastModified(); > // Only check a maximum of once per second. > if (!cachePatternMatching || > (currentParentDirMTime > lastSeenParentDirMTime && > TimeUnit.SECONDS.toMillis(TimeUnit.MILLISECONDS.toSeconds(now)) > > lastCheckedTime)) { > lastMatchedFiles = getMatchingFilesNoCache(); > Collections.sort(lastMatchedFiles, new > TailFile.CompareByLastModifiedTime()); > lastSeenParentDirMTime = currentParentDirMTime; > lastCheckedTime = > TimeUnit.SECONDS.toMillis(TimeUnit.MILLISECONDS.toSeconds(now)); > } > return lastMatchedFiles; > } > > Except that we should replace the sorting with a helper function that > only runs stat() once per item.
updated > On June 7, 2016, 1:44 a.m., Mike Percy wrote: > > flume-ng-sources/flume-taildir-source/src/main/java/org/apache/flume/source/taildir/TaildirMatcher.java, > > line 164 > > <https://reviews.apache.org/r/48161/diff/1/?file=1404557#file1404557line164> > > > > This sorting function seems problematic to me. It can call stat() up to > > n^2 times (assuming quicksort). Shouldn't we get the last modified time of > > each file in the list once and then sort it? Why are we sorting, anyway? > > > > By the way, I just checked the OpenJDK source code and indeed every > > File.lastModified() maps to FileSystem.getLastModifiedTime(f) which maps to > > stat(2). See > > https://github.com/openjdk-mirror/jdk7u-jdk/blob/master/src/solaris/native/java/io/UnixFileSystem_md.c#L205 > > Attila Simon wrote: > Sorting was part of the original implementation (my change didn't make it > worse). I wanted to be non-intrusive. > > Mike Percy wrote: > If you want to maintain the sorting behavior (and I still don't know how > it is used or why it is required) then please do the stat() call on each of > the files in the list, and cache the mtime of each file, then sort the files > based on those cached mtimes. It just doesn't make any sense to do IO to read > the metadata O(n^2) times. After a brief look I found a place: TestTaildirSource.testFileConsumeOrder() junit test breaks without sorting (Vague guessing: I think it was required to maintain some level of false causality between independent events). Anyway I voted for the improved sort. > On June 7, 2016, 1:44 a.m., Mike Percy wrote: > > flume-ng-sources/flume-taildir-source/src/test/java/org/apache/flume/source/taildir/TestTaildirMatcher.java, > > line 49 > > <https://reviews.apache.org/r/48161/diff/1/?file=1404560#file1404560line49> > > > > style nit: leave spaces around your brackets. We use the java style > > guide in Flume (admittedly not as consistently as I wish we did, though) > > Attila Simon wrote: > Related to coding style nits: is there any importable flume related style > configuration for intellij somewhere I haven't found already? If yes could > you please point me to its direction? > > Mike Percy wrote: > Nope, but a checkstyle file would be welcome that enforced Java style > with 2-space indent. Cool, Donat gonna have a contribution opportunity ;) - Attila ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/48161/#review136086 ----------------------------------------------------------- On June 13, 2016, 2:14 p.m., Attila Simon wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/48161/ > ----------------------------------------------------------- > > (Updated June 13, 2016, 2:14 p.m.) > > > Review request for Flume. > > > Bugs: FLUME-2918 > https://issues.apache.org/jira/browse/FLUME-2918 > > > Repository: flume-git > > > Description > ------- > > The way TailDir source checks which files should be tracked was improved. > Existing implementation caused unneccessary high CPU usage for huge (+50K > files) directories. This fix allows users to eliminate continous listing of > parent directory (on each Source.process invocation) and introduce a more > performant method for listing&matching files. > > used java.nio.file.DirectoryStream to filter files > made pattern match calculation optionally cached > added junit tests > added javadoc > added license > > > Diffs > ----- > > > flume-ng-sources/flume-taildir-source/src/main/java/org/apache/flume/source/taildir/ReliableTaildirEventReader.java > 5b6d465 > > flume-ng-sources/flume-taildir-source/src/main/java/org/apache/flume/source/taildir/TaildirMatcher.java > PRE-CREATION > > flume-ng-sources/flume-taildir-source/src/main/java/org/apache/flume/source/taildir/TaildirSource.java > 8816327 > > flume-ng-sources/flume-taildir-source/src/main/java/org/apache/flume/source/taildir/TaildirSourceConfigurationConstants.java > 6165276 > > flume-ng-sources/flume-taildir-source/src/test/java/org/apache/flume/source/taildir/TestTaildirMatcher.java > PRE-CREATION > > flume-ng-sources/flume-taildir-source/src/test/java/org/apache/flume/source/taildir/TestTaildirSource.java > f9e614c > > Diff: https://reviews.apache.org/r/48161/diff/ > > > Testing > ------- > > mvn clean install -DskipTests -> built > junit tests for flume-taildir-source module -> passed > > > Thanks, > > Attila Simon > >
