> On June 7, 2016, 1:44 a.m., Mike Percy wrote:
> > flume-ng-sources/flume-taildir-source/src/main/java/org/apache/flume/source/taildir/TaildirMatcher.java,
> >  line 161
> > <https://reviews.apache.org/r/48161/diff/1/?file=1404557#file1404557line161>
> >
> >     nit: spurious parenthesis before lastSeenParentDirMTime
> 
> Attila Simon wrote:
>     the condition was described in the javadoc, unfortunately it is ugly but 
> needed
> 
> Mike Percy wrote:
>     How about this?
>     
>       List<File> getMatchingFiles() {
>         long now = System.currentTimeMillis();
>         long currentParentDirMTime = parentDir.lastModified();
>         // Only check a maximum of once per second.
>         if (!cachePatternMatching ||
>             (currentParentDirMTime > lastSeenParentDirMTime &&
>              TimeUnit.SECONDS.toMillis(TimeUnit.MILLISECONDS.toSeconds(now)) 
> > lastCheckedTime)) {
>           lastMatchedFiles = getMatchingFilesNoCache();
>           Collections.sort(lastMatchedFiles, new 
> TailFile.CompareByLastModifiedTime());
>           lastSeenParentDirMTime = currentParentDirMTime;
>           lastCheckedTime = 
> TimeUnit.SECONDS.toMillis(TimeUnit.MILLISECONDS.toSeconds(now));
>         }
>         return lastMatchedFiles;
>       }
>     
>     Except that we should replace the sorting with a helper function that 
> only runs stat() once per item.

updated


> On June 7, 2016, 1:44 a.m., Mike Percy wrote:
> > flume-ng-sources/flume-taildir-source/src/main/java/org/apache/flume/source/taildir/TaildirMatcher.java,
> >  line 164
> > <https://reviews.apache.org/r/48161/diff/1/?file=1404557#file1404557line164>
> >
> >     This sorting function seems problematic to me. It can call stat() up to 
> > n^2 times (assuming quicksort). Shouldn't we get the last modified time of 
> > each file in the list once and then sort it? Why are we sorting, anyway?
> >     
> >     By the way, I just checked the OpenJDK source code and indeed every 
> > File.lastModified() maps to FileSystem.getLastModifiedTime(f) which maps to 
> > stat(2). See 
> > https://github.com/openjdk-mirror/jdk7u-jdk/blob/master/src/solaris/native/java/io/UnixFileSystem_md.c#L205
> 
> Attila Simon wrote:
>     Sorting was part of the original implementation (my change didn't make it 
> worse). I wanted to be non-intrusive.
> 
> Mike Percy wrote:
>     If you want to maintain the sorting behavior (and I still don't know how 
> it is used or why it is required) then please do the stat() call on each of 
> the files in the list, and cache the mtime of each file, then sort the files 
> based on those cached mtimes. It just doesn't make any sense to do IO to read 
> the metadata O(n^2) times.

After a brief look I found a place: TestTaildirSource.testFileConsumeOrder() 
junit test breaks without sorting (Vague guessing: I think it was required to 
maintain some level of false causality between independent events). Anyway I 
voted for the improved sort.


> On June 7, 2016, 1:44 a.m., Mike Percy wrote:
> > flume-ng-sources/flume-taildir-source/src/test/java/org/apache/flume/source/taildir/TestTaildirMatcher.java,
> >  line 49
> > <https://reviews.apache.org/r/48161/diff/1/?file=1404560#file1404560line49>
> >
> >     style nit: leave spaces around your brackets. We use the java style 
> > guide in Flume (admittedly not as consistently as I wish we did, though)
> 
> Attila Simon wrote:
>     Related to coding style nits: is there any importable flume related style 
> configuration for intellij somewhere I haven't found already? If yes could 
> you please point me to its direction?
> 
> Mike Percy wrote:
>     Nope, but a checkstyle file would be welcome that enforced Java style 
> with 2-space indent.

Cool, Donat gonna have a contribution opportunity ;)


- Attila


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/48161/#review136086
-----------------------------------------------------------


On June 13, 2016, 2:14 p.m., Attila Simon wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/48161/
> -----------------------------------------------------------
> 
> (Updated June 13, 2016, 2:14 p.m.)
> 
> 
> Review request for Flume.
> 
> 
> Bugs: FLUME-2918
>     https://issues.apache.org/jira/browse/FLUME-2918
> 
> 
> Repository: flume-git
> 
> 
> Description
> -------
> 
> The way TailDir source checks which files should be tracked was improved. 
> Existing implementation caused unneccessary high CPU usage for huge (+50K 
> files) directories. This fix allows users to eliminate continous listing of 
> parent directory (on each Source.process invocation) and introduce a more 
> performant method for listing&matching files.
> 
> used java.nio.file.DirectoryStream to filter files
> made pattern match calculation optionally cached
> added junit tests
> added javadoc
> added license
> 
> 
> Diffs
> -----
> 
>   
> flume-ng-sources/flume-taildir-source/src/main/java/org/apache/flume/source/taildir/ReliableTaildirEventReader.java
>  5b6d465 
>   
> flume-ng-sources/flume-taildir-source/src/main/java/org/apache/flume/source/taildir/TaildirMatcher.java
>  PRE-CREATION 
>   
> flume-ng-sources/flume-taildir-source/src/main/java/org/apache/flume/source/taildir/TaildirSource.java
>  8816327 
>   
> flume-ng-sources/flume-taildir-source/src/main/java/org/apache/flume/source/taildir/TaildirSourceConfigurationConstants.java
>  6165276 
>   
> flume-ng-sources/flume-taildir-source/src/test/java/org/apache/flume/source/taildir/TestTaildirMatcher.java
>  PRE-CREATION 
>   
> flume-ng-sources/flume-taildir-source/src/test/java/org/apache/flume/source/taildir/TestTaildirSource.java
>  f9e614c 
> 
> Diff: https://reviews.apache.org/r/48161/diff/
> 
> 
> Testing
> -------
> 
> mvn clean install -DskipTests -> built
> junit tests for flume-taildir-source module -> passed
> 
> 
> Thanks,
> 
> Attila Simon
> 
>

Reply via email to