[ https://issues.apache.org/jira/browse/NIFI-11557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Mark Payne updated NIFI-11557:
------------------------------
    Labels: content-repo content-repository performance slowness startup  (was: )

> Eliminate use of Files.walkFileTree for any performance-critical parts of application
> -------------------------------------------------------------------------------------
>
>                 Key: NIFI-11557
>                 URL: https://issues.apache.org/jira/browse/NIFI-11557
>             Project: Apache NiFi
>          Issue Type: Improvement
>          Components: Core Framework, Extensions
>            Reporter: Mark Payne
>            Assignee: Mark Payne
>            Priority: Major
>              Labels: content-repo, content-repository, performance, slowness, startup
>             Fix For: 1.latest, 2.latest
>
>
> The FileSystemRepository (content repo implementation) as well as ListFile
> both make use of the {{Files.walkFileTree}} method. Recently, I worked with a
> user who had horribly long startup times. Thread dumps showed that the time
> was spent almost entirely in the FileSystemRepository's
> {{initializeRepository}} method, which walks the file tree in order to
> determine which archive files can be cleaned up next. This is done during
> startup and again periodically in background threads.
> I made a small modification locally to instead use the standard synchronous
> IO methods ({{File.listFiles}}). I used GenerateFlowFile to generate
> 1-byte FlowFiles and set {{nifi.content.claim.max.appendable.size=1 B}} in
> nifi.properties in order to generate a huge number of files, about 1.2
> million in the content repository, and restarted a few times. I also added
> some log lines to show how long this part of the startup process took.
> With the existing code, startup took 210 seconds (3.5 mins). With the new
> implementation, it took 6.7 seconds. This appears to be due to the fact that
> NIO.2 does an individual disk access for every file to obtain its file
> attributes, while with the {{File.listFiles}} method the File objects that
> are returned already have the necessary attributes.
> As a result, the NIO.2 approach makes millions of disk accesses that are
> unnecessary. As the number of files in the repository grows, the
> discrepancy also grows.
> We need to eliminate any use of {{Files.walkFileTree}} for any
> performance-critical parts of the codebase.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
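
For illustration, the kind of traversal the description proposes could look roughly like the Java sketch below. It is not the actual NiFi patch: the class name ListFilesWalk and the helper collectFiles are hypothetical, and the real FileSystemRepository logic (deciding which archive files to clean up) is omitted. It only shows a recursive walk built on {{File.listFiles}} rather than {{Files.walkFileTree}}, with a simple timer comparable to the log lines mentioned in the description.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

public class ListFilesWalk {

    // Hypothetical helper, not NiFi code: recursively collects every
    // non-directory entry under 'dir' using File.listFiles instead of
    // Files.walkFileTree.
    public static void collectFiles(final File dir, final List<File> results) {
        final File[] children = dir.listFiles();
        if (children == null) {
            // listFiles returns null if 'dir' is not a directory
            // or if an I/O error occurs; there is nothing to list.
            return;
        }
        for (final File child : children) {
            if (child.isDirectory()) {
                collectFiles(child, results);
            } else {
                results.add(child);
            }
        }
    }

    public static void main(final String[] args) {
        // The directory to scan is an assumption: pass it as the first
        // argument, otherwise the current working directory is used.
        final File root = new File(args.length > 0 ? args[0] : ".");

        final long start = System.nanoTime();
        final List<File> files = new ArrayList<>();
        collectFiles(root, files);
        final long millis = (System.nanoTime() - start) / 1_000_000;

        System.out.println("Found " + files.size() + " files in " + millis + " ms");
    }
}
```

Running this against a copy of a large content repository directory, alongside an equivalent {{Files.walkFileTree}} run, would be one way to repeat the comparison. The timings will depend on the file system and the number of files, so the figures quoted in the description should not be expected to reproduce exactly.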