Mark Payne created NIFI-11557:
---------------------------------

             Summary: Eliminate use of NIO.2 for any performance-critical parts of application
                 Key: NIFI-11557
                 URL: https://issues.apache.org/jira/browse/NIFI-11557
             Project: Apache NiFi
          Issue Type: Improvement
          Components: Core Framework, Extensions
            Reporter: Mark Payne
            Assignee: Mark Payne
             Fix For: 1.latest, 2.latest


The FileSystemRepository (content repo implementation) as well as ListFile both 
make use of the {{Files.walkFileTree}} method. Recently, I worked with a user 
who had horribly long startup times. Thread dumps showed that the time was 
spent almost entirely in the FileSystemRepository's {{initializeRepository}} 
method, which walks the file tree in order to determine which archive files 
can be cleaned up next. This walk is done during startup and again 
periodically in background threads.
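
For reference, a traversal built on {{Files.walkFileTree}} generally has the 
shape sketched below. This is only an illustration, not the actual 
FileSystemRepository code: the class name WalkFileTreeScan and the method 
countFiles are hypothetical, and the sketch just counts regular files. Note 
that the visitor is handed a BasicFileAttributes object for every file it 
visits.

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical illustration; not taken from the NiFi codebase.
final class WalkFileTreeScan {

    // Counts regular files under root using the NIO.2 file tree walker.
    static long countFiles(Path root) throws IOException {
        AtomicLong count = new AtomicLong();
        Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
                // attrs has been read for this file before the callback runs.
                if (attrs.isRegularFile()) {
                    count.incrementAndGet();
                }
                return FileVisitResult.CONTINUE;
            }
        });
        return count.get();
    }

    public static void main(String[] args) throws IOException {
        // Directory to scan; defaults to the working directory.
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        long start = System.nanoTime();
        long files = countFiles(root);
        long millis = (System.nanoTime() - start) / 1_000_000;
        System.out.println("walkFileTree visited " + files + " files in " + millis + " ms");
    }
}
```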

I made a small modification locally to instead use the standard synchronous IO 
methods (the {{File.listFiles}} method). I used GenerateFlowFile to generate 
1-byte FlowFiles and set {{nifi.content.claim.max.appendable.size=1 B}} in 
nifi.properties in order to generate a huge number of files (about 1.2 million 
files in the content repository), then restarted a few times. Additionally, I 
added some log lines to show how long this part of the startup process took.
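
A traversal along the lines of that local modification might look like the 
sketch below. The actual change is not included in this ticket, so everything 
here is an assumption: the class name ListFilesScan and the method countFiles 
are hypothetical, and the sketch only counts files and times the scan rather 
than deciding which archive files can be cleaned up.

```java
import java.io.File;
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical illustration; not the change that was made to NiFi.
final class ListFilesScan {

    // Counts non-directory entries under root using File.listFiles.
    static long countFiles(File root) {
        long count = 0;
        Deque<File> pending = new ArrayDeque<>();
        pending.push(root);
        while (!pending.isEmpty()) {
            File dir = pending.pop();
            File[] children = dir.listFiles();
            if (children == null) {
                // listFiles returns null if dir is not a directory
                // or if an I/O error occurs; skip it in either case.
                continue;
            }
            for (File child : children) {
                if (child.isDirectory()) {
                    pending.push(child);
                } else {
                    count++;
                }
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Directory to scan; defaults to the working directory.
        File root = new File(args.length > 0 ? args[0] : ".");
        long start = System.nanoTime();
        long files = countFiles(root);
        long millis = (System.nanoTime() - start) / 1_000_000;
        System.out.println("listFiles visited " + files + " files in " + millis + " ms");
    }
}
```

Timing both styles against the same directory is one way to check whether a 
given repository layout shows a similar gap; the result will depend on the 
file system and the number of files.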

With the existing code, startup took 210 seconds (3.5 mins). With the new 
implementation, it took 6.7 seconds. This appears to be because NIO.2 performs 
an individual disk access for every file in order to obtain its file 
attributes, whereas the File objects returned by the {{File.listFiles}} method 
already have the necessary attributes. As a result, the NIO.2 approach makes 
millions of unnecessary disk accesses. As the number of files in the 
repository grows, the discrepancy also grows.

We need to eliminate any use of {{Files.walkFileTree}} in any 
performance-critical parts of the codebase.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
