[ 
https://issues.apache.org/jira/browse/NIFI-11557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Payne updated NIFI-11557:
------------------------------
    Labels: content-repo content-repository performance slowness startup  (was: 
)

> Eliminate use of Files.walkFileTree for any performance-critical parts of 
> application
> -------------------------------------------------------------------------------------
>
>                 Key: NIFI-11557
>                 URL: https://issues.apache.org/jira/browse/NIFI-11557
>             Project: Apache NiFi
>          Issue Type: Improvement
>          Components: Core Framework, Extensions
>            Reporter: Mark Payne
>            Assignee: Mark Payne
>            Priority: Major
>              Labels: content-repo, content-repository, performance, slowness, 
> startup
>             Fix For: 1.latest, 2.latest
>
>
> The FileSystemRepository (content repo implementation) as well as ListFile 
> both make use of the {{Files.walkFileTree}} method. Recently, I worked with a 
> user who had horribly long startup times. Thread dumps show that the time was 
> almost entirely in the FileSystemRepository's {{initializeRepository}} method 
> as it is walking the file tree in order to determine which archive files can 
> be cleaned up next. This is done during startup and again periodically in 
> background threads.
> I made a small modification locally to instead use the standard synchronous 
> IO methods ( {{File.listFiles}} method. I used GenerateFlowFile to generate 
> 1-byte FlowFiles and set  {{nifi.content.claim.max.appendable.size=1 B}} in 
> nifi.properties in order to generate a huge number of files - about 1.2 
> million files in the content repository and restarted a few times. 
> Additionally, added some log lines to show how long this part of the startup 
> process took.
> With the existing code, startup took 210 seconds (3.5 mins). With the new 
> implementation, it took 6.7 seconds. The appears to be due to the fact that 
> when using NIO.2 for every file, it does an individual disk access to obtain 
> File attributes, while when using the {{File.listFiles}} method the File 
> objects that are returned already have the necessary attributes. As a result, 
> the NIO.2 approach makes millions of disk accesses that are unnecessary. As 
> the number of files in the repository grows, the discrepancy also grows.
> We need to eliminate any use of {{File.walkFileTree}} for any 
> performance-critical parts of the codebase.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to