[ https://issues.apache.org/jira/browse/NIFI-7222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17051808#comment-17051808 ]
Joe Witt commented on NIFI-7222:
--------------------------------

Since this issue has now been reported three times, with a different impact
in each case, I am merging the reports into a single issue. I plan to resolve
it here and ensure the fix lands in 1.12.0. It possibly warrants a 1.11.4
release, but we'll see.

> FetchSFTP appears to not advise the remote system it is done with a given
> resource resulting in too many open files
> -------------------------------------------------------------------------------------------------------------------
>
>                 Key: NIFI-7222
>                 URL: https://issues.apache.org/jira/browse/NIFI-7222
>             Project: Apache NiFi
>          Issue Type: Bug
>          Components: Extensions
>            Reporter: Joe Witt
>            Assignee: Joe Witt
>            Priority: Major
>             Fix For: 1.12.0
>
>
> Hi guys,
>
> We have an issue with the FetchSFTP processor and the maximum number of open
> file descriptors. In short, it seems that FetchSFTP keeps each file open
> “forever” on our Synology NAS, so we always hit the NAS's default limit of
> 1024 open files when we try to fetch 500’000 small 1 MB files (in fact it is
> not possible to read the files, as everything is blocked after 1024 files).
>
> We found no option to raise the limit of max open files on the Synology NAS
> (but that’s not NiFi’s fault 😉). We also have another Linux machine running
> CentOS, but the behavior there is not always the same: sometimes the file
> descriptors get closed, sometimes not.
>
> Synology has no lsof command, but this is how I’ve checked it:
> user@nas-01:~$ sudo ls -l /proc/<SSHD SFTP process PID>/fd | wc -l
> 1024
>
> Any comments on how we can troubleshoot the issue?
>
> Cheers Josef
>
> Oh sorry, I missed one of the most important parts: we are using an 8-node
> cluster with NiFi 1.11.3, so perfectly up to date.
>
> Cheers Josef
>
> Hi Joe
>
> OK, regarding our setup: we just bought a new, powerful Synology NAS to use
> as an SFTP server, mainly for NiFi, to replace our current SFTP Linux machine.
> So the NAS is empty and configured only for this single use case (read/write
> SFTP from NiFi). Nothing else is running there at the moment. The important
> limit is per SSH/user session: ulimit -a shows a maximum of 1024 open files:
>
> root@nas-01:~# ulimit -a
> core file size          (blocks, -c) unlimited
> data seg size           (kbytes, -d) unlimited
> scheduling priority             (-e) 0
> file size               (blocks, -f) unlimited
> pending signals                 (-i) 62025
> max locked memory       (kbytes, -l) 64
> max memory size         (kbytes, -m) unlimited
> open files                      (-n) 1024
> pipe size            (512 bytes, -p) 8
> POSIX message queues     (bytes, -q) 819200
> real-time priority              (-r) 0
> stack size              (kbytes, -s) 8192
> cpu time               (seconds, -t) unlimited
> max user processes              (-u) 62025
> virtual memory          (kbytes, -v) unlimited
> file locks                      (-x) unlimited
>
>
> On the NiFi side we are using an 8-node cluster, but it doesn’t matter
> whether I use the whole cluster or just one single (primary) node. It is
> clearly visible that the problem is related to the number of FetchSFTP
> processors running. If I distribute the load across 8 nodes, I see 8 SFTP
> sessions on the NAS and we can fetch 8x1024 files. On the NAS I also see a
> file descriptor for each file that has been fetched by NiFi (one sshd process,
> i.e. one PID, per FetchSFTP processor). In my understanding these files should
> be fetched and the file descriptor closed after the transfer, but most of the
> time this doesn’t seem to be the case.
>
> As soon as I stop the “FetchSFTP” processor, the SFTP session seems to be
> closed and all FDs are gone. So after a stop/start I can fetch another 1024
> files.
>
> So I tried to troubleshoot a bit further, and here is what I’ve done in NiFi
> and on the NAS:
>
> [Screenshot not preserved]
>
> I ran a ListSFTP and got 2880 flowfiles, which are load-balanced to one
> single node (to simplify the test and get only 1 SFTP session on the NAS).
> In the ControlRate I’m transferring 10 flowfiles every 10 seconds to the
> FetchSFTP, which correlates directly with the open file descriptors on my
> NAS, as you can see below. Sometimes, and I don’t know when or why, the SFTP
> session gets closed and everything starts from scratch (that did not happen
> here) without any notice on the NiFi side. As you can see, the FDs grow by
> 10 every 10 seconds, and when I check the path/filename of the open FDs I see
> that these are the files I’ve fetched.
>
> root@nas-01:~# ps aux | grep sftp
> root      1740  0.5  0.0 240848  8584 ?        Ss   15:01   0:00 sshd: ldr@internal-sftp
> root      1753  0.0  0.0  23144  2360 pts/2    S+   15:01   0:00 grep --color=auto sftp
> root     15520  0.0  0.0 241088  9252 ?        Ss   13:38   0:02 sshd: ldr@internal-sftp
> root@nas-01:~#
> root@nas-01:~# ls -l /proc/1740/fd | wc -l
> 24
> root@nas-01:~# ls -l /proc/1740/fd | wc -l
> 34
> root@nas-01:~# ls -l /proc/1740/fd | wc -l
> 44
> root@nas-01:~# ls -l /proc/1740/fd | wc -l
> 54
> root@nas-01:~# ls -l /proc/1740/fd | wc -l
> 64
>
> root@p-li-nas-01:~# ls -l /proc/1740/fd | head
> total 0
> lr-x------  1 root root 64 Mar  4 15:01 0 -> pipe:[1086218]
> l-wx------  1 root root 64 Mar  4 15:01 1 -> pipe:[1086219]
> lr-x------+ 1 root root 64 Mar  4 15:01 10 -> /volume1/test/2019-08-31/detail-20190831-0104-92.log.gz
> lr-x------+ 1 root root 64 Mar  4 15:03 100 -> /volume1/test/2019-08-31/detail-20190831-0052-91.log.gz
> lr-x------+ 1 root root 64 Mar  4 15:03 101 -> /volume1/test/2019-08-31/detail-20190831-0340-92.log.gz
> lr-x------+ 1 root root 64 Mar  4 15:03 102 -> /volume1/test/2019-08-31/detail-20190831-0246-91.log.gz
> lr-x------+ 1 root root 64 Mar  4 15:03 103 -> /volume1/test/2019-08-31/detail-20190831-0104-91.log.gz
> lr-x------+ 1 root root 64 Mar  4 15:03 104 -> /volume1/test/2019-08-31/detail-20190831-0150-91.log.gz
> lr-x------+ 1 root root 64 Mar  4 15:03 105 -> /volume1/test/2019-08-31/detail-20190831-0013-91.log.gz
> …
>
> So to sum up, one FetchSFTP generates
> one SFTP session on the NAS. The SFTP session holds FDs which, most of the
> time, do not get closed. This is reproducible with the template above, with
> FetchSFTP against a CentOS machine or a Synology NAS. The main question now
> is why the FDs are not closed, or when the SFTP session should get closed.
>
> Thanks
>
> Just checked the open sockets on the NiFi machine where FetchSFTP is running.
> Of course there is just one SSH session if I’m using just one single
> processor… So the SFTP transfer is hidden inside the SSH session.
>
> Open TCP sessions on NiFi
> [user@ nifi-05 ~]$ netstat -vatn | grep x.y.z.232
> tcp        0      0 x.y.z.144:33628      x.y.z.232:22      ESTABLISHED
>
> Any comments are welcome. It is still unclear where the open FDs on the NAS
> (SFTP server) are coming from, or how this should work from the NiFi
> perspective.
>
> Cheers Josef



--
This message was sent by Atlassian Jira
(v8.3.4#803005)