[ 
https://issues.apache.org/jira/browse/NIFI-7222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pierre Villard resolved NIFI-7222.
----------------------------------
    Resolution: Fixed

> FetchSFTP appears to not advise the remote system it is done with a given 
> resource resulting in too many open files
> -------------------------------------------------------------------------------------------------------------------
>
>                 Key: NIFI-7222
>                 URL: https://issues.apache.org/jira/browse/NIFI-7222
>             Project: Apache NiFi
>          Issue Type: Bug
>          Components: Extensions
>            Reporter: Joe Witt
>            Assignee: Joe Witt
>            Priority: Major
>             Fix For: 1.12.0
>
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Hi guys,
>  
> We have an issue with the FetchSFTP processor and the max open file 
> descriptors. In short, it seems that FetchSFTP keeps the file open 
> “forever” on our Synology NAS, so we always hit the NAS's default max 
> open files limit of 1024 when we try to fetch 500’000 small 1 MB files 
> (so in fact it’s not possible to read the files, as everything is 
> blocked after 1024 files).
>  
> We found no option to raise the max open files limit on the Synology NAS 
> (but that’s not NiFi’s fault 😉). We also have another Linux machine with 
> CentOS, but the behavior there isn’t always exactly the same. Sometimes 
> the file descriptors get closed, but sometimes not.
>  
> Synology has no lsof command, but this is how I’ve checked it:
> user@nas-01:~$ sudo ls -l /proc/<SSHD SFTP process PID>/fd | wc -l
> 1024
>  
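A note on the check above: `ls -l | wc -l` also counts the "total 0" header line that `ls -l` prints, so a reading of 1024 means 1023 actual entries. A small helper that avoids the off-by-one (a sketch; works on any Linux host with /proc, which the Synology has):

```shell
# Count the open file descriptors of a process via /proc.
# Plain `ls` (no -l) avoids counting the "total 0" header line.
fd_count() {
  ls "/proc/$1/fd" 2>/dev/null | wc -l
}

fd_count $$   # e.g. the current shell; use the sshd/internal-sftp PID on the NAS
```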
> Any comments on how we can troubleshoot the issue?
>  
> Cheers Josef
> Oh sorry, I missed one of the most important parts: we are using an 
> 8-node cluster with NiFi 1.11.3 – so perfectly up to date.
>  
> Cheers Josef
> Hi Joe
>  
> OK, about our setup: we just bought a new powerful Synology NAS to use as 
> an SFTP server, mainly for NiFi, to replace our current Linux SFTP 
> machine. So the NAS is empty and configured just for this single use case 
> (read/write SFTP from NiFi). Nothing else is running there at the moment. 
> The important limit, per SSH/user session, is the max of 1024 open files 
> shown by ulimit -a:
>  
> root@nas-01:~# ulimit -a
> core file size          (blocks, -c) unlimited
> data seg size           (kbytes, -d) unlimited
> scheduling priority             (-e) 0
> file size               (blocks, -f) unlimited
> pending signals                 (-i) 62025
> max locked memory       (kbytes, -l) 64
> max memory size         (kbytes, -m) unlimited
> open files                      (-n) 1024
> pipe size            (512 bytes, -p) 8
> POSIX message queues     (bytes, -q) 819200
> real-time priority              (-r) 0
> stack size              (kbytes, -s) 8192
> cpu time               (seconds, -t) unlimited
> max user processes              (-u) 62025
> virtual memory          (kbytes, -v) unlimited
> file locks                      (-x) unlimited
>  
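For reference, on a regular Linux SFTP server (not the Synology, where no such knob was found) the per-session limit can usually be raised as a workaround; a hedged sketch, assuming root access and that sshd applies PAM limits:

```shell
# Inspect the current soft and hard limits for open files.
ulimit -Sn   # soft limit (1024 here)
ulimit -Hn   # hard limit

# Raise the soft limit for this session, up to the hard limit.
# (Guarded so it fails harmlessly if the hard limit is lower.)
ulimit -n 4096 2>/dev/null || true

# For persistence across SSH sessions, entries like these in
# /etc/security/limits.conf are the usual approach (assumption:
# sshd uses PAM with pam_limits):
#   *  soft  nofile  65536
#   *  hard  nofile  65536
```

That said, raising the limit only delays the problem if the FDs are genuinely leaked.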
>  
> On the NiFi side we are using an 8-node cluster, but it doesn’t matter 
> whether I’m using the whole cluster or just one single (primary) node. 
> It’s clearly visible that it’s related to the number of FetchSFTP 
> processors running. So if I’m distributing the load to 8 nodes I’m seeing 
> 8 SFTP sessions on the NAS and we can fetch 8x1024 files. I’m also seeing 
> the file descriptor of each file fetched by NiFi (per FetchSFTP 
> processor = PID) on the NAS. In my understanding, each file should be 
> fetched and its file descriptor closed after the transfer, but most of 
> the time this doesn’t seem to be the case.
>  
> As soon as I stop the FetchSFTP processor, the SFTP session seems to be 
> closed and all FDs are gone. So after a stop/start I can fetch another 
> 1024 files.
>  
> So I tried to troubleshoot a bit further and here is what I’ve done in NiFi 
> and on the NAS:
>  
> [screenshot of the NiFi flow]
>  
> So I’ve done a ListSFTP and got 2880 flowfiles; they get load-balanced to 
> one single node (to simplify the test and only get 1 SFTP session on the 
> NAS). In the ControlRate I’m transferring 10 flowfiles every 10 seconds 
> to the FetchSFTP, and that correlates directly with the open file 
> descriptors on my NAS, as you can see below. Sometimes, and I don’t know 
> when or why, the SFTP session gets closed and everything starts from 
> scratch (it didn’t happen here) without any notice on the NiFi side. As 
> you see, the FDs are growing by +10 every 10 seconds, and if I check the 
> path/filename of the open FDs I see that these are the ones I’ve fetched.
>  
> root@nas-01:~# ps aux | grep sftp
> root      1740  0.5  0.0 240848  8584 ?        Ss   15:01   0:00 sshd: 
> ldr@internal-sftp
> root      1753  0.0  0.0  23144  2360 pts/2    S+   15:01   0:00 grep 
> --color=auto sftp
> root     15520  0.0  0.0 241088  9252 ?        Ss   13:38   0:02 sshd: 
> ldr@internal-sftp
> root@nas-01:~#
> root@nas-01:~# ls -l /proc/1740/fd | wc -l    
> 24
> root@nas-01:~# ls -l /proc/1740/fd | wc -l
> 34
> root@nas-01:~# ls -l /proc/1740/fd | wc -l
> 44
> root@nas-01:~# ls -l /proc/1740/fd | wc -l
> 54
> root@nas-01:~# ls -l /proc/1740/fd | wc -l
> 64
>  
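The repeated manual checks above can be automated into a timestamped log, which makes the +10-every-10-seconds pattern easy to correlate with the ControlRate settings; a small sketch (PID 1740 is the sshd/internal-sftp process from the `ps` output):

```shell
# fd_watch PID SAMPLES INTERVAL: print a timestamped fd count per sample.
fd_watch() {
  pid=$1 samples=$2 interval=$3
  i=0
  while [ "$i" -lt "$samples" ]; do
    printf '%s  %s open fds\n' "$(date +%T)" \
      "$(ls "/proc/$pid/fd" 2>/dev/null | wc -l)"
    sleep "$interval"
    i=$((i + 1))
  done
}

# e.g. watch the SFTP session in 10-second steps for 5 minutes:
# fd_watch 1740 30 10
```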
> root@p-li-nas-01:~# ls -l /proc/1740/fd | head
> total 0
> lr-x------  1 root root 64 Mar  4 15:01 0 -> pipe:[1086218]
> l-wx------  1 root root 64 Mar  4 15:01 1 -> pipe:[1086219]
> lr-x------+ 1 root root 64 Mar  4 15:01 10 -> 
> /volume1/test/2019-08-31/detail-20190831-0104-92.log.gz
> lr-x------+ 1 root root 64 Mar  4 15:03 100 -> 
> /volume1/test/2019-08-31/detail-20190831-0052-91.log.gz
> lr-x------+ 1 root root 64 Mar  4 15:03 101 -> 
> /volume1/test/2019-08-31/detail-20190831-0340-92.log.gz
> lr-x------+ 1 root root 64 Mar  4 15:03 102 -> 
> /volume1/test/2019-08-31/detail-20190831-0246-91.log.gz
> lr-x------+ 1 root root 64 Mar  4 15:03 103 -> 
> /volume1/test/2019-08-31/detail-20190831-0104-91.log.gz
> lr-x------+ 1 root root 64 Mar  4 15:03 104 -> 
> /volume1/test/2019-08-31/detail-20190831-0150-91.log.gz
> lr-x------+ 1 root root 64 Mar  4 15:03 105 -> 
> /volume1/test/2019-08-31/detail-20190831-0013-91.log.gz
> …
>  
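To see exactly which files a session still holds open, the fd symlinks can be resolved directly instead of eyeballing the `ls -l` output; a small sketch (the grep pattern assumes the NAS share lives under /volume1 as above):

```shell
# Print the target path of every fd a process holds open.
open_paths() {
  for fd in /proc/"$1"/fd/*; do
    readlink "$fd" 2>/dev/null
  done
}

# e.g. which fetched files the SFTP session (PID 1740 above) still holds:
# open_paths 1740 | grep '^/volume1/' | sort | uniq -c
```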
> So to sum up: one FetchSFTP generates one SFTP session on the NAS. The 
> SFTP session holds FDs which, most of the time, don’t get closed. 
> Reproducible with the template above, with FetchSFTP against a CentOS 
> machine or a Synology NAS. The main question now is why the FDs are not 
> closed, or when the SFTP session should get closed.
>  
> Thanks
> Just checked the open sockets on the NiFi machine where FetchSFTP is 
> running; of course there is just one SSH session if I’m using just one 
> single processor… So the SFTP transfers are tunneled inside the SSH 
> session.
>  
> Open TCP sessions on NiFi
> [user@nifi-05 ~]$ netstat -vatn | grep x.y.z.232
> tcp        0      0 x.y.z.144:33628     x.y.z.232:22        ESTABLISHED
>  
> Any comments are welcome. It is still unclear where the open FDs on the 
> NAS (SFTP server) are coming from, or how it should work from a NiFi 
> perspective.
>  
> Cheers Josef



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
