Joe Witt created NIFI-7222:
------------------------------

             Summary: FetchSFTP appears to not advise the remote system it is 
done with a given resource resulting in too many open files
                 Key: NIFI-7222
                 URL: https://issues.apache.org/jira/browse/NIFI-7222
             Project: Apache NiFi
          Issue Type: Bug
          Components: Extensions
            Reporter: Joe Witt
            Assignee: Joe Witt


Hi guys,

 

We have an issue with the FetchSFTP processor and the max open file 
descriptors. In short, it seems that the FetchSFTP keeps the file open 
“forever” on our Synology NAS, so we are reaching always the default max open 
files limit of 1024 from our Synlogy NAS if we try to fetch 500’000 small 1MB 
files (so in fact it’s not possible to read the files as everything is blocked 
after 1024 files).

 

We found no option to rise the limit of max open files on the Synology NAS (but 
that’s not NiFi’s fault 😉). We have also other linux machine with CentOS, but 
the behavior there isn’t exactly always the same. Sometimes the file 
descriptors get closed but sometimes as well not.

 

Synology has no lsof command, but this is how I’ve checked it:

user@nas-01:~$ sudo ls -l /proc/<SSHD SFTP process PID>/fd | wc -l

1024

 

Any comments how we can troubleshoot the issue?

 

Cheers Josef

Oh sorry, missed one of of the most important parts, we are using a 8-node 
cluster with nifi 1.11.3 – so perfectly up to date.

 

Cheers Josef

Hi Joe

 

Ok, to our setup, we just bought a new powerful Synology NAS to use it as SFTP 
server mainly for NiFi to replace our current SFTP linux machine. So the NAS is 
empty and just configured for this single use case (read/write SFTP from NiFi). 
Nothing else is running there at the moment. Important limit is per SSH/user 
session ulimit -a 1024 open files max.:

 

root@nas-01:~# ulimit -a

core file size          (blocks, -c) unlimited

data seg size           (kbytes, -d) unlimited

scheduling priority             (-e) 0

file size               (blocks, -f) unlimited

pending signals                 (-i) 62025

max locked memory       (kbytes, -l) 64

max memory size         (kbytes, -m) unlimited

open files                      (-n) 1024

pipe size            (512 bytes, -p) 8

POSIX message queues     (bytes, -q) 819200

real-time priority              (-r) 0

stack size              (kbytes, -s) 8192

cpu time               (seconds, -t) unlimited

max user processes              (-u) 62025

virtual memory          (kbytes, -v) unlimited

file locks                      (-x) unlimited

 

 

On NiFi side we are using an 8 node cluster, but it doesn’t matter whether I’m 
using the whole cluster or just one single (primary) node. It’s clearly visible 
that it’s related to the number of FetchSFTP processors running. So if I’m 
distributing the load to 8 nodes I’m seeing 8 SFTP sessions on the NAS and we 
can fetch  8x1024 files. I’m also seeing the file descriptors from each file 
(per FetchSFTP processor = PID) on the NAS which has been fetched by NiFi. In 
my understanding this files should be fetched and the file descriptor should be 
closed after the transfer, but this doesn’t seems to be the case in most of the 
times.

 

As soon as I’m stopping the “FetchSFTP” processor, the SFTP session seems to be 
closed and all FDs are gone. So after stop/start I can fetch again 1024 files.

 

So I tried to troubleshoot a bit further and here is what I’ve done in NiFi and 
on the NAS:

 

A screenshot of text

Description automatically generated

 

So I’ve done a ListSFTP and got 2880 flowfiles, they will be loadbalanced to 
one single node (to simplify to test and only get 1 SFTP session on the NAS). 
In the ControlRate I’m transferring every 10 seconds 10 flowfiles to the 
FetchSFTP, that corelates directly with the open file descriptors on my NAS, as 
you can see below. Sometimes, and I don’t know when or why, the SFTP session 
will be closed and everything starts from scratch (not happened here) without 
any notice on NiFi side.  As you see, the FDs are growing with +10 every 10sec 
and if I’m checking the path/filename of the open FDs I see that this are the 
one which I’ve fetched.
 

root@nas-01:~# ps aux | grep sftp

root      1740  0.5  0.0 240848  8584 ?        Ss   15:01   0:00 sshd: 
ldr@internal-sftp

root      1753  0.0  0.0  23144  2360 pts/2    S+   15:01   0:00 grep 
--color=auto sftp

root     15520  0.0  0.0 241088  9252 ?        Ss   13:38   0:02 sshd: 
ldr@internal-sftp

root@nas-01:~#

root@nas-01:~# ls -l /proc/1740/fd | wc -l    

24

root@nas-01:~# ls -l /proc/1740/fd | wc -l

34

root@nas-01:~# ls -l /proc/1740/fd | wc -l

44

root@nas-01:~# ls -l /proc/1740/fd | wc -l

54

root@nas-01:~# ls -l /proc/1740/fd | wc -l

64

 

root@p-li-nas-01:~# ls -l /proc/1740/fd | head

total 0

lr-x------  1 root root 64 Mar  4 15:01 0 -> pipe:[1086218]

l-wx------  1 root root 64 Mar  4 15:01 1 -> pipe:[1086219]

lr-x------+ 1 root root 64 Mar  4 15:01 10 -> 
/volume1/test/2019-08-31/detail-20190831-0104-92.log.gz

lr-x------+ 1 root root 64 Mar  4 15:03 100 -> 
/volume1/test/2019-08-31/detail-20190831-0052-91.log.gz

lr-x------+ 1 root root 64 Mar  4 15:03 101 -> 
/volume1/test/2019-08-31/detail-20190831-0340-92.log.gz

lr-x------+ 1 root root 64 Mar  4 15:03 102 -> 
/volume1/test/2019-08-31/detail-20190831-0246-91.log.gz

lr-x------+ 1 root root 64 Mar  4 15:03 103 -> 
/volume1/test/2019-08-31/detail-20190831-0104-91.log.gz

lr-x------+ 1 root root 64 Mar  4 15:03 104 -> 
/volume1/test/2019-08-31/detail-20190831-0150-91.log.gz

lr-x------+ 1 root root 64 Mar  4 15:03 105 -> 
/volume1/test/2019-08-31/detail-20190831-0013-91.log.gz

…

 

So to sum up, one FetchSFTP generates one SFTP Session on the NAS. The SFTP 
Session holds FDs which ,most of the time, doesn’t get closed. Reproduceable 
with the template above with FetchSFTP to a CentOS machine or a Synology NAS. 
Main question is now, why were the FDs not closed or when should the SFTP 
session gets closed.

 

Thanks

Just checked the open sockets on the NiFi machine where FetchSFTP is running, 
of course there is just one SSH session if I’m using just one single processor… 
So the SFTP transfer is hidden in the SSH session.

 

Open TCP sessions on NiFi
[user@ nifi-05 ~]$ netstat -vatn | grep x.y.z.232

tcp        0      0 x.y.z.144:33628     x.y.z.232:22        ESTABLISHED

 

Any comments are welcome. Still unclear where the open FDs on the NAS (SFTP 
server) are coming from or how it should work from NiFi perspective.

 

Cheers Josef



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to