[jira] [Commented] (NIFI-6790) Load-balanced queues cause huge number of open pipes
[ https://issues.apache.org/jira/browse/NIFI-6790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17050363#comment-17050363 ]

Harald Dobbernack commented on NIFI-6790:

Thank you Shawn for the good news!! I found the Jira Number https://issues.apache.org/jira/browse/NIFI-7114

> Load-balanced queues cause huge number of open pipes
>
> Key: NIFI-6790
> URL: https://issues.apache.org/jira/browse/NIFI-6790
> Project: Apache NiFi
> Issue Type: Bug
> Components: Core Framework
> Affects Versions: 1.9.2
> Environment: Ubuntu,OpenJDK8
> Reporter: Matthew Knight
> Priority: Critical
> Labels: Cluster, LoadBalancing
> Attachments: flow_exception.png, lsof_loadbalanced.png, lsof_notloadbalanced.png
>
> Original Estimate: 672h
> Remaining Estimate: 672h
>
> Even very basic flows using load balance connecting queues result in huge
> numbers of open pipes which don't seem to get closed and result in the entire
> cluster being brought to a standstill because of "too many open files". I
> tried apples-to-apples with equivalent very basic flows, one load balanced
> and one not, to show how easy this has been to reproduce. I'm attaching some
> screenshots that show the "too many open files" issue in action in my NiFi
> cluster, along with the output of LSOF for the primary node of a cluster with
> vs. without load balancing. When load balancing is enabled there is a huge
> number of 'pipe' and 'eventpoll' items, without load balancing things are a
> bit more balanced.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Commented] (NIFI-6790) Load-balanced queues cause huge number of open pipes
[ https://issues.apache.org/jira/browse/NIFI-6790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17050353#comment-17050353 ]

Shawn Weeks commented on NIFI-6790:

This was fixed in 1.11.3. I don’t have the other Jira issue number handy though.
[jira] [Commented] (NIFI-6790) Load-balanced queues cause huge number of open pipes
[ https://issues.apache.org/jira/browse/NIFI-6790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17050322#comment-17050322 ]

Harald Dobbernack commented on NIFI-6790:

We (NiFi beginners) have been experiencing 'Too many open files' problems on a one-node development system (NiFi 1.11.1) with many processors, including some Wait processors that rely on notification via DistributedCache. The cache is fed by a Notify processor, which in turn gets fed by ListFile processors that check every 30 sec, with yield and penalty set to 30 sec. Even if the only active flows are in the wait connection of the Wait processor and nothing else is happening, the open pipes accumulate over time. In our use case the waiting flows can easily be waiting a few days. From Friday evening (starting with about 4000 open files for the nifi user) until Monday morning, with only three active flows, we managed to surpass 5 open files. As a workaround we will now restart the service regularly, but I am hoping a solution can be found.

As a side note: we had to put 'export FD_MAX=5' in nifi-env.sh, because the settings for _/etc/security/limits.conf_ described under Maximum File Handles in https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html did not suffice to raise the max open file limit of NiFi above 4096.
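When a limits.conf change does not seem to take effect, it can help to check which limit the running process actually has. The following is a minimal sketch, assuming a Linux host where `/proc/<pid>/limits` is readable; the function names are hypothetical and not part of NiFi.

```python
from typing import Optional, Tuple


def parse_max_open_files(limits_text: str) -> Optional[Tuple[str, str]]:
    """Return (soft, hard) from the 'Max open files' row of /proc/<pid>/limits.

    Values are returned as strings because the kernel may report 'unlimited'.
    """
    prefix = "Max open files"
    for line in limits_text.splitlines():
        if line.startswith(prefix):
            fields = line[len(prefix):].split()
            if len(fields) >= 2:
                return fields[0], fields[1]
    return None


def read_max_open_files(pid: int) -> Optional[Tuple[str, str]]:
    """Read the effective open-file limits of a running process (Linux only)."""
    with open(f"/proc/{pid}/limits") as handle:
        return parse_max_open_files(handle.read())
```

Calling `read_max_open_files(<nifi pid>)` after a restart shows whether the soft and hard limits the JVM really runs with match what was configured.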
[jira] [Commented] (NIFI-6790) Load-balanced queues cause huge number of open pipes
[ https://issues.apache.org/jira/browse/NIFI-6790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16954610#comment-16954610 ]

Matthew Knight commented on NIFI-6790:

It was after NIFI-6736 got merged; I was trying to see if that fixed it, but leaky pipes were still a problem.

nifi.cluster.load.balance.connections.per.node=10
nifi.cluster.load.balance.max.thread.count=10
nifi.cluster.load.balance.comms.timeout=30 sec
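For a sense of scale, here is a back-of-the-envelope sketch of how these settings could interact with cluster size. It assumes that `nifi.cluster.load.balance.connections.per.node` caps the connections from one node to each other node; that reading, the helper names, and the 100-node figure (taken from the screenshot mentioned elsewhere in this thread) are assumptions, and the result says nothing about how many pipes or file handles each connection uses.

```python
def parse_properties(text):
    """Parse simple key=value lines, skipping blanks and '#' comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def max_outbound_connections(props, cluster_size):
    """Upper bound on load-balance connections opened by one node.

    Assumption: the per-node setting applies to each of the other
    (cluster_size - 1) nodes.
    """
    per_node = int(props["nifi.cluster.load.balance.connections.per.node"])
    return per_node * (cluster_size - 1)


SETTINGS = """
nifi.cluster.load.balance.connections.per.node=10
nifi.cluster.load.balance.max.thread.count=10
nifi.cluster.load.balance.comms.timeout=30 sec
"""

bound = max_outbound_connections(parse_properties(SETTINGS), cluster_size=100)
```

Under these assumptions `bound` is 990 for a 100-node cluster, versus 10 for a 2-node cluster, which may be one reason a small test cluster behaves differently.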
[jira] [Commented] (NIFI-6790) Load-balanced queues cause huge number of open pipes
[ https://issues.apache.org/jira/browse/NIFI-6790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16954606#comment-16954606 ]

Mark Payne commented on NIFI-6790:

[~Matthew Knight] thanks for the update. I do notice in your screenshot that you've got a 100-node cluster, which is certainly different from the 2-node cluster that I've got running to try to replicate. Are you using the default settings for the load-balancing configuration in your `nifi.properties`? Also, when you say "this is still present in master as of a few days ago", do you know how long ago that was? I'm wondering if this problem is addressed by https://issues.apache.org/jira/browse/NIFI-6736. What we saw in that Jira was that a lot of sockets could be left open in a situation like this, but they weren't reported as pipes. That may be a difference between OSX and Ubuntu, perhaps?
[jira] [Commented] (NIFI-6790) Load-balanced queues cause huge number of open pipes
[ https://issues.apache.org/jira/browse/NIFI-6790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16954580#comment-16954580 ]

Matthew Knight commented on NIFI-6790:

The attached images show the lsof output on the primary node; I just did some ETL to get it into a database, which gives me more options for querying it. The uptick starts pretty much immediately: within a minute of starting the flow with the load-balanced queue, the handles have been eaten up.
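The kind of ETL described above could be sketched as follows. This is not the reporter's actual pipeline: it assumes the input comes from `lsof -F ft -p <pid>` (lsof's field output, where each line starts with a one-letter field tag), and the table name and schema are hypothetical.

```python
import sqlite3


def parse_lsof_fields(output):
    """Turn `lsof -F ft` output into (pid, fd, type) rows.

    Field tags used here: 'p' = process ID, 'f' = file descriptor,
    't' = file type. Other tags are ignored.
    """
    rows = []
    pid = None
    fd = None
    for line in output.splitlines():
        if not line:
            continue
        tag, value = line[0], line[1:]
        if tag == "p":
            pid = int(value)
            fd = None
        elif tag == "f":
            fd = value
        elif tag == "t" and pid is not None and fd is not None:
            rows.append((pid, fd, value))
    return rows


def load_rows(rows):
    """Load rows into an in-memory SQLite table (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE open_files (pid INTEGER, fd TEXT, type TEXT)")
    conn.executemany("INSERT INTO open_files VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn


def count_by_type(conn):
    """Return (type, count) pairs, most frequent type first."""
    query = (
        "SELECT type, COUNT(*) FROM open_files "
        "GROUP BY type ORDER BY COUNT(*) DESC, type"
    )
    return conn.execute(query).fetchall()
```

Once the rows are in a table, questions such as "which file type dominates on this node" become a single GROUP BY query rather than a chain of grep and wc calls.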
[jira] [Commented] (NIFI-6790) Load-balanced queues cause huge number of open pipes
[ https://issues.apache.org/jira/browse/NIFI-6790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16954090#comment-16954090 ]

Mark Payne commented on NIFI-6790:

[~Matthew Knight] thanks for reporting this! I'm trying to replicate this on master but with no luck:
{code:java}
nifi-1.10.0-SNAPSHOT-bin $ lsof -p 21433 | wc -l
5634
nifi-1.10.0-SNAPSHOT-bin $ lsof -p 21433 | wc -l
5661
nifi-1.10.0-SNAPSHOT-bin $ lsof -p 21433 | wc -l
5658
nifi-1.10.0-SNAPSHOT-bin $ lsof -p 21433 | wc -l
5660
nifi-1.10.0-SNAPSHOT-bin $ lsof -p 21433 | wc -l
5655
nifi-1.10.0-SNAPSHOT-bin $ lsof -p 21433 | wc -l
5657
{code}
The number of open file handles for the nifi process is quite steady. If I look at only PIPE file handles, I see it very steady as well:
{code:java}
nifi-1.10.0-SNAPSHOT-bin $ lsof -p 21433 | grep PIPE | wc -l
27
nifi-1.10.0-SNAPSHOT-bin $ lsof -p 21433 | grep PIPE | wc -l
27
nifi-1.10.0-SNAPSHOT-bin $ lsof -p 21433 | grep PIPE | wc -l
27
nifi-1.10.0-SNAPSHOT-bin $ lsof -p 21433 | grep PIPE | wc -l
27
{code}
I'm curious what's different. How long did you allow your system to run before starting to see an uptick in open file handles? Can you try running the command that I showed above and see if you get something different?
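On a Linux node (the commands above were run on OSX, which has no `/proc`), the same check can be made without lsof by reading the symlinks under `/proc/<pid>/fd`. The following is a minimal sketch under that assumption; the function names and category labels are hypothetical.

```python
import os
from collections import Counter


def classify_fd_target(target):
    """Map a /proc/<pid>/fd symlink target to a coarse category."""
    if target.startswith("pipe:"):
        return "pipe"
    if target.startswith("socket:"):
        return "socket"
    if target == "anon_inode:[eventpoll]":
        return "eventpoll"
    if target.startswith("anon_inode:"):
        return "anon_inode"
    return "file"


def count_fd_types(fd_dir):
    """Count open descriptors by category for a directory like /proc/21433/fd."""
    counts = Counter()
    for name in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            # The descriptor was closed between listdir() and readlink().
            continue
        counts[classify_fd_target(target)] += 1
    return counts
```

Calling `count_fd_types("/proc/21433/fd")` every few seconds and comparing the `pipe` and `eventpoll` counts between samples shows whether the numbers are steady, as in the output above, or climbing.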
[jira] [Commented] (NIFI-6790) Load-balanced queues cause huge number of open pipes
[ https://issues.apache.org/jira/browse/NIFI-6790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16954065#comment-16954065 ]

Matthew Knight commented on NIFI-6790:

I didn't run this exact same flow, but I did try a similar experiment with the latest NiFi 1.10.0-SNAPSHOT, which I built myself, and it was still a problem, so this is still present in master as of a few days ago.