Re: Nifi 1.12.1 cluster is getting hung after few days(15 days)
Hello,

Please capture and share a full thread dump by running bin/nifi.sh dump, and please post these so they're easier to read than in this email system.

Thanks
Re: Nifi 1.12.1 cluster is getting hung after few days(15 days)
Hi All,

Could someone please share their thoughts on the issue in the mail below, so that I can continue my analysis?

Regards,
Sanjeet
Nifi 1.12.1 cluster is getting hung after few days(15 days)
Hi All,

Happy New Year :)

I upgraded our cluster from 1.8 to 1.12.1 a few days ago, and everything was working fine at first. I have since observed that NiFi hangs after running for a few days (roughly 15 days after the NiFi service starts). The symptom is that after login the browser keeps on loading. When I checked bootstrap.log I saw this message: "*Apache nifi is running at PID () but not responding to ping requests*".

This happened to only one node of a 3-node cluster.

This issue has happened *3 times, on different clusters and on different nodes.*

*Every time, the issue was fixed by restarting the NiFi service.*

While the node was hung, I checked the resource utilisation:

-> top -n 1 -H -p 943785 (NiFi process ID)

top - 08:26:36 up 40 days, 3:48, 2 users, load average: 5.28, 5.38, 5.43
Threads: 239 total, 4 running, 235 sleeping, 0 stopped, 0 zombie
%Cpu(s): 98.7 us, 1.3 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
MiB Mem : 15829.5 total, 610.8 free, 10823.7 used, 4395.0 buff/cache
MiB Swap: 0.0 total, 0.0 free, 0.0 used. 4456.1 avail Mem

   PID USER PR NI  VIRT  RES   SHR S %CPU %MEM    TIME+ COMMAND
943806 root 20  0 12.5g 9.4g 18692 R 88.9 60.7 12698:50 GC Thread#1
943807 root 20  0 12.5g 9.4g 18692 R 88.9 60.7 12698:48 GC Thread#2
943808 root 20  0 12.5g 9.4g 18692 R 88.9 60.7 12698:58 GC Thread#3
943787 root 20  0 12.5g 9.4g 18692 R 83.3 60.7 12698:51 GC Thread#0
943785 root 20  0 12.5g 9.4g 18692 S  0.0 60.7  0:00.00 java

We have a 4-core CPU, and all *4 GC threads* stay in this state, consuming high CPU. *The cluster was in a hung state for 2 days.* After 2 days I saw that these threads had left this state and NiFi came out of the hung state on this node, but another node in the same cluster then moved into the hung state in the same fashion: 4 threads busy in GC and consuming high CPU.

Could you please help me identify the possible cause?

Details:

Nifi 1.12.1
Jdk 11
Zookeeper 3.5.8
16g memory

Thanks,
--
Sanjeet Kumar Rath,
mob- +91 8777577470
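
For anyone who wants to quantify how much of the CPU the GC threads are taking, the per-thread rows of the `top -H` output above can be summarized with a small script. This is only a minimal sketch: the function names `parse_top_threads` and `gc_cpu_percent` are made up for illustration, the sample rows are copied from the message, and it assumes the default 12-column `top` layout shown there.

```python
# Sample per-thread rows copied from the `top -H` output in the message.
TOP_ROWS = """\
943806 root 20 0 12.5g 9.4g 18692 R 88.9 60.7 12698:50 GC Thread#1
943807 root 20 0 12.5g 9.4g 18692 R 88.9 60.7 12698:48 GC Thread#2
943808 root 20 0 12.5g 9.4g 18692 R 88.9 60.7 12698:58 GC Thread#3
943787 root 20 0 12.5g 9.4g 18692 R 83.3 60.7 12698:51 GC Thread#0
943785 root 20 0 12.5g 9.4g 18692 S 0.0 60.7 0:00.00 java
"""


def parse_top_threads(text):
    """Parse `top -H` thread rows into dicts (hypothetical helper).

    Assumes the default column order:
    PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
    """
    rows = []
    for line in text.splitlines():
        # Split into at most 12 fields so that a thread name containing
        # spaces (e.g. "GC Thread#1") stays in one piece.
        fields = line.split(None, 11)
        if len(fields) < 12 or not fields[0].isdigit():
            continue  # skip headers, summary lines, and blank lines
        rows.append({
            "tid": int(fields[0]),
            "state": fields[7],
            "cpu": float(fields[8]),
            "name": fields[11].strip(),
        })
    return rows


def gc_cpu_percent(rows):
    """Sum the %CPU of all threads whose name starts with 'GC Thread'."""
    return round(sum(r["cpu"] for r in rows if r["name"].startswith("GC Thread")), 1)


if __name__ == "__main__":
    threads = parse_top_threads(TOP_ROWS)
    for t in threads:
        print(t["tid"], t["state"], t["cpu"], t["name"])
    print("GC threads total %CPU:", gc_cpu_percent(threads))
```

In per-thread mode `top` reports %CPU per thread, so the summed value can exceed 100 on a multi-core machine. On a 4-core host, a total close to 400 would mean the GC threads are using nearly all available CPU.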