Re: Restarting NiFi causing SiteToSiteBulletinReportingTask to fail
Hi Pierre,

Thanks for the updates and HCC comments as well.

-Chad

On 4/18/18, 5:36 AM, "Pierre Villard" wrote:

I created https://issues.apache.org/jira/browse/NIFI-5092 to track the issue.
Will submit a fix really soon.

Current workaround: after a NiFi restart, stop the reporting task, clear the
state of the reporting task, and start the reporting task.

Pierre

2018-04-18 0:04 GMT+02:00 Pierre Villard:

> Hi Chad,
>
> I confirm that I can reproduce the issue on my side with a NiFi 1.5.0
> cluster and I don't see anything that would fix it in NiFi 1.6.0.
>
> I had a closer look and it does not seem related to the Site-to-Site
> mechanism: the thread in charge of refreshing the peers is correctly
> running and you should see logs like "Successfully refreshed Peer Status;
> remote instance consists of X peers".
>
> As far as I can see, it sounds related to how we are caching the ID of the
> last bulletin sent and how we retrieve this value to "restart" the task
> after the NiFi node restarted. That's why you have to delete the task and
> create it again: it'll delete the associated cache.
>
> That's just an assumption after a quick look, I'll keep digging tomorrow
> and open a JIRA for that.
>
> Thanks for reporting it!
>
> Pierre
>
> 2018-04-12 23:41 GMT+02:00 Pierre Villard:
>
>> Hi Chad,
>>
>> I believe this could have been fixed recently but I've very limited
>> access right now (and for the next few days) and can't be sure...
>> I will check next week if no one has given you feedback before then.
>>
>> Pierre
>>
>> 2018-04-12 19:57 GMT+02:00 Woodhead, Chad:
>>
>>> I am running HDF 3.0.1.1 which comes with NiFi 1.2.0.3.0.1.1-5. We are
>>> using SiteToSiteBulletinReportingTask to monitor bulletins (for things
>>> like Disk Usage and Memory Usage). When we restart NiFi via Ambari (either
>>> with a Restart or Stop and then Start), when NiFi comes back up the
>>> SiteToSiteBulletinReportingTask no longer works. It throws the
>>> following error when it is first trying to start up:
>>>
>>> SiteToSiteBulletinReportingTask[id=ba6b4499-0162-1000--3ccd7573]
>>> org.apache.nifi.remote.client.PeerSelector@34e976af Unable to refresh
>>> Remote Group's peers due to response code 409:Conflict with explanation:
>>> null
>>>
>>> No matter how long we wait, it never works. The ways I have been able to
>>> get it to start working again are as follows:
>>>
>>> * Stop and then Start the Remote Input Port the
>>> SiteToSiteBulletinReportingTask is using
>>> * Delete the SiteToSiteBulletinReportingTask and create a new one
>>> * Wait a while and stop and start the SiteToSiteBulletinReportingTask
>>> (however this doesn't work consistently)
>>>
>>> I have tested the same flow steps using a process that uses a Remote
>>> Process Group and a different Remote Input Port, and that RPG throws the
>>> same error when first coming up but then starts working after a period of
>>> time. So maybe the SiteToSiteBulletinReportingTask isn't trying enough
>>> times to connect to the Remote Input Port?
>>>
>>> Sincerely,
>>> Chad Woodhead
Re: Restarting NiFi causing SiteToSiteBulletinReportingTask to fail
I created https://issues.apache.org/jira/browse/NIFI-5092 to track the issue. Will submit a fix really soon.

Current workaround: after a NiFi restart, stop the reporting task, clear the state of the reporting task, and start the reporting task.

Pierre
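For anyone who wants to script the workaround above (stop the task, clear its state, start it again), here is a rough sketch. It is not taken from this thread: it assumes the NiFi REST API exposes `GET`/`PUT /reporting-tasks/{id}` and `POST /reporting-tasks/{id}/state/clear-requests` in your version, and that the instance is unsecured. The helper names (`set_state`, `reset_reporting_task`, `http_transport`) and the base URL are hypothetical, so please check the REST API docs for your NiFi release before relying on it.

```python
import json
import urllib.request
from typing import Callable, Optional

# Hypothetical transport signature: (method, path, body) -> decoded JSON response.
Transport = Callable[[str, str, Optional[dict]], dict]


def http_transport(base_url: str) -> Transport:
    """Build a transport for an unsecured NiFi instance (assumption: no auth)."""
    def call(method: str, path: str, body: Optional[dict]) -> dict:
        data = json.dumps(body).encode() if body is not None else None
        req = urllib.request.Request(
            base_url + path,
            data=data,
            method=method,
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(req) as resp:
            return json.loads(resp.read() or b"{}")
    return call


def set_state(call: Transport, task_id: str, state: str) -> None:
    """Fetch the task's current revision, then request the given run state."""
    entity = call("GET", f"/reporting-tasks/{task_id}", None)
    body = {
        "revision": entity["revision"],  # NiFi rejects updates with a stale revision
        "component": {"id": task_id, "state": state},
    }
    call("PUT", f"/reporting-tasks/{task_id}", body)


def reset_reporting_task(call: Transport, task_id: str) -> None:
    """Stop the task, clear its stored state, and start it again."""
    set_state(call, task_id, "STOPPED")
    call("POST", f"/reporting-tasks/{task_id}/state/clear-requests", None)
    set_state(call, task_id, "RUNNING")
```

Usage would look something like `reset_reporting_task(http_transport("http://localhost:8080/nifi-api"), "<reporting task id>")`, run once after each NiFi restart. The revision is re-fetched before the second `PUT` because the first update changes it.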
Re: Restarting NiFi causing SiteToSiteBulletinReportingTask to fail
Hi Chad,

I confirm that I can reproduce the issue on my side with a NiFi 1.5.0 cluster and I don't see anything that would fix it in NiFi 1.6.0.

I had a closer look and it does not seem related to the Site-to-Site mechanism: the thread in charge of refreshing the peers is correctly running and you should see logs like "Successfully refreshed Peer Status; remote instance consists of X peers".

As far as I can see, it sounds related to how we are caching the ID of the last bulletin sent and how we retrieve this value to "restart" the task after the NiFi node restarted. That's why you have to delete the task and create it again: it'll delete the associated cache.

That's just an assumption after a quick look, I'll keep digging tomorrow and open a JIRA for that.

Thanks for reporting it!

Pierre
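To illustrate how a cached "last bulletin sent" ID could produce this symptom, here is a toy model. It is only a sketch of the hypothesis above, not the actual reporting task code, and it assumes that bulletin IDs start again from a low value after a restart while the cached ID survives; that assumption is not confirmed in this thread. `Bulletin` and `select_unsent` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Bulletin:
    id: int
    message: str


def select_unsent(bulletins: List[Bulletin], last_sent_id: int) -> List[Bulletin]:
    """Return only the bulletins whose ID is above the cached high-water mark."""
    return [b for b in bulletins if b.id > last_sent_id]


# Before the restart: IDs 1..500 were sent, so the cached value is 500.
cached_last_sent_id = 500

# After the restart (assumption): the ID counter begins again near zero.
after_restart = [Bulletin(0, "Disk Usage"), Bulletin(1, "Memory Usage")]

# With the stale cache nothing qualifies, so the task appears to do nothing.
stuck = select_unsent(after_restart, cached_last_sent_id)

# Clearing the cache (modelled here as -1) lets every new bulletin through.
recovered = select_unsent(after_restart, -1)
```

Under that assumption, `stuck` is empty and `recovered` contains both bulletins, which would match the observation that deleting and recreating the task gets it working again.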
Re: Restarting NiFi causing SiteToSiteBulletinReportingTask to fail
Hi Chad,

I believe this could have been fixed recently but I've very limited access right now (and for the next few days) and can't be sure... I will check next week if no one has given you feedback before then.

Pierre
Restarting NiFi causing SiteToSiteBulletinReportingTask to fail
I am running HDF 3.0.1.1 which comes with NiFi 1.2.0.3.0.1.1-5. We are using SiteToSiteBulletinReportingTask to monitor bulletins (for things like Disk Usage and Memory Usage). When we restart NiFi via Ambari (either with a Restart or Stop and then Start), when NiFi comes back up the SiteToSiteBulletinReportingTask no longer works. It throws the following error when it is first trying to start up:

SiteToSiteBulletinReportingTask[id=ba6b4499-0162-1000--3ccd7573] org.apache.nifi.remote.client.PeerSelector@34e976af Unable to refresh Remote Group's peers due to response code 409:Conflict with explanation: null

No matter how long we wait, it never works. The ways I have been able to get it to start working again are as follows:

* Stop and then Start the Remote Input Port the SiteToSiteBulletinReportingTask is using
* Delete the SiteToSiteBulletinReportingTask and create a new one
* Wait a while and stop and start the SiteToSiteBulletinReportingTask (however this doesn't work consistently)

I have tested the same flow steps using a process that uses a Remote Process Group and a different Remote Input Port, and that RPG throws the same error when first coming up but then starts working after a period of time. So maybe the SiteToSiteBulletinReportingTask isn't trying enough times to connect to the Remote Input Port?

Sincerely,
Chad Woodhead