Re: The rate of the dataflow is exceeding the provenance recording rate. Slowing down flow to accommodate
Thanks for the updates, Ben. Glad to hear that! Koji On Tue, Dec 26, 2017 at 4:21 PM, 尹文才 wrote: > Thanks Koji, I have already updated the logback configuration to produce > more verbose logs. > I was trying to reply to you with the verbose nifi logs but since I > switched to use the WriteAheadProvenanceRepository implementation, up till > now I haven't seen the error again. > I will continue to check when the error might occur and post the logs here > if needed. Once again thanks very much for your help. > > Regards, > Ben > > 2017-12-25 15:37 GMT+08:00 Koji Kawamura : > >> Hi Ben, >> >> You can make NiFi log more verbose by editing: >> NIFI_HOME/conf/logback.xml >> >> For example, adding following entry will reveal how NiFi repositories run: >> >> >> >> >> >> >> Thanks, >> Koji >> >> On Mon, Dec 25, 2017 at 4:30 PM, 尹文才 wrote: >> > Hi Koji, I also didn't find anything related to the unexpected shutdown >> in >> > my logs, is there anything I could do to make NIFI log more verbose >> > information to the logs? >> > >> > Regards, >> > Ben >> > >> > 2017-12-25 14:56 GMT+08:00 Koji Kawamura : >> > >> >> Hi Ben, >> >> >> >> I looked at the log and I expected to see some indication for the >> >> cause of shutdown, but couldn't find any. >> >> The PersistentProvenanceRepository rate warning is just a warning, and >> >> it shouldn't be the trigger of an unexpected shutdown. I suspect other >> >> reasons such as OOM killer, but I can't do any further investigation >> >> with only these logs. >> >> >> >> Thanks, >> >> Koji >> >> >> >> On Mon, Dec 25, 2017 at 3:46 PM, 尹文才 wrote: >> >> > Hi Koji, one more thing, do you have any idea why my first issue >> leads to >> >> > the unexpected shutdown of NIFI? according to the words, it will just >> >> slow >> >> > down the flow. thanks. >> >> > >> >> > Regards, >> >> > Ben >> >> > >> >> > 2017-12-25 14:31 GMT+08:00 尹文才 : >> >> > >> >> >> Hi Koji, thanks for your help, for the first issue, I will switch to >> use >> >> >> the WriteAheadProvenanceReopsitory implementation. >> >> >> >> >> >> For the second issue, I have uploaded the relevant part of my log >> file >> >> >> onto my google drive, the link is: >> >> >> https://drive.google.com/open?id=1oxAkSUyYZFy6IWZSeWqHI8e9Utnw1XAj >> >> >> >> >> >> You mean a custom processor could possibly process a flowfile twice >> only >> >> >> when it's trying to commit the session but it's interrupted so the >> >> flowfile >> >> >> still remains inside the original queue(like NIFI went down)? >> >> >> >> >> >> If you need to see the full log file, please let me know, thanks. >> >> >> >> >> >> Regards, >> >> >> Ben >> >> >> >> >> >> 2017-12-25 13:51 GMT+08:00 Koji Kawamura : >> >> >> >> >> >>> Hi Ben, >> >> >>> >> >> >>> For your 2nd issue, NiFi commits a process session in Processor >> >> >>> onTrigger when it's executed by NiFi flow engine by calling >> >> >>> session.commit(). >> >> >>> https://github.com/apache/nifi/blob/master/nifi-api/src/main >> >> >>> /java/org/apache/nifi/processor/AbstractProcessor.java#L28 >> >> >>> Once a process session is committed, the FlowFile state (including >> >> >>> which queue it is in) is persisted to disk. >> >> >>> >> >> >>> It's possible for a Processor to process the same FlowFile more than >> >> >>> once, if it has done its job, but failed to commit the session. >> >> >>> For example, if your custom processor created a temp table from a >> >> >>> FlowFile. Then before the process session is committed, something >> >> >>> happened and NiFi process session was rollback. In this case, the >> >> >>> target database is already updated (the temp table is created), but >> >> >>> NiFi FlowFile stays in the incoming queue. If the FlowFile is >> >> >>> processed again, the processor will get an error indicating the >> table >> >> >>> already exists. >> >> >>> >> >> >>> I tried to look at the logs you attached, but attachments do not >> seem >> >> >>> to be delivered to this ML. I don't see anything attached. >> >> >>> >> >> >>> Thanks, >> >> >>> Koji >> >> >>> >> >> >>> >> >> >>> On Mon, Dec 25, 2017 at 1:43 PM, Koji Kawamura < >> ijokaruma...@gmail.com >> >> > >> >> >>> wrote: >> >> >>> > Hi Ben, >> >> >>> > >> >> >>> > Just a quick recommendation for your first issue, 'The rate of the >> >> >>> > dataflow is exceeding the provenance recording rate' warning >> message. >> >> >>> > I'd recommend using WriteAheadProvenanceRepository instead of >> >> >>> > PersistentProvenanceRepository. WriteAheadProvenanceRepository >> >> >>> > provides better performance. >> >> >>> > Please take a look at the documentation here. >> >> >>> > https://nifi.apache.org/docs/nifi-docs/html/administration-g >> >> >>> uide.html#provenance-repository >> >> >>> > >> >> >>> > Thanks, >> >> >>> > Koji >> >> >>> > >> >> >>> > On Mon, Dec 25, 2017 at 12:56 PM, 尹文才 >> wrote: >> >> >>> >> Hi guys, I'm using nifi 1.4.0 to do some ETL work in my team and >> I >> >> have >> >> >>> >> encountered 2 problems dur
Re: The rate of the dataflow is exceeding the provenance recording rate. Slowing down flow to accommodate
Thanks Koji, I have already updated the logback configuration to produce more verbose logs. I was trying to reply to you with the verbose nifi logs but since I switched to use the WriteAheadProvenanceRepository implementation, up till now I haven't seen the error again. I will continue to check when the error might occur and post the logs here if needed. Once again thanks very much for your help. Regards, Ben 2017-12-25 15:37 GMT+08:00 Koji Kawamura : > Hi Ben, > > You can make NiFi log more verbose by editing: > NIFI_HOME/conf/logback.xml > > For example, adding following entry will reveal how NiFi repositories run: > > > > > > > Thanks, > Koji > > On Mon, Dec 25, 2017 at 4:30 PM, 尹文才 wrote: > > Hi Koji, I also didn't find anything related to the unexpected shutdown > in > > my logs, is there anything I could do to make NIFI log more verbose > > information to the logs? > > > > Regards, > > Ben > > > > 2017-12-25 14:56 GMT+08:00 Koji Kawamura : > > > >> Hi Ben, > >> > >> I looked at the log and I expected to see some indication for the > >> cause of shutdown, but couldn't find any. > >> The PersistentProvenanceRepository rate warning is just a warning, and > >> it shouldn't be the trigger of an unexpected shutdown. I suspect other > >> reasons such as OOM killer, but I can't do any further investigation > >> with only these logs. > >> > >> Thanks, > >> Koji > >> > >> On Mon, Dec 25, 2017 at 3:46 PM, 尹文才 wrote: > >> > Hi Koji, one more thing, do you have any idea why my first issue > leads to > >> > the unexpected shutdown of NIFI? according to the words, it will just > >> slow > >> > down the flow. thanks. > >> > > >> > Regards, > >> > Ben > >> > > >> > 2017-12-25 14:31 GMT+08:00 尹文才 : > >> > > >> >> Hi Koji, thanks for your help, for the first issue, I will switch to > use > >> >> the WriteAheadProvenanceReopsitory implementation. > >> >> > >> >> For the second issue, I have uploaded the relevant part of my log > file > >> >> onto my google drive, the link is: > >> >> https://drive.google.com/open?id=1oxAkSUyYZFy6IWZSeWqHI8e9Utnw1XAj > >> >> > >> >> You mean a custom processor could possibly process a flowfile twice > only > >> >> when it's trying to commit the session but it's interrupted so the > >> flowfile > >> >> still remains inside the original queue(like NIFI went down)? > >> >> > >> >> If you need to see the full log file, please let me know, thanks. > >> >> > >> >> Regards, > >> >> Ben > >> >> > >> >> 2017-12-25 13:51 GMT+08:00 Koji Kawamura : > >> >> > >> >>> Hi Ben, > >> >>> > >> >>> For your 2nd issue, NiFi commits a process session in Processor > >> >>> onTrigger when it's executed by NiFi flow engine by calling > >> >>> session.commit(). > >> >>> https://github.com/apache/nifi/blob/master/nifi-api/src/main > >> >>> /java/org/apache/nifi/processor/AbstractProcessor.java#L28 > >> >>> Once a process session is committed, the FlowFile state (including > >> >>> which queue it is in) is persisted to disk. > >> >>> > >> >>> It's possible for a Processor to process the same FlowFile more than > >> >>> once, if it has done its job, but failed to commit the session. > >> >>> For example, if your custom processor created a temp table from a > >> >>> FlowFile. Then before the process session is committed, something > >> >>> happened and NiFi process session was rollback. In this case, the > >> >>> target database is already updated (the temp table is created), but > >> >>> NiFi FlowFile stays in the incoming queue. If the FlowFile is > >> >>> processed again, the processor will get an error indicating the > table > >> >>> already exists. > >> >>> > >> >>> I tried to look at the logs you attached, but attachments do not > seem > >> >>> to be delivered to this ML. I don't see anything attached. > >> >>> > >> >>> Thanks, > >> >>> Koji > >> >>> > >> >>> > >> >>> On Mon, Dec 25, 2017 at 1:43 PM, Koji Kawamura < > ijokaruma...@gmail.com > >> > > >> >>> wrote: > >> >>> > Hi Ben, > >> >>> > > >> >>> > Just a quick recommendation for your first issue, 'The rate of the > >> >>> > dataflow is exceeding the provenance recording rate' warning > message. > >> >>> > I'd recommend using WriteAheadProvenanceRepository instead of > >> >>> > PersistentProvenanceRepository. WriteAheadProvenanceRepository > >> >>> > provides better performance. > >> >>> > Please take a look at the documentation here. > >> >>> > https://nifi.apache.org/docs/nifi-docs/html/administration-g > >> >>> uide.html#provenance-repository > >> >>> > > >> >>> > Thanks, > >> >>> > Koji > >> >>> > > >> >>> > On Mon, Dec 25, 2017 at 12:56 PM, 尹文才 > wrote: > >> >>> >> Hi guys, I'm using nifi 1.4.0 to do some ETL work in my team and > I > >> have > >> >>> >> encountered 2 problems during my testing. > >> >>> >> > >> >>> >> The first problem is I found the nifi bulletin board was showing > the > >> >>> >> following warning to me: > >> >>> >> > >> >>> >> 2017-12-25 01:31:00,460 WARN [Provenance Maintenance Thread-1] > >> >>> >> o
Re: The rate of the dataflow is exceeding the provenance recording rate. Slowing down flow to accommodate
Hi Ben, You can make NiFi log more verbose by editing: NIFI_HOME/conf/logback.xml For example, adding following entry will reveal how NiFi repositories run: Thanks, Koji On Mon, Dec 25, 2017 at 4:30 PM, 尹文才 wrote: > Hi Koji, I also didn't find anything related to the unexpected shutdown in > my logs, is there anything I could do to make NIFI log more verbose > information to the logs? > > Regards, > Ben > > 2017-12-25 14:56 GMT+08:00 Koji Kawamura : > >> Hi Ben, >> >> I looked at the log and I expected to see some indication for the >> cause of shutdown, but couldn't find any. >> The PersistentProvenanceRepository rate warning is just a warning, and >> it shouldn't be the trigger of an unexpected shutdown. I suspect other >> reasons such as OOM killer, but I can't do any further investigation >> with only these logs. >> >> Thanks, >> Koji >> >> On Mon, Dec 25, 2017 at 3:46 PM, 尹文才 wrote: >> > Hi Koji, one more thing, do you have any idea why my first issue leads to >> > the unexpected shutdown of NIFI? according to the words, it will just >> slow >> > down the flow. thanks. >> > >> > Regards, >> > Ben >> > >> > 2017-12-25 14:31 GMT+08:00 尹文才 : >> > >> >> Hi Koji, thanks for your help, for the first issue, I will switch to use >> >> the WriteAheadProvenanceReopsitory implementation. >> >> >> >> For the second issue, I have uploaded the relevant part of my log file >> >> onto my google drive, the link is: >> >> https://drive.google.com/open?id=1oxAkSUyYZFy6IWZSeWqHI8e9Utnw1XAj >> >> >> >> You mean a custom processor could possibly process a flowfile twice only >> >> when it's trying to commit the session but it's interrupted so the >> flowfile >> >> still remains inside the original queue(like NIFI went down)? >> >> >> >> If you need to see the full log file, please let me know, thanks. >> >> >> >> Regards, >> >> Ben >> >> >> >> 2017-12-25 13:51 GMT+08:00 Koji Kawamura : >> >> >> >>> Hi Ben, >> >>> >> >>> For your 2nd issue, NiFi commits a process session in Processor >> >>> onTrigger when it's executed by NiFi flow engine by calling >> >>> session.commit(). >> >>> https://github.com/apache/nifi/blob/master/nifi-api/src/main >> >>> /java/org/apache/nifi/processor/AbstractProcessor.java#L28 >> >>> Once a process session is committed, the FlowFile state (including >> >>> which queue it is in) is persisted to disk. >> >>> >> >>> It's possible for a Processor to process the same FlowFile more than >> >>> once, if it has done its job, but failed to commit the session. >> >>> For example, if your custom processor created a temp table from a >> >>> FlowFile. Then before the process session is committed, something >> >>> happened and NiFi process session was rollback. In this case, the >> >>> target database is already updated (the temp table is created), but >> >>> NiFi FlowFile stays in the incoming queue. If the FlowFile is >> >>> processed again, the processor will get an error indicating the table >> >>> already exists. >> >>> >> >>> I tried to look at the logs you attached, but attachments do not seem >> >>> to be delivered to this ML. I don't see anything attached. >> >>> >> >>> Thanks, >> >>> Koji >> >>> >> >>> >> >>> On Mon, Dec 25, 2017 at 1:43 PM, Koji Kawamura > > >> >>> wrote: >> >>> > Hi Ben, >> >>> > >> >>> > Just a quick recommendation for your first issue, 'The rate of the >> >>> > dataflow is exceeding the provenance recording rate' warning message. >> >>> > I'd recommend using WriteAheadProvenanceRepository instead of >> >>> > PersistentProvenanceRepository. WriteAheadProvenanceRepository >> >>> > provides better performance. >> >>> > Please take a look at the documentation here. >> >>> > https://nifi.apache.org/docs/nifi-docs/html/administration-g >> >>> uide.html#provenance-repository >> >>> > >> >>> > Thanks, >> >>> > Koji >> >>> > >> >>> > On Mon, Dec 25, 2017 at 12:56 PM, 尹文才 wrote: >> >>> >> Hi guys, I'm using nifi 1.4.0 to do some ETL work in my team and I >> have >> >>> >> encountered 2 problems during my testing. >> >>> >> >> >>> >> The first problem is I found the nifi bulletin board was showing the >> >>> >> following warning to me: >> >>> >> >> >>> >> 2017-12-25 01:31:00,460 WARN [Provenance Maintenance Thread-1] >> >>> >> o.a.n.p.PersistentProvenanceRepository The rate of the dataflow is >> >>> exceeding >> >>> >> the provenance recording rate. Slowing down flow to accommodate. >> >>> Currently, >> >>> >> there are 96 journal files (158278228 bytes) and threshold for >> >>> blocking is >> >>> >> 80 (1181116006 bytes) >> >>> >> >> >>> >> I don't quite understand what this means, and I found also inside >> the >> >>> >> bootstrap log that nifi restarted itself: >> >>> >> >> >>> >> 2017-12-25 01:31:19,249 WARN [main] org.apache.nifi.bootstrap. >> RunNiFi >> >>> Apache >> >>> >> NiFi appears to have died. Restarting... >> >>> >> >> >>> >> Is there anything I could do so solve this problem? >> >>> >> >> >>> >> The second problem is about the FlowFiles inside my
Re: The rate of the dataflow is exceeding the provenance recording rate. Slowing down flow to accommodate
Hi Koji, I also didn't find anything related to the unexpected shutdown in my logs, is there anything I could do to make NIFI log more verbose information to the logs? Regards, Ben 2017-12-25 14:56 GMT+08:00 Koji Kawamura : > Hi Ben, > > I looked at the log and I expected to see some indication for the > cause of shutdown, but couldn't find any. > The PersistentProvenanceRepository rate warning is just a warning, and > it shouldn't be the trigger of an unexpected shutdown. I suspect other > reasons such as OOM killer, but I can't do any further investigation > with only these logs. > > Thanks, > Koji > > On Mon, Dec 25, 2017 at 3:46 PM, 尹文才 wrote: > > Hi Koji, one more thing, do you have any idea why my first issue leads to > > the unexpected shutdown of NIFI? according to the words, it will just > slow > > down the flow. thanks. > > > > Regards, > > Ben > > > > 2017-12-25 14:31 GMT+08:00 尹文才 : > > > >> Hi Koji, thanks for your help, for the first issue, I will switch to use > >> the WriteAheadProvenanceReopsitory implementation. > >> > >> For the second issue, I have uploaded the relevant part of my log file > >> onto my google drive, the link is: > >> https://drive.google.com/open?id=1oxAkSUyYZFy6IWZSeWqHI8e9Utnw1XAj > >> > >> You mean a custom processor could possibly process a flowfile twice only > >> when it's trying to commit the session but it's interrupted so the > flowfile > >> still remains inside the original queue(like NIFI went down)? > >> > >> If you need to see the full log file, please let me know, thanks. > >> > >> Regards, > >> Ben > >> > >> 2017-12-25 13:51 GMT+08:00 Koji Kawamura : > >> > >>> Hi Ben, > >>> > >>> For your 2nd issue, NiFi commits a process session in Processor > >>> onTrigger when it's executed by NiFi flow engine by calling > >>> session.commit(). > >>> https://github.com/apache/nifi/blob/master/nifi-api/src/main > >>> /java/org/apache/nifi/processor/AbstractProcessor.java#L28 > >>> Once a process session is committed, the FlowFile state (including > >>> which queue it is in) is persisted to disk. > >>> > >>> It's possible for a Processor to process the same FlowFile more than > >>> once, if it has done its job, but failed to commit the session. > >>> For example, if your custom processor created a temp table from a > >>> FlowFile. Then before the process session is committed, something > >>> happened and NiFi process session was rollback. In this case, the > >>> target database is already updated (the temp table is created), but > >>> NiFi FlowFile stays in the incoming queue. If the FlowFile is > >>> processed again, the processor will get an error indicating the table > >>> already exists. > >>> > >>> I tried to look at the logs you attached, but attachments do not seem > >>> to be delivered to this ML. I don't see anything attached. > >>> > >>> Thanks, > >>> Koji > >>> > >>> > >>> On Mon, Dec 25, 2017 at 1:43 PM, Koji Kawamura > > >>> wrote: > >>> > Hi Ben, > >>> > > >>> > Just a quick recommendation for your first issue, 'The rate of the > >>> > dataflow is exceeding the provenance recording rate' warning message. > >>> > I'd recommend using WriteAheadProvenanceRepository instead of > >>> > PersistentProvenanceRepository. WriteAheadProvenanceRepository > >>> > provides better performance. > >>> > Please take a look at the documentation here. > >>> > https://nifi.apache.org/docs/nifi-docs/html/administration-g > >>> uide.html#provenance-repository > >>> > > >>> > Thanks, > >>> > Koji > >>> > > >>> > On Mon, Dec 25, 2017 at 12:56 PM, 尹文才 wrote: > >>> >> Hi guys, I'm using nifi 1.4.0 to do some ETL work in my team and I > have > >>> >> encountered 2 problems during my testing. > >>> >> > >>> >> The first problem is I found the nifi bulletin board was showing the > >>> >> following warning to me: > >>> >> > >>> >> 2017-12-25 01:31:00,460 WARN [Provenance Maintenance Thread-1] > >>> >> o.a.n.p.PersistentProvenanceRepository The rate of the dataflow is > >>> exceeding > >>> >> the provenance recording rate. Slowing down flow to accommodate. > >>> Currently, > >>> >> there are 96 journal files (158278228 bytes) and threshold for > >>> blocking is > >>> >> 80 (1181116006 bytes) > >>> >> > >>> >> I don't quite understand what this means, and I found also inside > the > >>> >> bootstrap log that nifi restarted itself: > >>> >> > >>> >> 2017-12-25 01:31:19,249 WARN [main] org.apache.nifi.bootstrap. > RunNiFi > >>> Apache > >>> >> NiFi appears to have died. Restarting... > >>> >> > >>> >> Is there anything I could do so solve this problem? > >>> >> > >>> >> The second problem is about the FlowFiles inside my flow, I actually > >>> >> implemented a few custom processors to do the ETL work. one is to > >>> extract > >>> >> multiple tables from sql server and for each flowfile out of it, it > >>> contains > >>> >> an attribute > >>> >> specifying the name of the temp ods table to create, and the second > >>> >> processor is to get all flowfiles from the first processor and
Re: The rate of the dataflow is exceeding the provenance recording rate. Slowing down flow to accommodate
Hi Ben, I looked at the log and I expected to see some indication for the cause of shutdown, but couldn't find any. The PersistentProvenanceRepository rate warning is just a warning, and it shouldn't be the trigger of an unexpected shutdown. I suspect other reasons such as OOM killer, but I can't do any further investigation with only these logs. Thanks, Koji On Mon, Dec 25, 2017 at 3:46 PM, 尹文才 wrote: > Hi Koji, one more thing, do you have any idea why my first issue leads to > the unexpected shutdown of NIFI? according to the words, it will just slow > down the flow. thanks. > > Regards, > Ben > > 2017-12-25 14:31 GMT+08:00 尹文才 : > >> Hi Koji, thanks for your help, for the first issue, I will switch to use >> the WriteAheadProvenanceReopsitory implementation. >> >> For the second issue, I have uploaded the relevant part of my log file >> onto my google drive, the link is: >> https://drive.google.com/open?id=1oxAkSUyYZFy6IWZSeWqHI8e9Utnw1XAj >> >> You mean a custom processor could possibly process a flowfile twice only >> when it's trying to commit the session but it's interrupted so the flowfile >> still remains inside the original queue(like NIFI went down)? >> >> If you need to see the full log file, please let me know, thanks. >> >> Regards, >> Ben >> >> 2017-12-25 13:51 GMT+08:00 Koji Kawamura : >> >>> Hi Ben, >>> >>> For your 2nd issue, NiFi commits a process session in Processor >>> onTrigger when it's executed by NiFi flow engine by calling >>> session.commit(). >>> https://github.com/apache/nifi/blob/master/nifi-api/src/main >>> /java/org/apache/nifi/processor/AbstractProcessor.java#L28 >>> Once a process session is committed, the FlowFile state (including >>> which queue it is in) is persisted to disk. >>> >>> It's possible for a Processor to process the same FlowFile more than >>> once, if it has done its job, but failed to commit the session. >>> For example, if your custom processor created a temp table from a >>> FlowFile. Then before the process session is committed, something >>> happened and NiFi process session was rollback. In this case, the >>> target database is already updated (the temp table is created), but >>> NiFi FlowFile stays in the incoming queue. If the FlowFile is >>> processed again, the processor will get an error indicating the table >>> already exists. >>> >>> I tried to look at the logs you attached, but attachments do not seem >>> to be delivered to this ML. I don't see anything attached. >>> >>> Thanks, >>> Koji >>> >>> >>> On Mon, Dec 25, 2017 at 1:43 PM, Koji Kawamura >>> wrote: >>> > Hi Ben, >>> > >>> > Just a quick recommendation for your first issue, 'The rate of the >>> > dataflow is exceeding the provenance recording rate' warning message. >>> > I'd recommend using WriteAheadProvenanceRepository instead of >>> > PersistentProvenanceRepository. WriteAheadProvenanceRepository >>> > provides better performance. >>> > Please take a look at the documentation here. >>> > https://nifi.apache.org/docs/nifi-docs/html/administration-g >>> uide.html#provenance-repository >>> > >>> > Thanks, >>> > Koji >>> > >>> > On Mon, Dec 25, 2017 at 12:56 PM, 尹文才 wrote: >>> >> Hi guys, I'm using nifi 1.4.0 to do some ETL work in my team and I have >>> >> encountered 2 problems during my testing. >>> >> >>> >> The first problem is I found the nifi bulletin board was showing the >>> >> following warning to me: >>> >> >>> >> 2017-12-25 01:31:00,460 WARN [Provenance Maintenance Thread-1] >>> >> o.a.n.p.PersistentProvenanceRepository The rate of the dataflow is >>> exceeding >>> >> the provenance recording rate. Slowing down flow to accommodate. >>> Currently, >>> >> there are 96 journal files (158278228 bytes) and threshold for >>> blocking is >>> >> 80 (1181116006 bytes) >>> >> >>> >> I don't quite understand what this means, and I found also inside the >>> >> bootstrap log that nifi restarted itself: >>> >> >>> >> 2017-12-25 01:31:19,249 WARN [main] org.apache.nifi.bootstrap.RunNiFi >>> Apache >>> >> NiFi appears to have died. Restarting... >>> >> >>> >> Is there anything I could do so solve this problem? >>> >> >>> >> The second problem is about the FlowFiles inside my flow, I actually >>> >> implemented a few custom processors to do the ETL work. one is to >>> extract >>> >> multiple tables from sql server and for each flowfile out of it, it >>> contains >>> >> an attribute >>> >> specifying the name of the temp ods table to create, and the second >>> >> processor is to get all flowfiles from the first processor and create >>> all >>> >> the temp ods tables specified in the flowfiles' attribute. >>> >> I found inside the app log that one of the temp table name already >>> existed >>> >> when trying to create the temp table, and it caused sql exception. >>> >> After taking some time investigating in the log, I found the sql query >>> was >>> >> executed twice in the second processor, once before nifi restart, the >>> second >>> >> execution was done right after nifi restart: >>>
Re: The rate of the dataflow is exceeding the provenance recording rate. Slowing down flow to accommodate
Hi Koji, one more thing, do you have any idea why my first issue leads to the unexpected shutdown of NIFI? according to the words, it will just slow down the flow. thanks. Regards, Ben 2017-12-25 14:31 GMT+08:00 尹文才 : > Hi Koji, thanks for your help, for the first issue, I will switch to use > the WriteAheadProvenanceReopsitory implementation. > > For the second issue, I have uploaded the relevant part of my log file > onto my google drive, the link is: > https://drive.google.com/open?id=1oxAkSUyYZFy6IWZSeWqHI8e9Utnw1XAj > > You mean a custom processor could possibly process a flowfile twice only > when it's trying to commit the session but it's interrupted so the flowfile > still remains inside the original queue(like NIFI went down)? > > If you need to see the full log file, please let me know, thanks. > > Regards, > Ben > > 2017-12-25 13:51 GMT+08:00 Koji Kawamura : > >> Hi Ben, >> >> For your 2nd issue, NiFi commits a process session in Processor >> onTrigger when it's executed by NiFi flow engine by calling >> session.commit(). >> https://github.com/apache/nifi/blob/master/nifi-api/src/main >> /java/org/apache/nifi/processor/AbstractProcessor.java#L28 >> Once a process session is committed, the FlowFile state (including >> which queue it is in) is persisted to disk. >> >> It's possible for a Processor to process the same FlowFile more than >> once, if it has done its job, but failed to commit the session. >> For example, if your custom processor created a temp table from a >> FlowFile. Then before the process session is committed, something >> happened and NiFi process session was rollback. In this case, the >> target database is already updated (the temp table is created), but >> NiFi FlowFile stays in the incoming queue. If the FlowFile is >> processed again, the processor will get an error indicating the table >> already exists. >> >> I tried to look at the logs you attached, but attachments do not seem >> to be delivered to this ML. I don't see anything attached. >> >> Thanks, >> Koji >> >> >> On Mon, Dec 25, 2017 at 1:43 PM, Koji Kawamura >> wrote: >> > Hi Ben, >> > >> > Just a quick recommendation for your first issue, 'The rate of the >> > dataflow is exceeding the provenance recording rate' warning message. >> > I'd recommend using WriteAheadProvenanceRepository instead of >> > PersistentProvenanceRepository. WriteAheadProvenanceRepository >> > provides better performance. >> > Please take a look at the documentation here. >> > https://nifi.apache.org/docs/nifi-docs/html/administration-g >> uide.html#provenance-repository >> > >> > Thanks, >> > Koji >> > >> > On Mon, Dec 25, 2017 at 12:56 PM, 尹文才 wrote: >> >> Hi guys, I'm using nifi 1.4.0 to do some ETL work in my team and I have >> >> encountered 2 problems during my testing. >> >> >> >> The first problem is I found the nifi bulletin board was showing the >> >> following warning to me: >> >> >> >> 2017-12-25 01:31:00,460 WARN [Provenance Maintenance Thread-1] >> >> o.a.n.p.PersistentProvenanceRepository The rate of the dataflow is >> exceeding >> >> the provenance recording rate. Slowing down flow to accommodate. >> Currently, >> >> there are 96 journal files (158278228 bytes) and threshold for >> blocking is >> >> 80 (1181116006 bytes) >> >> >> >> I don't quite understand what this means, and I found also inside the >> >> bootstrap log that nifi restarted itself: >> >> >> >> 2017-12-25 01:31:19,249 WARN [main] org.apache.nifi.bootstrap.RunNiFi >> Apache >> >> NiFi appears to have died. Restarting... >> >> >> >> Is there anything I could do so solve this problem? >> >> >> >> The second problem is about the FlowFiles inside my flow, I actually >> >> implemented a few custom processors to do the ETL work. one is to >> extract >> >> multiple tables from sql server and for each flowfile out of it, it >> contains >> >> an attribute >> >> specifying the name of the temp ods table to create, and the second >> >> processor is to get all flowfiles from the first processor and create >> all >> >> the temp ods tables specified in the flowfiles' attribute. >> >> I found inside the app log that one of the temp table name already >> existed >> >> when trying to create the temp table, and it caused sql exception. >> >> After taking some time investigating in the log, I found the sql query >> was >> >> executed twice in the second processor, once before nifi restart, the >> second >> >> execution was done right after nifi restart: >> >> >> >> 2017-12-25 01:32:35,639 ERROR [Timer-Driven Process Thread-7] >> >> c.z.nifi.processors.ExecuteSqlCommand >> >> ExecuteSqlCommand[id=3c97dfd8-aaa4-3a37-626e-fed5a4822d14] >> 执行sql语句失败:SELECT >> >> TOP 0 * INTO tmp.ods_bd_e_reason_20171225013007005_5567 FROM >> >> dbo.ods_bd_e_reason; >> >> >> >> >> >> I have read the document of nifi in depth but I'm still not very aware >> of >> >> nifi's internal mechanism, my suspect is nifi didn't manage to >> checkpoint >> >> the flowfile's state(which queue it was in) in memo
Re: The rate of the dataflow is exceeding the provenance recording rate. Slowing down flow to accommodate
Hi Koji, thanks for your help, for the first issue, I will switch to use the WriteAheadProvenanceReopsitory implementation. For the second issue, I have uploaded the relevant part of my log file onto my google drive, the link is: https://drive.google.com/open?id=1oxAkSUyYZFy6IWZSeWqHI8e9Utnw1XAj You mean a custom processor could possibly process a flowfile twice only when it's trying to commit the session but it's interrupted so the flowfile still remains inside the original queue(like NIFI went down)? If you need to see the full log file, please let me know, thanks. Regards, Ben 2017-12-25 13:51 GMT+08:00 Koji Kawamura : > Hi Ben, > > For your 2nd issue, NiFi commits a process session in Processor > onTrigger when it's executed by NiFi flow engine by calling > session.commit(). > https://github.com/apache/nifi/blob/master/nifi-api/src/ > main/java/org/apache/nifi/processor/AbstractProcessor.java#L28 > Once a process session is committed, the FlowFile state (including > which queue it is in) is persisted to disk. > > It's possible for a Processor to process the same FlowFile more than > once, if it has done its job, but failed to commit the session. > For example, if your custom processor created a temp table from a > FlowFile. Then before the process session is committed, something > happened and NiFi process session was rollback. In this case, the > target database is already updated (the temp table is created), but > NiFi FlowFile stays in the incoming queue. If the FlowFile is > processed again, the processor will get an error indicating the table > already exists. > > I tried to look at the logs you attached, but attachments do not seem > to be delivered to this ML. I don't see anything attached. > > Thanks, > Koji > > > On Mon, Dec 25, 2017 at 1:43 PM, Koji Kawamura > wrote: > > Hi Ben, > > > > Just a quick recommendation for your first issue, 'The rate of the > > dataflow is exceeding the provenance recording rate' warning message. > > I'd recommend using WriteAheadProvenanceRepository instead of > > PersistentProvenanceRepository. WriteAheadProvenanceRepository > > provides better performance. > > Please take a look at the documentation here. > > https://nifi.apache.org/docs/nifi-docs/html/administration- > guide.html#provenance-repository > > > > Thanks, > > Koji > > > > On Mon, Dec 25, 2017 at 12:56 PM, 尹文才 wrote: > >> Hi guys, I'm using nifi 1.4.0 to do some ETL work in my team and I have > >> encountered 2 problems during my testing. > >> > >> The first problem is I found the nifi bulletin board was showing the > >> following warning to me: > >> > >> 2017-12-25 01:31:00,460 WARN [Provenance Maintenance Thread-1] > >> o.a.n.p.PersistentProvenanceRepository The rate of the dataflow is > exceeding > >> the provenance recording rate. Slowing down flow to accommodate. > Currently, > >> there are 96 journal files (158278228 bytes) and threshold for blocking > is > >> 80 (1181116006 bytes) > >> > >> I don't quite understand what this means, and I found also inside the > >> bootstrap log that nifi restarted itself: > >> > >> 2017-12-25 01:31:19,249 WARN [main] org.apache.nifi.bootstrap.RunNiFi > Apache > >> NiFi appears to have died. Restarting... > >> > >> Is there anything I could do so solve this problem? > >> > >> The second problem is about the FlowFiles inside my flow, I actually > >> implemented a few custom processors to do the ETL work. one is to > extract > >> multiple tables from sql server and for each flowfile out of it, it > contains > >> an attribute > >> specifying the name of the temp ods table to create, and the second > >> processor is to get all flowfiles from the first processor and create > all > >> the temp ods tables specified in the flowfiles' attribute. > >> I found inside the app log that one of the temp table name already > existed > >> when trying to create the temp table, and it caused sql exception. > >> After taking some time investigating in the log, I found the sql query > was > >> executed twice in the second processor, once before nifi restart, the > second > >> execution was done right after nifi restart: > >> > >> 2017-12-25 01:32:35,639 ERROR [Timer-Driven Process Thread-7] > >> c.z.nifi.processors.ExecuteSqlCommand > >> ExecuteSqlCommand[id=3c97dfd8-aaa4-3a37-626e-fed5a4822d14] > 执行sql语句失败:SELECT > >> TOP 0 * INTO tmp.ods_bd_e_reason_20171225013007005_5567 FROM > >> dbo.ods_bd_e_reason; > >> > >> > >> I have read the document of nifi in depth but I'm still not very aware > of > >> nifi's internal mechanism, my suspect is nifi didn't manage to > checkpoint > >> the flowfile's state(which queue it was in) in memory into flowfile > >> repository > >> before it was dead and after restarting it recovered the flowfile's > state > >> from flowfile repository and then the flowfile went through the second > >> processor again and thus the sql was executed twice. Is this correct? > >> > >> I've attached the relevant part of app log, thanks. > >> > >> Regards, > >>
Re: The rate of the dataflow is exceeding the provenance recording rate. Slowing down flow to accommodate
Hi Ben, For your 2nd issue, NiFi commits a process session in Processor onTrigger when it's executed by NiFi flow engine by calling session.commit(). https://github.com/apache/nifi/blob/master/nifi-api/src/main/java/org/apache/nifi/processor/AbstractProcessor.java#L28 Once a process session is committed, the FlowFile state (including which queue it is in) is persisted to disk. It's possible for a Processor to process the same FlowFile more than once, if it has done its job, but failed to commit the session. For example, if your custom processor created a temp table from a FlowFile. Then before the process session is committed, something happened and NiFi process session was rollback. In this case, the target database is already updated (the temp table is created), but NiFi FlowFile stays in the incoming queue. If the FlowFile is processed again, the processor will get an error indicating the table already exists. I tried to look at the logs you attached, but attachments do not seem to be delivered to this ML. I don't see anything attached. Thanks, Koji On Mon, Dec 25, 2017 at 1:43 PM, Koji Kawamura wrote: > Hi Ben, > > Just a quick recommendation for your first issue, 'The rate of the > dataflow is exceeding the provenance recording rate' warning message. > I'd recommend using WriteAheadProvenanceRepository instead of > PersistentProvenanceRepository. WriteAheadProvenanceRepository > provides better performance. > Please take a look at the documentation here. > https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#provenance-repository > > Thanks, > Koji > > On Mon, Dec 25, 2017 at 12:56 PM, 尹文才 wrote: >> Hi guys, I'm using nifi 1.4.0 to do some ETL work in my team and I have >> encountered 2 problems during my testing. >> >> The first problem is I found the nifi bulletin board was showing the >> following warning to me: >> >> 2017-12-25 01:31:00,460 WARN [Provenance Maintenance Thread-1] >> o.a.n.p.PersistentProvenanceRepository The rate of the dataflow is exceeding >> the provenance recording rate. Slowing down flow to accommodate. Currently, >> there are 96 journal files (158278228 bytes) and threshold for blocking is >> 80 (1181116006 bytes) >> >> I don't quite understand what this means, and I found also inside the >> bootstrap log that nifi restarted itself: >> >> 2017-12-25 01:31:19,249 WARN [main] org.apache.nifi.bootstrap.RunNiFi Apache >> NiFi appears to have died. Restarting... >> >> Is there anything I could do so solve this problem? >> >> The second problem is about the FlowFiles inside my flow, I actually >> implemented a few custom processors to do the ETL work. one is to extract >> multiple tables from sql server and for each flowfile out of it, it contains >> an attribute >> specifying the name of the temp ods table to create, and the second >> processor is to get all flowfiles from the first processor and create all >> the temp ods tables specified in the flowfiles' attribute. >> I found inside the app log that one of the temp table name already existed >> when trying to create the temp table, and it caused sql exception. >> After taking some time investigating in the log, I found the sql query was >> executed twice in the second processor, once before nifi restart, the second >> execution was done right after nifi restart: >> >> 2017-12-25 01:32:35,639 ERROR [Timer-Driven Process Thread-7] >> c.z.nifi.processors.ExecuteSqlCommand >> ExecuteSqlCommand[id=3c97dfd8-aaa4-3a37-626e-fed5a4822d14] 执行sql语句失败:SELECT >> TOP 0 * INTO tmp.ods_bd_e_reason_20171225013007005_5567 FROM >> dbo.ods_bd_e_reason; >> >> >> I have read the document of nifi in depth but I'm still not very aware of >> nifi's internal mechanism, my suspect is nifi didn't manage to checkpoint >> the flowfile's state(which queue it was in) in memory into flowfile >> repository >> before it was dead and after restarting it recovered the flowfile's state >> from flowfile repository and then the flowfile went through the second >> processor again and thus the sql was executed twice. Is this correct? >> >> I've attached the relevant part of app log, thanks. >> >> Regards, >> Ben
Re: The rate of the dataflow is exceeding the provenance recording rate. Slowing down flow to accommodate
Hi Ben, Just a quick recommendation for your first issue, 'The rate of the dataflow is exceeding the provenance recording rate' warning message. I'd recommend using WriteAheadProvenanceRepository instead of PersistentProvenanceRepository. WriteAheadProvenanceRepository provides better performance. Please take a look at the documentation here. https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#provenance-repository Thanks, Koji On Mon, Dec 25, 2017 at 12:56 PM, 尹文才 wrote: > Hi guys, I'm using nifi 1.4.0 to do some ETL work in my team and I have > encountered 2 problems during my testing. > > The first problem is I found the nifi bulletin board was showing the > following warning to me: > > 2017-12-25 01:31:00,460 WARN [Provenance Maintenance Thread-1] > o.a.n.p.PersistentProvenanceRepository The rate of the dataflow is exceeding > the provenance recording rate. Slowing down flow to accommodate. Currently, > there are 96 journal files (158278228 bytes) and threshold for blocking is > 80 (1181116006 bytes) > > I don't quite understand what this means, and I found also inside the > bootstrap log that nifi restarted itself: > > 2017-12-25 01:31:19,249 WARN [main] org.apache.nifi.bootstrap.RunNiFi Apache > NiFi appears to have died. Restarting... > > Is there anything I could do so solve this problem? > > The second problem is about the FlowFiles inside my flow, I actually > implemented a few custom processors to do the ETL work. one is to extract > multiple tables from sql server and for each flowfile out of it, it contains > an attribute > specifying the name of the temp ods table to create, and the second > processor is to get all flowfiles from the first processor and create all > the temp ods tables specified in the flowfiles' attribute. > I found inside the app log that one of the temp table name already existed > when trying to create the temp table, and it caused sql exception. > After taking some time investigating in the log, I found the sql query was > executed twice in the second processor, once before nifi restart, the second > execution was done right after nifi restart: > > 2017-12-25 01:32:35,639 ERROR [Timer-Driven Process Thread-7] > c.z.nifi.processors.ExecuteSqlCommand > ExecuteSqlCommand[id=3c97dfd8-aaa4-3a37-626e-fed5a4822d14] 执行sql语句失败:SELECT > TOP 0 * INTO tmp.ods_bd_e_reason_20171225013007005_5567 FROM > dbo.ods_bd_e_reason; > > > I have read the document of nifi in depth but I'm still not very aware of > nifi's internal mechanism, my suspect is nifi didn't manage to checkpoint > the flowfile's state(which queue it was in) in memory into flowfile > repository > before it was dead and after restarting it recovered the flowfile's state > from flowfile repository and then the flowfile went through the second > processor again and thus the sql was executed twice. Is this correct? > > I've attached the relevant part of app log, thanks. > > Regards, > Ben