Re: The rate of the dataflow is exceeding the provenance recording rate. Slowing down flow to accommodate

2017-12-24 Thread Koji Kawamura
Hi Ben,

You can make NiFi's logging more verbose by editing:
NIFI_HOME/conf/logback.xml

For example, adding the following entry will reveal how the NiFi repositories run:
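
The entry itself is not present in this copy of the message. As a sketch only, a logback logger entry of the following shape raises the log level for a single package. The package name org.apache.nifi.controller.repository and the DEBUG level are assumptions chosen for illustration, not text recovered from the original message.

```xml
<!-- Hypothetical example: the logger name and level are assumptions. -->
<!-- Place inside the <configuration> element of NIFI_HOME/conf/logback.xml. -->
<logger name="org.apache.nifi.controller.repository" level="DEBUG"/>
```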






Thanks,
Koji


Re: The rate of the dataflow is exceeding the provenance recording rate. Slowing down flow to accommodate

2017-12-24 Thread 尹文才
Hi Koji, I also didn't find anything related to the unexpected shutdown in
my logs. Is there anything I can do to make NiFi write more verbose
information to the logs?

Regards,
Ben


Re: The rate of the dataflow is exceeding the provenance recording rate. Slowing down flow to accommodate

2017-12-24 Thread Koji Kawamura
Hi Ben,

I looked at the log expecting to see some indication of the cause of
the shutdown, but couldn't find any.
The PersistentProvenanceRepository rate warning is just a warning, and
it shouldn't trigger an unexpected shutdown. I suspect another cause,
such as the OOM killer, but I can't investigate any further with only
these logs.

Thanks,
Koji


Re: The rate of the dataflow is exceeding the provenance recording rate. Slowing down flow to accommodate

2017-12-24 Thread 尹文才
Hi Koji, one more thing: do you have any idea why my first issue leads to
the unexpected shutdown of NiFi? According to the warning text, it should
only slow down the flow. Thanks.

Regards,
Ben


Re: The rate of the dataflow is exceeding the provenance recording rate. Slowing down flow to accommodate

2017-12-24 Thread 尹文才
Hi Koji, thanks for your help. For the first issue, I will switch to
the WriteAheadProvenanceRepository implementation.

For the second issue, I have uploaded the relevant part of my log file to
Google Drive. The link is:
https://drive.google.com/open?id=1oxAkSUyYZFy6IWZSeWqHI8e9Utnw1XAj

Do you mean that a custom processor can process a FlowFile twice only
when it is trying to commit the session but is interrupted (for example,
because NiFi went down), so the FlowFile remains in the original queue?

If you need to see the full log file, please let me know. Thanks.

Regards,
Ben


Re: The rate of the dataflow is exceeding the provenance recording rate. Slowing down flow to accommodate

2017-12-24 Thread Koji Kawamura
Hi Ben,

For your 2nd issue, when the NiFi flow engine executes a Processor's
onTrigger, NiFi commits the process session afterwards by calling
session.commit().
https://github.com/apache/nifi/blob/master/nifi-api/src/main/java/org/apache/nifi/processor/AbstractProcessor.java#L28
Once a process session is committed, the FlowFile state (including
which queue it is in) is persisted to disk.

It's possible for a Processor to process the same FlowFile more than
once if it has done its job but failed to commit the session.
For example, suppose your custom processor created a temp table from a
FlowFile and then, before the process session was committed, something
happened and the NiFi process session was rolled back. In this case, the
target database has already been updated (the temp table exists), but
the NiFi FlowFile stays in the incoming queue. If the FlowFile is
processed again, the processor will get an error indicating that the
table already exists.
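
The scenario above can be sketched as a toy model. This is only an illustration in Python, not NiFi's Java API: ToyDatabase, run_once and replay are hypothetical names, and the queue handling is deliberately simplified. It shows why work done against an external system survives a rollback, and how an idempotent "create if absent" step would tolerate the replay.

```python
# Toy model only: these names are hypothetical and are not NiFi's API.
from collections import deque


class TableExistsError(Exception):
    """Stands in for the SQL error raised when a table already exists."""


class ToyDatabase:
    """External system whose changes are NOT undone by a session rollback."""

    def __init__(self):
        self.tables = set()

    def create_table(self, name):
        if name in self.tables:
            raise TableExistsError(name)
        self.tables.add(name)

    def create_table_if_absent(self, name):
        # Idempotent variant: a replayed FlowFile finds the table and moves on.
        self.tables.add(name)


def run_once(queue, done, work, crash_before_commit=False):
    """Process the FlowFile at the head of the queue in one toy session.

    The FlowFile leaves the incoming queue only on commit. If the session
    never commits, the FlowFile stays queued although `work` already ran.
    """
    flowfile = queue[0]            # peek; not removed until commit
    work(flowfile)                 # side effect on the external system
    if crash_before_commit:
        return False               # rollback: the queue is left untouched
    done.append(queue.popleft())   # commit: the transfer becomes durable
    return True


def replay(idempotent):
    """Run the same FlowFile twice, with a failed commit in between."""
    db = ToyDatabase()
    create = db.create_table_if_absent if idempotent else db.create_table
    queue, done = deque(["tmp_table_1"]), []
    run_once(queue, done, create, crash_before_commit=True)  # work done, no commit
    try:
        run_once(queue, done, create)  # the same FlowFile is processed again
    except TableExistsError:
        return "table already exists"
    return "committed"
```

In this model, replay(idempotent=False) ends with the "table already exists" error, while replay(idempotent=True) commits on the second attempt.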

I tried to look at the logs you attached, but attachments do not seem
to be delivered to this mailing list. I don't see anything attached.

Thanks,
Koji




Re: The rate of the dataflow is exceeding the provenance recording rate. Slowing down flow to accommodate

2017-12-24 Thread Koji Kawamura
Hi Ben,

Just a quick recommendation for your first issue, the 'The rate of the
dataflow is exceeding the provenance recording rate' warning message:
I'd recommend using WriteAheadProvenanceRepository instead of
PersistentProvenanceRepository. WriteAheadProvenanceRepository
provides better performance.
Please take a look at the documentation here.
https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#provenance-repository
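
As a sketch of the switch, the provenance implementation is selected by a single property in NIFI_HOME/conf/nifi.properties. The fragment below assumes the other provenance settings are left at their existing values; check the guide linked above for the options that apply to your version.

```properties
# nifi.properties: select the provenance repository implementation.
# NiFi needs a restart for the change to take effect.
nifi.provenance.repository.implementation=org.apache.nifi.provenance.WriteAheadProvenanceRepository
```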

Thanks,
Koji



The rate of the dataflow is exceeding the provenance recording rate. Slowing down flow to accommodate

2017-12-24 Thread 尹文才
Hi guys, I'm using NiFi 1.4.0 to do some ETL work in my team, and I
encountered 2 problems during my testing.

The first problem is that the NiFi bulletin board was showing the
following warning:

2017-12-25 01:31:00,460 WARN [Provenance Maintenance Thread-1]
o.a.n.p.PersistentProvenanceRepository The rate of the dataflow is
exceeding the provenance recording rate. Slowing down flow to accommodate.
Currently, there are 96 journal files (158278228 bytes) and threshold for
blocking is 80 (1181116006 bytes)
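
One way to read the figures in this warning is to compare each reported value with its threshold. The sketch below is plain Python, not part of NiFi, and the helper name explain is hypothetical. It assumes the repository throttles when either figure exceeds its threshold, which is an assumption about NiFi's behaviour and is not stated in the message.

```python
import re

# The warning text as reported above, joined into one string.
WARNING = (
    "The rate of the dataflow is exceeding the provenance recording rate. "
    "Slowing down flow to accommodate. Currently, there are 96 journal files "
    "(158278228 bytes) and threshold for blocking is 80 (1181116006 bytes)"
)

PATTERN = re.compile(
    r"there are (\d+) journal files \((\d+) bytes\) "
    r"and threshold for blocking is (\d+) \((\d+) bytes\)"
)


def explain(message):
    """Return which of the two reported figures is over its threshold."""
    files, size, max_files, max_size = map(int, PATTERN.search(message).groups())
    return {
        "journal_files_over": files > max_files,
        "bytes_over": size > max_size,
        "bytes_used_fraction": round(size / max_size, 3),
    }
```

For the message above, the journal file count (96 against 80) is over its threshold, while the byte count is at roughly 13% of its threshold.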

I don't quite understand what this means. I also found in the
bootstrap log that NiFi restarted itself:

2017-12-25 01:31:19,249 WARN [main] org.apache.nifi.bootstrap.RunNiFi
Apache NiFi appears to have died. Restarting...

Is there anything I could do to solve this problem?

The second problem is about the FlowFiles in my flow. I implemented a
few custom processors to do the ETL work. The first extracts multiple
tables from SQL Server; each FlowFile it emits carries an attribute
specifying the name of the temp ODS table to create. The second
processor takes all FlowFiles from the first processor and creates the
temp ODS tables named in those attributes.
I found in the app log that one of the temp table names already existed
when the processor tried to create the table, which caused a SQL exception.
After some time investigating the log, I found that the SQL query was
executed twice in the second processor: once before the NiFi restart and
once right after it:

2017-12-25 01:32:35,639 ERROR [Timer-Driven Process Thread-7]
c.z.nifi.processors.ExecuteSqlCommand
ExecuteSqlCommand[id=3c97dfd8-aaa4-3a37-626e-fed5a4822d14] Failed to execute SQL statement: SELECT
TOP 0 * INTO tmp.ods_bd_e_reason_20171225013007005_5567 FROM
dbo.ods_bd_e_reason;


I have read the NiFi documentation in depth, but I'm still not very
familiar with NiFi's internal mechanisms. My suspicion is that NiFi didn't
manage to checkpoint the FlowFile's in-memory state (which queue it was in)
to the FlowFile repository before it died. After restarting, it recovered
the FlowFile's state from the FlowFile repository, so the FlowFile went
through the second processor again and the SQL was executed twice. Is this
correct?

I've attached the relevant part of the app log. Thanks.

Regards,
Ben