Hi Ben,

A quick recommendation for your first issue, the 'The rate of the
dataflow is exceeding the provenance recording rate' warning:
I'd recommend using WriteAheadProvenanceRepository instead of
PersistentProvenanceRepository, since WriteAheadProvenanceRepository
provides better performance.
Please take a look at the documentation here:
https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#provenance-repository
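
As a rough sketch, the switch should be a one-line change in
nifi.properties, followed by a NiFi restart. I'm assuming a default
install where the file lives at conf/nifi.properties; please check the
guide above for the related settings before changing anything else.

```properties
# conf/nifi.properties (path assumed from a default install)
# Previous value, as seen in your log:
#   org.apache.nifi.provenance.PersistentProvenanceRepository
nifi.provenance.repository.implementation=org.apache.nifi.provenance.WriteAheadProvenanceRepository
```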

Thanks,
Koji

On Mon, Dec 25, 2017 at 12:56 PM, 尹文才 <batman...@gmail.com> wrote:
> Hi guys, I'm using nifi 1.4.0 to do some ETL work in my team and I have
> encountered 2 problems during my testing.
>
> The first problem is that the NiFi bulletin board was showing the
> following warning:
>
> 2017-12-25 01:31:00,460 WARN [Provenance Maintenance Thread-1]
> o.a.n.p.PersistentProvenanceRepository The rate of the dataflow is exceeding
> the provenance recording rate. Slowing down flow to accommodate. Currently,
> there are 96 journal files (158278228 bytes) and threshold for blocking is
> 80 (1181116006 bytes)
>
> I don't quite understand what this means. I also found in the
> bootstrap log that NiFi restarted itself:
>
> 2017-12-25 01:31:19,249 WARN [main] org.apache.nifi.bootstrap.RunNiFi Apache
> NiFi appears to have died. Restarting...
>
> Is there anything I could do to solve this problem?
>
> The second problem is about the FlowFiles in my flow. I implemented a few
> custom processors to do the ETL work. The first extracts multiple tables
> from SQL Server, and each FlowFile it emits carries an attribute
> specifying the name of the temp ODS table to create. The second
> processor takes all FlowFiles from the first processor and creates the
> temp ODS tables named in that attribute.
> In the app log I found that one of the temp table names already existed
> when the processor tried to create it, which caused a SQL exception.
> After spending some time investigating the log, I found that the SQL query
> was executed twice in the second processor: once before the NiFi restart
> and once right after it:
>
> 2017-12-25 01:32:35,639 ERROR [Timer-Driven Process Thread-7]
> c.z.nifi.processors.ExecuteSqlCommand
> ExecuteSqlCommand[id=3c97dfd8-aaa4-3a37-626e-fed5a4822d14] Failed to execute SQL statement: SELECT
> TOP 0 * INTO tmp.ods_bd_e_reason_20171225013007005_5567 FROM
> dbo.ods_bd_e_reason;
>
>
> I have read the NiFi documentation in depth, but I still don't fully
> understand NiFi's internal mechanisms. My suspicion is that NiFi did not
> manage to checkpoint the FlowFile's in-memory state (which queue it was
> in) to the FlowFile repository before it died. After restarting, it
> recovered the FlowFile's state from the FlowFile repository, so the
> FlowFile went through the second processor again and the SQL was
> executed twice. Is this correct?
>
> I've attached the relevant part of the app log, thanks.
>
> Regards,
> Ben
