Hi guys, I'm using NiFi 1.4.0 for some ETL work in my team, and I have
encountered two problems during testing.

The first problem is that the NiFi bulletin board showed the following
warning:

2017-12-25 01:31:00,460 WARN [Provenance Maintenance Thread-1]
o.a.n.p.PersistentProvenanceRepository The rate of the dataflow is
exceeding the provenance recording rate. Slowing down flow to accommodate.
Currently, there are 96 journal files (158278228 bytes) and threshold for
blocking is 80 (1181116006 bytes)

I don't quite understand what this means. I also found in the bootstrap
log that NiFi had restarted itself:

2017-12-25 01:31:19,249 WARN [main] org.apache.nifi.bootstrap.RunNiFi
Apache NiFi appears to have died. Restarting...

Is there anything I can do to solve this problem?

The second problem concerns the FlowFiles in my flow. I implemented a few
custom processors for the ETL work. The first extracts multiple tables
from SQL Server, and each FlowFile it emits carries an attribute specifying
the name of the temporary ODS table to create. The second processor takes
all FlowFiles from the first one and creates the temporary ODS tables named
in their attributes.
In the app log I found that one of the temp table names already existed
when the processor tried to create it, which caused an SQL exception.
After some time investigating the log, I found that the SQL query was
executed twice in the second processor: once before the NiFi restart, and
again right after it:

2017-12-25 01:32:35,639 ERROR [Timer-Driven Process Thread-7]
c.z.nifi.processors.ExecuteSqlCommand
ExecuteSqlCommand[id=3c97dfd8-aaa4-3a37-626e-fed5a4822d14] Failed to execute SQL statement: SELECT
TOP 0 * INTO tmp.ods_bd_e_reason_20171225013007005_5567 FROM
dbo.ods_bd_e_reason;
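One way to make the second processor tolerate a re-run is to make the statement itself idempotent. The sketch below is an assumption, not existing code: the hypothetical helper `guardedCreateSql` wraps the `SELECT TOP 0 * INTO` statement from the log in a T-SQL `OBJECT_ID(..., 'U')` existence check, so executing it twice leaves one table and raises no error. The name validation is also an added assumption, since the table names come from FlowFile attributes and are concatenated directly into SQL.

```java
public class IdempotentTempTable {

    // Hypothetical guard: only accept plain [schema.]name identifiers, because the
    // names come from FlowFile attributes and are concatenated into the SQL text.
    private static String checkName(String name) {
        if (!name.matches("[A-Za-z0-9_]+(\\.[A-Za-z0-9_]+)?")) {
            throw new IllegalArgumentException("unexpected table name: " + name);
        }
        return name;
    }

    // Hypothetical helper: wraps the CREATE-via-SELECT-INTO statement in an existence
    // check. OBJECT_ID(name, 'U') returns NULL when no user table with that name exists,
    // so a second execution (e.g. a replay after a restart) does nothing.
    static String guardedCreateSql(String tempTable, String sourceTable) {
        String target = checkName(tempTable);
        String source = checkName(sourceTable);
        return "IF OBJECT_ID(N'" + target + "', N'U') IS NULL "
             + "SELECT TOP 0 * INTO " + target + " FROM " + source + ";";
    }

    public static void main(String[] args) {
        // Table names taken from the log line above.
        System.out.println(guardedCreateSql(
            "tmp.ods_bd_e_reason_20171225013007005_5567", "dbo.ods_bd_e_reason"));
    }
}
```

The check-then-create is not atomic against a concurrent creator, but it is enough when the only duplicate is the same FlowFile being processed a second time.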


I have read the NiFi documentation in depth, but I'm still not very
familiar with NiFi's internals. My suspicion is that NiFi didn't manage to
checkpoint the FlowFile's in-memory state (which queue it was in) to the
FlowFile repository before it died. After the restart, it recovered the
FlowFile's state from the FlowFile repository, so the FlowFile went through
the second processor again and the SQL was executed twice. Is this correct?
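The suspected sequence can be illustrated with a toy model. This is not NiFi code; it only assumes what the hypothesis above states: the SQL side effect happens first, and the FlowFile's removal from its queue becomes durable only afterwards, when the session's changes reach the FlowFile repository. A crash between the two leaves the FlowFile queued, so it is processed again after the restart. The class and method names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.Set;

/** Toy model of at-least-once processing with a crash between side effect and commit. */
public class ReplayModel {

    /**
     * Processes one queued table name; attempt 1 "crashes" before committing.
     * Returns how many times the CREATE side effect ran, or throws when a
     * non-idempotent create hits an existing table.
     */
    static int run(boolean idempotent) {
        Deque<String> durableQueue = new ArrayDeque<>(); // stands in for the FlowFile repository's view of the queue
        Set<String> database = new HashSet<>();          // names of tables that exist
        durableQueue.add("tmp.ods_bd_e_reason_20171225013007005_5567");
        int executions = 0;

        // Attempt 1 runs before the crash; attempt 2 is the replay after restart.
        for (int attempt = 1; attempt <= 2 && !durableQueue.isEmpty(); attempt++) {
            String table = durableQueue.peekFirst();     // read, but not yet removed
            executions++;
            boolean created = database.add(table);       // the SQL side effect
            if (!created && !idempotent) {
                throw new IllegalStateException("table already exists: " + table);
            }
            boolean crash = (attempt == 1);
            if (!crash) {
                durableQueue.pollFirst();                // commit: removal becomes durable
            }
        }
        return executions;
    }

    public static void main(String[] args) {
        System.out.println("idempotent executions: " + run(true));
        try {
            run(false);
        } catch (IllegalStateException e) {
            System.out.println("non-idempotent replay failed: " + e.getMessage());
        }
    }
}
```

In this model the side effect always runs twice; only the idempotent variant survives the replay, which matches the SQL exception seen in the log.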

I've attached the relevant part of the app log. Thanks.

Regards,
Ben
