[ https://issues.apache.org/jira/browse/NIFI-7856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17203535#comment-17203535 ]
Mengze Li commented on NIFI-7856:
---------------------------------

Here is the stack trace from one incident; hopefully it is helpful. I also attached the ls results: the files all appear to have been compressed fine, yet the logs report that the file does not exist. Could this be a race condition?

{code}
2020-09-27 21:37:34,747 INFO [Clustering Tasks Thread-3] o.a.n.c.c.ClusterProtocolHeartbeater Heartbeat created at 2020-09-27 21:37:34,616 and sent to 10.51.8.18:9999 at 2020-09-27 21:37:34,747; send took 131 millis
2020-09-27 21:37:39,660 INFO [pool-15-thread-1] o.a.n.c.r.WriteAheadFlowFileRepository Initiating checkpoint of FlowFile Repository
2020-09-27 21:37:39,660 INFO [pool-15-thread-1] o.a.n.c.r.WriteAheadFlowFileRepository Successfully checkpointed FlowFile Repository with 15079 records in 0 milliseconds
2020-09-27 21:37:49,109 INFO [pool-61-thread-1] c.a.s.k.clientlibrary.lib.worker.Worker Current stream shard assignments: shardId-000000000000
2020-09-27 21:37:49,110 INFO [pool-61-thread-1] c.a.s.k.clientlibrary.lib.worker.Worker Sleeping ...
2020-09-27 21:37:59,660 INFO [pool-15-thread-1] o.a.n.c.r.WriteAheadFlowFileRepository Initiating checkpoint of FlowFile Repository
2020-09-27 21:37:59,660 INFO [pool-15-thread-1] o.a.n.c.r.WriteAheadFlowFileRepository Successfully checkpointed FlowFile Repository with 15079 records in 0 milliseconds
2020-09-27 21:38:02,196 INFO [pool-43-thread-1] c.a.s.k.clientlibrary.lib.worker.Worker Current stream shard assignments: shardId-000000000012
2020-09-27 21:38:02,196 INFO [pool-43-thread-1] c.a.s.k.clientlibrary.lib.worker.Worker Sleeping ...
2020-09-27 21:38:19,660 INFO [pool-15-thread-1] o.a.n.c.r.WriteAheadFlowFileRepository Initiating checkpoint of FlowFile Repository
2020-09-27 21:38:19,660 INFO [pool-15-thread-1] o.a.n.c.r.WriteAheadFlowFileRepository Successfully checkpointed FlowFile Repository with 15079 records in 0 milliseconds
2020-09-27 21:38:20,688 INFO [Timer-Driven Process Thread-6] o.a.nifi.groups.StandardProcessGroup StandardProcessGroup[identifier=9e102d08-0174-1000-ffff-ffffdb703545,name=ContactLookup] is not the most recent version of the flow that is under Version Control; current version is 3; most recent version is 7
2020-09-27 21:38:20,691 INFO [Timer-Driven Process Thread-6] o.a.nifi.groups.StandardProcessGroup StandardProcessGroup[identifier=4b226950-0174-1000-0000-000064a82b74,name=EcomdashOrderProcessingMain] is not the most recent version of the flow that is under Version Control; current version is 8; most recent version is 10
2020-09-27 21:38:20,694 INFO [Timer-Driven Process Thread-6] o.a.nifi.groups.StandardProcessGroup StandardProcessGroup[identifier=e366c899-0173-1000-0000-000026d80b41,name=ContactLookup] is not the most recent version of the flow that is under Version Control; current version is 5; most recent version is 7
2020-09-27 21:38:20,697 INFO [Timer-Driven Process Thread-6] o.a.nifi.groups.StandardProcessGroup StandardProcessGroup[identifier=a17c8629-0173-1000-0000-0000055a79e8,name=HandleFailedMessages] is not the most recent version of the flow that is under Version Control; current version is 2; most recent version is 3
2020-09-27 21:38:34,799 INFO [Framework Task Thread Thread-3] o.a.n.p.store.WriteAheadStorePartition Successfully rolled over Event Writer for Provenance Event Store Partition[directory=./provenance_repository] due to MAX_TIME_REACHED
2020-09-27 21:38:34,799 ERROR [Compress Provenance Logs-1-thread-2] o.a.n.p.s.EventFileCompressor Failed to compress ./provenance_repository/1693519.prov on rollover
java.io.FileNotFoundException: ./provenance_repository/1693519.prov (No such file or directory)
    at java.io.FileInputStream.open0(Native Method)
    at java.io.FileInputStream.open(FileInputStream.java:195)
    at java.io.FileInputStream.<init>(FileInputStream.java:138)
    at org.apache.nifi.provenance.serialization.EventFileCompressor.compress(EventFileCompressor.java:164)
    at org.apache.nifi.provenance.serialization.EventFileCompressor.run(EventFileCompressor.java:115)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
2020-09-27 21:38:34,799 WARN [Compress Provenance Logs-1-thread-2] o.a.n.p.s.EventFileCompressor Failed to delete ./provenance_repository/1693519.prov; this file should be cleaned up manually
2020-09-27 21:38:34,887 INFO [Clustering Tasks Thread-3] o.a.n.c.c.ClusterProtocolHeartbeater Heartbeat created at 2020-09-27 21:38:34,748 and sent to 10.51.8.18:9999 at 2020-09-27 21:38:34,887; send took 139 millis
2020-09-27 21:38:39,660 INFO [pool-15-thread-1] o.a.n.c.r.WriteAheadFlowFileRepository Initiating checkpoint of FlowFile Repository
2020-09-27 21:38:39,660 INFO [pool-15-thread-1] o.a.n.c.r.WriteAheadFlowFileRepository Successfully checkpointed FlowFile Repository with 15079 records in 0 milliseconds
2020-09-27 21:38:54,111 INFO [pool-61-thread-1] c.a.s.k.clientlibrary.lib.worker.Worker Current stream shard assignments: shardId-000000000000
2020-09-27 21:38:54,111 INFO [pool-61-thread-1] c.a.s.k.clientlibrary.lib.worker.Worker Sleeping ...
2020-09-27 21:38:59,661 INFO [pool-15-thread-1] o.a.n.c.r.WriteAheadFlowFileRepository Initiating checkpoint of FlowFile Repository
2020-09-27 21:38:59,661 INFO [pool-15-thread-1] o.a.n.c.r.WriteAheadFlowFileRepository Successfully checkpointed FlowFile Repository with 15079 records in 0 milliseconds
2020-09-27 21:39:03,202 INFO [pool-43-thread-1] c.a.s.k.clientlibrary.lib.worker.Worker Current stream shard assignments: shardId-000000000012
2020-09-27 21:39:03,202 INFO [pool-43-thread-1] c.a.s.k.clientlibrary.lib.worker.Worker Sleeping ...
2020-09-27 21:39:19,661 INFO [pool-15-thread-1] o.a.n.c.r.WriteAheadFlowFileRepository Initiating checkpoint of FlowFile Repository
2020-09-27 21:39:19,661 INFO [pool-15-thread-1] o.a.n.c.r.WriteAheadFlowFileRepository Successfully checkpointed FlowFile Repository with 15079 records in 0 milliseconds
2020-09-27 21:39:20,156 INFO [Write-Ahead Local State Provider Maintenance] org.wali.MinimalLockingWriteAheadLog org.wali.MinimalLockingWriteAheadLog@1fe275d8 checkpointed with 4 Records and 0 Swap Files in 4 milliseconds (Stop-the-world time = 0 milliseconds, Clear Edit Logs time = 0 millis), max Transaction ID 1312
{code}

> Provenance failed to be compressed after nifi upgrade to 1.12
> -------------------------------------------------------------
>
> Key: NIFI-7856
> URL: https://issues.apache.org/jira/browse/NIFI-7856
> Project: Apache NiFi
> Issue Type: Bug
> Affects Versions: 1.12.0
> Reporter: Mengze Li
> Priority: Major
> Attachments: ls.png, screenshot-1.png
>
>
> We upgraded our nifi cluster from 1.11.3 to 1.12.0.
> The nodes come up and everything looks to be functional. I can see 1.12.0 is
> running.
> Later on, we discovered that the data provenance is missing. From checking
> our logs, we see tons of errors compressing the logs.
> {code}
> 2020-09-28 03:38:35,205 ERROR [Compress Provenance Logs-1-thread-1] o.a.n.p.s.EventFileCompressor Failed to compress ./provenance_repository/2752821.prov on rollover
> {code}
> This didn't happen in 1.11.3.
> Is this a known issue? We are considering reverting back if there is no
> solution for this since we can't go prod with no/broken data provenance.

--
This message was sent by Atlassian Jira (v8.3.4#803005)