[ https://issues.apache.org/jira/browse/NIFI-2395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15394933#comment-15394933 ]
Joseph Witt commented on NIFI-2395:
-----------------------------------

Also, [~badavis], can you please share the configuration settings you have in nifi.properties for the following?

{quote}
nifi.provenance.repository.directory.prov1=/repos/prov/prov-repo1
nifi.provenance.repository.max.storage.time=24 hours
nifi.provenance.repository.max.storage.size=50 GB
nifi.provenance.repository.rollover.time=30 secs
nifi.provenance.repository.rollover.size=100 MB
nifi.provenance.repository.query.threads=6
nifi.provenance.repository.indexing.threads=2
nifi.provenance.repository.compress.on.rollover=true
nifi.provenance.repository.always.sync=false
nifi.provenance.repository.journal.count=16
# Comma-separated list of fields. Fields that are not indexed will not be searchable. Valid fields are:
# EventType, FlowFileUUID, Filename, TransitURI, ProcessorID, AlternateIdentifierURI, Relationship, Details
nifi.provenance.repository.indexed.fields=EventType, FlowFileUUID, Filename, ProcessorID
# FlowFile Attributes that should be indexed and made searchable
nifi.provenance.repository.indexed.attributes=twitter.msg, language
# Large values for the shard size will result in more Java heap usage when searching the Provenance Repository
# but should provide better performance
nifi.provenance.repository.index.shard.size=500 MB
{quote}
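If it helps with gathering that, here is a minimal sketch for dumping the relevant settings from a nifi.properties file. The file path and class name are illustrative assumptions, not part of NiFi; the program simply prints every nifi.provenance.repository.* property so the output can be pasted straight into the ticket.

{code}
import java.io.FileInputStream;
import java.io.IOException;
import java.util.Properties;

public class DumpProvenanceConfig {

    public static void main(final String[] args) throws IOException {
        // Assumed location; pass the actual path to your nifi.properties as the first argument.
        final String path = args.length > 0 ? args[0] : "conf/nifi.properties";

        final Properties props = new Properties();
        try (FileInputStream in = new FileInputStream(path)) {
            props.load(in);
        }

        // Print every provenance repository property, sorted, for easy copy/paste.
        props.stringPropertyNames().stream()
            .filter(name -> name.startsWith("nifi.provenance.repository."))
            .sorted()
            .forEach(name -> System.out.println(name + "=" + props.getProperty(name)));
    }
}
{code}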
> PersistentProvenanceRepository Deadlocks caused by a blocked journal merge
> --------------------------------------------------------------------------
>
>                 Key: NIFI-2395
>                 URL: https://issues.apache.org/jira/browse/NIFI-2395
>             Project: Apache NiFi
>          Issue Type: Bug
>          Components: Core Framework
>    Affects Versions: 0.6.0, 0.7.0
>            Reporter: Brian Davis
>            Assignee: Joseph Witt
>            Priority: Blocker
>
> I have a NiFi instance that I have been running for about a week, and it has deadlocked at least three times in that period. By deadlock I mean the whole NiFi instance stops making any progress on flowfiles. I looked at the stack trace and there are a lot of threads stuck doing tasks in the PersistentProvenanceRepository. Looking at the code, I think this is what is happening:
> There is a ReadWriteLock on which all the readers are waiting for a writer. The writer is stuck in this loop:
> {code}
> while (journalFileCount > journalCountThreshold || repoSize > sizeThreshold) {
>     // if a shutdown happens while we are in this loop, kill the rollover thread and break
>     if (this.closed.get()) {
>         if (future != null) {
>             future.cancel(true);
>         }
>         break;
>     }
>
>     if (repoSize > sizeThreshold) {
>         logger.debug("Provenance Repository has exceeded its size threshold; will trigger purging of oldest events");
>         purgeOldEvents();
>
>         journalFileCount = getJournalCount();
>         repoSize = getSize(getLogFiles(), 0L);
>         continue;
>     } else {
>         // if we are constrained by the number of journal files rather than the size of the repo,
>         // then we will just sleep a bit because another thread is already actively merging the journals,
>         // due to the runnable that we scheduled above
>         try {
>             Thread.sleep(100L);
>         } catch (final InterruptedException ie) {
>         }
>     }
>
>     logger.debug("Provenance Repository is still behind. Keeping flow slowed down "
>         + "to accommodate. Currently, there are {} journal files ({} bytes) and "
>         + "threshold for blocking is {} ({} bytes)",
>         journalFileCount, repoSize, journalCountThreshold, sizeThreshold);
>
>     journalFileCount = getJournalCount();
>     repoSize = getSize(getLogFiles(), 0L);
> }
>
> logger.info("Provenance Repository has now caught up with rolling over journal files. Current number of "
>     + "journal files to be rolled over is {}", journalFileCount);
> }
> {code}
> My NiFi is stuck at that sleep indefinitely. The reason it cannot move forward is that the thread doing the merge is itself stuck. The merge thread is at:
> {code}
> accepted = eventQueue.offer(new Tuple<>(record, blockIndex), 10, TimeUnit.MILLISECONDS);
> {code}
> so the queue is full.
> What I believe happened is that the callables created here:
> {code}
> final Callable<Object> callable = new Callable<Object>() {
>     @Override
>     public Object call() throws IOException {
>         while (!eventQueue.isEmpty() || !finishedAdding.get()) {
>             final Tuple<StandardProvenanceEventRecord, Integer> tuple;
>             try {
>                 tuple = eventQueue.poll(10, TimeUnit.MILLISECONDS);
>             } catch (final InterruptedException ie) {
>                 continue;
>             }
>
>             if (tuple == null) {
>                 continue;
>             }
>
>             indexingAction.index(tuple.getKey(), indexWriter, tuple.getValue());
>         }
>
>         return null;
>     }
> {code}
> finished before the offer added its first event, because I do not see any Index Provenance Events threads. My guess is that the while-loop condition is wrong and should be && instead of ||.
> I upped the thread count for index creation from 1 to 3 to see if that helps. I can report back later this week on whether it does.
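For readers reconstructing the failure mode, below is a small standalone sketch of the pattern described in this report; it is not NiFi code, and the class and variable names are illustrative. A bounded queue whose consumer has already returned leaves the producer retrying offer() indefinitely, which matches the state the merge thread appears to be stuck in, and everything waiting behind the repository's write lock stalls with it.

{code}
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

// Illustrative names only: an "indexer" consumer that has already returned and a "merge"
// producer that keeps retrying offer() against a bounded queue nobody is draining.
public class BlockedMergeSketch {

    public static void main(final String[] args) throws InterruptedException {
        final BlockingQueue<Integer> eventQueue = new ArrayBlockingQueue<>(4); // small bound so the effect shows quickly

        // Consumer that drains whatever is present and then returns. By the time the producer
        // runs, no indexing thread is left, matching the "no Index Provenance Events threads" observation.
        final Thread indexer = new Thread(() -> {
            while (!eventQueue.isEmpty()) {
                eventQueue.poll();
            }
        });
        indexer.start();
        indexer.join();

        // Producer: same shape as the quoted offer() call, retrying until the event is accepted.
        for (int record = 0; record < 10; record++) {
            boolean accepted = false;
            int attempts = 0;
            while (!accepted && attempts < 100) { // capped here only so the demo terminates
                accepted = eventQueue.offer(record, 10, TimeUnit.MILLISECONDS);
                attempts++;
            }
            if (!accepted) {
                // This is the state the merge thread is stuck in: the queue is full and nobody is
                // draining it, so the merge never finishes and readers waiting behind the writer never proceed.
                System.out.println("record " + record + " could not be enqueued; the merge would spin here forever");
                return;
            }
        }
    }
}
{code}

Run as-is, the sketch fills the four-slot queue, reports the fifth record as stuck, and exits; the report indicates the real merge thread keeps retrying that offer with no cap, which is why it never comes back and the rollover wait loop above never sees the journal count drop.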