[ https://issues.apache.org/jira/browse/NIFI-748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15005923#comment-15005923 ]

ASF GitHub Bot commented on NIFI-748:
-------------------------------------

Github user joewitt commented on a diff in the pull request:

    https://github.com/apache/nifi/pull/123#discussion_r44872811
  
    --- Diff: nifi-nar-bundles/nifi-provenance-repository-bundle/nifi-persistent-provenance-repository/src/main/java/org/apache/nifi/provenance/lucene/DocsReader.java ---
    @@ -100,101 +96,61 @@ private ProvenanceEventRecord getRecord(final Document d, final RecordReader rea
                 }
             }
     
    -        if ( record == null ) {
    -            throw new IOException("Failed to find Provenance Event " + d);
    -        } else {
    -            return record;
    +        if (record == null) {
    +            logger.warn("Failed to read Provenance Event for '" + d + "'. The event file may be missing or corrupted");
             }
    -    }
     
    +        return record;
    +    }
     
         public Set<ProvenanceEventRecord> read(final List<Document> docs, final Collection<Path> allProvenanceLogFiles,
    -        final AtomicInteger retrievalCount, final int maxResults, final int maxAttributeChars) throws IOException {
    -        if (retrievalCount.get() >= maxResults) {
    -            return Collections.emptySet();
    -        }
    -
    -        LuceneUtil.sortDocsForRetrieval(docs);
    -
    -        RecordReader reader = null;
    -        String lastStorageFilename = null;
    -        final Set<ProvenanceEventRecord> matchingRecords = new LinkedHashSet<>();
    +            final AtomicInteger retrievalCount, final int maxResults, final int maxAttributeChars) throws IOException {
     
             final long start = System.nanoTime();
    -        int logFileCount = 0;
    -
    -        final Set<String> storageFilesToSkip = new HashSet<>();
    -        int eventsReadThisFile = 0;
     
    -        try {
    -            for (final Document d : docs) {
    -                final String storageFilename = d.getField(FieldNames.STORAGE_FILENAME).stringValue();
    -                if ( storageFilesToSkip.contains(storageFilename) ) {
    -                    continue;
    -                }
    -
    -                try {
    -                    if (reader != null && storageFilename.equals(lastStorageFilename)) {
    -                        matchingRecords.add(getRecord(d, reader));
    -                        eventsReadThisFile++;
    -
    -                        if ( retrievalCount.incrementAndGet() >= maxResults ) {
    -                            break;
    -                        }
    -                    } else {
    -                        logger.debug("Opening log file {}", storageFilename);
    -
    -                        logFileCount++;
    -                        if (reader != null) {
    -                            reader.close();
    -                        }
    +        Set<ProvenanceEventRecord> matchingRecords = new LinkedHashSet<>();
    +        if (retrievalCount.get() >= maxResults) {
    +            return matchingRecords;
    --- End diff --
    
    We definitely need to be careful here. Performance and the implications for GC are critical in anything provenance-related.


> If unable to find a specific Provenance event, should not fail entire search
> ----------------------------------------------------------------------------
>
>                 Key: NIFI-748
>                 URL: https://issues.apache.org/jira/browse/NIFI-748
>             Project: Apache NiFi
>          Issue Type: Improvement
>          Components: Core Framework
>            Reporter: Mark Payne
>            Assignee: Oleg Zhurakousky
>             Fix For: 0.4.0
>
>
> We have a case where NiFi is running with the provenance repository being 
> written to a disk that can be ejected. The disk was accidentally ejected 
> while running. A Provenance Event appears to have been indexed, but the 
> event is not in the repo.
> Specifically, we are reaching Line 104 of DocsReader:
> {code}
> throw new IOException("Failed to find Provenance Event " + d);
> {code}
> As a result, searching for a specific Component ID is returning an error, so 
> we can't search on that Component ID at all (unless we shrink the time range 
> to a time when that didn't occur).
> We should generate a warning, and notify the user that X number of events 
> could not be found and show what we can, rather than erroring out entirely.
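
The behavior requested above (skip events that cannot be found, count them, and return the rest) could be sketched roughly as follows. This is only an illustration and not the code from the pull request: the class TolerantEventLookup, its Result type, and the readAll method are hypothetical names, and a plain lookup function that returns null for a missing event stands in for the real RecordReader.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical sketch: none of these names exist in NiFi.
public class TolerantEventLookup {

    /** Events that were found, plus a count of those that were not. */
    public static final class Result<E> {
        public final List<E> found;
        public final int missingCount;

        Result(final List<E> found, final int missingCount) {
            this.found = found;
            this.missingCount = missingCount;
        }
    }

    /**
     * Looks up each key in order. A null lookup result is treated as a
     * missing event: it is counted and skipped instead of aborting the
     * whole search with an exception.
     */
    public static <K, E> Result<E> readAll(final List<K> keys,
                                           final Function<K, E> lookup,
                                           final int maxResults) {
        final List<E> found = new ArrayList<>();
        int missing = 0;

        for (final K key : keys) {
            if (found.size() >= maxResults) {
                break; // stop once the caller's result limit is reached
            }
            final E event = lookup.apply(key);
            if (event == null) {
                missing++; // assumed convention: null means "not in the repo"
                continue;
            }
            found.add(event);
        }
        return new Result<>(found, missing);
    }
}
```

A caller could then log a single warning when missingCount is greater than zero and show the user the events in found, which avoids one log line per missing event on a hot path.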



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
