Serhii Nesterov created NIFI-11971:
--------------------------------------

             Summary: FlowFile content is corrupted across the whole NiFi 
instance when ProcessSession::write is called without writing any bytes to 
the OutputStream
                 Key: NIFI-11971
                 URL: https://issues.apache.org/jira/browse/NIFI-11971
             Project: Apache NiFi
          Issue Type: Bug
    Affects Versions: 1.23.1, 1.23.0
            Reporter: Serhii Nesterov


One of the scenarios for ProcessSession::write was broken by recent code 
refactoring in the following pull request: 
[https://github.com/apache/nifi/pull/7363/files]

The issue is in StandardContentClaimWriteCache.java, in the write(final 
ContentClaim claim) method, which returns the OutputStream passed to the 
OutputStreamCallback interface so that NiFi processors can write FlowFile 
content through ProcessSession::write.
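To illustrate the call shape involved, here is a minimal sketch. The interface below is a hypothetical stand-in mirroring NiFi's OutputStreamCallback, not the actual NiFi classes; it only shows the two scenarios at issue, a callback that writes bytes and one that completes without writing any:

```java
import java.io.IOException;
import java.io.OutputStream;

// Hypothetical stand-in for org.apache.nifi.processor.io.OutputStreamCallback,
// used here only to demonstrate the call pattern; this is not NiFi code.
interface OutputStreamCallback {
    void process(OutputStream out) throws IOException;
}

class WritePatternDemo {
    // Typical case: the processor's callback writes the FlowFile content.
    static void normalWrite(OutputStream out) throws IOException {
        OutputStreamCallback cb = stream -> stream.write("hello".getBytes());
        cb.process(out);
    }

    // Problematic case from this report: the callback finishes without
    // writing a single byte, so no OutputStream.write overload ever runs.
    static void emptyWrite(OutputStream out) throws IOException {
        OutputStreamCallback cb = stream -> { /* intentionally writes nothing */ };
        cb.process(out);
    }
}
```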

If a processor calls session.write but writes no data to the output stream, 
none of the write methods of the OutputStream is invoked, so the length of the 
content claim is never recomputed and keeps its default value of -1. Because 
the recent refactoring creates a new content claim on each 
ProcessSession::write invocation, the following formula yields a wrong result:

previous offset + previous length = new offset.

For example, if the previous offset was 1000 and nothing was written to the 
stream (length is -1), then 1000 + (-1) gives 999: the offset is shifted back 
by one. The next content claim therefore gains an extra character from the 
previous content at the beginning and loses its own last character at the end, 
and every other FlowFile anywhere in NiFi is corrupted by this defect until 
the NiFi instance is restarted.
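The arithmetic above can be sketched as follows. This is a simplified model of the offset computation described in the report, not the actual StandardContentClaimWriteCache code; the guarded variant shows one way the -1 "nothing written" sentinel could be neutralized:

```java
// Simplified model of the post-refactor offset formula:
//   new offset = previous offset + previous length
// When a claim was never written to, its length keeps the -1 default and
// shifts every subsequent offset back by one.
class OffsetShiftDemo {
    static long nextOffset(long previousOffset, long previousLength) {
        return previousOffset + previousLength;
    }

    // Guarded variant: treat the -1 sentinel as "zero bytes written",
    // which is the behavior the report implies is expected.
    static long nextOffsetGuarded(long previousOffset, long previousLength) {
        return previousOffset + Math.max(previousLength, 0);
    }
}
```

With a previous offset of 1000 and an untouched claim, nextOffset returns 999 (the corrupting shift), while the guarded variant keeps the offset at 1000.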

 

The following steps reproduce the issue (critical in our commercial project):
 * Create an empty text file (“a.txt”);
 * Create a text file with any text (“b.txt”);
 * Package these files into a .zip archive;
 * Put it into a file system on Azure Cloud (we use ADLS Gen2);
 * Read the zip file and unpack its content on the NiFi canvas using the 
FetchAzureDataLakeStorage and UnpackContent processors;
 * Start the flow with the GenerateFlowFile processor and inspect the results.

The empty file must be extracted before the non-empty file, otherwise the 
issue won’t reproduce. You’ll see that the second FlowFile’s content is 
corrupted: its first character is an unreadable byte from the zip archive (the 
last byte of the zip content fetched with FetchAzureDataLakeStorage) and its 
own last character is lost. From this point on, NiFi cannot be used at all, 
because any other processor will corrupt FlowFile content across the entire 
NiFi instance due to the shifted offset.

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
