[ 
https://issues.apache.org/jira/browse/NIFI-11971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Serhii Nesterov updated NIFI-11971:
-----------------------------------
    Attachment: image-2023-08-20-19-37-43-772.png

> FlowFile content is corrupted across the whole NiFi instance when 
> ProcessSession::write is called without writing any bytes to the OutputStream
> --------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: NIFI-11971
>                 URL: https://issues.apache.org/jira/browse/NIFI-11971
>             Project: Apache NiFi
>          Issue Type: Bug
>    Affects Versions: 1.23.0, 1.23.1
>            Reporter: Serhii Nesterov
>            Priority: Critical
>         Attachments: image-2023-08-20-19-31-16-598.png, 
> image-2023-08-20-19-37-43-772.png
>
>
> One of the ProcessSession::write scenarios was broken by recent refactoring 
> in the following pull request: 
> [https://github.com/apache/nifi/pull/7363/files]
> The issue is in StandardContentClaimWriteCache.java, in the 
> write(final ContentClaim claim) method, which returns the OutputStream used by 
> the OutputStreamCallback interface to let NiFi processors write FlowFile 
> content through ProcessSession::write.
> If a processor calls session.write but writes no data to the output stream, 
> none of the OutputStream's write methods is invoked, so the content claim's 
> length is never recomputed and keeps its default value of -1. Because the 
> refactoring creates a new content claim on every ProcessSession::write 
> invocation, the following formula then yields the wrong result:
> previous offset + previous length = new offset.
> For example, if the previous offset was 1000 and nothing was written to the 
> stream (so the length is -1), then 1000 + (-1) = 999: the offset is shifted 
> back by one byte. The next content therefore starts with an extra character 
> from the previous content and loses its own last character, and every other 
> FlowFile anywhere in NiFi is corrupted by this defect until the NiFi instance 
> is restarted.
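> The broken arithmetic can be sketched as a minimal, standalone simulation 
> (the class and method names below are hypothetical and only mirror the 
> offset formula, not NiFi's actual StandardContentClaimWriteCache API):

```java
public class ClaimOffsetDemo {
    // A content claim's length defaults to -1 until a write() recomputes it.
    static final long UNINITIALIZED_LENGTH = -1;

    // The offset formula used after the refactoring:
    // new offset = previous offset + previous length
    static long nextOffset(long prevOffset, long prevLength) {
        return prevOffset + prevLength;
    }

    public static void main(String[] args) {
        long prevOffset = 1000;
        // session.write() was called but wrote nothing, so the claim
        // length was never recomputed and is still -1.
        long newOffset = nextOffset(prevOffset, UNINITIALIZED_LENGTH);
        System.out.println(newOffset); // 999: shifted back by one byte
    }
}
```

> A claim that actually wrote N bytes would have length N, and the formula 
> would correctly place the next claim at the end of the previous one; only 
> the zero-byte case leaves the sentinel -1 in place.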
>  
> The following steps can be taken to reproduce the issue (critical in our 
> commercial project):
>  * Create an empty text file (“a.txt”);
>  * Create a text file with any text (“b.txt”);
>  * Package these files into a .zip archive;
>  * Put it into a file system on Azure Cloud (we use ADLS Gen2);
>  * Read the zip file and unpack its content on the NiFi canvas using 
> FetchAzureDataLakeStorage and UnpackContent processors;
>  * Start the flow with the GenerateFlowFile processor and inspect the 
> results. The empty file must be extracted before the non-empty file, 
> otherwise the issue won't reproduce. You'll see that the second FlowFile's 
> content is corrupted: its first character is an unreadable character from the 
> zip archive (the last character of the zip content fetched with 
> FetchAzureDataLakeStorage) and its last character is lost. From this point 
> on, NiFi cannot be used at all, because any other processor will corrupt 
> FlowFile content across the entire NiFi instance due to the shifted offset.
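> The observed symptom can be reproduced in isolation with a small simulation. 
> Here a plain byte array stands in for the shared content-repository file 
> (an assumed layout, not NiFi's actual storage format): the new content is 
> appended at the real end of the file, but the claim records an offset one 
> byte earlier, so a read returns the previous claim's last byte and drops the 
> new content's last byte:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class ClaimCorruptionDemo {
    // Read len bytes starting at the recorded claim offset.
    static String readClaim(byte[] repo, int offset, int len) {
        return new String(Arrays.copyOfRange(repo, offset, offset + len),
                StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        byte[] repo = new byte[32];

        // Previously fetched zip content occupies offsets 0..9
        // and ends with the byte 'Z'.
        byte[] zip = "zip-bytesZ".getBytes(StandardCharsets.US_ASCII);
        System.arraycopy(zip, 0, repo, 0, zip.length);
        int appendPos = zip.length; // 10: the real end of the file

        // The empty claim wrote nothing, so its length is still -1 and
        // the next claim's recorded offset is appendPos + (-1) = 9.
        int recordedOffset = appendPos + (-1);

        // b.txt's content is physically appended at the real position 10.
        byte[] content = "hello".getBytes(StandardCharsets.US_ASCII);
        System.arraycopy(content, 0, repo, appendPos, content.length);

        // Reading 5 bytes at the recorded offset yields "Zhell": the
        // zip's last byte leaks in and the trailing 'o' is lost.
        System.out.println(readClaim(repo, recordedOffset, content.length));
    }
}
```

> This matches the reported symptom exactly: one foreign leading character 
> from the previous claim, one character lost at the end.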
> A sample canvas:
> !image-2023-08-20-19-31-16-598.png!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
