[IO] Change in behavior in Commons FileUpload after upgrade to Commons IO 2.16.1

2024-04-10 Thread Stephan Markwalder
Hi,

Today, I found the following questions in 
https://github.com/apache/commons-io/pull/609:

> The behavior for a negative threshold should be the same as 0 IMO. WDYT?
> What does it even mean that a threshold is negative?
> Writing zero bytes writes nothing, so there is nothing to reach until you at 
> least write one byte.
> Am I missing something?

I would like to highlight a "use case" for a negative threshold, and how the 
change to disallow a negative threshold might impact existing code.

I upgraded from Commons IO 2.15.1 to 2.16.1 yesterday and found a small change 
in the behavior of Commons FileUpload when uploading and processing empty 
files. The effect is visible only when passing a negative value as file size 
threshold to Commons FileUpload. Here is a small extract from the Javadoc of 
Commons FileUpload, class `DefaultFileItemFactory`, constructor parameter 
`sizeThreshold`:

> sizeThreshold - The threshold, in bytes, below which items will be
> retained in memory and above which they will be stored as a file.
(source: 
https://javadoc.io/doc/commons-fileupload/commons-fileupload/latest/index.html)

By passing a negative value for `sizeThreshold`, Commons FileUpload can be 
configured to disable the in-memory caching for all uploaded files, including 
empty files with a size of 0 bytes. As a result, `DiskFileItem` objects created 
by Commons FileUpload will always have a `File` instance set internally, even 
for empty files.

`DiskFileItem` in Commons FileUpload internally makes use of 
`DeferredFileOutputStream`, and therefore `ThresholdingOutputStream`. At some 
point it calls `isThresholdExceeded()` to check whether the size of the 
uploaded file exceeds the given threshold. By disallowing a negative threshold, 
empty files will now be treated differently by Commons FileUpload. With a size 
of 0 bytes, they will not exceed the enforced minimum threshold of 0 bytes 
anymore, and their data will therefore be kept in memory. This can break 
follow-up code which relies on the previous behavior and expects a `File` 
instance to be created for every uploaded file, even empty files.

I know that this is a very specific use case. I don't know whether the 
developers of Commons FileUpload ever intended a negative threshold to be used. 
Still, the question was asked whether a negative threshold could have any 
meaning. I assume the answer is "yes". But I don't know whether this qualifies 
as a bug or a regression. I also don't know whether there are other similar use 
cases in other libraries depending on Commons IO.

Best,
Stephan
Email Disclaimer
FNZ (UK) Ltd registered in England and Wales (05435760) 10th Floor, 135 
Bishopsgate, London EC2M 3TP, FNZ (UK) Ltd is authorised and regulated by the 
Financial Conduct Authority (438687); FNZ TA Services Ltd registered in England 
and Wales (09571767) 10th Floor, 135 Bishopsgate, London EC2M 3TP, FNZ TA 
Services Ltd is authorised and regulated by the Financial Conduct Authority 
(932253); FNZ Securities Ltd registered in England and Wales (09486463) 10th 
Floor, 135 Bishopsgate, London EC2M 3TP, FNZ Securities Ltd, is authorised and 
regulated by the Financial Conduct Authority (733400); JHC Systems Limited 
registered in England and Wales (08729370) Temple Point 6th Floor, 1 Temple 
Row, Birmingham, West Midlands, B2 5LG; FNZ (Europe) DAC registered in Ireland 
(657886)  Block C, Irish Life Centre, Lower Abbey Street, Dublin 1, D01V9F5, 
Ireland; FNZ SA (Pty) Ltd registered under the laws of South Africa 
(2018/547997/07), 1st floor, Newport House, Prestwich Street, Greenpoint, 
western Cape, 8001; FNZ Limited registered in New Zealand (1797706) FNZ House, 
Level 3, 29A Brandon Street, Wellington, 6011 New Zealand; FNZ (Australia) Pty 
Ltd registered in Australia (138 819 119) Level 1, 99 Elizabeth St, Sydney 
2000; FNZ (Hong Kong) Limited registered in Hong Kong (1305362) 6A-1, Koshun 
House, 331 Nathan Road, Hong Kong; FNZ (Singapore) Services Pte. Ltd. 
registered in Singapore (201307199R) 61 Robinson Road, #13-03A, Robinson 
Centre, Singapore (068893); and FNZ (China) Ltd registered in China 
(91310115MA1K3G4K6T) [中国(上海)自由贸易试验区世纪大道1196 号二 座20 层.
This message is intended solely for the addressee and may contain confidential 
information. If you have received this message in error, please send it back to 
us, and immediately and permanently delete it. Do not use, copy or disclose the 
information contained in this message or in any attachment. 
Emails sent to and from FNZ may be monitored and read for legitimate business 
purposes. Emails cannot be guaranteed to be secure or error-free, and you 
should protect your systems. FNZ does not accept any liability arising from 
interception, corruption, loss or destruction of this email, or if it arrives 
late or incomplete or with errors or viruses.
For more information about the FNZ group please visit our website here where 
you can also find links to our policies including our Privacy policy which 

Re: [IO] Bug in FileChannels.contentEquals?

2023-11-06 Thread Stephan Markwalder
A PR with a test for FileChannels.contentEquals is available here:
https://github.com/apache/commons-io/pull/509

On Mon, Nov 6, 2023 at 5:37 PM Elliotte Rusty Harold 
wrote:

> Can you turn this into a PR against HEAD that adds a unit test that
> fails? That would be fairly conclusive proof. It's better if you don't
> send a fix, at least at first. That makes it easier to verify the bug.
>
> On Mon, Nov 6, 2023 at 10:04 AM Stephan Markwalder
>  wrote:
> >
> > Hi,
> >
> > I think I have found a bug in the new FileChannels.contentEquals(...) in
> > Commons IO 2.15.0 which then affects
> RandomAccessFiles.contentEquals(...),
> > PathUtils.fileContentEquals(...), FileUtils.contentEquals(...), and maybe
> > more methods. But before opening an issue in ASF JIRA, I would like to
> > present my findings here.
> >
> > My current working hypothesis:
> > If two files have the exact same size and the exact same content in the
> > last 8192 bytes (value of IOUtils.DEFAULT_BUFFER_SIZE), then all of the
> > above methods will return true, even if the content of the files is
> > different before the last 8192 bytes.
> >
> > Here is some example code:
> >
> > // create two files with same size but different content
> > // (3 different bytes followed by 8192 equal bytes)
> > File file1 = new File("file1.txt");
> > File file2 = new File("file2.txt");
> >
> > String sameContent = StringUtils.repeat("x", 8192);
> >
> > String content1 = "ABC" + sameContent;
> > String content2 = "XYZ" + sameContent;
> > FileUtils.writeStringToFile(file1, content1, StandardCharsets.US_ASCII);
> > FileUtils.writeStringToFile(file2, content2, StandardCharsets.US_ASCII);
> >
> >
> > // compare files
> > boolean equals = FileUtils.contentEquals(file1, file2);
> > System.out.println(equals);
> >
> >
> > I would expect this to print "false" as the first 3 bytes are different,
> > but the code prints "true". I tested with Eclipse Temurin 11.0.1, 17.0.7,
> > and 21.0.1, all on MacOS.
> >
> > I'm not an expert on FileChannels, but I think the problem has something
> to
> > do with the call to method FileChannel.read(ByteBuffer) in
> > org.apache.commons.io.channels.FileChannels. Before the call to this
> > method, the buffer is at position 0. After this call, the buffer is at
> > position 8192. With a limit of 8192, this means that there are 0
> remaining
> > bytes to be compared in ByteBuffer.equals(...). Maybe there should be a
> > call to ByteBuffer.rewind() or ByteBuffer.position(0) after the read so
> > that the position is set back to 0 before comparing the content of the
> > buffers? But as stated before, I'm not an expert on this topic. I'm also
> > not sure whether my hypothesis is 100% accurate. But this has been my
> > conclusion after some experiments.
> >
> > Best regards,
> > Ste
>
>
>
> --
> Elliotte Rusty Harold
> elh...@ibiblio.org
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
> For additional commands, e-mail: dev-h...@commons.apache.org
>
>


[IO] Bug in FileChannels.contentEquals?

2023-11-06 Thread Stephan Markwalder
Hi,

I think I have found a bug in the new FileChannels.contentEquals(...) in
Commons IO 2.15.0 which then affects RandomAccessFiles.contentEquals(...),
PathUtils.fileContentEquals(...), FileUtils.contentEquals(...), and maybe
more methods. But before opening an issue in ASF JIRA, I would like to
present my findings here.

My current working hypothesis:
If two files have the exact same size and the exact same content in the
last 8192 bytes (value of IOUtils.DEFAULT_BUFFER_SIZE), then all of the
above methods will return true, even if the content of the files is
different before the last 8192 bytes.

Here is some example code:

// create two files with same size but different content
// (3 different bytes followed by 8192 equal bytes)
File file1 = new File("file1.txt");
File file2 = new File("file2.txt");

String sameContent = StringUtils.repeat("x", 8192);

String content1 = "ABC" + sameContent;
String content2 = "XYZ" + sameContent;
FileUtils.writeStringToFile(file1, content1, StandardCharsets.US_ASCII);
FileUtils.writeStringToFile(file2, content2, StandardCharsets.US_ASCII);


// compare files
boolean equals = FileUtils.contentEquals(file1, file2);
System.out.println(equals);


I would expect this to print "false" as the first 3 bytes are different,
but the code prints "true". I tested with Eclipse Temurin 11.0.1, 17.0.7,
and 21.0.1, all on MacOS.

I'm not an expert on FileChannels, but I think the problem has something to
do with the call to method FileChannel.read(ByteBuffer) in
org.apache.commons.io.channels.FileChannels. Before the call to this
method, the buffer is at position 0. After this call, the buffer is at
position 8192. With a limit of 8192, this means that there are 0 remaining
bytes to be compared in ByteBuffer.equals(...). Maybe there should be a
call to ByteBuffer.rewind() or ByteBuffer.position(0) after the read so
that the position is set back to 0 before comparing the content of the
buffers? But as stated before, I'm not an expert on this topic. I'm also
not sure whether my hypothesis is 100% accurate. But this has been my
conclusion after some experiments.

Best regards,
Ste