[IO] Change in behavior in Commons FileUpload after upgrade to Commons IO 2.16.1
Hi, Today, I found the following questions in https://github.com/apache/commons-io/pull/609: > The behavior for a negative threshold should be the same as 0 IMO. WDYT? > What does it even mean that a threshold is negative? > Writing zero bytes writes nothing, so there is nothing to reach until you at > least write one byte. > Am I missing something? I would like to highlight a "use case" for a negative threshold, and how the change to disallow a negative threshold might impact existing code. I upgraded from Commons IO 2.15.1 to 2.16.1 yesterday and found a small change in the behavior of Commons FileUpload when uploading and processing empty files. The effect is visible only when passing a negative value as file size threshold to Commons FileUpload. Here is a small extract from the Javadoc of Commons FileUpload, class `DefaultFileItemFactory`, constructor parameter `sizeThreshold`: > sizeThreshold - The threshold, in bytes, below which items will be > retained in memory and above which they will be stored as a file. (source: https://javadoc.io/doc/commons-fileupload/commons-fileupload/latest/index.html) By passing a negative value for `sizeThreshold`, Commons FileUpload can be configured to disable the in-memory caching for all uploaded files, including empty files with a size of 0 bytes. As a result, `DiskFileItem` objects created by Commons FileUpload will always have a `File` instance set internally, even for empty files. `DiskFileItem` in Commons FileUpload internally makes use of `DeferredFileOutputStream`, and therefore `ThresholdingOutputStream`. At some point it calls `isThresholdExceeded()` to check whether the size of the uploaded file exceeds the given threshold. By disallowing a negative threshold, empty files will now be treated differently by Commons FileUpload. With a size of 0 bytes, they will not exceed the enforced minimum threshold of 0 bytes anymore, and their data will therefore be kept in memory. This can break follow-up code which relies on the previous behavior and expects a `File` instance to be created for every uploaded file, even empty files. I know that this is a very specific use case. I don't know whether the developers of Commons FileUpload ever intended a negative threshold to be used. Still, the question was asked whether a negative threshold could have any meaning. I assume the answer is "yes". But I don't know whether this qualifies as a bug or a regression. I also don't know whether there are other similar use cases in other libraries depending on Commons IO. Best, Stephan Email Disclaimer FNZ (UK) Ltd registered in England and Wales (05435760) 10th Floor, 135 Bishopsgate, London EC2M 3TP, FNZ (UK) Ltd is authorised and regulated by the Financial Conduct Authority (438687); FNZ TA Services Ltd registered in England and Wales (09571767) 10th Floor, 135 Bishopsgate, London EC2M 3TP, FNZ TA Services Ltd is authorised and regulated by the Financial Conduct Authority (932253); FNZ Securities Ltd registered in England and Wales (09486463) 10th Floor, 135 Bishopsgate, London EC2M 3TP, FNZ Securities Ltd, is authorised and regulated by the Financial Conduct Authority (733400); JHC Systems Limited registered in England and Wales (08729370) Temple Point 6th Floor, 1 Temple Row, Birmingham, West Midlands, B2 5LG; FNZ (Europe) DAC registered in Ireland (657886) Block C, Irish Life Centre, Lower Abbey Street, Dublin 1, D01V9F5, Ireland; FNZ SA (Pty) Ltd registered under the laws of South Africa (2018/547997/07), 1st floor, Newport House, Prestwich Street, Greenpoint, western Cape, 8001; FNZ Limited registered in New Zealand (1797706) FNZ House, Level 3, 29A Brandon Street, Wellington, 6011 New Zealand; FNZ (Australia) Pty Ltd registered in Australia (138 819 119) Level 1, 99 Elizabeth St, Sydney 2000; FNZ (Hong Kong) Limited registered in Hong Kong (1305362) 6A-1, Koshun House, 331 Nathan Road, Hong Kong; FNZ (Singapore) Services Pte. Ltd. registered in Singapore (201307199R) 61 Robinson Road, #13-03A, Robinson Centre, Singapore (068893); and FNZ (China) Ltd registered in China (91310115MA1K3G4K6T) [中国(上海)自由贸易试验区世纪大道1196 号二 座20 层. This message is intended solely for the addressee and may contain confidential information. If you have received this message in error, please send it back to us, and immediately and permanently delete it. Do not use, copy or disclose the information contained in this message or in any attachment. Emails sent to and from FNZ may be monitored and read for legitimate business purposes. Emails cannot be guaranteed to be secure or error-free, and you should protect your systems. FNZ does not accept any liability arising from interception, corruption, loss or destruction of this email, or if it arrives late or incomplete or with errors or viruses. For more information about the FNZ group please visit our website here where you can also find links to our policies including our Privacy policy which
Re: [IO] Bug in FileChannels.contentEquals?
A PR with a test for FileChannels.contentEquals is available here: https://github.com/apache/commons-io/pull/509 On Mon, Nov 6, 2023 at 5:37 PM Elliotte Rusty Harold wrote: > Can you turn this into a PR against HEAD that adds a unit test that > fails? That would be fairly conclusive proof. It's better if you don't > send a fix, at least at first. That makes it easier to verify the bug. > > On Mon, Nov 6, 2023 at 10:04 AM Stephan Markwalder > wrote: > > > > Hi, > > > > I think I have found a bug in the new FileChannels.contentEquals(...) in > > Commons IO 2.15.0 which then affects > RandomAccessFiles.contentEquals(...), > > PathUtils.fileContentEquals(...), FileUtils.contentEquals(...), and maybe > > more methods. But before opening an issue in ASF JIRA, I would like to > > present my findings here. > > > > My current working hypothesis: > > If two files have the exact same size and the exact same content in the > > last 8192 bytes (value of IOUtils.DEFAULT_BUFFER_SIZE), then all of the > > above methods will return true, even if the content of the files is > > different before the last 8192 bytes. > > > > Here is some example code: > > > > // create two files with same size but different content > > // (3 different bytes followed by 8192 equal bytes) > > File file1 = new File("file1.txt"); > > File file2 = new File("file2.txt"); > > > > String sameContent = StringUtils.repeat("x", 8192); > > > > String content1 = "ABC" + sameContent; > > String content2 = "XYZ" + sameContent; > > FileUtils.writeStringToFile(file1, content1, StandardCharsets.US_ASCII); > > FileUtils.writeStringToFile(file2, content2, StandardCharsets.US_ASCII); > > > > > > // compare files > > boolean equals = FileUtils.contentEquals(file1, file2); > > System.out.println(equals); > > > > > > I would expect this to print "false" as the first 3 bytes are different, > > but the code prints "true". I tested with Eclipse Temurin 11.0.1, 17.0.7, > > and 21.0.1, all on MacOS. > > > > I'm not an expert on FileChannels, but I think the problem has something > to > > do with the call to method FileChannel.read(ByteBuffer) in > > org.apache.commons.io.channels.FileChannels. Before the call to this > > method, the buffer is at position 0. After this call, the buffer is at > > position 8192. With a limit of 8192, this means that there are 0 > remaining > > bytes to be compared in ByteBuffer.equals(...). Maybe there should be a > > call to ByteBuffer.rewind() or ByteBuffer.position(0) after the read so > > that the position is set back to 0 before comparing the content of the > > buffers? But as stated before, I'm not an expert on this topic. I'm also > > not sure whether my hypothesis is 100% accurate. But this has been my > > conclusion after some experiments. > > > > Best regards, > > Ste > > > > -- > Elliotte Rusty Harold > elh...@ibiblio.org > > - > To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org > For additional commands, e-mail: dev-h...@commons.apache.org > >
[IO] Bug in FileChannels.contentEquals?
Hi, I think I have found a bug in the new FileChannels.contentEquals(...) in Commons IO 2.15.0 which then affects RandomAccessFiles.contentEquals(...), PathUtils.fileContentEquals(...), FileUtils.contentEquals(...), and maybe more methods. But before opening an issue in ASF JIRA, I would like to present my findings here. My current working hypothesis: If two files have the exact same size and the exact same content in the last 8192 bytes (value of IOUtils.DEFAULT_BUFFER_SIZE), then all of the above methods will return true, even if the content of the files is different before the last 8192 bytes. Here is some example code: // create two files with same size but different content // (3 different bytes followed by 8192 equal bytes) File file1 = new File("file1.txt"); File file2 = new File("file2.txt"); String sameContent = StringUtils.repeat("x", 8192); String content1 = "ABC" + sameContent; String content2 = "XYZ" + sameContent; FileUtils.writeStringToFile(file1, content1, StandardCharsets.US_ASCII); FileUtils.writeStringToFile(file2, content2, StandardCharsets.US_ASCII); // compare files boolean equals = FileUtils.contentEquals(file1, file2); System.out.println(equals); I would expect this to print "false" as the first 3 bytes are different, but the code prints "true". I tested with Eclipse Temurin 11.0.1, 17.0.7, and 21.0.1, all on MacOS. I'm not an expert on FileChannels, but I think the problem has something to do with the call to method FileChannel.read(ByteBuffer) in org.apache.commons.io.channels.FileChannels. Before the call to this method, the buffer is at position 0. After this call, the buffer is at position 8192. With a limit of 8192, this means that there are 0 remaining bytes to be compared in ByteBuffer.equals(...). Maybe there should be a call to ByteBuffer.rewind() or ByteBuffer.position(0) after the read so that the position is set back to 0 before comparing the content of the buffers? But as stated before, I'm not an expert on this topic. I'm also not sure whether my hypothesis is 100% accurate. But this has been my conclusion after some experiments. Best regards, Ste