garydgregory commented on code in PR #776:
URL: https://github.com/apache/commons-io/pull/776#discussion_r2323630769


##########
src/main/java/org/apache/commons/io/IOUtils.java:
##########
@@ -2707,9 +2790,6 @@ public static byte[] toByteArray(final InputStream input, 
final long size) throw
      * @throws IllegalArgumentException if {@code size} is less than zero.
      */
     static byte[] toByteArray(final IOTriFunction<byte[], Integer, Integer, 
Integer> input, final int size) throws IOException {
-        if (size < 0) {

Review Comment:
   This check should remain here; a negative array size never makes sense here. 



##########
src/main/java/org/apache/commons/io/IOUtils.java:
##########
@@ -2659,35 +2660,61 @@ public static byte[] toByteArray(final InputStream 
inputStream) throws IOExcepti
     }
 
     /**
-     * Gets the contents of an {@link InputStream} as a {@code byte[]}. Use 
this method instead of
-     * {@link #toByteArray(InputStream)} when {@link InputStream} size is 
known.
+     * Reads exactly {@code size} bytes from the given {@link InputStream} 
into a new {@code byte[]}.
      *
-     * @param input the {@link InputStream} to read.
-     * @param size the size of {@link InputStream} to read, where 0 &lt; 
{@code size} &lt;= length of input stream.
-     * @return byte [] of length {@code size}.
-     * @throws IOException if an I/O error occurs or {@link InputStream} 
length is smaller than parameter {@code size}.
-     * @throws IllegalArgumentException if {@code size} is less than zero.
+     * <p>
+     *   This variant allocates the target array immediately and attempts to 
fill it in one pass.

Review Comment:
   Saying "one pass" feels off since the underlying worker method _loops_ until 
EOF or the right amount is read. I don't think we should describe the _how_ of 
this method.



##########
src/main/java/org/apache/commons/io/IOUtils.java:
##########
@@ -2659,35 +2660,61 @@ public static byte[] toByteArray(final InputStream 
inputStream) throws IOExcepti
     }
 
     /**
-     * Gets the contents of an {@link InputStream} as a {@code byte[]}. Use 
this method instead of
-     * {@link #toByteArray(InputStream)} when {@link InputStream} size is 
known.
+     * Reads exactly {@code size} bytes from the given {@link InputStream} 
into a new {@code byte[]}.
      *
-     * @param input the {@link InputStream} to read.
-     * @param size the size of {@link InputStream} to read, where 0 &lt; 
{@code size} &lt;= length of input stream.
-     * @return byte [] of length {@code size}.
-     * @throws IOException if an I/O error occurs or {@link InputStream} 
length is smaller than parameter {@code size}.
-     * @throws IllegalArgumentException if {@code size} is less than zero.
+     * <p>
+     *   This variant allocates the target array immediately and attempts to 
fill it in one pass.
+     *   It assumes that {@code size} is correct.
+     *   If the stream ends prematurely, an {@link EOFException} is thrown.
+     * </p>
+     *
+     * <p>
+     *   <strong>Important:</strong> This method does <em>not</em> defend 
against corrupted
+     *   or untrusted {@code size} values.
+     *   For untrusted input, use {@link #toByteArray(InputStream, int, int)} 
instead,
+     *   which validates that the stream contains at least {@code size} bytes 
before allocating the target array.
+     * </p>
+     *
+     * @param input the {@link InputStream} to read; must not be {@code null}.
+     * @param size  the exact number of bytes to read; must be {@code >= 0}.
+     * @return a new byte array of length {@code size}.
+     * @throws IllegalArgumentException if {@code size} is negative.
+     * @throws EOFException             if the stream ends before {@code size} 
bytes are read.
+     * @throws IOException              if an I/O error occurs while reading.
+     * @throws NullPointerException     if {@code input} is {@code null}.
      * @since 2.1
      */
     @SuppressWarnings("resource")
     public static byte[] toByteArray(final InputStream input, final int size) 
throws IOException {
-        return toByteArray(Objects.requireNonNull(input, "input")::read, size);
+        Objects.requireNonNull(input, "input");
+        if (size < 0) {
+            throw new IllegalArgumentException("Size must be equal or greater 
than zero: " + size);
+        }
+        return toByteArray(input::read, size);
     }
 
     /**
-     * Gets contents of an {@link InputStream} as a {@code byte[]}.
-     * Use this method instead of {@link #toByteArray(InputStream)}
-     * when {@link InputStream} size is known.
-     * <strong>NOTE:</strong> the method checks that the length can safely be 
cast to an int without truncation
-     * before using {@link IOUtils#toByteArray(InputStream, int)} to read into 
the byte array.
-     * (Arrays can have no more than Integer.MAX_VALUE entries anyway.)
+     * Reads exactly {@code size} bytes from the given {@link InputStream} 
into a new {@code byte[]}.
      *
-     * @param input the {@link InputStream} to read.
-     * @param size the size of {@link InputStream} to read, where 0 &lt; 
{@code size} &lt;= min(Integer.MAX_VALUE, length of input stream).
-     * @return byte [] the requested byte array, of length {@code size}.
-     * @throws IOException              if an I/O error occurs or {@link 
InputStream} length is less than {@code size}.
-     * @throws IllegalArgumentException if size is less than zero or size is 
greater than Integer.MAX_VALUE.
-     * @see IOUtils#toByteArray(InputStream, int)
+     * <p>
+     *   This is a convenience overload of {@link #toByteArray(InputStream, 
int, int)} that accepts a

Review Comment:
   Let's not define convenience versus not; the convenience is only due to the 
implementation, which feels wrong, we should probably (and separately) fix the 
implementation to use `Math.toExactInt()`.



##########
src/main/java/org/apache/commons/io/IOUtils.java:
##########
@@ -2697,6 +2724,62 @@ public static byte[] toByteArray(final InputStream 
input, final long size) throw
         return toByteArray(input, (int) size);
     }
 
+    /**
+     * Reads exactly {@code size} bytes from the given {@link InputStream} 
into a new {@code byte[]}.
+     *
+     * <p>
+     *   This variant validates that the stream actually contains {@code size} 
bytes.
+     *   It is suitable for untrusted input because it prevents oversized 
allocations when the provided {@code size}

Review Comment:
   I don't think we should start to document which APIs are designed for 
trusted versus untrusted input. The documentation should just describe the 
intended functionality.



##########
src/main/java/org/apache/commons/io/IOUtils.java:
##########
@@ -2659,35 +2660,61 @@ public static byte[] toByteArray(final InputStream 
inputStream) throws IOExcepti
     }
 
     /**
-     * Gets the contents of an {@link InputStream} as a {@code byte[]}. Use 
this method instead of
-     * {@link #toByteArray(InputStream)} when {@link InputStream} size is 
known.
+     * Reads exactly {@code size} bytes from the given {@link InputStream} 
into a new {@code byte[]}.
      *
-     * @param input the {@link InputStream} to read.
-     * @param size the size of {@link InputStream} to read, where 0 &lt; 
{@code size} &lt;= length of input stream.
-     * @return byte [] of length {@code size}.
-     * @throws IOException if an I/O error occurs or {@link InputStream} 
length is smaller than parameter {@code size}.
-     * @throws IllegalArgumentException if {@code size} is less than zero.
+     * <p>
+     *   This variant allocates the target array immediately and attempts to 
fill it in one pass.
+     *   It assumes that {@code size} is correct.
+     *   If the stream ends prematurely, an {@link EOFException} is thrown.

Review Comment:
   You can remove this since it duplicates the exception documentation.



##########
src/main/java/org/apache/commons/io/IOUtils.java:
##########
@@ -2659,35 +2660,61 @@ public static byte[] toByteArray(final InputStream 
inputStream) throws IOExcepti
     }
 
     /**
-     * Gets the contents of an {@link InputStream} as a {@code byte[]}. Use 
this method instead of
-     * {@link #toByteArray(InputStream)} when {@link InputStream} size is 
known.
+     * Reads exactly {@code size} bytes from the given {@link InputStream} 
into a new {@code byte[]}.
      *
-     * @param input the {@link InputStream} to read.
-     * @param size the size of {@link InputStream} to read, where 0 &lt; 
{@code size} &lt;= length of input stream.
-     * @return byte [] of length {@code size}.
-     * @throws IOException if an I/O error occurs or {@link InputStream} 
length is smaller than parameter {@code size}.
-     * @throws IllegalArgumentException if {@code size} is less than zero.
+     * <p>
+     *   This variant allocates the target array immediately and attempts to 
fill it in one pass.
+     *   It assumes that {@code size} is correct.

Review Comment:
   Let's not say we assume a parameter is correct; after all, we don't say when 
we assume a parameter incorrect in other cases. That's what exceptions are for, 
and sometimes the return value.
   



##########
src/main/java/org/apache/commons/io/RandomAccessFiles.java:
##########
@@ -76,7 +76,11 @@ private static long length(final RandomAccessFile raf) 
throws IOException {
      *                     other I/O error occurs.
      */
     public static byte[] read(final RandomAccessFile input, final long 
position, final int length) throws IOException {
+        Objects.requireNonNull(input, "input");
         input.seek(position);
+        if (length < 0) {

Review Comment:
   This change is unrelated to the feature, and I don't understand the benefit 
of moving the check from the method actually using the value to here.



##########
src/main/java/org/apache/commons/io/IOUtils.java:
##########
@@ -2659,35 +2660,61 @@ public static byte[] toByteArray(final InputStream 
inputStream) throws IOExcepti
     }
 
     /**
-     * Gets the contents of an {@link InputStream} as a {@code byte[]}. Use 
this method instead of
-     * {@link #toByteArray(InputStream)} when {@link InputStream} size is 
known.
+     * Reads exactly {@code size} bytes from the given {@link InputStream} 
into a new {@code byte[]}.
      *
-     * @param input the {@link InputStream} to read.
-     * @param size the size of {@link InputStream} to read, where 0 &lt; 
{@code size} &lt;= length of input stream.
-     * @return byte [] of length {@code size}.
-     * @throws IOException if an I/O error occurs or {@link InputStream} 
length is smaller than parameter {@code size}.
-     * @throws IllegalArgumentException if {@code size} is less than zero.
+     * <p>
+     *   This variant allocates the target array immediately and attempts to 
fill it in one pass.
+     *   It assumes that {@code size} is correct.
+     *   If the stream ends prematurely, an {@link EOFException} is thrown.
+     * </p>
+     *
+     * <p>
+     *   <strong>Important:</strong> This method does <em>not</em> defend 
against corrupted
+     *   or untrusted {@code size} values.
+     *   For untrusted input, use {@link #toByteArray(InputStream, int, int)} 
instead,
+     *   which validates that the stream contains at least {@code size} bytes 
before allocating the target array.
+     * </p>
+     *
+     * @param input the {@link InputStream} to read; must not be {@code null}.
+     * @param size  the exact number of bytes to read; must be {@code >= 0}.
+     * @return a new byte array of length {@code size}.
+     * @throws IllegalArgumentException if {@code size} is negative.
+     * @throws EOFException             if the stream ends before {@code size} 
bytes are read.
+     * @throws IOException              if an I/O error occurs while reading.
+     * @throws NullPointerException     if {@code input} is {@code null}.
      * @since 2.1
      */
     @SuppressWarnings("resource")
     public static byte[] toByteArray(final InputStream input, final int size) 
throws IOException {
-        return toByteArray(Objects.requireNonNull(input, "input")::read, size);
+        Objects.requireNonNull(input, "input");
+        if (size < 0) {
+            throw new IllegalArgumentException("Size must be equal or greater 
than zero: " + size);
+        }
+        return toByteArray(input::read, size);
     }
 
     /**
-     * Gets contents of an {@link InputStream} as a {@code byte[]}.
-     * Use this method instead of {@link #toByteArray(InputStream)}
-     * when {@link InputStream} size is known.
-     * <strong>NOTE:</strong> the method checks that the length can safely be 
cast to an int without truncation
-     * before using {@link IOUtils#toByteArray(InputStream, int)} to read into 
the byte array.
-     * (Arrays can have no more than Integer.MAX_VALUE entries anyway.)
+     * Reads exactly {@code size} bytes from the given {@link InputStream} 
into a new {@code byte[]}.
      *
-     * @param input the {@link InputStream} to read.
-     * @param size the size of {@link InputStream} to read, where 0 &lt; 
{@code size} &lt;= min(Integer.MAX_VALUE, length of input stream).
-     * @return byte [] the requested byte array, of length {@code size}.
-     * @throws IOException              if an I/O error occurs or {@link 
InputStream} length is less than {@code size}.
-     * @throws IllegalArgumentException if size is less than zero or size is 
greater than Integer.MAX_VALUE.
-     * @see IOUtils#toByteArray(InputStream, int)
+     * <p>
+     *   This is a convenience overload of {@link #toByteArray(InputStream, 
int, int)} that accepts a
+     *   {@code long} size parameter. The value is checked to ensure it does 
not exceed
+     *   {@link Integer#MAX_VALUE} before being safely converted to {@code 
int}.
+     * </p>

Review Comment:
   This whole paragraph (above) is wrong.



##########
src/main/java/org/apache/commons/io/IOUtils.java:
##########
@@ -2659,35 +2660,61 @@ public static byte[] toByteArray(final InputStream 
inputStream) throws IOExcepti
     }
 
     /**
-     * Gets the contents of an {@link InputStream} as a {@code byte[]}. Use 
this method instead of
-     * {@link #toByteArray(InputStream)} when {@link InputStream} size is 
known.
+     * Reads exactly {@code size} bytes from the given {@link InputStream} 
into a new {@code byte[]}.
      *
-     * @param input the {@link InputStream} to read.
-     * @param size the size of {@link InputStream} to read, where 0 &lt; 
{@code size} &lt;= length of input stream.
-     * @return byte [] of length {@code size}.
-     * @throws IOException if an I/O error occurs or {@link InputStream} 
length is smaller than parameter {@code size}.
-     * @throws IllegalArgumentException if {@code size} is less than zero.
+     * <p>
+     *   This variant allocates the target array immediately and attempts to 
fill it in one pass.
+     *   It assumes that {@code size} is correct.
+     *   If the stream ends prematurely, an {@link EOFException} is thrown.
+     * </p>
+     *
+     * <p>
+     *   <strong>Important:</strong> This method does <em>not</em> defend 
against corrupted
+     *   or untrusted {@code size} values.
+     *   For untrusted input, use {@link #toByteArray(InputStream, int, int)} 
instead,
+     *   which validates that the stream contains at least {@code size} bytes 
before allocating the target array.
+     * </p>
+     *
+     * @param input the {@link InputStream} to read; must not be {@code null}.
+     * @param size  the exact number of bytes to read; must be {@code >= 0}.
+     * @return a new byte array of length {@code size}.
+     * @throws IllegalArgumentException if {@code size} is negative.
+     * @throws EOFException             if the stream ends before {@code size} 
bytes are read.
+     * @throws IOException              if an I/O error occurs while reading.
+     * @throws NullPointerException     if {@code input} is {@code null}.
      * @since 2.1
      */
     @SuppressWarnings("resource")
     public static byte[] toByteArray(final InputStream input, final int size) 
throws IOException {
-        return toByteArray(Objects.requireNonNull(input, "input")::read, size);
+        Objects.requireNonNull(input, "input");
+        if (size < 0) {

Review Comment:
   This check should remain in the underlying method.



##########
src/main/java/org/apache/commons/io/IOUtils.java:
##########
@@ -2697,6 +2724,62 @@ public static byte[] toByteArray(final InputStream 
input, final long size) throw
         return toByteArray(input, (int) size);
     }
 
+    /**
+     * Reads exactly {@code size} bytes from the given {@link InputStream} 
into a new {@code byte[]}.
+     *
+     * <p>
+     *   This variant validates that the stream actually contains {@code size} 
bytes.
+     *   It is suitable for untrusted input because it prevents oversized 
allocations when the provided {@code size}
+     *   is corrupted or malicious.
+     * </p>
+     *
+     * <ul>
+     *   <li>If {@code size <= bufferSize}, the array is allocated directly 
and filled in a single pass.</li>
+     *   <li>
+     *     If {@code size > bufferSize}, the stream is read incrementally 
using a buffer of length {@code bufferSize}.
+     *     This avoids allocating an excessively large array up front,
+     *     but may temporarily double memory usage due to buffering.
+     *   </li>
+     * </ul>
+     *
+     * @param input      the {@link InputStream} to read; must not be {@code 
null}.
+     * @param size       the exact number of bytes to read; must be {@code >= 
0}.
+     *                   The actual bytes read are validated to equal {@code 
size}.
+     * @param bufferSize the buffer size for incremental reading; must be 
{@code > 0}.
+     * @return a new byte array of length {@code size}.
+     * @throws IllegalArgumentException if {@code size} is negative or {@code 
bufferSize <= 0}.
+     * @throws EOFException             if the stream ends before {@code size} 
bytes are read.
+     * @throws IOException              if an I/O error occurs while reading.
+     * @throws NullPointerException     if {@code input} is {@code null}.
+     * @since 2.21.0
+     */
+    public static byte[] toByteArray(final InputStream input, final int size, 
final int bufferSize) throws IOException {
+        Objects.requireNonNull(input, "input");
+        if (size < 0) {
+            throw new IllegalArgumentException("Size must be equal or greater 
than zero: " + size);
+        }
+        if (bufferSize <= 0) {
+            throw new IllegalArgumentException("Chunk size must be greater 
than zero: " + bufferSize);

Review Comment:
   The message text and parameter names should match: Either "buffer" or 
"chunk", not both.



##########
src/main/java/org/apache/commons/io/IOUtils.java:
##########
@@ -2659,35 +2660,61 @@ public static byte[] toByteArray(final InputStream 
inputStream) throws IOExcepti
     }
 
     /**
-     * Gets the contents of an {@link InputStream} as a {@code byte[]}. Use 
this method instead of
-     * {@link #toByteArray(InputStream)} when {@link InputStream} size is 
known.
+     * Reads exactly {@code size} bytes from the given {@link InputStream} 
into a new {@code byte[]}.
      *
-     * @param input the {@link InputStream} to read.
-     * @param size the size of {@link InputStream} to read, where 0 &lt; 
{@code size} &lt;= length of input stream.
-     * @return byte [] of length {@code size}.
-     * @throws IOException if an I/O error occurs or {@link InputStream} 
length is smaller than parameter {@code size}.
-     * @throws IllegalArgumentException if {@code size} is less than zero.
+     * <p>
+     *   This variant allocates the target array immediately and attempts to 
fill it in one pass.
+     *   It assumes that {@code size} is correct.
+     *   If the stream ends prematurely, an {@link EOFException} is thrown.
+     * </p>
+     *
+     * <p>

Review Comment:
   This feels like FUD to me, and I would remove the whole paragraph. I don't 
think we want to start documenting which methods fall into which defensive 
category, especially when the actual issue would be in the call site, not this 
implementation.



##########
src/changes/changes.xml:
##########
@@ -57,6 +57,7 @@ The <action> type attribute can be add,update,fix,remove.
       <action dev="ggregory" type="add"                due-to="Gary 
Gregory">Add org.apache.commons.io.output.ProxyOutputStream.writeRepeat(byte[], 
int, int, long).</action>
       <action dev="ggregory" type="add"                due-to="Gary 
Gregory">Add org.apache.commons.io.output.ProxyOutputStream.writeRepeat(byte[], 
long).</action>
       <action dev="ggregory" type="add"                due-to="Gary 
Gregory">Add org.apache.commons.io.output.ProxyOutputStream.writeRepeat(int, 
long).</action>
+      <action dev="pkarwasz" type="add"                due-to="Piotr P. 
Karwasz">Added toByteArray(InputStream, int, int) for safer incremental reading 
with size validation.</action>

Review Comment:
   "Added" -> "Add", like all other descriptions that add something.



##########
src/main/java/org/apache/commons/io/IOUtils.java:
##########
@@ -2697,6 +2724,62 @@ public static byte[] toByteArray(final InputStream 
input, final long size) throw
         return toByteArray(input, (int) size);
     }
 
+    /**
+     * Reads exactly {@code size} bytes from the given {@link InputStream} 
into a new {@code byte[]}.
+     *
+     * <p>
+     *   This variant validates that the stream actually contains {@code size} 
bytes.
+     *   It is suitable for untrusted input because it prevents oversized 
allocations when the provided {@code size}
+     *   is corrupted or malicious.
+     * </p>
+     *
+     * <ul>
+     *   <li>If {@code size <= bufferSize}, the array is allocated directly 
and filled in a single pass.</li>
+     *   <li>
+     *     If {@code size > bufferSize}, the stream is read incrementally 
using a buffer of length {@code bufferSize}.
+     *     This avoids allocating an excessively large array up front,
+     *     but may temporarily double memory usage due to buffering.
+     *   </li>
+     * </ul>
+     *
+     * @param input      the {@link InputStream} to read; must not be {@code 
null}.
+     * @param size       the exact number of bytes to read; must be {@code >= 
0}.
+     *                   The actual bytes read are validated to equal {@code 
size}.
+     * @param bufferSize the buffer size for incremental reading; must be 
{@code > 0}.
+     * @return a new byte array of length {@code size}.
+     * @throws IllegalArgumentException if {@code size} is negative or {@code 
bufferSize <= 0}.
+     * @throws EOFException             if the stream ends before {@code size} 
bytes are read.
+     * @throws IOException              if an I/O error occurs while reading.
+     * @throws NullPointerException     if {@code input} is {@code null}.
+     * @since 2.21.0
+     */
+    public static byte[] toByteArray(final InputStream input, final int size, 
final int bufferSize) throws IOException {
+        Objects.requireNonNull(input, "input");
+        if (size < 0) {

Review Comment:
   You can remove this check since the worker method does it (in master).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to