ecki commented on code in PR #776:
URL: https://github.com/apache/commons-io/pull/776#discussion_r2336532313
##########
src/changes/changes.xml:
##########
@@ -57,6 +57,7 @@ The <action> type attribute can be add,update,fix,remove.
<action dev="ggregory" type="add" due-to="Gary
Gregory">Add org.apache.commons.io.output.ProxyOutputStream.writeRepeat(byte[],
int, int, long).</action>
<action dev="ggregory" type="add" due-to="Gary
Gregory">Add org.apache.commons.io.output.ProxyOutputStream.writeRepeat(byte[],
long).</action>
<action dev="ggregory" type="add" due-to="Gary
Gregory">Add org.apache.commons.io.output.ProxyOutputStream.writeRepeat(int,
long).</action>
+ <action dev="pkarwasz" type="add" due-to="Piotr P.
Karwasz">Add IOUtils.toByteArray(InputStream, int, int) for safer incremental
reading with size validation.</action>
Review Comment:
Btw what’s incremental about reading a while stream into a single array?
##########
src/main/java/org/apache/commons/io/IOUtils.java:
##########
@@ -2637,57 +2645,61 @@ public static BufferedReader toBufferedReader(final
Reader reader, final int siz
}
/**
- * Gets the contents of an {@link InputStream} as a {@code byte[]}.
- * <p>
- * This method buffers the input internally, so there is no need to use a
{@link BufferedInputStream}.
- * </p>
+ * Reads all the bytes from an input stream in a byte array.
*
- * @param inputStream the {@link InputStream} to read.
- * @return the requested byte array.
- * @throws NullPointerException if the InputStream is {@code null}.
- * @throws IOException if an I/O error occurs or reading more
than {@link Integer#MAX_VALUE} occurs.
+ * <p>The memory used by this method is <strong>proportional</strong> to
the number
Review Comment:
For me “proportional” could also mean 2x so maybe “is the same”? And “large
input streams” is also relatife with 2gb. Maybe say it directly “Only streams
which fit into a single byte array with roughly 2GB limit can be processed with
this method”?
##########
src/main/java/org/apache/commons/io/IOUtils.java:
##########
@@ -2637,57 +2645,61 @@ public static BufferedReader toBufferedReader(final
Reader reader, final int siz
}
/**
- * Gets the contents of an {@link InputStream} as a {@code byte[]}.
- * <p>
- * This method buffers the input internally, so there is no need to use a
{@link BufferedInputStream}.
- * </p>
+ * Reads all the bytes from an input stream in a byte array.
*
- * @param inputStream the {@link InputStream} to read.
- * @return the requested byte array.
- * @throws NullPointerException if the InputStream is {@code null}.
- * @throws IOException if an I/O error occurs or reading more
than {@link Integer#MAX_VALUE} occurs.
+ * <p>The memory used by this method is <strong>proportional</strong> to
the number
+ * of bytes read, which is only limited by {@link Integer#MAX_VALUE}. This
makes it unsuitable for
+ * processing large input streams, unless sufficient heap space is
available.</p>
+ *
+ * @param inputStream The {@link InputStream} to read; must not be {@code
null}.
+ * @return A new byte array.
+ * @throws IllegalArgumentException If the size of the stream is greater
than the maximum array size.
+ * @throws IOException If an I/O error occurs while reading.
+ * @throws NullPointerException If {@code inputStream} is {@code null}.
*/
public static byte[] toByteArray(final InputStream inputStream) throws
IOException {
- // We use a ThresholdingOutputStream to avoid reading AND writing more
than Integer.MAX_VALUE.
- try (UnsynchronizedByteArrayOutputStream ubaOutput =
UnsynchronizedByteArrayOutputStream.builder().get();
- ThresholdingOutputStream thresholdOutput = new
ThresholdingOutputStream(Integer.MAX_VALUE, os -> {
- throw new IllegalArgumentException(String.format("Cannot
read more than %,d into a byte array", Integer.MAX_VALUE));
- }, os -> ubaOutput)) {
- copy(inputStream, thresholdOutput);
- return ubaOutput.toByteArray();
+ final UnsynchronizedByteArrayOutputStream output =
copyToOutputStream(inputStream, SOFT_MAX_ARRAY_LENGTH + 1, DEFAULT_BUFFER_SIZE);
Review Comment:
Overall it looks like a simplification!
##########
src/test/java/org/apache/commons/io/IOUtilsTest.java:
##########
@@ -1659,6 +1662,48 @@ void testToByteArray_InputStream_SizeZero() throws
Exception {
}
}
+ @ParameterizedTest
+ @MethodSource
+ void testToByteArray_InputStream_Size_BufferSize_Succeeds(byte[] data, int
size, int bufferSize) throws IOException {
+ final ByteArrayInputStream input = new ByteArrayInputStream(data);
+ final byte[] expected = Arrays.copyOf(data, size);
+ final byte[] actual = IOUtils.toByteArray(input, size, bufferSize);
+ assertArrayEquals(expected, actual);
+ }
+
+ private static Stream<Arguments>
testToByteArray_InputStream_Size_BufferSize_Succeeds() {
+ final byte[] data = new byte[1024];
+ for (int i = 0; i < 1024; i++) {
+ data[i] = (byte) i;
+ }
+ return Stream.of(
+ // Eager reading
+ Arguments.of(data.clone(), 512, 1024),
Review Comment:
Those clones and array copies are pretty defensive, but I guess it’s fine
here the test only uses small ones
##########
src/main/java/org/apache/commons/io/IOUtils.java:
##########
@@ -221,6 +221,14 @@ public class IOUtils {
*/
private static final char[] SCRATCH_CHAR_BUFFER_WO = charArray();
+ /**
+ * The maximum size of an array in many Java VMs.
+ * <p>
+ * The constant is copied from OpenJDK's {@link
jdk.internal.util.ArraysSupport#SOFT_MAX_ARRAY_LENGTH}.
+ * </p>
+ */
+ private static final int SOFT_MAX_ARRAY_LENGTH = Integer.MAX_VALUE - 8;
Review Comment:
I wonder what’s soft and what’s -8 is about, but I guess the question is for
openjdk (for the review it’s only important that it does not overflow, see
below)
##########
src/main/java/org/apache/commons/io/IOUtils.java:
##########
@@ -2637,57 +2645,61 @@ public static BufferedReader toBufferedReader(final
Reader reader, final int siz
}
/**
- * Gets the contents of an {@link InputStream} as a {@code byte[]}.
- * <p>
- * This method buffers the input internally, so there is no need to use a
{@link BufferedInputStream}.
- * </p>
+ * Reads all the bytes from an input stream in a byte array.
*
- * @param inputStream the {@link InputStream} to read.
- * @return the requested byte array.
- * @throws NullPointerException if the InputStream is {@code null}.
- * @throws IOException if an I/O error occurs or reading more
than {@link Integer#MAX_VALUE} occurs.
+ * <p>The memory used by this method is <strong>proportional</strong> to
the number
+ * of bytes read, which is only limited by {@link Integer#MAX_VALUE}. This
makes it unsuitable for
+ * processing large input streams, unless sufficient heap space is
available.</p>
+ *
+ * @param inputStream The {@link InputStream} to read; must not be {@code
null}.
+ * @return A new byte array.
+ * @throws IllegalArgumentException If the size of the stream is greater
than the maximum array size.
+ * @throws IOException If an I/O error occurs while reading.
+ * @throws NullPointerException If {@code inputStream} is {@code null}.
*/
public static byte[] toByteArray(final InputStream inputStream) throws
IOException {
- // We use a ThresholdingOutputStream to avoid reading AND writing more
than Integer.MAX_VALUE.
- try (UnsynchronizedByteArrayOutputStream ubaOutput =
UnsynchronizedByteArrayOutputStream.builder().get();
- ThresholdingOutputStream thresholdOutput = new
ThresholdingOutputStream(Integer.MAX_VALUE, os -> {
- throw new IllegalArgumentException(String.format("Cannot
read more than %,d into a byte array", Integer.MAX_VALUE));
- }, os -> ubaOutput)) {
- copy(inputStream, thresholdOutput);
- return ubaOutput.toByteArray();
+ final UnsynchronizedByteArrayOutputStream output =
copyToOutputStream(inputStream, SOFT_MAX_ARRAY_LENGTH + 1, DEFAULT_BUFFER_SIZE);
+ if (output.size() > SOFT_MAX_ARRAY_LENGTH) {
Review Comment:
Hm, the UABOS.size() feels a bit overloaded here and why does the write()
not throw?
##########
src/main/java/org/apache/commons/io/IOUtils.java:
##########
@@ -2637,57 +2645,61 @@ public static BufferedReader toBufferedReader(final
Reader reader, final int siz
}
/**
- * Gets the contents of an {@link InputStream} as a {@code byte[]}.
- * <p>
- * This method buffers the input internally, so there is no need to use a
{@link BufferedInputStream}.
- * </p>
+ * Reads all the bytes from an input stream in a byte array.
*
- * @param inputStream the {@link InputStream} to read.
- * @return the requested byte array.
- * @throws NullPointerException if the InputStream is {@code null}.
- * @throws IOException if an I/O error occurs or reading more
than {@link Integer#MAX_VALUE} occurs.
+ * <p>The memory used by this method is <strong>proportional</strong> to
the number
+ * of bytes read, which is only limited by {@link Integer#MAX_VALUE}. This
makes it unsuitable for
+ * processing large input streams, unless sufficient heap space is
available.</p>
+ *
+ * @param inputStream The {@link InputStream} to read; must not be {@code
null}.
+ * @return A new byte array.
+ * @throws IllegalArgumentException If the size of the stream is greater
than the maximum array size.
+ * @throws IOException If an I/O error occurs while reading.
+ * @throws NullPointerException If {@code inputStream} is {@code null}.
*/
public static byte[] toByteArray(final InputStream inputStream) throws
IOException {
- // We use a ThresholdingOutputStream to avoid reading AND writing more
than Integer.MAX_VALUE.
- try (UnsynchronizedByteArrayOutputStream ubaOutput =
UnsynchronizedByteArrayOutputStream.builder().get();
- ThresholdingOutputStream thresholdOutput = new
ThresholdingOutputStream(Integer.MAX_VALUE, os -> {
- throw new IllegalArgumentException(String.format("Cannot
read more than %,d into a byte array", Integer.MAX_VALUE));
- }, os -> ubaOutput)) {
- copy(inputStream, thresholdOutput);
- return ubaOutput.toByteArray();
+ final UnsynchronizedByteArrayOutputStream output =
copyToOutputStream(inputStream, SOFT_MAX_ARRAY_LENGTH + 1, DEFAULT_BUFFER_SIZE);
Review Comment:
Maybe add a comment that the soft max is guranteed to not overflow on +1?
##########
src/main/java/org/apache/commons/io/IOUtils.java:
##########
@@ -2637,57 +2645,61 @@ public static BufferedReader toBufferedReader(final
Reader reader, final int siz
}
/**
- * Gets the contents of an {@link InputStream} as a {@code byte[]}.
- * <p>
- * This method buffers the input internally, so there is no need to use a
{@link BufferedInputStream}.
- * </p>
+ * Reads all the bytes from an input stream in a byte array.
*
- * @param inputStream the {@link InputStream} to read.
- * @return the requested byte array.
- * @throws NullPointerException if the InputStream is {@code null}.
- * @throws IOException if an I/O error occurs or reading more
than {@link Integer#MAX_VALUE} occurs.
+ * <p>The memory used by this method is <strong>proportional</strong> to
the number
+ * of bytes read, which is only limited by {@link Integer#MAX_VALUE}. This
makes it unsuitable for
+ * processing large input streams, unless sufficient heap space is
available.</p>
+ *
+ * @param inputStream The {@link InputStream} to read; must not be {@code
null}.
+ * @return A new byte array.
+ * @throws IllegalArgumentException If the size of the stream is greater
than the maximum array size.
+ * @throws IOException If an I/O error occurs while reading.
+ * @throws NullPointerException If {@code inputStream} is {@code null}.
*/
public static byte[] toByteArray(final InputStream inputStream) throws
IOException {
- // We use a ThresholdingOutputStream to avoid reading AND writing more
than Integer.MAX_VALUE.
- try (UnsynchronizedByteArrayOutputStream ubaOutput =
UnsynchronizedByteArrayOutputStream.builder().get();
- ThresholdingOutputStream thresholdOutput = new
ThresholdingOutputStream(Integer.MAX_VALUE, os -> {
- throw new IllegalArgumentException(String.format("Cannot
read more than %,d into a byte array", Integer.MAX_VALUE));
- }, os -> ubaOutput)) {
- copy(inputStream, thresholdOutput);
- return ubaOutput.toByteArray();
+ final UnsynchronizedByteArrayOutputStream output =
copyToOutputStream(inputStream, SOFT_MAX_ARRAY_LENGTH + 1, DEFAULT_BUFFER_SIZE);
+ if (output.size() > SOFT_MAX_ARRAY_LENGTH) {
+ throw new IllegalArgumentException(String.format("Cannot read more
than %,d into a byte array", SOFT_MAX_ARRAY_LENGTH));
}
+ return output.toByteArray();
}
/**
- * Gets the contents of an {@link InputStream} as a {@code byte[]}. Use
this method instead of
- * {@link #toByteArray(InputStream)} when {@link InputStream} size is
known.
+ * Reads exactly {@code size} bytes from the given {@link InputStream}
into a new {@code byte[]}.
*
- * @param input the {@link InputStream} to read.
- * @param size the size of {@link InputStream} to read, where 0 <
{@code size} <= length of input stream.
- * @return byte [] of length {@code size}.
- * @throws IOException if an I/O error occurs or {@link InputStream}
length is smaller than parameter {@code size}.
- * @throws IllegalArgumentException if {@code size} is less than zero.
+ * <p>This variant provides no safeguards against allocating very large
arrays.
+ * For large streams, prefer {@link #toByteArray(InputStream, int, int)},
+ * which enforces stricter memory usage constraints.</p>
+ *
+ * @param input the {@link InputStream} to read; must not be {@code null}.
+ * @param size the exact number of bytes to read; must be {@code >= 0}.
+ * @return a new byte array of length {@code size}.
+ * @throws IllegalArgumentException if {@code size} is negative.
+ * @throws EOFException if the stream ends before {@code size}
bytes are read.
+ * @throws IOException if an I/O error occurs while reading.
+ * @throws NullPointerException if {@code input} is {@code null}.
* @since 2.1
*/
- @SuppressWarnings("resource")
public static byte[] toByteArray(final InputStream input, final int size)
throws IOException {
- return toByteArray(Objects.requireNonNull(input, "input")::read, size);
+ return toByteArray(input::read, size);
}
/**
- * Gets contents of an {@link InputStream} as a {@code byte[]}.
- * Use this method instead of {@link #toByteArray(InputStream)}
- * when {@link InputStream} size is known.
- * <strong>NOTE:</strong> the method checks that the length can safely be
cast to an int without truncation
- * before using {@link IOUtils#toByteArray(InputStream, int)} to read into
the byte array.
- * (Arrays can have no more than Integer.MAX_VALUE entries anyway.)
+ * Reads exactly {@code size} bytes from the given {@link InputStream}
into a new {@code byte[]}.
*
- * @param input the {@link InputStream} to read.
- * @param size the size of {@link InputStream} to read, where 0 <
{@code size} <= min(Integer.MAX_VALUE, length of input stream).
- * @return byte [] the requested byte array, of length {@code size}.
- * @throws IOException if an I/O error occurs or {@link
InputStream} length is less than {@code size}.
- * @throws IllegalArgumentException if size is less than zero or size is
greater than Integer.MAX_VALUE.
- * @see IOUtils#toByteArray(InputStream, int)
+ * <p>This variant provides no safeguards against allocating very large
arrays.
Review Comment:
Why not? If size is small it allocates no large array?
##########
src/main/java/org/apache/commons/io/IOUtils.java:
##########
@@ -2637,57 +2645,61 @@ public static BufferedReader toBufferedReader(final
Reader reader, final int siz
}
/**
- * Gets the contents of an {@link InputStream} as a {@code byte[]}.
- * <p>
- * This method buffers the input internally, so there is no need to use a
{@link BufferedInputStream}.
- * </p>
+ * Reads all the bytes from an input stream in a byte array.
*
- * @param inputStream the {@link InputStream} to read.
- * @return the requested byte array.
- * @throws NullPointerException if the InputStream is {@code null}.
- * @throws IOException if an I/O error occurs or reading more
than {@link Integer#MAX_VALUE} occurs.
+ * <p>The memory used by this method is <strong>proportional</strong> to
the number
+ * of bytes read, which is only limited by {@link Integer#MAX_VALUE}. This
makes it unsuitable for
+ * processing large input streams, unless sufficient heap space is
available.</p>
+ *
+ * @param inputStream The {@link InputStream} to read; must not be {@code
null}.
+ * @return A new byte array.
+ * @throws IllegalArgumentException If the size of the stream is greater
than the maximum array size.
Review Comment:
IAE is a bit strange but since it’s the existing APi it’s fine to keep that.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]