Re: [PR] Validate EMF description bounds before allocation [poi]

via GitHub Sat, 09 May 2026 05:59:43 -0700


Copilot commented on code in PR #1060:
URL: https://github.com/apache/poi/pull/1060#discussion_r3213175639



##########
poi-scratchpad/src/test/java/org/apache/poi/hemf/usermodel/TestHemfPicture.java:
##########
@@ -229,6 +232,29 @@ void testInfiniteLoopOnByteArray() throws Exception {
         }
     }
 
+    @Test
+    void testHeaderDescriptionBoundsAreValidated() throws Exception {
+        HemfHeader header = new HemfHeader();
+        byte[] data = createHeaderRecordData(0x40000001L, 88);
+
+        try (LittleEndianInputStream leis = new LittleEndianInputStream(new 
ByteArrayInputStream(data))) {
+            assertThrows(RecordFormatException.class,
+                    () -> header.init(leis, data.length, 
HemfRecordType.header.id));
+        }
+    }
+
+    @Test
+    void testHeaderDescriptionParsesValidUtf16lePayload() throws Exception {
+        HemfHeader header = new HemfHeader();
+        byte[] data = createHeaderRecordDataWithDescription("POI");
+
+        try (LittleEndianInputStream leis = new LittleEndianInputStream(new 
ByteArrayInputStream(data))) {
+            long readSize = header.init(leis, data.length, 
HemfRecordType.header.id);
+            assertEquals(data.length, readSize);

Review Comment:
   `HemfHeader.init(...)` intentionally uses `mark()/reset()` to *peek* at the 
description without consuming it, so it typically returns the size of the fixed 
header fields (not the full buffer length when a description is present). As 
written, `assertEquals(data.length, readSize)` will fail once a description is 
appended. Either remove this assertion, or assert the expected consumed size 
(and/or explicitly skip the remaining bytes if you want to validate full 
consumption).
   



##########
poi-scratchpad/src/main/java/org/apache/poi/hemf/record/emf/HemfHeader.java:
##########
@@ -165,10 +167,17 @@ public long init(LittleEndianInputStream leis, long 
recordSize, long recordId) t
         size += readDimensionInt(leis, milliDimension);
 
         if (nDescription > 0 && offDescription > 0) {
-            int skip = (int)(offDescription - (size + HEADER_SIZE));
-            leis.mark(skip+nDescription*2);
-            leis.skipFully(skip);
-            byte[] buf = new byte[(nDescription-1)*2];
+            long skip = offDescription - (size + HEADER_SIZE);
+            long descriptionBytes = (nDescription - 1) * 
LittleEndianConsts.SHORT_SIZE;
+            long descriptionEnd = offDescription + nDescription * 
LittleEndianConsts.SHORT_SIZE;
+            if (skip < 0 || descriptionEnd > recordSize + HEADER_SIZE || skip 
+ descriptionBytes > Integer.MAX_VALUE) {
+                throw new RecordFormatException("Invalid EMF header 
description bounds");
+            }
+            int maxDescriptionLength = (int)Math.min(recordSize, 
Integer.MAX_VALUE);
+            IOUtils.safelyAllocateCheck(descriptionBytes, 
maxDescriptionLength);
+            leis.mark((int)(skip + descriptionBytes));
+            leis.skipFully((int)skip);
+            byte[] buf = IOUtils.safelyAllocate(descriptionBytes, 
maxDescriptionLength);

Review Comment:
   `maxDescriptionLength` is set to `recordSize` (capped at 
`Integer.MAX_VALUE`), which means `safelyAllocate*/check` won’t meaningfully 
limit allocation size when `recordSize` comes from a crafted EMF record header. 
This still allows multi-GB allocations (up to ~2GB) and potential OOM/DoS even 
though bounds are validated against the record’s declared size. Consider using 
a fixed, record-type-specific maximum (similar to `HemfText`’s max record 
length) or another established POI limit instead of `recordSize` here, so 
maliciously large declared record sizes can’t drive huge allocations.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] Validate EMF description bounds before allocation [poi]

Reply via email to