Hi,

Thank you for finding this... however, the dev list is public. We haven't received the mail to the security list yet.

There's one thing I need to test when I have more time (whether the separator on Windows is the same as the one in the canonical path), but your fix is likely good :-)

Tilman

On 21.03.2026 at 11:24, Kaixuan Li wrote:
Dear Apache PDFBox Maintainers,

I am writing to follow up on our previous report sent to the security
mailing list regarding the directory validation logic in
ExtractEmbeddedFiles. We have not yet received a response, so we submitted
a pull request to make the issue easier to review and address:

https://github.com/apache/pdfbox/pull/427

The PR refines the directory validation check to avoid prefix-based path
matching issues in the example code.

Additionally, considering Apache’s security handling practices, we
intentionally kept both the commit message and PR description neutral,
without explicitly referring to any vulnerability, to align with your
preferred process.

We would greatly appreciate it if you could take a look when convenient and
let us know if any revisions are needed.

Best regards,

Kaixuan

On Sun, Mar 15, 2026 at 8:15 PM Kaixuan Li <[email protected]> wrote:

Dear Apache Security Team,

I am Kaixuan LI, and we found that the fix commit for CVE-2026-23907 (
https://github.com/apache/pdfbox/commit/b028eafdf101b58e4ee95430c3be25e3e3aa29d7)
in ExtractEmbeddedFiles.java within PDFBox can be bypassed:

This fix uses parentDir.getCanonicalPath().startsWith(directoryPath) to
prevent path traversal. However, directoryPath does not end with a path
separator, so String.startsWith() can be tricked by sibling directories
sharing the same name prefix (CWE-23). Example:

  directoryPath         = "/home/user/Downloads"
  malicious filename    = "../Downloads-evil/payload.sh"
  canonical parent path = "/home/user/Downloads-evil"

  "/home/user/Downloads-evil".startsWith("/home/user/Downloads") → true (bypass)
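
The prefix problem described above can be reproduced with plain strings, without touching the file system. The following is only a minimal sketch: the class name PrefixCheckDemo and the helper naiveIsInside are hypothetical names made up for illustration, and the helper only mirrors the startsWith comparison quoted above rather than the actual PDFBox example code.

```java
public class PrefixCheckDemo
{
    // Hypothetical helper: mirrors the plain string-prefix comparison
    // described in the report (not the actual PDFBox code).
    static boolean naiveIsInside(String directoryPath, String canonicalParent)
    {
        return canonicalParent.startsWith(directoryPath);
    }

    public static void main(String[] args)
    {
        String directoryPath = "/home/user/Downloads";

        // The target directory itself: accepted, as intended.
        System.out.println(naiveIsInside(directoryPath, "/home/user/Downloads"));

        // A real subdirectory: accepted, as intended.
        System.out.println(naiveIsInside(directoryPath, "/home/user/Downloads/sub"));

        // A sibling directory that merely shares the name prefix:
        // also accepted, which is the bypass.
        System.out.println(naiveIsInside(directoryPath, "/home/user/Downloads-evil"));
    }
}
```

All three calls return true, including the last one, because String.startsWith() compares characters and has no notion of path components.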

We understand that the CVE-2026-23907 advisory already notes that users who
copied the example code should review their extraction paths. We wanted to
flag this because users who followed the advisory and adopted the same
canonical-path + startsWith pattern from the official fix may still be
affected. A small tweak to the example would help ensure it serves as a safe
reference.

Our suggested fix:

```diff
--- a/examples/src/main/java/org/apache/pdfbox/examples/pdmodel/ExtractEmbeddedFiles.java
+++ b/examples/src/main/java/org/apache/pdfbox/examples/pdmodel/ExtractEmbeddedFiles.java
@@ -141,7 +141,9 @@
{
File file = new File(directoryPath, filename);
File parentDir = file.getParentFile();
- if (!parentDir.getCanonicalPath().startsWith(directoryPath))
+ String parentCanonical = parentDir.getCanonicalPath();
+ if (!parentCanonical.equals(directoryPath)
+ && !parentCanonical.startsWith(directoryPath + File.separator))
{
System.err.println("Ignoring " + filename + " (different directory)");
return;
```
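
For readers who want to try the comparison from the diff in isolation, here is a minimal standalone sketch. The class name DirectoryCheckSketch and the helper isInsideDirectory are hypothetical, and the sketch assumes that directoryPath is already in canonical form and has no trailing separator. Paths are built from File.separator so the demo behaves the same on any platform.

```java
import java.io.File;

public class DirectoryCheckSketch
{
    // Hypothetical helper: same comparison as in the suggested diff.
    // Assumes directoryPath is canonical and has no trailing separator.
    static boolean isInsideDirectory(String directoryPath, String parentCanonical)
    {
        return parentCanonical.equals(directoryPath)
                || parentCanonical.startsWith(directoryPath + File.separator);
    }

    public static void main(String[] args)
    {
        String sep = File.separator;
        String directoryPath = sep + "home" + sep + "user" + sep + "Downloads";

        // The directory itself is accepted via equals().
        System.out.println(isInsideDirectory(directoryPath, directoryPath));

        // A subdirectory is accepted via the separator-terminated prefix.
        System.out.println(isInsideDirectory(directoryPath, directoryPath + sep + "sub"));

        // The sibling "Downloads-evil" no longer matches.
        System.out.println(isInsideDirectory(directoryPath, directoryPath + "-evil"));
    }
}
```

The equals() branch is needed because the directory itself does not start with its own path plus a separator. A possible alternative, not part of the proposed patch, is java.nio.file.Path.startsWith(Path), which compares whole name elements rather than characters and so does not depend on the separator character.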

We have verified that this fix passes the existing TestEmbeddedFiles test suite
(examples/src/test/java/org/apache/pdfbox/examples/pdmodel/TestEmbeddedFiles.java)
via mvn test -pl examples -Dtest="org.apache.pdfbox.examples.pdmodel.TestEmbeddedFiles".

Best regards,
Kaixuan


