Dear Apache PDFBox Maintainers,

I am writing to follow up on our previous report sent to the security mailing list regarding the directory validation logic in ExtractEmbeddedFiles. We have not yet received a response, so we submitted a pull request to make the issue easier to review and address:
https://github.com/apache/pdfbox/pull/427

The PR refines the directory validation check to avoid prefix-based path matching issues in the example code. In keeping with Apache's security handling practices, we intentionally kept both the commit message and the PR description neutral, without explicitly referring to any vulnerability.

We would greatly appreciate it if you could take a look when convenient and let us know if any revisions are needed.

Best regards,
Kaixuan

On Sun, Mar 15, 2026 at 8:15 PM Kaixuan Li <[email protected]> wrote:

> Dear Apache Security Team,
>
> I am Kaixuan Li. We found that the fix commit for CVE-2026-23907
> (https://github.com/apache/pdfbox/commit/b028eafdf101b58e4ee95430c3be25e3e3aa29d7)
> in ExtractEmbeddedFiles.java within PDFBox can be bypassed.
>
> The fix uses parentDir.getCanonicalPath().startsWith(directoryPath) to
> prevent path traversal. However, directoryPath does not end with a path
> separator, so String.startsWith() can be tricked by sibling directories
> sharing the same name prefix (CWE-23). Example:
>
>     directoryPath         = "/home/user/Downloads"
>     malicious filename    = "../Downloads-evil/payload.sh"
>     canonical parent path = "/home/user/Downloads-evil"
>
>     "/home/user/Downloads-evil".startsWith("/home/user/Downloads") → true (bypass)
>
> We understand that the CVE-2026-23907 advisory already notes that users
> who copied the example code should review their extraction paths. We
> wanted to flag this because users who followed the advisory and adopted
> the same canonical-path + startsWith pattern from the official fix may
> still be affected. A small tweak to the example would help ensure it
> serves as a safe reference.
>
> Our suggested fix:
>
> ```diff
> --- a/examples/src/main/java/org/apache/pdfbox/examples/pdmodel/ExtractEmbeddedFiles.java
> +++ b/examples/src/main/java/org/apache/pdfbox/examples/pdmodel/ExtractEmbeddedFiles.java
> @@ -141,7 +141,9 @@
>  {
>      File file = new File(directoryPath, filename);
>      File parentDir = file.getParentFile();
> -    if (!parentDir.getCanonicalPath().startsWith(directoryPath))
> +    String parentCanonical = parentDir.getCanonicalPath();
> +    if (!parentCanonical.equals(directoryPath)
> +            && !parentCanonical.startsWith(directoryPath + File.separator))
>      {
>          System.err.println("Ignoring " + filename + " (different directory)");
>          return;
> ```
>
> We have verified that this fix passes the existing TestEmbeddedFiles test
> suite (examples/src/test/java/org/apache/pdfbox/examples/pdmodel/TestEmbeddedFiles.java)
> via:
>
>     mvn test -pl examples -Dtest="org.apache.pdfbox.examples.pdmodel.TestEmbeddedFiles"
>
> Best regards,
> Kaixuan
>
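For reviewers who want to reproduce the prefix issue in isolation, the following is a minimal standalone sketch, not PDFBox code. The class name PrefixCheckDemo and its three helper methods are hypothetical names introduced only for illustration, and the paths are assumed to be already canonical. It compares the plain String.startsWith check, the separator-aware check from the suggested fix, and, as an alternative that is not part of the proposed patch, the component-wise java.nio.file.Path.startsWith.

```java
import java.io.File;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical demo class; none of these names exist in PDFBox.
public class PrefixCheckDemo {

    // Plain string prefix check, as in the current example code.
    static boolean naiveIsInside(String parentCanonical, String directoryPath) {
        return parentCanonical.startsWith(directoryPath);
    }

    // Separator-aware check, mirroring the suggested fix:
    // accept the directory itself or anything below "directoryPath + separator".
    static boolean strictIsInside(String parentCanonical, String directoryPath) {
        return parentCanonical.equals(directoryPath)
                || parentCanonical.startsWith(directoryPath + File.separator);
    }

    // Alternative (an assumption, not in the patch): Path.startsWith compares
    // whole path components rather than characters.
    static boolean pathIsInside(String parentCanonical, String directoryPath) {
        Path parent = Paths.get(parentCanonical).normalize();
        Path base = Paths.get(directoryPath).normalize();
        return parent.startsWith(base);
    }

    public static void main(String[] args) {
        String sep = File.separator;
        // Built with File.separator so the demo behaves the same on any platform.
        String base = sep + "home" + sep + "user" + sep + "Downloads";
        String sibling = base + "-evil";      // sibling directory sharing the prefix
        String child = base + sep + "sub";    // genuine subdirectory

        System.out.println("naive  sibling: " + naiveIsInside(sibling, base));
        System.out.println("strict sibling: " + strictIsInside(sibling, base));
        System.out.println("path   sibling: " + pathIsInside(sibling, base));
        System.out.println("strict child:   " + strictIsInside(child, base));
        System.out.println("strict same:    " + strictIsInside(base, base));
    }
}
```

With these inputs, only the plain prefix check accepts the sibling directory; both stricter variants reject it while still accepting the base directory and its subdirectory. Both string-based variants assume that directoryPath is itself canonical and has no trailing separator.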
