This is an automated email from the ASF dual-hosted git repository.

ppkarwasz pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/commons-xml.git


The following commit(s) were added to refs/heads/main by this push:
     new bfa4a5a  COMMONSXML-1: DocumentBuilderFactory: capability-driven 
hardening (#5)
bfa4a5a is described below

commit bfa4a5a1155575dc7531e54f32a5dccbe1957a91
Author: Piotr P. Karwasz <[email protected]>
AuthorDate: Sat Jun 27 19:54:27 2026 +0200

    COMMONSXML-1: DocumentBuilderFactory: capability-driven hardening (#5)
    
    * Base DocumentBuilderFactory hardening on feature support
    
    This change replaces the per-implementation class-name dispatch for DOM 
with a single capability-driven recipe. Secure processing is required and the 
external-DTD subset is skipped where supported; whether the JAXP 1.5 
accessExternal properties are honoured then decides whether the bare factory is 
already safe or needs a deny-all resolver wrapper, which is the only point 
where the stock JDK and the external Xerces distribution diverge. Android stays 
untouched because its parser expose [...]
    
    An implementation is no longer rejected for being unrecognised. Any parser 
that accepts secure processing and either the access properties or the resolver 
wrapper satisfies the contract on its own, so only a parser that refuses secure 
processing now fails.
    
    A new test covers the previously unverified guarantee that an 
xsi:schemaLocation hint is not fetched during DOM-side XSD validation, gated on 
parsers that honour accessExternalSchema.
    
    Assisted-By: Claude Opus 4.8 <[email protected]>
    
    * Add minimal-footprint DocumentBuilderHardener.newInstance() entry point
    
    Expose DocumentBuilderHardener with a public newInstance() so consumers that
    only need a hardened DocumentBuilderFactory can shade the library and copy a
    minimal set of classes. jdependency (the engine maven-shade-plugin's
    minimizeJar uses) works at class granularity, so the shaded set is the
    transitive closure of this one class.
    
    To keep that closure small, install a deny-all EntityResolver as a lambda
    local to DocumentBuilderHardener instead of reusing 
Resolvers.DenyAll.ENTITY2.
    EntityResolver's resolveEntity(publicId, systemId) hook is consulted for the
    external DTD subset and every external entity, so it blocks all external
    fetches on the DOM path just as well, while dropping the whole Resolvers
    nested-class tree from the closure (12 -> 7 class files).
    
    Route XmlFactories.newDocumentBuilderFactory() through the new entry point, 
and
    add ShadingFootprintTest, which uses jdependency to pin the reachable set to
    exactly those 7 classes so the footprint cannot silently grow back toward 
the
    full library.
    
    Assisted-By: Claude Opus 4.8 (1M context) <[email protected]>
    
    * fix: keep `DelegatingDocumentBuilderFactory` a top-level class
    
    * Fix Android test compilation
    
    The androidTest source set compiles all of ../src/test/java against
    android.jar, which broke on two tests new to this branch:
    
    - ShadingFootprintTest needs org.vafer.jdependency (a Maven-only test
      dependency) and pins the shading footprint, which is meaningless on
      Android. Exclude it from the Android test compilation.
    
    - SchemaLocationDomTest referenced XMLConstants.ACCESS_EXTERNAL_SCHEMA,
      which android.jar does not expose. Inline the constant's value
      (verified equal to the JDK constant); the test still skips at runtime
      on Android, since the parser does not honour that property.
    
    Verified on the api33 (Pixel 6a) managed device: 77 tests, 0 failures,
    0 errors, 17 assumption-gated skips.
    
    Assisted-By: Claude Opus 4.8 (1M context) <[email protected]>
    
    * Gate SchemaLocationDomTest on schema-language support, not 
accessExternalSchema
    
    The test exercises JAXP 1.2 XSD validation, which requires the 
SCHEMA_LANGUAGE
    property; that, not accessExternalSchema, is the capability a parser must 
have
    to run it (Android lacks it). Gate both methods on supportsSchemaLanguage().
    
    The hardened assertion also assumed the block was attributed to
    accessExternalSchema, which only holds on the stock JDK. External Xerces 
does
    not honour that property and instead blocks the schema fetch through the
    deny-all resolver, with a different message. Assert only that the failure
    references the external schema, not the mechanism.
    
    Two effects:
    - The test now runs and passes on external Xerces too (covering the
      resolver-based schema-block path), where it previously skipped.
    - It no longer references XMLConstants.ACCESS_EXTERNAL_SCHEMA, so the
      android.jar compile gap is resolved by this redesign rather than by
      inlining the constant.
    
    Verified: mvn -o test green across all executions; SchemaLocationDomTest 
runs
    2/2 with 0 skips on both stock JDK and external Xerces.
    
    Assisted-By: Claude Opus 4.8 (1M context) <[email protected]>
    
    * Remove the minimal-footprint shading entry point from this PR
    
    Split per review: this PR (COMMONSXML-1) keeps only the capability-driven
    DocumentBuilderFactory hardening. The minimal shading entry point (public
    DocumentBuilderHardener.newInstance(), the inlined deny-all resolver,
    ShadingFootprintTest and the jdependency test dependency) will be filed as
    its own issue.
    
    Reverses 68ad1d1 and 90ef0ae: DocumentBuilderHardener is package-private
    again and harden() reuses Resolvers.DenyAll.ENTITY2;
    XmlFactories.newDocumentBuilderFactory() calls harden() directly.
    
    Assisted-By: Claude Opus 4.8 (1M context) <[email protected]>
    
    * fix: apply review suggestions
---
 android-tests/build.gradle.kts                     |   1 +
 .../org/apache/commons/xml/AndroidProvider.java    |   4 -
 .../commons/xml/DocumentBuilderHardener.java       | 103 +++++++++++++++++++
 .../java/org/apache/commons/xml/JaxpSetters.java   |  22 ++++
 src/main/java/org/apache/commons/xml/Limits.java   |  30 +++++-
 .../org/apache/commons/xml/StockJdkProvider.java   |  14 ---
 .../org/apache/commons/xml/XercesProvider.java     |  53 +---------
 .../java/org/apache/commons/xml/XmlFactories.java  |  24 +----
 .../apache/commons/xml/SchemaLocationDomTest.java  | 112 +++++++++++++++++++++
 .../xml/UnsupportedXmlImplementationTest.java      |  14 +--
 .../resources/leaked/schema-location-instance.xml  |   4 +
 src/test/resources/leaked/schema-location.xsd      |  10 ++
 12 files changed, 294 insertions(+), 97 deletions(-)

diff --git a/android-tests/build.gradle.kts b/android-tests/build.gradle.kts
index 31e4161..a8938b9 100644
--- a/android-tests/build.gradle.kts
+++ b/android-tests/build.gradle.kts
@@ -16,6 +16,7 @@
  */
 
 import com.android.build.api.dsl.ManagedVirtualDevice
+import org.gradle.api.tasks.compile.JavaCompile
 
 plugins {
     id("com.android.library") version "8.6.1"
diff --git a/src/main/java/org/apache/commons/xml/AndroidProvider.java 
b/src/main/java/org/apache/commons/xml/AndroidProvider.java
index c71aacb..e32f1ee 100644
--- a/src/main/java/org/apache/commons/xml/AndroidProvider.java
+++ b/src/main/java/org/apache/commons/xml/AndroidProvider.java
@@ -172,10 +172,6 @@ public void setFeature(final String name, final boolean 
value) throws SAXNotReco
 
     private static final String NAMESPACE_PREFIXES_FEATURE = 
"http://xml.org/sax/features/namespace-prefixes";;
 
-    static DocumentBuilderFactory configure(final DocumentBuilderFactory 
factory) {
-        return factory;
-    }
-
     static SAXParserFactory configure(final SAXParserFactory factory) {
         return new GuardedSAXParserFactory(factory);
     }
diff --git a/src/main/java/org/apache/commons/xml/DocumentBuilderHardener.java 
b/src/main/java/org/apache/commons/xml/DocumentBuilderHardener.java
new file mode 100644
index 0000000..218e771
--- /dev/null
+++ b/src/main/java/org/apache/commons/xml/DocumentBuilderHardener.java
@@ -0,0 +1,103 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *      https://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.commons.xml;
+
+import static org.apache.commons.xml.JaxpSetters.setFeature;
+import static org.apache.commons.xml.JaxpSetters.setOptionalFeature;
+import static org.apache.commons.xml.JaxpSetters.trySetAttribute;
+
+import javax.xml.XMLConstants;
+import javax.xml.parsers.DocumentBuilder;
+import javax.xml.parsers.DocumentBuilderFactory;
+import javax.xml.parsers.ParserConfigurationException;
+
+import org.xml.sax.EntityResolver;
+
+/**
+ * Capability-driven hardening for any {@link DocumentBuilderFactory} on the 
classpath.
+ *
+ * <p>Rather than branching on the implementation class, {@link 
#harden(DocumentBuilderFactory)} probes what the factory supports and 
adapts:</p>
+ * <ul>
+ *     <li><strong>Android</strong> (Harmony / KXmlParser): recognised by 
class name and left untouched. It exposes no {@link 
XMLConstants#FEATURE_SECURE_PROCESSING
+ *         FSP}, no JAXP 1.5 {@code ACCESS_EXTERNAL_*} and no attribute API at 
all, while KXmlParser silently drops user-defined entities, so there is nothing 
to
+ *         apply.</li>
+ *     <li><strong>FSP</strong>: required. It switches on the implementation's 
built-in security manager, which is what carries the processing limits.</li>
+ *     <li><strong>{@code XERCES_LOAD_EXTERNAL_DTD}</strong>: optional. Where 
supported, it skips the external DTD subset on non-validating parsers so a
+ *         DOCTYPE-only document parses without a fetch attempt. If not 
supported, the fetch will throw instead, due to the following settings.</li>
+ *     <li><strong>Limits</strong>: applied best-effort by {@link 
Limits#tryApply(DocumentBuilderFactory)}, which adapts to the JDK attribute 
limits or Xerces'
+ *         {@code SecurityManager} as appropriate.</li>
+ *     <li><strong>{@code ACCESS_EXTERNAL_DTD}</strong>: the dividing 
capability. Implementations that honour it (the JDK-internal Xerces) block 
external fetches
+ *         through the JAXP 1.5 properties and are returned as-is. 
Implementations that reject it (the external Xerces distribution) are wrapped 
so a deny-all
+ *         {@link EntityResolver} is installed on every {@link 
DocumentBuilder} produced.</li>
+ * </ul>
+ */
+final class DocumentBuilderHardener {
+
+    /**
+     * Wrapper that sets a deny-all {@link EntityResolver} on every {@link 
DocumentBuilder} produced.
+     *
+     * <p>Required for implementations that do not honour JAXP 1.5 {@code 
ACCESS_EXTERNAL_*} (the external Xerces distribution): the factory carries no 
resolver
+     * of its own, so it has to be set on each builder.</p>
+     */
+    private static final class HardeningDocumentBuilderFactory extends 
DelegatingDocumentBuilderFactory {
+
+        private final EntityResolver resolver;
+
+        HardeningDocumentBuilderFactory(final DocumentBuilderFactory delegate, 
final EntityResolver resolver) {
+            super(delegate);
+            this.resolver = resolver;
+        }
+
+        @Override
+        public DocumentBuilder newDocumentBuilder() throws 
ParserConfigurationException {
+            final DocumentBuilder builder = super.newDocumentBuilder();
+            builder.setEntityResolver(resolver);
+            return builder;
+        }
+    }
+
+    /** Class name of Android's Harmony-based {@link DocumentBuilderFactory}, 
which exposes no hardening surface. */
+    private static final String ANDROID_DOCUMENT_BUILDER_FACTORY = 
"org.apache.harmony.xml.parsers.DocumentBuilderFactoryImpl";
+
+    /** Xerces feature: load the external DTD subset for non-validating 
parsers. */
+    private static final String XERCES_LOAD_EXTERNAL_DTD = 
"http://apache.org/xml/features/nonvalidating/load-external-dtd";;
+
+    static DocumentBuilderFactory harden(final DocumentBuilderFactory factory) 
{
+        // Android exposes no FSP, ACCESS_EXTERNAL_* or attribute API, and 
KXmlParser drops user-defined entities; nothing to apply.
+        if 
(ANDROID_DOCUMENT_BUILDER_FACTORY.equals(factory.getClass().getName())) {
+            return factory;
+        }
+        // Required: enables the implementation's security manager, which 
carries the limits.
+        setFeature(factory, XMLConstants.FEATURE_SECURE_PROCESSING, true);
+        // Optional, implementation-based: JDK attribute limits or Xerces' 
SecurityManager.
+        Limits.tryApply(factory);
+        // Optional: skip the external DTD subset on non-validating parsers so 
DOCTYPE-only documents parse without a blocked fetch attempt.
+        setOptionalFeature(factory, XERCES_LOAD_EXTERNAL_DTD, false);
+        // ACCESS_EXTERNAL_* support is the dividing capability between JAXP 
1.5 implementations and older ones.
+        if (trySetAttribute(factory, XMLConstants.ACCESS_EXTERNAL_DTD, "")
+                && trySetAttribute(factory, 
XMLConstants.ACCESS_EXTERNAL_SCHEMA, "")) {
+            // Honoured: the JAXP 1.5 properties block external fetches, so 
the bare factory is already hardened.
+            return factory;
+        }
+        // Rejected: external Xerces ignores ACCESS_EXTERNAL_*; install a 
deny-all resolver on every DocumentBuilder.
+        return new HardeningDocumentBuilderFactory(factory, 
Resolvers.DenyAll.ENTITY2);
+    }
+
+    private DocumentBuilderHardener() {
+    }
+}
diff --git a/src/main/java/org/apache/commons/xml/JaxpSetters.java 
b/src/main/java/org/apache/commons/xml/JaxpSetters.java
index 55c7f62..e4ffa05 100644
--- a/src/main/java/org/apache/commons/xml/JaxpSetters.java
+++ b/src/main/java/org/apache/commons/xml/JaxpSetters.java
@@ -55,6 +55,20 @@ static void setAttribute(final DocumentBuilderFactory 
factory, final String attr
         apply(factory, "attribute", attribute, () -> 
factory.setAttribute(attribute, value));
     }
 
+    /** @return {@code true} if the attribute was applied, {@code false} if 
the implementation rejected it. */
+    static boolean trySetAttribute(final DocumentBuilderFactory factory, final 
String attribute, final Object value) {
+        try {
+            factory.setAttribute(attribute, value);
+            return true;
+        } catch (final Exception e) {
+            return false;
+        }
+    }
+
+    static void setOptionalAttribute(final DocumentBuilderFactory factory, 
final String attribute, final Object value) {
+        trySetAttribute(factory, attribute, value);
+    }
+
     static void setAttribute(final TransformerFactory factory, final String 
attribute, final Object value) {
         apply(factory, "attribute", attribute, () -> 
factory.setAttribute(attribute, value));
     }
@@ -63,6 +77,14 @@ static void setFeature(final DocumentBuilderFactory factory, 
final String featur
         apply(factory, "feature", feature, () -> factory.setFeature(feature, 
value));
     }
 
+    static void setOptionalFeature(final DocumentBuilderFactory factory, final 
String feature, final boolean value) {
+        try {
+            factory.setFeature(feature, value);
+        } catch (final Exception e) {
+            // Ignored: the implementation does not recognise this feature.
+        }
+    }
+
     static void setFeature(final SAXParserFactory factory, final String 
feature, final boolean value) {
         apply(factory, "feature", feature, () -> factory.setFeature(feature, 
value));
     }
diff --git a/src/main/java/org/apache/commons/xml/Limits.java 
b/src/main/java/org/apache/commons/xml/Limits.java
index c1f9a8f..03ff250 100644
--- a/src/main/java/org/apache/commons/xml/Limits.java
+++ b/src/main/java/org/apache/commons/xml/Limits.java
@@ -17,6 +17,7 @@
 package org.apache.commons.xml;
 
 import static org.apache.commons.xml.JaxpSetters.setAttribute;
+import static org.apache.commons.xml.JaxpSetters.setOptionalAttribute;
 import static org.apache.commons.xml.JaxpSetters.setProperty;
 
 import java.util.Collections;
@@ -216,6 +217,10 @@ final class Limits {
      * Woodstox property: maximum number of entity expansions in a single 
parse.
      */
     private static final String WSTX_MAX_ENTITY_COUNT = 
"com.ctc.wstx.maxEntityCount";
+    /**
+     * Class name of the external Apache Xerces {@link 
DocumentBuilderFactory}, whose limits live on a {@code SecurityManager} rather 
than JDK attributes.
+     */
+    private static final String EXTERNAL_XERCES_DOCUMENT_BUILDER_FACTORY = 
"org.apache.xerces.jaxp.DocumentBuilderFactoryImpl";
 
     static {
         final Map<String, IntSupplier> map = new LinkedHashMap<>();
@@ -231,12 +236,23 @@ final class Limits {
     }
 
     /**
-     * Sets every JDK-supported limit on a stock JDK {@link 
DocumentBuilderFactory}.
+     * Best-effort application of the processing limits to a {@link 
DocumentBuilderFactory}, dispatched on the implementation.
+     *
+     * <p>External Xerces carries its limits on an {@code 
org.apache.xerces.util.SecurityManager} instance. Every other implementation 
(the stock JDK and any
+     * future attribute-based parser) takes the JDK limit attributes. Neither 
path throws if the implementation declines a limit.</p>
      *
      * @param factory The target factory to modify.
      */
-    static void applyToJdkDom(final DocumentBuilderFactory factory) {
-        JDK_LIMITS.forEach((name, supplier) -> setAttribute(factory, name, 
Integer.toString(supplier.getAsInt())));
+    static void tryApply(final DocumentBuilderFactory factory) {
+        if 
(EXTERNAL_XERCES_DOCUMENT_BUILDER_FACTORY.equals(factory.getClass().getName())) 
{
+            // Install a fresh SecurityManager pinned to JDK 25 limits, 
replacing Xerces' built-in caps which are looser than even JDK 8.
+            final Object securityManager = newSecurityManager();
+            applyToXerces(securityManager);
+            setAttribute(factory, 
XercesProvider.XERCES_SECURITY_MANAGER_PROPERTY, securityManager);
+            return;
+        }
+        // Pin the JDK attribute limits to JDK 25 secure values; skip silently 
any attribute the implementation does not recognise.
+        JDK_LIMITS.forEach((name, supplier) -> setOptionalAttribute(factory, 
name, Integer.toString(supplier.getAsInt())));
     }
 
     /**
@@ -304,6 +320,14 @@ static void applyToXerces(final Object securityManager) {
         }
     }
 
+    private static Object newSecurityManager() {
+        try {
+            return 
Class.forName("org.apache.xerces.util.SecurityManager").getDeclaredConstructor().newInstance();
+        } catch (final ReflectiveOperationException e) {
+            throw new HardeningException("Failed to instantiate 
org.apache.xerces.util.SecurityManager; expected Xerces to be on the 
classpath", e);
+        }
+    }
+
     private static int getElementAttributeLimit() {
         return read(SP_ELEMENT_ATTRIBUTE_LIMIT, 
DEFAULT_ELEMENT_ATTRIBUTE_LIMIT);
     }
diff --git a/src/main/java/org/apache/commons/xml/StockJdkProvider.java 
b/src/main/java/org/apache/commons/xml/StockJdkProvider.java
index fa31557..99a7d8d 100644
--- a/src/main/java/org/apache/commons/xml/StockJdkProvider.java
+++ b/src/main/java/org/apache/commons/xml/StockJdkProvider.java
@@ -22,7 +22,6 @@
 import static org.apache.commons.xml.JaxpSetters.setProperty;
 
 import javax.xml.XMLConstants;
-import javax.xml.parsers.DocumentBuilderFactory;
 import javax.xml.parsers.SAXParserFactory;
 import javax.xml.stream.XMLInputFactory;
 import javax.xml.transform.TransformerFactory;
@@ -72,19 +71,6 @@ final class StockJdkProvider {
      */
     private static final String ZEPHYR_IGNORE_EXTERNAL_DTD = 
"http://java.sun.com/xml/stream/properties/ignore-external-dtd";;
 
-    static DocumentBuilderFactory configure(final DocumentBuilderFactory 
factory) {
-        // Required: enables the JDK XMLSecurityManager limits.
-        setFeature(factory, XMLConstants.FEATURE_SECURE_PROCESSING, true);
-        // Let DOCTYPE-only documents parse silently without SSRF: skip the 
external DTD subset on non-validating parsers.
-        setFeature(factory, XERCES_LOAD_EXTERNAL_DTD, false);
-        // Defense-in-depth: pin to JDK 25 limits so older JDKs do not fall 
back to looser secure values.
-        Limits.applyToJdkDom(factory);
-        // Defense-in-depth: already FSP-secure defaults, set explicitly so 
they are not relaxed via system property.
-        setAttribute(factory, XMLConstants.ACCESS_EXTERNAL_DTD, "");
-        setAttribute(factory, XMLConstants.ACCESS_EXTERNAL_SCHEMA, "");
-        return factory;
-    }
-
     static SAXParserFactory configure(final SAXParserFactory factory) {
         // Required: enables the JDK XMLSecurityManager limits.
         setFeature(factory, XMLConstants.FEATURE_SECURE_PROCESSING, true);
diff --git a/src/main/java/org/apache/commons/xml/XercesProvider.java 
b/src/main/java/org/apache/commons/xml/XercesProvider.java
index 7b89be1..4634c9f 100644
--- a/src/main/java/org/apache/commons/xml/XercesProvider.java
+++ b/src/main/java/org/apache/commons/xml/XercesProvider.java
@@ -20,9 +20,6 @@
 import static org.apache.commons.xml.JaxpSetters.setFeature;
 
 import javax.xml.XMLConstants;
-import javax.xml.parsers.DocumentBuilder;
-import javax.xml.parsers.DocumentBuilderFactory;
-import javax.xml.parsers.ParserConfigurationException;
 import javax.xml.parsers.SAXParser;
 import javax.xml.parsers.SAXParserFactory;
 import javax.xml.validation.Schema;
@@ -30,7 +27,6 @@
 import javax.xml.validation.Validator;
 import javax.xml.validation.ValidatorHandler;
 
-import org.xml.sax.EntityResolver;
 import org.xml.sax.SAXNotRecognizedException;
 import org.xml.sax.SAXNotSupportedException;
 import org.xml.sax.XMLReader;
@@ -39,7 +35,7 @@
  * Hardening recipes for the external Apache Xerces distribution (the {@code 
xerces:xercesImpl} artifact).
  *
  * <p>Factory classes live in the {@code org.apache.xerces.*} package. 
External Xerces does not ship a {@code TransformerFactory}, {@code 
XMLInputFactory} or
- * {@code XPathFactory}, so this class only handles DOM, SAX and Schema 
factories.</p>
+ * {@code XPathFactory}, so this class only handles SAX and Schema factories. 
DOM hardening lives in {@link DocumentBuilderHardener}.</p>
  *
  * <p>Hardening recipe applied to every factory below uses the same building 
blocks:</p>
  * <ul>
@@ -53,8 +49,7 @@
  *         {@code ACCESS_EXTERNAL_*} properties, so an explicit resolver 
installed on every parser/validator is the best way to block external
  *         entity, DTD and schema fetching, without disabling those features 
altogether. The wrappers exist for two reasons:</p>
  *         <ol>
- *             <li>{@link DocumentBuilderFactory} / {@link SAXParserFactory} 
carry no resolver, so it has to be set on each
- *             {@link DocumentBuilder} / {@link SAXParser} produced;</li>
+ *             <li>{@link SAXParserFactory} carries no resolver, so it has to 
be set on each {@link SAXParser} produced.</li>
  *             <li>Xerces' {@link Schema} does not propagate the {@link 
SchemaFactory}'s resolver or security manager to its
  *             {@link Validator} / {@link ValidatorHandler} products, so the 
wrapper re-installs both on every product.</li>
  *         </ol>
@@ -63,29 +58,6 @@
  */
 final class XercesProvider {
 
-    /**
-     * Hardened Xerces {@link DocumentBuilderFactory} wrapper.
-     *
-     * <p>Sets the deny-all {@link EntityResolver} on every {@link 
DocumentBuilder} produced; required because {@link DocumentBuilderFactory} 
carries no
-     * resolver of its own and Xerces does not honour JAXP 1.5 {@code 
ACCESS_EXTERNAL_*}.</p>
-     */
-    private static final class HardeningDocumentBuilderFactory extends 
DelegatingDocumentBuilderFactory {
-
-        private final EntityResolver resolver;
-
-        HardeningDocumentBuilderFactory(final DocumentBuilderFactory delegate, 
final EntityResolver resolver) {
-            super(delegate);
-            this.resolver = resolver;
-        }
-
-        @Override
-        public DocumentBuilder newDocumentBuilder() throws 
ParserConfigurationException {
-            final DocumentBuilder builder = super.newDocumentBuilder();
-            builder.setEntityResolver(resolver);
-            return builder;
-        }
-    }
-
     private static Validator hardenValidator(final Validator validator) {
         try {
             
Limits.applyToXerces(validator.getProperty(XERCES_SECURITY_MANAGER_PROPERTY));
@@ -114,27 +86,6 @@ private static ValidatorHandler 
hardenValidatorHandler(final ValidatorHandler ha
     /** Xerces feature: load the external DTD subset for non-validating 
parsers. */
     private static final String XERCES_LOAD_EXTERNAL_DTD = 
"http://apache.org/xml/features/nonvalidating/load-external-dtd";;
 
-    private static Object newSecurityManager() {
-        try {
-            return 
Class.forName("org.apache.xerces.util.SecurityManager").getDeclaredConstructor().newInstance();
-        } catch (final ReflectiveOperationException e) {
-            throw new HardeningException("Failed to instantiate 
org.apache.xerces.util.SecurityManager; expected Xerces to be on the 
classpath", e);
-        }
-    }
-
-    static DocumentBuilderFactory configure(final DocumentBuilderFactory 
factory) {
-        // Required: enables Xerces' built-in SecurityManager (which is what 
carries the limits).
-        setFeature(factory, XMLConstants.FEATURE_SECURE_PROCESSING, true);
-        // Let DOCTYPE-only documents parse silently without SSRF: skip the 
external DTD subset on non-validating parsers.
-        setFeature(factory, XERCES_LOAD_EXTERNAL_DTD, false);
-        // Defense-in-depth: install a fresh SecurityManager pinned to JDK 25 
limits, replacing Xerces' built-in caps which are looser than even JDK 8.
-        final Object securityManager = newSecurityManager();
-        Limits.applyToXerces(securityManager);
-        factory.setAttribute(XERCES_SECURITY_MANAGER_PROPERTY, 
securityManager);
-        // Required: Xerces does not honour JAXP 1.5 ACCESS_EXTERNAL_*; the 
wrapper installs a deny-all resolver on every DocumentBuilder.
-        return new HardeningDocumentBuilderFactory(factory, 
Resolvers.DenyAll.ENTITY2);
-    }
-
     static SAXParserFactory configure(final SAXParserFactory factory) {
         // Required: enables Xerces' built-in SecurityManager (which is what 
carries the limits).
         setFeature(factory, XMLConstants.FEATURE_SECURE_PROCESSING, true);
diff --git a/src/main/java/org/apache/commons/xml/XmlFactories.java 
b/src/main/java/org/apache/commons/xml/XmlFactories.java
index b97c59f..6228256 100644
--- a/src/main/java/org/apache/commons/xml/XmlFactories.java
+++ b/src/main/java/org/apache/commons/xml/XmlFactories.java
@@ -73,19 +73,6 @@
  */
 public final class XmlFactories {
 
-    static DocumentBuilderFactory dispatch(final DocumentBuilderFactory 
factory) {
-        switch (factory.getClass().getName()) {
-            case 
"com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderFactoryImpl":
-                return StockJdkProvider.configure(factory);
-            case "org.apache.harmony.xml.parsers.DocumentBuilderFactoryImpl":
-                return AndroidProvider.configure(factory);
-            case "org.apache.xerces.jaxp.DocumentBuilderFactoryImpl":
-                return XercesProvider.configure(factory);
-            default:
-                throw noProvider(factory);
-        }
-    }
-
     private static SAXParserFactory dispatch(final SAXParserFactory factory) {
         switch (factory.getClass().getName()) {
             case 
"com.sun.org.apache.xerces.internal.jaxp.SAXParserFactoryImpl":
@@ -197,16 +184,15 @@ public static XMLReader harden(final XMLReader reader) {
     /**
      * Returns a fresh, hardened {@link DocumentBuilderFactory}.
      *
-     * <p>Beyond the three universal guarantees on {@link XmlFactories}, 
XInclude resolution is disabled. Calling
-     * {@link DocumentBuilderFactory#setXIncludeAware(boolean) 
setXIncludeAware(true)} on the returned factory does not re-enable resolution; 
a parse that
-     * encounters an {@code xi:include} element fails.</p>
+     * <p><strong>Enabling XInclude:</strong> {@link 
DocumentBuilderFactory#setXIncludeAware(boolean) setXIncludeAware(true)} on its 
own does not make XInclude
+     * usable, because an included resource is fetched like any other external 
resource and is therefore blocked, failing the parse. A caller that genuinely
+     * wants XInclude must, in addition to enabling awareness, install a 
custom {@link org.xml.sax.EntityResolver} that permits those specific 
lookups.</p>
      *
      * @return a hardened factory.
-     * @throws IllegalStateException if the underlying JAXP implementation is 
not recognised by any bundled hardening recipe, or if the matching recipe cannot
-     *         apply its settings to it.
+     * @throws IllegalStateException if a required hardening setting cannot be 
applied to the underlying implementation.
      */
     public static DocumentBuilderFactory newDocumentBuilderFactory() {
-        return dispatch(DocumentBuilderFactory.newInstance());
+        return 
DocumentBuilderHardener.harden(DocumentBuilderFactory.newInstance());
     }
 
     /**
diff --git a/src/test/java/org/apache/commons/xml/SchemaLocationDomTest.java 
b/src/test/java/org/apache/commons/xml/SchemaLocationDomTest.java
new file mode 100644
index 0000000..8b873e3
--- /dev/null
+++ b/src/test/java/org/apache/commons/xml/SchemaLocationDomTest.java
@@ -0,0 +1,112 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *      https://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.commons.xml;
+
+import static org.apache.commons.xml.AttackTestSupport.LEAKED_MARKER;
+import static org.apache.commons.xml.AttackTestSupport.resourceUrl;
+import static org.junit.jupiter.api.Assertions.assertEquals;
+import static org.junit.jupiter.api.Assertions.assertThrows;
+import static org.junit.jupiter.api.Assertions.assertTrue;
+import static org.junit.jupiter.api.Assumptions.assumeTrue;
+
+import javax.xml.XMLConstants;
+import javax.xml.parsers.DocumentBuilder;
+import javax.xml.parsers.DocumentBuilderFactory;
+
+import org.junit.jupiter.api.Tag;
+import org.junit.jupiter.api.Test;
+import org.w3c.dom.Document;
+import org.xml.sax.InputSource;
+import org.xml.sax.SAXException;
+import org.xml.sax.SAXParseException;
+import org.xml.sax.helpers.DefaultHandler;
+
+/**
+ * Checks that a hardened {@link DocumentBuilderFactory} performing JAXP 1.2 
XSD validation does not fetch an external schema named by an
+ * {@code xsi:noNamespaceSchemaLocation} hint in the instance document.
+ *
+ * <p>The instance is empty {@code <root/>}; the referenced schema declares a 
default {@code leak} attribute carrying {@link 
AttackTestSupport#LEAKED_MARKER}. A
+ * parser that fetches the schema inlines that default into the DOM (the 
permissive control), while a hardened parser refuses the fetch and fails the 
parse. The
+ * attribution differs by implementation (the stock JDK reports an {@code 
accessExternalSchema} / {@code schema_reference} error; external Xerces' 
deny-all
+ * resolver reports a forbidden-fetch error), so the test only asserts the 
fetch was blocked, not how.</p>
+ *
+ * <p>The test runs only where the implementation supports JAXP 1.2 
schema-language XSD validation (the stock JDK and external Xerces do; Android 
does not), so
+ * it skips on parsers without it.</p>
+ */
+@Tag("dom")
+class SchemaLocationDomTest {
+
+    /** JAXP 1.2 property selecting the schema language used by {@link 
DocumentBuilderFactory#setValidating(boolean)}. */
+    private static final String SCHEMA_LANGUAGE = 
"http://java.sun.com/xml/jaxp/properties/schemaLanguage";;
+
+    private static final String INSTANCE = "schema-location-instance.xml";
+
+    /** Name of the external schema the instance points at; both block 
mechanisms name it in the failure message. */
+    private static final String SCHEMA = "schema-location.xsd";
+
+    @Test
+    void hardenedBlocksExternalSchemaFetch() {
+        assumeTrue(supportsSchemaLanguage(), "parser does not support JAXP 1.2 
schema-language XSD validation");
+        final DocumentBuilderFactory factory = 
enableXsdValidation(XmlFactories.newDocumentBuilderFactory());
+        // The schemaLocation fetch is denied and surfaced as a fatal error 
rather than a silent fetch. The attribution is implementation-specific, so 
assert
+        // only that the failure names the external schema, not the mechanism 
(accessExternalSchema on the JDK, the deny-all resolver on external Xerces).
+        final SAXException thrown = assertThrows(SAXException.class, () -> 
parse(factory));
+        assertTrue(thrown.getMessage() != null && 
thrown.getMessage().contains(SCHEMA),
+                "Block must reference the external schema, got: " + 
thrown.getMessage());
+    }
+
+    @Test
+    void unconfiguredFetchesExternalSchema() throws Exception {
+        assumeTrue(supportsSchemaLanguage(), "parser does not support JAXP 1.2 
schema-language XSD validation");
+        final DocumentBuilderFactory factory = 
DocumentBuilderFactory.newInstance();
+        factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, false);
+        // Positive control: without hardening the external schema is fetched 
and its default attribute is inlined into the DOM.
+        final Document document = parse(enableXsdValidation(factory));
+        assertEquals(LEAKED_MARKER, 
document.getDocumentElement().getAttribute("leak"),
+                "Permissive parse should have fetched the external schema and 
inlined its default attribute.");
+    }
+
+    private static DocumentBuilderFactory enableXsdValidation(final 
DocumentBuilderFactory factory) {
+        factory.setNamespaceAware(true);
+        factory.setValidating(true);
+        factory.setAttribute(SCHEMA_LANGUAGE, 
XMLConstants.W3C_XML_SCHEMA_NS_URI);
+        return factory;
+    }
+
+    private static Document parse(final DocumentBuilderFactory factory) throws 
Exception {
+        final DocumentBuilder builder = factory.newDocumentBuilder();
+        builder.setErrorHandler(new DefaultHandler() {
+            @Override
+            public void error(final SAXParseException exception) throws 
SAXException {
+                throw exception;
+            }
+        });
+        return builder.parse(new 
InputSource(resourceUrl(INSTANCE).toString()));
+    }
+
+    private static boolean supportsSchemaLanguage() {
+        try {
+            final DocumentBuilderFactory factory = 
DocumentBuilderFactory.newInstance();
+            factory.setValidating(true);
+            factory.setAttribute(SCHEMA_LANGUAGE, 
XMLConstants.W3C_XML_SCHEMA_NS_URI);
+            return true;
+        } catch (final Exception e) {
+            return false;
+        }
+    }
+}
diff --git 
a/src/test/java/org/apache/commons/xml/UnsupportedXmlImplementationTest.java 
b/src/test/java/org/apache/commons/xml/UnsupportedXmlImplementationTest.java
index c0f304a..863fdf4 100644
--- a/src/test/java/org/apache/commons/xml/UnsupportedXmlImplementationTest.java
+++ b/src/test/java/org/apache/commons/xml/UnsupportedXmlImplementationTest.java
@@ -23,16 +23,18 @@
 
 import javax.xml.parsers.DocumentBuilder;
 import javax.xml.parsers.DocumentBuilderFactory;
+import javax.xml.parsers.ParserConfigurationException;
 
 import org.junit.jupiter.api.Test;
 
 /**
- * Verifies that an unknown factory class surfaces {@link 
IllegalStateException} with a message naming the class.
+ * Verifies that an implementation which does not honour the required 
secure-processing feature surfaces {@link IllegalStateException} with a message 
naming the
+ * class.
  */
 class UnsupportedXmlImplementationTest {
 
     /**
-     * A stand-in factory class whose fully qualified name is not matched by 
any bundled hardening recipe.
+     * A stand-in factory that rejects the secure-processing feature, like a 
JAXP implementation that does not recognise it.
      */
     public static final class FakeDocumentBuilderFactory extends 
DocumentBuilderFactory {
 
@@ -52,8 +54,8 @@ public Object getAttribute(final String name) {
         }
 
         @Override
-        public void setFeature(final String name, final boolean value) {
-            // no-op
+        public void setFeature(final String name, final boolean value) throws 
ParserConfigurationException {
+            throw new ParserConfigurationException("feature not recognised: " 
+ name);
         }
 
         @Override
@@ -63,10 +65,10 @@ public boolean getFeature(final String name) {
     }
 
     @Test
-    void dispatchRejectsUnknownFactory() {
+    void hardenRejectsUnsecurableFactory() {
         final IllegalStateException thrown = assertThrows(
                 IllegalStateException.class,
-                () -> XmlFactories.dispatch(new FakeDocumentBuilderFactory()));
+                () -> DocumentBuilderHardener.harden(new 
FakeDocumentBuilderFactory()));
         assertNotNull(thrown.getMessage());
         
assertTrue(thrown.getMessage().contains(FakeDocumentBuilderFactory.class.getName()),
                 "Exception message must name the unsupported class: " + 
thrown.getMessage());
diff --git a/src/test/resources/leaked/schema-location-instance.xml 
b/src/test/resources/leaked/schema-location-instance.xml
new file mode 100644
index 0000000..3caf442
--- /dev/null
+++ b/src/test/resources/leaked/schema-location-instance.xml
@@ -0,0 +1,4 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!-- SPDX-License-Identifier: Apache-2.0 -->
+<root xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance";
+      xsi:noNamespaceSchemaLocation="schema-location.xsd"/>
diff --git a/src/test/resources/leaked/schema-location.xsd 
b/src/test/resources/leaked/schema-location.xsd
new file mode 100644
index 0000000..e40ca40
--- /dev/null
+++ b/src/test/resources/leaked/schema-location.xsd
@@ -0,0 +1,10 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!-- SPDX-License-Identifier: Apache-2.0 -->
+<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema";>
+  <xs:element name="root">
+    <xs:complexType>
+      <!-- Default attribute value; a schema-validating parser inlines it into 
the DOM only if it fetched this external schema. -->
+      <xs:attribute name="leak" type="xs:string" default="All your base are 
belong to us"/>
+    </xs:complexType>
+  </xs:element>
+</xs:schema>


Reply via email to