Nice to hear, Michael, that Xerces supports redirects!

There was a warning in the thread about the Xerces bundled in Java 11, which seems not to support redirects. But I know it's an old one!

Best regards,
Christophe

On 22/08/2022 at 00:03, Michael Glavassevich wrote:
My first thought when reading this was of the action [1] the W3C took against excessive access to DTD and XML Schema documents hosted on their site. I would hope that in the years since, users of XML parsers like Xerces have learned a lesson and are caching these resources and using a resolver (such as an XML catalog) to load them.
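
To illustrate the resolver approach mentioned above, here is a minimal Java sketch of a SAX EntityResolver that serves selected remote URIs from local copies. The class name LocalCopyResolver and the file path in the usage note are hypothetical and not part of Xerces; a real deployment would more likely rely on an XML catalog.

```java
import java.util.Map;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

/** Hypothetical resolver: maps known remote system IDs to local copies. */
class LocalCopyResolver implements EntityResolver {
    // Key: remote system ID as written in the document; value: local URI of the cached copy.
    private final Map<String, String> localCopies;

    LocalCopyResolver(Map<String, String> localCopies) {
        this.localCopies = localCopies;
    }

    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        String local = localCopies.get(systemId);
        // Returning null asks the parser to fall back to its default resolution,
        // which may still mean a network request.
        return local == null ? null : new InputSource(local);
    }
}
```

It could be installed with something like `reader.setEntityResolver(new LocalCopyResolver(Map.of("http://www.w3.org/2001/xml.xsd", "file:///opt/schemas/xml.xsd")))` on an XMLReader or DocumentBuilder. Schema loading through SchemaFactory goes through LSResourceResolver instead, so this sketch only covers the SAX entity path.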

As for concerns about redirects, I recall that java.net.URL supports that by default, or at least it did in the pre-Oracle days of Java. I am responsible for patching Xerces’ XMLEntityManager to check whether an HTTP URL was redirected and, if so, to use the redirected URL for resolving any resources relative to it. This worked with the versions of Java that were current when it was implemented, and the code has not changed in the Apache version.
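
The idea of using the redirected location as the base for relative resources can be pictured roughly as follows. This is only a sketch under stated assumptions, not the code in XMLEntityManager: the class RedirectBase and both method names are hypothetical, and the sketch follows redirects by hand because, as far as I know, HttpURLConnection does not automatically follow a redirect that switches from http to https.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

/** Hypothetical helper: finds the final location of a resource after redirects. */
class RedirectBase {

    /** Resolves a Location header value (absolute or relative) against the requested URI. */
    static URI resolveLocation(URI requested, String location) {
        return requested.resolve(location);
    }

    /** Follows up to maxHops redirects manually and returns the last URI reached. */
    static URI finalLocation(URI start, int maxHops) throws IOException {
        URI current = start;
        for (int i = 0; i < maxHops; i++) {
            HttpURLConnection conn = (HttpURLConnection) current.toURL().openConnection();
            conn.setInstanceFollowRedirects(false); // inspect each hop ourselves
            conn.setRequestMethod("HEAD");
            int status = conn.getResponseCode();
            String location = conn.getHeaderField("Location");
            conn.disconnect();
            if (status < 300 || status >= 400 || location == null) {
                return current; // not a redirect: this is the final location
            }
            current = resolveLocation(current, location);
        }
        throw new IOException("Too many redirects starting from " + start);
    }
}
```

A relative reference found in the fetched document would then be resolved against the returned URI, for example `finalLocation(start, 5).resolve("XMLSchema.dtd")`, rather than against the URI originally requested.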

[1] https://www.w3.org/Help/Webmaster#block

On Aug 21, 2022, at 11:10 AM, Christophe Marchand <[email protected]> wrote:



Here is a forward from [email protected]; I think Xerces is affected by this. There is an active thread on this mailing list, with archives available at https://lists.w3.org/Archives/Public/xmlschema-dev/2022Aug/

Best regards,
Christophe

    W3C's main web site https://www.w3.org/ will soon start to
    redirect all http requests to https. Will this cause issues for
    XML Schema-related resources hosted on www.w3.org?

    We announced this intended change a few weeks ago,

    [[
    W3C’s main web site www.w3.org has been available via https for
    over a decade, but until now we have not been redirecting all
    requests to https as is commonly done on most other sites.

    The primary reason for this is that we wanted to avoid causing
    issues for software requesting machine-readable resources from
    www.w3.org such as HTML DTDs, XML Schemas, and namespace documents.

    We believe enough time has passed for most such software to have
    been updated to handle redirects and https, so we are planning to
    start redirecting all requests received over http to https within
    a month or two.
    ]]
    --
    https://www.w3.org/blog/2022/07/redirecting-to-https-on-www-w3-org/

    And following an initial test of this change on August 1 we
    received some feedback that this caused issues with XML Schema
    validation. We are planning a followup test for 3 days starting
    at 14:00 UTC tomorrow, August 18.

    Some questions I have:

    Is it intended that www.w3.org is in the critical path when
    performing XML Schema validation? Are .xsd files and/or namespace
    documents retrieved each time a validation is done? Are there
    other use cases besides validation that might cause automated
    requests to www.w3.org?

    What are the most popular software packages that might be making
    these requests to www.w3.org? In what contexts do they make these
    requests? Do the latest versions typically have the ability to
    follow http to https redirects? Would XML catalogs help?

    The top UAs making requests for .xsd resources on www.w3.org are:

      127574 Java/1.8.0_121
       96712
       25860 Python-urllib/2.7
       16673 Apache-CXF/3.3.4
       16215 Zeep/4.1.0 (www.python-zeep.org)
        6481 Apache-CXF/3.2.10
        6205 Java/1.6.0_26
        4176 Java/17.0.2
        1827 Java/1.8.0_162
        1485 Python-urllib/3.7

    (1st col is the number of requests in a 90-min sample of the logs)

    Omitting version numbers:

      159765 Java
      101314
       29012 Python-urllib
       27912 Apache-CXF
       17640 Zeep
        1467 Mozilla
         623 Apache CXF
         322 sax Java
         211 Apache-HttpClient
         187 Oracle HTTPClient Version 10h
         120 node-soap
          88 SOA Model (see http:
          87 Elastic-Heartbeat
          74 python-requests
          74 curl

    Top UAs making requests matching /2001/XMLSchema :

       43290 Java
       15014 Python-urllib
        8358
        6106 ALTOVA
        3427 Mozilla
         364 Go-http-client
         130 Java1.8.0_291
          88 Zabbix
          70 WebexTeams
          66 MVision
          53 curl
          44 Baiduspider+(+http:
          42 Apache-HttpClient
          40 MapForce
          40 cubebot

    If we start redirecting http to https, will that fundamentally
    break compliance with W3C RECs that specify http: in references
    to .xsd files and namespaces? If so, which URIs would we need to
    continue to serve via http?

    Thanks,

-- Gerald Oskoboiny <[email protected]>
    http://www.w3.org/People/Gerald/
    tel:+1-604-906-1232 (mobile)
