The recent changes to include/import resolution in Daffodil were made with the intention that relative paths would only be found in the same jar, and only absolute paths would be looked up on the classpath. This was done so that very simple logic could be used to transform schemas into the "flattened" structure needed by some validation tools.

Schemas aren't using this convention yet, and the old behavior is only marked as deprecated for now, but schemas should start switching so that this flattening can be done very simply.

Instead of feeding the new resource path (e.g. /d1/quux.dfdl.xsd) into getResource, can you strip off everything after the exclamation point in the original jar path, append that new resource path, and use that as the new URI? That way there is no resource lookup at all for relative paths.
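
Here is a minimal Java sketch of that suggestion. It assumes the including schema's location is available as a java.net.URI. The class and method names (SameJarResolver, resolveInSameJar) are hypothetical and are not Daffodil's actual code. It handles only relative paths, because under the new convention absolute paths still go to the classpath.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;

public class SameJarResolver {

    // Hypothetical helper: resolves a relative include/import path against the
    // jar: URI of the including schema without any classpath lookup, so the
    // result always points into the SAME jar file.
    static URI resolveInSameJar(URI including, String relativePath) {
        String s = including.toString();
        int bang = s.indexOf("!/");
        if (!"jar".equalsIgnoreCase(including.getScheme()) || bang < 0) {
            throw new IllegalArgumentException("expected a jar:...!/... URI: " + including);
        }
        // Everything up to and including the "!", e.g. "jar:file:/foo.jar!"
        String jarPrefix = s.substring(0, bang + 1);
        // The entry path after the "!", e.g. "/d1/d2/baz.dfdl.xsd", is hierarchical,
        // so URI.resolve works on it and normalizes away the "..".
        URI entryPath = URI.create(s.substring(bang + 1));
        String resolvedPath = entryPath.resolve(relativePath).toString();
        // Re-attach the original jar prefix; getResource() is never called.
        return URI.create(jarPrefix + resolvedPath);
    }

    public static void main(String[] args) throws MalformedURLException {
        URI base = URI.create("jar:file:/foo.jar!/d1/d2/baz.dfdl.xsd");
        URI quux = resolveInSameJar(base, "../quux.dfdl.xsd");
        System.out.println(quux); // jar:file:/foo.jar!/d1/quux.dfdl.xsd
        // URI.toURL() is not deprecated and gives a URL that can be opened later.
        URL url = quux.toURL();
        System.out.println(url);
    }
}
```

The only URL-specific step left is the final toURL() call. The resolution itself is plain string splitting plus hierarchical URI resolution.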



On 2023-09-25 12:13 PM, Mike Beckerle wrote:
As of Java 20, the constructors for java.net.URL have all been deprecated.

This has some implications for us. The matter is rather subtle, so bear
with me.

We use URL constructors so that a non-hierarchical URL, such as
jar:file:/foo.jar!/d1/d2/baz.dfdl.xsd, can resolve a relative path such as
"../quux.dfdl.xsd" coming from an include/import statement, and the result
is jar:file:/foo.jar!/d1/quux.dfdl.xsd.
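
For concreteness, here is a minimal sketch of the resolution the deprecated two-argument URL(URL context, String spec) constructor performs with the standard JDK jar: protocol handler. The class and method names are illustrative only. Nothing is opened, so /foo.jar does not need to exist.

```java
import java.net.MalformedURLException;
import java.net.URL;

public class JarUrlRelative {

    // Resolves 'relative' against a jar: URL using the two-argument URL
    // constructor. The jar handler applies the relative path to the entry
    // path after "!/" and canonicalizes "..", staying in the same jar.
    @SuppressWarnings("deprecation") // URL constructors are deprecated since Java 20
    static URL resolve(String base, String relative) throws MalformedURLException {
        URL context = new URL(base);
        return new URL(context, relative);
    }

    public static void main(String[] args) throws MalformedURLException {
        URL quux = resolve("jar:file:/foo.jar!/d1/d2/baz.dfdl.xsd", "../quux.dfdl.xsd");
        System.out.println(quux); // jar:file:/foo.jar!/d1/quux.dfdl.xsd
    }
}
```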

In Daffodil that means the relative search for this quux file does NOT go
back to the classpath, but will search for quux.dfdl.xsd at a specific
location in THE SAME EXACT JAR FILE that the baz.dfdl.xsd was found in.

For some reason having to do with an Internet RFC about non-hierarchical
URIs having to be opaque, the URL constructors were all deprecated.

I see no replacement available for this functionality. I found no advice
other than "don't use URLs, use URIs" online, but non-hierarchical URIs do
not support resolving.
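
The following short sketch shows why switching to java.net.URI does not help here. A jar: URI is classified as opaque, and URI.resolve() on an opaque base simply returns its argument unchanged. The class and method names are illustrative only.

```java
import java.net.URI;

public class OpaqueUriResolve {

    // Attempts to resolve 'relative' against a jar: URI with java.net.URI.
    static URI resolveWithUri(String jarUri, String relative) {
        URI base = URI.create(jarUri);
        // Opaque base: resolve() returns 'relative' as-is, with no jar context.
        return base.resolve(relative);
    }

    public static void main(String[] args) {
        URI base = URI.create("jar:file:/foo.jar!/d1/d2/baz.dfdl.xsd");
        // The scheme-specific part "file:/foo.jar!/..." does not start with "/",
        // so java.net.URI treats this URI as opaque (non-hierarchical).
        System.out.println(base.isOpaque()); // true
        System.out.println(resolveWithUri(base.toString(), "../quux.dfdl.xsd")); // ../quux.dfdl.xsd
    }
}
```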

This means the only way to get a jar file URL is to go "back to the well",
i.e., call getResource() again which searches the class path.

So, continuing our example, some call to getResource() created our original
non-hierarchical URI, jar:file:/foo.jar!/d1/d2/baz.dfdl.xsd, by searching
the class path.

Since they don't allow operations on jar:file:... URIs/URLs at all, you
have to split the string representation at the "!" point (by recognizing
the scheme is "jar", hence you know a "!" will be there). That gives you
the /d1/d2/baz.dfdl.xsd separately, which you can convert to a URI, then
resolve against the "../quux.dfdl.xsd" to get "/d1/quux.dfdl.xsd".

That string must then be fed back to getResource("/d1/quux.dfdl.xsd") to get
the jar file URL that can actually be opened.
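
The split-and-look-up workaround described above might look like the following sketch. The class and method names are hypothetical and error handling is minimal. Here Class.getResource stands in for whichever getResource call the real code uses.

```java
import java.net.URI;
import java.net.URL;

public class ClasspathRelativeResolver {

    // Splits "jar:<jar-url>!<entry>" at the "!" and resolves the relative
    // include/import path against the hierarchical entry path only.
    static String resolveEntryPath(String jarUrl, String relativePath) {
        int bang = jarUrl.indexOf("!/");
        if (!jarUrl.startsWith("jar:") || bang < 0) {
            throw new IllegalArgumentException("expected a jar:...!/... URL: " + jarUrl);
        }
        URI entry = URI.create(jarUrl.substring(bang + 1)); // e.g. "/d1/d2/baz.dfdl.xsd"
        return entry.resolve(relativePath).toString();       // e.g. "/d1/quux.dfdl.xsd"
    }

    // "Back to the well": with a leading "/", Class.getResource treats the name
    // as absolute and searches the classpath, so ANY jar containing the entry
    // can answer. Returns null if no classpath entry has it.
    static URL lookUpOnClasspath(String absoluteEntryPath) {
        return ClasspathRelativeResolver.class.getResource(absoluteEntryPath);
    }

    public static void main(String[] args) {
        String path = resolveEntryPath("jar:file:/foo.jar!/d1/d2/baz.dfdl.xsd", "../quux.dfdl.xsd");
        System.out.println(path); // /d1/quux.dfdl.xsd
        // Prints null unless something on the classpath contains d1/quux.dfdl.xsd.
        System.out.println(lookUpOnClasspath(path));
    }
}
```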

But this has a different meaning. The jar file containing the
quux.dfdl.xsd file does NOT have to be the same one where baz.dfdl.xsd
was originally found. A different jar file earlier on the class path can
contain the quux.dfdl.xsd file, and it effectively overrides the one in the
original jar.
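
To make the override concrete, here is a self-contained sketch using java.net.URLClassLoader, which searches its URLs in the order given. As a simplifying assumption, two temporary directories stand in for the two jar files; the search order works the same way for jars. The class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class ClasspathOverride {

    // Creates a temporary classpath entry (a directory standing in for a jar)
    // holding d1/quux.dfdl.xsd with the given content. Left on disk for simplicity.
    static Path makeEntry(String content) throws IOException {
        Path root = Files.createTempDirectory("cp");
        Path file = root.resolve("d1").resolve("quux.dfdl.xsd");
        Files.createDirectories(file.getParent());
        Files.write(file, content.getBytes(StandardCharsets.UTF_8));
        return root;
    }

    // Reads d1/quux.dfdl.xsd through a class loader that searches 'first'
    // before 'second'. A null parent keeps the application classpath out of it.
    static String lookup(Path first, Path second) throws IOException {
        URL[] classpath = { first.toUri().toURL(), second.toUri().toURL() };
        try (URLClassLoader loader = new URLClassLoader(classpath, null);
             InputStream in = loader.getResourceAsStream("d1/quux.dfdl.xsd")) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        Path originalJar = makeEntry("from the original jar");
        Path earlierJar = makeEntry("from an earlier jar");
        // The earlier classpath entry wins, even though baz.dfdl.xsd
        // (hypothetically) came from originalJar.
        System.out.println(lookup(earlierJar, originalJar)); // from an earlier jar
    }
}
```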

There is an analogy here to object oriented programming. If you can require
"quux.dfdl.xsd" to be in the same exact jar file, that's analogous to a
final method. It will be used from the original jar file. No other jar file
can override this.

If "quux.dfdl.xsd" can come from any jar on the classpath, that's like an
ordinary non-final public method. Other things on the classpath can override it.

Right now, from the research I have done, I know of no way to achieve the
"exact same jar file" behavior in Java 20+. They've basically made it
impossible, save implementing the entirety of java.net over again.

My stack-overflow question is here:
https://stackoverflow.com/questions/77174224/no-replacement-for-url-2-arg-constructor-non-hierarchical-jar-url-operations-in

As you can imagine, this is a hard topic to do a precise web search on, and
ChatGPT4 has also been useless at finding anything that addresses it
(Java 20 is too new for it).

Mike Beckerle
Apache Daffodil PMC | daffodil.apache.org
OGF DFDL Workgroup Co-Chair | www.ogf.org/ogf/doku.php/standards/dfdl/dfdl
Owl Cyber Defense | www.owlcyberdefense.com

