> Stephen Crawley <[EMAIL PROTECTED]> writes:
>
> > While the complete URI grammar looks complex, a URI string typically
> > doesn't need to be fully parsed. You only need to fully parse the
> > components that are requested.
>
> I think you are wrong in this; the URI parser should accept *only*
> valid URIs and not valid components within a URI. But I cannot be
> sure until I run some tests against Sun's SDK. If what you said was
> the case, then the regex-based, already submitted patch would be fine.

Here's an example. The getRawPath() method returns the path part of the
URI with escaping in place. The getPath() method decodes the path and
returns that, throwing an exception if the encoding is wrong. This
suggests to me that URI(String) should not attempt to parse the escape
sequences.

> > Note that the JDK 1.4 spec for the URI(String) constructor states
> > that its parsing is more relaxed than the BNF in RFC 2396. How relaxed
> > it is can only be determined by black box testing against the JDK 1.4
> > implementation. If I was doing this, my first step would be to build
> > some extensive Mauve test cases ...
>
> I'm consulting the JDK 1.4.1 API documentation, which does not state that
> the URI parser's grammar is more relaxed. On the contrary, the assertions
> made in URI(String) cover some implications within the RFC in
> question which are not depicted in the BNF grammar.

I think we are mostly saying the same thing; i.e. the BNF in RFC 2396 is
not complete. However, the Sun people who wrote the javadoc seem to be
implying that the RFC 2396 spec is (at least) ambiguous on the points in
which URI(String) "deviates". Also, doesn't the last deviation allow
URI(String) to handle URIs that contain unescaped Unicode in some
components? Isn't this a substantive extension (relaxation) of RFC 2396?

> > I'd recommend hand building a pure Java parser. That way, the Classpath
> > build process doesn't depend on an external parser or lexer generator,
> > and the source code will be easier to understand.
>
> We could include the generated files (as Brian also noted) and avoid
> the exotic dependencies.

For the record, including generated code in Classpath without integrating
the tools that generate it would present maintenance problems. You DO need
the tools if you are going to change the parser ... unless you are mad
enough to try to hand-patch the parser tables.
Obviously, if the grammar you are trying to implement is sufficiently
complex, these issues would be minor compared with the difficulty of
implementing an efficient parser by hand.

> As for the "possible undocumented deviations", they would be bugs (or
> features ;-)). I think we shouldn't rely on these at all.

I disagree. According to Sun, in cases where the implementation and
javadoc disagree, the former represents the conformance point. Each
place where Classpath doesn't conform to the JDK behaviour represents a
potential problem for someone trying to port Java applications between
the Sun and Classpath implementations of the JRE.

-- Steve

_______________________________________________
Classpath mailing list
[EMAIL PROTECTED]
http://mail.gnu.org/mailman/listinfo/classpath
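
As a rough way to check the getRawPath()/getPath() point above, a small
probe along the following lines could be run against both the Sun JDK and
Classpath and the results compared. This is only a sketch: the class name
UriPathDemo, the helper method names, and the example.org URIs are made up
for illustration. It deliberately makes no claim about what either
implementation reports for the malformed-escape or non-ASCII inputs; it
just shows at which stage, if any, each input is rejected.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class UriPathDemo {
    /** Returns the path component with escape sequences left in place. */
    public static String rawPath(String s) throws URISyntaxException {
        return new URI(s).getRawPath();
    }

    /** Returns the path component after escape sequences are decoded. */
    public static String decodedPath(String s) throws URISyntaxException {
        return new URI(s).getPath();
    }

    /** Reports at which stage, if any, the given string is rejected. */
    public static String probe(String s) {
        URI uri;
        try {
            uri = new URI(s);
        } catch (URISyntaxException e) {
            return "rejected by URI(String): " + e.getReason();
        }
        try {
            return "accepted; getPath() = " + uri.getPath();
        } catch (RuntimeException e) {
            return "accepted by URI(String), rejected by getPath(): " + e;
        }
    }

    public static void main(String[] args) throws URISyntaxException {
        String ok = "http://example.org/a%20b";
        System.out.println(rawPath(ok));      // escapes kept as written
        System.out.println(decodedPath(ok));  // escapes decoded

        // Malformed escape: is it caught by the constructor or by getPath()?
        System.out.println(probe("http://example.org/a%zzb"));

        // Unescaped non-ASCII character in the path.
        System.out.println(probe("http://example.org/caf\u00e9"));
    }
}
```

Cases like these would also be natural candidates for the Mauve tests
mentioned earlier, since the interesting result is whatever the JDK 1.4
implementation actually does.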

