Hi Giannis,

--- Giannis Georgalis <[EMAIL PROTECTED]> wrote:
> Hello,
>
> After a discussion I had with Michael Koch, I decided to implement
> the java.net.URI class. I found in the classpath mail archives a
> patch submitted by Mr. Topic (I think) in which he
yes that was me.

> implemented part of the URI class using:
>
> /**
>  * Regular expression for parsing URIs.
>  *
>  * Taken from RFC 2396, Appendix B.
>  * This expression doesn't parse IPv6 addresses.
>  */
> private static final String URI_REGEXP =
>     "^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\\?([^#]*))?(#(.*))?";
>
> Apart from the fact that this expression cannot parse IPv6
> addresses, it cannot be considered a substitute for a URI parser,
> as it can only break up the parts of a *valid* URI.

I doubt adding basic IPv6 parsing to the regexp used should pose
significant problems.

> For example the uri : "http://1333.2123.232323.0.9.9~84.1" is not
> valid, but can be parsed from this regexp.

You are mixing things up here. That's a valid URI. Sun's JDK 1.4.1_01
on linux prints for a trivial test program:

/usr/lib/j2sdk1.4.1_01/bin/java test "http://1333.2123.232323.0.9.9~84.1"
http://1333.2123.232323.0.9.9~84.1
Authority: 1333.2123.232323.0.9.9~84.1
Fragment: null
Host: null
Path:
Port: -1
Query: null
Scheme: http
SchemeSpecificPart: //1333.2123.232323.0.9.9~84.1
UserInfo: null

this is the test program I used:

topic@clerks:~> cat test.java
import java.net.*;

public class test {

    // Parse the URI given as the first argument and dump its components.
    public static void main (String [] args) {
        try {
            URI u = new URI(args[0]);
            printURI(u);
        }
        catch(Exception e) {
            e.printStackTrace();
        }
    }

    // Print the URI followed by each raw (undecoded) component.
    public static void printURI(URI u) {
        System.out.println(u);
        System.out.println("Authority: " + u.getRawAuthority());
        System.out.println("Fragment: " + u.getRawFragment());
        System.out.println("Host: " + u.getHost());
        System.out.println("Path: " + u.getRawPath());
        System.out.println("Port: " + u.getPort());
        System.out.println("Query: " + u.getRawQuery());
        System.out.println("Scheme: " + u.getScheme());
        System.out.println("SchemeSpecificPart: " + u.getRawSchemeSpecificPart());
        System.out.println("UserInfo: " + u.getRawUserInfo());
    }
}

Here's my question for you, as you've said you've read the URI RFCs:
which section of the URI RFC does the URI you
considered not valid violate?

> After some digging in various RFCs I have written a (complete)
> grammar (in BNF) for parsing URIs (I'll append the grammar at the end
> of this message).

That's nice. But it's overkill. You can achieve the same effect by
using the regexp to separate URI components and doing some
post-processing (preferably using simple regexps) on the generated
Strings to ensure they contain only allowed characters, to get the
port number of hierarchical URIs etc.

I could have implemented URI parsing using a parser generator, but it
seemed to me like the wrong solution to the problem: instead of a
simple regexp and 20 lines, you get a compile time dependency on a
parser generator, x lines for the grammar + y lines for the generated
code. I think your grammar alone is bigger than my parsing code.

> So the URI parser can be implemented in either native (c code) or
> java. Implementing it in java, will be quite hard and difficult to
> maintain and keep up with potential URI changes. On the other hand,
> if it is implemented in c, it will be *very* easy to implement and
> maintain as I'll use flex and maximum parsing speed will be
> achieved. Additionally, provided that the URI grammar is very simple,
> bison (yacc) is not needed. It would be easy to implement the URI
> parser in java if jlex is used (that's another option I'm
> considering).

I don't understand how implementing the URI parser in C would somehow
magically make it easier to maintain than if it's implemented in java.
There are parser generators for java, too, as you already know. Sounds
like you're comparing oranges and apples to me.

That being said, feel free to reimplement URI parsing from scratch. I
can understand your enthusiasm. Programming parsers can be fun, writing
grammars is a nice pastime as well.
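
To make the regexp-plus-post-processing idea above concrete, here is a
minimal sketch. It is not the code from the patch: the class name
UriSplitSketch, its fields, and the SERVER pattern are hypothetical and
exist only for illustration. In particular, the host check is a
deliberate simplification (letters, digits, '.' and '-' only) and does
not implement the full RFC 2396 hostname or IPv4 rules.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration class; not the code from the patch.
final class UriSplitSketch {

    // RFC 2396, Appendix B: splits a string into the generic components.
    private static final Pattern URI_REGEXP = Pattern.compile(
        "^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\\?([^#]*))?(#(.*))?");

    // Scheme syntax from RFC 2396, section 3.1.
    private static final Pattern SCHEME =
        Pattern.compile("[A-Za-z][A-Za-z0-9+.-]*");

    // Assumption: simplified server-based authority, [userinfo@]host[:port].
    // The host part is NOT the full RFC 2396 hostname/IPv4 grammar.
    private static final Pattern SERVER = Pattern.compile(
        "(?:([^@]*)@)?([A-Za-z0-9.-]+)(?::([0-9]*))?");

    final String scheme, authority, path, query, fragment;
    final String host; // null if the authority is not server-based
    final int port;    // -1 if undefined

    UriSplitSketch(String s) {
        // Step 1: break the string into components with the single regexp.
        Matcher m = URI_REGEXP.matcher(s);
        if (!m.matches()) {
            throw new IllegalArgumentException("cannot split: " + s);
        }
        scheme = m.group(2);
        authority = m.group(4);
        path = m.group(5);
        query = m.group(7);
        fragment = m.group(9);

        // Step 2: post-process individual components with simple regexps.
        if (scheme != null && !SCHEME.matcher(scheme).matches()) {
            throw new IllegalArgumentException("illegal scheme: " + scheme);
        }
        String h = null;
        int p = -1;
        if (authority != null) {
            Matcher a = SERVER.matcher(authority);
            if (a.matches()) {
                h = a.group(2);
                String portText = a.group(3);
                if (portText != null && portText.length() > 0) {
                    p = Integer.parseInt(portText);
                }
            }
        }
        host = h;
        port = p;
    }

    public static void main(String[] args) {
        String input = args.length > 0
            ? args[0] : "http://1333.2123.232323.0.9.9~84.1";
        UriSplitSketch u = new UriSplitSketch(input);
        System.out.println("Scheme: " + u.scheme);
        System.out.println("Authority: " + u.authority);
        System.out.println("Host: " + u.host);
        System.out.println("Port: " + u.port);
        System.out.println("Path: " + u.path);
        System.out.println("Query: " + u.query);
        System.out.println("Fragment: " + u.fragment);
    }
}
```

The point of the two-step design is that each check stays a one-line
pattern: the first regexp never rejects on character content, and the
per-component patterns decide what is legal. An authority that does not
fit the server form simply leaves the host unset instead of failing the
whole parse.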

I would humbly propose using my code and fixing its shortcomings, but
I can't force anyone to use it ;) I know full well that it is not a
full implementation of java.net.URI (and I think I've stated that in
the mail accompanying the patch), but it is a good starting point, in
my opinion. It's certainly good enough to run Saxon 7.3 on kaffe ;)

best regards,

dalibor topic

_______________________________________________
Classpath mailing list
[EMAIL PROTECTED]
http://mail.gnu.org/mailman/listinfo/classpath

