+1 for trunk
-Elias
James M Snell wrote:
> Ok, so I've been looking into what is needed to allow Abdera to truly
> support IRIs as called for by the Atom spec. A week ago, the only
> viable option was to introduce a dependency on ICU, which gives us the
> unicode and IDNA support but doesn't actually provide an IRI
> implementation. For that, we would have had to introduce yet another
> dependency on something like the Jena project's IRI implementation
> (which uses ICU).
>
> Now, ICU is a very nice package and is pretty much THE standard for
> handling unicode in Java. The problem is that it's a very large package
> and includes a whole lot more than we actually need. (e.g. we don't
> need the calendar, collation, unicode compression, etc).
>
> So over the last week I've been working on some code to see how small of
> an implementation of the basic IRI/IDNA/Unicode stuff we could get and
> still claim compliance. While more testing is needed, I've got a jar
> that weighs in at a relatively lightweight 326.5 KB and provides support
> for IRI, IDNA, Punycode, Unicode Normalization, supplementary
> characters, etc.
>
> Working with an IRI is almost identical to working with a java.net.URI.
>
> IRI iri = new IRI("http://www.詹姆斯.com/feed");
>
> System.out.println(iri.toString());
> System.out.println(iri.toASCIIString());
>
> > http://www.詹姆斯.com/feed
> > http://www.xn--8ws00zhy3a.com/feed
>
> System.out.println(iri.getHost());
> System.out.println(iri.getASCIIHost());
>
> > www.詹姆斯.com
> > www.xn--8ws00zhy3a.com
>
> IRI iri1 = new IRI("http://www.詹姆斯.com/feed");
> IRI iri2 = new IRI("http://www.xn--8ws00zhy3a.com/feed");
>
> System.out.println(iri1.equals(iri2));
> System.out.println(iri1.equivalent(iri2));
>
> > false
> > true
>
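As a rough illustration of the equals/equivalent distinction above, here is a minimal sketch that uses the JDK's java.net.IDN (Java 6 and later) rather than the proposed IRI class. The class name IriEquivalenceSketch and the helpers toAsciiForm and equivalent are hypothetical, the host extraction only handles simple scheme://host/path strings (no port or user info), and the non-ASCII host is written with Unicode escapes so the source compiles under any encoding.

```java
import java.net.IDN;

class IriEquivalenceSketch {
    // Hypothetical helper: rewrites only the host of a simple
    // scheme://host/path string into its ASCII (Punycode) form.
    static String toAsciiForm(String iri) {
        int sep = iri.indexOf("://");
        if (sep < 0) {
            throw new IllegalArgumentException("expected scheme://host/...: " + iri);
        }
        int start = sep + 3;
        int end = iri.indexOf('/', start);
        if (end < 0) {
            end = iri.length();
        }
        String host = iri.substring(start, end);
        return iri.substring(0, start) + IDN.toASCII(host) + iri.substring(end);
    }

    // Treats two strings as equivalent if they match once both hosts
    // are in ASCII form; plain String.equals stays character-by-character.
    static boolean equivalent(String a, String b) {
        return toAsciiForm(a).equals(toAsciiForm(b));
    }

    public static void main(String[] args) {
        // Same host as in the quoted example, spelled with Unicode escapes.
        String unicode = "http://www.\u8a79\u59c6\u65af.com/feed";
        String ascii = toAsciiForm(unicode);
        System.out.println(ascii);
        System.out.println(unicode.equals(ascii));
        System.out.println(equivalent(unicode, ascii));
    }
}
```

This only covers the IDNA side of equivalence. Scheme-specific rules, percent-encoding and path normalization would need more than this.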
> The implementation also provides things that Java's URI implementation
> doesn't, such as scheme-specific equivalence checking.
>
> There are even test cases already that, while not 100% comprehensive,
> provide fairly decent coverage based on examples given in the various
> RFCs implemented.
>
> That said...
>
> Right now, the IRI implementation depends on my Unicode implementation,
> which hasn't, of course, had anywhere near the level of testing ICU has
> had. It would be possible, however, for me to change the IRI
> implementation so that it can use either ICU or my Unicode stuff
> depending on whether ICU is in the classpath. If ICU is present, I can
> use that unicode and IDNA implementation instead of mine. It makes
> things a bit more complicated, but it's definitely something I can do.
>
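One possible shape for the classpath check described above, as a sketch only: probe for a known ICU4J class by name and fall back to the built-in code when it is missing. The class UnicodeBackendSketch, its methods, and the "icu" / "built-in" labels are hypothetical. com.ibm.icu.text.IDNA is an ICU4J class, but using it as the probe is an assumption, not necessarily how the switch would actually be done.

```java
class UnicodeBackendSketch {
    // Returns true if the named class can be located on the classpath.
    // The class is not initialized; we only check that it is available.
    static boolean isClassPresent(String className) {
        try {
            Class.forName(className, false,
                    UnicodeBackendSketch.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    // Hypothetical selection: prefer ICU when its IDNA class is available,
    // otherwise use the bundled implementation.
    static String chooseBackend() {
        return isClassPresent("com.ibm.icu.text.IDNA") ? "icu" : "built-in";
    }

    public static void main(String[] args) {
        System.out.println("Unicode/IDNA backend: " + chooseBackend());
    }
}
```

Doing the probe once and caching the result would keep the reflection cost out of the IRI parsing path.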
> What I'm proposing is that I check in my IRI/IDNA/Unicode implementation
> and that we use it as the default impl. The code would become part of
> the parser module. After checking the code in and updating Abdera to
> use it, I'll work on enabling the automatic ICU switch.
>
> or...
>
> I create a branch of the trunk and integrate my implementation into the
> branch. We kick the tires around on it, see if it works, work on
> enabling the ICU switch and when we get both working and we're all
> comfortable with it, we merge back into the trunk.
>
> - James
>