Re: Abdera and IRIs

Elias Torres Thu, 21 Sep 2006 14:46:53 -0700

+1 for trunk

-Elias


James M Snell wrote:
> Ok, so I've been looking into what is needed to allow Abdera to truly
> support IRIs as called for by the Atom spec.  A week ago, the only
> viable option was to introduce a dependency on ICU, which gives us the
> Unicode and IDNA support but doesn't actually provide an IRI
> implementation.  For that, we would have had to introduce yet another
> dependency on something like the Jena project's IRI implementation
> (which uses ICU).
> 
> Now, ICU is a very nice package and is pretty much THE standard for
> handling Unicode in Java.  The problem is that it's a very large package
> and includes a whole lot more than we actually need.  (e.g. we don't
> need the calendar, collation, unicode compression, etc).
> 
> So over the last week I've been working on some code to see how small of
> an implementation of the basic IRI/IDNA/Unicode stuff we could get and
> still claim compliance.  While more testing is needed, I've got a jar
> that weighs in at a relatively lightweight 326.5 KB and provides support
> for IRI, IDNA, Punycode, Unicode Normalization, supplementary
> characters, etc.
> 
> Working with an IRI is almost identical to working with a java.net.URI.
> 
>   IRI iri = new IRI("http://www.詹姆斯.com/feed");
> 
>   System.out.println(iri.toString());
>   System.out.println(iri.toASCIIString());
> 
>   > http://www.詹姆斯.com/feed
>   > http://www.xn--8ws00zhy3a.com/feed
> 
>   System.out.println(iri.getHost());
>   System.out.println(iri.getASCIIHost());
> 
>   > www.詹姆斯.com
>   > www.xn--8ws00zhy3a.com
> 
>   IRI iri1 = new IRI("http://www.詹姆斯.com/feed");
>   IRI iri2 = new IRI("http://www.xn--8ws00zhy3a.com/feed");
> 
>   System.out.println(iri1.equals(iri2));
>   System.out.println(iri1.equivalent(iri2));
> 
>   > false
>   > true
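
For anyone who wants to try the host conversion without the new jar, here is a minimal sketch using java.net.IDN from the JDK (Java 6 and later). It is not the proposed IRI class. IdnHostSketch, toAsciiHost and sameHost are hypothetical names made up for illustration, and sameHost only mimics the idea behind equivalent() for the host part.

```java
import java.net.IDN;

public final class IdnHostSketch {

    // Convert a (possibly internationalized) host name to its ASCII form.
    static String toAsciiHost(String host) {
        return IDN.toASCII(host);
    }

    // Two hosts count as the same here if their ASCII forms match,
    // ignoring case. This is an assumption about what "equivalent"
    // means for hosts, not the proposed implementation.
    static boolean sameHost(String a, String b) {
        return IDN.toASCII(a).equalsIgnoreCase(IDN.toASCII(b));
    }

    public static void main(String[] args) {
        // Unicode escapes for the host used in the example above, so the
        // file compiles regardless of the source encoding.
        String unicodeHost = "www.\u8a79\u59c6\u65af.com";
        String asciiHost = toAsciiHost(unicodeHost);

        System.out.println(asciiHost);
        // Plain string comparison sees two different strings...
        System.out.println(unicodeHost.equals(asciiHost));
        // ...while comparing the ASCII forms treats them as the same host.
        System.out.println(sameHost(unicodeHost, asciiHost));
    }
}
```

The string comparison corresponds to the equals() case above and the ASCII-form comparison to the equivalent() case, for the host only.
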
> 
> The implementation also provides things that Java's URI implementation
> doesn't, such as scheme-specific equivalence checking.
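
To make the scheme-specific part concrete, here is a rough sketch of one such rule for http/https (an explicit default port is the same as no port), written against plain java.net.URI. This is only a guess at the kind of check meant, not the proposed code. SchemeEquivalenceSketch, effectivePort and equivalent are hypothetical names.

```java
import java.net.URI;
import java.util.Locale;
import java.util.Objects;

public final class SchemeEquivalenceSketch {

    // Port to compare: the explicit one, else the scheme default
    // (only http and https are handled in this sketch).
    static int effectivePort(URI u) {
        if (u.getPort() != -1) {
            return u.getPort();
        }
        String scheme = u.getScheme().toLowerCase(Locale.ROOT);
        if (scheme.equals("http")) return 80;
        if (scheme.equals("https")) return 443;
        return -1;
    }

    // For http(s), an empty path is treated the same as "/".
    static String effectivePath(URI u) {
        String path = u.getRawPath();
        return (path == null || path.isEmpty()) ? "/" : path;
    }

    static boolean equivalent(URI a, URI b) {
        // Fall back to plain equality when there is no scheme or host.
        if (a.getScheme() == null || b.getScheme() == null
                || a.getHost() == null || b.getHost() == null) {
            return a.equals(b);
        }
        return a.getScheme().equalsIgnoreCase(b.getScheme())
                && a.getHost().equalsIgnoreCase(b.getHost())
                && effectivePort(a) == effectivePort(b)
                && effectivePath(a).equals(effectivePath(b))
                && Objects.equals(a.getRawQuery(), b.getRawQuery());
    }

    public static void main(String[] args) {
        URI withPort = URI.create("http://example.com:80/feed");
        URI withoutPort = URI.create("http://example.com/feed");
        System.out.println(withPort.equals(withoutPort));
        System.out.println(equivalent(withPort, withoutPort));
    }
}
```
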
> 
> There are already test cases that, while not 100% comprehensive,
> provide fairly decent coverage based on examples given in the various
> RFCs implemented.
> 
> That said...
> 
> Right now, the IRI implementation depends on my Unicode implementation,
> which hasn't, of course, had anywhere near the level of testing ICU has
> had.  It would be possible, however, for me to change the IRI
> implementation so that it can use either ICU or my Unicode stuff
> depending on whether ICU is in the classpath.  If ICU is present, I can
> use its Unicode and IDNA implementation instead of mine.  It makes
> things a bit more complicated, but it's definitely something I can do.
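
The classpath check could look roughly like the sketch below, assuming the switch keys off an ICU4J class being loadable. com.ibm.icu.text.IDNA is a real ICU4J class. UnicodeBackendSketch, isClassPresent, select and the "icu"/"builtin" labels are hypothetical and only there for illustration.

```java
public final class UnicodeBackendSketch {

    // True if the named class can be found by our class loader.
    // The class is not initialized, only located.
    static boolean isClassPresent(String className) {
        try {
            Class.forName(className, false,
                    UnicodeBackendSketch.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    // Pick a backend label: ICU if it is on the classpath,
    // otherwise the bundled implementation.
    static String select() {
        return isClassPresent("com.ibm.icu.text.IDNA") ? "icu" : "builtin";
    }

    public static void main(String[] args) {
        System.out.println("Unicode/IDNA backend: " + select());
    }
}
```

Doing the lookup once at startup and caching the result would keep the reflection cost out of the parsing path.
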
> 
> What I'm proposing is that I check in my IRI/IDNA/Unicode implementation
> and that we use it as the default impl.  The code would become part of
> the parser module.  After checking the code in and updating Abdera to
> use it, I'll work on enabling the automatic ICU switch.
> 
> or...
> 
> I create a branch of the trunk and integrate my implementation into the
> branch.  We kick the tires around on it, see if it works, work on
> enabling the ICU switch and when we get both working and we're all
> comfortable with it, we merge back into the trunk.
> 
> - James
>
