On Fri, Aug 26, 2011 at 3:00 PM, Guido van Rossum <gu...@python.org> wrote:
> I have a different question about IronPython and Jython now. Do their
> regular expression libraries support Unicode better than CPython's?
> E.g. does "." match a surrogate pair? Tom C suggests that Java's regex
> libraries get this and many other details right despite Java's use of
> UTF-16 to represent strings. So hopefully Jython's re library is built
> on top of Java's?
>
> PS. Is there a better contact for Jython?

The best contact for Unicode and Jython is Jim Baker (I added him to the
cc). I'll do my best to answer though:

Java 5 added a bunch of methods for dealing with Unicode characters that
don't fit into 2 bytes (that is, into a single 16-bit char). Looking at
the code for our Unicode object, I see that we use methods like
codePointCount on java.lang.String to compute length[1], and similar
methods all through that code to make sure we deal in code points when
dealing with unicode. So it looks pretty good for us as far as I can tell.
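
To illustrate the difference between counting UTF-16 code units and counting code points, here is a minimal sketch. It is not Jython's actual code. It is written for Python 3.3 or later, where len() on a str counts code points, and the helper names utf16_code_units and code_points are made up for this example. The first helper mimics what Java's String.length() reports, the second what codePointCount(0, length) reports.

```python
def utf16_code_units(s: str) -> int:
    """Count UTF-16 code units, analogous to Java's String.length()."""
    # Each UTF-16 code unit is 2 bytes; "utf-16-le" adds no byte-order mark.
    return len(s.encode("utf-16-le")) // 2


def code_points(s: str) -> int:
    """Count code points, analogous to codePointCount(0, s.length())."""
    # Assumes Python 3.3+, where str is indexed by code point.
    return len(s)


# 'a' followed by U+1D504, a character outside the Basic Multilingual Plane
# that needs a surrogate pair in UTF-16.
s = "a\U0001D504"

print(utf16_code_units(s))  # 3
print(code_points(s))       # 2
```

A length based on code points gives 2 for this string, while a length based on raw UTF-16 code units gives 3.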

[1] http://download.oracle.com/javase/6/docs/api/java/lang/String.html#codePointCount(int, int)

-Frank Wierzbicki
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev