I've run into some trouble parsing a Unicode character with Clojure.
I know the problem is likely related to Java rather than Clojure
itself, but I'm looking for a Clojure-ish solution, which is why I'm
posting it here. FYI, I'm on GNU/Linux, using Emacs 24 in conjunction
with CIDER 0.10.0snapshot (package: 20150710.1304), Java 1.8.0_51,
Clojure 1.6.0 and nREPL 0.2.6.

The first character of the Unicode block "CJK Unified Ideographs
Extension B" is 𠀀 (I hope you can read it properly; install a Chinese
font otherwise). Emacs displays it perfectly, whereas in gedit the
character appears as the glyph you see (something like ㄛ but more
angular) followed by a negative space. When it is evaluated, however,
the behaviour is weird:

``` Clojure
華文.core> (clojure.string/split "a𠀀a" #"\𠀀")
; => ["a" "a"]
華文.core> (clojure.string/split "a𠀀a" #"\u20000")
; => ["a𠀀a"]
華文.core> (clojure.string/split "a𠀀a" #"[\u20000-\u2a6df]") ; it spans over Extension B
; => ["" "𠀀"]
```
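
A possible explanation, assuming the JVM's `java.util.regex` syntax:
the `\u` escape takes exactly four hex digits, so `#"\u20000"` is read
as U+2000 followed by a literal `0`. Since Java 7, a code point beyond
U+FFFF can be written as `\x{20000}` instead. A minimal sketch (the
name `ext-b-first` is only illustrative, not part of any library):

``` Clojure
(require '[clojure.string :as str])

;; Assumption: Java 7+ regex syntax, where \x{h...h} names a full code point.
(def ext-b-first #"\x{20000}")

(str/split "a𠀀a" ext-b-first)
;; => ["a" "a"]

;; By contrast, #"\u20000" matches U+2000 followed by the digit 0.
(str/split "a\u20000a" #"\u20000")
;; => ["a" "a"]
```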

Moreover:

``` Clojure
華文.core> \u20000
; => IllegalArgumentException Invalid unicode character: \u20000
;    clojure.lang.LispReader.readUnicodeChar
華文.core> (int \𠀀)
; => RuntimeException Unsupported character: \𠀀
;    clojure.lang.Util.runtimeException (Util.java:221)
華文.core> (format "%04x" (int \u3403))
; => "3403"
華文.core> (format "%04x" (int \u20000))
; => IllegalArgumentException Invalid unicode character: \u20000
;    clojure.lang.LispReader.readUnicodeChar
```
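
These reader errors are consistent with how the JVM stores text: a
Java `char` is a 16-bit UTF-16 code unit and Clojure's `\uXXXX` literal
takes four hex digits, so 𠀀 (U+20000) occupies two chars inside a
string and cannot be written as a single character literal. A hedged
sketch that works with code points through `java.lang.String` and
`java.lang.Character` interop instead (the var name `s` is just for
illustration):

``` Clojure
(def ^String s "a𠀀a")

(count s)                             ; => 4, UTF-16 code units
(.codePointCount s 0 (count s))       ; => 3, Unicode code points
(.codePointAt s 1)                    ; => 131072, i.e. 0x20000
(format "%04x" (.codePointAt s 1))    ; => "20000"

;; Build the string back from the numeric code point.
(String. (Character/toChars 0x20000)) ; => "𠀀"
```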

Finally, here is a very annoying side effect that looks like an
overflow: from 20000 upward the values seem to wrap around to 0, so
the whole legacy ASCII range would be contained in this block.

``` Clojure
華文.core> (clojure.string/split "cabac" #"[\u20000-\u2a6df]")
; => []
華文.core> (clojure.string/split "cabac" #"[a-c]")
; => []
```
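
Rather than an overflow, this looks like a consequence of Java's regex
`\u` escape taking exactly four hex digits: `[\u20000-\u2a6df]` is read
as the character U+2000, the range from `0` to U+2A6D, and the letter
`f`, and that range covers `a`, `b` and `c`. A sketch of the class
written with `\x{...}` instead, assuming Java 7 or later (`ext-b` is an
illustrative name):

``` Clojure
(require '[clojure.string :as str])

;; Assumption: \x{h...h} is available (Java 7+), so both bounds are
;; full code points: U+20000 through U+2A6DF, i.e. Extension B.
(def ext-b #"[\x{20000}-\x{2A6DF}]")

(str/split "cabac" ext-b) ; => ["cabac"], ASCII no longer matches
(str/split "a𠀀a" ext-b)  ; => ["a" "a"]
```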

So I don't really know how to handle this character. I've picked a
few characters at random and it seems to be the same mess above
\u9999 :/

-- 
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en