I've run into some trouble parsing a Unicode character with Clojure. I know it's likely a problem with Java rather than Clojure itself, but I'm looking for a Clojurish solution, which is why I'm posting it here. FYI, I'm on GNU/Linux, using Emacs 24 in conjunction with CIDER 0.10.0snapshot (package: 20150710.1304), Java 1.8.0_51, Clojure 1.6.0 and nREPL 0.2.6.
The first character of the Unicode block "CJK Unified Ideographs Extension B" is 𠀀 (I hope you can read it properly; get a Chinese font otherwise). Emacs displays it perfectly, but in gedit it renders as the glyph you see (something like ㄛ but more angular) followed by a negative space. Even in Emacs, though, the behaviour gets weird once the character is evaluated:

```clojure
華文.core> (clojure.string/split "a𠀀a" #"\𠀀")
; => ["a" "a"]
華文.core> (clojure.string/split "a𠀀a" #"\u20000")
; => ["a𠀀a"]
華文.core> (clojure.string/split "a𠀀a" #"[\u20000-\u2a6df]") ; it spans over Extension B
; => ["" "𠀀"]
```

Moreover:

```clojure
華文.core> \u20000
; => IllegalArgumentException Invalid unicode character: \u20000  clojure.lang.LispReader.readUnicodeChar
華文.core> (int \𠀀)
; => RuntimeException Unsupported character: \𠀀  clojure.lang.Util.runtimeException (Util.java:221)
華文.core> (format "%04x" (int \u3403))
; => "3403"
華文.core> (format "%04x" (int \u20000))
; => IllegalArgumentException Invalid unicode character: \u20000  clojure.lang.LispReader.readUnicodeChar
```

Finally, here is a very annoying side effect that looks just like an overflow: from 20000 on, the range seems to wrap around and overlap values starting at 0, so the whole of legacy ASCII ends up contained in this block.

```clojure
華文.core> (clojure.string/split "cabac" #"[\u20000-\u2a6df]")
; => []
華文.core> (clojure.string/split "cabac" #"[a-b]")
; => []
```

So I don't really know how to handle this character. I've picked a few characters haphazardly and it seems to be the same mess above \u9999 :/
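One possible reading of the results above, offered as a sketch rather than a confirmed diagnosis: both the Clojure reader and Java's regex engine take exactly four hex digits after `\u`, so `\u20000` would be read as `\u2000` followed by a literal `0`, and `[\u20000-\u2a6df]` as `\u2000`, the range `0` to `\u2a6d`, and `f`, which would explain why ASCII letters match. Assuming Java 7 or later, the `\x{...}` regex escape and the code-point methods on `String` and `Character` accept values above `\uFFFF`. The names `ext-b` and `s` below are made up for the illustration.

```clojure
(require 'clojure.string)

;; Build U+20000 from its code point instead of a \u literal,
;; since the reader only accepts four hex digits after \u.
(def ext-b (String. (Character/toChars 0x20000)))
(def s (str "a" ext-b "a"))

;; The JVM stores the character as a surrogate pair: two UTF-16 units.
(count ext-b)                                  ; => 2
(.codePointCount ext-b 0 (count ext-b))        ; => 1

;; Read the code point back as an int, bypassing (int \𠀀).
(format "%x" (.codePointAt ext-b 0))           ; => "20000"

;; \x{...} takes a full code point inside a regex (Java 7+).
(clojure.string/split s #"\x{20000}")                    ; => ["a" "a"]
(clojure.string/split s #"[\x{20000}-\x{2A6DF}]")        ; => ["a" "a"]

;; With the range written this way, ASCII no longer matches.
(clojure.string/split "cabac" #"[\x{20000}-\x{2A6DF}]")  ; => ["cabac"]
```

If this reading is right, a Clojure character literal can never hold such a character, because a `char` is a single UTF-16 unit; working with strings and integer code points sidesteps that limit.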