I made some progress.
[By the way, NetBeans' console displays *everything* 100% fine.
I decided to use one of the worst REPL consoles: that of IntelliJ.
I want to make sure I really understand what's going on behind all
this.]
(import '(java.io PrintWriter PrintStream FileInputStream)
        '(java.nio CharBuffer ByteBuffer)
        '(java.nio.charset Charset CharsetDecoder CharsetEncoder)
        '(org.xml.sax InputSource))
(require 'clojure.xml) ; needed for clojure.xml/parse further down
(def utf8 "UTF-8")
(def d-utf8 (.newDecoder (Charset/forName utf8)))
(def e-utf8 (.newEncoder (Charset/forName utf8)))
(def latin1 "ISO-8859-1")
(def d-latin1 (.newDecoder (Charset/forName latin1)))
(def e-latin1 (.newEncoder (Charset/forName latin1)))
(defmacro with-out-encod
  [encoding & body]
  `(binding [*out* (PrintWriter. (PrintStream. System/out true ~encoding)
                                 true)]
     ~@body
     (flush)))
(def s "québécois français")
(print s) ;quÔøΩbÔøΩcois franÔøΩaisnil
(with-out-encod latin1 (print s)) ;qu?b?cois fran?aisnil
(with-out-encod utf8 (print s)) ;qu?b?cois fran?aisnil
(def encoded (.encode e-utf8
                      (CharBuffer/wrap "québécois français")))
(def s-d
  (.toString (.decode d-utf8 encoded)))
(print s-d) ;quÔøΩbÔøΩcois franÔøΩaisnil
(with-out-encod latin1 (print s-d)) ;qu?b?cois fran?aisnil
(with-out-encod utf8 (print s-d)) ;qu?b?cois fran?aisnil
(def f-d
  (:content (let [x (InputSource. (FileInputStream. "french.xml"))]
              (.setEncoding x latin1)
              (clojure.xml/parse x))))
(print f-d) ;quÔøΩbÔøΩcois franÔøΩaisnil
(with-out-encod latin1 (print f-d)) ;québécois français
(with-out-encod utf8 (print f-d)) ;québécois français
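One thing that may explain why the latin1 and utf8 runs above always print the same thing: the (OutputStream, boolean) constructor of PrintWriter converts characters to bytes with the platform default charset, and then hands those bytes to the PrintStream, whose own encoding only applies to characters written through its print methods. Here is a minimal sketch of a variant that passes the encoding to an OutputStreamWriter instead, plus a way to look at the bytes without involving any console. The names encoded-writer, with-out-charset and printed-bytes are made up for illustration, and I have not checked how each IDE console then decodes those bytes.

```clojure
(import '(java.io ByteArrayOutputStream OutputStream OutputStreamWriter PrintWriter))

;; Hypothetical helper: a writer that encodes characters with the named
;; charset before they reach the underlying stream.
(defn encoded-writer
  [^OutputStream stream ^String encoding]
  (PrintWriter. (OutputStreamWriter. stream encoding) true))

;; Hypothetical variant of with-out-encod built on the helper above.
(defmacro with-out-charset
  [encoding & body]
  `(binding [*out* (encoded-writer System/out ~encoding)]
     ~@body
     (flush)))

;; Hypothetical helper: print s into an in-memory buffer and return the
;; bytes that were produced, as unsigned values.
(defn printed-bytes
  [^String s ^String encoding]
  (let [buf (ByteArrayOutputStream.)]
    (binding [*out* (encoded-writer buf encoding)]
      (print s)
      (flush))
    (mapv #(bit-and % 0xff) (.toByteArray buf))))

(printed-bytes "\u00e9" "UTF-8")      ; [195 169]
(printed-bytes "\u00e9" "ISO-8859-1") ; [233]
(printed-bytes "\u00e9" "US-ASCII")   ; [63], the byte for "?"
```

If the bytes are right and the console still shows garbage, the console is decoding them with a different charset than the one used to encode.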
So my theory, which is still almost certainly wrong, is:
1. When the input is a file whose encoding is, say, latin-1, it's easy
to decode it and then encode it however one wants.
2. When the input is a literal string in the source file, it looks
like it's impossible to encode it correctly, unless one first decodes
it from the source file's encoding. But then, I don't yet know how to
do this without actually reading the source file. :\
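A way to test point 2 without trusting any console: a Java string is already decoded text, so the question is only whether the literal was decoded correctly when the source (or the REPL input) was read. The sketch below compares a literal against the same text written with \u escapes, which do not depend on the source encoding. The names expected, char-codes, same-text? and damaged? are made up for illustration.

```clojure
;; The reference text, written with \u escapes so that it does not depend
;; on how the source file or REPL input is decoded.
(def expected "qu\u00e9b\u00e9cois fran\u00e7ais")

;; Hypothetical helper: the numeric value of each char in the string.
(defn char-codes [^String s]
  (mapv int s))

;; Hypothetical helper: true when a literal was read as the expected text.
(defn same-text? [^String literal]
  (= (char-codes literal) (char-codes expected)))

;; Hypothetical helper: true when the string contains U+FFFD, the
;; replacement character a decoder emits for bytes it cannot decode.
(defn damaged? [^String s]
  (boolean (some #(= (int %) 0xFFFD) s)))

(char-codes "\u00e9") ; [233]

;; Encoding to bytes and decoding with the same charset gives the string
;; back unchanged, so it cannot repair a literal that was read wrongly.
(= expected (String. (.getBytes expected "ISO-8859-1") "ISO-8859-1")) ; true
```

Calling (same-text? s) on the literal typed in the source tells whether the damage happened on the way in (false) or only on the way out to the console (true).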
Daniel Jomphe wrote:
> I tried under eclipse.
>
> Default console encoding configuration (MacRoman):
>
> #'user/s
> quÔøΩbÔøΩcois franÔøΩaisnil
> qu?b?cois fran?aisnil
>
> #'user/snc
> qu?b?cois fran?aisnil
> qu?b?cois fran?aisnil
>
> Console configured to print using ISO-8859-1:
>
> #'user/s
> qu�b�cois fran�aisnil
> qu?b?cois fran?aisnil
>
> #'user/snc
> qu?b?cois fran?aisnil
> qu?b?cois fran?aisnil
>
> Console configured to print using UTF-8:
>
> #'user/s
> québécois françaisnil
> québécois françaisnil
>
> #'user/snc
> québécois françaisnil
> québécois françaisnil
>
> So as I come to understand it, it looks like UTF-8 should be the
> Rolls-Royce for my needs.
>
> May I correctly conclude the following?
>
> Don't bother about encodings unless you're displaying something and
> it's unreadable; then, don't bother about it in the code; find a
> proper console or viewer.
>
> Doesn't that sound like offloading a problem to users? Isn't there
> something reliable that can be done in the code?
>
> Daniel Jomphe wrote:
> > Sorry for all these posts.
>
> > I pasted my last post's code into a fresh repl (not in my IDE), and
> > here's what I got (cleaned up):
>
> > #'user/s
> > québécois françaisnil
> > qu?b?cois fran?aisnil
>
> > #'user/snc
> > québécois françaisnil
> > qu?b?cois fran?aisnil
>
> > I'm not sure what to make of it.
>
> > My terminal (Apple Terminal) supports the encoding, and prints
> > correctly s and snc out of the box.
> > When I use with-out-encoded, I actually screw up both s and snc's
> > printing.
>
> > Daniel Jomphe wrote:
> > > Now that I know for sure how to bind *out* to something else over
> > > System/out, it's time to bring back my encoding issues into scope:
>
> > > (import '(java.io PrintWriter PrintStream))
>
> > > (defmacro with-out-encoded
> > > [encoding & body]
> > > `(binding [*out* (java.io.PrintWriter. (java.io.PrintStream.
> > > System/out true ~encoding) true)]
> > > ~@body
> > > (flush)))
>
> > > (def nc "ISO-8859-1")
>
> > > ;;; with a normal string
> > > (def s "québécois français")
>
> > > (print s)
> > > ; quÔøΩbÔøΩcois franÔøΩaisnil
>
> > > (with-out-encoded nc (print s))
> > > ; qu?b?cois fran?aisnil
>
> > > ;;; with a correctly-encoded string
> > > (def snc (String. (.getBytes s nc) nc))
>
> > > (print snc)
> > > ; qu?b?cois fran?aisnil
>
> > > (with-out-encoded nc (print snc))
> > > ; qu?b?cois fran?aisnil
>
> > > I'm certainly missing something fundamental somewhere.