I agree that the number of encodings makes a fully transparent, foolproof
solution impossible to implement.

I still think that simpler, out-of-the-box text file handling should exist
on the JVM for reading UTF files.

UTF-8 is kind of natural within the JVM.

Exposing all this BOM machinery every time you need to read a text file is a 
pain.

Either implement BOM recognition on the fly or make the BOM mandatory in UTF-8
files everywhere.
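
The "BOM recognition on the fly" idea can be sketched on top of the JDK. The sketch peeks at the first bytes of a stream. If it finds a UTF-8 or UTF-16 byte-order mark, it consumes the BOM and decodes with the matching charset. Otherwise it rewinds and uses a caller-supplied fallback. `BomAwareReader`, `open` and `decode` are hypothetical names, not a JDK API. UTF-32 BOMs are deliberately ignored in this sketch.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of the JDK: picks the charset from a leading BOM.
// Only UTF-8 and UTF-16 BOMs are recognised; UTF-32 BOMs are not handled.
class BomAwareReader {

    /** Returns a Reader positioned after any BOM, or decoding with `fallback` if there is none. */
    static Reader open(InputStream raw, Charset fallback) throws IOException {
        BufferedInputStream in = new BufferedInputStream(raw);
        in.mark(3);                       // remember the start so we can rewind after peeking
        int b0 = in.read(), b1 = in.read(), b2 = in.read();
        if (b0 == 0xEF && b1 == 0xBB && b2 == 0xBF) {
            return new InputStreamReader(in, StandardCharsets.UTF_8);    // 3-byte BOM consumed
        }
        in.reset();                       // rewind, then consume exactly two bytes
        in.read();
        in.read();
        if (b0 == 0xFE && b1 == 0xFF) {
            return new InputStreamReader(in, StandardCharsets.UTF_16BE); // 2-byte BOM consumed
        }
        if (b0 == 0xFF && b1 == 0xFE) {
            return new InputStreamReader(in, StandardCharsets.UTF_16LE); // 2-byte BOM consumed
        }
        in.reset();                       // no BOM: decode from the very first byte
        return new InputStreamReader(in, fallback);
    }

    /** Convenience for in-memory data: decodes bytes via open(), wrapping IOExceptions. */
    static String decode(byte[] bytes, Charset fallback) {
        try (Reader r = open(new ByteArrayInputStream(bytes), fallback)) {
            StringBuilder sb = new StringBuilder();
            char[] buf = new char[4096];
            for (int n; (n = r.read(buf)) != -1; ) {
                sb.append(buf, 0, n);
            }
            return sb.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] utf8WithBom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'h', 'i'};
        System.out.println(decode(utf8WithBom, StandardCharsets.UTF_8)); // hi
    }
}
```

The fallback is explicit on purpose. Without a BOM, nothing in the bytes reliably says which encoding was used, so the caller has to decide.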

As far as I know, the BOM matters mostly for UTF-16 and above, where it signals
byte order.
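
The byte-order role of the BOM is visible in the JDK's standard charsets. Decoding with "UTF-16" reads the BOM to choose the byte order and drops it, and assumes big-endian when there is no BOM. "UTF-16LE" already fixes the byte order, so it keeps a leading BOM as the character U+FEFF. Encoding with "UTF-16" writes a big-endian BOM first. `Utf16BomDemo` is just an illustrative class name.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class Utf16BomDemo {
    // "hi" in little-endian UTF-16, preceded by the BOM bytes FF FE.
    static final byte[] LE_WITH_BOM = {(byte) 0xFF, (byte) 0xFE, 'h', 0, 'i', 0};
    // "hi" in big-endian UTF-16 with no BOM at all.
    static final byte[] BE_NO_BOM = {0, 'h', 0, 'i'};

    public static void main(String[] args) {
        // "UTF-16" uses the BOM to pick little-endian, then discards it.
        System.out.println(new String(LE_WITH_BOM, StandardCharsets.UTF_16)); // hi
        // Without a BOM, "UTF-16" assumes big-endian.
        System.out.println(new String(BE_NO_BOM, StandardCharsets.UTF_16));   // hi
        // "UTF-16LE" keeps the BOM as U+FEFF, so the string has 3 chars.
        System.out.println(new String(LE_WITH_BOM, StandardCharsets.UTF_16LE).length()); // 3
        // Encoding with "UTF-16" emits the big-endian BOM FE FF first.
        System.out.println(Arrays.toString("hi".getBytes(StandardCharsets.UTF_16)));
        // [-2, -1, 0, 104, 0, 105]
    }
}
```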

The time spent on stupid issues like this one must be significant given the 
number of people struggling with this...



> On Jul 13, 2015, at 18:46, Sungjin Chun <chu...@castlesoft.co.kr> wrote:
> 
> Assume that the charset is the same; even in this case, there are many
> encoding schemes for it, and for portability you have to consider both
> input and output encoding. On Mac OS X or Linux, this is controlled by the
> locale system; on Windows, you can either force the system encoding through
> the Control Panel or change the encoding before writing output to the
> console. Here in Korea, we do this kind of thing for internationalized
> application development. Of course, you have to use the correct charset for
> i18n applications :-)
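
On the JVM, one way to follow the "consider both input and output encoding" advice is to name the charset explicitly at every I/O boundary. The alternative is relying on the platform default, which before JDK 18 followed the OS locale. The sketch below is a minimal example under that assumption. `ExplicitCharsets` and `roundTrip` are hypothetical names. Whether the console actually renders the Korean text still depends on the console's own code page.

```java
import java.io.IOException;
import java.io.PrintStream;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

class ExplicitCharsets {

    /** Writes text to a temp file as UTF-8 and reads it back, naming the charset both ways. */
    static String roundTrip(String text) {
        try {
            Path tmp = Files.createTempFile("i18n", ".txt");
            try {
                Files.write(tmp, List.of(text), StandardCharsets.UTF_8);          // input side
                return String.join("\n", Files.readAllLines(tmp, StandardCharsets.UTF_8));
            } finally {
                Files.delete(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        // The platform default varies by OS, locale and JDK version; avoid depending on it.
        System.out.println("default charset: " + Charset.defaultCharset());
        // Output side: wrap stdout with an explicit encoding instead of the default.
        PrintStream out = new PrintStream(System.out, true, "UTF-8");
        out.println(roundTrip("\uD55C\uAE00")); // the Korean word "Hangul"
    }
}
```
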
> 
>> On Mon, Jul 13, 2015 at 11:56 PM Luc Préfontaine 
>> <lprefonta...@softaddicts.ca> wrote:
>> I cannot remember the details, but in 2010 I had a similar problem in a
>> cross-platform project using Clojure, and problems earlier in another
>> cross-platform/cross-language project.
>> 
>> So it's the reverse way, no BOM at all...
>> 
>> Can't believe we are still struggling with character set issues in 2015.
>> Having to think about this when saving a file in Notepad... That's
>> depressing.
>> No wonder I now stay away from Windows as much as possible.
>> 
>> I can't understand why we cannot get some transparent behavior from the Java 
>> runtime.
>> These are human readable text files. Not some unreadable binary format.
>> I googled a bit about this, and numerous people face this problem when
>> reading Windows-generated files. They all ended up having to skip the BOM,
>> if present, when reading the file.
>> 
>> So much for portability. Yuck.
>> 
>> > On Mon, Jul 13, 2015 at 2:52 PM, Luc Préfontaine <
>> > lprefonta...@softaddicts.ca> wrote:
>> >
>> > > BG is right on it. I hit this problem a decade ago (roughly :)).
>> > > UTF-8 files with no BOM are not handled properly on Windows.
>> > > Windows assumes that they are ASCII-encoded. That works partially (both
>> > > character sets share the same encoding for many characters) but
>> > > eventually fails.
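
The "works partially, then fails" behaviour is easy to reproduce on the JVM. Encode text as UTF-8, then decode those bytes with a single-byte Windows code page. Pure ASCII text survives because UTF-8 encodes it byte-for-byte, but every non-ASCII character turns into several wrong characters. This assumes the misreading code page is Windows-1252 (the Western European "ANSI" page), and `MisreadDemo` is an illustrative name.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class MisreadDemo {

    /** Encodes text as UTF-8, then decodes those bytes with a different (wrong) charset. */
    static String misread(String text, Charset wrong) {
        return new String(text.getBytes(StandardCharsets.UTF_8), wrong);
    }

    public static void main(String[] args) {
        Charset cp1252 = Charset.forName("windows-1252");
        // ASCII characters have identical bytes in both encodings, so this survives.
        System.out.println(misread("plain ASCII", cp1252)); // plain ASCII
        // U+00E9 is two bytes in UTF-8 (C3 A9), read back as two characters.
        System.out.println(misread("caf\u00e9", cp1252));    // cafÃ©
    }
}
```
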
>> > >
>> >
>> > > Make sure that the files have a BOM. You can do this on a per-file basis
>> > > using an IDE (Eclipse, ...) or with bash scripts if you have access to a
>> > > u*x environment. I did not find an equivalent native Windows tool, but
>> > > there might be some way to do this in batch.
>> > >
>> > > Luc P.
>> > >
>> >
>> > Clojure source files are expected to be in UTF-8 and Clojure on Windows
>> > doesn't require a BOM.
>> >
>> > In fact, Clojure files must not contain a BOM because it isn't considered
>> > to be whitespace by the Clojure parser and will cause the error "Unable to
>> > resolve symbol: ? in this context".
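
The two JDK facts behind that error can be checked directly. First, Java's UTF-8 decoder does not strip a BOM; it surfaces as the character U+FEFF. Second, U+FEFF is a format character, not whitespace, according to `Character.isWhitespace`. The sketch assumes a reader whose whitespace test is along the lines of `Character.isWhitespace`, as the post implies for Clojure's. Under that assumption, the BOM ends up as the start of the first token. `BomWhitespaceDemo` and `stripBom` are hypothetical names.

```java
import java.nio.charset.StandardCharsets;

class BomWhitespaceDemo {

    /** Drops a single leading U+FEFF, if present; otherwise returns the text unchanged. */
    static String stripBom(String text) {
        return text.startsWith("\uFEFF") ? text.substring(1) : text;
    }

    public static void main(String[] args) {
        // A source file saved as UTF-8 "with BOM": EF BB BF followed by "(+ 1 2)".
        byte[] fileBytes = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, '(', '+', ' ', '1', ' ', '2', ')'};
        String text = new String(fileBytes, StandardCharsets.UTF_8);
        // The UTF-8 decoder keeps the BOM as the first char.
        System.out.println(text.charAt(0) == '\uFEFF');                        // true
        // It is a format character (Unicode category Cf), not whitespace.
        System.out.println(Character.isWhitespace(text.charAt(0)));            // false
        System.out.println(Character.getType('\uFEFF') == Character.FORMAT);   // true
        // Removing it before handing the text to a parser avoids the problem.
        System.out.println(stripBom(text));                                     // (+ 1 2)
    }
}
```
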
>> >
>> > Some software, such as Windows Notepad, uses the presence of a BOM to
>> > detect UTF-8, but that can be overridden in the File | Open dialog. Other
>> > than that, the behaviour of a BOM in Clojure should be the same on Linux
>> > and Windows: this is all handled by Java code in the JDK, not by the
>> > Windows platform.
>> >
>> > --
>> > You received this message because you are subscribed to the Google
>> > Groups "Clojure" group.
>> > To post to this group, send email to clojure@googlegroups.com
>> > Note that posts from new members are moderated - please be patient with 
>> > your first post.
>> > To unsubscribe from this group, send email to
>> > clojure+unsubscr...@googlegroups.com
>> > For more options, visit this group at
>> > http://groups.google.com/group/clojure?hl=en
>> > ---
>> > You received this message because you are subscribed to the Google Groups 
>> > "Clojure" group.
>> > To unsubscribe from this group and stop receiving emails from it, send an 
>> > email to clojure+unsubscr...@googlegroups.com.
>> > For more options, visit https://groups.google.com/d/optout.
>> >
>> --
>> Luc Préfontaine <lprefonta...@softaddicts.ca> sent by ibisMail!
>> 
> 

