Re: [VOTE] POM Element for Source File Encoding

Milos Kleint Wed, 09 Apr 2008 12:02:48 -0700

On Wed, Apr 9, 2008 at 7:36 PM, Benjamin Bentmann
<[EMAIL PROTECTED]> wrote:
>
> > Make sure you consider the case where you have people developing the same
> code base all over the world, and the possible reasoning of falling back to
> platform default encoding. Consider the team spread across the US, Russia,
> and China and what do they do normally?
> >
>
>  This international spread of developers is in particular the case we have
> in mind. I mean, how should such a team (say the Maven community) deliver
> reliable build output if not all developers have agreed to use the same file
> encoding for the sources? Say the US devs would have ASCII as default
> encoding, the Europeans Latin-1 and the Asians Big5 for our nice potpourri.
> Even if all have agreed to use English for coding, you still might encounter
> Non-ASCII characters that get messed up, e.g. in javadoc comments that carry
> the name of the contributor/committer. Other developers might experience
> build failures because of encoding mismatch, at best other people's names
> are disfigured which is rather impolite.
>
>  The Eclipse folks had a similar problem [0]. The solution: Lock the
> encoding down for the entire project.


Just for the record, netbeans.org projects all use UTF-8. We have devs
in the US, the Czech Republic, Russia and elsewhere. NetBeans lets you
set a default encoding per project; for Maven projects I currently look
up how maven-compiler-plugin is configured. If no configuration is in
place, I fall back to the platform encoding.
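
A minimal sketch of that fallback in Java, not the actual NetBeans code: the
`configured` argument stands in for whatever value was read from the
maven-compiler-plugin configuration, and the class and method names are
made up for illustration.

```java
import java.nio.charset.Charset;

class EncodingLookup {
    /**
     * Returns the charset named by the configured value (hypothetical input,
     * e.g. taken from the compiler plugin configuration), or the platform
     * default when nothing usable is configured.
     */
    static Charset resolve(String configured) {
        if (configured != null && !configured.trim().isEmpty()) {
            try {
                return Charset.forName(configured.trim());
            } catch (IllegalArgumentException e) {
                // Illegal or unsupported charset name: fall through to the default.
            }
        }
        return Charset.defaultCharset();
    }

    public static void main(String[] args) {
        System.out.println("configured UTF-8 -> " + resolve("UTF-8"));
        System.out.println("nothing configured -> " + resolve(null));
    }
}
```

Whether an unsupported name should silently fall back or fail the build is a
design choice; the sketch falls back only to keep it short.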

Encoding differs not only across countries but also across platforms.
While most Linux distributions use UTF-8, on Windows you get a different
encoding depending on which localized version you buy, I think. The
Eastern European set is different from the Western European one. My Mac
falls back to something called MacRoman as its default encoding.
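
To see why the platform default matters, here is a small Java sketch (class
and method names are illustrative only). It prints the JVM's default charset
and shows how text written as UTF-8 gets disfigured when read back as Latin-1.
The sample name uses a Unicode escape so the source file itself does not
depend on any encoding.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class DefaultEncodingDemo {
    /** Encodes text as UTF-8, then decodes those bytes with another charset. */
    static String misread(String text, Charset assumed) {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(bytes, assumed);
    }

    public static void main(String[] args) {
        // Differs between machines: this is the value a build silently
        // picks up when no encoding is configured.
        System.out.println("Platform default: " + Charset.defaultCharset());

        String name = "Jos\u00E9"; // "José", e.g. from an @author tag
        System.out.println("Read as UTF-8:   " + misread(name, StandardCharsets.UTF_8));
        System.out.println("Read as Latin-1: " + misread(name, StandardCharsets.ISO_8859_1));
    }
}
```

The single accented character becomes two unrelated characters in the
Latin-1 reading, because UTF-8 stores it as two bytes.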

Milos



>
>
>
> > Is it possible to specify an encoding in one place that doesn't work
> somewhere else?
> >
>
>  Yes, in theory you can have one user specify an encoding that another
> user's JVM does not support. As the class javadoc for Charset [1] states,
> only a few encodings - including Latin-1 and UTF-8 - are required to be
> supported, although the reference implementation from Sun supports quite
> a few more encodings [2]. However, I don't consider this a practical concern.
> Given that support for UTF-8 is mandatory, there exists an encoding that can
> handle nearly any character people would like to enter and that Java can
> handle. Hence there exists a solution that works for everyone on the team.
>
>
>
> > I am fortunate in that I've never seen an encoding problem in Maven
> personally. In your proposal you talk about aligning the encoding value, but
> my question is in what cases you have found the default encoding not working,
> as you don't talk about that at all in the proposal.
> >
>
>  Well, choose your favorite from a search for "encoding" on all Maven 2
> projects in JIRA ;-)
>  - http://jira.codehaus.org/browse/MNG-2932
>  - http://jira.codehaus.org/browse/MANTTASKS-14
>  - http://jira.codehaus.org/browse/MTAGLIST-27
>  - http://jira.codehaus.org/browse/MRELEASE-302
>  - http://jira.codehaus.org/browse/DOXIA-103
>  - http://jira.codehaus.org/browse/MCHANGES-71
>  - (about 300 more hits)
>
>  ASCII is quite safe, but anything which requires more than those 7 bits
> just needs special care.
>
>
>
> > Do you know what happens with all the tools that people use? Like checking
> into all SCMs, and what happens when people check out onto their system,
> editors, IDEs. I'm merely suggesting that there might be a reason most
> things fall back to the default encoding on the system: it's
> generally been a hard thing to corral.
> >
>
>  In principle you're right, most of the tools are intended for usage with
> the platform's encoding. This seems to include the popular diff/patch tools
> used by many SCMs; they do not really have support for different encodings [3]
> (yet another historic design flaw, next to the two-digit year format).
>
>  Also, the SCMs themselves do not seem to care about (file content) encoding
> yet; I have found proposals for Subversion [5] and Bazaar [4] but that's it.
> However, as far as I can tell, since they do not know about file encoding,
> SCMs also do not perform any conversions on the file content but simply
> assume a simple byte-to-char mapping like ASCII when doing EOL normalization
> or keyword substitution.
>
>  As for editors and IDEs: even this tiny thing "Notepad" from Windows
> supports UTF-8 nowadays, and I wouldn't call that an editor. Does anybody
> know of a popular editor/IDE that calls itself mature but does not allow
> configuring the file encoding?
>
>
>  Benjamin
>
>
>  [0] https://bugs.eclipse.org/bugs/show_bug.cgi?id=132898
>  [1] http://java.sun.com/javase/6/docs/api/java/nio/charset/Charset.html
>  [2] http://java.sun.com/j2se/1.4.2/docs/guide/intl/encoding.doc.html
>  [3]
> http://www.gnu.org/software/diffutils/manual/html_mono/diff.html#Internationalization
>  [4]
> http://bazaar-vcs.org/UnicodeSupport?action=show&redirect=EncodingSupport#head-43c0111da063796da433179faaf8d065bda5c42e
>  [5] http://svn.haxx.se/dev/archive-2006-03/1182.shtml
>
>
>  ---------------------------------------------------------------------
>  To unsubscribe, e-mail: [EMAIL PROTECTED]
>  For additional commands, e-mail: [EMAIL PROTECTED]
>
>
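
On the quoted question of an encoding that works in one place but not in
another: a JVM can be asked directly which charsets it supports. This is only
a sketch using `java.nio.charset.Charset`; the class name and the list of
names to probe are arbitrary.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;

class CharsetSupportCheck {
    /** True if this JVM can provide a charset with the given name. */
    static boolean usable(String name) {
        if (name == null) {
            return false;
        }
        try {
            return Charset.isSupported(name);
        } catch (IllegalCharsetNameException e) {
            // The name contains characters not allowed in a charset name.
            return false;
        }
    }

    public static void main(String[] args) {
        // UTF-8 and ISO-8859-1 are mandatory; Big5 depends on the JVM.
        String[] names = {"UTF-8", "ISO-8859-1", "Big5", "x-no-such-charset"};
        for (String name : names) {
            System.out.println(name + " supported: " + usable(name));
        }
    }
}
```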
