On Wed, Apr 9, 2008 at 7:36 PM, Benjamin Bentmann <[EMAIL PROTECTED]> wrote:
>
> > Make sure you consider the case where you have people developing the
> > same code base all over the world, and the possible reasoning of
> > falling back to platform default encoding. Consider the team spread
> > across the US, Russia, and China and what do they do normally?
>
> This international spread of developers is in particular the case we
> have in mind. I mean, how should such a team (say the Maven community)
> deliver reliable build output if not all developers have agreed to use
> the same file encoding for the sources? Say the US devs would have ASCII
> as default encoding, the Europeans Latin-1 and the Asians Big5 for our
> nice potpourri. Even if all have agreed to use English for coding, you
> still might encounter Non-ASCII characters that get messed up, e.g. in
> javadoc comments that carry the name of the contributor/committer. Other
> developers might experience build failures because of encoding mismatch,
> at best other people's names are disfigured which is rather impolite.
>
> The Eclipse folks had a similar problem [0]. The solution: Lock the
> encoding down for the entire project.
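
To make the mismatch described above concrete, here is a minimal Java
sketch (not taken from the thread). It writes a name with one charset and
reads the bytes back with another, which is roughly what happens when two
developers rely on different platform defaults. The class name
EncodingMismatch, the helper roundTrip and the sample name are
hypothetical, and StandardCharsets assumes Java 7 or later.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EncodingMismatch {

    // Encode the text with the writer's charset, then decode the raw
    // bytes with the reader's charset (hypothetical helper).
    static String roundTrip(String text, Charset writer, Charset reader) {
        byte[] bytes = text.getBytes(writer);
        return new String(bytes, reader);
    }

    public static void main(String[] args) {
        // Hypothetical contributor name with one non-ASCII character (e acute).
        String name = "Jos\u00e9";

        // Same charset on both sides: the name survives unchanged.
        System.out.println(roundTrip(name,
                StandardCharsets.UTF_8, StandardCharsets.UTF_8));

        // Written as UTF-8, read as Latin-1: the two UTF-8 bytes of the
        // accented letter show up as two separate characters.
        System.out.println(roundTrip(name,
                StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1));

        // Written as Latin-1, read as UTF-8: the lone byte 0xE9 is
        // malformed UTF-8 and is replaced by the decoder.
        System.out.println(roundTrip(name,
                StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8));

        // Written as US-ASCII: the letter cannot be encoded and becomes '?'.
        System.out.println(roundTrip(name,
                StandardCharsets.US_ASCII, StandardCharsets.UTF_8));

        // What the console shows also depends on the encoding of stdout.
    }
}
```

Only charsets that every JVM is required to support are used here, so the
sketch should behave the same on any platform; the platform default never
enters the picture because every conversion names its charset explicitly.
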
Just for the record, netbeans.org projects all use UTF-8. We have devs in
the US, the Czech Republic, Russia and elsewhere. NetBeans allows setting
the default encoding per project; for Maven projects I currently look up
how maven-compiler-plugin is configured. If no configuration is in place,
I fall back to the platform encoding.

Encoding differs not only across countries but also across platforms.
While most Linux distributions use UTF-8, on Windows you get a different
encoding depending on which localized version you buy, I think. The East
European character set is different from the West European one. My Mac
falls back to something called MacRoman as its default encoding.

Milos

>
> > Is it possible to specify an encoding in one place that doesn't work
> > somewhere else?
>
> Yes, in theory you can have one user specify an encoding that another
> user's JVM does not support. As the class javadoc about Charset [1]
> states, only a few encodings - including Latin-1 and UTF-8 - are
> required to be supported, although the reference implementation from
> Sun supports quite a few more encodings [2]. However, I don't consider
> this a practical concern. Given that support for UTF-8 is mandatory,
> there exists an encoding that can handle nearly any character people
> would like to enter and Java can handle. Hence there exists a solution
> that works for everyone on the team.
>
> > I am fortunate in that I've never seen an encoding problem in Maven
> > personally. In your proposal you talk about aligning the encoding
> > value, but my question is in what cases have you found the default
> > encoding not working, as you don't talk about that at all in the
> > proposal.
>
> Well, choose your favorite from a search for "encoding" on all Maven 2
> projects in JIRA ;-)
> - http://jira.codehaus.org/browse/MNG-2932
> - http://jira.codehaus.org/browse/MANTTASKS-14
> - http://jira.codehaus.org/browse/MTAGLIST-27
> - http://jira.codehaus.org/browse/MRELEASE-302
> - http://jira.codehaus.org/browse/DOXIA-103
> - http://jira.codehaus.org/browse/MCHANGES-71
> - (about 300 more hits)
>
> ASCII is quite safe, but anything which requires more than those 7 bits
> just needs special care.
>
> > Do you know what happens with all the tools that people use? Like
> > checking into all SCMs, and what happens when people check out onto
> > their system, editors, IDEs. I'm merely suggesting that there might be
> > a reason most things fall back to the default encoding on the system:
> > it's generally been a hard thing to corral.
>
> In principle you're right, most of the tools are intended for usage with
> the platform's encoding. This seems to include the popular diff/patch
> tools used by many SCMs; they do not really support different
> encodings [3] (yet another historic design flaw, next to the two-digit
> year format).
>
> Also, the SCMs themselves do not seem to care about (file content)
> encoding yet. I have found proposals for Subversion [5] and Bazaar [4]
> but that's it. However, as far as I can tell, since they do not know
> about file encoding, SCMs also do not perform any conversions on the
> file content but simply assume a simple byte-to-char mapping like ASCII
> when doing EOL normalization or keyword substitution.
>
> As for editors and IDEs: Even this tiny thing "Notepad" from Windows
> supports UTF-8 nowadays and I wouldn't call that an editor. Does anybody
> know about a popular editor/IDE that calls itself mature but does not
> allow configuring the file encoding?
>
> Benjamin
>
> [0] https://bugs.eclipse.org/bugs/show_bug.cgi?id=132898
> [1] http://java.sun.com/javase/6/docs/api/java/nio/charset/Charset.html
> [2] http://java.sun.com/j2se/1.4.2/docs/guide/intl/encoding.doc.html
> [3] http://www.gnu.org/software/diffutils/manual/html_mono/diff.html#Internationalization
> [4] http://bazaar-vcs.org/UnicodeSupport?action=show&redirect=EncodingSupport#head-43c0111da063796da433179faaf8d065bda5c42e
> [5] http://svn.haxx.se/dev/archive-2006-03/1182.shtml
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]