Re: [POLL] Default Value for File Encoding

Benjamin Bentmann Tue, 29 Apr 2008 08:51:18 -0700

Brian E. Fox wrote:

Can you outline in what cases and in what ways this change could break
existing builds


Surely. About the cases that might suffer from the change: We propose to use
Latin-1 as the default encoding in case the user did not specify it. So
first up, everybody who already explicitly declares an encoding will not
notice the change, i.e. if your POM looks like

 <plugin>
   <artifactId>maven-compiler-plugin</artifactId>
   <configuration>
     <encoding>big5</encoding>
     ...
   </configuration>
 </plugin>

the build will work just as before (using big5) when you switch to the newer
plugin version that incorporates our proposal.

In contrast, the build will likely break if you effectively use an encoding
other than Latin-1 or ASCII (ASCII is just a subset of Latin-1) but did not
declare this in the configuration for the various plugins. The prime example
of potentially affected builds seems to be Asian projects that naturally use
the non-Western encoding of their platform (compare the comments on our wiki
article).

As for the kind of break: The best case is a plugin that refuses to do its
work at all and throws an exception because the file contents it is trying
to process violate the assumed encoding (e.g. Latin-1 byte sequences are in
general not valid UTF-8 byte sequences). Why do I call this build failure a
best case? Because it tells you straight out that the desired encoding needs
to be declared in the POM. The other case is a plugin that works but silently
outputs garbage. This is more subtle because it requires human review to
detect. That's easy if you know where to look (non-ASCII characters) but
again requires the user to be aware of the issue.

and what it would take for the user to fix?


In one line: State the encoding you want to use in the POM.

The POM is our means to configure a build. If its default values don't fit
your need, you can always go ahead and explicitly add the configuration
element.

When we consider the current state, i.e. the released versions of the
plugins and Maven, that means configuring each and every plugin separately.
Once we have released plugin versions that follow our proposal and adhere to
the convention of evaluating the POM property
"${project.build.sourceEncoding}", this configuration can in most cases be
reduced to adding

 <properties>
   <project.build.sourceEncoding>...</project.build.sourceEncoding>
 </properties>
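
For plugin versions that do not yet evaluate the property on their own, a
rough sketch of the per-plugin configuration could look like the fragment
below. The value UTF-8 and the choice of these two plugins are assumptions
made purely for illustration; substitute the encoding your sources really use
and whichever plugins in your build read or write text files.

```xml
<!-- Sketch only: UTF-8 is an assumed example value, not a recommendation. -->
<properties>
  <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
</properties>

<build>
  <plugins>
    <!-- Each plugin gets its own <encoding> element, all pointing at the
         single property above so the value is stated only once. -->
    <plugin>
      <artifactId>maven-compiler-plugin</artifactId>
      <configuration>
        <encoding>${project.build.sourceEncoding}</encoding>
      </configuration>
    </plugin>
    <plugin>
      <artifactId>maven-resources-plugin</artifactId>
      <configuration>
        <encoding>${project.build.sourceEncoding}</encoding>
      </configuration>
    </plugin>
  </plugins>
</build>
```

Referencing the property from each plugin keeps the encoding in one place, so
the explicit <encoding> elements can simply be dropped once the plugins pick
up the property by themselves.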

 Could a tool be created to correct it automatically?


I believe the answer is "no". This is basically related to the discussion we
had over on dev@ with Jason regarding the usage of JChardet [0]. A machine
tool cannot reliably tell what file encoding your sources use (because it
would need to semantically understand the text). So this is a human task,
but one that should be easily done.


Benjamin


[0]
http://www.nabble.com/-VOTE--POM-Element-for-Source-File-Encoding-to16515820s177.html

