Brian E. Fox wrote:
Can you outline in what cases and in what ways this change could break existing builds
Surely. About the cases that might suffer from the change: We propose to use Latin-1 as the default encoding in case the user did not specify one. So first up, everybody who already explicitly declares an encoding will not notice the change, i.e. if your POM looks like

  <plugin>
    <artifactId>maven-compiler-plugin</artifactId>
    <configuration>
      <encoding>big5</encoding>
      ...
    </configuration>
  </plugin>

the build will work just as before (using big5) when you switch to the newer plugin version that incorporates our proposal.

In contrast, the build will likely break if you effectively use an encoding other than Latin-1 or ASCII (ASCII is just a subset of Latin-1) but did not declare this in the configuration for the various plugins. The prime example of potentially affected builds seems to be Asian projects that naturally use the non-Western encoding of their platforms (compare the comments on our wiki article).

As for the kind of break: The best case is a plugin that entirely refuses to work and throws an exception because the file contents it is trying to process violate the assumed encoding (e.g. Latin-1 byte sequences containing non-ASCII characters are in general not valid UTF-8 byte sequences). Why do I call this build failure a best case? Because it tells you straight out that the desired encoding needs to be declared in the POM.

The other way is a plugin that works but silently outputs garbage. This is more subtle because it requires human review to detect. That's easy if you know where to look (non-ASCII characters) but again requires a user who is aware of the issue.
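To make the two kinds of break concrete, here is a minimal Java sketch. It is not taken from any plugin; the class name EncodingMismatchDemo and its helper methods are made up for illustration. It only uses the standard java.nio.charset API to show a strict decoder rejecting Latin-1 bytes read as UTF-8 (the "best case"), and a lenient decode of UTF-8 bytes as Latin-1 that succeeds but yields garbage.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of Maven or any plugin.
class EncodingMismatchDemo {

    // Strict decode: reports malformed input instead of substituting characters.
    static String decodeStrict(byte[] bytes, Charset charset) throws CharacterCodingException {
        return charset.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(bytes))
                .toString();
    }

    // True if the bytes form a valid sequence in the given charset.
    static boolean isValid(byte[] bytes, Charset charset) {
        try {
            decodeStrict(bytes, charset);
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    // Lenient decode: never throws; a wrong charset simply produces wrong text.
    static String decodeLenient(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        // "caf" followed by e-acute, written as an escape so this source file
        // itself does not depend on its own encoding.
        String original = "caf\u00e9";

        // Best case: Latin-1 bytes read as UTF-8 are rejected outright.
        byte[] latin1 = original.getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(isValid(latin1, StandardCharsets.UTF_8)); // false

        // Subtle case: UTF-8 bytes read as Latin-1 decode without error,
        // but the single accented character turns into two wrong ones.
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);
        String garbled = decodeLenient(utf8, StandardCharsets.ISO_8859_1);
        System.out.println(garbled.equals(original)); // false
        System.out.println(garbled.length());         // 5, the original has 4
    }
}
```

Whether a given plugin behaves like the strict or the lenient variant depends on how it reads its files, so a quiet build is no proof that the encoding was right.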
and what it would take for the user to fix?
In one line: State the encoding you want to use in the POM. The POM is our means to configure a build. If its default values don't fit your needs, you can always go ahead and explicitly add the configuration element.

When we consider the state as is, i.e. the release versions of the plugins and Maven, that means configuring each and every plugin separately. Once we have released plugin versions that follow our proposal and adhere to the convention of evaluating the POM property "${project.build.sourceEncoding}", this configuration can in most cases be reduced to adding

  <properties>
    <project.build.sourceEncoding>...</project.build.sourceEncoding>
  </properties>
Could a tool be created to correct it automatically?
I believe the answer is "no". This is basically related to the discussion we had over on dev@ with Jason regarding the usage of JChardet [0]. A machine tool cannot reliably tell what file encoding your sources use (because it would need to semantically understand the text). So this is a human task, but one that should be easily done.

Benjamin

[0] http://www.nabble.com/-VOTE--POM-Element-for-Source-File-Encoding-to16515820s177.html