Re: Common Bugs

2008-04-08 Thread Vincent Siveton
2008/4/7, Benjamin Bentmann [EMAIL PROTECTED]:
  I wonder if it's worth posting these as a series under the developers
  section of the Maven site?
 

  Vincent and I had already put parts of this stuff onto [0] in a section
  named Some Pitfalls, together with a link to this mail thread. But I
  agree, having all of this in a nicely formatted APT doc on the site is a
  good idea.

  I suggest we move the Some Pitfalls section out into a standalone
 document such that we can list it on the documentation index. If nobody else
 goes for this, it will need to wait some days until I get the next free time
 slice to merge and clean it up for proper presentation.


+1

Vincent


  Benjamin


  [0]
 http://maven.apache.org/guides/plugin/guide-java-plugin-development.html



 -
  To unsubscribe, e-mail: [EMAIL PROTECTED]
  For additional commands, e-mail: [EMAIL PROTECTED]



-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: Common Bugs

2008-04-07 Thread Benjamin Bentmann

I wonder if it's worth posting these as a series under the developers
section of the Maven site?


Vincent and I had already put parts of this stuff onto [0] in a section
named Some Pitfalls, together with a link to this mail thread. But I
agree, having all of this in a nicely formatted APT doc on the site is a
good idea.

I suggest we move the Some Pitfalls section out into a standalone document 
such that we can list it on the documentation index. If nobody else goes for 
this, it will need to wait some days until I get the next free time slice to 
merge and clean it up for proper presentation.



Benjamin


[0] http://maven.apache.org/guides/plugin/guide-java-plugin-development.html


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: Common Bugs

2008-04-06 Thread Benjamin Bentmann

Hi,

6) URLs and Filesystem Paths

URLs and filesystem paths are really two different beasts and converting 
between them is not trivial. The main source of problems is that different 
encoding rules apply for the strings that make up a URL or filesystem path.


For example, consider the following code snippet:

 File file = new File( foo bar+foo );
 URL url = file.toURI().toURL();
 System.out.println( file.toURL() );
 System.out.println( url );
 System.out.println( url.getPath() );
 System.out.println( URLDecoder.decode( url.getPath(), UTF-8 ) );

which outputs something like

 file:/M:/scratch-pad/foo bar+foo
 file:/M:/scratch-pad/foo%20bar+foo
 /M:/scratch-pad/foo%20bar+foo
 /M:/scratch-pad/foo bar foo

First of all, please note that File.toURL() [1] does not escape the space 
character. This yields an invalid URL, as per RFC 2396 [0], section 2.4.3 
Excluded US-ASCII Characters. The class java.net.URL will silently accept 
such invalid URLs, in contrast java.net.URI will not (see also URL.toURI() 
[2]). For this reason, this API method has already been deprecated and 
should be replaced with File.toURI().toURL().


Next, URL.getPath() does in general not return a string that can be used as 
a filesystem path. It returns a substring of the URL and as such can contain 
escape sequences. The prominent example is the space character which will 
show up as %20. People sometimes hack around this by means of 
replace(%20,  ) but that does simply not cover all cases. It's worth to 
mention that on the other hand the related method URI.getPath() [3] does 
decode escapes but still the result is not a filesystem path (compare the 
source for the constructor File(URI)).


To decode a URL, people sometimes also choose java.net.URLDecoder [4]. The 
pitfall with this class is that is actually performs HTML form decoding 
which is yet another encoding and not the same as the URL encoding (compare 
last paragraph in class javadoc about java.net.URL). For instance, a 
URLDecoder will errorneously convert the character + into a space as 
illustrated by the last sysout in the example above.


Code targetting JRE 1.4+ should easily avoid these problems by using

 new File( new URI( url.toString() ) )

when converting a URL to a filesystem path and

 file.toURI().toURL()

when converting back.

Regards,


Benjamin Bentmann


[0] http://www.faqs.org/rfcs/rfc2396.html
[1] http://java.sun.com/javase/6/docs/api/java/io/File.html#toURL()
[2] http://java.sun.com/javase/6/docs/api/java/net/URL.html#toURI()
[3] http://java.sun.com/javase/6/docs/api/java/net/URI.html#getPath()
[4] http://java.sun.com/javase/6/docs/api/java/net/URLDecoder.html 



-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: Common Bugs

2008-04-06 Thread Benjamin Bentmann

 new File( new URI( url.toString() ) )


Correction:
JRE 1.4 is happily returning invalid/unescaped URLs from 
ClassLoader.getResource(), making the above suggestion fail with a 
URISyntaxException.


The new suggestion is to use FileUtils.toFile(URL) [0] from Commons IO. A 
similar methods exists in Plexus Utils but it's currently not decoding 
escape sequences.



Benjamin


[0] 
http://commons.apache.org/io/api-release/org/apache/commons/io/FileUtils.html#toFile(java.net.URL) 



-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: Common Bugs

2008-04-06 Thread Brett Porter

Hey Benjamin,

I wonder if it's worth posting these as a series under the developers  
section of the Maven site?


- Brett

On 06/04/2008, at 9:46 PM, Benjamin Bentmann wrote:


new File( new URI( url.toString() ) )


Correction:
JRE 1.4 is happily returning invalid/unescaped URLs from  
ClassLoader.getResource(), making the above suggestion fail with a  
URISyntaxException.


The new suggestion is to use FileUtils.toFile(URL) [0] from Commons  
IO. A similar methods exists in Plexus Utils but it's currently not  
decoding escape sequences.



Benjamin


[0] 
http://commons.apache.org/io/api-release/org/apache/commons/io/FileUtils.html#toFile(java.net.URL)

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



--
Brett Porter
[EMAIL PROTECTED]
http://blogs.exist.com/bporter/


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: Common Bugs

2008-04-06 Thread Barrie Treloar
On Mon, Apr 7, 2008 at 6:21 AM, Brett Porter [EMAIL PROTECTED] wrote:
 Hey Benjamin,

  I wonder if it's worth posting these as a series under the developers
 section of the Maven site?

And building custom Checsktyle/PMD rules (if built in ones dont exist)
to flag these as error?

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: Common Bugs

2008-03-08 Thread Benjamin Bentmann

Hi,

5) Reading and Writing Text Files

Textual content is composed of characters while file systems merely store
byte streams. A file encoding (aka charset) is used to convert between bytes
and characters. The challenge is using the right file encoding...

The JVM has this notion of a default encoding (file.encoding property)
which it derives from a system's locale or whatever. While this might be a
convenient feature sometimes, using this default encoding for a project
build is in general a bad idea: The build output will depend on the
machine/developer who runs the build. As such, usage of the default encoding
threatens the dream of a reproducible build.

For example, if developer A has UTF-8 as default encoding while developer B
uses ISO-8859-1, text files are very likely to get messed up during resource
filtering or similar tasks.

Therefore, plugin developers should avoid any direct or indirect usage
of the classes FileWriter and FileReader. Instead,
OutputStreamWriter/-Reader should be used with an explicit encoding value.
The required encoding value can be obtained from a configuration parameter
like this:

 /*
  * @parameter default-value=ISO-8859-1
  */
 private String encoding;

Providing a default value resembles the JVM's concept of a default encoding
with an important difference: This time, the default is specified by the
plugin and hence controlled by the POM. This way, all builds from the same
POM can be guranteed to use the same encoding regardless of the JVM
executing Maven.

Plugins that already provide means to specify the encoding should make sure
they have a default value for this parameter. This is to follow Maven's
philosophy of convention over configuration where users should get the
best practice out-of-the-box when a reasonable default can be assumed.

Handling XML files is a little different because these files are equipped
with an encoding declaration. Thanks to Herve Boutemy, plexus-utils provides
a convenient Reader-/WriterFactory for the magic of auto-detecting the
encoding from a byte stream (see also [0]). When writing XML files without
the XmlStreamWriter, be sure to ensure the encoding used for the output
writer matches the encoding specified by the XML declaration being written. 
Otherwise later parsing the output might fail.


Regards,


Benjamin Bentmann


[0] http://docs.codehaus.org/display/MAVENUSER/XML+encoding


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: Common Bugs

2008-02-02 Thread Benjamin Bentmann

Hi again,

4) Effective Output Directory for Report Plugins

Most reporting plugins will inherit from AbstractMavenReport. In doing so,
they need to implement the inherited but abstract method
getOutputDirectory(). To implement this method, plugins usually declare a
field named outputDirectory which they return in the method. Nothing wrong
so far, but please read on.

Some plugins need to create additional files in the report output directory
that accompany the report generated via the sink. While it is tempting to
use either the method getOutputDirectory() or the field outputDirectory
directly in order to setup a path for the output files, this leads most
likely to a bug. More precisely, those plugins will not properly output
files when run by the Maven Site Plugin as part of the site lifecycle. This
is best noticed when the output directory for the site is configured
directly in the Maven Site Plugin such that it deviates from the expression
${project.reporting.outputDirectory} that the plugins use. Multi-language
site generation is another scenario to exploit the bug.

The reason for this: During the site lifecycle, the Maven Site Plugin takes
control of the output directory, that's why all reports (ideally) end up in
the same directory. In order to do, it invokes
MavenReport.setReportOutputDirectory() on each report. Reports are expected
to use the supplied path and not a value from the POM configuration of the
corresponding reporting plugin.

As the take home message: If one needs to know the effective output
directory for the report, MavenReport.getReportOutputDirectory() is your
friend.

And what about AbstractMavenReport.getOutputDirectory()? Well, the
implementation of getReportOutputDirectory() in AbstractMavenReport will
fallback to this method if the Maven Site Plugin did not previously set the
output directory (see also [0]).

Regards,


Benjamin Bentmann


[0] http://jira.codehaus.org/browse/MNG-3369


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: Common Bugs

2008-01-17 Thread Benjamin Bentmann

Greetings,

I remember another subtle issue which I would like to make people aware of.

3) Case-Insensitive String Comparison

When developers need to compare strings without regard to case or want to
realize a map with case-insensitive string keys, they often employ
String.toLowerCase() or String.toUpperCase() to create a normalized string
before doing a simple String.equals(). Now, the to*Case() methods are
overloaded: One takes no arguments and one take a Locale object.

The gotcha with the arg-less methods is that their output depends on the
default locale of the JVM but the default locale is out of control of the
developer (see also [0], [1]). That means the string expected by the
developer (who runs/tests his code in a JVM using locale xy) does not
necessarily match the string seen by another user (that runs a JVM with
locale ab). For example, the comparison

 info.equals(debugLevel.toLowerCase())

is likely to fail for systems with default locale Turkish.

Since developers usually want to compare strings from the English language,
they must use String.to*Case(Locale.ENGLISH) to get reliable results
regardless of the end-user's default locale.

Just to make the picture complete: String.to*Case() is locale-sensitive and
context-aware. In contrast, Character.to*Case() and
String.equalsIgnoreCase() (which relies on Character.to*Case()) are neither
locale-sensitive nor context-aware [2]. For instance

 ΣΣ.toLowerCase(Locale.ENGLISH).equals(σσ)

is false while

 ΣΣ.equalsIgnoreCase(σσ)

is true (because the lower case form of ΣΣ is σς).

Regards,


Benjamin Bentmann


[0]
http://java.sun.com/javase/6/docs/api/java/lang/String.html#toLowerCase()
[1] http://cafe.elharo.com/blogroll/turkish/
[2]
http://java.sun.com/javase/6/docs/api/java/lang/Character.html#toLowerCase(char)


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



RE: Common Bugs

2008-01-14 Thread Timothy Reilly
Just an observation,

but would it not be a good idea to codify these heuristics in a plugin
and add them to the plugins pom.

For example, PMD or some other plugin might generate violations for
these.

 

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: Common Bugs

2008-01-14 Thread Brett Porter

That sounds like a great idea :)

- Brett

On 15/01/2008, at 12:35 AM, Timothy Reilly wrote:


Just an observation,

but would it not be a good idea to codify these heuristics in a plugin
and add them to the plugins pom.

For example, PMD or some other plugin might generate violations for
these.



-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]




-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Common Bugs

2008-01-13 Thread Benjamin Bentmann

Dear developers,

After some months of reporting issues for Maven, I have noticed common bugs
in various plugins. For me it seems that many developers are either
completely unaware of the problems (i.e. do not handle them at all) or are
misunderstanding them (i.e. handle them insufficiently). Therefore, I would
like to illustrate the source of those problems in more detail such that
developers can start to prevent bugs right from the beginning rather than
fixing them afterwards.

1) Relative Paths

It is common practice for users of Maven to specify relative paths in the
POM, not to mention that the Super POM does so, too. The intention is almost
always to resolve such relative paths against the base directory of the
current project. In other words, the paths
 target/classes
and
 ${basedir}/target/classes
should resolve to the same directory for a given POM.

Unfortunately, the class java.io.File does not resolve relative paths
against the project's base directory. As mentioned in its class javadoc [0],
it resolves relative paths against the current working directory. In plain
English: Unless a Maven component has complete control over the current
working directory, any usage of java.io.File in combination with a relative
path is a bug.

At first glance, one might be tempted to argue that the project base
directory is equal to the current working directory. However, this
assumption is generally not true. Consider the following scenarios:

a) Reactor Builds
When a child module is build during a reactor build, the current working is
usually the base directory of the parent project, not the base directory of
the current module. That is the most common scenario where users are faced
with the bug.

b) Embedded Maven Invocations
Other tools, most prominently IDEs, that run Maven under the hood may have
set the current working directory to their installation folder or whatever
they like.

c) Maven Invocations using the -f switch
While it is surely an uncommon use-case, the user is free to invoke Maven
from an arbitrary working directory by specifying an absolute path like
 mvn -f /home/me/projects/demo/pom.xml

In order to guarantee reliable builds, Maven and its plugins must manually
resolve relative paths against the project's base directory. A simple idiom
like the following should do just fine:

 File file = new File( path );
 if ( !file.isAbsolute() )
 {
   file = new File( project.getBasedir(), file );
 }

Many Maven plugins can get this resolution automatically if they declare
their affected mojo parameters of type java.io.File instead of
java.lang.String. This subtle difference in parameter types will trigger a
feature that seems to be known as path translation, i.e. Maven itself will
properly resolve relative paths when it pumps the XML configuration into a
mojo.

While it will not cure this problem completely, I suggest to review the
MavenProject API such that it ensures the following invariant: All paths
reported by means of a getter-like method are absolute. This basically
requires to resolve all relative paths that are pushed in by a setter-like
method. MavenProject.addCompileSourceRoot() is just one example for a public
API method that currently blindly accepts a relative path, giving rise to
later failures because a misbehaving component resolves this path not
properly.

2) Resource Bundle Families

Especially reporting plugins employ resource bundles to support
internationalization. One language (usually English) is provided as the
fallback/default language in the base resource bundle. Due to the lookup
strategy performed by ResourceBundle.getBundle() [1], one must always
provide a dedicated resource bundle for this default language, too. This
bundle can be empty because it inherits the strings via the parent chain
from the base bundle, but it must exist.

Take the following example. A report mojo uses a resource bundle family with
the base name mymojo-report and erroneously provides only the following
bundles:
- mymojo-report.properties  (base bundle for English)
- mymojo-report_de.properties   (specific bundle for German)
Further assume that the default locale of the current JVM is de. Now, when
ResourceBundle.getBundle() is called to retrieve the bundle for locale en,
it will try the following candidate bundles and use the first one found:
- mymojo-report_en.properties   (candidate for requested locale)
- mymojo-report_de.properties   (candidate for default locale)
- mymojo-report.properties  (base bundle, always last)
Because mymojo-report_en.properties does not exist, the bundle
mymojo-report_de.properties will be chosen instead of the base bundle
which would have provided the requested English translation.

Plugin developers can easily expose this bug when calling
 mvn site -Dlocales=xy,en
where xy denotes some other language supported by the plugin. Specifying
xy as the first locale will have the Maven Site Plugin change the JVM's
default locale to xy which in turn causes

Re: Common Bugs

2008-01-13 Thread Milos Kleint
+1 on that.

for embedded use in IDEs there's additional pitfalls.

1. use of System.getProperties()
2. java.lang.Runtime's shutdown hooks.

Milos

On Jan 13, 2008 10:21 AM, Benjamin Bentmann [EMAIL PROTECTED] wrote:
 Dear developers,

 After some months of reporting issues for Maven, I have noticed common bugs
 in various plugins. For me it seems that many developers are either
 completely unaware of the problems (i.e. do not handle them at all) or are
 misunderstanding them (i.e. handle them insufficiently). Therefore, I would
 like to illustrate the source of those problems in more detail such that
 developers can start to prevent bugs right from the beginning rather than
 fixing them afterwards.

 1) Relative Paths

 It is common practice for users of Maven to specify relative paths in the
 POM, not to mention that the Super POM does so, too. The intention is almost
 always to resolve such relative paths against the base directory of the
 current project. In other words, the paths
   target/classes
 and
   ${basedir}/target/classes
 should resolve to the same directory for a given POM.

 Unfortunately, the class java.io.File does not resolve relative paths
 against the project's base directory. As mentioned in its class javadoc [0],
 it resolves relative paths against the current working directory. In plain
 English: Unless a Maven component has complete control over the current
 working directory, any usage of java.io.File in combination with a relative
 path is a bug.

 At first glance, one might be tempted to argue that the project base
 directory is equal to the current working directory. However, this
 assumption is generally not true. Consider the following scenarios:

 a) Reactor Builds
 When a child module is build during a reactor build, the current working is
 usually the base directory of the parent project, not the base directory of
 the current module. That is the most common scenario where users are faced
 with the bug.

 b) Embedded Maven Invocations
 Other tools, most prominently IDEs, that run Maven under the hood may have
 set the current working directory to their installation folder or whatever
 they like.

 c) Maven Invocations using the -f switch
 While it is surely an uncommon use-case, the user is free to invoke Maven
 from an arbitrary working directory by specifying an absolute path like
   mvn -f /home/me/projects/demo/pom.xml

 In order to guarantee reliable builds, Maven and its plugins must manually
 resolve relative paths against the project's base directory. A simple idiom
 like the following should do just fine:

   File file = new File( path );
   if ( !file.isAbsolute() )
   {
 file = new File( project.getBasedir(), file );
   }

 Many Maven plugins can get this resolution automatically if they declare
 their affected mojo parameters of type java.io.File instead of
 java.lang.String. This subtle difference in parameter types will trigger a
 feature that seems to be known as path translation, i.e. Maven itself will
 properly resolve relative paths when it pumps the XML configuration into a
 mojo.

 While it will not cure this problem completely, I suggest to review the
 MavenProject API such that it ensures the following invariant: All paths
 reported by means of a getter-like method are absolute. This basically
 requires to resolve all relative paths that are pushed in by a setter-like
 method. MavenProject.addCompileSourceRoot() is just one example for a public
 API method that currently blindly accepts a relative path, giving rise to
 later failures because a misbehaving component resolves this path not
 properly.

 2) Resource Bundle Families

 Especially reporting plugins employ resource bundles to support
 internationalization. One language (usually English) is provided as the
 fallback/default language in the base resource bundle. Due to the lookup
 strategy performed by ResourceBundle.getBundle() [1], one must always
 provide a dedicated resource bundle for this default language, too. This
 bundle can be empty because it inherits the strings via the parent chain
 from the base bundle, but it must exist.

 Take the following example. A report mojo uses a resource bundle family with
 the base name mymojo-report and erroneously provides only the following
 bundles:
 - mymojo-report.properties  (base bundle for English)
 - mymojo-report_de.properties   (specific bundle for German)
 Further assume that the default locale of the current JVM is de. Now, when
 ResourceBundle.getBundle() is called to retrieve the bundle for locale en,
 it will try the following candidate bundles and use the first one found:
 - mymojo-report_en.properties   (candidate for requested locale)
 - mymojo-report_de.properties   (candidate for default locale)
 - mymojo-report.properties  (base bundle, always last)
 Because mymojo-report_en.properties does not exist, the bundle
 mymojo-report_de.properties will be chosen instead of the base bundle
 which would have provided