Re: Common Bugs
2008/4/7, Benjamin Bentmann [EMAIL PROTECTED]: I wonder if it's worth posting these as a series under the developers section of the Maven site? Vincent and I had already put parts of this stuff onto [0] in a section named Some Pitfalls, together with a link to this mail thread. But I agree, having all of this in a nicely formatted APT doc on the site is a good idea. I suggest we move the Some Pitfalls section out into a standalone document such that we can list it on the documentation index. If nobody else goes for this, it will need to wait some days until I get the next free time slice to merge and clean it up for proper presentation. +1 Vincent Benjamin [0] http://maven.apache.org/guides/plugin/guide-java-plugin-development.html - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED] - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
Re: Common Bugs
I wonder if it's worth posting these as a series under the developers section of the Maven site? Vincent and I had already put parts of this stuff onto [0] in a section named Some Pitfalls, together with a link to this mail thread. But I agree, having all of this in a nicely formatted APT doc on the site is a good idea. I suggest we move the Some Pitfalls section out into a standalone document such that we can list it on the documentation index. If nobody else goes for this, it will need to wait some days until I get the next free time slice to merge and clean it up for proper presentation. Benjamin [0] http://maven.apache.org/guides/plugin/guide-java-plugin-development.html - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
Re: Common Bugs
Hi, 6) URLs and Filesystem Paths URLs and filesystem paths are really two different beasts and converting between them is not trivial. The main source of problems is that different encoding rules apply for the strings that make up a URL or filesystem path. For example, consider the following code snippet: File file = new File( foo bar+foo ); URL url = file.toURI().toURL(); System.out.println( file.toURL() ); System.out.println( url ); System.out.println( url.getPath() ); System.out.println( URLDecoder.decode( url.getPath(), UTF-8 ) ); which outputs something like file:/M:/scratch-pad/foo bar+foo file:/M:/scratch-pad/foo%20bar+foo /M:/scratch-pad/foo%20bar+foo /M:/scratch-pad/foo bar foo First of all, please note that File.toURL() [1] does not escape the space character. This yields an invalid URL, as per RFC 2396 [0], section 2.4.3 Excluded US-ASCII Characters. The class java.net.URL will silently accept such invalid URLs, in contrast java.net.URI will not (see also URL.toURI() [2]). For this reason, this API method has already been deprecated and should be replaced with File.toURI().toURL(). Next, URL.getPath() does in general not return a string that can be used as a filesystem path. It returns a substring of the URL and as such can contain escape sequences. The prominent example is the space character which will show up as %20. People sometimes hack around this by means of replace(%20, ) but that does simply not cover all cases. It's worth to mention that on the other hand the related method URI.getPath() [3] does decode escapes but still the result is not a filesystem path (compare the source for the constructor File(URI)). To decode a URL, people sometimes also choose java.net.URLDecoder [4]. The pitfall with this class is that is actually performs HTML form decoding which is yet another encoding and not the same as the URL encoding (compare last paragraph in class javadoc about java.net.URL). For instance, a URLDecoder will errorneously convert the character + into a space as illustrated by the last sysout in the example above. Code targetting JRE 1.4+ should easily avoid these problems by using new File( new URI( url.toString() ) ) when converting a URL to a filesystem path and file.toURI().toURL() when converting back. Regards, Benjamin Bentmann [0] http://www.faqs.org/rfcs/rfc2396.html [1] http://java.sun.com/javase/6/docs/api/java/io/File.html#toURL() [2] http://java.sun.com/javase/6/docs/api/java/net/URL.html#toURI() [3] http://java.sun.com/javase/6/docs/api/java/net/URI.html#getPath() [4] http://java.sun.com/javase/6/docs/api/java/net/URLDecoder.html - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
Re: Common Bugs
new File( new URI( url.toString() ) ) Correction: JRE 1.4 is happily returning invalid/unescaped URLs from ClassLoader.getResource(), making the above suggestion fail with a URISyntaxException. The new suggestion is to use FileUtils.toFile(URL) [0] from Commons IO. A similar methods exists in Plexus Utils but it's currently not decoding escape sequences. Benjamin [0] http://commons.apache.org/io/api-release/org/apache/commons/io/FileUtils.html#toFile(java.net.URL) - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
Re: Common Bugs
Hey Benjamin, I wonder if it's worth posting these as a series under the developers section of the Maven site? - Brett On 06/04/2008, at 9:46 PM, Benjamin Bentmann wrote: new File( new URI( url.toString() ) ) Correction: JRE 1.4 is happily returning invalid/unescaped URLs from ClassLoader.getResource(), making the above suggestion fail with a URISyntaxException. The new suggestion is to use FileUtils.toFile(URL) [0] from Commons IO. A similar methods exists in Plexus Utils but it's currently not decoding escape sequences. Benjamin [0] http://commons.apache.org/io/api-release/org/apache/commons/io/FileUtils.html#toFile(java.net.URL) - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED] -- Brett Porter [EMAIL PROTECTED] http://blogs.exist.com/bporter/ - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
Re: Common Bugs
On Mon, Apr 7, 2008 at 6:21 AM, Brett Porter [EMAIL PROTECTED] wrote: Hey Benjamin, I wonder if it's worth posting these as a series under the developers section of the Maven site? And building custom Checsktyle/PMD rules (if built in ones dont exist) to flag these as error? - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
Re: Common Bugs
Hi, 5) Reading and Writing Text Files Textual content is composed of characters while file systems merely store byte streams. A file encoding (aka charset) is used to convert between bytes and characters. The challenge is using the right file encoding... The JVM has this notion of a default encoding (file.encoding property) which it derives from a system's locale or whatever. While this might be a convenient feature sometimes, using this default encoding for a project build is in general a bad idea: The build output will depend on the machine/developer who runs the build. As such, usage of the default encoding threatens the dream of a reproducible build. For example, if developer A has UTF-8 as default encoding while developer B uses ISO-8859-1, text files are very likely to get messed up during resource filtering or similar tasks. Therefore, plugin developers should avoid any direct or indirect usage of the classes FileWriter and FileReader. Instead, OutputStreamWriter/-Reader should be used with an explicit encoding value. The required encoding value can be obtained from a configuration parameter like this: /* * @parameter default-value=ISO-8859-1 */ private String encoding; Providing a default value resembles the JVM's concept of a default encoding with an important difference: This time, the default is specified by the plugin and hence controlled by the POM. This way, all builds from the same POM can be guranteed to use the same encoding regardless of the JVM executing Maven. Plugins that already provide means to specify the encoding should make sure they have a default value for this parameter. This is to follow Maven's philosophy of convention over configuration where users should get the best practice out-of-the-box when a reasonable default can be assumed. Handling XML files is a little different because these files are equipped with an encoding declaration. Thanks to Herve Boutemy, plexus-utils provides a convenient Reader-/WriterFactory for the magic of auto-detecting the encoding from a byte stream (see also [0]). When writing XML files without the XmlStreamWriter, be sure to ensure the encoding used for the output writer matches the encoding specified by the XML declaration being written. Otherwise later parsing the output might fail. Regards, Benjamin Bentmann [0] http://docs.codehaus.org/display/MAVENUSER/XML+encoding - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
Re: Common Bugs
Hi again, 4) Effective Output Directory for Report Plugins Most reporting plugins will inherit from AbstractMavenReport. In doing so, they need to implement the inherited but abstract method getOutputDirectory(). To implement this method, plugins usually declare a field named outputDirectory which they return in the method. Nothing wrong so far, but please read on. Some plugins need to create additional files in the report output directory that accompany the report generated via the sink. While it is tempting to use either the method getOutputDirectory() or the field outputDirectory directly in order to setup a path for the output files, this leads most likely to a bug. More precisely, those plugins will not properly output files when run by the Maven Site Plugin as part of the site lifecycle. This is best noticed when the output directory for the site is configured directly in the Maven Site Plugin such that it deviates from the expression ${project.reporting.outputDirectory} that the plugins use. Multi-language site generation is another scenario to exploit the bug. The reason for this: During the site lifecycle, the Maven Site Plugin takes control of the output directory, that's why all reports (ideally) end up in the same directory. In order to do, it invokes MavenReport.setReportOutputDirectory() on each report. Reports are expected to use the supplied path and not a value from the POM configuration of the corresponding reporting plugin. As the take home message: If one needs to know the effective output directory for the report, MavenReport.getReportOutputDirectory() is your friend. And what about AbstractMavenReport.getOutputDirectory()? Well, the implementation of getReportOutputDirectory() in AbstractMavenReport will fallback to this method if the Maven Site Plugin did not previously set the output directory (see also [0]). Regards, Benjamin Bentmann [0] http://jira.codehaus.org/browse/MNG-3369 - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
Re: Common Bugs
Greetings, I remember another subtle issue which I would like to make people aware of. 3) Case-Insensitive String Comparison When developers need to compare strings without regard to case or want to realize a map with case-insensitive string keys, they often employ String.toLowerCase() or String.toUpperCase() to create a normalized string before doing a simple String.equals(). Now, the to*Case() methods are overloaded: One takes no arguments and one take a Locale object. The gotcha with the arg-less methods is that their output depends on the default locale of the JVM but the default locale is out of control of the developer (see also [0], [1]). That means the string expected by the developer (who runs/tests his code in a JVM using locale xy) does not necessarily match the string seen by another user (that runs a JVM with locale ab). For example, the comparison info.equals(debugLevel.toLowerCase()) is likely to fail for systems with default locale Turkish. Since developers usually want to compare strings from the English language, they must use String.to*Case(Locale.ENGLISH) to get reliable results regardless of the end-user's default locale. Just to make the picture complete: String.to*Case() is locale-sensitive and context-aware. In contrast, Character.to*Case() and String.equalsIgnoreCase() (which relies on Character.to*Case()) are neither locale-sensitive nor context-aware [2]. For instance ΣΣ.toLowerCase(Locale.ENGLISH).equals(σσ) is false while ΣΣ.equalsIgnoreCase(σσ) is true (because the lower case form of ΣΣ is σς). Regards, Benjamin Bentmann [0] http://java.sun.com/javase/6/docs/api/java/lang/String.html#toLowerCase() [1] http://cafe.elharo.com/blogroll/turkish/ [2] http://java.sun.com/javase/6/docs/api/java/lang/Character.html#toLowerCase(char) - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
RE: Common Bugs
Just an observation, but would it not be a good idea to codify these heuristics in a plugin and add them to the plugins pom. For example, PMD or some other plugin might generate violations for these. - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
Re: Common Bugs
That sounds like a great idea :) - Brett On 15/01/2008, at 12:35 AM, Timothy Reilly wrote: Just an observation, but would it not be a good idea to codify these heuristics in a plugin and add them to the plugins pom. For example, PMD or some other plugin might generate violations for these. - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED] - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
Common Bugs
Dear developers, After some months of reporting issues for Maven, I have noticed common bugs in various plugins. For me it seems that many developers are either completely unaware of the problems (i.e. do not handle them at all) or are misunderstanding them (i.e. handle them insufficiently). Therefore, I would like to illustrate the source of those problems in more detail such that developers can start to prevent bugs right from the beginning rather than fixing them afterwards. 1) Relative Paths It is common practice for users of Maven to specify relative paths in the POM, not to mention that the Super POM does so, too. The intention is almost always to resolve such relative paths against the base directory of the current project. In other words, the paths target/classes and ${basedir}/target/classes should resolve to the same directory for a given POM. Unfortunately, the class java.io.File does not resolve relative paths against the project's base directory. As mentioned in its class javadoc [0], it resolves relative paths against the current working directory. In plain English: Unless a Maven component has complete control over the current working directory, any usage of java.io.File in combination with a relative path is a bug. At first glance, one might be tempted to argue that the project base directory is equal to the current working directory. However, this assumption is generally not true. Consider the following scenarios: a) Reactor Builds When a child module is build during a reactor build, the current working is usually the base directory of the parent project, not the base directory of the current module. That is the most common scenario where users are faced with the bug. b) Embedded Maven Invocations Other tools, most prominently IDEs, that run Maven under the hood may have set the current working directory to their installation folder or whatever they like. c) Maven Invocations using the -f switch While it is surely an uncommon use-case, the user is free to invoke Maven from an arbitrary working directory by specifying an absolute path like mvn -f /home/me/projects/demo/pom.xml In order to guarantee reliable builds, Maven and its plugins must manually resolve relative paths against the project's base directory. A simple idiom like the following should do just fine: File file = new File( path ); if ( !file.isAbsolute() ) { file = new File( project.getBasedir(), file ); } Many Maven plugins can get this resolution automatically if they declare their affected mojo parameters of type java.io.File instead of java.lang.String. This subtle difference in parameter types will trigger a feature that seems to be known as path translation, i.e. Maven itself will properly resolve relative paths when it pumps the XML configuration into a mojo. While it will not cure this problem completely, I suggest to review the MavenProject API such that it ensures the following invariant: All paths reported by means of a getter-like method are absolute. This basically requires to resolve all relative paths that are pushed in by a setter-like method. MavenProject.addCompileSourceRoot() is just one example for a public API method that currently blindly accepts a relative path, giving rise to later failures because a misbehaving component resolves this path not properly. 2) Resource Bundle Families Especially reporting plugins employ resource bundles to support internationalization. One language (usually English) is provided as the fallback/default language in the base resource bundle. Due to the lookup strategy performed by ResourceBundle.getBundle() [1], one must always provide a dedicated resource bundle for this default language, too. This bundle can be empty because it inherits the strings via the parent chain from the base bundle, but it must exist. Take the following example. A report mojo uses a resource bundle family with the base name mymojo-report and erroneously provides only the following bundles: - mymojo-report.properties (base bundle for English) - mymojo-report_de.properties (specific bundle for German) Further assume that the default locale of the current JVM is de. Now, when ResourceBundle.getBundle() is called to retrieve the bundle for locale en, it will try the following candidate bundles and use the first one found: - mymojo-report_en.properties (candidate for requested locale) - mymojo-report_de.properties (candidate for default locale) - mymojo-report.properties (base bundle, always last) Because mymojo-report_en.properties does not exist, the bundle mymojo-report_de.properties will be chosen instead of the base bundle which would have provided the requested English translation. Plugin developers can easily expose this bug when calling mvn site -Dlocales=xy,en where xy denotes some other language supported by the plugin. Specifying xy as the first locale will have the Maven Site Plugin change the JVM's default locale to xy which in turn causes
Re: Common Bugs
+1 on that. for embedded use in IDEs there's additional pitfalls. 1. use of System.getProperties() 2. java.lang.Runtime's shutdown hooks. Milos On Jan 13, 2008 10:21 AM, Benjamin Bentmann [EMAIL PROTECTED] wrote: Dear developers, After some months of reporting issues for Maven, I have noticed common bugs in various plugins. For me it seems that many developers are either completely unaware of the problems (i.e. do not handle them at all) or are misunderstanding them (i.e. handle them insufficiently). Therefore, I would like to illustrate the source of those problems in more detail such that developers can start to prevent bugs right from the beginning rather than fixing them afterwards. 1) Relative Paths It is common practice for users of Maven to specify relative paths in the POM, not to mention that the Super POM does so, too. The intention is almost always to resolve such relative paths against the base directory of the current project. In other words, the paths target/classes and ${basedir}/target/classes should resolve to the same directory for a given POM. Unfortunately, the class java.io.File does not resolve relative paths against the project's base directory. As mentioned in its class javadoc [0], it resolves relative paths against the current working directory. In plain English: Unless a Maven component has complete control over the current working directory, any usage of java.io.File in combination with a relative path is a bug. At first glance, one might be tempted to argue that the project base directory is equal to the current working directory. However, this assumption is generally not true. Consider the following scenarios: a) Reactor Builds When a child module is build during a reactor build, the current working is usually the base directory of the parent project, not the base directory of the current module. That is the most common scenario where users are faced with the bug. b) Embedded Maven Invocations Other tools, most prominently IDEs, that run Maven under the hood may have set the current working directory to their installation folder or whatever they like. c) Maven Invocations using the -f switch While it is surely an uncommon use-case, the user is free to invoke Maven from an arbitrary working directory by specifying an absolute path like mvn -f /home/me/projects/demo/pom.xml In order to guarantee reliable builds, Maven and its plugins must manually resolve relative paths against the project's base directory. A simple idiom like the following should do just fine: File file = new File( path ); if ( !file.isAbsolute() ) { file = new File( project.getBasedir(), file ); } Many Maven plugins can get this resolution automatically if they declare their affected mojo parameters of type java.io.File instead of java.lang.String. This subtle difference in parameter types will trigger a feature that seems to be known as path translation, i.e. Maven itself will properly resolve relative paths when it pumps the XML configuration into a mojo. While it will not cure this problem completely, I suggest to review the MavenProject API such that it ensures the following invariant: All paths reported by means of a getter-like method are absolute. This basically requires to resolve all relative paths that are pushed in by a setter-like method. MavenProject.addCompileSourceRoot() is just one example for a public API method that currently blindly accepts a relative path, giving rise to later failures because a misbehaving component resolves this path not properly. 2) Resource Bundle Families Especially reporting plugins employ resource bundles to support internationalization. One language (usually English) is provided as the fallback/default language in the base resource bundle. Due to the lookup strategy performed by ResourceBundle.getBundle() [1], one must always provide a dedicated resource bundle for this default language, too. This bundle can be empty because it inherits the strings via the parent chain from the base bundle, but it must exist. Take the following example. A report mojo uses a resource bundle family with the base name mymojo-report and erroneously provides only the following bundles: - mymojo-report.properties (base bundle for English) - mymojo-report_de.properties (specific bundle for German) Further assume that the default locale of the current JVM is de. Now, when ResourceBundle.getBundle() is called to retrieve the bundle for locale en, it will try the following candidate bundles and use the first one found: - mymojo-report_en.properties (candidate for requested locale) - mymojo-report_de.properties (candidate for default locale) - mymojo-report.properties (base bundle, always last) Because mymojo-report_en.properties does not exist, the bundle mymojo-report_de.properties will be chosen instead of the base bundle which would have provided