JiBX does not generate Java from XSD files with correct encoding
-----------------------------------------------------------------
Key: JIBX-434
URL: http://jira.codehaus.org/browse/JIBX-434
Project: JiBX
Issue Type: Bug
Components: CodeGen
Affects Versions: JiBX 1.2.2
Environment: Windows Server 2008 R2
Reporter: Christian Callsen

Per email conversation (see bottom for request to file bug here):

---

Hello there,

I have several XSD files containing Danish national characters that I include in my own XSD file (here's one of them: http://rep.oio.dk/ebxml/xml/schemas/dkcc/2003/02/13/DKCC_CountryIdentificationCode.xsd). When the JiBX CodeGen tool v1.2.2 runs on it (from Maven via maven-jibx-plugin v1.2.2), the output file is not UTF-8, even though I've set project.build.outputEncoding to UTF-8.

I've checked the source, and SourceBuilder.java line 325 seems suspicious:

    FileWriter fwrit = new FileWriter(file);

According to the Maven plugin guide, one needs to be careful around FileWriter instantiation and file encodings. Would it be possible to respect the Maven-requested encoding, or to add a flag to control the encoding written to the file?

I've tried setting file.encoding and project.build.sourceEncoding in pom.xml, and supplied -Dfile.encoding=UTF-8. No luck. I've tried turning off javadoc and annotations via:

    <show-schema>false</show-schema>
    <delete-annotations>false</delete-annotations>

in pom.xml. Still no luck. Any pointers/workarounds?

---

Hi Christian,

Don is handling all the Maven issues for JiBX, so hopefully he can comment on this. Sorry for the delay in responding to this. In general, it's best to ask this type of question on the JiBX users list (or enter a Jira bug report).

- Dennis

---

Christian,

I tried generating code for this schema and all the special characters look fine. I have attached the generated Java source files. You may want to check your default Java encoding.
Here is a great article: http://stackoverflow.com/questions/361975/setting-the-default-java-character-encoding

If you can supply a schema definition that is not generating code correctly, that would be a great help.

Don

---

Hey Don,

My problem could be that I'm running this under Windows Server 2008, in the CP1252 locale. Which locale and OS did you run the test on?

I'm attaching a ZIP of the generated sources I get. Notice the generated class _CountryIdentificationSchemeType.java: there are comments that contain Danish national characters, but the file's not in UTF-8 encoding :-( according to Notepad++ (it says ANSI).

Below's a snippet from pom.xml, showing you how I invoke the maven-jibx-plugin.

    <plugin>
      <groupId>org.jibx</groupId>
      <artifactId>maven-jibx-plugin</artifactId>
      <version>${jibx.plugin.version}</version>
      <configuration>
        <directory>target/generated-sources/src/main/java/</directory>
        <includes>
          <include>binding.xml</include>
        </includes>
        <load>true</load>
        <verbose>false</verbose>
      </configuration>
      <executions>
        <execution>
          <id>generate-java-code-from-schema</id>
          <phase>generate-sources</phase>
          <goals>
            <goal>schema-codegen</goal>
          </goals>
          <configuration>
            <targetDirectory>${generated.sources.directory}</targetDirectory>
            <directory>${webapp.directory}/WEB-INF/xsd</directory>
            <verbose>true</verbose>
            <includes>
              <include>file.xsd</include>
            </includes>
            <options>
              <package>test</package>
              <show-schema>false</show-schema>
              <delete-annotations>false</delete-annotations>
            </options>
          </configuration>
        </execution>
        <execution>
          <id>compile-binding</id>
          <phase>process-classes</phase>
          <goals>
            <goal>bind</goal>
          </goals>
        </execution>
      </executions>
    </plugin>

The "file.xsd" file is a file that includes these files (can't show you the entire file, but here are the includes):

    <xsd:import namespace="http://rep.oio.dk/cpr.dk/xml/schemas/core/2005/03/18/"
                schemaLocation="http://rep.oio.dk/cpr.dk/xml/schemas/core/2005/03/18/CPR_PersonCivilRegistrationIdentifier.xsd"/>
    <xsd:import namespace="http://rep.oio.dk/itst.dk/xml/schemas/2006/01/17/"
                schemaLocation="http://rep.oio.dk/itst.dk/xml/schemas/2006/01/17/ITST_PersonNameStructure.xsd"/>
    <xsd:import namespace="http://rep.oio.dk/xkom.dk/xml/schemas/2006/01/06/"
                schemaLocation="http://rep.oio.dk/xkom.dk/xml/schemas/2006/01/06/XKOM_AddressPostal.xsd"/>
    <xsd:import namespace="http://rep.oio.dk/ebxml/xml/schemas/dkcc/2003/02/13/"
                schemaLocation="http://rep.oio.dk/ebxml/xml/schemas/dkcc/2003/02/13/DKCC_BirthDate.xsd"/>
    <xsd:import namespace="http://rep.oio.dk/ebxml/xml/schemas/dkcc/2003/02/13/"
                schemaLocation="http://rep.oio.dk/ebxml/xml/schemas/dkcc/2003/02/13/DKCC_CountryIdentificationCode.xsd"/>

I suppose we could use the fix in the stackoverflow article (great website btw), but I'd rather be able to say in my pom.xml that I want output encoding UTF8 for generated files. For some odd reason, possibly the idea I pointed out below, that does not seem to work.

I've done a different workaround, using native2ascii, a copy step and a cleanup step in pom.xml, which turns the entire thing into \uabcd notation. It works, but makes my pom quite a bit longer and unwieldy.

The Maven folks suggest not using a FileWriter here:
http://maven.apache.org/plugin-developers/common-bugs.html
and point to issues for review:
http://docs.codehaus.org/display/MAVENUSER/POM+Element+for+Source+File+Encoding

There's a guy suggesting a better way here:
http://www.malcolmhardie.com/weblogs/angus/2004/10/23/java-filewriter-xml-and-utf-8/

Notice in his example that the encoding is supplied, so the Maven plugin could easily respect the project's source file encoding (if set).

Christian

---

Christian,

I use SUSE Linux. UTF-8 must be the default there.

You are correct, there should be an option to output the code in UTF-8. You need to file an enhancement request for JiBX. Unfortunately, I'm not the one responsible for the SourceBuilder.java file; I handle the maven-jibx-plugin.
File a bug at: http://jira.codehaus.org/secure/BrowseProject.jspa?id=10410. File it under 'CodeGen' and assign it to Dennis. Say you need a way to specify UTF-8 encoding on the source code output.

Dennis will have to add a parameter to cause UTF-8 output. I can then pass this automatically from the Maven plugin using the <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding> property.

Please include the links you supplied in this email, especially the one that shows how to create an OutputStream with UTF encoding.

As you probably know, in open source you will usually get a faster response if you write the correct code, test it, and submit it with your request as a patch.

Thanks,
Don

---

So is the issue just with the encoding of the generated source code? That would be a problem in the actual generation, rather than anything to do with Maven. Right now there's no way to pass in the character encoding for the generated code. It looks like you can tell javac the character encoding on the command line, as "javac -encoding utf8 ...". If I set the code generation to always output UTF-8, will that work for you?

- Dennis

---

Hey Dennis,

Yes, always using UTF-8 will work for us, as we're using source encoding = UTF-8.

Btw: JiBX is a serious time saver. I've had to make a few adjustments to the WSDL/XSD I'm using, and every time it turned out all I had to do was adjust the bindings file. Slightly. Excellent tool!

Best Regards
Christian

---

--
This message is automatically generated by JIRA.
- If you think it was sent incorrectly contact one of the administrators: http://jira.codehaus.org/secure/Administrators.jspa
- For more information on JIRA, see: http://www.atlassian.com/software/jira

_______________________________________________
jibx-devs mailing list
jibx-devs@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/jibx-devs
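
The FileWriter concern raised in this thread can be illustrated with a minimal sketch. This is not the actual JiBX change: the class name Utf8SourceWriter and the method writeUtf8 are hypothetical, and the code assumes Java 7 or later. The point is only that new FileWriter(file) encodes with the JVM's default charset (Cp1252 on the reporter's Windows setup), whereas wrapping an OutputStream in an OutputStreamWriter with an explicit charset gives the same bytes on every platform.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of JiBX.
public class Utf8SourceWriter {

    /** Writes text to the target file as UTF-8, regardless of the default charset. */
    public static void writeUtf8(Path target, String text) throws IOException {
        try (OutputStream out = Files.newOutputStream(target);
             Writer writer = new OutputStreamWriter(out, StandardCharsets.UTF_8)) {
            writer.write(text);
        }
    }

    public static void main(String[] args) throws IOException {
        // What a FileWriter created without an explicit charset would use.
        System.out.println("Default charset: " + Charset.defaultCharset());

        Path target = Files.createTempFile("Generated", ".java");
        // Danish letters ae, oe, aa written as Unicode escapes so this
        // source file itself does not depend on any encoding.
        String comment = "// Danish letters: \u00e6 \u00f8 \u00e5\n";
        writeUtf8(target, comment);

        // Read the raw bytes back and decode them explicitly as UTF-8.
        byte[] bytes = Files.readAllBytes(target);
        String decoded = new String(bytes, StandardCharsets.UTF_8);
        System.out.println("Round trip intact: " + decoded.equals(comment));

        Files.delete(target);
    }
}
```

Taking the charset as a parameter instead of hard-coding UTF-8 would let a caller such as the Maven plugin pass through the project's configured source encoding, which is the option requested above.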