JiBX does not generate Java from XSD files with correct encoding
-----------------------------------------------------------------

                 Key: JIBX-434
                 URL: http://jira.codehaus.org/browse/JIBX-434
             Project: JiBX
          Issue Type: Bug
          Components: CodeGen
    Affects Versions: JiBX 1.2.2
         Environment: Windows Server 2008 R2
            Reporter: Christian Callsen


Per email conversation (see bottom for request to file bug here):
---
Hello there,

I have several XSD files with Danish national characters in them that I include 
in my own XSD file (here's one of them: 
http://rep.oio.dk/ebxml/xml/schemas/dkcc/2003/02/13/DKCC_CountryIdentificationCode.xsd).

When the JiBX CodeGen tool v1.2.2 runs on it (from maven via maven-jibx-plugin 
v1.2.2) the output file is not UTF-8 even though I've set 
project.build.outputEncoding to UTF-8. I've checked the source, and the 
SourceBuilder.java line 325 seems suspicious:
                 FileWriter fwrit = new FileWriter(file);

According to the maven plugin guide, one needs to be careful around FileWriter 
instantiation and file encodings. Would it be possible either to respect the 
maven-requested encoding or to add a flag that controls the encoding written to 
the file?

I've tried setting file.encoding and project.build.sourceEncoding in pom.xml, 
and supplied -Dfile.encoding=UTF-8. No luck. I've tried turning off javadoc and 
annotations via:
              <show-schema>false</show-schema>
              <delete-annotations>false</delete-annotations>
in pom.xml. Still no luck.

Any pointers/workarounds?
---
Hi Christian,

Don is handling all the Maven issues for JiBX, so hopefully he can comment on 
this.

Sorry for the delay in responding to this. In general, it's best to ask this 
type of question on the JiBX users list (or enter a Jira bug report).

  - Dennis
---
Christian,

I tried generating code for this schema and all the special characters look 
fine. I have attached the generated java source files.

You may want to check your default java encoding. Here is a great article:
http://stackoverflow.com/questions/361975/setting-the-default-java-character-encoding

If you can supply a schema definition that is not generating code correctly, 
that would be a great help.

Don
---
Hey Don,

My problem could be that I'm running this under Windows Server 2008, in the 
CP1252 locale. Which locale and OS did you run the test on? I'm attaching a ZIP 
of the generated sources I get. Notice the generated class 
_CountryIdentificationSchemeType.java: it has comments that contain Danish 
national characters, but according to Notepad++ the file is not in UTF-8 
encoding :-( (it says ANSI).
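
To illustrate what "ANSI" versus UTF-8 means at the byte level, here is a 
minimal sketch. The class name EncodingDemo and the sample string are made up 
for illustration, and it assumes the running JVM provides the windows-1252 
charset. It encodes the same three Danish letters under both charsets:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of JiBX.
final class EncodingDemo {

    // Encodes the text with the given charset and returns the bytes as
    // space-separated upper-case hex.
    static String hex(String text, Charset charset) {
        StringBuilder sb = new StringBuilder();
        for (byte b : text.getBytes(charset)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Unicode escapes keep this source file itself encoding-independent.
        String danish = "\u00E6\u00F8\u00E5";
        // Assumption: windows-1252 is available in this JVM.
        Charset cp1252 = Charset.forName("windows-1252");
        System.out.println("windows-1252: " + hex(danish, cp1252));
        System.out.println("UTF-8:        " + hex(danish, StandardCharsets.UTF_8));
    }
}
```

Each letter is a single byte in windows-1252 (E6 F8 E5) but two bytes in UTF-8 
(C3 A6 C3 B8 C3 A5), which is why a compiler or editor expecting UTF-8 will 
misread a file written with the Windows default.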

Below's a snippet from pom.xml, showing you how I invoke the maven-jibx-plugin. 
    <plugin>
      <groupId>org.jibx</groupId>
      <artifactId>maven-jibx-plugin</artifactId>
      <version>${jibx.plugin.version}</version>
      <configuration>
        <directory>target/generated-sources/src/main/java/</directory>
        <includes>
          <include>binding.xml</include>
        </includes>
        <load>true</load>
        <verbose>false</verbose>
      </configuration>
      <executions>
        <execution>
          <id>generate-java-code-from-schema</id>
          <phase>generate-sources</phase>
          <goals>
            <goal>schema-codegen</goal>
          </goals>
          <configuration>
            <targetDirectory>${generated.sources.directory}</targetDirectory>
            <directory>${webapp.directory}/WEB-INF/xsd</directory>
            <verbose>true</verbose>
            <includes>
              <include>file.xsd</include>
            </includes>
            <options>
              <package>test</package>
              <show-schema>false</show-schema>
              <delete-annotations>false</delete-annotations>
            </options>
          </configuration>
        </execution>
        <execution>
          <id>compile-binding</id>
          <phase>process-classes</phase>
          <goals>
            <goal>bind</goal>
          </goals>
        </execution>
      </executions>
    </plugin>

The "file.xsd" file is a file that includes these files (can't show you the 
entire file, but here are the includes):

    <xsd:import
        namespace="http://rep.oio.dk/cpr.dk/xml/schemas/core/2005/03/18/"
        schemaLocation="http://rep.oio.dk/cpr.dk/xml/schemas/core/2005/03/18/CPR_PersonCivilRegistrationIdentifier.xsd"/>
    <xsd:import
        namespace="http://rep.oio.dk/itst.dk/xml/schemas/2006/01/17/"
        schemaLocation="http://rep.oio.dk/itst.dk/xml/schemas/2006/01/17/ITST_PersonNameStructure.xsd"/>
    <xsd:import
        namespace="http://rep.oio.dk/xkom.dk/xml/schemas/2006/01/06/"
        schemaLocation="http://rep.oio.dk/xkom.dk/xml/schemas/2006/01/06/XKOM_AddressPostal.xsd"/>
    <xsd:import
        namespace="http://rep.oio.dk/ebxml/xml/schemas/dkcc/2003/02/13/"
        schemaLocation="http://rep.oio.dk/ebxml/xml/schemas/dkcc/2003/02/13/DKCC_BirthDate.xsd"/>
    <xsd:import
        namespace="http://rep.oio.dk/ebxml/xml/schemas/dkcc/2003/02/13/"
        schemaLocation="http://rep.oio.dk/ebxml/xml/schemas/dkcc/2003/02/13/DKCC_CountryIdentificationCode.xsd"/>

I suppose we could use the fix in the stackoverflow article (great website 
btw), but I'd rather be able to say in my pom.xml that I want output encoding 
UTF-8 for generated files. For some odd reason - possibly the FileWriter issue 
I pointed out earlier - that does not seem to work.

I've done a different workaround - using native2ascii, a copy step and a 
cleanup step in pom.xml - which turns the entire thing into \uabcd notation. It 
works, but makes my pom quite a bit longer and unwieldy. The maven folks 
suggest not using a FileWriter here: 
http://maven.apache.org/plugin-developers/common-bugs.html and point to issues 
for review 
http://docs.codehaus.org/display/MAVENUSER/POM+Element+for+Source+File+Encoding

There's a guy suggesting a better way here: 
http://www.malcolmhardie.com/weblogs/angus/2004/10/23/java-filewriter-xml-and-utf-8/

Notice in his example that the encoding is supplied, so that the maven plugin 
could easily respect the project's source file encoding (if set).
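
As a rough illustration of that idea, here is a minimal sketch of opening a 
writer with an explicit charset instead of relying on new FileWriter(file), 
which uses the platform default. The class EncodedSourceWriter and its open 
method are hypothetical names chosen for this example; this is not the actual 
SourceBuilder code, and how a charset would reach it from the plugin is left 
open:

```java
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

// Hypothetical helper, not part of JiBX.
final class EncodedSourceWriter {

    // Opens a buffered writer that encodes characters with the given charset,
    // independent of the platform default (file.encoding).
    static Writer open(File file, Charset charset) throws IOException {
        return new BufferedWriter(
                new OutputStreamWriter(new FileOutputStream(file), charset));
    }

    public static void main(String[] args) throws IOException {
        File out = File.createTempFile("Generated", ".java");
        try (Writer writer = open(out, StandardCharsets.UTF_8)) {
            // Unicode escapes keep this source file encoding-independent.
            writer.write("// Danish letters: \u00E6\u00F8\u00E5\n");
        }
        // Read the bytes back as UTF-8 to confirm the round trip.
        byte[] bytes = Files.readAllBytes(out.toPath());
        System.out.print(new String(bytes, StandardCharsets.UTF_8));
        if (!out.delete()) {
            out.deleteOnExit();
        }
    }
}
```

Taking the charset as a parameter would let a caller pass either a fixed UTF-8 
or whatever encoding the build requests.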

Christian
---
Christian,

I use SUSE Linux, where UTF-8 must be the default.

You are correct, there should be an option to output the code in UTF-8.

You need to file an enhancement request for JiBX. Unfortunately, I'm not the one 
responsible for the SourceBuilder.java file; I handle the maven-jibx-plugin. 
File a bug at: http://jira.codehaus.org/secure/BrowseProject.jspa?id=10410. 
File it under 'CodeGen' and assign it to Dennis. Say you need a way to specify 
UTF-8 encoding on the source code output. Dennis will have to add a parameter 
to cause UTF-8 output; I can then pass this automatically from the maven plugin 
using the <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding> 
property.

Please include the links you supplied in this email, especially the one that 
shows how to create an OutputStream with UTF-8 encoding. As you probably know, 
in open source, you will usually get a faster response if you write the correct 
code, test it, and submit it with your request as a patch.

Thanks,

Don
---
So is the issue just with the encoding of the generated source code? That would 
be a problem in the actual generation, rather than anything to do with maven.

Right now there's no way to pass in the character encoding for the generated 
code. It looks like you can tell javac the character encoding on the command 
line, as "javac -encoding utf8 ...". If I set the code generation to always 
output UTF-8, will that work for you?

  - Dennis
---
Hey Dennis,

Yes, always using UTF-8 will work for us, as we're using source encoding = 
UTF-8. 

Btw: JiBX is a serious time saver. I've had to make a few adjustments to the 
WSDL/XSD I'm using, and every time it turned out all I had to do was adjust the 
bindings file. Slightly.

Excellent tool!

Best Regards

Christian
---

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: 
http://jira.codehaus.org/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

_______________________________________________
jibx-devs mailing list
jibx-devs@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/jibx-devs
