[
http://jira.codehaus.org/browse/DOXIA-278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Herve Boutemy closed DOXIA-278.
-------------------------------
Assignee: Herve Boutemy
Resolution: Not A Bug
Auto-detecting an encoding isn't a bullet-proof feature: nobody can guarantee
to detect the real encoding of a byte stream. The best that can be done is a
guess, without any guarantee.
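To illustrate why detection can only be a guess, here is a minimal Java sketch. The class {{EncodingGuessDemo}} and its helper {{isValid}} are hypothetical names made up for this comment, not Doxia code. It checks whether a byte sequence decodes without error in a given charset: the UTF-8 bytes of the copyright symbol are also perfectly valid ISO-8859-1, so validity alone cannot tell the two apart.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of Doxia.
class EncodingGuessDemo {

    /** Returns true if the bytes decode in the given charset without any error. */
    static boolean isValid(byte[] bytes, Charset charset) {
        try {
            charset.newDecoder()
                   .onMalformedInput(CodingErrorAction.REPORT)
                   .onUnmappableCharacter(CodingErrorAction.REPORT)
                   .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // The copyright symbol (U+00A9) encoded in UTF-8 is the two bytes C2 A9.
        byte[] utf8Copyright = "\u00A9".getBytes(StandardCharsets.UTF_8);

        // Both charsets accept these bytes, so a detector has to guess.
        System.out.println(isValid(utf8Copyright, StandardCharsets.UTF_8));
        System.out.println(isValid(utf8Copyright, StandardCharsets.ISO_8859_1));

        // Decoded as ISO-8859-1, the same bytes become two characters:
        // U+00C2 followed by U+00A9, i.e. an extra character before the symbol.
        String misread = new String(utf8Copyright, StandardCharsets.ISO_8859_1);
        System.out.println(misread.length());
    }
}
```

A detector can sometimes rule UTF-8 out (a lone byte A9 is malformed UTF-8), but it can never rule ISO-8859-1 out this way, since every byte sequence is valid in that charset.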
XML encoding selection is possible because the encoding is written into the XML
document in a [precise
manner|http://www.w3.org/TR/2004/REC-xml-20040204/#sec-guessing]: here, we have
automatic encoding *selection*, not detection. If the effective encoding of the
stream is different from the encoding declared in
{{<?xml version="..." encoding="..."?>}}, there will be broken characters
because the parser uses what it is told in the header.
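A small Java sketch of that behaviour, using the standard StAX API ({{javax.xml.stream}}). The class {{XmlDeclaredEncodingDemo}} and the method {{rootText}} are hypothetical names for this comment, and the result assumes the parser honours the declared encoding, as the JDK's bundled one is expected to. The same document is serialized twice, once in the declared encoding and once in a different one.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical demo class, not part of Doxia.
class XmlDeclaredEncodingDemo {

    static final String DOC =
        "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><doc>\u00A9</doc>";

    /** Parses the raw bytes and returns the character data of the document. */
    static String rootText(byte[] xmlBytes) {
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new ByteArrayInputStream(xmlBytes));
            StringBuilder text = new StringBuilder();
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.CHARACTERS) {
                    text.append(reader.getText());
                }
            }
            reader.close();
            return text.toString();
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Bytes really are ISO-8859-1, as declared: the symbol survives.
        String ok = rootText(DOC.getBytes(StandardCharsets.ISO_8859_1));
        // Bytes are UTF-8 but the header still says ISO-8859-1:
        // the parser trusts the header and yields an extra character.
        String broken = rootText(DOC.getBytes(StandardCharsets.UTF_8));
        System.out.println(ok.length() + " vs " + broken.length());
    }
}
```

The parser never inspects the content to second-guess the header, which is why this is selection and not detection.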
FYI, there has already been a [long discussion on the Maven dev
list|http://www.nabble.com/-VOTE--POM-Element-for-Source-File-Encoding-to16515820.html#a16558356]
about this.
The APT format does not provide such a convention: it's pure text, without
encoding information.
If a convention similar to the XML convention were added, for example:
bq. an APT file starting with {{~~ encoding="xxx"}} should be considered as
being written in the specified encoding
then we could implement a text reader using it.
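A rough sketch of what such a reader could look like in Java. Everything here is an assumption for discussion: the {{~~ encoding="xxx"}} convention does not exist in APT today, and {{AptEncodingSniffer}}, {{declaredCharset}} and {{decode}} are hypothetical names, not Doxia API. The first line is read as ASCII (safe for any ASCII-compatible encoding), matched against the proposed comment, and the caller-supplied fallback is used when no usable declaration is found.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of the proposed convention, not part of Doxia.
class AptEncodingSniffer {

    // Matches the proposed first-line comment: ~~ encoding="xxx"
    private static final Pattern DECLARATION =
        Pattern.compile("^~~\\s*encoding=\"([^\"]+)\"");

    /** Returns the charset named on the first line, or the fallback if none is usable. */
    static Charset declaredCharset(byte[] content, Charset fallback) {
        int end = 0;
        while (end < content.length && content[end] != '\n' && content[end] != '\r') {
            end++;
        }
        // The declaration itself is plain ASCII, so decode only that line as ASCII.
        String firstLine = new String(content, 0, end, StandardCharsets.US_ASCII);
        Matcher matcher = DECLARATION.matcher(firstLine);
        if (matcher.find()) {
            try {
                return Charset.forName(matcher.group(1));
            } catch (IllegalArgumentException e) {
                // Illegal or unsupported charset name: fall through to the fallback.
            }
        }
        return fallback;
    }

    /** Decodes the whole file with the declared charset, or the fallback. */
    static String decode(byte[] content, Charset fallback) {
        return new String(content, declaredCharset(content, fallback));
    }

    public static void main(String[] args) {
        byte[] apt = "~~ encoding=\"UTF-8\"\nCopyright \u00A9 2009\n"
            .getBytes(StandardCharsets.UTF_8);
        System.out.println(declaredCharset(apt, StandardCharsets.ISO_8859_1));
    }
}
```

This only works for ASCII-compatible encodings; something like UTF-16 would need a byte order mark check first, as XML parsers do.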
I don't know whether such a comment at the start of an APT file is compatible
with title headers, though...
Last point: the user complaining about encoding problems that you pointed to
was hitting a real bug, from when encoding wasn't properly handled in Doxia and
maven-site-plugin. This is fixed now: see MSITE-314 and [POM Element for Source File
Encoding|http://docs.codehaus.org/display/MAVENUSER/POM+Element+for+Source+File+Encoding].
> Character encoding autodetection fails for APT source files
> -----------------------------------------------------------
>
> Key: DOXIA-278
> URL: http://jira.codehaus.org/browse/DOXIA-278
> Project: Maven Doxia
> Issue Type: Bug
> Components: Module - Apt
> Affects Versions: 1.0-alpha-11
> Environment: Mac OS X 10.5.6, Java 1.6.0_07
> Reporter: Trevor Harmon
> Assignee: Herve Boutemy
> Attachments: HelloWorld.zip
>
>
> Doxia unnecessarily forces all APT source files to be encoded in ISO-8859-1.
> Files encoded in UTF-8 can have garbage characters as a result. Doxia should
> be able to autodetect the encoding of the APT file to prevent this problem,
> as it already does for XML (see DOXIA-133).
> A test case is attached. It includes two APT source files, one encoded in
> ISO-8859-1 and another encoded in UTF-8. Both contain the copyright symbol.
> To reproduce the problem, simply run "mvn site" on the project and open the
> target/site/test-utf8.html and target/site/test-iso-8859-1.html. The file
> encoded with ISO-8859-1 should display the copyright symbol correctly, while
> the one encoded with UTF-8 contains a garbage character immediately before
> the symbol.