https://bz.apache.org/bugzilla/show_bug.cgi?id=63813
Bug ID: 63813
Summary: Special character (greater than or equal) is converted to '(' in Word documents
Product: POI
Version: unspecified
Hardware: PC
Status: NEW
Severity: normal
Priority: P2
Component: HWPF
Assignee: [email protected]
Reporter: [email protected]
Target Milestone: ---
Version: POI 4.1.0
I have documents (either .doc or .docx) that contain the special character
'greater than or equal' (≥). When I convert them with WordToHtmlConverter,
those characters come out as '('.
I tried this with the latest Apache POI release, 4.1.0.
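The report does not say how the ≥ was entered. One plausible explanation, which this report does not confirm, is that it was inserted from a symbol font such as Symbol. In the binary .doc format such a character appears to be stored in the text as a '(' placeholder, with the real character code kept in the run's formatting. POI's HWPF `CharacterRun` seems to expose this through `isSymbol()` and `getSymbolCharacter()`, but verify that against the Javadoc of your POI version. Even with the code in hand, it still has to be translated from the Symbol font's legacy encoding to Unicode. A minimal sketch of that translation follows; `SymbolFontMapper` and `toUnicode` are hypothetical names, and the table deliberately covers only a few math signs.

```java
// Sketch: translate a Symbol-font character code into Unicode.
// Assumption: the code comes from a symbol run (e.g. via HWPF's CharacterRun);
// only a handful of math signs are covered, so extend the table as needed.
final class SymbolFontMapper {

    private SymbolFontMapper() {
    }

    /**
     * Maps a character code in the Symbol font's legacy encoding to Unicode.
     * Word commonly stores symbol-font codes in the private-use range
     * U+F000..U+F0FF, so that offset is removed before the lookup.
     * Unknown codes are returned unchanged.
     */
    static char toUnicode(char symbolCode) {
        int code = symbolCode;
        if (code >= 0xF000 && code <= 0xF0FF) {
            code -= 0xF000;
        }
        switch (code) {
            case 0xA3: return '\u2264'; // less-than or equal
            case 0xB3: return '\u2265'; // greater-than or equal
            case 0xB9: return '\u2260'; // not equal
            case 0xB1: return '\u00B1'; // plus-minus
            case 0xB0: return '\u00B0'; // degree
            case 0xA5: return '\u221E'; // infinity
            default:   return symbolCode;
        }
    }
}
```

For example, both `toUnicode('\uF0B3')` and `toUnicode('\u00B3')` return U+2265 (≥). Codes missing from the table pass through unchanged, so real documents using other symbols would need a fuller table.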
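My Java code is:

import java.io.ByteArrayOutputStream;
import java.io.FileInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.apache.poi.hwpf.HWPFDocumentCore;
import org.apache.poi.hwpf.converter.WordToHtmlConverter;
import org.apache.poi.hwpf.converter.WordToHtmlUtils;
import org.w3c.dom.Document;

public class TestWordtoHtmlConverter {
    // Exceptions are propagated instead of being silently swallowed
    public static void main(String[] args) throws Exception {
        // Load the Word document given on the command line; the stream is closed automatically
        try (InputStream in = new FileInputStream(args[0])) {
            HWPFDocumentCore wordDocument = WordToHtmlUtils.loadDoc(in);

            // Convert the Word document into an HTML DOM
            WordToHtmlConverter wordToHtmlConverter = new WordToHtmlConverter(
                    DocumentBuilderFactory.newInstance().newDocumentBuilder()
                            .newDocument());
            wordToHtmlConverter.processDocument(wordDocument);
            Document htmlDocument = wordToHtmlConverter.getDocument();

            // Serialize the DOM as indented UTF-8 HTML
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            Transformer serializer = TransformerFactory.newInstance().newTransformer();
            serializer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
            serializer.setOutputProperty(OutputKeys.INDENT, "yes");
            serializer.setOutputProperty(OutputKeys.METHOD, "html");
            serializer.transform(new DOMSource(htmlDocument), new StreamResult(out));

            // Decode with the charset the serializer used, not the platform default
            System.out.println(new String(out.toByteArray(), StandardCharsets.UTF_8));
        }
    }
}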
Is there any way I can correctly identify these symbols?
In the sample document, I am interested in getting the 'bad one'.
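On the question of identifying these symbols: one possible approach, assuming the '(' really is a symbol placeholder, is to treat runs flagged as symbols differently from ordinary text when building the output. This is not POI code. `SymbolRunRenderer`, `Run` and `render` are hypothetical names, and `Run` only stands in for HWPF's `CharacterRun`, whose `isSymbol()` and `getSymbolCharacter()` accessors would presumably supply the flag and the code (check your POI version). The font-to-Unicode translation is passed in as a function so the sketch does not commit to any particular mapping table.

```java
import java.util.List;
import java.util.function.UnaryOperator;

// Sketch: build the visible text from a sequence of runs, replacing symbol
// placeholders instead of emitting the literal '('.
// Run is a hypothetical stand-in for HWPF's CharacterRun.
final class SymbolRunRenderer {

    /** Hypothetical minimal run: its text, and the symbol code if it is a symbol run. */
    static final class Run {
        final String text;
        final boolean symbol;
        final char symbolCode;

        Run(String text, boolean symbol, char symbolCode) {
            this.text = text;
            this.symbol = symbol;
            this.symbolCode = symbolCode;
        }
    }

    private SymbolRunRenderer() {
    }

    /** Concatenates run text; symbol runs are translated with the supplied mapping. */
    static String render(List<Run> runs, UnaryOperator<Character> symbolToUnicode) {
        StringBuilder sb = new StringBuilder();
        for (Run run : runs) {
            if (run.symbol) {
                // The stored text is only a placeholder, so use the symbol code instead
                sb.append(symbolToUnicode.apply(run.symbolCode).charValue());
            } else {
                sb.append(run.text);
            }
        }
        return sb.toString();
    }
}
```

Keeping the mapping outside the renderer means the same loop works whether the codes come from the Symbol font or from another symbol font with its own table.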
Thanks