https://bz.apache.org/bugzilla/show_bug.cgi?id=60975

            Bug ID: 60975
           Summary: Error converting doc with excel correspondence to html
           Product: POI
           Version: 3.16-dev
          Hardware: PC
            Status: NEW
          Severity: critical
          Priority: P2
         Component: HWPF
          Assignee: dev@poi.apache.org
          Reporter: ricardo.martin.aguirre.sanc...@everis.com
  Target Milestone: ---

Hi,

I am trying to convert a .doc document to HTML. The document was produced in
Word with the "correspondence" (mail merge) option, which pulls values from an
Excel table. The merged values display correctly in Word, but internally Word
stores each one as MERGEFIELD {FIELD} VALUE.

The problem is that if I add an ENTER inside one of these merged words or
sentences, converting the document to HTML with
wordToHtmlConverter.processDocument(doc) duplicates the words that come after
the ENTER.
Example:
In the .doc document:
Phrase brought from
Excel

After the processDocument method:
Phrase brought from excel
Excel
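
For reference, a minimal sketch of how such a merged value is laid out in the
document text, assuming the usual Word binary field marks (0x13 begin, 0x14
separator, 0x15 end) and a paragraph mark (\r) inside the field result. The
field name Phrase and the class FieldTextSketch are made up for illustration.
This is not POI code and does not claim to show where the converter goes wrong.
It only shows that the visible text should contain each word once.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class FieldTextSketch {
    // Field marks in Word's binary text stream (assumed layout, see above).
    static final char FIELD_BEGIN = '\u0013';
    static final char FIELD_SEPARATOR = '\u0014';
    static final char FIELD_END = '\u0015';

    /** Keeps field results and plain text, drops field marks and instructions. */
    static String visibleText(String raw) {
        StringBuilder out = new StringBuilder();
        // One entry per open field: true while still inside its instruction part.
        Deque<Boolean> open = new ArrayDeque<>();
        int hidden = 0; // open fields that are still in their instruction part
        for (char c : raw.toCharArray()) {
            if (c == FIELD_BEGIN) {
                open.push(true);
                hidden++;
            } else if (c == FIELD_SEPARATOR) {
                if (!open.isEmpty() && open.peek()) {
                    open.pop();
                    open.push(false);
                    hidden--;
                }
            } else if (c == FIELD_END) {
                if (!open.isEmpty() && open.pop()) {
                    hidden--;
                }
            } else if (hidden == 0) {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Hypothetical raw text of one merged value containing an ENTER (\r).
        String raw = "\u0013 MERGEFIELD Phrase \u0014Phrase brought from\rExcel\u0015";
        // Print the visible text, one paragraph per line.
        System.out.println(visibleText(raw).replace('\r', '\n'));
    }
}
```

In this sketch the text after the ENTER belongs to the same field result as
the text before it, which is the situation described in the report.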

The method is processDocument in AbstractWordConverter (package
org.apache.poi.hwpf.converter), from poi-scratchpad-3.8-beta4.jar.

To rule out that the problem had already been fixed in a later release, I
upgraded one version at a time up to the latest, 3.16, but the bug persists.

My code:

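import java.io.FileInputStream;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.converter.WordToHtmlConverter;
import org.apache.poi.hwpf.extractor.WordExtractor;
import org.w3c.dom.Document;

            // docFile is the java.io.File of the mail-merged .doc
            FileInputStream finStream = new FileInputStream(docFile.getAbsolutePath());
            HWPFDocument doc = new HWPFDocument(finStream);
            WordExtractor wordExtract = new WordExtractor(doc); // not used by the conversion

            // Convert the Word document into a DOM tree holding the HTML
            Document newDocument = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            WordToHtmlConverter wordToHtmlConverter = new WordToHtmlConverter(newDocument);
            wordToHtmlConverter.processDocument(doc);

            // Serialize the DOM tree to an HTML string
            StringWriter stringWriter = new StringWriter();
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.INDENT, "yes");
            transformer.setOutputProperty(OutputKeys.ENCODING, "utf-8");
            transformer.setOutputProperty(OutputKeys.METHOD, "html");
            transformer.transform(new DOMSource(wordToHtmlConverter.getDocument()),
                    new StreamResult(stringWriter));

            String html = stringWriter.toString();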
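
A rough way to check the resulting html string for the duplication, sketched
under assumptions: the helper names textOf and countOccurrences are
hypothetical, tags are stripped with a naive regex, and the sample markup in
main is a hand-written stand-in shaped like the output described above, not
actual converter output.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DuplicateCheckSketch {
    /** Strips tags with a naive regex and collapses whitespace. */
    static String textOf(String html) {
        return html.replaceAll("<[^>]*>", " ").replaceAll("\\s+", " ").trim();
    }

    /** Counts case-insensitive occurrences of phrase in text. */
    static int countOccurrences(String text, String phrase) {
        Matcher m = Pattern.compile(Pattern.quote(phrase), Pattern.CASE_INSENSITIVE)
                .matcher(text);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        // Hand-written stand-in for the duplicated output.
        String html = "<html><body><p>Phrase brought from excel</p><p>Excel</p></body></html>";
        int count = countOccurrences(textOf(html), "Excel");
        // The source document contains the word once, so more than one is a duplicate.
        System.out.println(count > 1 ? "duplicated" : "ok");
    }
}
```

With the real html value in place of the stand-in, a count above one for a word
that appears once in the .doc would confirm the duplication.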

Thanks.
