I need to convert doc/docx files into HTML. I was able to convert doc to HTML using Apache POI, but I am unable to convert docx to HTML. Some people suggested using the XWPFWordExtractorDecorator class, which is supposed to convert docx to HTML. I was able to reuse XWPFWordExtractorDecorator, but it only gives me plain text. How do I get the HTML? Here is what I have done so far:

    // Subclass that exposes the decorator's buildXHTML method
    public class XWPFWordExtractorDecoratorChild extends XWPFWordExtractorDecorator {

        public XWPFWordExtractorDecoratorChild(ParseContext context, XWPFWordExtractor extractor) {
            super(context, extractor);
        }

        public void buildHTML(XHTMLContentHandler xhtml)
                throws SAXException, XmlException, IOException {
            this.buildXHTML(xhtml);
        }
    }

    // Calling code; "stream" is the InputStream of the docx file
    ParseContext p = new ParseContext();
    XWPFDocument doc = new XWPFDocument(stream);
    XWPFWordExtractor ex = new XWPFWordExtractor(doc);
    XWPFWordExtractorDecoratorChild dec = new XWPFWordExtractorDecoratorChild(p, ex);

    StringWriter writer = new StringWriter();
    Metadata meta = new Metadata();
    XHTMLContentHandler h = new XHTMLContentHandler(new BodyContentHandler(writer), meta);
    dec.buildHTML(h);
    String s = writer.toString();

Any help with converting doc/docx into HTML with styles preserved is appreciated.

--
View this message in context: http://lucene.472066.n3.nabble.com/How-to-Convert-Doc-or-Docx-File-to-HTML-tp3697301p3697301.html
Sent from the Apache Tika - Development mailing list archive at Nabble.com.
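
A note on the plain-text output: as far as I can tell, BodyContentHandler(Writer) only writes the character events of the XHTML body to the writer, which would explain why the tags are missing in the code above. The sketch below is not Tika code. It uses only JDK SAX classes to show the difference between a handler that keeps character data only and a TransformerHandler (output method "html") that serializes the element events as well. The names SaxMarkupDemo, TextOnlyHandler and emitSample are hypothetical, and emitSample is just a stand-in for whatever produces the SAX events (buildXHTML in my case). Whether a TransformerHandler can be passed to XHTMLContentHandler in place of BodyContentHandler is my assumption and I have not verified it.

```java
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

class SaxMarkupDemo {

    /** Keeps character data and ignores element events, so the output is plain text. */
    static class TextOnlyHandler extends DefaultHandler {
        private final StringWriter out;

        TextOnlyHandler(StringWriter out) {
            this.out = out;
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            out.write(ch, start, length);
        }
    }

    /** Hypothetical stand-in for a parser: emits <p>Hello <b>world</b></p> as SAX events. */
    static void emitSample(ContentHandler h) throws SAXException {
        char[] hello = "Hello ".toCharArray();
        char[] world = "world".toCharArray();
        h.startDocument();
        h.startElement("", "p", "p", new AttributesImpl());
        h.characters(hello, 0, hello.length);
        h.startElement("", "b", "b", new AttributesImpl());
        h.characters(world, 0, world.length);
        h.endElement("", "b", "b");
        h.endElement("", "p", "p");
        h.endDocument();
    }

    /** Feeds the events to the text-only handler. */
    static String asText() throws SAXException {
        StringWriter w = new StringWriter();
        emitSample(new TextOnlyHandler(w));
        return w.toString();
    }

    /** Feeds the same events to a serializing handler that writes the tags too. */
    static String asHtml() throws Exception {
        SAXTransformerFactory factory =
                (SAXTransformerFactory) SAXTransformerFactory.newInstance();
        TransformerHandler handler = factory.newTransformerHandler();
        handler.getTransformer().setOutputProperty(OutputKeys.METHOD, "html");
        handler.getTransformer().setOutputProperty(OutputKeys.INDENT, "no");
        StringWriter w = new StringWriter();
        handler.setResult(new StreamResult(w));
        emitSample(handler);
        return w.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println("text-only handler:   " + asText());
        System.out.println("serializing handler: " + asHtml());
    }
}
```

With the same events, the first handler returns only the text and the second returns markup that includes the p and b elements. This only addresses getting tags into the output; it does not say anything about how much of the docx styling the decorator emits in the first place.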