Per http://svn.services.openoffice.org/opengrok/xref/DEV300_m59/sw/source/filter/html/htmlatr.cxx#1013
there seem to be certain rules governing when a <P> tag is written when using "Save as ..." in Writer. The comment there is in German; I've tried to translate it into English below, and I'd welcome corrections to the translation:

    Ein <P> wird nur geschrieben, wenn
    - wir in keiner OL/UL/DL sind, oder
    - der Absatz einer OL/UL nicht numeriert ist, oder
    - keine Styles exportiert werden und
        - ein unterer Abstand oder
        - eine Absatz-Ausrichtung existiert, oder
    - Styles exportiert werden und,
        - die Textkoerper-Vorlage geaendert wurde, oder
        - ein Benutzer-Format exportiert wird, oder
        - Absatz-Attribute existieren

    A <P> is written only if:
    - we are not inside any OL/UL/DL; or
    - the paragraph of an OL/UL is not numbered; or
    - no styles are being exported and
        - bottom spacing exists, or
        - a paragraph alignment exists; or
    - styles are being exported and
        - the "Text body" paragraph style has been changed, or
        - a user-defined format is being exported, or
        - paragraph attributes exist

I want to know whether I'd need to hack that native code in order to get cleaner HTML output than I'm currently getting from OpenOffice. Incidentally, I've also tried exporting as XHTML, but the resulting output is even worse than that from "Save as ...": content that should not appear in a list ends up in one, etc.

I've tweaked the Java example servlet for document conversion so that it takes an MS Word document as an upload and returns (really just the file:/// URL of) an HTML document. In my code I do the following:

    // Setting the filter name
    propertyvalue[1] = new PropertyValue();
    propertyvalue[1].Name = "FilterName";
    propertyvalue[1].Value = "HTML (StarWriter)";

... which I believe means, effectively, "Save as ..." rather than "Export", the latter involving a different area of the OpenOffice codebase, if I'm not mistaken. I've seen some documentation on using XSLT to configure or customize the Export process but, as I've just noted, the Export output seems worse than the output I'm getting now (which I believe comes from "Save as ..." rather than Export).
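To check my reading of the htmlatr.cxx comment quoted above, here is how I would write the rule as a boolean predicate in Java. This is only a sketch of the comment as I understand it, not the actual logic in the native filter; the class, the field names, and the way the conditions are grouped are all my own assumptions.

```java
/** Sketch of the rule described in the htmlatr.cxx comment; all names are hypothetical. */
final class PTagRule {

    /** Made-up flags standing in for whatever state the real HTML writer inspects. */
    static final class ParagraphState {
        boolean inAnyList;             // inside an OL, UL, or DL
        boolean inOlOrUl;              // inside an OL or UL specifically
        boolean numbered;              // the paragraph carries a number or bullet
        boolean exportingStyles;       // styles are being exported
        boolean hasBottomSpacing;      // "unterer Abstand"
        boolean hasAlignment;          // "Absatz-Ausrichtung"
        boolean textBodyStyleChanged;  // "Textkoerper-Vorlage geaendert"
        boolean userFormatExported;    // "Benutzer-Format"
        boolean hasParagraphAttrs;     // "Absatz-Attribute"
    }

    /** Returns true if, under my reading of the comment, a <P> would be written. */
    static boolean shouldWriteP(ParagraphState s) {
        if (!s.inAnyList) {
            return true;                                   // not in any list
        }
        if (s.inOlOrUl && !s.numbered) {
            return true;                                   // unnumbered paragraph in OL/UL
        }
        if (!s.exportingStyles) {
            return s.hasBottomSpacing || s.hasAlignment;   // no styles exported
        }
        // Styles are exported: any one of the three remaining conditions is enough.
        return s.textBodyStyleChanged || s.userFormatExported || s.hasParagraphAttrs;
    }

    public static void main(String[] args) {
        ParagraphState s = new ParagraphState();
        s.inAnyList = true;
        s.inOlOrUl = true;
        s.numbered = true;
        System.out.println(shouldWriteP(s));  // plain numbered list item

        s.hasBottomSpacing = true;
        System.out.println(shouldWriteP(s));  // same item, now with bottom spacing
    }
}
```

If this reading is right, a numbered list item gets its own <P> as soon as the paragraph has bottom spacing or an alignment set, which might be where the <p> tags inside my lists are coming from.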
The problem is that the result (which is, at this point, a resume) comes out looking double-spaced. Also, there are two or three instances of another formatting issue that seems to have to do with <p> tags (or divs) inside one or another type of HTML list.

So, what's the best way to make the desired improvements in the HTML output? Should I just do some quick-and-dirty post-processing in my Java code (which, however, means processing the same file twice, essentially)? Or should I go deep into that native code to try to fix the relevant filter? Or is there a way to use XSLT in this case that I'm missing?
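For the quick-and-dirty route, something like the following is roughly what I have in mind. It is a sketch that assumes the list problem comes from a <P> wrapped directly inside each <LI>; the class and method names are hypothetical, and a regular expression is admittedly a fragile way to edit HTML.

```java
import java.util.regex.Pattern;

/** Quick-and-dirty post-processor sketch; class and method names are hypothetical. */
final class ListParagraphCleaner {

    // Matches an <LI> start tag immediately followed by a <P ...>...</P> pair.
    // Case-insensitive because the tag case of the saved HTML is not assumed.
    private static final Pattern P_IN_LI = Pattern.compile(
            "(<li\\b[^>]*>)\\s*<p\\b[^>]*>(.*?)</p>",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    /** Unwraps the first paragraph of each list item, keeping the paragraph's content. */
    static String unwrapParagraphsInListItems(String html) {
        return P_IN_LI.matcher(html).replaceAll("$1$2");
    }

    public static void main(String[] args) {
        String html = "<UL><LI><P STYLE=\"margin-bottom: 0in\">Java</P></LI></UL>";
        System.out.println(unwrapParagraphsInListItems(html));
    }
}
```

This only touches the first <P> in each list item and leaves paragraphs outside lists alone, so it would not by itself address the double-spaced look of the rest of the document.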