[ https://issues.apache.org/jira/browse/TIKA-895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13443826#comment-13443826 ]
Kostya Gribov commented on TIKA-895:
------------------------------------

This still applies to Tika 1.2. A fix was made in TIKA-725 (by [~jukkaz]).

> Empty title element makes Tika-generated HTML documents not open
> ----------------------------------------------------------------
>
>                 Key: TIKA-895
>                 URL: https://issues.apache.org/jira/browse/TIKA-895
>             Project: Tika
>          Issue Type: Bug
>          Components: metadata
>    Affects Versions: 1.1
>         Environment: Windows 7
>            Reporter: Benoit MAGGI
>            Priority: Trivial
>              Labels: newbie
>
> I am trying to transform an empty docx into an HTML file.
> Ex : java -jar tika-app-1.1.jar -x example.docx > t.html
> The resulting HTML file can't be opened with Firefox, Internet Explorer or Chrome.
> The main point is that <title/> seems to be forbidden by the HTML specification
> (I can't find the equivalent point for HTML5):
> bq. http://www.w3.org/TR/html401/struct/global.html#h-7.4.2
> bq. 7.4.2 The TITLE element
> bq. <!-- The TITLE element is not considered part of the flow of text.
> bq.      It should be displayed, for example as the page header or
> bq.      window title. Exactly one title is required per document.
> bq.   -->
> bq. <!ELEMENT TITLE - - (#PCDATA) -(%head.misc;) -- document title -->
> bq. <!ATTLIST TITLE %i18n>
> bq. *Start tag: required, End tag: required*
> For information, there was the same bug with xls:
> https://issues.apache.org/jira/browse/TIKA-725
> The simple solution should be to provide an empty title by default.
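
The "empty title by default" idea from the report can be approximated as a post-processing step on the HTML that tika-app writes out. The following is only a minimal sketch under stated assumptions: the class `EmptyTitleFix` and the method `expandEmptyTitle` are hypothetical names, not part of Tika, and the code does not reflect whatever change was made in TIKA-725. It assumes the output contains a literal self-closed `<title/>` without attributes and rewrites it with an explicit start and end tag, which is what the quoted HTML 4.01 section requires.

```java
import java.util.regex.Pattern;

public class EmptyTitleFix {
    // Matches a self-closed title element such as <title/> or <title />,
    // case-insensitively. Attributes on the element are not handled here.
    private static final Pattern SELF_CLOSED_TITLE =
            Pattern.compile("<title\\s*/>", Pattern.CASE_INSENSITIVE);

    /** Replaces every self-closed title element with an explicit start/end tag pair. */
    public static String expandEmptyTitle(String html) {
        return SELF_CLOSED_TITLE.matcher(html).replaceAll("<title></title>");
    }

    public static void main(String[] args) {
        // Hypothetical sample input resembling the output described in the report.
        String broken = "<html><head><title/></head><body/></html>";
        System.out.println(expandEmptyTitle(broken));
    }
}
```

A regular expression is enough here because only one fixed token is targeted; for anything beyond this single case, a real HTML or XML serializer setting would be the more robust choice.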