Re: Fixing the &lt;title/&gt; problem of TIKA-895 and TIKA-914</span></a></span> </h1> <p class="darkgray font13"> <span class="sender pipe"><a href="/search?l=dev@tika.apache.org&q=from:%22Jukka+Zitting%22" rel="nofollow"><span itemprop="author" itemscope itemtype="http://schema.org/Person"><span itemprop="name">Jukka Zitting</span></span></a></span> <span class="date"><a href="/search?l=dev@tika.apache.org&q=date:20120716" rel="nofollow">Mon, 16 Jul 2012 04:35:37 -0700</a></span> </p> </div> <div itemprop="articleBody" class="msgBody"> <!--X-Body-of-Message--> <pre>Hi,

On Sun, Jul 15, 2012 at 5:29 PM, John M &lt;jfm.apa...@gmail.com&gt; wrote:
&gt; So, is it a bug in the SAX library: that the line
&gt; "super.characters(new char[0], 0, 0);" in the XHTMLContentHandler
&gt; should work (but doesn't)?</pre><pre>
Yes, or the SAX library you're using could treat that as a feature
(automatically ignoring empty content). What's the SAX library you're
using to serialize the output from Tika?

You may also want to try the ToXMLContentHandler class in o.a.t.sax.
It can serialize SAX events and doesn't suffer from this problem.

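One way to see how a given serializer treats the zero-length
characters() call is to feed it the same three events by hand. The
sketch below is only an illustration: it assumes the JDK's built-in
TransformerHandler as the serializer (the thread does not say which
library is actually in use), and the class name EmptyTitleDemo and the
method serialize are hypothetical names chosen for this example.

```java
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.helpers.AttributesImpl;

// Hypothetical demo class: sends an empty "title" element, with a
// zero-length characters() event inside it, to a SAX serializer.
class EmptyTitleDemo {

    // method is the JAXP output method to use, e.g. "xml" or "html".
    static String serialize(String method) throws Exception {
        // Assumption: the default JDK factory supports SAX handlers.
        SAXTransformerFactory factory =
                (SAXTransformerFactory) TransformerFactory.newInstance();
        TransformerHandler handler = factory.newTransformerHandler();
        handler.getTransformer().setOutputProperty(OutputKeys.METHOD, method);
        handler.getTransformer().setOutputProperty(
                OutputKeys.OMIT_XML_DECLARATION, "yes");

        StringWriter out = new StringWriter();
        handler.setResult(new StreamResult(out));

        handler.startDocument();
        handler.startElement("", "title", "title", new AttributesImpl());
        // Same call as the one quoted from XHTMLContentHandler.
        handler.characters(new char[0], 0, 0);
        handler.endElement("", "title", "title");
        handler.endDocument();
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // Print both forms so they can be compared by eye.
        System.out.println("xml:  " + serialize("xml"));
        System.out.println("html: " + serialize("html"));
    }
}
```

If the element comes out in self-closing form, the serializer is
dropping the empty content. Sending the same events to another
handler, such as ToXMLContentHandler, allows a direct comparison.
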
BR,

Jukka Zitting
</pre> </div> <div class="msgButtons margintopdouble"> <ul class="overflow"> <li class="msgButtonItems"><a class="button buttonleft " accesskey="p" href="msg04845.html">Previous message</a></li> <li class="msgButtonItems textaligncenter"><a class="button" accesskey="c" href="thrd15.html#04849">View by thread</a></li> <li class="msgButtonItems textaligncenter"><a class="button" accesskey="i" href="mail15.html#04849">View by date</a></li> <li class="msgButtonItems textalignright"><a class="button buttonright " accesskey="n" href="msg04852.html">Next message</a></li> </ul> </div> <a name="tslice"></a> <div class="tSliceList margintopdouble"> <ul class="icons monospace"> <li class="icons-email"><span class="subject"><a href="msg04843.html">Fixing the &lt;title/&gt; problem of TIKA-895 and TIKA-914</a></span> <span class="sender italic">John M</span></li> <li><ul> <li class="icons-email"><span class="subject"><a href="msg04844.html">Re: Fixing the &lt;title/&gt; problem of TIKA-895 and T...</a></span> <span class="sender italic">Jukka Zitting</span></li> <li><ul> <li class="icons-email"><span class="subject"><a href="msg04845.html">Re: Fixing the &lt;title/&gt; problem of TIKA-895 a...</a></span> <span class="sender italic">John M</span></li> <li><ul> <li class="icons-email tSliceCur"><span class="subject">Re: Fixing the &lt;title/&gt; problem of TIKA-8...</span> <span class="sender italic">Jukka Zitting</span></li> <li><ul> <li class="icons-email"><span class="subject"><a href="msg04852.html">Re: Fixing the &lt;title/&gt; problem of TI...</a></span> <span class="sender italic">John M</span></li> </ul> </ul> </ul> </ul> </ul> </div> <div class="overflow msgActions margintopdouble"> <div class="msgReply" > <h2> Reply via email to </h2> <form method="POST" action="/mailto.php"> <input type="hidden" name="subject" value="Re: Fixing the <title/> problem of TIKA-895 and TIKA-914"> <input type="hidden" name="msgid" 
value="CAOFYJNbmKLja-ZfewiRfn6YNV5Ktdhp-xXWrQHi-DgD88qgnrQ@mail.gmail.com"> <input type="hidden" name="relpath" value="dev@tika.apache.org/msg04849.html"> <input type="submit" value=" Jukka Zitting "> </form> </div> </div> </div> <div class="aside" role="complementary"> <div class="logo"> <a href="/"><img src="/logo.png" width=247 height=88 alt="The Mail Archive"></a> </div> <form class="overflow" action="/search" method="get"> <input type="hidden" name="l" value="dev@tika.apache.org"> <label class="hidden" for="q">Search the site</label> <input class="submittext" type="text" id="q" name="q" placeholder="Search dev"> <input class="submitbutton" name="submit" type="image" src="/submit.png" alt="Submit"> </form> <div class="nav margintop" id="nav" role="navigation"> <ul class="icons font16"> <li class="icons-home"><a href="/">The Mail Archive home</a></li> <li class="icons-list"><a href="/dev@tika.apache.org/">dev - all messages</a></li> <li class="icons-about"><a href="/dev@tika.apache.org/info.html">dev - about the list</a></li> <li class="icons-expand"><a href="/search?l=dev@tika.apache.org&q=subject:%22Re%5C%3A+Fixing+the+%3Ctitle%5C%2F%3E+problem+of+TIKA%5C-895+and+TIKA%5C-914%22&o=newest&f=1" title="e" id="e">Expand</a></li> <li class="icons-prev"><a href="msg04845.html" title="p">Previous message</a></li> <li class="icons-next"><a href="msg04852.html" title="n">Next message</a></li> </ul> </div> <div class="listlogo margintopdouble"> </div> <div class="margintopdouble"> </div> </div> </div> <div class="footer" role="contentinfo"> <ul> <li><a href="/">The Mail Archive home</a></li> <li><a href="/faq.html#newlist">Add your mailing list</a></li> <li><a href="/faq.html">FAQ</a></li> <li><a href="/faq.html#support">Support</a></li> <li><a href="/faq.html#privacy">Privacy</a></li> <li class="darkgray">CAOFYJNbmKLja-ZfewiRfn6YNV5Ktdhp-xXWrQHi-DgD88qgnrQ@mail.gmail.com</li> </ul> </div> </body> </html>