missing spaces in text extraction of BodyContentHandler
-------------------------------------------------------
Key: TIKA-532
URL: https://issues.apache.org/jira/browse/TIKA-532
Project: Tika
Issue Type: Bug
Affects Versions: 0.8
Reporter: Reinhard Schwab
Fix For: 0.8
BodyContentHandler extracts text correctly from most pages,
except this one:
http://www.lucidimagination.com/developers/whitepapers/whats-new-solr-14
The page contains a selection list of countries, and
the text returned by BodyContentHandler contains
"...Country: *
-- Select a Country -- United
StatesCanadaArgentinaAustraliaBrazilChinaFranceGermanyIndiaIndonesiaItalyJapanMexicoRussiaSaudi"
It would be preferable to have a space between the country names.
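The symptom can be reproduced without Tika, using only the JDK's SAX parser. This is a hypothetical sketch: OptionSpacing and TextCollector are illustrative names, not Tika classes. The handler is simplified so that, like the behaviour reported above, it keeps only character data. Because the <option> elements contribute no whitespace of their own, their labels run together. Appending a space when an option closes is one possible workaround. It is not necessarily how Tika resolves this issue.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class OptionSpacing {

    /** Hypothetical simplification of a body-text handler: keeps character data only. */
    static class TextCollector extends DefaultHandler {
        final StringBuilder text = new StringBuilder();
        private final boolean spaceAfterOption;

        TextCollector(boolean spaceAfterOption) {
            this.spaceAfterOption = spaceAfterOption;
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            // Workaround sketch (assumption, not Tika's code): emit a separator
            // when an <option> closes so adjacent labels are not glued together.
            if (spaceAfterOption && "option".equals(qName)) {
                text.append(' ');
            }
        }
    }

    static String extract(String xhtml, boolean spaceAfterOption) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        TextCollector collector = new TextCollector(spaceAfterOption);
        factory.newSAXParser().parse(new InputSource(new StringReader(xhtml)), collector);
        return collector.text.toString().trim();
    }

    public static void main(String[] args) throws Exception {
        String xhtml = "<select><option>Canada</option><option>Argentina</option>"
                + "<option>Australia</option></select>";
        System.out.println(extract(xhtml, false)); // CanadaArgentinaAustralia
        System.out.println(extract(xhtml, true));  // Canada Argentina Australia
    }
}
```

The first call reproduces the concatenated output from the report. The second shows the separated names that the report asks for.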