[ https://issues.apache.org/jira/browse/TIKA-532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ken Krugler closed TIKA-532.
----------------------------

    Resolution: Duplicate

As the issue link indicates, this is a duplicate of [TIKA-394].

> missing spaces in text extraction of BodyContentHandler
> -------------------------------------------------------
>
>                 Key: TIKA-532
>                 URL: https://issues.apache.org/jira/browse/TIKA-532
>             Project: Tika
>          Issue Type: Bug
>    Affects Versions: 0.8
>            Reporter: Reinhard Schwab
>             Fix For: 0.8
>
>
> BodyContentHandler works fine for extracting text from pages,
> except for this page:
> http://www.lucidimagination.com/developers/whitepapers/whats-new-solr-14
> The page contains a selection list,
> and the text returned by BodyContentHandler contains
> "...Country: *
>   -- Select a Country -- United 
> StatesCanadaArgentinaAustraliaBrazilChinaFranceGermanyIndiaIndonesiaItalyJapanMexicoRussiaSaudi"
> It would be preferable to have a space between the country names.
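
For illustration only, the behaviour described above can be reproduced with a small SAX handler. This is a minimal sketch that uses only the JDK's SAX parser, not Tika; it is not the BodyContentHandler implementation and not the change made under TIKA-394. The names OptionSpacingDemo, TextCollector and extractText are hypothetical, and the input is assumed to be well-formed XHTML. With separateOptions set to false, the text of adjacent option elements runs together as in the report; with true, a space is appended whenever an option element closes.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class OptionSpacingDemo {

    /** Hypothetical handler: collects character data from SAX events. */
    static class TextCollector extends DefaultHandler {
        private final StringBuilder text = new StringBuilder();
        private final boolean separateOptions;

        TextCollector(boolean separateOptions) {
            this.separateOptions = separateOptions;
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            // Assumption: the parser is not namespace-aware, so qName holds the tag name.
            if (separateOptions && "option".equalsIgnoreCase(qName)) {
                text.append(' ');
            }
        }

        String text() {
            return text.toString().trim();
        }
    }

    /** Parses well-formed XHTML and returns the collected text. */
    static String extractText(String xhtml, boolean separateOptions) {
        try {
            TextCollector handler = new TextCollector(separateOptions);
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xhtml)), handler);
            return handler.text();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse input", e);
        }
    }

    public static void main(String[] args) {
        String xhtml = "<select><option>United States</option>"
                + "<option>Canada</option><option>Argentina</option></select>";
        System.out.println(extractText(xhtml, false));
        System.out.println(extractText(xhtml, true));
    }
}
```

Appending the separator in endElement rather than in characters keeps multi-chunk text inside a single option intact, since a SAX parser may deliver one text node in several characters calls.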

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
