[ 
https://issues.apache.org/jira/browse/SOLR-2930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13160036#comment-13160036
 ] 

Robert Muir commented on SOLR-2930:
-----------------------------------

My bad, I confused this bug with the PDFBox 'character deletion' 
one (TIKA-767); that's still unfortunately not in Tika 1.0, it seems.

                
> Allow controlling an important PDF processing parameter in Tika that splits 
> the words in text and is now supported in version 1.0 of Tika.
> -----------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: SOLR-2930
>                 URL: https://issues.apache.org/jira/browse/SOLR-2930
>             Project: Solr
>          Issue Type: Improvement
>          Components: contrib - Solr Cell (Tika extraction)
>    Affects Versions: 3.5
>            Reporter: Ravish Bhagdev
>              Labels: pdf, text-splitting, tika
>
> Tika 1.0 has fixed a major issue in PDF parsing that caused words to be 
> split incorrectly: 
> https://issues.apache.org/jira/browse/TIKA-724
> This causes text to be indexed incorrectly in Solr, which becomes especially 
> visible when using features such as spellcheck.
> Tika added a parameter, set via setEnableAutoSpace, that fixes 
> the problem, but there is currently no way to set it when using Solr.
> As discussed in the thread on the above issue, it would be nice if we could 
> control this parameter (and, in future, others) via Solr configuration.


        
