[ 
https://issues.apache.org/jira/browse/SOLR-2901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Høydahl updated SOLR-2901:
------------------------------

    Attachment: SOLR-2901.patch

First patch version.

* Tika 1.0 removes previously deprecated APIs, so this patch changes how the 
API is used in a few places.
* For MailEntityProcessor we also improve detection by passing the part's 
file name in as Metadata.
* For ExtractingDocumentLoader we now provide the stream's content type as a 
hint in Metadata, but this has not been tested extensively.
* Added tests for the newly detected languages.
* Updated the Eclipse classpath file to point to the new jars. Nothing has 
been done for other IDEs.
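
To illustrate why the file name and content type hints matter, here is a 
minimal, self-contained sketch of hint-based type detection. It is a toy 
model, not Tika's implementation and not code from the patch: the class 
HintDetectionSketch, the method detect, and the fallback order are all 
assumptions made for illustration. In Tika itself the hints travel in a 
Metadata object; in the 1.x API the relevant keys should be 
Metadata.RESOURCE_NAME_KEY and Metadata.CONTENT_TYPE.

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Toy model of hint-based detection (hypothetical, not Tika code).
public class HintDetectionSketch {

    // Guesses a media type from the first bytes of a stream plus optional hints.
    // fileName and declaredType may be null when no hint is available.
    static String detect(byte[] head, String fileName, String declaredType) {
        // 1. Magic bytes are the strongest signal when they are present.
        if (startsWith(head, "%PDF-")) {
            return "application/pdf";
        }
        // 2. Otherwise use the file name hint (e.g. a mail part's file name).
        if (fileName != null) {
            String lower = fileName.toLowerCase(Locale.ROOT);
            if (lower.endsWith(".csv")) return "text/csv";
            if (lower.endsWith(".txt")) return "text/plain";
        }
        // 3. Then use the declared content type hint (e.g. from the stream).
        if (declaredType != null && !declaredType.isEmpty()) {
            return declaredType;
        }
        // 4. With no usable signal, fall back to the generic binary type.
        return "application/octet-stream";
    }

    // Returns true if data begins with the ASCII bytes of magic.
    static boolean startsWith(byte[] data, String magic) {
        byte[] m = magic.getBytes(StandardCharsets.US_ASCII);
        if (data.length < m.length) return false;
        for (int i = 0; i < m.length; i++) {
            if (data[i] != m[i]) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        // CSV content has no magic bytes, so only the hints can identify it.
        byte[] csv = "id,name\n1,solr\n".getBytes(StandardCharsets.US_ASCII);
        System.out.println(detect(csv, null, null));
        System.out.println(detect(csv, "export.csv", null));
        System.out.println(detect(csv, null, "text/csv"));
    }
}
```

Without hints, content that lacks a recognizable signature ends up as a 
generic type; with a file name or declared content type, the detector has 
something to go on.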

One place still uses a deprecated method: ExtractingDocumentLoader, where we 
call parser = config.getParser(mediaType). I did not find the new 
equivalent.
                
> Upgrade Solr to Tika 1.0
> ------------------------
>
>                 Key: SOLR-2901
>                 URL: https://issues.apache.org/jira/browse/SOLR-2901
>             Project: Solr
>          Issue Type: Improvement
>          Components: contrib - LangId, contrib - Solr Cell (Tika extraction)
>            Reporter: Jan Høydahl
>            Assignee: Jan Høydahl
>         Attachments: SOLR-2901.patch
>
>
> Tika 1.0 was released November 7th and includes a number of improvements: 
> http://tika.apache.org/1.0/
