[ https://issues.apache.org/jira/browse/DROIDS-81?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Richard Frovarp updated DROIDS-81:
----------------------------------
Attachment: tika-document-parser.patch
Initial implementation of a Tika document parser that does not HTMLify its
results and does not look for links. This time the attachment is in patch
format, with the dependency added.
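
For readers without the patch at hand, here is a rough sketch of the idea and
not the code in the attachment. Tika parsers report their output as XHTML SAX
events, so one way to avoid HTMLified results is to keep only the character
data and ignore the element markup and link attributes. The sketch uses only
the JDK's SAX classes and feeds them a hand-written XHTML string as a stand-in
for a parser's output; the class and method names (PlainTextSketch,
TextOnlyHandler, extractPlainText) are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration only; this is not the parser from the patch.
class PlainTextSketch {

    // Keeps character data and ignores elements and attributes (so no links).
    static class TextOnlyHandler extends DefaultHandler {
        private final StringBuilder text = new StringBuilder();

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);
        }

        String text() {
            // Collapse runs of whitespace left behind by the dropped markup.
            return text.toString().trim().replaceAll("\\s+", " ");
        }
    }

    // Parses an XHTML string (standing in for a parser's SAX output) and
    // returns only its text content.
    static String extractPlainText(String xhtml) {
        TextOnlyHandler handler = new TextOnlyHandler();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xhtml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("could not parse input", e);
        }
        return handler.text();
    }

    public static void main(String[] args) {
        String xhtml = "<html><body><p>Hello "
                + "<a href=\"http://example.org/\">world</a></p></body></html>";
        System.out.println(extractPlainText(xhtml)); // prints "Hello world"
    }
}
```

Text gathered this way can go to an indexer such as Solr as a plain string.
Whether the patch does it this way is not stated in this message.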
> Create a document parser that doesn't HTMLify the results.
> ----------------------------------------------------------
>
> Key: DROIDS-81
> URL: https://issues.apache.org/jira/browse/DROIDS-81
> Project: Droids
> Issue Type: Bug
> Components: tika
> Affects Versions: 0.01
> Reporter: Richard Frovarp
> Priority: Minor
> Attachments: tika-document-parser.patch
>
>
> While the TikaHTMLParser can parse PDFs, DOCs, etc., it returns them in an
> HTMLified format. Solr blows up on that format, and the HTMLification step
> isn't always necessary anyway.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.