[ https://issues.apache.org/jira/browse/CLEREZZA-182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12915021#action_12915021 ]
Davide Palmisano commented on CLEREZZA-182:
-------------------------------------------
Dear Tommaso,
In the attached patch[1] (taken from
/trunk/org.apache.clerezza.parent/org.apache.clerezza.uima/org.apache.clerezza.uima.metadata-generator)
you can find an attempt to integrate Apache Tika 0.7 by implementing the
MediaTypeTextExtractor interface. My changes include:
1) Tika dependency added to the pom.xml
2) two tests (one for my implementation, TikaTextExtractor, and one for your
PlainTextExtractor class)
3) Javadoc added to the MediaTypeTextExtractor interface.
4) a couple of new constructors for the UnsupportedMediaTypeException exception.
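To give a rough idea of the shape of these pieces, here is a minimal sketch using only the JDK (Java 9+ for readAllBytes). The method names supports and extract, the constructor signatures, and the ExtractorDemo class are assumptions made for illustration and are not copied from the patch; the real MediaTypeTextExtractor interface may look different. A Tika-backed implementation would presumably delegate to Tika where this sketch reads the bytes directly.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Assumed shape of the exception; the two constructors are illustrative only.
class UnsupportedMediaTypeException extends Exception {
    UnsupportedMediaTypeException(String message) {
        super(message);
    }

    UnsupportedMediaTypeException(String message, Throwable cause) {
        super(message, cause);
    }
}

// Assumed shape of the interface; the real method names may differ.
interface MediaTypeTextExtractor {
    boolean supports(String mediaType);

    String extract(InputStream in, String mediaType)
            throws UnsupportedMediaTypeException, IOException;
}

// Plain-text extractor: accepts text/plain and decodes the stream as UTF-8.
class PlainTextExtractor implements MediaTypeTextExtractor {
    @Override
    public boolean supports(String mediaType) {
        return mediaType != null && mediaType.startsWith("text/plain");
    }

    @Override
    public String extract(InputStream in, String mediaType)
            throws UnsupportedMediaTypeException, IOException {
        if (!supports(mediaType)) {
            throw new UnsupportedMediaTypeException("Unsupported media type: " + mediaType);
        }
        return new String(in.readAllBytes(), StandardCharsets.UTF_8);
    }
}

// Hypothetical demo class showing how a caller might use an extractor.
class ExtractorDemo {
    public static void main(String[] args) throws Exception {
        MediaTypeTextExtractor extractor = new PlainTextExtractor();
        InputStream in = new ByteArrayInputStream(
                "some plain text".getBytes(StandardCharsets.UTF_8));
        System.out.println(extractor.extract(in, "text/plain"));
    }
}
```

A test for either extractor can then feed a small in-memory stream to extract and check both the returned text and the exception thrown for an unsupported media type.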
Let me know if it fits your needs.
Davide
[1] CLEREZZA-182.patch
> Integrate Apache Tika inside Apache Clerezza
> --------------------------------------------
>
> Key: CLEREZZA-182
> URL: https://issues.apache.org/jira/browse/CLEREZZA-182
> Project: Clerezza
> Issue Type: New Feature
> Reporter: Tommaso Teofili
> Attachments: CLEREZZA-182.patch
>
>
> Apache Tika is a toolkit for detecting and extracting metadata and structured
> text content from various documents using existing parser libraries. It
> would be nice to have it integrated inside Apache Clerezza so that Resources
> could be easily enriched and auto-tagged with metadata once inside Clerezza.