[ https://jira.duraspace.org/browse/DS-1140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ivan Masár updated DS-1140:
---------------------------

    Fix Version/s: 4.0
                       (was: 3.0)
    
> Update MSWord Media Filter to use Apache POI (like PPT Filter) and also 
> support .docx
> -------------------------------------------------------------------------------------
>
>                 Key: DS-1140
>                 URL: https://jira.duraspace.org/browse/DS-1140
>             Project: DSpace
>          Issue Type: Improvement
>          Components: DSpace API
>            Reporter: Tim Donohue
>            Assignee: Richard Rodgers
>              Labels: has-patch
>             Fix For: 4.0
>
>
> The Microsoft Word Media Filter (org.dspace.app.mediafilter.WordFilter) relies 
> on obsolete third-party software, specifically the "text-mining" tools at:
> http://code.google.com/p/text-mining/
> Better options now exist, notably Apache POI:
> http://poi.apache.org/text-extraction.html
> Apache POI can also extract text from docx, xls, xlsx, and even Publisher 
> and Visio files.
> We may even be able to create a single "MSFilter" that extracts text from doc, 
> docx, ppt, pptx, xls, xlsx, etc., all using POI.
> Any volunteers to implement this?  It looks like we could implement it much 
> like the current PPT Filter (org.dspace.app.mediafilter.PowerPointFilter), 
> which already uses POI.  See also DS-714.
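
The single "MSFilter" idea above could be sketched roughly as follows. This is
only an illustration of routing files to an extractor by extension, not the
actual DSpace or POI API: the class MsFilterSketch, the TextExtractor
interface, and all method names are hypothetical, and a real filter would wrap
the matching POI text extractor behind that interface.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch: one filter object that picks an extractor by file extension.
class MsFilterSketch {
    /** Hypothetical hook; a real implementation would wrap a POI text extractor. */
    interface TextExtractor {
        String extract(byte[] content);
    }

    private final Map<String, TextExtractor> byExtension = new HashMap<>();

    /** Registers one extractor for several extensions (given without the dot). */
    MsFilterSketch register(TextExtractor extractor, String... extensions) {
        for (String ext : extensions) {
            byExtension.put(ext.toLowerCase(Locale.ROOT), extractor);
        }
        return this;
    }

    /** Returns the extractor for a file name, or empty if the format is not handled. */
    Optional<TextExtractor> extractorFor(String fileName) {
        return Optional.ofNullable(byExtension.get(extensionOf(fileName)));
    }

    /** Lower-cased text after the last dot, or "" when the name has no dot. */
    static String extensionOf(String fileName) {
        int dot = fileName.lastIndexOf('.');
        return dot < 0 ? "" : fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
    }
}
```

With this shape, supporting another format would mean registering one more
extractor rather than writing a separate filter class per file type.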

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira

_______________________________________________
Dspace-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dspace-devel