David Pilato created TIKA-2227:
----------------------------------

             Summary: Replacement of MSOffice#KEYWORDS for RTF and ODT docs
                 Key: TIKA-2227
                 URL: https://issues.apache.org/jira/browse/TIKA-2227
             Project: Tika
          Issue Type: Bug
          Components: parser
    Affects Versions: 1.14
            Reporter: David Pilato
            Priority: Minor


I'm trying to extract metadata from different types of documents.

For that I was using {{metadata.get(MSOffice.KEYWORDS)}}, but it is marked as 
{{Deprecated}} in favor of the {{Office}} class.

So I changed my code to use {{metadata.get(Office.KEYWORDS)}} instead.

This does not work for two types of documents: 

* RTF: 
https://github.com/dadoonet/fscrawler/blob/master/src/test/resources/documents/test.rtf
* ODT: 
https://github.com/dadoonet/fscrawler/blob/master/src/test/resources/documents/test.odt

It seems that RTF and ODT keywords are stored under the {{"Keyword"}} metadata 
name, although they should probably be stored under {{"meta:keyword"}}.
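
Until this is fixed, a fallback lookup along the following lines might serve as a workaround. This is only a sketch under assumptions: a plain {{Map}} stands in for Tika's {{Metadata}} object so the example runs without Tika on the classpath, the class and method names ({{KeywordLookup}}, {{keywords}}) are hypothetical, and the two key names are the ones quoted above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Tika: tries the expected key first,
// then falls back to the key that RTF and ODT documents appear to use.
class KeywordLookup {
    // Key names as quoted in this report, in order of preference.
    static final String[] KEYWORD_NAMES = {"meta:keyword", "Keyword"};

    // Returns the first non-empty keywords value, or null if none is present.
    static String keywords(Map<String, String> metadata) {
        for (String name : KEYWORD_NAMES) {
            String value = metadata.get(name);
            if (value != null && !value.isEmpty()) {
                return value;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Simulates metadata from an RTF or ODT document,
        // where only the "Keyword" name is populated.
        Map<String, String> rtfLike = new LinkedHashMap<>();
        rtfLike.put("Keyword", "keyword1, keyword2");
        System.out.println(keywords(rtfLike));
    }
}
```

With a real {{Metadata}} instance the same idea would apply by calling {{metadata.get(...)}} for each name in turn, so callers would not need to know which parser produced the value.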

If needed, you can reuse the documents linked above as test cases.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
