[ https://issues.apache.org/jira/browse/TIKA-1849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15131036#comment-15131036 ]
Tim Allison edited comment on TIKA-1849 at 2/3/16 8:06 PM:
-----------------------------------------------------------

I'm not able to reproduce this in our test suite. To confirm, this works for you:

{noformat}
String result = new Tika().parseToString(
        getResourceAsStream("/test-documents/testRTF_metadataSet.rtf"));
{noformat}

but this doesn't work for you:

{noformat}
String result = new Tika().parseToString(
        getResourceAsStream("/test-documents/testRTF_metadataSet.rtf"),
        new Metadata());
{noformat}

Are you using tika-app.jar? How are you opening your inputstream? Are you putting anything in Metadata before sending into parseToString()?

My initial thought was that the RTF had two {{Office.PAGE_COUNT}}, and this would cause this error, but I'm not seeing that when I try to reproduce it. I think we should change our code to {{set}} vs. {{add}}, but I'd like to figure out how we are doubly adding that value in your file.

Thank you!

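To illustrate the {{set}} vs. {{add}} distinction discussed above, here is a minimal, self-contained sketch. It does not use Tika: {{SingleValuedStore}} is a hypothetical stand-in for a metadata object whose keys hold a single value. It assumes, as the comment hypothesizes, that a second {{add}} on a single-valued property is what raises the exception, while {{set}} simply overwrites. The exception type and message format here are illustrative only.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a metadata store with single-valued keys.
// This is NOT Tika's Metadata class; it only models the suspected behaviour.
class SingleValuedStore {
    private final Map<String, String> values = new HashMap<>();

    // set: overwrite whatever is currently stored under the key.
    void set(String key, String value) {
        values.put(key, value);
    }

    // add: accepted the first time, rejected if the key already has a value,
    // because a single-valued key cannot hold two values.
    void add(String key, String value) {
        if (values.containsKey(key)) {
            throw new IllegalStateException(key + " : SIMPLE");
        }
        values.put(key, value);
    }

    String get(String key) {
        return values.get(key);
    }
}

class SetVsAddDemo {
    public static void main(String[] args) {
        SingleValuedStore store = new SingleValuedStore();
        store.add("meta:page-count", "3");
        try {
            // Second add on the same key, as if the page count appeared twice.
            store.add("meta:page-count", "3");
        } catch (IllegalStateException e) {
            System.out.println("add failed: " + e.getMessage());
        }
        // set never fails on a repeated key; the last value wins.
        store.set("meta:page-count", "4");
        System.out.println("after set: " + store.get("meta:page-count"));
    }
}
```

Under this model, switching the parser from {{add}} to {{set}} would hide the failure but would not explain why the value is written twice, which is the open question in the comment.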
> RTF Exception
> -------------
>
> Key: TIKA-1849
> URL: https://issues.apache.org/jira/browse/TIKA-1849
> Project: Tika
> Issue Type: Bug
> Components: parser
> Affects Versions: 1.11
> Reporter: Andrea
>
> When converting a RTF document with a default new Tika().parseToString(inputStream, new Metadata()) I get an exception:
> org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.rtf.RTFParser@47f4e407
>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:282)
>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>     at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
>     at org.apache.tika.Tika.parseToString(Tika.java:496)
>     at com.expertsystem.experiments.tika.tika_test_dtra.App.main(App.java:48)
> Caused by: org.apache.tika.metadata.PropertyTypeException: meta:page-count : SIMPLE
>     at org.apache.tika.metadata.Metadata.add(Metadata.java:337)
>     at org.apache.tika.parser.rtf.TextExtractor.processControlWord(TextExtractor.java:830)
>     at org.apache.tika.parser.rtf.TextExtractor.parseControlWord(TextExtractor.java:562)
>     at org.apache.tika.parser.rtf.TextExtractor.parseControlToken(TextExtractor.java:488)
>     at org.apache.tika.parser.rtf.TextExtractor.extract(TextExtractor.java:450)
>     at org.apache.tika.parser.rtf.TextExtractor.extract(TextExtractor.java:439)
>     at org.apache.tika.parser.rtf.RTFParser.parse(RTFParser.java:86)
>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>     ... 4 more
> This doesn't happen if I don't pass the metadata parameter.
> As an example:
> https://www.google.it/url?sa=t&rct=j&q=&esrc=s&source=web&cd=4&cad=rja&uact=8&ved=0ahUKEwjI0_q87NvKAhWIpA4KHdDKAHkQFggzMAM&url=http%3A%2F%2Fidkf.bogor.net%2Fidkf-wireless%2Faplikasi%2Fhukum%2FLEGAL%2520ASPECTS%2520OF%2520SOFTWARE%2520DEVELOPMENT%2520AGREEMENT.rtf&usg=AFQjCNGiyWmdH7NECJoso189tgDbIN5D9g&sig2=kn8qEb2m_ft35-h5Ni7Vgg&bvm=bv.113034660,d.ZWU

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)