Ok,
I figured it out. 
I manually ran tika-app with --gui and dropped the RSS feed into it.
Here's the metadata it output:

Content-Length: 615913
Content-Type: application/rss+xml
dc:description: This is an IBM C3 Public Files feed generated by a Java application.
dc:title: IBM - C3 Public Files RSS feed
description: This is an IBM C3 Public Files feed generated by a Java application.
title: IBM - C3 Public Files RSS feed

That's not what I was expecting. Where are the items?
The items are in the XML, but Tika isn't showing them...
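
To double-check that the items really are in the XML, independent of Tika, here's a minimal sketch that uses only the JDK's built-in DOM parser (not Tika) to count the `<item>` elements and print their titles. The class name `RssItemCheck` and the inline two-item sample feed are hypothetical; it assumes an RSS-style feed whose entries are `<item>` elements. It only confirms the items exist in the file and says nothing about why Tika doesn't list them as metadata.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class RssItemCheck {

    // Hypothetical two-item RSS 2.0 feed used when no file path is given.
    static final String SAMPLE =
        "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
        + "<rss version=\"2.0\"><channel><title>Sample feed</title>"
        + "<item><title>First file</title></item>"
        + "<item><title>Second file</title></item>"
        + "</channel></rss>";

    // Parses the feed with the JDK's DOM parser (not Tika), prints each
    // item's title, and returns the number of <item> elements found.
    static int countItems(byte[] xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml));
            NodeList items = doc.getElementsByTagName("item");
            for (int i = 0; i < items.getLength(); i++) {
                Element item = (Element) items.item(i);
                // Only look for <title> inside this item, not the channel title.
                NodeList titles = item.getElementsByTagName("title");
                String title = titles.getLength() > 0
                    ? titles.item(0).getTextContent()
                    : "(no title)";
                System.out.println("item " + (i + 1) + ": " + title);
            }
            return items.getLength();
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse feed XML", e);
        }
    }

    public static void main(String[] args) throws Exception {
        // Read the saved feed if a path is given, otherwise use SAMPLE.
        byte[] xml = args.length > 0
            ? Files.readAllBytes(Paths.get(args[0]))
            : SAMPLE.getBytes(StandardCharsets.UTF_8);
        System.out.println("items found: " + countItems(xml));
    }
}
```

On Java 11 or later it can be run directly against the saved file, e.g. `java RssItemCheck.java c3files-2-6-2013.xml`.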

I tried running it on the original IBM feed, but it failed with SSL errors,
so I saved the feed as an XML file and gave that to Tika. It returned even less
metadata:
Content-Length: 2068565
Content-Type: application/xml
resourceName: c3files-2-6-2013.xml

Please advise...

Thanks,

--
View this message in context: 
http://lucene.472066.n3.nabble.com/how-to-add-more-metadata-to-tika-extraction-tp4043417p4043466.html
Sent from the Apache Tika - Development mailing list archive at Nabble.com.
