OK, I figured it out. I ran tika-app manually with --gui and dropped the RSS feed into it. Here is the metadata it output:
Content-Length: 615913
Content-Type: application/rss+xml
dc:description: This is an IBM C3 Public Files feed generated by a Java application.
dc:title: IBM - C3 Public Files RSS feed
description: This is an IBM C3 Public Files feed generated by a Java application.
title: IBM - C3 Public Files RSS feed

That's not what I was expecting. Where are the items? The items are in the XML, but Tika isn't showing them.

I also tried running it against the original IBM feed, but that failed with SSL errors. So I saved the feed as an XML file and gave that to Tika, and it returned even less metadata:

Content-Length: 2068565
Content-Type: application/xml
resourceName: c3files-2-6-2013.xml

Please advise.

Thanks,

--
View this message in context: http://lucene.472066.n3.nabble.com/how-to-add-more-metadata-to-tika-extraction-tp4043417p4043466.html
Sent from the Apache Tika - Development mailing list archive at Nabble.com.
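
P.S. One way to double-check, independently of Tika, that the items really are in the saved XML is to list them with a few lines of Python. This is only a rough sketch using the standard library, not anything Tika does internally. It assumes a plain RSS 2.0 layout with un-namespaced <item> elements; list_rss_items and SAMPLE are names made up for illustration, and the sample feed is invented, not the IBM data.

```python
import xml.etree.ElementTree as ET


def list_rss_items(xml_text):
    """Return (title, link) pairs for every <item> element in an RSS document.

    Assumes un-namespaced RSS 2.0 elements; namespaced formats such as
    RSS 1.0 (RDF) would need the namespace in the tag name.
    """
    root = ET.fromstring(xml_text)
    items = []
    for item in root.iter("item"):
        # findtext returns the default when the child element is missing
        title = item.findtext("title", default="")
        link = item.findtext("link", default="")
        items.append((title.strip(), link.strip()))
    return items


# Invented sample feed, used only to exercise the function.
SAMPLE = """<rss version="2.0">
  <channel>
    <title>Sample feed</title>
    <description>Illustrative data only.</description>
    <item>
      <title>First file</title>
      <link>http://example.com/1</link>
    </item>
    <item>
      <title>Second file</title>
      <link>http://example.com/2</link>
    </item>
  </channel>
</rss>"""

if __name__ == "__main__":
    for title, link in list_rss_items(SAMPLE):
        print(title, link)
```

To try it on the saved feed, read c3files-2-6-2013.xml into a string and pass that to list_rss_items instead of SAMPLE. If it lists items, the data is in the file and the gap is in what Tika reports as metadata.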