On Wed, 27 Feb 2013, eShard wrote:
I manually ran the tika-app --gui and I dropped the rss feed into it.
Here's the metadata it output:

Content-Length: 615913
Content-Type: application/rss+xml
dc:description: This is an IBM C3 Public Files feed generated by a Java
application.
dc:title: IBM - C3 Public Files RSS feed
description: This is an IBM C3 Public Files feed generated by a Java
application.
title: IBM - C3 Public Files RSS feed

Looks like the information you want isn't being pulled out as metadata by Tika.

That's not what I was expecting. Where are the items? The items are in the XML, but Tika isn't showing them...

Metadata != content

I'd suspect that if you look at the content output (e.g. run tika-app with the --xml flag rather than --gui) you'll see the items there. Do you?
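
To make the metadata/content split concrete, here is a rough sketch in plain Python. It does not use Tika at all and is not how Tika is implemented; the sample feed and the split_feed helper are made up for illustration. It only shows the idea that channel-level fields (title, description) are one thing and the per-item entries are another:

```python
import xml.etree.ElementTree as ET

# Hypothetical two-item feed, made up for illustration (not the real feed).
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <description>Example description</description>
    <item><title>First file</title><link>http://example.com/1</link></item>
    <item><title>Second file</title><link>http://example.com/2</link></item>
  </channel>
</rss>"""


def split_feed(xml_text):
    """Return (metadata, items): channel-level fields vs. per-item entries."""
    channel = ET.fromstring(xml_text).find("channel")
    # Channel-level fields: the kind of thing that shows up as metadata.
    metadata = {
        "title": channel.findtext("title"),
        "description": channel.findtext("description"),
    }
    # Per-item entries: the kind of thing that belongs to the content.
    items = [
        (item.findtext("title"), item.findtext("link"))
        for item in channel.findall("item")
    ]
    return metadata, items


metadata, items = split_feed(SAMPLE_FEED)
print("metadata:", metadata)
print("items:", items)
```

Looking only at the first value is roughly what the metadata listing gives you; the items only appear once you look at the second.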

Nick
