Hi, I didn't know where else to post this, so apologies in advance. Here's my quandary: I'm using ManifoldCF v1.1.1 to crawl non-standard (IBM) RSS feeds and custom RSS feeds. Each item carries additional metadata that we need to capture. I added the additional fields to the Solr schema (4.0 final), but the additional fields are nowhere to be found. I used Fiddler to confirm that ManifoldCF is indeed sending all the data to Solr, so I can only assume that Tika is ignoring or removing it. I turned on <str name="uprefix">attr_</str> in solrconfig.xml, but that didn't work either.
Can anyone tell me how to modify Solr and/or Tika to accept the additional fields from the feed? I looked into the tika.config file option, but I couldn't find any examples, and I found one post that says it's obsolete. I also tried putting the additional metadata in the content field, but the XML markup was stripped out, leaving only the data. So I used a double pipe (||) as a delimiter, but that had mixed results.

Here's what my solrconfig.xml extraction handler looks like for the RSS feed:

<requestHandler name="/update/extract" class="org.apache.solr.handler.extraction.ExtractingRequestHandler">
  <lst name="defaults">
    <str name="fmap.content">content</str>
    <str name="fmap.title">solr.title</str>
    <str name="fmap.name">solr.name</str>
    <str name="link">link</str>
    <str name="pubdateiso">pubdateiso</str>
    <str name="summary">summary</str>
    <str name="description">comments</str>
    <str name="authoremail">authoremail</str>
    <str name="modifier">modifier</str>
    <str name="modifieremail">modifieremail</str>
    <str name="authoremail">authoremail</str>
    <str name="published">published</str>
    <str name="updated">updated</str>
    <str name="modified">modified</str>
    <str name="created">created</str>
    <str name="fmap.Last-Modified">last_modified</str>
    <str name="uprefix">attr_</str>
    <str name="lowernames">true</str>
    <str name="fmap.div">ignored_</str>
  </lst>
  <lst name="date.formats">
    <str>yyyy-MM-dd</str>
  </lst>
</requestHandler>

Please advise.

Thanks,

--
View this message in context: http://lucene.472066.n3.nabble.com/how-to-add-more-metadata-to-tika-extraction-tp4043417.html
Sent from the Apache Tika - Development mailing list archive at Nabble.com.