[ https://issues.apache.org/jira/browse/SOLR-2101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Lance Norskog updated SOLR-2101:
--------------------------------

    Attachment: htmllist-data-config.xml
                htmllist.xml

DIH config file and indexing job list. Not intended for inclusion.

> TikaEntityProcessor does not extract files- does not pick parser correctly
> --------------------------------------------------------------------------
>
>                 Key: SOLR-2101
>                 URL: https://issues.apache.org/jira/browse/SOLR-2101
>             Project: Solr
>          Issue Type: Bug
>          Components: contrib - DataImportHandler, contrib - Solr Cell (Tika extraction)
>    Affects Versions: 3.1
>            Reporter: Lance Norskog
>         Attachments: htmllist-data-config.xml, htmllist.xml
>
> The TikaEntityProcessor does not choose a parser and does not extract data.
> The attached DIH config file only works if the Tika parser is specified with:
> {{parser="org.apache.tika.parser.html.HtmlParser"}}.
> Remove that line and Tika will contribute nothing to the document.
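
For readers without the attachment, the kind of entity involved can be sketched as a minimal DIH config. This is not htmllist-data-config.xml. The data source type, file path, entity name and field names below are hypothetical placeholders. Only the {{parser}} attribute and its value come from the report. When {{parser}} is omitted, TikaEntityProcessor is described on the DIH wiki as defaulting to {{org.apache.tika.parser.AutoDetectParser}}, and that default is the case reported here as extracting nothing.

{code:xml}
<dataConfig>
  <!-- Assumed data source: reads raw bytes from local files -->
  <dataSource type="BinFileDataSource" name="bin"/>
  <document>
    <!-- Hypothetical entity; the path and names are placeholders -->
    <entity name="page"
            processor="TikaEntityProcessor"
            dataSource="bin"
            url="/path/to/page.html"
            format="text"
            parser="org.apache.tika.parser.html.HtmlParser">
      <!-- Extracted body text -->
      <field column="text" name="text"/>
      <!-- Title read from Tika metadata rather than the body -->
      <field column="title" name="title" meta="true"/>
    </entity>
  </document>
</dataConfig>
{code}

Per the report, deleting only the {{parser}} attribute from a config shaped like this should be enough to reproduce the problem.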