[ https://issues.apache.org/jira/browse/SOLR-7189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14352807#comment-14352807 ]
Shalin Shekhar Mangar commented on SOLR-7189:
---------------------------------------------

I can imagine uses for it, but personally I don't use much of either Tika or DIH, so I'll defer to your judgement. I'm happy to shepherd any patches, though.

> Allow DIH to extract content from embedded documents via Tika
> -------------------------------------------------------------
>
>                 Key: SOLR-7189
>                 URL: https://issues.apache.org/jira/browse/SOLR-7189
>             Project: Solr
>          Issue Type: Bug
>          Components: contrib - DataImportHandler
>   Affects Versions: 5.0
>            Reporter: Tim Allison
>            Assignee: Shalin Shekhar Mangar
>            Priority: Minor
>             Fix For: Trunk, 5.1
>
>         Attachments: SOLR-7189.patch, test_recursive_embedded.docx
>
>
> DIH's TikaEntityProcessor doesn't currently extract content from embedded
> documents/attachments within a file. It might be useful if users could
> configure whether or not to include extraction of content from embedded
> documents.