[ https://issues.apache.org/jira/browse/SOLR-7189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14353039#comment-14353039 ]
Alexandre Rafalovitch commented on SOLR-7189:
---------------------------------------------

I think that if the new functionality allows looking inside zips, for example, a lot of people would be interested. It should be exposed through the inner entity mechanism, so that people could start with a list of zip file names, then expand the zips, then process the individual files, and so on.

But yes, it should be a separate issue. And I would definitely create it, so that people are at least aware of this new functionality.

> Allow DIH to extract content from embedded documents via Tika
> -------------------------------------------------------------
>
>                 Key: SOLR-7189
>                 URL: https://issues.apache.org/jira/browse/SOLR-7189
>             Project: Solr
>          Issue Type: Bug
>          Components: contrib - DataImportHandler
>    Affects Versions: 5.0
>            Reporter: Tim Allison
>            Assignee: Shalin Shekhar Mangar
>            Priority: Minor
>             Fix For: Trunk, 5.1
>
>         Attachments: SOLR-7189.patch, test_recursive_embedded.docx
>
>
> DIH's TikaEntityProcessor doesn't currently extract content from embedded
> documents/attachments within a file. It might be useful if users could
> configure whether or not to include extraction of content from embedded
> documents.
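
The nested flow suggested in the comment (list the zip files, expand each one, then process the individual files) could look roughly like the sketch below. It is plain Python using only the standard library, not DIH configuration or any Solr API. All function names, the `store` mapping, and the `archive!member` id format are hypothetical, and `process_file` is only a stand-in for the content extraction that Tika would perform.

```python
import io
import zipfile
from typing import Iterable, Iterator, Tuple


def list_archives(names: Iterable[str]) -> Iterator[str]:
    """Outer level: yield only the names that look like zip archives."""
    for name in names:
        if name.lower().endswith(".zip"):
            yield name


def expand_archive(data: bytes) -> Iterator[Tuple[str, bytes]]:
    """Middle level: yield (member name, raw bytes) for each file in a zip."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        for info in archive.infolist():
            if not info.is_dir():
                yield info.filename, archive.read(info)


def process_file(name: str, payload: bytes) -> dict:
    """Inner level: placeholder for real content extraction (assumption)."""
    return {"id": name, "size": len(payload)}


def import_documents(store: dict) -> list:
    """Chain the three levels; `store` maps file name -> bytes (hypothetical)."""
    docs = []
    for archive_name in list_archives(store):
        for member, payload in expand_archive(store[archive_name]):
            docs.append(process_file(f"{archive_name}!{member}", payload))
    return docs
```

Each level only consumes what the level above it yields, which is the property the inner entity mechanism would provide: a row from the outer entity drives one run of the entity nested inside it.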