[ https://issues.apache.org/jira/browse/SOLR-9601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Alexandre Rafalovitch updated SOLR-9601:
----------------------------------------
    Attachment: tika2_20170308.tgz

It is a little hard to generate a readable diff between the original Tika example and the one I created. So, for ease of testing, I just created it as a separate *tika2* core that can be dropped next to the other DIH cores.

I removed all of the unused gunk, so the remaining files are tiny. I wish I could remove the infoStream section, but the default is false and I am not sure I should.

I've also added a prototype-oriented wildcard demo, renamed and simplified the text field definition, and did other minor cleanup in what is left. I am not sure if I need to worry about docValues here.

Also, I have commented out the uniqueKey section, but the corresponding *id* field definition is missing. It was missing in the original example too, so I am not sure it is worth adding in the commented-out section.

This is a big change (even if the resulting files are tiny), so I would appreciate people commenting on it before I actually commit it.

> DIH: Radicially simplify Tika example to only show relevant configuration
> -------------------------------------------------------------------------
>
>                 Key: SOLR-9601
>                 URL: https://issues.apache.org/jira/browse/SOLR-9601
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public(Default Security Level. Issues are Public)
>          Components: contrib - DataImportHandler, contrib - Solr Cell (Tika
> extraction)
>    Affects Versions: 6.x, master (7.0)
>            Reporter: Alexandre Rafalovitch
>            Assignee: Alexandre Rafalovitch
>              Labels: examples, usability
>         Attachments: tika2_20170308.tgz
>
>
> Solr DIH examples are legacy examples that show how DIH works. However, they
> include full configurations that may obscure teaching points. This is no
> longer needed as we have 3 full-blown examples in the configsets.
> Specifically for Tika, the field type definitions were at some point
> simplified to have fewer support files in the configuration directory. This,
> however, means that we now have field definitions that have the same names as
> those in other examples, but different definitions.
> Importantly, Tika does not use most (any?) of those modified definitions.
> They are there just for completeness. Similarly, the solrconfig.xml includes
> the extract handler even though we are demonstrating a different path of
> using Tika. Somebody grepping through config files may get confused about
> which configuration aspects contribute to which experience.
> I am planning to significantly simplify the configuration and schema of the
> Tika example to **only** show the DIH Tika extraction path. It will end up a
> very short and focused example.
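
For readers who have not seen the DIH Tika extraction path, a data-config.xml along the following lines illustrates the kind of configuration a focused example would keep. This is only a sketch and not the contents of the attached tika2 core: the baseDir path, the entity names (file, doc), and the target field names (id, author, title, text) are assumptions chosen for illustration, and the schema would need matching field definitions.

```xml
<!-- Hypothetical minimal DIH configuration using Tika; names and paths are illustrative. -->
<dataConfig>
  <!-- Binary data source: hands file contents to Tika as a stream -->
  <dataSource type="BinFileDataSource"/>
  <document>
    <!-- Outer entity: lists the files to import (baseDir is an assumed placeholder path) -->
    <entity name="file" processor="FileListEntityProcessor" dataSource="null"
            baseDir="/path/to/docs" fileName=".*pdf" rootEntity="false">

      <!-- Use the file name as the document id (assumes an "id" field exists in the schema) -->
      <field column="file" name="id"/>

      <!-- Inner entity: runs Tika on each file found by the outer entity -->
      <entity name="doc" processor="TikaEntityProcessor"
              url="${file.fileAbsolutePath}" format="text">
        <!-- meta="true" maps a Tika metadata key rather than the extracted body -->
        <field column="Author" name="author" meta="true"/>
        <field column="title" name="title" meta="true"/>
        <!-- "text" is the extracted body of the document -->
        <field column="text" name="text"/>
      </entity>
    </entity>
  </document>
</dataConfig>
```

Nothing in this path involves the extract handler in solrconfig.xml, which is the distinction the issue description draws: here the extraction is driven by the DataImportHandler request handler that points at this file.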