[ https://issues.apache.org/jira/browse/SOLR-9601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15927116#comment-15927116 ]
Alexandre Rafalovitch commented on SOLR-9601:
---------------------------------------------

It turns out there is a problem with having - and populating - a uniqueKey. Tika extract does not give us a meaningful key. The nearest one is *resourceName*, but it is not made available when parsing through DIH because, I suspect, we abstract the filesystem too well.

I could rename *title* to *id* and change its type to string, but I think that is bending over a bit too far. Alternatively, I could map it to *id* and copyField it to *title*. Would that be reasonable?

OK on removing infoStream, though we have a logging setting that uses it for all examples globally; I could add a comment in that file, I guess. solrconfig.xml already has a long comment about the example being minimalistic.

> DIH: Radically simplify Tika example to only show relevant configuration
> -------------------------------------------------------------------------
>
>                 Key: SOLR-9601
>                 URL: https://issues.apache.org/jira/browse/SOLR-9601
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public(Default Security Level. Issues are Public)
>          Components: contrib - DataImportHandler, contrib - Solr Cell (Tika extraction)
>    Affects Versions: 6.x, master (7.0)
>            Reporter: Alexandre Rafalovitch
>            Assignee: Alexandre Rafalovitch
>              Labels: examples, usability
>         Attachments: tika2_20170308.tgz
>
>
> Solr DIH examples are legacy examples to show how DIH works. However, they
> include full configurations that may obscure teaching points. This is no
> longer needed as we have 3 full-blown examples in the configsets.
> Specifically for Tika, the field type definitions were at some point
> simplified to have fewer support files in the configuration directory. This,
> however, means that we now have field definitions that have the same names as
> other examples, but different definitions.
> Importantly, Tika does not use most (any?) of those modified definitions.
> They are there just for completeness.
> Similarly, the solrconfig.xml includes the extract handler even though we are
> demonstrating a different path of using Tika. Somebody grepping through
> config files may get confused about which configuration aspects contribute
> to which experience.
> I am planning to significantly simplify the configuration and schema of the
> Tika example to **only** show the DIH Tika extraction path. It will end up as
> a very short and focused example.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
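
For illustration only, the "map it to *id* and copyField to *title*" option from the comment might look roughly like the two fragments below. This is a sketch, not the configuration from the attached patch: the entity name, the `url` path, and the `text_general` field type are assumptions made up for the example, and the field attributes are typical values rather than ones taken from the issue.

```xml
<!-- DIH data config (sketch): send Tika's "title" metadata to the id field. -->
<!-- The entity name and url are hypothetical placeholders. -->
<dataConfig>
  <dataSource type="BinFileDataSource"/>
  <document>
    <entity name="tika-sketch" processor="TikaEntityProcessor"
            url="/path/to/some-document.pdf" format="text">
      <!-- meta="true" reads the value from Tika metadata, not the body text -->
      <field column="title" name="id" meta="true"/>
      <field column="text" name="text"/>
    </entity>
  </document>
</dataConfig>
```

```xml
<!-- Schema (sketch): id is the string uniqueKey, title is filled by copyField. -->
<!-- The text_general type name is an assumption about the schema in use. -->
<field name="id" type="string" indexed="true" stored="true" required="true"/>
<field name="title" type="text_general" indexed="true" stored="true"/>
<field name="text" type="text_general" indexed="true" stored="true"/>

<uniqueKey>id</uniqueKey>

<copyField source="id" dest="title"/>
```

One consequence of this layout is that two documents with the same title would share an id, so the later one would overwrite the earlier one; a document with no title metadata would have no id at all and would be rejected if the field is required.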