[ https://issues.apache.org/jira/browse/SOLR-9601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15927116#comment-15927116 ]

Alexandre Rafalovitch commented on SOLR-9601:
---------------------------------------------

It turns out there is a problem with having (and populating) a uniqueKey.
Tika extraction does not give us a meaningful key. The nearest one is
*resourceName*, but it is not made available when parsing through DIH, as (I
suspect) we abstract the filesystem too well.

I could rename *title* to *id* and change its type to string, but I think that
is bending over a bit too far. Alternatively, I guess I could map it to *id*
and copyField it to *title*. Would that be reasonable?
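To make the second option concrete, here is a minimal Python model of how one document would flow through it. It assumes a Tika record is a plain dict of extracted columns, that the DIH field mapping is a simple rename of *title* to *id*, and that the schema's copyField from *id* to *title* is applied afterwards. The names `DIH_FIELD_MAP`, `COPY_FIELDS` and `index_document` are hypothetical; this is a sketch of the idea, not how Solr or DIH is implemented internally.

```python
# Hypothetical model of the proposed mapping; not Solr/DIH code.

# DIH side (assumed): Tika column name -> Solr field name.
DIH_FIELD_MAP = {"title": "id", "text": "text"}

# Schema side (assumed): copyField rules as (source, dest) pairs.
COPY_FIELDS = [("id", "title")]


def apply_dih_mapping(tika_record: dict) -> dict:
    """Rename mapped Tika columns; unmapped columns are simply dropped here."""
    return {
        DIH_FIELD_MAP[column]: value
        for column, value in tika_record.items()
        if column in DIH_FIELD_MAP
    }


def apply_copy_fields(doc: dict) -> dict:
    """Copy each source field's value into its dest field, like copyField."""
    result = dict(doc)
    for source, dest in COPY_FIELDS:
        if source in doc:
            result[dest] = doc[source]
    return result


def index_document(tika_record: dict) -> dict:
    """Build the final document and enforce the uniqueKey requirement."""
    doc = apply_copy_fields(apply_dih_mapping(tika_record))
    if "id" not in doc:
        # Solr rejects a document that has no value for its uniqueKey field.
        raise ValueError("document is missing the uniqueKey field 'id'")
    return doc


if __name__ == "__main__":
    print(index_document({"title": "Annual report", "text": "Body text"}))
```

The sketch also makes one trade-off visible: a file whose Tika metadata has no *title* cannot be indexed at all, and two files with the same title would share one *id*, so the later one would replace the earlier.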

OK on removing infoStream, though we have a logging setting that uses it for
all examples globally; I guess I could add a comment in that file.

solrconfig.xml already has a long comment about the example being minimalistic.

> DIH: Radically simplify Tika example to only show relevant configuration
> -------------------------------------------------------------------------
>
>                 Key: SOLR-9601
>                 URL: https://issues.apache.org/jira/browse/SOLR-9601
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: contrib - DataImportHandler, contrib - Solr Cell (Tika extraction)
>    Affects Versions: 6.x, master (7.0)
>            Reporter: Alexandre Rafalovitch
>            Assignee: Alexandre Rafalovitch
>              Labels: examples, usability
>         Attachments: tika2_20170308.tgz
>
>
> Solr DIH examples are legacy examples to show how DIH works. However, they 
> include full configurations that may obscure the teaching points. This is no 
> longer needed, as we have 3 full-blown examples in the configsets. 
> Specifically for Tika, the field type definitions were at some point 
> simplified to need fewer support files in the configuration directory. This, 
> however, means that we now have field definitions that have the same names as 
> those in other examples, but different definitions. 
> Importantly, Tika does not use most (any?) of those modified definitions. 
> They are there just for completeness. Similarly, the solrconfig.xml includes 
> the extract handler even though we are demonstrating a different way of using 
> Tika. Somebody grepping through config files may get confused about which 
> configuration aspects contribute to what experience.
> I am planning to significantly simplify the configuration and schema of the 
> Tika example to **only** show the DIH Tika extraction path. It will end up as 
> a very short and focused example.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
