[ 
https://issues.apache.org/jira/browse/LUCENE-6212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14339432#comment-14339432
 ] 

Ryan Ernst commented on LUCENE-6212:
------------------------------------

bq. you want the content to be indexed under the 'title' and 'body' fields, and 
not 'title_en' and 'title_de'. Well maybe you do/should but the point is that 
you have a single schema for your documents.

[~shaie] [~c.cudennec] That is exactly the problem.  It wasn't really a single 
schema.  It was a trappy API that required deciding at query time which 
analyzer to use.  Mixing languages in one field also skews that field's term 
statistics, so the scoring of results can be skewed too.  Using separate fields 
for each language is much better.  It's not really any more work, since you 
would have had separate analyzers for each of those languages anyway.
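
To make the separate-fields idea concrete, here is a minimal sketch in plain 
Java.  It uses no Lucene classes: the suffix-stripping "analyzers", the 
in-memory index, and the names PerLanguageFields, add and search are all 
hypothetical stand-ins invented for illustration.  In Lucene itself, 
PerFieldAnalyzerWrapper is the usual way to bind an analyzer to a field name.  
The point is only that once the analyzer is chosen by field name, indexing and 
searching cannot disagree about it.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.Function;

class PerLanguageFields {
    // Toy analyzer (NOT a Lucene Analyzer): lowercase, split on non-letters,
    // then strip the first matching suffix from tokens that are long enough.
    static Function<String, List<String>> analyzer(String... suffixes) {
        return text -> {
            List<String> tokens = new ArrayList<>();
            for (String raw : text.toLowerCase(Locale.ROOT).split("[^\\p{L}]+")) {
                if (raw.isEmpty()) continue;
                String token = raw;
                for (String suffix : suffixes) {
                    if (token.length() > suffix.length() + 2 && token.endsWith(suffix)) {
                        token = token.substring(0, token.length() - suffix.length());
                        break;
                    }
                }
                tokens.add(token);
            }
            return tokens;
        };
    }

    // One analyzer per field name; the field names are illustrative assumptions.
    static final Map<String, Function<String, List<String>>> ANALYZERS = Map.of(
            "title_en", analyzer("s"),
            "title_de", analyzer("en", "e"));

    // field -> term -> ids of documents containing that term
    static final Map<String, Map<String, Set<Integer>>> INDEX = new HashMap<>();

    // Index-time analysis is looked up by field name, never passed per document.
    static void add(int docId, String field, String text) {
        for (String term : ANALYZERS.get(field).apply(text)) {
            INDEX.computeIfAbsent(field, f -> new HashMap<>())
                 .computeIfAbsent(term, t -> new TreeSet<>())
                 .add(docId);
        }
    }

    // Query-time analysis uses the same lookup, so it always matches indexing.
    static Set<Integer> search(String field, String query) {
        Set<Integer> hits = new TreeSet<>();
        Map<String, Set<Integer>> terms = INDEX.getOrDefault(field, Map.of());
        for (String term : ANALYZERS.get(field).apply(query)) {
            hits.addAll(terms.getOrDefault(term, Set.of()));
        }
        return hits;
    }

    public static void main(String[] args) {
        add(1, "title_en", "Red Houses");
        add(2, "title_de", "Rote Katzen");
        System.out.println(search("title_en", "house")); // prints [1]
        System.out.println(search("title_de", "rot"));   // prints [2]
        System.out.println(search("title_en", "rot"));   // prints []
    }
}
```

Each field keeps its own term dictionary here, so German terms never affect 
the statistics of the English field, and vice versa.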

> Remove IndexWriter's per-document analyzer add/updateDocument APIs
> ------------------------------------------------------------------
>
>                 Key: LUCENE-6212
>                 URL: https://issues.apache.org/jira/browse/LUCENE-6212
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>             Fix For: 5.0, Trunk, 5.1
>
>         Attachments: LUCENE-6212.patch
>
>
> IndexWriter already takes an analyzer up-front (via
> IndexWriterConfig), but it also allows you to specify a different one
> for each add/updateDocument.
> I think this is quite dangerous/trappy since it means you can easily
> index tokens for that document that don't match at search-time based
> on the search-time analyzer.
> I think we should remove this trap in 5.0.



