[ https://issues.apache.org/jira/browse/LUCENE-6255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Adrien Grand updated LUCENE-6255:
---------------------------------
    Attachment: LUCENE-6255.patch

Here is a middle ground proposal:
 - enforce that terms are added in order of positions
 - enforce that positions are all positive
 - PhraseQuery still accepts that the first position is greater than 0 but PhraseWeight does not
 - PhraseQuery.rewrite takes care of rebasing positions if the first one is not 0

This way, PhraseQuery would still be friendly to query parsers that create phrase queries from a token stream.

> PhraseQuery inconsistencies
> ---------------------------
>
>                 Key: LUCENE-6255
>                 URL: https://issues.apache.org/jira/browse/LUCENE-6255
>             Project: Lucene - Core
>          Issue Type: Bug
>            Reporter: Adrien Grand
>            Assignee: Adrien Grand
>         Attachments: LUCENE-6255.patch
>
>
> PhraseQuery behaves quite inconsistently when the position of the first term
> is greater than 0. Here is an example:
> {noformat}
> Directory dir = newDirectory();
> RandomIndexWriter iw = new RandomIndexWriter(random(), dir);
> FieldType customType = new FieldType(TextField.TYPE_NOT_STORED);
> customType.setOmitNorms(true);
> Field f = new Field("body", "", customType);
> Document doc = new Document();
> doc.add(f);
> f.setStringValue("one quick fox");
> iw.addDocument(doc);
> IndexReader ir = iw.getReader();
> iw.close();
> IndexSearcher is = newSearcher(ir);
>
> PhraseQuery pq = new PhraseQuery();
> pq.add(new Term("body", "quick"), 0);
> pq.add(new Term("body", "fox"), 1);
> System.out.println(is.search(pq, 1).totalHits); // 1
> pq = new PhraseQuery();
> pq.add(new Term("body", "quick"), 10);
> pq.add(new Term("body", "fox"), 11);
> System.out.println(is.search(pq, 1).totalHits); // 0
>
> pq = new PhraseQuery();
> pq.add(new Term("body", "quick"), 10);
> System.out.println(is.search(pq, 1).totalHits); // 1
>
> pq = new PhraseQuery();
> pq.add(new Term("body", "quick"), 10);
> pq.add(new Term("body", "fox"), 11);
> pq.setSlop(1);
> System.out.println(is.search(pq, 1).totalHits); // 1
>
> ir.close();
> dir.close();
> {noformat}
> The reason is that when you add a term with position P on a PhraseQuery,
> ExactPhraseScorer ignores all positions for this term which are less than P.
> But this is inconsistent:
>  - if you have a single term, it does not work anymore since we rewrite to a
> term query regardless of the position of the term (3rd query)
>  - if you increase the slop, we will use SloppyPhraseScorer which does not
> have this behaviour. (4th query)
> So I think we have two options:
>  - either remove this behaviour and make the positions that are provided to
> PhraseQuery only relative (ie. fix ExactPhraseScorer)
>  - or make it work this way across the board (which means not rewriting to a
> term query when the position is not 0 and fixing SloppyPhraseScorer).
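
For illustration only, the checks and the rebasing step from the middle ground proposal could look roughly like the sketch below. It is plain Java with no Lucene dependency and is not taken from LUCENE-6255.patch: the class name `PhrasePositions` and the methods `validate` and `rebase` are hypothetical. It also assumes that "positive" means "not negative" (0 is allowed) and that "in order" means non-decreasing.

```java
import java.util.Arrays;

public class PhrasePositions {

    /**
     * Checks the two invariants from the proposal, under the assumptions
     * stated above: no negative position, and positions never decrease.
     */
    static void validate(int[] positions) {
        int previous = 0;
        for (int position : positions) {
            if (position < 0) {
                throw new IllegalArgumentException(
                    "Positions must not be negative, got " + position);
            }
            if (position < previous) {
                throw new IllegalArgumentException(
                    "Positions must be added in order, got " + position
                        + " after " + previous);
            }
            previous = position;
        }
    }

    /**
     * Returns a copy of the positions shifted so that the first one is 0.
     * Only the relative offsets between terms are preserved.
     */
    static int[] rebase(int[] positions) {
        validate(positions);
        if (positions.length == 0 || positions[0] == 0) {
            return positions.clone(); // nothing to shift
        }
        int offset = positions[0];
        int[] rebased = new int[positions.length];
        for (int i = 0; i < positions.length; i++) {
            rebased[i] = positions[i] - offset;
        }
        return rebased;
    }

    public static void main(String[] args) {
        // The "quick"@10, "fox"@11 query from the issue becomes 0, 1.
        System.out.println(Arrays.toString(rebase(new int[] {10, 11}))); // [0, 1]
    }
}
```

With a step like this in rewrite, the second query in the example (positions 10 and 11) would be scored with the same relative positions as the first one (0 and 1), which matches the "positions are only relative" option from the issue description.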