[jira] [Commented] (LUCENE-4511) TermsFilter might return wrong results if a field is not indexed or not present in the index
[ https://issues.apache.org/jira/browse/LUCENE-4511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13610617#comment-13610617 ]

Commit Tag Bot commented on LUCENE-4511:

[branch_4x commit] Simon Willnauer
http://svn.apache.org/viewvc?view=revisionrevision=1404132

LUCENE-4511: TermsFilter might return wrong results if a field is not indexed or not present in the index

TermsFilter might return wrong results if a field is not indexed or not present in the index

Key: LUCENE-4511
URL: https://issues.apache.org/jira/browse/LUCENE-4511
Project: Lucene - Core
Issue Type: Bug
Components: modules/other
Affects Versions: 4.0, 4.1, 5.0
Reporter: Simon Willnauer
Assignee: Simon Willnauer
Fix For: 4.1, 5.0
Attachments: LUCENE-4511.patch, LUCENE-4511.patch, LUCENE-4511.patch, LUCENE-4511.patch, LUCENE-4511.patch

TermsFilter returns if a term returns null from AIR#terms(term) while it should just continue. I will upload a test fix shortly

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4511) TermsFilter might return wrong results if a field is not indexed or not present in the index
[ https://issues.apache.org/jira/browse/LUCENE-4511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13487733#comment-13487733 ]

Michael McCandless commented on LUCENE-4511:

> Regarding PrefixCodedTerms I don't think this buys us much here since usecases are not likely to share prefixes?

Well, I suspect TermsFilter is often used with many terms, at which point prefix coding will usually reduce the memory required. Do you have a sense of how many terms typical ElasticSearch usage involves? It seems like it must be highish, since we're compacting terms into a single byte[] in the first place.

It would also be nice to reuse the same existing code instead of inventing yet another way to pack terms into bytes (hrm: FieldCache/DocValues is yet another place where we do this). But I agree we don't need to improve that now ... we can refactor later ... progress not perfection.

Hmm, maybe add an explicit test for the no-terms-provided case? (Maybe I missed it ...). Also: maybe this should not be an IAE but rather just return a filter accepting nothing (I think this is what the current one does today)? I.e., just don't add lastTermsAndField if previousField is null in the ctor.

Otherwise +1 to the new patch. Thanks Simon!
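The prefix-coding trade-off debated above can be illustrated with a toy calculation. This is only a sketch, not Lucene's PrefixCodedTerms: the class and method names are hypothetical, and it counts term bytes only, ignoring the per-term length headers a real encoding would also store. It assumes the terms are already sorted.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;

/** Toy comparison of flat packing vs. prefix coding (hypothetical names, not Lucene code). */
final class PrefixCodingSketch {

    /** Term bytes stored when every term is written out in full. */
    static int flatBytes(List<String> sortedTerms) {
        int total = 0;
        for (String term : sortedTerms) {
            total += term.getBytes(StandardCharsets.UTF_8).length;
        }
        return total;
    }

    /** Term bytes stored when each term keeps only the suffix that differs from its predecessor. */
    static int suffixBytes(List<String> sortedTerms) {
        int total = 0;
        byte[] previous = new byte[0];
        for (String term : sortedTerms) {
            byte[] current = term.getBytes(StandardCharsets.UTF_8);
            total += current.length - sharedPrefixLength(previous, current);
            previous = current;
        }
        return total;
    }

    /** Number of leading bytes that a and b have in common. */
    static int sharedPrefixLength(byte[] a, byte[] b) {
        int limit = Math.min(a.length, b.length);
        int i = 0;
        while (i < limit && a[i] == b[i]) {
            i++;
        }
        return i;
    }

    public static void main(String[] args) {
        List<String> similar = List.of("user:1001", "user:1002", "user:1003");
        List<String> unrelated = List.of("a1", "b2", "c3");
        System.out.println(flatBytes(similar) + " vs " + suffixBytes(similar));
        System.out.println(flatBytes(unrelated) + " vs " + suffixBytes(unrelated));
    }
}
```

Terms that share long prefixes shrink considerably, while unrelated terms gain nothing, which matches both positions in the exchange: the benefit depends on how similar the filter's terms are.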
[jira] [Commented] (LUCENE-4511) TermsFilter might return wrong results if a field is not indexed or not present in the index
[ https://issues.apache.org/jira/browse/LUCENE-4511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13487742#comment-13487742 ]

Simon Willnauer commented on LUCENE-4511:

> Well I suspect TermsFilter is often used with many terms, at which point prefix coding will usually reduce memory required.

The main point here is really reducing the number of objects. In Lucene we often focus on reducing the memory footprint, but even if we don't save much here we are still friendlier to the GC, which is my main concern. That is also why I don't care too much about the prefix-coded stuff. Yet we should consolidate this. I will open another issue.

> Hmm maybe add an explicit test for the no terms provided case?

I will add one before I commit. I don't think we should be smart here. It's likely a bug if nothing is provided.
[jira] [Commented] (LUCENE-4511) TermsFilter might return wrong results if a field is not indexed or not present in the index
[ https://issues.apache.org/jira/browse/LUCENE-4511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13487756#comment-13487756 ]

Uwe Schindler commented on LUCENE-4511:

Wasn't there another patch available that uses AutomatonTermsEnum with MTQ to provide the same functionality? The automaton was this Daciuk-Mihov thingie. Maybe we can make a MultiTermQuery out of this one (the Filter is then included via the rewrite mode, too)?
[jira] [Commented] (LUCENE-4511) TermsFilter might return wrong results if a field is not indexed or not present in the index
[ https://issues.apache.org/jira/browse/LUCENE-4511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13487775#comment-13487775 ]

Uwe Schindler commented on LUCENE-4511:

It is not a problem; I was on vacation, so I only followed the mailing list on my mobile phone...

We should in all cases port this automaton from the tests into core/query-module. I think the main issue is that some tests in core use it, but maybe we can move those tests to the module. Or we move TermsFilter to core...
[jira] [Commented] (LUCENE-4511) TermsFilter might return wrong results if a field is not indexed or not present in the index
[ https://issues.apache.org/jira/browse/LUCENE-4511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13487212#comment-13487212 ]

Simon Willnauer commented on LUCENE-4511:

If nobody objects, I will commit the current patch tomorrow.
[jira] [Commented] (LUCENE-4511) TermsFilter might return wrong results if a field is not indexed or not present in the index
[ https://issues.apache.org/jira/browse/LUCENE-4511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13487249#comment-13487249 ]

Michael McCandless commented on LUCENE-4511:

Do we need to check for the no-terms-provided case (throw IAE)? Else we seem to make a TermsAndField w/ null field? Or is that harmless (matches no docs)...? Maybe we need a test for it ...

I think the ArrayUtil.grow check can be a strict comparison rather than an or-equal one?

Should we shrink the byte[] down in the end?

Can we rename .terms -> .termBytes?

Typo: "don't use case we could pollute the cache here easily" -> "don't use cache since we could pollute the cache here easily"

Typo?: "no freq if we don't need them" -> "no freq since we don't need them"

Maybe equals should also compare the hashCode first (since we compute/cache it up front)?

Should currentTermsAndField be renamed to lastTermsAndField? It's always the last completed field, right?

Hmm, I suddenly realized: I think this code is doing the same thing that FrozenBufferedDeletes does (see PrefixCodedTerms ... which takes even fewer bytes since it shares prefixes). Maybe we should just use that?
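Several of the review points above (growing the buffer only when needed, shrinking the byte[] at the end, and checking the cached hashCode first in equals) can be shown together in a small stand-alone sketch. This is an illustration under assumptions, not the code from the patch: the class name, the offsets array, and the accessor methods are hypothetical, and plain java.util.Arrays calls stand in for Lucene's ArrayUtil.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

/** Toy packed-terms holder (hypothetical; not the TermsFilter implementation). */
final class PackedTermsSketch {
    private final byte[] termBytes; // all terms back to back, trimmed to the exact size
    private final int[] offsets;    // term i occupies termBytes[offsets[i] .. offsets[i + 1])
    private final int hashCode;     // computed once up front

    PackedTermsSketch(List<String> terms) {
        byte[] bytes = new byte[16];
        int[] offs = new int[terms.size() + 1];
        int end = 0;
        for (int i = 0; i < terms.size(); i++) {
            byte[] utf8 = terms.get(i).getBytes(StandardCharsets.UTF_8);
            // Strict comparison: an exact fit does not need a larger array.
            if (end + utf8.length > bytes.length) {
                bytes = Arrays.copyOf(bytes, Math.max(bytes.length * 2, end + utf8.length));
            }
            System.arraycopy(utf8, 0, bytes, end, utf8.length);
            end += utf8.length;
            offs[i + 1] = end;
        }
        this.termBytes = Arrays.copyOf(bytes, end); // shrink to the bytes actually used
        this.offsets = offs;
        this.hashCode = 31 * Arrays.hashCode(termBytes) + Arrays.hashCode(offsets);
    }

    int size() {
        return offsets.length - 1;
    }

    int byteLength() {
        return termBytes.length;
    }

    String term(int i) {
        return new String(termBytes, offsets[i], offsets[i + 1] - offsets[i], StandardCharsets.UTF_8);
    }

    @Override
    public int hashCode() {
        return hashCode;
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj) {
            return true;
        }
        if (!(obj instanceof PackedTermsSketch)) {
            return false;
        }
        PackedTermsSketch other = (PackedTermsSketch) obj;
        // Cheap int comparison first; the array comparisons run only if the hashes agree.
        return hashCode == other.hashCode
            && Arrays.equals(offsets, other.offsets)
            && Arrays.equals(termBytes, other.termBytes);
    }

    public static void main(String[] args) {
        PackedTermsSketch packed = new PackedTermsSketch(List.of("apple", "banana"));
        System.out.println(packed.size() + " terms in " + packed.byteLength() + " bytes");
    }
}
```

Holding two arrays instead of one object per term is the point Simon makes elsewhere in the thread about keeping the object count low for the GC.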
[jira] [Commented] (LUCENE-4511) TermsFilter might return wrong results if a field is not indexed or not present in the index
[ https://issues.apache.org/jira/browse/LUCENE-4511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13487417#comment-13487417 ]

Chris Male commented on LUCENE-4511:

+1 to these improvements.

Another typo: "to optimize for this case and to be fitler-cache friendly we" -> "filter-cache"
[jira] [Commented] (LUCENE-4511) TermsFilter might return wrong results if a field is not indexed or not present in the index
[ https://issues.apache.org/jira/browse/LUCENE-4511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13486021#comment-13486021 ]

Michael McCandless commented on LUCENE-4511:

Nice catch!

Hmm, should we set lastField = field (and termsEnum = null) before the continue (so we don't keep calling fields.terms() on the non-existent field), and then change that bogus if to check whether termsEnum != null?
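The bug and the suggestion above can be modelled without Lucene. The following is a toy sketch under stated assumptions, not the patched TermsFilter: the "index" is a plain map from field name to term postings, the FieldTerm class and the stopOnMissingField flag are hypothetical, and a null map stands in for the null returned for a missing field. With the flag set, the loop returns early as the buggy code did; with it cleared, the missing field is remembered in lastField and its terms are skipped.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Toy model of the missing-field bug (hypothetical names, not Lucene code). */
final class MissingFieldSketch {

    /** A (field, text) pair standing in for Lucene's Term. */
    static final class FieldTerm {
        final String field;
        final String text;

        FieldTerm(String field, String text) {
            this.field = field;
            this.text = text;
        }
    }

    /**
     * Collects the doc ids that match any of the given terms, which must be sorted by field.
     * When stopOnMissingField is true, the method reproduces the reported bug.
     */
    static Set<Integer> matchingDocs(Map<String, Map<String, List<Integer>>> index,
                                     List<FieldTerm> terms,
                                     boolean stopOnMissingField) {
        Set<Integer> result = new TreeSet<>();
        String lastField = null;
        Map<String, List<Integer>> postings = null; // plays the role of termsEnum
        for (FieldTerm term : terms) {
            if (!term.field.equals(lastField)) {
                // Record the field even if it is missing, so the lookup runs once per field.
                lastField = term.field;
                postings = index.get(term.field);
                if (postings == null && stopOnMissingField) {
                    return result; // buggy: terms of all later fields are never checked
                }
            }
            if (postings != null) {
                List<Integer> docs = postings.get(term.text);
                if (docs != null) {
                    result.addAll(docs);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Map<String, List<Integer>>> index =
            Map.of("title", Map.of("lucene", List.of(1, 3)));
        List<FieldTerm> terms = List.of(
            new FieldTerm("author", "simon"),   // field not present in the index
            new FieldTerm("title", "lucene"));
        System.out.println("continue: " + matchingDocs(index, terms, false));
        System.out.println("return:   " + matchingDocs(index, terms, true));
    }
}
```

Because the terms are sorted by field, a missing field that sorts first hides every later field in the buggy variant, which is why the filter could return wrong results.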
[jira] [Commented] (LUCENE-4511) TermsFilter might return wrong results if a field is not indexed or not present in the index
[ https://issues.apache.org/jira/browse/LUCENE-4511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13486165#comment-13486165 ]

Michael McCandless commented on LUCENE-4511:

Wow, this looks good!

We could also make an outer array w/ one entry per field (holding the field name and an array of terms, I guess), instead of the array of booleans marking the transition.

Hmm, but you are calling terms.iterator once per term in each field? Can we call that only once per field instead?

At some point/density it may be worth union-ing the terms into an A and using Terms.intersect ... we've talked about doing that before ... but we should do that separately.
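The "one entry per field" layout suggested above could look roughly like the sketch below. It is a hypothetical illustration, not the patch: the FieldTerms class and groupByField method are invented names, and each input pair is a two-element array of field and term text, assumed to be sorted by field.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy per-field grouping of filter terms (hypothetical names, not Lucene code). */
final class PerFieldLayoutSketch {

    /** One outer entry: a field name plus the terms requested for that field. */
    static final class FieldTerms {
        final String field;
        final List<String> texts = new ArrayList<>();

        FieldTerms(String field) {
            this.field = field;
        }
    }

    /** Groups {field, text} pairs, already sorted by field, into one entry per field. */
    static List<FieldTerms> groupByField(List<String[]> sortedPairs) {
        List<FieldTerms> groups = new ArrayList<>();
        FieldTerms current = null;
        for (String[] pair : sortedPairs) {
            if (current == null || !current.field.equals(pair[0])) {
                current = new FieldTerms(pair[0]);
                groups.add(current);
            }
            current.texts.add(pair[1]);
        }
        return groups;
    }

    public static void main(String[] args) {
        List<String[]> pairs = List.of(
            new String[] {"author", "simon"},
            new String[] {"title", "filter"},
            new String[] {"title", "lucene"});
        for (FieldTerms group : groupByField(pairs)) {
            // A real filter would obtain one terms iterator here and reuse it for every text.
            System.out.println(group.field + " -> " + group.texts);
        }
    }
}
```

With this shape the per-field work, such as obtaining the iterator, sits naturally in the outer loop, so it happens once per field rather than once per term.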