[jira] [Commented] (LUCENE-8284) Add MultiTermsIntervalsSource
[ https://issues.apache.org/jira/browse/LUCENE-8284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16461774#comment-16461774 ]

Matt Weber commented on LUCENE-8284:

Attached a patch that adds a per-segment expansion limit and simply gathers the first terms we come across. I am not sure I like this approach, so I am going to try a version that adds a rewrite method to {{IntervalsSource}}. That would let us use the existing rewrite methods, including the one [~dsmiley] mentioned.

> Add MultiTermsIntervalsSource
> -----------------------------
>
>                 Key: LUCENE-8284
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8284
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Matt Weber
>            Priority: Minor
>         Attachments: LUCENE-8284.patch, LUCENE-8284.patch
>
>
> Add support for creating an {{IntervalsSource}} from multi-term expansions
> such as wildcards, regular expressions, etc.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)

To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
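The "first terms we come across" behaviour described in this comment can be illustrated with a minimal sketch. It is not taken from the attached patch: the class name `FirstTermsExpansion`, the method `expand`, and the use of a sorted set as a stand-in for one segment's term dictionary are all assumptions made for illustration, and only prefix expansion is shown.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;

/** Hypothetical helper; none of these names come from the attached patch. */
final class FirstTermsExpansion {

  /**
   * Collects up to maxExpansions terms that start with prefix, in the sorted
   * order of the (assumed) per-segment term dictionary, and stops as soon as
   * the limit is reached.
   */
  static List<String> expand(NavigableSet<String> segmentTerms, String prefix, int maxExpansions) {
    if (maxExpansions < 1) {
      throw new IllegalArgumentException("maxExpansions must be >= 1");
    }
    List<String> expanded = new ArrayList<>();
    // tailSet(prefix, true) starts at the first term >= prefix.
    for (String term : segmentTerms.tailSet(prefix, true)) {
      if (!term.startsWith(prefix)) {
        break; // sorted order: no later term can match the prefix
      }
      expanded.add(term);
      if (expanded.size() == maxExpansions) {
        break; // expansion limit reached; remaining matches are ignored
      }
    }
    return expanded;
  }
}
```

One visible consequence of this strategy is that the terms kept depend only on term order, not on how frequent or relevant they are, and each segment may keep a different subset.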
[jira] [Commented] (LUCENE-8284) Add MultiTermsIntervalsSource
[ https://issues.apache.org/jira/browse/LUCENE-8284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16461601#comment-16461601 ]

David Smiley commented on LUCENE-8284:

LUCENE-6513 is related: it is about calculating the top terms more intelligently, using document frequency (DF) and total term frequency (TTF). I have some WIP that refreshes an existing patch on that ticket.
[jira] [Commented] (LUCENE-8284) Add MultiTermsIntervalsSource
[ https://issues.apache.org/jira/browse/LUCENE-8284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16461126#comment-16461126 ]

Jim Ferenczi commented on LUCENE-8284:

bq. Is there anything I can do to move this forward? Add an expansion limit? Rewrite support?

Yes. As I said in my previous comment, we should have a way to limit the expansion through top-terms rewrite. I discussed this with [~jpountz] and [~romseygeek], and we agreed that the limit should be explicit (a parameter of the source) and that we should never create a disjunction bigger than BooleanQuery#MAX_CLAUSE_COUNT (a hard limit on the maximum expansion). If these restrictions are added, we have no objection to adding these kinds of sources to intervals :).
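The two restrictions agreed on here (an explicit limit supplied by the caller, and a hard cap that the limit may never exceed) could be enforced at construction time. The following is only a sketch under stated assumptions: `ExpansionLimit` and `HARD_LIMIT` are hypothetical names, and the value 1024 is assumed as a stand-in for the BooleanQuery maximum clause count, which is configurable in Lucene.

```java
/** Hypothetical parameter holder; the class and field names are illustrative only. */
final class ExpansionLimit {

  // Assumed stand-in for the BooleanQuery max clause count (1024 is Lucene's default).
  static final int HARD_LIMIT = 1024;

  final int maxExpansions;

  /** The limit has no default: callers must state how many terms they accept. */
  ExpansionLimit(int maxExpansions) {
    if (maxExpansions < 1 || maxExpansions > HARD_LIMIT) {
      throw new IllegalArgumentException(
          "maxExpansions must be in [1, " + HARD_LIMIT + "], got " + maxExpansions);
    }
    this.maxExpansions = maxExpansions;
  }
}
```

Rejecting an oversized limit up front, instead of silently clamping it, keeps the caller aware of the cap; clamping would be an equally plausible design.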
[jira] [Commented] (LUCENE-8284) Add MultiTermsIntervalsSource
[ https://issues.apache.org/jira/browse/LUCENE-8284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16461092#comment-16461092 ]

Matt Weber commented on LUCENE-8284:

[~jpountz] [~jim.ferenczi] I disagree. There are many cases where multi-term expansions are not expensive, and/or where I, as a user, understand the consequences and am willing to live with them. Indexing techniques (ngrams, etc.) only go so far, and there are many cases where they might actually introduce issues once you're no longer working on a tiny dataset. I feel the kind of restrictions or optimizations you describe should be added at the usage level, i.e. in Solr or Elasticsearch.

Is there anything I can do to move this forward? Add an expansion limit? Rewrite support?
[jira] [Commented] (LUCENE-8284) Add MultiTermsIntervalsSource
[ https://issues.apache.org/jira/browse/LUCENE-8284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16460819#comment-16460819 ]

Jim Ferenczi commented on LUCENE-8284:

I agree with Adrien: just exposing multi-term queries without limitations is not going to scale and will certainly produce OOMEs or very slow queries. (Edge) ngram indexing should be used for prefix and infix matching, but it won't work for fuzzy or regex queries. Another option would be to accept multi-term queries, but only if they use top-terms rewriting. So it would select only the top terms (and we can limit the number to BooleanQuery#MAX_CLAUSE_COUNT) and translate them into term intervals? maxExpansions should be a mandatory parameter for this kind of source, to make sure users are aware that only a subset of the matching terms is considered.
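The top-terms idea proposed above can be sketched as a bounded selection over term statistics. This is an illustration, not Lucene's rewrite implementation: `TopTermsSelector`, `TermStat`, and `topTerms` are hypothetical names, the input map of term to document frequency is assumed to be already collected, and ranking by document frequency alone (ties broken by term order) is an assumption.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

/** Hypothetical sketch of bounded top-terms selection; not Lucene's rewrite code. */
final class TopTermsSelector {

  static final class TermStat {
    final String term;
    final int docFreq;

    TermStat(String term, int docFreq) {
      this.term = term;
      this.docFreq = docFreq;
    }
  }

  /** Returns at most maxExpansions terms, highest document frequency first. */
  static List<String> topTerms(Map<String, Integer> docFreqs, int maxExpansions) {
    if (maxExpansions < 1) {
      throw new IllegalArgumentException("maxExpansions must be >= 1");
    }
    // Weakest entry first: lower docFreq, then the lexicographically larger term.
    Comparator<TermStat> weakestFirst = (a, b) -> {
      int cmp = Integer.compare(a.docFreq, b.docFreq);
      return cmp != 0 ? cmp : b.term.compareTo(a.term);
    };
    PriorityQueue<TermStat> heap = new PriorityQueue<>(weakestFirst);
    for (Map.Entry<String, Integer> e : docFreqs.entrySet()) {
      heap.offer(new TermStat(e.getKey(), e.getValue()));
      if (heap.size() > maxExpansions) {
        heap.poll(); // evict the weakest term so the heap never exceeds the limit
      }
    }
    List<TermStat> kept = new ArrayList<>(heap);
    kept.sort(weakestFirst.reversed());
    List<String> result = new ArrayList<>();
    for (TermStat stat : kept) {
      result.add(stat.term);
    }
    return result;
  }
}
```

Because the heap never holds more than maxExpansions entries, memory stays bounded regardless of how many terms match, which is the scaling concern raised in this comment.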
[jira] [Commented] (LUCENE-8284) Add MultiTermsIntervalsSource
[ https://issues.apache.org/jira/browse/LUCENE-8284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16460767#comment-16460767 ]

Adrien Grand commented on LUCENE-8284:

I don't think we should expose intervals on multi-term queries: it doesn't scale. I know spans have SpanMultiTermQueryWrapper, which does the same, but I've seen users run into issues with it as soon as they were no longer working on a tiny dataset. Intervals on prefixes or infixes should be doable by indexing (edge) ngrams. Maybe there are things we could do to make that easier to use? This might be just documentation.
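To make the (edge) ngram suggestion concrete: if every leading substring of a token is indexed as its own term, a prefix search becomes an exact lookup of a single term and needs no expansion at query time. The sketch below only shows how the grams are derived from one token. The class `EdgeNGrams` and its method are hypothetical; in a real index this step would be done by an analysis-chain filter, which this example does not use.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of edge n-gram generation for a single token. */
final class EdgeNGrams {

  /**
   * Returns the leading substrings of token with lengths minGram..maxGram.
   * Lengths are counted in UTF-16 chars, which is a simplification.
   */
  static List<String> edgeNGrams(String token, int minGram, int maxGram) {
    if (minGram < 1 || maxGram < minGram) {
      throw new IllegalArgumentException("require 1 <= minGram <= maxGram");
    }
    List<String> grams = new ArrayList<>();
    int longest = Math.min(maxGram, token.length());
    for (int len = minGram; len <= longest; len++) {
      grams.add(token.substring(0, len));
    }
    return grams;
  }
}
```

With minGram 2 and maxGram 4, the token "lucene" yields "lu", "luc" and "luce", so a search for the prefix "luc" matches the indexed term "luc" directly. The trade-off is a larger index, and prefixes longer than maxGram are not covered.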
[jira] [Commented] (LUCENE-8284) Add MultiTermsIntervalsSource
[ https://issues.apache.org/jira/browse/LUCENE-8284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16458673#comment-16458673 ]

Matt Weber commented on LUCENE-8284:

[~romseygeek] [~jimczi] Since these sources expand terms per segment, the terms are not available when the {{IntervalWeight}} is created, which results in a null {{simScorer}} if these are the only sources. For now I use a constant {{1.0f}} in this case. I am not sure whether this is the best approach.