[jira] [Updated] (SOLR-8671) Date statistics: make "sum" a double instead of a long/date
[ https://issues.apache.org/jira/browse/SOLR-8671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tom Hill updated SOLR-8671:
---
    Attachment: 0001-change-date-sum-to-double.patch

This patch requires that the patch in SOLR-8420 be applied first, because it modifies a test added by the SOLR-8420 patch.

> Date statistics: make "sum" a double instead of a long/date
> ---
>
> Key: SOLR-8671
> URL: https://issues.apache.org/jira/browse/SOLR-8671
> Project: Solr
> Issue Type: Improvement
> Reporter: Tomás Fernández Löbbe
> Fix For: master
> Attachments: 0001-change-date-sum-to-double.patch
>
> Currently {{DateStatsValues#sum}} is defined as long, and returned as a date.
> This has two problems: It overflows (with ~6 million values), and the return
> value can be a date like {{122366-06-12T21:06:06Z}}.
> I think we should just change this stat to a double. See SOLR-8420.
> I think we can change this only in master, since it will break backward
> compatibility.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
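The "~6 million values" figure in the description can be checked with a little arithmetic. The sketch below is not taken from the patch; the class name, method names and the sample timestamp are illustrative assumptions. It divides Long.MAX_VALUE by a typical epoch-millisecond value, then shows the long sum wrapping where a double sum does not.

```java
public class DateSumOverflowDemo {
    // Number of copies of `millis` that can be added before a long sum overflows.
    static long valuesBeforeOverflow(long millis) {
        return Long.MAX_VALUE / millis;
    }

    // Sum of `count` copies of `millis` in long arithmetic; wraps silently on overflow.
    static long sumAsLong(long millis, long count) {
        return millis * count;
    }

    // The same sum computed as a double, which trades exactness for range.
    static double sumAsDouble(long millis, long count) {
        return (double) millis * count;
    }

    public static void main(String[] args) {
        // Assumed sample value: mid-December 2015 in epoch milliseconds.
        long millis = 1_450_000_000_000L;
        long limit = valuesBeforeOverflow(millis);
        System.out.println("values before overflow: " + limit);
        System.out.println("long sum past the limit: " + sumAsLong(millis, limit + 1));
        System.out.println("double sum past the limit: " + sumAsDouble(millis, limit + 1));
    }
}
```

With present-day timestamps the limit comes out at a little over six million values, which is consistent with the estimate quoted in the issue.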
[jira] [Updated] (SOLR-8420) Date statistics: sumOfSquares overflows long
[ https://issues.apache.org/jira/browse/SOLR-8420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tom Hill updated SOLR-8420:
---
    Attachment: 0001-Fix-overflow-in-date-statistics.patch

This latest version of the patch adds an allowance in the tests for floating-point error in the computation of specific stats. It also fixes the error in the test that Tomas noted.

> Date statistics: sumOfSquares overflows long
>
> Key: SOLR-8420
> URL: https://issues.apache.org/jira/browse/SOLR-8420
> Project: Solr
> Issue Type: Bug
> Components: SearchComponents - other
> Affects Versions: 5.4
> Reporter: Tom Hill
> Priority: Minor
> Attachments: 0001-Fix-overflow-in-date-statistics.patch,
> 0001-Fix-overflow-in-date-statistics.patch,
> 0001-Fix-overflow-in-date-statistics.patch, StdDev.java
>
> The values for Dates are large enough that squaring them overflows a "long"
> field. This should be converted to a double.
> StatsValuesFactory.java, line 755, DateStatsValues#updateTypeSpecificStats:
> add a cast to double:
>     sumOfSquares += ( (double)value * value * count);
[jira] [Commented] (SOLR-8420) Date statistics: sumOfSquares overflows long
[ https://issues.apache.org/jira/browse/SOLR-8420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15073165#comment-15073165 ]

Tom Hill commented on SOLR-8420:

Certainly could be changed. It looks like it currently would overflow if you are looking at more than about 6 million dates, which is a pretty small number today. The downside is a small loss of precision for smaller datasets. I think it probably should be changed. I'll update the patch.
[jira] [Updated] (SOLR-8420) Date statistics: sumOfSquares overflows long
[ https://issues.apache.org/jira/browse/SOLR-8420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tom Hill updated SOLR-8420:
---
    Attachment: StdDev.java

Just a quick demo of why TestDistributedSearch is failing when run with the patch. When TestDistributedSearch#test is run with two partitions, it gets a slightly different value than when run on one partition. The two results are:

    100010100011010110010111011100010110010101111000110
    100010100011010110010111011100010110010101111000101

This matches the numbers seen in TestDistributedSearch. It looks like we need to add some delta to the comparison of doubles in BaseDistributedSearchTestCase:

    public static String compare(Object a, Object b, int flags, Map handle)
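The effect described in this message can be sketched without Solr. The example below is not the attached StdDev.java; the class, the sample dates and the tolerance are assumptions made for illustration. It accumulates a sum of squares once over the whole array and once as two partial sums that are then merged, prints the raw bits of both doubles, and compares them with a relative tolerance rather than exact equality.

```java
public class PartitionedSumDemo {
    // Sum of squares over values[from, to), accumulated as a double.
    static double sumOfSquares(long[] values, int from, int to) {
        double sum = 0.0;
        for (int i = from; i < to; i++) {
            sum += (double) values[i] * values[i];
        }
        return sum;
    }

    // Relative-tolerance comparison, standing in for a delta in the test compare.
    static boolean nearlyEqual(double a, double b, double relTol) {
        if (a == b) {
            return true;
        }
        double scale = Math.max(Math.abs(a), Math.abs(b));
        return Math.abs(a - b) <= relTol * scale;
    }

    public static void main(String[] args) {
        // Made-up sample dates: roughly one per day starting in December 2015.
        long[] dates = new long[1000];
        for (int i = 0; i < dates.length; i++) {
            dates[i] = 1_450_000_000_000L + i * 86_400_000L + i * 7L;
        }
        double onePartition = sumOfSquares(dates, 0, dates.length);
        double twoPartitions = sumOfSquares(dates, 0, 500) + sumOfSquares(dates, 500, dates.length);

        // The bit patterns may differ in the lowest bits because the additions
        // are grouped differently; the tolerant comparison still passes.
        System.out.println(Long.toBinaryString(Double.doubleToLongBits(onePartition)));
        System.out.println(Long.toBinaryString(Double.doubleToLongBits(twoPartitions)));
        System.out.println(nearlyEqual(onePartition, twoPartitions, 1e-12));
    }
}
```

A relative tolerance is used here because the sums are on the order of 1e27, where any fixed absolute delta would be either meaningless or far too loose.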
[jira] [Updated] (SOLR-8420) Date statistics: sumOfSquares overflows long
[ https://issues.apache.org/jira/browse/SOLR-8420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tom Hill updated SOLR-8420:
---
    Attachment: 0001-Fix-overflow-in-date-statistics.patch

Fixes overflow in stddev, too. Not ready to commit. I still have to fix a rounding error in TestDistributed.
[jira] [Updated] (SOLR-8420) Date statistics: sumOfSquares overflows long
[ https://issues.apache.org/jira/browse/SOLR-8420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tom Hill updated SOLR-8420:
---
    Attachment: 0001-Fix-overflow-in-date-statistics.patch

One line fix, plus tests.
[jira] [Issue Comment Deleted] (SOLR-8420) Date statistics: sumOfSquares overflows long
[ https://issues.apache.org/jira/browse/SOLR-8420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tom Hill updated SOLR-8420:
---
    Comment: was deleted

(was: One line fix, plus tests.)
[jira] [Created] (SOLR-8420) Date statistics: sumOfSquares overflows long
Tom Hill created SOLR-8420:
--

Summary: Date statistics: sumOfSquares overflows long
Key: SOLR-8420
URL: https://issues.apache.org/jira/browse/SOLR-8420
Project: Solr
Issue Type: Bug
Components: SearchComponents - other
Affects Versions: 5.4
Reporter: Tom Hill
Priority: Minor

The values for Dates are large enough that squaring them overflows a "long" field. This should be converted to a double.

StatsValuesFactory.java, line 755, DateStatsValues#updateTypeSpecificStats: add a cast to double:

    sumOfSquares += ( (double)value * value * count);
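To see why the cast matters, here is a minimal standalone sketch. It does not reproduce StatsValuesFactory; the class name, helper names and the sample timestamp are assumptions. It squares an epoch-millisecond value once in long arithmetic and once after casting to double, and uses Math.multiplyExact to detect the overflow.

```java
public class DateSumOfSquaresDemo {
    // Squares the value in long arithmetic; the result wraps silently on overflow.
    static long squareAsLong(long value) {
        return value * value;
    }

    // Casts to double before multiplying, as the issue suggests.
    static double squareAsDouble(long value) {
        return (double) value * value;
    }

    // Math.multiplyExact throws ArithmeticException when the product does not fit in a long.
    static boolean squareOverflowsLong(long value) {
        try {
            Math.multiplyExact(value, value);
            return false;
        } catch (ArithmeticException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // Assumed sample value: mid-December 2015 in epoch milliseconds.
        long millis = 1_450_000_000_000L;
        System.out.println("overflows long: " + squareOverflowsLong(millis));
        System.out.println("long square:    " + squareAsLong(millis));
        System.out.println("double square:  " + squareAsDouble(millis));
    }
}
```

Since the square root of Long.MAX_VALUE is only about 3.04e9, any timestamp later than the first few weeks of 1970 overflows when squared as a long.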
[jira] [Commented] (SOLR-8318) SimpleQueryParser doesn't use MultiTermAnalysis for Fuzzy Queries
[ https://issues.apache.org/jira/browse/SOLR-8318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15034166#comment-15034166 ]

Tom Hill commented on SOLR-8318:

To answer my Nov 19th comment: I don't think I need to do anything with setRewriteMethod for fuzzy queries. QueryParserBase has a newPrefixQuery() and a newFuzzyQuery(), and it does call setRewriteMethod in newPrefixQuery, but not in newFuzzyQuery.

> SimpleQueryParser doesn't use MultiTermAnalysis for Fuzzy Queries
> -
>
> Key: SOLR-8318
> URL: https://issues.apache.org/jira/browse/SOLR-8318
> Project: Solr
> Issue Type: Bug
> Components: query parsers
> Affects Versions: 5.3
> Reporter: Tom Hill
> Assignee: Erick Erickson
> Priority: Trivial
> Attachments: sqp_fuzzy_multiterm.patch
>
> Fuzzy queries in SimpleQParserPlugin don't seem to use the multi-term
> analysis chain. Prefix queries do, and SolrQueryParser does use multi-term
> analysis for fuzzy queries, so it seems like SimpleQParserPlugin should as
> well.
[jira] [Commented] (SOLR-8318) SimpleQueryParser doesn't use MultiTermAnalysis for Fuzzy Queries
[ https://issues.apache.org/jira/browse/SOLR-8318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15034150#comment-15034150 ]

Tom Hill commented on SOLR-8318:

Right. The SimpleQueryParser has implementations of newFuzzyQuery and newPrefixQuery that just loop through the weights and build a Boolean query.

The existing implementation for SimpleQParser in SimpleQParserPlugin does override newPrefixQuery to use the multi-term analysis chain. It does not call the base class implementation. (The base class is basically a loop and a new. I looked at using a lambda to share a bit more code, but I found that more confusing.)

The problem I was trying to fix is that the existing implementation does not override newFuzzyQuery to use the multi-term analysis chain for fuzzy queries. So I basically duplicated what had been done for newPrefixQuery in SimpleQParser.
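The override pattern described in this comment can be shown with a toy model. Nothing below is a Lucene or Solr class: BaseParser, AnalyzingParser, the string "queries" and the lowercasing analyzer are all hypothetical stand-ins, chosen only to show a subclass that repeats the base class loop for both prefix and fuzzy queries and runs the term through a multi-term analysis step first.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.UnaryOperator;

public class FuzzyMultiTermSketch {
    // Hypothetical base parser: one clause per weighted field, text used verbatim.
    static class BaseParser {
        protected final Map<String, Float> weights;

        BaseParser(Map<String, Float> weights) {
            this.weights = weights;
        }

        List<String> newPrefixQuery(String text) {
            List<String> clauses = new ArrayList<>();
            for (Map.Entry<String, Float> entry : weights.entrySet()) {
                clauses.add(entry.getKey() + ":" + text + "*");
            }
            return clauses;
        }

        List<String> newFuzzyQuery(String text, int fuzziness) {
            List<String> clauses = new ArrayList<>();
            for (Map.Entry<String, Float> entry : weights.entrySet()) {
                clauses.add(entry.getKey() + ":" + text + "~" + fuzziness);
            }
            return clauses;
        }
    }

    // Hypothetical subclass: both overrides repeat the loop (no call to super)
    // and analyze the term before building each clause.
    static class AnalyzingParser extends BaseParser {
        private final UnaryOperator<String> multiTermAnalyzer;

        AnalyzingParser(Map<String, Float> weights, UnaryOperator<String> multiTermAnalyzer) {
            super(weights);
            this.multiTermAnalyzer = multiTermAnalyzer;
        }

        @Override
        List<String> newPrefixQuery(String text) {
            List<String> clauses = new ArrayList<>();
            for (Map.Entry<String, Float> entry : weights.entrySet()) {
                clauses.add(entry.getKey() + ":" + multiTermAnalyzer.apply(text) + "*");
            }
            return clauses;
        }

        @Override
        List<String> newFuzzyQuery(String text, int fuzziness) {
            List<String> clauses = new ArrayList<>();
            for (Map.Entry<String, Float> entry : weights.entrySet()) {
                clauses.add(entry.getKey() + ":" + multiTermAnalyzer.apply(text) + "~" + fuzziness);
            }
            return clauses;
        }
    }

    public static void main(String[] args) {
        Map<String, Float> weights = new LinkedHashMap<>();
        weights.put("title", 2.0f);
        weights.put("body", 1.0f);
        BaseParser parser = new AnalyzingParser(weights, s -> s.toLowerCase(Locale.ROOT));
        System.out.println(parser.newPrefixQuery("FooBar"));
        System.out.println(parser.newFuzzyQuery("FooBar", 2));
    }
}
```

In this model, leaving out the newFuzzyQuery override would let fuzzy terms reach the index unanalyzed while prefix terms are lowercased, which is the kind of inconsistency the issue reports.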
[jira] [Updated] (SOLR-8318) SimpleQueryParser doesn't use MultiTermAnalysis for Fuzzy Queries
[ https://issues.apache.org/jira/browse/SOLR-8318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tom Hill updated SOLR-8318:
---
    Attachment: sqp_fuzzy_multiterm.patch
[jira] [Updated] (SOLR-8318) SimpleQueryParser doesn't use MultiTermAnalysis for Fuzzy Queries
[ https://issues.apache.org/jira/browse/SOLR-8318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tom Hill updated SOLR-8318:
---
    Attachment: (was: sqp_fuzzy_multiterm.patch)
[jira] [Commented] (SOLR-8318) SimpleQueryParser doesn't use MultiTermAnalysis for Fuzzy Queries
[ https://issues.apache.org/jira/browse/SOLR-8318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15014551#comment-15014551 ]

Tom Hill commented on SOLR-8318:

In newPrefixQuery() in SimpleQParserPlugin, new queries are created like this:

    prefix = sf.getType().getPrefixQuery(qParser, sf, term);

which adds a rewrite method:

    query.setRewriteMethod(sf.getType().getRewriteMethod(parser, sf));

Is that relevant for fuzzy queries? Or can I just do:

    fuzzy = new FuzzyQuery(new Term(entry.getKey(), text), fuzziness);
[jira] [Updated] (SOLR-8318) SimpleQueryParser doesn't use MultiTermAnalysis for Fuzzy Queries
[ https://issues.apache.org/jira/browse/SOLR-8318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tom Hill updated SOLR-8318:
---
    Attachment: sqp_fuzzy_multiterm.patch
[jira] [Created] (SOLR-8318) SimpleQueryParser doesn't use MultiTermAnalysis for Fuzzy Queries
Tom Hill created SOLR-8318:
--

Summary: SimpleQueryParser doesn't use MultiTermAnalysis for Fuzzy Queries
Key: SOLR-8318
URL: https://issues.apache.org/jira/browse/SOLR-8318
Project: Solr
Issue Type: Bug
Components: query parsers
Affects Versions: 5.3
Reporter: Tom Hill
Priority: Trivial

Fuzzy queries in SimpleQParserPlugin don't seem to use the multi-term analysis chain. Prefix queries do, and SolrQueryParser does use multi-term analysis for fuzzy queries, so it seems like SimpleQParserPlugin should as well.
[jira] [Commented] (SOLR-2636) Explain doesn't deal with negative only queries completely correctly
[ https://issues.apache.org/jira/browse/SOLR-2636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13060220#comment-13060220 ]

Tom Hill commented on SOLR-2636:

@hoss Shall I close this one?

@yonik Would it make sense for SolrQueryParser to just call makeQueryable?

    @Override
    public Query parse(String query) throws ParseException {
        return QueryUtils.makeQueryable(super.parse(query));
    }

Then the other three calls to makeQueryable can be deleted, I think. And it fixes the problem with explain.

> Explain doesn't deal with negative only queries completely correctly
>
> Key: SOLR-2636
> URL: https://issues.apache.org/jira/browse/SOLR-2636
> Project: Solr
> Issue Type: Bug
> Components: search
> Affects Versions: 3.2
> Reporter: Tom Hill
> Assignee: Yonik Seeley
> Priority: Trivial
> Attachments: SOLR-2636
>
> If you do a negative only query, such as -author:[* TO *], explain returns
> NaN for the score. The query executes correctly, however.
> To execute negative only queries, Solr calls QueryUtils.makeQueryable, and
> everything works correctly. But explain doesn't call this, and coord ends up
> dividing by zero.
> One could fix this by fixing the call to explain, which is easy, or perhaps
> by fixing the query parser to generate the query that way in the first place.
> (It looks like extended dismax does the latter, and so shouldn't have
> problems).

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-2636) Explain doesn't deal with negative only queries completely correctly
[ https://issues.apache.org/jira/browse/SOLR-2636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tom Hill updated SOLR-2636:
---
    Attachment: SOLR-2636

Trivial patch which just fixes the call to explain.
[jira] [Created] (SOLR-2636) Explain doesn't deal with negative only queries completely correctly
Explain doesn't deal with negative only queries completely correctly

Key: SOLR-2636
URL: https://issues.apache.org/jira/browse/SOLR-2636
Project: Solr
Issue Type: Bug
Components: search
Affects Versions: 3.2
Reporter: Tom Hill
Priority: Trivial

If you do a negative only query, such as -author:[* TO *], explain returns NaN for the score. The query executes correctly, however.

To execute negative only queries, Solr calls QueryUtils.makeQueryable, and everything works correctly. But explain doesn't call this, and coord ends up dividing by zero.

One could fix this by fixing the call to explain, which is easy, or perhaps by fixing the query parser to generate the query that way in the first place. (It looks like extended dismax does the latter, and so shouldn't have problems).
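The NaN can be reproduced with plain float arithmetic. The sketch below assumes a coord-style factor computed as matched clauses divided by the maximum number of scoring clauses; the coord method here is a hypothetical stand-in written for illustration, not code copied from Lucene. A negative-only query has no positive clauses, so both numbers are zero.

```java
public class CoordNaNDemo {
    // Illustrative coord-style factor: matched clauses over total scoring clauses.
    static float coord(int overlap, int maxOverlap) {
        return overlap / (float) maxOverlap;
    }

    public static void main(String[] args) {
        // One of two positive clauses matched: an ordinary factor.
        System.out.println(coord(1, 2));

        // No positive clauses at all: 0 / 0.0f is NaN in IEEE 754 arithmetic.
        float negativeOnly = coord(0, 0);
        System.out.println(negativeOnly);

        // Any score multiplied by NaN is NaN, so the value propagates upward.
        System.out.println(2.5f * negativeOnly);
    }
}
```

Wrapping the query so that it contains a positive match-all clause before it is explained, as makeQueryable does for execution, keeps the denominator non-zero.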