[jira] [Comment Edited] (SOLR-7796) Implement a "gather support info" button
[ https://issues.apache.org/jira/browse/SOLR-7796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17048841#comment-17048841 ] Andrzej Bialecki edited comment on SOLR-7796 at 3/2/20 7:28 AM: Please take a look at {{SolrCLI.AutoscalingTool}} (available as {{bin/solr autoscaling -save}}), which produces comprehensive snapshots of all autoscaling-related state - however, this also includes generally useful things such as: * state of all collections (ClusterState) * state of all nodes (including node properties needed by the autoscaling policy) * full content of ZK data * summary statistics, mostly related to autoscaling All of this information can optionally be redacted (anonymized) in a consistent fashion, so that e.g. node names, IPs, and collection names are consistently replaced with meaningless but user-friendly strings, both in JSON payloads and in the ZK data dump. The redaction part is also available separately in {{RedactionUtils}}. Edit: the COLSTATUS collection admin command gives you the details of the collection layout, optionally including low-level Lucene details, down to per-field size info and stats. was (Author: ab): Please take a look at {{SolrCLI.AutoscalingTool}} (available as {{bin/solr autoscaling -save}}), which produces comprehensive snapshots of all autoscaling-related state - however, this also includes generally useful things such as: * state of all collections (ClusterState) * state of all nodes (including node properties needed by the autoscaling policy) * full content of ZK data * summary statistics, mostly related to autoscaling All of this information can optionally be redacted (anonymized) in a consistent fashion, so that e.g. node names, IPs, and collection names are consistently replaced with meaningless but user-friendly strings, both in JSON payloads and in the ZK data dump. The redaction part is also available separately in {{RedactionUtils}}.
> Implement a "gather support info" button > > > Key: SOLR-7796 > URL: https://issues.apache.org/jira/browse/SOLR-7796 > Project: Solr > Issue Type: Improvement > Components: Admin UI >Reporter: Shawn Heisey >Priority: Minor > > A "gather support info" button in the admin UI would be extremely helpful. > There are some basic pieces of info that we like to have for problem reports > on the user list, so there should be an easy way for a user to gather that > info. > Some of the more basic bits of info would be easy to include in a single file > that's easy to cut/paste -- java version, heap info, core/collection names, > directories, and stats, etc. If available, it should include server info > like memory, commandline args, ZK info, and possibly disk space. > There could be two buttons -- one that gathers smaller info into an XML, > JSON, or .properties structure that can be easily cut/paste into an email > message, and another that gathers larger info like files for configuration > and schema along with the other info (grabbing from zookeeper if running in > cloud mode) and packages it into a .zip file. Because the user list eats > almost all attachments, we would need to come up with some advice for sharing > the zipfile. I hate to ask INFRA for a file sharing service, but that might > not be a bad idea. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
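The consistent redaction described in the comment above has one key property: every occurrence of a sensitive string maps to the same stable placeholder, so cross-references inside the snapshot still line up after anonymization. A minimal Python sketch of that idea (illustrative only - the class and token scheme are made up here, not Solr's actual {{RedactionUtils}} API):

```python
import itertools

class Redactor:
    """Replaces sensitive strings with stable placeholder tokens.

    The same input always maps to the same token, so a redacted
    snapshot stays internally consistent across JSON payloads and
    the ZK data dump.
    """
    def __init__(self, prefix="N"):
        self._counter = itertools.count(1)
        self._mapping = {}  # real value -> placeholder token
        self._prefix = prefix

    def redact(self, value):
        if value not in self._mapping:
            self._mapping[value] = f"{self._prefix}{next(self._counter)}"
        return self._mapping[value]

    def redact_text(self, text, sensitive):
        # Longest-first, so "node1:8983" is replaced before a bare "node1".
        for s in sorted(sensitive, key=len, reverse=True):
            text = text.replace(s, self.redact(s))
        return text

r = Redactor()
doc = '{"leader": "host1:8983", "replica": "host2:8983"}'
zk = "live_nodes: host1:8983, host2:8983"
print(r.redact_text(doc, ["host1:8983", "host2:8983"]))  # {"leader": "N1", "replica": "N2"}
print(r.redact_text(zk, ["host1:8983", "host2:8983"]))   # live_nodes: N1, N2
```

Because the mapping lives across calls, the same node gets the same token in every file of the snapshot, which is what makes the redacted output still debuggable.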
[jira] [Commented] (SOLR-13411) CompositeIdRouter calculates wrong route hash if atomic update is used for route.field
[ https://issues.apache.org/jira/browse/SOLR-13411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17048825#comment-17048825 ] Niko Himanen commented on SOLR-13411: - Thank you for fixing this (y) > CompositeIdRouter calculates wrong route hash if atomic update is used for > route.field > -- > > Key: SOLR-13411 > URL: https://issues.apache.org/jira/browse/SOLR-13411 > Project: Solr > Issue Type: Bug > Components: SolrCloud >Affects Versions: 7.5 >Reporter: Niko Himanen >Assignee: Mikhail Khludnev >Priority: Minor > Fix For: 8.5 > > Attachments: SOLR-13411.patch, SOLR-13411.patch > > > If a collection is created with the router.field parameter to define a > field other than uniqueKey as the route field, and a document update comes in > with the route field updated using atomic update syntax (for example > set=123), the hash for document routing is calculated from "set=123" and not > from 123, the real value, which may lead to the document being routed to the > wrong shard. > > This happens in CompositeIdRouter#sliceHash, where the field value is used > as-is for the hash calculation. > > I think there are two possible solutions to fix this: > a) Allow atomic updates also for route.field, but use the real value > instead of the atomic update syntax to route the document to the right shard. > b) Deny atomic updates for route.field and throw an exception. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
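The routing bug is easy to sketch. The snippet below is a hypothetical illustration, not Solr's code: {{shard_for}} stands in for CompositeIdRouter#sliceHash (Solr actually uses MurmurHash3), and {{extract_route_value}} shows fix (a), routing on the real value instead of the atomic-update wrapper:

```python
import zlib

def shard_for(route_value, num_shards=4):
    # Stand-in hash for CompositeIdRouter#sliceHash; CRC32 is used here
    # only because it is deterministic and in the stdlib.
    return zlib.crc32(str(route_value).encode("utf-8")) % num_shards

def extract_route_value(field_value):
    # Fix (a) from the description: when the route field arrives in atomic
    # update syntax, route on the real value, not on the wrapper map.
    if isinstance(field_value, dict) and len(field_value) == 1 and "set" in field_value:
        return field_value["set"]
    return field_value

doc = {"id": "doc1", "route_field": {"set": "123"}}  # atomic update syntax

buggy = shard_for(doc["route_field"])                       # hashes "{'set': '123'}"
fixed = shard_for(extract_route_value(doc["route_field"]))  # hashes "123"
# buggy and fixed can disagree, which is exactly how the document
# ends up on the wrong shard.
```

The committed fix may of course look different; the point is only that the hash input must be the field's value, not its atomic-update representation.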
[jira] [Updated] (LUCENE-9136) Introduce IVFFlat to Lucene for ANN similarity search
[ https://issues.apache.org/jira/browse/LUCENE-9136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xin-Chun Zhang updated LUCENE-9136: --- Description: Representation learning (RL) has been an established discipline in the machine learning space for decades, but it has drawn tremendous attention lately with the emergence of deep learning. The central problem of RL is to determine an optimal representation of the input data. By embedding the data into a high-dimensional vector, the vector retrieval (VR) method is then applied to search for the relevant items. With the rapid development of RL over the past few years, the technique has been used extensively in industry, from online advertising to computer vision and speech recognition. There exist many open source implementations of VR algorithms, such as Facebook's FAISS and Microsoft's SPTAG, providing various choices for potential users. However, the aforementioned implementations are all written in C++ with no plan to support a Java interface, making them hard to integrate into Java projects or to use for those who are not familiar with C/C++ [https://github.com/facebookresearch/faiss/issues/105]. The algorithms for vector retrieval can be roughly classified into four categories: # Tree-based algorithms, such as KD-tree; # Hashing methods, such as LSH (Locality-Sensitive Hashing); # Product-quantization-based algorithms, such as IVFFlat; # Graph-based algorithms, such as HNSW, SSG, NSG; where IVFFlat and HNSW are the most popular among all the VR algorithms. IVFFlat is better for high-precision applications such as face recognition, while HNSW performs better in general scenarios including recommendation and personalized advertisement. *The recall ratio of IVFFlat can be gradually increased by adjusting the query parameter (nprobe), while it's hard for HNSW to improve its accuracy*. In theory, IVFFlat could achieve a 100% recall ratio.
Recently, the implementation of HNSW (Hierarchical Navigable Small World, LUCENE-9004) for Lucene has made great progress. That issue has drawn the attention of those who are interested in Lucene or hope to use HNSW with Solr/Lucene. As an alternative for solving ANN similarity search problems, IVFFlat is also very popular with many users and supporters. Compared with HNSW, IVFFlat has a smaller index size but requires k-means clustering, while HNSW is faster at query time (no training required) but requires extra storage for saving graphs [indexing 1M vectors|https://github.com/facebookresearch/faiss/wiki/Indexing-1M-vectors]. Another advantage is that IVFFlat can be faster and more accurate when GPU parallel computing is enabled (currently not supported in Java). Both algorithms have their merits and demerits. Since HNSW is now under development, it may be better to provide both implementations (HNSW && IVFFlat) for potential users who face very different scenarios and want more choices. The latest branch is [*lucene-9136-ann-ivfflat*|https://github.com/irvingzhang/lucene-solr/commits/jira/lucene-9136-ann-ivfflat] was: Representation learning (RL) has been an established discipline in the machine learning space for decades, but it has drawn tremendous attention lately with the emergence of deep learning. The central problem of RL is to determine an optimal representation of the input data. By embedding the data into a high-dimensional vector, the vector retrieval (VR) method is then applied to search for the relevant items. With the rapid development of RL over the past few years, the technique has been used extensively in industry, from online advertising to computer vision and speech recognition. There exist many open source implementations of VR algorithms, such as Facebook's FAISS and Microsoft's SPTAG, providing various choices for potential users.
However, the aforementioned implementations are all written in C++ with no plan to support a Java interface, making them hard to integrate into Java projects or to use for those who are not familiar with C/C++ [https://github.com/facebookresearch/faiss/issues/105]. The algorithms for vector retrieval can be roughly classified into four categories: # Tree-based algorithms, such as KD-tree; # Hashing methods, such as LSH (Locality-Sensitive Hashing); # Product-quantization-based algorithms, such as IVFFlat; # Graph-based algorithms, such as HNSW, SSG, NSG; where IVFFlat and HNSW are the most popular among all the VR algorithms. IVFFlat is better for high-precision applications such as face recognition, while HNSW performs better in general scenarios including recommendation and personalized advertisement. *The recall ratio of IVFFlat can be gradually increased by adjusting the query parameter (nprobe), while it's hard for HNSW to improve its accuracy*. In theory, IVFFlat could achieve 100% r
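As a rough illustration of how IVFFlat trades recall for speed via nprobe, here is a toy pure-Python sketch (fixed centroids stand in for a trained k-means coarse quantizer; this is not the proposed Lucene implementation): probing more inverted lists raises recall, and probing all of them degenerates to exact search, which is why IVFFlat can reach 100% recall in theory.

```python
import math, random

def dist(a, b):
    return math.dist(a, b)  # Euclidean distance

class IVFFlat:
    """Toy IVFFlat: a coarse quantizer (fixed centroids here, standing in
    for trained k-means) plus exhaustive search within the nprobe nearest
    inverted lists."""
    def __init__(self, centroids):
        self.centroids = centroids
        self.lists = [[] for _ in centroids]  # one inverted list per centroid

    def add(self, vec):
        c = min(range(len(self.centroids)),
                key=lambda i: dist(vec, self.centroids[i]))
        self.lists[c].append(vec)

    def search(self, query, k, nprobe):
        # Probe only the nprobe closest lists; larger nprobe -> higher recall.
        order = sorted(range(len(self.centroids)),
                       key=lambda i: dist(query, self.centroids[i]))
        cands = [v for i in order[:nprobe] for v in self.lists[i]]
        return sorted(cands, key=lambda v: dist(query, v))[:k]

random.seed(0)
index = IVFFlat(centroids=[(0.0, 0.0), (10.0, 0.0), (0.0, 10.0), (10.0, 10.0)])
data = [(random.uniform(0, 10), random.uniform(0, 10)) for _ in range(200)]
for v in data:
    index.add(v)

query = (5.0, 5.0)
exact = sorted(data, key=lambda v: dist(query, v))[:5]
approx = index.search(query, k=5, nprobe=1)   # fast, may miss neighbors
full = index.search(query, k=5, nprobe=4)     # probes every list -> exact
assert full == exact                          # 100% recall when probing everything
```

The real trade-off (index size, training cost, query latency) depends on the data and parameters; the sketch only shows the mechanism behind the nprobe knob.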
[jira] [Resolved] (LUCENE-9243) TestXYPointDistanceSort failure: point was within the distance , but the bbox doesn't contain it
[ https://issues.apache.org/jira/browse/LUCENE-9243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ignacio Vera resolved LUCENE-9243. -- Fix Version/s: 8.5 Assignee: Ignacio Vera Resolution: Fixed > TestXYPointDistanceSort failure: point was within the distance , but the bbox > doesn't contain it > > > Key: LUCENE-9243 > URL: https://issues.apache.org/jira/browse/LUCENE-9243 > Project: Lucene - Core > Issue Type: Bug >Reporter: Ignacio Vera >Assignee: Ignacio Vera >Priority: Major > Fix For: 8.5 > > Time Spent: 20m > Remaining Estimate: 0h > > Reproduce: > {code:java} > ant test -Dtestcase=TestXYPointDistanceSort -Dtests.method=testRandomHuge > -Dtests.seed=EC212F407CDDF680 -Dtests.multiplier=2 -Dtests.nightly=true > -Dtests.slow=true > -Dtests.linedocsfile=/home/jenkins/jenkins-slave/workspace/Lucene-Solr-NightlyTests-8.x/test-data/enwiki.random.lines.txt > -Dtests.locale=fr-FR -Dtests.timezone=Pacific/Yap -Dtests.asserts=true > -Dtests.file.encoding=UTF-8 {code} > I had a look and this error is similar to LUCENE-7143. The solution should be > similar, add a fudge factor to the bounding box of a circle so we make sure > we include all points that are at the specified distance. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
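The shape of such a fix can be sketched as follows (illustrative Python; the fudge-factor value and function names are made up, not the code committed for this issue): the circle's bounding box is widened by a small relative factor so floating-point error in the distance math can never push an on-circle point outside the box.

```python
import math

# Floating-point error accumulates in the distance computations, so we
# widen the box by a small relative fudge factor instead of using the
# exact radius.  The value below is illustrative only; the real factor
# comes from an error analysis of the actual arithmetic.
FUDGE = 1e-7

def circle_bbox(cx, cy, radius):
    r = radius * (1.0 + FUDGE)  # slightly enlarged radius
    return (cx - r, cy - r, cx + r, cy + r)

def bbox_contains(box, x, y):
    minx, miny, maxx, maxy = box
    return minx <= x <= maxx and miny <= y <= maxy

# A point at exactly the specified distance must fall inside the box even
# when its coordinates were produced by (slightly lossy) trigonometry.
cx, cy, r = 3.0, -1.0, 7.0
theta = 0.123
px, py = cx + r * math.cos(theta), cy + r * math.sin(theta)
assert bbox_contains(circle_bbox(cx, cy, r), px, py)
```

With an exact (unwidened) box, rounding in the point or box coordinates can leave an on-circle point marginally outside, which is precisely the test failure described above.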
[jira] [Commented] (LUCENE-9243) TestXYPointDistanceSort failure: point was within the distance , but the bbox doesn't contain it
[ https://issues.apache.org/jira/browse/LUCENE-9243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17048807#comment-17048807 ] ASF subversion and git services commented on LUCENE-9243: - Commit d9787406f895d166e0d13eb5ce8a98865f1f3e39 in lucene-solr's branch refs/heads/branch_8x from Ignacio Vera [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=d978740 ] LUCENE-9243: Add fudge factor when creating a bounding box of a xycircle (#1278) > TestXYPointDistanceSort failure: point was within the distance , but the bbox > doesn't contain it > > > Key: LUCENE-9243 > URL: https://issues.apache.org/jira/browse/LUCENE-9243 > Project: Lucene - Core > Issue Type: Bug >Reporter: Ignacio Vera >Priority: Major > Time Spent: 20m > Remaining Estimate: 0h > > Reproduce: > {code:java} > ant test -Dtestcase=TestXYPointDistanceSort -Dtests.method=testRandomHuge > -Dtests.seed=EC212F407CDDF680 -Dtests.multiplier=2 -Dtests.nightly=true > -Dtests.slow=true > -Dtests.linedocsfile=/home/jenkins/jenkins-slave/workspace/Lucene-Solr-NightlyTests-8.x/test-data/enwiki.random.lines.txt > -Dtests.locale=fr-FR -Dtests.timezone=Pacific/Yap -Dtests.asserts=true > -Dtests.file.encoding=UTF-8 {code} > I had a look and this error is similar to LUCENE-7143. The solution should be > similar, add a fudge factor to the bounding box of a circle so we make sure > we include all points that are at the specified distance. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-9243) TestXYPointDistanceSort failure: point was within the distance , but the bbox doesn't contain it
[ https://issues.apache.org/jira/browse/LUCENE-9243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17048805#comment-17048805 ] ASF subversion and git services commented on LUCENE-9243: - Commit c653c04bb1717f9813e50d2fd2cef1d323ee6036 in lucene-solr's branch refs/heads/master from Ignacio Vera [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=c653c04 ] LUCENE-9243: Add fudge factor when creating a bounding box of a xycircle (#1278) > TestXYPointDistanceSort failure: point was within the distance , but the bbox > doesn't contain it > > > Key: LUCENE-9243 > URL: https://issues.apache.org/jira/browse/LUCENE-9243 > Project: Lucene - Core > Issue Type: Bug >Reporter: Ignacio Vera >Priority: Major > Time Spent: 20m > Remaining Estimate: 0h > > Reproduce: > {code:java} > ant test -Dtestcase=TestXYPointDistanceSort -Dtests.method=testRandomHuge > -Dtests.seed=EC212F407CDDF680 -Dtests.multiplier=2 -Dtests.nightly=true > -Dtests.slow=true > -Dtests.linedocsfile=/home/jenkins/jenkins-slave/workspace/Lucene-Solr-NightlyTests-8.x/test-data/enwiki.random.lines.txt > -Dtests.locale=fr-FR -Dtests.timezone=Pacific/Yap -Dtests.asserts=true > -Dtests.file.encoding=UTF-8 {code} > I had a look and this error is similar to LUCENE-7143. The solution should be > similar, add a fudge factor to the bounding box of a circle so we make sure > we include all points that are at the specified distance. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] iverase merged pull request #1278: LUCENE-9243: Add fudge factor when creating a bounding box of a xycircle
iverase merged pull request #1278: LUCENE-9243: Add fudge factor when creating a bounding box of a xycircle URL: https://github.com/apache/lucene-solr/pull/1278 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-13502) Investigate using something other than ZooKeeper's "4 letter words" for the admin UI status
[ https://issues.apache.org/jira/browse/SOLR-13502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17048789#comment-17048789 ] Erick Erickson commented on SOLR-13502: --- Hmmm, according to the ZK docs, the admin server is automatically started on port 8080, and I did a simple test to verify that [http://localhost:8080/commands/ruok] works just fine. So apparently, one could just use a straight http call to get this information for the admin page. But... 1> is it even a good idea to use this? 2> I can't really find a simple http connection in the Solr code on a quick look while riding Amtrak with bad internet connections, any hints? > Investigate using something other than ZooKeeper's "4 letter words" for the > admin UI status > --- > > Key: SOLR-13502 > URL: https://issues.apache.org/jira/browse/SOLR-13502 > Project: Solr > Issue Type: Improvement >Reporter: Erick Erickson >Assignee: Erick Erickson >Priority: Major > > ZooKeeper 3.5.5 requires a whitelist of allowed "4 letter words". The only > place I see on a quick look at the Solr code where 4lws are used is in the > admin UI "ZK Status" link. > In order to use the admin UI "ZK Status" link, users will have to modify > their zoo.cfg file with > {code} > 4lw.commands.whitelist=mntr,conf,ruok > {code} > This JIRA is to see if there are alternatives to using 4lw for the admin UI. > This depends on SOLR-8346. If we find an alternative, we need to remove the > additions to the ref guide that mention changing zoo.cfg (just scan for 4lw > in all the .adoc files) and remove SolrZkServer.ZK_WHITELIST_PROPERTY and all > references to it (SolrZkServer and SolrTestCaseJ4). -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
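For reference, the "straight http call" could look like the sketch below (Python, for illustration of the shape of the call; the endpoint path and default port 8080 follow the ZK AdminServer docs, while the JSON response shape is an assumption to verify against a real server):

```python
import json
from urllib.request import urlopen
from urllib.error import URLError

def ruok_ok(body_text):
    # The AdminServer serves command output as JSON; we treat a reply with
    # no "error" as healthy.  (Response shape assumed from the ZooKeeper
    # AdminServer docs -- verify against an actual server.)
    try:
        body = json.loads(body_text)
    except ValueError:
        return False
    return body.get("error") in (None, "")

def zk_ruok(host="localhost", port=8080, timeout=2.0):
    """True if ZooKeeper's AdminServer answers /commands/ruok cleanly."""
    try:
        with urlopen(f"http://{host}:{port}/commands/ruok", timeout=timeout) as resp:
            return ruok_ok(resp.read().decode("utf-8"))
    except (URLError, OSError):
        return False
```

Parsing is split out of the network call so the interpretation logic is testable without a live ensemble; a Java equivalent would presumably use HttpURLConnection or the shared HttpClient already on Solr's classpath.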
[GitHub] [lucene-solr] mocobeta commented on issue #1304: LUCENE-9242: generate javadocs by calling Ant javadoc task
mocobeta commented on issue #1304: LUCENE-9242: generate javadocs by calling Ant javadoc task URL: https://github.com/apache/lucene-solr/pull/1304#issuecomment-593160403 To confirm if the inter-module links are correctly generated, the "broken links check" task will be of help for us (it isn't ported to gradle yet). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-9242) Gradle Javadoc task should output the same documents as Ant
[ https://issues.apache.org/jira/browse/LUCENE-9242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17048702#comment-17048702 ] Tomoko Uchida commented on LUCENE-9242: --- [~dweiss] [~rcmuir] Could you take a look at the PR? It seems to work for me but I am not sure if this is a good start or not, any thoughts or brief comments are welcomed. > Gradle Javadoc task should output the same documents as Ant > --- > > Key: LUCENE-9242 > URL: https://issues.apache.org/jira/browse/LUCENE-9242 > Project: Lucene - Core > Issue Type: Sub-task > Components: general/javadocs >Affects Versions: master (9.0) >Reporter: Tomoko Uchida >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > "javadoc" task for the Gradle build does not correctly output package > summaries, since it ignores "package.html" file in the source tree (so the > Python linter {{checkJavaDocs.py}} detects that and fails for now.) > Also the "javadoc" task should make inter-module links just as Ant build does. > See for more details: LUCENE-9201 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Comment Edited] (LUCENE-9242) Gradle Javadoc task should output the same documents as Ant
[ https://issues.apache.org/jira/browse/LUCENE-9242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17048696#comment-17048696 ] Tomoko Uchida edited comment on LUCENE-9242 at 3/1/20 10:38 PM: I opened a draft PR [https://github.com/apache/lucene-solr/pull/1304]. This adds a gradle task, named {{invokeJavadoc}}, which generates Javadocs with inter-module hyperlinks by invoking the Ant javadoc task. It also passes the {{checkMissingJavadocs}} check. The task can be called as below: {code:java} # generate javadocs for each project $ ./gradlew :lucene:core:invokeJavadoc {code} or, {code:java} # generate javadocs for all projects at once $ ./gradlew invokeJavadoc {code} The work isn't completed yet, but the most important parts are already ported. Quick replies to the comments on LUCENE-9201 follow: {quote}It is my personal preference to have a project-scope granularity. This way you can run project-scoped task (like gradlew -p lucene/core javadoc). My personal take on assembling "distributions" is to have a separate project that just takes what it needs from other projects and puts it together (with any tweaks required). This makes it easier to reason about how a distribution is assembled and from where, while each project just takes care of itself. {quote} I'd love this approach; however, while trying it I noticed that it seems difficult to properly generate inter-module hyperlinks without affecting the existing javadoc's path hierarchy (already published on the apache.org web site), if we want to place generated javadocs under ${sub_project_root}/build/docs/javadoc (gradle's default javadoc destination). The fundamental problem here, I think, is that in order to make hyperlinks from a module A to another module B, we need to know the effective relative path from module A to module B and pass it to the Javadoc Tool.
I aggregated all javadocs into {{lucene/build/docs}} or {{solr/build/docs}}, just as the Ant build does, to resolve the relative paths. I might have missed something - please let me know if my understanding isn't correct. {quote}for "directly call the javadoc tool" we may want to use the ant task as a start. This ant task is doing quite a bit of work above and beyond what the tool is doing (if you look at the relevant code to ant, you may be shocked!). {quote} As a first step I tried to reproduce the principal Ant macros, "invoke-javadoc" (in lucene/common-build.xml) and "invoke-module-javadoc" (in lucene/module-build.xml), in the gradle build. With this there are no longer missing package summaries, and inter-module links are generated. (The current setup for resolving the hyperlinks looks quite redundant; I think we can do it in a more sophisticated way.) {quote}A custom javadoc invocation is certainly possible and could possibly make things easier in the long run. {quote} {quote}as a second step you can look at computing package list for a module yourself (it may allow invoking the tool directly). {quote} Yes, we will probably be able to throw away all ant tasks and rely only on pure gradle code. Some extra effort will be needed to faithfully transfer the elaborate ant setups into corresponding gradle scripts... {quote}You'd need to declare inputs/ outputs properly though so that it is skippable. Those javadoc invocations take a long time in precommit. {quote} I declared inputs/outputs on the task so that the javadoc invocation is not needlessly repeated. It seems to work - {{ant.javadoc}} is called only when the java sources or the output directory change. was (Author: tomoko uchida): I opened a draft PR [https://github.com/apache/lucene-solr/pull/1304]. This adds a gradle task, named {{invokeJavadoc}}, which generates Javadocs with inter-module hyperlinks by invoking Ant javadoc task. Also this passes {{checkMissingJavadocs}} check.
The task can be called as below: {code:java} # generate javadocs for each project $ ./gradlew :lucene:core:invokeJavadoc {code} or, {code:java} # generate javadocs for all projects at once $ ./gradlew invokeJavadoc {code} The work isn't completed yet, but the most important parts are already ported. Quick replies to comments on LUCENE-9201 will be following: {quote}It is my personal preference to have a project-scope granularity. This way you can run project-scoped task (like gradlew -p lucene/core javadoc). My personal take on assembling "distributions" is to have a separate project that just takes what it needs from other projects and puts it together (with any tweaks required). This makes it easier to reason about how a distribution is assembled and from where, while each project just takes care of itself. {quote} I'd love this approach, however, when I was trying I noticed that it looks difficult to properly generate inter-module hyperlinks without affecting the existing javadoc's path hierarchy (already published on
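The "effective relative path from module A to module B" mentioned above is straightforward to compute once all javadocs share one docs root, which is exactly why aggregating everything under lucene/build/docs resolves it. A small illustrative sketch (the helper functions and exact layout are hypothetical, not part of the PR):

```python
import os.path

# Each module's javadocs land under a shared docs root, mirroring the Ant
# layout; inter-module links are then plain relative hrefs between them.
DOCS_ROOT = "lucene/build/docs"

def module_docs_dir(module):
    return os.path.join(DOCS_ROOT, module)

def link_between(from_module, to_module):
    # Relative href to embed in module A's javadocs pointing at module B's.
    return os.path.relpath(module_docs_dir(to_module),
                           start=module_docs_dir(from_module))

print(link_between("core", "analysis/common"))  # ../analysis/common
print(link_between("analysis/common", "core"))  # ../../core
```

With per-project output directories like ${sub_project_root}/build/docs/javadoc there is no common root, so these relative hrefs would no longer match the published path hierarchy, which is the difficulty described above.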
[jira] [Comment Edited] (LUCENE-9242) Gradle Javadoc task should output the same documents as Ant
[ https://issues.apache.org/jira/browse/LUCENE-9242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17048696#comment-17048696 ] Tomoko Uchida edited comment on LUCENE-9242 at 3/1/20 10:37 PM: I opened a draft PR [https://github.com/apache/lucene-solr/pull/1304]. This adds a Gradle task named {{invokeJavadoc}}, which generates Javadocs with inter-module hyperlinks by invoking the Ant javadoc task. It also passes the {{checkMissingJavadocs}} check. The task can be invoked as follows:
{code:java}
# generate javadocs for one project
$ ./gradlew :lucene:core:invokeJavadoc
{code}
or:
{code:java}
# generate javadocs for all projects at once
$ ./gradlew invokeJavadoc
{code}
The work isn't completed yet, but the most important parts are already ported. Quick replies to the comments on LUCENE-9201 follow: {quote}It is my personal preference to have a project-scope granularity. This way you can run project-scoped task (like gradlew -p lucene/core javadoc). My personal take on assembling "distributions" is to have a separate project that just takes what it needs from other projects and puts it together (with any tweaks required). This makes it easier to reason about how a distribution is assembled and from where, while each project just takes care of itself. {quote} I'd love this approach; however, while trying it I noticed that it looks difficult to properly generate inter-module hyperlinks without affecting the existing javadocs' path hierarchy (already published on the apache.org web site) if we want to place generated javadocs under ${sub_project_root}/build/docs/javadoc (Gradle's default javadoc destination). The fundamental problem here, I think, is that in order to make hyperlinks from a module A to another module B, we need to know the effective relative path from module A to module B and pass it to the Javadoc Tool.
I aggregated all javadocs into {{lucene/build/docs}} or {{solr/build/docs}}, just as the Ant build does, to resolve the relative paths. I might be missing something - please let me know if my understanding isn't correct. {quote}for "directly call the javadoc tool" we may want to use the ant task as a start. This ant task is doing quite a bit of work above and beyond what the tool is doing (if you look at the relevant code to ant, you may be shocked!). {quote} As the first step, I tried to reproduce the principal Ant macros "invoke-javadoc" (in lucene/common-build.xml) and "invoke-module-javadoc" (in lucene/module-build.xml) in the Gradle build. By doing so, there are now no missing package summaries, and inter-module links are generated. (The current setup for resolving the hyperlinks looks quite redundant; I think we can do it in more sophisticated ways.) {quote}A custom javadoc invocation is certainly possible and could possibly make things easier in the long run. {quote} {quote}as a second step you can look at computing package list for a module yourself (it may allow invoking the tool directly). {quote} Yes, we will probably be able to throw away all Ant tasks and rely only on pure Gradle code. Some extra effort will be needed to faithfully transfer the elaborate Ant setup into corresponding Gradle scripts... {quote}You'd need to declare inputs/ outputs properly though so that it is skippable. Those javadoc invocations take a long time in precommit. {quote} I declared inputs/outputs on the task so as not to needlessly repeat the javadoc invocation. It seems to work: {{ant.javadoc}} is called only when the Java sources or the output directory change. was (Author: tomoko uchida): I opened a draft PR [https://github.com/apache/lucene-solr/pull/1304]. This adds a gradle task, named {{invokeJavadoc}}, which generates Javadocs with inter-module hyperlinks by invoking Ant javadoc task. Also this passes {{checkMissingJavadocs}} check. 
The task can be called as below: {code:java} # generate javadocs for each project $ ./gradlew :lucene:core:invokeJavadoc {code} or, {code:java} # generate javadocs for all projects at once $ ./gradlew invokeJavadoc {code} The work isn't completed yet, but the most important parts are already ported. Quick replies to comments on LUCENE-9201 will be following: {quote}It is my personal preference to have a project-scope granularity. This way you can run project-scoped task (like gradlew -p lucene/core javadoc). My personal take on assembling "distributions" is to have a separate project that just takes what it needs from other projects and puts it together (with any tweaks required). This makes it easier to reason about how a distribution is assembled and from where, while each project just takes care of itself. {quote} I'd love this approach, however, when I was trying I noticed that it looks difficult to properly generate inter-module hyperlinks without affecting the existing javadoc's path hierarchy (already published on the apach
[jira] [Commented] (LUCENE-9242) Gradle Javadoc task should output the same documents as Ant
[ https://issues.apache.org/jira/browse/LUCENE-9242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17048696#comment-17048696 ] Tomoko Uchida commented on LUCENE-9242: --- I opened a draft PR [https://github.com/apache/lucene-solr/pull/1304]. This adds a Gradle task named {{invokeJavadoc}}, which generates Javadocs with inter-module hyperlinks by invoking the Ant javadoc task. It also passes the {{checkMissingJavadocs}} check. The task can be invoked as follows:
{code:java}
# generate javadocs for one project
$ ./gradlew :lucene:core:invokeJavadoc
{code}
or:
{code:java}
# generate javadocs for all projects at once
$ ./gradlew invokeJavadoc
{code}
The work isn't completed yet, but the most important parts are already ported. Quick replies to the comments on LUCENE-9201 follow: {quote}It is my personal preference to have a project-scope granularity. This way you can run project-scoped task (like gradlew -p lucene/core javadoc). My personal take on assembling "distributions" is to have a separate project that just takes what it needs from other projects and puts it together (with any tweaks required). This makes it easier to reason about how a distribution is assembled and from where, while each project just takes care of itself. {quote} I'd love this approach; however, while trying it I noticed that it looks difficult to properly generate inter-module hyperlinks without affecting the existing javadocs' path hierarchy (already published on the apache.org web site) if we want to place generated javadocs under ${sub_project_root}/build/docs/javadoc (Gradle's default javadoc destination). The fundamental problem here, I think, is that in order to make hyperlinks from a module A to another module B, we need to know the effective relative path from module A to module B and pass it to the Javadoc Tool. I aggregated all javadocs into {{lucene/build/docs}} or {{solr/build/docs}}, just as the Ant build does, to resolve the relative paths. 
I might be missing something - please let me know if my understanding isn't correct. {quote}for "directly call the javadoc tool" we may want to use the ant task as a start. This ant task is doing quite a bit of work above and beyond what the tool is doing (if you look at the relevant code to ant, you may be shocked!). {quote} As the first step, I tried to reproduce the principal Ant macros "invoke-javadoc" (in lucene/common-build.xml) and "invoke-module-javadoc" (in lucene/module-build.xml) in the Gradle build. By doing so, there are now no missing package summaries, and inter-module links are generated. (The current setup for resolving the hyperlinks looks quite redundant; I think we can do it in more sophisticated ways.) {quote}A custom javadoc invocation is certainly possible and could possibly make things easier in the long run. {quote} {quote}as a second step you can look at computing package list for a module yourself (it may allow invoking the tool directly). {quote} Yes, we will probably be able to throw away all Ant tasks and rely only on pure Gradle code. Some extra effort will be needed to faithfully transfer the elaborate Ant setup into corresponding Gradle scripts... {quote}You'd need to declare inputs/ outputs properly though so that it is skippable. Those javadoc invocations take a long time in precommit. {quote} I declared inputs/outputs on the task so as not to needlessly repeat the javadoc invocation. It seems to work: {{ant.javadoc}} is called only when the Java sources or the output directory change. 
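The inputs/outputs point above can be sketched in a Gradle build script roughly as follows. This is a minimal sketch assuming the {{java}} plugin is applied; the task name matches the PR, but the property wiring and paths are illustrative, not taken from it:

```groovy
// Hypothetical sketch: wrap ant.javadoc in a Gradle task with declared
// inputs/outputs so repeated invocations can be skipped (UP-TO-DATE).
task invokeJavadoc {
    // Re-run only when the Java sources change...
    inputs.files(sourceSets.main.java.srcDirs)
    // ...or when the aggregated output directory is missing or stale.
    def docsDir = rootProject.file("lucene/build/docs/${project.name}")
    outputs.dir(docsDir)

    doLast {
        // Delegate the actual work to the Ant javadoc task, as the draft PR does
        // (most attributes elided for brevity).
        ant.javadoc(destdir: docsDir,
                    sourcepath: sourceSets.main.java.srcDirs.join(File.pathSeparator))
    }
}
```

With inputs/outputs declared like this, a second `./gradlew invokeJavadoc` reports the task as UP-TO-DATE unless the sources or the output directory changed.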
> Gradle Javadoc task should output the same documents as Ant > --- > > Key: LUCENE-9242 > URL: https://issues.apache.org/jira/browse/LUCENE-9242 > Project: Lucene - Core > Issue Type: Sub-task > Components: general/javadocs >Affects Versions: master (9.0) >Reporter: Tomoko Uchida >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > "javadoc" task for the Gradle build does not correctly output package > summaries, since it ignores "package.html" file in the source tree (so the > Python linter {{checkJavaDocs.py}} detects that and fails for now.) > Also the "javadoc" task should make inter-module links just as Ant build does. > See for more details: LUCENE-9201 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] dsmiley commented on issue #1303: LUCENE-9114: Improve ValueSourceScorer's Default Cost Implementation
dsmiley commented on issue #1303: LUCENE-9114: Improve ValueSourceScorer's Default Cost Implementation URL: https://github.com/apache/lucene-solr/pull/1303#issuecomment-593151099 Let's add the following test to TestFunctionRangeQuery:
```
@Test
public void testTwoRangeQueries() throws IOException {
  Query rq1 = new FunctionRangeQuery(INT_VALUESOURCE, 2, 4, true, true);
  Query rq2 = new FunctionRangeQuery(INT_VALUESOURCE, 8, 10, true, true);
  Query bq = new BooleanQuery.Builder()
      .add(rq1, BooleanClause.Occur.SHOULD)
      .add(rq2, BooleanClause.Occur.SHOULD)
      .build();
  ScoreDoc[] scoreDocs = indexSearcher.search(bq, N_DOCS).scoreDocs;
  expectScores(scoreDocs, 10, 9, 8, 4, 3, 2);
}
```
This'll stack-overflow on your first implementation. > Maybe have FunctionValues expose an abstract cost() method, have all FV derivatives implement it and then simply let VSC's matchCost use that method? Yes; we certainly need the FV to provide the cost; the TPI.matchCost should simply look it up. With the FV (or VS) exposing a cost, it becomes straightforward for anyone's custom FV/VS to specify what their cost is. It's debatable whether this cost should be on the VS vs the FV. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
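The API shape being proposed here can be sketched with a simplified, self-contained example. The class names and the cost figure are hypothetical, not Lucene's real classes: each FunctionValues implementation exposes its own cost(), and the scorer's matchCost() simply looks it up instead of computing anything itself, so no recursive delegation is possible.

```java
// Hypothetical stand-ins for FunctionValues / ValueSourceScorer.
abstract class FunctionValuesSketch {
    /** Per-document evaluation cost estimate; subclasses must provide it. */
    abstract float cost();
}

class IntFieldValuesSketch extends FunctionValuesSketch {
    @Override
    float cost() {
        return 5f; // hypothetical figure for a cheap numeric lookup
    }
}

class ValueSourceScorerSketch {
    private final FunctionValuesSketch values;

    ValueSourceScorerSketch(FunctionValuesSketch values) {
        this.values = values;
    }

    /** The two-phase iterator's match cost: delegate to the wrapped values. */
    float matchCost() {
        return values.cost();
    }
}
```

Because the delegation runs in only one direction (scorer asks values, never the reverse), the stack-overflow scenario the test above provokes cannot occur.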
[jira] [Commented] (LUCENE-9255) ValueSource Has Generic Typing Issues
[ https://issues.apache.org/jira/browse/LUCENE-9255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17048670#comment-17048670 ] Alan Woodward commented on LUCENE-9255: --- Ideally we'd deprecate ValueSource and replace it with DoubleValuesSource/LongValuesSource everywhere, but that's a massive job as ValueSource is used all over the place in Solr. > ValueSource Has Generic Typing Issues > - > > Key: LUCENE-9255 > URL: https://issues.apache.org/jira/browse/LUCENE-9255 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Atri Sharma >Priority: Major > > ValueSource uses a bunch of weakly typed members which raises compiler > issues. We need to fix this in ValueSource and all of its subclasses. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] mocobeta opened a new pull request #1304: LUCENE-9242: generate javadocs by calling Ant javadoc task
mocobeta opened a new pull request #1304: LUCENE-9242: generate javadocs by calling Ant javadoc task URL: https://github.com/apache/lucene-solr/pull/1304 ## Description Draft PR that adds a Gradle task to generate javadocs by invoking the Ant javadoc task. All generated javadocs pass the "checkMissingDocs" check. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] atris edited a comment on issue #1303: LUCENE-9114: Improve ValueSourceScorer's Default Cost Implementation
atris edited a comment on issue #1303: LUCENE-9114: Improve ValueSourceScorer's Default Cost Implementation URL: https://github.com/apache/lucene-solr/pull/1303#issuecomment-593133186 @dsmiley Thinking further, I see no obvious way of ValueSourceScorer being able to determine a reasonable cost without having inputs from FunctionValues, and currently, the sane way of getting a cost out of FunctionValues is through its TPI (unless I am missing something?). The best cost metrics will come when specific implementations (such as IntFieldSource) expose their cost by internally evaluating their complexity in a source specific manner instead of delegating to the default FV matchCost implementation. Maybe have FunctionValues expose an abstract cost() method, have all FV derivatives implement it and then simply let VSC's matchCost use that method? (and oh, I realised that this PR is definitely wrong, thanks for pointing it out. For some reason I missed that FV delegates to VSC using itself as the delegated FV value. Need to add Lucene tests for VSC...) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] atris edited a comment on issue #1303: LUCENE-9114: Improve ValueSourceScorer's Default Cost Implementation
atris edited a comment on issue #1303: LUCENE-9114: Improve ValueSourceScorer's Default Cost Implementation URL: https://github.com/apache/lucene-solr/pull/1303#issuecomment-593133186 @dsmiley Thinking further, I see no obvious way of ValueSourceScorer being able to determine a reasonable cost without having inputs from FunctionValues, and currently, the sane way of getting a cost out of FunctionValues is through its TPI (unless I am missing something?). The best cost metrics will come when specific implementations (such as IntFieldSource) expose their cost by internally evaluating their complexity in a source specific manner instead of delegating to the default FV matchCost implementation. Maybe have FunctionValues expose an abstract cost() method, have all FV derivatives implement it and then simply let VSC's matchCost use that method? (and oh, I realised that this PR is definitely wrong, for some reason I missed that FV delegates to VSC using itself as the delegated FV value. Need to add Lucene tests for VSC...) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] atris commented on issue #1303: LUCENE-9114: Improve ValueSourceScorer's Default Cost Implementation
atris commented on issue #1303: LUCENE-9114: Improve ValueSourceScorer's Default Cost Implementation URL: https://github.com/apache/lucene-solr/pull/1303#issuecomment-593133186 @dsmiley Thinking further, I see no obvious way of ValueSourceScorer being able to determine a reasonable cost without having inputs from FunctionValues, and currently, the sane way of getting a cost out of FunctionValues is through its TPI (unless I am missing something?). The best cost metrics will come when specific implementations (such as IntFieldSource) expose their cost by internally evaluating their complexity in a source specific manner instead of delegating to the default FV matchCost implementation. Maybe have FunctionValues expose an abstract cost() method, have all FV derivatives implement it and then simply let VSC's matchCost use that method? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] atris commented on issue #1303: LUCENE-9114: Improve ValueSourceScorer's Default Cost Implementation
atris commented on issue #1303: LUCENE-9114: Improve ValueSourceScorer's Default Cost Implementation URL: https://github.com/apache/lucene-solr/pull/1303#issuecomment-593126930 > Wouldn't this result in an infinite loop? The idea was that the underlying TwoPhaseIterator implementation for the nested FunctionValues would be an actual VSC derivative and not using the default matchCost version. I planned to merge this PR only for 8x and make matchCost an abstract method for master. BTW, what would be your suggestion to better evaluate FunctionValues's cost? Maybe we could look at the doc each time and see if it matches or not and then return a cost based on that? Another thing -- the Lucene test suite passes fine with this change. Does that mean that we are lacking comprehensive tests for VSC where two nested FunctionValues use the default VSC matchCost? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
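The infinite loop dsmiley asked about can be illustrated with a self-contained toy example (hypothetical names, not Lucene's classes): if the default FunctionValues match cost asks its scorer, and the default scorer match cost asks its FunctionValues, neither call ever bottoms out and the JVM throws StackOverflowError.

```java
// Toy stand-ins for the mutually delegating pair.
class ValuesStub {
    ScorerStub scorer;
    float matchCost() { return scorer.matchCost(); } // delegates to the scorer
}

class ScorerStub {
    ValuesStub values;
    float matchCost() { return values.matchCost(); } // delegates straight back
}

public class DelegationLoopDemo {
    /** Returns true if the mutual delegation overflows the stack. */
    static boolean overflows() {
        ValuesStub v = new ValuesStub();
        ScorerStub s = new ScorerStub();
        v.scorer = s;
        s.values = v;
        try {
            s.matchCost(); // recurses between the two matchCost() methods
            return false;
        } catch (StackOverflowError e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("overflows: " + overflows());
    }
}
```

This is why a test with two nested range queries sharing the default implementations (like the one dsmiley suggests above) fails with a stack overflow rather than a wrong cost.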
[GitHub] [lucene-solr] danmuzi commented on issue #1296: LUCENE-9253: Support custom dictionaries in KoreanTokenizer
danmuzi commented on issue #1296: LUCENE-9253: Support custom dictionaries in KoreanTokenizer URL: https://github.com/apache/lucene-solr/pull/1296#issuecomment-593120577 Thanks for your review, @msokolov! I fixed some capitalization in the Javadoc, and I added a change log entry for this patch. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] danmuzi commented on a change in pull request #1296: LUCENE-9253: Support custom dictionaries in KoreanTokenizer
danmuzi commented on a change in pull request #1296: LUCENE-9253: Support custom dictionaries in KoreanTokenizer URL: https://github.com/apache/lucene-solr/pull/1296#discussion_r386122724 ## File path: lucene/analysis/nori/src/java/org/apache/lucene/analysis/ko/dict/BinaryDictionary.java ## @@ -150,7 +150,18 @@ protected final InputStream getResource(String suffix) throws IOException { throw new IllegalStateException("unknown resource scheme " + resourceScheme); } } - + + public static InputStream getResource(ResourceScheme scheme, String path) throws IOException { Review comment: It's the same as Kuromoji's pattern. But there is no relation with Kuromoji. It's already written as an enum in this class. https://github.com/apache/lucene-solr/blob/d5e44e95175dbf027915e162925057bbcc14200b/lucene/analysis/nori/src/java/org/apache/lucene/analysis/ko/dict/BinaryDictionary.java#L45-L47 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] danmuzi commented on a change in pull request #1296: LUCENE-9253: Support custom dictionaries in KoreanTokenizer
danmuzi commented on a change in pull request #1296: LUCENE-9253: Support custom dictionaries in KoreanTokenizer URL: https://github.com/apache/lucene-solr/pull/1296#discussion_r386122678 ## File path: lucene/analysis/nori/src/java/org/apache/lucene/analysis/ko/KoreanTokenizer.java ## @@ -185,16 +185,43 @@ public KoreanTokenizer(AttributeFactory factory, UserDictionary userDictionary, * @param discardPunctuation true if punctuation tokens should be dropped from the output. */ public KoreanTokenizer(AttributeFactory factory, UserDictionary userDictionary, DecompoundMode mode, boolean outputUnknownUnigrams, boolean discardPunctuation) { +this(factory, +TokenInfoDictionary.getInstance(), +UnknownDictionary.getInstance(), +ConnectionCosts.getInstance(), +userDictionary, mode, outputUnknownUnigrams, discardPunctuation); + } + + /** + * Create a new KoreanTokenizer supplying a custom system dictionary and unknown dictionary. + * This constructor provides an entry point for users that want to construct custom language models + * that can be used as input to {@link org.apache.lucene.analysis.ko.util.DictionaryBuilder}. + * + * @param factory the AttributeFactory to use + * @param systemDictionary a custom known token dictionary + * @param unkDictionary a custom unknown token dictionary + * @param connectionCosts custom token transition costs + * @param userDictionary Optional: if non-null, user dictionary. + * @param mode Decompound mode. + * @param outputUnknownUnigrams If true outputs unigrams for unknown words. Review comment: Oh, I did that because it was capitalized before. I'll change the other constructors as well. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] dsmiley commented on issue #1303: LUCENE-9114: Improve ValueSourceScorer's Default Cost Implementation
dsmiley commented on issue #1303: LUCENE-9114: Improve ValueSourceScorer's Default Cost Implementation URL: https://github.com/apache/lucene-solr/pull/1303#issuecomment-593116806 Wouldn't this result in an infinite loop? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] msokolov commented on a change in pull request #1294: LUCENE-9074: Slice Allocation Control Plane For Concurrent Searches
msokolov commented on a change in pull request #1294: LUCENE-9074: Slice Allocation Control Plane For Concurrent Searches URL: https://github.com/apache/lucene-solr/pull/1294#discussion_r386109250 ## File path: lucene/core/src/java/org/apache/lucene/search/SliceExecutionControlPlane.java ## @@ -0,0 +1,32 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.lucene.search; + +import java.util.Collection; + +/** + * Execution control plane which is responsible + * for execution of slices based on the current status + * of the system and current system load + */ +public interface SliceExecutionControlPlane { + /** + * Invoke all slices that are allocated for the query + */ + C invokeAll(Collection tasks); Review comment: Also- I'm curious if you saw any performance impact from the back pressure here? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] msokolov commented on a change in pull request #1294: LUCENE-9074: Slice Allocation Control Plane For Concurrent Searches
msokolov commented on a change in pull request #1294: LUCENE-9074: Slice Allocation Control Plane For Concurrent Searches URL: https://github.com/apache/lucene-solr/pull/1294#discussion_r386109193 ## File path: lucene/core/src/java/org/apache/lucene/search/SliceExecutionControlPlane.java ## @@ -0,0 +1,32 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.lucene.search; + +import java.util.Collection; + +/** + * Execution control plane which is responsible + * for execution of slices based on the current status + * of the system and current system load + */ +public interface SliceExecutionControlPlane { + /** + * Invoke all slices that are allocated for the query + */ + C invokeAll(Collection tasks); Review comment: This is an internal detail of IndexSearcher, right? We're always free to change the method signatures later (if we keep the classes package-private - we should!). Maybe it would help if you were to explain what extension you have in mind. By the way, using force push makes it more difficult for reviewers since we can't easily see what changed from one version to the next. 
In general it's better to push your commits and then squash-merge them at the end (github will even do this for you I think) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
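The "back pressure" msokolov asks about can be sketched with one common pattern for this kind of slice execution layer. This is a general sketch, not necessarily what the PR implements: all slices but the last are handed to the executor, and the last slice runs on the caller thread, so the caller cannot keep queueing work faster than it executes.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;

public class CallerRunsLastSketch {
    public static <T> List<T> invokeAll(ExecutorService executor,
                                        Collection<? extends Callable<T>> tasks)
            throws Exception {
        List<Callable<T>> slices = new ArrayList<>(tasks);
        List<Future<T>> futures = new ArrayList<>();
        // Hand off every slice except the last one.
        for (int i = 0; i < slices.size() - 1; i++) {
            futures.add(executor.submit(slices.get(i)));
        }
        List<T> results = new ArrayList<>();
        if (!slices.isEmpty()) {
            // Run the final slice on the calling thread: the back pressure.
            T last = slices.get(slices.size() - 1).call();
            for (Future<T> f : futures) {
                results.add(f.get()); // gather async results in submission order
            }
            results.add(last);
        }
        return results;
    }
}
```

The performance trade-off is exactly the one raised in the review: the caller thread does useful work instead of blocking, but a slow final slice delays gathering the other results.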
[GitHub] [lucene-solr] msokolov commented on a change in pull request #1296: LUCENE-9253: Support custom dictionaries in KoreanTokenizer
msokolov commented on a change in pull request #1296: LUCENE-9253: Support custom dictionaries in KoreanTokenizer URL: https://github.com/apache/lucene-solr/pull/1296#discussion_r386107769 ## File path: lucene/analysis/nori/src/java/org/apache/lucene/analysis/ko/KoreanTokenizer.java ## @@ -185,16 +185,43 @@ public KoreanTokenizer(AttributeFactory factory, UserDictionary userDictionary, * @param discardPunctuation true if punctuation tokens should be dropped from the output. */ public KoreanTokenizer(AttributeFactory factory, UserDictionary userDictionary, DecompoundMode mode, boolean outputUnknownUnigrams, boolean discardPunctuation) { +this(factory, +TokenInfoDictionary.getInstance(), +UnknownDictionary.getInstance(), +ConnectionCosts.getInstance(), +userDictionary, mode, outputUnknownUnigrams, discardPunctuation); + } + + /** + * Create a new KoreanTokenizer supplying a custom system dictionary and unknown dictionary. + * This constructor provides an entry point for users that want to construct custom language models + * that can be used as input to {@link org.apache.lucene.analysis.ko.util.DictionaryBuilder}. + * + * @param factory the AttributeFactory to use + * @param systemDictionary a custom known token dictionary + * @param unkDictionary a custom unknown token dictionary + * @param connectionCosts custom token transition costs + * @param userDictionary Optional: if non-null, user dictionary. + * @param mode Decompound mode. + * @param outputUnknownUnigrams If true outputs unigrams for unknown words. Review comment: Don't capitalize "If" This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] msokolov commented on a change in pull request #1296: LUCENE-9253: Support custom dictionaries in KoreanTokenizer
msokolov commented on a change in pull request #1296: LUCENE-9253: Support custom dictionaries in KoreanTokenizer URL: https://github.com/apache/lucene-solr/pull/1296#discussion_r386108024 ## File path: lucene/analysis/nori/src/java/org/apache/lucene/analysis/ko/dict/BinaryDictionary.java ## @@ -150,7 +150,18 @@ protected final InputStream getResource(String suffix) throws IOException { throw new IllegalStateException("unknown resource scheme " + resourceScheme); } } - + + public static InputStream getResource(ResourceScheme scheme, String path) throws IOException { Review comment: so .. this basically follows the pattern from JapaneseTokenizer, I think. .. but somehow I don't see where we defined ResourceScheme? We're not referencing the one in kuromoji, right? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Updated] (SOLR-14291) OldAnalyticsRequestConverter should support fields names with dots
[ https://issues.apache.org/jira/browse/SOLR-14291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anatolii Siuniaev updated SOLR-14291: - Description: If you send a query with range facets using old olap-style syntax DMV(see pdf here), OldAnalyticsRequestConverter just silently (no exception thrown) omits parameters like {code:java} olap..rangefacet..start {code} in case if __ has dots inside (for instance field name is _Project.Value_). And thus no range facets are returned in response. Probably the same happens in case of field faceting. was: If you send a query with range facets using old olap-style syntax (see here), OldAnalyticsRequestConverter just silently (no exception thrown) omits parameters like {code:java} olap..rangefacet..start {code} in case if __ has dots inside (for instance field name is _Project.Value_). And thus no range facets are returned in response. Probably the same happens in case of field faceting. > OldAnalyticsRequestConverter should support fields names with dots > -- > > Key: SOLR-14291 > URL: https://issues.apache.org/jira/browse/SOLR-14291 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: search, SearchComponents - other >Reporter: Anatolii Siuniaev >Priority: Trivial > > If you send a query with range facets using old olap-style syntax DMV(see pdf > here), OldAnalyticsRequestConverter just silently (no exception thrown) omits > parameters like > {code:java} > olap..rangefacet..start > {code} > in case if __ has dots inside (for instance field name is > _Project.Value_). And thus no range facets are returned in response. > Probably the same happens in case of field faceting. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
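The failure mode described above, a parameter parser that assumes the field name contains no dots, can be sketched in plain Java. This is an illustrative stand-in, not the actual OldAnalyticsRequestConverter code; the parameter shape and method names are assumptions. The idea of the fix is to anchor on the fixed marker and suffix instead of splitting the whole parameter name on '.':

```java
import java.util.Optional;

public class RangeFacetParamSketch {
    // Hypothetical parameter shape: olap.<expr>.rangefacet.<field>.start
    // Naively splitting the parameter name on '.' loses fields like "Project.Value".
    // Instead, locate the fixed marker and suffix and take everything between them.
    static Optional<String> fieldFromParam(String param) {
        final String marker = ".rangefacet.";
        final String suffix = ".start";
        int m = param.indexOf(marker);
        if (m < 0 || !param.endsWith(suffix)) {
            return Optional.empty();
        }
        int fieldStart = m + marker.length();
        int fieldEnd = param.length() - suffix.length();
        if (fieldStart >= fieldEnd) {
            return Optional.empty(); // empty field name
        }
        return Optional.of(param.substring(fieldStart, fieldEnd));
    }

    public static void main(String[] args) {
        // A dotted field name survives intact:
        System.out.println(fieldFromParam("olap.expr.rangefacet.Project.Value.start").get()); // Project.Value
    }
}
```

With this approach, a field named Project.Value is recovered whole rather than being silently dropped.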
[GitHub] [lucene-solr] msokolov merged pull request #1295: Lucene-9004: bug fix for searching the nearest one neighbor in higher layers
msokolov merged pull request #1295: Lucene-9004: bug fix for searching the nearest one neighbor in higher layers URL: https://github.com/apache/lucene-solr/pull/1295
[GitHub] [lucene-solr] msokolov commented on issue #1295: Lucene-9004: bug fix for searching the nearest one neighbor in higher layers
msokolov commented on issue #1295: Lucene-9004: bug fix for searching the nearest one neighbor in higher layers URL: https://github.com/apache/lucene-solr/pull/1295#issuecomment-593096201 Ah, I see you're right @irvingzhang. I think we could also save something by eliminating the priority queue for this case - it's silly to use an (effectively) 1-length queue when all we need is a variable - but this does make the implementation match the algorithm. I'll merge.
[jira] [Comment Edited] (SOLR-14291) OldAnalyticsRequestConverter should support fields names with dots
[ https://issues.apache.org/jira/browse/SOLR-14291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17048251#comment-17048251 ] Anatolii Siuniaev edited comment on SOLR-14291 at 3/1/20 1:15 PM: -- Yep, I'll create a patch in a couple of days. And I added the link too. was (Author: anatolii_siuniaev): Yep, I'll create a patch in a couple of days. What do you mean by that article?
[jira] [Commented] (SOLR-13411) CompositeIdRouter calculates wrong route hash if atomic update is used for route.field
[ https://issues.apache.org/jira/browse/SOLR-13411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17048541#comment-17048541 ] Dr Oleg Savrasov commented on SOLR-13411: - [~mkhl] , [~dsmiley] Thank you, guys. > CompositeIdRouter calculates wrong route hash if atomic update is used for > route.field > -- > > Key: SOLR-13411 > URL: https://issues.apache.org/jira/browse/SOLR-13411 > Project: Solr > Issue Type: Bug > Components: SolrCloud >Affects Versions: 7.5 >Reporter: Niko Himanen >Assignee: Mikhail Khludnev >Priority: Minor > Fix For: 8.5 > > Attachments: SOLR-13411.patch, SOLR-13411.patch > > > If a collection is created with the router.field parameter to define a field > other than uniqueField as the route field, and a document update arrives with > the route field updated using atomic update syntax (for example set=123), the > hash for document routing is calculated from "set=123" and not from the real > value 123, which may route the document to the wrong shard. > > This happens in CompositeIdRouter#sliceHash, where the field value is used > as-is for hash calculation. > > I think there are two possible solutions to fix this: > a) Allow atomic updates also for route.field, but use the real value instead > of the atomic update syntax to route the document to the right shard. > b) Reject atomic updates for route.field and throw an exception.
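To make the bug concrete: with an atomic update, the route field's value reaching the router is a wrapper map such as {"set": 123}, so hashing its string form targets a different shard than hashing the real value. Below is a minimal sketch of option (a), unwrapping the atomic-update payload before hashing; the hash function here is a placeholder stand-in, not Solr's actual MurmurHash-based routing:

```java
import java.util.Map;

public class RouteHashSketch {
    // Placeholder for the routing hash; Solr's CompositeIdRouter actually
    // hashes the route value's string form with MurmurHash.
    static int routeHash(String routeValue) {
        return routeValue.hashCode();
    }

    // Option (a) from the report: if the field value is an atomic-update map
    // like {"set": 123}, hash the wrapped value rather than the map itself.
    static String unwrapAtomicUpdate(Object fieldValue) {
        if (fieldValue instanceof Map) {
            Object set = ((Map<?, ?>) fieldValue).get("set");
            if (set != null) {
                return set.toString();
            }
        }
        return fieldValue.toString();
    }

    public static void main(String[] args) {
        Object atomic = Map.of("set", 123);
        int buggy = routeHash(atomic.toString());          // hashes "{set=123}"
        int fixed = routeHash(unwrapAtomicUpdate(atomic)); // hashes "123"
        System.out.println(buggy == fixed);                // false: different shards
    }
}
```

Option (b) would instead reject the map outright before hashing, trading convenience for predictability.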
[GitHub] [lucene-solr] irvingzhang edited a comment on issue #1295: Lucene-9004: bug fix for searching the nearest one neighbor in higher layers
irvingzhang edited a comment on issue #1295: Lucene-9004: bug fix for searching the nearest one neighbor in higher layers URL: https://github.com/apache/lucene-solr/pull/1295#issuecomment-592287771 > I believe in practice that results' max size is always set to ef, so there shouldn't be any real issue. I agree that the interface doesn't make that plain; we should enforce this invariant by API contract Hi, @msokolov, I agree that the max size is always set to _ef_, but _ef_ has different values in different layers. According to **Algorithm 5** of Yury's [paper](https://arxiv.org/pdf/1603.09320.pdf), HNSW searches for the nearest one neighbor (namely, _ef_=1) from the top layer down to the 1st layer, and then finds the nearest _ef_ (_ef_=topK) neighbors in layer 0. In the Lucene HNSW implementation, the actual size of the result queue (Line 64, [HNSWGraphReader](https://github.com/apache/lucene-solr/blob/jira/lucene-9004-aknn-2/lucene/core/src/java/org/apache/lucene/util/hnsw/HNSWGraphReader.java)) is set to _ef_=topK when searching from the top layer to the 1st layer, while the expected neighbor size is 1, resulting in more neighbors being found than expected.
Even if the parameter _ef_ is set to 1 in Line 66, [HNSWGraphReader](https://github.com/apache/lucene-solr/blob/jira/lucene-9004-aknn-2/lucene/core/src/java/org/apache/lucene/util/hnsw/HNSWGraphReader.java), the condition `if (dist < f.distance() || results.size() < ef)` (Line 87, [HNSWGraph](https://github.com/apache/lucene-solr/blob/jira/lucene-9004-aknn-2/lucene/core/src/java/org/apache/lucene/util/hnsw/HNSWGraph.java)) allows inserting more than 1 neighbor into the "results" queue when `dist < f.distance()` and `results.size() >= ef` (here _ef_=1, corresponding to Line 66, [HNSWGraphReader](https://github.com/apache/lucene-solr/blob/jira/lucene-9004-aknn-2/lucene/core/src/java/org/apache/lucene/util/hnsw/HNSWGraphReader.java)), because the max size of "results" is topK, which implies that the actual size of the "results" queue lies in [1, topK]. **The simplest way to verify this problem is to print the actual number of neighbors.** For example, add "System.out.println(neighbors.size());" after "visitedCount += hnsw.searchLayer(query, neighbors, 1, l, vectorValues);" (Line 66, [HNSWGraphReader](https://github.com/apache/lucene-solr/blob/jira/lucene-9004-aknn-2/lucene/core/src/java/org/apache/lucene/util/hnsw/HNSWGraphReader.java)), where the nearest one neighbor is expected, but the printed neighbor size would range from 1 to topK. The same applies to [HNSWGraphWriter](https://github.com/apache/lucene-solr/blob/jira/lucene-9004-aknn-2/lucene/core/src/java/org/apache/lucene/util/hnsw/HNSWGraphWriter.java).
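The fix msokolov alludes to, replacing the effectively 1-length priority queue with a single "current best" variable for the upper-layer descent, can be illustrated with a toy graph. This is a sketch of the greedy search phase of the HNSW paper's algorithm, not the actual HNSWGraphReader code; the 1-D vectors, distance function, and adjacency map are simplified stand-ins:

```java
import java.util.Map;

public class GreedyLayerSearchSketch {
    // Greedy 1-NN descent in a single upper layer (ef = 1): keep one current
    // best node and move to any neighbor strictly closer to the query, until
    // no neighbor improves. No priority queue is needed for this phase, and
    // the result is exactly one neighbor by construction.
    static int searchLayerGreedy(double query, double[] vectors,
                                 Map<Integer, int[]> neighbors, int entryPoint) {
        int best = entryPoint;
        double bestDist = Math.abs(vectors[best] - query);
        boolean improved = true;
        while (improved) {
            improved = false;
            for (int cand : neighbors.getOrDefault(best, new int[0])) {
                double d = Math.abs(vectors[cand] - query);
                if (d < bestDist) {
                    bestDist = d;
                    best = cand;
                    improved = true;
                }
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // A tiny chain graph 0 - 1 - 2 - 3 with 1-D "vectors".
        double[] vectors = {0.0, 0.4, 0.8, 1.0};
        Map<Integer, int[]> nbrs = Map.of(
            0, new int[]{1},
            1, new int[]{0, 2},
            2, new int[]{1, 3},
            3, new int[]{2});
        // The walk descends 0 -> 1 -> 2 and stops: node 3 is no closer to 0.9.
        System.out.println(searchLayerGreedy(0.9, vectors, nbrs, 0)); // 2
    }
}
```

Because only one candidate is ever retained, the over-collection described in the comment above (queue sizes anywhere in [1, topK]) cannot occur in this phase.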
[GitHub] [lucene-solr] iverase commented on a change in pull request #1253: LUCENE-9150: Restore support for dynamic PlanetModel in spatial3d
iverase commented on a change in pull request #1253: LUCENE-9150: Restore support for dynamic PlanetModel in spatial3d URL: https://github.com/apache/lucene-solr/pull/1253#discussion_r386092274 ## File path: lucene/spatial3d/src/java/org/apache/lucene/spatial3d/geom/PlanetModel.java ## @@ -383,30 +509,233 @@ public GeoPoint surfacePointOnBearing(final GeoPoint from, final double dist, fi Δσ = B * sinσ * (cos2σM + B / 4.0 * (cosσ * (-1.0 + 2.0 * cos2σM * cos2σM) - B / 6.0 * cos2σM * (-3.0 + 4.0 * sinσ * sinσ) * (-3.0 + 4.0 * cos2σM * cos2σM))); σʹ = σ; - σ = dist / (c * inverseScale * A) + Δσ; + σ = dist / (zScaling * inverseScale * A) + Δσ; } while (Math.abs(σ - σʹ) >= Vector.MINIMUM_RESOLUTION && ++iterations < 100); double x = sinU1 * sinσ - cosU1 * cosσ * cosα1; -double φ2 = Math.atan2(sinU1 * cosσ + cosU1 * sinσ * cosα1, (1.0 - flattening) * Math.sqrt(sinα * sinα + x * x)); +double φ2 = Math.atan2(sinU1 * cosσ + cosU1 * sinσ * cosα1, (1.0 - scaledFlattening) * Math.sqrt(sinα * sinα + x * x)); double λ = Math.atan2(sinσ * sinα1, cosU1 * cosσ - sinU1 * sinσ * cosα1); -double C = flattening / 16.0 * cosSqα * (4.0 + flattening * (4.0 - 3.0 * cosSqα)); -double L = λ - (1.0 - C) * flattening * sinα * +double C = scaledFlattening / 16.0 * cosSqα * (4.0 + scaledFlattening * (4.0 - 3.0 * cosSqα)); +double L = λ - (1.0 - C) * scaledFlattening * sinα * (σ + C * sinσ * (cos2σM + C * cosσ * (-1.0 + 2.0 * cos2σM * cos2σM))); double λ2 = (lon + L + 3.0 * Math.PI) % (2.0 * Math.PI) - Math.PI; // normalise to -180..+180 return new GeoPoint(this, φ2, λ2); } + /** Utility class for encoding / decoding from lat/lon (decimal degrees) into sortable doc value numerics (integers) */ + public static class DocValueEncoder { +private final PlanetModel planetModel; + +// These are the multiplicative constants we need to use to arrive at values that fit in 21 bits. 
+// The formula we use to go from double to encoded value is: Math.floor((value - minimum) * factor + 0.5) +// If we plug in maximum for value, we should get 0x1FFFFF. +// So, 0x1FFFFF = Math.floor((maximum - minimum) * factor + 0.5) +// We factor out the 0.5 and Math.floor by stating instead: +// 0x1FFFFF = (maximum - minimum) * factor +// So, factor = 0x1FFFFF / (maximum - minimum) + +private final static double inverseMaximumValue = 1.0 / (double)(0x1FFFFF); + +private final double inverseXFactor; +private final double inverseYFactor; +private final double inverseZFactor; + +private final double xFactor; +private final double yFactor; +private final double zFactor; + +// Fudge factor for step adjustments. This is here solely to handle inaccuracies in bounding boxes +// that occur because of quantization. For unknown reasons, the fudge factor needs to be +// 10.0 rather than 1.0. See LUCENE-7430. + +private final static double STEP_FUDGE = 10.0; + +// These values are the delta between a value and the next value in each specific dimension + +private final double xStep; +private final double yStep; +private final double zStep; + +/** construct an encoder/decoder instance from the provided PlanetModel definition */ +public DocValueEncoder(final PlanetModel planetModel) { Review comment: make constructor private?
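The comment block in the diff above describes a plain linear quantization: map [minimum, maximum] onto the 21-bit integer range, with factor = 0x1FFFFF / (maximum - minimum). A self-contained sketch of that encode/decode round trip follows; the bounds are illustrative values, and this omits what the real DocValueEncoder adds on top (per-dimension fudge factor, packing three 21-bit values into one long):

```java
public class DocValueEncodeSketch {
    // 21 bits per dimension: encoded values lie in [0, 0x1FFFFF].
    static final int MAX_ENCODED = 0x1FFFFF;

    // factor = 0x1FFFFF / (maximum - minimum); the +0.5 before Math.floor
    // rounds to the nearest quantization step.
    static int encode(double value, double min, double max) {
        double factor = MAX_ENCODED / (max - min);
        return (int) Math.floor((value - min) * factor + 0.5);
    }

    // Decoding maps the integer back onto the original range.
    static double decode(int encoded, double min, double max) {
        return min + encoded * ((max - min) / MAX_ENCODED);
    }

    public static void main(String[] args) {
        double min = -1.0, max = 1.0; // illustrative bounds, not real planet-model bounds
        double step = (max - min) / MAX_ENCODED;
        int e = encode(0.5, min, max);
        // The round trip is lossy, but the error is bounded by one step:
        System.out.println(Math.abs(decode(e, min, max) - 0.5) <= step); // true
    }
}
```

Plugging the endpoints in confirms the constants: encode(min) yields 0 and encode(max) yields 0x1FFFFF, exactly as the derivation in the comment requires.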
[jira] [Assigned] (LUCENE-9114) Add FunctionValues.cost
[ https://issues.apache.org/jira/browse/LUCENE-9114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Atri Sharma reassigned LUCENE-9114: --- Assignee: Atri Sharma > Add FunctionValues.cost > --- > > Key: LUCENE-9114 > URL: https://issues.apache.org/jira/browse/LUCENE-9114 > Project: Lucene - Core > Issue Type: Improvement > Components: modules/query >Reporter: David Smiley >Assignee: Atri Sharma >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > The FunctionRangeQuery uses FunctionValues.getRangeScorer which returns a > subclass of ValueSourceScorer. VSC's TwoPhaseIterator has a matchCost impl > that returns a constant 100. This is pretty terrible; the cost should vary > based on the complexity of the ValueSource provided to FRQ. ValueSource's > are typically nested a number of levels, so they should aggregate. > BTW there is a parallel concern for FunctionMatchQuery which works with > DoubleValuesSource which doesn't have a cost either, and unsurprisingly there > is a TPI with matchCost 100 there.
[jira] [Commented] (LUCENE-9114) Add FunctionValues.cost
[ https://issues.apache.org/jira/browse/LUCENE-9114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17048492#comment-17048492 ] Atri Sharma commented on LUCENE-9114: - [~dsmiley] I have raised a PR for this -- it is a minimal change that lets ValueSourceScorer incorporate the delegated FunctionValues' cost into its own cost. Would that help you get unblocked by adding stacked FunctionValues with custom costing functions?
[GitHub] [lucene-solr] atris opened a new pull request #1303: LUCENE-9114: Improve ValueSourceScorer's Default Cost Implementation
atris opened a new pull request #1303: LUCENE-9114: Improve ValueSourceScorer's Default Cost Implementation URL: https://github.com/apache/lucene-solr/pull/1303 This commit makes ValueSourceScorer's costing algorithm also take the delegated FunctionValues' cost into consideration when calculating its own cost.
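The shape of the change can be sketched with stand-in types. These interfaces are hypothetical simplifications, not Lucene's actual FunctionValues or TwoPhaseIterator signatures; the point is only that the scorer's match cost delegates to the wrapped values object, so nested value sources can aggregate their children's costs instead of reporting a flat 100:

```java
public class MatchCostSketch {
    // Hypothetical stand-in for FunctionValues with a per-document cost estimate.
    interface FunctionValuesLike {
        float cost();
    }

    // Before: every ValueSourceScorer reported a constant matchCost of 100,
    // regardless of how deeply the ValueSource was nested.
    static float constantMatchCost() {
        return 100f;
    }

    // After: delegate to the wrapped FunctionValues, so stacked sources can
    // sum (or otherwise combine) the costs of the sources they wrap.
    static float delegatingMatchCost(FunctionValuesLike values) {
        return values.cost();
    }

    public static void main(String[] args) {
        FunctionValuesLike cheap = () -> 5f;
        // A "stacked" source whose cost aggregates two nested sources:
        FunctionValuesLike stacked = () -> 5f + 40f;
        System.out.println(delegatingMatchCost(cheap));   // 5.0
        System.out.println(delegatingMatchCost(stacked)); // 45.0
    }
}
```

The same idea would address the parallel concern in the issue for DoubleValuesSource, which likewise lacks a cost hook today.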