[jira] [Comment Edited] (SOLR-7796) Implement a "gather support info" button
[ https://issues.apache.org/jira/browse/SOLR-7796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17048841#comment-17048841 ] Andrzej Bialecki edited comment on SOLR-7796 at 3/2/20 7:28 AM: Please take a look at {{SolrCLI.AutoscalingTool}} (available as {{bin/solr autoscaling -save}}), which produces comprehensive snapshots of all autoscaling-related state - however, this also includes generally useful things such as: * state of all collections (ClusterState) * state of all nodes (including node properties needed by the autoscaling policy) * full content of ZK data * summary statistics, mostly related to autoscaling All of this information can optionally be redacted (anonymized) in a consistent fashion, so that e.g. node names, IPs, and collection names are consistently replaced with meaningless but user-friendly strings, both in JSON payloads and in the ZK data dump. The redaction part is also available separately in {{RedactionUtils}}. Edit: the COLSTATUS collection admin command gives you the details of the collection layout, optionally including low-level Lucene details, down to per-field size info and stats. was (Author: ab): Please take a look at {{SolrCLI.AutoscalingTool}} (available as {{bin/solr autoscaling -save}}), which produces comprehensive snapshots of all autoscaling-related state - however, this also includes generally useful things such as: * state of all collections (ClusterState) * state of all nodes (including node properties needed by the autoscaling policy) * full content of ZK data * summary statistics, mostly related to autoscaling All of this information can optionally be redacted (anonymized) in a consistent fashion, so that e.g. node names, IPs, and collection names are consistently replaced with meaningless but user-friendly strings, both in JSON payloads and in the ZK data dump. The redaction part is also available separately in {{RedactionUtils}}.
> Implement a "gather support info" button > > > Key: SOLR-7796 > URL: https://issues.apache.org/jira/browse/SOLR-7796 > Project: Solr > Issue Type: Improvement > Components: Admin UI >Reporter: Shawn Heisey >Priority: Minor > > A "gather support info" button in the admin UI would be extremely helpful. > There are some basic pieces of info that we like to have for problem reports > on the user list, so there should be an easy way for a user to gather that > info. > Some of the more basic bits of info would be easy to include in a single file > that's easy to cut/paste -- java version, heap info, core/collection names, > directories, and stats, etc. If available, it should include server info > like memory, commandline args, ZK info, and possibly disk space. > There could be two buttons -- one that gathers smaller info into an XML, > JSON, or .properties structure that can be easily cut/paste into an email > message, and another that gathers larger info like files for configuration > and schema along with the other info (grabbing from zookeeper if running in > cloud mode) and packages it into a .zip file. Because the user list eats > almost all attachments, we would need to come up with some advice for sharing > the zipfile. I hate to ask INFRA for a file sharing service, but that might > not be a bad idea. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
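The consistent redaction described in the comment above has one key property: every occurrence of a sensitive string maps to the same stable placeholder, so cross-references inside the snapshot still line up after anonymization. A minimal Python sketch of that idea (illustrative only - the class and token scheme are made up here, not Solr's actual {{RedactionUtils}} API):

```python
import itertools

class Redactor:
    """Replaces sensitive strings with stable placeholder tokens.

    The same input always maps to the same token, so a redacted
    snapshot stays internally consistent across JSON payloads and
    the ZK data dump.
    """
    def __init__(self, prefix="N"):
        self._counter = itertools.count(1)
        self._mapping = {}  # real value -> placeholder token
        self._prefix = prefix

    def redact(self, value):
        if value not in self._mapping:
            self._mapping[value] = f"{self._prefix}{next(self._counter)}"
        return self._mapping[value]

    def redact_text(self, text, sensitive):
        # Longest-first, so "node1:8983" is replaced before a bare "node1".
        for s in sorted(sensitive, key=len, reverse=True):
            text = text.replace(s, self.redact(s))
        return text

r = Redactor()
doc = '{"leader": "host1:8983", "replica": "host2:8983"}'
zk = "live_nodes: host1:8983, host2:8983"
print(r.redact_text(doc, ["host1:8983", "host2:8983"]))  # {"leader": "N1", "replica": "N2"}
print(r.redact_text(zk, ["host1:8983", "host2:8983"]))   # live_nodes: N1, N2
```

Because the mapping lives across calls, the same node gets the same token in every file of the snapshot, which is what makes the redacted output still debuggable.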
[jira] [Commented] (SOLR-13411) CompositeIdRouter calculates wrong route hash if atomic update is used for route.field
[ https://issues.apache.org/jira/browse/SOLR-13411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17048825#comment-17048825 ] Niko Himanen commented on SOLR-13411: - Thank you for fixing this (y) > CompositeIdRouter calculates wrong route hash if atomic update is used for > route.field > -- > > Key: SOLR-13411 > URL: https://issues.apache.org/jira/browse/SOLR-13411 > Project: Solr > Issue Type: Bug > Components: SolrCloud >Affects Versions: 7.5 >Reporter: Niko Himanen >Assignee: Mikhail Khludnev >Priority: Minor > Fix For: 8.5 > > Attachments: SOLR-13411.patch, SOLR-13411.patch > > > If a collection is created with the router.field parameter to define a > field other than uniqueKey as the route field, and a document update comes in > with the route field updated using atomic update syntax (for example > set=123), the hash for document routing is calculated from "set=123" and not > from 123, the real value, which may lead to the document being routed to the > wrong shard. > > This happens in CompositeIdRouter#sliceHash, where the field value is used > as-is for the hash calculation. > > I think there are two possible solutions to fix this: > a) Allow atomic updates also for route.field, but use the real value > instead of the atomic update syntax to route the document to the right shard. > b) Deny atomic updates for route.field and throw an exception. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
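The routing bug is easy to sketch. The snippet below is a hypothetical illustration, not Solr's code: {{shard_for}} stands in for CompositeIdRouter#sliceHash (Solr actually uses MurmurHash3), and {{extract_route_value}} shows fix (a), routing on the real value instead of the atomic-update wrapper:

```python
import zlib

def shard_for(route_value, num_shards=4):
    # Stand-in hash for CompositeIdRouter#sliceHash; CRC32 is used here
    # only because it is deterministic and in the stdlib.
    return zlib.crc32(str(route_value).encode("utf-8")) % num_shards

def extract_route_value(field_value):
    # Fix (a) from the description: when the route field arrives in atomic
    # update syntax, route on the real value, not on the wrapper map.
    if isinstance(field_value, dict) and len(field_value) == 1 and "set" in field_value:
        return field_value["set"]
    return field_value

doc = {"id": "doc1", "route_field": {"set": "123"}}  # atomic update syntax

buggy = shard_for(doc["route_field"])                       # hashes "{'set': '123'}"
fixed = shard_for(extract_route_value(doc["route_field"]))  # hashes "123"
# buggy and fixed can disagree, which is exactly how the document
# ends up on the wrong shard.
```

The committed fix may of course look different; the point is only that the hash input must be the field's value, not its atomic-update representation.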
[jira] [Updated] (LUCENE-9136) Introduce IVFFlat to Lucene for ANN similarity search
[ https://issues.apache.org/jira/browse/LUCENE-9136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xin-Chun Zhang updated LUCENE-9136: --- Description: Representation learning (RL) has been an established discipline in the machine learning space for decades, but it has drawn tremendous attention lately with the emergence of deep learning. The central problem of RL is to determine an optimal representation of the input data. By embedding the data into a high-dimensional vector, the vector retrieval (VR) method is then applied to search for the relevant items. With the rapid development of RL over the past few years, the technique has been used extensively in industry, from online advertising to computer vision and speech recognition. There exist many open source implementations of VR algorithms, such as Facebook's FAISS and Microsoft's SPTAG, providing various choices for potential users. However, the aforementioned implementations are all written in C++ with no plan to support a Java interface, making them hard to integrate into Java projects or to use for those who are not familiar with C/C++ [https://github.com/facebookresearch/faiss/issues/105]. The algorithms for vector retrieval can be roughly classified into four categories: # Tree-based algorithms, such as KD-tree; # Hashing methods, such as LSH (Locality-Sensitive Hashing); # Product-quantization-based algorithms, such as IVFFlat; # Graph-based algorithms, such as HNSW, SSG, NSG; where IVFFlat and HNSW are the most popular among all the VR algorithms. IVFFlat is better for high-precision applications such as face recognition, while HNSW performs better in general scenarios including recommendation and personalized advertisement. *The recall ratio of IVFFlat can be gradually increased by adjusting the query parameter (nprobe), while it's hard for HNSW to improve its accuracy*. In theory, IVFFlat could achieve a 100% recall ratio.
Recently, the implementation of HNSW (Hierarchical Navigable Small World, LUCENE-9004) for Lucene has made great progress. That issue has drawn the attention of those who are interested in Lucene or hope to use HNSW with Solr/Lucene. As an alternative for solving ANN similarity search problems, IVFFlat is also very popular with many users and supporters. Compared with HNSW, IVFFlat has a smaller index size but requires k-means clustering, while HNSW is faster at query time (no training required) but requires extra storage for saving graphs [indexing 1M vectors|https://github.com/facebookresearch/faiss/wiki/Indexing-1M-vectors]. Another advantage is that IVFFlat can be faster and more accurate when GPU parallel computing is enabled (currently not supported in Java). Both algorithms have their merits and demerits. Since HNSW is now under development, it may be better to provide both implementations (HNSW && IVFFlat) for potential users who face very different scenarios and want more choices. The latest branch is [*lucene-9136-ann-ivfflat*|https://github.com/irvingzhang/lucene-solr/commits/jira/lucene-9136-ann-ivfflat] was: Representation learning (RL) has been an established discipline in the machine learning space for decades, but it has drawn tremendous attention lately with the emergence of deep learning. The central problem of RL is to determine an optimal representation of the input data. By embedding the data into a high-dimensional vector, the vector retrieval (VR) method is then applied to search for the relevant items. With the rapid development of RL over the past few years, the technique has been used extensively in industry, from online advertising to computer vision and speech recognition. There exist many open source implementations of VR algorithms, such as Facebook's FAISS and Microsoft's SPTAG, providing various choices for potential users.
However, the aforementioned implementations are all written in C++ with no plan to support a Java interface, making them hard to integrate into Java projects or to use for those who are not familiar with C/C++ [https://github.com/facebookresearch/faiss/issues/105]. The algorithms for vector retrieval can be roughly classified into four categories: # Tree-based algorithms, such as KD-tree; # Hashing methods, such as LSH (Locality-Sensitive Hashing); # Product-quantization-based algorithms, such as IVFFlat; # Graph-based algorithms, such as HNSW, SSG, NSG; where IVFFlat and HNSW are the most popular among all the VR algorithms. IVFFlat is better for high-precision applications such as face recognition, while HNSW performs better in general scenarios including recommendation and personalized advertisement. *The recall ratio of IVFFlat can be gradually increased by adjusting the query parameter (nprobe), while it's hard for HNSW to improve its accuracy*. In theory, IVFFlat could achieve 100% r
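As a rough illustration of how IVFFlat trades recall for speed via nprobe, here is a toy pure-Python sketch (fixed centroids stand in for a trained k-means coarse quantizer; this is not the proposed Lucene implementation): probing more inverted lists raises recall, and probing all of them degenerates to exact search, which is why IVFFlat can reach 100% recall in theory.

```python
import math, random

def dist(a, b):
    return math.dist(a, b)  # Euclidean distance

class IVFFlat:
    """Toy IVFFlat: a coarse quantizer (fixed centroids here, standing in
    for trained k-means) plus exhaustive search within the nprobe nearest
    inverted lists."""
    def __init__(self, centroids):
        self.centroids = centroids
        self.lists = [[] for _ in centroids]  # one inverted list per centroid

    def add(self, vec):
        c = min(range(len(self.centroids)),
                key=lambda i: dist(vec, self.centroids[i]))
        self.lists[c].append(vec)

    def search(self, query, k, nprobe):
        # Probe only the nprobe closest lists; larger nprobe -> higher recall.
        order = sorted(range(len(self.centroids)),
                       key=lambda i: dist(query, self.centroids[i]))
        cands = [v for i in order[:nprobe] for v in self.lists[i]]
        return sorted(cands, key=lambda v: dist(query, v))[:k]

random.seed(0)
index = IVFFlat(centroids=[(0.0, 0.0), (10.0, 0.0), (0.0, 10.0), (10.0, 10.0)])
data = [(random.uniform(0, 10), random.uniform(0, 10)) for _ in range(200)]
for v in data:
    index.add(v)

query = (5.0, 5.0)
exact = sorted(data, key=lambda v: dist(query, v))[:5]
approx = index.search(query, k=5, nprobe=1)   # fast, may miss neighbors
full = index.search(query, k=5, nprobe=4)     # probes every list -> exact
assert full == exact                          # 100% recall when probing everything
```

The real trade-off (index size, training cost, query latency) depends on the data and parameters; the sketch only shows the mechanism behind the nprobe knob.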
[jira] [Resolved] (LUCENE-9243) TestXYPointDistanceSort failure: point was within the distance , but the bbox doesn't contain it
[ https://issues.apache.org/jira/browse/LUCENE-9243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ignacio Vera resolved LUCENE-9243. -- Fix Version/s: 8.5 Assignee: Ignacio Vera Resolution: Fixed > TestXYPointDistanceSort failure: point was within the distance , but the bbox > doesn't contain it > > > Key: LUCENE-9243 > URL: https://issues.apache.org/jira/browse/LUCENE-9243 > Project: Lucene - Core > Issue Type: Bug >Reporter: Ignacio Vera >Assignee: Ignacio Vera >Priority: Major > Fix For: 8.5 > > Time Spent: 20m > Remaining Estimate: 0h > > Reproduce: > {code:java} > ant test -Dtestcase=TestXYPointDistanceSort -Dtests.method=testRandomHuge > -Dtests.seed=EC212F407CDDF680 -Dtests.multiplier=2 -Dtests.nightly=true > -Dtests.slow=true > -Dtests.linedocsfile=/home/jenkins/jenkins-slave/workspace/Lucene-Solr-NightlyTests-8.x/test-data/enwiki.random.lines.txt > -Dtests.locale=fr-FR -Dtests.timezone=Pacific/Yap -Dtests.asserts=true > -Dtests.file.encoding=UTF-8 {code} > I had a look and this error is similar to LUCENE-7143. The solution should be > similar, add a fudge factor to the bounding box of a circle so we make sure > we include all points that are at the specified distance. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
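The shape of such a fix can be sketched as follows (illustrative Python; the fudge-factor value and function names are made up, not the code committed for this issue): the circle's bounding box is widened by a small relative factor so floating-point error in the distance math can never push an on-circle point outside the box.

```python
import math

# Floating-point error accumulates in the distance computations, so we
# widen the box by a small relative fudge factor instead of using the
# exact radius.  The value below is illustrative only; the real factor
# comes from an error analysis of the actual arithmetic.
FUDGE = 1e-7

def circle_bbox(cx, cy, radius):
    r = radius * (1.0 + FUDGE)  # slightly enlarged radius
    return (cx - r, cy - r, cx + r, cy + r)

def bbox_contains(box, x, y):
    minx, miny, maxx, maxy = box
    return minx <= x <= maxx and miny <= y <= maxy

# A point at exactly the specified distance must fall inside the box even
# when its coordinates were produced by (slightly lossy) trigonometry.
cx, cy, r = 3.0, -1.0, 7.0
theta = 0.123
px, py = cx + r * math.cos(theta), cy + r * math.sin(theta)
assert bbox_contains(circle_bbox(cx, cy, r), px, py)
```

With an exact (unwidened) box, rounding in the point or box coordinates can leave an on-circle point marginally outside, which is precisely the test failure described above.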
[jira] [Commented] (LUCENE-9243) TestXYPointDistanceSort failure: point was within the distance , but the bbox doesn't contain it
[ https://issues.apache.org/jira/browse/LUCENE-9243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17048807#comment-17048807 ] ASF subversion and git services commented on LUCENE-9243: - Commit d9787406f895d166e0d13eb5ce8a98865f1f3e39 in lucene-solr's branch refs/heads/branch_8x from Ignacio Vera [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=d978740 ] LUCENE-9243: Add fudge factor when creating a bounding box of a xycircle (#1278) > TestXYPointDistanceSort failure: point was within the distance , but the bbox > doesn't contain it > > > Key: LUCENE-9243 > URL: https://issues.apache.org/jira/browse/LUCENE-9243 > Project: Lucene - Core > Issue Type: Bug >Reporter: Ignacio Vera >Priority: Major > Time Spent: 20m > Remaining Estimate: 0h > > Reproduce: > {code:java} > ant test -Dtestcase=TestXYPointDistanceSort -Dtests.method=testRandomHuge > -Dtests.seed=EC212F407CDDF680 -Dtests.multiplier=2 -Dtests.nightly=true > -Dtests.slow=true > -Dtests.linedocsfile=/home/jenkins/jenkins-slave/workspace/Lucene-Solr-NightlyTests-8.x/test-data/enwiki.random.lines.txt > -Dtests.locale=fr-FR -Dtests.timezone=Pacific/Yap -Dtests.asserts=true > -Dtests.file.encoding=UTF-8 {code} > I had a look and this error is similar to LUCENE-7143. The solution should be > similar, add a fudge factor to the bounding box of a circle so we make sure > we include all points that are at the specified distance. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-9243) TestXYPointDistanceSort failure: point was within the distance , but the bbox doesn't contain it
[ https://issues.apache.org/jira/browse/LUCENE-9243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17048805#comment-17048805 ] ASF subversion and git services commented on LUCENE-9243: - Commit c653c04bb1717f9813e50d2fd2cef1d323ee6036 in lucene-solr's branch refs/heads/master from Ignacio Vera [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=c653c04 ] LUCENE-9243: Add fudge factor when creating a bounding box of a xycircle (#1278) > TestXYPointDistanceSort failure: point was within the distance , but the bbox > doesn't contain it > > > Key: LUCENE-9243 > URL: https://issues.apache.org/jira/browse/LUCENE-9243 > Project: Lucene - Core > Issue Type: Bug >Reporter: Ignacio Vera >Priority: Major > Time Spent: 20m > Remaining Estimate: 0h > > Reproduce: > {code:java} > ant test -Dtestcase=TestXYPointDistanceSort -Dtests.method=testRandomHuge > -Dtests.seed=EC212F407CDDF680 -Dtests.multiplier=2 -Dtests.nightly=true > -Dtests.slow=true > -Dtests.linedocsfile=/home/jenkins/jenkins-slave/workspace/Lucene-Solr-NightlyTests-8.x/test-data/enwiki.random.lines.txt > -Dtests.locale=fr-FR -Dtests.timezone=Pacific/Yap -Dtests.asserts=true > -Dtests.file.encoding=UTF-8 {code} > I had a look and this error is similar to LUCENE-7143. The solution should be > similar, add a fudge factor to the bounding box of a circle so we make sure > we include all points that are at the specified distance. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] iverase merged pull request #1278: LUCENE-9243: Add fudge factor when creating a bounding box of a xycircle
iverase merged pull request #1278: LUCENE-9243: Add fudge factor when creating a bounding box of a xycircle URL: https://github.com/apache/lucene-solr/pull/1278 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-13502) Investigate using something other than ZooKeeper's "4 letter words" for the admin UI status
[ https://issues.apache.org/jira/browse/SOLR-13502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17048789#comment-17048789 ] Erick Erickson commented on SOLR-13502: --- Hmmm, according to the ZK docs, the admin server is automatically started on port 8080, and I did a simple test to verify that [http://localhost:8080/commands/ruok] works just fine. So apparently, one could just use a straight http call to get this information for the admin page. But... 1> is it even a good idea to use this? 2> I can't really find a simple http connection in the Solr code on a quick look while riding Amtrak with bad internet connections, any hints? > Investigate using something other than ZooKeeper's "4 letter words" for the > admin UI status > --- > > Key: SOLR-13502 > URL: https://issues.apache.org/jira/browse/SOLR-13502 > Project: Solr > Issue Type: Improvement >Reporter: Erick Erickson >Assignee: Erick Erickson >Priority: Major > > ZooKeeper 3.5.5 requires a whitelist of allowed "4 letter words". The only > place I see on a quick look at the Solr code where 4lws are used is in the > admin UI "ZK Status" link. > In order to use the admin UI "ZK Status" link, users will have to modify > their zoo.cfg file with > {code} > 4lw.commands.whitelist=mntr,conf,ruok > {code} > This JIRA is to see if there are alternatives to using 4lw for the admin UI. > This depends on SOLR-8346. If we find an alternative, we need to remove the > additions to the ref guide that mention changing zoo.cfg (just scan for 4lw > in all the .adoc files) and remove SolrZkServer.ZK_WHITELIST_PROPERTY and all > references to it (SolrZkServer and SolrTestCaseJ4). -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
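For reference, the "straight http call" could look like the sketch below (Python, for illustration of the shape of the call; the endpoint path and default port 8080 follow the ZK AdminServer docs, while the JSON response shape is an assumption to verify against a real server):

```python
import json
from urllib.request import urlopen
from urllib.error import URLError

def ruok_ok(body_text):
    # The AdminServer serves command output as JSON; we treat a reply with
    # no "error" as healthy.  (Response shape assumed from the ZooKeeper
    # AdminServer docs -- verify against an actual server.)
    try:
        body = json.loads(body_text)
    except ValueError:
        return False
    return body.get("error") in (None, "")

def zk_ruok(host="localhost", port=8080, timeout=2.0):
    """True if ZooKeeper's AdminServer answers /commands/ruok cleanly."""
    try:
        with urlopen(f"http://{host}:{port}/commands/ruok", timeout=timeout) as resp:
            return ruok_ok(resp.read().decode("utf-8"))
    except (URLError, OSError):
        return False
```

Parsing is split out of the network call so the interpretation logic is testable without a live ensemble; a Java equivalent would presumably use HttpURLConnection or the shared HttpClient already on Solr's classpath.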
[GitHub] [lucene-solr] mocobeta commented on issue #1304: LUCENE-9242: generate javadocs by calling Ant javadoc task
mocobeta commented on issue #1304: LUCENE-9242: generate javadocs by calling Ant javadoc task URL: https://github.com/apache/lucene-solr/pull/1304#issuecomment-593160403 To confirm if the inter-module links are correctly generated, the "broken links check" task will be of help for us (it isn't ported to gradle yet). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-9242) Gradle Javadoc task should output the same documents as Ant
[ https://issues.apache.org/jira/browse/LUCENE-9242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17048702#comment-17048702 ] Tomoko Uchida commented on LUCENE-9242: --- [~dweiss] [~rcmuir] Could you take a look at the PR? It seems to work for me but I am not sure if this is a good start or not, any thoughts or brief comments are welcomed. > Gradle Javadoc task should output the same documents as Ant > --- > > Key: LUCENE-9242 > URL: https://issues.apache.org/jira/browse/LUCENE-9242 > Project: Lucene - Core > Issue Type: Sub-task > Components: general/javadocs >Affects Versions: master (9.0) >Reporter: Tomoko Uchida >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > "javadoc" task for the Gradle build does not correctly output package > summaries, since it ignores "package.html" file in the source tree (so the > Python linter {{checkJavaDocs.py}} detects that and fails for now.) > Also the "javadoc" task should make inter-module links just as Ant build does. > See for more details: LUCENE-9201 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Comment Edited] (LUCENE-9242) Gradle Javadoc task should output the same documents as Ant
[ https://issues.apache.org/jira/browse/LUCENE-9242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17048696#comment-17048696 ] Tomoko Uchida edited comment on LUCENE-9242 at 3/1/20 10:38 PM: I opened a draft PR [https://github.com/apache/lucene-solr/pull/1304]. This adds a gradle task, named {{invokeJavadoc}}, which generates Javadocs with inter-module hyperlinks by invoking the Ant javadoc task. It also passes the {{checkMissingJavadocs}} check. The task can be called as below: {code:java} # generate javadocs for each project $ ./gradlew :lucene:core:invokeJavadoc {code} or, {code:java} # generate javadocs for all projects at once $ ./gradlew invokeJavadoc {code} The work isn't completed yet, but the most important parts are already ported. Quick replies to the comments on LUCENE-9201 follow: {quote}It is my personal preference to have a project-scope granularity. This way you can run project-scoped task (like gradlew -p lucene/core javadoc). My personal take on assembling "distributions" is to have a separate project that just takes what it needs from other projects and puts it together (with any tweaks required). This makes it easier to reason about how a distribution is assembled and from where, while each project just takes care of itself. {quote} I'd love this approach; however, while trying it I noticed that it seems difficult to properly generate inter-module hyperlinks without affecting the existing javadoc's path hierarchy (already published on the apache.org web site), if we want to place generated javadocs under ${sub_project_root}/build/docs/javadoc (gradle's default javadoc destination). The fundamental problem here, I think, is that in order to make hyperlinks from a module A to another module B, we need to know the effective relative path from module A to module B and pass it to the Javadoc Tool.
I aggregated all javadocs into {{lucene/build/docs}} or {{solr/build/docs}}, just as the Ant build does, to resolve the relative paths. I might have missed something - please let me know if my understanding isn't correct. {quote}for "directly call the javadoc tool" we may want to use the ant task as a start. This ant task is doing quite a bit of work above and beyond what the tool is doing (if you look at the relevant code to ant, you may be shocked!). {quote} As a first step I tried to reproduce the principal Ant macros, "invoke-javadoc" (in lucene/common-build.xml) and "invoke-module-javadoc" (in lucene/module-build.xml), in the gradle build. With this there are no longer missing package summaries, and inter-module links are generated. (The current setup for resolving the hyperlinks looks quite redundant; I think we can do it in a more sophisticated way.) {quote}A custom javadoc invocation is certainly possible and could possibly make things easier in the long run. {quote} {quote}as a second step you can look at computing package list for a module yourself (it may allow invoking the tool directly). {quote} Yes, we will probably be able to throw away all ant tasks and rely only on pure gradle code. Some extra effort will be needed to faithfully transfer the elaborate ant setups into corresponding gradle scripts... {quote}You'd need to declare inputs/ outputs properly though so that it is skippable. Those javadoc invocations take a long time in precommit. {quote} I declared inputs/outputs on the task so that the javadoc invocation is not needlessly repeated. It seems to work - {{ant.javadoc}} is called only when the java sources or the output directory change. was (Author: tomoko uchida): I opened a draft PR [https://github.com/apache/lucene-solr/pull/1304]. This adds a gradle task, named {{invokeJavadoc}}, which generates Javadocs with inter-module hyperlinks by invoking Ant javadoc task. Also this passes {{checkMissingJavadocs}} check.
The task can be called as below: {code:java} # generate javadocs for each project $ ./gradlew :lucene:core:invokeJavadoc {code} or, {code:java} # generate javadocs for all projects at once $ ./gradlew invokeJavadoc {code} The work isn't completed yet, but the most important parts are already ported. Quick replies to comments on LUCENE-9201 will be following: {quote}It is my personal preference to have a project-scope granularity. This way you can run project-scoped task (like gradlew -p lucene/core javadoc). My personal take on assembling "distributions" is to have a separate project that just takes what it needs from other projects and puts it together (with any tweaks required). This makes it easier to reason about how a distribution is assembled and from where, while each project just takes care of itself. {quote} I'd love this approach, however, when I was trying I noticed that it looks difficult to properly generate inter-module hyperlinks without affecting the existing javadoc's path hierarchy (already published on
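The "effective relative path from module A to module B" mentioned above is straightforward to compute once all javadocs share one docs root, which is exactly why aggregating everything under lucene/build/docs resolves it. A small illustrative sketch (the helper functions and exact layout are hypothetical, not part of the PR):

```python
import os.path

# Each module's javadocs land under a shared docs root, mirroring the Ant
# layout; inter-module links are then plain relative hrefs between them.
DOCS_ROOT = "lucene/build/docs"

def module_docs_dir(module):
    return os.path.join(DOCS_ROOT, module)

def link_between(from_module, to_module):
    # Relative href to embed in module A's javadocs pointing at module B's.
    return os.path.relpath(module_docs_dir(to_module),
                           start=module_docs_dir(from_module))

print(link_between("core", "analysis/common"))  # ../analysis/common
print(link_between("analysis/common", "core"))  # ../../core
```

With per-project output directories like ${sub_project_root}/build/docs/javadoc there is no common root, so these relative hrefs would no longer match the published path hierarchy, which is the difficulty described above.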
[jira] [Comment Edited] (LUCENE-9242) Gradle Javadoc task should output the same documents as Ant
[ https://issues.apache.org/jira/browse/LUCENE-9242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17048696#comment-17048696 ] Tomoko Uchida edited comment on LUCENE-9242 at 3/1/20 10:37 PM: I opened a draft PR [https://github.com/apache/lucene-solr/pull/1304]. This adds a Gradle task named {{invokeJavadoc}}, which generates Javadocs with inter-module hyperlinks by invoking the Ant javadoc task. It also passes the {{checkMissingJavadocs}} check. The task can be invoked as follows:
{code:java}
# generate javadocs for one project
$ ./gradlew :lucene:core:invokeJavadoc
{code}
or:
{code:java}
# generate javadocs for all projects at once
$ ./gradlew invokeJavadoc
{code}
The work isn't completed yet, but the most important parts are already ported. Quick replies to the comments on LUCENE-9201 follow: {quote}It is my personal preference to have a project-scope granularity. This way you can run project-scoped task (like gradlew -p lucene/core javadoc). My personal take on assembling "distributions" is to have a separate project that just takes what it needs from other projects and puts it together (with any tweaks required). This makes it easier to reason about how a distribution is assembled and from where, while each project just takes care of itself. {quote} I'd love this approach; however, while trying it I noticed that it looks difficult to properly generate inter-module hyperlinks without affecting the existing javadocs' path hierarchy (already published on the apache.org web site) if we want to place generated javadocs under ${sub_project_root}/build/docs/javadoc (Gradle's default javadoc destination). The fundamental problem here, I think, is that in order to make hyperlinks from a module A to another module B, we need to know the effective relative path from module A to module B and pass it to the Javadoc Tool.
I aggregated all javadocs into {{lucene/build/docs}} or {{solr/build/docs}}, just as the Ant build does, to resolve the relative paths. I might be missing something - please let me know if my understanding isn't correct. {quote}for "directly call the javadoc tool" we may want to use the ant task as a start. This ant task is doing quite a bit of work above and beyond what the tool is doing (if you look at the relevant code to ant, you may be shocked!). {quote} As the first step, I tried to reproduce the principal Ant macros "invoke-javadoc" (in lucene/common-build.xml) and "invoke-module-javadoc" (in lucene/module-build.xml) in the Gradle build. By doing so, there are now no missing package summaries, and inter-module links are generated. (The current setup for resolving the hyperlinks looks quite redundant; I think we can do it in more sophisticated ways.) {quote}A custom javadoc invocation is certainly possible and could possibly make things easier in the long run. {quote} {quote}as a second step you can look at computing package list for a module yourself (it may allow invoking the tool directly). {quote} Yes, we will probably be able to throw away all Ant tasks and rely only on pure Gradle code. Some extra effort will be needed to faithfully transfer the elaborate Ant setup into corresponding Gradle scripts... {quote}You'd need to declare inputs/ outputs properly though so that it is skippable. Those javadoc invocations take a long time in precommit. {quote} I declared inputs/outputs on the task so as not to needlessly repeat the javadoc invocation. It seems to work: {{ant.javadoc}} is called only when the Java sources or the output directory change. was (Author: tomoko uchida): I opened a draft PR [https://github.com/apache/lucene-solr/pull/1304]. This adds a gradle task, named {{invokeJavadoc}}, which generates Javadocs with inter-module hyperlinks by invoking Ant javadoc task. Also this passes {{checkMissingJavadocs}} check. 
The task can be called as below: {code:java} # generate javadocs for each project $ ./gradlew :lucene:core:invokeJavadoc {code} or, {code:java} # generate javadocs for all projects at once $ ./gradlew invokeJavadoc {code} The work isn't completed yet, but the most important parts are already ported. Quick replies to comments on LUCENE-9201 will be following: {quote}It is my personal preference to have a project-scope granularity. This way you can run project-scoped task (like gradlew -p lucene/core javadoc). My personal take on assembling "distributions" is to have a separate project that just takes what it needs from other projects and puts it together (with any tweaks required). This makes it easier to reason about how a distribution is assembled and from where, while each project just takes care of itself. {quote} I'd love this approach, however, when I was trying I noticed that it looks difficult to properly generate inter-module hyperlinks without affecting the existing javadoc's path hierarchy (already published on the apach
[jira] [Commented] (LUCENE-9242) Gradle Javadoc task should output the same documents as Ant
[ https://issues.apache.org/jira/browse/LUCENE-9242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17048696#comment-17048696 ] Tomoko Uchida commented on LUCENE-9242: --- I opened a draft PR [https://github.com/apache/lucene-solr/pull/1304]. This adds a Gradle task named {{invokeJavadoc}}, which generates Javadocs with inter-module hyperlinks by invoking the Ant javadoc task. It also passes the {{checkMissingJavadocs}} check. The task can be invoked as follows:
{code:java}
# generate javadocs for one project
$ ./gradlew :lucene:core:invokeJavadoc
{code}
or:
{code:java}
# generate javadocs for all projects at once
$ ./gradlew invokeJavadoc
{code}
The work isn't completed yet, but the most important parts are already ported. Quick replies to the comments on LUCENE-9201 follow: {quote}It is my personal preference to have a project-scope granularity. This way you can run project-scoped task (like gradlew -p lucene/core javadoc). My personal take on assembling "distributions" is to have a separate project that just takes what it needs from other projects and puts it together (with any tweaks required). This makes it easier to reason about how a distribution is assembled and from where, while each project just takes care of itself. {quote} I'd love this approach; however, while trying it I noticed that it looks difficult to properly generate inter-module hyperlinks without affecting the existing javadocs' path hierarchy (already published on the apache.org web site) if we want to place generated javadocs under ${sub_project_root}/build/docs/javadoc (Gradle's default javadoc destination). The fundamental problem here, I think, is that in order to make hyperlinks from a module A to another module B, we need to know the effective relative path from module A to module B and pass it to the Javadoc Tool. I aggregated all javadocs into {{lucene/build/docs}} or {{solr/build/docs}}, just as the Ant build does, to resolve the relative paths. 
I might be missing something - please let me know if my understanding isn't correct. {quote}for "directly call the javadoc tool" we may want to use the ant task as a start. This ant task is doing quite a bit of work above and beyond what the tool is doing (if you look at the relevant code to ant, you may be shocked!). {quote} As the first step, I tried to reproduce the principal Ant macros "invoke-javadoc" (in lucene/common-build.xml) and "invoke-module-javadoc" (in lucene/module-build.xml) in the Gradle build. By doing so, there are now no missing package summaries, and inter-module links are generated. (The current setup for resolving the hyperlinks looks quite redundant; I think we can do it in more sophisticated ways.) {quote}A custom javadoc invocation is certainly possible and could possibly make things easier in the long run. {quote} {quote}as a second step you can look at computing package list for a module yourself (it may allow invoking the tool directly). {quote} Yes, we will probably be able to throw away all Ant tasks and rely only on pure Gradle code. Some extra effort will be needed to faithfully transfer the elaborate Ant setup into corresponding Gradle scripts... {quote}You'd need to declare inputs/ outputs properly though so that it is skippable. Those javadoc invocations take a long time in precommit. {quote} I declared inputs/outputs on the task so as not to needlessly repeat the javadoc invocation. It seems to work: {{ant.javadoc}} is called only when the Java sources or the output directory change. 
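The inputs/outputs point above can be sketched in a Gradle build script roughly as follows. This is a minimal sketch assuming the {{java}} plugin is applied; the task name matches the PR, but the property wiring and paths are illustrative, not taken from it:

```groovy
// Hypothetical sketch: wrap ant.javadoc in a Gradle task with declared
// inputs/outputs so repeated invocations can be skipped (UP-TO-DATE).
task invokeJavadoc {
    // Re-run only when the Java sources change...
    inputs.files(sourceSets.main.java.srcDirs)
    // ...or when the aggregated output directory is missing or stale.
    def docsDir = rootProject.file("lucene/build/docs/${project.name}")
    outputs.dir(docsDir)

    doLast {
        // Delegate the actual work to the Ant javadoc task, as the draft PR does
        // (most attributes elided for brevity).
        ant.javadoc(destdir: docsDir,
                    sourcepath: sourceSets.main.java.srcDirs.join(File.pathSeparator))
    }
}
```

With inputs/outputs declared like this, a second `./gradlew invokeJavadoc` reports the task as UP-TO-DATE unless the sources or the output directory changed.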
> Gradle Javadoc task should output the same documents as Ant > --- > > Key: LUCENE-9242 > URL: https://issues.apache.org/jira/browse/LUCENE-9242 > Project: Lucene - Core > Issue Type: Sub-task > Components: general/javadocs >Affects Versions: master (9.0) >Reporter: Tomoko Uchida >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > "javadoc" task for the Gradle build does not correctly output package > summaries, since it ignores "package.html" file in the source tree (so the > Python linter {{checkJavaDocs.py}} detects that and fails for now.) > Also the "javadoc" task should make inter-module links just as Ant build does. > See for more details: LUCENE-9201 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] dsmiley commented on issue #1303: LUCENE-9114: Improve ValueSourceScorer's Default Cost Implementation
dsmiley commented on issue #1303: LUCENE-9114: Improve ValueSourceScorer's Default Cost Implementation URL: https://github.com/apache/lucene-solr/pull/1303#issuecomment-593151099 Let's add the following test to TestFunctionRangeQuery:
```
@Test
public void testTwoRangeQueries() throws IOException {
  Query rq1 = new FunctionRangeQuery(INT_VALUESOURCE, 2, 4, true, true);
  Query rq2 = new FunctionRangeQuery(INT_VALUESOURCE, 8, 10, true, true);
  Query bq = new BooleanQuery.Builder()
      .add(rq1, BooleanClause.Occur.SHOULD)
      .add(rq2, BooleanClause.Occur.SHOULD)
      .build();
  ScoreDoc[] scoreDocs = indexSearcher.search(bq, N_DOCS).scoreDocs;
  expectScores(scoreDocs, 10, 9, 8, 4, 3, 2);
}
```
This'll stack-overflow on your first implementation. > Maybe have FunctionValues expose an abstract cost() method, have all FV derivatives implement it and then simply let VSC's matchCost use that method? Yes; we certainly need the FV to provide the cost; the TPI.matchCost should simply look it up. With the FV (or VS) exposing a cost, it becomes straightforward for anyone's custom FV/VS to specify what their cost is. It's debatable whether this cost should be on the VS vs the FV. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
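The API shape being proposed here can be sketched with a simplified, self-contained example. The class names and the cost figure are hypothetical, not Lucene's real classes: each FunctionValues implementation exposes its own cost(), and the scorer's matchCost() simply looks it up instead of computing anything itself, so no recursive delegation is possible.

```java
// Hypothetical stand-ins for FunctionValues / ValueSourceScorer.
abstract class FunctionValuesSketch {
    /** Per-document evaluation cost estimate; subclasses must provide it. */
    abstract float cost();
}

class IntFieldValuesSketch extends FunctionValuesSketch {
    @Override
    float cost() {
        return 5f; // hypothetical figure for a cheap numeric lookup
    }
}

class ValueSourceScorerSketch {
    private final FunctionValuesSketch values;

    ValueSourceScorerSketch(FunctionValuesSketch values) {
        this.values = values;
    }

    /** The two-phase iterator's match cost: delegate to the wrapped values. */
    float matchCost() {
        return values.cost();
    }
}
```

Because the delegation runs in only one direction (scorer asks values, never the reverse), the stack-overflow scenario the test above provokes cannot occur.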
[jira] [Commented] (LUCENE-9255) ValueSource Has Generic Typing Issues
[ https://issues.apache.org/jira/browse/LUCENE-9255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17048670#comment-17048670 ] Alan Woodward commented on LUCENE-9255: --- Ideally we'd deprecate ValueSource and replace it with DoubleValuesSource/LongValuesSource everywhere, but that's a massive job as ValueSource is used all over the place in Solr. > ValueSource Has Generic Typing Issues > - > > Key: LUCENE-9255 > URL: https://issues.apache.org/jira/browse/LUCENE-9255 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Atri Sharma >Priority: Major > > ValueSource uses a bunch of weakly typed members which raises compiler > issues. We need to fix this in ValueSource and all of its subclasses. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] mocobeta opened a new pull request #1304: LUCENE-9242: generate javadocs by calling Ant javadoc task
mocobeta opened a new pull request #1304: LUCENE-9242: generate javadocs by calling Ant javadoc task URL: https://github.com/apache/lucene-solr/pull/1304 ## Description Draft PR that adds a Gradle task to generate javadocs by invoking the Ant javadoc task. All generated javadocs pass the "checkMissingDocs" check. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] atris edited a comment on issue #1303: LUCENE-9114: Improve ValueSourceScorer's Default Cost Implementation
atris edited a comment on issue #1303: LUCENE-9114: Improve ValueSourceScorer's Default Cost Implementation URL: https://github.com/apache/lucene-solr/pull/1303#issuecomment-593133186 @dsmiley Thinking further, I see no obvious way of ValueSourceScorer being able to determine a reasonable cost without having inputs from FunctionValues, and currently, the sane way of getting a cost out of FunctionValues is through its TPI (unless I am missing something?). The best cost metrics will come when specific implementations (such as IntFieldSource) expose their cost by internally evaluating their complexity in a source specific manner instead of delegating to the default FV matchCost implementation. Maybe have FunctionValues expose an abstract cost() method, have all FV derivatives implement it and then simply let VSC's matchCost use that method? (and oh, I realised that this PR is definitely wrong, thanks for pointing it out. For some reason I missed that FV delegates to VSC using itself as the delegated FV value. Need to add Lucene tests for VSC...) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] atris edited a comment on issue #1303: LUCENE-9114: Improve ValueSourceScorer's Default Cost Implementation
atris edited a comment on issue #1303: LUCENE-9114: Improve ValueSourceScorer's Default Cost Implementation URL: https://github.com/apache/lucene-solr/pull/1303#issuecomment-593133186 @dsmiley Thinking further, I see no obvious way of ValueSourceScorer being able to determine a reasonable cost without having inputs from FunctionValues, and currently, the sane way of getting a cost out of FunctionValues is through its TPI (unless I am missing something?). The best cost metrics will come when specific implementations (such as IntFieldSource) expose their cost by internally evaluating their complexity in a source specific manner instead of delegating to the default FV matchCost implementation. Maybe have FunctionValues expose an abstract cost() method, have all FV derivatives implement it and then simply let VSC's matchCost use that method? (and oh, I realised that this PR is definitely wrong, for some reason I missed that FV delegates to VSC using itself as the delegated FV value. Need to add Lucene tests for VSC...) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] atris commented on issue #1303: LUCENE-9114: Improve ValueSourceScorer's Default Cost Implementation
atris commented on issue #1303: LUCENE-9114: Improve ValueSourceScorer's Default Cost Implementation URL: https://github.com/apache/lucene-solr/pull/1303#issuecomment-593133186 @dsmiley Thinking further, I see no obvious way of ValueSourceScorer being able to determine a reasonable cost without having inputs from FunctionValues, and currently, the sane way of getting a cost out of FunctionValues is through its TPI (unless I am missing something?). The best cost metrics will come when specific implementations (such as IntFieldSource) expose their cost by internally evaluating their complexity in a source specific manner instead of delegating to the default FV matchCost implementation. Maybe have FunctionValues expose an abstract cost() method, have all FV derivatives implement it and then simply let VSC's matchCost use that method? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] atris commented on issue #1303: LUCENE-9114: Improve ValueSourceScorer's Default Cost Implementation
atris commented on issue #1303: LUCENE-9114: Improve ValueSourceScorer's Default Cost Implementation URL: https://github.com/apache/lucene-solr/pull/1303#issuecomment-593126930 > Wouldn't this result in an infinite loop? The idea was that the underlying TwoPhaseIterator implementation for the nested FunctionValues would be an actual VSC derivative and not using the default matchCost version. I planned to merge this PR only for 8x and make matchCost an abstract method for master. BTW, what would be your suggestion to better evaluate FunctionValues's cost? Maybe we could look at the doc each time and see if it matches or not and then return a cost based on that? Another thing -- the Lucene test suite passes fine with this change. Does that mean that we are lacking comprehensive tests for VSC where two nested FunctionValues use the default VSC matchCost? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
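The infinite loop dsmiley asked about can be illustrated with a self-contained toy example (hypothetical names, not Lucene's classes): if the default FunctionValues match cost asks its scorer, and the default scorer match cost asks its FunctionValues, neither call ever bottoms out and the JVM throws StackOverflowError.

```java
// Toy stand-ins for the mutually delegating pair.
class ValuesStub {
    ScorerStub scorer;
    float matchCost() { return scorer.matchCost(); } // delegates to the scorer
}

class ScorerStub {
    ValuesStub values;
    float matchCost() { return values.matchCost(); } // delegates straight back
}

public class DelegationLoopDemo {
    /** Returns true if the mutual delegation overflows the stack. */
    static boolean overflows() {
        ValuesStub v = new ValuesStub();
        ScorerStub s = new ScorerStub();
        v.scorer = s;
        s.values = v;
        try {
            s.matchCost(); // recurses between the two matchCost() methods
            return false;
        } catch (StackOverflowError e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("overflows: " + overflows());
    }
}
```

This is why a test with two nested range queries sharing the default implementations (like the one dsmiley suggests above) fails with a stack overflow rather than a wrong cost.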
[GitHub] [lucene-solr] danmuzi commented on issue #1296: LUCENE-9253: Support custom dictionaries in KoreanTokenizer
danmuzi commented on issue #1296: LUCENE-9253: Support custom dictionaries in KoreanTokenizer URL: https://github.com/apache/lucene-solr/pull/1296#issuecomment-593120577 Thanks for your review, @msokolov! I fixed some capitalization in the Javadoc, and I added a change log entry for this patch. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] danmuzi commented on a change in pull request #1296: LUCENE-9253: Support custom dictionaries in KoreanTokenizer
danmuzi commented on a change in pull request #1296: LUCENE-9253: Support custom dictionaries in KoreanTokenizer URL: https://github.com/apache/lucene-solr/pull/1296#discussion_r386122724 ## File path: lucene/analysis/nori/src/java/org/apache/lucene/analysis/ko/dict/BinaryDictionary.java ## @@ -150,7 +150,18 @@ protected final InputStream getResource(String suffix) throws IOException { throw new IllegalStateException("unknown resource scheme " + resourceScheme); } } - + + public static InputStream getResource(ResourceScheme scheme, String path) throws IOException { Review comment: It's the same as Kuromoji's pattern. But there is no relation with Kuromoji. It's already written as an enum in this class. https://github.com/apache/lucene-solr/blob/d5e44e95175dbf027915e162925057bbcc14200b/lucene/analysis/nori/src/java/org/apache/lucene/analysis/ko/dict/BinaryDictionary.java#L45-L47 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] danmuzi commented on a change in pull request #1296: LUCENE-9253: Support custom dictionaries in KoreanTokenizer
danmuzi commented on a change in pull request #1296: LUCENE-9253: Support custom dictionaries in KoreanTokenizer URL: https://github.com/apache/lucene-solr/pull/1296#discussion_r386122678 ## File path: lucene/analysis/nori/src/java/org/apache/lucene/analysis/ko/KoreanTokenizer.java ## @@ -185,16 +185,43 @@ public KoreanTokenizer(AttributeFactory factory, UserDictionary userDictionary, * @param discardPunctuation true if punctuation tokens should be dropped from the output. */ public KoreanTokenizer(AttributeFactory factory, UserDictionary userDictionary, DecompoundMode mode, boolean outputUnknownUnigrams, boolean discardPunctuation) { +this(factory, +TokenInfoDictionary.getInstance(), +UnknownDictionary.getInstance(), +ConnectionCosts.getInstance(), +userDictionary, mode, outputUnknownUnigrams, discardPunctuation); + } + + /** + * Create a new KoreanTokenizer supplying a custom system dictionary and unknown dictionary. + * This constructor provides an entry point for users that want to construct custom language models + * that can be used as input to {@link org.apache.lucene.analysis.ko.util.DictionaryBuilder}. + * + * @param factory the AttributeFactory to use + * @param systemDictionary a custom known token dictionary + * @param unkDictionary a custom unknown token dictionary + * @param connectionCosts custom token transition costs + * @param userDictionary Optional: if non-null, user dictionary. + * @param mode Decompound mode. + * @param outputUnknownUnigrams If true outputs unigrams for unknown words. Review comment: Oh, I did that because it was capitalized before. I'll change the other constructors as well. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] dsmiley commented on issue #1303: LUCENE-9114: Improve ValueSourceScorer's Default Cost Implementation
dsmiley commented on issue #1303: LUCENE-9114: Improve ValueSourceScorer's Default Cost Implementation URL: https://github.com/apache/lucene-solr/pull/1303#issuecomment-593116806 Wouldn't this result in an infinite loop? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] msokolov commented on a change in pull request #1294: LUCENE-9074: Slice Allocation Control Plane For Concurrent Searches
msokolov commented on a change in pull request #1294: LUCENE-9074: Slice Allocation Control Plane For Concurrent Searches URL: https://github.com/apache/lucene-solr/pull/1294#discussion_r386109250 ## File path: lucene/core/src/java/org/apache/lucene/search/SliceExecutionControlPlane.java ## @@ -0,0 +1,32 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.lucene.search; + +import java.util.Collection; + +/** + * Execution control plane which is responsible + * for execution of slices based on the current status + * of the system and current system load + */ +public interface SliceExecutionControlPlane { + /** + * Invoke all slices that are allocated for the query + */ + C invokeAll(Collection tasks); Review comment: Also- I'm curious if you saw any performance impact from the back pressure here? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] msokolov commented on a change in pull request #1294: LUCENE-9074: Slice Allocation Control Plane For Concurrent Searches
msokolov commented on a change in pull request #1294: LUCENE-9074: Slice Allocation Control Plane For Concurrent Searches URL: https://github.com/apache/lucene-solr/pull/1294#discussion_r386109193 ## File path: lucene/core/src/java/org/apache/lucene/search/SliceExecutionControlPlane.java ## @@ -0,0 +1,32 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.lucene.search; + +import java.util.Collection; + +/** + * Execution control plane which is responsible + * for execution of slices based on the current status + * of the system and current system load + */ +public interface SliceExecutionControlPlane { + /** + * Invoke all slices that are allocated for the query + */ + C invokeAll(Collection tasks); Review comment: This is an internal detail of IndexSearcher, right? We're always free to change the method signatures later (if we keep the classes package-private - we should!). Maybe it would help if you were to explain what extension you have in mind. By the way, using force push makes it more difficult for reviewers since we can't easily see what changed from one version to the next. 
In general it's better to push your commits and then squash-merge them at the end (github will even do this for you I think) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
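The "back pressure" msokolov asks about can be sketched with one common pattern for this kind of slice execution layer. This is a general sketch, not necessarily what the PR implements: all slices but the last are handed to the executor, and the last slice runs on the caller thread, so the caller cannot keep queueing work faster than it executes.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;

public class CallerRunsLastSketch {
    public static <T> List<T> invokeAll(ExecutorService executor,
                                        Collection<? extends Callable<T>> tasks)
            throws Exception {
        List<Callable<T>> slices = new ArrayList<>(tasks);
        List<Future<T>> futures = new ArrayList<>();
        // Hand off every slice except the last one.
        for (int i = 0; i < slices.size() - 1; i++) {
            futures.add(executor.submit(slices.get(i)));
        }
        List<T> results = new ArrayList<>();
        if (!slices.isEmpty()) {
            // Run the final slice on the calling thread: the back pressure.
            T last = slices.get(slices.size() - 1).call();
            for (Future<T> f : futures) {
                results.add(f.get()); // gather async results in submission order
            }
            results.add(last);
        }
        return results;
    }
}
```

The performance trade-off is exactly the one raised in the review: the caller thread does useful work instead of blocking, but a slow final slice delays gathering the other results.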
[GitHub] [lucene-solr] msokolov commented on a change in pull request #1296: LUCENE-9253: Support custom dictionaries in KoreanTokenizer
msokolov commented on a change in pull request #1296: LUCENE-9253: Support custom dictionaries in KoreanTokenizer URL: https://github.com/apache/lucene-solr/pull/1296#discussion_r386107769 ## File path: lucene/analysis/nori/src/java/org/apache/lucene/analysis/ko/KoreanTokenizer.java ## @@ -185,16 +185,43 @@ public KoreanTokenizer(AttributeFactory factory, UserDictionary userDictionary, * @param discardPunctuation true if punctuation tokens should be dropped from the output. */ public KoreanTokenizer(AttributeFactory factory, UserDictionary userDictionary, DecompoundMode mode, boolean outputUnknownUnigrams, boolean discardPunctuation) { +this(factory, +TokenInfoDictionary.getInstance(), +UnknownDictionary.getInstance(), +ConnectionCosts.getInstance(), +userDictionary, mode, outputUnknownUnigrams, discardPunctuation); + } + + /** + * Create a new KoreanTokenizer supplying a custom system dictionary and unknown dictionary. + * This constructor provides an entry point for users that want to construct custom language models + * that can be used as input to {@link org.apache.lucene.analysis.ko.util.DictionaryBuilder}. + * + * @param factory the AttributeFactory to use + * @param systemDictionary a custom known token dictionary + * @param unkDictionary a custom unknown token dictionary + * @param connectionCosts custom token transition costs + * @param userDictionary Optional: if non-null, user dictionary. + * @param mode Decompound mode. + * @param outputUnknownUnigrams If true outputs unigrams for unknown words. Review comment: Don't capitalize "If" This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] msokolov commented on a change in pull request #1296: LUCENE-9253: Support custom dictionaries in KoreanTokenizer
msokolov commented on a change in pull request #1296: LUCENE-9253: Support custom dictionaries in KoreanTokenizer URL: https://github.com/apache/lucene-solr/pull/1296#discussion_r386108024 ## File path: lucene/analysis/nori/src/java/org/apache/lucene/analysis/ko/dict/BinaryDictionary.java ## @@ -150,7 +150,18 @@ protected final InputStream getResource(String suffix) throws IOException { throw new IllegalStateException("unknown resource scheme " + resourceScheme); } } - + + public static InputStream getResource(ResourceScheme scheme, String path) throws IOException { Review comment: so .. this basically follows the pattern from JapaneseTokenizer, I think. .. but somehow I don't see where we defined ResourceScheme? We're not referencing the one in kuromoji, right? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Updated] (SOLR-14291) OldAnalyticsRequestConverter should support fields names with dots
[ https://issues.apache.org/jira/browse/SOLR-14291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anatolii Siuniaev updated SOLR-14291: - Description: If you send a query with range facets using old olap-style syntax DMV(see pdf here), OldAnalyticsRequestConverter just silently (no exception thrown) omits parameters like {code:java} olap..rangefacet..start {code} in case if __ has dots inside (for instance field name is _Project.Value_). And thus no range facets are returned in response. Probably the same happens in case of field faceting. was: If you send a query with range facets using old olap-style syntax (see here), OldAnalyticsRequestConverter just silently (no exception thrown) omits parameters like {code:java} olap..rangefacet..start {code} in case if __ has dots inside (for instance field name is _Project.Value_). And thus no range facets are returned in response. Probably the same happens in case of field faceting. > OldAnalyticsRequestConverter should support fields names with dots > -- > > Key: SOLR-14291 > URL: https://issues.apache.org/jira/browse/SOLR-14291 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: search, SearchComponents - other >Reporter: Anatolii Siuniaev >Priority: Trivial > > If you send a query with range facets using old olap-style syntax DMV(see pdf > here), OldAnalyticsRequestConverter just silently (no exception thrown) omits > parameters like > {code:java} > olap..rangefacet..start > {code} > in case if __ has dots inside (for instance field name is > _Project.Value_). And thus no range facets are returned in response. > Probably the same happens in case of field faceting. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
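The failure mode described above, a parameter parser that assumes the field name contains no dots, can be sketched in plain Java. This is an illustrative stand-in, not the actual OldAnalyticsRequestConverter code; the parameter shape and method names are assumptions. The idea of the fix is to anchor on the fixed marker and suffix instead of splitting the whole parameter name on '.':

```java
import java.util.Optional;

public class RangeFacetParamSketch {
    // Hypothetical parameter shape: olap.<expr>.rangefacet.<field>.start
    // Naively splitting the parameter name on '.' loses fields like "Project.Value".
    // Instead, locate the fixed marker and suffix and take everything between them.
    static Optional<String> fieldFromParam(String param) {
        final String marker = ".rangefacet.";
        final String suffix = ".start";
        int m = param.indexOf(marker);
        if (m < 0 || !param.endsWith(suffix)) {
            return Optional.empty();
        }
        int fieldStart = m + marker.length();
        int fieldEnd = param.length() - suffix.length();
        if (fieldStart >= fieldEnd) {
            return Optional.empty(); // empty field name
        }
        return Optional.of(param.substring(fieldStart, fieldEnd));
    }

    public static void main(String[] args) {
        // A dotted field name survives intact:
        System.out.println(fieldFromParam("olap.expr.rangefacet.Project.Value.start").get()); // Project.Value
    }
}
```

With this approach, a field named Project.Value is recovered whole rather than being silently dropped.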
[GitHub] [lucene-solr] msokolov merged pull request #1295: Lucene-9004: bug fix for searching the nearest one neighbor in higher layers
msokolov merged pull request #1295: Lucene-9004: bug fix for searching the nearest one neighbor in higher layers URL: https://github.com/apache/lucene-solr/pull/1295
[GitHub] [lucene-solr] msokolov commented on issue #1295: Lucene-9004: bug fix for searching the nearest one neighbor in higher layers
msokolov commented on issue #1295: Lucene-9004: bug fix for searching the nearest one neighbor in higher layers URL: https://github.com/apache/lucene-solr/pull/1295#issuecomment-593096201 Ah, I see you're right @irvingzhang. I think we could also save something by eliminating the priority queue for this case - it's silly to use an (effectively) 1-length queue when all we need is a variable - but this does make the implementation match the algorithm. I'll merge.
[jira] [Comment Edited] (SOLR-14291) OldAnalyticsRequestConverter should support fields names with dots
[ https://issues.apache.org/jira/browse/SOLR-14291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17048251#comment-17048251 ] Anatolii Siuniaev edited comment on SOLR-14291 at 3/1/20 1:15 PM: -- Yep, I'll create a patch in a couple of days. And I added the link too. was (Author: anatolii_siuniaev): Yep, I'll create a patch in a couple of days. What do you mean by that article?
[jira] [Commented] (SOLR-13411) CompositeIdRouter calculates wrong route hash if atomic update is used for route.field
[ https://issues.apache.org/jira/browse/SOLR-13411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17048541#comment-17048541 ] Dr Oleg Savrasov commented on SOLR-13411: - [~mkhl] , [~dsmiley] Thank you, guys. > CompositeIdRouter calculates wrong route hash if atomic update is used for > route.field > -- > > Key: SOLR-13411 > URL: https://issues.apache.org/jira/browse/SOLR-13411 > Project: Solr > Issue Type: Bug > Components: SolrCloud >Affects Versions: 7.5 >Reporter: Niko Himanen >Assignee: Mikhail Khludnev >Priority: Minor > Fix For: 8.5 > > Attachments: SOLR-13411.patch, SOLR-13411.patch > > > If a collection is created with the router.field parameter to define a field > other than uniqueField as the route field, and a document update arrives with > the route field updated using atomic update syntax (for example set=123), the > hash for document routing is calculated from "set=123" and not from the real > value 123, which may route the document to the wrong shard. > > This happens in CompositeIdRouter#sliceHash, where the field value is used > as-is for hash calculation. > > I think there are two possible solutions to fix this: > a) Allow atomic updates also for route.field, but use the real value instead > of the atomic update syntax to route the document to the right shard. > b) Reject atomic updates for route.field and throw an exception.
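To make the bug concrete: with an atomic update, the route field's value reaching the router is a wrapper map such as {"set": 123}, so hashing its string form targets a different shard than hashing the real value. Below is a minimal sketch of option (a), unwrapping the atomic-update payload before hashing; the hash function here is a placeholder stand-in, not Solr's actual MurmurHash-based routing:

```java
import java.util.Map;

public class RouteHashSketch {
    // Placeholder for the routing hash; Solr's CompositeIdRouter actually
    // hashes the route value's string form with MurmurHash.
    static int routeHash(String routeValue) {
        return routeValue.hashCode();
    }

    // Option (a) from the report: if the field value is an atomic-update map
    // like {"set": 123}, hash the wrapped value rather than the map itself.
    static String unwrapAtomicUpdate(Object fieldValue) {
        if (fieldValue instanceof Map) {
            Object set = ((Map<?, ?>) fieldValue).get("set");
            if (set != null) {
                return set.toString();
            }
        }
        return fieldValue.toString();
    }

    public static void main(String[] args) {
        Object atomic = Map.of("set", 123);
        int buggy = routeHash(atomic.toString());          // hashes "{set=123}"
        int fixed = routeHash(unwrapAtomicUpdate(atomic)); // hashes "123"
        System.out.println(buggy == fixed);                // false: different shards
    }
}
```

Option (b) would instead reject the map outright before hashing, trading convenience for predictability.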
[GitHub] [lucene-solr] irvingzhang edited a comment on issue #1295: Lucene-9004: bug fix for searching the nearest one neighbor in higher layers
irvingzhang edited a comment on issue #1295: Lucene-9004: bug fix for searching the nearest one neighbor in higher layers URL: https://github.com/apache/lucene-solr/pull/1295#issuecomment-592287771 > I believe in practice that results' max size is always set to ef, so there shouldn't be any real issue. I agree that the interface doesn't make that plain; we should enforce this invariant by API contract Hi, @msokolov, I agree that the max size is always set to _ef_, but _ef_ has different values in different layers. According to **Algorithm 5** of Yury's [paper](https://arxiv.org/pdf/1603.09320.pdf), HNSW searches for the nearest one neighbor (namely, _ef_=1) from the top layer down to the 1st layer, and then finds the nearest _ef_ (_ef_=topK) neighbors in layer 0. In the Lucene HNSW implementation, the actual size of the result queue (Line 64, [HNSWGraphReader](https://github.com/apache/lucene-solr/blob/jira/lucene-9004-aknn-2/lucene/core/src/java/org/apache/lucene/util/hnsw/HNSWGraphReader.java)) is set to _ef_=topK when searching from the top layer to the 1st layer, while the expected neighbor size is 1, resulting in more neighbors being found than expected.
Even if the parameter _ef_ is set to 1 in Line 66, [HNSWGraphReader](https://github.com/apache/lucene-solr/blob/jira/lucene-9004-aknn-2/lucene/core/src/java/org/apache/lucene/util/hnsw/HNSWGraphReader.java), the condition `if (dist < f.distance() || results.size() < ef)` (Line 87, [HNSWGraph](https://github.com/apache/lucene-solr/blob/jira/lucene-9004-aknn-2/lucene/core/src/java/org/apache/lucene/util/hnsw/HNSWGraph.java)) allows inserting more than 1 neighbor into the "results" queue when `dist < f.distance()` and `results.size() >= ef` (here _ef_=1, corresponding to Line 66, [HNSWGraphReader](https://github.com/apache/lucene-solr/blob/jira/lucene-9004-aknn-2/lucene/core/src/java/org/apache/lucene/util/hnsw/HNSWGraphReader.java)), because the max size of "results" is topK, which implies that the actual size of the "results" queue lies in [1, topK]. **The simplest way to verify this problem is to print the actual number of neighbors.** For example, add "System.out.println(neighbors.size());" after "visitedCount += hnsw.searchLayer(query, neighbors, 1, l, vectorValues);" (Line 66, [HNSWGraphReader](https://github.com/apache/lucene-solr/blob/jira/lucene-9004-aknn-2/lucene/core/src/java/org/apache/lucene/util/hnsw/HNSWGraphReader.java)), where the nearest one neighbor is expected, but the printed neighbor size would range from 1 to topK. The same applies to [HNSWGraphWriter](https://github.com/apache/lucene-solr/blob/jira/lucene-9004-aknn-2/lucene/core/src/java/org/apache/lucene/util/hnsw/HNSWGraphWriter.java).
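The fix msokolov alludes to, replacing the effectively 1-length priority queue with a single "current best" variable for the upper-layer descent, can be illustrated with a toy graph. This is a sketch of the greedy search phase of the HNSW paper's algorithm, not the actual HNSWGraphReader code; the 1-D vectors, distance function, and adjacency map are simplified stand-ins:

```java
import java.util.Map;

public class GreedyLayerSearchSketch {
    // Greedy 1-NN descent in a single upper layer (ef = 1): keep one current
    // best node and move to any neighbor strictly closer to the query, until
    // no neighbor improves. No priority queue is needed for this phase, and
    // the result is exactly one neighbor by construction.
    static int searchLayerGreedy(double query, double[] vectors,
                                 Map<Integer, int[]> neighbors, int entryPoint) {
        int best = entryPoint;
        double bestDist = Math.abs(vectors[best] - query);
        boolean improved = true;
        while (improved) {
            improved = false;
            for (int cand : neighbors.getOrDefault(best, new int[0])) {
                double d = Math.abs(vectors[cand] - query);
                if (d < bestDist) {
                    bestDist = d;
                    best = cand;
                    improved = true;
                }
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // A tiny chain graph 0 - 1 - 2 - 3 with 1-D "vectors".
        double[] vectors = {0.0, 0.4, 0.8, 1.0};
        Map<Integer, int[]> nbrs = Map.of(
            0, new int[]{1},
            1, new int[]{0, 2},
            2, new int[]{1, 3},
            3, new int[]{2});
        // The walk descends 0 -> 1 -> 2 and stops: node 3 is no closer to 0.9.
        System.out.println(searchLayerGreedy(0.9, vectors, nbrs, 0)); // 2
    }
}
```

Because only one candidate is ever retained, the over-collection described in the comment above (queue sizes anywhere in [1, topK]) cannot occur in this phase.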
[GitHub] [lucene-solr] iverase commented on a change in pull request #1253: LUCENE-9150: Restore support for dynamic PlanetModel in spatial3d
iverase commented on a change in pull request #1253: LUCENE-9150: Restore support for dynamic PlanetModel in spatial3d URL: https://github.com/apache/lucene-solr/pull/1253#discussion_r386092274 ## File path: lucene/spatial3d/src/java/org/apache/lucene/spatial3d/geom/PlanetModel.java ## @@ -383,30 +509,233 @@ public GeoPoint surfacePointOnBearing(final GeoPoint from, final double dist, fi Δσ = B * sinσ * (cos2σM + B / 4.0 * (cosσ * (-1.0 + 2.0 * cos2σM * cos2σM) - B / 6.0 * cos2σM * (-3.0 + 4.0 * sinσ * sinσ) * (-3.0 + 4.0 * cos2σM * cos2σM))); σʹ = σ; - σ = dist / (c * inverseScale * A) + Δσ; + σ = dist / (zScaling * inverseScale * A) + Δσ; } while (Math.abs(σ - σʹ) >= Vector.MINIMUM_RESOLUTION && ++iterations < 100); double x = sinU1 * sinσ - cosU1 * cosσ * cosα1; -double φ2 = Math.atan2(sinU1 * cosσ + cosU1 * sinσ * cosα1, (1.0 - flattening) * Math.sqrt(sinα * sinα + x * x)); +double φ2 = Math.atan2(sinU1 * cosσ + cosU1 * sinσ * cosα1, (1.0 - scaledFlattening) * Math.sqrt(sinα * sinα + x * x)); double λ = Math.atan2(sinσ * sinα1, cosU1 * cosσ - sinU1 * sinσ * cosα1); -double C = flattening / 16.0 * cosSqα * (4.0 + flattening * (4.0 - 3.0 * cosSqα)); -double L = λ - (1.0 - C) * flattening * sinα * +double C = scaledFlattening / 16.0 * cosSqα * (4.0 + scaledFlattening * (4.0 - 3.0 * cosSqα)); +double L = λ - (1.0 - C) * scaledFlattening * sinα * (σ + C * sinσ * (cos2σM + C * cosσ * (-1.0 + 2.0 * cos2σM * cos2σM))); double λ2 = (lon + L + 3.0 * Math.PI) % (2.0 * Math.PI) - Math.PI; // normalise to -180..+180 return new GeoPoint(this, φ2, λ2); } + /** Utility class for encoding / decoding from lat/lon (decimal degrees) into sortable doc value numerics (integers) */ + public static class DocValueEncoder { +private final PlanetModel planetModel; + +// These are the multiplicative constants we need to use to arrive at values that fit in 21 bits. 
+// The formula we use to go from double to encoded value is: Math.floor((value - minimum) * factor + 0.5) +// If we plug in maximum for value, we should get 0x1FFFFF. +// So, 0x1FFFFF = Math.floor((maximum - minimum) * factor + 0.5) +// We factor out the 0.5 and Math.floor by stating instead: +// 0x1FFFFF = (maximum - minimum) * factor +// So, factor = 0x1FFFFF / (maximum - minimum) + +private final static double inverseMaximumValue = 1.0 / (double)(0x1FFFFF); + +private final double inverseXFactor; +private final double inverseYFactor; +private final double inverseZFactor; + +private final double xFactor; +private final double yFactor; +private final double zFactor; + +// Fudge factor for step adjustments. This is here solely to handle inaccuracies in bounding boxes +// that occur because of quantization. For unknown reasons, the fudge factor needs to be +// 10.0 rather than 1.0. See LUCENE-7430. + +private final static double STEP_FUDGE = 10.0; + +// These values are the delta between a value and the next value in each specific dimension + +private final double xStep; +private final double yStep; +private final double zStep; + +/** construct an encoder/decoder instance from the provided PlanetModel definition */ +public DocValueEncoder(final PlanetModel planetModel) { Review comment: make constructor private?
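The comment block in the diff above describes a plain linear quantization: map [minimum, maximum] onto the 21-bit integer range, with factor = 0x1FFFFF / (maximum - minimum). A self-contained sketch of that encode/decode round trip follows; the bounds are illustrative values, and this omits what the real DocValueEncoder adds on top (per-dimension fudge factor, packing three 21-bit values into one long):

```java
public class DocValueEncodeSketch {
    // 21 bits per dimension: encoded values lie in [0, 0x1FFFFF].
    static final int MAX_ENCODED = 0x1FFFFF;

    // factor = 0x1FFFFF / (maximum - minimum); the +0.5 before Math.floor
    // rounds to the nearest quantization step.
    static int encode(double value, double min, double max) {
        double factor = MAX_ENCODED / (max - min);
        return (int) Math.floor((value - min) * factor + 0.5);
    }

    // Decoding maps the integer back onto the original range.
    static double decode(int encoded, double min, double max) {
        return min + encoded * ((max - min) / MAX_ENCODED);
    }

    public static void main(String[] args) {
        double min = -1.0, max = 1.0; // illustrative bounds, not real planet-model bounds
        double step = (max - min) / MAX_ENCODED;
        int e = encode(0.5, min, max);
        // The round trip is lossy, but the error is bounded by one step:
        System.out.println(Math.abs(decode(e, min, max) - 0.5) <= step); // true
    }
}
```

Plugging the endpoints in confirms the constants: encode(min) yields 0 and encode(max) yields 0x1FFFFF, exactly as the derivation in the comment requires.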
[jira] [Assigned] (LUCENE-9114) Add FunctionValues.cost
[ https://issues.apache.org/jira/browse/LUCENE-9114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Atri Sharma reassigned LUCENE-9114: --- Assignee: Atri Sharma > Add FunctionValues.cost > --- > > Key: LUCENE-9114 > URL: https://issues.apache.org/jira/browse/LUCENE-9114 > Project: Lucene - Core > Issue Type: Improvement > Components: modules/query >Reporter: David Smiley >Assignee: Atri Sharma >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > The FunctionRangeQuery uses FunctionValues.getRangeScorer which returns a > subclass of ValueSourceScorer. VSC's TwoPhaseIterator has a matchCost impl > that returns a constant 100. This is pretty terrible; the cost should vary > based on the complexity of the ValueSource provided to FRQ. ValueSource's > are typically nested a number of levels, so they should aggregate. > BTW there is a parallel concern for FunctionMatchQuery which works with > DoubleValuesSource which doesn't have a cost either, and unsurprisingly there > is a TPI with matchCost 100 there.
[jira] [Commented] (LUCENE-9114) Add FunctionValues.cost
[ https://issues.apache.org/jira/browse/LUCENE-9114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17048492#comment-17048492 ] Atri Sharma commented on LUCENE-9114: - [~dsmiley] I have raised a PR for this -- it is a minimal change that lets ValueSourceScorer incorporate the delegated FunctionValues' cost into its own cost. Would that help you get unblocked by adding stacked FunctionValues with custom costing functions?
[GitHub] [lucene-solr] atris opened a new pull request #1303: LUCENE-9114: Improve ValueSourceScorer's Default Cost Implementation
atris opened a new pull request #1303: LUCENE-9114: Improve ValueSourceScorer's Default Cost Implementation URL: https://github.com/apache/lucene-solr/pull/1303 This commit makes ValueSourceScorer's costing algorithm also take the delegated FunctionValues' cost into consideration when calculating its own cost.
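The shape of the change can be sketched with stand-in types. These interfaces are hypothetical simplifications, not Lucene's actual FunctionValues or TwoPhaseIterator signatures; the point is only that the scorer's match cost delegates to the wrapped values object, so nested value sources can aggregate their children's costs instead of reporting a flat 100:

```java
public class MatchCostSketch {
    // Hypothetical stand-in for FunctionValues with a per-document cost estimate.
    interface FunctionValuesLike {
        float cost();
    }

    // Before: every ValueSourceScorer reported a constant matchCost of 100,
    // regardless of how deeply the ValueSource was nested.
    static float constantMatchCost() {
        return 100f;
    }

    // After: delegate to the wrapped FunctionValues, so stacked sources can
    // sum (or otherwise combine) the costs of the sources they wrap.
    static float delegatingMatchCost(FunctionValuesLike values) {
        return values.cost();
    }

    public static void main(String[] args) {
        FunctionValuesLike cheap = () -> 5f;
        // A "stacked" source whose cost aggregates two nested sources:
        FunctionValuesLike stacked = () -> 5f + 40f;
        System.out.println(delegatingMatchCost(cheap));   // 5.0
        System.out.println(delegatingMatchCost(stacked)); // 45.0
    }
}
```

The same idea would address the parallel concern in the issue for DoubleValuesSource, which likewise lacks a cost hook today.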