[jira] [Commented] (LUCENE-9004) Approximate nearest vector search

2020-01-28 Thread Xin-Chun Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17025647#comment-17025647
 ] 

Xin-Chun Zhang commented on LUCENE-9004:


"This is already gigantic - what would be the benefit of merging?"

-- Yes, I agree that it's gigantic. It's only a personal proposal, based on the 
following considerations:
 * Both issues concern the same problem, searching for approximate nearest 
neighbors in vector space, which implies that the key parts of the design and 
implementation could be reused, _e.g._ the vector format and the corresponding 
reader/writer. The implementation could then be more elegant.
 * Moreover, we could make sure that the provided interfaces are consistent and 
compatible.

> Approximate nearest vector search
> -
>
> Key: LUCENE-9004
> URL: https://issues.apache.org/jira/browse/LUCENE-9004
> Project: Lucene - Core
>  Issue Type: New Feature
>Reporter: Michael Sokolov
>Priority: Major
> Attachments: hnsw_layered_graph.png
>
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> "Semantic" search based on machine-learned vector "embeddings" representing 
> terms, queries and documents is becoming a must-have feature for a modern 
> search engine. SOLR-12890 is exploring various approaches to this, including 
> providing vector-based scoring functions. This is a spinoff issue from that.
> The idea here is to explore approximate nearest-neighbor search. Researchers 
> have found an approach based on navigating a graph that partially encodes the 
> nearest neighbor relation at multiple scales can provide accuracy > 95% (as 
> compared to exact nearest neighbor calculations) at a reasonable cost. This 
> issue will explore implementing HNSW (hierarchical navigable small-world) 
> graphs for the purpose of approximate nearest vector search (often referred 
> to as KNN or k-nearest-neighbor search).
> At a high level the way this algorithm works is this. First assume you have a 
> graph that has a partial encoding of the nearest neighbor relation, with some 
> short and some long-distance links. If this graph is built in the right way 
> (has the hierarchical navigable small world property), then you can 
> efficiently traverse it to find nearest neighbors (approximately) in log N 
> time where N is the number of nodes in the graph. I believe this idea was 
> pioneered in [1]. The great insight in that paper is that if you use the 
> graph search algorithm to find the K nearest neighbors of a new document 
> while indexing, and then link those neighbors (undirectedly, ie both ways) to 
> the new document, then the graph that emerges will have the desired 
> properties.
> The implementation I propose for Lucene is as follows. We need two new data 
> structures to encode the vectors and the graph. We can encode vectors using a 
> light wrapper around {{BinaryDocValues}} (we also want to encode the vector 
> dimension and have efficient conversion from bytes to floats). For the graph 
> we can use {{SortedNumericDocValues}} where the values we encode are the 
> docids of the related documents. Encoding the interdocument relations using 
> docids directly will make it relatively fast to traverse the graph since we 
> won't need to lookup through an id-field indirection. This choice limits us 
> to building a graph-per-segment since it would be impractical to maintain a 
> global graph for the whole index in the face of segment merges. However 
> graph-per-segment is very natural at search time - we can traverse each 
> segment's graph independently and merge results as we do today for term-based 
> search.
> At index time, however, merging graphs is somewhat challenging. While 
> indexing we build a graph incrementally, performing searches to construct 
> links among neighbors. When merging segments we must construct a new graph 
> containing elements of all the merged segments. Ideally we would somehow 
> preserve the work done when building the initial graphs, but at least as a 
> start I'd propose we construct a new graph from scratch when merging. The 
> process is going to be limited, at least initially, to graphs that can fit 
> in RAM since we require random access to the entire graph while constructing 
> it: In order to add links bidirectionally we must continually update existing 
> documents.
> I think we want to express this API to users as a single joint 
> {{KnnGraphField}} abstraction that joins together the vectors and the graph 
> as a single joint field type. Mostly it just looks like a vector-valued 
> field, but has this graph attached to it.
> I'll push a branch with my POC and would love to hear comments. It has many 
> nocommits, basic design is not really set, there is no Query implementation 
> and no integration with IndexSearcher, but it does 
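
To make the traversal concrete, here is a minimal sketch of the greedy descent 
described above, assuming an in-memory adjacency-list graph and squared 
Euclidean distance; the class and method names are hypothetical and not from 
the POC branch:

{code:java}
// Hypothetical greedy best-first search over a nearest-neighbor graph.
// neighbors[doc] holds the docids linked to doc; vectors[doc] is its embedding.
class GraphSearcher {
  final int[][] neighbors;
  final float[][] vectors;

  GraphSearcher(int[][] neighbors, float[][] vectors) {
    this.neighbors = neighbors;
    this.vectors = vectors;
  }

  static float squaredDistance(float[] a, float[] b) {
    float sum = 0;
    for (int i = 0; i < a.length; i++) {
      float d = a[i] - b[i];
      sum += d * d;
    }
    return sum;
  }

  // Walk from an entry point to a local minimum of the distance to the query.
  // On a graph with the small-world property this takes roughly log(N) hops;
  // finding the K nearest follows the same idea with a bounded priority queue.
  int greedySearch(int entryPoint, float[] query) {
    int current = entryPoint;
    float currentDist = squaredDistance(vectors[current], query);
    boolean improved = true;
    while (improved) {
      improved = false;
      for (int candidate : neighbors[current]) {
        float dist = squaredDistance(vectors[candidate], query);
        if (dist < currentDist) {
          current = candidate;
          currentDist = dist;
          improved = true;
        }
      }
    }
    return current; // approximately the nearest neighbor of the query
  }
}
{code}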

[jira] [Resolved] (SOLR-14211) TestBulkSchemaConcurrent failures

2020-01-28 Thread Andrzej Bialecki (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-14211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrzej Bialecki resolved SOLR-14211.
-
Resolution: Fixed

> TestBulkSchemaConcurrent failures
> -
>
> Key: SOLR-14211
> URL: https://issues.apache.org/jira/browse/SOLR-14211
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: 8.5
>Reporter: Andrzej Bialecki
>Assignee: Andrzej Bialecki
>Priority: Major
> Fix For: 8.5
>
>
> This bug is caused by a recent change in SOLR-14192 - SchemaManager tries to 
> locate a schema resource in ZK by name, without prepending the full configset 
> path.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Created] (SOLR-14227) Query parsing with switch query parser number of results not matching

2020-01-28 Thread santosh rachuri (Jira)
santosh rachuri created SOLR-14227:
--

 Summary: Query parsing with switch query parser number of results 
not matching
 Key: SOLR-14227
 URL: https://issues.apache.org/jira/browse/SOLR-14227
 Project: Solr
  Issue Type: Bug
  Security Level: Public (Default Security Level. Issues are Public)
  Components: config-api
Affects Versions: 5.2.1
 Environment: OS is windows. 
Reporter: santosh rachuri
 Fix For: 5.2.1


Hi,
I have described the issue at the link below, but have not received any 
response there, so I am posting it here as well.

[https://stackoverflow.com/questions/59931921/solr-exact-match-results-not-matching]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-13897) Unsafe publication of Terms object in ZkShardTerms

2020-01-28 Thread Shalin Shekhar Mangar (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17025563#comment-17025563
 ] 

Shalin Shekhar Mangar commented on SOLR-13897:
--

Thanks [~jpountz] for fixing. I forgot that javadoc changes can cause precommit 
to fail.

> Unsafe publication of Terms object in ZkShardTerms
> --
>
> Key: SOLR-13897
> URL: https://issues.apache.org/jira/browse/SOLR-13897
> Project: Solr
>  Issue Type: Bug
>  Components: SolrCloud
>Affects Versions: 8.2, 8.3
>Reporter: Shalin Shekhar Mangar
>Assignee: Shalin Shekhar Mangar
>Priority: Major
> Fix For: master (9.0), 8.5
>
> Attachments: SOLR-13897.patch, SOLR-13897.patch, SOLR-13897.patch, 
> SOLR-13897.patch
>
>
> The Terms object in ZkShardTerms is written using a write lock but reading is 
> allowed freely. This is not safe and can cause visibility issues and 
> associated race conditions under contention.
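
As an illustration of that hazard, a minimal sketch (not the actual 
ZkShardTerms code): a reference written under a lock but read without any 
memory barrier may never become visible to reader threads, while a volatile 
field (or reading under the same lock) publishes it safely.

{code:java}
// Illustrative only -- not the actual ZkShardTerms implementation.
import java.util.concurrent.locks.ReentrantLock;

class TermsHolder {
  private final ReentrantLock writeLock = new ReentrantLock();

  // Unsafe: readers may see a stale reference indefinitely, because nothing
  // orders their plain read against the locked write.
  private Object termsUnsafe;

  // Safe: a volatile write/read pair establishes the happens-before edge.
  private volatile Object termsSafe;

  void update(Object newTerms) {
    writeLock.lock();
    try {
      termsUnsafe = newTerms; // published without a happens-before edge for readers
      termsSafe = newTerms;   // volatile write publishes safely
    } finally {
      writeLock.unlock();
    }
  }

  Object read() {
    return termsSafe; // volatile read pairs with the volatile write above
  }
}
{code}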



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-14224) Not able to build solr 6.6.2 from source after January 15, 2020

2020-01-28 Thread Ishan Chattopadhyaya (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17025558#comment-17025558
 ] 

Ishan Chattopadhyaya commented on SOLR-14224:
-

SOLR-13756 is related, fyi.

> Not able to build solr 6.6.2 from source after January 15, 2020
> ---
>
> Key: SOLR-14224
> URL: https://issues.apache.org/jira/browse/SOLR-14224
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: 6.6.2
>Reporter: Guruprasad K K
>Priority: Major
>
> After January 15th, the Maven repo allows only HTTPS connections, but Solr 
> 6.6.2 uses an HTTP connection, so builds are failing.
> The latest version of Solr appears to have the fix for this in 
> common-build.xml and the other places where it connects to the Maven repo 
> over HTTPS.
>  
> Error log:
>  ivy-bootstrap1:
>  [mkdir] Created dir: /root/.ant/lib
>  [echo] installing ivy 2.3.0 to /root/.ant/lib
>  [get] Getting: http://repo1.maven.org/maven2/org/apache/ivy/ivy/2.3.0/ivy-2.3.0.jar
>  [get] To: /root/.ant/lib/ivy-2.3.0.jar
>  [get] Error opening connection java.io.IOException: Server returned HTTP 
>  response code: 501 for URL: 
>  http://repo1.maven.org/maven2/org/apache/ivy/ivy/2.3.0/ivy-2.3.0.jar
>  [get] Error opening connection java.io.IOException: Server returned HTTP 
>  response code: 501 for URL: 
>  http://repo1.maven.org/maven2/org/apache/ivy/ivy/2.3.0/ivy-2.3.0.jar
>  [get] Error opening connection java.io.IOException: Server returned HTTP 
>  response code: 501 for URL: 
>  http://repo1.maven.org/maven2/org/apache/ivy/ivy/2.3.0/ivy-2.3.0.jar
>  [get] Can't get http://repo1.maven.org/maven2/org/apache/ivy/ivy/2.3.0/ivy-2.3.0.jar 
>  to /root/.ant/lib/ivy-2.3.0.jar
>  
> [NOTE]: It works on the latest version of Solr, where http is converted to https
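
One possible workaround on older branches (a sketch, assuming curl is 
available and that Ant picks up Ivy from ~/.ant/lib): pre-install the Ivy jar 
over HTTPS yourself, so the failing HTTP bootstrap is no longer needed.

{code:bash}
# Download the Ivy jar over HTTPS into Ant's library directory, so that the
# ivy-bootstrap target no longer has to fetch it over the broken HTTP URL.
mkdir -p ~/.ant/lib
curl -fL "https://repo1.maven.org/maven2/org/apache/ivy/ivy/2.3.0/ivy-2.3.0.jar" \
     -o ~/.ant/lib/ivy-2.3.0.jar
{code}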



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-14224) Not able to build solr 6.6.2 from source after January 15, 2020

2020-01-28 Thread Ishan Chattopadhyaya (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17025557#comment-17025557
 ] 

Ishan Chattopadhyaya commented on SOLR-14224:
-

If this is the case, we should (not must) investigate and help affected users.

> Not able to build solr 6.6.2 from source after January 15, 2020
> ---
>
> Key: SOLR-14224
> URL: https://issues.apache.org/jira/browse/SOLR-14224
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: 6.6.2
>Reporter: Guruprasad K K
>Priority: Major
>
> After January 15th, the Maven repo allows only HTTPS connections, but Solr 
> 6.6.2 uses an HTTP connection, so builds are failing.
> The latest version of Solr appears to have the fix for this in 
> common-build.xml and the other places where it connects to the Maven repo 
> over HTTPS.
>  
> Error log:
>  ivy-bootstrap1:
>  [mkdir] Created dir: /root/.ant/lib
>  [echo] installing ivy 2.3.0 to /root/.ant/lib
>  [get] Getting: http://repo1.maven.org/maven2/org/apache/ivy/ivy/2.3.0/ivy-2.3.0.jar
>  [get] To: /root/.ant/lib/ivy-2.3.0.jar
>  [get] Error opening connection java.io.IOException: Server returned HTTP 
>  response code: 501 for URL: 
>  http://repo1.maven.org/maven2/org/apache/ivy/ivy/2.3.0/ivy-2.3.0.jar
>  [get] Error opening connection java.io.IOException: Server returned HTTP 
>  response code: 501 for URL: 
>  http://repo1.maven.org/maven2/org/apache/ivy/ivy/2.3.0/ivy-2.3.0.jar
>  [get] Error opening connection java.io.IOException: Server returned HTTP 
>  response code: 501 for URL: 
>  http://repo1.maven.org/maven2/org/apache/ivy/ivy/2.3.0/ivy-2.3.0.jar
>  [get] Can't get http://repo1.maven.org/maven2/org/apache/ivy/ivy/2.3.0/ivy-2.3.0.jar 
>  to /root/.ant/lib/ivy-2.3.0.jar
>  
> [NOTE]: It works on the latest version of Solr, where http is converted to https



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Updated] (SOLR-14224) Not able to build solr 6.6.2 from source after January 15, 2020

2020-01-28 Thread Ishan Chattopadhyaya (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-14224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ishan Chattopadhyaya updated SOLR-14224:

Status: Reopened  (was: Closed)

> Not able to build solr 6.6.2 from source after January 15, 2020
> ---
>
> Key: SOLR-14224
> URL: https://issues.apache.org/jira/browse/SOLR-14224
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: 6.6.2
>Reporter: Guruprasad K K
>Priority: Major
>
> After January 15th, the Maven repo allows only HTTPS connections, but Solr 
> 6.6.2 uses an HTTP connection, so builds are failing.
> The latest version of Solr appears to have the fix for this in 
> common-build.xml and the other places where it connects to the Maven repo 
> over HTTPS.
>  
> Error log:
>  ivy-bootstrap1:
>  [mkdir] Created dir: /root/.ant/lib
>  [echo] installing ivy 2.3.0 to /root/.ant/lib
>  [get] Getting: http://repo1.maven.org/maven2/org/apache/ivy/ivy/2.3.0/ivy-2.3.0.jar
>  [get] To: /root/.ant/lib/ivy-2.3.0.jar
>  [get] Error opening connection java.io.IOException: Server returned HTTP 
>  response code: 501 for URL: 
>  http://repo1.maven.org/maven2/org/apache/ivy/ivy/2.3.0/ivy-2.3.0.jar
>  [get] Error opening connection java.io.IOException: Server returned HTTP 
>  response code: 501 for URL: 
>  http://repo1.maven.org/maven2/org/apache/ivy/ivy/2.3.0/ivy-2.3.0.jar
>  [get] Error opening connection java.io.IOException: Server returned HTTP 
>  response code: 501 for URL: 
>  http://repo1.maven.org/maven2/org/apache/ivy/ivy/2.3.0/ivy-2.3.0.jar
>  [get] Can't get http://repo1.maven.org/maven2/org/apache/ivy/ivy/2.3.0/ivy-2.3.0.jar 
>  to /root/.ant/lib/ivy-2.3.0.jar
>  
> [NOTE]: It works on the latest version of Solr, where http is converted to https



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Closed] (SOLR-14224) Not able to build solr 6.6.2 from source after January 15, 2020

2020-01-28 Thread Guruprasad K K (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-14224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guruprasad K K closed SOLR-14224.
-

> Not able to build solr 6.6.2 from source after January 15, 2020
> ---
>
> Key: SOLR-14224
> URL: https://issues.apache.org/jira/browse/SOLR-14224
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: 6.6.2
>Reporter: Guruprasad K K
>Priority: Major
>
> After January 15th, the Maven repo allows only HTTPS connections, but Solr 
> 6.6.2 uses an HTTP connection, so builds are failing.
> The latest version of Solr appears to have the fix for this in 
> common-build.xml and the other places where it connects to the Maven repo 
> over HTTPS.
>  
> Error log:
>  ivy-bootstrap1:
>  [mkdir] Created dir: /root/.ant/lib
>  [echo] installing ivy 2.3.0 to /root/.ant/lib
>  [get] Getting: http://repo1.maven.org/maven2/org/apache/ivy/ivy/2.3.0/ivy-2.3.0.jar
>  [get] To: /root/.ant/lib/ivy-2.3.0.jar
>  [get] Error opening connection java.io.IOException: Server returned HTTP 
>  response code: 501 for URL: 
>  http://repo1.maven.org/maven2/org/apache/ivy/ivy/2.3.0/ivy-2.3.0.jar
>  [get] Error opening connection java.io.IOException: Server returned HTTP 
>  response code: 501 for URL: 
>  http://repo1.maven.org/maven2/org/apache/ivy/ivy/2.3.0/ivy-2.3.0.jar
>  [get] Error opening connection java.io.IOException: Server returned HTTP 
>  response code: 501 for URL: 
>  http://repo1.maven.org/maven2/org/apache/ivy/ivy/2.3.0/ivy-2.3.0.jar
>  [get] Can't get http://repo1.maven.org/maven2/org/apache/ivy/ivy/2.3.0/ivy-2.3.0.jar 
>  to /root/.ant/lib/ivy-2.3.0.jar
>  
> [NOTE]: It works on the latest version of Solr, where http is converted to https



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Comment Edited] (SOLR-14201) some SolrCore are not released after being removed

2020-01-28 Thread Vinh Le (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17024969#comment-17024969
 ] 

Vinh Le edited comment on SOLR-14201 at 1/29/20 3:07 AM:
-

h3. Without both optimize and alias
{code:java}
#!/bin/bash -e
HOST=http://localhost:8983/solr
# Base on alias
# PREV_COLLECTION=$(http --timeout=300 "$HOST/admin/collections?action=LISTALIASES" | jq -r ".aliases.SGFAS")
# Base on last collection
PREV_COLLECTION=$(http --timeout=300 "$HOST/admin/collections?action=LIST" | jq -r ".collections[0]")

COLLECTION="next_$(gdate +%H%M%S)"
# COLLECTION="next_1029"

echo "Create new collection = $COLLECTION"
http --timeout=300 POST "$HOST/admin/collections?action=CREATE&name=$COLLECTION&collection.configName=seafas&numShards=1"

echo "Push data to new collection"
cat docs.xml | http --timeout=300 POST "$HOST/$COLLECTION/update?commitWithin=1000&overwrite=true&wt=json" "Content-Type: text/xml"

# echo "Optimize"
# http --timeout=300 "$HOST/$COLLECTION/update?optimize=true&maxSegments=1&waitSearcher=false"
# echo "Update alias"
# http --timeout=300 "$HOST/admin/collections?action=CREATEALIAS&collections=$COLLECTION&name=SGFAS"

echo "Delete previous collection = $PREV_COLLECTION"
http --timeout=300 "$HOST/admin/collections?action=DELETE&name=$PREV_COLLECTION"

echo "Classes.loaded"
http --timeout=300 "http://localhost:8983/solr/admin/metrics" | jq '.metrics."solr.jvm"."classes.loaded"'

{code}
 

Basically, just remove the previous collection after creating a new one.
  
 !image-2020-01-28-16-59-51-813.png|width=853,height=645!
  
 Classes loaded still keeps increasing.
  
  


was (Author: vinhlh):
h3. Without both optimize and alias
{code:java}
#!/bin/bash -e
HOST=http://localhost:8983/solr
# Base on alias
# PREV_COLLECTION=$(http --timeout=300 "$HOST/admin/collections?action=LISTALIASES" | jq -r ".aliases.SGFAS")
# Base on last collection
PREV_COLLECTION=$(http --timeout=300 "$HOST/admin/collections?action=LIST" | jq -r ".collections[0]")COLLECTION="next_$(gdate +%H%M%S)"
# COLLECTION="next_1029"

echo "Create new collection = $COLLECTION"
http --timeout=300 POST "$HOST/admin/collections?action=CREATE&name=$COLLECTION&collection.configName=seafas&numShards=1"

echo "Push data to new collection"
cat docs.xml | http --timeout=300 POST "$HOST/$COLLECTION/update?commitWithin=1000&overwrite=true&wt=json" "Content-Type: text/xml"

# echo "Optimize"
# http --timeout=300 "$HOST/$COLLECTION/update?optimize=true&maxSegments=1&waitSearcher=false"
# echo "Update alias"
# http --timeout=300 "$HOST/admin/collections?action=CREATEALIAS&collections=$COLLECTION&name=SGFAS"

echo "Delete previous collection = $PREV_COLLECTION"
http --timeout=300 "$HOST/admin/collections?action=DELETE&name=$PREV_COLLECTION"

echo "Classes.loaded"
http --timeout=300 "http://localhost:8983/solr/admin/metrics" | jq '.metrics."solr.jvm"."classes.loaded"'

{code}
 

Basically, just remove the previous collection after creating a new one.
  
 !image-2020-01-28-16-59-51-813.png|width=853,height=645!
  
 Classes loaded still keeps increasing.
  
  

> some SolrCore are not released after being removed
> --
>
> Key: SOLR-14201
> URL: https://issues.apache.org/jira/browse/SOLR-14201
> Project: Solr
>  Issue Type: Bug
>Reporter: Christine Poerschke
>Priority: Major
> Attachments: image-2020-01-22-10-39-15-301.png, 
> image-2020-01-22-10-42-17-511.png, image-2020-01-22-12-28-46-241.png, 
> image-2020-01-22-14-45-52-730.png, image-2020-01-28-16-17-44-030.png, 
> image-2020-01-28-16-19-43-760.png, image-2020-01-28-16-20-50-709.png, 
> image-2020-01-28-16-59-51-813.png
>
>
> [~vinhlh] reported in SOLR-10506 (affecting 6.5 with fixes in 6.6.6 and 7.0):
> bq. In 7.7.2, some SolrCore still are not released after being removed.
> https://issues.apache.org/jira/secure/attachment/12991357/image-2020-01-20-14-51-26-411.png
> Starting this ticket for a separate investigation and fix. A next 
> investigative step could be to try and reproduce the issue on the latest 8.x 
> release.
>   
>   
>   



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] dweiss commented on issue #1219: LUCENE-9134: Javacc skeleton for Gradle regenerate

2020-01-28 Thread GitBox
dweiss commented on issue #1219:  LUCENE-9134: Javacc skeleton for Gradle 
regenerate
URL: https://github.com/apache/lucene-solr/pull/1219#issuecomment-579468672
 
 
   bq. Which is somewhat misleading? Does the task not work if we don't delete 
files first?
   
   javacc tries to be smart and analyzes those files, looking for any local 
changes. Since we do introduce such changes, it wouldn't touch those files; 
hence the delete.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] dweiss commented on issue #1219: LUCENE-9134: Javacc skeleton for Gradle regenerate

2020-01-28 Thread GitBox
dweiss commented on issue #1219:  LUCENE-9134: Javacc skeleton for Gradle 
regenerate
URL: https://github.com/apache/lucene-solr/pull/1219#issuecomment-579467903
 
 
   I thought so at first... but then I realized it will be a nightmare to 
regenerate some of these -- like the jflex one that requires 10 gigs and 20 
minutes on my machine (!).


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] dweiss commented on a change in pull request #1219: LUCENE-9134: Javacc skeleton for Gradle regenerate

2020-01-28 Thread GitBox
dweiss commented on a change in pull request #1219:  LUCENE-9134: Javacc 
skeleton for Gradle regenerate
URL: https://github.com/apache/lucene-solr/pull/1219#discussion_r372071747
 
 

 ##
 File path: gradle/generation/javacc.gradle
 ##
 @@ -0,0 +1,102 @@
+// Add a top-level pseudo-task to which we will attach individual regenerate tasks.
+import static groovy.io.FileType.*
+
+configure(rootProject) {
+  configurations {
+    javacc
+  }
+
+  dependencies {
+    javacc "net.java.dev.javacc:javacc:${scriptDepVersions['javacc']}"
+  }
+
+  task javacc() {
+    description "Regenerate sources for corresponding javacc grammar files."
+    group "generation"
+
+    dependsOn ":lucene:queryparser:javaccParserClassic"
+    dependsOn ":lucene:queryparser:javaccParserSurround"
+    dependsOn ":lucene:queryparser:javaccParserFlexible"
+  }
+}
+
+// We always regenerate, no need to declare outputs.
+class JavaCCTask extends DefaultTask {
+  @Input
+  File javaccFile
+
+  JavaCCTask() {
+    dependsOn(project.rootProject.configurations.javacc)
+  }
+
+  @TaskAction
+  def generate() {
+    if (!javaccFile || !javaccFile.exists()) {
+      throw new RuntimeException("JavaCC input file does not exist: ${javaccFile}")
+    }
+    // Remove old files so we can regenerate them
+    def parentDir = javaccFile.parentFile
+    parentDir.eachFileMatch FILES, ~/.*\.java/, { file ->
+      if (file.text.contains("Generated By:JavaCC")) {
+        file.delete()
+      }
+    }
+    logger.lifecycle("Regenerating JavaCC:\n  from: ${javaccFile}\n    to: ${parentDir}")
+
+    project.javaexec {
+      classpath {
+        project.rootProject.configurations.javacc
+      }
+      main = "org.javacc.parser.Main"
+      args += "-OUTPUT_DIRECTORY=${parentDir}"
+      args += [javaccFile]
 
 Review comment:
   it is correct but unnecessary


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] madrob commented on issue #1219: LUCENE-9134: Javacc skeleton for Gradle regenerate

2020-01-28 Thread GitBox
madrob commented on issue #1219:  LUCENE-9134: Javacc skeleton for Gradle 
regenerate
URL: https://github.com/apache/lucene-solr/pull/1219#issuecomment-579461009
 
 
   Do we want to have a top level regenerate task that will regenerate 
everything? Fine to push that out to a later issue also.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] madrob commented on a change in pull request #1219: LUCENE-9134: Javacc skeleton for Gradle regenerate

2020-01-28 Thread GitBox
madrob commented on a change in pull request #1219:  LUCENE-9134: Javacc 
skeleton for Gradle regenerate
URL: https://github.com/apache/lucene-solr/pull/1219#discussion_r372050229
 
 

 ##
 File path: build.gradle
 ##
 @@ -40,7 +40,8 @@ ext {
   // https://github.com/palantir/gradle-consistent-versions/issues/383
   scriptDepVersions = [
   "apache-rat": "0.11",
-  "jflex": "1.7.0"
+  "jflex": "1.7.0",
+  "javacc": "5.0"
 
 Review comment:
   nit: can we alphabetize this? or some other consistent ordering?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] madrob commented on a change in pull request #1219: LUCENE-9134: Javacc skeleton for Gradle regenerate

2020-01-28 Thread GitBox
madrob commented on a change in pull request #1219:  LUCENE-9134: Javacc 
skeleton for Gradle regenerate
URL: https://github.com/apache/lucene-solr/pull/1219#discussion_r372052356
 
 

 ##
 File path: gradle/generation/javacc.gradle
 ##
 @@ -0,0 +1,102 @@
+// Add a top-level pseudo-task to which we will attach individual regenerate tasks.
+import static groovy.io.FileType.*
+
+configure(rootProject) {
+  configurations {
+    javacc
+  }
+
+  dependencies {
+    javacc "net.java.dev.javacc:javacc:${scriptDepVersions['javacc']}"
+  }
+
+  task javacc() {
+    description "Regenerate sources for corresponding javacc grammar files."
+    group "generation"
+
+    dependsOn ":lucene:queryparser:javaccParserClassic"
+    dependsOn ":lucene:queryparser:javaccParserSurround"
+    dependsOn ":lucene:queryparser:javaccParserFlexible"
+  }
+}
+
+// We always regenerate, no need to declare outputs.
+class JavaCCTask extends DefaultTask {
+  @Input
+  File javaccFile
+
+  JavaCCTask() {
+    dependsOn(project.rootProject.configurations.javacc)
+  }
+
+  @TaskAction
+  def generate() {
+    if (!javaccFile || !javaccFile.exists()) {
+      throw new RuntimeException("JavaCC input file does not exist: ${javaccFile}")
+    }
+    // Remove old files so we can regenerate them
+    def parentDir = javaccFile.parentFile
+    parentDir.eachFileMatch FILES, ~/.*\.java/, { file ->
+      if (file.text.contains("Generated By:JavaCC")) {
+        file.delete()
+      }
+    }
+    logger.lifecycle("Regenerating JavaCC:\n  from: ${javaccFile}\n    to: ${parentDir}")
+
+    project.javaexec {
+      classpath {
+        project.rootProject.configurations.javacc
+      }
+      main = "org.javacc.parser.Main"
+      args += "-OUTPUT_DIRECTORY=${parentDir}"
+      args += [javaccFile]
 
 Review comment:
   is this right to pass an array?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] madrob commented on a change in pull request #1219: LUCENE-9134: Javacc skeleton for Gradle regenerate

2020-01-28 Thread GitBox
madrob commented on a change in pull request #1219:  LUCENE-9134: Javacc 
skeleton for Gradle regenerate
URL: https://github.com/apache/lucene-solr/pull/1219#discussion_r372053503
 
 

 ##
 File path: gradle/generation/javacc.gradle
 ##
 @@ -0,0 +1,102 @@
+// Add a top-level pseudo-task to which we will attach individual regenerate tasks.
+import static groovy.io.FileType.*
+
+configure(rootProject) {
+  configurations {
+    javacc
+  }
+
+  dependencies {
+    javacc "net.java.dev.javacc:javacc:${scriptDepVersions['javacc']}"
+  }
+
+  task javacc() {
+    description "Regenerate sources for corresponding javacc grammar files."
+    group "generation"
+
+    dependsOn ":lucene:queryparser:javaccParserClassic"
+    dependsOn ":lucene:queryparser:javaccParserSurround"
+    dependsOn ":lucene:queryparser:javaccParserFlexible"
+  }
+}
+
+// We always regenerate, no need to declare outputs.
+class JavaCCTask extends DefaultTask {
+  @Input
+  File javaccFile
+
+  JavaCCTask() {
+    dependsOn(project.rootProject.configurations.javacc)
+  }
+
+  @TaskAction
+  def generate() {
+    if (!javaccFile || !javaccFile.exists()) {
+      throw new RuntimeException("JavaCC input file does not exist: ${javaccFile}")
+    }
+    // Remove old files so we can regenerate them
+    def parentDir = javaccFile.parentFile
+    parentDir.eachFileMatch FILES, ~/.*\.java/, { file ->
+      if (file.text.contains("Generated By:JavaCC")) {
+        file.delete()
+      }
+    }
+    logger.lifecycle("Regenerating JavaCC:\n  from: ${javaccFile}\n    to: ${parentDir}")
+
+    project.javaexec {
+      classpath {
+        project.rootProject.configurations.javacc
+      }
+      main = "org.javacc.parser.Main"
+      args += "-OUTPUT_DIRECTORY=${parentDir}"
+      args += [javaccFile]
+    }
+  }
+}
+
+
+configure(project(":lucene:queryparser")) {
+  task javaccParserClassic(type: JavaCCTask) {
+    description "Regenerate classic query parser from java CC.java"
 
 Review comment:
   I think this is missing the specific description?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-14225) Upgrade jaegertracing

2020-01-28 Thread Jira


[ 
https://issues.apache.org/jira/browse/SOLR-14225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17025409#comment-17025409
 ] 

Jan Høydahl commented on SOLR-14225:


I have not assigned myself yet, so this jira is up for grabs :) 

> Upgrade jaegertracing
> -
>
> Key: SOLR-14225
> URL: https://issues.apache.org/jira/browse/SOLR-14225
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Jan Høydahl
>Priority: Major
>
> Upgrade jaegertracing from 0.35.5 to 1.1.0. This will also give us a newer 
> libthrift, which is more stable and secure.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] dweiss commented on a change in pull request #1218: LUCENE-9134: Javacc skeleton

2020-01-28 Thread GitBox
dweiss commented on a change in pull request #1218: LUCENE-9134: Javacc skeleton
URL: https://github.com/apache/lucene-solr/pull/1218#discussion_r372005277
 
 

 ##
 File path: gradle/generation/javacc.gradle
 ##
 @@ -0,0 +1,102 @@
+// Add a top-level pseudo-task to which we will attach individual regenerate tasks.
+import static groovy.io.FileType.*
+
+configure(rootProject) {
+  configurations {
+    javacc
+  }
+
+  dependencies {
+    javacc "net.java.dev.javacc:javacc:${scriptDepVersions['javacc']}"
+  }
+
+  task javacc() {
+    description "Regenerate sources for corresponding javacc grammar files."
+    group "generation"
+
+    dependsOn ":lucene:queryparser:javaccParserClassic"
+    dependsOn ":lucene:queryparser:javaccParserSurround"
+    dependsOn ":lucene:queryparser:javaccParserFlexible"
+  }
+}
+
+// We always regenerate, no need to declare outputs.
+class JavaCCTask extends DefaultTask {
+  @Input
+  File javaccFile
+
+  JavaCCTask() {
+    dependsOn(project.rootProject.configurations.javacc)
+  }
+
+  @TaskAction
+  def generate() {
+    if (!javaccFile || !javaccFile.exists()) {
+      throw new RuntimeException("JavaCC input file does not exist: ${javaccFile}")
+    }
+    // Remove old files so we can regenerate them
+    def parentDir = javaccFile.parentFile
+    parentDir.eachFileMatch FILES, ~/.*\.java/, { file ->
+      if (file.text.contains("Generated By:JavaCC")) {
+        file.delete()
+      }
+    }
+    logger.lifecycle("Regenerating JavaCC:\n  from: ${javaccFile}\n    to: ${parentDir}")
+
+    project.javaexec {
+      classpath {
+        project.rootProject.configurations.javacc
+      }
+      main = "org.javacc.parser.Main"
+      args += "-OUTPUT_DIRECTORY=${parentDir}"
+      args += [javaccFile]
+    }
+  }
+}
+
+
+configure(project(":lucene:queryparser")) {
+  task javaccParserClassic(type: JavaCCTask) {
+    description "Regenerate classic query parser from java CC.java"
+    group "generation"
+
+    javaccFile = file('src/java/org/apache/lucene/queryparser/classic/QueryParser.jj')
+    def parent = javaccFile.parentFile.toString() // I'll need this later.
+
+    doLast {
+      // There'll be a lot of cleanup in here to get precommits and builds to pass, but as long as we don't
 
 Review comment:
   Thanks Erick. I'm exhausted today - will commit it tomorrow though. Good 
one with the coaching... some would call it whining!


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-13897) Unsafe publication of Terms object in ZkShardTerms

2020-01-28 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17025381#comment-17025381
 ] 

ASF subversion and git services commented on SOLR-13897:


Commit 7941d109bde418bf53a8b6cf547b8af21c0c1835 in lucene-solr's branch 
refs/heads/master from Adrien Grand
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=7941d10 ]

SOLR-13897: Fix precommit.


> Unsafe publication of Terms object in ZkShardTerms
> --
>
> Key: SOLR-13897
> URL: https://issues.apache.org/jira/browse/SOLR-13897
> Project: Solr
>  Issue Type: Bug
>  Components: SolrCloud
>Affects Versions: 8.2, 8.3
>Reporter: Shalin Shekhar Mangar
>Assignee: Shalin Shekhar Mangar
>Priority: Major
> Fix For: master (9.0), 8.5
>
> Attachments: SOLR-13897.patch, SOLR-13897.patch, SOLR-13897.patch, 
> SOLR-13897.patch
>
>
> The Terms object in ZkShardTerms is written using a write lock but reading is 
> allowed freely. This is not safe and can cause visibility issues and 
> associated race conditions under contention.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-13897) Unsafe publication of Terms object in ZkShardTerms

2020-01-28 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17025380#comment-17025380
 ] 

ASF subversion and git services commented on SOLR-13897:


Commit 47c01af39472b37e6a90b08d3eee7cc0b180428d in lucene-solr's branch 
refs/heads/branch_8x from Adrien Grand
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=47c01af ]

SOLR-13897: Fix precommit.


> Unsafe publication of Terms object in ZkShardTerms
> --
>
> Key: SOLR-13897
> URL: https://issues.apache.org/jira/browse/SOLR-13897
> Project: Solr
>  Issue Type: Bug
>  Components: SolrCloud
>Affects Versions: 8.2, 8.3
>Reporter: Shalin Shekhar Mangar
>Assignee: Shalin Shekhar Mangar
>Priority: Major
> Fix For: master (9.0), 8.5
>
> Attachments: SOLR-13897.patch, SOLR-13897.patch, SOLR-13897.patch, 
> SOLR-13897.patch
>
>
> The Terms object in ZkShardTerms is written using a write lock but reading is 
> allowed freely. This is not safe and can cause visibility issues and 
> associated race conditions under contention.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Resolved] (LUCENE-9115) NRTCachingDirectory may put large files in the cache

2020-01-28 Thread Adrien Grand (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-9115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adrien Grand resolved LUCENE-9115.
--
Fix Version/s: 8.5
   Resolution: Fixed

> NRTCachingDirectory may put large files in the cache
> 
>
> Key: LUCENE-9115
> URL: https://issues.apache.org/jira/browse/LUCENE-9115
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Adrien Grand
>Priority: Minor
> Fix For: 8.5
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> NRTCachingDirectory assumes that the length of a file to write is 0 if there 
> is no merge info or flush info. This is not correct as there are situations 
> when Lucene might write very large files that have neither of them, for 
> instance:
>  - Stored fields are written on the fly with IOContext.DEFAULT (which doesn't 
> have flush or merge info) and without counting against the IndexWriter buffer, 
> so gigabytes could be written before a flush happens.
>  - BKD trees are merged with IOContext.DEFAULT.
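
A sketch of the size-estimate pitfall described above (illustrative, not the 
actual NRTCachingDirectory code; the IOContext fields are real, the helper is 
hypothetical):

{code:java}
import org.apache.lucene.store.IOContext;

// An estimate that relies on FlushInfo/MergeInfo silently returns 0 for
// IOContext.DEFAULT, so arbitrarily large files look "small enough" to cache.
class SizeEstimate {
  static long expectedBytes(IOContext context) {
    if (context.flushInfo != null) {
      return context.flushInfo.estimatedSegmentSize;
    }
    if (context.mergeInfo != null) {
      return context.mergeInfo.estimatedMergeBytes;
    }
    return 0; // IOContext.DEFAULT lands here, even for gigabyte-scale writes
  }
}
{code}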



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Resolved] (LUCENE-9118) BlockTreeTermsReader should use compareUnsigned to compare suffixes

2020-01-28 Thread Adrien Grand (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-9118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adrien Grand resolved LUCENE-9118.
--
Fix Version/s: 8.5
   Resolution: Fixed

> BlockTreeTermsReader should use compareUnsigned to compare suffixes
> ---
>
> Key: LUCENE-9118
> URL: https://issues.apache.org/jira/browse/LUCENE-9118
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Adrien Grand
>Priority: Minor
> Fix For: 8.5
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Currently BlockTreeTermsReader performs a manual comparison of the target and 
> suffix bytes; it could use {{Arrays#compareUnsigned}} instead. I'm not 
> expecting a performance improvement from this; it is mostly a simplification.
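
For illustration, the manual unsigned comparison next to the JDK built-in (a 
sketch with invented method names; Arrays.compareUnsigned over ranges requires 
Java 9+):

{code:java}
import java.util.Arrays;

// Comparing byte ranges as unsigned values, manually vs. with the built-in.
class UnsignedCompare {
  static int manual(byte[] a, int ai, int aEnd, byte[] b, int bi, int bEnd) {
    while (ai < aEnd && bi < bEnd) {
      // Mask with 0xFF so bytes compare as 0..255 rather than signed -128..127.
      int cmp = (a[ai++] & 0xFF) - (b[bi++] & 0xFF);
      if (cmp != 0) {
        return cmp;
      }
    }
    return (aEnd - ai) - (bEnd - bi); // the shorter range sorts first
  }

  static int builtin(byte[] a, int ai, int aEnd, byte[] b, int bi, int bEnd) {
    return Arrays.compareUnsigned(a, ai, aEnd, b, bi, bEnd);
  }
}
{code}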



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Resolved] (LUCENE-9161) DirectMonotonicWriter should check for overflows

2020-01-28 Thread Adrien Grand (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-9161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adrien Grand resolved LUCENE-9161.
--
Fix Version/s: 8.5
   Resolution: Fixed

> DirectMonotonicWriter should check for overflows
> 
>
> Key: LUCENE-9161
> URL: https://issues.apache.org/jira/browse/LUCENE-9161
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Adrien Grand
>Priority: Minor
> Fix For: 8.5
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> DirectMonotonicWriter doesn't verify that the provided blockShift is 
> compatible with the number of written values.
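
A hypothetical sketch of the kind of check this asks for, assuming numValues 
values split into blocks of 2^blockShift (all names invented here):

{code:java}
// With blocks of 2^blockShift values, the implied block count must stay
// within the range an int-addressed structure can hold.
class BlockShiftCheck {
  static void check(long numValues, int blockShift) {
    long mask = (1L << blockShift) - 1;
    long numBlocks = (numValues >>> blockShift) + ((numValues & mask) != 0 ? 1 : 0);
    if (numBlocks > Integer.MAX_VALUE) {
      throw new IllegalArgumentException(
          "blockShift=" + blockShift + " is too small for numValues=" + numValues);
    }
  }
}
{code}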



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4702) Terms dictionary compression

2020-01-28 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-4702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17025370#comment-17025370
 ] 

ASF subversion and git services commented on LUCENE-4702:
-

Commit 033220e2ab31494054b26c236be4b43b777aea02 in lucene-solr's branch 
refs/heads/branch_8x from Adrien Grand
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=033220e ]

LUCENE-4702: Reduce terms dictionary compression overhead. (#1216)

Changes include:
 - Removed LZ4 compression of suffix lengths which didn't save much space
   anyway.
 - For stats, LZ4 was only really used for run-length compression of terms whose
   docFreq is 1. This has been replaced by explicit run-length compression.
 - Since we only use LZ4 for suffix bytes if the compression ratio is < 75%, we
   now only try LZ4 out if the average suffix length is greater than 6, in order
   to reduce index-time overhead.
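
As an illustration of the second bullet, explicit run-length compression of 
docFreq==1 stats might look like this (the 0 marker and layout are invented, 
not the actual BlockTree stats format):

{code:java}
import java.io.IOException;
import org.apache.lucene.store.DataOutput;

// Long runs of terms with docFreq == 1 collapse into a marker plus a length.
class StatsWriter {
  static void writeDocFreqs(int[] docFreqs, DataOutput out) throws IOException {
    int i = 0;
    while (i < docFreqs.length) {
      if (docFreqs[i] == 1) {
        int run = 1;
        while (i + run < docFreqs.length && docFreqs[i + run] == 1) {
          run++;
        }
        out.writeVInt(0);   // marker: next vint is a run length of docFreq==1 terms
        out.writeVInt(run);
        i += run;
      } else {
        out.writeVInt(docFreqs[i]); // ordinary docFreq, written as-is
        i++;
      }
    }
  }
}
{code}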


> Terms dictionary compression
> 
>
> Key: LUCENE-4702
> URL: https://issues.apache.org/jira/browse/LUCENE-4702
> Project: Lucene - Core
>  Issue Type: Wish
>Reporter: Adrien Grand
>Assignee: Adrien Grand
>Priority: Trivial
> Attachments: LUCENE-4702.patch, LUCENE-4702.patch
>
>  Time Spent: 3h 50m
>  Remaining Estimate: 0h
>
> I've done a quick test with the block tree terms dictionary by replacing a 
> call to IndexOutput.writeBytes to write suffix bytes with a call to 
> LZ4.compressHC to test the peformance hit. Interestingly, search performance 
> was very good (see comparison table below) and the tim files were 14% smaller 
> (from 150432 bytes overall to 129516).
> {noformat}
>                 Task    QPS baseline  StdDev    QPS compressed  StdDev    Pct diff
>               Fuzzy1         111.50   (2.0%)            78.78   (1.5%)   -29.4% ( -32% -  -26%)
>               Fuzzy2          36.99   (2.7%)            28.59   (1.5%)   -22.7% ( -26% -  -18%)
>              Respell         122.86   (2.1%)           103.89   (1.7%)   -15.4% ( -18% -  -11%)
>             Wildcard         100.58   (4.3%)            94.42   (3.2%)    -6.1% ( -13% -    1%)
>              Prefix3         124.90   (5.7%)           122.67   (4.7%)    -1.8% ( -11% -    9%)
>            OrHighLow         169.87   (6.8%)           167.77   (8.0%)    -1.2% ( -15% -   14%)
>              LowTerm        1949.85   (4.5%)          1929.02   (3.4%)    -1.1% (  -8% -    7%)
>           AndHighLow        2011.95   (3.5%)          1991.85   (3.3%)    -1.0% (  -7% -    5%)
>           OrHighHigh         155.63   (6.7%)           154.12   (7.9%)    -1.0% ( -14% -   14%)
>          AndHighHigh         341.82   (1.2%)           339.49   (1.7%)    -0.7% (  -3% -    2%)
>            OrHighMed         217.55   (6.3%)           216.16   (7.1%)    -0.6% ( -13% -   13%)
>               IntNRQ          53.10  (10.9%)            52.90   (8.6%)    -0.4% ( -17% -   21%)
>              MedTerm         998.11   (3.8%)           994.82   (5.6%)    -0.3% (  -9% -    9%)
>          MedSpanNear          60.50   (3.7%)            60.36   (4.8%)    -0.2% (  -8% -    8%)
>         HighSpanNear          19.74   (4.5%)            19.72   (5.1%)    -0.1% (  -9% -    9%)
>          LowSpanNear         101.93   (3.2%)           101.82   (4.4%)    -0.1% (  -7% -    7%)
>           AndHighMed         366.18   (1.7%)           366.93   (1.7%)     0.2% (  -3% -    3%)
>             PKLookup         237.28   (4.0%)           237.96   (4.2%)     0.3% (  -7% -    8%)
>            MedPhrase         173.17   (4.7%)           174.69   (4.7%)     0.9% (  -8% -   10%)
>      LowSloppyPhrase         180.91   (2.6%)           182.79   (2.7%)     1.0% (  -4% -    6%)
>            LowPhrase         374.64   (5.5%)           379.11   (5.8%)     1.2% (  -9% -   13%)
>             HighTerm         253.14   (7.9%)           256.97  (11.4%)     1.5% ( -16% -   22%)
>           HighPhrase          19.52  (10.6%)            19.83  (11.0%)     1.6% ( -18% -   25%)
>      MedSloppyPhrase         141.90   (2.6%)           144.11   (2.5%)     1.6% (  -3% -    6%)
>     HighSloppyPhrase          25.26   (4.8%)            25.97   (5.0%)     2.8% (  -6% -   13%)
> {noformat}
> Only queries which are very terms-dictionary-intensive got a performance hit 
> (Fuzzy, Fuzzy2, Respell, Wildcard), other queries including Prefix3 behaved 
> (surprisingly) well.
> Do you think of it as something worth exploring?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-9161) DirectMonotonicWriter should check for overflows

2020-01-28 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17025371#comment-17025371
 ] 

ASF subversion and git services commented on LUCENE-9161:
-

Commit 25fc09ee9e08ca7ef81962caadad2ee79d6ac2ef in lucene-solr's branch 
refs/heads/branch_8x from Adrien Grand
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=25fc09e ]

LUCENE-9161: DirectMonotonicWriter checks for overflows. (#1197)



> DirectMonotonicWriter should check for overflows
> 
>
> Key: LUCENE-9161
> URL: https://issues.apache.org/jira/browse/LUCENE-9161
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Adrien Grand
>Priority: Minor
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> DirectMonotonicWriter doesn't verify that the provided blockShift is 
> compatible with the number of written values.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8962) Can we merge small segments during refresh, for faster searching?

2020-01-28 Thread David Smiley (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-8962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17025367#comment-17025367
 ] 

David Smiley commented on LUCENE-8962:
--

[~msfroh] as you can see above, I accomplished the effect here already in a 
different way without modifying Lucene.  Not that I think we shouldn't modify 
Lucene at all, but I think the changes can be limited to _implementations 
of_ MergePolicy & MergeScheduler without needing to modify the abstractions 
themselves or core Lucene, which are already sufficient.  See LUCENE-8331 for a 
benchmark utility.  I should resume this work.
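
A rough sketch of that "limited to implementations" idea (method signatures 
here are from memory against the 8.x MergePolicy API and not verified against 
any release, so treat everything as an assumption): a wrapper policy that 
eagerly coalesces segments below a size threshold.

{code:java}
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.lucene.index.*;

// Sketch only: merge all segments below a size threshold eagerly, so a
// subsequent refresh sees fewer tiny segments; everything else is delegated.
public class SmallSegmentsMergePolicy extends FilterMergePolicy {
  private final long thresholdBytes;

  public SmallSegmentsMergePolicy(MergePolicy in, long thresholdBytes) {
    super(in);
    this.thresholdBytes = thresholdBytes;
  }

  @Override
  public MergeSpecification findMerges(MergeTrigger trigger, SegmentInfos infos,
                                       MergeContext mergeContext) throws IOException {
    List<SegmentCommitInfo> small = new ArrayList<>();
    for (SegmentCommitInfo sci : infos) {
      if (sci.sizeInBytes() < thresholdBytes) {
        small.add(sci);
      }
    }
    if (small.size() < 2) {
      return in.findMerges(trigger, infos, mergeContext); // nothing to coalesce
    }
    MergeSpecification spec = new MergeSpecification();
    spec.add(new OneMerge(small));
    return spec;
  }
}
{code}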

> Can we merge small segments during refresh, for faster searching?
> -
>
> Key: LUCENE-8962
> URL: https://issues.apache.org/jira/browse/LUCENE-8962
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/index
>Reporter: Michael McCandless
>Priority: Major
> Attachments: LUCENE-8962_demo.png
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> With near-real-time search we ask {{IndexWriter}} to write all in-memory 
> segments to disk and open an {{IndexReader}} to search them, and this is 
> typically a quick operation.
> However, when you use many threads for concurrent indexing, {{IndexWriter}} 
> will accumulate many small segments during {{refresh}}, and this then 
> adds search-time cost as searching must visit all of these tiny segments.
> The merge policy would normally quickly coalesce these small segments if 
> given a little time ... so, could we somehow improve {{IndexWriter}}'s 
> refresh to optionally kick off merge policy to merge segments below some 
> threshold before opening the near-real-time reader?  It'd be a bit tricky 
> because while we are waiting for merges, indexing may continue, and new 
> segments may be flushed, but those new segments shouldn't be included in the 
> point-in-time segments returned by refresh ...
> One could almost do this on top of Lucene today, with a custom merge policy, 
> and some hackity logic to have the merge policy target small segments just 
> written by refresh, but it's tricky to then open a near-real-time reader, 
> excluding newly flushed but including newly merged segments since the refresh 
> originally finished ...
> I'm not yet sure how best to solve this, so I wanted to open an issue for 
> discussion!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-14226) SolrStream reports AuthN/AuthZ failures (401|403) as IOException w/o details

2020-01-28 Thread Chris M. Hostetter (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17025365#comment-17025365
 ] 

Chris M. Hostetter commented on SOLR-14226:
---

From a test I'm in the process of trying to write...

{code:java}
  public void testEchoStreamFail() throws Exception {
    final SolrStream solrStream = new SolrStream(solrUrl,
                                                 params("qt", "/stream",
                                                        "expr", "echo(hello world)"));
    solrStream.setCredentials("bogus_user", "bogus_pass");
    SolrException e = expectThrows(SolrException.class, () -> {
        final List<Tuple> ignored = getTuples(solrStream);
      });
    assertEquals(401, e.code());
  }
{code}

{noformat}
   [junit4]> Throwable #1: junit.framework.AssertionFailedError: Unexpected 
exception type, expected SolrException but got java.io.IOException: --> 
http://127.0.0.1:35337/solr/collection_x: An exception has occurred on the 
server, refer to server log for details.
   [junit4]>at 
__randomizedtesting.SeedInfo.seed([F7287DED4A9F66CA:B866576F16986894]:0)
   [junit4]>at 
org.apache.lucene.util.LuceneTestCase.expectThrows(LuceneTestCase.java:2752)
   [junit4]>at 
org.apache.lucene.util.LuceneTestCase.expectThrows(LuceneTestCase.java:2740)
   [junit4]>at 
org.apache.solr.client.solrj.io.stream.CloudAuthStreamTest.testEchoStreamFail(CloudAuthStreamTest.java:208)
   [junit4]>at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   [junit4]>at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
   [junit4]>at 
java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   [junit4]>at 
java.base/java.lang.reflect.Method.invoke(Method.java:566)
   [junit4]>at java.base/java.lang.Thread.run(Thread.java:834)
   [junit4]> Caused by: java.io.IOException: --> 
http://127.0.0.1:35337/solr/collection_x: An exception has occurred on the 
server, refer to server log for details.
   [junit4]>at 
org.apache.solr.client.solrj.io.stream.SolrStream.read(SolrStream.java:232)
   [junit4]>at 
org.apache.solr.client.solrj.io.stream.CloudAuthStreamTest.getTuples(CloudAuthStreamTest.java:221)
   [junit4]>at 
org.apache.solr.client.solrj.io.stream.CloudAuthStreamTest.lambda$testEchoStreamFail$3(CloudAuthStreamTest.java:209)
   [junit4]>at 
org.apache.lucene.util.LuceneTestCase._expectThrows(LuceneTestCase.java:2870)
   [junit4]>at 
org.apache.lucene.util.LuceneTestCase.expectThrows(LuceneTestCase.java:2745)
   [junit4]>... 41 more
   [junit4]> Caused by: org.noggit.JSONParser$ParseException: JSON Parse 
Error: char=<,position=0 AFTER='<' BEFORE='html>
{noformat}

> SolrStream reports AuthN/AuthZ failures (401|403) as IOException w/o details
> -----------------------------------------------------------------------------
>
> Key: SOLR-14226
> URL: https://issues.apache.org/jira/browse/SOLR-14226
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: SolrJ, streaming expressions
>Reporter: Chris M. Hostetter
>Priority: Major
>
> If you try to use the SolrJ {{SolrStream}} class to make a streaming 
> expression request to a Solr node, any authentication or authorization 
> failures will be swallowed and a generic "IOException" will be thrown.
> (evidently due to a parse error trying to read the body of the response w/o 
> consulting the HTTP status?)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Created] (SOLR-14226) SolrStream reports AuthN/AuthZ failures (401|403) as IOException w/o details

2020-01-28 Thread Chris M. Hostetter (Jira)
Chris M. Hostetter created SOLR-14226:
-

 Summary: SolrStream reports AuthN/AuthZ failures (401|403) as 
IOException w/o details
 Key: SOLR-14226
 URL: https://issues.apache.org/jira/browse/SOLR-14226
 Project: Solr
  Issue Type: Bug
  Security Level: Public (Default Security Level. Issues are Public)
  Components: SolrJ, streaming expressions
Reporter: Chris M. Hostetter


If you try to use the SolrJ {{SolrStream}} class to make a streaming expression 
request to a Solr node, any authentication or authorization failures will be 
swallowed and a generic "IOException" will be thrown.

(evidently due to a parse error trying to read the body of the response w/o 
consulting the HTTP status?)
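
For illustration only (plain JDK HTTP, not SolrStream's actual internals): 
checking the HTTP status before handing the body to a JSON parser would let 
401/403 surface directly.

{code:java}
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

// Hypothetical sketch: consult the status code first, so auth failures are
// reported as such instead of as a JSON parse error on an HTML error page.
class StatusAwareRead {
  static void ensureOk(String url) throws IOException {
    HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
    int status = conn.getResponseCode();
    if (status == 401 || status == 403) {
      throw new IOException("Authentication/authorization failure from " + url
          + ": HTTP " + status);
    }
    // ...only now read and parse the response body as JSON...
  }
}
{code}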



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-9161) DirectMonotonicWriter should check for overflows

2020-01-28 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17025344#comment-17025344
 ] 

ASF subversion and git services commented on LUCENE-9161:
-

Commit 92b684c647876c886ba71dab51edf6f1f3c59d82 in lucene-solr's branch 
refs/heads/master from Adrien Grand
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=92b684c ]

LUCENE-9161: DirectMonotonicWriter checks for overflows. (#1197)



> DirectMonotonicWriter should check for overflows
> 
>
> Key: LUCENE-9161
> URL: https://issues.apache.org/jira/browse/LUCENE-9161
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Adrien Grand
>Priority: Minor
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> DirectMonotonicWriter doesn't verify that the provided blockShift is 
> compatible with the number of written values.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] jpountz merged pull request #1197: LUCENE-9161: DirectMonotonicWriter checks for overflows.

2020-01-28 Thread GitBox
jpountz merged pull request #1197: LUCENE-9161: DirectMonotonicWriter checks 
for overflows.
URL: https://github.com/apache/lucene-solr/pull/1197
 
 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4702) Terms dictionary compression

2020-01-28 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-4702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17025321#comment-17025321
 ] 

ASF subversion and git services commented on LUCENE-4702:
-

Commit 6eb8834a57fa176c6c2e995480b69ecea1b6bd07 in lucene-solr's branch 
refs/heads/master from Adrien Grand
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=6eb8834 ]

LUCENE-4702: Reduce terms dictionary compression overhead. (#1216)

Changes include:
 - Removed LZ4 compression of suffix lengths which didn't save much space
   anyway.
 - For stats, LZ4 was only really used for run-length compression of terms whose
   docFreq is 1. This has been replaced by explicit run-length compression.
 - Since we only use LZ4 for suffix bytes if the compression ratio is < 75%, we
   now only try LZ4 out if the average suffix length is greater than 6, in order
   to reduce index-time overhead.
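
A sketch of that last heuristic (the names are made up for illustration; the 
real logic lives in the block-tree terms writer): LZ4 is only attempted when 
suffixes are long enough on average to plausibly reach the < 75% ratio.

{code:java}
final class SuffixCompressionHeuristic {
  // Illustrative only; not the committed LUCENE-4702 code.
  static boolean shouldTryLZ4(long totalSuffixBytes, long numTerms) {
    // Short suffixes rarely compress below the 75% ratio threshold, so
    // skip the LZ4 attempt entirely and save index-time CPU.
    return numTerms > 0 && totalSuffixBytes / numTerms > 6;
  }
}
{code}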

> Terms dictionary compression
> 
>
> Key: LUCENE-4702
> URL: https://issues.apache.org/jira/browse/LUCENE-4702
> Project: Lucene - Core
>  Issue Type: Wish
>Reporter: Adrien Grand
>Assignee: Adrien Grand
>Priority: Trivial
> Attachments: LUCENE-4702.patch, LUCENE-4702.patch
>
>  Time Spent: 3h 50m
>  Remaining Estimate: 0h
>
> I've done a quick test with the block tree terms dictionary by replacing a 
> call to IndexOutput.writeBytes to write suffix bytes with a call to 
> LZ4.compressHC to test the performance hit. Interestingly, search performance 
> was very good (see comparison table below) and the tim files were 14% smaller 
> (from 150432 bytes overall to 129516).
> {noformat}
> TaskQPS baseline  StdDevQPS compressed  StdDev
> Pct diff
>   Fuzzy1  111.50  (2.0%)   78.78  (1.5%)  
> -29.4% ( -32% -  -26%)
>   Fuzzy2   36.99  (2.7%)   28.59  (1.5%)  
> -22.7% ( -26% -  -18%)
>  Respell  122.86  (2.1%)  103.89  (1.7%)  
> -15.4% ( -18% -  -11%)
> Wildcard  100.58  (4.3%)   94.42  (3.2%)   
> -6.1% ( -13% -1%)
>  Prefix3  124.90  (5.7%)  122.67  (4.7%)   
> -1.8% ( -11% -9%)
>OrHighLow  169.87  (6.8%)  167.77  (8.0%)   
> -1.2% ( -15% -   14%)
>  LowTerm 1949.85  (4.5%) 1929.02  (3.4%)   
> -1.1% (  -8% -7%)
>   AndHighLow 2011.95  (3.5%) 1991.85  (3.3%)   
> -1.0% (  -7% -5%)
>   OrHighHigh  155.63  (6.7%)  154.12  (7.9%)   
> -1.0% ( -14% -   14%)
>  AndHighHigh  341.82  (1.2%)  339.49  (1.7%)   
> -0.7% (  -3% -2%)
>OrHighMed  217.55  (6.3%)  216.16  (7.1%)   
> -0.6% ( -13% -   13%)
>   IntNRQ   53.10 (10.9%)   52.90  (8.6%)   
> -0.4% ( -17% -   21%)
>  MedTerm  998.11  (3.8%)  994.82  (5.6%)   
> -0.3% (  -9% -9%)
>  MedSpanNear   60.50  (3.7%)   60.36  (4.8%)   
> -0.2% (  -8% -8%)
> HighSpanNear   19.74  (4.5%)   19.72  (5.1%)   
> -0.1% (  -9% -9%)
>  LowSpanNear  101.93  (3.2%)  101.82  (4.4%)   
> -0.1% (  -7% -7%)
>   AndHighMed  366.18  (1.7%)  366.93  (1.7%)
> 0.2% (  -3% -3%)
> PKLookup  237.28  (4.0%)  237.96  (4.2%)
> 0.3% (  -7% -8%)
>MedPhrase  173.17  (4.7%)  174.69  (4.7%)
> 0.9% (  -8% -   10%)
>  LowSloppyPhrase  180.91  (2.6%)  182.79  (2.7%)
> 1.0% (  -4% -6%)
>LowPhrase  374.64  (5.5%)  379.11  (5.8%)
> 1.2% (  -9% -   13%)
> HighTerm  253.14  (7.9%)  256.97 (11.4%)
> 1.5% ( -16% -   22%)
>   HighPhrase   19.52 (10.6%)   19.83 (11.0%)
> 1.6% ( -18% -   25%)
>  MedSloppyPhrase  141.90  (2.6%)  144.11  (2.5%)
> 1.6% (  -3% -6%)
> HighSloppyPhrase   25.26  (4.8%)   25.97  (5.0%)
> 2.8% (  -6% -   13%)
> {noformat}
> Only queries which are very terms-dictionary-intensive got a performance hit 
> (Fuzzy, Fuzzy2, Respell, Wildcard), other queries including Prefix3 behaved 
> (surprisingly) well.
> Do you think of it as something worth exploring?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] jpountz merged pull request #1216: LUCENE-4702: Reduce terms dictionary compression overhead.

2020-01-28 Thread GitBox
jpountz merged pull request #1216: LUCENE-4702: Reduce terms dictionary 
compression overhead.
URL: https://github.com/apache/lucene-solr/pull/1216
 
 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] msokolov commented on a change in pull request #1155: LUCENE-8962: Add ability to selectively merge on commit

2020-01-28 Thread GitBox
msokolov commented on a change in pull request #1155: LUCENE-8962: Add ability 
to selectively merge on commit
URL: https://github.com/apache/lucene-solr/pull/1155#discussion_r371953236
 
 

 ## File path: lucene/core/src/java/org/apache/lucene/index/IndexWriter.java
 @@ -3223,15 +3259,44 @@ private long prepareCommitInternal() throws IOException {
       // sneak into the commit point:
       toCommit = segmentInfos.clone();

+      if (anyChanges) {
+        mergeAwaitLatchRef = new AtomicReference<>();
+        MergePolicy mergeOnCommitPolicy = waitForMergeOnCommitPolicy(config.getMergePolicy(), toCommit, mergeAwaitLatchRef);
+
+        // Find any merges that can execute on commit (per MergePolicy).
+        commitMerges = mergeOnCommitPolicy.findCommitMerges(segmentInfos, this);
+        if (commitMerges != null && commitMerges.merges.size() > 0) {
+          int mergeCount = 0;
+          for (MergePolicy.OneMerge oneMerge : commitMerges.merges) {
+            if (registerMerge(oneMerge)) {
+              mergeCount++;
+            } else {
+              throw new IllegalStateException("MergePolicy " + config.getMergePolicy().getClass() +
 
 Review comment:
   I see, thanks for explaining!


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8962) Can we merge small segments during refresh, for faster searching?

2020-01-28 Thread Michael Froh (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-8962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17025313#comment-17025313
 ] 

Michael Froh commented on LUCENE-8962:
--

Thanks [~msoko...@gmail.com] for the feedback on the PR! I've updated it to 
incorporate your suggestions.

> Can we merge small segments during refresh, for faster searching?
> -
>
> Key: LUCENE-8962
> URL: https://issues.apache.org/jira/browse/LUCENE-8962
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/index
>Reporter: Michael McCandless
>Priority: Major
> Attachments: LUCENE-8962_demo.png
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> With near-real-time search we ask {{IndexWriter}} to write all in-memory 
> segments to disk and open an {{IndexReader}} to search them, and this is 
> typically a quick operation.
> However, when you use many threads for concurrent indexing, {{IndexWriter}} 
> will accumulate many small segments during {{refresh}}, and this then 
> adds search-time cost as searching must visit all of these tiny segments.
> The merge policy would normally quickly coalesce these small segments if 
> given a little time ... so, could we somehow improve {{IndexWriter'}}s 
> refresh to optionally kick off merge policy to merge segments below some 
> threshold before opening the near-real-time reader?  It'd be a bit tricky 
> because while we are waiting for merges, indexing may continue, and new 
> segments may be flushed, but those new segments shouldn't be included in the 
> point-in-time segments returned by refresh ...
> One could almost do this on top of Lucene today, with a custom merge policy, 
> and some hackity logic to have the merge policy target small segments just 
> written by refresh, but it's tricky to then open a near-real-time reader, 
> excluding newly flushed but including newly merged segments since the refresh 
> originally finished ...
> I'm not yet sure how best to solve this, so I wanted to open an issue for 
> discussion!
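
A sketch of the custom-merge-policy idea from the description, written against 
the findCommitMerges hook shown in the PR excerpt earlier in this digest (that 
hook is from the in-progress PR, not a released Lucene API, and the size 
threshold is arbitrary):

{code:java}
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.index.MergePolicy;
import org.apache.lucene.index.SegmentCommitInfo;
import org.apache.lucene.index.SegmentInfos;

final class TinySegmentCommitMerges {
  private static final long SMALL_SEGMENT_BYTES = 1L << 20; // 1 MB, arbitrary

  // Gather all small segments into a single merge to run at commit/refresh
  // time, so the near-real-time reader opens over fewer tiny segments.
  static MergePolicy.MergeSpecification findCommitMerges(SegmentInfos infos) throws IOException {
    List<SegmentCommitInfo> small = new ArrayList<>();
    for (SegmentCommitInfo sci : infos) {
      if (sci.sizeInBytes() <= SMALL_SEGMENT_BYTES) {
        small.add(sci);
      }
    }
    if (small.size() < 2) {
      return null; // nothing worth merging before the reader opens
    }
    MergePolicy.MergeSpecification spec = new MergePolicy.MergeSpecification();
    spec.add(new MergePolicy.OneMerge(small));
    return spec;
  }
}
{code}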



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Resolved] (LUCENE-9189) TestIndexWriterDelete.testDeletesOnDiskFull can run for minutes

2020-01-28 Thread Robert Muir (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-9189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir resolved LUCENE-9189.
-
Fix Version/s: master (9.0)
   Resolution: Fixed

As mentioned above, I marked nightly for now. I need to go to the beer store if 
I'm gonna touch MockDirectoryWrapper...

> TestIndexWriterDelete.testDeletesOnDiskFull can run for minutes
> ---
>
> Key: LUCENE-9189
> URL: https://issues.apache.org/jira/browse/LUCENE-9189
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Robert Muir
>Priority: Major
> Fix For: master (9.0)
>
>
> I thought it was just the testUpdatesOnDiskFull, but looks like this one 
> needs to be nightly too.
> Should look more into the test, but I know something causes it to make such 
> an insane amount of files, that sorting them becomes a bottleneck.
> I guess also related is that it would be great if MockDirectoryWrapper's disk 
> full check didn't trigger a sort of the files (via listAll): it does this 
> check on like every i/o, would be nice for it to be less absurd. Maybe 
> instead the test could check for disk full on not every i/o but some random 
> sample of them?
> Temporarily lets make it nightly...
> {noformat}
> PROFILE SUMMARY from 182501 samples
>   tests.profile.count=10
>   tests.profile.stacksize=1
>   tests.profile.linenumbers=false
> PERCENT   SAMPLES STACK
> 15.89%28995   java.lang.StringLatin1#compareTo()
> 6.61% 12069   java.util.TimSort#mergeHi()
> 5.96% 10878   java.util.TimSort#binarySort()
> 3.41% 6231java.util.concurrent.ConcurrentHashMap#tabAt()
> 2.98% 5433java.util.Comparators$NaturalOrderComparator#compare()
> 2.12% 3876org.apache.lucene.store.DataOutput#copyBytes()
> 2.03% 3712java.lang.String#compareTo()
> 1.84% 3350java.util.concurrent.ConcurrentHashMap#get()
> 1.83% 3337java.util.TimSort#mergeLo()
> 1.67% 3047java.util.ArrayList#add()
> {noformat}
> All the file sorting is called from stacks like this, so its literally 
> happening every writeByte() and so on
> {noformat}
> 0.73% 1329java.util.TimSort#binarySort()
> at java.util.TimSort#sort()
> at java.util.Arrays#sort()
> at java.util.ArrayList#sort()
> at java.util.stream.SortedOps$RefSortingSink#end()
> at java.util.stream.AbstractPipeline#copyInto()
> at java.util.stream.AbstractPipeline#wrapAndCopyInto()
> at java.util.stream.AbstractPipeline#evaluate()
> at 
> java.util.stream.AbstractPipeline#evaluateToArrayNode()
> at java.util.stream.ReferencePipeline#toArray()
> at 
> org.apache.lucene.store.ByteBuffersDirectory#listAll()
> at 
> org.apache.lucene.store.MockDirectoryWrapper#sizeInBytes()
> at 
> org.apache.lucene.store.MockIndexOutputWrapper#checkDiskFull()
> at 
> org.apache.lucene.store.MockIndexOutputWrapper#writeBytes()
> at 
> org.apache.lucene.store.MockIndexOutputWrapper#writeByte()
> at org.apache.lucene.store.DataOutput#writeInt()
> at org.apache.lucene.codecs.CodecUtil#writeFooter()
> at 
> org.apache.lucene.codecs.lucene50.Lucene50LiveDocsFormat#writeLiveDocs()
> at 
> org.apache.lucene.codecs.asserting.AssertingLiveDocsFormat#writeLiveDocs()
> at 
> org.apache.lucene.index.PendingDeletes#writeLiveDocs()
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-9189) TestIndexWriterDelete.testDeletesOnDiskFull can run for minutes

2020-01-28 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17025307#comment-17025307
 ] 

ASF subversion and git services commented on LUCENE-9189:
-

Commit 4773574578f089802fe3f36bff6951c4a29a3628 in lucene-solr's branch 
refs/heads/master from Robert Muir
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=4773574 ]

LUCENE-9189: TestIndexWriterDelete.testDeletesOnDiskFull can run for minutes

The issue is that MockDirectoryWrapper's disk full check is horribly
inefficient. On every writeByte/etc, it totally recomputes disk space
across all files. This means it calls listAll() on the underlying
Directory (which sorts all the underlying files), then sums up fileLength()
for each of those files.

This leads to many pathological cases in the disk full tests... but the
number of tests impacted by this is minimal, and the logic is scary.


> TestIndexWriterDelete.testDeletesOnDiskFull can run for minutes
> ---
>
> Key: LUCENE-9189
> URL: https://issues.apache.org/jira/browse/LUCENE-9189
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Robert Muir
>Priority: Major
>
> I thought it was just the testUpdatesOnDiskFull, but looks like this one 
> needs to be nightly too.
> Should look more into the test, but I know something causes it to make such 
> an insane amount of files, that sorting them becomes a bottleneck.
> I guess also related is that it would be great if MockDirectoryWrapper's disk 
> full check didn't trigger a sort of the files (via listAll): it does this 
> check on like every i/o, would be nice for it to be less absurd. Maybe 
> instead the test could check for disk full on not every i/o but some random 
> sample of them?
> Temporarily lets make it nightly...
> {noformat}
> PROFILE SUMMARY from 182501 samples
>   tests.profile.count=10
>   tests.profile.stacksize=1
>   tests.profile.linenumbers=false
> PERCENT   SAMPLES STACK
> 15.89%28995   java.lang.StringLatin1#compareTo()
> 6.61% 12069   java.util.TimSort#mergeHi()
> 5.96% 10878   java.util.TimSort#binarySort()
> 3.41% 6231java.util.concurrent.ConcurrentHashMap#tabAt()
> 2.98% 5433java.util.Comparators$NaturalOrderComparator#compare()
> 2.12% 3876org.apache.lucene.store.DataOutput#copyBytes()
> 2.03% 3712java.lang.String#compareTo()
> 1.84% 3350java.util.concurrent.ConcurrentHashMap#get()
> 1.83% 3337java.util.TimSort#mergeLo()
> 1.67% 3047java.util.ArrayList#add()
> {noformat}
> All the file sorting is called from stacks like this, so its literally 
> happening every writeByte() and so on
> {noformat}
> 0.73% 1329java.util.TimSort#binarySort()
> at java.util.TimSort#sort()
> at java.util.Arrays#sort()
> at java.util.ArrayList#sort()
> at java.util.stream.SortedOps$RefSortingSink#end()
> at java.util.stream.AbstractPipeline#copyInto()
> at java.util.stream.AbstractPipeline#wrapAndCopyInto()
> at java.util.stream.AbstractPipeline#evaluate()
> at 
> java.util.stream.AbstractPipeline#evaluateToArrayNode()
> at java.util.stream.ReferencePipeline#toArray()
> at 
> org.apache.lucene.store.ByteBuffersDirectory#listAll()
> at 
> org.apache.lucene.store.MockDirectoryWrapper#sizeInBytes()
> at 
> org.apache.lucene.store.MockIndexOutputWrapper#checkDiskFull()
> at 
> org.apache.lucene.store.MockIndexOutputWrapper#writeBytes()
> at 
> org.apache.lucene.store.MockIndexOutputWrapper#writeByte()
> at org.apache.lucene.store.DataOutput#writeInt()
> at org.apache.lucene.codecs.CodecUtil#writeFooter()
> at 
> org.apache.lucene.codecs.lucene50.Lucene50LiveDocsFormat#writeLiveDocs()
> at 
> org.apache.lucene.codecs.asserting.AssertingLiveDocsFormat#writeLiveDocs()
> at 
> org.apache.lucene.index.PendingDeletes#writeLiveDocs()
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-9189) TestIndexWriterDelete.testDeletesOnDiskFull can run for minutes

2020-01-28 Thread Robert Muir (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17025303#comment-17025303
 ] 

Robert Muir commented on LUCENE-9189:
-

OK, I see the issue. It also "tracks" the disk usage if you setMaxSizeInBytes 
(by "track" I mean it recomputes it, calling listAll and then summing 
fileLength of every file... on every writeByte etc).

So it only impacts these disk full tests. The tracking should get more 
efficient, but the scope is limited and I don't want to wrestle with this logic 
right now. Going with Nightly until we fix the efficiency of this thing.

> TestIndexWriterDelete.testDeletesOnDiskFull can run for minutes
> ---
>
> Key: LUCENE-9189
> URL: https://issues.apache.org/jira/browse/LUCENE-9189
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Robert Muir
>Priority: Major
>
> I thought it was just the testUpdatesOnDiskFull, but looks like this one 
> needs to be nightly too.
> Should look more into the test, but I know something causes it to make such 
> an insane amount of files, that sorting them becomes a bottleneck.
> I guess also related is that it would be great if MockDirectoryWrapper's disk 
> full check didn't trigger a sort of the files (via listAll): it does this 
> check on like every i/o, would be nice for it to be less absurd. Maybe 
> instead the test could check for disk full on not every i/o but some random 
> sample of them?
> Temporarily lets make it nightly...
> {noformat}
> PROFILE SUMMARY from 182501 samples
>   tests.profile.count=10
>   tests.profile.stacksize=1
>   tests.profile.linenumbers=false
> PERCENT   SAMPLES STACK
> 15.89%28995   java.lang.StringLatin1#compareTo()
> 6.61% 12069   java.util.TimSort#mergeHi()
> 5.96% 10878   java.util.TimSort#binarySort()
> 3.41% 6231java.util.concurrent.ConcurrentHashMap#tabAt()
> 2.98% 5433java.util.Comparators$NaturalOrderComparator#compare()
> 2.12% 3876org.apache.lucene.store.DataOutput#copyBytes()
> 2.03% 3712java.lang.String#compareTo()
> 1.84% 3350java.util.concurrent.ConcurrentHashMap#get()
> 1.83% 3337java.util.TimSort#mergeLo()
> 1.67% 3047java.util.ArrayList#add()
> {noformat}
> All the file sorting is called from stacks like this, so its literally 
> happening every writeByte() and so on
> {noformat}
> 0.73% 1329java.util.TimSort#binarySort()
> at java.util.TimSort#sort()
> at java.util.Arrays#sort()
> at java.util.ArrayList#sort()
> at java.util.stream.SortedOps$RefSortingSink#end()
> at java.util.stream.AbstractPipeline#copyInto()
> at java.util.stream.AbstractPipeline#wrapAndCopyInto()
> at java.util.stream.AbstractPipeline#evaluate()
> at 
> java.util.stream.AbstractPipeline#evaluateToArrayNode()
> at java.util.stream.ReferencePipeline#toArray()
> at 
> org.apache.lucene.store.ByteBuffersDirectory#listAll()
> at 
> org.apache.lucene.store.MockDirectoryWrapper#sizeInBytes()
> at 
> org.apache.lucene.store.MockIndexOutputWrapper#checkDiskFull()
> at 
> org.apache.lucene.store.MockIndexOutputWrapper#writeBytes()
> at 
> org.apache.lucene.store.MockIndexOutputWrapper#writeByte()
> at org.apache.lucene.store.DataOutput#writeInt()
> at org.apache.lucene.codecs.CodecUtil#writeFooter()
> at 
> org.apache.lucene.codecs.lucene50.Lucene50LiveDocsFormat#writeLiveDocs()
> at 
> org.apache.lucene.codecs.asserting.AssertingLiveDocsFormat#writeLiveDocs()
> at 
> org.apache.lucene.index.PendingDeletes#writeLiveDocs()
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-9189) TestIndexWriterDelete.testDeletesOnDiskFull can run for minutes

2020-01-28 Thread Robert Muir (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17025297#comment-17025297
 ] 

Robert Muir commented on LUCENE-9189:
-

There are definitely test bugs here too. MockDirectoryWrapper shouldn't even be 
checking disk full here, it wasn't told to do so! So its copyBytes is bad, as 
it unconditionally does the expensive disk full check on every invocation (even 
if setTrackDiskUsage was never called, as in this test).

So we definitely need to fix it to only check for disk full if the test asked 
for it, and then fix tests that want to test disk full to call 
.setTrackDiskUsage(true).

I'm looking into it.
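
A sketch of both fixes plus the sampling idea from the issue description (the 
method and field names are illustrative, not MockDirectoryWrapper's actual 
internals):

{code:java}
import java.io.IOException;
import java.util.Random;

final class DiskFullCheckSketch {
  private final boolean trackDiskUsage; // true only if the test opted in
  private final Random random;

  DiskFullCheckSketch(boolean trackDiskUsage, Random random) {
    this.trackDiskUsage = trackDiskUsage;
    this.random = random;
  }

  void maybeCheckDiskFull(int len) throws IOException {
    if (trackDiskUsage == false) {
      return; // setTrackDiskUsage(true) was never called: skip entirely
    }
    // Sampling idea: recompute sizes on ~1% of i/o calls instead of on
    // every single writeByte/copyBytes invocation.
    if (random.nextInt(100) != 0) {
      return;
    }
    expensiveDiskFullCheck(len);
  }

  private void expensiveDiskFullCheck(int len) throws IOException {
    // stand-in for the listAll() + fileLength() summation described above
  }
}
{code}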

> TestIndexWriterDelete.testDeletesOnDiskFull can run for minutes
> ---
>
> Key: LUCENE-9189
> URL: https://issues.apache.org/jira/browse/LUCENE-9189
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Robert Muir
>Priority: Major
>
> I thought it was just the testUpdatesOnDiskFull, but looks like this one 
> needs to be nightly too.
> Should look more into the test, but I know something causes it to make such 
> an insane amount of files, that sorting them becomes a bottleneck.
> I guess also related is that it would be great if MockDirectoryWrapper's disk 
> full check didn't trigger a sort of the files (via listAll): it does this 
> check on like every i/o, would be nice for it to be less absurd. Maybe 
> instead the test could check for disk full on not every i/o but some random 
> sample of them?
> Temporarily lets make it nightly...
> {noformat}
> PROFILE SUMMARY from 182501 samples
>   tests.profile.count=10
>   tests.profile.stacksize=1
>   tests.profile.linenumbers=false
> PERCENT   SAMPLES STACK
> 15.89%28995   java.lang.StringLatin1#compareTo()
> 6.61% 12069   java.util.TimSort#mergeHi()
> 5.96% 10878   java.util.TimSort#binarySort()
> 3.41% 6231java.util.concurrent.ConcurrentHashMap#tabAt()
> 2.98% 5433java.util.Comparators$NaturalOrderComparator#compare()
> 2.12% 3876org.apache.lucene.store.DataOutput#copyBytes()
> 2.03% 3712java.lang.String#compareTo()
> 1.84% 3350java.util.concurrent.ConcurrentHashMap#get()
> 1.83% 3337java.util.TimSort#mergeLo()
> 1.67% 3047java.util.ArrayList#add()
> {noformat}
> All the file sorting is called from stacks like this, so its literally 
> happening every writeByte() and so on
> {noformat}
> 0.73% 1329java.util.TimSort#binarySort()
> at java.util.TimSort#sort()
> at java.util.Arrays#sort()
> at java.util.ArrayList#sort()
> at java.util.stream.SortedOps$RefSortingSink#end()
> at java.util.stream.AbstractPipeline#copyInto()
> at java.util.stream.AbstractPipeline#wrapAndCopyInto()
> at java.util.stream.AbstractPipeline#evaluate()
> at 
> java.util.stream.AbstractPipeline#evaluateToArrayNode()
> at java.util.stream.ReferencePipeline#toArray()
> at 
> org.apache.lucene.store.ByteBuffersDirectory#listAll()
> at 
> org.apache.lucene.store.MockDirectoryWrapper#sizeInBytes()
> at 
> org.apache.lucene.store.MockIndexOutputWrapper#checkDiskFull()
> at 
> org.apache.lucene.store.MockIndexOutputWrapper#writeBytes()
> at 
> org.apache.lucene.store.MockIndexOutputWrapper#writeByte()
> at org.apache.lucene.store.DataOutput#writeInt()
> at org.apache.lucene.codecs.CodecUtil#writeFooter()
> at 
> org.apache.lucene.codecs.lucene50.Lucene50LiveDocsFormat#writeLiveDocs()
> at 
> org.apache.lucene.codecs.asserting.AssertingLiveDocsFormat#writeLiveDocs()
> at 
> org.apache.lucene.index.PendingDeletes#writeLiveDocs()
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-13817) Deprecate and remove legacy SolrCache implementations

2020-01-28 Thread Andy Webb (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17025295#comment-17025295
 ] 

Andy Webb commented on SOLR-13817:
--

Could I request that we get to use the final version of CaffeineCache in 
8.5.0+ before the legacy cache implementations are removed in 9.0.0, please?

Currently 
https://github.com/apache/lucene-solr/commit/b4fe911cc8e4bddff18226bc8c98a2deb735a8fc#diff-fc056ba10fcf92dc69fe32991cdad5f0
 (in master) both updates CaffeineCache.java and removes FastLRUCache etc.

thanks,
Andy

> Deprecate and remove legacy SolrCache implementations
> -
>
> Key: SOLR-13817
> URL: https://issues.apache.org/jira/browse/SOLR-13817
> Project: Solr
>  Issue Type: Improvement
>Reporter: Andrzej Bialecki
>Assignee: Andrzej Bialecki
>Priority: Major
> Fix For: master (9.0)
>
> Attachments: SOLR-13817-8x.patch, SOLR-13817-master.patch
>
>
> Now that SOLR-8241 has been committed I propose to deprecate other cache 
> implementations in 8x and remove them altogether from 9.0, in order to reduce 
> confusion and maintenance costs.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] balaji-s commented on issue #1221: SOLR-14193 Update tutorial.adoc(line no:664) so that command executes…

2020-01-28 Thread GitBox
balaji-s commented on issue #1221: SOLR-14193 Update tutorial.adoc(line no:664) 
so that command executes…
URL: https://github.com/apache/lucene-solr/pull/1221#issuecomment-579353263
 
 
   Updated line no:664 in solr-tutorial.adoc. Added escape characters for the ^ 
and | symbols in the Windows environment.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Updated] (SOLR-14193) Update tutorial.adoc(line no:664) so that command executes in windows enviroment

2020-01-28 Thread balaji sundaram (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-14193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

balaji sundaram updated SOLR-14193:
---
Attachment: solr-tutorial.adoc
Status: Open  (was: Open)

> Update tutorial.adoc(line no:664) so that command executes in windows 
> enviroment
> 
>
> Key: SOLR-14193
> URL: https://issues.apache.org/jira/browse/SOLR-14193
> Project: Solr
>  Issue Type: Bug
>  Components: documentation
>Affects Versions: 8.4
>Reporter: balaji sundaram
>Priority: Minor
> Attachments: solr-tutorial.adoc
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
>  
> {{When executing the following command in Windows 10 "java -jar -Dc=films 
> -Dparams=f.genre.split=true&f.directed_by.split=true&f.genre.separator=|&f.directed_by.separator=| 
> -Dauto example\exampledocs\post.jar example\films\*.csv", it throws the error 
> "& was unexpected at this time."}}
> Fix: the command should escape the "&" and "|" symbols.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Resolved] (LUCENE-9186) remove linefiledocs usage from basetokenstreamtestcase

2020-01-28 Thread Robert Muir (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-9186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir resolved LUCENE-9186.
-
Fix Version/s: master (9.0)
   Resolution: Fixed

I opened LUCENE-9191 as a followup for other tests using LineFileDocs in a 
similar way. But fixing the analyzers tests was an easy win.

> remove linefiledocs usage from basetokenstreamtestcase
> --
>
> Key: LUCENE-9186
> URL: https://issues.apache.org/jira/browse/LUCENE-9186
> Project: Lucene - Core
>  Issue Type: Task
>  Components: general/test
>Reporter: Robert Muir
>Priority: Major
> Fix For: master (9.0)
>
> Attachments: LUCENE-9186.patch
>
>
> LineFileDocs is slow, even to open. That's because it (very slowly) "skips" 
> to a pseudorandom position into a 5MB gzip stream when you open it.
> There was a time when we didn't have a nice string generator for tests 
> (TestUtil.randomAnalysisString), but now we do. And when it was introduced it 
> found interesting new things that linefiledocs never found.
> This speeds up all the analyzer tests.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-9186) remove linefiledocs usage from basetokenstreamtestcase

2020-01-28 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17025289#comment-17025289
 ] 

ASF subversion and git services commented on LUCENE-9186:
-

Commit 3bcc97c8eb70f4a3a309d4cdab290363b524b0a2 in lucene-solr's branch 
refs/heads/master from Robert Muir
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=3bcc97c ]

LUCENE-9186: remove linefiledocs usage from BaseTokenStreamTestCase


> remove linefiledocs usage from basetokenstreamtestcase
> --
>
> Key: LUCENE-9186
> URL: https://issues.apache.org/jira/browse/LUCENE-9186
> Project: Lucene - Core
>  Issue Type: Task
>  Components: general/test
>Reporter: Robert Muir
>Priority: Major
> Attachments: LUCENE-9186.patch
>
>
> LineFileDocs is slow, even to open. That's because it (very slowly) "skips" 
> to a pseudorandom position into a 5MB gzip stream when you open it.
> There was a time when we didn't have a nice string generator for tests 
> (TestUtil.randomAnalysisString), but now we do. And when it was introduced it 
> found interesting new things that linefiledocs never found.
> This speeds up all the analyzer tests.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-13289) Support for BlockMax WAND

2020-01-28 Thread Gregg Donovan (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17025286#comment-17025286
 ] 

Gregg Donovan commented on SOLR-13289:
--

{quote}This feature currently doesn't work in case of faceting(this is 
expected), grouping.{quote}

Will WAND cause faceting to break entirely? Or will the counts for facets just 
be inexact?

{quote}as same minExactHits is shared across shard. so, actual minExactHits is 
shardCount*minExactHits{quote}
Perhaps it would be worth having an additional parameter for a 
perShardExactHits? E.g. if we're requesting the top 1000 hits across 64 shards, 
we'd likely be fine with WAND getting the top, say, 150 per shard.


> Support for BlockMax WAND
> -
>
> Key: SOLR-13289
> URL: https://issues.apache.org/jira/browse/SOLR-13289
> Project: Solr
>  Issue Type: New Feature
>Reporter: Ishan Chattopadhyaya
>Assignee: Ishan Chattopadhyaya
>Priority: Major
> Attachments: SOLR-13289.patch, SOLR-13289.patch
>
>
> LUCENE-8135 introduced BlockMax WAND as a major speed improvement. Need to 
> expose this via Solr. When enabled, the numFound returned will not be exact.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] balaji-s opened a new pull request #1221: SOLR-14193 Update tutorial.adoc(line no:664) so that command executes…

2020-01-28 Thread GitBox
balaji-s opened a new pull request #1221: SOLR-14193 Update tutorial.adoc(line 
no:664) so that command executes…
URL: https://github.com/apache/lucene-solr/pull/1221
 
 
   … in windows enviroment
   
   
   
   
   # Description
   
   Please provide a short description of the changes you're making with this 
pull request.
   
   # Solution
   
   Please provide a short description of the approach taken to implement your 
solution.
   
   # Tests
   
   Please describe the tests you've developed or run to confirm this patch 
implements the feature or solves the problem.
   
   # Checklist
   
   Please review the following and check all that apply:
   
   - [ ] I have reviewed the guidelines for [How to 
Contribute](https://wiki.apache.org/solr/HowToContribute) and my code conforms 
to the standards described there to the best of my ability.
   - [ ] I have created a Jira issue and added the issue ID to my pull request 
title.
   - [ ] I have given Solr maintainers 
[access](https://help.github.com/en/articles/allowing-changes-to-a-pull-request-branch-created-from-a-fork)
 to contribute to my PR branch. (optional but recommended)
   - [ ] I have developed this patch against the `master` branch.
   - [ ] I have run `ant precommit` and the appropriate test suite.
   - [ ] I have added tests for my changes.
   - [ ] I have added documentation for the [Ref 
Guide](https://github.com/apache/lucene-solr/tree/master/solr/solr-ref-guide) 
(for Solr changes only).
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Created] (LUCENE-9191) Fix linefiledocs compression or replace in tests

2020-01-28 Thread Robert Muir (Jira)
Robert Muir created LUCENE-9191:
---

 Summary: Fix linefiledocs compression or replace in tests
 Key: LUCENE-9191
 URL: https://issues.apache.org/jira/browse/LUCENE-9191
 Project: Lucene - Core
  Issue Type: Task
Reporter: Robert Muir


LineFileDocs(random) is very slow, even to open. It does a very slow "random 
skip" through a gzip compressed file.

For the analyzers tests, in LUCENE-9186 I simply removed its usage, since 
TestUtil.randomAnalysisString is superior, and fast. But we should address 
other tests using it, since LineFileDocs(random) is slow!

I think it is also the case that every lucene test has probably tested every 
LineFileDocs line many times now, whereas randomAnalysisString will invent new 
ones.

Alternatively, we could "fix" LineFileDocs(random), e.g. special compression 
options (in blocks)... deflate supports such stuff. But it would make it even 
hairier than it is now.
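
For comparison, a usage sketch of the fast alternative already adopted in 
LUCENE-9186 (assuming the randomAnalysisString(Random, int, boolean) signature 
from the Lucene test framework):

{code:java}
import java.util.Random;

import org.apache.lucene.util.TestUtil;

final class RandomStringsDemo {
  public static void main(String[] args) {
    Random random = new Random();
    // A fresh analysis-stressing string per call: no file to open, no gzip
    // stream to skip through, and new inputs on every run.
    String text = TestUtil.randomAnalysisString(random, 200, false /* simple */);
    System.out.println(text);
  }
}
{code}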



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Resolved] (LUCENE-9187) remove too-expensive assert from LZ4 HighCompressionHashTable

2020-01-28 Thread Robert Muir (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-9187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir resolved LUCENE-9187.
-
Fix Version/s: master (9.0)
   Resolution: Fixed

I opened LUCENE-9190 as a followup for the dedicated test idea so we don't lose 
it.

> remove too-expensive assert from LZ4 HighCompressionHashTable
> -
>
> Key: LUCENE-9187
> URL: https://issues.apache.org/jira/browse/LUCENE-9187
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Robert Muir
>Priority: Major
> Fix For: master (9.0)
>
> Attachments: LUCENE-9187.patch
>
>
> This is the slowest method in the lucene tests. See LUCENE-9185 for what I 
> mean.
> If you look at it, its checking 64k values every time the assert is called.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-9187) remove too-expensive assert from LZ4 HighCompressionHashTable

2020-01-28 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17025276#comment-17025276
 ] 

ASF subversion and git services commented on LUCENE-9187:
-

Commit 4350efa932a4c6aaad1943857c935bafce98fe56 in lucene-solr's branch 
refs/heads/master from Robert Muir
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=4350efa ]

LUCENE-9187: remove too-expensive assert from LZ4 HighCompressionHashTable


> remove too-expensive assert from LZ4 HighCompressionHashTable
> -
>
> Key: LUCENE-9187
> URL: https://issues.apache.org/jira/browse/LUCENE-9187
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Robert Muir
>Priority: Major
> Attachments: LUCENE-9187.patch
>
>
> This is the slowest method in the lucene tests. See LUCENE-9185 for what I 
> mean.
> If you look at it, its checking 64k values every time the assert is called.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Created] (LUCENE-9190) add dedicated test to assert internals of LZ4 hashtable

2020-01-28 Thread Robert Muir (Jira)
Robert Muir created LUCENE-9190:
---

 Summary: add dedicated test to assert internals of LZ4 hashtable
 Key: LUCENE-9190
 URL: https://issues.apache.org/jira/browse/LUCENE-9190
 Project: Lucene - Core
  Issue Type: Task
Reporter: Robert Muir


This assert was called all the time by all tests, causing a bottleneck. I 
disabled it in LUCENE-9187, but it would be nice to add a subclass or 
package-private method or something to still test it (without taking up tons of 
cpu).
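
A generic sketch of that pattern (class and method names are illustrative, not 
the actual LZ4 internals): the expensive invariant moves out of an assert on 
the hot path and into a package-private method that one dedicated test calls.

{code:java}
final class HashTableSketch {
  private final int[] table = new int[1 << 16];

  void reset() {
    java.util.Arrays.fill(table, 0);
  }

  // Too slow for an assert on every compression call; a single dedicated
  // unit test invokes this once instead.
  boolean isFullyReset() {
    for (int v : table) {
      if (v != 0) {
        return false;
      }
    }
    return true;
  }
}
{code}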



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-9189) TestIndexWriterDelete.testDeletesOnDiskFull can run for minutes

2020-01-28 Thread Robert Muir (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17025270#comment-17025270
 ] 

Robert Muir commented on LUCENE-9189:
-

I'm guessing there is something such as a copyBytes that goes one byte at a 
time, or something similar, causing it to be truly pathological.

> TestIndexWriterDelete.testDeletesOnDiskFull can run for minutes
> ---
>
> Key: LUCENE-9189
> URL: https://issues.apache.org/jira/browse/LUCENE-9189
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Robert Muir
>Priority: Major
>
> I thought it was just the testUpdatesOnDiskFull, but looks like this one 
> needs to be nightly too.
> Should look more into the test, but I know something causes it to make such 
> an insane amount of files, that sorting them becomes a bottleneck.
> I guess also related is that it would be great if MockDirectoryWrapper's disk 
> full check didn't trigger a sort of the files (via listAll): it does this 
> check on like every i/o, would be nice for it to be less absurd. Maybe 
> instead the test could check for disk full on not every i/o but some random 
> sample of them?
> Temporarily lets make it nightly...
> {noformat}
> PROFILE SUMMARY from 182501 samples
>   tests.profile.count=10
>   tests.profile.stacksize=1
>   tests.profile.linenumbers=false
> PERCENT   SAMPLES STACK
> 15.89%28995   java.lang.StringLatin1#compareTo()
> 6.61% 12069   java.util.TimSort#mergeHi()
> 5.96% 10878   java.util.TimSort#binarySort()
> 3.41% 6231java.util.concurrent.ConcurrentHashMap#tabAt()
> 2.98% 5433java.util.Comparators$NaturalOrderComparator#compare()
> 2.12% 3876org.apache.lucene.store.DataOutput#copyBytes()
> 2.03% 3712java.lang.String#compareTo()
> 1.84% 3350java.util.concurrent.ConcurrentHashMap#get()
> 1.83% 3337java.util.TimSort#mergeLo()
> 1.67% 3047java.util.ArrayList#add()
> {noformat}
> All the file sorting is called from stacks like this, so its literally 
> happening every writeByte() and so on
> {noformat}
> 0.73% 1329java.util.TimSort#binarySort()
> at java.util.TimSort#sort()
> at java.util.Arrays#sort()
> at java.util.ArrayList#sort()
> at java.util.stream.SortedOps$RefSortingSink#end()
> at java.util.stream.AbstractPipeline#copyInto()
> at java.util.stream.AbstractPipeline#wrapAndCopyInto()
> at java.util.stream.AbstractPipeline#evaluate()
> at 
> java.util.stream.AbstractPipeline#evaluateToArrayNode()
> at java.util.stream.ReferencePipeline#toArray()
> at 
> org.apache.lucene.store.ByteBuffersDirectory#listAll()
> at 
> org.apache.lucene.store.MockDirectoryWrapper#sizeInBytes()
> at 
> org.apache.lucene.store.MockIndexOutputWrapper#checkDiskFull()
> at 
> org.apache.lucene.store.MockIndexOutputWrapper#writeBytes()
> at 
> org.apache.lucene.store.MockIndexOutputWrapper#writeByte()
> at org.apache.lucene.store.DataOutput#writeInt()
> at org.apache.lucene.codecs.CodecUtil#writeFooter()
> at 
> org.apache.lucene.codecs.lucene50.Lucene50LiveDocsFormat#writeLiveDocs()
> at 
> org.apache.lucene.codecs.asserting.AssertingLiveDocsFormat#writeLiveDocs()
> at 
> org.apache.lucene.index.PendingDeletes#writeLiveDocs()
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Resolved] (LUCENE-9185) add "tests.profile" to gradle build to aid fixing slow tests

2020-01-28 Thread Robert Muir (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-9185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir resolved LUCENE-9185.
-
Fix Version/s: master (9.0)
   Resolution: Fixed

> add "tests.profile" to gradle build to aid fixing slow tests
> 
>
> Key: LUCENE-9185
> URL: https://issues.apache.org/jira/browse/LUCENE-9185
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Robert Muir
>Priority: Major
> Fix For: master (9.0)
>
> Attachments: LUCENE-9185.patch, LUCENE-9185.patch, LUCENE-9185.patch
>
>
> It is kind of a hassle to profile slow tests to fix the bottlenecks
> The idea here is to make it dead easy to profile (just) the tests, capturing 
> samples at a very low granularity, reducing noise as much as possible (e.g. 
> not profiling entire gradle build or anything) and print a simple report for 
> quick iterating.
> Here's a prototype of what I hacked together:
> All of lucene core: {{./gradlew -p lucene/core test -Dtests.profile=true}}
> {noformat}
> ...
> PROFILE SUMMARY from 122464 samples
>   tests.profile.count=10
>   tests.profile.stacksize=1
>   tests.profile.linenumbers=false
> PERCENT SAMPLES STACK
> 2.59%   3170
> org.apache.lucene.util.compress.LZ4$HighCompressionHashTable#assertReset()
> 2.26%   2762java.util.Arrays#fill()
> 1.59%   1953com.carrotsearch.randomizedtesting.RandomizedContext#context()
> 1.24%   1523java.util.Random#nextInt()
> 1.19%   1456java.lang.StringUTF16#compress()
> 1.08%   1319java.lang.StringLatin1#inflate()
> 1.00%   1228java.lang.Integer#getChars()
> 0.99%   1214java.util.Arrays#compareUnsigned()
> 0.96%   1179java.util.zip.Inflater#inflateBytesBytes()
> 0.91%   1114java.util.concurrent.atomic.AtomicLong#compareAndSet()
> BUILD SUCCESSFUL in 3m 59s
> {noformat}
> If you look at this LZ4 assertReset method, you can see its indeed way too 
> expensive, checking 64K items every time.
> To dig deeper into potential problems you can pass additional parameters (all 
> of them used here for demonstration):
> {{./gradlew -p solr/core test --tests TestLRUStatsCache -Dtests.profile=true 
> -Dtests.profile.count=8 -Dtests.profile.stacksize=20 
> -Dtests.profile.linenumbers=true}}
> This clearly finds SOLR-14223 (expensive RSA key generation in CryptoKeys) ...
> {noformat}
> ...
> PROFILE SUMMARY from 21355 samples
>   tests.profile.count=8
>   tests.profile.stacksize=20
>   tests.profile.linenumbers=true
> PERCENT SAMPLES STACK
> 26.30%  5617sun.nio.ch.EPoll#wait():(Native code)
>   at sun.nio.ch.EPollSelectorImpl#doSelect():120
>   at sun.nio.ch.SelectorImpl#lockAndDoSelect():124
>   at sun.nio.ch.SelectorImpl#select():141
>   at 
> org.eclipse.jetty.io.ManagedSelector$SelectorProducer#select():472
>   at 
> org.eclipse.jetty.io.ManagedSelector$SelectorProducer#produce():409
>   at 
> org.eclipse.jetty.util.thread.strategy.EatWhatYouKill#produceTask():360
>   at 
> org.eclipse.jetty.util.thread.strategy.EatWhatYouKill#doProduce():184
>   at 
> org.eclipse.jetty.util.thread.strategy.EatWhatYouKill#tryProduce():171
>   at 
> org.eclipse.jetty.util.thread.strategy.EatWhatYouKill#produce():135
>   at 
> org.eclipse.jetty.io.ManagedSelector$$Lambda$235.1914126144#run():(Interpreted
>  code)
>   at 
> org.eclipse.jetty.util.thread.QueuedThreadPool#runJob():806
>   at 
> org.eclipse.jetty.util.thread.QueuedThreadPool$Runner#run():938
>   at java.lang.Thread#run():830
> 16.19%  3458sun.nio.ch.EPoll#wait():(Native code)
>   at sun.nio.ch.EPollSelectorImpl#doSelect():120
>   at sun.nio.ch.SelectorImpl#lockAndDoSelect():124
>   at sun.nio.ch.SelectorImpl#select():141
>   at 
> org.eclipse.jetty.io.ManagedSelector$SelectorProducer#select():472
>   at 
> org.eclipse.jetty.io.ManagedSelector$SelectorProducer#produce():409
>   at 
> org.eclipse.jetty.util.thread.strategy.EatWhatYouKill#produceTask():360
>   at 
> org.eclipse.jetty.util.thread.strategy.EatWhatYouKill#doProduce():184
>   at 
> org.eclipse.jetty.util.thread.strategy.EatWhatYouKill#tryProduce():171
>   at 
> org.eclipse.jetty.util.thread.strategy.EatWhatYouKill#produce():135
>   at 
> org.eclipse.jetty.io.ManagedSelector$$Lambda$235.1914126144#run():(Interpreted
>  code)
>   at 

[jira] [Commented] (LUCENE-9185) add "tests.profile" to gradle build to aid fixing slow tests

2020-01-28 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17025263#comment-17025263
 ] 

ASF subversion and git services commented on LUCENE-9185:
-

Commit e504798a44e5f1577d87ef3a43d9d1e3a859d68a in lucene-solr's branch 
refs/heads/master from Robert Muir
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=e504798 ]

LUCENE-9185: add "tests.profile" to gradle build to aid fixing slow tests

Run test(s) with -Ptests.profile=true to print a histogram at the end of
the build.


> add "tests.profile" to gradle build to aid fixing slow tests
> 
>
> Key: LUCENE-9185
> URL: https://issues.apache.org/jira/browse/LUCENE-9185
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Robert Muir
>Priority: Major
> Attachments: LUCENE-9185.patch, LUCENE-9185.patch, LUCENE-9185.patch
>
>
> It is kind of a hassle to profile slow tests to fix the bottlenecks
> The idea here is to make it dead easy to profile (just) the tests, capturing 
> samples at a very low granularity, reducing noise as much as possible (e.g. 
> not profiling entire gradle build or anything) and print a simple report for 
> quick iterating.
> Here's a prototype of what I hacked together:
> All of lucene core: {{./gradlew -p lucene/core test -Dtests.profile=true}}
> {noformat}
> ...
> PROFILE SUMMARY from 122464 samples
>   tests.profile.count=10
>   tests.profile.stacksize=1
>   tests.profile.linenumbers=false
> PERCENT SAMPLES STACK
> 2.59%   3170
> org.apache.lucene.util.compress.LZ4$HighCompressionHashTable#assertReset()
> 2.26%   2762java.util.Arrays#fill()
> 1.59%   1953com.carrotsearch.randomizedtesting.RandomizedContext#context()
> 1.24%   1523java.util.Random#nextInt()
> 1.19%   1456java.lang.StringUTF16#compress()
> 1.08%   1319java.lang.StringLatin1#inflate()
> 1.00%   1228java.lang.Integer#getChars()
> 0.99%   1214java.util.Arrays#compareUnsigned()
> 0.96%   1179java.util.zip.Inflater#inflateBytesBytes()
> 0.91%   1114java.util.concurrent.atomic.AtomicLong#compareAndSet()
> BUILD SUCCESSFUL in 3m 59s
> {noformat}
> If you look at this LZ4 assertReset method, you can see its indeed way too 
> expensive, checking 64K items every time.
> To dig deeper into potential problems you can pass additional parameters (all 
> of them used here for demonstration):
> {{./gradlew -p solr/core test --tests TestLRUStatsCache -Dtests.profile=true 
> -Dtests.profile.count=8 -Dtests.profile.stacksize=20 
> -Dtests.profile.linenumbers=true}}
> This clearly finds SOLR-14223 (expensive RSA key generation in CryptoKeys) ...
> {noformat}
> ...
> PROFILE SUMMARY from 21355 samples
>   tests.profile.count=8
>   tests.profile.stacksize=20
>   tests.profile.linenumbers=true
> PERCENT SAMPLES STACK
> 26.30%  5617sun.nio.ch.EPoll#wait():(Native code)
>   at sun.nio.ch.EPollSelectorImpl#doSelect():120
>   at sun.nio.ch.SelectorImpl#lockAndDoSelect():124
>   at sun.nio.ch.SelectorImpl#select():141
>   at 
> org.eclipse.jetty.io.ManagedSelector$SelectorProducer#select():472
>   at 
> org.eclipse.jetty.io.ManagedSelector$SelectorProducer#produce():409
>   at 
> org.eclipse.jetty.util.thread.strategy.EatWhatYouKill#produceTask():360
>   at 
> org.eclipse.jetty.util.thread.strategy.EatWhatYouKill#doProduce():184
>   at 
> org.eclipse.jetty.util.thread.strategy.EatWhatYouKill#tryProduce():171
>   at 
> org.eclipse.jetty.util.thread.strategy.EatWhatYouKill#produce():135
>   at 
> org.eclipse.jetty.io.ManagedSelector$$Lambda$235.1914126144#run():(Interpreted
>  code)
>   at 
> org.eclipse.jetty.util.thread.QueuedThreadPool#runJob():806
>   at 
> org.eclipse.jetty.util.thread.QueuedThreadPool$Runner#run():938
>   at java.lang.Thread#run():830
> 16.19%  3458sun.nio.ch.EPoll#wait():(Native code)
>   at sun.nio.ch.EPollSelectorImpl#doSelect():120
>   at sun.nio.ch.SelectorImpl#lockAndDoSelect():124
>   at sun.nio.ch.SelectorImpl#select():141
>   at 
> org.eclipse.jetty.io.ManagedSelector$SelectorProducer#select():472
>   at 
> org.eclipse.jetty.io.ManagedSelector$SelectorProducer#produce():409
>   at 
> org.eclipse.jetty.util.thread.strategy.EatWhatYouKill#produceTask():360
>   at 
> org.eclipse.jetty.util.thread.strategy.EatWhatYouKill#doProduce():184
>   at 
> 

[jira] [Commented] (SOLR-13756) ivy cannot download org.restlet.ext.servlet jar

2020-01-28 Thread Zsolt Gyulavari (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17025259#comment-17025259
 ] 

Zsolt Gyulavari commented on SOLR-13756:


I've rebased and addressed the gradle build as well; however, I think the 
cloudera repo is no longer needed, except perhaps for backup purposes. Otherwise 
we can remove it altogether. What do you think?

> ivy cannot download org.restlet.ext.servlet jar
> ---
>
> Key: SOLR-13756
> URL: https://issues.apache.org/jira/browse/SOLR-13756
> Project: Solr
>  Issue Type: Bug
>Reporter: Chongchen Chen
>Priority: Major
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> I checked out the project and ran `ant idea`, which tries to download jars. But 
> https://repo1.maven.org/maven2/org/restlet/jee/org.restlet.ext.servlet/2.3.0/org.restlet.ext.servlet-2.3.0.jar
> now returns 404.  
> [ivy:retrieve] public: tried
> [ivy:retrieve]  
> https://repo1.maven.org/maven2/org/restlet/jee/org.restlet.ext.servlet/2.3.0/org.restlet.ext.servlet-2.3.0.jar
> [ivy:retrieve]::
> [ivy:retrieve]::  FAILED DOWNLOADS::
> [ivy:retrieve]:: ^ see resolution messages for details  ^ ::
> [ivy:retrieve]::
> [ivy:retrieve]:: 
> org.restlet.jee#org.restlet;2.3.0!org.restlet.jar
> [ivy:retrieve]:: 
> org.restlet.jee#org.restlet.ext.servlet;2.3.0!org.restlet.ext.servlet.jar
> [ivy:retrieve]::



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Created] (LUCENE-9189) TestIndexWriterDelete.testDeletesOnDiskFull can run for minutes

2020-01-28 Thread Robert Muir (Jira)
Robert Muir created LUCENE-9189:
---

 Summary: TestIndexWriterDelete.testDeletesOnDiskFull can run for 
minutes
 Key: LUCENE-9189
 URL: https://issues.apache.org/jira/browse/LUCENE-9189
 Project: Lucene - Core
  Issue Type: Task
Reporter: Robert Muir


I thought it was just the testUpdatesOnDiskFull, but looks like this one needs 
to be nightly too.

Should look more into the test, but I know something causes it to make such an 
insane amount of files, that sorting them becomes a bottleneck.

I guess also related is that it would be great if MockDirectoryWrapper's disk 
full check didn't trigger a sort of the files (via listAll): it does this check 
on like every i/o, would be nice for it to be less absurd. Maybe instead the 
test could check for disk full on not every i/o but some random sample of them?

Temporarily lets make it nightly...

{noformat}
PROFILE SUMMARY from 182501 samples
  tests.profile.count=10
  tests.profile.stacksize=1
  tests.profile.linenumbers=false
PERCENT SAMPLES STACK
15.89%  28995   java.lang.StringLatin1#compareTo()
6.61%   12069   java.util.TimSort#mergeHi()
5.96%   10878   java.util.TimSort#binarySort()
3.41%   6231java.util.concurrent.ConcurrentHashMap#tabAt()
2.98%   5433java.util.Comparators$NaturalOrderComparator#compare()
2.12%   3876org.apache.lucene.store.DataOutput#copyBytes()
2.03%   3712java.lang.String#compareTo()
1.84%   3350java.util.concurrent.ConcurrentHashMap#get()
1.83%   3337java.util.TimSort#mergeLo()
1.67%   3047java.util.ArrayList#add()
{noformat}

All the file sorting is called from stacks like this, so its literally 
happening every writeByte() and so on

{noformat}
0.73%   1329java.util.TimSort#binarySort()
  at java.util.TimSort#sort()
  at java.util.Arrays#sort()
  at java.util.ArrayList#sort()
  at java.util.stream.SortedOps$RefSortingSink#end()
  at java.util.stream.AbstractPipeline#copyInto()
  at java.util.stream.AbstractPipeline#wrapAndCopyInto()
  at java.util.stream.AbstractPipeline#evaluate()
  at 
java.util.stream.AbstractPipeline#evaluateToArrayNode()
  at java.util.stream.ReferencePipeline#toArray()
  at 
org.apache.lucene.store.ByteBuffersDirectory#listAll()
  at 
org.apache.lucene.store.MockDirectoryWrapper#sizeInBytes()
  at 
org.apache.lucene.store.MockIndexOutputWrapper#checkDiskFull()
  at 
org.apache.lucene.store.MockIndexOutputWrapper#writeBytes()
  at 
org.apache.lucene.store.MockIndexOutputWrapper#writeByte()
  at org.apache.lucene.store.DataOutput#writeInt()
  at org.apache.lucene.codecs.CodecUtil#writeFooter()
  at 
org.apache.lucene.codecs.lucene50.Lucene50LiveDocsFormat#writeLiveDocs()
  at 
org.apache.lucene.codecs.asserting.AssertingLiveDocsFormat#writeLiveDocs()
  at 
org.apache.lucene.index.PendingDeletes#writeLiveDocs()
{noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-9004) Approximate nearest vector search

2020-01-28 Thread Michael Sokolov (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17025251#comment-17025251
 ] 

Michael Sokolov commented on LUCENE-9004:
-

> Is there any possible to merge LUCENE-9136 with this issue?

This is already gigantic - what would be the benefit of merging?

> Approximate nearest vector search
> -
>
> Key: LUCENE-9004
> URL: https://issues.apache.org/jira/browse/LUCENE-9004
> Project: Lucene - Core
>  Issue Type: New Feature
>Reporter: Michael Sokolov
>Priority: Major
> Attachments: hnsw_layered_graph.png
>
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> "Semantic" search based on machine-learned vector "embeddings" representing 
> terms, queries and documents is becoming a must-have feature for a modern 
> search engine. SOLR-12890 is exploring various approaches to this, including 
> providing vector-based scoring functions. This is a spinoff issue from that.
> The idea here is to explore approximate nearest-neighbor search. Researchers 
> have found an approach based on navigating a graph that partially encodes the 
> nearest neighbor relation at multiple scales can provide accuracy > 95% (as 
> compared to exact nearest neighbor calculations) at a reasonable cost. This 
> issue will explore implementing HNSW (hierarchical navigable small-world) 
> graphs for the purpose of approximate nearest vector search (often referred 
> to as KNN or k-nearest-neighbor search).
> At a high level the way this algorithm works is this. First assume you have a 
> graph that has a partial encoding of the nearest neighbor relation, with some 
> short and some long-distance links. If this graph is built in the right way 
> (has the hierarchical navigable small world property), then you can 
> efficiently traverse it to find nearest neighbors (approximately) in log N 
> time where N is the number of nodes in the graph. I believe this idea was 
> pioneered in  [1]. The great insight in that paper is that if you use the 
> graph search algorithm to find the K nearest neighbors of a new document 
> while indexing, and then link those neighbors (undirectedly, ie both ways) to 
> the new document, then the graph that emerges will have the desired 
> properties.
> The implementation I propose for Lucene is as follows. We need two new data 
> structures to encode the vectors and the graph. We can encode vectors using a 
> light wrapper around {{BinaryDocValues}} (we also want to encode the vector 
> dimension and have efficient conversion from bytes to floats). For the graph 
> we can use {{SortedNumericDocValues}} where the values we encode are the 
> docids of the related documents. Encoding the interdocument relations using 
> docids directly will make it relatively fast to traverse the graph since we 
> won't need to look up through an id-field indirection. This choice limits us 
> to building a graph-per-segment since it would be impractical to maintain a 
> global graph for the whole index in the face of segment merges. However 
> graph-per-segment is very natural at search time - we can traverse each 
> segment's graph independently and merge results as we do today for term-based 
> search.
> At index time, however, merging graphs is somewhat challenging. While 
> indexing we build a graph incrementally, performing searches to construct 
> links among neighbors. When merging segments we must construct a new graph 
> containing elements of all the merged segments. Ideally we would somehow 
> preserve the work done when building the initial graphs, but at least as a 
> start I'd propose we construct a new graph from scratch when merging. The 
> process is going to be limited, at least initially, to graphs that can fit 
> in RAM since we require random access to the entire graph while constructing 
> it: In order to add links bidirectionally we must continually update existing 
> documents.
> I think we want to express this API to users as a single joint 
> {{KnnGraphField}} abstraction that ties the vectors and the graph together 
> as one field type. Mostly it just looks like a vector-valued 
> field, but has this graph attached to it.
> I'll push a branch with my POC and would love to hear comments. It has many 
> nocommits, basic design is not really set, there is no Query implementation 
> and no integration with IndexSearcher, but it does work by some measure using 
> a standalone test class. I've tested with uniform random vectors and on my 
> laptop indexed 10K documents in around 10 seconds and searched them at 95% 
> recall (compared with exact nearest-neighbor baseline) at around 250 QPS. I 
> haven't made any attempt to use multithreaded search for this, but it is 
> amenable to per-segment concurrency.
> [1] 
> 
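A minimal sketch of the vector-encoding idea described above (illustrative only, not the actual branch; the fixed-width float layout and the class name are assumptions):

{noformat}
// Hypothetical sketch: store a float[] vector per document as a
// BinaryDocValuesField and decode it back. The dimension falls out of the
// byte length (bytes / 4), matching the "light wrapper around
// BinaryDocValues" idea in the description above.
import java.nio.ByteBuffer;
import org.apache.lucene.document.BinaryDocValuesField;
import org.apache.lucene.util.BytesRef;

final class VectorCodecSketch {
  static BinaryDocValuesField encode(String field, float[] vector) {
    ByteBuffer buffer = ByteBuffer.allocate(vector.length * Float.BYTES);
    for (float v : vector) {
      buffer.putFloat(v); // fixed-width encoding: dimension = length / 4
    }
    return new BinaryDocValuesField(field, new BytesRef(buffer.array()));
  }

  static float[] decode(BytesRef bytes) {
    ByteBuffer buffer = ByteBuffer.wrap(bytes.bytes, bytes.offset, bytes.length);
    float[] vector = new float[bytes.length / Float.BYTES];
    for (int i = 0; i < vector.length; i++) {
      vector[i] = buffer.getFloat();
    }
    return vector;
  }
}
{noformat}

A real format would additionally need the dimension and byte order pinned down in the codec, per the efficient bytes-to-floats conversion mentioned in the description.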

[GitHub] [lucene-solr] alessandrobenedetti commented on issue #357: [SOLR-12238] Synonym Queries boost by payload

2020-01-28 Thread GitBox
alessandrobenedetti commented on issue #357: [SOLR-12238] Synonym Queries boost 
by payload 
URL: https://github.com/apache/lucene-solr/pull/357#issuecomment-579326000
 
 
   No strong opinion on that; it was actually the first time I used the 
AttributeSource, so I am happy to switch to TokenStream if it is more 
memory-efficient.
   The change shouldn't be too heavy.
   I will just wait for confirmation, and when we are all aligned I'll proceed 
with the implementation.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Created] (LUCENE-9188) Add jacoco code coverage support to gradle build

2020-01-28 Thread Robert Muir (Jira)
Robert Muir created LUCENE-9188:
---

 Summary: Add jacoco code coverage support to gradle build
 Key: LUCENE-9188
 URL: https://issues.apache.org/jira/browse/LUCENE-9188
 Project: Lucene - Core
  Issue Type: Task
  Components: general/build
Reporter: Robert Muir


Seems to be missing. I looked into it a little; all the documented ways of 
using the jacoco plugin seem to involve black magic if you are using the "java" 
plugin, but we are using "javaLibrary", so I wasn't able to hold it right.

This one should work very well: it has low overhead and should work fine 
running tests in parallel (since it supports merging of coverage data files; 
that's how it works in the ant build).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-9185) add "tests.profile" to gradle build to aid fixing slow tests

2020-01-28 Thread Robert Muir (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17025238#comment-17025238
 ] 

Robert Muir commented on LUCENE-9185:
-

I agree, it would be nice. For now I added basic usage to the help and the 
reporter itself prints out the values of any fancy options.

just trying to make it as easy as possible to keep the slow tests at bay...

> add "tests.profile" to gradle build to aid fixing slow tests
> 
>
> Key: LUCENE-9185
> URL: https://issues.apache.org/jira/browse/LUCENE-9185
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Robert Muir
>Priority: Major
> Attachments: LUCENE-9185.patch, LUCENE-9185.patch, LUCENE-9185.patch
>
>
> It is kind of a hassle to profile slow tests to fix the bottlenecks.
> The idea here is to make it dead easy to profile (just) the tests, capturing 
> samples at a very low granularity, reducing noise as much as possible (e.g. 
> not profiling the entire gradle build or anything), and printing a simple 
> report for quick iterating.
> Here's a prototype of what I hacked together:
> All of lucene core: {{./gradlew -p lucene/core test -Dtests.profile=true}}
> {noformat}
> ...
> PROFILE SUMMARY from 122464 samples
>   tests.profile.count=10
>   tests.profile.stacksize=1
>   tests.profile.linenumbers=false
> PERCENT SAMPLES STACK
> 2.59%   3170    org.apache.lucene.util.compress.LZ4$HighCompressionHashTable#assertReset()
> 2.26%   2762    java.util.Arrays#fill()
> 1.59%   1953    com.carrotsearch.randomizedtesting.RandomizedContext#context()
> 1.24%   1523    java.util.Random#nextInt()
> 1.19%   1456    java.lang.StringUTF16#compress()
> 1.08%   1319    java.lang.StringLatin1#inflate()
> 1.00%   1228    java.lang.Integer#getChars()
> 0.99%   1214    java.util.Arrays#compareUnsigned()
> 0.96%   1179    java.util.zip.Inflater#inflateBytesBytes()
> 0.91%   1114    java.util.concurrent.atomic.AtomicLong#compareAndSet()
> BUILD SUCCESSFUL in 3m 59s
> {noformat}
> If you look at this LZ4 assertReset method, you can see it's indeed way too 
> expensive, checking 64K items every time.
> To dig deeper into potential problems you can pass additional parameters (all 
> of them used here for demonstration):
> {{./gradlew -p solr/core test --tests TestLRUStatsCache -Dtests.profile=true 
> -Dtests.profile.count=8 -Dtests.profile.stacksize=20 
> -Dtests.profile.linenumbers=true}}
> This clearly finds SOLR-14223 (expensive RSA key generation in CryptoKeys) ...
> {noformat}
> ...
> PROFILE SUMMARY from 21355 samples
>   tests.profile.count=8
>   tests.profile.stacksize=20
>   tests.profile.linenumbers=true
> PERCENT SAMPLES STACK
> 26.30%  5617    sun.nio.ch.EPoll#wait():(Native code)
>   at sun.nio.ch.EPollSelectorImpl#doSelect():120
>   at sun.nio.ch.SelectorImpl#lockAndDoSelect():124
>   at sun.nio.ch.SelectorImpl#select():141
>   at org.eclipse.jetty.io.ManagedSelector$SelectorProducer#select():472
>   at org.eclipse.jetty.io.ManagedSelector$SelectorProducer#produce():409
>   at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill#produceTask():360
>   at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill#doProduce():184
>   at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill#tryProduce():171
>   at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill#produce():135
>   at org.eclipse.jetty.io.ManagedSelector$$Lambda$235.1914126144#run():(Interpreted code)
>   at org.eclipse.jetty.util.thread.QueuedThreadPool#runJob():806
>   at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner#run():938
>   at java.lang.Thread#run():830
> 16.19%  3458    sun.nio.ch.EPoll#wait():(Native code)
>   at sun.nio.ch.EPollSelectorImpl#doSelect():120
>   at sun.nio.ch.SelectorImpl#lockAndDoSelect():124
>   at sun.nio.ch.SelectorImpl#select():141
>   at org.eclipse.jetty.io.ManagedSelector$SelectorProducer#select():472
>   at org.eclipse.jetty.io.ManagedSelector$SelectorProducer#produce():409
>   at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill#produceTask():360
>   at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill#doProduce():184
>   at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill#tryProduce():171
>   at 
> 

[jira] [Commented] (SOLR-14225) Upgrade jaegertracing

2020-01-28 Thread Dawid Weiss (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17025234#comment-17025234
 ] 

Dawid Weiss commented on SOLR-14225:


It'd be great if the patch included corresponding gradle updates, Jan (if you 
have problems with something, let me know).

> Upgrade jaegertracing
> -
>
> Key: SOLR-14225
> URL: https://issues.apache.org/jira/browse/SOLR-14225
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Jan Høydahl
>Priority: Major
>
> Upgrade jaegertracing from 0.35.5 to 1.1.0. This will also give us a newer 
> libthrift, which is more stable and secure.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-9185) add "tests.profile" to gradle build to aid fixing slow tests

2020-01-28 Thread Dawid Weiss (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17025233#comment-17025233
 ] 

Dawid Weiss commented on LUCENE-9185:
-

It looks great, thanks Robert. 

I'd love to have some kind of task to display all these build options at some 
point. Currently this is done just for randomization options (try gradlew 
testOpts -p lucene/core) but I'm sure it could be pulled from other parts of 
the build and displayed consistently. For now it can stay as it is.


[jira] [Commented] (LUCENE-9185) add "tests.profile" to gradle build to aid fixing slow tests

2020-01-28 Thread Robert Muir (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17025220#comment-17025220
 ] 

Robert Muir commented on LUCENE-9185:
-

I added a bunch of crazy abstractions and constants to the java code so that 
the gradle code looks a little prettier. I realize you really hate how I did it 
before, but I want to keep the simple main method, and I don't think gradle's 
bad decisions should get in the way of that.


[jira] [Updated] (LUCENE-9185) add "tests.profile" to gradle build to aid fixing slow tests

2020-01-28 Thread Robert Muir (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-9185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-9185:

Attachment: LUCENE-9185.patch


[jira] [Created] (SOLR-14225) Upgrade jaegertracing

2020-01-28 Thread Jira
Jan Høydahl created SOLR-14225:
--

 Summary: Upgrade jaegertracing
 Key: SOLR-14225
 URL: https://issues.apache.org/jira/browse/SOLR-14225
 Project: Solr
  Issue Type: Improvement
  Security Level: Public (Default Security Level. Issues are Public)
Reporter: Jan Høydahl


Upgrade jaegertracing from 0.35.5 to 1.1.0. This will also give us a newer 
libthrift, which is more stable and secure.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-9185) add "tests.profile" to gradle build to aid fixing slow tests

2020-01-28 Thread Robert Muir (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17025193#comment-17025193
 ] 

Robert Muir commented on LUCENE-9185:
-

[~dweiss] I tried to fold in your feedback, can you take another look?


[jira] [Updated] (LUCENE-9185) add "tests.profile" to gradle build to aid fixing slow tests

2020-01-28 Thread Robert Muir (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-9185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-9185:

Attachment: LUCENE-9185.patch


[GitHub] [lucene-solr] ErickErickson commented on a change in pull request #1218: LUCENE-9134: Javacc skeleton

2020-01-28 Thread GitBox
ErickErickson commented on a change in pull request #1218: LUCENE-9134: Javacc 
skeleton
URL: https://github.com/apache/lucene-solr/pull/1218#discussion_r371828729
 
 

 ##
 File path: gradle/generation/javacc.gradle
 ##
 @@ -0,0 +1,102 @@
+// Add a top-level pseudo-task to which we will attach individual regenerate tasks.
+import static groovy.io.FileType.*
+
+configure(rootProject) {
+  configurations {
+javacc
+  }
+
+  dependencies {
+javacc "net.java.dev.javacc:javacc:${scriptDepVersions['javacc']}"
+  }
+
+  task javacc() {
+description "Regenerate sources for corresponding javacc grammar files."
+group "generation"
+
+dependsOn ":lucene:queryparser:javaccParserClassic"
+dependsOn ":lucene:queryparser:javaccParserSurround"
+dependsOn ":lucene:queryparser:javaccParserFlexible"
+  }
+}
+
+// We always regenerate, no need to declare outputs.
+class JavaCCTask extends DefaultTask {
+  @Input
+  File javaccFile
+
+  JavaCCTask() {
+dependsOn(project.rootProject.configurations.javacc)
+  }
+
+  @TaskAction
+  def generate() {
+if (!javaccFile || !javaccFile.exists()) {
+  throw new RuntimeException("JavaCC input file does not exist: ${javaccFile}")
+}
+// Remove old files so we can regenerate them
+def parentDir = javaccFile.parentFile
+parentDir.eachFileMatch FILES, ~/.*\.java/, { file ->
+  if (file.text.contains("Generated By:JavaCC")) {
+file.delete()
+  }
+}
+logger.lifecycle("Regenerating JavaCC:\n  from: ${javaccFile}\nto: ${parentDir}")
+
+project.javaexec {
+  classpath {
+project.rootProject.configurations.javacc
+  }
+  main = "org.javacc.parser.Main"
+  args += "-OUTPUT_DIRECTORY=${parentDir}"
+  args += [javaccFile]
+}
+  }
+}
+
+
+configure(project(":lucene:queryparser")) {
+  task javaccParserClassic(type: JavaCCTask) {
+description "Regenerate classic query parser from java CC.java"
+group "generation"
+
+javaccFile = file('src/java/org/apache/lucene/queryparser/classic/QueryParser.jj')
+def parent = javaccFile.parentFile.toString() // I'll need this later.
+
+doLast {
+  // There'll be a lot of cleanup in here to get precommits and builds to pass, but as long as we don't
 
 Review comment:
   That _should_ be the end product already; that's one of the reasons I spent 
so much time on the ant version, and why all those files were changed when I 
committed. At least I _think_ I got them all. At least that's what I remember 
doing... That said, I'll try not to go off in the weeds.
   
   Now that I've got the structure right, I'll see if I can get this to happen. 
Shouldn't actually be that much.
   
   Oh, and ignore PR 1219; I had a bad title for this PR and it didn't link. 
When I changed the title of this one it took a while to show up and I got 
impatient. 1219 and 1218 are identical.
   
   Finally, many thanks for your coaching (well, ok, outright fixing things)!


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-9185) add "tests.profile" to gradle build to aid fixing slow tests

2020-01-28 Thread Robert Muir (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17025142#comment-17025142
 ] 

Robert Muir commented on LUCENE-9185:
-

{quote}
All of them make sense but you're killing me...

It's also worth noting that the "slowest tests" list depends on the level of 
parallelism and what other tests ran in the background alongside (one memory- or 
I/O-heavy test slows down everything running with it).
{quote}

I know, but it's all the rudimentary "profiling" we have at the moment. Trying 
to change that!


[jira] [Commented] (LUCENE-9185) add "tests.profile" to gradle build to aid fixing slow tests

2020-01-28 Thread Robert Muir (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17025129#comment-17025129
 ] 

Robert Muir commented on LUCENE-9185:
-

{quote}
As for the patch: it works because you invoke a static method on that class and 
it inherits gradle's environment. A nicer way to do it would be to pass 
arguments like tests.profile.count explicitly to ProfileResults (via args, 
setters or otherwise) preparing them on gradle side.
{quote}

I know, I wanted to keep a simple main() method, to make it easy to improve or 
fix bugs, iterate quickly, e.g.
{noformat}
$ java buildSrc/src/main/java/org/apache/lucene/gradle/ProfileResults.java 
./lucene/analysis/opennlp/build/tmp/tests-cwd/hotspot-pid-133619-id-1-2020_01_28_06_11_03.jfr
 
./lucene/analysis/opennlp/build/tmp/tests-cwd/hotspot-pid-133548-id-1-2020_01_28_06_11_02.jfr
PROFILE SUMMARY from 306 samples
  tests.profile.count=10
  tests.profile.stacksize=1
  tests.profile.linenumbers=false
PERCENT SAMPLES STACK
13.73%  42  java.util.zip.Inflater#inflateBytesBytes()
2.94%   9   java.lang.StringLatin1#indexOf()
2.61%   8   java.io.UnixFileSystem#getBooleanAttributes0()
2.29%   7   java.util.DualPivotQuicksort#sort()
1.96%   6   java.lang.StringLatin1#charAt()
1.96%   6   java.io.UnixFileSystem#normalize()
1.63%   5   java.lang.StringLatin1#inflate()
1.31%   4   java.lang.String#startsWith()
1.31%   4   java.lang.ClassLoader#defineClass1()
1.31%   4   java.lang.StringLatin1#compareTo()
{noformat}
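For context, a condensed, hypothetical version of what such a report generator can do with nothing but the standard jdk.jfr consumer API (the class name, guards, and output format here are illustrative, not the actual ProfileResults):

{noformat}
// Sketch only: read JFR files named on the command line and count the
// hottest top-of-stack frames across all execution samples.
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;
import jdk.jfr.consumer.RecordedEvent;
import jdk.jfr.consumer.RecordedFrame;
import jdk.jfr.consumer.RecordingFile;

public class JfrTopFramesSketch {
  public static void main(String[] args) throws Exception {
    Map<String, Integer> counts = new HashMap<>();
    for (String file : args) {
      for (RecordedEvent event : RecordingFile.readAllEvents(Path.of(file))) {
        if (!"jdk.ExecutionSample".equals(event.getEventType().getName())
            || event.getStackTrace() == null
            || event.getStackTrace().getFrames().isEmpty()) {
          continue; // only execution samples with a usable stack
        }
        RecordedFrame top = event.getStackTrace().getFrames().get(0);
        String key = top.getMethod().getType().getName() + "#" + top.getMethod().getName();
        counts.merge(key, 1, Integer::sum); // one count per sampled stack
      }
    }
    counts.entrySet().stream()
        .sorted(Map.Entry.<String, Integer>comparingByValue().reversed())
        .limit(10) // akin to tests.profile.count=10
        .forEach(e -> System.out.println(e.getValue() + "\t" + e.getKey()));
  }
}
{noformat}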

[jira] [Commented] (LUCENE-9185) add "tests.profile" to gradle build to aid fixing slow tests

2020-01-28 Thread Dawid Weiss (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17025128#comment-17025128
 ] 

Dawid Weiss commented on LUCENE-9185:
-

bq. But if I had to ask for a wishlist of improvements

All of them make sense but you're killing me... ;)

It's also worth noting that the "slowest tests" list depends on the level of 
parallelism and what other tests ran in the background alongside (one memory- or 
I/O-heavy test slows down everything running with it).



[GitHub] [lucene-solr] dsmiley commented on issue #357: [SOLR-12238] Synonym Queries boost by payload

2020-01-28 Thread GitBox
dsmiley commented on issue #357: [SOLR-12238] Synonym Queries boost by payload 
URL: https://github.com/apache/lucene-solr/pull/357#issuecomment-579257834
 
 
   I noticed the use of {{AttributeSource[]}} (array of AttributeSource), done 
at the behest of @romseygeek. That seems fishy... shouldn't it be a 
TokenStream, which is a more memory-efficient iterator over changing 
AttributeSource state? I see, for example, the _existing_ 
{{createSpanQuery(TokenStream in, String field)}}, but the PR adds 
{{newSpanQuery(String field, AttributeSource[] attributes)}} and makes the 
former call the latter. Why bother? Why not retain createSpanQuery, and if Solr 
wants to override it to do payload boosting, it can?
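   As a minimal sketch of the iterator pattern being suggested (hypothetical consumer code, not from the PR; it assumes payloads are float-encoded, e.g. by a payload-aware token filter):

{noformat}
// One shared AttributeSource, mutated per token: this is what makes a
// TokenStream cheaper than materializing an AttributeSource[] up front.
import java.io.IOException;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.payloads.PayloadHelper;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.analysis.tokenattributes.PayloadAttribute;
import org.apache.lucene.util.BytesRef;

final class TokenStreamBoostSketch {
  static void consume(TokenStream in) throws IOException {
    CharTermAttribute termAtt = in.addAttribute(CharTermAttribute.class);
    PayloadAttribute payloadAtt = in.addAttribute(PayloadAttribute.class);
    in.reset(); // mandatory before the first incrementToken()
    while (in.incrementToken()) {
      BytesRef payload = payloadAtt.getPayload();
      float boost = payload == null
          ? 1f
          : PayloadHelper.decodeFloat(payload.bytes, payload.offset); // needs 4 payload bytes
      System.out.println(termAtt + " boost=" + boost);
    }
    in.end();
    in.close();
  }
}
{noformat}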


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-9171) Synonyms Boost by Payload

2020-01-28 Thread David Smiley (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17025124#comment-17025124
 ] 

David Smiley commented on LUCENE-9171:
--

I have my doubts about AttributeSource, and arrays of such; I'll put my comments 
in the PR in a minute.

BTW I agree with Alan about keeping things simple in the base class. In Lucene 
we fight complexity all the time.

> Synonyms Boost by Payload
> -
>
> Key: LUCENE-9171
> URL: https://issues.apache.org/jira/browse/LUCENE-9171
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: core/queryparser
>Reporter: Alessandro Benedetti
>Priority: Major
>
> I have been working on the additional capability of boosting queries by term 
> payloads, through a parameter that enables it in the Lucene query builder.
> This has been done targeting the Synonyms Query.
> It is parametric, so it is meant to make no difference unless the feature is 
> enabled.
> Solr has its bits to comply through its SynonymsQueryStyles
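
A rough sketch of the intent (names illustrative, not the actual patch; it assumes each synonym carries a float payload decoded with PayloadHelper):

{noformat}
// Build a disjunction of the synonym terms where each clause's boost is
// decoded from that term's payload bytes; terms without payloads keep 1.0.
import org.apache.lucene.analysis.payloads.PayloadHelper;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.BoostQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.util.BytesRef;

final class PayloadBoostSketch {
  static Query boostedSynonyms(String field, String[] terms, BytesRef[] payloads) {
    BooleanQuery.Builder builder = new BooleanQuery.Builder();
    for (int i = 0; i < terms.length; i++) {
      Query clause = new TermQuery(new Term(field, terms[i]));
      if (payloads[i] != null) {
        float boost = PayloadHelper.decodeFloat(payloads[i].bytes, payloads[i].offset);
        clause = new BoostQuery(clause, boost); // boost must be >= 0
      }
      builder.add(clause, BooleanClause.Occur.SHOULD);
    }
    return builder.build();
  }
}
{noformat}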



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-13996) Refactor HttpShardHandler#prepDistributed() into smaller pieces

2020-01-28 Thread Shalin Shekhar Mangar (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17025122#comment-17025122
 ] 

Shalin Shekhar Mangar commented on SOLR-13996:
--

I've been working on a refactoring of this method, and it's my fault that I 
didn't see this issue and the PR earlier. However, my goals are a bit more 
ambitious. This first PR https://github.com/apache/lucene-solr/pull/1220 is 
just a re-organization of the code, but I'll be expanding it further by adding 
tests for each individual case and then moving on to improving performance. 
Currently this class is quite inefficient: it parses, re-parses, and creates 
strings out of shard URLs even for SolrCloud cases. The goal is to eventually 
have a cloud-focused class that is extremely efficient and avoids unnecessary 
copies of shards/replicas completely. This will require changes in other places 
as well, e.g. the host checker can be made to operate in a streaming mode. I 
haven't quite decided how the replica list transformer should be changed.

I hope you don't mind, Ishan, but I'll assign this issue to myself and take it 
forward. Reviews welcome!

> Refactor HttpShardHandler#prepDistributed() into smaller pieces
> ---
>
> Key: SOLR-13996
> URL: https://issues.apache.org/jira/browse/SOLR-13996
> Project: Solr
>  Issue Type: Improvement
>Reporter: Ishan Chattopadhyaya
>Assignee: Shalin Shekhar Mangar
>Priority: Major
> Attachments: SOLR-13996.patch, SOLR-13996.patch
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Currently, it is very hard to understand all the various things being done in 
> HttpShardHandler. I'm starting with refactoring the prepDistributed() method 
> to make it easier to grasp. It has standalone and cloud code intertwined, and 
> I wanted to cleanly separate them out. Later, we can even have two separate 
> methods (one for standalone and one for cloud).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-9185) add "tests.profile" to gradle build to aid fixing slow tests

2020-01-28 Thread Dawid Weiss (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17025120#comment-17025120
 ] 

Dawid Weiss commented on LUCENE-9185:
-

It looks odd to your eyes because it's a legacy from ant. These are different 
things: system properties are global, project properties are local (or looked 
up via scopes). You can set project properties with finer granularity than 
globally. 

As for the patch: it works because you invoke a static method on that class and 
it inherits gradle's environment. A nicer way to do it would be to pass 
arguments like tests.profile.count explicitly to ProfileResults (via args, 
setters or otherwise) preparing them on gradle side.

The propertyOrDefault utility is actually a hack in the build so that people 
used to global system properties can still pass them to the gradle build... 
maybe it was a mistake that I added it in the first place; I don't know.



[jira] [Assigned] (SOLR-13996) Refactor HttpShardHandler#prepDistributed() into smaller pieces

2020-01-28 Thread Shalin Shekhar Mangar (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-13996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shalin Shekhar Mangar reassigned SOLR-13996:


Assignee: Shalin Shekhar Mangar

> Refactor HttpShardHandler#prepDistributed() into smaller pieces
> ---
>
> Key: SOLR-13996
> URL: https://issues.apache.org/jira/browse/SOLR-13996
> Project: Solr
>  Issue Type: Improvement
>Reporter: Ishan Chattopadhyaya
>Assignee: Shalin Shekhar Mangar
>Priority: Major
> Attachments: SOLR-13996.patch, SOLR-13996.patch
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Currently, it is very hard to understand all the various things being done in 
> HttpShardHandler. I'm starting with refactoring the prepDistributed() method 
> to make it easier to grasp. It has standalone and cloud code intertwined, and 
> I wanted to cleanly separate them out. Later, we can even have two separate 
> methods (one each for standalone and cloud).






[GitHub] [lucene-solr] shalinmangar opened a new pull request #1220: SOLR-13996: Refactor HttpShardHandler.prepDistributed method

2020-01-28 Thread GitBox
shalinmangar opened a new pull request #1220: SOLR-13996: Refactor 
HttpShardHandler.prepDistributed method
URL: https://github.com/apache/lucene-solr/pull/1220
 
 
   # Description
   
   This PR refactors the huge HttpShardHandler.prepDistributed method into 
smaller pieces.
   
   # Solution
   
   It separates the logic for cloud and non-cloud modes into separate classes 
which are implementations of a new (experimental/internal) interface named 
ReplicaSource.
   
   # Tests
   
   This PR passes all current tests and I'll add more tests before merging.





[jira] [Commented] (LUCENE-9185) add "tests.profile" to gradle build to aid fixing slow tests

2020-01-28 Thread Robert Muir (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17025116#comment-17025116
 ] 

Robert Muir commented on LUCENE-9185:
-

{quote}
Maybe we should make the latter optional as well?
{quote}

Do you mean the whole {{slowest-tests-at-end}}? Given how insanely slow some of 
these tests are, I feel it should be mandatory to see it? :)

But if I had to ask for a wishlist of improvements to {{slowest-tests-at-end}}, 
they would be:
* option (or change behavior) to print them always, even if a test sporadically 
failed.
* property to increase the count (e.g. from 10 to 100) and threshold (e.g. from 
500ms to 250ms, yes we may get there soon in lucene!); see the sketch after 
this list.
* some way to show or count beforeclass/afterclass time. I'm not sure it is 
currently considered, only time for each method (I assume that includes 
setup+teardown).
* some way to see the slowest suites, too. Even if we fix all the tests to run 
in 100ms, it can cause bottlenecks if a suite has a TON of tests, because of 
bad gradle load balancing.
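
For the count/threshold item, a minimal sketch of what that could look like 
(the property names {{tests.slowest.count}} and {{tests.slowest.threshold}} are 
invented here; nothing like them exists in the build yet):
{code}
// Hypothetical sketch: make the report size and threshold overridable
// instead of hard-coding 10 entries and 500ms.
def slowestCount = propertyOrDefault("tests.slowest.count", 10) as int
def slowestThreshold = propertyOrDefault("tests.slowest.threshold", 500) as long

allprojects {
  tasks.withType(Test) {
    afterTest { desc, result ->
      def millis = result.endTime - result.startTime
      if (millis >= slowestThreshold) {
        // remember (desc, millis); at build end, sort and print the
        // top slowestCount entries
      }
    }
  }
}
{code}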

> add "tests.profile" to gradle build to aid fixing slow tests
> 
>
> Key: LUCENE-9185
> URL: https://issues.apache.org/jira/browse/LUCENE-9185
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Robert Muir
>Priority: Major
> Attachments: LUCENE-9185.patch
>
>
> It is kind of a hassle to profile slow tests to fix the bottlenecks.
> The idea here is to make it dead easy to profile (just) the tests, capturing 
> samples at a very low granularity, reducing noise as much as possible (e.g. 
> not profiling entire gradle build or anything) and print a simple report for 
> quick iterating.
> Here's a prototype of what I hacked together:
> All of lucene core: {{./gradlew -p lucene/core test -Dtests.profile=true}}
> {noformat}
> ...
> PROFILE SUMMARY from 122464 samples
>   tests.profile.count=10
>   tests.profile.stacksize=1
>   tests.profile.linenumbers=false
> PERCENT SAMPLES STACK
> 2.59%   3170
> org.apache.lucene.util.compress.LZ4$HighCompressionHashTable#assertReset()
> 2.26%   2762java.util.Arrays#fill()
> 1.59%   1953com.carrotsearch.randomizedtesting.RandomizedContext#context()
> 1.24%   1523java.util.Random#nextInt()
> 1.19%   1456java.lang.StringUTF16#compress()
> 1.08%   1319java.lang.StringLatin1#inflate()
> 1.00%   1228java.lang.Integer#getChars()
> 0.99%   1214java.util.Arrays#compareUnsigned()
> 0.96%   1179java.util.zip.Inflater#inflateBytesBytes()
> 0.91%   1114java.util.concurrent.atomic.AtomicLong#compareAndSet()
> BUILD SUCCESSFUL in 3m 59s
> {noformat}
> If you look at this LZ4 assertReset method, you can see it's indeed way too 
> expensive, checking 64K items every time.
> To dig deeper into potential problems you can pass additional parameters (all 
> of them used here for demonstration):
> {{./gradlew -p solr/core test --tests TestLRUStatsCache -Dtests.profile=true 
> -Dtests.profile.count=8 -Dtests.profile.stacksize=20 
> -Dtests.profile.linenumbers=true}}
> This clearly finds SOLR-14223 (expensive RSA key generation in CryptoKeys) ...
> {noformat}
> ...
> PROFILE SUMMARY from 21355 samples
>   tests.profile.count=8
>   tests.profile.stacksize=20
>   tests.profile.linenumbers=true
> PERCENT SAMPLES STACK
> 26.30%  5617sun.nio.ch.EPoll#wait():(Native code)
>   at sun.nio.ch.EPollSelectorImpl#doSelect():120
>   at sun.nio.ch.SelectorImpl#lockAndDoSelect():124
>   at sun.nio.ch.SelectorImpl#select():141
>   at 
> org.eclipse.jetty.io.ManagedSelector$SelectorProducer#select():472
>   at 
> org.eclipse.jetty.io.ManagedSelector$SelectorProducer#produce():409
>   at 
> org.eclipse.jetty.util.thread.strategy.EatWhatYouKill#produceTask():360
>   at 
> org.eclipse.jetty.util.thread.strategy.EatWhatYouKill#doProduce():184
>   at 
> org.eclipse.jetty.util.thread.strategy.EatWhatYouKill#tryProduce():171
>   at 
> org.eclipse.jetty.util.thread.strategy.EatWhatYouKill#produce():135
>   at 
> org.eclipse.jetty.io.ManagedSelector$$Lambda$235.1914126144#run():(Interpreted
>  code)
>   at 
> org.eclipse.jetty.util.thread.QueuedThreadPool#runJob():806
>   at 
> org.eclipse.jetty.util.thread.QueuedThreadPool$Runner#run():938
>   at java.lang.Thread#run():830
> 16.19%  3458sun.nio.ch.EPoll#wait():(Native code)
>   at sun.nio.ch.EPollSelectorImpl#doSelect():120
>   at sun.nio.ch.SelectorImpl#lockAndDoSelect():124
>   at 

[jira] [Commented] (LUCENE-9185) add "tests.profile" to gradle build to aid fixing slow tests

2020-01-28 Thread Robert Muir (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17025112#comment-17025112
 ] 

Robert Muir commented on LUCENE-9185:
-

{quote}
It will work but -Ptests.profile=true would be more gradle-sque (it sets a 
project property as opposed to system property).
{quote}

The tool uses actual system properties for the more advanced options (e.g. 
{{-Dtests.profile.count=20}}). It seems a little evil to mix -P's and -D's when 
documenting this? I'll be honest, the difference is super confusing.
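
For reference, the difference in a nutshell (an illustrative snippet, not from 
the patch):
{code}
// -Dtests.profile=true sets a JVM-wide system property:
def fromSystem = System.getProperty("tests.profile")
// -Ptests.profile=true sets a property on the gradle Project object,
// resolved through project scopes rather than the JVM:
def fromProject = project.findProperty("tests.profile")
{code}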

> add "tests.profile" to gradle build to aid fixing slow tests
> 
>
> Key: LUCENE-9185
> URL: https://issues.apache.org/jira/browse/LUCENE-9185
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Robert Muir
>Priority: Major
> Attachments: LUCENE-9185.patch
>
>
> It is kind of a hassle to profile slow tests to fix the bottlenecks.
> The idea here is to make it dead easy to profile (just) the tests, capturing 
> samples at a very low granularity, reducing noise as much as possible (e.g. 
> not profiling entire gradle build or anything) and print a simple report for 
> quick iterating.
> Here's a prototype of what I hacked together:
> All of lucene core: {{./gradlew -p lucene/core test -Dtests.profile=true}}
> {noformat}
> ...
> PROFILE SUMMARY from 122464 samples
>   tests.profile.count=10
>   tests.profile.stacksize=1
>   tests.profile.linenumbers=false
> PERCENT SAMPLES STACK
> 2.59%   3170
> org.apache.lucene.util.compress.LZ4$HighCompressionHashTable#assertReset()
> 2.26%   2762java.util.Arrays#fill()
> 1.59%   1953com.carrotsearch.randomizedtesting.RandomizedContext#context()
> 1.24%   1523java.util.Random#nextInt()
> 1.19%   1456java.lang.StringUTF16#compress()
> 1.08%   1319java.lang.StringLatin1#inflate()
> 1.00%   1228java.lang.Integer#getChars()
> 0.99%   1214java.util.Arrays#compareUnsigned()
> 0.96%   1179java.util.zip.Inflater#inflateBytesBytes()
> 0.91%   1114java.util.concurrent.atomic.AtomicLong#compareAndSet()
> BUILD SUCCESSFUL in 3m 59s
> {noformat}
> If you look at this LZ4 assertReset method, you can see it's indeed way too 
> expensive, checking 64K items every time.
> To dig deeper into potential problems you can pass additional parameters (all 
> of them used here for demonstration):
> {{./gradlew -p solr/core test --tests TestLRUStatsCache -Dtests.profile=true 
> -Dtests.profile.count=8 -Dtests.profile.stacksize=20 
> -Dtests.profile.linenumbers=true}}
> This clearly finds SOLR-14223 (expensive RSA key generation in CryptoKeys) ...
> {noformat}
> ...
> PROFILE SUMMARY from 21355 samples
>   tests.profile.count=8
>   tests.profile.stacksize=20
>   tests.profile.linenumbers=true
> PERCENT SAMPLES STACK
> 26.30%  5617sun.nio.ch.EPoll#wait():(Native code)
>   at sun.nio.ch.EPollSelectorImpl#doSelect():120
>   at sun.nio.ch.SelectorImpl#lockAndDoSelect():124
>   at sun.nio.ch.SelectorImpl#select():141
>   at 
> org.eclipse.jetty.io.ManagedSelector$SelectorProducer#select():472
>   at 
> org.eclipse.jetty.io.ManagedSelector$SelectorProducer#produce():409
>   at 
> org.eclipse.jetty.util.thread.strategy.EatWhatYouKill#produceTask():360
>   at 
> org.eclipse.jetty.util.thread.strategy.EatWhatYouKill#doProduce():184
>   at 
> org.eclipse.jetty.util.thread.strategy.EatWhatYouKill#tryProduce():171
>   at 
> org.eclipse.jetty.util.thread.strategy.EatWhatYouKill#produce():135
>   at 
> org.eclipse.jetty.io.ManagedSelector$$Lambda$235.1914126144#run():(Interpreted
>  code)
>   at 
> org.eclipse.jetty.util.thread.QueuedThreadPool#runJob():806
>   at 
> org.eclipse.jetty.util.thread.QueuedThreadPool$Runner#run():938
>   at java.lang.Thread#run():830
> 16.19%  3458sun.nio.ch.EPoll#wait():(Native code)
>   at sun.nio.ch.EPollSelectorImpl#doSelect():120
>   at sun.nio.ch.SelectorImpl#lockAndDoSelect():124
>   at sun.nio.ch.SelectorImpl#select():141
>   at 
> org.eclipse.jetty.io.ManagedSelector$SelectorProducer#select():472
>   at 
> org.eclipse.jetty.io.ManagedSelector$SelectorProducer#produce():409
>   at 
> org.eclipse.jetty.util.thread.strategy.EatWhatYouKill#produceTask():360
>   at 
> org.eclipse.jetty.util.thread.strategy.EatWhatYouKill#doProduce():184
>   at 
> 

[jira] [Commented] (LUCENE-9185) add "tests.profile" to gradle build to aid fixing slow tests

2020-01-28 Thread Dawid Weiss (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17025110#comment-17025110
 ] 

Dawid Weiss commented on LUCENE-9185:
-

Ok, fair enough. With profiling it's explicit; those slow-tests are always 
shown. Maybe we should make the latter optional as well?
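
Something like this could work for that (the property name is invented here, 
purely to illustrate):
{code}
// Hypothetical: let people opt out of the always-on slowest-tests report.
if (propertyOrDefault("tests.slowestReport", true)) {
  // ... register the existing slowest-tests-at-end logic here ...
}
{code}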

> add "tests.profile" to gradle build to aid fixing slow tests
> 
>
> Key: LUCENE-9185
> URL: https://issues.apache.org/jira/browse/LUCENE-9185
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Robert Muir
>Priority: Major
> Attachments: LUCENE-9185.patch
>
>
> It is kind of a hassle to profile slow tests to fix the bottlenecks.
> The idea here is to make it dead easy to profile (just) the tests, capturing 
> samples at a very low granularity, reducing noise as much as possible (e.g. 
> not profiling entire gradle build or anything) and print a simple report for 
> quick iterating.
> Here's a prototype of what I hacked together:
> All of lucene core: {{./gradlew -p lucene/core test -Dtests.profile=true}}
> {noformat}
> ...
> PROFILE SUMMARY from 122464 samples
>   tests.profile.count=10
>   tests.profile.stacksize=1
>   tests.profile.linenumbers=false
> PERCENT SAMPLES STACK
> 2.59%   3170
> org.apache.lucene.util.compress.LZ4$HighCompressionHashTable#assertReset()
> 2.26%   2762java.util.Arrays#fill()
> 1.59%   1953com.carrotsearch.randomizedtesting.RandomizedContext#context()
> 1.24%   1523java.util.Random#nextInt()
> 1.19%   1456java.lang.StringUTF16#compress()
> 1.08%   1319java.lang.StringLatin1#inflate()
> 1.00%   1228java.lang.Integer#getChars()
> 0.99%   1214java.util.Arrays#compareUnsigned()
> 0.96%   1179java.util.zip.Inflater#inflateBytesBytes()
> 0.91%   1114java.util.concurrent.atomic.AtomicLong#compareAndSet()
> BUILD SUCCESSFUL in 3m 59s
> {noformat}
> If you look at this LZ4 assertReset method, you can see it's indeed way too 
> expensive, checking 64K items every time.
> To dig deeper into potential problems you can pass additional parameters (all 
> of them used here for demonstration):
> {{./gradlew -p solr/core test --tests TestLRUStatsCache -Dtests.profile=true 
> -Dtests.profile.count=8 -Dtests.profile.stacksize=20 
> -Dtests.profile.linenumbers=true}}
> This clearly finds SOLR-14223 (expensive RSA key generation in CryptoKeys) ...
> {noformat}
> ...
> PROFILE SUMMARY from 21355 samples
>   tests.profile.count=8
>   tests.profile.stacksize=20
>   tests.profile.linenumbers=true
> PERCENT SAMPLES STACK
> 26.30%  5617sun.nio.ch.EPoll#wait():(Native code)
>   at sun.nio.ch.EPollSelectorImpl#doSelect():120
>   at sun.nio.ch.SelectorImpl#lockAndDoSelect():124
>   at sun.nio.ch.SelectorImpl#select():141
>   at 
> org.eclipse.jetty.io.ManagedSelector$SelectorProducer#select():472
>   at 
> org.eclipse.jetty.io.ManagedSelector$SelectorProducer#produce():409
>   at 
> org.eclipse.jetty.util.thread.strategy.EatWhatYouKill#produceTask():360
>   at 
> org.eclipse.jetty.util.thread.strategy.EatWhatYouKill#doProduce():184
>   at 
> org.eclipse.jetty.util.thread.strategy.EatWhatYouKill#tryProduce():171
>   at 
> org.eclipse.jetty.util.thread.strategy.EatWhatYouKill#produce():135
>   at 
> org.eclipse.jetty.io.ManagedSelector$$Lambda$235.1914126144#run():(Interpreted
>  code)
>   at 
> org.eclipse.jetty.util.thread.QueuedThreadPool#runJob():806
>   at 
> org.eclipse.jetty.util.thread.QueuedThreadPool$Runner#run():938
>   at java.lang.Thread#run():830
> 16.19%  3458sun.nio.ch.EPoll#wait():(Native code)
>   at sun.nio.ch.EPollSelectorImpl#doSelect():120
>   at sun.nio.ch.SelectorImpl#lockAndDoSelect():124
>   at sun.nio.ch.SelectorImpl#select():141
>   at 
> org.eclipse.jetty.io.ManagedSelector$SelectorProducer#select():472
>   at 
> org.eclipse.jetty.io.ManagedSelector$SelectorProducer#produce():409
>   at 
> org.eclipse.jetty.util.thread.strategy.EatWhatYouKill#produceTask():360
>   at 
> org.eclipse.jetty.util.thread.strategy.EatWhatYouKill#doProduce():184
>   at 
> org.eclipse.jetty.util.thread.strategy.EatWhatYouKill#tryProduce():171
>   at 
> org.eclipse.jetty.util.thread.strategy.EatWhatYouKill#produce():135
>   at 
> org.eclipse.jetty.io.ManagedSelector$$Lambda$235.1914126144#run():(Interpreted
>  

[jira] [Commented] (LUCENE-9185) add "tests.profile" to gradle build to aid fixing slow tests

2020-01-28 Thread Robert Muir (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17025108#comment-17025108
 ] 

Robert Muir commented on LUCENE-9185:
-

{quote}
Also: it'll always display the profile, even on a failed build. Look at 
slowest-tests-at-end.gradle - this one only displays the slowest tests if the 
build is successful.
{quote}

Honestly, when looking at slow solr tests, I remove that logic locally from 
{{slowest-tests-at-end.gradle}}. It takes me 80 minutes to run the solr tests, 
and 90% of the time some test fails and then I get no output from it at all. 
This is frustrating, because then I've wasted 80 minutes. I feel the same way 
here: it's about performance, you asked for profile output, and it found JFR 
files, so why not show it?
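
Concretely, that would mean something like this for the profile report, reusing 
the {{buildFinished}} hook from the patch but deliberately not gating on the 
build result ({{recordings}} and {{ProfileResults}} come from the patch; the 
rest is a sketch):
{code}
gradle.buildFinished { result ->
  // slowest-tests-at-end.gradle only reports when the build succeeded;
  // for profiling, skip that check: if JFR files were produced, show them.
  if (!recordings.isEmpty()) {
    def args = ["ProfileResults"]
    args += recordings.getFiles().collect { it.toString() }
    ProfileResults.main(args as String[])
  }
}
{code}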

> add "tests.profile" to gradle build to aid fixing slow tests
> 
>
> Key: LUCENE-9185
> URL: https://issues.apache.org/jira/browse/LUCENE-9185
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Robert Muir
>Priority: Major
> Attachments: LUCENE-9185.patch
>
>
> It is kind of a hassle to profile slow tests to fix the bottlenecks.
> The idea here is to make it dead easy to profile (just) the tests, capturing 
> samples at a very low granularity, reducing noise as much as possible (e.g. 
> not profiling entire gradle build or anything) and print a simple report for 
> quick iterating.
> Here's a prototype of what I hacked together:
> All of lucene core: {{./gradlew -p lucene/core test -Dtests.profile=true}}
> {noformat}
> ...
> PROFILE SUMMARY from 122464 samples
>   tests.profile.count=10
>   tests.profile.stacksize=1
>   tests.profile.linenumbers=false
> PERCENT SAMPLES STACK
> 2.59%   3170
> org.apache.lucene.util.compress.LZ4$HighCompressionHashTable#assertReset()
> 2.26%   2762java.util.Arrays#fill()
> 1.59%   1953com.carrotsearch.randomizedtesting.RandomizedContext#context()
> 1.24%   1523java.util.Random#nextInt()
> 1.19%   1456java.lang.StringUTF16#compress()
> 1.08%   1319java.lang.StringLatin1#inflate()
> 1.00%   1228java.lang.Integer#getChars()
> 0.99%   1214java.util.Arrays#compareUnsigned()
> 0.96%   1179java.util.zip.Inflater#inflateBytesBytes()
> 0.91%   1114java.util.concurrent.atomic.AtomicLong#compareAndSet()
> BUILD SUCCESSFUL in 3m 59s
> {noformat}
> If you look at this LZ4 assertReset method, you can see it's indeed way too 
> expensive, checking 64K items every time.
> To dig deeper into potential problems you can pass additional parameters (all 
> of them used here for demonstration):
> {{./gradlew -p solr/core test --tests TestLRUStatsCache -Dtests.profile=true 
> -Dtests.profile.count=8 -Dtests.profile.stacksize=20 
> -Dtests.profile.linenumbers=true}}
> This clearly finds SOLR-14223 (expensive RSA key generation in CryptoKeys) ...
> {noformat}
> ...
> PROFILE SUMMARY from 21355 samples
>   tests.profile.count=8
>   tests.profile.stacksize=20
>   tests.profile.linenumbers=true
> PERCENT SAMPLES STACK
> 26.30%  5617sun.nio.ch.EPoll#wait():(Native code)
>   at sun.nio.ch.EPollSelectorImpl#doSelect():120
>   at sun.nio.ch.SelectorImpl#lockAndDoSelect():124
>   at sun.nio.ch.SelectorImpl#select():141
>   at 
> org.eclipse.jetty.io.ManagedSelector$SelectorProducer#select():472
>   at 
> org.eclipse.jetty.io.ManagedSelector$SelectorProducer#produce():409
>   at 
> org.eclipse.jetty.util.thread.strategy.EatWhatYouKill#produceTask():360
>   at 
> org.eclipse.jetty.util.thread.strategy.EatWhatYouKill#doProduce():184
>   at 
> org.eclipse.jetty.util.thread.strategy.EatWhatYouKill#tryProduce():171
>   at 
> org.eclipse.jetty.util.thread.strategy.EatWhatYouKill#produce():135
>   at 
> org.eclipse.jetty.io.ManagedSelector$$Lambda$235.1914126144#run():(Interpreted
>  code)
>   at 
> org.eclipse.jetty.util.thread.QueuedThreadPool#runJob():806
>   at 
> org.eclipse.jetty.util.thread.QueuedThreadPool$Runner#run():938
>   at java.lang.Thread#run():830
> 16.19%  3458sun.nio.ch.EPoll#wait():(Native code)
>   at sun.nio.ch.EPollSelectorImpl#doSelect():120
>   at sun.nio.ch.SelectorImpl#lockAndDoSelect():124
>   at sun.nio.ch.SelectorImpl#select():141
>   at 
> org.eclipse.jetty.io.ManagedSelector$SelectorProducer#select():472
>   at 
> org.eclipse.jetty.io.ManagedSelector$SelectorProducer#produce():409
>   at 
> 

[GitHub] [lucene-solr] ErickErickson commented on a change in pull request #1218: LUCENE-9134: Javacc skeleton

2020-01-28 Thread GitBox
ErickErickson commented on a change in pull request #1218: LUCENE-9134: Javacc 
skeleton
URL: https://github.com/apache/lucene-solr/pull/1218#discussion_r371801937
 
 

 ##
 File path: gradle/generation/javacc.gradle
 ##
 @@ -0,0 +1,102 @@
+// Add a top-level pseudo-task to which we will attach individual regenerate tasks.
+import static groovy.io.FileType.*
+
+configure(rootProject) {
+  configurations {
+    javacc
+  }
+
+  dependencies {
+    javacc "net.java.dev.javacc:javacc:${scriptDepVersions['javacc']}"
+  }
+
+  task javacc() {
+    description "Regenerate sources for corresponding javacc grammar files."
+    group "generation"
+
+    dependsOn ":lucene:queryparser:javaccParserClassic"
+    dependsOn ":lucene:queryparser:javaccParserSurround"
+    dependsOn ":lucene:queryparser:javaccParserFlexible"
+  }
+}
+
+// We always regenerate, no need to declare outputs.
+class JavaCCTask extends DefaultTask {
+  @Input
+  File javaccFile
+
+  JavaCCTask() {
+    dependsOn(project.rootProject.configurations.javacc)
+  }
+
+  @TaskAction
+  def generate() {
+    if (!javaccFile || !javaccFile.exists()) {
+      throw new RuntimeException("JavaCC input file does not exist: ${javaccFile}")
+    }
+    // Remove old files so we can regenerate them
+    def parentDir = javaccFile.parentFile
+    parentDir.eachFileMatch FILES, ~/.*\.java/, { file ->
+      if (file.text.contains("Generated By:JavaCC")) {
+        file.delete()
+      }
+    }
+    logger.lifecycle("Regenerating JavaCC:\n  from: ${javaccFile}\nto: ${parentDir}")
+
+    project.javaexec {
+      classpath {
+        project.rootProject.configurations.javacc
+      }
+      main = "org.javacc.parser.Main"
+      args += "-OUTPUT_DIRECTORY=${parentDir}"
+      args += [javaccFile]
 
 Review comment:
   I'll change.





[GitHub] [lucene-solr] ErickErickson commented on a change in pull request #1218: LUCENE-9134: Javacc skeleton

2020-01-28 Thread GitBox
ErickErickson commented on a change in pull request #1218: LUCENE-9134: Javacc 
skeleton
URL: https://github.com/apache/lucene-solr/pull/1218#discussion_r371801798
 
 

 ##
 File path: gradle/generation/javacc.gradle
 ##
 @@ -0,0 +1,102 @@
+// Add a top-level pseudo-task to which we will attach individual regenerate tasks.
+import static groovy.io.FileType.*
+
+configure(rootProject) {
+  configurations {
+    javacc
+  }
+
+  dependencies {
+    javacc "net.java.dev.javacc:javacc:${scriptDepVersions['javacc']}"
+  }
+
+  task javacc() {
+    description "Regenerate sources for corresponding javacc grammar files."
+    group "generation"
+
+    dependsOn ":lucene:queryparser:javaccParserClassic"
+    dependsOn ":lucene:queryparser:javaccParserSurround"
+    dependsOn ":lucene:queryparser:javaccParserFlexible"
+  }
+}
+
+// We always regenerate, no need to declare outputs.
+class JavaCCTask extends DefaultTask {
+  @Input
+  File javaccFile
+
+  JavaCCTask() {
+    dependsOn(project.rootProject.configurations.javacc)
+  }
+
+  @TaskAction
+  def generate() {
+    if (!javaccFile || !javaccFile.exists()) {
+      throw new RuntimeException("JavaCC input file does not exist: ${javaccFile}")
+    }
+    // Remove old files so we can regenerate them
+    def parentDir = javaccFile.parentFile
+    parentDir.eachFileMatch FILES, ~/.*\.java/, { file ->
 
 Review comment:
   Actually, they aren't overwritten. If they're not deleted, you get messages 
during execution like: "Warning: TokenMgrError.java: File is obsolete. Please 
rename or delete this file so that a new one can be generated for you."





[GitHub] [lucene-solr] ErickErickson commented on a change in pull request #1218: LUCENE-9134: Javacc skeleton

2020-01-28 Thread GitBox
ErickErickson commented on a change in pull request #1218: LUCENE-9134: Javacc 
skeleton
URL: https://github.com/apache/lucene-solr/pull/1218#discussion_r371799808
 
 

 ##
 File path: gradle/generation/javacc.gradle
 ##
 @@ -0,0 +1,102 @@
+// Add a top-level pseudo-task to which we will attach individual regenerate tasks.
+import static groovy.io.FileType.*
+
+configure(rootProject) {
+  configurations {
+    javacc
+  }
+
+  dependencies {
+    javacc "net.java.dev.javacc:javacc:${scriptDepVersions['javacc']}"
+  }
+
+  task javacc() {
+    description "Regenerate sources for corresponding javacc grammar files."
+    group "generation"
+
+    dependsOn ":lucene:queryparser:javaccParserClassic"
+    dependsOn ":lucene:queryparser:javaccParserSurround"
+    dependsOn ":lucene:queryparser:javaccParserFlexible"
+  }
+}
+
+// We always regenerate, no need to declare outputs.
+class JavaCCTask extends DefaultTask {
+  @Input
+  File javaccFile
+
+  JavaCCTask() {
+    dependsOn(project.rootProject.configurations.javacc)
+  }
+
+  @TaskAction
+  def generate() {
+    if (!javaccFile || !javaccFile.exists()) {
+      throw new RuntimeException("JavaCC input file does not exist: ${javaccFile}")
+    }
+    // Remove old files so we can regenerate them
+    def parentDir = javaccFile.parentFile
+    parentDir.eachFileMatch FILES, ~/.*\.java/, { file ->
 
 Review comment:
   I copied it from some example and it worked...





[GitHub] [lucene-solr] ErickErickson commented on a change in pull request #1218: LUCENE-9134: Javacc skeleton

2020-01-28 Thread GitBox
ErickErickson commented on a change in pull request #1218: LUCENE-9134: Javacc 
skeleton
URL: https://github.com/apache/lucene-solr/pull/1218#discussion_r371799564
 
 

 ##
 File path: gradle/defaults-java.gradle
 ##
 @@ -25,13 +25,13 @@ allprojects {
   tasks.withType(JavaCompile) {
     options.encoding = "UTF-8"
     options.compilerArgs += [
-      "-Xlint",
 
 Review comment:
   OK, I'll check. I don't even know how they got changed, frankly; I'll revert.





[jira] [Resolved] (SOLR-14224) Not able to build solr 6.6.2 from source after January 15, 2020

2020-01-28 Thread Erick Erickson (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-14224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erick Erickson resolved SOLR-14224.
---
Resolution: Invalid

We stopped active support for Solr 6.x quite some time ago and will not be 
releasing any new versions. Arguing about whether it's a bug or not is 
pointless; please ask the question on the user's list, as Jan suggested, and do 
not reopen this JIRA.

> Not able to build solr 6.6.2 from source after January 15, 2020
> ---
>
> Key: SOLR-14224
> URL: https://issues.apache.org/jira/browse/SOLR-14224
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: 6.6.2
>Reporter: Guruprasad K K
>Priority: Major
>
> After Jan 15th, Maven allows only HTTPS connections to the repo, but Solr 
> 6.6.2 uses HTTP connections, so builds are failing.
> It looks like the latest version of Solr has the fix for this in 
> common_build.xml and other places, where it uses HTTPS connections to Maven.
>  
> Error log:
>  ivy-bootstrap1:
>  [mkdir] Created dir: /root/.ant/lib
>  [echo] installing ivy 2.3.0 to /root/.ant/lib
>  [get] Getting: http://repo1.maven.org/maven2/org/apache/ivy/ivy/2.3.0/ivy-2.3.0.jar
>  [get] To: /root/.ant/lib/ivy-2.3.0.jar
>  [get] Error opening connection java.io.IOException: Server returned HTTP 
>  response code: 501 for URL: 
>  http://repo1.maven.org/maven2/org/apache/ivy/ivy/2.3.0/ivy-2.3.0.jar
>  [get] Error opening connection java.io.IOException: Server returned HTTP 
>  response code: 501 for URL: 
>  http://repo1.maven.org/maven2/org/apache/ivy/ivy/2.3.0/ivy-2.3.0.jar
>  [get] Error opening connection java.io.IOException: Server returned HTTP 
>  response code: 501 for URL: 
>  http://repo1.maven.org/maven2/org/apache/ivy/ivy/2.3.0/ivy-2.3.0.jar
>  [get] Can't get http://repo1.maven.org/maven2/org/apache/ivy/ivy/2.3.0/ivy-2.3.0.jar 
>  to /root/.ant/lib/ivy-2.3.0.jar
>  
>  
>  
>  
> [NOTE]: It works on the latest version of Solr, where http is converted to https






[jira] [Commented] (LUCENE-9134) Port ant-regenerate tasks to Gradle build

2020-01-28 Thread Erick Erickson (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17025101#comment-17025101
 ] 

Erick Erickson commented on LUCENE-9134:


New PR with the skeleton of the javacc changes. It's just the structure of the 
Gradle changes; it won't be committable until after the post-generation cleanup 
is done.

> Port ant-regenerate tasks to Gradle build
> -
>
> Key: LUCENE-9134
> URL: https://issues.apache.org/jira/browse/LUCENE-9134
> Project: Lucene - Core
>  Issue Type: Sub-task
>Reporter: Erick Erickson
>Assignee: Erick Erickson
>Priority: Major
> Attachments: LUCENE-9134.patch, core_regen.patch
>
>  Time Spent: 6h 20m
>  Remaining Estimate: 0h
>
> Take II about organizing this beast.
>  A list of items that need to be added or require work. If you'd like to 
> work on any of these, please add your name to the list. See process comments 
> at parent (LUCENE-9077)
>  * Implement jflex task in lucene/core
>  * Implement jflex tasks in lucene/analysis
>  * Implement javacc tasks in lucene/queryparser (EOE)
>  * Implement javacc tasks in solr/core (EOE)
> * Implement python tasks in lucene (? there are several javadoc mentions in 
> the build.xml, this may be irrelevant to the Gradle effort).
>  * Implement python tasks in lucene/core
>  * Implement python tasks in lucene/analysis
>  
> Here are the "regenerate" targets I found in the ant version. There are a 
> couple that I don't have evidence for or against being rebuilt.
>  // Very top level
> {code:java}
> ./build.xml: 
> ./build.xml:  failonerror="true">
> ./build.xml:  depends="regenerate,-check-after-regeneration"/>
>  {code}
> // top level Lucene. This includes the core/build.xml and 
> test-framework/build.xml files
> {code:java}
> ./lucene/build.xml: 
> ./lucene/build.xml:  inheritall="false">
> ./lucene/build.xml: 
>  {code}
> // This one has quite a number of customizations to
> {code:java}
> ./lucene/core/build.xml: <target name="regenerate" depends="createLevAutomata,createPackedIntSources,jflex"/>
>  {code}
> // This one has a bunch of code modifications _after_ javacc is run on 
> certain of the
>  // output files. Save this one for last?
> {code:java}
> ./lucene/queryparser/build.xml: 
>  {code}
> // the files under ../lucene/analysis... are pretty self contained. I expect 
> these could be done as a unit
> {code:java}
> ./lucene/analysis/build.xml: 
> ./lucene/analysis/build.xml: 
> ./lucene/analysis/common/build.xml: <target name="regenerate" depends="jflex,unicode-data"/>
> ./lucene/analysis/icu/build.xml: <target name="regenerate" depends="gen-utr30-data-files,gennorm2,genrbbi"/>
> ./lucene/analysis/kuromoji/build.xml: <target name="regenerate" depends="build-dict"/>
> ./lucene/analysis/nori/build.xml: <target name="regenerate" depends="build-dict"/>
> ./lucene/analysis/opennlp/build.xml: <target name="regenerate" depends="train-test-models"/>
>  {code}
>  
> // These _are_ regenerated from the top-level regenerate target, but for
> // LUCENE-9080 the changes were only in imports so there are no
> // corresponding files checked in in that JIRA
> {code:java}
> ./lucene/expressions/build.xml: <target name="regenerate" depends="run-antlr"/>
>  {code}
> // Apparently unrelated to ./lucene/analysis/opennlp/build.xml 
> "train-test-models" target
> // Apparently not rebuilt from the top level, but _are_ regenerated when 
> executed from
> // ./solr/contrib/langid
> {code:java}
> ./solr/contrib/langid/build.xml: <target name="regenerate" depends="train-test-models"/>
>  {code}
>  






[jira] [Commented] (LUCENE-9186) remove linefiledocs usage from basetokenstreamtestcase

2020-01-28 Thread Dawid Weiss (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17025091#comment-17025091
 ] 

Dawid Weiss commented on LUCENE-9186:
-

+1.

> remove linefiledocs usage from basetokenstreamtestcase
> --
>
> Key: LUCENE-9186
> URL: https://issues.apache.org/jira/browse/LUCENE-9186
> Project: Lucene - Core
>  Issue Type: Task
>  Components: general/test
>Reporter: Robert Muir
>Priority: Major
> Attachments: LUCENE-9186.patch
>
>
> LineFileDocs is slow, even to open. That's because it (very slowly) "skips" 
> to a pseudorandom position in a 5MB gzip stream when you open it.
> There was a time when we didn't have a nice string generator for tests 
> (TestUtil.randomAnalysisString), but now we do. And when it was introduced, it 
> found interesting new things that LineFileDocs never found.
> This speeds up all the analyzer tests.






[jira] [Commented] (LUCENE-9187) remove too-expensive assert from LZ4 HighCompressionHashTable

2020-01-28 Thread Adrien Grand (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17025085#comment-17025085
 ] 

Adrien Grand commented on LUCENE-9187:
--

This profile option is pretty cool.

+1 to removing the assert. I'd like to make it a dedicated test instead, but 
that doesn't have to block the removal of the assertion.

> remove too-expensive assert from LZ4 HighCompressionHashTable
> -
>
> Key: LUCENE-9187
> URL: https://issues.apache.org/jira/browse/LUCENE-9187
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Robert Muir
>Priority: Major
> Attachments: LUCENE-9187.patch
>
>
> This is the slowest method in the lucene tests. See LUCENE-9185 for what I 
> mean.
> If you look at it, it's checking 64k values every time the assert is called.






[jira] [Commented] (LUCENE-9185) add "tests.profile" to gradle build to aid fixing slow tests

2020-01-28 Thread Dawid Weiss (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17025074#comment-17025074
 ] 

Dawid Weiss commented on LUCENE-9185:
-

Great beyond words. I never had a chance to use JFR, but I'll surely want to 
dig in. A few nitpicks:
{code}
+allprojects {
+  tasks.withType(Test) {
+def profileMode = propertyOrDefault("tests.profile", false)
+if (profileMode) {
{code}
You can apply the if outside and only apply the allprojects closure if 
tests.profile is true at the root project level (I assume we won't have to 
enable it for individual projects within a larger build).

{code}
+gradlew -p lucene/core test -Dtests.profile=true
{code}

It will work but -Ptests.profile=true would be more gradle-sque (it sets a 
project property as opposed to system property).

{code}
+gradle.buildFinished {
+  if (!recordings.isEmpty()) {
+def args = ["ProfileResults"]
+for (file in recordings.getFiles()) {
+  args += file.toString()
+}
+ProfileResults.main(args as String[])
+  }
+}
{code}

If you pull up the if, then this thing can go underneath it so that it's not 
adding any closure when it's not enabled. Also: it'll always display the 
profile, even on a failed build. Look at slowest-tests-at-end.gradle; that one 
only displays the slowest tests if the build is successful.

Finally, you may want to simplify to something like this (I didn't check, but 
it should work):
{code}
 def args = ["ProfileResults"]
 args += recordings.getFiles().collect { it.toString() }
{code}
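
Putting those nitpicks together, the whole thing might end up shaped roughly 
like this (a sketch, untested):
{code}
if (propertyOrDefault("tests.profile", false)) {
  allprojects {
    tasks.withType(Test) {
      // ... attach the JFR arguments and collect the recordings here ...
    }
  }
  // Registered only when profiling is enabled, so nothing runs otherwise,
  // and the report prints even if some test failed.
  gradle.buildFinished {
    if (!recordings.isEmpty()) {
      def args = ["ProfileResults"]
      args += recordings.getFiles().collect { it.toString() }
      ProfileResults.main(args as String[])
    }
  }
}
{code}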

> add "tests.profile" to gradle build to aid fixing slow tests
> 
>
> Key: LUCENE-9185
> URL: https://issues.apache.org/jira/browse/LUCENE-9185
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Robert Muir
>Priority: Major
> Attachments: LUCENE-9185.patch
>
>
> It is kind of a hassle to profile slow tests to fix the bottlenecks.
> The idea here is to make it dead easy to profile (just) the tests, capturing 
> samples at a very low granularity, reducing noise as much as possible (e.g. 
> not profiling entire gradle build or anything) and print a simple report for 
> quick iterating.
> Here's a prototype of what I hacked together:
> All of lucene core: {{./gradlew -p lucene/core test -Dtests.profile=true}}
> {noformat}
> ...
> PROFILE SUMMARY from 122464 samples
>   tests.profile.count=10
>   tests.profile.stacksize=1
>   tests.profile.linenumbers=false
> PERCENT SAMPLES STACK
> 2.59%   3170
> org.apache.lucene.util.compress.LZ4$HighCompressionHashTable#assertReset()
> 2.26%   2762java.util.Arrays#fill()
> 1.59%   1953com.carrotsearch.randomizedtesting.RandomizedContext#context()
> 1.24%   1523java.util.Random#nextInt()
> 1.19%   1456java.lang.StringUTF16#compress()
> 1.08%   1319java.lang.StringLatin1#inflate()
> 1.00%   1228java.lang.Integer#getChars()
> 0.99%   1214java.util.Arrays#compareUnsigned()
> 0.96%   1179java.util.zip.Inflater#inflateBytesBytes()
> 0.91%   1114java.util.concurrent.atomic.AtomicLong#compareAndSet()
> BUILD SUCCESSFUL in 3m 59s
> {noformat}
> If you look at this LZ4 assertReset method, you can see it's indeed way too 
> expensive, checking 64K items every time.
> To dig deeper into potential problems you can pass additional parameters (all 
> of them used here for demonstration):
> {{./gradlew -p solr/core test --tests TestLRUStatsCache -Dtests.profile=true 
> -Dtests.profile.count=8 -Dtests.profile.stacksize=20 
> -Dtests.profile.linenumbers=true}}
> This clearly finds SOLR-14223 (expensive RSA key generation in CryptoKeys) ...
> {noformat}
> ...
> PROFILE SUMMARY from 21355 samples
>   tests.profile.count=8
>   tests.profile.stacksize=20
>   tests.profile.linenumbers=true
> PERCENT SAMPLES STACK
> 26.30%  5617sun.nio.ch.EPoll#wait():(Native code)
>   at sun.nio.ch.EPollSelectorImpl#doSelect():120
>   at sun.nio.ch.SelectorImpl#lockAndDoSelect():124
>   at sun.nio.ch.SelectorImpl#select():141
>   at 
> org.eclipse.jetty.io.ManagedSelector$SelectorProducer#select():472
>   at 
> org.eclipse.jetty.io.ManagedSelector$SelectorProducer#produce():409
>   at 
> org.eclipse.jetty.util.thread.strategy.EatWhatYouKill#produceTask():360
>   at 
> org.eclipse.jetty.util.thread.strategy.EatWhatYouKill#doProduce():184
>   at 
> org.eclipse.jetty.util.thread.strategy.EatWhatYouKill#tryProduce():171
>   at 
> org.eclipse.jetty.util.thread.strategy.EatWhatYouKill#produce():135
>   at 
> org.eclipse.jetty.io.ManagedSelector$$Lambda$235.1914126144#run():(Interpreted
>  code)
>   at 
> 

[GitHub] [lucene-solr] ErickErickson opened a new pull request #1219: LUCENE-9134: Javacc skeleton for Gradle regenerate

2020-01-28 Thread GitBox
ErickErickson opened a new pull request #1219:  LUCENE-9134: Javacc skeleton 
for Gradle regenerate
URL: https://github.com/apache/lucene-solr/pull/1219
 
 
   Here are the build changes to get javacc to run, modeled on the jflex 
changes; many thanks for the model. Only two files changed here ;)
   
   If the structure is OK, I'll fill in the "doLast" blocks with the cleanup 
code and maybe be able to extract some common parts. NOTE: you can't even 
compile the result of running this, because I wanted the changes to the build 
structure to be clear first, so I didn't include the cleanup tasks yet.
   
   So if this structure is OK, should I merge it into master before or after 
the rest of the cleanup? My assumption is after. I want to try to get all the 
warnings etc. out of the generated code in the next phase, to reduce the 
temptation for people to make hand-edits.
   
   I didn't intentionally change the line endings in defaults-java; there's no 
other change there...





[GitHub] [lucene-solr] ErickErickson closed pull request #1218: LUCENE-9134: Javacc skeleton

2020-01-28 Thread GitBox
ErickErickson closed pull request #1218: LUCENE-9134: Javacc skeleton
URL: https://github.com/apache/lucene-solr/pull/1218
 
 
   





[GitHub] [lucene-solr] ErickErickson commented on issue #1218: LUCENE-9134: Javacc skeleton

2020-01-28 Thread GitBox
ErickErickson commented on issue #1218: LUCENE-9134: Javacc skeleton
URL: https://github.com/apache/lucene-solr/pull/1218#issuecomment-579219139
 
 
   Didn't title it right.





[jira] [Commented] (SOLR-12238) Synonym Query Style Boost By Payload

2020-01-28 Thread Alessandro Benedetti (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-12238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17025050#comment-17025050
 ] 

Alessandro Benedetti commented on SOLR-12238:
-

I followed the refactor comments from both @diegoceccarelli and @romseygeek.
The PR seems much cleaner right now, on both the Lucene and the Solr side.
Copious tests are present and should cover the various situations.

A few questions remain:

- from a test I read a comment from @dsmiley saying: "confirm 
autoGeneratePhraseQueries always builds OR queries" in 
org.apache.solr.search.TestSolrQueryParser#testSynonymQueryStyle

- what can we do about SpanBoostQuery? I was completely unaware it is 
going to be deprecated

Let me know

> Synonym Query Style Boost By Payload
> 
>
> Key: SOLR-12238
> URL: https://issues.apache.org/jira/browse/SOLR-12238
> Project: Solr
>  Issue Type: Improvement
>  Components: query parsers
>Affects Versions: 7.2
>Reporter: Alessandro Benedetti
>Priority: Major
> Attachments: SOLR-12238.patch, SOLR-12238.patch, SOLR-12238.patch, 
> SOLR-12238.patch
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> This improvement is built on top of the Synonym Query Style feature and 
> brings the possibility of boosting synonym queries using the associated 
> payload.
> It introduces two new modalities for the Synonym Query Style:
> PICK_BEST_BOOST_BY_PAYLOAD -> build a Disjunction query with the clauses 
> boosted by payload
> AS_DISTINCT_TERMS_BOOST_BY_PAYLOAD -> build a Boolean query with the clauses 
> boosted by payload
> These new synonym query styles assume payloads are available, so they must 
> be used in conjunction with a token filter able to produce payloads.
> A synonym.txt example could be:
> # Synonyms used by Payload Boost
> tiger => tiger|1.0, Big_Cat|0.8, Shere_Khan|0.9
> leopard => leopard, Big_Cat|0.8, Bagheera|0.9
> lion => lion|1.0, panthera leo|0.99, Simba|0.8
> snow_leopard => panthera uncia|0.99, snow leopard|1.0
> A simple token filter to populate the payloads from such a synonym.txt is:
> <filter class="solr.DelimitedPayloadTokenFilterFactory" delimiter="|"/>






[GitHub] [lucene-solr] alessandrobenedetti commented on issue #357: [SOLR-12238] Synonym Queries boost by payload

2020-01-28 Thread GitBox
alessandrobenedetti commented on issue #357: [SOLR-12238] Synonym Queries boost 
by payload 
URL: https://github.com/apache/lucene-solr/pull/357#issuecomment-579200844
 
 
   I followed the refactor comments from both @diegoceccarelli and @romseygeek.
   The PR seems much cleaner right now, on both the Lucene and the Solr side.
   Copious tests are present and should cover the various situations.
   
   A few questions remain:
   
   - from a test I read a comment from @dsmiley saying: "confirm 
autoGeneratePhraseQueries always builds OR queries" in 
org.apache.solr.search.TestSolrQueryParser#testSynonymQueryStyle
   
   - what can we do about SpanBoostQuery? I was completely unaware it is 
going to be deprecated
   
   Let me know
   
   
   





[jira] [Commented] (LUCENE-9187) remove too-expensive assert from LZ4 HighCompressionHashTable

2020-01-28 Thread Robert Muir (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17025048#comment-17025048
 ] 

Robert Muir commented on LUCENE-9187:
-

cc [~jpountz]

> remove too-expensive assert from LZ4 HighCompressionHashTable
> -
>
> Key: LUCENE-9187
> URL: https://issues.apache.org/jira/browse/LUCENE-9187
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Robert Muir
>Priority: Major
> Attachments: LUCENE-9187.patch
>
>
> This is the slowest method in the lucene tests. See LUCENE-9185 for what I 
> mean.
> If you look at it, it's checking 64k values every time the assert is called.






[jira] [Updated] (LUCENE-9187) remove too-expensive assert from LZ4 HighCompressionHashTable

2020-01-28 Thread Robert Muir (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-9187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-9187:

Attachment: LUCENE-9187.patch

> remove too-expensive assert from LZ4 HighCompressionHashTable
> -
>
> Key: LUCENE-9187
> URL: https://issues.apache.org/jira/browse/LUCENE-9187
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Robert Muir
>Priority: Major
> Attachments: LUCENE-9187.patch
>
>
> This is the slowest method in the lucene tests. See LUCENE-9185 for what I 
> mean.
> If you look at it, it's checking 64k values every time the assert is called.






[jira] [Created] (LUCENE-9187) remove too-expensive assert from LZ4 HighCompressionHashTable

2020-01-28 Thread Robert Muir (Jira)
Robert Muir created LUCENE-9187:
---

 Summary: remove too-expensive assert from LZ4 
HighCompressionHashTable
 Key: LUCENE-9187
 URL: https://issues.apache.org/jira/browse/LUCENE-9187
 Project: Lucene - Core
  Issue Type: Task
Reporter: Robert Muir


This is the slowest method in the lucene tests. See LUCENE-9185 for what I mean.

If you look at it, it's checking 64k values every time the assert is called.






[jira] [Updated] (LUCENE-9186) remove linefiledocs usage from basetokenstreamtestcase

2020-01-28 Thread Robert Muir (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-9186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-9186:

Attachment: LUCENE-9186.patch

> remove linefiledocs usage from basetokenstreamtestcase
> --
>
> Key: LUCENE-9186
> URL: https://issues.apache.org/jira/browse/LUCENE-9186
> Project: Lucene - Core
>  Issue Type: Task
>  Components: general/test
>Reporter: Robert Muir
>Priority: Major
> Attachments: LUCENE-9186.patch
>
>
> LineFileDocs is slow, even to open. That's because it (very slowly) "skips" 
> to a pseudorandom position in a 5MB gzip stream when you open it.
> There was a time when we didn't have a nice string generator for tests 
> (TestUtil.randomAnalysisString), but now we do. And when it was introduced, it 
> found interesting new things that LineFileDocs never found.
> This speeds up all the analyzer tests.






[jira] [Created] (LUCENE-9186) remove linefiledocs usage from basetokenstreamtestcase

2020-01-28 Thread Robert Muir (Jira)
Robert Muir created LUCENE-9186:
---

 Summary: remove linefiledocs usage from basetokenstreamtestcase
 Key: LUCENE-9186
 URL: https://issues.apache.org/jira/browse/LUCENE-9186
 Project: Lucene - Core
  Issue Type: Task
  Components: general/test
Reporter: Robert Muir


LineFileDocs is slow, even to open. That's because it (very slowly) "skips" to 
a pseudorandom position in a 5MB gzip stream when you open it.

There was a time when we didn't have a nice string generator for tests 
(TestUtil.randomAnalysisString), but now we do. And when it was introduced, it 
found interesting new things that LineFileDocs never found.

This speeds up all the analyzer tests.






[jira] [Commented] (LUCENE-9185) add "tests.profile" to gradle build to aid fixing slow tests

2020-01-28 Thread Robert Muir (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17025032#comment-17025032
 ] 

Robert Muir commented on LUCENE-9185:
-

Attached is my initial stab... it's helpful to me, at least, when tracking 
these things down. cc [~dweiss]

> add "tests.profile" to gradle build to aid fixing slow tests
> 
>
> Key: LUCENE-9185
> URL: https://issues.apache.org/jira/browse/LUCENE-9185
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Robert Muir
>Priority: Major
> Attachments: LUCENE-9185.patch
>
>
> It is kind of a hassle to profile slow tests to fix the bottlenecks.
> The idea here is to make it dead easy to profile (just) the tests, capturing 
> samples at a very low granularity, reducing noise as much as possible (e.g. 
> not profiling entire gradle build or anything) and print a simple report for 
> quick iterating.
> Here's a prototype of what I hacked together:
> All of lucene core: {{./gradlew -p lucene/core test -Dtests.profile=true}}
> {noformat}
> ...
> PROFILE SUMMARY from 122464 samples
>   tests.profile.count=10
>   tests.profile.stacksize=1
>   tests.profile.linenumbers=false
> PERCENT SAMPLES STACK
> 2.59%   3170
> org.apache.lucene.util.compress.LZ4$HighCompressionHashTable#assertReset()
> 2.26%   2762java.util.Arrays#fill()
> 1.59%   1953com.carrotsearch.randomizedtesting.RandomizedContext#context()
> 1.24%   1523java.util.Random#nextInt()
> 1.19%   1456java.lang.StringUTF16#compress()
> 1.08%   1319java.lang.StringLatin1#inflate()
> 1.00%   1228java.lang.Integer#getChars()
> 0.99%   1214java.util.Arrays#compareUnsigned()
> 0.96%   1179java.util.zip.Inflater#inflateBytesBytes()
> 0.91%   1114java.util.concurrent.atomic.AtomicLong#compareAndSet()
> BUILD SUCCESSFUL in 3m 59s
> {noformat}
> If you look at this LZ4 assertReset method, you can see it's indeed way too 
> expensive, checking 64K items every time.
> To dig deeper into potential problems you can pass additional parameters (all 
> of them used here for demonstration):
> {{./gradlew -p solr/core test --tests TestLRUStatsCache -Dtests.profile=true 
> -Dtests.profile.count=8 -Dtests.profile.stacksize=20 
> -Dtests.profile.linenumbers=true}}
> This clearly finds SOLR-14223 (expensive RSA key generation in CryptoKeys) ...
> {noformat}
> ...
> PROFILE SUMMARY from 21355 samples
>   tests.profile.count=8
>   tests.profile.stacksize=20
>   tests.profile.linenumbers=true
> PERCENT SAMPLES STACK
> 26.30%  5617    sun.nio.ch.EPoll#wait():(Native code)
>   at sun.nio.ch.EPollSelectorImpl#doSelect():120
>   at sun.nio.ch.SelectorImpl#lockAndDoSelect():124
>   at sun.nio.ch.SelectorImpl#select():141
>   at org.eclipse.jetty.io.ManagedSelector$SelectorProducer#select():472
>   at org.eclipse.jetty.io.ManagedSelector$SelectorProducer#produce():409
>   at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill#produceTask():360
>   at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill#doProduce():184
>   at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill#tryProduce():171
>   at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill#produce():135
>   at org.eclipse.jetty.io.ManagedSelector$$Lambda$235.1914126144#run():(Interpreted code)
>   at org.eclipse.jetty.util.thread.QueuedThreadPool#runJob():806
>   at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner#run():938
>   at java.lang.Thread#run():830
> 16.19%  3458    sun.nio.ch.EPoll#wait():(Native code)
>   at sun.nio.ch.EPollSelectorImpl#doSelect():120
>   at sun.nio.ch.SelectorImpl#lockAndDoSelect():124
>   at sun.nio.ch.SelectorImpl#select():141
>   at org.eclipse.jetty.io.ManagedSelector$SelectorProducer#select():472
>   at org.eclipse.jetty.io.ManagedSelector$SelectorProducer#produce():409
>   at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill#produceTask():360
>   at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill#doProduce():184
>   at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill#tryProduce():171
>   at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill#produce():135
>   at org.eclipse.jetty.io.ManagedSelector$$Lambda$235.1914126144#run():(Interpreted code)
> ...
> {noformat}
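
For readers curious how a summary like the one quoted above can be produced: the report looks like aggregated JDK Flight Recorder execution samples. Below is a minimal, hypothetical sketch (not the attached patch's code) that buckets the leaf frames of jdk.ExecutionSample events from an existing .jfr dump and prints a top-N table in the same spirit; the class name, the file argument, and how the recording gets produced are all assumptions.

{noformat}
// Hypothetical sketch, NOT the LUCENE-9185 patch: aggregate the leaf frames of
// JFR execution samples into a top-N hot-method table (tests.profile.stacksize=1 style).
import java.nio.file.Path;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import jdk.jfr.consumer.RecordedEvent;
import jdk.jfr.consumer.RecordedFrame;
import jdk.jfr.consumer.RecordingFile;

public class ProfileSummary {
  public static void main(String[] args) throws Exception {
    int topN = 10; // analogous to -Dtests.profile.count=10
    Map<String, Long> counts = new HashMap<>();
    long total = 0;
    for (RecordedEvent event : RecordingFile.readAllEvents(Path.of(args[0]))) {
      if (!"jdk.ExecutionSample".equals(event.getEventType().getName())) continue;
      if (event.getStackTrace() == null) continue;
      List<RecordedFrame> frames = event.getStackTrace().getFrames();
      if (frames.isEmpty()) continue;
      // stacksize=1: bucket by the leaf (innermost) frame only
      RecordedFrame leaf = frames.get(0);
      String key = leaf.getMethod().getType().getName() + "#" + leaf.getMethod().getName() + "()";
      counts.merge(key, 1L, Long::sum);
      total++;
    }
    System.out.printf("PROFILE SUMMARY from %d samples%n", total);
    System.out.println("PERCENT SAMPLES STACK");
    final long totalSamples = total;
    counts.entrySet().stream()
        .sorted(Map.Entry.<String, Long>comparingByValue().reversed())
        .limit(topN)
        .forEach(e -> System.out.printf("%5.2f%% %7d    %s%n",
            100.0 * e.getValue() / totalSamples, e.getValue(), e.getKey()));
  }
}
{noformat}

A dump to feed it could come from running the test JVM with something like -XX:StartFlightRecording=filename=build/tests.jfr; that flag is standard JDK, but whether the patch drives JFR this way or programmatically is an assumption here.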

[jira] [Updated] (SOLR-14224) Not able to build solr 6.6.2 from source after January 15, 2020

2020-01-28 Thread Guruprasad K K (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-14224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guruprasad K K updated SOLR-14224:
--
Description: 
After January 15, 2020, Maven Central allows only HTTPS connections to the 
repository, but Solr 6.6.2 still uses plain HTTP, so builds fail.

The latest version of Solr appears to fix this by using HTTPS in 
common-build.xml and the other places that connect to Maven.

Error log:
{noformat}
ivy-bootstrap1:
    [mkdir] Created dir: /root/.ant/lib
     [echo] installing ivy 2.3.0 to /root/.ant/lib
      [get] Getting: http://repo1.maven.org/maven2/org/apache/ivy/ivy/2.3.0/ivy-2.3.0.jar
      [get] To: /root/.ant/lib/ivy-2.3.0.jar
      [get] Error opening connection java.io.IOException: Server returned HTTP response code: 501 for URL: http://repo1.maven.org/maven2/org/apache/ivy/ivy/2.3.0/ivy-2.3.0.jar
      [get] Error opening connection java.io.IOException: Server returned HTTP response code: 501 for URL: http://repo1.maven.org/maven2/org/apache/ivy/ivy/2.3.0/ivy-2.3.0.jar
      [get] Error opening connection java.io.IOException: Server returned HTTP response code: 501 for URL: http://repo1.maven.org/maven2/org/apache/ivy/ivy/2.3.0/ivy-2.3.0.jar
      [get] Can't get http://repo1.maven.org/maven2/org/apache/ivy/ivy/2.3.0/ivy-2.3.0.jar to /root/.ant/lib/ivy-2.3.0.jar
{noformat}

[NOTE]: It works on the latest version of Solr, where HTTP has been converted to HTTPS.
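
In the meantime, a couple of local workarounds are possible for 6.6.2. The sketch below is hedged: the Ivy jar location comes from the log above, while the assumption that ivy-bootstrap is skipped when Ivy is already installed, and the exact set of build files containing http:// URLs, would need checking against the 6.6.2 source tree.

{noformat}
# Workaround 1 (assumption: the build skips the HTTP ivy-bootstrap download
# when Ivy is already present in ~/.ant/lib): pre-install Ivy over HTTPS.
mkdir -p ~/.ant/lib
curl -L -o ~/.ant/lib/ivy-2.3.0.jar \
  https://repo1.maven.org/maven2/org/apache/ivy/ivy/2.3.0/ivy-2.3.0.jar

# Workaround 2: backport the HTTPS fix locally by rewriting the repo URLs in
# the build files (run from the source tree root; the grep avoids guessing
# exactly which files carry the http:// URLs).
grep -rl 'http://repo1.maven.org' --include='*.xml' . | \
  xargs sed -i 's|http://repo1.maven.org|https://repo1.maven.org|g'
{noformat}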


> Not able to build solr 6.6.2 from source after January 15, 2020
> ---
>
> Key: SOLR-14224
> URL: https://issues.apache.org/jira/browse/SOLR-14224
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: 6.6.2
>Reporter: Guruprasad K K
>Priority: Major
>
> After January 15, 2020, Maven Central allows only HTTPS connections to the 
> repository, but Solr 6.6.2 still uses plain HTTP, so builds fail.
> The latest version of Solr appears to fix this by using HTTPS in 
> common-build.xml and the other places that connect to Maven.
>  
> Error log:
> {noformat}
> ivy-bootstrap1:
>     [mkdir] Created dir: /root/.ant/lib
>      [echo] installing ivy 2.3.0 to /root/.ant/lib
>       [get] Getting: http://repo1.maven.org/maven2/org/apache/ivy/ivy/2.3.0/ivy-2.3.0.jar
>       [get] To: /root/.ant/lib/ivy-2.3.0.jar
>       [get] Error opening connection java.io.IOException: Server returned HTTP response code: 501 for URL: http://repo1.maven.org/maven2/org/apache/ivy/ivy/2.3.0/ivy-2.3.0.jar
>       [get] Error opening connection java.io.IOException: Server returned HTTP response code: 501 for URL: http://repo1.maven.org/maven2/org/apache/ivy/ivy/2.3.0/ivy-2.3.0.jar
>       [get] Error opening connection java.io.IOException: Server returned HTTP response code: 501 for URL: http://repo1.maven.org/maven2/org/apache/ivy/ivy/2.3.0/ivy-2.3.0.jar
>       [get] Can't get http://repo1.maven.org/maven2/org/apache/ivy/ivy/2.3.0/ivy-2.3.0.jar to /root/.ant/lib/ivy-2.3.0.jar
> {noformat}
> [NOTE]: It works on the latest version of Solr, where HTTP has been converted to HTTPS.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org


