[jira] [Comment Edited] (LUCENE-8213) Cache costly subqueries asynchronously

2019-10-02 Thread Atri Sharma (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-8213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16943317#comment-16943317
 ] 

Atri Sharma edited comment on LUCENE-8213 at 10/3/19 4:18 AM:
--

Thanks [~hossman] 

Interestingly, all but one of the test failures are coming from LatLon queries -- 
is there anything special about them?


was (Author: atris):
Thanks [~hossman] 

Interestingly, all but one test failures are coming from LatLong queries -- is 
there anything special about them?

> Cache costly subqueries asynchronously
> --
>
> Key: LUCENE-8213
> URL: https://issues.apache.org/jira/browse/LUCENE-8213
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/query/scoring
>Affects Versions: 7.2.1
>Reporter: Amir Hadadi
>Priority: Minor
>  Labels: performance
> Attachments: 
> 0001-Reproduce-across-segment-caching-of-same-query.patch, 
> thetaphi_Lucene-Solr-master-Linux_24839.log.txt
>
>  Time Spent: 13h
>  Remaining Estimate: 0h
>
> IndexOrDocValuesQuery makes it possible to combine a costly range query with a 
> selective lead iterator in an optimized way. However, the range query at some 
> point gets cached by a querying thread in LRUQueryCache, which negates the 
> optimization of IndexOrDocValuesQuery for that specific query.
> It would be nice to see an asynchronous caching implementation in such cases, 
> so that queries involving IndexOrDocValuesQuery would have consistent 
> performance characteristics.
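To make the trade-off concrete, here is a toy model in plain Java (not Lucene's actual implementation; all names are illustrative). The doc-values path verifies the range only for documents produced by the selective lead iterator, while filling a cache forces the range check over every document in the segment:

```java
import java.util.List;
import java.util.function.IntPredicate;

public class LeadIteratorDemo {
    static int checks = 0;  // counts how many per-document range verifications run

    // per-document range verification, as the doc-values path would perform it
    static IntPredicate range(int lo, int hi) {
        return doc -> { checks++; return doc >= lo && doc <= hi; };
    }

    // verify-on-demand: only docs from the selective lead iterator are checked
    static long countOnDemand(List<Integer> leadDocs, IntPredicate range) {
        return leadDocs.stream().filter(range::test).count();
    }

    // cache fill: every doc in the segment is visited once to build the cached set
    static boolean[] fillCache(int maxDoc, IntPredicate range) {
        boolean[] bits = new boolean[maxDoc];
        for (int doc = 0; doc < maxDoc; doc++) {
            bits[doc] = range.test(doc);
        }
        return bits;
    }
}
```

With three lead-iterator docs in a 1000-doc segment, on-demand verification does 3 checks while the cache fill does 1000 -- which is why caching on the query thread erases the optimization for that query.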



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8213) Cache costly subqueries asynchronously

2019-10-02 Thread Atri Sharma (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-8213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16943317#comment-16943317
 ] 

Atri Sharma commented on LUCENE-8213:
-

Thanks [~hossman] 

Interestingly, all but one test failures are coming from LatLong queries -- is 
there anything special about them?







[GitHub] [lucene-solr] dsmiley commented on issue #853: SOLR-13737 Moving SolrCloud on the README with some cues.

2019-10-02 Thread GitBox
dsmiley commented on issue #853: SOLR-13737 Moving SolrCloud on the README with 
some cues.
URL: https://github.com/apache/lucene-solr/pull/853#issuecomment-537766519
 
 
   Hey, could you update the README and try again? Also, replace "distributed" 
with "clustered" per the recent conversation on the dev list (CC @chatman).


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8989) IndexSearcher Should Handle Rejection of Concurrent Task

2019-10-02 Thread Atri Sharma (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-8989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16943296#comment-16943296
 ] 

Atri Sharma commented on LUCENE-8989:
-

bq. Hmm, that is a good point – maybe we should not handle this exception and 
let it throw?

I would still argue that the default should be to handle the exception and let 
the query execute gracefully. The case where the executor has been unexpectedly 
shut down is a good point, but for edge cases like that we should probably 
override IndexSearcher's default behaviour. If it helps, we could introduce a 
toggle, enabled by default, indicating that IndexSearcher performs the graceful 
handling; a user who wants the error to propagate would call a new API method to 
disable it.
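The toggle described above might look like the following sketch (a hypothetical helper, not IndexSearcher's actual API): on RejectedExecutionException the task degrades to sequential execution on the caller thread, unless the caller opted into propagation.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.RejectedExecutionException;

public class GracefulSubmit {
    // graceful == true (the proposed default): swallow the rejection and run
    // the task on the caller thread; graceful == false: let the
    // RejectedExecutionException propagate to the caller.
    static <T> T runOrFallback(ExecutorService executor, Callable<T> task,
                               boolean graceful) throws Exception {
        try {
            return executor.submit(task).get();
        } catch (RejectedExecutionException e) {
            if (!graceful) {
                throw e;
            }
            return task.call();  // degrade to sequential execution
        }
    }
}
```

Either way the query completes (or fails) deterministically; the toggle only chooses whether unavailability of threads is visible to the caller.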


> IndexSearcher Should Handle Rejection of Concurrent Task
> 
>
> Key: LUCENE-8989
> URL: https://issues.apache.org/jira/browse/LUCENE-8989
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Atri Sharma
>Priority: Major
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> As discussed in [https://github.com/apache/lucene-solr/pull/815], 
> IndexSearcher should handle the case when the executor rejects the execution 
> of a task (unavailability of threads?).






[jira] [Commented] (SOLR-8241) Evaluate W-TinyLfu cache

2019-10-02 Thread Ben Manes (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-8241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16943283#comment-16943283
 ] 

Ben Manes commented on SOLR-8241:
-

Thanks [~ab], [~dsmiley], [~elyograg]!

> Evaluate W-TinyLfu cache
> 
>
> Key: SOLR-8241
> URL: https://issues.apache.org/jira/browse/SOLR-8241
> Project: Solr
>  Issue Type: Improvement
>  Components: search
>Reporter: Ben Manes
>Assignee: Andrzej Bialecki
>Priority: Major
> Fix For: master (9.0)
>
> Attachments: EvictionBenchmark.png, GetPutBenchmark.png, 
> SOLR-8241.patch, SOLR-8241.patch, SOLR-8241.patch, SOLR-8241.patch, 
> SOLR-8241.patch, SOLR-8241.patch, caffeine-benchmark.txt, proposal.patch, 
> solr_caffeine.patch.gz, solr_jmh_results.json
>
>
> SOLR-2906 introduced an LFU cache, and the in-progress SOLR-3393 makes it O(1). 
> The discussions seem to indicate that the higher hit rate (vs LRU) is offset 
> by the slower performance of the implementation. An original goal appeared to 
> be to introduce ARC, a patented algorithm that uses ghost entries to retain 
> history information.
> My analysis of Window TinyLfu indicates that it may be a better option. It 
> uses a frequency sketch to compactly estimate an entry's popularity, and it 
> uses LRU to capture recency while operating in O(1) time. On the available 
> academic traces, the policy provides a near-optimal hit rate regardless of the 
> workload.
> I'm getting ready to release the policy in Caffeine, which Solr already has a 
> dependency on. But the code is fairly straightforward, and a port into Solr's 
> caches instead is a pragmatic alternative. More interesting is what the impact 
> would be on Solr's workloads, and feedback on the policy's design.
> https://github.com/ben-manes/caffeine/wiki/Efficiency
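The frequency-sketch idea can be illustrated with a minimal count-min-style estimator (a toy sketch only; Caffeine's real implementation packs 4-bit counters and periodically ages them, which this omits). The minimum over several hashed counters upper-bounds an entry's true access count, so popularity can be estimated in constant space:

```java
public class FreqSketch {
    private final int[][] counters = new int[4][];
    private final long[] seeds = {
        0x9E3779B97F4A7C15L, 0xC2B2AE3D27D4EB4FL,
        0x165667B19E3779F9L, 0x27D4EB2F165667C5L
    };
    private final int mask;

    FreqSketch(int size) {  // size must be a power of two
        for (int i = 0; i < 4; i++) {
            counters[i] = new int[size];
        }
        mask = size - 1;
    }

    private int index(long seed, Object key) {
        long h = key.hashCode() * seed;
        return (int) ((h ^ (h >>> 32)) & mask);
    }

    // record one access of key in all four counter arrays
    void increment(Object key) {
        for (int i = 0; i < 4; i++) {
            counters[i][index(seeds[i], key)]++;
        }
    }

    // popularity estimate: the minimum over the four counters bounds the true
    // count from above, since hash collisions can only inflate a counter
    int estimate(Object key) {
        int min = Integer.MAX_VALUE;
        for (int i = 0; i < 4; i++) {
            min = Math.min(min, counters[i][index(seeds[i], key)]);
        }
        return min;
    }
}
```

On admission, W-TinyLfu compares the estimate of the candidate against that of the eviction victim and keeps the more popular entry.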






[jira] [Updated] (LUCENE-8991) disable java.util.HashMap assertions to avoid spurious failures due to JDK-8205399

2019-10-02 Thread Chris M. Hostetter (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-8991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris M. Hostetter updated LUCENE-8991:
---
Fix Version/s: 8.3
   master (9.0)
 Assignee: Chris M. Hostetter
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

> disable java.util.HashMap assertions to avoid spurious failures due to 
> JDK-8205399
> --
>
> Key: LUCENE-8991
> URL: https://issues.apache.org/jira/browse/LUCENE-8991
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Chris M. Hostetter
>Assignee: Chris M. Hostetter
>Priority: Major
>  Labels: Java10, Java11
> Fix For: master (9.0), 8.3
>
> Attachments: LUCENE-8991.patch, LUCENE-8991.patch
>
>
> An incredibly common class of jenkins failure (at least in Solr tests) stems 
> from triggering assertion failures in java.util.HashMap -- evidently 
> triggering bug JDK-8205399, which was first introduced in java-10 and fixed in 
> java-12, but has never been backported to any java-10 or java-11 bug fix 
> release...
> https://bugs.openjdk.java.net/browse/JDK-8205399
> SOLR-13653 tracks how this bug can affect Solr users, but I think it would 
> make sense to disable java.util.HashMap assertions in our build system to 
> reduce the confusing failures when users/jenkins run tests, since there is 
> nothing we can do to work around this when testing with java-11 (or java-10 on 
> branch_8x).
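For reference, per-class assertion control is standard JVM launcher syntax: `-ea` enables assertions globally while `-da:java.util.HashMap` disables them for that one class. The small self-contained sketch below shows the usual idiom for observing, at runtime, whether assertions are enabled for a given class (the class name here is illustrative, not part of the Lucene build):

```java
// Run, e.g., with: java -ea -da:java.util.HashMap AssertionStatus
public class AssertionStatus {
    static boolean assertionsEnabled() {
        boolean enabled = false;
        // The assignment inside the assert runs only when assertions are
        // enabled for this class, so `enabled` records the current status.
        assert enabled = true;
        return enabled;
    }

    public static void main(String[] args) {
        System.out.println("assertions enabled here: " + assertionsEnabled());
    }
}
```

Because the flag is per class, the rest of the test suite keeps running with assertions on while only java.util.HashMap's internal (buggy) checks are silenced.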






[jira] [Commented] (LUCENE-8991) disable java.util.HashMap assertions to avoid spurious failures due to JDK-8205399

2019-10-02 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-8991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16943255#comment-16943255
 ] 

ASF subversion and git services commented on LUCENE-8991:
-

Commit 60b9ec0866e6223afb269fa377203f731cca2973 in lucene-solr's branch 
refs/heads/branch_8x from Chris M. Hostetter
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=60b9ec0 ]

LUCENE-8991: disable java.util.HashMap assertions to avoid spurious vailures 
due to JDK-8205399

(cherry picked from commit 10da07a396777e3e7cfb091c5dec826b6df11284)








[GitHub] [lucene-solr] KoenDG opened a new pull request #919: LUCENE-8994: Code Cleanup - Pass values to list constructor instead of empty constructor followed by addAll().

2019-10-02 Thread GitBox
KoenDG opened a new pull request #919: LUCENE-8994: Code Cleanup - Pass values 
to list constructor instead of empty constructor followed by addAll().
URL: https://github.com/apache/lucene-solr/pull/919
 
 
   https://issues.apache.org/jira/projects/LUCENE/issues/LUCENE-8994
   
   If you have more pressing issues to attend to, there is no need to bother 
with this PR; it is code cleanup, not features or fixes.
   
   A small and unimportant PR: some code cleanup, and perhaps in some cases a 
small performance gain.
   
   I would understand if issue were taken with readability in some cases.
   
   I could change those cases to look something like this, if it made them more 
readable:
   
   ```
   new ArrayList<>(
       nameOfCollection.getMethod(someExtraVars)
   );
   ```
   
   Frankly, it was already equally unreadable in such cases as:
   
   ```
   resources.addAll(Accountables.namedAccountables("field", fields));
   ```
   
   Depends on what the reviewer wants, I suppose.
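For context, the transformation this PR applies looks like the following (illustrative names, not code from the PR): the two-step build is replaced by the copy constructor, which produces an equal list and can pre-size its backing array from the source collection.

```java
import java.util.ArrayList;
import java.util.List;

public class CopyStyleDemo {
    // before: empty constructor followed by addAll()
    static List<String> viaAddAll(List<String> src) {
        List<String> out = new ArrayList<>();
        out.addAll(src);
        return out;
    }

    // after: pass the values straight to the constructor
    static List<String> viaConstructor(List<String> src) {
        return new ArrayList<>(src);
    }
}
```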





[GitHub] [lucene-solr] KoenDG commented on issue #881: LUCENE-8979: Code Cleanup: Use entryset for map iteration wherever possible. - part 2

2019-10-02 Thread GitBox
KoenDG commented on issue #881: LUCENE-8979: Code Cleanup: Use entryset for map 
iteration wherever possible. - part 2
URL: https://github.com/apache/lucene-solr/pull/881#issuecomment-537724000
 
 
   Updated as requested.
   
   Also, the initial changes were not manual; they were made automatically by 
the IntelliJ IDE.
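The cleanup under review is the standard entrySet transformation (illustrative example, not code from the PR): iterating the keySet costs one extra hash lookup per key, while entrySet yields key and value from a single traversal.

```java
import java.util.Map;

public class EntrySetDemo {
    // before: keySet() iteration, one extra get() lookup per key
    static long sumViaKeySet(Map<String, Integer> m) {
        long sum = 0;
        for (String key : m.keySet()) {
            sum += m.get(key);
        }
        return sum;
    }

    // after: entrySet() yields key and value from the same traversal
    static long sumViaEntrySet(Map<String, Integer> m) {
        long sum = 0;
        for (Map.Entry<String, Integer> e : m.entrySet()) {
            sum += e.getValue();
        }
        return sum;
    }
}
```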





[jira] [Commented] (LUCENE-8991) disable java.util.HashMap assertions to avoid spurious failures due to JDK-8205399

2019-10-02 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-8991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16943222#comment-16943222
 ] 

ASF subversion and git services commented on LUCENE-8991:
-

Commit 10da07a396777e3e7cfb091c5dec826b6df11284 in lucene-solr's branch 
refs/heads/master from Chris M. Hostetter
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=10da07a ]

LUCENE-8991: disable java.util.HashMap assertions to avoid spurious vailures 
due to JDK-8205399








[jira] [Reopened] (LUCENE-8213) Cache costly subqueries asynchronously

2019-10-02 Thread Chris M. Hostetter (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-8213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris M. Hostetter reopened LUCENE-8213:








[jira] [Commented] (LUCENE-8213) Cache costly subqueries asynchronously

2019-10-02 Thread Chris M. Hostetter (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-8213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16943217#comment-16943217
 ] 

Chris M. Hostetter commented on LUCENE-8213:


FWIW: I just attached thetaphi_Lucene-Solr-master-Linux_24839.log.txt, a 
jenkins log that shows two other failures that seem to be related to this 
issue...

{noformat}
Checking out Revision 3c399bb696073bdb30f278309410a50effabd0e7 
(refs/remotes/origin/master)
...
   [junit4]   2> NOTE: reproduce with: ant test  
-Dtestcase=TestLatLonDocValuesQueries -Dtests.method=testAllLatEqual 
-Dtests.seed=8D56B48917EDB35F -Dtests.multiplier=3 -Dtests.slow=true 
-Dtests.locale=zh-Hans -Dtests.timezone=America/Bahia -Dtests.asserts=true 
-Dtests.file.encoding=US-ASCII
   [junit4] FAILURE 2.67s J0 | TestLatLonDocValuesQueries.testAllLatEqual <<<
   [junit4]> Throwable #1: java.lang.AssertionError: wrong hit (first of 
possibly more):
   [junit4]> FAIL: id=235 should match but did not
   [junit4]>   query=point:polygons([[0.0, 1.401298464324817E-45] [35.0, 
1.401298464324817E-45] [35.0, 180.0] [0.0, 180.0] [0.0, 1.401298464324817E-45] 
]) docID=227
   [junit4]>   lat=0.0 lon=41.82071185670793
   [junit4]>   deleted?=false  polygon=[0.0, 1.401298464324817E-45] [35.0, 
1.401298464324817E-45] [35.0, 180.0] [0.0, 180.0] [0.0, 1.401298464324817E-45] 
...
   [junit4]   2> NOTE: reproduce with: ant test  
-Dtestcase=TestFieldCacheSortRandom -Dtests.method=testRandomStringSort 
-Dtests.seed=F7E42E6905E37945 -Dtests.multiplier=3 -Dtests.slow=true 
-Dtests.locale=khq-ML -Dtests.timezone=America/Chihuahua -Dtests.asserts=true 
-Dtests.file.encoding=UTF-8
   [junit4] FAILURE 0.15s J1 | TestFieldCacheSortRandom.testRandomStringSort <<<
   [junit4]> Throwable #1: java.lang.AssertionError: expected:<[65 76 65 64 
6d 68 76 68 75 77 61 71 64 63 65 63 7a 79 77 77 63 69 71 62 76 70 7a 62 6d 66 
75 67 61]> but was:<[66 6c 75 6e 78 63 67 66 63 77 7a 6b 69 6d 7a 77 62 6c 71 
61 79 61 67 71 6f 69 66 71 64 6a 66]>
{noformat}

Both of those seeds seem to fail reliably against GIT:3c399bb6960 (the version 
checked out by the jenkins run), but both seem to start passing reliably as of 
GIT:302cd09b4ce (when this issue was reverted).

I should point out, however, in case it's helpful: when the failures reproduce, 
the specifics of the failures are always different -- suggesting that parallel 
thread execution is affecting the results, since the randomization of the index 
data should be deterministic based on the seed.







[jira] [Updated] (LUCENE-8213) Cache costly subqueries asynchronously

2019-10-02 Thread Chris M. Hostetter (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-8213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris M. Hostetter updated LUCENE-8213:
---
Attachment: thetaphi_Lucene-Solr-master-Linux_24839.log.txt







[jira] [Updated] (SOLR-13797) SolrResourceLoader produces inconsistent results when given bad arguments

2019-10-02 Thread Mike Drob (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-13797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Drob updated SOLR-13797:
-
Fix Version/s: master (9.0)
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

Thanks for the review Anshum. Fixed the annotation and pushed.

> SolrResourceLoader produces inconsistent results when given bad arguments
> -
>
> Key: SOLR-13797
> URL: https://issues.apache.org/jira/browse/SOLR-13797
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: 7.7.2, 8.2
>Reporter: Mike Drob
>Assignee: Mike Drob
>Priority: Major
> Fix For: master (9.0)
>
> Attachments: SOLR-13797.v1.patch, SOLR-13797.v2.patch
>
>
> SolrResourceLoader will attempt to do some magic to infer what the user 
> wanted when loading TokenFilter and Tokenizer classes. However, this can end 
> up putting the wrong class in the cache such that the request succeeds the 
> first time but fails subsequent times. It should either succeed or fail 
> consistently on every call.
> This can be triggered in a variety of ways, but the simplest is maybe by 
> specifying the wrong element type in an indexing chain. Consider the field 
> type definition:
> {code:xml}
> <fieldType name="..." class="solr.TextField">
>   <analyzer>
>     <filter class="solr.NGramTokenizerFactory"
>      maxGramSize="2"/>
>   </analyzer>
> </fieldType>
> {code}
> If loaded by itself (e.g. docker container for standalone validation) then 
> the schema will pass and collection will succeed, with Solr actually figuring 
> out that it needs an {{NGramTokenFilterFactory}}. However, if this is loaded 
> on a cluster with other collections where the {{NGramTokenizerFactory}} has 
> been loaded correctly then we get {{ClassCastException}}. Or if this 
> collection is loaded first then others using the Tokenizer will fail instead.
> I'd argue that succeeding on both calls is the better approach because it 
> does what the user likely wants instead of what the user explicitly asks for, 
> and creates a nicer user experience that is marginally less pedantic.






[jira] [Commented] (SOLR-13797) SolrResourceLoader produces inconsistent results when given bad arguments

2019-10-02 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16943203#comment-16943203
 ] 

ASF subversion and git services commented on SOLR-13797:


Commit 2d3baf6e8fc1d86f13e2e0f62eba97bb5ec9afca in lucene-solr's branch 
refs/heads/master from Mike Drob
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=2d3baf6 ]

SOLR-13797 SolrResourceLoader no longer caches bad results when asked for wrong 
type








[jira] [Commented] (SOLR-13812) SolrTestCaseJ4.params(String...) javadocs, uneven rejection, basic test coverage

2019-10-02 Thread Lucene/Solr QA (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16943196#comment-16943196
 ] 

Lucene/Solr QA commented on SOLR-13812:
---

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  2m 
58s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  5m  
9s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  5m  
9s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} Release audit (RAT) {color} | 
{color:green}  4m 33s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} Check forbidden APIs {color} | 
{color:green}  4m 12s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} Validate source patterns {color} | 
{color:green}  4m 12s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 85m 
50s{color} | {color:green} core in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  1m 
34s{color} | {color:green} test-framework in the patch passed. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}101m  9s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | SOLR-13812 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12982002/SOLR-13812.patch |
| Optional Tests |  compile  javac  unit  ratsources  checkforbiddenapis  
validatesourcepatterns  |
| uname | Linux lucene2-us-west.apache.org 4.4.0-112-generic #135-Ubuntu SMP 
Fri Jan 19 11:48:36 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | ant |
| Personality | 
/home/jenkins/jenkins-slave/workspace/PreCommit-SOLR-Build/sourcedir/dev-tools/test-patch/lucene-solr-yetus-personality.sh
 |
| git revision | master / a57ec14 |
| ant | version: Apache Ant(TM) version 1.9.6 compiled on July 20 2018 |
| Default Java | LTS |
|  Test Results | 
https://builds.apache.org/job/PreCommit-SOLR-Build/564/testReport/ |
| modules | C: solr/core solr/test-framework U: solr |
| Console output | 
https://builds.apache.org/job/PreCommit-SOLR-Build/564/console |
| Powered by | Apache Yetus 0.7.0   http://yetus.apache.org |


This message was automatically generated.



> SolrTestCaseJ4.params(String...) javadocs, uneven rejection, basic test 
> coverage
> 
>
> Key: SOLR-13812
> URL: https://issues.apache.org/jira/browse/SOLR-13812
> Project: Solr
>  Issue Type: Test
>Reporter: Diego Ceccarelli
>Assignee: Christine Poerschke
>Priority: Minor
> Attachments: SOLR-13812.patch
>
>
> In 
> https://github.com/apache/lucene-solr/commit/4fedd7bd77219223cb09a660a3e2ce0e89c26eea#diff-21d4224105244d0fb50fe7e586a8495d
>  on https://github.com/apache/lucene-solr/pull/300 for SOLR-11831 
> [~diegoceccarelli] proposes to add javadocs and uneven length parameter 
> rejection for the {{SolrTestCaseJ4.params(String...)}} method.
> This ticket proposes to do that plus to also add basic test coverage for the 
> method, separately from the unrelated SOLR-11831 changes.
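The proposed contract can be sketched as follows (hypothetical internals; the real SolrTestCaseJ4 method builds Solr params rather than a plain Map): alternating key/value varargs, with an explicit rejection of uneven-length input instead of silently dropping the trailing argument.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ParamsSketch {
    /**
     * Builds a parameter map from alternating key/value arguments.
     *
     * @throws IllegalArgumentException if an odd number of arguments is given
     */
    static Map<String, String> params(String... kv) {
        if (kv.length % 2 != 0) {
            throw new IllegalArgumentException(
                "params() requires an even number of arguments, got " + kv.length);
        }
        Map<String, String> out = new LinkedHashMap<>();
        for (int i = 0; i < kv.length; i += 2) {
            out.put(kv[i], kv[i + 1]);
        }
        return out;
    }
}
```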






[jira] [Created] (LUCENE-8998) OverviewImplTest.testIsOptimized reproducible failure

2019-10-02 Thread Chris M. Hostetter (Jira)
Chris M. Hostetter created LUCENE-8998:
--

 Summary: OverviewImplTest.testIsOptimized reproducible failure
 Key: LUCENE-8998
 URL: https://issues.apache.org/jira/browse/LUCENE-8998
 Project: Lucene - Core
  Issue Type: Bug
  Components: luke
Reporter: Chris M. Hostetter
Assignee: Tomoko Uchida


The following seed reproduces reliably for me on master...

(NOTE: the {{ERROR StatusLogger}} messages, including the one about the 
AccessControlException, occur even with other seeds when the test passes)

{noformat}
[mkdir] Created dir: /home/hossman/lucene/alt_dev/lucene/build/luke/test
[junit4:pickseed] Seed property 'tests.seed' already defined: 9123DD19C50D658
[mkdir] Created dir: 
/home/hossman/lucene/alt_dev/lucene/build/luke/test/temp
   [junit4] <JUnit4> says cześć! Master seed: 9123DD19C50D658
   [junit4] Executing 1 suite with 1 JVM.
   [junit4] 
   [junit4] Started J0 PID(8576@localhost).
   [junit4] Suite: org.apache.lucene.luke.models.overview.OverviewImplTest
   [junit4]   2> ERROR StatusLogger No Log4j 2 configuration file found. Using 
default configuration (logging only errors to the console), or user 
programmatically provided configurations. Set system property 'log4j2.debug' to 
show Log4j 2 internal initialization logging. See 
https://logging.apache.org/log4j/2.x/manual/configuration.html for instructions 
on how to configure Log4j 2
   [junit4]   2> ERROR StatusLogger Could not reconfigure JMX
   [junit4]   2>  java.security.AccessControlException: access denied 
("javax.management.MBeanServerPermission" "createMBeanServer")
   [junit4]   2>at 
java.base/java.security.AccessControlContext.checkPermission(AccessControlContext.java:472)
   [junit4]   2>at 
java.base/java.security.AccessController.checkPermission(AccessController.java:897)
   [junit4]   2>at 
java.base/java.lang.SecurityManager.checkPermission(SecurityManager.java:322)
   [junit4]   2>at 
java.management/java.lang.management.ManagementFactory.getPlatformMBeanServer(ManagementFactory.java:479)
   [junit4]   2>at 
org.apache.logging.log4j.core.jmx.Server.reregisterMBeansAfterReconfigure(Server.java:140)
   [junit4]   2>at 
org.apache.logging.log4j.core.LoggerContext.setConfiguration(LoggerContext.java:559)
   [junit4]   2>at 
org.apache.logging.log4j.core.LoggerContext.reconfigure(LoggerContext.java:620)
   [junit4]   2>at 
org.apache.logging.log4j.core.LoggerContext.reconfigure(LoggerContext.java:637)
   [junit4]   2>at 
org.apache.logging.log4j.core.LoggerContext.start(LoggerContext.java:231)
   [junit4]   2>at 
org.apache.logging.log4j.core.impl.Log4jContextFactory.getContext(Log4jContextFactory.java:153)
   [junit4]   2>at 
org.apache.logging.log4j.core.impl.Log4jContextFactory.getContext(Log4jContextFactory.java:45)
   [junit4]   2>at 
org.apache.logging.log4j.LogManager.getContext(LogManager.java:194)
   [junit4]   2>at 
org.apache.logging.log4j.LogManager.getLogger(LogManager.java:581)
   [junit4]   2>at 
org.apache.lucene.luke.util.LoggerFactory.getLogger(LoggerFactory.java:70)
   [junit4]   2>at 
org.apache.lucene.luke.models.util.IndexUtils.<clinit>(IndexUtils.java:62)
   [junit4]   2>at 
org.apache.lucene.luke.models.LukeModel.<init>(LukeModel.java:60)
   [junit4]   2>at 
org.apache.lucene.luke.models.overview.OverviewImpl.<init>(OverviewImpl.java:50)
   [junit4]   2>at 
org.apache.lucene.luke.models.overview.OverviewImplTest.testIsOptimized(OverviewImplTest.java:77)
   [junit4]   2>at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   [junit4]   2>at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
   [junit4]   2>at 
java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   [junit4]   2>at 
java.base/java.lang.reflect.Method.invoke(Method.java:566)
   [junit4]   2>at 
com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1750)
   [junit4]   2>at 
com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:938)
   [junit4]   2>at 
com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:974)
   [junit4]   2>at 
com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:988)
   [junit4]   2>at 
org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:49)
   [junit4]   2>at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
   [junit4]   2>at 
org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48)
   [junit4]   2>at 
or

[GitHub] [lucene-solr] jpountz commented on issue #905: LUCENE-8990: Add estimateDocCount(visitor) method to PointValues

2019-10-02 Thread GitBox
jpountz commented on issue #905: LUCENE-8990: Add estimateDocCount(visitor) 
method to PointValues
URL: https://github.com/apache/lucene-solr/pull/905#issuecomment-537680322
 
 
   @colings86 pointed me to 
https://math.stackexchange.com/questions/1175295/urn-problem-probability-of-drawing-balls-of-k-unique-colors
 which seems to have the answer to the problem we are trying to solve, if we 
can make the assumption that all docs have about the same number of values, 
which is likely not always the case but still probably a fair assumption for 
this kind of heuristic?
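Under the equal-values-per-document assumption discussed above, the urn-problem answer can be sketched numerically. The class and method names below are ours (not part of any Lucene patch), and this is an illustrative estimator only: the probability that a given document contributes none of the m matched values is a product of hypergeometric terms.

```java
// Hypothetical sketch: estimate how many distinct documents a set of m
// matching point values touches, assuming every document holds the same
// number of values (T / D). Not the actual Lucene implementation.
public final class DocCountEstimate {
    /**
     * @param totalValues    T: total point values in the segment
     * @param totalDocs      D: total documents
     * @param matchingValues m: number of values the query matches
     */
    static double estimateDocCount(long totalValues, long totalDocs, long matchingValues) {
        double valuesPerDoc = (double) totalValues / totalDocs;
        // P(a given doc contributes none of the m matched values):
        // product over the m draws of (remaining non-doc values / remaining values)
        double pUntouched = 1.0;
        for (long i = 0; i < matchingValues; i++) {
            pUntouched *= (totalValues - valuesPerDoc - i) / (totalValues - i);
            if (pUntouched <= 0) { pUntouched = 0; break; }
        }
        return totalDocs * (1.0 - pUntouched);
    }

    public static void main(String[] args) {
        // 1M values over 100k docs (10 values/doc): matching a single value
        // touches ~1 doc; matching many values approaches the total doc count.
        System.out.println(estimateDocCount(1_000_000, 100_000, 1));
        System.out.println(estimateDocCount(1_000_000, 100_000, 500_000));
    }
}
```

Matching few points yields an estimate close to m (each point likely in a distinct document), which is the behavior the comment above asks for.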


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] jpountz edited a comment on issue #905: LUCENE-8990: Add estimateDocCount(visitor) method to PointValues

2019-10-02 Thread GitBox
jpountz edited a comment on issue #905: LUCENE-8990: Add 
estimateDocCount(visitor) method to PointValues
URL: https://github.com/apache/lucene-solr/pull/905#issuecomment-537456383
 
 
   I was thinking more that there should be a better formula, one that takes 
into account the fact that when you match few points, it is more likely that 
the points all belong to different documents than to the same document.





[jira] [Comment Edited] (LUCENE-8993) Change Maven POM repository URLs to https

2019-10-02 Thread Uwe Schindler (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-8993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16943150#comment-16943150
 ] 

Uwe Schindler edited comment on LUCENE-8993 at 10/2/19 8:41 PM:


After deleting the local Maven repos on Jenkins, I had to fix this issue again: 
https://issues.apache.org/jira/browse/LUCENE-2562?focusedCommentId=16816237&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16816237

It looks like the Ant/Ivy build no longer works on a completely clean 
infrastructure (no Maven repo, no Ivy cache).


was (Author: thetaphi):
After deleting the Maven repo, I had to fix this issue again: 
https://issues.apache.org/jira/browse/LUCENE-2562?focusedCommentId=16816237&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16816237

> Change Maven POM repository URLs to https
> -
>
> Key: LUCENE-8993
> URL: https://issues.apache.org/jira/browse/LUCENE-8993
> Project: Lucene - Core
>  Issue Type: Task
>  Components: general/build
>Affects Versions: 7.7.2, 8.2, 8.1.1
>Reporter: Uwe Schindler
>Assignee: Uwe Schindler
>Priority: Major
> Fix For: master (9.0), 8.3
>
> Attachments: LUCENE-8993.patch
>
>
> After fixing LUCENE-8807 I figured out today, that Lucene's build system uses 
> HTTPS URLs everywhere. But the POMs deployed to Maven central still use http 
> (I assumed that those are inherited from the ANT build).
> This will fix it for later versions by changing the POM templates. Hopefully 
> this will not happen in Gradle!
> [~markrmil...@gmail.com]: Can you make sure that the new Gradle build uses 
> HTTPS for all hard configured repositories (like Cloudera)?






[jira] [Commented] (LUCENE-8993) Change Maven POM repository URLs to https

2019-10-02 Thread Uwe Schindler (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-8993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16943150#comment-16943150
 ] 

Uwe Schindler commented on LUCENE-8993:
---

After deleting the Maven repo, I had to fix this issue again: 
https://issues.apache.org/jira/browse/LUCENE-2562?focusedCommentId=16816237&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16816237

> Change Maven POM repository URLs to https
> -
>
> Key: LUCENE-8993
> URL: https://issues.apache.org/jira/browse/LUCENE-8993
> Project: Lucene - Core
>  Issue Type: Task
>  Components: general/build
>Affects Versions: 7.7.2, 8.2, 8.1.1
>Reporter: Uwe Schindler
>Assignee: Uwe Schindler
>Priority: Major
> Fix For: master (9.0), 8.3
>
> Attachments: LUCENE-8993.patch
>
>
> After fixing LUCENE-8807 I figured out today, that Lucene's build system uses 
> HTTPS URLs everywhere. But the POMs deployed to Maven central still use http 
> (I assumed that those are inherited from the ANT build).
> This will fix it for later versions by changing the POM templates. Hopefully 
> this will not happen in Gradle!
> [~markrmil...@gmail.com]: Can you make sure that the new Gradle build uses 
> HTTPS for all hard configured repositories (like Cloudera)?






[jira] [Commented] (SOLR-13790) LRUStatsCache size explosion and ineffective caching

2019-10-02 Thread David Wayne Smiley (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16943149#comment-16943149
 ] 

David Wayne Smiley commented on SOLR-13790:
---

Interesting; your proposal makes sense.  Thanks [~ab].

> LRUStatsCache size explosion and ineffective caching
> 
>
> Key: SOLR-13790
> URL: https://issues.apache.org/jira/browse/SOLR-13790
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: 7.7.2, 8.2, 8.3
>Reporter: Andrzej Bialecki
>Assignee: Andrzej Bialecki
>Priority: Critical
> Fix For: 7.7.3, 8.3
>
> Attachments: SOLR-13790.patch, SOLR-13790.patch
>
>
> On a sizeable cluster with multi-shard multi-replica collections, when 
> {{LRUStatsCache}} was in use we encountered excessive memory usage, which 
> consequently led to severe performance problems.
> On a closer examination of the heapdumps it became apparent that when 
> {{LRUStatsCache.addToPerShardTermStats}} is called it creates instances of 
> {{FastLRUCache}} using the passed {{shard}} argument - however, the value of 
> this argument is not a simple shard name but instead it's a randomly ordered 
> list of ALL replica URLs for this shard.
> As a result, due to the combinatoric number of possible keys, over time the 
> map in {{LRUStatsCache.perShardTermStats}} grew to contain ~2 million entries.
> The fix seems to be simply to extract the shard name and cache using this 
> name instead of the full string value of the {{shard}} parameter. Existing 
> unit tests also need much improvement.
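The core of the fix described above is deriving a stable cache key instead of using the raw {{shard}} string. The sketch below is illustrative only, not the actual SOLR-13790 patch: it assumes the shard argument is a '|'-separated list of replica URLs in random order, and simply makes the key order-independent so all orderings map to one cache entry.

```java
import java.util.Arrays;

// Illustrative only (not the actual SOLR-13790 patch): normalize a shard
// string that is a '|'-separated list of replica URLs in random order, so
// every ordering of the same replica set produces the same cache key.
public final class ShardKey {
    static String canonicalKey(String shard) {
        String[] replicas = shard.split("\\|");
        Arrays.sort(replicas);               // order-independent
        return String.join("|", replicas);   // one key per replica set
    }

    public static void main(String[] args) {
        String a = "http://host2/solr/c_shard1_replica_n2|http://host1/solr/c_shard1_replica_n1";
        String b = "http://host1/solr/c_shard1_replica_n1|http://host2/solr/c_shard1_replica_n2";
        System.out.println(canonicalKey(a).equals(canonicalKey(b))); // true
    }
}
```

The actual fix goes further by extracting the shard *name*, which also collapses keys when the replica set itself changes; sorting alone only fixes the ordering combinatorics.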






[GitHub] [lucene-solr] thomaswoeckinger commented on a change in pull request #902: SOLR-13795: Reload solr core after schema is persisted.

2019-10-02 Thread GitBox
thomaswoeckinger commented on a change in pull request #902: SOLR-13795: Reload 
solr core after schema is persisted.
URL: https://github.com/apache/lucene-solr/pull/902#discussion_r330756503
 
 

 ##
 File path: solr/core/src/java/org/apache/solr/schema/IndexSchema.java
 ##
 @@ -138,7 +138,7 @@
 
  protected List<SchemaField> fieldsWithDefaultValue = new ArrayList<>();
  protected Collection<SchemaField> requiredFields = new HashSet<>();
-  protected volatile DynamicField[] dynamicFields;
 
 Review comment:
   > @sarowe why was this volatile? It's fishy to see this as the only volatile 
field.
   
   I was looking for any double-checked locking pattern or lazy init and found 
nothing. As you mentioned, it is the only volatile field, so I removed it; it 
makes no sense from my point of view.





[GitHub] [lucene-solr] thomaswoeckinger commented on a change in pull request #902: SOLR-13795: Reload solr core after schema is persisted.

2019-10-02 Thread GitBox
thomaswoeckinger commented on a change in pull request #902: SOLR-13795: Reload 
solr core after schema is persisted.
URL: https://github.com/apache/lucene-solr/pull/902#discussion_r330755828
 
 

 ##
 File path: solr/core/src/java/org/apache/solr/schema/IndexSchema.java
 ##
 @@ -1556,48 +1547,46 @@ SimpleOrderedMap<Object> getProperties(SchemaField sf) {
 }
   }
 }
-if (null != dynamicCopyFields) {
 
 Review comment:
   > Is this big diff just a re-indent after removing the wrapping null check?
   
   Yes just removed the null check.





[jira] [Commented] (SOLR-8241) Evaluate W-TinyLfu cache

2019-10-02 Thread David Wayne Smiley (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-8241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16943141#comment-16943141
 ] 

David Wayne Smiley commented on SOLR-8241:
--

Woohoo!  Thanks [~ab] and for your extreme persistence [~ben.manes].  Better 
late than never.  I'd hope to see this as the default in solr configs in 9.0.

> Evaluate W-TinyLfu cache
> 
>
> Key: SOLR-8241
> URL: https://issues.apache.org/jira/browse/SOLR-8241
> Project: Solr
>  Issue Type: Improvement
>  Components: search
>Reporter: Ben Manes
>Assignee: Andrzej Bialecki
>Priority: Major
> Fix For: master (9.0)
>
> Attachments: EvictionBenchmark.png, GetPutBenchmark.png, 
> SOLR-8241.patch, SOLR-8241.patch, SOLR-8241.patch, SOLR-8241.patch, 
> SOLR-8241.patch, SOLR-8241.patch, caffeine-benchmark.txt, proposal.patch, 
> solr_caffeine.patch.gz, solr_jmh_results.json
>
>
> SOLR-2906 introduced an LFU cache and in-progress SOLR-3393 makes it O(1). 
> The discussions seem to indicate that the higher hit rate (vs LRU) is offset 
> by the slower performance of the implementation. An original goal appeared to 
> be to introduce ARC, a patented algorithm that uses ghost entries to retain 
> history information.
> My analysis of Window TinyLfu indicates that it may be a better option. It 
> uses a frequency sketch to compactly estimate an entry's popularity. It uses 
> LRU to capture recency and operate in O(1) time. When using available 
> academic traces the policy provides a near optimal hit rate regardless of the 
> workload.
> I'm getting ready to release the policy in Caffeine, which Solr already has a 
> dependency on. But, the code is fairly straightforward and a port into Solr's 
> caches instead is a pragmatic alternative. More interesting is what the 
> impact would be in Solr's workloads and feedback on the policy's design.
> https://github.com/ben-manes/caffeine/wiki/Efficiency
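The "frequency sketch" idea described above can be illustrated with a toy count-min sketch in plain Java. This is only a teaching sketch: Caffeine's real implementation uses 4-bit counters with periodic aging and is considerably more compact, and the class here is ours, not Caffeine's.

```java
import java.util.Random;

// Toy count-min sketch illustrating the frequency-sketch idea behind
// W-TinyLfu: popularity is estimated compactly; the estimate may
// overestimate (hash collisions) but never underestimates.
public final class FreqSketch {
    private final int[][] table;
    private final int[] seeds;

    FreqSketch(int rows, int width) {
        table = new int[rows][width];
        seeds = new int[rows];
        Random rnd = new Random(42);
        for (int i = 0; i < rows; i++) seeds[i] = rnd.nextInt();
    }

    private int index(int row, Object key) {
        int h = key.hashCode() ^ seeds[row];
        return Math.floorMod(h, table[row].length);
    }

    void increment(Object key) {
        for (int r = 0; r < table.length; r++) table[r][index(r, key)]++;
    }

    // Minimum over rows bounds the true count from above.
    int estimate(Object key) {
        int min = Integer.MAX_VALUE;
        for (int r = 0; r < table.length; r++) min = Math.min(min, table[r][index(r, key)]);
        return min;
    }

    public static void main(String[] args) {
        FreqSketch s = new FreqSketch(4, 256);
        for (int i = 0; i < 100; i++) s.increment("hot");
        s.increment("cold");
        System.out.println(s.estimate("hot") >= 100); // never underestimates
    }
}
```

In W-TinyLfu this estimate is what decides, on eviction, whether a cache candidate is more popular than the victim it would replace.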






[GitHub] [lucene-solr] dsmiley commented on a change in pull request #902: SOLR-13795: Reload solr core after schema is persisted.

2019-10-02 Thread GitBox
dsmiley commented on a change in pull request #902: SOLR-13795: Reload solr 
core after schema is persisted.
URL: https://github.com/apache/lucene-solr/pull/902#discussion_r330750867
 
 

 ##
 File path: solr/core/src/java/org/apache/solr/schema/ManagedIndexSchema.java
 ##
 @@ -81,7 +81,7 @@
 /** Solr-managed schema - non-user-editable, but can be mutable via internal 
and external REST API requests. */
 public final class ManagedIndexSchema extends IndexSchema {
 
-  private boolean isMutable = false;
+  private final boolean isMutable;
 
 Review comment:
   nice





[GitHub] [lucene-solr] dsmiley commented on a change in pull request #902: SOLR-13795: Reload solr core after schema is persisted.

2019-10-02 Thread GitBox
dsmiley commented on a change in pull request #902: SOLR-13795: Reload solr 
core after schema is persisted.
URL: https://github.com/apache/lucene-solr/pull/902#discussion_r330751954
 
 

 ##
 File path: solr/core/src/java/org/apache/solr/schema/IndexSchema.java
 ##
 @@ -1556,48 +1547,46 @@ SimpleOrderedMap<Object> getProperties(SchemaField sf) {
 }
   }
 }
-if (null != dynamicCopyFields) {
 
 Review comment:
   Is this big diff just a re-indent after removing the wrapping null check?





[GitHub] [lucene-solr] dsmiley commented on a change in pull request #902: SOLR-13795: Reload solr core after schema is persisted.

2019-10-02 Thread GitBox
dsmiley commented on a change in pull request #902: SOLR-13795: Reload solr 
core after schema is persisted.
URL: https://github.com/apache/lucene-solr/pull/902#discussion_r330751354
 
 

 ##
 File path: 
solr/core/src/test/org/apache/solr/rest/schema/TestBulkSchemaAPI.java
 ##
 @@ -184,8 +188,6 @@ public void testAnalyzerClass() throws Exception {
 response = restTestHarness.post("/schema", 
json(addFieldTypeAnalyzerWithClass + suffix));
 map = (Map) fromJSONString(response);
 assertNull(response, map.get("error"));
-
-
restTestHarness.checkAdminResponseStatus("/admin/cores?wt=xml&action=RELOAD&core="
 + coreName, "0");
 
 Review comment:
   I'm glad you remembered





[GitHub] [lucene-solr] dsmiley commented on a change in pull request #902: SOLR-13795: Reload solr core after schema is persisted.

2019-10-02 Thread GitBox
dsmiley commented on a change in pull request #902: SOLR-13795: Reload solr 
core after schema is persisted.
URL: https://github.com/apache/lucene-solr/pull/902#discussion_r330752451
 
 

 ##
 File path: solr/core/src/java/org/apache/solr/schema/IndexSchema.java
 ##
 @@ -138,7 +138,7 @@
 
  protected List<SchemaField> fieldsWithDefaultValue = new ArrayList<>();
  protected Collection<SchemaField> requiredFields = new HashSet<>();
-  protected volatile DynamicField[] dynamicFields;
 
 Review comment:
   @sarowe why was this volatile?  It's fishy to see this as the only volatile 
field.





[jira] [Commented] (LUCENE-8989) IndexSearcher Should Handle Rejection of Concurrent Task

2019-10-02 Thread Michael McCandless (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-8989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16943135#comment-16943135
 ] 

Michael McCandless commented on LUCENE-8989:


{quote}Suppose the executor has been shut down and is rejecting all requests 
for new work. In that case an Exception here would help the caller to 
understand they are doing something unusual instead of burying the error case 
and attempting to handle it.
{quote}
Hmm, that is a good point – maybe we should not handle this exception and let 
it throw?

> IndexSearcher Should Handle Rejection of Concurrent Task
> 
>
> Key: LUCENE-8989
> URL: https://issues.apache.org/jira/browse/LUCENE-8989
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Atri Sharma
>Priority: Major
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> As discussed in [https://github.com/apache/lucene-solr/pull/815,] 
> IndexSearcher should handle the case when the executor rejects the execution 
> of a task (unavailability of threads?).
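One possible handling, sketched under the assumption that degrading to the caller thread is acceptable (the class and method names here are ours, not Lucene's); the alternative discussed in the comments is simply letting the exception propagate.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;
import java.util.function.Supplier;

// Sketch: if the executor rejects a search slice (e.g. it has been shut
// down or its queue is full), run the slice on the caller thread instead
// of failing the whole search.
public final class RejectionFallback {
    static String runSlice(ExecutorService executor, Supplier<String> slice) {
        try {
            return executor.submit(slice::get).get();
        } catch (RejectedExecutionException e) {
            return slice.get();  // degrade gracefully: caller thread does the work
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        ExecutorService exec = Executors.newFixedThreadPool(1);
        exec.shutdown();  // further submissions are rejected
        System.out.println(runSlice(exec, () -> "done"));  // prints "done"
    }
}
```

The trade-off: silent fallback hides misuse of a shut-down executor, which is exactly the concern raised in the quoted comment.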






[GitHub] [lucene-solr] thomaswoeckinger commented on issue #665: Fixes SOLR-13539

2019-10-02 Thread GitBox
thomaswoeckinger commented on issue #665: Fixes SOLR-13539
URL: https://github.com/apache/lucene-solr/pull/665#issuecomment-537648923
 
 
   The pre-commit check is still toggling between pass and fail.





[jira] [Commented] (SOLR-13722) Package Management APIs

2019-10-02 Thread David Wayne Smiley (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16943087#comment-16943087
 ] 

David Wayne Smiley commented on SOLR-13722:
---

I'm confused about the status here.  It's in master but not 8x; was that 
deliberate?  This is not an implied "ask" to merge to 8x, as it ought to be 
reviewed anyway.

> Package Management APIs
> ---
>
> Key: SOLR-13722
> URL: https://issues.apache.org/jira/browse/SOLR-13722
> Project: Solr
>  Issue Type: Sub-task
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Noble Paul
>Assignee: Noble Paul
>Priority: Major
>  Labels: package
>
> This ticket totally eliminates the need for an external service to host the 
> jars, so a URL will no longer be required. An external URL leads to 
> unreliability because the service may go offline, or it can be DDoSed if/when 
> too many requests are sent to it.
>  
>  
>  Add a jar to cluster as follows
> {code:java}
> curl -X POST -H 'Content-Type: application/octet-stream' --data-binary 
> @myjar.jar http://localhost:8983/api/cluster/filestore/package?name=myjar.jar
> {code}
> This does the following operations
>  * Upload this jar to all the live nodes in the system
>  * The name of the file is the {{sha256-}} of the file/payload
>  * The store is agnostic of the content of the file/payload
> h2. How it works
> A blob that is POSTed to the {{/api/cluster/blob}} end point is persisted 
> locally & all nodes are instructed to download it from this node or from any 
> other available node. If a node comes up later, it can query other nodes in 
> the system and download the blobs as required
> h2. {{add}} package command
> {code:java}
> curl -X POST -H 'Content-type:application/json' --data-binary '{
>   "add": {
>"name": "my-package" ,
>   "file":{"id" : "", "sig" : ""}
>   }}' http://localhost:8983/api/cluster/package
> {code}
>  
>  






[jira] [Commented] (SOLR-13813) Shared storage online split support

2019-10-02 Thread Yonik Seeley (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16943070#comment-16943070
 ] 

Yonik Seeley commented on SOLR-13813:
-

The other notable thing is that I *thought* the test was solid when 
sharedStorage=false, but then I looped the test overnight and also got a 
failure.
I'm going to extract a version of this test that doesn't depend on SOLR-13101 
and see if it's still reproducible.

> Shared storage online split support
> ---
>
> Key: SOLR-13813
> URL: https://issues.apache.org/jira/browse/SOLR-13813
> Project: Solr
>  Issue Type: Sub-task
>Reporter: Yonik Seeley
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The strategy for online shard splitting is the same as that for normal 
> (non-SHARED) shards.
> During a split, the leader will forward updates to sub-shard leaders, those 
> updates will be buffered by the transaction log while the split is in 
> progress, and then the buffered updates are replayed.
> One change that was added was to push the local index to blob store after 
> buffered updates are applied (but before it is marked as ACTIVE):
> See 
> https://github.com/apache/lucene-solr/commit/fe17c813f5fe6773c0527f639b9e5c598b98c7d4#diff-081b7c2242d674bb175b41b6afc21663
> This issue is about adding tests and ensuring that online shard splitting 
> (while updates are flowing) works reliably.






[GitHub] [lucene-solr] magibney edited a comment on issue #892: LUCENE-8972: Add ICUTransformCharFilter, to support pre-tokenizer ICU text transformation

2019-10-02 Thread GitBox
magibney edited a comment on issue #892: LUCENE-8972: Add 
ICUTransformCharFilter, to support pre-tokenizer ICU text transformation
URL: https://github.com/apache/lucene-solr/pull/892#issuecomment-537538822
 
 
   Here's a thought: what if we provided a boolean configuration option like 
`assumeExternalUnicodeNormalization`. Many of these transforms work on NFD 
input, and produce NFC output, but they generally are configured defensively 
(not assuming input to be NFD, and not assuming that output will be externally 
converted to NFC).
   
   This is understandable, but results in the odd situation (for example) that 
an analysis component like "ICUTransformFilter(Cyrillic-Latin)" would have NFC 
output, but _only_ for characters whose input representation matched the 
top-level Cyrillic-Latin filter (which is pretty restrictive). Input characters 
that didn't match the top-level filter would be untouched by any component of 
the underlying CompoundTransliterator. So if you want fully unicode-normalized 
output (and in the context of an analysis chain, most do), you have to 
separately apply post-transform NFC normalization anyway.
   
At best, this ends up doing some redundant work; but for the performance 
case we're considering here, there are particular implications. NFC, as a 
trailing transformation step, is both _very_ common and _very_ active -- active 
in the sense that it will in many common contexts block output waiting for 
combining diacritics for literally almost every character. If we know we're 
externally applying unicode normalization over the entire output, skipping 
baked-in post-NFC for every transform component avoids redundant work, but more 
importantly avoids a common case that's virtually guaranteed to result in a 
substantial amount of partial transliteration, rollback, etc. I think this can 
be done relatively cleanly using Transliterator getElements(), toRules(false), 
and createFromRules(...).
   
   I'd be curious to know what you think, @msokolov, and perhaps @rmuir?
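The "externally applied normalization" step itself is simple at the JDK level. A minimal sketch using java.text.Normalizer (ICU's Transliterator is deliberately not involved here; this only shows the trailing NFC pass that the proposal would hoist out of each transform component):

```java
import java.text.Normalizer;

// Sketch of the external trailing NFC step: rather than every transform
// component producing NFC output itself, apply one normalization pass over
// the final output of the whole chain.
public final class ExternalNfc {
    static String toNfc(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFC);
    }

    public static void main(String[] args) {
        String decomposed = "e\u0301";        // 'e' + combining acute accent (NFD)
        String composed = toNfc(decomposed);  // composed form "\u00e9"
        System.out.println(composed.equals("\u00e9")); // true
    }
}
```

Because NFC must buffer until it knows no further combining marks follow, hoisting it out of the per-component transforms is what avoids the partial-transliteration/rollback behavior described above.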





[jira] [Commented] (SOLR-13813) Shared storage online split support

2019-10-02 Thread Yonik Seeley (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16943057#comment-16943057
 ] 

Yonik Seeley commented on SOLR-13813:
-

This PR ( https://github.com/apache/lucene-solr/pull/918 ) adds a simple test.
It usually fails for shared storage :-( . Example:
{code}
java.lang.AssertionError: 
Expected :50
Actual   :49
{code}
And I normally see background exceptions like the following during the run:
{code}
Error occured pulling shard=shard1_1 collection=livesplit1 from shared store 
java.lang.Exception: Local Directory content 
/private/var/folders/_f/2q_bxy9d0kz45_rk451rds3n9r_x9g/T/solr.store.blob.SharedStorageSplitTest_E4343DDDB931B9AE-001/tempDir-001/node1/./livesplit1_shard1_1_replica_s4/data/index/
 has changed since Blob pull started. Aborting pull.
at 
org.apache.solr.store.blob.util.BlobStoreUtils.syncLocalCoreWithSharedStore(BlobStoreUtils.java:128)
at 
org.apache.solr.update.processor.DistributedZkUpdateProcessor.readFromSharedStoreIfNecessary(DistributedZkUpdateProcessor.java:1096)
at 
org.apache.solr.update.processor.DistributedZkUpdateProcessor.processCommit(DistributedZkUpdateProcessor.java:202)
at 
org.apache.solr.update.processor.LogUpdateProcessorFactory$LogUpdateProcessor.processCommit(LogUpdateProcessorFactory.java:160)
at 
org.apache.solr.handler.RequestHandlerUtils.handleCommit(RequestHandlerUtils.java:69)
at 
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:62)
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:200)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:2609)
at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:816)

{code}

Best guess is that this is caused by a lack of concurrency support that still 
needs to be addressed in the blob puller/pusher code.
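The "Local Directory content has changed since Blob pull started" failure above is essentially a snapshot-and-compare conflict check. A minimal sketch of that idea (class and method names are illustrative, not Solr's actual `BlobStoreUtils` API):

```java
import java.util.Map;
import java.util.Objects;

// Hypothetical sketch of the check behind "Local Directory content has changed
// since Blob pull started": snapshot the local file listing (name -> length)
// before the pull, compare on completion, and abort if a concurrent writer
// modified the index in between.
public class PullConflictCheck {

  /** Returns true when the directory listing is unchanged since the snapshot. */
  public static boolean isUnchanged(Map<String, Long> before, Map<String, Long> after) {
    return Objects.equals(before, after);
  }

  public static void main(String[] args) {
    Map<String, Long> before = Map.of("segments_1", 100L, "_0.cfs", 4096L);
    Map<String, Long> same = Map.of("segments_1", 100L, "_0.cfs", 4096L);
    Map<String, Long> changed = Map.of("segments_2", 120L, "_0.cfs", 4096L);
    System.out.println(isUnchanged(before, same));    // true: safe to finish the pull
    System.out.println(isUnchanged(before, changed)); // false: abort the pull
  }
}
```

Without coordination between the puller and concurrent indexing, such a check can only detect the race and abort, which matches the background exceptions seen in the test run.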

> Shared storage online split support
> ---
>
> Key: SOLR-13813
> URL: https://issues.apache.org/jira/browse/SOLR-13813
> Project: Solr
>  Issue Type: Sub-task
>Reporter: Yonik Seeley
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The strategy for online shard splitting is the same as that for normal 
> (non-SHARED) shards.
> During a split, the leader will forward updates to sub-shard leaders, those 
> updates will be buffered by the transaction log while the split is in 
> progress, and then the buffered updates are replayed.
> One change that was added was to push the local index to blob store after 
> buffered updates are applied (but before it is marked as ACTIVE):
> See 
> https://github.com/apache/lucene-solr/commit/fe17c813f5fe6773c0527f639b9e5c598b98c7d4#diff-081b7c2242d674bb175b41b6afc21663
> This issue is about adding tests and ensuring that online shard splitting 
> (while updates are flowing) works reliably.






[GitHub] [lucene-solr] yonik opened a new pull request #918: SOLR-13813: SHARED: add basic test for online shard splitting

2019-10-02 Thread GitBox
yonik opened a new pull request #918: SOLR-13813: SHARED: add basic test for 
online shard splitting
URL: https://github.com/apache/lucene-solr/pull/918
 
 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services




[jira] [Commented] (SOLR-8241) Evaluate W-TinyLfu cache

2019-10-02 Thread Andrzej Bialecki (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-8241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16943041#comment-16943041
 ] 

Andrzej Bialecki commented on SOLR-8241:


Updated patch:
* reduced contention in stats counting by using LongAdder-s instead of 
AtomicLong-s.
* added option to set a maxRamMB limit (it's an either/or with the maxSize 
limit). I'm not sure I did the right thing when changing the value of this 
option - basically, if the existing cache was not weighted, {{setMaxRamMB}} 
rebuilds the cache instead of just changing the policy limits.
* added unit test for testing the limit changes on a live cache.

If this patch looks more or less OK, I'll add the RefGuide changes and commit 
it shortly (hopefully in time for 8.3 :) )
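The first bullet above swaps AtomicLong for LongAdder to reduce contention in stats counting. A minimal sketch of why that helps (the class name is illustrative, not Solr's actual cache code): LongAdder stripes increments across internal cells, so many threads bumping a counter don't all CAS on one word.

```java
import java.util.concurrent.atomic.LongAdder;

// Illustrative stats holder: increments are cheap and contention-free-ish;
// sum() folds the per-thread cells and is fine for occasional reads.
public class CacheStats {
  private final LongAdder hits = new LongAdder();
  private final LongAdder lookups = new LongAdder();

  public void onLookup(boolean hit) {
    lookups.increment();
    if (hit) {
      hits.increment();
    }
  }

  public long hitCount() { return hits.sum(); }
  public long lookupCount() { return lookups.sum(); }

  public static void main(String[] args) {
    CacheStats stats = new CacheStats();
    stats.onLookup(true);
    stats.onLookup(false);
    System.out.println(stats.hitCount() + "/" + stats.lookupCount()); // prints "1/2"
  }
}
```

The trade-off is that `sum()` is not an atomic snapshot, which is acceptable for monitoring-style cache statistics.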

> Evaluate W-TinyLfu cache
> 
>
> Key: SOLR-8241
> URL: https://issues.apache.org/jira/browse/SOLR-8241
> Project: Solr
>  Issue Type: Improvement
>  Components: search
>Reporter: Ben Manes
>Assignee: Andrzej Bialecki
>Priority: Major
> Fix For: master (9.0)
>
> Attachments: EvictionBenchmark.png, GetPutBenchmark.png, 
> SOLR-8241.patch, SOLR-8241.patch, SOLR-8241.patch, SOLR-8241.patch, 
> SOLR-8241.patch, SOLR-8241.patch, caffeine-benchmark.txt, proposal.patch, 
> solr_caffeine.patch.gz, solr_jmh_results.json
>
>
> SOLR-2906 introduced an LFU cache and in-progress SOLR-3393 makes it O(1). 
> The discussions seem to indicate that the higher hit rate (vs LRU) is offset 
> by the slower performance of the implementation. An original goal appeared to 
> be to introduce ARC, a patented algorithm that uses ghost entries to retain 
> history information.
> My analysis of Window TinyLfu indicates that it may be a better option. It 
> uses a frequency sketch to compactly estimate an entry's popularity. It uses 
> LRU to capture recency and operates in O(1) time. When using available 
> academic traces the policy provides a near optimal hit rate regardless of the 
> workload.
> I'm getting ready to release the policy in Caffeine, which Solr already has a 
> dependency on. But, the code is fairly straightforward and a port into Solr's 
> caches instead is a pragmatic alternative. More interesting is what the 
> impact would be in Solr's workloads and feedback on the policy's design.
> https://github.com/ben-manes/caffeine/wiki/Efficiency
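The "frequency sketch" mentioned above can be illustrated with a toy count-min-style sketch. This is a simplified stand-in, not Caffeine's actual implementation (which uses 4-bit counters plus periodic halving for aging); it only shows the estimate-by-minimum idea:

```java
// Toy count-min style frequency sketch: each key hashes to one counter per
// row; the estimate is the minimum across rows, which bounds the
// over-counting caused by hash collisions.
public class ToyFrequencySketch {
  private final int[][] counters;
  private final int[] seeds = {0x9E3779B9, 0x85EBCA6B, 0xC2B2AE35};

  public ToyFrequencySketch(int width) {
    counters = new int[seeds.length][width];
  }

  private int index(int row, Object key) {
    return Math.floorMod(key.hashCode() * seeds[row], counters[row].length);
  }

  public void increment(Object key) {
    for (int row = 0; row < seeds.length; row++) {
      counters[row][index(row, key)]++;
    }
  }

  /** Estimated frequency: never under-counts, may slightly over-count. */
  public int estimate(Object key) {
    int min = Integer.MAX_VALUE;
    for (int row = 0; row < seeds.length; row++) {
      min = Math.min(min, counters[row][index(row, key)]);
    }
    return min;
  }

  public static void main(String[] args) {
    ToyFrequencySketch sketch = new ToyFrequencySketch(64);
    for (int i = 0; i < 5; i++) sketch.increment("hot");
    sketch.increment("cold");
    System.out.println(sketch.estimate("hot") + " vs " + sketch.estimate("cold"));
  }
}
```

The compactness comes from sizing the counter arrays independently of the number of distinct keys, which is what lets TinyLFU keep history far beyond the cache's capacity.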






[jira] [Updated] (SOLR-8241) Evaluate W-TinyLfu cache

2019-10-02 Thread Andrzej Bialecki (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-8241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrzej Bialecki updated SOLR-8241:
---
Attachment: SOLR-8241.patch

> Evaluate W-TinyLfu cache
> 
>
> Key: SOLR-8241
> URL: https://issues.apache.org/jira/browse/SOLR-8241
> Project: Solr
>  Issue Type: Improvement
>  Components: search
>Reporter: Ben Manes
>Assignee: Andrzej Bialecki
>Priority: Major
> Fix For: master (9.0)
>
> Attachments: EvictionBenchmark.png, GetPutBenchmark.png, 
> SOLR-8241.patch, SOLR-8241.patch, SOLR-8241.patch, SOLR-8241.patch, 
> SOLR-8241.patch, SOLR-8241.patch, caffeine-benchmark.txt, proposal.patch, 
> solr_caffeine.patch.gz, solr_jmh_results.json
>
>
> SOLR-2906 introduced an LFU cache and in-progress SOLR-3393 makes it O(1). 
> The discussions seem to indicate that the higher hit rate (vs LRU) is offset 
> by the slower performance of the implementation. An original goal appeared to 
> be to introduce ARC, a patented algorithm that uses ghost entries to retain 
> history information.
> My analysis of Window TinyLfu indicates that it may be a better option. It 
> uses a frequency sketch to compactly estimate an entry's popularity. It uses 
> LRU to capture recency and operates in O(1) time. When using available 
> academic traces the policy provides a near optimal hit rate regardless of the 
> workload.
> I'm getting ready to release the policy in Caffeine, which Solr already has a 
> dependency on. But, the code is fairly straightforward and a port into Solr's 
> caches instead is a pragmatic alternative. More interesting is what the 
> impact would be in Solr's workloads and feedback on the policy's design.
> https://github.com/ben-manes/caffeine/wiki/Efficiency






[jira] [Commented] (LUCENE-8993) Change Maven POM repository URLs to https

2019-10-02 Thread Mark Miller (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-8993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16943037#comment-16943037
 ] 

Mark Miller commented on LUCENE-8993:
-

FYI, I’ve done this, though I’ll check stuff like this again before the merge 
to master.

> Change Maven POM repository URLs to https
> -
>
> Key: LUCENE-8993
> URL: https://issues.apache.org/jira/browse/LUCENE-8993
> Project: Lucene - Core
>  Issue Type: Task
>  Components: general/build
>Affects Versions: 7.7.2, 8.2, 8.1.1
>Reporter: Uwe Schindler
>Assignee: Uwe Schindler
>Priority: Major
> Fix For: master (9.0), 8.3
>
> Attachments: LUCENE-8993.patch
>
>
> After fixing LUCENE-8807 I figured out today, that Lucene's build system uses 
> HTTPS URLs everywhere. But the POMs deployed to Maven central still use http 
> (I assumed that those are inherited from the ANT build).
> This will fix it for later versions by changing the POM templates. Hopefully 
> this will not happen in Gradle!
> [~markrmil...@gmail.com]: Can you make sure that the new Gradle build uses 
> HTTPS for all hard configured repositories (like Cloudera)?






[jira] [Commented] (SOLR-13797) SolrResourceLoader produces inconsistent results when given bad arguments

2019-10-02 Thread Anshum Gupta (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16943033#comment-16943033
 ] 

Anshum Gupta commented on SOLR-13797:
-

[~mdrob] - LGTM.

Just a minor question/suggestion. Can you also annotate clearCache w/ 
@VisibleForTesting ?

> SolrResourceLoader produces inconsistent results when given bad arguments
> -
>
> Key: SOLR-13797
> URL: https://issues.apache.org/jira/browse/SOLR-13797
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: 7.7.2, 8.2
>Reporter: Mike Drob
>Assignee: Mike Drob
>Priority: Major
> Attachments: SOLR-13797.v1.patch, SOLR-13797.v2.patch
>
>
> SolrResourceLoader will attempt to do some magic to infer what the user 
> wanted when loading TokenFilter and Tokenizer classes. However, this can end 
> up putting the wrong class in the cache such that the request succeeds the 
> first time but fails subsequent times. It should either succeed or fail 
> consistently on every call.
> This can be triggered in a variety of ways, but the simplest is maybe by 
> specifying the wrong element type in an indexing chain. Consider the field 
> type definition:
> {code:xml}
> <fieldType ...>
>   <analyzer>
>     <filter class="solr.NGramTokenizerFactory" maxGramSize="2"/>
>   </analyzer>
> </fieldType>
> {code}
> If loaded by itself (e.g. docker container for standalone validation) then 
> the schema will pass and collection will succeed, with Solr actually figuring 
> out that it needs an {{NGramTokenFilterFactory}}. However, if this is loaded 
> on a cluster with other collections where the {{NGramTokenizerFactory}} has 
> been loaded correctly then we get {{ClassCastException}}. Or if this 
> collection is loaded first then others using the Tokenizer will fail instead.
> I'd argue that succeeding on both calls is the better approach because it 
> does what the user likely wants instead of what the user explicitly asks for, 
> and creates a nicer user experience that is marginally less pedantic.






[jira] [Created] (SOLR-13813) Shared storage online split support

2019-10-02 Thread Yonik Seeley (Jira)
Yonik Seeley created SOLR-13813:
---

 Summary: Shared storage online split support
 Key: SOLR-13813
 URL: https://issues.apache.org/jira/browse/SOLR-13813
 Project: Solr
  Issue Type: Sub-task
Reporter: Yonik Seeley


The strategy for online shard splitting is the same as that for normal 
(non-SHARED) shards.
During a split, the leader will forward updates to sub-shard leaders, those 
updates will be buffered by the transaction log while the split is in progress, 
and then the buffered updates are replayed.

One change that was added was to push the local index to blob store after 
buffered updates are applied (but before it is marked as ACTIVE):
See 
https://github.com/apache/lucene-solr/commit/fe17c813f5fe6773c0527f639b9e5c598b98c7d4#diff-081b7c2242d674bb175b41b6afc21663

This issue is about adding tests and ensuring that online shard splitting 
(while updates are flowing) works reliably.
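The buffering protocol described above can be modeled as a tiny state machine: updates that arrive during the split are buffered tlog-style, then replayed before the sub-shard goes ACTIVE. This is a toy model with illustrative names, not Solr's actual update path:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Toy model of a sub-shard during a split: forwarded updates are buffered
// while the split runs; finishSplit() replays the buffer into the index
// (and, per the commit linked above, would push to the blob store) before
// the sub-shard is marked ACTIVE.
public class SubShardBuffer {
  private final Queue<String> buffered = new ArrayDeque<>();
  private final List<String> index = new ArrayList<>();
  private boolean splitting = true;

  public void onUpdate(String update) {
    if (splitting) {
      buffered.add(update); // transaction log buffers during the split
    } else {
      index.add(update);    // applied directly once ACTIVE
    }
  }

  /** Replay buffered updates, push to shared storage, then go ACTIVE. */
  public void finishSplit() {
    while (!buffered.isEmpty()) {
      index.add(buffered.poll());
    }
    // SOLR-13813's change: push the local index to the blob store here,
    // before the sub-shard is marked ACTIVE.
    splitting = false;
  }

  public List<String> indexedDocs() { return index; }

  public static void main(String[] args) {
    SubShardBuffer sub = new SubShardBuffer();
    sub.onUpdate("doc1");
    sub.onUpdate("doc2");
    sub.finishSplit();
    sub.onUpdate("doc3");
    System.out.println(sub.indexedDocs()); // prints [doc1, doc2, doc3]
  }
}
```

The reliability question the tests target is precisely the window around `finishSplit()`: updates must land in exactly one of the two paths, never be dropped or double-applied.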






[GitHub] [lucene-solr] magibney commented on issue #892: LUCENE-8972: Add ICUTransformCharFilter, to support pre-tokenizer ICU text transformation

2019-10-02 Thread GitBox
magibney commented on issue #892: LUCENE-8972: Add ICUTransformCharFilter, to 
support pre-tokenizer ICU text transformation
URL: https://github.com/apache/lucene-solr/pull/892#issuecomment-537601926
 
 
   Tried this out, and the performance gain was indeed significant. Comparing 
apples to apples here:
   
   
[charFilterPerformanceTest2.txt](https://github.com/apache/lucene-solr/files/3682491/charFilterPerformanceTest2.txt)
   
   Results (again, very quick and dirty):
   
   ```
  [junit4] Suite: org.apache.lucene.analysis.icu.TestICUTransformCharFilter
  [junit4]   2> tokenCount: 100, elapsed: 20244
  [junit4] OK  20.6s | 
TestICUTransformCharFilter.testFilterPerformanceChar
  [junit4]   2> tokenCount: 100, elapsed: 3040
  [junit4] OK  3.05s | 
TestICUTransformCharFilter.testFilterPerformanceToken
  [junit4]   2> tokenCount: 100, elapsed: 4339
  [junit4] OK  4.36s | 
TestICUTransformCharFilter.testFilterPerformanceModifiedChar
   
   ```





[jira] [Updated] (SOLR-13811) possible autoAddReplicas bug and/or (Hdfs)AutoAddReplicasIntegrationTest refactoring / fixes

2019-10-02 Thread Chris M. Hostetter (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-13811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris M. Hostetter updated SOLR-13811:
--
Attachment: hoss_local_failure_after_refactoring.log.txt
apache_Lucene-Solr-NightlyTests-8.x_221.log.txt
Status: Open  (was: Open)

As noted by gitbot, I've committed some refactoring to help clean this up and 
isolate the problematic test logic.

I'm attaching two files:
 * {{apache_Lucene-Solr-NightlyTests-8.x_221.log.txt}} - showing an example of 
how the problem has manifested in jenkins builds _prior_ to the refactoring 
I've just committed.
 * {{hoss_local_failure_after_refactoring.log.txt}} - showing how the newly 
refactored {{testRapidStopStartStopWithPropChange()}} can fail demonstrating 
the same problem in isolation.

Note that {{testRapidStopStartStopWithPropChange()}} does not fail 
deterministically – the behavior is dependent on the timing of when exactly 
{{NodeLostTrigger}} fires _after_ the node is restarted, but before it is 
stopped again. Perhaps there is a way to "pause" the triggers to increase the 
odds of this happening? ... not sure.  (It also seems to fail much more often 
in the Hdfs version of the test ... I'm not sure if that's because the 
MOVEREPLICA logic works faster/slower than in the non-Hdfs situation? ... I 
actually haven't been able to trigger the failure w/the refactoring in place)

[~ab] : can you please take a look at this and chime in with whether you think 
the current code in {{testRapidStopStartStopWithPropChange()}} is something 
that should pass reliably given the way the code is designed to work? ... if so 
please update the jira summary/description to make it clear what the underlying 
bug is, if not we should go ahead and: delete this test method, reclassify this 
issue as a "Test" task, and resolve as "DONE".

> possible autoAddReplicas bug and/or (Hdfs)AutoAddReplicasIntegrationTest 
> refactoring / fixes
> 
>
> Key: SOLR-13811
> URL: https://issues.apache.org/jira/browse/SOLR-13811
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Chris M. Hostetter
>Priority: Major
> Attachments: apache_Lucene-Solr-NightlyTests-8.x_221.log.txt, 
> hoss_local_failure_after_refactoring.log.txt
>
>
> I've noticed a pattern of failure behavior in jenkins runs of 
> {{AutoAddReplicasIntegrationTest}} (which mostly manifests in the subclass 
> {{HdfsAutoAddReplicasIntegrationTest}}, probably due to timing) which 
> indicates either:
>  # the test is too contrived, and expects {{autoAddReplicas}} to kick in in a 
> situation where the current impl of {{NodeLostTrigger}} isn't smart enough to 
> handle
>  # {{NodeLostTrigger}} _should_ be smart enough to handle this, but isn't.
> The test failure is currently somewhat finicky to reproduce, and depends on a 
> node being stopped, restarted, and stopped again – while an affected 
> collection is changed from {{autoAddReplicas=false}} to 
> {{autoAddReplicas=true}} before the second "stop"
> Regardless of which of the 2 above is true: the test itself is somewhat 
> convoluted. It creates a sequence of events (some randomized, some static) 
> and asserts specific outcomes after each – but the timing of scheduled 
> triggers like {{NodeLostTrigger}} , and the interplay of things like "pick a 
> random node to shutdown" with a subsequent "explicitly shut down node2" (even 
> if it was the node randomly shut down earlier) is confusing.
> I'm creating this issue to track two tightly dependent objectives:
>  # refactoring this test to:
>  ** better isolate the specific things it's trying to test in individual test 
> methods.
>  ** have a singular test method that triggers the specific sequence of events 
> that is currently problematic (ideally in such a way that it reliably fails).
>  # AwaitsFix this new test method until someone with a better understanding of 
> the {{autoAddReplicas}} / {{NodeLostTrigger}} code can assess if the test is 
> faulty or the code being tested is faulty.






[jira] [Commented] (SOLR-13811) possible autoAddReplicas bug and/or (Hdfs)AutoAddReplicasIntegrationTest refactoring / fixes

2019-10-02 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16943010#comment-16943010
 ] 

ASF subversion and git services commented on SOLR-13811:


Commit 18bf61504fbd9d8becff1a572642b4207dc7d54c in lucene-solr's branch 
refs/heads/branch_8x from Chris M. Hostetter
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=18bf615 ]

SOLR-13811: Refactor AutoAddReplicasIntegrationTest to isolate problematic 
situation into an AwaitsFix test method

(cherry picked from commit a57ec148e52507104fdf0f99381d2b485fa846fc)


> possible autoAddReplicas bug and/or (Hdfs)AutoAddReplicasIntegrationTest 
> refactoring / fixes
> 
>
> Key: SOLR-13811
> URL: https://issues.apache.org/jira/browse/SOLR-13811
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Chris M. Hostetter
>Priority: Major
>
> I've noticed a pattern of failure behavior in jenkins runs of 
> {{AutoAddReplicasIntegrationTest}} (which mostly manifests in the subclass 
> {{HdfsAutoAddReplicasIntegrationTest}}, probably due to timing) which 
> indicates either:
>  # the test is too contrived, and expects {{autoAddReplicas}} to kick in in a 
> situation where the current impl of {{NodeLostTrigger}} isn't smart enough to 
> handle
>  # {{NodeLostTrigger}} _should_ be smart enough to handle this, but isn't.
> The test failure is currently somewhat finicky to reproduce, and depends on a 
> node being stopped, restarted, and stopped again – while an affected 
> collection is changed from {{autoAddReplicas=false}} to 
> {{autoAddReplicas=true}} before the second "stop"
> Regardless of which of the 2 above is true: the test itself is somewhat 
> convoluted. It creates a sequence of events (some randomized, some static) 
> and asserts specific outcomes after each – but the timing of scheduled 
> triggers like {{NodeLostTrigger}} , and the interplay of things like "pick a 
> random node to shutdown" with a subsequent "explicitly shut down node2" (even 
> if it was the node randomly shut down earlier) is confusing.
> I'm creating this issue to track two tightly dependent objectives:
>  # refactoring this test to:
>  ** better isolate the specific things it's trying to test in individual test 
> methods.
>  ** have a singular test method that triggers the specific sequence of events 
> that is currently problematic (ideally in such a way that it reliably fails).
>  # AwaitsFix this new test method until someone with a better understanding of 
> the {{autoAddReplicas}} / {{NodeLostTrigger}} code can assess if the test is 
> faulty or the code being tested is faulty.






[GitHub] [lucene-solr] cbuescher commented on a change in pull request #913: LUCENE-8995: TopSuggestDocsCollector#collect should be able to signal rejection

2019-10-02 Thread GitBox
cbuescher commented on a change in pull request #913: LUCENE-8995: 
TopSuggestDocsCollector#collect should be able to signal rejection
URL: https://github.com/apache/lucene-solr/pull/913#discussion_r330672796
 
 

 ##
 File path: 
lucene/suggest/src/java/org/apache/lucene/search/suggest/document/NRTSuggester.java
 ##
 @@ -283,17 +299,25 @@ public int compare(Pair o1, Pair o2) {
* 
* If a filter is applied, the queue size is increased by
* half the number of live documents.
+   *
+   * If the collector can reject documents upon collecting, the queue size is
+   * increased by half the number of live documents again.
+   *
* 
* The maximum queue size is {@link #MAX_TOP_N_QUEUE_SIZE}
*/
-  private int getMaxTopNSearcherQueueSize(int topN, int numDocs, double 
liveDocsRatio, boolean filterEnabled) {
+  private int getMaxTopNSearcherQueueSize(int topN, int numDocs, double 
liveDocsRatio, boolean filterEnabled,
 
 Review comment:
   I considered this as well first, but then moved to the two independent 
flags. Will revert that part.
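The javadoc under review describes a sizing heuristic: grow the searcher queue by half the live documents when a filter is applied, and by half again when the collector may reject hits, capped at a maximum. A hedged sketch of that shape — the base formula and constants here are illustrative; see `NRTSuggester#getMaxTopNSearcherQueueSize` for the real implementation:

```java
// Sketch of the queue-sizing heuristic described in the javadoc: start from
// topN, add half the live documents once per condition that can discard
// candidates (filtering, collector-side rejection), then cap the result.
public class QueueSizeSketch {
  static final int MAX_TOP_N_QUEUE_SIZE = 5000; // illustrative cap

  static int maxQueueSize(int topN, int numDocs, double liveDocsRatio,
                          boolean filterEnabled, boolean collectorCanReject) {
    long queueSize = topN;
    int halfLiveDocs = (int) (numDocs * liveDocsRatio) / 2;
    if (filterEnabled) {
      queueSize += halfLiveDocs;      // filter may reject many paths
    }
    if (collectorCanReject) {
      queueSize += halfLiveDocs;      // collector may reject collected docs
    }
    return (int) Math.min(queueSize, MAX_TOP_N_QUEUE_SIZE);
  }

  public static void main(String[] args) {
    // With both a filter and a rejecting collector the queue grows twice,
    // but never past the cap.
    System.out.println(maxQueueSize(10, 1000, 1.0, true, true));      // 1010
    System.out.println(maxQueueSize(10, 1_000_000, 1.0, true, true)); // 5000
  }
}
```

The design point under discussion in the review is whether "filter enabled" and "collector can reject" should stay as two independent flags or be folded into one growth condition.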





[GitHub] [lucene-solr] cpoerschke commented on issue #300: SOLR-11831: Skip second grouping step if group.limit is 1 (aka Las Vegas Patch)

2019-10-02 Thread GitBox
cpoerschke commented on issue #300: SOLR-11831: Skip second grouping step if 
group.limit is 1 (aka Las Vegas Patch)
URL: https://github.com/apache/lucene-solr/pull/300#issuecomment-537594398
 
 
   Thanks @diegoceccarelli for the updates and for splitting the unrelated 
`maxScore` change out into a separate PR! I've just opened 
https://issues.apache.org/jira/browse/SOLR-13812 for the also unrelated (though 
admittedly smaller and less controversial) `SolrTestCaseJ4` change and with 
added test coverage for that too.





[jira] [Updated] (SOLR-13812) SolrTestCaseJ4.params(String...) javadocs, uneven rejection, basic test coverage

2019-10-02 Thread Christine Poerschke (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-13812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Christine Poerschke updated SOLR-13812:
---
Status: Patch Available  (was: Open)

> SolrTestCaseJ4.params(String...) javadocs, uneven rejection, basic test 
> coverage
> 
>
> Key: SOLR-13812
> URL: https://issues.apache.org/jira/browse/SOLR-13812
> Project: Solr
>  Issue Type: Test
>Reporter: Diego Ceccarelli
>Assignee: Christine Poerschke
>Priority: Minor
> Attachments: SOLR-13812.patch
>
>
> In 
> https://github.com/apache/lucene-solr/commit/4fedd7bd77219223cb09a660a3e2ce0e89c26eea#diff-21d4224105244d0fb50fe7e586a8495d
>  on https://github.com/apache/lucene-solr/pull/300 for SOLR-11831 
> [~diegoceccarelli] proposes to add javadocs and uneven length parameter 
> rejection for the {{SolrTestCaseJ4.params(String...)}} method.
> This ticket proposes to do that plus to also add basic test coverage for the 
> method, separately from the unrelated SOLR-11831 changes.






[jira] [Updated] (SOLR-13812) SolrTestCaseJ4.params(String...) javadocs, uneven rejection, basic test coverage

2019-10-02 Thread Christine Poerschke (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-13812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Christine Poerschke updated SOLR-13812:
---
Attachment: SOLR-13812.patch

> SolrTestCaseJ4.params(String...) javadocs, uneven rejection, basic test 
> coverage
> 
>
> Key: SOLR-13812
> URL: https://issues.apache.org/jira/browse/SOLR-13812
> Project: Solr
>  Issue Type: Test
>Reporter: Diego Ceccarelli
>Assignee: Christine Poerschke
>Priority: Minor
> Attachments: SOLR-13812.patch
>
>
> In 
> https://github.com/apache/lucene-solr/commit/4fedd7bd77219223cb09a660a3e2ce0e89c26eea#diff-21d4224105244d0fb50fe7e586a8495d
>  on https://github.com/apache/lucene-solr/pull/300 for SOLR-11831 
> [~diegoceccarelli] proposes to add javadocs and uneven length parameter 
> rejection for the {{SolrTestCaseJ4.params(String...)}} method.
> This ticket proposes to do that plus to also add basic test coverage for the 
> method, separately from the unrelated SOLR-11831 changes.






[jira] [Created] (SOLR-13812) SolrTestCaseJ4.params(String...) javadocs, uneven rejection, basic test coverage

2019-10-02 Thread Christine Poerschke (Jira)
Christine Poerschke created SOLR-13812:
--

 Summary: SolrTestCaseJ4.params(String...) javadocs, uneven 
rejection, basic test coverage
 Key: SOLR-13812
 URL: https://issues.apache.org/jira/browse/SOLR-13812
 Project: Solr
  Issue Type: Test
Reporter: Diego Ceccarelli
Assignee: Christine Poerschke


In 
https://github.com/apache/lucene-solr/commit/4fedd7bd77219223cb09a660a3e2ce0e89c26eea#diff-21d4224105244d0fb50fe7e586a8495d
 on https://github.com/apache/lucene-solr/pull/300 for SOLR-11831 
[~diegoceccarelli] proposes to add javadocs and uneven length parameter 
rejection for the {{SolrTestCaseJ4.params(String...)}} method.

This ticket proposes to do that plus to also add basic test coverage for the 
method, separately from the unrelated SOLR-11831 changes.






[jira] [Commented] (SOLR-13811) possible autoAddReplicas bug and/or (Hdfs)AutoAddReplicasIntegrationTest refactoring / fixes

2019-10-02 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16942993#comment-16942993
 ] 

ASF subversion and git services commented on SOLR-13811:


Commit a57ec148e52507104fdf0f99381d2b485fa846fc in lucene-solr's branch 
refs/heads/master from Chris M. Hostetter
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=a57ec14 ]

SOLR-13811: Refactor AutoAddReplicasIntegrationTest to isolate problematic 
situation into an AwaitsFix test method


> possible autoAddReplicas bug and/or (Hdfs)AutoAddReplicasIntegrationTest 
> refactoring / fixes
> 
>
> Key: SOLR-13811
> URL: https://issues.apache.org/jira/browse/SOLR-13811
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Chris M. Hostetter
>Priority: Major
>
> I've noticed a pattern of failure behavior in jenkins runs of 
> {{AutoAddReplicasIntegrationTest}} (which mostly manifests in the subclass 
> {{HdfsAutoAddReplicasIntegrationTest}}, probably due to timing) which 
> indicates either:
>  # the test is too contrived, and expects {{autoAddReplicas}} to kick in in a 
> situation where the current impl of {{NodeLostTrigger}} isn't smart enough to 
> handle
>  # {{NodeLostTrigger}} _should_ be smart enough to handle this, but isn't.
> The test failure is currently somewhat finicky to reproduce, and depends on a 
> node being stopped, restarted, and stopped again – while an affected 
> collection is changed from {{autoAddReplicas=false}} to 
> {{autoAddReplicas=true}} before the second "stop"
> Regardless of which of the 2 above is true: the test itself is somewhat 
> convoluted. It creates a sequence of events (some randomized, some static) 
> and asserts specific outcomes after each – but the timing of scheduled 
> triggers like {{NodeLostTrigger}} , and the interplay of things like "pick a 
> random node to shutdown" with a subsequent "explicitly shut down node2" (even 
> if it was the node randomly shut down earlier) is confusing.
> I'm creating this issue to track two tightly dependent objectives:
>  # refactoring this test to:
>  ** better isolate the specific things it's trying to test in individual test 
> methods.
>  ** have a singular test method that triggers the specific sequence of events 
> that is currently problematic (ideally in such a way that it reliably fails).
>  # AwaitsFix this new test method until someone with a better understanding of 
> the {{autoAddReplicas}} / {{NodeLostTrigger}} code can assess if the test is 
> faulty or the code being tested is faulty.






[jira] [Commented] (SOLR-13101) Shared storage support in SolrCloud

2019-10-02 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16942991#comment-16942991
 ] 

ASF subversion and git services commented on SOLR-13101:


Commit 8a34ce0257cd48ad2c65a94ace2d9d3e8d102f60 in lucene-solr's branch 
refs/heads/jira/SOLR-13101 from Yonik Seeley
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=8a34ce0 ]

SOLR-13101: fix test compilation


> Shared storage support in SolrCloud
> ---
>
> Key: SOLR-13101
> URL: https://issues.apache.org/jira/browse/SOLR-13101
> Project: Solr
>  Issue Type: New Feature
>  Components: SolrCloud
>Reporter: Yonik Seeley
>Priority: Major
>  Time Spent: 4h 20m
>  Remaining Estimate: 0h
>
> Solr should have first-class support for shared storage (blob/object stores 
> like S3, google cloud storage, etc. and shared filesystems like HDFS, NFS, 
> etc).
> The key component will likely be a new replica type for shared storage.  It 
> would have many of the benefits of the current "pull" replicas (not indexing 
> on all replicas, all shards identical with no shards getting out-of-sync, 
> etc), but would have additional benefits:
>  - Any shard could become leader (the blob store always has the index)
>  - Better elasticity scaling down
>- durability not linked to number of replicas... a single replica could be 
> common for write workloads
>- could drop to 0 replicas for a shard when not needed (blob store always 
> has index)
>  - Allow for higher performance write workloads by skipping the transaction 
> log
>- don't pay for what you don't need
>- a commit will be necessary to flush to stable storage (blob store)
>  - A lot of the complexity and failure modes go away
> An additional component is a Directory implementation that will work well with 
> blob stores.  We probably want one that treats local disk as a cache since 
> the latency to remote storage is so large.  I think there are still some 
> "locking" issues to be solved here (ensuring that more than one writer to the 
> same index won't corrupt it).  This should probably be pulled out into a 
> different JIRA issue.






[GitHub] [lucene-solr] yonik merged pull request #917: SOLR-13101: fix test compilation

2019-10-02 Thread GitBox
yonik merged pull request #917: SOLR-13101: fix test compilation
URL: https://github.com/apache/lucene-solr/pull/917
 
 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] jimczi commented on a change in pull request #913: LUCENE-8995: TopSuggestDocsCollector#collect should be able to signal rejection

2019-10-02 Thread GitBox
jimczi commented on a change in pull request #913: LUCENE-8995: 
TopSuggestDocsCollector#collect should be able to signal rejection
URL: https://github.com/apache/lucene-solr/pull/913#discussion_r330667847
 
 

 ##
 File path: lucene/suggest/src/test/org/apache/lucene/search/suggest/document/TestPrefixCompletionQuery.java
 ##
 @@ -253,6 +263,126 @@ public void testDocFiltering() throws Exception {
     iw.close();
   }
 
+  /**
+   * Test that the correct number of documents are collected when using a
+   * collector that also rejects documents.
+   */
+  public void testCollectorThatRejects() throws Exception {
+    // use a synonym analyzer to have multiple paths to the same suggested
+    // document; this mock adds "dog" as a synonym for "dogs"
+    Analyzer analyzer = new MockSynonymAnalyzer();
+    RandomIndexWriter iw = new RandomIndexWriter(random(), dir, iwcWithSuggestField(analyzer, "suggest_field"));
+    List<Entry> expectedResults = new ArrayList<>();
+
+    for (int docCount = 10; docCount > 0; docCount--) {
+      Document document = new Document();
+      String value = "ab" + docCount + " dogs";
+      document.add(new SuggestField("suggest_field", value, docCount));
+      expectedResults.add(new Entry(value, docCount));
+      iw.addDocument(document);
+    }
+
+    if (rarely()) {
+      iw.commit();
+    }
+
+    DirectoryReader reader = iw.getReader();
+    SuggestIndexSearcher indexSearcher = new SuggestIndexSearcher(reader);
+
+    PrefixCompletionQuery query = new PrefixCompletionQuery(analyzer, new Term("suggest_field", "ab"));
+    int topN = 5;
+
+    // use a TopSuggestDocsCollector that rejects results with duplicate docIds
+    TopSuggestDocsCollector collector = new TopSuggestDocsCollector(topN, false) {
+
+      private Set<Integer> seenDocIds = new HashSet<>();
+
+      @Override
+      public boolean collect(int docID, CharSequence key, CharSequence context, float score) throws IOException {
+        int globalDocId = docID + docBase;
+        boolean collected = false;
+        if (seenDocIds.contains(globalDocId) == false) {
+          super.collect(docID, key, context, score);
+          seenDocIds.add(globalDocId);
+          collected = true;
+        }
+        return collected;
+      }
+
+      @Override
+      protected boolean canReject() {
+        return true;
+      }
+    };
+
+    indexSearcher.suggest(query, collector);
+    TopSuggestDocs suggestions = collector.get();
+    assertSuggestions(suggestions, expectedResults.subList(0, topN).toArray(new Entry[0]));
+    assertTrue(suggestions.isComplete());
+
+    reader.close();
+    iw.close();
+  }
+
+  /**
+   * A large scale test where the collector rejects based on docIds
+   */
+  public void testCollectorWithManyRejects() throws Exception {
+    Analyzer analyzer = new MockAnalyzer(random());
+    RandomIndexWriter iw = new RandomIndexWriter(random(), dir, iwcWithSuggestField(analyzer, "suggest_field"));
+    Set<Integer> acceptedDocs = new HashSet<>();
+    List<Entry> expectedResults = new ArrayList<>();
+
+    for (int docCount = 0; docCount < 1; docCount++) {
+      Document document = new Document();
+      String value = "ab" + RandomStrings.randomAsciiAlphanumOfLength(random(), 10) + "_" + docCount;
+      document.add(new SuggestField("suggest_field", value, docCount));
+      if (random().nextDouble() > 0.75) {
 
 Review comment:
   the maximum queue size is `5000` so we should ensure that we don't reject 
more than this number if we want to ensure that the search is complete. If you 
change the live docs to contain at least `5000` docs, this test should work 
fine.





[GitHub] [lucene-solr] jimczi commented on a change in pull request #913: LUCENE-8995: TopSuggestDocsCollector#collect should be able to signal rejection

2019-10-02 Thread GitBox
jimczi commented on a change in pull request #913: LUCENE-8995: 
TopSuggestDocsCollector#collect should be able to signal rejection
URL: https://github.com/apache/lucene-solr/pull/913#discussion_r330659818
 
 

 ##
 File path: lucene/suggest/src/java/org/apache/lucene/search/suggest/document/NRTSuggester.java
 ##
 @@ -283,17 +299,25 @@ public int compare(Pair o1, Pair o2) {
    * <p>
    * If a filter is applied, the queue size is increased by
    * half the number of live documents.
+   *
+   * If the collector can reject documents upon collecting, the queue size is
+   * increased by half the number of live documents again.
+   *
    * <p>
    * The maximum queue size is {@link #MAX_TOP_N_QUEUE_SIZE}
    */
-  private int getMaxTopNSearcherQueueSize(int topN, int numDocs, double liveDocsRatio, boolean filterEnabled) {
+  private int getMaxTopNSearcherQueueSize(int topN, int numDocs, double liveDocsRatio, boolean filterEnabled,
 
 Review comment:
   I am not sure we need to differentiate the case where there is a filter from 
the case where the collector can reject. It's the same thing: we don't know the 
number of rejections beforehand, so just adding `(numDocs/2)` once should be 
enough. So maybe we can just merge the two booleans and apply the heuristic once?
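
A minimal sketch of the merged heuristic under discussion (method and constant names mirror the snippet above; the cap of `5000` comes from this review thread, and the exact growth factor is an assumption, not the committed implementation):

```java
public class QueueSizeSketch {
    // assumed cap, standing in for NRTSuggester#MAX_TOP_N_QUEUE_SIZE (5000 per this thread)
    static final int MAX_TOP_N_QUEUE_SIZE = 5000;

    // Merged heuristic: a single boolean covers both "a filter is applied" and
    // "the collector can reject", since in both cases the number of rejections
    // is unknown beforehand; the queue grows by half the live documents, once.
    static int getMaxTopNSearcherQueueSize(int topN, int numDocs, double liveDocsRatio,
                                           boolean canRejectDocs) {
        long queueSize = topN;
        if (canRejectDocs) {
            // grow by half the estimated live documents, applied exactly once
            queueSize += (long) ((numDocs / 2.0) * liveDocsRatio);
        }
        return (int) Math.min(queueSize, MAX_TOP_N_QUEUE_SIZE);
    }

    public static void main(String[] args) {
        System.out.println(getMaxTopNSearcherQueueSize(5, 1000, 1.0, false));  // 5
        System.out.println(getMaxTopNSearcherQueueSize(5, 1000, 1.0, true));   // 505
        // large index: growth is capped at MAX_TOP_N_QUEUE_SIZE
        System.out.println(getMaxTopNSearcherQueueSize(5, 100000, 1.0, true)); // 5000
    }
}
```

Applying the growth once also explains the review note above: with more than 5000 rejections the capped queue can be exhausted, so the search is no longer guaranteed to be complete.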





[jira] [Created] (SOLR-13811) possible autoAddReplicas bug and/or (Hdfs)AutoAddReplicasIntegrationTest refactoring / fixes

2019-10-02 Thread Chris M. Hostetter (Jira)
Chris M. Hostetter created SOLR-13811:
-

 Summary: possible autoAddReplicas bug and/or 
(Hdfs)AutoAddReplicasIntegrationTest refactoring / fixes
 Key: SOLR-13811
 URL: https://issues.apache.org/jira/browse/SOLR-13811
 Project: Solr
  Issue Type: Bug
  Security Level: Public (Default Security Level. Issues are Public)
Reporter: Chris M. Hostetter


I've noticed a pattern of failure behavior in jenkins runs of 
{{AutoAddReplicasIntegrationTest}} (which mostly manifests in the subclass 
{{HdfsAutoAddReplicasIntegrationTest}}, probably due to timing) which indicates 
either:
 # the test is too contrived, and expects {{autoAddReplicas}} to kick in in a 
situation where the current impl of {{NodeLostTrigger}} isn't smart enough to 
handle
 # {{NodeLostTrigger}} _should_ be smart enough to handle this, but isn't.

The test failure is currently somewhat finicky to reproduce, and depends on a 
node being stopped, restarted, and stopped again – while an affected collection 
is changed from {{autoAddReplicas=false}} to {{autoAddReplicas=true}} before 
the second "stop"

Regardless of which of the 2 above is true: the test itself is somewhat 
convoluted. It creates a sequence of events (some randomized, some static) and 
asserting specific outcomes after each – but the timing of scheduled triggers 
like {{NodeLostTrigger}} , and the interplay of things like "pick a random node 
to shutdown" with a subsequent "explicitly shut down node2" (even if it was the 
node randomly shut down earlier) is confusing.

I'm creating this issue to track two tightly dependent objectives:
 # refactoring this test to:
 ** better isolate the specific things it's trying to test in individual test 
methods.
 ** have a singular test method that triggers the specific sequence of events 
that is currently problematic (ideally in such a way that it reliably fails).
 # AwaitsFix this new test method until someone with a better understanding of the 
{{autoAddReplicas}} / {{NodeLostTrigger}} code can assess if the test is faulty 
or the code being tested is faulty.






[GitHub] [lucene-solr] yonik opened a new pull request #917: SOLR-13101: fix test compilation

2019-10-02 Thread GitBox
yonik opened a new pull request #917: SOLR-13101: fix test compilation
URL: https://github.com/apache/lucene-solr/pull/917
 
 
   Looks like the merge when the original PR was put up broke test compilation. 
 Here's the simple fix.





[GitHub] [lucene-solr] thomaswoeckinger commented on issue #665: Fixes SOLR-13539

2019-10-02 Thread GitBox
thomaswoeckinger commented on issue #665: Fixes SOLR-13539
URL: https://github.com/apache/lucene-solr/pull/665#issuecomment-537570032
 
 
   Pushed again without ignored binary test





[GitHub] [lucene-solr] thomaswoeckinger edited a comment on issue #665: Fixes SOLR-13539

2019-10-02 Thread GitBox
thomaswoeckinger edited a comment on issue #665: Fixes SOLR-13539
URL: https://github.com/apache/lucene-solr/pull/665#issuecomment-537560022
 
 
   > Hi Thomas. I'd really like to get this (#665) in for 8.3, but right now 
it's bundled to your change in #883. I can't merge this without #883.
   > 
   > And I haven't had the time I need to understand how the different pieces 
of #883 (response formats, field types, binary content) fit together and what 
the best design is there.
   > 
   > So what I'm going to do is try to unbundle the two PRs, or at least flip 
the ordering. I'll take what you've uploaded to #665 here and either comment 
out or remove entirely the tests that hit this binary-XML issue. They can be 
added back in #883, once binary-xml works for these field types.
   > 
   > There might still be time to get #883 in, as I have a chunk of time now 
that I didn't before. But even if we don't get to it for 8.3, it won't prevent 
#665 from getting in.
   
   No problem, I can comment out the binary test, rebase and push again.
   Just a second!
   





[GitHub] [lucene-solr] thomaswoeckinger commented on issue #665: Fixes SOLR-13539

2019-10-02 Thread GitBox
thomaswoeckinger commented on issue #665: Fixes SOLR-13539
URL: https://github.com/apache/lucene-solr/pull/665#issuecomment-537560022
 
 
   > Hi Thomas. I'd really like to get this (#665) in for 8.3, but right now 
it's bundled to your change in #883. I can't merge this without #883.
   > 
   > And I haven't had the time I need to understand how the different pieces 
of #883 (response formats, field types, binary content) fit together and what 
the best design is there.
   > 
   > So what I'm going to do is try to unbundle the two PRs, or at least flip 
the ordering. I'll take what you've uploaded to #665 here and either comment 
out or remove entirely the tests that hit this binary-XML issue. They can be 
added back in #883, once binary-xml works for these field types.
   > 
   > There might still be time to get #883 in, as I have a chunk of time now 
that I didn't before. But even if we don't get to it for 8.3, it won't prevent 
#665 from getting in.
   No problem, I can comment out the binary test, rebase and push again.
   Just a second!
   





[GitHub] [lucene-solr] gerlowskija commented on issue #665: Fixes SOLR-13539

2019-10-02 Thread GitBox
gerlowskija commented on issue #665: Fixes SOLR-13539
URL: https://github.com/apache/lucene-solr/pull/665#issuecomment-537556502
 
 
   Hi Thomas.  I'd really like to get this (#665) in for 8.3, but right now 
it's bundled to your change in #883.  I can't merge this without #883.
   
   And I haven't had the time I need to understand how the different pieces of 
#883 (response formats, field types, binary content) fit together and what the 
best design is there.
   
   So what I'm going to do is try to unbundle the two PRs, or at least flip the 
ordering.  I'll take what you've uploaded to #665 here and either comment out 
or remove entirely the tests that hit this binary-XML issue.  They can be added 
back in #883, once binary-xml works for these field types.
 
   There might still be time to get #883 in, as I have a chunk of time now that 
I didn't before.  But even if we don't get to it for 8.3, it won't prevent #665 
from getting in.





[jira] [Commented] (SOLR-13790) LRUStatsCache size explosion and ineffective caching

2019-10-02 Thread Andrzej Bialecki (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16942908#comment-16942908
 ] 

Andrzej Bialecki commented on SOLR-13790:
-

Oh, and until the staleness issue is fixed I would recommend using only 
{{ExactStatsCache}} - other implementations can only make matters worse, both 
in terms of memory use and scoring inaccuracies.

> LRUStatsCache size explosion and ineffective caching
> 
>
> Key: SOLR-13790
> URL: https://issues.apache.org/jira/browse/SOLR-13790
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: 7.7.2, 8.2, 8.3
>Reporter: Andrzej Bialecki
>Assignee: Andrzej Bialecki
>Priority: Critical
> Fix For: 7.7.3, 8.3
>
> Attachments: SOLR-13790.patch, SOLR-13790.patch
>
>
> On a sizeable cluster with multi-shard multi-replica collections, when 
> {{LRUStatsCache}} was in use we encountered excessive memory usage, which 
> consequently led to severe performance problems.
> On a closer examination of the heapdumps it became apparent that when 
> {{LRUStatsCache.addToPerShardTermStats}} is called it creates instances of 
> {{FastLRUCache}} using the passed {{shard}} argument - however, the value of 
> this argument is not a simple shard name but instead it's a randomly ordered 
> list of ALL replica URLs for this shard.
> As a result, due to the combinatoric number of possible keys, over time the 
> map in {{LRUStatsCache.perShardTemStats}} grew to contain ~2 mln entries...
> The fix seems to be simply to extract the shard name and cache using this 
> name instead of the full string value of the {{shard}} parameter. Existing 
> unit tests also need much improvement.
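
The combinatoric key explosion described above can be demonstrated in miniature. The URLs below are made up, and the sort-based canonicalization is only an illustration of the principle; the actual fix extracts the shard name and uses that as the cache key:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class ShardKeyDemo {
    public static void main(String[] args) {
        // two requests naming the same shard, with replica URLs in different random order
        String a = "http://host1/solr/col_shard1_replica_n1/|http://host2/solr/col_shard1_replica_n2/";
        String b = "http://host2/solr/col_shard1_replica_n2/|http://host1/solr/col_shard1_replica_n1/";

        // buggy behavior: the raw string is the cache key, so every ordering
        // of the replica list creates a distinct entry for the same shard
        Set<String> buggyKeys = new HashSet<>(Arrays.asList(a, b));

        // fixed behavior: canonicalize to a stable per-shard key before caching
        // (sorting the URL list here; the real patch uses the shard name)
        Set<String> fixedKeys = new HashSet<>();
        for (String s : new String[] {a, b}) {
            String[] urls = s.split("\\|");
            Arrays.sort(urls);
            fixedKeys.add(String.join("|", urls));
        }

        System.out.println(buggyKeys.size()); // 2 entries for a single shard
        System.out.println(fixedKeys.size()); // 1 entry for that shard
    }
}
```

With n replicas per shard there are up to n! orderings, which is how a map keyed on the raw parameter can grow to millions of entries.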






[jira] [Commented] (SOLR-13790) LRUStatsCache size explosion and ineffective caching

2019-10-02 Thread Andrzej Bialecki (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16942904#comment-16942904
 ] 

Andrzej Bialecki commented on SOLR-13790:
-

Upon further examination it looks like {{ExactSharedStatsCache}} and 
{{LRUStatsCache}} have a problem with staleness - they don't track updates in 
the shards so they have no way of knowing when to refresh the stats. As a 
result the global stats may be even more wrong than if we used just local stats 
- imagine a scenario where there's a heavy indexing activity that adds a lot of 
terms and postings. In this scenario local stats from the local shard would 
reflect this growth, albeit partially, but the global stats that are stale 
would not.

Another issue is with the purported optimization in {{LRUStatsCache}} and 
{{ExactSharedStatsCache}} - the claimed advantage of these caches is that they 
help to avoid unnecessary fetching of stats from shards. Only they don't ... as 
explained in my previous comment, both of these implementations always send 
ShardRequest-s to fetch the stats, thus adding one more round-trip to every 
query. Since the stats are fetched on every request at least there was no 
problem with the staleness ;) but the "caching" aspect was completely false - 
per-shard stats were being fetched on every request, and on every request new 
global stats would be built and sent out.

I plan to address these issues separately, the current patch is already large.

Updated patch with the following additional changes:
 * the biggest change is that now StatsCache instances are tied to 
SolrIndexSearcher and its life-cycle and not to SolrCore - this helps to at 
least mitigate the problem of staleness and also the problem of unbounded memory 
consumption of {{ExactSharedStatsCache}}. The downside is that after every 
commit the cache needs to be re-populated.
 * more optimization and safety in StatsUtil serialization code
 * fixed a bug in {{DebugComponent}} where only local stats would be used for 
explanations - this threw me off for a while, as I relied on explanations to 
explain the details of scoring :)
 * added more substance to SolrCloud unit tests

All tests are passing. If there are no objections I'd like to commit this 
shortly.

> LRUStatsCache size explosion and ineffective caching
> 
>
> Key: SOLR-13790
> URL: https://issues.apache.org/jira/browse/SOLR-13790
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: 7.7.2, 8.2, 8.3
>Reporter: Andrzej Bialecki
>Assignee: Andrzej Bialecki
>Priority: Critical
> Fix For: 7.7.3, 8.3
>
> Attachments: SOLR-13790.patch, SOLR-13790.patch
>
>
> On a sizeable cluster with multi-shard multi-replica collections, when 
> {{LRUStatsCache}} was in use we encountered excessive memory usage, which 
> consequently led to severe performance problems.
> On a closer examination of the heapdumps it became apparent that when 
> {{LRUStatsCache.addToPerShardTermStats}} is called it creates instances of 
> {{FastLRUCache}} using the passed {{shard}} argument - however, the value of 
> this argument is not a simple shard name but instead it's a randomly ordered 
> list of ALL replica URLs for this shard.
> As a result, due to the combinatoric number of possible keys, over time the 
> map in {{LRUStatsCache.perShardTemStats}} grew to contain ~2 mln entries...
> The fix seems to be simply to extract the shard name and cache using this 
> name instead of the full string value of the {{shard}} parameter. Existing 
> unit tests also need much improvement.






[jira] [Commented] (SOLR-13764) Parse Interval Query from JSON API

2019-10-02 Thread Mikhail Khludnev (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16942895#comment-16942895
 ] 

Mikhail Khludnev commented on SOLR-13764:
-

I'm sorry. Need to switch to something different. 

> Parse Interval Query from JSON API
> --
>
> Key: SOLR-13764
> URL: https://issues.apache.org/jira/browse/SOLR-13764
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: query parsers
>Reporter: Mikhail Khludnev
>Priority: Minor
>
> h2. Context
> Lucene has Intervals query LUCENE-8196. Note: these are a kind of healthy 
> man's Spans/Phrases. Note: It's not about ranges nor facets.
> h2. Problem
> There's no way to search by IntervalQuery via JSON Query DSL.
> h2. Suggestion
>  * Create classic QParser \{{ {!interval df=text_content}a_json_param}}, ie 
> one can combine a few such refs in {{json.query.bool}}
>  * It accepts just a name of JSON params, nothing like this happens yet.
>  * This param carries plain json which is accessible via {{req.getJSON()}}
> please examine 
> https://cwiki.apache.org/confluence/display/SOLR/SOLR-13764+Discussion+-+Interval+Queries+in+JSON
>  for syntax proposal.
> h2. Challenges
>  * I have no idea about particular JSON DSL for these queries, Lucene API 
> seems like easy JSON-able. Proposals are welcome.
>  * Another awkward thing is combining analysis and the low-level query API, 
> e.g. what if one requests a term for one word and analysis yields two tokens; 
> and vice versa, requesting a phrase might end up with a single token stream.
>  * Putting json into Jira ticket description
> h2. Q: Why don't..
> .. put intervals DSL right into {{json.query}}, avoiding these odd param 
> refs? 
>  A: It requires heavy lifting for {{JsonQueryConverter}} which is streamlined 
> for handling good old http parameterized queries.






[jira] [Updated] (SOLR-13764) Parse Interval Query from JSON API

2019-10-02 Thread Mikhail Khludnev (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-13764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Khludnev updated SOLR-13764:

Fix Version/s: (was: 8.3)

> Parse Interval Query from JSON API
> --
>
> Key: SOLR-13764
> URL: https://issues.apache.org/jira/browse/SOLR-13764
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: query parsers
>Reporter: Mikhail Khludnev
>Priority: Minor
>
> h2. Context
> Lucene has Intervals query LUCENE-8196. Note: these are a kind of healthy 
> man's Spans/Phrases. Note: It's not about ranges nor facets.
> h2. Problem
> There's no way to search by IntervalQuery via JSON Query DSL.
> h2. Suggestion
>  * Create classic QParser \{{ {!interval df=text_content}a_json_param}}, ie 
> one can combine a few such refs in {{json.query.bool}}
>  * It accepts just a name of JSON params, nothing like this happens yet.
>  * This param carries plain json which is accessible via {{req.getJSON()}}
> please examine 
> https://cwiki.apache.org/confluence/display/SOLR/SOLR-13764+Discussion+-+Interval+Queries+in+JSON
>  for syntax proposal.
> h2. Challenges
>  * I have no idea about particular JSON DSL for these queries, Lucene API 
> seems like easy JSON-able. Proposals are welcome.
>  * Another awkward thing is combining analysis and the low-level query API, 
> e.g. what if one requests a term for one word and analysis yields two tokens; 
> and vice versa, requesting a phrase might end up with a single token stream.
>  * Putting json into Jira ticket description
> h2. Q: Why don't..
> .. put intervals DSL right into {{json.query}}, avoiding these odd param 
> refs? 
>  A: It requires heavy lifting for {{JsonQueryConverter}} which is streamlined 
> for handling good old http parameterized queries.






[jira] [Updated] (SOLR-13764) Parse Interval Query from JSON API

2019-10-02 Thread Mikhail Khludnev (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-13764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Khludnev updated SOLR-13764:

Priority: Minor  (was: Blocker)

> Parse Interval Query from JSON API
> --
>
> Key: SOLR-13764
> URL: https://issues.apache.org/jira/browse/SOLR-13764
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: query parsers
>Reporter: Mikhail Khludnev
>Priority: Minor
> Fix For: 8.3
>
>
> h2. Context
> Lucene has Intervals query LUCENE-8196. Note: these are a kind of healthy 
> man's Spans/Phrases. Note: It's not about ranges nor facets.
> h2. Problem
> There's no way to search by IntervalQuery via JSON Query DSL.
> h2. Suggestion
>  * Create classic QParser \{{ {!interval df=text_content}a_json_param}}, ie 
> one can combine a few such refs in {{json.query.bool}}
>  * It accepts just a name of JSON params, nothing like this happens yet.
>  * This param carries plain json which is accessible via {{req.getJSON()}}
> please examine 
> https://cwiki.apache.org/confluence/display/SOLR/SOLR-13764+Discussion+-+Interval+Queries+in+JSON
>  for syntax proposal.
> h2. Challenges
>  * I have no idea about particular JSON DSL for these queries, Lucene API 
> seems like easy JSON-able. Proposals are welcome.
>  * Another awkward thing is combining analysis and the low-level query API, 
> e.g. what if one requests a term for one word and analysis yields two tokens; 
> and vice versa, requesting a phrase might end up with a single token stream.
>  * Putting json into Jira ticket description
> h2. Q: Why don't..
> .. put intervals DSL right into {{json.query}}, avoiding these odd param 
> refs? 
>  A: It requires heavy lifting for {{JsonQueryConverter}} which is streamlined 
> for handling good old http parameterized queries.






[jira] [Comment Edited] (SOLR-13101) Shared storage support in SolrCloud

2019-10-02 Thread Yonik Seeley (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16942890#comment-16942890
 ] 

Yonik Seeley edited comment on SOLR-13101 at 10/2/19 3:10 PM:
--

bq. How far do you think is it complete? Do you forsee a lot of more work going 
in here? Or, do you suggest we start reviewing it and attempt to merge it soon 
(in a week or so?).

I think it's got a bit more to go.  It would be nice if the behavior matched 
normal solr semantics a little closer... would be easier to get better test 
coverage by reusing existing tests and changing the replica type.  Some things 
off the top of my head:
 - a commit doesn't cause latest changes to be visible on replicas (a query on 
a non-leader replica actually causes an async pull from blob of the latest 
index)
 - there are currently some concurrency issues with index pushing
 - I *think* one still needs to specify a commit to get a push to blob... this 
needs to be implicit (commit=true,openSearcher=false) for data durability by 
default
I need to dig into the code in general more... as you can see from the commits 
on the branch, this work was all done by my colleagues, not me.  But we're 
working on encouraging more open development!



was (Author: ysee...@gmail.com):
bq. How far do you think is it complete? Do you forsee a lot of more work going 
in here? Or, do you suggest we start reviewing it and attempt to merge it soon 
(in a week or so?).

I think it's got a bit more to go.  It would be nice if the behavior matched 
normal solr semantics a little closer... would be easier to get better test 
coverage by reusing existing tests and changing the replica type.  Some things 
off the top of my head:
 - a commit doesn't cause latest changes to be visible on replicas (a query on 
a non-leader replica actually causes an async pull from blob of the latest 
index)
 - there are currently some concurrency issues with index pushing
 - I 
I need to dig into the code in general more... as you can see from the commits 
on the branch, this work was all done by my colleagues, not me.  But we're 
working on encouraging more open development!


> Shared storage support in SolrCloud
> ---
>
> Key: SOLR-13101
> URL: https://issues.apache.org/jira/browse/SOLR-13101
> Project: Solr
>  Issue Type: New Feature
>  Components: SolrCloud
>Reporter: Yonik Seeley
>Priority: Major
>  Time Spent: 4h
>  Remaining Estimate: 0h
>
> Solr should have first-class support for shared storage (blob/object stores 
> like S3, google cloud storage, etc. and shared filesystems like HDFS, NFS, 
> etc).
> The key component will likely be a new replica type for shared storage.  It 
> would have many of the benefits of the current "pull" replicas (not indexing 
> on all replicas, all shards identical with no shards getting out-of-sync, 
> etc), but would have additional benefits:
>  - Any shard could become leader (the blob store always has the index)
>  - Better elasticity scaling down
>- durability not linked to number of replicas... a single replica could be 
> common for write workloads
>- could drop to 0 replicas for a shard when not needed (blob store always 
> has index)
>  - Allow for higher performance write workloads by skipping the transaction 
> log
>- don't pay for what you don't need
>- a commit will be necessary to flush to stable storage (blob store)
>  - A lot of the complexity and failure modes go away
> An additional component is a Directory implementation that will work well with 
> blob stores.  We probably want one that treats local disk as a cache since 
> the latency to remote storage is so large.  I think there are still some 
> "locking" issues to be solved here (ensuring that more than one writer to the 
> same index won't corrupt it).  This should probably be pulled out into a 
> different JIRA issue.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-13101) Shared storage support in SolrCloud

2019-10-02 Thread Yonik Seeley (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16942890#comment-16942890
 ] 

Yonik Seeley commented on SOLR-13101:
-

bq. How far do you think it is complete? Do you foresee a lot of more work going 
in here? Or, do you suggest we start reviewing it and attempt to merge it soon 
(in a week or so?).

I think it's got a bit more to go.  It would be nice if the behavior matched 
normal solr semantics a little closer... would be easier to get better test 
coverage by reusing existing tests and changing the replica type.  Some things 
off the top of my head:
 - a commit doesn't cause latest changes to be visible on replicas (a query on 
a non-leader replica actually causes an async pull from blob of the latest 
index)
 - there are currently some concurrency issues with index pushing

I need to dig into the code in general more... as you can see from the commits 
on the branch, this work was all done by my colleagues, not me.  But we're 
working on encouraging more open development!


> Shared storage support in SolrCloud
> ---
>
> Key: SOLR-13101
> URL: https://issues.apache.org/jira/browse/SOLR-13101
> Project: Solr
>  Issue Type: New Feature
>  Components: SolrCloud
>Reporter: Yonik Seeley
>Priority: Major
>  Time Spent: 4h
>  Remaining Estimate: 0h
>
> Solr should have first-class support for shared storage (blob/object stores 
> like S3, google cloud storage, etc. and shared filesystems like HDFS, NFS, 
> etc).
> The key component will likely be a new replica type for shared storage.  It 
> would have many of the benefits of the current "pull" replicas (not indexing 
> on all replicas, all shards identical with no shards getting out-of-sync, 
> etc), but would have additional benefits:
>  - Any shard could become leader (the blob store always has the index)
>  - Better elasticity scaling down
>- durability not linked to number of replicas... a single replica could be 
> common for write workloads
>- could drop to 0 replicas for a shard when not needed (blob store always 
> has index)
>  - Allow for higher performance write workloads by skipping the transaction 
> log
>- don't pay for what you don't need
>- a commit will be necessary to flush to stable storage (blob store)
>  - A lot of the complexity and failure modes go away
> An additional component is a Directory implementation that will work well with 
> blob stores.  We probably want one that treats local disk as a cache since 
> the latency to remote storage is so large.  I think there are still some 
> "locking" issues to be solved here (ensuring that more than one writer to the 
> same index won't corrupt it).  This should probably be pulled out into a 
> different JIRA issue.
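A minimal sketch of the "local disk as a cache" idea from the description above (hypothetical names; the blob store is simulated with an in-memory map rather than a real S3/GCS client):

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative only: reads hit the local cache first and only fall through
// to the (slow, remote) blob store on a miss, paying the latency once.
public class BlobCachingStore {
    private final Map<String, byte[]> blobStore;          // stands in for S3/GCS
    private final Map<String, byte[]> localCache = new HashMap<>();
    int remoteReads = 0;                                  // exposed for inspection

    public BlobCachingStore(Map<String, byte[]> blobStore) {
        this.blobStore = blobStore;
    }

    public byte[] read(String file) {
        byte[] data = localCache.get(file);
        if (data == null) {                               // miss: fetch from remote
            data = blobStore.get(file);
            remoteReads++;
            localCache.put(file, data);                   // cache on local disk
        }
        return data;
    }
}
```

A real implementation would also need the "locking" story mentioned above, since two writers to the same remote index can otherwise corrupt it.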



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] magibney commented on issue #892: LUCENE-8972: Add ICUTransformCharFilter, to support pre-tokenizer ICU text transformation

2019-10-02 Thread GitBox
magibney commented on issue #892: LUCENE-8972: Add ICUTransformCharFilter, to 
support pre-tokenizer ICU text transformation
URL: https://github.com/apache/lucene-solr/pull/892#issuecomment-537538822
 
 
   Here's a thought: what if we provided a boolean configuration option like 
`assumeExternalUnicodeNormalization`. Many of these transforms work on NFD 
input, and produce NFC output, but they generally are configured defensively 
(not assuming input to be NFD, and not assuming that output will be externally 
converted to NFC).
   
   This is understandable, but results in the odd situation (for example) that 
an analysis component like "ICUTransformFilter(Cyrillic-Latin)" would have NFC 
output, but _only_ for characters whose input representation matched the 
top-level Cyrillic-Latin filter (which is pretty restrictive). Input characters 
that didn't match the top-level filter would be untouched by any component of 
the underlying CompoundTransliterator. So if you want fully unicode-normalized 
output (and in the context of an analysis chain, most do), you have to 
separately apply post-transform NFD normalization anyway.
   
   At best, this ends up doing some redundant work; but for the performance 
case we're considering here, there are particular implications. NFC, as a 
trailing transformation step, is both _very_ common and _very_ active -- active 
in the sense that it will in many common contexts block output waiting for 
combining diacritics for literally almost every character. If we know we're 
externally applying unicode normalization over the entire output, skipping 
baked-in post-NFC for every transform component avoids redundant work, but more 
importantly avoids a common case that's virtually guaranteed to result in a 
substantial amount of partial transliteration, rollback, etc. I think this can 
be done relatively cleanly using Transliterator getElements(), toRules(false), 
and createFromRules(...).
   
   I'd be curious to know what you think, @msokolov, and perhaps @rmuir?
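To make the normalization point concrete, here is a toy sketch using the JDK's java.text.Normalizer (not ICU4J, and not this PR's code): run the transform steps on NFD text, then apply a single trailing NFC pass externally, instead of baking NFC output into every compound element. The transform step here is a placeholder, not a real transliteration.

```java
import java.text.Normalizer;

// Toy illustration of assumeExternalUnicodeNormalization: the pipeline
// works on decomposed (NFD) text throughout and composes (NFC) exactly once
// at the end, rather than inside each transform component.
public class ExternalNormalization {
    // Placeholder "transliteration" step operating on decomposed text.
    static String transformStep(String nfdText) {
        return nfdText.replace('a', 'o');
    }

    static String pipeline(String input) {
        String nfd = Normalizer.normalize(input, Normalizer.Form.NFD);
        String transformed = transformStep(nfd);
        // Single external NFC pass over the whole output.
        return Normalizer.normalize(transformed, Normalizer.Form.NFC);
    }
}
```

Running "á" through this gives "ó": the combining acute accent survives the decomposed transform and is recomposed only in the final pass.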


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Updated] (SOLR-13790) LRUStatsCache size explosion and ineffective caching

2019-10-02 Thread Andrzej Bialecki (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-13790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrzej Bialecki updated SOLR-13790:

Attachment: SOLR-13790.patch

> LRUStatsCache size explosion and ineffective caching
> 
>
> Key: SOLR-13790
> URL: https://issues.apache.org/jira/browse/SOLR-13790
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: 7.7.2, 8.2, 8.3
>Reporter: Andrzej Bialecki
>Assignee: Andrzej Bialecki
>Priority: Critical
> Fix For: 7.7.3, 8.3
>
> Attachments: SOLR-13790.patch, SOLR-13790.patch
>
>
> On a sizeable cluster with multi-shard multi-replica collections, when 
> {{LRUStatsCache}} was in use we encountered excessive memory usage, which 
> consequently led to severe performance problems.
> On a closer examination of the heapdumps it became apparent that when 
> {{LRUStatsCache.addToPerShardTermStats}} is called it creates instances of 
> {{FastLRUCache}} using the passed {{shard}} argument - however, the value of 
> this argument is not a simple shard name but instead it's a randomly ordered 
> list of ALL replica URLs for this shard.
> As a result, due to the combinatoric number of possible keys, over time the 
> map in {{LRUStatsCache.perShardTermStats}} grew to contain ~2 million entries...
> The fix seems to be simply to extract the shard name and cache using this 
> name instead of the full string value of the {{shard}} parameter. Existing 
> unit tests also need much improvement.
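A toy sketch of the key explosion described above and of the proposed fix (illustrative names; the canonicalization below merely stands in for extracting the real shard name):

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeSet;

// Caching by the raw, randomly ordered replica-URL list creates a new entry
// per ordering; a canonical, order-independent key collapses them to one.
public class StatsCacheKeys {
    static final Map<String, Object> byRawKey = new HashMap<>();
    static final Map<String, Object> byCanonicalKey = new HashMap<>();

    static String canonicalKey(String replicaUrlList) {
        // Stand-in for the real fix (extracting the shard name): sorting
        // the URLs is just one way to make the key order-independent.
        return String.join("|", new TreeSet<>(Arrays.asList(replicaUrlList.split("\\|"))));
    }

    static void cacheStats(String rawShardParam) {
        byRawKey.put(rawShardParam, new Object());
        byCanonicalKey.put(canonicalKey(rawShardParam), new Object());
    }
}
```

With N replicas per shard there are N! possible orderings of the raw key, which is how the map reached millions of entries.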



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] atris commented on issue #916: LUCENE-8213: Asynchronous Caching in LRUQueryCache

2019-10-02 Thread GitBox
atris commented on issue #916: LUCENE-8213: Asynchronous Caching in 
LRUQueryCache
URL: https://github.com/apache/lucene-solr/pull/916#issuecomment-537530015
 
 
   @jpountz Updated, please see and let me know your thoughts


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-13105) A visual guide to Solr Math Expressions and Streaming Expressions

2019-10-02 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16942871#comment-16942871
 ] 

ASF subversion and git services commented on SOLR-13105:


Commit 97c516b9ba805717033e20ba7deaee5006cb5bce in lucene-solr's branch 
refs/heads/SOLR-13105-visual from Joel Bernstein
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=97c516b ]

SOLR-13105: Update regression header toc 3


> A visual guide to Solr Math Expressions and Streaming Expressions
> -
>
> Key: SOLR-13105
> URL: https://issues.apache.org/jira/browse/SOLR-13105
> Project: Solr
>  Issue Type: New Feature
>Reporter: Joel Bernstein
>Assignee: Joel Bernstein
>Priority: Major
> Attachments: Screen Shot 2019-01-14 at 10.56.32 AM.png, Screen Shot 
> 2019-02-21 at 2.14.43 PM.png, Screen Shot 2019-03-03 at 2.28.35 PM.png, 
> Screen Shot 2019-03-04 at 7.47.57 PM.png, Screen Shot 2019-03-13 at 10.47.47 
> AM.png, Screen Shot 2019-03-30 at 6.17.04 PM.png
>
>
> Visualization is now a fundamental element of Solr Streaming Expressions and 
> Math Expressions. This ticket will create a visual guide to Solr Math 
> Expressions and Solr Streaming Expressions that includes *Apache Zeppelin* 
> visualization examples.
> It will also cover using the JDBC expression to *analyze* and *visualize* 
> results from any JDBC compliant data source.
> Intro from the guide:
> {code:java}
> Streaming Expressions exposes the capabilities of Solr Cloud as composable 
> functions. These functions provide a system for searching, transforming, 
> analyzing and visualizing data stored in Solr Cloud collections.
> At a high level there are four main capabilities that will be explored in the 
> documentation:
> * Searching, sampling and aggregating results from Solr.
> * Transforming result sets after they are retrieved from Solr.
> * Analyzing and modeling result sets using probability and statistics and 
> machine learning libraries.
> * Visualizing result sets, aggregations and statistical models of the data.
> {code}
>  
> A few sample visualizations are attached to the ticket.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-13105) A visual guide to Solr Math Expressions and Streaming Expressions

2019-10-02 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16942869#comment-16942869
 ] 

ASF subversion and git services commented on SOLR-13105:


Commit 4ed2ac4a9f94079494d388e44165e1d8ce9d511a in lucene-solr's branch 
refs/heads/SOLR-13105-visual from Joel Bernstein
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=4ed2ac4 ]

SOLR-13105: Update regression header toc 2


> A visual guide to Solr Math Expressions and Streaming Expressions
> -
>
> Key: SOLR-13105
> URL: https://issues.apache.org/jira/browse/SOLR-13105
> Project: Solr
>  Issue Type: New Feature
>Reporter: Joel Bernstein
>Assignee: Joel Bernstein
>Priority: Major
> Attachments: Screen Shot 2019-01-14 at 10.56.32 AM.png, Screen Shot 
> 2019-02-21 at 2.14.43 PM.png, Screen Shot 2019-03-03 at 2.28.35 PM.png, 
> Screen Shot 2019-03-04 at 7.47.57 PM.png, Screen Shot 2019-03-13 at 10.47.47 
> AM.png, Screen Shot 2019-03-30 at 6.17.04 PM.png
>
>
> Visualization is now a fundamental element of Solr Streaming Expressions and 
> Math Expressions. This ticket will create a visual guide to Solr Math 
> Expressions and Solr Streaming Expressions that includes *Apache Zeppelin* 
> visualization examples.
> It will also cover using the JDBC expression to *analyze* and *visualize* 
> results from any JDBC compliant data source.
> Intro from the guide:
> {code:java}
> Streaming Expressions exposes the capabilities of Solr Cloud as composable 
> functions. These functions provide a system for searching, transforming, 
> analyzing and visualizing data stored in Solr Cloud collections.
> At a high level there are four main capabilities that will be explored in the 
> documentation:
> * Searching, sampling and aggregating results from Solr.
> * Transforming result sets after they are retrieved from Solr.
> * Analyzing and modeling result sets using probability and statistics and 
> machine learning libraries.
> * Visualizing result sets, aggregations and statistical models of the data.
> {code}
>  
> A few sample visualizations are attached to the ticket.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-13105) A visual guide to Solr Math Expressions and Streaming Expressions

2019-10-02 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16942868#comment-16942868
 ] 

ASF subversion and git services commented on SOLR-13105:


Commit c9455b27be50a4bf04b50ad597cd0a412743be33 in lucene-solr's branch 
refs/heads/SOLR-13105-visual from Joel Bernstein
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=c9455b2 ]

SOLR-13105: Update regression header toc 1


> A visual guide to Solr Math Expressions and Streaming Expressions
> -
>
> Key: SOLR-13105
> URL: https://issues.apache.org/jira/browse/SOLR-13105
> Project: Solr
>  Issue Type: New Feature
>Reporter: Joel Bernstein
>Assignee: Joel Bernstein
>Priority: Major
> Attachments: Screen Shot 2019-01-14 at 10.56.32 AM.png, Screen Shot 
> 2019-02-21 at 2.14.43 PM.png, Screen Shot 2019-03-03 at 2.28.35 PM.png, 
> Screen Shot 2019-03-04 at 7.47.57 PM.png, Screen Shot 2019-03-13 at 10.47.47 
> AM.png, Screen Shot 2019-03-30 at 6.17.04 PM.png
>
>
> Visualization is now a fundamental element of Solr Streaming Expressions and 
> Math Expressions. This ticket will create a visual guide to Solr Math 
> Expressions and Solr Streaming Expressions that includes *Apache Zeppelin* 
> visualization examples.
> It will also cover using the JDBC expression to *analyze* and *visualize* 
> results from any JDBC compliant data source.
> Intro from the guide:
> {code:java}
> Streaming Expressions exposes the capabilities of Solr Cloud as composable 
> functions. These functions provide a system for searching, transforming, 
> analyzing and visualizing data stored in Solr Cloud collections.
> At a high level there are four main capabilities that will be explored in the 
> documentation:
> * Searching, sampling and aggregating results from Solr.
> * Transforming result sets after they are retrieved from Solr.
> * Analyzing and modeling result sets using probability and statistics and 
> machine learning libraries.
> * Visualizing result sets, aggregations and statistical models of the data.
> {code}
>  
> A few sample visualizations are attached to the ticket.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-13105) A visual guide to Solr Math Expressions and Streaming Expressions

2019-10-02 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16942866#comment-16942866
 ] 

ASF subversion and git services commented on SOLR-13105:


Commit f1d6b5efc946163fa1e554387a5ced4767e41332 in lucene-solr's branch 
refs/heads/SOLR-13105-visual from Joel Bernstein
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=f1d6b5e ]

SOLR-13105: Add simulations header toc 1


> A visual guide to Solr Math Expressions and Streaming Expressions
> -
>
> Key: SOLR-13105
> URL: https://issues.apache.org/jira/browse/SOLR-13105
> Project: Solr
>  Issue Type: New Feature
>Reporter: Joel Bernstein
>Assignee: Joel Bernstein
>Priority: Major
> Attachments: Screen Shot 2019-01-14 at 10.56.32 AM.png, Screen Shot 
> 2019-02-21 at 2.14.43 PM.png, Screen Shot 2019-03-03 at 2.28.35 PM.png, 
> Screen Shot 2019-03-04 at 7.47.57 PM.png, Screen Shot 2019-03-13 at 10.47.47 
> AM.png, Screen Shot 2019-03-30 at 6.17.04 PM.png
>
>
> Visualization is now a fundamental element of Solr Streaming Expressions and 
> Math Expressions. This ticket will create a visual guide to Solr Math 
> Expressions and Solr Streaming Expressions that includes *Apache Zeppelin* 
> visualization examples.
> It will also cover using the JDBC expression to *analyze* and *visualize* 
> results from any JDBC compliant data source.
> Intro from the guide:
> {code:java}
> Streaming Expressions exposes the capabilities of Solr Cloud as composable 
> functions. These functions provide a system for searching, transforming, 
> analyzing and visualizing data stored in Solr Cloud collections.
> At a high level there are four main capabilities that will be explored in the 
> documentation:
> * Searching, sampling and aggregating results from Solr.
> * Transforming result sets after they are retrieved from Solr.
> * Analyzing and modeling result sets using probability and statistics and 
> machine learning libraries.
> * Visualizing result sets, aggregations and statistical models of the data.
> {code}
>  
> A few sample visualizations are attached to the ticket.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-13105) A visual guide to Solr Math Expressions and Streaming Expressions

2019-10-02 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16942862#comment-16942862
 ] 

ASF subversion and git services commented on SOLR-13105:


Commit a0878f9325b8557c533d621e4cfeb09ea245891d in lucene-solr's branch 
refs/heads/SOLR-13105-visual from Joel Bernstein
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=a0878f9 ]

SOLR-13105: Add simulations header toc


> A visual guide to Solr Math Expressions and Streaming Expressions
> -
>
> Key: SOLR-13105
> URL: https://issues.apache.org/jira/browse/SOLR-13105
> Project: Solr
>  Issue Type: New Feature
>Reporter: Joel Bernstein
>Assignee: Joel Bernstein
>Priority: Major
> Attachments: Screen Shot 2019-01-14 at 10.56.32 AM.png, Screen Shot 
> 2019-02-21 at 2.14.43 PM.png, Screen Shot 2019-03-03 at 2.28.35 PM.png, 
> Screen Shot 2019-03-04 at 7.47.57 PM.png, Screen Shot 2019-03-13 at 10.47.47 
> AM.png, Screen Shot 2019-03-30 at 6.17.04 PM.png
>
>
> Visualization is now a fundamental element of Solr Streaming Expressions and 
> Math Expressions. This ticket will create a visual guide to Solr Math 
> Expressions and Solr Streaming Expressions that includes *Apache Zeppelin* 
> visualization examples.
> It will also cover using the JDBC expression to *analyze* and *visualize* 
> results from any JDBC compliant data source.
> Intro from the guide:
> {code:java}
> Streaming Expressions exposes the capabilities of Solr Cloud as composable 
> functions. These functions provide a system for searching, transforming, 
> analyzing and visualizing data stored in Solr Cloud collections.
> At a high level there are four main capabilities that will be explored in the 
> documentation:
> * Searching, sampling and aggregating results from Solr.
> * Transforming result sets after they are retrieved from Solr.
> * Analyzing and modeling result sets using probability and statistics and 
> machine learning libraries.
> * Visualizing result sets, aggregations and statistical models of the data.
> {code}
>  
> A few sample visualizations are attached to the ticket.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] atris commented on a change in pull request #916: LUCENE-8213: Asynchronous Caching in LRUQueryCache

2019-10-02 Thread GitBox
atris commented on a change in pull request #916: LUCENE-8213: Asynchronous 
Caching in LRUQueryCache
URL: https://github.com/apache/lucene-solr/pull/916#discussion_r330584691
 
 

 ##
 File path: lucene/core/src/java/org/apache/lucene/search/QueryCache.java
 ##
 @@ -33,4 +35,10 @@
*/
   Weight doCache(Weight weight, QueryCachingPolicy policy);
 
+  /**
+   * Same as above, but allows passing in an Executor to perform caching
+   * asynchronously
+   */
+  Weight doCache(Weight weight, QueryCachingPolicy policy, Executor executor);
 
 Review comment:
   We cannot remove this method since it is declared in `QueryCache` -- 
made it directly delegate to the new method


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] atris commented on a change in pull request #916: LUCENE-8213: Asynchronous Caching in LRUQueryCache

2019-10-02 Thread GitBox
atris commented on a change in pull request #916: LUCENE-8213: Asynchronous 
Caching in LRUQueryCache
URL: https://github.com/apache/lucene-solr/pull/916#discussion_r330582970
 
 

 ##
 File path: lucene/core/src/java/org/apache/lucene/search/LRUQueryCache.java
 ##
 @@ -88,13 +93,36 @@
  * @lucene.experimental
  */
 public class LRUQueryCache implements QueryCache, Accountable {
+  /** Act as key for the inflight queries map */
+  private static class MapKey {
+private final Query query;
+private final IndexReader.CacheKey cacheKey;
+
+public MapKey(Query query, IndexReader.CacheKey cacheKey) {
+  this.query = query;
+  this.cacheKey = cacheKey;
+}
+
+public Query getQuery() {
+  return query;
+}
+
+public IndexReader.CacheKey getCacheKey() {
+  return cacheKey;
+}
+  }
 
   private final int maxSize;
   private final long maxRamBytesUsed;
  private final Predicate<LeafReaderContext> leavesToCache;
   // maps queries that are contained in the cache to a singleton so that this
   // cache does not store several copies of the same query
  private final Map<Query, Query> uniqueQueries;
+  // Marks the inflight queries that are being asynchronously loaded into the 
cache
+  // This is used to ensure that multiple threads do not trigger loading
+  // of the same query in the same cache. We use a set because it is an 
invariant that
+  // the entries of this data structure be unique.
+  private final Map inFlightAsyncLoadQueries = 
new HashMap<>();
 
 Review comment:
   Moved to set
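The in-flight guard being discussed can be sketched roughly like this (illustrative names, not the PR's actual code); note that whatever key type is used (such as the MapKey above) needs correct equals()/hashCode() for the de-duplication to work:

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// A concurrent set of in-flight keys: only the one thread whose add()
// succeeds triggers the asynchronous load for that key; everyone else
// falls through to the uncached path.
public class InFlightGuard<K> {
    private final Set<K> inFlight = ConcurrentHashMap.newKeySet();

    /** Returns true for exactly one caller per key until finishLoad() runs. */
    public boolean tryStartLoad(K key) {
        return inFlight.add(key);
    }

    public void finishLoad(K key) {
        inFlight.remove(key);
    }
}
```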


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] atris commented on a change in pull request #916: LUCENE-8213: Asynchronous Caching in LRUQueryCache

2019-10-02 Thread GitBox
atris commented on a change in pull request #916: LUCENE-8213: Asynchronous 
Caching in LRUQueryCache
URL: https://github.com/apache/lucene-solr/pull/916#discussion_r330581040
 
 

 ##
 File path: lucene/core/src/java/org/apache/lucene/search/LRUQueryCache.java
 ##
 @@ -389,6 +431,7 @@ public void clear() {
   cache.clear();
   // Note that this also clears the uniqueQueries map since 
mostRecentlyUsedQueries is the uniqueQueries.keySet view:
   mostRecentlyUsedQueries.clear();
+  inFlightAsyncLoadQueries.clear();
 
 Review comment:
   Removed


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] atris commented on a change in pull request #916: LUCENE-8213: Asynchronous Caching in LRUQueryCache

2019-10-02 Thread GitBox
atris commented on a change in pull request #916: LUCENE-8213: Asynchronous 
Caching in LRUQueryCache
URL: https://github.com/apache/lucene-solr/pull/916#discussion_r330580984
 
 

 ##
 File path: lucene/core/src/java/org/apache/lucene/search/LRUQueryCache.java
 ##
 @@ -368,10 +400,20 @@ public void clearQuery(Query query) {
 onEviction(singleton);
   }
 } finally {
+  removeQuery(query);
 
 Review comment:
   Removed, thanks


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] jpountz commented on a change in pull request #881: LUCENE-8979: Code Cleanup: Use entryset for map iteration wherever possible. - part 2

2019-10-02 Thread GitBox
jpountz commented on a change in pull request #881: LUCENE-8979: Code Cleanup: 
Use entryset for map iteration wherever possible. - part 2
URL: https://github.com/apache/lucene-solr/pull/881#discussion_r330579248
 
 

 ##
 File path: 
lucene/benchmark/src/java/org/apache/lucene/benchmark/byTask/utils/Config.java
 ##
 @@ -403,15 +404,15 @@ public String getColsValuesForValsByRound(int roundNum) {
   return "";
 }
 StringBuilder sb = new StringBuilder();
-for (final String name : colForValByRound.keySet()) {
-  String colName = colForValByRound.get(name);
+for (final Map.Entry<String, String> entry : colForValByRound.entrySet()) {
+  String colName = entry.getValue();
   String template = " " + colName;
   if (roundNum < 0) {
 // just append blanks
 sb.append(Format.formatPaddLeft("-", template));
   } else {
 // append actual values, for that round
-Object a = valByRound.get(name);
+Object a = valByRound.get(entry.getKey());
 
 Review comment:
   can you extract a variable?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] jpountz commented on a change in pull request #881: LUCENE-8979: Code Cleanup: Use entryset for map iteration wherever possible. - part 2

2019-10-02 Thread GitBox
jpountz commented on a change in pull request #881: LUCENE-8979: Code Cleanup: 
Use entryset for map iteration wherever possible. - part 2
URL: https://github.com/apache/lucene-solr/pull/881#discussion_r330578921
 
 

 ##
 File path: 
lucene/analysis/common/src/java/org/apache/lucene/analysis/query/QueryAutoStopWordAnalyzer.java
 ##
 @@ -200,10 +200,10 @@ protected TokenStreamComponents wrapComponents(String 
fieldName, TokenStreamComponents components)
*/
   public Term[] getStopWords() {
 List<Term> allStopWords = new ArrayList<>();
-for (String fieldName : stopWordsPerField.keySet()) {
-  Set<String> stopWords = stopWordsPerField.get(fieldName);
+for (Map.Entry<String, Set<String>> entry : stopWordsPerField.entrySet()) {
+  Set<String> stopWords = entry.getValue();
   for (String text : stopWords) {
-allStopWords.add(new Term(fieldName, text));
+allStopWords.add(new Term(entry.getKey(), text));
 
 Review comment:
   can you extract a variable to improve readability?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] jpountz commented on a change in pull request #881: LUCENE-8979: Code Cleanup: Use entryset for map iteration wherever possible. - part 2

2019-10-02 Thread GitBox
jpountz commented on a change in pull request #881: LUCENE-8979: Code Cleanup: 
Use entryset for map iteration wherever possible. - part 2
URL: https://github.com/apache/lucene-solr/pull/881#discussion_r330579524
 
 

 ##
 File path: lucene/core/src/java/org/apache/lucene/search/BooleanQuery.java
 ##
 @@ -506,10 +506,10 @@ public Query rewrite(IndexReader reader) throws 
IOException {
 
   @Override
   public void visit(QueryVisitor visitor) {
-for (BooleanClause.Occur occur : clauseSets.keySet()) {
-  if (clauseSets.get(occur).size() > 0) {
-QueryVisitor v = visitor.getSubVisitor(occur, this);
-for (Query q : clauseSets.get(occur)) {
+for (Map.Entry<BooleanClause.Occur, Collection<Query>> entry : clauseSets.entrySet()) {
+  if (entry.getValue().size() > 0) {
+QueryVisitor v = visitor.getSubVisitor(entry.getKey(), this);
+for (Query q : entry.getValue()) {
 
 Review comment:
   can you extract variables?
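For illustration, the extract-variable shape being asked for, shown on a plain map so the snippet is self-contained (Lucene's actual clauseSets/QueryVisitor types are omitted):

```java
import java.util.List;
import java.util.Map;

// entry.getKey()/entry.getValue() pulled into named locals, so the loop
// body reads like the keySet() version it replaced while still iterating
// the entrySet only once.
public class EntrySetStyle {
    static int countClauses(Map<String, List<String>> clauseSets) {
        int n = 0;
        for (Map.Entry<String, List<String>> entry : clauseSets.entrySet()) {
            String occur = entry.getKey();            // extracted variable
            List<String> clauses = entry.getValue();  // extracted variable
            if (!clauses.isEmpty() && occur != null) {
                n += clauses.size();
            }
        }
        return n;
    }
}
```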


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] atris commented on a change in pull request #916: LUCENE-8213: Asynchronous Caching in LRUQueryCache

2019-10-02 Thread GitBox
atris commented on a change in pull request #916: LUCENE-8213: Asynchronous 
Caching in LRUQueryCache
URL: https://github.com/apache/lucene-solr/pull/916#discussion_r330579304
 
 

 ##
 File path: lucene/core/src/java/org/apache/lucene/search/LRUQueryCache.java
 ##
 @@ -813,8 +918,23 @@ public BulkScorer bulkScorer(LeafReaderContext context) 
throws IOException {
 
   if (docIdSet == null) {
 if (policy.shouldCache(in.getQuery())) {
-  docIdSet = cache(context);
-  putIfAbsent(in.getQuery(), docIdSet, cacheHelper);
+  boolean cacheSynchronously = executor == null;
+  // If asynchronous caching is requested, perform the same and return
+  // the uncached iterator
+  if (cacheSynchronously == false) {
+cacheSynchronously = cacheAsynchronously(context, cacheHelper);
+
+// If async caching failed, we will perform synchronous caching
+// hence do not return the uncached value here
+if (cacheSynchronously == false) {
 
 Review comment:
  Not necessarily -- cacheAsynchronously() might have failed, in which case it 
will return true and this code path will trigger synchronous caching





[GitHub] [lucene-solr] atris commented on a change in pull request #916: LUCENE-8213: Asynchronous Caching in LRUQueryCache

2019-10-02 Thread GitBox
atris commented on a change in pull request #916: LUCENE-8213: Asynchronous 
Caching in LRUQueryCache
URL: https://github.com/apache/lucene-solr/pull/916#discussion_r330578984
 
 

 ##
 File path: lucene/core/src/java/org/apache/lucene/search/LRUQueryCache.java
 ##
 @@ -732,8 +821,24 @@ public ScorerSupplier scorerSupplier(LeafReaderContext 
context) throws IOExcepti
 
   if (docIdSet == null) {
 if (policy.shouldCache(in.getQuery())) {
-  docIdSet = cache(context);
-  putIfAbsent(in.getQuery(), docIdSet, cacheHelper);
+  boolean cacheSynchronously = executor == null;
+
+  // If asynchronous caching is requested, perform the same and return
+  // the uncached iterator
+  if (cacheSynchronously == false) {
+cacheSynchronously = cacheAsynchronously(context, cacheHelper);
+
+// If async caching failed, synchronous caching will
+// be performed, hence do not return the uncached value
+if (cacheSynchronously == false) {
+  return in.scorerSupplier(context);
+}
+  }
+
+  if (cacheSynchronously) {
 
 Review comment:
  We need this to be checked even after async caching, since async caching 
might have failed, in which case we will have to perform synchronous caching





[GitHub] [lucene-solr] atris commented on a change in pull request #916: LUCENE-8213: Asynchronous Caching in LRUQueryCache

2019-10-02 Thread GitBox
atris commented on a change in pull request #916: LUCENE-8213: Asynchronous 
Caching in LRUQueryCache
URL: https://github.com/apache/lucene-solr/pull/916#discussion_r330577294
 
 

 ##
 File path: lucene/core/src/java/org/apache/lucene/search/LRUQueryCache.java
 ##
 @@ -88,13 +93,36 @@
  * @lucene.experimental
  */
 public class LRUQueryCache implements QueryCache, Accountable {
+  /** Act as key for the inflight queries map */
+  private static class MapKey {
+private final Query query;
+private final IndexReader.CacheKey cacheKey;
+
+public MapKey(Query query, IndexReader.CacheKey cacheKey) {
+  this.query = query;
+  this.cacheKey = cacheKey;
+}
+
+public Query getQuery() {
+  return query;
+}
+
+public IndexReader.CacheKey getCacheKey() {
+  return cacheKey;
+}
+  }
 
 Review comment:
  Oops, don't know how it did not make it into this commit; let me check that 
right away





[GitHub] [lucene-solr] atris commented on a change in pull request #916: LUCENE-8213: Asynchronous Caching in LRUQueryCache

2019-10-02 Thread GitBox
atris commented on a change in pull request #916: LUCENE-8213: Asynchronous 
Caching in LRUQueryCache
URL: https://github.com/apache/lucene-solr/pull/916#discussion_r330575572
 
 

 ##
 File path: lucene/core/src/java/org/apache/lucene/search/LRUQueryCache.java
 ##
 @@ -88,13 +93,36 @@
  * @lucene.experimental
  */
 public class LRUQueryCache implements QueryCache, Accountable {
+  /** Act as key for the inflight queries map */
+  private static class MapKey {
+private final Query query;
+private final IndexReader.CacheKey cacheKey;
+
+public MapKey(Query query, IndexReader.CacheKey cacheKey) {
+  this.query = query;
+  this.cacheKey = cacheKey;
+}
+
+public Query getQuery() {
+  return query;
+}
+
+public IndexReader.CacheKey getCacheKey() {
+  return cacheKey;
+}
+  }
 
   private final int maxSize;
   private final long maxRamBytesUsed;
  private final Predicate<LeafReaderContext> leavesToCache;
   // maps queries that are contained in the cache to a singleton so that this
   // cache does not store several copies of the same query
  private final Map<Query, Query> uniqueQueries;
+  // Marks the inflight queries that are being asynchronously loaded into the cache
+  // This is used to ensure that multiple threads do not trigger loading
+  // of the same query in the same cache. We use a set because it is an invariant that
+  // the entries of this data structure be unique.
+  private final Map<MapKey, IndexReader.CacheKey> inFlightAsyncLoadQueries = new HashMap<>();
 
 Review comment:
   I originally used a Set, but moved it to a map specifically to enable tests 
for the double caching case.





[jira] [Commented] (SOLR-13105) A visual guide to Solr Math Expressions and Streaming Expressions

2019-10-02 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16942851#comment-16942851
 ] 

ASF subversion and git services commented on SOLR-13105:


Commit 9060aee4d8e8cd4b14846dc9990d650e390fdb09 in lucene-solr's branch 
refs/heads/SOLR-13105-visual from Joel Bernstein
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=9060aee ]

SOLR-13105: Update machine learning docs 11


> A visual guide to Solr Math Expressions and Streaming Expressions
> -
>
> Key: SOLR-13105
> URL: https://issues.apache.org/jira/browse/SOLR-13105
> Project: Solr
>  Issue Type: New Feature
>Reporter: Joel Bernstein
>Assignee: Joel Bernstein
>Priority: Major
> Attachments: Screen Shot 2019-01-14 at 10.56.32 AM.png, Screen Shot 
> 2019-02-21 at 2.14.43 PM.png, Screen Shot 2019-03-03 at 2.28.35 PM.png, 
> Screen Shot 2019-03-04 at 7.47.57 PM.png, Screen Shot 2019-03-13 at 10.47.47 
> AM.png, Screen Shot 2019-03-30 at 6.17.04 PM.png
>
>
> Visualization is now a fundamental element of Solr Streaming Expressions and 
> Math Expressions. This ticket will create a visual guide to Solr Math 
> Expressions and Solr Streaming Expressions that includes *Apache Zeppelin* 
> visualization examples.
> It will also cover using the JDBC expression to *analyze* and *visualize* 
> results from any JDBC compliant data source.
> Intro from the guide:
> {code:java}
> Streaming Expressions exposes the capabilities of Solr Cloud as composable 
> functions. These functions provide a system for searching, transforming, 
> analyzing and visualizing data stored in Solr Cloud collections.
> At a high level there are four main capabilities that will be explored in the 
> documentation:
> * Searching, sampling and aggregating results from Solr.
> * Transforming result sets after they are retrieved from Solr.
> * Analyzing and modeling result sets using probability and statistics and 
> machine learning libraries.
> * Visualizing result sets, aggregations and statistical models of the data.
> {code}
>  
> A few sample visualizations are attached to the ticket.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] jpountz commented on a change in pull request #916: LUCENE-8213: Asynchronous Caching in LRUQueryCache

2019-10-02 Thread GitBox
jpountz commented on a change in pull request #916: LUCENE-8213: Asynchronous 
Caching in LRUQueryCache
URL: https://github.com/apache/lucene-solr/pull/916#discussion_r330550479
 
 

 ##
 File path: lucene/core/src/java/org/apache/lucene/search/QueryCache.java
 ##
 @@ -33,4 +35,10 @@
*/
   Weight doCache(Weight weight, QueryCachingPolicy policy);
 
+  /**
+   * Same as above, but allows passing in an Executor to perform caching
+   * asynchronously
+   */
+  Weight doCache(Weight weight, QueryCachingPolicy policy, Executor executor);
 
 Review comment:
   Let's remove the other doCache and only have this one, with a `null` 
executor signaling that things should get cached in the current thread?
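A hedged sketch of the single-method API being proposed, where a `null` executor signals synchronous caching on the calling thread. The `String`-based signature below is an illustrative stand-in, not Lucene's actual `Weight`/`QueryCachingPolicy` types:

```java
import java.util.concurrent.Executor;

public class DoCacheSketch {
    // Hypothetical model of the suggested API: a single doCache whose null
    // executor means "cache synchronously on the calling thread".
    static String doCache(String query, Executor executor) {
        if (executor == null) {
            // caller's thread does the caching work itself
            return "cached-synchronously:" + query;
        }
        // otherwise hand the load off to the executor and return immediately
        executor.execute(() -> { /* asynchronous load of `query` */ });
        return "caching-asynchronously:" + query;
    }
}
```

Collapsing to one method keeps callers from having to choose between two overloads; the `null` sentinel carries the sync/async decision.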





[GitHub] [lucene-solr] jpountz commented on a change in pull request #916: LUCENE-8213: Asynchronous Caching in LRUQueryCache

2019-10-02 Thread GitBox
jpountz commented on a change in pull request #916: LUCENE-8213: Asynchronous 
Caching in LRUQueryCache
URL: https://github.com/apache/lucene-solr/pull/916#discussion_r330554876
 
 

 ##
 File path: lucene/core/src/java/org/apache/lucene/search/LRUQueryCache.java
 ##
 @@ -88,13 +93,36 @@
  * @lucene.experimental
  */
 public class LRUQueryCache implements QueryCache, Accountable {
+  /** Act as key for the inflight queries map */
+  private static class MapKey {
+private final Query query;
+private final IndexReader.CacheKey cacheKey;
+
+public MapKey(Query query, IndexReader.CacheKey cacheKey) {
+  this.query = query;
+  this.cacheKey = cacheKey;
+}
+
+public Query getQuery() {
+  return query;
+}
+
+public IndexReader.CacheKey getCacheKey() {
+  return cacheKey;
+}
+  }
 
 Review comment:
  We need equals/hashCode, or this will never prevent caching the same 
query multiple times.
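A sketch of `MapKey` with the missing `equals`/`hashCode`, using `Object` as a stand-in for Lucene's `Query` and `IndexReader.CacheKey` types; `MapKeyDemo` is a hypothetical helper showing why the override matters for duplicate detection:

```java
import java.util.Objects;

// Sketch only: without equals/hashCode, two structurally identical MapKey
// instances would hash differently and both loads would be triggered.
final class MapKey {
    private final Object query;
    private final Object cacheKey;

    MapKey(Object query, Object cacheKey) {
        this.query = query;
        this.cacheKey = cacheKey;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if ((o instanceof MapKey) == false) return false;
        MapKey other = (MapKey) o;
        return Objects.equals(query, other.query)
            && Objects.equals(cacheKey, other.cacheKey);
    }

    @Override
    public int hashCode() {
        return Objects.hash(query, cacheKey);
    }
}

class MapKeyDemo {
    // returns true iff a second, structurally equal key is seen as a duplicate
    static boolean detectsDuplicate() {
        java.util.Map<MapKey, Object> inFlight = new java.util.HashMap<>();
        Object cacheKey = new Object();
        MapKey first = new MapKey("query", cacheKey);
        MapKey second = new MapKey("query", cacheKey);
        return inFlight.putIfAbsent(first, cacheKey) == null
            && inFlight.putIfAbsent(second, cacheKey) != null;
    }
}
```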





[GitHub] [lucene-solr] jpountz commented on a change in pull request #916: LUCENE-8213: Asynchronous Caching in LRUQueryCache

2019-10-02 Thread GitBox
jpountz commented on a change in pull request #916: LUCENE-8213: Asynchronous 
Caching in LRUQueryCache
URL: https://github.com/apache/lucene-solr/pull/916#discussion_r330564331
 
 

 ##
 File path: lucene/core/src/java/org/apache/lucene/search/LRUQueryCache.java
 ##
 @@ -832,5 +952,47 @@ public BulkScorer bulkScorer(LeafReaderContext context) 
throws IOException {
   return new DefaultBulkScorer(new ConstantScoreScorer(this, 0f, 
ScoreMode.COMPLETE_NO_SCORES, disi));
 }
 
+// Perform a cache load asynchronously
+// @return true if synchronous caching is needed, false otherwise
+private boolean cacheAsynchronously(LeafReaderContext context, 
IndexReader.CacheHelper cacheHelper) {
+  /*
+   * If the current query is already being asynchronously cached,
+   * do not trigger another cache operation
+   */
+  Object returnValue = inFlightAsyncLoadQueries.putIfAbsent(new 
MapKey(in.getQuery(),
+  cacheHelper.getKey()), cacheHelper.getKey());
+
+  assert returnValue == null || returnValue == cacheHelper.getKey();
+
+  if (returnValue != null) {
+return false;
+  }
+
+  FutureTask<Void> task = new FutureTask<>(() -> {
+DocIdSet localDocIdSet = cache(context);
+putIfAbsent(in.getQuery(), localDocIdSet, cacheHelper);
+
+// Remove the key from inflight -- the key is loaded now
+Object retValue = inFlightAsyncLoadQueries.remove(new 
MapKey(in.getQuery(), cacheHelper.getKey()));
+
+// The query should have been present in the inflight queries set 
before
+// we actually loaded it -- hence the removal of the key should be 
successful
+assert retValue != null;
+
+if (countDownLatch != null) {
+  countDownLatch.countDown();
+}
+
+return null;
+  });
+  try {
+executor.execute(task);
+  } catch (RejectedExecutionException e) {
+// Trigger synchronous caching
+return true;
+  }
 
 Review comment:
   Same here, we need to remove from inFlightAsyncLoadQueries on every code 
path.





[GitHub] [lucene-solr] jpountz commented on a change in pull request #916: LUCENE-8213: Asynchronous Caching in LRUQueryCache

2019-10-02 Thread GitBox
jpountz commented on a change in pull request #916: LUCENE-8213: Asynchronous 
Caching in LRUQueryCache
URL: https://github.com/apache/lucene-solr/pull/916#discussion_r330553473
 
 

 ##
 File path: lucene/core/src/java/org/apache/lucene/search/LRUQueryCache.java
 ##
 @@ -389,6 +431,7 @@ public void clear() {
   cache.clear();
   // Note that this also clears the uniqueQueries map since 
mostRecentlyUsedQueries is the uniqueQueries.keySet view:
   mostRecentlyUsedQueries.clear();
+  inFlightAsyncLoadQueries.clear();
 
 Review comment:
   same here, I don't think it's correct?





[GitHub] [lucene-solr] jpountz commented on a change in pull request #916: LUCENE-8213: Asynchronous Caching in LRUQueryCache

2019-10-02 Thread GitBox
jpountz commented on a change in pull request #916: LUCENE-8213: Asynchronous 
Caching in LRUQueryCache
URL: https://github.com/apache/lucene-solr/pull/916#discussion_r330546874
 
 

 ##
 File path: lucene/core/src/java/org/apache/lucene/search/LRUQueryCache.java
 ##
 @@ -813,8 +918,23 @@ public BulkScorer bulkScorer(LeafReaderContext context) 
throws IOException {
 
   if (docIdSet == null) {
 if (policy.shouldCache(in.getQuery())) {
-  docIdSet = cache(context);
-  putIfAbsent(in.getQuery(), docIdSet, cacheHelper);
+  boolean cacheSynchronously = executor == null;
+  // If asynchronous caching is requested, perform the same and return
+  // the uncached iterator
+  if (cacheSynchronously == false) {
+cacheSynchronously = cacheAsynchronously(context, cacheHelper);
+
+// If async caching failed, we will perform synchronous caching
+// hence do not return the uncached value here
+if (cacheSynchronously == false) {
 
 Review comment:
   cacheSynchronously is necessarily false already?





[GitHub] [lucene-solr] jpountz commented on a change in pull request #916: LUCENE-8213: Asynchronous Caching in LRUQueryCache

2019-10-02 Thread GitBox
jpountz commented on a change in pull request #916: LUCENE-8213: Asynchronous 
Caching in LRUQueryCache
URL: https://github.com/apache/lucene-solr/pull/916#discussion_r330572346
 
 

 ##
 File path: lucene/core/src/java/org/apache/lucene/search/LRUQueryCache.java
 ##
 @@ -448,13 +491,48 @@ void assertConsistent() {
 }
   }
 
+  // pkg-private for testing
+  void setCountDownLatch(CountDownLatch latch) {
 
 Review comment:
   do you think we could avoid setting a latch here, and maybe instead calling 
countDown from a subclass' onDocIdSetCache?





[GitHub] [lucene-solr] jpountz commented on a change in pull request #916: LUCENE-8213: Asynchronous Caching in LRUQueryCache

2019-10-02 Thread GitBox
jpountz commented on a change in pull request #916: LUCENE-8213: Asynchronous 
Caching in LRUQueryCache
URL: https://github.com/apache/lucene-solr/pull/916#discussion_r330552818
 
 

 ##
 File path: lucene/core/src/java/org/apache/lucene/search/LRUQueryCache.java
 ##
 @@ -368,10 +400,20 @@ public void clearQuery(Query query) {
 onEviction(singleton);
   }
 } finally {
+  removeQuery(query);
 
 Review comment:
   I don't think this is correct? The fact that we are removing entries for a 
query doesn't cancel the loading of cache entries?





[GitHub] [lucene-solr] jpountz commented on a change in pull request #916: LUCENE-8213: Asynchronous Caching in LRUQueryCache

2019-10-02 Thread GitBox
jpountz commented on a change in pull request #916: LUCENE-8213: Asynchronous 
Caching in LRUQueryCache
URL: https://github.com/apache/lucene-solr/pull/916#discussion_r330549857
 
 

 ##
 File path: lucene/core/src/java/org/apache/lucene/search/LRUQueryCache.java
 ##
 @@ -832,5 +952,47 @@ public BulkScorer bulkScorer(LeafReaderContext context) 
throws IOException {
   return new DefaultBulkScorer(new ConstantScoreScorer(this, 0f, 
ScoreMode.COMPLETE_NO_SCORES, disi));
 }
 
+// Perform a cache load asynchronously
+// @return true if synchronous caching is needed, false otherwise
+private boolean cacheAsynchronously(LeafReaderContext context, 
IndexReader.CacheHelper cacheHelper) {
+  /*
+   * If the current query is already being asynchronously cached,
+   * do not trigger another cache operation
+   */
+  Object returnValue = inFlightAsyncLoadQueries.putIfAbsent(new 
MapKey(in.getQuery(),
+  cacheHelper.getKey()), cacheHelper.getKey());
+
+  assert returnValue == null || returnValue == cacheHelper.getKey();
+
+  if (returnValue != null) {
+return false;
+  }
+
+  FutureTask<Void> task = new FutureTask<>(() -> {
+DocIdSet localDocIdSet = cache(context);
+putIfAbsent(in.getQuery(), localDocIdSet, cacheHelper);
+
+// Remove the key from inflight -- the key is loaded now
+Object retValue = inFlightAsyncLoadQueries.remove(new 
MapKey(in.getQuery(), cacheHelper.getKey()));
 
 Review comment:
  We should probably put it in a finally block to make sure it runs even in 
case of exceptions in the above calls. Otherwise we'd have a memory leak.
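A runnable sketch of the pattern being asked for, with a `String` key and a `ConcurrentMap` standing in for the real `(Query, CacheKey)`-keyed structure: the `finally` guarantees the in-flight marker is removed even when the load throws:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.Executor;

public class InFlightDemo {
    // Hypothetical stand-in for the in-flight map; the real code keys on a
    // (Query, IndexReader.CacheKey) pair rather than a String.
    static final ConcurrentMap<String, Boolean> inFlight = new ConcurrentHashMap<>();

    static void cacheAsync(String key, Runnable load, Executor executor) {
        if (inFlight.putIfAbsent(key, Boolean.TRUE) != null) {
            return; // another thread is already caching this query
        }
        executor.execute(() -> {
            try {
                load.run(); // the actual cache load; may throw
            } finally {
                // runs on every code path, so a failed load cannot leak
                // a stale in-flight marker
                inFlight.remove(key);
            }
        });
    }
}
```

Without the `finally`, an exception during the load would leave the key stuck in the map and the query could never be cached again.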





[GitHub] [lucene-solr] jpountz commented on a change in pull request #916: LUCENE-8213: Asynchronous Caching in LRUQueryCache

2019-10-02 Thread GitBox
jpountz commented on a change in pull request #916: LUCENE-8213: Asynchronous 
Caching in LRUQueryCache
URL: https://github.com/apache/lucene-solr/pull/916#discussion_r330546325
 
 

 ##
 File path: lucene/core/src/java/org/apache/lucene/search/LRUQueryCache.java
 ##
 @@ -732,8 +821,24 @@ public ScorerSupplier scorerSupplier(LeafReaderContext 
context) throws IOExcepti
 
   if (docIdSet == null) {
 if (policy.shouldCache(in.getQuery())) {
-  docIdSet = cache(context);
-  putIfAbsent(in.getQuery(), docIdSet, cacheHelper);
+  boolean cacheSynchronously = executor == null;
+
+  // If asynchronous caching is requested, perform the same and return
+  // the uncached iterator
+  if (cacheSynchronously == false) {
+cacheSynchronously = cacheAsynchronously(context, cacheHelper);
+
+// If async caching failed, synchronous caching will
+// be performed, hence do not return the uncached value
+if (cacheSynchronously == false) {
+  return in.scorerSupplier(context);
+}
+  }
+
+  if (cacheSynchronously) {
 
 Review comment:
   make it an `else`?
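The suggested restructuring can be modeled as a plain if / else-if / else over the two conditions. Everything below is a simplified, hypothetical model of the control flow, not the actual LRUQueryCache code:

```java
public class CachePathDemo {
    // Models which caching path is taken, restructured as the reviewer
    // suggests: one branch per outcome instead of sequential boolean checks.
    static String choosePath(boolean hasExecutor, boolean asyncRejected) {
        if (hasExecutor == false) {
            return "sync";          // no executor: cache on the caller's thread
        } else if (asyncRejected) {
            return "sync-fallback"; // executor rejected the task: fall back to sync
        } else {
            return "uncached";      // async load scheduled: serve the uncached scorer
        }
    }
}
```

Making the fallback an explicit `else` branch removes the re-check of a flag that was just assigned, which is what made the original flow hard to follow.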





[GitHub] [lucene-solr] jpountz commented on a change in pull request #916: LUCENE-8213: Asynchronous Caching in LRUQueryCache

2019-10-02 Thread GitBox
jpountz commented on a change in pull request #916: LUCENE-8213: Asynchronous 
Caching in LRUQueryCache
URL: https://github.com/apache/lucene-solr/pull/916#discussion_r330545241
 
 

 ##
 File path: lucene/core/src/java/org/apache/lucene/search/LRUQueryCache.java
 ##
 @@ -88,13 +93,36 @@
  * @lucene.experimental
  */
 public class LRUQueryCache implements QueryCache, Accountable {
+  /** Act as key for the inflight queries map */
+  private static class MapKey {
+private final Query query;
+private final IndexReader.CacheKey cacheKey;
+
+public MapKey(Query query, IndexReader.CacheKey cacheKey) {
+  this.query = query;
+  this.cacheKey = cacheKey;
+}
+
+public Query getQuery() {
+  return query;
+}
+
+public IndexReader.CacheKey getCacheKey() {
+  return cacheKey;
+}
+  }
 
   private final int maxSize;
   private final long maxRamBytesUsed;
  private final Predicate<LeafReaderContext> leavesToCache;
   // maps queries that are contained in the cache to a singleton so that this
   // cache does not store several copies of the same query
  private final Map<Query, Query> uniqueQueries;
+  // Marks the inflight queries that are being asynchronously loaded into the cache
+  // This is used to ensure that multiple threads do not trigger loading
+  // of the same query in the same cache. We use a set because it is an invariant that
+  // the entries of this data structure be unique.
+  private final Map<MapKey, IndexReader.CacheKey> inFlightAsyncLoadQueries = new HashMap<>();
 
 Review comment:
   use a set instead? I see you added a couple of assertions on return values, 
but they don't seem to add more value than what we could get with a set?
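If the map were replaced by a set as suggested, a concurrent key set gives the same single-winner semantics via `add()`'s boolean return. The sketch below uses a `String` key as a stand-in for the real `(Query, CacheKey)` pair:

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class InFlightSetSketch {
    // Sketch of the reviewer's suggestion: a concurrent set of in-flight keys
    // instead of a map whose values are never really used.
    static final Set<String> inFlight = ConcurrentHashMap.newKeySet();

    static boolean tryMarkInFlight(String key) {
        // add() returns false if the key was already present, so only one
        // thread wins the right to trigger the asynchronous load
        return inFlight.add(key);
    }
}
```

This drops the value objects and the assertions on them; the uniqueness invariant is carried by the set itself.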





[GitHub] [lucene-solr] jpountz closed pull request #884: LUCENE-8980: optimise SegmentTermsEnum.seekExact performance

2019-10-02 Thread GitBox
jpountz closed pull request #884: LUCENE-8980: optimise 
SegmentTermsEnum.seekExact performance
URL: https://github.com/apache/lucene-solr/pull/884
 
 
   





[GitHub] [lucene-solr] jpountz commented on issue #884: LUCENE-8980: optimise SegmentTermsEnum.seekExact performance

2019-10-02 Thread GitBox
jpountz commented on issue #884: LUCENE-8980: optimise 
SegmentTermsEnum.seekExact performance
URL: https://github.com/apache/lucene-solr/pull/884#issuecomment-537483307
 
 
   Closing now that @dsmiley merged this change.





[jira] [Commented] (SOLR-13105) A visual guide to Solr Math Expressions and Streaming Expressions

2019-10-02 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16942791#comment-16942791
 ] 

ASF subversion and git services commented on SOLR-13105:


Commit 4481e5ba9f94f1182cc228653d722bc09689a7df in lucene-solr's branch 
refs/heads/SOLR-13105-visual from Joel Bernstein
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=4481e5b ]

SOLR-13105: Update machine learning docs 10


> A visual guide to Solr Math Expressions and Streaming Expressions
> -
>
> Key: SOLR-13105
> URL: https://issues.apache.org/jira/browse/SOLR-13105
> Project: Solr
>  Issue Type: New Feature
>Reporter: Joel Bernstein
>Assignee: Joel Bernstein
>Priority: Major
> Attachments: Screen Shot 2019-01-14 at 10.56.32 AM.png, Screen Shot 
> 2019-02-21 at 2.14.43 PM.png, Screen Shot 2019-03-03 at 2.28.35 PM.png, 
> Screen Shot 2019-03-04 at 7.47.57 PM.png, Screen Shot 2019-03-13 at 10.47.47 
> AM.png, Screen Shot 2019-03-30 at 6.17.04 PM.png
>
>
> Visualization is now a fundamental element of Solr Streaming Expressions and 
> Math Expressions. This ticket will create a visual guide to Solr Math 
> Expressions and Solr Streaming Expressions that includes *Apache Zeppelin* 
> visualization examples.
> It will also cover using the JDBC expression to *analyze* and *visualize* 
> results from any JDBC compliant data source.
> Intro from the guide:
> {code:java}
> Streaming Expressions exposes the capabilities of Solr Cloud as composable 
> functions. These functions provide a system for searching, transforming, 
> analyzing and visualizing data stored in Solr Cloud collections.
> At a high level there are four main capabilities that will be explored in the 
> documentation:
> * Searching, sampling and aggregating results from Solr.
> * Transforming result sets after they are retrieved from Solr.
> * Analyzing and modeling result sets using probability and statistics and 
> machine learning libraries.
> * Visualizing result sets, aggregations and statistical models of the data.
> {code}
>  
> A few sample visualizations are attached to the ticket.





