[GitHub] [lucene-solr] thomaswoeckinger commented on a change in pull request #911: SOLR-13802: Write analyzer property luceneMatchVersion to managed schema

2019-09-30 Thread GitBox
thomaswoeckinger commented on a change in pull request #911: SOLR-13802: Write 
analyzer property luceneMatchVersion to managed schema
URL: https://github.com/apache/lucene-solr/pull/911#discussion_r329888692
 
 

 ##
 File path: 
solr/core/src/test/org/apache/solr/analysis/TestWordDelimiterFilterFactory.java
 ##
 @@ -222,7 +223,7 @@ public void testCustomTypes() throws Exception {
 /* custom behavior */
 args = new HashMap<>();
 // use a custom type mapping
-args.put("luceneMatchVersion", Version.LATEST.toString());
+args.put(IndexSchema.LUCENE_MATCH_VERSION_PARAM, 
Version.LATEST.toString());
 
 Review comment:
   later comment?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] dsmiley commented on issue #910: SOLR-13661 : SOLR-13661 A package management system for Solr

2019-09-30 Thread GitBox
dsmiley commented on issue #910: SOLR-13661 :  SOLR-13661 A package management 
system for Solr
URL: https://github.com/apache/lucene-solr/pull/910#issuecomment-536862873
 
 
   The scope here is too much to review. Sub-system / component is more 
approachable. For example, the new blob thingy ought to be its own issue/PR.





[GitHub] [lucene-solr] dsmiley commented on a change in pull request #911: SOLR-13802: Write analyzer property luceneMatchVersion to managed schema

2019-09-30 Thread GitBox
dsmiley commented on a change in pull request #911: SOLR-13802: Write analyzer 
property luceneMatchVersion to managed schema
URL: https://github.com/apache/lucene-solr/pull/911#discussion_r329877193
 
 

 ##
 File path: 
solr/core/src/test/org/apache/solr/analysis/TestWordDelimiterFilterFactory.java
 ##
 @@ -222,7 +223,7 @@ public void testCustomTypes() throws Exception {
 /* custom behavior */
 args = new HashMap<>();
 // use a custom type mapping
-args.put("luceneMatchVersion", Version.LATEST.toString());
+args.put(IndexSchema.LUCENE_MATCH_VERSION_PARAM, 
Version.LATEST.toString());
 
 Review comment:
   not ugly (see later comment)





[GitHub] [lucene-solr] dsmiley commented on a change in pull request #911: SOLR-13802: Write analyzer property luceneMatchVersion to managed schema

2019-09-30 Thread GitBox
dsmiley commented on a change in pull request #911: SOLR-13802: Write analyzer 
property luceneMatchVersion to managed schema
URL: https://github.com/apache/lucene-solr/pull/911#discussion_r329877133
 
 

 ##
 File path: 
solr/core/src/test/org/apache/solr/rest/schema/TestSerializedLuceneMatchVersion.java
 ##
 @@ -44,13 +45,13 @@ public void testExplicitLuceneMatchVersions() throws 
Exception {
 "count(/response/lst[@name='fieldType']) = 1",
 
 
"//lst[str[@name='class'][.='org.apache.solr.analysis.MockCharFilterFactory']]"
-   +" [str[@name='luceneMatchVersion'][.='4.0.0']]",
+   +" [str[@name='" + IndexSchema.LUCENE_MATCH_VERSION_PARAM + 
"'][.='4.0.0']]",
 
 Review comment:
   again; here in particular it's ugly





[GitHub] [lucene-solr] dsmiley commented on a change in pull request #911: SOLR-13802: Write analyzer property luceneMatchVersion to managed schema

2019-09-30 Thread GitBox
dsmiley commented on a change in pull request #911: SOLR-13802: Write analyzer 
property luceneMatchVersion to managed schema
URL: https://github.com/apache/lucene-solr/pull/911#discussion_r329876510
 
 

 ##
 File path: 
solr/core/src/test/org/apache/solr/rest/schema/TestBulkSchemaAPI.java
 ##
 @@ -125,7 +126,7 @@ public void testAnalyzerClass() throws Exception {
 "'name' : 'myNewTextFieldWithAnalyzerClass',\n" +
 "'class':'solr.TextField',\n" +
 "'analyzer' : {\n" +
-"'luceneMatchVersion':'5.0.0',\n" +
+"'" + IndexSchema.LUCENE_MATCH_VERSION_PARAM + "':'5.0.0',\n" +
 
 Review comment:
   Using Constants in tests like here is a bit too far IMO.  It slightly 
obscures readability and there's a point to be made that changing the 
input/output constant _should_ break a test.  Subjective, I know.   I don't 
mean to suggest the removal of all constants in tests but here in particular 
it's ugly.
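
For illustration, here is a minimal, self-contained sketch of the trade-off the reviewer describes. The class and constant below are stand-ins written for this example (in Solr the constant is IndexSchema.LUCENE_MATCH_VERSION_PARAM, whose value the diff shows is the string "luceneMatchVersion"):

```java
import java.util.HashMap;
import java.util.Map;

public class ConstantVsLiteral {
    // Stand-in for IndexSchema.LUCENE_MATCH_VERSION_PARAM.
    static final String LUCENE_MATCH_VERSION_PARAM = "luceneMatchVersion";

    public static void main(String[] args) {
        // Style A: reference the production constant. The test stays green if
        // the constant is renamed internally -- but it also silently follows
        // any change to the serialized parameter name.
        Map<String, String> viaConstant = new HashMap<>();
        viaConstant.put(LUCENE_MATCH_VERSION_PARAM, "5.0.0");

        // Style B: repeat the literal. Changing the serialized name now breaks
        // the test -- exactly the safety net the reviewer argues for.
        Map<String, String> viaLiteral = new HashMap<>();
        viaLiteral.put("luceneMatchVersion", "5.0.0");

        // The two maps agree as long as constant and literal stay in sync.
        System.out.println(viaConstant.equals(viaLiteral));
    }
}
```

Both styles produce the same map today; they differ only in which kind of change (rename vs. wire-format drift) the test will catch.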





[jira] [Assigned] (SOLR-13802) Analyzer property luceneMatchVersion is not written to managed schema

2019-09-30 Thread David Wayne Smiley (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-13802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Wayne Smiley reassigned SOLR-13802:
-

Assignee: David Wayne Smiley

> Analyzer property luceneMatchVersion is not written to managed schema
> -
>
> Key: SOLR-13802
> URL: https://issues.apache.org/jira/browse/SOLR-13802
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: Schema and Analysis
>Affects Versions: 7.7.2, master (9.0), 8.2
>Reporter: Thomas Wöckinger
>Assignee: David Wayne Smiley
>Priority: Major
>  Labels: easy-fix, pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The analyzer property luceneMatchVersion is not written to the managed 
> schema; it is simply not handled by the code.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)




[jira] [Created] (SOLR-13804) ant precommit fails on OpenJDK11 Corretto

2019-09-30 Thread Koen De Groote (Jira)
Koen De Groote created SOLR-13804:
-

 Summary: ant precommit fails on OpenJDK11 Corretto
 Key: SOLR-13804
 URL: https://issues.apache.org/jira/browse/SOLR-13804
 Project: Solr
  Issue Type: Bug
Reporter: Koen De Groote
 Attachments: ant_precommit_fails.txt

Noticed this while preparing for another pull request. I've attached a file 
with the output of the command; the errors start at point 4.

I'm not sure whether this is down to it being Corretto, or would happen on 
anything other than the Oracle JDK.

At the time of writing, the latest commit was 
67f4c7f36eef2ae75fb80859dfc0e612675cb94d.

My knowledge does not extend far enough to say anything meaningful about this, 
so I'm asking here for people to take a look at it.

Corretto is Amazon's OpenJDK distribution: 
https://docs.aws.amazon.com/corretto/latest/corretto-11-ug/what-is-corretto-11.html

 






[jira] [Commented] (SOLR-13771) Add -v and -m to ulimit section of reference guide and bin/solr checks

2019-09-30 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16941372#comment-16941372
 ] 

ASF subversion and git services commented on SOLR-13771:


Commit 2f0dc888f51ff5b763d1f49aa7b2e621c274d00e in lucene-solr's branch 
refs/heads/branch_8x from Erick Erickson
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=2f0dc88 ]

SOLR-13771: Add -v and -m to ulimit section of reference guide  and bin/solr 
checks. Forgot CHANGES.txt entry

(cherry picked from commit 67f4c7f36eef2ae75fb80859dfc0e612675cb94d)


> Add -v and -m to ulimit section of reference guide  and bin/solr checks
> ---
>
> Key: SOLR-13771
> URL: https://issues.apache.org/jira/browse/SOLR-13771
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Erick Erickson
>Assignee: Erick Erickson
>Priority: Major
> Attachments: SOLR-13771.patch
>
>
> I just noticed these bits in MMapDirectory.java
> {code}
> if (!Constants.JRE_IS_64BIT) {
>   moreInfo = "MMapDirectory should only be used on 64bit platforms, 
> because the address space on 32bit operating systems is too small. ";
> } else if (Constants.WINDOWS) {
>   moreInfo = "Windows is unfortunately very limited on virtual address 
> space. If your index size is several hundred Gigabytes, consider changing to 
> Linux. ";
> } else if (Constants.LINUX) {
>   moreInfo = "Please review 'ulimit -v', 'ulimit -m' (both should return 
> 'unlimited'), and 'sysctl vm.max_map_count'. ";
> } else {
>   moreInfo = "Please review 'ulimit -v', 'ulimit -m' (both should return 
> 'unlimited'). ";
> }
> {code}
> We should add this info to the ref guide, particularly the bits about -v and 
> -m. We already mention ulimits, but only in relation to file handles and 
> processes.
> What about restructuring that section a bit, to something like "operating 
> system settings", so we can include some of the information above.
>  
>  
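
As a rough sketch of the check being added to bin/solr, the two limits can also be queried programmatically. This is an illustration only (it assumes a POSIX system with bash available; it is not the actual bin/solr implementation, which is a shell script):

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;

public class UlimitCheck {
    // ulimit is a shell builtin, so it must be run through a shell.
    static String ulimit(String flag) throws Exception {
        Process p = new ProcessBuilder("bash", "-c", "ulimit " + flag).start();
        try (BufferedReader r = new BufferedReader(
                new InputStreamReader(p.getInputStream()))) {
            return r.readLine();
        }
    }

    public static void main(String[] args) throws Exception {
        // Per the MMapDirectory message quoted above, both -v (virtual memory)
        // and -m (max memory size) should report "unlimited" on Linux.
        for (String flag : new String[] {"-v", "-m"}) {
            String value = ulimit(flag);
            System.out.println("ulimit " + flag + " = " + value);
            if (!"unlimited".equals(value)) {
                System.out.println("  warning: MMapDirectory docs suggest 'unlimited'");
            }
        }
    }
}
```

The same idea expressed in a startup script is simply `ulimit -v` / `ulimit -m` compared against the string `unlimited`.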






[jira] [Commented] (SOLR-13771) Add -v and -m to ulimit section of reference guide and bin/solr checks

2019-09-30 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16941369#comment-16941369
 ] 

ASF subversion and git services commented on SOLR-13771:


Commit 67f4c7f36eef2ae75fb80859dfc0e612675cb94d in lucene-solr's branch 
refs/heads/master from Erick Erickson
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=67f4c7f ]

SOLR-13771: Add -v and -m to ulimit section of reference guide  and bin/solr 
checks. Forgot CHANGES.txt entry


> Add -v and -m to ulimit section of reference guide  and bin/solr checks
> ---
>
> Key: SOLR-13771
> URL: https://issues.apache.org/jira/browse/SOLR-13771
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Erick Erickson
>Assignee: Erick Erickson
>Priority: Major
> Attachments: SOLR-13771.patch
>
>
> I just noticed these bits in MMapDirectory.java
> {code}
> if (!Constants.JRE_IS_64BIT) {
>   moreInfo = "MMapDirectory should only be used on 64bit platforms, 
> because the address space on 32bit operating systems is too small. ";
> } else if (Constants.WINDOWS) {
>   moreInfo = "Windows is unfortunately very limited on virtual address 
> space. If your index size is several hundred Gigabytes, consider changing to 
> Linux. ";
> } else if (Constants.LINUX) {
>   moreInfo = "Please review 'ulimit -v', 'ulimit -m' (both should return 
> 'unlimited'), and 'sysctl vm.max_map_count'. ";
> } else {
>   moreInfo = "Please review 'ulimit -v', 'ulimit -m' (both should return 
> 'unlimited'). ";
> }
> {code}
> We should add this info to the ref guide, particularly the bits about -v and 
> -m. We already mention ulimits, but only in relation to file handles and 
> processes.
> What about restructuring that section a bit, to something like "operating 
> system settings", so we can include some of the information above.
>  
>  






[jira] [Commented] (SOLR-13771) Add -v and -m to ulimit section of reference guide and bin/solr checks

2019-09-30 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16941367#comment-16941367
 ] 

ASF subversion and git services commented on SOLR-13771:


Commit a1f3d2c29a1b61ac01e5defcb097695c43aaadd9 in lucene-solr's branch 
refs/heads/master from Erick Erickson
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=a1f3d2c ]

SOLR-13771: Add -v and -m to ulimit section of reference guide and bin/solr 
checks


> Add -v and -m to ulimit section of reference guide  and bin/solr checks
> ---
>
> Key: SOLR-13771
> URL: https://issues.apache.org/jira/browse/SOLR-13771
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Erick Erickson
>Assignee: Erick Erickson
>Priority: Major
> Attachments: SOLR-13771.patch
>
>
> I just noticed these bits in MMapDirectory.java
> {code}
> if (!Constants.JRE_IS_64BIT) {
>   moreInfo = "MMapDirectory should only be used on 64bit platforms, 
> because the address space on 32bit operating systems is too small. ";
> } else if (Constants.WINDOWS) {
>   moreInfo = "Windows is unfortunately very limited on virtual address 
> space. If your index size is several hundred Gigabytes, consider changing to 
> Linux. ";
> } else if (Constants.LINUX) {
>   moreInfo = "Please review 'ulimit -v', 'ulimit -m' (both should return 
> 'unlimited'), and 'sysctl vm.max_map_count'. ";
> } else {
>   moreInfo = "Please review 'ulimit -v', 'ulimit -m' (both should return 
> 'unlimited'). ";
> }
> {code}
> We should add this info to the ref guide, particularly the bits about -v and 
> -m. We already mention ulimits, but only in relation to file handles and 
> processes.
> What about restructuring that section a bit, to something like "operating 
> system settings", so we can include some of the information above.
>  
>  






[jira] [Updated] (SOLR-13771) Add -v and -m to ulimit section of reference guide and bin/solr checks

2019-09-30 Thread Erick Erickson (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-13771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erick Erickson updated SOLR-13771:
--
Summary: Add -v and -m to ulimit section of reference guide  and bin/solr 
checks  (was: Add -v and -m to ulimit section of reference guide)

> Add -v and -m to ulimit section of reference guide  and bin/solr checks
> ---
>
> Key: SOLR-13771
> URL: https://issues.apache.org/jira/browse/SOLR-13771
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Erick Erickson
>Assignee: Erick Erickson
>Priority: Major
> Attachments: SOLR-13771.patch
>
>
> I just noticed these bits in MMapDirectory.java
> {code}
> if (!Constants.JRE_IS_64BIT) {
>   moreInfo = "MMapDirectory should only be used on 64bit platforms, 
> because the address space on 32bit operating systems is too small. ";
> } else if (Constants.WINDOWS) {
>   moreInfo = "Windows is unfortunately very limited on virtual address 
> space. If your index size is several hundred Gigabytes, consider changing to 
> Linux. ";
> } else if (Constants.LINUX) {
>   moreInfo = "Please review 'ulimit -v', 'ulimit -m' (both should return 
> 'unlimited'), and 'sysctl vm.max_map_count'. ";
> } else {
>   moreInfo = "Please review 'ulimit -v', 'ulimit -m' (both should return 
> 'unlimited'). ";
> }
> {code}
> We should add this info to the ref guide, particularly the bits about -v and 
> -m. We already mention ulimits, but only in relation to file handles and 
> processes.
> What about restructuring that section a bit, to something like "operating 
> system settings", so we can include some of the information above.
>  
>  






[jira] [Updated] (SOLR-13771) Add -v and -m to ulimit section of reference guide

2019-09-30 Thread Erick Erickson (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-13771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erick Erickson updated SOLR-13771:
--
Attachment: SOLR-13771.patch
Status: Open  (was: Open)

Doc change; I also changed bin/solr to check these two ulimits. Committing 
momentarily.

> Add -v and -m to ulimit section of reference guide
> --
>
> Key: SOLR-13771
> URL: https://issues.apache.org/jira/browse/SOLR-13771
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Erick Erickson
>Assignee: Erick Erickson
>Priority: Major
> Attachments: SOLR-13771.patch
>
>
> I just noticed these bits in MMapDirectory.java
> {code}
> if (!Constants.JRE_IS_64BIT) {
>   moreInfo = "MMapDirectory should only be used on 64bit platforms, 
> because the address space on 32bit operating systems is too small. ";
> } else if (Constants.WINDOWS) {
>   moreInfo = "Windows is unfortunately very limited on virtual address 
> space. If your index size is several hundred Gigabytes, consider changing to 
> Linux. ";
> } else if (Constants.LINUX) {
>   moreInfo = "Please review 'ulimit -v', 'ulimit -m' (both should return 
> 'unlimited'), and 'sysctl vm.max_map_count'. ";
> } else {
>   moreInfo = "Please review 'ulimit -v', 'ulimit -m' (both should return 
> 'unlimited'). ";
> }
> {code}
> We should add this info to the ref guide, particularly the bits about -v and 
> -m. We already mention ulimits, but only in relation to file handles and 
> processes.
> What about restructuring that section a bit, to something like "operating 
> system settings", so we can include some of the information above.
>  
>  






[jira] [Commented] (SOLR-13101) Shared storage support in SolrCloud

2019-09-30 Thread Ishan Chattopadhyaya (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16941362#comment-16941362
 ] 

Ishan Chattopadhyaya commented on SOLR-13101:
-

Can we collaborate over the ASF Slack to discuss harmonizing the 3 blob 
stores? I am okay with having all three if they serve different use cases; we 
just need a cohesive and consistent story around them in terms of 
documentation.

bq. I plan on creating a branch jira/SOLR-13101 soon for future work on this 
issue.
How far along is it? Do you foresee a lot more work going in here? Or do you 
suggest we start reviewing it and attempt to merge it soon (in a week or so)?

> Shared storage support in SolrCloud
> ---
>
> Key: SOLR-13101
> URL: https://issues.apache.org/jira/browse/SOLR-13101
> Project: Solr
>  Issue Type: New Feature
>  Components: SolrCloud
>Reporter: Yonik Seeley
>Priority: Major
>  Time Spent: 4h
>  Remaining Estimate: 0h
>
> Solr should have first-class support for shared storage (blob/object stores 
> like S3, google cloud storage, etc. and shared filesystems like HDFS, NFS, 
> etc).
> The key component will likely be a new replica type for shared storage.  It 
> would have many of the benefits of the current "pull" replicas (not indexing 
> on all replicas, all shards identical with no shards getting out-of-sync, 
> etc), but would have additional benefits:
>  - Any shard could become leader (the blob store always has the index)
>  - Better elasticity scaling down
>- durability not linked to number of replicas... a single replica could be 
> common for write workloads
>- could drop to 0 replicas for a shard when not needed (blob store always 
> has index)
>  - Allow for higher performance write workloads by skipping the transaction 
> log
>- don't pay for what you don't need
>- a commit will be necessary to flush to stable storage (blob store)
>  - A lot of the complexity and failure modes go away
> An additional component is a Directory implementation that will work well with 
> blob stores.  We probably want one that treats local disk as a cache since 
> the latency to remote storage is so large.  I think there are still some 
> "locking" issues to be solved here (ensuring that more than one writer to the 
> same index won't corrupt it).  This should probably be pulled out into a 
> different JIRA issue.






[jira] [Commented] (SOLR-13661) A package management system for Solr

2019-09-30 Thread Ishan Chattopadhyaya (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16941355#comment-16941355
 ] 

Ishan Chattopadhyaya commented on SOLR-13661:
-

As per a discussion in the 8.3 release thread, branch cutting has been delayed 
by a week. Let us do the needful (either revert or review/merge this PR) by 
then.

> A package management system for Solr
> 
>
> Key: SOLR-13661
> URL: https://issues.apache.org/jira/browse/SOLR-13661
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Noble Paul
>Assignee: Ishan Chattopadhyaya
>Priority: Blocker
>  Labels: package
> Attachments: plugin-usage.png, repos.png
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Here's the design doc:
> https://docs.google.com/document/d/15b3m3i3NFDKbhkhX_BN0MgvPGZaBj34TKNF2-UNC3U8/edit?usp=sharing






[jira] [Updated] (SOLR-13764) Parse Interval Query from JSON API

2019-09-30 Thread Ishan Chattopadhyaya (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-13764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ishan Chattopadhyaya updated SOLR-13764:

Fix Version/s: 8.3

> Parse Interval Query from JSON API
> --
>
> Key: SOLR-13764
> URL: https://issues.apache.org/jira/browse/SOLR-13764
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: query parsers
>Reporter: Mikhail Khludnev
>Priority: Blocker
> Fix For: 8.3
>
>
> h2. Context
> Lucene has Intervals query LUCENE-8196. Note: these are a kind of healthy 
> man's Spans/Phrases. Note: It's not about ranges nor facets.
> h2. Problem
> There's no way to search by IntervalQuery via JSON Query DSL.
> h2. Suggestion
>  * Create classic QParser \{{ {!interval df=text_content}a_json_param}}, ie 
> one can combine a few such refs in {{json.query.bool}}
>  * It accepts just the name of a JSON param; nothing like this happens yet.
>  * This param carries plain json which is accessible via {{req.getJSON()}}
> please examine 
> https://cwiki.apache.org/confluence/display/SOLR/SOLR-13764+Discussion+-+Interval+Queries+in+JSON
>  for syntax proposal.
> h2. Challenges
>  * I have no idea about particular JSON DSL for these queries, Lucene API 
> seems like easy JSON-able. Proposals are welcome.
>  * Another awkward thing is combining analysis with the low-level query API, 
> e.g. what if one requests a term for one word and analysis yields two tokens; 
> and vice versa, requesting a phrase might end up with a single token stream.
>  * Putting json into Jira ticket description
> h2. Q: Why don't..
> .. put intervals DSL right into {{json.query}}, avoiding these odd param 
> refs? 
>  A: It requires heavy lifting for {{JsonQueryConverter}}, which is streamlined 
> for handling good old HTTP parametrized queries.






[jira] [Commented] (SOLR-13764) Parse Interval Query from JSON API

2019-09-30 Thread Ishan Chattopadhyaya (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16941353#comment-16941353
 ] 

Ishan Chattopadhyaya commented on SOLR-13764:
-

I'm marking this as an 8.3 blocker just to keep track of the issues we 
discussed for 8.3. If this doesn't finish up by the 8.3 timeframe, I'll 
downgrade the priority. It would be good to have it out with 8.3, so hopefully 
we can have it in a week. Thanks [~mkhl].

> Parse Interval Query from JSON API
> --
>
> Key: SOLR-13764
> URL: https://issues.apache.org/jira/browse/SOLR-13764
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: query parsers
>Reporter: Mikhail Khludnev
>Priority: Major
>
> h2. Context
> Lucene has Intervals query LUCENE-8196. Note: these are a kind of healthy 
> man's Spans/Phrases. Note: It's not about ranges nor facets.
> h2. Problem
> There's no way to search by IntervalQuery via JSON Query DSL.
> h2. Suggestion
>  * Create classic QParser \{{ {!interval df=text_content}a_json_param}}, ie 
> one can combine a few such refs in {{json.query.bool}}
>  * It accepts just the name of a JSON param; nothing like this happens yet.
>  * This param carries plain json which is accessible via {{req.getJSON()}}
> please examine 
> https://cwiki.apache.org/confluence/display/SOLR/SOLR-13764+Discussion+-+Interval+Queries+in+JSON
>  for syntax proposal.
> h2. Challenges
>  * I have no idea about particular JSON DSL for these queries, Lucene API 
> seems like easy JSON-able. Proposals are welcome.
>  * Another awkward thing is combining analysis with the low-level query API, 
> e.g. what if one requests a term for one word and analysis yields two tokens; 
> and vice versa, requesting a phrase might end up with a single token stream.
>  * Putting json into Jira ticket description
> h2. Q: Why don't..
> .. put intervals DSL right into {{json.query}}, avoiding these odd param 
> refs? 
>  A: It requires heavy lifting for {{JsonQueryConverter}}, which is streamlined 
> for handling good old HTTP parametrized queries.






[jira] [Updated] (SOLR-13764) Parse Interval Query from JSON API

2019-09-30 Thread Ishan Chattopadhyaya (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-13764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ishan Chattopadhyaya updated SOLR-13764:

Priority: Blocker  (was: Major)

> Parse Interval Query from JSON API
> --
>
> Key: SOLR-13764
> URL: https://issues.apache.org/jira/browse/SOLR-13764
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: query parsers
>Reporter: Mikhail Khludnev
>Priority: Blocker
>
> h2. Context
> Lucene has Intervals query LUCENE-8196. Note: these are a kind of healthy 
> man's Spans/Phrases. Note: It's not about ranges nor facets.
> h2. Problem
> There's no way to search by IntervalQuery via JSON Query DSL.
> h2. Suggestion
>  * Create classic QParser \{{ {!interval df=text_content}a_json_param}}, ie 
> one can combine a few such refs in {{json.query.bool}}
>  * It accepts just the name of a JSON param; nothing like this happens yet.
>  * This param carries plain json which is accessible via {{req.getJSON()}}
> please examine 
> https://cwiki.apache.org/confluence/display/SOLR/SOLR-13764+Discussion+-+Interval+Queries+in+JSON
>  for syntax proposal.
> h2. Challenges
>  * I have no idea about particular JSON DSL for these queries, Lucene API 
> seems like easy JSON-able. Proposals are welcome.
>  * Another awkward thing is combining analysis with the low-level query API, 
> e.g. what if one requests a term for one word and analysis yields two tokens; 
> and vice versa, requesting a phrase might end up with a single token stream.
>  * Putting json into Jira ticket description
> h2. Q: Why don't..
> .. put intervals DSL right into {{json.query}}, avoiding these odd param 
> refs? 
>  A: It requires heavy lifting for {{JsonQueryConverter}}, which is streamlined 
> for handling good old HTTP parametrized queries.






[GitHub] [lucene-solr] magibney commented on a change in pull request #892: LUCENE-8972: Add ICUTransformCharFilter, to support pre-tokenizer ICU text transformation

2019-09-30 Thread GitBox
magibney commented on a change in pull request #892: LUCENE-8972: Add 
ICUTransformCharFilter, to support pre-tokenizer ICU text transformation
URL: https://github.com/apache/lucene-solr/pull/892#discussion_r329744248
 
 

 ##
 File path: 
lucene/analysis/icu/src/java/org/apache/lucene/analysis/icu/ICUTransformCharFilter.java
 ##
 @@ -0,0 +1,322 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.lucene.analysis.icu;
+
+import java.io.IOException;
+import java.io.Reader;
+
+import com.ibm.icu.text.ReplaceableString;
+import com.ibm.icu.text.Transliterator;
+import com.ibm.icu.text.Transliterator.Position;
+import com.ibm.icu.text.UTF16;
+
+import org.apache.lucene.analysis.CharFilter;
+import org.apache.lucene.analysis.charfilter.BaseCharFilter;
+import org.apache.lucene.util.ArrayUtil;
+
+/**
+ * A {@link CharFilter} that transforms text with ICU.
+ * 
+ * ICU provides text-transformation functionality via its Transliteration API.
+ * Although script conversion is its most common use, a Transliterator can
+ * actually perform a more general class of tasks. In fact, Transliterator
+ * defines a very general API which specifies only that a segment of the input
+ * text is replaced by new text. The particulars of this conversion are
+ * determined entirely by subclasses of Transliterator.
+ * 
+ * 
+ * Some useful transformations for search are built-in:
+ * 
+ * Conversion from Traditional to Simplified Chinese characters
+ * Conversion from Hiragana to Katakana
+ * Conversion from Fullwidth to Halfwidth forms.
+ * Script conversions, for example Serbian Cyrillic to Latin
+ * 
+ * 
+ * Example usage: stream = new ICUTransformCharFilter(reader,
+ * Transliterator.getInstance("Traditional-Simplified"));
+ * 
+ * For more details, see the <a href="http://userguide.icu-project.org/transforms/general">ICU User
+ * Guide</a>.
+ */
+public final class ICUTransformCharFilter extends BaseCharFilter {
+
+  // Transliterator to transform the text
+  private final Transliterator transform;
+
+  // Reusable position object
+  private final Position position = new Position();
+
+  private static final int READ_BUFFER_SIZE = 1024;
+  private final char[] tmpBuffer = new char[READ_BUFFER_SIZE];
+
+  private static final int INITIAL_TRANSLITERATE_BUFFER_SIZE = 1024;
+  private final StringBuffer buffer = new StringBuffer(INITIAL_TRANSLITERATE_BUFFER_SIZE);
+  private final ReplaceableString replaceable = new ReplaceableString(buffer);
+
+  private static final int BUFFER_PRUNE_THRESHOLD = 1024;
+
+  private int outputCursor = 0;
+  private boolean inputFinished = false;
+  private int charCount = 0;
+
+  static final int DEFAULT_MAX_ROLLBACK_BUFFER_CAPACITY = 8192;
+  private final int maxRollbackBufferCapacity;
+
+  private static final int DEFAULT_INITIAL_ROLLBACK_BUFFER_CAPACITY = 4; // must be power of 2
+  private char[] rollbackBuffer;
+  private int rollbackBufferSize = 0;
+
+  ICUTransformCharFilter(Reader in, Transliterator transform) {
+    this(in, transform, DEFAULT_MAX_ROLLBACK_BUFFER_CAPACITY);
+  }
+
+  /**
+   * Construct new {@link ICUTransformCharFilter} with the specified {@link Transliterator}, backed by
+   * the specified {@link Reader}.
+   * @param in input source
+   * @param transform used to perform transliteration
+   * @param maxRollbackBufferCapacityHint used to control the maximum size to which this
+   * {@link ICUTransformCharFilter} will buffer and rollback partial transliteration of input sequences.
+   * The provided hint will be converted to an enforced limit of "the greatest power of 2 (excluding '1')
+   * less than or equal to the specified value". Specifying a negative value allows the rollback buffer to
 Review comment:
   +1. This was here more as a convenience targeted at the external schema 
config API, so I've preserved that convenience, but moved it out to the 
`ICUTransformCharFilterFactory` instead (to allow users who essentially want to 
configure "no limit" to avoid having to explicitly write something like 
"2147483647" in their config files). Does that sound ok?
   
   I also noticed that there is no power of 2 greater than or equal to hint, 
for hint greater than 
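
A minimal sketch of the hint-to-limit conversion discussed in this thread (illustration only, not the patch itself; it assumes `Integer.highestOneBit` semantics, and the method name is hypothetical):

```java
// Sketch: convert a maxRollbackBufferCapacityHint into the enforced limit
// described above: "the greatest power of 2 (excluding '1') less than or
// equal to the specified value". Negative means unbounded; 0 or 1 disables
// rollback entirely.
public class RollbackCapacityHint {
    static int enforcedCapacity(int hint) {
        if (hint < 0) {
            return Integer.MAX_VALUE; // negative hint: effectively unbounded
        }
        if (hint < 2) {
            return 0; // 0 (or 1, in practice) disables rollback
        }
        // Integer.highestOneBit(n) is the greatest power of 2 <= n for n > 0
        return Integer.highestOneBit(hint);
    }

    public static void main(String[] args) {
        System.out.println(enforcedCapacity(8192));  // 8192 (already a power of 2)
        System.out.println(enforcedCapacity(10000)); // 8192
        System.out.println(enforcedCapacity(1));     // 0 (rollback disabled)
    }
}
```

Note that for a hint that is already a power of 2 the value passes through unchanged, which matches the "less than or equal" wording.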

[jira] [Commented] (SOLR-13105) A visual guide to Solr Math Expressions and Streaming Expressions

2019-09-30 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16941226#comment-16941226
 ] 

ASF subversion and git services commented on SOLR-13105:


Commit e54a792e4c835cf0eb55d319250ebad23a0274b3 in lucene-solr's branch 
refs/heads/SOLR-13105-visual from Joel Bernstein
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=e54a792 ]

SOLR-13105: Update regression docs 2


> A visual guide to Solr Math Expressions and Streaming Expressions
> -
>
> Key: SOLR-13105
> URL: https://issues.apache.org/jira/browse/SOLR-13105
> Project: Solr
>  Issue Type: New Feature
>Reporter: Joel Bernstein
>Assignee: Joel Bernstein
>Priority: Major
> Attachments: Screen Shot 2019-01-14 at 10.56.32 AM.png, Screen Shot 
> 2019-02-21 at 2.14.43 PM.png, Screen Shot 2019-03-03 at 2.28.35 PM.png, 
> Screen Shot 2019-03-04 at 7.47.57 PM.png, Screen Shot 2019-03-13 at 10.47.47 
> AM.png, Screen Shot 2019-03-30 at 6.17.04 PM.png
>
>
> Visualization is now a fundamental element of Solr Streaming Expressions and 
> Math Expressions. This ticket will create a visual guide to Solr Math 
> Expressions and Solr Streaming Expressions that includes *Apache Zeppelin* 
> visualization examples.
> It will also cover using the JDBC expression to *analyze* and *visualize* 
> results from any JDBC compliant data source.
> Intro from the guide:
> {code:java}
> Streaming Expressions exposes the capabilities of Solr Cloud as composable 
> functions. These functions provide a system for searching, transforming, 
> analyzing and visualizing data stored in Solr Cloud collections.
> At a high level there are four main capabilities that will be explored in the 
> documentation:
> * Searching, sampling and aggregating results from Solr.
> * Transforming result sets after they are retrieved from Solr.
> * Analyzing and modeling result sets using probability and statistics and 
> machine learning libraries.
> * Visualizing result sets, aggregations and statistical models of the data.
> {code}
>  
> A few sample visualizations are attached to the ticket.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-13790) LRUStatsCache size explosion

2019-09-30 Thread Andrzej Bialecki (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16941213#comment-16941213
 ] 

Andrzej Bialecki commented on SOLR-13790:
-

This patch is a work in progress - it fixes the error described above, but it 
also tries to fix a more fundamental problem in LRUStatsCache - namely, that as 
it stands it always sends requests to fetch stats (thus adding a round-trip to 
every query), even for repeated queries, consequently defeating the point of 
LRU caching.

Changes in this patch:
* consistently use shard name instead of the full shard URL lists as caching 
keys, both in SolrCloud mode and in standalone distributed mode
* optimized serialization of stats in order to minimize the size of data and to 
prevent serialization errors when terms contain separators or url-unsafe 
characters
* added SolrCloud unit tests, still need much improvement
* added some logic in LRUStatsCache that tries to avoid sending a stats request 
if all global data is already available in cache. This part is a little bit 
shaky but I don't have any better idea at the moment how to address this 
problem. Basically, it rewrites a query locally to see if there are any missing 
stats to be fetched - but the answer "none" is not 100% fool-proof because 
queries may be rewritten differently based on the available terms and fields in 
the local vs. remote index. The code tries to fix it post-factum by detecting 
missing global stats and forcing a fetch+cache of the missing stats with the 
next request.
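
As a side illustration of the key-explosion being fixed here (hypothetical helper, not the committed change - the actual patch keys the cache by the extracted shard name): any order-independent canonical form of the replica list collapses all permutations of the same shard's replicas to one cache key.

```java
import java.util.Arrays;

// Illustration only: the LRUStatsCache bug arose because the cache key was a
// randomly ordered "|"-separated list of ALL replica URLs for a shard, so the
// same shard produced a combinatoric number of distinct FastLRUCache entries.
// Normalizing the key (here, by sorting) makes every permutation map to one key.
public class StableShardKey {
    static String normalize(String shardUrlList) {
        String[] urls = shardUrlList.split("\\|");
        Arrays.sort(urls); // order-independent canonical form
        return String.join("|", urls);
    }

    public static void main(String[] args) {
        String a = normalize("http://h2/solr/s1_r2|http://h1/solr/s1_r1");
        String b = normalize("http://h1/solr/s1_r1|http://h2/solr/s1_r2");
        System.out.println(a.equals(b)); // true: one cache entry per shard
    }
}
```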

> LRUStatsCache size explosion
> 
>
> Key: SOLR-13790
> URL: https://issues.apache.org/jira/browse/SOLR-13790
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: 7.7.2, 8.2, 8.3
>Reporter: Andrzej Bialecki
>Assignee: Andrzej Bialecki
>Priority: Critical
> Fix For: 7.7.3, 8.3
>
> Attachments: SOLR-13790.patch
>
>
> On a sizeable cluster with multi-shard multi-replica collections, when 
> {{LRUStatsCache}} was in use we encountered excessive memory usage, which 
> consequently led to severe performance problems.
> On a closer examination of the heapdumps it became apparent that when 
> {{LRUStatsCache.addToPerShardTermStats}} is called it creates instances of 
> {{FastLRUCache}} using the passed {{shard}} argument - however, the value of 
> this argument is not a simple shard name but instead it's a randomly ordered 
> list of ALL replica URLs for this shard.
> As a result, due to the combinatoric number of possible keys, over time the 
> map in {{LRUStatsCache.perShardTemStats}} grew to contain ~2 million entries...
> The fix seems to be simply to extract the shard name and cache using this 
> name instead of the full string value of the {{shard}} parameter. Existing 
> unit tests also need much improvement.






[jira] [Updated] (SOLR-13790) LRUStatsCache size explosion

2019-09-30 Thread Andrzej Bialecki (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-13790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrzej Bialecki updated SOLR-13790:

Attachment: SOLR-13790.patch

> LRUStatsCache size explosion
> 
>
> Key: SOLR-13790
> URL: https://issues.apache.org/jira/browse/SOLR-13790
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: 7.7.2, 8.2, 8.3
>Reporter: Andrzej Bialecki
>Assignee: Andrzej Bialecki
>Priority: Critical
> Fix For: 7.7.3, 8.3
>
> Attachments: SOLR-13790.patch
>
>
> On a sizeable cluster with multi-shard multi-replica collections, when 
> {{LRUStatsCache}} was in use we encountered excessive memory usage, which 
> consequently led to severe performance problems.
> On a closer examination of the heapdumps it became apparent that when 
> {{LRUStatsCache.addToPerShardTermStats}} is called it creates instances of 
> {{FastLRUCache}} using the passed {{shard}} argument - however, the value of 
> this argument is not a simple shard name but instead it's a randomly ordered 
> list of ALL replica URLs for this shard.
> As a result, due to the combinatoric number of possible keys, over time the 
> map in {{LRUStatsCache.perShardTemStats}} grew to contain ~2 million entries...
> The fix seems to be simply to extract the shard name and cache using this 
> name instead of the full string value of the {{shard}} parameter. Existing 
> unit tests also need much improvement.






[jira] [Comment Edited] (SOLR-13101) Shared storage support in SolrCloud

2019-09-30 Thread Yonik Seeley (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16939809#comment-16939809
 ] 

Yonik Seeley edited comment on SOLR-13101 at 9/30/19 5:21 PM:
--

I plan on creating a branch jira/SOLR-13101 soon for future work on this issue.
edit: this has been done.
Please use this branch for future pull requests:
https://github.com/apache/lucene-solr/tree/jira/SOLR-13101


was (Author: ysee...@gmail.com):
I plan on creating a branch jira/SOLR-13101 soon for future work on this issue.

> Shared storage support in SolrCloud
> ---
>
> Key: SOLR-13101
> URL: https://issues.apache.org/jira/browse/SOLR-13101
> Project: Solr
>  Issue Type: New Feature
>  Components: SolrCloud
>Reporter: Yonik Seeley
>Priority: Major
>  Time Spent: 3h 50m
>  Remaining Estimate: 0h
>
> Solr should have first-class support for shared storage (blob/object stores 
> like S3, google cloud storage, etc. and shared filesystems like HDFS, NFS, 
> etc).
> The key component will likely be a new replica type for shared storage.  It 
> would have many of the benefits of the current "pull" replicas (not indexing 
> on all replicas, all shards identical with no shards getting out-of-sync, 
> etc), but would have additional benefits:
>  - Any shard could become leader (the blob store always has the index)
>  - Better elasticity scaling down
>- durability not linked to number of replicas... a single replica could be 
> common for write workloads
>- could drop to 0 replicas for a shard when not needed (blob store always 
> has index)
>  - Allow for higher performance write workloads by skipping the transaction 
> log
>- don't pay for what you don't need
>- a commit will be necessary to flush to stable storage (blob store)
>  - A lot of the complexity and failure modes go away
> An additional component is a Directory implementation that will work well with 
> blob stores.  We probably want one that treats local disk as a cache since 
> the latency to remote storage is so large.  I think there are still some 
> "locking" issues to be solved here (ensuring that more than one writer to the 
> same index won't corrupt it).  This should probably be pulled out into a 
> different JIRA issue.






[GitHub] [lucene-solr] magibney commented on a change in pull request #892: LUCENE-8972: Add ICUTransformCharFilter, to support pre-tokenizer ICU text transformation

2019-09-30 Thread GitBox
magibney commented on a change in pull request #892: LUCENE-8972: Add 
ICUTransformCharFilter, to support pre-tokenizer ICU text transformation
URL: https://github.com/apache/lucene-solr/pull/892#discussion_r329684574
 
 

 ##
 File path: 
lucene/analysis/icu/src/java/org/apache/lucene/analysis/icu/ICUTransformCharFilter.java
 ##
 @@ -0,0 +1,322 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.lucene.analysis.icu;
+
+import java.io.IOException;
+import java.io.Reader;
+
+import com.ibm.icu.text.ReplaceableString;
+import com.ibm.icu.text.Transliterator;
+import com.ibm.icu.text.Transliterator.Position;
+import com.ibm.icu.text.UTF16;
+
+import org.apache.lucene.analysis.CharFilter;
+import org.apache.lucene.analysis.charfilter.BaseCharFilter;
+import org.apache.lucene.util.ArrayUtil;
+
+/**
+ * A {@link CharFilter} that transforms text with ICU.
+ * 
+ * ICU provides text-transformation functionality via its Transliteration API.
+ * Although script conversion is its most common use, a Transliterator can
+ * actually perform a more general class of tasks. In fact, Transliterator
+ * defines a very general API which specifies only that a segment of the input
+ * text is replaced by new text. The particulars of this conversion are
+ * determined entirely by subclasses of Transliterator.
+ * 
+ * 
+ * Some useful transformations for search are built-in:
+ * 
+ * Conversion from Traditional to Simplified Chinese characters
+ * Conversion from Hiragana to Katakana
+ * Conversion from Fullwidth to Halfwidth forms.
+ * Script conversions, for example Serbian Cyrillic to Latin
+ * 
+ * 
+ * Example usage: stream = new ICUTransformCharFilter(reader,
+ * Transliterator.getInstance("Traditional-Simplified"));
+ * 
+ * For more details, see the <a href="http://userguide.icu-project.org/transforms/general">ICU User
+ * Guide</a>.
+ */
+public final class ICUTransformCharFilter extends BaseCharFilter {
+
+  // Transliterator to transform the text
+  private final Transliterator transform;
+
+  // Reusable position object
+  private final Position position = new Position();
+
+  private static final int READ_BUFFER_SIZE = 1024;
+  private final char[] tmpBuffer = new char[READ_BUFFER_SIZE];
+
+  private static final int INITIAL_TRANSLITERATE_BUFFER_SIZE = 1024;
+  private final StringBuffer buffer = new StringBuffer(INITIAL_TRANSLITERATE_BUFFER_SIZE);
+  private final ReplaceableString replaceable = new ReplaceableString(buffer);
+
+  private static final int BUFFER_PRUNE_THRESHOLD = 1024;
+
+  private int outputCursor = 0;
+  private boolean inputFinished = false;
+  private int charCount = 0;
+
+  static final int DEFAULT_MAX_ROLLBACK_BUFFER_CAPACITY = 8192;
+  private final int maxRollbackBufferCapacity;
+
+  private static final int DEFAULT_INITIAL_ROLLBACK_BUFFER_CAPACITY = 4; // must be power of 2
+  private char[] rollbackBuffer;
+  private int rollbackBufferSize = 0;
+
+  ICUTransformCharFilter(Reader in, Transliterator transform) {
+    this(in, transform, DEFAULT_MAX_ROLLBACK_BUFFER_CAPACITY);
+  }
+
+  /**
+   * Construct new {@link ICUTransformCharFilter} with the specified {@link Transliterator}, backed by
+   * the specified {@link Reader}.
+   * @param in input source
+   * @param transform used to perform transliteration
+   * @param maxRollbackBufferCapacityHint used to control the maximum size to which this
+   * {@link ICUTransformCharFilter} will buffer and rollback partial transliteration of input sequences.
+   * The provided hint will be converted to an enforced limit of "the greatest power of 2 (excluding '1')
+   * less than or equal to the specified value". Specifying a negative value allows the rollback buffer to
+   * grow indefinitely (equivalent to specifying {@link Integer#MAX_VALUE}). Specifying "0" (or "1", in practice)
+   * disables rollback. Larger values can in some cases yield more accurate transliteration, at the cost of
+   * performance and resolution/accuracy of offset correction.
+   * This is intended primarily as a failsafe, with a relatively large default value of {@value ICUTransformCharFilter#DEFAULT_MAX_ROLLBACK_BUFFER_CAPACITY}.
+   * See comments "To understand the need for 

[jira] [Commented] (SOLR-13399) compositeId support for shard splitting

2019-09-30 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16941130#comment-16941130
 ] 

ASF subversion and git services commented on SOLR-13399:


Commit 7775e17414b83508f40ee2e440914177951b5882 in lucene-solr's branch 
refs/heads/jira/SOLR-13101 from Yonik Seeley
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=7775e17 ]

SOLR-13399: fix splitByPrefix default to be false


> compositeId support for shard splitting
> ---
>
> Key: SOLR-13399
> URL: https://issues.apache.org/jira/browse/SOLR-13399
> Project: Solr
>  Issue Type: New Feature
>Reporter: Yonik Seeley
>Assignee: Yonik Seeley
>Priority: Major
> Fix For: 8.3
>
> Attachments: SOLR-13399.patch, SOLR-13399.patch, 
> SOLR-13399_testfix.patch, SOLR-13399_useId.patch, 
> ShardSplitTest.master.seed_AE04B5C9BA6E9A4.log.txt
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Shard splitting does not currently have a way to automatically take into 
> account the actual distribution (number of documents) in each hash bucket 
> created by using compositeId hashing.
> We should probably add a parameter *splitByPrefix* to the *SPLITSHARD* 
> command that would look at the number of docs sharing each compositeId prefix 
> and use that to create roughly equal sized buckets by document count rather 
> than just assuming an equal distribution across the entire hash range.
> Like normal shard splitting, we should bias against splitting within hash 
> buckets unless necessary (since that leads to larger query fanout). Perhaps 
> this warrants a parameter that would control how much of a size mismatch is 
> tolerable before resorting to splitting within a bucket. 
> *allowedSizeDifference*?
> To more quickly calculate the number of docs in each bucket, we could index 
> the prefix in a different field.  Iterating over the terms for this field 
> would quickly give us the number of docs in each (i.e lucene keeps track of 
> the doc count for each term already.)  Perhaps the implementation could be a 
> flag on the *id* field... something like *indexPrefixes* and poly-fields that 
> would cause the indexing to be automatically done and alleviate having to 
> pass in an additional field during indexing and during the call to 
> *SPLITSHARD*.  This whole part is an optimization though and could be split 
> off into its own issue if desired.
>  
>  
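
The splitByPrefix idea quoted above can be sketched as follows (assumptions only, not Solr's implementation: per-prefix doc counts are taken as given, since Lucene's term dictionary already tracks a doc count per term of a prefix field):

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Rough sketch: given document counts per compositeId prefix, pick a split
// point BETWEEN prefixes so the two resulting shards hold roughly equal
// document counts, biasing against splitting inside one prefix's hash bucket.
public class SplitByPrefixSketch {
    static List<List<String>> split(LinkedHashMap<String, Integer> prefixCounts) {
        int total = prefixCounts.values().stream().mapToInt(Integer::intValue).sum();
        List<String> left = new ArrayList<>(), right = new ArrayList<>();
        int acc = 0;
        for (Map.Entry<String, Integer> e : prefixCounts.entrySet()) {
            // keep whole prefixes together; cross to the right half once the
            // left half has accumulated roughly half of all documents
            if (acc < total / 2) { left.add(e.getKey()); acc += e.getValue(); }
            else { right.add(e.getKey()); }
        }
        return List.of(left, right);
    }

    public static void main(String[] args) {
        LinkedHashMap<String, Integer> counts = new LinkedHashMap<>();
        counts.put("tenantA!", 10);
        counts.put("tenantB!", 30);
        counts.put("tenantC!", 20);
        counts.put("tenantD!", 40);
        System.out.println(split(counts)); // [[tenantA!, tenantB!, tenantC!], [tenantD!]]
    }
}
```

A real implementation would also apply the tolerance parameter discussed above before resorting to splitting within a bucket.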






[jira] [Commented] (SOLR-13399) compositeId support for shard splitting

2019-09-30 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16941131#comment-16941131
 ] 

ASF subversion and git services commented on SOLR-13399:


Commit c169c182ffc2f9cfad2c175a26504d0c116a8ccd in lucene-solr's branch 
refs/heads/jira/SOLR-13101 from Megan Carey
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=c169c18 ]

SOLR-13399: Adding splitByPrefix param to IndexSizeTrigger; some splitByPrefix 
test and code cleanup


> compositeId support for shard splitting
> ---
>
> Key: SOLR-13399
> URL: https://issues.apache.org/jira/browse/SOLR-13399
> Project: Solr
>  Issue Type: New Feature
>Reporter: Yonik Seeley
>Assignee: Yonik Seeley
>Priority: Major
> Fix For: 8.3
>
> Attachments: SOLR-13399.patch, SOLR-13399.patch, 
> SOLR-13399_testfix.patch, SOLR-13399_useId.patch, 
> ShardSplitTest.master.seed_AE04B5C9BA6E9A4.log.txt
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Shard splitting does not currently have a way to automatically take into 
> account the actual distribution (number of documents) in each hash bucket 
> created by using compositeId hashing.
> We should probably add a parameter *splitByPrefix* to the *SPLITSHARD* 
> command that would look at the number of docs sharing each compositeId prefix 
> and use that to create roughly equal sized buckets by document count rather 
> than just assuming an equal distribution across the entire hash range.
> Like normal shard splitting, we should bias against splitting within hash 
> buckets unless necessary (since that leads to larger query fanout.) . Perhaps 
> this warrants a parameter that would control how much of a size mismatch is 
> tolerable before resorting to splitting within a bucket. 
> *allowedSizeDifference*?
> To more quickly calculate the number of docs in each bucket, we could index 
> the prefix in a different field.  Iterating over the terms for this field 
> would quickly give us the number of docs in each (i.e lucene keeps track of 
> the doc count for each term already.)  Perhaps the implementation could be a 
> flag on the *id* field... something like *indexPrefixes* and poly-fields that 
> would cause the indexing to be automatically done and alleviate having to 
> pass in an additional field during indexing and during the call to 
> *SPLITSHARD*.  This whole part is an optimization though and could be split 
> off into its own issue if desired.
>  
>  






[GitHub] [lucene-solr] magibney commented on a change in pull request #892: LUCENE-8972: Add ICUTransformCharFilter, to support pre-tokenizer ICU text transformation

2019-09-30 Thread GitBox
magibney commented on a change in pull request #892: LUCENE-8972: Add 
ICUTransformCharFilter, to support pre-tokenizer ICU text transformation
URL: https://github.com/apache/lucene-solr/pull/892#discussion_r329679320
 
 

 ##
 File path: 
lucene/analysis/icu/src/java/org/apache/lucene/analysis/icu/ICUTransformCharFilter.java
 ##
 @@ -0,0 +1,322 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.lucene.analysis.icu;
+
+import java.io.IOException;
+import java.io.Reader;
+
+import com.ibm.icu.text.ReplaceableString;
+import com.ibm.icu.text.Transliterator;
+import com.ibm.icu.text.Transliterator.Position;
+import com.ibm.icu.text.UTF16;
+
+import org.apache.lucene.analysis.CharFilter;
+import org.apache.lucene.analysis.charfilter.BaseCharFilter;
+import org.apache.lucene.util.ArrayUtil;
+
+/**
+ * A {@link CharFilter} that transforms text with ICU.
+ * 
+ * ICU provides text-transformation functionality via its Transliteration API.
+ * Although script conversion is its most common use, a Transliterator can
+ * actually perform a more general class of tasks. In fact, Transliterator
+ * defines a very general API which specifies only that a segment of the input
+ * text is replaced by new text. The particulars of this conversion are
+ * determined entirely by subclasses of Transliterator.
+ * 
+ * 
+ * Some useful transformations for search are built-in:
+ * 
+ * Conversion from Traditional to Simplified Chinese characters
+ * Conversion from Hiragana to Katakana
+ * Conversion from Fullwidth to Halfwidth forms.
+ * Script conversions, for example Serbian Cyrillic to Latin
+ * 
+ * 
+ * Example usage: stream = new ICUTransformCharFilter(reader,
+ * Transliterator.getInstance("Traditional-Simplified"));
+ * 
+ * For more details, see the <a href="http://userguide.icu-project.org/transforms/general">ICU User
+ * Guide</a>.
+ */
+public final class ICUTransformCharFilter extends BaseCharFilter {
+
+  // Transliterator to transform the text
+  private final Transliterator transform;
+
+  // Reusable position object
+  private final Position position = new Position();
+
+  private static final int READ_BUFFER_SIZE = 1024;
+  private final char[] tmpBuffer = new char[READ_BUFFER_SIZE];
+
+  private static final int INITIAL_TRANSLITERATE_BUFFER_SIZE = 1024;
+  private final StringBuffer buffer = new StringBuffer(INITIAL_TRANSLITERATE_BUFFER_SIZE);
+  private final ReplaceableString replaceable = new ReplaceableString(buffer);
+
+  private static final int BUFFER_PRUNE_THRESHOLD = 1024;
+
+  private int outputCursor = 0;
+  private boolean inputFinished = false;
+  private int charCount = 0;
+
+  static final int DEFAULT_MAX_ROLLBACK_BUFFER_CAPACITY = 8192;
+  private final int maxRollbackBufferCapacity;
+
+  private static final int DEFAULT_INITIAL_ROLLBACK_BUFFER_CAPACITY = 4; // must be power of 2
+  private char[] rollbackBuffer;
+  private int rollbackBufferSize = 0;
+
+  ICUTransformCharFilter(Reader in, Transliterator transform) {
+    this(in, transform, DEFAULT_MAX_ROLLBACK_BUFFER_CAPACITY);
+  }
+
+  /**
+   * Construct new {@link ICUTransformCharFilter} with the specified {@link Transliterator}, backed by
+   * the specified {@link Reader}.
+   * @param in input source
+   * @param transform used to perform transliteration
+   * @param maxRollbackBufferCapacityHint used to control the maximum size to which this
+   * {@link ICUTransformCharFilter} will buffer and rollback partial transliteration of input sequences.
+   * The provided hint will be converted to an enforced limit of "the greatest power of 2 (excluding '1')
+   * less than or equal to the specified value". Specifying a negative value allows the rollback buffer to
+   * grow indefinitely (equivalent to specifying {@link Integer#MAX_VALUE}). Specifying "0" (or "1", in practice)
+   * disables rollback. Larger values can in some cases yield more accurate transliteration, at the cost of
+   * performance and resolution/accuracy of offset correction.
+   * This is intended primarily as a failsafe, with a relatively large default value of {@value ICUTransformCharFilter#DEFAULT_MAX_ROLLBACK_BUFFER_CAPACITY}.
+   * See comments "To understand the need for 

[jira] [Commented] (SOLR-13105) A visual guide to Solr Math Expressions and Streaming Expressions

2019-09-30 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16941086#comment-16941086
 ] 

ASF subversion and git services commented on SOLR-13105:


Commit 2966f1308832ed10e4db38af1815df099e9419b2 in lucene-solr's branch 
refs/heads/SOLR-13105-visual from Joel Bernstein
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=2966f13 ]

SOLR-13105: Update machine learning docs 2


> A visual guide to Solr Math Expressions and Streaming Expressions
> -
>
> Key: SOLR-13105
> URL: https://issues.apache.org/jira/browse/SOLR-13105
> Project: Solr
>  Issue Type: New Feature
>Reporter: Joel Bernstein
>Assignee: Joel Bernstein
>Priority: Major
> Attachments: Screen Shot 2019-01-14 at 10.56.32 AM.png, Screen Shot 
> 2019-02-21 at 2.14.43 PM.png, Screen Shot 2019-03-03 at 2.28.35 PM.png, 
> Screen Shot 2019-03-04 at 7.47.57 PM.png, Screen Shot 2019-03-13 at 10.47.47 
> AM.png, Screen Shot 2019-03-30 at 6.17.04 PM.png
>
>
> Visualization is now a fundamental element of Solr Streaming Expressions and 
> Math Expressions. This ticket will create a visual guide to Solr Math 
> Expressions and Solr Streaming Expressions that includes *Apache Zeppelin* 
> visualization examples.
> It will also cover using the JDBC expression to *analyze* and *visualize* 
> results from any JDBC compliant data source.
> Intro from the guide:
> {code:java}
> Streaming Expressions exposes the capabilities of Solr Cloud as composable 
> functions. These functions provide a system for searching, transforming, 
> analyzing and visualizing data stored in Solr Cloud collections.
> At a high level there are four main capabilities that will be explored in the 
> documentation:
> * Searching, sampling and aggregating results from Solr.
> * Transforming result sets after they are retrieved from Solr.
> * Analyzing and modeling result sets using probability and statistics and 
> machine learning libraries.
> * Visualizing result sets, aggregations and statistical models of the data.
> {code}
>  
> A few sample visualizations are attached to the ticket.






[GitHub] [lucene-solr] thomaswoeckinger commented on issue #911: SOLR-13802: Write analyzer property luceneMatchVersion to managed schema

2019-09-30 Thread GitBox
thomaswoeckinger commented on issue #911: SOLR-13802: Write analyzer property 
luceneMatchVersion to managed schema
URL: https://github.com/apache/lucene-solr/pull/911#issuecomment-536620261
 
 
   @dsmiley: Please review


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services




[GitHub] [lucene-solr] thomaswoeckinger opened a new pull request #911: SOLR-13802: Write analyzer property luceneMatchVersion to managed schema

2019-09-30 Thread GitBox
thomaswoeckinger opened a new pull request #911: SOLR-13802: Write analyzer 
property luceneMatchVersion to managed schema
URL: https://github.com/apache/lucene-solr/pull/911
 
 
   
   
   
   # Description
   
   Please provide a short description of the changes you're making with this 
pull request.
   
   # Solution
   
   Please provide a short description of the approach taken to implement your 
solution.
   
   # Tests
   
   Please describe the tests you've developed or run to confirm this patch 
implements the feature or solves the problem.
   
   # Checklist
   
   Please review the following and check all that apply:
   
   - [ ] I have reviewed the guidelines for [How to 
Contribute](https://wiki.apache.org/solr/HowToContribute) and my code conforms 
to the standards described there to the best of my ability.
   - [ ] I have created a Jira issue and added the issue ID to my pull request 
title.
   - [ ] I am authorized to contribute this code to the ASF and have removed 
any code I do not have a license to distribute.
   - [ ] I have given Solr maintainers 
[access](https://help.github.com/en/articles/allowing-changes-to-a-pull-request-branch-created-from-a-fork)
 to contribute to my PR branch. (optional but recommended)
   - [ ] I have developed this patch against the `master` branch.
   - [ ] I have run `ant precommit` and the appropriate test suite.
   - [ ] I have added tests for my changes.
   - [ ] I have added documentation for the [Ref 
Guide](https://github.com/apache/lucene-solr/tree/master/solr/solr-ref-guide) 
(for Solr changes only).
   






[jira] [Updated] (SOLR-13793) HTTPSolrCall makes cascading calls even when all replicas are down for a collection

2019-09-30 Thread Kesharee Nandan Vishwakarma (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-13793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kesharee Nandan Vishwakarma updated SOLR-13793:
---
Affects Version/s: master (9.0)

> HTTPSolrCall makes cascading calls even when all replicas are down for a 
> collection
> ---
>
> Key: SOLR-13793
> URL: https://issues.apache.org/jira/browse/SOLR-13793
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: SolrCloud
>Affects Versions: 6.6, master (9.0)
>Reporter: Kesharee Nandan Vishwakarma
>Priority: Major
>
> The REMOTEQUERY action in HttpSolrCall ends up making too many cascading 
> remoteQuery calls when all the replicas of a collection are in the down 
> state. 
> This results in an increase in thread count, unresponsive Solr nodes, and 
> eventually the nodes hosting this collection dropping out of live nodes.
> *Example scenario*: Consider a cluster with 3 nodes (solr1, solrw1, 
> solr-overseer1). A collection is present on solr1 and solrw1, but both 
> replicas are in the down state. When a search request is made to 
> solr-overseer1, since no replica is present locally, a remote query is made 
> to solr1 (inactive slices/coreUrls are also considered). solr1 also doesn't 
> see an active replica locally, so it forwards to solrw1; solrw1 in turn 
> forwards the request back to solr1. This goes on until both solr1 and solrw1 
> become unresponsive. Logs for this are attached.
> This is happening because we are considering [inactive 
> slices|https://github.com/apache/lucene-solr/blob/68fa249034ba8b273955f20097700dc2fbb7a800/solr/core/src/java/org/apache/solr/servlet/HttpSolrCall.java#L913
>  ], [inactive coreUrl| 
> https://github.com/apache/lucene-solr/blob/68fa249034ba8b273955f20097700dc2fbb7a800/solr/core/src/java/org/apache/solr/servlet/HttpSolrCall.java#L929]
>  while forwarding requests to nodes.
> *Steps to reproduce*:
> #  Bring down all replicas of a collection but ensure nodes containing them 
> are up 
> # Make any search call to any of solr nodes for this collection. 
>  
> *Possible fixes*: 
> # Ensure we select only active slices/coreUrls before making remote queries
> # Put a limit on cascading calls, probably bounded by the number of replicas 
>  
> {noformat} 
> solrw1_1 |
> solrw1_1 | 2019-09-24 09:35:14.458 ERROR (qtp762152757-8772) [   ] 
> o.a.s.s.HttpSolrCall null:org.apache.solr.common.SolrException: Error trying 
> to proxy request for url: http://solr1:8983/solr/kg3/select
> solrw1_1 |at 
> org.apache.solr.servlet.HttpSolrCall.remoteQuery(HttpSolrCall.java:660)
> solrw1_1 |at 
> org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:514)
> solrw1_1 |at 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:361)
> solrw1_1 |at 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:305)
> solrw1_1 |at 
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1691)
> solrw1_1 |at 
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582)
> solrw1_1 |at 
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
> solrw1_1 |at 
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
> solrw1_1 |at 
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)
> solrw1_1 |at 
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)
> solrw1_1 |at 
> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512)
> solrw1_1 |at 
> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
> solrw1_1 |at 
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)
> solrw1_1 |at 
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
> solrw1_1 |at 
> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)
> solrw1_1 |at 
> org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119)
> solrw1_1 |at 
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
> solrw1_1 |at 
> org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:335)
> solrw1_1 |at 
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
> solrw1_1 |at 
> org.eclipse.jetty.server.Server.handle(Server.java:534)
> solrw1_1 |at 
> 
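The first of the proposed fixes above — selecting only active slices/coreUrls before forwarding — can be sketched as a small filtering step. Everything here (the `Replica` class, `activeCoreUrls`) is a hypothetical stand-in for illustration, not the actual HttpSolrCall/ClusterState API:

```java
import java.util.List;
import java.util.stream.Collectors;

public class ActiveReplicaFilter {

    /** Hypothetical stand-in for a replica's cluster-state entry. */
    static class Replica {
        enum State { ACTIVE, RECOVERING, DOWN }
        final String coreUrl;
        final State state;
        Replica(String coreUrl, State state) { this.coreUrl = coreUrl; this.state = state; }
    }

    /** Keep only ACTIVE replicas as candidates for a remote query. */
    static List<String> activeCoreUrls(List<Replica> replicas) {
        return replicas.stream()
                .filter(r -> r.state == Replica.State.ACTIVE)
                .map(r -> r.coreUrl)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Replica> replicas = List.of(
                new Replica("http://solr1:8983/solr/kg3", Replica.State.DOWN),
                new Replica("http://solrw1:8983/solr/kg3", Replica.State.ACTIVE));
        // Only the ACTIVE replica survives as a forwarding candidate.
        System.out.println(activeCoreUrls(replicas)); // [http://solrw1:8983/solr/kg3]
    }
}
```

With no ACTIVE replica left after filtering, the receiving node can fail the request fast instead of proxying it into the loop described above.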

[jira] [Commented] (SOLR-13798) SSL: Adding Enabling/Disabling client's hostname verification config

2019-09-30 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16941071#comment-16941071
 ] 

ASF subversion and git services commented on SOLR-13798:


Commit 494d823e9d2f3dae7587cc9824cae9fbd900e4e1 in lucene-solr's branch 
refs/heads/branch_8x from Cao Manh Dat
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=494d823 ]

SOLR-13798: SSL: Adding Enabling/Disabling client's hostname verification config


> SSL: Adding Enabling/Disabling client's hostname verification config
> 
>
> Key: SOLR-13798
> URL: https://issues.apache.org/jira/browse/SOLR-13798
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: 8.2
>Reporter: Cao Manh Dat
>Assignee: Cao Manh Dat
>Priority: Major
> Attachments: SOLR-13709.patch, SOLR-13709.patch
>
>
> The problem appeared after upgrading to Jetty 9.4.19 (SOLR-13541): 
> {{endpointIdentificationAlgorithm}} changed from null → HTTPS. As a result, 
> the client's hostname (identity) always gets verified when connecting to 
> Solr. This change improved the security level of Solr, since it requires 
> two-way identity verification (the client verifies the server's identity and 
> vice versa). It causes a problem for users for whom certificate verification 
> alone is enough (the client's hostname is not known ahead of time).
> We should therefore introduce a flag in {{solr.in.sh}} to disable client 
> hostname verification when needed.
> More about this at: 
> * https://tools.ietf.org/html/rfc2818#section-3
> * https://github.com/eclipse/jetty.project/issues/3454
> * https://www.cs.utexas.edu/~shmat/shmat_ccs12.pdf
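At the JDK level, the toggle the issue asks for boils down to the endpoint identification algorithm on the TLS parameters; Jetty sets the equivalent flag on its SslContextFactory. A minimal sketch using only javax.net.ssl (the helper name is illustrative, not Solr's actual configuration code):

```java
import javax.net.ssl.SSLParameters;

public class HostnameVerificationToggle {

    /**
     * "HTTPS" enables RFC 2818 hostname checking during the TLS handshake;
     * null disables the hostname check while still validating the
     * certificate chain itself.
     */
    static SSLParameters withHostnameVerification(boolean enabled) {
        SSLParameters params = new SSLParameters();
        params.setEndpointIdentificationAlgorithm(enabled ? "HTTPS" : null);
        return params;
    }

    public static void main(String[] args) {
        System.out.println(withHostnameVerification(true).getEndpointIdentificationAlgorithm());  // HTTPS
        System.out.println(withHostnameVerification(false).getEndpointIdentificationAlgorithm()); // null
    }
}
```

A `solr.in.sh` flag as proposed would simply select between these two settings when the client's SSL context is built.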






[jira] [Resolved] (SOLR-13798) SSL: Adding Enabling/Disabling client's hostname verification config

2019-09-30 Thread Cao Manh Dat (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-13798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cao Manh Dat resolved SOLR-13798.
-
Fix Version/s: 8.3
   Resolution: Fixed

> SSL: Adding Enabling/Disabling client's hostname verification config
> 
>
> Key: SOLR-13798
> URL: https://issues.apache.org/jira/browse/SOLR-13798
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: 8.2
>Reporter: Cao Manh Dat
>Assignee: Cao Manh Dat
>Priority: Major
> Fix For: 8.3
>
> Attachments: SOLR-13709.patch, SOLR-13709.patch
>
>
> The problem appeared after upgrading to Jetty 9.4.19 (SOLR-13541): 
> {{endpointIdentificationAlgorithm}} changed from null → HTTPS. As a result, 
> the client's hostname (identity) always gets verified when connecting to 
> Solr. This change improved the security level of Solr, since it requires 
> two-way identity verification (the client verifies the server's identity and 
> vice versa). It causes a problem for users for whom certificate verification 
> alone is enough (the client's hostname is not known ahead of time).
> We should therefore introduce a flag in {{solr.in.sh}} to disable client 
> hostname verification when needed.
> More about this at: 
> * https://tools.ietf.org/html/rfc2818#section-3
> * https://github.com/eclipse/jetty.project/issues/3454
> * https://www.cs.utexas.edu/~shmat/shmat_ccs12.pdf






[jira] [Created] (SOLR-13802) Analyzer property luceneMatchVersion is not written to managed schema

2019-09-30 Thread Jira
Thomas Wöckinger created SOLR-13802:
---

 Summary: Analyzer property luceneMatchVersion is not written to 
managed schema
 Key: SOLR-13802
 URL: https://issues.apache.org/jira/browse/SOLR-13802
 Project: Solr
  Issue Type: Bug
  Security Level: Public (Default Security Level. Issues are Public)
  Components: Schema and Analysis
Affects Versions: 8.2, 7.7.2, master (9.0)
Reporter: Thomas Wöckinger


The analyzer property luceneMatchVersion is not written to the managed schema; 
it is simply not handled by the code.






[jira] [Commented] (SOLR-13798) SSL: Adding Enabling/Disabling client's hostname verification config

2019-09-30 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16941070#comment-16941070
 ] 

ASF subversion and git services commented on SOLR-13798:


Commit 7350c5031635317c531c2f9249325d304a900772 in lucene-solr's branch 
refs/heads/master from Cao Manh Dat
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=7350c50 ]

SOLR-13798: SSL: Adding Enabling/Disabling client's hostname verification config


> SSL: Adding Enabling/Disabling client's hostname verification config
> 
>
> Key: SOLR-13798
> URL: https://issues.apache.org/jira/browse/SOLR-13798
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: 8.2
>Reporter: Cao Manh Dat
>Assignee: Cao Manh Dat
>Priority: Major
> Attachments: SOLR-13709.patch, SOLR-13709.patch
>
>
> The problem appeared after upgrading to Jetty 9.4.19 (SOLR-13541): 
> {{endpointIdentificationAlgorithm}} changed from null → HTTPS. As a result, 
> the client's hostname (identity) always gets verified when connecting to 
> Solr. This change improved the security level of Solr, since it requires 
> two-way identity verification (the client verifies the server's identity and 
> vice versa). It causes a problem for users for whom certificate verification 
> alone is enough (the client's hostname is not known ahead of time).
> We should therefore introduce a flag in {{solr.in.sh}} to disable client 
> hostname verification when needed.
> More about this at: 
> * https://tools.ietf.org/html/rfc2818#section-3
> * https://github.com/eclipse/jetty.project/issues/3454
> * https://www.cs.utexas.edu/~shmat/shmat_ccs12.pdf






[jira] [Commented] (SOLR-13661) A package management system for Solr

2019-09-30 Thread Ishan Chattopadhyaya (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16941047#comment-16941047
 ] 

Ishan Chattopadhyaya commented on SOLR-13661:
-

bq. We revert those changes. But they seem complicated enough that I don't want 
to attempt it.
The commits are here: https://issues.apache.org/jira/browse/SOLR-13710

> A package management system for Solr
> 
>
> Key: SOLR-13661
> URL: https://issues.apache.org/jira/browse/SOLR-13661
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Noble Paul
>Assignee: Ishan Chattopadhyaya
>Priority: Blocker
>  Labels: package
> Attachments: plugin-usage.png, repos.png
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Here's the design doc:
> https://docs.google.com/document/d/15b3m3i3NFDKbhkhX_BN0MgvPGZaBj34TKNF2-UNC3U8/edit?usp=sharing






[jira] [Updated] (SOLR-13661) A package management system for Solr

2019-09-30 Thread Ishan Chattopadhyaya (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-13661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ishan Chattopadhyaya updated SOLR-13661:

Priority: Blocker  (was: Major)

> A package management system for Solr
> 
>
> Key: SOLR-13661
> URL: https://issues.apache.org/jira/browse/SOLR-13661
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Noble Paul
>Assignee: Ishan Chattopadhyaya
>Priority: Blocker
>  Labels: package
> Attachments: plugin-usage.png, repos.png
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Here's the design doc:
> https://docs.google.com/document/d/15b3m3i3NFDKbhkhX_BN0MgvPGZaBj34TKNF2-UNC3U8/edit?usp=sharing






[jira] [Resolved] (SOLR-9458) DocumentDictionaryFactory StackOverflowError on many documents

2019-09-30 Thread Erick Erickson (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-9458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erick Erickson resolved SOLR-9458.
--
Resolution: Fixed

I think this is fixed by LUCENE-7914. We can re-open if it's still a problem 
here. There'll still be an exception thrown, but at least one that's controlled.

How to fix the underlying problem is less clear, though. Perhaps add a filter 
that limits the length of a token?
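The LUCENE-7914 style of fix — replacing deep recursion over automaton states with an explicit stack — can be sketched as follows. This is an illustrative stand-alone version, not Lucene's actual Operations.topoSortStates code:

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;

public class IterativeTopoSort {

    /**
     * Depth-first topological order of a DAG given as adjacency lists
     * (states 0..n-1), using an explicit stack of {state, nextEdge} frames
     * instead of recursion, so input size cannot overflow the thread stack.
     */
    static int[] topoSort(int[][] transitions) {
        int n = transitions.length;
        boolean[] visited = new boolean[n];
        int[] order = new int[n];
        int upto = n;
        Deque<int[]> stack = new ArrayDeque<>();
        for (int s = 0; s < n; s++) {
            if (visited[s]) continue;
            visited[s] = true;
            stack.push(new int[] {s, 0});
            while (!stack.isEmpty()) {
                int[] frame = stack.peek();
                if (frame[1] < transitions[frame[0]].length) {
                    int dest = transitions[frame[0]][frame[1]++];
                    if (!visited[dest]) {
                        visited[dest] = true;
                        stack.push(new int[] {dest, 0});
                    }
                } else {
                    stack.pop();            // all edges of this state explored
                    order[--upto] = frame[0];
                }
            }
        }
        return order;
    }

    public static void main(String[] args) {
        // 0 -> 1 -> 2: the kind of chain that deep recursion would walk state by state.
        System.out.println(Arrays.toString(topoSort(new int[][] {{1}, {2}, {}})));  // [0, 1, 2]
    }
}
```

The recursive variant fails with StackOverflowError once the chain of states exceeds the thread stack depth; the explicit-stack variant is bounded only by heap.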

> DocumentDictionaryFactory StackOverflowError on many documents
> --
>
> Key: SOLR-9458
> URL: https://issues.apache.org/jira/browse/SOLR-9458
> Project: Solr
>  Issue Type: Bug
>  Components: Suggester
>Affects Versions: 6.1, 6.2
>Reporter: Chris de Kok
>Priority: Major
>
> When using the FuzzyLookupFactory in combination with the 
> DocumentDictionaryFactory it will throw a StackOverflowError trying to build 
> the dictionary.
> Using the HighFrequencyDictionaryFactory works OK but behaves very differently.
> ```
> 
> 
> suggest
> suggestions
> suggestions
> FuzzyLookupFactory
> DocumentDictionaryFactory
> suggest_fuzzy
> true
> false
> false
> true
> 0
> 
> 
> null:java.lang.StackOverflowError
>   at 
> org.apache.lucene.util.automaton.Operations.topoSortStatesRecurse(Operations.java:1311)
>   at 
> org.apache.lucene.util.automaton.Operations.topoSortStatesRecurse(Operations.java:1311)
>   at 
> org.apache.lucene.util.automaton.Operations.topoSortStatesRecurse(Operations.java:1311)
>   at 
> org.apache.lucene.util.automaton.Operations.topoSortStatesRecurse(Operations.java:1311)






[jira] [Commented] (SOLR-13661) A package management system for Solr

2019-09-30 Thread Ishan Chattopadhyaya (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16941012#comment-16941012
 ] 

Ishan Chattopadhyaya commented on SOLR-13661:
-

bq. I want to be clear on one thing: The concern/frustration that Jan and I 
have on peer review is because this issue is not some ordinary JIRA issue. It's 
highly impactful to Solr. As-such, IMO peer review is required for at least the 
major ideas / high level, naming, CLI, release-plan. Getting into some small 
details, no, not needed. Thankfully the peer review is here now 

Thanks David and Jan for your help with reviews. I am glad to receive the 
reviews and the improvements they bring to our design and implementation. I 
shall update the design document with all the details that we discussed offline 
(on Slack) to make it easier to understand the workflows involved here.

> A package management system for Solr
> 
>
> Key: SOLR-13661
> URL: https://issues.apache.org/jira/browse/SOLR-13661
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Noble Paul
>Assignee: Ishan Chattopadhyaya
>Priority: Major
>  Labels: package
> Attachments: plugin-usage.png, repos.png
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Here's the design doc:
> https://docs.google.com/document/d/15b3m3i3NFDKbhkhX_BN0MgvPGZaBj34TKNF2-UNC3U8/edit?usp=sharing






[jira] [Commented] (SOLR-13661) A package management system for Solr

2019-09-30 Thread Ishan Chattopadhyaya (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16941010#comment-16941010
 ] 

Ishan Chattopadhyaya commented on SOLR-13661:
-

We are in a difficult situation here. There are some commits concerning the 
class loader changes and new blob store that are already in master and 
branch_8x. But I am supposed to cut the 8.3 branch today. Here are the options 
we have:
# We revert those changes. But they seem complicated enough that I don't want 
to attempt it.
# We review and merge https://github.com/apache/lucene-solr/pull/910 which will 
complete the blob store and class loader work, but might take 1-2 days to 
review?
# We leave the unfinished code in the branch, disable the feature, and cut 
the branch.

[~noble.paul], [~dsmiley], [~janhoy] any thoughts?

> A package management system for Solr
> 
>
> Key: SOLR-13661
> URL: https://issues.apache.org/jira/browse/SOLR-13661
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Noble Paul
>Assignee: Ishan Chattopadhyaya
>Priority: Major
>  Labels: package
> Attachments: plugin-usage.png, repos.png
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Here's the design doc:
> https://docs.google.com/document/d/15b3m3i3NFDKbhkhX_BN0MgvPGZaBj34TKNF2-UNC3U8/edit?usp=sharing






[jira] [Commented] (SOLR-13661) A package management system for Solr

2019-09-30 Thread David Wayne Smiley (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16941007#comment-16941007
 ] 

David Wayne Smiley commented on SOLR-13661:
---

The design document is pretty fantastic in its overall scope (not too much or 
too little) and structure (easy to consume).  Of course I have things inside to 
debate but it was a breath of fresh air to consume.

I want to be clear on one thing:  The concern/frustration that Jan and I have 
on peer review is because this issue is not some ordinary JIRA issue.  It's 
highly impactful to Solr.  As-such, IMO peer review is _required_ for at least 
the major ideas / high level, naming, CLI, release-plan.  Getting into some 
small details, no, not needed.  Thankfully the peer review is here now :-)

> A package management system for Solr
> 
>
> Key: SOLR-13661
> URL: https://issues.apache.org/jira/browse/SOLR-13661
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Noble Paul
>Assignee: Ishan Chattopadhyaya
>Priority: Major
>  Labels: package
> Attachments: plugin-usage.png, repos.png
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Here's the design doc:
> https://docs.google.com/document/d/15b3m3i3NFDKbhkhX_BN0MgvPGZaBj34TKNF2-UNC3U8/edit?usp=sharing






[GitHub] [lucene-solr] noblepaul opened a new pull request #910: SOLR-13661 : SOLR-13661 A package management system for Solr

2019-09-30 Thread GitBox
noblepaul opened a new pull request #910: SOLR-13661 :  SOLR-13661 A package 
management system for Solr
URL: https://github.com/apache/lucene-solr/pull/910
 
 
   
   
   # Description
   
   Please refer to the design doc for details
   
   
https://docs.google.com/document/d/15b3m3i3NFDKbhkhX_BN0MgvPGZaBj34TKNF2-UNC3U8/edit?ts=5d86a8ad#
   
   
   
   # Tests
   
   TestPackages has the tests required for this 
   
   # Checklist
   
   Please review the following and check all that apply:
   
   
   - [ ] I have reviewed the guidelines for [How to 
Contribute](https://wiki.apache.org/solr/HowToContribute) and my code conforms 
to the standards described there to the best of my ability.
   
   - [ ] I have created a Jira issue and added the issue ID to my pull request 
title.
   - [ ] I am authorized to contribute this code to the ASF and have removed 
any code I do not have a license to distribute.
   - [ ] I have given Solr maintainers 
[access](https://help.github.com/en/articles/allowing-changes-to-a-pull-request-branch-created-from-a-fork)
 to contribute to my PR branch. (optional but recommended)
   - [ ] I have developed this patch against the `master` branch.
   - [ ] I have run `ant precommit` and the appropriate test suite.
   - [ ] I have added tests for my changes.
   - [ ] I have added documentation for the [Ref 
Guide](https://github.com/apache/lucene-solr/tree/master/solr/solr-ref-guide) 
(for Solr changes only).
   






[jira] [Resolved] (SOLR-13290) Prometheus metric exporter AsyncLogger: java.lang.NoClassDefFoundError

2019-09-30 Thread Erick Erickson (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-13290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erick Erickson resolved SOLR-13290.
---
Resolution: Fixed

[~kstoney] Looking over old JIRAs assigned to myself, I saw this. I'm 
assuming it's fixed, probably by the linked JIRA. Do you agree? If not, we can 
re-open.

> Prometheus metric exporter AsyncLogger: java.lang.NoClassDefFoundError
> --
>
> Key: SOLR-13290
> URL: https://issues.apache.org/jira/browse/SOLR-13290
> Project: Solr
>  Issue Type: Bug
>  Components: metrics
>Affects Versions: 8.0, 8.1
>Reporter: Karl Stoney
>Assignee: Erick Erickson
>Priority: Major
>
> Since this 
> commit:[https://github.com/apache/lucene-solr/commit/02eb9d34404b8fc7225ee7c5c867e194afae17a0]
> The metrics exporter in branch_8x no longer starts
> {code:java}
> 2019-03-04 16:06:01,070 main ERROR Unable to invoke factory method in class 
> org.apache.logging.log4j.core.async.AsyncLoggerConfig for element 
> AsyncLogger: java.lang.NoClassDefFoundError
> : com/lmax/disruptor/EventFactory java.lang.reflect.InvocationTargetException
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>  at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>  at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke(Method.java:498)
>  at 
> org.apache.logging.log4j.core.config.plugins.util.PluginBuilder.build(PluginBuilder.java:136)
>  at 
> org.apache.logging.log4j.core.config.AbstractConfiguration.createPluginObject(AbstractConfiguration.java:964)
>  at 
> org.apache.logging.log4j.core.config.AbstractConfiguration.createConfiguration(AbstractConfiguration.java:904)
>  at 
> org.apache.logging.log4j.core.config.AbstractConfiguration.createConfiguration(AbstractConfiguration.java:896)
>  at 
> org.apache.logging.log4j.core.config.AbstractConfiguration.doConfigure(AbstractConfiguration.java:514)
>  at 
> org.apache.logging.log4j.core.config.AbstractConfiguration.initialize(AbstractConfiguration.java:238)
>  at 
> org.apache.logging.log4j.core.config.AbstractConfiguration.start(AbstractConfiguration.java:250)
>  at 
> org.apache.logging.log4j.core.LoggerContext.setConfiguration(LoggerContext.java:548)
>  at 
> org.apache.logging.log4j.core.LoggerContext.reconfigure(LoggerContext.java:620)
>  at 
> org.apache.logging.log4j.core.LoggerContext.reconfigure(LoggerContext.java:637)
>  at org.apache.logging.log4j.core.LoggerContext.start(LoggerContext.java:231)
>  at 
> org.apache.logging.log4j.core.impl.Log4jContextFactory.getContext(Log4jContextFactory.java:153)
>  at 
> org.apache.logging.log4j.core.impl.Log4jContextFactory.getContext(Log4jContextFactory.java:45)
>  at org.apache.logging.log4j.LogManager.getContext(LogManager.java:194)
>  at 
> org.apache.logging.log4j.spi.AbstractLoggerAdapter.getContext(AbstractLoggerAdapter.java:121)
>  at 
> org.apache.logging.slf4j.Log4jLoggerFactory.getContext(Log4jLoggerFactory.java:43)
>  at 
> org.apache.logging.log4j.spi.AbstractLoggerAdapter.getLogger(AbstractLoggerAdapter.java:46)
>  at 
> org.apache.logging.slf4j.Log4jLoggerFactory.getLogger(Log4jLoggerFactory.java:29)
>  at org.slf4j.LoggerFactory.getLogger(LoggerFactory.java:358)
>  at org.slf4j.LoggerFactory.getLogger(LoggerFactory.java:383)
>  at 
> org.apache.solr.prometheus.exporter.SolrExporter.<init>(SolrExporter.java:48)
> Caused by: java.lang.NoClassDefFoundError: com/lmax/disruptor/EventFactory
>  at 
> org.apache.logging.log4j.core.config.AbstractConfiguration.getAsyncLoggerConfigDelegate(AbstractConfiguration.java:203)
>  at 
> org.apache.logging.log4j.core.async.AsyncLoggerConfig.<init>(AsyncLoggerConfig.java:91)
>  at 
> org.apache.logging.log4j.core.async.AsyncLoggerConfig.createLogger(AsyncLoggerConfig.java:273)
>  ... 25 more
> Caused by: java.lang.ClassNotFoundException: com.lmax.disruptor.EventFactory
>  at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
>  at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>  at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
>  at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>  ... 28 more{code}






[GitHub] [lucene-solr] jimczi commented on issue #904: LUCENE-8992: Share minimum score across segment in concurrent search

2019-09-30 Thread GitBox
jimczi commented on issue #904: LUCENE-8992: Share minimum score across segment 
in concurrent search
URL: https://github.com/apache/lucene-solr/pull/904#issuecomment-536520766
 
 
   Thanks for reviewing @atris . I pushed some changes to address your comments 
and add unit tests for the bottom value checker, can you take another look ?






[GitHub] [lucene-solr] jimczi commented on a change in pull request #904: LUCENE-8992: Share minimum score across segment in concurrent search

2019-09-30 Thread GitBox
jimczi commented on a change in pull request #904: LUCENE-8992: Share minimum 
score across segment in concurrent search
URL: https://github.com/apache/lucene-solr/pull/904#discussion_r329528877
 
 

 ##
 File path: lucene/core/src/java/org/apache/lucene/search/TopFieldCollector.java
 ##
 @@ -423,22 +440,24 @@ static TopFieldCollector create(Sort sort, int numHits, 
FieldDoc after,
 throw new IllegalArgumentException("after.fields has " + 
after.fields.length + " values but sort has " + sort.getSort().length);
   }
 
-  return new PagingFieldCollector(sort, queue, after, numHits, 
hitsThresholdChecker);
+  return new PagingFieldCollector(sort, queue, after, numHits, 
hitsThresholdChecker, bottomValueChecker);
 }
   }
 
   /**
* Create a CollectorManager which uses a shared hit counter to maintain 
number of hits
+   * and a shared bottom value checker to propagate the minimum score across 
segments if
+   * the primary sort is by relevancy.
*/
-  public static CollectorManager<TopFieldCollector, TopFieldDocs> 
createSharedManager(Sort sort, int numHits, FieldDoc after,
-   
  int totalHitsThreshold) {
+  public static CollectorManager<TopFieldCollector, TopFieldDocs> 
createSharedManager(Sort sort, int numHits, FieldDoc after, int 
totalHitsThreshold) {
 
 Review comment:
   good catch, this shouldn't be changed
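The shared bottom value checker discussed in this review can be sketched as a lock-free global maximum over per-collector minimum competitive scores. Names are illustrative, not Lucene's actual API; storing the float bits in an AtomicLong keeps the update lock-free (valid because scores are non-negative, so the float ordering matches the integer-bits ordering):

```java
import java.util.concurrent.atomic.AtomicLong;

public class GlobalMinScore {
    // Float bits of the best (highest) minimum competitive score seen so far.
    private final AtomicLong bits = new AtomicLong(Float.floatToIntBits(0f));

    /** Called by a collector once its local hit queue is full. */
    public void updateMinScore(float localMinCompetitiveScore) {
        long candidate = Float.floatToIntBits(localMinCompetitiveScore);
        // Lock-free max: for non-negative floats, comparing the raw bits as
        // integers is equivalent to comparing the float values.
        bits.accumulateAndGet(candidate, Math::max);
    }

    /** Collectors on other segments can skip hits scoring below this. */
    public float minScore() {
        return Float.intBitsToFloat((int) bits.get());
    }

    public static void main(String[] args) {
        GlobalMinScore shared = new GlobalMinScore();
        shared.updateMinScore(1.5f);  // segment A's queue filled at 1.5
        shared.updateMinScore(0.5f);  // segment B's lower bound does not win
        System.out.println(shared.minScore());  // 1.5
    }
}
```

This only applies when the primary sort is by relevancy, which is exactly the condition the patched Javadoc above states.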



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8213) Cache costly subqueries asynchronously

2019-09-30 Thread Atri Sharma (Jira)


[ https://issues.apache.org/jira/browse/LUCENE-8213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16940798#comment-16940798 ]

Atri Sharma commented on LUCENE-8213:
-

Interesting – I did not realise that testLRUEviction could also cause 
LRUQueryCache to cache asynchronously, hence did not update it to handle the 
same (in the manner testLRUConcurrentLoadAndEviction does).

 

I have pushed a test fix now – beasted the test 50 times with the seed you 
provided, and also beasted the entire TestLRUQueryCache suite 20 times with the 
seed. Ran the entire Lucene test suite – came in clean.

 

It is curious to note that I could not reproduce the test failure without the 
seed even after running multiple times – kudos to the CI!

> Cache costly subqueries asynchronously
> --
>
> Key: LUCENE-8213
> URL: https://issues.apache.org/jira/browse/LUCENE-8213
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/query/scoring
>Affects Versions: 7.2.1
>Reporter: Amir Hadadi
>Priority: Minor
>  Labels: performance
>  Time Spent: 9h 40m
>  Remaining Estimate: 0h
>
> IndexOrDocValuesQuery allows to combine costly range queries with a selective 
> lead iterator in an optimized way. However, the range query at some point 
> gets cached by a querying thread in LRUQueryCache, which negates the 
> optimization of IndexOrDocValuesQuery for that specific query.
> It would be nice to see an asynchronous caching implementation in such cases, 
> so that queries involving IndexOrDocValuesQuery would have consistent 
> performance characteristics.
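The optimization described above can be illustrated with a short sketch (editorial; the field names and bounds are made up, but `IndexOrDocValuesQuery`, `LongPoint.newRangeQuery`, and `NumericDocValuesField.newSlowRangeQuery` are the real Lucene APIs involved):

```java
import org.apache.lucene.document.LongPoint;
import org.apache.lucene.document.NumericDocValuesField;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.IndexOrDocValuesQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;

public class IndexOrDocValuesSketch {
  static Query build() {
    // A selective lead query combined with a costly range query.
    // IndexOrDocValuesQuery lets Lucene run the points-based query when the
    // range drives iteration, or the doc-values variant when the lead query
    // is selective enough to make random access cheaper.
    Query lead = new TermQuery(new Term("status", "active")); // hypothetical field
    Query range = new IndexOrDocValuesQuery(
        LongPoint.newRangeQuery("price", 10L, 100L),                  // index query
        NumericDocValuesField.newSlowRangeQuery("price", 10L, 100L)); // doc-values query
    return new BooleanQuery.Builder()
        .add(lead, BooleanClause.Occur.MUST)
        .add(range, BooleanClause.Occur.FILTER)
        .build();
    // Once LRUQueryCache caches `range`, the cached (index-driven) form is
    // always used, losing the doc-values fallback — the behavior this issue
    // wants to avoid by caching asynchronously.
  }
}
```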



--
This message was sent by Atlassian Jira
(v8.3.4#803005)




[jira] [Commented] (SOLR-13661) A package management system for Solr

2019-09-30 Thread ASF subversion and git services (Jira)


[ https://issues.apache.org/jira/browse/SOLR-13661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16940750#comment-16940750 ]

ASF subversion and git services commented on SOLR-13661:


Commit 7779be1017c93166cf10d8debc8765e7d121037c in lucene-solr's branch 
refs/heads/jira/SOLR-13661 from Noble Paul
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=7779be1 ]

SOLR-13661: Changed the blob store to use sha256- as the blob id 
instead of just sha256


> A package management system for Solr
> 
>
> Key: SOLR-13661
> URL: https://issues.apache.org/jira/browse/SOLR-13661
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Noble Paul
>Assignee: Ishan Chattopadhyaya
>Priority: Major
>  Labels: package
> Attachments: plugin-usage.png, repos.png
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Here's the design doc:
> https://docs.google.com/document/d/15b3m3i3NFDKbhkhX_BN0MgvPGZaBj34TKNF2-UNC3U8/edit?usp=sharing






[jira] [Commented] (SOLR-13793) HTTPSolrCall makes cascading calls even when all replicas are down for a collection

2019-09-30 Thread Kesharee Nandan Vishwakarma (Jira)


[ https://issues.apache.org/jira/browse/SOLR-13793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16940694#comment-16940694 ]

Kesharee Nandan Vishwakarma commented on SOLR-13793:


As per https://issues.apache.org/jira/browse/SOLR-4553 we are attempting to 
proxy requests more aggressively, which leads to the scenarios described in 
this bug.

[~markrmil...@gmail.com] Can we improve the accuracy of selecting active 
[slices|https://github.com/apache/lucene-solr/blob/e7522297a70674662f1083f9942403bac3119693/solr/core/src/java/org/apache/solr/servlet/HttpSolrCall.java#L978]
 / [coreUrls 
|https://github.com/apache/lucene-solr/blob/68fa249034ba8b273955f20097700dc2fbb7a800/solr/core/src/java/org/apache/solr/servlet/HttpSolrCall.java#L929]
 themselves? Otherwise we could put a limit on cascading remote queries when 
considering dead replicas/slices.
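A minimal, self-contained sketch of the first option (the `Replica` record and all names below are illustrative stand-ins, not Solr's actual cluster-state or HttpSolrCall APIs):

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

public class ActiveReplicaFilter {
  // Illustrative stand-in for Solr's Replica cluster-state type.
  public record Replica(String coreUrl, String nodeName, boolean active) {}

  // Keep only replicas that are active AND whose node is currently live,
  // so down replicas never enter the list of proxy targets.
  public static List<String> activeCoreUrls(List<Replica> replicas, Set<String> liveNodes) {
    return replicas.stream()
        .filter(r -> r.active() && liveNodes.contains(r.nodeName()))
        .map(Replica::coreUrl)
        .collect(Collectors.toList());
  }
}
```

With a filter like this applied before building the coreUrl list, the example in the issue (all replicas down) would yield an empty target list and an immediate error instead of an unbounded proxy loop.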

> HTTPSolrCall makes cascading calls even when all replicas are down for a 
> collection
> ---
>
> Key: SOLR-13793
> URL: https://issues.apache.org/jira/browse/SOLR-13793
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: SolrCloud
>Affects Versions: 6.6
>Reporter: Kesharee Nandan Vishwakarma
>Priority: Major
>
> The REMOTEQUERY action in HTTPSolrCall ends up making too many cascading 
> remoteQuery calls when all the replicas of a collection are in the down 
> state. 
> This results in an increase in thread count, unresponsive Solr nodes, and 
> eventually the nodes which host this collection dropping out of live nodes.
> *Example scenario*: Consider a cluster with 3 nodes(solr1, solrw1, 
> solr-overseer1). A collection is present on solr1, solrw1 but both replicas 
> are in down state. When a search request is made to solr-overseer1, since 
> replica is not present locally a remote query is made to solr1 (we also 
> consider inactive slices/coreUrls), solr1 also doesn't see an active replica 
> present locally, it forwards to solrw1, again solrw1 will forward request to 
> solr1. This goes on till both of solr1, solrw1 become unresponsive. Attached 
> logs for this.
> This is happening because we are considering [inactive 
> slices|https://github.com/apache/lucene-solr/blob/68fa249034ba8b273955f20097700dc2fbb7a800/solr/core/src/java/org/apache/solr/servlet/HttpSolrCall.java#L913
>  ], [inactive coreUrl| 
> https://github.com/apache/lucene-solr/blob/68fa249034ba8b273955f20097700dc2fbb7a800/solr/core/src/java/org/apache/solr/servlet/HttpSolrCall.java#L929]
>  while forwarding requests to nodes.
> *Steps to reproduce*:
> #  Bring down all replicas of a collection but ensure nodes containing them 
> are up 
> # Make any search call to any of solr nodes for this collection. 
>  
> *Possible fixes*: 
> # Ensure we select only active slices/coreUrls before making remote queries
> # Put a limit on cascading calls, probably capped at the number of replicas
>  
> {noformat} 
> solrw1_1 |
> solrw1_1 | 2019-09-24 09:35:14.458 ERROR (qtp762152757-8772) [   ] 
> o.a.s.s.HttpSolrCall null:org.apache.solr.common.SolrException: Error trying 
> to proxy request for url: http://solr1:8983/solr/kg3/select
> solrw1_1 |at 
> org.apache.solr.servlet.HttpSolrCall.remoteQuery(HttpSolrCall.java:660)
> solrw1_1 |at 
> org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:514)
> solrw1_1 |at 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:361)
> solrw1_1 |at 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:305)
> solrw1_1 |at 
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1691)
> solrw1_1 |at 
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582)
> solrw1_1 |at 
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
> solrw1_1 |at 
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
> solrw1_1 |at 
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)
> solrw1_1 |at 
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)
> solrw1_1 |at 
> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512)
> solrw1_1 |at 
> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
> solrw1_1 |at 
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)
> solrw1_1 |at 
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
> solrw1_1 |at 
> 

[jira] [Commented] (LUCENE-8213) Cache costly subqueries asynchronously

2019-09-30 Thread Ignacio Vera (Jira)


[ https://issues.apache.org/jira/browse/LUCENE-8213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16940677#comment-16940677 ]

Ignacio Vera commented on LUCENE-8213:
--

Here is the failure:

https://elasticsearch-ci.elastic.co/job/apache+lucene-solr+master/5526/console

I had a look and it is a race condition. We check whether the query has been 
cached immediately after executing it, but when caching happens asynchronously 
the entry may not be in the cache yet.
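A generic way for a test to cope with this kind of race (an editorial sketch of the polling pattern, not the fix that was actually committed for this issue) is to wait for the cache entry with a deadline instead of asserting immediately after the search:

```java
import java.util.function.BooleanSupplier;

public final class WaitFor {
  // Poll a condition until it becomes true or the deadline passes.
  // Returns the final observed state of the condition.
  public static boolean waitFor(BooleanSupplier condition, long timeoutMs)
      throws InterruptedException {
    long deadline = System.currentTimeMillis() + timeoutMs;
    while (System.currentTimeMillis() < deadline) {
      if (condition.getAsBoolean()) {
        return true;
      }
      Thread.sleep(10); // back off briefly between checks
    }
    return condition.getAsBoolean(); // one last check at the deadline
  }
}
```

A test would then replace a direct `assertTrue(cacheContains(query))` with something like `assertTrue(WaitFor.waitFor(() -> cacheContains(query), 5000))` (names hypothetical).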

> Cache costly subqueries asynchronously
> --
>
> Key: LUCENE-8213
> URL: https://issues.apache.org/jira/browse/LUCENE-8213
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/query/scoring
>Affects Versions: 7.2.1
>Reporter: Amir Hadadi
>Priority: Minor
>  Labels: performance
>  Time Spent: 9h 40m
>  Remaining Estimate: 0h
>
> IndexOrDocValuesQuery allows to combine costly range queries with a selective 
> lead iterator in an optimized way. However, the range query at some point 
> gets cached by a querying thread in LRUQueryCache, which negates the 
> optimization of IndexOrDocValuesQuery for that specific query.
> It would be nice to see an asynchronous caching implementation in such cases, 
> so that queries involving IndexOrDocValuesQuery would have consistent 
> performance characteristics.






[jira] [Commented] (LUCENE-8213) Cache costly subqueries asynchronously

2019-09-30 Thread Atri Sharma (Jira)


[ https://issues.apache.org/jira/browse/LUCENE-8213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16940673#comment-16940673 ]

Atri Sharma commented on LUCENE-8213:
-

Taking a look – although I am unable to reproduce with a simple ant test.


Can you point me to the CI link so that I can dive deeper into the error output?

> Cache costly subqueries asynchronously
> --
>
> Key: LUCENE-8213
> URL: https://issues.apache.org/jira/browse/LUCENE-8213
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/query/scoring
>Affects Versions: 7.2.1
>Reporter: Amir Hadadi
>Priority: Minor
>  Labels: performance
>  Time Spent: 9h 40m
>  Remaining Estimate: 0h
>
> IndexOrDocValuesQuery allows to combine costly range queries with a selective 
> lead iterator in an optimized way. However, the range query at some point 
> gets cached by a querying thread in LRUQueryCache, which negates the 
> optimization of IndexOrDocValuesQuery for that specific query.
> It would be nice to see an asynchronous caching implementation in such cases, 
> so that queries involving IndexOrDocValuesQuery would have consistent 
> performance characteristics.


