[GitHub] [lucene-solr] dweiss commented on a change in pull request #1157: Add RAT check using Gradle

2020-01-14 Thread GitBox
dweiss commented on a change in pull request #1157: Add RAT check using Gradle
URL: https://github.com/apache/lucene-solr/pull/1157#discussion_r366734889
 
 

 ##
 File path: versions.props
 ##
 @@ -72,6 +72,7 @@ org.apache.opennlp:opennlp-tools=1.9.1
 org.apache.pdfbox:*=2.0.17
 org.apache.pdfbox:jempbox=1.8.16
 org.apache.poi:*=4.1.1
+org.apache.rat:apache-rat:0.11
 
 Review comment:
  Did you run gradlew precommit? :) I think the updated lock file isn't 
included in the patch.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] dweiss commented on a change in pull request #1157: Add RAT check using Gradle

2020-01-14 Thread GitBox
dweiss commented on a change in pull request #1157: Add RAT check using Gradle
URL: https://github.com/apache/lucene-solr/pull/1157#discussion_r366734419
 
 

 ##
 File path: gradle/validation/rat-sources.gradle
 ##
 @@ -0,0 +1,266 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+import groovy.xml.NamespaceBuilder
+
+configure(rootProject) {
+  configurations {
+    rat
+  }
+
+  dependencies {
+    rat "org.apache.rat:apache-rat"
+  }
+}
+
+allprojects {
+  task("rat", type: RatTask) {
+    group = 'Verification'
+    description = 'Runs Apache Rat checks.'
+  }
+
+  if (project == rootProject) {
+    rat {
+      includes += [
+        "buildSrc/**/*.java",
+        "lucene/tools/forbiddenApis/**",
+        "lucene/tools/prettify/**",
+        // "dev-tools/**"
+      ]
+      excludes += [
+        // Unclear if this needs ASF header, depends on how much was copied from ElasticSearch
+        "**/ErrorReportingTestListener.java"
+      ]
+    }
+  }
+
+  if (project.path == ":lucene:analysis:common") {
+    rat {
+      srcExcludes += [
+        "**/*.aff",
+        "**/*.dic",
+        "**/charfilter/*.htm*",
+        "**/*LuceneResourcesWikiPage.html"
+      ]
+    }
+  }
+
+  if (project.path == ":lucene:analysis:kuromoji") {
+    rat {
+      srcExcludes += [
+        // whether rat detects this as binary or not is platform dependent?!
+        "**/bocchan.utf-8"
+      ]
+    }
+  }
+
+  if (project.path == ":lucene:analysis:opennlp") {
+    rat {
+      excludes += [
+        "src/tools/test-model-data/*.txt",
+      ]
+    }
+  }
+
+  if (project.path == ":lucene:highlighter") {
+    rat {
+      srcExcludes += [
+        "**/CambridgeMA.utf8"
+      ]
+    }
+  }
+
+  if (project.path == ":lucene:suggest") {
+    rat {
+      srcExcludes += [
+        "**/Top50KWiki.utf8",
+        "**/stop-snowball.txt"
+      ]
+    }
+  }
+
+  if (project.path == ":lucene:tools") {
+    rat {
+      includes += [
+        "forbiddenApis/**",
+        "prettify/**",
+        // If/when :lucene:tools becomes a gradle project, then the following line will fail
 
 Review comment:
  I don't think there will be a lucene:tools anymore once we depart from ant - I 
don't see any relevant code in there that should stay.





[GitHub] [lucene-solr] dweiss commented on a change in pull request #1157: Add RAT check using Gradle

2020-01-14 Thread GitBox
dweiss commented on a change in pull request #1157: Add RAT check using Gradle
URL: https://github.com/apache/lucene-solr/pull/1157#discussion_r366734202
 
 

 ##
 File path: gradle/validation/rat-sources.gradle
 ##
 @@ -0,0 +1,266 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+import groovy.xml.NamespaceBuilder
+
+configure(rootProject) {
+  configurations {
+    rat
+  }
+
+  dependencies {
+    rat "org.apache.rat:apache-rat"
+  }
+}
+
+allprojects {
+  task("rat", type: RatTask) {
+    group = 'Verification'
+    description = 'Runs Apache Rat checks.'
+  }
+
+  if (project == rootProject) {
+    rat {
+      includes += [
+        "buildSrc/**/*.java",
+        "lucene/tools/forbiddenApis/**",
+        "lucene/tools/prettify/**",
+        // "dev-tools/**"
+      ]
+      excludes += [
+        // Unclear if this needs ASF header, depends on how much was copied from ElasticSearch
+        "**/ErrorReportingTestListener.java"
+      ]
+    }
+  }
+
+  if (project.path == ":lucene:analysis:common") {
+    rat {
+      srcExcludes += [
+        "**/*.aff",
+        "**/*.dic",
+        "**/charfilter/*.htm*",
+        "**/*LuceneResourcesWikiPage.html"
+      ]
+    }
+  }
+
+  if (project.path == ":lucene:analysis:kuromoji") {
+    rat {
+      srcExcludes += [
+        // whether rat detects this as binary or not is platform dependent?!
 
 Review comment:
   This is one of the reasons I'd rather have our own check - we don't have to 
rely on a black box (or look inside rat's code to figure out what it does).





[GitHub] [lucene-solr] dweiss commented on a change in pull request #1157: Add RAT check using Gradle

2020-01-14 Thread GitBox
dweiss commented on a change in pull request #1157: Add RAT check using Gradle
URL: https://github.com/apache/lucene-solr/pull/1157#discussion_r366733855
 
 

 ##
 File path: gradle/validation/rat-sources.gradle
 ##
 @@ -0,0 +1,266 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+import groovy.xml.NamespaceBuilder
+
+configure(rootProject) {
+  configurations {
+    rat
+  }
+
+  dependencies {
+    rat "org.apache.rat:apache-rat"
+  }
+}
+
+allprojects {
+  task("rat", type: RatTask) {
+    group = 'Verification'
+    description = 'Runs Apache Rat checks.'
+  }
+
+  if (project == rootProject) {
+    rat {
+      includes += [
+        "buildSrc/**/*.java",
+        "lucene/tools/forbiddenApis/**",
+        "lucene/tools/prettify/**",
+        // "dev-tools/**"
+      ]
+      excludes += [
+        // Unclear if this needs ASF header, depends on how much was copied from ElasticSearch
 
 Review comment:
  That was the idea, but I changed a whole bunch of stuff along the way. There are 
similarities and the attribution is there, but I think we can consider it our 
own.





[GitHub] [lucene-solr] dweiss commented on issue #1157: Add RAT check using Gradle

2020-01-14 Thread GitBox
dweiss commented on issue #1157: Add RAT check using Gradle
URL: https://github.com/apache/lucene-solr/pull/1157#issuecomment-574539229
 
 
   Thanks Mike. I'll merge it in today. I actually looked at the Apache Rat sources 
yesterday -- they are indeed riddled with encoding issues (a writer on top of a 
print stream, etc.), and they're not easy to use directly to apply to just a 
bunch of files instead of a directory plus inclusions/exclusions.
   
   My motivation here was to replace the whole fileset trickery with Gradle's 
file collections. Then it's easier to pick what you need, pass it to task 
inputs, etc. Also, what rat does is merely simple pattern detection (line 
matching)... this could be written in plain Groovy with relative ease 
- perhaps at some point we can switch over to our own license check, 
where we can control things better. 
   
   For now I think it's fine.
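As an illustration of how simple that line matching really is, here is a minimal plain-Java sketch of a header check (a hypothetical example only, not the actual rat code nor a proposed replacement):

```java
import java.io.IOException;
import java.nio.file.*;
import java.util.List;
import java.util.stream.Stream;

public class LicenseHeaderCheck {
    // Marker every source file is expected to contain near the top.
    private static final String ASF_MARKER =
        "Licensed to the Apache Software Foundation";

    /** Returns true if one of the first maxLines lines contains the marker. */
    static boolean hasLicenseHeader(List<String> lines, int maxLines) {
        return lines.stream().limit(maxLines).anyMatch(l -> l.contains(ASF_MARKER));
    }

    public static void main(String[] args) throws IOException {
        Path root = Path.of(args.length > 0 ? args[0] : ".");
        try (Stream<Path> files = Files.walk(root)) {
            files.filter(p -> p.toString().endsWith(".java"))
                 .filter(p -> !hasLicenseHeader(readLines(p), 20))
                 .forEach(p -> System.out.println("MISSING HEADER: " + p));
        }
    }

    private static List<String> readLines(Path p) {
        try {
            return Files.readAllLines(p);
        } catch (IOException e) {
            // Unreadable or binary files are reported as missing the header.
            return List.of();
        }
    }
}
```

Hooking something like this up to Gradle file collections would make the inputs explicit and cacheable, which is the point being made above.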





[GitHub] [lucene-solr] mkhludnev commented on a change in pull request #1161: SOLR-12325 Added uniqueBlockQuery(parent:true) aggregation for JSON Facet

2020-01-14 Thread GitBox
mkhludnev commented on a change in pull request #1161: SOLR-12325 Added 
uniqueBlockQuery(parent:true) aggregation for JSON Facet
URL: https://github.com/apache/lucene-solr/pull/1161#discussion_r366721693
 
 

 ##
 File path: 
solr/core/src/java/org/apache/solr/search/facet/UniqueBlockFieldAgg.java
 ##
 @@ -0,0 +1,52 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.solr.search.facet;
+
+import java.io.IOException;
+
+import org.apache.solr.schema.SchemaField;
+
+public class UniqueBlockFieldAgg extends UniqueBlockAgg {
+
+  private static final class UniqueBlockFieldSlotAcc extends 
UniqueBlockSlotAcc {
 
 Review comment:
   is it really necessary? 





[GitHub] [lucene-solr] mkhludnev commented on a change in pull request #1161: SOLR-12325 Added uniqueBlockQuery(parent:true) aggregation for JSON Facet

2020-01-14 Thread GitBox
mkhludnev commented on a change in pull request #1161: SOLR-12325 Added 
uniqueBlockQuery(parent:true) aggregation for JSON Facet
URL: https://github.com/apache/lucene-solr/pull/1161#discussion_r366721270
 
 

 ##
 File path: solr/core/src/java/org/apache/solr/search/FunctionQParser.java
 ##
 @@ -231,7 +232,59 @@ public String parseArg() throws SyntaxError {
 return val;
   }
 
-  
+  /**
+   * Parse an argument in the same way as parseArg(), but also parse the argument name
+   * if written in the following syntax: argumentName=argument.
+   *
+   * @return Immutable entry with the name as the key and the argument as the value.
+   * In case there is no name, the key is null.
+   * @throws SyntaxError when the argument is not terminated by ) or ,
+   */
+  public SimpleImmutableEntry<String,String> parseNamedArg() throws SyntaxError {
 
 Review comment:
   1. Usually it's nicer to declare the interface, i.e. Entry<>.
   1. I thought Yonik wrote that something like this already existed? Doesn't it? 
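On the first point, a small illustration of declaring the interface type rather than the concrete class (the names and parsing logic here are hypothetical, not the actual Solr code):

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.Map;

public class NamedArg {
    // Return the interface type; callers don't depend on the concrete class.
    static Map.Entry<String, String> parseNamedArg(String raw) {
        int eq = raw.indexOf('=');
        if (eq < 0) {
            // No name given: the key is null, the whole input is the argument.
            return new SimpleImmutableEntry<>(null, raw);
        }
        return new SimpleImmutableEntry<>(raw.substring(0, eq), raw.substring(eq + 1));
    }
}
```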





[jira] [Commented] (SOLR-9683) include/exclude filters by tag

2020-01-14 Thread Mikhail Khludnev (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-9683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17015663#comment-17015663
 ] 

Mikhail Khludnev commented on SOLR-9683:


OK. We had filter exclusion before. Now that SOLR-12490 allows referring to a 
DSL query in a domain filter, it's worth combining filtering and exclusion. My 
understanding is that right now it's one or the other.

> include/exclude filters by tag
> --
>
> Key: SOLR-9683
> URL: https://issues.apache.org/jira/browse/SOLR-9683
> Project: Solr
>  Issue Type: Improvement
>  Components: Facet Module, JSON Request API
>Reporter: Yonik Seeley
>Priority: Major
>
> When specifying a filter list in JSON syntax, it would be nice to be able to 
> include/exclude filters by tag.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Resolved] (SOLR-12490) Introducing json.queries was:Query DSL supports for further referring and exclusion in JSON facets

2020-01-14 Thread Mikhail Khludnev (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-12490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Khludnev resolved SOLR-12490.
-
Resolution: Fixed

> Introducing json.queries was:Query DSL supports for further referring and 
> exclusion in JSON facets 
> ---
>
> Key: SOLR-12490
> URL: https://issues.apache.org/jira/browse/SOLR-12490
> Project: Solr
>  Issue Type: Improvement
>  Components: Facet Module, faceting
>Reporter: Mikhail Khludnev
>Assignee: Mikhail Khludnev
>Priority: Major
>  Labels: newdev
> Fix For: 8.5
>
> Attachments: SOLR-12490-ref-guide.patch, SOLR-12490-ref-guide.patch, 
> SOLR-12490.patch, SOLR-12490.patch, SOLR-12490.patch, SOLR-12490.patch
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> It's spin off from the 
> [discussion|https://issues.apache.org/jira/browse/SOLR-9685?focusedCommentId=16508720&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16508720].
>  
> h2. Problem
> # after SOLR-9685 we can tag separate clauses in hairish queries like 
> {{parent}}, {{bool}}
> # we can {{domain.excludeTags}}
> # we are looking for child faceting with exclusions, see SOLR-9510, SOLR-8998 
>
> # but we can refer only separate params in {{domain.filter}}, it's not 
> possible to refer separate clauses
> see the first comment






[jira] [Commented] (SOLR-12490) Introducing json.queries was:Query DSL supports for further referring and exclusion in JSON facets

2020-01-14 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-12490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17015659#comment-17015659
 ] 

ASF subversion and git services commented on SOLR-12490:


Commit c90ef46497cf6314b63b4aeb1d69b2f8b64230bd in lucene-solr's branch 
refs/heads/branch_8x from Mikhail Khludnev
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=c90ef46 ]

SOLR-12490: Describe json.queries in the ref guide.
Link it from many pages.
Fix a few errors by the way.


> Introducing json.queries was:Query DSL supports for further referring and 
> exclusion in JSON facets 
> ---
>
> Key: SOLR-12490
> URL: https://issues.apache.org/jira/browse/SOLR-12490
> Project: Solr
>  Issue Type: Improvement
>  Components: Facet Module, faceting
>Reporter: Mikhail Khludnev
>Assignee: Mikhail Khludnev
>Priority: Major
>  Labels: newdev
> Fix For: 8.5
>
> Attachments: SOLR-12490-ref-guide.patch, SOLR-12490-ref-guide.patch, 
> SOLR-12490.patch, SOLR-12490.patch, SOLR-12490.patch, SOLR-12490.patch
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> It's spin off from the 
> [discussion|https://issues.apache.org/jira/browse/SOLR-9685?focusedCommentId=16508720&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16508720].
>  
> h2. Problem
> # after SOLR-9685 we can tag separate clauses in hairish queries like 
> {{parent}}, {{bool}}
> # we can {{domain.excludeTags}}
> # we are looking for child faceting with exclusions, see SOLR-9510, SOLR-8998 
>
> # but we can refer only separate params in {{domain.filter}}, it's not 
> possible to refer separate clauses
> see the first comment






[jira] [Commented] (SOLR-12490) Introducing json.queries was:Query DSL supports for further referring and exclusion in JSON facets

2020-01-14 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-12490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17015650#comment-17015650
 ] 

ASF subversion and git services commented on SOLR-12490:


Commit 5cf1ffef321cdcd43677d7e4fc3363f73a4ed468 in lucene-solr's branch 
refs/heads/master from Mikhail Khludnev
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=5cf1ffe ]

SOLR-12490: Describe json.queries in the ref guide.
Link it from many pages.
Fix a few errors by the way.


> Introducing json.queries was:Query DSL supports for further referring and 
> exclusion in JSON facets 
> ---
>
> Key: SOLR-12490
> URL: https://issues.apache.org/jira/browse/SOLR-12490
> Project: Solr
>  Issue Type: Improvement
>  Components: Facet Module, faceting
>Reporter: Mikhail Khludnev
>Assignee: Mikhail Khludnev
>Priority: Major
>  Labels: newdev
> Fix For: 8.5
>
> Attachments: SOLR-12490-ref-guide.patch, SOLR-12490-ref-guide.patch, 
> SOLR-12490.patch, SOLR-12490.patch, SOLR-12490.patch, SOLR-12490.patch
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> It's spin off from the 
> [discussion|https://issues.apache.org/jira/browse/SOLR-9685?focusedCommentId=16508720&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16508720].
>  
> h2. Problem
> # after SOLR-9685 we can tag separate clauses in hairish queries like 
> {{parent}}, {{bool}}
> # we can {{domain.excludeTags}}
> # we are looking for child faceting with exclusions, see SOLR-9510, SOLR-8998 
>
> # but we can refer only separate params in {{domain.filter}}, it's not 
> possible to refer separate clauses
> see the first comment






[jira] [Commented] (LUCENE-9123) JapaneseTokenizer with search mode doesn't work with SynonymGraphFilter

2020-01-14 Thread Kazuaki Hiraga (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17015612#comment-17015612
 ] 

Kazuaki Hiraga commented on LUCENE-9123:


Hello [~tomoko],
Please find attached a file named _LUCENE-9123_revised.patch_, which adds new 
constructors reflecting your comment. 

> JapaneseTokenizer with search mode doesn't work with SynonymGraphFilter
> ---
>
> Key: LUCENE-9123
> URL: https://issues.apache.org/jira/browse/LUCENE-9123
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: modules/analysis
>Affects Versions: 8.4
>Reporter: Kazuaki Hiraga
>Priority: Major
> Attachments: LUCENE-9123.patch, LUCENE-9123_revised.patch
>
>
> JapaneseTokenizer with `mode=search` or `mode=extended` doesn't work with 
> either SynonymGraphFilter or SynonymFilter when JT generates multiple 
> tokens as output. If we use `mode=normal`, it should be fine. However, we 
> would like to use decomposed tokens that maximize the chance to increase 
> recall.
> Snippet of schema:
> {code:xml}
>  positionIncrementGap="100" autoGeneratePhraseQueries="false">
>   
> 
>  synonyms="lang/synonyms_ja.txt"
> tokenizerFactory="solr.JapaneseTokenizerFactory"/>
> 
> 
>  tags="lang/stoptags_ja.txt" />
> 
> 
> 
> 
> 
>  minimumLength="4"/>
> 
> 
>   
> 
> {code}
> A synonym entry that generates an error:
> {noformat}
> 株式会社,コーポレーション
> {noformat}
> The following is an output on console:
> {noformat}
> $ ./bin/solr create_core -c jp_test -d ../config/solrconfs
> ERROR: Error CREATEing SolrCore 'jp_test': Unable to create core [jp_test3] 
> Caused by: term: 株式会社 analyzed to a token (株式会社) with position increment != 1 
> (got: 0)
> {noformat}






[jira] [Updated] (LUCENE-9123) JapaneseTokenizer with search mode doesn't work with SynonymGraphFilter

2020-01-14 Thread Kazuaki Hiraga (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-9123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kazuaki Hiraga updated LUCENE-9123:
---
Attachment: LUCENE-9123_revised.patch

> JapaneseTokenizer with search mode doesn't work with SynonymGraphFilter
> ---
>
> Key: LUCENE-9123
> URL: https://issues.apache.org/jira/browse/LUCENE-9123
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: modules/analysis
>Affects Versions: 8.4
>Reporter: Kazuaki Hiraga
>Priority: Major
> Attachments: LUCENE-9123.patch, LUCENE-9123_revised.patch
>
>
> JapaneseTokenizer with `mode=search` or `mode=extended` doesn't work with 
> either SynonymGraphFilter or SynonymFilter when JT generates multiple 
> tokens as output. If we use `mode=normal`, it should be fine. However, we 
> would like to use decomposed tokens that maximize the chance to increase 
> recall.
> Snippet of schema:
> {code:xml}
>  positionIncrementGap="100" autoGeneratePhraseQueries="false">
>   
> 
>  synonyms="lang/synonyms_ja.txt"
> tokenizerFactory="solr.JapaneseTokenizerFactory"/>
> 
> 
>  tags="lang/stoptags_ja.txt" />
> 
> 
> 
> 
> 
>  minimumLength="4"/>
> 
> 
>   
> 
> {code}
> A synonym entry that generates an error:
> {noformat}
> 株式会社,コーポレーション
> {noformat}
> The following is an output on console:
> {noformat}
> $ ./bin/solr create_core -c jp_test -d ../config/solrconfs
> ERROR: Error CREATEing SolrCore 'jp_test': Unable to create core [jp_test3] 
> Caused by: term: 株式会社 analyzed to a token (株式会社) with position increment != 1 
> (got: 0)
> {noformat}






[jira] [Commented] (LUCENE-9134) Port ant-regenerate tasks to Gradle build

2020-01-14 Thread Erick Erickson (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17015608#comment-17015608
 ] 

Erick Erickson commented on LUCENE-9134:


OK, bear with me, I'm finally getting my feet wet. First, let me check the 
magic:

settings.gradle essentially points to all the directories containing a 
build.gradle file we care about, so any task defined in one of those 
build.gradle files will be found. So when I added tasks (well, copied some of 
Mark's work as a way to start) into lucene/queryparser/build.gradle, the tasks 
are magically found and I can try to execute "./gradlew regenerate", for 
instance, which in turn eventually depends on a task (among others) defined 
like so:

{code}
task runJavaccQueryParser(type: org.apache.lucene.gradle.JavaCC)...
{code}

and fails with
{code}
Could not get unknown property 'org' for project ':lucene:queryparser' of type 
org.gradle.api.Project.
{code}

 but at least it tries. 

Which brings me to the next bit. Mark's work has a directory here: 
./buildSrc/src/main/groovy/org/apache/lucene/gradle which has a file 
JavaCC.groovy that defines the JavaCC class:
{code}
package org.apache.lucene.gradle

class JavaCC ...
{code}

How are we organizing this kind of thing in the current work? I can move the 
JavaCC.groovy file anywhere it should go, where would that be? And before I do 
that, is there a better approach to this problem?
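For reference, a sketch of the conventional buildSrc arrangement (paths here are illustrative, not a decision made on this PR): classes under buildSrc are compiled automatically and land on every build script's classpath, so the task type can be imported or referenced by its fully-qualified name.

```groovy
// buildSrc/src/main/groovy/org/apache/lucene/gradle/JavaCC.groovy
// package org.apache.lucene.gradle
// class JavaCC extends DefaultTask { ... }

// lucene/queryparser/build.gradle
import org.apache.lucene.gradle.JavaCC

task runJavaccQueryParser(type: JavaCC) {
    // task configuration goes here
}
```

The "unknown property 'org'" error above is what you see when the class isn't on the build script's classpath, so Gradle treats `org.apache.lucene.gradle.JavaCC` as a property lookup instead of a type.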


> Port ant-regenerate tasks to Gradle build
> -
>
> Key: LUCENE-9134
> URL: https://issues.apache.org/jira/browse/LUCENE-9134
> Project: Lucene - Core
>  Issue Type: Sub-task
>Reporter: Erick Erickson
>Assignee: Erick Erickson
>Priority: Major
>
> Here are the "regenerate" targets I found in the ant version. There are a 
> couple that I don't have evidence for or against being rebuilt
>  // Very top level
> {code:java}
> ./build.xml: 
> ./build.xml:  failonerror="true">
> ./build.xml:  depends="regenerate,-check-after-regeneration"/>
>  {code}
> // top level Lucene. This includes the core/build.xml and 
> test-framework/build.xml files
> {code:java}
> ./lucene/build.xml: 
> ./lucene/build.xml:  inheritall="false">
> ./lucene/build.xml: 
>  {code}
> // This one has quite a number of customizations to
> {code:java}
> ./lucene/core/build.xml:  depends="createLevAutomata,createPackedIntSources,jflex"/>
>  {code}
> // This one has a bunch of code modifications _after_ javacc is run on 
> certain of the
>  // output files. Save this one for last?
> {code:java}
> ./lucene/queryparser/build.xml: 
>  {code}
> // the files under ../lucene/analysis... are pretty self contained. I expect 
> these could be done as a unit
> {code:java}
> ./lucene/analysis/build.xml: 
> ./lucene/analysis/build.xml: 
> ./lucene/analysis/common/build.xml:  depends="jflex,unicode-data"/>
> ./lucene/analysis/icu/build.xml:  depends="gen-utr30-data-files,gennorm2,genrbbi"/>
> ./lucene/analysis/kuromoji/build.xml:  depends="build-dict"/>
> ./lucene/analysis/nori/build.xml:  depends="build-dict"/>
> ./lucene/analysis/opennlp/build.xml:  depends="train-test-models"/>
>  {code}
>  
> // These _are_ regenerated from the top-level regenerate target, but for 
> // LUCENE-9080 the changes were only in imports, so there are no 
> // corresponding files checked in in that JIRA
> {code:java}
> ./lucene/expressions/build.xml:  depends="run-antlr"/>
>  {code}
> // Apparently unrelated to ./lucene/analysis/opennlp/build.xml 
> "train-test-models" target
> // Apparently not rebuilt from the top level, but _are_ regenerated when 
> executed from
> // ./solr/contrib/langid
> {code:java}
> ./solr/contrib/langid/build.xml:  depends="train-test-models"/>
>  {code}
>  






[jira] [Updated] (LUCENE-9136) Introduce IVFFlat for ANN similarity search

2020-01-14 Thread Xin-Chun Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-9136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xin-Chun Zhang updated LUCENE-9136:
---
Description: 
Representation learning (RL) has been an established discipline in the machine 
learning space for decades but it draws tremendous attention lately with the 
emergence of deep learning. The central problem of RL is to determine an 
optimal representation of the input data. By embedding the data into a high 
dimensional vector, the vector retrieval (VR) method is then applied to search 
the relevant items.

With the rapid development of RL over the past few years, the technique has 
been used extensively in industry from online advertising to computer vision 
and speech recognition. There exist many open source implementations of VR 
algorithms, such as Facebook's FAISS and Microsoft's SPTAG, providing various 
choices for potential users. However, the aforementioned implementations are 
all written in C++, and there is no plan to support a Java interface 
[[https://github.com/facebookresearch/faiss/issues/105]]. 

The algorithms for vector retrieval can be roughly classified into four 
categories:
 # Tree-based algorithms, such as KD-tree;
 # Hashing methods, such as LSH (Locality Sensitive Hashing);
 # Product quantization algorithms, such as IVFFlat;
 # Graph-based algorithms, such as HNSW, SSG, NSG;

IVFFlat and HNSW are the most popular ones among all the algorithms. Recently, 
implementation of ANN algorithms for Lucene, such as HNSW (Hierarchical 
Navigable Small World, LUCENE-9004), has made great progress. IVFFlat requires 
less memory and disk when compared with HNSW. I'm now working on the 
implementation of IVFFlat. And I will try my best to reuse the excellent work 
by LUCENE-9004.
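To make the scheme concrete, here is a toy sketch of IVFFlat: a coarse quantizer partitions vectors into cells (one inverted list per centroid), and a query probes only the nearest lists instead of scanning everything. The class and its names are illustrative only, not the proposed Lucene implementation:

```java
import java.util.*;

public class IVFFlat {
    final float[][] centroids;        // coarse quantizer centroids
    final List<List<float[]>> lists;  // one inverted list of vectors per centroid

    IVFFlat(float[][] centroids) {
        this.centroids = centroids;
        this.lists = new ArrayList<>();
        for (int i = 0; i < centroids.length; i++) lists.add(new ArrayList<>());
    }

    static float dist2(float[] a, float[] b) {
        float d = 0;
        for (int i = 0; i < a.length; i++) { float t = a[i] - b[i]; d += t * t; }
        return d;
    }

    int nearestCentroid(float[] v) {
        int best = 0;
        for (int i = 1; i < centroids.length; i++)
            if (dist2(v, centroids[i]) < dist2(v, centroids[best])) best = i;
        return best;
    }

    /** Indexing: assign each vector to its coarse cell. */
    void add(float[] v) {
        lists.get(nearestCentroid(v)).add(v);
    }

    /** Search only the nProbe closest inverted lists instead of all vectors. */
    float[] search(float[] q, int nProbe) {
        Integer[] order = new Integer[centroids.length];
        for (int i = 0; i < order.length; i++) order[i] = i;
        Arrays.sort(order, Comparator.comparingDouble(i -> dist2(q, centroids[i])));
        float[] best = null;
        for (int p = 0; p < Math.min(nProbe, order.length); p++)
            for (float[] v : lists.get(order[p]))
                if (best == null || dist2(q, v) < dist2(q, best)) best = v;
        return best;
    }
}
```

The memory advantage over HNSW comes from storing only flat lists and centroids, with no graph links; the trade-off is that a small nProbe can miss neighbors that fall in an unprobed cell.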

  was:
Representation learning (RL) has been an established discipline in the machine 
learning space for decades but it draws tremendous attention lately with the 
emergence of deep learning. The central problem of RL is to determine an 
optimal representation of the input data. By embedding the data into a high 
dimensional vector, the vector retrieval (VR) method is then applied to search 
the relevant items.

With the rapid development of RL over the past few years, the technique has 
been used extensively in industry from online advertising to computer vision 
and speech recognition. There exist many open source implementations of VR 
algorithms, such as Facebook's FAISS and Microsoft's SPTAG, providing various 
choices for potential users. However, the aforementioned implementations are 
all written in C++, and there is no plan to support a Java interface 
[[https://github.com/facebookresearch/faiss/issues/105]]. 

The algorithms for vector retrieval can be roughly classified into four 
categories:
 # Tree-based algorithms, such as KD-tree;
 # Hashing methods, such as LSH (Locality Sensitive Hashing);
 # Product quantization algorithms, such as IVFFlat;
 # Graph-based algorithms, such as HNSW, SSG, NSG;

IVFFlat and HNSW are the most popular ones among all the algorithms. Recently, 
implementation of ANN algorithms for Lucene, such as HNSW (Hierarchical 
Navigable Small World, LUCENE-9004), has made great progress. Compared with 
HNSW, IVFFlat requires less memory and disks. Introduce IVFFlat to Lucene will 
provide one more optional choice for interesting users.  I'm now working on the 
implementation of IVFFlat. And I will try my best to reuse the excellent work 
by LUCENE-9004.


> Introduce IVFFlat for ANN similarity search
> ---
>
> Key: LUCENE-9136
> URL: https://issues.apache.org/jira/browse/LUCENE-9136
> Project: Lucene - Core
>  Issue Type: New Feature
>Reporter: Xin-Chun Zhang
>Priority: Major
>
> Representation learning (RL) has been an established discipline in the 
> machine learning space for decades but it draws tremendous attention lately 
> with the emergence of deep learning. The central problem of RL is to 
> determine an optimal representation of the input data. By embedding the data 
> into a high dimensional vector, the vector retrieval (VR) method is then 
> applied to search the relevant items.
> With the rapid development of RL over the past few years, the technique has 
> been used extensively in industry from online advertising to computer vision 
> and speech recognition. There exist many open source implementations of VR 
> algorithms, such as Facebook's FAISS and Microsoft's SPTAG, providing various 
> choices for potential users. However, the aforementioned implementations are 
> all written in C++, and there is no plan to support a Java interface 
> [[https://github.com/facebookresearch/faiss/issues/105]]. 
> The algorithms for vector retrieval can be roughly classified into four 
> categories,
>  # Tree-based algorithms, such as KD-tree;
>  # Hashing methods, such as LSH (Local 

[jira] [Updated] (LUCENE-9136) Introduce IVFFlat for ANN similarity search

2020-01-14 Thread Xin-Chun Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-9136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xin-Chun Zhang updated LUCENE-9136:
---
Description: 
Representation learning (RL) has been an established discipline in the machine 
learning space for decades but it draws tremendous attention lately with the 
emergence of deep learning. The central problem of RL is to determine an 
optimal representation of the input data. By embedding the data into a high 
dimensional vector, the vector retrieval (VR) method is then applied to search 
the relevant items.

With the rapid development of RL over the past few years, the technique has 
been used extensively in industry from online advertising to computer vision 
and speech recognition. There exist many open source implementations of VR 
algorithms, such as Facebook's FAISS and Microsoft's SPTAG, providing various 
choices for potential users. However, the aforementioned implementations are 
all written in C++, and there is no plan to support a Java interface 
[[https://github.com/facebookresearch/faiss/issues/105]]. 

The algorithms for vector retrieval can be roughly classified into four 
categories,
 # Tree-based algorithms, such as KD-tree;
 # Hashing methods, such as LSH (Locality-Sensitive Hashing);
 # Product quantization algorithms, such as IVFFlat;
 # Graph-based algorithms, such as HNSW, SSG, NSG;

IVFFlat and HNSW are the most popular among these algorithms. Recently, the 
implementation of ANN algorithms for Lucene, such as HNSW (Hierarchical 
Navigable Small World, LUCENE-9004), has made great progress. Compared with 
HNSW, IVFFlat requires less memory and disk space. Introducing IVFFlat into 
Lucene will provide one more option for interested users. I'm now working on 
the implementation of IVFFlat, and I will try my best to reuse the excellent 
work of LUCENE-9004.

  was:
Representation learning (RL) has been an established discipline in the machine 
learning space for decades but it draws tremendous attention lately with the 
emergence of deep learning. The central problem of RL is to determine an 
optimal representation of the input data. By embedding the data into a high 
dimensional vector, the vector retrieval (VR) method is then applied to search 
the relevant items.

With the rapid development of RL over the past few years, the technique has 
been used extensively in industry from online advertising to computer vision 
and speech recognition. There exist many open source implementations of VR 
algorithms, such as Facebook's FAISS and Microsoft's SPTAG, providing various 
choices for potential users. However, the aforementioned implementations are 
all written in C++, and there is no plan to support a Java interface 
[[https://github.com/facebookresearch/faiss/issues/105]]. 

The algorithms for vector retrieval can be roughly classified into four 
categories,
 # Tree-based algorithms, such as KD-tree;
 # Hashing methods, such as LSH (Locality-Sensitive Hashing);
 # Product quantization algorithms, such as IVFFlat;
 # Graph-based algorithms, such as HNSW, SSG, NSG;

IVFFlat and HNSW are the most popular among these algorithms. Recently, the 
implementation of ANN algorithms for Lucene, such as HNSW (Hierarchical 
Navigable Small World, LUCENE-9004), has made great progress. Compared with 
HNSW, IVFFlat requires less memory and disk space. Introducing IVFFlat into 
Lucene will provide one more option for interested users. I'm now working on 
the implementation of IVFFlat, and I will try my best to reuse the excellent 
work of LUCENE-9004.


> Introduce IVFFlat for ANN similarity search
> ---
>
> Key: LUCENE-9136
> URL: https://issues.apache.org/jira/browse/LUCENE-9136
> Project: Lucene - Core
>  Issue Type: New Feature
>Reporter: Xin-Chun Zhang
>Priority: Major
>
> Representation learning (RL) has been an established discipline in the 
> machine learning space for decades but it draws tremendous attention lately 
> with the emergence of deep learning. The central problem of RL is to 
> determine an optimal representation of the input data. By embedding the data 
> into a high dimensional vector, the vector retrieval (VR) method is then 
> applied to search the relevant items.
> With the rapid development of RL over the past few years, the technique has 
> been used extensively in industry from online advertising to computer vision 
> and speech recognition. There exist many open source implementations of VR 
> algorithms, such as Facebook's FAISS and Microsoft's SPTAG, providing various 
> choices for potential users. However, the aforementioned implementations are 
> all written in C++, and there is no plan to support a Java interface 
> [[https://github.com/facebookresearch/faiss/issues/105]]. 
> The algorithms for vector retrieval can be roughly classified into four 
> categories

[jira] [Updated] (LUCENE-9136) Introduce IVFFlat for ANN similarity search

2020-01-14 Thread Xin-Chun Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-9136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xin-Chun Zhang updated LUCENE-9136:
---
Description: 
Representation learning (RL) has been an established discipline in the machine 
learning space for decades but it draws tremendous attention lately with the 
emergence of deep learning. The central problem of RL is to determine an 
optimal representation of the input data. By embedding the data into a high 
dimensional vector, the vector retrieval (VR) method is then applied to search 
the relevant items.

With the rapid development of RL over the past few years, the technique has 
been used extensively in industry from online advertising to computer vision 
and speech recognition. There exist many open source implementations of VR 
algorithms, such as Facebook's FAISS and Microsoft's SPTAG, providing various 
choices for potential users. However, the aforementioned implementations are 
all written in C++, and there is no plan to support a Java interface 
[[https://github.com/facebookresearch/faiss/issues/105]]. 

The algorithms for vector retrieval can be roughly classified into four 
categories,
 # Tree-based algorithms, such as KD-tree;
 # Hashing methods, such as LSH (Locality-Sensitive Hashing);
 # Product quantization algorithms, such as IVFFlat;
 # Graph-based algorithms, such as HNSW, SSG, NSG;

IVFFlat and HNSW are the most popular among these algorithms. Recently, the 
implementation of ANN algorithms for Lucene, such as HNSW (Hierarchical 
Navigable Small World) [Approximate nearest vector 
search|https://issues.apache.org/jira/browse/LUCENE-9004], has made great 
progress. Compared with HNSW, IVFFlat requires less memory and disk space. 
Introducing IVFFlat into Lucene will provide one more option for interested 
users. I'm now working on the implementation of IVFFlat, and I will try my 
best to reuse the excellent work of LUCENE-9004.

  was:
Representation learning (RL) has been an established discipline in the machine 
learning space for decades but it draws tremendous attention lately with the 
emergence of deep learning. The central problem of RL is to determine an 
optimal representation of the input data. By embedding the data into a high 
dimensional vector, the vector retrieval (VR) method is then applied to search 
the relevant items.

With the rapid development of RL over the past few years, the technique has 
been used extensively in industry from online advertising to computer vision 
and speech recognition. There exist many open source implementations of VR 
algorithms, such as Facebook's FAISS and Microsoft's SPTAG, providing various 
choices for potential users. However, the aforementioned implementations are 
all written in C++, and there is no plan to support a Java interface 
[faiss|https://github.com/facebookresearch/faiss/issues/105]. 

The algorithms for vector retrieval can be roughly classified into four 
categories,
 # Tree-based algorithms, such as KD-tree;
 # Hashing methods, such as LSH (Locality-Sensitive Hashing);
 # Product quantization algorithms, such as IVFFlat;
 # Graph-based algorithms, such as HNSW, SSG, NSG;

IVFFlat and HNSW are the most popular among these algorithms. Recently, the 
implementation of ANN algorithms for Lucene, such as HNSW (Hierarchical 
Navigable Small World) [Approximate nearest vector 
search|https://issues.apache.org/jira/browse/LUCENE-9004], has made great 
progress. Compared with HNSW, IVFFlat requires less memory and disk space. 
Introducing IVFFlat into Lucene will provide one more option for interested 
users. I'm now working on the implementation of IVFFlat, and I will try my 
best to reuse the excellent work of LUCENE-9004.


> Introduce IVFFlat for ANN similarity search
> ---
>
> Key: LUCENE-9136
> URL: https://issues.apache.org/jira/browse/LUCENE-9136
> Project: Lucene - Core
>  Issue Type: New Feature
>Reporter: Xin-Chun Zhang
>Priority: Major
>
> Representation learning (RL) has been an established discipline in the 
> machine learning space for decades but it draws tremendous attention lately 
> with the emergence of deep learning. The central problem of RL is to 
> determine an optimal representation of the input data. By embedding the data 
> into a high dimensional vector, the vector retrieval (VR) method is then 
> applied to search the relevant items.
> With the rapid development of RL over the past few years, the technique has 
> been used extensively in industry from online advertising to computer vision 
> and speech recognition. There exist many open source implementations of VR 
> algorithms, such as Facebook's FAISS and Microsoft's SPTAG, providing various 
> choices for potential users. However, the aforementioned implementations are 
> all written in C++, and there is no plan to support a Java interface 
> [[https://github.com/facebookresearch/faiss/issues/105]]. 
> The algorithms for vector retrieval

[jira] [Updated] (LUCENE-9136) Introduce IVFFlat for ANN similarity search

2020-01-14 Thread Xin-Chun Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-9136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xin-Chun Zhang updated LUCENE-9136:
---
Description: 
Representation learning (RL) has been an established discipline in the machine 
learning space for decades but it draws tremendous attention lately with the 
emergence of deep learning. The central problem of RL is to determine an 
optimal representation of the input data. By embedding the data into a high 
dimensional vector, the vector retrieval (VR) method is then applied to search 
the relevant items.

With the rapid development of RL over the past few years, the technique has 
been used extensively in industry from online advertising to computer vision 
and speech recognition. There exist many open source implementations of VR 
algorithms, such as Facebook's FAISS and Microsoft's SPTAG, providing various 
choices for potential users. However, the aforementioned implementations are 
all written in C++, and there is no plan to support a Java interface 
[[https://github.com/facebookresearch/faiss/issues/105]]. 

The algorithms for vector retrieval can be roughly classified into four 
categories,
 # Tree-based algorithms, such as KD-tree;
 # Hashing methods, such as LSH (Locality-Sensitive Hashing);
 # Product quantization algorithms, such as IVFFlat;
 # Graph-based algorithms, such as HNSW, SSG, NSG;

IVFFlat and HNSW are the most popular among these algorithms. Recently, the 
implementation of ANN algorithms for Lucene, such as HNSW (Hierarchical 
Navigable Small World, LUCENE-9004), has made great progress. Compared with 
HNSW, IVFFlat requires less memory and disk space. Introducing IVFFlat into 
Lucene will provide one more option for interested users. I'm now working on 
the implementation of IVFFlat, and I will try my best to reuse the excellent 
work of LUCENE-9004.

  was:
Representation learning (RL) has been an established discipline in the machine 
learning space for decades but it draws tremendous attention lately with the 
emergence of deep learning. The central problem of RL is to determine an 
optimal representation of the input data. By embedding the data into a high 
dimensional vector, the vector retrieval (VR) method is then applied to search 
the relevant items.

With the rapid development of RL over the past few years, the technique has 
been used extensively in industry from online advertising to computer vision 
and speech recognition. There exist many open source implementations of VR 
algorithms, such as Facebook's FAISS and Microsoft's SPTAG, providing various 
choices for potential users. However, the aforementioned implementations are 
all written in C++, and there is no plan to support a Java interface 
[[https://github.com/facebookresearch/faiss/issues/105]]. 

The algorithms for vector retrieval can be roughly classified into four 
categories,
 # Tree-based algorithms, such as KD-tree;
 # Hashing methods, such as LSH (Locality-Sensitive Hashing);
 # Product quantization algorithms, such as IVFFlat;
 # Graph-based algorithms, such as HNSW, SSG, NSG;

IVFFlat and HNSW are the most popular among these algorithms. Recently, the 
implementation of ANN algorithms for Lucene, such as HNSW (Hierarchical 
Navigable Small World) [Approximate nearest vector 
search|https://issues.apache.org/jira/browse/LUCENE-9004], has made great 
progress. Compared with HNSW, IVFFlat requires less memory and disk space. 
Introducing IVFFlat into Lucene will provide one more option for interested 
users. I'm now working on the implementation of IVFFlat, and I will try my 
best to reuse the excellent work of LUCENE-9004.


> Introduce IVFFlat for ANN similarity search
> ---
>
> Key: LUCENE-9136
> URL: https://issues.apache.org/jira/browse/LUCENE-9136
> Project: Lucene - Core
>  Issue Type: New Feature
>Reporter: Xin-Chun Zhang
>Priority: Major
>
> Representation learning (RL) has been an established discipline in the 
> machine learning space for decades but it draws tremendous attention lately 
> with the emergence of deep learning. The central problem of RL is to 
> determine an optimal representation of the input data. By embedding the data 
> into a high dimensional vector, the vector retrieval (VR) method is then 
> applied to search the relevant items.
> With the rapid development of RL over the past few years, the technique has 
> been used extensively in industry from online advertising to computer vision 
> and speech recognition. There exist many open source implementations of VR 
> algorithms, such as Facebook's FAISS and Microsoft's SPTAG, providing various 
> choices for potential users. However, the aforementioned implementations are 
> all written in C++, and there is no plan to support a Java interface 
> [[https://github.com/facebookresearch/faiss/issues/105]]. 
> The algorithms for vector retrieval can be roughly classified i

[jira] [Updated] (LUCENE-9136) Introduce IVFFlat for ANN similarity search

2020-01-14 Thread Xin-Chun Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-9136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xin-Chun Zhang updated LUCENE-9136:
---
Description: 
Representation learning (RL) has been an established discipline in the machine 
learning space for decades but it draws tremendous attention lately with the 
emergence of deep learning. The central problem of RL is to determine an 
optimal representation of the input data. By embedding the data into a high 
dimensional vector, the vector retrieval (VR) method is then applied to search 
the relevant items.

With the rapid development of RL over the past few years, the technique has 
been used extensively in industry from online advertising to computer vision 
and speech recognition. There exist many open source implementations of VR 
algorithms, such as Facebook's FAISS and Microsoft's SPTAG, providing various 
choices for potential users. However, the aforementioned implementations are 
all written in C++, and there is no plan to support a Java interface 
[faiss|https://github.com/facebookresearch/faiss/issues/105]. 

The algorithms for vector retrieval can be roughly classified into four 
categories,
 # Tree-based algorithms, such as KD-tree;
 # Hashing methods, such as LSH (Locality-Sensitive Hashing);
 # Product quantization algorithms, such as IVFFlat;
 # Graph-based algorithms, such as HNSW, SSG, NSG;

IVFFlat and HNSW are the most popular among these algorithms. Recently, the 
implementation of ANN algorithms for Lucene, such as HNSW (Hierarchical 
Navigable Small World) [Approximate nearest vector 
search|https://issues.apache.org/jira/browse/LUCENE-9004], has made great 
progress. Compared with HNSW, IVFFlat requires less memory and disk space. 
Introducing IVFFlat into Lucene will provide one more option for interested 
users. I'm now working on the implementation of IVFFlat, and I will try my 
best to reuse the excellent work of LUCENE-9004.

  was:
Representation learning (RL) has been an established discipline in the machine 
learning space for decades but it draws tremendous attention lately with the 
emergence of deep learning. The central problem of RL is to determine an 
optimal representation of the input data. By embedding the data into a high 
dimensional vector, the vector retrieval (VR) method is then applied to search 
the relevant items.

With the rapid development of RL over the past few years, the technique has 
been used extensively in industry from online advertising to computer vision 
and speech recognition. There exist many open source implementations of VR 
algorithms, such as Facebook's FAISS and Microsoft's SPTAG, providing various 
choices for potential users. However, the aforementioned implementations are 
all written in C++, and there is no plan to support a Java interface 
[faiss|https://github.com/facebookresearch/faiss/issues/105]. 

The algorithms for vector retrieval can be roughly classified into four 
categories,
 # Tree-based algorithms, such as KD-tree;
 # Hashing methods, such as LSH (Locality-Sensitive Hashing);
 # Product quantization algorithms, such as IVFFlat;
 # Graph-based algorithms, such as HNSW, SSG, NSG;

IVFFlat and HNSW are the most popular among these algorithms. Recently, the 
implementation of ANN algorithms for Lucene, such as HNSW (Hierarchical 
Navigable Small World) [Approximate nearest vector 
search|https://issues.apache.org/jira/browse/LUCENE-9004], has made great 
progress. I'm now working on the implementation of IVFFlat, and I will try my 
best to reuse the excellent work of LUCENE-9004.


> Introduce IVFFlat for ANN similarity search
> ---
>
> Key: LUCENE-9136
> URL: https://issues.apache.org/jira/browse/LUCENE-9136
> Project: Lucene - Core
>  Issue Type: New Feature
>Reporter: Xin-Chun Zhang
>Priority: Major
>
> Representation learning (RL) has been an established discipline in the 
> machine learning space for decades but it draws tremendous attention lately 
> with the emergence of deep learning. The central problem of RL is to 
> determine an optimal representation of the input data. By embedding the data 
> into a high dimensional vector, the vector retrieval (VR) method is then 
> applied to search the relevant items.
> With the rapid development of RL over the past few years, the technique has 
> been used extensively in industry from online advertising to computer vision 
> and speech recognition. There exist many open source implementations of VR 
> algorithms, such as Facebook's FAISS and Microsoft's SPTAG, providing various 
> choices for potential users. However, the aforementioned implementations are 
> all written in C++, and there is no plan to support a Java interface 
> [faiss|https://github.com/facebookresearch/faiss/issues/105]. 
> The algorithms for vector retrieval can be roughly classified into four 
> categories,
>  # Tree-based algorithms, such 

[jira] [Updated] (LUCENE-9136) Introduce IVFFlat for ANN similarity search

2020-01-14 Thread Xin-Chun Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-9136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xin-Chun Zhang updated LUCENE-9136:
---
Description: 
Representation learning (RL) has been an established discipline in the machine 
learning space for decades but it draws tremendous attention lately with the 
emergence of deep learning. The central problem of RL is to determine an 
optimal representation of the input data. By embedding the data into a high 
dimensional vector, the vector retrieval (VR) method is then applied to search 
the relevant items.

With the rapid development of RL over the past few years, the technique has 
been used extensively in industry from online advertising to computer vision 
and speech recognition. There exist many open source implementations of VR 
algorithms, such as Facebook's FAISS and Microsoft's SPTAG, providing various 
choices for potential users. However, the aforementioned implementations are 
all written in C++, and there is no plan to support a Java interface 
[faiss|https://github.com/facebookresearch/faiss/issues/105]. 

The algorithms for vector retrieval can be roughly classified into four 
categories,
 # Tree-based algorithms, such as KD-tree;
 # Hashing methods, such as LSH (Locality-Sensitive Hashing);
 # Product quantization algorithms, such as IVFFlat;
 # Graph-based algorithms, such as HNSW, SSG, NSG;

IVFFlat and HNSW are the most popular among these algorithms. Recently, the 
implementation of ANN algorithms for Lucene, such as HNSW (Hierarchical 
Navigable Small World) [Approximate nearest vector 
search|https://issues.apache.org/jira/browse/LUCENE-9004], has made great 
progress. I'm now working on the implementation of IVFFlat, and I will try my 
best to reuse the excellent work of LUCENE-9004.

> Introduce IVFFlat for ANN similarity search
> ---
>
> Key: LUCENE-9136
> URL: https://issues.apache.org/jira/browse/LUCENE-9136
> Project: Lucene - Core
>  Issue Type: New Feature
>Reporter: Xin-Chun Zhang
>Priority: Major
>
> Representation learning (RL) has been an established discipline in the 
> machine learning space for decades but it draws tremendous attention lately 
> with the emergence of deep learning. The central problem of RL is to 
> determine an optimal representation of the input data. By embedding the data 
> into a high dimensional vector, the vector retrieval (VR) method is then 
> applied to search the relevant items.
> With the rapid development of RL over the past few years, the technique has 
> been used extensively in industry from online advertising to computer vision 
> and speech recognition. There exist many open source implementations of VR 
> algorithms, such as Facebook's FAISS and Microsoft's SPTAG, providing various 
> choices for potential users. However, the aforementioned implementations are 
> all written in C++, and there is no plan to support a Java interface 
> [faiss|https://github.com/facebookresearch/faiss/issues/105]. 
> The algorithms for vector retrieval can be roughly classified into four 
> categories,
>  # Tree-based algorithms, such as KD-tree;
>  # Hashing methods, such as LSH (Locality-Sensitive Hashing);
>  # Product quantization algorithms, such as IVFFlat;
>  # Graph-based algorithms, such as HNSW, SSG, NSG;
> IVFFlat and HNSW are the most popular among these algorithms. Recently, the 
> implementation of ANN algorithms for Lucene, such as HNSW (Hierarchical 
> Navigable Small World) [Approximate nearest vector 
> search|https://issues.apache.org/jira/browse/LUCENE-9004], has made great 
> progress. I'm now working on the implementation of IVFFlat, and I will try my 
> best to reuse the excellent work of LUCENE-9004.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Updated] (LUCENE-9136) Introduce IVFFlat for ANN similarity search

2020-01-14 Thread Xin-Chun Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-9136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xin-Chun Zhang updated LUCENE-9136:
---
Summary: Introduce IVFFlat for ANN similarity search  (was: Add delete 
action for HNSW and fix merger when segments contain deleted vectors)

> Introduce IVFFlat for ANN similarity search
> ---
>
> Key: LUCENE-9136
> URL: https://issues.apache.org/jira/browse/LUCENE-9136
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Xin-Chun Zhang
>Priority: Major
>
> This issue is 






[jira] [Updated] (LUCENE-9136) Introduce IVFFlat for ANN similarity search

2020-01-14 Thread Xin-Chun Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-9136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xin-Chun Zhang updated LUCENE-9136:
---
Description: (was: This issue is )
 Issue Type: New Feature  (was: Bug)

> Introduce IVFFlat for ANN similarity search
> ---
>
> Key: LUCENE-9136
> URL: https://issues.apache.org/jira/browse/LUCENE-9136
> Project: Lucene - Core
>  Issue Type: New Feature
>Reporter: Xin-Chun Zhang
>Priority: Major
>







[GitHub] [lucene-solr] madrob commented on issue #1042: LUCENE-9068: Build FuzzyQuery automata up-front

2020-01-14 Thread GitBox
madrob commented on issue #1042: LUCENE-9068: Build FuzzyQuery automata up-front
URL: https://github.com/apache/lucene-solr/pull/1042#issuecomment-574415386
 
 
   LGTM after precommit comes back happy.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services




[GitHub] [lucene-solr] madrob commented on issue #1157: Add RAT check using Gradle

2020-01-14 Thread GitBox
madrob commented on issue #1157: Add RAT check using Gradle
URL: https://github.com/apache/lucene-solr/pull/1157#issuecomment-574409798
 
 
   One final round of updates:
   * I moved the java plugin check to inside of the task and verified that the 
other projects would actually get called/fail with bad headers.
   * Added support for incremental builds, which should reduce the build-time 
impact on most developers
   * Added in the missing directories that didn't have their own projects. And 
a landmine if those do end up becoming projects so that somebody pays attention 
and fixes it (probably one of us)
   
   I think that's everything in this PR, and we're ready to merge? 
   
   We can go and switch from ant task to rat classes directly in a later change 
if we decide that's worthwhile.
   
   > To be honest I would prefer not to add those license headers to build 
files, unless it's a requirement of apache legal -- this would have to be 
verified. For one thing, I don't perceive build files as something particularly 
valuable as an intellectual property (although it surely does require 
intellectual input to write them). But even if then the distribution bundle 
comes with a top-level license file that covers them?
   
   I guess we'll hear from LEGAL about this one way or the other. Not going to 
bother adding them for this PR until somebody does the checks.
   
   > For now I'd rather keep it in this "aspect-oriented" form if you don't 
mind (but this is a subjective decision, not any better or common practice).
   
   That's fine. I just noticed this and was curious since we were moving from 
the project oriented approach with ant to this approach and wanted to make sure 
it was a conscious decision.





[jira] [Created] (SOLR-14187) SolrJ async admin request helpers don't work with per-request based authentication

2020-01-14 Thread Chris M. Hostetter (Jira)
Chris M. Hostetter created SOLR-14187:
-

 Summary: SolrJ async admin request helpers don't work with 
per-request based authentication
 Key: SOLR-14187
 URL: https://issues.apache.org/jira/browse/SOLR-14187
 Project: Solr
  Issue Type: Bug
  Security Level: Public (Default Security Level. Issues are Public)
  Components: security, SolrJ
Reporter: Chris M. Hostetter


Discovered this set of related bugs while trying to write a test using 
BasicAuth...
 * {{AsyncCollectionAdminRequest.processAndWait(...)}} doesn't copy any 
authentication settings (example: if the user called: 
{{setBasicAuthCredentials(...)}} ) from the original 
{{AsyncCollectionAdminRequest}} when creating the underlying {{RequestStatus}} 
instance
 * Even if clients create their own {{RequestStatus}} instance and set 
credentials on it before calling {{waitFor(...)}}, _it_ doesn't copy those 
credentials when trying to call {{deleteAsyncId(...)}} on {{COMPLETED}} or 
{{FAILED}} results






[jira] [Commented] (LUCENE-9125) Improve Automaton.step() with binary search and introduce Automaton.next()

2020-01-14 Thread David Smiley (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17015341#comment-17015341
 ] 

David Smiley commented on LUCENE-9125:
--

[~broustant] could you please post lucene-util benchmark results here?

I was looking at the Lucene nightly benchmarks for fuzzy queries and I see a 
sudden drop, e.g.: https://home.apache.org/~mikemccand/lucenebench/Fuzzy2.html
But many other queries suddenly dropped like even 
https://home.apache.org/~mikemccand/lucenebench/TermDTSort.html that have 
nothing to do with automata.  CC [~mikemccand]

> Improve Automaton.step() with binary search and introduce Automaton.next()
> --
>
> Key: LUCENE-9125
> URL: https://issues.apache.org/jira/browse/LUCENE-9125
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Bruno Roustant
>Assignee: Bruno Roustant
>Priority: Major
> Fix For: 8.5
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Implement the existing todo in Automaton.step() (lookup a transition from a 
> source state depending on a given label) to use binary search since the 
> transitions are sorted.
> Introduce new method Automaton.next() to optimize iteration & lookup over all 
> the transitions of a state. This will be used in RunAutomaton constructor and 
> in MinimizationOperations.minimize().






[GitHub] [lucene-solr] madrob commented on a change in pull request #1157: Add RAT check using Gradle

2020-01-14 Thread GitBox
madrob commented on a change in pull request #1157: Add RAT check using Gradle
URL: https://github.com/apache/lucene-solr/pull/1157#discussion_r366524386
 
 

 ##
 File path: gradle/validation/rat-sources.gradle
 ##
 @@ -0,0 +1,174 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+import groovy.xml.NamespaceBuilder
+
+allprojects {
+configurations {
+rat
+}
+
+dependencies {
+rat 'org.apache.rat:apache-rat-tasks:0.13'
+}
+}
+
+subprojects {
+plugins.withId("java", {
 
 Review comment:
   We _could_ explicitly code the `src/java` and `src/test` paths here, but we 
are already doing this in `gradle/ant-compat/folder-layout.gradle` and I'd 
like to stick to doing it in one place. As for projects that have files but 
not the java plugin... do we have those? I'll need to figure out what happens 
if there is a failing file directly in the lucene/ directory...





[GitHub] [lucene-solr] nknize commented on a change in pull request #762: LUCENE-8903: Add LatLonShape point query

2020-01-14 Thread GitBox
nknize commented on a change in pull request #762: LUCENE-8903: Add LatLonShape 
point query
URL: https://github.com/apache/lucene-solr/pull/762#discussion_r301264563
 
 

 ##
 File path: 
lucene/sandbox/src/java/org/apache/lucene/document/LatLonShapePointQuery.java
 ##
 @@ -0,0 +1,144 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.document;
+
+import java.util.Arrays;
+
+import org.apache.lucene.geo.GeoEncodingUtils;
+import org.apache.lucene.index.PointValues;
+import org.apache.lucene.index.PointValues.Relation;
+import org.apache.lucene.util.NumericUtils;
+
+import static java.lang.Integer.BYTES;
+import static org.apache.lucene.geo.GeoUtils.orient;
+
+/**
+ * Finds all previously indexed shapes that intersect the specified bounding 
box.
+ *
+ * The field must be indexed using
+ * {@link LatLonShape#createIndexableFields} added per document.
+ *
+ *  @lucene.experimental
+ **/
+final class LatLonShapePointQuery extends LatLonShapeQuery {
+  final double lat;
+  final double lon;
+  final int latEnc;
+  final int lonEnc;
+  final byte[] point;
+
+  public LatLonShapePointQuery(String field, LatLonShape.QueryRelation 
queryRelation, double lat, double lon) {
+super(field, queryRelation);
+this.lat = lat;
+this.lon = lon;
+this.point = new byte[2 * LatLonShape.BYTES];
+this.lonEnc = GeoEncodingUtils.encodeLongitude(lon);
+this.latEnc = GeoEncodingUtils.encodeLatitude(lat);
+NumericUtils.intToSortableBytes(latEnc, this.point, 0);
+NumericUtils.intToSortableBytes(lonEnc, this.point, LatLonShape.BYTES);
+  }
+
+  @Override
+  protected Relation relateRangeBBoxToQuery(int minXOffset, int minYOffset, 
byte[] minTriangle,
+int maxXOffset, int maxYOffset, 
byte[] maxTriangle) {
+if (Arrays.compareUnsigned(minTriangle, minXOffset, minXOffset + BYTES, 
point,  BYTES, 2 * BYTES) > 0 ||
+Arrays.compareUnsigned(maxTriangle, maxXOffset, maxXOffset + BYTES, 
point, BYTES, 2 * BYTES) < 0 ||
+Arrays.compareUnsigned(minTriangle, minYOffset, minYOffset + BYTES, 
point, 0, BYTES) > 0 ||
+Arrays.compareUnsigned(maxTriangle, maxYOffset, maxYOffset + BYTES, 
point, 0, BYTES) < 0) {
+  return PointValues.Relation.CELL_OUTSIDE_QUERY;
+}
+return PointValues.Relation.CELL_CROSSES_QUERY;
+  }
+
+  /** returns true if the query matches the encoded triangle */
+  @Override
+  protected boolean queryMatches(byte[] t, int[] scratchTriangle, 
LatLonShape.QueryRelation queryRelation) {
+
+// decode indexed triangle
+LatLonShape.decodeTriangle(t, scratchTriangle);
+
+int aY = scratchTriangle[0];
+int aX = scratchTriangle[1];
+int bY = scratchTriangle[2];
+int bX = scratchTriangle[3];
+int cY = scratchTriangle[4];
+int cX = scratchTriangle[5];
+
+if (queryRelation == LatLonShape.QueryRelation.WITHIN) {
+   if (aY == bY && cY == aY && aX == bX && cX == aX) {
+ return lonEnc == aX && latEnc == aY;
+   }
+  return false;
+}
+return pointInTriangle(lonEnc, latEnc, aX, aY, bX, bY, cX, cY);
+  }
+
+  //This should be moved when LatLonShape is moved from sandbox!
+  /**
+   * Compute whether the given x, y point is in a triangle; uses the winding 
order method */
+  private static boolean pointInTriangle (double x, double y, double ax, 
double ay, double bx, double by, double cx, double cy) {
 
 Review comment:
   duplicate of? 
https://github.com/apache/lucene-solr/blob/ac209b637d68c84ce1402b6b8967514ce9cf6854/lucene/sandbox/src/java/org/apache/lucene/geo/Tessellator.java#L795
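For readers following along, here is a small self-contained sketch of the winding-order point-in-triangle test under discussion. It illustrates the technique only; the sandbox code being reviewed works on encoded integer coordinates and shared `orient` helpers.

```java
// Orientation-based point-in-triangle test: the point is inside (or on the
// boundary of) the triangle iff it lies on the same side of all three edges,
// regardless of the triangle's winding direction.
public class PointInTriangle {
  // Sign of the cross product (b-a) x (p-a): >0 left of ab, <0 right, 0 collinear.
  static double orient(double ax, double ay, double bx, double by, double px, double py) {
    return (bx - ax) * (py - ay) - (by - ay) * (px - ax);
  }

  public static boolean contains(double px, double py,
                                 double ax, double ay, double bx, double by,
                                 double cx, double cy) {
    double d1 = orient(ax, ay, bx, by, px, py);
    double d2 = orient(bx, by, cx, cy, px, py);
    double d3 = orient(cx, cy, ax, ay, px, py);
    boolean hasNeg = d1 < 0 || d2 < 0 || d3 < 0;
    boolean hasPos = d1 > 0 || d2 > 0 || d3 > 0;
    return !(hasNeg && hasPos); // all on one side (either winding) => inside
  }
}
```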





[GitHub] [lucene-solr] irvingzhang opened a new pull request #1169: LUCENE-9004: A minor feature and patch -- support deleting vector values and fix segments merging

2020-01-14 Thread GitBox
irvingzhang opened a new pull request #1169: LUCENE-9004: A minor feature and 
patch -- support deleting vector values and fix segments merging
URL: https://github.com/apache/lucene-solr/pull/1169
 
 
   I think this commit belongs to this issue 
(https://issues.apache.org/jira/browse/LUCENE-9004); I'm not sure whether I 
need to create a new issue. My specific considerations are:
   1. A minor feature: for ANN search it is dangerous to delete vectors based 
on similarity search results in HNSW. The selected docs are neither sorted nor 
deduplicated, the number of deleted vectors grows with the segment count and 
the parameter ef, and which vectors get deleted is uncertain. Hence, I created 
a new Query type (KnnDelQuery) and Weight (KnnDelScoreWeight) dedicated to 
deleting exactly the values that match the query vector.
   2. A minor patch: fixes the merge process when some segments contain 
deleted documents that must be filtered out.
   
   The modified code is covered by the test cases.





[jira] [Created] (LUCENE-9137) Broken link 'Change log' for 8.4.1 on https://lucene.apache.org/core/downloads.html

2020-01-14 Thread Sebb (Jira)
Sebb created LUCENE-9137:


 Summary: Broken link 'Change log' for 8.4.1 on 
https://lucene.apache.org/core/downloads.html
 Key: LUCENE-9137
 URL: https://issues.apache.org/jira/browse/LUCENE-9137
 Project: Lucene - Core
  Issue Type: Bug
 Environment: Broken link 'Change log' for 8.4.1 on 
https://lucene.apache.org/core/downloads.html
Reporter: Sebb









[jira] [Updated] (LUCENE-9136) Add delete action for HNSW and fix merger when segments contain deleted vectors

2020-01-14 Thread Xin-Chun Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-9136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xin-Chun Zhang updated LUCENE-9136:
---
Description: This issue is 

> Add delete action for HNSW and fix merger when segments contain deleted 
> vectors
> ---
>
> Key: LUCENE-9136
> URL: https://issues.apache.org/jira/browse/LUCENE-9136
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Xin-Chun Zhang
>Priority: Major
>
> This issue is 






[jira] [Created] (LUCENE-9136) Add delete action for HNSW and fix merger when segments contain deleted vectors

2020-01-14 Thread Xin-Chun Zhang (Jira)
Xin-Chun Zhang created LUCENE-9136:
--

 Summary: Add delete action for HNSW and fix merger when segments 
contain deleted vectors
 Key: LUCENE-9136
 URL: https://issues.apache.org/jira/browse/LUCENE-9136
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Xin-Chun Zhang









[jira] [Assigned] (LUCENE-9134) Port ant-regenerate tasks to Gradle build

2020-01-14 Thread Erick Erickson (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-9134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erick Erickson reassigned LUCENE-9134:
--

Assignee: Erick Erickson

> Port ant-regenerate tasks to Gradle build
> -
>
> Key: LUCENE-9134
> URL: https://issues.apache.org/jira/browse/LUCENE-9134
> Project: Lucene - Core
>  Issue Type: Sub-task
>Reporter: Erick Erickson
>Assignee: Erick Erickson
>Priority: Major
>
> Here are the "regenerate" targets I found in the ant version. There are a 
> couple that I don't have evidence for or against being rebuilt
>  // Very top level
> {code:java}
> ./build.xml: 
> ./build.xml:  failonerror="true">
> ./build.xml:  depends="regenerate,-check-after-regeneration"/>
>  {code}
> // top level Lucene. This includes the core/build.xml and 
> test-framework/build.xml files
> {code:java}
> ./lucene/build.xml: 
> ./lucene/build.xml:  inheritall="false">
> ./lucene/build.xml: 
>  {code}
> // This one has quite a number of customizations to
> {code:java}
> ./lucene/core/build.xml:  depends="createLevAutomata,createPackedIntSources,jflex"/>
>  {code}
> // This one has a bunch of code modifications _after_ javacc is run on 
> certain of the
>  // output files. Save this one for last?
> {code:java}
> ./lucene/queryparser/build.xml: 
>  {code}
> // the files under ../lucene/analysis... are pretty self contained. I expect 
> these could be done as a unit
> {code:java}
> ./lucene/analysis/build.xml: 
> ./lucene/analysis/build.xml: 
> ./lucene/analysis/common/build.xml:  depends="jflex,unicode-data"/>
> ./lucene/analysis/icu/build.xml:  depends="gen-utr30-data-files,gennorm2,genrbbi"/>
> ./lucene/analysis/kuromoji/build.xml:  depends="build-dict"/>
> ./lucene/analysis/nori/build.xml:  depends="build-dict"/>
> ./lucene/analysis/opennlp/build.xml:  depends="train-test-models"/>
>  {code}
>  
> // These _are_ regenerated from the top-level regenerate target, but for --
> LUCENE-9080//the changes were only in imports so there are no
> //corresponding files checked in in that JIRA
> {code:java}
> ./lucene/expressions/build.xml:  depends="run-antlr"/>
>  {code}
> // Apparently unrelated to ./lucene/analysis/opennlp/build.xml 
> "train-test-models" target
> // Apparently not rebuilt from the top level, but _are_ regenerated when 
> executed from
> // ./solr/contrib/langid
> {code:java}
> ./solr/contrib/langid/build.xml:  depends="train-test-models"/>
>  {code}
>  






[GitHub] [lucene-solr] bruno-roustant opened a new pull request #1168: LUCENE-9135: Make uniformsplit.FieldMetadata counters long.

2020-01-14 Thread GitBox
bruno-roustant opened a new pull request #1168: LUCENE-9135: Make 
uniformsplit.FieldMetadata counters long.
URL: https://github.com/apache/lucene-solr/pull/1168
 
 
   





[jira] [Created] (LUCENE-9135) UniformSplit FieldMetadata counters should all be long

2020-01-14 Thread Bruno Roustant (Jira)
Bruno Roustant created LUCENE-9135:
--

 Summary: UniformSplit FieldMetadata counters should all be long
 Key: LUCENE-9135
 URL: https://issues.apache.org/jira/browse/LUCENE-9135
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Bruno Roustant
Assignee: Bruno Roustant


Currently UniformSplit FieldMetadata stores sumDocFreq, numTerms, 
sumTotalTermFreq as int which is incorrect.

The fix is to make them long. The postings format will be compatible since 
those counters are currently written as VInt and they will be read as VLong 
(and then written as VLong afterwards).
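The compatibility argument works because the variable-length encoding is shared: the bytes produced for a non-negative value written as a VInt decode to the same value when read back as a VLong. A minimal standalone sketch of such a varint codec (illustrative, not Lucene's actual DataOutput/DataInput classes):

```java
import java.io.ByteArrayOutputStream;

public class VarIntCompat {
  // Lucene-style variable-length encoding: 7 payload bits per byte,
  // high bit set means "more bytes follow".
  public static byte[] writeVInt(int value) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    while ((value & ~0x7F) != 0) {
      out.write((value & 0x7F) | 0x80);
      value >>>= 7;
    }
    out.write(value);
    return out.toByteArray();
  }

  // Reading the same bytes back as a VLong yields the same non-negative value,
  // which is why widening the counters keeps the postings format compatible.
  public static long readVLong(byte[] bytes) {
    long value = 0;
    int shift = 0;
    for (byte b : bytes) {
      value |= (long) (b & 0x7F) << shift;
      shift += 7;
    }
    return value;
  }
}
```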






[GitHub] [lucene-solr] risdenk commented on a change in pull request #1152: SOLR-14172: Collection metadata remains in zookeeper if too many shards requested

2020-01-14 Thread GitBox
risdenk commented on a change in pull request #1152: SOLR-14172: Collection 
metadata remains in zookeeper if too many shards requested
URL: https://github.com/apache/lucene-solr/pull/1152#discussion_r366413865
 
 

 ##
 File path: 
solr/core/src/java/org/apache/solr/cloud/api/collections/CreateCollectionCmd.java
 ##
 @@ -190,10 +190,12 @@ public void call(ClusterState clusterState, ZkNodeProps 
message, NamedList resul
   try {
 replicaPositions = buildReplicaPositions(ocmh.cloudManager, 
clusterState, clusterState.getCollection(collectionName), message, shardNames, 
sessionWrapper);
   } catch (Assign.AssignmentException e) {
-ZkNodeProps deleteMessage = new ZkNodeProps("name", collectionName);
-new DeleteCollectionCmd(ocmh).call(clusterState, deleteMessage, 
results);
+deleteCollection(clusterState, results, collectionName);
 // unwrap the exception
 throw new SolrException(ErrorCode.SERVER_ERROR, e.getMessage(), 
e.getCause());
+  } catch (SolrException e) {
 
 Review comment:
   So one question I have is why is the error coming back from 
`buildReplicaPositions` not an `Assign.AssignmentException`? Is it because it 
is wrapped in a `SolrException` from the remote node? 





[jira] [Commented] (LUCENE-9117) RamUsageEstimator hangs with AOT compilation

2020-01-14 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17015152#comment-17015152
 ] 

ASF subversion and git services commented on LUCENE-9117:
-

Commit fb5ba8c9de62baba91d32c3b1e9b2faea8fe5f01 in lucene-solr's branch 
refs/heads/gradle-master from Dawid Weiss
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=fb5ba8c ]

LUCENE-9117: follow-up.


> RamUsageEstimator hangs with AOT compilation
> 
>
> Key: LUCENE-9117
> URL: https://issues.apache.org/jira/browse/LUCENE-9117
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Dawid Weiss
>Assignee: Dawid Weiss
>Priority: Minor
> Attachments: LUCENE-9117.patch
>
>
> Mailing list report by Cleber Muramoto.
> {code}
> After generating a pre-compiled image lucene-core (8.3.0)  with jaotc (JDK
> 13.0.1), RamUsageEstimator class is never loaded - it fails to complete the
> static initializer.
> Steps to reproduce:
>  1)Generate the image with
> >jaotc --info --ignore-errors --jar 
> >~/.m2/repository/org/apache/lucene/lucene-core/8.3.0/lucene-core-8.3.0.jar 
> >--output lucene-core.so
> 2)Create a simple test class to trigger class loading
>  import java.io.IOException;
> import java.nio.file.Path;
> import java.nio.file.Paths;
> import org.apache.lucene.index.IndexWriter;
> import org.apache.lucene.index.IndexWriterConfig;
> import org.apache.lucene.store.MMapDirectory;
> public class TestAOT {
> public static void main(String...args){
>  run();
>  System.out.println("Done");
>  }
> static void run(){
>  try {
>  var iw = new IndexWriter(MMapDirectory.open(Paths.get("dir")), new
> IndexWriterConfig());
>  iw.close();
>  } catch (IOException e) {
>  e.printStackTrace();
>  }
>  }
> }
> 3)Run the class with the pre-compiled image
> >java -XX:AOTLibrary=./lucene-core.so -XX:+PrintAOT  -cp 
> >~/.m2/repository/org/apache/lucene/lucene-core/8.3.0/lucene-core-8.3.0.jar 
> >TestAOT.java
> 4)The program never completes. After inspecting it with jstack  it
> shows that it's stuck in line 195 of RamUsageEstimator
> "main" #1 prio=5 os_prio=0 cpu=174827,04ms elapsed=173,91s
> tid=0x7fe984018800 nid=0x2daf runnable [0x7fe98bc3c000]
>   java.lang.Thread.State: RUNNABLE
> at
> org.apache.lucene.util.RamUsageEstimator.<clinit>(RamUsageEstimator.java:195)
> at org.apache.lucene.util.ArrayUtil.<clinit>(ArrayUtil.java:31)
> at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:276)
> at TestAOT.run(TestAOT.java:20)
> at TestAOT.main(TestAOT.java:14)
> at
> jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(java.base@13.0.1/Native
> Method)
> at 
> jdk.internal.reflect.NativeMethodAccessorImpl.invoke(java.base@13.0.1/NativeMethodAccessorImpl.java:62)
> at
> jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(java.base@13.0.1/DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(java.base@13.0.1/Method.java:567)
> at 
> com.sun.tools.javac.launcher.Main.execute(jdk.compiler@13.0.1/Main.java:415)
> at 
> com.sun.tools.javac.launcher.Main.run(jdk.compiler@13.0.1/Main.java:192)
> at com.sun.tools.javac.launcher.Main.main(jdk.compiler@13.0.1
> /Main.java:132)
> My guess is that the AOT compiler aggressive optimizations like
> inlining/scalar replacing calls to Long.valueOf, are working against the
> code's desired semantics.
> Perhaps the logic to determine LONG_CACHE_[MIN/MAX]_VALUE could be
> conditioned to VM version/vendor in Constants class. From [Open|Oracle]JDK
> 7 onwards (don't know about older versions) Long cache has a fixed range of
> [-128,127], so there's no need to loop until a non-cached boxed value shows
> up.
> I know this compiler bug isn't Lucene's fault and as a workaround one could
> use --compile-for-tiered or exclude RamUsageEstimator's methods from jaotc,
> however, it would be nice to have a faster, allocation/loop-free
> initializer for RamUsageEstimator nevertheless.
> {code}
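The hang hinges on the JDK's `Long.valueOf` cache, which the specification fixes at [-128, 127]; as the report notes, the old static initializer looped until it observed a non-cached boxed value. A tiny standalone probe of that cache boundary (not Lucene code):

```java
public class LongCacheProbe {
  // Long.valueOf returns the same cached instance for values in [-128, 127];
  // outside that range each call yields a distinct object, so reference
  // equality distinguishes cached from non-cached boxed values.
  public static boolean isCached(long v) {
    return Long.valueOf(v) == Long.valueOf(v);
  }
}
```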






[jira] [Commented] (LUCENE-9117) RamUsageEstimator hangs with AOT compilation

2020-01-14 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17015146#comment-17015146
 ] 

ASF subversion and git services commented on LUCENE-9117:
-

Commit 742301ca155f556b1e7374d7662a14608659f84b in lucene-solr's branch 
refs/heads/gradle-master from Dawid Weiss
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=742301c ]

LUCENE-9117: RamUsageEstimator hangs with AOT compilation. Removed any attempt 
to estimate Long.valueOf cache size.


> RamUsageEstimator hangs with AOT compilation
> 
>
> Key: LUCENE-9117
> URL: https://issues.apache.org/jira/browse/LUCENE-9117
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Dawid Weiss
>Assignee: Dawid Weiss
>Priority: Minor
> Attachments: LUCENE-9117.patch
>
>






[jira] [Commented] (LUCENE-9117) RamUsageEstimator hangs with AOT compilation

2020-01-14 Thread Dawid Weiss (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17015142#comment-17015142
 ] 

Dawid Weiss commented on LUCENE-9117:
-

Thanks. I'll prepare a pull request, test and commit.

> RamUsageEstimator hangs with AOT compilation
> 
>
> Key: LUCENE-9117
> URL: https://issues.apache.org/jira/browse/LUCENE-9117
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Dawid Weiss
>Assignee: Dawid Weiss
>Priority: Minor
> Attachments: LUCENE-9117.patch
>
>






[jira] [Commented] (SOLR-14040) solr.xml shareSchema does not work in SolrCloud

2020-01-14 Thread David Smiley (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17015114#comment-17015114
 ] 

David Smiley commented on SOLR-14040:
-

Added PR with details.

> solr.xml shareSchema does not work in SolrCloud
> ---
>
> Key: SOLR-14040
> URL: https://issues.apache.org/jira/browse/SOLR-14040
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: Schema and Analysis
>Reporter: David Smiley
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> solr.xml has a shareSchema boolean option that can be toggled from the 
> default of false to true in order to share IndexSchema objects within the 
> Solr node.  This is silently ignored in SolrCloud mode.  The pertinent code 
> is {{org.apache.solr.core.ConfigSetService#createConfigSetService}} which 
> creates a CloudConfigSetService that is not related to the SchemaCaching 
> class.  This may not be a big deal in SolrCloud which tends not to deal well 
> with many cores per node but I'm working on changing that.
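What shareSchema is supposed to buy can be illustrated with a minimal sketch (the class and method names below are hypothetical, not Solr's actual API): cores on the same node that resolve to the same schema resource reuse one parsed schema object instead of re-parsing it per core.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of node-level schema sharing; the Object value is a
// stand-in for an expensively parsed IndexSchema.
public class SchemaCacheSketch {
    static final Map<String, Object> CACHE = new ConcurrentHashMap<>();

    static Object schemaFor(String configSet, String resource) {
        // One parse per (configSet, resource) pair; later cores reuse it.
        return CACHE.computeIfAbsent(configSet + "/" + resource,
                key -> new Object() /* stand-in for parsing the schema */);
    }

    public static void main(String[] args) {
        Object a = schemaFor("techproducts", "managed-schema");
        Object b = schemaFor("techproducts", "managed-schema");
        System.out.println(a == b); // the two cores share one schema object
    }
}
```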






[jira] [Assigned] (SOLR-14040) solr.xml shareSchema does not work in SolrCloud

2020-01-14 Thread David Smiley (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-14040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley reassigned SOLR-14040:
---

Assignee: David Smiley

> solr.xml shareSchema does not work in SolrCloud
> ---
>
> Key: SOLR-14040
> URL: https://issues.apache.org/jira/browse/SOLR-14040
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: Schema and Analysis
>Reporter: David Smiley
>Assignee: David Smiley
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>






[GitHub] [lucene-solr] ctargett commented on issue #1164: SOLR-12930: Create developer docs in source repo

2020-01-14 Thread GitBox
ctargett commented on issue #1164: SOLR-12930: Create developer docs in source 
repo
URL: https://github.com/apache/lucene-solr/pull/1164#issuecomment-574181698
 
 
   > The gradle branch has a number of guide-style txt files under top-level 
help/ folder. I don't know how this fits with this proposal?
   
   Oh, cool, thanks for pointing those out. Those files appear to be pretty 
much about the Gradle build, so under this proposal what I would say is that 
those docs should move to the new top-level `dev-docs` directory. To start we 
could just put them in that directory and later as we move content into source 
we could add some organization via sub-directories to help people out.
   
   If it looks like we'll do this proposal, I'd be happy to make the changes on 
the branch to minimize your workload.
   
   They should also be converted to .adoc file format, which would be easy for 
me to do rather quickly and I'd offer to do that.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-9117) RamUsageEstimator hangs with AOT compilation

2020-01-14 Thread Adrien Grand (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17015082#comment-17015082
 ] 

Adrien Grand commented on LUCENE-9117:
--

+1

> RamUsageEstimator hangs with AOT compilation
> 
>
> Key: LUCENE-9117
> URL: https://issues.apache.org/jira/browse/LUCENE-9117
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Dawid Weiss
>Assignee: Dawid Weiss
>Priority: Minor
> Attachments: LUCENE-9117.patch
>
>
> Mailing list report by Cleber Muramoto.
> {code}
> After generating a pre-compiled image lucene-core (8.3.0)  with jaotc (JDK
> 13.0.1), RamUsageEstimator class is never loaded - it fails to complete the
> static initializer.
> Steps to reproduce:
>  1)Generate the image with
> >jaotc --info --ignore-errors --jar 
> >~/.m2/repository/org/apache/lucene/lucene-core/8.3.0/lucene-core-8.3.0.jar 
> >--output lucene-core.so
> 2)Create a simple test class to trigger class loading
>  import java.io.IOException;
> import java.nio.file.Path;
> import java.nio.file.Paths;
> import org.apache.lucene.index.IndexWriter;
> import org.apache.lucene.index.IndexWriterConfig;
> import org.apache.lucene.store.MMapDirectory;
> public class TestAOT {
> public static void main(String...args){
>  run();
>  System.out.println("Done");
>  }
> static void run(){
>  try {
>  var iw = new IndexWriter(MMapDirectory.open(Paths.get("dir")), new
> IndexWriterConfig());
>  iw.close();
>  } catch (IOException e) {
>  e.printStackTrace();
>  }
>  }
> }
> 3)Run the class with the pre-compiled image
> >java -XX:AOTLibrary=./lucene-core.so -XX:+PrintAOT  -cp 
> >~/.m2/repository/org/apache/lucene/lucene-core/8.3.0/lucene-core-8.3.0.jar 
> >TestAOT.java
> 4)The program never completes. After inspecting it with jstack  it
> shows that it's stuck in line 195 of RamUsageEstimator
> "main" #1 prio=5 os_prio=0 cpu=174827,04ms elapsed=173,91s
> tid=0x7fe984018800 nid=0x2daf runnable [0x7fe98bc3c000]
>   java.lang.Thread.State: RUNNABLE
> at
> org.apache.lucene.util.RamUsageEstimator.<clinit>(RamUsageEstimator.java:195)
> at org.apache.lucene.util.ArrayUtil.<clinit>(ArrayUtil.java:31)
> at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:276)
> at TestAOT.run(TestAOT.java:20)
> at TestAOT.main(TestAOT.java:14)
> at
> jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(java.base@13.0.1/Native
> Method)
> at 
> jdk.internal.reflect.NativeMethodAccessorImpl.invoke(java.base@13.0.1/NativeMethodAccessorImpl.java:62)
> at
> jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(java.base@13.0.1/DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(java.base@13.0.1/Method.java:567)
> at 
> com.sun.tools.javac.launcher.Main.execute(jdk.compiler@13.0.1/Main.java:415)
> at 
> com.sun.tools.javac.launcher.Main.run(jdk.compiler@13.0.1/Main.java:192)
> at com.sun.tools.javac.launcher.Main.main(jdk.compiler@13.0.1
> /Main.java:132)
> My guess is that the AOT compiler's aggressive optimizations, like
> inlining/scalar-replacing calls to Long.valueOf, are working against the
> code's desired semantics.
> Perhaps the logic to determine LONG_CACHE_[MIN/MAX]_VALUE could be
> conditioned to VM version/vendor in Constants class. From [Open|Oracle]JDK
> 7 onwards (don't know about older versions) Long cache has a fixed range of
> [-128,127], so there's no need to loop until a non-cached boxed value shows
> up.
> I know this compiler bug isn't Lucene's fault and as a workaround one could
> use --compile-for-tiered or exclude RamUsageEstimator's methods from jaotc,
> however, it would be nice to have a faster, allocation/loop-free
> initializer for RamUsageEstimator nevertheless.
> {code}
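The cache-probing loop the report refers to can be sketched as follows (a simplified reconstruction, not Lucene's exact code; RamUsageEstimator also probes the lower bound the same way). The loop terminates only if two Long.valueOf calls for the same value eventually return distinct objects, which is exactly the invariant an AOT compiler's inlining/scalar replacement can break.

```java
public class LongCacheProbe {
    // Probe the upper bound of the JVM's boxed-Long cache via reference
    // equality: Long.valueOf returns the same instance only for cached
    // values. If a compiler scalar-replaces the boxing, the comparison can
    // become a constant "true" and the loop never exits -- hence the guard
    // below, which the reported code lacks.
    static long probeCacheMax() {
        long v = 0;
        while (v < (1 << 16) && Long.valueOf(v) == Long.valueOf(v)) {
            v++;
        }
        return v - 1; // last value that was still served from the cache
    }

    public static void main(String[] args) {
        System.out.println("Long cache upper bound: " + probeCacheMax());
    }
}
```

On a stock HotSpot JVM this reports 127; the Long.valueOf javadoc guarantees caching for at least -128..127, which is why the report suggests hard-coding that range instead of probing.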






[jira] [Commented] (SOLR-14184) replace DirectUpdateHandler2.commitOnClose with something in TestInjection

2020-01-14 Thread Lucene/Solr QA (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17015060#comment-17015060
 ] 

Lucene/Solr QA commented on SOLR-14184:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 10 new or modified test 
files. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
9s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
10s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m 
10s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} Release audit (RAT) {color} | 
{color:green}  1m  5s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} Check forbidden APIs {color} | 
{color:green}  1m  2s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} Validate source patterns {color} | 
{color:green}  1m  2s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 45m 37s{color} 
| {color:red} core in the patch failed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
21s{color} | {color:green} test-framework in the patch passed. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 50m 18s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | solr.cloud.SystemCollectionCompatTest |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | SOLR-14184 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12990820/SOLR-14184.patch |
| Optional Tests |  compile  javac  unit  ratsources  checkforbiddenapis  
validatesourcepatterns  |
| uname | Linux lucene1-us-west 4.15.0-54-generic #58-Ubuntu SMP Mon Jun 24 
10:55:24 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | ant |
| Personality | 
/home/jenkins/jenkins-slave/workspace/PreCommit-SOLR-Build/sourcedir/dev-tools/test-patch/lucene-solr-yetus-personality.sh
 |
| git revision | master / 2cda4184c94 |
| ant | version: Apache Ant(TM) version 1.10.5 compiled on March 28 2019 |
| Default Java | LTS |
| unit | 
https://builds.apache.org/job/PreCommit-SOLR-Build/650/artifact/out/patch-unit-solr_core.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-SOLR-Build/650/testReport/ |
| modules | C: solr/core solr/test-framework U: solr |
| Console output | 
https://builds.apache.org/job/PreCommit-SOLR-Build/650/console |
| Powered by | Apache Yetus 0.7.0   http://yetus.apache.org |


This message was automatically generated.



> replace DirectUpdateHandler2.commitOnClose with something in TestInjection
> --
>
> Key: SOLR-14184
> URL: https://issues.apache.org/jira/browse/SOLR-14184
> Project: Solr
>  Issue Type: Test
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Chris M. Hostetter
>Assignee: Chris M. Hostetter
>Priority: Major
> Attachments: SOLR-14184.patch, SOLR-14184.patch
>
>
> {code:java}
> public static volatile boolean commitOnClose = true;  // TODO: make this a 
> real config option or move it to TestInjection
> {code}
> Lots of tests muck with this (to simulate unclean shutdown and force tlog 
> replay on restart) but there's no guarantee that it is reset properly.
> It should be replaced by logic in {{TestInjection}} that is correctly cleaned 
> up by {{TestInjection.reset()}}
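A minimal sketch of the pattern being proposed (the field and method names here are illustrative and need not match what the patch actually adds): the toggle moves into a test-injection holder whose reset() restores the default, so the framework's teardown can guarantee cleanup.

```java
public class TestInjectionSketch {
    // Default is to commit on close; a test flips this to simulate
    // unclean shutdown and force tlog replay on restart.
    public static volatile boolean skipIndexWriterCommitOnClose = false;

    // Called from the test framework's teardown, so a test that toggles
    // the flag cannot leak it into later tests -- the guarantee the
    // static field in DirectUpdateHandler2 cannot provide on its own.
    public static void reset() {
        skipIndexWriterCommitOnClose = false;
    }

    public static void main(String[] args) {
        skipIndexWriterCommitOnClose = true; // test simulates unclean shutdown
        reset();                             // teardown restores the default
        System.out.println(skipIndexWriterCommitOnClose);
    }
}
```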






[GitHub] [lucene-solr] a-siuniaev commented on a change in pull request #1161: SOLR-12325 Added uniqueBlockQuery(parent:true) aggregation for JSON Facet

2020-01-14 Thread GitBox
a-siuniaev commented on a change in pull request #1161: SOLR-12325 Added 
uniqueBlockQuery(parent:true) aggregation for JSON Facet
URL: https://github.com/apache/lucene-solr/pull/1161#discussion_r366312261
 
 

 ##
 File path: 
solr/core/src/java/org/apache/solr/search/facet/UniqueBlockQueryAgg.java
 ##
 @@ -0,0 +1,113 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.solr.search.facet;
+
+import java.io.IOException;
+import java.util.Arrays;
+import java.util.Objects;
+import java.util.function.IntFunction;
+
+import org.apache.lucene.index.LeafReaderContext;
+import org.apache.lucene.search.Query;
+import org.apache.lucene.util.BitSet;
+
+import static 
org.apache.solr.search.join.BlockJoinParentQParser.getCachedFilter;
+
+public class UniqueBlockQueryAgg extends AggValueSource {
 
 Review comment:
   Extracted. 





[GitHub] [lucene-solr] a-siuniaev commented on a change in pull request #1161: SOLR-12325 Added uniqueBlockQuery(parent:true) aggregation for JSON Facet

2020-01-14 Thread GitBox
a-siuniaev commented on a change in pull request #1161: SOLR-12325 Added 
uniqueBlockQuery(parent:true) aggregation for JSON Facet
URL: https://github.com/apache/lucene-solr/pull/1161#discussion_r366312200
 
 

 ##
 File path: solr/core/src/java/org/apache/solr/search/ValueSourceParser.java
 ##
 @@ -975,6 +976,13 @@ public ValueSource parse(FunctionQParser fp) throws 
SyntaxError {
   }
 });
 
+addParser("agg_uniqueBlockQuery", new ValueSourceParser() {
+  @Override
+  public ValueSource parse(FunctionQParser fp) throws SyntaxError {
 
 Review comment:
   Added parseNamedArg to FunctionQParser.





[GitHub] [lucene-solr] gerlowskija edited a comment on issue #1163: SOLR-14186: Enforce CRLF in Windows files with .gitattributes

2020-01-14 Thread GitBox
gerlowskija edited a comment on issue #1163: SOLR-14186: Enforce CRLF in 
Windows files with .gitattributes
URL: https://github.com/apache/lucene-solr/pull/1163#issuecomment-574142361
 
 
   > I thought it was git itself that enforces gitattributes, not 3rd party 
tools.
   
   That's what I thought too.  What sort of editor/tool issues have you seen 
Dawid?
   
   > Don't know if git uses it only when checking out files from repo or also 
when committing changes?
   
   As far as I can tell from the docs 
[here,](https://git-scm.com/docs/gitattributes#_checking_out_and_checking_in) 
it looks like both.
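   For reference, a `.gitattributes` fragment of the kind discussed (the patterns here are illustrative, not necessarily the ones in the PR) pins line endings per path, overriding each contributor's `core.autocrlf` on both checkout and check-in:

```
# Windows-only scripts: always CRLF in the working tree, normalized in the repo
*.bat text eol=crlf
*.cmd text eol=crlf
# Everything else: let git normalize line endings automatically
* text=auto
```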





[GitHub] [lucene-solr] gerlowskija commented on issue #1163: SOLR-14186: Enforce CRLF in Windows files with .gitattributes

2020-01-14 Thread GitBox
gerlowskija commented on issue #1163: SOLR-14186: Enforce CRLF in Windows files 
with .gitattributes
URL: https://github.com/apache/lucene-solr/pull/1163#issuecomment-574142361
 
 
   > I thought it was git itself that enforces gitattributes, not 3rd party 
tools.
   That's what I thought too.  What sort of editor/tool issues have you seen 
Dawid?
   
   > Don't know if git uses it only when checking out files from repo or also 
when committing changes?
   As far as I can tell from the docs 
[here,](https://git-scm.com/docs/gitattributes#_checking_out_and_checking_in) 
it looks like both.





[jira] [Commented] (LUCENE-9117) RamUsageEstimator hangs with AOT compilation

2020-01-14 Thread Dawid Weiss (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17015019#comment-17015019
 ] 

Dawid Weiss commented on LUCENE-9117:
-

The AOT compiler is indeed pretty clever at optimizing away the object-equality 
check here. The attached patch works, but I don't think we should even try to be 
smart and measure the cache size here... Sure, we would overestimate in certain 
scenarios, but I don't think it's harmful to overestimate here (and the spec for 
the result of valueOf is such that even a smart Java compiler could turn the 
check expression into a constant).

[~jpountz] What do you think?

> RamUsageEstimator hangs with AOT compilation
> 
>
> Key: LUCENE-9117
> URL: https://issues.apache.org/jira/browse/LUCENE-9117
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Dawid Weiss
>Assignee: Dawid Weiss
>Priority: Minor
> Attachments: LUCENE-9117.patch
>
>






[jira] [Updated] (LUCENE-9117) RamUsageEstimator hangs with AOT compilation

2020-01-14 Thread Dawid Weiss (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-9117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dawid Weiss updated LUCENE-9117:

Attachment: LUCENE-9117.patch

> RamUsageEstimator hangs with AOT compilation
> 
>
> Key: LUCENE-9117
> URL: https://issues.apache.org/jira/browse/LUCENE-9117
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Dawid Weiss
>Assignee: Dawid Weiss
>Priority: Minor
> Attachments: LUCENE-9117.patch
>
>






[jira] [Commented] (SOLR-13845) DELETEREPLICA API by "count" and "type"

2020-01-14 Thread Amrit Sarkar (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17014987#comment-17014987
 ] 

Amrit Sarkar commented on SOLR-13845:
-

Generated a PR with the latest code and documentation.

> DELETEREPLICA API by "count" and "type"
> ---
>
> Key: SOLR-13845
> URL: https://issues.apache.org/jira/browse/SOLR-13845
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Amrit Sarkar
>Assignee: Shalin Shekhar Mangar
>Priority: Major
> Attachments: SOLR-13845.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> SOLR-9319 added support for deleting replicas by count. It would be great to 
> extend that feature so we can also specify the type of replica to delete, 
> just as we can add replicas by count and type.






[GitHub] [lucene-solr] sarkaramrit2 opened a new pull request #1167: SOLR-13845: DELETEREPLICA API by count and type

2020-01-14 Thread GitBox
sarkaramrit2 opened a new pull request #1167: SOLR-13845: DELETEREPLICA API by 
count and type
URL: https://github.com/apache/lucene-solr/pull/1167
 
 
   
   
   
   # Description
   
   Please provide a short description of the changes you're making with this 
pull request.
   
   # Solution
   
   Please provide a short description of the approach taken to implement your 
solution.
   
   # Tests
   
   Please describe the tests you've developed or run to confirm this patch 
implements the feature or solves the problem.
   
   # Checklist
   
   Please review the following and check all that apply:
   
   - [ ] I have reviewed the guidelines for [How to 
Contribute](https://wiki.apache.org/solr/HowToContribute) and my code conforms 
to the standards described there to the best of my ability.
   - [ ] I have created a Jira issue and added the issue ID to my pull request 
title.
   - [ ] I have given Solr maintainers 
[access](https://help.github.com/en/articles/allowing-changes-to-a-pull-request-branch-created-from-a-fork)
 to contribute to my PR branch. (optional but recommended)
   - [ ] I have developed this patch against the `master` branch.
   - [ ] I have run `ant precommit` and the appropriate test suite.
   - [ ] I have added tests for my changes.
   - [ ] I have added documentation for the [Ref 
Guide](https://github.com/apache/lucene-solr/tree/master/solr/solr-ref-guide) 
(for Solr changes only).
   





[jira] [Commented] (LUCENE-8615) Can LatLonShape's tessellator create more search-efficient triangles?

2020-01-14 Thread Ignacio Vera (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-8615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17014971#comment-17014971
 ] 

Ignacio Vera commented on LUCENE-8615:
--

Here is an idea. We currently have the following types of triangles in the index:

!screenshot-1.png|width=432,height=464!

This plot shows that the potentially most wasteful triangles are the ones where 
two of the points belong to the bounding box (the first four possibilities), so 
I wonder whether we should avoid adding those types of triangles to the index 
and instead split them along the longest side.

Note that the side effect is that we can reduce the number of triangle types to 
4.

> Can LatLonShape's tessellator create more search-efficient triangles?
> -
>
> Key: LUCENE-8615
> URL: https://issues.apache.org/jira/browse/LUCENE-8615
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Adrien Grand
>Priority: Minor
> Attachments: 2-tessellations.png, re-tessellate-triangle.png, 
> screenshot-1.png
>
>
> The triangular mesh produced by LatLonShape's Tessellator creates reasonable 
> numbers of triangles, which is helpful for indexing speed. However I'm 
> wondering whether there are conditions where it might be beneficial to run 
> tessellation slightly differently in order to create triangles that are more 
> search-friendly. Given that we only index the minimum bounding rectangle for 
> each triangle, we always check for intersection between the query and the 
> triangle if the query intersects with the MBR of the triangle. So the smaller 
> the area of the triangle compared to its MBR, the higher the likeliness to 
> have false positive when querying.
> For instance see the following shape, there are two ways that it can be 
> tessellated into two triangles. LatLonShape's Tessellator is going to return 
> either of them depending on which point is listed first in the polygon. Yet 
> the first one is more efficient than the second one: with the second one, 
> both triangles have roughly the same MBR (which is also the MBR of the 
> polygon), so both triangles will need to be checked all the time whenever the 
> query intersects with this shared MBR. On the other hand, with the first way, 
> both MBRs are smaller and don't overlap, which makes it more likely that only 
> one triangle needs to be checked at query time.
>  !2-tessellations.png! 
> Another example is the following polygon. It can be tessellated into a single 
> triangle. Yet at times it might be a better idea to create more triangles so 
> that the overall area of MBRs is smaller and queries are less likely to run 
> into false positives.
>  !re-tessellate-triangle.png! 
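> The "wasted MBR" intuition above can be made concrete with a small sketch (illustrative code, not Lucene's): the fraction of a triangle's bounding box that the triangle actually covers bounds how often an MBR hit is a false positive for the triangle itself.

```java
// Compares a triangle's area to the area of its minimum bounding rectangle.
// A coverage near 0 means most MBR intersections are false positives.
public class MbrWaste {
    static double triangleArea(double ax, double ay, double bx, double by,
                               double cx, double cy) {
        // Shoelace formula for the area of a triangle.
        return Math.abs((bx - ax) * (cy - ay) - (cx - ax) * (by - ay)) / 2.0;
    }

    static double mbrArea(double ax, double ay, double bx, double by,
                          double cx, double cy) {
        double minX = Math.min(ax, Math.min(bx, cx)), maxX = Math.max(ax, Math.max(bx, cx));
        double minY = Math.min(ay, Math.min(by, cy)), maxY = Math.max(ay, Math.max(by, cy));
        return (maxX - minX) * (maxY - minY);
    }

    /** Fraction of the bounding box covered by the triangle (0.5 is the best possible). */
    static double coverage(double... p) {
        return triangleArea(p[0], p[1], p[2], p[3], p[4], p[5])
             / mbrArea(p[0], p[1], p[2], p[3], p[4], p[5]);
    }

    public static void main(String[] args) {
        // A long thin diagonal sliver covers almost none of its MBR ...
        System.out.println(coverage(0, 0, 10, 10, 10, 9.9));
        // ... while a right triangle aligned with the box covers half of it.
        System.out.println(coverage(0, 0, 10, 0, 0, 10));
    }
}
```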






[jira] [Updated] (LUCENE-8615) Can LatLonShape's tessellator create more search-efficient triangles?

2020-01-14 Thread Ignacio Vera (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-8615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ignacio Vera updated LUCENE-8615:
-
Attachment: screenshot-1.png

> Can LatLonShape's tessellator create more search-efficient triangles?
> -
>
> Key: LUCENE-8615
> URL: https://issues.apache.org/jira/browse/LUCENE-8615
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Adrien Grand
>Priority: Minor
> Attachments: 2-tessellations.png, re-tessellate-triangle.png, 
> screenshot-1.png
>




-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] dweiss commented on a change in pull request #1157: Add RAT check using Gradle

2020-01-14 Thread GitBox
dweiss commented on a change in pull request #1157: Add RAT check using Gradle
URL: https://github.com/apache/lucene-solr/pull/1157#discussion_r366229880
 
 

 ##
 File path: gradle/validation/rat-sources.gradle
 ##
 @@ -0,0 +1,174 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+import groovy.xml.NamespaceBuilder
+
+allprojects {
+configurations {
+rat
+}
+
+dependencies {
+rat 'org.apache.rat:apache-rat-tasks:0.13'
+}
+}
+
+subprojects {
+plugins.withId("java", {
 
 Review comment:
  That's not a quirk :) What I'm saying is that you explicitly rely on the 
project being a Java convention project. We don't need to require this, but then 
we have to decide what to use as source folders.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] dweiss commented on issue #1157: Add RAT check using Gradle

2020-01-14 Thread GitBox
dweiss commented on issue #1157: Add RAT check using Gradle
URL: https://github.com/apache/lucene-solr/pull/1157#issuecomment-574082560
 
 
   > One other question of best practices: do we want all the project 
exclusions in a central rat.gradle file, or spread out and local to each 
project?
   
   This is a good question and I think the answer is really subjective. 
   
   Both approaches have advantages and disadvantages. I personally like the 
"aspect-oriented" approach, where everything related to a particular build 
function is gathered in a single file. So yes, for rat it'd be just that single 
file -- any special handling of that aspect, exclusions, etc. would be 
collected there. In an extreme case you should be able to enable/disable rat 
just by commenting out the apply block in the master build file.
   
   As the build evolves over time this may need to change. For example, when 
you move parts of the build into buildSrc and wrap it in plugins, it will become 
necessary to move project-specific logic outside.
   
   For now I'd rather keep it in this "aspect-oriented" form if you don't mind 
(but this is a subjective decision, not a better or more common practice).



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] dweiss commented on issue #1157: Add RAT check using Gradle

2020-01-14 Thread GitBox
dweiss commented on issue #1157: Add RAT check using Gradle
URL: https://github.com/apache/lucene-solr/pull/1157#issuecomment-574079514
 
 
   To be honest I would prefer not to add those license headers to build files, 
unless it's a requirement of Apache legal -- this would have to be verified. 
For one thing, I don't perceive build files as something particularly valuable 
as intellectual property (although it surely does require intellectual input 
to write them). And even if they were, the distribution bundle comes with a 
top-level license file that covers them.
   
   The downside of requiring this boilerplate is that it bloats each and every 
script. 



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] dweiss commented on issue #1163: SOLR-14186: Enforce CRLF in Windows files with .gitattributes

2020-01-14 Thread GitBox
dweiss commented on issue #1163: SOLR-14186: Enforce CRLF in Windows files with 
.gitattributes
URL: https://github.com/apache/lucene-solr/pull/1163#issuecomment-574074888
 
 
   You assume a single "git", but there are multiple versions of the tool and 
even multiple implementations (jgit, for example). Assuming editors support 
or understand .gitattributes (or .editorconfig) is probably overly 
optimistic too... Not that I use notepad.exe... but I'm sure there are folks 
who do! :)
   
   And more seriously: I've experimented with both options in the past and I 
use them in other projects, but I still prefer an explicit sanity check. 
Ideally an integration test that takes the final ZIP/TGZ bundle and verifies 
distribution sanity -- this also covers the (not unlikely) possibility of build 
tools messing up files while doing content-filtering, copying, or whatever else.
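
A distribution sanity check of the kind described could be sketched as
follows. This is a hypothetical illustration, not the actual Lucene/Solr
build logic; the file patterns (`.bat`, `.cmd`) and the zip layout in the
usage note are assumptions:

```python
import zipfile

# Files that must keep Windows (CRLF) line endings in the distribution.
CRLF_PATTERNS = (".bat", ".cmd")

def check_line_endings(zip_file):
    """Return names of Windows script files in the archive that contain
    bare LF line endings (i.e. were mangled somewhere in the build)."""
    bad = []
    with zipfile.ZipFile(zip_file) as zf:
        for name in zf.namelist():
            if name.endswith(CRLF_PATTERNS):
                data = zf.read(name)
                # Strip well-formed CRLF pairs; any \n left over is a bare LF.
                if b"\n" in data.replace(b"\r\n", b""):
                    bad.append(name)
    return bad
```

Run against the final ZIP bundle (e.g. `check_line_endings("dist/solr.zip")`),
this catches line-ending damage regardless of which git version, editor, or
copy/filter step introduced it, which is the point of checking the artifact
rather than trusting .gitattributes.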



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] dweiss commented on issue #1164: SOLR-12930: Create developer docs in source repo

2020-01-14 Thread GitBox
dweiss commented on issue #1164: SOLR-12930: Create developer docs in source 
repo
URL: https://github.com/apache/lucene-solr/pull/1164#issuecomment-574072794
 
 
   The gradle branch has a number of guide-style txt files under the top-level 
help/ folder. I don't know how this fits with this proposal.



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org