[GitHub] [lucene-solr] dweiss commented on a change in pull request #1157: Add RAT check using Gradle
dweiss commented on a change in pull request #1157: Add RAT check using Gradle
URL: https://github.com/apache/lucene-solr/pull/1157#discussion_r366734889

## File path: versions.props

@@ -72,6 +72,7 @@
 org.apache.opennlp:opennlp-tools=1.9.1
 org.apache.pdfbox:*=2.0.17
 org.apache.pdfbox:jempbox=1.8.16
 org.apache.poi:*=4.1.1
+org.apache.rat:apache-rat:0.11

Review comment:
   Did you run gradlew precommit? :) Because I think the lock update file isn't included in the patch.

----
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] dweiss commented on a change in pull request #1157: Add RAT check using Gradle
dweiss commented on a change in pull request #1157: Add RAT check using Gradle
URL: https://github.com/apache/lucene-solr/pull/1157#discussion_r366734419

## File path: gradle/validation/rat-sources.gradle

@@ -0,0 +1,266 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+import groovy.xml.NamespaceBuilder
+
+configure(rootProject) {
+    configurations {
+        rat
+    }
+
+    dependencies {
+        rat "org.apache.rat:apache-rat"
+    }
+}
+
+allprojects {
+    task("rat", type: RatTask) {
+        group = 'Verification'
+        description = 'Runs Apache Rat checks.'
+    }
+
+    if (project == rootProject) {
+        rat {
+            includes += [
+                "buildSrc/**/*.java",
+                "lucene/tools/forbiddenApis/**",
+                "lucene/tools/prettify/**",
+                // "dev-tools/**"
+            ]
+            excludes += [
+                // Unclear if this needs ASF header, depends on how much was copied from ElasticSearch
+                "**/ErrorReportingTestListener.java"
+            ]
+        }
+    }
+
+    if (project.path == ":lucene:analysis:common") {
+        rat {
+            srcExcludes += [
+                "**/*.aff",
+                "**/*.dic",
+                "**/charfilter/*.htm*",
+                "**/*LuceneResourcesWikiPage.html"
+            ]
+        }
+    }
+
+    if (project.path == ":lucene:analysis:kuromoji") {
+        rat {
+            srcExcludes += [
+                // whether rat detects this as binary or not is platform dependent?!
+                "**/bocchan.utf-8"
+            ]
+        }
+    }
+
+    if (project.path == ":lucene:analysis:opennlp") {
+        rat {
+            excludes += [
+                "src/tools/test-model-data/*.txt",
+            ]
+        }
+    }
+
+    if (project.path == ":lucene:highlighter") {
+        rat {
+            srcExcludes += [
+                "**/CambridgeMA.utf8"
+            ]
+        }
+    }
+
+    if (project.path == ":lucene:suggest") {
+        rat {
+            srcExcludes += [
+                "**/Top50KWiki.utf8",
+                "**/stop-snowball.txt"
+            ]
+        }
+    }
+
+    if (project.path == ":lucene:tools") {
+        rat {
+            includes += [
+                "forbiddenApis/**",
+                "prettify/**",
+                // If/when :lucene:tools becomes a gradle project, then the following line will fail

Review comment:
   I don't think there will be a lucene:tools anymore once we depart from ant - I don't think there is any relevant code in there that should stay.
[GitHub] [lucene-solr] dweiss commented on a change in pull request #1157: Add RAT check using Gradle
dweiss commented on a change in pull request #1157: Add RAT check using Gradle
URL: https://github.com/apache/lucene-solr/pull/1157#discussion_r366734202

## File path: gradle/validation/rat-sources.gradle

+    if (project.path == ":lucene:analysis:kuromoji") {
+        rat {
+            srcExcludes += [
+                // whether rat detects this as binary or not is platform dependent?!

Review comment:
   This is one of the reasons I'd rather have our own check - we don't have to rely on a black box (or look inside rat's code to figure out what it does).
[GitHub] [lucene-solr] dweiss commented on a change in pull request #1157: Add RAT check using Gradle
dweiss commented on a change in pull request #1157: Add RAT check using Gradle
URL: https://github.com/apache/lucene-solr/pull/1157#discussion_r366733855

## File path: gradle/validation/rat-sources.gradle

+            excludes += [
+                // Unclear if this needs ASF header, depends on how much was copied from ElasticSearch

Review comment:
   The idea was, but I changed a whole bunch of stuff along the way. There are similarities and the attribution is there, but I think we can consider it our own.
[GitHub] [lucene-solr] dweiss commented on issue #1157: Add RAT check using Gradle
dweiss commented on issue #1157: Add RAT check using Gradle
URL: https://github.com/apache/lucene-solr/pull/1157#issuecomment-574539229

Thanks Mike. I'll merge it in today.

I actually looked at the Apache Rat sources yesterday -- they are indeed riddled with encoding issues (a writer on top of a print stream, etc.), and they're not easy to use directly to apply to just a bunch of files instead of a directory plus inclusions/exclusions. My motivation here was to replace the whole fileset trickery with Gradle's file collections. Then it's easier to pick what you need, pass it to task inputs, etc.

Also, what Rat does is merely simple pattern detection (line matching)... this could also be written in plain Groovy with relative ease - perhaps at some point we can just switch over to having our own license check where we can control things better. For now I think it's fine.
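To make the "simple pattern detection" point concrete, here is a minimal, hypothetical sketch of the kind of line-matching license check described above. This is not Rat's actual implementation and not code from the PR; the class and method names are invented for illustration.

```java
import java.util.List;

// Sketch of a plain line-matching license check, similar in spirit to what
// Apache Rat does: scan the first lines of a source file for the ASF header
// marker. Names here are hypothetical, not Lucene's or Rat's API.
public class LicenseCheck {
    private static final String ASF_MARKER =
        "Licensed to the Apache Software Foundation (ASF) under one";

    /** Returns true if the ASF header marker occurs within the first maxLines lines. */
    public static boolean hasAsfHeader(List<String> lines, int maxLines) {
        int limit = Math.min(maxLines, lines.size());
        for (int i = 0; i < limit; i++) {
            if (lines.get(i).contains(ASF_MARKER)) {
                return true;
            }
        }
        return false;
    }
}
```

Such a check, fed from Gradle file collections rather than Ant filesets, is the "our own license check" idea sketched in the comment.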
[GitHub] [lucene-solr] mkhludnev commented on a change in pull request #1161: SOLR-12325 Added uniqueBlockQuery(parent:true) aggregation for JSON Facet
mkhludnev commented on a change in pull request #1161: SOLR-12325 Added uniqueBlockQuery(parent:true) aggregation for JSON Facet
URL: https://github.com/apache/lucene-solr/pull/1161#discussion_r366721693

## File path: solr/core/src/java/org/apache/solr/search/facet/UniqueBlockFieldAgg.java

@@ -0,0 +1,52 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.solr.search.facet;
+
+import java.io.IOException;
+
+import org.apache.solr.schema.SchemaField;
+
+public class UniqueBlockFieldAgg extends UniqueBlockAgg {
+
+    private static final class UniqueBlockFieldSlotAcc extends UniqueBlockSlotAcc {

Review comment:
   is it really necessary?
[GitHub] [lucene-solr] mkhludnev commented on a change in pull request #1161: SOLR-12325 Added uniqueBlockQuery(parent:true) aggregation for JSON Facet
mkhludnev commented on a change in pull request #1161: SOLR-12325 Added uniqueBlockQuery(parent:true) aggregation for JSON Facet
URL: https://github.com/apache/lucene-solr/pull/1161#discussion_r366721270

## File path: solr/core/src/java/org/apache/solr/search/FunctionQParser.java

@@ -231,7 +232,59 @@ public String parseArg() throws SyntaxError {
     return val;
   }
-
+  /**
+   * Parse argument in the same way as parseArg() but also parse the argument name
+   * if written in the following syntax: argumentName=argument.
+   *
+   * @return Immutable entry with the name as the key and the argument as the value. In case there's no name, the key is null.
+   * @throws SyntaxError in case the argument is not ended by ) or ,
+   */
+  public SimpleImmutableEntry parseNamedArg() throws SyntaxError {

Review comment:
   1. Usually it's nicer to declare the interface, i.e. Entry<>.
   2. I thought Yonik wrote that something like this already existed? Doesn't it?
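For context on the syntax under review, here is a hedged, stand-alone sketch of the argumentName=argument parsing idea. This is not the PR's actual parseNamedArg implementation (which also has to respect quoting and the `)` / `,` terminators); the class name and simplified signature are invented for illustration.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.Map;

// Hypothetical sketch of the "argumentName=argument" convention quoted above:
// split a raw token into a (name, value) entry; when no '=' is present the
// key is null, mirroring the javadoc in the reviewed patch.
public class NamedArgSketch {
    public static Map.Entry<String, String> parseNamedArg(String raw) {
        int eq = raw.indexOf('=');
        if (eq < 0) {
            // No name given: key is null, the whole token is the argument.
            return new SimpleImmutableEntry<>(null, raw);
        }
        return new SimpleImmutableEntry<>(raw.substring(0, eq), raw.substring(eq + 1));
    }
}
```

Note the return type here is declared as the Map.Entry interface, which is the first point of the review comment.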
[jira] [Commented] (SOLR-9683) include/exclude filters by tag
[ https://issues.apache.org/jira/browse/SOLR-9683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17015663#comment-17015663 ]

Mikhail Khludnev commented on SOLR-9683:
----------------------------------------

Ok. We had filter exclusion before. Now that SOLR-12490 allows referring to a DSL query in a domain filter, it's worth combining filtering and exclusion. My understanding is that right now it's one or the other.

> include/exclude filters by tag
> ------------------------------
>
>                 Key: SOLR-9683
>                 URL: https://issues.apache.org/jira/browse/SOLR-9683
>             Project: Solr
>          Issue Type: Improvement
>          Components: Facet Module, JSON Request API
>            Reporter: Yonik Seeley
>            Priority: Major
>
> When specifying a filter list in JSON syntax, it would be nice to be able to include/exclude filters by tag.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
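For illustration of the tag-based exclusion being discussed (the collection field and tag names here are invented, not from the issue): with the JSON Request API, a filter can be tagged and then excluded from one facet's domain, roughly like this:

```json
{
  "query": "*:*",
  "filter": [ "{!tag=COLOR}color:blue" ],
  "facet": {
    "colors": {
      "type": "terms",
      "field": "color",
      "domain": { "excludeTags": "COLOR" }
    }
  }
}
```

Here the colors facet is computed as if the color:blue filter were absent, while the result list and other facets still see it; the comment above is about combining this exclusion with domain filtering in one request.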
[jira] [Resolved] (SOLR-12490) Introducing json.queries was:Query DSL supports for further referring and exclusion in JSON facets
[ https://issues.apache.org/jira/browse/SOLR-12490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mikhail Khludnev resolved SOLR-12490.
-------------------------------------
    Resolution: Fixed

> Introducing json.queries was:Query DSL supports for further referring and exclusion in JSON facets
>
>                 Key: SOLR-12490
>                 URL: https://issues.apache.org/jira/browse/SOLR-12490
>             Project: Solr
>          Issue Type: Improvement
>          Components: Facet Module, faceting
>            Reporter: Mikhail Khludnev
>            Assignee: Mikhail Khludnev
>            Priority: Major
>              Labels: newdev
>             Fix For: 8.5
>
>         Attachments: SOLR-12490-ref-guide.patch, SOLR-12490-ref-guide.patch, SOLR-12490.patch, SOLR-12490.patch, SOLR-12490.patch, SOLR-12490.patch
>
>          Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> It's a spin off from the [discussion|https://issues.apache.org/jira/browse/SOLR-9685?focusedCommentId=16508720&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16508720].
>
> h2. Problem
> # after SOLR-9685 we can tag separate clauses in hairish queries like {{parent}}, {{bool}}
> # we can {{domain.excludeTags}}
> # we are looking for child faceting with exclusions, see SOLR-9510, SOLR-8998
> # but we can refer only to separate params in {{domain.filter}}, it's not possible to refer to separate clauses
>
> see the first comment
[jira] [Commented] (SOLR-12490) Introducing json.queries was:Query DSL supports for further referring and exclusion in JSON facets
[ https://issues.apache.org/jira/browse/SOLR-12490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17015659#comment-17015659 ]

ASF subversion and git services commented on SOLR-12490:
--------------------------------------------------------

Commit c90ef46497cf6314b63b4aeb1d69b2f8b64230bd in lucene-solr's branch refs/heads/branch_8x from Mikhail Khludnev
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=c90ef46 ]

SOLR-12490: Describe json.queries in the ref guide. Link it from many pages. Fix a few errors by the way.

> Introducing json.queries was:Query DSL supports for further referring and exclusion in JSON facets
>
>                 Key: SOLR-12490
>                 URL: https://issues.apache.org/jira/browse/SOLR-12490
[jira] [Commented] (SOLR-12490) Introducing json.queries was:Query DSL supports for further referring and exclusion in JSON facets
[ https://issues.apache.org/jira/browse/SOLR-12490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17015650#comment-17015650 ]

ASF subversion and git services commented on SOLR-12490:
--------------------------------------------------------

Commit 5cf1ffef321cdcd43677d7e4fc3363f73a4ed468 in lucene-solr's branch refs/heads/master from Mikhail Khludnev
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=5cf1ffe ]

SOLR-12490: Describe json.queries in the ref guide. Link it from many pages. Fix a few errors by the way.

> Introducing json.queries was:Query DSL supports for further referring and exclusion in JSON facets
>
>                 Key: SOLR-12490
>                 URL: https://issues.apache.org/jira/browse/SOLR-12490
[jira] [Commented] (LUCENE-9123) JapaneseTokenizer with search mode doesn't work with SynonymGraphFilter
[ https://issues.apache.org/jira/browse/LUCENE-9123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17015612#comment-17015612 ]

Kazuaki Hiraga commented on LUCENE-9123:
----------------------------------------

Hello [~tomoko], please find the attached file named _LUCENE-9123_revised.patch_, which adds new constructors reflecting your comment.

> JapaneseTokenizer with search mode doesn't work with SynonymGraphFilter
> -----------------------------------------------------------------------
>
>                 Key: LUCENE-9123
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9123
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: modules/analysis
>    Affects Versions: 8.4
>            Reporter: Kazuaki Hiraga
>            Priority: Major
>         Attachments: LUCENE-9123.patch, LUCENE-9123_revised.patch
>
> JapaneseTokenizer with `mode=search` or `mode=extended` doesn't work with either SynonymGraphFilter or SynonymFilter when JT generates multiple tokens as an output. If we use `mode=normal`, it is fine. However, we would like to use decomposed tokens to maximize the chance of increasing recall.
>
> Snippet of schema:
> {code:xml}
> positionIncrementGap="100" autoGeneratePhraseQueries="false">
> synonyms="lang/synonyms_ja.txt" tokenizerFactory="solr.JapaneseTokenizerFactory"/>
> tags="lang/stoptags_ja.txt" />
> minimumLength="4"/>
> {code}
>
> A synonym entry that generates the error:
> {noformat}
> 株式会社,コーポレーション
> {noformat}
>
> The following is the output on the console:
> {noformat}
> $ ./bin/solr create_core -c jp_test -d ../config/solrconfs
> ERROR: Error CREATEing SolrCore 'jp_test': Unable to create core [jp_test3]
> Caused by: term: 株式会社 analyzed to a token (株式会社) with position increment != 1 (got: 0)
> {noformat}
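The quoted error comes from synonym-map building rejecting analyzed synonym terms whose tokens do not advance the position by exactly 1; in search mode the tokenizer emits decompounded tokens at the same position (position increment 0). The following is a toy sketch of that validation, not Lucene code; the class and field names are invented for illustration.

```java
import java.util.List;

// Toy model of why mode=search breaks synonym parsing: decompounded tokens
// arrive with position increment 0, and the synonym-map builder requires
// every analyzed token to advance by exactly 1.
public class SynonymAnalysisSketch {
    public static final class Token {
        final String term;
        final int positionIncrement;
        public Token(String term, int positionIncrement) {
            this.term = term;
            this.positionIncrement = positionIncrement;
        }
    }

    /** Throws when any token's increment differs from 1, like the quoted error. */
    public static void validateSynonymTerm(List<Token> tokens) {
        for (Token t : tokens) {
            if (t.positionIncrement != 1) {
                throw new IllegalArgumentException(
                    "term analyzed to a token (" + t.term
                        + ") with position increment != 1 (got: " + t.positionIncrement + ")");
            }
        }
    }
}
```

With mode=normal the compound term stays a single token with increment 1, which is why that mode does not trigger the error.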
[jira] [Updated] (LUCENE-9123) JapaneseTokenizer with search mode doesn't work with SynonymGraphFilter
[ https://issues.apache.org/jira/browse/LUCENE-9123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kazuaki Hiraga updated LUCENE-9123:
-----------------------------------
    Attachment: LUCENE-9123_revised.patch

> JapaneseTokenizer with search mode doesn't work with SynonymGraphFilter
>
>                 Key: LUCENE-9123
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9123
[jira] [Commented] (LUCENE-9134) Port ant-regenerate tasks to Gradle build
[ https://issues.apache.org/jira/browse/LUCENE-9134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17015608#comment-17015608 ]

Erick Erickson commented on LUCENE-9134:
----------------------------------------

OK, bear with me, I'm finally getting my feet wet. First, let me check the magic: settings.gradle essentially points to all the directories containing a build.gradle file we care about, so any task defined in one of the build.gradle files will be found.

So when I just added tasks (well, copied some of Mark's work as a way to start) into lucene/queryparser/build.gradle, the tasks are magically found and I can try to execute "./gradlew regenerate", for instance, which in turn eventually depends on a task (among others) defined like so:

{code}
task runJavaccQueryParser(type: org.apache.lucene.gradle.JavaCC)...
{code}

and fails with

{code}
Could not get unknown property 'org' for project ':lucene:queryparser' of type org.gradle.api.Project.
{code}

but at least it tries.

Which brings me to the next bit. Mark's work has a directory here: ./buildSrc/src/main/groovy/org/apache/lucene/gradle, which has a file JavaCC.groovy that defines the JavaCC class:

{code}
org.apache.lucene.gradle
class JavaCC.
{code}

How are we organizing this kind of thing in the current work? I can move the JavaCC.groovy file anywhere it should go; where would that be? And before I do that, is there a better approach to this problem?

> Port ant-regenerate tasks to Gradle build
> -----------------------------------------
>
>                 Key: LUCENE-9134
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9134
>             Project: Lucene - Core
>          Issue Type: Sub-task
>            Reporter: Erick Erickson
>            Assignee: Erick Erickson
>            Priority: Major
>
> Here are the "regenerate" targets I found in the ant version. There are a couple that I don't have evidence for or against being rebuilt.
>
> // Very top level
> {code:java}
> ./build.xml:
> ./build.xml: failonerror="true">
> ./build.xml: depends="regenerate,-check-after-regeneration"/>
> {code}
> // top level Lucene. This includes the core/build.xml and test-framework/build.xml files
> {code:java}
> ./lucene/build.xml:
> ./lucene/build.xml: inheritall="false">
> ./lucene/build.xml:
> {code}
> // This one has quite a number of customizations
> {code:java}
> ./lucene/core/build.xml: depends="createLevAutomata,createPackedIntSources,jflex"/>
> {code}
> // This one has a bunch of code modifications _after_ javacc is run on certain of the output files. Save this one for last?
> {code:java}
> ./lucene/queryparser/build.xml:
> {code}
> // the files under ../lucene/analysis... are pretty self contained. I expect these could be done as a unit
> {code:java}
> ./lucene/analysis/build.xml:
> ./lucene/analysis/build.xml:
> ./lucene/analysis/common/build.xml: depends="jflex,unicode-data"/>
> ./lucene/analysis/icu/build.xml: depends="gen-utr30-data-files,gennorm2,genrbbi"/>
> ./lucene/analysis/kuromoji/build.xml: depends="build-dict"/>
> ./lucene/analysis/nori/build.xml: depends="build-dict"/>
> ./lucene/analysis/opennlp/build.xml: depends="train-test-models"/>
> {code}
> // These _are_ regenerated from the top-level regenerate target, but for LUCENE-9080 the changes were only in imports, so there are no corresponding files checked in in that JIRA
> {code:java}
> ./lucene/expressions/build.xml: depends="run-antlr"/>
> {code}
> // Apparently unrelated to the ./lucene/analysis/opennlp/build.xml "train-test-models" target
> // Apparently not rebuilt from the top level, but _are_ regenerated when executed from ./solr/contrib/langid
> {code:java}
> ./solr/contrib/langid/build.xml: depends="train-test-models"/>
> {code}
[jira] [Updated] (LUCENE-9136) Introduce IVFFlat for ANN similarity search
[ https://issues.apache.org/jira/browse/LUCENE-9136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xin-Chun Zhang updated LUCENE-9136: --- Description: Representation learning (RL) has been an established discipline in the machine learning space for decades but it draws tremendous attention lately with the emergence of deep learning. The central problem of RL is to determine an optimal representation of the input data. By embedding the data into a high dimensional vector, the vector retrieval (VR) method is then applied to search the relevant items. With the rapid development of RL over the past few years, the technique has been used extensively in industry from online advertising to computer vision and speech recognition. There exist many open source implementations of VR algorithms, such as Facebook's FAISS and Microsoft's SPTAG, providing various choices for potential users. However, the aforementioned implementations are all written in C++, and no plan for supporting Java interface [[https://github.com/facebookresearch/faiss/issues/105]]. The algorithms for vector retrieval can be roughly classified into four categories, # Tree-base algorithms, such as KD-tree; # Hashing methods, such as LSH (Local Sensitive Hashing); # Product quantization algorithms, such as IVFFlat; # Graph-base algorithms, such as HNSW, SSG, NSG; IVFFlat and HNSW are the most popular ones among all the algorithms. Recently, implementation of ANN algorithms for Lucene, such as HNSW (Hierarchical Navigable Small World, LUCENE-9004), has made great progress. IVFFlat requires less memory and disks when compared with HNSW. I'm now working on the implementation of IVFFlat. And I will try my best to reuse the excellent work by LUCENE-9004. was: Representation learning (RL) has been an established discipline in the machine learning space for decades but it draws tremendous attention lately with the emergence of deep learning. 
The central problem of RL is to determine an optimal representation of the input data. By embedding the data into a high dimensional vector, the vector retrieval (VR) method is then applied to search the relevant items. With the rapid development of RL over the past few years, the technique has been used extensively in industry from online advertising to computer vision and speech recognition. There exist many open source implementations of VR algorithms, such as Facebook's FAISS and Microsoft's SPTAG, providing various choices for potential users. However, the aforementioned implementations are all written in C++, and no plan for supporting Java interface [[https://github.com/facebookresearch/faiss/issues/105]]. The algorithms for vector retrieval can be roughly classified into four categories, # Tree-base algorithms, such as KD-tree; # Hashing methods, such as LSH (Local Sensitive Hashing); # Product quantization algorithms, such as IVFFlat; # Graph-base algorithms, such as HNSW, SSG, NSG; IVFFlat and HNSW are the most popular ones among all the algorithms. Recently, implementation of ANN algorithms for Lucene, such as HNSW (Hierarchical Navigable Small World, LUCENE-9004), has made great progress. Compared with HNSW, IVFFlat requires less memory and disks. Introduce IVFFlat to Lucene will provide one more optional choice for interesting users. I'm now working on the implementation of IVFFlat. And I will try my best to reuse the excellent work by LUCENE-9004. > Introduce IVFFlat for ANN similarity search > --- > > Key: LUCENE-9136 > URL: https://issues.apache.org/jira/browse/LUCENE-9136 > Project: Lucene - Core > Issue Type: New Feature >Reporter: Xin-Chun Zhang >Priority: Major > > Representation learning (RL) has been an established discipline in the > machine learning space for decades but it draws tremendous attention lately > with the emergence of deep learning. The central problem of RL is to > determine an optimal representation of the input data. 
By embedding the data > into a high-dimensional vector, a vector retrieval (VR) method is then > applied to search for the relevant items. > With the rapid development of RL over the past few years, the technique has > been used extensively in industry, from online advertising to computer vision > and speech recognition. There exist many open-source implementations of VR > algorithms, such as Facebook's FAISS and Microsoft's SPTAG, providing various > choices for potential users. However, the aforementioned implementations are > all written in C++, and there is no plan to support a Java interface > [https://github.com/facebookresearch/faiss/issues/105]. > The algorithms for vector retrieval can be roughly classified into four > categories: > # Tree-based algorithms, such as the KD-tree; > # Hashing methods, such as LSH (Local
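The inverted-file idea behind IVFFlat can be sketched in a few lines of Java: each vector is bucketed under its nearest coarse centroid, and a query scans only the buckets of the `nProbe` closest centroids instead of the whole collection. All names and the simplistic data layout below are illustrative assumptions, not the LUCENE-9136 implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class IvfFlatSketch {
    final float[][] centroids;
    final List<List<float[]>> lists; // one inverted list per coarse centroid

    IvfFlatSketch(float[][] centroids) {
        this.centroids = centroids;
        this.lists = new ArrayList<>();
        for (int i = 0; i < centroids.length; i++) lists.add(new ArrayList<>());
    }

    static float dist2(float[] a, float[] b) {
        float d = 0;
        for (int i = 0; i < a.length; i++) { float t = a[i] - b[i]; d += t * t; }
        return d;
    }

    // index a vector under its nearest centroid
    void add(float[] v) {
        int best = 0;
        for (int i = 1; i < centroids.length; i++)
            if (dist2(v, centroids[i]) < dist2(v, centroids[best])) best = i;
        lists.get(best).add(v);
    }

    // exact ("flat") scan restricted to the nProbe nearest inverted lists
    float[] searchNearest(float[] q, int nProbe) {
        Integer[] order = new Integer[centroids.length];
        for (int i = 0; i < order.length; i++) order[i] = i;
        Arrays.sort(order, Comparator.comparingDouble(i -> dist2(q, centroids[i])));
        float[] best = null;
        for (int p = 0; p < Math.min(nProbe, order.length); p++)
            for (float[] v : lists.get(order[p]))
                if (best == null || dist2(q, v) < dist2(q, best)) best = v;
        return best;
    }

    public static void main(String[] args) {
        IvfFlatSketch index = new IvfFlatSketch(new float[][] { {0, 0}, {10, 10} });
        index.add(new float[] {1, 1});
        index.add(new float[] {9, 9});
        float[] nearest = index.searchNearest(new float[] {0.4f, 0.6f}, 1);
        System.out.println(Arrays.toString(nearest)); // prints [1.0, 1.0]
    }
}
```

This also shows the memory trade-off mentioned in the issue: IVFFlat stores only centroids plus the raw vectors, while a graph method such as HNSW additionally stores per-node neighbor links.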
[jira] [Updated] (LUCENE-9136) Introduce IVFFlat for ANN similarity search
[ https://issues.apache.org/jira/browse/LUCENE-9136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xin-Chun Zhang updated LUCENE-9136: --- Summary: Introduce IVFFlat for ANN similarity search (was: Add delete action for HNSW and fix merger when segments contain deleted vectors) > Introduce IVFFlat for ANN similarity search > --- > > Key: LUCENE-9136 > URL: https://issues.apache.org/jira/browse/LUCENE-9136 > Project: Lucene - Core > Issue Type: Bug >Reporter: Xin-Chun Zhang >Priority: Major > > This issue is -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Updated] (LUCENE-9136) Introduce IVFFlat for ANN similarity search
[ https://issues.apache.org/jira/browse/LUCENE-9136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xin-Chun Zhang updated LUCENE-9136: --- Description: (was: This issue is ) Issue Type: New Feature (was: Bug) > Introduce IVFFlat for ANN similarity search > --- > > Key: LUCENE-9136 > URL: https://issues.apache.org/jira/browse/LUCENE-9136 > Project: Lucene - Core > Issue Type: New Feature > Reporter: Xin-Chun Zhang > Priority: Major
[GitHub] [lucene-solr] madrob commented on issue #1042: LUCENE-9068: Build FuzzyQuery automata up-front
madrob commented on issue #1042: LUCENE-9068: Build FuzzyQuery automata up-front URL: https://github.com/apache/lucene-solr/pull/1042#issuecomment-574415386 LGTM after precommit comes back happy. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] madrob commented on issue #1157: Add RAT check using Gradle
madrob commented on issue #1157: Add RAT check using Gradle URL: https://github.com/apache/lucene-solr/pull/1157#issuecomment-574409798 One final round of updates: * I moved the java plugin check to inside of the task and verified that the other projects would actually get called/fail with bad headers. * Added support for incremental builds, which should reduce the impact on most developers' build times. * Added in the missing directories that didn't have their own projects, and a landmine in case those do end up becoming projects, so that somebody pays attention and fixes it (probably one of us). I think that's everything in this PR, and we're ready to merge? We can switch from the Ant task to the Rat classes directly in a later change if we decide that's worthwhile. > To be honest I would prefer not to add those license headers to build files, unless it's a requirement of apache legal -- this would have to be verified. For one thing, I don't perceive build files as something particularly valuable as an intellectual property (although it surely does require intellectual input to write them). But even then, the distribution bundle comes with a top-level license file that covers them? I guess we'll hear from LEGAL about this one way or the other. Not going to bother adding them for this PR until somebody does the checks. > For now I'd rather keep it in this "aspect-oriented" form if you don't mind (but this is a subjective decision, not any better or common practice). That's fine. I just noticed this and was curious, since we were moving from the project-oriented approach with ant to this approach, and wanted to make sure it was a conscious decision. 
[jira] [Created] (SOLR-14187) SolrJ async admin request helpers don't work with per-request based authentication
Chris M. Hostetter created SOLR-14187: - Summary: SolrJ async admin request helpers don't work with per-request based authentication Key: SOLR-14187 URL: https://issues.apache.org/jira/browse/SOLR-14187 Project: Solr Issue Type: Bug Security Level: Public (Default Security Level. Issues are Public) Components: security, SolrJ Reporter: Chris M. Hostetter Discovered this set of related bugs while trying to write a test using BasicAuth... * {{AsyncCollectionAdminRequest.processAndWait(...)}} doesn't copy any authentication settings (for example, if the user called {{setBasicAuthCredentials(...)}}) from the original {{AsyncCollectionAdminRequest}} when creating the underlying {{RequestStatus}} instance * Even if clients create their own {{RequestStatus}} instance and set credentials on it before calling {{waitFor(...)}}, _it_ doesn't copy those credentials when trying to call {{deleteAsyncId(...)}} on {{COMPLETED}} or {{FAILED}} results
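Both bugs reduce to the same pattern: whenever a helper derives a follow-up request (status poll, async-id delete) from a user-supplied request, the original request's credentials must be carried over. A minimal illustration of that pattern, using hypothetical stand-in classes rather than the actual SolrJ types:

```java
public class AuthPropagation {
    // hypothetical stand-in for a SolrJ admin request; not the real API surface
    static class Request {
        String basicAuthUser;
        String basicAuthPassword;

        void setBasicAuthCredentials(String user, String password) {
            this.basicAuthUser = user;
            this.basicAuthPassword = password;
        }
    }

    // the reported bug: deriving a status-polling request WITHOUT copying
    // credentials; the fix is to always propagate the original auth settings
    static Request newStatusRequest(Request original) {
        Request status = new Request();
        status.setBasicAuthCredentials(original.basicAuthUser, original.basicAuthPassword);
        return status;
    }

    public static void main(String[] args) {
        Request create = new Request();
        create.setBasicAuthCredentials("solr", "SolrRocks");
        Request status = newStatusRequest(create);
        System.out.println(status.basicAuthUser); // prints solr
    }
}
```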
[jira] [Commented] (LUCENE-9125) Improve Automaton.step() with binary search and introduce Automaton.next()
[ https://issues.apache.org/jira/browse/LUCENE-9125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17015341#comment-17015341 ] David Smiley commented on LUCENE-9125: -- [~broustant] could you please post lucene-util benchmark results here? I was looking at the Lucene nightly benchmarks for fuzzy queries and I see a sudden drop, e.g.: https://home.apache.org/~mikemccand/lucenebench/Fuzzy2.html But many other queries suddenly dropped like even https://home.apache.org/~mikemccand/lucenebench/TermDTSort.html that have nothing to do with automata. CC [~mikemccand] > Improve Automaton.step() with binary search and introduce Automaton.next() > -- > > Key: LUCENE-9125 > URL: https://issues.apache.org/jira/browse/LUCENE-9125 > Project: Lucene - Core > Issue Type: Improvement > Reporter: Bruno Roustant > Assignee: Bruno Roustant > Priority: Major > Fix For: 8.5 > > Time Spent: 40m > Remaining Estimate: 0h > > Implement the existing todo in Automaton.step() (lookup a transition from a > source state depending on a given label) to use binary search since the > transitions are sorted. > Introduce new method Automaton.next() to optimize iteration & lookup over all > the transitions of a state. This will be used in RunAutomaton constructor and > in MinimizationOperations.minimize().
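The optimization the issue describes — binary search over a state's sorted transitions instead of a linear scan — can be sketched as follows. The flat `{minLabel, maxLabel, destState}` layout is a simplifying assumption for illustration, not Lucene's actual packed transition encoding.

```java
public class TransitionLookup {
    // transitions[i] = {minLabel, maxLabel, destState}, sorted by minLabel,
    // with non-overlapping [minLabel, maxLabel] ranges
    static int step(int[][] transitions, int label) {
        int lo = 0, hi = transitions.length - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1; // unsigned shift avoids overflow
            int[] t = transitions[mid];
            if (label < t[0]) {
                hi = mid - 1;          // label is below this range
            } else if (label > t[1]) {
                lo = mid + 1;          // label is above this range
            } else {
                return t[2];           // label falls inside [min, max]
            }
        }
        return -1; // no transition accepts this label
    }

    public static void main(String[] args) {
        int[][] transitions = { {'a', 'c', 1}, {'f', 'h', 2}, {'x', 'z', 3} };
        System.out.println(step(transitions, 'g')); // prints 2
    }
}
```

Because the ranges are sorted and disjoint, the lookup drops from O(n) to O(log n) per label, which matters in hot paths like RunAutomaton construction.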
[GitHub] [lucene-solr] madrob commented on a change in pull request #1157: Add RAT check using Gradle
madrob commented on a change in pull request #1157: Add RAT check using Gradle URL: https://github.com/apache/lucene-solr/pull/1157#discussion_r366524386 ## File path: gradle/validation/rat-sources.gradle ## @@ -0,0 +1,174 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ + +import groovy.xml.NamespaceBuilder + +allprojects { +configurations { +rat +} + +dependencies { +rat 'org.apache.rat:apache-rat-tasks:0.13' +} +} + +subprojects { +plugins.withId("java", { Review comment: We _could_ explicitly code the `src/java` and `src/test` paths here, but we are also already doing this in `gradle/ant-compat/folder-layout.gradle` and I'd like to stick to doing it in one place. As for projects with files but not the java plugin... do we have those? I'll need to figure out what happens if we have a failed file in lucene/ directory directly... This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [lucene-solr] nknize commented on a change in pull request #762: LUCENE-8903: Add LatLonShape point query
nknize commented on a change in pull request #762: LUCENE-8903: Add LatLonShape point query URL: https://github.com/apache/lucene-solr/pull/762#discussion_r301264563 ## File path: lucene/sandbox/src/java/org/apache/lucene/document/LatLonShapePointQuery.java ## @@ -0,0 +1,144 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.lucene.document; + +import java.util.Arrays; + +import org.apache.lucene.geo.GeoEncodingUtils; +import org.apache.lucene.index.PointValues; +import org.apache.lucene.index.PointValues.Relation; +import org.apache.lucene.util.NumericUtils; + +import static java.lang.Integer.BYTES; +import static org.apache.lucene.geo.GeoUtils.orient; + +/** + * Finds all previously indexed shapes that intersect the specified bounding box. + * + * The field must be indexed using + * {@link LatLonShape#createIndexableFields} added per document. 
+ * + * @lucene.experimental + **/ +final class LatLonShapePointQuery extends LatLonShapeQuery { + final double lat; + final double lon; + final int latEnc; + final int lonEnc; + final byte[] point; + + public LatLonShapePointQuery(String field, LatLonShape.QueryRelation queryRelation, double lat, double lon) { +super(field, queryRelation); +this.lat = lat; +this.lon = lon; +this.point = new byte[2 * LatLonShape.BYTES]; +this.lonEnc = GeoEncodingUtils.encodeLongitude(lon); +this.latEnc = GeoEncodingUtils.encodeLatitude(lat); +NumericUtils.intToSortableBytes(latEnc, this.point, 0); +NumericUtils.intToSortableBytes(lonEnc, this.point, LatLonShape.BYTES); + } + + @Override + protected Relation relateRangeBBoxToQuery(int minXOffset, int minYOffset, byte[] minTriangle, +int maxXOffset, int maxYOffset, byte[] maxTriangle) { +if (Arrays.compareUnsigned(minTriangle, minXOffset, minXOffset + BYTES, point, BYTES, 2 * BYTES) > 0 || +Arrays.compareUnsigned(maxTriangle, maxXOffset, maxXOffset + BYTES, point, BYTES, 2 * BYTES) < 0 || +Arrays.compareUnsigned(minTriangle, minYOffset, minYOffset + BYTES, point, 0, BYTES) > 0 || +Arrays.compareUnsigned(maxTriangle, maxYOffset, maxYOffset + BYTES, point, 0, BYTES) < 0) { + return PointValues.Relation.CELL_OUTSIDE_QUERY; +} +return PointValues.Relation.CELL_CROSSES_QUERY; + } + + /** returns true if the query matches the encoded triangle */ + @Override + protected boolean queryMatches(byte[] t, int[] scratchTriangle, LatLonShape.QueryRelation queryRelation) { + +// decode indexed triangle +LatLonShape.decodeTriangle(t, scratchTriangle); + +int aY = scratchTriangle[0]; +int aX = scratchTriangle[1]; +int bY = scratchTriangle[2]; +int bX = scratchTriangle[3]; +int cY = scratchTriangle[4]; +int cX = scratchTriangle[5]; + +if (queryRelation == LatLonShape.QueryRelation.WITHIN) { + if (aY == bY && cY == aY && aX == bX && cX == aX) { + return lonEnc == aX && latEnc == aY; + } + return false; +} +return pointInTriangle(lonEnc, latEnc, aX, aY, 
bX, bY, cX, cY); + } + + //This should be moved when LatLonShape is moved from sandbox! + /** + * Compute whether the given x, y point is in a triangle; uses the winding order method */ + private static boolean pointInTriangle (double x, double y, double ax, double ay, double bx, double by, double cx, double cy) { Review comment: duplicate of? https://github.com/apache/lucene-solr/blob/ac209b637d68c84ce1402b6b8967514ce9cf6854/lucene/sandbox/src/java/org/apache/lucene/geo/Tessellator.java#L795
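The winding-order test under review can be sketched like this. It is a simplified illustration of the general technique, not the Tessellator code the reviewer links to: Lucene's orient() uses a robust geometric predicate, while this sketch uses a plain cross-product sign.

```java
// Hypothetical sketch of a winding-order point-in-triangle test.
final class TriangleTest {
  // sign of the cross product (b - a) x (p - a): which side of a->b is p on?
  static int orient(double ax, double ay, double bx, double by, double px, double py) {
    double v = (bx - ax) * (py - ay) - (by - ay) * (px - ax);
    return v > 0 ? 1 : (v < 0 ? -1 : 0);
  }

  /** true if (x, y) lies inside or on the triangle a-b-c */
  static boolean pointInTriangle(double x, double y,
                                 double ax, double ay, double bx, double by,
                                 double cx, double cy) {
    int o1 = orient(ax, ay, bx, by, x, y);
    int o2 = orient(bx, by, cx, cy, x, y);
    int o3 = orient(cx, cy, ax, ay, x, y);
    // inside when all non-zero orientations agree, so the test works
    // whether the triangle is wound clockwise or counter-clockwise
    boolean hasNeg = o1 < 0 || o2 < 0 || o3 < 0;
    boolean hasPos = o1 > 0 || o2 > 0 || o3 > 0;
    return !(hasNeg && hasPos);
  }
}
```

The reviewer's point stands either way: if an equivalent helper already exists in Tessellator, the sandbox query should reuse it rather than duplicate it.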
[GitHub] [lucene-solr] irvingzhang opened a new pull request #1169: LUCENE-9004: A minor feature and patch -- support deleting vector values and fix segments merging
irvingzhang opened a new pull request #1169: LUCENE-9004: A minor feature and patch -- support deleting vector values and fix segments merging URL: https://github.com/apache/lucene-solr/pull/1169 I think this commit belongs to this issue (https://issues.apache.org/jira/browse/LUCENE-9004). I'm not sure if I need to create a new issue. My considerations are as follows: 1. A minor feature: Regarding the ANN search problem, it's dangerous to delete vectors according to similarity search results in HNSW. The selected docs are neither sorted nor reduced, the number of deleted vectors is proportional to the segment count and the parameter ef, and the set of deleted vectors is therefore uncertain. Hence, I created a new type of Query (KnnDelQuery) and Weight (KnnDelScoreWeight) dedicated to deleting exactly the values that match the query vector. 2. A minor patch: fixes the merge process when some segments contain deleted documents that must be filtered out. The modified code has been tested by the test cases.
[jira] [Created] (LUCENE-9137) Broken link 'Change log' for 8.4.1 on https://lucene.apache.org/core/downloads.html
Sebb created LUCENE-9137: Summary: Broken link 'Change log' for 8.4.1 on https://lucene.apache.org/core/downloads.html Key: LUCENE-9137 URL: https://issues.apache.org/jira/browse/LUCENE-9137 Project: Lucene - Core Issue Type: Bug Environment: Broken link 'Change log' for 8.4.1 on https://lucene.apache.org/core/downloads.html Reporter: Sebb
[jira] [Updated] (LUCENE-9136) Add delete action for HNSW and fix merger when segments contain deleted vectors
[ https://issues.apache.org/jira/browse/LUCENE-9136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xin-Chun Zhang updated LUCENE-9136: --- Description: This issue is > Add delete action for HNSW and fix merger when segments contain deleted > vectors > --- > > Key: LUCENE-9136 > URL: https://issues.apache.org/jira/browse/LUCENE-9136 > Project: Lucene - Core > Issue Type: Bug >Reporter: Xin-Chun Zhang >Priority: Major > > This issue is
[jira] [Created] (LUCENE-9136) Add delete action for HNSW and fix merger when segments contain deleted vectors
Xin-Chun Zhang created LUCENE-9136: -- Summary: Add delete action for HNSW and fix merger when segments contain deleted vectors Key: LUCENE-9136 URL: https://issues.apache.org/jira/browse/LUCENE-9136 Project: Lucene - Core Issue Type: Bug Reporter: Xin-Chun Zhang
[jira] [Assigned] (LUCENE-9134) Port ant-regenerate tasks to Gradle build
[ https://issues.apache.org/jira/browse/LUCENE-9134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erick Erickson reassigned LUCENE-9134: -- Assignee: Erick Erickson > Port ant-regenerate tasks to Gradle build > - > > Key: LUCENE-9134 > URL: https://issues.apache.org/jira/browse/LUCENE-9134 > Project: Lucene - Core > Issue Type: Sub-task >Reporter: Erick Erickson >Assignee: Erick Erickson >Priority: Major > > Here are the "regenerate" targets I found in the ant version. There are a > couple that I don't have evidence for or against being rebuilt > // Very top level > {code:java} > ./build.xml: > ./build.xml: failonerror="true"> > ./build.xml: depends="regenerate,-check-after-regeneration"/> > {code} > // top level Lucene. This includes the core/build.xml and > test-framework/build.xml files > {code:java} > ./lucene/build.xml: > ./lucene/build.xml: inheritall="false"> > ./lucene/build.xml: > {code} > // This one has quite a number of customizations to > {code:java} > ./lucene/core/build.xml: depends="createLevAutomata,createPackedIntSources,jflex"/> > {code} > // This one has a bunch of code modifications _after_ javacc is run on > certain of the > // output files. Save this one for last? > {code:java} > ./lucene/queryparser/build.xml: > {code} > // the files under ../lucene/analysis... are pretty self contained. 
I expect > these could be done as a unit > {code:java} > ./lucene/analysis/build.xml: > ./lucene/analysis/build.xml: > ./lucene/analysis/common/build.xml: depends="jflex,unicode-data"/> > ./lucene/analysis/icu/build.xml: depends="gen-utr30-data-files,gennorm2,genrbbi"/> > ./lucene/analysis/kuromoji/build.xml: depends="build-dict"/> > ./lucene/analysis/nori/build.xml: depends="build-dict"/> > ./lucene/analysis/opennlp/build.xml: depends="train-test-models"/> > {code} > > // These _are_ regenerated from the top-level regenerate target, but for -- > LUCENE-9080//the changes were only in imports so there are no > //corresponding files checked in in that JIRA > {code:java} > ./lucene/expressions/build.xml: depends="run-antlr"/> > {code} > // Apparently unrelated to ./lucene/analysis/opennlp/build.xml > "train-test-models" target > // Apparently not rebuilt from the top level, but _are_ regenerated when > executed from > // ./solr/contrib/langid > {code:java} > ./solr/contrib/langid/build.xml: depends="train-test-models"/> > {code}
[GitHub] [lucene-solr] bruno-roustant opened a new pull request #1168: LUCENE-9135: Make uniformsplit.FieldMetadata counters long.
bruno-roustant opened a new pull request #1168: LUCENE-9135: Make uniformsplit.FieldMetadata counters long. URL: https://github.com/apache/lucene-solr/pull/1168
[jira] [Created] (LUCENE-9135) UniformSplit FieldMetadata counters should all be long
Bruno Roustant created LUCENE-9135: -- Summary: UniformSplit FieldMetadata counters should all be long Key: LUCENE-9135 URL: https://issues.apache.org/jira/browse/LUCENE-9135 Project: Lucene - Core Issue Type: Bug Reporter: Bruno Roustant Assignee: Bruno Roustant Currently UniformSplit FieldMetadata stores sumDocFreq, numTerms, sumTotalTermFreq as int, which is incorrect. The fix is to make them long. The postings format will remain compatible since those counters are currently written as VInt and they will be read as VLong (and then written as VLong afterwards).
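The compatibility claim in the issue rests on Lucene's variable-length integer encoding: vInt and vLong share the same 7-bits-per-byte, high-bit-continuation layout, so any value written as a vInt decodes to the same value when read back as a vLong. A minimal sketch of that property (a simplified re-implementation for illustration, not Lucene's DataOutput/DataInput code):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the vInt/vLong wire format: 7 payload bits per byte,
// high bit set on all bytes except the last.
final class VarLenDemo {
  static List<Byte> writeVInt(int v) {
    List<Byte> out = new ArrayList<>();
    while ((v & ~0x7F) != 0) {
      out.add((byte) ((v & 0x7F) | 0x80)); // low 7 bits + continuation flag
      v >>>= 7;
    }
    out.add((byte) v); // final byte, high bit clear
    return out;
  }

  static long readVLong(List<Byte> in) {
    long value = 0;
    int shift = 0;
    for (byte b : in) {
      value |= (long) (b & 0x7F) << shift;
      shift += 7;
      if ((b & 0x80) == 0) break; // last byte of this value
    }
    return value;
  }
}
```

Since the byte stream carries no type information, widening the in-memory field from int to long changes only the write call, not the on-disk format, which is why existing indexes stay readable.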
[GitHub] [lucene-solr] risdenk commented on a change in pull request #1152: SOLR-14172: Collection metadata remains in zookeeper if too many shards requested
risdenk commented on a change in pull request #1152: SOLR-14172: Collection metadata remains in zookeeper if too many shards requested URL: https://github.com/apache/lucene-solr/pull/1152#discussion_r366413865 ## File path: solr/core/src/java/org/apache/solr/cloud/api/collections/CreateCollectionCmd.java ## @@ -190,10 +190,12 @@ public void call(ClusterState clusterState, ZkNodeProps message, NamedList resul try { replicaPositions = buildReplicaPositions(ocmh.cloudManager, clusterState, clusterState.getCollection(collectionName), message, shardNames, sessionWrapper); } catch (Assign.AssignmentException e) { -ZkNodeProps deleteMessage = new ZkNodeProps("name", collectionName); -new DeleteCollectionCmd(ocmh).call(clusterState, deleteMessage, results); +deleteCollection(clusterState, results, collectionName); // unwrap the exception throw new SolrException(ErrorCode.SERVER_ERROR, e.getMessage(), e.getCause()); + } catch (SolrException e) { Review comment: So one question I have is why is the error coming back from `buildReplicaPositions` not an `Assign.AssignmentException`? Is it because it is wrapped in a `SolrException` from the remote node?
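The wrapping question in the review comment points at a common alternative to adding a second catch clause: walk the cause chain and react to the specific exception type wherever it sits. A generic sketch of that pattern (illustrative only, not the actual Solr code):

```java
// Hypothetical sketch: a failure from a remote node often arrives wrapped
// (e.g. an Assign.AssignmentException inside a SolrException), so the
// handler scans the cause chain for the type it cares about.
final class CauseScan {
  /** returns the first throwable of the given type in the cause chain, or null */
  static <T extends Throwable> T findCause(Throwable t, Class<T> type) {
    for (Throwable c = t; c != null; c = c.getCause()) {
      if (type.isInstance(c)) {
        return type.cast(c);
      }
    }
    return null;
  }
}
```

With such a helper, one catch block could handle both the direct and the remotely-wrapped case, which is essentially what the reviewer is probing for.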
[jira] [Commented] (LUCENE-9117) RamUsageEstimator hangs with AOT compilation
[ https://issues.apache.org/jira/browse/LUCENE-9117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17015152#comment-17015152 ] ASF subversion and git services commented on LUCENE-9117: - Commit fb5ba8c9de62baba91d32c3b1e9b2faea8fe5f01 in lucene-solr's branch refs/heads/gradle-master from Dawid Weiss [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=fb5ba8c ] LUCENE-9117: follow-up. > RamUsageEstimator hangs with AOT compilation > > > Key: LUCENE-9117 > URL: https://issues.apache.org/jira/browse/LUCENE-9117 > Project: Lucene - Core > Issue Type: Bug >Reporter: Dawid Weiss >Assignee: Dawid Weiss >Priority: Minor > Attachments: LUCENE-9117.patch > > > Mailing list report by Cleber Muramoto. > {code} > After generating a pre-compiled image lucene-core (8.3.0) with jaotc (JDK > 13.0.1), RamUsageEstimator class is never loaded - it fails to complete the > static initializer. > Steps to reproduce: > 1)Generate the image with > >jaotc --info --ignore-errors --jar > >~/.m2/repository/org/apache/lucene/lucene-core/8.3.0/lucene-core-8.3.0.jar > >--output lucene-core.so > 2)Create a simple test class to trigger class loading > import java.io.IOException; > import java.nio.file.Path; > import java.nio.file.Paths; > import org.apache.lucene.index.IndexWriter; > import org.apache.lucene.index.IndexWriterConfig; > import org.apache.lucene.store.MMapDirectory; > public class TestAOT { > public static void main(String...args){ > run(); > System.out.println("Done"); > } > static void run(){ > try { > var iw = new IndexWriter(MMapDirectory.open(Paths.get("dir")), new > IndexWriterConfig()); > iw.close(); > } catch (IOException e) { > e.printStackTrace(); > } > } > } > 3)Run the class with the pre-compiled image > >java -XX:AOTLibrary=./lucene-core.so -XX:+PrintAOT -cp > >~/.m2/repository/org/apache/lucene/lucene-core/8.3.0/lucene-core-8.3.0.jar > >TestAOT.java > 4)The program never completes. 
After inspecting it with jstack it > shows that it's stuck in line 195 of RamUsageEstimator > "main" #1 prio=5 os_prio=0 cpu=174827,04ms elapsed=173,91s > tid=0x7fe984018800 nid=0x2daf runnable [0x7fe98bc3c000] > java.lang.Thread.State: RUNNABLE > at > org.apache.lucene.util.RamUsageEstimator.(RamUsageEstimator.java:195) > at org.apache.lucene.util.ArrayUtil.(ArrayUtil.java:31) > at org.apache.lucene.index.IndexWriter.(IndexWriter.java:276) > at TestAOT.run(TestAOT.java:20) > at TestAOT.main(TestAOT.java:14) > at > jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(java.base@13.0.1/Native > Method) > at > jdk.internal.reflect.NativeMethodAccessorImpl.invoke(java.base@13.0.1/NativeMethodAccessorImpl.java:62) > at > jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(java.base@13.0.1/DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(java.base@13.0.1/Method.java:567) > at > com.sun.tools.javac.launcher.Main.execute(jdk.compiler@13.0.1/Main.java:415) > at > com.sun.tools.javac.launcher.Main.run(jdk.compiler@13.0.1/Main.java:192) > at com.sun.tools.javac.launcher.Main.main(jdk.compiler@13.0.1 > /Main.java:132) > My guess is that the AOT compiler aggressive optimizations like > inlining/scalar replacing calls to Long.valueOf, are working against the > code's desired semantics. > Perhaps the logic to determine LONG_CACHE_[MIN/MAX]_VALUE could be > conditioned to VM version/vendor in Constants class. From [Open|Oracle]JDK > 7 onwards (don't know about older versions) Long cache has a fixed range of > [-128,127], so there's no need to loop until a non-cached boxed value shows > up. > I know this compiler bug isn't Lucene's fault and as a workaround one could > use --compile-for-tiered or exclude RamUsageEstimator's methods from jaotc, > however, it would be nice to have a faster, allocation/loop-free > initializer for RamUsageEstimator nevertheless. 
> {code}
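The cache probing that RamUsageEstimator's static initializer attempted (and that the committed fix removed) boils down to an identity check on boxed values: Long.valueOf returns the same instance for values it caches. A minimal sketch of that idea, for illustration only:

```java
// Hypothetical sketch of probing the Long.valueOf cache by identity.
// The Long.valueOf javadoc guarantees caching only for -128..127; looping
// upward until this returns false is exactly the pattern that hung under
// AOT compilation, because the compiler may defeat the identity comparison.
final class LongCacheProbe {
  static boolean isCached(long v) {
    // cached values box to the same instance, so == holds
    return Long.valueOf(v) == Long.valueOf(v);
  }
}
```

The fix in LUCENE-9117 sidesteps the problem entirely by no longer trying to measure the cache size at runtime, since the optimizer-dependent identity semantics make the probe unreliable.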
[jira] [Commented] (LUCENE-9117) RamUsageEstimator hangs with AOT compilation
[ https://issues.apache.org/jira/browse/LUCENE-9117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17015146#comment-17015146 ] ASF subversion and git services commented on LUCENE-9117: - Commit 742301ca155f556b1e7374d7662a14608659f84b in lucene-solr's branch refs/heads/gradle-master from Dawid Weiss [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=742301c ] LUCENE-9117: RamUsageEstimator hangs with AOT compilation. Removed any attempt to estimate Long.valueOf cache size.
[jira] [Commented] (LUCENE-9117) RamUsageEstimator hangs with AOT compilation
[ https://issues.apache.org/jira/browse/LUCENE-9117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17015142#comment-17015142 ] Dawid Weiss commented on LUCENE-9117: - Thanks. I'll prepare a pull request, test and commit.
[jira] [Commented] (SOLR-14040) solr.xml shareSchema does not work in SolrCloud
[ https://issues.apache.org/jira/browse/SOLR-14040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17015114#comment-17015114 ] David Smiley commented on SOLR-14040: - Added PR with details. > solr.xml shareSchema does not work in SolrCloud > --- > > Key: SOLR-14040 > URL: https://issues.apache.org/jira/browse/SOLR-14040 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) > Components: Schema and Analysis >Reporter: David Smiley >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > solr.xml has a shareSchema boolean option that can be toggled from the > default of false to true in order to share IndexSchema objects within the > Solr node. This is silently ignored in SolrCloud mode. The pertinent code > is {{org.apache.solr.core.ConfigSetService#createConfigSetService}} which > creates a CloudConfigSetService that is not related to the SchemaCaching > class. This may not be a big deal in SolrCloud which tends not to deal well > with many cores per node but I'm working on changing that.
[jira] [Assigned] (SOLR-14040) solr.xml shareSchema does not work in SolrCloud
[ https://issues.apache.org/jira/browse/SOLR-14040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Smiley reassigned SOLR-14040: --- Assignee: David Smiley
[GitHub] [lucene-solr] ctargett commented on issue #1164: SOLR-12930: Create developer docs in source repo
ctargett commented on issue #1164: SOLR-12930: Create developer docs in source repo URL: https://github.com/apache/lucene-solr/pull/1164#issuecomment-574181698 > The gradle branch has a number of guide-style txt files under top-level help/ folder. I don't know how this fits with this proposal? Oh, cool, thanks for pointing those out. Those files appear to be pretty much about the Gradle build, so under this proposal those docs should move to the new top-level `dev-docs` directory. To start we could just put them in that directory, and later, as we move content into source, we could add some organization via sub-directories to help people out. If it looks like we'll go ahead with this proposal, I'd be happy to make the changes on the branch to minimize your workload. They should also be converted to .adoc file format, which would be easy for me to do rather quickly, and I'd offer to do that.
[jira] [Commented] (LUCENE-9117) RamUsageEstimator hangs with AOT compilation
[ https://issues.apache.org/jira/browse/LUCENE-9117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17015082#comment-17015082 ] Adrien Grand commented on LUCENE-9117:
--
+1

> RamUsageEstimator hangs with AOT compilation
>
> Key: LUCENE-9117
> URL: https://issues.apache.org/jira/browse/LUCENE-9117
> Project: Lucene - Core
> Issue Type: Bug
> Reporter: Dawid Weiss
> Assignee: Dawid Weiss
> Priority: Minor
> Attachments: LUCENE-9117.patch
>
> Mailing list report by Cleber Muramoto.
> {code}
> After generating a pre-compiled image of lucene-core (8.3.0) with jaotc (JDK 13.0.1),
> the RamUsageEstimator class is never loaded - it fails to complete the static initializer.
>
> Steps to reproduce:
>
> 1) Generate the image:
>
>   jaotc --info --ignore-errors \
>     --jar ~/.m2/repository/org/apache/lucene/lucene-core/8.3.0/lucene-core-8.3.0.jar \
>     --output lucene-core.so
>
> 2) Create a simple test class to trigger class loading:
>
>   import java.io.IOException;
>   import java.nio.file.Path;
>   import java.nio.file.Paths;
>   import org.apache.lucene.index.IndexWriter;
>   import org.apache.lucene.index.IndexWriterConfig;
>   import org.apache.lucene.store.MMapDirectory;
>
>   public class TestAOT {
>     public static void main(String... args) {
>       run();
>       System.out.println("Done");
>     }
>     static void run() {
>       try {
>         var iw = new IndexWriter(MMapDirectory.open(Paths.get("dir")), new IndexWriterConfig());
>         iw.close();
>       } catch (IOException e) {
>         e.printStackTrace();
>       }
>     }
>   }
>
> 3) Run the class with the pre-compiled image:
>
>   java -XX:AOTLibrary=./lucene-core.so -XX:+PrintAOT \
>     -cp ~/.m2/repository/org/apache/lucene/lucene-core/8.3.0/lucene-core-8.3.0.jar \
>     TestAOT.java
>
> 4) The program never completes. After inspecting it with jstack, it shows that
> it's stuck in line 195 of RamUsageEstimator:
>
>   "main" #1 prio=5 os_prio=0 cpu=174827,04ms elapsed=173,91s tid=0x7fe984018800 nid=0x2daf runnable [0x7fe98bc3c000]
>   java.lang.Thread.State: RUNNABLE
>     at org.apache.lucene.util.RamUsageEstimator.<clinit>(RamUsageEstimator.java:195)
>     at org.apache.lucene.util.ArrayUtil.<clinit>(ArrayUtil.java:31)
>     at org.apache.lucene.index.IndexWriter.<clinit>(IndexWriter.java:276)
>     at TestAOT.run(TestAOT.java:20)
>     at TestAOT.main(TestAOT.java:14)
>     at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(java.base@13.0.1/Native Method)
>     at jdk.internal.reflect.NativeMethodAccessorImpl.invoke(java.base@13.0.1/NativeMethodAccessorImpl.java:62)
>     at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(java.base@13.0.1/DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(java.base@13.0.1/Method.java:567)
>     at com.sun.tools.javac.launcher.Main.execute(jdk.compiler@13.0.1/Main.java:415)
>     at com.sun.tools.javac.launcher.Main.run(jdk.compiler@13.0.1/Main.java:192)
>     at com.sun.tools.javac.launcher.Main.main(jdk.compiler@13.0.1/Main.java:132)
>
> My guess is that the AOT compiler's aggressive optimizations, like
> inlining/scalar-replacing calls to Long.valueOf, are working against the
> code's desired semantics.
>
> Perhaps the logic to determine LONG_CACHE_[MIN/MAX]_VALUE could be
> conditioned on VM version/vendor in the Constants class. From [Open|Oracle]JDK 7
> onwards (don't know about older versions) the Long cache has a fixed range of
> [-128,127], so there's no need to loop until a non-cached boxed value shows up.
>
> I know this compiler bug isn't Lucene's fault, and as a workaround one could
> use --compile-for-tiered or exclude RamUsageEstimator's methods from jaotc;
> however, it would be nice to have a faster, allocation/loop-free
> initializer for RamUsageEstimator nevertheless.
> {code}
--
This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (SOLR-14184) replace DirectUpdateHandler2.commitOnClose with something in TestInjection
[ https://issues.apache.org/jira/browse/SOLR-14184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17015060#comment-17015060 ] Lucene/Solr QA commented on SOLR-14184:
---
-1 overall

|| Vote || Subsystem || Runtime || Comment ||
|| Prechecks ||
| +1 | test4tests | 0m 0s | The patch appears to include 10 new or modified test files. |
|| master Compile Tests ||
| +1 | compile | 1m 9s | master passed |
|| Patch Compile Tests ||
| +1 | compile | 1m 10s | the patch passed |
| +1 | javac | 1m 10s | the patch passed |
| +1 | Release audit (RAT) | 1m 5s | the patch passed |
| +1 | Check forbidden APIs | 1m 2s | the patch passed |
| +1 | Validate source patterns | 1m 2s | the patch passed |
|| Other Tests ||
| -1 | unit | 45m 37s | core in the patch failed. |
| +1 | unit | 0m 21s | test-framework in the patch passed. |
| | total | 50m 18s | |

|| Reason || Tests ||
| Failed junit tests | solr.cloud.SystemCollectionCompatTest |

|| Subsystem || Report/Notes ||
| JIRA Issue | SOLR-14184 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12990820/SOLR-14184.patch |
| Optional Tests | compile javac unit ratsources checkforbiddenapis validatesourcepatterns |
| uname | Linux lucene1-us-west 4.15.0-54-generic #58-Ubuntu SMP Mon Jun 24 10:55:24 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | ant |
| Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-SOLR-Build/sourcedir/dev-tools/test-patch/lucene-solr-yetus-personality.sh |
| git revision | master / 2cda4184c94 |
| ant | version: Apache Ant(TM) version 1.10.5 compiled on March 28 2019 |
| Default Java | LTS |
| unit | https://builds.apache.org/job/PreCommit-SOLR-Build/650/artifact/out/patch-unit-solr_core.txt |
| Test Results | https://builds.apache.org/job/PreCommit-SOLR-Build/650/testReport/ |
| modules | C: solr/core solr/test-framework U: solr |
| Console output | https://builds.apache.org/job/PreCommit-SOLR-Build/650/console |
| Powered by | Apache Yetus 0.7.0 http://yetus.apache.org |

This message was automatically generated.

> replace DirectUpdateHandler2.commitOnClose with something in TestInjection
>
> Key: SOLR-14184
> URL: https://issues.apache.org/jira/browse/SOLR-14184
> Project: Solr
> Issue Type: Test
> Security Level: Public (Default Security Level. Issues are Public)
> Reporter: Chris M. Hostetter
> Assignee: Chris M. Hostetter
> Priority: Major
> Attachments: SOLR-14184.patch, SOLR-14184.patch
>
> {code:java}
> public static volatile boolean commitOnClose = true; // TODO: make this a real config option or move it to TestInjection
> {code}
> Lots of tests muck with this (to simulate unclean shutdown and force tlog replay on restart) but there's no guarantee that it is reset properly.
> It should be replaced by logic in {{TestInjection}} that is correctly cleaned up by {{TestInjection.reset()}}
--
This message was sent by Atlassian Jira (v8.3.4#803005)
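The replacement being proposed can be sketched roughly like this (a hypothetical shape, not the actual Solr patch -- class contents and names here are illustrative):

```java
// Hypothetical sketch, not the actual Solr patch: the ad-hoc
// DirectUpdateHandler2.commitOnClose flag moves into TestInjection,
// where a single reset() restores every injection point between tests.
public class TestInjection {

    // When false, simulates unclean shutdown: IndexWriter skips its
    // final commit on close, forcing tlog replay on restart.
    public static volatile boolean commitOnClose = true;

    // Called from the test framework's per-test teardown, so no
    // individual test can leak a mutated flag into later tests.
    public static void reset() {
        commitOnClose = true;
    }

    public static void main(String[] args) {
        commitOnClose = false;   // a test simulating hard shutdown
        reset();                 // framework teardown
        System.out.println(commitOnClose);
    }
}
```

The point of the centralized `reset()` is that forgetting to restore the flag in one test can no longer poison unrelated tests that run later in the same JVM.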
[GitHub] [lucene-solr] a-siuniaev commented on a change in pull request #1161: SOLR-12325 Added uniqueBlockQuery(parent:true) aggregation for JSON Facet
a-siuniaev commented on a change in pull request #1161: SOLR-12325 Added uniqueBlockQuery(parent:true) aggregation for JSON Facet URL: https://github.com/apache/lucene-solr/pull/1161#discussion_r366312261

## File path: solr/core/src/java/org/apache/solr/search/facet/UniqueBlockQueryAgg.java ##
@@ -0,0 +1,113 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.solr.search.facet;
+
+import java.io.IOException;
+import java.util.Arrays;
+import java.util.Objects;
+import java.util.function.IntFunction;
+
+import org.apache.lucene.index.LeafReaderContext;
+import org.apache.lucene.search.Query;
+import org.apache.lucene.util.BitSet;
+
+import static org.apache.solr.search.join.BlockJoinParentQParser.getCachedFilter;
+
+public class UniqueBlockQueryAgg extends AggValueSource {

Review comment: Extracted.
[GitHub] [lucene-solr] a-siuniaev commented on a change in pull request #1161: SOLR-12325 Added uniqueBlockQuery(parent:true) aggregation for JSON Facet
a-siuniaev commented on a change in pull request #1161: SOLR-12325 Added uniqueBlockQuery(parent:true) aggregation for JSON Facet URL: https://github.com/apache/lucene-solr/pull/1161#discussion_r366312200

## File path: solr/core/src/java/org/apache/solr/search/ValueSourceParser.java ##
@@ -975,6 +976,13 @@ public ValueSource parse(FunctionQParser fp) throws SyntaxError { } });
+addParser("agg_uniqueBlockQuery", new ValueSourceParser() {
+  @Override
+  public ValueSource parse(FunctionQParser fp) throws SyntaxError {

Review comment: Added parseNamedArg to FunctionQParser.
[GitHub] [lucene-solr] gerlowskija edited a comment on issue #1163: SOLR-14186: Enforce CRLF in Windows files with .gitattributes
gerlowskija edited a comment on issue #1163: SOLR-14186: Enforce CRLF in Windows files with .gitattributes URL: https://github.com/apache/lucene-solr/pull/1163#issuecomment-574142361

> I thought it was git itself that enforces gitattributes, not 3rd party tools.

That's what I thought too. What sort of editor/tool issues have you seen, Dawid?

> Don't know if git uses it only when checking out files from repo or also when committing changes?

As far as I can tell from the docs [here](https://git-scm.com/docs/gitattributes#_checking_out_and_checking_in), it looks like both.
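For context, the kind of rules under discussion look like this (an illustrative fragment, not the exact contents of the PR's .gitattributes):

```
# Illustrative .gitattributes fragment (not the exact PR contents).
# Let git normalize text files in the index, but force Windows batch
# scripts to check out with CRLF and shell scripts with LF.
*        text=auto
*.bat    text eol=crlf
*.cmd    text eol=crlf
*.sh     text eol=lf
```

Per the git documentation linked above, the `eol` attribute affects both checkout (working-tree line endings) and checkin (normalization to LF in the index).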
[jira] [Commented] (LUCENE-9117) RamUsageEstimator hangs with AOT compilation
[ https://issues.apache.org/jira/browse/LUCENE-9117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17015019#comment-17015019 ] Dawid Weiss commented on LUCENE-9117:
-
The AOT compiler is indeed pretty clever at optimizing away the object equality check here. The attached patch works, but I don't think we should even try to be smart and measure the cache size here... Sure - we would overestimate in certain scenarios, but I don't think it's harmful to overestimate here (and the spec on the result of valueOf is such that even a smart Java compiler could turn the check expression into a constant). [~jpountz] What do you think?

> RamUsageEstimator hangs with AOT compilation
>
> Key: LUCENE-9117
> URL: https://issues.apache.org/jira/browse/LUCENE-9117
> Project: Lucene - Core
> Issue Type: Bug
> Reporter: Dawid Weiss
> Assignee: Dawid Weiss
> Priority: Minor
> Attachments: LUCENE-9117.patch
>
> Mailing list report by Cleber Muramoto.
--
This message was sent by Atlassian Jira (v8.3.4#803005)
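The check being discussed boils down to probing how far Long.valueOf keeps returning cached boxes. A simplified sketch of that pattern (not the exact RamUsageEstimator source) shows why a compiler that scalar-replaces the boxing can break it:

```java
// Simplified sketch of the cache-probing pattern, not the exact
// RamUsageEstimator source. Long.valueOf returns the same cached object
// for small values, so an identity comparison of two independent boxes
// is true only while the value is in the cache. If an AOT/JIT compiler
// scalar-replaces the boxing, the identity check can constant-fold and
// the loop either exits immediately or never terminates -- the bug above.
public class LongCacheProbe {

    static long probeMaxCached() {
        long v = 0;
        while (v < Long.MAX_VALUE
                && Long.valueOf(v + 1) == Long.valueOf(v + 1)) {
            v += 1;
        }
        return v;
    }

    public static void main(String[] args) {
        // On OpenJDK the Long cache is fixed at [-128, 127], so this prints 127.
        System.out.println(probeMaxCached());
    }
}
```

Replacing the loop with the fixed [-128, 127] bounds (as the reporter suggests) would overestimate only on exotic VMs with larger caches, which is the overestimation Dawid calls harmless above.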
[jira] [Updated] (LUCENE-9117) RamUsageEstimator hangs with AOT compilation
[ https://issues.apache.org/jira/browse/LUCENE-9117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dawid Weiss updated LUCENE-9117:
Attachment: LUCENE-9117.patch

> RamUsageEstimator hangs with AOT compilation
>
> Key: LUCENE-9117
> URL: https://issues.apache.org/jira/browse/LUCENE-9117
> Project: Lucene - Core
> Issue Type: Bug
> Reporter: Dawid Weiss
> Assignee: Dawid Weiss
> Priority: Minor
> Attachments: LUCENE-9117.patch
>
> Mailing list report by Cleber Muramoto.
--
This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (SOLR-13845) DELETEREPLICA API by "count" and "type"
[ https://issues.apache.org/jira/browse/SOLR-13845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17014987#comment-17014987 ] Amrit Sarkar commented on SOLR-13845:
-
Generated a PR with the latest code and documentation.

> DELETEREPLICA API by "count" and "type"
>
> Key: SOLR-13845
> URL: https://issues.apache.org/jira/browse/SOLR-13845
> Project: Solr
> Issue Type: Improvement
> Security Level: Public (Default Security Level. Issues are Public)
> Reporter: Amrit Sarkar
> Assignee: Shalin Shekhar Mangar
> Priority: Major
> Attachments: SOLR-13845.patch
> Time Spent: 10m
> Remaining Estimate: 0h
>
> SOLR-9319 added support for deleting replicas by count. It would be great to extend that feature so we can also specify the type of replica to delete, just as we can add replicas by count and type.
--
This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [lucene-solr] sarkaramrit2 opened a new pull request #1167: SOLR-13845: DELETEREPLICA API by count and type
sarkaramrit2 opened a new pull request #1167: SOLR-13845: DELETEREPLICA API by count and type URL: https://github.com/apache/lucene-solr/pull/1167

# Description
Please provide a short description of the changes you're making with this pull request.

# Solution
Please provide a short description of the approach taken to implement your solution.

# Tests
Please describe the tests you've developed or run to confirm this patch implements the feature or solves the problem.

# Checklist
Please review the following and check all that apply:

- [ ] I have reviewed the guidelines for [How to Contribute](https://wiki.apache.org/solr/HowToContribute) and my code conforms to the standards described there to the best of my ability.
- [ ] I have created a Jira issue and added the issue ID to my pull request title.
- [ ] I have given Solr maintainers [access](https://help.github.com/en/articles/allowing-changes-to-a-pull-request-branch-created-from-a-fork) to contribute to my PR branch. (optional but recommended)
- [ ] I have developed this patch against the `master` branch.
- [ ] I have run `ant precommit` and the appropriate test suite.
- [ ] I have added tests for my changes.
- [ ] I have added documentation for the [Ref Guide](https://github.com/apache/lucene-solr/tree/master/solr/solr-ref-guide) (for Solr changes only).
[jira] [Commented] (LUCENE-8615) Can LatLonShape's tessellator create more search-efficient triangles?
[ https://issues.apache.org/jira/browse/LUCENE-8615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17014971#comment-17014971 ] Ignacio Vera commented on LUCENE-8615:
--
Here is an idea. We currently have the following types of triangles in the index:

!screenshot-1.png|width=432,height=464!

This plot shows that the potentially most wasteful triangles are the ones where two of the points belong to the bounding box (the first four possibilities). So I wonder whether we should avoid adding those types of triangles to the index and instead split them along the longest side. Note that a side effect is that we can reduce the number of triangle types to 4.

> Can LatLonShape's tessellator create more search-efficient triangles?
>
> Key: LUCENE-8615
> URL: https://issues.apache.org/jira/browse/LUCENE-8615
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Adrien Grand
> Priority: Minor
> Attachments: 2-tessellations.png, re-tessellate-triangle.png, screenshot-1.png
>
> The triangular mesh produced by LatLonShape's Tessellator creates reasonable numbers of triangles, which is helpful for indexing speed. However I'm wondering whether there are conditions when it might be beneficial to run tessellation slightly differently in order to create triangles that are more search-friendly. Given that we only index the minimum bounding rectangle for each triangle, we always check for intersection between the query and the triangle if the query intersects with the MBR of the triangle. So the smaller the area of the triangle compared to its MBR, the higher the likelihood of false positives when querying.
> For instance, see the following shape; there are two ways that it can be tessellated into two triangles. LatLonShape's Tessellator is going to return either of them depending on which point is listed first in the polygon. Yet the first one is more efficient than the second one: with the second one, both triangles have roughly the same MBR (which is also the MBR of the polygon), so both triangles will need to be checked all the time whenever the query intersects with this shared MBR. On the other hand, with the first way, both MBRs are smaller and don't overlap, which makes it more likely that only one triangle needs to be checked at query time.
> !2-tessellations.png!
> Another example is the following polygon. It can be tessellated into a single triangle. Yet at times it might be a better idea to create more triangles so that the overall area of MBRs is smaller and queries are less likely to run into false positives.
> !re-tessellate-triangle.png!
--
This message was sent by Atlassian Jira (v8.3.4#803005)
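The intuition above can be quantified: the fraction of the MBR actually covered by a triangle bounds how often an MBR hit still needs (and fails) the exact triangle check. A small illustrative helper (not part of Lucene) makes that fill ratio concrete:

```java
// Illustrative helper, not part of Lucene: a triangle's area is at most
// half the area of its minimum bounding rectangle (MBR), and the closer
// this fill ratio is to 0, the more query/MBR intersections are false
// positives that still pay for the exact triangle check.
public class TriangleMbrFill {

    static double triangleArea(double ax, double ay, double bx, double by,
                               double cx, double cy) {
        // Shoelace formula (cross product of the two edge vectors from A).
        return Math.abs((bx - ax) * (cy - ay) - (cx - ax) * (by - ay)) / 2.0;
    }

    static double mbrArea(double ax, double ay, double bx, double by,
                          double cx, double cy) {
        double w = Math.max(ax, Math.max(bx, cx)) - Math.min(ax, Math.min(bx, cx));
        double h = Math.max(ay, Math.max(by, cy)) - Math.min(ay, Math.min(by, cy));
        return w * h;
    }

    // In (0, 0.5]; 0.5 means the triangle fills its MBR as well as possible.
    static double fillRatio(double ax, double ay, double bx, double by,
                            double cx, double cy) {
        return triangleArea(ax, ay, bx, by, cx, cy)
                / mbrArea(ax, ay, bx, by, cx, cy);
    }

    public static void main(String[] args) {
        // A right triangle on the MBR's corner: best case, ratio 0.5.
        System.out.println(fillRatio(0, 0, 1, 0, 0, 1));
        // A sliver: tiny area, wide MBR, so almost every MBR hit is a
        // false positive for the triangle itself.
        System.out.println(fillRatio(0, 0, 8, 1, 4, 0.75));
    }
}
```

A splitting heuristic like the one proposed in the comment would aim to replace low-ratio triangles with several higher-ratio ones, trading a few extra index entries for fewer exact checks at query time.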
[jira] [Updated] (LUCENE-8615) Can LatLonShape's tessellator create more search-efficient triangles?
[ https://issues.apache.org/jira/browse/LUCENE-8615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ignacio Vera updated LUCENE-8615:
Attachment: screenshot-1.png

> Can LatLonShape's tessellator create more search-efficient triangles?
>
> Key: LUCENE-8615
> URL: https://issues.apache.org/jira/browse/LUCENE-8615
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Adrien Grand
> Priority: Minor
> Attachments: 2-tessellations.png, re-tessellate-triangle.png, screenshot-1.png
--
This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [lucene-solr] dweiss commented on a change in pull request #1157: Add RAT check using Gradle
dweiss commented on a change in pull request #1157: Add RAT check using Gradle URL: https://github.com/apache/lucene-solr/pull/1157#discussion_r366229880

## File path: gradle/validation/rat-sources.gradle ##
@@ -0,0 +1,174 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+import groovy.xml.NamespaceBuilder
+
+allprojects {
+    configurations {
+        rat
+    }
+
+    dependencies {
+        rat 'org.apache.rat:apache-rat-tasks:0.13'
+    }
+}
+
+subprojects {
+    plugins.withId("java", {

Review comment: That's not a quirk :) What I'm saying is that you explicitly rely on the project being a java convention project. We don't need to require this, but then we have to decide what to use as source folders.
[GitHub] [lucene-solr] dweiss commented on issue #1157: Add RAT check using Gradle
dweiss commented on issue #1157: Add RAT check using Gradle URL: https://github.com/apache/lucene-solr/pull/1157#issuecomment-574082560

> One other question of best practices - do we want all the project exclusions in a central rat.gradle file, or spread out and local to each project?

This is a good question and I think the answer is really subjective. Both approaches have advantages and disadvantages. I personally like the "aspect"-oriented approach, where everything related to a particular build function is gathered in a single file. So yes, for rat it'd be just that single file -- any special handling of that aspect, exclusions, etc. would be collected there. In an extreme case you should be able to enable/disable rat just by commenting out the apply block in the master build file. As the build evolves over time this may need to change. For example, when you move parts of the build into buildSrc and wrap them in plugins, it will become necessary to move project-specific logic outside. For now I'd rather keep it in this "aspect-oriented" form if you don't mind (but this is a subjective decision, not any better or common practice).
[GitHub] [lucene-solr] dweiss commented on issue #1157: Add RAT check using Gradle
dweiss commented on issue #1157: Add RAT check using Gradle URL: https://github.com/apache/lucene-solr/pull/1157#issuecomment-574079514

To be honest, I would prefer not to add those license headers to build files unless it's a requirement of Apache legal -- this would have to be verified. For one thing, I don't perceive build files as particularly valuable intellectual property (although it surely does require intellectual input to write them). And even then, the distribution bundle comes with a top-level license file that covers them. The downside of requiring this boilerplate is that it bloats each and every script.
[GitHub] [lucene-solr] dweiss commented on issue #1163: SOLR-14186: Enforce CRLF in Windows files with .gitattributes
dweiss commented on issue #1163: SOLR-14186: Enforce CRLF in Windows files with .gitattributes URL: https://github.com/apache/lucene-solr/pull/1163#issuecomment-574074888

You assume a single "git", but there are multiple versions of the tool and even multiple implementations (jgit, for example). Assuming editors support or understand .gitattributes (or .editorconfig) is probably overly optimistic too... Not that I use notepad.exe... but I'm sure there are folks who do! :)

And more seriously: I've experimented with both options in the past and I use them in other projects, but I still prefer an explicit sanity check. Ideally an integration test that takes the final ZIP/TGZ bundle and verifies distribution sanity - this also covers the (not unlikely) possibility of build tools messing up files while doing content filtering, copying, or whatever else.
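For context, the .gitattributes approach under discussion pins line endings at checkout regardless of each developer's core.autocrlf setting. The patterns below are a typical illustration of that mechanism, not the exact contents of PR #1163:

```
# Illustrative .gitattributes entries (not the PR's actual file):
# force CRLF on Windows-specific scripts for every clone...
*.bat  text eol=crlf
*.cmd  text eol=crlf
# ...and keep shell scripts LF everywhere.
*.sh   text eol=lf
```

As the comment notes, this only constrains git clients that honor the attributes; it does not protect against editors or build tooling rewriting line endings later, which is what the proposed distribution-bundle sanity check would catch.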
[GitHub] [lucene-solr] dweiss commented on issue #1164: SOLR-12930: Create developer docs in source repo
dweiss commented on issue #1164: SOLR-12930: Create developer docs in source repo URL: https://github.com/apache/lucene-solr/pull/1164#issuecomment-574072794

The gradle branch has a number of guide-style txt files under the top-level help/ folder. I don't know how this fits with this proposal.