[GitHub] [lucene-solr] dweiss commented on a change in pull request #1905: LUCENE-9488 Release with Gradle Part 2

2020-10-05 Thread GitBox


dweiss commented on a change in pull request #1905:
URL: https://github.com/apache/lucene-solr/pull/1905#discussion_r500041959



##
File path: solr/packaging/build.gradle
##
@@ -62,12 +63,17 @@ dependencies {
 
   example project(path: ":solr:example", configuration: "packaging")
   server project(path: ":solr:server", configuration: "packaging")
+
+  // Copy files from documentation output
+  docs project(path: ':solr', configuration: 'docs')

Review comment:
   I explained this above, I hope. I think the artifact/configuration-based 
approach to collecting packaging dependencies is superior to explicit 
properties and task dependencies.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] dweiss commented on a change in pull request #1905: LUCENE-9488 Release with Gradle Part 2

2020-10-05 Thread GitBox


dweiss commented on a change in pull request #1905:
URL: https://github.com/apache/lucene-solr/pull/1905#discussion_r500042473



##
File path: solr/packaging/build.gradle
##
@@ -115,15 +121,15 @@ distributions {
 into "server"
   })
 
+  from(configurations.docs, {
+into "docs"
+  })
   // docs/   - TODO: this is assembled via XSLT... leaving out for now.
+  // -- Is there more to do here?

Review comment:
   I think we can create a separate issue for collecting Solr docs? This 
patch is mostly about Lucene and it's pretty big already.








[GitHub] [lucene-solr] dweiss commented on a change in pull request #1905: LUCENE-9488 Release with Gradle Part 2

2020-10-05 Thread GitBox


dweiss commented on a change in pull request #1905:
URL: https://github.com/apache/lucene-solr/pull/1905#discussion_r500041550



##
File path: lucene/packaging/build.gradle
##
@@ -0,0 +1,171 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+// This project puts together a "distribution", assembling dependencies from
+// various other projects.
+
+plugins {
+    id 'distribution'
+}
+
+description = 'Lucene distribution packaging'
+
+// Declare all subprojects that should be included in binary distribution.
+// By default everything is included, unless explicitly excluded.
+def includeInBinaries = project(":lucene").subprojects.findAll { subproject ->
+    return !(subproject.path in [
+        // Exclude packaging, not relevant to binary distribution.
+        ":lucene:packaging",
+        // Exclude parent container project of analysis modules (no artifacts).
+        ":lucene:analysis"
+    ])
+}
+
+// Create a configuration for each subproject and add a dependency.
+def binaryArtifactsConf = { Project prj ->
+    "dep-binary" + prj.path.replace(':', '-')
+}
+
+def allDepsConf = { Project prj ->
+    "dep-full" + prj.path.replace(':', '-')
+}
+
+configurations {
+    docs
+}
+
+for (Project includedProject : includeInBinaries) {
+    def confBinaries = binaryArtifactsConf(includedProject)
+    def confFull = allDepsConf(includedProject)
+    configurations.create(confBinaries)
+    configurations.create(confFull)
+    dependencies { DependencyHandler handler ->
+        // Just project binaries.
+        handler.add(confBinaries, project(path: includedProject.path, configuration: "packaging"))
+        // All project dependencies, including transitive dependencies from the runtime configuration.
+        handler.add(confFull, project(path: includedProject.path, configuration: "runtimeElements"), {
+            exclude group: "org.apache.lucene"
+
+            // Exclude these from all projects.
+            exclude group: "commons-logging"
+            exclude group: "org.slf4j"
+        })
+    }
+}
+
+dependencies {
+    docs project(path: ':lucene', configuration: 'docs')
+}
+
+distributions {
+    // The "main" distribution is the binary distribution.
+    // We should also add a 'source' distribution at some point
+    // (we can't do it now as the build itself is tangled with Solr).
+    main {
+        distributionBaseName = 'lucene'
+
+        contents {
+            // Manually correct posix permissions (matters when packaging on Windows).
+            filesMatching(["**/*.sh", "**/*.bat"]) { copy ->
+                copy.setMode(0755)
+            }
+
+            // Root distribution files; these are cherry-picked manually.
+            from(project(':lucene').projectDir, {
+                include "CHANGES.txt"
+                include "JRE_VERSION_MIGRATION.md"
+                include "LICENSE.txt"
+                include "licenses/*"
+                include "MIGRATE.md"
+                include "NOTICE.txt"
+                include "README.md"
+                include "SYSTEM_REQUIREMENTS.md"
+            })
+
+            // A couple more missing README files
+            from(project(':lucene:analysis').projectDir) {
+                include "README.txt"
+                into 'analysis'
+            }
+
+            // Copy files from documentation output to 'docs'
+            from(configurations.docs, {

Review comment:
   This is part of the PR, Uwe - here.
   
https://github.com/apache/lucene-solr/pull/1905/files#diff-d55d7b9fe9d94d4e2e5554e3c1108943R60-R63
   
   I suggested using configuration-based copying instead of pointing at an 
explicit folder because it has several advantages. First, an artifact exported 
via a configuration is configurable in the place where it logically belongs: 
you can put together a set of documentation artifacts in the documentation 
project's configuration and just depend on it when you're assembling something 
that uses those artifacts.
   
   The second benefit is that dependencies on the tasks assembling those artifacts 
are resolved automatically - you don't need to know anything beyond the name of 
the project and the configuration from which you're copying. The rest is done by Gradle.
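As a rough sketch of the pattern being advocated (the project paths and task names below are invented for illustration, not the actual PR):

```groovy
// Hypothetical sketch of configuration-based artifact sharing.

// --- In the producing project (e.g. a documentation project): ---
configurations {
    docs
}

// Invented task name; stands in for whatever assembles the docs.
task assembleDocs(type: Zip) {
    from 'build/documentation'
    archiveBaseName = 'docs'
}

// Exporting the task's output makes that task an automatic dependency
// of anything that consumes the 'docs' configuration.
artifacts {
    docs assembleDocs
}

// --- In the consuming packaging project: ---
dependencies {
    docs project(path: ':documentation-producer', configuration: 'docs')
}
```

The consumer never names the producing task; resolving the configuration is enough to schedule it.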

[jira] [Updated] (SOLR-14870) gradle build does not validate ref-guide -> javadoc links

2020-10-05 Thread Chris M. Hostetter (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-14870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris M. Hostetter updated SOLR-14870:
--
Attachment: SOLR-14870.patch
  Assignee: Chris M. Hostetter
Status: Open  (was: Open)

... this has taken a little longer to get into a usable shape than I expected 
it would, but I've got a patch attached that I think is getting pretty close to 
what we want for the end game.
{panel:title=digression...}
The main thing that bogged me down is that the lifecycle of gradle "tasks" 
really doesn't seem to make much sense, and there isn't a "clean" guide to how 
to refactor a "task" (instance) into a re-usable "class" that many tasks can be 
instances of.

I tried to start by refactoring 'prepareSources' into a "class PrepareSources 
extends Sync" using a doFirst closure registered in the constructor to "set up" 
the ivy properties that we can't read on construction/configuration – but that 
didn't work because gradle 'doFirst' is a lie...
 * [https://issues.gradle.org/browse/GRADLE-2064]
 * 
[https://discuss.gradle.org/t/dofirst-does-not-execute-first-but-only-after-task-execution/28129/6]
 * [https://github.com/gradle/gradle/issues/9142]

...the 'TaskAction' of a gradle Task (class) is itself registered in a 
doFirst, which causes any 'doFirst' logic you want to register in the class to 
actually be done after the task is complete (there's no 'do ancestors 
first/last' type evaluation order for doFirst/doLast like the Before/After methods 
in junit)

So then I tried to write a new Task (class) that used composition around a 
private inner 'Sync' Task – but that kept getting me into weird Injection type 
errors (because Sync really expects gradle to Inject things). In trying to 
figure out if there was a hook I could call to tell the Project to "init" my 
Sync task (and handle all the injection), I realized that 'Project.sync' 
seemed like a much more direct approach.

From there things got much easier, although I still got hung up on how some 
groovy syntax interacts with gradle "special sauce" ... notably: when using 
the "task" keyword in gradle, that's evidently just calling the task() method 
on the Project, passing in a closure, which in groovy gets an implicit 'it' 
param – so when you do dynamic task declarations inside (or around) 'each' 
loops (where there is another implicit 'it' param) one overrides the other.
{panel}
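The composition approach described in the digression can be sketched roughly like this (class and path names are invented for illustration, not the actual patch):

```groovy
import org.gradle.api.DefaultTask
import org.gradle.api.tasks.TaskAction

// Hypothetical sketch: instead of subclassing Sync (which drags in
// injection machinery and hits the doFirst ordering trap), call
// project.sync(...) from inside the @TaskAction body.
class PrepareSourcesSketch extends DefaultTask {
    @TaskAction
    void prepare() {
        // Everything here runs at execution time, so properties that
        // are unavailable at configuration time can be read safely.
        project.sync {
            from 'some-input-dir'   // invented path
            into "${project.buildDir}/prepared-sources"
        }
    }
}
```

Because the copy spec is built only when the task actually runs, there is no need to hook `doFirst` at all.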
In any case – I think I've got something nice and clean here, and would 
appreciate review from [~dweiss] & [~uschindler] on whether I've gone off and 
done anything completely insane?

Assuming I haven't, there are really just 2 outstanding nocommits that need 
attention:
 * is there really no better way to get the 'main' classpath from the Project 
than the '...getPlugin(JavaPluginConvention.class)...' hoops I jumped through?
 * At the moment, link checking "fails" largely because it looks like the solr 
javadocs have been restructured
 ** I need someone to sanity check for me that this is an "expected" change?
 specifically that paths like ".../solr-solrj/..." & ".../solr-ltr/..." are now 
going to be ".../solrj/..." & ".../ltr/..." starting in 9.0?
 ** examples:
 *** 
[https://lucene.apache.org/solr/8_6_0/solr-solrj/org/apache/solr/client/solrj/SolrClient.html]
  
[https://lucene.apache.org/solr/9_0_0/solrj/org/apache/solr/client/solrj/SolrClient.html]
 *** 
[https://lucene.apache.org/solr/8_6_0/solr-core/org/apache/solr/search/XmlQParserPlugin.html]
  
[https://lucene.apache.org/solr/9_0_0/core/org/apache/solr/search/XmlQParserPlugin.html]
** if this is all expected, then I'll move forward with updating all of the 
affected \*.adoc files and replace the hardcoded relative paths in the 
build.gradle with "clean" project(...)-based paths per Uwe's suggestion 

> gradle build does not validate ref-guide -> javadoc links
> -
>
> Key: SOLR-14870
> URL: https://issues.apache.org/jira/browse/SOLR-14870
> Project: Solr
>  Issue Type: Task
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Chris M. Hostetter
>Assignee: Chris M. Hostetter
>Priority: Major
> Attachments: SOLR-14870.patch
>
>
> the ant build had (has on 8x) a feature that ensured we didn't have any 
> broken links between the ref guide and the javadocs...
> {code}
> <target name="documentation" depends="javadocs,changes-to-html,process-webpages">
>   <ant dir="solr-ref-guide" ... inheritall="false">
>     ...
>   </ant>
> </target>
> {code}
> ...by default {{cd solr/solr-ref-guide && ant bare-bones-html-validation}} 
> just did internal validation of the structure of the guide, but this hook 
> meant that {{cd solr && ant documentation}} (or {{ant precommit}}) would first 
> build the javadocs; then build the ref-guide; then validate _all_ lin

[jira] [Commented] (LUCENE-9493) Remove obsolete dev-tools/{idea,netbeans,maven} folders

2020-10-05 Thread David Smiley (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17208424#comment-17208424
 ] 

David Smiley commented on LUCENE-9493:
--

I filed LUCENE-9563 for a ".editorConfig" which should be a cross-IDE way of 
specifying code style settings.
RE a copyright profile... [~dweiss] might you propose where you are comfortable 
adding this?

It'd be interesting if the gradle build could detect that IntelliJ is importing 
it (first-time setup in particular) and then tell the user about the copyright 
profile and any other matters.

> Remove obsolete dev-tools/{idea,netbeans,maven} folders
> ---
>
> Key: LUCENE-9493
> URL: https://issues.apache.org/jira/browse/LUCENE-9493
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Dawid Weiss
>Assignee: Dawid Weiss
>Priority: Trivial
> Fix For: master (9.0)
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> I don't think they're used or applicable anymore. Thoughts?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Created] (LUCENE-9563) Add .editorConfig

2020-10-05 Thread David Smiley (Jira)
David Smiley created LUCENE-9563:


 Summary: Add .editorConfig
 Key: LUCENE-9563
 URL: https://issues.apache.org/jira/browse/LUCENE-9563
 Project: Lucene - Core
  Issue Type: Task
Reporter: David Smiley
Assignee: David Smiley


I propose adding a ".editorConfig" to the root of the project.  Many text 
editors and IDEs support this file to declare code style settings such as 
indentation and more.  In particular, IntelliJ supports this natively and 
Eclipse has a plugin for it.

https://editorconfig.org

I furthermore propose I simply generate this as an export of my current 
IntelliJ code style, which is a code style I've been using and was originally 
imported from Lucene's former IntelliJ config.
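For illustration, a minimal ".editorConfig" might look like the following (the values here are generic placeholders, not the proposed export of the IntelliJ style):

```ini
# Illustrative only - not the actual proposed Lucene settings.
root = true

[*]
charset = utf-8
end_of_line = lf
insert_final_newline = true
trim_trailing_whitespace = true

[*.java]
indent_style = space
indent_size = 4
```

Editors that support EditorConfig pick this up automatically for any file under the project root.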








[jira] [Reopened] (LUCENE-9476) Add a bulk ordinal->FacetLabel API

2020-10-05 Thread Gautam Worah (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-9476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gautam Worah reopened LUCENE-9476:
--

> Add a bulk ordinal->FacetLabel API
> --
>
> Key: LUCENE-9476
> URL: https://issues.apache.org/jira/browse/LUCENE-9476
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/facet
>Affects Versions: 8.6.1
>Reporter: Gautam Worah
>Priority: Minor
>  Labels: performance
>
> This issue is a spillover from the 
> [PR|https://github.com/apache/lucene-solr/pull/1733/files] for LUCENE 9450
> The idea here is to share a single {{BinaryDocValues}} instance per leaf per 
> query instead of creating a new one each time in the 
> {{DirectoryTaxonomyReader}}.
> Suggested by [~mikemccand]
>  
>  
>  






[jira] [Resolved] (LUCENE-9540) Investigate double indexing of the fullPathField in the DirectoryTaxonomyWriter

2020-10-05 Thread Gautam Worah (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-9540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gautam Worah resolved LUCENE-9540.
--
Resolution: Not A Problem

> Investigate double indexing of the fullPathField in the 
> DirectoryTaxonomyWriter
> ---
>
> Key: LUCENE-9540
> URL: https://issues.apache.org/jira/browse/LUCENE-9540
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Gautam Worah
>Priority: Minor
>
> We may have reason to believe that we are double indexing the fullPathField 
> postings item in the DirectoryTaxonomyWriter constructor.
> This should ideally be a StoredField.
> See related discussion in PR https://github.com/apache/lucene-solr/pull/1733/
> Postings are already enabled for facet labels in 
> [FacetsConfig#L364-L399|https://github.com/apache/lucene-solr/blob/master/lucene/facet/src/java/org/apache/lucene/facet/FacetsConfig.java#L364-L366]
>  






[jira] [Commented] (LUCENE-9540) Investigate double indexing of the fullPathField in the DirectoryTaxonomyWriter

2020-10-05 Thread Gautam Worah (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17208416#comment-17208416
 ] 

Gautam Worah commented on LUCENE-9540:
--

It turns out that we are not double indexing the {{fullPathField}}. Removing 
the [line|#L495] that adds a {{StringValue}} to the {{fullPathField}} results 
in multiple failing tests. 

I think the lookup of an input facet label using the reverse index (postings) is 
enabled by the {{fullPathField}}, and hence removing it causes test 
failures. Closing this issue. 

> Investigate double indexing of the fullPathField in the 
> DirectoryTaxonomyWriter
> ---
>
> Key: LUCENE-9540
> URL: https://issues.apache.org/jira/browse/LUCENE-9540
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Gautam Worah
>Priority: Minor
>
> We may have reason to believe that we are double indexing the fullPathField 
> postings item in the DirectoryTaxonomyWriter constructor.
> This should ideally be a StoredField.
> See related discussion in PR https://github.com/apache/lucene-solr/pull/1733/
> Postings are already enabled for facet labels in 
> [FacetsConfig#L364-L399|https://github.com/apache/lucene-solr/blob/master/lucene/facet/src/java/org/apache/lucene/facet/FacetsConfig.java#L364-L366]
>  






[jira] [Resolved] (LUCENE-9476) Add a bulk ordinal->FacetLabel API

2020-10-05 Thread Gautam Worah (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-9476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gautam Worah resolved LUCENE-9476.
--
Resolution: Not A Problem

> Add a bulk ordinal->FacetLabel API
> --
>
> Key: LUCENE-9476
> URL: https://issues.apache.org/jira/browse/LUCENE-9476
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/facet
>Affects Versions: 8.6.1
>Reporter: Gautam Worah
>Priority: Minor
>  Labels: performance
>
> This issue is a spillover from the 
> [PR|https://github.com/apache/lucene-solr/pull/1733/files] for LUCENE 9450
> The idea here is to share a single {{BinaryDocValues}} instance per leaf per 
> query instead of creating a new one each time in the 
> {{DirectoryTaxonomyReader}}.
> Suggested by [~mikemccand]
>  
>  
>  






[GitHub] [lucene-solr] uschindler commented on a change in pull request #1905: LUCENE-9488 Release with Gradle Part 2

2020-10-05 Thread GitBox


uschindler commented on a change in pull request #1905:
URL: https://github.com/apache/lucene-solr/pull/1905#discussion_r499905829



##
File path: solr/packaging/build.gradle
##
@@ -115,15 +121,15 @@ distributions {
 into "server"
   })
 
+  from(configurations.docs, {
+into "docs"
+  })
   // docs/   - TODO: this is assembled via XSLT... leaving out for now.
+  // -- Is there more to do here?

Review comment:
   See comments below.








[GitHub] [lucene-solr] uschindler commented on a change in pull request #1905: LUCENE-9488 Release with Gradle Part 2

2020-10-05 Thread GitBox


uschindler commented on a change in pull request #1905:
URL: https://github.com/apache/lucene-solr/pull/1905#discussion_r499905487



##
File path: solr/packaging/build.gradle
##
@@ -62,12 +63,17 @@ dependencies {
 
   example project(path: ":solr:example", configuration: "packaging")
   server project(path: ":solr:server", configuration: "packaging")
+
+  // Copy files from documentation output
+  docs project(path: ':solr', configuration: 'docs')

Review comment:
   There are some `ext` properties pointing to the documentation after 
it's generated. For the Solr ZIP/TGZ it's just a minimal folder: 
`project(':solr').docrootOnline` (packaged in the ZIP).
   
   For the full lucene/solr docs, refer to `project(':lucene').docroot` (packaged 
in the TGZ) and `project(':solr').docroot` (not packaged, only published on the 
website).
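For comparison, consuming such an `ext` property from a packaging script might look roughly like this (the copy spec is a hypothetical sketch; note that, unlike the configuration-based approach, referencing a plain property creates no task dependency, so the `documentation` task has to be forced to run first by other means):

```groovy
// Hypothetical sketch: copying directly from an ext property that
// points at the generated documentation folder.
distributions {
    main {
        contents {
            from(project(':solr').docrootOnline, {
                into "docs"
            })
        }
    }
}
```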








[GitHub] [lucene-solr] uschindler commented on a change in pull request #1905: LUCENE-9488 Release with Gradle Part 2

2020-10-05 Thread GitBox


uschindler commented on a change in pull request #1905:
URL: https://github.com/apache/lucene-solr/pull/1905#discussion_r499904208



##
File path: solr/packaging/build.gradle
##
@@ -62,12 +63,17 @@ dependencies {
 
   example project(path: ":solr:example", configuration: "packaging")
   server project(path: ":solr:server", configuration: "packaging")
+
+  // Copy files from documentation output
+  docs project(path: ':solr', configuration: 'docs')

Review comment:
   I have no idea where this configuration is defined. The `:solr` project 
only has a task called `documentation` that needs to be executed before, but 
the outputs are not defined as config. So how should this work (same for 
Lucene)?








[GitHub] [lucene-solr] uschindler commented on a change in pull request #1905: LUCENE-9488 Release with Gradle Part 2

2020-10-05 Thread GitBox


uschindler commented on a change in pull request #1905:
URL: https://github.com/apache/lucene-solr/pull/1905#discussion_r499903030



##
File path: solr/packaging/build.gradle
##
@@ -115,15 +121,15 @@ distributions {
 into "server"
   })
 
+  from(configurations.docs, {

Review comment:
   where is this coming from?








[GitHub] [lucene-solr] uschindler commented on a change in pull request #1905: LUCENE-9488 Release with Gradle Part 2

2020-10-05 Thread GitBox


uschindler commented on a change in pull request #1905:
URL: https://github.com/apache/lucene-solr/pull/1905#discussion_r499902932



##
File path: lucene/packaging/build.gradle
##
@@ -0,0 +1,171 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+// This project puts together a "distribution", assembling dependencies from
+// various other projects.
+
+plugins {
+    id 'distribution'
+}
+
+description = 'Lucene distribution packaging'
+
+// Declare all subprojects that should be included in binary distribution.
+// By default everything is included, unless explicitly excluded.
+def includeInBinaries = project(":lucene").subprojects.findAll { subproject ->
+    return !(subproject.path in [
+        // Exclude packaging, not relevant to binary distribution.
+        ":lucene:packaging",
+        // Exclude parent container project of analysis modules (no artifacts).
+        ":lucene:analysis"
+    ])
+}
+
+// Create a configuration for each subproject and add a dependency.
+def binaryArtifactsConf = { Project prj ->
+    "dep-binary" + prj.path.replace(':', '-')
+}
+
+def allDepsConf = { Project prj ->
+    "dep-full" + prj.path.replace(':', '-')
+}
+
+configurations {
+    docs
+}
+
+for (Project includedProject : includeInBinaries) {
+    def confBinaries = binaryArtifactsConf(includedProject)
+    def confFull = allDepsConf(includedProject)
+    configurations.create(confBinaries)
+    configurations.create(confFull)
+    dependencies { DependencyHandler handler ->
+        // Just project binaries.
+        handler.add(confBinaries, project(path: includedProject.path, configuration: "packaging"))
+        // All project dependencies, including transitive dependencies from the runtime configuration.
+        handler.add(confFull, project(path: includedProject.path, configuration: "runtimeElements"), {
+            exclude group: "org.apache.lucene"
+
+            // Exclude these from all projects.
+            exclude group: "commons-logging"
+            exclude group: "org.slf4j"
+        })
+    }
+}
+
+dependencies {
+    docs project(path: ':lucene', configuration: 'docs')
+}
+
+distributions {
+    // The "main" distribution is the binary distribution.
+    // We should also add a 'source' distribution at some point
+    // (we can't do it now as the build itself is tangled with Solr).
+    main {
+        distributionBaseName = 'lucene'
+
+        contents {
+            // Manually correct posix permissions (matters when packaging on Windows).
+            filesMatching(["**/*.sh", "**/*.bat"]) { copy ->
+                copy.setMode(0755)
+            }
+
+            // Root distribution files; these are cherry-picked manually.
+            from(project(':lucene').projectDir, {
+                include "CHANGES.txt"
+                include "JRE_VERSION_MIGRATION.md"
+                include "LICENSE.txt"
+                include "licenses/*"
+                include "MIGRATE.md"
+                include "NOTICE.txt"
+                include "README.md"
+                include "SYSTEM_REQUIREMENTS.md"
+            })
+
+            // A couple more missing README files
+            from(project(':lucene:analysis').projectDir) {
+                include "README.txt"
+                into 'analysis'
+            }
+
+            // Copy files from documentation output to 'docs'
+            from(configurations.docs, {

Review comment:
   where is this coming from?








[GitHub] [lucene-solr] uschindler commented on a change in pull request #1905: LUCENE-9488 Release with Gradle Part 2

2020-10-05 Thread GitBox


uschindler commented on a change in pull request #1905:
URL: https://github.com/apache/lucene-solr/pull/1905#discussion_r499902511



##
File path: solr/packaging/build.gradle
##
@@ -115,15 +121,15 @@ distributions {
 into "server"
   })
 
+  from(configurations.docs, {
+into "docs"
+  })
   // docs/   - TODO: this is assembled via XSLT... leaving out for now.
+  // -- Is there more to do here?

Review comment:
   Oh, I am not sure if this works. At least in current master, I see NO 
'docs/' folder in the binary release!








[GitHub] [lucene-solr] uschindler commented on a change in pull request #1905: LUCENE-9488 Release with Gradle Part 2

2020-10-05 Thread GitBox


uschindler commented on a change in pull request #1905:
URL: https://github.com/apache/lucene-solr/pull/1905#discussion_r499901026



##
File path: solr/packaging/build.gradle
##
@@ -115,15 +121,15 @@ distributions {
 into "server"
   })
 
+  from(configurations.docs, {
+into "docs"
+  })
   // docs/   - TODO: this is assembled via XSLT... leaving out for now.
+  // -- Is there more to do here?

Review comment:
   I don't think so. The documentation in Solr's artifacts on Jenkins already 
contains everything perfectly. If it's the same for Lucene, we're fine. It should 
land in the ZIP's `docs` folder.








[jira] [Commented] (SOLR-14802) Sorting by min of two geodist functions

2020-10-05 Thread Cassandra Targett (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17208366#comment-17208366
 ] 

Cassandra Targett commented on SOLR-14802:
--

Thanks to you both, that clears up my question.

> Sorting by min of two geodist functions
> ---
>
> Key: SOLR-14802
> URL: https://issues.apache.org/jira/browse/SOLR-14802
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: spatial
>Reporter: Shaun Storey
>Assignee: David Smiley
>Priority: Major
> Fix For: 8.7
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Previously using geo field with type LatLonType you could implement a query 
> like
> {code:java}
> /select?q=*:*&fl=ID&fq={!geofilt}&d=50&pt=53.4721936,-2.24703&sfield=MAIN_LOCATION&sort=min(geodist(),
>  geodist(ALT_LOCATION,53.4721936,-2.24703)) asc{code}
> to sort results on minimum distance from multiple different locations. Moving 
> the fields to the advised LatLonPointSpatialField gives
> "geodist() does not support field names in its arguments when stated fields 
> are solr.LatLonPointSpatialField spatial type, requires sfield param instead"
> This has been reviewed before in SOLR-11601, but not for my actual 
> use-case, so it seems the choice was to just change the error message rather 
> than to implement the previous functionality. Can this be re-reviewed, or let 
> me know if there is another way to achieve the same result.
>  






[GitHub] [lucene-solr] msokolov commented on a change in pull request #1930: LUCENE-9322: add VectorValues to new Lucene90 codec

2020-10-05 Thread GitBox


msokolov commented on a change in pull request #1930:
URL: https://github.com/apache/lucene-solr/pull/1930#discussion_r499898687



##
File path: lucene/core/src/java/org/apache/lucene/index/VectorValuesWriter.java
##
@@ -0,0 +1,272 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.lucene.index;
+
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.util.ArrayList;
+import java.util.List;
+
+import org.apache.lucene.codecs.VectorWriter;
+import org.apache.lucene.search.DocIdSetIterator;
+import org.apache.lucene.search.TopDocs;
+import org.apache.lucene.util.ArrayUtil;
+import org.apache.lucene.util.BytesRef;
+import org.apache.lucene.util.Counter;
+
+/** Buffers up pending vector value(s) per doc, then flushes when segment 
flushes. */
+public class VectorValuesWriter {
+
+  private final FieldInfo fieldInfo;
+  private final Counter iwBytesUsed;
+  private final List<float[]> vectors = new ArrayList<>();
+  private final DocsWithFieldSet docsWithField;
+
+  private int lastDocID = -1;
+
+  private long bytesUsed;
+
+  VectorValuesWriter(FieldInfo fieldInfo, Counter iwBytesUsed) {
+this.fieldInfo = fieldInfo;
+this.iwBytesUsed = iwBytesUsed;
+this.docsWithField = new DocsWithFieldSet();
+this.bytesUsed = docsWithField.ramBytesUsed();
+if (iwBytesUsed != null) {
+  iwBytesUsed.addAndGet(bytesUsed);
+}
+  }
+
+  /**
+   * Adds a value for the given document. Only a single value may be added.
+   * @param docID the value is added to this document
+   * @param vectorValue the value to add
+   * @throws IllegalArgumentException if a value has already been added to the 
given document
+   */
+  public void addValue(int docID, float[] vectorValue) {
+if (docID == lastDocID) {
+  throw new IllegalArgumentException("VectorValuesField \"" + 
fieldInfo.name + "\" appears more than once in this document (only one value is 
allowed per field)");
+}
+assert docID > lastDocID;
+docsWithField.add(docID);
+vectors.add(ArrayUtil.copyOfSubArray(vectorValue, 0, vectorValue.length));

Review comment:
   Yes in the normal case, but a sneaky/unlucky person could bypass Field 
type safety; I'll add a check here.
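   For illustration, a minimal Python sketch (stand-in names, not the Lucene API) of the two guards discussed here: rejecting a second value for the same document, and validating the vector dimension for callers that bypass Field type safety:

```python
class VectorValuesWriterSketch:
    """Illustrative stand-in for VectorValuesWriter's addValue guards."""

    def __init__(self, field_name, dimension):
        self.field_name = field_name
        self.dimension = dimension
        self.last_doc_id = -1
        self.vectors = []

    def add_value(self, doc_id, vector_value):
        # Reject a second value for the same document
        # (mirrors the docID == lastDocID check above).
        if doc_id == self.last_doc_id:
            raise ValueError(
                'VectorValuesField "%s" appears more than once in this '
                'document (only one value is allowed per field)' % self.field_name)
        # Dimension check: catches callers that bypass Field type safety.
        if len(vector_value) != self.dimension:
            raise ValueError(
                'vector has dimension %d != field dimension %d'
                % (len(vector_value), self.dimension))
        # Defensive copy, like ArrayUtil.copyOfSubArray in the Java code.
        self.vectors.append(list(vector_value))
        self.last_doc_id = doc_id
```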





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] mayya-sharipova edited a comment on pull request #1943: LUCENE-9555 Advance conjuction Iterator for two phase iteration

2020-10-05 Thread GitBox


mayya-sharipova edited a comment on pull request #1943:
URL: https://github.com/apache/lucene-solr/pull/1943#issuecomment-703885914


   @jpountz Sorry for the noise, I have found the cause of this error, and the 
latest commit addresses it.
   Basically this PR will just address the failing test of 
`TestUnifiedHighlighterStrictPhrases.testBasics`.
   
   My next steps will be the following:
   - Use leap-frog logic without using ConjunctionDISI for this method (a 
separate PR for that).
   - Then reintroduce the checks in 
[ConjunctionDISI](https://github.com/apache/lucene-solr/pull/1937) that I 
reverted, since the sort optimization introduces new iterators in the middle of 
iteration, which would break those checks.
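   As a rough illustration of the leap-frog idea in the first step, here is a Python sketch over toy cursors (not the ConjunctionDISI code): every iterator is repeatedly advanced to the current candidate doc, and whenever one overshoots, the candidate is raised and the round restarts until all iterators agree.

```python
NO_MORE_DOCS = float("inf")

class Cursor:
    """Toy doc-ID iterator over a sorted postings list."""
    def __init__(self, docs):
        self.docs = sorted(docs)
        self.i = 0

    def current(self):
        return self.docs[self.i] if self.i < len(self.docs) else NO_MORE_DOCS

    def advance(self, target):
        # Skip ahead to the first doc >= target.
        while self.i < len(self.docs) and self.docs[self.i] < target:
            self.i += 1

def leapfrog_next_match(iterators):
    """Return the next doc ID present in every iterator, or NO_MORE_DOCS."""
    doc = max(it.current() for it in iterators)
    while doc != NO_MORE_DOCS:
        advanced = False
        for it in iterators:
            if it.current() < doc:
                it.advance(doc)
                if it.current() > doc:
                    # Overshot: raise the candidate and leap-frog again.
                    doc = it.current()
                    advanced = True
        if not advanced:
            return doc  # all iterators agree on this doc
    return NO_MORE_DOCS
```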






[GitHub] [lucene-solr] msokolov commented on a change in pull request #1930: LUCENE-9322: add VectorValues to new Lucene90 codec

2020-10-05 Thread GitBox


msokolov commented on a change in pull request #1930:
URL: https://github.com/apache/lucene-solr/pull/1930#discussion_r499893497



##
File path: 
lucene/codecs/src/java/org/apache/lucene/codecs/simpletext/SimpleTextVectorReader.java
##
@@ -0,0 +1,304 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.lucene.codecs.simpletext;
+
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.nio.charset.StandardCharsets;
+import java.util.HashMap;
+import java.util.Map;
+
+import org.apache.lucene.codecs.VectorReader;
+import org.apache.lucene.index.CorruptIndexException;
+import org.apache.lucene.index.FieldInfo;
+import org.apache.lucene.index.IndexFileNames;
+import org.apache.lucene.index.SegmentReadState;
+import org.apache.lucene.index.VectorValues;
+import org.apache.lucene.search.TopDocs;
+import org.apache.lucene.store.BufferedChecksumIndexInput;
+import org.apache.lucene.store.ChecksumIndexInput;
+import org.apache.lucene.store.IOContext;
+import org.apache.lucene.store.IndexInput;
+import org.apache.lucene.util.BytesRef;
+import org.apache.lucene.util.BytesRefBuilder;
+import org.apache.lucene.util.StringHelper;
+
+import static org.apache.lucene.codecs.simpletext.SimpleTextVectorWriter.*;
+
+/**
+ * Reads vector values from a simple text format. All vectors are read up 
front and cached in RAM in order to support
+ * random access.
+ * FOR RECREATIONAL USE ONLY
+ * @lucene.experimental
+ */
+public class SimpleTextVectorReader extends VectorReader {
+
+  private static final BytesRef EMPTY = new BytesRef("");
+
+  private final SegmentReadState readState;
+  private final IndexInput dataIn;
+  private final BytesRefBuilder scratch = new BytesRefBuilder();
+  private final Map<String, FieldEntry> fieldEntries = new HashMap<>();
+
+  SimpleTextVectorReader(SegmentReadState readState) throws IOException {
+this.readState = readState;
+String metaFileName = 
IndexFileNames.segmentFileName(readState.segmentInfo.name, 
readState.segmentSuffix, SimpleTextVectorFormat.META_EXTENSION);
+try (ChecksumIndexInput in = 
readState.directory.openChecksumInput(metaFileName, IOContext.DEFAULT)) {
+  int fieldNumber = readInt(in, FIELD_NUMBER);
+  while (fieldNumber != -1) {
+String fieldName = readString(in, FIELD_NAME);
+String scoreFunctionName = readString(in, SCORE_FUNCTION);
+VectorValues.ScoreFunction scoreFunction = 
VectorValues.ScoreFunction.valueOf(scoreFunctionName);
+long vectorDataOffset = readLong(in, VECTOR_DATA_OFFSET);
+long vectorDataLength = readLong(in, VECTOR_DATA_LENGTH);
+int dimension = readInt(in, VECTOR_DIMENSION);
+int size = readInt(in, SIZE);
+int[] docIds = new int[size];
+for (int i = 0; i < size; i++) {
+  docIds[i] = readInt(in, EMPTY);
+}
+assert fieldEntries.containsKey(fieldName) == false;
+fieldEntries.put(fieldName, new FieldEntry(dimension, scoreFunction, 
vectorDataOffset, vectorDataLength, docIds));
+fieldNumber = readInt(in, FIELD_NUMBER);
+  }
+  SimpleTextUtil.checkFooter(in);
+}
+
+String vectorFileName = 
IndexFileNames.segmentFileName(readState.segmentInfo.name, 
readState.segmentSuffix, SimpleTextVectorFormat.VECTOR_EXTENSION);
+dataIn = readState.directory.openInput(vectorFileName, IOContext.DEFAULT);
+  }
+
+  @Override
+  public VectorValues getVectorValues(String field) throws IOException {
+FieldInfo info = readState.fieldInfos.fieldInfo(field);
+if (info == null) {
+  throw new IllegalStateException("No vectors indexed for field=\"" + 
field + "\"");
+}
+int dimension = info.getVectorDimension();
+if (dimension == 0) {
+  return VectorValues.EMPTY;
+}
+FieldEntry fieldEntry = fieldEntries.get(field);
+if (fieldEntry == null) {
+  throw new IllegalStateException("No entry found for vector field=\"" + 
field + "\"");
+}
+if (dimension != fieldEntry.dimension) {
+  throw new IllegalStateException("Inconsistent vector dimension for 
field=\"" + field + "\"; " + dimension + " != " + fieldEntry.dimension);
+}
+IndexInput bytesSlice = dataIn.slice("vector-data", 
fieldEnt

[GitHub] [lucene-solr] msokolov commented on a change in pull request #1930: LUCENE-9322: add VectorValues to new Lucene90 codec

2020-10-05 Thread GitBox


msokolov commented on a change in pull request #1930:
URL: https://github.com/apache/lucene-solr/pull/1930#discussion_r499891879



##
File path: 
lucene/core/src/java/org/apache/lucene/codecs/lucene90/Lucene90VectorReader.java
##
@@ -0,0 +1,326 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.lucene.codecs.lucene90;
+
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.nio.FloatBuffer;
+import java.util.HashMap;
+import java.util.Map;
+
+import org.apache.lucene.codecs.CodecUtil;
+import org.apache.lucene.codecs.VectorReader;
+import org.apache.lucene.index.CorruptIndexException;
+import org.apache.lucene.index.FieldInfo;
+import org.apache.lucene.index.FieldInfos;
+import org.apache.lucene.index.IndexFileNames;
+import org.apache.lucene.index.SegmentReadState;
+import org.apache.lucene.index.VectorValues;
+import org.apache.lucene.search.TopDocs;
+import org.apache.lucene.store.ChecksumIndexInput;
+import org.apache.lucene.store.IndexInput;
+import org.apache.lucene.util.BytesRef;
+import org.apache.lucene.util.IOUtils;
+import org.apache.lucene.util.RamUsageEstimator;
+
+/**
+ * Reads vectors from the index segments.
+ */
+public final class Lucene90VectorReader extends VectorReader {
+
+  private final FieldInfos fieldInfos;
+  private final Map<String, FieldEntry> fields = new HashMap<>();
+  private final IndexInput vectorData;
+  private final int maxDoc;
+
+  Lucene90VectorReader(SegmentReadState state) throws IOException {
+this.fieldInfos = state.fieldInfos;
+this.maxDoc = state.segmentInfo.maxDoc();
+
+String metaFileName = 
IndexFileNames.segmentFileName(state.segmentInfo.name, state.segmentSuffix, 
Lucene90VectorFormat.META_EXTENSION);
+int versionMeta = -1;
+try (ChecksumIndexInput meta = 
state.directory.openChecksumInput(metaFileName, state.context)) {
+  Throwable priorE = null;
+  try {
+versionMeta = CodecUtil.checkIndexHeader(meta,
+Lucene90VectorFormat.META_CODEC_NAME,
+Lucene90VectorFormat.VERSION_START,
+Lucene90VectorFormat.VERSION_CURRENT,
+state.segmentInfo.getId(),
+state.segmentSuffix);
+readFields(meta, state.fieldInfos);
+  } catch (Throwable exception) {
+priorE = exception;
+  } finally {
+CodecUtil.checkFooter(meta, priorE);
+  }
+}
+
+boolean success = false;
+
+String vectorDataFileName = 
IndexFileNames.segmentFileName(state.segmentInfo.name, state.segmentSuffix, 
Lucene90VectorFormat.VECTOR_DATA_EXTENSION);
+this.vectorData = state.directory.openInput(vectorDataFileName, 
state.context);
+try {
+  int versionVectorData = CodecUtil.checkIndexHeader(vectorData,
+  Lucene90VectorFormat.VECTOR_DATA_CODEC_NAME,
+  Lucene90VectorFormat.VERSION_START,
+  Lucene90VectorFormat.VERSION_CURRENT,
+  state.segmentInfo.getId(),
+  state.segmentSuffix);
+  if (versionMeta != versionVectorData) {
+throw new CorruptIndexException("Format versions mismatch: meta=" + 
versionMeta + ", vector data=" + versionVectorData, vectorData);
+  }
+  CodecUtil.retrieveChecksum(vectorData);
+
+  success = true;
+} finally {
+  if (!success) {
+IOUtils.closeWhileHandlingException(this.vectorData);
+  }
+}
+  }
+
+  private void readFields(ChecksumIndexInput meta, FieldInfos infos) throws 
IOException {
+for (int fieldNumber = meta.readInt(); fieldNumber != -1; fieldNumber = 
meta.readInt()) {
+  FieldInfo info = infos.fieldInfo(fieldNumber);
+  if (info == null) {
+throw new CorruptIndexException("Invalid field number: " + 
fieldNumber, meta);
+  }
+  VectorValues.ScoreFunction scoreFunction = 
VectorValues.ScoreFunction.fromId(meta.readInt());
+  long vectorDataOffset = meta.readVLong();
+  long vectorDataLength = meta.readVLong();
+  int dimension = meta.readInt();
+  int size = meta.readInt();
+  int[] ordToDoc = new int[size];
+  for (int i = 0; i < size; i++) {
+int doc = meta.readVInt();
+ordToDoc[i] = doc;
+  }
+  FieldEntry fieldEntry = new FieldEntry(dimension, scoreFunction, maxDoc,

[GitHub] [lucene-solr] JohnHillegass opened a new pull request #1949: Jinja2_autoescape_false set to True

2020-10-05 Thread GitBox


JohnHillegass opened a new pull request #1949:
URL: https://github.com/apache/lucene-solr/pull/1949


   By default, jinja2 sets autoescape to False. Consider using autoescape=True 
to mitigate XSS vulnerabilities.
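   
   A minimal sketch of the difference, assuming the stock jinja2 `Environment` API:
   
```python
from jinja2 import Environment

payload = "<script>alert(1)</script>"

# Default: autoescape=False, so user-controlled input is interpolated verbatim.
unsafe = Environment().from_string("Hello {{ name }}").render(name=payload)

# With autoescape=True, HTML-special characters are escaped, mitigating XSS.
safe = Environment(autoescape=True).from_string("Hello {{ name }}").render(name=payload)

print(unsafe)  # Hello <script>alert(1)</script>
print(safe)    # Hello &lt;script&gt;alert(1)&lt;/script&gt;
```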
   
   
   
   
   # Description
   
   Please provide a short description of the changes you're making with this 
pull request.
   
   # Solution
   
   Please provide a short description of the approach taken to implement your 
solution.
   
   # Tests
   
   Please describe the tests you've developed or run to confirm this patch 
implements the feature or solves the problem.
   
   # Checklist
   
   Please review the following and check all that apply:
   
   - [ ] I have reviewed the guidelines for [How to 
Contribute](https://wiki.apache.org/solr/HowToContribute) and my code conforms 
to the standards described there to the best of my ability.
   - [ ] I have created a Jira issue and added the issue ID to my pull request 
title.
   - [ ] I have given Solr maintainers 
[access](https://help.github.com/en/articles/allowing-changes-to-a-pull-request-branch-created-from-a-fork)
 to contribute to my PR branch. (optional but recommended)
   - [ ] I have developed this patch against the `master` branch.
   - [ ] I have run `./gradlew check`.
   - [ ] I have added tests for my changes.
   - [ ] I have added documentation for the [Ref 
Guide](https://github.com/apache/lucene-solr/tree/master/solr/solr-ref-guide) 
(for Solr changes only).
   






[jira] [Commented] (SOLR-14802) Sorting by min of two geodist functions

2020-10-05 Thread Shaun Storey (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17208333#comment-17208333
 ] 

Shaun Storey commented on SOLR-14802:
-

Hi [~ctargett], the change just restores previous functionality that is still 
in the documentation 
[https://lucene.apache.org/solr/guide/8_6/spatial-search.html#geodist] as you 
mentioned. I guess the docs could be improved with an example: while the 
field-name form is mentioned, all the code samples show the parameterless usage.
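
For example, the docs could show the field-name form next to the parameterless 
one (host, collection, and field names as in this issue's description):

```
/select?q=*:*&fl=ID&fq={!geofilt}&d=50&pt=53.4721936,-2.24703&sfield=MAIN_LOCATION
    &sort=min(geodist(), geodist(ALT_LOCATION,53.4721936,-2.24703)) asc
```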

> Sorting by min of two geodist functions
> ---
>
> Key: SOLR-14802
> URL: https://issues.apache.org/jira/browse/SOLR-14802
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: spatial
>Reporter: Shaun Storey
>Assignee: David Smiley
>Priority: Major
> Fix For: 8.7
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Previously using geo field with type LatLonType you could implement a query 
> like
> {code:java}
> /select?q=*:*&fl=ID&fq={!geofilt}&d=50&pt=53.4721936,-2.24703&sfield=MAIN_LOCATION&sort=min(geodist(),
>  geodist(ALT_LOCATION,53.4721936,-2.24703)) asc{code}
> to sort results on minimum distance from multiple different locations. Moving 
> the fields to the advised LatLonPointSpatialField gives
> "geodist() does not support field names in its arguments when stated fields 
> are solr.LatLonPointSpatialField spatial type, requires sfield param instead"
> This has been reviewed before in SOLR-11601 but not for my actual 
> use-case so it seems the choice was to just change the error message rather 
> than to implement the previous functionality. Can this be re-reviewed or let 
> me know if there is another way to achieve the same result.
>  






[jira] [Commented] (SOLR-14802) Sorting by min of two geodist functions

2020-10-05 Thread David Smiley (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17208332#comment-17208332
 ] 

David Smiley commented on SOLR-14802:
-

the spatial-search.adoc page already documents geodist and refers to these 
parameters it takes. 
{noformat}
`geodist` is a distance function that takes three optional parameters: 
`(sfield,latitude,longitude)`. You can use the `geodist` function to sort 
results by distance or score return results.
{noformat}
The issue here addresses a shortcoming so that geodist now actually works with 
most spatial fields. That one line mentioning these parameters had been 
somewhat obsolete for a number of years, and is now accurate again :-). I 
suspect the latitude & longitude aspect may be outdated; you're expected to use 
the "pt" param like all the examples show.

> Sorting by min of two geodist functions
> ---
>
> Key: SOLR-14802
> URL: https://issues.apache.org/jira/browse/SOLR-14802
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: spatial
>Reporter: Shaun Storey
>Assignee: David Smiley
>Priority: Major
> Fix For: 8.7
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Previously using geo field with type LatLonType you could implement a query 
> like
> {code:java}
> /select?q=*:*&fl=ID&fq={!geofilt}&d=50&pt=53.4721936,-2.24703&sfield=MAIN_LOCATION&sort=min(geodist(),
>  geodist(ALT_LOCATION,53.4721936,-2.24703)) asc{code}
> to sort results on minimum distance from multiple different locations. Moving 
> the fields to the advised LatLonPointSpatialField gives
> "geodist() does not support field names in its arguments when stated fields 
> are solr.LatLonPointSpatialField spatial type, requires sfield param instead"
> This has been reviewed before in SOLR-11601 but not for my actual 
> use-case so it seems the choice was to just change the error message rather 
> than to implement the previous functionality. Can this be re-reviewed or let 
> me know if there is another way to achieve the same result.
>  






[jira] [Commented] (SOLR-13438) DELETE collection should remove AUTOCREATED configsets

2020-10-05 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17208321#comment-17208321
 ] 

ASF subversion and git services commented on SOLR-13438:


Commit b45c43fdebf1a4bf6b33f2541a8fdb630a705775 in lucene-solr's branch 
refs/heads/master from Cassandra Targett
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=b45c43f ]

SOLR-13438: update ref guide for new default delete behavior


> DELETE collection should remove AUTOCREATED configsets
> --
>
> Key: SOLR-13438
> URL: https://issues.apache.org/jira/browse/SOLR-13438
> Project: Solr
>  Issue Type: Improvement
>Reporter: Ishan Chattopadhyaya
>Assignee: Mike Drob
>Priority: Major
>  Labels: newdev
> Fix For: master (9.0), 8.7
>
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> Current user experience:
> # User creates a collection (without specifying configset), and makes some 
> schema/config changes.
> # He's/She's not happy with how the changes turned out, so he/she deletes and 
> re-creates the collection.
> # He/she observes that the previously made settings changes persist. If 
> he/she is only aware of Schema and Config APIs and not explicitly aware of 
> the concept of configsets, this will be un-intuitive for him/her.
> Proposed:
> DELETE collection should delete the configset if it has the prefix 
> ".AUTOCREATED" and that configset isn't being shared by any other collection.
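> 
> A hedged Python sketch of the proposed decision (illustrative names; the 
> ".AUTOCREATED" marker follows the wording above, and Solr's actual configset 
> naming may differ):

```python
AUTOCREATED_MARKER = ".AUTOCREATED"  # per the proposal; actual Solr naming may differ

def should_delete_configset(configset, collections_to_configsets, deleted_collection):
    """Decide whether deleting `deleted_collection` should also delete its configset.

    `collections_to_configsets` maps every collection to the configset it uses.
    """
    if AUTOCREATED_MARKER not in configset:
        return False  # user-managed configset: never auto-delete
    # Keep the configset if any other collection still shares it.
    still_used = any(cs == configset
                     for coll, cs in collections_to_configsets.items()
                     if coll != deleted_collection)
    return not still_used
```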






[jira] [Commented] (SOLR-13438) DELETE collection should remove AUTOCREATED configsets

2020-10-05 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17208323#comment-17208323
 ] 

ASF subversion and git services commented on SOLR-13438:


Commit 4c76813f8c89c647381b42d93152953bbc8ef083 in lucene-solr's branch 
refs/heads/branch_8x from Cassandra Targett
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=4c76813 ]

SOLR-13438: update ref guide for new default delete behavior


> DELETE collection should remove AUTOCREATED configsets
> --
>
> Key: SOLR-13438
> URL: https://issues.apache.org/jira/browse/SOLR-13438
> Project: Solr
>  Issue Type: Improvement
>Reporter: Ishan Chattopadhyaya
>Assignee: Mike Drob
>Priority: Major
>  Labels: newdev
> Fix For: master (9.0), 8.7
>
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> Current user experience:
> # User creates a collection (without specifying configset), and makes some 
> schema/config changes.
> # He's/She's not happy with how the changes turned out, so he/she deletes and 
> re-creates the collection.
> # He/she observes that the previously made settings changes persist. If 
> he/she is only aware of Schema and Config APIs and not explicitly aware of 
> the concept of configsets, this will be un-intuitive for him/her.
> Proposed:
> DELETE collection should delete the configset if it has the prefix 
> ".AUTOCREATED" and that configset isn't being shared by any other collection.






[jira] [Commented] (SOLR-14802) Sorting by min of two geodist functions

2020-10-05 Thread Cassandra Targett (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17208315#comment-17208315
 ] 

Cassandra Targett commented on SOLR-14802:
--

[~dsmiley], [~spstorey] - I noticed there weren't any Ref Guide updates with 
this, and I'm guessing without knowing very much about spatial that it's 
because the {{geodist}} function can take a "{{sfield}}" (spatial field name) 
as a parameter already? Or was the update missed?

> Sorting by min of two geodist functions
> ---
>
> Key: SOLR-14802
> URL: https://issues.apache.org/jira/browse/SOLR-14802
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: spatial
>Reporter: Shaun Storey
>Assignee: David Smiley
>Priority: Major
> Fix For: 8.7
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Previously using geo field with type LatLonType you could implement a query 
> like
> {code:java}
> /select?q=*:*&fl=ID&fq={!geofilt}&d=50&pt=53.4721936,-2.24703&sfield=MAIN_LOCATION&sort=min(geodist(),
>  geodist(ALT_LOCATION,53.4721936,-2.24703)) asc{code}
> to sort results on minimum distance from multiple different locations. Moving 
> the fields to the advised LatLonPointSpatialField gives
> "geodist() does not support field names in its arguments when stated fields 
> are solr.LatLonPointSpatialField spatial type, requires sfield param instead"
> This has been reviewed before in SOLR-11601 but not for my actual 
> use-case so it seems the choice was to just change the error message rather 
> than to implement the previous functionality. Can this be re-reviewed or let 
> me know if there is another way to achieve the same result.
>  






[GitHub] [lucene-solr] mayya-sharipova commented on pull request #1943: LUCENE-9555 Advance conjuction Iterator for two phase iteration

2020-10-05 Thread GitBox


mayya-sharipova commented on pull request #1943:
URL: https://github.com/apache/lucene-solr/pull/1943#issuecomment-703885914


   @jpountz Sorry for the noise, I have found the cause of this error, and the 
latest commit addresses it.






[GitHub] [lucene-solr] msokolov commented on a change in pull request #1930: LUCENE-9322: add VectorValues to new Lucene90 codec

2020-10-05 Thread GitBox


msokolov commented on a change in pull request #1930:
URL: https://github.com/apache/lucene-solr/pull/1930#discussion_r499863809



##
File path: lucene/core/src/java/org/apache/lucene/index/CheckIndex.java
##
@@ -374,7 +377,25 @@ private FieldNormStatus() {
   /** Total number of fields with points. */
   public int totalValueFields;
   
-  /** Exception thrown during doc values test (null on success) */
+  /** Exception thrown during point values test (null on success) */

Review comment:
   Yeah, Points was a model for much of this, so I stumbled on that








[GitHub] [lucene-solr] msokolov commented on a change in pull request #1930: LUCENE-9322: add VectorValues to new Lucene90 codec

2020-10-05 Thread GitBox


msokolov commented on a change in pull request #1930:
URL: https://github.com/apache/lucene-solr/pull/1930#discussion_r499862666



##
File path: lucene/core/src/java/org/apache/lucene/document/FieldType.java
##
@@ -351,6 +356,27 @@ public int pointNumBytes() {
 return dimensionNumBytes;
   }
 
+  void setVectorDimensionsAndScoreFunction(int numDimensions, 
VectorValues.ScoreFunction distFunc) {
+if (numDimensions <= 0) {
+  throw new IllegalArgumentException("vector numDimensions must be > 0; 
got " + numDimensions);
+}
+if (numDimensions > VectorValues.MAX_DIMENSIONS) {
+  throw new IllegalArgumentException("vector numDimensions must be <= 
VectorValues.MAX_DIMENSIONS (=" + VectorValues.MAX_DIMENSIONS + "); got " + 
numDimensions);
+}
+this.vectorDimension = numDimensions;

Review comment:
   hmm, I noticed we do not do this for Points. I think that in practice we do 
not expect users to create these FieldTypes - they are created implicitly when 
adding values using the VectorField constructors - we create a new Type for 
every Field unless the user cleverly creates a generic Field with a float[] as 
its `fieldsData` and an appropriate FieldType. Efficient usage would seem to 
be: create a VectorField, add it to a doc, update its value, and repeat. 
Anyway, I'll add the check.








[GitHub] [lucene-solr] jtibshirani edited a comment on pull request #1948: LUCENE-9536: Optimize OrdinalMap when one segment contains all distinct values.

2020-10-05 Thread GitBox


jtibshirani edited a comment on pull request #1948:
URL: https://github.com/apache/lucene-solr/pull/1948#issuecomment-703871852


   I used `TestOrdinalMap` to test a map with 10,000 terms and ~10 segments. In 
the scenario where one segment contains all ordinal values, it shows a small 
improvement:
   
   ```
   baseline bytes used: 11184
   new bytes used: 10536
   ```
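   
   For context, a small Python sketch of the optimization being measured 
(illustrative, not the OrdinalMap code): when one segment already contains 
every distinct term, its local-to-global mapping is the identity and need not 
be materialized at all.
   
```python
def build_segment_to_global(segment_term_lists):
    """Map each segment's local ordinals to global ordinals over the term union.

    Sketch of the OrdinalMap idea: ordinals index a segment's sorted terms, and
    a segment holding every distinct term gets an identity mapping (stored as
    None here, i.e. zero extra memory).
    """
    global_terms = sorted(set().union(*map(set, segment_term_lists)))
    global_ord = {t: i for i, t in enumerate(global_terms)}
    mappings = []
    for terms in segment_term_lists:
        local_sorted = sorted(set(terms))
        if local_sorted == global_terms:
            mappings.append(None)  # identity: local ord == global ord
        else:
            mappings.append([global_ord[t] for t in local_sorted])
    return mappings
```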






[GitHub] [lucene-solr] msokolov commented on a change in pull request #1930: LUCENE-9322: add VectorValues to new Lucene90 codec

2020-10-05 Thread GitBox


msokolov commented on a change in pull request #1930:
URL: https://github.com/apache/lucene-solr/pull/1930#discussion_r499858704



##
File path: lucene/core/src/java/org/apache/lucene/codecs/VectorWriter.java
##
@@ -0,0 +1,253 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.lucene.codecs;
+
+import java.io.Closeable;
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.List;
+
+import org.apache.lucene.index.DocIDMerger;
+import org.apache.lucene.index.FieldInfo;
+import org.apache.lucene.index.MergeState;
+import org.apache.lucene.index.VectorValues;
+import org.apache.lucene.search.TopDocs;
+import org.apache.lucene.util.BytesRef;
+
+import static org.apache.lucene.search.DocIdSetIterator.NO_MORE_DOCS;
+
+/**
+ * Writes vectors to an index.
+ */
+public abstract class VectorWriter implements Closeable {
+
+  /** Sole constructor */
+  protected VectorWriter() {}
+
+  /** Write all values contained in the provided reader */
+  public abstract void writeField(FieldInfo fieldInfo, VectorValues values) 
throws IOException;
+
+  /** Called once at the end before close */
+  public abstract void finish() throws IOException;
+
+  /** Merge the vector values from multiple segments, for all fields */
+  public void merge(MergeState mergeState) throws IOException {
+for (VectorReader reader : mergeState.vectorReaders) {
+  if (reader != null) {
+reader.checkIntegrity();
+  }
+}
+for (FieldInfo fieldInfo : mergeState.mergeFieldInfos) {
+  if (fieldInfo.hasVectorValues()) {
+mergeVectors(fieldInfo, mergeState);
+  }
+}
+finish();
+  }
+
+  private void mergeVectors(FieldInfo mergeFieldInfo, final MergeState 
mergeState) throws IOException {
+if (mergeState.infoStream.isEnabled("VV")) {
+  mergeState.infoStream.message("VV", "merging " + mergeState.segmentInfo);
+}
+List subs = new ArrayList<>();
+int dimension = -1;
+VectorValues.ScoreFunction scoreFunction = null;
+for (int i = 0; i < mergeState.vectorReaders.length; i++) {
+  VectorReader vectorReader = mergeState.vectorReaders[i];
+  if (vectorReader != null) {
+if (mergeFieldInfo != null && mergeFieldInfo.hasVectorValues()) {
+  int segmentDimension = mergeFieldInfo.getVectorDimension();
+  VectorValues.ScoreFunction segmentScoreFunction = 
mergeFieldInfo.getVectorScoreFunction();
+  if (dimension == -1) {
+dimension = segmentDimension;
+scoreFunction = mergeFieldInfo.getVectorScoreFunction();
+  } else if (dimension != segmentDimension) {
+throw new IllegalStateException("Varying dimensions for 
vector-valued field " + mergeFieldInfo.name
++ ": " + dimension + "!=" + segmentDimension);
+  } else if (scoreFunction != segmentScoreFunction) {

Review comment:
   Yes, `IndexingChain.indexVector` calls 
`FieldInfo.setVectorDimensionAndScoreFunction`, which checks against existing 
values. It allows going from 0 to a non-zero dimension (setting scoreFunction at 
that time), but no other change is allowed. This is tested in TestVectorValues.
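
   The rule described above (a dimension may move from 0/unset to a concrete 
value once, after which any differing value is rejected) can be sketched 
roughly as follows; `VectorDimGuard` and its method are illustrative stand-ins, 
not the actual `FieldInfo` API:

   ```java
   // Illustrative sketch of the guard described above: the dimension may go
   // from 0 (meaning "not set yet") to a concrete value once; any later,
   // different non-zero value is rejected, and 0 is treated as a no-op.
   public class VectorDimGuard {
     private int dimension; // 0 means "not set yet"

     int setDimension(int newDimension) {
       if (dimension == 0) {
         dimension = newDimension; // first real value wins
       } else if (dimension != newDimension && newDimension != 0) {
         throw new IllegalArgumentException(
             "cannot change vector dimension from " + dimension + " to " + newDimension);
       }
       return dimension;
     }
   }
   ```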





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] msokolov commented on pull request #1930: LUCENE-9322: add VectorValues to new Lucene90 codec

2020-10-05 Thread GitBox


msokolov commented on pull request #1930:
URL: https://github.com/apache/lucene-solr/pull/1930#issuecomment-703872279


   > Thank you for ... the tests catching mis-use where user tries to change 
dimension or scoring function in an existing field.
   Thanks to @mocobeta for those; I was able to carry that forward from her 
earlier patch
   
   > I see you implemented the two score functions, but are they ever exercised 
in tests
   True - this was extracted from a bigger change including usage of those 
methods as part of KNN search, but they deserve their own unit tests - I'll add.
   
   > I would love to see a "Vector Overview" javadoc somewhere ...
   Yes - I'll add to the VectorValues/VectorField class javadocs; I think that's 
the most natural/visible place.
   
   > I am curious how the basic vector usage performs -- just indexing one 
vector field, and retrieving it at search time. We can (separately) enable 
luceneutil to support testing vectors, somehow. But I wonder where we'll get 
semi-realistic vectors derived from Wikipedia content 
   Agreed that benchmarking is needed. I think we can use 
http://ann-benchmarks.com/ as a guide for some standardized test vectors. They 
won't be related to wikipedia? If we get to wanting that, we could also make 
use of something like https://fasttext.cc/docs/en/pretrained-vectors.html that 
is trained on ngrams taken from Wikipedia (for many languages)? I don't know 
how suited it is, just found in a google search. For that, we'd have to compute 
document/query vectors based on an ngram-vector dictionary. I think a simple 
thing is to sum all the ngram-vectors for all the ngrams in a document / query



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] msokolov edited a comment on pull request #1930: LUCENE-9322: add VectorValues to new Lucene90 codec

2020-10-05 Thread GitBox


msokolov edited a comment on pull request #1930:
URL: https://github.com/apache/lucene-solr/pull/1930#issuecomment-703872279


   > Thank you for ... the tests catching mis-use where user tries to change 
dimension or scoring function in an existing field.
   
   Thanks to @mocobeta for those; I was able to carry that forward from her 
earlier patch
   
   > I see you implemented the two score functions, but are they ever exercised 
in tests
   
   True - this was extracted from a bigger change including usage of those 
methods as part of KNN search, but they deserve their own unit tests - I'll add.
   
   > I would love to see a "Vector Overview" javadoc somewhere ...
   
   Yes - I'll add to the VectorValues/VectorField class javadocs; I think that's 
the most natural/visible place.
   
   > I am curious how the basic vector usage performs -- just indexing one 
vector field, and retrieving it at search time. We can (separately) enable 
luceneutil to support testing vectors, somehow. But I wonder where we'll get 
semi-realistic vectors derived from Wikipedia content 
   
   Agreed that benchmarking is needed. I think we can use 
http://ann-benchmarks.com/ as a guide for some standardized test vectors. They 
won't be related to wikipedia? If we get to wanting that, we could also make 
use of something like https://fasttext.cc/docs/en/pretrained-vectors.html that 
is trained on ngrams taken from Wikipedia (for many languages)? I don't know 
how suited it is, just found in a google search. For that, we'd have to compute 
document/query vectors based on an ngram-vector dictionary. I think a simple 
thing is to sum all the ngram-vectors for all the ngrams in a document / query
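
   A toy version of that summing idea (the class name and the whitespace 
tokenization are illustrative; real ngram vectors would come from a pretrained 
model such as fastText):

   ```java
   import java.util.HashMap;
   import java.util.Map;

   // Hypothetical sketch: build a document vector by summing the vectors of
   // the ngrams (here simply whitespace tokens) it contains, as suggested above.
   public class NgramVectorSum {
     // toy ngram-vector dictionary; a real one would come from a trained model
     static Map<String, float[]> dict = new HashMap<>();

     static float[] documentVector(String doc, int dim) {
       float[] sum = new float[dim];
       for (String token : doc.split("\\s+")) {
         float[] v = dict.get(token);
         if (v == null) continue; // out-of-vocabulary tokens contribute nothing
         for (int i = 0; i < dim; i++) {
           sum[i] += v[i];
         }
       }
       return sum;
     }
   }
   ```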



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] jtibshirani commented on pull request #1948: LUCENE-9536: Optimize OrdinalMap when one segment contains all distinct values.

2020-10-05 Thread GitBox


jtibshirani commented on pull request #1948:
URL: https://github.com/apache/lucene-solr/pull/1948#issuecomment-703871852


   I used `TestOrdinalMap` to test a map with 10,000 terms and ~10 segments. In 
the scenario where one segment contains all ordinal values, it shows a small 
improvement:
   
   ```
   baseline bytes used: 11184
   new bytes used: 10536
   ```



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] jtibshirani opened a new pull request #1948: LUCENE-9536: Optimize OrdinalMap when one segment contains all distinct values.

2020-10-05 Thread GitBox


jtibshirani opened a new pull request #1948:
URL: https://github.com/apache/lucene-solr/pull/1948


   For doc values whose cardinality is not too high, it is common for some large 
segments to contain all distinct values. In this case, we can check if the first 
segment ords map perfectly to global ords, and if so store the global ord deltas 
and first segment indices as `LongValues.ZEROES` to save some space.
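
   The check described above can be sketched roughly as follows (a simplified 
model, not the actual `OrdinalMap` code; segment-to-global ord mappings are 
modeled as plain long arrays):

   ```java
   // Simplified sketch of the optimization described above: if the first
   // segment's ordinals map 1:1 onto the global ordinals, the per-ordinal
   // delta table is all zeros and can be replaced by a constant
   // (LongValues.ZEROES-style) to save space.
   public class OrdMapSketch {
     /** Returns true when segment ord i maps to global ord i for every i,
      *  i.e. the first segment contains all distinct values in order. */
     static boolean firstSegmentIsIdentity(long[] firstSegmentGlobalOrds, long valueCount) {
       if (firstSegmentGlobalOrds.length != valueCount) {
         return false; // the segment is missing some of the distinct values
       }
       for (int i = 0; i < firstSegmentGlobalOrds.length; i++) {
         if (firstSegmentGlobalOrds[i] != i) {
           return false; // a non-zero delta exists somewhere
         }
       }
       return true;
     }
   }
   ```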



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] msokolov commented on pull request #1930: LUCENE-9322: add VectorValues to new Lucene90 codec

2020-10-05 Thread GitBox


msokolov commented on pull request #1930:
URL: https://github.com/apache/lucene-solr/pull/1930#issuecomment-703831393


   Thanks for the extensive comments, @mikemccand - I'll address soon with an 
updated PR. 
   
   I also found some bugs in implementations of the random access interface, 
and I want to fix those and enhance the test coverage. Mainly, they were 
incorrectly sharing state with the enclosing iterators, which causes issues if 
you simultaneously iterate and access randomly. There were also a couple of 
bugs dealing with empty segments that apparently weren't caught by testRandom.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] danmuzi commented on a change in pull request #1905: LUCENE-9488 Release with Gradle Part 2

2020-10-05 Thread GitBox


danmuzi commented on a change in pull request #1905:
URL: https://github.com/apache/lucene-solr/pull/1905#discussion_r499759537



##
File path: gradle/releasing.gradle
##
@@ -0,0 +1,58 @@
+

Review comment:
   minor comment but there is an unnecessary blank line :)





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] iverase commented on pull request #1940: LUCENE-9552: Adds a LatLonPoint query that accepts an array of LatLonGeometries

2020-10-05 Thread GitBox


iverase commented on pull request #1940:
URL: https://github.com/apache/lucene-solr/pull/1940#issuecomment-703744381


   @rmuir, I am trying to understand why adding this new API seems so 
problematic; I would be more inclined to deprecate `newPolygonQuery` in favour 
of this new one. My feeling is that we should not force the user to decide 
which API to use, but rather let them provide the geometries while we 
internally decide the most optimal way to execute them.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] uschindler commented on a change in pull request #1905: LUCENE-9488 Release with Gradle Part 2

2020-10-05 Thread GitBox


uschindler commented on a change in pull request #1905:
URL: https://github.com/apache/lucene-solr/pull/1905#discussion_r499709502



##
File path: gradle/releasing.gradle
##
@@ -0,0 +1,58 @@
+
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+import org.apache.commons.codec.digest.DigestUtils
+import org.apache.commons.codec.digest.MessageDigestAlgorithms
+
+// We're using commons-codec for computing checksums.
+buildscript {
+repositories {
+mavenCentral()
+}
+
+dependencies {
+classpath 'commons-codec:commons-codec:1.13'
+}
+}
+
+allprojects {
+plugins.withType(DistributionPlugin) {
+def checksum = {
+outputs.files.each { File file ->
+new File(file.parent, file.name + ".sha512").text = new 
DigestUtils(MessageDigestAlgorithms.SHA_512).digestAsHex(file).trim() + "  " + 
file.name

Review comment:
   Use charset here. `File.text = ...` uses default charset. Use 
`File.setText(..., 'UTF-8')`
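
   For illustration, an explicit-charset version of that checksum write might 
look like this in plain Java (using the JDK's `MessageDigest` in place of 
commons-codec's `DigestUtils`; the class and method names are made up):

   ```java
   import java.nio.charset.StandardCharsets;
   import java.nio.file.Files;
   import java.nio.file.Path;
   import java.security.MessageDigest;

   public class Sha512Writer {
     // Compute the SHA-512 of a file and write "<hex>  <name>" next to it,
     // always using UTF-8 rather than the platform default charset.
     static void writeChecksum(Path file) throws Exception {
       MessageDigest md = MessageDigest.getInstance("SHA-512");
       byte[] digest = md.digest(Files.readAllBytes(file));
       StringBuilder hex = new StringBuilder();
       for (byte b : digest) {
         hex.append(String.format("%02x", b));
       }
       String line = hex + "  " + file.getFileName();
       Files.write(file.resolveSibling(file.getFileName() + ".sha512"),
           line.getBytes(StandardCharsets.UTF_8));
     }
   }
   ```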





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] uschindler commented on pull request #1905: LUCENE-9488 Release with Gradle Part 2

2020-10-05 Thread GitBox


uschindler commented on pull request #1905:
URL: https://github.com/apache/lucene-solr/pull/1905#issuecomment-703727019


   Nevertheless, there's still a forbiddenapis issue: when writing the SHA512 
file, we don't apply a charset, so it's platform dependent.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] dweiss commented on pull request #1905: LUCENE-9488 Release with Gradle Part 2

2020-10-05 Thread GitBox


dweiss commented on pull request #1905:
URL: https://github.com/apache/lucene-solr/pull/1905#issuecomment-703725625


   That's what I thought.
   
   On Mon, Oct 5, 2020 at 5:55 PM Uwe Schindler 
   wrote:
   
   > You can apply forbiddenapis to groovy or gradle code. But for that you
   > need to have everything compiled to build-src./build/classes With adhoc
   > scripts it wont work.
   >
   > —
   > You are receiving this because you were mentioned.
   > Reply to this email directly, view it on GitHub
   > ,
   > or unsubscribe
   > 

   > .
   >
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Updated] (SOLR-14913) Including non-existing field in edismax field alias breaks parsing of boolean query

2020-10-05 Thread Johannes Baiter (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-14913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Johannes Baiter updated SOLR-14913:
---
Description: 
When including a non-existing field in a {{f.<alias>.qf}} field alias, boolean 
queries are parsed incorrectly.
 For non-boolean queries the invalid field is simply ignored, but in boolean 
queries only the first search term is resolved to the fields from the alias, 
while the second term is resolved to the {{_text_}} field.
 This parse is identical to the parse obtained from the same query with 
brackets removed, i.e. {{<alias>: (<term1> <term2>)}} returns the exact same 
parse as {{<alias>: <term1> <term2>}}.

I was able to reproduce this with the demo index in the latest Docker setup:

{{$ docker run --name solr_demo -d -p 8983:8983 solr:8 solr-demo}}

*Query:*
{code:bash}
$ curl "http://localhost:8983/solr/demo/select?\
debugQuery=on\
&defType=edismax\
&f.meta.qf=name+manu+cat+features+idontexist\
&q=meta%3A%28samsung+noiseguard%29"
{code}
*Expected Parse:*
{code:json}
{ "debug":
  { "parsedquery_toString": "+((name:samsung | manu:samsung | features:samsung 
| cat:samsung)) (name:noiseguard | manu:noiseguard | features: noiseguard | 
cat:noiseguard))" }
}
{code}
*Actual Parse:*
{code:json}
{ "debug":
  { "parsedquery_toString": "+((name:samsung | manu:samsung | features:samsung 
| cat:samsung) (_text_:noiseguard))" }
}
{code}

  was:
When including a non-existing field in a {{f.<alias>.qf}} field alias, boolean 
queries are parsed incorrectly.
 For non-boolean queries the invalid field is simply ignored, but in boolean 
queries only the first search term is resolved to the fields from the alias, 
while the second term is resolved to the {{_text_}} field.
 This parse is identical to the parse obtained from the same query with 
brackets removed, i.e. {{<alias>: (<term1> <term2>) }}returns the exact same 
parse as{{ <alias>: <term1> <term2>}}.

I was able to reproduce this with the demo index in the latest Docker setup:

{{$ docker run --name solr_demo -d -p 8983:8983 solr:8 solr-demo}}

*Query:*
{code:bash}
$ curl "http://localhost:8983/solr/demo/select?\
debugQuery=on\
&defType=edismax\
&f.meta.qf=name+manu+cat+features+idontexist\
&q=meta%3A%28samsung+noiseguard%29"
{code}
*Expected Parse:*
{code:json}
{ "debug":
  { "parsedquery_toString": "+((name:samsung | manu:samsung | features:samsung 
| cat:samsung)) (name:noiseguard | manu:noiseguard | features: noiseguard | 
cat:noiseguard))" }
}
{code}
*Actual Parse:*
{code:json}
{ "debug":
  { "parsedquery_toString": "+((name:samsung | manu:samsung | features:samsung 
| cat:samsung) (_text_:noiseguard))" }
}
{code}


> Including non-existing field in edismax field alias breaks parsing of boolean 
> query
> ---
>
> Key: SOLR-14913
> URL: https://issues.apache.org/jira/browse/SOLR-14913
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: query parsers
>Affects Versions: 7.6, 8.6.2
>Reporter: Johannes Baiter
>Priority: Major
>
> When including a non-existing field in a {{f.<alias>.qf}} field alias, 
> boolean queries are parsed incorrectly.
>  For non-boolean queries the invalid field is simply ignored, but in boolean 
> queries only the first search term is resolved to the fields from the alias, 
> while the second term is resolved to the {{_text_}} field.
>  This parse is identical to the parse obtained from the same query with 
> brackets removed, i.e. {{<alias>: (<term1> <term2>)}} returns the exact same 
> parse as {{<alias>: <term1> <term2>}}.
> I was able to reproduce this with the demo index in the latest Docker setup:
> {{$ docker run --name solr_demo -d -p 8983:8983 solr:8 solr-demo}}
> *Query:*
> {code:bash}
> $ curl "http://localhost:8983/solr/demo/select?\
> debugQuery=on\
> &defType=edismax\
> &f.meta.qf=name+manu+cat+features+idontexist\
> &q=meta%3A%28samsung+noiseguard%29"
> {code}
> *Expected Parse:*
> {code:json}
> { "debug":
>   { "parsedquery_toString": "+((name:samsung | manu:samsung | 
> features:samsung | cat:samsung)) (name:noiseguard | manu:noiseguard | 
> features: noiseguard | cat:noiseguard))" }
> }
> {code}
> *Actual Parse:*
> {code:json}
> { "debug":
>   { "parsedquery_toString": "+((name:samsung | manu:samsung | 
> features:samsung | cat:samsung) (_text_:noiseguard))" }
> }
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Updated] (SOLR-14913) Including non-existing field in edismax field alias breaks parsing of boolean query

2020-10-05 Thread Johannes Baiter (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-14913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Johannes Baiter updated SOLR-14913:
---
Description: 
When including a non-existing field in a {{f.<alias>.qf}} field alias, boolean 
queries are parsed incorrectly.
 For non-boolean queries the invalid field is simply ignored, but in boolean 
queries only the first search term is resolved to the fields from the alias, 
while the second term is resolved to the {{_text_}} field.
 This parse is identical to the parse obtained from the same query with 
brackets removed, i.e. {{<alias>: (<term1> <term2>) }}returns the exact same 
parse as{{ <alias>: <term1> <term2>}}.

I was able to reproduce this with the demo index in the latest Docker setup:

{{$ docker run --name solr_demo -d -p 8983:8983 solr:8 solr-demo}}

*Query:*
{code:bash}
$ curl "http://localhost:8983/solr/demo/select?\
debugQuery=on\
&defType=edismax\
&f.meta.qf=name+manu+cat+features+idontexist\
&q=meta%3A%28samsung+noiseguard%29"
{code}
*Expected Parse:*
{code:json}
{ "debug":
  { "parsedquery_toString": "+((name:samsung | manu:samsung | features:samsung 
| cat:samsung)) (name:noiseguard | manu:noiseguard | features: noiseguard | 
cat:noiseguard))" }
}
{code}
*Actual Parse:*
{code:json}
{ "debug":
  { "parsedquery_toString": "+((name:samsung | manu:samsung | features:samsung 
| cat:samsung) (_text_:noiseguard))" }
}
{code}

  was:
When including a non-existing field in a {{f.<alias>.qf}} field alias, boolean 
queries are parsed incorrectly.
 For non-boolean queries the invalid field is simply ignored, but in boolean 
queries only the first search term is resolved to the fields from the alias, 
while the second term is resolved to the {{_text_}} field.
 This parse is identical to the parse obtained from the same query with 
brackets removed, i.e. {{<alias>:(<term1> <term2>) }}returns the exact same 
parse as{{ <alias>: <term1> <term2>}}.

I was able to reproduce this with the demo index in the latest Docker setup:

{{$ docker run --name solr_demo -d -p 8983:8983 solr:8 solr-demo}}

*Query:*
{code:bash}
$ curl "http://localhost:8983/solr/demo/select?\
debugQuery=on\
&defType=edismax\
&f.meta.qf=name+manu+cat+features+idontexist\
&q=meta%3A%28samsung+noiseguard%29"
{code}
*Expected Parse:*
{code:json}
{ "debug":
  { "parsedquery_toString": "+((name:samsung | manu:samsung | features:samsung 
| cat:samsung)) (name:noiseguard | manu:noiseguard | features: noiseguard | 
cat:noiseguard))" }
}
{code}
*Actual Parse:*
{code:json}
{ "debug":
  { "parsedquery_toString": "+((name:samsung | manu:samsung | features:samsung 
| cat:samsung) (_text_:noiseguard))" }
}
{code}


> Including non-existing field in edismax field alias breaks parsing of boolean 
> query
> ---
>
> Key: SOLR-14913
> URL: https://issues.apache.org/jira/browse/SOLR-14913
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: query parsers
>Affects Versions: 7.6, 8.6.2
>Reporter: Johannes Baiter
>Priority: Major
>
> When including a non-existing field in a {{f.<alias>.qf}} field alias, 
> boolean queries are parsed incorrectly.
>  For non-boolean queries the invalid field is simply ignored, but in boolean 
> queries only the first search term is resolved to the fields from the alias, 
> while the second term is resolved to the {{_text_}} field.
>  This parse is identical to the parse obtained from the same query with 
> brackets removed, i.e. {{<alias>: (<term1> <term2>) }}returns the exact same 
> parse as{{ <alias>: <term1> <term2>}}.
> I was able to reproduce this with the demo index in the latest Docker setup:
> {{$ docker run --name solr_demo -d -p 8983:8983 solr:8 solr-demo}}
> *Query:*
> {code:bash}
> $ curl "http://localhost:8983/solr/demo/select?\
> debugQuery=on\
> &defType=edismax\
> &f.meta.qf=name+manu+cat+features+idontexist\
> &q=meta%3A%28samsung+noiseguard%29"
> {code}
> *Expected Parse:*
> {code:json}
> { "debug":
>   { "parsedquery_toString": "+((name:samsung | manu:samsung | 
> features:samsung | cat:samsung)) (name:noiseguard | manu:noiseguard | 
> features: noiseguard | cat:noiseguard))" }
> }
> {code}
> *Actual Parse:*
> {code:json}
> { "debug":
>   { "parsedquery_toString": "+((name:samsung | manu:samsung | 
> features:samsung | cat:samsung) (_text_:noiseguard))" }
> }
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Created] (SOLR-14913) Including non-existing field in edismax field alias breaks parsing of boolean query

2020-10-05 Thread Johannes Baiter (Jira)
Johannes Baiter created SOLR-14913:
--

 Summary: Including non-existing field in edismax field alias 
breaks parsing of boolean query
 Key: SOLR-14913
 URL: https://issues.apache.org/jira/browse/SOLR-14913
 Project: Solr
  Issue Type: Bug
  Security Level: Public (Default Security Level. Issues are Public)
  Components: query parsers
Affects Versions: 8.6.2, 7.6
Reporter: Johannes Baiter


When including a non-existing field in a {{f.<alias>.qf}} field alias, boolean 
queries are parsed incorrectly.
 For non-boolean queries the invalid field is simply ignored, but in boolean 
queries only the first search term is resolved to the fields from the alias, 
while the second term is resolved to the {{_text_}} field.
 This parse is identical to the parse obtained from the same query with 
brackets removed, i.e. {{<alias>:(<term1> <term2>) }}returns the exact same 
parse as{{ <alias>: <term1> <term2>}}.

I was able to reproduce this with the demo index in the latest Docker setup:

{{$ docker run --name solr_demo -d -p 8983:8983 solr:8 solr-demo}}

*Query:*
{code:bash}
$ curl "http://localhost:8983/solr/demo/select?\
debugQuery=on\
&defType=edismax\
&f.meta.qf=name+manu+cat+features+idontexist\
&q=meta%3A%28samsung+noiseguard%29"
{code}
*Expected Parse:*
{code:json}
{ "debug":
  { "parsedquery_toString": "+((name:samsung | manu:samsung | features:samsung | cat:samsung) (name:noiseguard | manu:noiseguard | features:noiseguard | cat:noiseguard))" }
}
{code}
*Actual Parse:*
{code:json}
{ "debug":
  { "parsedquery_toString": "+((name:samsung | manu:samsung | features:samsung 
| cat:samsung) (_text_:noiseguard))" }
}
{code}






[GitHub] [lucene-solr] uschindler commented on pull request #1905: LUCENE-9488 Release with Gradle Part 2

2020-10-05 Thread GitBox


uschindler commented on pull request #1905:
URL: https://github.com/apache/lucene-solr/pull/1905#issuecomment-703723514


   You can apply forbiddenapis to Groovy or Gradle code, but for that you need 
to have everything compiled to buildSrc/build/classes. With ad-hoc scripts it 
won't work.






[GitHub] [lucene-solr] dweiss commented on pull request #1905: LUCENE-9488 Release with Gradle Part 2

2020-10-05 Thread GitBox


dweiss commented on pull request #1905:
URL: https://github.com/apache/lucene-solr/pull/1905#issuecomment-703718577


   Yup, don't use internal* - not a good idea. As for gradle API - I don't 
think it'll be easy to apply forbidden APIs here?






[GitHub] [lucene-solr] madrob commented on pull request #1905: LUCENE-9488 Release with Gradle Part 2

2020-10-05 Thread GitBox


madrob commented on pull request #1905:
URL: https://github.com/apache/lucene-solr/pull/1905#issuecomment-703715598


   > Are we sure we need dependency at all. I would have assumed that Gradle 
has it already.
   
   Gradle has a shaded commons codec that we can use - 
`org.gradle.internal.impldep.org.apache.commons.codec.digest.DigestUtils` - but 
I assumed that would be frowned upon.
   
   > This is also a forbiddenapis violation: the charset is missing. Please 
use: setText(..., "UTF-8")
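   The forbiddenapis complaint quoted above is about text-writing calls that silently rely on the platform default charset. A minimal JDK-only sketch of the charset-explicit pattern (the file name here is made up for illustration, and this uses `java.nio.file.Files` rather than Groovy's `setText`):

   ```java
   import java.io.IOException;
   import java.nio.charset.StandardCharsets;
   import java.nio.file.Files;
   import java.nio.file.Path;

   public class ExplicitCharsetWrite {
       public static void main(String[] args) throws IOException {
           Path out = Files.createTempFile("checksum", ".sha512");
           // Name the charset explicitly; relying on the platform default
           // (e.g. new FileWriter, or setText without a charset argument)
           // is exactly the pattern forbiddenapis flags.
           Files.writeString(out, "deadbeef", StandardCharsets.UTF_8);
           System.out.println(Files.readString(out, StandardCharsets.UTF_8)); // deadbeef
           Files.delete(out);
       }
   }
   ```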






[GitHub] [lucene-solr] madrob edited a comment on pull request #1905: LUCENE-9488 Release with Gradle Part 2

2020-10-05 Thread GitBox


madrob edited a comment on pull request #1905:
URL: https://github.com/apache/lucene-solr/pull/1905#issuecomment-703715598


   > Are we sure we need dependency at all. I would have assumed that Gradle 
has it already.
   
   Gradle has a shaded commons codec that we can use - 
`org.gradle.internal.impldep.org.apache.commons.codec.digest.DigestUtils` - but 
I assumed that would be frowned upon.
   
   > This is also a forbiddenapis violation: the charset is missing. Please 
use: setText(..., "UTF-8")
   
   This didn't fail precommit for me; can you file a follow-on issue to apply 
forbiddenapis to the gradle files?
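   For the checksum use case discussed above, one alternative to pulling in commons-codec (or reaching into Gradle's shaded `impldep` packages) is the JDK's own `MessageDigest`. A hedged sketch, JDK-only; the class and method names are made up:

   ```java
   import java.io.IOException;
   import java.nio.file.Files;
   import java.nio.file.Path;
   import java.security.MessageDigest;
   import java.security.NoSuchAlgorithmException;

   public class Sha512Sum {
       // Hex-encode the SHA-512 digest of a file using only the JDK,
       // avoiding a commons-codec dependency and Gradle's shaded copy.
       static String sha512Hex(Path file) throws IOException, NoSuchAlgorithmException {
           MessageDigest md = MessageDigest.getInstance("SHA-512");
           byte[] digest = md.digest(Files.readAllBytes(file));
           StringBuilder sb = new StringBuilder(digest.length * 2);
           for (byte b : digest) {
               sb.append(Character.forDigit((b >> 4) & 0xF, 16));
               sb.append(Character.forDigit(b & 0xF, 16));
           }
           return sb.toString();
       }

       public static void main(String[] args) throws Exception {
           Path empty = Files.createTempFile("digest", ".bin");
           // SHA-512 of empty input starts with cf83e1357eefb8bd...
           System.out.println(sha512Hex(empty));
           Files.delete(empty);
       }
   }
   ```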






[jira] [Commented] (LUCENE-9541) Ensure sub-iterators of ConjunctionDISI are on the same document

2020-10-05 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17208122#comment-17208122
 ] 

ASF subversion and git services commented on LUCENE-9541:
-

Commit e325f66e61af4b67cc8effd2093f7331e449f1d4 in lucene-solr's branch 
refs/heads/master from Mayya Sharipova
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=e325f66 ]

Revert "LUCENE-9541 ConjunctionDISI sub-iterators check (#1937)"

This reverts commit 5f34acfdb59f58987e0f69e8876e190b5e91dd3a.


> Ensure sub-iterators of ConjunctionDISI are on the same document
> 
>
> Key: LUCENE-9541
> URL: https://issues.apache.org/jira/browse/LUCENE-9541
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Mayya Sharipova
>Priority: Minor
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Not completely sure if this is a bug.
> BitSetConjunctionDISI advances based on its lead DocIdSetIterator, and 
> doesn't consider that its other component, a BitSetIterator, may have 
> already advanced past a certain doc. This may result in duplicate documents.
> For example, if a BitSetConjunctionDISI _disi_ is composed of a DocIdSetIterator 
> _a_ over docs [0,1] and a BitSetIterator _b_ over docs [0,1], then doing 
> `b.nextDoc()` collects doc0, and doing `disi.nextDoc()` again collects the same doc0.
> It seems that other conjunction iterators don't have this behaviour: if we 
> advance any of their components past a certain document, the whole 
> conjunction iterator is also advanced past this document.
>  
> This behaviour was exposed in this 
> [PR|https://github.com/apache/lucene-solr/pull/1903]. 
>  
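The duplicate-collection scenario described in the issue can be modeled with a minimal, Lucene-free sketch; the classes below are simplified stand-ins for DocIdSetIterator/BitSetIterator, not the real Lucene API:

```java
public class ConjunctionSketch {
    // Minimal stand-in for a DocIdSetIterator over a sorted list of doc ids.
    static class DocIterator {
        final int[] docs;
        int idx = -1;
        DocIterator(int... docs) { this.docs = docs; }
        int docID() {
            if (idx < 0) return -1;
            return idx >= docs.length ? Integer.MAX_VALUE : docs[idx];
        }
        int nextDoc() { idx++; return docID(); }
    }

    // A conjunction keyed only on its lead, mirroring the reported
    // behaviour: it never notices that the other sub-iterator was
    // advanced externally.
    static class Conjunction {
        final DocIterator lead, other;
        Conjunction(DocIterator lead, DocIterator other) { this.lead = lead; this.other = other; }
        int nextDoc() {
            int doc = lead.nextDoc();
            while (doc != Integer.MAX_VALUE) {
                while (other.docID() < doc) other.nextDoc();
                if (other.docID() == doc) return doc;
                doc = lead.nextDoc();
            }
            return doc;
        }
    }

    public static void main(String[] args) {
        DocIterator a = new DocIterator(0, 1);   // lead
        DocIterator b = new DocIterator(0, 1);   // the "bit set" side
        Conjunction disi = new Conjunction(a, b);
        int direct = b.nextDoc();     // collect doc 0 straight from b
        int viaDisi = disi.nextDoc(); // the conjunction returns doc 0 again
        System.out.println(direct + " " + viaDisi); // 0 0 -> same doc twice
    }
}
```

The fix merged (and later reverted) in this thread adds an assertion that sub-iterators are positioned on the same document before the conjunction advances.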






[GitHub] [lucene-solr] arafalov commented on a change in pull request #1921: SOLR-14829: Improve documentation for Request Handlers in RefGuide and solrconfig.xml

2020-10-05 Thread GitBox


arafalov commented on a change in pull request #1921:
URL: https://github.com/apache/lucene-solr/pull/1921#discussion_r499658300



##
File path: solr/solr-ref-guide/src/common-query-parameters.adoc
##
@@ -307,11 +307,13 @@ The `echoParams` parameter controls what information 
about request parameters is
 
 The `echoParams` parameter accepts the following values:
 
-* `explicit`: This is the default value. Only parameters included in the 
actual request, plus the `_` parameter (which is a 64-bit numeric timestamp) 
will be added to the `params` section of the response header.
+* `explicit`: Only parameters included in the actual request, plus the `_` 
parameter (which is a 64-bit numeric timestamp) will be added to the `params` 
section of the response header.

Review comment:
   Will do. I was busy on the other project, but will finish this one very 
soon.








[GitHub] [lucene-solr] dsmiley commented on a change in pull request #1921: SOLR-14829: Improve documentation for Request Handlers in RefGuide and solrconfig.xml

2020-10-05 Thread GitBox


dsmiley commented on a change in pull request #1921:
URL: https://github.com/apache/lucene-solr/pull/1921#discussion_r499647440



##
File path: solr/solr-ref-guide/src/common-query-parameters.adoc
##
@@ -307,11 +307,13 @@ The `echoParams` parameter controls what information 
about request parameters is
 
 The `echoParams` parameter accepts the following values:
 
-* `explicit`: This is the default value. Only parameters included in the 
actual request, plus the `_` parameter (which is a 64-bit numeric timestamp) 
will be added to the `params` section of the response header.
+* `explicit`: Only parameters included in the actual request, plus the `_` 
parameter (which is a 64-bit numeric timestamp) will be added to the `params` 
section of the response header.

Review comment:
   @arafalov can you please remove the '_' reference here in the docs?  I 
think what it says about this is false; I suspect it was always false.








[jira] [Commented] (SOLR-14887) Upgrade JQuery to 3.5.1

2020-10-05 Thread Jira


[ 
https://issues.apache.org/jira/browse/SOLR-14887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17208102#comment-17208102
 ] 

Gézapeti commented on SOLR-14887:
-

I've found a reference to jQuery 2.1.3 in our OWASP check, and jQuery 3.3.1 was 
in use in the reference guide.
I've fixed both places and checked the ref guide and the admin UI manually.

> Upgrade JQuery to 3.5.1
> ---
>
> Key: SOLR-14887
> URL: https://issues.apache.org/jira/browse/SOLR-14887
> Project: Solr
>  Issue Type: Task
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: Admin UI
>Reporter: Kevin Risden
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The Solr admin UI currently uses JQuery 3.4.1 (SOLR-14209). JQuery 3.5.1 is 
> out and addresses some security vulnerabilities. It would be good to upgrade.






[GitHub] [lucene-solr] gezapeti opened a new pull request #1947: SOLR-14887 Upgrade JQuery to 3.5.1

2020-10-05 Thread GitBox


gezapeti opened a new pull request #1947:
URL: https://github.com/apache/lucene-solr/pull/1947


   # Description
   
   The Solr admin UI currently uses JQuery 3.4.1 (SOLR-14209). JQuery 3.5.1 is 
out and addresses some security vulnerabilities. It would be good to upgrade.
   
   # Solution
   
   Upgraded JQuery to 3.5.1
   
   # Tests
   
   Manually tested the UI.
   
   # Checklist
   
   Please review the following and check all that apply:
   
   - [x] I have reviewed the guidelines for [How to 
Contribute](https://wiki.apache.org/solr/HowToContribute) and my code conforms 
to the standards described there to the best of my ability.
   - [x] I have created a Jira issue and added the issue ID to my pull request 
title.
   - [x] I have given Solr maintainers 
[access](https://help.github.com/en/articles/allowing-changes-to-a-pull-request-branch-created-from-a-fork)
 to contribute to my PR branch. (optional but recommended)
   - [x] I have developed this patch against the `master` branch.
   - [x] I have run `./gradlew check`.
   - [ ] I have added tests for my changes.
   - [ ] I have added documentation for the [Ref 
Guide](https://github.com/apache/lucene-solr/tree/master/solr/solr-ref-guide) 
(for Solr changes only).
   






[GitHub] [lucene-solr] mayya-sharipova merged pull request #1937: LUCENE-9541 ConjunctionDISI sub-iterators check

2020-10-05 Thread GitBox


mayya-sharipova merged pull request #1937:
URL: https://github.com/apache/lucene-solr/pull/1937


   






[jira] [Commented] (LUCENE-9541) Ensure sub-iterators of ConjunctionDISI are on the same document

2020-10-05 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17208083#comment-17208083
 ] 

ASF subversion and git services commented on LUCENE-9541:
-

Commit 5f34acfdb59f58987e0f69e8876e190b5e91dd3a in lucene-solr's branch 
refs/heads/master from Mayya Sharipova
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=5f34acf ]

LUCENE-9541 ConjunctionDISI sub-iterators check (#1937)

* LUCENE-9541 ConjunctionDISI sub-iterators check

Ensure sub-iterators of a conjunction iterator are on the same doc.

> Ensure sub-iterators of ConjunctionDISI are on the same document
> 
>
> Key: LUCENE-9541
> URL: https://issues.apache.org/jira/browse/LUCENE-9541
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Mayya Sharipova
>Priority: Minor
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Not completely sure if this is a bug.
> BitSetConjunctionDISI advances based on its lead DocIdSetIterator, and 
> doesn't consider that its other component, a BitSetIterator, may have 
> already advanced past a certain doc. This may result in duplicate documents.
> For example, if a BitSetConjunctionDISI _disi_ is composed of a DocIdSetIterator 
> _a_ over docs [0,1] and a BitSetIterator _b_ over docs [0,1], then doing 
> `b.nextDoc()` collects doc0, and doing `disi.nextDoc()` again collects the same doc0.
> It seems that other conjunction iterators don't have this behaviour: if we 
> advance any of their components past a certain document, the whole 
> conjunction iterator is also advanced past this document.
>  
> This behaviour was exposed in this 
> [PR|https://github.com/apache/lucene-solr/pull/1903]. 
>  






[jira] [Commented] (LUCENE-9541) Ensure sub-iterators of ConjunctionDISI are on the same document

2020-10-05 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17208084#comment-17208084
 ] 

ASF subversion and git services commented on LUCENE-9541:
-

Commit 5f34acfdb59f58987e0f69e8876e190b5e91dd3a in lucene-solr's branch 
refs/heads/master from Mayya Sharipova
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=5f34acf ]

LUCENE-9541 ConjunctionDISI sub-iterators check (#1937)

* LUCENE-9541 ConjunctionDISI sub-iterators check

Ensure sub-iterators of a conjunction iterator are on the same doc.

> Ensure sub-iterators of ConjunctionDISI are on the same document
> 
>
> Key: LUCENE-9541
> URL: https://issues.apache.org/jira/browse/LUCENE-9541
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Mayya Sharipova
>Priority: Minor
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Not completely sure if this is a bug.
> BitSetConjunctionDISI advances based on its lead DocIdSetIterator, and 
> doesn't consider that its other component, a BitSetIterator, may have 
> already advanced past a certain doc. This may result in duplicate documents.
> For example, if a BitSetConjunctionDISI _disi_ is composed of a DocIdSetIterator 
> _a_ over docs [0,1] and a BitSetIterator _b_ over docs [0,1], then doing 
> `b.nextDoc()` collects doc0, and doing `disi.nextDoc()` again collects the same doc0.
> It seems that other conjunction iterators don't have this behaviour: if we 
> advance any of their components past a certain document, the whole 
> conjunction iterator is also advanced past this document.
>  
> This behaviour was exposed in this 
> [PR|https://github.com/apache/lucene-solr/pull/1903]. 
>  






[GitHub] [lucene-solr] mikemccand commented on a change in pull request #1930: LUCENE-9322: add VectorValues to new Lucene90 codec

2020-10-05 Thread GitBox


mikemccand commented on a change in pull request #1930:
URL: https://github.com/apache/lucene-solr/pull/1930#discussion_r499232347



##
File path: .dir-locals.el
##
@@ -0,0 +1,3 @@
+;; set up Lucene style for emacs
+((java-mode . ((c-basic-offset . 2

Review comment:
   Woot!

##
File path: lucene/core/src/java/org/apache/lucene/codecs/VectorWriter.java
##
@@ -0,0 +1,253 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.lucene.codecs;
+
+import java.io.Closeable;
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.List;
+
+import org.apache.lucene.index.DocIDMerger;
+import org.apache.lucene.index.FieldInfo;
+import org.apache.lucene.index.MergeState;
+import org.apache.lucene.index.VectorValues;
+import org.apache.lucene.search.TopDocs;
+import org.apache.lucene.util.BytesRef;
+
+import static org.apache.lucene.search.DocIdSetIterator.NO_MORE_DOCS;
+
+/**
+ * Writes vectors to an index.
+ */
+public abstract class VectorWriter implements Closeable {
+
+  /** Sole constructor */
+  protected VectorWriter() {}
+
+  /** Write all values contained in the provided reader */
+  public abstract void writeField(FieldInfo fieldInfo, VectorValues values) 
throws IOException;
+
+  /** Called once at the end before close */
+  public abstract void finish() throws IOException;
+
+  /** Merge the vector values from multiple segments, for all fields */
+  public void merge(MergeState mergeState) throws IOException {
+for (VectorReader reader : mergeState.vectorReaders) {
+  if (reader != null) {
+reader.checkIntegrity();
+  }
+}
+for (FieldInfo fieldInfo : mergeState.mergeFieldInfos) {
+  if (fieldInfo.hasVectorValues()) {
+mergeVectors(fieldInfo, mergeState);
+  }
+}
+finish();
+  }
+
+  private void mergeVectors(FieldInfo mergeFieldInfo, final MergeState 
mergeState) throws IOException {
+if (mergeState.infoStream.isEnabled("VV")) {
+  mergeState.infoStream.message("VV", "merging " + mergeState.segmentInfo);
+}
+List subs = new ArrayList<>();
+int dimension = -1;
+VectorValues.ScoreFunction scoreFunction = null;
+for (int i = 0; i < mergeState.vectorReaders.length; i++) {
+  VectorReader vectorReader = mergeState.vectorReaders[i];
+  if (vectorReader != null) {
+if (mergeFieldInfo != null && mergeFieldInfo.hasVectorValues()) {
+  int segmentDimension = mergeFieldInfo.getVectorDimension();
+  VectorValues.ScoreFunction segmentScoreFunction = 
mergeFieldInfo.getVectorScoreFunction();
+  if (dimension == -1) {
+dimension = segmentDimension;
+scoreFunction = mergeFieldInfo.getVectorScoreFunction();
+  } else if (dimension != segmentDimension) {
+throw new IllegalStateException("Varying dimensions for 
vector-valued field " + mergeFieldInfo.name
++ ": " + dimension + "!=" + segmentDimension);
+  } else if (scoreFunction != segmentScoreFunction) {

Review comment:
   Does `IndexWriter` also catch if the user tries to change either 
dimension or score function on indexing a new document?

##
File path: lucene/core/src/java/org/apache/lucene/index/VectorValuesWriter.java
##
@@ -0,0 +1,272 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.lucene.index;
+
+import java.io.IOException;

[jira] [Updated] (LUCENE-9541) Ensure sub-iterators of ConjunctionDISI are on the same document

2020-10-05 Thread Mayya Sharipova (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-9541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mayya Sharipova updated LUCENE-9541:

Summary: Ensure sub-iterators of ConjunctionDISI are on the same document  
(was: BitSetConjunctionDISI doesn't advance based on its components)

> Ensure sub-iterators of ConjunctionDISI are on the same document
> 
>
> Key: LUCENE-9541
> URL: https://issues.apache.org/jira/browse/LUCENE-9541
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Mayya Sharipova
>Priority: Minor
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Not completely sure if this is a bug.
> BitSetConjunctionDISI advances based on its lead DocIdSetIterator, and 
> doesn't consider that its other component, a BitSetIterator, may have 
> already advanced past a certain doc. This may result in duplicate documents.
> For example, if a BitSetConjunctionDISI _disi_ is composed of a DocIdSetIterator 
> _a_ over docs [0,1] and a BitSetIterator _b_ over docs [0,1], then doing 
> `b.nextDoc()` collects doc0, and doing `disi.nextDoc()` again collects the same doc0.
> It seems that other conjunction iterators don't have this behaviour: if we 
> advance any of their components past a certain document, the whole 
> conjunction iterator is also advanced past this document.
>  
> This behaviour was exposed in this 
> [PR|https://github.com/apache/lucene-solr/pull/1903]. 
>  






[GitHub] [lucene-solr] mayya-sharipova commented on a change in pull request #1937: LUCENE-9541 ConjunctionDISI sub-iterators check

2020-10-05 Thread GitBox


mayya-sharipova commented on a change in pull request #1937:
URL: https://github.com/apache/lucene-solr/pull/1937#discussion_r499499632



##
File path: 
lucene/core/src/test/org/apache/lucene/search/TestConjunctionDISI.java
##
@@ -391,4 +391,20 @@ public void testCollapseSubConjunctionDISIs() throws 
IOException {
   public void testCollapseSubConjunctionScorers() throws IOException {
 testCollapseSubConjunctions(true);
   }
+
+  public void testIllegalAdvancementOfSubIteratorsTripsAssertion() throws 
IOException {
+int maxDoc = 100;
+final int numIterators = TestUtil.nextInt(random(), 2, 5);
+FixedBitSet set = randomSet(maxDoc);
+
+DocIdSetIterator[] iterators = new DocIdSetIterator[numIterators];
+for (int i = 0; i < iterators.length; ++i) {
+  iterators[i] = new BitDocIdSet(set).iterator();
+}
+final DocIdSetIterator conjunction = 
ConjunctionDISI.intersectIterators(Arrays.asList(iterators));
+int idx = TestUtil.nextInt(random() , 0, iterators.length-1);
+iterators[idx].nextDoc(); // illegally advancing one of the sub-iterators 
outside of the conjunction iterator
+AssertionError ex = expectThrows(AssertionError.class, () -> 
conjunction.nextDoc());
+assertEquals(ex.getMessage(), "Sub-iterators of ConjunctionDISI are not 
the same document!");

Review comment:
   Thanks @jpountz, my mistake, addressed in bf9a3e8








[GitHub] [lucene-solr] mayya-sharipova commented on a change in pull request #1937: LUCENE-9541 ConjunctionDISI sub-iterators check

2020-10-05 Thread GitBox


mayya-sharipova commented on a change in pull request #1937:
URL: https://github.com/apache/lucene-solr/pull/1937#discussion_r499499532



##
File path: 
lucene/core/src/test/org/apache/lucene/search/TestConjunctionDISI.java
##
@@ -391,4 +391,20 @@ public void testCollapseSubConjunctionDISIs() throws 
IOException {
   public void testCollapseSubConjunctionScorers() throws IOException {
 testCollapseSubConjunctions(true);
   }
+
+  public void testIllegalAdvancementOfSubIteratorsTripsAssertion() throws 
IOException {
+int maxDoc = 100;

Review comment:
   Thanks @jpountz, very good point, addressed in bf9a3e8








[jira] [Commented] (LUCENE-9560) Position aware TermQuery

2020-10-05 Thread Michael McCandless (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17207983#comment-17207983
 ] 

Michael McCandless commented on LUCENE-9560:


+1 to make this simpler.  The common solution of the (inefficient, doubly 
indexed) {{all}} field that apps are forced to use today is not a great 
solution.

A position-aware term query (and e.g. phrase query?) would let you stop 
double-indexing, but you pay a small search-time performance cost, since 
positions must be decoded while searching even for a non-positional (term) query.

I think the fun/challenging part of this is how to know which fields map to 
which token positions while indexing.  Apps could statically assign positions 
to fields (field "title" gets 0-1000, field "body" gets 1000-6000, etc.)?  
Or, they could do it dynamically, but then every document must record its own 
"position to field" mapping and decode that encoding at search time.

This seems perfect for interval queries, but which interval query could we 
(dynamically?) feed position constraints to?  Maybe we would just need our own 
{{IntervalsSource}} filter?
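The static-assignment idea above can be sketched in a few lines; the class and method names below are hypothetical, purely to illustrate the field-to-position-range mapping, not any existing Lucene API:

{code:java}
import java.util.LinkedHashMap;
import java.util.Map;

public class FieldPositionRanges {
    // Hypothetical static assignment of position ranges to logical
    // fields inside a combined "all" field, as discussed above.
    private final Map<String, int[]> ranges = new LinkedHashMap<>();

    void assign(String field, int startInclusive, int endExclusive) {
        ranges.put(field, new int[] {startInclusive, endExclusive});
    }

    // Which logical field does a token position in "all" belong to?
    String fieldAt(int position) {
        for (Map.Entry<String, int[]> e : ranges.entrySet()) {
            int[] r = e.getValue();
            if (position >= r[0] && position < r[1]) return e.getKey();
        }
        return null;
    }

    public static void main(String[] args) {
        FieldPositionRanges m = new FieldPositionRanges();
        m.assign("title", 0, 1000);
        m.assign("body", 1000, 6000);
        System.out.println(m.fieldAt(42));    // title
        System.out.println(m.fieldAt(1500));  // body
    }
}
{code}

A position-aware term query could then consult such a mapping to constrain matches to one field's range.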

> Position aware TermQuery
> 
>
> Key: LUCENE-9560
> URL: https://issues.apache.org/jira/browse/LUCENE-9560
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: core/search
>Reporter: Haoyu Zhai
>Priority: Major
>
> In our work, we index most of our fields into an "all" field (like 
> elasticsearch) to make our search faster. But at the same time we still want 
> to support some field-specific searches (like {{title}}), so currently 
> our solution is to index those fields twice, so that we can do both an "all" 
> search and a field-specific search.
> I want to propose a new term query that accepts a range for a specific field, 
> so that we could search on the "all" field but act like a field-specific 
> search. Then we would not need to doubly index those fields.






[GitHub] [lucene-solr] iverase commented on pull request #1907: LUCENE-9538: Detect polygon self-intersections in the Tessellator

2020-10-05 Thread GitBox


iverase commented on pull request #1907:
URL: https://github.com/apache/lucene-solr/pull/1907#issuecomment-703526645


   Performance test shows a small impact in indexing performance:
   
   ```
   ||Metric                 ||Dev   ||Base  ||Diff||
   |Index time (sec)        |480.5  |467.0  |3%  |
   |Force merge time (sec)  |0.0    |0.0    |0%  |
   |Index size (GB)         |2.24   |2.24   |0%  |
   |Reader heap (MB)        |0.01   |0.01   |0%  |
   ```
   






[jira] [Resolved] (LUCENE-9558) Clean up package name conflicts for analyzers-icu module

2020-10-05 Thread Tomoko Uchida (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-9558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tomoko Uchida resolved LUCENE-9558.
---
Fix Version/s: master (9.0)
   Resolution: Fixed

> Clean up package name conflicts for analyzers-icu module
> 
>
> Key: LUCENE-9558
> URL: https://issues.apache.org/jira/browse/LUCENE-9558
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/analysis
>Reporter: Tomoko Uchida
>Assignee: Tomoko Uchida
>Priority: Minor
> Fix For: master (9.0)
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> analyzers-icu module shares the package names {{o.a.l.collation}} and 
> {{o.a.l.collation.tokenattributes}} with analyzers-common; they need to be 
> renamed.
> There could be two solutions:
> 1. rename "o.a.l.collation" to "o.a.l.a.icu.collation"
>  2. move classes under "o.a.l.collation" to "o.a.l.a.icu" and classes under 
> "o.a.l.collation.tokenattributes" to "o.a.l.a.icu.tokenattributes", and 
> delete "o.a.l.collation" from analyzers-icu
> I would prefer option 2. 1. may complicate the package hierarchy and there 
> already exist {{o.a.l.a.icu.tokenattributes}}. (All classes under 
> "o.a.l.collation" have prefix "ICUCollation", so I think we don't need to 
> keep "collation" in the package name, do we?)






[GitHub] [lucene-solr] mocobeta merged pull request #1946: LUCENE-9558: Clean up package name conflicts for analyzers-icu.

2020-10-05 Thread GitBox


mocobeta merged pull request #1946:
URL: https://github.com/apache/lucene-solr/pull/1946


   






[jira] [Commented] (LUCENE-9558) Clean up package name conflicts for analyzers-icu module

2020-10-05 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17207923#comment-17207923
 ] 

ASF subversion and git services commented on LUCENE-9558:
-

Commit b70eaeee5acd9b166e51d4eb5a27eecac984f762 in lucene-solr's branch 
refs/heads/master from Tomoko Uchida
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=b70eaee ]

LUCENE-9558: Clean up package name conflicts for analyzers-icu. (#1946)



> Clean up package name conflicts for analyzers-icu module
> 
>
> Key: LUCENE-9558
> URL: https://issues.apache.org/jira/browse/LUCENE-9558
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/analysis
>Reporter: Tomoko Uchida
>Assignee: Tomoko Uchida
>Priority: Minor
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> The analyzers-icu module shares the package names {{o.a.l.collation}} and 
> {{o.a.l.collation.tokenattributes}} with analyzers-common; they need to be 
> renamed. There are two possible solutions:
> 1. rename "o.a.l.collation" to "o.a.l.a.icu.collation"
> 2. move classes under "o.a.l.collation" to "o.a.l.a.icu" and classes under 
> "o.a.l.collation.tokenattributes" to "o.a.l.a.icu.tokenattributes", then 
> delete "o.a.l.collation" from analyzers-icu
> I would prefer option 2. Option 1 may complicate the package hierarchy, and 
> {{o.a.l.a.icu.tokenattributes}} already exists. (All classes under 
> "o.a.l.collation" have the prefix "ICUCollation", so I think we don't need to 
> keep "collation" in the package name, do we?)






[GitHub] [lucene-solr] mocobeta edited a comment on pull request #1946: LUCENE-9558: Clean up package name conflicts for analyzers-icu.

2020-10-05 Thread GitBox


mocobeta edited a comment on pull request #1946:
URL: https://github.com/apache/lucene-solr/pull/1946#issuecomment-703493569


   > BTW I stumbled on this split-package javadoc issue trying to add some 
utilities in lucene/misc/src/java/org/apache/lucene/util/hnsw - I guess I won't 
do that, but I wonder where benchmarking tools for hnsw belong?
   
   Not entirely sure, but we could put benchmarking tools in lucene/benchmark? I 
don't think the package name would matter much there; maybe 
`o.a.l.benchmark.hnsw` would be fine.






[GitHub] [lucene-solr] mocobeta commented on pull request #1946: LUCENE-9558: Clean up package name conflicts for analyzers-icu.

2020-10-05 Thread GitBox


mocobeta commented on pull request #1946:
URL: https://github.com/apache/lucene-solr/pull/1946#issuecomment-703493569


   > BTW I stumbled on this split-package javadoc issue trying to add some 
utilities in lucene/misc/src/java/org/apache/lucene/util/hnsw - I guess I won't 
do that, but I wonder where benchmarking tools for hnsw belong?
   
   Not entirely sure, but we could put benchmarking tools in lucene/benchmark?






[jira] [Commented] (LUCENE-9560) Position aware TermQuery

2020-10-05 Thread Alan Woodward (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17207906#comment-17207906
 ] 

Alan Woodward commented on LUCENE-9560:
---

Interval queries should already allow you to do this, but of course won't 
produce scores in the same way.

> Position aware TermQuery
> 
>
> Key: LUCENE-9560
> URL: https://issues.apache.org/jira/browse/LUCENE-9560
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: core/search
>Reporter: Haoyu Zhai
>Priority: Major
>
> In our work, we index most of our fields into an "all" field (as 
> Elasticsearch does) to make our search faster. At the same time we still want 
> to support field-specific search (e.g. on {{title}}), so our current solution 
> is to index those fields twice, allowing both "all" search and 
> field-specific search.
> I want to propose a new term query that accepts a position range within a 
> specific field, so that we could search the "all" field but behave like a 
> field-specific search. Then we would not need to double-index those fields.
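The proposed query can be illustrated with a toy sketch in plain Java (this is not the Lucene API; the class and method names here are made up for illustration): each original field occupies a contiguous position range inside the combined "all" field, and a field-scoped term query is just a term match whose position falls inside that field's range.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of the LUCENE-9560 proposal: append each field's tokens to a
// single "all" token stream, remember the per-field position range, and
// answer a field-scoped term query by checking the match position.
public class PositionRangeTermSearch {

  // token -> positions in the combined "all" field
  private final Map<String, List<Integer>> positions = new HashMap<>();
  // field -> [startPos, endPos) in the combined "all" field
  private final Map<String, int[]> fieldRanges = new HashMap<>();
  private int nextPos = 0;

  // Append one field's tokens to the "all" field, recording its range.
  public void addField(String field, String... tokens) {
    int start = nextPos;
    for (String token : tokens) {
      positions.computeIfAbsent(token, k -> new ArrayList<>()).add(nextPos++);
    }
    fieldRanges.put(field, new int[] {start, nextPos});
  }

  // "Position-aware term query": does `token` occur within `field`'s range?
  public boolean matches(String field, String token) {
    int[] range = fieldRanges.get(field);
    List<Integer> tokenPositions = positions.get(token);
    if (range == null || tokenPositions == null) return false;
    for (int p : tokenPositions) {
      if (p >= range[0] && p < range[1]) return true;
    }
    return false;
  }

  public static void main(String[] args) {
    PositionRangeTermSearch doc = new PositionRangeTermSearch();
    doc.addField("title", "lucene", "search");
    doc.addField("body", "fast", "search", "engine");
    // "search" occurs in both ranges; "fast" only in body's range.
    System.out.println(doc.matches("title", "search")); // true
    System.out.println(doc.matches("title", "fast"));   // false
  }
}
```

A real implementation would of course store the per-field ranges in the index and restrict postings iteration rather than scanning positions, but the matching condition is the same.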


