[jira] [Commented] (SOLR-14837) prometheus-exporter: different metrics ports publishes mixed metrics

2020-11-18 Thread Fadi Mohsen (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17235233#comment-17235233
 ] 

Fadi Mohsen commented on SOLR-14837:


Changed the file name. [~janhoy], do you have any idea when this will get merged in, or do you want me to open a PR?

> prometheus-exporter: different metrics ports publishes mixed metrics
> 
>
> Key: SOLR-14837
> URL: https://issues.apache.org/jira/browse/SOLR-14837
> Project: Solr
>  Issue Type: Improvement
>  Components: contrib - prometheus-exporter
>Affects Versions: 8.6.2
>Reporter: Fadi Mohsen
>Priority: Minor
> Attachments: SOLR-14837.patch
>
>
> When calling SolrExporter.main programmatically (in the same JVM) for two different Solr masters, asking each to publish metrics on a different port, the metrics from the two masters are mixed on both metric endpoints.
> This was tracked down to a static variable called *defaultRegistry*:
> https://github.com/apache/lucene-solr/blob/master/solr/contrib/prometheus-exporter/src/java/org/apache/solr/prometheus/exporter/SolrExporter.java#L86
> Removing the static keyword fixes the issue.
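A minimal Java sketch of the failure mode described above (the field name follows the issue report; the surrounding exporter code is simplified and hypothetical):

```java
import io.prometheus.client.CollectorRegistry;

public class SolrExporter {
  // Static: every SolrExporter instance in the JVM registers its collectors
  // into this one shared registry, so two exporters scraping different Solr
  // masters end up exposing each other's metrics on both HTTP ports.
  public static CollectorRegistry defaultRegistry = new CollectorRegistry();

  // The fix described in the issue: drop the static keyword so each exporter
  // (and the HTTP port it serves) gets its own registry, e.g.:
  // public CollectorRegistry defaultRegistry = new CollectorRegistry();
}
```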






[jira] [Updated] (SOLR-14837) prometheus-exporter: different metrics ports publishes mixed metrics

2020-11-18 Thread Fadi Mohsen (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-14837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fadi Mohsen updated SOLR-14837:
---
Attachment: SOLR-14837.patch

> prometheus-exporter: different metrics ports publishes mixed metrics
> 
>
> Key: SOLR-14837
> URL: https://issues.apache.org/jira/browse/SOLR-14837
> Project: Solr
>  Issue Type: Improvement
>  Components: contrib - prometheus-exporter
>Affects Versions: 8.6.2
>Reporter: Fadi Mohsen
>Priority: Minor
> Attachments: SOLR-14837.patch
>
>
> When calling SolrExporter.main programmatically (in the same JVM) for two different Solr masters, asking each to publish metrics on a different port, the metrics from the two masters are mixed on both metric endpoints.
> This was tracked down to a static variable called *defaultRegistry*:
> https://github.com/apache/lucene-solr/blob/master/solr/contrib/prometheus-exporter/src/java/org/apache/solr/prometheus/exporter/SolrExporter.java#L86
> Removing the static keyword fixes the issue.






[jira] [Updated] (SOLR-14837) prometheus-exporter: different metrics ports publishes mixed metrics

2020-11-18 Thread Fadi Mohsen (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-14837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fadi Mohsen updated SOLR-14837:
---
Attachment: SOLR-14837.log

> prometheus-exporter: different metrics ports publishes mixed metrics
> 
>
> Key: SOLR-14837
> URL: https://issues.apache.org/jira/browse/SOLR-14837
> Project: Solr
>  Issue Type: Improvement
>  Components: contrib - prometheus-exporter
>Affects Versions: 8.6.2
>Reporter: Fadi Mohsen
>Priority: Minor
> Attachments: SOLR-14837.patch
>
>
> When calling SolrExporter.main programmatically (in the same JVM) for two different Solr masters, asking each to publish metrics on a different port, the metrics from the two masters are mixed on both metric endpoints.
> This was tracked down to a static variable called *defaultRegistry*:
> https://github.com/apache/lucene-solr/blob/master/solr/contrib/prometheus-exporter/src/java/org/apache/solr/prometheus/exporter/SolrExporter.java#L86
> Removing the static keyword fixes the issue.






[jira] [Updated] (SOLR-14837) prometheus-exporter: different metrics ports publishes mixed metrics

2020-11-18 Thread Fadi Mohsen (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-14837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fadi Mohsen updated SOLR-14837:
---
Attachment: (was: SOLR-14837.log)

> prometheus-exporter: different metrics ports publishes mixed metrics
> 
>
> Key: SOLR-14837
> URL: https://issues.apache.org/jira/browse/SOLR-14837
> Project: Solr
>  Issue Type: Improvement
>  Components: contrib - prometheus-exporter
>Affects Versions: 8.6.2
>Reporter: Fadi Mohsen
>Priority: Minor
> Attachments: SOLR-14837.patch
>
>
> When calling SolrExporter.main programmatically (in the same JVM) for two different Solr masters, asking each to publish metrics on a different port, the metrics from the two masters are mixed on both metric endpoints.
> This was tracked down to a static variable called *defaultRegistry*:
> https://github.com/apache/lucene-solr/blob/master/solr/contrib/prometheus-exporter/src/java/org/apache/solr/prometheus/exporter/SolrExporter.java#L86
> Removing the static keyword fixes the issue.






[GitHub] [lucene-solr] zacharymorn commented on a change in pull request #2052: LUCENE-8982: Make NativeUnixDirectory pure java with FileChannel direct IO flag, and rename to DirectIODirectory

2020-11-18 Thread GitBox


zacharymorn commented on a change in pull request #2052:
URL: https://github.com/apache/lucene-solr/pull/2052#discussion_r526615461



##
File path: 
lucene/misc/src/test/org/apache/lucene/misc/store/TestDirectIODirectory.java
##
@@ -0,0 +1,57 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.misc.store;
+
+import com.carrotsearch.randomizedtesting.LifecycleScope;
+import com.carrotsearch.randomizedtesting.RandomizedTest;
+import org.apache.lucene.store.*;
+
+import java.io.IOException;
+import java.nio.file.Files;
+import java.nio.file.Path;
+
+import static org.apache.lucene.misc.store.DirectIODirectory.DEFAULT_MIN_BYTES_DIRECT;
+
+public class TestDirectIODirectory extends BaseDirectoryTestCase {
+  public void testWriteReadWithDirectIO() throws IOException {
+    try (Directory dir = getDirectory(RandomizedTest.newTempDir(LifecycleScope.TEST))) {
+      final long blockSize = Files.getFileStore(createTempFile()).getBlockSize();
+      final long minBytesDirect = Double.valueOf(Math.ceil(DEFAULT_MIN_BYTES_DIRECT / blockSize)).longValue() * blockSize;
+      // Need to worry about overflows here?
+      final int writtenByteLength = Math.toIntExact(minBytesDirect);
+
+      MergeInfo mergeInfo = new MergeInfo(1000, Integer.MAX_VALUE, true, 1);
+      final IOContext context = new IOContext(mergeInfo);
+
+      IndexOutput indexOutput = dir.createOutput("test", context);
+      indexOutput.writeBytes(new byte[writtenByteLength], 0, writtenByteLength);
+      IndexInput indexInput = dir.openInput("test", context);
+
+      assertEquals("The length of bytes read should equal to written", writtenByteLength, indexInput.length());
+
+      indexOutput.close();
+      indexInput.close();
+    }
+  }
+
+  @Override
+  protected Directory getDirectory(Path path) throws IOException {
+    Directory delegate = FSDirectory.open(path);

Review comment:
   I just gave this a try, but it looks like it would consistently fail some test cases in `BaseDirectoryTestCase`:
   
   1. With a plain replacement of `FSDirectory.open(path)` by `LuceneTestCase.newFSDirectory(path)`, this fails `BaseDirectoryTestCase.testCreateOutputForExistingFile`. The issue is that `DirectIODirectory` delegates to `MockDirectoryWrapper.createOutput` for file creation, but to `FSDirectory.deleteFile` for file deletion. This causes the `MockDirectoryWrapper.createdFiles` variable to not be updated on deletion, and thus the test fails with a `FileAlreadyExistsException`.
   2. When I also tried to delegate `DirectIODirectory.deleteFile` to `MockDirectoryWrapper.deleteFile` with
 ```
 @Override
 public void deleteFile(String name) throws IOException {
   delegate.deleteFile(name);
 }
 ```
   
   `BaseDirectoryTestCase.testCreateOutputForExistingFile` now passes, but `BaseDirectoryTestCase.testPendingDeletions` starts to fail with a `FileAlreadyExistsException`, with the tempDir folder containing a bunch of random files that were not deleted because `VirusCheckingFS` rejected the deletion. I'm still debugging this, but it may take some time.








[GitHub] [lucene-solr] zacharymorn commented on a change in pull request #2052: LUCENE-8982: Make NativeUnixDirectory pure java with FileChannel direct IO flag, and rename to DirectIODirectory

2020-11-18 Thread GitBox


zacharymorn commented on a change in pull request #2052:
URL: https://github.com/apache/lucene-solr/pull/2052#discussion_r526607397



##
File path: 
lucene/misc/src/test/org/apache/lucene/misc/store/TestDirectIODirectory.java
##
@@ -0,0 +1,57 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.misc.store;
+
+import com.carrotsearch.randomizedtesting.LifecycleScope;
+import com.carrotsearch.randomizedtesting.RandomizedTest;
+import org.apache.lucene.store.*;
+
+import java.io.IOException;
+import java.nio.file.Files;
+import java.nio.file.Path;
+
+import static org.apache.lucene.misc.store.DirectIODirectory.DEFAULT_MIN_BYTES_DIRECT;
+
+public class TestDirectIODirectory extends BaseDirectoryTestCase {
+  public void testWriteReadWithDirectIO() throws IOException {
+    try (Directory dir = getDirectory(RandomizedTest.newTempDir(LifecycleScope.TEST))) {
+      final long blockSize = Files.getFileStore(createTempFile()).getBlockSize();
+      final long minBytesDirect = Double.valueOf(Math.ceil(DEFAULT_MIN_BYTES_DIRECT / blockSize)).longValue() * blockSize;
+      // Need to worry about overflows here?
+      final int writtenByteLength = Math.toIntExact(minBytesDirect);
+
+      MergeInfo mergeInfo = new MergeInfo(1000, Integer.MAX_VALUE, true, 1);
+      final IOContext context = new IOContext(mergeInfo);
+
+      IndexOutput indexOutput = dir.createOutput("test", context);
+      indexOutput.writeBytes(new byte[writtenByteLength], 0, writtenByteLength);
+      IndexInput indexInput = dir.openInput("test", context);
+
+      assertEquals("The length of bytes read should equal to written", writtenByteLength, indexInput.length());
+
+      indexOutput.close();
+      indexInput.close();
+    }
+  }
+
+  @Override
+  protected Directory getDirectory(Path path) throws IOException {

Review comment:
   Cool thanks for the verification!








[GitHub] [lucene-solr] rmuir commented on pull request #2088: LUCENE-9617: Reset lowestUnassignedFieldNumber in FieldNumbers.clear()

2020-11-18 Thread GitBox


rmuir commented on pull request #2088:
URL: https://github.com/apache/lucene-solr/pull/2088#issuecomment-730067131


   I'm suspicious that this is safe to do. What if another thread is calling 
addDocument at the same time?






[jira] [Comment Edited] (LUCENE-9616) Improve test coverage for internal format versions

2020-11-18 Thread Julie Tibshirani (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17235072#comment-17235072
 ] 

Julie Tibshirani edited comment on LUCENE-9616 at 11/19/20, 12:07 AM:
--

I also suspect it's enough to keep unit tests for old formats. And maybe this 
could only be forward-looking -- we would make sure to preserve tests in future 
changes, but not actively add logic/ tests back?

bq. I gave it a try on my PR for LUCENE-9613 by keeping the old 
DocValuesConsumer around

Thanks for trying out the idea. It looks like you copied all logic for version 
1 of Lucene80DocValuesConsumer into a test class Lucene80v1DocValuesConsumer. I 
guess an alternate naming scheme would be to rename the current 
Lucene80DocValuesConsumer -> Lucene88DocValuesConsumer, and name the test class 
Lucene80DocValuesConsumer (instead of Lucene80v1DocValuesConsumer). I liked 
that this doesn't introduce a new version element, but it might be confusing 
that the producer/ consumer versions no longer align. Also should these tests 
live in backwards-codecs instead of core?


was (Author: jtibshirani):
I also suspect it's enough to keep unit tests for old formats. And maybe this 
could only be forward-looking -- we would make sure to preserve tests in future 
changes, but not actively add logic/ tests back?

bq. I gave it a try on my PR for LUCENE-9613 by keeping the old 
DocValuesConsumer around

Thanks for trying out the idea. It looks like you copied all logic for version 
1 of Lucene80DocValuesConsumer into a test class Lucene80v1DocValuesConsumer. I 
guess an alternate naming scheme would be to rename the current 
Lucene80DocValuesConsumer -> Lucene87DocValuesConsumer, and name the test class 
Lucene80DocValuesConsumer (instead of Lucene80v1DocValuesConsumer). I liked 
that this doesn't introduce a new version element, but it might be confusing 
that the producer/ consumer versions no longer align. Also should these tests 
live in backwards-codecs instead of core?

> Improve test coverage for internal format versions
> --
>
> Key: LUCENE-9616
> URL: https://issues.apache.org/jira/browse/LUCENE-9616
> Project: Lucene - Core
>  Issue Type: Test
>Reporter: Julie Tibshirani
>Priority: Minor
>
> Some formats use an internal versioning system -- for example 
> {{CompressingStoredFieldsFormat}} maintains older logic for reading an 
> on-heap fields index. Because we always allow reading segments from the 
> current + previous major version, some users still rely on the read-side 
> logic of older internal versions.
> Although the older version logic is covered by 
> {{TestBackwardsCompatibility}}, it looks like it's not exercised in unit 
> tests. Older versions aren't "in rotation" when choosing a random codec for 
> tests. They also don't have dedicated unit tests as we have for separate 
> older formats, for example {{TestLucene60PointsFormat}}.
> It could be good to improve unit test coverage for the older versions, since 
> they're in active use. A downside is that it's not straightforward to add 
> unit tests, since we tend to just change/ delete the old write-side logic as 
> we bump internal versions.






[jira] [Commented] (LUCENE-9616) Improve test coverage for internal format versions

2020-11-18 Thread Julie Tibshirani (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17235072#comment-17235072
 ] 

Julie Tibshirani commented on LUCENE-9616:
--

I also suspect it's enough to keep unit tests for old formats. And maybe this 
could only be forward-looking -- we would make sure to preserve tests in future 
changes, but not actively add logic/ tests back?

bq. I gave it a try on my PR for LUCENE-9613 by keeping the old 
DocValuesConsumer around

Thanks for trying out the idea. It looks like you copied all logic for version 
1 of Lucene80DocValuesConsumer into a test class Lucene80v1DocValuesConsumer. I 
guess an alternate naming scheme would be to rename the current 
Lucene80DocValuesConsumer -> Lucene87DocValuesConsumer, and name the test class 
Lucene80DocValuesConsumer (instead of Lucene80v1DocValuesConsumer). I liked 
that this doesn't introduce a new version element, but it might be confusing 
that the producer/ consumer versions no longer align. Also should these tests 
live in backwards-codecs instead of core?

> Improve test coverage for internal format versions
> --
>
> Key: LUCENE-9616
> URL: https://issues.apache.org/jira/browse/LUCENE-9616
> Project: Lucene - Core
>  Issue Type: Test
>Reporter: Julie Tibshirani
>Priority: Minor
>
> Some formats use an internal versioning system -- for example 
> {{CompressingStoredFieldsFormat}} maintains older logic for reading an 
> on-heap fields index. Because we always allow reading segments from the 
> current + previous major version, some users still rely on the read-side 
> logic of older internal versions.
> Although the older version logic is covered by 
> {{TestBackwardsCompatibility}}, it looks like it's not exercised in unit 
> tests. Older versions aren't "in rotation" when choosing a random codec for 
> tests. They also don't have dedicated unit tests as we have for separate 
> older formats, for example {{TestLucene60PointsFormat}}.
> It could be good to improve unit test coverage for the older versions, since 
> they're in active use. A downside is that it's not straightforward to add 
> unit tests, since we tend to just change/ delete the old write-side logic as 
> we bump internal versions.






[jira] [Commented] (SOLR-14999) Add built-in option to advertise Solr with a different port than Jetty listens on.

2020-11-18 Thread Houston Putman (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17235050#comment-17235050
 ] 

Houston Putman commented on SOLR-14999:
---

+1, I'm a fan of that idea. It might be difficult doing this multi-level defaulting in the Jetty config, but I haven't looked at that yet. Will investigate.

> Add built-in option to advertise Solr with a different port than Jetty 
> listens on.
> --
>
> Key: SOLR-14999
> URL: https://issues.apache.org/jira/browse/SOLR-14999
> Project: Solr
>  Issue Type: Improvement
>Reporter: Houston Putman
>Assignee: Houston Putman
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Currently the default settings in {{solr.xml}} allow the specification of one 
> port, {{jetty.port}} which the bin/solr script provides from the 
> {{SOLR_PORT}} environment variable. This port is used twice. Jetty uses it to 
> listen for requests, and the clusterState uses the port to advertise the 
> address of the Solr Node.
> In cloud environments, it's sometimes crucial to be able to listen on one 
> port and advertise yourself as listening on another. This is because there is 
> a proxy that listens on the advertised port, and forwards the request to the 
> server which is listening to the jetty port.
> Solr already supports having a separate Jetty port and Live Nodes port 
> (examples provided in the dev-list discussion linked below). I suggest that 
> we add this to the default solr config so that users can use the default 
> solr.xml in cloud configurations, and the solr/bin script will enable easy 
> use of this feature.
> There has been [discussion on this exact 
> problem|https://mail-archives.apache.org/mod_mbox/lucene-dev/201910.mbox/%3CCABEwPvGFEggt9Htn%3DA5%3DtoawuimSJ%2BZcz0FvsaYod7v%2B4wHKog%40mail.gmail.com%3E]
>  on the dev list already.
> I propose the new system property to be used for {{hostPort}} in the 
> solr.xml. I am open to changing the name, but to me it is more descriptive 
> than {{hostPort}}.
> {{-Dsolr.port.advertise}} and {{SOLR_PORT_ADVERTISE}} (env var checked in 
> bin/solr).
> The xml field {{hostPort}} itself would not be changed, just the system
> property that is used to fill its value in the default {{solr.xml}}.
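For illustration, a sketch of what the proposed default could look like in the solrcloud section of solr.xml (pieced together from this issue and the ref-guide change in the linked PR; the exact default is part of the proposal, not a released config):

```xml
<solr>
  <solrcloud>
    <!-- Advertised port; per the PR, the fallback to jetty.port happens in
         the Java code, so existing solr.xml files keep working unchanged. -->
    <int name="hostPort">${solr.port.advertise:0}</int>
  </solrcloud>
</solr>
```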






[GitHub] [lucene-solr] madrob commented on a change in pull request #2089: SOLR-14999: Option to set the advertised port for Solr.

2020-11-18 Thread GitBox


madrob commented on a change in pull request #2089:
URL: https://github.com/apache/lucene-solr/pull/2089#discussion_r526487616



##
File path: solr/solr-ref-guide/src/format-of-solr-xml.adoc
##
@@ -134,9 +134,10 @@ The hostname Solr uses to access cores.
 The url context path.
 
 `hostPort`::
-The port Solr uses to access cores.
+The port Solr uses to access cores, and advertise Solr node locations through 
liveNodes.
 +
-In the default `solr.xml` file, this is set to `${jetty.port:8983}`, which 
will use the Solr port defined in Jetty, and otherwise fall back to 8983.
+In the default `solr.xml` file, this is set to `${solr.port.advertise:0}`.

Review comment:
   I'd like a sentence here giving an example of why the user might care about this, e.g. Solr running behind a proxy.








[jira] [Commented] (SOLR-14999) Add built-in option to advertise Solr with a different port than Jetty listens on.

2020-11-18 Thread Mike Drob (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17235047#comment-17235047
 ] 

Mike Drob commented on SOLR-14999:
--

Borrowing from Mark's comments in the linked thread, can we deprecate 
jetty.port and use solr.port.listen and solr.port.advertise, maybe with a 
fallback to solr.port if either is missing? And probably a fallback + warning 
logged if we end up using jetty.port?

> Add built-in option to advertise Solr with a different port than Jetty 
> listens on.
> --
>
> Key: SOLR-14999
> URL: https://issues.apache.org/jira/browse/SOLR-14999
> Project: Solr
>  Issue Type: Improvement
>Reporter: Houston Putman
>Assignee: Houston Putman
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently the default settings in {{solr.xml}} allow the specification of one 
> port, {{jetty.port}} which the bin/solr script provides from the 
> {{SOLR_PORT}} environment variable. This port is used twice. Jetty uses it to 
> listen for requests, and the clusterState uses the port to advertise the 
> address of the Solr Node.
> In cloud environments, it's sometimes crucial to be able to listen on one 
> port and advertise yourself as listening on another. This is because there is 
> a proxy that listens on the advertised port, and forwards the request to the 
> server which is listening to the jetty port.
> Solr already supports having a separate Jetty port and Live Nodes port 
> (examples provided in the dev-list discussion linked below). I suggest that 
> we add this to the default solr config so that users can use the default 
> solr.xml in cloud configurations, and the solr/bin script will enable easy 
> use of this feature.
> There has been [discussion on this exact 
> problem|https://mail-archives.apache.org/mod_mbox/lucene-dev/201910.mbox/%3CCABEwPvGFEggt9Htn%3DA5%3DtoawuimSJ%2BZcz0FvsaYod7v%2B4wHKog%40mail.gmail.com%3E]
>  on the dev list already.
> I propose the new system property to be used for {{hostPort}} in the 
> solr.xml. I am open to changing the name, but to me it is more descriptive 
> than {{hostPort}}.
> {{-Dsolr.port.advertise}} and {{SOLR_PORT_ADVERTISE}} (env var checked in 
> bin/solr).
> The xml field {{hostPort}} itself would not be changed, just the system
> property that is used to fill its value in the default {{solr.xml}}.






[jira] [Updated] (SOLR-14999) Add built-in option to advertise Solr with a different port than Jetty listens on.

2020-11-18 Thread Houston Putman (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-14999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Houston Putman updated SOLR-14999:
--
Description: 
Currently the default settings in {{solr.xml}} allow the specification of one 
port, {{jetty.port}} which the bin/solr script provides from the {{SOLR_PORT}} 
environment variable. This port is used twice. Jetty uses it to listen for 
requests, and the clusterState uses the port to advertise the address of the 
Solr Node.

In cloud environments, it's sometimes crucial to be able to listen on one port 
and advertise yourself as listening on another. This is because there is a 
proxy that listens on the advertised port, and forwards the request to the 
server which is listening to the jetty port.

Solr already supports having a separate Jetty port and Live Nodes port 
(examples provided in the dev-list discussion linked below). I suggest that we 
add this to the default solr config so that users can use the default solr.xml 
in cloud configurations, and the solr/bin script will enable easy use of this 
feature.

There has been [discussion on this exact 
problem|https://mail-archives.apache.org/mod_mbox/lucene-dev/201910.mbox/%3CCABEwPvGFEggt9Htn%3DA5%3DtoawuimSJ%2BZcz0FvsaYod7v%2B4wHKog%40mail.gmail.com%3E]
 on the dev list already.

I propose the new system property to be used for {{hostPort}} in the solr.xml. 
I am open to changing the name, but to me it is more descriptive than 
{{hostPort}}.
{{-Dsolr.port.advertise}} and {{SOLR_PORT_ADVERTISE}} (env var checked in 
bin/solr).

The xml field {{hostPort}} itself would not be changed, just the system property that is used to fill its value in the default {{solr.xml}}.

  was:
Currently the default settings in {{solr.xml}} allow the specification of one 
port, {{jetty.port}}  which the bin/solr script provides from the {{SOLR_PORT}} 
environment variable. This port is used twice. Jetty uses it to listen for 
requests, and the clusterState uses the port to advertise the address of the 
Solr Node.

In cloud environments, it's sometimes crucial to be able to listen on one port 
and advertise yourself as listening on another. This is because there is a 
proxy that listens on the advertised port, and forwards the request to the 
server which is listening to the jetty port.

Solr already supports having a separate Jetty port and Live Nodes port 
(examples provided in the dev-list discussion linked below). I suggest that we 
add this to the default solr config so that users can use the default solr.xml 
in cloud configurations, and the solr/bin script will enable easy use of this 
feature.

There has been [discussion on this exact 
problem|https://mail-archives.apache.org/mod_mbox/lucene-dev/201910.mbox/%3CCABEwPvGFEggt9Htn%3DA5%3DtoawuimSJ%2BZcz0FvsaYod7v%2B4wHKog%40mail.gmail.com%3E]
 on the dev list already.


> Add built-in option to advertise Solr with a different port than Jetty 
> listens on.
> --
>
> Key: SOLR-14999
> URL: https://issues.apache.org/jira/browse/SOLR-14999
> Project: Solr
>  Issue Type: Improvement
>Reporter: Houston Putman
>Assignee: Houston Putman
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently the default settings in {{solr.xml}} allow the specification of one 
> port, {{jetty.port}} which the bin/solr script provides from the 
> {{SOLR_PORT}} environment variable. This port is used twice. Jetty uses it to 
> listen for requests, and the clusterState uses the port to advertise the 
> address of the Solr Node.
> In cloud environments, it's sometimes crucial to be able to listen on one 
> port and advertise yourself as listening on another. This is because there is 
> a proxy that listens on the advertised port, and forwards the request to the 
> server which is listening to the jetty port.
> Solr already supports having a separate Jetty port and Live Nodes port 
> (examples provided in the dev-list discussion linked below). I suggest that 
> we add this to the default solr config so that users can use the default 
> solr.xml in cloud configurations, and the solr/bin script will enable easy 
> use of this feature.
> There has been [discussion on this exact 
> problem|https://mail-archives.apache.org/mod_mbox/lucene-dev/201910.mbox/%3CCABEwPvGFEggt9Htn%3DA5%3DtoawuimSJ%2BZcz0FvsaYod7v%2B4wHKog%40mail.gmail.com%3E]
>  on the dev list already.
> I propose the new system property to be used for {{hostPort}} in the 
> solr.xml. I am open to changing the name, but to me it is more descriptive 
> than {{hostPort}}.
> {{-Dsolr.port.advertise}} and {{SOLR_PORT_ADVERTISE}} (env var checked in 
> bin/solr).
> The xml field {{hostPort}} itself would not be changed, just the system
> property that is used to fill its value in the default {{solr.xml}}.


[GitHub] [lucene-solr] HoustonPutman opened a new pull request #2089: SOLR-14999: Option to set the advertised port for Solr.

2020-11-18 Thread GitBox


HoustonPutman opened a new pull request #2089:
URL: https://github.com/apache/lucene-solr/pull/2089


   https://issues.apache.org/jira/browse/SOLR-14999
   
   I am open to suggestions on the naming of this new option. Currently I have 
`-Dsolr.port.advertise` and `SOLR_PORT_ADVERTISE`, because I think those are 
more descriptive names than `hostPort`.
   
   This is backwards compatible with the current `solr.xml`s, because the 
default value (`jetty.port`) is now checked in the Java code, rather than the 
xml.






[GitHub] [lucene-solr] msfroh opened a new pull request #2088: LUCENE-9617: Reset lowestUnassignedFieldNumber in FieldNumbers.clear()

2020-11-18 Thread GitBox


msfroh opened a new pull request #2088:
URL: https://github.com/apache/lucene-solr/pull/2088


   FieldNumbers.clear() is called from IndexWriter.deleteAll(), which is
   supposed to completely reset the state of the index. This includes
   clearing all known fields.
   
   Prior to this change, it would allocate progressively higher field
   numbers, which results in larger and larger arrays for
   FieldInfos.byNumber, effectively "leaking" field numbers every time
   deleteAll() is called.
   
   
   
   
   # Description
   
   If you run a loop that repeatedly adds documents to an IndexWriter and calls 
deleteAll, new fields will be given progressively higher index numbers. Since 
these index numbers are reflected in the size of the FieldInfos.byNumber array, 
this is effectively a memory leak.
   
   # Solution
   
   Reset lowestUnassignedFieldNumber to -1 when FieldNumbers.clear() is called 
(resetting the state to that of a newly-created FieldNumbers instance).
   
   # Tests
   
   Added a unit test to confirm that after calling clear(), the next field is 
given number 0.
   
   # Checklist
   
   Please review the following and check all that apply:
   
   - [x] I have reviewed the guidelines for [How to 
Contribute](https://wiki.apache.org/solr/HowToContribute) and my code conforms 
to the standards described there to the best of my ability.
   - [x] I have created a Jira issue and added the issue ID to my pull request 
title.
   - [x] I have given Solr maintainers 
[access](https://help.github.com/en/articles/allowing-changes-to-a-pull-request-branch-created-from-a-fork)
 to contribute to my PR branch. (optional but recommended)
   - [x] I have developed this patch against the `master` branch.
   - [x] I have run `./gradlew check`.
   - [x] I have added tests for my changes.
   - [ ] I have added documentation for the [Ref 
Guide](https://github.com/apache/lucene-solr/tree/master/solr/solr-ref-guide) 
(for Solr changes only).
   






[GitHub] [lucene-solr] madrob commented on a change in pull request #2083: SOLR-15001 Docker: require init_var_solr.sh

2020-11-18 Thread GitBox


madrob commented on a change in pull request #2083:
URL: https://github.com/apache/lucene-solr/pull/2083#discussion_r526336171



##
File path: solr/docker/include/scripts/init-var-solr
##
@@ -31,31 +31,31 @@ function check_dir_writability {
 }
 
 if [ ! -d "$DIR/data" ]; then
-    echo "Creating $DIR/data"
+    #echo "Creating $DIR/data"
     check_dir_writability "$DIR"
     mkdir "$DIR/data"
     chmod 0770 "$DIR/data"
Review comment:
   Combine these into `mkdir -m`?
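   For reference, the combined form would be something like:
 ```
 mkdir -m 0770 "$DIR/data"
 ```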








[GitHub] [lucene-solr] madrob commented on a change in pull request #2083: SOLR-15001 Docker: require init_var_solr.sh

2020-11-18 Thread GitBox


madrob commented on a change in pull request #2083:
URL: https://github.com/apache/lucene-solr/pull/2083#discussion_r526336171



##
File path: solr/docker/include/scripts/init-var-solr
##
@@ -31,31 +31,31 @@ function check_dir_writability {
 }
 
 if [ ! -d "$DIR/data" ]; then
-    echo "Creating $DIR/data"
+    #echo "Creating $DIR/data"
     check_dir_writability "$DIR"
     mkdir "$DIR/data"
     chmod 0770 "$DIR/data"
Review comment:
   Combine these as above?








[jira] [Created] (LUCENE-9617) FieldNumbers.clear() should reset lowestUnassignedFieldNumber

2020-11-18 Thread Michael Froh (Jira)
Michael Froh created LUCENE-9617:


 Summary: FieldNumbers.clear() should reset 
lowestUnassignedFieldNumber
 Key: LUCENE-9617
 URL: https://issues.apache.org/jira/browse/LUCENE-9617
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/index
Affects Versions: 8.7
Reporter: Michael Froh


A call to IndexWriter.deleteAll() should completely reset the state of the 
index. Part of that is a call to globalFieldNumbersMap.clear(), which purges 
all knowledge of fields by clearing name -> number and number -> name maps. 
However, it does not reset lowestUnassignedFieldNumber.

If we have a loop that adds some documents, calls deleteAll(), adds more documents, etc., lowestUnassignedFieldNumber keeps counting up. Since FieldInfos allocates an array for number -> FieldInfo, this array gets larger and larger, effectively leaking memory.

We can fix this by resetting lowestUnassignedFieldNumber to -1 in 
FieldNumbers.clear().

I'll write a unit test and attach a patch.
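A minimal sketch of the described fix (names follow the issue; the real FieldNumbers class carries more state than shown here):

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, trimmed-down stand-in for the class described in the issue.
class FieldNumbers {
  private final Map<String, Integer> nameToNumber = new HashMap<>();
  private final Map<Integer, String> numberToName = new HashMap<>();
  private int lowestUnassignedFieldNumber = -1;

  synchronized void clear() {
    nameToNumber.clear();
    numberToName.clear();
    // The fix: without this reset, field numbers keep growing across
    // IndexWriter.deleteAll() calls, and the number -> FieldInfo array in
    // FieldInfos grows with them, effectively leaking memory.
    lowestUnassignedFieldNumber = -1;
  }
}
```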






[jira] [Updated] (SOLR-14560) Learning To Rank Interleaving

2020-11-18 Thread Alessandro Benedetti (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-14560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alessandro Benedetti updated SOLR-14560:

Fix Version/s: 8.8
   master (9.0)

> Learning To Rank Interleaving
> -
>
> Key: SOLR-14560
> URL: https://issues.apache.org/jira/browse/SOLR-14560
> Project: Solr
>  Issue Type: New Feature
>  Components: contrib - LTR
>Affects Versions: 8.5.2
>Reporter: Alessandro Benedetti
>Priority: Minor
> Fix For: master (9.0), 8.8
>
>  Time Spent: 10h 10m
>  Remaining Estimate: 0h
>
> Interleaving is an approach to Online Search Quality evaluation that can be 
> very useful for Learning To Rank models:
> [https://sease.io/2020/05/online-testing-for-learning-to-rank-interleaving.html|https://sease.io/2020/05/online-testing-for-learning-to-rank-interleaving.html]
> Scope of this issue is to introduce the ability to the LTR query parser of 
> accepting multiple models (2 to start with).
> If one model is passed, normal reranking happens.
> If two models are passed, reranking happens for both models and the final 
> reranked list is the interleaved sequence of results coming from the two 
> models lists.
> As a first step it is going to be implemented through:
> TeamDraft Interleaving with two models in input.
> In the future, we can expand the functionality adding the interleaving 
> algorithm as a parameter.
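For illustration, a sketch of what a two-model rerank request could look like with this feature (parameter names inferred from the issue description; check the merged change for the final syntax):

```
rq={!ltr reRankDocs=100 model=myFirstModel model=mySecondModel}
```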






[jira] [Resolved] (SOLR-14560) Learning To Rank Interleaving

2020-11-18 Thread Alessandro Benedetti (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-14560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alessandro Benedetti resolved SOLR-14560.
-
Resolution: Fixed

Thanks [~cpoerschke] for all the reviewing help!

This new feature is now merged in master and 8.x (upcoming 8.8.0).

> Learning To Rank Interleaving
> -
>
> Key: SOLR-14560
> URL: https://issues.apache.org/jira/browse/SOLR-14560
> Project: Solr
>  Issue Type: New Feature
>  Components: contrib - LTR
>Affects Versions: 8.5.2
>Reporter: Alessandro Benedetti
>Priority: Minor
>  Time Spent: 10h 10m
>  Remaining Estimate: 0h
>
> Interleaving is an approach to Online Search Quality evaluation that can be 
> very useful for Learning To Rank models:
> [https://sease.io/2020/05/online-testing-for-learning-to-rank-interleaving.html|https://sease.io/2020/05/online-testing-for-learning-to-rank-interleaving.html]
> Scope of this issue is to introduce the ability to the LTR query parser of 
> accepting multiple models (2 to start with).
> If one model is passed, normal reranking happens.
> If two models are passed, reranking happens for both models and the final 
> reranked list is the interleaved sequence of results coming from the two 
> models lists.
> As a first step it is going to be implemented through:
> TeamDraft Interleaving with two models in input.
> In the future, we can expand the functionality adding the interleaving 
> algorithm as a parameter.






[GitHub] [lucene-solr] madrob commented on pull request #2086: SOLR-14985: Slow indexing and search performance when using HttpClusterStateProvider

2020-11-18 Thread GitBox


madrob commented on pull request #2086:
URL: https://github.com/apache/lucene-solr/pull/2086#issuecomment-729873570


   @thelabdude this seems like something you'd be interested in as well.






[GitHub] [lucene-solr] madrob commented on a change in pull request #2086: SOLR-14985: Slow indexing and search performance when using HttpClusterStateProvider

2020-11-18 Thread GitBox


madrob commented on a change in pull request #2086:
URL: https://github.com/apache/lucene-solr/pull/2086#discussion_r526311005



##
File path: 
solr/solrj/src/test/org/apache/solr/client/solrj/impl/CountingHttpClusterStateProvider.java
##
@@ -0,0 +1,72 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.solr.client.solrj.impl;
+
+import org.apache.http.client.HttpClient;
+import org.apache.solr.client.solrj.ResponseParser;
+import org.apache.solr.client.solrj.SolrClient;
+import org.apache.solr.client.solrj.SolrRequest;
+import org.apache.solr.client.solrj.SolrServerException;
+import org.apache.solr.common.util.NamedList;
+
+import java.io.IOException;
+import java.util.List;
+import java.util.concurrent.atomic.AtomicInteger;
+
+@SuppressWarnings({"unchecked"})
+public class CountingHttpClusterStateProvider extends BaseHttpClusterStateProvider {
+
+  private final HttpClient httpClient;
+  private final boolean clientIsInternal;
+
+  private final AtomicInteger counter = new AtomicInteger(0);
+
+  public CountingHttpClusterStateProvider(List<String> solrUrls, HttpClient httpClient) throws Exception {
+    this.httpClient = httpClient == null ? HttpClientUtil.createClient(null) : httpClient;
+    this.clientIsInternal = httpClient == null;
+    init(solrUrls);
+  }
+
+  @Override
+  protected SolrClient getSolrClient(String baseUrl) {
+    return new AssertingHttpSolrClient(new HttpSolrClient.Builder().withBaseSolrUrl(baseUrl).withHttpClient(httpClient));
+  }
+
+  @Override
+  public void close() throws IOException {
+    if (this.clientIsInternal && this.httpClient != null) {
+      HttpClientUtil.close(httpClient);
+    }
+  }
+
+  public int getRequestCount() {
+    return counter.get();
+  }
+
+  class AssertingHttpSolrClient extends HttpSolrClient {
+    public AssertingHttpSolrClient(Builder builder) {
+      super(builder);
+    }
+
+    @Override
+    public NamedList<Object> request(@SuppressWarnings({"rawtypes"}) SolrRequest request, ResponseParser processor, String collection) throws SolrServerException, IOException {
+      new Exception().printStackTrace();

Review comment:
   please send this to a logger.
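   For reference, a sketch of what that could look like with the SLF4J pattern commonly used in Solr (the class and message names here are hypothetical):
 ```java
 import java.lang.invoke.MethodHandles;
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;

 class RequestTracer {
   private static final Logger log = LoggerFactory.getLogger(MethodHandles.lookup().lookupClass());

   void traceRequest() {
     // Instead of new Exception().printStackTrace(): keep the stack trace,
     // but route it through the logger at debug level.
     log.debug("cluster state request", new Exception("request stack trace"));
   }
 }
 ```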

##
File path: 
solr/solrj/src/test/org/apache/solr/client/solrj/impl/CountingHttpClusterStateProvider.java
##
@@ -0,0 +1,72 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.solr.client.solrj.impl;
+
+import org.apache.http.client.HttpClient;
+import org.apache.solr.client.solrj.ResponseParser;
+import org.apache.solr.client.solrj.SolrClient;
+import org.apache.solr.client.solrj.SolrRequest;
+import org.apache.solr.client.solrj.SolrServerException;
+import org.apache.solr.common.util.NamedList;
+
+import java.io.IOException;
+import java.util.List;
+import java.util.concurrent.atomic.AtomicInteger;
+
+@SuppressWarnings({"unchecked"})

Review comment:
   unneeded

##
File path: 
solr/solrj/src/test/org/apache/solr/client/solrj/impl/HttpClusterStateProviderTest.java
##
@@ -0,0 +1,95 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of 

[jira] [Commented] (SOLR-14560) Learning To Rank Interleaving

2020-11-18 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17234899#comment-17234899
 ] 

ASF subversion and git services commented on SOLR-14560:


Commit f441b435a0f075e3769740f062a90f7753535055 in lucene-solr's branch 
refs/heads/branch_8x from Alessandro Benedetti
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=f441b43 ]

SOLR-14560: Interleaving for Learning To Rank (#1571)
(cherry picked from commit af0455ac8366d6dba941f2b2674ed2a8245c76f9)


> Learning To Rank Interleaving
> -
>
> Key: SOLR-14560
> URL: https://issues.apache.org/jira/browse/SOLR-14560
> Project: Solr
>  Issue Type: New Feature
>  Components: contrib - LTR
>Affects Versions: 8.5.2
>Reporter: Alessandro Benedetti
>Priority: Minor
>  Time Spent: 10h 10m
>  Remaining Estimate: 0h
>
> Interleaving is an approach to Online Search Quality evaluation that can be 
> very useful for Learning To Rank models:
> [https://sease.io/2020/05/online-testing-for-learning-to-rank-interleaving.html|https://sease.io/2020/05/online-testing-for-learning-to-rank-interleaving.html]
> Scope of this issue is to introduce the ability to the LTR query parser of 
> accepting multiple models (2 to start with).
> If one model is passed, normal reranking happens.
> If two models are passed, reranking happens for both models and the final 
> reranked list is the interleaved sequence of results coming from the two 
> models lists.
> As a first step it is going to be implemented through:
> TeamDraft Interleaving with two models in input.
> In the future, we can expand the functionality adding the interleaving 
> algorithm as a parameter.






[GitHub] [lucene-solr] alessandrobenedetti commented on pull request #1571: SOLR-14560: Interleaving for Learning To Rank

2020-11-18 Thread GitBox


alessandrobenedetti commented on pull request #1571:
URL: https://github.com/apache/lucene-solr/pull/1571#issuecomment-729864396


   Thanks @cpoerschke for the thorough review!






[jira] [Commented] (SOLR-14560) Learning To Rank Interleaving

2020-11-18 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17234888#comment-17234888
 ] 

ASF subversion and git services commented on SOLR-14560:


Commit af0455ac8366d6dba941f2b2674ed2a8245c76f9 in lucene-solr's branch 
refs/heads/master from Alessandro Benedetti
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=af0455a ]

SOLR-14560: Interleaving for Learning To Rank (#1571)

SOLR-14560: Add interleaving support in Learning To Rank

> Learning To Rank Interleaving
> -
>
> Key: SOLR-14560
> URL: https://issues.apache.org/jira/browse/SOLR-14560
> Project: Solr
>  Issue Type: New Feature
>  Components: contrib - LTR
>Affects Versions: 8.5.2
>Reporter: Alessandro Benedetti
>Priority: Minor
>  Time Spent: 10h
>  Remaining Estimate: 0h
>
> Interleaving is an approach to Online Search Quality evaluation that can be 
> very useful for Learning To Rank models:
> [https://sease.io/2020/05/online-testing-for-learning-to-rank-interleaving.html|https://sease.io/2020/05/online-testing-for-learning-to-rank-interleaving.html]
> Scope of this issue is to introduce the ability to the LTR query parser of 
> accepting multiple models (2 to start with).
> If one model is passed, normal reranking happens.
> If two models are passed, reranking happens for both models and the final 
> reranked list is the interleaved sequence of results coming from the two 
> models lists.
> As a first step it is going to be implemented through:
> TeamDraft Interleaving with two models in input.
> In the future, we can expand the functionality adding the interleaving 
> algorithm as a parameter.






[jira] [Commented] (SOLR-14560) Learning To Rank Interleaving

2020-11-18 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17234887#comment-17234887
 ] 

ASF subversion and git services commented on SOLR-14560:


Commit af0455ac8366d6dba941f2b2674ed2a8245c76f9 in lucene-solr's branch 
refs/heads/master from Alessandro Benedetti
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=af0455a ]

SOLR-14560: Interleaving for Learning To Rank (#1571)

SOLR-14560: Add interleaving support in Learning To Rank

> Learning To Rank Interleaving
> -
>
> Key: SOLR-14560
> URL: https://issues.apache.org/jira/browse/SOLR-14560
> Project: Solr
>  Issue Type: New Feature
>  Components: contrib - LTR
>Affects Versions: 8.5.2
>Reporter: Alessandro Benedetti
>Priority: Minor
>  Time Spent: 10h
>  Remaining Estimate: 0h
>
> Interleaving is an approach to Online Search Quality evaluation that can be 
> very useful for Learning To Rank models:
> [https://sease.io/2020/05/online-testing-for-learning-to-rank-interleaving.html|https://sease.io/2020/05/online-testing-for-learning-to-rank-interleaving.html]
> Scope of this issue is to introduce the ability to the LTR query parser of 
> accepting multiple models (2 to start with).
> If one model is passed, normal reranking happens.
> If two models are passed, reranking happens for both models and the final 
> reranked list is the interleaved sequence of results coming from the two 
> models lists.
> As a first step it is going to be implemented through:
> TeamDraft Interleaving with two models in input.
> In the future, we can expand the functionality adding the interleaving 
> algorithm as a parameter.






[GitHub] [lucene-solr] alessandrobenedetti merged pull request #1571: SOLR-14560: Interleaving for Learning To Rank

2020-11-18 Thread GitBox


alessandrobenedetti merged pull request #1571:
URL: https://github.com/apache/lucene-solr/pull/1571


   






[jira] [Commented] (LUCENE-9021) QueryParser should avoid creating an LookaheadSuccess(Error) object with every instance

2020-11-18 Thread Clemens Stukenbrock (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17234877#comment-17234877
 ] 

Clemens Stukenbrock commented on LUCENE-9021:
-

Hello [~mkhl], do you have any update on this topic? Kind regards

> QueryParser should avoid creating an LookaheadSuccess(Error) object with 
> every instance
> ---
>
> Key: LUCENE-9021
> URL: https://issues.apache.org/jira/browse/LUCENE-9021
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Przemek Bruski
>Priority: Major
> Attachments: LUCENE-9021.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> This is basically the same as 
> https://issues.apache.org/jira/browse/SOLR-11242 , but for Lucene QueryParser






[GitHub] [lucene-solr] thelabdude commented on pull request #2010: SOLR-12182: Don't persist base_url in ZK as the scheme is variable, compute from node_name instead

2020-11-18 Thread GitBox


thelabdude commented on pull request #2010:
URL: https://github.com/apache/lucene-solr/pull/2010#issuecomment-729783538


   Hi @noblepaul thanks for taking a look ... so we decided to not try to handle the upgrade to TLS using a rolling restart (as described in my comment ^) in this PR ... also see Anshum's comments. I initially had a way to get the "current" urlScheme for each live node, since with a rolling restart you'll have a mix of nodes with TLS enabled and some without it yet, but we felt that could be a little trappy b/c it really doesn't address the client applications. So our advice now is to just suck it up and take the downtime to enable TLS. Basically, we don't want to promise the community a zero-downtime upgrade to enable TLS, because it is a hard thing to promise. The live nodes approach (see commit history in this PR) works on the server side, but doesn't address client applications. Probably other weird issues too ... FWIW that's the current experience as well, so we're not any worse off. I find it highly unlikely that users will enable TLS after building up a large production cluster anyway; that seems like it wouldn't happen in practice.
   
   Regarding migrating to this: I don't think these changes would require any 
migration process. Currently, `node_name` is stored in the state in ZK (as you 
know), so the stored `base_url` will just be ignored and re-created when 
reading from ZK using the `node_name`.
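
   For illustration, a sketch of the derivation being described, assuming the 
usual `host:port_context` node_name layout (e.g. `localhost:8983_solr`); the 
scheme comes from cluster configuration rather than persisted state:

   ```java
   import java.net.URLDecoder;
   import java.nio.charset.StandardCharsets;

   // Sketch only: recompute a replica's base URL from its stored node_name.
   static String baseUrlForNodeName(String nodeName, String urlScheme) {
     int sep = nodeName.indexOf('_');
     if (sep < 0) {
       throw new IllegalArgumentException("no context in node_name: " + nodeName);
     }
     String hostAndPort = nodeName.substring(0, sep);
     // the context suffix is URL-encoded in the node_name
     String context = URLDecoder.decode(nodeName.substring(sep + 1), StandardCharsets.UTF_8);
     return urlScheme + "://" + hostAndPort + (context.isEmpty() ? "" : "/" + context);
   }
   ```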



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Comment Edited] (LUCENE-9378) Configurable compression for BinaryDocValues

2020-11-18 Thread Adrien Grand (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17234755#comment-17234755
 ] 

Adrien Grand edited comment on LUCENE-9378 at 11/18/20, 3:56 PM:
-

[~gworah] I'm pretty sure that the drop is (at least partially) due to this 
change as we had seen a change in the other direction when introducing 
compression for binary fields. The format with compression looks generally 
faster for linear scans (like our faceting tasks in nightlies) and slower for 
selective queries (like some use-cases mentioned above).


was (Author: jpountz):
[~gworah] I'm pretty sure that the drop is due to this change as we had seen a 
change in the other direction when introducing compression for binary fields. 
The format with compression looks generally faster for linear scans (like our 
faceting tasks in nightlies) and slower for selective queries (like some 
use-cases mentioned above).

> Configurable compression for BinaryDocValues
> 
>
> Key: LUCENE-9378
> URL: https://issues.apache.org/jira/browse/LUCENE-9378
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Viral Gandhi
>Priority: Major
> Fix For: 8.8
>
> Attachments: hotspots-v76x.png, hotspots-v76x.png, hotspots-v76x.png, 
> hotspots-v76x.png, hotspots-v76x.png, hotspots-v77x.png, hotspots-v77x.png, 
> hotspots-v77x.png, hotspots-v77x.png, image-2020-06-12-22-17-30-339.png, 
> image-2020-06-12-22-17-53-961.png, image-2020-06-12-22-18-24-527.png, 
> image-2020-06-12-22-18-48-919.png, snapshot-v77x.nps, snapshot-v77x.nps, 
> snapshot-v77x.nps, snapshots-v76x.nps, snapshots-v76x.nps, snapshots-v76x.nps
>
>  Time Spent: 5h 40m
>  Remaining Estimate: 0h
>
> Lucene 8.5.1 includes a change to always [compress 
> BinaryDocValues|https://issues.apache.org/jira/browse/LUCENE-9211]. This 
> caused (~30%) reduction in our red-line QPS (throughput). 
> We think users should be given some way to opt-in for this compression 
> feature instead of always being enabled which can have a substantial query 
> time cost as we saw during our upgrade. [~mikemccand] suggested one possible 
> approach by introducing a *mode* in Lucene80DocValuesFormat (COMPRESSED and 
> UNCOMPRESSED) and allowing users to create a custom Codec subclassing the 
> default Codec and pick the format they want.
> Idea is similar to Lucene50StoredFieldsFormat which has two modes, 
> Mode.BEST_SPEED and Mode.BEST_COMPRESSION.
> Here's related issues for adding benchmark covering BINARY doc values 
> query-time performance - [https://github.com/mikemccand/luceneutil/issues/61]
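
For illustration, the opt-in could end up looking roughly like this for users; 
every name below (in particular the {{Mode}} constructor) is an assumption 
based on the description, mirroring how Lucene50StoredFieldsFormat exposes its 
two modes:

{code:java}
import org.apache.lucene.codecs.Codec;
import org.apache.lucene.codecs.DocValuesFormat;
import org.apache.lucene.codecs.FilterCodec;
import org.apache.lucene.codecs.lucene80.Lucene80DocValuesFormat;

// Hypothetical: a custom Codec that opts out of BinaryDocValues compression,
// assuming Lucene80DocValuesFormat grows a Mode constructor as proposed.
public final class UncompressedDocValuesCodec extends FilterCodec {
  private final DocValuesFormat docValues =
      new Lucene80DocValuesFormat(Lucene80DocValuesFormat.Mode.UNCOMPRESSED);

  public UncompressedDocValuesCodec() {
    super("UncompressedDocValuesCodec", Codec.getDefault());
  }

  @Override
  public DocValuesFormat docValuesFormat() {
    return docValues;
  }
}
{code}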



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-9378) Configurable compression for BinaryDocValues

2020-11-18 Thread Adrien Grand (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17234755#comment-17234755
 ] 

Adrien Grand commented on LUCENE-9378:
--

[~gworah] I'm pretty sure that the drop is due to this change as we had seen a 
change in the other direction when introducing compression for binary fields. 
The format with compression looks generally faster for linear scans (like our 
faceting tasks in nightlies) and slower for selective queries (like some 
use-cases mentioned above).

> Configurable compression for BinaryDocValues
> 
>
> Key: LUCENE-9378
> URL: https://issues.apache.org/jira/browse/LUCENE-9378
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Viral Gandhi
>Priority: Major
> Fix For: 8.8
>
> Attachments: hotspots-v76x.png, hotspots-v76x.png, hotspots-v76x.png, 
> hotspots-v76x.png, hotspots-v76x.png, hotspots-v77x.png, hotspots-v77x.png, 
> hotspots-v77x.png, hotspots-v77x.png, image-2020-06-12-22-17-30-339.png, 
> image-2020-06-12-22-17-53-961.png, image-2020-06-12-22-18-24-527.png, 
> image-2020-06-12-22-18-48-919.png, snapshot-v77x.nps, snapshot-v77x.nps, 
> snapshot-v77x.nps, snapshots-v76x.nps, snapshots-v76x.nps, snapshots-v76x.nps
>
>  Time Spent: 5h 40m
>  Remaining Estimate: 0h
>
> Lucene 8.5.1 includes a change to always [compress 
> BinaryDocValues|https://issues.apache.org/jira/browse/LUCENE-9211]. This 
> caused (~30%) reduction in our red-line QPS (throughput). 
> We think users should be given some way to opt-in for this compression 
> feature instead of always being enabled which can have a substantial query 
> time cost as we saw during our upgrade. [~mikemccand] suggested one possible 
> approach by introducing a *mode* in Lucene80DocValuesFormat (COMPRESSED and 
> UNCOMPRESSED) and allowing users to create a custom Codec subclassing the 
> default Codec and pick the format they want.
> Idea is similar to Lucene50StoredFieldsFormat which has two modes, 
> Mode.BEST_SPEED and Mode.BEST_COMPRESSION.
> Here's related issues for adding benchmark covering BINARY doc values 
> query-time performance - [https://github.com/mikemccand/luceneutil/issues/61]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-9616) Improve test coverage for internal format versions

2020-11-18 Thread Adrien Grand (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17234746#comment-17234746
 ] 

Adrien Grand commented on LUCENE-9616:
--

I gave it a try on my PR for LUCENE-9613 by keeping the old DocValuesConsumer 
around in order to be able to keep unit tests for the previous internal version.

> Improve test coverage for internal format versions
> --
>
> Key: LUCENE-9616
> URL: https://issues.apache.org/jira/browse/LUCENE-9616
> Project: Lucene - Core
>  Issue Type: Test
>Reporter: Julie Tibshirani
>Priority: Minor
>
> Some formats use an internal versioning system -- for example 
> {{CompressingStoredFieldsFormat}} maintains older logic for reading an 
> on-heap fields index. Because we always allow reading segments from the 
> current + previous major version, some users still rely on the read-side 
> logic of older internal versions.
> Although the older version logic is covered by 
> {{TestBackwardsCompatibility}}, it looks like it's not exercised in unit 
> tests. Older versions aren't "in rotation" when choosing a random codec for 
> tests. They also don't have dedicated unit tests as we have for separate 
> older formats, for example {{TestLucene60PointsFormat}}.
> It could be good to improve unit test coverage for the older versions, since 
> they're in active use. A downside is that it's not straightforward to add 
> unit tests, since we tend to just change/ delete the old write-side logic as 
> we bump internal versions.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-9613) Create blocks for ords when it helps in Lucene80DocValuesFormat

2020-11-18 Thread Adrien Grand (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17234742#comment-17234742
 ] 

Adrien Grand commented on LUCENE-9613:
--

I opened a pull request that writes ordinals using the same code we use for 
numbers: https://github.com/apache/lucene-solr/pull/2087.

> Create blocks for ords when it helps in Lucene80DocValuesFormat
> ---
>
> Key: LUCENE-9613
> URL: https://issues.apache.org/jira/browse/LUCENE-9613
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Adrien Grand
>Priority: Minor
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently for sorted(-set) values, we always write ords using 
> log2(valueCount) bits per entry. However in several cases like when the field 
> is used in the index sort, or if one value is _very_ common, splitting into 
> blocks like we do for numerics would help.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] jpountz opened a new pull request #2087: LUCENE-9613: Split ordinals into blocks when it saves 10+% space.

2020-11-18 Thread GitBox


jpountz opened a new pull request #2087:
URL: https://github.com/apache/lucene-solr/pull/2087


   This changes the way that doc values write ordinals so that Lucene splits 
into blocks if it would save more than 10% space, like we do for numeric 
fields. By the way, we are now using the same code path to write ordinals 
and numbers.
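
   As a rough illustration of the 10% rule (names and shapes here are 
assumptions, not the PR's actual code):

   ```java
   // Sketch: prefer the blocked encoding only when it beats the flat
   // log2(valueCount)-bits-per-ord encoding by at least 10%.
   static boolean useBlocks(int flatBitsPerOrd, long numOrds, long blockedSizeInBits) {
     long flatSizeInBits = (long) flatBitsPerOrd * numOrds;
     return blockedSizeInBits <= flatSizeInBits - flatSizeInBits / 10;
   }
   ```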
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] rmuir commented on pull request #2080: LUCENE-8947: Skip field length accumulation when norms are disabled

2020-11-18 Thread GitBox


rmuir commented on pull request #2080:
URL: https://github.com/apache/lucene-solr/pull/2080#issuecomment-729757207


   `totalTermFreq`/`sumTotalTermFreq` are about term frequencies, nothing to do 
with the norms... but this norms check is the only thing guarding against 
overflow. We can't just disable the check like this without causing additional 
problems.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-9616) Improve test coverage for internal format versions

2020-11-18 Thread Adrien Grand (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17234719#comment-17234719
 ] 

Adrien Grand commented on LUCENE-9616:
--

I don't know if I would go as far as including old codecs in the rotation, but 
+1 to keep unit tests for old internal versions. It feels like it should be 
doable to keep the old write logic in the test folder in order to be able to 
keep unit testing previous versions of our file formats.

> Improve test coverage for internal format versions
> --
>
> Key: LUCENE-9616
> URL: https://issues.apache.org/jira/browse/LUCENE-9616
> Project: Lucene - Core
>  Issue Type: Test
>Reporter: Julie Tibshirani
>Priority: Minor
>
> Some formats use an internal versioning system -- for example 
> {{CompressingStoredFieldsFormat}} maintains older logic for reading an 
> on-heap fields index. Because we always allow reading segments from the 
> current + previous major version, some users still rely on the read-side 
> logic of older internal versions.
> Although the older version logic is covered by 
> {{TestBackwardsCompatibility}}, it looks like it's not exercised in unit 
> tests. Older versions aren't "in rotation" when choosing a random codec for 
> tests. They also don't have dedicated unit tests as we have for separate 
> older formats, for example {{TestLucene60PointsFormat}}.
> It could be good to improve unit test coverage for the older versions, since 
> they're in active use. A downside is that it's not straightforward to add 
> unit tests, since we tend to just change/ delete the old write-side logic as 
> we bump internal versions.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-14973) Solr 8.6 is shipping libraries that are incompatible with each other

2020-11-18 Thread Tim Allison (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17234714#comment-17234714
 ] 

Tim Allison commented on SOLR-14973:


Backporting and confirming that I didn't break anything takes a day of 
intermittent work.  If there are plans to do another 8.6.x release, I'll do it. 
 Otherwise, onwards... Thank you [~krisden]!

> Solr 8.6 is shipping libraries that are incompatible with each other
> 
>
> Key: SOLR-14973
> URL: https://issues.apache.org/jira/browse/SOLR-14973
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: contrib - Solr Cell (Tika extraction)
>Affects Versions: 8.6
>Reporter: Samir Huremovic
>Priority: Major
>  Labels: tika-parsers
>
> Hi,
> since Solr 8.6 the version of {{tika-parsers}} was updated to {{1.24}}. This 
> version of {{tika-parsers}} needs the {{poi}} library in version {{4.1.2}} 
> (see https://issues.apache.org/jira/browse/TIKA-3047) 
> Solr has version {{4.1.1}} of poi included.
> This creates (at least) a problem for parsing {{.xls}} files. The following 
> exception gets thrown by trying to post an {{.xls}} file in the techproducts 
> example:
> {{java.lang.NoSuchMethodError: 
> org.apache.poi.hssf.record.common.UnicodeString.getExtendedRst()Lorg/apache/poi/hssf/record/common/ExtRst;}}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-14973) Solr 8.6 is shipping libraries that are incompatible with each other

2020-11-18 Thread Kevin Risden (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17234688#comment-17234688
 ] 

Kevin Risden commented on SOLR-14973:
-

[~tallison] If it's not a lot of work, then sure, backport it. But unless there 
are plans to do an 8.6.x release, I'm not sure it makes sense to backport the 
changes, since they would never get into a release. 

> Solr 8.6 is shipping libraries that are incompatible with each other
> 
>
> Key: SOLR-14973
> URL: https://issues.apache.org/jira/browse/SOLR-14973
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: contrib - Solr Cell (Tika extraction)
>Affects Versions: 8.6
>Reporter: Samir Huremovic
>Priority: Major
>  Labels: tika-parsers
>
> Hi,
> since Solr 8.6 the version of {{tika-parsers}} was updated to {{1.24}}. This 
> version of {{tika-parsers}} needs the {{poi}} library in version {{4.1.2}} 
> (see https://issues.apache.org/jira/browse/TIKA-3047) 
> Solr has version {{4.1.1}} of poi included.
> This creates (at least) a problem for parsing {{.xls}} files. The following 
> exception gets thrown by trying to post an {{.xls}} file in the techproducts 
> example:
> {{java.lang.NoSuchMethodError: 
> org.apache.poi.hssf.record.common.UnicodeString.getExtendedRst()Lorg/apache/poi/hssf/record/common/ExtRst;}}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-13821) Package Store

2020-11-18 Thread marian vasile caraiman (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17234644#comment-17234644
 ] 

marian vasile caraiman commented on SOLR-13821:
---

I saw in the design document that the feature to *delete a file* is TBD: 
"DELETE /api/cluster/files/package//myfile.jar delete a file from 
all nodes ( TBD )". It seems it is not available in the 8.7 version. Any 
updates or estimates on this? Without this feature, people using snapshots of 
libraries can't replace older versions. It also means the filestore can grow 
without control. 

> Package Store
> -
>
> Key: SOLR-13821
> URL: https://issues.apache.org/jira/browse/SOLR-13821
> Project: Solr
>  Issue Type: Sub-task
>Reporter: Ishan Chattopadhyaya
>Assignee: Noble Paul
>Priority: Major
> Fix For: 8.4
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> Package store is a storage managed by Solr that holds the package artifacts. 
> This is replicated across nodes.
> Design is here: 
> [https://docs.google.com/document/d/15b3m3i3NFDKbhkhX_BN0MgvPGZaBj34TKNF2-UNC3U8/edit?ts=5d86a8ad#]
> The package store is powered by an underlying filestore. This filestore is a 
> fully replicated p2p filesystem storage for artifacts.
> The APIs are as follows
> {code:java}
> # add a file
> POST  /api/cluster/files/path/to/file.jar
> #retrieve a file
> GET /api/cluster/files/path/to/file.jar
> #list files in the /path/to directory
> GET /api/cluster/files/path/to
> #GET meta info of the jar
> GET /api/cluster/files/path/to/file.jar?meta=true
> {code}
> This store keeps 2 files per file:
>  # The actual file, say {{myplugin.jar}}
>  # A metadata file {{.myplugin.jar.json}} in the same directory
> The contents of the metadata file are
> {code:json}
> {
> "sha512" : "",
> "sig": {
> "" : ""
> }}
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] shalinmangar opened a new pull request #2086: SOLR-14985: Slow indexing and search performance when using HttpClusterStateProvider

2020-11-18 Thread GitBox


shalinmangar opened a new pull request #2086:
URL: https://github.com/apache/lucene-solr/pull/2086


   # Description
   
   HttpClusterStateProvider fetches and caches Aliases and Live Nodes for 5 
seconds, but all calls to getState are live. These collection states are 
supposed to be cached inside BaseCloudSolrClient, but BaseCloudSolrClient only 
caches a collection state if it is lazy, which is never the case for states 
returned by HttpClusterStateProvider.
   
   BaseCloudSolrClient calls getState for each collection mentioned in the 
request, thereby making a live HTTP call. It also calls getClusterProperties() 
for each live node!
   
   So overall, at least 4 HTTP calls are made to fetch cluster state for each 
update request when using HttpClusterStateProvider. There may be more if 
aliases are involved or if more than one collection is specified in the 
request. Similar problems exist on the query path as well.
   
   Due to these reasons, using HttpClusterStateProvider causes horrible 
latencies and throughput for update and search requests.
   
   # Solution
   
   This PR fixes BaseCloudSolrClient to cache collection states returned by 
HttpClusterStateProvider, reduces the number of calls to getClusterProperty in 
the case of admin requests, and replaces usage of 
getClusterStateProvider().getState() with getDocCollection(), which caches the 
collection state. The number of clusterstatus calls per query/indexing request 
is therefore reduced from 4 to at most 1, and usually 0 (if the data is 
already cached).
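
   A minimal sketch of the caching shape this describes (names assumed; the 
PR itself caches `DocCollection` instances inside `BaseCloudSolrClient`):

   ```java
   import java.util.concurrent.ConcurrentHashMap;
   import java.util.concurrent.TimeUnit;
   import java.util.function.Function;

   // Sketch only: a TTL cache entry so repeated requests for the same
   // collection's state don't each trigger a live CLUSTERSTATUS call.
   final class StateCache<V> {
     private static final long TTL_NANOS = TimeUnit.SECONDS.toNanos(5);
     private final ConcurrentHashMap<String, Entry<V>> cache = new ConcurrentHashMap<>();

     private static final class Entry<V> {
       final V value;
       final long expiresAtNanos;
       Entry(V value, long expiresAtNanos) {
         this.value = value;
         this.expiresAtNanos = expiresAtNanos;
       }
     }

     V get(String collection, Function<String, V> fetchLive) {
       Entry<V> e = cache.get(collection);
       if (e == null || System.nanoTime() > e.expiresAtNanos) {
         V v = fetchLive.apply(collection); // the single live HTTP fetch
         cache.put(collection, new Entry<>(v, System.nanoTime() + TTL_NANOS));
         return v;
       }
       return e.value;
     }
   }
   ```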
   
   # Tests
   
   A new CountingHttpClusterStateProvider test class is added which can track 
the number of http calls made. This class is used in the 
HttpClusterStateProviderTest to track and assert the number of http calls made 
by HttpClusterStateProvider.
   
   # Checklist
   
   Please review the following and check all that apply:
   
   - [X] I have reviewed the guidelines for [How to 
Contribute](https://wiki.apache.org/solr/HowToContribute) and my code conforms 
to the standards described there to the best of my ability.
   - [X] I have created a Jira issue and added the issue ID to my pull request 
title.
   - [X] I have developed this patch against the `master` branch.
   - [X] I have run `./gradlew check`.
   - [X] I have added tests for my changes.
   - [ ] I have added documentation for the [Ref 
Guide](https://github.com/apache/lucene-solr/tree/master/solr/solr-ref-guide) 
(for Solr changes only).
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-15007) Aggregate core handler=/select and /update metrics at the node level metric too

2020-11-18 Thread Andrzej Bialecki (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-15007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17234635#comment-17234635
 ] 

Andrzej Bialecki commented on SOLR-15007:
-

Please check out {{MetricsCollectorHandler}}, {{SolrShardReporter}} and 
{{SolrClusterReporter}}. They were created to handle a slightly different 
scenario (aggregating metrics from nodes into shard leader / Overseer leader), 
but perhaps they can be reused.

They also use {{AggregateMetric}} to represent aggregated numeric values 
together with individual contributions.

> Aggregate core handler=/select and /update metrics at the node level metric 
> too
> ---
>
> Key: SOLR-15007
> URL: https://issues.apache.org/jira/browse/SOLR-15007
> Project: Solr
>  Issue Type: Wish
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: metrics
>Affects Versions: master (9.0)
>Reporter: Mathieu Marie
>Priority: Minor
>
> At my company, we anticipate a huge number of cores and would like to report 
> an aggregated view at the node level instead of at the core level, which 
> will grow exponentially.
> Right now, we're aggregating all of the solr.cores metrics to compute 
> per-cluster dashboards.
> But given that there are many admin handlers already reporting metrics at the 
> node level, I wonder if we could aggregate _/update_, _/select_ and all the 
> other handler counters in Solr and expose them at the solr.node level too.
> It would require (a lot) less data to transport, store and aggregate later, 
> while still giving access to per-core metrics.
>  
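
For illustration, the requested roll-up can be as simple as summing the 
per-core handler counters into one node-level value; a sketch with assumed 
shapes, in the spirit of {{AggregateMetric}}:

{code:java}
import java.util.Map;

// Sketch only: collapse per-core /select or /update request counts
// (keyed by core registry name) into a single node-level counter.
final class NodeLevelRollup {
  static long sumHandlerCounts(Map<String, Long> requestsPerCore) {
    long total = 0;
    for (long count : requestsPerCore.values()) {
      total += count;
    }
    return total;
  }
}
{code}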



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] noblepaul commented on pull request #2010: SOLR-12182: Don't persist base_url in ZK as the scheme is variable, compute from node_name instead

2020-11-18 Thread GitBox


noblepaul commented on pull request #2010:
URL: https://github.com/apache/lucene-solr/pull/2010#issuecomment-729684303


   Before I go further, can I know
   
   * Does this work for a rolling restart?
   * How does one migrate to this?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] dsmiley commented on pull request #1769: SOLR-14789: Absorb the docker-solr repo.

2020-11-18 Thread GitBox


dsmiley commented on pull request #1769:
URL: https://github.com/apache/lucene-solr/pull/1769#issuecomment-729668076


   > we could still document how to perform PGP checks on the jars
   
   Just confirming... we agree that ./verify-docker.sh is needless, and we need 
to just _document_ (e.g. in the ref guide) how to verify individual JARs at the 
CLI.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] s1monw commented on a change in pull request #2085: LUCENE-9508: Fix DocumentsWriter to block threads until unstalled

2020-11-18 Thread GitBox


s1monw commented on a change in pull request #2085:
URL: https://github.com/apache/lucene-solr/pull/2085#discussion_r526058298



##
File path: lucene/core/src/java/org/apache/lucene/index/DocumentsWriter.java
##
@@ -371,19 +371,15 @@ public void close() throws IOException {
   private boolean preUpdate() throws IOException {
 ensureOpen();
 boolean hasEvents = false;
-
-    if (flushControl.anyStalledThreads() || (flushControl.numQueuedFlushes() > 0 && config.checkPendingFlushOnUpdate)) {

Review comment:
   yeah, details...





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] s1monw commented on a change in pull request #2085: LUCENE-9508: Fix DocumentsWriter to block threads until unstalled

2020-11-18 Thread GitBox


s1monw commented on a change in pull request #2085:
URL: https://github.com/apache/lucene-solr/pull/2085#discussion_r526058138



##
File path: lucene/core/src/test/org/apache/lucene/index/TestIndexWriter.java
##
@@ -4258,4 +4258,47 @@ public void testPendingNumDocs() throws Exception {
   }
 }
   }
+
+  public void testIndexWriterBlocksOnStall() throws IOException, InterruptedException {
+    try (Directory dir = newDirectory()) {
+      try (IndexWriter writer = new IndexWriter(dir, newIndexWriterConfig())) {

Review comment:
   It's clearer that way what is closed first, which is important here. 
That's why I do it this way.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] dweiss commented on a change in pull request #2085: LUCENE-9508: Fix DocumentsWriter to block threads until unstalled

2020-11-18 Thread GitBox


dweiss commented on a change in pull request #2085:
URL: https://github.com/apache/lucene-solr/pull/2085#discussion_r526032942



##
File path: lucene/core/src/test/org/apache/lucene/index/TestIndexWriter.java
##
@@ -4258,4 +4258,47 @@ public void testPendingNumDocs() throws Exception {
   }
 }
   }
+
+  public void testIndexWriterBlocksOnStall() throws IOException, InterruptedException {
+    try (Directory dir = newDirectory()) {
+      try (IndexWriter writer = new IndexWriter(dir, newIndexWriterConfig())) {

Review comment:
   pull it up into the same try (semicolon-separated)?

##
File path: lucene/core/src/java/org/apache/lucene/index/DocumentsWriter.java
##
@@ -371,19 +371,15 @@ public void close() throws IOException {
   private boolean preUpdate() throws IOException {
 ensureOpen();
 boolean hasEvents = false;
-
-    if (flushControl.anyStalledThreads() || (flushControl.numQueuedFlushes() > 0 && config.checkPendingFlushOnUpdate)) {

Review comment:
   oh, wonderful.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] s1monw commented on pull request #2085: LUCENE-9508: Fix DocumentsWriter to block threads until unstalled

2020-11-18 Thread GitBox


s1monw commented on pull request #2085:
URL: https://github.com/apache/lucene-solr/pull/2085#issuecomment-729631181


   @mikemccand can you take a look pls



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] s1monw opened a new pull request #2085: LUCENE-9508: Fix DocumentsWriter to block threads until unstalled

2020-11-18 Thread GitBox


s1monw opened a new pull request #2085:
URL: https://github.com/apache/lucene-solr/pull/2085


   DWStallControl expects the caller to loop on top of the wait call to make
   progress with flushing if the DW is stalled. This logic wasn't applied, so
   DW only stalled for one second and then released the indexing thread. This
   can cause an OOM if, for instance, one DWPT gets stuck during a full flush
   while other threads keep indexing.
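
   In generic form the fix is a loop around the bounded wait rather than a
   single timed wait; an illustrative sketch, not the actual
   DocumentsWriterStallControl code:

   ```java
   // Sketch: waiters re-check the stall flag after every bounded wait. A single
   // timed wait would release the indexing thread after one second even while
   // the writer is still stalled.
   final class StallGuard {
     private boolean stalled;

     synchronized void waitIfStalled() throws InterruptedException {
       while (stalled) {
         wait(1000); // bounded, so the thread can periodically observe progress
       }
     }

     synchronized void setStalled(boolean stalled) {
       this.stalled = stalled;
       if (!stalled) {
         notifyAll(); // release all blocked indexing threads
       }
     }
   }
   ```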



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Comment Edited] (LUCENE-9584) PointInSetQuery does not terminate early if result iterator has no docs

2020-11-18 Thread hackerwin7 (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17220608#comment-17220608
 ] 

hackerwin7 edited comment on LUCENE-9584 at 11/18/20, 10:57 AM:


I uploaded a patch based on 7.7.0 that just checks the result iterator before 
creating a new ConstantScoreScorer; but I'm not sure whether we should return 
null to terminate even earlier in BooleanWeight.scorerSupplier() by letting 
the subScorer == null.

I believe that in PointInSetQuery.scorer(),

reader.getPointValues(field) == null *is equivalent to* the reader's 
intersected result being empty.

These two cases should both return null to terminate earlier.


was (Author: hackerwin7):
I upload a patch based for 7.7.0, just check the result iterator before new 
ConstantScoreScorer; but I'm not sure should we use return null to terminate 
more early in BooleanWeight.scorerSupplier() just let the subScorer == null.

I believe that in PointInSetQuery.scorer()  

reader.getPointValues(field) == null   *is equal to*  reader intersected result 
is empty

this two case should both return null to terminate more early.

> PointInSetQuery does not terminate early if result iterator has no docs
> ---
>
> Key: LUCENE-9584
> URL: https://issues.apache.org/jira/browse/LUCENE-9584
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/search
>Affects Versions: 7.7.3, 8.6.3
>Reporter: hackerwin7
>Priority: Major
>  Labels: performance
> Attachments: LUCENE-7.7.0-PointInSetQuery_terminate_early.patch
>
>
> Today, in a point-in-set query, after the BKD intersect we get a 
> DocIdSetBuilder result; if the result's iterator has no docs, PointInSetQuery 
> still creates a ConstantScoreScorer with an empty DocIdSetIterator. 
> In a Boolean query, such as query = subQuery1 AND subQuery2 AND subQuery3 
>  subQueryN,
> if subQuery1 is a PointInSetQuery and gets an empty result iterator, the 
> subsequent subQuery2 ~ subQueryN would still be evaluated to build scorers; 
> this is an unnecessary cost for the query when subQuery1 has already produced 
> an empty result iterator.
>  
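
The suggested change boils down to returning a null scorer when the 
intersected set is empty; a minimal sketch against the Lucene 8.x API, 
assuming emptiness is detectable from the built iterator:

{code:java}
import org.apache.lucene.search.ConstantScoreScorer;
import org.apache.lucene.search.DocIdSetIterator;
import org.apache.lucene.search.ScoreMode;
import org.apache.lucene.search.Scorer;
import org.apache.lucene.search.Weight;
import org.apache.lucene.util.DocIdSetBuilder;

// Sketch: build the ConstantScoreScorer only when the BKD intersect
// actually produced docs; otherwise return null so BooleanWeight can skip
// building scorers for the remaining conjunctive clauses.
final class EarlyExit {
  static Scorer scorerOrNull(Weight weight, float score, ScoreMode scoreMode,
                             DocIdSetBuilder result) {
    DocIdSetIterator iterator = result.build().iterator();
    if (iterator == null || iterator.cost() == 0) {
      return null; // empty result: terminate early
    }
    return new ConstantScoreScorer(weight, score, scoreMode, iterator);
  }
}
{code}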



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] uschindler commented on a change in pull request #2052: LUCENE-8982: Make NativeUnixDirectory pure java with FileChannel direct IO flag, and rename to DirectIODirectory

2020-11-18 Thread GitBox


uschindler commented on a change in pull request #2052:
URL: https://github.com/apache/lucene-solr/pull/2052#discussion_r525918279



##
File path: 
lucene/misc/src/test/org/apache/lucene/misc/store/TestDirectIODirectory.java
##
@@ -0,0 +1,57 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.misc.store;
+
+import com.carrotsearch.randomizedtesting.LifecycleScope;
+import com.carrotsearch.randomizedtesting.RandomizedTest;
+import org.apache.lucene.store.*;
+
+import java.io.IOException;
+import java.nio.file.Files;
+import java.nio.file.Path;
+
+import static org.apache.lucene.misc.store.DirectIODirectory.DEFAULT_MIN_BYTES_DIRECT;
+
+public class TestDirectIODirectory extends BaseDirectoryTestCase {
+  public void testWriteReadWithDirectIO() throws IOException {
+    try (Directory dir = getDirectory(RandomizedTest.newTempDir(LifecycleScope.TEST))) {
+      final long blockSize = Files.getFileStore(createTempFile()).getBlockSize();
+      final long minBytesDirect = Double.valueOf(Math.ceil(DEFAULT_MIN_BYTES_DIRECT / blockSize)).longValue() * blockSize;
+      // Need to worry about overflows here?
+      final int writtenByteLength = Math.toIntExact(minBytesDirect);
+
+      MergeInfo mergeInfo = new MergeInfo(1000, Integer.MAX_VALUE, true, 1);
+      final IOContext context = new IOContext(mergeInfo);
+
+      IndexOutput indexOutput = dir.createOutput("test", context);
+      indexOutput.writeBytes(new byte[writtenByteLength], 0, writtenByteLength);
+      IndexInput indexInput = dir.openInput("test", context);
+
+      assertEquals("The length of bytes read should equal to written", writtenByteLength, indexInput.length());
+
+      indexOutput.close();
+      indexInput.close();
+    }
+  }
+
+  @Override
+  protected Directory getDirectory(Path path) throws IOException {

Review comment:
   I verified: `BaseDirectoryTestCase` uses 
[newIOContext(Random)](https://lucene.apache.org/core/8_7_0/test-framework/org/apache/lucene/util/LuceneTestCase.html#newIOContext-java.util.Random-).
 So we have the correct randomization and our directory is used from time to 
time.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] uschindler commented on a change in pull request #2052: LUCENE-8982: Make NativeUnixDirectory pure java with FileChannel direct IO flag, and rename to DirectIODirectory

2020-11-18 Thread GitBox


uschindler commented on a change in pull request #2052:
URL: https://github.com/apache/lucene-solr/pull/2052#discussion_r525917794



##
File path: 
lucene/misc/src/test/org/apache/lucene/misc/store/TestDirectIODirectory.java
##
@@ -0,0 +1,57 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.misc.store;
+
+import com.carrotsearch.randomizedtesting.LifecycleScope;
+import com.carrotsearch.randomizedtesting.RandomizedTest;
+import org.apache.lucene.store.*;
+
+import java.io.IOException;
+import java.nio.file.Files;
+import java.nio.file.Path;
+
+import static org.apache.lucene.misc.store.DirectIODirectory.DEFAULT_MIN_BYTES_DIRECT;
+
+public class TestDirectIODirectory extends BaseDirectoryTestCase {
+  public void testWriteReadWithDirectIO() throws IOException {
+    try (Directory dir = getDirectory(RandomizedTest.newTempDir(LifecycleScope.TEST))) {
+      final long blockSize = Files.getFileStore(createTempFile()).getBlockSize();
+      final long minBytesDirect = Double.valueOf(Math.ceil(DEFAULT_MIN_BYTES_DIRECT / blockSize)).longValue() * blockSize;
+      // Need to worry about overflows here?
+      final int writtenByteLength = Math.toIntExact(minBytesDirect);
+
+      MergeInfo mergeInfo = new MergeInfo(1000, Integer.MAX_VALUE, true, 1);
+      final IOContext context = new IOContext(mergeInfo);
+
+      IndexOutput indexOutput = dir.createOutput("test", context);
+      indexOutput.writeBytes(new byte[writtenByteLength], 0, writtenByteLength);
+      IndexInput indexInput = dir.openInput("test", context);
+
+      assertEquals("The length of bytes read should equal to written", writtenByteLength, indexInput.length());
+
+      indexOutput.close();
+      indexInput.close();
+    }
+  }
+
+  @Override
+  protected Directory getDirectory(Path path) throws IOException {
+    Directory delegate = FSDirectory.open(path);

Review comment:
   I verified: `BaseDirectoryTestCase` uses 
[newIOContext(Random)](https://lucene.apache.org/core/8_7_0/test-framework/org/apache/lucene/util/LuceneTestCase.html#newIOContext-java.util.Random-).
 So we have the correct randomization and our directory is used from time to 
time.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] uschindler commented on a change in pull request #2052: LUCENE-8982: Make NativeUnixDirectory pure java with FileChannel direct IO flag, and rename to DirectIODirectory

2020-11-18 Thread GitBox


uschindler commented on a change in pull request #2052:
URL: https://github.com/apache/lucene-solr/pull/2052#discussion_r525912478



##
File path: 
lucene/misc/src/test/org/apache/lucene/misc/store/TestDirectIODirectory.java
##
@@ -0,0 +1,57 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.misc.store;
+
+import com.carrotsearch.randomizedtesting.LifecycleScope;
+import com.carrotsearch.randomizedtesting.RandomizedTest;
+import org.apache.lucene.store.*;
+
+import java.io.IOException;
+import java.nio.file.Files;
+import java.nio.file.Path;
+
+import static 
org.apache.lucene.misc.store.DirectIODirectory.DEFAULT_MIN_BYTES_DIRECT;
+
+public class TestDirectIODirectory extends BaseDirectoryTestCase {
+  public void testWriteReadWithDirectIO() throws IOException {
+try(Directory dir = 
getDirectory(RandomizedTest.newTempDir(LifecycleScope.TEST))) {
+  final long blockSize = 
Files.getFileStore(createTempFile()).getBlockSize();
+  final long minBytesDirect = 
Double.valueOf(Math.ceil(DEFAULT_MIN_BYTES_DIRECT / blockSize)).longValue() *
+blockSize;
+  // Need to worry about overflows here?
+  final int writtenByteLength = Math.toIntExact(minBytesDirect);
+
+  MergeInfo mergeInfo = new MergeInfo(1000, Integer.MAX_VALUE, true, 1);
+  final IOContext context = new IOContext(mergeInfo);
+
+  IndexOutput indexOutput = dir.createOutput("test", context);
+  indexOutput.writeBytes(new byte[writtenByteLength], 0, 
writtenByteLength);
+  IndexInput indexInput = dir.openInput("test", context);
+
+  assertEquals("The length of bytes read should equal to written", 
writtenByteLength, indexInput.length());
+
+  indexOutput.close();
+  indexInput.close();
+}
+  }
+
+  @Override
+  protected Directory getDirectory(Path path) throws IOException {

Review comment:
   Thanks, great. We have to check if the base class also sometimes sends 
the correct IOContexts, so our directory is triggered. Because the wrong 
IOContext will cause everything to be delegated to the underlying FSDirectory.

[GitHub] [lucene-solr] uschindler commented on a change in pull request #2052: LUCENE-8982: Make NativeUnixDirectory pure java with FileChannel direct IO flag, and rename to DirectIODirectory

2020-11-18 Thread GitBox


uschindler commented on a change in pull request #2052:
URL: https://github.com/apache/lucene-solr/pull/2052#discussion_r525910929



##
File path: 
lucene/misc/src/test/org/apache/lucene/misc/store/TestDirectIODirectory.java
##
@@ -0,0 +1,57 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.misc.store;
+
+import com.carrotsearch.randomizedtesting.LifecycleScope;
+import com.carrotsearch.randomizedtesting.RandomizedTest;
+import org.apache.lucene.store.*;
+
+import java.io.IOException;
+import java.nio.file.Files;
+import java.nio.file.Path;
+
+import static 
org.apache.lucene.misc.store.DirectIODirectory.DEFAULT_MIN_BYTES_DIRECT;
+
+public class TestDirectIODirectory extends BaseDirectoryTestCase {

Review comment:
   That's because of the new tests in it; it looks too different from the 
forked file, so it's 2 different files now. But this really depends on the Git 
version that is used.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] uschindler commented on a change in pull request #2052: LUCENE-8982: Make NativeUnixDirectory pure java with FileChannel direct IO flag, and rename to DirectIODirectory

2020-11-18 Thread GitBox


uschindler commented on a change in pull request #2052:
URL: https://github.com/apache/lucene-solr/pull/2052#discussion_r525909321



##
File path: 
lucene/misc/src/java/org/apache/lucene/misc/store/DirectIODirectory.java
##
@@ -74,12 +65,12 @@
  *
  * @lucene.experimental
  */
-public class NativeUnixDirectory extends FSDirectory {

Review comment:
   You are right, WindowsDirectory is unrelated to direct IO, so let's 
discuss this in a separate issue! The background is here: 
https://issues.apache.org/jira/browse/LUCENE-2791. The issue behind that is the 
following: https://bugs.openjdk.java.net/browse/JDK-6265734





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] janhoy commented on pull request #1769: SOLR-14789: Absorb the docker-solr repo.

2020-11-18 Thread GitBox


janhoy commented on pull request #1769:
URL: https://github.com/apache/lucene-solr/pull/1769#issuecomment-729525514


   > what do you imagine `verify-docker.sh` would be doing under the hood
   
   It would verify the sha512 sum of each jar with the corresponding checksum 
published by the project, and also allow the user to verify the PGP signature 
of each jar to assert it was signed by a committer.
   
   > Wouldn't it be simpler for the release manager to build the docker image, 
examine the sha256 hash of the image, and publish that to the download 
location, making it official
   
   That's a great idea David, that the RM records the image SHA when pushing, 
and publishes that sha. People can then either pull with SHA or verify later 
with `docker images --digests solr`. That should be sufficient for most users. 
For those who want to further assert that they use the very same binaries signed 
by the committer, we could still document how to perform PGP checks on the jars.
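
   For the checksum half, per-jar verification is straightforward; a minimal
   sketch, assuming the published `.sha512` file's hex digest has already been
   read into a string:

   ```java
   import java.nio.file.Files;
   import java.nio.file.Path;
   import java.security.MessageDigest;

   // Sketch: compare a jar's SHA-512 against the project's published checksum.
   final class JarChecksum {
     static boolean matches(Path jar, String publishedSha512Hex) throws Exception {
       byte[] digest = MessageDigest.getInstance("SHA-512")
           .digest(Files.readAllBytes(jar));
       StringBuilder hex = new StringBuilder(digest.length * 2);
       for (byte b : digest) {
         hex.append(String.format("%02x", b));
       }
       return hex.toString().equalsIgnoreCase(publishedSha512Hex.trim());
     }
   }
   ```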



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] zacharymorn commented on pull request #2052: LUCENE-8982: Make NativeUnixDirectory pure java with FileChannel direct IO flag, and rename to DirectIODirectory

2020-11-18 Thread GitBox


zacharymorn commented on pull request #2052:
URL: https://github.com/apache/lucene-solr/pull/2052#issuecomment-729520080


   > I stated a bit earlier, removing WindowsDirectory should be a separate PR. 
It's not the same thing (it's not about DirectIO); it works around a problem 
with positional reads. We should review this again and decide later. Maybe open 
an issue.
   > 
   > What's good with this new PR: it allows using direct IO on Windows as well.
   
   Sounds good. I've reverted the `WindowsDirectory` removal commit, and will 
create a follow-up Jira ticket on this.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] uschindler commented on pull request #2052: LUCENE-8982: Make NativeUnixDirectory pure java with FileChannel direct IO flag, and rename to DirectIODirectory

2020-11-18 Thread GitBox


uschindler commented on pull request #2052:
URL: https://github.com/apache/lucene-solr/pull/2052#issuecomment-729516294


   I stated a bit earlier, removing WindowsDirectory should be a separate PR. 
It's not the same thing (it's not about DirectIO); it works around a problem 
with positional reads. We should review this again and decide later. Maybe open 
an issue.
   
   What's good with this new PR: it allows using direct IO on Windows as well.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org