[JENKINS] Lucene-Solr-7.x-Linux (32bit/jdk1.8.0_172) - Build # 3374 - Failure!

2019-01-08 Thread Policeman Jenkins Server
Build: https://jenkins.thetaphi.de/job/Lucene-Solr-7.x-Linux/3374/
Java: 32bit/jdk1.8.0_172 -server -XX:+UseParallelGC

All tests passed

Build Log:
[...truncated 59601 lines...]
-ecj-javadoc-lint-src:
[mkdir] Created dir: /tmp/ecj144715031
 [ecj-lint] Compiling 1235 source files to /tmp/ecj144715031
 [ecj-lint] Processing annotations
 [ecj-lint] Annotations processed
 [ecj-lint] Processing annotations
 [ecj-lint] No elements to process
 [ecj-lint] invalid Class-Path header in manifest of jar file: 
/home/jenkins/.ivy2/cache/org.restlet.jee/org.restlet/jars/org.restlet-2.3.0.jar
 [ecj-lint] invalid Class-Path header in manifest of jar file: 
/home/jenkins/.ivy2/cache/org.restlet.jee/org.restlet.ext.servlet/jars/org.restlet.ext.servlet-2.3.0.jar
 [ecj-lint] --
 [ecj-lint] 1. WARNING in /home/jenkins/workspace/Lucene-Solr-7.x-Linux/solr/core/src/java/org/apache/solr/client/solrj/embedded/EmbeddedSolrServer.java (at line 219)
 [ecj-lint] return (NamedList) new JavaBinCodec(resolver).unmarshal(in);
 [ecj-lint] ^^
 [ecj-lint] Resource leak: '' is never closed
 [ecj-lint] --
 [ecj-lint] --
 [ecj-lint] 2. WARNING in /home/jenkins/workspace/Lucene-Solr-7.x-Linux/solr/core/src/java/org/apache/solr/cloud/api/collections/RestoreCmd.java (at line 257)
 [ecj-lint] throw new SolrException(ErrorCode.BAD_REQUEST, "Unexpected number of replicas, replicationFactor, " +
 [ecj-lint]     Replica.Type.NRT + " or " + Replica.Type.TLOG + " must be greater than 0");
 [ecj-lint] ^^^
 [ecj-lint] Resource leak: 'repository' is not closed at this location
 [ecj-lint] --
 [ecj-lint] --
 [ecj-lint] 3. WARNING in /home/jenkins/workspace/Lucene-Solr-7.x-Linux/solr/core/src/java/org/apache/solr/handler/loader/JavabinLoader.java (at line 137)
 [ecj-lint] new JavaBinCodec() {
 [ecj-lint]   SolrParams params;
 [ecj-lint]   AddUpdateCommand addCmd = null;
 [ecj-lint] 
 [ecj-lint]   @Override
 [ecj-lint]   public List readIterator(DataInputInputStream fis) throws IOException {
 [ecj-lint] while (true) {
 [ecj-lint]   Object o = readVal(fis);
 [ecj-lint]   if (o == END_OBJ) break;
 [ecj-lint]   if (o instanceof NamedList) {
 [ecj-lint] params = ((NamedList) o).toSolrParams();
 [ecj-lint]   } else {
 [ecj-lint] try {
 [ecj-lint]   if (o instanceof byte[]) {
 [ecj-lint] if (params != null) req.setParams(params);
 [ecj-lint] byte[] buf = (byte[]) o;
 [ecj-lint] contentStreamLoader.load(req, rsp, new ContentStreamBase.ByteArrayStream(buf, null), processor);
 [ecj-lint]   } else {
 [ecj-lint] throw new RuntimeException("unsupported type ");
 [ecj-lint]   }
 [ecj-lint] } catch (Exception e) {
 [ecj-lint]   throw new RuntimeException(e);
 [ecj-lint] } finally {
 [ecj-lint]   params = null;
 [ecj-lint]   req.setParams(old);
 [ecj-lint] }
 [ecj-lint]   }
 [ecj-lint] }
 [ecj-lint] return Collections.emptyList();
 [ecj-lint]   }
 [ecj-lint] 
 [ecj-lint] }.unmarshal(in);
 [ecj-lint] ^^
 [ecj-lint] Resource leak: '' is never closed
 [ecj-lint] --
 [ecj-lint] --
 [ecj-lint] 4. ERROR in /home/jenkins/workspace/Lucene-Solr-7.x-Linux/solr/core/src/java/org/apache/solr/update/DocumentBuilder.java (at line 32)
 [ecj-lint] import org.apache.solr.common.util.ByteArrayUtf8CharSequence;
 [ecj-lint] ^
 [ecj-lint] The import org.apache.solr.common.util.ByteArrayUtf8CharSequence 
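
The three resource-leak warnings above all flag JavaBinCodec instances that are never closed. A minimal sketch of a try-with-resources fix for the EmbeddedSolrServer case, assuming only what the warning itself implies (JavaBinCodec is Closeable on this branch); this is not a patch from this thread:

{code:java}
import java.io.IOException;
import java.io.InputStream;

import org.apache.solr.common.util.JavaBinCodec;
import org.apache.solr.common.util.JavaBinCodec.ObjectResolver;
import org.apache.solr.common.util.NamedList;

class CodecCloseSketch {
  // Closing the codec when done is what would silence the ecj warning.
  @SuppressWarnings({"unchecked", "rawtypes"})
  static NamedList readResponse(ObjectResolver resolver, InputStream in) throws IOException {
    try (JavaBinCodec codec = new JavaBinCodec(resolver)) {
      return (NamedList) codec.unmarshal(in);
    }
  }
}
{code}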

[JENKINS] Lucene-Solr-7.x-Windows (64bit/jdk1.8.0_172) - Build # 949 - Failure!

2019-01-08 Thread Policeman Jenkins Server
Build: https://jenkins.thetaphi.de/job/Lucene-Solr-7.x-Windows/949/
Java: 64bit/jdk1.8.0_172 -XX:-UseCompressedOops -XX:+UseG1GC

1 tests failed.
FAILED:  org.apache.solr.handler.TestSQLHandler.doTest

Error Message:


Stack Trace:
java.lang.AssertionError
    at __randomizedtesting.SeedInfo.seed([6D2A49D52AAA9387:CA6EF1714711803E]:0)
    at org.apache.solr.handler.TestSQLHandler.testBasicSelect(TestSQLHandler.java:236)
    at org.apache.solr.handler.TestSQLHandler.doTest(TestSQLHandler.java:93)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1750)
    at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:938)
    at com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:974)
    at com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:988)
    at org.apache.solr.BaseDistributedSearchTestCase$ShardsRepeatRule$ShardsFixedStatement.callStatement(BaseDistributedSearchTestCase.java:1063)
    at org.apache.solr.BaseDistributedSearchTestCase$ShardsRepeatRule$ShardsStatement.evaluate(BaseDistributedSearchTestCase.java:1035)
    at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57)
    at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:49)
    at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
    at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48)
    at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64)
    at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47)
    at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
    at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:368)
    at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:817)
    at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:468)
    at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:947)
    at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:832)
    at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:883)
    at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:894)
    at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
    at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57)
    at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
    at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
    at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:41)
    at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
    at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
    at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
    at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
    at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
    at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
    at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47)
    at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64)
    at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:54)
    at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
    at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:368)
    at java.lang.Thread.run(Thread.java:748)




Build Log:
[...truncated 13929 lines...]
   [junit4] Suite: org.apache.solr.handler.TestSQLHandler
   [junit4]   

[JENKINS] Lucene-Solr-master-Linux (32bit/jdk1.8.0_172) - Build # 23494 - Failure!

2019-01-08 Thread Policeman Jenkins Server
Build: https://jenkins.thetaphi.de/job/Lucene-Solr-master-Linux/23494/
Java: 32bit/jdk1.8.0_172 -client -XX:+UseConcMarkSweepGC

All tests passed

Build Log:
[...truncated 59675 lines...]
-ecj-javadoc-lint-src:
[mkdir] Created dir: /tmp/ecj245164612
 [ecj-lint] Compiling 1241 source files to /tmp/ecj245164612
 [ecj-lint] Processing annotations
 [ecj-lint] Annotations processed
 [ecj-lint] Processing annotations
 [ecj-lint] No elements to process
 [ecj-lint] invalid Class-Path header in manifest of jar file: 
/home/jenkins/.ivy2/cache/org.restlet.jee/org.restlet/jars/org.restlet-2.3.0.jar
 [ecj-lint] invalid Class-Path header in manifest of jar file: 
/home/jenkins/.ivy2/cache/org.restlet.jee/org.restlet.ext.servlet/jars/org.restlet.ext.servlet-2.3.0.jar
 [ecj-lint] --
 [ecj-lint] 1. WARNING in /home/jenkins/workspace/Lucene-Solr-master-Linux/solr/core/src/java/org/apache/solr/client/solrj/embedded/EmbeddedSolrServer.java (at line 219)
 [ecj-lint] return (NamedList) new JavaBinCodec(resolver).unmarshal(in);
 [ecj-lint] ^^
 [ecj-lint] Resource leak: '' is never closed
 [ecj-lint] --
 [ecj-lint] --
 [ecj-lint] 2. WARNING in /home/jenkins/workspace/Lucene-Solr-master-Linux/solr/core/src/java/org/apache/solr/cloud/api/collections/RestoreCmd.java (at line 257)
 [ecj-lint] throw new SolrException(ErrorCode.BAD_REQUEST, "Unexpected number of replicas, replicationFactor, " +
 [ecj-lint]     Replica.Type.NRT + " or " + Replica.Type.TLOG + " must be greater than 0");
 [ecj-lint] ^^^
 [ecj-lint] Resource leak: 'repository' is not closed at this location
 [ecj-lint] --
 [ecj-lint] --
 [ecj-lint] 3. WARNING in /home/jenkins/workspace/Lucene-Solr-master-Linux/solr/core/src/java/org/apache/solr/handler/loader/JavabinLoader.java (at line 137)
 [ecj-lint] new JavaBinCodec() {
 [ecj-lint]   SolrParams params;
 [ecj-lint]   AddUpdateCommand addCmd = null;
 [ecj-lint] 
 [ecj-lint]   @Override
 [ecj-lint]   public List readIterator(DataInputInputStream fis) throws IOException {
 [ecj-lint] while (true) {
 [ecj-lint]   Object o = readVal(fis);
 [ecj-lint]   if (o == END_OBJ) break;
 [ecj-lint]   if (o instanceof NamedList) {
 [ecj-lint] params = ((NamedList) o).toSolrParams();
 [ecj-lint]   } else {
 [ecj-lint] try {
 [ecj-lint]   if (o instanceof byte[]) {
 [ecj-lint] if (params != null) req.setParams(params);
 [ecj-lint] byte[] buf = (byte[]) o;
 [ecj-lint] contentStreamLoader.load(req, rsp, new ContentStreamBase.ByteArrayStream(buf, null), processor);
 [ecj-lint]   } else {
 [ecj-lint] throw new RuntimeException("unsupported type ");
 [ecj-lint]   }
 [ecj-lint] } catch (Exception e) {
 [ecj-lint]   throw new RuntimeException(e);
 [ecj-lint] } finally {
 [ecj-lint]   params = null;
 [ecj-lint]   req.setParams(old);
 [ecj-lint] }
 [ecj-lint]   }
 [ecj-lint] }
 [ecj-lint] return Collections.emptyList();
 [ecj-lint]   }
 [ecj-lint] 
 [ecj-lint] }.unmarshal(in);
 [ecj-lint] ^^
 [ecj-lint] Resource leak: '' is never closed
 [ecj-lint] --
 [ecj-lint] --
 [ecj-lint] 4. ERROR in /home/jenkins/workspace/Lucene-Solr-master-Linux/solr/core/src/java/org/apache/solr/update/DocumentBuilder.java (at line 32)
 [ecj-lint] import org.apache.solr.common.util.ByteArrayUtf8CharSequence;
 [ecj-lint] ^
 [ecj-lint] The import 

Re: IndexOptions without frequencies

2019-01-08 Thread Adrien Grand
Hi Nitin,

We need to encode frequencies somehow in order to know how many
positions a posting has, so we couldn't really optimize storage for
that use-case. Besides, I suspect that frequencies are only a tiny
part of the disk footprint of an inverted index when positions are
enabled.
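
For reference, a minimal Lucene sketch of the closest supported configuration: positions require frequencies, so there is no DOCS_AND_POSITIONS constant, and phrase queries need DOCS_AND_FREQS_AND_POSITIONS:

{code:java}
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.FieldType;
import org.apache.lucene.index.IndexOptions;

public class PhraseFieldSketch {
  public static void main(String[] args) {
    // Index docs, freqs and positions: the minimum that supports PhraseQuery.
    FieldType type = new FieldType();
    type.setIndexOptions(IndexOptions.DOCS_AND_FREQS_AND_POSITIONS);
    type.setTokenized(true);
    type.freeze();

    Document doc = new Document();
    doc.add(new Field("body", "gold brown", type));
    System.out.println(doc);
  }
}
{code}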

On Tue, Jan 8, 2019 at 8:59 PM Nitin Goyal  wrote:
>
> Hi All,
>
> Does it make sense to have another option for IndexOptions, i.e. 
> DOCS_AND_POSITIONS?
> Our use case is that we are looking to decrease our storage; we don't care 
> about frequencies, but we do need positions for phrase queries.
>
> https://lucene.apache.org/core/7_1_0/core/org/apache/lucene/index/IndexOptions.html
>
> --
>
> Nitin Goyal
>
>


-- 
Adrien

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[JENKINS] Lucene-Solr-SmokeRelease-7.x - Build # 423 - Failure

2019-01-08 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Lucene-Solr-SmokeRelease-7.x/423/

No tests ran.

Build Log:
[...truncated 23484 lines...]
[asciidoctor:convert] asciidoctor: ERROR: about-this-guide.adoc: line 1: invalid part, must have at least one section (e.g., chapter, appendix, etc.)
[asciidoctor:convert] asciidoctor: ERROR: solr-glossary.adoc: line 1: invalid part, must have at least one section (e.g., chapter, appendix, etc.)
 [java] Processed 2458 links (2009 relative) to 3221 anchors in 247 files
 [echo] Validated Links & Anchors via: 
/x1/jenkins/jenkins-slave/workspace/Lucene-Solr-SmokeRelease-7.x/solr/build/solr-ref-guide/bare-bones-html/

-dist-changes:
 [copy] Copying 4 files to 
/x1/jenkins/jenkins-slave/workspace/Lucene-Solr-SmokeRelease-7.x/solr/package/changes

package:

-unpack-solr-tgz:

-ensure-solr-tgz-exists:
[mkdir] Created dir: 
/x1/jenkins/jenkins-slave/workspace/Lucene-Solr-SmokeRelease-7.x/solr/build/solr.tgz.unpacked
[untar] Expanding: 
/x1/jenkins/jenkins-slave/workspace/Lucene-Solr-SmokeRelease-7.x/solr/package/solr-7.7.0.tgz
 into 
/x1/jenkins/jenkins-slave/workspace/Lucene-Solr-SmokeRelease-7.x/solr/build/solr.tgz.unpacked

generate-maven-artifacts:

resolve:

resolve:

ivy-availability-check:
[loadresource] Do not set property disallowed.ivy.jars.list as its length is 0.

-ivy-fail-disallowed-ivy-version:

ivy-fail:

ivy-configure:
[ivy:configure] :: loading settings :: file = 
/x1/jenkins/jenkins-slave/workspace/Lucene-Solr-SmokeRelease-7.x/lucene/top-level-ivy-settings.xml

resolve:

ivy-availability-check:
[loadresource] Do not set property disallowed.ivy.jars.list as its length is 0.

-ivy-fail-disallowed-ivy-version:

ivy-fail:

ivy-configure:
[ivy:configure] :: loading settings :: file = 
/x1/jenkins/jenkins-slave/workspace/Lucene-Solr-SmokeRelease-7.x/lucene/top-level-ivy-settings.xml

resolve:

resolve:

ivy-availability-check:
[loadresource] Do not set property disallowed.ivy.jars.list as its length is 0.

-ivy-fail-disallowed-ivy-version:

ivy-fail:

ivy-configure:
[ivy:configure] :: loading settings :: file = 
/x1/jenkins/jenkins-slave/workspace/Lucene-Solr-SmokeRelease-7.x/lucene/top-level-ivy-settings.xml

resolve:

ivy-availability-check:
[loadresource] Do not set property disallowed.ivy.jars.list as its length is 0.

-ivy-fail-disallowed-ivy-version:

ivy-fail:

ivy-configure:
[ivy:configure] :: loading settings :: file = 
/x1/jenkins/jenkins-slave/workspace/Lucene-Solr-SmokeRelease-7.x/lucene/top-level-ivy-settings.xml

resolve:

ivy-availability-check:
[loadresource] Do not set property disallowed.ivy.jars.list as its length is 0.

-ivy-fail-disallowed-ivy-version:

ivy-fail:

ivy-configure:
[ivy:configure] :: loading settings :: file = 
/x1/jenkins/jenkins-slave/workspace/Lucene-Solr-SmokeRelease-7.x/lucene/top-level-ivy-settings.xml

resolve:

ivy-availability-check:
[loadresource] Do not set property disallowed.ivy.jars.list as its length is 0.

-ivy-fail-disallowed-ivy-version:

ivy-fail:

ivy-configure:
[ivy:configure] :: loading settings :: file = 
/x1/jenkins/jenkins-slave/workspace/Lucene-Solr-SmokeRelease-7.x/lucene/top-level-ivy-settings.xml

resolve:

ivy-availability-check:
[loadresource] Do not set property disallowed.ivy.jars.list as its length is 0.

-ivy-fail-disallowed-ivy-version:

ivy-fail:

ivy-configure:
[ivy:configure] :: loading settings :: file = 
/x1/jenkins/jenkins-slave/workspace/Lucene-Solr-SmokeRelease-7.x/lucene/top-level-ivy-settings.xml

resolve:

ivy-availability-check:
[loadresource] Do not set property disallowed.ivy.jars.list as its length is 0.

-ivy-fail-disallowed-ivy-version:

ivy-fail:

ivy-configure:
[ivy:configure] :: loading settings :: file = 
/x1/jenkins/jenkins-slave/workspace/Lucene-Solr-SmokeRelease-7.x/lucene/top-level-ivy-settings.xml

resolve:

ivy-availability-check:
[loadresource] Do not set property disallowed.ivy.jars.list as its length is 0.

-ivy-fail-disallowed-ivy-version:

ivy-fail:

ivy-configure:
[ivy:configure] :: loading settings :: file = 
/x1/jenkins/jenkins-slave/workspace/Lucene-Solr-SmokeRelease-7.x/lucene/top-level-ivy-settings.xml

resolve:

ivy-availability-check:
[loadresource] Do not set property disallowed.ivy.jars.list as its length is 0.

-ivy-fail-disallowed-ivy-version:

ivy-fail:

ivy-configure:
[ivy:configure] :: loading settings :: file = 
/x1/jenkins/jenkins-slave/workspace/Lucene-Solr-SmokeRelease-7.x/lucene/top-level-ivy-settings.xml

resolve:

ivy-availability-check:
[loadresource] Do not set property disallowed.ivy.jars.list as its length is 0.

-ivy-fail-disallowed-ivy-version:

ivy-fail:

ivy-configure:
[ivy:configure] :: loading settings :: file = 
/x1/jenkins/jenkins-slave/workspace/Lucene-Solr-SmokeRelease-7.x/lucene/top-level-ivy-settings.xml

resolve:

ivy-availability-check:
[loadresource] Do not set property disallowed.ivy.jars.list as its length is 0.

-ivy-fail-disallowed-ivy-version:

ivy-fail:

ivy-configure:
[ivy:configure] :: loading settings 

[JENKINS] Lucene-Solr-master-Windows (32bit/jdk1.8.0_172) - Build # 7685 - Failure!

2019-01-08 Thread Policeman Jenkins Server
Build: https://jenkins.thetaphi.de/job/Lucene-Solr-master-Windows/7685/
Java: 32bit/jdk1.8.0_172 -client -XX:+UseSerialGC

All tests passed

Build Log:
[...truncated 59650 lines...]
-ecj-javadoc-lint-src:
[mkdir] Created dir: C:\Users\jenkins\AppData\Local\Temp\ecj373449379
 [ecj-lint] Compiling 1241 source files to 
C:\Users\jenkins\AppData\Local\Temp\ecj373449379
 [ecj-lint] Processing annotations
 [ecj-lint] Annotations processed
 [ecj-lint] Processing annotations
 [ecj-lint] No elements to process
 [ecj-lint] invalid Class-Path header in manifest of jar file: 
C:\Users\jenkins\workspace\Lucene-Solr-master-Windows\solr\core\lib\org.restlet-2.3.0.jar
 [ecj-lint] invalid Class-Path header in manifest of jar file: 
C:\Users\jenkins\workspace\Lucene-Solr-master-Windows\solr\core\lib\org.restlet.ext.servlet-2.3.0.jar
 [ecj-lint] --
 [ecj-lint] 1. WARNING in C:\Users\jenkins\workspace\Lucene-Solr-master-Windows\solr\core\src\java\org\apache\solr\client\solrj\embedded\EmbeddedSolrServer.java (at line 219)
 [ecj-lint] return (NamedList) new JavaBinCodec(resolver).unmarshal(in);
 [ecj-lint] ^^
 [ecj-lint] Resource leak: '' is never closed
 [ecj-lint] --
 [ecj-lint] --
 [ecj-lint] 2. WARNING in C:\Users\jenkins\workspace\Lucene-Solr-master-Windows\solr\core\src\java\org\apache\solr\cloud\api\collections\RestoreCmd.java (at line 257)
 [ecj-lint] throw new SolrException(ErrorCode.BAD_REQUEST, "Unexpected number of replicas, replicationFactor, " +
 [ecj-lint]     Replica.Type.NRT + " or " + Replica.Type.TLOG + " must be greater than 0");
 [ecj-lint] ^^^
 [ecj-lint] Resource leak: 'repository' is not closed at this location
 [ecj-lint] --
 [ecj-lint] --
 [ecj-lint] 3. WARNING in C:\Users\jenkins\workspace\Lucene-Solr-master-Windows\solr\core\src\java\org\apache\solr\handler\loader\JavabinLoader.java (at line 137)
 [ecj-lint] new JavaBinCodec() {
 [ecj-lint]   SolrParams params;
 [ecj-lint]   AddUpdateCommand addCmd = null;
 [ecj-lint] 
 [ecj-lint]   @Override
 [ecj-lint]   public List readIterator(DataInputInputStream fis) throws IOException {
 [ecj-lint] while (true) {
 [ecj-lint]   Object o = readVal(fis);
 [ecj-lint]   if (o == END_OBJ) break;
 [ecj-lint]   if (o instanceof NamedList) {
 [ecj-lint] params = ((NamedList) o).toSolrParams();
 [ecj-lint]   } else {
 [ecj-lint] try {
 [ecj-lint]   if (o instanceof byte[]) {
 [ecj-lint] if (params != null) req.setParams(params);
 [ecj-lint] byte[] buf = (byte[]) o;
 [ecj-lint] contentStreamLoader.load(req, rsp, new ContentStreamBase.ByteArrayStream(buf, null), processor);
 [ecj-lint]   } else {
 [ecj-lint] throw new RuntimeException("unsupported type ");
 [ecj-lint]   }
 [ecj-lint] } catch (Exception e) {
 [ecj-lint]   throw new RuntimeException(e);
 [ecj-lint] } finally {
 [ecj-lint]   params = null;
 [ecj-lint]   req.setParams(old);
 [ecj-lint] }
 [ecj-lint]   }
 [ecj-lint] }
 [ecj-lint] return Collections.emptyList();
 [ecj-lint]   }
 [ecj-lint] 
 [ecj-lint] }.unmarshal(in);
 [ecj-lint] ^^
 [ecj-lint] Resource leak: '' is never closed
 [ecj-lint] --
 [ecj-lint] --
 [ecj-lint] 4. ERROR in C:\Users\jenkins\workspace\Lucene-Solr-master-Windows\solr\core\src\java\org\apache\solr\update\DocumentBuilder.java (at line 32)
 [ecj-lint] import org.apache.solr.common.util.ByteArrayUtf8CharSequence;
 [ecj-lint] 

IndexOptions without frequencies

2019-01-08 Thread Nitin Goyal
Hi All,

Does it make sense to have another option for IndexOptions, i.e.
DOCS_AND_POSITIONS?

Our use case is that we are looking to decrease our storage; we don't
care about frequencies, but we do need positions for phrase queries.

https://lucene.apache.org/core/7_1_0/core/org/apache/lucene/index/IndexOptions.html

-- 

Nitin Goyal


[GitHub] lucene-solr issue #531: SOLR-12768

2019-01-08 Thread dsmiley
Github user dsmiley commented on the issue:

https://github.com/apache/lucene-solr/pull/531
  
Code changes seem good; I'll look closer and run tests.

PathHierarchyTokenizerFactory _might_ eventually be used at query time if 
we had a syntax to say "get me all my ancestors". But since that is conditional 
on the syntax, it can't easily go into the query analyzer, so I think it's better 
to keep the query-time chain simple/direct and do this work when looking at 
the syntax. I know this has evolved recently from where we were. In this 
issue I removed PathHierarchyTokenizer from the index side because I feel 
it's better to err on the side of lighter-weight indexing at the expense of 
slower queries, since the cost is very likely low for typical use-cases.

RE an issue to discuss the syntax: yes, a subtask of SOLR-12298. No hurry on 
that one.
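
For context, a small self-contained sketch (illustrative only, not part of this PR) of what PathHierarchyTokenizer emits with its default '/' delimiter, i.e. every ancestor path plus the full path:

{code:java}
import java.io.StringReader;

import org.apache.lucene.analysis.path.PathHierarchyTokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public class AncestorsSketch {
  public static void main(String[] args) throws Exception {
    try (PathHierarchyTokenizer tok = new PathHierarchyTokenizer()) {
      tok.setReader(new StringReader("/a/b/c"));
      CharTermAttribute term = tok.addAttribute(CharTermAttribute.class);
      tok.reset();
      while (tok.incrementToken()) {
        System.out.println(term);   // prints /a, /a/b, /a/b/c
      }
      tok.end();
    }
  }
}
{code}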


---

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-12888) NestedUpdateProcessor code should activate automatically in 8.0

2019-01-08 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16737828#comment-16737828
 ] 

ASF subversion and git services commented on SOLR-12888:


Commit e5c7bb4ddfa1344042b36ef5d5744e8fb6a0d0ab in lucene-solr's branch 
refs/heads/branch_8x from David Wayne Smiley
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=e5c7bb4 ]

SOLR-12888: Run URP now auto-registers NestedUpdateProcessor before it.

(cherry picked from commit df119573dbc5781b2eed357821856b44bd7af5fd)


> NestedUpdateProcessor code should activate automatically in 8.0
> ---
>
> Key: SOLR-12888
> URL: https://issues.apache.org/jira/browse/SOLR-12888
> Project: Solr
>  Issue Type: Sub-task
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: David Smiley
>Assignee: David Smiley
>Priority: Blocker
> Fix For: 8.0
>
> Attachments: SOLR-12888.patch
>
>
> If the schema supports it, the NestedUpdateProcessor URP should be registered 
> automatically somehow.  The Factory for this already looks for the existence 
> of certain special fields in the schema, so that's good.  But the URP Factory 
> needs to be added to your chain in any of the ways we support that.  _In 8.0 
> the user shouldn't have to do anything to their solrconfig._  
> We might un-URP this and call directly somewhere.  Or perhaps we might add a 
> special named URP chain (needn't document), defined automatically, that 
> activates at RunURP.  Perhaps other things could be added to this in the 
> future.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-12888) NestedUpdateProcessor code should activate automatically in 8.0

2019-01-08 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16737827#comment-16737827
 ] 

ASF subversion and git services commented on SOLR-12888:


Commit df119573dbc5781b2eed357821856b44bd7af5fd in lucene-solr's branch 
refs/heads/master from David Wayne Smiley
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=df11957 ]

SOLR-12888: Run URP now auto-registers NestedUpdateProcessor before it.


> NestedUpdateProcessor code should activate automatically in 8.0
> ---
>
> Key: SOLR-12888
> URL: https://issues.apache.org/jira/browse/SOLR-12888
> Project: Solr
>  Issue Type: Sub-task
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: David Smiley
>Assignee: David Smiley
>Priority: Blocker
> Fix For: 8.0
>
> Attachments: SOLR-12888.patch
>
>
> If the schema supports it, the NestedUpdateProcessor URP should be registered 
> automatically somehow.  The Factory for this already looks for the existence 
> of certain special fields in the schema, so that's good.  But the URP Factory 
> needs to be added to your chain in any of the ways we support that.  _In 8.0 
> the user shouldn't have to do anything to their solrconfig._  
> We might un-URP this and call directly somewhere.  Or perhaps we might add a 
> special named URP chain (needn't document), defined automatically, that 
> activates at RunURP.  Perhaps other things could be added to this in the 
> future.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



BadApples

2019-01-08 Thread Erick Erickson
It's looking much better: only a few tests repeatedly fail, and most
of them fail a small percentage of the time.
Here's the short form; the full list is attached:
DO NOT ENABLE LIST:
MoveReplicaHDFSTest.testFailedMove
MoveReplicaHDFSTest.testNormalFailedMove
TestControlledRealTimeReopenThread.testCRTReopen
TestICUNormalizer2CharFilter.testRandomStrings
TestICUTokenizerCJK
TestImpersonationWithHadoopAuth.testForwarding
TestLTRReRankingPipeline.testDifferentTopN
TestRandomChains


DO NOT ANNOTATE LIST
CdcrBidirectionalTest.testBiDir
IndexSizeTriggerTest.testMergeIntegration
IndexSizeTriggerTest.testMixedBounds
IndexSizeTriggerTest.testSplitIntegration
IndexSizeTriggerTest.testTrigger
InfixSuggestersTest.testShutdownDuringBuild
ShardSplitTest.test
ShardSplitTest.testSplitMixedReplicaTypes
ShardSplitTest.testSplitWithChaosMonkey
TestLatLonShapeQueries.testRandomBig
TestRandomChains.testRandomChainsWithLargeStrings
TestTriggerIntegration.testSearchRate

Processing file (History bit 3): HOSS-2019-01-08.csv
Processing file (History bit 2): HOSS-2018-12-31.csv
Processing file (History bit 1): HOSS-2018-12-24.csv
Processing file (History bit 0): HOSS-2018-12-17.csv


**Annotated tests that didn't fail in the last 4 weeks.

  **Tests removed from the next two lists because they were specified
in 'doNotEnable' in the properties file
 MoveReplicaHDFSTest.testNormalFailedMove

  **Annotations will be removed from the following tests because they
haven't failed in the last 4 rollups.

  **Methods: 11
   ComputePlanActionTest.testNodeAdded
   ComputePlanActionTest.testNodeLostTriggerWithDeleteNodePreferredOp
   CustomCollectionTest.testRouteFieldForHashRouter
   DeleteReplicaTest.raceConditionOnDeleteAndRegisterReplicaLegacy
   ScheduledTriggerIntegrationTest.testScheduledTrigger
   SolrRrdBackendFactoryTest.testBasic
   StreamingTest.testParallelMergeStream
   StreamingTest.testZeroParallelReducerStream
   TestMiniSolrCloudClusterSSL.testSslWithCheckPeerName
   TestStressInPlaceUpdates.stressTest
   ZkShardTermsTest.testParticipationOfReplicas


Failures in Hoss' reports for the last 4 rollups.

There were 655 unannotated tests that failed in Hoss' rollups, ordered
by the date I downloaded the rollup file, newest->oldest. See above
for the dates the files were collected.
These tests were NOT BadApple'd or AwaitsFix'd.
All tests that failed 4 weeks running will be BadApple'd unless there
are objections.
Failures in the last 4 reports:
   Report  Pct   runs  fails  test
   0123    0.2   1673     11  MissingSegmentRecoveryTest.testLeaderRecovery
   0123    1.6   1484     19  TestSQLHandler.doTest
   0123    0.2   1702     18  TestSearchAfter.testQueries
   0123    8.1   1656     59  TestSimLargeCluster.testAddNode
   0123    0.7   1656      9  TestSimLargeCluster.testSearchRate
   0123    2.8    502     12  TestSimTriggerIntegration.testCooldown
   0123    1.4   1212     34  TestSimTriggerIntegration.testListeners
   0123   37.3    501    160  TestSimTriggerIntegration.testNodeMarkersRegistration


As usual, I'll commit on Thursday barring objections.

Erick

[jira] [Commented] (SOLR-12983) JavabinLoader should avoid creating String Objects and create UTF8CharSequence fields from byte[]

2019-01-08 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16737785#comment-16737785
 ] 

ASF subversion and git services commented on SOLR-12983:


Commit 507a96e4181d4151d36332d46dd51e7ca5a09f90 in lucene-solr's branch 
refs/heads/branch_8x from Noble Paul
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=507a96e ]

SOLR-12983: JavabinLoader should avoid creating String Objects and create 
UTF8CharSequence fields from byte[]


> JavabinLoader should avoid creating String Objects and create 
> UTF8CharSequence  fields from byte[]
> --
>
> Key: SOLR-12983
> URL: https://issues.apache.org/jira/browse/SOLR-12983
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Noble Paul
>Assignee: Noble Paul
>Priority: Major
> Attachments: SOLR-12983.patch
>
>
> Javabin strings already contain Strings in UTF8 byte[] format. String fields 
> can be created directly from those



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (SOLR-13130) during the ResponseBuilder.STAGE_GET_FIELDS avoid creation of Strings

2019-01-08 Thread Noble Paul (JIRA)
Noble Paul created SOLR-13130:
-

 Summary: during the ResponseBuilder.STAGE_GET_FIELDS avoid 
creation of Strings
 Key: SOLR-13130
 URL: https://issues.apache.org/jira/browse/SOLR-13130
 Project: Solr
  Issue Type: Improvement
  Security Level: Public (Default Security Level. Issues are Public)
Reporter: Noble Paul


The javabin format bytes can be directly written out to the client instead of 
doing the double transformation

{{utf-8 -> String -> utf-8}}


--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-12983) JavabinLoader should avoid creating String Objects and create UTF8CharSequence fields from byte[]

2019-01-08 Thread Noble Paul (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16737788#comment-16737788
 ] 

Noble Paul commented on SOLR-12983:
---

[~steve_rowe] done. thanks

> JavabinLoader should avoid creating String Objects and create 
> UTF8CharSequence  fields from byte[]
> --
>
> Key: SOLR-12983
> URL: https://issues.apache.org/jira/browse/SOLR-12983
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Noble Paul
>Assignee: Noble Paul
>Priority: Major
> Attachments: SOLR-12983.patch
>
>
> Javabin strings already contain Strings in UTF8 byte[] format. String fields 
> can be created directly from those



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-12983) JavabinLoader should avoid creating String Objects and create UTF8CharSequence fields from byte[]

2019-01-08 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16737783#comment-16737783
 ] 

ASF subversion and git services commented on SOLR-12983:


Commit 91a07ce43555607d00814b08d34323efc0dadc84 in lucene-solr's branch 
refs/heads/branch_7x from Noble Paul
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=91a07ce ]

SOLR-12983: Create DocValues fields directly from byte[]


> JavabinLoader should avoid creating String Objects and create 
> UTF8CharSequence  fields from byte[]
> --
>
> Key: SOLR-12983
> URL: https://issues.apache.org/jira/browse/SOLR-12983
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Noble Paul
>Assignee: Noble Paul
>Priority: Major
> Attachments: SOLR-12983.patch
>
>
> Javabin strings already contain Strings in UTF8 byte[] format. String fields 
> can be created directly from those



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-12983) JavabinLoader should avoid creating String Objects and create UTF8CharSequence fields from byte[]

2019-01-08 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16737787#comment-16737787
 ] 

ASF subversion and git services commented on SOLR-12983:


Commit 0d4c81f2f9d354514c323e2876eea71b901021ca in lucene-solr's branch 
refs/heads/branch_8x from Noble Paul
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=0d4c81f ]

SOLR-12983: Create DocValues fields directly from byte[]


> JavabinLoader should avoid creating String Objects and create 
> UTF8CharSequence  fields from byte[]
> --
>
> Key: SOLR-12983
> URL: https://issues.apache.org/jira/browse/SOLR-12983
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Noble Paul
>Assignee: Noble Paul
>Priority: Major
> Attachments: SOLR-12983.patch
>
>
> Javabin strings already contain Strings in UTF8 byte[] format. String fields 
> can be created directly from those



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (SOLR-12983) JavabinLoader should avoid creating String Objects and create UTF8CharSequence fields from byte[]

2019-01-08 Thread Noble Paul (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-12983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Noble Paul resolved SOLR-12983.
---
   Resolution: Fixed
Fix Version/s: 7.7
   8.0

> JavabinLoader should avoid creating String Objects and create 
> UTF8CharSequence  fields from byte[]
> --
>
> Key: SOLR-12983
> URL: https://issues.apache.org/jira/browse/SOLR-12983
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Noble Paul
>Assignee: Noble Paul
>Priority: Major
> Fix For: 8.0, 7.7
>
> Attachments: SOLR-12983.patch
>
>
> Javabin strings already contain Strings in UTF8 byte[] format. String fields 
> can be created directly from those



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-12983) JavabinLoader should avoid creating String Objects and create UTF8CharSequence fields from byte[]

2019-01-08 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16737786#comment-16737786
 ] 

ASF subversion and git services commented on SOLR-12983:


Commit 6f6a35d8f7353856476d24dbfe404c4b171dafc2 in lucene-solr's branch 
refs/heads/branch_8x from Noble Paul
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=6f6a35d ]

SOLR-12983: tests don't need to use the optimization


> JavabinLoader should avoid creating String Objects and create 
> UTF8CharSequence  fields from byte[]
> --
>
> Key: SOLR-12983
> URL: https://issues.apache.org/jira/browse/SOLR-12983
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Noble Paul
>Assignee: Noble Paul
>Priority: Major
> Attachments: SOLR-12983.patch
>
>
> Javabin strings already contain Strings in UTF8 byte[] format. String fields 
> can be created directly from those



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Assigned] (SOLR-12983) JavabinLoader should avoid creating String Objects and create UTF8CharSequence fields from byte[]

2019-01-08 Thread Noble Paul (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-12983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Noble Paul reassigned SOLR-12983:
-

Assignee: Noble Paul

> JavabinLoader should avoid creating String Objects and create 
> UTF8CharSequence  fields from byte[]
> --
>
> Key: SOLR-12983
> URL: https://issues.apache.org/jira/browse/SOLR-12983
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Noble Paul
>Assignee: Noble Paul
>Priority: Major
> Attachments: SOLR-12983.patch
>
>
> Javabin strings already contain Strings in UTF8 byte[] format. String fields 
> can be created directly from those



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-12983) JavabinLoader should avoid creating String Objects and create UTF8CharSequence fields from byte[]

2019-01-08 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16737778#comment-16737778
 ] 

ASF subversion and git services commented on SOLR-12983:


Commit d814d862b035a17eced4ac40663471105dd56a4b in lucene-solr's branch 
refs/heads/master from Noble Paul
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=d814d86 ]

SOLR-12983: Create DocValues fields directly from byte[]


> JavabinLoader should avoid creating String Objects and create 
> UTF8CharSequence  fields from byte[]
> --
>
> Key: SOLR-12983
> URL: https://issues.apache.org/jira/browse/SOLR-12983
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Noble Paul
>Priority: Major
> Attachments: SOLR-12983.patch
>
>
> Javabin strings already contain Strings in UTF8 byte[] format. String fields 
> can be created directly from those



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-8631) How Nori Tokenizer can deal with Longest-Matching

2019-01-08 Thread Yeongsu Kim (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-8631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yeongsu Kim updated LUCENE-8631:

Description: 
I think the Nori tokenizer has one issue.

I don’t understand why “Longest-Matching” is NOT working in the Nori tokenizer via 
config mode (config mode: 
[https://www.elastic.co/guide/en/elasticsearch/plugins/6.x/analysis-nori-tokenizer.html]).

Here is an example explaining what longest-matching is.

Let’s assume we have a `userdict_ko.txt` containing only three Korean single words, 
‘골드’, ‘브라운’, and ‘골드브라운’, and save it to the Nori analyzer. After the update, we 
can see that it outputs two tokens, ‘골드’ and ‘브라운’, when the input is ‘골드브라운’. 
(In English: ‘골드’ means ‘gold’, ‘브라운’ means ‘brown’, and ‘골드브라운’ means 
‘goldbrown’.)

With this result, we recognize that “Longest-Matching” is NOT working. If 
“Longest-Matching” were working, the output would be ‘골드브라운’, which is the longest 
matching word in the user dictionary.

Curiously enough, when we add the user dictionary via custom mode (custom mode: 
[https://github.com/jimczi/nori/blob/master/how-to-custom-dict.asciidoc]), we found 
the result is ‘골드브라운’, where ‘Longest-Matching’ is applied. We think the reason is 
that the trained MeCab engine automatically generates word costs by its own 
criteria. We hope this mechanism is also applied to config mode.

Would you tell me how to get “Longest-Matching” via config mode (not custom), or 
give me some hints (e.g. where to modify the source code) to solve this problem?

P.S.

Recently I mailed [~jim.ferenczi], who is a developer of Nori, and received his 
suggestions:

   - Add a way to set a score to each new rule (this way you could set up a 
negative cost for the compound word that is less than the sum of the two single 
words).

   - Same as above, but the cost is computed from the statistics of the training 
(like the custom dictionary does when you recompile entirely).

   - Implement longest-match first in the dictionary.

Thanks for your support.
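
A toy greedy longest-match segmentation (not Nori code) showing the expected output for the dictionary and input above:

{code:java}
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class LongestMatchSketch {
  static List<String> segment(String text, Set<String> dict) {
    List<String> tokens = new ArrayList<>();
    int i = 0;
    while (i < text.length()) {
      int end = -1;
      for (int j = text.length(); j > i; j--) {   // try the longest span first
        if (dict.contains(text.substring(i, j))) { end = j; break; }
      }
      if (end < 0) { i++; continue; }             // no dictionary match: skip one char
      tokens.add(text.substring(i, end));
      i = end;
    }
    return tokens;
  }

  public static void main(String[] args) {
    Set<String> dict = new HashSet<>(Arrays.asList("골드", "브라운", "골드브라운"));
    System.out.println(segment("골드브라운", dict)); // prints [골드브라운]
  }
}
{code}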



> How Nori Tokenizer can deal with Longest-Matching
> -
>
> Key: LUCENE-8631
> URL: https://issues.apache.org/jira/browse/LUCENE-8631
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/analysis

[JENKINS] Lucene-Solr-BadApples-NightlyTests-7.x - Build # 43 - Still Unstable

2019-01-08 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Lucene-Solr-BadApples-NightlyTests-7.x/43/

8 tests failed.
FAILED:  org.apache.lucene.search.TestInetAddressRangeQueries.testRandomBig

Error Message:
Test abandoned because suite timeout was reached.

Stack Trace:
java.lang.Exception: Test abandoned because suite timeout was reached.
at __randomizedtesting.SeedInfo.seed([90FAB33DE109B5B3]:0)


FAILED:  junit.framework.TestSuite.org.apache.lucene.search.TestInetAddressRangeQueries

Error Message:
Suite timeout exceeded (>= 720 msec).

Stack Trace:
java.lang.Exception: Suite timeout exceeded (>= 720 msec).
at __randomizedtesting.SeedInfo.seed([90FAB33DE109B5B3]:0)


FAILED:  org.apache.solr.cloud.UnloadDistributedZkTest.test

Error Message:
Captured an uncaught exception in thread: Thread[id=1813, name=testExecutor-640-thread-7, state=RUNNABLE, group=TGRP-UnloadDistributedZkTest]

Stack Trace:
com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught exception in thread: Thread[id=1813, name=testExecutor-640-thread-7, state=RUNNABLE, group=TGRP-UnloadDistributedZkTest]
Caused by: java.lang.RuntimeException: org.apache.solr.client.solrj.SolrServerException: Timeout occured while waiting response from server at: http://127.0.0.1:46080/frg/nc
    at __randomizedtesting.SeedInfo.seed([77E910A0EED8C76F]:0)
    at org.apache.solr.cloud.BasicDistributedZkTest.lambda$createCollectionInOneInstance$1(BasicDistributedZkTest.java:659)
    at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:209)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.solr.client.solrj.SolrServerException: Timeout occured while waiting response from server at: http://127.0.0.1:46080/frg/nc
    at org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:654)
    at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:255)
    at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:244)
    at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:194)
    at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:211)
    at org.apache.solr.cloud.BasicDistributedZkTest.lambda$createCollectionInOneInstance$1(BasicDistributedZkTest.java:657)
    ... 4 more
Caused by: java.net.SocketTimeoutException: Read timed out
    at java.net.SocketInputStream.socketRead0(Native Method)
    at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
    at java.net.SocketInputStream.read(SocketInputStream.java:171)
    at java.net.SocketInputStream.read(SocketInputStream.java:141)
    at org.apache.http.impl.io.SessionInputBufferImpl.streamRead(SessionInputBufferImpl.java:137)
    at org.apache.http.impl.io.SessionInputBufferImpl.fillBuffer(SessionInputBufferImpl.java:153)
    at org.apache.http.impl.io.SessionInputBufferImpl.readLine(SessionInputBufferImpl.java:282)
    at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:138)
    at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:56)
    at org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:259)
    at org.apache.http.impl.DefaultBHttpClientConnection.receiveResponseHeader(DefaultBHttpClientConnection.java:163)
    at org.apache.http.impl.conn.CPoolProxy.receiveResponseHeader(CPoolProxy.java:165)
    at org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:273)
    at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:125)
    at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:272)
    at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:185)
    at org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:89)
    at org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:110)
    at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185)
    at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:83)
    at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:56)
    at org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:542)
    ... 9 more


FAILED:  org.apache.solr.cloud.api.collections.ShardSplitTest.testSplitWithChaosMonkey

Error Message:
Address already in use

Stack Trace:
java.net.BindException: Address already in use
at 

[JENKINS] Lucene-Solr-master-Solaris (64bit/jdk1.8.0) - Build # 2250 - Unstable!

2019-01-08 Thread Policeman Jenkins Server
Build: https://jenkins.thetaphi.de/job/Lucene-Solr-master-Solaris/2250/
Java: 64bit/jdk1.8.0 -XX:-UseCompressedOops -XX:+UseG1GC

1 tests failed.
FAILED:  junit.framework.TestSuite.org.apache.solr.handler.TestReplicationHandler

Error Message:
ObjectTracker found 6 object(s) that were not released!!! [MockDirectoryWrapper, MockDirectoryWrapper, MockDirectoryWrapper, MockDirectoryWrapper, SolrCore, InternalHttpClient]

org.apache.solr.common.util.ObjectReleaseTracker$ObjectTrackerException: org.apache.lucene.store.MockDirectoryWrapper
    at org.apache.solr.common.util.ObjectReleaseTracker.track(ObjectReleaseTracker.java:42)
    at org.apache.solr.core.CachingDirectoryFactory.get(CachingDirectoryFactory.java:348)
    at org.apache.solr.core.SolrCore.getNewIndexDir(SolrCore.java:359)
    at org.apache.solr.core.SolrCore.initIndex(SolrCore.java:738)
    at org.apache.solr.core.SolrCore.<init>(SolrCore.java:967)
    at org.apache.solr.core.SolrCore.<init>(SolrCore.java:874)
    at org.apache.solr.core.CoreContainer.createFromDescriptor(CoreContainer.java:1189)
    at org.apache.solr.core.CoreContainer.lambda$load$13(CoreContainer.java:695)
    at com.codahale.metrics.InstrumentedExecutorService$InstrumentedCallable.call(InstrumentedExecutorService.java:197)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:209)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)

org.apache.solr.common.util.ObjectReleaseTracker$ObjectTrackerException: org.apache.lucene.store.MockDirectoryWrapper
    at org.apache.solr.common.util.ObjectReleaseTracker.track(ObjectReleaseTracker.java:42)
    at org.apache.solr.core.CachingDirectoryFactory.get(CachingDirectoryFactory.java:348)
    at org.apache.solr.update.SolrIndexWriter.create(SolrIndexWriter.java:95)
    at org.apache.solr.core.SolrCore.initIndex(SolrCore.java:770)
    at org.apache.solr.core.SolrCore.<init>(SolrCore.java:967)
    at org.apache.solr.core.SolrCore.<init>(SolrCore.java:874)
    at org.apache.solr.core.CoreContainer.createFromDescriptor(CoreContainer.java:1189)
    at org.apache.solr.core.CoreContainer.lambda$load$13(CoreContainer.java:695)
    at com.codahale.metrics.InstrumentedExecutorService$InstrumentedCallable.call(InstrumentedExecutorService.java:197)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:209)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)

org.apache.solr.common.util.ObjectReleaseTracker$ObjectTrackerException: org.apache.lucene.store.MockDirectoryWrapper
    at org.apache.solr.common.util.ObjectReleaseTracker.track(ObjectReleaseTracker.java:42)
    at org.apache.solr.core.CachingDirectoryFactory.get(CachingDirectoryFactory.java:348)
    at org.apache.solr.handler.IndexFetcher.fetchLatestIndex(IndexFetcher.java:503)
    at org.apache.solr.handler.IndexFetcher.fetchLatestIndex(IndexFetcher.java:346)
    at org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:422)
    at org.apache.solr.handler.ReplicationHandler.lambda$setupPolling$13(ReplicationHandler.java:1182)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)

org.apache.solr.common.util.ObjectReleaseTracker$ObjectTrackerException: org.apache.lucene.store.MockDirectoryWrapper
    at org.apache.solr.common.util.ObjectReleaseTracker.track(ObjectReleaseTracker.java:42)
    at org.apache.solr.core.CachingDirectoryFactory.get(CachingDirectoryFactory.java:348)
    at org.apache.solr.core.SolrCore.initSnapshotMetaDataManager(SolrCore.java:508)
    at org.apache.solr.core.SolrCore.<init>(SolrCore.java:959)
    at org.apache.solr.core.SolrCore.<init>(SolrCore.java:874)
    at org.apache.solr.core.CoreContainer.createFromDescriptor(CoreContainer.java:1189)
    at org.apache.solr.core.CoreContainer.lambda$load$13(CoreContainer.java:695)
    at com.codahale.metrics.InstrumentedExecutorService$InstrumentedCallable.call(InstrumentedExecutorService.java:197)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at 

[jira] [Commented] (SOLR-13116) Add Admin UI login support for Kerberos

2019-01-08 Thread JIRA


[ 
https://issues.apache.org/jira/browse/SOLR-13116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16737679#comment-16737679
 ] 

Jan Høydahl commented on SOLR-13116:


Uploaded an (untested) patch that aims to solve the UI bug and add the login 
text agreed upon. To test it, it should be enough to re-build the webapp and 
reload the browser:
{code:java}
pushd webapp && ant dist && popd
{code}
 

> Add Admin UI login support for Kerberos
> ---
>
> Key: SOLR-13116
> URL: https://issues.apache.org/jira/browse/SOLR-13116
> Project: Solr
>  Issue Type: New Feature
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: Admin UI
>Affects Versions: 8.0, 7.7
>Reporter: Jan Høydahl
>Priority: Major
> Attachments: SOLR-13116.patch, eventual_auth.png
>
>
> Spinoff from SOLR-7896. Kerberos auth plugin should get Admin UI Login 
> support.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-13116) Add Admin UI login support for Kerberos

2019-01-08 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/SOLR-13116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Høydahl updated SOLR-13116:
---
Attachment: SOLR-13116.patch

> Add Admin UI login support for Kerberos
> ---
>
> Key: SOLR-13116
> URL: https://issues.apache.org/jira/browse/SOLR-13116
> Project: Solr
>  Issue Type: New Feature
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: Admin UI
>Affects Versions: 8.0, 7.7
>Reporter: Jan Høydahl
>Priority: Major
> Attachments: SOLR-13116.patch, eventual_auth.png
>
>
> Spinoff from SOLR-7896. Kerberos auth plugin should get Admin UI Login 
> support.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (SOLR-13116) Add Admin UI login support for Kerberos

2019-01-08 Thread JIRA


[ 
https://issues.apache.org/jira/browse/SOLR-13116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16737558#comment-16737558
 ] 

Jan Høydahl edited comment on SOLR-13116 at 1/8/19 11:57 PM:
-

From the stack trace it is evident that the response from the Kerberos auth plugin 
does contain a {{WWW-Authenticate}} header, but when trying to split it into a 
scheme and parameters with the regex {{/(\w+)\s+(.*)/}} there is no match, so 
probably the header has some other format.

Could you perhaps check the WWW-Authenticate header in the browser's debug 
panel under the Network tab? Then we can either change the parsing pattern or 
change the header sent by the Kerberos plugin to carry information about the 
scheme.

According to [https://tools.ietf.org/html/rfc4559#section-4] the plain string 
"Negotiate" is returned in the first phase of Kerberos, so the regex fails. 
Working on better parsing code.
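
To make this concrete, here is a minimal Java sketch (hypothetical; the Admin UI 
code itself is JavaScript) of why a scheme-plus-parameters regex cannot match the 
bare first-phase challenge:
{code:java}
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WwwAuthenticateParseDemo {
  public static void main(String[] args) {
    // Requires a scheme, whitespace, then parameters -- mirrors /(\w+)\s+(.*)/
    Pattern schemeAndParams = Pattern.compile("(\\w+)\\s+(.*)");

    // First-phase SPNEGO/Kerberos challenge per RFC 4559 is the bare scheme.
    String header = "Negotiate";
    Matcher m = schemeAndParams.matcher(header);
    System.out.println(m.matches());  // false -- no whitespace or params to match

    // A pattern with optional parameters matches both forms.
    Pattern lenient = Pattern.compile("(\\w+)(?:\\s+(.*))?");
    System.out.println(lenient.matcher(header).matches());                   // true
    System.out.println(lenient.matcher("Basic realm=\"solr\"").matches());   // true
  }
}
{code}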


was (Author: janhoy):
From the stack trace it is evident that the response from the Kerberos auth plugin 
does contain a {{WWW-Authenticate}} header, but when trying to split it into a 
scheme and parameters with the regex {{/(\w+)\s+(.*)/}} there is no match, so 
probably the header has some other format.

Could you perhaps check the WWW-Authenticate header in the browser's debug 
panel under the Network tab? Then we can either change the parsing pattern or 
change the header sent by the Kerberos plugin to carry information about the 
scheme.

> Add Admin UI login support for Kerberos
> ---
>
> Key: SOLR-13116
> URL: https://issues.apache.org/jira/browse/SOLR-13116
> Project: Solr
>  Issue Type: New Feature
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: Admin UI
>Affects Versions: 8.0, 7.7
>Reporter: Jan Høydahl
>Priority: Major
> Attachments: eventual_auth.png
>
>
> Spinoff from SOLR-7896. Kerberos auth plugin should get Admin UI Login 
> support.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[JENKINS] Lucene-Solr-7.x-MacOSX (64bit/jdk1.8.0) - Build # 1015 - Failure!

2019-01-08 Thread Policeman Jenkins Server
Build: https://jenkins.thetaphi.de/job/Lucene-Solr-7.x-MacOSX/1015/
Java: 64bit/jdk1.8.0 -XX:-UseCompressedOops -XX:+UseParallelGC

All tests passed

Build Log:
[...truncated 14334 lines...]
   [junit4] JVM J1: stdout was not empty, see: 
/Users/jenkins/workspace/Lucene-Solr-7.x-MacOSX/solr/build/solr-core/test/temp/junit4-J1-20190108_220753_0142400029962890954877.sysout
   [junit4] >>> JVM J1 emitted unexpected output (verbatim) 
   [junit4] #
   [junit4] # A fatal error has been detected by the Java Runtime Environment:
   [junit4] #
   [junit4] #  SIGFPE (0x8) at pc=0x7fff8874d143, pid=78791, 
tid=0x4603
   [junit4] #
   [junit4] # JRE version: Java(TM) SE Runtime Environment (8.0_172-b11) (build 
1.8.0_172-b11)
   [junit4] # Java VM: Java HotSpot(TM) 64-Bit Server VM (25.172-b11 mixed mode 
bsd-amd64 )
   [junit4] # Problematic frame:
   [junit4] # C  [libsystem_kernel.dylib+0x11143]  __commpage_gettimeofday+0x43
   [junit4] #
   [junit4] # Failed to write core dump. Core dumps have been disabled. To 
enable core dumping, try "ulimit -c unlimited" before starting Java again
   [junit4] #
   [junit4] # An error report file with more information is saved as:
   [junit4] # 
/Users/jenkins/workspace/Lucene-Solr-7.x-MacOSX/solr/build/solr-core/test/J1/hs_err_pid78791.log
   [junit4] #
   [junit4] # If you would like to submit a bug report, please visit:
   [junit4] #   http://bugreport.java.com/bugreport/crash.jsp
   [junit4] #
   [junit4] <<< JVM J1: EOF 

   [junit4] JVM J0: stdout was not empty, see: 
/Users/jenkins/workspace/Lucene-Solr-7.x-MacOSX/solr/build/solr-core/test/temp/junit4-J0-20190108_220753_014934698731847611233.sysout
   [junit4] >>> JVM J0 emitted unexpected output (verbatim) 
   [junit4] [thread 331587 also had an error]
   [junit4] [thread 235231 also had an error][thread 256911 also had an error]
   [junit4] 
   [junit4] [thread 269359 also had an error][thread 16899 also had an error]
   [junit4] 
   [junit4] #
   [junit4] # A fatal error has been detected by the Java Runtime Environment:
   [junit4] #
   [junit4] #  SIGFPE (0x8) at pc=0x7fff8874d143, pid=78790, 
tid=0xdb3f
   [junit4] #
   [junit4] # JRE version: Java(TM) SE Runtime Environment (8.0_172-b11) (build 
1.8.0_172-b11)
   [junit4] # Java VM: Java HotSpot(TM) 64-Bit Server VM (25.172-b11 mixed mode 
bsd-amd64 )
   [junit4] # Problematic frame:
   [junit4] # C  [libsystem_kernel.dylib+0x11143]  __commpage_gettimeofday+0x43
   [junit4] #
   [junit4] # Failed to write core dump. Core dumps have been disabled. To 
enable core dumping, try "ulimit -c unlimited" before starting Java again
   [junit4] #
   [junit4] # An error report file with more information is saved as:
   [junit4] # 
/Users/jenkins/workspace/Lucene-Solr-7.x-MacOSX/solr/build/solr-core/test/J0/hs_err_pid78790.log
   [junit4] #
   [junit4] # If you would like to submit a bug report, please visit:
   [junit4] #   http://bugreport.java.com/bugreport/crash.jsp
   [junit4] #
   [junit4] <<< JVM J0: EOF 

[...truncated 1 lines...]
   [junit4] ERROR: JVM J0 ended with an exception, command line: 
/Library/Java/JavaVirtualMachines/jdk1.8.0_172.jdk/Contents/Home/jre/bin/java 
-XX:-UseCompressedOops -XX:+UseParallelGC -XX:+HeapDumpOnOutOfMemoryError 
-XX:HeapDumpPath=/Users/jenkins/workspace/Lucene-Solr-7.x-MacOSX/heapdumps -ea 
-esa -Dtests.prefix=tests -Dtests.seed=E3DD0C11D0F54375 -Xmx512M -Dtests.iters= 
-Dtests.verbose=false -Dtests.infostream=false -Dtests.codec=random 
-Dtests.postingsformat=random -Dtests.docvaluesformat=random 
-Dtests.locale=random -Dtests.timezone=random -Dtests.directory=random 
-Dtests.linedocsfile=europarl.lines.txt.gz -Dtests.luceneMatchVersion=7.7.0 
-Dtests.cleanthreads=perClass 
-Djava.util.logging.config.file=/Users/jenkins/workspace/Lucene-Solr-7.x-MacOSX/lucene/tools/junit4/logging.properties
 -Dtests.nightly=false -Dtests.weekly=false -Dtests.monster=false 
-Dtests.slow=true -Dtests.asserts=true -Dtests.multiplier=1 -DtempDir=./temp 
-Djava.io.tmpdir=./temp 
-Dcommon.dir=/Users/jenkins/workspace/Lucene-Solr-7.x-MacOSX/lucene 
-Dclover.db.dir=/Users/jenkins/workspace/Lucene-Solr-7.x-MacOSX/lucene/build/clover/db
 
-Djava.security.policy=/Users/jenkins/workspace/Lucene-Solr-7.x-MacOSX/lucene/tools/junit4/solr-tests.policy
 -Dtests.LUCENE_VERSION=7.7.0 -Djetty.testMode=1 -Djetty.insecurerandom=1 
-Dsolr.directoryFactory=org.apache.solr.core.MockDirectoryFactory 
-Djava.awt.headless=true -Djdk.map.althashing.threshold=0 
-Dtests.src.home=/Users/jenkins/workspace/Lucene-Solr-7.x-MacOSX 
-Djava.security.egd=file:/dev/./urandom 
-Djunit4.childvm.cwd=/Users/jenkins/workspace/Lucene-Solr-7.x-MacOSX/solr/build/solr-core/test/J0
 
-Djunit4.tempDir=/Users/jenkins/workspace/Lucene-Solr-7.x-MacOSX/solr/build/solr-core/test/temp
 -Djunit4.childvm.id=0 -Djunit4.childvm.count=2 -Dtests.leaveTemporary=false 
-Dtests.filterstacks=true -Dtests.disableHdfs=true -Dtests.badapples=false 

[jira] [Commented] (SOLR-13116) Add Admin UI login support for Kerberos

2019-01-08 Thread JIRA


[ 
https://issues.apache.org/jira/browse/SOLR-13116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16737558#comment-16737558
 ] 

Jan Høydahl commented on SOLR-13116:


From the stack trace it is evident that the response from the Kerberos auth plugin 
does contain a {{WWW-Authenticate}} header, but when trying to split it into a 
scheme and parameters with the regex {{/(\w+)\s+(.*)/}} there is no match, so 
probably the header has some other format.

Could you perhaps check the WWW-Authenticate header in the browser's debug 
panel under the Network tab? Then we can either change the parsing pattern or 
change the header sent by the Kerberos plugin to carry information about the 
scheme.

> Add Admin UI login support for Kerberos
> ---
>
> Key: SOLR-13116
> URL: https://issues.apache.org/jira/browse/SOLR-13116
> Project: Solr
>  Issue Type: New Feature
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: Admin UI
>Affects Versions: 8.0, 7.7
>Reporter: Jan Høydahl
>Priority: Major
> Attachments: eventual_auth.png
>
>
> Spinoff from SOLR-7896. Kerberos auth plugin should get Admin UI Login 
> support.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-12891) Injection Dangers in Streaming Expressions

2019-01-08 Thread Cassandra Targett (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-12891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cassandra Targett updated SOLR-12891:
-
Priority: Minor  (was: Blocker)

> Injection Dangers in Streaming Expressions
> --
>
> Key: SOLR-12891
> URL: https://issues.apache.org/jira/browse/SOLR-12891
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: streaming expressions
>Affects Versions: 7.5, 8.0
>Reporter: Gus Heck
>Priority: Minor
>  Labels: security
> Attachments: SOLR-12891.patch, SOLR-12891.patch, SOLR-12891.patch, 
> SOLR12819example.java
>
>
> I just spent some time fiddling with streaming expressions for fun, reading 
> Erick Erickson's blog 
> ([https://lucidworks.com/2017/12/06/streaming-expressions-in-solrj/]) and the 
> example given in the ref guide 
> ([https://lucene.apache.org/solr/guide/7_5/streaming-expressions.html#streaming-requests-and-responses])
>  and it occurred to me that we are recommending string concatenation into an 
> expression language with the power to harm the server, or other network 
> services visible from the server. I'm starting this Jira as a security issue 
> to avoid creating a public impression of insecurity, feel free to undo that 
> if I have guessed wrong. I haven't developed an exploit example, but it would 
> go something like this:
>  # Some portion of an expression is built including user supplied data using 
> the techniques we're recommending in the ref guide
>  # Malicious user constructs input data that breaks out of the expression 
> (SOLR-10894 is relevant here), probably somewhere inside a let() expression 
> where one could simply define an additional variable taking the value of a 
> malicious expression...
>  # update() expression is executed to add/overwrite data, jdbc() makes a JDBC 
> connection to a database visible to the server, or the malicious expression 
> executes some very expensive expression for DoS effect.
> Technically this is of course the fault of the end user who allowed unchecked 
> input into programmatic execution, but when I think about how to check the 
> input I realize that the only way to be sure is to construct for myself a 
> notion of exactly how the parser behaves and then determine what needs to be 
> escaped. To do this I need to dig into the expression parser code...
> How to escape input is also already unclear as shown by SOLR-10894
> There's another important wrinkle that would easily be missed by someone 
> trying to construct their own escaping/protection system relating to 
> parameter substitution as discussed here: SOLR-8458 
> I think the solution to this is that SolrJ API should be enhanced to provide 
> an escaping utility at a minimum and possibly a "prepared expression" similar 
> to SQL prepared statements and call this issue to attention in the ref guide 
> once these tools are available... 
> Additionally, templating features might be a useful addition to help folks 
> manage large expressions and facilitate re-use of patterns... such templating 
> should also have this issue in mind when/if they are added.
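
To make the break-out scenario above concrete, here is a purely hypothetical 
sketch of the concatenation pattern the description warns about (the collection 
name, parameters, and payload are invented, and this builds only a string -- no 
SolrJ API is exercised):
{code:java}
public class ExpressionInjectionDemo {
  public static void main(String[] args) {
    // Attacker-controlled value (hypothetical): closes the quoted q parameter
    // and splices a second expression into the enclosing expression.
    String userInput = "foo\"), update(collection1, batchSize=100, search(collection1, q=\"*:*";

    // The naive concatenation style the description refers to:
    String expr = "search(collection1, q=\"" + userInput + "\", fl=\"id\", sort=\"id asc\")";

    // Prints an expression whose q parameter has been escaped and extended:
    // search(collection1, q="foo"), update(collection1, batchSize=100,
    //   search(collection1, q="*:*", fl="id", sort="id asc")
    System.out.println(expr);
  }
}
{code}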



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-12891) Injection Dangers in Streaming Expressions

2019-01-08 Thread Cassandra Targett (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-12891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cassandra Targett updated SOLR-12891:
-
Security: Public  (was: Private (Security Issue))

Changed issue visibility from private to public.

> Injection Dangers in Streaming Expressions
> --
>
> Key: SOLR-12891
> URL: https://issues.apache.org/jira/browse/SOLR-12891
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: streaming expressions
>Affects Versions: 7.5, 8.0
>Reporter: Gus Heck
>Priority: Blocker
>  Labels: security
> Attachments: SOLR-12891.patch, SOLR-12891.patch, SOLR-12891.patch, 
> SOLR12819example.java
>
>
> I just spent some time fiddling with streaming expressions for fun, reading 
> Erick Erickson's blog 
> ([https://lucidworks.com/2017/12/06/streaming-expressions-in-solrj/]) and the 
> example given in the ref guide 
> ([https://lucene.apache.org/solr/guide/7_5/streaming-expressions.html#streaming-requests-and-responses])
>  and it occurred to me that we are recommending string concatenation into an 
> expression language with the power to harm the server, or other network 
> services visible from the server. I'm starting this Jira as a security issue 
> to avoid creating a public impression of insecurity, feel free to undo that 
> if I have guessed wrong. I haven't developed an exploit example, but it would 
> go something like this:
>  # Some portion of an expression is built including user supplied data using 
> the techniques we're recommending in the ref guide
>  # Malicious user constructs input data that breaks out of the expression 
> (SOLR-10894 is relevant here), probably somewhere inside a let() expression 
> where one could simply define an additional variable taking the value of a 
> malicious expression...
>  # update() expression is executed to add/overwrite data, jdbc() makes a JDBC 
> connection to a database visible to the server, or the malicious expression 
> executes some very expensive expression for DoS effect.
> Technically this is of course the fault of the end user who allowed unchecked 
> input into programmatic execution, but when I think about how to check the 
> input I realize that the only way to be sure is to construct for myself a 
> notion of exactly how the parser behaves and then determine what needs to be 
> escaped. To do this I need to dig into the expression parser code...
> How to escape input is also already unclear as shown by SOLR-10894
> There's another important wrinkle that would easily be missed by someone 
> trying to construct their own escaping/protection system relating to 
> parameter substitution as discussed here: SOLR-8458 
> I think the solution to this is that SolrJ API should be enhanced to provide 
> an escaping utility at a minimum and possibly a "prepared expression" similar 
> to SQL prepared statements and call this issue to attention in the ref guide 
> once these tools are available... 
> Additionally, templating features might be a useful addition to help folks 
> manage large expressions and facilitate re-use of patterns... such templating 
> should also have this issue in mind when/if they are added.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[JENKINS] Lucene-Solr-master-MacOSX (64bit/jdk-9) - Build # 5010 - Unstable!

2019-01-08 Thread Policeman Jenkins Server
Build: https://jenkins.thetaphi.de/job/Lucene-Solr-master-MacOSX/5010/
Java: 64bit/jdk-9 -XX:-UseCompressedOops -XX:+UseG1GC

1 tests failed.
FAILED:  org.apache.solr.cloud.HttpPartitionTest.test

Error Message:
Doc with id=2 not found in http://127.0.0.1:55799/c8n_1x2_leader_session_loss 
due to: Path not found: /id; rsp={doc=null}

Stack Trace:
java.lang.AssertionError: Doc with id=2 not found in 
http://127.0.0.1:55799/c8n_1x2_leader_session_loss due to: Path not found: /id; 
rsp={doc=null}
at 
__randomizedtesting.SeedInfo.seed([5A1F937DFE4EE446:D24BACA750B289BE]:0)
at org.junit.Assert.fail(Assert.java:88)
at org.junit.Assert.assertTrue(Assert.java:41)
at 
org.apache.solr.cloud.HttpPartitionTest.assertDocExists(HttpPartitionTest.java:566)
at 
org.apache.solr.cloud.HttpPartitionTest.assertDocsExistInAllReplicas(HttpPartitionTest.java:511)
at 
org.apache.solr.cloud.HttpPartitionTest.testLeaderZkSessionLoss(HttpPartitionTest.java:466)
at 
org.apache.solr.cloud.HttpPartitionTest.test(HttpPartitionTest.java:155)
at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:564)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1750)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:938)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:974)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:988)
at 
org.apache.solr.BaseDistributedSearchTestCase$ShardsRepeatRule$ShardsFixedStatement.callStatement(BaseDistributedSearchTestCase.java:1070)
at 
org.apache.solr.BaseDistributedSearchTestCase$ShardsRepeatRule$ShardsStatement.evaluate(BaseDistributedSearchTestCase.java:1042)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57)
at 
org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:49)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
at 
org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48)
at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:368)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:817)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:468)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:947)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:832)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:883)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:894)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:41)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47)
at 

[JENKINS] Lucene-Solr-repro - Build # 2645 - Unstable

2019-01-08 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Lucene-Solr-repro/2645/

[...truncated 28 lines...]
[repro] Jenkins log URL: 
https://builds.apache.org/job/Lucene-Solr-NightlyTests-7.x/426/consoleText

[repro] Revision: 5c813f37d34c0e8dc4037ec47db86e795df778cd

[repro] Ant options: -Dtests.multiplier=2 
-Dtests.linedocsfile=/home/jenkins/jenkins-slave/workspace/Lucene-Solr-NightlyTests-7.x/test-data/enwiki.random.lines.txt
[repro] Repro line:  ant test  -Dtestcase=TestDistributedSearch 
-Dtests.method=test -Dtests.seed=1B45B21C0A901F9 -Dtests.multiplier=2 
-Dtests.nightly=true -Dtests.slow=true 
-Dtests.linedocsfile=/home/jenkins/jenkins-slave/workspace/Lucene-Solr-NightlyTests-7.x/test-data/enwiki.random.lines.txt
 -Dtests.locale=es-PA -Dtests.timezone=US/Arizona -Dtests.asserts=true 
-Dtests.file.encoding=ISO-8859-1

[repro] Repro line:  ant test  -Dtestcase=ForceLeaderTest 
-Dtests.method=testReplicasInLIRNoLeader -Dtests.seed=1B45B21C0A901F9 
-Dtests.multiplier=2 -Dtests.nightly=true -Dtests.slow=true 
-Dtests.linedocsfile=/home/jenkins/jenkins-slave/workspace/Lucene-Solr-NightlyTests-7.x/test-data/enwiki.random.lines.txt
 -Dtests.locale=it -Dtests.timezone=America/Buenos_Aires -Dtests.asserts=true 
-Dtests.file.encoding=ISO-8859-1

[repro] git rev-parse --abbrev-ref HEAD
[repro] git rev-parse HEAD
[repro] Initial local git branch/revision: 
951b4e4c83756d2d5b8592168cdba828a9133ba3
[repro] git fetch
[repro] git checkout 5c813f37d34c0e8dc4037ec47db86e795df778cd

[...truncated 2 lines...]
[repro] git merge --ff-only

[...truncated 1 lines...]
[repro] ant clean

[...truncated 6 lines...]
[repro] Test suites by module:
[repro]solr/core
[repro]   TestDistributedSearch
[repro]   ForceLeaderTest
[repro] ant compile-test

[...truncated 3605 lines...]
[repro] ant test-nocompile -Dtests.dups=5 -Dtests.maxfailures=10 
-Dtests.class="*.TestDistributedSearch|*.ForceLeaderTest" 
-Dtests.showOutput=onerror -Dtests.multiplier=2 
-Dtests.linedocsfile=/home/jenkins/jenkins-slave/workspace/Lucene-Solr-NightlyTests-7.x/test-data/enwiki.random.lines.txt
 -Dtests.seed=1B45B21C0A901F9 -Dtests.multiplier=2 -Dtests.nightly=true 
-Dtests.slow=true 
-Dtests.linedocsfile=/home/jenkins/jenkins-slave/workspace/Lucene-Solr-NightlyTests-7.x/test-data/enwiki.random.lines.txt
 -Dtests.locale=es-PA -Dtests.timezone=US/Arizona -Dtests.asserts=true 
-Dtests.file.encoding=ISO-8859-1

[...truncated 1861 lines...]
   [junit4]   2> 261966 WARN  
(TEST-ForceLeaderTest.testReplicasInLIRNoLeader-seed#[1B45B21C0A901F9]) [] 
o.a.s.c.AbstractFullDistribZkTestBase ERROR: 
org.apache.solr.common.SolrException: Could not find a healthy node to handle 
the request. ... Sleeping for 1 seconds before re-try ...
   [junit4]   2> 262509 ERROR (indexFetcher-307-thread-1) [] 
o.a.s.h.ReplicationHandler Index fetch failed 
:org.apache.solr.common.SolrException: No registered leader was found after 
waiting for 4000ms , collection: forceleader_test_collection slice: shard1 saw 
state=DocCollection(forceleader_test_collection//collections/forceleader_test_collection/state.json/15)={
   [junit4]   2>   "pullReplicas":"0",
   [junit4]   2>   "replicationFactor":"0",
   [junit4]   2>   "shards":{"shard1":{
   [junit4]   2>   "range":"8000-7fff",
   [junit4]   2>   "state":"active",
   [junit4]   2>   "replicas":{
   [junit4]   2> "core_node2":{
   [junit4]   2>   
"core":"forceleader_test_collection_shard1_replica_t1",
   [junit4]   2>   "base_url":"http://127.0.0.1:37059/k_dpv/c",
   [junit4]   2>   "node_name":"127.0.0.1:37059_k_dpv%2Fc",
   [junit4]   2>   "state":"down",
   [junit4]   2>   "type":"TLOG"},
   [junit4]   2> "core_node4":{
   [junit4]   2>   "state":"down",
   [junit4]   2>   "base_url":"http://127.0.0.1:34761/k_dpv/c",
   [junit4]   2>   
"core":"forceleader_test_collection_shard1_replica_t3",
   [junit4]   2>   "node_name":"127.0.0.1:34761_k_dpv%2Fc",
   [junit4]   2>   "force_set_state":"false",
   [junit4]   2>   "type":"TLOG"},
   [junit4]   2> "core_node6":{
   [junit4]   2>   "state":"down",
   [junit4]   2>   "base_url":"http://127.0.0.1:44452/k_dpv/c",
   [junit4]   2>   
"core":"forceleader_test_collection_shard1_replica_t5",
   [junit4]   2>   "node_name":"127.0.0.1:44452_k_dpv%2Fc",
   [junit4]   2>   "force_set_state":"false",
   [junit4]   2>   "type":"TLOG",
   [junit4]   2>   "router":{"name":"compositeId"},
   [junit4]   2>   "maxShardsPerNode":"1",
   [junit4]   2>   "autoAddReplicas":"false",
   [junit4]   2>   "nrtReplicas":"0",
   [junit4]   2>   "tlogReplicas":"3"} with 
live_nodes=[127.0.0.1:34761_k_dpv%2Fc, 127.0.0.1:43565_k_dpv%2Fc, 
127.0.0.1:44452_k_dpv%2Fc]
   [junit4]   2>at 
org.apache.solr.common.cloud.ZkStateReader.getLeaderRetry(ZkStateReader.java:902)
   [junit4]   2>at 

[jira] [Commented] (SOLR-13129) Document nested child docs in the ref guide

2019-01-08 Thread David Smiley (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-13129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16737514#comment-16737514
 ] 

David Smiley commented on SOLR-13129:
-

[~moshebla] I'm hoping you could help.

> Document nested child docs in the ref guide
> ---
>
> Key: SOLR-13129
> URL: https://issues.apache.org/jira/browse/SOLR-13129
> Project: Solr
>  Issue Type: Sub-task
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: documentation
>Affects Versions: 8.0
>Reporter: David Smiley
>Priority: Major
> Fix For: 8.0
>
>
> Solr 8.0 will have nicer support for nested child documents than its 
> predecessors.  This should be documented in one place in the ref guide (to 
> the extent that makes sense). Users need to know the schema ramifications (incl. 
> special fields and that some aspects are optional and when), what a nested 
> document "looks like" (XML, JSON, SolrJ), how to use the child doc 
> transformer, how to use block join queries, and get some overview of how this 
> all works.  Maybe mention some plausible future enhancements / direction this 
> is going in (e.g. path query language?).  Some of this is already done but 
> it's in various places and could be moved.  Unlike other features which 
> conveniently fit into one spot in the documentation (like a query parser), 
> this is a more complex issue that has multiple aspects – more 
> "cross-cutting", and so IMO doesn't belong in the current doc pigeon holes.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (SOLR-13129) Document nested child docs in the ref guide

2019-01-08 Thread David Smiley (JIRA)
David Smiley created SOLR-13129:
---

 Summary: Document nested child docs in the ref guide
 Key: SOLR-13129
 URL: https://issues.apache.org/jira/browse/SOLR-13129
 Project: Solr
  Issue Type: Sub-task
  Security Level: Public (Default Security Level. Issues are Public)
  Components: documentation
Affects Versions: 8.0
Reporter: David Smiley
 Fix For: 8.0


Solr 8.0 will have nicer support for nested child documents than its 
predecessors.  This should be documented in one place in the ref guide (to the 
extent that makes sense). Users need to know the schema ramifications (incl. special 
fields and that some aspects are optional and when), what a nested document 
"looks like" (XML, JSON, SolrJ), how to use the child doc transformer, how to 
use block join queries, and get some overview of how this all works.  Maybe 
mention some plausible future enhancements / direction this is going in (e.g. 
path query language?).  Some of this is already done but it's in various places 
and could be moved.  Unlike other features which conveniently fit into one spot 
in the documentation (like a query parser), this is a more complex issue that 
has multiple aspects – more "cross-cutting", and so IMO doesn't belong in the 
current doc pigeon holes.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-12304) Interesting Terms parameter is ignored by MLT Component

2019-01-08 Thread David Smiley (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16737471#comment-16737471
 ] 

David Smiley commented on SOLR-12304:
-

My preference is to deprecate the MLT handler & component based on its 
apparent redundancy.  Thanks for summarizing the differences.  If you agree 
with that, and if you post an issue accomplishing that, then I would jump on it 
to get that into 8.0 & 7.x.  For starters, that could simply be a deprecated 
annotation, and some notes in the ref guide at appropriate places.  Bonus would 
be a warning logged on first use.

Personally I've been directing my efforts of late to anything related to 8.0, 
and this issue is not aligned with that.  After 8.0 I hope to assist with 
people's patches more, especially ones that are ready to go.

> Interesting Terms parameter is ignored by MLT Component
> ---
>
> Key: SOLR-12304
> URL: https://issues.apache.org/jira/browse/SOLR-12304
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: MoreLikeThis
>Affects Versions: 7.2
>Reporter: Alessandro Benedetti
>Priority: Major
> Attachments: SOLR-12304.patch, SOLR-12304.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently the More Like This component just ignores the mlt.InterestingTerms 
> parameter ( which is usable by the MoreLikeThisHandler).
> Scope of this issue is to fix the bug and add related tests ( which will 
> succeed after the fix )
> *N.B.* MoreLikeThisComponent and MoreLikeThisHandler are very coupled and the 
> tests for the MoreLikeThisHandler are intersecting the MoreLikeThisComponent 
> ones .
>  It is out of scope for this issue any consideration or refactor of that.
>  Other issues will follow.
> *N.B.* out of scope for this issue is the distributed case, which is much 
> more complicated and requires much deeper investigations



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-13053) NodeAddedTrigger and NodeLostTrigger do not reserve added/removed time populated by restoreState

2019-01-08 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-13053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16737457#comment-16737457
 ] 

ASF subversion and git services commented on SOLR-13053:


Commit 28859fe654f5ebb4335af675150297efc8c8ac88 in lucene-solr's branch 
refs/heads/branch_8x from Cao Manh Dat
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=28859fe ]

SOLR-13053: Upgrade CHANGES.txt


> NodeAddedTrigger and NodeLostTrigger do not reserve added/removed time 
> populated by restoreState
> 
>
> Key: SOLR-13053
> URL: https://issues.apache.org/jira/browse/SOLR-13053
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Cao Manh Dat
>Assignee: Cao Manh Dat
>Priority: Minor
> Attachments: SOLR-13053.patch
>
>
> Currently, NodeAddedTrigger and NodeLostTrigger do not reserve added/removed 
> time populated by restoreState.
> I believe that this is the root cause of failures in 
> {{TestSimTriggerIntegration.testNodeLostTriggerRestoreState}} and 
> {{TestSimTriggerIntegration.testNodeAddedTriggerRestoreState}}. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-13053) NodeAddedTrigger and NodeLostTrigger do not reserve added/removed time populated by restoreState

2019-01-08 Thread Cao Manh Dat (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-13053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16737458#comment-16737458
 ] 

Cao Manh Dat commented on SOLR-13053:
-

Thanks @steve_rowe

> NodeAddedTrigger and NodeLostTrigger do not reserve added/removed time 
> populated by restoreState
> 
>
> Key: SOLR-13053
> URL: https://issues.apache.org/jira/browse/SOLR-13053
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Cao Manh Dat
>Assignee: Cao Manh Dat
>Priority: Minor
> Attachments: SOLR-13053.patch
>
>
> Currently, NodeAddedTrigger and NodeLostTrigger do not reserve added/removed 
> time populated by restoreState.
> I believe that this is the root cause of failures in 
> {{TestSimTriggerIntegration.testNodeLostTriggerRestoreState}} and 
> {{TestSimTriggerIntegration.testNodeAddedTriggerRestoreState}}. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-13053) NodeAddedTrigger and NodeLostTrigger do not reserve added/removed time populated by restoreState

2019-01-08 Thread Steve Rowe (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-13053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16737451#comment-16737451
 ] 

Steve Rowe commented on SOLR-13053:
---

[~caomanhdat], looks like you neglected to backport your commit today on this 
issue to branch_8x?

> NodeAddedTrigger and NodeLostTrigger do not reserve added/removed time 
> populated by restoreState
> 
>
> Key: SOLR-13053
> URL: https://issues.apache.org/jira/browse/SOLR-13053
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Cao Manh Dat
>Assignee: Cao Manh Dat
>Priority: Minor
> Attachments: SOLR-13053.patch
>
>
> Currently, NodeAddedTrigger and NodeLostTrigger do not reserve added/removed 
> time populated by restoreState.
> I believe that this is the root cause of failures in 
> {{TestSimTriggerIntegration.testNodeLostTriggerRestoreState}} and 
> {{TestSimTriggerIntegration.testNodeAddedTriggerRestoreState}}. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-12983) JavabinLoader should avoid creating String Objects and create UTF8CharSequence fields from byte[]

2019-01-08 Thread Steve Rowe (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16737443#comment-16737443
 ] 

Steve Rowe commented on SOLR-12983:
---

[~noble.paul], looks like you neglected to backport commits on this issue to 
branch_8x ?

> JavabinLoader should avoid creating String Objects and create 
> UTF8CharSequence  fields from byte[]
> --
>
> Key: SOLR-12983
> URL: https://issues.apache.org/jira/browse/SOLR-12983
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Noble Paul
>Priority: Major
> Attachments: SOLR-12983.patch
>
>
> Javabin strings already arrive as UTF-8 byte[]. String fields 
> can be created directly from those
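
As a minimal sketch of the idea (hypothetical, not Solr's actual implementation): 
a CharSequence that wraps the UTF-8 bytes and defers decoding until the text is 
actually needed, avoiding an eager String allocation per field value:
{code:java}
import java.nio.charset.StandardCharsets;

public class LazyUtf8CharSequence implements CharSequence {
  private final byte[] utf8;
  private String decoded;  // filled on first use

  public LazyUtf8CharSequence(byte[] utf8) { this.utf8 = utf8; }

  private String str() {
    if (decoded == null) decoded = new String(utf8, StandardCharsets.UTF_8);
    return decoded;
  }

  @Override public int length() { return str().length(); }
  @Override public char charAt(int index) { return str().charAt(index); }
  @Override public CharSequence subSequence(int start, int end) { return str().subSequence(start, end); }
  @Override public String toString() { return str(); }
}
{code}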



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (SOLR-12544) ZkStateReader can cache deleted collections and never refresh it

2019-01-08 Thread Varun Thacker (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-12544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Thacker resolved SOLR-12544.
--
Resolution: Fixed

I don't think so? I believe the user upgraded to a more recent version and 
didn't run into this.

> ZkStateReader can cache deleted collections and never refresh it
> 
>
> Key: SOLR-12544
> URL: https://issues.apache.org/jira/browse/SOLR-12544
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: 7.2.1
>Reporter: Varun Thacker
>Priority: Major
>
> After a delete collection call, CLUSTERSTATUS starts breaking with this error 
> permanently with this error 
> {code:java}
> org.apache.solr.common.SolrException: Error loading config name for 
> collection my_collection
> at 
> org.apache.solr.common.cloud.ZkStateReader.readConfigName(ZkStateReader.java:198)
> at 
> org.apache.solr.handler.admin.ClusterStatus.getClusterStatus(ClusterStatus.java:141)
> ...
> Caused by: org.apache.zookeeper.KeeperException$NoNodeException: 
> KeeperErrorCode = NoNode for /collections/my_collection
> ...{code}
> SOLR-10720 addresses the problem by skipping over the collection as it was 
> aimed to fix the  race condition between delete collection and CLUSTERSTATUS 
> being called.
>  
> The fact that we see the error never go away means there is another bug 
> lingering which will make the state never refresh, and thus other calls like 
> LIST will always show the collection. 
>  
> This happened with Solr 7.2.1 and doesn't happen very often. But when it does 
> the only solution is to restart the node. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8631) How Nori Tokenizer can deal with Longest-Matching

2019-01-08 Thread Jim Ferenczi (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16737430#comment-16737430
 ] 

Jim Ferenczi commented on LUCENE-8631:
--

Thanks for reporting [~gritmind]. Since we give the same cost to all user words 
I think the easiest way to solve this issue would be to implement a 
longest-only match in the user dictionary. We don't check the main dictionary 
when we have matches in the user dictionary so this should only ensure that the 
longest rule that matches wins. This should also speed up the tokenization 
since we'd add a single path in the lattice (instead of all user words that 
match). I'll work on a patch.
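
A minimal sketch of the longest-only selection idea (plain Java over hypothetical 
match lengths; not the actual Nori user-dictionary code):
{code:java}
import java.util.Arrays;
import java.util.List;

public class LongestMatchDemo {
  // Given the lengths of user-dictionary entries that match at the current
  // position, keep only the longest instead of adding a lattice path per match.
  static int longestOnly(List<Integer> matchLengths) {
    int longest = 0;
    for (int len : matchLengths) {
      longest = Math.max(longest, len);
    }
    return longest;
  }

  public static void main(String[] args) {
    // At position 0 of "골드브라운", entries 골드 (length 2) and 골드브라운
    // (length 5) both match; longest-only keeps 골드브라운.
    System.out.println(longestOnly(Arrays.asList(2, 5)));  // 5
  }
}
{code}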

> How Nori Tokenizer can deal with Longest-Matching
> -
>
> Key: LUCENE-8631
> URL: https://issues.apache.org/jira/browse/LUCENE-8631
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/analysis
>Reporter: Yeongsu Kim
>Priority: Major
>
>  
>  
> I think... Nori tokenizer has one issue.
>  
> I don’t understand why “Longest-Matching” is NOT working to Nori tokenizer 
> via config mode (config mode: 
> [https://www.elastic.co/guide/en/elasticsearch/plugins/6.x/analysis-nori-tokenizer.html).]
>  
> Here is an example for explaining what is longest-matching.
> Let's assume we have `userdict_ko.txt` containing only three Korean single words 
> such as ‘골드’, ‘브라운’, ‘골드브라운’, and save it to Nori tokenizer. After update, we 
> can see that it outputs two tokens such as ‘골드’ and ‘브라운’ when the input is 
> ‘골드브라운’. (In English: ‘골드’ means ‘gold’, ‘브라운’ means ‘brown’, and ‘골드브라운’ 
> means ‘goldbrown’)
>  
> With this result, we recognize that “Longest-Matching” is NOT working.
> If “Longest-Matching” is working, the output must be ‘골드브라운’, which is the 
> longest matching word in the user dictionary.
>  
> Curiously enough, when we add user dictionary via custom mode (custom mode: 
> [https://github.com/jimczi/nori/blob/master/how-to-custom-dict.asciidoc]),
>  we found the result is '골드브라운', where 'Longest-Matching' is applied. We 
> think the reason is that the trained MeCab engine automatically generates 
> word costs by its own criteria. We hope this mechanism is also applied to 
> config mode.
>  
> Would you tell me the way to “Longest-Matching” via config mode (not custom) 
> or give me some hints (e.g. where to modify source codes) to solve this 
> problem?
>  
> P.S
> Recently, I've mailed to Jim Ferenczi, who is a developer of Nori, and 
> received his suggestions:
> * Add a way to set a score to each new rule (this way you could set up a 
> negative cost for the compound word that is less than the sum of the two 
> single words.
> * Same as above but the cost is computed from the statistics of the training 
> (like the custom dictionary does when you recompile entirely).
> * Implement longest-match first in the dictionary.
>  
> Thanks for your support.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-13053) NodeAddedTrigger and NodeLostTrigger do not reserve added/removed time populated by restoreState

2019-01-08 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-13053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16737397#comment-16737397
 ] 

ASF subversion and git services commented on SOLR-13053:


Commit 88264633e0d61efdeeab0be77ad0506cf7d61d62 in lucene-solr's branch 
refs/heads/branch_7x from Cao Manh Dat
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=8826463 ]

SOLR-13053: Upgrade CHANGES.txt


> NodeAddedTrigger and NodeLostTrigger do not reserve added/removed time 
> populated by restoreState
> 
>
> Key: SOLR-13053
> URL: https://issues.apache.org/jira/browse/SOLR-13053
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Cao Manh Dat
>Assignee: Cao Manh Dat
>Priority: Minor
> Attachments: SOLR-13053.patch
>
>
> Currently, NodeAddedTrigger and NodeLostTrigger do not reserve added/removed 
> time populated by restoreState.
> I believe that this is the root cause of failures in 
> {{TestSimTriggerIntegration.testNodeLostTriggerRestoreState}} and 
> {{TestSimTriggerIntegration.testNodeAddedTriggerRestoreState}}. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8527) Upgrade JFlex to 1.7.0

2019-01-08 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16737386#comment-16737386
 ] 

ASF subversion and git services commented on LUCENE-8527:
-

Commit 283b19a8da6ab9e0b7e9a75b132d3067218d5502 in lucene-solr's branch 
refs/heads/master from Steven Rowe
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=283b19a ]

LUCENE-8527: Upgrade JFlex to 1.7.0. StandardTokenizer and 
UAX29URLEmailTokenizer now support Unicode 9.0, and provide UTS#51 v11.0 Emoji 
tokenization with the '<EMOJI>' token type.


> Upgrade JFlex to 1.7.0
> --
>
> Key: LUCENE-8527
> URL: https://issues.apache.org/jira/browse/LUCENE-8527
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: general/build, modules/analysis
>Reporter: Steve Rowe
>Assignee: Steve Rowe
>Priority: Minor
> Attachments: LUCENE-8527.patch, LUCENE-8527.patch, LUCENE-8527.patch
>
>
> JFlex 1.7.0, supporting Unicode 9.0, was released recently: 
> [http://jflex.de/changelog.html#jflex-1.7.0].  We should upgrade.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8527) Upgrade JFlex to 1.7.0

2019-01-08 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16737384#comment-16737384
 ] 

ASF subversion and git services commented on LUCENE-8527:
-

Commit e8c65da6bb8be626242cfba18989e497180e82aa in lucene-solr's branch 
refs/heads/branch_7x from Steven Rowe
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=e8c65da ]

LUCENE-8527: Upgrade JFlex to 1.7.0. StandardTokenizer and 
UAX29URLEmailTokenizer now support Unicode 9.0, and provide UTS#51 v11.0 Emoji 
tokenization with the '<EMOJI>' token type.


> Upgrade JFlex to 1.7.0
> --
>
> Key: LUCENE-8527
> URL: https://issues.apache.org/jira/browse/LUCENE-8527
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: general/build, modules/analysis
>Reporter: Steve Rowe
>Assignee: Steve Rowe
>Priority: Minor
> Attachments: LUCENE-8527.patch, LUCENE-8527.patch, LUCENE-8527.patch
>
>
> JFlex 1.7.0, supporting Unicode 9.0, was released recently: 
> [http://jflex.de/changelog.html#jflex-1.7.0].  We should upgrade.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-13053) NodeAddedTrigger and NodeLostTrigger do not reserve added/removed time populated by restoreState

2019-01-08 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-13053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16737395#comment-16737395
 ] 

ASF subversion and git services commented on SOLR-13053:


Commit 951b4e4c83756d2d5b8592168cdba828a9133ba3 in lucene-solr's branch 
refs/heads/master from Cao Manh Dat
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=951b4e4 ]

SOLR-13053: Upgrade CHANGES.txt


> NodeAddedTrigger and NodeLostTrigger do not reserve added/removed time 
> populated by restoreState
> 
>
> Key: SOLR-13053
> URL: https://issues.apache.org/jira/browse/SOLR-13053
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Cao Manh Dat
>Assignee: Cao Manh Dat
>Priority: Minor
> Attachments: SOLR-13053.patch
>
>
> Currently, NodeAddedTrigger and NodeLostTrigger do not reserve added/removed 
> time populated by restoreState.
> I believe that this is the root cause of failures in 
> {{TestSimTriggerIntegration.testNodeLostTriggerRestoreState}} and 
> {{TestSimTriggerIntegration.testNodeAddedTriggerRestoreState}}. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (LUCENE-6993) Update UAX29URLEmailTokenizer TLDs to latest list, and upgrade all JFlex-based tokenizers to support Unicode 8.0

2019-01-08 Thread Steve Rowe (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-6993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Rowe resolved LUCENE-6993.

Resolution: Duplicate
  Assignee: Steve Rowe  (was: Robert Muir)

Superseded by LUCENE-8527.

> Update UAX29URLEmailTokenizer TLDs to latest list, and upgrade all 
> JFlex-based tokenizers to support Unicode 8.0
> 
>
> Key: LUCENE-6993
> URL: https://issues.apache.org/jira/browse/LUCENE-6993
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/analysis
>Reporter: Mike Drob
>Assignee: Steve Rowe
>Priority: Major
> Attachments: LUCENE-6993.patch, LUCENE-6993.patch, LUCENE-6993.patch, 
> LUCENE-6993.patch, LUCENE-6993.patch, LUCENE-6993.patch, LUCENE-6993.patch, 
> LUCENE-6993.patch
>
>
> We did this once before in LUCENE-5357, but it might be time to update the 
> list of TLDs again. Comparing our old list with a new list indicates 800+ new 
> domains, so it would be nice to include them.
> Also the JFlex tokenizer grammars should be upgraded to support Unicode 8.0.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (LUCENE-8527) Upgrade JFlex to 1.7.0

2019-01-08 Thread Steve Rowe (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-8527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Rowe resolved LUCENE-8527.

   Resolution: Fixed
Fix Version/s: master (9.0)
   7.7
   8.0

> Upgrade JFlex to 1.7.0
> --
>
> Key: LUCENE-8527
> URL: https://issues.apache.org/jira/browse/LUCENE-8527
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: general/build, modules/analysis
>Reporter: Steve Rowe
>Assignee: Steve Rowe
>Priority: Minor
> Fix For: 8.0, 7.7, master (9.0)
>
> Attachments: LUCENE-8527.patch, LUCENE-8527.patch, LUCENE-8527.patch
>
>
> JFlex 1.7.0, supporting Unicode 9.0, was released recently: 
> [http://jflex.de/changelog.html#jflex-1.7.0].  We should upgrade.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8527) Upgrade JFlex to 1.7.0

2019-01-08 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16737385#comment-16737385
 ] 

ASF subversion and git services commented on LUCENE-8527:
-

Commit 0e903cab47e98c75d4fe0bb2a33a84e8f3c648ff in lucene-solr's branch 
refs/heads/branch_8x from Steven Rowe
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=0e903ca ]

LUCENE-8527: Upgrade JFlex to 1.7.0. StandardTokenizer and 
UAX29URLEmailTokenizer now support Unicode 9.0, and provide UTS#51 v11.0 Emoji 
tokenization with the '<EMOJI>' token type.


> Upgrade JFlex to 1.7.0
> --
>
> Key: LUCENE-8527
> URL: https://issues.apache.org/jira/browse/LUCENE-8527
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: general/build, modules/analysis
>Reporter: Steve Rowe
>Assignee: Steve Rowe
>Priority: Minor
> Attachments: LUCENE-8527.patch, LUCENE-8527.patch, LUCENE-8527.patch
>
>
> JFlex 1.7.0, supporting Unicode 9.0, was released recently: 
> [http://jflex.de/changelog.html#jflex-1.7.0].  We should upgrade.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8623) Decrease I/O pressure when merging high dimensional points

2019-01-08 Thread Lucene/Solr QA (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16737374#comment-16737374
 ] 

Lucene/Solr QA commented on LUCENE-8623:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
34s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
28s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
28s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} Release audit (RAT) {color} | 
{color:green}  0m 28s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} Check forbidden APIs {color} | 
{color:green}  0m 29s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} Validate source patterns {color} | 
{color:green}  0m 28s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 13m 
25s{color} | {color:green} core in the patch passed. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 16m 37s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | LUCENE-8623 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12954126/LUCENE-8623.patch |
| Optional Tests |  compile  javac  unit  ratsources  checkforbiddenapis  
validatesourcepatterns  |
| uname | Linux lucene1-us-west 4.4.0-137-generic #163~14.04.1-Ubuntu SMP Mon 
Sep 24 17:14:57 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | ant |
| Personality | 
/home/jenkins/jenkins-slave/workspace/PreCommit-LUCENE-Build/sourcedir/dev-tools/test-patch/lucene-solr-yetus-personality.sh
 |
| git revision | master / a37e2c6 |
| ant | version: Apache Ant(TM) version 1.9.3 compiled on July 24 2018 |
| Default Java | 1.8.0_191 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-LUCENE-Build/149/testReport/ |
| modules | C: lucene/core U: lucene/core |
| Console output | 
https://builds.apache.org/job/PreCommit-LUCENE-Build/149/console |
| Powered by | Apache Yetus 0.7.0   http://yetus.apache.org |


This message was automatically generated.



> Decrease I/O pressure when merging high dimensional points
> --
>
> Key: LUCENE-8623
> URL: https://issues.apache.org/jira/browse/LUCENE-8623
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Ignacio Vera
>Priority: Major
> Attachments: Geo3D.png, LUCENE-8623.patch, LUCENE-8623.patch, 
> LUCENE-8623.patch, LUCENE-8623.patch, LatLonPoint.png, LatLonShape.png
>
>
> Related to LUCENE-8619: after indexing 60 million shapes (~1.65 billion 
> triangles) using {{LatLonShape}}, the index directory grew to a size of 265 
> GB when performing merging of different segments. After the processes were 
> over the index size was 57 GB.
> As an example imagine we are merging several segments to a new segment of 
> size 10GB (4 dimensions). The BKD tree merging logic will create the 
> following files:
> 1) Level 0: 4 copies of the data, each one sorted by one dimension: 40GB
> 2) Level 1: 6 copies of half of the data, left and right : 30GB
> 3) Level 2: 6 copies of one quarter of the data, left and right : 15 GB
> 4) Level 3: 6 more copies halving the previous level, left and right : 7.5 GB
> 5) Level 4: 6 more copies halving the previous level, left and right : 3.75 GB
>  
> and so on... So it requires around 100GB to merge that segment. 
> In this issue is proposed to delay the creation of sorted copies to when they 
> are needed. It reduces the total size required to half of what it is needed 
> now. 
>  
>  
>  
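To make that arithmetic concrete, here is a back-of-the-envelope sketch (an 
illustration only, not Lucene code) that sums the series for the 10 GB, 
4-dimension example:

{code:java}
// Rough estimate of peak temporary disk usage for the merge pattern above.
public class BkdMergeScratchEstimate {
  public static void main(String[] args) {
    double segmentGb = 10.0;        // size of the merged segment
    double total = 4 * segmentGb;   // level 0: one sorted copy per dimension = 40 GB
    double level = 3 * segmentGb;   // level 1: 6 copies of half the data = 30 GB
    while (level > 0.01) {          // each further level halves the previous one
      total += level;
      level /= 2;
    }
    System.out.printf("~%.0f GB of scratch space%n", total); // prints ~100 GB
  }
}
{code}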



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-13072) Management of markers for nodeLost / nodeAdded events is broken

2019-01-08 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-13072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16737358#comment-16737358
 ] 

ASF subversion and git services commented on SOLR-13072:


Commit 7db4121b4553568108e1cf91e82c68fc55b6e9f4 in lucene-solr's branch 
refs/heads/master from Andrzej Bialecki
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=7db4121 ]

SOLR-13072: Use the same wait in other simulated tests where the same race 
condition may occur.


> Management of markers for nodeLost / nodeAdded events is broken
> ---
>
> Key: SOLR-13072
> URL: https://issues.apache.org/jira/browse/SOLR-13072
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: AutoScaling
>Affects Versions: 7.5, 7.6, 8.0
>Reporter: Andrzej Bialecki 
>Assignee: Andrzej Bialecki 
>Priority: Major
> Fix For: 8.0, 7.7
>
>
> In order to prevent {{nodeLost}} events from being lost when the Overseer 
> leader is itself the node that was lost, a mechanism was added to record 
> markers for these events by any other live node, in 
> {{ZkController.registerLiveNodesListener()}}. A similar mechanism also 
> exists for {{nodeAdded}} events.
> On Overseer leader restart, if the autoscaling configuration didn't contain 
> any triggers that consume {{nodeLost}} events then these markers are removed. 
> If there are 1 or more trigger configs that consume {{nodeLost}} events then 
> these triggers would read the markers, remove them and generate appropriate 
> events.
> However, as the {{NodeMarkersRegistrationTest}} shows, this mechanism is 
> broken and susceptible to race conditions.
> It's not unusual to have more than 1 {{nodeLost}} trigger because, in addition 
> to any user-defined triggers, there's always one that is automatically defined 
> if missing: {{.auto_add_replicas}}. However, if there's more than 1 
> {{nodeLost}} trigger then the process of consuming and removing the markers 
> becomes non-deterministic - each trigger may pick up (and delete) all, none, 
> or some of the markers.
> So, as it stands, this mechanism is broken if more than 1 {{nodeLost}} or more 
> than 1 {{nodeAdded}} trigger is defined.
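To see why consumption is non-deterministic, here is a toy sketch (plain Java 
threads standing in for triggers; not actual Solr or ZooKeeper code) in which 
two triggers race to consume a shared set of markers. Which trigger "fires" 
for a given marker depends purely on thread timing:

{code:java}
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Two "triggers" read the current markers and delete whatever they saw first.
public class MarkerRaceDemo {
  public static void main(String[] args) throws InterruptedException {
    Set<String> markers = ConcurrentHashMap.newKeySet();
    markers.add("nodeLost-nodeA");
    markers.add("nodeLost-nodeB");
    Runnable trigger = () -> {
      for (String m : markers) {      // read the markers...
        if (markers.remove(m)) {      // ...then delete: not atomic across triggers
          System.out.println(Thread.currentThread().getName() + " fired for " + m);
        }
      }
    };
    Thread t1 = new Thread(trigger, ".auto_add_replicas");
    Thread t2 = new Thread(trigger, "user_defined_trigger");
    t1.start(); t2.start();
    t1.join(); t2.join();             // output varies from run to run
  }
}
{code}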



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[JENKINS] Lucene-Solr-master-Linux (64bit/jdk-10.0.1) - Build # 23490 - Unstable!

2019-01-08 Thread Policeman Jenkins Server
Build: https://jenkins.thetaphi.de/job/Lucene-Solr-master-Linux/23490/
Java: 64bit/jdk-10.0.1 -XX:-UseCompressedOops -XX:+UseParallelGC

3 tests failed.
FAILED:  
junit.framework.TestSuite.org.apache.solr.cloud.TestCloudSearcherWarming

Error Message:
5 threads leaked from SUITE scope at 
org.apache.solr.cloud.TestCloudSearcherWarming: 1) Thread[id=24114, 
name=ProcessThread(sid:0 cport:46829):, state=WAITING, 
group=TGRP-TestCloudSearcherWarming] at 
java.base@10.0.1/jdk.internal.misc.Unsafe.park(Native Method) at 
java.base@10.0.1/java.util.concurrent.locks.LockSupport.park(LockSupport.java:194)
 at 
java.base@10.0.1/java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2075)
 at 
java.base@10.0.1/java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:435)
 at 
app//org.apache.zookeeper.server.PrepRequestProcessor.run(PrepRequestProcessor.java:123)
2) Thread[id=24111, name=NIOServerCxn.Factory:0.0.0.0/0.0.0.0:0, 
state=RUNNABLE, group=TGRP-TestCloudSearcherWarming] at 
java.base@10.0.1/sun.nio.ch.EPollArrayWrapper.epollWait(Native Method) 
at 
java.base@10.0.1/sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:265)  
   at 
java.base@10.0.1/sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:92)
 at 
java.base@10.0.1/sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:89)  
   at 
java.base@10.0.1/sun.nio.ch.SelectorImpl.select(SelectorImpl.java:100) 
at 
app//org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:196)
 at java.base@10.0.1/java.lang.Thread.run(Thread.java:844)3) 
Thread[id=24112, name=SessionTracker, state=TIMED_WAITING, 
group=TGRP-TestCloudSearcherWarming] at 
java.base@10.0.1/java.lang.Object.wait(Native Method) at 
app//org.apache.zookeeper.server.SessionTrackerImpl.run(SessionTrackerImpl.java:147)
4) Thread[id=24113, name=SyncThread:0, state=WAITING, 
group=TGRP-TestCloudSearcherWarming] at 
java.base@10.0.1/jdk.internal.misc.Unsafe.park(Native Method) at 
java.base@10.0.1/java.util.concurrent.locks.LockSupport.park(LockSupport.java:194)
 at 
java.base@10.0.1/java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2075)
 at 
java.base@10.0.1/java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:435)
 at 
app//org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:127)
5) Thread[id=24110, name=ZkTestServer Run Thread, state=WAITING, 
group=TGRP-TestCloudSearcherWarming] at 
java.base@10.0.1/java.lang.Object.wait(Native Method) at 
java.base@10.0.1/java.lang.Thread.join(Thread.java:1353) at 
java.base@10.0.1/java.lang.Thread.join(Thread.java:1427) at 
app//org.apache.zookeeper.server.NIOServerCnxnFactory.join(NIOServerCnxnFactory.java:313)
 at 
app//org.apache.solr.cloud.ZkTestServer$ZKServerMain.runFromConfig(ZkTestServer.java:343)
 at app//org.apache.solr.cloud.ZkTestServer$2.run(ZkTestServer.java:564)

Stack Trace:
com.carrotsearch.randomizedtesting.ThreadLeakError: 5 threads leaked from SUITE 
scope at org.apache.solr.cloud.TestCloudSearcherWarming: 
   1) Thread[id=24114, name=ProcessThread(sid:0 cport:46829):, state=WAITING, 
group=TGRP-TestCloudSearcherWarming]
at java.base@10.0.1/jdk.internal.misc.Unsafe.park(Native Method)
at 
java.base@10.0.1/java.util.concurrent.locks.LockSupport.park(LockSupport.java:194)
at 
java.base@10.0.1/java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2075)
at 
java.base@10.0.1/java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:435)
at 
app//org.apache.zookeeper.server.PrepRequestProcessor.run(PrepRequestProcessor.java:123)
   2) Thread[id=24111, name=NIOServerCxn.Factory:0.0.0.0/0.0.0.0:0, 
state=RUNNABLE, group=TGRP-TestCloudSearcherWarming]
at java.base@10.0.1/sun.nio.ch.EPollArrayWrapper.epollWait(Native 
Method)
at 
java.base@10.0.1/sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:265)
at 
java.base@10.0.1/sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:92)
at 
java.base@10.0.1/sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:89)
at 
java.base@10.0.1/sun.nio.ch.SelectorImpl.select(SelectorImpl.java:100)
at 
app//org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:196)
at java.base@10.0.1/java.lang.Thread.run(Thread.java:844)
   3) Thread[id=24112, name=SessionTracker, state=TIMED_WAITING, 
group=TGRP-TestCloudSearcherWarming]
at java.base@10.0.1/java.lang.Object.wait(Native Method)
at 
app//org.apache.zookeeper.server.SessionTrackerImpl.run(SessionTrackerImpl.java:147)
   4) 

[jira] [Commented] (SOLR-13118) Redesign integration tests for nodeAdded/nodeLost trigger state restoration

2019-01-08 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-13118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16737332#comment-16737332
 ] 

ASF subversion and git services commented on SOLR-13118:


Commit 612a1d029f69c268761306891c80e405341979e7 in lucene-solr's branch 
refs/heads/branch_7x from Chris M. Hostetter
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=612a1d0 ]

SOLR-13118: Fix various nodeAdded/nodeLost trigger (integration) tests related 
to restoring state

This includes some cleanup and refactoring of unrelated test methods in the 
same classes to use new helper methods

(cherry picked from commit 5a513fab8345cd0397435e7ce830268cd3763651)

Conflicts:

solr/core/src/test/org/apache/solr/cloud/autoscaling/sim/SimSolrCloudTestCase.java


> Redesign integration tests for nodeAdded/nodeLost trigger state restoration
> ---
>
> Key: SOLR-13118
> URL: https://issues.apache.org/jira/browse/SOLR-13118
> Project: Solr
>  Issue Type: Sub-task
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Hoss Man
>Assignee: Hoss Man
>Priority: Major
> Attachments: SOLR-13118.patch
>
>
> The (integration) tests related to autoscaling nodeAdd/nodeLost triggers and 
> the restoration of their state are problematic for a lot of reasons.
> Beyond some silly implementation mistakes, a fundamental timing/concurrency 
> issue is that (as designed) the tests have no way to ensure that, "after" 
> creating a nodeAdded/nodeLost situation, they can wait for the (first 
> instance of) the trigger to run() and detect the situation (recording it in 
> the trigger's internal state) so that the test can subsequently "update" the 
> trigger, forcing a new instance to restore the old state and then execute the 
> trigger actions.  This can result in a lot of flakiness if the triggers 
> don't run when "expected".



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[JENKINS] Lucene-Solr-master-Windows (64bit/jdk-9.0.4) - Build # 7684 - Still Unstable!

2019-01-08 Thread Policeman Jenkins Server
Build: https://jenkins.thetaphi.de/job/Lucene-Solr-master-Windows/7684/
Java: 64bit/jdk-9.0.4 -XX:+UseCompressedOops -XX:+UseG1GC

7 tests failed.
FAILED:  org.apache.solr.handler.TestSQLHandler.doTest

Error Message:
--> http://127.0.0.1:53054/collection1_shard2_replica_n1:Failed to execute 
sqlQuery 'select id, field_i, str_s, field_i_p, field_f_p, field_d_p, field_l_p 
from collection1 where (text='()' OR text='') AND text='' order by 
field_i desc' against JDBC connection 'jdbc:calcitesolr:'. Error while 
executing SQL "select id, field_i, str_s, field_i_p, field_f_p, field_d_p, 
field_l_p from collection1 where (text='()' OR text='') AND text='' 
order by field_i desc": java.io.IOException: 
java.util.concurrent.ExecutionException: java.io.IOException: --> 
http://127.0.0.1:53092/collection1_shard2_replica_n6/:can not sort on a field 
w/o docValues unless it is indexed=true uninvertible=true and the type supports 
Uninversion: field_i

Stack Trace:
java.io.IOException: --> 
http://127.0.0.1:53054/collection1_shard2_replica_n1:Failed to execute sqlQuery 
'select id, field_i, str_s, field_i_p, field_f_p, field_d_p, field_l_p from 
collection1 where (text='()' OR text='') AND text='' order by 
field_i desc' against JDBC connection 'jdbc:calcitesolr:'.
Error while executing SQL "select id, field_i, str_s, field_i_p, field_f_p, 
field_d_p, field_l_p from collection1 where (text='()' OR text='') AND 
text='' order by field_i desc": java.io.IOException: 
java.util.concurrent.ExecutionException: java.io.IOException: --> 
http://127.0.0.1:53092/collection1_shard2_replica_n6/:can not sort on a field 
w/o docValues unless it is indexed=true uninvertible=true and the type supports 
Uninversion: field_i
at 
__randomizedtesting.SeedInfo.seed([4AC9482A0FE6E998:ED8DF08E625DFA21]:0)
at 
org.apache.solr.client.solrj.io.stream.SolrStream.read(SolrStream.java:215)
at 
org.apache.solr.handler.TestSQLHandler.getTuples(TestSQLHandler.java:2617)
at 
org.apache.solr.handler.TestSQLHandler.testBasicSelect(TestSQLHandler.java:145)
at org.apache.solr.handler.TestSQLHandler.doTest(TestSQLHandler.java:93)
at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:564)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1750)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:938)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:974)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:988)
at 
org.apache.solr.BaseDistributedSearchTestCase$ShardsRepeatRule$ShardsFixedStatement.callStatement(BaseDistributedSearchTestCase.java:1070)
at 
org.apache.solr.BaseDistributedSearchTestCase$ShardsRepeatRule$ShardsStatement.evaluate(BaseDistributedSearchTestCase.java:1042)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57)
at 
org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:49)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
at 
org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48)
at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:368)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:817)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:468)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:947)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:832)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:883)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:894)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 

[jira] [Commented] (SOLR-13118) Redesign integration tests for nodeAdded/nodeLost trigger state restoration

2019-01-08 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-13118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16737325#comment-16737325
 ] 

ASF subversion and git services commented on SOLR-13118:


Commit 5a60c3e0db26ec3dba119d6a6a44facf2089c77d in lucene-solr's branch 
refs/heads/branch_8x from Chris M. Hostetter
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=5a60c3e ]

SOLR-13118: Fix various nodeAdded/nodeLost trigger (integration) tests related 
to restoring state

This includes some cleanup and refactoring of unrelated test methods in the 
same classes to use new helper methods

(cherry picked from commit 5a513fab8345cd0397435e7ce830268cd3763651)


> Redesign integration tests for nodeAdded/nodeLost trigger state restoration
> ---
>
> Key: SOLR-13118
> URL: https://issues.apache.org/jira/browse/SOLR-13118
> Project: Solr
>  Issue Type: Sub-task
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Hoss Man
>Assignee: Hoss Man
>Priority: Major
> Attachments: SOLR-13118.patch
>
>
> The (integration) tests related to autoscaling nodeAdd/nodeLost triggers and 
> the restoration of their state are problematic for a lot of reasons.
> Beyond some silly implementation mistakes, a fundamental timing/concurrency 
> issue is that (as designed) the tests have no way to ensure that, "after" 
> creating a nodeAdded/nodeLost situation, they can wait for the (first 
> instance of) the trigger to run() and detect the situation (recording it in 
> the trigger's internal state) so that the test can subsequently "update" the 
> trigger, forcing a new instance to restore the old state and then execute the 
> trigger actions.  This can result in a lot of flakiness if the triggers 
> don't run when "expected".



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-13101) Shared storage support in SolrCloud

2019-01-08 Thread JIRA


[ 
https://issues.apache.org/jira/browse/SOLR-13101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16737275#comment-16737275
 ] 

Olivér Szabó commented on SOLR-13101:
-

[~ysee...@gmail.com], FYI, I had a POC with these, see:
https://github.com/oleewere/solr-cloud-storage-poc
https://github.com/apache/ambari-infra/tree/cloud-storage-poc (custom Solr 
build based on the Solr tarball)
(It used the HDFS client and only worked on real environments, but it included 
localstack and a GCS emulator as containers; the s3a setup can actually work 
against localstack, but that one is broken.)
Some notes:
- I replaced the Hadoop jars with custom HWX ones (the 2.7.x builds contain 
some classes that are not in the Apache Maven repo ones).
- s3n looked good; s3a seems to be broken and would require some changes in 
the aws-sdk (it requires using a shared connection pool, which can be set on 
the HTTP client).
- wasb/wasbs looked good.
- adlsV2 had some SSL-related issues (although it did not use SSL) - some 
cipher problems. I used Solr with JDK 10 in Docker; maybe that caused some 
issues.
- The GCS connector uses Guava 27 while Solr uses something like 14, which 
results in a ClassDefNotFound exception while loading the GCS fs 
implementation. Maybe that can be solved by updating to a newer Guava or 
shading the gcs-connector jar with its dependencies.

From what I have seen so far, I could create shards and then add documents as 
well. Interestingly, a simple delete query only deleted around 40% of the 
documents (then the request failed).
Also, after stopping the Solr containers, write.lock files need to be deleted 
from cloud storage; it would be nice to have an option to delete those on 
startup (not sure whether Solr already has this or not).

> Shared storage support in SolrCloud
> ---
>
> Key: SOLR-13101
> URL: https://issues.apache.org/jira/browse/SOLR-13101
> Project: Solr
>  Issue Type: New Feature
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: SolrCloud
>Reporter: Yonik Seeley
>Priority: Major
>
> Solr should have first-class support for shared storage (blob/object stores 
> like S3, google cloud storage, etc. and shared filesystems like HDFS, NFS, 
> etc).
> The key component will likely be a new replica type for shared storage.  It 
> would have many of the benefits of the current "pull" replicas (not indexing 
> on all replicas, all shards identical with no shards getting out-of-sync, 
> etc), but would have additional benefits:
>  - Any shard could become leader (the blob store always has the index)
>  - Better elasticity scaling down
>- durability not linked to the number of replicas... a single replica could 
> be common for write workloads
>- could drop to 0 replicas for a shard when not needed (blob store always 
> has index)
>  - Allow for higher performance write workloads by skipping the transaction 
> log
>- don't pay for what you don't need
>- a commit will be necessary to flush to stable storage (blob store)
>  - A lot of the complexity and failure modes go away
> An additional component is a Directory implementation that will work well 
> with blob stores.  We probably want one that treats local disk as a cache, 
> since the latency to remote storage is so large (see the read-through sketch 
> below).  I think there are still some "locking" issues to be solved here 
> (ensuring that more than one writer to the same index won't corrupt it).  
> This should probably be pulled out into a different JIRA issue.
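As a rough sketch of the "local disk as a cache" idea (a minimal read-through 
cache, not a real Lucene Directory; the blob-store fetch below is a 
placeholder function):

{code:java}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.function.Function;

// Serve index files from local disk when present; otherwise fetch from the
// blob store and keep a local copy for later reads.
public class LocalCacheReader {
  private final Path cacheDir;
  private final Function<String, byte[]> fetchFromBlobStore; // assumed remote reader

  LocalCacheReader(Path cacheDir, Function<String, byte[]> fetch) {
    this.cacheDir = cacheDir;
    this.fetchFromBlobStore = fetch;
  }

  byte[] read(String fileName) throws IOException {
    Path local = cacheDir.resolve(fileName);
    if (Files.exists(local)) {
      return Files.readAllBytes(local);               // cache hit: local disk
    }
    byte[] data = fetchFromBlobStore.apply(fileName); // cache miss: remote fetch
    Files.write(local, data);                         // populate the local cache
    return data;
  }

  public static void main(String[] args) throws Exception {
    Path dir = Files.createTempDirectory("blob-cache");
    LocalCacheReader r = new LocalCacheReader(dir, name -> ("remote:" + name).getBytes());
    System.out.println(new String(r.read("_0.cfs"))); // first read fetches remotely
    System.out.println(new String(r.read("_0.cfs"))); // second read hits the cache
  }
}
{code}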



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[JENKINS-MAVEN] Lucene-Solr-Maven-master #2459: POMs out of sync

2019-01-08 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Lucene-Solr-Maven-master/2459/

No tests ran.

Build Log:
[...truncated 19678 lines...]
BUILD FAILED
/x1/jenkins/jenkins-slave/workspace/Lucene-Solr-Maven-master/build.xml:672: The 
following error occurred while executing this line:
/x1/jenkins/jenkins-slave/workspace/Lucene-Solr-Maven-master/build.xml:209: The 
following error occurred while executing this line:
/x1/jenkins/jenkins-slave/workspace/Lucene-Solr-Maven-master/lucene/build.xml:411:
 The following error occurred while executing this line:
/x1/jenkins/jenkins-slave/workspace/Lucene-Solr-Maven-master/lucene/common-build.xml:2268:
 The following error occurred while executing this line:
/x1/jenkins/jenkins-slave/workspace/Lucene-Solr-Maven-master/lucene/common-build.xml:1726:
 The following error occurred while executing this line:
/x1/jenkins/jenkins-slave/workspace/Lucene-Solr-Maven-master/lucene/common-build.xml:657:
 Error deploying artifact 'org.apache.lucene:lucene-misc:jar': Error deploying 
artifact: Error transferring file

Total time: 9 minutes 50 seconds
Build step 'Invoke Ant' marked build as failure
Email was triggered for: Failure - Any
Sending email for trigger: Failure - Any

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (SOLR-12544) ZkStateReader can cache deleted collections and never refresh it

2019-01-08 Thread Erick Erickson (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16737260#comment-16737260
 ] 

Erick Erickson commented on SOLR-12544:
---

[~varunthacker] Can this be closed? Was it included in other JIRAs?

> ZkStateReader can cache deleted collections and never refresh it
> 
>
> Key: SOLR-12544
> URL: https://issues.apache.org/jira/browse/SOLR-12544
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: 7.2.1
>Reporter: Varun Thacker
>Priority: Major
>
> After a delete collection call, CLUSTERSTATUS starts breaking permanently 
> with this error 
> {code:java}
> org.apache.solr.common.SolrException: Error loading config name for 
> collection my_collection
> at 
> org.apache.solr.common.cloud.ZkStateReader.readConfigName(ZkStateReader.java:198)
> at 
> org.apache.solr.handler.admin.ClusterStatus.getClusterStatus(ClusterStatus.java:141)
> ...
> Caused by: org.apache.zookeeper.KeeperException$NoNodeException: 
> KeeperErrorCode = NoNode for /collections/my_collection
> ...{code}
> SOLR-10720 addresses the problem by skipping over the collection, as it was 
> aimed at fixing the race condition between delete collection and 
> CLUSTERSTATUS being called.
>  
> The fact that the error never goes away means there is another bug lingering 
> which makes the state never refresh, and thus other calls like LIST will 
> always show the collection. 
>  
> This happened with Solr 7.2.1 and doesn't happen very often. But when it does 
> the only solution is to restart the node. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-13072) Management of markers for nodeLost / nodeAdded events is broken

2019-01-08 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-13072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16737227#comment-16737227
 ] 

ASF subversion and git services commented on SOLR-13072:


Commit a37e2c609cb26dfffa5b88f8a6b3afa2711880a5 in lucene-solr's branch 
refs/heads/master from Andrzej Bialecki
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=a37e2c6 ]

SOLR-13072: Wait for autoscaling config refresh to finish before modifying the 
cluster
and enable the tests for now.


> Management of markers for nodeLost / nodeAdded events is broken
> ---
>
> Key: SOLR-13072
> URL: https://issues.apache.org/jira/browse/SOLR-13072
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: AutoScaling
>Affects Versions: 7.5, 7.6, 8.0
>Reporter: Andrzej Bialecki 
>Assignee: Andrzej Bialecki 
>Priority: Major
> Fix For: 8.0, 7.7
>
>
> In order to prevent {{nodeLost}} events from being lost when the Overseer 
> leader is itself the node that was lost, a mechanism was added to record 
> markers for these events by any other live node, in 
> {{ZkController.registerLiveNodesListener()}}. A similar mechanism also 
> exists for {{nodeAdded}} events.
> On Overseer leader restart, if the autoscaling configuration didn't contain 
> any triggers that consume {{nodeLost}} events then these markers are removed. 
> If there are 1 or more trigger configs that consume {{nodeLost}} events then 
> these triggers would read the markers, remove them and generate appropriate 
> events.
> However, as the {{NodeMarkersRegistrationTest}} shows, this mechanism is 
> broken and susceptible to race conditions.
> It's not unusual to have more than 1 {{nodeLost}} trigger because, in addition 
> to any user-defined triggers, there's always one that is automatically defined 
> if missing: {{.auto_add_replicas}}. However, if there's more than 1 
> {{nodeLost}} trigger then the process of consuming and removing the markers 
> becomes non-deterministic - each trigger may pick up (and delete) all, none, 
> or some of the markers.
> So, as it stands, this mechanism is broken if more than 1 {{nodeLost}} or more 
> than 1 {{nodeAdded}} trigger is defined.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-13116) Add Admin UI login support for Kerberos

2019-01-08 Thread Jason Gerlowski (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-13116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16737238#comment-16737238
 ] 

Jason Gerlowski commented on SOLR-13116:


Thanks for the pointers Kevin; will check them out.

[~janhoy]  I reproduced again this morning and saw the following error in my 
browser's web console.  I'm not familiar enough with how the login page is 
implemented to tell if it's helpful.  But hopefully you find it enlightening:
{code}
Error: wwwHeader is null
@http://solr1:8983/solr/js/angular/controllers/login.js:31:11
invoke@http://solr1:8983/solr/libs/angular.js:4205:14
instantiate@http://solr1:8983/solr/libs/angular.js:4213:27
$ControllerProvider/this.$gethttp://solr1:8983/solr/libs/angular.js:8472:18
link@http://solr1:8983/solr/libs/angular-route.min.js:30:268
invokeLinkFn@http://solr1:8983/solr/libs/angular.js:8236:9
nodeLinkFn@http://solr1:8983/solr/libs/angular.js:7745:11
compositeLinkFn@http://solr1:8983/solr/libs/angular.js:7098:13
publicLinkFn@http://solr1:8983/solr/libs/angular.js:6977:30
boundTranscludeFn@http://solr1:8983/solr/libs/angular.js:7116:16
controllersBoundTransclude@http://solr1:8983/solr/libs/angular.js:7772:18
x@http://solr1:8983/solr/libs/angular-route.min.js:29:364
$broadcast@http://solr1:8983/solr/libs/angular.js:14725:15
m/<@http://solr1:8983/solr/libs/angular-route.min.js:34:426
processQueue@http://solr1:8983/solr/libs/angular.js:13193:27
scheduleProcessQueue/<@http://solr1:8983/solr/libs/angular.js:13209:27
$eval@http://solr1:8983/solr/libs/angular.js:14406:16
$digest@http://solr1:8983/solr/libs/angular.js:14222:15
$apply@http://solr1:8983/solr/libs/angular.js:14511:13
done@http://solr1:8983/solr/libs/angular.js:9669:36
completeRequest@http://solr1:8983/solr/libs/angular.js:9859:7
requestLoaded@http://solr1:8983/solr/libs/angular.js:9800:9
 
{code}

There's nothing that appears relevant in {{solr.log}}.

As for why your kinit command just hung, I've got a guess.  Docker on Linux 
allows the host machine to reach Docker containers by IP address.  But Docker 
on Mac 
[doesn't|https://docs.docker.com/docker-for-mac/networking/#per-container-ip-addressing-is-not-possible].
  Since running {{kinit}} on the host machine (your MacBook) has it try to talk 
to the Kerberos KDC server by IP address, {{kinit}} just hangs because it can't 
route to the Docker container hosting the KDC.  That's my theory at least.  If 
you give it a shot on a Linux box, I bet it'll work for you.

Anyway, hopefully you can reproduce it on your own.  But if you still can't 
reproduce, or want to double-check that a fix works, I'm happy to run the 
reproduction again.

> Add Admin UI login support for Kerberos
> ---
>
> Key: SOLR-13116
> URL: https://issues.apache.org/jira/browse/SOLR-13116
> Project: Solr
>  Issue Type: New Feature
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: Admin UI
>Affects Versions: 8.0, 7.7
>Reporter: Jan Høydahl
>Priority: Major
> Attachments: eventual_auth.png
>
>
> Spinoff from SOLR-7896. Kerberos auth plugin should get Admin UI Login 
> support.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-12304) Interesting Terms parameter is ignored by MLT Component

2019-01-08 Thread Alessandro Benedetti (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16737213#comment-16737213
 ] 

Alessandro Benedetti commented on SOLR-12304:
-

[~dsmiley] any feedback on the latest messages? I would be happy to help, but 
it seems this issue got forgotten.
Should we proceed with the deprecation path?

Or just keep the component and handler for backward compatibility?

> Interesting Terms parameter is ignored by MLT Component
> ---
>
> Key: SOLR-12304
> URL: https://issues.apache.org/jira/browse/SOLR-12304
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: MoreLikeThis
>Affects Versions: 7.2
>Reporter: Alessandro Benedetti
>Priority: Major
> Attachments: SOLR-12304.patch, SOLR-12304.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently the More Like This component just ignores the mlt.InterestingTerms 
> parameter (which is usable by the MoreLikeThisHandler).
> The scope of this issue is to fix the bug and add related tests (which will 
> succeed after the fix).
> *N.B.* MoreLikeThisComponent and MoreLikeThisHandler are tightly coupled, and 
> the tests for the MoreLikeThisHandler intersect the MoreLikeThisComponent 
> ones.
>  Any consideration or refactoring of that is out of scope for this issue.
>  Other issues will follow.
> *N.B.* Out of scope for this issue is the distributed case, which is much 
> more complicated and requires much deeper investigation.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (SOLR-12613) Rename "Cloud" tab as "Cluster" in Admin UI

2019-01-08 Thread Jason Gerlowski (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16737185#comment-16737185
 ] 

Jason Gerlowski edited comment on SOLR-12613 at 1/8/19 2:51 PM:


Why not both?  I think there's general consensus that we would love to improve 
the UI in larger ways, but any larger effort is bound to take longer to get 
going (particularly when few committers are familiar with the UI).  If renaming 
this menu tab helps our users in the interim, and there's going to be at least 
one release before a broader effort might address this, I think people should 
feel welcome to take it on if they've got time.


was (Author: gerlowskija):
Why not both?  I think there's general consensus that we would love to improve 
the UI in larger ways, but any larger effort is bound to take longer to get 
going (particularly when few committers are familiar with the UI).  If renaming 
this menu tab helps our users in the interim, and there's going to be at least 
one release before a broader effort might address this, I think people should 
feel welcome to take it on.

> Rename "Cloud" tab as "Cluster" in Admin UI
> ---
>
> Key: SOLR-12613
> URL: https://issues.apache.org/jira/browse/SOLR-12613
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: Admin UI
>Reporter: Jan Høydahl
>Priority: Major
>  Labels: newdev
> Fix For: 8.0
>
>
> Spinoff from SOLR-8207. When adding more cluster-wide functionality to the 
> Admin UI, it feels better to name the "Cloud" UI tab as "Cluster".
> In addition to renaming the "Cloud" tab, we should also change the URL part 
> from {{~cloud}} to {{~cluster}}, update reference guide page names, 
> screenshots and references etc.
> I propose that this change not be introduced in 7.x due to the impact, so I 
> tagged it as fix-version 8.0.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-6968) LSH Filter

2019-01-08 Thread Andy Hind (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-6968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16737192#comment-16737192
 ] 

Andy Hind commented on LUCENE-6968:
---

[~mayyas] Hi Mayya, there is a good review paper here: 
[https://arxiv.org/pdf/1408.2927.pdf]. See sections 3.5.1 and 3.5.2 and 
related references. I have not found the specific comment about bias I was 
trying to locate.

The handwaving view is that empty or missing hashes are biased for many-to-many 
comparisons. It is difficult to tune the hash parameters for a wide mix of doc 
sizes, and small documents in particular, as the number of hashes increases 
with doc size over some range. It is better to have some value rather than 
none. There is an argument about what value should be used, but that is less 
important. Repetition is one way of filling in gaps and making the hash count 
consistent. For two small docs, there is going to be a bit of asymmetry in the 
measure whatever you do. In some cases, like containment, the bias may be a 
good thing :)

Apologies for my slow response.

> LSH Filter
> --
>
> Key: LUCENE-6968
> URL: https://issues.apache.org/jira/browse/LUCENE-6968
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/analysis
>Reporter: Cao Manh Dat
>Assignee: Tommaso Teofili
>Priority: Major
> Fix For: 6.2, 7.0
>
> Attachments: LUCENE-6968.4.patch, LUCENE-6968.5.patch, 
> LUCENE-6968.6.patch, LUCENE-6968.patch, LUCENE-6968.patch, LUCENE-6968.patch
>
>
> I'm planning to implement LSH, which supports queries like this:
> {quote}
> Find similar documents that have a 0.8 or higher similarity score with a 
> given document. The similarity measurement can be cosine, Jaccard, 
> Euclidean...
> {quote}
> For example, given the following corpus:
> {quote}
> 1. Solr is an open source search engine based on Lucene
> 2. Solr is an open source enterprise search engine based on Lucene
> 3. Solr is a popular open source enterprise search engine based on Lucene
> 4. Apache Lucene is a high-performance, full-featured text search engine 
> library written entirely in Java
> {quote}
> we want to find documents that have a 0.6 Jaccard score with this doc:
> {quote}
> Solr is an open source search engine
> {quote}
> It will return only docs 1, 2 and 3 (MoreLikeThis will also return doc 4). 
> A worked example of the measure is sketched below.
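For concreteness, a minimal sketch of the Jaccard measure over token sets 
(plain Java, not part of the attached patches), checking the query doc against 
doc 1:

{code:java}
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Jaccard similarity: |A intersect B| / |A union B| over token sets.
public class JaccardDemo {
  static double jaccard(Set<String> a, Set<String> b) {
    Set<String> inter = new HashSet<>(a); inter.retainAll(b);
    Set<String> union = new HashSet<>(a); union.addAll(b);
    return (double) inter.size() / union.size();
  }

  public static void main(String[] args) {
    Set<String> q  = new HashSet<>(Arrays.asList(
        "solr is an open source search engine".split(" ")));
    Set<String> d1 = new HashSet<>(Arrays.asList(
        "solr is an open source search engine based on lucene".split(" ")));
    System.out.println(jaccard(q, d1)); // 0.7: above the 0.6 threshold, doc 1 matches
  }
}
{code}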



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-13128) Upgrade to Apache Tika 1.20

2019-01-08 Thread DW (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-13128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

DW updated SOLR-13128:
--
Description: 
Apache Tika 1.20 is now available. [https://tika.apache.org/download.html]

The current Solr 7.6 package contains Apache Tika 1.19, but new CVEs recommend 
upgrading to Apache Tika 1.20+.

  was:Apache Tika 1.20 is now available. The current Solr 7.6 package contains 
Apache Tika 1.19, but new CVEs recommend upgrading to Apache Tika 1.20+.


> Upgrade to Apache Tika 1.20
> ---
>
> Key: SOLR-13128
> URL: https://issues.apache.org/jira/browse/SOLR-13128
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: contrib - Solr Cell (Tika extraction)
>Affects Versions: 7.6
>Reporter: DW
>Priority: Major
>
> Apache Tika 1.20 is now available. [https://tika.apache.org/download.html]
> The current Solr 7.6 package contains Apache Tika 1.19, but new CVEs 
> recommend upgrading to Apache Tika 1.20+.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-13128) Upgrade to Apache Tika 1.20

2019-01-08 Thread DW (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-13128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

DW updated SOLR-13128:
--
Description: 
Apache Tika 1.20 is now available. [https://tika.apache.org/download.html]

The current Solr 7.6 package contains Apache Tika 1.19, but new CVEs recommend 
upgrading to Apache Tika 1.20+.

I can provide the related CVE numbers if needed.

  was:
Apache Tika 1.20 is now available. [https://tika.apache.org/download.html]

The current Solr 7.6 package contains Apache Tika 1.19, but new CVEs recommend 
upgrading to Apache Tika 1.20+.


> Upgrade to Apache Tika 1.20
> ---
>
> Key: SOLR-13128
> URL: https://issues.apache.org/jira/browse/SOLR-13128
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: contrib - Solr Cell (Tika extraction)
>Affects Versions: 7.6
>Reporter: DW
>Priority: Major
>
> Apache Tika 1.20 is now available. [https://tika.apache.org/download.html]
> The current Solr 7.6 package contains Apache Tika 1.19, but new CVEs 
> recommend upgrading to Apache Tika 1.20+.
> I can provide the related CVE numbers if needed.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (SOLR-13128) Upgrade to Apache Tika 1.20

2019-01-08 Thread DW (JIRA)
DW created SOLR-13128:
-

 Summary: Upgrade to Apache Tika 1.20
 Key: SOLR-13128
 URL: https://issues.apache.org/jira/browse/SOLR-13128
 Project: Solr
  Issue Type: Improvement
  Security Level: Public (Default Security Level. Issues are Public)
  Components: contrib - Solr Cell (Tika extraction)
Affects Versions: 7.6
Reporter: DW


Apache Tika 1.20 is now available. The current Solr 7.6 package contains 
Apache Tika 1.19, but new CVEs recommend upgrading to Apache Tika 1.20+.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-12613) Rename "Cloud" tab as "Cluster" in Admin UI

2019-01-08 Thread Jason Gerlowski (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16737185#comment-16737185
 ] 

Jason Gerlowski commented on SOLR-12613:


Why not both?  I think there's general consensus that we would love to improve 
the UI in larger ways, but any larger effort is bound to take longer to get 
going (particularly when few committers are familiar with the UI).  If renaming 
this menu tab helps our users in the interim, and there's going to be at least 
one release before a broader effort might address this, I think people should 
feel welcome to take it on.

> Rename "Cloud" tab as "Cluster" in Admin UI
> ---
>
> Key: SOLR-12613
> URL: https://issues.apache.org/jira/browse/SOLR-12613
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: Admin UI
>Reporter: Jan Høydahl
>Priority: Major
>  Labels: newdev
> Fix For: 8.0
>
>
> Spinoff from SOLR-8207. When adding more cluster-wide functionality to the 
> Admin UI, it feels better to name the "Cloud" UI tab as "Cluster".
> In addition to renaming the "Cloud" tab, we should also change the URL part 
> from {{~cloud}} to {{~cluster}}, update reference guide page names, 
> screenshots and references etc.
> I propose that this change not be introduced in 7.x due to the impact, so I 
> tagged it as fix-version 8.0.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Installation error - No such file or directory: 'javac'

2019-01-08 Thread NDelt
The result of "sudo which javac" is "/usr/bin/javac", so I think there is no
problem with javac's path.
As a temporary measure, I changed the JDK from Zulu to Oracle and now it works
well.
But I had set the Zulu JDK path in the setup.py file before typing the "sudo
python setup.py install" command... so I still don't know what the cause is.


On Tue, Jan 8, 2019 at 10:44 PM, Michael McCandless wrote:

> Likely the "sudo" env does not have javac on its PATH?
>
> Try "sudo which javac" (if you use bash, or any shell that has the "which"
> command).
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
>
> On Tue, Jan 8, 2019 at 5:51 AM NDelt  wrote:
>
> > I'm trying to install PyLucene on Ubuntu Linux and WSL, and my
> development
> > environment is Java 8 and Python 3.6.7.
> >
> > *python setup.py build* was successful, but I got an error after entering
> > the *sudo python setup.py install* command.
> >
> > Error message:
> > *Applied shared mode monkeypatch to:  > '/usr/lib/python3/dist-packages/setuptools/__init__.py'>*
> > *Traceback (most recent call last):*
> > *File "setup.py", line 389, in main process = Popen(args,
> stderr=PIPE)
> >  *
> > *File "/usr/lib/python3.6/subprocess.py", line 709, in __init__
> > restore_signals, start_new_session)*
> > *File "/usr/lib/python3.6/subprocess.py", line 1344, in
> _execute_child
> > raise child_exception_type(errno_num, err_msg, err_filename)*
> > *FileNotFoundError: [Errno 2] No such file or directory: 'javac':
> > 'javac'*
> >
> > *During handling of the above exception, another exception occurred:*
> >
> > *Traceback (most recent call last):*
> > *File "setup.py", line 449, in  main('--debug' in sys.argv)*
> > *File "setup.py", line 391, in main raise sys.exc_info()[0]("%s: %s"
> > %(sys.exc_info()[1], args))*
> > *FileNotFoundError: [Errno 2] No such file or directory: 'javac':
> > 'javac': ['javac', '-d', 'jcc3/classes',
> > 'java/org/apache/jcc/PythonVM.java',
> > 'java/org/apache/jcc/PythonException.java']*
> >
> > But I already installed JDK 8u192 so *java -version*, *javac -version*
> > commands work very well.
> >
> > *$ java -version*
> > *openjdk version "1.8.0_192"*
> > *OpenJDK Runtime Environment (Zulu 8.33.0.1-linux64) (build
> 1.8.0_192-b01)*
> > *OpenJDK 64-Bit Server VM (Zulu 8.33.0.1-linux64) (build 25.192-b01,
> mixed
> > mode)*
> >
> > *$ javac -version*
> > *javac 1.8.0_192*
> >
> > How do I fix this problem?
> >
>


[jira] [Commented] (LUCENE-8585) Create jump-tables for DocValues at index-time

2019-01-08 Thread Toke Eskildsen (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16737150#comment-16737150
 ] 

Toke Eskildsen commented on LUCENE-8585:


[~jpountz] I have implemented all the code changes you suggested on the pull 
request. Still pending/under discussion is the 200K test-docs in 
{{BaseDocValuesFormatTestCase.doTestNumericsVsStoredFields}} - there are 
already block-spanning tests in place for the Lucene80 codec, so this is 
"just" about coverage.

I see that the 8.0 ball is beginning to roll. How do you see the status of 
LUCENE-8585, and how does it fit into the 8.0 process?

> Create jump-tables for DocValues at index-time
> --
>
> Key: LUCENE-8585
> URL: https://issues.apache.org/jira/browse/LUCENE-8585
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/codecs
>Affects Versions: 8.0
>Reporter: Toke Eskildsen
>Priority: Minor
>  Labels: performance
> Attachments: LUCENE-8585.patch, LUCENE-8585.patch, 
> make_patch_lucene8585.sh
>
>  Time Spent: 9h 20m
>  Remaining Estimate: 0h
>
> As noted in LUCENE-7589, lookup of DocValues should use jump-tables to avoid 
> long iterative walks. This is implemented in LUCENE-8374 at search-time 
> (first request for DocValues from a field in a segment), with the benefit of 
> working without changes to existing Lucene 7 indexes and the downside of 
> introducing a startup time penalty and a memory overhead.
> As discussed in LUCENE-8374, the codec should be updated to create these 
> jump-tables at index time. This eliminates the segment-open time & memory 
> penalties, with the potential downside of increasing index-time for DocValues.
> The three elements of LUCENE-8374 should be transferable to index-time 
> without much alteration of the core structures:
>  * {{IndexedDISI}} block offset and index skips: A {{long}} (64 bits) for 
> every 65536 documents, containing the offset of the block in 33 bits and the 
> index (number of set bits) up to the block in 31 bits.
>  It can be built sequentially and should be stored as a simple sequence of 
> consecutive longs for caching of lookups.
>  As it is fairly small, relative to document count, it might be better to 
> simply memory cache it?
>  * {{IndexedDISI}} DENSE (> 4095, < 65536 set bits) blocks: A {{short}} (16 
> bits) for every 8 {{longs}} (512 bits) for a total of 256 bytes/DENSE_block. 
> Each {{short}} represents the number of set bits up to right before the 
> corresponding sub-block of 512 docIDs.
>  The {{shorts}} can be computed sequentially or when the DENSE block is 
> flushed (probably the easiest). They should be stored as a simple sequence of 
> consecutive shorts for caching of lookups, one logically independent sequence 
> for each DENSE block. The logical position would be one sequence at the start 
> of every DENSE block.
>  Whether it is best to read all the 16 {{shorts}} up front when a DENSE block 
> is accessed or whether it is best to only read any individual {{short}} when 
> needed is not clear at this point.
>  * Variable Bits Per Value: A {{long}} (64 bits) for every 16384 numeric 
> values. Each {{long}} holds the offset to the corresponding block of values.
>  The offsets can be computed sequentially and should be stored as a simple 
> sequence of consecutive {{longs}} for caching of lookups.
> The vBPV-offsets have the largest space overhead of the 3 jump-tables, and a 
> lot of the 64 bits in each long are not used for most indexes. They could be 
> represented as a simple {{PackedInts}} sequence or {{MonotonicLongValues}}, 
> with the downsides of a potential lookup-time overhead and the need for doing 
> the compression after all offsets have been determined.
> I have no experience with the codec-parts responsible for creating 
> index-structures. I'm quite willing to take a stab at this, although I 
> probably won't do much about it before January 2019. Should anyone else wish 
> to adopt this JIRA-issue or co-work on it, I'll be happy to share.
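To make the first bullet concrete, here is a small packing sketch (the exact 
bit layout is an assumption for illustration, not the committed codec format):

{code:java}
// Pack a block offset (33 bits) and an index, i.e. the number of set bits
// before the block (31 bits), into a single long; one entry per 65536 docs.
public class DisiJumpEntry {
  static long pack(long offset, int index) {
    assert offset >= 0 && offset < (1L << 33) && index >= 0;
    return (offset << 31) | index;           // high 33 bits offset, low 31 bits index
  }
  static long offset(long entry) { return entry >>> 31; }
  static int index(long entry)   { return (int) (entry & 0x7FFFFFFFL); }

  public static void main(String[] args) {
    long e = pack(123_456_789L, 424_242);
    System.out.println(offset(e) + " " + index(e)); // 123456789 424242
  }
}
{code}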



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8630) Allow boosting of particular interval sources

2019-01-08 Thread Alan Woodward (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16737138#comment-16737138
 ] 

Alan Woodward commented on LUCENE-8630:
---

Another paper that looks relevant here: 
http://www.iro.umontreal.ca/~nie/IFT6255/tao-proximity.pdf

> Allow boosting of particular interval sources
> -
>
> Key: LUCENE-8630
> URL: https://issues.apache.org/jira/browse/LUCENE-8630
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Alan Woodward
>Assignee: Alan Woodward
>Priority: Major
> Attachments: LUCENE-8630.patch
>
>
> In positional queries, it's common to want to promote some terms over others; 
> for example, in lists of synonyms you may want the original term to be 
> weighted more, or more specific terms to receive higher weights than less 
> specific ones.
> Span queries have the 'SpanBoostQuery', which is currently broken; and a 
> 'PayloadScoreQuery' which allows direct modification of the score based on 
> stored payloads, but which does not deal well with a mix of terms 
> with-and-without payloads, and which ends up exposing a lot of the terms API, 
> making it very difficult to customize.
> For interval queries, I'd like to try a different approach, adding a 
> float-valued 'boost()' method to IntervalIterator.  This would make it easy 
> to add simple boosts around particular terms in terms lists, and also allow 
> more fine-grained control using payloads without having to expose the 
> mechanics of the PostingsEnum
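To sketch the proposal (hypothetical types for illustration only; this is not 
Lucene's actual IntervalIterator API, which has a richer contract), a 
delegating iterator could attach a weight to whatever intervals the wrapped 
source produces:

{code:java}
// Stand-in interface for an interval source with the proposed boost() method.
interface IntervalIt {
  int nextInterval();
  int start();
  int end();
  default float boost() { return 1f; }       // proposed per-source weight
}

// Wrapper that multiplies a fixed boost onto the wrapped source's boost.
final class BoostedIntervalIt implements IntervalIt {
  private final IntervalIt in;
  private final float boost;
  BoostedIntervalIt(IntervalIt in, float boost) { this.in = in; this.boost = boost; }
  public int nextInterval() { return in.nextInterval(); }
  public int start() { return in.start(); }
  public int end() { return in.end(); }
  public float boost() { return boost * in.boost(); }
}
{code}

A scorer consuming such sources could then scale each matched interval's 
contribution by boost(), which is the kind of fine-grained control described 
above for weighted synonym lists.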



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Installation error - No such file or directory: 'javac'

2019-01-08 Thread Michael McCandless
Likely the "sudo" env does not have javac on its PATH?

Try "sudo which javac" (if you use bash, or any shell that has the "which"
command).

Mike McCandless

http://blog.mikemccandless.com


On Tue, Jan 8, 2019 at 5:51 AM NDelt  wrote:

> I'm trying to install PyLucene on Ubuntu Linux and WSL, and my development
> environment is Java 8 and Python 3.6.7.
>
> *python setup.py build* was successful, but I got an error after entering
> the *sudo python setup.py install* command.
>
> Error message:
> *Applied shared mode monkeypatch to:  '/usr/lib/python3/dist-packages/setuptools/__init__.py'>*
> *Traceback (most recent call last):*
> *File "setup.py", line 389, in main process = Popen(args, stderr=PIPE)
>  *
> *File "/usr/lib/python3.6/subprocess.py", line 709, in __init__
> restore_signals, start_new_session)*
> *File "/usr/lib/python3.6/subprocess.py", line 1344, in _execute_child
> raise child_exception_type(errno_num, err_msg, err_filename)*
> *FileNotFoundError: [Errno 2] No such file or directory: 'javac':
> 'javac'*
>
> *During handling of the above exception, another exception occurred:*
>
> *Traceback (most recent call last):*
> *File "setup.py", line 449, in  main('--debug' in sys.argv)*
> *File "setup.py", line 391, in main raise sys.exc_info()[0]("%s: %s"
> %(sys.exc_info()[1], args))*
> *FileNotFoundError: [Errno 2] No such file or directory: 'javac':
> 'javac': ['javac', '-d', 'jcc3/classes',
> 'java/org/apache/jcc/PythonVM.java',
> 'java/org/apache/jcc/PythonException.java']*
>
> But I already installed JDK 8u192 so *java -version*, *javac -version*
> commands work very well.
>
> *$ java -version*
> *openjdk version "1.8.0_192"*
> *OpenJDK Runtime Environment (Zulu 8.33.0.1-linux64) (build 1.8.0_192-b01)*
> *OpenJDK 64-Bit Server VM (Zulu 8.33.0.1-linux64) (build 25.192-b01, mixed
> mode)*
>
> *$ javac -version*
> *javac 1.8.0_192*
>
> How do I fix this problem?
>


[GitHub] lucene-solr pull request #525: LUCENE-8585: Index-time jump-tables for DocVa...

2019-01-08 Thread tokee
Github user tokee commented on a diff in the pull request:

https://github.com/apache/lucene-solr/pull/525#discussion_r245993566
  
--- Diff: 
lucene/core/src/java/org/apache/lucene/codecs/lucene80/IndexedDISI.java ---
@@ -0,0 +1,601 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.codecs.lucene80;
+
+import java.io.DataInput;
+import java.io.IOException;
+
+import org.apache.lucene.search.DocIdSetIterator;
+import org.apache.lucene.store.IndexInput;
+import org.apache.lucene.store.IndexOutput;
+import org.apache.lucene.store.RandomAccessInput;
+import org.apache.lucene.util.ArrayUtil;
+import org.apache.lucene.util.BitSetIterator;
+import org.apache.lucene.util.FixedBitSet;
+import org.apache.lucene.util.RoaringDocIdSet;
+
+/**
+ * Disk-based implementation of a {@link DocIdSetIterator} which can return
+ * the index of the current document, i.e. the ordinal of the current 
document
+ * among the list of documents that this iterator can return. This is 
useful
+ * to implement sparse doc values by only having to encode values for 
documents
+ * that actually have a value.
+ * Implementation-wise, this {@link DocIdSetIterator} is inspired by
+ * {@link RoaringDocIdSet roaring bitmaps} and encodes ranges of {@code 
65536}
+ * documents independently and picks between 3 encodings depending on the
+ * density of the range:
+ * <ul>
+ *   <li>{@code ALL} if the range contains 65536 documents exactly,</li>
+ *   <li>{@code DENSE} if the range contains 4096 documents or more; in that
+ *   case documents are stored in a bit set,</li>
+ *   <li>{@code SPARSE} otherwise, and the lower 16 bits of the doc IDs are
+ *   stored in a {@link DataInput#readShort() short}.</li>
+ * </ul>
+ * Only ranges that contain at least one value are encoded.
+ * This implementation uses 6 bytes per document in the worst-case, 
which happens
+ * in the case that all ranges contain exactly one document.
+ *
+ * 
+ * To avoid O(n) lookup time complexity, with n being the number of 
documents, two lookup
+ * tables are used: A lookup table for block offset and index, and a rank 
structure
+ * for DENSE block index lookups.
+ *
+ * The lookup table is an array of {@code int}-pairs, with a pair for each 
block. It allows for
+ * direct jumping to the block, as opposed to iteration from the current 
position and forward
+ * one block at a time.
+ *
+ * Each int-pair entry consists of 2 logical parts:
+ *
+ * The first 32 bit int holds the index (number of set bits in the blocks) 
up to just before the
+ * wanted block. The maximum number of set bits is the maximum number of 
documents, which is < 2^31.
+ *
+ * The next int holds the offset in bytes into the underlying slice. As 
there is a maximum of 2^16
+ * blocks, it follows that the maximum size of any block must not exceed 
2^15 bytes to avoid
+ * overflow (2^16 bytes if the int is treated as unsigned). This is 
currently the case, with the
+ * largest block being DENSE and using 2^13 + 36 bytes.
+ *
+ * The cache overhead is numDocs/1024 bytes.
+ *
+ * Note: There are 4 types of blocks: ALL, DENSE, SPARSE and non-existing 
(0 set bits).
+ * In the case of non-existing blocks, the entry in the lookup table has 
index equal to the
+ * previous entry and offset equal to the next non-empty block.
+ *
+ * The block lookup table is stored at the end of the total block 
structure.
+ *
+ *
+ * The rank structure for DENSE blocks is an array of byte-pairs with an 
entry for each
+ * sub-block (default 512 bits) out of the 65536 bits in the outer DENSE 
block.
+ *
+ * Each rank-entry states the number of set bits within the block up to 
the bit before the
+ * bit positioned at the start of the sub-block.
+ * Note that that the rank entry of the first sub-block is always 0 and 
that the last entry can
+ * at most be 65536-2 = 65634 and thus will always fit into an byte-pair 
of 16 bits.
+ *
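
For illustration, here is a minimal sketch (invented names, not the patch's actual code) of the two structures this javadoc describes: decoding an int-pair entry of the block lookup table, and building the byte-pair rank table for one DENSE block.

{code:java}
// Hypothetical sketch only; none of these names come from the patch.
public class DisiLookupSketch {

  // One lookup-table entry, packed as a long: the high 32 bits hold the
  // index (set bits before the block), the low 32 bits hold the byte
  // offset of the block in the underlying slice.
  static int indexBeforeBlock(long entry) {
    return (int) (entry >>> 32);
  }

  static int blockOffsetBytes(long entry) {
    return (int) (entry & 0xFFFFFFFFL);
  }

  // Build the rank table for one DENSE block: 65536 bits in 1024 longs,
  // one 16-bit (byte-pair) entry per 512-bit sub-block, each holding the
  // number of set bits before the sub-block starts.
  static byte[] buildDenseRank(long[] bits) { // bits.length == 1024
    final int subBlocks = 65536 / 512;        // 128 sub-blocks
    byte[] rank = new byte[subBlocks * 2];
    int setBits = 0;
    for (int sub = 0; sub < subBlocks; sub++) {
      rank[sub * 2] = (byte) (setBits >>> 8); // high byte of the entry
      rank[sub * 2 + 1] = (byte) setBits;     // low byte of the entry
      for (int i = sub * 8; i < (sub + 1) * 8; i++) { // 8 longs = 512 bits
        setBits += Long.bitCount(bits[i]);
      }
    }
    return rank;
  }
}
{code}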
  

[jira] [Comment Edited] (SOLR-13125) Optimize Queries when sorting by router.field

2019-01-08 Thread mosh (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-13125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16737089#comment-16737089
 ] 

mosh edited comment on SOLR-13125 at 1/8/19 12:46 PM:
--

{quote}However, we could optimize knowing which shards to fetch docs from for 
the top-X if we know how many docs matched the query in the first phase
{quote}
That sounds like it could save a lot of needless fetching in a large cluster.
 How would you tackle this?
 I was thinking a new SearchComponent might suffice.
 WDYT?

Something I noticed while skimming through SolrJ: currently aliases are resolved 
in the client, but no indication of router.name is sent.
 Perhaps this should be changed so SolrJ requests are easier to interpret by 
SolrCloud nodes, eliminating the need to check Zookeeper for _router.name_.
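
For illustration, a rough SolrJ-side sketch of the short circuit being discussed (purely hypothetical: the newest-first collection list and the stopping rule are assumptions, not existing Solr behavior):

{code:java}
import java.util.List;
import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocumentList;

public class TraShortCircuitSketch {
  // Query the TRA's collections newest-first and stop as soon as `limit`
  // documents have been collected, instead of fanning out to all of them.
  static SolrDocumentList query(SolrClient client, List<String> collectionsNewestFirst,
                                SolrQuery query, int limit) throws Exception {
    SolrDocumentList results = new SolrDocumentList();
    for (String collection : collectionsNewestFirst) { // assumed pre-sorted by router.field range, descending
      QueryResponse rsp = client.query(collection, query);
      results.addAll(rsp.getResults());
      if (results.size() >= limit) {
        break; // limit reached: skip the older collections entirely
      }
    }
    return results;
  }
}
{code}

A real implementation would presumably live server-side (e.g. in a SearchComponent) rather than in the client.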


was (Author: moshebla):
{quote}However, we could optimize knowing which shards to fetch docs from for 
the top-X if we know how many docs matched the query in the first phase{quote}
That sounds like it could save a lot of needless fetching in a large cluster.
How would you tackle this?
I was thinking a new SearchComponent could suffice.
WDYT?

Something I noticed while skimming through SolrJ: currently aliases are resolved 
in the client, but no indication of router.name is sent.
Perhaps this should be changed so SolrJ requests are easier to interpret by 
SolrCloud nodes, eliminating the need to check Zookeeper for _router.name_.

> Optimize Queries when sorting by router.field
> -
>
> Key: SOLR-13125
> URL: https://issues.apache.org/jira/browse/SOLR-13125
> Project: Solr
>  Issue Type: Sub-task
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: mosh
>Priority: Minor
>
> We are currently testing TRA using Solr 7.7, having >300 shards in the alias, 
> with much growth in the coming months.
> The "hot" data (in our case, more recent) will be stored on stronger 
> nodes (SSD, more RAM, etc.).
> A proposal has emerged to optimize queries sorted by router.field (the field 
> which TRA uses to route the data to the correct collection).
> Perhaps, in queries which are sorted by router.field, Solr could be smart 
> enough to wait for the more recent collections, and in case the limit was 
> reached, cancel the other queries (or simply not block and wait for their 
> results)?
> For example:
> When querying a TRA with a filter on a different field than 
> router.field, but sorting by router.field desc, limit=100.
> Since this is a TRA, Solr will issue queries for all the collections in the 
> alias.
> But to optimize this particular type of query, Solr could wait for the most 
> recent collection in the TRA and see whether the result set matches or exceeds 
> the limit. If so, the query could be returned to the user without waiting for 
> the rest of the shards. If not, the issuing node would block until the second 
> query returns, and so forth, until the limit of the request is reached.
> This might also be useful for deep paging, querying each collection and only 
> skipping to the next once there are no more results in the specified 
> collection.
> Thoughts or inputs are always welcome.
> This is just my two cents, and I'm always happy to brainstorm.
> Thanks in advance.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (LUCENE-8631) How Nori Tokenizer can deal with Longest-Matching

2019-01-08 Thread Yeongsu Kim (JIRA)
Yeongsu Kim created LUCENE-8631:
---

 Summary: How Nori Tokenizer can deal with Longest-Matching
 Key: LUCENE-8631
 URL: https://issues.apache.org/jira/browse/LUCENE-8631
 Project: Lucene - Core
  Issue Type: Improvement
  Components: modules/analysis
Reporter: Yeongsu Kim


 

 

I think... the Nori tokenizer has one issue.

 

I don’t understand why “Longest-Matching” is NOT working with the Nori tokenizer via 
config mode (config mode: 
https://www.elastic.co/guide/en/elasticsearch/plugins/6.x/analysis-nori-tokenizer.html).

 

Here is an example explaining what longest-matching is.

Let's assume we have `userdict_ko.txt` including only three Korean single words, 
‘골드’, ‘브라운’, and ‘골드브라운’, and save it for the Nori tokenizer. After updating, we 
can see that it outputs the two tokens ‘골드’ and ‘브라운’ when the input is 
‘골드브라운’. (In English: ‘골드’ means ‘gold’, ‘브라운’ means ‘brown’, and ‘골드브라운’ means 
‘goldbrown’.)

 

With this result, we recognize that “Longest-Matching” is NOT working.

If “Longest-Matching” were working, the output would be ‘골드브라운’, which is the 
longest matching word in the user dictionary.
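
For reference, the behavior described above can be reproduced directly against Lucene's Nori API; a minimal sketch (the class name is made up, and the dictionary entries match the example):

{code:java}
import java.io.StringReader;
import org.apache.lucene.analysis.ko.KoreanTokenizer;
import org.apache.lucene.analysis.ko.KoreanTokenizer.DecompoundMode;
import org.apache.lucene.analysis.ko.dict.UserDictionary;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.util.AttributeFactory;

public class NoriUserDictDemo {
  public static void main(String[] args) throws Exception {
    // Same three single-word entries as in the example above.
    UserDictionary userDict = UserDictionary.open(new StringReader("골드\n브라운\n골드브라운"));
    KoreanTokenizer tokenizer = new KoreanTokenizer(
        AttributeFactory.DEFAULT_ATTRIBUTE_FACTORY, userDict, DecompoundMode.NONE, false);
    tokenizer.setReader(new StringReader("골드브라운"));
    CharTermAttribute term = tokenizer.addAttribute(CharTermAttribute.class);
    tokenizer.reset();
    while (tokenizer.incrementToken()) {
      System.out.println(term.toString()); // per the report above: 골드 and 브라운, not 골드브라운
    }
    tokenizer.end();
    tokenizer.close();
  }
}
{code}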

 

Curiously enough, when we add the user dictionary via custom mode (custom mode: 
https://github.com/jimczi/nori/blob/master/how-to-custom-dict.asciidoc), 
we found the result is ‘골드브라운’, where ‘Longest-Matching’ is applied. We think 
the reason is that the trained MeCab engine automatically generates word 
costs by its own criteria. We hope this mechanism is also applied to config 
mode.

 

Would you tell me the way to achieve “Longest-Matching” via config mode (not custom), or 
give me some hints (e.g. where to modify the source code) to solve this problem?

 

P.S

Recently, I mailed Jim Ferenczi, who is a developer of Nori, and received 
his suggestions:

* Add a way to set a score for each new rule (this way you could set up a 
negative cost for the compound word that is less than the sum of the two single 
words).

* Same as above but the cost is computed from the statistics of the training 
(like the custom dictionary does when you recompile entirely).

* Implement longest-match first in the dictionary.

 

Thanks for your support.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-13125) Optimize Queries when sorting by router.field

2019-01-08 Thread mosh (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-13125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16737089#comment-16737089
 ] 

mosh commented on SOLR-13125:
-

{quote}However, we could optimize knowing which shards to fetch docs from for 
the top-X if we know how many docs matched the query in the first phase{quote}
That sounds like it could save a lot of needless fetching in a large cluster.
How would you tackle this?
I was thinking a new SearchComponent could suffice.
WDYT?

Something I noticed while skimming through SolrJ: currently aliases are resolved 
in the client, but no indication of router.name is sent.
Perhaps this should be changed so SolrJ requests are easier to interpret by 
SolrCloud nodes, eliminating the need to check Zookeeper for _router.name_.

> Optimize Queries when sorting by router.field
> -
>
> Key: SOLR-13125
> URL: https://issues.apache.org/jira/browse/SOLR-13125
> Project: Solr
>  Issue Type: Sub-task
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: mosh
>Priority: Minor
>
> We are currently testing TRA using Solr 7.7, having >300 shards in the alias, 
> with much growth in the coming months.
> The "hot" data (in our case, more recent) will be stored on stronger 
> nodes (SSD, more RAM, etc.).
> A proposal has emerged to optimize queries sorted by router.field (the field 
> which TRA uses to route the data to the correct collection).
> Perhaps, in queries which are sorted by router.field, Solr could be smart 
> enough to wait for the more recent collections, and in case the limit was 
> reached, cancel the other queries (or simply not block and wait for their 
> results)?
> For example:
> When querying a TRA with a filter on a different field than 
> router.field, but sorting by router.field desc, limit=100.
> Since this is a TRA, Solr will issue queries for all the collections in the 
> alias.
> But to optimize this particular type of query, Solr could wait for the most 
> recent collection in the TRA and see whether the result set matches or exceeds 
> the limit. If so, the query could be returned to the user without waiting for 
> the rest of the shards. If not, the issuing node would block until the second 
> query returns, and so forth, until the limit of the request is reached.
> This might also be useful for deep paging, querying each collection and only 
> skipping to the next once there are no more results in the specified 
> collection.
> Thoughts or inputs are always welcome.
> This is just my two cents, and I'm always happy to brainstorm.
> Thanks in advance.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8525) throw more specific exception on data corruption

2019-01-08 Thread Robert Muir (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16737017#comment-16737017
 ] 

Robert Muir commented on LUCENE-8525:
-

Sorry, it isn't the job of the library to classify exceptions into these 
buckets you seem to want (such as recoverable or not); we simply do not know 
this. And the EOF case is a perfect example of every other case, where it's 
really ambiguous what the cause is.

We can pass along the problems as exceptions and that is really it: no more. 
Historically that has been difficult enough: it is enough to ensure the 
IOException makes it through unswallowed. The exception handling is already too 
complex; it should not be made even more so for artificial reasons.

A plain IOException is perfectly fine; callers really shouldn't be handling 
anything. If you want to try to do this magical determination when an 
IOException strikes, I think you are going to have to figure out what heuristic 
you want, and write code yourself to do it (e.g. invoke checksum verification 
code or whatever you decide).
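
For what it's worth, a caller-side heuristic along those lines might look like this (purely illustrative; the classification logic is the caller's, not Lucene's):

{code:java}
import java.io.IOException;
import org.apache.lucene.codecs.CodecUtil;
import org.apache.lucene.index.CorruptIndexException;
import org.apache.lucene.store.IndexInput;

public class CorruptionProbeSketch {
  // On an IOException, verify the file's checksum to decide whether the
  // failure is likely corruption (checksum mismatch) or a low-level I/O error.
  static void classify(IndexInput in, IOException original) throws IOException {
    try {
      CodecUtil.checksumEntireFile(in); // throws CorruptIndexException on mismatch
    } catch (CorruptIndexException cie) {
      cie.addSuppressed(original);
      throw cie; // data is demonstrably corrupt
    }
    throw original; // checksum is fine: treat as a plain I/O problem
  }
}
{code}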


> throw more specific exception on data corruption
> 
>
> Key: LUCENE-8525
> URL: https://issues.apache.org/jira/browse/LUCENE-8525
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Vladimir Dolzhenko
>Priority: Major
>
> DataInput throws a generic IOException if data looks odd
> [DataInput:141|https://github.com/apache/lucene-solr/blob/1d85cd783863f75cea133fb9c452302214165a4d/lucene/core/src/java/org/apache/lucene/store/DataInput.java#L141]
> there are other examples like 
> [BufferedIndexInput:219|https://github.com/apache/lucene-solr/blob/1d85cd783863f75cea133fb9c452302214165a4d/lucene/core/src/java/org/apache/lucene/store/BufferedIndexInput.java#L219],
>  
> [CompressionMode:226|https://github.com/apache/lucene-solr/blob/1d85cd783863f75cea133fb9c452302214165a4d/lucene/core/src/java/org/apache/lucene/codecs/compressing/CompressionMode.java#L226]
>  and maybe 
> [DocIdsWriter:81|https://github.com/apache/lucene-solr/blob/1d85cd783863f75cea133fb9c452302214165a4d/lucene/core/src/java/org/apache/lucene/util/bkd/DocIdsWriter.java#L81]
> That leads to some difficulties - see [elasticsearch 
> #34322|https://github.com/elastic/elasticsearch/issues/34322]
> It would be better if it threw a more specific exception.
> As a consequence 
> [SegmentInfos.readCommit|https://github.com/apache/lucene-solr/blob/1d85cd783863f75cea133fb9c452302214165a4d/lucene/core/src/java/org/apache/lucene/index/SegmentInfos.java#L281]
>  violates its own contract
> {code:java}
> /**
>* @throws CorruptIndexException if the index is corrupt
>* @throws IOException if there is a low-level IO error
>*/
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Installation error - No such file or directory: 'javac'

2019-01-08 Thread NDelt
I'm trying to install PyLucene on Ubuntu Linux and WSL, and my development
environment is Java 8 and Python 3.6.7.

*python setup.py build* was successful, but I got an error after running the *sudo
python setup.py install* command.

Error message:
*Applied shared mode monkeypatch to: *
*Traceback (most recent call last):*
*File "setup.py", line 389, in main process = Popen(args, stderr=PIPE)
 *
*File "/usr/lib/python3.6/subprocess.py", line 709, in __init__
restore_signals, start_new_session)*
*File "/usr/lib/python3.6/subprocess.py", line 1344, in _execute_child
raise child_exception_type(errno_num, err_msg, err_filename)*
*FileNotFoundError: [Errno 2] No such file or directory: 'javac':
'javac'*

*During handling of the above exception, another exception occurred:*

*Traceback (most recent call last):*
*File "setup.py", line 449, in  main('--debug' in sys.argv)*
*File "setup.py", line 391, in main raise sys.exc_info()[0]("%s: %s"
%(sys.exc_info()[1], args))*
*FileNotFoundError: [Errno 2] No such file or directory: 'javac':
'javac': ['javac', '-d', 'jcc3/classes',
'java/org/apache/jcc/PythonVM.java',
'java/org/apache/jcc/PythonException.java']*

But I already installed JDK 8u192, so the *java -version* and *javac -version*
commands work fine.

*$ java -version*
*openjdk version "1.8.0_192"*
*OpenJDK Runtime Environment (Zulu 8.33.0.1-linux64) (build 1.8.0_192-b01)*
*OpenJDK 64-Bit Server VM (Zulu 8.33.0.1-linux64) (build 25.192-b01, mixed
mode)*

*$ javac -version*
*javac 1.8.0_192*

How do I fix this problem?


[jira] [Commented] (LUCENE-8630) Allow boosting of particular interval sources

2019-01-08 Thread Alan Woodward (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16736936#comment-16736936
 ] 

Alan Woodward commented on LUCENE-8630:
---

Interval scoring currently uses the same implementation as SpanScorer and 
MultiPhraseScorer, but I agree that it would be good to separate things out a 
bit.  This paper suggests combining the BM25 score and a proximity score by 
summing them, and has some ideas for calculating proximities:
http://www.bigdatalab.ac.cn/~gjf/papers/2012/Exploring%20and%20Exploiting%20Proximity%20Statistic%20for%20Information%20Retrieval%20Model.pdf
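
As a toy illustration of the summing approach (the decay function here is made up, not the paper's exact formula):

{code:java}
// Toy combination of BM25 and proximity by summing, per the idea above.
// The 1/(1+width) decay is illustrative only.
public class ProximityScoreSketch {
  static float combinedScore(float bm25, int intervalWidth, float proximityWeight) {
    float proximity = proximityWeight / (1f + intervalWidth); // narrower match => larger bonus
    return bm25 + proximity;
  }
}
{code}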


> Allow boosting of particular interval sources
> -
>
> Key: LUCENE-8630
> URL: https://issues.apache.org/jira/browse/LUCENE-8630
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Alan Woodward
>Assignee: Alan Woodward
>Priority: Major
> Attachments: LUCENE-8630.patch
>
>
> In positional queries, it's common to want to promote some terms over others; 
> for example, in lists of synonyms you may want the original term to be 
> weighted more, or more specific terms to receive higher weights than less 
> specific ones.
> Span queries have the 'SpanBoostQuery', which is currently broken; and a 
> 'PayloadScoreQuery' which allows direct modification of the score based on 
> stored payloads, but which does not deal well with a mix of terms 
> with-and-without payloads, and which ends up exposing a lot of the terms API, 
> making it very difficult to customize.
> For interval queries, I'd like to try a different approach, adding a 
> float-valued 'boost()' method to IntervalIterator.  This would make it easy 
> to add simple boosts around particular terms in terms lists, and also allow 
> more fine-grained control using payloads without having to expose the 
> mechanics of the PostingsEnum



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (SOLR-13127) Solr doesn't make difference by request methods

2019-01-08 Thread Geza Nagy (JIRA)
Geza Nagy created SOLR-13127:


 Summary: Solr doesn't make difference by request methods
 Key: SOLR-13127
 URL: https://issues.apache.org/jira/browse/SOLR-13127
 Project: Solr
  Issue Type: Bug
  Security Level: Public (Default Security Level. Issues are Public)
Affects Versions: 7.4
 Environment: Ubuntu 16.04

Solr 7.4

Kerberos

Java 8
Reporter: Geza Nagy


I tested SolrCloud with Kerberos auth and found an interesting scenario.

+*Symptom:*+

I tried to call the Solr admin API to add a collection and got back a 
400 response because the collection already exists.

+*What I used:*+

HttpURLConnection + Hadoop security's KerberosAuthenticator.

[https://docs.oracle.com/javase/8/docs/api/java/net/HttpURLConnection.html]

[https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-auth/src/main/java/org/apache/hadoop/security/authentication/client/KerberosAuthenticator.java]

 

+*Root cause:*+

The KerberosAuthenticator uses OPTIONS as the request method when it checks whether 
the client is already authenticated, and if it is, the OPTIONS request reaches the 
Solr endpoint and runs the action included in the URI (since I provide the 
full URL to the authenticator).

So the action is performed during authentication, and when my original 
request hits the endpoint the collection has already been created.

This can happen because there is no functionality in Solr to properly handle 
the different request methods.

 

In my opinion it's not proper behavior if I can call any endpoint with 
any request method and accidentally perform an action while I just want to check if 
I'm authenticated or not.
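
For illustration, a minimal servlet-filter sketch of the kind of guard being asked for (hypothetical code, not existing Solr functionality):

{code:java}
import java.io.IOException;
import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

// Answers authentication preflight requests (e.g. the Kerberos OPTIONS probe)
// without ever executing the action encoded in the URI.
public class OptionsPreflightFilter implements Filter {
  @Override
  public void init(FilterConfig config) throws ServletException {}

  @Override
  public void doFilter(ServletRequest req, ServletResponse rsp, FilterChain chain)
      throws IOException, ServletException {
    HttpServletRequest httpReq = (HttpServletRequest) req;
    if ("OPTIONS".equalsIgnoreCase(httpReq.getMethod())) {
      ((HttpServletResponse) rsp).setStatus(HttpServletResponse.SC_OK);
      return; // no side effects for preflight requests
    }
    chain.doFilter(req, rsp); // all other methods proceed normally
  }

  @Override
  public void destroy() {}
}
{code}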



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Lucene/Solr 8.0

2019-01-08 Thread Alan Woodward
I think the current plan is to do a 7.7 release at the same time as 8.0, to 
handle any last-minute deprecations etc.  So let’s keep those jobs enabled for 
now.

> On 8 Jan 2019, at 09:10, Uwe Schindler  wrote:
> 
> Hi,
>  
> I will start and add the branch_8x jobs to Jenkins once I have some time 
> later today.
>  
> The question: How to proceed with branch_7x? Should we stop using it and 
> release 7.6.x only (so we would use branch_7_6 only for bugfixes), or are we 
> planning to one more Lucene/Solr 7.7? In the latter case I would keep the 
> jenkins jobs enabled for a while.
>  
> Uwe
>  
> -
> Uwe Schindler
> Achterdiek 19, D-28357 Bremen
> http://www.thetaphi.de 
> eMail: u...@thetaphi.de 
>  
> From: Alan Woodward <romseyg...@gmail.com> 
> Sent: Monday, January 7, 2019 11:30 AM
> To: dev@lucene.apache.org 
> Subject: Re: Lucene/Solr 8.0
>  
> OK, Christmas caught up with me a bit… I’ve just created a branch for 8x from 
> master, and am in the process of updating the master branch to version 9.  
> New commits that should be included in the 8.0 release should also be 
> back-ported to branch_8x from master.
>  
> This is not intended as a feature freeze, as I know there are still some 
> things being worked on for 8.0; however, it should let us clean up master by 
> removing as much deprecated code as possible, and give us an idea of any 
> replacement work that needs to be done.
> 
> 
>> On 19 Dec 2018, at 15:13, David Smiley wrote:
>>  
>> January.
>>  
>>> On Wed, Dec 19, 2018 at 2:04 AM S G wrote:
>>> It would be nice to see Solr 8 in January soon as there is an enhancement 
>>> on nested-documents we are waiting to get our hands on.
>>> Any idea when Solr 8 would be out ?
>>>  
>>> Thx
>>> SG
>>>  
>>> On Mon, Dec 17, 2018 at 1:34 PM David Smiley wrote:
 I see 10 JIRA issues matching this filter:   project in (SOLR, LUCENE) AND 
 priority = Blocker and status = open and fixVersion = "master (8.0)" 
click here:
 https://issues.apache.org/jira/issues/?jql=project%20in%20(SOLR%2C%20LUCENE)%20AND%20priority%20%3D%20Blocker%20and%20status%20%3D%20open%20and%20fixVersion%20%3D%20%22master%20(8.0)%22%20
  
 
  
 Thru the end of the month, I intend to work on those issues not yet 
 assigned. 
  
 On Mon, Dec 17, 2018 at 4:51 AM Adrien Grand wrote:
> +1
> 
> On Mon, Dec 17, 2018 at 10:38 AM Alan Woodward wrote:
> >
> > Hi all,
> >
> > Now that 7.6 is out of the door (thanks Nick!) we should think about 
> > cutting the 8.0 branch and moving master to 9.0.  I’ll volunteer to 
> > create the branch this week - say Wednesday?  Then we should have some 
> > time to clean up the master branch and uncover anything that still 
> > needs to be done on 8.0 before we start the release process next year.
> >
> > On 22 Oct 2018, at 18:12, Cassandra Targett wrote:
> >
> > I'm a bit delayed, but +1 on the 7.6 and 8.0 plan from me too.
> >
> > On Fri, Oct 19, 2018 at 7:18 AM Erick Erickson wrote:
> >>
> >> +1, this gives us all a chance to prioritize getting the blockers out
> >> of the way in a careful manner.
> >> On Fri, Oct 19, 2018 at 7:56 AM jim ferenczi wrote:
> >> >
> >> > +1 too. With this new perspective we could create the branch just 
> >> > after the 7.6 release and target the 8.0 release for January 2019 
> >> > which gives almost 3 month to finish the blockers ?
> >> >
> >> >> On Thu, Oct 18, 2018 at 23:56, David Smiley wrote:
> >> >>
> >> >> +1 to a 7.6 —lots of stuff in there
> >> >> On Thu, Oct 18, 2018 at 4:47 PM Nicholas Knize wrote:
> >> >>>
> >> >>> If we're planning to postpone cutting an 8.0 branch until a few 
> >> >>> weeks from now then I'd like to propose (and volunteer to RM) a 
> >> >>> 7.6 release targeted for late November or early December 
> >> >>> (following the typical 2 month release pattern). It feels like 
> >> >>> this might give a little breathing room for finishing up 8.0 
> >> >>> blockers? And looking at the change log there appear to be a 
> >> >>> healthy list of features, bug fixes, and improvements to both Solr 
> >> >>> and Lucene that warrant a 7.6 release? Personally I wouldn't mind 
> >> >>> releasing the 

[jira] [Updated] (SOLR-13123) Free disk based suggester short circuits without exploring all states

2019-01-08 Thread Alan Woodward (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-13123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Woodward updated SOLR-13123:
-
Fix Version/s: master (9.0)

> Free disk based suggester short circuits without exploring all states
> -
>
> Key: SOLR-13123
> URL: https://issues.apache.org/jira/browse/SOLR-13123
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: AutoScaling
>Affects Versions: 7.5, 7.6
>Reporter: Shalin Shekhar Mangar
>Priority: Minor
> Fix For: 8.0, 7.7, master (9.0)
>
>
> The free disk variable based suggester can stop generating suggestions too 
> early because it short circuits the suggestion loop after the first node that 
> results in a violation (or increases the violation).
> This is further exacerbated by the fact that the replicas in each node are 
> evaluated in the order of index size ascending which causes the smallest 
> replicas to be considered first. If a violation happens in moving this small 
> replica then the larger replicas are never even considered for a move.
> We should consider all possibilities here and stop short circuiting on a 
> violation.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-13124) Free disk based suggestions do not guard against running out of disk

2019-01-08 Thread Alan Woodward (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-13124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Woodward updated SOLR-13124:
-
Fix Version/s: (was: 8)
   master (9.0)

> Free disk based suggestions do not guard against running out of disk 
> -
>
> Key: SOLR-13124
> URL: https://issues.apache.org/jira/browse/SOLR-13124
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: AutoScaling
>Affects Versions: 7.5, 7.6
>Reporter: Shalin Shekhar Mangar
>Priority: Major
> Fix For: 8.0, 7.7, master (9.0)
>
>
> I have seen instances where free disk based optimization suggestions can go 
> overboard and cause too many cores to be moved to a disk causing it to go out 
> of disk space. At the end of the suggestions, the computed freedisk goes 
> negative. We need to add a guard during suggestion computation to prevent 
> this.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-13124) Free disk based suggestions do not guard against running out of disk

2019-01-08 Thread Alan Woodward (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-13124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Woodward updated SOLR-13124:
-
Fix Version/s: 8

> Free disk based suggestions do not guard against running out of disk 
> -
>
> Key: SOLR-13124
> URL: https://issues.apache.org/jira/browse/SOLR-13124
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: AutoScaling
>Affects Versions: 7.5, 7.6
>Reporter: Shalin Shekhar Mangar
>Priority: Major
> Fix For: 8.0, 7.7, 8
>
>
> I have seen instances where free disk based optimization suggestions can go 
> overboard and cause too many cores to be moved to a disk causing it to go out 
> of disk space. At the end of the suggestions, the computed freedisk goes 
> negative. We need to add a guard during suggestion computation to prevent 
> this.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-7896) Add a login page for Solr Administrative Interface

2019-01-08 Thread Alan Woodward (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-7896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Woodward updated SOLR-7896:

Fix Version/s: (was: 8.0)

> Add a login page for Solr Administrative Interface
> --
>
> Key: SOLR-7896
> URL: https://issues.apache.org/jira/browse/SOLR-7896
> Project: Solr
>  Issue Type: New Feature
>  Components: Admin UI, Authentication, security
>Affects Versions: 5.2.1
>Reporter: Aaron Greenspan
>Assignee: Jan Høydahl
>Priority: Major
>  Labels: authentication, login, password
> Fix For: master (9.0), 7.7
>
> Attachments: SOLR-7896-bugfix-7jan.patch, 
> SOLR-7896-bugfix-7jan.patch, dispatchfilter-code.png, eventual_auth.png, 
> login-page.png, login-screen-2.png, logout.png, unknown_scheme.png
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Now that Solr supports Authentication plugins, the missing piece is to be 
> allowed access from Admin UI when authentication is enabled. For this we need
>  * Some plumbing in Admin UI that allows the UI to detect 401 responses and 
> redirect to login page
>  * Possibility to have multiple login pages depending on auth method and 
> redirect to the correct one
>  * [AngularJS HTTP 
> interceptors|https://docs.angularjs.org/api/ng/service/$http#interceptors] to 
> add correct HTTP headers on all requests when user is logged in
> This issue should aim to implement some of the plumbing mentioned above, and 
> make it work with Basic Auth.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Lucene/Solr 8.0

2019-01-08 Thread Dawid Weiss
Thanks for doing this Alan. I'll handle RAMDirectory* removals from
the new master (LUCENE-8474)

D.

On Tue, Jan 8, 2019 at 10:11 AM Alan Woodward  wrote:
>
> > It looks like someone renamed the "master (8.0)" version of SOLR & LUCENE
> > in Jira to "master (9.0)" but IIUC that's definitely *NOT* correct ...
> > because it means all the stuff that's been committed to origin/master over
> > the past X months won't be listed as "fixed in '8.0'" when people look
> > at jira in the future.
>
> That would be me… I’ll clean it up, thanks for pointing it out Hoss.
>
> > On 7 Jan 2019, at 23:45, Chris Hostetter  wrote:
> >
> > : OK, Christmas caught up with me a bit… I’ve just created a branch for 8x
> > : from master, and am in the process of updating the master branch to
> > : version 9.  New commits that should be included in the 8.0 release
> > : should also be back-ported to branch_8x from master.
> >
> > It looks like someone renamed the "master (8.0)" version of SOLR & LUCENE
> > in Jira to "master (9.0)" but IIUC that's definitely *NOT* correct ...
> > because it means all the stuff that's been committed to origin/master over
> > the past X months won't be listed as "fixed in '8.0'" when people look
> > at jira in the future.
> >
> > I'm pretty sure "master (8.0)" should have been renamed "8.0" and a 
> > completely
> > new version (with a new internal ID in jira) should have been added for
> > "master (9.0)"
> >
> >   Right?
> >
> > (In the meantime, it seems folks have already added new "8.0"
> > versions for SOLR/LUCENE to Jira, which have a handful of issues mapped to
> > them, that will need cleaned up)
> >
> >
> >
> > : > >> >>
> > : > >> >> On Wed, Oct 17, 2018 at 12:13 PM jim ferenczi <jim.feren...@gmail.com> wrote:
> > : > >> >>>
> > : > >> >>> Ok thanks for answering.
> > : > >> >>>
> > : > >> >>> > - I think Solr needs a couple more weeks since the work 
> > Dat is doing isn't quite done yet.
> > : > >> >>>
> > : > >> >>> We can wait a few more weeks to create the branch but I 
> > don't think that one action (creating the branch) prevents the other (the 
> > work Dat is doing).
> > : > >> >>> HTTP/2 is one of the blocker for the release but it can be 
> > done in master and backported to the appropriate branch as any other 
> > feature ? We just need an issue with the blocker label to ensure that
> > : > >> >>> we don't miss it ;). Creating the branch early would also 
> > help in case you don't want to release all the work at once in 8.0.0.
> > : > >> >>> Next week was just a proposal, what I meant was soon because 
> > we target a release in a few months.
> > : > >> >>>
> > : > >> >>>
> > : > >> >>> On Wed, Oct 17, 2018 at 17:52, Cassandra Targett <casstarg...@gmail.com> wrote:
> > : > >> 
> > : > >>  IMO next week is a bit too soon for the branch - I think 
> > Solr needs a couple more weeks since the work Dat is doing isn't quite done 
> > yet.
> > : > >> 
> > : > >>  Solr needs the HTTP/2 work Dat has been doing, and he told 
> > me yesterday he feels it is nearly ready to be merged into master. However, 
it does require a new release of Jetty so that Solr is able to retain Kerberos 
> > authentication support (Dat has been working with that team to help test 
> > the changes Jetty needs to support Kerberos with HTTP/2). They should get 
> > that release out soon, but we are dependent on them a little bit.
> > : > >> 
> > : > >>  He can hopefully reply with more details on his status and 
> > what else needs to be done.
> > : > >> 
> > : > >>  Once Dat merges his work, IMO we should leave it in master 
> > for a little bit. While he has been beasting and testing with Jenkins as he 
> > goes along, I think it would be good to have all the regular master builds 
> > work on it for a little bit also.
> > : > >> 
> > : > >>  Of the other blockers, the only other large-ish one is to 
> > fully remove Trie* fields, which some of us also discussed yesterday and it 
> > seemed we concluded that Solr isn't really ready to do that. The 
> > performance issues with single value lookups are a major obstacle. It would 
> > be nice if someone with a bit more experience with that could comment in 
> > the issue (SOLR-12632) and/or unmark it as a blocker.
> > : > >> 
> > : > >>  Cassandra
> > : > >> 
> > : > >>  On Wed, Oct 17, 2018 at 8:38 AM Erick Erickson <erickerick...@gmail.com> wrote:
> > : > >> >
> > : > >> > I find 9 open blockers for 8.0:
> > : > >> >
> > : > >> > 
> > https://issues.apache.org/jira/issues/?jql=project%20%3D%20SOLR%20AND%20priority%20%3D%20Blocker%20AND%20status%20%3D%20OPEN
> >  
> > 
> > : > >> >
> > : > >> 

[jira] [Updated] (LUCENE-8622) Add a MinimumShouldMatch interval iterator

2019-01-08 Thread Alan Woodward (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-8622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Woodward updated LUCENE-8622:
--
Fix Version/s: (was: 8.0)
   master (9.0)

> Add a MinimumShouldMatch interval iterator
> --
>
> Key: LUCENE-8622
> URL: https://issues.apache.org/jira/browse/LUCENE-8622
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Alan Woodward
>Assignee: Alan Woodward
>Priority: Major
> Fix For: master (9.0)
>
> Attachments: LUCENE-8622.patch, LUCENE-8622.patch, LUCENE-8622.patch
>
>
> It would be useful to be able to search for intervals that span some subgroup 
> of a set of iterators, allowing us to build a 'some of' or 'at least' 
> operator - i.e., search for terms that appear near at least 3 of a list of 5 
> terms.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-8629) Add some more Interval functions

2019-01-08 Thread Alan Woodward (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-8629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Woodward updated LUCENE-8629:
--
Fix Version/s: (was: 8.0)
   master (9.0)

> Add some more Interval functions
> 
>
> Key: LUCENE-8629
> URL: https://issues.apache.org/jira/browse/LUCENE-8629
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Alan Woodward
>Assignee: Alan Woodward
>Priority: Major
> Fix For: master (9.0)
>
> Attachments: LUCENE-8629.patch, LUCENE-8629.patch
>
>
> There are a few missing functions from the current group of IntervalsSource 
> definitions available:
> * a BEFORE/AFTER b - similar to an ordered near, but returning only intervals 
> from 'a' rather than an interval spanning both a and b
> * a WITHIN b - inverse of the already available NOT_WITHIN
> * a OVERLAPPING b - inverse of the already available NOT_OVERLAPPING



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Lucene/Solr 8.0

2019-01-08 Thread Alan Woodward
> It looks like someone renamed the "master (8.0)" version of SOLR & LUCENE 
> in Jira to "master (9.0)" but IIUC that's definitely *NOT* correct ... 
> because it means all the stuff that's been committed to origin/master over 
> the past X months won't be listed as "fixed in '8.0'" when people look 
> at jira in the future.

That would be me… I’ll clean it up, thanks for pointing it out Hoss.

> On 7 Jan 2019, at 23:45, Chris Hostetter  wrote:
> 
> : OK, Christmas caught up with me a bit… I’ve just created a branch for 8x 
> : from master, and am in the process of updating the master branch to 
> : version 9.  New commits that should be included in the 8.0 release 
> : should also be back-ported to branch_8x from master.
> 
> It looks like someone renamed the "master (8.0)" version of SOLR & LUCENE 
> in Jira to "master (9.0)" but IIUC that's definitely *NOT* correct ... 
> because it means all the stuff that's been committed to origin/master over 
> the past X months won't be listed as "fixed in '8.0'" when people look 
> at jira in the future.
> 
> I'm pretty sure "master (8.0)" should have been renamed "8.0" and a 
> completely 
> new version (with a new internal ID in jira) should have been added for 
> "master (9.0)"
> 
>   Right?
> 
> (In the meantime, it seems folks have already added new "8.0" 
> versions for SOLR/LUCENE to Jira, which have a handful of issues mapped to 
> them, that will need cleaned up)
> 
> 
> 
> : > >> >>
> : > >> >> On Wed, Oct 17, 2018 at 12:13 PM jim ferenczi <jim.feren...@gmail.com> wrote:
> : > >> >>>
> : > >> >>> Ok thanks for answering.
> : > >> >>>
> : > >> >>> > - I think Solr needs a couple more weeks since the work Dat 
> is doing isn't quite done yet.
> : > >> >>>
> : > >> >>> We can wait a few more weeks to create the branch but I don't 
> think that one action (creating the branch) prevents the other (the work Dat 
> is doing).
> : > >> >>> HTTP/2 is one of the blocker for the release but it can be 
> done in master and backported to the appropriate branch as any other feature 
> ? We just need an issue with the blocker label to ensure that
> : > >> >>> we don't miss it ;). Creating the branch early would also help 
> in case you don't want to release all the work at once in 8.0.0.
> : > >> >>> Next week was just a proposal, what I meant was soon because 
> we target a release in a few months.
> : > >> >>>
> : > >> >>>
> : > >> >>> On Wed, Oct 17, 2018 at 17:52, Cassandra Targett <casstarg...@gmail.com> wrote:
> : > >> 
> : > >>  IMO next week is a bit too soon for the branch - I think Solr 
> needs a couple more weeks since the work Dat is doing isn't quite done yet.
> : > >> 
> : > >>  Solr needs the HTTP/2 work Dat has been doing, and he told me 
> yesterday he feels it is nearly ready to be merged into master. However, it 
> does require a new release of Jetty to Solr is able to retain Kerberos 
> authentication support (Dat has been working with that team to help test the 
> changes Jetty needs to support Kerberos with HTTP/2). They should get that 
> release out soon, but we are dependent on them a little bit.
> : > >> 
> : > >>  He can hopefully reply with more details on his status and 
> what else needs to be done.
> : > >> 
> : > >>  Once Dat merges his work, IMO we should leave it in master 
> for a little bit. While he has been beasting and testing with Jenkins as he 
> goes along, I think it would be good to have all the regular master builds 
> work on it for a little bit also.
> : > >> 
> : > >>  Of the other blockers, the only other large-ish one is to 
> fully remove Trie* fields, which some of us also discussed yesterday and it 
> seemed we concluded that Solr isn't really ready to do that. The performance 
> issues with single value lookups are a major obstacle. It would be nice if 
> someone with a bit more experience with that could comment in the issue 
> (SOLR-12632) and/or unmark it as a blocker.
> : > >> 
> : > >>  Cassandra
> : > >> 
> : > >>  On Wed, Oct 17, 2018 at 8:38 AM Erick Erickson <erickerick...@gmail.com> wrote:
> : > >> >
> : > >> > I find 9 open blockers for 8.0:
> : > >> >
> : > >> > 
> https://issues.apache.org/jira/issues/?jql=project%20%3D%20SOLR%20AND%20priority%20%3D%20Blocker%20AND%20status%20%3D%20OPEN
>  
> 
> : > >> >
> : > >> > As David mentioned, many of the SOlr committers are at 
> Activate, which
> : > >> > ends Thursday so feedback (and work) may be a bit delayed.
> : > >> > On Wed, Oct 17, 2018 at 8:11 AM David Smiley <david.w.smi...@gmail.com> wrote:
> : > >> > >
> : > >> > > Hi,
> : > >> 

RE: Lucene/Solr 8.0

2019-01-08 Thread Uwe Schindler
Hi,

 

I will start and add the branch_8x jobs to Jenkins once I have some time later 
today.

 

The question: How to proceed with branch_7x? Should we stop using it and 
release 7.6.x only (so we would use branch_7_6 only for bugfixes), or are we 
planning to one more Lucene/Solr 7.7? In the latter case I would keep the 
jenkins jobs enabled for a while.

 

Uwe

 

-

Uwe Schindler

Achterdiek 19, D-28357 Bremen

http://www.thetaphi.de  

eMail: u...@thetaphi.de

 

From: Alan Woodward  
Sent: Monday, January 7, 2019 11:30 AM
To: dev@lucene.apache.org
Subject: Re: Lucene/Solr 8.0

 

OK, Christmas caught up with me a bit… I’ve just created a branch for 8x from 
master, and am in the process of updating the master branch to version 9.  New 
commits that should be included in the 8.0 release should also be back-ported 
to branch_8x from master.

 

This is not intended as a feature freeze, as I know there are still some things 
being worked on for 8.0; however, it should let us clean up master by removing 
as much deprecated code as possible, and give us an idea of any replacement 
work that needs to be done.





On 19 Dec 2018, at 15:13, David Smiley <david.w.smi...@gmail.com> wrote:

 

January.

 

On Wed, Dec 19, 2018 at 2:04 AM S G <sg.online.em...@gmail.com> wrote:

It would be nice to see Solr 8 in January soon as there is an enhancement on 
nested-documents we are waiting to get our hands on.

Any idea when Solr 8 would be out ?

 

Thx

SG

 

On Mon, Dec 17, 2018 at 1:34 PM David Smiley <david.w.smi...@gmail.com> wrote:

I see 10 JIRA issues matching this filter:   project in (SOLR, LUCENE) AND 
priority = Blocker and status = open and fixVersion = "master (8.0)" 

   click here:

https://issues.apache.org/jira/issues/?jql=project%20in%20(SOLR%2C%20LUCENE)%20AND%20priority%20%3D%20Blocker%20and%20status%20%3D%20open%20and%20fixVersion%20%3D%20%22master%20(8.0)%22%20

 

Thru the end of the month, I intend to work on those issues not yet assigned. 

 

On Mon, Dec 17, 2018 at 4:51 AM Adrien Grand <jpou...@gmail.com> wrote:

+1

On Mon, Dec 17, 2018 at 10:38 AM Alan Woodward <romseyg...@gmail.com> wrote:
>
> Hi all,
>
> Now that 7.6 is out of the door (thanks Nick!) we should think about cutting 
> the 8.0 branch and moving master to 9.0.  I’ll volunteer to create the branch 
> this week - say Wednesday?  Then we should have some time to clean up the 
> master branch and uncover anything that still needs to be done on 8.0 before 
> we start the release process next year.
>
> On 22 Oct 2018, at 18:12, Cassandra Targett   > wrote:
>
> I'm a bit delayed, but +1 on the 7.6 and 8.0 plan from me too.
>
> On Fri, Oct 19, 2018 at 7:18 AM Erick Erickson   > wrote:
>>
>> +1, this gives us all a chance to prioritize getting the blockers out
>> of the way in a careful manner.
>> On Fri, Oct 19, 2018 at 7:56 AM jim ferenczi wrote:
>> >
>> > +1 too. With this new perspective we could create the branch just after 
>> > the 7.6 release and target the 8.0 release for January 2019 which gives 
>> > almost 3 month to finish the blockers ?
>> >
>> > On Thu, Oct 18, 2018 at 23:56, David Smiley wrote:
>> >>
>> >> +1 to a 7.6 —lots of stuff in there
>> >> On Thu, Oct 18, 2018 at 4:47 PM Nicholas Knize wrote:
>> >>>
>> >>> If we're planning to postpone cutting an 8.0 branch until a few weeks 
>> >>> from now then I'd like to propose (and volunteer to RM) a 7.6 release 
>> >>> targeted for late November or early December (following the typical 2 
>> >>> month release pattern). It feels like this might give a little breathing 
>> >>> room for finishing up 8.0 blockers? And looking at the change log there 
>> >>> appear to be a healthy list of features, bug fixes, and improvements to 
>> >>> both Solr and Lucene that warrant a 7.6 release? Personally I wouldn't 
>> >>> mind releasing the LatLonShape encoding changes in LUCENE-8521 and 
>> >>> selective indexing work done in LUCENE-8496. Any objections or thoughts?
>> >>>
>> >>> - Nick
>> >>>
>> >>>
>> >>> On Thu, Oct 18, 2018 at 5:32 AM Đạt Cao Mạnh wrote:
>> 
>>  Thanks Cassandra and Jim,
>> 
>>  I created a blocker issue for Solr 8.0, SOLR-12883. Currently in the 
>>  jira/http2 branch there is a draft, immature implementation of SPNEGO 
>>  authentication which is enough to make the tests pass; this implementation 
>>  will be removed when SOLR-12883 gets resolved. Therefore I don't see 
>>  any problem with merging jira/http2 to the master branch next week.
>> 
>> > On Thu, Oct 18, 2018 at 2:33 AM jim ferenczi wrote:
>> >
>> > > But if you're working with a different assumption - that just the 
>> > > 

[JENKINS] Lucene-Solr-7.x-Windows (64bit/jdk1.8.0_172) - Build # 947 - Unstable!

2019-01-08 Thread Policeman Jenkins Server
Build: https://jenkins.thetaphi.de/job/Lucene-Solr-7.x-Windows/947/
Java: 64bit/jdk1.8.0_172 -XX:+UseCompressedOops -XX:+UseParallelGC

3 tests failed.
FAILED:  
org.apache.solr.cloud.autoscaling.sim.TestSimTriggerIntegration.testNodeLostTriggerRestoreState

Error Message:
The trigger did not fire at all

Stack Trace:
java.lang.AssertionError: The trigger did not fire at all
at 
__randomizedtesting.SeedInfo.seed([3FBB6C55D2E91506:1444B90E489100D6]:0)
at org.junit.Assert.fail(Assert.java:88)
at org.junit.Assert.assertTrue(Assert.java:41)
at 
org.apache.solr.cloud.autoscaling.sim.TestSimTriggerIntegration.testNodeLostTriggerRestoreState(TestSimTriggerIntegration.java:332)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1750)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:938)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:974)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:988)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57)
at 
org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:49)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
at 
org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48)
at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:368)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:817)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:468)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:947)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:832)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:883)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:894)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:41)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47)
at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64)
at 
org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:54)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:368)
at java.lang.Thread.run(Thread.java:748)


FAILED:  
org.apache.solr.cloud.autoscaling.sim.TestSimTriggerIntegration.testCooldown

Error Message:
The trigger did not fire at all

Stack Trace:
java.lang.AssertionError: The trigger did not fire at all