[JENKINS] Lucene-Solr-6.x-Linux (32bit/jdk1.8.0_72) - Build # 237 - Failure!
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-6.x-Linux/237/
Java: 32bit/jdk1.8.0_72 -server -XX:+UseG1GC

1 tests failed.
FAILED:  org.apache.solr.cloud.TestAuthenticationFramework.testStopAllStartAll

Error Message:
Address already in use

Stack Trace:
java.net.BindException: Address already in use
	at __randomizedtesting.SeedInfo.seed([178FF9B920E45E7C:61B1E6CA61D3F353]:0)
	at sun.nio.ch.Net.bind0(Native Method)
	at sun.nio.ch.Net.bind(Net.java:433)
	at sun.nio.ch.Net.bind(Net.java:425)
	at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223)
	at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
	at org.eclipse.jetty.server.ServerConnector.open(ServerConnector.java:326)
	at org.eclipse.jetty.server.AbstractNetworkConnector.doStart(AbstractNetworkConnector.java:80)
	at org.eclipse.jetty.server.ServerConnector.doStart(ServerConnector.java:244)
	at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:68)
	at org.eclipse.jetty.server.Server.doStart(Server.java:384)
	at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:68)
	at org.apache.solr.client.solrj.embedded.JettySolrRunner.start(JettySolrRunner.java:327)
	at org.apache.solr.cloud.MiniSolrCloudCluster.startJettySolrRunner(MiniSolrCloudCluster.java:356)
	at org.apache.solr.cloud.TestMiniSolrCloudCluster.testStopAllStartAll(TestMiniSolrCloudCluster.java:443)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1764)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:871)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:907)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:921)
	at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57)
	at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:49)
	at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
	at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48)
	at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64)
	at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:367)
	at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:809)
	at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:460)
	at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:880)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:781)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:816)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:827)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57)
	at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:41)
	at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
	at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at
[jira] [Updated] (SOLR-8903) Move SolrJ DateUtil to Extraction module as ExtractionDateUtil
[ https://issues.apache.org/jira/browse/SOLR-8903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Smiley updated SOLR-8903:
---
    Fix Version/s: 6.0

> Move SolrJ DateUtil to Extraction module as ExtractionDateUtil
> --
>
>                 Key: SOLR-8903
>                 URL: https://issues.apache.org/jira/browse/SOLR-8903
>             Project: Solr
>          Issue Type: Task
>          Components: SolrJ
>            Reporter: David Smiley
>            Assignee: David Smiley
>             Fix For: 6.0
>
>
> SolrJ doesn't need a DateUtil class, particularly since we're on Java 8 and
> can simply use {{new Date(Instant.parse(d).toEpochMilli());}} for parsing and
> {{DateTimeFormatter.ISO_INSTANT.format(d.toInstant())}} for formatting. Yes,
> they are threadsafe. I propose that we deprecate DateUtil from SolrJ, or
> perhaps outright remove it from SolrJ for Solr 6. The only SolrJ calls into
> this class are to essentially use it to format or parse in the ISO standard
> format.
> I also think we should move it to the "extraction" (SolrCell) module and name
> it something like ExtractionDateUtil. See, this class has a parse method
> taking a list of formats, and there's a static list of them taken from
> HttpClient's DateUtil. DateUtil's original commit was SOLR-284 to be used by
> SolrCell, and SolrCell wants this feature. So I think it should move there.
> There are a few other uses:
> * Morphlines uses it, but morphlines depends on the extraction module so it
> could just as well access it if we move it there.
> * The ValueAugmenterFactory (a doc transformer). I really doubt whoever
> added it realized that DateUtil.parseDate would try a bunch of formats
> instead of only supporting the ISO canonical format. So I think we should
> just remove this reference.
> * DateFormatUtil.parseMathLenient falls back on this, and this method is in
> turn called by just one caller -- DateValueSourceParser, registered as
> {{ms}}. I don't think we need leniency in use of this function query; values
> given to ms should be computer generated in the ISO format.
>
> edit: added ms().

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
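The two one-liners proposed above can be tried standalone; here is a minimal, self-contained sketch (the class name is invented for the demo and is not Solr code):

```java
import java.time.Instant;
import java.time.format.DateTimeFormatter;
import java.util.Date;

// Hypothetical demo class illustrating the proposed DateUtil replacement:
// plain java.time calls for ISO-8601 parsing and formatting.
public class IsoDateDemo {
    public static void main(String[] args) {
        String d = "2016-03-25T12:30:45.030Z";
        // Parsing: ISO-8601 instant string -> legacy java.util.Date
        Date date = new Date(Instant.parse(d).toEpochMilli());
        // Formatting: java.util.Date -> ISO-8601 instant string
        String formatted = DateTimeFormatter.ISO_INSTANT.format(date.toInstant());
        System.out.println(formatted); // 2016-03-25T12:30:45.030Z
    }
}
```

Unlike a shared SimpleDateFormat instance, both Instant.parse and DateTimeFormatter.ISO_INSTANT are immutable and thread-safe, which is the point of the proposal.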
[jira] [Created] (SOLR-8904) Switch from SimpleDateFormat to Java 8 DateTimeFormatter.ISO_INSTANT
David Smiley created SOLR-8904:
--

             Summary: Switch from SimpleDateFormat to Java 8 DateTimeFormatter.ISO_INSTANT
                 Key: SOLR-8904
                 URL: https://issues.apache.org/jira/browse/SOLR-8904
             Project: Solr
          Issue Type: Task
            Reporter: David Smiley
            Assignee: David Smiley
             Fix For: 6.0

I'd like to move Solr away from SimpleDateFormat to Java 8's java.time.format.DateTimeFormatter API, particularly using simply ISO_INSTANT without any custom rules. This especially involves our DateFormatUtil class in Solr core, but also involves DateUtil (I filed SOLR-8903 to deal with additional delete/move/deprecations for that one).

In particular, there's {{new Date(Instant.parse(d).toEpochMilli())}} for parsing and {{DateTimeFormatter.ISO_INSTANT.format(d.toInstant())}} for formatting. Simple & thread-safe!

I want to simply cut over completely without having special custom rules. There are differences in how ISO_INSTANT does things:
* Formatting: Milliseconds are zero-padded to 3 digits if the milliseconds are non-zero. Thus 30 milliseconds will have ".030" added on. Our current formatting code emits ".03".
* Dates with years after 9999 (i.e. 10000 and beyond, >= 5 digit years): ISO_INSTANT strictly demands a leading '+' -- it is formatted with a '+', and if such a year is parsed it *must* have a '+' or there is an exception. SimpleDateFormat requires the opposite -- no '+', and if you tried to give it one, it would throw an exception.
* Currently we don't support negative years (resulting in invisible errors mostly!). ISO_INSTANT supports this!

In addition, DateFormatUtil.parseDate currently allows the trailing 'Z' to be optional, but the only caller that could exploit this is the analytics module. I'd like to remove the optional-ness of 'Z' and inline this method away to {{new Date(Instant.parse(d).toEpochMilli())}}.
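The behavioral differences listed above can be observed directly; a small illustrative snippet (the class name is made up, not Solr code):

```java
import java.time.Instant;
import java.time.format.DateTimeFormatter;

// Hypothetical snippet demonstrating the ISO_INSTANT behaviors noted above.
public class IsoInstantQuirks {
    public static void main(String[] args) {
        // 30 milliseconds is zero-padded to three digits: ".030", not ".03"
        System.out.println(DateTimeFormatter.ISO_INSTANT.format(
                Instant.ofEpochMilli(30))); // 1970-01-01T00:00:00.030Z

        // Five-digit years must carry a leading '+' to parse, and are
        // formatted with one; SimpleDateFormat rejects that same '+'.
        Instant far = Instant.parse("+10000-01-01T00:00:00Z");
        System.out.println(far); // +10000-01-01T00:00:00Z

        // Negative years parse as well, which SimpleDateFormat never supported
        System.out.println(Instant.parse("-0100-01-01T00:00:00Z")
                .isBefore(Instant.EPOCH)); // true
    }
}
```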
[jira] [Commented] (SOLR-5604) Remove deprecations caused by httpclient 4.3.x upgrade
[ https://issues.apache.org/jira/browse/SOLR-5604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15211342#comment-15211342 ]

Mark Miller commented on SOLR-5604:
---
I am working on getting this done now as part of SOLR-4509.

> Remove deprecations caused by httpclient 4.3.x upgrade
> --
>
>                 Key: SOLR-5604
>                 URL: https://issues.apache.org/jira/browse/SOLR-5604
>             Project: Solr
>          Issue Type: Improvement
>    Affects Versions: 4.7
>            Reporter: Shawn Heisey
>             Fix For: 4.9, master
>
>         Attachments: SOLR-5604-4x-just-lucene.patch, SOLR-5604.patch
>
>
> SOLR-5590 upgraded httpclient in Solr and Lucene to version 4.3.x. This
> version deprecates a LOT of classes and methods, recommending that they all
> be replaced with various methods from the HttpClientBuilder class.
[jira] [Updated] (SOLR-8903) Move SolrJ DateUtil to Extraction module as ExtractionDateUtil
[ https://issues.apache.org/jira/browse/SOLR-8903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Smiley updated SOLR-8903:
---
    Description:
SolrJ doesn't need a DateUtil class, particularly since we're on Java 8 and can simply use {{new Date(Instant.parse(d).toEpochMilli());}} for parsing and {{DateTimeFormatter.ISO_INSTANT.format(d.toInstant())}} for formatting. Yes, they are threadsafe. I propose that we deprecate DateUtil from SolrJ, or perhaps outright remove it from SolrJ for Solr 6. The only SolrJ calls into this class are to essentially use it to format or parse in the ISO standard format.

I also think we should move it to the "extraction" (SolrCell) module and name it something like ExtractionDateUtil. See, this class has a parse method taking a list of formats, and there's a static list of them taken from HttpClient's DateUtil. DateUtil's original commit was SOLR-284 to be used by SolrCell, and SolrCell wants this feature. So I think it should move there.

There are a few other uses:
* Morphlines uses it, but morphlines depends on the extraction module so it could just as well access it if we move it there.
* The ValueAugmenterFactory (a doc transformer). I really doubt whoever added it realized that DateUtil.parseDate would try a bunch of formats instead of only supporting the ISO canonical format. So I think we should just remove this reference.
* DateFormatUtil.parseMathLenient falls back on this, and this method is in turn called by just one caller -- DateValueSourceParser, registered as {{ms}}. I don't think we need leniency in use of this function query; values given to ms should be computer generated in the ISO format.

edit: added ms().

  was:
SolrJ doesn't need a DateUtil class, particularly since we're on Java 8 and can simply use {{new Date(Instant.parse(d).toEpochMilli());}} for parsing and {{DateTimeFormatter.ISO_INSTANT.format(d.toInstant())}} for formatting. Yes, they are threadsafe.

I propose that we deprecate DateUtil from SolrJ, or perhaps outright remove it from SolrJ for Solr 6. The only SolrJ calls into this class are to essentially use it to format or parse in the ISO standard format.

I also think we should move it to the "extraction" (SolrCell) module and name it something like ExtractionDateUtil. See, this class has a parse method taking a list of formats, and there's a static list of them taken from HttpClient's DateUtil. DateUtil's original commit was SOLR-284 to be used by SolrCell, and SolrCell wants this feature. So I think it should move there.

There are a couple other uses:
* Morphlines uses it, but morphlines depends on the extraction module so it could just as well access it if we move it there.
* The ValueAugmenterFactory (a doc transformer). I really doubt whoever added it realized that DateUtil.parseDate would try a bunch of formats instead of only supporting the ISO canonical format. So I think we should just remove this reference.
[jira] [Commented] (SOLR-2774) broken cut/paste code for dealing with parsing/formatting dates
[ https://issues.apache.org/jira/browse/SOLR-2774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15211329#comment-15211329 ]

David Smiley commented on SOLR-2774:
---
I'm not sure what the issue here is... but I presume you mean "DateUtil" not DateUtils. I see no DateUtils. I just filed SOLR-8903 which might obsolete this issue?

> broken cut/paste code for dealing with parsing/formatting dates
> ---
>
>                 Key: SOLR-2774
>                 URL: https://issues.apache.org/jira/browse/SOLR-2774
>             Project: Solr
>          Issue Type: Sub-task
>            Reporter: Hoss Man
>
> DateUtils has methods cut/paste from DateField and TestResponseWriter which
> are (in both cases) broken and since fixed in other issues. that code either
> needs removed or refactored so there is only a single (correct) copy of it.
> see parent issue for more details
[jira] [Created] (SOLR-8903) Move SolrJ DateUtil to Extraction module as ExtractionDateUtil
David Smiley created SOLR-8903:
--

             Summary: Move SolrJ DateUtil to Extraction module as ExtractionDateUtil
                 Key: SOLR-8903
                 URL: https://issues.apache.org/jira/browse/SOLR-8903
             Project: Solr
          Issue Type: Task
          Components: SolrJ
            Reporter: David Smiley
            Assignee: David Smiley

SolrJ doesn't need a DateUtil class, particularly since we're on Java 8 and can simply use {{new Date(Instant.parse(d).toEpochMilli());}} for parsing and {{DateTimeFormatter.ISO_INSTANT.format(d.toInstant())}} for formatting. Yes, they are threadsafe.

I propose that we deprecate DateUtil from SolrJ, or perhaps outright remove it from SolrJ for Solr 6. The only SolrJ calls into this class are to essentially use it to format or parse in the ISO standard format.

I also think we should move it to the "extraction" (SolrCell) module and name it something like ExtractionDateUtil. See, this class has a parse method taking a list of formats, and there's a static list of them taken from HttpClient's DateUtil. DateUtil's original commit was SOLR-284 to be used by SolrCell, and SolrCell wants this feature. So I think it should move there.

There are a couple other uses:
* Morphlines uses it, but morphlines depends on the extraction module so it could just as well access it if we move it there.
* The ValueAugmenterFactory (a doc transformer). I really doubt whoever added it realized that DateUtil.parseDate would try a bunch of formats instead of only supporting the ISO canonical format. So I think we should just remove this reference.
[JENKINS-EA] Lucene-Solr-6.x-Linux (64bit/jdk-9-jigsaw-ea+110) - Build # 235 - Failure!
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-6.x-Linux/235/
Java: 64bit/jdk-9-jigsaw-ea+110 -XX:-UseCompressedOops -XX:+UseConcMarkSweepGC -XX:-CompactStrings

1 tests failed.
FAILED:  org.apache.solr.handler.TestReqParamsAPI.test

Error Message:
Could not get expected value  'CY val' for path 'params/c' full output: {
  "responseHeader":{
    "status":0,
    "QTime":0},
  "params":{
    "a":"A val",
    "b":"B val",
    "wt":"json",
    "useParams":""},
  "context":{
    "webapp":"",
    "path":"/dump1",
    "httpMethod":"GET"}}

Stack Trace:
java.lang.AssertionError: Could not get expected value  'CY val' for path 'params/c' full output: {
  "responseHeader":{
    "status":0,
    "QTime":0},
  "params":{
    "a":"A val",
    "b":"B val",
    "wt":"json",
    "useParams":""},
  "context":{
    "webapp":"",
    "path":"/dump1",
    "httpMethod":"GET"}}
	at __randomizedtesting.SeedInfo.seed([D6177184D9890A3F:5E434E5E777567C7]:0)
	at org.junit.Assert.fail(Assert.java:93)
	at org.junit.Assert.assertTrue(Assert.java:43)
	at org.apache.solr.core.TestSolrConfigHandler.testForResponseElement(TestSolrConfigHandler.java:458)
	at org.apache.solr.handler.TestReqParamsAPI.testReqParams(TestReqParamsAPI.java:177)
	at org.apache.solr.handler.TestReqParamsAPI.test(TestReqParamsAPI.java:67)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(java.base@9-ea/Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(java.base@9-ea/NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(java.base@9-ea/DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(java.base@9-ea/Method.java:531)
	at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1764)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:871)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:907)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:921)
	at org.apache.solr.BaseDistributedSearchTestCase$ShardsRepeatRule$ShardsFixedStatement.callStatement(BaseDistributedSearchTestCase.java:996)
	at org.apache.solr.BaseDistributedSearchTestCase$ShardsRepeatRule$ShardsStatement.evaluate(BaseDistributedSearchTestCase.java:971)
	at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57)
	at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:49)
	at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
	at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48)
	at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64)
	at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:367)
	at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:809)
	at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:460)
	at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:880)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:781)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:816)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:827)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57)
	at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:41)
	at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
	at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at
Git Policy/Preference: how to fold in feature branches? squash?
SOLR-445 is fairly well done, and i'm ready to land it on master (and eventually branch_6x).

My impression is that the sanest way to do that is...

# on master ...
git merge --squash jira/SOLR-445
emacs CHANGES.txt   # to add CHANGES entry
git commit -m "Merging branch jira/SOLR-445 to master"
# now master has a single commit #DEADBEEF for all SOLR-445 related
# work. but the full history of decisions made is still in branch
# jira/SOLR-445

# later, once things have baked on master for a bit.
# on branch_6x ...
git cherry-pick DEADBEEF

...because, unless i'm missing something, if i don't use --squash then every intermediate commit to branch jira/SOLR-445 will be replayed on master, and that doesn't really seem necessary/helpful to most people.

Or am i missing something? is a regular "git merge --no-squash" preferred for some reason i'm overlooking?

-Hoss
http://www.lucidworks.com/
[jira] [Commented] (SOLR-4509) Move to non deprecated HttpClient impl classes to remove stale connection check on every request and move connection lifecycle management towards the client.
[ https://issues.apache.org/jira/browse/SOLR-4509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15211155#comment-15211155 ]

Mark Miller commented on SOLR-4509:
---
bq. Might be able to fix those APIs now, but we will see.

Actually, it's probably not very useful given we cannot put the new HttpClient APIs in the plugin API, and we're basically forced to expose that.

> Move to non deprecated HttpClient impl classes to remove stale connection
> check on every request and move connection lifecycle management towards the
> client.
> -
>
>                 Key: SOLR-4509
>                 URL: https://issues.apache.org/jira/browse/SOLR-4509
>             Project: Solr
>          Issue Type: Improvement
>          Components: search
>         Environment: 5 node SmartOS cluster (all nodes living in same global
> zone - i.e. same physical machine)
>            Reporter: Ryan Zezeski
>            Assignee: Mark Miller
>            Priority: Minor
>             Fix For: 5.0, master
>
>         Attachments:
> 0001-SOLR-4509-Move-to-non-deprecated-HttpClient-impl-cla.patch,
> 0001-SOLR-4509-Move-to-non-deprecated-HttpClient-impl-cla.patch,
> IsStaleTime.java, SOLR-4509-4_4_0.patch, SOLR-4509.patch, SOLR-4509.patch,
> SOLR-4509.patch, SOLR-4509.patch, SOLR-4509.patch, SOLR-4509.patch,
> SOLR-4509.patch, SOLR-4509.patch, SOLR-4509.patch,
> baremetal-stale-nostale-med-latency.dat,
> baremetal-stale-nostale-med-latency.svg,
> baremetal-stale-nostale-throughput.dat, baremetal-stale-nostale-throughput.svg
>
>
> By disabling the Apache HTTP Client stale check I've witnessed a 2-4x
> increase in throughput and reduction of over 100ms. This patch was made in
> the context of a project I'm leading, called Yokozuna, which relies on
> distributed search.
> Here's the patch on Yokozuna: https://github.com/rzezeski/yokozuna/pull/26
> Here's a write-up I did on my findings:
> http://www.zinascii.com/2013/solr-distributed-search-and-the-stale-check.html
> I'm happy to answer any questions or make changes to the patch to make it
> acceptable.
> ReviewBoard: https://reviews.apache.org/r/28393/
[jira] [Updated] (SOLR-4509) Move to non deprecated HttpClient impl classes to remove stale connection check on every request and move connection lifecycle management towards the client.
[ https://issues.apache.org/jira/browse/SOLR-4509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mark Miller updated SOLR-4509:
--
    Attachment: 0001-SOLR-4509-Move-to-non-deprecated-HttpClient-impl-cla.patch

Here is a much more functionally complete patch. Tests should be passing, but a few are ignored. Much closer to done, but still some things to do.
[jira] [Commented] (SOLR-7374) Backup/Restore should provide a param for specifying the directory implementation it should use
[ https://issues.apache.org/jira/browse/SOLR-7374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15211148#comment-15211148 ]

Hrishikesh Gadre commented on SOLR-7374:
---
[~varunthacker] Sure. I think at a high level we can expose an API to configure Directory configurations for storing snapshots (e.g. local filesystem, HDFS, S3 etc.). As part of the implementation, we can store this config in ZK. Once this mechanism is in place, we can pass the appropriate directory configuration during the backup command execution. Does this make sense?

CC [~dsmiley] (who is also interested in SOLR-5750)

> Backup/Restore should provide a param for specifying the directory
> implementation it should use
> ---
>
>                 Key: SOLR-7374
>                 URL: https://issues.apache.org/jira/browse/SOLR-7374
>             Project: Solr
>          Issue Type: Bug
>            Reporter: Varun Thacker
>            Assignee: Varun Thacker
>             Fix For: 5.2, master
>
>
> Currently when we create a backup we use SimpleFSDirectory to write the
> backup indexes. Similarly during a restore we open the index using
> FSDirectory.open .
> We should provide a param called {{directoryImpl}} or {{type}} which will be
> used to specify the Directory implementation to backup the index.
> Likewise during a restore you would need to specify the directory impl which
> was used during backup so that the index can be opened correctly.
> This param will address the problem that currently if a user is running Solr
> on HDFS there is no way to use the backup/restore functionality as the
> directory is hardcoded.
> With this one could be running Solr on a local FS but backup the index on
> HDFS etc.
[jira] [Updated] (SOLR-8902) ReturnFields can return fields that were not requested
[ https://issues.apache.org/jira/browse/SOLR-8902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ryan McKinley updated SOLR-8902:
---
    Attachment: SOLR-8902.diff

Here is a simple patch with a test

> ReturnFields can return fields that were not requested
> --
>
>                 Key: SOLR-8902
>                 URL: https://issues.apache.org/jira/browse/SOLR-8902
>             Project: Solr
>          Issue Type: Bug
>          Components: Response Writers
>            Reporter: Ryan McKinley
>            Assignee: Ryan McKinley
>            Priority: Minor
>             Fix For: 6.1, trunk
>
>         Attachments: SOLR-8902.diff
>
>
> It looks like something changed that now returns all fields requested from
> lucene, not just the ones requested from solr.
> This is the difference between 'fields' and 'okFieldNames' in
> SolrReturnFields.
> The logic here:
> https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/search/SolrReturnFields.java#L141
> adds all the 'fields' to 'okFieldName'
> I think that should be removed
[jira] [Updated] (SOLR-8902) ReturnFields can return fields that were not requested
[ https://issues.apache.org/jira/browse/SOLR-8902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ryan McKinley updated SOLR-8902:
---
    Summary: ReturnFields can return fields that were not requested  (was: ReturnFields returns fields that were not requested)
[jira] [Updated] (SOLR-8902) ReturnFields returns fields that were not requested
[ https://issues.apache.org/jira/browse/SOLR-8902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ryan McKinley updated SOLR-8902: Summary: ReturnFields returns fields that were not requested (was: ReturnFields allows fields that were not requested) > ReturnFields returns fields that were not requested > --- > > Key: SOLR-8902 > URL: https://issues.apache.org/jira/browse/SOLR-8902 > Project: Solr > Issue Type: Bug > Components: Response Writers >Reporter: Ryan McKinley >Assignee: Ryan McKinley >Priority: Minor > Fix For: 6.1, trunk > > > It looks like something changed that now returns all fields requested from > lucene, not just the ones request from solr. > This is the difference between 'fields' and 'okFieldNames' in > SolrReturnFields. > The logic here: > https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/search/SolrReturnFields.java#L141 > adds all the 'fields' to 'okFieldName' > I think that should be removed -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (SOLR-8902) ReturnFields allows fields that were not requested
Ryan McKinley created SOLR-8902: --- Summary: ReturnFields allows fields that were not requested Key: SOLR-8902 URL: https://issues.apache.org/jira/browse/SOLR-8902 Project: Solr Issue Type: Bug Components: Response Writers Reporter: Ryan McKinley Assignee: Ryan McKinley Priority: Minor Fix For: 6.1, trunk It looks like something changed that now returns all fields requested from Lucene, not just the ones requested from Solr. This is the difference between 'fields' and 'okFieldNames' in SolrReturnFields. The logic here: https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/search/SolrReturnFields.java#L141 adds all the 'fields' to 'okFieldNames' I think that should be removed
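The leak described above boils down to a few lines. This is a simplified plain-Java model, not the actual SolrReturnFields code; the set names merely mirror the issue's 'fields' vs. 'okFieldNames' distinction:

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Simplified model of the reported bug (NOT the real SolrReturnFields code):
// 'fetchedFromLucene' is everything asked of Lucene, while 'okFieldNames'
// should contain only what the Solr request itself asked for.
public class ReturnFieldsSketch {
    public static void main(String[] args) {
        Set<String> requestedFromSolr = new LinkedHashSet<>(Set.of("id", "name"));

        // Lucene may be asked for extra fields (e.g. needed internally):
        Set<String> fetchedFromLucene = new LinkedHashSet<>(requestedFromSolr);
        fetchedFromLucene.add("internal_field");

        Set<String> okFieldNames = new LinkedHashSet<>(requestedFromSolr);
        // The problematic line effectively did this merge:
        okFieldNames.addAll(fetchedFromLucene);
        assert okFieldNames.contains("internal_field"); // unrequested field leaks out

        // The proposed fix is simply not to merge the Lucene-side set in:
        Set<String> fixed = new LinkedHashSet<>(requestedFromSolr);
        assert !fixed.contains("internal_field");
        System.out.println("leaky=" + okFieldNames + " fixed=" + fixed);
    }
}
```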
[jira] [Commented] (SOLR-7374) Backup/Restore should provide a param for specifying the directory implementation it should use
[ https://issues.apache.org/jira/browse/SOLR-7374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15211093#comment-15211093 ] Varun Thacker commented on SOLR-7374: - Hi Hrishikesh, Please feel free to work on it. I've not got time to even look back at this issue unfortunately. I'll be glad to review if you work on a patch though > Backup/Restore should provide a param for specifying the directory > implementation it should use > --- > > Key: SOLR-7374 > URL: https://issues.apache.org/jira/browse/SOLR-7374 > Project: Solr > Issue Type: Bug >Reporter: Varun Thacker >Assignee: Varun Thacker > Fix For: 5.2, master > > > Currently when we create a backup we use SimpleFSDirectory to write the > backup indexes. Similarly during a restore we open the index using > FSDirectory.open . > We should provide a param called {{directoryImpl}} or {{type}} which will be > used to specify the Directory implementation to backup the index. > Likewise during a restore you would need to specify the directory impl which > was used during backup so that the index can be opened correctly. > This param will address the problem that currently if a user is running Solr > on HDFS there is no way to use the backup/restore functionality as the > directory is hardcoded. > With this one could be running Solr on a local FS but backup the index on > HDFS etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
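On the request side this might look roughly like the following. The replication handler's command=backup with name and location exist today; the directoryImpl parameter and the value shown are the proposal under discussion, not an implemented API:

```shell
# Hypothetical request shape: 'directoryImpl' is the proposed, not-yet-existing param
curl 'http://localhost:8983/solr/collection1/replication?command=backup&name=nightly&location=hdfs://namenode:8020/backups&directoryImpl=HdfsDirectoryFactory'
```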
[jira] [Created] (SOLR-8901) Export data/tlog directory locations for every core in clusterstate
Hrishikesh Gadre created SOLR-8901: -- Summary: Export data/tlog directory locations for every core in clusterstate Key: SOLR-8901 URL: https://issues.apache.org/jira/browse/SOLR-8901 Project: Solr Issue Type: Bug Components: SolrCloud Reporter: Hrishikesh Gadre Currently the data and tlog directory paths are not exposed as part of clusterstate.json. This information is important for implementing HDFS-based Solr snapshots. In the case of HDFS-based snapshots, the overseer will figure out the correct HDFS path for the Solr collection and invoke the HDFS API to capture the snapshot.
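For illustration only, a clusterstate.json replica entry extended with the requested paths might look like this; the dataDir/ulogDir property names and the HDFS URIs are hypothetical, not a committed format:

```json
{"collection1": {
  "shards": {"shard1": {"replicas": {"core_node1": {
    "core": "collection1_shard1_replica1",
    "base_url": "http://host:8983/solr",
    "node_name": "host:8983_solr",
    "state": "active",
    "dataDir": "hdfs://namenode:8020/solr/collection1/core_node1/data",
    "ulogDir": "hdfs://namenode:8020/solr/collection1/core_node1/data/tlog"
  }}}}}}
```

With the paths in clusterstate, an overseer-driven snapshot could resolve each core's HDFS directory without contacting the node itself.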
[jira] [Commented] (SOLR-8030) Transaction log does not store the update chain used for updates
[ https://issues.apache.org/jira/browse/SOLR-8030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15211031#comment-15211031 ] Ludovic Boutros commented on SOLR-8030: --- I learned it the hard way too [~dsmiley]. I'll try to take some time on this next weekend. Thanks. > Transaction log does not store the update chain used for updates > > > Key: SOLR-8030 > URL: https://issues.apache.org/jira/browse/SOLR-8030 > Project: Solr > Issue Type: Bug > Components: SolrCloud >Affects Versions: 5.3 >Reporter: Ludovic Boutros > Attachments: SOLR-8030.patch > > > The transaction log does not store the update chain used during updates. > Therefore the tlog uses the default update chain during log replay. > If we implement custom update logic with multiple update chains, log > replay could break this logic.
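A sketch of the setup where this bites (illustrative solrconfig.xml; the chain and processor class names are made up). A document indexed with update.chain=custom-enrich would, on tlog replay, currently go through the default chain and skip the custom processor:

```xml
<!-- Illustrative sketch, not shipped config: two update chains. -->
<updateRequestProcessorChain name="default" default="true">
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>

<updateRequestProcessorChain name="custom-enrich">
  <!-- hypothetical custom processor; tlog replay today would NOT re-run it -->
  <processor class="com.example.EnrichProcessorFactory"/>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```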
[jira] [Created] (SOLR-8900) The ObjectReleaseTracker should not reference actual objects.
Mark Miller created SOLR-8900: - Summary: The ObjectReleaseTracker should not reference actual objects. Key: SOLR-8900 URL: https://issues.apache.org/jira/browse/SOLR-8900 Project: Solr Issue Type: Improvement Components: Tests Reporter: Mark Miller Assignee: Mark Miller -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-7878) Use SortedNumericDocValues (efficient sort & facet on multi-valued numeric fields)
[ https://issues.apache.org/jira/browse/SOLR-7878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15210995#comment-15210995 ] Yonik Seeley commented on SOLR-7878: A perfect time to do this cut-over will be with the addition of the new Point based fields (IntPoint,LongPoint,etc)... This also would allow us to leave TrieField unchanged for back compat. > Use SortedNumericDocValues (efficient sort & facet on multi-valued numeric > fields) > -- > > Key: SOLR-7878 > URL: https://issues.apache.org/jira/browse/SOLR-7878 > Project: Solr > Issue Type: Improvement > Components: Facet Module >Reporter: David Smiley > > Lucene has a SortedNumericDocValues (i.e. multi-valued numeric DocValues), > ever since late in the 4x versions. Solr's TrieField.createFields > unfortunately still uses SortedSetDocValues for the multi-valued case. > SortedNumericDocValues is more efficient than SortedSetDocValues; for example > there is no 'ordinal' mapping for sorting/faceting needed. > Unfortunately, updating Solr here would be quite a bit of work, since there > are backwards-compatibility concerns, and faceting code would need a new code > path implementation just for this. Sorting is relatively simple thanks to > SortedNumericSortField, and today multi-valued sorting isn't directly > possible. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (SOLR-8899) MergeStrategy code creates HttpClient(s) and HttpSolrClient(s) that it does not close.
Mark Miller created SOLR-8899: - Summary: MergeStrategy code creates HttpClient(s) and HttpSolrClient(s) that it does not close. Key: SOLR-8899 URL: https://issues.apache.org/jira/browse/SOLR-8899 Project: Solr Issue Type: Bug Reporter: Mark Miller
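The general shape of the fix is the standard try-with-resources pattern. A self-contained sketch with a stand-in client follows (not the actual MergeStrategy code; HttpClient and HttpSolrClient are Closeable in the same way):

```java
// Generic sketch of the leak pattern and its fix (not the real MergeStrategy
// code): any client created locally must be closed on every exit path.
public class CloseSketch {
    static int openCount = 0;

    // Stand-in for an HttpClient/HttpSolrClient, which are (Auto)Closeable.
    static class FakeClient implements AutoCloseable {
        FakeClient() { openCount++; }
        @Override public void close() { openCount--; }
    }

    public static void main(String[] args) {
        // Leaky: the client is created but never closed.
        FakeClient leaked = new FakeClient();
        // ... use leaked ...
        assert openCount == 1;
        leaked.close(); // cleanup so the demo ends balanced

        // Fixed: try-with-resources closes the client even on exceptions.
        try (FakeClient client = new FakeClient()) {
            // ... use client ...
        }
        assert openCount == 0;
        System.out.println("open clients remaining: " + openCount);
    }
}
```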
[JENKINS-EA] Lucene-Solr-master-Linux (32bit/jdk-9-jigsaw-ea+110) - Build # 16321 - Failure!
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-master-Linux/16321/ Java: 32bit/jdk-9-jigsaw-ea+110 -client -XX:+UseParallelGC -XX:-CompactStrings 3 tests failed. FAILED: junit.framework.TestSuite.org.apache.solr.core.TestLazyCores Error Message: ObjectTracker found 4 object(s) that were not released!!! [MockDirectoryWrapper, MockDirectoryWrapper, SolrCore, MDCAwareThreadPoolExecutor] Stack Trace: java.lang.AssertionError: ObjectTracker found 4 object(s) that were not released!!! [MockDirectoryWrapper, MockDirectoryWrapper, SolrCore, MDCAwareThreadPoolExecutor] at __randomizedtesting.SeedInfo.seed([86700937836D4776]:0) at org.junit.Assert.fail(Assert.java:93) at org.junit.Assert.assertTrue(Assert.java:43) at org.junit.Assert.assertNull(Assert.java:551) at org.apache.solr.SolrTestCaseJ4.afterClass(SolrTestCaseJ4.java:238) at sun.reflect.GeneratedMethodAccessor16.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(java.base@9-ea/DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(java.base@9-ea/Method.java:531) at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1764) at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:834) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:41) at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40) at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47) at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64) at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:54) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:367) at java.lang.Thread.run(java.base@9-ea/Thread.java:804) FAILED: junit.framework.TestSuite.org.apache.solr.core.TestLazyCores Error Message: 1 thread leaked from SUITE scope at org.apache.solr.core.TestLazyCores: 1) Thread[id=7229, name=searcherExecutor-3512-thread-1, state=WAITING, group=TGRP-TestLazyCores] at jdk.internal.misc.Unsafe.park(java.base@9-ea/Native Method) at java.util.concurrent.locks.LockSupport.park(java.base@9-ea/LockSupport.java:190) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(java.base@9-ea/AbstractQueuedSynchronizer.java:2064) at java.util.concurrent.LinkedBlockingQueue.take(java.base@9-ea/LinkedBlockingQueue.java:442) at java.util.concurrent.ThreadPoolExecutor.getTask(java.base@9-ea/ThreadPoolExecutor.java:1083) at java.util.concurrent.ThreadPoolExecutor.runWorker(java.base@9-ea/ThreadPoolExecutor.java:1143) at java.util.concurrent.ThreadPoolExecutor$Worker.run(java.base@9-ea/ThreadPoolExecutor.java:632) at 
java.lang.Thread.run(java.base@9-ea/Thread.java:804) Stack Trace: com.carrotsearch.randomizedtesting.ThreadLeakError: 1 thread leaked from SUITE scope at org.apache.solr.core.TestLazyCores: 1) Thread[id=7229, name=searcherExecutor-3512-thread-1, state=WAITING, group=TGRP-TestLazyCores] at jdk.internal.misc.Unsafe.park(java.base@9-ea/Native Method) at java.util.concurrent.locks.LockSupport.park(java.base@9-ea/LockSupport.java:190) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(java.base@9-ea/AbstractQueuedSynchronizer.java:2064) at java.util.concurrent.LinkedBlockingQueue.take(java.base@9-ea/LinkedBlockingQueue.java:442) at
[jira] [Commented] (SOLR-8785) Use Metrics library for core metrics
[ https://issues.apache.org/jira/browse/SOLR-8785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15210937#comment-15210937 ] ASF GitHub Bot commented on SOLR-8785: -- GitHub user randomstatistic opened a pull request: https://github.com/apache/lucene-solr/pull/25 SOLR-8785: Use Metrics library for core metrics There were three main areas that used the copied classes in org.apache.solr.util.stats: - AnalyticsStatisticsCollector - Overseer.Stats - RequestHandlerBase This patch adds deprecation tags to all the copied classes, and also replaces all usage of those classes with classes from the Metrics library. I added one new class (org.apache.solr.util.stats.Metrics) to provide some common access patterns for metrics gathering. This patch only adds Registry-based tracking to RequestHandlerBase, although all three areas are a fit for it. The effect is that all one needs to do is add a Reporter to the SharedMetricRegistry named “solr.registry.requesthandler” and all named request handler stats will be exported automatically. Compatibility notes: - The “totalTime” stat has been deleted from all three areas. This never seemed very useful, and Metrics didn’t support it in the Timer class, so it would have required some extra code to keep. - RequestHandler stats are now persistent, and will no longer reset on reload. 
You can merge this pull request into a Git repository by running: $ git pull https://github.com/randomstatistic/lucene-solr metrics_lib Alternatively you can review and apply these changes as the patch at: https://github.com/apache/lucene-solr/pull/25.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #25 commit 77ba4704399ecd5121a6941a3a75c1294172ed21 Author: Jeff WartesDate: 2016-03-16T18:27:35Z SOLR-8785 - Upgrade Metrics lib commit 9ad3a8179ac446f5820e051802a37bf8b2ba911b Author: Jeff Wartes Date: 2016-03-18T02:46:49Z SOLR-8785 - Use the Metrics lib instead of the old classes from the org.apache.solr.util.stats package space. commit 6ee11c807aa7432ec02f9ad63aefc7487a02566a Author: Jeff Wartes Date: 2016-03-18T02:55:33Z SOLR-8785 - Use persistent, reportable timers for named request handlers > Use Metrics library for core metrics > > > Key: SOLR-8785 > URL: https://issues.apache.org/jira/browse/SOLR-8785 > Project: Solr > Issue Type: Improvement >Affects Versions: 4.1 >Reporter: Jeff Wartes > > The Metrics library (https://dropwizard.github.io/metrics/3.1.0/) is a > well-known way to track metrics about applications. > In SOLR-1972, latency percentile tracking was added. The comment list is > long, so here’s my synopsis: > 1. An attempt was made to use the Metrics library > 2. That attempt failed due to a memory leak in Metrics v2.1.1 > 3. Large parts of Metrics were then copied wholesale into the > org.apache.solr.util.stats package space and that was used instead. > Copy/pasting Metrics code into Solr may have been the correct solution at the > time, but I submit that it isn’t correct any more. > The leak in Metrics was fixed even before SOLR-1972 was released, and by > copy/pasting a subset of the functionality, we miss access to other important > things that the Metrics library provides, particularly the concept of a > Reporter. 
(https://dropwizard.github.io/metrics/3.1.0/manual/core/#reporters) > Further, Metrics v3.0.2 is already packaged with Solr anyway, because it’s > used in two contrib modules. (map-reduce and morphlines-core) > I’m proposing that: > 1. Metrics as bundled with Solr be upgraded to the current v3.1.2 > 2. Most of the org.apache.solr.util.stats package space be deleted outright, > or gutted and replaced with simple calls to Metrics. Due to the copy/paste > origin, the concepts should mostly map 1:1. > I’d further recommend a usage pattern like: > SharedMetricRegistries.getOrCreate(System.getProperty(“solr.metrics.registry”, > “solr-registry”)) > There are all kinds of areas in Solr that could benefit from metrics tracking > and reporting. This pattern allows diverse areas of code to track metrics > within a single, named registry. This well-known name then becomes a handle > you can use to easily attach a Reporter and ship all of those metrics off-box.
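The registry pattern proposed above can be sketched in a few lines of Java. This requires the Dropwizard metrics-core 3.x jar on the classpath; the metric name and the sleep are illustrative, not Solr's actual names:

```java
import com.codahale.metrics.ConsoleReporter;
import com.codahale.metrics.MetricRegistry;
import com.codahale.metrics.SharedMetricRegistries;
import com.codahale.metrics.Timer;
import java.util.concurrent.TimeUnit;

// Sketch of the proposed shared-registry pattern (needs metrics-core 3.x).
public class MetricsSketch {
    public static void main(String[] args) throws Exception {
        // Look up (or create) the well-known registry by name.
        MetricRegistry registry = SharedMetricRegistries.getOrCreate(
                System.getProperty("solr.metrics.registry", "solr-registry"));

        // Record a timed event; metric name here is made up for illustration.
        Timer timer = registry.timer("requesthandler.select.requestTimes");
        try (Timer.Context ctx = timer.time()) {
            Thread.sleep(5); // stand-in for handling a request
        }

        // Any code that knows the registry name can attach a Reporter:
        ConsoleReporter reporter = ConsoleReporter.forRegistry(registry)
                .convertDurationsTo(TimeUnit.MILLISECONDS)
                .build();
        reporter.report(); // one-shot dump; start() would ship metrics periodically
    }
}
```

The point of `SharedMetricRegistries` is exactly what the description argues: diverse code paths record into one named registry, and off-box shipping is a separate concern attached via a Reporter.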
[GitHub] lucene-solr pull request: SOLR-8785: Use Metrics library for core ...
GitHub user randomstatistic opened a pull request: https://github.com/apache/lucene-solr/pull/25 SOLR-8785: Use Metrics library for core metrics There were three main areas that used the copied classes in org.apache.solr.util.stats: - AnalyticsStatisticsCollector - Overseer.Stats - RequestHandlerBase This patch adds deprecation tags to all the copied classes, and also replaces all usage of those classes with classes from the Metrics library. I added one new class (org.apache.solr.util.stats.Metrics) to provide some common access patterns for metrics gathering. This patch only adds Registry-based tracking to RequestHandlerBase, although all three areas are a fit for it. The effect is that all one needs to do is add a Reporter to the SharedMetricRegistry named “solr.registry.requesthandler” and all named request handler stats will be exported automatically. Compatibility notes: - The “totalTime” stat has been deleted from all three areas. This never seemed very useful, and Metrics didn’t support it in the Timer class, so it would have required some extra code to keep. - RequestHandler stats are now persistent, and will no longer reset on reload. You can merge this pull request into a Git repository by running: $ git pull https://github.com/randomstatistic/lucene-solr metrics_lib Alternatively you can review and apply these changes as the patch at: https://github.com/apache/lucene-solr/pull/25.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #25 commit 77ba4704399ecd5121a6941a3a75c1294172ed21 Author: Jeff Wartes Date: 2016-03-16T18:27:35Z SOLR-8785 - Upgrade Metrics lib commit 9ad3a8179ac446f5820e051802a37bf8b2ba911b Author: Jeff Wartes Date: 2016-03-18T02:46:49Z SOLR-8785 - Use the Metrics lib instead of the old classes from the org.apache.solr.util.stats package space. 
commit 6ee11c807aa7432ec02f9ad63aefc7487a02566a Author: Jeff Wartes Date: 2016-03-18T02:55:33Z SOLR-8785 - Use persistent, reportable timers for named request handlers --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (SOLR-8898) Solr mapreduce doesn't work in kerberized environment
Łukasz Dywicki created SOLR-8898: Summary: Solr mapreduce doesn't work in kerberized environment Key: SOLR-8898 URL: https://issues.apache.org/jira/browse/SOLR-8898 Project: Solr Issue Type: Bug Components: contrib - MapReduce Affects Versions: 5.2.1 Environment: Kerberos Reporter: Łukasz Dywicki The jobs, skeletons, and tools available in the MapReduce contrib do not work with Kerberos. For example, MapReduceIndexerTool breaks at the go-live phase when it needs to execute an HTTP request against a kerberized Solr. The full story is written up on the Hortonworks community forum, but it affects Solr in exactly the same way.
[jira] [Updated] (LUCENE-7094) spatial-extras BBoxStrategy and (confusingly!) PointVectorStrategy use legacy numeric encoding
[ https://issues.apache.org/jira/browse/LUCENE-7094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Knize updated LUCENE-7094: --- Attachment: LUCENE-7094.patch Updated patch includes the following: * Per request, ComboField deprecation has been removed. I'm still not convinced we need to keep this around but if it's a blocker for the Point cut over I'm fine leaving it alone. * Moved get/setFieldType back to {{BBoxStrategy}} only * {{PointVectorStrategy}} requires docvalues without leniency * {{BBoxStrategy}} and {{PointVectorStrategy}} still index docvalues in a separate field. This is needed because docvalues on a {{DoubleField}} store a long cast from the double value, resulting in truncation of the original double. This is a bug that was not caught before because the test used whole values instead of decimal values. {{DistanceStrategyTest.testRecipScore}} was updated to test on decimal values. > spatial-extras BBoxStrategy and (confusingly!) PointVectorStrategy use legacy > numeric encoding > -- > > Key: LUCENE-7094 > URL: https://issues.apache.org/jira/browse/LUCENE-7094 > Project: Lucene - Core > Issue Type: Bug >Reporter: Robert Muir >Assignee: Nicholas Knize > Attachments: LUCENE-7094.patch, LUCENE-7094.patch > > > We need to deprecate these since they work on the old encoding and provide > points based alternatives.
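The truncation bug is easy to see in plain Java: a (long) cast drops the fractional part, while a sortable-bits encoding round-trips exactly. The helper below reimplements, for illustration only, the equivalent of Lucene's NumericUtils.doubleToSortableLong:

```java
// Why a separate docvalues field matters: a plain (long) cast truncates the
// decimal part, while the sortable-bits encoding (what Lucene's
// NumericUtils.doubleToSortableLong does; reimplemented here) is lossless
// and keeps the longs in the same order as the doubles.
public class SortableDoubleSketch {
    static long doubleToSortableLong(double v) {
        long bits = Double.doubleToLongBits(v);
        // Flip the value bits of negatives so signed-long order matches numeric order.
        return bits ^ ((bits >> 63) & 0x7fffffffffffffffL);
    }

    static double sortableLongToDouble(long v) {
        // The bit transform is an involution, so decoding applies it again.
        return Double.longBitsToDouble(v ^ ((v >> 63) & 0x7fffffffffffffffL));
    }

    public static void main(String[] args) {
        double val = 32.7;
        assert (long) val == 32L;  // the buggy path: 32.7 truncates to 32
        assert sortableLongToDouble(doubleToSortableLong(val)) == 32.7; // lossless
        assert doubleToSortableLong(-1.5) < doubleToSortableLong(2.5);  // order kept
        System.out.println("cast=" + (long) val
                + " roundtrip=" + sortableLongToDouble(doubleToSortableLong(val)));
    }
}
```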
[jira] [Commented] (SOLR-445) Update Handlers abort with bad documents
[ https://issues.apache.org/jira/browse/SOLR-445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15210827#comment-15210827 ] Timothy Potter commented on SOLR-445: - LGTM +1 Nice test coverage of all this! This will be very useful for streaming applications (such as from Spark and Storm) where re-trying individual docs after an error is less than ideal. Now we'll be able to pin-point exactly which docs had issues! I'd prefer this to be baked into the default chain but can understand the rationale for leaving it out for now too. So long as we put up an example of how to enable it using the Config API in the ref guide. > Update Handlers abort with bad documents > > > Key: SOLR-445 > URL: https://issues.apache.org/jira/browse/SOLR-445 > Project: Solr > Issue Type: Improvement > Components: update >Reporter: Will Johnson >Assignee: Hoss Man > Fix For: master, 6.1 > > Attachments: SOLR-445-3_x.patch, SOLR-445-alternative.patch, > SOLR-445-alternative.patch, SOLR-445-alternative.patch, > SOLR-445-alternative.patch, SOLR-445.patch, SOLR-445.patch, SOLR-445.patch, > SOLR-445.patch, SOLR-445.patch, SOLR-445.patch, SOLR-445.patch, > SOLR-445.patch, SOLR-445.patch, SOLR-445_3x.patch, solr-445.xml > > > This issue adds a new {{TolerantUpdateProcessorFactory}} making it possible > to configure solr updates so that they are "tolerant" of individual errors in > an update request... > {code} > > 10 > > {code} > When a chain with this processor is used, but maxErrors isn't exceeded, > here's what the response looks like... 
> {code} > $ curl > 'http://localhost:8983/solr/techproducts/update?update.chain=tolerant-chain=json=true=-1' > -H "Content-Type: application/json" --data-binary '{"add" : { > "doc":{"id":"1","foo_i":"bogus"}}, "delete": {"query":"malformed:["}}' > { > "responseHeader":{ > "errors":[{ > "type":"ADD", > "id":"1", > "message":"ERROR: [doc=1] Error adding field 'foo_i'='bogus' msg=For > input string: \"bogus\""}, > { > "type":"DELQ", > "id":"malformed:[", > "message":"org.apache.solr.search.SyntaxError: Cannot parse > 'malformed:[': Encountered \"\" at line 1, column 11.\nWas expecting one > of:\n ...\n ...\n"}], > "maxErrors":-1, > "status":0, > "QTime":1}} > {code} > Note in the above example that: > * maxErrors can be overridden on a per-request basis > * an effective {{maxErrors==-1}} (either from config, or request param) means > "unlimited" (under the covers it's using {{Integer.MAX_VALUE}}) > If/When maxErrors is reached for a request, then the _first_ exception that > the processor caught is propagated back to the user, and metadata is set on > that exception with all of the same details about all the tolerated errors. > This next example is the same as the previous except that instead of > {{maxErrors=-1}} the request param is now {{maxErrors=1}}... 
> {code} > $ curl > 'http://localhost:8983/solr/techproducts/update?update.chain=tolerant-chain=json=true=1' > -H "Content-Type: application/json" --data-binary '{"add" : { > "doc":{"id":"1","foo_i":"bogus"}}, "delete": {"query":"malformed:["}}' > { > "responseHeader":{ > "errors":[{ > "type":"ADD", > "id":"1", > "message":"ERROR: [doc=1] Error adding field 'foo_i'='bogus' msg=For > input string: \"bogus\""}, > { > "type":"DELQ", > "id":"malformed:[", > "message":"org.apache.solr.search.SyntaxError: Cannot parse > 'malformed:[': Encountered \"\" at line 1, column 11.\nWas expecting one > of:\n ...\n ...\n"}], > "maxErrors":1, > "status":400, > "QTime":1}, > "error":{ > "metadata":[ > "org.apache.solr.common.ToleratedUpdateError--ADD:1","ERROR: [doc=1] > Error adding field 'foo_i'='bogus' msg=For input string: \"bogus\"", > > "org.apache.solr.common.ToleratedUpdateError--DELQ:malformed:[","org.apache.solr.search.SyntaxError: > Cannot parse 'malformed:[': Encountered \"\" at line 1, column 11.\nWas > expecting one of:\n ...\n ...\n", > "error-class","org.apache.solr.common.SolrException", > "root-error-class","java.lang.NumberFormatException"], > "msg":"ERROR: [doc=1] Error adding field 'foo_i'='bogus' msg=For input > string: \"bogus\"", > "code":400}} > {code} > ...the added exception metadata ensures that even in client code like the > various SolrJ SolrClient implementations, which throw a (client side) > exception on non-200 responses, the end user can access info on all the > tolerated errors that were ignored before the maxErrors threshold was reached. > > {panel:title=Original Jira Request} > Has anyone run into the problem of handling bad documents / failures mid >
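Along the lines of the ref-guide example requested above, enabling the processor without editing solrconfig.xml might look like this. This is a sketch assuming the Config API's add-updateprocessor command and the per-request processor param; the name "tolerant" and the collection are illustrative:

```shell
# Sketch: register a named TolerantUpdateProcessorFactory via the Config API
curl -X POST 'http://localhost:8983/solr/techproducts/config' \
  -H 'Content-Type: application/json' --data-binary '{
    "add-updateprocessor": {
      "name": "tolerant",
      "class": "solr.TolerantUpdateProcessorFactory",
      "maxErrors": 10
    }
  }'

# Then opt in per request with the processor param instead of update.chain
curl 'http://localhost:8983/solr/techproducts/update?processor=tolerant&commit=true' \
  -H 'Content-Type: application/json' --data-binary '[{"id":"1","foo_i":"bogus"}]'
```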
[jira] [Updated] (LUCENE-6871) Move SpanQueries out of .spans package
[ https://issues.apache.org/jira/browse/LUCENE-6871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Woodward updated LUCENE-6871: -- Attachment: LUCENE-6871.patch Here's an updated patch that just moves everything out of the .spans package into oal.search. I think this is worth doing before 6.0? It would also allow us to make package-private a bunch of classes in oal.search that are currently public but marked as internal, just because they're used by Spans. Merging SpanTermQuery and TermQuery, etc, can be looked at in follow up issues. > Move SpanQueries out of .spans package > -- > > Key: LUCENE-6871 > URL: https://issues.apache.org/jira/browse/LUCENE-6871 > Project: Lucene - Core > Issue Type: Improvement >Affects Versions: 5.4, master >Reporter: Alan Woodward > Attachments: LUCENE-6871.patch, LUCENE-6871.patch > > > SpanQueries are now essentially the same as a standard query, restricted to a > single field and with an extra scorer type returned by getSpans(). There are > a number of existing queries that fit this contract, including TermQuery and > PhraseQuery, and it should be possible to make them SpanQueries as well > without impacting their existing performance. However, we can't do this > while SpanQuery and its associated Weight and Spans classes are in their own > package. > I'd like to remove the o.a.l.search.spans package entirely, in a few stages: > 1) Move SpanQuery, SpanWeight, Spans, SpanCollector and FilterSpans to > o.a.l.search > 2) Remove SpanTermQuery and merge its functionality into TermQuery > 3) Move SpanNear, SpanNot, SpanOr and SpanMultiTermQueryWrapper to > o.a.l.search > 4) Move the remaining SpanQueries to the queries package > Then we can look at, eg, making PhraseQuery a SpanQuery, removing > SpanMTQWrapper and making MultiTermQuery a SpanQuery, etc. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-445) Update Handlers abort with bad documents
[ https://issues.apache.org/jira/browse/SOLR-445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hoss Man updated SOLR-445: -- Affects Version/s: (was: 1.3) Fix Version/s: 6.1 master Description: This issue adds a new {{TolerantUpdateProcessorFactory}} making it possible to configure solr updates so that they are "tolerant" of individual errors in an update request... {code} 10 {code} When a chain with this processor is used, but maxErrors isn't exceeded, here's what the response looks like... {code} $ curl 'http://localhost:8983/solr/techproducts/update?update.chain=tolerant-chain=json=true=-1' -H "Content-Type: application/json" --data-binary '{"add" : { "doc":{"id":"1","foo_i":"bogus"}}, "delete": {"query":"malformed:["}}' { "responseHeader":{ "errors":[{ "type":"ADD", "id":"1", "message":"ERROR: [doc=1] Error adding field 'foo_i'='bogus' msg=For input string: \"bogus\""}, { "type":"DELQ", "id":"malformed:[", "message":"org.apache.solr.search.SyntaxError: Cannot parse 'malformed:[': Encountered \"\" at line 1, column 11.\nWas expecting one of:\n ...\n ...\n"}], "maxErrors":-1, "status":0, "QTime":1}} {code} Note in the above example that: * maxErrors can be overridden on a per-request basis * an effective {{maxErrors==-1}} (either from config, or request param) means "unlimited" (under the covers it's using {{Integer.MAX_VALUE}}) If/When maxErrors is reached for a request, then the _first_ exception that the processor caught is propagated back to the user, and metadata is set on that exception with all of the same details about all the tolerated errors. This next example is the same as the previous except that instead of {{maxErrors=-1}} the request param is now {{maxErrors=1}}... 
{code} $ curl 'http://localhost:8983/solr/techproducts/update?update.chain=tolerant-chain=json=true=1' -H "Content-Type: application/json" --data-binary '{"add" : { "doc":{"id":"1","foo_i":"bogus"}}, "delete": {"query":"malformed:["}}' { "responseHeader":{ "errors":[{ "type":"ADD", "id":"1", "message":"ERROR: [doc=1] Error adding field 'foo_i'='bogus' msg=For input string: \"bogus\""}, { "type":"DELQ", "id":"malformed:[", "message":"org.apache.solr.search.SyntaxError: Cannot parse 'malformed:[': Encountered \"\" at line 1, column 11.\nWas expecting one of:\n ...\n ...\n"}], "maxErrors":1, "status":400, "QTime":1}, "error":{ "metadata":[ "org.apache.solr.common.ToleratedUpdateError--ADD:1","ERROR: [doc=1] Error adding field 'foo_i'='bogus' msg=For input string: \"bogus\"", "org.apache.solr.common.ToleratedUpdateError--DELQ:malformed:[","org.apache.solr.search.SyntaxError: Cannot parse 'malformed:[': Encountered \"\" at line 1, column 11.\nWas expecting one of:\n ...\n ...\n", "error-class","org.apache.solr.common.SolrException", "root-error-class","java.lang.NumberFormatException"], "msg":"ERROR: [doc=1] Error adding field 'foo_i'='bogus' msg=For input string: \"bogus\"", "code":400}} {code} ...the added exception metadata ensures that even in client code like the various SolrJ SolrClient implementations, which throw a (client side) exception on non-200 responses, the end user can access info on all the tolerated errors that were ignored before the maxErrors threshold was reached. {panel:title=Original Jira Request} Has anyone run into the problem of handling bad documents / failures mid batch. Ie: 1 2 I_AM_A_BAD_DATE 3 Right now solr adds the first doc and then aborts. It would seem like it should either fail the entire batch or log a message/return a code and then continue on to add doc 3. Option 1 would seem to be much harder to accomplish and possibly require more memory while Option 2 would require more information to come back from the API. 
I'm about to dig into this but I thought I'd ask to see if anyone had any suggestions, thoughts or comments. {panel} was: Has anyone run into the problem of handling bad documents / failures mid batch. Ie: 1 2 I_AM_A_BAD_DATE 3 Right now solr adds the first doc and then aborts. It would seem like it should either fail the entire batch or log a message/return a code and then continue on to add doc 3. Option 1 would seem to be much harder to accomplish and possibly require more memory while Option 2 would require more information to come back from the API. I'm about to dig into this but I thought I'd ask to see if anyone had any suggestions, thoughts or comments. updated summary to reflect basic information about feature being added > Update Handlers abort with bad documents >
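The exception metadata in the maxErrors=1 example above is a flat list of alternating key/value strings, with each tolerated error keyed by an {{org.apache.solr.common.ToleratedUpdateError--<TYPE>:<id>}} entry. As a rough sketch of how a client could recover the tolerated errors from such a response (illustrative only, not Solr's own parsing code; the function name is made up):

```python
# Sketch: pull tolerated-error details out of the flat "metadata" list
# returned when maxErrors is exceeded. Keys for tolerated errors carry a
# "ToleratedUpdateError--TYPE:id" prefix; other entries (error-class,
# root-error-class, ...) are ignored. Illustrative only, not Solr code.
PREFIX = "org.apache.solr.common.ToleratedUpdateError--"

def parse_tolerated_errors(metadata):
    errors = []
    # The list alternates key, value, key, value, ...
    for key, value in zip(metadata[0::2], metadata[1::2]):
        if key.startswith(PREFIX):
            # partition (not split) keeps ':' chars inside the id intact,
            # e.g. the id "malformed:[" from a failed delete-by-query
            err_type, _, err_id = key[len(PREFIX):].partition(":")
            errors.append({"type": err_type, "id": err_id, "message": value})
    return errors

metadata = [
    "org.apache.solr.common.ToleratedUpdateError--ADD:1",
    "ERROR: [doc=1] Error adding field 'foo_i'='bogus'",
    "org.apache.solr.common.ToleratedUpdateError--DELQ:malformed:[",
    "org.apache.solr.search.SyntaxError: Cannot parse 'malformed:['",
    "error-class", "org.apache.solr.common.SolrException",
    "root-error-class", "java.lang.NumberFormatException",
]
for err in parse_tolerated_errors(metadata):
    print(err["type"], err["id"])
# -> ADD 1
# -> DELQ malformed:[
```

The same alternating key/value layout appears whether the response is read as raw JSON or via a SolrJ client-side exception's metadata.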
[JENKINS] Lucene-Solr-6.x-Windows (64bit/jdk1.8.0_72) - Build # 70 - Failure!
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-6.x-Windows/70/ Java: 64bit/jdk1.8.0_72 -XX:-UseCompressedOops -XX:+UseG1GC 1 tests failed. FAILED: org.apache.solr.handler.TestReplicationHandler.doTestIndexAndConfigAliasReplication Error Message: [C:\Users\jenkins\workspace\Lucene-Solr-6.x-Windows\solr\build\solr-core\test\J0\temp\solr.handler.TestReplicationHandler_D16D69F945D74522-001\solr-instance-028\.\collection1\data\index.20160325044451556, C:\Users\jenkins\workspace\Lucene-Solr-6.x-Windows\solr\build\solr-core\test\J0\temp\solr.handler.TestReplicationHandler_D16D69F945D74522-001\solr-instance-028\.\collection1\data\index.20160325044451216, C:\Users\jenkins\workspace\Lucene-Solr-6.x-Windows\solr\build\solr-core\test\J0\temp\solr.handler.TestReplicationHandler_D16D69F945D74522-001\solr-instance-028\.\collection1\data\] expected:<2> but was:<3> Stack Trace: java.lang.AssertionError: [C:\Users\jenkins\workspace\Lucene-Solr-6.x-Windows\solr\build\solr-core\test\J0\temp\solr.handler.TestReplicationHandler_D16D69F945D74522-001\solr-instance-028\.\collection1\data\index.20160325044451556, C:\Users\jenkins\workspace\Lucene-Solr-6.x-Windows\solr\build\solr-core\test\J0\temp\solr.handler.TestReplicationHandler_D16D69F945D74522-001\solr-instance-028\.\collection1\data\index.20160325044451216, C:\Users\jenkins\workspace\Lucene-Solr-6.x-Windows\solr\build\solr-core\test\J0\temp\solr.handler.TestReplicationHandler_D16D69F945D74522-001\solr-instance-028\.\collection1\data\] expected:<2> but was:<3> at __randomizedtesting.SeedInfo.seed([D16D69F945D74522:261E87A1833FEAC4]:0) at org.junit.Assert.fail(Assert.java:93) at org.junit.Assert.failNotEquals(Assert.java:647) at org.junit.Assert.assertEquals(Assert.java:128) at org.junit.Assert.assertEquals(Assert.java:472) at org.apache.solr.handler.TestReplicationHandler.checkForSingleIndex(TestReplicationHandler.java:818) at 
org.apache.solr.handler.TestReplicationHandler.doTestIndexAndConfigAliasReplication(TestReplicationHandler.java:1248) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1764) at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:871) at com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:907) at com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:921) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57) at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:49) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45) at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48) at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:367) at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:809) at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:460) at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:880) at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:781) at 
com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:816) at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:827) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:41) at
[jira] [Commented] (SOLR-445) Update Handlers abort with bad documents
[ https://issues.apache.org/jira/browse/SOLR-445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15210784#comment-15210784 ] Hoss Man commented on SOLR-445: --- I'm still beasting the tests a bit, but i think this is pretty solid and ready for master/branch_6x > Update Handlers abort with bad documents > > > Key: SOLR-445 > URL: https://issues.apache.org/jira/browse/SOLR-445 > Project: Solr > Issue Type: Improvement > Components: update >Affects Versions: 1.3 >Reporter: Will Johnson >Assignee: Hoss Man > Attachments: SOLR-445-3_x.patch, SOLR-445-alternative.patch, > SOLR-445-alternative.patch, SOLR-445-alternative.patch, > SOLR-445-alternative.patch, SOLR-445.patch, SOLR-445.patch, SOLR-445.patch, > SOLR-445.patch, SOLR-445.patch, SOLR-445.patch, SOLR-445.patch, > SOLR-445.patch, SOLR-445.patch, SOLR-445_3x.patch, solr-445.xml > > > Has anyone run into the problem of handling bad documents / failures mid > batch. Ie: > > > 1 > > > 2 > I_AM_A_BAD_DATE > > > 3 > > > Right now solr adds the first doc and then aborts. It would seem like it > should either fail the entire batch or log a message/return a code and then > continue on to add doc 3. Option 1 would seem to be much harder to > accomplish and possibly require more memory while Option 2 would require more > information to come back from the API. I'm about to dig into this but I > thought I'd ask to see if anyone had any suggestions, thoughts or comments. > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-445) Update Handlers abort with bad documents
[ https://issues.apache.org/jira/browse/SOLR-445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15210781#comment-15210781 ] ASF subversion and git services commented on SOLR-445: -- Commit b08c284b26b1779d03693a45e219db89839461d0 in lucene-solr's branch refs/heads/jira/SOLR-445 from [~hossman_luc...@fucit.org] [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=b08c284 ] SOLR-445: fix logger declaration to satisfy precommit
[jira] [Commented] (SOLR-445) Update Handlers abort with bad documents
[ https://issues.apache.org/jira/browse/SOLR-445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15210779#comment-15210779 ] ASF subversion and git services commented on SOLR-445: -- Commit 39884c0b0c02b4090640d6268a45a1cf5f54f3e0 in lucene-solr's branch refs/heads/jira/SOLR-445 from [~hossman_luc...@fucit.org] [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=39884c0 ] SOLR-445: removing questionable isLeader check; beasting the tests w/o this code didn't demonstrate any problems
[jira] [Commented] (SOLR-445) Update Handlers abort with bad documents
[ https://issues.apache.org/jira/browse/SOLR-445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15210780#comment-15210780 ] ASF subversion and git services commented on SOLR-445: -- Commit 1d8cdd27993a46ae17c4ac308504513a33f01a15 in lucene-solr's branch refs/heads/jira/SOLR-445 from [~hossman_luc...@fucit.org] [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=1d8cdd2 ] SOLR-445: remove test - we have more complete coverage in TestTolerantUpdateProcessorCloud which uses the more robust SolrCloudTestCase model
[jira] [Resolved] (LUCENE-7136) remove Threads from BaseGeoPointTestCase
[ https://issues.apache.org/jira/browse/LUCENE-7136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir resolved LUCENE-7136. - Resolution: Fixed Fix Version/s: 6.1 master > remove Threads from BaseGeoPointTestCase > > > Key: LUCENE-7136 > URL: https://issues.apache.org/jira/browse/LUCENE-7136 > Project: Lucene - Core > Issue Type: Test >Reporter: Robert Muir > Fix For: master, 6.1 > > Attachments: LUCENE-7136.patch > > > I don't think we should mix testing threads with all the other stuff going on > here. It makes things too hard to debug. > if we want to test thread safety of e.g. BKD or queries somewhere, that > should be an explicit narrow test just for that (no complicated geometry > going on).
[jira] [Commented] (LUCENE-7136) remove Threads from BaseGeoPointTestCase
[ https://issues.apache.org/jira/browse/LUCENE-7136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15210691#comment-15210691 ] ASF subversion and git services commented on LUCENE-7136: - Commit 39aaa108ac8a85809080e4f7cf2b5ac0cc0d0fe9 in lucene-solr's branch refs/heads/branch_6x from [~rcmuir] [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=39aaa10 ] LUCENE-7136: remove threads from BaseGeoPointTestCase
[jira] [Commented] (LUCENE-7136) remove Threads from BaseGeoPointTestCase
[ https://issues.apache.org/jira/browse/LUCENE-7136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15210689#comment-15210689 ] ASF subversion and git services commented on LUCENE-7136: - Commit fc7f559138b2544f9db42dbd745231f5a8b076c4 in lucene-solr's branch refs/heads/master from [~rcmuir] [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=fc7f559 ] LUCENE-7136: remove threads from BaseGeoPointTestCase
[jira] [Updated] (LUCENE-7075) Clean up LegacyNumericUtils usage.
[ https://issues.apache.org/jira/browse/LUCENE-7075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Knize updated LUCENE-7075: --- Affects Version/s: master > Clean up LegacyNumericUtils usage. > -- > > Key: LUCENE-7075 > URL: https://issues.apache.org/jira/browse/LUCENE-7075 > Project: Lucene - Core > Issue Type: Improvement >Affects Versions: master, 6.0 >Reporter: Robert Muir >Priority: Blocker > > Tons of code is still on the deprecated LegacyNumericUtils. We will never be > able to remove these or even move them to somewhere better (like the > backwards jar) if we don't clean this up!
[jira] [Updated] (LUCENE-7075) Clean up LegacyNumericUtils usage.
[ https://issues.apache.org/jira/browse/LUCENE-7075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Knize updated LUCENE-7075: --- Priority: Blocker (was: Major)
[jira] [Updated] (LUCENE-7075) Clean up LegacyNumericUtils usage.
[ https://issues.apache.org/jira/browse/LUCENE-7075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Knize updated LUCENE-7075: --- Affects Version/s: 6.0
Re: Lucene FieldType & specifying numeric type (double, float, )
On Thu, Mar 24, 2016 at 11:28 AM, Jack Krupansky wrote: > Yeah, I do recall seeing LUCENE-6917 (Deprecate and rename > NumericField/RangeQuery to LegacyNumeric) go by in the Jira traffic. It was also mere weeks between this deprecation (which did not address Solr) and the proposal to start the lucene/solr 6 release process, virtually ensuring that Solr would be on deprecated numeric types for 6.0. Of course, given that the release process has apparently stalled and development of the Point stuff is continuing, it seems like the deprecation was premature. This would also seem to mark the end of the ability to upgrade indexes without reindexing (unless the IndexUpgrader will acquire the ability to migrate from old numerics to new numerics). -Yonik
Re: Lucene FieldType & specifying numeric type (double, float, )
Thanks Robert, sounds good. And I'll give the blog post a read Mike. Joel Bernstein http://joelsolr.blogspot.com/ On Thu, Mar 24, 2016 at 12:51 PM, Michael McCandless <luc...@mikemccandless.com> wrote: > See also my recent blog post describing this new feature: > https://www.elastic.co/blog/lucene-points-6.0 > > Net/net, in the 1D case, points looks like a win across the board vs. > the legacy (postings) implementation. > > Mike McCandless > > http://blog.mikemccandless.com > > > On Thu, Mar 24, 2016 at 12:33 PM, Robert Muir wrote: > > On Thu, Mar 24, 2016 at 12:16 PM, Joel Bernstein wrote: > >> I'm pretty confused about points as well and until very recently thought > >> these were geo-spatial improvements only. > >> > >> It would be good to understand the mechanics of points versus numerics. I'm > >> particularly interested in not losing the high performance numeric DocValues > >> support, which has become so important for analytics. > >> > > > > Unrelated. points are the structure used to find matching documents > > from e.g. a query point, range, radius, shape, whatever. They use a > > tree-like structure for this. So the replacement for NumericRangeQuery > > which "simulates" a tree with an inverted index. > > > > Instead of inverted index+postings list, we just have a proper tree > > structure for these things: fixed-width, multidimensional values. It > > has a different indexreader api for example, that lets you control how > > the tree is traversed as it goes (by returning INSIDE [collect all the > > docids in here blindly, this entire tree range is relevant], OUTSIDE > > [not relevant to my query, don't traverse this region anymore], or > > CROSSES [i may or may not be interested, have to traverse further to > > nodes (sub-ranges or values themselves)]. > > > > They also have the advantage of not being limited to 64 bits or 1 > > dimension, you can have up to 128 bits and up to 8 dimensions. So each > > thing you are adding to your document is really a "point in > > n-dimensional space", so if you want to have 3 lat+long pairs as a > > double[] in a single field, that works as you expect. > > > > See more information here: > > https://github.com/apache/lucene-solr/blob/master/lucene/core/src/java/org/apache/lucene/index/PointValues.java#L35-L79
[jira] [Resolved] (LUCENE-7137) consolidate many tests across Points and GeoPoint queries/fields
[ https://issues.apache.org/jira/browse/LUCENE-7137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir resolved LUCENE-7137. - Resolution: Fixed Fix Version/s: 6.1 master > consolidate many tests across Points and GeoPoint queries/fields > > > Key: LUCENE-7137 > URL: https://issues.apache.org/jira/browse/LUCENE-7137 > Project: Lucene - Core > Issue Type: Test >Reporter: Robert Muir > Fix For: master, 6.1 > > Attachments: LUCENE-7137.patch > > > We have found repeated basic problems with stuff like equals/hashcode > recently, I think we should consolidate tests and cleanup here. > these different implementations also have a little assortment of simplistic > unit tests, if its not doing anything impl-specific, we should fold those in > too. these are easy to debug and great to see fail if something is wrong. > I will work up a patch.
[jira] [Commented] (LUCENE-7137) consolidate many tests across Points and GeoPoint queries/fields
[ https://issues.apache.org/jira/browse/LUCENE-7137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15210552#comment-15210552 ] ASF subversion and git services commented on LUCENE-7137: - Commit 139aa0bec5683acbc6c3a00898aad9572853ea91 in lucene-solr's branch refs/heads/branch_6x from [~rcmuir] [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=139aa0b ] LUCENE-7137: consolidate many tests across Points and GeoPoint queries/fields
Re: Lucene FieldType & specifying numeric type (double, float, )
See also my recent blog post describing this new feature: https://www.elastic.co/blog/lucene-points-6.0 Net/net, in the 1D case, points looks like a win across the board vs. the legacy (postings) implementation. Mike McCandless http://blog.mikemccandless.com On Thu, Mar 24, 2016 at 12:33 PM, Robert Muir wrote: > On Thu, Mar 24, 2016 at 12:16 PM, Joel Bernstein wrote: >> I'm pretty confused about points as well and until very recently thought >> these were geo-spatial improvements only. >> >> It would be good to understand the mechanics of points versus numerics. I'm >> particularly interested in not losing the high performance numeric DocValues >> support, which has become so important for analytics. >> > > Unrelated. points are the structure used to find matching documents > from e.g. a query point, range, radius, shape, whatever. They use a > tree-like structure for this. So the replacement for NumericRangeQuery > which "simulates" a tree with an inverted index. > > Instead of inverted index+postings list, we just have a proper tree > structure for these things: fixed-width, multidimensional values. It > has a different indexreader api for example, that lets you control how > the tree is traversed as it goes (by returning INSIDE [collect all the > docids in here blindly, this entire tree range is relevant], OUTSIDE > [not relevant to my query, don't traverse this region anymore], or > CROSSES [i may or may not be interested, have to traverse further to > nodes (sub-ranges or values themselves)]. > > They also have the advantage of not being limited to 64 bits or 1 > dimension, you can have up to 128 bits and up to 8 dimensions. So each > thing you are adding to your document is really a "point in > n-dimensional space", so if you want to have 3 lat+long pairs as a > double[] in a single field, that works as you expect. > > See more information here: > https://github.com/apache/lucene-solr/blob/master/lucene/core/src/java/org/apache/lucene/index/PointValues.java#L35-L79
[jira] [Commented] (LUCENE-7137) consolidate many tests across Points and GeoPoint queries/fields
[ https://issues.apache.org/jira/browse/LUCENE-7137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15210524#comment-15210524 ] ASF subversion and git services commented on LUCENE-7137: - Commit ff70c680a276111dad0268022c964b21648f60a6 in lucene-solr's branch refs/heads/master from [~rcmuir] [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=ff70c68 ] LUCENE-7137: consolidate many tests across Points and GeoPoint queries/fields
Re: [jira] [Commented] (SOLR-6733) Umbrella issue - Solr as a standalone application
On 3/24/2016 10:00 AM, Erick Erickson wrote: > How does this play with the notion bandied about during the whole > "stop shipping a war file" of replacing Jetty with something? IIRC > Netty was mentioned as an example. > > Wouldn't want you to put effort into a dead-end... Some build system work will be required either way. My current thought is an executable jar (like Jetty ships), but there's also the way we start ZkCLI, by putting the class name on the commandline. The discussion about Netty hasn't really gone anywhere. I have an email to the zookeeper user list that I've been working on for a while, where I plan to ask them what they think about Netty. They've got it as an alternate network option. I tried asking the question on the #zookeeper IRC channel and didn't get an answer after several hours, so I will send the email once I'm sure it's all worded right. Embedding Jetty is the path of least resistance -- it will require the least amount of work on the code. According to the jetty list, the Java side will be mostly just turning the jetty XML config into code, and it's already pretty close to Java code. We probably also need to turn web.xml into code, and I have no idea how difficult that will be. Once the build system work is done and we have a handle on how to embed Jetty, writing swappable network implementations (like Netty) will be *somewhat* easier. On paper, Netty looks excellent. I just don't know how much work it will require or whether we will like the results from a code maintenance perspective. Thanks, Shawn
[JENKINS] Lucene-Solr-NightlyTests-master - Build # 970 - Still Failing
Build: https://builds.apache.org/job/Lucene-Solr-NightlyTests-master/970/ 3 tests failed. FAILED: junit.framework.TestSuite.org.apache.solr.cloud.hdfs.HdfsChaosMonkeySafeLeaderTest Error Message: ObjectTracker found 2 object(s) that were not released!!! [HdfsTransactionLog, HdfsTransactionLog] Stack Trace: java.lang.AssertionError: ObjectTracker found 2 object(s) that were not released!!! [HdfsTransactionLog, HdfsTransactionLog] at __randomizedtesting.SeedInfo.seed([296E9614A96AEF4F]:0) at org.junit.Assert.fail(Assert.java:93) at org.junit.Assert.assertTrue(Assert.java:43) at org.junit.Assert.assertNull(Assert.java:551) at org.apache.solr.SolrTestCaseJ4.afterClass(SolrTestCaseJ4.java:238) at sun.reflect.GeneratedMethodAccessor44.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1764) at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:834) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:41) at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40) at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47) at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64) at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:54) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:367) at java.lang.Thread.run(Thread.java:745) FAILED: org.apache.solr.cloud.hdfs.HdfsCollectionsAPIDistributedZkTest.test Error Message: Timeout occured while waiting response from server at: http://127.0.0.1:47332 Stack Trace: org.apache.solr.client.solrj.SolrServerException: Timeout occured while waiting response from server at: http://127.0.0.1:47332 at org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:588) at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:241) at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:230) at org.apache.solr.client.solrj.SolrClient.request(SolrClient.java:1219) at org.apache.solr.cloud.CollectionsAPIDistributedZkTest.makeRequest(CollectionsAPIDistributedZkTest.java:381) at org.apache.solr.cloud.CollectionsAPIDistributedZkTest.testErrorHandling(CollectionsAPIDistributedZkTest.java:508) at org.apache.solr.cloud.CollectionsAPIDistributedZkTest.test(CollectionsAPIDistributedZkTest.java:169) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1764) at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:871) at com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:907) at com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:921) at org.apache.solr.BaseDistributedSearchTestCase$ShardsRepeatRule$ShardsFixedStatement.callStatement(BaseDistributedSearchTestCase.java:996) at
Re: Lucene FieldType & specifying numeric type (double, float, )
On Thu, Mar 24, 2016 at 12:16 PM, Joel Bernstein wrote: > I'm pretty confused about points as well and until very recently thought > these were geo-spatial improvements only. > > It would be good to understand the mechanics of points versus numerics. I'm > particularly interested in not losing the high performance numeric DocValues > support, which has become so important for analytics. > Unrelated. Points are the structure used to find matching documents from e.g. a query point, range, radius, shape, whatever. They use a tree-like structure for this, so they are the replacement for NumericRangeQuery, which "simulates" a tree with an inverted index. Instead of an inverted index plus postings lists, we just have a proper tree structure for these things: fixed-width, multidimensional values. It has a different IndexReader API, for example, that lets you control how the tree is traversed as it goes (by returning INSIDE [collect all the docids in here blindly, this entire tree range is relevant], OUTSIDE [not relevant to my query, don't traverse this region anymore], or CROSSES [I may or may not be interested, have to traverse further to child nodes (sub-ranges or the values themselves)]). They also have the advantage of not being limited to 64 bits or 1 dimension: you can have up to 128 bits and up to 8 dimensions. So each thing you are adding to your document is really a "point in n-dimensional space", so if you want to have 3 lat+long pairs as a double[] in a single field, that works as you expect. See more information here: https://github.com/apache/lucene-solr/blob/master/lucene/core/src/java/org/apache/lucene/index/PointValues.java#L35-L79 - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
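The INSIDE/OUTSIDE/CROSSES traversal contract described in that reply can be sketched with a small self-contained toy. To be clear, this is not Lucene's BKD implementation and not its API; the class, method names, and the one-dimensional tree-over-a-sorted-array below are illustrative only, showing how the three relations prune, bulk-collect, or recurse:

```java
import java.util.ArrayList;
import java.util.List;

// Toy sketch of INSIDE/OUTSIDE/CROSSES tree traversal (names hypothetical,
// NOT Lucene's BKD code). A sorted int[] stands in for the tree: each
// recursive halving [lo, hi) acts as a node covering [sorted[lo], sorted[hi-1]].
public class PointTreeSketch {
    enum Relation { INSIDE, OUTSIDE, CROSSES }

    // How does the node's value range relate to the query range [qMin, qMax]?
    static Relation compare(int nodeMin, int nodeMax, int qMin, int qMax) {
        if (nodeMax < qMin || nodeMin > qMax) return Relation.OUTSIDE;
        if (qMin <= nodeMin && nodeMax <= qMax) return Relation.INSIDE;
        return Relation.CROSSES;
    }

    static void intersect(int[] sorted, int lo, int hi, int qMin, int qMax,
                          List<Integer> hits) {
        if (lo >= hi) return;
        switch (compare(sorted[lo], sorted[hi - 1], qMin, qMax)) {
            case OUTSIDE:
                return;                                   // prune this region entirely
            case INSIDE:
                for (int i = lo; i < hi; i++) hits.add(sorted[i]); // collect blindly
                return;
            case CROSSES:
                if (hi - lo == 1) {                       // leaf: test the single value
                    if (sorted[lo] >= qMin && sorted[lo] <= qMax) hits.add(sorted[lo]);
                } else {                                  // descend into both halves
                    int mid = (lo + hi) >>> 1;
                    intersect(sorted, lo, mid, qMin, qMax, hits);
                    intersect(sorted, mid, hi, qMin, qMax, hits);
                }
        }
    }

    public static void main(String[] args) {
        int[] values = {1, 3, 7, 9, 12, 15, 20, 42};
        List<Integer> hits = new ArrayList<>();
        intersect(values, 0, values.length, 5, 16, hits);
        System.out.println(hits);  // [7, 9, 12, 15]
    }
}
```

The point of the three-way answer is that whole subtrees are either skipped (OUTSIDE) or swallowed without per-value comparisons (INSIDE); only CROSSES nodes cost further work, which is what makes the tree cheaper than walking a simulated trie in the postings.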
[jira] [Created] (SOLR-8897) SSL-related passwords in solr.in.sh are in plain text
Esther Quansah created SOLR-8897: Summary: SSL-related passwords in solr.in.sh are in plain text Key: SOLR-8897 URL: https://issues.apache.org/jira/browse/SOLR-8897 Project: Solr Issue Type: Improvement Components: scripts and tools, security Reporter: Esther Quansah As per the steps described at the following URL, one needs to store the plain-text password for the keystore to configure SSL for Solr, which is not a good idea from a security perspective. URL: https://cwiki.apache.org/confluence/display/solr/Enabling+SSL#EnablingSSL-SetcommonSSLrelatedsystemproperties Is there any way to store an encrypted password (instead of the plain password) in solr.in.cmd/solr.in.sh to configure SSL? -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: Lucene FieldType & specifying numeric type (double, float, )
Scalar doesnt mean anything. Point is simple, it is a point in n dimensional space, that is what the data structure provides for fast searching on. Numbers are points in one dimensional space. Think of a number line. On Mar 24, 2016 8:37 AM, "David Smiley"wrote: > bq. it wasn't at all clear that the intention was that simple scalars > would now and forever henceforth be referred to as "points". My impression > at the time was that the focus of the Jira was on implementation and > storage level indexing detail rather than the user-facing API level. I see > now that I was wrong about that. It just seems to me that there should have > been a more direct public discussion of eliminating the concept of scalar > values at the API level. > > I knew because I was following closely, but otherwise I agree with your > sentiment. I don't love the "PointValues" terminology either nor did I > like "DimensionalValues"; I should have suggested alternatives at the time > but the Mike & Rob tag-team were working so fast that I didn't interject in > the narrow window of time before a patch was put up with the current > names. More time to publicly discuss would have been better. FWIW I like > your suggestion for "Scalar"; that's more meaningful to me. Naming is hard. > > ~ David > > On Thu, Mar 24, 2016 at 11:28 AM Jack Krupansky > wrote: > >> I wasn't paying close attention when this whole PointValues saga was >> unfolding. I get the value of points for spatial data, but conflating the >> terms "point" and "numeric" is bizarre to say the least. Reading the code, >> I see "Points represent numeric values", which seems nonsensical to me. A >> little later the code comment says "Geospatial Point Types - Although basic >> point types such as DoublePoint support points in multi-dimensional space >> too, Lucene has specialized classes for location data...", which continues >> this odd use of terminology. 
I mean, aren't all points spatial by >> definition, so that "Geospatial Point" is redundant? It would make more >> sense to speak of a point as a geospatial number, or that a point is >> represented by numbers. >> >> IOW, NumericValues would make more sense as the base, with (spatial) >> PointValues derived from the base of numeric values. At least to me that >> would make more sense. >> >> As the PointValues was progressing I had no idea that its intent was to >> subsume, replace, or deprecate traditional scalar numeric value support in >> Lucene (or Solr.) It came across primarily as being an improvement for >> spatial search. >> >> Not that I have any objection to greatly improved storage in Lucene, but >> to now have to speak of all numeric data as points seems quite... weird. >> >> Sure, I saw the Jira traffic, like LUCENE-6825 (Add multidimensional >> byte[] indexing support to Lucene) and LUCENE-6852 (Add DimensionalFormat >> to Codec), but in all honesty that really did come across as relating to >> purely spatial data and not being applicable to basic scalar number support. >> >> Looking at CHANGES.TXT, I see references like "LUCENE-6852, LUCENE-6975: >> Add support for points (dimensionally indexed values)", but without any >> hint that the intent was to subsume or replace non-dimensional numeric >> indexed values. >> >> Now for all I know, non-dimensional (scalar) numeric data can very >> efficiently be handled as if it had dimension, but that's not exactly >> obvious and warrants at least some illumination. In traditional terminology >> a point is 0-dimension (a line is 1-dimension, and a plane is 2-dimension), >> but traditionally a raw number - a scalar - hasn't been referred to as >> having dimension, so that is a new concept warranting clear definition. 
>> >> Yeah, I do recall seeing LUCENE-6917 (Deprecate and rename >> NumericField/RangeQuery to LegacyNumeric) go by in the Jira traffic, and >> shame on me for not reading the details more carefully, but it wasn't at >> all clear that the intention was that simple scalars would now and forever >> henceforth be referred to as "points". My impression at the time was that >> the focus of the Jira was on implementation and storage level indexing >> detail rather than the user-facing API level. I see now that I was wrong >> about that. It just seems to me that there should have been a more direct >> public discussion of eliminating the concept of scalar values at the API >> level. >> >> (I wonder what physics would be like if they started referring to scalar >> quantities as vectors.) >> >> My apologies for the rant. >> >> >> -- Jack Krupansky >> >> On Thu, Mar 24, 2016 at 10:34 AM, David Smiley >> wrote: >> >>> With the move to PointValues and away from trie based indexing of the >>> terms index, for numerics, everything associated with the trie stuff seems >>> to be labelled as "Legacy" and marked deprecated. Even >>> FieldType.NumericType (now FieldType.LegacyNumericType) -- a
Re: Lucene FieldType & specifying numeric type (double, float, )
I'm pretty confused about points as well and until very recently thought these we geo-spacial improvements only. It would be good to understand the mechanics of points versus numerics. I'm particularly interested in not losing the high performance numeric DocValues support, which has become so important for analytics. Joel Bernstein http://joelsolr.blogspot.com/ On Thu, Mar 24, 2016 at 11:37 AM, David Smileywrote: > bq. it wasn't at all clear that the intention was that simple scalars > would now and forever henceforth be referred to as "points". My impression > at the time was that the focus of the Jira was on implementation and > storage level indexing detail rather than the user-facing API level. I see > now that I was wrong about that. It just seems to me that there should have > been a more direct public discussion of eliminating the concept of scalar > values at the API level. > > I knew because I was following closely, but otherwise I agree with your > sentiment. I don't love the "PointValues" terminology either nor did I > like "DimensionalValues"; I should have suggested alternatives at the time > but the Mike & Rob tag-team were working so fast that I didn't interject in > the narrow window of time before a patch was put up with the current > names. More time to publicly discuss would have been better. FWIW I like > your suggestion for "Scalar"; that's more meaningful to me. Naming is hard. > > ~ David > > On Thu, Mar 24, 2016 at 11:28 AM Jack Krupansky > wrote: > >> I wasn't paying close attention when this whole PointValues saga was >> unfolding. I get the value of points for spatial data, but conflating the >> terms "point" and "numeric" is bizarre to say the least. Reading the code, >> I see "Points represent numeric values", which seems nonsensical to me. 
A >> little later the code comment says "Geospatial Point Types - Although basic >> point types such as DoublePoint support points in multi-dimensional space >> too, Lucene has specialized classes for location data...", which continues >> this odd use of terminology. I mean, aren't all points spatial by >> definition, so that "Geospatial Point" is redundant? It would make more >> sense to speak of a point as a geospatial number, or that a point is >> represented by numbers. >> >> IOW, NumericValues would make more sense as the base, with (spatial) >> PointValues derived from the base of numeric values. At least to me that >> would make more sense. >> >> As the PointValues was progressing I had no idea that its intent was to >> subsume, replace, or deprecate traditional scalar numeric value support in >> Lucene (or Solr.) It came across primarily as being an improvement for >> spatial search. >> >> Not that I have any objection to greatly improved storage in Lucene, but >> to now have to speak of all numeric data as points seems quite... weird. >> >> Sure, I saw the Jira traffic, like LUCENE-6825 (Add multidimensional >> byte[] indexing support to Lucene) and LUCENE-6852 (Add DimensionalFormat >> to Codec), but in all honesty that really did come across as relating to >> purely spatial data and not being applicable to basic scalar number support. >> >> Looking at CHANGES.TXT, I see references like "LUCENE-6852, LUCENE-6975: >> Add support for points (dimensionally indexed values)", but without any >> hint that the intent was to subsume or replace non-dimensional numeric >> indexed values. >> >> Now for all I know, non-dimensional (scalar) numeric data can very >> efficiently be handled as if it had dimension, but that's not exactly >> obvious and warrants at least some illumination. 
In traditional terminology >> a point is 0-dimension (a line is 1-dimension, and a plane is 2-dimension), >> but traditionally a raw number - a scalar - hasn't been referred to as >> having dimension, so that is a new concept warranting clear definition. >> >> Yeah, I do recall seeing LUCENE-6917 (Deprecate and rename >> NumericField/RangeQuery to LegacyNumeric) go by in the Jira traffic, and >> shame on me for not reading the details more carefully, but it wasn't at >> all clear that the intention was that simple scalars would now and forever >> henceforth be referred to as "points". My impression at the time was that >> the focus of the Jira was on implementation and storage level indexing >> detail rather than the user-facing API level. I see now that I was wrong >> about that. It just seems to me that there should have been a more direct >> public discussion of eliminating the concept of scalar values at the API >> level. >> >> (I wonder what physics would be like if they started referring to scalar >> quantities as vectors.) >> >> My apologies for the rant. >> >> >> -- Jack Krupansky >> >> On Thu, Mar 24, 2016 at 10:34 AM, David Smiley >> wrote: >> >>> With the move to PointValues and away from trie based indexing of the >>> terms index, for numerics, everything
[jira] [Commented] (SOLR-6359) Allow customization of the number of records and logs kept by UpdateLog
[ https://issues.apache.org/jira/browse/SOLR-6359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15210469#comment-15210469 ] David Smiley commented on SOLR-6359: Maybe I misunderstand the impacts of these configuration options, but why even have a maxNumLogsToKeep? i.e. why isn't it effectively unlimited? I don't care how many internal log files the updateLog would like to do its implementation-detail business, so long as I can specify that it has docs added within the last X minutes, and maybe a minimum number of docs. Sounds reasonable? Because X minutes allows me to specify a server restart worth of time. That 'X' minutes is basically the hard auto commit interval, since that's what truncates the current log to a new file. [~andyetitmoves] in your "heavy indexing setup" couldn't you have just set the auto commit window large enough to your liking? The current "numRecordsToKeep" (defaulting to 100) doesn't say if it's a min or max; it seems to be implemented as a soft maximum -- the oldest log files will be removed to stay under, but we'll always have at least one log file, however big or small it may be. In my scenario where I basically don't care how many records it actually is (I care about time), I think I can basically ignore this (leave at 100). > Allow customization of the number of records and logs kept by UpdateLog > --- > > Key: SOLR-6359 > URL: https://issues.apache.org/jira/browse/SOLR-6359 > Project: Solr > Issue Type: Improvement >Reporter: Ramkumar Aiyengar >Assignee: Ramkumar Aiyengar >Priority: Minor > Fix For: 5.1, master > > Attachments: SOLR-6359.patch > > > Currently {{UpdateLog}} hardcodes the number of logs and records it keeps, > and the hardcoded numbers (100 records, 10 logs) can be quite low (esp. the > records) in an heavily indexing setup, leading to full recovery even if Solr > was just stopped and restarted. > These values should be customizable (even if only present as expert options). 
-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
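For reference, the two knobs discussed in this issue live on the {{updateLog}} element in solrconfig.xml once the patch is applied. A sketch of such a configuration follows; the parameter names come from the issue, but the values here are illustrative examples only, not recommendations:

```xml
<!-- solrconfig.xml fragment (illustrative values) -->
<updateLog>
  <str name="dir">${solr.ulog.dir:}</str>
  <!-- soft maximum on records kept across retained tlog files (default 100) -->
  <int name="numRecordsToKeep">500</int>
  <!-- cap on the number of tlog files kept on disk (default 10) -->
  <int name="maxNumLogsToKeep">20</int>
</updateLog>
```

As the comment above notes, numRecordsToKeep behaves as a soft maximum: the oldest log files are removed to stay under it, but at least one log file is always retained.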
[jira] [Commented] (LUCENE-7136) remove Threads from BaseGeoPointTestCase
[ https://issues.apache.org/jira/browse/LUCENE-7136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15210464#comment-15210464 ] Michael McCandless commented on LUCENE-7136: +1 > remove Threads from BaseGeoPointTestCase > > > Key: LUCENE-7136 > URL: https://issues.apache.org/jira/browse/LUCENE-7136 > Project: Lucene - Core > Issue Type: Test >Reporter: Robert Muir > Attachments: LUCENE-7136.patch > > > I don't think we should mix testing threads with all the other stuff going on > here. It makes things too hard to debug. > if we want to test thread safety of e.g. BKD or queries somewhere, that > should be an explicit narrow test just for that (no complicated geometry > going on). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-7137) consolidate many tests across Points and GeoPoint queries/fields
[ https://issues.apache.org/jira/browse/LUCENE-7137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15210456#comment-15210456 ] Michael McCandless commented on LUCENE-7137: +1, wonderful! > consolidate many tests across Points and GeoPoint queries/fields > > > Key: LUCENE-7137 > URL: https://issues.apache.org/jira/browse/LUCENE-7137 > Project: Lucene - Core > Issue Type: Test >Reporter: Robert Muir > Attachments: LUCENE-7137.patch > > > We have found repeated basic problems with stuff like equals/hashcode > recently, I think we should consolidate tests and cleanup here. > these different implementations also have a little assortment of simplistic > unit tests, if its not doing anything impl-specific, we should fold those in > too. these are easy to debug and great to see fail if something is wrong. > I will work up a patch. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: [jira] [Commented] (SOLR-6733) Umbrella issue - Solr as a standalone application
Shawn: How does this play with the notion bandied about during the whole "stop shipping a war file" of replacing Jetty with something? IIRC Netty was mentioned as an example. Wouldn't want you to put effort into a dead-end... Erick On Wed, Mar 23, 2016 at 10:21 PM, Shawn Heisey (JIRA)wrote: > > [ > https://issues.apache.org/jira/browse/SOLR-6733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15209786#comment-15209786 > ] > > Shawn Heisey commented on SOLR-6733: > > > It would be a new package name either way, but as I consider how to handle > the build, I think a separate directory is easier to package into something > with a name like "solr-start.jar". > > I'm encouraged to learn that the insanity level will be less than I feared. > >> Umbrella issue - Solr as a standalone application >> - >> >> Key: SOLR-6733 >> URL: https://issues.apache.org/jira/browse/SOLR-6733 >> Project: Solr >> Issue Type: New Feature >>Reporter: Shawn Heisey >> >> Umbrella issue, for gathering issues relating to smaller pieces required to >> implement the larger feature where Solr can be run as a completely >> standalone application, without a servlet container. > > > > -- > This message was sent by Atlassian JIRA > (v6.3.4#6332) > > - > To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org > For additional commands, e-mail: dev-h...@lucene.apache.org > - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: Lucene FieldType & specifying numeric type (double, float, )
bq. it wasn't at all clear that the intention was that simple scalars would now and forever henceforth be referred to as "points". My impression at the time was that the focus of the Jira was on implementation and storage level indexing detail rather than the user-facing API level. I see now that I was wrong about that. It just seems to me that there should have been a more direct public discussion of eliminating the concept of scalar values at the API level. I knew because I was following closely, but otherwise I agree with your sentiment. I don't love the "PointValues" terminology either nor did I like "DimensionalValues"; I should have suggested alternatives at the time but the Mike & Rob tag-team were working so fast that I didn't interject in the narrow window of time before a patch was put up with the current names. More time to publicly discuss would have been better. FWIW I like your suggestion for "Scalar"; that's more meaningful to me. Naming is hard. ~ David On Thu, Mar 24, 2016 at 11:28 AM Jack Krupanskywrote: > I wasn't paying close attention when this whole PointValues saga was > unfolding. I get the value of points for spatial data, but conflating the > terms "point" and "numeric" is bizarre to say the least. Reading the code, > I see "Points represent numeric values", which seems nonsensical to me. A > little later the code comment says "Geospatial Point Types - Although basic > point types such as DoublePoint support points in multi-dimensional space > too, Lucene has specialized classes for location data...", which continues > this odd use of terminology. I mean, aren't all points spatial by > definition, so that "Geospatial Point" is redundant? It would make more > sense to speak of a point as a geospatial number, or that a point is > represented by numbers. > > IOW, NumericValues would make more sense as the base, with (spatial) > PointValues derived from the base of numeric values. At least to me that > would make more sense. 
> > As the PointValues was progressing I had no idea that its intent was to > subsume, replace, or deprecate traditional scalar numeric value support in > Lucene (or Solr.) It came across primarily as being an improvement for > spatial search. > > Not that I have any objection to greatly improved storage in Lucene, but > to now have to speak of all numeric data as points seems quite... weird. > > Sure, I saw the Jira traffic, like LUCENE-6825 (Add multidimensional > byte[] indexing support to Lucene) and LUCENE-6852 (Add DimensionalFormat > to Codec), but in all honesty that really did come across as relating to > purely spatial data and not being applicable to basic scalar number support. > > Looking at CHANGES.TXT, I see references like "LUCENE-6852, LUCENE-6975: > Add support for points (dimensionally indexed values)", but without any > hint that the intent was to subsume or replace non-dimensional numeric > indexed values. > > Now for all I know, non-dimensional (scalar) numeric data can very > efficiently be handled as if it had dimension, but that's not exactly > obvious and warrants at least some illumination. In traditional terminology > a point is 0-dimension (a line is 1-dimension, and a plane is 2-dimension), > but traditionally a raw number - a scalar - hasn't been referred to as > having dimension, so that is a new concept warranting clear definition. > > Yeah, I do recall seeing LUCENE-6917 (Deprecate and rename > NumericField/RangeQuery to LegacyNumeric) go by in the Jira traffic, and > shame on me for not reading the details more carefully, but it wasn't at > all clear that the intention was that simple scalars would now and forever > henceforth be referred to as "points". My impression at the time was that > the focus of the Jira was on implementation and storage level indexing > detail rather than the user-facing API level. I see now that I was wrong > about that. 
It just seems to me that there should have been a more direct > public discussion of eliminating the concept of scalar values at the API > level. > > (I wonder what physics would be like if they started referring to scalar > quantities as vectors.) > > My apologies for the rant. > > > -- Jack Krupansky > > On Thu, Mar 24, 2016 at 10:34 AM, David Smiley > wrote: > >> With the move to PointValues and away from trie based indexing of the >> terms index, for numerics, everything associated with the trie stuff seems >> to be labelled as "Legacy" and marked deprecated. Even >> FieldType.NumericType (now FieldType.LegacyNumericType) -- a simple enum of >> INT, LONG, FLOAT, DOUBLE. I wonder if we ought to reconsider doing this >> for FieldType.NumericType, as it articulates the type of numeric data; it >> need not be associated with just trie indexing of terms data; it could >> articulate how any numeric data is encoded, be it docValues or >> pointValues. This is useful metadata. It's not strictly required, true, >> but its
Re: Lucene FieldType & specifying numeric type (double, float, )
I wasn't paying close attention when this whole PointValues saga was unfolding. I get the value of points for spatial data, but conflating the terms "point" and "numeric" is bizarre to say the least. Reading the code, I see "Points represent numeric values", which seems nonsensical to me. A little later the code comment says "Geospatial Point Types - Although basic point types such as DoublePoint support points in multi-dimensional space too, Lucene has specialized classes for location data...", which continues this odd use of terminology. I mean, aren't all points spatial by definition, so that "Geospatial Point" is redundant? It would make more sense to speak of a point as a geospatial number, or that a point is represented by numbers. IOW, NumericValues would make more sense as the base, with (spatial) PointValues derived from the base of numeric values. At least to me that would make more sense. As the PointValues was progressing I had no idea that its intent was to subsume, replace, or deprecate traditional scalar numeric value support in Lucene (or Solr.) It came across primarily as being an improvement for spatial search. Not that I have any objection to greatly improved storage in Lucene, but to now have to speak of all numeric data as points seems quite... weird. Sure, I saw the Jira traffic, like LUCENE-6825 (Add multidimensional byte[] indexing support to Lucene) and LUCENE-6852 (Add DimensionalFormat to Codec), but in all honesty that really did come across as relating to purely spatial data and not being applicable to basic scalar number support. Looking at CHANGES.TXT, I see references like "LUCENE-6852, LUCENE-6975: Add support for points (dimensionally indexed values)", but without any hint that the intent was to subsume or replace non-dimensional numeric indexed values. 
Now for all I know, non-dimensional (scalar) numeric data can very efficiently be handled as if it had dimension, but that's not exactly obvious and warrants at least some illumination. In traditional terminology a point is 0-dimension (a line is 1-dimension, and a plane is 2-dimension), but traditionally a raw number - a scalar - hasn't been referred to as having dimension, so that is a new concept warranting clear definition. Yeah, I do recall seeing LUCENE-6917 (Deprecate and rename NumericField/RangeQuery to LegacyNumeric) go by in the Jira traffic, and shame on me for not reading the details more carefully, but it wasn't at all clear that the intention was that simple scalars would now and forever henceforth be referred to as "points". My impression at the time was that the focus of the Jira was on implementation and storage level indexing detail rather than the user-facing API level. I see now that I was wrong about that. It just seems to me that there should have been a more direct public discussion of eliminating the concept of scalar values at the API level. (I wonder what physics would be like if they started referring to scalar quantities as vectors.) My apologies for the rant. -- Jack Krupansky On Thu, Mar 24, 2016 at 10:34 AM, David Smileywrote: > With the move to PointValues and away from trie based indexing of the > terms index, for numerics, everything associated with the trie stuff seems > to be labelled as "Legacy" and marked deprecated. Even > FieldType.NumericType (now FieldType.LegacyNumericType) -- a simple enum of > INT, LONG, FLOAT, DOUBLE. I wonder if we ought to reconsider doing this > for FieldType.NumericType, as it articulates the type of numeric data; it > need not be associated with just trie indexing of terms data; it could > articulate how any numeric data is encoded, be it docValues or > pointValues. This is useful metadata. It's not strictly required, true, > but its useful in describing what goes in the field. 
This makes a > FieldType instance fairly self-sufficient. Otherwise, say you have > docValue numerics and/or pointValues, it's ambiguous how the data should be > interpreted. This doesn't lead to a bug but would help debugging and > allowing APIs to express field requirements simply by providing a FieldType > instance for numeric data. It used to be self sufficient but now if we > imagine the legacy stuff being removed, it's ambiguous. In addition, it > would be useful metadata if it found it's way into FieldInfo. Then, say > Luke, could help you know what's there and maybe search it. > > Thoughts? > > ~ David > -- > Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker > LinkedIn: http://linkedin.com/in/davidwsmiley | Book: > http://www.solrenterprisesearchserver.com >
Re: [CONF] Apache Solr Reference Guide > Understanding Analyzers, Tokenizers, and Filters
I’ve filed a JIRA to have this account banned - it’s been used to post 6 spam comments in the last 3 weeks: https://issues.apache.org/jira/browse/INFRA-11537 -- Steve www.lucidworks.com > On Mar 23, 2016, at 10:51 PM, velorina (Confluence) wrote: > > velorina commented on a page > > > Re: Understanding Analyzers, Tokenizers, and Filters > Iam really impressed with your writing abilities and also with > the structure in your weblog. Is that this a paid subject matter or > did you customize it yourself? Anyway stay up the excellent high quality > writing, it > is uncommon to peer a nice weblog like this > one these days.. > > judi online > - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-8145) bin/solr script oom_killer arg incorrect
[ https://issues.apache.org/jira/browse/SOLR-8145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15210400#comment-15210400 ] Nim Lhûg commented on SOLR-8145: Any chance this can be fixed in 5.5.1? > bin/solr script oom_killer arg incorrect > > > Key: SOLR-8145 > URL: https://issues.apache.org/jira/browse/SOLR-8145 > Project: Solr > Issue Type: Bug > Components: scripts and tools >Affects Versions: 5.2.1 >Reporter: Nate Dire >Assignee: Timothy Potter >Priority: Minor > Fix For: 6.0 > > Attachments: SOLR-8145.patch, SOLR-8145.patch, SOLR-8145.patch > > > I noticed the oom_killer script wasn't working in our 5.2 deployment. > In the {{bin/solr}} script, the {{OnOutOfMemoryError}} option is being passed > as an arg to the jar rather than to the JVM. I moved it ahead of {{-jar}} > and verified it shows up in the JVM args in the UI. > {noformat} ># run Solr in the background > nohup "$JAVA" "${SOLR_START_OPTS[@]}" $SOLR_ADDL_ARGS -jar start.jar \ > "-XX:OnOutOfMemoryError=$SOLR_TIP/bin/oom_solr.sh $SOLR_PORT > $SOLR_LOGS_DIR" "${SOLR_JETTY_CONFIG[@]}" \ > {noformat} > Also, I'm not sure what the {{SOLR_PORT}} and {{SOLR_LOGS_DIR}} args are > doing--they don't appear to be positional arguments to the jar. > Attaching a patch against 5.2. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
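The fix in the patch above boils down to argument ordering, which can be sketched in isolation (the paths, port, and log directory below are placeholders, not the actual bin/solr variables):

```shell
# JVM options such as -XX:OnOutOfMemoryError must come before -jar;
# everything after -jar is handed to the application (start.jar), not the JVM.
java "-XX:OnOutOfMemoryError=/opt/solr/bin/oom_solr.sh 8983 /var/solr/logs" \
     -jar start.jar   # any trailing args here go to start.jar, not the JVM
```

With the option placed before -jar, it shows up in the JVM arguments in the admin UI, as the reporter verified.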
[JENKINS-EA] Lucene-Solr-master-Linux (64bit/jdk-9-jigsaw-ea+110) - Build # 16318 - Still Failing!
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-master-Linux/16318/ Java: 64bit/jdk-9-jigsaw-ea+110 -XX:-UseCompressedOops -XX:+UseConcMarkSweepGC -XX:-CompactStrings 1 tests failed. FAILED: org.apache.lucene.codecs.lucene60.TestLucene60PointsFormat.testMultiValued Error Message: wrong number of points in split: expected=4589 but actual=4330 Stack Trace: java.lang.IllegalStateException: wrong number of points in split: expected=4589 but actual=4330 at __randomizedtesting.SeedInfo.seed([563D147051FE8BB8:821D70429F3CCBF0]:0) at org.apache.lucene.util.bkd.BKDWriter.build(BKDWriter.java:1224) at org.apache.lucene.util.bkd.BKDWriter.finish(BKDWriter.java:862) at org.apache.lucene.codecs.lucene60.Lucene60PointsWriter.writeField(Lucene60PointsWriter.java:115) at org.apache.lucene.codecs.PointsWriter.mergeOneField(PointsWriter.java:58) at org.apache.lucene.codecs.lucene60.Lucene60PointsWriter.merge(Lucene60PointsWriter.java:204) at org.apache.lucene.index.SegmentMerger.mergePoints(SegmentMerger.java:168) at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:117) at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4099) at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3679) at org.apache.lucene.index.SerialMergeScheduler.merge(SerialMergeScheduler.java:40) at org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:1946) at org.apache.lucene.index.IndexWriter.forceMerge(IndexWriter.java:1779) at org.apache.lucene.index.IndexWriter.forceMerge(IndexWriter.java:1736) at org.apache.lucene.index.RandomIndexWriter.forceMerge(RandomIndexWriter.java:421) at org.apache.lucene.index.BasePointsFormatTestCase.verify(BasePointsFormatTestCase.java:667) at org.apache.lucene.index.BasePointsFormatTestCase.verify(BasePointsFormatTestCase.java:500) at org.apache.lucene.index.BasePointsFormatTestCase.testMultiValued(BasePointsFormatTestCase.java:281) at sun.reflect.NativeMethodAccessorImpl.invoke0(java.base@9-ea/Native Method) at 
sun.reflect.NativeMethodAccessorImpl.invoke(java.base@9-ea/NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(java.base@9-ea/DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(java.base@9-ea/Method.java:531) at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1764) at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:871) at com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:907) at com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:921) at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:49) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45) at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48) at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:367) at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:809) at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:460) at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:880) at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:781) at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:816) at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:827) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45) at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:41) at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40) at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at
[JENKINS] Lucene-Solr-6.x-Solaris (64bit/jdk1.8.0) - Build # 29 - Failure!
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-6.x-Solaris/29/ Java: 64bit/jdk1.8.0 -XX:-UseCompressedOops -XX:+UseSerialGC 2 tests failed. FAILED: org.apache.solr.schema.BadIndexSchemaTest.testSimDefaultFieldTypeHasNoExplicitSim Error Message: Test abandoned because suite timeout was reached. Stack Trace: java.lang.Exception: Test abandoned because suite timeout was reached. at __randomizedtesting.SeedInfo.seed([382FF46A9972D9D]:0) FAILED: junit.framework.TestSuite.org.apache.solr.schema.BadIndexSchemaTest Error Message: Suite timeout exceeded (>= 720 msec). Stack Trace: java.lang.Exception: Suite timeout exceeded (>= 720 msec). at __randomizedtesting.SeedInfo.seed([382FF46A9972D9D]:0) Build Log: [...truncated 12261 lines...] [junit4] Suite: org.apache.solr.schema.BadIndexSchemaTest [junit4] 2> Creating dataDir: /export/home/jenkins/workspace/Lucene-Solr-6.x-Solaris/solr/build/solr-core/test/J1/temp/solr.schema.BadIndexSchemaTest_382FF46A9972D9D-001/init-core-data-001 [junit4] 2> 969611 INFO (SUITE-BadIndexSchemaTest-seed#[382FF46A9972D9D]-worker) [] o.a.s.SolrTestCaseJ4 Randomized ssl (false) and clientAuth (false) [junit4] 2> 969613 INFO (TEST-BadIndexSchemaTest.testBadExternalFileField-seed#[382FF46A9972D9D]) [ ] o.a.s.SolrTestCaseJ4 ###Starting testBadExternalFileField [junit4] 2> 969614 INFO (TEST-BadIndexSchemaTest.testBadExternalFileField-seed#[382FF46A9972D9D]) [ ] o.a.s.SolrTestCaseJ4 initCore [junit4] 2> 969614 INFO (TEST-BadIndexSchemaTest.testBadExternalFileField-seed#[382FF46A9972D9D]) [ ] o.a.s.c.SolrResourceLoader new SolrResourceLoader for directory: '/export/home/jenkins/workspace/Lucene-Solr-6.x-Solaris/solr/core/src/test-files/solr/collection1' [junit4] 2> 969614 INFO (TEST-BadIndexSchemaTest.testBadExternalFileField-seed#[382FF46A9972D9D]) [ ] o.a.s.c.SolrResourceLoader JNDI not configured for solr (NoInitialContextEx) [junit4] 2> 969614 INFO (TEST-BadIndexSchemaTest.testBadExternalFileField-seed#[382FF46A9972D9D]) [ ] 
o.a.s.c.SolrResourceLoader using system property solr.solr.home: /export/home/jenkins/workspace/Lucene-Solr-6.x-Solaris/solr/core/src/test-files/solr [junit4] 2> 969614 INFO (TEST-BadIndexSchemaTest.testBadExternalFileField-seed#[382FF46A9972D9D]) [ ] o.a.s.c.SolrResourceLoader Adding 'file:/export/home/jenkins/workspace/Lucene-Solr-6.x-Solaris/solr/core/src/test-files/solr/collection1/lib/classes/' to classloader [junit4] 2> 969614 INFO (TEST-BadIndexSchemaTest.testBadExternalFileField-seed#[382FF46A9972D9D]) [ ] o.a.s.c.SolrResourceLoader Adding 'file:/export/home/jenkins/workspace/Lucene-Solr-6.x-Solaris/solr/core/src/test-files/solr/collection1/lib/README' to classloader [junit4] 2> 969652 INFO (TEST-BadIndexSchemaTest.testBadExternalFileField-seed#[382FF46A9972D9D]) [ ] o.a.s.c.SolrConfig current version of requestparams : -1 [junit4] 2> 969655 INFO (TEST-BadIndexSchemaTest.testBadExternalFileField-seed#[382FF46A9972D9D]) [ ] o.a.s.c.SolrConfig Using Lucene MatchVersion: 6.1.0 [junit4] 2> 969661 INFO (TEST-BadIndexSchemaTest.testBadExternalFileField-seed#[382FF46A9972D9D]) [ ] o.a.s.c.SolrConfig Loaded SolrConfig: solrconfig-basic.xml [junit4] 2> 969665 INFO (TEST-BadIndexSchemaTest.testBadExternalFileField-seed#[382FF46A9972D9D]) [ ] o.a.s.s.IndexSchema [null] Schema name=bad-schema-external-filefield [junit4] 2> 969669 INFO (TEST-BadIndexSchemaTest.testBadExternalFileField-seed#[382FF46A9972D9D]) [ ] o.a.s.SolrTestCaseJ4 ###deleteCore [junit4] 2> 969669 INFO (TEST-BadIndexSchemaTest.testBadExternalFileField-seed#[382FF46A9972D9D]) [ ] o.a.s.SolrTestCaseJ4 ###Ending testBadExternalFileField [junit4] 2> 969671 INFO (TEST-BadIndexSchemaTest.testSevereErrorsForInvalidFieldOptions-seed#[382FF46A9972D9D]) [] o.a.s.SolrTestCaseJ4 ###Starting testSevereErrorsForInvalidFieldOptions [junit4] 2> 969671 INFO (TEST-BadIndexSchemaTest.testSevereErrorsForInvalidFieldOptions-seed#[382FF46A9972D9D]) [] o.a.s.SolrTestCaseJ4 initCore [junit4] 2> 969671 INFO 
(TEST-BadIndexSchemaTest.testSevereErrorsForInvalidFieldOptions-seed#[382FF46A9972D9D]) [] o.a.s.c.SolrResourceLoader new SolrResourceLoader for directory: '/export/home/jenkins/workspace/Lucene-Solr-6.x-Solaris/solr/core/src/test-files/solr/collection1' [junit4] 2> 969671 INFO (TEST-BadIndexSchemaTest.testSevereErrorsForInvalidFieldOptions-seed#[382FF46A9972D9D]) [] o.a.s.c.SolrResourceLoader JNDI not configured for solr (NoInitialContextEx) [junit4] 2> 969671 INFO (TEST-BadIndexSchemaTest.testSevereErrorsForInvalidFieldOptions-seed#[382FF46A9972D9D]) [] o.a.s.c.SolrResourceLoader using system property solr.solr.home: /export/home/jenkins/workspace/Lucene-Solr-6.x-Solaris/solr/core/src/test-files/solr [junit4] 2> 969672 INFO
Re: Lucene FieldType & specifying numeric type (double, float, )
Again, we don't care except that it's bytes. FieldInfo already records the same thing we recorded for legacy shit: the length in bytes. That is all legacy numerics ever knew before (simply from the length of the term): whether it was 32 or 64 bits. It could not differentiate integer from float; you could never do that. Nothing was removed, there is no feature to keep here. On Thu, Mar 24, 2016 at 10:42 AM, David Smiley wrote: > I should add, if we keep FieldType.NumericType and use it as I suggest, it > would either need a new enum value of "UNSPECIFIED" (think IPV6 or other > custom uses) or null; I'd prefer to avoid the null. > ~ David > > On Thu, Mar 24, 2016 at 10:34 AM David Smiley > wrote: >> >> With the move to PointValues and away from trie based indexing of the >> terms index, for numerics, everything associated with the trie stuff seems >> to be labelled as "Legacy" and marked deprecated. Even >> FieldType.NumericType (now FieldType.LegacyNumericType) -- a simple enum of >> INT, LONG, FLOAT, DOUBLE. I wonder if we ought to reconsider doing this for >> FieldType.NumericType, as it articulates the type of numeric data; it need >> not be associated with just trie indexing of terms data; it could articulate >> how any numeric data is encoded, be it docValues or pointValues. This is >> useful metadata. It's not strictly required, true, but its useful in >> describing what goes in the field. This makes a FieldType instance fairly >> self-sufficient. Otherwise, say you have docValue numerics and/or >> pointValues, it's ambiguous how the data should be >> interpreted. This >> doesn't lead to a bug but would help debugging and allowing APIs to express >> field requirements simply by providing a FieldType instance for numeric >> data. It used to be self sufficient but now if we imagine the legacy stuff >> being removed, it's ambiguous. In addition, it would be useful metadata if >> it found it's way into FieldInfo. 
Then, say Luke, could help you know >> what's there and maybe search it. >> >> Thoughts? >> >> ~ David >> -- >> Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker >> LinkedIn: http://linkedin.com/in/davidwsmiley | Book: >> http://www.solrenterprisesearchserver.com > > -- > Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker > LinkedIn: http://linkedin.com/in/davidwsmiley | Book: > http://www.solrenterprisesearchserver.com
[jira] [Commented] (SOLR-8742) HdfsDirectoryTest fails reliably after changes in LUCENE-6932
[ https://issues.apache.org/jira/browse/SOLR-8742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15210334#comment-15210334 ] Steve Rowe commented on SOLR-8742: -- FYI, git bisect blames the same commit sha for me with this new seed as it did for hoss's seed. > HdfsDirectoryTest fails reliably after changes in LUCENE-6932 > - > > Key: SOLR-8742 > URL: https://issues.apache.org/jira/browse/SOLR-8742 > Project: Solr > Issue Type: Bug >Reporter: Hoss Man > > the following seed fails reliably for me on master... > {noformat} >[junit4] 2> 1370568 INFO > (TEST-HdfsDirectoryTest.testEOF-seed#[A0D22782D87E1CE2]) [] > o.a.s.SolrTestCaseJ4 ###Ending testEOF >[junit4] 2> NOTE: reproduce with: ant test -Dtestcase=HdfsDirectoryTest > -Dtests.method=testEOF -Dtests.seed=A0D22782D87E1CE2 -Dtests.slow=true > -Dtests.locale=es-PR -Dtests.timezone=Indian/Mauritius -Dtests.asserts=true > -Dtests.file.encoding=ISO-8859-1 >[junit4] ERROR 0.13s J0 | HdfsDirectoryTest.testEOF <<< >[junit4]> Throwable #1: java.lang.NullPointerException >[junit4]> at > __randomizedtesting.SeedInfo.seed([A0D22782D87E1CE2:31B9658A9A5ABA9E]:0) >[junit4]> at > org.apache.lucene.store.RAMInputStream.readByte(RAMInputStream.java:69) >[junit4]> at > org.apache.solr.store.hdfs.HdfsDirectoryTest.testEof(HdfsDirectoryTest.java:159) >[junit4]> at > org.apache.solr.store.hdfs.HdfsDirectoryTest.testEOF(HdfsDirectoryTest.java:151) >[junit4]> at java.lang.Thread.run(Thread.java:745) > {noformat} > git bisect says this is the first commit where it started failing.. 
> {noformat} > ddc65d977f920013c5fca16c8ac75ae2c6895f9d is the first bad commit > commit ddc65d977f920013c5fca16c8ac75ae2c6895f9d > Author: Michael McCandless > Date: Thu Jan 21 17:50:28 2016 + > LUCENE-6932: RAMInputStream now throws EOFException if you seek beyond > the end of the file > > git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1726039 > 13f79535-47bb-0310-9956-ffa450edef68 > {noformat} > ...which seems remarkably relevant and likely to indicate a problem that > needs to be fixed in the HdfsDirectory code (or perhaps just the test)
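The behavior change named in that commit is just the standard java.io contract; here is a stdlib-only sketch (no Lucene required) of what "reads past EOF throw EOFException" looks like — presumably the Hdfs test hit the NPE because it performed a read past the end that the old RAMInputStream never guarded against:

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;

public class EofDemo {
    public static void main(String[] args) throws IOException {
        // Four bytes of data, exactly enough for one int.
        DataInputStream in =
            new DataInputStream(new ByteArrayInputStream(new byte[4]));
        in.readInt();            // consumes all 4 bytes

        try {
            in.readByte();       // one byte past the end
            throw new AssertionError("expected EOFException");
        } catch (EOFException expected) {
            // This is the contract RAMInputStream now follows per LUCENE-6932.
            System.out.println("EOFException past EOF, as expected");
        }
    }
}
```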
Re: Lucene FieldType & specifying numeric type (double, float, )
I should add, if we keep FieldType.NumericType and use it as I suggest, it would either need a new enum value of "UNSPECIFIED" (think IPV6 or other custom uses) or null; I'd prefer to avoid the null. ~ David On Thu, Mar 24, 2016 at 10:34 AM David Smiley wrote: > With the move to PointValues and away from trie based indexing of the > terms index, for numerics, everything associated with the trie stuff seems > to be labelled as "Legacy" and marked deprecated. Even > FieldType.NumericType (now FieldType.LegacyNumericType) -- a simple enum of > INT, LONG, FLOAT, DOUBLE. I wonder if we ought to reconsider doing this > for FieldType.NumericType, as it articulates the type of numeric data; it > need not be associated with just trie indexing of terms data; it could > articulate how any numeric data is encoded, be it docValues or > pointValues. This is useful metadata. It's not strictly required, true, > but its useful in describing what goes in the field. This makes a > FieldType instance fairly self-sufficient. Otherwise, say you have > docValue numerics and/or pointValues, it's ambiguous how the data should be > interpreted. This doesn't lead to a bug but would help debugging and > allowing APIs to express field requirements simply by providing a FieldType > instance for numeric data. It used to be self sufficient but now if we > imagine the legacy stuff being removed, it's ambiguous. In addition, it > would be useful metadata if it found it's way into FieldInfo. Then, say > Luke, could help you know what's there and maybe search it. > > Thoughts? > > ~ David > -- > Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker > LinkedIn: http://linkedin.com/in/davidwsmiley | Book: > http://www.solrenterprisesearchserver.com > -- Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker LinkedIn: http://linkedin.com/in/davidwsmiley | Book: http://www.solrenterprisesearchserver.com
Re: Lucene FieldType & specifying numeric type (double, float, )
This is not recorded into FieldInfo. For points, Lucene treats the data as a multidimensional byte[]. It's up to higher-level indexing classes (Field) and search classes (Query) to deal with various ways to encode information in those bytes. There is a lot more that can go in here than just INT, LONG, FLOAT, DOUBLE. See the sandbox, which indexes BigInteger, InetAddress, etc., and users should be able to extend it to other data types. The FieldType.LegacyNumericType is brain damage and is rightfully removed. On Thu, Mar 24, 2016 at 10:34 AM, David Smiley wrote: > With the move to PointValues and away from trie based indexing of the terms > index, for numerics, everything associated with the trie stuff seems to be > labelled as "Legacy" and marked deprecated. Even FieldType.NumericType (now > FieldType.LegacyNumericType) -- a simple enum of INT, LONG, FLOAT, DOUBLE. > I wonder if we ought to reconsider doing this for FieldType.NumericType, as > it articulates the type of numeric data; it need not be associated with just > trie indexing of terms data; it could articulate how any numeric data is > encoded, be it docValues or pointValues. This is useful metadata. It's not > strictly required, true, but its useful in describing what goes in the > field. This makes a FieldType instance fairly self-sufficient. Otherwise, > say you have docValue numerics and/or pointValues, it's ambiguous how the > data should be interpreted. This doesn't lead to a bug but would help > debugging and allowing APIs to express field requirements simply by > providing a FieldType instance for numeric data. It used to be self > sufficient but now if we imagine the legacy stuff being removed, it's > ambiguous. In addition, it would be useful metadata if it found it's way > into FieldInfo. Then, say Luke, could help you know what's there and maybe > search it. > > Thoughts? 
> > ~ David > -- > Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker > LinkedIn: http://linkedin.com/in/davidwsmiley | Book: > http://www.solrenterprisesearchserver.com
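Robert's "it's just bytes" point can be sketched with the 6.x encoding helpers — a minimal sketch, assuming the NumericUtils.intToSortableBytes/sortableBytesToInt helpers; the real Field and Query classes layer their typed APIs on top of exactly this:

```java
import java.util.Arrays;
import org.apache.lucene.util.NumericUtils;

public class PointBytesSketch {
    public static void main(String[] args) {
        // The points index only sees an opaque, fixed-width byte[] per
        // dimension; the typed interpretation (int vs float vs InetAddress)
        // lives entirely in the encode/decode done by Field/Query classes.
        byte[] packed = new byte[Integer.BYTES];
        NumericUtils.intToSortableBytes(42, packed, 0);   // int -> 4 sortable bytes
        int roundTripped = NumericUtils.sortableBytesToInt(packed, 0);
        System.out.println(Arrays.toString(packed) + " -> " + roundTripped);
    }
}
```

This is why FieldInfo records only the byte length: two different field classes could store the same 4 bytes and interpret them differently, and the index itself cannot tell them apart.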
Lucene FieldType & specifying numeric type (double, float, )
With the move to PointValues and away from trie-based indexing of the terms index, for numerics, everything associated with the trie stuff seems to be labelled as "Legacy" and marked deprecated. Even FieldType.NumericType (now FieldType.LegacyNumericType) -- a simple enum of INT, LONG, FLOAT, DOUBLE. I wonder if we ought to reconsider doing this for FieldType.NumericType, as it articulates the type of numeric data; it need not be associated with just trie indexing of terms data; it could articulate how any numeric data is encoded, be it docValues or pointValues. This is useful metadata. It's not strictly required, true, but it's useful in describing what goes in the field. This makes a FieldType instance fairly self-sufficient. Otherwise, say you have docValue numerics and/or pointValues, it's ambiguous how the data should be interpreted. This doesn't lead to a bug but would help debugging and allow APIs to express field requirements simply by providing a FieldType instance for numeric data. It used to be self-sufficient, but now, if we imagine the legacy stuff being removed, it's ambiguous. In addition, it would be useful metadata if it found its way into FieldInfo. Then, say, Luke could help you know what's there and maybe search it. Thoughts? ~ David -- Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker LinkedIn: http://linkedin.com/in/davidwsmiley | Book: http://www.solrenterprisesearchserver.com
[jira] [Commented] (SOLR-8896) Processing of _route_ param should imply trailing '!' if not present
[ https://issues.apache.org/jira/browse/SOLR-8896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15210312#comment-15210312 ] Yonik Seeley commented on SOLR-8896: bq. The SolrCloud _route_ param is supposed to end in a ! No it's not... {code} _route_:foo! //address shards of the collection containing documents starting with "foo!" _route_:foo //address shard of the collection containing the single document "foo" {code} > Processing of _route_ param should imply trailing '!' if not present > > > Key: SOLR-8896 > URL: https://issues.apache.org/jira/browse/SOLR-8896 > Project: Solr > Issue Type: Improvement > Components: SolrCloud >Reporter: David Smiley > > The SolrCloud {{\_route\_}} param is _supposed_ to end in a {{!}}. See > https://cwiki.apache.org/confluence/display/solr/Distributed+Requests If you > don't do it, you get different routing/hashing behavior that is bound to lead > to some head scratching. We should instead not require a trailing > exclamation (don't even mention it in the docs) and if it doesn't have one > then we internally append one prior to applying it to the hashing algorithm > (doc router). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
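A hedged SolrJ sketch of the two forms Yonik distinguishes (the route keys are invented; assumes SolrQuery's generic param setter from SolrJ):

```java
import org.apache.solr.client.solrj.SolrQuery;

public class RouteParamSketch {
    public static void main(String[] args) {
        // With the trailing '!', the route targets every shard that can hold
        // documents whose ids start with "customerA!".
        SolrQuery withBang = new SolrQuery("*:*");
        withBang.set("_route_", "customerA!");

        // Without the '!', the literal id "customerA" is hashed, targeting the
        // single shard that would hold exactly that document.
        SolrQuery noBang = new SolrQuery("*:*");
        noBang.set("_route_", "customerA");

        System.out.println(withBang + " vs " + noBang);
    }
}
```

The issue's proposal would make the first (shard-range) behavior the only one, appending the '!' internally when absent.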
[jira] [Updated] (LUCENE-7137) consolidate many tests across Points and GeoPoint queries/fields
[ https://issues.apache.org/jira/browse/LUCENE-7137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-7137: Attachment: LUCENE-7137.patch Here's a patch. consolidating the checks found a few minor bugs and inconsistencies in parameter checking. > consolidate many tests across Points and GeoPoint queries/fields > > > Key: LUCENE-7137 > URL: https://issues.apache.org/jira/browse/LUCENE-7137 > Project: Lucene - Core > Issue Type: Test >Reporter: Robert Muir > Attachments: LUCENE-7137.patch > > > We have found repeated basic problems with stuff like equals/hashcode > recently, I think we should consolidate tests and cleanup here. > these different implementations also have a little assortment of simplistic > unit tests, if its not doing anything impl-specific, we should fold those in > too. these are easy to debug and great to see fail if something is wrong. > I will work up a patch. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-8521) Add documentation for how to use Solr JDBC driver with SQL client like DB Visualizer
[ https://issues.apache.org/jira/browse/SOLR-8521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15210296#comment-15210296 ] Joel Bernstein commented on SOLR-8521: -- I think the generic guide makes sense on the CWIKI for sure. I also think the screen shot guides belong on the CWIKI as well. Just in separate sub-pages. > Add documentation for how to use Solr JDBC driver with SQL client like DB > Visualizer > > > Key: SOLR-8521 > URL: https://issues.apache.org/jira/browse/SOLR-8521 > Project: Solr > Issue Type: Sub-task > Components: documentation, SolrJ >Affects Versions: master >Reporter: Kevin Risden > Attachments: solr_jdbc_dbvisualizer_20160203.pdf > > > Currently this requires the following: > * a JDBC SQL client program (like DBVisualizer or SQuirrelSQL) > * all jars from solr/dist/solrj-lib/* to be on the SQL client classpath > * solr/dist/solr-solrj-6.0.0-SNAPSHOT.jar on the SQL client classpath > * a valid JDBC connection string (like > jdbc:solr://SOLR_ZK_CONNECTION_STRING?collection=COLLECTION_NAME) > * without SOLR-8213, the username/password supplied by the SQL client will be > ignored. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
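Outside a GUI client, the same checklist can be sketched in plain JDBC — a minimal sketch, assuming the solrj-lib and solr-solrj jars listed above are on the classpath and that the driver self-registers with DriverManager; the ZK address, collection name, and field names are placeholders:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class SolrJdbcSketch {
    public static void main(String[] args) throws Exception {
        // Connection string shape from the issue:
        // jdbc:solr://SOLR_ZK_CONNECTION_STRING?collection=COLLECTION_NAME
        String url = "jdbc:solr://localhost:9983?collection=mycollection";
        try (Connection con = DriverManager.getConnection(url);
             Statement stmt = con.createStatement();
             ResultSet rs = stmt.executeQuery(
                 "select fielda, fieldb from mycollection limit 10")) {
            while (rs.next()) {
                System.out.println(rs.getString("fielda")
                    + " | " + rs.getString("fieldb"));
            }
        }
    }
}
```

Note the issue's caveat: without SOLR-8213, any username/password the client supplies is ignored.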
[JENKINS-EA] Lucene-Solr-master-Linux (32bit/jdk-9-jigsaw-ea+110) - Build # 16317 - Failure!
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-master-Linux/16317/ Java: 32bit/jdk-9-jigsaw-ea+110 -server -XX:+UseParallelGC -XX:-CompactStrings 2 tests failed. FAILED: org.apache.lucene.codecs.lucene60.TestLucene60PointsFormat.testMultiValued Error Message: wrong number of points in split: expected=5639 but actual=5681 Stack Trace: java.lang.IllegalStateException: wrong number of points in split: expected=5639 but actual=5681 at __randomizedtesting.SeedInfo.seed([8EC072D57279B288:5AE016E7BCBBF2C0]:0) at org.apache.lucene.util.bkd.BKDWriter.build(BKDWriter.java:1224) at org.apache.lucene.util.bkd.BKDWriter.finish(BKDWriter.java:862) at org.apache.lucene.codecs.lucene60.Lucene60PointsWriter.writeField(Lucene60PointsWriter.java:115) at org.apache.lucene.index.PointValuesWriter.flush(PointValuesWriter.java:71) at org.apache.lucene.index.DefaultIndexingChain.writePoints(DefaultIndexingChain.java:172) at org.apache.lucene.index.DefaultIndexingChain.flush(DefaultIndexingChain.java:107) at org.apache.lucene.index.DocumentsWriterPerThread.flush(DocumentsWriterPerThread.java:423) at org.apache.lucene.index.DocumentsWriter.doFlush(DocumentsWriter.java:502) at org.apache.lucene.index.DocumentsWriter.flushAllThreads(DocumentsWriter.java:614) at org.apache.lucene.index.IndexWriter.doFlush(IndexWriter.java:3138) at org.apache.lucene.index.IndexWriter.flush(IndexWriter.java:3113) at org.apache.lucene.index.IndexWriter.forceMerge(IndexWriter.java:1756) at org.apache.lucene.index.IndexWriter.forceMerge(IndexWriter.java:1736) at org.apache.lucene.index.RandomIndexWriter.forceMerge(RandomIndexWriter.java:421) at org.apache.lucene.index.BasePointsFormatTestCase.verify(BasePointsFormatTestCase.java:667) at org.apache.lucene.index.BasePointsFormatTestCase.verify(BasePointsFormatTestCase.java:500) at org.apache.lucene.index.BasePointsFormatTestCase.testMultiValued(BasePointsFormatTestCase.java:281) at sun.reflect.NativeMethodAccessorImpl.invoke0(java.base@9-ea/Native Method) at 
sun.reflect.NativeMethodAccessorImpl.invoke(java.base@9-ea/NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(java.base@9-ea/DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(java.base@9-ea/Method.java:531) at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1764) at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:871) at com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:907) at com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:921) at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:49) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45) at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48) at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:367) at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:809) at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:460) at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:880) at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:781) at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:816) at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:827) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45) at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:41) at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40) at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at
[jira] [Commented] (LUCENE-6954) More Like This Query: keep fields separated
[ https://issues.apache.org/jira/browse/LUCENE-6954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15210273#comment-15210273 ] Alessandro Benedetti commented on LUCENE-6954: -- Thanks Tommaso, let me know! A test is in the patch, but we can add more if necessary and the solution seems fine. > More Like This Query: keep fields separated > --- > > Key: LUCENE-6954 > URL: https://issues.apache.org/jira/browse/LUCENE-6954 > Project: Lucene - Core > Issue Type: Bug > Components: modules/other >Affects Versions: 5.4 >Reporter: Alessandro Benedetti >Assignee: Tommaso Teofili > Labels: morelikethis > Attachments: LUCENE-6954.patch > > > Currently the query is generated in: > org.apache.lucene.queries.mlt.MoreLikeThis#retrieveTerms(int) > 1) We extract the terms from the interesting fields, adding them to a map: > Map<String, Int> termFreqMap = new HashMap<>(); > (we lose the field -> term relation; we no longer know which field a term came from!) > org.apache.lucene.queries.mlt.MoreLikeThis#createQueue > 2) We build the queue that will contain the query terms; at this point we reconnect these terms to some field, but: > ... > // go through all the fields and find the largest document frequency > String topField = fieldNames[0]; > int docFreq = 0; > for (String fieldName : fieldNames) { > int freq = ir.docFreq(new Term(fieldName, word)); > topField = (freq > docFreq) ? fieldName : topField; > docFreq = (freq > docFreq) ? freq : docFreq; > } > ... > We identify topField as the field with the highest document frequency for the term t. > Then we build the term query: > queue.add(new ScoreTerm(word, topField, score, idf, docFreq, tf)); > In this way we lose a lot of precision. > Not sure why we do that. > I would prefer to keep the relation between terms and fields. > This could improve the quality of the MLT query a lot. > If I run the MLT on 2 fields, weSell and weDontSell, for example,
> it is likely I want to find documents with similar terms in weSell and similar terms in weDontSell, without mixing things up and losing the semantics of the terms. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
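The field/term decoupling described in the issue can be avoided by keeping one frequency map per field instead of a single merged map. A minimal illustrative sketch follows; the class name is hypothetical and this is not the attached LUCENE-6954.patch (a plain Integer stands in for MoreLikeThis's internal Int wrapper):

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: keep term frequencies per field rather than merging
// them, so each query term can be built against the field it came from.
public class PerFieldTermFreq {
    // field -> (term -> frequency), instead of the single term -> frequency
    // map that retrieveTerms currently builds.
    private final Map<String, Map<String, Integer>> perField = new HashMap<>();

    public void addTerm(String field, String term) {
        perField.computeIfAbsent(field, f -> new HashMap<>())
                .merge(term, 1, Integer::sum);
    }

    public int freq(String field, String term) {
        return perField.getOrDefault(field, Collections.emptyMap())
                       .getOrDefault(term, 0);
    }
}
```

With this structure, the weSell/weDontSell example keeps its semantics: the same word contributes separately to each field's term queue.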
[jira] [Created] (SOLR-8896) Processing of _route_ param should imply trailing '!' if not present
David Smiley created SOLR-8896: -- Summary: Processing of _route_ param should imply trailing '!' if not present Key: SOLR-8896 URL: https://issues.apache.org/jira/browse/SOLR-8896 Project: Solr Issue Type: Improvement Components: SolrCloud Reporter: David Smiley The SolrCloud {{\_route\_}} param is _supposed_ to end in a {{!}}. See https://cwiki.apache.org/confluence/display/solr/Distributed+Requests If you don't do this, you get different routing/hashing behavior that is bound to lead to some head scratching. We should instead not require a trailing exclamation mark (and not even mention it in the docs); if the param doesn't have one, we should internally append one before applying it to the hashing algorithm (doc router).
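The proposed behavior (treating a _route_ value without the trailing '!' as if it had one) boils down to a small normalization step before the value reaches the doc router. A hedged sketch of that step, as a hypothetical helper rather than Solr's actual CompositeIdRouter code:

```java
// Hypothetical sketch of the proposed normalization: ensure the _route_
// value ends in the '!' shard-key separator before hashing.
public class RouteParam {
    public static String normalize(String route) {
        if (route == null || route.isEmpty()) {
            return route; // nothing to route on
        }
        // Already ends with the composite-id separator: leave it alone,
        // so explicit values like "user123!" keep their current behavior.
        return route.endsWith("!") ? route : route + "!";
    }
}
```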
[jira] [Commented] (SOLR-8742) HdfsDirectoryTest fails reliably after changes in LUCENE-6932
[ https://issues.apache.org/jira/browse/SOLR-8742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15210233#comment-15210233 ] Steve Rowe commented on SOLR-8742: -- Another reproducing failure (master/branch_6x/branch_6_0): {noformat} [junit4] Suite: org.apache.solr.store.hdfs.HdfsDirectoryTest [junit4] 2> Creating dataDir: /var/lib/jenkins/jobs/Lucene-Solr-tests-6.x/workspace/solr/build/solr-core/test/J5/temp/solr.store.hdfs.HdfsDirectoryTest_6BF936321AE9FC53-001/init-core-data-001 [junit4] 2> 432246 INFO (SUITE-HdfsDirectoryTest-seed#[6BF936321AE9FC53]-worker) [] o.a.s.SolrTestCaseJ4 Randomized ssl (false) and clientAuth (true) [junit4] 1> Formatting using clusterid: testClusterID [junit4] 2> 432262 WARN (SUITE-HdfsDirectoryTest-seed#[6BF936321AE9FC53]-worker) [] o.a.h.m.i.MetricsConfig Cannot locate configuration: tried hadoop-metrics2-namenode.properties,hadoop-metrics2.properties [junit4] 2> 432267 WARN (SUITE-HdfsDirectoryTest-seed#[6BF936321AE9FC53]-worker) [] o.a.h.h.HttpRequestLog Jetty request log can only be enabled using Log4j [junit4] 2> 432269 INFO (SUITE-HdfsDirectoryTest-seed#[6BF936321AE9FC53]-worker) [] o.m.log jetty-6.1.26 [junit4] 2> 432276 INFO (SUITE-HdfsDirectoryTest-seed#[6BF936321AE9FC53]-worker) [] o.m.log Extract jar:file:/var/lib/jenkins/.ivy2/cache/org.apache.hadoop/hadoop-hdfs/tests/hadoop-hdfs-2.6.0-tests.jar!/webapps/hdfs to ./temp/Jetty_localhost_36931_hdfs.vsqnuq/webapp [junit4] 2> 432337 INFO (SUITE-HdfsDirectoryTest-seed#[6BF936321AE9FC53]-worker) [] o.m.log NO JSP Support for /, did not find org.apache.jasper.servlet.JspServlet [junit4] 2> 432703 INFO (SUITE-HdfsDirectoryTest-seed#[6BF936321AE9FC53]-worker) [] o.m.log Started HttpServer2$SelectChannelConnectorWithSafeStartup@localhost:36931 [junit4] 2> 432820 WARN (SUITE-HdfsDirectoryTest-seed#[6BF936321AE9FC53]-worker) [] o.a.h.h.HttpRequestLog Jetty request log can only be enabled using Log4j [junit4] 2> 432821 INFO 
(SUITE-HdfsDirectoryTest-seed#[6BF936321AE9FC53]-worker) [] o.m.log jetty-6.1.26 [junit4] 2> 432829 INFO (SUITE-HdfsDirectoryTest-seed#[6BF936321AE9FC53]-worker) [] o.m.log Extract jar:file:/var/lib/jenkins/.ivy2/cache/org.apache.hadoop/hadoop-hdfs/tests/hadoop-hdfs-2.6.0-tests.jar!/webapps/datanode to ./temp/Jetty_localhost_40567_datanode.hd2j4v/webapp [junit4] 2> 432887 INFO (SUITE-HdfsDirectoryTest-seed#[6BF936321AE9FC53]-worker) [] o.m.log NO JSP Support for /, did not find org.apache.jasper.servlet.JspServlet [junit4] 2> 433283 INFO (SUITE-HdfsDirectoryTest-seed#[6BF936321AE9FC53]-worker) [] o.m.log Started HttpServer2$SelectChannelConnectorWithSafeStartup@localhost:40567 [junit4] 2> 433304 WARN (SUITE-HdfsDirectoryTest-seed#[6BF936321AE9FC53]-worker) [] o.a.h.h.HttpRequestLog Jetty request log can only be enabled using Log4j [junit4] 2> 433305 INFO (SUITE-HdfsDirectoryTest-seed#[6BF936321AE9FC53]-worker) [] o.m.log jetty-6.1.26 [junit4] 2> 433315 INFO (SUITE-HdfsDirectoryTest-seed#[6BF936321AE9FC53]-worker) [] o.m.log Extract jar:file:/var/lib/jenkins/.ivy2/cache/org.apache.hadoop/hadoop-hdfs/tests/hadoop-hdfs-2.6.0-tests.jar!/webapps/datanode to ./temp/Jetty_localhost_54236_datanode.2l2cxv/webapp [junit4] 2> 41 INFO (IPC Server handler 3 on 35443) [] BlockStateChange BLOCK* processReport: from storage DS-3362e969-6b1f-4f8b-90c2-519bfe11a4e3 node DatanodeRegistration(127.0.0.1, datanodeUuid=a1c6edfb-4bb8-4e12-a3d4-dc5308fd9199, infoPort=40567, ipcPort=34011, storageInfo=lv=-56;cid=testClusterID;nsid=1766496377;c=0), blocks: 0, hasStaleStorages: true, processing time: 1 msecs [junit4] 2> 42 INFO (IPC Server handler 3 on 35443) [] BlockStateChange BLOCK* processReport: from storage DS-35c6c048-304c-4d94-a2b2-47d07d42be08 node DatanodeRegistration(127.0.0.1, datanodeUuid=a1c6edfb-4bb8-4e12-a3d4-dc5308fd9199, infoPort=40567, ipcPort=34011, storageInfo=lv=-56;cid=testClusterID;nsid=1766496377;c=0), blocks: 0, hasStaleStorages: false, processing time: 0 msecs 
[junit4] 2> 433404 INFO (SUITE-HdfsDirectoryTest-seed#[6BF936321AE9FC53]-worker) [] o.m.log NO JSP Support for /, did not find org.apache.jasper.servlet.JspServlet [junit4] 2> 433822 INFO (SUITE-HdfsDirectoryTest-seed#[6BF936321AE9FC53]-worker) [] o.m.log Started HttpServer2$SelectChannelConnectorWithSafeStartup@localhost:54236 [junit4] 2> 433851 INFO (IPC Server handler 4 on 35443) [] BlockStateChange BLOCK* processReport: from storage DS-7e62f7ee-4893-43bb-a5af-ce3fd18691b7 node DatanodeRegistration(127.0.0.1, datanodeUuid=c48cef8e-d1c1-4fa3-90c9-c2e0461c78c1, infoPort=54236, ipcPort=56889, storageInfo=lv=-56;cid=testClusterID;nsid=1766496377;c=0), blocks: 0, hasStaleStorages: true, processing time: 1
[jira] [Resolved] (SOLR-8895) HdfsDirectoryTest.testEOF() failure: NPE
[ https://issues.apache.org/jira/browse/SOLR-8895?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Rowe resolved SOLR-8895. -- Resolution: Duplicate Sorry didn't do a JIRA search - I'll add the seed over there. > HdfsDirectoryTest.testEOF() failure: NPE > > > Key: SOLR-8895 > URL: https://issues.apache.org/jira/browse/SOLR-8895 > Project: Solr > Issue Type: Bug >Reporter: Steve Rowe > > My Jenkins found a reproducible seed on branch_6x.
[jira] [Assigned] (LUCENE-6954) More Like This Query: keep fields separated
[ https://issues.apache.org/jira/browse/LUCENE-6954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tommaso Teofili reassigned LUCENE-6954: --- Assignee: Tommaso Teofili
[jira] [Commented] (LUCENE-6954) More Like This Query: keep fields separated
[ https://issues.apache.org/jira/browse/LUCENE-6954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15210228#comment-15210228 ] Tommaso Teofili commented on LUCENE-6954: - [~alessandro.benedetti], thanks for your patch. I can have a look at this.
[jira] [Commented] (SOLR-8521) Add documentation for how to use Solr JDBC driver with SQL client like DB Visualizer
[ https://issues.apache.org/jira/browse/SOLR-8521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15210225#comment-15210225 ] Kevin Risden commented on SOLR-8521: Was there consensus on where the documentation should go? Reference guide or maybe the wiki? My compromise is below: Reference guide: generic guide on what is required to use SolrJ JDBC (no screenshots) Wiki: screenshot-by-screenshot walkthrough with a page per client (DbVisualizer, SQuirrel SQL, Apache Zeppelin, etc.)? I plan to write some blog posts about how to use each of these as well in the next few weeks. > Add documentation for how to use Solr JDBC driver with SQL client like DB > Visualizer > > > Key: SOLR-8521 > URL: https://issues.apache.org/jira/browse/SOLR-8521 > Project: Solr > Issue Type: Sub-task > Components: documentation, SolrJ >Affects Versions: master >Reporter: Kevin Risden > Attachments: solr_jdbc_dbvisualizer_20160203.pdf > > > Currently this requires the following: > * a JDBC SQL client program (like DBVisualizer or SQuirrelSQL) > * all jars from solr/dist/solrj-lib/* to be on the SQL client classpath > * solr/dist/solr-solrj-6.0.0-SNAPSHOT.jar on the SQL client classpath > * a valid JDBC connection string (like > jdbc:solr://SOLR_ZK_CONNECTION_STRING?collection=COLLECTION_NAME) > * without SOLR-8213, the username/password supplied by the SQL client will be > ignored.
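For reference, the connection string format quoted in the issue can be assembled as below. The ZooKeeper host and collection name are placeholders, and the commented-out portion only sketches how a standard java.sql client would use the URL (it needs the solrj and solrj-lib jars listed above on the classpath, so it is not executed here):

```java
// Hypothetical helper that builds the Solr JDBC connection string from the
// issue: jdbc:solr://SOLR_ZK_CONNECTION_STRING?collection=COLLECTION_NAME
public class SolrJdbcUrl {
    public static String url(String zkConnect, String collection) {
        return "jdbc:solr://" + zkConnect + "?collection=" + collection;
    }

    // Sketch of actual use (left commented; requires the Solr JDBC driver):
    //   try (java.sql.Connection con = java.sql.DriverManager.getConnection(
    //            url("zk1:2181,zk2:2181/solr", "mycoll"));
    //        java.sql.Statement stmt = con.createStatement();
    //        java.sql.ResultSet rs =
    //            stmt.executeQuery("select id from mycoll limit 10")) {
    //       // iterate rs ...
    //   }
}
```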
[jira] [Commented] (SOLR-8895) HdfsDirectoryTest.testEOF() failure: NPE
[ https://issues.apache.org/jira/browse/SOLR-8895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15210221#comment-15210221 ] Mark Miller commented on SOLR-8895: --- SOLR-8742
[jira] [Commented] (SOLR-8895) HdfsDirectoryTest.testEOF() failure: NPE
[ https://issues.apache.org/jira/browse/SOLR-8895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15210220#comment-15210220 ] Mark Miller commented on SOLR-8895: --- I think [~hossman_luc...@fucit.org] already filed an issue for this.
[jira] [Commented] (LUCENE-6954) More Like This Query: keep fields separated
[ https://issues.apache.org/jira/browse/LUCENE-6954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15210211#comment-15210211 ] Alessandro Benedetti commented on LUCENE-6954: -- I do agree, it does not make a lot of sense if it was something done on purpose :)
[jira] [Commented] (SOLR-8888) Add shortestPath Streaming Expression
[ https://issues.apache.org/jira/browse/SOLR-8888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15210206#comment-15210206 ] Joel Bernstein commented on SOLR-8888: -- I've been digging into the implementation and it looks like Streaming provides some real advantages. The biggest advantage comes from the ability to sort entire results by the node id and do this in parallel across the cluster. This means that once the nodes arrive at the worker they can simply be written to memory-mapped files for the bookkeeping. The bookkeeping files need to be sorted by node id and most likely need offset information to support binary searching and skipping during intersections. I looked at using MapDB for the bookkeeping, and if the data weren't already coming in sorted then that would have been the approach to use. But as fast as MapDB is, there is still overhead we don't need in managing the BTrees. So, to get the maximum speed in reading and writing the bookkeeping files, I'm planning on just using memory-mapped files with offsets. This will take more time to develop but will pay off when there are large traversals. > Add shortestPath Streaming Expression > - > > Key: SOLR-8888 > URL: https://issues.apache.org/jira/browse/SOLR-8888 > Project: Solr > Issue Type: Improvement >Reporter: Joel Bernstein > > This ticket is to implement a distributed shortest path graph traversal as a > Streaming Expression. > Possible expression syntax: > {code} > shortestPath(collection, > from="colA:node1", > to="colB:node2", > fq="limiting query", > maxDepth="10") > {code} > This would start from colA:node1 and traverse from colA to colB iteratively > until it finds colB:node2. The shortestPath function would emit Tuples > representing the shortest path. > The optional fq could be used to apply a filter on the traversal.
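The sorted, memory-mapped bookkeeping idea described in the comment above can be sketched with plain java.nio: write an already-sorted run of node ids into a mapped file, then binary-search it during intersections. This is an illustrative toy under stated assumptions (long node ids, no offset table), not the actual SOLR-8888 implementation:

```java
import java.io.IOException;
import java.nio.LongBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Toy sketch of the proposed bookkeeping: node ids arrive sorted from the
// stream, are written sequentially to a memory-mapped file, and membership
// checks during intersection use binary search over the mapped longs.
public class VisitedNodes {
    private final LongBuffer ids;
    private final int size;

    public VisitedNodes(Path file, long[] sortedIds) throws IOException {
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.CREATE, StandardOpenOption.READ,
                StandardOpenOption.WRITE)) {
            MappedByteBuffer map =
                ch.map(FileChannel.MapMode.READ_WRITE, 0, 8L * sortedIds.length);
            LongBuffer lb = map.asLongBuffer();
            lb.put(sortedIds);  // sequential write of the pre-sorted run
            this.ids = lb;      // mapping stays valid after the channel closes
            this.size = sortedIds.length;
        }
    }

    public boolean contains(long id) {
        int lo = 0, hi = size - 1;  // classic binary search, absolute reads
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            long v = ids.get(mid);
            if (v < id) lo = mid + 1;
            else if (v > id) hi = mid - 1;
            else return true;
        }
        return false;
    }
}
```

Real node ids would be strings of varying length, which is where the offset information mentioned in the comment comes in: offsets let the search skip to record boundaries that fixed-width longs get for free here.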
[jira] [Updated] (LUCENE-6954) More Like This Query: keep fields separated
[ https://issues.apache.org/jira/browse/LUCENE-6954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Smiley updated LUCENE-6954: - Issue Type: Bug (was: Improvement) Summary: More Like This Query: keep fields separated (was: More Like This Query Generation ) (re-titling and classifying as a bug; I think it's a bug) > More Like This Query: keep fields separated > --- > > Key: LUCENE-6954 > URL: https://issues.apache.org/jira/browse/LUCENE-6954 > Project: Lucene - Core > Issue Type: Bug > Components: modules/other >Affects Versions: 5.4 >Reporter: Alessandro Benedetti > Labels: morelikethis > Attachments: LUCENE-6954.patch > > > Currently the query is generated : > org.apache.lucene.queries.mlt.MoreLikeThis#retrieveTerms(int) > 1) we extract the terms from the interesting fields, adding them to a map : > MaptermFreqMap = new HashMap<>(); > ( we lose the relation field-> term, we don't know anymore where the term was > coming ! ) > org.apache.lucene.queries.mlt.MoreLikeThis#createQueue > 2) we build the queue that will contain the query terms, at this point we > connect again there terms to some field, but : > ... > // go through all the fields and find the largest document frequency > String topField = fieldNames[0]; > int docFreq = 0; > for (String fieldName : fieldNames) { > int freq = ir.docFreq(new Term(fieldName, word)); > topField = (freq > docFreq) ? fieldName : topField; > docFreq = (freq > docFreq) ? freq : docFreq; > } > ... > We identify the topField as the field with the highest document frequency for > the term t . > Then we build the termQuery : > queue.add(new ScoreTerm(word, topField, score, idf, docFreq, tf)); > In this way we lose a lot of precision. > Not sure why we do that. > I would prefer to keep the relation between terms and fields. > The MLT query can improve a lot the quality. > If i run the MLT on 2 fields : weSell and weDontSell for example. 
> It is likely I want to find documents with similar terms in weSell and similar terms in weDontSell, without mixing things up and losing the semantics of the terms. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
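The fix the issue proposes amounts to keeping term statistics per field rather than in one flat map. A minimal stdlib-only sketch of that data structure (class and method names are illustrative, not the actual LUCENE-6954 patch):

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: keep term statistics per field instead of one flat map,
// so the field -> term relation is never lost.
public class PerFieldTermFreq {
    // field name -> (term -> term frequency)
    private final Map<String, Map<String, Integer>> fieldTermFreq = new HashMap<>();

    public void addTerm(String field, String term) {
        fieldTermFreq
            .computeIfAbsent(field, f -> new HashMap<>())
            .merge(term, 1, Integer::sum);
    }

    // Each (field, term) pair keeps its own count, instead of collapsing
    // every field into a single "topField" chosen by document frequency.
    public Map<String, Map<String, Integer>> perField() {
        return fieldTermFreq;
    }

    public static void main(String[] args) {
        PerFieldTermFreq tf = new PerFieldTermFreq();
        tf.addTerm("weSell", "laptops");
        tf.addTerm("weSell", "laptops");
        tf.addTerm("weDontSell", "laptops");
        // "laptops" keeps a separate count for each field it came from
        System.out.println(tf.perField().get("weSell").get("laptops"));
        System.out.println(tf.perField().get("weDontSell").get("laptops"));
    }
}
```

With this shape, a term appearing in both weSell and weDontSell contributes two distinct query terms, one per field, instead of being folded into whichever field happens to have the larger docFreq.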
[jira] [Created] (LUCENE-7137) consolidate many tests across Points and GeoPoint queries/fields
Robert Muir created LUCENE-7137: --- Summary: consolidate many tests across Points and GeoPoint queries/fields Key: LUCENE-7137 URL: https://issues.apache.org/jira/browse/LUCENE-7137 Project: Lucene - Core Issue Type: Test Reporter: Robert Muir We have found repeated basic problems with stuff like equals/hashcode recently; I think we should consolidate tests and clean up here. These different implementations also have a little assortment of simplistic unit tests; if a test is not doing anything impl-specific, we should fold it in too. These are easy to debug and great to see fail if something is wrong. I will work up a patch.
[jira] [Commented] (LUCENE-7094) spatial-extras BBoxStrategy and (confusingly!) PointVectorStrategy use legacy numeric encoding
[ https://issues.apache.org/jira/browse/LUCENE-7094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15210194#comment-15210194 ] David Smiley commented on LUCENE-7094: -- BTW even if UninvertingReader doesn't work when the field names are the same, I think that's fine. Whoever may be using UninvertingReader should have enabled docValues. Perhaps it's a requirement, not just a suggestion, based on your observations; I'd like to see for myself, but either way I think you should remove the addition of a field suffix from your patch. That will limit the changes in the patch a bit too. I slept on this a bit, and I think we'll need a better way for BBoxStrategy users to articulate what index options they want; it's now insufficient/ambiguous to provide a FieldType: * terms index (legacy) * pointValues index? * docValues? * stored? * type: double or float? (needed for pointValues & docValues) By default we can have pointValues, docValues, type double, not stored. Perhaps a little inner builder class would work well. For legacy purposes, we could support the FieldType, but it's either-or with the builder. It may take a bit of time to make double vs. float configurable, so that could be a follow-on issue. As it was, it was a TODO. > spatial-extras BBoxStrategy and (confusingly!) PointVectorStrategy use legacy numeric encoding > -- > > Key: LUCENE-7094 > URL: https://issues.apache.org/jira/browse/LUCENE-7094 > Project: Lucene - Core > Issue Type: Bug > Reporter: Robert Muir > Assignee: Nicholas Knize > Attachments: LUCENE-7094.patch > > > We need to deprecate these since they work on the old encoding and provide points based alternatives.
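The suggested inner builder class for articulating BBoxStrategy index options could look roughly like this. A hypothetical, stdlib-only sketch of the shape of the API, with the defaults named in the comment (pointValues + docValues, type double, not stored); none of these names are the eventual Lucene API:

```java
// Hypothetical sketch of an index-options builder for BBoxStrategy,
// replacing the now-ambiguous FieldType-based configuration.
public class BBoxIndexOptions {
    public enum NumberType { DOUBLE, FLOAT }

    final boolean pointValues;
    final boolean docValues;
    final boolean stored;
    final NumberType type;

    private BBoxIndexOptions(Builder b) {
        this.pointValues = b.pointValues;
        this.docValues = b.docValues;
        this.stored = b.stored;
        this.type = b.type;
    }

    public static class Builder {
        // Defaults per the comment: pointValues + docValues, double, not stored.
        private boolean pointValues = true;
        private boolean docValues = true;
        private boolean stored = false;
        private NumberType type = NumberType.DOUBLE;

        public Builder pointValues(boolean v) { this.pointValues = v; return this; }
        public Builder docValues(boolean v)   { this.docValues = v; return this; }
        public Builder stored(boolean v)      { this.stored = v; return this; }
        public Builder type(NumberType t)     { this.type = t; return this; }
        public BBoxIndexOptions build()       { return new BBoxIndexOptions(this); }
    }

    public static void main(String[] args) {
        BBoxIndexOptions opts = new BBoxIndexOptions.Builder()
            .stored(true).type(NumberType.FLOAT).build();
        System.out.println(opts.type + " stored=" + opts.stored);
    }
}
```

The either-or with the legacy FieldType would then be enforced in the strategy's constructor: accept a Builder or a FieldType, never both.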
[jira] [Comment Edited] (SOLR-8894) Support automatic kerberos ticket renewals in standalone Solr
[ https://issues.apache.org/jira/browse/SOLR-8894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15210192#comment-15210192 ] Ishan Chattopadhyaya edited comment on SOLR-8894 at 3/24/16 1:17 PM: - If we cannot find a workaround for this, I suggest that instead of writing custom TGT renewal code, we drop support for standalone users in the kerberos authentication plugin. Currently, for SolrCloud, the TGT renewals can be taken care of by the zk client (if Solr nodes are connected to kerberized zk using the same zk client principal as its service principal for the kerberos authentication plugin) [0]. An alternative way is to use the ticket cache, and use kinit from the command line to renew the ticket, perhaps using a cronjob. If the latter is not working for standalone for some reason, which is what I believe you have tried and you find that it is not working, we should rather drop support for standalone users altogether. In such a case, a user interested in using kerberos authentication with standalone solr could use a forked version of the plugin from a separate repository and add the ticket renewal support and use the plugin. What do you think, [~anshumg], [~noble.paul]? [0] - https://issues.apache.org/jira/browse/ZOOKEEPER-1181 was (Author: ichattopadhyaya): If we cannot find a workaround for this, I suggest that instead of writing custom TGT renewal code, we drop support for standalone users in the kerberos authentication plugin. Currently, for SolrCloud, the TGT renewals can be taken care of by the zk client (if Solr nodes are connected to kerberized zk using the same zk client principal as its service principal for the kerberos authentication plugin) [0]. An alternative way is to use the ticket cache, and use kinit from the command line. If the latter is not working for standalone for some reason, which is what I believe you have tried and you find that it is not working, we should rather drop support for standalone users altogether. 
In such a case, a user interested in using kerberos authentication with standalone Solr could use a forked version of the plugin from a separate repository, add the ticket renewal support, and use that plugin. What do you think, [~anshumg], [~noble.paul]? [0] - https://issues.apache.org/jira/browse/ZOOKEEPER-1181 > Support automatic kerberos ticket renewals in standalone Solr > - > > Key: SOLR-8894 > URL: https://issues.apache.org/jira/browse/SOLR-8894 > Project: Solr > Issue Type: Bug > Reporter: Varun Thacker > > Currently in standalone Solr mode, tickets are not renewed automatically. So once a ticket expires, one has to restart the Solr node for it to renew the ticket. > We should support automatic ticket renewals in standalone Solr as we do currently in cloud mode. > There is no workaround for this other than to restart Solr. > If we manually do a kinit (so that we can set a cron to do future kinits) and then start Solr, Solr doesn't start up correctly. Steps we tried for the workaround: > - Specify useKeyTab=false in the JAAS file and then manually do a kinit and then start Solr. Solr fails to start in this case and throws an error like this: > {code} > ERROR - 2016-03-14 20:07:03.505; [ ] org.apache.solr.common.SolrException; null:org.apache.solr.common.SolrException: Error initializing kerberos authentication plugin: javax.servlet.ServletException: org.apache.hadoop.security.authentication.client.AuthenticationException: javax.security.auth.login.LoginException: No key to store > {code}
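The "custom TGT renewal code" being argued against would, structurally, amount to a scheduled background task like the following stdlib-only sketch. Here `renewTicket` is a hypothetical stand-in for the actual Kerberos re-login call (e.g. a keytab-based re-login); the scheduling skeleton is the only part shown:

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Sketch of what custom TGT renewal code would amount to: a daemon task
// that re-runs the login/renewal action before the ticket lifetime ends.
// renewTicket is a hypothetical stand-in for the real Krb5 re-login call.
public class TgtRenewer {
    private final ScheduledExecutorService scheduler =
        Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "tgt-renewer");
            t.setDaemon(true); // must not keep the JVM alive
            return t;
        });
    private final Runnable renewTicket;

    public TgtRenewer(Runnable renewTicket) {
        this.renewTicket = renewTicket;
    }

    // Renew at a fixed fraction (80%) of the ticket lifetime so renewal
    // always happens before expiry.
    public void start(long ticketLifetimeSeconds) {
        long period = Math.max(1, ticketLifetimeSeconds * 80 / 100);
        scheduler.scheduleAtFixedRate(renewTicket, period, period, TimeUnit.SECONDS);
    }

    public void stop() {
        scheduler.shutdownNow();
    }

    public static void main(String[] args) {
        TgtRenewer r = new TgtRenewer(() -> System.out.println("renewing TGT"));
        r.start(36000); // e.g. a 10-hour ticket lifetime
        r.stop();
    }
}
```

The point of the comment above is that this extra machinery can be avoided entirely, either by leaning on the ZooKeeper client's renewals in SolrCloud or by an external cron-driven kinit against the ticket cache.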
[jira] [Commented] (SOLR-8894) Support automatic kerberos ticket renewals in standalone Solr
[ https://issues.apache.org/jira/browse/SOLR-8894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15210192#comment-15210192 ] Ishan Chattopadhyaya commented on SOLR-8894: If we cannot find a workaround for this, I suggest that instead of writing custom TGT renewal code, we drop support for standalone users in the kerberos authentication plugin. Currently, for SolrCloud, the TGT renewals can be taken care of by the zk client (if Solr nodes are connected to kerberized zk using the same zk client principal as its service principal for the kerberos authentication plugin) [0]. An alternative way is to use the ticket cache, and use kinit from the command line. If the latter is not working for standalone for some reason, which is what I believe you have tried and you find that it is not working, we should rather drop support for standalone users altogether. In such a case, a user interested in using kerberos authentication with standalone solr could use a forked version of the plugin from a separate repository and add the ticket renewal support and use the plugin. What do you think, [~anshumg], [~noble.paul]? [0] - https://issues.apache.org/jira/browse/ZOOKEEPER-1181 > Support automatic kerberos ticket renewals in standalone Solr > - > > Key: SOLR-8894 > URL: https://issues.apache.org/jira/browse/SOLR-8894 > Project: Solr > Issue Type: Bug >Reporter: Varun Thacker > > Currently in standalone Solr mode , tickets are not renewed automatically. So > once a ticket expires one has to restart the solr node for it to renew the > ticket. > We should support automatic ticket renewals in standalone solr as we do > currently in cloud mode. > There is no workaround for this other than to restart Solr. > If we manually do a kinit ( so that we can set a cron to do future kinit's ) > and then start Solr , Solr doesn't start up correctly. 
Steps we tried for the workaround: > - Specify useKeyTab=false in the JAAS file and then manually do a kinit and then start Solr. Solr fails to start in this case and throws an error like this: > {code} > ERROR - 2016-03-14 20:07:03.505; [ ] org.apache.solr.common.SolrException; null:org.apache.solr.common.SolrException: Error initializing kerberos authentication plugin: javax.servlet.ServletException: org.apache.hadoop.security.authentication.client.AuthenticationException: javax.security.auth.login.LoginException: No key to store > {code}
[jira] [Commented] (SOLR-8895) HdfsDirectoryTest.testEOF() failure: NPE
[ https://issues.apache.org/jira/browse/SOLR-8895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15210159#comment-15210159 ] Steve Rowe commented on SOLR-8895: -- This seed reproduces for me on master and branch_6_0 as well. > HdfsDirectoryTest.testEOF() failure: NPE > > > Key: SOLR-8895 > URL: https://issues.apache.org/jira/browse/SOLR-8895 > Project: Solr > Issue Type: Bug >Reporter: Steve Rowe > > My Jenkins found a reproducible seed on branch_6x: > {noformat} >[junit4] Suite: org.apache.solr.store.hdfs.HdfsDirectoryTest >[junit4] 2> Creating dataDir: > /var/lib/jenkins/jobs/Lucene-Solr-tests-6.x/workspace/solr/build/solr-core/test/J5/temp/solr.store.hdfs.HdfsDirectoryTest_6BF936321AE9FC53-001/init-core-data-001 >[junit4] 2> 432246 INFO > (SUITE-HdfsDirectoryTest-seed#[6BF936321AE9FC53]-worker) [] > o.a.s.SolrTestCaseJ4 Randomized ssl (false) and clientAuth (true) >[junit4] 1> Formatting using clusterid: testClusterID >[junit4] 2> 432262 WARN > (SUITE-HdfsDirectoryTest-seed#[6BF936321AE9FC53]-worker) [] > o.a.h.m.i.MetricsConfig Cannot locate configuration: tried > hadoop-metrics2-namenode.properties,hadoop-metrics2.properties >[junit4] 2> 432267 WARN > (SUITE-HdfsDirectoryTest-seed#[6BF936321AE9FC53]-worker) [] > o.a.h.h.HttpRequestLog Jetty request log can only be enabled using Log4j >[junit4] 2> 432269 INFO > (SUITE-HdfsDirectoryTest-seed#[6BF936321AE9FC53]-worker) [] o.m.log > jetty-6.1.26 >[junit4] 2> 432276 INFO > (SUITE-HdfsDirectoryTest-seed#[6BF936321AE9FC53]-worker) [] o.m.log > Extract > jar:file:/var/lib/jenkins/.ivy2/cache/org.apache.hadoop/hadoop-hdfs/tests/hadoop-hdfs-2.6.0-tests.jar!/webapps/hdfs > to ./temp/Jetty_localhost_36931_hdfs.vsqnuq/webapp >[junit4] 2> 432337 INFO > (SUITE-HdfsDirectoryTest-seed#[6BF936321AE9FC53]-worker) [] o.m.log NO > JSP Support for /, did not find org.apache.jasper.servlet.JspServlet >[junit4] 2> 432703 INFO > (SUITE-HdfsDirectoryTest-seed#[6BF936321AE9FC53]-worker) [] o.m.log > Started 
HttpServer2$SelectChannelConnectorWithSafeStartup@localhost:36931 >[junit4] 2> 432820 WARN > (SUITE-HdfsDirectoryTest-seed#[6BF936321AE9FC53]-worker) [] > o.a.h.h.HttpRequestLog Jetty request log can only be enabled using Log4j >[junit4] 2> 432821 INFO > (SUITE-HdfsDirectoryTest-seed#[6BF936321AE9FC53]-worker) [] o.m.log > jetty-6.1.26 >[junit4] 2> 432829 INFO > (SUITE-HdfsDirectoryTest-seed#[6BF936321AE9FC53]-worker) [] o.m.log > Extract > jar:file:/var/lib/jenkins/.ivy2/cache/org.apache.hadoop/hadoop-hdfs/tests/hadoop-hdfs-2.6.0-tests.jar!/webapps/datanode > to ./temp/Jetty_localhost_40567_datanode.hd2j4v/webapp >[junit4] 2> 432887 INFO > (SUITE-HdfsDirectoryTest-seed#[6BF936321AE9FC53]-worker) [] o.m.log NO > JSP Support for /, did not find org.apache.jasper.servlet.JspServlet >[junit4] 2> 433283 INFO > (SUITE-HdfsDirectoryTest-seed#[6BF936321AE9FC53]-worker) [] o.m.log > Started HttpServer2$SelectChannelConnectorWithSafeStartup@localhost:40567 >[junit4] 2> 433304 WARN > (SUITE-HdfsDirectoryTest-seed#[6BF936321AE9FC53]-worker) [] > o.a.h.h.HttpRequestLog Jetty request log can only be enabled using Log4j >[junit4] 2> 433305 INFO > (SUITE-HdfsDirectoryTest-seed#[6BF936321AE9FC53]-worker) [] o.m.log > jetty-6.1.26 >[junit4] 2> 433315 INFO > (SUITE-HdfsDirectoryTest-seed#[6BF936321AE9FC53]-worker) [] o.m.log > Extract > jar:file:/var/lib/jenkins/.ivy2/cache/org.apache.hadoop/hadoop-hdfs/tests/hadoop-hdfs-2.6.0-tests.jar!/webapps/datanode > to ./temp/Jetty_localhost_54236_datanode.2l2cxv/webapp >[junit4] 2> 41 INFO (IPC Server handler 3 on 35443) [] > BlockStateChange BLOCK* processReport: from storage > DS-3362e969-6b1f-4f8b-90c2-519bfe11a4e3 node DatanodeRegistration(127.0.0.1, > datanodeUuid=a1c6edfb-4bb8-4e12-a3d4-dc5308fd9199, infoPort=40567, > ipcPort=34011, storageInfo=lv=-56;cid=testClusterID;nsid=1766496377;c=0), > blocks: 0, hasStaleStorages: true, processing time: 1 msecs >[junit4] 2> 42 INFO (IPC Server handler 3 on 35443) [] > BlockStateChange BLOCK* 
processReport: from storage > DS-35c6c048-304c-4d94-a2b2-47d07d42be08 node DatanodeRegistration(127.0.0.1, > datanodeUuid=a1c6edfb-4bb8-4e12-a3d4-dc5308fd9199, infoPort=40567, > ipcPort=34011, storageInfo=lv=-56;cid=testClusterID;nsid=1766496377;c=0), > blocks: 0, hasStaleStorages: false, processing time: 0 msecs >[junit4] 2> 433404 INFO > (SUITE-HdfsDirectoryTest-seed#[6BF936321AE9FC53]-worker) [] o.m.log NO > JSP Support for /, did not find org.apache.jasper.servlet.JspServlet >[junit4] 2> 433822 INFO > (SUITE-HdfsDirectoryTest-seed#[6BF936321AE9FC53]-worker)
[jira] [Created] (SOLR-8895) HdfsDirectoryTest.testEOF() failure: NPE
Steve Rowe created SOLR-8895: Summary: HdfsDirectoryTest.testEOF() failure: NPE Key: SOLR-8895 URL: https://issues.apache.org/jira/browse/SOLR-8895 Project: Solr Issue Type: Bug Reporter: Steve Rowe My Jenkins found a reproducible seed on branch_6x: {noformat} [junit4] Suite: org.apache.solr.store.hdfs.HdfsDirectoryTest [junit4] 2> Creating dataDir: /var/lib/jenkins/jobs/Lucene-Solr-tests-6.x/workspace/solr/build/solr-core/test/J5/temp/solr.store.hdfs.HdfsDirectoryTest_6BF936321AE9FC53-001/init-core-data-001 [junit4] 2> 432246 INFO (SUITE-HdfsDirectoryTest-seed#[6BF936321AE9FC53]-worker) [] o.a.s.SolrTestCaseJ4 Randomized ssl (false) and clientAuth (true) [junit4] 1> Formatting using clusterid: testClusterID [junit4] 2> 432262 WARN (SUITE-HdfsDirectoryTest-seed#[6BF936321AE9FC53]-worker) [] o.a.h.m.i.MetricsConfig Cannot locate configuration: tried hadoop-metrics2-namenode.properties,hadoop-metrics2.properties [junit4] 2> 432267 WARN (SUITE-HdfsDirectoryTest-seed#[6BF936321AE9FC53]-worker) [] o.a.h.h.HttpRequestLog Jetty request log can only be enabled using Log4j [junit4] 2> 432269 INFO (SUITE-HdfsDirectoryTest-seed#[6BF936321AE9FC53]-worker) [] o.m.log jetty-6.1.26 [junit4] 2> 432276 INFO (SUITE-HdfsDirectoryTest-seed#[6BF936321AE9FC53]-worker) [] o.m.log Extract jar:file:/var/lib/jenkins/.ivy2/cache/org.apache.hadoop/hadoop-hdfs/tests/hadoop-hdfs-2.6.0-tests.jar!/webapps/hdfs to ./temp/Jetty_localhost_36931_hdfs.vsqnuq/webapp [junit4] 2> 432337 INFO (SUITE-HdfsDirectoryTest-seed#[6BF936321AE9FC53]-worker) [] o.m.log NO JSP Support for /, did not find org.apache.jasper.servlet.JspServlet [junit4] 2> 432703 INFO (SUITE-HdfsDirectoryTest-seed#[6BF936321AE9FC53]-worker) [] o.m.log Started HttpServer2$SelectChannelConnectorWithSafeStartup@localhost:36931 [junit4] 2> 432820 WARN (SUITE-HdfsDirectoryTest-seed#[6BF936321AE9FC53]-worker) [] o.a.h.h.HttpRequestLog Jetty request log can only be enabled using Log4j [junit4] 2> 432821 INFO 
(SUITE-HdfsDirectoryTest-seed#[6BF936321AE9FC53]-worker) [] o.m.log jetty-6.1.26 [junit4] 2> 432829 INFO (SUITE-HdfsDirectoryTest-seed#[6BF936321AE9FC53]-worker) [] o.m.log Extract jar:file:/var/lib/jenkins/.ivy2/cache/org.apache.hadoop/hadoop-hdfs/tests/hadoop-hdfs-2.6.0-tests.jar!/webapps/datanode to ./temp/Jetty_localhost_40567_datanode.hd2j4v/webapp [junit4] 2> 432887 INFO (SUITE-HdfsDirectoryTest-seed#[6BF936321AE9FC53]-worker) [] o.m.log NO JSP Support for /, did not find org.apache.jasper.servlet.JspServlet [junit4] 2> 433283 INFO (SUITE-HdfsDirectoryTest-seed#[6BF936321AE9FC53]-worker) [] o.m.log Started HttpServer2$SelectChannelConnectorWithSafeStartup@localhost:40567 [junit4] 2> 433304 WARN (SUITE-HdfsDirectoryTest-seed#[6BF936321AE9FC53]-worker) [] o.a.h.h.HttpRequestLog Jetty request log can only be enabled using Log4j [junit4] 2> 433305 INFO (SUITE-HdfsDirectoryTest-seed#[6BF936321AE9FC53]-worker) [] o.m.log jetty-6.1.26 [junit4] 2> 433315 INFO (SUITE-HdfsDirectoryTest-seed#[6BF936321AE9FC53]-worker) [] o.m.log Extract jar:file:/var/lib/jenkins/.ivy2/cache/org.apache.hadoop/hadoop-hdfs/tests/hadoop-hdfs-2.6.0-tests.jar!/webapps/datanode to ./temp/Jetty_localhost_54236_datanode.2l2cxv/webapp [junit4] 2> 41 INFO (IPC Server handler 3 on 35443) [] BlockStateChange BLOCK* processReport: from storage DS-3362e969-6b1f-4f8b-90c2-519bfe11a4e3 node DatanodeRegistration(127.0.0.1, datanodeUuid=a1c6edfb-4bb8-4e12-a3d4-dc5308fd9199, infoPort=40567, ipcPort=34011, storageInfo=lv=-56;cid=testClusterID;nsid=1766496377;c=0), blocks: 0, hasStaleStorages: true, processing time: 1 msecs [junit4] 2> 42 INFO (IPC Server handler 3 on 35443) [] BlockStateChange BLOCK* processReport: from storage DS-35c6c048-304c-4d94-a2b2-47d07d42be08 node DatanodeRegistration(127.0.0.1, datanodeUuid=a1c6edfb-4bb8-4e12-a3d4-dc5308fd9199, infoPort=40567, ipcPort=34011, storageInfo=lv=-56;cid=testClusterID;nsid=1766496377;c=0), blocks: 0, hasStaleStorages: false, processing time: 0 msecs 
[junit4] 2> 433404 INFO (SUITE-HdfsDirectoryTest-seed#[6BF936321AE9FC53]-worker) [] o.m.log NO JSP Support for /, did not find org.apache.jasper.servlet.JspServlet [junit4] 2> 433822 INFO (SUITE-HdfsDirectoryTest-seed#[6BF936321AE9FC53]-worker) [] o.m.log Started HttpServer2$SelectChannelConnectorWithSafeStartup@localhost:54236 [junit4] 2> 433851 INFO (IPC Server handler 4 on 35443) [] BlockStateChange BLOCK* processReport: from storage DS-7e62f7ee-4893-43bb-a5af-ce3fd18691b7 node DatanodeRegistration(127.0.0.1, datanodeUuid=c48cef8e-d1c1-4fa3-90c9-c2e0461c78c1, infoPort=54236, ipcPort=56889,
[jira] [Updated] (LUCENE-7136) remove Threads from BaseGeoPointTestCase
[ https://issues.apache.org/jira/browse/LUCENE-7136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-7136: Attachment: LUCENE-7136.patch > remove Threads from BaseGeoPointTestCase > > > Key: LUCENE-7136 > URL: https://issues.apache.org/jira/browse/LUCENE-7136 > Project: Lucene - Core > Issue Type: Test > Reporter: Robert Muir > Attachments: LUCENE-7136.patch > > > I don't think we should mix testing threads with all the other stuff going on here. It makes things too hard to debug. > If we want to test thread safety of e.g. BKD or queries somewhere, that should be an explicit narrow test just for that (no complicated geometry going on).
[jira] [Created] (LUCENE-7136) remove Threads from BaseGeoPointTestCase
Robert Muir created LUCENE-7136: --- Summary: remove Threads from BaseGeoPointTestCase Key: LUCENE-7136 URL: https://issues.apache.org/jira/browse/LUCENE-7136 Project: Lucene - Core Issue Type: Test Reporter: Robert Muir Attachments: LUCENE-7136.patch I don't think we should mix testing threads with all the other stuff going on here. It makes things too hard to debug. If we want to test thread safety of e.g. BKD or queries somewhere, that should be an explicit narrow test just for that (no complicated geometry going on).
[jira] [Created] (SOLR-8894) Support automatic kerberos ticket renewals in standalone Solr
Varun Thacker created SOLR-8894: --- Summary: Support automatic kerberos ticket renewals in standalone Solr Key: SOLR-8894 URL: https://issues.apache.org/jira/browse/SOLR-8894 Project: Solr Issue Type: Bug Reporter: Varun Thacker Currently in standalone Solr mode, tickets are not renewed automatically. So once a ticket expires, one has to restart the Solr node for it to renew the ticket. We should support automatic ticket renewals in standalone Solr as we do currently in cloud mode. There is no workaround for this other than to restart Solr. If we manually do a kinit (so that we can set a cron to do future kinits) and then start Solr, Solr doesn't start up correctly. Steps we tried for the workaround: - Specify useKeyTab=false in the JAAS file and then manually do a kinit and then start Solr. Solr fails to start in this case and throws an error like this: {code} ERROR - 2016-03-14 20:07:03.505; [ ] org.apache.solr.common.SolrException; null:org.apache.solr.common.SolrException: Error initializing kerberos authentication plugin: javax.servlet.ServletException: org.apache.hadoop.security.authentication.client.AuthenticationException: javax.security.auth.login.LoginException: No key to store {code}
[jira] [Commented] (LUCENE-6954) More Like This Query Generation
[ https://issues.apache.org/jira/browse/LUCENE-6954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15210109#comment-15210109 ] Alessandro Benedetti commented on LUCENE-6954: -- Just reviewed the patch again; it seems fine to me, but because it's touching MLT internals I would like a second opinion and any suggestions :) I didn't spend much time trying to re-invent that part, simply followed the original implementation; any feedback is welcome! Cheers > More Like This Query Generation > > > Key: LUCENE-6954 > URL: https://issues.apache.org/jira/browse/LUCENE-6954 > Project: Lucene - Core > Issue Type: Improvement > Components: modules/other > Affects Versions: 5.4 > Reporter: Alessandro Benedetti > Labels: morelikethis > Attachments: LUCENE-6954.patch > > > Currently the query is generated in: > org.apache.lucene.queries.mlt.MoreLikeThis#retrieveTerms(int) > 1) we extract the terms from the interesting fields, adding them to a map: > Map<String, Int> termFreqMap = new HashMap<>(); > (we lose the field -> term relation; we no longer know which field the term came from!) > org.apache.lucene.queries.mlt.MoreLikeThis#createQueue > 2) we build the queue that will contain the query terms; at this point we reconnect the terms to some field, but: > ... > // go through all the fields and find the largest document frequency > String topField = fieldNames[0]; > int docFreq = 0; > for (String fieldName : fieldNames) { > int freq = ir.docFreq(new Term(fieldName, word)); > topField = (freq > docFreq) ? fieldName : topField; > docFreq = (freq > docFreq) ? freq : docFreq; > } > ... > We identify the topField as the field with the highest document frequency for the term t. > Then we build the termQuery: > queue.add(new ScoreTerm(word, topField, score, idf, docFreq, tf)); > In this way we lose a lot of precision. > Not sure why we do that. > I would prefer to keep the relation between terms and fields. > The MLT query quality could improve a lot.
> If I run the MLT on 2 fields, weSell and weDontSell, for example: > it is likely I want to find documents with similar terms in weSell and similar terms in weDontSell, without mixing things up and losing the semantics of the terms.
[jira] [Updated] (LUCENE-6966) Contribution: Codec for index-level encryption
[ https://issues.apache.org/jira/browse/LUCENE-6966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Renaud Delbru updated LUCENE-6966: -- Attachment: LUCENE-6966-1.patch This patch contains the current state of the codec for index-level encryption. It is up to date with the latest version of the lucene-solr master branch. This patch does not yet include the ability for users to choose which cipher to use; I'll submit a new patch that tackles this in the coming week. The full Lucene test suite has been executed against this codec using the command: {code} ant -Dtests.codec=EncryptedLucene60 test {code} Only one test fails, TestSizeBoundedForceMerge#testByteSizeLimit, which is expected: that test is incompatible with the codec. The doc values format (prototype based on an encrypted index output) is not included in this patch and will be submitted as a separate patch in the coming days. > Contribution: Codec for index-level encryption > -- > > Key: LUCENE-6966 > URL: https://issues.apache.org/jira/browse/LUCENE-6966 > Project: Lucene - Core > Issue Type: New Feature > Components: modules/other > Reporter: Renaud Delbru > Labels: codec, contrib > Attachments: LUCENE-6966-1.patch > > > We would like to contribute a codec that enables the encryption of sensitive data in the index, which has been developed as part of an engagement with a customer. We think that this could be of interest to the community. > Below is a description of the project. > h1. Introduction > In comparison with approaches where all data is encrypted (e.g., file system encryption, index output / directory encryption), encryption at the codec level enables more fine-grained control over which blocks of data are encrypted. This is more efficient since less data has to be encrypted. This also gives more flexibility, such as the ability to select which fields to encrypt.
> Some of the requirements for this project were: > * The performance impact of the encryption should be reasonable. > * The user can choose which field to encrypt. > * Key management: During the life cycle of the index, the user can provide a new version of his encryption key. Multiple key versions should co-exist in one index. > h1. What is supported? > - Block tree terms index and dictionary > - Compressed stored fields format > - Compressed term vectors format > - Doc values format (prototype based on an encrypted index output) - this will be submitted as a separate patch > - Index upgrader: command to upgrade all the index segments with the latest key version available. > h1. How is it implemented? > h2. Key Management > One index segment is encrypted with a single key version. An index can have multiple segments, each one encrypted using a different key version. The key version for a segment is stored in the segment info. > The provided codec is abstract, and a subclass is responsible for providing an implementation of the cipher factory. The cipher factory is responsible for creating a cipher instance based on a given key version. > h2. Encryption Model > The encryption model is based on AES/CBC with padding. The initialisation vector (IV) is reused for performance reasons, but only on a per-format and per-segment basis. > While IV reuse is usually considered a bad practice, the CBC mode is somewhat resilient to IV reuse. The only "leak" of information that this could lead to is being able to know that two encrypted blocks of data start with the same prefix. However, it is unlikely that two data blocks in an index segment will start with the same data: > - Stored Fields Format: Each encrypted data block is a compressed block (~4kb) of one or more documents. It is unlikely that two compressed blocks start with the same data prefix.
> - Term Vectors: Each encrypted data block is a compressed block (~4kb) of terms and payloads from one or more documents. It is unlikely that two compressed blocks start with the same data prefix. > - Term Dictionary Index: The term dictionary index is encoded and encrypted in one single data block. > - Term Dictionary Data: Each data block of the term dictionary encodes a set of suffixes. It is unlikely to have two dictionary data blocks sharing the same prefix within the same segment. > - DocValues: A DocValues file will be composed of multiple encrypted data blocks. It is unlikely to have two data blocks sharing the same prefix within the same segment (each one will encode a list of values associated with a field). > To the best of our knowledge, this model should be safe. However, it would be good if someone with security expertise in the community could review and validate it. > h1. Performance > We report here a
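The "leak" under IV reuse described above can be demonstrated with the JDK's own crypto API: with the same key and a reused IV, two plaintexts sharing a 16-byte prefix produce ciphertexts whose first AES/CBC block is identical, while blocks after the divergence point differ. This is a standalone illustration of the CBC property, not the codec's actual code (the fixed zero key is for demonstration only):

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

// Demonstrates the prefix leak of AES/CBC under IV reuse: identical
// plaintext prefixes yield identical ciphertext blocks until divergence.
public class CbcIvReuseDemo {
    public static byte[] encrypt(byte[] key, byte[] iv, byte[] plain) throws Exception {
        Cipher c = Cipher.getInstance("AES/CBC/PKCS5Padding");
        c.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "AES"),
               new IvParameterSpec(iv));
        return c.doFinal(plain);
    }

    public static void main(String[] args) throws Exception {
        byte[] key = new byte[16]; // fixed zero key, for demonstration only
        byte[] iv  = new byte[16]; // the *reused* IV
        byte[] a = "same-first-block!then differs A".getBytes(StandardCharsets.UTF_8);
        byte[] b = "same-first-block!and diverges B".getBytes(StandardCharsets.UTF_8);

        byte[] ca = encrypt(key, iv, a);
        byte[] cb = encrypt(key, iv, b);

        // First 16-byte block matches: same IV XOR same plaintext block.
        System.out.println(Arrays.equals(Arrays.copyOf(ca, 16), Arrays.copyOf(cb, 16)));
        // Later blocks differ once the plaintexts diverge.
        System.out.println(Arrays.equals(ca, cb));
    }
}
```

This is exactly why the argument above rests on compressed index blocks being unlikely to share a 16-byte prefix: equal prefixes are the only thing a reused IV reveals in CBC.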
[jira] [Updated] (SOLR-8893) Wrong TermVector docfreq calculation with enabled ExactStatsCache
[ https://issues.apache.org/jira/browse/SOLR-8893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andreas Daffner updated SOLR-8893: -- Description: Hi, we are currently facing the issue that some calculated values of the TV component are obviously wrong when ExactStatsCache is enabled. --> shard-wide TV docfreq calculation This problem is subsequent to SOLR-8459 (NPE using TermVectorComponent in combination with ExactStatsCache). Maybe the problem is very trivial and we configured something wrong ... So let's go deeper into that problem: 1) The problem in summary: == We are requesting with "tv.df", "tv.tf" and "tv.tf_idf" enabled --> {code} tv.df=true&tv.tf_idf=true&tv.tf=true {code} additionally, for debugging purposes, we are requesting by calling {code} termfreq("site_term_maincontent","abakus"),docfreq("site_maincontent_term_wdf","abakus"),ttf("site_maincontent_term_wdf","abakus") {code} Our findings are: - the tv.tf as well as the termfreq seem to be correct - the tv.df as well as the docfreq is obviously wrong - the tv.tf_idf as well as ttf is wrong as well, I guess as a consequence of the wrong tv.df (docfreq) 2) What we have: === schema.xml: {code} ... ... ... {code} solrconfig.xml: {code} ... ... true tvComponent ... {code} You can find out any details here: http://149.202.5.192:8820/solr/#/SingleDomainSite_34_shard1_replica1 3) Examples If you are calling this link you can see that there are 6 existing documents containing the word "abakus" in the field "site_maincontent_term_wdf" ...
http://149.202.5.192:8820/solr/SingleDomainSite_34_shard1_replica1/tvrh?q=site_maincontent_term_wdf%3Aabakus+AND+site_headercode%3A200=%2Ftvrh=site_maincontent_term_wdf=true_idf=true=true=site_url_id,site_url,termfreq%28%22site_term_maincontent%22,%22abakus%22%29,docfreq%28%22site_maincontent_term_wdf%22,%22abakus%22%29,ttf%28%22site_maincontent_term_wdf%22,%22abakus%22%29 But if you are looking at the field "docfreq" in the output documents, it is incorrect and always different (it should always be the same ...). "docfreq(field,term) returns the number of documents that contain the term in the field. This is a constant (the same value for all documents in the index)." Here is a link with shards.info enabled: http://149.202.5.192:8820/solr/SingleDomainSite_34_shard1_replica1/tvrh?=xml=site_maincontent_term_wdf%3Aabakus=0=10=ttf%28site_maincontent_term_wdf%2C%27abakus%27%29%2Cdocfreq%28site_maincontent_term_wdf%2C%27abakus%27%29%2Cidf%28site_maincontent_term_wdf%2C%27abakus%27%29%2Csite_url=/tvrh=true Here is a link with debug enabled: http://149.202.5.192:8820/solr/SingleDomainSite_34_shard1_replica1/tvrh?omitHeader=true=%2Ftvrh=xml=flat=site_maincontent_term_wdf%3Aabakus=0=1000=ttf%28site_maincontent_term_wdf%2C%27abakus%27%29%2Cdocfreq%28site_maincontent_term_wdf%2C%27abakus%27%29%2Cidf%28site_maincontent_term_wdf%2C%27abakus%27%29%2Csite_url=true was: Hi, we are currently facing the issue that some calculated values of the TV component are obviously wrong with enabled ExactStatsCache. --> shard-wide TV docfreq calculation Maybe the problem is very trivial and we configured something wrong ...
So let's go deeper into that problem:

1) The problem in summary:
==========================
We are requesting with "tv.df", "tv.tf" and "tv.tf_idf" enabled -->
{code}
tv.df=true&tv.tf_idf=true&tv.tf=true
{code}
Additionally, for debugging purposes, we are requesting
{code}
termfreq("site_term_maincontent","abakus"),docfreq("site_maincontent_term_wdf","abakus"),ttf("site_maincontent_term_wdf","abakus")
{code}
Our findings are:
- the tv.tf values as well as the termfreq values seem to be correct
- the tv.df values as well as the docfreq values are obviously wrong
- the tv.tf_idf values as well as the ttf values are wrong too, I guess as a subsequent fault of the wrong tv.df (docfreq)

2) What we have:
================
schema.xml:
{code}
...
...
...
{code}
solrconfig.xml:
{code}
...
...
true
tvComponent
...
{code}
You can find all the details here: http://149.202.5.192:8820/solr/#/SingleDomainSite_34_shard1_replica1

3) Examples
===========
If you call this link you can see that there are 6 documents containing the word "abakus" in the field "site_maincontent_term_wdf" ...
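The distinction the reporter relies on can be illustrated with a small sketch (a toy in-memory "index" with hypothetical documents, not Solr code): termfreq counts occurrences of a term within one document and so varies per document, while docfreq and ttf are computed over the whole index and must come out the same for every document — which is why per-document docfreq values that differ across a response point to a stats-cache bug rather than normal behavior.

```python
from collections import Counter

# Toy "index": one bag of words per document (made-up data, not the
# reporter's collection).
docs = [
    "abakus seo tool",
    "abakus forum",
    "seo news",
    "abakus abakus ranking",
]
index = [Counter(d.split()) for d in docs]

def termfreq(doc_id, term):
    # occurrences of term in one document (varies per document)
    return index[doc_id][term]

def docfreq(term):
    # number of documents containing the term (one value for the whole index)
    return sum(1 for tf in index if tf[term] > 0)

def ttf(term):
    # total term frequency: occurrences summed across all documents
    # (also an index-wide constant)
    return sum(tf[term] for tf in index)

print(termfreq(3, "abakus"))  # 2 (doc 3 contains "abakus" twice)
print(docfreq("abakus"))      # 3 (docs 0, 1 and 3 contain it)
print(ttf("abakus"))          # 4 (1 + 1 + 0 + 2)
```

With ExactStatsCache in a sharded setup, docfreq and ttf are supposed to be merged across shards into exactly this kind of global constant before scoring.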
[jira] [Created] (SOLR-8893) Wrong TermVector docfreq calculation with enabled ExactStatsCache
Andreas Daffner created SOLR-8893:
----------------------------------

Summary: Wrong TermVector docfreq calculation with enabled ExactStatsCache
Key: SOLR-8893
URL: https://issues.apache.org/jira/browse/SOLR-8893
Project: Solr
Issue Type: Bug
Affects Versions: 5.5
Reporter: Andreas Daffner

Hi,

we are currently facing the issue that some calculated values of the TV component are obviously wrong with ExactStatsCache enabled. --> shard-wide TV docfreq calculation

Maybe the problem is very trivial and we configured something wrong ... So let's go deeper into that problem:

1) The problem in summary:
==========================
We are requesting with "tv.df", "tv.tf" and "tv.tf_idf" enabled -->
{code}
tv.df=true&tv.tf_idf=true&tv.tf=true
{code}
Additionally, for debugging purposes, we are requesting
{code}
termfreq("site_term_maincontent","abakus"),docfreq("site_maincontent_term_wdf","abakus"),ttf("site_maincontent_term_wdf","abakus")
{code}
Our findings are:
- the tv.tf values as well as the termfreq values seem to be correct
- the tv.df values as well as the docfreq values are obviously wrong
- the tv.tf_idf values as well as the ttf values are wrong too, I guess as a subsequent fault of the wrong tv.df (docfreq)

2) What we have:
================
schema.xml:
{code}
...
...
...
{code}
solrconfig.xml:
{code}
...
...
true
tvComponent
...
{code}
You can find all the details here: http://149.202.5.192:8820/solr/#/SingleDomainSite_34_shard1_replica1

3) Examples
===========
If you call this link you can see that there are 6 documents containing the word "abakus" in the field "site_maincontent_term_wdf" ...
http://149.202.5.192:8820/solr/SingleDomainSite_34_shard1_replica1/tvrh?q=site_maincontent_term_wdf%3Aabakus+AND+site_headercode%3A200=%2Ftvrh=site_maincontent_term_wdf=true_idf=true=true=site_url_id,site_url,termfreq%28%22site_term_maincontent%22,%22abakus%22%29,docfreq%28%22site_maincontent_term_wdf%22,%22abakus%22%29,ttf%28%22site_maincontent_term_wdf%22,%22abakus%22%29

But if you look at the "docfreq" field in the output documents, it is incorrect and always different (it should always be the same ...):
"docfreq(field,term) returns the number of documents that contain the term in the field. This is a constant (the same value for all documents in the index)."

Here is a link with shards.info enabled:
http://149.202.5.192:8820/solr/SingleDomainSite_34_shard1_replica1/tvrh?=xml=site_maincontent_term_wdf%3Aabakus=0=10=ttf%28site_maincontent_term_wdf%2C%27abakus%27%29%2Cdocfreq%28site_maincontent_term_wdf%2C%27abakus%27%29%2Cidf%28site_maincontent_term_wdf%2C%27abakus%27%29%2Csite_url=/tvrh=true

Here is a link with debug enabled:
http://149.202.5.192:8820/solr/SingleDomainSite_34_shard1_replica1/tvrh?omitHeader=true=%2Ftvrh=xml=flat=site_maincontent_term_wdf%3Aabakus=0=1000=ttf%28site_maincontent_term_wdf%2C%27abakus%27%29%2Cdocfreq%28site_maincontent_term_wdf%2C%27abakus%27%29%2Cidf%28site_maincontent_term_wdf%2C%27abakus%27%29%2Csite_url=true

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-7132) ScoreDoc.score() returns a different value than that of Explanation's
[ https://issues.apache.org/jira/browse/LUCENE-7132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15210008#comment-15210008 ]

Ahmet Arslan commented on LUCENE-7132:
--------------------------------------
It is really hard to decipher what is going on inside the good old TFIDFSimilarity.
{code:title=TFIDFSimilarity.IDFStats.normalize|borderStyle=solid}
@Override
public void normalize(float queryNorm, float boost) {
  this.boost = boost;
  this.queryNorm = queryNorm;
  queryWeight = queryNorm * boost * idf.getValue();
  value = queryWeight * idf.getValue(); // idf for document
}
{code}
* Why does the query weight have an IDF multiplicand?
* Why is TFIDFSimilarity.IDFStats#value set to IDF squared?
* Why is TFIDFSimilarity.IDFStats#value needed even though we have TFIDFSimilarity.IDFStats.idf.getValue()?
* TFIDFSimilarity.TFIDFSimScorer#score returns tf(freq) * IDFStats.value, which looks like tf x IDF x IDF to me.

> ScoreDoc.score() returns a different value than that of Explanation's
> ---------------------------------------------------------------------
>
> Key: LUCENE-7132
> URL: https://issues.apache.org/jira/browse/LUCENE-7132
> Project: Lucene - Core
> Issue Type: Bug
> Components: core/search
> Affects Versions: 5.5
> Reporter: Ahmet Arslan
> Assignee: Steve Rowe
> Attachments: LUCENE-7132.patch, SOLR-8884.patch, SOLR-8884.patch, debug.xml
>
> Some of the folks [reported|http://find.searchhub.org/document/80666f5c3b86ddda] that sometimes explain's score can be different than the score requested by the fields parameter. Interestingly, explain's scores would create a different ranking than the original result list. This is something users experience, but it cannot be reproduced deterministically.
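Ahmet's reading can be checked numerically with a small sketch of the classic Lucene TF-IDF pipeline (plain Python mirroring the fields in the quoted method, not the actual Lucene code; the idf and tf formulas below are ClassicSimilarity's, and queryNorm/boost are fixed at 1.0 for clarity): the query-side weight carries one idf factor, the document-side `value` multiplies in a second, so the final product is tf x idf^2.

```python
import math

def idf(doc_freq, num_docs):
    # ClassicSimilarity / TFIDFSimilarity idf: 1 + ln(numDocs / (docFreq + 1))
    return 1.0 + math.log(num_docs / (doc_freq + 1.0))

def tf(freq):
    # ClassicSimilarity tf: sqrt(freq)
    return math.sqrt(freq)

# mirror IDFStats.normalize(queryNorm, boost)
query_norm, boost = 1.0, 1.0
idf_value = idf(doc_freq=5, num_docs=1000)

query_weight = query_norm * boost * idf_value   # first idf factor
value = query_weight * idf_value                # second idf factor

# mirror TFIDFSimScorer.score: tf(freq) * IDFStats.value
score = tf(4.0) * value

# tf(4.0) == 2.0, so the score is exactly tf x idf x idf
assert abs(score - 2.0 * idf_value ** 2) < 1e-9
```

(In the full formula, queryNorm is itself derived from the sum of squared query weights and so partially compensates; with queryNorm = 1.0 as here, the idf factor simply appears twice, matching the "tf x IDF x IDF" observation above.)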
[JENKINS] Lucene-Solr-6.x-Linux (64bit/jdk1.8.0_72) - Build # 226 - Still Failing!
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-6.x-Linux/226/ Java: 64bit/jdk1.8.0_72 -XX:-UseCompressedOops -XX:+UseConcMarkSweepGC 1 tests failed. FAILED: org.apache.solr.core.TestDynamicLoading.testDynamicLoading Error Message: Could not get expected value 'X val' for path 'x' full output: { "responseHeader":{ "status":0, "QTime":0}, "params":{"wt":"json"}, "context":{ "webapp":"/qnl/g", "path":"/test1", "httpMethod":"GET"}, "class":"org.apache.solr.core.BlobStoreTestRequestHandler", "x":null} Stack Trace: java.lang.AssertionError: Could not get expected value 'X val' for path 'x' full output: { "responseHeader":{ "status":0, "QTime":0}, "params":{"wt":"json"}, "context":{ "webapp":"/qnl/g", "path":"/test1", "httpMethod":"GET"}, "class":"org.apache.solr.core.BlobStoreTestRequestHandler", "x":null} at __randomizedtesting.SeedInfo.seed([27B966539A4B:C95C0AEE918E3FEB]:0) at org.junit.Assert.fail(Assert.java:93) at org.junit.Assert.assertTrue(Assert.java:43) at org.apache.solr.core.TestSolrConfigHandler.testForResponseElement(TestSolrConfigHandler.java:458) at org.apache.solr.core.TestDynamicLoading.testDynamicLoading(TestDynamicLoading.java:238) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1764) at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:871) at com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:907) at com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:921) at org.apache.solr.BaseDistributedSearchTestCase$ShardsRepeatRule$ShardsFixedStatement.callStatement(BaseDistributedSearchTestCase.java:996) at 
org.apache.solr.BaseDistributedSearchTestCase$ShardsRepeatRule$ShardsStatement.evaluate(BaseDistributedSearchTestCase.java:971) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57) at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:49) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45) at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48) at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:367) at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:809) at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:460) at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:880) at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:781) at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:816) at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:827) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at 
org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:41) at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40) at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at
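The failing assertion above compares the value at JSON path 'x' in the handler's response against an expected string, and fails because the handler returned null. A minimal sketch of that style of path-based response check (hypothetical `get_path` helper, not the actual TestSolrConfigHandler code):

```python
import json

def get_path(obj, path):
    """Walk a '/'-separated path into nested dicts; return None if absent."""
    for key in path.split("/"):
        if not isinstance(obj, dict) or key not in obj:
            return None
        obj = obj[key]
    return obj

# Response body from the failure report above (abbreviated)
response = json.loads("""
{"responseHeader": {"status": 0, "QTime": 0},
 "class": "org.apache.solr.core.BlobStoreTestRequestHandler",
 "x": null}
""")

# The test expects 'X val' at path 'x'; the handler returned null instead,
# which is exactly the mismatch the assertion reports.
expected = "X val"
actual = get_path(response, "x")
if actual != expected:
    print("Could not get expected value %r for path 'x'" % expected)
```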
Re: [JENKINS] Lucene-Solr-master-Linux (32bit/jdk1.8.0_72) - Build # 16312 - Failure!
I pushed a fix ... test bug, only affecting 6.1 and master.

Mike McCandless

http://blog.mikemccandless.com

On Thu, Mar 24, 2016 at 5:05 AM, Michael McCandless wrote:
> I'll dig, this is a new check I added to BKDWriter recently.
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
>
> On Wed, Mar 23, 2016 at 11:47 PM, Policeman Jenkins Server
> wrote:
>> Build: http://jenkins.thetaphi.de/job/Lucene-Solr-master-Linux/16312/
>> Java: 32bit/jdk1.8.0_72 -server -XX:+UseSerialGC
>>
>> 1 tests failed.
>> FAILED: org.apache.lucene.util.bkd.TestBKD.testWithExceptions
>>
>> Error Message:
>> totalPointCount=16427 was passed when we were created, but we just hit 16428 values
>>
>> Stack Trace:
>> java.lang.IllegalStateException: totalPointCount=16427 was passed when we were created, but we just hit 16428 values
>> at __randomizedtesting.SeedInfo.seed([8F21BAE79AEA8203:2D72269B1564CDF9]:0)
>> at org.apache.lucene.util.bkd.BKDWriter.add(BKDWriter.java:275)
>> at org.apache.lucene.util.bkd.TestBKD.verify(TestBKD.java:605)
>> at org.apache.lucene.util.bkd.TestBKD.testWithExceptions(TestBKD.java:409)
>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>> at java.lang.reflect.Method.invoke(Method.java:498)
>> at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1764)
>> at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:871)
>> at com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:907)
>> at com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:921)
>> at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:49)
>> at
>>
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45) >> at >> org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48) >> at >> org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64) >> at >> org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47) >> at >> com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) >> at >> com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:367) >> at >> com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:809) >> at >> com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:460) >> at >> com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:880) >> at >> com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:781) >> at >> com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:816) >> at >> com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:827) >> at >> org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45) >> at >> com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) >> at >> org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:41) >> at >> com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40) >> at >> com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40) >> at >> com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) >> at >> 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) >> at >> com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) >> at >> org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53) >> at >> org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47) >> at >> org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64) >> at >> org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:54) >> at >> com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) >> at >>
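The IllegalStateException in the quoted trace comes from a writer that is told its total point budget up front and refuses to accept more values than that. A minimal sketch of the invariant (plain Python illustrating the check, not the BKDWriter code; class and message wording mirror the report above):

```python
class PointWriter:
    """Accepts at most total_point_count values, mirroring the BKDWriter check."""

    def __init__(self, total_point_count):
        self.total_point_count = total_point_count
        self.point_count = 0

    def add(self, value):
        # Count first, then verify the declared budget has not been exceeded.
        self.point_count += 1
        if self.point_count > self.total_point_count:
            raise RuntimeError(
                "totalPointCount=%d was passed when we were created, "
                "but we just hit %d values"
                % (self.total_point_count, self.point_count))

w = PointWriter(total_point_count=2)
w.add(10)
w.add(20)
try:
    w.add(30)  # third value exceeds the declared budget
except RuntimeError as e:
    print(e)
```

The test bug Mike fixed was on the caller side: the test's declared total did not match the number of values it actually fed the writer, so this guard fired.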