[jira] [Commented] (SOLR-4006) Many tests on Apache Jenkins are failing with lingering threads.
[ https://issues.apache.org/jira/browse/SOLR-4006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13494586#comment-13494586 ]

Dawid Weiss commented on SOLR-4006:
-----------------------------------

Hi Mark. Can you send me the detailed spec of this FreeBSD VM you're using and an ant line that reproduces this hang? I'll take a look. I'm also curious what's happening, in particular with regard to those forever-hung JVMs that don't time out (I suppose it's some sort of native socket wait that's causing this).

> Many tests on Apache Jenkins are failing with lingering threads.
> ----------------------------------------------------------------
>
> Key: SOLR-4006
> URL: https://issues.apache.org/jira/browse/SOLR-4006
> Project: Solr
> Issue Type: Bug
> Reporter: Mark Miller
>
> I think I've tracked this down to being related to the black hole.
> It seems to be a recovery call to a server that is down or something - it's
> hanging in the connect method even though we are using a connect timeout.
> {noformat}
> Thread[RecoveryThread,5,TGRP-SyncSliceTest]
> java.net.PlainSocketImpl.socketConnect(Native Method)
> java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:327)
> java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:193)
> java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:180)
> java.net.SocksSocketImpl.connect(SocksSocketImpl.java:384)
> java.net.Socket.connect(Socket.java:546)
> {noformat}

--
This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
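The stack trace above shows a thread parked in the native socketConnect call. For reference, the standard way to bound a plain-socket connect in Java is the two-argument Socket.connect(SocketAddress, int) - which, per the issue description, SolrCloud was already using, making the hang surprising (later comments trace it to an NIO selector change on FreeBSD). A self-contained sketch of the bounded-connect pattern; the demo class and names are mine, not Solr code:

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;

public class ConnectTimeoutDemo {
    // Connect with an explicit timeout; without one, Socket.connect can
    // block indefinitely in the native socketConnect seen in the trace.
    static Socket connectWithTimeout(InetSocketAddress addr, int timeoutMs) throws IOException {
        Socket s = new Socket();    // unconnected socket
        s.connect(addr, timeoutMs); // bounded; throws SocketTimeoutException on expiry
        return s;
    }

    public static void main(String[] args) throws IOException {
        // Connect to a local listener so the demo is self-contained.
        try (ServerSocket server = new ServerSocket(0)) {
            InetSocketAddress addr = new InetSocketAddress("127.0.0.1", server.getLocalPort());
            try (Socket s = connectWithTimeout(addr, 2000)) {
                System.out.println("connected=" + s.isConnected());
            }
        }
    }
}
```

The single-argument constructors (new Socket(host, port)) offer no connect timeout at all, which is why the unconnected-socket-plus-connect idiom is the usual defensive form.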
[jira] [Commented] (SOLR-4053) metrics - add statistics on searcher/cache warming
[ https://issues.apache.org/jira/browse/SOLR-4053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13494574#comment-13494574 ]

Otis Gospodnetic commented on SOLR-4053:
----------------------------------------

Shawn, aren't warming times already in JMX? I think they are, because I know SPM for Solr has nice pie charts and timeseries graphs with warmup timings broken down, and that must be coming from JMX. Or perhaps you want the percentiles in JMX? If so, can't monitoring tools calculate that - shouldn't that be their job?

> metrics - add statistics on searcher/cache warming
> --------------------------------------------------
>
> Key: SOLR-4053
> URL: https://issues.apache.org/jira/browse/SOLR-4053
> Project: Solr
> Issue Type: Improvement
> Affects Versions: 5.0
> Reporter: Shawn Heisey
> Priority: Minor
> Fix For: 4.1, 5.0
>
> One stat that I rely on is the amount of time that it takes to warm caches
> and an entire searcher, but unless you turn on INFO logging and write
> something to parse the logs, you can only see how long the last commit took
> to warm. I propose that we use the new metrics capability added in SOLR-1972
> to give us visibility into historical cache/searcher warming times.
> If I find some time in the near future, I will take a stab at creating a
> patch, but if someone else has an idea and time, don't wait around for me.
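Whether percentiles belong in Solr or in the monitoring tool, the arithmetic itself is small. A sketch of nearest-rank percentiles over recorded warmup times; the class and method names are hypothetical, and a real metrics library would keep a reservoir-backed histogram rather than retaining every sample:

```java
import java.util.Arrays;

public class WarmupStats {
    // Nearest-rank percentile over a sample of warmup times (ms).
    // Illustrative only; production metrics code would use a bounded
    // reservoir/histogram instead of sorting the full sample each call.
    static long percentile(long[] samples, double pct) {
        if (samples.length == 0) throw new IllegalArgumentException("no samples");
        long[] sorted = samples.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(pct / 100.0 * sorted.length); // nearest-rank method
        return sorted[Math.max(0, rank - 1)];
    }

    public static void main(String[] args) {
        long[] warmTimesMs = {120, 95, 310, 150, 80, 400, 130, 90, 110, 100};
        System.out.println("p50=" + percentile(warmTimesMs, 50)); // p50=110
        System.out.println("p95=" + percentile(warmTimesMs, 95)); // p95=400
    }
}
```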
[JENKINS] Lucene-Solr-4.x-Windows (64bit/jdk1.7.0_09) - Build # 1511 - Failure!
Build: http://jenkins.sd-datasolutions.de/job/Lucene-Solr-4.x-Windows/1511/
Java: 64bit/jdk1.7.0_09 -XX:+UseParallelGC

All tests passed

Build Log:
[...truncated 23892 lines...]
BUILD FAILED
C:\Users\JenkinsSlave\workspace\Lucene-Solr-4.x-Windows\build.xml:62: The following error occurred while executing this line:
C:\Users\JenkinsSlave\workspace\Lucene-Solr-4.x-Windows\solr\build.xml:558: The following error occurred while executing this line:
C:\Users\JenkinsSlave\workspace\Lucene-Solr-4.x-Windows\solr\common-build.xml:410: The following error occurred while executing this line:
C:\Users\JenkinsSlave\workspace\Lucene-Solr-4.x-Windows\solr\contrib\dataimporthandler-extras\build.xml:43: The following error occurred while executing this line:
C:\Users\JenkinsSlave\workspace\Lucene-Solr-4.x-Windows\solr\common-build.xml:359: The following error occurred while executing this line:
C:\Users\JenkinsSlave\workspace\Lucene-Solr-4.x-Windows\solr\common-build.xml:397: The following error occurred while executing this line:
C:\Users\JenkinsSlave\workspace\Lucene-Solr-4.x-Windows\solr\example\build.xml:46: Unable to delete file C:\Users\JenkinsSlave\workspace\Lucene-Solr-4.x-Windows\solr\example\lib\jetty-continuation-8.1.7.v20120910.jar

Total time: 30 minutes 1 second
Build step 'Invoke Ant' marked build as failure
Archiving artifacts
Recording test results
Description set: Java: 64bit/jdk1.7.0_09 -XX:+UseParallelGC
Email was triggered for: Failure
Sending email for trigger: Failure
[JENKINS] Lucene-Solr-4.x-Linux (32bit/jdk1.7.0_09) - Build # 2313 - Failure!
Build: http://jenkins.sd-datasolutions.de/job/Lucene-Solr-4.x-Linux/2313/
Java: 32bit/jdk1.7.0_09 -server -XX:+UseG1GC

1 tests failed.
REGRESSION: org.apache.lucene.search.suggest.analyzing.AnalyzingSuggesterTest.testRandom

Error Message:
expected: but was:

Stack Trace:
org.junit.ComparisonFailure: expected: but was:
        at __randomizedtesting.SeedInfo.seed([64C9D03229E9A923:1685F53D98891F50]:0)
        at org.junit.Assert.assertEquals(Assert.java:125)
        at org.junit.Assert.assertEquals(Assert.java:147)
        at org.apache.lucene.search.suggest.analyzing.AnalyzingSuggesterTest.testRandom(AnalyzingSuggesterTest.java:710)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:601)
        at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1559)
        at com.carrotsearch.randomizedtesting.RandomizedRunner.access$600(RandomizedRunner.java:79)
        at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:737)
        at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:773)
        at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:787)
        at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
        at org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:51)
        at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
        at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
        at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48)
        at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70)
        at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
        at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
        at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:358)
        at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:782)
        at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:442)
        at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:746)
        at com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:648)
        at com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:682)
        at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:693)
        at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
        at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42)
        at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
        at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
        at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
        at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
        at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:43)
        at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
        at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70)
        at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:55)
        at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
        at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:358)
        at java.lang.Thread.run(Thread.java:722)

Build Log:
[...truncated 7796 lines...]
[junit4:junit4] Suite: org.apache.lucene.search.suggest.analyzing.AnalyzingSuggesterTest
[junit4:junit4]   2> NOTE: reproduce with: ant test -Dtestcase=AnalyzingSuggesterTest -Dtests.method=testRandom -Dtests.seed=64C9D03229E9A923 -Dtests.multiplier=3 -Dtests.slow=true -Dtests.locale=hi_IN -Dtests.timezone=Pacific/Norfolk -Dtests.file.encoding=US-ASCII
[junit4:junit4] FAILURE 4.47s J0 | AnalyzingSuggesterTest.testRandom <<< [junit
[jira] [Commented] (SOLR-3816) Need a more granular nrt system that is close to a realtime system.
[ https://issues.apache.org/jira/browse/SOLR-3816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13494554#comment-13494554 ]

Otis Gospodnetic commented on SOLR-3816:
----------------------------------------

H... I didn't check the sources just now, but I'm not sure the above is all correct. Lucene gets the new Reader from IndexWriter, and I would think Solr uses that on soft commit and not something else, big and heavy. Yes, there is Searcher/cache warming, but I'm not sure that comes into play any more with NRT and soft commits.

> Need a more granular nrt system that is close to a realtime system.
> -------------------------------------------------------------------
>
> Key: SOLR-3816
> URL: https://issues.apache.org/jira/browse/SOLR-3816
> Project: Solr
> Issue Type: Improvement
> Components: clients - java, replication (java), search,
> SearchComponents - other, SolrCloud, update
> Affects Versions: 4.0
> Reporter: Nagendra Nagarajayya
> Labels: nrt, realtime, replication, search, solrcloud, update
> Attachments: alltests_passed_with_realtime_turnedoff.log,
> SOLR-3816_4.0_branch.patch, SOLR-3816-4.x.trunk.patch,
> solr-3816-realtime_nrt.patch
>
> Need a more granular NRT system that is close to a realtime system. A
> realtime system should be able to reflect changes to the index as and when
> docs are added/updated to the index. soft-commit offers NRT and is more
> realtime friendly than hard commit but is limited by the dependency on the
> SolrIndexSearcher being closed and reopened and offers a coarse granular NRT.
> Closing and reopening of the SolrIndexSearcher may impact performance also.
[jira] [Commented] (SOLR-4059) Custom Sharding
[ https://issues.apache.org/jira/browse/SOLR-4059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13494546#comment-13494546 ]

Mark Miller commented on SOLR-4059:
-----------------------------------

Hmm... of course you would still want to be able to send to any node, I think... so this seems more like something along the lines of shardId= on the update.

> Custom Sharding
> ---------------
>
> Key: SOLR-4059
> URL: https://issues.apache.org/jira/browse/SOLR-4059
> Project: Solr
> Issue Type: New Feature
> Components: SolrCloud
> Reporter: Mark Miller
>
> Had not fully thought through this one yet, but Yonik caught me up at
> ApacheCon. We need to be able to skip hashing and let the client choose the
> shard, but still send to replicas.
> Ideas for the interface? hash=false?
Re: FuzzyQuery vs SlowFuzzyQuery docs? -- was: Re: [jira] [Commented] (LUCENE-2667) Fix FuzzyQuery's defaults, so its fast.
It's a really simple answer. Your problem (and I quote):

    Content indexed as state:california. But it seems like when I search
    state:CALIFORNI~0.65 (via Solr) it doesn't work. I'm worried that Solr
    isn't running my text through the query analyzers first!

This is some analysis chain configuration issue. We don't need to add support for some unscalable stuff to Lucene to correct for that: you just need to make sure lowercasing is happening.

NOTE: I will continue to protest/veto/anything I can to block queries with horrible complexity, making as much noise as possible, because the end solution is for users to index and search content correctly and get results in a reasonable amount of time. If it doesn't work with 100M documents, I don't want it in Lucene. I would have the same opinion if someone wanted unscalable solutions for scoring w/ language models (e.g. not happy with smoothing for unknown probabilities), or if someone claimed that spatial queries should do slow things because they don't currently support interplanetary distances, and so on.

On Fri, Nov 9, 2012 at 7:52 PM, Mark Bennett wrote:
> Hi Robert,
>
> I acknowledge your "-1" vote, and I'm guessing that your objection is maybe
> 70% "scalability", and only 30% use-case?
>
> The older Levenshtein stuff has been around for a long time, scalable or not,
> and is already in real systems.
>
> You seem to have a very "binary" view on code being "in" or "out". Is there
> any room in your world-view of code for "gray code" - unsupported, incubator,
> what-have-you? Maybe analogous to people who jailbreak their iPhones or
> something?
>
> You're an important part of the community, and working at Lucid, etc., and
> clearly concerned about software quality. When smart folks like you have
> such sharp opinions I do try to ponder them against my own circumstances.
>
> And on the quality of the old code, was it just the scalability, or were
> there other concerns such as stability, coding style, or possibly
> inconsistent results?
>
> Isn't the sandbox and an admonished reference in the Javadocs sufficient?
>
> I'm harping on this because I'm really between a rock and a hard place, and
> also posted another question.
>
> Just trying to understand your very strong opinions, and I thank you for
> your patience in this matter. This issue is either going to fix or break my
> weekend / next deliverable.
>
> Sincere thanks,
> Mark
>
> --
> Mark Bennett / New Idea Engineering, Inc. / mbenn...@ideaeng.com
> Direct: 408-733-0387 / Main: 866-IDEA-ENG / Cell: 408-829-6513
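Robert's fix is on the analysis side: the index holds lowercased terms, so the query term must go through the same normalization before matching. A minimal illustration with toy data - not Solr's actual analysis chain, which would apply LowerCaseFilter on both the index and query sides:

```java
import java.util.Locale;
import java.util.Set;

public class AnalysisChainDemo {
    // Terms as they sit in the index after index-time lowercasing.
    static final Set<String> indexedTerms = Set.of("california", "state");

    // If the query side skips the same normalization, even an exact term
    // misses; a fuzzy term like CALIFORNI~ has the same problem, merely
    // masked (or not) by the edit-distance budget.
    static String normalize(String queryTerm) {
        return queryTerm.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        System.out.println(indexedTerms.contains("CALIFORNIA"));            // false: raw query term
        System.out.println(indexedTerms.contains(normalize("CALIFORNIA"))); // true: normalized term
    }
}
```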
Re: FuzzyQuery vs SlowFuzzyQuery docs? -- was: Re: [jira] [Commented] (LUCENE-2667) Fix FuzzyQuery's defaults, so its fast.
On Fri, Sep 14, 2012 at 7:12 PM, Chris Hostetter wrote:
>
> for your crazy unscalableness"

That really depends - size of the index, requirements around response times, caching, data. If anyone was using it before, they were using it in a way that ended up being acceptable to them. Or they were not using it.

--
- Mark
Re: FuzzyQuery vs SlowFuzzyQuery docs? -- was: Re: [jira] [Commented] (LUCENE-2667) Fix FuzzyQuery's defaults, so its fast.
On Fri, Nov 9, 2012 at 7:35 PM, Jack Krupansky wrote:
> +1 for permitting a choice of fuzzy query implementation.

+1. I wouldn't allow it by default though. I'd prefer having to set allowSlowFuzzyAlg or something, with some good javadoc. That won't let you accidentally move from a fast alg to a slow one, but it still keeps the functionality very discoverable. Having it in contrib is not as good, but okay. -1 on deprecation.

The best place to have these discussions - if someone thinks they have a good idea - is in a JIRA issue.

- Mark
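Mark's allowSlowFuzzyAlg idea is a standard opt-in pattern: the fast path stays the default, and the slow path must be requested explicitly rather than silently kicking in. A sketch of that pattern - every name below, including the flag, is a placeholder, and the two implementations are stand-ins for Lucene's automaton-based FuzzyQuery and the old SlowFuzzyQuery:

```java
public class FuzzyQueryFactory {
    // Stand-in for a query object; here it only records which path was chosen.
    interface Fuzzy { String impl(); }

    private final boolean allowSlowFuzzyAlg; // opt-in flag, per Mark Miller's suggestion

    FuzzyQueryFactory(boolean allowSlowFuzzyAlg) {
        this.allowSlowFuzzyAlg = allowSlowFuzzyAlg;
    }

    Fuzzy create(int maxEdits) {
        if (maxEdits <= 2) {
            return () -> "fast"; // automaton path handles ed <= 2
        }
        if (!allowSlowFuzzyAlg) { // fail loudly rather than silently degrade
            throw new IllegalArgumentException(
                "maxEdits > 2 requires opting in to the slow implementation");
        }
        return () -> "slow";
    }
}
```

Callers who genuinely need ed > 2 must construct the factory with the flag set, so the slow algorithm stays discoverable but is never an accident.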
[jira] [Created] (SOLR-4059) Custom Sharding
Mark Miller created SOLR-4059:
---------------------------------

Summary: Custom Sharding
Key: SOLR-4059
URL: https://issues.apache.org/jira/browse/SOLR-4059
Project: Solr
Issue Type: New Feature
Components: SolrCloud
Reporter: Mark Miller

Had not fully thought through this one yet, but Yonik caught me up at ApacheCon. We need to be able to skip hashing and let the client choose the shard, but still send to replicas.

Ideas for the interface? hash=false?
[jira] [Updated] (SOLR-2592) Custom Hashing
[ https://issues.apache.org/jira/browse/SOLR-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mark Miller updated SOLR-2592:
------------------------------

Assignee: (was: Mark Miller)
Summary: Custom Hashing (was: Pluggable shard lookup mechanism for SolrCloud)

> Custom Hashing
> --------------
>
> Key: SOLR-2592
> URL: https://issues.apache.org/jira/browse/SOLR-2592
> Project: Solr
> Issue Type: New Feature
> Components: SolrCloud
> Affects Versions: 4.0-ALPHA
> Reporter: Noble Paul
> Attachments: dbq_fix.patch, pluggable_sharding.patch,
> pluggable_sharding_V2.patch, SOLR-2592.patch, SOLR-2592_r1373086.patch,
> SOLR-2592_r1384367.patch, SOLR-2592_rev_2.patch,
> SOLR_2592_solr_4_0_0_BETA_ShardPartitioner.patch
>
> If the data in a cloud can be partitioned on some criteria (say range, hash,
> attribute value etc.), it will be easy to narrow down the search to a smaller
> subset of shards and in effect achieve more efficient search.
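The pluggable shard lookup described in SOLR-2592 can be sketched as a small strategy interface: hash by default, with a custom criterion swapped in. All names below are illustrative rather than Solr's API, and the prefix-based routing is just one example of an "attribute value" criterion (SolrCloud's real router hashes ids into a fixed range and assigns hash ranges to shards):

```java
public class ShardRouting {
    interface ShardPartitioner {
        int shardFor(String docId, int numShards);
    }

    // Default: spread document ids across shards by hash.
    static final ShardPartitioner HASH =
        (id, n) -> Math.floorMod(id.hashCode(), n);

    // Custom criterion: route by a client-controlled prefix, so e.g.
    // "tenantA!doc42" keeps all of a tenant's docs on one shard.
    static final ShardPartitioner PREFIX = (id, n) -> {
        int bang = id.indexOf('!');
        String key = bang >= 0 ? id.substring(0, bang) : id;
        return Math.floorMod(key.hashCode(), n);
    };

    public static void main(String[] args) {
        int shards = 4;
        // Both docs share the "tenantA" prefix, so they land on the same shard.
        System.out.println(PREFIX.shardFor("tenantA!doc1", shards)
                        == PREFIX.shardFor("tenantA!doc2", shards)); // true
    }
}
```

Narrowing a query to the subset of shards that can hold matches is then just evaluating the same partitioner on the query's criterion.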
Re: FuzzyQuery vs SlowFuzzyQuery docs? -- was: Re: [jira] [Commented] (LUCENE-2667) Fix FuzzyQuery's defaults, so its fast.
Hi Robert,

I acknowledge your "-1" vote, and I'm guessing that your objection is maybe 70% "scalability", and only 30% use-case?

The older Levenshtein stuff has been around for a long time, scalable or not, and is already in real systems.

You seem to have a very "binary" view on code being "in" or "out". Is there any room in your world-view of code for "gray code" - unsupported, incubator, what-have-you? Maybe analogous to people who jailbreak their iPhones or something?

You're an important part of the community, and working at Lucid, etc., and clearly concerned about software quality. When smart folks like you have such sharp opinions I do try to ponder them against my own circumstances.

And on the quality of the old code, was it just the scalability, or were there other concerns such as stability, coding style, or possibly inconsistent results?

Isn't the sandbox and an admonished reference in the Javadocs sufficient?

I'm harping on this because I'm really between a rock and a hard place, and also posted another question.

Just trying to understand your very strong opinions, and I thank you for your patience in this matter. This issue is either going to fix or break my weekend / next deliverable.

Sincere thanks,
Mark

--
Mark Bennett / New Idea Engineering, Inc. / mbenn...@ideaeng.com
Direct: 408-733-0387 / Main: 866-IDEA-ENG / Cell: 408-829-6513


On Fri, Nov 9, 2012 at 4:37 PM, Robert Muir wrote:
> I'm -1 for having unscalable shit in lucene's core. This query should
> have never been added.
>
> I don't care if a few people complain because they aren't using
> lowercasefilter or some other insanity. Fix your analysis chain. I
> don't have any sympathy.
[jira] [Created] (SOLR-4058) DIH should use the SolrCloudServer impl when running in SolrCloud mode.
Mark Miller created SOLR-4058:
---------------------------------

Summary: DIH should use the SolrCloudServer impl when running in SolrCloud mode.
Key: SOLR-4058
URL: https://issues.apache.org/jira/browse/SOLR-4058
Project: Solr
Issue Type: New Feature
Components: SolrCloud
Reporter: Mark Miller
Priority: Minor
Fix For: 4.1, 5.0
Re: FuzzyQuery vs SlowFuzzyQuery docs? -- was: Re: [jira] [Commented] (LUCENE-2667) Fix FuzzyQuery's defaults, so its fast.
I'm -1 for having unscalable shit in lucene's core. This query should have never been added.

I don't care if a few people complain because they aren't using lowercasefilter or some other insanity. Fix your analysis chain. I don't have any sympathy.

On Fri, Nov 9, 2012 at 7:35 PM, Jack Krupansky wrote:
> +1 for permitting a choice of fuzzy query implementation.
>
> I agree that we want a super-fast fuzzy query for simple variations, but I
> also agree that we should have the option to trade off speed for function.
>
> But I am also sympathetic to assuring that any core Lucene features be as
> performant as possible.
>
> Ultimately, if there was a single fuzzy query implementation that did
> everything for everybody all of the time, that would be the way to go, but
> if choices need to be made to satisfy competing goals, we should support
> going that route.
>
> -- Jack Krupansky
Re: FuzzyQuery vs SlowFuzzyQuery docs? -- was: Re: [jira] [Commented] (LUCENE-2667) Fix FuzzyQuery's defaults, so its fast.
+1 for permitting a choice of fuzzy query implementation.

I agree that we want a super-fast fuzzy query for simple variations, but I also agree that we should have the option to trade off speed for function.

But I am also sympathetic to assuring that any core Lucene features be as performant as possible.

Ultimately, if there was a single fuzzy query implementation that did everything for everybody all of the time, that would be the way to go, but if choices need to be made to satisfy competing goals, we should support going that route.

-- Jack Krupansky

From: Mark Bennett
Sent: Friday, November 09, 2012 3:48 PM
To: dev@lucene.apache.org
Subject: Re: FuzzyQuery vs SlowFuzzyQuery docs? -- was: Re: [jira] [Commented] (LUCENE-2667) Fix FuzzyQuery's defaults, so its fast.

Hi Robert,

On Thu, Sep 13, 2012 at 7:39 PM, Robert Muir wrote:
> ...
> ... I'm strongly against having this
> unscalable garbage in lucene's core.
>
> There is no use case for ed > 2, thats just crazy.

I promise you there ARE use cases for edit distances > 2, especially with longer words. Due to NDA I can't go into details.

Also ed>2 can be useful when COMBINING that low-quality part of the search with other sub-queries, or additional business rules. Maybe instead of boiling an ocean this lets you just boil the sea. ;-)

I won't comment on the quality of the older Levenshtein code, or the likely very slow performance, nor where the code should live, etc.

But your statement about "no use case for ed > 2" is simply not true. (Whether you'd agree with any of them or not is certainly another matter.)

I understand your concerns about not having it be the default. (Or maybe having a giant warning message or something, whatever.)

--
lucidworks.com
[jira] [Commented] (SOLR-4006) Many tests on Apache Jenkins are failing with lingering threads.
[ https://issues.apache.org/jira/browse/SOLR-4006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13494461#comment-13494461 ]

Mark Miller commented on SOLR-4006:
-----------------------------------

I'm back with power and back from ApacheCon finally. I've confirmed with my local FreeBSD VM that the nio selector change is indeed the culprit. It does seem like perhaps the timeout is not being respected. I'll start by reverting, I suppose, and keep looking for a solution. At worst we can pass an isFreebsd sys prop or something on our FreeBSD Jenkins machine and then not use NIO in that case. I'd rather it worked somehow though...
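The fallback Mark describes - a system property set only on the FreeBSD Jenkins slave that disables the NIO path - could look like the sketch below. The property name solr.test.freebsd is invented for illustration; the comment only says "an isFreebsd sys prop or something" and doesn't pin one down:

```java
public class ConnectorSelection {
    // Decide whether to use the NIO connector based on an opt-out system
    // property, e.g. passed as -Dsolr.test.freebsd=true on the affected slave.
    // Boolean.getBoolean returns true only if the named property equals "true".
    static boolean useNio() {
        return !Boolean.getBoolean("solr.test.freebsd");
    }

    public static void main(String[] args) {
        System.setProperty("solr.test.freebsd", "true"); // simulate the FreeBSD slave
        System.out.println("useNio=" + useNio());        // prints useNio=false
    }
}
```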
[JENKINS] Lucene-Solr-4.x-Windows (32bit/jdk1.6.0_37) - Build # 1506 - Failure!
Build: http://jenkins.sd-datasolutions.de/job/Lucene-Solr-4.x-Windows/1506/ Java: 32bit/jdk1.6.0_37 -client -XX:+UseParallelGC All tests passed Build Log: [...truncated 23533 lines...] BUILD FAILED C:\Users\JenkinsSlave\workspace\Lucene-Solr-4.x-Windows\build.xml:229: The following error occurred while executing this line: C:\Users\JenkinsSlave\workspace\Lucene-Solr-4.x-Windows\solr\common-build.xml:397: The following error occurred while executing this line: C:\Users\JenkinsSlave\workspace\Lucene-Solr-4.x-Windows\solr\example\build.xml:46: Unable to delete file C:\Users\JenkinsSlave\workspace\Lucene-Solr-4.x-Windows\solr\example\lib\jetty-continuation-8.1.7.v20120910.jar Total time: 28 minutes 15 seconds Build step 'Invoke Ant' marked build as failure Archiving artifacts Recording test results Description set: Java: 32bit/jdk1.6.0_37 -client -XX:+UseParallelGC Email was triggered for: Failure Sending email for trigger: Failure - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: FuzzyQuery vs SlowFuzzyQuery docs? -- was: Re: [jira] [Commented] (LUCENE-2667) Fix FuzzyQuery's defaults, so its fast.
Hi Robert, On Thu, Sep 13, 2012 at 7:39 PM, Robert Muir wrote: > ... > ... I'm strongly against having this > unscalable garbage in lucene's core. > > There is no use case for ed > 2, thats just crazy. I promise you there ARE use cases for edit distances > 2, especially with longer words. Due to NDA I can't go into details. Also ed>2 can be useful when COMBINING that low-quality part of the search with other sub-queries, or additional business rules. Maybe instead of boiling an ocean this lets you just boil the sea. ;-) I won't comment on the quality of the older Levenshtein code, or the likely very slow performance, nor where the code should live, etc. But your statement about "no use case for ed > 2" is simply not true. (whether you'd agree with any of them or not is certainly another matter) I understand your concerns about not having it be the default. (or maybe having a giant warning message or something, whatever) -- > lucidworks.com > > - > To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org > For additional commands, e-mail: dev-h...@lucene.apache.org
4 quick questions about Fuzzy Search, including forcing SlowFuzzySearch
I've been checking the code a bit, but it's taking a while, and I have 4 questions: Summary: I want to submit fuzzy searches, with lower scores, of long words, via Solr. I want to use the older/slower method, even though it's slower. (I realize low percents on long words sounds like a bad idea; it's a very long story, and there's lots of other stuff going on.) Also I have search-time analyzer logic in schema.xml that needs to be used, whether I'm doing a regular search or a fuzzy search. Example: state:California~0.65 (overly simple example of course) Or even: state:CALIFORNI~0.65 (1 letter off) And still have a match: content indexed as state:california Things I'm worried about: 1: I need the parser to call SlowFuzzyQuery instead of FuzzyQuery (yup, we know it's slow!) I'm not sure if this is about invoking the old parser, or if it's some type of config issue instead. 2: I don't want the 0.65 score being needlessly translated into an integer and then getting needlessly capped at 2. I'm not sure if the approach is: * "don't bother converting from float to int", OR * "convert to int if you want, but don't cap it at 2" 3: Schema.xml analyzers apply lowercase to words at both index and search time. (We actually have some other complex analyzers that *need* to happen; I'm just using lowercase as an example.) But it seems like when I search state:CALIFORNI~0.65 (via Solr) it doesn't work. I'm worried that Solr isn't running my text through the query analyzers first! 4: Would the XML parser help with any of this? I think it's still somewhat in limbo? We do programmatically build some parts of queries using the Lucene API, then convert them to Strings. Then we pass the strings to Solr; this seemed to be the suggested workaround I found online. I'm wondering if XML would bypass this step and give other, more precise control over slowfuzzy vs. fuzzy. I'm not sure if this is a matter of trying to force the old "classic" query parser, or setting some configuration or -D directive regardless of the parser being used.
-- Mark Bennett / New Idea Engineering, Inc. / mbenn...@ideaeng.com Direct: 408-733-0387 / Main: 866-IDEA-ENG / Cell: 408-829-6513
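As background to the edit-distance debate above, the textbook dynamic-programming Levenshtein distance (this is plain DP, not Lucene's automaton-based FuzzyQuery implementation) makes the examples concrete: "CALIFORNI" is one edit from "california" after lowercasing, while the classic kitten/sitting pair needs three edits — a match that a hard maxEdits cap of 2 can never express:

```java
public class EditDistanceDemo {
    // Classic two-row dynamic-programming Levenshtein distance
    // (insertions, deletions, and substitutions all cost 1).
    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev; prev = curr; curr = tmp;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        // The one-letter-off example from the thread (after query-time lowercasing).
        System.out.println(distance("californi", "california"));
        // The textbook pair: three edits, i.e. beyond a maxEdits=2 cap.
        System.out.println(distance("kitten", "sitting"));
    }
}
```

Longer words tolerate proportionally more edits while staying recognizable, which is the core of the "use cases for ed > 2" argument; the cost is that matching is no longer expressible as a small Levenshtein automaton, hence the speed trade-off.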
[jira] [Updated] (SOLR-4051) DIH Delta updates do not work for all locales
[ https://issues.apache.org/jira/browse/SOLR-4051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] James Dyer updated SOLR-4051: - Attachment: SOLR-4051.patch This also fixes SOLR-1970 & SOLR-2658, allowing configurable locale, dateformat, filename and location. It needs a new test and validation. This adds a as an element in data-config.xml that allows the user to specify an implementation of interface DIHPropertiesWriter. This interface was introduced in 3.6 and should have been marked as "lucene.experimental". This patch changes this interface and adds the experimental annotation also, just in case it needs to change again. Allowing pluggable property writers should open the door to easily solve issues like SOLR-3365. > DIH Delta updates do not work for all locales > - > > Key: SOLR-4051 > URL: https://issues.apache.org/jira/browse/SOLR-4051 > Project: Solr > Issue Type: Bug > Components: contrib - DataImportHandler >Affects Versions: 4.0 >Reporter: James Dyer >Priority: Minor > Attachments: SOLR-4051.patch > > > DIH Writes the last modified date to a Properties file using the default > locale. This gets sent in plaintext to the database at the next delta > update. DIH does not use prepared statements but just puts the date in an > SQL Statement in -mm-dd hh:mm:ss format. It would probably be best to > always format this date in JDBC escape syntax > (http://docs.oracle.com/javase/1.4.2/docs/guide/jdbc/getstart/statement.html#999472) > and java.sql.Timestamp#toString(). To do this, we'd need to parse the > user's query and remove the single quotes likely there (and now the quotes > would be optional and undesired). > It might just be simpler to change the SimpleDateFormat to use the root > locale as this appears to be the original intent here anyhow. Affected > locales include ja_JP_JP , hi_IN , th_TH -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
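The locale problem SOLR-4051 describes is easy to reproduce with plain SimpleDateFormat: under a Thai locale, typical JDKs format the year on the Buddhist calendar (1970 becomes 2513), so a last-modified date written with the default locale can produce a timestamp a Gregorian SQL comparison will never match. A hedged sketch (class name is invented; the exact Thai output depends on the JDK's locale data):

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

public class LocaleDateDemo {
    public static void main(String[] args) {
        Date epoch = new Date(0L); // 1970-01-01T00:00:00Z

        // Locale.ROOT gives a stable, locale-independent rendering.
        SimpleDateFormat root = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss", Locale.ROOT);
        root.setTimeZone(TimeZone.getTimeZone("UTC"));
        System.out.println(root.format(epoch));

        // th_TH selects the Buddhist calendar on typical JDKs, shifting the
        // year by +543 -- the kind of value that silently breaks a delta query.
        SimpleDateFormat thai = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss", new Locale("th", "TH"));
        thai.setTimeZone(TimeZone.getTimeZone("UTC"));
        System.out.println(thai.format(epoch));
    }
}
```

This is why the patch's suggestion of either JDBC escape syntax (java.sql.Timestamp#toString) or a root-locale SimpleDateFormat both work: each pins the serialization independently of the JVM's default locale.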
[jira] [Updated] (SOLR-4057) SolrCloud will not run on the root context.
[ https://issues.apache.org/jira/browse/SOLR-4057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Miller updated SOLR-4057: -- Priority: Minor (was: Major) Changing to minor priority given the workaround - I'd like to fix the cosmetic issue and make documentation less of an issue - I'll also add doc to the wiki to be explicit on the topic. > SolrCloud will not run on the root context. > --- > > Key: SOLR-4057 > URL: https://issues.apache.org/jira/browse/SOLR-4057 > Project: Solr > Issue Type: Bug > Components: SolrCloud >Reporter: Mark Miller >Assignee: Mark Miller >Priority: Minor > Fix For: 4.1, 5.0 > > > If you try and pass an empty hostContext to solrcloud when trying to run on > the root context, the empty value simply triggers using the default value of > 8983. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-4057) SolrCloud will not run on the root context.
[ https://issues.apache.org/jira/browse/SOLR-4057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13494327#comment-13494327 ] Mark Miller commented on SOLR-4057: --- Interesting - it doesn't seem very intuitive to me though, especially because we don't ask for a path - we ask for a string value of the context. At the least it would still be an issue that it's not documented, and it's not something I'd want to require seeing in the resulting URL strings. Even if some people thought it was clear that it was a path so you could use . for the root context, I don't think most people associate . and .. with URLs the same way they do with files. I and at least one other did not anyway :) It's a great workaround though. > SolrCloud will not run on the root context. > --- > > Key: SOLR-4057 > URL: https://issues.apache.org/jira/browse/SOLR-4057 > Project: Solr > Issue Type: Bug > Components: SolrCloud >Reporter: Mark Miller >Assignee: Mark Miller > Fix For: 4.1, 5.0 > > > If you try and pass an empty hostContext to solrcloud when trying to run on > the root context, the empty value simply triggers using the default value of > 8983. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-4057) SolrCloud will not run on the root context.
[ https://issues.apache.org/jira/browse/SOLR-4057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13494313#comment-13494313 ] Roman Shaposhnik commented on SOLR-4057: it appears that specifying hostContext="." works as expected or am I missing something? > SolrCloud will not run on the root context. > --- > > Key: SOLR-4057 > URL: https://issues.apache.org/jira/browse/SOLR-4057 > Project: Solr > Issue Type: Bug > Components: SolrCloud >Reporter: Mark Miller >Assignee: Mark Miller > Fix For: 4.1, 5.0 > > > If you try and pass an empty hostContext to solrcloud when trying to run on > the root context, the empty value simply triggers using the default value of > 8983. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-3785) Cluster-state inconsistent
[ https://issues.apache.org/jira/browse/SOLR-3785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13494302#comment-13494302 ] Mark Miller commented on SOLR-3785: --- bq. ZkStateReader should have logic that, when calculating a shard-state, looks at this ephemeral node, but if it is missing assumes "down"-state. That's not a bad idea. > Cluster-state inconsistent > -- > > Key: SOLR-3785 > URL: https://issues.apache.org/jira/browse/SOLR-3785 > Project: Solr > Issue Type: Bug > Components: SolrCloud >Affects Versions: 4.0 > Environment: Self-build Solr release built on Apache Solr revision > 1355667 from 4.x branch >Reporter: Per Steffensen > Attachments: SOLR-3785.patch > > > Information in CloudSolrServer.getZkStateReader().getCloudState() (called > cloudState below) seems to be inconsistent. > I have a Solr running the leader of slice "sliceName" in collection > "collectionName" - no replica to take over. I shut down this Solr, and I want > to detect that there is now no leader active. > I do e.g. > {code} > ZkNodeProps leader = cloudState.getLeader(indexName, sliceName); > boolean notActive = (leader == null) || > !leader.containsKey(ZkStateReader.STATE_PROP) || > !leader.get(ZkStateReader.STATE_PROP).equals(ZkStateReader.ACTIVE); > {code} > This does not work. It seems like the state of a shard is not changed when > this Solr goes down. > I do e.g. > {code} > ZkNodeProps leader = cloudState.getLeader(indexName, sliceName); > boolean notActive = (leader == null) || > !leader.containsKey(ZkStateReader.STATE_PROP) || > !leader.get(ZkStateReader.STATE_PROP).equals(ZkStateReader.ACTIVE) || > !leader.containsKey(ZkStateReader.NODE_NAME_PROP) || > !cloudState.getLiveNodes().contains(leader.get(ZkStateReader.NODE_NAME_PROP)) > {code} > This works. > It seems like live-nodes of cloudState is updated when Solr goes down, but > that some of the other info available through cloudState is not - e.g. > getLeader().
> This might already have already been solved on 4.x branch in a revision later > than 1355667. Then please just tell me - thanks. > Regards, Per Steffensen -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (SOLR-4057) SolrCloud will not run on the root context.
Mark Miller created SOLR-4057: - Summary: SolrCloud will not run on the root context. Key: SOLR-4057 URL: https://issues.apache.org/jira/browse/SOLR-4057 Project: Solr Issue Type: Bug Components: SolrCloud Reporter: Mark Miller Assignee: Mark Miller Fix For: 4.1, 5.0 If you try and pass an empty hostContext to solrcloud when trying to run on the root context, the empty value simply triggers using the default value of 8983. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4549) Allow variable buffer size on BufferedIndexOutput
[ https://issues.apache.org/jira/browse/LUCENE-4549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13494266#comment-13494266 ] Simon Willnauer commented on LUCENE-4549: - bq. I think you unintentionally always enabled rate limiting in the test case. well I intentionally enabled it but i missed the nocommit. good catch... I plan to commit this soon... > Allow variable buffer size on BufferedIndexOutput > -- > > Key: LUCENE-4549 > URL: https://issues.apache.org/jira/browse/LUCENE-4549 > Project: Lucene - Core > Issue Type: Improvement > Components: core/store >Affects Versions: 4.0 >Reporter: Simon Willnauer >Assignee: Simon Willnauer >Priority: Minor > Fix For: 4.1, 5.0 > > Attachments: LUCENE-4549.patch > > > BufferedIndexInput allows to set the buffersize but BufferedIndexOutput > doesn't this could be useful for optimizations related to LUCENE-4537. We > should make the apis here consistent. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (LUCENE-4466) Bounds check inconsistent for stored fields vs term vectors
[ https://issues.apache.org/jira/browse/LUCENE-4466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir resolved LUCENE-4466. - Resolution: Fixed Fix Version/s: 5.0 4.1 > Bounds check inconsistent for stored fields vs term vectors > > > Key: LUCENE-4466 > URL: https://issues.apache.org/jira/browse/LUCENE-4466 > Project: Lucene - Core > Issue Type: Bug > Components: core/index >Reporter: Robert Muir > Fix For: 4.1, 5.0 > > Attachments: LUCENE-4466.patch > > > SegmentReader.document does the check for stored fields. Codec's dont. > SegmentReader.getTermVectors doesnt do the check for vectors. Codec does. > I think we should move the vectors check out to SR, too. Codecs can have an > assert if they want, but the APIs should look more consistent. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4466) Bounds check inconsistent for stored fields vs term vectors
[ https://issues.apache.org/jira/browse/LUCENE-4466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13494250#comment-13494250 ] Michael McCandless commented on LUCENE-4466: +1 > Bounds check inconsistent for stored fields vs term vectors > > > Key: LUCENE-4466 > URL: https://issues.apache.org/jira/browse/LUCENE-4466 > Project: Lucene - Core > Issue Type: Bug > Components: core/index >Reporter: Robert Muir > Attachments: LUCENE-4466.patch > > > SegmentReader.document does the check for stored fields. Codec's dont. > SegmentReader.getTermVectors doesnt do the check for vectors. Codec does. > I think we should move the vectors check out to SR, too. Codecs can have an > assert if they want, but the APIs should look more consistent. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-4466) Bounds check inconsistent for stored fields vs term vectors
[ https://issues.apache.org/jira/browse/LUCENE-4466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-4466: Attachment: LUCENE-4466.patch > Bounds check inconsistent for stored fields vs term vectors > > > Key: LUCENE-4466 > URL: https://issues.apache.org/jira/browse/LUCENE-4466 > Project: Lucene - Core > Issue Type: Bug > Components: core/index >Reporter: Robert Muir > Attachments: LUCENE-4466.patch > > > SegmentReader.document does the check for stored fields. Codec's dont. > SegmentReader.getTermVectors doesnt do the check for vectors. Codec does. > I think we should move the vectors check out to SR, too. Codecs can have an > assert if they want, but the APIs should look more consistent. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4466) Bounds check inconsistent for stored fields vs term vectors
[ https://issues.apache.org/jira/browse/LUCENE-4466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13494219#comment-13494219 ] Michael McCandless commented on LUCENE-4466: +1 > Bounds check inconsistent for stored fields vs term vectors > > > Key: LUCENE-4466 > URL: https://issues.apache.org/jira/browse/LUCENE-4466 > Project: Lucene - Core > Issue Type: Bug > Components: core/index >Reporter: Robert Muir > > SegmentReader.document does the check for stored fields. Codec's dont. > SegmentReader.getTermVectors doesnt do the check for vectors. Codec does. > I think we should move the vectors check out to SR, too. Codecs can have an > assert if they want, but the APIs should look more consistent. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: [JENKINS] Lucene-Solr-trunk-Windows (64bit/jdk1.7.0_09) - Build # 1506 - Failure!
I probably caused this with my fix yesterday (example/build.xml, see 'sync-hack'). But why would Windows have this file open? On Fri, Nov 9, 2012 at 1:39 PM, Policeman Jenkins Server wrote: > Build: http://jenkins.sd-datasolutions.de/job/Lucene-Solr-trunk-Windows/1506/ > Java: 64bit/jdk1.7.0_09 -XX:+UseParallelGC > > All tests passed > > Build Log: > [...truncated 24072 lines...] > BUILD FAILED > C:\Users\JenkinsSlave\workspace\Lucene-Solr-trunk-Windows\build.xml:62: The > following error occurred while executing this line: > C:\Users\JenkinsSlave\workspace\Lucene-Solr-trunk-Windows\solr\build.xml:558: > The following error occurred while executing this line: > C:\Users\JenkinsSlave\workspace\Lucene-Solr-trunk-Windows\solr\common-build.xml:410: > The following error occurred while executing this line: > C:\Users\JenkinsSlave\workspace\Lucene-Solr-trunk-Windows\solr\contrib\dataimporthandler-extras\build.xml:43: > The following error occurred while executing this line: > C:\Users\JenkinsSlave\workspace\Lucene-Solr-trunk-Windows\solr\common-build.xml:359: > The following error occurred while executing this line: > C:\Users\JenkinsSlave\workspace\Lucene-Solr-trunk-Windows\solr\common-build.xml:397: > The following error occurred while executing this line: > C:\Users\JenkinsSlave\workspace\Lucene-Solr-trunk-Windows\solr\example\build.xml:46: > Unable to delete file > C:\Users\JenkinsSlave\workspace\Lucene-Solr-trunk-Windows\solr\example\lib\jetty-continuation-8.1.7.v20120910.jar > > Total time: 29 minutes 19 seconds > Build step 'Invoke Ant' marked build as failure > Archiving artifacts > Recording test results > Description set: Java: 64bit/jdk1.7.0_09 -XX:+UseParallelGC > Email was triggered for: Failure > Sending email for trigger: Failure > > > > > - > To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org > For additional commands, e-mail: dev-h...@lucene.apache.org - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail:
dev-h...@lucene.apache.org
[JENKINS] Lucene-Solr-trunk-Windows (64bit/jdk1.7.0_09) - Build # 1506 - Failure!
Build: http://jenkins.sd-datasolutions.de/job/Lucene-Solr-trunk-Windows/1506/ Java: 64bit/jdk1.7.0_09 -XX:+UseParallelGC All tests passed Build Log: [...truncated 24072 lines...] BUILD FAILED C:\Users\JenkinsSlave\workspace\Lucene-Solr-trunk-Windows\build.xml:62: The following error occurred while executing this line: C:\Users\JenkinsSlave\workspace\Lucene-Solr-trunk-Windows\solr\build.xml:558: The following error occurred while executing this line: C:\Users\JenkinsSlave\workspace\Lucene-Solr-trunk-Windows\solr\common-build.xml:410: The following error occurred while executing this line: C:\Users\JenkinsSlave\workspace\Lucene-Solr-trunk-Windows\solr\contrib\dataimporthandler-extras\build.xml:43: The following error occurred while executing this line: C:\Users\JenkinsSlave\workspace\Lucene-Solr-trunk-Windows\solr\common-build.xml:359: The following error occurred while executing this line: C:\Users\JenkinsSlave\workspace\Lucene-Solr-trunk-Windows\solr\common-build.xml:397: The following error occurred while executing this line: C:\Users\JenkinsSlave\workspace\Lucene-Solr-trunk-Windows\solr\example\build.xml:46: Unable to delete file C:\Users\JenkinsSlave\workspace\Lucene-Solr-trunk-Windows\solr\example\lib\jetty-continuation-8.1.7.v20120910.jar Total time: 29 minutes 19 seconds Build step 'Invoke Ant' marked build as failure Archiving artifacts Recording test results Description set: Java: 64bit/jdk1.7.0_09 -XX:+UseParallelGC Email was triggered for: Failure Sending email for trigger: Failure - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-2482) Index sorter
[ https://issues.apache.org/jira/browse/LUCENE-2482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13494187#comment-13494187 ] Matthew Willson commented on LUCENE-2482: - Hi all -- few quick questions if anyone is still watching this. * Could this be used to achieve an impact ordered index, as in e.g. [1], where documents in a given term's postings list are ordered by score contribution or term frequency? * Any caveats or things one should be aware of when it comes to index sorting in combination with different index merge strategies, and some of the more advanced stuff in Solr for managing distributed indexes? * Anyone aware of any other work along the lines of early stopping / dynamic pruning optimisations in Lucene? e.g. MaxScore from [1] (I think Xapian [2] calls it 'operator decay') or accumulator pruning based algorithms from [1] (perhaps in combination with impact ordering)? in particular is there anything in Lucene 4's approach to scoring and indexing which would make these hard in principle? Any pointers gratefully received. [1] Buettcher Clarke & Cormack "Implementing and Evaluating search engines" ch. 5 pp. 143-153 [2] http://xapian.org/docs/matcherdesign.html > Index sorter > > > Key: LUCENE-2482 > URL: https://issues.apache.org/jira/browse/LUCENE-2482 > Project: Lucene - Core > Issue Type: New Feature > Components: modules/other >Affects Versions: 3.1, 4.0-ALPHA >Reporter: Andrzej Bialecki >Assignee: Andrzej Bialecki > Fix For: 3.6 > > Attachments: indexSorter.patch, LUCENE-2482-4.0.patch > > > A tool to sort index according to a float document weight. Documents with > high weight are given low document numbers, which means that they will be > first evaluated. When using a strategy of "early termination" of queries (see > TimeLimitedCollector) such sorting significantly improves the quality of > partial results. 
> (Originally this tool was created by Doug Cutting in Nutch, and used norms as > document weights - thus the ordering was limited by the limited resolution of > norms. This is a pure Lucene version of the tool, and it uses arbitrary > floats from a specified stored field). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (SOLR-4056) Contribution of component to gather the most frequent user search request in real-time
Siegfried Goeschl created SOLR-4056: --- Summary: Contribution of component to gather the most frequent user search request in real-time Key: SOLR-4056 URL: https://issues.apache.org/jira/browse/SOLR-4056 Project: Solr Issue Type: New Feature Components: SearchComponents - other Affects Versions: 3.6.1 Reporter: Siegfried Goeschl Priority: Minor Fix For: 3.6.2 I'm now finishing a SOLR project for one of my customers (replacing Microsoft FAST server with SOLR) and got the permission to contribute our improvements. The most interesting thing is a "FrequentSearchTerm" component which allows analyzing the user-supplied search queries in real-time * it keeps track of the last queries per core using a LIFO buffer (so we have an upper limit of memory consumption) * per query entry we keep track of the number of invocations, the average number of result documents and the average execution time * we allow for custom searches across the frequent search terms using the MVEL expression language (see http://mvel.codehaus.org) ** find all queries which did not yield any results - 'meanHits==0' ** find all "iPhone" queries - 'searchTerm.contains("iphone") || searchTerm.contains("i-phone")' ** find all long-running "iPhone" queries - '(searchTerm.contains("iphone") || searchTerm.contains("i-phone")) && meanTime>50' * GUI: we have a JSP page which allows access to the frequent search terms * there is also an XML/CSV export we use to display the 50 most frequently used search queries in real-time We use this component * to get input for QA regarding frequently used search terms * to find strange queries, e.g. queries returning no or too many results, e.g. caused by WordDelimiterFilter * to keep our management happy ... :-) Not sure if the name "Frequent Search Term Component" is perfectly suitable as it was taken from FAST - suggestions welcome. Maybe "FrequentSearchQueryComponent" would be more suitable? -- This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: [SOLR] RFC - Contributing a FrequentSearchTerm component ...
Absolutely feel free to open up a JIRA and attach a patch for something like this! You can create an account and edit JIRAs freely. You don't need to clean it up much before putting up the first patch. It's often useful to let other eyes take a quick look at it and make comments before polishing. It's perfectly reasonable to have //TODOs or //nocommit comments in the code as a flag that "this isn't finished yet", but it's up to you. Best Erick On Fri, Nov 9, 2012 at 8:37 AM, Siegfried Goeschl wrote: > Hi folks, > > I'm now finishing a SOLR project for one of my customers (replacing > Microsoft FAST server with SOLR) and got the permission to contribute our > improvements. > > The most interesting thing is a "FrequentSearchTerm" component which > allows to analyze the user-supplied search queries in real-time > > +) it keeps track of the last queries per core using a LIFO buffer (so we > have an upper limit of memory consumption) > > +) per query entry we keep track of the number of invocations, the average > number of result documents and the average execution time > > +) we allow for custom searches across the frequent search terms using the > MVEL expression language (see http://mvel.codehaus.org) > ++) find all queries which did not yield any results - 'meanHits==0' > ++) find all "iPhone" queries - 'searchTerm.contains("iphone") || > searchTerm.contains("i-phone")' > ++) find all long-running "iPhone" queries - > '(searchTerm.contains("iphone") || searchTerm.contains("i-phone")) && > meanTime>50' > > +) GUI : we have a JSP page which allows to access the frequent search > terms > > +) there is also an XML/CSV export we use to display the 50 most > frequently used search queries in real-time > > We use this component > > +) to get input for QA regarding frequently used search terms > +) to find strange queries, e.g. queries returning no or too many results, > e.g. caused by WordDelimiterFilter > +) to keep our management happy ...
:-) > > So the question is - is the community interested in such a contribution? > If yes then I need to spend some time to improve the code from "industrial > quality" to "open source quality" including documentation ... you know what > I mean :-) > > Thanks in advance, > > Siegfried Goeschl > > PS: Not sure if the name "Frequent Search Term Component" is perfectly > suitable as it was taken from FAST - suggestions welcome > > - > To unsubscribe, e-mail: > dev-unsubscribe@lucene.apache.org > For additional commands, e-mail: dev-h...@lucene.apache.org
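The bounded "last queries" buffer described in this thread can be sketched in a few lines. This is a hedged illustration, not the contributed component: all names here (RecentQueryBuffer, Entry, zeroHitCount) are invented, and the MVEL filtering, JSP GUI, and XML/CSV export are omitted. It shows the two properties the description emphasizes — a fixed capacity as the memory cap, and per-entry running statistics (invocations, mean hits, mean time):

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class RecentQueryBuffer {
    // Hypothetical per-query record: invocation count plus running means.
    static final class Entry {
        final String query;
        long invocations;
        double meanHits, meanTimeMs;
        Entry(String query) { this.query = query; }
        void record(long hits, long timeMs) {
            invocations++;
            meanHits += (hits - meanHits) / invocations;       // incremental mean
            meanTimeMs += (timeMs - meanTimeMs) / invocations;
        }
    }

    private final int capacity;
    private final Deque<Entry> buffer = new ArrayDeque<>();

    RecentQueryBuffer(int capacity) { this.capacity = capacity; }

    void add(String query, long hits, long timeMs) {
        // Fold repeats of the same query into its existing entry.
        for (Entry e : buffer) {
            if (e.query.equals(query)) { e.record(hits, timeMs); return; }
        }
        if (buffer.size() == capacity) buffer.removeLast(); // evict the oldest
        Entry e = new Entry(query);
        e.record(hits, timeMs);
        buffer.addFirst(e); // newest first, i.e. LIFO order
    }

    // Analogue of the 'meanHits==0' expression: queries that never matched.
    long zeroHitCount() {
        return buffer.stream().filter(e -> e.meanHits == 0).count();
    }

    public static void main(String[] args) {
        RecentQueryBuffer b = new RecentQueryBuffer(2);
        b.add("iphone", 10, 40);
        b.add("i-phone", 0, 5);
        b.add("iphone", 20, 60); // folds into the existing "iphone" entry
        System.out.println(b.zeroHitCount());
    }
}
```

Because the deque never exceeds its capacity and each entry keeps only running means, memory stays O(capacity) no matter how many queries flow through — the "upper limit of memory consumption" the proposal mentions.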
[jira] [Updated] (SOLR-1306) Support pluggable persistence/loading of solr.xml details
[ https://issues.apache.org/jira/browse/SOLR-1306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erick Erickson updated SOLR-1306: - Attachment: SOLR-1306.patch Last fix broke some tests, this fixes them. > Support pluggable persistence/loading of solr.xml details > - > > Key: SOLR-1306 > URL: https://issues.apache.org/jira/browse/SOLR-1306 > Project: Solr > Issue Type: New Feature > Components: multicore >Reporter: Noble Paul >Assignee: Erick Erickson > Fix For: 4.1 > > Attachments: SOLR-1306.patch, SOLR-1306.patch, SOLR-1306.patch, > SOLR-1306.patch > > > Persisting and loading details from one xml is fine if the no:of cores are > small and the no:of cores are few/fixed . If there are 10's of thousands of > cores in a single box adding a new core (with persistent=true) becomes very > expensive because every core creation has to write this huge xml. > Moreover , there is a good chance that the file gets corrupted and all the > cores become unusable . In that case I would prefer it to be stored in a > centralized DB which is backed up/replicated and all the information is > available in a centralized location. > We may need to refactor CoreContainer to have a pluggable implementation > which can load/persist the details . The default implementation should > write/read from/to solr.xml . And the class should be pluggable as follows in > solr.xml > {code:xml} > > > > {code} > There will be a new interface (or abstract class ) called SolrDataProvider > which this class must implement -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (SOLR-3856) DIH: Better tests for SqlEntityProcessor
[ https://issues.apache.org/jira/browse/SOLR-3856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] James Dyer resolved SOLR-3856. -- Resolution: Fixed committed fixes. Trunk: r1407547 branch_4x: r1407549 > DIH: Better tests for SqlEntityProcessor > > > Key: SOLR-3856 > URL: https://issues.apache.org/jira/browse/SOLR-3856 > Project: Solr > Issue Type: Improvement > Components: contrib - DataImportHandler >Affects Versions: 3.6, 4.0 >Reporter: James Dyer >Assignee: James Dyer > Fix For: 4.1, 5.0 > > Attachments: SOLR-3856_20121109_fixes.patch, SOLR-3856-3.5.patch, > SOLR-3856.patch, SOLR-3856.patch, SOLR-3856.patch > > > The current tests for SqlEntityProcessor (& CachedSqlEntityProcessor), while > many, do not reliably fail when bugs are introduced! They are also difficult > to look at and understand. As we move Jenkins onto new environments, we have > found several of them fail regularly leading to "@Ignore". > My aim here is to write all new tests for (Cached)SqlEntityProcessor, and to > document (hopefully fix) any bugs this reveals. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-3856) DIH: Better tests for SqlEntityProcessor
[ https://issues.apache.org/jira/browse/SOLR-3856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] James Dyer updated SOLR-3856: - Attachment: SOLR-3856_20121109_fixes.patch This adds better messages on failure to help figure these out. Also added an assume when the locale breaks the test, until SOLR-4051/SOLR-1916 can be fixed. > DIH: Better tests for SqlEntityProcessor > > > Key: SOLR-3856 > URL: https://issues.apache.org/jira/browse/SOLR-3856 > Project: Solr > Issue Type: Improvement > Components: contrib - DataImportHandler >Affects Versions: 3.6, 4.0 >Reporter: James Dyer >Assignee: James Dyer > Fix For: 4.1, 5.0 > > Attachments: SOLR-3856_20121109_fixes.patch, SOLR-3856-3.5.patch, > SOLR-3856.patch, SOLR-3856.patch, SOLR-3856.patch > > > The current tests for SqlEntityProcessor (& CachedSqlEntityProcessor), while > many, do not reliably fail when bugs are introduced! They are also difficult > to look at and understand. As we move Jenkins onto new environments, we have > found several of them fail regularly leading to "@Ignore". > My aim here is to write all new tests for (Cached)SqlEntityProcessor, and to > document (hopefully fix) any bugs this reveals. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Reopened] (SOLR-3856) DIH: Better tests for SqlEntityProcessor
[ https://issues.apache.org/jira/browse/SOLR-3856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] James Dyer reopened SOLR-3856: -- re-open to deal with recent test failures. > DIH: Better tests for SqlEntityProcessor > > > Key: SOLR-3856 > URL: https://issues.apache.org/jira/browse/SOLR-3856 > Project: Solr > Issue Type: Improvement > Components: contrib - DataImportHandler >Affects Versions: 3.6, 4.0 >Reporter: James Dyer >Assignee: James Dyer > Fix For: 4.1, 5.0 > > Attachments: SOLR-3856-3.5.patch, SOLR-3856.patch, SOLR-3856.patch, > SOLR-3856.patch > > > The current tests for SqlEntityProcessor (& CachedSqlEntityProcessor), while > many, do not reliably fail when bugs are introduced! They are also difficult > to look at and understand. As we move Jenkins onto new environments, we have > found several of them fail regularly leading to "@Ignore". > My aim here is to write all new tests for (Cached)SqlEntityProcessor, and to > document (hopefully fix) any bugs this reveals. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-752) Allow better Field Compression options
[ https://issues.apache.org/jira/browse/SOLR-752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13494117#comment-13494117 ] David Smiley commented on SOLR-752: --- LUCENE-4226 basically does this but you can't configure codecs; you pick a codec in its default mode. The Compressing codec defaults to "fast" and yields ~50% savings based on Adrien's tests of a "small to medium" sized index: https://issues.apache.org/jira/browse/LUCENE-4226?focusedCommentId=13451708&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13451708 But what I'd like to see is the ability to compress a large text field (alone), for the purposes of highlighting, and much more than 50% compression. It might not be able to handle that many concurrent requests to meet response time SLAs, but some search apps aren't under high load. > Allow better Field Compression options > -- > > Key: SOLR-752 > URL: https://issues.apache.org/jira/browse/SOLR-752 > Project: Solr > Issue Type: Improvement >Reporter: Grant Ingersoll >Priority: Minor > Attachments: compressed_field.patch, compressedtextfield.patch > > > See http://lucene.markmail.org/message/sd4mgwud6caevb35?q=compression > It would be good if Solr handled field compression outside of Lucene's > Field.COMPRESS capabilities, since those capabilities are less than ideal > when it comes to control over compression. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
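David's point is that the ~50% figure comes from the Compressing codec's speed-oriented "fast" default, while a large text field stored only for highlighting could trade CPU for a much higher ratio. As a rough illustration of that trade-off (using plain java.util.zip, not Lucene's CompressingStoredFieldsFormat; the sample text and sizes are made up), a general-purpose compressor at maximum effort shrinks a repetitive text field far beyond 50%:

```java
import java.util.zip.Deflater;

// Illustration of the speed-vs-ratio trade-off discussed above, using
// java.util.zip.Deflater. This is NOT how Lucene's codec compresses stored
// fields; it only shows that higher-effort settings can beat the ~50%
// achieved by a "fast" mode on compressible text.
public class FieldCompressionDemo {
    static int compressedSize(byte[] input, int level) {
        Deflater deflater = new Deflater(level);
        deflater.setInput(input);
        deflater.finish();
        byte[] buf = new byte[input.length + 64];
        int total = 0;
        while (!deflater.finished()) {
            total += deflater.deflate(buf); // accumulate compressed bytes
        }
        deflater.end();
        return total;
    }

    public static void main(String[] args) {
        // A hypothetical large, repetitive text field (~90 KB).
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 2000; i++) {
            sb.append("the quick brown fox jumps over the lazy dog ").append(i % 10);
        }
        byte[] text = sb.toString().getBytes();
        int fast = compressedSize(text, Deflater.BEST_SPEED);
        int best = compressedSize(text, Deflater.BEST_COMPRESSION);
        // Max-effort compression is at least as small, and well under 50%.
        System.out.println(best <= fast && best < text.length / 2);
    }
}
```

For a search app with modest query load, spending the extra CPU at index time (and on the occasional highlight fetch) to get a much smaller stored-field footprint can be a reasonable choice.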
[jira] [Commented] (SOLR-3918) Change the way -excl-slf4j targets work
[ https://issues.apache.org/jira/browse/SOLR-3918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13494107#comment-13494107 ] Shawn Heisey commented on SOLR-3918: I've changed the issue title, because the latest version of the patch changes how dist-war-excl-slf4j works, in addition to creating a new dist-excl-slf4j target. The current build.xml leaves slf4j-api in the war, forcing you to stick with that specific slf4j version. The patch removes all slf4j jars from the war. With my patch, someone who wants to change the slf4j binding can go to slf4j.org and download the newest version. By putting the appropriate jars into the proper location (lib/ext for the included jetty8), they can use the -excl-slf4j war and have everything work. The required jars are slf4j-api, jcl-over-slf4j, log4j-over-slf4j, the required binding jar. In the case of log4j, you have to include the log4j jar itself as well. > Change the way -excl-slf4j targets work > --- > > Key: SOLR-3918 > URL: https://issues.apache.org/jira/browse/SOLR-3918 > Project: Solr > Issue Type: Improvement >Affects Versions: 3.6.1, 4.0-BETA >Reporter: Shawn Heisey >Priority: Trivial > Fix For: 3.6.2, 4.1, 5.0 > > Attachments: SOLR-3918.patch, SOLR-3918.patch > > > If you want to create an entire dist target but leave out slf4j bindings, you > must currently use this: > ant dist-solrj, dist-core, dist-test-framework, dist-contrib > dist-war-excl-slf4j > It would be better to have a single target. Attaching a patch against > branch_4x for this. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-3816) Need a more granular nrt system that is close to a realtime system.
[ https://issues.apache.org/jira/browse/SOLR-3816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13494105#comment-13494105 ] Nagendra Nagarajayya commented on SOLR-3816: @Otis: Yes, you could set it to something low or 0, but this means it has to close and open the SolrIndexSearcher this often. SolrIndexSearcher is a heavy object that is reference counted, so there may be searches going on, etc.; it has lots of critical areas that need to be synchronized to close and reopen a new searcher, warm it up, etc.; it was not meant for this kind of use ... Realtime-search just gets a new nrt reader from the writer and passes this along to the Searcher, a lean searcher with no state. In the future, if lucene's developers make the reader more realtime so it sees more changes as they happen at the writer, realtime-search should be able to handle it ... "Quote from the user using realtime-search" Insertion speed – while we can’t really explain this, we are able to insert 70k records per second at a steady rate over time with RA, while we can only do 40k at a descending rate with normal Solr. Granted we haven’t even slightly configured regular Solr for high speed insertion with regard to segment configs, but this was good for us to get us quickly off the ground. "end quote" I think this has gotten better with the 4.0 release. I have also requested the user to benchmark and update the JIRA as I don't have the required hardware. > Need a more granular nrt system that is close to a realtime system. 
> --- > > Key: SOLR-3816 > URL: https://issues.apache.org/jira/browse/SOLR-3816 > Project: Solr > Issue Type: Improvement > Components: clients - java, replication (java), search, > SearchComponents - other, SolrCloud, update >Affects Versions: 4.0 >Reporter: Nagendra Nagarajayya > Labels: nrt, realtime, replication, search, solrcloud, update > Attachments: alltests_passed_with_realtime_turnedoff.log, > SOLR-3816_4.0_branch.patch, SOLR-3816-4.x.trunk.patch, > solr-3816-realtime_nrt.patch > > > Need a more granular NRT system that is close to a realtime system. A > realtime system should be able to reflect changes to the index as and when > docs are added/updated to the index. soft-commit offers NRT and is more > realtime friendly than hard commit but is limited by the dependency on the > SolrIndexSearcher being closed and reopened and offers a coarse granular NRT. > Closing and reopening of the SolrIndexSearcher may impact performance also. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-3918) Change the way -excl-slf4j targets work
[ https://issues.apache.org/jira/browse/SOLR-3918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shawn Heisey updated SOLR-3918: --- Summary: Change the way -excl-slf4j targets work (was: Create dist-excl-slf4j target) > Change the way -excl-slf4j targets work > --- > > Key: SOLR-3918 > URL: https://issues.apache.org/jira/browse/SOLR-3918 > Project: Solr > Issue Type: Improvement >Affects Versions: 3.6.1, 4.0-BETA >Reporter: Shawn Heisey >Priority: Trivial > Fix For: 3.6.2, 4.1, 5.0 > > Attachments: SOLR-3918.patch, SOLR-3918.patch > > > If you want to create an entire dist target but leave out slf4j bindings, you > must currently use this: > ant dist-solrj, dist-core, dist-test-framework, dist-contrib > dist-war-excl-slf4j > It would be better to have a single target. Attaching a patch against > branch_4x for this. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4550) For extremely wide shapes (> 180 degrees) distErrPct is not used correctly
[ https://issues.apache.org/jira/browse/LUCENE-4550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13494086#comment-13494086 ] David Smiley commented on LUCENE-4550: -- A solution is to calculate the distance from a bbox corner to its center, instead of the current algorithm which takes half of the distance from opposite corners. The only small issue to consider is that the distance from a bbox corner to its center will vary up to ~4x (worst case) depending on whether you take a top corner or bottom corner, so I could do both and take the shorter (resulting in a little more accuracy than taking the longer). > For extremely wide shapes (> 180 degrees) distErrPct is not used correctly > -- > > Key: LUCENE-4550 > URL: https://issues.apache.org/jira/browse/LUCENE-4550 > Project: Lucene - Core > Issue Type: Bug > Components: modules/spatial >Affects Versions: 4.0 >Reporter: David Smiley >Priority: Minor > > When a shape is given to a PrefixTreeStrategy (index or query time), it needs > to know how many levels down the prefix tree to go for a target precision > (distErrPct). distErrPct is basically a fraction of the radius of the shape, > defaulting to 2.5% (0.0025). > If the shape presented is extremely wide, > 180 degrees, then the internal > calculations in SpatialArgs.calcDistanceFromErrPct(...) will wrongly measure > the shape's size as having width < 180 degrees, yielding *more* accuracy than > intended. Given that this happens for unrealistic shape sizes and results in > more accuracy, I am flagging this as "minor", but a bug nonetheless. Indeed, > this was discovered as a result of someone using lucene-spatial incorrectly, > not for an actual shape they have. But in the extreme \[erroneous\] case > they had, they had 566k terms (!) generated, when it should have been ~1k > tops. -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
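The corner-to-center asymmetry David describes can be seen with a plain haversine calculation: at high latitudes the meridians converge, so the top and bottom corners of a bounding box sit at different great-circle distances from its center. This sketch (a standalone illustration with a made-up bbox, not the SpatialArgs.calcDistanceFromErrPct code) computes both and takes the shorter, as the comment proposes:

```java
// Standalone illustration of the corner-to-center distance asymmetry for a
// high-latitude bounding box. The bbox coordinates are hypothetical; this is
// not the lucene-spatial implementation.
public class BboxCornerDistanceDemo {
    static final double EARTH_RADIUS_KM = 6371.0;

    // Great-circle (haversine) distance between two lat/lon points, in km.
    static double haversine(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                 + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                 * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a));
    }

    public static void main(String[] args) {
        // Hypothetical wide bbox reaching toward the pole.
        double minLat = 40, maxLat = 80, minLon = -60, maxLon = 60;
        double ctrLat = (minLat + maxLat) / 2, ctrLon = (minLon + maxLon) / 2;
        // Distances from a top corner vs a bottom corner to the bbox center
        // differ because meridians converge toward the pole.
        double topCorner = haversine(maxLat, minLon, ctrLat, ctrLon);
        double bottomCorner = haversine(minLat, minLon, ctrLat, ctrLon);
        // Take the shorter of the two, per the comment, for a bit more accuracy.
        double radius = Math.min(topCorner, bottomCorner);
        System.out.println(topCorner != bottomCorner && radius <= topCorner);
    }
}
```

Using the shorter distance as the shape's effective radius yields a slightly finer (more accurate) prefix-tree level than using the longer one would.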
[jira] [Updated] (LUCENE-4550) For extremely wide shapes (> 180 degrees) distErrPct is not used correctly
[ https://issues.apache.org/jira/browse/LUCENE-4550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Smiley updated LUCENE-4550: - Description: When a shape is given to a PrefixTreeStrategy (index or query time), it needs to know how many levels down the prefix tree to go for a target precision (distErrPct). distErrPct is basically a fraction of the radius of the shape, defaulting to 2.5% (0.0025). If the shape presented is extremely wide, > 180 degrees, then the internal calculations in SpatialArgs.calcDistanceFromErrPct(...) will wrongly measure the shape's size as having width < 180 degrees, yielding *more* accuracy than intended. Given that this happens for unrealistic shape sizes and results in more accuracy, I am flagging this as "minor", but a bug nonetheless. Indeed, this was discovered as a result of someone using lucene-spatial incorrectly, not for an actual shape they have. But in the extreme \[erroneous\] case they had, they had 566k terms (!) generated, when it should have been ~1k tops. was: When a shape is given to a PrefixTreeStrategy (index or query time), it needs to know how many levels down the prefix tree to go for a target precision (distErrPct). distErrPct is basically a fraction of the radius of the shape, defaulting to 2.5% (0.0025). If the shape presented is extremely wide, > 180 degrees, then the internal calculations in SpatialArgs.calcDistanceFromErrPct(...) will wrongly measure the shape's size as having width < 180 degrees, yielding *more* accuracy than intended. Given that this happens for unrealistic shape sizes and results in more accuracy, I am flagging this as "minor", but a bug nonetheless. Indeed, this was discovered as a result of someone using lucene-spatial incorrectly, not for an actual shape they have. But in the extreme [erroneous] case they had, they had 566k terms (!) generated, when it should have been ~1k tops. 
> For extremely wide shapes (> 180 degrees) distErrPct is not used correctly > -- > > Key: LUCENE-4550 > URL: https://issues.apache.org/jira/browse/LUCENE-4550 > Project: Lucene - Core > Issue Type: Bug > Components: modules/spatial >Affects Versions: 4.0 >Reporter: David Smiley >Priority: Minor > > When a shape is given to a PrefixTreeStrategy (index or query time), it needs > to know how many levels down the prefix tree to go for a target precision > (distErrPct). distErrPct is basically a fraction of the radius of the shape, > defaulting to 2.5% (0.0025). > If the shape presented is extremely wide, > 180 degrees, then the internal > calculations in SpatialArgs.calcDistanceFromErrPct(...) will wrongly measure > the shape's size as having width < 180 degrees, yielding *more* accuracy than > intended. Given that this happens for unrealistic shape sizes and results in > more accuracy, I am flagging this as "minor", but a bug nonetheless. Indeed, > this was discovered as a result of someone using lucene-spatial incorrectly, > not for an actual shape they have. But in the extreme \[erroneous\] case > they had, they had 566k terms (!) generated, when it should have been ~1k > tops. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (LUCENE-4550) For extremely wide shapes (> 180 degrees) distErrPct is not used correctly
David Smiley created LUCENE-4550: Summary: For extremely wide shapes (> 180 degrees) distErrPct is not used correctly Key: LUCENE-4550 URL: https://issues.apache.org/jira/browse/LUCENE-4550 Project: Lucene - Core Issue Type: Bug Components: modules/spatial Affects Versions: 4.0 Reporter: David Smiley Priority: Minor When a shape is given to a PrefixTreeStrategy (index or query time), it needs to know how many levels down the prefix tree to go for a target precision (distErrPct). distErrPct is basically a fraction of the radius of the shape, defaulting to 2.5% (0.0025). If the shape presented is extremely wide, > 180 degrees, then the internal calculations in SpatialArgs.calcDistanceFromErrPct(...) will wrongly measure the shape's size as having width < 180 degrees, yielding *more* accuracy than intended. Given that this happens for unrealistic shape sizes and results in more accuracy, I am flagging this as "minor", but a bug nonetheless. Indeed, this was discovered as a result of someone using lucene-spatial incorrectly, not for an actual shape they have. But in the extreme [erroneous] case they had, they had 566k terms (!) generated, when it should have been ~1k tops. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-752) Allow better Field Compression options
[ https://issues.apache.org/jira/browse/SOLR-752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13494042#comment-13494042 ] Pieter commented on SOLR-752: - Doesn't the new Lucene 4.1 field compression (LUCENE-4226 if I am right) tackle this? > Allow better Field Compression options > -- > > Key: SOLR-752 > URL: https://issues.apache.org/jira/browse/SOLR-752 > Project: Solr > Issue Type: Improvement >Reporter: Grant Ingersoll >Priority: Minor > Attachments: compressed_field.patch, compressedtextfield.patch > > > See http://lucene.markmail.org/message/sd4mgwud6caevb35?q=compression > It would be good if Solr handled field compression outside of Lucene's > Field.COMPRESS capabilities, since those capabilities are less than ideal > when it comes to control over compression. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: Failing tests aka. "what's the point of running them?"
+1 I mean, yes, I would like to see any test failures addressed quickly, but for any tests that fail chronically, it makes sense to both disable/ignore them as well as make sure that "blocker" Jira's get filed for them. Personally, I'd like to see Jira's for all test failure errors so that people can easily search Jira for any failure message, hang, etc., including reasonably detailed narrative to explain how the failure occurred and how it was tracked down and the nature of the fix so that future failures can be fixed more promptly and by more people. Leave enough info that less senior community members can begin to learn what it takes to fix test failures. As things stand, I don't have a ghost of a chance at looking at any of these test failures - the whole test infrastructure is a black box with such complexity that I can fathom only the simplest of tests. If I have difficulty with this, maybe there are others who would benefit as well from a greater sharing of the expertise needed to track down these chronic and seemingly mysterious failures. I mean, the expertise to even LOOK at some of these failures is in the heads and hands of too small a set of individuals. And maybe the list of failure modes is also indicative of the lack of a rich enough test infrastructure at the application level, especially when dealing with timing issues, let alone the vagaries of Java and individual JVM idiosyncrasies. The irony is that sometimes we spend more time trying to get the tests to pass on timing issues than to put more stress on the code to expose more timing problems. Oh, and if some JVM's are failing chronically, tag those JVM's as "unsupported" until sufficient testing and testing expertise is available to get things fixed. File detailed Jira's as well, as above, except NOT as blockers. 
I mean, let's get the chronic failures under control on the "main" supported platforms before expanding the supported environments - "supported but with significant and chronic test failures" should not be supported. -- Jack Krupansky -Original Message- From: Simon Willnauer Sent: Friday, November 09, 2012 2:36 AM To: dev@lucene.apache.org Subject: Failing tests aka. "what's the point of running them?" hey folks, I know yonik and mark had a long time power outage so I don't want to blame anybody here but we need to fix those test failures. Really when you look at those jenkins jobs: http://jenkins.sd-datasolutions.de/job/Lucene-Solr-trunk-Windows/ http://jenkins.sd-datasolutions.de/job/Lucene-Solr-4.x-Linux/ https://builds.apache.org/computer/lucene/ https://builds.apache.org/job/Lucene-Solr-NightlyTests-4.x/ its really funny if that'd be a joke but it isn't. If I'd be a new contributor I'd be scared as sh** when I subscribe to the mailing list. It's also not a good advertisement for us either. Yet, even further it makes me miss failures I might have caused since the amount of failure mails don't encourage to look at them since its the same tests that fail over and over again no matter what code I commit. I can already hear somebody saying "why don't you fix it" - well fair enough but this project is massive and we are a large committer base and I don't see myself fix the code I have never ever touched. Anyhow, I really ask myself what is the point of running these tests, specifically the solr ones, if they fail over and over again and nobody cares? Even if folks care they don't get fixed and this project has more committers than yonik and mark. Its really a bad sign if we are at the point where we rely on 2 people to fix tests on a stable branch... I'd really like to hear how people want to address this, I mean it would be just fair to disable the tests until somebody has the patience / time to fix them; we can / should make them blockers for a release. 
Really a jenkins mail should be the exception not the rule...I also think if the FreeBSD jenkins black hole stuff is a problem for the tests then lets add a @BlackHoleProne annotation and only run that on linux? I really don't care how we fix it but if we don't have a solution by the end of next week I will add @Ignore to all of them that failed in the last 2 weeks. Sorry I got so frustrated about this - this is really bad press here! simon - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4549) Allow variable buffer size on BufferedIndexOutput
[ https://issues.apache.org/jira/browse/LUCENE-4549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13494036#comment-13494036 ] Adrien Grand commented on LUCENE-4549: -- I think you unintentionally always enabled rate limiting in the test case. {code} -if (rarely(random)) { +if (rarely(random) || true) { {code} > Allow variable buffer size on BufferedIndexOutput > -- > > Key: LUCENE-4549 > URL: https://issues.apache.org/jira/browse/LUCENE-4549 > Project: Lucene - Core > Issue Type: Improvement > Components: core/store >Affects Versions: 4.0 >Reporter: Simon Willnauer >Assignee: Simon Willnauer >Priority: Minor > Fix For: 4.1, 5.0 > > Attachments: LUCENE-4549.patch > > > BufferedIndexInput allows to set the buffersize but BufferedIndexOutput > doesn't this could be useful for optimizations related to LUCENE-4537. We > should make the apis here consistent. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-3274) ZooKeeper related SolrCloud problems
[ https://issues.apache.org/jira/browse/SOLR-3274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13494029#comment-13494029 ] Jay Hacker commented on SOLR-3274: -- Not sure if it's the same problem, but I have seen similar issues with 4.0.0 release. I get errors like: {code} ClusterState says we are the leader, but locally we don't think so There was a problem finding the leader in zk forwarding update to http://solr83:4000/solr/main/ failed - retrying ... Cannot open channel to 3 at election address solr84/X.X.X.X:5002 Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect {code} I'm running zookeeper embedded, and the problem turns out to be long garbage collection pauses. During a stop-the-world collection, zookeeper times out. It's especially bad if the system has to page in a bunch of memory from disk. This would explain why things run fine for a while, until memory fills up and you need to do a big GC. This is quite repeatable for us; just index until memory is pretty full, wait for a long GC or trigger one manually with VisualVM. You can try different garbage collectors or specifying maximum pause times (I've had some luck with {{-XX:+UseConcMarkSweepGC}} ), but the best solution may be to run zookeeper in an independent JVM. > ZooKeeper related SolrCloud problems > > > Key: SOLR-3274 > URL: https://issues.apache.org/jira/browse/SOLR-3274 > Project: Solr > Issue Type: Bug > Components: SolrCloud >Affects Versions: 4.0-ALPHA > Environment: Any >Reporter: Per Steffensen >Assignee: Mark Miller > > Same setup as in SOLR-3273. Well if I have to tell the entire truth we have 7 > Solr servers, running 28 slices of the same collection (collA) - all slices > have one replica (two shards all in all - leader + replica) - 56 cores all in > all (8 shards on each solr instance). But anyways... 
> Besides the problem reported in SOLR-3273, the system seems to run fine under > high load for several hours, but eventually errors like the ones shown below > start to occur. I might be wrong, but they all seem to indicate some kind of > unstability in the collaboration between Solr and ZooKeeper. I have to say > that I havnt been there to check ZooKeeper "at the moment where those > exception occur", but basically I dont believe the exceptions occur because > ZooKeeper is not running stable - at least when I go and check ZooKeeper > through other "channels" (e.g. my eclipse ZK plugin) it is always accepting > my connection and generally seems to be doing fine. > Exception 1) Often the first error we see in solr.log is something like this > {code} > Mar 22, 2012 5:06:43 AM org.apache.solr.common.SolrException log > SEVERE: org.apache.solr.common.SolrException: Cannot talk to ZooKeeper - > Updates are disabled. > at > org.apache.solr.update.processor.DistributedUpdateProcessor.zkCheck(DistributedUpdateProcessor.java:678) > at > org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:250) > at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:140) > at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:80) > at > org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:59) > at > org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129) > at org.apache.solr.core.SolrCore.execute(SolrCore.java:1540) > at > org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:407) > at > org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:256) > at > org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) > at > org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399) > at > org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216) > at > 
org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182) > at > org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766) > at > org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450) > at > org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230) > at > org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114) > at > org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152) > at org.mortbay.jetty.Server.handle(Server.java:326) > at > org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542) > at > org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpCo
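Jay's diagnosis above - stop-the-world GC pauses long enough to exceed the ZooKeeper session timeout - can be confirmed without a profiler using a simple pause watchdog: a thread sleeps for a fixed interval and measures how much longer it actually took to wake up. A gap approaching zkClientTimeout would explain the "ClusterState says we are the leader, but locally we don't think so" errors. This is a generic, standalone sketch (the interval and threshold are arbitrary; it is not part of Solr or ZooKeeper):

```java
// Sketch of a stop-the-world pause watchdog. The thread requests a fixed
// sleep; anything beyond the requested interval on wakeup is scheduler delay
// or a GC pause. In production you would run this continuously and log a
// warning whenever the gap nears the ZooKeeper session timeout.
public class PauseWatchdog {
    public static void main(String[] args) throws InterruptedException {
        final long intervalMs = 100; // arbitrary sampling interval
        long maxGapMs = 0;
        long last = System.nanoTime();
        for (int i = 0; i < 10; i++) {
            Thread.sleep(intervalMs);
            long now = System.nanoTime();
            // Extra time beyond the requested sleep, in milliseconds.
            long gapMs = (now - last) / 1_000_000 - intervalMs;
            maxGapMs = Math.max(maxGapMs, gapMs);
            last = now;
        }
        // A healthy JVM shows gaps of a few ms; a long GC pause shows up as a
        // gap of seconds, long enough to expire an embedded ZK session.
        System.out.println(maxGapMs >= 0);
    }
}
```

Running ZooKeeper in its own JVM, as Jay suggests, keeps Solr's heap pressure and GC pauses from expiring the ZK session in the first place.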
Re: Failing tests aka. "what's the point of running them?"
This is a recurring discussion -- the latest one (I participated in) was attached to a Jira issue about excluding those frequently failing tests from the build (and leaving a single build on jenkins that would _not_ sent e-mails to the list for those who are interested in those failures). This was met with mixed feelings and I dropped the subject. > I want to know if anybody is even looking at the test failures. And this was my primary concern... Dawid - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-4549) Allow variable buffer size on BufferedIndexOutput
[ https://issues.apache.org/jira/browse/LUCENE-4549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer updated LUCENE-4549: Attachment: LUCENE-4549.patch here is a patch... > Allow variable buffer size on BufferedIndexOutput > -- > > Key: LUCENE-4549 > URL: https://issues.apache.org/jira/browse/LUCENE-4549 > Project: Lucene - Core > Issue Type: Improvement > Components: core/store >Affects Versions: 4.0 >Reporter: Simon Willnauer >Assignee: Simon Willnauer >Priority: Minor > Fix For: 4.1, 5.0 > > Attachments: LUCENE-4549.patch > > > BufferedIndexInput allows setting the buffer size, but BufferedIndexOutput > doesn't; being able to do so could be useful for optimizations related to > LUCENE-4537. We should make the APIs here consistent. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
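I haven't inspected the attached patch; purely to illustrate the API shape the issue asks for (buffer size chosen per instance, as BufferedIndexInput already allows, instead of a fixed constant), here is a stdlib-only sketch. All names are hypothetical, not Lucene's:

```java
import java.io.IOException;
import java.io.OutputStream;

// Hypothetical sketch of the proposed API shape: the write buffer size is a
// constructor parameter instead of a hard-coded constant.
public class SizedBufferedOutput extends OutputStream {
    static final int DEFAULT_BUFFER_SIZE = 16384;

    private final OutputStream delegate;
    private final byte[] buffer;
    private int used;

    public SizedBufferedOutput(OutputStream delegate) {
        this(delegate, DEFAULT_BUFFER_SIZE);
    }

    public SizedBufferedOutput(OutputStream delegate, int bufferSize) {
        if (bufferSize <= 0) {
            throw new IllegalArgumentException("bufferSize must be > 0: " + bufferSize);
        }
        this.delegate = delegate;
        this.buffer = new byte[bufferSize];
    }

    @Override public void write(int b) throws IOException {
        if (used == buffer.length) flush(); // spill when the buffer fills
        buffer[used++] = (byte) b;
    }

    @Override public void flush() throws IOException {
        delegate.write(buffer, 0, used);
        used = 0;
        delegate.flush();
    }
}
```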
[jira] [Assigned] (LUCENE-4549) Allow variable buffer size on BufferedIndexOutput
[ https://issues.apache.org/jira/browse/LUCENE-4549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer reassigned LUCENE-4549: --- Assignee: Simon Willnauer > Allow variable buffer size on BufferedIndexOutput > -- > > Key: LUCENE-4549 > URL: https://issues.apache.org/jira/browse/LUCENE-4549 > Project: Lucene - Core > Issue Type: Improvement > Components: core/store >Affects Versions: 4.0 >Reporter: Simon Willnauer >Assignee: Simon Willnauer >Priority: Minor > Fix For: 4.1, 5.0 > > Attachments: LUCENE-4549.patch > > > BufferedIndexInput allows to set the buffersize but BufferedIndexOutput > doesn't this could be useful for optimizations related to LUCENE-4537. We > should make the apis here consistent. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[SOLR] RFC - Contributing a FrequentSearchTerm component ...
Hi folks, I'm now finishing a SOLR project for one of my customers (replacing Microsoft FAST server with SOLR) and got permission to contribute our improvements. The most interesting thing is a "FrequentSearchTerm" component which allows analyzing the user-supplied search queries in real-time
+) it keeps track of the last queries per core using a LIFO buffer (so we have an upper limit on memory consumption)
+) per query entry we keep track of the number of invocations, the average number of result documents and the average execution time
+) we allow for custom searches across the frequent search terms using the MVEL expression language (see http://mvel.codehaus.org)
++) find all queries which did not yield any results - 'meanHits==0'
++) find all "iPhone" queries - 'searchTerm.contains("iphone") || searchTerm.contains("i-phone")'
++) find all long-running "iPhone" queries - '(searchTerm.contains("iphone") || searchTerm.contains("i-phone")) && meanTime>50'
+) GUI: we have a JSP page which allows access to the frequent search terms
+) there is also an XML/CSV export we use to display the 50 most frequently used search queries in real-time
We use this component
+) to get input for QA regarding frequently used search terms
+) to find strange queries, e.g. queries returning no or too many results, e.g. caused by WordDelimiterFilter
+) to keep our management happy ... :-)
So the question is - is the community interested in such a contribution? If yes then I need to spend some time to improve the code from "industrial quality" to "open source quality" including documentation ... you know what I mean :-) Thanks in advance, Siegfried Goeschl PS: Not sure if the name "Frequent Search Term Component" is perfectly suitable as it was taken from FAST - suggestions welcome - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
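The bookkeeping described above (bounded memory, per-query invocation count and running means) can be sketched in a few lines. This is my own illustration, not Siegfried's code: I use an LRU-bounded map rather than a literal LIFO buffer, and all names are invented:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hedged sketch of the per-query statistics described in the proposal.
// A bounded, access-ordered map caps memory use; each entry keeps running
// means via incremental updates (m += (x - m) / n).
public class FrequentSearchTerms {
    public static final class Stats {
        public long invocations;
        public double meanHits;
        public double meanTimeMillis;
    }

    private final Map<String, Stats> entries;

    public FrequentSearchTerms(final int maxEntries) {
        // access-order LinkedHashMap evicts the least-recently-used query once
        // maxEntries is exceeded - the "upper limit of memory consumption"
        this.entries = new LinkedHashMap<String, Stats>(16, 0.75f, true) {
            @Override protected boolean removeEldestEntry(Map.Entry<String, Stats> eldest) {
                return size() > maxEntries;
            }
        };
    }

    public void record(String query, long hits, long timeMillis) {
        Stats s = entries.computeIfAbsent(query, q -> new Stats());
        s.invocations++;
        s.meanHits += (hits - s.meanHits) / s.invocations;
        s.meanTimeMillis += (timeMillis - s.meanTimeMillis) / s.invocations;
    }

    public Stats get(String query) { return entries.get(query); }
}
```

An MVEL filter like 'meanHits==0' would then be evaluated against each Stats entry when scanning the map.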
Re: Failing tests aka. "what's the point of running them?"
On Fri, Nov 9, 2012 at 5:36 AM, Simon Willnauer wrote: > hey folks, > > I know yonik and mark had a long time power outage so I don't want to > blame anybody here but we need to fix those test failures. > Really when you look at those jenkins jobs: > > http://jenkins.sd-datasolutions.de/job/Lucene-Solr-trunk-Windows/ > http://jenkins.sd-datasolutions.de/job/Lucene-Solr-4.x-Linux/ > https://builds.apache.org/computer/lucene/ > https://builds.apache.org/job/Lucene-Solr-NightlyTests-4.x/ > > its really funny if that'd be a joke but it isn't. If I'd be a new > contributor I'd be scared as sh** when I subscribe to the mailing > list. It's also not a good advertisement for us either. Yet, even > further it makes me miss failures I might have caused since the amount > of failure mails don't encourage to look at them since its the same > tests that fail over and over again no matter what code I commit. I > can already hear somebody saying "why don't you fix it" - well fair > enough but this project is massive and we are a large committer base > and I don't see myself fix the code I have never ever touched. Anyhow, > I really ask myself what is the point of running these tests, > specifically the solr ones, if they fail over and over again and nobody > cares? I want to know if anybody is even looking at the test failures. At some point I began filtering solr test failures to my email spam folder via 3 gmail rules. I don't run solr tests locally anymore either because of the huge false failure rate. I have to do these things because I don't want to miss lucene failures in the noise of this stuff. For now I disabled the solr tests in jenkins jobs. This shouldn't be controversial: they haven't passed in over 15 days. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: [JENKINS] Lucene-Solr-Tests-trunk-java7 - Build # 3368 - Still Failing
The bug here (in my opinion) is that ThaiWordFilter is a filter at all (it should be a tokenizer). Like WDF and other filters that really should be tokenizers, it doesn't expect, and can't correctly handle, arbitrary input (e.g. input that's been through a shingle filter...) Another problem is that offsetsAreCorrect=false allows for offsets to "go backwards" in the stream. But this leniency is a false sense of security, because if you add a shingle filter then you have a situation like this where startOffset > endOffset. On Fri, Nov 9, 2012 at 7:31 AM, Apache Jenkins Server wrote: > Error Message: > startOffset must be non-negative, and endOffset must be >= startOffset, > startOffset=5,endOffset=3 > > Stack Trace: > java.lang.IllegalAr> [junit4:junit4] 2> Exception from random analyzer: > [junit4:junit4] 2> charfilters= > [junit4:junit4] 2> tokenizer= > [junit4:junit4] 2> > org.apache.lucene.analysis.core.WhitespaceTokenizer(LUCENE_50, > org.apache.lucene.analysis.core.TestRandomChains$CheckThatYouDidntReadAnythingReaderWrapper@7f4aaa58) > [junit4:junit4] 2> filters= > [junit4:junit4] 2> > org.apache.lucene.analysis.miscellaneous.LengthFilter(false, > org.apache.lucene.analysis.ValidatingTokenFilter@1, -30, 69) > [junit4:junit4] 2> > org.apache.lucene.analysis.shingle.ShingleFilter(org.apache.lucene.analysis.ValidatingTokenFilter@37caea, > tpzabzsxye) > [junit4:junit4] 2> > org.apache.lucene.analysis.th.ThaiWordFilter(LUCENE_50, > org.apache.lucene.analysis.ValidatingTokenFilter@37caea) > [junit4:junit4] 2> > org.apache.lucene.analysis.shingle.ShingleFilter(org.apache.lucene.analysis.ValidatingTokenFilter@37caea) > [junit4:junit4] 2> offsetsAreCorrect=false - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
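The failure mode Robert describes can be restated compactly. This is a standalone illustration, not Lucene's code: `valid` mirrors the invariant from the error message (which `OffsetAttributeImpl.setOffset` enforces), and `shingleSpan` mimics how a shingle combines the start of its first token with the end of its last:

```java
public class OffsetCheck {
    // The invariant from the failure message: startOffset must be
    // non-negative, and endOffset must be >= startOffset.
    static boolean valid(int startOffset, int endOffset) {
        return startOffset >= 0 && endOffset >= startOffset;
    }

    // A shingle takes the start offset of its first token and the end offset
    // of its last. If a lenient (offsetsAreCorrect=false) filter let offsets
    // go backwards, each token can pass the check individually while the
    // combined span fails it.
    static int[] shingleSpan(int[] starts, int[] ends) {
        return new int[] { starts[0], ends[ends.length - 1] };
    }
}
```

With two tokens whose offsets go backwards, say spans (5,7) then (3,3), the shingle's span is (5,3): exactly the startOffset=5,endOffset=3 in the stack trace.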
[JENKINS] Lucene-Solr-Tests-trunk-java7 - Build # 3368 - Still Failing
Build: https://builds.apache.org/job/Lucene-Solr-Tests-trunk-java7/3368/ 1 tests failed. REGRESSION: org.apache.lucene.analysis.core.TestRandomChains.testRandomChains Error Message: startOffset must be non-negative, and endOffset must be >= startOffset, startOffset=5,endOffset=3 Stack Trace: java.lang.IllegalArgumentException: startOffset must be non-negative, and endOffset must be >= startOffset, startOffset=5,endOffset=3 at __randomizedtesting.SeedInfo.seed([FE0FDF1A0D2C367D:C3EEF67B4A3E2BBD]:0) at org.apache.lucene.analysis.tokenattributes.OffsetAttributeImpl.setOffset(OffsetAttributeImpl.java:43) at org.apache.lucene.analysis.shingle.ShingleFilter.incrementToken(ShingleFilter.java:323) at org.apache.lucene.analysis.ValidatingTokenFilter.incrementToken(ValidatingTokenFilter.java:78) at org.apache.lucene.analysis.BaseTokenStreamTestCase.checkAnalysisConsistency(BaseTokenStreamTestCase.java:632) at org.apache.lucene.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:542) at org.apache.lucene.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:443) at org.apache.lucene.analysis.core.TestRandomChains.testRandomChains(TestRandomChains.java:859) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:601) at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1559) at com.carrotsearch.randomizedtesting.RandomizedRunner.access$600(RandomizedRunner.java:79) at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:737) at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:773) at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:787) at 
org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50) at org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:51) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48) at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:358) at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:782) at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:442) at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:746) at com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:648) at com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:682) at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:693) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45) at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:43) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70) at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:55) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
[jira] [Comment Edited] (SOLR-2549) DIH LineEntityProcessor support for delimited & fixed-width files
[ https://issues.apache.org/jira/browse/SOLR-2549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13493891#comment-13493891 ] zakaria benzidalmal edited comment on SOLR-2549 at 11/9/12 10:38 AM: - patch for solr 4.0.0 available #v400-SOLR-2549.patch was (Author: zakibenz): patch for solr 4.0.0 available > DIH LineEntityProcessor support for delimited & fixed-width files > - > > Key: SOLR-2549 > URL: https://issues.apache.org/jira/browse/SOLR-2549 > Project: Solr > Issue Type: Improvement > Components: contrib - DataImportHandler >Affects Versions: 4.0-ALPHA >Reporter: James Dyer >Priority: Minor > Attachments: SOLR-2549.patch, SOLR-2549.patch, SOLR-2549.patch, > SOLR-2549.patch, v400-SOLR-2549.patch > > > Provides support for Fixed Width and Delimited Files without needing to write > a Transformer. > The following xml properties are supported with this version of > LineEntityProcessor: > For fixed width files: > - colDef[#] > For Delimited files: > - fieldDelimiterRegex > - firstLineHasFieldnames > - delimitedFieldNames > - delimitedFieldTypes > These properties are described in the api documentation. See patch. > When combined with the cache improvements from SOLR-2382 this allows you to > join a flat file entity with other entities (sql, etc). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
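For readers following along, the delimited-file properties listed in the description would presumably be wired up along these lines. This is a hypothetical data-config sketch built only from the property names in the issue text; I have not run the patch, and the attribute values are illustrative:

```xml
<dataConfig>
  <dataSource type="FileDataSource" encoding="UTF-8"/>
  <document>
    <!-- hypothetical example: attribute names come from the issue
         description, values are made up -->
    <entity name="lines"
            processor="LineEntityProcessor"
            url="/data/products.csv"
            fieldDelimiterRegex=","
            firstLineHasFieldnames="false"
            delimitedFieldNames="id,name,price"
            delimitedFieldTypes="string,string,float"/>
  </document>
</dataConfig>
```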
[jira] [Comment Edited] (SOLR-2549) DIH LineEntityProcessor support for delimited & fixed-width files
[ https://issues.apache.org/jira/browse/SOLR-2549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13493891#comment-13493891 ] zakaria benzidalmal edited comment on SOLR-2549 at 11/9/12 10:38 AM: - patch for solr 4.0.0 available was (Author: zakibenz): patch for solr 4.0.0 > DIH LineEntityProcessor support for delimited & fixed-width files > - > > Key: SOLR-2549 > URL: https://issues.apache.org/jira/browse/SOLR-2549 > Project: Solr > Issue Type: Improvement > Components: contrib - DataImportHandler >Affects Versions: 4.0-ALPHA >Reporter: James Dyer >Priority: Minor > Attachments: SOLR-2549.patch, SOLR-2549.patch, SOLR-2549.patch, > SOLR-2549.patch, v400-SOLR-2549.patch > > > Provides support for Fixed Width and Delimited Files without needing to write > a Transformer. > The following xml properties are supported with this version of > LineEntityProcessor: > For fixed width files: > - colDef[#] > For Delimited files: > - fieldDelimiterRegex > - firstLineHasFieldnames > - delimitedFieldNames > - delimitedFieldTypes > These properties are described in the api documentation. See patch. > When combined with the cache improvements from SOLR-2382 this allows you to > join a flat file entity with other entities (sql, etc). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-2549) DIH LineEntityProcessor support for delimited & fixed-width files
[ https://issues.apache.org/jira/browse/SOLR-2549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zakaria benzidalmal updated SOLR-2549: -- Attachment: v400-SOLR-2549.patch patch for solr 4.0.0 > DIH LineEntityProcessor support for delimited & fixed-width files > - > > Key: SOLR-2549 > URL: https://issues.apache.org/jira/browse/SOLR-2549 > Project: Solr > Issue Type: Improvement > Components: contrib - DataImportHandler >Affects Versions: 4.0-ALPHA >Reporter: James Dyer >Priority: Minor > Attachments: SOLR-2549.patch, SOLR-2549.patch, SOLR-2549.patch, > SOLR-2549.patch, v400-SOLR-2549.patch > > > Provides support for Fixed Width and Delimited Files without needing to write > a Transformer. > The following xml properties are supported with this version of > LineEntityProcessor: > For fixed width files: > - colDef[#] > For Delimited files: > - fieldDelimiterRegex > - firstLineHasFieldnames > - delimitedFieldNames > - delimitedFieldTypes > These properties are described in the api documentation. See patch. > When combined with the cache improvements from SOLR-2382 this allows you to > join a flat file entity with other entities (sql, etc). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Failing tests aka. "what's the point of running them?"
hey folks, I know yonik and mark had a long power outage so I don't want to blame anybody here, but we need to fix those test failures. Really, when you look at those jenkins jobs: http://jenkins.sd-datasolutions.de/job/Lucene-Solr-trunk-Windows/ http://jenkins.sd-datasolutions.de/job/Lucene-Solr-4.x-Linux/ https://builds.apache.org/computer/lucene/ https://builds.apache.org/job/Lucene-Solr-NightlyTests-4.x/ it would be really funny if it were a joke, but it isn't. If I were a new contributor I'd be scared as sh** when I subscribe to the mailing list. It's not a good advertisement for us either. Even further, it makes me miss failures I might have caused, since the amount of failure mails doesn't encourage looking at them - it's the same tests that fail over and over again no matter what code I commit. I can already hear somebody saying "why don't you fix it" - well, fair enough, but this project is massive, we are a large committer base, and I don't see myself fixing code I have never ever touched. Anyhow, I really ask myself what the point of running these tests is, specifically the solr ones, if they fail over and over again and nobody cares? Even if folks care, they don't get fixed, and this project has more committers than yonik and mark. It's really a bad sign if we are at the point where we rely on 2 people to fix tests on a stable branch... I'd really like to hear how people want to address this. I mean, it would only be fair to disable the tests until somebody has the patience / time to fix them; we can / should make them blockers for a release. Really, a jenkins mail should be the exception, not the rule... I also think if the FreeBSD jenkins black hole stuff is a problem for the tests then let's add a @BlackHoleProne annotation and only run that on linux? I really don't care how we fix it, but if we don't have a solution by the end of next week I will add @Ignore to all of them that failed in the last 2 weeks. 
Sorry I got so frustrated about this - this is really bad press here! simon - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-3785) Cluster-state inconsistent
[ https://issues.apache.org/jira/browse/SOLR-3785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13493888#comment-13493888 ] Per Steffensen commented on SOLR-3785: -- Well, I believe the entire thing with the Overseer is a bad idea. It requires that at least one Solr is running before you can trust the state-descriptions in ZK - even if this particular "issue" SOLR-3785 is solved using the Overseer. We have clients that use the state-descriptions (through CloudSolrServer/ZkStateReader) to detect if the Solr cluster is running well enough to use it. If all Solrs are down I believe it cannot be seen from the state (you can check live-nodes, and if no Solrs are running you know that you can't trust it). I think you should remove the Overseer entirely and modify ZkStateReader to be able to, single-handedly, look at the ZK state and calculate the correct ClusterState. E.g. shard-state could be maintained by the Solr running the shard (as it is today), but as an ephemeral node that disappears when the Solr is not running. ZkStateReader should have logic that, when calculating a shard-state, looks at this ephemeral node, but if it is missing assumes "down"-state. Regards, Per Steffensen > Cluster-state inconsistent > -- > > Key: SOLR-3785 > URL: https://issues.apache.org/jira/browse/SOLR-3785 > Project: Solr > Issue Type: Bug > Components: SolrCloud >Affects Versions: 4.0 > Environment: Self-build Solr release built on Apache Solr revision > 1355667 from 4.x branch >Reporter: Per Steffensen > Attachments: SOLR-3785.patch > > > Information in CloudSolrServer.getZkStateReader().getCloudState() (called > cloudState below) seems to be inconsistent. > I have a Solr running the leader of slice "sliceName" in collection > "collectionName" - no replica to take over. I shut down this Solr, and I want > to detect that there is now no leader active. > I do e.g. 
> {code} > ZkNodeProps leader = cloudState.getLeader(indexName, sliceName); > boolean notActive = (leader == null) || > !leader.containsKey(ZkStateReader.STATE_PROP) || > !leader.get(ZkStateReader.STATE_PROP).equals(ZkStateReader.ACTIVE); > {code} > This does not work. It seems the state of a shard is not changed when this > Solr goes down. > I do e.g. > {code} > ZkNodeProps leader = cloudState.getLeader(indexName, sliceName); > boolean notActive = (leader == null) || > !leader.containsKey(ZkStateReader.STATE_PROP) || > !leader.get(ZkStateReader.STATE_PROP).equals(ZkStateReader.ACTIVE) || > !leader.containsKey(ZkStateReader.NODE_NAME_PROP) || > !cloudState.getLiveNodes().contains(leader.get(ZkStateReader.NODE_NAME_PROP)) > {code} > This works. > It seems like live-nodes of cloudState is updated when Solr goes down, but > that some of the other info available through cloudState is not - e.g. > getLeader(). > This might already have been solved on 4.x branch in a revision later > than 1355667. Then please just tell me - thanks. > Regards, Per Steffensen -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
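The second {code} block above boils down to a pure predicate over a state snapshot. A self-contained restatement with a plain Map standing in for ZkNodeProps and a Set for the live-nodes list (property keys mirror ZkStateReader.STATE_PROP / NODE_NAME_PROP; the string values are my assumptions, not verified against ZkStateReader):

```java
import java.util.Map;
import java.util.Set;

public class LeaderCheck {
    static final String STATE_PROP = "state";
    static final String NODE_NAME_PROP = "node_name";
    static final String ACTIVE = "active";

    // "Not active" if there is no leader entry, its state prop is missing or
    // not "active", or - the extra check that makes the snippet work - the
    // leader's node is no longer in live-nodes.
    static boolean leaderNotActive(Map<String, String> leader, Set<String> liveNodes) {
        return leader == null
            || !ACTIVE.equals(leader.get(STATE_PROP))           // covers a missing key too
            || !liveNodes.contains(leader.get(NODE_NAME_PROP));
    }
}
```

This also makes Per's point concrete: the last clause is load-bearing precisely because the stored shard state can go stale while live-nodes (being ephemeral) does not.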
[jira] [Updated] (SOLR-4055) Remove/Reload the collection will occur the thread safe issue.
[ https://issues.apache.org/jira/browse/SOLR-4055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raintung Li updated SOLR-4055: -- Attachment: patch-4055 the bug patch > Remove/Reload the collection will occur the thread safe issue. > -- > > Key: SOLR-4055 > URL: https://issues.apache.org/jira/browse/SOLR-4055 > Project: Solr > Issue Type: Bug > Components: SolrCloud >Affects Versions: 4.0-ALPHA, 4.0-BETA, 4.0 > Environment: Solr cloud >Reporter: Raintung Li > Attachments: patch-4055 > > > The collectionCmd method of the OverseerCollectionProcessor class has a > thread-safety issue. > The major issue is that the ModifiableSolrParams params instance is handed to > another thread (HttpShardHandler.submit), so modifying a parameter affects > the parameters seen by the other threads. > In collectionCmd, the value is changed via > params.set(CoreAdminParams.CORE, node.getStr(ZkStateReader.CORE_NAME_PROP)); > so the thread sending the HTTP request can see the wrong core name. The > result is that the right core can't be deleted/reloaded. > The easy fix is to clone the ModifiableSolrParams for every request. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
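The race described in the issue, and the "clone per request" fix, can be shown with a plain HashMap standing in for ModifiableSolrParams. This is an illustration with invented names, not the attached patch; whether the patch clones the same way should be checked against patch-4055:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// The buggy loop reuses one mutable params object for every submitted
// request, so by the time a request thread reads "core" it may hold the
// *last* node's core name. The fix snapshots the params per request.
public class ParamsCloning {
    static List<Map<String, String>> submitAll(List<String> coreNames, boolean clonePerRequest) {
        Map<String, String> params = new HashMap<>();
        List<Map<String, String>> submitted = new ArrayList<>(); // stands in for queued requests
        for (String core : coreNames) {
            params.put("core", core); // like params.set(CoreAdminParams.CORE, ...)
            submitted.add(clonePerRequest ? new HashMap<>(params) : params);
        }
        return submitted;
    }
}
```

With sharing, the first queued "request" ends up carrying the last core name; with per-request copies, each carries its own.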
[jira] [Comment Edited] (SOLR-2549) DIH LineEntityProcessor support for delimited & fixed-width files
[ https://issues.apache.org/jira/browse/SOLR-2549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13493871#comment-13493871 ] zakaria benzidalmal edited comment on SOLR-2549 at 11/9/12 10:10 AM: - thanks to james for his help ;) was (Author: zakibenz): thanks to james for his help > DIH LineEntityProcessor support for delimited & fixed-width files > - > > Key: SOLR-2549 > URL: https://issues.apache.org/jira/browse/SOLR-2549 > Project: Solr > Issue Type: Improvement > Components: contrib - DataImportHandler >Affects Versions: 4.0-ALPHA >Reporter: James Dyer >Priority: Minor > Attachments: SOLR-2549.patch, SOLR-2549.patch, SOLR-2549.patch, > SOLR-2549.patch > > > Provides support for Fixed Width and Delimited Files without needing to write > a Transformer. > The following xml properties are supported with this version of > LineEntityProcessor: > For fixed width files: > - colDef[#] > For Delimited files: > - fieldDelimiterRegex > - firstLineHasFieldnames > - delimitedFieldNames > - delimitedFieldTypes > These properties are described in the api documentation. See patch. > When combined with the cache improvements from SOLR-2382 this allows you to > join a flat file entity with other entities (sql, etc). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2549) DIH LineEntityProcessor support for delimited & fixed-width files
[ https://issues.apache.org/jira/browse/SOLR-2549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13493871#comment-13493871 ] zakaria benzidalmal commented on SOLR-2549: --- thanks to james for his help > DIH LineEntityProcessor support for delimited & fixed-width files > - > > Key: SOLR-2549 > URL: https://issues.apache.org/jira/browse/SOLR-2549 > Project: Solr > Issue Type: Improvement > Components: contrib - DataImportHandler >Affects Versions: 4.0-ALPHA >Reporter: James Dyer >Priority: Minor > Attachments: SOLR-2549.patch, SOLR-2549.patch, SOLR-2549.patch, > SOLR-2549.patch > > > Provides support for Fixed Width and Delimited Files without needing to write > a Transformer. > The following xml properties are supported with this version of > LineEntityProcessor: > For fixed width files: > - colDef[#] > For Delimited files: > - fieldDelimiterRegex > - firstLineHasFieldnames > - delimitedFieldNames > - delimitedFieldTypes > These properties are described in the api documentation. See patch. > When combined with the cache improvements from SOLR-2382 this allows you to > join a flat file entity with other entities (sql, etc). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2549) DIH LineEntityProcessor support for delimited & fixed-width files
[ https://issues.apache.org/jira/browse/SOLR-2549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13493870#comment-13493870 ] zakaria benzidalmal commented on SOLR-2549: --- data config example: /> > DIH LineEntityProcessor support for delimited & fixed-width files > - > > Key: SOLR-2549 > URL: https://issues.apache.org/jira/browse/SOLR-2549 > Project: Solr > Issue Type: Improvement > Components: contrib - DataImportHandler >Affects Versions: 4.0-ALPHA >Reporter: James Dyer >Priority: Minor > Attachments: SOLR-2549.patch, SOLR-2549.patch, SOLR-2549.patch, > SOLR-2549.patch > > > Provides support for Fixed Width and Delimited Files without needing to write > a Transformer. > The following xml properties are supported with this version of > LineEntityProcessor: > For fixed width files: > - colDef[#] > For Delimited files: > - fieldDelimiterRegex > - firstLineHasFieldnames > - delimitedFieldNames > - delimitedFieldTypes > These properties are described in the api documentation. See patch. > When combined with the cache improvements from SOLR-2382 this allows you to > join a flat file entity with other entities (sql, etc). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (SOLR-4055) Remove/Reload the collection will occur the thread safe issue.
Raintung Li created SOLR-4055: - Summary: Remove/Reload the collection will occur the thread safe issue. Key: SOLR-4055 URL: https://issues.apache.org/jira/browse/SOLR-4055 Project: Solr Issue Type: Bug Components: SolrCloud Affects Versions: 4.0, 4.0-BETA, 4.0-ALPHA Environment: Solr cloud Reporter: Raintung Li The collectionCmd method of the OverseerCollectionProcessor class has a thread-safety issue. The major issue is that the ModifiableSolrParams params instance is handed to another thread (HttpShardHandler.submit), so modifying a parameter affects the parameters seen by the other threads. In collectionCmd, the value is changed via params.set(CoreAdminParams.CORE, node.getStr(ZkStateReader.CORE_NAME_PROP)); so the thread sending the HTTP request can see the wrong core name. The result is that the right core can't be deleted/reloaded. The easy fix is to clone the ModifiableSolrParams for every request. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-2549) DIH LineEntityProcessor support for delimited & fixed-width files
[ https://issues.apache.org/jira/browse/SOLR-2549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zakaria benzidalmal updated SOLR-2549: -- Attachment: SOLR-2549.patch Fix NPE Bug when escape parameter is not specified. > DIH LineEntityProcessor support for delimited & fixed-width files > - > > Key: SOLR-2549 > URL: https://issues.apache.org/jira/browse/SOLR-2549 > Project: Solr > Issue Type: Improvement > Components: contrib - DataImportHandler >Affects Versions: 4.0-ALPHA >Reporter: James Dyer >Priority: Minor > Attachments: SOLR-2549.patch, SOLR-2549.patch, SOLR-2549.patch, > SOLR-2549.patch > > > Provides support for Fixed Width and Delimited Files without needing to write > a Transformer. > The following xml properties are supported with this version of > LineEntityProcessor: > For fixed width files: > - colDef[#] > For Delimited files: > - fieldDelimiterRegex > - firstLineHasFieldnames > - delimitedFieldNames > - delimitedFieldTypes > These properties are described in the api documentation. See patch. > When combined with the cache improvements from SOLR-2382 this allows you to > join a flat file entity with other entities (sql, etc). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
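I haven't read this SOLR-2549 patch revision; the usual shape of a fix for "NPE when an optional attribute is not specified" is defaulting the attribute before anything calls methods on it. A sketch with invented names (the real default escape character, if any, would be whatever the patch chooses):

```java
import java.util.Map;

public class EscapeDefault {
    static final String DEFAULT_ESCAPE = "\\"; // assumed default, not from the patch

    // Read the optional attribute null-safely and fall back to a default,
    // so downstream code never dereferences a missing value.
    static String escapeOrDefault(Map<String, String> attrs) {
        String escape = attrs.get("escape");
        return escape != null ? escape : DEFAULT_ESCAPE;
    }
}
```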
[jira] [Resolved] (SOLR-4054) delta import of solr4.0 put median data(id of db changed data) to transformer
[ https://issues.apache.org/jira/browse/SOLR-4054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erick Erickson resolved SOLR-4054. -- Resolution: Invalid Please raise this kind of issue on the user's list, see http://lucene.apache.org/solr/discussion.html for info. JIRAs are intended for bugs/enhancements rather than usage issues. > delta import of solr4.0 put median data(id of db changed data) to transformer > - > > Key: SOLR-4054 > URL: https://issues.apache.org/jira/browse/SOLR-4054 > Project: Solr > Issue Type: Bug > Components: contrib - DataImportHandler >Affects Versions: 4.0 > Environment: suse server linux 11 + resin4 >Reporter: xuzheng > > following is my config. when i use delta import in my project, in my resin > java log, > i saw the median data created by deltaQuery was also sent to > SuggestionTransformer before the data created by deltaImportQuery. what i > want is that only the data produced by deltaImportQuery is sent to my transformer. > can anybody explain this or tell me what mistake i have made? >pk="Id" > query="select * from Video" > deltaImportQuery="select * from Video where > Id='${dataimporter.delta.Id}'" > deltaQuery="select Id from Video where 'UpdateTime' > > '${dataimporter.last_index_time}'" > transformer="videosearch.dataimport.SuggestionTransformer"> > ranking="true"/> > -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org