[jira] [Commented] (SOLR-4006) Many tests on Apache Jenkins are failing with lingering threads.

2012-11-09 Thread Dawid Weiss (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13494586#comment-13494586
 ] 

Dawid Weiss commented on SOLR-4006:
---

Hi Mark. Can you send me the detailed spec of this FreeBSD VM you're using and 
an ant line that reproduces this hang? I'll take a look; I'm also curious 
what's happening, in particular with regard to those forever-hung JVMs that 
don't time out (I suppose it's some sort of native socket wait that's causing 
this).

> Many tests on Apache Jenkins are failing with lingering threads.
> 
>
> Key: SOLR-4006
> URL: https://issues.apache.org/jira/browse/SOLR-4006
> Project: Solr
>  Issue Type: Bug
>Reporter: Mark Miller
>
> I think I've tracked this down to being related to the black hole.
> It seems to be a recovery call to a server that is down or something - it's 
> hanging in the connect method even though we are using a connect timeout.
> {noformat}
> Thread[RecoveryThread,5,TGRP-SyncSliceTest]
> java.net.PlainSocketImpl.socketConnect(Native Method)
> java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:327)
> java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:193)
> java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:180)
> java.net.SocksSocketImpl.connect(SocksSocketImpl.java:384)
> java.net.Socket.connect(Socket.java:546)
> {noformat}
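As background on the timeout being discussed: `java.net.Socket` only bounds the TCP handshake when the two-argument `connect(SocketAddress, int)` overload is used; the single-argument form means a timeout of 0, i.e. wait forever. A minimal, self-contained sketch of the bounded overload (this is not the Solr code; a local server is used so it runs deterministically):

```java
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;

public class ConnectTimeoutDemo {
    public static void main(String[] args) throws Exception {
        // A locally bound server so the connect below succeeds deterministically.
        try (ServerSocket server = new ServerSocket(0)) {
            Socket s = new Socket();  // unconnected socket
            // The two-argument connect puts an upper bound on the TCP handshake.
            // The single-argument overload (implicit timeout 0) can block forever
            // against a host that silently drops SYN packets.
            s.connect(new InetSocketAddress("127.0.0.1", server.getLocalPort()), 2000);
            System.out.println("connected=" + s.isConnected());
            s.close();
        }
    }
}
```

The trace above hangs despite such a timeout being set, which is why the native socket wait and the NIO selector change are worth investigating.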

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-4053) metrics - add statistics on searcher/cache warming

2012-11-09 Thread Otis Gospodnetic (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13494574#comment-13494574
 ] 

Otis Gospodnetic commented on SOLR-4053:


Shawn, aren't warming times already in JMX?  I think they are, because I know 
SPM for Solr has nice pie charts and time-series graphs with warmup timings 
broken down, and that must be coming from JMX.

Or perhaps you want the percentiles in JMX?  If so, can't monitoring tools 
calculate that? Shouldn't that be their job?


> metrics - add statistics on searcher/cache warming
> --
>
> Key: SOLR-4053
> URL: https://issues.apache.org/jira/browse/SOLR-4053
> Project: Solr
>  Issue Type: Improvement
>Affects Versions: 5.0
>Reporter: Shawn Heisey
>Priority: Minor
> Fix For: 4.1, 5.0
>
>
> One stat that I rely on is the amount of time that it takes to warm caches 
> and an entire searcher, but unless you turn on INFO logging and write 
> something to parse the logs, you can only see how long the last commit took 
> to warm.  I propose that we use the new metrics capability added in SOLR-1972 
> to give us visibility into historical cache/searcher warming times.
> If I find some time in the near future, I will take a stab at creating a 
> patch, but if someone else has an idea and time, don't wait around for me.
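If the monitoring side were to compute percentiles itself, as suggested in the comment above, a nearest-rank calculation over recorded warmup times is enough. A sketch with made-up timings (the class, helper, and data here are hypothetical, not Solr's metrics API):

```java
import java.util.Arrays;

public class WarmupPercentile {
    // Nearest-rank percentile: the value at rank ceil(p/100 * n) in sorted order.
    static long percentile(long[] sortedMillis, double p) {
        int rank = (int) Math.ceil(p / 100.0 * sortedMillis.length);
        return sortedMillis[Math.max(0, rank - 1)];
    }

    public static void main(String[] args) {
        // Hypothetical searcher warmup times in milliseconds.
        long[] warmupMillis = {120, 95, 310, 200, 180, 150, 90, 400, 110, 130};
        Arrays.sort(warmupMillis);
        System.out.println("p50=" + percentile(warmupMillis, 50));
        System.out.println("p95=" + percentile(warmupMillis, 95));
    }
}
```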




[JENKINS] Lucene-Solr-4.x-Windows (64bit/jdk1.7.0_09) - Build # 1511 - Failure!

2012-11-09 Thread Policeman Jenkins Server
Build: http://jenkins.sd-datasolutions.de/job/Lucene-Solr-4.x-Windows/1511/
Java: 64bit/jdk1.7.0_09 -XX:+UseParallelGC

All tests passed

Build Log:
[...truncated 23892 lines...]
BUILD FAILED
C:\Users\JenkinsSlave\workspace\Lucene-Solr-4.x-Windows\build.xml:62: The 
following error occurred while executing this line:
C:\Users\JenkinsSlave\workspace\Lucene-Solr-4.x-Windows\solr\build.xml:558: The 
following error occurred while executing this line:
C:\Users\JenkinsSlave\workspace\Lucene-Solr-4.x-Windows\solr\common-build.xml:410:
 The following error occurred while executing this line:
C:\Users\JenkinsSlave\workspace\Lucene-Solr-4.x-Windows\solr\contrib\dataimporthandler-extras\build.xml:43:
 The following error occurred while executing this line:
C:\Users\JenkinsSlave\workspace\Lucene-Solr-4.x-Windows\solr\common-build.xml:359:
 The following error occurred while executing this line:
C:\Users\JenkinsSlave\workspace\Lucene-Solr-4.x-Windows\solr\common-build.xml:397:
 The following error occurred while executing this line:
C:\Users\JenkinsSlave\workspace\Lucene-Solr-4.x-Windows\solr\example\build.xml:46:
 Unable to delete file 
C:\Users\JenkinsSlave\workspace\Lucene-Solr-4.x-Windows\solr\example\lib\jetty-continuation-8.1.7.v20120910.jar

Total time: 30 minutes 1 second
Build step 'Invoke Ant' marked build as failure
Archiving artifacts
Recording test results
Description set: Java: 64bit/jdk1.7.0_09 -XX:+UseParallelGC
Email was triggered for: Failure
Sending email for trigger: Failure




[JENKINS] Lucene-Solr-4.x-Linux (32bit/jdk1.7.0_09) - Build # 2313 - Failure!

2012-11-09 Thread Policeman Jenkins Server
Build: http://jenkins.sd-datasolutions.de/job/Lucene-Solr-4.x-Linux/2313/
Java: 32bit/jdk1.7.0_09 -server -XX:+UseG1GC

1 tests failed.
REGRESSION:  
org.apache.lucene.search.suggest.analyzing.AnalyzingSuggesterTest.testRandom

Error Message:
expected: but was:

Stack Trace:
org.junit.ComparisonFailure: expected: but was:
at 
__randomizedtesting.SeedInfo.seed([64C9D03229E9A923:1685F53D98891F50]:0)
at org.junit.Assert.assertEquals(Assert.java:125)
at org.junit.Assert.assertEquals(Assert.java:147)
at 
org.apache.lucene.search.suggest.analyzing.AnalyzingSuggesterTest.testRandom(AnalyzingSuggesterTest.java:710)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1559)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.access$600(RandomizedRunner.java:79)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:737)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:773)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:787)
at 
org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
at 
org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:51)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
at 
org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48)
at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:358)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:782)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:442)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:746)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:648)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:682)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:693)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
at 
org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:43)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70)
at 
org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:55)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:358)
at java.lang.Thread.run(Thread.java:722)




Build Log:
[...truncated 7796 lines...]
[junit4:junit4] Suite: 
org.apache.lucene.search.suggest.analyzing.AnalyzingSuggesterTest
[junit4:junit4]   2> NOTE: reproduce with: ant test  
-Dtestcase=AnalyzingSuggesterTest -Dtests.method=testRandom 
-Dtests.seed=64C9D03229E9A923 -Dtests.multiplier=3 -Dtests.slow=true 
-Dtests.locale=hi_IN -Dtests.timezone=Pacific/Norfolk 
-Dtests.file.encoding=US-ASCII
[junit4:junit4] FAILURE 4.47s J0 | AnalyzingSuggesterTest.testRandom <<<
[junit

[jira] [Commented] (SOLR-3816) Need a more granular nrt system that is close to a realtime system.

2012-11-09 Thread Otis Gospodnetic (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13494554#comment-13494554
 ] 

Otis Gospodnetic commented on SOLR-3816:


Hmm... I didn't check the sources just now, but I'm not sure the above is all 
correct.  Lucene gets the new Reader from IndexWriter, and I would think Solr 
uses that on soft commit and not something else, big and heavy.  Yes, there is 
searcher/cache warming, but I'm not sure whether that comes into play any more 
with NRT and soft commits.


> Need a more granular nrt system that is close to a realtime system.
> ---
>
> Key: SOLR-3816
> URL: https://issues.apache.org/jira/browse/SOLR-3816
> Project: Solr
>  Issue Type: Improvement
>  Components: clients - java, replication (java), search, 
> SearchComponents - other, SolrCloud, update
>Affects Versions: 4.0
>Reporter: Nagendra Nagarajayya
>  Labels: nrt, realtime, replication, search, solrcloud, update
> Attachments: alltests_passed_with_realtime_turnedoff.log, 
> SOLR-3816_4.0_branch.patch, SOLR-3816-4.x.trunk.patch, 
> solr-3816-realtime_nrt.patch
>
>
> Need a more granular NRT system that is close to a realtime system. A 
> realtime system should be able to reflect changes to the index as and when 
> docs are added/updated to the index. soft-commit offers NRT and is more 
> realtime friendly than hard commit but is limited by the dependency on the 
> SolrIndexSearcher being closed and reopened and offers a coarse granular NRT. 
> Closing and reopening of the SolrIndexSearcher may impact performance also.




[jira] [Commented] (SOLR-4059) Custom Sharding

2012-11-09 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13494546#comment-13494546
 ] 

Mark Miller commented on SOLR-4059:
---

Hmm... of course you would still want to be able to send to any node, I 
think... so it seems more like something along the lines of shardId= on the 
update.

> Custom Sharding
> ---
>
> Key: SOLR-4059
> URL: https://issues.apache.org/jira/browse/SOLR-4059
> Project: Solr
>  Issue Type: New Feature
>  Components: SolrCloud
>Reporter: Mark Miller
>
> Had not fully thought through this one yet, but Yonik caught me up at 
> ApacheCon. We need to be able to skip hashing and let the client choose the 
> shard, but still send to replicas.
> Ideas for the interface? hash=false?




Re: FuzzyQuery vs SlowFuzsyQuery docs? -- was: Re: [jira] [Commented] (LUCENE-2667) Fix FuzzyQuery's defaults, so its fast.

2012-11-09 Thread Robert Muir
It's a really simple answer.

Your problem (and I quote):
Content indexed as state:california
But it seems like I search state:CALIFORNI~0.65  (via solr) it doesn't work.
  I'm worried that Solr isn't running my text through the query analyzers first!

This is some analysis chain configuration issue.

We don't need to add support for some unscalable stuff in Lucene to
correct for that: you just need to make sure lowercasing is happening.
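To see why a missing lowercase filter breaks fuzzy matching, compare the raw edit distance with and without normalization. A plain dynamic-programming Levenshtein sketch (illustrative only; Lucene's FuzzyQuery matches via automata, not this algorithm):

```java
public class LevenshteinDemo {
    // Classic O(m*n) edit-distance table: insert, delete, substitute each cost 1.
    static int distance(String a, String b) {
        int[][] d = new int[a.length() + 1][b.length() + 1];
        for (int i = 0; i <= a.length(); i++) d[i][0] = i;
        for (int j = 0; j <= b.length(); j++) d[0][j] = j;
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                d[i][j] = Math.min(Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1),
                                   d[i - 1][j - 1] + cost);
            }
        }
        return d[a.length()][b.length()];
    }

    public static void main(String[] args) {
        // Indexed term vs. raw (un-analyzed) query term: the case mismatch alone
        // pushes the edit distance far past any sane fuzzy threshold.
        System.out.println(distance("california", "CALIFORNI"));
        // Lowercasing the query term first reduces it to a single deletion.
        System.out.println(distance("california", "CALIFORNI".toLowerCase()));
    }
}
```

Case-folding the query term brings the distance from 10 down to 1, which is the point above: fix the analysis chain rather than reach for a slower query implementation.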

NOTE: I will continue to protest/veto/do anything I can to block queries
with horrible complexity, making as much noise as possible, because
the real solution is for users to index and search content correctly
and get results in a reasonable amount of time.

If it doesn't work with 100M documents, I don't want it in Lucene.

I would have the same opinion if someone wanted unscalable solutions
for scoring w/ language models (e.g. not happy with smoothing for
unknown probabilities), or if someone claimed that spatial queries
should do slow things because they don't currently support
interplanetary distances, and so on.

On Fri, Nov 9, 2012 at 7:52 PM, Mark Bennett  wrote:
> Hi Robert,
>
> I acknowledge your "-1" vote, and I'm guessing that your objection is maybe
> 70% "scalability", and only 30% use-case?
>
> The older Levenstein stuff has been around for a long time, scalable or not,
> and already in real systems.
>
> You seem to have a very "binary" on code being "in" or "out".  Is there any
> room in your world-view of code for "gray code", unsupported, incubator,
> what-have-you?  Maybe analagous to people who jailbreak their iPhones or
> something?
>
> You're an important part of the community, and working at Lucid, etc., and
> clearly concerned about software quality.  When smart folks like you have
> such sharp opinions I do try to ponder them against my own circumstances.
>
> And on the quality of the old code, was it just the scalability, or were
> there other concerns such as stability, coding style, or possibly
> inconsistent results?
>
> Isn't the sandbox and admonished reference in Java docs sufficient?
>
> I'm harping on this because I'm really between a rock and hard place, and
> also posted another question.
>
> Just trying to understand your very strong opinions, and I thank you for
> your patience in this matter.  This issue is either going to fix or break my
> weekend / next-deliverble.
>
> Sincere thanks,
> Mark
>
>
> --
> Mark Bennett / New Idea Engineering, Inc. / mbenn...@ideaeng.com
> Direct: 408-733-0387 / Main: 866-IDEA-ENG / Cell: 408-829-6513
>
>
> On Fri, Nov 9, 2012 at 4:37 PM, Robert Muir  wrote:
>>
>> I'm -1 for having unscalable shit in lucene's core. This query should
>> have never been added.
>>
>> I don't care if a few people complain because they aren't using
>> lowercasefilter or some other insanity. Fix your analysis chain. I
>> don't have any sympathy.
>>
>> On Fri, Nov 9, 2012 at 7:35 PM, Jack Krupansky 
>> wrote:
>> > +1 for permitting a choice of fuzzy query implementation.
>> >
>> > I agree that we want a super-fast fuzzy query for simple variations, but
>> > I
>> > also agree that we should have the option to trade off speed for
>> > function.
>> >
>> > But I am also sympathetic to assuring that any core Lucene features be
>> > as
>> > performant as possible.
>> >
>> > Ultimately, if there was a single fuzzy query implementation that did
>> > everything for everybody all of the time, that would be the way to go,
>> > but
>> > if choices need to be made to satisfy competing goals, we should support
>> > going that route.
>> >
>> > -- Jack Krupansky
>> >
>> > From: Mark Bennett
>> > Sent: Friday, November 09, 2012 3:48 PM
>> > To: dev@lucene.apache.org
>> > Subject: Re: FuzzyQuery vs SlowFuzsyQuery docs? -- was: Re: [jira]
>> > [Commented] (LUCENE-2667) Fix FuzzyQuery's defaults, so its fast.
>> >
>> > Hi Robert,
>> >
>> > On Thu, Sep 13, 2012 at 7:39 PM, Robert Muir  wrote:
>> >>
>> >> ...
>> >> ... I'm strongly against having this
>> >> unscalable garbage in lucene's core.
>> >>
>> >> There is no use case for ed > 2, thats just crazy.
>> >
>> >
>> > I promise you there ARE use cases for edit distances > 2, especially
>> > with
>> > longer words.  Due to NDA I can't go into details.
>> >
>> > Also ed>2 can be useful when COMBINING that low-quality part of the
>> > search
>> > with other sub-queries, or additional business rules.  Maybe instead of
>> > boiling an ocean this lets you just boil the sea.  ;-)
>> >
>> > I won't comment on the quality of the older Levenstein code, or the
>> > likely
>> > very slow performance, nor where the code should live, etc.
>> >
>> > But your statement about "no use case for ed > 2" is simply not true.
>> > (whether you'd agree with any of them or not is certainly another
>> > matter)
>> >
>> > I understand your concerns about not having it be the default.  (or
>> > maybe
>> > having a giant warning message or something, whatever)
>> >
>> >> --
>> >> lucidworks.com
>> >>
>> >> --

Re: FuzzyQuery vs SlowFuzsyQuery docs? -- was: Re: [jira] [Commented] (LUCENE-2667) Fix FuzzyQuery's defaults, so its fast.

2012-11-09 Thread Mark Miller
On Fri, Sep 14, 2012 at 7:12 PM, Chris Hostetter
 wrote:
>
>  for your crazy unscalableness"

That really depends - size of the index, requirements around response
times, caching, data. If anyone was using it before, they were using
it in a way that ended up being acceptable to them. Or they were not
using it.

-- 
- Mark




Re: FuzzyQuery vs SlowFuzsyQuery docs? -- was: Re: [jira] [Commented] (LUCENE-2667) Fix FuzzyQuery's defaults, so its fast.

2012-11-09 Thread Mark Miller
On Fri, Nov 9, 2012 at 7:35 PM, Jack Krupansky  wrote:
> +1 for permitting a choice of fuzzy query implementation.

+1. I wouldn't allow it by default, though. I'd prefer having to set
allowSlowFuzzyAlg or something, with some good javadoc. That won't let you
accidentally move from a fast alg to a slow one, but it still keeps the
functionality very discoverable.
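The opt-in idea can be sketched as a guard that refuses the slow path unless it is requested explicitly (the flag name `allowSlowFuzzyAlg` and this selection logic are hypothetical, following the suggestion above; neither is an actual Lucene API):

```java
public class FuzzyConfigDemo {
    // Hypothetical guard: the slow implementation must be requested explicitly,
    // so nobody drifts from the fast automaton-based query by accident, yet the
    // capability remains discoverable in one obvious place.
    static String chooseImpl(int maxEdits, boolean allowSlowFuzzy) {
        if (maxEdits <= 2) return "FuzzyQuery";       // fast, automaton-based
        if (allowSlowFuzzy) return "SlowFuzzyQuery";  // explicit opt-in only
        throw new IllegalArgumentException(
            "maxEdits > 2 requires allowSlowFuzzy=true");
    }

    public static void main(String[] args) {
        System.out.println(chooseImpl(2, false));  // default fast path
        System.out.println(chooseImpl(5, true));   // deliberately opted in
    }
}
```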

Having it in contrib is not as good, but okay. -1 on deprecation.

Best place to have these discussions - if someone thinks they have a
good idea - is in a JIRA issue.


- Mark




[jira] [Created] (SOLR-4059) Custom Sharding

2012-11-09 Thread Mark Miller (JIRA)
Mark Miller created SOLR-4059:
-

 Summary: Custom Sharding
 Key: SOLR-4059
 URL: https://issues.apache.org/jira/browse/SOLR-4059
 Project: Solr
  Issue Type: New Feature
  Components: SolrCloud
Reporter: Mark Miller


Had not fully thought through this one yet, but Yonik caught me up at 
ApacheCon. We need to be able to skip hashing and let the client choose the 
shard, but still send to replicas.

Ideas for the interface? hash=false?




[jira] [Updated] (SOLR-2592) Custom Hashing

2012-11-09 Thread Mark Miller (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Miller updated SOLR-2592:
--

Assignee: (was: Mark Miller)
 Summary: Custom Hashing  (was: Pluggable shard lookup mechanism for 
SolrCloud)

> Custom Hashing
> --
>
> Key: SOLR-2592
> URL: https://issues.apache.org/jira/browse/SOLR-2592
> Project: Solr
>  Issue Type: New Feature
>  Components: SolrCloud
>Affects Versions: 4.0-ALPHA
>Reporter: Noble Paul
> Attachments: dbq_fix.patch, pluggable_sharding.patch, 
> pluggable_sharding_V2.patch, SOLR-2592.patch, SOLR-2592_r1373086.patch, 
> SOLR-2592_r1384367.patch, SOLR-2592_rev_2.patch, 
> SOLR_2592_solr_4_0_0_BETA_ShardPartitioner.patch
>
>
> If the data in a cloud can be partitioned on some criteria (say range, hash, 
> attribute value, etc.), it will be easy to narrow the search down to a smaller 
> subset of shards and in effect achieve more efficient search.
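The basic idea of client-side partitioning can be sketched as hash-based routing (illustrative only; Solr's real routing, with ranges over a 32-bit hash space and the compositeId router, is more involved):

```java
public class ShardRouter {
    // Route a document id to one of N shards by hashing the id.
    // Searches that know the routing key can then target a single
    // shard instead of fanning out to all of them.
    static int shardFor(String docId, int numShards) {
        return Math.floorMod(docId.hashCode(), numShards);
    }

    public static void main(String[] args) {
        String[] ids = {"doc-1", "doc-2", "doc-3"};
        for (String id : ids) {
            System.out.println(id + " -> shard" + shardFor(id, 4));
        }
    }
}
```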




Re: FuzzyQuery vs SlowFuzsyQuery docs? -- was: Re: [jira] [Commented] (LUCENE-2667) Fix FuzzyQuery's defaults, so its fast.

2012-11-09 Thread Mark Bennett
Hi Robert,

I acknowledge your "-1" vote, and I'm guessing that your objection is maybe
70% "scalability", and only 30% use-case?

The older Levenshtein stuff has been around for a long time, scalable or
not, and already in real systems.

You seem to have a very "binary" view on code being "in" or "out".  Is there any
room in your world-view of code for "gray code", unsupported, incubator,
what-have-you?  Maybe analogous to people who jailbreak their iPhones or
something?

You're an important part of the community, and working at Lucid, etc., and
clearly concerned about software quality.  When smart folks like you have
such sharp opinions I do try to ponder them against my own circumstances.

And on the quality of the old code, was it just the scalability, or were
there other concerns such as stability, coding style, or possibly
inconsistent results?

Isn't the sandbox and admonished reference in Java docs sufficient?

I'm harping on this because I'm really between a rock and hard place, and
also posted another question.

Just trying to understand your very strong opinions, and I thank you for
your patience in this matter.  This issue is either going to fix or break
my weekend / next-deliverable.

Sincere thanks,
Mark

--
Mark Bennett / New Idea Engineering, Inc. / mbenn...@ideaeng.com
Direct: 408-733-0387 / Main: 866-IDEA-ENG / Cell: 408-829-6513


On Fri, Nov 9, 2012 at 4:37 PM, Robert Muir  wrote:

> I'm -1 for having unscalable shit in lucene's core. This query should
> have never been added.
>
> I don't care if a few people complain because they aren't using
> lowercasefilter or some other insanity. Fix your analysis chain. I
> don't have any sympathy.
>
> On Fri, Nov 9, 2012 at 7:35 PM, Jack Krupansky 
> wrote:
> > +1 for permitting a choice of fuzzy query implementation.
> >
> > I agree that we want a super-fast fuzzy query for simple variations, but
> I
> > also agree that we should have the option to trade off speed for
> function.
> >
> > But I am also sympathetic to assuring that any core Lucene features be as
> > performant as possible.
> >
> > Ultimately, if there was a single fuzzy query implementation that did
> > everything for everybody all of the time, that would be the way to go,
> but
> > if choices need to be made to satisfy competing goals, we should support
> > going that route.
> >
> > -- Jack Krupansky
> >
> > From: Mark Bennett
> > Sent: Friday, November 09, 2012 3:48 PM
> > To: dev@lucene.apache.org
> > Subject: Re: FuzzyQuery vs SlowFuzsyQuery docs? -- was: Re: [jira]
> > [Commented] (LUCENE-2667) Fix FuzzyQuery's defaults, so its fast.
> >
> > Hi Robert,
> >
> > On Thu, Sep 13, 2012 at 7:39 PM, Robert Muir  wrote:
> >>
> >> ...
> >> ... I'm strongly against having this
> >> unscalable garbage in lucene's core.
> >>
> >> There is no use case for ed > 2, thats just crazy.
> >
> >
> > I promise you there ARE use cases for edit distances > 2, especially with
> > longer words.  Due to NDA I can't go into details.
> >
> > Also ed>2 can be useful when COMBINING that low-quality part of the
> search
> > with other sub-queries, or additional business rules.  Maybe instead of
> > boiling an ocean this lets you just boil the sea.  ;-)
> >
> > I won't comment on the quality of the older Levenstein code, or the
> likely
> > very slow performance, nor where the code should live, etc.
> >
> > But your statement about "no use case for ed > 2" is simply not true.
> > (whether you'd agree with any of them or not is certainly another matter)
> >
> > I understand your concerns about not having it be the default.  (or maybe
> > having a giant warning message or something, whatever)
> >
> >> --
> >> lucidworks.com
> >>
> >
>
>


[jira] [Created] (SOLR-4058) DIH should use the SolrCloudServer impl when running in SolrCloud mode.

2012-11-09 Thread Mark Miller (JIRA)
Mark Miller created SOLR-4058:
-

 Summary: DIH should use the SolrCloudServer impl when running in 
SolrCloud mode.
 Key: SOLR-4058
 URL: https://issues.apache.org/jira/browse/SOLR-4058
 Project: Solr
  Issue Type: New Feature
  Components: SolrCloud
Reporter: Mark Miller
Priority: Minor
 Fix For: 4.1, 5.0







Re: FuzzyQuery vs SlowFuzsyQuery docs? -- was: Re: [jira] [Commented] (LUCENE-2667) Fix FuzzyQuery's defaults, so its fast.

2012-11-09 Thread Robert Muir
I'm -1 for having unscalable shit in lucene's core. This query should
have never been added.

I don't care if a few people complain because they aren't using
lowercasefilter or some other insanity. Fix your analysis chain. I
don't have any sympathy.

On Fri, Nov 9, 2012 at 7:35 PM, Jack Krupansky  wrote:
> +1 for permitting a choice of fuzzy query implementation.
>
> I agree that we want a super-fast fuzzy query for simple variations, but I
> also agree that we should have the option to trade off speed for function.
>
> But I am also sympathetic to assuring that any core Lucene features be as
> performant as possible.
>
> Ultimately, if there was a single fuzzy query implementation that did
> everything for everybody all of the time, that would be the way to go, but
> if choices need to be made to satisfy competing goals, we should support
> going that route.
>
> -- Jack Krupansky
>
> From: Mark Bennett
> Sent: Friday, November 09, 2012 3:48 PM
> To: dev@lucene.apache.org
> Subject: Re: FuzzyQuery vs SlowFuzsyQuery docs? -- was: Re: [jira]
> [Commented] (LUCENE-2667) Fix FuzzyQuery's defaults, so its fast.
>
> Hi Robert,
>
> On Thu, Sep 13, 2012 at 7:39 PM, Robert Muir  wrote:
>>
>> ...
>> ... I'm strongly against having this
>> unscalable garbage in lucene's core.
>>
>> There is no use case for ed > 2, thats just crazy.
>
>
> I promise you there ARE use cases for edit distances > 2, especially with
> longer words.  Due to NDA I can't go into details.
>
> Also ed>2 can be useful when COMBINING that low-quality part of the search
> with other sub-queries, or additional business rules.  Maybe instead of
> boiling an ocean this lets you just boil the sea.  ;-)
>
> I won't comment on the quality of the older Levenstein code, or the likely
> very slow performance, nor where the code should live, etc.
>
> But your statement about "no use case for ed > 2" is simply not true.
> (whether you'd agree with any of them or not is certainly another matter)
>
> I understand your concerns about not having it be the default.  (or maybe
> having a giant warning message or something, whatever)
>
>> --
>> lucidworks.com
>>
>




Re: FuzzyQuery vs SlowFuzsyQuery docs? -- was: Re: [jira] [Commented] (LUCENE-2667) Fix FuzzyQuery's defaults, so its fast.

2012-11-09 Thread Jack Krupansky
+1 for permitting a choice of fuzzy query implementation.

I agree that we want a super-fast fuzzy query for simple variations, but I also 
agree that we should have the option to trade off speed for function.

But I am also sympathetic to ensuring that any core Lucene feature is as 
performant as possible.

Ultimately, if there was a single fuzzy query implementation that did 
everything for everybody all of the time, that would be the way to go, but if 
choices need to be made to satisfy competing goals, we should support going 
that route.

-- Jack Krupansky

From: Mark Bennett 
Sent: Friday, November 09, 2012 3:48 PM
To: dev@lucene.apache.org 
Subject: Re: FuzzyQuery vs SlowFuzsyQuery docs? -- was: Re: [jira] [Commented] 
(LUCENE-2667) Fix FuzzyQuery's defaults, so its fast.

Hi Robert,


On Thu, Sep 13, 2012 at 7:39 PM, Robert Muir  wrote:

  ...

  ... I'm strongly against having this
  unscalable garbage in lucene's core.

  There is no use case for ed > 2, thats just crazy.

I promise you there ARE use cases for edit distances > 2, especially with 
longer words.  Due to NDA I can't go into details.

Also ed>2 can be useful when COMBINING that low-quality part of the search with 
other sub-queries, or additional business rules.  Maybe instead of boiling an 
ocean this lets you just boil the sea.  ;-)

I won't comment on the quality of the older Levenstein code, or the likely very 
slow performance, nor where the code should live, etc.

But your statement about "no use case for ed > 2" is simply not true.  (whether 
you'd agree with any of them or not is certainly another matter)

I understand your concerns about not having it be the default.  (or maybe 
having a giant warning message or something, whatever)


  --
  lucidworks.com






[jira] [Commented] (SOLR-4006) Many tests on Apache Jenkins are failing with lingering threads.

2012-11-09 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13494461#comment-13494461
 ] 

Mark Miller commented on SOLR-4006:
---

I'm back with power and back from ApacheCon finally.

I've confirmed with my local freebsd vm that the nio selector change is indeed 
the culprit.

It does seem like perhaps the timeout is not being respected. I'll start by 
reverting, I suppose, and keep looking for a solution.

At worst we can pass an isFreebsd sys prop or something on our FreeBSD Jenkins 
machine and then not use NIO in that case. I'd rather it worked somehow 
though...

> Many tests on Apache Jenkins are failing with lingering threads.
> 
>
> Key: SOLR-4006
> URL: https://issues.apache.org/jira/browse/SOLR-4006
> Project: Solr
>  Issue Type: Bug
>Reporter: Mark Miller
>
> I think I've tracked this down to being related to the black hole.
> It seems to be a recovery call to a server that is down or something - it's 
> hanging in the connect method even though we are using a connect timeout.
> {noformat}
> Thread[RecoveryThread,5,TGRP-SyncSliceTest]
> java.net.PlainSocketImpl.socketConnect(Native Method)
> java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:327)
> java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:193)
> java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:180)
> java.net.SocksSocketImpl.connect(SocksSocketImpl.java:384)
> java.net.Socket.connect(Socket.java:546)
> {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[JENKINS] Lucene-Solr-4.x-Windows (32bit/jdk1.6.0_37) - Build # 1506 - Failure!

2012-11-09 Thread Policeman Jenkins Server
Build: http://jenkins.sd-datasolutions.de/job/Lucene-Solr-4.x-Windows/1506/
Java: 32bit/jdk1.6.0_37 -client -XX:+UseParallelGC

All tests passed

Build Log:
[...truncated 23533 lines...]
BUILD FAILED
C:\Users\JenkinsSlave\workspace\Lucene-Solr-4.x-Windows\build.xml:229: The 
following error occurred while executing this line:
C:\Users\JenkinsSlave\workspace\Lucene-Solr-4.x-Windows\solr\common-build.xml:397:
 The following error occurred while executing this line:
C:\Users\JenkinsSlave\workspace\Lucene-Solr-4.x-Windows\solr\example\build.xml:46:
 Unable to delete file 
C:\Users\JenkinsSlave\workspace\Lucene-Solr-4.x-Windows\solr\example\lib\jetty-continuation-8.1.7.v20120910.jar

Total time: 28 minutes 15 seconds
Build step 'Invoke Ant' marked build as failure
Archiving artifacts
Recording test results
Description set: Java: 32bit/jdk1.6.0_37 -client -XX:+UseParallelGC
Email was triggered for: Failure
Sending email for trigger: Failure




Re: FuzzyQuery vs SlowFuzsyQuery docs? -- was: Re: [jira] [Commented] (LUCENE-2667) Fix FuzzyQuery's defaults, so its fast.

2012-11-09 Thread Mark Bennett
Hi Robert,

On Thu, Sep 13, 2012 at 7:39 PM, Robert Muir  wrote:

> ...
> ... I'm strongly against having this
> unscalable garbage in lucene's core.
>
> There is no use case for ed > 2, thats just crazy.


I promise you there ARE use cases for edit distances > 2, especially with
longer words.  Due to NDA I can't go into details.

Also ed>2 can be useful when COMBINING that low-quality part of the search
with other sub-queries, or additional business rules.  Maybe instead of
boiling an ocean this lets you just boil the sea.  ;-)

I won't comment on the quality of the older Levenstein code, or the likely
very slow performance, nor where the code should live, etc.

But your statement about "no use case for ed > 2" is simply not true.
(whether you'd agree with any of them or not is certainly another matter)

I understand your concerns about not having it be the default.  (or maybe
having a giant warning message or something, whatever)
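As a self-contained illustration of the point (not Lucene code; a plain dynamic-programming Levenshtein sketch), longer words can plausibly differ by three edits and still be "the same word" — exactly the case a hard cap at 2 edits cannot match:

```java
// Illustrative only: classic DP Levenshtein distance, two rolling rows.
// With long terms, a distance of 3 can still be a plausible match,
// but is invisible to a matcher capped at 2 edits.
public class LevenshteinDemo {
    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1,  // insertion
                                            prev[j] + 1),      // deletion
                                   prev[j - 1] + cost);        // substitution
            }
            int[] tmp = prev; prev = curr; curr = tmp;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        // Three substitutions on a ten-letter word: distance 3.
        System.out.println(distance("california", "kalefornya")); // prints 3
    }
}
```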

 --
> lucidworks.com
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
>
>


4 quick questions about Fuzzy Search, including forcing SlowFuzzySearch

2012-11-09 Thread Mark Bennett
I've been checking the code a bit, but it's taking a while, and I have 4
questions:

Summary:

I want to submit fuzzy searches, with lower scores, on long words, via
Solr.  I want to use the older/slower method.  (I realize low percentages on
long words sounds like a bad idea; it's a very long story, and there's lots of
other stuff going on.)

Also, I have search-time analyzer logic in schema.xml that needs to be used,
whether I'm doing a regular search or a fuzzy search.

Example:
state:California~0.65
(overly simple example of course)
Or even:
state:CALIFORNI~0.65  (1 letter off)
And still have match:
Content indexed as state:california

Things I'm worried about:

1: Need the parser to call SlowFuzzyQuery instead of FuzzyQuery (yup, we
know it's slow!)
Not sure if this is about invoking the old parser, or if it's some type
of config issue instead?

2: I don't want the 0.65 score being needlessly translated into an integer
and then getting needlessly capped at 2.
  I'm not sure if the approach is:
* "don't bother converting from float to int",
  OR
* "convert to int if you want, but don't cap it at 2"
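The capping in question 2 can be illustrated with the arithmetic that maps a float similarity onto an integer edit count (a sketch with hypothetical names, not the exact Lucene source; Lucene 4's automaton-based fuzzy matching supports at most 2 edits):

```java
// Rough sketch of how a similarity float collapses into a capped edit count.
// Names here are illustrative, not the actual Lucene API.
public class FuzzyEditsDemo {
    static final int MAX_SUPPORTED = 2; // automaton-based matching limit

    static int floatToEdits(float minSimilarity, int termLength) {
        if (minSimilarity >= 1f) return 0;          // exact match requested
        int raw = (int) ((1d - minSimilarity) * termLength);
        return Math.min(raw, MAX_SUPPORTED);        // the cap in question
    }

    public static void main(String[] args) {
        // "california"~0.65 : (1 - 0.65) * 10 = 3.5 -> 3 raw edits -> capped at 2
        System.out.println(floatToEdits(0.65f, "california".length())); // prints 2
    }
}
```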

3: Schema.xml Analyzers apply lowercase to words at both index and search
time.
  (We actually have some other complex analyzers that *need* to happen,
just using lowercase as an example)
  But it seems like when I search state:CALIFORNI~0.65 (via Solr) it doesn't
work.
  I'm worried that Solr isn't running my text through the query analyzers
first!

4: Would the XML parser help with any of this?  I think it's still somewhat in
limbo?
We do programmatically build some parts of queries using the Lucene
API, then convert them to Strings.
Then we pass the strings to Solr; this seemed to be the suggested
workaround I found online.
Wondering if XML would bypass this step and give more precise
control over slowfuzzy vs. fuzzy.


I'm not sure if this is a matter of trying to force the old "classic" query
parser, or of setting some configuration or -D directive regardless of the
parser being used.




--
Mark Bennett / New Idea Engineering, Inc. / mbenn...@ideaeng.com
Direct: 408-733-0387 / Main: 866-IDEA-ENG / Cell: 408-829-6513


[jira] [Updated] (SOLR-4051) DIH Delta updates do not work for all locales

2012-11-09 Thread James Dyer (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James Dyer updated SOLR-4051:
-

Attachment: SOLR-4051.patch

This also fixes SOLR-1970 & SOLR-2658, allowing a configurable locale, 
date format, filename and location.  It needs a new test and validation.

This adds a new element in data-config.xml that allows the 
user to specify an implementation of the DIHPropertiesWriter interface.  This 
interface was introduced in 3.6 and should have been marked as 
"lucene.experimental".  This patch changes this interface and also adds the 
experimental annotation, just in case it needs to change again.

Allowing pluggable property writers should open the door to easily solve issues 
like SOLR-3365.

> DIH Delta updates do not work for all locales
> -
>
> Key: SOLR-4051
> URL: https://issues.apache.org/jira/browse/SOLR-4051
> Project: Solr
>  Issue Type: Bug
>  Components: contrib - DataImportHandler
>Affects Versions: 4.0
>Reporter: James Dyer
>Priority: Minor
> Attachments: SOLR-4051.patch
>
>
> DIH Writes the last modified date to a Properties file using the default 
> locale.  This gets sent in plaintext to the database at the next delta 
> update.  DIH does not use prepared statements but just puts the date in an 
> SQL Statement in yyyy-mm-dd hh:mm:ss format.  It would probably be best to 
> always format this date in JDBC escape syntax 
> (http://docs.oracle.com/javase/1.4.2/docs/guide/jdbc/getstart/statement.html#999472)
>  and java.sql.Timestamp#toString().  To do this, we'd need to parse the 
> user's query and remove the single quotes likely there (and now the quotes 
> would be optional and undesired).  
> It might just be simpler to change the SimpleDateFormat to use the root 
> locale as this appears to be the original intent here anyhow.  Affected 
> locales include ja_JP_JP , hi_IN , th_TH
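The root-locale fix and the JDBC escape syntax discussed above can be sketched in plain Java (illustrative only; class and method names are mine, not the actual patch):

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Illustration of the locale pitfall: formatting the last-index time with the
// default locale can produce digits or era years a database will not parse.
// Pinning Locale.ROOT keeps the output stable across JVM locales.
public class DeltaDateDemo {
    static String format(Date d) {
        SimpleDateFormat df = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss", Locale.ROOT);
        df.setTimeZone(TimeZone.getTimeZone("UTC")); // fixed zone for the demo
        return df.format(d);
    }

    public static void main(String[] args) {
        System.out.println(format(new Date(0L)));              // prints 1970-01-01 00:00:00
        // JDBC escape syntax variant mentioned in the issue:
        System.out.println("{ts '" + format(new Date(0L)) + "'}");
    }
}
```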




[jira] [Updated] (SOLR-4057) SolrCloud will not run on the root context.

2012-11-09 Thread Mark Miller (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Miller updated SOLR-4057:
--

Priority: Minor  (was: Major)

Changing to minor priority given the workaround - I'd like to fix the cosmetic 
issue and make documentation less of an issue - I'll also add doc to the wiki 
to be explicit on the topic.

> SolrCloud will not run on the root context.
> ---
>
> Key: SOLR-4057
> URL: https://issues.apache.org/jira/browse/SOLR-4057
> Project: Solr
>  Issue Type: Bug
>  Components: SolrCloud
>Reporter: Mark Miller
>Assignee: Mark Miller
>Priority: Minor
> Fix For: 4.1, 5.0
>
>
> If you try and pass an empty hostContext to solrcloud when trying to run on 
> the root context, the empty value simply triggers using the default value of 
> 8983.




[jira] [Commented] (SOLR-4057) SolrCloud will not run on the root context.

2012-11-09 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13494327#comment-13494327
 ] 

Mark Miller commented on SOLR-4057:
---

Interesting - doesn't seem very intuitive to me though - especially because we 
don't ask for a path - we ask for a string value of the context - at the least 
it would still be an issue that it's not documented - and it's not something 
I'd want to require seeing in the resulting URL strings. Even if some people 
thought it was clear that it was a path so you could use . for the root 
context, I don't think most people associate . and .. with URL's the same way 
they do with files. I and at least one other did not anyway :)

It's a great workaround though.



> SolrCloud will not run on the root context.
> ---
>
> Key: SOLR-4057
> URL: https://issues.apache.org/jira/browse/SOLR-4057
> Project: Solr
>  Issue Type: Bug
>  Components: SolrCloud
>Reporter: Mark Miller
>Assignee: Mark Miller
> Fix For: 4.1, 5.0
>
>
> If you try and pass an empty hostContext to solrcloud when trying to run on 
> the root context, the empty value simply triggers using the default value of 
> 8983.




[jira] [Commented] (SOLR-4057) SolrCloud will not run on the root context.

2012-11-09 Thread Roman Shaposhnik (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13494313#comment-13494313
 ] 

Roman Shaposhnik commented on SOLR-4057:


it appears that specifying hostContext="." works as expected or am I missing 
something?

> SolrCloud will not run on the root context.
> ---
>
> Key: SOLR-4057
> URL: https://issues.apache.org/jira/browse/SOLR-4057
> Project: Solr
>  Issue Type: Bug
>  Components: SolrCloud
>Reporter: Mark Miller
>Assignee: Mark Miller
> Fix For: 4.1, 5.0
>
>
> If you try and pass an empty hostContext to solrcloud when trying to run on 
> the root context, the empty value simply triggers using the default value of 
> 8983.




[jira] [Commented] (SOLR-3785) Cluster-state inconsistent

2012-11-09 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13494302#comment-13494302
 ] 

Mark Miller commented on SOLR-3785:
---

bq. ZkStateReader should have logic that, when calculating a shard-state, looks 
at this ephemeral node, but if it is missing assumes "down"-state.

That's not a bad idea.

> Cluster-state inconsistent
> --
>
> Key: SOLR-3785
> URL: https://issues.apache.org/jira/browse/SOLR-3785
> Project: Solr
>  Issue Type: Bug
>  Components: SolrCloud
>Affects Versions: 4.0
> Environment: Self-build Solr release built on Apache Solr revision 
> 1355667 from 4.x branch
>Reporter: Per Steffensen
> Attachments: SOLR-3785.patch
>
>
> Information in CloudSolrServer.getZkStateReader().getCloudState() (called 
> cloudState below) seems to be inconsistent. 
> I have a Solr running the leader of slice "sliceName" in collection 
> "collectionName" - no replica to take over. I shut down this Solr, and I want 
> to detect that there is now no leader active. 
> I do e.g.
> {code}
> ZkNodeProps leader = cloudState.getLeader(indexName, sliceName);
> boolean notActive = (leader == null) || 
> !leader.containsKey(ZkStateReader.STATE_PROP) || 
> !leader.get(ZkStateReader.STATE_PROP).equals(ZkStateReader.ACTIVE);
> {code}
> This does not work. It seems like the state of a shard is not changed 
> when this Solr goes down.
> I do e.g.
> {code}
> ZkNodeProps leader = cloudState.getLeader(indexName, sliceName);
> boolean notActive = (leader == null) || 
> !leader.containsKey(ZkStateReader.STATE_PROP) || 
> !leader.get(ZkStateReader.STATE_PROP).equals(ZkStateReader.ACTIVE) ||
> !leader.containsKey(ZkStateReader.NODE_NAME_PROP) || 
> !cloudState.getLiveNodes().contains(leader.get(ZkStateReader.NODE_NAME_PROP))
> {code}
> This works.
> It seems like live-nodes of cloudState is updated when Solr goes down, but 
> that some of the other info available through cloudState is not - e.g. 
> getLeader().
> This might already have been solved on the 4.x branch in a revision later 
> than 1355667. If so, please just tell me - thanks.
> Regards, Per Steffensen




[jira] [Created] (SOLR-4057) SolrCloud will not run on the root context.

2012-11-09 Thread Mark Miller (JIRA)
Mark Miller created SOLR-4057:
-

 Summary: SolrCloud will not run on the root context.
 Key: SOLR-4057
 URL: https://issues.apache.org/jira/browse/SOLR-4057
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Reporter: Mark Miller
Assignee: Mark Miller
 Fix For: 4.1, 5.0


If you try and pass an empty hostContext to solrcloud when trying to run on the 
root context, the empty value simply triggers using the default value of 8983.




[jira] [Commented] (LUCENE-4549) Allow variable buffer size on BufferedIndexOutput

2012-11-09 Thread Simon Willnauer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13494266#comment-13494266
 ] 

Simon Willnauer commented on LUCENE-4549:
-

bq. I think you unintentionally always enabled rate limiting in the test case.
Well, I intentionally enabled it, but I missed the nocommit. Good catch... I plan 
to commit this soon...

> Allow variable buffer size on BufferedIndexOutput 
> --
>
> Key: LUCENE-4549
> URL: https://issues.apache.org/jira/browse/LUCENE-4549
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/store
>Affects Versions: 4.0
>Reporter: Simon Willnauer
>Assignee: Simon Willnauer
>Priority: Minor
> Fix For: 4.1, 5.0
>
> Attachments: LUCENE-4549.patch
>
>
> BufferedIndexInput allows to set the buffersize but BufferedIndexOutput 
> doesn't this could be useful for optimizations related to LUCENE-4537. We 
> should make the apis here consistent.




[jira] [Resolved] (LUCENE-4466) Bounds check inconsistent for stored fields vs term vectors

2012-11-09 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir resolved LUCENE-4466.
-

   Resolution: Fixed
Fix Version/s: 5.0
   4.1

> Bounds check inconsistent for stored fields vs term vectors 
> 
>
> Key: LUCENE-4466
> URL: https://issues.apache.org/jira/browse/LUCENE-4466
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/index
>Reporter: Robert Muir
> Fix For: 4.1, 5.0
>
> Attachments: LUCENE-4466.patch
>
>
> SegmentReader.document does the check for stored fields. Codecs don't.
> SegmentReader.getTermVectors doesn't do the check for vectors. The codec does.
> I think we should move the vectors check out to SR, too. Codecs can have an 
> assert if they want, but the APIs should look more consistent.




[jira] [Commented] (LUCENE-4466) Bounds check inconsistent for stored fields vs term vectors

2012-11-09 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13494250#comment-13494250
 ] 

Michael McCandless commented on LUCENE-4466:


+1

> Bounds check inconsistent for stored fields vs term vectors 
> 
>
> Key: LUCENE-4466
> URL: https://issues.apache.org/jira/browse/LUCENE-4466
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/index
>Reporter: Robert Muir
> Attachments: LUCENE-4466.patch
>
>
> SegmentReader.document does the check for stored fields. Codecs don't.
> SegmentReader.getTermVectors doesn't do the check for vectors. The codec does.
> I think we should move the vectors check out to SR, too. Codecs can have an 
> assert if they want, but the APIs should look more consistent.




[jira] [Updated] (LUCENE-4466) Bounds check inconsistent for stored fields vs term vectors

2012-11-09 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-4466:


Attachment: LUCENE-4466.patch

> Bounds check inconsistent for stored fields vs term vectors 
> 
>
> Key: LUCENE-4466
> URL: https://issues.apache.org/jira/browse/LUCENE-4466
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/index
>Reporter: Robert Muir
> Attachments: LUCENE-4466.patch
>
>
> SegmentReader.document does the check for stored fields. Codecs don't.
> SegmentReader.getTermVectors doesn't do the check for vectors. The codec does.
> I think we should move the vectors check out to SR, too. Codecs can have an 
> assert if they want, but the APIs should look more consistent.




[jira] [Commented] (LUCENE-4466) Bounds check inconsistent for stored fields vs term vectors

2012-11-09 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13494219#comment-13494219
 ] 

Michael McCandless commented on LUCENE-4466:


+1

> Bounds check inconsistent for stored fields vs term vectors 
> 
>
> Key: LUCENE-4466
> URL: https://issues.apache.org/jira/browse/LUCENE-4466
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/index
>Reporter: Robert Muir
>
> SegmentReader.document does the check for stored fields. Codecs don't.
> SegmentReader.getTermVectors doesn't do the check for vectors. The codec does.
> I think we should move the vectors check out to SR, too. Codecs can have an 
> assert if they want, but the APIs should look more consistent.




Re: [JENKINS] Lucene-Solr-trunk-Windows (64bit/jdk1.7.0_09) - Build # 1506 - Failure!

2012-11-09 Thread Robert Muir
I probably caused this with my fix yesterday (example/build.xml, see
'sync-hack').

But why would windows have this file open?

On Fri, Nov 9, 2012 at 1:39 PM, Policeman Jenkins Server
 wrote:
> Build: http://jenkins.sd-datasolutions.de/job/Lucene-Solr-trunk-Windows/1506/
> Java: 64bit/jdk1.7.0_09 -XX:+UseParallelGC
>
> All tests passed
>
> Build Log:
> [...truncated 24072 lines...]
> BUILD FAILED
> C:\Users\JenkinsSlave\workspace\Lucene-Solr-trunk-Windows\build.xml:62: The 
> following error occurred while executing this line:
> C:\Users\JenkinsSlave\workspace\Lucene-Solr-trunk-Windows\solr\build.xml:558: 
> The following error occurred while executing this line:
> C:\Users\JenkinsSlave\workspace\Lucene-Solr-trunk-Windows\solr\common-build.xml:410:
>  The following error occurred while executing this line:
> C:\Users\JenkinsSlave\workspace\Lucene-Solr-trunk-Windows\solr\contrib\dataimporthandler-extras\build.xml:43:
>  The following error occurred while executing this line:
> C:\Users\JenkinsSlave\workspace\Lucene-Solr-trunk-Windows\solr\common-build.xml:359:
>  The following error occurred while executing this line:
> C:\Users\JenkinsSlave\workspace\Lucene-Solr-trunk-Windows\solr\common-build.xml:397:
>  The following error occurred while executing this line:
> C:\Users\JenkinsSlave\workspace\Lucene-Solr-trunk-Windows\solr\example\build.xml:46:
>  Unable to delete file 
> C:\Users\JenkinsSlave\workspace\Lucene-Solr-trunk-Windows\solr\example\lib\jetty-continuation-8.1.7.v20120910.jar
>
> Total time: 29 minutes 19 seconds
> Build step 'Invoke Ant' marked build as failure
> Archiving artifacts
> Recording test results
> Description set: Java: 64bit/jdk1.7.0_09 -XX:+UseParallelGC
> Email was triggered for: Failure
> Sending email for trigger: Failure
>
>
>
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org




[JENKINS] Lucene-Solr-trunk-Windows (64bit/jdk1.7.0_09) - Build # 1506 - Failure!

2012-11-09 Thread Policeman Jenkins Server
Build: http://jenkins.sd-datasolutions.de/job/Lucene-Solr-trunk-Windows/1506/
Java: 64bit/jdk1.7.0_09 -XX:+UseParallelGC

All tests passed

Build Log:
[...truncated 24072 lines...]
BUILD FAILED
C:\Users\JenkinsSlave\workspace\Lucene-Solr-trunk-Windows\build.xml:62: The 
following error occurred while executing this line:
C:\Users\JenkinsSlave\workspace\Lucene-Solr-trunk-Windows\solr\build.xml:558: 
The following error occurred while executing this line:
C:\Users\JenkinsSlave\workspace\Lucene-Solr-trunk-Windows\solr\common-build.xml:410:
 The following error occurred while executing this line:
C:\Users\JenkinsSlave\workspace\Lucene-Solr-trunk-Windows\solr\contrib\dataimporthandler-extras\build.xml:43:
 The following error occurred while executing this line:
C:\Users\JenkinsSlave\workspace\Lucene-Solr-trunk-Windows\solr\common-build.xml:359:
 The following error occurred while executing this line:
C:\Users\JenkinsSlave\workspace\Lucene-Solr-trunk-Windows\solr\common-build.xml:397:
 The following error occurred while executing this line:
C:\Users\JenkinsSlave\workspace\Lucene-Solr-trunk-Windows\solr\example\build.xml:46:
 Unable to delete file 
C:\Users\JenkinsSlave\workspace\Lucene-Solr-trunk-Windows\solr\example\lib\jetty-continuation-8.1.7.v20120910.jar

Total time: 29 minutes 19 seconds
Build step 'Invoke Ant' marked build as failure
Archiving artifacts
Recording test results
Description set: Java: 64bit/jdk1.7.0_09 -XX:+UseParallelGC
Email was triggered for: Failure
Sending email for trigger: Failure




[jira] [Commented] (LUCENE-2482) Index sorter

2012-11-09 Thread Matthew Willson (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13494187#comment-13494187
 ] 

Matthew Willson commented on LUCENE-2482:
-

Hi all -- a few quick questions if anyone is still watching this.

* Could this be used to achieve an impact ordered index, as in e.g. [1], where 
documents in a given term's postings list are ordered by score contribution or 
term frequency?

* Any caveats or things one should be aware of when it comes to index sorting 
in combination with different index merge strategies, and some of the more 
advanced stuff in Solr for managing distributed indexes?

* Anyone aware of any other work along the lines of early stopping / dynamic 
pruning optimisations in Lucene? E.g. MaxScore from [1] (I think Xapian [2] 
calls it 'operator decay'), or accumulator-pruning-based algorithms from [1] 
(perhaps in combination with impact ordering)? In particular, is there anything 
in Lucene 4's approach to scoring and indexing that would make these hard in 
principle?

Any pointers gratefully received.

[1] Buettcher, Clarke & Cormack, "Information Retrieval: Implementing and 
Evaluating Search Engines", ch. 5, pp. 143-153
[2] http://xapian.org/docs/matcherdesign.html

> Index sorter
> 
>
> Key: LUCENE-2482
> URL: https://issues.apache.org/jira/browse/LUCENE-2482
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: modules/other
>Affects Versions: 3.1, 4.0-ALPHA
>Reporter: Andrzej Bialecki 
>Assignee: Andrzej Bialecki 
> Fix For: 3.6
>
> Attachments: indexSorter.patch, LUCENE-2482-4.0.patch
>
>
> A tool to sort index according to a float document weight. Documents with 
> high weight are given low document numbers, which means that they will be 
> first evaluated. When using a strategy of "early termination" of queries (see 
> TimeLimitedCollector) such sorting significantly improves the quality of 
> partial results.
> (Originally this tool was created by Doug Cutting in Nutch, and used norms as 
> document weights - thus the ordering was limited by the limited resolution of 
> norms. This is a pure Lucene version of the tool, and it uses arbitrary 
> floats from a specified stored field).
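The renumbering idea above — higher weight gets a lower doc number, so early termination sees the best documents first — can be shown with a toy sketch (plain arrays standing in for the index; not actual Lucene code):

```java
import java.util.Arrays;
import java.util.Comparator;

// Toy version of the doc-renumbering step: produce an oldDocId -> newDocId map
// so that higher-weight documents receive lower (earlier-evaluated) numbers.
public class IndexSorterDemo {
    static int[] newOrder(float[] weights) {
        Integer[] byWeight = new Integer[weights.length];
        for (int i = 0; i < weights.length; i++) byWeight[i] = i;
        // Sort old doc ids by descending weight.
        Arrays.sort(byWeight, Comparator.comparingDouble((Integer i) -> -weights[i]));
        int[] oldToNew = new int[weights.length];
        for (int newId = 0; newId < byWeight.length; newId++) {
            oldToNew[byWeight[newId]] = newId;
        }
        return oldToNew;
    }

    public static void main(String[] args) {
        float[] weights = {0.1f, 0.9f, 0.5f};
        // Doc 1 (weight 0.9) becomes doc 0, doc 2 becomes doc 1, doc 0 becomes doc 2.
        System.out.println(Arrays.toString(newOrder(weights))); // prints [2, 0, 1]
    }
}
```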




[jira] [Created] (SOLR-4056) Contribution of component to gather the most frequent user search request in real-time

2012-11-09 Thread Siegfried Goeschl (JIRA)
Siegfried Goeschl created SOLR-4056:
---

 Summary: Contribution of component to gather the most frequent 
user search request in real-time
 Key: SOLR-4056
 URL: https://issues.apache.org/jira/browse/SOLR-4056
 Project: Solr
  Issue Type: New Feature
  Components: SearchComponents - other
Affects Versions: 3.6.1
Reporter: Siegfried Goeschl
Priority: Minor
 Fix For: 3.6.2


I'm now finishing a SOLR project for one of my customers (replacing Microsoft 
FAST server with SOLR) and got permission to contribute our improvements.

The most interesting piece is a "FrequentSearchTerm" component which allows 
analyzing the user-supplied search queries in real time

 * it keeps track of the last queries per core using a LIFO buffer (so we have 
an upper limit on memory consumption)
 * per query entry we keep track of the number of invocations, the average 
number of result documents and the average execution time
 * we allow for custom searches across the frequent search terms using the MVEL 
expression language (see http://mvel.codehaus.org)
 ** find all queries which did not yield any results - 'meanHits==0'
 ** find all "iPhone" queries - 'searchTerm.contains("iphone") || 
searchTerm.contains("i-phone")'
 ** find all long-running "iPhone" queries - '(searchTerm.contains("iphone") || 
searchTerm.contains("i-phone")) && meanTime>50'
 * GUI: we have a JSP page which gives access to the frequent search terms
 * there is also an XML/CSV export we use to display the 50 most frequently 
used search queries in real time

We use this component

 * to get input for QA regarding frequently used search terms
 * to find strange queries, e.g. queries returning no or too many results, 
e.g. caused by WordDelimiterFilter
 * to keep our management happy ... :-)


 Not sure if the name "Frequent Search Term Component" is perfectly suitable, 
as it was taken from FAST - suggestions welcome. Maybe 
"FrequentSearchQueryComponent" would be more suitable?
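A minimal sketch of the mechanism described above — a size-capped buffer of per-query stats with running averages — assuming eviction of the eldest entry is an acceptable bound (hypothetical names; not the contributed component):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of the described mechanism: a size-capped map of per-query stats.
// Inserting past the cap evicts the eldest entry, bounding memory use.
public class FrequentQueryDemo {
    static class Stats {
        long invocations;
        double meanHits, meanTimeMs;
        void record(long hits, long timeMs) {
            invocations++;
            meanHits += (hits - meanHits) / invocations;       // running average
            meanTimeMs += (timeMs - meanTimeMs) / invocations;
        }
    }

    final Map<String, Stats> stats;

    FrequentQueryDemo(int capacity) {
        this.stats = new LinkedHashMap<String, Stats>() {
            @Override protected boolean removeEldestEntry(Map.Entry<String, Stats> e) {
                return size() > capacity; // drop eldest once over the cap
            }
        };
    }

    void record(String query, long hits, long timeMs) {
        stats.computeIfAbsent(query, q -> new Stats()).record(hits, timeMs);
    }

    public static void main(String[] args) {
        FrequentQueryDemo demo = new FrequentQueryDemo(2);
        demo.record("iphone", 0, 12);
        demo.record("iphone", 10, 8);
        demo.record("android", 5, 3);
        demo.record("solr", 1, 1);             // evicts the eldest entry ("iphone")
        System.out.println(demo.stats.size()); // prints 2
    }
}
```

Custom filters over such a map (the MVEL expressions above, e.g. `meanHits==0`) would then just iterate the entries and evaluate the predicate per `Stats` object.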





Re: [SOLR] RFC - Contributing a FrequentSearchTerm component ...

2012-11-09 Thread Erick Erickson
Absolutely feel free to open up a JIRA and attach a patch for something
like this! You can create an account and edit JIRAs freely.

You don't need to clean it up much before putting up the first patch. It's
often useful to let other eyes take a quick look at it and make comments
before polishing. It's perfectly reasonable to have //TODOs or //nocommit
comments in the code as a flag that "this isn't finished yet", but it's up
to you.

Best
Erick


On Fri, Nov 9, 2012 at 8:37 AM, Siegfried Goeschl  wrote:

> Hi folks,
>
> I'm now finishing a SOLR project for one of my customers (replacing
> Microsoft FAST server with SOLR) and got the permission to contribute our
> improvements.
>
> The most interesting thing is a "FrequentSearchTerm" component which
> allows to analyze the user-supplied search queries in real-time
>
> +) it keeps track of the last queries per core using a LIFO buffer (so we
> have an upper limit of memory consumption)
>
> +) per query entry we keep track of the number of invocations, the average
> number of result document and the average execution time
>
> +) we allow for custom searches across the frequent search terms using the
> MVEL expression language (see http://mvel.codehaus.org)
> ++) find all queries which did not yield any results - 'meanHits==0'
> ++) find all "iPhone" queries - 'searchTerm.contains("iphone") ||
> searchTerm.contains("i-phone")'
> ++) find all long-running "iPhone" queries -
> '(searchTerm.contains("iphone") || searchTerm.contains("i-phone")) &&
> meanTime>50'
>
> +) GUI : we have a JSP page which allows to access the frequent search
> terms
>
> +) there is also an XML/CSV export we use to display the 50 most
> frequently used search queries in real-time
>
> We use this component
>
> +) to get input for QA regarding frequently used search terms
> +) to find strange queries, e.g. queries returning no results or too many,
> e.g. caused by WordDelimiterFilter
> +) to keep our management happy ... :-)
>
> So the question is - is the community interested in such a contribution?
> If yes then I need to spend some time to improve the code from "industrial
> quality" to "open source quality" including documentation ... you know what
> I mean  :-)
>
> Thanks in advance,
>
> Siegfried Goeschl
>
> PS: Not sure if the name "Frequent Search Term Component" is perfectly
> suitable as it was taken from FAST - suggestions welcome
>
> -
> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
>
>


[jira] [Updated] (SOLR-1306) Support pluggable persistence/loading of solr.xml details

2012-11-09 Thread Erick Erickson (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erick Erickson updated SOLR-1306:
-

Attachment: SOLR-1306.patch

Last fix broke some tests, this fixes them.

> Support pluggable persistence/loading of solr.xml details
> -
>
> Key: SOLR-1306
> URL: https://issues.apache.org/jira/browse/SOLR-1306
> Project: Solr
>  Issue Type: New Feature
>  Components: multicore
>Reporter: Noble Paul
>Assignee: Erick Erickson
> Fix For: 4.1
>
> Attachments: SOLR-1306.patch, SOLR-1306.patch, SOLR-1306.patch, 
> SOLR-1306.patch
>
>
> Persisting and loading details from one xml is fine if the number of cores 
> is small and fixed. If there are tens of thousands of 
> cores in a single box, adding a new core (with persistent=true) becomes very 
> expensive because every core creation has to rewrite this huge xml. 
> Moreover, there is a good chance that the file gets corrupted and all the 
> cores become unusable . In that case I would prefer it to be stored in a 
> centralized DB which is backed up/replicated and all the information is 
> available in a centralized location. 
> We may need to refactor CoreContainer to have a pluggable implementation 
> which can load/persist the details . The default implementation should 
> write/read from/to solr.xml . And the class should be pluggable as follows in 
> solr.xml
> {code:xml}
> 
>   
> 
> {code}
> There will be a new interface (or abstract class ) called SolrDataProvider 
> which this class must implement
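
Since the {code:xml} snippet above lost its XML in transit, here is a hedged sketch of what the proposed plug-in point might look like. Only the name SolrDataProvider comes from the issue; the method names and the in-memory stand-in below are assumptions:

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Assumed shape of the pluggable persistence interface the issue proposes.
interface SolrDataProvider {
    /** Load all core descriptors (core name -> properties) at container start. */
    Map<String, Map<String, String>> loadCoreDescriptors();

    /** Persist a single core's details, so creating one core does not
     *  require rewriting the whole solr.xml. */
    void persistCore(String coreName, Map<String, String> properties);
}

// Trivial stand-in backed by a map; a real provider would read/write
// solr.xml (the default) or a central, replicated database instead.
class InMemoryDataProvider implements SolrDataProvider {
    private final Map<String, Map<String, String>> cores = new LinkedHashMap<>();

    @Override
    public Map<String, Map<String, String>> loadCoreDescriptors() {
        return new LinkedHashMap<>(cores);
    }

    @Override
    public void persistCore(String coreName, Map<String, String> properties) {
        cores.put(coreName, new HashMap<>(properties));
    }
}
```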




[jira] [Resolved] (SOLR-3856) DIH: Better tests for SqlEntityProcessor

2012-11-09 Thread James Dyer (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-3856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James Dyer resolved SOLR-3856.
--

Resolution: Fixed

committed fixes.

Trunk: r1407547
branch_4x: r1407549

> DIH: Better tests for SqlEntityProcessor
> 
>
> Key: SOLR-3856
> URL: https://issues.apache.org/jira/browse/SOLR-3856
> Project: Solr
>  Issue Type: Improvement
>  Components: contrib - DataImportHandler
>Affects Versions: 3.6, 4.0
>Reporter: James Dyer
>Assignee: James Dyer
> Fix For: 4.1, 5.0
>
> Attachments: SOLR-3856_20121109_fixes.patch, SOLR-3856-3.5.patch, 
> SOLR-3856.patch, SOLR-3856.patch, SOLR-3856.patch
>
>
> The current tests for SqlEntityProcessor (& CachedSqlEntityProcessor), while 
> many, do not reliably fail when bugs are introduced!  They are also difficult 
> to look at and understand.  As we move Jenkins onto new environments, we have 
> found several of them fail regularly leading to "@Ignore".  
> My aim here is to write all new tests for (Cached)SqlEntityProcessor, and to 
> document (hopefully fix) any bugs this reveals.  




[jira] [Updated] (SOLR-3856) DIH: Better tests for SqlEntityProcessor

2012-11-09 Thread James Dyer (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-3856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James Dyer updated SOLR-3856:
-

Attachment: SOLR-3856_20121109_fixes.patch

This adds better messages on failure to help figure these out.  Also added an 
assume when the locale breaks the test, until SOLR-4051/SOLR-1916 can be fixed.

> DIH: Better tests for SqlEntityProcessor
> 
>
> Key: SOLR-3856
> URL: https://issues.apache.org/jira/browse/SOLR-3856
> Project: Solr
>  Issue Type: Improvement
>  Components: contrib - DataImportHandler
>Affects Versions: 3.6, 4.0
>Reporter: James Dyer
>Assignee: James Dyer
> Fix For: 4.1, 5.0
>
> Attachments: SOLR-3856_20121109_fixes.patch, SOLR-3856-3.5.patch, 
> SOLR-3856.patch, SOLR-3856.patch, SOLR-3856.patch
>
>
> The current tests for SqlEntityProcessor (& CachedSqlEntityProcessor), while 
> many, do not reliably fail when bugs are introduced!  They are also difficult 
> to look at and understand.  As we move Jenkins onto new environments, we have 
> found several of them fail regularly leading to "@Ignore".  
> My aim here is to write all new tests for (Cached)SqlEntityProcessor, and to 
> document (hopefully fix) any bugs this reveals.  




[jira] [Reopened] (SOLR-3856) DIH: Better tests for SqlEntityProcessor

2012-11-09 Thread James Dyer (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-3856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James Dyer reopened SOLR-3856:
--


re-open to deal with recent test failures.

> DIH: Better tests for SqlEntityProcessor
> 
>
> Key: SOLR-3856
> URL: https://issues.apache.org/jira/browse/SOLR-3856
> Project: Solr
>  Issue Type: Improvement
>  Components: contrib - DataImportHandler
>Affects Versions: 3.6, 4.0
>Reporter: James Dyer
>Assignee: James Dyer
> Fix For: 4.1, 5.0
>
> Attachments: SOLR-3856-3.5.patch, SOLR-3856.patch, SOLR-3856.patch, 
> SOLR-3856.patch
>
>
> The current tests for SqlEntityProcessor (& CachedSqlEntityProcessor), while 
> many, do not reliably fail when bugs are introduced!  They are also difficult 
> to look at and understand.  As we move Jenkins onto new environments, we have 
> found several of them fail regularly leading to "@Ignore".  
> My aim here is to write all new tests for (Cached)SqlEntityProcessor, and to 
> document (hopefully fix) any bugs this reveals.  




[jira] [Commented] (SOLR-752) Allow better Field Compression options

2012-11-09 Thread David Smiley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13494117#comment-13494117
 ] 

David Smiley commented on SOLR-752:
---

LUCENE-4226 basically does this but you can't configure codecs; you pick a 
codec in its default mode.  The Compressing codec defaults to "fast" and yields 
~50% savings based on Adrien's tests of a "small to medium" sized index:  
https://issues.apache.org/jira/browse/LUCENE-4226?focusedCommentId=13451708&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13451708

But what I'd like to see is the ability to compress a large text field (alone), 
for the purposes of highlighting, and much more than 50% compression.  It might 
not be able to handle that many concurrent requests to meet response time SLAs, 
but some search apps aren't under high load.
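
For a sense of the trade-off, plain JDK deflate (not the Lucene codec API) already goes well beyond 50% on repetitive text when tuned for size over speed; a self-contained sketch:

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Illustration only: compress/decompress a text field with java.util.zip.
// BEST_COMPRESSION trades CPU per request for a smaller stored field,
// which is the "compress a large text field much more than 50%" idea above.
class FieldCompressionDemo {
    static byte[] compress(byte[] input, int level) {
        Deflater deflater = new Deflater(level);
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        while (!deflater.finished()) {
            out.write(buf, 0, deflater.deflate(buf));
        }
        deflater.end();
        return out.toByteArray();
    }

    static byte[] decompress(byte[] input) throws Exception {
        Inflater inflater = new Inflater();
        inflater.setInput(input);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        while (!inflater.finished()) {
            out.write(buf, 0, inflater.inflate(buf));
        }
        inflater.end();
        return out.toByteArray();
    }
}
```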

> Allow better Field Compression options
> --
>
> Key: SOLR-752
> URL: https://issues.apache.org/jira/browse/SOLR-752
> Project: Solr
>  Issue Type: Improvement
>Reporter: Grant Ingersoll
>Priority: Minor
> Attachments: compressed_field.patch, compressedtextfield.patch
>
>
> See http://lucene.markmail.org/message/sd4mgwud6caevb35?q=compression
> It would be good if Solr handled field compression outside of Lucene's 
> Field.COMPRESS capabilities, since those capabilities are less than ideal 
> when it comes to control over compression.




[jira] [Commented] (SOLR-3918) Change the way -excl-slf4j targets work

2012-11-09 Thread Shawn Heisey (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13494107#comment-13494107
 ] 

Shawn Heisey commented on SOLR-3918:


I've changed the issue title, because the latest version of the patch changes 
how dist-war-excl-slf4j works, in addition to creating a new dist-excl-slf4j 
target.  The current build.xml leaves slf4j-api in the war, forcing you to 
stick with that specific slf4j version.  The patch removes all slf4j jars from 
the war.

With my patch, someone who wants to change the slf4j binding can go to 
slf4j.org and download the newest version.  By putting the appropriate jars 
into the proper location (lib/ext for the included jetty8), they can use the 
-excl-slf4j war and have everything work.  The required jars are slf4j-api, 
jcl-over-slf4j, log4j-over-slf4j, and the required binding jar.  In the case of 
log4j, you have to include the log4j jar itself as well.


> Change the way -excl-slf4j targets work
> ---
>
> Key: SOLR-3918
> URL: https://issues.apache.org/jira/browse/SOLR-3918
> Project: Solr
>  Issue Type: Improvement
>Affects Versions: 3.6.1, 4.0-BETA
>Reporter: Shawn Heisey
>Priority: Trivial
> Fix For: 3.6.2, 4.1, 5.0
>
> Attachments: SOLR-3918.patch, SOLR-3918.patch
>
>
> If you want to create an entire dist target but leave out slf4j bindings, you 
> must currently use this:
> ant dist-solrj, dist-core, dist-test-framework, dist-contrib 
> dist-war-excl-slf4j
> It would be better to have a single target.  Attaching a patch against 
> branch_4x for this.




[jira] [Commented] (SOLR-3816) Need a more granular nrt system that is close to a realtime system.

2012-11-09 Thread Nagendra Nagarajayya (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13494105#comment-13494105
 ] 

Nagendra Nagarajayya commented on SOLR-3816:


@Otis:

Yes, you could set it to something low or 0, but this means it has to close and 
reopen the SolrIndexSearcher that often. SolrIndexSearcher is a heavy, 
reference-counted object - there may be searches in progress, and it has lots 
of critical sections that need to be synchronized to close a searcher, reopen a 
new one, warm it up, etc.; it was not meant for this kind of use ...

Realtime-search just gets a new nrt reader from the writer and passes this 
along to the Searcher, a lean searcher with no state. In the future, if lucene's 
developers make the reader more realtime so it sees more changes as they happen 
at the writer, realtime-search should be able to handle it ...

"Quote from the user using realtime-search"
Insertion speed – while we can’t really explain this, we are able to insert 70k 
records per second at a steady rate over time with RA, while we can only do 40k 
at a descending rate with normal Solr. Granted we haven’t even slightly 
configured regular Solr for high speed insertion with regard to segment 
configs, but this was good for us to get us quickly off the ground.
"end quote"

I think it has gotten better with the 4.0 release. I have also requested the 
user to benchmark and update the JIRA as I don't have the required hardware.



> Need a more granular nrt system that is close to a realtime system.
> ---
>
> Key: SOLR-3816
> URL: https://issues.apache.org/jira/browse/SOLR-3816
> Project: Solr
>  Issue Type: Improvement
>  Components: clients - java, replication (java), search, 
> SearchComponents - other, SolrCloud, update
>Affects Versions: 4.0
>Reporter: Nagendra Nagarajayya
>  Labels: nrt, realtime, replication, search, solrcloud, update
> Attachments: alltests_passed_with_realtime_turnedoff.log, 
> SOLR-3816_4.0_branch.patch, SOLR-3816-4.x.trunk.patch, 
> solr-3816-realtime_nrt.patch
>
>
> Need a more granular NRT system that is close to a realtime system. A 
> realtime system should be able to reflect changes to the index as and when 
> docs are added/updated to the index. soft-commit offers NRT and is more 
> realtime friendly than hard commit but is limited by the dependency on the 
> SolrIndexSearcher being closed and reopened and offers a coarse granular NRT. 
> Closing and reopening of the SolrIndexSearcher may impact performance also.




[jira] [Updated] (SOLR-3918) Change the way -excl-slf4j targets work

2012-11-09 Thread Shawn Heisey (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-3918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shawn Heisey updated SOLR-3918:
---

Summary: Change the way -excl-slf4j targets work  (was: Create 
dist-excl-slf4j target)

> Change the way -excl-slf4j targets work
> ---
>
> Key: SOLR-3918
> URL: https://issues.apache.org/jira/browse/SOLR-3918
> Project: Solr
>  Issue Type: Improvement
>Affects Versions: 3.6.1, 4.0-BETA
>Reporter: Shawn Heisey
>Priority: Trivial
> Fix For: 3.6.2, 4.1, 5.0
>
> Attachments: SOLR-3918.patch, SOLR-3918.patch
>
>
> If you want to create an entire dist target but leave out slf4j bindings, you 
> must currently use this:
> ant dist-solrj, dist-core, dist-test-framework, dist-contrib 
> dist-war-excl-slf4j
> It would be better to have a single target.  Attaching a patch against 
> branch_4x for this.




[jira] [Commented] (LUCENE-4550) For extremely wide shapes (> 180 degrees) distErrPct is not used correctly

2012-11-09 Thread David Smiley (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13494086#comment-13494086
 ] 

David Smiley commented on LUCENE-4550:
--

A solution is to calculate the distance from a bbox corner to its center, 
instead of the current algorithm, which takes half of the distance between 
opposite corners.  The only small issue to consider is that the distance from a 
bbox corner to its center will vary up to ~4x (worst case) depending on whether 
you take a top corner or a bottom corner, so I could do both and take the 
shorter (resulting in a little more accuracy than taking the longer).
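
A sketch of the proposed calculation, with a plain haversine standing in for whatever distance calculator lucene-spatial actually uses (names and the km unit are illustrative):

```java
// Estimate a bbox's "radius" as the shorter of the great-circle distances
// from the top and bottom corners to the center, instead of half the
// corner-to-opposite-corner distance, per the proposal above.
class BBoxRadius {
    static double haversineKm(double lat1, double lon1, double lat2, double lon2) {
        double r = 6371.0; // mean earth radius in km
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                 + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                 * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * r * Math.asin(Math.sqrt(a));
    }

    /** Shorter of top-corner->center and bottom-corner->center. */
    static double radiusKm(double minLat, double minLon, double maxLat, double maxLon) {
        double ctrLat = (minLat + maxLat) / 2;
        double ctrLon = (minLon + maxLon) / 2;
        double top = haversineKm(maxLat, minLon, ctrLat, ctrLon);
        double bottom = haversineKm(minLat, minLon, ctrLat, ctrLon);
        return Math.min(top, bottom);
    }
}
```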

> For extremely wide shapes (> 180 degrees) distErrPct is not used correctly
> --
>
> Key: LUCENE-4550
> URL: https://issues.apache.org/jira/browse/LUCENE-4550
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: modules/spatial
>Affects Versions: 4.0
>Reporter: David Smiley
>Priority: Minor
>
> When a shape is given to a PrefixTreeStrategy (index or query time), it needs 
> to know how many levels down the prefix tree to go for a target precision 
> (distErrPct).  distErrPct is basically a fraction of the radius of the shape, 
> defaulting to 2.5% (0.0025).
> If the shape presented is extremely wide, > 180 degrees, then the internal 
> calculations in SpatialArgs.calcDistanceFromErrPct(...) will wrongly measure 
> the shape's size as having width < 180 degrees, yielding *more* accuracy than 
> intended.  Given that this happens for unrealistic shape sizes and results in 
> more accuracy, I am flagging this as "minor", but a bug nonetheless.  Indeed, 
> this was discovered as a result of someone using lucene-spatial incorrectly, 
> not for an actual shape they have.  But in the extreme \[erroneous\] case 
> they had, they had 566k terms (!) generated, when it should have been ~1k 
> tops. 




[jira] [Updated] (LUCENE-4550) For extremely wide shapes (> 180 degrees) distErrPct is not used correctly

2012-11-09 Thread David Smiley (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley updated LUCENE-4550:
-

Description: 
When a shape is given to a PrefixTreeStrategy (index or query time), it needs 
to know how many levels down the prefix tree to go for a target precision 
(distErrPct).  distErrPct is basically a fraction of the radius of the shape, 
defaulting to 2.5% (0.0025).

If the shape presented is extremely wide, > 180 degrees, then the internal 
calculations in SpatialArgs.calcDistanceFromErrPct(...) will wrongly measure 
the shape's size as having width < 180 degrees, yielding *more* accuracy than 
intended.  Given that this happens for unrealistic shape sizes and results in 
more accuracy, I am flagging this as "minor", but a bug nonetheless.  Indeed, 
this was discovered as a result of someone using lucene-spatial incorrectly, 
not for an actual shape they have.  But in the extreme \[erroneous\] case they 
had, they had 566k terms (!) generated, when it should have been ~1k tops. 

  was:
When a shape is given to a PrefixTreeStrategy (index or query time), it needs 
to know how many levels down the prefix tree to go for a target precision 
(distErrPct).  distErrPct is basically a fraction of the radius of the shape, 
defaulting to 2.5% (0.0025).

If the shape presented is extremely wide, > 180 degrees, then the internal 
calculations in SpatialArgs.calcDistanceFromErrPct(...) will wrongly measure 
the shape's size as having width < 180 degrees, yielding *more* accuracy than 
intended.  Given that this happens for unrealistic shape sizes and results in 
more accuracy, I am flagging this as "minor", but a bug nonetheless.  Indeed, 
this was discovered as a result of someone using lucene-spatial incorrectly, 
not for an actual shape they have.  But in the extreme [erroneous] case they 
had, they had 566k terms (!) generated, when it should have been ~1k tops. 


> For extremely wide shapes (> 180 degrees) distErrPct is not used correctly
> --
>
> Key: LUCENE-4550
> URL: https://issues.apache.org/jira/browse/LUCENE-4550
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: modules/spatial
>Affects Versions: 4.0
>Reporter: David Smiley
>Priority: Minor
>
> When a shape is given to a PrefixTreeStrategy (index or query time), it needs 
> to know how many levels down the prefix tree to go for a target precision 
> (distErrPct).  distErrPct is basically a fraction of the radius of the shape, 
> defaulting to 2.5% (0.0025).
> If the shape presented is extremely wide, > 180 degrees, then the internal 
> calculations in SpatialArgs.calcDistanceFromErrPct(...) will wrongly measure 
> the shape's size as having width < 180 degrees, yielding *more* accuracy than 
> intended.  Given that this happens for unrealistic shape sizes and results in 
> more accuracy, I am flagging this as "minor", but a bug nonetheless.  Indeed, 
> this was discovered as a result of someone using lucene-spatial incorrectly, 
> not for an actual shape they have.  But in the extreme \[erroneous\] case 
> they had, they had 566k terms (!) generated, when it should have been ~1k 
> tops. 




[jira] [Created] (LUCENE-4550) For extremely wide shapes (> 180 degrees) distErrPct is not used correctly

2012-11-09 Thread David Smiley (JIRA)
David Smiley created LUCENE-4550:


 Summary: For extremely wide shapes (> 180 degrees) distErrPct is 
not used correctly
 Key: LUCENE-4550
 URL: https://issues.apache.org/jira/browse/LUCENE-4550
 Project: Lucene - Core
  Issue Type: Bug
  Components: modules/spatial
Affects Versions: 4.0
Reporter: David Smiley
Priority: Minor


When a shape is given to a PrefixTreeStrategy (index or query time), it needs 
to know how many levels down the prefix tree to go for a target precision 
(distErrPct).  distErrPct is basically a fraction of the radius of the shape, 
defaulting to 2.5% (0.0025).

If the shape presented is extremely wide, > 180 degrees, then the internal 
calculations in SpatialArgs.calcDistanceFromErrPct(...) will wrongly measure 
the shape's size as having width < 180 degrees, yielding *more* accuracy than 
intended.  Given that this happens for unrealistic shape sizes and results in 
more accuracy, I am flagging this as "minor", but a bug nonetheless.  Indeed, 
this was discovered as a result of someone using lucene-spatial incorrectly, 
not for an actual shape they have.  But in the extreme [erroneous] case they 
had, they had 566k terms (!) generated, when it should have been ~1k tops. 




[jira] [Commented] (SOLR-752) Allow better Field Compression options

2012-11-09 Thread Pieter (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13494042#comment-13494042
 ] 

Pieter commented on SOLR-752:
-

Doesn't the new Lucene 4.1 field compression (LUCENE-4226 if I am right) tackle 
this?

> Allow better Field Compression options
> --
>
> Key: SOLR-752
> URL: https://issues.apache.org/jira/browse/SOLR-752
> Project: Solr
>  Issue Type: Improvement
>Reporter: Grant Ingersoll
>Priority: Minor
> Attachments: compressed_field.patch, compressedtextfield.patch
>
>
> See http://lucene.markmail.org/message/sd4mgwud6caevb35?q=compression
> It would be good if Solr handled field compression outside of Lucene's 
> Field.COMPRESS capabilities, since those capabilities are less than ideal 
> when it comes to control over compression.




Re: Failing tests aka. "what's the point of running them?"

2012-11-09 Thread Jack Krupansky

+1

I mean, yes, I would like to see any test failures addressed quickly, but 
for any tests that fail chronically, it makes sense to both disable/ignore 
them as well as make sure that "blocker" Jira's get filed for them.


Personally, I'd like to see Jira's for all test failure errors so that 
people can easily search Jira for any failure message, hang, etc., including 
reasonably detailed narrative to explain how the failure occurred and how it 
was tracked down and the nature of the fix so that future failures can be 
fixed more promptly and by more people. Leave enough info that less senior 
community members can begin to learn what it takes to fix test failures.


As things stand, I don't have a ghost of a chance at looking at any of these 
test failures - the whole test infrastructure is a black box with such 
complexity that I can fathom only the simplest of tests. If I have 
difficulty with this, maybe there are others who would benefit as well from 
a greater sharing of the expertise needed to track down these chronic and 
seemingly mysterious failures. I mean, the expertise to even LOOK at some of 
these failures is in the heads and hands of too small a set of individuals.


And maybe the list of failure modes is also indicative of the lack of a rich 
enough test infrastructure at the application level, especially when dealing 
with timing issues, let alone the vagaries of Java and individual JVM 
idiosyncrasies. The irony is that sometimes we spend more time trying to 
get the tests to pass on timing issues than putting more stress on the code to 
expose more timing problems.


Oh, and if some JVM's are failing chronically, tag those JVM's as 
"unsupported" until sufficient testing and testing expertise is available to 
get things fixed. File detailed Jira's as well, as above, except NOT as 
blockers. I mean, let's get the chronic failures under control on the "main" 
supported platforms before expanding the supported environments - "supported 
but with significant and chronic test failures" should not be supported.


-- Jack Krupansky

-Original Message- 
From: Simon Willnauer

Sent: Friday, November 09, 2012 2:36 AM
To: dev@lucene.apache.org
Subject: Failing tests aka. "what's the point of running them?"

hey folks,

I know yonik and mark had a long time power outage so I don't want to
blame anybody here but we need to fix those test failures.
Really when you look at those jenkins jobs:

http://jenkins.sd-datasolutions.de/job/Lucene-Solr-trunk-Windows/
http://jenkins.sd-datasolutions.de/job/Lucene-Solr-4.x-Linux/
https://builds.apache.org/computer/lucene/
https://builds.apache.org/job/Lucene-Solr-NightlyTests-4.x/

it would be funny if it were a joke, but it isn't. If I were a new
contributor I'd be scared as sh** when subscribing to the mailing
list. It's not a good advertisement for us either. Yet, even
further it makes me miss failures I might have caused since the amount
of failure mails don't encourage to look at them since its the same
tests that fail over and over again no matter what code I commit. I
can already hear somebody saying "why don't you fix it" - well fair
enough but this project is massive and we are a large committer base
and I don't see myself fix the code I have never ever touched. Anyhow,
I really ask myself what is the point of running these tests,
specifically the solr ones, if they fail over and over again and nobody
cares? Even if folks care they don't get fixed and this project has
more committers than yonik and mark. It's really a bad sign if we are
at the point where we rely on 2 people to fix tests on a stable
branch...
I'd really like to hear how people want to address this. I mean, it
would be only fair to disable the tests until somebody has the
patience / time to fix them; we can / should make them blockers for a
release. Really, a jenkins mail should be the exception, not the
rule...I also think if the FreeBSD jenkins black hole stuff is a
problem for the tests then lets add a @BlackHoleProne annotation and
only run that on linux? I really don't care how we fix it but if we
don't have a solution by the end of next week I will add @Ignore to
all of them that failed in the last 2 weeks.

Sorry I got so frustrated about this - this is really bad press here!

simon







[jira] [Commented] (LUCENE-4549) Allow variable buffer size on BufferedIndexOutput

2012-11-09 Thread Adrien Grand (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13494036#comment-13494036
 ] 

Adrien Grand commented on LUCENE-4549:
--

I think you unintentionally always enabled rate limiting in the test case.
{code}
-if (rarely(random)) { 
+if (rarely(random) || true) { 
{code}

> Allow variable buffer size on BufferedIndexOutput 
> --
>
> Key: LUCENE-4549
> URL: https://issues.apache.org/jira/browse/LUCENE-4549
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/store
>Affects Versions: 4.0
>Reporter: Simon Willnauer
>Assignee: Simon Willnauer
>Priority: Minor
> Fix For: 4.1, 5.0
>
> Attachments: LUCENE-4549.patch
>
>
> BufferedIndexInput allows to set the buffersize but BufferedIndexOutput 
> doesn't this could be useful for optimizations related to LUCENE-4537. We 
> should make the apis here consistent.




[jira] [Commented] (SOLR-3274) ZooKeeper related SolrCloud problems

2012-11-09 Thread Jay Hacker (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13494029#comment-13494029
 ] 

Jay Hacker commented on SOLR-3274:
--

Not sure if it's the same problem, but I have seen similar issues with the 
4.0.0 release.  I get errors like:

{code}
ClusterState says we are the leader, but locally we don't think so
There was a problem finding the leader in zk
forwarding update to http://solr83:4000/solr/main/ failed - retrying ...
Cannot open channel to 3 at election address solr84/X.X.X.X:5002
Session 0x0 for server null, unexpected error, closing socket connection and 
attempting reconnect
{code}

I'm running ZooKeeper embedded, and the problem turns out to be long garbage 
collection pauses.  During a stop-the-world collection, ZooKeeper times out.  
It's especially bad if the system has to page in a lot of memory from disk.  
This would explain why things run fine for a while, until memory fills up and 
you need to do a big GC.  This is quite repeatable for us: index until memory 
is nearly full, then wait for a long GC or trigger one manually with VisualVM.

You can try different garbage collectors or specify maximum pause times 
(I've had some luck with {{-XX:+UseConcMarkSweepGC}}), but the best solution 
may be to run ZooKeeper in an independent JVM.
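For reference, the kind of setup described above might look like this on a Solr 4.x example install (the heap size and occupancy threshold are illustrative values, not recommendations):

{code}
# Solr with CMS, starting the concurrent cycle earlier to shorten pauses
java -Xmx4g \
     -XX:+UseConcMarkSweepGC \
     -XX:CMSInitiatingOccupancyFraction=75 \
     -XX:+UseCMSInitiatingOccupancyOnly \
     -jar start.jar

# ZooKeeper standalone, in its own JVM, instead of the embedded -DzkRun mode
bin/zkServer.sh start
{code}

This way a long GC in the Solr JVM cannot stall the ZooKeeper server itself, only the Solr-side session.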

> ZooKeeper related SolrCloud problems
> 
>
> Key: SOLR-3274
> URL: https://issues.apache.org/jira/browse/SOLR-3274
> Project: Solr
>  Issue Type: Bug
>  Components: SolrCloud
>Affects Versions: 4.0-ALPHA
> Environment: Any
>Reporter: Per Steffensen
>Assignee: Mark Miller
>
> Same setup as in SOLR-3273. Well if I have to tell the entire truth we have 7 
> Solr servers, running 28 slices of the same collection (collA) - all slices 
> have one replica (two shards all in all - leader + replica) - 56 cores all in 
> all (8 shards on each solr instance). But anyways...
> Besides the problem reported in SOLR-3273, the system seems to run fine under 
> high load for several hours, but eventually errors like the ones shown below 
> start to occur. I might be wrong, but they all seem to indicate some kind of 
> instability in the collaboration between Solr and ZooKeeper. I have to say 
> that I haven't been there to check ZooKeeper at the moment those 
> exceptions occur, but basically I don't believe the exceptions occur because 
> ZooKeeper is not running stably - at least when I go and check ZooKeeper 
> through other "channels" (e.g. my Eclipse ZK plugin) it is always accepting 
> my connection and generally seems to be doing fine.
> Exception 1) Often the first error we see in solr.log is something like this
> {code}
> Mar 22, 2012 5:06:43 AM org.apache.solr.common.SolrException log
> SEVERE: org.apache.solr.common.SolrException: Cannot talk to ZooKeeper - 
> Updates are disabled.
> at 
> org.apache.solr.update.processor.DistributedUpdateProcessor.zkCheck(DistributedUpdateProcessor.java:678)
> at 
> org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:250)
> at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:140)
> at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:80)
> at 
> org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:59)
> at 
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
> at org.apache.solr.core.SolrCore.execute(SolrCore.java:1540)
> at 
> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:407)
> at 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:256)
> at 
> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
> at 
> org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
> at 
> org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
> at 
> org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
> at 
> org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
> at 
> org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
> at 
> org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
> at 
> org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
> at 
> org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
> at org.mortbay.jetty.Server.handle(Server.java:326)
> at 
> org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
> at 
> org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpCo

Re: Failing tests aka. "what's the point of running them?"

2012-11-09 Thread Dawid Weiss
This is a recurring discussion -- the latest one (I participated in)
was attached to a Jira issue about excluding those frequently failing
tests from the build (and leaving a single build on Jenkins that would
_not_ send e-mails to the list, for those who are interested in those
failures).

This was met with mixed feelings and I dropped the subject.

> I want to know if anybody is even looking at the test failures.

And this was my primary concern...

Dawid

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-4549) Allow variable buffer size on BufferedIndexOutput

2012-11-09 Thread Simon Willnauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Willnauer updated LUCENE-4549:


Attachment: LUCENE-4549.patch

here is a patch...

> Allow variable buffer size on BufferedIndexOutput 
> --
>
> Key: LUCENE-4549
> URL: https://issues.apache.org/jira/browse/LUCENE-4549
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/store
>Affects Versions: 4.0
>Reporter: Simon Willnauer
>Assignee: Simon Willnauer
>Priority: Minor
> Fix For: 4.1, 5.0
>
> Attachments: LUCENE-4549.patch
>
>
> BufferedIndexInput allows setting the buffer size, but BufferedIndexOutput 
> doesn't; this could be useful for optimizations related to LUCENE-4537. We 
> should make the APIs here consistent.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Assigned] (LUCENE-4549) Allow variable buffer size on BufferedIndexOutput

2012-11-09 Thread Simon Willnauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Willnauer reassigned LUCENE-4549:
---

Assignee: Simon Willnauer

> Allow variable buffer size on BufferedIndexOutput 
> --
>
> Key: LUCENE-4549
> URL: https://issues.apache.org/jira/browse/LUCENE-4549
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/store
>Affects Versions: 4.0
>Reporter: Simon Willnauer
>Assignee: Simon Willnauer
>Priority: Minor
> Fix For: 4.1, 5.0
>
> Attachments: LUCENE-4549.patch
>
>
> BufferedIndexInput allows setting the buffer size, but BufferedIndexOutput 
> doesn't; this could be useful for optimizations related to LUCENE-4537. We 
> should make the APIs here consistent.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[SOLR] RFC - Contributing a FrequentSearchTerm component ...

2012-11-09 Thread Siegfried Goeschl

Hi folks,

I'm now finishing a SOLR project for one of my customers (replacing 
Microsoft FAST server with SOLR) and got permission to contribute 
our improvements.


The most interesting thing is a "FrequentSearchTerm" component which 
allows analyzing the user-supplied search queries in real time:


+) it keeps track of the last queries per core using a LIFO buffer (so 
we have an upper limit on memory consumption)


+) per query entry we keep track of the number of invocations, the 
average number of result documents and the average execution time


+) we allow for custom searches across the frequent search terms using 
the MVEL expression language (see http://mvel.codehaus.org)

++) find all queries which did not yield any results - 'meanHits==0'
++) find all "iPhone" queries - 'searchTerm.contains("iphone") || 
searchTerm.contains("i-phone")'
++) find all long-running "iPhone" queries - 
'(searchTerm.contains("iphone") || searchTerm.contains("i-phone")) && 
meanTime>50'


+) GUI : we have a JSP page which allows to access the frequent search terms

+) there is also an XML/CSV export we use to display the 50 most 
frequently used search queries in real-time


We use this component

+) to get input for QA regarding frequently used search terms
+) to find strange queries, e.g. queries returning no or too many 
results, e.g. caused by WordDelimiterFilter

+) to keep our management happy ... :-)
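For illustration, the kinds of MVEL filters listed above could be expressed in plain Java roughly like this (the QueryStats record and method names here are hypothetical, not the component's actual API):

```java
import java.util.List;

public class TermStatsDemo {
    // Hypothetical per-query statistics entry, mirroring the fields the
    // component tracks: term, invocation count, mean hits, mean time (ms).
    record QueryStats(String searchTerm, long invocations,
                      double meanHits, double meanTime) {}

    // Equivalent of the MVEL expression 'meanHits==0'.
    static List<QueryStats> zeroResult(List<QueryStats> stats) {
        return stats.stream().filter(s -> s.meanHits() == 0).toList();
    }

    // Equivalent of '(searchTerm.contains("iphone") ||
    // searchTerm.contains("i-phone")) && meanTime>50'.
    static List<QueryStats> slowIphone(List<QueryStats> stats) {
        return stats.stream()
                .filter(s -> s.searchTerm().contains("iphone")
                          || s.searchTerm().contains("i-phone"))
                .filter(s -> s.meanTime() > 50)
                .toList();
    }

    public static void main(String[] args) {
        List<QueryStats> stats = List.of(
                new QueryStats("iphone case", 120, 14.2, 80.0),
                new QueryStats("asdfgh", 3, 0.0, 5.0));
        System.out.println(zeroResult(stats).size()); // 1
        System.out.println(slowIphone(stats).size()); // 1
    }
}
```

Using an expression language like MVEL instead of hard-coded predicates lets operators type such filters at runtime, which is presumably the point of the component.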

So the question is - is the community interested in such a contribution? 
If so, then I need to spend some time improving the code from 
"industrial quality" to "open source quality", including documentation 
... you know what I mean  :-)


Thanks in advance,

Siegfried Goeschl

PS: Not sure if the name "Frequent Search Term Component" is perfectly 
suitable as it was taken from FAST - suggestions welcome


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Failing tests aka. "what's the point of running them?"

2012-11-09 Thread Robert Muir
On Fri, Nov 9, 2012 at 5:36 AM, Simon Willnauer
 wrote:
> hey folks,
>
> I know yonik and mark had a long time power outage so I don't want to
> blame anybody here but we need to fix those test failures.
> Really when you look at those jenkins jobs:
>
> http://jenkins.sd-datasolutions.de/job/Lucene-Solr-trunk-Windows/
> http://jenkins.sd-datasolutions.de/job/Lucene-Solr-4.x-Linux/
> https://builds.apache.org/computer/lucene/
> https://builds.apache.org/job/Lucene-Solr-NightlyTests-4.x/
>
> it would be really funny if it were a joke, but it isn't. If I were a new
> contributor I'd be scared as sh** when I subscribe to the mailing
> list. It's not a good advertisement for us either. Yet, even
> further, it makes me miss failures I might have caused, since the volume
> of failure mails doesn't encourage looking at them - it's the same
> tests that fail over and over again no matter what code I commit. I
> can already hear somebody saying "why don't you fix it" - well, fair
> enough, but this project is massive, we have a large committer base,
> and I don't see myself fixing code I have never touched. Anyhow,
> I really ask myself: what is the point of running these tests,
> specifically the Solr ones, if they fail over and over again and nobody
> cares?

I want to know if anybody is even looking at the test failures.

At some point I began filtering solr test failures to my email spam
folder via 3 gmail rules. I don't run solr tests locally anymore
either because of the huge false failure rate.

I have to do these things because I don't want to miss lucene failures
in the noise of this stuff.

For now I disabled the Solr tests in the Jenkins jobs. This shouldn't be
controversial: they haven't passed in over 15 days.

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: [JENKINS] Lucene-Solr-Tests-trunk-java7 - Build # 3368 - Still Failing

2012-11-09 Thread Robert Muir
The bug here (in my opinion) is that ThaiWordFilter is a filter at all
(it should be a tokenizer). Like WDF and other filters that really
should be tokenizers, it doesn't expect and can't handle arbitrary
input correctly (e.g. input that's been through a shingle filter...).

Another problem is that offsetsAreCorrect=false allows offsets to
"go backwards" in the stream. But this leniency is a false sense of
security, because if you add a shingle filter you end up with a
situation like this, where startOffset > endOffset.

On Fri, Nov 9, 2012 at 7:31 AM, Apache Jenkins Server
 wrote:
> Error Message:
> startOffset must be non-negative, and endOffset must be >= startOffset, 
> startOffset=5,endOffset=3
>
> Stack Trace:
> java.lang.IllegalAr> [junit4:junit4]   2> Exception from random analyzer:
> [junit4:junit4]   2> charfilters=
> [junit4:junit4]   2> tokenizer=
> [junit4:junit4]   2>   
> org.apache.lucene.analysis.core.WhitespaceTokenizer(LUCENE_50, 
> org.apache.lucene.analysis.core.TestRandomChains$CheckThatYouDidntReadAnythingReaderWrapper@7f4aaa58)
> [junit4:junit4]   2> filters=
> [junit4:junit4]   2>   
> org.apache.lucene.analysis.miscellaneous.LengthFilter(false, 
> org.apache.lucene.analysis.ValidatingTokenFilter@1, -30, 69)
> [junit4:junit4]   2>   
> org.apache.lucene.analysis.shingle.ShingleFilter(org.apache.lucene.analysis.ValidatingTokenFilter@37caea,
>  tpzabzsxye)
> [junit4:junit4]   2>   
> org.apache.lucene.analysis.th.ThaiWordFilter(LUCENE_50, 
> org.apache.lucene.analysis.ValidatingTokenFilter@37caea)
> [junit4:junit4]   2>   
> org.apache.lucene.analysis.shingle.ShingleFilter(org.apache.lucene.analysis.ValidatingTokenFilter@37caea)
> [junit4:junit4]   2> offsetsAreCorrect=false

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[JENKINS] Lucene-Solr-Tests-trunk-java7 - Build # 3368 - Still Failing

2012-11-09 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Lucene-Solr-Tests-trunk-java7/3368/

1 tests failed.
REGRESSION:  org.apache.lucene.analysis.core.TestRandomChains.testRandomChains

Error Message:
startOffset must be non-negative, and endOffset must be >= startOffset, 
startOffset=5,endOffset=3

Stack Trace:
java.lang.IllegalArgumentException: startOffset must be non-negative, and 
endOffset must be >= startOffset, startOffset=5,endOffset=3
at 
__randomizedtesting.SeedInfo.seed([FE0FDF1A0D2C367D:C3EEF67B4A3E2BBD]:0)
at 
org.apache.lucene.analysis.tokenattributes.OffsetAttributeImpl.setOffset(OffsetAttributeImpl.java:43)
at 
org.apache.lucene.analysis.shingle.ShingleFilter.incrementToken(ShingleFilter.java:323)
at 
org.apache.lucene.analysis.ValidatingTokenFilter.incrementToken(ValidatingTokenFilter.java:78)
at 
org.apache.lucene.analysis.BaseTokenStreamTestCase.checkAnalysisConsistency(BaseTokenStreamTestCase.java:632)
at 
org.apache.lucene.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:542)
at 
org.apache.lucene.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:443)
at 
org.apache.lucene.analysis.core.TestRandomChains.testRandomChains(TestRandomChains.java:859)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1559)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.access$600(RandomizedRunner.java:79)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:737)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:773)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:787)
at 
org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
at 
org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:51)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
at 
org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48)
at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:358)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:782)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:442)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:746)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:648)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:682)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:693)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
at 
org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:43)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70)
at 
org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:55)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
 

[jira] [Comment Edited] (SOLR-2549) DIH LineEntityProcessor support for delimited & fixed-width files

2012-11-09 Thread zakaria benzidalmal (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13493891#comment-13493891
 ] 

zakaria benzidalmal edited comment on SOLR-2549 at 11/9/12 10:38 AM:
-

patch for solr 4.0.0 available #v400-SOLR-2549.patch



  was (Author: zakibenz):
patch for solr 4.0.0 available 
  
> DIH LineEntityProcessor support for delimited & fixed-width files
> -
>
> Key: SOLR-2549
> URL: https://issues.apache.org/jira/browse/SOLR-2549
> Project: Solr
>  Issue Type: Improvement
>  Components: contrib - DataImportHandler
>Affects Versions: 4.0-ALPHA
>Reporter: James Dyer
>Priority: Minor
> Attachments: SOLR-2549.patch, SOLR-2549.patch, SOLR-2549.patch, 
> SOLR-2549.patch, v400-SOLR-2549.patch
>
>
> Provides support for Fixed Width and Delimited Files without needing to write 
> a Transformer. 
> The following xml properties are supported with this version of 
> LineEntityProcessor:
> For fixed width files:
>  - colDef[#]
> For Delimited files:
>  - fieldDelimiterRegex
>  - firstLineHasFieldnames
>  - delimitedFieldNames
>  - delimitedFieldTypes
> These properties are described in the api documentation.  See patch.
> When combined with the cache improvements from SOLR-2382 this allows you to 
> join a flat file entity with other entities (sql, etc).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (SOLR-2549) DIH LineEntityProcessor support for delimited & fixed-width files

2012-11-09 Thread zakaria benzidalmal (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13493891#comment-13493891
 ] 

zakaria benzidalmal edited comment on SOLR-2549 at 11/9/12 10:38 AM:
-

patch for solr 4.0.0 available 

  was (Author: zakibenz):
patch for solr 4.0.0
  
> DIH LineEntityProcessor support for delimited & fixed-width files
> -
>
> Key: SOLR-2549
> URL: https://issues.apache.org/jira/browse/SOLR-2549
> Project: Solr
>  Issue Type: Improvement
>  Components: contrib - DataImportHandler
>Affects Versions: 4.0-ALPHA
>Reporter: James Dyer
>Priority: Minor
> Attachments: SOLR-2549.patch, SOLR-2549.patch, SOLR-2549.patch, 
> SOLR-2549.patch, v400-SOLR-2549.patch
>
>
> Provides support for Fixed Width and Delimited Files without needing to write 
> a Transformer. 
> The following xml properties are supported with this version of 
> LineEntityProcessor:
> For fixed width files:
>  - colDef[#]
> For Delimited files:
>  - fieldDelimiterRegex
>  - firstLineHasFieldnames
>  - delimitedFieldNames
>  - delimitedFieldTypes
> These properties are described in the api documentation.  See patch.
> When combined with the cache improvements from SOLR-2382 this allows you to 
> join a flat file entity with other entities (sql, etc).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-2549) DIH LineEntityProcessor support for delimited & fixed-width files

2012-11-09 Thread zakaria benzidalmal (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zakaria benzidalmal updated SOLR-2549:
--

Attachment: v400-SOLR-2549.patch

patch for solr 4.0.0

> DIH LineEntityProcessor support for delimited & fixed-width files
> -
>
> Key: SOLR-2549
> URL: https://issues.apache.org/jira/browse/SOLR-2549
> Project: Solr
>  Issue Type: Improvement
>  Components: contrib - DataImportHandler
>Affects Versions: 4.0-ALPHA
>Reporter: James Dyer
>Priority: Minor
> Attachments: SOLR-2549.patch, SOLR-2549.patch, SOLR-2549.patch, 
> SOLR-2549.patch, v400-SOLR-2549.patch
>
>
> Provides support for Fixed Width and Delimited Files without needing to write 
> a Transformer. 
> The following xml properties are supported with this version of 
> LineEntityProcessor:
> For fixed width files:
>  - colDef[#]
> For Delimited files:
>  - fieldDelimiterRegex
>  - firstLineHasFieldnames
>  - delimitedFieldNames
>  - delimitedFieldTypes
> These properties are described in the api documentation.  See patch.
> When combined with the cache improvements from SOLR-2382 this allows you to 
> join a flat file entity with other entities (sql, etc).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Failing tests aka. "what's the point of running them?"

2012-11-09 Thread Simon Willnauer
hey folks,

I know yonik and mark had a long time power outage so I don't want to
blame anybody here but we need to fix those test failures.
Really when you look at those jenkins jobs:

http://jenkins.sd-datasolutions.de/job/Lucene-Solr-trunk-Windows/
http://jenkins.sd-datasolutions.de/job/Lucene-Solr-4.x-Linux/
https://builds.apache.org/computer/lucene/
https://builds.apache.org/job/Lucene-Solr-NightlyTests-4.x/

it would be really funny if it were a joke, but it isn't. If I were a new
contributor I'd be scared as sh** when I subscribe to the mailing
list. It's not a good advertisement for us either. Yet, even
further, it makes me miss failures I might have caused, since the volume
of failure mails doesn't encourage looking at them - it's the same
tests that fail over and over again no matter what code I commit. I
can already hear somebody saying "why don't you fix it" - well, fair
enough, but this project is massive, we have a large committer base,
and I don't see myself fixing code I have never touched. Anyhow,
I really ask myself: what is the point of running these tests,
specifically the Solr ones, if they fail over and over again and nobody
cares? Even if folks care they don't get fixed, and this project has
more committers than just Yonik and Mark. It's really a bad sign if we are
at the point where we rely on 2 people to fix tests on a stable
branch...
I'd really like to hear how people want to address this. I mean, it
would only be fair to disable the tests until somebody has the
patience / time to fix them; we can / should make them blockers for a
release. Really, a Jenkins mail should be the exception, not the
rule... I also think that if the FreeBSD Jenkins black-hole stuff is a
problem for the tests, then let's add a @BlackHoleProne annotation and
only run those tests on Linux. I really don't care how we fix it, but if we
don't have a solution by the end of next week I will add @Ignore to
all the tests that failed in the last 2 weeks.

Sorry I got so frustrated about this - this is really bad press here!

simon

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-3785) Cluster-state inconsistent

2012-11-09 Thread Per Steffensen (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13493888#comment-13493888
 ] 

Per Steffensen commented on SOLR-3785:
--

Well, I believe the entire approach with the Overseer is a bad idea. It requires 
that at least one Solr is running before you can trust the state-descriptions in 
ZK - even if this particular "issue" SOLR-3785 is solved using the Overseer. We 
have clients that use the state-descriptions (through 
CloudSolrServer/ZkStateReader) to detect whether the Solr cluster is running well 
enough to use. If all Solrs are down, I believe that cannot be seen from the 
state (you can check live-nodes, and if no Solrs are running you know that you 
can't trust it).

I think you should remove the Overseer entirely and modify ZkStateReader to be 
able to, single-handedly, look at the ZK state and calculate the correct 
ClusterState. E.g. shard-state could be maintained by the Solr running the 
shard (as it is today), but as an ephemeral node that disappears when that Solr 
is not running. ZkStateReader should have logic that, when calculating a 
shard-state, looks at this ephemeral node and assumes the "down" state if it is 
missing.

Regards, Per Steffensen

> Cluster-state inconsistent
> --
>
> Key: SOLR-3785
> URL: https://issues.apache.org/jira/browse/SOLR-3785
> Project: Solr
>  Issue Type: Bug
>  Components: SolrCloud
>Affects Versions: 4.0
> Environment: Self-build Solr release built on Apache Solr revision 
> 1355667 from 4.x branch
>Reporter: Per Steffensen
> Attachments: SOLR-3785.patch
>
>
> Information in CloudSolrServer.getZkStateReader().getCloudState() (called 
> cloudState below) seems to be inconsistent. 
> I have a Solr running the leader of slice "sliceName" in collection 
> "collectionName" - no replica to take over. I shut down this Solr, and I want 
> to detect that there is now no leader active. 
> I do e.g.
> {code}
> ZkNodeProps leader = cloudState.getLeader(indexName, sliceName);
> boolean notActive = (leader == null) || 
> !leader.containsKey(ZkStateReader.STATE_PROP) || 
> !leader.get(ZkStateReader.STATE_PROP).equals(ZkStateReader.ACTIVE);
> {code}
> This does not work. It seems like the state of a shard is not 
> changed when this Solr goes down.
> I do e.g.
> {code}
> ZkNodeProps leader = cloudState.getLeader(indexName, sliceName);
> boolean notActive = (leader == null) || 
> !leader.containsKey(ZkStateReader.STATE_PROP) || 
> !leader.get(ZkStateReader.STATE_PROP).equals(ZkStateReader.ACTIVE) ||
> !leader.containsKey(ZkStateReader.NODE_NAME_PROP) || 
> !cloudState.getLiveNodes().contains(leader.get(ZkStateReader.NODE_NAME_PROP))
> {code}
> This works.
> It seems like live-nodes of cloudState is updated when Solr goes down, but 
> that some of the other info available through cloudState is not - e.g. 
> getLeader().
> This might already have been solved on the 4.x branch in a revision later 
> than 1355667. If so, please just tell me - thanks.
> Regards, Per Steffensen

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-4055) Remove/Reload the collection will occur the thread safe issue.

2012-11-09 Thread Raintung Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raintung Li updated SOLR-4055:
--

Attachment: patch-4055

the bug patch

> Remove/Reload the collection will occur the thread safe issue.
> --
>
> Key: SOLR-4055
> URL: https://issues.apache.org/jira/browse/SOLR-4055
> Project: Solr
>  Issue Type: Bug
>  Components: SolrCloud
>Affects Versions: 4.0-ALPHA, 4.0-BETA, 4.0
> Environment: Solr cloud
>Reporter: Raintung Li
> Attachments: patch-4055
>
>
> The collectionCmd method in the OverseerCollectionProcessor class has a 
> thread-safety issue.
> The main problem is that the ModifiableSolrParams instance is handed to 
> other threads (via HttpShardHandler.submit), so modifying it afterwards 
> changes the parameters those threads see.
> When collectionCmd calls 
> params.set(CoreAdminParams.CORE, node.getStr(ZkStateReader.CORE_NAME_PROP)), 
> the thread sending the HTTP request can pick up the wrong core name. The 
> result is that the right core can't be deleted/reloaded.
> The easy fix is to clone the ModifiableSolrParams for every request.
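The shared-instance bug described in this issue can be sketched in plain Java. The Params class below is a hypothetical stand-in for ModifiableSolrParams, and "submitting" is modeled as queueing the object; the point is only the aliasing, not SolrCloud's actual request path:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.HashMap;

public class ParamsDemo {
    // Minimal mutable parameter map, standing in for ModifiableSolrParams.
    static class Params {
        final Map<String, String> map = new HashMap<>();
        void set(String k, String v) { map.put(k, v); }
        String get(String k) { return map.get(k); }
        // Defensive copy, analogous to cloning the params per request.
        Params copy() { Params p = new Params(); p.map.putAll(map); return p; }
    }

    // Buggy: every queued "request" aliases the same mutable instance,
    // so later set() calls clobber the core name the earlier requests see.
    static List<String> submitShared(List<String> cores) {
        Params shared = new Params();
        List<Params> queue = new ArrayList<>();
        for (String core : cores) {
            shared.set("core", core);
            queue.add(shared);            // all entries point at one object
        }
        List<String> seen = new ArrayList<>();
        for (Params p : queue) seen.add(p.get("core"));
        return seen;
    }

    // Fixed: clone the params for every request, as the issue suggests.
    static List<String> submitCopied(List<String> cores) {
        Params shared = new Params();
        List<Params> queue = new ArrayList<>();
        for (String core : cores) {
            shared.set("core", core);
            queue.add(shared.copy());     // each request gets its own snapshot
        }
        List<String> seen = new ArrayList<>();
        for (Params p : queue) seen.add(p.get("core"));
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(submitShared(List.of("a", "b", "c"))); // [c, c, c]
        System.out.println(submitCopied(List.of("a", "b", "c"))); // [a, b, c]
    }
}
```

With the shared instance, every request ends up carrying the last core name written, which matches the wrong-core delete/reload symptom the reporter describes.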

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (SOLR-2549) DIH LineEntityProcessor support for delimited & fixed-width files

2012-11-09 Thread zakaria benzidalmal (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13493871#comment-13493871
 ] 

zakaria benzidalmal edited comment on SOLR-2549 at 11/9/12 10:10 AM:
-

thanks to james for his help ;)

  was (Author: zakibenz):
thanks to james for his help
  
> DIH LineEntityProcessor support for delimited & fixed-width files
> -
>
> Key: SOLR-2549
> URL: https://issues.apache.org/jira/browse/SOLR-2549
> Project: Solr
>  Issue Type: Improvement
>  Components: contrib - DataImportHandler
>Affects Versions: 4.0-ALPHA
>Reporter: James Dyer
>Priority: Minor
> Attachments: SOLR-2549.patch, SOLR-2549.patch, SOLR-2549.patch, 
> SOLR-2549.patch
>
>
> Provides support for Fixed Width and Delimited Files without needing to write 
> a Transformer. 
> The following xml properties are supported with this version of 
> LineEntityProcessor:
> For fixed width files:
>  - colDef[#]
> For Delimited files:
>  - fieldDelimiterRegex
>  - firstLineHasFieldnames
>  - delimitedFieldNames
>  - delimitedFieldTypes
> These properties are described in the API documentation; see the patch.
> When combined with the cache improvements from SOLR-2382 this allows you to 
> join a flat file entity with other entities (sql, etc).




[jira] [Commented] (SOLR-2549) DIH LineEntityProcessor support for delimited & fixed-width files

2012-11-09 Thread zakaria benzidalmal (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13493871#comment-13493871
 ] 

zakaria benzidalmal commented on SOLR-2549:
---

Thanks to James for his help

> DIH LineEntityProcessor support for delimited & fixed-width files
> -
>
> Key: SOLR-2549
> URL: https://issues.apache.org/jira/browse/SOLR-2549
> Project: Solr
>  Issue Type: Improvement
>  Components: contrib - DataImportHandler
>Affects Versions: 4.0-ALPHA
>Reporter: James Dyer
>Priority: Minor
> Attachments: SOLR-2549.patch, SOLR-2549.patch, SOLR-2549.patch, 
> SOLR-2549.patch
>
>
> Provides support for Fixed Width and Delimited Files without needing to write 
> a Transformer. 
> The following xml properties are supported with this version of 
> LineEntityProcessor:
> For fixed width files:
>  - colDef[#]
> For Delimited files:
>  - fieldDelimiterRegex
>  - firstLineHasFieldnames
>  - delimitedFieldNames
>  - delimitedFieldTypes
> These properties are described in the API documentation; see the patch.
> When combined with the cache improvements from SOLR-2382 this allows you to 
> join a flat file entity with other entities (sql, etc).




[jira] [Commented] (SOLR-2549) DIH LineEntityProcessor support for delimited & fixed-width files

2012-11-09 Thread zakaria benzidalmal (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13493870#comment-13493870
 ] 

zakaria benzidalmal commented on SOLR-2549:
---

data config example:

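The comment's XML did not survive intact, so here is a minimal hypothetical data-config sketch for a delimited file, using only the properties listed in the issue description (the entity name, file path, and field names are illustrative, not from the original attachment):

```xml
<dataConfig>
  <dataSource type="FileDataSource" encoding="UTF-8"/>
  <document>
    <!-- Illustrative entity: delimited-file attributes are those named in
         the SOLR-2549 description (fieldDelimiterRegex, firstLineHasFieldnames,
         delimitedFieldNames, delimitedFieldTypes). -->
    <entity name="lines"
            processor="LineEntityProcessor"
            url="/data/products.csv"
            fieldDelimiterRegex=","
            firstLineHasFieldnames="false"
            delimitedFieldNames="id,name,price"
            delimitedFieldTypes="string,string,float"/>
  </document>
</dataConfig>
```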

> DIH LineEntityProcessor support for delimited & fixed-width files
> -
>
> Key: SOLR-2549
> URL: https://issues.apache.org/jira/browse/SOLR-2549
> Project: Solr
>  Issue Type: Improvement
>  Components: contrib - DataImportHandler
>Affects Versions: 4.0-ALPHA
>Reporter: James Dyer
>Priority: Minor
> Attachments: SOLR-2549.patch, SOLR-2549.patch, SOLR-2549.patch, 
> SOLR-2549.patch
>
>
> Provides support for Fixed Width and Delimited Files without needing to write 
> a Transformer. 
> The following xml properties are supported with this version of 
> LineEntityProcessor:
> For fixed width files:
>  - colDef[#]
> For Delimited files:
>  - fieldDelimiterRegex
>  - firstLineHasFieldnames
>  - delimitedFieldNames
>  - delimitedFieldTypes
> These properties are described in the API documentation; see the patch.
> When combined with the cache improvements from SOLR-2382 this allows you to 
> join a flat file entity with other entities (sql, etc).




[jira] [Created] (SOLR-4055) Remove/Reload the collection will occur the thread safe issue.

2012-11-09 Thread Raintung Li (JIRA)
Raintung Li created SOLR-4055:
-

 Summary: Remove/Reload the collection will occur the thread safe 
issue.
 Key: SOLR-4055
 URL: https://issues.apache.org/jira/browse/SOLR-4055
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Affects Versions: 4.0, 4.0-BETA, 4.0-ALPHA
 Environment: Solr cloud
Reporter: Raintung Li


OverseerCollectionProcessor's collectionCmd method has a thread-safety issue: 
the same ModifiableSolrParams instance is handed to other threads via 
HttpShardHandler.submit, so mutating it afterwards changes the parameters those 
threads see.

In collectionCmd, the call params.set(CoreAdminParams.CORE, 
node.getStr(ZkStateReader.CORE_NAME_PROP)) mutates that shared instance, so a 
request already submitted to another thread can pick up the wrong core name, 
and the wrong core ends up being deleted/reloaded.

The easy fix is to clone the ModifiableSolrParams for every request.
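The race and the suggested fix can be reduced to a plain-Java sketch. Here a HashMap stands in for ModifiableSolrParams and a list stands in for the queue behind HttpShardHandler.submit; all names in the sketch are illustrative, not Solr's actual code:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CloneParamsSketch {
    // Stand-in for the shard handler: just records the params object it receives.
    static void submit(List<Map<String, String>> queue, Map<String, String> params) {
        queue.add(params);
    }

    // Buggy shape: every request shares one params instance, so mutating it
    // for the next core also rewrites the requests already queued.
    static List<Map<String, String>> shared(String... cores) {
        List<Map<String, String>> queue = new ArrayList<>();
        Map<String, String> params = new HashMap<>();
        params.put("action", "RELOAD");
        for (String core : cores) {
            params.put("core", core); // mutates the instance the queue holds
            submit(queue, params);
        }
        return queue;
    }

    // Fixed shape: clone the params for every request, as the issue suggests.
    static List<Map<String, String>> cloned(String... cores) {
        List<Map<String, String>> queue = new ArrayList<>();
        Map<String, String> params = new HashMap<>();
        params.put("action", "RELOAD");
        for (String core : cores) {
            Map<String, String> copy = new HashMap<>(params); // per-request copy
            copy.put("core", core);
            submit(queue, copy);
        }
        return queue;
    }

    public static void main(String[] args) {
        System.out.println(shared("core1", "core2").get(0).get("core")); // core2: clobbered
        System.out.println(cloned("core1", "core2").get(0).get("core")); // core1: correct
    }
}
```

With the shared instance, the first queued request has already lost its core name by the time it is sent; cloning per request preserves it.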




[jira] [Updated] (SOLR-2549) DIH LineEntityProcessor support for delimited & fixed-width files

2012-11-09 Thread zakaria benzidalmal (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zakaria benzidalmal updated SOLR-2549:
--

Attachment: SOLR-2549.patch

Fix an NPE when the escape parameter is not specified.

> DIH LineEntityProcessor support for delimited & fixed-width files
> -
>
> Key: SOLR-2549
> URL: https://issues.apache.org/jira/browse/SOLR-2549
> Project: Solr
>  Issue Type: Improvement
>  Components: contrib - DataImportHandler
>Affects Versions: 4.0-ALPHA
>Reporter: James Dyer
>Priority: Minor
> Attachments: SOLR-2549.patch, SOLR-2549.patch, SOLR-2549.patch, 
> SOLR-2549.patch
>
>
> Provides support for Fixed Width and Delimited Files without needing to write 
> a Transformer. 
> The following xml properties are supported with this version of 
> LineEntityProcessor:
> For fixed width files:
>  - colDef[#]
> For Delimited files:
>  - fieldDelimiterRegex
>  - firstLineHasFieldnames
>  - delimitedFieldNames
>  - delimitedFieldTypes
> These properties are described in the API documentation; see the patch.
> When combined with the cache improvements from SOLR-2382 this allows you to 
> join a flat file entity with other entities (sql, etc).




[jira] [Resolved] (SOLR-4054) delta import of solr4.0 put median data(id of db changed data) to transformer

2012-11-09 Thread Erick Erickson (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erick Erickson resolved SOLR-4054.
--

Resolution: Invalid

Please raise this kind of issue on the user's list, see 
http://lucene.apache.org/solr/discussion.html for info.

JIRAs are intended for bugs/enhancements rather than usage issues.

> delta import of solr4.0 put median data(id of db changed data) to transformer
> -
>
> Key: SOLR-4054
> URL: https://issues.apache.org/jira/browse/SOLR-4054
> Project: Solr
>  Issue Type: Bug
>  Components: contrib - DataImportHandler
>Affects Versions: 4.0
> Environment: suse server linux 11 +  resin4
>Reporter: xuzheng
>
> The following is my config. When I use delta import in my project, my resin 
> java log shows that the intermediate data produced by deltaQuery (the ids of 
> the changed rows) is also sent to SuggestionTransformer, before the data 
> produced by deltaImportQuery. What I want is for only the data fetched by 
> deltaImportQuery to be sent to my transformer. Can anybody explain this or 
> tell me what mistake I have made?
> <entity pk="Id"
>         query="select * from Video"
>         deltaImportQuery="select * from Video where 
> Id='${dataimporter.delta.Id}'"
>         deltaQuery="select Id from Video where 'UpdateTime' > 
> '${dataimporter.last_index_time}'"
>         transformer="videosearch.dataimport.SuggestionTransformer">
>   <field ... ranking="true"/>
> </entity>
