[jira] [Updated] (SOLR-5610) Support cluster-wide properties with an API called CLUSTERPROP

2014-01-07 Thread Noble Paul (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Noble Paul updated SOLR-5610:
-

Description: 
Add a collection admin API for cluster-wide property management.
The new API would create an entry in the root as 
/cluster-props.json
{code:javascript}
{
"prop":"val"
}
{code}

The API would work as

/command=clusterprop&name=propName&value=propVal

There will be a set of well-known properties which can be set or unset with 
this command
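
For illustration, a minimal client-side sketch (hypothetical: it assumes the 
command lands on the existing collection admin handler, and the property name 
below is made up):

{code:java}
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;

public class ClusterPropSketch {
  public static void main(String[] args) throws Exception {
    // sets one well-known property; the entry ends up in /cluster-props.json in ZK
    URL url = new URL("http://localhost:8983/solr/admin/collections"
        + "?action=CLUSTERPROP&name=someKnownProp&value=someVal");
    try (BufferedReader r = new BufferedReader(
        new InputStreamReader(url.openStream(), "UTF-8"))) {
      for (String line; (line = r.readLine()) != null; ) {
        System.out.println(line); // admin response
      }
    }
  }
}
{code}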

  was:
Add a collection admin API for cluster wide property management
the new API would create an entry in the root as 
/cluster-props.json
{code:javascipt}
{
"prop":val"
}

The API would work as

/command=clusterprop&name=propName&value=propVal

there will be a set of well-known properties which can be set or unset with 
this command


> Support cluster-wide properties with an API called CLUSTERPROP
> --
>
> Key: SOLR-5610
> URL: https://issues.apache.org/jira/browse/SOLR-5610
> Project: Solr
>  Issue Type: Bug
>Reporter: Noble Paul
>
> Add a collection admin API for cluster-wide property management.
> The new API would create an entry in the root as 
> /cluster-props.json
> {code:javascript}
> {
> "prop":"val"
> }
> {code}
> The API would work as
> /command=clusterprop&name=propName&value=propVal
> There will be a set of well-known properties which can be set or unset with 
> this command



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5560) Enable LocalParams without escaping the query

2014-01-07 Thread Ryan Cutter (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865053#comment-13865053
 ] 

Ryan Cutter commented on SOLR-5560:
---

I don't know, I assume a committer familiar with this area will take a look in 
the near future.  I see other unassigned tickets with patches attached so I'm 
sure there's a process.

> Enable LocalParams without escaping the query
> -
>
> Key: SOLR-5560
> URL: https://issues.apache.org/jira/browse/SOLR-5560
> Project: Solr
>  Issue Type: Bug
>  Components: query parsers
>Affects Versions: 4.6
>Reporter: Isaac Hebsh
> Fix For: 4.7, 4.6.1
>
> Attachments: SOLR-5560.patch
>
>
> This query should be legit syntax:
> http://localhost:8983/solr/collection1/select?debugQuery=true&defType=lucene&df=id&q=TERM1
>  AND {!lucene df=text}(TERM2 TERM3 "TERM4 TERM5")
> Currently it isn't, because LocalParams can be specified on a single term 
> only.
> [~billnbell] thinks it is a bug.
> From the mailing list:
> {quote}
> We want to set a LocalParam on a nested query. When querying with the "v" inline 
> parameter, it works fine:
> http://localhost:8983/solr/collection1/select?debugQuery=true&defType=lucene&df=id&q=TERM1
>  AND {!lucene df=text v="TERM2 TERM3 \"TERM4 TERM5\""}
> the parsedquery_toString is
> +id:TERM1 +(text:term2 text:term3 text:"term4 term5")
> Query using the "_query_" also works fine:
> http://localhost:8983/solr/collection1/select?debugQuery=true&defType=lucene&df=id&q=TERM1
>  AND _query_:"{!lucene df=text}TERM2 TERM3 \"TERM4 TERM5\""
> (parsedquery is exactly the same).
> Obviously, there is the option of external parameter ({... 
> v=$nestedq}&nestedq=...)
> This is a good solution, but it is not practical when there are a lot of such 
> nested queries.
> BUT, when trying to put the nested query in place, it yields syntax error:
> http://localhost:8983/solr/collection1/select?debugQuery=true&defType=lucene&df=id&q=TERM1
>  AND {!lucene df=text}(TERM2 TERM3 "TERM4 TERM5")
> org.apache.solr.search.SyntaxError: Cannot parse '(TERM2'
> The previous options are less preferred because of the escaping that must be 
> applied to the nested query.
> {quote}



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2553) Nested Field Collapsing

2014-01-07 Thread Kranti Parisa (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865012#comment-13865012
 ] 

Kranti Parisa commented on SOLR-2553:
-

I think we will also need to support other grouping params, especially 
group.limit, so that users can restrict the results even with nested groups.
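
As a sketch of what that could look like from SolrJ (these are today's grouping 
params; the nested, per-level interpretation is what's being proposed, not 
current behavior):

{code:java}
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.common.params.GroupParams;

public class NestedGroupingSketch {
  public static void main(String[] args) {
    SolrQuery q = new SolrQuery("*:*");
    q.set(GroupParams.GROUP, true);
    // two group.field params; the proposal nests the second inside the first
    q.set(GroupParams.GROUP_FIELD, "location", "type");
    // per this comment: the limit would cap documents within each nested group
    q.set(GroupParams.GROUP_LIMIT, 5);
  }
}
{code}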

> Nested Field Collapsing
> ---
>
> Key: SOLR-2553
> URL: https://issues.apache.org/jira/browse/SOLR-2553
> Project: Solr
>  Issue Type: Improvement
>  Components: SearchComponents - other
>Reporter: Martijn Laarman
>
> Currently specifying grouping on multiple fields returns multiple datasets. 
> It would be nice if Solr supported cascading / nested grouping by applying 
> the first group over the entire result set, the next over each group, and so 
> on. 
> Even if limited to nesting grouping 2 levels deep, it would cover a lot 
> of use cases. 
> group.field=location&group.field=type
> -Location X
> ---Type 1
> -----documents
> ---Type 2
> -----documents
> -Location Y
> ---Type 1
> -----documents
> ---Type 2
> -----documents
> instead of 
> -Location X
> ---documents
> -Location Y
> ---documents
> -Type 1
> ---documents
> -Type 2
> ---documents



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-5618) Reproducible failure from TestFiltering.testRandomFiltering

2014-01-07 Thread Hoss Man (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hoss Man updated SOLR-5618:
---

Attachment: SOLR-5618.patch

This smells like a caching-related bug ... but I have no idea why/where.

The test does multiple iterations; in each one it builds an index of 
a random number of documents, each containing an incremented value for "id" and 
"val_i" -- the number of documents can range from 1 to 21, with the id and 
val_i fields starting at "0".  Then it generates a bunch of random requests 
consisting of random q and fq params.

This is what the failing request looks like...

{noformat}
q  = {!frange v=val_i l=0 u=1 cost=139 tag=t}
fq = {!frange v=val_i l=0 u=1}
fq = {! cost=92}-_query_:"{!frange v=val_i l=1 u=1}" 
fq = {!frange v=val_i l=0 u=1 cache=true tag=t}
fq = {! cache=true tag=t}-_query_:"{!frange v=val_i l=1 u=1}"
{noformat}

So basically: it will only ever match docs which have val_i==0 -- which, given 
how the index is built, means it should always match exactly 1 document: the 0th 
doc -- but in the failure message we can see that it doesn't match any docs.

(FWIW: adding some debugging indicates that in the iteration where this fails, 
the index only has 2 documents in it -- doc#0 and doc#1)

In the patch I'm attaching, I hacked the test to explicitly attempt the above 
query in every iteration, regardless of the number of docs in the index, immediately 
after building the index -- and that new assertion never fails.  But then, after 
it passes, it continues on with the existing logic, generating a bunch of 
random requests and executing them -- and when it randomly generates the same 
query as above (which already succeeded in matching 1 doc against the current 
index), that query then fails to match any docs.

which smells to me like some sort of filter caching glitch .. right?
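
For reference, the failing check boils down to roughly this (a sketch using the 
SolrTestCaseJ4 helpers TestFiltering already uses; the patch's exact wiring may 
differ):

{code:java}
// inside TestFiltering (extends SolrTestCaseJ4): the request should match exactly
// one doc -- the one with val_i == 0 -- no matter how many docs are in the index
assertJQ(req("q",  "{!frange v=val_i l=0 u=1 cost=139 tag=t}",
             "fq", "{!frange v=val_i l=0 u=1}",
             "fq", "{! cost=92}-_query_:\"{!frange v=val_i l=1 u=1}\"",
             "fq", "{!frange v=val_i l=0 u=1 cache=true tag=t}",
             "fq", "{! cache=true tag=t}-_query_:\"{!frange v=val_i l=1 u=1}\""),
         "/response/numFound==1");
{code}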

> Reproducible failure from TestFiltering.testRandomFiltering
> ---
>
> Key: SOLR-5618
> URL: https://issues.apache.org/jira/browse/SOLR-5618
> Project: Solr
>  Issue Type: Bug
>Reporter: Hoss Man
> Attachments: SOLR-5618.patch
>
>
> uwe's jenkins found this in java8...
> http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Linux/9004/consoleText
> {noformat}
>[junit4]   2> NOTE: reproduce with: ant test  -Dtestcase=TestFiltering 
> -Dtests.method=testRandomFiltering -Dtests.seed=C22042E80957AE3E 
> -Dtests.multiplier=3 -Dtests.slow=true -Dtests.locale=ar_LY 
> -Dtests.timezone=Asia/Katmandu -Dtests.file.encoding=UTF-8
>[junit4] FAILURE 16.9s J1 | TestFiltering.testRandomFiltering <<<
>[junit4]> Throwable #1: java.lang.AssertionError: FAILURE: iiter=11 
> qiter=336 request=[q, {!frange v=val_i l=0 u=1 cost=139 tag=t}, fq, {!frange 
> v=val_i l=0 u=1}, fq, {! cost=92}-_query_:"{!frange v=val_i l=1 u=1}", fq, 
> {!frange v=val_i l=0 u=1 cache=true tag=t}, fq, {! cache=true 
> tag=t}-_query_:"{!frange v=val_i l=1 u=1}"]
>[junit4]>  at 
> __randomizedtesting.SeedInfo.seed([C22042E80957AE3E:DD43E12DEC70EE37]:0)
>[junit4]>  at 
> org.apache.solr.search.TestFiltering.testRandomFiltering(TestFiltering.java:327)
> {noformat}
> The seed fails consistently for me on trunk using java7, and on 4x using both 
> java7 and java6 - details to follow in comment.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: The Old Git Discussion

2014-01-07 Thread David Smiley (@MITRE.org)
+1, Mark.

Git isn't perfect; I sympathize with the annoyances pointed out by Rob et
al.  But I think we would be better off for it -- a net win considering the
upsides.  In the end I'd love to track changes via branches (which includes
forks people make to add changes), not by attaching patch files to an
issue tracker.  The way we do things here sucks for collaboration, and the bar
for people to get involved is higher than it needs to be.

~ David


Mark Miller-3 wrote
> I don’t really buy the fad argument, but as I’ve said, I’m willing to wait
> a little longer for others to catch on. I try and follow the stats and
> reports and articles on this pretty closely.
> 
> As I mentioned early in the thread, by all appearances, the shift from SVN
> to GIT looks much like the shift from CVS to SVN. This was not a fad
> change, nor is the next mass movement likely to be.
> 
> Just like no one starts a project on CVS anymore, we are almost already to
> the point where new projects start exclusively on GIT - especially open
> source.
> 
> I’m happy to sit back and watch the trend continue though. The number of
> GIT users in the committee and among the committers only grows every time
> the discussion comes up.
> 
> If this was 2009, 2010, 2011 … who knows, perhaps I would buy some fad
> argument. But it just doesn’t jive in 2014.
> 
> - Mark





-
 Author: http://www.packtpub.com/apache-solr-3-enterprise-search-server/book
--
View this message in context: 
http://lucene.472066.n3.nabble.com/The-Old-Git-Discussion-tp4109193p4110109.html
Sent from the Lucene - Java Developer mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5618) Reproducible failure from TestFiltering.testRandomFiltering

2014-01-07 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13864911#comment-13864911
 ] 

Hoss Man commented on SOLR-5618:


Relevant log snippet from Jenkins...

{noformat}
   [junit4]   2> 558586 T3202 C2360 oasc.SolrCore.execute [collection1] 
webapp=null path=null 
params={q={!frange+v%3Dval_i+l%3D0+u%3D1+cost%3D139+tag%3Dt}&fq={!frange+v%3Dval_i+l%3D0+u%3D1}&fq={!+cost%3D92}-_query_:"{!frange+v%3Dval_i+l%3D1+u%3D1}"&fq={!frange+v%3Dval_i+l%3D0+u%3D1+cache%3Dtrue+tag%3Dt}&fq={!+cache%3Dtrue+tag%3Dt}-_query_:"{!frange+v%3Dval_i+l%3D1+u%3D1}"}
 hits=0 status=0 QTime=1 
   [junit4]   2> 558586 T3202 oas.SolrTestCaseJ4.assertJQ ERROR query failed 
JSON validation. error=mismatch: '1'!='0' @ response/numFound
   [junit4]   2> expected =/response/numFound==1
   [junit4]   2> response = {
   [junit4]   2>  "responseHeader":{
   [junit4]   2>"status":0,
   [junit4]   2>"QTime":1},
   [junit4]   2>  "response":{"numFound":0,"start":0,"docs":[]
   [junit4]   2>  }}
   [junit4]   2>
   [junit4]   2> request = 
q={!frange+v%3Dval_i+l%3D0+u%3D1+cost%3D139+tag%3Dt}&fq={!frange+v%3Dval_i+l%3D0+u%3D1}&fq={!+cost%3D92}-_query_:"{!frange+v%3Dval_i+l%3D1+u%3D1}"&fq={!frange+v%3Dval_i+l%3D0+u%3D1+cache%3Dtrue+tag%3Dt}&fq={!+cache%3Dtrue+tag%3Dt}-_query_:"{!frange+v%3Dval_i+l%3D1+u%3D1}"
   [junit4]   2> 558587 T3202 oasc.SolrException.log ERROR 
java.lang.RuntimeException: mismatch: '1'!='0' @ response/numFound
   [junit4]   2>at 
org.apache.solr.SolrTestCaseJ4.assertJQ(SolrTestCaseJ4.java:732)
   [junit4]   2>at 
org.apache.solr.SolrTestCaseJ4.assertJQ(SolrTestCaseJ4.java:679)
   [junit4]   2>at 
org.apache.solr.search.TestFiltering.testRandomFiltering(TestFiltering.java:316)
...
   [junit4]   2> 558588 T3202 oass.TestFiltering.testRandomFiltering ERROR 
FAILURE: iiter=11 qiter=336 request=[q, {!frange v=val_i l=0 u=1 cost=139 
tag=t}, fq, {!frange v=val_i l=0 u=1}, fq, {! cost=92}-_query_:"{!frange 
v=val_i l=1 u=1}", fq, {!frange v=val_i l=0 u=1 cache=true tag=t}, fq, {! 
cache=true tag=t}-_query_:"{!frange v=val_i l=1 u=1}"]
   [junit4]   2> 558588 T3202 oas.SolrTestCaseJ4.tearDown ###Ending 
testRandomFiltering
   [junit4]   2> NOTE: reproduce with: ant test  -Dtestcase=TestFiltering 
-Dtests.method=testRandomFiltering -Dtests.seed=C22042E80957AE3E 
-Dtests.multiplier=3 -Dtests.slow=true -Dtests.locale=ar_LY 
-Dtests.timezone=Asia/Katmandu -Dtests.file.encoding=UTF-8
   [junit4] FAILURE 16.9s J1 | TestFiltering.testRandomFiltering <<<
   [junit4]> Throwable #1: java.lang.AssertionError: FAILURE: iiter=11 
qiter=336 request=[q, {!frange v=val_i l=0 u=1 cost=139 tag=t}, fq, {!frange 
v=val_i l=0 u=1}, fq, {! cost=92}-_query_:"{!frange v=val_i l=1 u=1}", fq, 
{!frange v=val_i l=0 u=1 cache=true tag=t}, fq, {! cache=true 
tag=t}-_query_:"{!frange v=val_i l=1 u=1}"]
   [junit4]>at 
__randomizedtesting.SeedInfo.seed([C22042E80957AE3E:DD43E12DEC70EE37]:0)
   [junit4]>at 
org.apache.solr.search.TestFiltering.testRandomFiltering(TestFiltering.java:327)
   [junit4]>at java.lang.Thread.run(Thread.java:744)
{noformat}



> Reproducible failure from TestFiltering.testRandomFiltering
> ---
>
> Key: SOLR-5618
> URL: https://issues.apache.org/jira/browse/SOLR-5618
> Project: Solr
>  Issue Type: Bug
>Reporter: Hoss Man
>
> uwe's jenkins found this in java8...
> http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Linux/9004/consoleText
> {noformat}
>[junit4]   2> NOTE: reproduce with: ant test  -Dtestcase=TestFiltering 
> -Dtests.method=testRandomFiltering -Dtests.seed=C22042E80957AE3E 
> -Dtests.multiplier=3 -Dtests.slow=true -Dtests.locale=ar_LY 
> -Dtests.timezone=Asia/Katmandu -Dtests.file.encoding=UTF-8
>[junit4] FAILURE 16.9s J1 | TestFiltering.testRandomFiltering <<<
>[junit4]> Throwable #1: java.lang.AssertionError: FAILURE: iiter=11 
> qiter=336 request=[q, {!frange v=val_i l=0 u=1 cost=139 tag=t}, fq, {!frange 
> v=val_i l=0 u=1}, fq, {! cost=92}-_query_:"{!frange v=val_i l=1 u=1}", fq, 
> {!frange v=val_i l=0 u=1 cache=true tag=t}, fq, {! cache=true 
> tag=t}-_query_:"{!frange v=val_i l=1 u=1}"]
>[junit4]>  at 
> __randomizedtesting.SeedInfo.seed([C22042E80957AE3E:DD43E12DEC70EE37]:0)
>[junit4]>  at 
> org.apache.solr.search.TestFiltering.testRandomFiltering(TestFiltering.java:327)
> {noformat}
> The seed fails consistently for me on trunk using java7, and on 4x using both 
> java7 and java6 - details to follow in comment.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Created] (SOLR-5618) Reproducible failure from TestFiltering.testRandomFiltering

2014-01-07 Thread Hoss Man (JIRA)
Hoss Man created SOLR-5618:
--

 Summary: Reproducible failure from 
TestFiltering.testRandomFiltering
 Key: SOLR-5618
 URL: https://issues.apache.org/jira/browse/SOLR-5618
 Project: Solr
  Issue Type: Bug
Reporter: Hoss Man


uwe's jenkins found this in java8...

http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Linux/9004/consoleText

{noformat}
   [junit4]   2> NOTE: reproduce with: ant test  -Dtestcase=TestFiltering 
-Dtests.method=testRandomFiltering -Dtests.seed=C22042E80957AE3E 
-Dtests.multiplier=3 -Dtests.slow=true -Dtests.locale=ar_LY 
-Dtests.timezone=Asia/Katmandu -Dtests.file.encoding=UTF-8
   [junit4] FAILURE 16.9s J1 | TestFiltering.testRandomFiltering <<<
   [junit4]> Throwable #1: java.lang.AssertionError: FAILURE: iiter=11 
qiter=336 request=[q, {!frange v=val_i l=0 u=1 cost=139 tag=t}, fq, {!frange 
v=val_i l=0 u=1}, fq, {! cost=92}-_query_:"{!frange v=val_i l=1 u=1}", fq, 
{!frange v=val_i l=0 u=1 cache=true tag=t}, fq, {! cache=true 
tag=t}-_query_:"{!frange v=val_i l=1 u=1}"]
   [junit4]>at 
__randomizedtesting.SeedInfo.seed([C22042E80957AE3E:DD43E12DEC70EE37]:0)
   [junit4]>at 
org.apache.solr.search.TestFiltering.testRandomFiltering(TestFiltering.java:327)
{noformat}

The seed fails consistently for me on trunk using java7, and on 4x using both 
java7 and java6 - details to follow in comment.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[JENKINS] Lucene-Solr-trunk-MacOSX (64bit/jdk1.7.0) - Build # 1201 - Failure!

2014-01-07 Thread Policeman Jenkins Server
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-MacOSX/1201/
Java: 64bit/jdk1.7.0 -XX:+UseCompressedOops -XX:+UseG1GC

All tests passed

Build Log:
[...truncated 9939 lines...]
   [junit4] JVM J0: stderr was not empty, see: 
/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/solr/build/solr-core/test/temp/junit4-J0-20140107_235447_516.syserr
   [junit4] >>> JVM J0: stderr (verbatim) 
   [junit4] java(208,0x149d18000) malloc: *** error for object 0x149d06ad1: 
pointer being freed was not allocated
   [junit4] *** set a breakpoint in malloc_error_break to debug
   [junit4] <<< JVM J0: EOF 

[...truncated 1 lines...]
   [junit4] ERROR: JVM J0 ended with an exception, command line: 
/Library/Java/JavaVirtualMachines/jdk1.7.0_45.jdk/Contents/Home/jre/bin/java 
-XX:+UseCompressedOops -XX:+UseG1GC -XX:+HeapDumpOnOutOfMemoryError 
-XX:HeapDumpPath=/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/heapdumps 
-Dtests.prefix=tests -Dtests.seed=6B057318ACC0851A -Xmx512M -Dtests.iters= 
-Dtests.verbose=false -Dtests.infostream=false -Dtests.codec=random 
-Dtests.postingsformat=random -Dtests.docvaluesformat=random 
-Dtests.locale=random -Dtests.timezone=random -Dtests.directory=random 
-Dtests.linedocsfile=europarl.lines.txt.gz -Dtests.luceneMatchVersion=5.0 
-Dtests.cleanthreads=perClass 
-Djava.util.logging.config.file=/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/lucene/tools/junit4/logging.properties
 -Dtests.nightly=false -Dtests.weekly=false -Dtests.slow=true 
-Dtests.asserts.gracious=false -Dtests.multiplier=1 -DtempDir=. 
-Djava.io.tmpdir=. 
-Djunit4.tempDir=/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/solr/build/solr-core/test/temp
 
-Dclover.db.dir=/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/lucene/build/clover/db
 -Djava.security.manager=org.apache.lucene.util.TestSecurityManager 
-Djava.security.policy=/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/lucene/tools/junit4/tests.policy
 -Dlucene.version=5.0-SNAPSHOT -Djetty.testMode=1 -Djetty.insecurerandom=1 
-Dsolr.directoryFactory=org.apache.solr.core.MockDirectoryFactory 
-Djava.awt.headless=true -Djdk.map.althashing.threshold=0 
-Dtests.disableHdfs=true -Dfile.encoding=ISO-8859-1 -classpath 
/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/solr/build/solr-core/classes/test:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/solr/build/solr-test-framework/classes/java:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/solr/test-framework/lib/junit4-ant-2.0.13.jar:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/solr/build/solr-core/test-files:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/lucene/build/test-framework/classes/java:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/lucene/build/codecs/classes/java:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/solr/build/solr-solrj/classes/java:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/solr/build/solr-core/classes/java:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/lucene/build/analysis/common/lucene-analyzers-common-5.0-SNAPSHOT.jar:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/lucene/build/analysis/kuromoji/lucene-analyzers-kuromoji-5.0-SNAPSHOT.jar:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/lucene/build/analysis/phonetic/lucene-analyzers-phonetic-5.0-SNAPSHOT.jar:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/lucene/build/codecs/lucene-codecs-5.0-SNAPSHOT.jar:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/lucene/build/highlighter/lucene-highlighter-5.0-SNAPSHOT.jar:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/lucene/build/memory/lucene-memory-5.0-SNAPSHOT.jar:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/lucene/build/misc/lucene-misc-5.0-SNAPSHOT.jar:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/lucene/build/spatial/lucene-spatial-5.0-SNAPSHOT.jar:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/lucene/build/expressions/lucene-expressions-5.0-SNAPSHOT.jar:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/lucene/build/suggest/lucene-suggest-5.0-SNAPSHOT.jar:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/lucene/build/grouping/lucene-grouping-5.0-SNAPSHOT.jar:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/lucene/build/queries/lucene-queries-5.0-SNAPSHOT.jar:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/lucene/build/queryparser/lucene-queryparser-5.0-SNAPSHOT.jar:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/lucene/build/join/lucene-join-5.0-SNAPSHOT.jar:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/solr/core/lib/antlr-runtime-3.5.jar:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/solr/core/lib/asm-4.1.jar:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/solr/core/lib/asm-commons-4.1.jar:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/solr/core/lib/commons-cli-1.2.jar:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/solr/core/lib/commons-codec-1.7.jar:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/solr/core/lib/commons-configuration-1.6.jar:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/solr/core/lib/com

[jira] [Commented] (SOLR-5477) Async execution of OverseerCollectionProcessor tasks

2014-01-07 Thread Anshum Gupta (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13864855#comment-13864855
 ] 

Anshum Gupta commented on SOLR-5477:


bq. in my experience, when implementing an async callback API like this, it can 
be handy to require the client to specify the magical...

Considering that we have a 1-n relationship between calls made by the client to 
the OCP and calls from the OCP to the cores, we can't really use a client-generated 
id. We would in any case need multiple ids to be generated at the OCP-core call 
level.
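
To make the fan-out concrete, a hypothetical sketch (names are illustrative, 
not from the patch):

{code:java}
import java.util.ArrayList;
import java.util.List;

public class AsyncIdSketch {
  // one OCP-level task id fans out into generated sub-ids, one per core-level
  // call -- which is why a single client-chosen id can't track the whole operation
  static List<String> coreTaskIds(String ocpTaskId, int coreCalls) {
    List<String> ids = new ArrayList<String>();
    for (int i = 0; i < coreCalls; i++) {
      ids.add(ocpTaskId + "." + i); // e.g. "7657668909.0", "7657668909.1", ...
    }
    return ids;
  }
}
{code}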

> Async execution of OverseerCollectionProcessor tasks
> 
>
> Key: SOLR-5477
> URL: https://issues.apache.org/jira/browse/SOLR-5477
> Project: Solr
>  Issue Type: Sub-task
>  Components: SolrCloud
>Reporter: Noble Paul
>Assignee: Anshum Gupta
> Attachments: SOLR-5477-CoreAdminStatus.patch
>
>
> Typical collection admin commands are long running and it is very common to 
> have the requests get timed out.  It is more of a problem if the cluster is 
> very large. Add an option to run these commands asynchronously:
> add an extra param async=true for all collection commands.
> The task is written to ZK and the caller is returned a task id. 
> A separate collection admin command will be added to poll the status of the 
> task:
> command=status&id=7657668909
> If id is not passed, all running async tasks should be listed.
> A separate queue is created to store in-process tasks. After the tasks are 
> completed the queue entry is removed. OverseerCollectionProcessor will perform 
> these tasks in multiple threads.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: oom in documentation-lint

2014-01-07 Thread Robert Muir
The jtidy-macro we use is not very efficient. It just uses the
built-in JTidyTask.

I think this is a real problem; last I checked it seemed impossible to
fix without writing a custom task to integrate with jtidy.

We could either disable it, or you could try setting a large Xmx in
ANT_OPTS as a workaround, but I do think we need to fix or disable
this.
On Tue, Jan 7, 2014 at 5:51 PM, Benson Margulies  wrote:
> Is there a recipe to avoid this?
>
> -documentation-lint:
>  [echo] checking for broken html...
> [ivy:cachepath] downloading
> http://repo1.maven.org/maven2/net/sf/jtidy/jtidy/r938/jtidy-r938.jar
> ...
> [ivy:cachepath]
> ..
> (244kB)
> [ivy:cachepath] .. (0kB)
> [ivy:cachepath] [SUCCESSFUL ] net.sf.jtidy#jtidy;r938!jtidy.jar (383ms)
> [jtidy] Checking for broken html (such as invalid tags)...
>
> BUILD FAILED
> /Users/benson/asf/lucene-solr/build.xml:57: The following error
> occurred while executing this line:
> /Users/benson/asf/lucene-solr/lucene/build.xml:208: The following
> error occurred while executing this line:
> /Users/benson/asf/lucene-solr/lucene/build.xml:214: The following
> error occurred while executing this line:
> /Users/benson/asf/lucene-solr/lucene/common-build.xml:1851:
> java.lang.OutOfMemoryError: Java heap space
> at java.util.Arrays.copyOf(Arrays.java:2271)
> at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:113)
> at java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:93)
> at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:140)
> at sun.nio.cs.StreamEncoder.writeBytes(StreamEncoder.java:221)
> at sun.nio.cs.StreamEncoder.implWrite(StreamEncoder.java:282)
> at sun.nio.cs.StreamEncoder.write(StreamEncoder.java:125)
> at java.io.OutputStreamWriter.write(OutputStreamWriter.java:207)
> at java.io.BufferedWriter.flushBuffer(BufferedWriter.java:129)
> at java.io.BufferedWriter.write(BufferedWriter.java:230)
> at java.io.PrintWriter.write(PrintWriter.java:456)
> at java.io.PrintWriter.write(PrintWriter.java:473)
> at java.io.PrintWriter.print(PrintWriter.java:603)
> at java.io.PrintWriter.println(PrintWriter.java:739)
> at org.w3c.tidy.Report.printMessage(Report.java:754)
> at org.w3c.tidy.Report.errorSummary(Report.java:1572)
> at org.w3c.tidy.Tidy.parse(Tidy.java:608)
> at org.w3c.tidy.Tidy.parse(Tidy.java:263)
> at org.w3c.tidy.ant.JTidyTask.processFile(JTidyTask.java:457)
> at org.w3c.tidy.ant.JTidyTask.executeSet(JTidyTask.java:420)
> at org.w3c.tidy.ant.JTidyTask.execute(JTidyTask.java:364)
> at org.apache.tools.ant.UnknownElement.execute(UnknownElement.java:291)
> at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:601)
> at org.apache.tools.ant.dispatch.DispatchUtils.execute(DispatchUtils.java:106)
> at org.apache.tools.ant.Task.perform(Task.java:348)
> at org.apache.tools.ant.taskdefs.Sequential.execute(Sequential.java:68)
> at org.apache.tools.ant.UnknownElement.execute(UnknownElement.java:291)
> at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:601)
>
> Total time: 3 minutes 35 seconds
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
>

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5389) Even more doc for construction of TokenStream components

2014-01-07 Thread Benson Margulies (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13864825#comment-13864825
 ] 

Benson Margulies commented on LUCENE-5389:
--

https://github.com/apache/lucene-solr/pull/14



> Even more doc for construction of TokenStream components
> 
>
> Key: LUCENE-5389
> URL: https://issues.apache.org/jira/browse/LUCENE-5389
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Benson Margulies
>
> There are more useful things to tell would-be authors of tokenizers. Let's 
> tell them.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



lucene-solr pull request: LUCENE-5389: more analysis advice.

2014-01-07 Thread benson-basis
GitHub user benson-basis opened a pull request:

https://github.com/apache/lucene-solr/pull/14

LUCENE-5389: more analysis advice.

Before we change the protocol for tokenizer construction,
let's get plenty of explanation of the existing one, in case
of a 4.7.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/benson-basis/lucene-solr 
lucene-5389-more-analysis-doc

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/lucene-solr/pull/14.patch


commit 1ddc14c97396183ac99fb9ee5a40bdc09b3994c5
Author: Benson Margulies 
Date:   2014-01-07T22:52:11Z

LUCENE-5389: more analysis advice.
Before we change the protocol for tokenizer construction,
let's get plenty of explanation of the existing one, in case
of a 4.7.




-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



oom in documentation-lint

2014-01-07 Thread Benson Margulies
Is there a recipe to avoid this?

-documentation-lint:
 [echo] checking for broken html...
[ivy:cachepath] downloading
http://repo1.maven.org/maven2/net/sf/jtidy/jtidy/r938/jtidy-r938.jar
...
[ivy:cachepath]
..
(244kB)
[ivy:cachepath] .. (0kB)
[ivy:cachepath] [SUCCESSFUL ] net.sf.jtidy#jtidy;r938!jtidy.jar (383ms)
[jtidy] Checking for broken html (such as invalid tags)...

BUILD FAILED
/Users/benson/asf/lucene-solr/build.xml:57: The following error
occurred while executing this line:
/Users/benson/asf/lucene-solr/lucene/build.xml:208: The following
error occurred while executing this line:
/Users/benson/asf/lucene-solr/lucene/build.xml:214: The following
error occurred while executing this line:
/Users/benson/asf/lucene-solr/lucene/common-build.xml:1851:
java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Arrays.java:2271)
at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:113)
at java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:93)
at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:140)
at sun.nio.cs.StreamEncoder.writeBytes(StreamEncoder.java:221)
at sun.nio.cs.StreamEncoder.implWrite(StreamEncoder.java:282)
at sun.nio.cs.StreamEncoder.write(StreamEncoder.java:125)
at java.io.OutputStreamWriter.write(OutputStreamWriter.java:207)
at java.io.BufferedWriter.flushBuffer(BufferedWriter.java:129)
at java.io.BufferedWriter.write(BufferedWriter.java:230)
at java.io.PrintWriter.write(PrintWriter.java:456)
at java.io.PrintWriter.write(PrintWriter.java:473)
at java.io.PrintWriter.print(PrintWriter.java:603)
at java.io.PrintWriter.println(PrintWriter.java:739)
at org.w3c.tidy.Report.printMessage(Report.java:754)
at org.w3c.tidy.Report.errorSummary(Report.java:1572)
at org.w3c.tidy.Tidy.parse(Tidy.java:608)
at org.w3c.tidy.Tidy.parse(Tidy.java:263)
at org.w3c.tidy.ant.JTidyTask.processFile(JTidyTask.java:457)
at org.w3c.tidy.ant.JTidyTask.executeSet(JTidyTask.java:420)
at org.w3c.tidy.ant.JTidyTask.execute(JTidyTask.java:364)
at org.apache.tools.ant.UnknownElement.execute(UnknownElement.java:291)
at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at org.apache.tools.ant.dispatch.DispatchUtils.execute(DispatchUtils.java:106)
at org.apache.tools.ant.Task.perform(Task.java:348)
at org.apache.tools.ant.taskdefs.Sequential.execute(Sequential.java:68)
at org.apache.tools.ant.UnknownElement.execute(UnknownElement.java:291)
at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)

Total time: 3 minutes 35 seconds

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (SOLR-5244) Full Search Result Export

2014-01-07 Thread Joel Bernstein (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13864689#comment-13864689
 ] 

Joel Bernstein edited comment on SOLR-5244 at 1/7/14 10:12 PM:
---

I'll do some testing of the performance of this. Unless I'm missing something, 
though, it looks like you have to go through a PagedBytes.Reader and a 
PackedInts.Reader to get the BytesRef. I think it would have performance similar 
to the in-memory BinaryDocValues I was using for my initial test.

The cache I was thinking of building would be backed by an hppc 
IntObjectOpenHashMap, with which I should be able to do 10 million+ read 
operations per second.


was (Author: joel.bernstein):
I'll do some testing of the performance of this. Unless I'm missing something 
though, it looks like you have go through a PagedBytes.Reader, 
PackedInts.Reader to get the BytesRef. I think would perform with similar 
performance to the in memory BinaryDocValues I was using for my initial test.

The cache I was thinking of building would be backed by hppc 
IntObjectOpenHashMap, which I should been able to do 10 million+ read 
operations per second.
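
A minimal sketch of that cache idea, assuming hppc's IntObjectOpenHashMap as 
named above (illustrative only, not the patch):

{code:java}
import com.carrotsearch.hppc.IntObjectOpenHashMap;
import org.apache.lucene.util.BytesRef;

public class DocValueCacheSketch {
  private final IntObjectOpenHashMap<BytesRef> byDoc =
      new IntObjectOpenHashMap<BytesRef>();

  public void put(int doc, BytesRef bytes) {
    // deep-copy: DocValues readers typically reuse the BytesRef they return
    byDoc.put(doc, BytesRef.deepCopyOf(bytes));
  }

  public BytesRef get(int doc) {
    return byDoc.get(doc); // primitive-int key lookup, no boxing on the hot path
  }
}
{code}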

> Full Search Result Export
> -
>
> Key: SOLR-5244
> URL: https://issues.apache.org/jira/browse/SOLR-5244
> Project: Solr
>  Issue Type: New Feature
>  Components: search
>Affects Versions: 5.0
>Reporter: Joel Bernstein
>Priority: Minor
> Fix For: 5.0
>
> Attachments: SOLR-5244.patch
>
>
> It would be great if Solr could efficiently export entire search result sets 
> without scoring or ranking documents. This would allow external systems to 
> perform rapid bulk imports from Solr. It also provides a possible platform 
> for exporting results to support distributed join scenarios within Solr.
> This ticket provides a patch that has two pluggable components:
> 1) ExportQParserPlugin: a post filter that gathers a BitSet with 
> document results and does not delegate to ranking collectors. Instead it puts 
> the BitSet on the request context.
> 2) BinaryExportWriter: an output writer that iterates the BitSet and prints 
> the entire result as a binary stream. A header is provided at the beginning 
> of the stream so external clients can self-configure.
> Note:
> These two components will be sufficient for a non-distributed environment. 
> For distributed export a new Request handler will need to be developed.
> After applying the patch and building the dist or example, you can register 
> the components through the following changes to solrconfig.xml
> Register export contrib libraries with a <lib/> directive in solrconfig.xml.
> Register the "export" queryParser with the following line:
> <queryParser name="export" class="org.apache.solr.export.ExportQParserPlugin"/>
> Register the "xbin" writer:
> <queryResponseWriter name="xbin" class="org.apache.solr.export.BinaryExportWriter"/>
> The following query will perform the export:
> {code}
> http://localhost:8983/solr/collection1/select?q=*:*&fq={!export}&wt=xbin&fl=join_i
> {code}
> Initial patch supports export of four data-types:
> 1) Single-value trie int, long, and float
> 2) Binary doc values.
> The numerics are currently exported from the FieldCache, and the binary doc 
> values can be in memory or on disk.
> Since this is designed to export very large result sets efficiently, stored 
> fields are not used for the export.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-5617) Default classloader restrictions may be too tight

2014-01-07 Thread Shawn Heisey (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shawn Heisey updated SOLR-5617:
---

Priority: Minor  (was: Major)

> Default classloader restrictions may be too tight
> -
>
> Key: SOLR-5617
> URL: https://issues.apache.org/jira/browse/SOLR-5617
> Project: Solr
>  Issue Type: Bug
>Affects Versions: 4.6
>Reporter: Shawn Heisey
>Priority: Minor
>  Labels: security
> Fix For: 5.0, 4.7
>
>
> SOLR-4882 introduced restrictions for the Solr class loader that cause 
> resources outside the instanceDir to fail to load.  This is a very good goal, 
> but what if you have common resources like included config files that are 
> outside instanceDir but are still fully inside the solr home?
> I can understand not wanting to load resources from an arbitrary path, but 
> the solr home and its children should be about as trustworthy as instanceDir.
> Ideally I'd like to have anything that's in $\{solr.solr.home\} trusted 
> automatically.  If I need to define a system property to make this happen, 
> I'm OK with that -- as long as I don't have to turn off the safety checking 
> entirely.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5388) Eliminate construction over readers for Tokenizer

2014-01-07 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13864742#comment-13864742
 ] 

Robert Muir commented on LUCENE-5388:
-

+1, it's really silly it's this way. I guess it's the right thing to do this for 
5.0 only: I wish we had done it for 4.0, but it is what it is.

Should be a rather large and noisy change, unfortunately. I can help, let me 
know.

> Eliminate construction over readers for Tokenizer
> -
>
> Key: LUCENE-5388
> URL: https://issues.apache.org/jira/browse/LUCENE-5388
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/other
>Reporter: Benson Margulies
>
> In the modern world, Tokenizers are intended to be reusable, with input 
> supplied via #setReader. The constructors that take Reader are a vestige. 
> Worse yet, they invite people to make mistakes in handling the reader that 
> tangle them up with the state machine in Tokenizer. The sensible thing is to 
> eliminate these ctors, and force setReader usage.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5617) Default classloader restrictions may be too tight

2014-01-07 Thread Shawn Heisey (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13864741#comment-13864741
 ] 

Shawn Heisey commented on SOLR-5617:


I have figured out a workaround.  I've got a config structure that heavily uses 
xinclude and symlinks.  By changing things around so that only the symlinks 
traverse upwards and xinclude only refers to "local" files, I no longer need to 
enable unsafe loading.

I still think that it would be useful to fix this issue, but the urgency is 
gone.
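
For reference, the unsafe-loading switch in question is the system property 
introduced by SOLR-4882:

{noformat}
-Dsolr.allow.unsafe.resourceloading=true
{noformat}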

> Default classloader restrictions may be too tight
> -
>
> Key: SOLR-5617
> URL: https://issues.apache.org/jira/browse/SOLR-5617
> Project: Solr
>  Issue Type: Bug
>Affects Versions: 4.6
>Reporter: Shawn Heisey
>  Labels: security
> Fix For: 5.0, 4.7
>
>
> SOLR-4882 introduced restrictions for the Solr class loader that cause 
> resources outside the instanceDir to fail to load.  This is a very good goal, 
> but what if you have common resources like included config files that are 
> outside instanceDir but are still fully inside the solr home?
> I can understand not wanting to load resources from an arbitrary path, but 
> the solr home and its children should be about as trustworthy as instanceDir.
> Ideally I'd like to have anything that's in $\{solr.solr.home\} trusted 
> automatically.  If I need to define a system property to make this happen, 
> I'm OK with that -- as long as I don't have to turn off the safety checking 
> entirely.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-5170) Spatial multi-value distance sort via DocValues

2014-01-07 Thread Jeff Wartes (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Wartes updated SOLR-5170:
--

Attachment: SOLR-5170_spatial_multi-value_sort_via_docvalues.patch.txt

Adds recipDistance scoring; lat/long is one param.

> Spatial multi-value distance sort via DocValues
> ---
>
> Key: SOLR-5170
> URL: https://issues.apache.org/jira/browse/SOLR-5170
> Project: Solr
>  Issue Type: New Feature
>  Components: spatial
>Reporter: David Smiley
>Assignee: David Smiley
> Attachments: SOLR-5170_spatial_multi-value_sort_via_docvalues.patch, 
> SOLR-5170_spatial_multi-value_sort_via_docvalues.patch.txt
>
>
> The attached patch implements spatial multi-value distance sorting.  In other 
> words, a document can have more than one point per field, and using a 
> provided function query, it will return the distance to the closest point.  
> The data goes into binary DocValues, and as such it's pretty friendly to 
> realtime search requirements, and it only uses 8 bytes per point.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5170) Spatial multi-value distance sort via DocValues

2014-01-07 Thread Jeff Wartes (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13864738#comment-13864738
 ] 

Jeff Wartes commented on SOLR-5170:
---

I've been using this patch with some minor tweaks and solr 4.3.1 in production 
for about six months now. Since I was applying it again against 4.6 this 
morning, I figured I should attach my tweaks, and mention it passes tests 
against 4.6.

This does NOT address the design issues David raises in the initial comment. 
The changes vs. the initial patch file allow it to be applied against a greater 
range of Solr versions, and bring it a little closer to feeling the same as 
geofilt's params.
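
As an aside on the description's 8-bytes-per-point figure, a hedged sketch of 
one way to pack a point that compactly (illustrative; the patch's actual 
encoding may differ):

{code:java}
public class PointPacking {
  // pack lat/lon as two 4-byte floats into one long: 8 bytes per point
  static long encode(float lat, float lon) {
    return ((long) Float.floatToIntBits(lat) << 32)
        | (Float.floatToIntBits(lon) & 0xFFFFFFFFL);
  }

  static float decodeLat(long packed) {
    return Float.intBitsToFloat((int) (packed >>> 32));
  }

  static float decodeLon(long packed) {
    return Float.intBitsToFloat((int) packed);
  }
}
{code}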

> Spatial multi-value distance sort via DocValues
> ---
>
> Key: SOLR-5170
> URL: https://issues.apache.org/jira/browse/SOLR-5170
> Project: Solr
>  Issue Type: New Feature
>  Components: spatial
>Reporter: David Smiley
>Assignee: David Smiley
> Attachments: SOLR-5170_spatial_multi-value_sort_via_docvalues.patch
>
>
> The attached patch implements spatial multi-value distance sorting.  In other 
> words, a document can have more than one point per field, and using a 
> provided function query, it will return the distance to the closest point.  
> The data goes into binary DocValues, and as such it's pretty friendly to 
> realtime search requirements, and it only uses 8 bytes per point.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (LUCENE-5389) Even more doc for construction of TokenStream components

2014-01-07 Thread Benson Margulies (JIRA)
Benson Margulies created LUCENE-5389:


 Summary: Even more doc for construction of TokenStream components
 Key: LUCENE-5389
 URL: https://issues.apache.org/jira/browse/LUCENE-5389
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Benson Margulies


There are more useful things to tell would-be authors of tokenizers. Let's tell 
them.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (LUCENE-5388) Eliminate construction over readers for Tokenizer

2014-01-07 Thread Benson Margulies (JIRA)
Benson Margulies created LUCENE-5388:


 Summary: Eliminate construction over readers for Tokenizer
 Key: LUCENE-5388
 URL: https://issues.apache.org/jira/browse/LUCENE-5388
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/other
Reporter: Benson Margulies


In the modern world, Tokenizers are intended to be reusable, with input 
supplied via #setReader. The constructors that take Reader are a vestige. Worse 
yet, they invite people to make mistakes in handling the reader that tangle 
them up with the state machine in Tokenizer. The sensible thing is to eliminate 
these ctors, and force setReader usage.
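
A minimal sketch of the reuse contract in question, against the 4.x API 
(WhitespaceTokenizer chosen arbitrarily):

{code:java}
import java.io.StringReader;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.core.WhitespaceTokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.util.Version;

public class ReuseSketch {
  public static void main(String[] args) throws Exception {
    // the ctor Reader is the vestige this issue wants to remove...
    Tokenizer tok = new WhitespaceTokenizer(Version.LUCENE_46, new StringReader(""));
    CharTermAttribute term = tok.addAttribute(CharTermAttribute.class);
    // ...because real input is supposed to arrive via setReader, once per reuse cycle
    for (String doc : new String[] {"first doc", "second doc"}) {
      tok.setReader(new StringReader(doc));
      tok.reset();
      while (tok.incrementToken()) {
        System.out.println(term.toString());
      }
      tok.end();
      tok.close(); // releases the reader; the Tokenizer instance stays reusable
    }
  }
}
{code}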




--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (SOLR-5617) Default classloader restrictions may be too tight

2014-01-07 Thread Shawn Heisey (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13864505#comment-13864505
 ] 

Shawn Heisey edited comment on SOLR-5617 at 1/7/14 9:44 PM:


Here's a stacktrace from my attempted start on 4.6.0 without the option to 
allow unsafe resource loading.  The solr home is /index/solr4:

{noformat}
ERROR - 2014-01-07 14:37:05.493; org.apache.solr.common.SolrException; 
null:org.apache.solr.common.SolrException: SolrCore 's1build' is not available 
due to init failure: Could not load config file 
/index/solr4/cores/s1_0/solrconfig.xml
at org.apache.solr.core.CoreContainer.getCore(CoreContainer.java:825)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:293)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:201)
at 
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1476)
at 
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:499)
at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
at 
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
at 
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
at 
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1086)
at 
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:428)
at 
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
at 
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1020)
at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
at 
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
at 
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
at 
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
at org.eclipse.jetty.server.Server.handle(Server.java:370)
at 
org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:494)
at 
org.eclipse.jetty.server.AbstractHttpConnection.content(AbstractHttpConnection.java:982)
at 
org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content(AbstractHttpConnection.java:1043)
at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:865)
at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:240)
at 
org.eclipse.jetty.server.AsyncHttpConnection.handle(AsyncHttpConnection.java:82)
at 
org.eclipse.jetty.io.nio.SelectChannelEndPoint.handle(SelectChannelEndPoint.java:667)
at 
org.eclipse.jetty.io.nio.SelectChannelEndPoint$1.run(SelectChannelEndPoint.java:52)
at 
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
at 
org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
at java.lang.Thread.run(Thread.java:724)
Caused by: org.apache.solr.common.SolrException: Could not load config file 
/index/solr4/cores/s1_0/solrconfig.xml
at 
org.apache.solr.core.CoreContainer.createFromLocal(CoreContainer.java:532)
at org.apache.solr.core.CoreContainer.create(CoreContainer.java:599)
at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:253)
at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:245)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
... 1 more
Caused by: org.apache.solr.common.SolrException: org.xml.sax.SAXParseException; 
systemId: solrres:/solrconfig.xml; lineNumber: 7; columnNumber: 70; An include 
with href '../../../config/common/luceneMatchVersion.xml'failed, and no 
fallback element was found.
at org.apache.solr.core.Config.(Config.java:148)
at org.apache.solr.core.Config.(Config.java:86)
at org.apache.solr.core.SolrConfig.(SolrConfig.java:129)
at 
org.apache.solr.core.CoreContainer.createFromLocal(CoreContainer.java:529)
... 11 more
Caused by: org.xml.sax.SAXParseException; systemId: solrres:/solrconfig.xml; 
lineNumber: 7; columnNumber: 70; An include with href 
'../../../config/common/luceneMatchVersion.xml'failed, and no fall

[jira] [Updated] (SOLR-5617) Default classloader restrictions may be too tight

2014-01-07 Thread Shawn Heisey (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shawn Heisey updated SOLR-5617:
---

Description: 
SOLR-4882 introduced restrictions for the Solr class loader that cause 
resources outside the instanceDir to fail to load.  This is a very good goal, 
but what if you have common resources like included config files that are 
outside instanceDir but are still fully inside the solr home?

I can understand not wanting to load resources from an arbitrary path, but the 
solr home and its children should be about as trustworthy as instanceDir.

Ideally I'd like to have anything that's in $\{solr.solr.home\} trusted 
automatically.  If I need to define a system property to make this happen, I'm 
OK with that -- as long as I don't have to turn off the safety checking 
entirely.
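
For concreteness, the kind of include that trips the check looks like this in 
solrconfig.xml (the href mirrors the one from the stack trace in the comments; 
the exact layout is site-specific):

{code:xml}
<!-- shared config living above instanceDir but still under the solr home -->
<xi:include xmlns:xi="http://www.w3.org/2001/XInclude"
            href="../../../config/common/luceneMatchVersion.xml"/>
{code}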

  was:
SOLR-4882 introduced restrictions for the Solr class loader that cause 
resources outside the instanceDir to fail to load.  This is a very good goal, 
but it also causes resources in $\{solr.solr.home\}/lib to fail to load.  In 
order to get those jars to work, I must turn off all SOLR-4882 safety checking.

I can understand not wanting to load resources from an arbitrary path, but the 
solr home and its children should be about as trustworthy as instanceDir.

Ideally I'd like to have $\{solr.solr.home\}/lib trusted automatically, since 
it is searched automatically.  If I need to define a system property to make 
this happen, I'm OK with that -- as long as I don't have to turn off the safety 
checking entirely.


> Default classloader restrictions may be too tight
> -
>
> Key: SOLR-5617
> URL: https://issues.apache.org/jira/browse/SOLR-5617
> Project: Solr
>  Issue Type: Bug
>Affects Versions: 4.6
>Reporter: Shawn Heisey
>  Labels: security
> Fix For: 5.0, 4.7
>
>
> SOLR-4882 introduced restrictions for the Solr class loader that cause 
> resources outside the instanceDir to fail to load.  This is a very good goal, 
> but what if you have common resources like included config files that are 
> outside instanceDir but are still fully inside the solr home?
> I can understand not wanting to load resources from an arbitrary path, but 
> the solr home and its children should be about as trustworthy as instanceDir.
> Ideally I'd like to have anything that's in $\{solr.solr.home\} trusted 
> automatically.  If I need to define a system property to make this happen, 
> I'm OK with that -- as long as I don't have to turn off the safety checking 
> entirely.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: The Old Git Discussion

2014-01-07 Thread Mark Miller
I don’t really buy the fad argument, but as I’ve said, I’m willing to wait a 
little longer for others to catch on. I try and follow the stats and reports 
and articles on this pretty closely.

As I mentioned early in the thread, by all appearances, the shift from SVN to 
GIT looks much like the shift from CVS to SVN. This was not a fad change, nor 
is the next mass movement likely to be.

Just like no one starts a project on CVS anymore, we are almost already to the 
point where new projects start exclusively on GIT - especially open source.

I’m happy to sit back and watch the trend continue though. The number of GIT 
users in the committee and among the committers only grows every time the 
discussion comes up.

If this was 2009, 2010, 2011 … who knows, perhaps I would buy some fad 
argument. But it just doesn’t jive in 2014.

- Mark

On Jan 7, 2014, at 3:33 PM, Lajos  wrote:

> I've followed this thread with interest, and although I'm (sadly) a lapsed 
> Apache committer (not Lucene/Solr), I finally had to comment as I've just 
> gone through the pain of learning git after many happy years with svn.
> 
> In my long experience in IT I've learned one incontrovertible fact: most 
> times, the technical merits of one technology over another are not nearly as 
> important as everyone thinks. It is really all about how WELL you use a given 
> technology to get the job done. The stuff I do in git now, I could do in SVN, 
> and vice versa. I'd wager I could do the same in CVS or even older 
> technologies. It's like Ant versus Maven versus Gradle. I can do the same in 
> each of these. Each has their own good and bad points. I'll stick with Ant 
> and SVN to the end but hey, if a client works only with Gradle and Git and 
> XYZ technology and has an intellectual investment there, I'm not gonna argue 
> the point on technical merits.
> 
> That being said, I think the worst argument one could make about anything is 
> that "we should move to it because everyone else is". People will flock to 
> fads as much as (I could argue: more than) to genuine technical improvements 
> (anyone remember the 70s? 80s? 90s?). Git feels a bit faddish to me, and is 
> definitely immature. I get some of the advantages, but I don't think I should 
> have to be a gitk expert to use the damn software - it's over-engineered and 
> actually opens up the door to more convoluted development processes.
> 
> Whether Git is a fad or not, the issue, as pointed out below, is supporting 
> the way contributors work. The win-win situation would be to keep the core 
> based on SVN but support git contributions (as I know someone else 
> suggested). SVN is a technology that is stable and which all core committers 
> know like the back of their hands - no sense in wasting time learning git 
> when people are donating time and that time is better spent on JIRAs. What I 
> don't know is how this GIT integration would work, but I'd hope it could be 
> done.
> 
> Just to push home the point, I'll bet most of us who have been around a while 
> have plenty of stories of IT shops moving from one technology to another ... 
> and then in a few years to another ... and then to another - all because some 
> manager got a burr up his rear or was wined and dined by a vendor. Why? Why 
> hurt productivity for the sake of keeping up with the times? How about setting 
> an example of sticking with what works despite the mad rush to GitHub?
> 
> My €.02.
> 
> Lajos Moczar
> 
> 
> 
> 
> On 06/01/2014 17:01, Robert Muir wrote:
>> On Sun, Jan 5, 2014 at 12:07 PM, Mark Miller  wrote:
>>> My point here is not really to discuss the merits of Git VS SVN on a feature
>>> / interface basis. We might as well talk about MySQL vs Postgres.
>>> 
>>> Personally, I prefer GIT. It feels good when I use it. SVN feels like crap.
>>> That doesn't make me want to move. I've used SVN for years with Lucene/Solr,
>>> and like everyone, it's pretty much second nature.
>>> 
>>> The problem is the world is moving. It may not be clear to everyone yet, but
>>> give it a bit more time and it will be.
>>> 
>>> Git already owns the open source world. It rivals SVN by most guesses in the
>>> proprietary world. This is a strong hard trend. The same trend that saw SVN
>>> eat CVS. I think clearly, a distributed version control system will
>>> dominate. And clearly Git has won.
>>> 
>>> I'm not ready to call a vote, because I don't think it's critical we switch
>>> yet. But I wanted to continue the discussion, as obviously, plenty of it
>>> will be needed over time before we make such a switch.
>>> 
>>> It's not about one thing being better than the other. It's about using what
>>> everyone else uses so you don't provide a barrier to contribution. It's
>>> about the post I linked to when I started this thread.
>>> 
>>> I personally don't care about pull requests and Github. I don't think any of
>>> its features are that great, other than it acts as a central repo. Git is
>>> not good because of Github IMO. 

[jira] [Commented] (SOLR-5244) Full Search Result Export

2014-01-07 Thread Joel Bernstein (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13864689#comment-13864689
 ] 

Joel Bernstein commented on SOLR-5244:
--

I'll do some testing of the performance of this. Unless I'm missing something, 
though, it looks like you have to go through a PagedBytes.Reader and a 
PackedInts.Reader to get the BytesRef. I think that would perform similarly to 
the in-memory BinaryDocValues I was using for my initial test.

The cache I was thinking of building would be backed by hppc's 
IntObjectOpenHashMap, with which I should be able to do 10 million+ read 
operations per second.
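
To make that concrete, a minimal sketch of such a cache, assuming hppc's 
IntObjectOpenHashMap (any primitive-int-keyed map would do) and a caller that 
falls back to DocValues on a miss:

{code}
import com.carrotsearch.hppc.IntObjectOpenHashMap;
import org.apache.lucene.util.BytesRef;

/** Sketch: per-segment cache of the top-N most frequent terms, keyed by ord. */
class TopTermsCache {
  private final IntObjectOpenHashMap<BytesRef> ordToTerm;

  TopTermsCache(int expectedSize) {
    ordToTerm = new IntObjectOpenHashMap<BytesRef>(expectedSize);
  }

  void put(int ord, BytesRef term) {
    // deep copy: DocValues implementations typically reuse the returned buffer
    ordToTerm.put(ord, BytesRef.deepCopyOf(term));
  }

  /** @return the cached term, or null on a miss (caller then hits DocValues). */
  BytesRef get(int ord) {
    return ordToTerm.get(ord);
  }
}
{code}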

> Full Search Result Export
> -
>
> Key: SOLR-5244
> URL: https://issues.apache.org/jira/browse/SOLR-5244
> Project: Solr
>  Issue Type: New Feature
>  Components: search
>Affects Versions: 5.0
>Reporter: Joel Bernstein
>Priority: Minor
> Fix For: 5.0
>
> Attachments: SOLR-5244.patch
>
>
> It would be great if Solr could efficiently export entire search result sets 
> without scoring or ranking documents. This would allow external systems to 
> perform rapid bulk imports from Solr. It also provides a possible platform 
> for exporting results to support distributed join scenarios within Solr.
> This ticket provides a patch that has two pluggable components:
> 1) ExportQParserPlugin: which is a post filter that gathers a BitSet with 
> document results and does not delegate to ranking collectors. Instead it puts 
> the BitSet on the request context.
> 2) BinaryExportWriter: an output writer that iterates the BitSet and prints 
> the entire result as a binary stream. A header is provided at the beginning 
> of the stream so external clients can self configure.
> Note:
> These two components will be sufficient for a non-distributed environment. 
> For distributed export a new Request handler will need to be developed.
> After applying the patch and building the dist or example, you can register 
> the components through the following changes to solrconfig.xml
> Register the export contrib libraries with a <lib/> directive in solrconfig.xml.
> Register the "export" queryParser with the following line:
> {code}
> <queryParser name="export" class="org.apache.solr.export.ExportQParserPlugin"/>
> {code}
> Register the "xbin" writer:
> {code}
> <queryResponseWriter name="xbin" class="org.apache.solr.export.BinaryExportWriter"/>
> {code}
> The following query will perform the export:
> {code}
> http://localhost:8983/solr/collection1/select?q=*:*&fq={!export}&wt=xbin&fl=join_i
> {code}
> Initial patch supports export of four data-types:
> 1) Single value trie int, long and float
> 2) Binary doc values.
> The numerics are currently exported from the FieldCache and the Binary doc 
> values can be in memory or on disk.
> Since this is designed to export very large result sets efficiently, stored 
> fields are not used for the export.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5354) Blended score in AnalyzingInfixSuggester

2014-01-07 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13864683#comment-13864683
 ] 

Michael McCandless commented on LUCENE-5354:


Woops, sorry, this fell below the event horizon of my TODO list.  I'll look at 
your new patch soon.

There is an existing performance test, LookupBenchmarkTest, but it's a bit 
tricky to run.  See the comment on LUCENE-5030: 
https://issues.apache.org/jira/browse/LUCENE-5030?focusedCommentId=13689155&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13689155

> Blended score in AnalyzingInfixSuggester
> 
>
> Key: LUCENE-5354
> URL: https://issues.apache.org/jira/browse/LUCENE-5354
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/spellchecker
>Affects Versions: 4.4
>Reporter: Remi Melisson
>Priority: Minor
>  Labels: suggester
> Attachments: LUCENE-5354.patch, LUCENE-5354_2.patch
>
>
> I'm working on a custom suggester derived from the AnalyzingInfix. I require 
> what is called a "blended score" (//TODO ln.399 in AnalyzingInfixSuggester) 
> to transform the suggestion weights depending on the position of the searched 
> term(s) in the text.
> Right now, I'm using an easy solution:
> If I want 10 suggestions, then I search against the current ordered index for 
> the first 100 results and transform the weight:
> bq. a) by using the term position in the text (found with TermVector and 
> DocsAndPositionsEnum)
> or
> bq. b) by multiplying the weight by the score of a SpanQuery that I add when 
> searching
> and return the updated 10 most weighted suggestions.
> Since we usually don't need to suggest so many things, the bigger search + 
> rescoring overhead is not so significant but I agree that this is not the 
> most elegant solution.
> We could include this factor (here the position of the term) directly into 
> the index.
> So, I can contribute to this if you think it's worth adding it.
> Do you think I should tweak AnalyzingInfixSuggester, subclass it, or create a 
> dedicated class?
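
A rough sketch of approach (a) against the 4.x term-vector API (the wiring 
and the exact boost formula are assumptions, not part of any patch):

{code}
import java.io.IOException;
import org.apache.lucene.index.DocsAndPositionsEnum;
import org.apache.lucene.index.Terms;
import org.apache.lucene.index.TermsEnum;
import org.apache.lucene.util.BytesRef;

// Derive a weight multiplier from the first position of the searched term
// in one document's term vector - earlier occurrences boost more.
static double positionBoost(Terms termVector, BytesRef term) throws IOException {
  TermsEnum te = termVector.iterator(null);  // 4.x signature: iterator(reuse)
  if (!te.seekExact(term)) {
    return 1.0;                              // term absent: leave weight alone
  }
  DocsAndPositionsEnum positions = te.docsAndPositions(null, null);
  positions.nextDoc();                       // a term vector covers one doc
  int firstPosition = positions.nextPosition();
  return 1.0 / (1.0 + firstPosition);        // illustrative formula only
}
{code}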



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Iterating BinaryDocValues

2014-01-07 Thread Michael McCandless
Going sequentially should help, if the pages are not hot (in the OS's IO cache).

You can also use a different DVFormat, e.g. Direct, but this holds all
bytes in RAM.
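
For bulk extraction, a minimal sketch of the sequential scan against the 4.x
API ("field", the reader and the BitSet wiring are placeholders):

import java.io.IOException;
import java.util.BitSet;
import org.apache.lucene.index.AtomicReader;
import org.apache.lucene.index.BinaryDocValues;
import org.apache.lucene.util.BytesRef;

// Walk matching docIDs in order so the producer's reads stay mostly
// forward-only instead of random seeks.
static void dumpMatches(AtomicReader reader, BitSet bits, String field)
    throws IOException {
  BinaryDocValues values = reader.getBinaryDocValues(field);
  if (values == null) {
    return; // field has no binary doc values in this segment
  }
  BytesRef scratch = new BytesRef();
  for (int doc = bits.nextSetBit(0); doc >= 0; doc = bits.nextSetBit(doc + 1)) {
    values.get(doc, scratch); // fills scratch; consume before the next call
    // ... write scratch.bytes[scratch.offset .. scratch.offset+scratch.length) ...
  }
}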

Mike McCandless

http://blog.mikemccandless.com


On Tue, Jan 7, 2014 at 1:09 PM, Mikhail Khludnev
 wrote:
> Joel,
>
> I tried to hack it straightforwardly, but found no free gain there. The only
> attempt I can suggest is to try to reuse bytes in
> https://github.com/apache/lucene-solr/blame/trunk/lucene/core/src/java/org/apache/lucene/codecs/lucene45/Lucene45DocValuesProducer.java#L401
> Right now it allocates bytes every time, which besides GC pressure can also
> hurt memory access locality. Could you try fixing the memory waste and
> repeating the performance test?
>
> Have a good hack!
>
>
> On Mon, Dec 23, 2013 at 9:51 PM, Joel Bernstein  wrote:
>>
>>
>> Hi,
>>
>> I'm looking for a faster way to perform large scale docId -> bytesRef
>> lookups for BinaryDocValues.
>>
>> I'm finding that I can't get the performance that I need from the random
>> access seek in the BinaryDocValues interface.
>>
>> I'm wondering if sequentially scanning the docValues would be a faster
>> approach. I have a BitSet of matching docs, so if I sequentially moved
>> through the docValues I could test each one against that bitset.
>>
>> Wondering if that approach would be faster for bulk extracts and how
>> tricky it would be to add an iterator to the BinaryDocValues interface?
>>
>> Thanks,
>> Joel
>
>
>
>
> --
> Sincerely yours
> Mikhail Khludnev
> Principal Engineer,
> Grid Dynamics
>
>

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: The Old Git Discussion

2014-01-07 Thread Lajos
I've followed this thread with interest, and although I'm (sadly) a 
lapsed Apache committer (not Lucene/Solr), I finally had to comment as 
I've just gone through the pain of learning git after many happy years 
with svn.


In my long experience in IT I've learned one incontrovertible fact: most 
times, the technical merits of one technology over another are not 
nearly as important as everyone thinks. It is really all about how WELL 
you use a given technology to get the job done. The stuff I do in git 
now, I could do in SVN, and vice versa. I'd wager I could do the same in 
CVS or even older technologies. It's like Ant versus Maven versus Gradle. 
I can do the same in each of these. Each has their own good and bad 
points. I'll stick with Ant and SVN to the end but hey, if a client 
works only with Gradle and Git and XYZ technology and has an 
intellectual investment there, I'm not gonna argue the point on 
technical merits.


That being said, I think the worst argument one could make about 
anything is that "we should move to it because everyone else is". People 
will flock to fads as much (I could argue: more) than to genuine 
technical improvements (anyone remember the 70s? 80s? 90s?). Git feels a 
bit faddish to me, and is definitely immature. I get some of the 
advantages, but I don't think I should have to be a gitk expert to use 
the damn software - it's over-engineered and actually opens up the door 
to more convoluted development processes.


Whether Git is a fad or not, the issue, as pointed out below, is 
supporting the way contributors work. The win-win situation would be to 
keep the core based on SVN but support git contributions (as I know 
someone else suggested). SVN is a technology that is stable and which 
all core committers know like the back of their hands - no sense in 
wasting time learning git when people are donating time and that time is 
better spent on JIRAs. What I don't know is how this GIT integration 
would work, but I'd hope it could be done.


Just to push home the point, I'll bet most of us who have been around a 
while have plenty of stories of IT shops moving from one technology to 
another ... and then in a few years to another ... and then to another - 
all because some manager got a burr up his rear or was wined and dined 
by a vendor. Why? Why hurt productivity for the sake of keeping up with the 
times? How about setting an example of sticking with what works despite 
the mad rush to github?


My €.02.

Lajos Moczar




On 06/01/2014 17:01, Robert Muir wrote:

On Sun, Jan 5, 2014 at 12:07 PM, Mark Miller  wrote:

My point here is not really to discuss the merits of Git VS SVN on a feature
/ interface basis. We might as well talk about MySQL vs Postgres.

Personally, I prefer GIT. It feels good when I use it. SVN feels like crap.
That doesn't make me want to move. I've used SVN for years with Lucene/Solr,
and like everyone, it's pretty much second nature.

The problem is the world is moving. It may not be clear to everyone yet, but
give it a bit more time and it will be.

Git already owns the open source world. It rivals SVN by most guesses in the
proprietary world. This is a strong hard trend. The same trend that saw SVN
eat CVS. I think clearly, a distributed version control system will
dominate. And clearly Git has won.

I'm not ready to call a vote, because I don't think it's critical we switch
yet. But I wanted to continue the discussion, as obviously, plenty of it
will be needed over time before we make such a switch.

It's not about one thing being better than the other. It's about using what
everyone else uses so you don't provide a barrier to contribution. It's
about the post I linked to when I started this thread.

I personally don't care about pull requests and Github. I don't think any of
its features are that great, other than it acts as a central repo. Git is
not good because of Github IMO. But Git and Github are eating the world.

Most of the patches I have processed now are made against Git. Jumping from
SVN to Git and back is very annoying IMO though. There are plenty of tools
and workflows for it and they all suck.

Anyway, as the trend continues, it will become even more obvious that
Lucene/Solr will start looking stale on SVN. We have enough image problems
in terms of being modern at Apache. We will need to manage the ones we can.

We should not choose the tools that simply make us fuzzy and comfortable.
We should choose the tools that are best for the project and future
contributions in the long term.

- Mark




The idea that this has anything to do with contributors is misleading.

Today contributors can use either SVN or GIT. They have their choice.
How can it be any better than that for contributors?

As demonstrated over the weekend, it's also possible today for
contributors to use svn+jira or git+pull request workflow.

As i said earlier, why not spend our time trying to make it easier on
contributors and support git/github workflows (e.g.

Re: Pull requests versus JIRA

2014-01-07 Thread Benson Margulies
OK. Hopefully this time I'll remember to watch my own JIRA so that I
don't ignore Uwe.

On Tue, Jan 7, 2014 at 3:24 PM, Robert Muir  wrote:
> I think 1 or 3 is best.
>
> The downside of 2 is just the confusion, since the other doc was good,
> i dont think we have to reopen it.
>
> i cant imagine anyone worried about having too many jiras with
> documentation fixes!
>
> On Tue, Jan 7, 2014 at 3:21 PM, Benson Margulies  
> wrote:
>> Further adventures in token streams have motivated me to play tech
>> writer some more.
>>
>> Options:
>>
>> 1. just create github pull requests.
>> 2. reopen prior jira
>> 3. make new jira
>>
>> preference?
>>
>> -
>> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
>> For additional commands, e-mail: dev-h...@lucene.apache.org
>>
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
>

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Pull requests versus JIRA

2014-01-07 Thread Robert Muir
I think 1 or 3 is best.

The downside of 2 is just the confusion, since the other doc was good,
i dont think we have to reopen it.

i cant imagine anyone worried about having too many jiras with
documentation fixes!

On Tue, Jan 7, 2014 at 3:21 PM, Benson Margulies  wrote:
> Further adventures in token streams have motivated me to play tech
> writer some more.
>
> Options:
>
> 1. just create github pull requests.
> 2. reopen prior jira
> 3. make new jira
>
> preference?
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
>

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Pull requests versus JIRA

2014-01-07 Thread Benson Margulies
Further adventures in token streams have motivated me to play tech
writer some more.

Options:

1. just create github pull requests.
2. reopen prior jira
3. make new jira

preference?

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5361) FVH throws away some boosts

2014-01-07 Thread Adrien Grand (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13864583#comment-13864583
 ] 

Adrien Grand commented on LUCENE-5361:
--

Thanks Nik, your fix looks good! I don't think cloning the queries is an issue, 
it happens all the time when doing rewrites, and it's definitely better than 
modifying those queries in-place.

I'll commit it tomorrow if there is no objection.

> FVH throws away some boosts
> ---
>
> Key: LUCENE-5361
> URL: https://issues.apache.org/jira/browse/LUCENE-5361
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Nik Everett
>Assignee: Adrien Grand
>Priority: Minor
> Attachments: LUCENE-5361.patch
>
>
> The FVH's FieldQuery throws away some boosts when flattening queries, 
> including DisjunctionMaxQuery and BooleanQuery queries.   Fragments generated 
> against queries containing boosted boolean queries don't end up sorted 
> correctly.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5594) Enable using extended field types with prefix queries for non-default encoded strings

2014-01-07 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13864579#comment-13864579
 ] 

Hoss Man commented on SOLR-5594:


* Aren't there other parser classes that will need similar changes? 
(PrefixQParserPlugin, SimpleQParserPlugin at a minimum i think)
* I think your new FieldType.getPrefixQuery method has a back compat break for 
any existing FieldTypes that people might be using because it now calls 
readableToIndexed ... that smells like it could break things for some 
FieldTypes ... but maybe i'm missing something?
* FieldType.getPrefixQuery has lots of bogus cut/pasted javadocs from 
getRangeQuery
* Can't your MyIndexedBinaryField just subclass BinaryField to reduce some 
code?  for that matter: is there any reason why we shouldn't just make 
BinaryField implement prefix queries in the way your MyIndexedBinaryField does?
* i'm not sure i understand why you need BinaryTokenStream for the test (see 
previous comment about just extending/improving BinaryField) but if so perhaps 
it should be moved from lucene/core to lucene/test-framework?

> Enable using extended field types with prefix queries for non-default encoded 
> strings
> -
>
> Key: SOLR-5594
> URL: https://issues.apache.org/jira/browse/SOLR-5594
> Project: Solr
>  Issue Type: Improvement
>  Components: query parsers, Schema and Analysis
>Affects Versions: 4.6
>Reporter: Anshum Gupta
>Assignee: Anshum Gupta
>Priority: Minor
> Attachments: SOLR-5594-branch_4x.patch, SOLR-5594.patch
>
>
> Enable users to be able to use prefix query with custom field types with 
> non-default encoding/decoding for queries more easily. e.g. having a custom 
> field work with base64 encoded query strings.
> Currently, the workaround for it is to have the override at getRewriteMethod 
> level. Perhaps having the prefixQuery also use the calling FieldType's 
> readableToIndexed method would work better.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-5615) Deadlock while trying to recover after a ZK session expiry

2014-01-07 Thread Mark Miller (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Miller updated SOLR-5615:
--

Attachment: SOLR-5615.patch

Another rev that adds what I think is a decent change anyway - before joining 
an election, cancel any known previous election participation.

> Deadlock while trying to recover after a ZK session expiry
> --
>
> Key: SOLR-5615
> URL: https://issues.apache.org/jira/browse/SOLR-5615
> Project: Solr
>  Issue Type: Bug
>  Components: SolrCloud
>Affects Versions: 4.4, 4.5, 4.6
>Reporter: Ramkumar Aiyengar
>Assignee: Mark Miller
> Fix For: 5.0, 4.7, 4.6.1
>
> Attachments: SOLR-5615.patch, SOLR-5615.patch, SOLR-5615.patch
>
>
> The sequence of events which might trigger this is as follows:
>  - Leader of a shard, say OL, has a ZK expiry
>  - The new leader, NL, starts the election process
>  - NL, through Overseer, clears the current leader (OL) for the shard from 
> the cluster state
>  - OL reconnects to ZK, calls onReconnect from event thread (main-EventThread)
>  - OL marks itself down
>  - OL sets up watches for cluster state, and then retrieves it (with no 
> leader for this shard)
>  - NL, through Overseer, updates cluster state to mark itself leader for the 
> shard
>  - OL tries to register itself as a replica, and waits till the cluster state 
> is updated
>with the new leader from event thread
>  - ZK sends a watch update to OL, but it is blocked on the event thread 
> waiting for it.
> Oops. This finally breaks out after trying to register itself as replica 
> times out after 20 mins.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5560) Enable LocalParams without escaping the query

2014-01-07 Thread Isaac Hebsh (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13864553#comment-13864553
 ] 

Isaac Hebsh commented on SOLR-5560:
---

Hi [~ryancutter], thank you a lot!
I'm not familiar with parser states (thank god), so I can't review the patch.

What action should be performed in order to get this patch committed? (into 4.7?)

> Enable LocalParams without escaping the query
> -
>
> Key: SOLR-5560
> URL: https://issues.apache.org/jira/browse/SOLR-5560
> Project: Solr
>  Issue Type: Bug
>  Components: query parsers
>Affects Versions: 4.6
>Reporter: Isaac Hebsh
> Fix For: 4.7, 4.6.1
>
> Attachments: SOLR-5560.patch
>
>
> This query should be a legit syntax:
> http://localhost:8983/solr/collection1/select?debugQuery=true&defType=lucene&df=id&q=TERM1
>  AND {!lucene df=text}(TERM2 TERM3 "TERM4 TERM5")
> currently it isn't, because the LocalParams can be specified on a single term 
> only.
> [~billnbell] thinks it is a bug.
> From the mailing list:
> {quote}
> We want to set a LocalParam on a nested query. When querying with "v" inline 
> parameter, it works fine:
> http://localhost:8983/solr/collection1/select?debugQuery=true&defType=lucene&df=id&q=TERM1
>  AND {!lucene df=text v="TERM2 TERM3 \"TERM4 TERM5\""}
> the parsedquery_toString is
> +id:TERM1 +(text:term2 text:term3 text:"term4 term5")
> Query using the "_query_" also works fine:
> http://localhost:8983/solr/collection1/select?debugQuery=true&defType=lucene&df=id&q=TERM1
>  AND _query_:"{!lucene df=text}TERM2 TERM3 \"TERM4 TERM5\""
> (parsedquery is exactly the same).
> Obviously, there is the option of external parameter ({... 
> v=$nestedq}&nestedq=...)
> This is a good solution, but it is not practical, when having a lot of such 
> nested queries.
> BUT, when trying to put the nested query in place, it yields syntax error:
> http://localhost:8983/solr/collection1/select?debugQuery=true&defType=lucene&df=id&q=TERM1
>  AND {!lucene df=text}(TERM2 TERM3 "TERM4 TERM5")
> org.apache.solr.search.SyntaxError: Cannot parse '(TERM2'
> The previous options are less preferred because of the escaping that must be 
> applied to the nested query.
> {quote}



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Closed] (SOLR-5611) When documents are uniformly distributed over shards, enable returning approximated results in distributed query

2014-01-07 Thread Isaac Hebsh (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Isaac Hebsh closed SOLR-5611.
-

Resolution: Not A Problem

Oops. I missed the {{shards.rows}} parameter.

> When documents are uniformly distributed over shards, enable returning 
> approximated results in distributed query
> 
>
> Key: SOLR-5611
> URL: https://issues.apache.org/jira/browse/SOLR-5611
> Project: Solr
>  Issue Type: Improvement
>  Components: SolrCloud
>Reporter: Isaac Hebsh
>  Labels: distributed_search, shard, solrcloud
> Fix For: 4.7
>
>
> A query with rows=1000, sent to a collection of 100 shards (default shard key 
> behaviour - based on the hash of the unique key), will generate 100 requests 
> of rows=1000, one on each shard.
> This results in a total of rows*numShards unique keys being retrieved. This 
> behaviour gets worse as numShards grows.
> If the documents are uniformly distributed over the shards, the expected 
> number of documents per shard should be ~ rows/numShards. Obviously, there 
> might be extreme cases, when all of the top X documents are in a specific 
> shard.
> I suggest adding an optional parameter, say approxResults=true, which decides 
> whether we should limit the rows in the shard requests to rows/numShards or 
> not. Moreover, we can add a numeric parameter which increases the limit, to 
> be more accurate.
> For example, the query {{approxResults=true&approxResults.factor=1.5}} will 
> retrieve 1.5*rows/numShards from each shard. In the case of 100 shards and 
> rows=1000, each shard will return 15 documents.
> Furthermore, this can reduce the problem of deep paging, because the same 
> thing can be applied there: when start=10 is requested, Solr creates shard 
> requests with start=0 and rows=start+rows. In the approximated approach, the 
> start parameter (in the shard requests) can be set to 10/numShards. The idea 
> of the approxResults.factor creates some difficulties here, though.
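
A back-of-the-envelope sketch of the proposed per-shard sizing (the helper 
name is illustrative, not from a patch):

{code}
// Rows to request from each shard under the approximation.
static int approxShardRows(int rows, int numShards, double factor) {
  // e.g. rows=1000, numShards=100, factor=1.5 -> 15 rows per shard
  return (int) Math.ceil(factor * rows / numShards);
}
{code}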



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5617) Default classloader restrictions may be too tight

2014-01-07 Thread Shawn Heisey (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13864505#comment-13864505
 ] 

Shawn Heisey commented on SOLR-5617:


I will have to double-check, but I probably have the specifics that required me 
to turn off the safety checking wrong.  It may have been configuration 
components gathered via xinclude, not jarfiles.  Either way, I am sure that 
everything is under the solr home.


> Default classloader restrictions may be too tight
> -
>
> Key: SOLR-5617
> URL: https://issues.apache.org/jira/browse/SOLR-5617
> Project: Solr
>  Issue Type: Bug
>Affects Versions: 4.6
>Reporter: Shawn Heisey
>  Labels: security
> Fix For: 5.0, 4.7
>
>
> SOLR-4882 introduced restrictions for the Solr class loader that cause 
> resources outside the instanceDir to fail to load.  This is a very good goal, 
> but it also causes resources in $\{solr.solr.home\}/lib to fail to load.  In 
> order to get those jars to work, I must turn off all SOLR-4882 safety 
> checking.
> I can understand not wanting to load resources from an arbitrary path, but 
> the solr home and its children should be about as trustworthy as instanceDir.
> Ideally I'd like to have $\{solr.solr.home\}/lib trusted automatically, since 
> it is searched automatically.  If I need to define a system property to make 
> this happen, I'm OK with that -- as long as I don't have to turn off the 
> safety checking entirely.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5615) Deadlock while trying to recover after a ZK session expiry

2014-01-07 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13864502#comment-13864502
 ] 

Mark Miller commented on SOLR-5615:
---

Yeah, I've been considering the same thing. My inclination was that it was 
okay, but we may have to add something to cancel our leader election before 
joining the election, to be sure.

> Deadlock while trying to recover after a ZK session expiry
> --
>
> Key: SOLR-5615
> URL: https://issues.apache.org/jira/browse/SOLR-5615
> Project: Solr
>  Issue Type: Bug
>  Components: SolrCloud
>Affects Versions: 4.4, 4.5, 4.6
>Reporter: Ramkumar Aiyengar
>Assignee: Mark Miller
> Fix For: 5.0, 4.7, 4.6.1
>
> Attachments: SOLR-5615.patch, SOLR-5615.patch
>
>
> The sequence of events which might trigger this is as follows:
>  - Leader of a shard, say OL, has a ZK expiry
>  - The new leader, NL, starts the election process
>  - NL, through Overseer, clears the current leader (OL) for the shard from 
> the cluster state
>  - OL reconnects to ZK, calls onReconnect from event thread (main-EventThread)
>  - OL marks itself down
>  - OL sets up watches for cluster state, and then retrieves it (with no 
> leader for this shard)
>  - NL, through Overseer, updates cluster state to mark itself leader for the 
> shard
>  - OL tries to register itself as a replica, and waits till the cluster state 
> is updated
>with the new leader from event thread
>  - ZK sends a watch update to OL, but it is blocked on the event thread 
> waiting for it.
> Oops. This finally breaks out after trying to register itself as replica 
> times out after 20 mins.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-5617) Default classloader restrictions may be too tight

2014-01-07 Thread Shawn Heisey (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shawn Heisey updated SOLR-5617:
---

Description: 
SOLR-4882 introduced restrictions for the Solr class loader that cause 
resources outside the instanceDir to fail to load.  This is a very good goal, 
but it also causes resources in $\{solr.solr.home\}/lib to fail to load.  In 
order to get those jars to work, I must turn off all SOLR-4882 safety checking.

I can understand not wanting to load resources from an arbitrary path, but the 
solr home and its children should be about as trustworthy as instanceDir.

Ideally I'd like to have $\{solr.solr.home\}/lib trusted automatically, since 
it is searched automatically.  If I need to define a system property to make 
this happen, I'm OK with that -- as long as I don't have to turn off the safety 
checking entirely.

  was:
SOLR-4882 introduced restrictions for the Solr class loader that cause 
resources outside the instanceDir to fail to load.  This is a very good goal, 
but it also causes resources in ${solr.solr.home}/lib to fail to load.  In 
order to get those jars to work, I must turn off all SOLR-4882 safety checking.

I can understand not wanting to load resources from an arbitrary path, but 
${solr.solr.home} and its children should be about as trustworthy as 
instanceDir.

Ideally I'd like to have ${solr.solr.home}/lib trusted automatically, since it 
is searched automatically.  If I need to define a system property to make this 
happen, I'm OK with that -- as long as I don't have to turn off the safety 
checking entirely.


> Default classloader restrictions may be too tight
> -
>
> Key: SOLR-5617
> URL: https://issues.apache.org/jira/browse/SOLR-5617
> Project: Solr
>  Issue Type: Bug
>Affects Versions: 4.6
>Reporter: Shawn Heisey
>  Labels: security
> Fix For: 5.0, 4.7
>
>
> SOLR-4882 introduced restrictions for the Solr class loader that cause 
> resources outside the instanceDir to fail to load.  This is a very good goal, 
> but it also causes resources in $\{solr.solr.home\}/lib to fail to load.  In 
> order to get those jars to work, I must turn off all SOLR-4882 safety 
> checking.
> I can understand not wanting to load resources from an arbitrary path, but 
> the solr home and its children should be about as trustworthy as instanceDir.
> Ideally I'd like to have $\{solr.solr.home\}/lib trusted automatically, since 
> it is searched automatically.  If I need to define a system property to make 
> this happen, I'm OK with that -- as long as I don't have to turn off the 
> safety checking entirely.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5244) Full Search Result Export

2014-01-07 Thread Mikhail Khludnev (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13864496#comment-13864496
 ] 

Mikhail Khludnev commented on SOLR-5244:


bq. 1) Add a special cache that speeds up the docId-> bytesRef lookup. This 
would be a segment level cache of the top N terms (by frequency) in the index. 
The cache would be a simple int to BytesRef hashmap, mapping the segment level 
ord to the bytesRef

that's exactly what you've got in FieldCache.DEFAULT.getTerms() for an indexed 
field without docvalues enabled. See FieldCacheImpl.BinaryDocValuesCache.
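
For reference, a minimal fragment showing that lookup (4.x FieldCache API; 
{{reader}} is a per-segment AtomicReader, and the field name is just the one 
from the export example):

{code}
import org.apache.lucene.index.AtomicReader;
import org.apache.lucene.index.BinaryDocValues;
import org.apache.lucene.search.FieldCache;
import org.apache.lucene.util.BytesRef;

// Un-inverts the indexed terms once per segment and caches the result;
// the boolean asks FieldCache not to also compute docsWithField bits.
BinaryDocValues terms = FieldCache.DEFAULT.getTerms(reader, "join_i", false);
BytesRef result = new BytesRef();
int docID = 0; // placeholder docID
terms.get(docID, result); // fills result with the term bytes for docID
{code}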

> Full Search Result Export
> -
>
> Key: SOLR-5244
> URL: https://issues.apache.org/jira/browse/SOLR-5244
> Project: Solr
>  Issue Type: New Feature
>  Components: search
>Affects Versions: 5.0
>Reporter: Joel Bernstein
>Priority: Minor
> Fix For: 5.0
>
> Attachments: SOLR-5244.patch
>
>
> It would be great if Solr could efficiently export entire search result sets 
> without scoring or ranking documents. This would allow external systems to 
> perform rapid bulk imports from Solr. It also provides a possible platform 
> for exporting results to support distributed join scenarios within Solr.
> This ticket provides a patch that has two pluggable components:
> 1) ExportQParserPlugin: which is a post filter that gathers a BitSet with 
> document results and does not delegate to ranking collectors. Instead it puts 
> the BitSet on the request context.
> 2) BinaryExportWriter: an output writer that iterates the BitSet and prints 
> the entire result as a binary stream. A header is provided at the beginning 
> of the stream so external clients can self configure.
> Note:
> These two components will be sufficient for a non-distributed environment. 
> For distributed export a new Request handler will need to be developed.
> After applying the patch and building the dist or example, you can register 
> the components through the following changes to solrconfig.xml
> Register the export contrib libraries with a <lib/> directive in solrconfig.xml.
> Register the "export" queryParser with the following line:
> {code}
> <queryParser name="export" class="org.apache.solr.export.ExportQParserPlugin"/>
> {code}
> Register the "xbin" writer:
> {code}
> <queryResponseWriter name="xbin" class="org.apache.solr.export.BinaryExportWriter"/>
> {code}
> The following query will perform the export:
> {code}
> http://localhost:8983/solr/collection1/select?q=*:*&fq={!export}&wt=xbin&fl=join_i
> {code}
> Initial patch supports export of four data-types:
> 1) Single value trie int, long and float
> 2) Binary doc values.
> The numerics are currently exported from the FieldCache and the Binary doc 
> values can be in memory or on disk.
> Since this is designed to export very large result sets efficiently, stored 
> fields are not used for the export.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5615) Deadlock while trying to recover after a ZK session expiry

2014-01-07 Thread Ramkumar Aiyengar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13864491#comment-13864491
 ] 

Ramkumar Aiyengar commented on SOLR-5615:
-

Fair enough. Would that allow multiple onReconnect.command() invocations to 
run simultaneously, and is that fine? (on mobile, so my reading of the patch 
could be wrong) What if we were in the process of recovering when we were 
unfortunate enough to get a second expiry, thereby bringing all nodes down?

> Deadlock while trying to recover after a ZK session expiry
> --
>
> Key: SOLR-5615
> URL: https://issues.apache.org/jira/browse/SOLR-5615
> Project: Solr
>  Issue Type: Bug
>  Components: SolrCloud
>Affects Versions: 4.4, 4.5, 4.6
>Reporter: Ramkumar Aiyengar
>Assignee: Mark Miller
> Fix For: 5.0, 4.7, 4.6.1
>
> Attachments: SOLR-5615.patch, SOLR-5615.patch
>
>
> The sequence of events which might trigger this is as follows:
>  - Leader of a shard, say OL, has a ZK expiry
>  - The new leader, NL, starts the election process
>  - NL, through Overseer, clears the current leader (OL) for the shard from 
> the cluster state
>  - OL reconnects to ZK, calls onReconnect from event thread (main-EventThread)
>  - OL marks itself down
>  - OL sets up watches for cluster state, and then retrieves it (with no 
> leader for this shard)
>  - NL, through Overseer, updates cluster state to mark itself leader for the 
> shard
>  - OL tries to register itself as a replica, and waits till the cluster state 
> is updated
>with the new leader from event thread
>  - ZK sends a watch update to OL, but it is blocked on the event thread 
> waiting for it.
> Oops. This finally breaks out after trying to register itself as replica 
> times out after 20 mins.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (SOLR-5617) Default classloader restrictions may be too tight

2014-01-07 Thread Shawn Heisey (JIRA)
Shawn Heisey created SOLR-5617:
--

 Summary: Default classloader restrictions may be too tight
 Key: SOLR-5617
 URL: https://issues.apache.org/jira/browse/SOLR-5617
 Project: Solr
  Issue Type: Bug
Affects Versions: 4.6
Reporter: Shawn Heisey
 Fix For: 5.0, 4.7


SOLR-4882 introduced restrictions for the Solr class loader that cause 
resources outside the instanceDir to fail to load.  This is a very good goal, 
but it also causes resources in ${solr.solr.home}/lib to fail to load.  In 
order to get those jars to work, I must turn off all SOLR-4882 safety checking.

I can understand not wanting to load resources from an arbitrary path, but 
${solr.solr.home} and its children should be about as trustworthy as 
instanceDir.

Ideally I'd like to have ${solr.solr.home}/lib trusted automatically, since it 
is searched automatically.  If I need to define a system property to make this 
happen, I'm OK with that -- as long as I don't have to turn off the safety 
checking entirely.
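
For anyone else hitting this, the all-or-nothing escape hatch referred to 
above is (as far as I can tell) the SOLR-4882 system property, passed at 
startup:

{code}
java -Dsolr.allow.unsafe.resourceloading=true -jar start.jar
{code}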



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5615) Deadlock while trying to recover after a ZK session expiry

2014-01-07 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13864475#comment-13864475
 ] 

Mark Miller commented on SOLR-5615:
---

Even with the other changes, I like the idea of using a background thread, 
because I don't think it's right that we do that whole reconnect process before 
we mark that we are connected to ZK and get out of the connection manager. I 
really don't think that process should hold up the connection manager at all - 
it's meant to just trigger it.

> Deadlock while trying to recover after a ZK session expiry
> --
>
> Key: SOLR-5615
> URL: https://issues.apache.org/jira/browse/SOLR-5615
> Project: Solr
>  Issue Type: Bug
>  Components: SolrCloud
>Affects Versions: 4.4, 4.5, 4.6
>Reporter: Ramkumar Aiyengar
>Assignee: Mark Miller
> Fix For: 5.0, 4.7, 4.6.1
>
> Attachments: SOLR-5615.patch, SOLR-5615.patch
>
>
> The sequence of events which might trigger this is as follows:
>  - Leader of a shard, say OL, has a ZK expiry
>  - The new leader, NL, starts the election process
>  - NL, through Overseer, clears the current leader (OL) for the shard from 
> the cluster state
>  - OL reconnects to ZK, calls onReconnect from event thread (main-EventThread)
>  - OL marks itself down
>  - OL sets up watches for cluster state, and then retrieves it (with no 
> leader for this shard)
>  - NL, through Overseer, updates cluster state to mark itself leader for the 
> shard
>  - OL tries to register itself as a replica, and waits till the cluster state 
> is updated
>with the new leader from event thread
>  - ZK sends a watch update to OL, but it is blocked on the event thread 
> waiting for it.
> Oops. This finally breaks out after trying to register itself as replica 
> times out after 20 mins.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Iterating BinaryDocValues

2014-01-07 Thread Mikhail Khludnev
Joel,

I tried to hack it straightforwardly, but found no free gain there. The
only attempt I can suggest is to try to reuse bytes in
https://github.com/apache/lucene-solr/blame/trunk/lucene/core/src/java/org/apache/lucene/codecs/lucene45/Lucene45DocValuesProducer.java#L401
Right now it allocates bytes every time, which besides GC pressure can also
hurt memory access locality. Could you try fixing the memory waste and
repeating the performance test?
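
Roughly what I mean, as a sketch (the surrounding producer code is elided;
ArrayUtil.grow is the reuse mechanism I'd try):

import org.apache.lucene.util.ArrayUtil;
import org.apache.lucene.util.BytesRef;

// Instead of result.bytes = new byte[length] on every get(), grow the
// caller's buffer only when it is too small and reuse it otherwise:
static void resize(BytesRef result, int length) {
  if (result.bytes.length < length) {
    result.bytes = ArrayUtil.grow(result.bytes, length); // amortized allocation
  }
  result.offset = 0;
  result.length = length;
  // ... then read `length` bytes from the data input into result.bytes ...
}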

Have a good hack!


On Mon, Dec 23, 2013 at 9:51 PM, Joel Bernstein  wrote:

>
> Hi,
>
> I'm looking for a faster way to perform large scale docId -> bytesRef
> lookups for BinaryDocValues.
>
> I'm finding that I can't get the performance that I need from the random
> access seek in the BinaryDocValues interface.
>
> I'm wondering if sequentially scanning the docValues would be a faster
> approach. I have a BitSet of matching docs, so if I sequentially moved
> through the docValues I could test each one against that bitset.
>
> Wondering if that approach would be faster for bulk extracts and how
> tricky it would be to add an iterator to the BinaryDocValues interface?
>
> Thanks,
> Joel
>



-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics


 


[jira] [Resolved] (SOLR-5614) Boost documents using "map" and "query" functions

2014-01-07 Thread Hoss Man (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hoss Man resolved SOLR-5614.


Resolution: Invalid

please don't file a bug just because you've been waiting 24 hours for an answer 
to a question on the solr-user mailing list - sometimes it takes longer than 
that for people to answer.

https://mail-archives.apache.org/mod_mbox/lucene-solr-user/201312.mbox/%3c52c17579.30...@kelkoo.com%3E

> Boost documents using "map" and "query" functions
> -
>
> Key: SOLR-5614
> URL: https://issues.apache.org/jira/browse/SOLR-5614
> Project: Solr
>  Issue Type: Bug
>Affects Versions: 4.6
>Reporter: Anca Kopetz
>
> We want to boost documents that contain specific search terms in their fields. 
> We tried the following simplified query : 
> http://localhost:8983/solr/collection1/select?q=ipod 
> belkin&wt=xml&debugQuery=true&q.op=AND&defType=edismax&bf=map(query($qq),0,0,0,100.0)&qq={!edismax}power
> And we get the following error : 
> org.apache.solr.search.SyntaxError: Infinite Recursion detected parsing query 
> 'power'
> And the stacktrace :
> ERROR - 2014-01-06 18:27:02.275; org.apache.solr.common.SolrException; 
> org.apache.solr.common.SolrException: org.apache.solr.search.SyntaxError: 
> Infinite Recursion detected parsing query 'power'
> at 
> org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:171)
> at 
> org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:194)
> at 
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
> at org.apache.solr.core.SolrCore.execute(SolrCore.java:1859)
> at 
> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:710)
> at 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:413)
> at 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:197)
> at 
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
> at 
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
> at 
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
> at 
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
> at 
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
> at 
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
> at 
> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
> at 
> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
> at 
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
> at 
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
> at 
> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
> at 
> org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
> at 
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
> at org.eclipse.jetty.server.Server.handle(Server.java:368)
> at 
> org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
> at 
> org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
> at 
> org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:942)
> at 
> org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1004)
> at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:640)
> at 
> org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
> at 
> org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
> at 
> org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
> at 
> org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
> at 
> org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
> at java.lang.Thread.run(Thread.java:722)
> Caused by: org.apache.solr.search.SyntaxError: Infinite Recursion detected 
> parsing query 'power'
> at org.apache.solr.search.QParser.checkRecurse(QParser.java:178)
> at org.apache.solr.search.QParser.subQuery(QParser.java:200)
> at 
> org.apache.solr.search.ExtendedDismaxQParser.getBoostFunctions(ExtendedDismaxQParser.java:437)
> at 
> org.apache.solr.search.ExtendedDismaxQParser.parse(ExtendedDisma

[jira] [Updated] (SOLR-5615) Deadlock while trying to recover after a ZK session expiry

2014-01-07 Thread Mark Miller (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Miller updated SOLR-5615:
--

Fix Version/s: 4.6.1
   4.7
   5.0
 Assignee: Mark Miller

> Deadlock while trying to recover after a ZK session expiry
> --
>
> Key: SOLR-5615
> URL: https://issues.apache.org/jira/browse/SOLR-5615
> Project: Solr
>  Issue Type: Bug
>  Components: SolrCloud
>Affects Versions: 4.4, 4.5, 4.6
>Reporter: Ramkumar Aiyengar
>Assignee: Mark Miller
> Fix For: 5.0, 4.7, 4.6.1
>
> Attachments: SOLR-5615.patch, SOLR-5615.patch
>
>
> The sequence of events which might trigger this is as follows:
>  - Leader of a shard, say OL, has a ZK expiry
>  - The new leader, NL, starts the election process
>  - NL, through Overseer, clears the current leader (OL) for the shard from 
> the cluster state
>  - OL reconnects to ZK, calls onReconnect from event thread (main-EventThread)
>  - OL marks itself down
>  - OL sets up watches for cluster state, and then retrieves it (with no 
> leader for this shard)
>  - NL, through Overseer, updates cluster state to mark itself leader for the 
> shard
>  - OL tries to register itself as a replica, and waits till the cluster state 
> is updated
>with the new leader from event thread
>  - ZK sends a watch update to OL, but it is blocked on the event thread 
> waiting for it.
> Oops. This finally breaks out after trying to register itself as replica 
> times out after 20 mins.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-5615) Deadlock while trying to recover after a ZK session expiry

2014-01-07 Thread Mark Miller (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Miller updated SOLR-5615:
--

Attachment: SOLR-5615.patch

Another rev.

> Deadlock while trying to recover after a ZK session expiry
> --
>
> Key: SOLR-5615
> URL: https://issues.apache.org/jira/browse/SOLR-5615
> Project: Solr
>  Issue Type: Bug
>  Components: SolrCloud
>Affects Versions: 4.4, 4.5, 4.6
>Reporter: Ramkumar Aiyengar
> Attachments: SOLR-5615.patch, SOLR-5615.patch
>
>
> The sequence of events which might trigger this is as follows:
>  - Leader of a shard, say OL, has a ZK expiry
>  - The new leader, NL, starts the election process
>  - NL, through Overseer, clears the current leader (OL) for the shard from 
> the cluster state
>  - OL reconnects to ZK, calls onReconnect from event thread (main-EventThread)
>  - OL marks itself down
>  - OL sets up watches for cluster state, and then retrieves it (with no 
> leader for this shard)
>  - NL, through Overseer, updates cluster state to mark itself leader for the 
> shard
>  - OL tries to register itself as a replica, and waits till the cluster state 
> is updated
>with the new leader from event thread
>  - ZK sends a watch update to OL, but it is blocked on the event thread 
> waiting for it.
> Oops. This finally breaks out after trying to register itself as replica 
> times out after 20 mins.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5615) Deadlock while trying to recover after a ZK session expiry

2014-01-07 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13864460#comment-13864460
 ] 

Mark Miller commented on SOLR-5615:
---

bq. However, onReconnect in any case runs in the event thread of the expired ZK 
which wouldn't have events after that, so it's effectively backgrounded?

But it holds the ConnectionManager's {{this}} lock while it runs, right? I 
think we just don't want to hold that lock while it runs. 

I think the other changes are likely okay too - I'm playing around with a 
combination of the two.

> Deadlock while trying to recover after a ZK session expiry
> --
>
> Key: SOLR-5615
> URL: https://issues.apache.org/jira/browse/SOLR-5615
> Project: Solr
>  Issue Type: Bug
>  Components: SolrCloud
>Affects Versions: 4.4, 4.5, 4.6
>Reporter: Ramkumar Aiyengar
> Attachments: SOLR-5615.patch
>
>
> The sequence of events which might trigger this is as follows:
>  - Leader of a shard, say OL, has a ZK expiry
>  - The new leader, NL, starts the election process
>  - NL, through Overseer, clears the current leader (OL) for the shard from 
> the cluster state
>  - OL reconnects to ZK, calls onReconnect from event thread (main-EventThread)
>  - OL marks itself down
>  - OL sets up watches for cluster state, and then retrieves it (with no 
> leader for this shard)
>  - NL, through Overseer, updates cluster state to mark itself leader for the 
> shard
>  - OL tries to register itself as a replica, and waits till the cluster state 
> is updated
>with the new leader from event thread
>  - ZK sends a watch update to OL, but it is blocked on the event thread 
> waiting for it.
> Oops. This finally breaks out after trying to register itself as replica 
> times out after 20 mins.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5615) Deadlock while trying to recover after a ZK session expiry

2014-01-07 Thread Ramkumar Aiyengar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13864446#comment-13864446
 ] 

Ramkumar Aiyengar commented on SOLR-5615:
-

That, incidentally, was my first attempt at a fix! (Should have a diff..) 
However, onReconnect in any case runs in the event thread of the expired ZK 
which wouldn't have events after that, so it's effectively backgrounded? It 
should still work as a solution I guess..

> Deadlock while trying to recover after a ZK session expiry
> --
>
> Key: SOLR-5615
> URL: https://issues.apache.org/jira/browse/SOLR-5615
> Project: Solr
>  Issue Type: Bug
>  Components: SolrCloud
>Affects Versions: 4.4, 4.5, 4.6
>Reporter: Ramkumar Aiyengar
> Attachments: SOLR-5615.patch
>
>
> The sequence of events which might trigger this is as follows:
>  - Leader of a shard, say OL, has a ZK expiry
>  - The new leader, NL, starts the election process
>  - NL, through Overseer, clears the current leader (OL) for the shard from 
> the cluster state
>  - OL reconnects to ZK, calls onReconnect from event thread (main-EventThread)
>  - OL marks itself down
>  - OL sets up watches for cluster state, and then retrieves it (with no 
> leader for this shard)
>  - NL, through Overseer, updates cluster state to mark itself leader for the 
> shard
>  - OL tries to register itself as a replica, and waits till the cluster state 
> is updated
>with the new leader from event thread
>  - ZK sends a watch update to OL, but it is blocked on the event thread 
> waiting for it.
> Oops. This finally breaks out after trying to register itself as replica 
> times out after 20 mins.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5615) Deadlock while trying to recover after a ZK session expiry

2014-01-07 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13864434#comment-13864434
 ] 

Mark Miller commented on SOLR-5615:
---

Okay, now it's more clear to me. We need to run onReconnect in a background 
thread I think.
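
Something along these lines (a sketch only - {{wasExpired}} and 
{{onReconnect}} stand in for the real ConnectionManager state):

{code}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher.Event.KeeperState;

// Hand the reconnect work to an executor so the ZK event thread
// (main-EventThread) stays free to deliver watch events.
final ExecutorService reconnectExecutor = Executors.newSingleThreadExecutor();

public void process(WatchedEvent event) {
  if (event.getState() == KeeperState.SyncConnected && wasExpired) {
    reconnectExecutor.submit(new Runnable() {
      public void run() {
        // may wait on cluster state updates; no longer blocks the event thread
        onReconnect.command();
      }
    });
  }
}
{code}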

> Deadlock while trying to recover after a ZK session expiry
> --
>
> Key: SOLR-5615
> URL: https://issues.apache.org/jira/browse/SOLR-5615
> Project: Solr
>  Issue Type: Bug
>  Components: SolrCloud
>Affects Versions: 4.4, 4.5, 4.6
>Reporter: Ramkumar Aiyengar
> Attachments: SOLR-5615.patch
>
>
> The sequence of events which might trigger this is as follows:
>  - Leader of a shard, say OL, has a ZK expiry
>  - The new leader, NL, starts the election process
>  - NL, through Overseer, clears the current leader (OL) for the shard from 
> the cluster state
>  - OL reconnects to ZK, calls onReconnect from event thread (main-EventThread)
>  - OL marks itself down
>  - OL sets up watches for cluster state, and then retrieves it (with no 
> leader for this shard)
>  - NL, through Overseer, updates cluster state to mark itself leader for the 
> shard
>  - OL tries to register itself as a replica, and waits till the cluster state 
> is updated
>with the new leader from event thread
>  - ZK sends a watch update to OL, but it is blocked on the event thread 
> waiting for it.
> Oops. This finally breaks out after trying to register itself as replica 
> times out after 20 mins.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5379) Query-time multi-word synonym expansion

2014-01-07 Thread Nolan Lawson (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13864402#comment-13864402
 ] 

Nolan Lawson commented on SOLR-5379:


[~markus17]: They're boosted equally.  It was the subject of [a 
bug|https://github.com/healthonnet/hon-lucene-synonyms/issues/31].

[~iorixxx]: I just tested it out now.  I got:

{code}
(+(DisjunctionMaxQuery((text:"president usa"~5)) 
(((+DisjunctionMaxQuery((text:"president united states of 
america"~5)))/no_coord/no_coord // parsedQuery
+((text:"president usa"~5) ((+(text:"president united states of america"~5 
// parsedQuery.toString()
{code}

> Query-time multi-word synonym expansion
> ---
>
> Key: SOLR-5379
> URL: https://issues.apache.org/jira/browse/SOLR-5379
> Project: Solr
>  Issue Type: Improvement
>  Components: query parsers
>Reporter: Tien Nguyen Manh
>  Labels: multi-word, queryparser, synonym
> Fix For: 4.7
>
> Attachments: quoted.patch, synonym-expander.patch
>
>
> While dealing with synonyms at query time, Solr fails to work with multi-word 
> synonyms for two reasons:
> - First, the Lucene query parser tokenizes the user query by spaces, so it 
> splits a multi-word term into separate terms before feeding them to the 
> synonym filter; the synonym filter therefore can't recognize a multi-word 
> term in order to expand it.
> - Second, if the synonym filter expands into multiple terms which contain a 
> multi-word synonym, SolrQueryParserBase currently uses MultiPhraseQuery to 
> handle synonyms. But MultiPhraseQuery doesn't work with terms that have 
> different numbers of words.
> For the first one, we can quote all multi-word synonyms in the user query so 
> that the Lucene query parser doesn't split them. There is a JIRA task related 
> to this: https://issues.apache.org/jira/browse/LUCENE-2605.
> For the second, we can replace MultiPhraseQuery with an appropriate 
> BooleanQuery of SHOULD clauses containing multiple PhraseQuery instances, in 
> case the token stream has a multi-word synonym.
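
To make the second point concrete, a minimal Lucene sketch of the 
BooleanQuery-of-PhraseQuery shape described above, for the synonym pair 
"usa" / "united states of america" (an illustration, not code from the 
attached patches):

{code:java}
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause.Occur;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.PhraseQuery;
import org.apache.lucene.search.Query;

class SynonymPhraseSketch {
  // One PhraseQuery per synonym variant, OR'ed with SHOULD, instead of a
  // single MultiPhraseQuery (which expects aligned terms per position).
  static Query presidentUsa() {
    PhraseQuery usa = new PhraseQuery();
    usa.add(new Term("text", "president"));
    usa.add(new Term("text", "usa"));

    PhraseQuery expanded = new PhraseQuery();
    for (String word : "president united states of america".split(" ")) {
      expanded.add(new Term("text", word));
    }

    BooleanQuery bq = new BooleanQuery();
    bq.add(usa, Occur.SHOULD);
    bq.add(expanded, Occur.SHOULD);
    // Roughly: text:"president usa" OR
    //          text:"president united states of america"
    return bq;
  }
}
{code}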



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5615) Deadlock while trying to recover after a ZK session expiry

2014-01-07 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13864401#comment-13864401
 ] 

Mark Miller commented on SOLR-5615:
---

Thanks, perfect.

> Deadlock while trying to recover after a ZK session expiry
> --
>
> Key: SOLR-5615
> URL: https://issues.apache.org/jira/browse/SOLR-5615
> Project: Solr
>  Issue Type: Bug
>  Components: SolrCloud
>Affects Versions: 4.4, 4.5, 4.6
>Reporter: Ramkumar Aiyengar
> Attachments: SOLR-5615.patch
>
>
> The sequence of events which might trigger this is as follows:
>  - Leader of a shard, say OL, has a ZK expiry
>  - The new leader, NL, starts the election process
>  - NL, through Overseer, clears the current leader (OL) for the shard from 
> the cluster state
>  - OL reconnects to ZK, calls onReconnect from event thread (main-EventThread)
>  - OL marks itself down
>  - OL sets up watches for cluster state, and then retrieves it (with no 
> leader for this shard)
>  - NL, through Overseer, updates cluster state to mark itself leader for the 
> shard
>  - OL tries to register itself as a replica, and waits till the cluster state 
> is updated
>with the new leader from event thread
>  - ZK sends a watch update to OL, but it is blocked on the event thread 
> waiting for it.
> Oops. This finally breaks out only when the attempt to register itself as a 
> replica times out after 20 mins.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-5616) Make grouping code use response builder needDocList

2014-01-07 Thread Steven Bower (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steven Bower updated SOLR-5616:
---

Attachment: SOLR-5616.patch

Here is a patch that makes this change. It's against trunk but should apply 
easily to older versions. Ideally this would make it into a 4.x release..

> Make grouping code use response builder needDocList
> ---
>
> Key: SOLR-5616
> URL: https://issues.apache.org/jira/browse/SOLR-5616
> Project: Solr
>  Issue Type: Bug
>Reporter: Steven Bower
> Attachments: SOLR-5616.patch
>
>
> Right now the grouping code does this to check if it needs to generate a 
> docList for grouped results:
> {code}
> if (rb.doHighlights || rb.isDebug() || params.getBool(MoreLikeThisParams.MLT, 
> false) ){
> ...
> }
> {code}
> this is ugly because any new component that needs a docList from grouped 
> results will need to modify QueryComponent to add a check to this if. 
> Ideally this should just use the rb.isNeedDocList() flag...
> Coincidentally, this boolean is never actually used; for non-grouped results 
> the docList always gets generated..



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (SOLR-5615) Deadlock while trying to recover after a ZK session expiry

2014-01-07 Thread Ramkumar Aiyengar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13864389#comment-13864389
 ] 

Ramkumar Aiyengar edited comment on SOLR-5615 at 1/7/14 5:04 PM:
-

Here's a log trace from an actual occurrence, which might help in 
understanding the scenario above..

{code}
2014-01-06 06:22:03,867 INFO [main-EventThread] o.a.s.c.c.ConnectionManager 
[ConnectionManager.java:88] Our previous ZooKeeper session was expired. 
Attempting to reconnect to recover relationship with ZooKeeper...

// ..

2014-01-06 06:22:12,529 INFO [main-EventThread] o.a.s.c.c.ConnectionManager 
[ConnectionManager.java:103] Connection with ZooKeeper reestablished.

// ..

2014-01-06 06:22:36,573 INFO [main-EventThread] o.a.s.c.ZkController 
[ZkController.java:989] publishing core=collection_20131120_shard205_replica2 
state=down

// ..

2014-01-06 06:28:01,479 INFO [main-EventThread] o.a.s.c.c.ZkStateReader 
[ZkStateReader.java:199] Updating cluster state from ZooKeeper... 
2014-01-06 06:28:01,487 INFO [main-EventThread] o.a.s.c.ZkController 
[ZkController.java:651] Register node as live in 
ZooKeeper:/live_nodes/host5:10750_solr

// See trace above, it directly got leader props from ZK successfully, so there 
is actually a leader at this point contrary to what it finds below

2014-01-06 06:28:01,567 INFO [main-EventThread] o.a.s.c.c.SolrZkClient 
[SolrZkClient.java:378] makePath: /live_nodes/host5:10750_solr
2014-01-06 06:28:01,669 INFO [main-EventThread] o.a.s.c.ZkController 
[ZkController.java:757] Register replica - 
core:collection_20131120_shard241_replica2 address:http://host5:10750/solr 
collection:collection_20131120 shard:shard241
2014-01-06 06:28:01,669 INFO [main-EventThread] o.a.s.c.s.i.HttpClientUtil 
[HttpClientUtil.java:103] Creating new http client, 
config:maxConnections=1&maxConnectionsPerHost=20&connTimeout=3&socketTimeout=3&retry=false

// nothing much after this on main-EventThread for 20 mins..

2014-01-06 06:54:01,786 ERROR [main-EventThread] o.a.s.c.ZkController 
[ZkController.java:869] Error getting leader from zk
org.apache.solr.common.SolrException: No registered leader was found, 
collection:collection_20131120 slice:shard241

// Then goes on to the next replica ..

2014-01-06 06:54:01,786 INFO [main-EventThread] o.a.s.c.ZkController 
[ZkController.java:757] Register replica - 
core:collection_20131120_shard209_replica2 address:http://host5:10750/solr 
collection:collection_20131120 shard:shard209
2014-01-06 06:54:01,786 INFO [main-EventThread] o.a.s.c.s.i.HttpClientUtil 
[HttpClientUtil.java:103] Creating new http client, 
config:maxConnections=1&maxConnectionsPerHost=20&connTimeout=3&socketTimeout=3&retry=false

// waits another twenty mins (by which time I ordered a shutdown, so things 
started erroring out sooner after that)

2014-01-06 07:19:21,656 ERROR [main-EventThread] o.a.s.c.ZkController 
[ZkController.java:869] Error getting leader from zk
org.apache.solr.common.SolrException: No registered leader was found, 
collection:collection_20131120 slice:shard209

// After trying to register all other replicas, these failed fast because we 
had ordered a shutdown already..

2014-01-06 07:19:21,693 INFO [main-EventThread] 
o.a.s.c.c.DefaultConnectionStrategy [DefaultConnectionStrategy.java:48] 
Reconnected to ZooKeeper
2014-01-06 07:19:21,693 INFO [main-EventThread] o.a.s.c.c.ConnectionManager 
[ConnectionManager.java:130] Connected:true

// And immediately, *now* it fires all the events it was waiting for!

2014-01-06 07:19:21,693 INFO [main-EventThread] o.a.s.c.c.ConnectionManager 
[ConnectionManager.java:72] Watcher 
org.apache.solr.common.cloud.ConnectionManager@2467da0a 
name:ZooKeeperConnection Watcher:host1:11600,host2:11600,host3:11600 got event 
WatchedEvent state:Disconnected type:None path:null path:null type:None
2014-01-06 07:19:21,693 INFO [main-EventThread] o.a.z.ClientCnxn 
[ClientCnxn.java:509] EventThread shut down

// many more such disc events, and then the watches

2014-01-06 07:19:21,694 WARN [main-EventThread] o.a.s.c.c.ZkStateReader 
[ZkStateReader.java:281] ZooKeeper watch triggered, but Solr cannot talk to ZK
2014-01-06 07:19:21,694 INFO [main-EventThread] o.a.s.c.c.ZkStateReader 
[ZkStateReader.java:210] A cluster state change: WatchedEvent 
state:SyncConnected type:NodeDataChanged path:/clusterstate.json, has occurred 
- updating... (live nodes size: 112)
2014-01-06 07:19:21,694 WARN [main-EventThread] o.a.s.c.c.ZkStateReader 
[ZkStateReader.java:234] ZooKeeper watch triggered, but Solr cannot talk to ZK

{code}



was (Author: andyetitmoves):
Here's a log trace from an actual occurrence, which might help in 
understanding the scenario above..

{code}
2014-01-06 06:22:03,867 INFO [main-EventThread] o.a.s.c.c.ConnectionManager 
[ConnectionManager.java:88] Our previous ZooKeeper session was expired. 
Attempting to reconnect to recover r

[jira] [Created] (SOLR-5616) Make grouping code use response builder needDocList

2014-01-07 Thread Steven Bower (JIRA)
Steven Bower created SOLR-5616:
--

 Summary: Make grouping code use response builder needDocList
 Key: SOLR-5616
 URL: https://issues.apache.org/jira/browse/SOLR-5616
 Project: Solr
  Issue Type: Bug
Reporter: Steven Bower


Right now the grouping code does this to check if it needs to generate a 
docList for grouped results:

{code}
if (rb.doHighlights || rb.isDebug() || params.getBool(MoreLikeThisParams.MLT, 
false) ){
...
}
{code}

this is ugly because any new component that needs a docList from grouped 
results will need to modify QueryComponent to add a check to this if. Ideally 
this should just use the rb.isNeedDocList() flag...

Coincidentally, this boolean is never actually used; for non-grouped results 
the docList always gets generated..
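
For illustration, the intended shape of the change as a sketch (the attached 
patch is authoritative; the setNeedDocList/isNeedDocList pair follows the flag 
named above):

{code:java}
import java.io.IOException;

import org.apache.solr.common.params.MoreLikeThisParams;
import org.apache.solr.handler.component.ResponseBuilder;
import org.apache.solr.handler.component.SearchComponent;

// A component declares its need for a docList once, up front...
abstract class NeedsDocListComponent extends SearchComponent {
  @Override
  public void prepare(ResponseBuilder rb) throws IOException {
    if (rb.doHighlights || rb.isDebug()
        || rb.req.getParams().getBool(MoreLikeThisParams.MLT, false)) {
      rb.setNeedDocList(true);
    }
  }
}

// ...so the grouping code in QueryComponent can drop the hard-coded
// condition and simply ask:
//
//   if (rb.isNeedDocList()) {
//     // build the docList for grouped results
//   }
{code}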



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (SOLR-5615) Deadlock while trying to recover after a ZK session expiry

2014-01-07 Thread Ramkumar Aiyengar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13864389#comment-13864389
 ] 

Ramkumar Aiyengar edited comment on SOLR-5615 at 1/7/14 5:02 PM:
-

Here's a log trace from an actual occurrence, which might help in 
understanding the scenario above..

{code}
2014-01-06 06:22:03,867 INFO [main-EventThread] o.a.s.c.c.ConnectionManager 
[ConnectionManager.java:88] Our previous ZooKeeper session was expired. 
Attempting to reconnect to recover relationship with ZooKeeper...

// ..

2014-01-06 06:22:12,529 INFO [main-EventThread] o.a.s.c.c.ConnectionManager 
[ConnectionManager.java:103] Connection with ZooKeeper reestablished.

// ..

2014-01-06 06:22:36,573 INFO [main-EventThread] o.a.s.c.ZkController 
[ZkController.java:989] publishing core=collection_20131120_shard205_replica2 
state=down

// ..

2014-01-06 06:28:01,479 INFO [main-EventThread] o.a.s.c.c.ZkStateReader 
[ZkStateReader.java:199] Updating cluster state from ZooKeeper... 
2014-01-06 06:28:01,487 INFO [main-EventThread] o.a.s.c.ZkController 
[ZkController.java:651] Register node as live in 
ZooKeeper:/live_nodes/host5:10750_solr

// See trace above, it directly got leader props from ZK successfully, so there 
is actually a leader at this point contrary to what it finds below

2014-01-06 06:28:01,567 INFO [main-EventThread] o.a.s.c.c.SolrZkClient 
[SolrZkClient.java:378] makePath: /live_nodes/host5:10750_solr
2014-01-06 06:28:01,669 INFO [main-EventThread] o.a.s.c.ZkController 
[ZkController.java:757] Register replica - 
core:collection_20131120_shard241_replica2 address:http://host5:10750/solr 
collection:collection_20131120 shard:shard241
2014-01-06 06:28:01,669 INFO [main-EventThread] o.a.s.c.s.i.HttpClientUtil 
[HttpClientUtil.java:103] Creating new http client, 
config:maxConnections=1&maxConnectionsPerHost=20&connTimeout=3&socketTimeout=3&retry=false

// nothing much after this on main-EventThread for 20 mins..

2014-01-06 06:54:01,786 ERROR [main-EventThread] o.a.s.c.ZkController 
[ZkController.java:869] Error getting leader from zk
org.apache.solr.common.SolrException: No registered leader was found, 
collection:collection_20131120 slice:shard241

// Then goes on to the next replica ..

2014-01-06 06:54:01,786 INFO [main-EventThread] o.a.s.c.ZkController 
[ZkController.java:757] Register replica - 
core:collection_20131120_shard209_replica2 address:http://host5:10750/solr 
collection:collection_20131120 shard:shard209
2014-01-06 06:54:01,786 INFO [main-EventThread] o.a.s.c.s.i.HttpClientUtil 
[HttpClientUtil.java:103] Creating new http client, 
config:maxConnections=1&maxConnectionsPerHost=20&connTimeout=3&socketTimeout=3&retry=false

// waits another twenty mins (by which time I ordered a shutdown, so things 
started erroring out sooner after that)

2014-01-06 07:19:21,656 ERROR [main-EventThread] o.a.s.c.ZkController 
[ZkController.java:869] Error getting leader from zk
org.apache.solr.common.SolrException: No registered leader was found, 
collection:collection_20131120 slice:shard209

// After trying to register all other replicas, these failed fast because we 
had ordered a shutdown already..

2014-01-06 07:19:21,693 INFO [main-EventThread] 
o.a.s.c.c.DefaultConnectionStrategy [DefaultConnectionStrategy.java:48] 
Reconnected to ZooKeeper
2014-01-06 07:19:21,693 INFO [main-EventThread] o.a.s.c.c.ConnectionManager 
[ConnectionManager.java:130] Connected:true

// And immediately, *now* it fires all the events it was waiting for!

2014-01-06 07:19:21,693 INFO [main-EventThread] o.a.s.c.c.ConnectionManager 
[ConnectionManager.java:72] Watcher 
org.apache.solr.common.cloud.ConnectionManager@2467da0a 
name:ZooKeeperConnection Watcher:host1:11600,host2:11600,host3:11600 got event 
WatchedEvent state:Disconnected type:None path:null path:null type:None
2014-01-06 07:19:21,693 INFO [main-EventThread] o.a.z.ClientCnxn 
[ClientCnxn.java:509] EventThread shut down
{code}



was (Author: andyetitmoves):
Here's a log trace from an actual occurrence, which might help in 
understanding the scenario above..

{code}
2014-01-06 06:22:03,867 INFO [main-EventThread] o.a.s.c.c.ConnectionManager 
[ConnectionManager.java:88] Our previous ZooKeeper session was expired. 
Attempting to reconnect to recover relationship with ZooKeeper...

// ..

2014-01-06 06:22:12,529 INFO [main-EventThread] o.a.s.c.c.ConnectionManager 
[ConnectionManager.java:103] Connection with ZooKeeper reestablished.

// ..

2014-01-06 06:22:36,573 INFO [main-EventThread] o.a.s.c.ZkController 
[ZkController.java:989] publishing core=collection_20131120_shard205_replica2 
state=down

// ..

2014-01-06 06:28:01,479 INFO [main-EventThread] o.a.s.c.c.ZkStateReader 
[ZkStateReader.java:199] Updating cluster state from ZooKeeper... 
2014-01-06 06:28:01,487 INFO [main-EventThread] o.a.s.c.ZkController 
[ZkController.java:651] Register node as 

[jira] [Commented] (SOLR-5615) Deadlock while trying to recover after a ZK session expiry

2014-01-07 Thread Ramkumar Aiyengar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13864389#comment-13864389
 ] 

Ramkumar Aiyengar commented on SOLR-5615:
-

Here's a log trace from an actual occurrence, which might help in 
understanding the scenario above..

{code}
2014-01-06 06:22:03,867 INFO [main-EventThread] o.a.s.c.c.ConnectionManager 
[ConnectionManager.java:88] Our previous ZooKeeper session was expired. 
Attempting to reconnect to recover relationship with ZooKeeper...

// ..

2014-01-06 06:22:12,529 INFO [main-EventThread] o.a.s.c.c.ConnectionManager 
[ConnectionManager.java:103] Connection with ZooKeeper reestablished.

// ..

2014-01-06 06:22:36,573 INFO [main-EventThread] o.a.s.c.ZkController 
[ZkController.java:989] publishing core=collection_20131120_shard205_replica2 
state=down

// ..

2014-01-06 06:28:01,479 INFO [main-EventThread] o.a.s.c.c.ZkStateReader 
[ZkStateReader.java:199] Updating cluster state from ZooKeeper... 
2014-01-06 06:28:01,487 INFO [main-EventThread] o.a.s.c.ZkController 
[ZkController.java:651] Register node as live in 
ZooKeeper:/live_nodes/host5:10750_solr

// See trace above, it directly got cluster state from ZK and successfully 
found the leader, so there is actually a leader at this point contrary to what 
it finds below

2014-01-06 06:28:01,567 INFO [main-EventThread] o.a.s.c.c.SolrZkClient 
[SolrZkClient.java:378] makePath: /live_nodes/host5:10750_solr
2014-01-06 06:28:01,669 INFO [main-EventThread] o.a.s.c.ZkController 
[ZkController.java:757] Register replica - 
core:collection_20131120_shard241_replica2 address:http://host5:10750/solr 
collection:collection_20131120 shard:shard241
2014-01-06 06:28:01,669 INFO [main-EventThread] o.a.s.c.s.i.HttpClientUtil 
[HttpClientUtil.java:103] Creating new http client, 
config:maxConnections=1&maxConnectionsPerHost=20&connTimeout=3&socketTimeout=3&retry=false

// nothing much after this on main-EventThread for 20 mins..

2014-01-06 06:54:01,786 ERROR [main-EventThread] o.a.s.c.ZkController 
[ZkController.java:869] Error getting leader from zk
org.apache.solr.common.SolrException: No registered leader was found, 
collection:collection_20131120 slice:shard241

// Then goes on to the next replica ..

2014-01-06 06:54:01,786 INFO [main-EventThread] o.a.s.c.ZkController 
[ZkController.java:757] Register replica - 
core:collection_20131120_shard209_replica2 address:http://host5:10750/solr 
collection:collection_20131120 shard:shard209
2014-01-06 06:54:01,786 INFO [main-EventThread] o.a.s.c.s.i.HttpClientUtil 
[HttpClientUtil.java:103] Creating new http client, 
config:maxConnections=1&maxConnectionsPerHost=20&connTimeout=3&socketTimeout=3&retry=false

// waits another twenty mins (by which time I ordered a shutdown, so things 
started erroring out sooner after that)

2014-01-06 07:19:21,656 ERROR [main-EventThread] o.a.s.c.ZkController 
[ZkController.java:869] Error getting leader from zk
org.apache.solr.common.SolrException: No registered leader was found, 
collection:collection_20131120 slice:shard209

// After trying to register all other replicas, these failed fast because we 
had ordered a shutdown already..

2014-01-06 07:19:21,693 INFO [main-EventThread] 
o.a.s.c.c.DefaultConnectionStrategy [DefaultConnectionStrategy.java:48] 
Reconnected to ZooKeeper
2014-01-06 07:19:21,693 INFO [main-EventThread] o.a.s.c.c.ConnectionManager 
[ConnectionManager.java:130] Connected:true

// And immediately, *now* it fires all the events it was waiting for!

2014-01-06 07:19:21,693 INFO [main-EventThread] o.a.s.c.c.ConnectionManager 
[ConnectionManager.java:72] Watcher 
org.apache.solr.common.cloud.ConnectionManager@2467da0a 
name:ZooKeeperConnection Watcher:host1:11600,host2:11600,host3:11600 got event 
WatchedEvent state:Disconnected type:None path:null path:null type:None
2014-01-06 07:19:21,693 INFO [main-EventThread] o.a.z.ClientCnxn 
[ClientCnxn.java:509] EventThread shut down
{code}


> Deadlock while trying to recover after a ZK session expiry
> --
>
> Key: SOLR-5615
> URL: https://issues.apache.org/jira/browse/SOLR-5615
> Project: Solr
>  Issue Type: Bug
>  Components: SolrCloud
>Affects Versions: 4.4, 4.5, 4.6
>Reporter: Ramkumar Aiyengar
> Attachments: SOLR-5615.patch
>
>
> The sequence of events which might trigger this is as follows:
>  - Leader of a shard, say OL, has a ZK expiry
>  - The new leader, NL, starts the election process
>  - NL, through Overseer, clears the current leader (OL) for the shard from 
> the cluster state
>  - OL reconnects to ZK, calls onReconnect from event thread (main-EventThread)
>  - OL marks itself down
>  - OL sets up watches for cluster state, and then retrieves it (with no 
> leader for this shard)
>  - NL, through Overseer, updates cluster 

[jira] [Updated] (SOLR-5463) Provide cursor/token based "searchAfter" support that works with arbitrary sorting (ie: "deep paging")

2014-01-07 Thread Hoss Man (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hoss Man updated SOLR-5463:
---

Description: 
I'd like to revisit a solution to the problem of "deep paging" in Solr, 
leveraging an HTTP-based API similar to how IndexSearcher.searchAfter works at 
the Lucene level: require the clients to provide back a token indicating the 
sort values of the last document seen on the previous "page".  This is similar 
to the "cursor" model I've seen in several other REST APIs that support 
"pagination" over large sets of results (notably the Twitter API and its 
"since_id" param), except that we'll want something that works with arbitrary 
multi-level sort criteria that can be either ascending or descending.

SOLR-1726 laid some initial groundwork here and was committed quite a while 
ago, but the key bit of argument parsing to leverage it was commented out due 
to some problems (see comments in that issue).  It's also somewhat out of date 
at this point: at the time it was committed, IndexSearcher only supported 
searchAfter for simple scores, not arbitrary field sorts; and the params added 
in SOLR-1726 suffer from this limitation as well.

---

I think it would make sense to start fresh with a new issue with a focus on 
ensuring that we have deep paging which:

* supports arbitrary field sorts in addition to sorting by score
* works in distributed mode

{panel:title=Basic Usage}
* send a request with {{sort=X&start=0&rows=N&cursorMark=*}}
** sort can be anything, but must include the uniqueKey field (as a tie 
breaker) 
** "N" can be any number you want per page
** start must be "0"
** "\*" denotes you want to use a cursor starting at the beginning mark
* parse the response body and extract the (String) {{nextCursorMark}} value
* Replace the "\*" value in your initial request params with the 
{{nextCursorMark}} value from the response in the subsequent request
* repeat until the {{nextCursorMark}} value stops changing, or you have 
collected as many docs as you need
{panel}
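
For illustration, a minimal SolrJ loop following these steps (this assumes the 
SolrJ-side support that accompanies this feature, i.e. CursorMarkParams and 
QueryResponse.getNextCursorMark(); the sort fields are just examples):

{code:java}
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.params.CursorMarkParams;

public class CursorPager {
  public static void main(String[] args) throws Exception {
    SolrServer server =
        new HttpSolrServer("http://localhost:8983/solr/collection1");
    SolrQuery q = new SolrQuery("*:*");
    q.setRows(100);                         // "N" docs per page; start stays 0
    q.setSort("timestamp", SolrQuery.ORDER.asc); // example sort field
    q.addSort("id", SolrQuery.ORDER.asc);   // must include the uniqueKey
    String cursorMark = CursorMarkParams.CURSOR_MARK_START; // "*"
    while (true) {
      q.set(CursorMarkParams.CURSOR_MARK_PARAM, cursorMark);
      QueryResponse rsp = server.query(q);
      // ... consume rsp.getResults() ...
      String next = rsp.getNextCursorMark();
      if (cursorMark.equals(next)) break;   // mark stopped changing: done
      cursorMark = next;
    }
  }
}
{code}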


  was:
I'd like to revisit a solution to the problem of "deep paging" in Solr, 
leveraging an HTTP-based API similar to how IndexSearcher.searchAfter works at 
the Lucene level: require the clients to provide back a token indicating the 
sort values of the last document seen on the previous "page".  This is similar 
to the "cursor" model I've seen in several other REST APIs that support 
"pagination" over large sets of results (notably the Twitter API and its 
"since_id" param), except that we'll want something that works with arbitrary 
multi-level sort criteria that can be either ascending or descending.

SOLR-1726 laid some initial groundwork here and was committed quite a while 
ago, but the key bit of argument parsing to leverage it was commented out due 
to some problems (see comments in that issue).  It's also somewhat out of date 
at this point: at the time it was committed, IndexSearcher only supported 
searchAfter for simple scores, not arbitrary field sorts; and the params added 
in SOLR-1726 suffer from this limitation as well.

---

I think it would make sense to start fresh with a new issue with a focus on 
ensuring that we have deep paging which:

* supports arbitrary field sorts in addition to sorting by score
* works in distributed mode



> Provide cursor/token based "searchAfter" support that works with arbitrary 
> sorting (ie: "deep paging")
> --
>
> Key: SOLR-5463
> URL: https://issues.apache.org/jira/browse/SOLR-5463
> Project: Solr
>  Issue Type: New Feature
>Reporter: Hoss Man
>Assignee: Hoss Man
> Fix For: 5.0
>
> Attachments: SOLR-5463.patch, SOLR-5463.patch, SOLR-5463.patch, 
> SOLR-5463.patch, SOLR-5463.patch, SOLR-5463.patch, SOLR-5463.patch, 
> SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, 
> SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, 
> SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, 
> SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, 
> SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, 
> SOLR-5463__straw_man__MissingStringLastComparatorSource.patch
>
>
> I'd like to revisit a solution to the problem of "deep paging" in Solr, 
> leveraging an HTTP-based API similar to how IndexSearcher.searchAfter works 
> at the Lucene level: require the clients to provide back a token indicating 
> the sort values of the last document seen on the previous "page".  This is 
> similar to the "cursor" model I've seen in several other REST APIs that 
> support "pagination" over large sets of results (notably the Twitter API and 
> its "since_id" param), except that we'll want something that works with 
> arbitrary multi-level sort criteria that can be either ascending or descending.
> SOLR-1726 lai

[jira] [Commented] (SOLR-5613) Upgrade Apache Commons Codec to version 1.9 in order to improve performance of BeiderMorseFilter

2014-01-07 Thread Shawn Heisey (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13864364#comment-13864364
 ] 

Shawn Heisey commented on SOLR-5613:


I upgraded commons-codec to 1.9 on an up-to-date branch_4x checkout and found 
that all tests (both Lucene and Solr) passed.  This was on a Linux machine.  I 
wasn't too surprised by this.  I think we can accommodate this request easily.

Just for giggles, I went even further and upgraded all commons.apache.org 
components to the newest versions I could find via Ivy.  All tests *still* 
passed.  This was on a Windows 8 machine.  With so many upgrades, I was really 
surprised it passed.

{code}
Index: lucene/ivy-versions.properties
===
--- lucene/ivy-versions.properties  (revision 1555313)
+++ lucene/ivy-versions.properties  (working copy)
@@ -19,16 +19,16 @@
 /com.ibm.icu/icu4j = 52.1
 /com.spatial4j/spatial4j = 0.3
 /com.sun.jersey/jersey-core = 1.16
-/commons-beanutils/commons-beanutils = 1.7.0
+/commons-beanutils/commons-beanutils = 1.9.0
 /commons-cli/commons-cli = 1.2
-/commons-codec/commons-codec = 1.7
+/commons-codec/commons-codec = 1.9
 /commons-collections/commons-collections = 3.2.1
-/commons-configuration/commons-configuration = 1.6
-/commons-digester/commons-digester = 2.0
-/commons-fileupload/commons-fileupload = 1.2.1
-/commons-io/commons-io = 2.1
+/commons-configuration/commons-configuration = 1.10
+/commons-digester/commons-digester = 2.1
+/commons-fileupload/commons-fileupload = 1.3
+/commons-io/commons-io = 2.4
 /commons-lang/commons-lang = 2.6
-/commons-logging/commons-logging = 1.1.1
+/commons-logging/commons-logging = 1.1.3
 /de.l3s.boilerpipe/boilerpipe = 1.1.0
 /dom4j/dom4j = 1.6.1
 /edu.ucar/netcdf = 4.2-min
{code}

I'm not advocating that we upgrade all the components at once, but it looks 
like we can indeed upgrade them all eventually.  I only ran the basic tests, so 
additional tests (nightly, weekly, etc.) still need to be done.


> Upgrade Apache Commons Codec to version 1.9 in order to improve performance 
> of BeiderMorseFilter
> 
>
> Key: SOLR-5613
> URL: https://issues.apache.org/jira/browse/SOLR-5613
> Project: Solr
>  Issue Type: Improvement
>  Components: Rules, Schema and Analysis, search
>Affects Versions: 3.6, 3.6.1, 3.6.2, 4.0, 4.1, 4.2, 4.2.1, 4.3, 4.3.1, 
> 4.4, 4.5, 4.5.1, 4.6
>Reporter: Thomas Champagne
>  Labels: codec, commons, commons-codec, phonetic, search
>
> In version 1.9 of the commons-codec project, there are a lot of optimizations 
> in the Beider-Morse encoder, which is used by the BeiderMorseFilter in Solr. 
> Do you think it is possible to upgrade this dependency?



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-5615) Deadlock while trying to recover after a ZK session expiry

2014-01-07 Thread Mark Miller (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Miller updated SOLR-5615:
--

Attachment: SOLR-5615.patch

Not sure given the info, but the patch doesn't seem crazy to me. I've made a 
few adjustments in this patch.

> Deadlock while trying to recover after a ZK session expiry
> --
>
> Key: SOLR-5615
> URL: https://issues.apache.org/jira/browse/SOLR-5615
> Project: Solr
>  Issue Type: Bug
>  Components: SolrCloud
>Affects Versions: 4.4, 4.5, 4.6
>Reporter: Ramkumar Aiyengar
> Attachments: SOLR-5615.patch
>
>
> The sequence of events which might trigger this is as follows:
>  - Leader of a shard, say OL, has a ZK expiry
>  - The new leader, NL, starts the election process
>  - NL, through Overseer, clears the current leader (OL) for the shard from 
> the cluster state
>  - OL reconnects to ZK, calls onReconnect from event thread (main-EventThread)
>  - OL marks itself down
>  - OL sets up watches for cluster state, and then retrieves it (with no 
> leader for this shard)
>  - NL, through Overseer, updates cluster state to mark itself leader for the 
> shard
>  - OL tries to register itself as a replica, and waits till the cluster state 
> is updated
>with the new leader from event thread
>  - ZK sends a watch update to OL, but it is blocked on the event thread 
> waiting for it.
> Oops. This finally breaks out only when the attempt to register itself as a 
> replica times out after 20 mins.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5354) Blended score in AnalyzingInfixSuggester

2014-01-07 Thread Remi Melisson (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13864328#comment-13864328
 ] 

Remi Melisson commented on LUCENE-5354:
---

Hi, any news about this feature? Is there anything else I could do?

> Blended score in AnalyzingInfixSuggester
> 
>
> Key: LUCENE-5354
> URL: https://issues.apache.org/jira/browse/LUCENE-5354
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/spellchecker
>Affects Versions: 4.4
>Reporter: Remi Melisson
>Priority: Minor
>  Labels: suggester
> Attachments: LUCENE-5354.patch, LUCENE-5354_2.patch
>
>
> I'm working on a custom suggester derived from the AnalyzingInfix. I require 
> what is called a "blended score" (//TODO ln.399 in AnalyzingInfixSuggester) 
> to transform the suggestion weights depending on the position of the searched 
> term(s) in the text.
> Right now, I'm using an easy solution:
> If I want 10 suggestions, then I search against the current ordered index for 
> the first 100 results and transform the weight:
> bq. a) by using the term position in the text (found with TermVector and 
> DocsAndPositionsEnum)
> or
> bq. b) by multiplying the weight by the score of a SpanQuery that I add when 
> searching
> and return the updated 10 most heavily weighted suggestions.
> Since we usually don't need to suggest so many things, the overhead of the 
> bigger search plus rescoring is not so significant, but I agree that this is 
> not the most elegant solution.
> We could include this factor (here the position of the term) directly in 
> the index.
> So I can contribute this if you think it's worth adding.
> Do you think I should tweak AnalyzingInfixSuggester, subclass it, or create a 
> dedicated class?
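
As an illustration of option a), a rough sketch of the "search 100, reweight 
by position, return top 10" idea; the Suggestion type and positionOf helper 
are hypothetical stand-ins for the real suggester types and the 
TermVector/DocsAndPositionsEnum lookup:

{code:java}
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

class BlendedRescoreSketch {
  static class Suggestion {                 // hypothetical stand-in
    final String text;
    final long weight;
    Suggestion(String text, long weight) { this.text = text; this.weight = weight; }
  }

  // Search deeper than needed (e.g. 100), reweight by match position,
  // then cut back to the requested top-N (e.g. 10).
  static List<Suggestion> blend(List<Suggestion> top100, String term, int wanted) {
    List<Suggestion> blended = new ArrayList<Suggestion>(top100.size());
    for (Suggestion s : top100) {
      int pos = positionOf(term, s.text);   // 0 = matched at the start
      blended.add(new Suggestion(s.text, s.weight / (pos + 1)));
    }
    Collections.sort(blended, new Comparator<Suggestion>() {
      @Override
      public int compare(Suggestion a, Suggestion b) {
        // highest blended weight first
        return a.weight < b.weight ? 1 : (a.weight == b.weight ? 0 : -1);
      }
    });
    return blended.subList(0, Math.min(wanted, blended.size()));
  }

  // Hypothetical: the real code would find the match position with
  // TermVector and DocsAndPositionsEnum instead of string splitting.
  static int positionOf(String term, String text) {
    String[] tokens = text.toLowerCase().split("\\s+");
    for (int i = 0; i < tokens.length; i++) {
      if (tokens[i].startsWith(term.toLowerCase())) return i;
    }
    return tokens.length;
  }
}
{code}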



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5615) Deadlock while trying to recover after a ZK session expiry

2014-01-07 Thread Ramkumar Aiyengar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13864250#comment-13864250
 ] 

Ramkumar Aiyengar commented on SOLR-5615:
-

Submitted https://github.com/apache/lucene-solr/pull/13 for one possible 
solution, though I am not sure if this is the right way to go about this..

> Deadlock while trying to recover after a ZK session expiry
> --
>
> Key: SOLR-5615
> URL: https://issues.apache.org/jira/browse/SOLR-5615
> Project: Solr
>  Issue Type: Bug
>  Components: SolrCloud
>Affects Versions: 4.4, 4.5, 4.6
>Reporter: Ramkumar Aiyengar
>
> The sequence of events which might trigger this is as follows:
>  - Leader of a shard, say OL, has a ZK expiry
>  - The new leader, NL, starts the election process
>  - NL, through Overseer, clears the current leader (OL) for the shard from 
> the cluster state
>  - OL reconnects to ZK, calls onReconnect from event thread (main-EventThread)
>  - OL marks itself down
>  - OL sets up watches for cluster state, and then retrieves it (with no 
> leader for this shard)
>  - NL, through Overseer, updates cluster state to mark itself leader for the 
> shard
>  - OL tries to register itself as a replica, and waits till the cluster state 
> is updated
>with the new leader from event thread
>  - ZK sends a watch update to OL, but it is blocked on the event thread 
> waiting for it.
> Oops. This finally breaks out only when the attempt to register itself as a 
> replica times out after 20 mins.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



lucene-solr pull request: Allow ConnectionManager.process to run from multi...

2014-01-07 Thread andyetitmoves
GitHub user andyetitmoves opened a pull request:

https://github.com/apache/lucene-solr/pull/13

Allow ConnectionManager.process to run from multiple threads

One potential fix for SOLR-5615. I'm not at all sure whether this is the 
correct way to go about it, but it's a start, I guess..

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/andyetitmoves/lucene-solr 
on-recovery-deadlock-4x

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/lucene-solr/pull/13.patch


commit ad7ac506bc614d43f391aaad7ab25d9b426421c4
Author: Ramkumar Aiyengar 
Date:   2014-01-07T11:57:25Z

Allow ConnectionManager.process to run from multiple threads




-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (SOLR-5615) Deadlock while trying to recover after a ZK session expiry

2014-01-07 Thread Ramkumar Aiyengar (JIRA)
Ramkumar Aiyengar created SOLR-5615:
---

 Summary: Deadlock while trying to recover after a ZK session expiry
 Key: SOLR-5615
 URL: https://issues.apache.org/jira/browse/SOLR-5615
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Affects Versions: 4.6, 4.5, 4.4
Reporter: Ramkumar Aiyengar


The sequence of events which might trigger this is as follows:

 - Leader of a shard, say OL, has a ZK expiry
 - The new leader, NL, starts the election process
 - NL, through Overseer, clears the current leader (OL) for the shard from the 
cluster state
 - OL reconnects to ZK, calls onReconnect from event thread (main-EventThread)
 - OL marks itself down
 - OL sets up watches for cluster state, and then retrieves it (with no leader 
for this shard)
 - NL, through Overseer, updates cluster state to mark itself leader for the 
shard
 - OL tries to register itself as a replica, and waits till the cluster state 
is updated
   with the new leader from event thread
 - ZK sends a watch update to OL, but it is blocked on the event thread waiting 
for it.

Oops. This finally breaks out only when the attempt to register itself as a 
replica times out after 20 mins.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-5476) Overseer Role for nodes

2014-01-07 Thread Noble Paul (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Noble Paul updated SOLR-5476:
-

Attachment: SOLR-5476.patch

> Overseer Role for nodes
> ---
>
> Key: SOLR-5476
> URL: https://issues.apache.org/jira/browse/SOLR-5476
> Project: Solr
>  Issue Type: Sub-task
>  Components: SolrCloud
>Reporter: Noble Paul
>Assignee: Noble Paul
> Attachments: SOLR-5476.patch, SOLR-5476.patch, SOLR-5476.patch, 
> SOLR-5476.patch
>
>
> In a very large cluster the Overseer is likely to be overloaded. If the same 
> node is also serving a few other shards, the Overseer can get slowed down due 
> to GC pauses, or simply too much work. If the cluster is really large, it is 
> possible to dedicate high-end h/w to Overseers.
> It works as a new collection admin command:
> command=addrole&role=overseer&node=192.168.1.5:8983_solr
> This results in the creation of an entry in /roles.json in ZK, which would 
> look like the following:
> {code:javascript}
> {
> "overseer" : ["192.168.1.5:8983_solr"]
> }
> {code}
> If a node is designated for overseer it gets preference over others when the 
> overseer election takes place. If no designated servers are available, another 
> random node becomes the Overseer.
> Later on, if one of the designated nodes is brought up, it takes over the 
> Overseer role from the current Overseer to become the Overseer of the system.
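
For illustration, a minimal client invocation of the command described above 
(a sketch: it assumes the command surfaces as an ADDROLE action on the 
collections admin endpoint, and the parameter names follow the description):

{code:java}
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;

public class AddOverseerRole {
  public static void main(String[] args) throws Exception {
    // Parameter names mirror the description above; the endpoint and
    // action name are assumptions about the final API.
    URL url = new URL("http://localhost:8983/solr/admin/collections"
        + "?action=ADDROLE&role=overseer&node=192.168.1.5:8983_solr");
    BufferedReader in = new BufferedReader(
        new InputStreamReader(url.openStream(), "UTF-8"));
    for (String line; (line = in.readLine()) != null; ) {
      System.out.println(line);            // echo the admin response
    }
    in.close();
  }
}
{code}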



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5609) Don't let cores create slices/named replicas

2014-01-07 Thread Noble Paul (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13864241#comment-13864241
 ] 

Noble Paul commented on SOLR-5609:
--

It makes sense to have an omnibus property like "legacyCloudMode" rather than 
specific properties for each behavior.


> Don't let cores create slices/named replicas
> 
>
> Key: SOLR-5609
> URL: https://issues.apache.org/jira/browse/SOLR-5609
> Project: Solr
>  Issue Type: Sub-task
>  Components: SolrCloud
>Reporter: Noble Paul
> Fix For: 5.0, 4.7
>
>
> In SolrCloud, it is possible for a core to come up on any node and register 
> itself with an arbitrary slice/coreNodeName. This is a legacy requirement, and 
> we would like to make it possible only for the Overseer to initiate creation 
> of slices/replicas.
> We plan to introduce cluster-level properties at the top level
> /cluster-props.json
> {code:javascript}
> {
> "noSliceOrReplicaByCores": true
> }
> {code}
> If this property is set to true, cores won't be able to send STATE commands 
> with an unknown slice/coreNodeName; those commands will fail at the Overseer. 
> This is useful for SOLR-5310 / SOLR-5311, where a core/replica is deleted by a 
> command and then comes up later and tries to create a replica/slice.
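
A minimal sketch of the Overseer-side guard this implies; every name here is 
hypothetical, not taken from a patch:

{code:java}
import java.util.Map;

class StateCommandGuard {
  // With noSliceOrReplicaByCores=true, a STATE command naming an unknown
  // slice or coreNodeName is rejected instead of creating it implicitly.
  static boolean allowState(Map<String, Object> clusterProps,
                            boolean sliceKnown, boolean coreNodeNameKnown) {
    boolean strict =
        Boolean.TRUE.equals(clusterProps.get("noSliceOrReplicaByCores"));
    return !strict || (sliceKnown && coreNodeNameKnown);
  }
}
{code}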



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-5614) Boost documents using "map" and "query" functions

2014-01-07 Thread Anca Kopetz (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anca Kopetz updated SOLR-5614:
--

Description: 
We want to boost documents that contain specific search terms in their fields. 

We tried the following simplified query: 
http://localhost:8983/solr/collection1/select?q=ipod 
belkin&wt=xml&debugQuery=true&q.op=AND&defType=edismax&bf=map(query($qq),0,0,0,100.0)&qq={!edismax}power

And we get the following error: 
org.apache.solr.search.SyntaxError: Infinite Recursion detected parsing query 
'power'

And the stack trace:

ERROR - 2014-01-06 18:27:02.275; org.apache.solr.common.SolrException; 
org.apache.solr.common.SolrException: org.apache.solr.search.SyntaxError: 
Infinite Recursion detected parsing query 'power'
at 
org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:171)
at 
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:194)
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1859)
at 
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:710)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:413)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:197)
at 
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
at 
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
at 
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
at 
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
at 
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
at 
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
at 
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
at 
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
at 
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
at 
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
at 
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
at org.eclipse.jetty.server.Server.handle(Server.java:368)
at 
org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
at 
org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
at 
org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:942)
at 
org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1004)
at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:640)
at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
at 
org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
at 
org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
at 
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
at 
org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
at java.lang.Thread.run(Thread.java:722)
Caused by: org.apache.solr.search.SyntaxError: Infinite Recursion detected 
parsing query 'power'
at org.apache.solr.search.QParser.checkRecurse(QParser.java:178)
at org.apache.solr.search.QParser.subQuery(QParser.java:200)
at 
org.apache.solr.search.ExtendedDismaxQParser.getBoostFunctions(ExtendedDismaxQParser.java:437)
at 
org.apache.solr.search.ExtendedDismaxQParser.parse(ExtendedDismaxQParser.java:175)
at org.apache.solr.search.QParser.getQuery(QParser.java:142)
at 
org.apache.solr.search.FunctionQParser.parseNestedQuery(FunctionQParser.java:236)
at 
org.apache.solr.search.ValueSourceParser$19.parse(ValueSourceParser.java:270)
at 
org.apache.solr.search.FunctionQParser.parseValueSource(FunctionQParser.java:352)
at 
org.apache.solr.search.FunctionQParser.parseValueSource(FunctionQParser.java:223)
at 
org.apache.solr.search.ValueSourceParser$13.parse(ValueSourceParser.java:198)
at 
org.apache.solr.search.FunctionQParser.parseValueSource(FunctionQParser.java:352)
at org.apache.solr.search.FunctionQParser.parse(FunctionQParser.java:68)
at org.apache.solr.search.QParser.getQuery(QP

[jira] [Updated] (SOLR-5614) Boost documents using "map" and "query" functions

2014-01-07 Thread Anca Kopetz (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anca Kopetz updated SOLR-5614:
--

Description: 
We want to boost documents that contain specific search terms in their fields. 

We tried the following simplified query: 
http://localhost:8983/solr/collection1/select?q=ipod%20belkin&wt=xml&debugQuery=true&q.op=AND&defType=edismax&bf=map(query($qq),0,0,0,100.0)&qq={!edismax}power

And we get the following error: 
org.apache.solr.search.SyntaxError: Infinite Recursion detected parsing query 
'power'

And the stack trace:

ERROR - 2014-01-06 18:27:02.275; org.apache.solr.common.SolrException; 
org.apache.solr.common.SolrException: org.apache.solr.search.SyntaxError: 
Infinite Recursion detected parsing query 'power'
at 
org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:171)
at 
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:194)
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1859)
at 
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:710)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:413)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:197)
at 
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
at 
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
at 
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
at 
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
at 
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
at 
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
at 
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
at 
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
at 
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
at 
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
at 
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
at org.eclipse.jetty.server.Server.handle(Server.java:368)
at 
org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
at 
org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
at 
org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:942)
at 
org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1004)
at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:640)
at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
at 
org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
at 
org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
at 
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
at 
org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
at java.lang.Thread.run(Thread.java:722)
Caused by: org.apache.solr.search.SyntaxError: Infinite Recursion detected 
parsing query 'power'
at org.apache.solr.search.QParser.checkRecurse(QParser.java:178)
at org.apache.solr.search.QParser.subQuery(QParser.java:200)
at 
org.apache.solr.search.ExtendedDismaxQParser.getBoostFunctions(ExtendedDismaxQParser.java:437)
at 
org.apache.solr.search.ExtendedDismaxQParser.parse(ExtendedDismaxQParser.java:175)
at org.apache.solr.search.QParser.getQuery(QParser.java:142)
at 
org.apache.solr.search.FunctionQParser.parseNestedQuery(FunctionQParser.java:236)
at 
org.apache.solr.search.ValueSourceParser$19.parse(ValueSourceParser.java:270)
at 
org.apache.solr.search.FunctionQParser.parseValueSource(FunctionQParser.java:352)
at 
org.apache.solr.search.FunctionQParser.parseValueSource(FunctionQParser.java:223)
at 
org.apache.solr.search.ValueSourceParser$13.parse(ValueSourceParser.java:198)
at 
org.apache.solr.search.FunctionQParser.parseValueSource(FunctionQParser.java:352)
at org.apache.solr.search.FunctionQParser.parse(FunctionQParser.java:68)
at org.apache.solr.search.QParser.getQuery(Q

[jira] [Created] (SOLR-5614) Boost documents using "map" and "query" functions

2014-01-07 Thread Anca Kopetz (JIRA)
Anca Kopetz created SOLR-5614:
-

 Summary: Boost documents using "map" and "query" functions
 Key: SOLR-5614
 URL: https://issues.apache.org/jira/browse/SOLR-5614
 Project: Solr
  Issue Type: Bug
Affects Versions: 4.6
Reporter: Anca Kopetz


We want to boost documents that contain specific search terms in their fields. 

We tried the following simplified query: 
http://localhost:8983/solr/collection1/select?q=ipod 
belkin&wt=xml&debugQuery=true&q.op=AND&defType=edismax&bf=map(query($qq),0,0,0,100.0)&qq={!edismax}power

And we get the following error: 
org.apache.solr.search.SyntaxError: Infinite Recursion detected parsing query 
'power'

And the stack trace:

ERROR - 2014-01-06 18:27:02.275; org.apache.solr.common.SolrException; 
org.apache.solr.common.SolrException: org.apache.solr.search.SyntaxError: 
Infinite Recursion detected parsing query 'power'
at 
org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:171)
at 
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:194)
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1859)
at 
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:710)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:413)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:197)
at 
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
at 
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
at 
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
at 
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
at 
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
at 
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
at 
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
at 
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
at 
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
at 
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
at 
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
at org.eclipse.jetty.server.Server.handle(Server.java:368)
at 
org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
at 
org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
at 
org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:942)
at 
org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1004)
at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:640)
at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
at 
org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
at 
org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
at 
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
at 
org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
at java.lang.Thread.run(Thread.java:722)
Caused by: org.apache.solr.search.SyntaxError: Infinite Recursion detected 
parsing query 'power'
at org.apache.solr.search.QParser.checkRecurse(QParser.java:178)
at org.apache.solr.search.QParser.subQuery(QParser.java:200)
at 
org.apache.solr.search.ExtendedDismaxQParser.getBoostFunctions(ExtendedDismaxQParser.java:437)
at 
org.apache.solr.search.ExtendedDismaxQParser.parse(ExtendedDismaxQParser.java:175)
at org.apache.solr.search.QParser.getQuery(QParser.java:142)
at 
org.apache.solr.search.FunctionQParser.parseNestedQuery(FunctionQParser.java:236)
at 
org.apache.solr.search.ValueSourceParser$19.parse(ValueSourceParser.java:270)
at 
org.apache.solr.search.FunctionQParser.parseValueSource(FunctionQParser.java:352)
at 
org.apache.solr.search.FunctionQParser.parseValueSource(FunctionQParser.java:223)
at 
org.apache.solr.search.ValueSourceParser$13.parse(ValueSourceParser.java:198)
at 
org.apache.solr.search.FunctionQParser.parseValueSource(FunctionQPars

[jira] [Commented] (SOLR-5379) Query-time multi-word synonym expansion

2014-01-07 Thread Ahmet Arslan (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13864149#comment-13864149
 ] 

Ahmet Arslan commented on SOLR-5379:


Assume the synonyms are {code}  usa, united states of america {code} What 
happens if I fire the following sloppy phrase query: *"president usa"~5* ?

> Query-time multi-word synonym expansion
> ---
>
> Key: SOLR-5379
> URL: https://issues.apache.org/jira/browse/SOLR-5379
> Project: Solr
>  Issue Type: Improvement
>  Components: query parsers
>Reporter: Tien Nguyen Manh
>  Labels: multi-word, queryparser, synonym
> Fix For: 4.7
>
> Attachments: quoted.patch, synonym-expander.patch
>
>
> While dealing with synonyms at query time, Solr fails to work with multi-word 
> synonyms for two reasons:
> - First, the Lucene query parser tokenizes the user query by spaces, so it 
> splits a multi-word term into separate terms before feeding them to the 
> synonym filter; the synonym filter therefore can't recognize a multi-word 
> term in order to expand it.
> - Second, if the synonym filter expands into multiple terms which contain a 
> multi-word synonym, SolrQueryParserBase currently uses MultiPhraseQuery to 
> handle synonyms. But MultiPhraseQuery doesn't work with terms that have 
> different numbers of words.
> For the first one, we can quote all multi-word synonyms in the user query so 
> that the Lucene query parser doesn't split them. There is a JIRA task related 
> to this: https://issues.apache.org/jira/browse/LUCENE-2605.
> For the second, we can replace MultiPhraseQuery with an appropriate 
> BooleanQuery of SHOULD clauses containing multiple PhraseQuery instances, in 
> case the token stream has a multi-word synonym.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5379) Query-time multi-word synonym expansion

2014-01-07 Thread Markus Jelsma (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13864147#comment-13864147
 ] 

Markus Jelsma commented on SOLR-5379:
-

How does this patch handle boosts?  Are the synonym and the original keywords 
boosted equally?

> Query-time multi-word synonym expansion
> ---
>
> Key: SOLR-5379
> URL: https://issues.apache.org/jira/browse/SOLR-5379
> Project: Solr
>  Issue Type: Improvement
>  Components: query parsers
>Reporter: Tien Nguyen Manh
>  Labels: multi-word, queryparser, synonym
> Fix For: 4.7
>
> Attachments: quoted.patch, synonym-expander.patch
>
>
> While dealing with synonyms at query time, Solr fails to work with multi-word 
> synonyms for two reasons:
> - First, the Lucene query parser tokenizes the user query by space, so it 
> splits a multi-word term into separate terms before feeding them to the 
> synonym filter; the synonym filter therefore can't recognize the multi-word 
> term and expand it.
> - Second, if the synonym filter expands into multiple terms that contain a 
> multi-word synonym, SolrQueryParserBase currently uses MultiPhraseQuery to 
> handle the synonyms, but MultiPhraseQuery doesn't work with terms that have 
> different numbers of words.
> For the first problem, we can quote all multi-word synonyms in the user query 
> so that the Lucene query parser doesn't split them. There is a JIRA task 
> related to this one: https://issues.apache.org/jira/browse/LUCENE-2605.
> For the second, we can replace MultiPhraseQuery with an appropriate 
> BooleanQuery of SHOULD clauses containing multiple PhraseQuery instances when 
> the token stream has a multi-word synonym.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-4260) Inconsistent numDocs between leader and replica

2014-01-07 Thread Markus Jelsma (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13864141#comment-13864141
 ] 

Markus Jelsma commented on SOLR-4260:
-

Ok, I followed all the great work here and in related tickets, and yesterday I 
had time to rebuild Solr and check for this issue. I didn't see it yesterday, 
but it is right in front of me again, using a fresh build from January 6th.

Leader has Num Docs: 379659
Replica has Num Docs: 379661
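
For reference, one way to compare per-core counts like these is to query each 
core directly with distrib=false so the request doesn't fan out; a SolrJ sketch 
(the core URLs below are hypothetical):

{code:java}
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;

public class ReplicaCountCheck {
  public static void main(String[] args) throws Exception {
    // Hypothetical core URLs for one shard's leader and replica.
    String[] cores = {
      "http://192.168.20.102:8983/solr/collection1_shard1_replica1",
      "http://192.168.20.104:8983/solr/collection1_shard1_replica2"
    };
    for (String url : cores) {
      HttpSolrServer server = new HttpSolrServer(url);
      SolrQuery q = new SolrQuery("*:*");
      q.setRows(0);                 // only the count is needed
      q.set("distrib", "false");    // ask only this core, not the collection
      System.out.println(url + " numFound=" +
          server.query(q).getResults().getNumFound());
      server.shutdown();
    }
  }
}
{code}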

> Inconsistent numDocs between leader and replica
> ---
>
> Key: SOLR-4260
> URL: https://issues.apache.org/jira/browse/SOLR-4260
> Project: Solr
>  Issue Type: Bug
>  Components: SolrCloud
> Environment: 5.0.0.2013.01.04.15.31.51
>Reporter: Markus Jelsma
>Assignee: Mark Miller
>Priority: Critical
> Fix For: 5.0, 4.7
>
> Attachments: 192.168.20.102-replica1.png, 
> 192.168.20.104-replica2.png, clusterstate.png
>
>
> After wiping all cores and reindexing some 3.3 million docs from Nutch using 
> CloudSolrServer, we see inconsistencies between the leader and replica for 
> some shards.
> Each core holds about 3.3k documents. For some reason 5 out of 10 shards have 
> a small deviation in the number of documents. The leader and replica deviate 
> by roughly 10-20 documents, not more.
> Results hopping ranks in the result set for identical queries got my 
> attention: there were small IDF differences for exactly the same record, 
> causing it to shift positions in the result set. During those tests no 
> records were indexed. Consecutive catch-all queries also return different 
> numDocs.
> We're running a 10-node test cluster with 10 shards and a replication factor 
> of two, and frequently reindex using a fresh build from trunk. I hadn't seen 
> this issue for quite some time until a few days ago.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5288) Delta import is calling applyTransformer() during deltaQuery and causing ScriptException

2014-01-07 Thread Daniele Baldi (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13864057#comment-13864057
 ] 

Daniele Baldi commented on SOLR-5288:
-

Hi,
I found this error while experimenting with delta import using TemplateTransformer:

 WARN   : TemplateTransformer : Unable to resolve variable:  while parsing expression: ${}

This warning is logged because Solr tries to apply transformers during the 
deltaQuery, too. I also think transformation is not required for the deltaQuery. 

Thanks
Daniele

> Delta import is calling applyTransformer() during deltaQuery and causing 
> ScriptException
> 
>
> Key: SOLR-5288
> URL: https://issues.apache.org/jira/browse/SOLR-5288
> Project: Solr
>  Issue Type: Bug
>  Components: contrib - DataImportHandler
>Affects Versions: 4.4
>Reporter: Balaji Manoharan
>Priority: Critical
>
> While experimenting with delta import, I was getting a ScriptException such 
> as 'toString()' is not found on null.
> These are the queries that I am using:
> a) Query > SELECT PK_FIELD, JOIN_DATE, USER_NAME FROM USERS
> b) Delta Query > SELECT PK_FIELD FROM USERS WHERE LAST_MODIFIED_DATE > 
> '${dih.last_index_time}'
> c) Delta Import Query > SELECT PK_FIELD, JOIN_DATE, USER_NAME FROM USERS 
> WHERE PK_FIELD = '${dih.delta.PK_FIELD}'
> I have a script transformer as below:
> function dynamicData(){
>   var joinDt = row.get('JOIN_DATE');
>   var dtDisplay = joinDt.toString();  // e.g. to show that I am not doing a 
> null check, since JOIN_DATE is a not-null field
>   ...
>   ...
>   return row;
> }
> 
> ...
> 
> Problem: While performing the delta import, I was getting an exception from 
> the Rhino engine on the script line 'joinDt.toString()'.
> The exception trace is as follows:
> Caused by: javax.script.ScriptException: 
> sun.org.mozilla.javascript.internal.EcmaError: TypeError: Cannot call method 
> "toString" of null (#4) in  at line number 4
> at com.sun.script.javascript.RhinoScriptEngine.invoke(RhinoScriptEngine.java:300)
> at com.sun.script.javascript.RhinoScriptEngine.invokeFunction(RhinoScriptEngine.java:258)
> at org.apache.solr.handler.dataimport.ScriptTransformer.transformRow(ScriptTransformer.java:56)
> ... 8 more
> Root Cause: Since I know JOIN_DATE cannot be null, I explored the Solr source 
> code and noticed that applyTransformer() is called during the deltaQuery, and 
> at that time JOIN_DATE is not available.
> Reference: EntityProcessorWrapper.nextModifiedRowKey()
> I think transformation is not required for the deltaQuery, since it is mainly 
> designed to retrieve the primary keys of the modified rows. Further, the 
> output of the deltaQuery is used only in another SQL query.
> Workaround:
> I just added a null check, as below:
> function dynamicData(){
>   var joinDt = row.get('JOIN_DATE');
>   if(joinDt == null){
>   return row;
>   }
>   ...
>   ...
>   return row;
> }
> I don't have much knowledge about Solr, so my suggestion could be invalid 
> when looking at the main use cases. Please validate my comments.
> Thanks
> Balaji
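
For completeness, the same guard can live in a Java DIH transformer instead of 
the script; a minimal sketch (the class name and the derived column are 
assumptions, not part of the report):

{code:java}
import java.util.Map;

import org.apache.solr.handler.dataimport.Context;
import org.apache.solr.handler.dataimport.Transformer;

public class NullSafeDateTransformer extends Transformer {
  @Override
  public Object transformRow(Map<String, Object> row, Context context) {
    Object joinDate = row.get("JOIN_DATE");
    // During the deltaQuery only the primary-key columns are present,
    // so bail out instead of dereferencing a missing column.
    if (joinDate == null) {
      return row;
    }
    row.put("JOIN_DATE_DISPLAY", joinDate.toString());
    return row;
  }
}
{code}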



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (SOLR-5613) Upgrade Apache Commons Codec to version 1.9 in order to improve performance of BeiderMorseFilter

2014-01-07 Thread Thomas Champagne (JIRA)
Thomas Champagne created SOLR-5613:
--

 Summary: Upgrade Apache Commons Codec to version 1.9 in order to 
improve performance of BeiderMorseFilter
 Key: SOLR-5613
 URL: https://issues.apache.org/jira/browse/SOLR-5613
 Project: Solr
  Issue Type: Improvement
  Components: Rules, Schema and Analysis, search
Affects Versions: 4.6, 4.5.1, 4.5, 4.4, 4.3.1, 4.3, 4.2.1, 4.2, 4.1, 4.0, 
3.6.2, 3.6.1, 3.6
Reporter: Thomas Champagne


In version 1.9 of the commons-codec project there are a lot of optimizations in 
the Beider-Morse encoder, which is used by the BeiderMorseFilter in Solr. 
Do you think it is possible to upgrade this dependency?
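
For what it's worth, a rough micro-benchmark sketch for gauging the encoder 
cost before and after the dependency upgrade (the sample names and iteration 
counts are arbitrary):

{code:java}
import org.apache.commons.codec.language.bm.BeiderMorseEncoder;

public class BeiderMorseBench {
  public static void main(String[] args) throws Exception {
    BeiderMorseEncoder encoder = new BeiderMorseEncoder();
    String[] samples = {"Champagne", "Washington", "Miller"};

    // Warm up the JIT before timing.
    for (int i = 0; i < 1000; i++) {
      for (String s : samples) encoder.encode(s);
    }

    long start = System.nanoTime();
    for (int i = 0; i < 10000; i++) {
      for (String s : samples) encoder.encode(s);
    }
    long elapsedMs = (System.nanoTime() - start) / 1000000L;

    // Run once against commons-codec 1.7 and once against 1.9 to compare.
    System.out.println("30k encodes took " + elapsedMs + " ms");
  }
}
{code}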



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org