[jira] [Commented] (SOLR-1632) Distributed IDF
[ https://issues.apache.org/jira/browse/SOLR-1632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13843025#comment-13843025 ] Markus Jelsma commented on SOLR-1632: - It is much faster now, even usable. But I haven't tried it in a larger cluster yet. Distributed IDF --- Key: SOLR-1632 URL: https://issues.apache.org/jira/browse/SOLR-1632 Project: Solr Issue Type: New Feature Components: search Affects Versions: 1.5 Reporter: Andrzej Bialecki Assignee: Mark Miller Fix For: 5.0, 4.7 Attachments: 3x_SOLR-1632_doesntwork.patch, SOLR-1632.patch, SOLR-1632.patch, SOLR-1632.patch, SOLR-1632.patch, SOLR-1632.patch, SOLR-1632.patch, SOLR-1632.patch, SOLR-1632.patch, SOLR-1632.patch, distrib-2.patch, distrib.patch Distributed IDF is a valuable enhancement for distributed search across non-uniform shards. This issue tracks the proposed implementation of an API to support this functionality in Solr. -- This message was sent by Atlassian JIRA (v6.1.4#6159) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
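The idea behind distributed IDF, in brief: with non-uniform shards, each shard's local docFreq skews scoring, so per-shard term statistics have to be merged into collection-wide counts before IDF is computed. A minimal sketch of that aggregation follows; it illustrates the concept only, not the API in the attached patches, and the numbers are invented for the example.

    // Illustration only: merge per-shard term statistics so every shard scores
    // with the same collection-wide IDF. Numbers are made up.
    long[] shardDocFreq = {120, 3, 4500};        // docFreq(term) reported by each shard
    long[] shardMaxDoc  = {10000, 500, 250000};  // maxDoc reported by each shard

    long globalDocFreq = 0, globalMaxDoc = 0;
    for (int i = 0; i < shardDocFreq.length; i++) {
        globalDocFreq += shardDocFreq[i];
        globalMaxDoc  += shardMaxDoc[i];
    }
    // Lucene's classic TF-IDF formula applied to the merged counts:
    double globalIdf = Math.log(globalMaxDoc / (double) (globalDocFreq + 1)) + 1.0;

With that value distributed back to the shards, a rare term no longer scores differently depending on which shard happens to hold the matching documents.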
[jira] [Assigned] (SOLR-3191) field exclusion from fl
[ https://issues.apache.org/jira/browse/SOLR-3191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shalin Shekhar Mangar reassigned SOLR-3191: --- Assignee: (was: Shalin Shekhar Mangar) I don't have time right now to review this. I assigned it to myself because there was a lot of public interest but no assignee. However it looks like a couple of other committers have interest in this issue as well. I can only look at this after a few weeks so if no one takes it up, then I will. field exclusion from fl --- Key: SOLR-3191 URL: https://issues.apache.org/jira/browse/SOLR-3191 Project: Solr Issue Type: Improvement Reporter: Luca Cavanna Priority: Minor Attachments: SOLR-3191.patch, SOLR-3191.patch I think it would be useful to add a way to exclude field from the Solr response. If I have for example 100 stored fields and I want to return all of them but one, it would be handy to list just the field I want to exclude instead of the 99 fields for inclusion through fl. -- This message was sent by Atlassian JIRA (v6.1.4#6159) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
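To make the request concrete: today one has to spell out every wanted field, e.g. fl=f1,f2,f3,...,f99, whereas the improvement would let the request name only the excluded field. The exclusion syntax is not settled in this issue; something along the lines of a hypothetical fl=*,-f100 is what is being asked for.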
[jira] [Commented] (SOLR-5525) deprecate ClusterState#getCollectionStates()
[ https://issues.apache.org/jira/browse/SOLR-5525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13843070#comment-13843070 ] ASF subversion and git services commented on SOLR-5525: --- Commit 1549552 from [~noble.paul] in branch 'dev/trunk' [ https://svn.apache.org/r1549552 ] SOLR-5525 deprecate ClusterState#getCollectionStates() - Key: SOLR-5525 URL: https://issues.apache.org/jira/browse/SOLR-5525 Project: Solr Issue Type: Sub-task Components: SolrCloud Reporter: Noble Paul Assignee: Noble Paul Attachments: SOLR-5525.patch, SOLR-5525.patch This is a very expensive call if there are a large number of collections. Mostly, it is used to check if a collection exists. -- This message was sent by Atlassian JIRA (v6.1.4#6159) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5525) deprecate ClusterState#getCollectionStates()
[ https://issues.apache.org/jira/browse/SOLR-5525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13843074#comment-13843074 ] ASF subversion and git services commented on SOLR-5525: --- Commit 1549554 from [~noble.paul] in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1549554 ] SOLR-5525 deprecate ClusterState#getCollectionStates() - Key: SOLR-5525 URL: https://issues.apache.org/jira/browse/SOLR-5525 Project: Solr Issue Type: Sub-task Components: SolrCloud Reporter: Noble Paul Assignee: Noble Paul Attachments: SOLR-5525.patch, SOLR-5525.patch This is a very expensive call if there are a large number of collections. Mostly, it is used to check if a collection exists. -- This message was sent by Atlassian JIRA (v6.1.4#6159) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-4386) Variable expansion doesn't work in DIH SimplePropertiesWriter's filename
[ https://issues.apache.org/jira/browse/SOLR-4386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13843099#comment-13843099 ] Ryuzo Yamamoto commented on SOLR-4386: -- Hi! Do you have a plan to fix this? I also want to use variable expansion in SimplePropertiesWriter's filename. Variable expansion doesn't work in DIH SimplePropertiesWriter's filename Key: SOLR-4386 URL: https://issues.apache.org/jira/browse/SOLR-4386 Project: Solr Issue Type: Bug Components: contrib - DataImportHandler Affects Versions: 4.1 Reporter: Jonas Birgander Assignee: Shalin Shekhar Mangar Labels: dataimport Attachments: SOLR-4386.patch I'm testing Solr 4.1, but I've run into some problems with DataImportHandler's new propertyWriter tag. I'm trying to use variable expansion in the `filename` field when using SimplePropertiesWriter. Here are the relevant parts of my configuration:

conf/solrconfig.xml
-------------------
<requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">db-data-config.xml</str>
  </lst>
  <lst name="invariants">
    <!-- country_code is available -->
    <str name="country_code">${country_code}</str>
    <!-- In the real config, more variables are set here -->
  </lst>
</requestHandler>

conf/db-data-config.xml
-----------------------
<dataConfig>
  <propertyWriter dateFormat="yyyy-MM-dd HH:mm:ss" type="SimplePropertiesWriter" directory="conf"
                  filename="${dataimporter.request.country_code}.dataimport.properties" />
  <dataSource type="JdbcDataSource" driver="${dataimporter.request.db_driver}" url="${dataimporter.request.db_url}"
              user="${dataimporter.request.db_user}" password="${dataimporter.request.db_password}"
              batchSize="${dataimporter.request.db_batch_size}" />
  <document>
    <entity name="item" query="my normal SQL, not really relevant -- country=${dataimporter.request.country_code}">
      <field column="id"/>
      <!-- ...more field tags... -->
      <field column="$deleteDocById"/>
      <field column="$skipDoc"/>
    </entity>
  </document>
</dataConfig>

If country_code is set to gb, I want the last_index_time to be read and written in the file conf/gb.dataimport.properties, instead of the default conf/dataimport.properties. The variable expansion works perfectly in the SQL and setup of the data source, but not in the property writer's filename field. When initiating an import, the log file shows:

Jan 30, 2013 11:25:42 AM org.apache.solr.handler.dataimport.DataImporter maybeReloadConfiguration
INFO: Loading DIH Configuration: db-data-config.xml
Jan 30, 2013 11:25:42 AM org.apache.solr.handler.dataimport.config.ConfigParseUtil verifyWithSchema
INFO: The field :$skipDoc present in DataConfig does not have a counterpart in Solr Schema
Jan 30, 2013 11:25:42 AM org.apache.solr.handler.dataimport.config.ConfigParseUtil verifyWithSchema
INFO: The field :$deleteDocById present in DataConfig does not have a counterpart in Solr Schema
Jan 30, 2013 11:25:42 AM org.apache.solr.handler.dataimport.DataImporter loadDataConfig
INFO: Data Configuration loaded successfully
Jan 30, 2013 11:25:42 AM org.apache.solr.handler.dataimport.DataImporter doFullImport
INFO: Starting Full Import
Jan 30, 2013 11:25:42 AM org.apache.solr.handler.dataimport.SimplePropertiesWriter readIndexerProperties
WARNING: Unable to read: ${dataimporter.request.country_code}.dataimport.properties

-- This message was sent by Atlassian JIRA (v6.1.4#6159) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
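What the reporter expects, expressed as a standalone sketch (this is not DIH's internal code, only an illustration of the substitution the filename attribute should receive from the request parameters):

    import java.util.HashMap;
    import java.util.Map;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class FilenameExpansionSketch {
        // Replace ${dataimporter.request.<param>} placeholders with request parameter values.
        static String expand(String template, Map<String, String> requestParams) {
            Matcher m = Pattern.compile("\\$\\{dataimporter\\.request\\.([^}]+)\\}").matcher(template);
            StringBuffer out = new StringBuffer();
            while (m.find()) {
                String value = requestParams.containsKey(m.group(1)) ? requestParams.get(m.group(1)) : m.group(0);
                m.appendReplacement(out, Matcher.quoteReplacement(value));
            }
            m.appendTail(out);
            return out.toString();
        }

        public static void main(String[] args) {
            Map<String, String> params = new HashMap<String, String>();
            params.put("country_code", "gb");
            // Prints gb.dataimport.properties -- the file the writer should then use under conf/
            System.out.println(expand("${dataimporter.request.country_code}.dataimport.properties", params));
        }
    }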
[jira] [Commented] (SOLR-5525) deprecate ClusterState#getCollectionStates()
[ https://issues.apache.org/jira/browse/SOLR-5525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13843217#comment-13843217 ] ASF subversion and git services commented on SOLR-5525: --- Commit 1549591 from [~noble.paul] in branch 'dev/trunk' [ https://svn.apache.org/r1549591 ] SOLR-5525 deprecate ClusterState#getCollectionStates() - Key: SOLR-5525 URL: https://issues.apache.org/jira/browse/SOLR-5525 Project: Solr Issue Type: Sub-task Components: SolrCloud Reporter: Noble Paul Assignee: Noble Paul Attachments: SOLR-5525.patch, SOLR-5525.patch This is a very expensive call if there are a large number of collections. Mostly, it is used to check if a collection exists. -- This message was sent by Atlassian JIRA (v6.1.4#6159) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5525) deprecate ClusterState#getCollectionStates()
[ https://issues.apache.org/jira/browse/SOLR-5525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13843220#comment-13843220 ] ASF subversion and git services commented on SOLR-5525: --- Commit 1549592 from [~noble.paul] in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1549592 ] SOLR-5525 deprecate ClusterState#getCollectionStates() - Key: SOLR-5525 URL: https://issues.apache.org/jira/browse/SOLR-5525 Project: Solr Issue Type: Sub-task Components: SolrCloud Reporter: Noble Paul Assignee: Noble Paul Attachments: SOLR-5525.patch, SOLR-5525.patch This is a very expensive call if there are a large number of collections. Mostly, it is used to check if a collection exists. -- This message was sent by Atlassian JIRA (v6.1.4#6159) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (SOLR-5525) deprecate ClusterState#getCollectionStates()
[ https://issues.apache.org/jira/browse/SOLR-5525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Noble Paul resolved SOLR-5525. -- Resolution: Fixed deprecate ClusterState#getCollectionStates() - Key: SOLR-5525 URL: https://issues.apache.org/jira/browse/SOLR-5525 Project: Solr Issue Type: Sub-task Components: SolrCloud Reporter: Noble Paul Assignee: Noble Paul Attachments: SOLR-5525.patch, SOLR-5525.patch This is a very expensive call if there are a large number of collections. Mostly, it is used to check if a collection exists. -- This message was sent by Atlassian JIRA (v6.1.4#6159) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
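For callers, the practical upshot is to avoid materialising every collection's state when only an existence test is needed. A hedged sketch follows; the method names match what ClusterState exposes in recent versions, but verify them against the release you are on.

    // Sketch: given a ZkStateReader zkStateReader, test for one collection directly
    // instead of iterating the result of the deprecated getCollectionStates().
    ClusterState clusterState = zkStateReader.getClusterState();
    if (clusterState.hasCollection("mycollection")) {
        DocCollection coll = clusterState.getCollection("mycollection"); // state of just this collection
        // ... use coll ...
    }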
[jira] [Updated] (SOLR-5473) Make one state.json per collection
[ https://issues.apache.org/jira/browse/SOLR-5473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Noble Paul updated SOLR-5473: - Attachment: SOLR-5473.patch a couple of tests fail Make one state.json per collection -- Key: SOLR-5473 URL: https://issues.apache.org/jira/browse/SOLR-5473 Project: Solr Issue Type: Sub-task Components: SolrCloud Reporter: Noble Paul Assignee: Noble Paul Attachments: SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch As defined in the parent issue, store the states of each collection under /collections/collectionname/state.json node -- This message was sent by Atlassian JIRA (v6.1.4#6159) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
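For orientation, the layout change this sub-task describes, relative to the single shared znode used today:

    before: /clusterstate.json                        (one znode holding the state of every collection)
    after:  /collections/<collectionname>/state.json  (one znode per collection, as per the parent issue)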
[jira] [Commented] (SOLR-1301) Add a Solr contrib that allows for building Solr indexes via Hadoop's Map-Reduce.
[ https://issues.apache.org/jira/browse/SOLR-1301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13843343#comment-13843343 ] Mark Miller commented on SOLR-1301: --- bq. if we need some of the classes this jar provides, we should declare direct dependencies on the appropriate artifacts. Right - Wolfgang likely knows best when it comes to Morphlines.. At a minimum we should pull the necessary jars in explicitly I think. I've got to take a look at what they are. Add a Solr contrib that allows for building Solr indexes via Hadoop's Map-Reduce. - Key: SOLR-1301 URL: https://issues.apache.org/jira/browse/SOLR-1301 Project: Solr Issue Type: New Feature Reporter: Andrzej Bialecki Assignee: Mark Miller Fix For: 5.0, 4.7 Attachments: README.txt, SOLR-1301-hadoop-0-20.patch, SOLR-1301-hadoop-0-20.patch, SOLR-1301-maven-intellij.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SolrRecordWriter.java, commons-logging-1.0.4.jar, commons-logging-api-1.0.4.jar, hadoop-0.19.1-core.jar, hadoop-0.20.1-core.jar, hadoop-core-0.20.2-cdh3u3.jar, hadoop.patch, log4j-1.2.15.jar This patch contains a contrib module that provides distributed indexing (using Hadoop) to Solr EmbeddedSolrServer. The idea behind this module is twofold: * provide an API that is familiar to Hadoop developers, i.e. that of OutputFormat * avoid unnecessary export and (de)serialization of data maintained on HDFS. SolrOutputFormat consumes data produced by reduce tasks directly, without storing it in intermediate files. Furthermore, by using an EmbeddedSolrServer, the indexing task is split into as many parts as there are reducers, and the data to be indexed is not sent over the network. Design -- Key/value pairs produced by reduce tasks are passed to SolrOutputFormat, which in turn uses SolrRecordWriter to write this data. SolrRecordWriter instantiates an EmbeddedSolrServer, and it also instantiates an implementation of SolrDocumentConverter, which is responsible for turning Hadoop (key, value) into a SolrInputDocument. This data is then added to a batch, which is periodically submitted to EmbeddedSolrServer. When reduce task completes, and the OutputFormat is closed, SolrRecordWriter calls commit() and optimize() on the EmbeddedSolrServer. The API provides facilities to specify an arbitrary existing solr.home directory, from which the conf/ and lib/ files will be taken. This process results in the creation of as many partial Solr home directories as there were reduce tasks. The output shards are placed in the output directory on the default filesystem (e.g. HDFS). Such part-N directories can be used to run N shard servers. Additionally, users can specify the number of reduce tasks, in particular 1 reduce task, in which case the output will consist of a single shard. An example application is provided that processes large CSV files and uses this API. It uses a custom CSV processing to avoid (de)serialization overhead. This patch relies on hadoop-core-0.19.1.jar - I attached the jar to this issue, you should put it in contrib/hadoop/lib. Note: the development of this patch was sponsored by an anonymous contributor and approved for release under Apache License. 
-- This message was sent by Atlassian JIRA (v6.1.4#6159) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
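To make the Design section above concrete, here is a hedged sketch of the converter role it describes, turning a Hadoop (key, value) pair into a SolrInputDocument. The class, method name and generics are assumptions for illustration; the SolrDocumentConverter interface in the attached patches may look different.

    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.solr.common.SolrInputDocument;

    public class CsvDocumentConverter {
        // One Hadoop record (file offset, CSV line) becomes one Solr document.
        public SolrInputDocument convert(LongWritable key, Text value) {
            String[] cols = value.toString().split(",");
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", cols[0]);
            if (cols.length > 1) {
                doc.addField("name_s", cols[1]);
            }
            return doc;
        }
    }

SolrRecordWriter would batch documents produced this way and feed them to the EmbeddedSolrServer described above, so the indexing work stays inside each reduce task.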
[jira] [Commented] (SOLR-5467) Provide Solr Ref Guide in .epub format
[ https://issues.apache.org/jira/browse/SOLR-5467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13843413#comment-13843413 ] Hoss Man commented on SOLR-5467: Thread where this initially came up: https://mail-archives.apache.org/mod_mbox/lucene-solr-user/201311.mbox/%3c528a1321.4060...@hebis.uni-frankfurt.de%3E Provide Solr Ref Guide in .epub format -- Key: SOLR-5467 URL: https://issues.apache.org/jira/browse/SOLR-5467 Project: Solr Issue Type: Wish Components: documentation Reporter: Cassandra Targett From the solr-user list, a request for an .epub version of the Solr Ref Guide. There are two possible approaches that immediately come to mind: * Ask infra to install a plugin that automatically outputs the Confluence pages in .epub * Investigate converting HTML export to .epub with something like calibre There might be other options, and there would be additional issues for automating the process of creation and publication, so for now just recording the request with an issue. -- This message was sent by Atlassian JIRA (v6.1.4#6159) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-1301) Add a Solr contrib that allows for building Solr indexes via Hadoop's Map-Reduce.
[ https://issues.apache.org/jira/browse/SOLR-1301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13843443#comment-13843443 ] wolfgang hoschek commented on SOLR-1301: I'm not aware of anything needing jersey except perhaps hadoop pulls that in. The combined dependencies of all morphline modules is here: http://cloudera.github.io/cdk/docs/current/cdk-morphlines/cdk-morphlines-all/dependencies.html The dependencies of each individual morphline modules is here: http://cloudera.github.io/cdk/docs/current/cdk-morphlines/cdk-morphlines-all/dependencies.html The source and POMs are here, as usual: https://github.com/cloudera/cdk/tree/master/cdk-morphlines By the way, a somewhat separate issue is that it seems to me that the ivy dependences for solr-morphlines-core and solr-morphlines-cell and solr-map-reduce are a bit backwards upstream in that solr-morphlines-core pulls in a ton of dependencies that it doesn't need, and those deps should rather be pulled in by the solr-map-reduce (which is a essentially an out-of-the-box app). Would be good to organize ivy and mvn upstream in such a way that * solr-map-reduce should depend on solr-morphlines-cell plus cdk-morphlines-all plus xyz * solr-morphlines-cell should depend on solr-morphlines-core plus xyz * solr-morphlines-core should depend on cdk-morphlines-core plus xyz More concretely, FWIW, to see how the deps look like in production releases downstream review the following POMs: https://github.com/cloudera/cdk/blob/master/cdk-morphlines/cdk-morphlines-solr-core/pom.xml and https://github.com/cloudera/cdk/blob/master/cdk-morphlines/cdk-morphlines-solr-cell/pom.xml and https://github.com/cloudera/search/blob/master_1.1.0/search-mr/pom.xml Add a Solr contrib that allows for building Solr indexes via Hadoop's Map-Reduce. - Key: SOLR-1301 URL: https://issues.apache.org/jira/browse/SOLR-1301 Project: Solr Issue Type: New Feature Reporter: Andrzej Bialecki Assignee: Mark Miller Fix For: 5.0, 4.7 Attachments: README.txt, SOLR-1301-hadoop-0-20.patch, SOLR-1301-hadoop-0-20.patch, SOLR-1301-maven-intellij.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SolrRecordWriter.java, commons-logging-1.0.4.jar, commons-logging-api-1.0.4.jar, hadoop-0.19.1-core.jar, hadoop-0.20.1-core.jar, hadoop-core-0.20.2-cdh3u3.jar, hadoop.patch, log4j-1.2.15.jar This patch contains a contrib module that provides distributed indexing (using Hadoop) to Solr EmbeddedSolrServer. The idea behind this module is twofold: * provide an API that is familiar to Hadoop developers, i.e. that of OutputFormat * avoid unnecessary export and (de)serialization of data maintained on HDFS. SolrOutputFormat consumes data produced by reduce tasks directly, without storing it in intermediate files. Furthermore, by using an EmbeddedSolrServer, the indexing task is split into as many parts as there are reducers, and the data to be indexed is not sent over the network. Design -- Key/value pairs produced by reduce tasks are passed to SolrOutputFormat, which in turn uses SolrRecordWriter to write this data. SolrRecordWriter instantiates an EmbeddedSolrServer, and it also instantiates an implementation of SolrDocumentConverter, which is responsible for turning Hadoop (key, value) into a SolrInputDocument. 
This data is then added to a batch, which is periodically submitted to EmbeddedSolrServer. When reduce task completes, and the OutputFormat is closed, SolrRecordWriter calls commit() and optimize() on the EmbeddedSolrServer. The API provides facilities to specify an arbitrary existing solr.home directory, from which the conf/ and lib/ files will be taken. This process results in the creation of as many partial Solr home directories as there were reduce tasks. The output shards are placed in the output directory on the default filesystem (e.g. HDFS). Such part-N directories can be used to run N shard servers. Additionally, users can specify the number of reduce tasks, in particular 1 reduce task, in which case the output will consist of a single shard. An example application is provided that processes large CSV files and uses this API. It uses a custom CSV processing to avoid (de)serialization overhead. This patch relies on hadoop-core-0.19.1.jar - I attached the jar to this issue, you should put it in contrib/hadoop/lib. Note: the development of this patch was sponsored by an
[jira] [Comment Edited] (SOLR-1301) Add a Solr contrib that allows for building Solr indexes via Hadoop's Map-Reduce.
[ https://issues.apache.org/jira/browse/SOLR-1301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13843443#comment-13843443 ] wolfgang hoschek edited comment on SOLR-1301 at 12/9/13 7:30 PM: - I'm not aware of anything needing jersey except perhaps hadoop pulls that in. The combined dependencies of all morphline modules is here: http://cloudera.github.io/cdk/docs/current/cdk-morphlines/cdk-morphlines-all/dependencies.html The dependencies of each individual morphline modules is here: http://cloudera.github.io/cdk/docs/current/dependencies.html The source and POMs are here, as usual: https://github.com/cloudera/cdk/tree/master/cdk-morphlines By the way, a somewhat separate issue is that it seems to me that the ivy dependences for solr-morphlines-core and solr-morphlines-cell and solr-map-reduce are a bit backwards upstream in that currently solr-morphlines-core pulls in a ton of dependencies that it doesn't need, and those deps should rather be pulled in by the solr-map-reduce (which is a essentially an out-of-the-box app that bundles user level deps). Correspondingly, would be good to organize ivy and mvn upstream in such a way that * solr-map-reduce should depend on solr-morphlines-cell plus cdk-morphlines-all minus cdk-morphlines-solr-cell (now upstream) minus cdk-morphlines-solr-core (now upstream) plus xyz * solr-morphlines-cell should depend on solr-morphlines-core plus xyz * solr-morphlines-core should depend on cdk-morphlines-core plus xyz More concretely, FWIW, to see how the deps look like in production releases downstream review the following POMs: https://github.com/cloudera/cdk/blob/master/cdk-morphlines/cdk-morphlines-solr-core/pom.xml and https://github.com/cloudera/cdk/blob/master/cdk-morphlines/cdk-morphlines-solr-cell/pom.xml and https://github.com/cloudera/search/blob/master_1.1.0/search-mr/pom.xml was (Author: whoschek): I'm not aware of anything needing jersey except perhaps hadoop pulls that in. The combined dependencies of all morphline modules is here: http://cloudera.github.io/cdk/docs/current/cdk-morphlines/cdk-morphlines-all/dependencies.html The dependencies of each individual morphline modules is here: http://cloudera.github.io/cdk/docs/current/cdk-morphlines/cdk-morphlines-all/dependencies.html The source and POMs are here, as usual: https://github.com/cloudera/cdk/tree/master/cdk-morphlines By the way, a somewhat separate issue is that it seems to me that the ivy dependences for solr-morphlines-core and solr-morphlines-cell and solr-map-reduce are a bit backwards upstream in that solr-morphlines-core pulls in a ton of dependencies that it doesn't need, and those deps should rather be pulled in by the solr-map-reduce (which is a essentially an out-of-the-box app). Would be good to organize ivy and mvn upstream in such a way that * solr-map-reduce should depend on solr-morphlines-cell plus cdk-morphlines-all plus xyz * solr-morphlines-cell should depend on solr-morphlines-core plus xyz * solr-morphlines-core should depend on cdk-morphlines-core plus xyz More concretely, FWIW, to see how the deps look like in production releases downstream review the following POMs: https://github.com/cloudera/cdk/blob/master/cdk-morphlines/cdk-morphlines-solr-core/pom.xml and https://github.com/cloudera/cdk/blob/master/cdk-morphlines/cdk-morphlines-solr-cell/pom.xml and https://github.com/cloudera/search/blob/master_1.1.0/search-mr/pom.xml Add a Solr contrib that allows for building Solr indexes via Hadoop's Map-Reduce. 
- Key: SOLR-1301 URL: https://issues.apache.org/jira/browse/SOLR-1301 Project: Solr Issue Type: New Feature Reporter: Andrzej Bialecki Assignee: Mark Miller Fix For: 5.0, 4.7 Attachments: README.txt, SOLR-1301-hadoop-0-20.patch, SOLR-1301-hadoop-0-20.patch, SOLR-1301-maven-intellij.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SolrRecordWriter.java, commons-logging-1.0.4.jar, commons-logging-api-1.0.4.jar, hadoop-0.19.1-core.jar, hadoop-0.20.1-core.jar, hadoop-core-0.20.2-cdh3u3.jar, hadoop.patch, log4j-1.2.15.jar This patch contains a contrib module that provides distributed indexing (using Hadoop) to Solr EmbeddedSolrServer. The idea behind this module is twofold: * provide an API that is familiar to Hadoop developers, i.e. that of OutputFormat * avoid unnecessary export and (de)serialization of data maintained on HDFS.
[jira] [Commented] (SOLR-5473) Make one state.json per collection
[ https://issues.apache.org/jira/browse/SOLR-5473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13843462#comment-13843462 ] Mark Miller commented on SOLR-5473: --- bq. if(debugState Best to do that with debug logging level rather than introduce a debug sys prop for this class. Make one state.json per collection -- Key: SOLR-5473 URL: https://issues.apache.org/jira/browse/SOLR-5473 Project: Solr Issue Type: Sub-task Components: SolrCloud Reporter: Noble Paul Assignee: Noble Paul Attachments: SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch As defined in the parent issue, store the states of each collection under /collections/collectionname/state.json node -- This message was sent by Atlassian JIRA (v6.1.4#6159) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
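For reference, "debug logging level" here means the standard SLF4J pattern used throughout Solr, gated by the logging configuration rather than a one-off system property; the class name below is only illustrative.

    import org.slf4j.Logger;
    import org.slf4j.LoggerFactory;

    class StateDebugLoggingSketch {                    // illustrative class, not the one in the patch
        private static final Logger log = LoggerFactory.getLogger(StateDebugLoggingSketch.class);

        void publishState(Object clusterState) {
            if (log.isDebugEnabled()) {                // enabled via log config, no sys prop needed
                log.debug("cluster state updated: {}", clusterState);
            }
        }
    }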
[jira] [Created] (SOLR-5542) Global query parameters to facet queries
Isaac Hebsh created SOLR-5542: - Summary: Global query parameters to facet queries Key: SOLR-5542 URL: https://issues.apache.org/jira/browse/SOLR-5542 Project: Solr Issue Type: Improvement Components: search Affects Versions: 4.6 Reporter: Isaac Hebsh (From the Mailing List) It seems that a facet query does not use the global query parameters (for example, field aliasing for the edismax parser). We make intensive use of facet queries (in some cases, we have a lot of facet.query parameters for a single q), and using LocalParams for each facet.query is not convenient. -- This message was sent by Atlassian JIRA (v6.1.4#6159) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
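For reference, the LocalParams workaround mentioned above looks roughly like this (field and query values are illustrative): each facet.query has to repeat the parser and its parameters, e.g. facet.query={!edismax qf='title_t author_t'}smith, even when q already runs through edismax with the same settings globally. That per-facet repetition is what becomes unwieldy when a single request carries many facet.query parameters.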
[jira] [Commented] (SOLR-1301) Add a Solr contrib that allows for building Solr indexes via Hadoop's Map-Reduce.
[ https://issues.apache.org/jira/browse/SOLR-1301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13843496#comment-13843496 ] Steve Rowe commented on SOLR-1301: -- [~whoschek], I'm lost: what do you mean by upstream/downstream? In my experience, upstream refers to a parent project, i.e. one from which the project in question is derived, and downstream is the child/derived project. I don't know the history here, but you seem to be referring to the solr contribs when you say upstream? If that's true, then my understanding of these terms is the opposite of how you're using them. Maybe the question I should be asking is: what is/are the relationship(s) between/among cdk-morphlines-solr-* and solr-morphlines-*? And (I assume) relatedly, how how does cdk-morphlines-all relate to cdk-morphlines-solr-core/-cell? Add a Solr contrib that allows for building Solr indexes via Hadoop's Map-Reduce. - Key: SOLR-1301 URL: https://issues.apache.org/jira/browse/SOLR-1301 Project: Solr Issue Type: New Feature Reporter: Andrzej Bialecki Assignee: Mark Miller Fix For: 5.0, 4.7 Attachments: README.txt, SOLR-1301-hadoop-0-20.patch, SOLR-1301-hadoop-0-20.patch, SOLR-1301-maven-intellij.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SolrRecordWriter.java, commons-logging-1.0.4.jar, commons-logging-api-1.0.4.jar, hadoop-0.19.1-core.jar, hadoop-0.20.1-core.jar, hadoop-core-0.20.2-cdh3u3.jar, hadoop.patch, log4j-1.2.15.jar This patch contains a contrib module that provides distributed indexing (using Hadoop) to Solr EmbeddedSolrServer. The idea behind this module is twofold: * provide an API that is familiar to Hadoop developers, i.e. that of OutputFormat * avoid unnecessary export and (de)serialization of data maintained on HDFS. SolrOutputFormat consumes data produced by reduce tasks directly, without storing it in intermediate files. Furthermore, by using an EmbeddedSolrServer, the indexing task is split into as many parts as there are reducers, and the data to be indexed is not sent over the network. Design -- Key/value pairs produced by reduce tasks are passed to SolrOutputFormat, which in turn uses SolrRecordWriter to write this data. SolrRecordWriter instantiates an EmbeddedSolrServer, and it also instantiates an implementation of SolrDocumentConverter, which is responsible for turning Hadoop (key, value) into a SolrInputDocument. This data is then added to a batch, which is periodically submitted to EmbeddedSolrServer. When reduce task completes, and the OutputFormat is closed, SolrRecordWriter calls commit() and optimize() on the EmbeddedSolrServer. The API provides facilities to specify an arbitrary existing solr.home directory, from which the conf/ and lib/ files will be taken. This process results in the creation of as many partial Solr home directories as there were reduce tasks. The output shards are placed in the output directory on the default filesystem (e.g. HDFS). Such part-N directories can be used to run N shard servers. Additionally, users can specify the number of reduce tasks, in particular 1 reduce task, in which case the output will consist of a single shard. An example application is provided that processes large CSV files and uses this API. 
It uses a custom CSV processing to avoid (de)serialization overhead. This patch relies on hadoop-core-0.19.1.jar - I attached the jar to this issue, you should put it in contrib/hadoop/lib. Note: the development of this patch was sponsored by an anonymous contributor and approved for release under Apache License. -- This message was sent by Atlassian JIRA (v6.1.4#6159) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-1301) Add a Solr contrib that allows for building Solr indexes via Hadoop's Map-Reduce.
[ https://issues.apache.org/jira/browse/SOLR-1301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13843524#comment-13843524 ] Steve Rowe commented on SOLR-1301: -- bq. And (I assume) relatedly, how how does cdk-morphlines-all relate to cdk-morphlines-solr-core/-cell? I can answer this one myself from [https://github.com/cloudera/cdk/blob/master/cdk-morphlines/cdk-morphlines-all/pom.xml]: it's an aggregation-only modules that depends on all of the cdk-morphlines-* modules. Add a Solr contrib that allows for building Solr indexes via Hadoop's Map-Reduce. - Key: SOLR-1301 URL: https://issues.apache.org/jira/browse/SOLR-1301 Project: Solr Issue Type: New Feature Reporter: Andrzej Bialecki Assignee: Mark Miller Fix For: 5.0, 4.7 Attachments: README.txt, SOLR-1301-hadoop-0-20.patch, SOLR-1301-hadoop-0-20.patch, SOLR-1301-maven-intellij.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SolrRecordWriter.java, commons-logging-1.0.4.jar, commons-logging-api-1.0.4.jar, hadoop-0.19.1-core.jar, hadoop-0.20.1-core.jar, hadoop-core-0.20.2-cdh3u3.jar, hadoop.patch, log4j-1.2.15.jar This patch contains a contrib module that provides distributed indexing (using Hadoop) to Solr EmbeddedSolrServer. The idea behind this module is twofold: * provide an API that is familiar to Hadoop developers, i.e. that of OutputFormat * avoid unnecessary export and (de)serialization of data maintained on HDFS. SolrOutputFormat consumes data produced by reduce tasks directly, without storing it in intermediate files. Furthermore, by using an EmbeddedSolrServer, the indexing task is split into as many parts as there are reducers, and the data to be indexed is not sent over the network. Design -- Key/value pairs produced by reduce tasks are passed to SolrOutputFormat, which in turn uses SolrRecordWriter to write this data. SolrRecordWriter instantiates an EmbeddedSolrServer, and it also instantiates an implementation of SolrDocumentConverter, which is responsible for turning Hadoop (key, value) into a SolrInputDocument. This data is then added to a batch, which is periodically submitted to EmbeddedSolrServer. When reduce task completes, and the OutputFormat is closed, SolrRecordWriter calls commit() and optimize() on the EmbeddedSolrServer. The API provides facilities to specify an arbitrary existing solr.home directory, from which the conf/ and lib/ files will be taken. This process results in the creation of as many partial Solr home directories as there were reduce tasks. The output shards are placed in the output directory on the default filesystem (e.g. HDFS). Such part-N directories can be used to run N shard servers. Additionally, users can specify the number of reduce tasks, in particular 1 reduce task, in which case the output will consist of a single shard. An example application is provided that processes large CSV files and uses this API. It uses a custom CSV processing to avoid (de)serialization overhead. This patch relies on hadoop-core-0.19.1.jar - I attached the jar to this issue, you should put it in contrib/hadoop/lib. Note: the development of this patch was sponsored by an anonymous contributor and approved for release under Apache License. 
-- This message was sent by Atlassian JIRA (v6.1.4#6159) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-1301) Add a Solr contrib that allows for building Solr indexes via Hadoop's Map-Reduce.
[ https://issues.apache.org/jira/browse/SOLR-1301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13843523#comment-13843523 ] wolfgang hoschek commented on SOLR-1301: Apologies for the confusion. We are upstreaming cdk-morphlines-solr-cell into the solr contrib solr-morphlines-cell as well as cdk-morphlines-solr-core into the solr contrib solr-morphlines-core as well as search-mr into the solr contrib solr-map-reduce. Once the upstreaming is done these old modules will go away. Next, downstream will be made identical to upstream plus perhaps some critical fixes as necessary, and the upstream/downstream terms will apply in the way folks usually think about them, but we are not quite yet there today, but getting there... cdk-morphlines-all is simply a convenience pom that includes all the other morphline poms so there's less to type for users who like a bit more auto magic. Add a Solr contrib that allows for building Solr indexes via Hadoop's Map-Reduce. - Key: SOLR-1301 URL: https://issues.apache.org/jira/browse/SOLR-1301 Project: Solr Issue Type: New Feature Reporter: Andrzej Bialecki Assignee: Mark Miller Fix For: 5.0, 4.7 Attachments: README.txt, SOLR-1301-hadoop-0-20.patch, SOLR-1301-hadoop-0-20.patch, SOLR-1301-maven-intellij.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SolrRecordWriter.java, commons-logging-1.0.4.jar, commons-logging-api-1.0.4.jar, hadoop-0.19.1-core.jar, hadoop-0.20.1-core.jar, hadoop-core-0.20.2-cdh3u3.jar, hadoop.patch, log4j-1.2.15.jar This patch contains a contrib module that provides distributed indexing (using Hadoop) to Solr EmbeddedSolrServer. The idea behind this module is twofold: * provide an API that is familiar to Hadoop developers, i.e. that of OutputFormat * avoid unnecessary export and (de)serialization of data maintained on HDFS. SolrOutputFormat consumes data produced by reduce tasks directly, without storing it in intermediate files. Furthermore, by using an EmbeddedSolrServer, the indexing task is split into as many parts as there are reducers, and the data to be indexed is not sent over the network. Design -- Key/value pairs produced by reduce tasks are passed to SolrOutputFormat, which in turn uses SolrRecordWriter to write this data. SolrRecordWriter instantiates an EmbeddedSolrServer, and it also instantiates an implementation of SolrDocumentConverter, which is responsible for turning Hadoop (key, value) into a SolrInputDocument. This data is then added to a batch, which is periodically submitted to EmbeddedSolrServer. When reduce task completes, and the OutputFormat is closed, SolrRecordWriter calls commit() and optimize() on the EmbeddedSolrServer. The API provides facilities to specify an arbitrary existing solr.home directory, from which the conf/ and lib/ files will be taken. This process results in the creation of as many partial Solr home directories as there were reduce tasks. The output shards are placed in the output directory on the default filesystem (e.g. HDFS). Such part-N directories can be used to run N shard servers. Additionally, users can specify the number of reduce tasks, in particular 1 reduce task, in which case the output will consist of a single shard. 
An example application is provided that processes large CSV files and uses this API. It uses a custom CSV processing to avoid (de)serialization overhead. This patch relies on hadoop-core-0.19.1.jar - I attached the jar to this issue, you should put it in contrib/hadoop/lib. Note: the development of this patch was sponsored by an anonymous contributor and approved for release under Apache License. -- This message was sent by Atlassian JIRA (v6.1.4#6159) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Comment Edited] (SOLR-1301) Add a Solr contrib that allows for building Solr indexes via Hadoop's Map-Reduce.
[ https://issues.apache.org/jira/browse/SOLR-1301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13843524#comment-13843524 ] Steve Rowe edited comment on SOLR-1301 at 12/9/13 8:34 PM: --- bq. And (I assume) relatedly, how how does cdk-morphlines-all relate to cdk-morphlines-solr-core/-cell? I can answer this one myself from [https://github.com/cloudera/cdk/blob/master/cdk-morphlines/cdk-morphlines-all/pom.xml]: it's an aggregation-only module that depends on all of the cdk-morphlines-* modules. was (Author: steve_rowe): bq. And (I assume) relatedly, how how does cdk-morphlines-all relate to cdk-morphlines-solr-core/-cell? I can answer this one myself from [https://github.com/cloudera/cdk/blob/master/cdk-morphlines/cdk-morphlines-all/pom.xml]: it's an aggregation-only modules that depends on all of the cdk-morphlines-* modules. Add a Solr contrib that allows for building Solr indexes via Hadoop's Map-Reduce. - Key: SOLR-1301 URL: https://issues.apache.org/jira/browse/SOLR-1301 Project: Solr Issue Type: New Feature Reporter: Andrzej Bialecki Assignee: Mark Miller Fix For: 5.0, 4.7 Attachments: README.txt, SOLR-1301-hadoop-0-20.patch, SOLR-1301-hadoop-0-20.patch, SOLR-1301-maven-intellij.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SolrRecordWriter.java, commons-logging-1.0.4.jar, commons-logging-api-1.0.4.jar, hadoop-0.19.1-core.jar, hadoop-0.20.1-core.jar, hadoop-core-0.20.2-cdh3u3.jar, hadoop.patch, log4j-1.2.15.jar This patch contains a contrib module that provides distributed indexing (using Hadoop) to Solr EmbeddedSolrServer. The idea behind this module is twofold: * provide an API that is familiar to Hadoop developers, i.e. that of OutputFormat * avoid unnecessary export and (de)serialization of data maintained on HDFS. SolrOutputFormat consumes data produced by reduce tasks directly, without storing it in intermediate files. Furthermore, by using an EmbeddedSolrServer, the indexing task is split into as many parts as there are reducers, and the data to be indexed is not sent over the network. Design -- Key/value pairs produced by reduce tasks are passed to SolrOutputFormat, which in turn uses SolrRecordWriter to write this data. SolrRecordWriter instantiates an EmbeddedSolrServer, and it also instantiates an implementation of SolrDocumentConverter, which is responsible for turning Hadoop (key, value) into a SolrInputDocument. This data is then added to a batch, which is periodically submitted to EmbeddedSolrServer. When reduce task completes, and the OutputFormat is closed, SolrRecordWriter calls commit() and optimize() on the EmbeddedSolrServer. The API provides facilities to specify an arbitrary existing solr.home directory, from which the conf/ and lib/ files will be taken. This process results in the creation of as many partial Solr home directories as there were reduce tasks. The output shards are placed in the output directory on the default filesystem (e.g. HDFS). Such part-N directories can be used to run N shard servers. Additionally, users can specify the number of reduce tasks, in particular 1 reduce task, in which case the output will consist of a single shard. An example application is provided that processes large CSV files and uses this API. 
It uses a custom CSV processing to avoid (de)serialization overhead. This patch relies on hadoop-core-0.19.1.jar - I attached the jar to this issue, you should put it in contrib/hadoop/lib. Note: the development of this patch was sponsored by an anonymous contributor and approved for release under Apache License. -- This message was sent by Atlassian JIRA (v6.1.4#6159) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-5541) Allow QueryElevationComponent to accept elevateIds and excludeIds as http parameters
[ https://issues.apache.org/jira/browse/SOLR-5541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joel Bernstein updated SOLR-5541: - Attachment: SOLR-5541.patch Added test case Allow QueryElevationComponent to accept elevateIds and excludeIds as http parameters Key: SOLR-5541 URL: https://issues.apache.org/jira/browse/SOLR-5541 Project: Solr Issue Type: Improvement Components: SearchComponents - other Affects Versions: 4.6 Reporter: Joel Bernstein Assignee: Joel Bernstein Priority: Minor Fix For: 4.7 Attachments: SOLR-5541.patch, SOLR-5541.patch The QueryElevationComponent currently uses an xml file to map query strings to elevateIds and excludeIds. This ticket adds the ability to pass in elevateIds and excludeIds through two new HTTP parameters, elevateIds and excludeIds. This will allow more sophisticated business logic to be used in selecting which ids to elevate/exclude. Proposed syntax: http://localhost:8983/solr/elevate?q=*:*&elevateIds=3,4&excludeIds=6,8 The elevateIds and excludeIds point to the unique document Id. -- This message was sent by Atlassian JIRA (v6.1.4#6159) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
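A hedged usage sketch with SolrJ, assuming the parameter names from the proposal above end up unchanged (host and handler path are illustrative):

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.client.solrj.response.QueryResponse;

    public class ElevateByIdSketch {
        public static void main(String[] args) throws Exception {
            SolrServer solr = new HttpSolrServer("http://localhost:8983/solr");
            SolrQuery query = new SolrQuery("*:*");
            query.setRequestHandler("/elevate");
            query.set("elevateIds", "3,4");   // unique document ids to force to the top
            query.set("excludeIds", "6,8");   // unique document ids to drop from the results
            QueryResponse rsp = solr.query(query);
            System.out.println(rsp.getResults().getNumFound());
        }
    }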
[jira] [Commented] (SOLR-4983) Problematic core naming by collection create API
[ https://issues.apache.org/jira/browse/SOLR-4983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13843571#comment-13843571 ] Mark Miller commented on SOLR-4983: --- bq. could anyone suggest if by creating cores separately (with the same collection name) we would achieve the same effect as creating collection via Collections API? By and large, currently, yes, this is supported. There is a flag that tracks if the collection was created with the collections api or not - and if it is, you will end up being able to use further features in the future - but currently you should be able to use the cores api to do what you want no problem. Problematic core naming by collection create API - Key: SOLR-4983 URL: https://issues.apache.org/jira/browse/SOLR-4983 Project: Solr Issue Type: Bug Components: SolrCloud Reporter: Chris Toomey The SolrCloud collection create API creates cores named foo_shardx_replicay when asked to create collection foo. This is problematic for at least 2 reasons: 1) these ugly core names show up in the core admin UI, and will vary depending on which node is being used, 2) it prevents collections from being used in SolrCloud joins, since join takes a core name as the fromIndex parameter and there's no single core name for the collection. As I've documented in https://issues.apache.org/jira/browse/SOLR-4905 and http://lucene.472066.n3.nabble.com/Joins-with-SolrCloud-tp4073199p4074038.html, SolrCloud join does work when the inner collection (fromIndex) is not sharded, assuming that collection is available and initialized at SolrCloud bootstrap time. Could this be changed to instead use the collection name for the core name? Or at least add a core-name option to the API? -- This message was sent by Atlassian JIRA (v6.1.4#6159) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
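For reference, the core-level route Mark describes uses the CoreAdmin CREATE command with the collection (and optionally shard) parameters, along the lines of http://localhost:8983/solr/admin/cores?action=CREATE&name=mycore1&collection=foo&shard=shard1 (host, core name and shard value are illustrative). Repeating such calls on the desired nodes with the same collection name builds up the collection without going through the Collections API.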
[JENKINS] Lucene-4x-Linux-Java7-64-test-only - Build # 5920 - Failure!
Build: builds.flonkings.com/job/Lucene-4x-Linux-Java7-64-test-only/5920/ 1 tests failed. REGRESSION: org.apache.lucene.index.TestIndexableField.testArbitraryFields Error Message: -60 Stack Trace: java.lang.ArrayIndexOutOfBoundsException: -60 at __randomizedtesting.SeedInfo.seed([516D1CE5843E2B26:7C99D9D3760B2809]:0) at java.util.ArrayList.get(ArrayList.java:324) at org.apache.lucene.search.RandomSimilarityProvider.get(RandomSimilarityProvider.java:106) at org.apache.lucene.search.similarities.PerFieldSimilarityWrapper.computeNorm(PerFieldSimilarityWrapper.java:45) at org.apache.lucene.index.NormsConsumerPerField.finish(NormsConsumerPerField.java:49) at org.apache.lucene.index.DocInverterPerField.processFields(DocInverterPerField.java:201) at org.apache.lucene.index.DocFieldProcessor.processDocument(DocFieldProcessor.java:248) at org.apache.lucene.index.DocumentsWriterPerThread.updateDocument(DocumentsWriterPerThread.java:253) at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:453) at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1520) at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1190) at org.apache.lucene.index.RandomIndexWriter.addDocument(RandomIndexWriter.java:146) at org.apache.lucene.index.RandomIndexWriter.addDocument(RandomIndexWriter.java:108) at org.apache.lucene.index.TestIndexableField.testArbitraryFields(TestIndexableField.java:191) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1559) at com.carrotsearch.randomizedtesting.RandomizedRunner.access$600(RandomizedRunner.java:79) at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:737) at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:773) at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:787) at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50) at org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:51) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49) at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:358) at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:782) at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:442) at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:746) at com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:648) at 
com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:682) at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:693) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:43) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at
[jira] [Commented] (SOLR-5541) Allow QueryElevationComponent to accept elevateIds and excludeIds as http parameters
[ https://issues.apache.org/jira/browse/SOLR-5541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13843573#comment-13843573 ] Mark Miller commented on SOLR-5541: --- +1 One comment: + assertQ("All six should make it", req Should update the copy/paste assert comment - only 5 should make it because b is excluded. Allow QueryElevationComponent to accept elevateIds and excludeIds as http parameters Key: SOLR-5541 URL: https://issues.apache.org/jira/browse/SOLR-5541 Project: Solr Issue Type: Improvement Components: SearchComponents - other Affects Versions: 4.6 Reporter: Joel Bernstein Assignee: Joel Bernstein Priority: Minor Fix For: 4.7 Attachments: SOLR-5541.patch, SOLR-5541.patch The QueryElevationComponent currently uses an xml file to map query strings to elevateIds and excludeIds. This ticket adds the ability to pass in elevateIds and excludeIds through two new HTTP parameters, elevateIds and excludeIds. This will allow more sophisticated business logic to be used in selecting which ids to elevate/exclude. Proposed syntax: http://localhost:8983/solr/elevate?q=*:*&elevateIds=3,4&excludeIds=6,8 The elevateIds and excludeIds point to the unique document Id. -- This message was sent by Atlassian JIRA (v6.1.4#6159) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: [JENKINS] Lucene-4x-Linux-Java7-64-test-only - Build # 5920 - Failure!
nice one we ran into Math.abs(Integer.MIN_VALUE) which returns -1

On Mon, Dec 9, 2013 at 9:24 PM, buil...@flonkings.com wrote:

Build: builds.flonkings.com/job/Lucene-4x-Linux-Java7-64-test-only/5920/

1 tests failed.

REGRESSION: org.apache.lucene.index.TestIndexableField.testArbitraryFields

Error Message: -60

Stack Trace:
java.lang.ArrayIndexOutOfBoundsException: -60
at __randomizedtesting.SeedInfo.seed([516D1CE5843E2B26:7C99D9D3760B2809]:0)
at java.util.ArrayList.get(ArrayList.java:324)
at org.apache.lucene.search.RandomSimilarityProvider.get(RandomSimilarityProvider.java:106)
at org.apache.lucene.search.similarities.PerFieldSimilarityWrapper.computeNorm(PerFieldSimilarityWrapper.java:45)
at org.apache.lucene.index.NormsConsumerPerField.finish(NormsConsumerPerField.java:49)
at org.apache.lucene.index.DocInverterPerField.processFields(DocInverterPerField.java:201)
at org.apache.lucene.index.DocFieldProcessor.processDocument(DocFieldProcessor.java:248)
at org.apache.lucene.index.DocumentsWriterPerThread.updateDocument(DocumentsWriterPerThread.java:253)
at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:453)
at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1520)
at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1190)
at org.apache.lucene.index.RandomIndexWriter.addDocument(RandomIndexWriter.java:146)
at org.apache.lucene.index.RandomIndexWriter.addDocument(RandomIndexWriter.java:108)
at org.apache.lucene.index.TestIndexableField.testArbitraryFields(TestIndexableField.java:191)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1559)
at com.carrotsearch.randomizedtesting.RandomizedRunner.access$600(RandomizedRunner.java:79)
at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:737)
at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:773)
at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:787)
at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
at org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:51)
at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49)
at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70)
at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:358)
at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:782)
at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:442)
at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:746)
at com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:648)
at com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:682)
at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:693)
at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42)
at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at
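For anyone puzzled by the negative index: Math.abs(Integer.MIN_VALUE) overflows and returns Integer.MIN_VALUE itself (still negative), so an index computed as Math.abs(hash) % size can come out below zero, which is what bit RandomSimilarityProvider.get here. The sketch below only illustrates the pitfall and two common workarounds; it is not the fix that was committed.

{code:java}
public class AbsOverflowDemo {
    public static void main(String[] args) {
        int hash = Integer.MIN_VALUE;   // the one int whose absolute value does not fit in an int
        int size = 100;                 // pretend this is the number of candidate similarities

        // Math.abs(Integer.MIN_VALUE) returns Integer.MIN_VALUE, so the modulo below is
        // negative and a subsequent ArrayList.get() throws, like the -60 in the build failure.
        int brokenIndex = Math.abs(hash) % size;
        System.out.println("broken index = " + brokenIndex);   // prints -48

        // Two common workarounds: clear the sign bit, or use Math.floorMod (Java 8+).
        int maskedIndex = (hash & Integer.MAX_VALUE) % size;
        int floorIndex = Math.floorMod(hash, size);
        System.out.println(maskedIndex + " and " + floorIndex + " are both in [0, " + size + ")");
    }
}
{code}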
Re: [JENKINS] Lucene-4x-Linux-Java7-64-test-only - Build # 5920 - Failure!
I committed a fix

On Mon, Dec 9, 2013 at 9:36 PM, Simon Willnauer sim...@apache.org wrote:
nice one we ran into Math.abs(Integer.MIN_VALUE) which returns -1
[jira] [Commented] (SOLR-5541) Allow QueryElevationComponent to accept elevateIds and excludeIds as http parameters
[ https://issues.apache.org/jira/browse/SOLR-5541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13843584#comment-13843584 ] Joel Bernstein commented on SOLR-5541: -- Thanks Mark, I'll fix that up. Allow QueryElevationComponent to accept elevateIds and excludeIds as http parameters Key: SOLR-5541 URL: https://issues.apache.org/jira/browse/SOLR-5541 Project: Solr Issue Type: Improvement Components: SearchComponents - other Affects Versions: 4.6 Reporter: Joel Bernstein Assignee: Joel Bernstein Priority: Minor Fix For: 4.7 Attachments: SOLR-5541.patch, SOLR-5541.patch The QueryElevationComponent currently uses an xml file to map query strings to elevateIds and excludeIds. This ticket adds the ability to pass in elevateIds and excludeIds through two new http parameters elevateIds and excludeIds. This will allow more sophisticated business logic to be used in selecting which ids to elevate/exclude. Proposed syntax: http://localhost:8983/solr/elevate?q=*:*elevatedIds=3,4excludeIds=6,8 The elevateIds and excludeIds point to the unique document Id. -- This message was sent by Atlassian JIRA (v6.1.4#6159) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
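A hedged SolrJ sketch of what exercising the proposed parameters could look like once the patch is in. The handler path, core URL and ids are modeled on the example in the issue description; none of this is part of a committed API yet.

{code:java}
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class ElevateByParamDemo {
    public static void main(String[] args) throws Exception {
        // Core URL and handler path follow the example in the issue description.
        HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr/collection1");

        SolrQuery q = new SolrQuery("*:*");
        q.setRequestHandler("/elevate");   // a handler configured with the QueryElevationComponent
        q.set("elevateIds", "3,4");        // documents to force to the top, by uniqueKey
        q.set("excludeIds", "6,8");        // documents to drop from the results, by uniqueKey

        QueryResponse rsp = solr.query(q);
        System.out.println(rsp.getResults());
        solr.shutdown();
    }
}
{code}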
Re: [JENKINS] Lucene-4x-Linux-Java7-64-test-only - Build # 5920 - Failure!
Hurray for the improbable... :)

D.

On Mon, Dec 9, 2013 at 10:36 PM, Simon Willnauer sim...@apache.org wrote:
nice one we ran into Math.abs(Integer.MIN_VALUE) which returns -1
[jira] [Updated] (SOLR-5541) Allow QueryElevationComponent to accept elevateIds and excludeIds as http parameters
[ https://issues.apache.org/jira/browse/SOLR-5541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joel Bernstein updated SOLR-5541: - Attachment: SOLR-5541.patch Allow QueryElevationComponent to accept elevateIds and excludeIds as http parameters Key: SOLR-5541 URL: https://issues.apache.org/jira/browse/SOLR-5541 Project: Solr Issue Type: Improvement Components: SearchComponents - other Affects Versions: 4.6 Reporter: Joel Bernstein Assignee: Joel Bernstein Priority: Minor Fix For: 4.7 Attachments: SOLR-5541.patch, SOLR-5541.patch, SOLR-5541.patch The QueryElevationComponent currently uses an xml file to map query strings to elevateIds and excludeIds. This ticket adds the ability to pass in elevateIds and excludeIds through two new http parameters elevateIds and excludeIds. This will allow more sophisticated business logic to be used in selecting which ids to elevate/exclude. Proposed syntax: http://localhost:8983/solr/elevate?q=*:*elevatedIds=3,4excludeIds=6,8 The elevateIds and excludeIds point to the unique document Id. -- This message was sent by Atlassian JIRA (v6.1.4#6159) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5364) Review usages of hard-coded Version constants
[ https://issues.apache.org/jira/browse/LUCENE-5364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13843652#comment-13843652 ] ASF subversion and git services commented on LUCENE-5364: - Commit 1549701 from [~steve_rowe] in branch 'dev/trunk' [ https://svn.apache.org/r1549701 ] LUCENE-5364: Replace hard-coded Version.LUCENE_XY that doesn't have to be hard-coded (because of back-compat testing or version dependent behavior, or demo code that should exemplify pinning versions in user code), with Version.LUCENE_CURRENT in non-test code, or with LuceneTestCase.TEST_VERSION_CURRENT in test code; upgrade hard-coded Version.LUCENE_XY constants that should track the next release version to the next release version if they aren't already there, and put a token near them so that they can be found and upgraded when the next release version changes: ':Post-Release-Update-Version.LUCENE_XY:' Review usages of hard-coded Version constants - Key: LUCENE-5364 URL: https://issues.apache.org/jira/browse/LUCENE-5364 Project: Lucene - Core Issue Type: Bug Components: core/other Affects Versions: 5.0, 4.7 Reporter: Steve Rowe Priority: Minor Attachments: LUCENE-5364-branch_4x.patch, LUCENE-5364-trunk.patch, LUCENE-5364-trunk.patch There are some hard-coded {{Version.LUCENE_XY}} constants used in various places. Some of these are intentional and appropriate: * in deprecated code, e.g. {{ArabicLetterTokenizer}}, deprecated in 3.1, uses {{Version.LUCENE_31}} * to make behavior version-dependent (e.g. {{StandardTokenizer}} and other analysis components) * to test different behavior at different points in history (e.g. {{TestStopFilter}} to test position increments) But should hard-coded constants be used elsewhere? For those that should remain, and need to be updated with each release, there should be an easy way to find them. -- This message was sent by Atlassian JIRA (v6.1.4#6159) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
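To illustrate the kind of replacement the commit message describes (this is an invented example, not a diff from the actual commit): a hard-coded constant in non-test code gets swapped for Version.LUCENE_CURRENT, and test code uses LuceneTestCase.TEST_VERSION_CURRENT.

{code:java}
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.util.Version;

public class VersionConstantExample {
    // Before: a hard-coded constant that silently goes stale after each release.
    StandardAnalyzer pinned = new StandardAnalyzer(Version.LUCENE_46);

    // After, in non-test code that has no back-compat reason to pin a version:
    StandardAnalyzer current = new StandardAnalyzer(Version.LUCENE_CURRENT);

    // In test code the equivalent replacement is LuceneTestCase.TEST_VERSION_CURRENT, e.g.
    //   new StandardAnalyzer(LuceneTestCase.TEST_VERSION_CURRENT);
}
{code}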
[jira] [Commented] (LUCENE-5364) Review usages of hard-coded Version constants
[ https://issues.apache.org/jira/browse/LUCENE-5364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13843658#comment-13843658 ] ASF subversion and git services commented on LUCENE-5364: - Commit 1549703 from [~steve_rowe] in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1549703 ] LUCENE-5364: Replace hard-coded Version.LUCENE_XY that doesn't have to be hard-coded (because of back-compat testing or version dependent behavior, or demo code that should exemplify pinning versions in user code), with Version.LUCENE_CURRENT in non-test code, or with LuceneTestCase.TEST_VERSION_CURRENT in test code; upgrade hard-coded Version.LUCENE_XY constants that should track the next release version to the next release version if they aren't already there, and put a token near them so that they can be found and upgraded when the next release version changes: ':Post-Release-Update-Version.LUCENE_XY:' (merge trunk r1549701) Review usages of hard-coded Version constants - Key: LUCENE-5364 URL: https://issues.apache.org/jira/browse/LUCENE-5364 Project: Lucene - Core Issue Type: Bug Components: core/other Affects Versions: 5.0, 4.7 Reporter: Steve Rowe Priority: Minor Attachments: LUCENE-5364-branch_4x.patch, LUCENE-5364-trunk.patch, LUCENE-5364-trunk.patch There are some hard-coded {{Version.LUCENE_XY}} constants used in various places. Some of these are intentional and appropriate: * in deprecated code, e.g. {{ArabicLetterTokenizer}}, deprecated in 3.1, uses {{Version.LUCENE_31}} * to make behavior version-dependent (e.g. {{StandardTokenizer}} and other analysis components) * to test different behavior at different points in history (e.g. {{TestStopFilter}} to test position increments) But should hard-coded constants be used elsewhere? For those that should remain, and need to be updated with each release, there should be an easy way to find them. -- This message was sent by Atlassian JIRA (v6.1.4#6159) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-5463) Provide cursor/token based searchAfter support that works with arbitrary sorting (ie: deep paging)
[ https://issues.apache.org/jira/browse/SOLR-5463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hoss Man updated SOLR-5463: --- Attachment: SOLR-5463__straw_man.patch Ok, updated patch making the change in user semantics I mentioned wanting to try last week. Best way to explain it is with a walk through of a simple example (note: if you try the current strawman code, the numFound and start values returned in the docList don't match what i've pasted in the examples below -- these examples show what the final results should look like in the finished solution) Initial requests using searchAfter should always start with a totem value of {{\*}} {code:title=http://localhost:8983/solr/deep?q=*:*rows=20sort=id+descsearchAfter=*} { responseHeader:{ status:0, QTime:2}, response:{numFound:32,start:-1,docs:[ // ...20 docs here... ] }, nextSearchAfter:AoEjTk9L} {code} The {{nextSearchAfter}} token returned by this request tells us what to use in the second request... {code:title=http://localhost:8983/solr/deep?q=*:*rows=20sort=id+descsearchAfter=AoEjTk9L} { responseHeader:{ status:0, QTime:7}, response:{numFound:32,start:-1,docs:[ // ...12 docs here... ] }, nextSearchAfter:AoEoMDU3OUIwMDI=} {code} Since this result block contains fewer rows then were requested, the client could automatically stop, but the {{nextSearchAfter}} is still returned, and it's still safe to request a subsequent page (this is the fundemental diff from the previous patches, where {{nextSearchAfter}} was set to {{null}} anytime the code could tell there were no more results ... {code:title=http://localhost:8983/solr/deep?q=*:*wt=jsonindent=truerows=20fl=id,pricesort=id+descsearchAfter=AoEoMDU3OUIwMDI=} { responseHeader:{ status:0, QTime:1}, response:{numFound:32,start:-1,docs:[] }, nextSearchAfter:AoEoMDU3OUIwMDI=} {code} Note that in this case, with no docs included in the response, the {{nextSearchAfter}} totem is the same as the input. For some sorts this makes it possible for clients to resume a full walk of all documents matching a query -- picking up where they let off if more documents are added to the index that match (for example: when doing an ascending sort on a numeric uniqueKey field that always increases as new docs are added, sorting by a timestamp field (asc) indicating when documents are crawled, etc...) This also works as you would expect for searches that don't match any documents... {code:title=http://localhost:8983/solr/deep?q=text:bogusrows=20sort=id+descsearchAfter=*} { responseHeader:{ status:0, QTime:21}, response:{numFound:0,start:-1,docs:[] }, nextSearchAfter:*} {code} Provide cursor/token based searchAfter support that works with arbitrary sorting (ie: deep paging) -- Key: SOLR-5463 URL: https://issues.apache.org/jira/browse/SOLR-5463 Project: Solr Issue Type: New Feature Reporter: Hoss Man Assignee: Hoss Man Attachments: SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch I'd like to revist a solution to the problem of deep paging in Solr, leveraging an HTTP based API similar to how IndexSearcher.searchAfter works at the lucene level: require the clients to provide back a token indicating the sort values of the last document seen on the previous page. 
This is similar to the cursor model I've seen in several other REST APIs that support pagnation over a large sets of results (notable the twitter API and it's since_id param) except that we'll want something that works with arbitrary multi-level sort critera that can be either ascending or descending. SOLR-1726 laid some initial ground work here and was commited quite a while ago, but the key bit of argument parsing to leverage it was commented out due to some problems (see comments in that issue). It's also somewhat out of date at this point: at the time it was commited, IndexSearcher only supported searchAfter for simple scores, not arbitrary field sorts; and the params added in SOLR-1726 suffer from this limitation as well. --- I think it would make sense to start fresh with a new issue with a focus on ensuring that we have deep paging which: * supports arbitrary field sorts in addition to sorting by score * works in distributed mode -- This message was sent by Atlassian JIRA (v6.1.4#6159) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail:
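To make the walkthrough above concrete, here is a hedged SolrJ sketch of a client looping over pages with the straw-man parameters (searchAfter in the request, nextSearchAfter in the response). The host, core name and stop condition follow the example semantics described above; the parameter names are later proposed to change (see the cursor discussion below), so treat this as an illustration of the walk, not a final API.

{code:java}
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class DeepPagingWalk {
    public static void main(String[] args) throws Exception {
        HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr/deep");
        String token = "*";       // initial requests always start with the "*" totem
        String previous = null;

        // Per the walkthrough, the token repeats itself once there is nothing left to fetch,
        // so looping until it stops changing walks every matching document exactly once.
        while (!token.equals(previous)) {
            SolrQuery q = new SolrQuery("*:*");
            q.setRows(20);
            q.set("sort", "id desc");
            q.set("searchAfter", token);

            QueryResponse rsp = solr.query(q);
            // ... process rsp.getResults() here ...

            previous = token;
            token = (String) rsp.getResponse().get("nextSearchAfter");
            if (token == null) {
                break;   // defensive only; the straw-man patch always returns the key
            }
        }
        solr.shutdown();
    }
}
{code}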
[jira] [Commented] (SOLR-5463) Provide cursor/token based searchAfter support that works with arbitrary sorting (ie: deep paging)
[ https://issues.apache.org/jira/browse/SOLR-5463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13843687#comment-13843687 ] Hoss Man commented on SOLR-5463: The one significant change i still want to make before abandoming this straw man and moving on to using PaginatingCollector under the covers is to rethink the vocabulary. at the Lucene/IndexSearcher level, this functionality is leveraged using a searchAfter param which indicates the exact FieldDoc returned by a previous search. The name makes a lot of sense in this API given that the FieldDoc you specify is expected to come from a previous search, and you are specifying that you want to search for documents after this document in the ocntext of the specified query/sort. For the Solr request API however, I feel like this terminology might confuse people. I'm concerned people might think they can use the uniqueKey of the last document they got on the previous page (instead of realizing they need to specify the special token they were returned as part of that page). My thinking is that from a user perspective, we should call this functionality a Result Cursor and rename the request param and response key appropriately. something along the lines of... {code:title=http://localhost:8983/solr/deep?q=*:*rows=20sort=id+desccursor=AoEjTk9L} { responseHeader:{ status:0, QTime:7}, response:{numFound:32,start:-1,docs:[ // ... docs here... ] }, cursorContinue:AoEoMDU3OUIwMDI=} {code} * searchAfter = cursor * nextSearchAfter = cursorContinue What do folks think? Provide cursor/token based searchAfter support that works with arbitrary sorting (ie: deep paging) -- Key: SOLR-5463 URL: https://issues.apache.org/jira/browse/SOLR-5463 Project: Solr Issue Type: New Feature Reporter: Hoss Man Assignee: Hoss Man Attachments: SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch I'd like to revist a solution to the problem of deep paging in Solr, leveraging an HTTP based API similar to how IndexSearcher.searchAfter works at the lucene level: require the clients to provide back a token indicating the sort values of the last document seen on the previous page. This is similar to the cursor model I've seen in several other REST APIs that support pagnation over a large sets of results (notable the twitter API and it's since_id param) except that we'll want something that works with arbitrary multi-level sort critera that can be either ascending or descending. SOLR-1726 laid some initial ground work here and was commited quite a while ago, but the key bit of argument parsing to leverage it was commented out due to some problems (see comments in that issue). It's also somewhat out of date at this point: at the time it was commited, IndexSearcher only supported searchAfter for simple scores, not arbitrary field sorts; and the params added in SOLR-1726 suffer from this limitation as well. --- I think it would make sense to start fresh with a new issue with a focus on ensuring that we have deep paging which: * supports arbitrary field sorts in addition to sorting by score * works in distributed mode -- This message was sent by Atlassian JIRA (v6.1.4#6159) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (LUCENE-5364) Review usages of hard-coded Version constants
[ https://issues.apache.org/jira/browse/LUCENE-5364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Rowe resolved LUCENE-5364. Resolution: Fixed Fix Version/s: 4.7 5.0 Assignee: Steve Rowe Lucene Fields: New,Patch Available (was: New) Committed to trunk and branch_4x. I added a note to the Lucene ReleaseToDo wiki page about using {{:Post-Release-Update-Version.LUCENE_XY:}} to find constants that should be upgraded to the next release version after a release branch has been cut. Review usages of hard-coded Version constants - Key: LUCENE-5364 URL: https://issues.apache.org/jira/browse/LUCENE-5364 Project: Lucene - Core Issue Type: Bug Components: core/other Affects Versions: 5.0, 4.7 Reporter: Steve Rowe Assignee: Steve Rowe Priority: Minor Fix For: 5.0, 4.7 Attachments: LUCENE-5364-branch_4x.patch, LUCENE-5364-trunk.patch, LUCENE-5364-trunk.patch There are some hard-coded {{Version.LUCENE_XY}} constants used in various places. Some of these are intentional and appropriate: * in deprecated code, e.g. {{ArabicLetterTokenizer}}, deprecated in 3.1, uses {{Version.LUCENE_31}} * to make behavior version-dependent (e.g. {{StandardTokenizer}} and other analysis components) * to test different behavior at different points in history (e.g. {{TestStopFilter}} to test position increments) But should hard-coded constants be used elsewhere? For those that should remain, and need to be updated with each release, there should be an easy way to find them. -- This message was sent by Atlassian JIRA (v6.1.4#6159) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5463) Provide cursor/token based searchAfter support that works with arbitrary sorting (ie: deep paging)
[ https://issues.apache.org/jira/browse/SOLR-5463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13843748#comment-13843748 ] Steve Rowe commented on SOLR-5463: -- {quote} * searchAfter = cursor * nextSearchAfter = cursorContinue {quote} +1 bq. I'm concerned people might think they can use the uniqueKey of the last document they got on the previous page I tried making this mistake (using the trailing unique id (NOK in this example) as the searchAfter param value, and I got the following error message: {code} { responseHeader:{ status:400, QTime:2}, error:{ msg:Unable to parse search after totem: NOK, code:400}} {code} I think that error message should include the param name ({{cursorContinue}}) that couldn't be parsed. Also, maybe it would be useful to include a prefix that will (probably) never be used in unique ids, to visually identify the cursor as such: like always prepending '*'? So your example of the future would become: {code:title=http://localhost:8983/solr/deep?q=*:*rows=20sort=id+desccursor=*AoEjTk9L} { responseHeader:{ status:0, QTime:7}, response:{numFound:32,start:-1,docs:[ // ... docs here... ] }, cursorContinue:*AoEoMDU3OUIwMDI=} {code} The error message when someone gives an unparseable {{cursor}} could then include this piece of information: cursors begin with an asterisk. Provide cursor/token based searchAfter support that works with arbitrary sorting (ie: deep paging) -- Key: SOLR-5463 URL: https://issues.apache.org/jira/browse/SOLR-5463 Project: Solr Issue Type: New Feature Reporter: Hoss Man Assignee: Hoss Man Attachments: SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch I'd like to revist a solution to the problem of deep paging in Solr, leveraging an HTTP based API similar to how IndexSearcher.searchAfter works at the lucene level: require the clients to provide back a token indicating the sort values of the last document seen on the previous page. This is similar to the cursor model I've seen in several other REST APIs that support pagnation over a large sets of results (notable the twitter API and it's since_id param) except that we'll want something that works with arbitrary multi-level sort critera that can be either ascending or descending. SOLR-1726 laid some initial ground work here and was commited quite a while ago, but the key bit of argument parsing to leverage it was commented out due to some problems (see comments in that issue). It's also somewhat out of date at this point: at the time it was commited, IndexSearcher only supported searchAfter for simple scores, not arbitrary field sorts; and the params added in SOLR-1726 suffer from this limitation as well. --- I think it would make sense to start fresh with a new issue with a focus on ensuring that we have deep paging which: * supports arbitrary field sorts in addition to sorting by score * works in distributed mode -- This message was sent by Atlassian JIRA (v6.1.4#6159) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5473) Make one state.json per collection
[ https://issues.apache.org/jira/browse/SOLR-5473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13843749#comment-13843749 ] Timothy Potter commented on SOLR-5473: -- Thanks for fixing the CloudSolrServerTest failure ... One thing I wasn't sure about when looking over the latest patch was whether allCollections in ZkStateReader will hold the names of external collections? I assume so by the name *all* but it doesn't seem like any external collection names are added to that Set currently. Make one state.json per collection -- Key: SOLR-5473 URL: https://issues.apache.org/jira/browse/SOLR-5473 Project: Solr Issue Type: Sub-task Components: SolrCloud Reporter: Noble Paul Assignee: Noble Paul Attachments: SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch As defined in the parent issue, store the states of each collection under /collections/collectionname/state.json node -- This message was sent by Atlassian JIRA (v6.1.4#6159) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
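For readers following along, the layout the parent issue describes means a single collection's state can be fetched straight from ZooKeeper at /collections/<name>/state.json. A minimal sketch with the plain ZooKeeper client follows; the ZK address and collection name are made up, and inside Solr you would go through ZkStateReader rather than a raw client.

{code:java}
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooKeeper;

public class ReadCollectionState {
    public static void main(String[] args) throws Exception {
        ZooKeeper zk = new ZooKeeper("localhost:2181", 15000, new Watcher() {
            public void process(WatchedEvent event) { /* no-op watcher */ }
        });
        // One state.json per collection, under the collection's own znode.
        byte[] data = zk.getData("/collections/mycollection/state.json", false, null);
        System.out.println(new String(data, "UTF-8"));
        zk.close();
    }
}
{code}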
[jira] [Comment Edited] (SOLR-5463) Provide cursor/token based searchAfter support that works with arbitrary sorting (ie: deep paging)
[ https://issues.apache.org/jira/browse/SOLR-5463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13843748#comment-13843748 ] Steve Rowe edited comment on SOLR-5463 at 12/10/13 12:21 AM: - {quote} * searchAfter = cursor * nextSearchAfter = cursorContinue {quote} +1 bq. I'm concerned people might think they can use the uniqueKey of the last document they got on the previous page I tried making this mistake (using the trailing unique id (NOK in this example) as the searchAfter param value, and I got the following error message: {code} { responseHeader:{ status:400, QTime:2}, error:{ msg:Unable to parse search after totem: NOK, code:400}} {code} (*edit*: {{cursorContinue}} = {{cursor}} in the sentence below) I think that error message should include the param name ({{cursor}}) that couldn't be parsed. Also, maybe it would be useful to include a prefix that will (probably) never be used in unique ids, to visually identify the cursor as such: like always prepending '*'? So your example of the future would become: {code:title=http://localhost:8983/solr/deep?q=*:*rows=20sort=id+desccursor=*AoEjTk9L} { responseHeader:{ status:0, QTime:7}, response:{numFound:32,start:-1,docs:[ // ... docs here... ] }, cursorContinue:*AoEoMDU3OUIwMDI=} {code} The error message when someone gives an unparseable {{cursor}} could then include this piece of information: cursors begin with an asterisk. was (Author: steve_rowe): {quote} * searchAfter = cursor * nextSearchAfter = cursorContinue {quote} +1 bq. I'm concerned people might think they can use the uniqueKey of the last document they got on the previous page I tried making this mistake (using the trailing unique id (NOK in this example) as the searchAfter param value, and I got the following error message: {code} { responseHeader:{ status:400, QTime:2}, error:{ msg:Unable to parse search after totem: NOK, code:400}} {code} I think that error message should include the param name ({{cursorContinue}}) that couldn't be parsed. Also, maybe it would be useful to include a prefix that will (probably) never be used in unique ids, to visually identify the cursor as such: like always prepending '*'? So your example of the future would become: {code:title=http://localhost:8983/solr/deep?q=*:*rows=20sort=id+desccursor=*AoEjTk9L} { responseHeader:{ status:0, QTime:7}, response:{numFound:32,start:-1,docs:[ // ... docs here... ] }, cursorContinue:*AoEoMDU3OUIwMDI=} {code} The error message when someone gives an unparseable {{cursor}} could then include this piece of information: cursors begin with an asterisk. Provide cursor/token based searchAfter support that works with arbitrary sorting (ie: deep paging) -- Key: SOLR-5463 URL: https://issues.apache.org/jira/browse/SOLR-5463 Project: Solr Issue Type: New Feature Reporter: Hoss Man Assignee: Hoss Man Attachments: SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch I'd like to revist a solution to the problem of deep paging in Solr, leveraging an HTTP based API similar to how IndexSearcher.searchAfter works at the lucene level: require the clients to provide back a token indicating the sort values of the last document seen on the previous page. 
This is similar to the cursor model I've seen in several other REST APIs that support pagnation over a large sets of results (notable the twitter API and it's since_id param) except that we'll want something that works with arbitrary multi-level sort critera that can be either ascending or descending. SOLR-1726 laid some initial ground work here and was commited quite a while ago, but the key bit of argument parsing to leverage it was commented out due to some problems (see comments in that issue). It's also somewhat out of date at this point: at the time it was commited, IndexSearcher only supported searchAfter for simple scores, not arbitrary field sorts; and the params added in SOLR-1726 suffer from this limitation as well. --- I think it would make sense to start fresh with a new issue with a focus on ensuring that we have deep paging which: * supports arbitrary field sorts in addition to sorting by score * works in distributed mode -- This message was sent by Atlassian JIRA (v6.1.4#6159) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional
[jira] [Commented] (SOLR-5463) Provide cursor/token based searchAfter support that works with arbitrary sorting (ie: deep paging)
[ https://issues.apache.org/jira/browse/SOLR-5463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13843774#comment-13843774 ] Steve Rowe commented on SOLR-5463: -- Another idea about the cursor: the Base64-encoded text is used verbatim, including the trailing padding '=' characters - these could be stripped out for external use (since they're there just to make the string length divisible by four), and then added back before Base64-decoding. In a URL non-metacharacter '='-s look weird, since they're already used to separate param names and values. Provide cursor/token based searchAfter support that works with arbitrary sorting (ie: deep paging) -- Key: SOLR-5463 URL: https://issues.apache.org/jira/browse/SOLR-5463 Project: Solr Issue Type: New Feature Reporter: Hoss Man Assignee: Hoss Man Attachments: SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch I'd like to revist a solution to the problem of deep paging in Solr, leveraging an HTTP based API similar to how IndexSearcher.searchAfter works at the lucene level: require the clients to provide back a token indicating the sort values of the last document seen on the previous page. This is similar to the cursor model I've seen in several other REST APIs that support pagnation over a large sets of results (notable the twitter API and it's since_id param) except that we'll want something that works with arbitrary multi-level sort critera that can be either ascending or descending. SOLR-1726 laid some initial ground work here and was commited quite a while ago, but the key bit of argument parsing to leverage it was commented out due to some problems (see comments in that issue). It's also somewhat out of date at this point: at the time it was commited, IndexSearcher only supported searchAfter for simple scores, not arbitrary field sorts; and the params added in SOLR-1726 suffer from this limitation as well. --- I think it would make sense to start fresh with a new issue with a focus on ensuring that we have deep paging which: * supports arbitrary field sorts in addition to sorting by score * works in distributed mode -- This message was sent by Atlassian JIRA (v6.1.4#6159) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
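A small sketch of the padding idea, using java.util.Base64 purely for illustration (Solr has its own Base64 helper that the real code would presumably use): strip the trailing '=' characters before handing the token to clients, and pad back out to a multiple of four characters before decoding.

{code:java}
import java.util.Base64;

public class CursorPadding {
    static String stripPadding(String token) {
        int end = token.length();
        while (end > 0 && token.charAt(end - 1) == '=') {
            end--;
        }
        return token.substring(0, end);
    }

    static byte[] decodeWithoutPadding(String token) {
        // Base64 works in 4-character groups, so restore the '=' padding before decoding.
        StringBuilder padded = new StringBuilder(token);
        while (padded.length() % 4 != 0) {
            padded.append('=');
        }
        return Base64.getDecoder().decode(padded.toString());
    }

    public static void main(String[] args) {
        String raw = Base64.getEncoder().encodeToString("0579B002".getBytes());
        String external = stripPadding(raw);   // "MDU3OUIwMDI" instead of "MDU3OUIwMDI="
        System.out.println(external + " -> " + new String(decodeWithoutPadding(external)));
    }
}
{code}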
[jira] [Commented] (SOLR-1301) Add a Solr contrib that allows for building Solr indexes via Hadoop's Map-Reduce.
[ https://issues.apache.org/jira/browse/SOLR-1301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13843827#comment-13843827 ] Mark Miller commented on SOLR-1301: --- bq. I'm not aware of anything needing jersey except perhaps hadoop pulls that in. Yeah, tests use this for running hadoop. Add a Solr contrib that allows for building Solr indexes via Hadoop's Map-Reduce. - Key: SOLR-1301 URL: https://issues.apache.org/jira/browse/SOLR-1301 Project: Solr Issue Type: New Feature Reporter: Andrzej Bialecki Assignee: Mark Miller Fix For: 5.0, 4.7 Attachments: README.txt, SOLR-1301-hadoop-0-20.patch, SOLR-1301-hadoop-0-20.patch, SOLR-1301-maven-intellij.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SolrRecordWriter.java, commons-logging-1.0.4.jar, commons-logging-api-1.0.4.jar, hadoop-0.19.1-core.jar, hadoop-0.20.1-core.jar, hadoop-core-0.20.2-cdh3u3.jar, hadoop.patch, log4j-1.2.15.jar This patch contains a contrib module that provides distributed indexing (using Hadoop) to Solr EmbeddedSolrServer. The idea behind this module is twofold: * provide an API that is familiar to Hadoop developers, i.e. that of OutputFormat * avoid unnecessary export and (de)serialization of data maintained on HDFS. SolrOutputFormat consumes data produced by reduce tasks directly, without storing it in intermediate files. Furthermore, by using an EmbeddedSolrServer, the indexing task is split into as many parts as there are reducers, and the data to be indexed is not sent over the network. Design -- Key/value pairs produced by reduce tasks are passed to SolrOutputFormat, which in turn uses SolrRecordWriter to write this data. SolrRecordWriter instantiates an EmbeddedSolrServer, and it also instantiates an implementation of SolrDocumentConverter, which is responsible for turning Hadoop (key, value) into a SolrInputDocument. This data is then added to a batch, which is periodically submitted to EmbeddedSolrServer. When reduce task completes, and the OutputFormat is closed, SolrRecordWriter calls commit() and optimize() on the EmbeddedSolrServer. The API provides facilities to specify an arbitrary existing solr.home directory, from which the conf/ and lib/ files will be taken. This process results in the creation of as many partial Solr home directories as there were reduce tasks. The output shards are placed in the output directory on the default filesystem (e.g. HDFS). Such part-N directories can be used to run N shard servers. Additionally, users can specify the number of reduce tasks, in particular 1 reduce task, in which case the output will consist of a single shard. An example application is provided that processes large CSV files and uses this API. It uses a custom CSV processing to avoid (de)serialization overhead. This patch relies on hadoop-core-0.19.1.jar - I attached the jar to this issue, you should put it in contrib/hadoop/lib. Note: the development of this patch was sponsored by an anonymous contributor and approved for release under Apache License. -- This message was sent by Atlassian JIRA (v6.1.4#6159) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
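As a rough illustration of the converter role the issue description sketches, here is what a CSV-style implementation might look like. The SolrDocumentConverter contract (method name, generics, whether one record may yield several documents) is assumed here rather than taken from the patch; only SolrInputDocument and Hadoop's Text are standard classes.

{code:java}
import java.util.ArrayList;
import java.util.Collection;

import org.apache.hadoop.io.Text;
import org.apache.solr.common.SolrInputDocument;

public class CsvLineConverter {
    // Assumed contract: turn one Hadoop (key, value) pair into one or more Solr documents.
    public Collection<SolrInputDocument> convert(Text key, Text value) {
        String[] cols = value.toString().split(",");

        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", key.toString());
        doc.addField("name", cols.length > 0 ? cols[0] : "");
        doc.addField("price", cols.length > 1 ? Float.parseFloat(cols[1]) : 0.0f);

        Collection<SolrInputDocument> docs = new ArrayList<SolrInputDocument>();
        docs.add(doc);
        return docs;
    }
}
{code}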
[jira] [Commented] (SOLR-4983) Problematic core naming by collection create API
[ https://issues.apache.org/jira/browse/SOLR-4983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13843840#comment-13843840 ] Noble Paul commented on SOLR-4983: -- I think solving his problem alone is simple. If the collection is present in the same JVM, it is very easy to look up the collection and, if there is a core that serves the collection, set the fromIndex to that core. If the user can ensure that all his collections are present on all nodes, it will be OK. The hard part is making it work with a remote node. Problematic core naming by collection create API - Key: SOLR-4983 URL: https://issues.apache.org/jira/browse/SOLR-4983 Project: Solr Issue Type: Bug Components: SolrCloud Reporter: Chris Toomey The SolrCloud collection create API creates cores named foo_shardx_replicay when asked to create collection foo. This is problematic for at least 2 reasons: 1) these ugly core names show up in the core admin UI, and will vary depending on which node is being used, 2) it prevents collections from being used in SolrCloud joins, since join takes a core name as the fromIndex parameter and there's no single core name for the collection. As I've documented in https://issues.apache.org/jira/browse/SOLR-4905 and http://lucene.472066.n3.nabble.com/Joins-with-SolrCloud-tp4073199p4074038.html, SolrCloud join does work when the inner collection (fromIndex) is not sharded, assuming that collection is available and initialized at SolrCloud bootstrap time. Could this be changed to instead use the collection name for the core name? Or at least add a core-name option to the API? -- This message was sent by Atlassian JIRA (v6.1.4#6159) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
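For context, this is the kind of request that needs a stable core name: the join query parser's fromIndex must point at a core on the same node, which is exactly what the auto-generated foo_shardX_replicaY names make awkward. The core and field names below are invented for illustration; only the {!join} syntax itself is standard.

{code}
http://localhost:8983/solr/providersearch/select?q={!join from=provider_id to=id fromIndex=provider}state:CO
{code}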
[jira] [Commented] (SOLR-5473) Make one state.json per collection
[ https://issues.apache.org/jira/browse/SOLR-5473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13843933#comment-13843933 ] Noble Paul commented on SOLR-5473: -- [~timp74] The allCollections set will store ALL collections. If you are looking at trunk, there are no external collections there yet. Please apply the patch and check. Make one state.json per collection -- Key: SOLR-5473 URL: https://issues.apache.org/jira/browse/SOLR-5473 Project: Solr Issue Type: Sub-task Components: SolrCloud Reporter: Noble Paul Assignee: Noble Paul Attachments: SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch As defined in the parent issue, store the states of each collection under /collections/collectionname/state.json node -- This message was sent by Atlassian JIRA (v6.1.4#6159) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5473) Make one state.json per collection
[ https://issues.apache.org/jira/browse/SOLR-5473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13843935#comment-13843935 ] Noble Paul commented on SOLR-5473: -- bq. if(debugState Thanks for the suggestion. However, I added it for my dev testing; it will be removed before commit. Make one state.json per collection -- Key: SOLR-5473 URL: https://issues.apache.org/jira/browse/SOLR-5473 Project: Solr Issue Type: Sub-task Components: SolrCloud Reporter: Noble Paul Assignee: Noble Paul Attachments: SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch As defined in the parent issue, store the states of each collection under /collections/collectionname/state.json node -- This message was sent by Atlassian JIRA (v6.1.4#6159) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (SOLR-5543) solr.xml duplicate entries after SWAP 4.6
Bill Bell created SOLR-5543: --- Summary: solr.xml duplicate entries after SWAP 4.6 Key: SOLR-5543 URL: https://issues.apache.org/jira/browse/SOLR-5543 Project: Solr Issue Type: Bug Affects Versions: 4.6 Reporter: Bill Bell We are having issues with CoreAdmin SWAP in 4.6. Using legacy solr.xml we issue a CoreAdmin SWAP, and we want it persistent. It had been running flawlessly since 4.5. Now it creates duplicate lines in solr.xml. Even the example multi-core setup doesn't work with persistent=true - it creates duplicate lines in solr.xml:

<cores adminPath="/admin/cores">
  <core name="autosuggest" loadOnStartup="true" instanceDir="autosuggest" transient="false"/>
  <core name="citystateprovider" loadOnStartup="true" instanceDir="citystateprovider" transient="false"/>
  <core name="collection1" loadOnStartup="true" instanceDir="collection1" transient="false"/>
  <core name="facility" loadOnStartup="true" instanceDir="facility" transient="false"/>
  <core name="inactiveproviders" loadOnStartup="true" instanceDir="inactiveproviders" transient="false"/>
  <core name="linesvcgeo" instanceDir="linesvcgeo" loadOnStartup="true" transient="false"/>
  <core name="linesvcgeofull" instanceDir="linesvcgeofull" loadOnStartup="true" transient="false"/>
  <core name="locationgeo" loadOnStartup="true" instanceDir="locationgeo" transient="false"/>
  <core name="market" loadOnStartup="true" instanceDir="market" transient="false"/>
  <core name="portalprovider" loadOnStartup="true" instanceDir="portalprovider" transient="false"/>
  <core name="practice" loadOnStartup="true" instanceDir="practice" transient="false"/>
  <core name="provider" loadOnStartup="true" instanceDir="provider" transient="false"/>
  <core name="providersearch" loadOnStartup="true" instanceDir="providersearch" transient="false"/>
  <core name="tridioncomponents" loadOnStartup="true" instanceDir="tridioncomponents" transient="false"/>
  <core name="linesvcgeo" instanceDir="linesvcgeo" loadOnStartup="true" transient="false"/>
  <core name="linesvcgeofull" instanceDir="linesvcgeofull" loadOnStartup="true" transient="false"/>
</cores>

-- This message was sent by Atlassian JIRA (v6.1.4#6159) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
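For anyone trying to reproduce this, the swap in question is the standard CoreAdmin SWAP call; the core names below are taken from the solr.xml above and the host/port are only an example. With persistent=true, Solr rewrites solr.xml after the swap, which is where the duplicated core lines show up according to the report.

{code}
http://localhost:8983/solr/admin/cores?action=SWAP&core=provider&other=providersearch
{code}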
[jira] [Created] (SOLR-5544) Log spamming DefaultSolrHighlighter
MANISH KUMAR created SOLR-5544: -- Summary: Log spamming DefaultSolrHighlighter Key: SOLR-5544 URL: https://issues.apache.org/jira/browse/SOLR-5544 Project: Solr Issue Type: Improvement Components: highlighter Affects Versions: 4.0 Reporter: MANISH KUMAR In DefaultSolrHighlighter.java, the method useFastVectorHighlighter has: log.warn("Solr will use Highlighter instead of FastVectorHighlighter because {} field does not store TermPositions and TermOffsets.", fieldName); The above method gets called for each field, and there could be cases where TermPositions and TermOffsets are not stored. The above line causes huge spamming of the logs. It should be at most a DEBUG-level log, which gives the flexibility of turning it off in production environments. -- This message was sent by Atlassian JIRA (v6.1.4#6159) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
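A minimal sketch of the change being asked for, assuming nothing about the rest of DefaultSolrHighlighter beyond the log call quoted above: demote the per-field message from WARN to DEBUG. Solr logs through SLF4J, so the parameterized message stays as-is and the call is cheap when DEBUG is disabled.

{code:java}
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class HighlighterLogLevelSketch {
    private static final Logger log = LoggerFactory.getLogger(HighlighterLogLevelSketch.class);

    // Hypothetical stand-in for the spot in useFastVectorHighlighter() that logs the fallback;
    // the only change proposed in the issue is warn -> debug.
    static void logFallback(String fieldName) {
        log.debug("Solr will use Highlighter instead of FastVectorHighlighter because {} field"
            + " does not store TermPositions and TermOffsets.", fieldName);
    }
}
{code}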
[jira] [Updated] (SOLR-5544) Log spamming DefaultSolrHighlighter
[ https://issues.apache.org/jira/browse/SOLR-5544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] MANISH KUMAR updated SOLR-5544: --- Priority: Minor (was: Major) Log spamming DefaultSolrHighlighter --- Key: SOLR-5544 URL: https://issues.apache.org/jira/browse/SOLR-5544 Project: Solr Issue Type: Improvement Components: highlighter Affects Versions: 4.0 Reporter: MANISH KUMAR Priority: Minor In DefaultSolrHighlighter.java, the method useFastVectorHighlighter has: log.warn("Solr will use Highlighter instead of FastVectorHighlighter because {} field does not store TermPositions and TermOffsets.", fieldName); The above method gets called for each field, and there could be cases where TermPositions and TermOffsets are not stored. The above line causes huge spamming of the logs. It should be at most a DEBUG-level log, which gives the flexibility of turning it off in production environments. -- This message was sent by Atlassian JIRA (v6.1.4#6159) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-5544) Log spamming by DefaultSolrHighlighter
[ https://issues.apache.org/jira/browse/SOLR-5544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] MANISH KUMAR updated SOLR-5544: --- Summary: Log spamming by DefaultSolrHighlighter (was: Log spamming DefaultSolrHighlighter) Log spamming by DefaultSolrHighlighter -- Key: SOLR-5544 URL: https://issues.apache.org/jira/browse/SOLR-5544 Project: Solr Issue Type: Improvement Components: highlighter Affects Versions: 4.0 Reporter: MANISH KUMAR Priority: Minor In DefaultSolrHighlighter.java, the method useFastVectorHighlighter has: log.warn("Solr will use Highlighter instead of FastVectorHighlighter because {} field does not store TermPositions and TermOffsets.", fieldName); The above method gets called for each field, and there could be cases where TermPositions and TermOffsets are not stored. The above line causes huge spamming of the logs. It should be at most a DEBUG-level log, which gives the flexibility of turning it off in production environments. -- This message was sent by Atlassian JIRA (v6.1.4#6159) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org