[jira] [Commented] (SOLR-1632) Distributed IDF
[ https://issues.apache.org/jira/browse/SOLR-1632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13843025#comment-13843025 ] Markus Jelsma commented on SOLR-1632: - It is much faster now, even usable. But I haven't tried it in a larger cluster yet. Distributed IDF --- Key: SOLR-1632 URL: https://issues.apache.org/jira/browse/SOLR-1632 Project: Solr Issue Type: New Feature Components: search Affects Versions: 1.5 Reporter: Andrzej Bialecki Assignee: Mark Miller Fix For: 5.0, 4.7 Attachments: 3x_SOLR-1632_doesntwork.patch, SOLR-1632.patch, SOLR-1632.patch, SOLR-1632.patch, SOLR-1632.patch, SOLR-1632.patch, SOLR-1632.patch, SOLR-1632.patch, SOLR-1632.patch, SOLR-1632.patch, distrib-2.patch, distrib.patch Distributed IDF is a valuable enhancement for distributed search across non-uniform shards. This issue tracks the proposed implementation of an API to support this functionality in Solr. -- This message was sent by Atlassian JIRA (v6.1.4#6159) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
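The idea behind distributed IDF, in brief: with non-uniform shards, each shard's local docFreq skews scoring, so per-shard term statistics have to be merged into collection-wide counts before IDF is computed. A minimal sketch of that aggregation follows; it illustrates the concept only, not the API in the attached patches, and the numbers are invented for the example.

    // Illustration only: merge per-shard term statistics so every shard scores
    // with the same collection-wide IDF. Numbers are made up.
    long[] shardDocFreq = {120, 3, 4500};        // docFreq(term) reported by each shard
    long[] shardMaxDoc  = {10000, 500, 250000};  // maxDoc reported by each shard

    long globalDocFreq = 0, globalMaxDoc = 0;
    for (int i = 0; i < shardDocFreq.length; i++) {
        globalDocFreq += shardDocFreq[i];
        globalMaxDoc  += shardMaxDoc[i];
    }
    // Lucene's classic TF-IDF formula applied to the merged counts:
    double globalIdf = Math.log(globalMaxDoc / (double) (globalDocFreq + 1)) + 1.0;

With that value distributed back to the shards, a rare term no longer scores differently depending on which shard happens to hold the matching documents.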
[jira] [Assigned] (SOLR-3191) field exclusion from fl
[ https://issues.apache.org/jira/browse/SOLR-3191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shalin Shekhar Mangar reassigned SOLR-3191: --- Assignee: (was: Shalin Shekhar Mangar) I don't have time right now to review this. I assigned it to myself because there was a lot of public interest but no assignee. However it looks like a couple of other committers have interest in this issue as well. I can only look at this after a few weeks so if no one takes it up, then I will. field exclusion from fl --- Key: SOLR-3191 URL: https://issues.apache.org/jira/browse/SOLR-3191 Project: Solr Issue Type: Improvement Reporter: Luca Cavanna Priority: Minor Attachments: SOLR-3191.patch, SOLR-3191.patch I think it would be useful to add a way to exclude field from the Solr response. If I have for example 100 stored fields and I want to return all of them but one, it would be handy to list just the field I want to exclude instead of the 99 fields for inclusion through fl. -- This message was sent by Atlassian JIRA (v6.1.4#6159) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
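To make the request concrete: today one has to spell out every wanted field, e.g. fl=f1,f2,f3,...,f99, whereas the improvement would let the request name only the excluded field. The exclusion syntax is not settled in this issue; something along the lines of a hypothetical fl=*,-f100 is what is being asked for.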
[jira] [Commented] (SOLR-5525) deprecate ClusterState#getCollectionStates()
[ https://issues.apache.org/jira/browse/SOLR-5525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13843070#comment-13843070 ] ASF subversion and git services commented on SOLR-5525: --- Commit 1549552 from [~noble.paul] in branch 'dev/trunk' [ https://svn.apache.org/r1549552 ] SOLR-5525 deprecate ClusterState#getCollectionStates() - Key: SOLR-5525 URL: https://issues.apache.org/jira/browse/SOLR-5525 Project: Solr Issue Type: Sub-task Components: SolrCloud Reporter: Noble Paul Assignee: Noble Paul Attachments: SOLR-5525.patch, SOLR-5525.patch This is a very expensive call if there are a large number of collections. Mostly, it is used to check if a collection exists. -- This message was sent by Atlassian JIRA (v6.1.4#6159) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5525) deprecate ClusterState#getCollectionStates()
[ https://issues.apache.org/jira/browse/SOLR-5525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13843074#comment-13843074 ] ASF subversion and git services commented on SOLR-5525: --- Commit 1549554 from [~noble.paul] in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1549554 ] SOLR-5525 deprecate ClusterState#getCollectionStates() - Key: SOLR-5525 URL: https://issues.apache.org/jira/browse/SOLR-5525 Project: Solr Issue Type: Sub-task Components: SolrCloud Reporter: Noble Paul Assignee: Noble Paul Attachments: SOLR-5525.patch, SOLR-5525.patch This is a very expensive call if there are a large number of collections. Mostly, it is used to check if a collection exists. -- This message was sent by Atlassian JIRA (v6.1.4#6159) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-4386) Variable expansion doesn't work in DIH SimplePropertiesWriter's filename
[ https://issues.apache.org/jira/browse/SOLR-4386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13843099#comment-13843099 ] Ryuzo Yamamoto commented on SOLR-4386: -- Hi! Do you have a plan to fix this? I also want to use variable expansion in SimplePropertiesWriter's filename. Variable expansion doesn't work in DIH SimplePropertiesWriter's filename Key: SOLR-4386 URL: https://issues.apache.org/jira/browse/SOLR-4386 Project: Solr Issue Type: Bug Components: contrib - DataImportHandler Affects Versions: 4.1 Reporter: Jonas Birgander Assignee: Shalin Shekhar Mangar Labels: dataimport Attachments: SOLR-4386.patch I'm testing Solr 4.1, but I've run into some problems with DataImportHandler's new propertyWriter tag. I'm trying to use variable expansion in the `filename` field when using SimplePropertiesWriter. Here are the relevant parts of my configuration:

conf/solrconfig.xml
-------------------
<requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">db-data-config.xml</str>
  </lst>
  <lst name="invariants">
    <!-- country_code is available -->
    <str name="country_code">${country_code}</str>
    <!-- In the real config, more variables are set here -->
  </lst>
</requestHandler>

conf/db-data-config.xml
-----------------------
<dataConfig>
  <propertyWriter dateFormat="yyyy-MM-dd HH:mm:ss" type="SimplePropertiesWriter" directory="conf"
                  filename="${dataimporter.request.country_code}.dataimport.properties" />
  <dataSource type="JdbcDataSource" driver="${dataimporter.request.db_driver}" url="${dataimporter.request.db_url}"
              user="${dataimporter.request.db_user}" password="${dataimporter.request.db_password}"
              batchSize="${dataimporter.request.db_batch_size}" />
  <document>
    <entity name="item" query="my normal SQL, not really relevant -- country=${dataimporter.request.country_code}">
      <field column="id"/>
      <!-- ...more field tags... -->
      <field column="$deleteDocById"/>
      <field column="$skipDoc"/>
    </entity>
  </document>
</dataConfig>

If country_code is set to gb, I want the last_index_time to be read and written in the file conf/gb.dataimport.properties, instead of the default conf/dataimport.properties. The variable expansion works perfectly in the SQL and setup of the data source, but not in the property writer's filename field. When initiating an import, the log file shows:

Jan 30, 2013 11:25:42 AM org.apache.solr.handler.dataimport.DataImporter maybeReloadConfiguration
INFO: Loading DIH Configuration: db-data-config.xml
Jan 30, 2013 11:25:42 AM org.apache.solr.handler.dataimport.config.ConfigParseUtil verifyWithSchema
INFO: The field :$skipDoc present in DataConfig does not have a counterpart in Solr Schema
Jan 30, 2013 11:25:42 AM org.apache.solr.handler.dataimport.config.ConfigParseUtil verifyWithSchema
INFO: The field :$deleteDocById present in DataConfig does not have a counterpart in Solr Schema
Jan 30, 2013 11:25:42 AM org.apache.solr.handler.dataimport.DataImporter loadDataConfig
INFO: Data Configuration loaded successfully
Jan 30, 2013 11:25:42 AM org.apache.solr.handler.dataimport.DataImporter doFullImport
INFO: Starting Full Import
Jan 30, 2013 11:25:42 AM org.apache.solr.handler.dataimport.SimplePropertiesWriter readIndexerProperties
WARNING: Unable to read: ${dataimporter.request.country_code}.dataimport.properties

-- This message was sent by Atlassian JIRA (v6.1.4#6159) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
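What the reporter expects, expressed as a standalone sketch (this is not DIH's internal code, only an illustration of the substitution the filename attribute should receive from the request parameters):

    import java.util.HashMap;
    import java.util.Map;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class FilenameExpansionSketch {
        // Replace ${dataimporter.request.<param>} placeholders with request parameter values.
        static String expand(String template, Map<String, String> requestParams) {
            Matcher m = Pattern.compile("\\$\\{dataimporter\\.request\\.([^}]+)\\}").matcher(template);
            StringBuffer out = new StringBuffer();
            while (m.find()) {
                String value = requestParams.containsKey(m.group(1)) ? requestParams.get(m.group(1)) : m.group(0);
                m.appendReplacement(out, Matcher.quoteReplacement(value));
            }
            m.appendTail(out);
            return out.toString();
        }

        public static void main(String[] args) {
            Map<String, String> params = new HashMap<String, String>();
            params.put("country_code", "gb");
            // Prints gb.dataimport.properties -- the file the writer should then use under conf/
            System.out.println(expand("${dataimporter.request.country_code}.dataimport.properties", params));
        }
    }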
[jira] [Commented] (SOLR-5525) deprecate ClusterState#getCollectionStates()
[ https://issues.apache.org/jira/browse/SOLR-5525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13843217#comment-13843217 ] ASF subversion and git services commented on SOLR-5525: --- Commit 1549591 from [~noble.paul] in branch 'dev/trunk' [ https://svn.apache.org/r1549591 ] SOLR-5525 deprecate ClusterState#getCollectionStates() - Key: SOLR-5525 URL: https://issues.apache.org/jira/browse/SOLR-5525 Project: Solr Issue Type: Sub-task Components: SolrCloud Reporter: Noble Paul Assignee: Noble Paul Attachments: SOLR-5525.patch, SOLR-5525.patch This is a very expensive call if there are a large number of collections. Mostly, it is used to check if a collection exists. -- This message was sent by Atlassian JIRA (v6.1.4#6159) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5525) deprecate ClusterState#getCollectionStates()
[ https://issues.apache.org/jira/browse/SOLR-5525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13843220#comment-13843220 ] ASF subversion and git services commented on SOLR-5525: --- Commit 1549592 from [~noble.paul] in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1549592 ] SOLR-5525 deprecate ClusterState#getCollectionStates() - Key: SOLR-5525 URL: https://issues.apache.org/jira/browse/SOLR-5525 Project: Solr Issue Type: Sub-task Components: SolrCloud Reporter: Noble Paul Assignee: Noble Paul Attachments: SOLR-5525.patch, SOLR-5525.patch This is a very expensive call if there are a large number of collections. Mostly, it is used to check if a collection exists. -- This message was sent by Atlassian JIRA (v6.1.4#6159) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (SOLR-5525) deprecate ClusterState#getCollectionStates()
[ https://issues.apache.org/jira/browse/SOLR-5525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Noble Paul resolved SOLR-5525. -- Resolution: Fixed deprecate ClusterState#getCollectionStates() - Key: SOLR-5525 URL: https://issues.apache.org/jira/browse/SOLR-5525 Project: Solr Issue Type: Sub-task Components: SolrCloud Reporter: Noble Paul Assignee: Noble Paul Attachments: SOLR-5525.patch, SOLR-5525.patch This is a very expensive call if there are a large number of collections. Mostly, it is used to check if a collection exists. -- This message was sent by Atlassian JIRA (v6.1.4#6159) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
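For callers, the practical upshot is to avoid materialising every collection's state when only an existence test is needed. A hedged sketch follows; the method names match what ClusterState exposes in recent versions, but verify them against the release you are on.

    // Sketch: given a ZkStateReader zkStateReader, test for one collection directly
    // instead of iterating the result of the deprecated getCollectionStates().
    ClusterState clusterState = zkStateReader.getClusterState();
    if (clusterState.hasCollection("mycollection")) {
        DocCollection coll = clusterState.getCollection("mycollection"); // state of just this collection
        // ... use coll ...
    }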
[jira] [Updated] (SOLR-5473) Make one state.json per collection
[ https://issues.apache.org/jira/browse/SOLR-5473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Noble Paul updated SOLR-5473: - Attachment: SOLR-5473.patch a couple of tests fail Make one state.json per collection -- Key: SOLR-5473 URL: https://issues.apache.org/jira/browse/SOLR-5473 Project: Solr Issue Type: Sub-task Components: SolrCloud Reporter: Noble Paul Assignee: Noble Paul Attachments: SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch As defined in the parent issue, store the states of each collection under /collections/collectionname/state.json node -- This message was sent by Atlassian JIRA (v6.1.4#6159) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
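For orientation, the layout change this sub-task describes, relative to the single shared znode used today:

    before: /clusterstate.json                        (one znode holding the state of every collection)
    after:  /collections/<collectionname>/state.json  (one znode per collection, as per the parent issue)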
[jira] [Commented] (SOLR-1301) Add a Solr contrib that allows for building Solr indexes via Hadoop's Map-Reduce.
[ https://issues.apache.org/jira/browse/SOLR-1301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13843343#comment-13843343 ] Mark Miller commented on SOLR-1301: --- bq. if we need some of the classes this jar provides, we should declare direct dependencies on the appropriate artifacts. Right - Wolfgang likely knows best when it comes to Morphlines.. At a minimum we should pull the necessary jars in explicitly I think. I've got to take a look at what they are. Add a Solr contrib that allows for building Solr indexes via Hadoop's Map-Reduce. - Key: SOLR-1301 URL: https://issues.apache.org/jira/browse/SOLR-1301 Project: Solr Issue Type: New Feature Reporter: Andrzej Bialecki Assignee: Mark Miller Fix For: 5.0, 4.7 Attachments: README.txt, SOLR-1301-hadoop-0-20.patch, SOLR-1301-hadoop-0-20.patch, SOLR-1301-maven-intellij.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SolrRecordWriter.java, commons-logging-1.0.4.jar, commons-logging-api-1.0.4.jar, hadoop-0.19.1-core.jar, hadoop-0.20.1-core.jar, hadoop-core-0.20.2-cdh3u3.jar, hadoop.patch, log4j-1.2.15.jar This patch contains a contrib module that provides distributed indexing (using Hadoop) to Solr EmbeddedSolrServer. The idea behind this module is twofold: * provide an API that is familiar to Hadoop developers, i.e. that of OutputFormat * avoid unnecessary export and (de)serialization of data maintained on HDFS. SolrOutputFormat consumes data produced by reduce tasks directly, without storing it in intermediate files. Furthermore, by using an EmbeddedSolrServer, the indexing task is split into as many parts as there are reducers, and the data to be indexed is not sent over the network. Design -- Key/value pairs produced by reduce tasks are passed to SolrOutputFormat, which in turn uses SolrRecordWriter to write this data. SolrRecordWriter instantiates an EmbeddedSolrServer, and it also instantiates an implementation of SolrDocumentConverter, which is responsible for turning Hadoop (key, value) into a SolrInputDocument. This data is then added to a batch, which is periodically submitted to EmbeddedSolrServer. When reduce task completes, and the OutputFormat is closed, SolrRecordWriter calls commit() and optimize() on the EmbeddedSolrServer. The API provides facilities to specify an arbitrary existing solr.home directory, from which the conf/ and lib/ files will be taken. This process results in the creation of as many partial Solr home directories as there were reduce tasks. The output shards are placed in the output directory on the default filesystem (e.g. HDFS). Such part-N directories can be used to run N shard servers. Additionally, users can specify the number of reduce tasks, in particular 1 reduce task, in which case the output will consist of a single shard. An example application is provided that processes large CSV files and uses this API. It uses a custom CSV processing to avoid (de)serialization overhead. This patch relies on hadoop-core-0.19.1.jar - I attached the jar to this issue, you should put it in contrib/hadoop/lib. Note: the development of this patch was sponsored by an anonymous contributor and approved for release under Apache License. 
-- This message was sent by Atlassian JIRA (v6.1.4#6159) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
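To make the Design section above concrete, here is a hedged sketch of the converter role it describes, turning a Hadoop (key, value) pair into a SolrInputDocument. The class, method name and generics are assumptions for illustration; the SolrDocumentConverter interface in the attached patches may look different.

    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.solr.common.SolrInputDocument;

    public class CsvDocumentConverter {
        // One Hadoop record (file offset, CSV line) becomes one Solr document.
        public SolrInputDocument convert(LongWritable key, Text value) {
            String[] cols = value.toString().split(",");
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", cols[0]);
            if (cols.length > 1) {
                doc.addField("name_s", cols[1]);
            }
            return doc;
        }
    }

SolrRecordWriter would batch documents produced this way and feed them to the EmbeddedSolrServer described above, so the indexing work stays inside each reduce task.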
[jira] [Commented] (SOLR-5467) Provide Solr Ref Guide in .epub format
[ https://issues.apache.org/jira/browse/SOLR-5467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13843413#comment-13843413 ] Hoss Man commented on SOLR-5467: Thread where this initially came up: https://mail-archives.apache.org/mod_mbox/lucene-solr-user/201311.mbox/%3c528a1321.4060...@hebis.uni-frankfurt.de%3E Provide Solr Ref Guide in .epub format -- Key: SOLR-5467 URL: https://issues.apache.org/jira/browse/SOLR-5467 Project: Solr Issue Type: Wish Components: documentation Reporter: Cassandra Targett From the solr-user list, a request for an .epub version of the Solr Ref Guide. There are two possible approaches that immediately come to mind: * Ask infra to install a plugin that automatically outputs the Confluence pages in .epub * Investigate converting HTML export to .epub with something like calibre There might be other options, and there would be additional issues for automating the process of creation and publication, so for now just recording the request with an issue. -- This message was sent by Atlassian JIRA (v6.1.4#6159) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-1301) Add a Solr contrib that allows for building Solr indexes via Hadoop's Map-Reduce.
[ https://issues.apache.org/jira/browse/SOLR-1301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13843443#comment-13843443 ] wolfgang hoschek commented on SOLR-1301: I'm not aware of anything needing jersey except perhaps hadoop pulls that in. The combined dependencies of all morphline modules is here: http://cloudera.github.io/cdk/docs/current/cdk-morphlines/cdk-morphlines-all/dependencies.html The dependencies of each individual morphline modules is here: http://cloudera.github.io/cdk/docs/current/cdk-morphlines/cdk-morphlines-all/dependencies.html The source and POMs are here, as usual: https://github.com/cloudera/cdk/tree/master/cdk-morphlines By the way, a somewhat separate issue is that it seems to me that the ivy dependences for solr-morphlines-core and solr-morphlines-cell and solr-map-reduce are a bit backwards upstream in that solr-morphlines-core pulls in a ton of dependencies that it doesn't need, and those deps should rather be pulled in by the solr-map-reduce (which is a essentially an out-of-the-box app). Would be good to organize ivy and mvn upstream in such a way that * solr-map-reduce should depend on solr-morphlines-cell plus cdk-morphlines-all plus xyz * solr-morphlines-cell should depend on solr-morphlines-core plus xyz * solr-morphlines-core should depend on cdk-morphlines-core plus xyz More concretely, FWIW, to see how the deps look like in production releases downstream review the following POMs: https://github.com/cloudera/cdk/blob/master/cdk-morphlines/cdk-morphlines-solr-core/pom.xml and https://github.com/cloudera/cdk/blob/master/cdk-morphlines/cdk-morphlines-solr-cell/pom.xml and https://github.com/cloudera/search/blob/master_1.1.0/search-mr/pom.xml Add a Solr contrib that allows for building Solr indexes via Hadoop's Map-Reduce. - Key: SOLR-1301 URL: https://issues.apache.org/jira/browse/SOLR-1301 Project: Solr Issue Type: New Feature Reporter: Andrzej Bialecki Assignee: Mark Miller Fix For: 5.0, 4.7 Attachments: README.txt, SOLR-1301-hadoop-0-20.patch, SOLR-1301-hadoop-0-20.patch, SOLR-1301-maven-intellij.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SolrRecordWriter.java, commons-logging-1.0.4.jar, commons-logging-api-1.0.4.jar, hadoop-0.19.1-core.jar, hadoop-0.20.1-core.jar, hadoop-core-0.20.2-cdh3u3.jar, hadoop.patch, log4j-1.2.15.jar This patch contains a contrib module that provides distributed indexing (using Hadoop) to Solr EmbeddedSolrServer. The idea behind this module is twofold: * provide an API that is familiar to Hadoop developers, i.e. that of OutputFormat * avoid unnecessary export and (de)serialization of data maintained on HDFS. SolrOutputFormat consumes data produced by reduce tasks directly, without storing it in intermediate files. Furthermore, by using an EmbeddedSolrServer, the indexing task is split into as many parts as there are reducers, and the data to be indexed is not sent over the network. Design -- Key/value pairs produced by reduce tasks are passed to SolrOutputFormat, which in turn uses SolrRecordWriter to write this data. SolrRecordWriter instantiates an EmbeddedSolrServer, and it also instantiates an implementation of SolrDocumentConverter, which is responsible for turning Hadoop (key, value) into a SolrInputDocument. 
This data is then added to a batch, which is periodically submitted to EmbeddedSolrServer. When reduce task completes, and the OutputFormat is closed, SolrRecordWriter calls commit() and optimize() on the EmbeddedSolrServer. The API provides facilities to specify an arbitrary existing solr.home directory, from which the conf/ and lib/ files will be taken. This process results in the creation of as many partial Solr home directories as there were reduce tasks. The output shards are placed in the output directory on the default filesystem (e.g. HDFS). Such part-N directories can be used to run N shard servers. Additionally, users can specify the number of reduce tasks, in particular 1 reduce task, in which case the output will consist of a single shard. An example application is provided that processes large CSV files and uses this API. It uses a custom CSV processing to avoid (de)serialization overhead. This patch relies on hadoop-core-0.19.1.jar - I attached the jar to this issue, you should put it in contrib/hadoop/lib. Note: the development of this patch was sponsored by an
[jira] [Comment Edited] (SOLR-1301) Add a Solr contrib that allows for building Solr indexes via Hadoop's Map-Reduce.
[ https://issues.apache.org/jira/browse/SOLR-1301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13843443#comment-13843443 ] wolfgang hoschek edited comment on SOLR-1301 at 12/9/13 7:30 PM: - I'm not aware of anything needing jersey except perhaps hadoop pulls that in. The combined dependencies of all morphline modules is here: http://cloudera.github.io/cdk/docs/current/cdk-morphlines/cdk-morphlines-all/dependencies.html The dependencies of each individual morphline modules is here: http://cloudera.github.io/cdk/docs/current/dependencies.html The source and POMs are here, as usual: https://github.com/cloudera/cdk/tree/master/cdk-morphlines By the way, a somewhat separate issue is that it seems to me that the ivy dependences for solr-morphlines-core and solr-morphlines-cell and solr-map-reduce are a bit backwards upstream in that currently solr-morphlines-core pulls in a ton of dependencies that it doesn't need, and those deps should rather be pulled in by the solr-map-reduce (which is a essentially an out-of-the-box app that bundles user level deps). Correspondingly, would be good to organize ivy and mvn upstream in such a way that * solr-map-reduce should depend on solr-morphlines-cell plus cdk-morphlines-all minus cdk-morphlines-solr-cell (now upstream) minus cdk-morphlines-solr-core (now upstream) plus xyz * solr-morphlines-cell should depend on solr-morphlines-core plus xyz * solr-morphlines-core should depend on cdk-morphlines-core plus xyz More concretely, FWIW, to see how the deps look like in production releases downstream review the following POMs: https://github.com/cloudera/cdk/blob/master/cdk-morphlines/cdk-morphlines-solr-core/pom.xml and https://github.com/cloudera/cdk/blob/master/cdk-morphlines/cdk-morphlines-solr-cell/pom.xml and https://github.com/cloudera/search/blob/master_1.1.0/search-mr/pom.xml was (Author: whoschek): I'm not aware of anything needing jersey except perhaps hadoop pulls that in. The combined dependencies of all morphline modules is here: http://cloudera.github.io/cdk/docs/current/cdk-morphlines/cdk-morphlines-all/dependencies.html The dependencies of each individual morphline modules is here: http://cloudera.github.io/cdk/docs/current/cdk-morphlines/cdk-morphlines-all/dependencies.html The source and POMs are here, as usual: https://github.com/cloudera/cdk/tree/master/cdk-morphlines By the way, a somewhat separate issue is that it seems to me that the ivy dependences for solr-morphlines-core and solr-morphlines-cell and solr-map-reduce are a bit backwards upstream in that solr-morphlines-core pulls in a ton of dependencies that it doesn't need, and those deps should rather be pulled in by the solr-map-reduce (which is a essentially an out-of-the-box app). Would be good to organize ivy and mvn upstream in such a way that * solr-map-reduce should depend on solr-morphlines-cell plus cdk-morphlines-all plus xyz * solr-morphlines-cell should depend on solr-morphlines-core plus xyz * solr-morphlines-core should depend on cdk-morphlines-core plus xyz More concretely, FWIW, to see how the deps look like in production releases downstream review the following POMs: https://github.com/cloudera/cdk/blob/master/cdk-morphlines/cdk-morphlines-solr-core/pom.xml and https://github.com/cloudera/cdk/blob/master/cdk-morphlines/cdk-morphlines-solr-cell/pom.xml and https://github.com/cloudera/search/blob/master_1.1.0/search-mr/pom.xml Add a Solr contrib that allows for building Solr indexes via Hadoop's Map-Reduce. 
- Key: SOLR-1301 URL: https://issues.apache.org/jira/browse/SOLR-1301 Project: Solr Issue Type: New Feature Reporter: Andrzej Bialecki Assignee: Mark Miller Fix For: 5.0, 4.7 Attachments: README.txt, SOLR-1301-hadoop-0-20.patch, SOLR-1301-hadoop-0-20.patch, SOLR-1301-maven-intellij.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SolrRecordWriter.java, commons-logging-1.0.4.jar, commons-logging-api-1.0.4.jar, hadoop-0.19.1-core.jar, hadoop-0.20.1-core.jar, hadoop-core-0.20.2-cdh3u3.jar, hadoop.patch, log4j-1.2.15.jar This patch contains a contrib module that provides distributed indexing (using Hadoop) to Solr EmbeddedSolrServer. The idea behind this module is twofold: * provide an API that is familiar to Hadoop developers, i.e. that of OutputFormat * avoid unnecessary export and (de)serialization of data maintained on HDFS.
[jira] [Commented] (SOLR-5473) Make one state.json per collection
[ https://issues.apache.org/jira/browse/SOLR-5473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13843462#comment-13843462 ] Mark Miller commented on SOLR-5473: --- bq. if(debugState Best to do that with debug logging level rather than introduce a debug sys prop for this class. Make one state.json per collection -- Key: SOLR-5473 URL: https://issues.apache.org/jira/browse/SOLR-5473 Project: Solr Issue Type: Sub-task Components: SolrCloud Reporter: Noble Paul Assignee: Noble Paul Attachments: SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch As defined in the parent issue, store the states of each collection under /collections/collectionname/state.json node -- This message was sent by Atlassian JIRA (v6.1.4#6159) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
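For reference, "debug logging level" here means the standard SLF4J pattern used throughout Solr, gated by the logging configuration rather than a one-off system property; the class name below is only illustrative.

    import org.slf4j.Logger;
    import org.slf4j.LoggerFactory;

    class StateDebugLoggingSketch {                    // illustrative class, not the one in the patch
        private static final Logger log = LoggerFactory.getLogger(StateDebugLoggingSketch.class);

        void publishState(Object clusterState) {
            if (log.isDebugEnabled()) {                // enabled via log config, no sys prop needed
                log.debug("cluster state updated: {}", clusterState);
            }
        }
    }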
[jira] [Created] (SOLR-5542) Global query parameters to facet queries
Isaac Hebsh created SOLR-5542: - Summary: Global query parameters to facet queries Key: SOLR-5542 URL: https://issues.apache.org/jira/browse/SOLR-5542 Project: Solr Issue Type: Improvement Components: search Affects Versions: 4.6 Reporter: Isaac Hebsh (From the Mailing List) It seems that a facet query does not use the global query parameters (for example, field aliasing for the edismax parser). We make intensive use of facet queries (in some cases, we have a lot of facet.query parameters for a single q), and using LocalParams for each facet.query is not convenient. -- This message was sent by Atlassian JIRA (v6.1.4#6159) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
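For reference, the LocalParams workaround mentioned above looks roughly like this (field and query values are illustrative): each facet.query has to repeat the parser and its parameters, e.g. facet.query={!edismax qf='title_t author_t'}smith, even when q already runs through edismax with the same settings globally. That per-facet repetition is what becomes unwieldy when a single request carries many facet.query parameters.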
[jira] [Commented] (SOLR-1301) Add a Solr contrib that allows for building Solr indexes via Hadoop's Map-Reduce.
[ https://issues.apache.org/jira/browse/SOLR-1301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13843496#comment-13843496 ] Steve Rowe commented on SOLR-1301: -- [~whoschek], I'm lost: what do you mean by upstream/downstream? In my experience, upstream refers to a parent project, i.e. one from which the project in question is derived, and downstream is the child/derived project. I don't know the history here, but you seem to be referring to the solr contribs when you say upstream? If that's true, then my understanding of these terms is the opposite of how you're using them. Maybe the question I should be asking is: what is/are the relationship(s) between/among cdk-morphlines-solr-* and solr-morphlines-*? And (I assume) relatedly, how how does cdk-morphlines-all relate to cdk-morphlines-solr-core/-cell? Add a Solr contrib that allows for building Solr indexes via Hadoop's Map-Reduce. - Key: SOLR-1301 URL: https://issues.apache.org/jira/browse/SOLR-1301 Project: Solr Issue Type: New Feature Reporter: Andrzej Bialecki Assignee: Mark Miller Fix For: 5.0, 4.7 Attachments: README.txt, SOLR-1301-hadoop-0-20.patch, SOLR-1301-hadoop-0-20.patch, SOLR-1301-maven-intellij.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SolrRecordWriter.java, commons-logging-1.0.4.jar, commons-logging-api-1.0.4.jar, hadoop-0.19.1-core.jar, hadoop-0.20.1-core.jar, hadoop-core-0.20.2-cdh3u3.jar, hadoop.patch, log4j-1.2.15.jar This patch contains a contrib module that provides distributed indexing (using Hadoop) to Solr EmbeddedSolrServer. The idea behind this module is twofold: * provide an API that is familiar to Hadoop developers, i.e. that of OutputFormat * avoid unnecessary export and (de)serialization of data maintained on HDFS. SolrOutputFormat consumes data produced by reduce tasks directly, without storing it in intermediate files. Furthermore, by using an EmbeddedSolrServer, the indexing task is split into as many parts as there are reducers, and the data to be indexed is not sent over the network. Design -- Key/value pairs produced by reduce tasks are passed to SolrOutputFormat, which in turn uses SolrRecordWriter to write this data. SolrRecordWriter instantiates an EmbeddedSolrServer, and it also instantiates an implementation of SolrDocumentConverter, which is responsible for turning Hadoop (key, value) into a SolrInputDocument. This data is then added to a batch, which is periodically submitted to EmbeddedSolrServer. When reduce task completes, and the OutputFormat is closed, SolrRecordWriter calls commit() and optimize() on the EmbeddedSolrServer. The API provides facilities to specify an arbitrary existing solr.home directory, from which the conf/ and lib/ files will be taken. This process results in the creation of as many partial Solr home directories as there were reduce tasks. The output shards are placed in the output directory on the default filesystem (e.g. HDFS). Such part-N directories can be used to run N shard servers. Additionally, users can specify the number of reduce tasks, in particular 1 reduce task, in which case the output will consist of a single shard. An example application is provided that processes large CSV files and uses this API. 
It uses a custom CSV processing to avoid (de)serialization overhead. This patch relies on hadoop-core-0.19.1.jar - I attached the jar to this issue, you should put it in contrib/hadoop/lib. Note: the development of this patch was sponsored by an anonymous contributor and approved for release under Apache License. -- This message was sent by Atlassian JIRA (v6.1.4#6159) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-1301) Add a Solr contrib that allows for building Solr indexes via Hadoop's Map-Reduce.
[ https://issues.apache.org/jira/browse/SOLR-1301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13843524#comment-13843524 ] Steve Rowe commented on SOLR-1301: -- bq. And (I assume) relatedly, how how does cdk-morphlines-all relate to cdk-morphlines-solr-core/-cell? I can answer this one myself from [https://github.com/cloudera/cdk/blob/master/cdk-morphlines/cdk-morphlines-all/pom.xml]: it's an aggregation-only modules that depends on all of the cdk-morphlines-* modules. Add a Solr contrib that allows for building Solr indexes via Hadoop's Map-Reduce. - Key: SOLR-1301 URL: https://issues.apache.org/jira/browse/SOLR-1301 Project: Solr Issue Type: New Feature Reporter: Andrzej Bialecki Assignee: Mark Miller Fix For: 5.0, 4.7 Attachments: README.txt, SOLR-1301-hadoop-0-20.patch, SOLR-1301-hadoop-0-20.patch, SOLR-1301-maven-intellij.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SolrRecordWriter.java, commons-logging-1.0.4.jar, commons-logging-api-1.0.4.jar, hadoop-0.19.1-core.jar, hadoop-0.20.1-core.jar, hadoop-core-0.20.2-cdh3u3.jar, hadoop.patch, log4j-1.2.15.jar This patch contains a contrib module that provides distributed indexing (using Hadoop) to Solr EmbeddedSolrServer. The idea behind this module is twofold: * provide an API that is familiar to Hadoop developers, i.e. that of OutputFormat * avoid unnecessary export and (de)serialization of data maintained on HDFS. SolrOutputFormat consumes data produced by reduce tasks directly, without storing it in intermediate files. Furthermore, by using an EmbeddedSolrServer, the indexing task is split into as many parts as there are reducers, and the data to be indexed is not sent over the network. Design -- Key/value pairs produced by reduce tasks are passed to SolrOutputFormat, which in turn uses SolrRecordWriter to write this data. SolrRecordWriter instantiates an EmbeddedSolrServer, and it also instantiates an implementation of SolrDocumentConverter, which is responsible for turning Hadoop (key, value) into a SolrInputDocument. This data is then added to a batch, which is periodically submitted to EmbeddedSolrServer. When reduce task completes, and the OutputFormat is closed, SolrRecordWriter calls commit() and optimize() on the EmbeddedSolrServer. The API provides facilities to specify an arbitrary existing solr.home directory, from which the conf/ and lib/ files will be taken. This process results in the creation of as many partial Solr home directories as there were reduce tasks. The output shards are placed in the output directory on the default filesystem (e.g. HDFS). Such part-N directories can be used to run N shard servers. Additionally, users can specify the number of reduce tasks, in particular 1 reduce task, in which case the output will consist of a single shard. An example application is provided that processes large CSV files and uses this API. It uses a custom CSV processing to avoid (de)serialization overhead. This patch relies on hadoop-core-0.19.1.jar - I attached the jar to this issue, you should put it in contrib/hadoop/lib. Note: the development of this patch was sponsored by an anonymous contributor and approved for release under Apache License. 
-- This message was sent by Atlassian JIRA (v6.1.4#6159) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-1301) Add a Solr contrib that allows for building Solr indexes via Hadoop's Map-Reduce.
[ https://issues.apache.org/jira/browse/SOLR-1301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13843523#comment-13843523 ] wolfgang hoschek commented on SOLR-1301: Apologies for the confusion. We are upstreaming cdk-morphlines-solr-cell into the solr contrib solr-morphlines-cell as well as cdk-morphlines-solr-core into the solr contrib solr-morphlines-core as well as search-mr into the solr contrib solr-map-reduce. Once the upstreaming is done these old modules will go away. Next, downstream will be made identical to upstream plus perhaps some critical fixes as necessary, and the upstream/downstream terms will apply in the way folks usually think about them, but we are not quite yet there today, but getting there... cdk-morphlines-all is simply a convenience pom that includes all the other morphline poms so there's less to type for users who like a bit more auto magic. Add a Solr contrib that allows for building Solr indexes via Hadoop's Map-Reduce. - Key: SOLR-1301 URL: https://issues.apache.org/jira/browse/SOLR-1301 Project: Solr Issue Type: New Feature Reporter: Andrzej Bialecki Assignee: Mark Miller Fix For: 5.0, 4.7 Attachments: README.txt, SOLR-1301-hadoop-0-20.patch, SOLR-1301-hadoop-0-20.patch, SOLR-1301-maven-intellij.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SolrRecordWriter.java, commons-logging-1.0.4.jar, commons-logging-api-1.0.4.jar, hadoop-0.19.1-core.jar, hadoop-0.20.1-core.jar, hadoop-core-0.20.2-cdh3u3.jar, hadoop.patch, log4j-1.2.15.jar This patch contains a contrib module that provides distributed indexing (using Hadoop) to Solr EmbeddedSolrServer. The idea behind this module is twofold: * provide an API that is familiar to Hadoop developers, i.e. that of OutputFormat * avoid unnecessary export and (de)serialization of data maintained on HDFS. SolrOutputFormat consumes data produced by reduce tasks directly, without storing it in intermediate files. Furthermore, by using an EmbeddedSolrServer, the indexing task is split into as many parts as there are reducers, and the data to be indexed is not sent over the network. Design -- Key/value pairs produced by reduce tasks are passed to SolrOutputFormat, which in turn uses SolrRecordWriter to write this data. SolrRecordWriter instantiates an EmbeddedSolrServer, and it also instantiates an implementation of SolrDocumentConverter, which is responsible for turning Hadoop (key, value) into a SolrInputDocument. This data is then added to a batch, which is periodically submitted to EmbeddedSolrServer. When reduce task completes, and the OutputFormat is closed, SolrRecordWriter calls commit() and optimize() on the EmbeddedSolrServer. The API provides facilities to specify an arbitrary existing solr.home directory, from which the conf/ and lib/ files will be taken. This process results in the creation of as many partial Solr home directories as there were reduce tasks. The output shards are placed in the output directory on the default filesystem (e.g. HDFS). Such part-N directories can be used to run N shard servers. Additionally, users can specify the number of reduce tasks, in particular 1 reduce task, in which case the output will consist of a single shard. 
An example application is provided that processes large CSV files and uses this API. It uses a custom CSV processing to avoid (de)serialization overhead. This patch relies on hadoop-core-0.19.1.jar - I attached the jar to this issue, you should put it in contrib/hadoop/lib. Note: the development of this patch was sponsored by an anonymous contributor and approved for release under Apache License. -- This message was sent by Atlassian JIRA (v6.1.4#6159) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Comment Edited] (SOLR-1301) Add a Solr contrib that allows for building Solr indexes via Hadoop's Map-Reduce.
[ https://issues.apache.org/jira/browse/SOLR-1301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13843524#comment-13843524 ] Steve Rowe edited comment on SOLR-1301 at 12/9/13 8:34 PM: --- bq. And (I assume) relatedly, how how does cdk-morphlines-all relate to cdk-morphlines-solr-core/-cell? I can answer this one myself from [https://github.com/cloudera/cdk/blob/master/cdk-morphlines/cdk-morphlines-all/pom.xml]: it's an aggregation-only module that depends on all of the cdk-morphlines-* modules. was (Author: steve_rowe): bq. And (I assume) relatedly, how how does cdk-morphlines-all relate to cdk-morphlines-solr-core/-cell? I can answer this one myself from [https://github.com/cloudera/cdk/blob/master/cdk-morphlines/cdk-morphlines-all/pom.xml]: it's an aggregation-only modules that depends on all of the cdk-morphlines-* modules. Add a Solr contrib that allows for building Solr indexes via Hadoop's Map-Reduce. - Key: SOLR-1301 URL: https://issues.apache.org/jira/browse/SOLR-1301 Project: Solr Issue Type: New Feature Reporter: Andrzej Bialecki Assignee: Mark Miller Fix For: 5.0, 4.7 Attachments: README.txt, SOLR-1301-hadoop-0-20.patch, SOLR-1301-hadoop-0-20.patch, SOLR-1301-maven-intellij.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SolrRecordWriter.java, commons-logging-1.0.4.jar, commons-logging-api-1.0.4.jar, hadoop-0.19.1-core.jar, hadoop-0.20.1-core.jar, hadoop-core-0.20.2-cdh3u3.jar, hadoop.patch, log4j-1.2.15.jar This patch contains a contrib module that provides distributed indexing (using Hadoop) to Solr EmbeddedSolrServer. The idea behind this module is twofold: * provide an API that is familiar to Hadoop developers, i.e. that of OutputFormat * avoid unnecessary export and (de)serialization of data maintained on HDFS. SolrOutputFormat consumes data produced by reduce tasks directly, without storing it in intermediate files. Furthermore, by using an EmbeddedSolrServer, the indexing task is split into as many parts as there are reducers, and the data to be indexed is not sent over the network. Design -- Key/value pairs produced by reduce tasks are passed to SolrOutputFormat, which in turn uses SolrRecordWriter to write this data. SolrRecordWriter instantiates an EmbeddedSolrServer, and it also instantiates an implementation of SolrDocumentConverter, which is responsible for turning Hadoop (key, value) into a SolrInputDocument. This data is then added to a batch, which is periodically submitted to EmbeddedSolrServer. When reduce task completes, and the OutputFormat is closed, SolrRecordWriter calls commit() and optimize() on the EmbeddedSolrServer. The API provides facilities to specify an arbitrary existing solr.home directory, from which the conf/ and lib/ files will be taken. This process results in the creation of as many partial Solr home directories as there were reduce tasks. The output shards are placed in the output directory on the default filesystem (e.g. HDFS). Such part-N directories can be used to run N shard servers. Additionally, users can specify the number of reduce tasks, in particular 1 reduce task, in which case the output will consist of a single shard. An example application is provided that processes large CSV files and uses this API. 
It uses a custom CSV processing to avoid (de)serialization overhead. This patch relies on hadoop-core-0.19.1.jar - I attached the jar to this issue, you should put it in contrib/hadoop/lib. Note: the development of this patch was sponsored by an anonymous contributor and approved for release under Apache License. -- This message was sent by Atlassian JIRA (v6.1.4#6159) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-5541) Allow QueryElevationComponent to accept elevateIds and excludeIds as http parameters
[ https://issues.apache.org/jira/browse/SOLR-5541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joel Bernstein updated SOLR-5541: - Attachment: SOLR-5541.patch Added test case Allow QueryElevationComponent to accept elevateIds and excludeIds as http parameters Key: SOLR-5541 URL: https://issues.apache.org/jira/browse/SOLR-5541 Project: Solr Issue Type: Improvement Components: SearchComponents - other Affects Versions: 4.6 Reporter: Joel Bernstein Assignee: Joel Bernstein Priority: Minor Fix For: 4.7 Attachments: SOLR-5541.patch, SOLR-5541.patch The QueryElevationComponent currently uses an xml file to map query strings to elevateIds and excludeIds. This ticket adds the ability to pass in elevateIds and excludeIds through two new HTTP parameters, elevateIds and excludeIds. This will allow more sophisticated business logic to be used in selecting which ids to elevate/exclude. Proposed syntax: http://localhost:8983/solr/elevate?q=*:*&elevateIds=3,4&excludeIds=6,8 The elevateIds and excludeIds point to the unique document Id. -- This message was sent by Atlassian JIRA (v6.1.4#6159) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
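A hedged usage sketch with SolrJ, assuming the parameter names from the proposal above end up unchanged (host and handler path are illustrative):

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.client.solrj.response.QueryResponse;

    public class ElevateByIdSketch {
        public static void main(String[] args) throws Exception {
            SolrServer solr = new HttpSolrServer("http://localhost:8983/solr");
            SolrQuery query = new SolrQuery("*:*");
            query.setRequestHandler("/elevate");
            query.set("elevateIds", "3,4");   // unique document ids to force to the top
            query.set("excludeIds", "6,8");   // unique document ids to drop from the results
            QueryResponse rsp = solr.query(query);
            System.out.println(rsp.getResults().getNumFound());
        }
    }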
[jira] [Commented] (SOLR-4983) Problematic core naming by collection create API
[ https://issues.apache.org/jira/browse/SOLR-4983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13843571#comment-13843571 ] Mark Miller commented on SOLR-4983: --- bq. could anyone suggest if by creating cores separately (with the same collection name) we would achieve the same effect as creating collection via Collections API? By and large, currently, yes, this is supported. There is a flag that tracks if the collection was created with the collections api or not - and if it is, you will end up being able to use further features in the future - but currently you should be able to use the cores api to do what you want no problem. Problematic core naming by collection create API - Key: SOLR-4983 URL: https://issues.apache.org/jira/browse/SOLR-4983 Project: Solr Issue Type: Bug Components: SolrCloud Reporter: Chris Toomey The SolrCloud collection create API creates cores named foo_shardx_replicay when asked to create collection foo. This is problematic for at least 2 reasons: 1) these ugly core names show up in the core admin UI, and will vary depending on which node is being used, 2) it prevents collections from being used in SolrCloud joins, since join takes a core name as the fromIndex parameter and there's no single core name for the collection. As I've documented in https://issues.apache.org/jira/browse/SOLR-4905 and http://lucene.472066.n3.nabble.com/Joins-with-SolrCloud-tp4073199p4074038.html, SolrCloud join does work when the inner collection (fromIndex) is not sharded, assuming that collection is available and initialized at SolrCloud bootstrap time. Could this be changed to instead use the collection name for the core name? Or at least add a core-name option to the API? -- This message was sent by Atlassian JIRA (v6.1.4#6159) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
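For reference, the core-level route Mark describes uses the CoreAdmin CREATE command with the collection (and optionally shard) parameters, along the lines of http://localhost:8983/solr/admin/cores?action=CREATE&name=mycore1&collection=foo&shard=shard1 (host, core name and shard value are illustrative). Repeating such calls on the desired nodes with the same collection name builds up the collection without going through the Collections API.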
[JENKINS] Lucene-4x-Linux-Java7-64-test-only - Build # 5920 - Failure!
Build: builds.flonkings.com/job/Lucene-4x-Linux-Java7-64-test-only/5920/ 1 tests failed. REGRESSION: org.apache.lucene.index.TestIndexableField.testArbitraryFields Error Message: -60 Stack Trace: java.lang.ArrayIndexOutOfBoundsException: -60 at __randomizedtesting.SeedInfo.seed([516D1CE5843E2B26:7C99D9D3760B2809]:0) at java.util.ArrayList.get(ArrayList.java:324) at org.apache.lucene.search.RandomSimilarityProvider.get(RandomSimilarityProvider.java:106) at org.apache.lucene.search.similarities.PerFieldSimilarityWrapper.computeNorm(PerFieldSimilarityWrapper.java:45) at org.apache.lucene.index.NormsConsumerPerField.finish(NormsConsumerPerField.java:49) at org.apache.lucene.index.DocInverterPerField.processFields(DocInverterPerField.java:201) at org.apache.lucene.index.DocFieldProcessor.processDocument(DocFieldProcessor.java:248) at org.apache.lucene.index.DocumentsWriterPerThread.updateDocument(DocumentsWriterPerThread.java:253) at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:453) at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1520) at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1190) at org.apache.lucene.index.RandomIndexWriter.addDocument(RandomIndexWriter.java:146) at org.apache.lucene.index.RandomIndexWriter.addDocument(RandomIndexWriter.java:108) at org.apache.lucene.index.TestIndexableField.testArbitraryFields(TestIndexableField.java:191) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1559) at com.carrotsearch.randomizedtesting.RandomizedRunner.access$600(RandomizedRunner.java:79) at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:737) at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:773) at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:787) at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50) at org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:51) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49) at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:358) at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:782) at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:442) at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:746) at com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:648) at 
com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:682) at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:693) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:43) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at
[jira] [Commented] (SOLR-5541) Allow QueryElevationComponent to accept elevateIds and excludeIds as http parameters
[ https://issues.apache.org/jira/browse/SOLR-5541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13843573#comment-13843573 ] Mark Miller commented on SOLR-5541: --- +1 One comment: + assertQ("All six should make it", req Should update the copy/paste assert comment - only 5 should make it because b is excluded. Allow QueryElevationComponent to accept elevateIds and excludeIds as http parameters Key: SOLR-5541 URL: https://issues.apache.org/jira/browse/SOLR-5541 Project: Solr Issue Type: Improvement Components: SearchComponents - other Affects Versions: 4.6 Reporter: Joel Bernstein Assignee: Joel Bernstein Priority: Minor Fix For: 4.7 Attachments: SOLR-5541.patch, SOLR-5541.patch The QueryElevationComponent currently uses an xml file to map query strings to elevateIds and excludeIds. This ticket adds the ability to pass in elevateIds and excludeIds through two new HTTP parameters, elevateIds and excludeIds. This will allow more sophisticated business logic to be used in selecting which ids to elevate/exclude. Proposed syntax: http://localhost:8983/solr/elevate?q=*:*&elevateIds=3,4&excludeIds=6,8 The elevateIds and excludeIds point to the unique document Id. -- This message was sent by Atlassian JIRA (v6.1.4#6159) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: [JENKINS] Lucene-4x-Linux-Java7-64-test-only - Build # 5920 - Failure!
nice one we ran into Math.abs(Integer.MIN_VALUE) which returns -1

On Mon, Dec 9, 2013 at 9:24 PM, buil...@flonkings.com wrote:

Build: builds.flonkings.com/job/Lucene-4x-Linux-Java7-64-test-only/5920/

1 tests failed.

REGRESSION: org.apache.lucene.index.TestIndexableField.testArbitraryFields

Error Message: -60

Stack Trace:
java.lang.ArrayIndexOutOfBoundsException: -60
at __randomizedtesting.SeedInfo.seed([516D1CE5843E2B26:7C99D9D3760B2809]:0)
at java.util.ArrayList.get(ArrayList.java:324)
at org.apache.lucene.search.RandomSimilarityProvider.get(RandomSimilarityProvider.java:106)
at org.apache.lucene.search.similarities.PerFieldSimilarityWrapper.computeNorm(PerFieldSimilarityWrapper.java:45)
at org.apache.lucene.index.NormsConsumerPerField.finish(NormsConsumerPerField.java:49)
at org.apache.lucene.index.DocInverterPerField.processFields(DocInverterPerField.java:201)
at org.apache.lucene.index.DocFieldProcessor.processDocument(DocFieldProcessor.java:248)
at org.apache.lucene.index.DocumentsWriterPerThread.updateDocument(DocumentsWriterPerThread.java:253)
at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:453)
at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1520)
at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1190)
at org.apache.lucene.index.RandomIndexWriter.addDocument(RandomIndexWriter.java:146)
at org.apache.lucene.index.RandomIndexWriter.addDocument(RandomIndexWriter.java:108)
at org.apache.lucene.index.TestIndexableField.testArbitraryFields(TestIndexableField.java:191)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1559)
at com.carrotsearch.randomizedtesting.RandomizedRunner.access$600(RandomizedRunner.java:79)
at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:737)
at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:773)
at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:787)
at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
at org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:51)
at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49)
at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70)
at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:358)
at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:782)
at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:442)
at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:746)
at com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:648)
at com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:682)
at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:693)
at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42)
at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at
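For anyone puzzled by the negative index: Math.abs(Integer.MIN_VALUE) overflows and returns Integer.MIN_VALUE itself (still negative), so an index computed as Math.abs(hash) % size can come out below zero, which is what bit RandomSimilarityProvider.get here. The sketch below only illustrates the pitfall and two common workarounds; it is not the fix that was committed.

{code:java}
public class AbsOverflowDemo {
    public static void main(String[] args) {
        int hash = Integer.MIN_VALUE;   // the one int whose absolute value does not fit in an int
        int size = 100;                 // pretend this is the number of candidate similarities

        // Math.abs(Integer.MIN_VALUE) returns Integer.MIN_VALUE, so the modulo below is
        // negative and a subsequent ArrayList.get() throws, like the -60 in the build failure.
        int brokenIndex = Math.abs(hash) % size;
        System.out.println("broken index = " + brokenIndex);   // prints -48

        // Two common workarounds: clear the sign bit, or use Math.floorMod (Java 8+).
        int maskedIndex = (hash & Integer.MAX_VALUE) % size;
        int floorIndex = Math.floorMod(hash, size);
        System.out.println(maskedIndex + " and " + floorIndex + " are both in [0, " + size + ")");
    }
}
{code}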
Re: [JENKINS] Lucene-4x-Linux-Java7-64-test-only - Build # 5920 - Failure!
I committed a fix

On Mon, Dec 9, 2013 at 9:36 PM, Simon Willnauer sim...@apache.org wrote:
nice one we ran into Math.abs(Integer.MIN_VALUE) which returns -1
[jira] [Commented] (SOLR-5541) Allow QueryElevationComponent to accept elevateIds and excludeIds as http parameters
[ https://issues.apache.org/jira/browse/SOLR-5541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13843584#comment-13843584 ] Joel Bernstein commented on SOLR-5541: -- Thanks Mark, I'll fix that up. Allow QueryElevationComponent to accept elevateIds and excludeIds as http parameters Key: SOLR-5541 URL: https://issues.apache.org/jira/browse/SOLR-5541 Project: Solr Issue Type: Improvement Components: SearchComponents - other Affects Versions: 4.6 Reporter: Joel Bernstein Assignee: Joel Bernstein Priority: Minor Fix For: 4.7 Attachments: SOLR-5541.patch, SOLR-5541.patch The QueryElevationComponent currently uses an xml file to map query strings to elevateIds and excludeIds. This ticket adds the ability to pass in elevateIds and excludeIds through two new http parameters elevateIds and excludeIds. This will allow more sophisticated business logic to be used in selecting which ids to elevate/exclude. Proposed syntax: http://localhost:8983/solr/elevate?q=*:*elevatedIds=3,4excludeIds=6,8 The elevateIds and excludeIds point to the unique document Id. -- This message was sent by Atlassian JIRA (v6.1.4#6159) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
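A hedged SolrJ sketch of what exercising the proposed parameters could look like once the patch is in. The handler path, core URL and ids are modeled on the example in the issue description; none of this is part of a committed API yet.

{code:java}
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class ElevateByParamDemo {
    public static void main(String[] args) throws Exception {
        // Core URL and handler path follow the example in the issue description.
        HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr/collection1");

        SolrQuery q = new SolrQuery("*:*");
        q.setRequestHandler("/elevate");   // a handler configured with the QueryElevationComponent
        q.set("elevateIds", "3,4");        // documents to force to the top, by uniqueKey
        q.set("excludeIds", "6,8");        // documents to drop from the results, by uniqueKey

        QueryResponse rsp = solr.query(q);
        System.out.println(rsp.getResults());
        solr.shutdown();
    }
}
{code}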
Re: [JENKINS] Lucene-4x-Linux-Java7-64-test-only - Build # 5920 - Failure!
Hurray for the improbable... :)

D.

On Mon, Dec 9, 2013 at 10:36 PM, Simon Willnauer sim...@apache.org wrote:
nice one we ran into Math.abs(Integer.MIN_VALUE) which returns -1
[jira] [Updated] (SOLR-5541) Allow QueryElevationComponent to accept elevateIds and excludeIds as http parameters
[ https://issues.apache.org/jira/browse/SOLR-5541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joel Bernstein updated SOLR-5541: - Attachment: SOLR-5541.patch Allow QueryElevationComponent to accept elevateIds and excludeIds as http parameters Key: SOLR-5541 URL: https://issues.apache.org/jira/browse/SOLR-5541 Project: Solr Issue Type: Improvement Components: SearchComponents - other Affects Versions: 4.6 Reporter: Joel Bernstein Assignee: Joel Bernstein Priority: Minor Fix For: 4.7 Attachments: SOLR-5541.patch, SOLR-5541.patch, SOLR-5541.patch The QueryElevationComponent currently uses an xml file to map query strings to elevateIds and excludeIds. This ticket adds the ability to pass in elevateIds and excludeIds through two new http parameters elevateIds and excludeIds. This will allow more sophisticated business logic to be used in selecting which ids to elevate/exclude. Proposed syntax: http://localhost:8983/solr/elevate?q=*:*elevatedIds=3,4excludeIds=6,8 The elevateIds and excludeIds point to the unique document Id. -- This message was sent by Atlassian JIRA (v6.1.4#6159) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5364) Review usages of hard-coded Version constants
[ https://issues.apache.org/jira/browse/LUCENE-5364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13843652#comment-13843652 ] ASF subversion and git services commented on LUCENE-5364: - Commit 1549701 from [~steve_rowe] in branch 'dev/trunk' [ https://svn.apache.org/r1549701 ] LUCENE-5364: Replace hard-coded Version.LUCENE_XY that doesn't have to be hard-coded (because of back-compat testing or version dependent behavior, or demo code that should exemplify pinning versions in user code), with Version.LUCENE_CURRENT in non-test code, or with LuceneTestCase.TEST_VERSION_CURRENT in test code; upgrade hard-coded Version.LUCENE_XY constants that should track the next release version to the next release version if they aren't already there, and put a token near them so that they can be found and upgraded when the next release version changes: ':Post-Release-Update-Version.LUCENE_XY:' Review usages of hard-coded Version constants - Key: LUCENE-5364 URL: https://issues.apache.org/jira/browse/LUCENE-5364 Project: Lucene - Core Issue Type: Bug Components: core/other Affects Versions: 5.0, 4.7 Reporter: Steve Rowe Priority: Minor Attachments: LUCENE-5364-branch_4x.patch, LUCENE-5364-trunk.patch, LUCENE-5364-trunk.patch There are some hard-coded {{Version.LUCENE_XY}} constants used in various places. Some of these are intentional and appropriate: * in deprecated code, e.g. {{ArabicLetterTokenizer}}, deprecated in 3.1, uses {{Version.LUCENE_31}} * to make behavior version-dependent (e.g. {{StandardTokenizer}} and other analysis components) * to test different behavior at different points in history (e.g. {{TestStopFilter}} to test position increments) But should hard-coded constants be used elsewhere? For those that should remain, and need to be updated with each release, there should be an easy way to find them. -- This message was sent by Atlassian JIRA (v6.1.4#6159) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
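To illustrate the kind of replacement the commit message describes (this is an invented example, not a diff from the actual commit): a hard-coded constant in non-test code gets swapped for Version.LUCENE_CURRENT, and test code uses LuceneTestCase.TEST_VERSION_CURRENT.

{code:java}
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.util.Version;

public class VersionConstantExample {
    // Before: a hard-coded constant that silently goes stale after each release.
    StandardAnalyzer pinned = new StandardAnalyzer(Version.LUCENE_46);

    // After, in non-test code that has no back-compat reason to pin a version:
    StandardAnalyzer current = new StandardAnalyzer(Version.LUCENE_CURRENT);

    // In test code the equivalent replacement is LuceneTestCase.TEST_VERSION_CURRENT, e.g.
    //   new StandardAnalyzer(LuceneTestCase.TEST_VERSION_CURRENT);
}
{code}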
[jira] [Commented] (LUCENE-5364) Review usages of hard-coded Version constants
[ https://issues.apache.org/jira/browse/LUCENE-5364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13843658#comment-13843658 ] ASF subversion and git services commented on LUCENE-5364: - Commit 1549703 from [~steve_rowe] in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1549703 ] LUCENE-5364: Replace hard-coded Version.LUCENE_XY that doesn't have to be hard-coded (because of back-compat testing or version dependent behavior, or demo code that should exemplify pinning versions in user code), with Version.LUCENE_CURRENT in non-test code, or with LuceneTestCase.TEST_VERSION_CURRENT in test code; upgrade hard-coded Version.LUCENE_XY constants that should track the next release version to the next release version if they aren't already there, and put a token near them so that they can be found and upgraded when the next release version changes: ':Post-Release-Update-Version.LUCENE_XY:' (merge trunk r1549701) Review usages of hard-coded Version constants - Key: LUCENE-5364 URL: https://issues.apache.org/jira/browse/LUCENE-5364 Project: Lucene - Core Issue Type: Bug Components: core/other Affects Versions: 5.0, 4.7 Reporter: Steve Rowe Priority: Minor Attachments: LUCENE-5364-branch_4x.patch, LUCENE-5364-trunk.patch, LUCENE-5364-trunk.patch There are some hard-coded {{Version.LUCENE_XY}} constants used in various places. Some of these are intentional and appropriate: * in deprecated code, e.g. {{ArabicLetterTokenizer}}, deprecated in 3.1, uses {{Version.LUCENE_31}} * to make behavior version-dependent (e.g. {{StandardTokenizer}} and other analysis components) * to test different behavior at different points in history (e.g. {{TestStopFilter}} to test position increments) But should hard-coded constants be used elsewhere? For those that should remain, and need to be updated with each release, there should be an easy way to find them. -- This message was sent by Atlassian JIRA (v6.1.4#6159) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-5463) Provide cursor/token based searchAfter support that works with arbitrary sorting (ie: deep paging)
[ https://issues.apache.org/jira/browse/SOLR-5463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hoss Man updated SOLR-5463: --- Attachment: SOLR-5463__straw_man.patch Ok, updated patch making the change in user semantics I mentioned wanting to try last week. Best way to explain it is with a walk through of a simple example (note: if you try the current strawman code, the numFound and start values returned in the docList don't match what i've pasted in the examples below -- these examples show what the final results should look like in the finished solution) Initial requests using searchAfter should always start with a totem value of {{\*}} {code:title=http://localhost:8983/solr/deep?q=*:*rows=20sort=id+descsearchAfter=*} { responseHeader:{ status:0, QTime:2}, response:{numFound:32,start:-1,docs:[ // ...20 docs here... ] }, nextSearchAfter:AoEjTk9L} {code} The {{nextSearchAfter}} token returned by this request tells us what to use in the second request... {code:title=http://localhost:8983/solr/deep?q=*:*rows=20sort=id+descsearchAfter=AoEjTk9L} { responseHeader:{ status:0, QTime:7}, response:{numFound:32,start:-1,docs:[ // ...12 docs here... ] }, nextSearchAfter:AoEoMDU3OUIwMDI=} {code} Since this result block contains fewer rows then were requested, the client could automatically stop, but the {{nextSearchAfter}} is still returned, and it's still safe to request a subsequent page (this is the fundemental diff from the previous patches, where {{nextSearchAfter}} was set to {{null}} anytime the code could tell there were no more results ... {code:title=http://localhost:8983/solr/deep?q=*:*wt=jsonindent=truerows=20fl=id,pricesort=id+descsearchAfter=AoEoMDU3OUIwMDI=} { responseHeader:{ status:0, QTime:1}, response:{numFound:32,start:-1,docs:[] }, nextSearchAfter:AoEoMDU3OUIwMDI=} {code} Note that in this case, with no docs included in the response, the {{nextSearchAfter}} totem is the same as the input. For some sorts this makes it possible for clients to resume a full walk of all documents matching a query -- picking up where they let off if more documents are added to the index that match (for example: when doing an ascending sort on a numeric uniqueKey field that always increases as new docs are added, sorting by a timestamp field (asc) indicating when documents are crawled, etc...) This also works as you would expect for searches that don't match any documents... {code:title=http://localhost:8983/solr/deep?q=text:bogusrows=20sort=id+descsearchAfter=*} { responseHeader:{ status:0, QTime:21}, response:{numFound:0,start:-1,docs:[] }, nextSearchAfter:*} {code} Provide cursor/token based searchAfter support that works with arbitrary sorting (ie: deep paging) -- Key: SOLR-5463 URL: https://issues.apache.org/jira/browse/SOLR-5463 Project: Solr Issue Type: New Feature Reporter: Hoss Man Assignee: Hoss Man Attachments: SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch I'd like to revist a solution to the problem of deep paging in Solr, leveraging an HTTP based API similar to how IndexSearcher.searchAfter works at the lucene level: require the clients to provide back a token indicating the sort values of the last document seen on the previous page. 
This is similar to the cursor model I've seen in several other REST APIs that support pagnation over a large sets of results (notable the twitter API and it's since_id param) except that we'll want something that works with arbitrary multi-level sort critera that can be either ascending or descending. SOLR-1726 laid some initial ground work here and was commited quite a while ago, but the key bit of argument parsing to leverage it was commented out due to some problems (see comments in that issue). It's also somewhat out of date at this point: at the time it was commited, IndexSearcher only supported searchAfter for simple scores, not arbitrary field sorts; and the params added in SOLR-1726 suffer from this limitation as well. --- I think it would make sense to start fresh with a new issue with a focus on ensuring that we have deep paging which: * supports arbitrary field sorts in addition to sorting by score * works in distributed mode -- This message was sent by Atlassian JIRA (v6.1.4#6159) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail:
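To make the walkthrough above concrete, here is a hedged SolrJ sketch of a client looping over pages with the straw-man parameters (searchAfter in the request, nextSearchAfter in the response). The host, core name and stop condition follow the example semantics described above; the parameter names are later proposed to change (see the cursor discussion below), so treat this as an illustration of the walk, not a final API.

{code:java}
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class DeepPagingWalk {
    public static void main(String[] args) throws Exception {
        HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr/deep");
        String token = "*";       // initial requests always start with the "*" totem
        String previous = null;

        // Per the walkthrough, the token repeats itself once there is nothing left to fetch,
        // so looping until it stops changing walks every matching document exactly once.
        while (!token.equals(previous)) {
            SolrQuery q = new SolrQuery("*:*");
            q.setRows(20);
            q.set("sort", "id desc");
            q.set("searchAfter", token);

            QueryResponse rsp = solr.query(q);
            // ... process rsp.getResults() here ...

            previous = token;
            token = (String) rsp.getResponse().get("nextSearchAfter");
            if (token == null) {
                break;   // defensive only; the straw-man patch always returns the key
            }
        }
        solr.shutdown();
    }
}
{code}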
[jira] [Commented] (SOLR-5463) Provide cursor/token based searchAfter support that works with arbitrary sorting (ie: deep paging)
[ https://issues.apache.org/jira/browse/SOLR-5463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13843687#comment-13843687 ] Hoss Man commented on SOLR-5463: The one significant change i still want to make before abandoming this straw man and moving on to using PaginatingCollector under the covers is to rethink the vocabulary. at the Lucene/IndexSearcher level, this functionality is leveraged using a searchAfter param which indicates the exact FieldDoc returned by a previous search. The name makes a lot of sense in this API given that the FieldDoc you specify is expected to come from a previous search, and you are specifying that you want to search for documents after this document in the ocntext of the specified query/sort. For the Solr request API however, I feel like this terminology might confuse people. I'm concerned people might think they can use the uniqueKey of the last document they got on the previous page (instead of realizing they need to specify the special token they were returned as part of that page). My thinking is that from a user perspective, we should call this functionality a Result Cursor and rename the request param and response key appropriately. something along the lines of... {code:title=http://localhost:8983/solr/deep?q=*:*rows=20sort=id+desccursor=AoEjTk9L} { responseHeader:{ status:0, QTime:7}, response:{numFound:32,start:-1,docs:[ // ... docs here... ] }, cursorContinue:AoEoMDU3OUIwMDI=} {code} * searchAfter = cursor * nextSearchAfter = cursorContinue What do folks think? Provide cursor/token based searchAfter support that works with arbitrary sorting (ie: deep paging) -- Key: SOLR-5463 URL: https://issues.apache.org/jira/browse/SOLR-5463 Project: Solr Issue Type: New Feature Reporter: Hoss Man Assignee: Hoss Man Attachments: SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch I'd like to revist a solution to the problem of deep paging in Solr, leveraging an HTTP based API similar to how IndexSearcher.searchAfter works at the lucene level: require the clients to provide back a token indicating the sort values of the last document seen on the previous page. This is similar to the cursor model I've seen in several other REST APIs that support pagnation over a large sets of results (notable the twitter API and it's since_id param) except that we'll want something that works with arbitrary multi-level sort critera that can be either ascending or descending. SOLR-1726 laid some initial ground work here and was commited quite a while ago, but the key bit of argument parsing to leverage it was commented out due to some problems (see comments in that issue). It's also somewhat out of date at this point: at the time it was commited, IndexSearcher only supported searchAfter for simple scores, not arbitrary field sorts; and the params added in SOLR-1726 suffer from this limitation as well. --- I think it would make sense to start fresh with a new issue with a focus on ensuring that we have deep paging which: * supports arbitrary field sorts in addition to sorting by score * works in distributed mode -- This message was sent by Atlassian JIRA (v6.1.4#6159) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (LUCENE-5364) Review usages of hard-coded Version constants
[ https://issues.apache.org/jira/browse/LUCENE-5364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Rowe resolved LUCENE-5364. Resolution: Fixed Fix Version/s: 4.7 5.0 Assignee: Steve Rowe Lucene Fields: New,Patch Available (was: New) Committed to trunk and branch_4x. I added a note to the Lucene ReleaseToDo wiki page about using {{:Post-Release-Update-Version.LUCENE_XY:}} to find constants that should be upgraded to the next release version after a release branch has been cut. Review usages of hard-coded Version constants - Key: LUCENE-5364 URL: https://issues.apache.org/jira/browse/LUCENE-5364 Project: Lucene - Core Issue Type: Bug Components: core/other Affects Versions: 5.0, 4.7 Reporter: Steve Rowe Assignee: Steve Rowe Priority: Minor Fix For: 5.0, 4.7 Attachments: LUCENE-5364-branch_4x.patch, LUCENE-5364-trunk.patch, LUCENE-5364-trunk.patch There are some hard-coded {{Version.LUCENE_XY}} constants used in various places. Some of these are intentional and appropriate: * in deprecated code, e.g. {{ArabicLetterTokenizer}}, deprecated in 3.1, uses {{Version.LUCENE_31}} * to make behavior version-dependent (e.g. {{StandardTokenizer}} and other analysis components) * to test different behavior at different points in history (e.g. {{TestStopFilter}} to test position increments) But should hard-coded constants be used elsewhere? For those that should remain, and need to be updated with each release, there should be an easy way to find them. -- This message was sent by Atlassian JIRA (v6.1.4#6159) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5463) Provide cursor/token based searchAfter support that works with arbitrary sorting (ie: deep paging)
[ https://issues.apache.org/jira/browse/SOLR-5463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13843748#comment-13843748 ] Steve Rowe commented on SOLR-5463: -- {quote} * searchAfter = cursor * nextSearchAfter = cursorContinue {quote} +1 bq. I'm concerned people might think they can use the uniqueKey of the last document they got on the previous page I tried making this mistake (using the trailing unique id (NOK in this example) as the searchAfter param value, and I got the following error message: {code} { responseHeader:{ status:400, QTime:2}, error:{ msg:Unable to parse search after totem: NOK, code:400}} {code} I think that error message should include the param name ({{cursorContinue}}) that couldn't be parsed. Also, maybe it would be useful to include a prefix that will (probably) never be used in unique ids, to visually identify the cursor as such: like always prepending '*'? So your example of the future would become: {code:title=http://localhost:8983/solr/deep?q=*:*rows=20sort=id+desccursor=*AoEjTk9L} { responseHeader:{ status:0, QTime:7}, response:{numFound:32,start:-1,docs:[ // ... docs here... ] }, cursorContinue:*AoEoMDU3OUIwMDI=} {code} The error message when someone gives an unparseable {{cursor}} could then include this piece of information: cursors begin with an asterisk. Provide cursor/token based searchAfter support that works with arbitrary sorting (ie: deep paging) -- Key: SOLR-5463 URL: https://issues.apache.org/jira/browse/SOLR-5463 Project: Solr Issue Type: New Feature Reporter: Hoss Man Assignee: Hoss Man Attachments: SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch I'd like to revist a solution to the problem of deep paging in Solr, leveraging an HTTP based API similar to how IndexSearcher.searchAfter works at the lucene level: require the clients to provide back a token indicating the sort values of the last document seen on the previous page. This is similar to the cursor model I've seen in several other REST APIs that support pagnation over a large sets of results (notable the twitter API and it's since_id param) except that we'll want something that works with arbitrary multi-level sort critera that can be either ascending or descending. SOLR-1726 laid some initial ground work here and was commited quite a while ago, but the key bit of argument parsing to leverage it was commented out due to some problems (see comments in that issue). It's also somewhat out of date at this point: at the time it was commited, IndexSearcher only supported searchAfter for simple scores, not arbitrary field sorts; and the params added in SOLR-1726 suffer from this limitation as well. --- I think it would make sense to start fresh with a new issue with a focus on ensuring that we have deep paging which: * supports arbitrary field sorts in addition to sorting by score * works in distributed mode -- This message was sent by Atlassian JIRA (v6.1.4#6159) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5473) Make one state.json per collection
[ https://issues.apache.org/jira/browse/SOLR-5473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13843749#comment-13843749 ] Timothy Potter commented on SOLR-5473: -- Thanks for fixing the CloudSolrServerTest failure ... One thing I wasn't sure about when looking over the latest patch was whether allCollections in ZkStateReader will hold the names of external collections? I assume so by the name *all* but it doesn't seem like any external collection names are added to that Set currently. Make one state.json per collection -- Key: SOLR-5473 URL: https://issues.apache.org/jira/browse/SOLR-5473 Project: Solr Issue Type: Sub-task Components: SolrCloud Reporter: Noble Paul Assignee: Noble Paul Attachments: SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch As defined in the parent issue, store the states of each collection under /collections/collectionname/state.json node -- This message was sent by Atlassian JIRA (v6.1.4#6159) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
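For readers following along, the layout the parent issue describes means a single collection's state can be fetched straight from ZooKeeper at /collections/<name>/state.json. A minimal sketch with the plain ZooKeeper client follows; the ZK address and collection name are made up, and inside Solr you would go through ZkStateReader rather than a raw client.

{code:java}
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooKeeper;

public class ReadCollectionState {
    public static void main(String[] args) throws Exception {
        ZooKeeper zk = new ZooKeeper("localhost:2181", 15000, new Watcher() {
            public void process(WatchedEvent event) { /* no-op watcher */ }
        });
        // One state.json per collection, under the collection's own znode.
        byte[] data = zk.getData("/collections/mycollection/state.json", false, null);
        System.out.println(new String(data, "UTF-8"));
        zk.close();
    }
}
{code}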
[jira] [Comment Edited] (SOLR-5463) Provide cursor/token based searchAfter support that works with arbitrary sorting (ie: deep paging)
[ https://issues.apache.org/jira/browse/SOLR-5463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13843748#comment-13843748 ] Steve Rowe edited comment on SOLR-5463 at 12/10/13 12:21 AM: - {quote} * searchAfter = cursor * nextSearchAfter = cursorContinue {quote} +1 bq. I'm concerned people might think they can use the uniqueKey of the last document they got on the previous page I tried making this mistake (using the trailing unique id (NOK in this example) as the searchAfter param value, and I got the following error message: {code} { responseHeader:{ status:400, QTime:2}, error:{ msg:Unable to parse search after totem: NOK, code:400}} {code} (*edit*: {{cursorContinue}} = {{cursor}} in the sentence below) I think that error message should include the param name ({{cursor}}) that couldn't be parsed. Also, maybe it would be useful to include a prefix that will (probably) never be used in unique ids, to visually identify the cursor as such: like always prepending '*'? So your example of the future would become: {code:title=http://localhost:8983/solr/deep?q=*:*rows=20sort=id+desccursor=*AoEjTk9L} { responseHeader:{ status:0, QTime:7}, response:{numFound:32,start:-1,docs:[ // ... docs here... ] }, cursorContinue:*AoEoMDU3OUIwMDI=} {code} The error message when someone gives an unparseable {{cursor}} could then include this piece of information: cursors begin with an asterisk. was (Author: steve_rowe): {quote} * searchAfter = cursor * nextSearchAfter = cursorContinue {quote} +1 bq. I'm concerned people might think they can use the uniqueKey of the last document they got on the previous page I tried making this mistake (using the trailing unique id (NOK in this example) as the searchAfter param value, and I got the following error message: {code} { responseHeader:{ status:400, QTime:2}, error:{ msg:Unable to parse search after totem: NOK, code:400}} {code} I think that error message should include the param name ({{cursorContinue}}) that couldn't be parsed. Also, maybe it would be useful to include a prefix that will (probably) never be used in unique ids, to visually identify the cursor as such: like always prepending '*'? So your example of the future would become: {code:title=http://localhost:8983/solr/deep?q=*:*rows=20sort=id+desccursor=*AoEjTk9L} { responseHeader:{ status:0, QTime:7}, response:{numFound:32,start:-1,docs:[ // ... docs here... ] }, cursorContinue:*AoEoMDU3OUIwMDI=} {code} The error message when someone gives an unparseable {{cursor}} could then include this piece of information: cursors begin with an asterisk. Provide cursor/token based searchAfter support that works with arbitrary sorting (ie: deep paging) -- Key: SOLR-5463 URL: https://issues.apache.org/jira/browse/SOLR-5463 Project: Solr Issue Type: New Feature Reporter: Hoss Man Assignee: Hoss Man Attachments: SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch I'd like to revist a solution to the problem of deep paging in Solr, leveraging an HTTP based API similar to how IndexSearcher.searchAfter works at the lucene level: require the clients to provide back a token indicating the sort values of the last document seen on the previous page. 
This is similar to the cursor model I've seen in several other REST APIs that support pagnation over a large sets of results (notable the twitter API and it's since_id param) except that we'll want something that works with arbitrary multi-level sort critera that can be either ascending or descending. SOLR-1726 laid some initial ground work here and was commited quite a while ago, but the key bit of argument parsing to leverage it was commented out due to some problems (see comments in that issue). It's also somewhat out of date at this point: at the time it was commited, IndexSearcher only supported searchAfter for simple scores, not arbitrary field sorts; and the params added in SOLR-1726 suffer from this limitation as well. --- I think it would make sense to start fresh with a new issue with a focus on ensuring that we have deep paging which: * supports arbitrary field sorts in addition to sorting by score * works in distributed mode -- This message was sent by Atlassian JIRA (v6.1.4#6159) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional
[jira] [Commented] (SOLR-5463) Provide cursor/token based searchAfter support that works with arbitrary sorting (ie: deep paging)
[ https://issues.apache.org/jira/browse/SOLR-5463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13843774#comment-13843774 ] Steve Rowe commented on SOLR-5463: -- Another idea about the cursor: the Base64-encoded text is used verbatim, including the trailing padding '=' characters - these could be stripped out for external use (since they're there just to make the string length divisible by four), and then added back before Base64-decoding. In a URL non-metacharacter '='-s look weird, since they're already used to separate param names and values. Provide cursor/token based searchAfter support that works with arbitrary sorting (ie: deep paging) -- Key: SOLR-5463 URL: https://issues.apache.org/jira/browse/SOLR-5463 Project: Solr Issue Type: New Feature Reporter: Hoss Man Assignee: Hoss Man Attachments: SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch I'd like to revist a solution to the problem of deep paging in Solr, leveraging an HTTP based API similar to how IndexSearcher.searchAfter works at the lucene level: require the clients to provide back a token indicating the sort values of the last document seen on the previous page. This is similar to the cursor model I've seen in several other REST APIs that support pagnation over a large sets of results (notable the twitter API and it's since_id param) except that we'll want something that works with arbitrary multi-level sort critera that can be either ascending or descending. SOLR-1726 laid some initial ground work here and was commited quite a while ago, but the key bit of argument parsing to leverage it was commented out due to some problems (see comments in that issue). It's also somewhat out of date at this point: at the time it was commited, IndexSearcher only supported searchAfter for simple scores, not arbitrary field sorts; and the params added in SOLR-1726 suffer from this limitation as well. --- I think it would make sense to start fresh with a new issue with a focus on ensuring that we have deep paging which: * supports arbitrary field sorts in addition to sorting by score * works in distributed mode -- This message was sent by Atlassian JIRA (v6.1.4#6159) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
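A small sketch of the padding idea, using java.util.Base64 purely for illustration (Solr has its own Base64 helper that the real code would presumably use): strip the trailing '=' characters before handing the token to clients, and pad back out to a multiple of four characters before decoding.

{code:java}
import java.util.Base64;

public class CursorPadding {
    static String stripPadding(String token) {
        int end = token.length();
        while (end > 0 && token.charAt(end - 1) == '=') {
            end--;
        }
        return token.substring(0, end);
    }

    static byte[] decodeWithoutPadding(String token) {
        // Base64 works in 4-character groups, so restore the '=' padding before decoding.
        StringBuilder padded = new StringBuilder(token);
        while (padded.length() % 4 != 0) {
            padded.append('=');
        }
        return Base64.getDecoder().decode(padded.toString());
    }

    public static void main(String[] args) {
        String raw = Base64.getEncoder().encodeToString("0579B002".getBytes());
        String external = stripPadding(raw);   // "MDU3OUIwMDI" instead of "MDU3OUIwMDI="
        System.out.println(external + " -> " + new String(decodeWithoutPadding(external)));
    }
}
{code}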
[jira] [Commented] (SOLR-1301) Add a Solr contrib that allows for building Solr indexes via Hadoop's Map-Reduce.
[ https://issues.apache.org/jira/browse/SOLR-1301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13843827#comment-13843827 ] Mark Miller commented on SOLR-1301: --- bq. I'm not aware of anything needing jersey except perhaps hadoop pulls that in. Yeah, tests use this for running hadoop. Add a Solr contrib that allows for building Solr indexes via Hadoop's Map-Reduce. - Key: SOLR-1301 URL: https://issues.apache.org/jira/browse/SOLR-1301 Project: Solr Issue Type: New Feature Reporter: Andrzej Bialecki Assignee: Mark Miller Fix For: 5.0, 4.7 Attachments: README.txt, SOLR-1301-hadoop-0-20.patch, SOLR-1301-hadoop-0-20.patch, SOLR-1301-maven-intellij.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SolrRecordWriter.java, commons-logging-1.0.4.jar, commons-logging-api-1.0.4.jar, hadoop-0.19.1-core.jar, hadoop-0.20.1-core.jar, hadoop-core-0.20.2-cdh3u3.jar, hadoop.patch, log4j-1.2.15.jar This patch contains a contrib module that provides distributed indexing (using Hadoop) to Solr EmbeddedSolrServer. The idea behind this module is twofold: * provide an API that is familiar to Hadoop developers, i.e. that of OutputFormat * avoid unnecessary export and (de)serialization of data maintained on HDFS. SolrOutputFormat consumes data produced by reduce tasks directly, without storing it in intermediate files. Furthermore, by using an EmbeddedSolrServer, the indexing task is split into as many parts as there are reducers, and the data to be indexed is not sent over the network. Design -- Key/value pairs produced by reduce tasks are passed to SolrOutputFormat, which in turn uses SolrRecordWriter to write this data. SolrRecordWriter instantiates an EmbeddedSolrServer, and it also instantiates an implementation of SolrDocumentConverter, which is responsible for turning Hadoop (key, value) into a SolrInputDocument. This data is then added to a batch, which is periodically submitted to EmbeddedSolrServer. When reduce task completes, and the OutputFormat is closed, SolrRecordWriter calls commit() and optimize() on the EmbeddedSolrServer. The API provides facilities to specify an arbitrary existing solr.home directory, from which the conf/ and lib/ files will be taken. This process results in the creation of as many partial Solr home directories as there were reduce tasks. The output shards are placed in the output directory on the default filesystem (e.g. HDFS). Such part-N directories can be used to run N shard servers. Additionally, users can specify the number of reduce tasks, in particular 1 reduce task, in which case the output will consist of a single shard. An example application is provided that processes large CSV files and uses this API. It uses a custom CSV processing to avoid (de)serialization overhead. This patch relies on hadoop-core-0.19.1.jar - I attached the jar to this issue, you should put it in contrib/hadoop/lib. Note: the development of this patch was sponsored by an anonymous contributor and approved for release under Apache License. -- This message was sent by Atlassian JIRA (v6.1.4#6159) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
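As a rough illustration of the converter role the issue description sketches, here is what a CSV-style implementation might look like. The SolrDocumentConverter contract (method name, generics, whether one record may yield several documents) is assumed here rather than taken from the patch; only SolrInputDocument and Hadoop's Text are standard classes.

{code:java}
import java.util.ArrayList;
import java.util.Collection;

import org.apache.hadoop.io.Text;
import org.apache.solr.common.SolrInputDocument;

public class CsvLineConverter {
    // Assumed contract: turn one Hadoop (key, value) pair into one or more Solr documents.
    public Collection<SolrInputDocument> convert(Text key, Text value) {
        String[] cols = value.toString().split(",");

        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", key.toString());
        doc.addField("name", cols.length > 0 ? cols[0] : "");
        doc.addField("price", cols.length > 1 ? Float.parseFloat(cols[1]) : 0.0f);

        Collection<SolrInputDocument> docs = new ArrayList<SolrInputDocument>();
        docs.add(doc);
        return docs;
    }
}
{code}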
[jira] [Commented] (SOLR-4983) Problematic core naming by collection create API
[ https://issues.apache.org/jira/browse/SOLR-4983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13843840#comment-13843840 ] Noble Paul commented on SOLR-4983: -- I think solving his problem alone is simple. If the collection is present in the same JVM, it is very easy to look up the collection and, if there is a core that serves the collection, set the fromIndex to that core. If the user can ensure that all his collections are present on all nodes, it will be OK. The hard part is making it work with a remote node. Problematic core naming by collection create API - Key: SOLR-4983 URL: https://issues.apache.org/jira/browse/SOLR-4983 Project: Solr Issue Type: Bug Components: SolrCloud Reporter: Chris Toomey The SolrCloud collection create API creates cores named foo_shardx_replicay when asked to create collection foo. This is problematic for at least 2 reasons: 1) these ugly core names show up in the core admin UI, and will vary depending on which node is being used, 2) it prevents collections from being used in SolrCloud joins, since join takes a core name as the fromIndex parameter and there's no single core name for the collection. As I've documented in https://issues.apache.org/jira/browse/SOLR-4905 and http://lucene.472066.n3.nabble.com/Joins-with-SolrCloud-tp4073199p4074038.html, SolrCloud join does work when the inner collection (fromIndex) is not sharded, assuming that collection is available and initialized at SolrCloud bootstrap time. Could this be changed to instead use the collection name for the core name? Or at least add a core-name option to the API? -- This message was sent by Atlassian JIRA (v6.1.4#6159) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
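For context, this is the kind of request that needs a stable core name: the join query parser's fromIndex must point at a core on the same node, which is exactly what the auto-generated foo_shardX_replicaY names make awkward. The core and field names below are invented for illustration; only the {!join} syntax itself is standard.

{code}
http://localhost:8983/solr/providersearch/select?q={!join from=provider_id to=id fromIndex=provider}state:CO
{code}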
[jira] [Commented] (SOLR-5473) Make one state.json per collection
[ https://issues.apache.org/jira/browse/SOLR-5473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13843933#comment-13843933 ] Noble Paul commented on SOLR-5473: -- [~timp74] The allCollections set will store ALL collections. If you are looking at trunk, there are no external collections there yet. Please apply the patch and check. Make one state.json per collection -- Key: SOLR-5473 URL: https://issues.apache.org/jira/browse/SOLR-5473 Project: Solr Issue Type: Sub-task Components: SolrCloud Reporter: Noble Paul Assignee: Noble Paul Attachments: SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch As defined in the parent issue, store the states of each collection under /collections/collectionname/state.json node -- This message was sent by Atlassian JIRA (v6.1.4#6159) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5473) Make one state.json per collection
[ https://issues.apache.org/jira/browse/SOLR-5473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13843935#comment-13843935 ] Noble Paul commented on SOLR-5473: -- bq. if(debugState Thanks for the suggestion. However, I added it for my dev testing; it will be removed before commit. Make one state.json per collection -- Key: SOLR-5473 URL: https://issues.apache.org/jira/browse/SOLR-5473 Project: Solr Issue Type: Sub-task Components: SolrCloud Reporter: Noble Paul Assignee: Noble Paul Attachments: SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch As defined in the parent issue, store the states of each collection under /collections/collectionname/state.json node -- This message was sent by Atlassian JIRA (v6.1.4#6159) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (SOLR-5543) solr.xml duplicate entries after SWAP 4.6
Bill Bell created SOLR-5543: --- Summary: solr.xml duplicate entries after SWAP 4.6 Key: SOLR-5543 URL: https://issues.apache.org/jira/browse/SOLR-5543 Project: Solr Issue Type: Bug Affects Versions: 4.6 Reporter: Bill Bell We are having issues with CoreAdmin SWAP in 4.6. Using legacy solr.xml we issue a CoreAdmin SWAP, and we want it persistent. It had been running flawlessly since 4.5. Now it creates duplicate lines in solr.xml. Even the example multi-core setup doesn't work with persistent=true - it creates duplicate lines in solr.xml:

<cores adminPath="/admin/cores">
  <core name="autosuggest" loadOnStartup="true" instanceDir="autosuggest" transient="false"/>
  <core name="citystateprovider" loadOnStartup="true" instanceDir="citystateprovider" transient="false"/>
  <core name="collection1" loadOnStartup="true" instanceDir="collection1" transient="false"/>
  <core name="facility" loadOnStartup="true" instanceDir="facility" transient="false"/>
  <core name="inactiveproviders" loadOnStartup="true" instanceDir="inactiveproviders" transient="false"/>
  <core name="linesvcgeo" instanceDir="linesvcgeo" loadOnStartup="true" transient="false"/>
  <core name="linesvcgeofull" instanceDir="linesvcgeofull" loadOnStartup="true" transient="false"/>
  <core name="locationgeo" loadOnStartup="true" instanceDir="locationgeo" transient="false"/>
  <core name="market" loadOnStartup="true" instanceDir="market" transient="false"/>
  <core name="portalprovider" loadOnStartup="true" instanceDir="portalprovider" transient="false"/>
  <core name="practice" loadOnStartup="true" instanceDir="practice" transient="false"/>
  <core name="provider" loadOnStartup="true" instanceDir="provider" transient="false"/>
  <core name="providersearch" loadOnStartup="true" instanceDir="providersearch" transient="false"/>
  <core name="tridioncomponents" loadOnStartup="true" instanceDir="tridioncomponents" transient="false"/>
  <core name="linesvcgeo" instanceDir="linesvcgeo" loadOnStartup="true" transient="false"/>
  <core name="linesvcgeofull" instanceDir="linesvcgeofull" loadOnStartup="true" transient="false"/>
</cores>

-- This message was sent by Atlassian JIRA (v6.1.4#6159) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
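For anyone trying to reproduce this, the swap in question is the standard CoreAdmin SWAP call; the core names below are taken from the solr.xml above and the host/port are only an example. With persistent=true, Solr rewrites solr.xml after the swap, which is where the duplicated core lines show up according to the report.

{code}
http://localhost:8983/solr/admin/cores?action=SWAP&core=provider&other=providersearch
{code}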
[jira] [Created] (SOLR-5544) Log spamming DefaultSolrHighlighter
MANISH KUMAR created SOLR-5544: -- Summary: Log spamming DefaultSolrHighlighter Key: SOLR-5544 URL: https://issues.apache.org/jira/browse/SOLR-5544 Project: Solr Issue Type: Improvement Components: highlighter Affects Versions: 4.0 Reporter: MANISH KUMAR In DefaultSolrHighlighter.java, the method useFastVectorHighlighter has: log.warn("Solr will use Highlighter instead of FastVectorHighlighter because {} field does not store TermPositions and TermOffsets.", fieldName); The above method gets called for each field, and there could be cases where TermPositions and TermOffsets are not stored. The above line causes huge spamming of the logs. It should be at most a DEBUG-level log, which gives the flexibility of turning it off in production environments. -- This message was sent by Atlassian JIRA (v6.1.4#6159) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
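A minimal sketch of the change being asked for, assuming nothing about the rest of DefaultSolrHighlighter beyond the log call quoted above: demote the per-field message from WARN to DEBUG. Solr logs through SLF4J, so the parameterized message stays as-is and the call is cheap when DEBUG is disabled.

{code:java}
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class HighlighterLogLevelSketch {
    private static final Logger log = LoggerFactory.getLogger(HighlighterLogLevelSketch.class);

    // Hypothetical stand-in for the spot in useFastVectorHighlighter() that logs the fallback;
    // the only change proposed in the issue is warn -> debug.
    static void logFallback(String fieldName) {
        log.debug("Solr will use Highlighter instead of FastVectorHighlighter because {} field"
            + " does not store TermPositions and TermOffsets.", fieldName);
    }
}
{code}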
[jira] [Updated] (SOLR-5544) Log spamming DefaultSolrHighlighter
[ https://issues.apache.org/jira/browse/SOLR-5544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] MANISH KUMAR updated SOLR-5544: --- Priority: Minor (was: Major) Log spamming DefaultSolrHighlighter --- Key: SOLR-5544 URL: https://issues.apache.org/jira/browse/SOLR-5544 Project: Solr Issue Type: Improvement Components: highlighter Affects Versions: 4.0 Reporter: MANISH KUMAR Priority: Minor In DefaultSolrHighlighter.java, the method useFastVectorHighlighter has: log.warn("Solr will use Highlighter instead of FastVectorHighlighter because {} field does not store TermPositions and TermOffsets.", fieldName); The above method gets called for each field, and there could be cases where TermPositions and TermOffsets are not stored. The above line causes huge spamming of the logs. It should be at most a DEBUG-level log, which gives the flexibility of turning it off in production environments. -- This message was sent by Atlassian JIRA (v6.1.4#6159) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-5544) Log spamming by DefaultSolrHighlighter
[ https://issues.apache.org/jira/browse/SOLR-5544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] MANISH KUMAR updated SOLR-5544: --- Summary: Log spamming by DefaultSolrHighlighter (was: Log spamming DefaultSolrHighlighter) Log spamming by DefaultSolrHighlighter -- Key: SOLR-5544 URL: https://issues.apache.org/jira/browse/SOLR-5544 Project: Solr Issue Type: Improvement Components: highlighter Affects Versions: 4.0 Reporter: MANISH KUMAR Priority: Minor In DefaultSolrHighlighter.java, the method useFastVectorHighlighter has: log.warn("Solr will use Highlighter instead of FastVectorHighlighter because {} field does not store TermPositions and TermOffsets.", fieldName); The above method gets called for each field, and there could be cases where TermPositions and TermOffsets are not stored. The above line causes huge spamming of the logs. It should be at most a DEBUG-level log, which gives the flexibility of turning it off in production environments. -- This message was sent by Atlassian JIRA (v6.1.4#6159) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org