[jira] [Created] (SOLR-5075) SolrCloud commit process is too time consuming, even if documents are light
Radu Ghita created SOLR-5075:

Summary: SolrCloud commit process is too time consuming, even if documents are light
Key: SOLR-5075
URL: https://issues.apache.org/jira/browse/SOLR-5075
Project: Solr
Issue Type: Bug
Components: Schema and Analysis, SolrCloud
Affects Versions: 4.1
Environment: SolrCloud 4.1, internal Zookeeper, 16 shards, custom Java importer. Server: Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz, 32 cores, 192GB RAM, 10TB SSD and 50TB SAS storage
Reporter: Radu Ghita

We have a client whose business model requires indexing a billion rows from MySQL into Solr each month, in a small time-frame. The documents are very light, but their number is very high, and we need to achieve speeds of around 80-100k docs/s. The built-in Solr indexer tops out at 40-50k/s, and after some hours (~12) it crashes, with speed degrading as the hours go by. We have therefore developed a custom Java importer that connects directly to MySQL and to SolrCloud via Zookeeper, grabs data from MySQL, creates documents, and imports them into Solr. This helps because we open ~50 threads and the indexing process speeds up. We have optimized the MySQL queries (MySQL was the initial bottleneck) and now reach over 100k docs/s, but as the index grows, Solr spends a very long time on adding documents. I assume something in solrconfig makes Solr stall and even block after 100 million documents are indexed.

Here is the Java code that creates documents and then adds them to the Solr server:

{code}
public void createDocuments() throws SQLException, SolrServerException, IOException {
  App.logger.write("Creating documents..");
  this.docs = new ArrayList<SolrInputDocument>();
  App.logger.incrementNumberOfRows(this.size);
  while (this.results.next()) {
    this.docs.add(this.getDocumentFromResultSet(this.results));
  }
  this.statement.close();
  this.results.close();
}

public void commitDocuments() throws SolrServerException, IOException {
  App.logger.write("Committing..");
  App.solrServer.add(this.docs); // here it stays very long and then blocks
  App.logger.incrementNumberOfRows(this.docs.size());
  this.docs.clear();
}
{code}

I am also pasting the solrconfig.xml parameters relevant to this discussion:

{code:xml}
<maxIndexingThreads>128</maxIndexingThreads>
<useCompoundFile>false</useCompoundFile>
<ramBufferSizeMB>1</ramBufferSizeMB>
<maxBufferedDocs>100</maxBufferedDocs>
<mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
  <int name="maxMergeAtOnce">2</int>
  <int name="segmentsPerTier">100</int>
  <int name="maxMergeAtOnceExplicit">1</int>
</mergePolicy>
<mergeFactor>100</mergeFactor>
<termIndexInterval>1024</termIndexInterval>
<autoCommit>
  <maxTime>15000</maxTime>
  <maxDocs>100</maxDocs>
  <openSearcher>false</openSearcher>
</autoCommit>
<autoSoftCommit>
  <maxTime>200</maxTime>
</autoSoftCommit>
{code}

Thanks a lot for any answers, and excuse my long text; I'm new to this JIRA. If any other info is needed, please let me know.
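A note on the quoted configuration: <maxDocs>100</maxDocs> under <autoCommit>, together with <maxBufferedDocs>100</maxBufferedDocs> and a 1 MB RAM buffer, triggers a hard commit and a segment flush roughly every 100 documents; at the reported 100k docs/s, that is on the order of a thousand commits per second, which by itself could produce exactly this kind of stall. For contrast, a sketch of more conventional bulk-indexing values (illustrative only, not a tested recommendation):

{code:xml}
<ramBufferSizeMB>128</ramBufferSizeMB>
<!-- omit maxBufferedDocs: let the RAM buffer decide when to flush -->
<autoCommit>
  <maxTime>60000</maxTime> <!-- hard commit at most once a minute -->
  <!-- omit maxDocs: avoid committing every N documents during bulk loads -->
  <openSearcher>false</openSearcher>
</autoCommit>
<autoSoftCommit>
  <maxTime>30000</maxTime> <!-- relaxed visibility during bulk indexing -->
</autoSoftCommit>
{code}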
Re: VOTE: RC1 Release apache-solr-ref-guide-4.4.pdf
I noticed that the ref guide was missing a page about upgrading to Solr 4.4 (there is such a page for releases 4.1 through 4.3), so I created one, based on upgrade notes and bug fixes from CHANGES.txt: https://cwiki.apache.org/confluence/display/solr/Upgrading+to+Solr+4.4. Please edit it to make it better if you notice any problems. I think we should respin for this.

I've added a new section, "Pre-publication actions", with a bullet point about creating the per-release upgrade page, to https://cwiki.apache.org/confluence/display/solr/Internal+-+How+To+Publish+This+Documentation. But maybe we should have a different page dedicated to this and similar activities? (Or maybe there already is one?)

I also noticed that David Smiley made a bunch of modifications, AFAICT to spatial and related topics, and it would be good to include those.

Steve

On Jul 24, 2013, at 8:30 PM, Chris Hostetter <hossman_luc...@fucit.org> wrote:

> Please VOTE to release the following PDF as apache-solr-ref-guide-4.4.pdf
> https://people.apache.org/~hossman/apache-solr-ref-guide-4.4_RC1.pdf
>
> As this is our first documentation release VOTE, folks may wish to familiarize themselves with the doc release process that I posted a while back, but got very little (none, if I remember correctly) feedback on:
> https://cwiki.apache.org/confluence/display/solr/Internal+-+How+To+Publish+This+Documentation
>
> -Hoss
Re: VOTE: RC1 Release apache-solr-ref-guide-4.4.pdf
Crap, I just noticed Hoss's https://cwiki.apache.org/confluence/display/solr/Upgrading+Solr, which is a shorter version of the 4.4 upgrade notes page I just created.

Hoss, can you take a look at my new page and see if any of the extra stuff I've included beyond your page should be incorporated?

It feels weird to have upgrade notes for different versions in multiple places - maybe the previous release upgrade pages could stay where they are, with references to them added from the current release upgrade notes? Actually, it also seems weird that the previous 4.X upgrade notes are under the "Major Changes from Solr 3 to Solr 4" page in the left-hand navigation pane.

Depending on the nature of any changes we make for this, the new "Pre-publication actions" section on the internal how-to-publish page will need to be adjusted.

Steve

On Jul 25, 2013, at 4:29 AM, Steve Rowe <sar...@gmail.com> wrote:

> I noticed that the ref guide was missing a page about upgrading to Solr 4.4 (there is such a page for releases 4.1 through 4.3), so I created one, based on upgrade notes and bug fixes from CHANGES.txt: https://cwiki.apache.org/confluence/display/solr/Upgrading+to+Solr+4.4. Please edit it to make it better if you notice any problems. I think we should respin for this.
[jira] [Commented] (LUCENE-5131) CheckIndex is confusing for docvalues fields
[ https://issues.apache.org/jira/browse/LUCENE-5131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13719404#comment-13719404 ]

Adrien Grand commented on LUCENE-5131:
--------------------------------------

Definitely +1 for this patch and for printing statistics about unique value counts for SORTED and SORTED_SET.

CheckIndex is confusing for docvalues fields
--------------------------------------------

Key: LUCENE-5131
URL: https://issues.apache.org/jira/browse/LUCENE-5131
Project: Lucene - Core
Issue Type: Bug
Reporter: Robert Muir
Attachments: LUCENE-5131.patch, LUCENE-5131.patch

It prints things like:

{noformat}
test: docvalues...OK [0 total doc count; 18 docvalues fields]
{noformat}
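For readers unfamiliar with the tool: the output above comes from running CheckIndex against an index directory from the command line, typically something like this (jar version and index path are illustrative):

{noformat}
java -cp lucene-core-4.4.0.jar org.apache.lucene.index.CheckIndex /path/to/index
{noformat}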
[jira] [Commented] (SOLR-2894) Implement distributed pivot faceting
[ https://issues.apache.org/jira/browse/SOLR-2894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13719417#comment-13719417 ]

Elran Dvir commented on SOLR-2894:
----------------------------------

I downloaded the source code from Solr's website, then opened it with my IDE, IntelliJ. When I tried applying the patch, IntelliJ reported there were problems with some files. Thanks.

Implement distributed pivot faceting
------------------------------------

Key: SOLR-2894
URL: https://issues.apache.org/jira/browse/SOLR-2894
Project: Solr
Issue Type: Improvement
Reporter: Erik Hatcher
Fix For: 4.5
Attachments: SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894-reworked.patch

Following up on SOLR-792, pivot faceting currently only supports undistributed mode. Distributed pivot faceting needs to be implemented.
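For what it's worth, a patch attached to a Lucene/Solr issue is normally applied from the root of a source checkout on the command line rather than through the IDE. With the attachment downloaded into the checkout, either of these (filename as attached to the issue) usually works:

{noformat}
svn patch SOLR-2894.patch
patch -p0 -i SOLR-2894.patch
{noformat}

Note that such patches are generally made against trunk or a specific branch, so applying one to the packaged source of a different release will often fail with rejected hunks.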
[jira] [Commented] (SOLR-5045) Pluggable Analytics
[ https://issues.apache.org/jira/browse/SOLR-5045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13719512#comment-13719512 ]

Otis Gospodnetic commented on SOLR-5045:
----------------------------------------

[~joel.bernstein] how does this play with SOLR-2894? Overlap? Is the plan to be able to use the approach here to implement SOLR-2894 later on?

Pluggable Analytics
-------------------

Key: SOLR-5045
URL: https://issues.apache.org/jira/browse/SOLR-5045
Project: Solr
Issue Type: New Feature
Components: search
Affects Versions: 5.0
Reporter: Joel Bernstein
Priority: Minor
Fix For: 5.0
Attachments: SOLR-5045.patch, SOLR-5045.patch

This ticket provides a pluggable aggregation framework through the introduction of a new *Aggregator* interface and a new search component called the *AggregatorComponent*. The *Aggregator* interface extends the PostFilter interface, providing methods that allow DelegatingCollectors to perform aggregation at collect time. Aggregators were designed to play nicely with the CollapsingQParserPlugin introduced in SOLR-5027. The *AggregatorComponent* manages the output and distributed merging of aggregate results.

This ticket is an alternate design to SOLR-4465, which had the same basic idea but a very different implementation. This implementation resolves the caching issues in SOLR-4465 and, combined with SOLR-5027, plays nicely with field collapsing. It is also much less intrusive on the core code, as it's entirely implemented with plugins.

Initial syntax for the sample SumQParserPlugin Aggregator:

../select?q=*:*&wt=xml&indent=true&fq={!sum field=popularity id=mysum}&aggregate=true

*fq={!sum field=popularity id=mysum}* - calls the SumQParserPlugin, telling it to sum the field "popularity".
*aggregate=true* - turns on the AggregatorComponent.

The output contains a block that looks like this:

{code:xml}
<lst name="aggregates">
  <lst name="mysum">
    <long name="sum">85</long>
  </lst>
</lst>
{code}
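A sketch of how this proposed syntax might be driven from SolrJ; the AggregatorComponent and SumQParserPlugin are the plugins proposed in this ticket, not released APIs, and the server URL and field name are illustrative:

{code}
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class AggregateExample {
  public static void main(String[] args) throws Exception {
    HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1");
    SolrQuery q = new SolrQuery("*:*");
    q.addFilterQuery("{!sum field=popularity id=mysum}"); // the proposed sum aggregator
    q.set("aggregate", true);                             // turns on the AggregatorComponent
    QueryResponse rsp = server.query(q);
    // Per the XML block above, the aggregate results would arrive under "aggregates"
    System.out.println(rsp.getResponse().get("aggregates"));
  }
}
{code}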
[jira] [Resolved] (SOLR-5075) SolrCloud commit process is too time consuming, even if documents are light
[ https://issues.apache.org/jira/browse/SOLR-5075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Otis Gospodnetic resolved SOLR-5075.
------------------------------------

Resolution: Invalid

SolrCloud commit process is too time consuming, even if documents are light
----------------------------------------------------------------------------

Key: SOLR-5075
URL: https://issues.apache.org/jira/browse/SOLR-5075
Project: Solr
Issue Type: Bug
Components: Schema and Analysis, SolrCloud
Affects Versions: 4.1
Reporter: Radu Ghita
Labels: import, solrconfig.xml
[jira] [Commented] (SOLR-5075) SolrCloud commit process is too time consuming, even if documents are light
[ https://issues.apache.org/jira/browse/SOLR-5075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13719527#comment-13719527 ]

Otis Gospodnetic commented on SOLR-5075:
----------------------------------------

[~r...@wmds.ro] you should close this issue and ask on the solr-user mailing list.

SolrCloud commit process is too time consuming, even if documents are light
----------------------------------------------------------------------------

Key: SOLR-5075
URL: https://issues.apache.org/jira/browse/SOLR-5075
Project: Solr
Issue Type: Bug
Components: Schema and Analysis, SolrCloud
Affects Versions: 4.1
Reporter: Radu Ghita
Labels: import, solrconfig.xml
[jira] [Commented] (SOLR-5075) SolrCloud commit process is too time consuming, even if documents are light
[ https://issues.apache.org/jira/browse/SOLR-5075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13719537#comment-13719537 ]

Erick Erickson commented on SOLR-5075:
--------------------------------------

FWIW, I was about to say the same thing, but have one comment. SOLR-4816 (not in 4.4, but coming soon) should add some efficiencies to SolrJ updating; I'd love to see what its effect is in your situation.

One thing: it looks like you're accumulating all the docs from the select in one huge batch and indexing them all at once. If that's true, try submitting them, say, 1,000 at a time; see the sketch below. I suspect that will not hang, but I also suspect it will slow your initial ingest rate, because you'll actually be sending docs to Solr rather than just accumulating them all locally.

SolrCloud commit process is too time consuming, even if documents are light
----------------------------------------------------------------------------

Key: SOLR-5075
URL: https://issues.apache.org/jira/browse/SOLR-5075
Project: Solr
Issue Type: Bug
Components: Schema and Analysis, SolrCloud
Affects Versions: 4.1
Reporter: Radu Ghita
Labels: import, solrconfig.xml
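A minimal sketch of the batching Erick suggests, written as a drop-in replacement for the reporter's two methods quoted earlier (App.solrServer, getDocumentFromResultSet and the JDBC fields come from the reporter's code, not a library API; the batch size is illustrative):

{code}
import java.util.ArrayList;
import java.util.List;
import org.apache.solr.common.SolrInputDocument;

// Sketch: stream documents to Solr in fixed-size batches instead of one huge add.
public void indexInBatches() throws Exception {
  final int BATCH_SIZE = 1000; // illustrative
  List<SolrInputDocument> batch = new ArrayList<SolrInputDocument>(BATCH_SIZE);
  while (this.results.next()) {
    batch.add(this.getDocumentFromResultSet(this.results));
    if (batch.size() >= BATCH_SIZE) {
      App.solrServer.add(batch); // send this slice now
      batch.clear();             // keep client-side memory bounded
    }
  }
  if (!batch.isEmpty()) {
    App.solrServer.add(batch);   // flush the final partial batch
  }
  this.results.close();
  this.statement.close();
}
{code}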
[jira] [Commented] (SOLR-5069) MapReduce for SolrCloud
[ https://issues.apache.org/jira/browse/SOLR-5069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13719540#comment-13719540 ]

Otis Gospodnetic commented on SOLR-5069:
----------------------------------------

This is great to see - I asked about this in SOLR-1301: https://issues.apache.org/jira/browse/SOLR-1301?focusedCommentId=13678948&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13678948 :)

{quote}
The node which received the command would first identify one replica from each slice where the map program is executed. It will also identify one other node from the same collection where the reduce program is run. Each run is given an id and the details of the nodes participating in the run will be written to ZK (as an ephemeral node).
{quote}

Lukas and Andrzej have already addressed my immediate thought when I read the above, and they talked about using the cost approach, limiting resource use, and such. But I think we should learn from others' mistakes and choices here. Is it good enough to limit resources? Just limiting resources means that any concurrent queries *will* be affected - the question is just how much. Would it be better to mark some nodes as "eligible for running analytical/batch/MR jobs + search" or "eligible for running analytical/batch/MR jobs and NO search" - i.e. nodes that are a part of the SolrCloud cluster but run ONLY these jobs and do NOT handle queries? I think we saw DataStax do this with Cassandra and Brisk, and we see it with people using HBase replication to keep one HBase cluster for real-time/interactive access and another for running jobs.

MapReduce for SolrCloud
-----------------------

Key: SOLR-5069
URL: https://issues.apache.org/jira/browse/SOLR-5069
Project: Solr
Issue Type: New Feature
Components: SolrCloud
Reporter: Noble Paul
Assignee: Noble Paul

Solr currently does not have a way to run long-running computational tasks across the cluster. We can piggyback on the mapreduce paradigm so that users have a smooth learning curve.

* The mapreduce component will be written as a RequestHandler in Solr
* Works only in SolrCloud mode (no support for standalone mode)
* Users can write MapReduce programs in Javascript or Java. First cut would be JS (?)

h1. sample word count program

h2. how to invoke?

http://host:port/solr/collection-x/mapreduce?map=map-script&reduce=reduce-script&sink=collectionX

h3. params

* map: a Javascript implementation of the map program
* reduce: a Javascript implementation of the reduce program
* sink: the collection to which the output is written. If this is not passed, the request will wait till completion, the output of the reduce program will be emitted as a standard Solr response, and the request will be redirected to the reduce node, where it waits till the process is complete. If the sink param is passed, the response will contain an id of the run, which can be used to query the status in another command.
* reduceNode: node name where the reduce is run. If not passed, an arbitrary node is chosen.

The node which received the command would first identify one replica from each slice where the map program is executed. It will also identify one other node from the same collection where the reduce program is run. Each run is given an id and the details of the nodes participating in the run will be written to ZK (as an ephemeral node).

h4. map script

{code:JavaScript}
var res = $.streamQuery("*:*"); // this is not run across the cluster,
                                // only on this index
while (res.hasMore()) {
  var doc = res.next();
  var txt = doc.get("txt"); // the field on which word count is performed
  var words = txt.split(" ");
  for (i = 0; i < words.length; i++) {
    $.map(words[i], {'count': 1}); // this will send the map over to
                                   // the reduce host
  }
}
{code}

Essentially two threads are created in the 'map' hosts: one for running the program and the other for coordinating with the 'reduce' host. The maps emitted are streamed live over an http connection to the reduce program.

h4. reduce script

This script is run in one node. This node accepts http connections from map nodes, and the 'maps' that are sent are collected in a queue which is polled and fed into the reduce program. It also keeps the 'reduced' data in memory till the whole run is complete. It expects a "done" message from all 'map' nodes before it declares the tasks complete. After the reduce program has been executed for all the input, it proceeds to write out the result to the 'sink' collection, or the result is written straight out to the response.

{code:JavaScript}
var pair = $.nextMap();
var reduced = $.getCtx().getReducedMap(); // a hashmap
var count = reduced.get(pair.key());
if (count === null) {
  count = {"count": 0};
  reduced.put(pair.key(), count);
}
count.count += pair.val().count;
{code}

h4. example output

{code:JavaScript}
{
  "result": [
    "wordx": { "count": 15876765 },
    "wordy": { "count": 24657654 }
  ]
}
{code}

TBD
* The format in which the output is written to the target collection; I assume the reducedMap will have values mapping to the schema of the collection.
[jira] [Updated] (LUCENE-5133) AnalyzingInfixSuggester should return structured highlighted results instead of single String per result
[ https://issues.apache.org/jira/browse/LUCENE-5133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless updated LUCENE-5133:
---------------------------------------

Attachment: LUCENE-5133.patch

Thanks Shai, new patch attached.

AnalyzingInfixSuggester should return structured highlighted results instead of single String per result
---------------------------------------------------------------------------------------------------------

Key: LUCENE-5133
URL: https://issues.apache.org/jira/browse/LUCENE-5133
Project: Lucene - Core
Issue Type: Improvement
Reporter: Michael McCandless
Fix For: 5.0, 4.5
Attachments: LUCENE-5133.patch, LUCENE-5133.patch

Today it renders to an HTML string (<b>..</b> for hits) in protected methods that one can override to change the highlighting, but this is hard/inefficient to use for search servers that want to e.g. return a JSON representation of the highlighted result. This is the same issue as LUCENE-4906 (PostingsHighlighter), but for AnalyzingInfixSuggester's highlights instead.
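To make the motivation concrete, here is a purely hypothetical sketch of what a structured result could look like; none of these class or method names are the actual API added by the patch:

{code}
import java.util.ArrayList;
import java.util.List;

// Hypothetical structured highlight: a suggestion as a list of fragments,
// each flagged as matching the user's typed prefix or not.
class HighlightFragment {
  final String text;
  final boolean isHit;
  HighlightFragment(String text, boolean isHit) { this.text = text; this.isHit = isHit; }
}

class StructuredSuggestion {
  final List<HighlightFragment> fragments = new ArrayList<HighlightFragment>();

  // A server can render the same structure to HTML, JSON, or anything else.
  String toHtml() {
    StringBuilder sb = new StringBuilder();
    for (HighlightFragment f : fragments) {
      sb.append(f.isHit ? "<b>" + f.text + "</b>" : f.text);
    }
    return sb.toString();
  }
}
{code}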
[jira] [Commented] (SOLR-5069) MapReduce for SolrCloud
[ https://issues.apache.org/jira/browse/SOLR-5069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13719550#comment-13719550 ]

Noble Paul commented on SOLR-5069:
----------------------------------

bq. Would it be better to mark some nodes as eligible for running analytical/batch/MR jobs + search

Instead of marking certain nodes as "eligible for X", how about passing the node names in the request itself? That way we are not introducing some kind of 'role' in the system, but we still get all the benefits.

MapReduce for SolrCloud
-----------------------

Key: SOLR-5069
URL: https://issues.apache.org/jira/browse/SOLR-5069
Project: Solr
Issue Type: New Feature
Components: SolrCloud
Reporter: Noble Paul
Assignee: Noble Paul
[jira] [Commented] (SOLR-5069) MapReduce for SolrCloud
[ https://issues.apache.org/jira/browse/SOLR-5069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13719554#comment-13719554 ]

Otis Gospodnetic commented on SOLR-5069:
----------------------------------------

bq. Instead of marking certain nodes as "eligible for X", how about passing the node names in the request itself? That way we are not introducing some kind of 'role' in the system, but we still get all the benefits.

But if searches are running on *all* nodes, then the above doesn't achieve complete separation of search vs. job work.

MapReduce for SolrCloud
-----------------------

Key: SOLR-5069
URL: https://issues.apache.org/jira/browse/SOLR-5069
Project: Solr
Issue Type: New Feature
Components: SolrCloud
Reporter: Noble Paul
Assignee: Noble Paul
[jira] [Commented] (SOLR-5069) MapReduce for SolrCloud
[ https://issues.apache.org/jira/browse/SOLR-5069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13719556#comment-13719556 ]

Noble Paul commented on SOLR-5069:
----------------------------------

bq. But if searches are running on all nodes, then the above doesn't achieve complete separation of search vs. job work.

Makes sense...

MapReduce for SolrCloud
-----------------------

Key: SOLR-5069
URL: https://issues.apache.org/jira/browse/SOLR-5069
Project: Solr
Issue Type: New Feature
Components: SolrCloud
Reporter: Noble Paul
Assignee: Noble Paul
[jira] [Comment Edited] (SOLR-5069) MapReduce for SolrCloud
[ https://issues.apache.org/jira/browse/SOLR-5069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13719556#comment-13719556 ]

Noble Paul edited comment on SOLR-5069 at 7/25/13 12:11 PM:
------------------------------------------------------------

bq. But if searches are running on all nodes, then the above doesn't achieve complete separation of search vs. job work.

Makes sense. It's something we should think of as a feature of Solr: being a part of a cluster but not taking part in certain roles (leader/search/jobs, etc.).

was (Author: noble.paul):

bq. But if searches are running on all nodes, then the above doesn't achieve complete separation of search vs. job work.

Makes sense...

MapReduce for SolrCloud
-----------------------

Key: SOLR-5069
URL: https://issues.apache.org/jira/browse/SOLR-5069
Project: Solr
Issue Type: New Feature
Components: SolrCloud
Reporter: Noble Paul
Assignee: Noble Paul
[jira] [Commented] (LUCENE-5133) AnalyzingInfixSuggester should return structured highlighted results instead of single String per result
[ https://issues.apache.org/jira/browse/LUCENE-5133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13719559#comment-13719559 ]

Shai Erera commented on LUCENE-5133:
------------------------------------

Looks good.

AnalyzingInfixSuggester should return structured highlighted results instead of single String per result
---------------------------------------------------------------------------------------------------------

Key: LUCENE-5133
URL: https://issues.apache.org/jira/browse/LUCENE-5133
Project: Lucene - Core
Issue Type: Improvement
Reporter: Michael McCandless
Fix For: 5.0, 4.5
Attachments: LUCENE-5133.patch, LUCENE-5133.patch
[jira] [Commented] (SOLR-5069) MapReduce for SolrCloud
[ https://issues.apache.org/jira/browse/SOLR-5069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13719563#comment-13719563 ]

Otis Gospodnetic commented on SOLR-5069:
----------------------------------------

bq. It should be something we should think of as a feature of Solr. Being a part of a cluster but not taking part in certain roles (leader/search/jobs etc)

Yeah, perhaps something like that. We already have Overseer and Leader, which are also roles of some sort, though those are completely managed by SolrCloud, meaning SolrCloud/ZK do the node election and node assignment for those particular roles, AFAIK, while for a search vs. job (vs. mixed) role the assignment is likely to come from a human + ZK.

MapReduce for SolrCloud
-----------------------

Key: SOLR-5069
URL: https://issues.apache.org/jira/browse/SOLR-5069
Project: Solr
Issue Type: New Feature
Components: SolrCloud
Reporter: Noble Paul
Assignee: Noble Paul
[jira] [Created] (LUCENE-5134) Consider implementing lookback merge policy
Otis Gospodnetic created LUCENE-5134:

Summary: Consider implementing lookback merge policy
Key: LUCENE-5134
URL: https://issues.apache.org/jira/browse/LUCENE-5134
Project: Lucene - Core
Issue Type: Improvement
Reporter: Otis Gospodnetic
Priority: Minor

In http://blog.mikemccandless.com/2011/02/visualizing-lucenes-segment-merges.html Mike mentioned lookahead as something that could possibly yield more optimal merges. But what about lookback? What if some sort of stats were kept about which segments were picked for merges? With such stats in hand, could one look back and, knowing what happened after those merges, evaluate whether more optimal merge choices could have been made, and then use that next time?

See http://search-lucene.com/m/D7ypz1gT2H91
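A minimal sketch of the stats-gathering half of this idea, as a TieredMergePolicy subclass that records every selected merge for later offline what-if analysis. This assumes the Lucene 4.x MergePolicy API; exact signatures and the estimatedMergeBytes field can differ between versions, and the logging destination is illustrative:

{code}
import java.io.IOException;
import org.apache.lucene.index.MergePolicy;
import org.apache.lucene.index.MergeTrigger;
import org.apache.lucene.index.SegmentInfos;
import org.apache.lucene.index.TieredMergePolicy;

// Sketch: record which merges were selected, so one can later "look back"
// and judge whether different selections would have copied fewer bytes.
public class StatsRecordingMergePolicy extends TieredMergePolicy {
  @Override
  public MergeSpecification findMerges(MergeTrigger trigger, SegmentInfos infos)
      throws IOException {
    MergeSpecification spec = super.findMerges(trigger, infos);
    if (spec != null) {
      for (MergePolicy.OneMerge merge : spec.merges) {
        // Illustrative logging; a real implementation would persist segment
        // names, sizes and timestamps somewhere queryable for offline analysis.
        System.out.println("selected merge of " + merge.segments.size()
            + " segments, ~" + merge.estimatedMergeBytes + " estimated bytes");
      }
    }
    return spec;
  }
}
{code}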
[jira] [Created] (LUCENE-5135) Consider time-based MergeScheduler
Otis Gospodnetic created LUCENE-5135:

Summary: Consider time-based MergeScheduler
Key: LUCENE-5135
URL: https://issues.apache.org/jira/browse/LUCENE-5135
Project: Lucene - Core
Issue Type: Improvement
Reporter: Otis Gospodnetic
Priority: Minor

Very often search traffic follows a wave pattern, which could mean that more aggressive merging could be done during periods with lower query rates (e.g. nights and weekends)... or maybe during that time more segments could be allowed to live in the index, assuming that after allowing that for some time, the subsequent merge could be bigger/more thorough, so to speak.

See http://search-lucene.com/m/D7ypz1gT2H91
Re: Lookback and/or time-aware Merge Policy?
Thanks for showing I wasn't completely crazy to think this made sense, Mike. I added:

https://issues.apache.org/jira/browse/LUCENE-5134
https://issues.apache.org/jira/browse/LUCENE-5135

Otis

On Mon, Jul 15, 2013 at 1:28 PM, Michael McCandless <luc...@mikemccandless.com> wrote:

> Lookback is a good idea: you could at least gather statistics and assess, later, whether good merges had been selected, and maybe play "what if" games to explore whether different merge selections would have resulted in less copying.
>
> A time-based MergeScheduler would make sense: e.g., it would allow small merges to run any time, but big ones must wait until after hours.
>
> Also, RateLimitedDirWrapper can be used to limit the IO impact of ongoing merges. It's like a naive ionice, for merging.
>
> Mike McCandless
> http://blog.mikemccandless.com
>
> On Mon, Jul 8, 2013 at 10:41 PM, Otis Gospodnetic <otis.gospodne...@gmail.com> wrote:
>
>> Hi,
>>
>> I was (re-re-re-re)-reading Mike's post about Lucene segment merges: http://blog.mikemccandless.com/2011/02/visualizing-lucenes-segment-merges.html
>>
>> Mike mentioned lookahead as something that could possibly yield more optimal merges. But what about lookback? :) What if some sort of stats were kept about which segments were picked for merges? With some sort of stats in hand, could one look back and, knowing what happened after those merges, evaluate whether more optimal merge choices could have been made, and then use that next time?
>>
>> Also, what about time of day and query rates? Very often search traffic follows the wave pattern, which could mean that more aggressive merging could be done during periods with lower query rates... or maybe during that time more segments could be allowed to live in the index, assuming that after allowing that for some time, the subsequent merge could be bigger/more thorough, so to speak.
>>
>> Thoughts?
>>
>> Otis
>> --
>> Solr & ElasticSearch Support -- http://sematext.com/
>> Performance Monitoring -- http://sematext.com/spm
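A minimal sketch of the time-based scheduling Mike describes, as a ConcurrentMergeScheduler subclass that stalls large merges during business hours. This assumes Lucene 4.x's protected doMerge hook and the estimatedMergeBytes field on OneMerge; the size threshold and hours are illustrative:

{code}
import java.io.IOException;
import java.util.Calendar;
import org.apache.lucene.index.ConcurrentMergeScheduler;
import org.apache.lucene.index.MergePolicy;

// Sketch: let small merges run any time, but hold big merges until after hours.
public class AfterHoursMergeScheduler extends ConcurrentMergeScheduler {
  private static final long BIG_MERGE_BYTES = 1L << 30; // 1 GB, illustrative

  @Override
  protected void doMerge(MergePolicy.OneMerge merge) throws IOException {
    while (isBusinessHours() && merge.estimatedMergeBytes > BIG_MERGE_BYTES) {
      try {
        Thread.sleep(60000); // the merge thread re-checks once a minute
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new IOException(e);
      }
    }
    super.doMerge(merge);
  }

  private static boolean isBusinessHours() {
    int hour = Calendar.getInstance().get(Calendar.HOUR_OF_DAY);
    return hour >= 8 && hour < 20; // 8am to 8pm, illustrative
  }
}
{code}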
[jira] [Commented] (SOLR-5069) MapReduce for SolrCloud
[ https://issues.apache.org/jira/browse/SOLR-5069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13719593#comment-13719593 ] Yonik Seeley commented on SOLR-5069: bq. It should be something we should think of as a feature of Solr. Right - it's unrelated to this feature. We've already kicked around the idea of roles for nodes for years now (like in SOLR-2765), and they would be useful in many contexts. Someone actually needs to do the work though... patches welcome ;-)
MapReduce for SolrCloud --- Key: SOLR-5069 URL: https://issues.apache.org/jira/browse/SOLR-5069 Project: Solr Issue Type: New Feature Components: SolrCloud Reporter: Noble Paul Assignee: Noble Paul
Solr currently does not have a way to run long-running computational tasks across the cluster. We can piggyback on the mapreduce paradigm so that users have a smooth learning curve.
* The mapreduce component will be written as a RequestHandler in Solr
* Works only in SolrCloud mode (no support for standalone mode)
* Users can write MapReduce programs in Javascript or Java. First cut would be JS ( ? )
h1. sample word count program
h2. how to invoke?
http://host:port/solr/collection-x/mapreduce?map=map-script&reduce=reduce-script&sink=collectionX
h3. params
* map : a Javascript implementation of the map program
* reduce : a Javascript implementation of the reduce program
* sink : the collection to which the output is written. If no sink is passed, the request is redirected to the reduce node, waits till the process is complete, and the output of the reduce program is emitted as a standard Solr response. If the sink param is passed, the response will contain an id of the run which can be used to query the status in another command.
* reduceNode : node name where the reduce is run. If not passed, an arbitrary node is chosen.
The node which received the command would first identify one replica from each slice where the map program is executed. It will also identify another node from the same collection where the reduce program is run. Each run is given an id and the details of the nodes participating in the run will be written to ZK (as an ephemeral node).
h4. map script
{code:JavaScript}
var res = $.streamQuery("*:*"); // this is not run across the cluster, only on this index
while (res.hasMore()) {
  var doc = res.next();
  var txt = doc.get("txt"); // the field on which word count is performed
  var words = txt.split(" ");
  for (var i = 0; i < words.length; i++) {
    $.map(words[i], {'count': 1}); // this sends the map over to the reduce host
  }
}
{code}
Essentially two threads are created on the 'map' hosts: one for running the program and the other for coordinating with the 'reduce' host. The maps emitted are streamed live over an http connection to the reduce program.
h4. reduce script
This script is run on one node. This node accepts http connections from map nodes, and the 'maps' that are sent are collected in a queue which is polled and fed into the reduce program. This also keeps the 'reduced' data in memory till the whole run is complete. It expects a done message from all 'map' nodes before it declares the task complete. After the reduce program has been executed for all the input, the result is written out to the 'sink' collection, or straight out to the response.
{code:JavaScript}
var pair = $.nextMap();
var reduced = $.getCtx().getReducedMap(); // a hashmap
var count = reduced.get(pair.key());
if (count === null) {
  count = {"count": 0};
  reduced.put(pair.key(), count);
}
count.count += pair.val().count;
{code}
h4. example output
{code:JavaScript}
{
  "result": [
    "wordx": { "count": 15876765 },
    "wordy": { "count": 24657654 }
  ]
}
{code}
TBD
* The format in which the output is written to the target collection; I assume the reducedMap will have values mapping to the schema of the collection
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-5073) Improve SolrQuery class and add support for facet limit on per field basis in SolrJ
[ https://issues.apache.org/jira/browse/SOLR-5073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandro Mario Zbinden updated SOLR-5073: --- Attachment: SOLR-5073.patch Added a patch that allows the SolrQuery class to set the facet.limit on a per-field basis with the new methods setFacetLimit(String field, int limit) and getFacetLimit(String field). Improve SolrQuery class and add support for facet limit on per field basis in SolrJ --- Key: SOLR-5073 URL: https://issues.apache.org/jira/browse/SOLR-5073 Project: Solr Issue Type: Improvement Components: clients - java Affects Versions: 5.0, 4.4 Reporter: Sandro Mario Zbinden Priority: Minor Labels: facet, solrj Attachments: SOLR-5073.patch Original Estimate: 2h Remaining Estimate: 2h Currently the SolrQuery (org.apache.solr.client.solrj) class supports the setFacetLimit(int limit) and getFacetLimit() methods. Recently someone added a feature to specify the facet.limit on a per-field basis. It would be great if this feature could be used from SolrJ, with setFacetLimit(String field, int limit) and getFacetLimit(String field). setFacetPrefix is already implemented like this. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
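For context, Solr already supports per-field facet parameters via the f.<field>.facet.limit convention, so the new methods presumably just build that parameter name, the same way setFacetPrefix does. A hedged sketch of their likely shape (not the attached patch):

{code:java}
// Hypothetical shape of the proposed methods, inside SolrQuery.
public SolrQuery setFacetLimit(String field, int limit) {
    this.set("f." + field + ".facet.limit", limit);
    return this;
}

public int getFacetLimit(String field) {
    // Fall back to the global facet.limit (Solr's default is 100) when no
    // per-field value has been set.
    return this.getInt("f." + field + ".facet.limit",
            this.getInt("facet.limit", 100));
}
{code}

A caller would then write, e.g., query.setFacetLimit("category", 20) to cap facet counts for one field without affecting the others.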
[jira] [Resolved] (SOLR-4916) Add support to write and read Solr index files and transaction log files to and from HDFS.
[ https://issues.apache.org/jira/browse/SOLR-4916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Miller resolved SOLR-4916. --- Resolution: Fixed Fix Version/s: (was: 4.5) 4.4 Add support to write and read Solr index files and transaction log files to and from HDFS. -- Key: SOLR-4916 URL: https://issues.apache.org/jira/browse/SOLR-4916 Project: Solr Issue Type: New Feature Reporter: Mark Miller Assignee: Mark Miller Fix For: 5.0, 4.4 Attachments: SOLR-4916-ivy.patch, SOLR-4916-move-MiniDfsCluster-deps-from-solr-test-framework-to-solr-core.patch, SOLR-4916-nulloutput.patch, SOLR-4916-nulloutput.patch, SOLR-4916.patch, SOLR-4916.patch, SOLR-4916.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5069) MapReduce for SolrCloud
[ https://issues.apache.org/jira/browse/SOLR-5069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13719612#comment-13719612 ] Andrzej Bialecki commented on SOLR-5069: - bq. some things will be completely streamable w/o any need for buffering... think of re-implementing the terms component here - we can access terms in sorted order so the reducer would simply need to do a merge sort on the streams and then stream that result back! It could probably be implemented as a special case, because it strongly depends on the map() output being sorted. In the general case, however, the reducer must wait for all mappers to finish, because mappers may produce keys out of order and non-unique. +1 on node roles, as a separate issue - it should not hold off this issue. MapReduce for SolrCloud --- Key: SOLR-5069 URL: https://issues.apache.org/jira/browse/SOLR-5069 Project: Solr Issue Type: New Feature Components: SolrCloud Reporter: Noble Paul Assignee: Noble Paul
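The sorted special case described in the comment above amounts to a k-way merge of already-sorted streams. A self-contained sketch, assuming string keys (combining the values of equal keys across streams is omitted for brevity):

{code:java}
// When every mapper emits keys in sorted order (as a terms enumeration
// would), the reducer can stream a merge instead of buffering everything.
import java.util.Iterator;
import java.util.PriorityQueue;

class SortedStreamMerger {
    static class Head implements Comparable<Head> {
        final String key;
        final Iterator<String> rest;
        Head(String key, Iterator<String> rest) { this.key = key; this.rest = rest; }
        public int compareTo(Head o) { return key.compareTo(o.key); }
    }

    // Emits keys from all sorted streams in global sorted order, one at a time.
    static void merge(Iterable<Iterator<String>> sortedStreams) {
        PriorityQueue<Head> pq = new PriorityQueue<Head>();
        for (Iterator<String> s : sortedStreams) {
            if (s.hasNext()) pq.add(new Head(s.next(), s));
        }
        while (!pq.isEmpty()) {
            Head h = pq.poll();
            System.out.println(h.key); // stream straight out to the response
            if (h.rest.hasNext()) pq.add(new Head(h.rest.next(), h.rest));
        }
    }
}
{code}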
[jira] [Commented] (SOLR-5045) Pluggable Analytics
[ https://issues.apache.org/jira/browse/SOLR-5045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13719614#comment-13719614 ] Joel Bernstein commented on SOLR-5045: -- Yeah, the plan eventually would be to port the techniques used in SOLR-2894 to a pluggable Aggregator. Ideally pluggable analytics would lead to the implementation of different aggregation libraries. Since they can be implemented as pure plugins, developers wouldn't have to worry about getting their library committed. Interesting commercial opportunity for developing and maintaining a high performance analytic library for Solr, above and beyond what the community provides. Pluggable Analytics --- Key: SOLR-5045 URL: https://issues.apache.org/jira/browse/SOLR-5045 Project: Solr Issue Type: New Feature Components: search Affects Versions: 5.0 Reporter: Joel Bernstein Priority: Minor Fix For: 5.0 Attachments: SOLR-5045.patch, SOLR-5045.patch This ticket provides a pluggable aggregation framework through the introduction of a new *Aggregator* interface and a new search component called the *AggregatorComponent*. The *Aggregator* interface extends the PostFilter interface providing methods that allow DelegatingCollectors to perform aggregation at collect time. Aggregators were designed to play nicely with the CollapsingQParserPlugin introduced in SOLR-5027. The *AggregatorComponent* manages the output and distributed merging of aggregate results. This ticket is an alternate design to SOLR-4465 which had the same basic idea but a very different implementation. This implementation resolves the caching issues in SOLR-4465 and combined with SOLR-5027 plays nicely with field collapsing. It is also much less intrusive on the core code as it's entirely implemented with plugins. Initial Syntax for the sample SumQParserPlugin Aggregator: ../select?q=\*:\*&wt=xml&indent=true&fq=\{!sum field=popularity id=mysum\}&aggregate=true *fq=\{!sum field=popularity id=mysum\}* - Calls the SumQParserPlugin telling it to sum the field popularity. *aggregate=true* - turns on the AggregatorComponent The output contains a block that looks like this: {code:xml}
<lst name="aggregates">
  <lst name="mysum">
    <long name="sum">85</long>
  </lst>
</lst>
{code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
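To illustrate "aggregation at collect time" from the quoted description, here is a deliberately Lucene-free toy: a delegating collector that sums a per-document value as matches stream by. It is not Solr's actual Aggregator or DelegatingCollector API:

{code:java}
interface DocCollector {
    void collect(int doc);
}

class SumAggregatingCollector implements DocCollector {
    private final DocCollector delegate; // builds the normal search results
    private final long[] popularity;     // per-doc values of the aggregated field
    private long sum;

    SumAggregatingCollector(DocCollector delegate, long[] popularity) {
        this.delegate = delegate;
        this.popularity = popularity;
    }

    public void collect(int doc) {
        sum += popularity[doc]; // aggregate while collecting
        delegate.collect(doc);  // keep normal collection going
    }

    long sum() {
        return sum; // reported under the request's id, e.g. "mysum"
    }
}
{code}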
[jira] [Commented] (LUCENE-5131) CheckIndex is confusing for docvalues fields
[ https://issues.apache.org/jira/browse/LUCENE-5131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13719618#comment-13719618 ] ASF subversion and git services commented on LUCENE-5131: - Commit 1506964 from [~rcmuir] in branch 'dev/trunk' [ https://svn.apache.org/r1506964 ] LUCENE-5131: CheckIndex is confusing for docvalues fields CheckIndex is confusing for docvalues fields Key: LUCENE-5131 URL: https://issues.apache.org/jira/browse/LUCENE-5131 Project: Lucene - Core Issue Type: Bug Reporter: Robert Muir Attachments: LUCENE-5131.patch, LUCENE-5131.patch it prints things like: {noformat} test: docvalues...OK [0 total doc count; 18 docvalues fields] {noformat} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (LUCENE-5131) CheckIndex is confusing for docvalues fields
[ https://issues.apache.org/jira/browse/LUCENE-5131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir resolved LUCENE-5131. - Resolution: Fixed Fix Version/s: 4.5 5.0 CheckIndex is confusing for docvalues fields Key: LUCENE-5131 URL: https://issues.apache.org/jira/browse/LUCENE-5131 Project: Lucene - Core Issue Type: Bug Reporter: Robert Muir Fix For: 5.0, 4.5 Attachments: LUCENE-5131.patch, LUCENE-5131.patch it prints things like: {noformat} test: docvalues...OK [0 total doc count; 18 docvalues fields] {noformat} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5131) CheckIndex is confusing for docvalues fields
[ https://issues.apache.org/jira/browse/LUCENE-5131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13719620#comment-13719620 ] ASF subversion and git services commented on LUCENE-5131: - Commit 1506968 from [~rcmuir] in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1506968 ] LUCENE-5131: CheckIndex is confusing for docvalues fields CheckIndex is confusing for docvalues fields Key: LUCENE-5131 URL: https://issues.apache.org/jira/browse/LUCENE-5131 Project: Lucene - Core Issue Type: Bug Reporter: Robert Muir Attachments: LUCENE-5131.patch, LUCENE-5131.patch it prints things like: {noformat} test: docvalues...OK [0 total doc count; 18 docvalues fields] {noformat} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4985) Make it easier to mix different kinds of FacetRequests
[ https://issues.apache.org/jira/browse/LUCENE-4985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13719637#comment-13719637 ] Shai Erera commented on LUCENE-4985: I have been thinking about how to achieve that... here's a proposal:
* Make FacetsAccumulator abstract with the following current impls:
** TaxonomyFacetsAccumulator, assumes that TaxoReader is needed, FacetArrays etc.
** SortedSetFacetsAccumulator, assumes that categories were indexed to a SortedSetDVField
** RangeFacetsAccumulator, for computing facet ranges on NumericDV
** MultiFacetsAccumulator, allows chaining several ones (basically a generic version of RangeFacetsAccumulatorWrapper)
* Add FacetRequest.createFacetsAccumulator():
** CountFacetRequest and Association*FacetRequest return TaxoFacetsAccumulator
** SortedSetCountFacetRequest returns SortedSetFA (and also verifies that the given CategoryPath was actually indexed in a SortedSetDVField)
** RangeFacetRequest returns RangeFacetsAccumulator
This pretty much divides the FacetRequests by the source from which they read the facets information. Now we need to handle the different aggregation functions currently supported by the TaxoFacetAcc variants: counting, associations. TaxoFacetAcc will let you specify the FacetsAggregator:
* CountFacetRequest will set the aggregator to FastCounting (if possible) or just Counting.
* Association*FacetRequest will set the aggregator to the matching one
* Additional requests can set their own aggregator
* FacetsAggregator will need to implement equals() and hashCode()
Then we'll have FacetsAccumulator.create(List<FacetRequest>) which creates the right accumulator:
* Group all requests that use the same FacetsAccumulator, so that all RangeFRs are grouped together, all TaxoFacetAcc requests together, etc. (see the sketch after the quoted issue below)
* For the TaxoFacetAcc requests, it groups them by their aggregator, so that:
** all CountingAggregators that read the same category list are grouped together, separate from ones that do counting on a different category list
** all AssociationAggregators are grouped together, by their function, list id etc.
* It then creates either a single accumulator, or a MultiFacetsAccumulator which chains the accumulate calls
What do we gain -- it's easy for an app to create the right accumulator for a given list of requests. Today it needs to sort of do this logic on its own, which is sometimes impossible (e.g. if it's a component that doesn't know what it's given). Also, the requests are self-descriptive. What do we lose -- today if one wants to count A, B and C using CachedOrdsCountingFacetsAggregator, it needs to override FacetsAccumulator.getAggregator(), once. With this change, he will need to do that for every CountFacetRequest he creates... I think that's an OK tradeoff, given the situation today which makes apps' life tougher. I think we'll also need to create an Aggregator (old FacetsAggregator) wrapper. It is still needed by StandardFacetsAccumulator, until we finish the cleanup of sampling, complements counting etc. I'll look into that too; perhaps it can be done separately in a different issue. Now I need to hope I took all the parameters into account, and won't hit a brick wall when trying to implement it :).
Make it easier to mix different kinds of FacetRequests -- Key: LUCENE-4985 URL: https://issues.apache.org/jira/browse/LUCENE-4985 Project: Lucene - Core Issue Type: Improvement Components: modules/facet Reporter: Michael McCandless Fix For: 5.0, 4.5 Spinoff from LUCENE-4980, where we added a strange class called RangeFacetsAccumulatorWrapper, which takes an incoming FSP, splits out the FacetRequests into range and non-range, delegates to two accumulators for each set, and then zips the results back together in order. Somehow we should generalize this class and make it work with SortedSetDocValuesAccumulator as well. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
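A sketch of the grouping step from the proposal above, i.e. FacetsAccumulator.create(List<FacetRequest>) bucketing requests by the accumulator type they report; illustrative only, not committed code:

{code:java}
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class AccumulatorGrouping {
    interface FacetRequest {
        Class<?> accumulatorClass(); // e.g. RangeFacetsAccumulator.class
    }

    static Map<Class<?>, List<FacetRequest>> group(List<FacetRequest> requests) {
        Map<Class<?>, List<FacetRequest>> groups =
                new HashMap<Class<?>, List<FacetRequest>>();
        for (FacetRequest r : requests) {
            List<FacetRequest> bucket = groups.get(r.accumulatorClass());
            if (bucket == null) {
                bucket = new ArrayList<FacetRequest>();
                groups.put(r.accumulatorClass(), bucket);
            }
            bucket.add(r);
        }
        // One accumulator per group; more than one group means wrapping them
        // in a MultiFacetsAccumulator that chains the accumulate() calls.
        return groups;
    }
}
{code}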
[jira] [Updated] (SOLR-4221) Custom sharding
[ https://issues.apache.org/jira/browse/SOLR-4221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Noble Paul updated SOLR-4221: - Attachment: SOLR-4221.patch Working patch with test cases. Custom sharding --- Key: SOLR-4221 URL: https://issues.apache.org/jira/browse/SOLR-4221 Project: Solr Issue Type: New Feature Reporter: Yonik Seeley Assignee: Noble Paul Attachments: SOLR-4221.patch, SOLR-4221.patch Features to let users control everything about sharding/routing. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-5073) Improve SolrQuery class and add support for facet limit on per field basis in SolrJ
[ https://issues.apache.org/jira/browse/SOLR-5073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandro Mario Zbinden updated SOLR-5073: --- Affects Version/s: (was: 4.4) 4.5 Improve SolrQuery class and add support for facet limit on per field basis in SolrJ --- Key: SOLR-5073 URL: https://issues.apache.org/jira/browse/SOLR-5073
[jira] [Commented] (SOLR-5069) MapReduce for SolrCloud
[ https://issues.apache.org/jira/browse/SOLR-5069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13719654#comment-13719654 ] Andrzej Bialecki commented on SOLR-5069: - An alternative solution for minimizing the amount of data in memory during the reduce phase is to use re-reduce, or a reduce-side combiner in Hadoop terminology. This is an additional function that runs on the reducer and periodically performs intermediate reductions of already-accumulated values for a key, preserving the intermediate results (and discarding the accumulated values). This function does not emit anything to the final output. The final reduction function then operates on a mix of values that arrived since the last intermediate reduction, plus all results of previous intermediate reductions. This works well for simple aggregations (where the additional function may in fact be a copy of the reduce function) but may not be suitable for all classes of problems. MapReduce for SolrCloud --- Key: SOLR-5069 URL: https://issues.apache.org/jira/browse/SOLR-5069 Project: Solr Issue Type: New Feature Components: SolrCloud Reporter: Noble Paul Assignee: Noble Paul
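The re-reduce idea in the comment above can be sketched as a reducer that collapses a key's buffered values into one partial result whenever the buffer grows too large; this works because the reduction here (a sum) is associative. Hypothetical, self-contained:

{code:java}
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class ReReducingCounter {
    private static final int COMBINE_THRESHOLD = 1000;
    private final Map<String, List<Long>> buffered = new HashMap<String, List<Long>>();

    void accept(String key, long value) {
        List<Long> vals = buffered.get(key);
        if (vals == null) {
            vals = new ArrayList<Long>();
            buffered.put(key, vals);
        }
        vals.add(value);
        if (vals.size() >= COMBINE_THRESHOLD) {
            long partial = reduce(vals); // intermediate reduction
            vals.clear();
            vals.add(partial);           // keep only the partial result
        }
    }

    long reduce(List<Long> values) {
        long sum = 0;
        for (long v : values) sum += v;
        return sum;
    }

    // Final reduction: runs over partials plus values since the last combine.
    Map<String, Long> finish() {
        Map<String, Long> out = new HashMap<String, Long>();
        for (Map.Entry<String, List<Long>> e : buffered.entrySet()) {
            out.put(e.getKey(), reduce(e.getValue()));
        }
        return out;
    }
}
{code}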
[JENKINS] Lucene-Solr-4.x-MacOSX (64bit/jdk1.7.0) - Build # 657 - Failure!
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-4.x-MacOSX/657/ Java: 64bit/jdk1.7.0 -XX:+UseCompressedOops -XX:+UseG1GC 1 tests failed. REGRESSION: org.apache.solr.client.solrj.TestBatchUpdate.testWithBinaryBean Error Message: IOException occured when talking to server at: https://127.0.0.1:54453/solr/collection1 Stack Trace: org.apache.solr.client.solrj.SolrServerException: IOException occured when talking to server at: https://127.0.0.1:54453/solr/collection1 at __randomizedtesting.SeedInfo.seed([99320139438B3BFA:FAD9003B23EFFCD8]:0) at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:435) at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:180) at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:117) at org.apache.solr.client.solrj.SolrServer.commit(SolrServer.java:168) at org.apache.solr.client.solrj.SolrServer.commit(SolrServer.java:146) at org.apache.solr.client.solrj.TestBatchUpdate.testWithBinaryBean(TestBatchUpdate.java:92) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1559) at com.carrotsearch.randomizedtesting.RandomizedRunner.access$600(RandomizedRunner.java:79) at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:737) at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:773) at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:787) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53) at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50) at org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:51) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49) at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:358) at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:782) at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:442) at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:746) at com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:648) at com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:682) at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:693) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:43) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70) at
[jira] [Updated] (LUCENE-5127) FixedGapTermsIndex should use monotonic compression
[ https://issues.apache.org/jira/browse/LUCENE-5127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-5127: Attachment: LUCENE-5127.patch I made some progress... Finally clean up divisor and interval, which are only confusing to users since they have done nothing in the default codec for so long: and in 5.x we don't have to read any preflex indexes. This makes interval a codec parameter for fixedgap and so on (like blocktree's min/max). This is cleaner and more flexible anyway, because it means e.g. if you use one of these codecs you can specify it per-field in the usual ways rather than globally for the whole index. The fieldcache-like divisor is gone. As far as the special -1 value, I didn't yet clean this up, but I see two directions. The best IMO is to nuke the mergeReader shit from ReadersAndLiveDocs completely. Otherwise we keep it and codecs can do special shit based on IOContext, but in all cases we don't need a special param. Tests are passing (at least once). More cleanups are needed to some of the codec impls, and some of the special-case tests for corner-case bugs in the past (e.g. TII0+empty field name) should really be moved to fixed-gap specific unit tests. FixedGapTermsIndex should use monotonic compression --- Key: LUCENE-5127 URL: https://issues.apache.org/jira/browse/LUCENE-5127 Project: Lucene - Core Issue Type: Improvement Reporter: Robert Muir Attachments: LUCENE-5127.patch, LUCENE-5127.patch For the addresses in the big in-memory byte[] and disk blocks, we could save a good deal of RAM here. I think this codec just never got upgraded when we added these new packed improvements, but it might be interesting to try to use for the terms data of sorted/sortedset DV implementations. The patch works, but has nocommits and currently ignores the divisor. The annoying problem there being that we have the shared interface with get(int) for PackedInts.Mutable/Reader, but no equivalent base class for monotonic get(long)... Still it's enough that we could benchmark/compare for now. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
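For readers unfamiliar with monotonic compression, the gist is to fit a line through the (roughly evenly spaced) increasing offsets and store only small deviations from it, which pack into far fewer bits than raw longs. A Lucene-independent sketch, assuming a non-empty increasing array; a real implementation would bit-pack the deviations:

{code:java}
class MonotonicSketch {
    final long min;                // deviation floor, folded into the base
    final float avgDelta;          // slope of the fitted line
    final long[] packedDeviations; // all >= 0 and small; bit-packed in a real impl

    MonotonicSketch(long[] values) {
        int n = values.length;
        avgDelta = n <= 1 ? 0 : (float) (values[n - 1] - values[0]) / (n - 1);
        long[] devs = new long[n];
        long minDev = Long.MAX_VALUE;
        for (int i = 0; i < n; i++) {
            devs[i] = values[i] - values[0] - (long) (avgDelta * i);
            minDev = Math.min(minDev, devs[i]);
        }
        min = values[0] + minDev;
        packedDeviations = new long[n];
        for (int i = 0; i < n; i++) {
            packedDeviations[i] = devs[i] - minDev;
        }
    }

    // Exact reconstruction: base + linear term + stored deviation.
    long get(int i) {
        return min + (long) (avgDelta * i) + packedDeviations[i];
    }
}
{code}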
[JENKINS-MAVEN] Lucene-Solr-Maven-4.x #397: POMs out of sync
Build: https://builds.apache.org/job/Lucene-Solr-Maven-4.x/397/ 2 tests failed. FAILED: org.apache.solr.cloud.BasicDistributedZkTest.org.apache.solr.cloud.BasicDistributedZkTest Error Message: 1 thread leaked from SUITE scope at org.apache.solr.cloud.BasicDistributedZkTest: 1) Thread[id=5910, name=recoveryCmdExecutor-3203-thread-1, state=RUNNABLE, group=TGRP-BasicDistributedZkTest] at java.net.PlainSocketImpl.socketConnect(Native Method) at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:327) at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:193) at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:180) at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:384) at java.net.Socket.connect(Socket.java:546) at org.apache.http.conn.scheme.PlainSocketFactory.connectSocket(PlainSocketFactory.java:127) at org.apache.http.impl.conn.DefaultClientConnectionOperator.openConnection(DefaultClientConnectionOperator.java:180) at org.apache.http.impl.conn.ManagedClientConnectionImpl.open(ManagedClientConnectionImpl.java:294) at org.apache.http.impl.client.DefaultRequestDirector.tryConnect(DefaultRequestDirector.java:645) at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:480) at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:906) at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:805) at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:784) at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:365) at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:180) at org.apache.solr.cloud.SyncStrategy$1.run(SyncStrategy.java:291) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:679) Stack Trace: com.carrotsearch.randomizedtesting.ThreadLeakError: 1 thread leaked from SUITE scope at org.apache.solr.cloud.BasicDistributedZkTest: 1) Thread[id=5910, name=recoveryCmdExecutor-3203-thread-1, state=RUNNABLE, group=TGRP-BasicDistributedZkTest] at java.net.PlainSocketImpl.socketConnect(Native Method) at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:327) at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:193) at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:180) at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:384) at java.net.Socket.connect(Socket.java:546) at org.apache.http.conn.scheme.PlainSocketFactory.connectSocket(PlainSocketFactory.java:127) at org.apache.http.impl.conn.DefaultClientConnectionOperator.openConnection(DefaultClientConnectionOperator.java:180) at org.apache.http.impl.conn.ManagedClientConnectionImpl.open(ManagedClientConnectionImpl.java:294) at org.apache.http.impl.client.DefaultRequestDirector.tryConnect(DefaultRequestDirector.java:645) at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:480) at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:906) at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:805) at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:784) at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:365) at 
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:180) at org.apache.solr.cloud.SyncStrategy$1.run(SyncStrategy.java:291) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:679) at __randomizedtesting.SeedInfo.seed([A720B41D9C5A2470]:0) FAILED: org.apache.solr.cloud.BasicDistributedZkTest.org.apache.solr.cloud.BasicDistributedZkTest Error Message: There are still zombie threads that couldn't be terminated: 1) Thread[id=5910, name=recoveryCmdExecutor-3203-thread-1, state=RUNNABLE, group=TGRP-BasicDistributedZkTest] at java.net.PlainSocketImpl.socketConnect(Native Method) at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:327) at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:193) at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:180) at
[jira] [Commented] (LUCENE-5127) FixedGapTermsIndex should use monotonic compression
[ https://issues.apache.org/jira/browse/LUCENE-5127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13719728#comment-13719728 ] ASF subversion and git services commented on LUCENE-5127: - Commit 1507035 from [~rcmuir] in branch 'dev/branches/lucene5127' [ https://svn.apache.org/r1507035 ] LUCENE-5127: create branch FixedGapTermsIndex should use monotonic compression --- Key: LUCENE-5127 URL: https://issues.apache.org/jira/browse/LUCENE-5127
[jira] [Commented] (LUCENE-5127) FixedGapTermsIndex should use monotonic compression
[ https://issues.apache.org/jira/browse/LUCENE-5127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13719729#comment-13719729 ] ASF subversion and git services commented on LUCENE-5127: - Commit 1507036 from [~rcmuir] in branch 'dev/branches/lucene5127' [ https://svn.apache.org/r1507036 ] LUCENE-5127: dump current state FixedGapTermsIndex should use monotonic compression --- Key: LUCENE-5127 URL: https://issues.apache.org/jira/browse/LUCENE-5127
[jira] [Commented] (LUCENE-5127) FixedGapTermsIndex should use monotonic compression
[ https://issues.apache.org/jira/browse/LUCENE-5127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13719732#comment-13719732 ] ASF subversion and git services commented on LUCENE-5127: - Commit 1507041 from [~rcmuir] in branch 'dev/branches/lucene5127' [ https://svn.apache.org/r1507041 ] LUCENE-5127: randomize codec parameter FixedGapTermsIndex should use monotonic compression --- Key: LUCENE-5127 URL: https://issues.apache.org/jira/browse/LUCENE-5127
[jira] [Commented] (SOLR-4489) StringIndexOutOfBoundsException in SpellCheckComponent
[ https://issues.apache.org/jira/browse/SOLR-4489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13719735#comment-13719735 ] ASF subversion and git services commented on SOLR-4489: --- Commit 1507042 from [~jdyer] in branch 'dev/trunk' [ https://svn.apache.org/r1507042 ] SOLR-4489: fix StringIndexOutOfBoundsException in SpellCheckComponent
StringIndexOutOfBoundsException in SpellCheckComponent --- Key: SOLR-4489 URL: https://issues.apache.org/jira/browse/SOLR-4489 Project: Solr Issue Type: Bug Components: spellchecker Affects Versions: 4.4, 4.3.1 Environment: all Reporter: venkata marrapu Assignee: James Dyer Priority: Minor Fix For: 4.5 Attachments: SOLR-4489.patch, SOLR-4489.patch
My SOLR request params are as shown below: spellcheck=true&enableElevation=true&facet=true&spellcheck.q=minecraft&spellcheck.extendedResults=true&spellcheck.maxCollations=10&spellcheck.collate=true&wt=javabin&defType=edismax&spellcheck.onlyMorePopular=true etc. Note: this works fine in many use cases, however it fails for some query terms.
{noformat}
Feb 22, 2013 11:06:04 AM org.apache.solr.common.SolrException log
SEVERE: null:java.lang.StringIndexOutOfBoundsException: String index out of range: -5
 at java.lang.AbstractStringBuilder.replace(AbstractStringBuilder.java:797)
 at java.lang.StringBuilder.replace(StringBuilder.java:271)
 at org.apache.solr.spelling.SpellCheckCollator.getCollation(SpellCheckCollator.java:190)
 at org.apache.solr.spelling.SpellCheckCollator.collate(SpellCheckCollator.java:75)
 at org.apache.solr.handler.component.SpellCheckComponent.addCollationsToResponse(SpellCheckComponent.java:203)
 at org.apache.solr.handler.component.SpellCheckComponent.process(SpellCheckComponent.java:180)
 at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:206)
 at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
 at org.apache.solr.core.SolrCore.execute(SolrCore.java:1699)
 at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:455)
 at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:276)
 at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1337)
 at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:484)
 at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:119)
 at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:524)
 at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:233)
 at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1065)
 at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:413)
 at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:192)
 at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:999)
 at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:117)
 at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:250)
 at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:149)
 at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:111)
 at org.eclipse.jetty.server.Server.handle(Server.java:351)
 at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:454)
 at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:47)
 at org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:890)
 at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:944)
 at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:634)
 at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:230)
 at org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:66)
 at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:254)
 at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:599)
 at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:534)
 at java.lang.Thread.run(Thread.java:680)
{noformat}
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
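As an illustration of how this class of error can arise (not the actual SpellCheckCollator code): substituting tokens of different lengths into a StringBuilder shifts all later offsets, and if the running shift is mis-tracked, the computed start index can go negative before replace() is called:

{code:java}
public class OffsetShiftDemo {
    public static void main(String[] args) {
        StringBuilder query = new StringBuilder("minecraft minecraft");
        int[][] tokenOffsets = { {0, 9}, {10, 19} }; // offsets in the original query
        String correction = "mine";                  // shorter replacement
        int shift = 0;
        for (int[] tok : tokenOffsets) {
            int start = tok[0] + shift;
            int end = tok[1] + shift;
            query.replace(start, end, correction);
            // The running shift here becomes -5; forgetting this update, or
            // applying it with the wrong sign, is how a negative index like
            // the "-5" in the report can reach StringBuilder.replace().
            shift += correction.length() - (tok[1] - tok[0]);
        }
        System.out.println(query); // prints "mine mine"
    }
}
{code}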
[jira] [Commented] (SOLR-4489) StringIndexOutOfBoundsException in SpellCheckComponent
[ https://issues.apache.org/jira/browse/SOLR-4489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13719741#comment-13719741 ] ASF subversion and git services commented on SOLR-4489: --- Commit 1507049 from [~jdyer] in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1507049 ] SOLR-4489: fix StringIndexOutOfBoundsException in SpellCheckComponent StringIndexOutOfBoundsException in SpellCheckComponent --- Key: SOLR-4489 URL: https://issues.apache.org/jira/browse/SOLR-4489
[jira] [Resolved] (SOLR-4489) StringIndexOutOfBoundsException in SpellCheckComponent
[ https://issues.apache.org/jira/browse/SOLR-4489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] James Dyer resolved SOLR-4489. -- Resolution: Fixed StringIndexOutOfBoundsException in SpellCheckComponent --- Key: SOLR-4489 URL: https://issues.apache.org/jira/browse/SOLR-4489 Project: Solr Issue Type: Bug Components: spellchecker Affects Versions: 4.4, 4.3.1 Environment: all Reporter: venkata marrapu Assignee: James Dyer Priority: Minor Fix For: 4.5 Attachments: SOLR-4489.patch, SOLR-4489.patch My SOLR request params are as shown below. spellcheck=trueenableElevation=truefacet=truespellcheck.q=minecraftspellcheck.extendedResults=truespellcheck.maxCollations=10spellcheck.collate=truewt=javabindefType=edismaxspellcheck.onlyMorePopular=true etc. Note: this work fine many use cases, however it fails for some query terms. Feb 22, 2013 11:06:04 AM org.apache.solr.common.SolrException log SEVERE: null:java.lang.StringIndexOutOfBoundsException: String index out of range: -5 at java.lang.AbstractStringBuilder.replace(AbstractStringBuilder.java:797) at java.lang.StringBuilder.replace(StringBuilder.java:271) at org.apache.solr.spelling.SpellCheckCollator.getCollation(SpellCheckCollator.java:190) at org.apache.solr.spelling.SpellCheckCollator.collate(SpellCheckCollator.java:75) at org.apache.solr.handler.component.SpellCheckComponent.addCollationsToResponse(SpellCheckComponent.java:203) at org.apache.solr.handler.component.SpellCheckComponent.process(SpellCheckComponent.java:180) at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:206) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1699) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:455) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:276) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1337) at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:484) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:119) at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:524) at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:233) at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1065) at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:413) at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:192) at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:999) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:117) at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:250) at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:149) at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:111) at org.eclipse.jetty.server.Server.handle(Server.java:351) at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:454) at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:47) at org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:890) at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:944) at 
org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:634) at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:230) at org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:66) at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:254) at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:599) at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:534) at java.lang.Thread.run(Thread.java:680) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
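The exception class in the trace above is easy to reproduce in isolation: StringBuilder.replace throws StringIndexOutOfBoundsException whenever the computed start offset goes negative, matching the "String index out of range: -5" message. A minimal sketch with illustrative values (not the actual SpellCheckCollator offsets):

public class CollationOffsetSketch {
  public static void main(String[] args) {
    StringBuilder collation = new StringBuilder("q=minecraft");
    int start = -5; // a miscomputed replacement offset, as in the trace above
    // throws java.lang.StringIndexOutOfBoundsException: String index out of range: -5
    collation.replace(start, start + "minecraft".length(), "minecart");
  }
}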
[jira] [Commented] (LUCENE-5133) AnalyzingInfixSuggester should return structured highlighted results instead of single String per result
[ https://issues.apache.org/jira/browse/LUCENE-5133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13719749#comment-13719749 ] Robert Muir commented on LUCENE-5133: - Why not use Object like the patch on LUCENE-4906 and try to get some consistency? I can easily see this becoming hell because different expert users want different things. It might work for your particular case to have String text + boolean, but other people might want to know crazy things like: * score for the passage * which multi-valued field instance they hit * position or something of the passage within the doc In general I also think it's really bad to add additional classes that users must learn (the previous API here is String, which everyone already knows). Anyway, I don't care too much for this class, but I'd hate for us to make this mistake over on LUCENE-4906. I feel like the other highlighters already introduce way too many new classes (besides already known simple ones like IndexSearcher, TopDocs, String, etc.) and it makes them difficult to use. AnalyzingInfixSuggester should return structured highlighted results instead of single String per result Key: LUCENE-5133 URL: https://issues.apache.org/jira/browse/LUCENE-5133 Project: Lucene - Core Issue Type: Improvement Reporter: Michael McCandless Fix For: 5.0, 4.5 Attachments: LUCENE-5133.patch, LUCENE-5133.patch Today it renders to an HTML string (<b>..</b> for hits) in protected methods that one can override to change the highlighting, but this is hard/inefficient to use for search servers that want to e.g. return JSON representation of the highlighted result. This is the same issue as LUCENE-4906 (PostingsHighlighter) but for AnalyzingInfixSuggester's highlights instead. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
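A minimal, self-contained sketch of the Object-returning hook being advocated here (names are illustrative, not the committed Lucene API): the default keeps returning the HTML String everyone already knows, while an expert subclass can return any structure it likes without the library defining new result classes.

import java.util.Collections;
import java.util.Locale;
import java.util.Set;

public class HighlightingSuggesterSketch {
  // Default: plain HTML, "the String everyone already knows".
  // Expert subclasses may override and return any Object they want.
  protected Object highlight(String text, Set<String> matchedTokens) {
    StringBuilder sb = new StringBuilder();
    for (String word : text.split("\\s+")) {
      if (sb.length() > 0) sb.append(' ');
      if (matchedTokens.contains(word.toLowerCase(Locale.ROOT))) {
        sb.append("<b>").append(word).append("</b>");
      } else {
        sb.append(word);
      }
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    Object result = new HighlightingSuggesterSketch()
        .highlight("a lightweight java importer", Collections.singleton("java"));
    System.out.println(result); // a lightweight <b>java</b> importer
  }
}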
[jira] [Commented] (LUCENE-5127) FixedGapTermsIndex should use monotonic compression
[ https://issues.apache.org/jira/browse/LUCENE-5127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13719752#comment-13719752 ] ASF subversion and git services commented on LUCENE-5127: - Commit 1507054 from [~rcmuir] in branch 'dev/branches/lucene5127' [ https://svn.apache.org/r1507054 ] LUCENE-5127: fix solr tests FixedGapTermsIndex should use monotonic compression --- Key: LUCENE-5127 URL: https://issues.apache.org/jira/browse/LUCENE-5127 Project: Lucene - Core Issue Type: Improvement Reporter: Robert Muir Attachments: LUCENE-5127.patch, LUCENE-5127.patch for the addresses in the big in-memory byte[] and disk blocks, we could save a good deal of RAM here. I think this codec just never got upgraded when we added these new packed improvements, but it might be interesting to try to use for the terms data of sorted/sortedset DV implementations. patch works, but has nocommits and currently ignores the divisor. The annoying problem there being that we have the shared interface with get(int) for PackedInts.Mutable/Reader, but no equivalent base class for monotonics get(long)... Still its enough that we could benchmark/compare for now. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
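For context, monotonic compression stores a monotonically increasing sequence (such as term index start offsets) as small signed deltas from a linear expectation; the deltas pack into far fewer bits than the raw longs. A toy illustration of the encoding, not Lucene's actual packed-ints implementation:

public class MonotonicSketch {
  public static void main(String[] args) {
    long[] addresses = {0, 17, 35, 50, 68, 85, 101}; // monotonically increasing
    long min = addresses[0];
    float avg = (float) (addresses[addresses.length - 1] - min) / (addresses.length - 1);
    long[] deltas = new long[addresses.length];
    for (int i = 0; i < addresses.length; i++) {
      // small values near zero; these are what would get bit-packed on disk
      deltas[i] = addresses[i] - min - (long) (avg * i);
    }
    // exact reconstruction from (min, avg, deltas):
    for (int i = 0; i < addresses.length; i++) {
      long restored = min + (long) (avg * i) + deltas[i];
      if (restored != addresses[i]) throw new AssertionError();
    }
  }
}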
[jira] [Commented] (LUCENE-5127) FixedGapTermsIndex should use monotonic compression
[ https://issues.apache.org/jira/browse/LUCENE-5127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13719775#comment-13719775 ] Michael McCandless commented on LUCENE-5127: This cleanup is awesome, thanks Rob! I think we should just nuke the special -1 ("don't load terms index") value? FixedGapTermsIndex should use monotonic compression --- Key: LUCENE-5127 URL: https://issues.apache.org/jira/browse/LUCENE-5127 Project: Lucene - Core Issue Type: Improvement Reporter: Robert Muir Attachments: LUCENE-5127.patch, LUCENE-5127.patch for the addresses in the big in-memory byte[] and disk blocks, we could save a good deal of RAM here. I think this codec just never got upgraded when we added these new packed improvements, but it might be interesting to try to use for the terms data of sorted/sortedset DV implementations. patch works, but has nocommits and currently ignores the divisor. The annoying problem there being that we have the shared interface with get(int) for PackedInts.Mutable/Reader, but no equivalent base class for monotonics get(long)... Still its enough that we could benchmark/compare for now. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (SOLR-5076) Make it possible to get list of collections with CollectionsHandler
Shawn Heisey created SOLR-5076: -- Summary: Make it possible to get list of collections with CollectionsHandler Key: SOLR-5076 URL: https://issues.apache.org/jira/browse/SOLR-5076 Project: Solr Issue Type: Improvement Reporter: Shawn Heisey Priority: Minor It would be very useful to have /admin/collections (CollectionsHandler) send a response similar to /admin/cores. This should probably be the default action, but requiring ?action=STATUS wouldn't be the end of the world. It would be very useful if CloudSolrServer were to implement a getCollections method, but that probably should be a separate issue. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: VOTE: RC1 Release apache-solr-ref-guide-4.4.pdf
: Crap, I just noticed Hoss's : https://cwiki.apache.org/confluence/display/solr/Upgrading+Solr, which : is a shorter version of the 4.4 upgrade notes page I just created. : It feels weird to have upgrade notes for different versions in multiple : places - maybe the previous release upgrade pages could stay where they : are, but add references to them from the current release upgrade notes? : Actually, it also seems weird that the previous 4.X upgrade notes are : under the Major Changes from Solr 3 to Solr 4 page in the left-hand : navigation pane. Once upon a time, Major Changes from Solr 3 to Solr 4 was a top-level section very early in the doc, and it had child pages for Upgrading to 4.x for each of the 4.x versions released so far -- this was primarily because Lucid only hosted a single version of the guide for all of 4.x. I had a discussion with Cassandra on IRC about eliminating that page and its children and having a single Upgrading page replace them (at the beginning of the doc). But then we decided that since this is the first official copy of the guide to be released by Apache, we should keep the Major Changes from Solr 3 to Solr 4 page around for at least one release as sort of an appendix. The fact that the other Upgrading to Solr 4.x pages were left as children was purely a mistake on my part -- I meant to delete those. Your new Upgrading to Solr 4.4 page is better than the one we already have, but I think we should rename it to simply Upgrading Solr so that it has a consistent name/url moving forward. I'll do the following: * delete all of the old upgrading pages * move your new upgrading page to the front of the doc and rename it to the general Upgrading Solr * add the one sentence I think is missing to your upgrading page (If you are upgrading from Solr 3.x, you should familiarize yourself with the Major Changes from Solr 3 to Solr 4.) * cut a new RC2 Cool? -Hoss - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5127) FixedGapTermsIndex should use monotonic compression
[ https://issues.apache.org/jira/browse/LUCENE-5127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13719793#comment-13719793 ] ASF subversion and git services commented on LUCENE-5127: - Commit 1507067 from [~rcmuir] in branch 'dev/branches/lucene5127' [ https://svn.apache.org/r1507067 ] LUCENE-5127: nuke mergeReader FixedGapTermsIndex should use monotonic compression --- Key: LUCENE-5127 URL: https://issues.apache.org/jira/browse/LUCENE-5127 Project: Lucene - Core Issue Type: Improvement Reporter: Robert Muir Attachments: LUCENE-5127.patch, LUCENE-5127.patch for the addresses in the big in-memory byte[] and disk blocks, we could save a good deal of RAM here. I think this codec just never got upgraded when we added these new packed improvements, but it might be interesting to try to use for the terms data of sorted/sortedset DV implementations. patch works, but has nocommits and currently ignores the divisor. The annoying problem there being that we have the shared interface with get(int) for PackedInts.Mutable/Reader, but no equivalent base class for monotonics get(long)... Still its enough that we could benchmark/compare for now. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5127) FixedGapTermsIndex should use monotonic compression
[ https://issues.apache.org/jira/browse/LUCENE-5127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13719798#comment-13719798 ] ASF subversion and git services commented on LUCENE-5127: - Commit 1507070 from [~rcmuir] in branch 'dev/branches/lucene5127' [ https://svn.apache.org/r1507070 ] LUCENE-5127: simplify seek-within-block FixedGapTermsIndex should use monotonic compression --- Key: LUCENE-5127 URL: https://issues.apache.org/jira/browse/LUCENE-5127 Project: Lucene - Core Issue Type: Improvement Reporter: Robert Muir Attachments: LUCENE-5127.patch, LUCENE-5127.patch for the addresses in the big in-memory byte[] and disk blocks, we could save a good deal of RAM here. I think this codec just never got upgraded when we added these new packed improvements, but it might be interesting to try to use for the terms data of sorted/sortedset DV implementations. patch works, but has nocommits and currently ignores the divisor. The annoying problem there being that we have the shared interface with get(int) for PackedInts.Mutable/Reader, but no equivalent base class for monotonics get(long)... Still its enough that we could benchmark/compare for now. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5127) FixedGapTermsIndex should use monotonic compression
[ https://issues.apache.org/jira/browse/LUCENE-5127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13719826#comment-13719826 ] ASF subversion and git services commented on LUCENE-5127: - Commit 1507075 from [~rcmuir] in branch 'dev/branches/lucene5127' [ https://svn.apache.org/r1507075 ] LUCENE-5127: explicit var gap testing part 1 FixedGapTermsIndex should use monotonic compression --- Key: LUCENE-5127 URL: https://issues.apache.org/jira/browse/LUCENE-5127 Project: Lucene - Core Issue Type: Improvement Reporter: Robert Muir Attachments: LUCENE-5127.patch, LUCENE-5127.patch for the addresses in the big in-memory byte[] and disk blocks, we could save a good deal of RAM here. I think this codec just never got upgraded when we added these new packed improvements, but it might be interesting to try to use for the terms data of sorted/sortedset DV implementations. patch works, but has nocommits and currently ignores the divisor. The annoying problem there being that we have the shared interface with get(int) for PackedInts.Mutable/Reader, but no equivalent base class for monotonics get(long)... Still its enough that we could benchmark/compare for now. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5127) FixedGapTermsIndex should use monotonic compression
[ https://issues.apache.org/jira/browse/LUCENE-5127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13719836#comment-13719836 ] ASF subversion and git services commented on LUCENE-5127: - Commit 1507078 from [~rcmuir] in branch 'dev/branches/lucene5127' [ https://svn.apache.org/r1507078 ] LUCENE-5127: explicit var gap testing part 2 FixedGapTermsIndex should use monotonic compression --- Key: LUCENE-5127 URL: https://issues.apache.org/jira/browse/LUCENE-5127 Project: Lucene - Core Issue Type: Improvement Reporter: Robert Muir Attachments: LUCENE-5127.patch, LUCENE-5127.patch for the addresses in the big in-memory byte[] and disk blocks, we could save a good deal of RAM here. I think this codec just never got upgraded when we added these new packed improvements, but it might be interesting to try to use for the terms data of sorted/sortedset DV implementations. patch works, but has nocommits and currently ignores the divisor. The annoying problem there being that we have the shared interface with get(int) for PackedInts.Mutable/Reader, but no equivalent base class for monotonics get(long)... Still its enough that we could benchmark/compare for now. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[JENKINS] Lucene-Solr-NightlyTests-4.x - Build # 325 - Still Failing
Build: https://builds.apache.org/job/Lucene-Solr-NightlyTests-4.x/325/ No tests ran. Build Log: [...truncated 6600 lines...] FATAL: hudson.remoting.RequestAbortedException: java.io.IOException: Unexpected termination of the channel hudson.remoting.RequestAbortedException: hudson.remoting.RequestAbortedException: java.io.IOException: Unexpected termination of the channel at hudson.remoting.RequestAbortedException.wrapForRethrow(RequestAbortedException.java:41) at hudson.remoting.RequestAbortedException.wrapForRethrow(RequestAbortedException.java:34) at hudson.remoting.Request.call(Request.java:174) at hudson.remoting.Channel.call(Channel.java:713) at hudson.remoting.RemoteInvocationHandler.invoke(RemoteInvocationHandler.java:167) at com.sun.proxy.$Proxy40.join(Unknown Source) at hudson.Launcher$RemoteLauncher$ProcImpl.join(Launcher.java:925) at hudson.Launcher$ProcStarter.join(Launcher.java:360) at hudson.tasks.Ant.perform(Ant.java:217) at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:19) at hudson.model.AbstractBuild$AbstractBuildExecution.perform(AbstractBuild.java:804) at hudson.model.Build$BuildExecution.build(Build.java:199) at hudson.model.Build$BuildExecution.doRun(Build.java:160) at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:586) at hudson.model.Run.execute(Run.java:1593) at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:46) at hudson.model.ResourceController.execute(ResourceController.java:88) at hudson.model.Executor.run(Executor.java:247) Caused by: hudson.remoting.RequestAbortedException: java.io.IOException: Unexpected termination of the channel at hudson.remoting.Request.abort(Request.java:299) at hudson.remoting.Channel.terminate(Channel.java:773) at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:69) Caused by: java.io.IOException: Unexpected termination of the channel at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:50) Caused by: java.io.EOFException at java.io.ObjectInputStream$BlockDataInputStream.peekByte(ObjectInputStream.java:2596) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1316) at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370) at hudson.remoting.Command.readFrom(Command.java:92) at hudson.remoting.ClassicCommandTransport.read(ClassicCommandTransport.java:72) at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:48) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5127) FixedGapTermsIndex should use monotonic compression
[ https://issues.apache.org/jira/browse/LUCENE-5127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13719873#comment-13719873 ] ASF subversion and git services commented on LUCENE-5127: - Commit 1507083 from [~rcmuir] in branch 'dev/branches/lucene5127' [ https://svn.apache.org/r1507083 ] LUCENE-5127: simplify vargap FixedGapTermsIndex should use monotonic compression --- Key: LUCENE-5127 URL: https://issues.apache.org/jira/browse/LUCENE-5127 Project: Lucene - Core Issue Type: Improvement Reporter: Robert Muir Attachments: LUCENE-5127.patch, LUCENE-5127.patch for the addresses in the big in-memory byte[] and disk blocks, we could save a good deal of RAM here. I think this codec just never got upgraded when we added these new packed improvements, but it might be interesting to try to use for the terms data of sorted/sortedset DV implementations. patch works, but has nocommits and currently ignores the divisor. The annoying problem there being that we have the shared interface with get(int) for PackedInts.Mutable/Reader, but no equivalent base class for monotonics get(long)... Still its enough that we could benchmark/compare for now. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
VOTE: RC2 Release apache-solr-ref-guide-4.4.pdf
Please VOTE to release the following PDF as apache-solr-ref-guide-4.4.pdf https://people.apache.org/~hossman/apache-solr-ref-guide-4.4_RC2.pdf Changes since RC1... * Additional info from dsmiley on several pages related to spatial * Improvements in organization of Upgrading instructions * minor corrections to the HDFS Admin UI pages -Hoss - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5133) AnalyzingInfixSuggester should return structured highlighted results instead of single String per result
[ https://issues.apache.org/jira/browse/LUCENE-5133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13719877#comment-13719877 ] Michael McCandless commented on LUCENE-5133: OK I'll try to cutover to Object instead. AnalyzingInfixSuggester should return structured highlighted results instead of single String per result Key: LUCENE-5133 URL: https://issues.apache.org/jira/browse/LUCENE-5133 Project: Lucene - Core Issue Type: Improvement Reporter: Michael McCandless Fix For: 5.0, 4.5 Attachments: LUCENE-5133.patch, LUCENE-5133.patch Today it renders to an HTML string (<b>..</b> for hits) in protected methods that one can override to change the highlighting, but this is hard/inefficient to use for search servers that want to e.g. return JSON representation of the highlighted result. This is the same issue as LUCENE-4906 (PostingsHighlighter) but for AnalyzingInfixSuggester's highlights instead. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: VOTE: RC2 Release apache-solr-ref-guide-4.4.pdf
+1 On Jul 25, 2013, at 2:24 PM, Chris Hostetter hossman_luc...@fucit.org wrote: Please VOTE to release the following PDF as apache-solr-ref-guide-4.4.pdf https://people.apache.org/~hossman/apache-solr-ref-guide-4.4_RC2.pdf Changes since RC1... * Additional info from dsmiley on several pages related to spatial * Improvements in organization of Upgrading instructions * minor corrections to the HDFS Admin UI pages -Hoss - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-2894) Implement distributed pivot faceting
[ https://issues.apache.org/jira/browse/SOLR-2894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Muldowney updated SOLR-2894: --- Attachment: SOLR-2894.patch Fixed an issue where commas in string fields would cause infinite refinement loops. Implement distributed pivot faceting Key: SOLR-2894 URL: https://issues.apache.org/jira/browse/SOLR-2894 Project: Solr Issue Type: Improvement Reporter: Erik Hatcher Fix For: 4.5 Attachments: SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894-reworked.patch Following up on SOLR-792, pivot faceting currently only supports undistributed mode. Distributed pivot faceting needs to be implemented. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
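The attachment comment doesn't include the patch details, but the bug class is familiar: if refinement requests serialize pivot values with a bare comma delimiter, a string value that itself contains a comma can never be matched on the refining shard, so refinement never converges. A hypothetical sketch of the usual remedy, escaping the delimiter before joining; the helper below is illustrative, not the actual SOLR-2894 patch:

import java.util.Arrays;
import java.util.List;

public class DelimiterEscapeSketch {
  // Hypothetical helper: escape the delimiter (and the escape char itself)
  // so "St. Louis, MO" survives a round trip through join/split.
  static String join(List<String> values) {
    StringBuilder sb = new StringBuilder();
    for (String v : values) {
      if (sb.length() > 0) sb.append(',');
      sb.append(v.replace("\\", "\\\\").replace(",", "\\,"));
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    System.out.println(join(Arrays.asList("St. Louis, MO", "Denver")));
    // prints: St. Louis\, MO,Denver
  }
}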
[jira] [Commented] (LUCENE-5127) FixedGapTermsIndex should use monotonic compression
[ https://issues.apache.org/jira/browse/LUCENE-5127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13719892#comment-13719892 ] ASF subversion and git services commented on LUCENE-5127: - Commit 1507086 from [~rcmuir] in branch 'dev/branches/lucene5127' [ https://svn.apache.org/r1507086 ] LUCENE-5127: simplify fixedgap FixedGapTermsIndex should use monotonic compression --- Key: LUCENE-5127 URL: https://issues.apache.org/jira/browse/LUCENE-5127 Project: Lucene - Core Issue Type: Improvement Reporter: Robert Muir Attachments: LUCENE-5127.patch, LUCENE-5127.patch for the addresses in the big in-memory byte[] and disk blocks, we could save a good deal of RAM here. I think this codec just never got upgraded when we added these new packed improvements, but it might be interesting to try to use for the terms data of sorted/sortedset DV implementations. patch works, but has nocommits and currently ignores the divisor. The annoying problem there being that we have the shared interface with get(int) for PackedInts.Mutable/Reader, but no equivalent base class for monotonics get(long)... Still its enough that we could benchmark/compare for now. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5127) FixedGapTermsIndex should use monotonic compression
[ https://issues.apache.org/jira/browse/LUCENE-5127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13719894#comment-13719894 ] ASF subversion and git services commented on LUCENE-5127: - Commit 1507087 from [~rcmuir] in branch 'dev/branches/lucene5127' [ https://svn.apache.org/r1507087 ] LUCENE-5127: fix indent FixedGapTermsIndex should use monotonic compression --- Key: LUCENE-5127 URL: https://issues.apache.org/jira/browse/LUCENE-5127 Project: Lucene - Core Issue Type: Improvement Reporter: Robert Muir Attachments: LUCENE-5127.patch, LUCENE-5127.patch for the addresses in the big in-memory byte[] and disk blocks, we could save a good deal of RAM here. I think this codec just never got upgraded when we added these new packed improvements, but it might be interesting to try to use for the terms data of sorted/sortedset DV implementations. patch works, but has nocommits and currently ignores the divisor. The annoying problem there being that we have the shared interface with get(int) for PackedInts.Mutable/Reader, but no equivalent base class for monotonics get(long)... Still its enough that we could benchmark/compare for now. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: VOTE: RC2 Release apache-solr-ref-guide-4.4.pdf
: Please VOTE to release the following PDF as apache-solr-ref-guide-4.4.pdf : : https://people.apache.org/~hossman/apache-solr-ref-guide-4.4_RC2.pdf +1 -Hoss - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[JENKINS-MAVEN] Lucene-Solr-Maven-trunk #919: POMs out of sync
Build: https://builds.apache.org/job/Lucene-Solr-Maven-trunk/919/ 2 tests failed. FAILED: org.apache.solr.cloud.BasicDistributedZkTest.org.apache.solr.cloud.BasicDistributedZkTest Error Message: 1 thread leaked from SUITE scope at org.apache.solr.cloud.BasicDistributedZkTest: 1) Thread[id=2181, name=recoveryCmdExecutor-1072-thread-1, state=RUNNABLE, group=TGRP-BasicDistributedZkTest] at java.net.PlainSocketImpl.socketConnect(Native Method) at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339) at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200) at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182) at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392) at java.net.Socket.connect(Socket.java:579) at org.apache.http.conn.scheme.PlainSocketFactory.connectSocket(PlainSocketFactory.java:127) at org.apache.http.impl.conn.DefaultClientConnectionOperator.openConnection(DefaultClientConnectionOperator.java:180) at org.apache.http.impl.conn.ManagedClientConnectionImpl.open(ManagedClientConnectionImpl.java:294) at org.apache.http.impl.client.DefaultRequestDirector.tryConnect(DefaultRequestDirector.java:645) at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:480) at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:906) at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:805) at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:784) at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:365) at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:180) at org.apache.solr.cloud.SyncStrategy$1.run(SyncStrategy.java:291) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:724) Stack Trace: com.carrotsearch.randomizedtesting.ThreadLeakError: 1 thread leaked from SUITE scope at org.apache.solr.cloud.BasicDistributedZkTest: 1) Thread[id=2181, name=recoveryCmdExecutor-1072-thread-1, state=RUNNABLE, group=TGRP-BasicDistributedZkTest] at java.net.PlainSocketImpl.socketConnect(Native Method) at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339) at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200) at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182) at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392) at java.net.Socket.connect(Socket.java:579) at org.apache.http.conn.scheme.PlainSocketFactory.connectSocket(PlainSocketFactory.java:127) at org.apache.http.impl.conn.DefaultClientConnectionOperator.openConnection(DefaultClientConnectionOperator.java:180) at org.apache.http.impl.conn.ManagedClientConnectionImpl.open(ManagedClientConnectionImpl.java:294) at org.apache.http.impl.client.DefaultRequestDirector.tryConnect(DefaultRequestDirector.java:645) at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:480) at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:906) at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:805) at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:784) at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:365) at 
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:180) at org.apache.solr.cloud.SyncStrategy$1.run(SyncStrategy.java:291) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:724) at __randomizedtesting.SeedInfo.seed([C4A07F58248377E0]:0) FAILED: org.apache.solr.cloud.BasicDistributedZkTest.org.apache.solr.cloud.BasicDistributedZkTest Error Message: There are still zombie threads that couldn't be terminated: 1) Thread[id=2181, name=recoveryCmdExecutor-1072-thread-1, state=RUNNABLE, group=TGRP-BasicDistributedZkTest] at java.net.PlainSocketImpl.socketConnect(Native Method) at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339) at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200) at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182) at
[jira] [Commented] (SOLR-5076) Make it possible to get list of collections with CollectionsHandler
[ https://issues.apache.org/jira/browse/SOLR-5076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13719919#comment-13719919 ] Shawn Heisey commented on SOLR-5076: Slightly off-topic: The initial motivation for this issue is getting the collection list from CloudSolrServer, but when I went looking for ways to get that information in a machine-readable way from Solr without getting into zookeeper objects, I couldn't find one. Within CloudSolrServer, it might make sense to use the ZK objects rather than /admin/collections, but I don't think we should force a user to do so. Make it possible to get list of collections with CollectionsHandler --- Key: SOLR-5076 URL: https://issues.apache.org/jira/browse/SOLR-5076 Project: Solr Issue Type: Improvement Reporter: Shawn Heisey Priority: Minor It would be very useful to have /admin/collections (CollectionsHandler) send a response similar to /admin/cores. This should probably be the default action, but requiring ?action=STATUS wouldn't be the end of the world. It would be very useful if CloudSolrServer were to implement a getCollections method, but that probably should be a separate issue. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
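For readers unfamiliar with the ZooKeeper-object route mentioned above, this is roughly what a client must do today; a sketch assuming the 4.x SolrJ CloudSolrServer/ClusterState API, which is exactly the dependency this issue would make optional:

import java.util.Set;
import org.apache.solr.client.solrj.impl.CloudSolrServer;

public class ListCollectionsSketch {
  public static void main(String[] args) throws Exception {
    CloudSolrServer server = new CloudSolrServer("zkhost1:2181,zkhost2:2181");
    server.connect(); // forces the initial cluster-state read from ZooKeeper
    // reach into cluster state to enumerate collections
    Set<String> collections =
        server.getZkStateReader().getClusterState().getCollections();
    System.out.println(collections);
    server.shutdown();
  }
}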
[jira] [Commented] (LUCENE-5127) FixedGapTermsIndex should use monotonic compression
[ https://issues.apache.org/jira/browse/LUCENE-5127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13719932#comment-13719932 ] ASF subversion and git services commented on LUCENE-5127: - Commit 1507097 from [~mikemccand] in branch 'dev/branches/lucene5127' [ https://svn.apache.org/r1507097 ] LUCENE-5127: add tests FixedGapTermsIndex should use monotonic compression --- Key: LUCENE-5127 URL: https://issues.apache.org/jira/browse/LUCENE-5127 Project: Lucene - Core Issue Type: Improvement Reporter: Robert Muir Attachments: LUCENE-5127.patch, LUCENE-5127.patch for the addresses in the big in-memory byte[] and disk blocks, we could save a good deal of RAM here. I think this codec just never got upgraded when we added these new packed improvements, but it might be interesting to try to use for the terms data of sorted/sortedset DV implementations. patch works, but has nocommits and currently ignores the divisor. The annoying problem there being that we have the shared interface with get(int) for PackedInts.Mutable/Reader, but no equivalent base class for monotonics get(long)... Still its enough that we could benchmark/compare for now. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: [CONF] Apache Solr Reference Guide Internal - How To Publish This Documentation
Hi, One question: should we also add signatures and checksums on the PDF artifact? In my opinion we should create those so we can verify that we all vote on the same PDF file created by the RM. The GPG signature would ensure this. Uwe Hoss Man (Confluence) conflue...@apache.org wrote: Space: Apache Solr Reference Guide (https://cwiki.apache.org/confluence/display/solr) Page: Internal - How To Publish This Documentation (https://cwiki.apache.org/confluence/display/solr/Internal+-+How+To+Publish+This+Documentation) Change Comment: - tweak pre/post publish actions to match how the Upgrade page currently exists Edited by Hoss Man: - {toc} h1. Pre-publication Actions * Make sure that the [Upgrading Solr] page is up to date for the current version. * Sanity check that none of the [post-publishing version number updating steps|#Update Links Version Numbers] from the last version published were skipped. h1. How To Export the PDF from Confluence * Load [The PDF Space Export Page|https://cwiki.apache.org/confluence/spaces/flyingpdf/flyingpdf.action?key=solr] in your browser * Uncheck the box next to [** Internal MetaDocs] to suppress it and its children from being included in the PDF * Click the Export button * On the subsequent page, wait for a Download here link to dynamically appear. * Click Download here and save the PDF to your local machine * Use scp to copy the PDF into your public_html directory on people.apache.org, named appropriately as a release candidate. For example... \\ {noformat}scp solr-220713-2054-17096.pdf people.apache.org:public_html/apache-solr-ref-guide-4.4_RC1.pdf{noformat} {note}The Export URLs returned by the Download here link won't work from curl on people.apache.org, so you have to make a local copy first.{note} h1. Hold a VOTE * Send an email to dev@lucene (CC general@lucene) with a Subject "VOTE: RC1 Release apache-solr-ref-guide-X.Y.pdf" and include the full URL from {{http://people.apache.org/~yourname/apache-solr-ref-guide-X.Y_RC1.pdf}}. * If there are problems with the RC that are fixed in Confluence, Export a new copy (using the instructions above) with a new name (RC2, RC3, etc...) and send out another VOTE thread. h1. Publish to SvnPubSub Mirrors Once [three PMC members have voted for a release, it may be published|http://www.apache.org/foundation/voting.html#ReleaseVotes]... * Check-out the {{lucene/solr/ref-guide}} directory from the dist repo (or svn update if you already have a checkout) ... \\ {noformat} svn co https://dist.apache.org/repos/dist/release/lucene/solr/ref-guide solr-ref-guide-dist # OR svn update solr-ref-guide-dist {noformat} * Copy the RC ref guide into this directory using its final name and commit... \\ {noformat} cp apache-solr-ref-guide-4.4_RC1.pdf solr-ref-guide-dist/apache-solr-ref-guide-4.4.pdf svn commit -m "4.4 ref guide" solr-ref-guide-dist {noformat} * Wait 24 hours to give the mirrors a chance to get the new release. The status of the mirrors can be monitored using {{dev-tools/scripts/poll-mirrors.pl}}... \\ {noformat} perl dev-tools/scripts/poll-mirrors.pl -details -p lucene/solr/ref-guide/apache-solr-ref-guide-X.Y.pdf {noformat} h1. Post Publish Actions Once most mirrors have been updated, we can link to (and announce) the new guide. h2. Update Links Version Numbers When linking to the current version of the ref guide, always use the download redirector. Example: {{https://www.apache.org/dyn/closer.cgi/lucene/solr/ref-guide/apache-solr-ref-guide.X.Y.pdf}} When linking to old versions of the ref guide, always use archive.apache.org.
Example: {{https://archive.apache.org/dist/lucene/solr/ref-guide/apache-solr-ref-guide.X.Y.pdf}} h3. Website (lucene.apache.org) * Update links on [the Solr documentation page|https://lucene.apache.org/solr/documentation.html] to point to the current version of the ref guide. * (!) :TODO: Other places to link from? (!) h3. Confluence * On the [Confluence Theme Configuration Page|https://cwiki.apache.org/confluence/spaces/doctheme/configuretheme.action?key=solr] for the Solr Ref Guide... ** Update the Left Nav to add a link to the current version of the ref guide. ** Update the Left Nav to change the link for the previous version(s) of the ref guide so that they use the archive URL. ** Update the Left Nav and Header Message to refer to the next version that the live copy of the documentation will refer to (ie: if the 4.4 ref guide has just been published, change _*4.4* Draft Ref Guide Topics_ to _*4.5* Draft Ref Guide Topics_ and _This Unreleased Guide Will Cover Apache Solr *4.4*_ to _This Unreleased Guide Will Cover Apache Solr *4.5*_) * On the [Confluence PDF Layout Page|https://cwiki.apache.org/confluence/spaces/flyingpdf/viewpdflayoutconfig.action?key=solr] for the Solr Ref Guide... ** Update the Title Page to refer to the next version (ie: 4.4 \-
[jira] [Updated] (LUCENE-5133) AnalyzingInfixSuggester should return structured highlighted results instead of single String per result
[ https://issues.apache.org/jira/browse/LUCENE-5133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-5133: --- Attachment: LUCENE-5133.patch New patch, cutover to Object. It's more work for the [very expert] user since they need to re-implement the entire highlight method ... but I think that's acceptable. AnalyzingInfixSuggester should return structured highlighted results instead of single String per result Key: LUCENE-5133 URL: https://issues.apache.org/jira/browse/LUCENE-5133 Project: Lucene - Core Issue Type: Improvement Reporter: Michael McCandless Fix For: 5.0, 4.5 Attachments: LUCENE-5133.patch, LUCENE-5133.patch, LUCENE-5133.patch Today it renders to an HTML string (<b>..</b> for hits) in protected methods that one can override to change the highlighting, but this is hard/inefficient to use for search servers that want to e.g. return JSON representation of the highlighted result. This is the same issue as LUCENE-4906 (PostingsHighlighter) but for AnalyzingInfixSuggester's highlights instead. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4876) IndexWriterConfig.clone should clone the MergeScheduler
[ https://issues.apache.org/jira/browse/LUCENE-4876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13719987#comment-13719987 ] Shai Erera commented on LUCENE-4876: Perhaps we can do a minor change -- stop having IW call IWC.clone() on init. We keep clone() on IWC, and the rest of the objects, and tell users that it's their responsibility to call IWC.clone() before passing to IW? That's like a 1-liner change (well, plus clarifying the jdocs) that will make 99% of the users happy. The rest should just do {{new IW(dir, conf.clone())}} ... that's simple enough? IndexWriterConfig.clone should clone the MergeScheduler --- Key: LUCENE-4876 URL: https://issues.apache.org/jira/browse/LUCENE-4876 Project: Lucene - Core Issue Type: Bug Reporter: Adrien Grand Assignee: Adrien Grand Fix For: 4.3 Attachments: LUCENE-4876.patch, LUCENE-4876.patch ConcurrentMergeScheduler has a List<MergeThread> member to track the running merging threads, so IndexWriterConfig.clone should clone the merge scheduler so that both IndexWriterConfig instances are independent. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
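Under that proposal the caller-side pattern would look like the following; a sketch against the 4.x API, showing each writer getting its own independent copy of the config:

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;

public class CloneConfigSketch {
  public static void main(String[] args) throws Exception {
    // one config, reused for two writers: the caller clones it per writer
    // so the instances (and their MergeSchedulers) stay independent
    IndexWriterConfig conf =
        new IndexWriterConfig(Version.LUCENE_44, new StandardAnalyzer(Version.LUCENE_44));
    Directory dir1 = new RAMDirectory();
    Directory dir2 = new RAMDirectory();
    IndexWriter w1 = new IndexWriter(dir1, conf.clone());
    IndexWriter w2 = new IndexWriter(dir2, conf.clone()); // independent copy
    w1.close();
    w2.close();
  }
}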
[jira] [Commented] (LUCENE-4876) IndexWriterConfig.clone should clone the MergeScheduler
[ https://issues.apache.org/jira/browse/LUCENE-4876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13719988#comment-13719988 ] Michael McCandless commented on LUCENE-4876: +1 IndexWriterConfig.clone should clone the MergeScheduler --- Key: LUCENE-4876 URL: https://issues.apache.org/jira/browse/LUCENE-4876 Project: Lucene - Core Issue Type: Bug Reporter: Adrien Grand Assignee: Adrien Grand Fix For: 4.3 Attachments: LUCENE-4876.patch, LUCENE-4876.patch ConcurrentMergeScheduler has a List<MergeThread> member to track the running merging threads, so IndexWriterConfig.clone should clone the merge scheduler so that both IndexWriterConfig instances are independent. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5127) FixedGapTermsIndex should use monotonic compression
[ https://issues.apache.org/jira/browse/LUCENE-5127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13720005#comment-13720005 ] ASF subversion and git services commented on LUCENE-5127: - Commit 1507111 from [~rcmuir] in branch 'dev/branches/lucene5127' [ https://svn.apache.org/r1507111 ] LUCENE-5127: clear nocommits FixedGapTermsIndex should use monotonic compression --- Key: LUCENE-5127 URL: https://issues.apache.org/jira/browse/LUCENE-5127 Project: Lucene - Core Issue Type: Improvement Reporter: Robert Muir Attachments: LUCENE-5127.patch, LUCENE-5127.patch for the addresses in the big in-memory byte[] and disk blocks, we could save a good deal of RAM here. I think this codec just never got upgraded when we added these new packed improvements, but it might be interesting to try to use for the terms data of sorted/sortedset DV implementations. patch works, but has nocommits and currently ignores the divisor. The annoying problem there being that we have the shared interface with get(int) for PackedInts.Mutable/Reader, but no equivalent base class for monotonics get(long)... Still its enough that we could benchmark/compare for now. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5127) FixedGapTermsIndex should use monotonic compression
[ https://issues.apache.org/jira/browse/LUCENE-5127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13720024#comment-13720024 ] ASF subversion and git services commented on LUCENE-5127: - Commit 1507116 from [~rcmuir] in branch 'dev/branches/lucene5127' [ https://svn.apache.org/r1507116 ] LUCENE-5127: fix TestLucene40PF and clean up some more outdated stuff FixedGapTermsIndex should use monotonic compression --- Key: LUCENE-5127 URL: https://issues.apache.org/jira/browse/LUCENE-5127 Project: Lucene - Core Issue Type: Improvement Reporter: Robert Muir Attachments: LUCENE-5127.patch, LUCENE-5127.patch for the addresses in the big in-memory byte[] and disk blocks, we could save a good deal of RAM here. I think this codec just never got upgraded when we added these new packed improvements, but it might be interesting to try to use for the terms data of sorted/sortedset DV implementations. patch works, but has nocommits and currently ignores the divisor. The annoying problem there being that we have the shared interface with get(int) for PackedInts.Mutable/Reader, but no equivalent base class for monotonics get(long)... Still its enough that we could benchmark/compare for now. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5127) FixedGapTermsIndex should use monotonic compression
[ https://issues.apache.org/jira/browse/LUCENE-5127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13720027#comment-13720027 ] ASF subversion and git services commented on LUCENE-5127: - Commit 1507118 from [~mikemccand] in branch 'dev/branches/lucene5127' [ https://svn.apache.org/r1507118 ] LUCENE-5127: fix false fail when terms dict is a ghostbuster FixedGapTermsIndex should use monotonic compression --- Key: LUCENE-5127 URL: https://issues.apache.org/jira/browse/LUCENE-5127 Project: Lucene - Core Issue Type: Improvement Reporter: Robert Muir Attachments: LUCENE-5127.patch, LUCENE-5127.patch for the addresses in the big in-memory byte[] and disk blocks, we could save a good deal of RAM here. I think this codec just never got upgraded when we added these new packed improvements, but it might be interesting to try to use for the terms data of sorted/sortedset DV implementations. patch works, but has nocommits and currently ignores the divisor. The annoying problem there being that we have the shared interface with get(int) for PackedInts.Mutable/Reader, but no equivalent base class for monotonics get(long)... Still its enough that we could benchmark/compare for now. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5127) FixedGapTermsIndex should use monotonic compression
[ https://issues.apache.org/jira/browse/LUCENE-5127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13720044#comment-13720044 ] ASF subversion and git services commented on LUCENE-5127: - Commit 1507120 from [~rcmuir] in branch 'dev/branches/lucene5127' [ https://svn.apache.org/r1507120 ] LUCENE-5127: clean up error msgs FixedGapTermsIndex should use monotonic compression --- Key: LUCENE-5127 URL: https://issues.apache.org/jira/browse/LUCENE-5127 Project: Lucene - Core Issue Type: Improvement Reporter: Robert Muir Attachments: LUCENE-5127.patch, LUCENE-5127.patch for the addresses in the big in-memory byte[] and disk blocks, we could save a good deal of RAM here. I think this codec just never got upgraded when we added these new packed improvements, but it might be interesting to try to use for the terms data of sorted/sortedset DV implementations. patch works, but has nocommits and currently ignores the divisor. The annoying problem there being that we have the shared interface with get(int) for PackedInts.Mutable/Reader, but no equivalent base class for monotonics get(long)... Still its enough that we could benchmark/compare for now. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (LUCENE-5136) Improve FacetRequest javadocs
Shai Erera created LUCENE-5136: -- Summary: Improve FacetRequest javadocs Key: LUCENE-5136 URL: https://issues.apache.org/jira/browse/LUCENE-5136 Project: Lucene - Core Issue Type: Improvement Components: modules/facet Reporter: Shai Erera Assignee: Shai Erera Fix For: 5.0, 4.5 While working on LUCENE-4985, I noticed that FacetRequest's jdocs are severely outdated. I rewrote them entirely, so I prefer to commit them separately from the rest of the changes. Will post a patch shortly. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-5136) Improve FacetRequest javadocs
[ https://issues.apache.org/jira/browse/LUCENE-5136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shai Erera updated LUCENE-5136: --- Attachment: LUCENE-5136.patch if others have suggestions for better wording, you are more than welcome to let me know. Otherwise, I will commit this tomorrow. Improve FacetRequest javadocs - Key: LUCENE-5136 URL: https://issues.apache.org/jira/browse/LUCENE-5136 Project: Lucene - Core Issue Type: Improvement Components: modules/facet Reporter: Shai Erera Assignee: Shai Erera Fix For: 5.0, 4.5 Attachments: LUCENE-5136.patch While working on LUCENE-4985, I noticed that FacetRequest's jdocs are severely outdated. I rewrote them entirely, so prefer to commit them separately than the rest of the changes. Will post a patch shortly. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: VOTE: RC2 Release apache-solr-ref-guide-4.4.pdf
+1 Nice job everyone! -Yonik http://lucidworks.com On Thu, Jul 25, 2013 at 2:24 PM, Chris Hostetter hossman_luc...@fucit.org wrote: Please VOTE to release the following PDF as apache-solr-ref-guide-4.4.pdf https://people.apache.org/~hossman/apache-solr-ref-guide-4.4_RC2.pdf Changes since RC1... * Additional info from dsmiley on several pages related to spatial * Improvements in organization of Upgrading instructions * minor corrections to the HDFS Admin UI pages -Hoss - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5136) Improve FacetRequest javadocs
[ https://issues.apache.org/jira/browse/LUCENE-5136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13720118#comment-13720118 ] Michael McCandless commented on LUCENE-5136: +1 Minor things: * sepcify - specify * Such requests will also usually won't use - Such requests won't use (?) Improve FacetRequest javadocs - Key: LUCENE-5136 URL: https://issues.apache.org/jira/browse/LUCENE-5136 Project: Lucene - Core Issue Type: Improvement Components: modules/facet Reporter: Shai Erera Assignee: Shai Erera Fix For: 5.0, 4.5 Attachments: LUCENE-5136.patch While working on LUCENE-4985, I noticed that FacetRequest's jdocs are severely outdated. I rewrote them entirely, so prefer to commit them separately than the rest of the changes. Will post a patch shortly. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: VOTE: RC1 Release apache-solr-ref-guide-4.4.pdf
On 7/25/13 4:29 AM, Steve Rowe sar...@gmail.com wrote: I also noticed that David Smiley made a bunch of modifications, AFAICT to spatial and related topics, and it would be good to include those. Yes, the spatial page needed an overhaul; I think it's much better now. It was a refactor to better express the existing information; it doesn't really convey anything new. I'll do a lot more to it in a future release. What do you mean by "it would be good to include those"? Include my changes where? It's in the PDF I saw that was just published, which I presumed it would be by virtue of me making my edits within the timeframe Hoss outlined. ~ David - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4876) IndexWriterConfig.clone should clone the MergeScheduler
[ https://issues.apache.org/jira/browse/LUCENE-4876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13720129#comment-13720129 ] Yonik Seeley commented on LUCENE-4876: -- bq. We keep clone() on IWC, and the rest of the objects, and tell users that it's their responsibility to call IWC.clone() +1 IndexWriterConfig.clone should clone the MergeScheduler --- Key: LUCENE-4876 URL: https://issues.apache.org/jira/browse/LUCENE-4876 Project: Lucene - Core Issue Type: Bug Reporter: Adrien Grand Assignee: Adrien Grand Fix For: 4.3 Attachments: LUCENE-4876.patch, LUCENE-4876.patch ConcurrentMergeScheduler has a List<MergeThread> member to track the running merging threads, so IndexWriterConfig.clone should clone the merge scheduler so that both IndexWriterConfig instances are independent. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (LUCENE-5137) UAX29URLEmailTokenizer.java causes NullPointerException in 4.3 and 4.4
Allan Rofer created LUCENE-5137: --- Summary: UAX29URLEmailTokenizer.java causes NullPointerException in 4.3 and 4.4 Key: LUCENE-5137 URL: https://issues.apache.org/jira/browse/LUCENE-5137 Project: Lucene - Core Issue Type: Bug Components: core/search Affects Versions: 4.3 Environment: Windows 7 Reporter: Allan Rofer There is a comment (best effort NPE if you dont call reset) in the getScannerFor method in UAX29URLEmailTokenizer. The callers of getScannerFor do NOT call reset, so an NPE is thrown in the parser which has a null Reader. If you put the line this.scanner.yyreset(input); after each call to getScannerFor, the NPE is avoided. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: VOTE: RC1 Release apache-solr-ref-guide-4.4.pdf
On Jul 25, 2013 6:00 PM, Smiley, David W. dsmi...@mitre.org wrote: On 7/25/13 4:29 AM, Steve Rowe sar...@gmail.com wrote: I also noticed that David Smiley made a bunch of modifications, AFAICT to spatial and related topics, and it would be good to include those. [...] What do you mean by it would be good to include those? Include my changes where? It's in the PDF I saw that was just published which I presumed it would be by virtue of me making my edits within the timeframe Hoss outlined. David, you made your edits after Hoss called the RC1 vote - I was arguing for an RC2 based partly on your changes. Steve
Re: [CONF] Apache Solr Reference Guide Internal - How To Publish This Documentation
: One question: should we also add signatures and checksums on the pdf : artifact? In my opinion we should create those so we can verify that we : all vote on the same pdf file created by the RM. The GPG signature would : ensure this. Good question. I briefly considered this a while back when I first started drafting up the process (I think I even asked about it on IRC and got no response) but ultimately didn't include it because... 1) I didn't see any risk from potentially rogue mirrors trying to modify the docs (not like with source code) 2) from the precedent I could see from httpd-docs, they didn't bother with signing or providing checksums for their doc releases 3) I was trying to keep things simple. But you're right -- particularly for ensuring that we are all voting on the same thing, having sigs/checksums is a good idea -- and if we're going to generate them, we might as well also push them to the mirrors. I'll update the docs, but in the meantime I don't think we need to call a new VOTE on a new RC -- but I'll reply to the existing RC2 thread with specifics on the sig/checksum. -Hoss - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: VOTE: RC2 Release apache-solr-ref-guide-4.4.pdf
: : Please VOTE to release the following PDF as apache-solr-ref-guide-4.4.pdf : : : : https://people.apache.org/~hossman/apache-solr-ref-guide-4.4_RC2.pdf For completeness, the RC2 artifact I'm voting +1 to is... 2973817acf6ea5e4b607e5eac2bd49d7857b5406 apache-solr-ref-guide-4.4_RC2.pdf Checksum & PGP sig... https://people.apache.org/~hossman/apache-solr-ref-guide-4.4_RC2.pdf.sha1 https://people.apache.org/~hossman/apache-solr-ref-guide-4.4_RC2.pdf.asc NOTE: My PGP key is brand new (I never needed one before today), and not really in the web of trust yet, but it has been slurped in by the ASF key management system, so folks should be able to verify that sig... https://people.apache.org/keys/committer/hossman.asc https://people.apache.org/keys/group/lucene-pmc.asc -Hoss - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
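For voters who want to double-check the artifact, a self-contained sketch (pure JDK; the file path is passed as an argument; verifying the GPG signature itself still requires gpg and the keys linked above) that computes the SHA-1 of the downloaded PDF for comparison against the posted .sha1 value:

import java.io.FileInputStream;
import java.io.InputStream;
import java.security.MessageDigest;

public class Sha1Check {
  public static void main(String[] args) throws Exception {
    MessageDigest md = MessageDigest.getInstance("SHA-1");
    // stream the file through the digest so large PDFs need not fit in memory
    try (InputStream in = new FileInputStream(args[0])) {
      byte[] buf = new byte[8192];
      for (int n = in.read(buf); n != -1; n = in.read(buf)) {
        md.update(buf, 0, n);
      }
    }
    StringBuilder hex = new StringBuilder();
    for (byte b : md.digest()) {
      hex.append(String.format("%02x", b));
    }
    System.out.println(hex); // should print 2973817acf6ea5e4b607e5eac2bd49d7857b5406 for RC2
  }
}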
[jira] [Resolved] (LUCENE-5137) UAX29URLEmailTokenizer.java causes NullPointerException in 4.3 and 4.4
[ https://issues.apache.org/jira/browse/LUCENE-5137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir resolved LUCENE-5137. - Resolution: Not A Problem The best-effort NPE is thrown because the consumer (you) isn't calling reset. See the javadocs of TokenStream: you must call reset() before the incrementToken() loop. UAX29URLEmailTokenizer.java causes NullPointerException in 4.3 and 4.4 -- Key: LUCENE-5137 URL: https://issues.apache.org/jira/browse/LUCENE-5137 Project: Lucene - Core Issue Type: Bug Components: core/search Affects Versions: 4.3 Environment: Windows 7 Reporter: Allan Rofer Original Estimate: 1h Remaining Estimate: 1h There is a comment (best effort NPE if you dont call reset) in the getScannerFor method in UAX29URLEmailTokenizer. The callers of getScannerFor do NOT call reset, so an NPE is thrown in the parser which has a null Reader. If you put the line this.scanner.yyreset(input); after each call to getScannerFor, the NPE is avoided. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
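To make that contract concrete, a minimal consumer sketch (assuming the Lucene 4.4 analysis API; the input string is made up) showing the workflow that avoids the NPE from the original report: reset() before the incrementToken() loop, then end() and close():

import java.io.StringReader;

import org.apache.lucene.analysis.standard.UAX29URLEmailTokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.util.Version;

public class TokenizerContractSketch {
  public static void main(String[] args) throws Exception {
    UAX29URLEmailTokenizer ts = new UAX29URLEmailTokenizer(
        Version.LUCENE_44, new StringReader("mail user@example.com or see http://lucene.apache.org"));
    CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
    ts.reset();                   // required: initializes the scanner's Reader (skipping this causes the NPE)
    while (ts.incrementToken()) {
      System.out.println(term);   // prints each token; the email and URL come out as single tokens
    }
    ts.end();                     // records the final offset state
    ts.close();                   // releases the Reader
  }
}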
[jira] [Commented] (LUCENE-5127) FixedGapTermsIndex should use monotonic compression
[ https://issues.apache.org/jira/browse/LUCENE-5127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13720357#comment-13720357 ] ASF subversion and git services commented on LUCENE-5127: - Commit 1507179 from [~rcmuir] in branch 'dev/branches/lucene5127' [ https://svn.apache.org/r1507179 ] LUCENE-5127: use less ram when writing the terms index FixedGapTermsIndex should use monotonic compression --- Key: LUCENE-5127 URL: https://issues.apache.org/jira/browse/LUCENE-5127 Project: Lucene - Core Issue Type: Improvement Reporter: Robert Muir Attachments: LUCENE-5127.patch, LUCENE-5127.patch For the addresses in the big in-memory byte[] and disk blocks, we could save a good deal of RAM here. I think this codec just never got upgraded when we added these new packed improvements, but it might be interesting to try to use it for the terms data of sorted/sortedset DV implementations. The patch works, but has nocommits and currently ignores the divisor. The annoying problem there being that we have the shared interface with get(int) for PackedInts.Mutable/Reader, but no equivalent base class for monotonic get(long)... Still, it's enough that we could benchmark/compare for now. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
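As a rough illustration of the idea (assuming the MonotonicAppendingLongBuffer utility from Lucene 4.x's packed-ints package; the term lengths are made up), monotonically increasing addresses such as term-start offsets compress well because the encoding only stores small deviations from a linear model:

import org.apache.lucene.util.packed.MonotonicAppendingLongBuffer;

public class MonotonicAddressSketch {
  public static void main(String[] args) {
    MonotonicAppendingLongBuffer addresses = new MonotonicAppendingLongBuffer();
    long offset = 0;
    for (int termLength : new int[] {5, 7, 4, 9, 6}) { // hypothetical term lengths
      addresses.add(offset); // offsets only ever grow, which the monotonic encoding exploits
      offset += termLength;
    }
    System.out.println(addresses.get(2)); // random access by term ordinal: prints 12
  }
}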
[jira] [Updated] (LUCENE-5127) FixedGapTermsIndex should use monotonic compression
[ https://issues.apache.org/jira/browse/LUCENE-5127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-5127: Attachment: LUCENE-5127.patch Patch for trunk, I think it's ready. FixedGapTermsIndex should use monotonic compression --- Key: LUCENE-5127 URL: https://issues.apache.org/jira/browse/LUCENE-5127 Project: Lucene - Core Issue Type: Improvement Reporter: Robert Muir Attachments: LUCENE-5127.patch, LUCENE-5127.patch, LUCENE-5127.patch For the addresses in the big in-memory byte[] and disk blocks, we could save a good deal of RAM here. I think this codec just never got upgraded when we added these new packed improvements, but it might be interesting to try to use it for the terms data of sorted/sortedset DV implementations. The patch works, but has nocommits and currently ignores the divisor. The annoying problem there being that we have the shared interface with get(int) for PackedInts.Mutable/Reader, but no equivalent base class for monotonic get(long)... Still, it's enough that we could benchmark/compare for now. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: VOTE: RC2 Release apache-solr-ref-guide-4.4.pdf
+1 Thanks Hoss, Cassandra, and to everyone else who contributed! - Author: http://www.packtpub.com/apache-solr-3-enterprise-search-server/book - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: VOTE: RC1 Release apache-solr-ref-guide-4.4.pdf
Oh; I should have read more carefully. Thanks! ~ David sarowe wrote David, you made your edits after Hoss called the RC1 vote - I was arguing for an RC2 based partly on your changes. Steve - Author: http://www.packtpub.com/apache-solr-3-enterprise-search-server/book - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5057) queryResultCache should not related with the order of fq's list
[ https://issues.apache.org/jira/browse/SOLR-5057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13720397#comment-13720397 ] Hoss Man commented on SOLR-5057: Similar to Yonik's initial point: In my experience, the situations where folks are going to be most concerned about having good cache usage are the situations where queries are generated programmatically and the order of the filter queries is already deterministic (or can be made deterministic easily enough by the client). My straw-man suggestion would be to not modify QueryResultKey at all, and instead write a new (optional) SearchComponent that did nothing but sort the getFilters() list in its prepare() method. Users who can't ensure that requests with equivalent fq params come in the same order can register it to run just after the query component and get good cache hit ratios, but it wouldn't affect performance in any way for users who send queries with fqs in a deterministic manner. queryResultCache should not related with the order of fq's list --- Key: SOLR-5057 URL: https://issues.apache.org/jira/browse/SOLR-5057 Project: Solr Issue Type: Improvement Components: search Affects Versions: 4.0, 4.1, 4.2, 4.3 Reporter: Feihong Huang Assignee: Erick Erickson Priority: Minor Attachments: SOLR-5057.patch, SOLR-5057.patch Original Estimate: 48h Remaining Estimate: 48h There are two queries with the same meaning below, but case2 can't use the queryResultCache after case1 is executed. case1: q=*:*&fq=field1:value1&fq=field2:value2 case2: q=*:*&fq=field2:value2&fq=field1:value1 I think queryResultCache should not be related with the order of fq's list. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
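A rough sketch of that straw man (the class name is hypothetical; it assumes ResponseBuilder.getFilters() returns the parsed fq queries, as in Solr 4.x): a component whose prepare() sorts the filters into a deterministic order, so equivalent requests build identical cache keys. It would be registered in solrconfig.xml and listed right after the query component for the handlers that need it.

import java.io.IOException;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

import org.apache.lucene.search.Query;
import org.apache.solr.handler.component.ResponseBuilder;
import org.apache.solr.handler.component.SearchComponent;

public class SortFiltersComponent extends SearchComponent {
  @Override
  public void prepare(ResponseBuilder rb) throws IOException {
    List<Query> filters = rb.getFilters();
    if (filters != null) {
      // any stable, deterministic order works; toString() is just a simple stand-in
      Collections.sort(filters, new Comparator<Query>() {
        @Override
        public int compare(Query a, Query b) {
          return a.toString().compareTo(b.toString());
        }
      });
    }
  }

  @Override
  public void process(ResponseBuilder rb) throws IOException {
    // no-op: all the work happens in prepare(), before the cache lookup
  }

  @Override
  public String getDescription() {
    return "sorts filter queries into a deterministic order for better queryResultCache hits";
  }

  @Override
  public String getSource() {
    return null; // sketch only
  }
}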
[jira] [Commented] (LUCENE-5136) Improve FacetRequest javadocs
[ https://issues.apache.org/jira/browse/LUCENE-5136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13720409#comment-13720409 ] ASF subversion and git services commented on LUCENE-5136: - Commit 1507194 from [~shaie] in branch 'dev/trunk' [ https://svn.apache.org/r1507194 ] LUCENE-5136: improve FacetRequest javadocs Improve FacetRequest javadocs - Key: LUCENE-5136 URL: https://issues.apache.org/jira/browse/LUCENE-5136 Project: Lucene - Core Issue Type: Improvement Components: modules/facet Reporter: Shai Erera Assignee: Shai Erera Fix For: 5.0, 4.5 Attachments: LUCENE-5136.patch While working on LUCENE-4985, I noticed that FacetRequest's jdocs are severely outdated. I rewrote them entirely, so I prefer to commit them separately from the rest of the changes. Will post a patch shortly. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5136) Improve FacetRequest javadocs
[ https://issues.apache.org/jira/browse/LUCENE-5136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13720412#comment-13720412 ] ASF subversion and git services commented on LUCENE-5136: - Commit 1507195 from [~shaie] in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1507195 ] LUCENE-5136: improve FacetRequest javadocs Improve FacetRequest javadocs - Key: LUCENE-5136 URL: https://issues.apache.org/jira/browse/LUCENE-5136 Project: Lucene - Core Issue Type: Improvement Components: modules/facet Reporter: Shai Erera Assignee: Shai Erera Fix For: 5.0, 4.5 Attachments: LUCENE-5136.patch While working on LUCENE-4985, I noticed that FacetRequest's jdocs are severely outdated. I rewrote them entirely, so I prefer to commit them separately from the rest of the changes. Will post a patch shortly. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (LUCENE-5136) Improve FacetRequest javadocs
[ https://issues.apache.org/jira/browse/LUCENE-5136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shai Erera resolved LUCENE-5136. Resolution: Fixed Thanks Mike. Committed to trunk and 4x. Improve FacetRequest javadocs - Key: LUCENE-5136 URL: https://issues.apache.org/jira/browse/LUCENE-5136 Project: Lucene - Core Issue Type: Improvement Components: modules/facet Reporter: Shai Erera Assignee: Shai Erera Fix For: 5.0, 4.5 Attachments: LUCENE-5136.patch While working on LUCENE-4985, I noticed that FacetRequest's jdocs are severely outdated. I rewrote them entirely, so I prefer to commit them separately from the rest of the changes. Will post a patch shortly. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org