[jira] [Created] (SOLR-5075) SolrCloud commit process is too time consuming, even if documents are light
Radu Ghita created SOLR-5075:

Summary: SolrCloud commit process is too time consuming, even if documents are light
Key: SOLR-5075
URL: https://issues.apache.org/jira/browse/SOLR-5075
Project: Solr
Issue Type: Bug
Components: Schema and Analysis, SolrCloud
Affects Versions: 4.1
Environment: SolrCloud 4.1, internal Zookeeper, 16 shards, custom Java importer. Server: Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz, 32 cores, 192GB RAM, 10TB SSD and 50TB SAS storage
Reporter: Radu Ghita

We have a client whose business model requires indexing a billion rows from MySQL into Solr each month, in a small time-frame. The documents are very light, but their number is very high, and we need to achieve speeds of around 80-100k docs/s. The built-in Solr indexer tops out at 40-50k/s, and after some hours (~12) it crashes, with speed degrading as the hours go by. We have therefore developed a custom Java importer that connects directly to MySQL and to SolrCloud via Zookeeper, grabs data from MySQL, creates documents, and imports them into Solr. This helps because we open ~50 threads and the indexing process speeds up. We have optimized the MySQL queries (MySQL was the initial bottleneck) and now reach over 100k docs/s, but as the index grows, Solr spends a very long time on adding documents. I assume something in solrconfig makes Solr stall and even block after 100 million documents are indexed.

Here is the Java code that creates documents and then adds them to the Solr server:

{code}
public void createDocuments() throws SQLException, SolrServerException, IOException {
  App.logger.write("Creating documents..");
  this.docs = new ArrayList<SolrInputDocument>();
  App.logger.incrementNumberOfRows(this.size);
  while (this.results.next()) {
    this.docs.add(this.getDocumentFromResultSet(this.results));
  }
  this.statement.close();
  this.results.close();
}

public void commitDocuments() throws SolrServerException, IOException {
  App.logger.write("Committing..");
  App.solrServer.add(this.docs); // here it stays very long and then blocks
  App.logger.incrementNumberOfRows(this.docs.size());
  this.docs.clear();
}
{code}

I am also pasting the solrconfig.xml parameters relevant to this discussion:

{code:xml}
<maxIndexingThreads>128</maxIndexingThreads>
<useCompoundFile>false</useCompoundFile>
<ramBufferSizeMB>1</ramBufferSizeMB>
<maxBufferedDocs>100</maxBufferedDocs>
<mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
  <int name="maxMergeAtOnce">2</int>
  <int name="segmentsPerTier">100</int>
  <int name="maxMergeAtOnceExplicit">1</int>
</mergePolicy>
<mergeFactor>100</mergeFactor>
<termIndexInterval>1024</termIndexInterval>
<autoCommit>
  <maxTime>15000</maxTime>
  <maxDocs>100</maxDocs>
  <openSearcher>false</openSearcher>
</autoCommit>
<autoSoftCommit>
  <maxTime>200</maxTime>
</autoSoftCommit>
{code}

Thanks a lot for any answers, and excuse my long text; I'm new to this JIRA. If any other info is needed, please let me know.
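A note on the quoted configuration: <maxDocs>100</maxDocs> under <autoCommit>, together with <maxBufferedDocs>100</maxBufferedDocs> and a 1 MB RAM buffer, triggers a hard commit and a segment flush roughly every 100 documents; at the reported 100k docs/s, that is on the order of a thousand commits per second, which by itself could produce exactly this kind of stall. For contrast, a sketch of more conventional bulk-indexing values (illustrative only, not a tested recommendation):

{code:xml}
<ramBufferSizeMB>128</ramBufferSizeMB>
<!-- omit maxBufferedDocs: let the RAM buffer decide when to flush -->
<autoCommit>
  <maxTime>60000</maxTime> <!-- hard commit at most once a minute -->
  <!-- omit maxDocs: avoid committing every N documents during bulk loads -->
  <openSearcher>false</openSearcher>
</autoCommit>
<autoSoftCommit>
  <maxTime>30000</maxTime> <!-- relaxed visibility during bulk indexing -->
</autoSoftCommit>
{code}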
Re: VOTE: RC1 Release apache-solr-ref-guide-4.4.pdf
I noticed that the ref guide was missing a page about upgrading to Solr 4.4 (there is such a page for releases 4.1 through 4.3), so I created one, based on upgrade notes and bug fixes from CHANGES.txt: https://cwiki.apache.org/confluence/display/solr/Upgrading+to+Solr+4.4. Please edit it to make it better if you notice any problems. I think we should respin for this.

I've added a new section, "Pre-publication actions", with a bullet point about creating the per-release upgrade page, to https://cwiki.apache.org/confluence/display/solr/Internal+-+How+To+Publish+This+Documentation. But maybe we should have a different page dedicated to this and similar activities? (Or maybe there already is one?)

I also noticed that David Smiley made a bunch of modifications, AFAICT to spatial and related topics, and it would be good to include those.

Steve

On Jul 24, 2013, at 8:30 PM, Chris Hostetter <hossman_luc...@fucit.org> wrote:

> Please VOTE to release the following PDF as apache-solr-ref-guide-4.4.pdf
> https://people.apache.org/~hossman/apache-solr-ref-guide-4.4_RC1.pdf
>
> As this is our first documentation release VOTE, folks may wish to familiarize themselves with the doc release process that I posted a while back, but got very little (none, if I remember correctly) feedback on:
> https://cwiki.apache.org/confluence/display/solr/Internal+-+How+To+Publish+This+Documentation
>
> -Hoss
Re: VOTE: RC1 Release apache-solr-ref-guide-4.4.pdf
Crap, I just noticed Hoss's https://cwiki.apache.org/confluence/display/solr/Upgrading+Solr, which is a shorter version of the 4.4 upgrade notes page I just created.

Hoss, can you take a look at my new page and see if any of the extra stuff I've included beyond your page should be incorporated?

It feels weird to have upgrade notes for different versions in multiple places - maybe the previous release upgrade pages could stay where they are, with references to them added from the current release upgrade notes? Actually, it also seems weird that the previous 4.X upgrade notes are under the "Major Changes from Solr 3 to Solr 4" page in the left-hand navigation pane.

Depending on the nature of any changes we make for this, the new "Pre-publication actions" section on the internal how-to-publish page will need to be adjusted.

Steve

On Jul 25, 2013, at 4:29 AM, Steve Rowe <sar...@gmail.com> wrote:

> I noticed that the ref guide was missing a page about upgrading to Solr 4.4 (there is such a page for releases 4.1 through 4.3), so I created one, based on upgrade notes and bug fixes from CHANGES.txt: https://cwiki.apache.org/confluence/display/solr/Upgrading+to+Solr+4.4. Please edit it to make it better if you notice any problems. I think we should respin for this.
[jira] [Commented] (LUCENE-5131) CheckIndex is confusing for docvalues fields
[ https://issues.apache.org/jira/browse/LUCENE-5131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13719404#comment-13719404 ]

Adrien Grand commented on LUCENE-5131:
--------------------------------------

Definitely +1 for this patch and for printing statistics about unique value counts for SORTED and SORTED_SET.

CheckIndex is confusing for docvalues fields
--------------------------------------------

Key: LUCENE-5131
URL: https://issues.apache.org/jira/browse/LUCENE-5131
Project: Lucene - Core
Issue Type: Bug
Reporter: Robert Muir
Attachments: LUCENE-5131.patch, LUCENE-5131.patch

It prints things like:

{noformat}
test: docvalues...OK [0 total doc count; 18 docvalues fields]
{noformat}
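For readers unfamiliar with the tool: the output above comes from running CheckIndex against an index directory from the command line, typically something like this (jar version and index path are illustrative):

{noformat}
java -cp lucene-core-4.4.0.jar org.apache.lucene.index.CheckIndex /path/to/index
{noformat}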
[jira] [Commented] (SOLR-2894) Implement distributed pivot faceting
[ https://issues.apache.org/jira/browse/SOLR-2894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13719417#comment-13719417 ]

Elran Dvir commented on SOLR-2894:
----------------------------------

I downloaded the source code from Solr's website, then opened it with my IDE, IntelliJ. When I tried applying the patch, IntelliJ reported there were problems with some files. Thanks.

Implement distributed pivot faceting
------------------------------------

Key: SOLR-2894
URL: https://issues.apache.org/jira/browse/SOLR-2894
Project: Solr
Issue Type: Improvement
Reporter: Erik Hatcher
Fix For: 4.5
Attachments: SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894-reworked.patch

Following up on SOLR-792, pivot faceting currently only supports undistributed mode. Distributed pivot faceting needs to be implemented.
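For what it's worth, a patch attached to a Lucene/Solr issue is normally applied from the root of a source checkout on the command line rather than through the IDE. With the attachment downloaded into the checkout, either of these (filename as attached to the issue) usually works:

{noformat}
svn patch SOLR-2894.patch
patch -p0 -i SOLR-2894.patch
{noformat}

Note that such patches are generally made against trunk or a specific branch, so applying one to the packaged source of a different release will often fail with rejected hunks.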
[jira] [Commented] (SOLR-5045) Pluggable Analytics
[ https://issues.apache.org/jira/browse/SOLR-5045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13719512#comment-13719512 ]

Otis Gospodnetic commented on SOLR-5045:
----------------------------------------

[~joel.bernstein] how does this play with SOLR-2894? Overlap? Is the plan to be able to use the approach here to implement SOLR-2894 later on?

Pluggable Analytics
-------------------

Key: SOLR-5045
URL: https://issues.apache.org/jira/browse/SOLR-5045
Project: Solr
Issue Type: New Feature
Components: search
Affects Versions: 5.0
Reporter: Joel Bernstein
Priority: Minor
Fix For: 5.0
Attachments: SOLR-5045.patch, SOLR-5045.patch

This ticket provides a pluggable aggregation framework through the introduction of a new *Aggregator* interface and a new search component called the *AggregatorComponent*. The *Aggregator* interface extends the PostFilter interface, providing methods that allow DelegatingCollectors to perform aggregation at collect time. Aggregators were designed to play nicely with the CollapsingQParserPlugin introduced in SOLR-5027. The *AggregatorComponent* manages the output and distributed merging of aggregate results.

This ticket is an alternate design to SOLR-4465, which had the same basic idea but a very different implementation. This implementation resolves the caching issues in SOLR-4465 and, combined with SOLR-5027, plays nicely with field collapsing. It is also much less intrusive on the core code, as it's entirely implemented with plugins.

Initial syntax for the sample SumQParserPlugin Aggregator:

../select?q=*:*&wt=xml&indent=true&fq={!sum field=popularity id=mysum}&aggregate=true

*fq={!sum field=popularity id=mysum}* - calls the SumQParserPlugin, telling it to sum the field "popularity".
*aggregate=true* - turns on the AggregatorComponent.

The output contains a block that looks like this:

{code:xml}
<lst name="aggregates">
  <lst name="mysum">
    <long name="sum">85</long>
  </lst>
</lst>
{code}
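A sketch of how this proposed syntax might be driven from SolrJ; the AggregatorComponent and SumQParserPlugin are the plugins proposed in this ticket, not released APIs, and the server URL and field name are illustrative:

{code}
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class AggregateExample {
  public static void main(String[] args) throws Exception {
    HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1");
    SolrQuery q = new SolrQuery("*:*");
    q.addFilterQuery("{!sum field=popularity id=mysum}"); // the proposed sum aggregator
    q.set("aggregate", true);                             // turns on the AggregatorComponent
    QueryResponse rsp = server.query(q);
    // Per the XML block above, the aggregate results would arrive under "aggregates"
    System.out.println(rsp.getResponse().get("aggregates"));
  }
}
{code}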
[jira] [Resolved] (SOLR-5075) SolrCloud commit process is too time consuming, even if documents are light
[ https://issues.apache.org/jira/browse/SOLR-5075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Otis Gospodnetic resolved SOLR-5075.
------------------------------------

Resolution: Invalid

SolrCloud commit process is too time consuming, even if documents are light
----------------------------------------------------------------------------

Key: SOLR-5075
URL: https://issues.apache.org/jira/browse/SOLR-5075
Project: Solr
Issue Type: Bug
Components: Schema and Analysis, SolrCloud
Affects Versions: 4.1
Reporter: Radu Ghita
Labels: import, solrconfig.xml
[jira] [Commented] (SOLR-5075) SolrCloud commit process is too time consuming, even if documents are light
[ https://issues.apache.org/jira/browse/SOLR-5075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13719527#comment-13719527 ]

Otis Gospodnetic commented on SOLR-5075:
----------------------------------------

[~r...@wmds.ro] you should close this issue and ask on the solr-user mailing list.

SolrCloud commit process is too time consuming, even if documents are light
----------------------------------------------------------------------------

Key: SOLR-5075
URL: https://issues.apache.org/jira/browse/SOLR-5075
Project: Solr
Issue Type: Bug
Components: Schema and Analysis, SolrCloud
Affects Versions: 4.1
Reporter: Radu Ghita
Labels: import, solrconfig.xml
[jira] [Commented] (SOLR-5075) SolrCloud commit process is too time consuming, even if documents are light
[ https://issues.apache.org/jira/browse/SOLR-5075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13719537#comment-13719537 ]

Erick Erickson commented on SOLR-5075:
--------------------------------------

FWIW, I was about to say the same thing, but have one comment. SOLR-4816 (not in 4.4, but coming soon) should add some efficiencies to SolrJ updating; I'd love to see what its effect is in your situation.

One thing: it looks like you're accumulating all the docs from the select in one huge batch and indexing them all at once. If that's true, try submitting them, say, 1,000 at a time; see the sketch below. I suspect that will not hang, but I also suspect it will slow your initial ingest rate, because you'll actually be sending docs to Solr rather than just accumulating them all locally.

SolrCloud commit process is too time consuming, even if documents are light
----------------------------------------------------------------------------

Key: SOLR-5075
URL: https://issues.apache.org/jira/browse/SOLR-5075
Project: Solr
Issue Type: Bug
Components: Schema and Analysis, SolrCloud
Affects Versions: 4.1
Reporter: Radu Ghita
Labels: import, solrconfig.xml
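A minimal sketch of the batching Erick suggests, written as a drop-in replacement for the reporter's two methods quoted earlier (App.solrServer, getDocumentFromResultSet and the JDBC fields come from the reporter's code, not a library API; the batch size is illustrative):

{code}
import java.util.ArrayList;
import java.util.List;
import org.apache.solr.common.SolrInputDocument;

// Sketch: stream documents to Solr in fixed-size batches instead of one huge add.
public void indexInBatches() throws Exception {
  final int BATCH_SIZE = 1000; // illustrative
  List<SolrInputDocument> batch = new ArrayList<SolrInputDocument>(BATCH_SIZE);
  while (this.results.next()) {
    batch.add(this.getDocumentFromResultSet(this.results));
    if (batch.size() >= BATCH_SIZE) {
      App.solrServer.add(batch); // send this slice now
      batch.clear();             // keep client-side memory bounded
    }
  }
  if (!batch.isEmpty()) {
    App.solrServer.add(batch);   // flush the final partial batch
  }
  this.results.close();
  this.statement.close();
}
{code}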
[jira] [Commented] (SOLR-5069) MapReduce for SolrCloud
[ https://issues.apache.org/jira/browse/SOLR-5069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13719540#comment-13719540 ]

Otis Gospodnetic commented on SOLR-5069:
----------------------------------------

This is great to see - I asked about this in SOLR-1301: https://issues.apache.org/jira/browse/SOLR-1301?focusedCommentId=13678948&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13678948 :)

{quote}
The node which received the command would first identify one replica from each slice where the map program is executed. It will also identify one other node from the same collection where the reduce program is run. Each run is given an id and the details of the nodes participating in the run will be written to ZK (as an ephemeral node).
{quote}

Lukas and Andrzej have already addressed my immediate thought when I read the above, and they talked about using the cost approach, limiting resource use, and such. But I think we should learn from others' mistakes and choices here. Is it good enough to limit resources? Just limiting resources means that any concurrent queries *will* be affected - the question is just how much. Would it be better to mark some nodes as "eligible for running analytical/batch/MR jobs + search" or "eligible for running analytical/batch/MR jobs and NO search" - i.e. nodes that are a part of the SolrCloud cluster but run ONLY these jobs and do NOT handle queries? I think we saw DataStax do this with Cassandra and Brisk, and we see it with people using HBase replication to keep one HBase cluster for real-time/interactive access and another for running jobs.

MapReduce for SolrCloud
-----------------------

Key: SOLR-5069
URL: https://issues.apache.org/jira/browse/SOLR-5069
Project: Solr
Issue Type: New Feature
Components: SolrCloud
Reporter: Noble Paul
Assignee: Noble Paul

Solr currently does not have a way to run long-running computational tasks across the cluster. We can piggyback on the mapreduce paradigm so that users have a smooth learning curve.

* The mapreduce component will be written as a RequestHandler in Solr
* Works only in SolrCloud mode (no support for standalone mode)
* Users can write MapReduce programs in Javascript or Java. First cut would be JS (?)

h1. sample word count program

h2. how to invoke?

http://host:port/solr/collection-x/mapreduce?map=map-script&reduce=reduce-script&sink=collectionX

h3. params

* map: a Javascript implementation of the map program
* reduce: a Javascript implementation of the reduce program
* sink: the collection to which the output is written. If this is not passed, the request will wait till completion, the output of the reduce program will be emitted as a standard Solr response, and the request will be redirected to the reduce node, where it waits till the process is complete. If the sink param is passed, the response will contain an id of the run, which can be used to query the status in another command.
* reduceNode: node name where the reduce is run. If not passed, an arbitrary node is chosen.

The node which received the command would first identify one replica from each slice where the map program is executed. It will also identify one other node from the same collection where the reduce program is run. Each run is given an id and the details of the nodes participating in the run will be written to ZK (as an ephemeral node).

h4. map script

{code:JavaScript}
var res = $.streamQuery("*:*"); // this is not run across the cluster,
                                // only on this index
while (res.hasMore()) {
  var doc = res.next();
  var txt = doc.get("txt"); // the field on which word count is performed
  var words = txt.split(" ");
  for (i = 0; i < words.length; i++) {
    $.map(words[i], {'count': 1}); // this will send the map over to
                                   // the reduce host
  }
}
{code}

Essentially two threads are created in the 'map' hosts: one for running the program and the other for coordinating with the 'reduce' host. The maps emitted are streamed live over an http connection to the reduce program.

h4. reduce script

This script is run in one node. This node accepts http connections from map nodes, and the 'maps' that are sent are collected in a queue which is polled and fed into the reduce program. It also keeps the 'reduced' data in memory till the whole run is complete. It expects a "done" message from all 'map' nodes before it declares the tasks complete. After the reduce program has been executed for all the input, it proceeds to write out the result to the 'sink' collection, or the result is written straight out to the response.

{code:JavaScript}
var pair = $.nextMap();
var reduced = $.getCtx().getReducedMap(); // a hashmap
var count = reduced.get(pair.key());
if (count === null) {
  count = {"count": 0};
  reduced.put(pair.key(), count);
}
count.count += pair.val().count;
{code}

h4. example output

{code:JavaScript}
{
  "result": [
    "wordx": { "count": 15876765 },
    "wordy": { "count": 24657654 }
  ]
}
{code}

TBD
* The format in which the output is written to the target collection; I assume the reducedMap will have values mapping to the schema of the collection.
[jira] [Updated] (LUCENE-5133) AnalyzingInfixSuggester should return structured highlighted results instead of single String per result
[ https://issues.apache.org/jira/browse/LUCENE-5133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless updated LUCENE-5133:
---------------------------------------

Attachment: LUCENE-5133.patch

Thanks Shai, new patch attached.

AnalyzingInfixSuggester should return structured highlighted results instead of single String per result
---------------------------------------------------------------------------------------------------------

Key: LUCENE-5133
URL: https://issues.apache.org/jira/browse/LUCENE-5133
Project: Lucene - Core
Issue Type: Improvement
Reporter: Michael McCandless
Fix For: 5.0, 4.5
Attachments: LUCENE-5133.patch, LUCENE-5133.patch

Today it renders to an HTML string (<b>..</b> for hits) in protected methods that one can override to change the highlighting, but this is hard/inefficient to use for search servers that want to e.g. return a JSON representation of the highlighted result. This is the same issue as LUCENE-4906 (PostingsHighlighter), but for AnalyzingInfixSuggester's highlights instead.
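To make the motivation concrete, here is a purely hypothetical sketch of what a structured result could look like; none of these class or method names are the actual API added by the patch:

{code}
import java.util.ArrayList;
import java.util.List;

// Hypothetical structured highlight: a suggestion as a list of fragments,
// each flagged as matching the user's typed prefix or not.
class HighlightFragment {
  final String text;
  final boolean isHit;
  HighlightFragment(String text, boolean isHit) { this.text = text; this.isHit = isHit; }
}

class StructuredSuggestion {
  final List<HighlightFragment> fragments = new ArrayList<HighlightFragment>();

  // A server can render the same structure to HTML, JSON, or anything else.
  String toHtml() {
    StringBuilder sb = new StringBuilder();
    for (HighlightFragment f : fragments) {
      sb.append(f.isHit ? "<b>" + f.text + "</b>" : f.text);
    }
    return sb.toString();
  }
}
{code}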
[jira] [Commented] (SOLR-5069) MapReduce for SolrCloud
[ https://issues.apache.org/jira/browse/SOLR-5069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13719550#comment-13719550 ]

Noble Paul commented on SOLR-5069:
----------------------------------

bq. Would it be better to mark some nodes as eligible for running analytical/batch/MR jobs + search

Instead of marking certain nodes as "eligible for X", how about passing the node names in the request itself? That way we are not introducing some kind of 'role' in the system, but we still get all the benefits.

MapReduce for SolrCloud
-----------------------

Key: SOLR-5069
URL: https://issues.apache.org/jira/browse/SOLR-5069
Project: Solr
Issue Type: New Feature
Components: SolrCloud
Reporter: Noble Paul
Assignee: Noble Paul
[jira] [Commented] (SOLR-5069) MapReduce for SolrCloud
[ https://issues.apache.org/jira/browse/SOLR-5069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13719554#comment-13719554 ]

Otis Gospodnetic commented on SOLR-5069:
----------------------------------------

bq. Instead of marking certain nodes as "eligible for X", how about passing the node names in the request itself? That way we are not introducing some kind of 'role' in the system, but we still get all the benefits.

But if searches are running on *all* nodes, then the above doesn't achieve complete separation of search vs. job work.

MapReduce for SolrCloud
-----------------------

Key: SOLR-5069
URL: https://issues.apache.org/jira/browse/SOLR-5069
Project: Solr
Issue Type: New Feature
Components: SolrCloud
Reporter: Noble Paul
Assignee: Noble Paul
[jira] [Commented] (SOLR-5069) MapReduce for SolrCloud
[ https://issues.apache.org/jira/browse/SOLR-5069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13719556#comment-13719556 ]

Noble Paul commented on SOLR-5069:
----------------------------------

bq. But if searches are running on all nodes, then the above doesn't achieve complete separation of search vs. job work.

Makes sense...

MapReduce for SolrCloud
-----------------------

Key: SOLR-5069
URL: https://issues.apache.org/jira/browse/SOLR-5069
Project: Solr
Issue Type: New Feature
Components: SolrCloud
Reporter: Noble Paul
Assignee: Noble Paul
[jira] [Comment Edited] (SOLR-5069) MapReduce for SolrCloud
[ https://issues.apache.org/jira/browse/SOLR-5069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13719556#comment-13719556 ]

Noble Paul edited comment on SOLR-5069 at 7/25/13 12:11 PM:
------------------------------------------------------------

bq. But if searches are running on all nodes, then the above doesn't achieve complete separation of search vs. job work.

Makes sense. It's something we should think of as a feature of Solr: being a part of a cluster but not taking part in certain roles (leader/search/jobs, etc.).

was (Author: noble.paul):

bq. But if searches are running on all nodes, then the above doesn't achieve complete separation of search vs. job work.

Makes sense...

MapReduce for SolrCloud
-----------------------

Key: SOLR-5069
URL: https://issues.apache.org/jira/browse/SOLR-5069
Project: Solr
Issue Type: New Feature
Components: SolrCloud
Reporter: Noble Paul
Assignee: Noble Paul
[jira] [Commented] (LUCENE-5133) AnalyzingInfixSuggester should return structured highlighted results instead of single String per result
[ https://issues.apache.org/jira/browse/LUCENE-5133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13719559#comment-13719559 ]

Shai Erera commented on LUCENE-5133:
------------------------------------

Looks good.

AnalyzingInfixSuggester should return structured highlighted results instead of single String per result
---------------------------------------------------------------------------------------------------------

Key: LUCENE-5133
URL: https://issues.apache.org/jira/browse/LUCENE-5133
Project: Lucene - Core
Issue Type: Improvement
Reporter: Michael McCandless
Fix For: 5.0, 4.5
Attachments: LUCENE-5133.patch, LUCENE-5133.patch
[jira] [Commented] (SOLR-5069) MapReduce for SolrCloud
[ https://issues.apache.org/jira/browse/SOLR-5069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13719563#comment-13719563 ]

Otis Gospodnetic commented on SOLR-5069:
----------------------------------------

bq. It should be something we should think of as a feature of Solr. Being a part of a cluster but not taking part in certain roles (leader/search/jobs etc)

Yeah, perhaps something like that. We already have Overseer and Leader, which are also roles of some sort, though those are completely managed by SolrCloud, meaning SolrCloud/ZK do the node election and node assignment for those particular roles, AFAIK, while for a search vs. job (vs. mixed) role the assignment is likely to come from a human + ZK.

MapReduce for SolrCloud
-----------------------

Key: SOLR-5069
URL: https://issues.apache.org/jira/browse/SOLR-5069
Project: Solr
Issue Type: New Feature
Components: SolrCloud
Reporter: Noble Paul
Assignee: Noble Paul
[jira] [Created] (LUCENE-5134) Consider implementing lookback merge policy
Otis Gospodnetic created LUCENE-5134:

Summary: Consider implementing lookback merge policy
Key: LUCENE-5134
URL: https://issues.apache.org/jira/browse/LUCENE-5134
Project: Lucene - Core
Issue Type: Improvement
Reporter: Otis Gospodnetic
Priority: Minor

In http://blog.mikemccandless.com/2011/02/visualizing-lucenes-segment-merges.html Mike mentioned lookahead as something that could possibly yield more optimal merges. But what about lookback? What if some sort of stats were kept about which segments were picked for merges? With such stats in hand, could one look back and, knowing what happened after those merges, evaluate whether more optimal merge choices could have been made, and then use that next time?

See http://search-lucene.com/m/D7ypz1gT2H91
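A minimal sketch of the stats-gathering half of this idea, as a TieredMergePolicy subclass that records every selected merge for later offline what-if analysis. This assumes the Lucene 4.x MergePolicy API; exact signatures and the estimatedMergeBytes field can differ between versions, and the logging destination is illustrative:

{code}
import java.io.IOException;
import org.apache.lucene.index.MergePolicy;
import org.apache.lucene.index.MergeTrigger;
import org.apache.lucene.index.SegmentInfos;
import org.apache.lucene.index.TieredMergePolicy;

// Sketch: record which merges were selected, so one can later "look back"
// and judge whether different selections would have copied fewer bytes.
public class StatsRecordingMergePolicy extends TieredMergePolicy {
  @Override
  public MergeSpecification findMerges(MergeTrigger trigger, SegmentInfos infos)
      throws IOException {
    MergeSpecification spec = super.findMerges(trigger, infos);
    if (spec != null) {
      for (MergePolicy.OneMerge merge : spec.merges) {
        // Illustrative logging; a real implementation would persist segment
        // names, sizes and timestamps somewhere queryable for offline analysis.
        System.out.println("selected merge of " + merge.segments.size()
            + " segments, ~" + merge.estimatedMergeBytes + " estimated bytes");
      }
    }
    return spec;
  }
}
{code}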
[jira] [Created] (LUCENE-5135) Consider time-based MergeScheduler
Otis Gospodnetic created LUCENE-5135:

Summary: Consider time-based MergeScheduler
Key: LUCENE-5135
URL: https://issues.apache.org/jira/browse/LUCENE-5135
Project: Lucene - Core
Issue Type: Improvement
Reporter: Otis Gospodnetic
Priority: Minor

Very often search traffic follows a wave pattern, which could mean that more aggressive merging could be done during periods with lower query rates (e.g. nights and weekends)... or maybe during that time more segments could be allowed to live in the index, assuming that after allowing that for some time, the subsequent merge could be bigger/more thorough, so to speak.

See http://search-lucene.com/m/D7ypz1gT2H91
Re: Lookback and/or time-aware Merge Policy?
Thanks for showing I wasn't completely crazy to think this made sense, Mike. I added:

https://issues.apache.org/jira/browse/LUCENE-5134
https://issues.apache.org/jira/browse/LUCENE-5135

Otis

On Mon, Jul 15, 2013 at 1:28 PM, Michael McCandless <luc...@mikemccandless.com> wrote:

> Lookback is a good idea: you could at least gather statistics and assess, later, whether good merges had been selected, and maybe play "what if" games to explore whether different merge selections would have resulted in less copying.
>
> A time-based MergeScheduler would make sense: e.g., it would allow small merges to run any time, but big ones must wait until after hours.
>
> Also, RateLimitedDirWrapper can be used to limit the IO impact of ongoing merges. It's like a naive ionice, for merging.
>
> Mike McCandless
> http://blog.mikemccandless.com
>
> On Mon, Jul 8, 2013 at 10:41 PM, Otis Gospodnetic <otis.gospodne...@gmail.com> wrote:
>
>> Hi,
>>
>> I was (re-re-re-re)-reading Mike's post about Lucene segment merges: http://blog.mikemccandless.com/2011/02/visualizing-lucenes-segment-merges.html
>>
>> Mike mentioned lookahead as something that could possibly yield more optimal merges. But what about lookback? :) What if some sort of stats were kept about which segments were picked for merges? With some sort of stats in hand, could one look back and, knowing what happened after those merges, evaluate whether more optimal merge choices could have been made, and then use that next time?
>>
>> Also, what about time of day and query rates? Very often search traffic follows the wave pattern, which could mean that more aggressive merging could be done during periods with lower query rates... or maybe during that time more segments could be allowed to live in the index, assuming that after allowing that for some time, the subsequent merge could be bigger/more thorough, so to speak.
>>
>> Thoughts?
>>
>> Otis
>> --
>> Solr & ElasticSearch Support -- http://sematext.com/
>> Performance Monitoring -- http://sematext.com/spm
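A minimal sketch of the time-based scheduling Mike describes, as a ConcurrentMergeScheduler subclass that stalls large merges during business hours. This assumes Lucene 4.x's protected doMerge hook and the estimatedMergeBytes field on OneMerge; the size threshold and hours are illustrative:

{code}
import java.io.IOException;
import java.util.Calendar;
import org.apache.lucene.index.ConcurrentMergeScheduler;
import org.apache.lucene.index.MergePolicy;

// Sketch: let small merges run any time, but hold big merges until after hours.
public class AfterHoursMergeScheduler extends ConcurrentMergeScheduler {
  private static final long BIG_MERGE_BYTES = 1L << 30; // 1 GB, illustrative

  @Override
  protected void doMerge(MergePolicy.OneMerge merge) throws IOException {
    while (isBusinessHours() && merge.estimatedMergeBytes > BIG_MERGE_BYTES) {
      try {
        Thread.sleep(60000); // the merge thread re-checks once a minute
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new IOException(e);
      }
    }
    super.doMerge(merge);
  }

  private static boolean isBusinessHours() {
    int hour = Calendar.getInstance().get(Calendar.HOUR_OF_DAY);
    return hour >= 8 && hour < 20; // 8am to 8pm, illustrative
  }
}
{code}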
[jira] [Commented] (SOLR-5069) MapReduce for SolrCloud
[ https://issues.apache.org/jira/browse/SOLR-5069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13719593#comment-13719593 ] Yonik Seeley commented on SOLR-5069: bq. It should be something we should think of as a feature of Solr. Right - it's unrelated to this feature. We've already kicked around the idea of roles for nodes for years now (like in SOLR-2765), and they would be useful in many contexts. Someone actually needs to do the work though... patches welcome ;-)
MapReduce for SolrCloud --- Key: SOLR-5069 URL: https://issues.apache.org/jira/browse/SOLR-5069 Project: Solr Issue Type: New Feature Components: SolrCloud Reporter: Noble Paul Assignee: Noble Paul
Solr currently does not have a way to run long-running computational tasks across the cluster. We can piggyback on the mapreduce paradigm so that users have a smooth learning curve.
* The mapreduce component will be written as a RequestHandler in Solr
* Works only in SolrCloud mode (no support for standalone mode)
* Users can write MapReduce programs in Javascript or Java. First cut would be JS ( ? )
h1. sample word count program
h2. how to invoke?
http://host:port/solr/collection-x/mapreduce?map=map-script&reduce=reduce-script&sink=collectionX
h3. params
* map : a Javascript implementation of the map program
* reduce : a Javascript implementation of the reduce program
* sink : the collection to which the output is written. If no sink is passed, the request is redirected to the reduce node, waits till the process is complete, and the output of the reduce program is emitted as a standard Solr response. If the sink param is passed, the response will contain an id of the run which can be used to query the status in another command.
* reduceNode : node name where the reduce is run. If not passed, an arbitrary node is chosen.
The node which received the command would first identify one replica from each slice where the map program is executed. It will also identify another node from the same collection where the reduce program is run. Each run is given an id and the details of the nodes participating in the run will be written to ZK (as an ephemeral node).
h4. map script
{code:JavaScript}
var res = $.streamQuery("*:*"); // this is not run across the cluster, only on this index
while (res.hasMore()) {
  var doc = res.next();
  var txt = doc.get("txt"); // the field on which word count is performed
  var words = txt.split(" ");
  for (var i = 0; i < words.length; i++) {
    $.map(words[i], {'count': 1}); // this sends the map over to the reduce host
  }
}
{code}
Essentially two threads are created on the 'map' hosts: one for running the program and the other for coordinating with the 'reduce' host. The maps emitted are streamed live over an http connection to the reduce program.
h4. reduce script
This script is run on one node. This node accepts http connections from map nodes, and the 'maps' that are sent are collected in a queue which is polled and fed into the reduce program. This also keeps the 'reduced' data in memory till the whole run is complete. It expects a done message from all 'map' nodes before it declares the task complete. After the reduce program has been executed for all the input, the result is written out to the 'sink' collection, or straight out to the response.
{code:JavaScript}
var pair = $.nextMap();
var reduced = $.getCtx().getReducedMap(); // a hashmap
var count = reduced.get(pair.key());
if (count === null) {
  count = {"count": 0};
  reduced.put(pair.key(), count);
}
count.count += pair.val().count;
{code}
h4. example output
{code:JavaScript}
{
  "result": [
    "wordx": { "count": 15876765 },
    "wordy": { "count": 24657654 }
  ]
}
{code}
TBD
* The format in which the output is written to the target collection; I assume the reducedMap will have values mapping to the schema of the collection
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-5073) Improve SolrQuery class and add support for facet limit on per field basis in SolrJ
[ https://issues.apache.org/jira/browse/SOLR-5073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandro Mario Zbinden updated SOLR-5073: --- Attachment: SOLR-5073.patch Added a patch that allows the SolrQuery class to set the facet.limit on a per-field basis with the new methods setFacetLimit(String field, int limit) and getFacetLimit(String field). Improve SolrQuery class and add support for facet limit on per field basis in SolrJ --- Key: SOLR-5073 URL: https://issues.apache.org/jira/browse/SOLR-5073 Project: Solr Issue Type: Improvement Components: clients - java Affects Versions: 5.0, 4.4 Reporter: Sandro Mario Zbinden Priority: Minor Labels: facet, solrj Attachments: SOLR-5073.patch Original Estimate: 2h Remaining Estimate: 2h Currently the SolrQuery (org.apache.solr.client.solrj) class supports the setFacetLimit(int limit) and getFacetLimit() methods. Recently someone added a feature to specify the facet.limit on a per-field basis. It would be great if this feature could be used from SolrJ, with setFacetLimit(String field, int limit) and getFacetLimit(String field). setFacetPrefix is already implemented like this. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
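For context, Solr already supports per-field facet parameters via the f.<field>.facet.limit convention, so the new methods presumably just build that parameter name, the same way setFacetPrefix does. A hedged sketch of their likely shape (not the attached patch):

{code:java}
// Hypothetical shape of the proposed methods, inside SolrQuery.
public SolrQuery setFacetLimit(String field, int limit) {
    this.set("f." + field + ".facet.limit", limit);
    return this;
}

public int getFacetLimit(String field) {
    // Fall back to the global facet.limit (Solr's default is 100) when no
    // per-field value has been set.
    return this.getInt("f." + field + ".facet.limit",
            this.getInt("facet.limit", 100));
}
{code}

A caller would then write, e.g., query.setFacetLimit("category", 20) to cap facet counts for one field without affecting the others.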
[jira] [Resolved] (SOLR-4916) Add support to write and read Solr index files and transaction log files to and from HDFS.
[ https://issues.apache.org/jira/browse/SOLR-4916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Miller resolved SOLR-4916. --- Resolution: Fixed Fix Version/s: (was: 4.5) 4.4 Add support to write and read Solr index files and transaction log files to and from HDFS. -- Key: SOLR-4916 URL: https://issues.apache.org/jira/browse/SOLR-4916 Project: Solr Issue Type: New Feature Reporter: Mark Miller Assignee: Mark Miller Fix For: 5.0, 4.4 Attachments: SOLR-4916-ivy.patch, SOLR-4916-move-MiniDfsCluster-deps-from-solr-test-framework-to-solr-core.patch, SOLR-4916-nulloutput.patch, SOLR-4916-nulloutput.patch, SOLR-4916.patch, SOLR-4916.patch, SOLR-4916.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5069) MapReduce for SolrCloud
[ https://issues.apache.org/jira/browse/SOLR-5069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13719612#comment-13719612 ] Andrzej Bialecki commented on SOLR-5069: - bq. some things will be completely streamable w/o any need for buffering... think of re-implementing the terms component here - we can access terms in sorted order so the reducer would simply need to do a merge sort on the streams and then stream that result back! It could probably be implemented as a special case, because it strongly depends on the map() output being sorted. In the general case, however, the reducer must wait for all mappers to finish, because mappers may produce keys out of order and non-unique. +1 on node roles, as a separate issue - it should not hold off this issue. MapReduce for SolrCloud --- Key: SOLR-5069 URL: https://issues.apache.org/jira/browse/SOLR-5069 Project: Solr Issue Type: New Feature Components: SolrCloud Reporter: Noble Paul Assignee: Noble Paul
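The sorted special case described in the comment above amounts to a k-way merge of already-sorted streams. A self-contained sketch, assuming string keys (combining the values of equal keys across streams is omitted for brevity):

{code:java}
// When every mapper emits keys in sorted order (as a terms enumeration
// would), the reducer can stream a merge instead of buffering everything.
import java.util.Iterator;
import java.util.PriorityQueue;

class SortedStreamMerger {
    static class Head implements Comparable<Head> {
        final String key;
        final Iterator<String> rest;
        Head(String key, Iterator<String> rest) { this.key = key; this.rest = rest; }
        public int compareTo(Head o) { return key.compareTo(o.key); }
    }

    // Emits keys from all sorted streams in global sorted order, one at a time.
    static void merge(Iterable<Iterator<String>> sortedStreams) {
        PriorityQueue<Head> pq = new PriorityQueue<Head>();
        for (Iterator<String> s : sortedStreams) {
            if (s.hasNext()) pq.add(new Head(s.next(), s));
        }
        while (!pq.isEmpty()) {
            Head h = pq.poll();
            System.out.println(h.key); // stream straight out to the response
            if (h.rest.hasNext()) pq.add(new Head(h.rest.next(), h.rest));
        }
    }
}
{code}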
[jira] [Commented] (SOLR-5045) Pluggable Analytics
[ https://issues.apache.org/jira/browse/SOLR-5045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13719614#comment-13719614 ] Joel Bernstein commented on SOLR-5045: -- Yeah, the plan eventually would be to port the techniques used in SOLR-2894 to a pluggable Aggregator. Ideally pluggable analytics would lead to the implementation of different aggregation libraries. Since they can be implemented as pure plugins, developers wouldn't have to worry about getting their library committed. Interesting commercial opportunity for developing and maintaining a high performance analytic library for Solr, above and beyond what the community provides. Pluggable Analytics --- Key: SOLR-5045 URL: https://issues.apache.org/jira/browse/SOLR-5045 Project: Solr Issue Type: New Feature Components: search Affects Versions: 5.0 Reporter: Joel Bernstein Priority: Minor Fix For: 5.0 Attachments: SOLR-5045.patch, SOLR-5045.patch This ticket provides a pluggable aggregation framework through the introduction of a new *Aggregator* interface and a new search component called the *AggregatorComponent*. The *Aggregator* interface extends the PostFilter interface providing methods that allow DelegatingCollectors to perform aggregation at collect time. Aggregators were designed to play nicely with the CollapsingQParserPlugin introduced in SOLR-5027. The *AggregatorComponent* manages the output and distributed merging of aggregate results. This ticket is an alternate design to SOLR-4465 which had the same basic idea but a very different implementation. This implementation resolves the caching issues in SOLR-4465 and combined with SOLR-5027 plays nicely with field collapsing. It is also much less intrusive on the core code as it's entirely implemented with plugins. Initial Syntax for the sample SumQParserPlugin Aggregator: ../select?q=\*:\*&wt=xml&indent=true&fq=\{!sum field=popularity id=mysum\}&aggregate=true *fq=\{!sum field=popularity id=mysum\}* - Calls the SumQParserPlugin telling it to sum the field popularity. *aggregate=true* - turns on the AggregatorComponent The output contains a block that looks like this: {code:xml}
<lst name="aggregates">
  <lst name="mysum">
    <long name="sum">85</long>
  </lst>
</lst>
{code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
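To illustrate "aggregation at collect time" from the quoted description, here is a deliberately Lucene-free toy: a delegating collector that sums a per-document value as matches stream by. It is not Solr's actual Aggregator or DelegatingCollector API:

{code:java}
interface DocCollector {
    void collect(int doc);
}

class SumAggregatingCollector implements DocCollector {
    private final DocCollector delegate; // builds the normal search results
    private final long[] popularity;     // per-doc values of the aggregated field
    private long sum;

    SumAggregatingCollector(DocCollector delegate, long[] popularity) {
        this.delegate = delegate;
        this.popularity = popularity;
    }

    public void collect(int doc) {
        sum += popularity[doc]; // aggregate while collecting
        delegate.collect(doc);  // keep normal collection going
    }

    long sum() {
        return sum; // reported under the request's id, e.g. "mysum"
    }
}
{code}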
[jira] [Commented] (LUCENE-5131) CheckIndex is confusing for docvalues fields
[ https://issues.apache.org/jira/browse/LUCENE-5131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13719618#comment-13719618 ] ASF subversion and git services commented on LUCENE-5131: - Commit 1506964 from [~rcmuir] in branch 'dev/trunk' [ https://svn.apache.org/r1506964 ] LUCENE-5131: CheckIndex is confusing for docvalues fields CheckIndex is confusing for docvalues fields Key: LUCENE-5131 URL: https://issues.apache.org/jira/browse/LUCENE-5131 Project: Lucene - Core Issue Type: Bug Reporter: Robert Muir Attachments: LUCENE-5131.patch, LUCENE-5131.patch it prints things like: {noformat} test: docvalues...OK [0 total doc count; 18 docvalues fields] {noformat} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (LUCENE-5131) CheckIndex is confusing for docvalues fields
[ https://issues.apache.org/jira/browse/LUCENE-5131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir resolved LUCENE-5131. - Resolution: Fixed Fix Version/s: 4.5 5.0 CheckIndex is confusing for docvalues fields Key: LUCENE-5131 URL: https://issues.apache.org/jira/browse/LUCENE-5131 Project: Lucene - Core Issue Type: Bug Reporter: Robert Muir Fix For: 5.0, 4.5 Attachments: LUCENE-5131.patch, LUCENE-5131.patch it prints things like: {noformat} test: docvalues...OK [0 total doc count; 18 docvalues fields] {noformat} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5131) CheckIndex is confusing for docvalues fields
[ https://issues.apache.org/jira/browse/LUCENE-5131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13719620#comment-13719620 ] ASF subversion and git services commented on LUCENE-5131: - Commit 1506968 from [~rcmuir] in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1506968 ] LUCENE-5131: CheckIndex is confusing for docvalues fields CheckIndex is confusing for docvalues fields Key: LUCENE-5131 URL: https://issues.apache.org/jira/browse/LUCENE-5131 Project: Lucene - Core Issue Type: Bug Reporter: Robert Muir Attachments: LUCENE-5131.patch, LUCENE-5131.patch it prints things like: {noformat} test: docvalues...OK [0 total doc count; 18 docvalues fields] {noformat} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4985) Make it easier to mix different kinds of FacetRequests
[ https://issues.apache.org/jira/browse/LUCENE-4985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13719637#comment-13719637 ] Shai Erera commented on LUCENE-4985: I have been thinking about how to achieve that... here's a proposal:
* Make FacetsAccumulator abstract with the following current impls:
** TaxonomyFacetsAccumulator, assumes that TaxoReader is needed, FacetArrays etc.
** SortedSetFacetsAccumulator, assumes that categories were indexed to a SortedSetDVField
** RangeFacetsAccumulator, for computing facet ranges on NumericDV
** MultiFacetsAccumulator, allows chaining several ones (basically a generic version of RangeFacetsAccumulatorWrapper)
* Add FacetRequest.createFacetsAccumulator():
** CountFacetRequest and Association*FacetRequest return TaxoFacetsAccumulator
** SortedSetCountFacetRequest returns SortedSetFA (and also verifies that the given CategoryPath was actually indexed in a SortedSetDVField)
** RangeFacetRequest returns RangeFacetsAccumulator
This pretty much divides the FacetRequests by the source from which they read the facets information. Now we need to handle the different aggregation functions currently supported by the TaxoFacetAcc variants: counting, associations. TaxoFacetAcc will let you specify the FacetsAggregator:
* CountFacetRequest will set the aggregator to FastCounting (if possible) or just Counting.
* Association*FacetRequest will set the aggregator to the matching one
* Additional requests can set their own aggregator
* FacetsAggregator will need to implement equals() and hashCode()
Then we'll have FacetsAccumulator.create(List<FacetRequest>) which creates the right accumulator:
* Group all requests that use the same FacetsAccumulator, so that all RangeFRs are grouped together, all TaxoFacetAcc requests together, etc. (see the sketch after the quoted issue below)
* For the TaxoFacetAcc requests, it groups them by their aggregator, so that:
** all CountingAggregators that read the same category list are grouped together, separate from ones that do counting on a different category list
** all AssociationAggregators are grouped together, by their function, list id etc.
* It then creates either a single accumulator, or a MultiFacetsAccumulator which chains the accumulate calls
What do we gain -- it's easy for an app to create the right accumulator for a given list of requests. Today it needs to sort of do this logic on its own, which is sometimes impossible (e.g. if it's a component that doesn't know what it's given). Also, the requests are self-descriptive. What do we lose -- today if one wants to count A, B and C using CachedOrdsCountingFacetsAggregator, it needs to override FacetsAccumulator.getAggregator(), once. With this change, he will need to do that for every CountFacetRequest he creates... I think that's an OK tradeoff, given the situation today which makes apps' life tougher. I think we'll also need to create an Aggregator (old FacetsAggregator) wrapper. It is still needed by StandardFacetsAccumulator, until we finish the cleanup of sampling, complements counting etc. I'll look into that too; perhaps it can be done separately in a different issue. Now I need to hope I took all the parameters into account, and won't hit a brick wall when trying to implement it :).
Make it easier to mix different kinds of FacetRequests -- Key: LUCENE-4985 URL: https://issues.apache.org/jira/browse/LUCENE-4985 Project: Lucene - Core Issue Type: Improvement Components: modules/facet Reporter: Michael McCandless Fix For: 5.0, 4.5 Spinoff from LUCENE-4980, where we added a strange class called RangeFacetsAccumulatorWrapper, which takes an incoming FSP, splits out the FacetRequests into range and non-range, delegates to two accumulators for each set, and then zips the results back together in order. Somehow we should generalize this class and make it work with SortedSetDocValuesAccumulator as well. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
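A sketch of the grouping step from the proposal above, i.e. FacetsAccumulator.create(List<FacetRequest>) bucketing requests by the accumulator type they report; illustrative only, not committed code:

{code:java}
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class AccumulatorGrouping {
    interface FacetRequest {
        Class<?> accumulatorClass(); // e.g. RangeFacetsAccumulator.class
    }

    static Map<Class<?>, List<FacetRequest>> group(List<FacetRequest> requests) {
        Map<Class<?>, List<FacetRequest>> groups =
                new HashMap<Class<?>, List<FacetRequest>>();
        for (FacetRequest r : requests) {
            List<FacetRequest> bucket = groups.get(r.accumulatorClass());
            if (bucket == null) {
                bucket = new ArrayList<FacetRequest>();
                groups.put(r.accumulatorClass(), bucket);
            }
            bucket.add(r);
        }
        // One accumulator per group; more than one group means wrapping them
        // in a MultiFacetsAccumulator that chains the accumulate() calls.
        return groups;
    }
}
{code}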
[jira] [Updated] (SOLR-4221) Custom sharding
[ https://issues.apache.org/jira/browse/SOLR-4221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Noble Paul updated SOLR-4221: - Attachment: SOLR-4221.patch Working patch with test cases. Custom sharding --- Key: SOLR-4221 URL: https://issues.apache.org/jira/browse/SOLR-4221 Project: Solr Issue Type: New Feature Reporter: Yonik Seeley Assignee: Noble Paul Attachments: SOLR-4221.patch, SOLR-4221.patch Features to let users control everything about sharding/routing. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-5073) Improve SolrQuery class and add support for facet limit on per field basis in SolrJ
[ https://issues.apache.org/jira/browse/SOLR-5073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandro Mario Zbinden updated SOLR-5073: --- Affects Version/s: (was: 4.4) 4.5 Improve SolrQuery class and add support for facet limit on per field basis in SolrJ --- Key: SOLR-5073 URL: https://issues.apache.org/jira/browse/SOLR-5073
[jira] [Commented] (SOLR-5069) MapReduce for SolrCloud
[ https://issues.apache.org/jira/browse/SOLR-5069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13719654#comment-13719654 ] Andrzej Bialecki commented on SOLR-5069: - An alternative solution for minimizing the amount of data in memory during the reduce phase is to use re-reduce, or a reduce-side combiner in Hadoop terminology. This is an additional function that runs on the reducer and periodically performs intermediate reductions of already-accumulated values for a key, preserving the intermediate results (and discarding the accumulated values). This function does not emit anything to the final output. The final reduction function then operates on a mix of values that arrived since the last intermediate reduction, plus all results of previous intermediate reductions. This works well for simple aggregations (where the additional function may in fact be a copy of the reduce function) but may not be suitable for all classes of problems. MapReduce for SolrCloud --- Key: SOLR-5069 URL: https://issues.apache.org/jira/browse/SOLR-5069 Project: Solr Issue Type: New Feature Components: SolrCloud Reporter: Noble Paul Assignee: Noble Paul
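The re-reduce idea in the comment above can be sketched as a reducer that collapses a key's buffered values into one partial result whenever the buffer grows too large; this works because the reduction here (a sum) is associative. Hypothetical, self-contained:

{code:java}
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class ReReducingCounter {
    private static final int COMBINE_THRESHOLD = 1000;
    private final Map<String, List<Long>> buffered = new HashMap<String, List<Long>>();

    void accept(String key, long value) {
        List<Long> vals = buffered.get(key);
        if (vals == null) {
            vals = new ArrayList<Long>();
            buffered.put(key, vals);
        }
        vals.add(value);
        if (vals.size() >= COMBINE_THRESHOLD) {
            long partial = reduce(vals); // intermediate reduction
            vals.clear();
            vals.add(partial);           // keep only the partial result
        }
    }

    long reduce(List<Long> values) {
        long sum = 0;
        for (long v : values) sum += v;
        return sum;
    }

    // Final reduction: runs over partials plus values since the last combine.
    Map<String, Long> finish() {
        Map<String, Long> out = new HashMap<String, Long>();
        for (Map.Entry<String, List<Long>> e : buffered.entrySet()) {
            out.put(e.getKey(), reduce(e.getValue()));
        }
        return out;
    }
}
{code}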
[JENKINS] Lucene-Solr-4.x-MacOSX (64bit/jdk1.7.0) - Build # 657 - Failure!
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-4.x-MacOSX/657/ Java: 64bit/jdk1.7.0 -XX:+UseCompressedOops -XX:+UseG1GC 1 tests failed. REGRESSION: org.apache.solr.client.solrj.TestBatchUpdate.testWithBinaryBean Error Message: IOException occured when talking to server at: https://127.0.0.1:54453/solr/collection1 Stack Trace: org.apache.solr.client.solrj.SolrServerException: IOException occured when talking to server at: https://127.0.0.1:54453/solr/collection1 at __randomizedtesting.SeedInfo.seed([99320139438B3BFA:FAD9003B23EFFCD8]:0) at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:435) at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:180) at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:117) at org.apache.solr.client.solrj.SolrServer.commit(SolrServer.java:168) at org.apache.solr.client.solrj.SolrServer.commit(SolrServer.java:146) at org.apache.solr.client.solrj.TestBatchUpdate.testWithBinaryBean(TestBatchUpdate.java:92) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1559) at com.carrotsearch.randomizedtesting.RandomizedRunner.access$600(RandomizedRunner.java:79) at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:737) at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:773) at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:787) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53) at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50) at org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:51) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49) at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:358) at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:782) at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:442) at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:746) at com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:648) at com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:682) at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:693) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:43) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70) at
[jira] [Updated] (LUCENE-5127) FixedGapTermsIndex should use monotonic compression
[ https://issues.apache.org/jira/browse/LUCENE-5127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-5127: Attachment: LUCENE-5127.patch I made some progress... Finally clean up divisor and interval, which are only confusing to users since they have done nothing in the default codec for so long: and in 5.x we don't have to read any preflex indexes. This makes interval a codec parameter for fixedgap and so on (like blocktree's min/max). This is cleaner and more flexible anyway, because it means e.g. if you use one of these codecs you can specify it per-field in the usual ways rather than globally for the whole index. The fieldcache-like divisor is gone. As far as the special -1 value, I didn't yet clean this up, but I see two directions. The best IMO is to nuke the mergeReader shit from ReadersAndLiveDocs completely. Otherwise we keep it and codecs can do special shit based on IOContext, but in all cases we don't need a special param. Tests are passing (at least once). More cleanups are needed to some of the codec impls, and some of the special-case tests for corner-case bugs in the past (e.g. TII0+empty field name) should really be moved to fixed-gap specific unit tests. FixedGapTermsIndex should use monotonic compression --- Key: LUCENE-5127 URL: https://issues.apache.org/jira/browse/LUCENE-5127 Project: Lucene - Core Issue Type: Improvement Reporter: Robert Muir Attachments: LUCENE-5127.patch, LUCENE-5127.patch For the addresses in the big in-memory byte[] and disk blocks, we could save a good deal of RAM here. I think this codec just never got upgraded when we added these new packed improvements, but it might be interesting to try to use for the terms data of sorted/sortedset DV implementations. The patch works, but has nocommits and currently ignores the divisor. The annoying problem there being that we have the shared interface with get(int) for PackedInts.Mutable/Reader, but no equivalent base class for monotonic get(long)... Still it's enough that we could benchmark/compare for now. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
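For readers unfamiliar with monotonic compression, the gist is to fit a line through the (roughly evenly spaced) increasing offsets and store only small deviations from it, which pack into far fewer bits than raw longs. A Lucene-independent sketch, assuming a non-empty increasing array; a real implementation would bit-pack the deviations:

{code:java}
class MonotonicSketch {
    final long min;                // deviation floor, folded into the base
    final float avgDelta;          // slope of the fitted line
    final long[] packedDeviations; // all >= 0 and small; bit-packed in a real impl

    MonotonicSketch(long[] values) {
        int n = values.length;
        avgDelta = n <= 1 ? 0 : (float) (values[n - 1] - values[0]) / (n - 1);
        long[] devs = new long[n];
        long minDev = Long.MAX_VALUE;
        for (int i = 0; i < n; i++) {
            devs[i] = values[i] - values[0] - (long) (avgDelta * i);
            minDev = Math.min(minDev, devs[i]);
        }
        min = values[0] + minDev;
        packedDeviations = new long[n];
        for (int i = 0; i < n; i++) {
            packedDeviations[i] = devs[i] - minDev;
        }
    }

    // Exact reconstruction: base + linear term + stored deviation.
    long get(int i) {
        return min + (long) (avgDelta * i) + packedDeviations[i];
    }
}
{code}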
[JENKINS-MAVEN] Lucene-Solr-Maven-4.x #397: POMs out of sync
Build: https://builds.apache.org/job/Lucene-Solr-Maven-4.x/397/ 2 tests failed. FAILED: org.apache.solr.cloud.BasicDistributedZkTest.org.apache.solr.cloud.BasicDistributedZkTest Error Message: 1 thread leaked from SUITE scope at org.apache.solr.cloud.BasicDistributedZkTest: 1) Thread[id=5910, name=recoveryCmdExecutor-3203-thread-1, state=RUNNABLE, group=TGRP-BasicDistributedZkTest] at java.net.PlainSocketImpl.socketConnect(Native Method) at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:327) at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:193) at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:180) at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:384) at java.net.Socket.connect(Socket.java:546) at org.apache.http.conn.scheme.PlainSocketFactory.connectSocket(PlainSocketFactory.java:127) at org.apache.http.impl.conn.DefaultClientConnectionOperator.openConnection(DefaultClientConnectionOperator.java:180) at org.apache.http.impl.conn.ManagedClientConnectionImpl.open(ManagedClientConnectionImpl.java:294) at org.apache.http.impl.client.DefaultRequestDirector.tryConnect(DefaultRequestDirector.java:645) at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:480) at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:906) at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:805) at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:784) at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:365) at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:180) at org.apache.solr.cloud.SyncStrategy$1.run(SyncStrategy.java:291) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:679) Stack Trace: com.carrotsearch.randomizedtesting.ThreadLeakError: 1 thread leaked from SUITE scope at org.apache.solr.cloud.BasicDistributedZkTest: 1) Thread[id=5910, name=recoveryCmdExecutor-3203-thread-1, state=RUNNABLE, group=TGRP-BasicDistributedZkTest] at java.net.PlainSocketImpl.socketConnect(Native Method) at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:327) at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:193) at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:180) at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:384) at java.net.Socket.connect(Socket.java:546) at org.apache.http.conn.scheme.PlainSocketFactory.connectSocket(PlainSocketFactory.java:127) at org.apache.http.impl.conn.DefaultClientConnectionOperator.openConnection(DefaultClientConnectionOperator.java:180) at org.apache.http.impl.conn.ManagedClientConnectionImpl.open(ManagedClientConnectionImpl.java:294) at org.apache.http.impl.client.DefaultRequestDirector.tryConnect(DefaultRequestDirector.java:645) at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:480) at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:906) at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:805) at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:784) at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:365) at 
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:180) at org.apache.solr.cloud.SyncStrategy$1.run(SyncStrategy.java:291) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:679) at __randomizedtesting.SeedInfo.seed([A720B41D9C5A2470]:0) FAILED: org.apache.solr.cloud.BasicDistributedZkTest.org.apache.solr.cloud.BasicDistributedZkTest Error Message: There are still zombie threads that couldn't be terminated: 1) Thread[id=5910, name=recoveryCmdExecutor-3203-thread-1, state=RUNNABLE, group=TGRP-BasicDistributedZkTest] at java.net.PlainSocketImpl.socketConnect(Native Method) at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:327) at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:193) at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:180) at
[jira] [Commented] (LUCENE-5127) FixedGapTermsIndex should use monotonic compression
[ https://issues.apache.org/jira/browse/LUCENE-5127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13719728#comment-13719728 ] ASF subversion and git services commented on LUCENE-5127: - Commit 1507035 from [~rcmuir] in branch 'dev/branches/lucene5127' [ https://svn.apache.org/r1507035 ] LUCENE-5127: create branch FixedGapTermsIndex should use monotonic compression --- Key: LUCENE-5127 URL: https://issues.apache.org/jira/browse/LUCENE-5127
[jira] [Commented] (LUCENE-5127) FixedGapTermsIndex should use monotonic compression
[ https://issues.apache.org/jira/browse/LUCENE-5127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13719729#comment-13719729 ] ASF subversion and git services commented on LUCENE-5127: - Commit 1507036 from [~rcmuir] in branch 'dev/branches/lucene5127' [ https://svn.apache.org/r1507036 ] LUCENE-5127: dump current state FixedGapTermsIndex should use monotonic compression --- Key: LUCENE-5127 URL: https://issues.apache.org/jira/browse/LUCENE-5127
[jira] [Commented] (LUCENE-5127) FixedGapTermsIndex should use monotonic compression
[ https://issues.apache.org/jira/browse/LUCENE-5127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13719732#comment-13719732 ] ASF subversion and git services commented on LUCENE-5127: - Commit 1507041 from [~rcmuir] in branch 'dev/branches/lucene5127' [ https://svn.apache.org/r1507041 ] LUCENE-5127: randomize codec parameter FixedGapTermsIndex should use monotonic compression --- Key: LUCENE-5127 URL: https://issues.apache.org/jira/browse/LUCENE-5127
[jira] [Commented] (SOLR-4489) StringIndexOutOfBoundsException in SpellCheckComponent
[ https://issues.apache.org/jira/browse/SOLR-4489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13719735#comment-13719735 ] ASF subversion and git services commented on SOLR-4489: --- Commit 1507042 from [~jdyer] in branch 'dev/trunk' [ https://svn.apache.org/r1507042 ] SOLR-4489: fix StringIndexOutOfBoundsException in SpellCheckComponent
StringIndexOutOfBoundsException in SpellCheckComponent --- Key: SOLR-4489 URL: https://issues.apache.org/jira/browse/SOLR-4489 Project: Solr Issue Type: Bug Components: spellchecker Affects Versions: 4.4, 4.3.1 Environment: all Reporter: venkata marrapu Assignee: James Dyer Priority: Minor Fix For: 4.5 Attachments: SOLR-4489.patch, SOLR-4489.patch
My SOLR request params are as shown below: spellcheck=true&enableElevation=true&facet=true&spellcheck.q=minecraft&spellcheck.extendedResults=true&spellcheck.maxCollations=10&spellcheck.collate=true&wt=javabin&defType=edismax&spellcheck.onlyMorePopular=true etc. Note: this works fine in many use cases, however it fails for some query terms.
{noformat}
Feb 22, 2013 11:06:04 AM org.apache.solr.common.SolrException log
SEVERE: null:java.lang.StringIndexOutOfBoundsException: String index out of range: -5
 at java.lang.AbstractStringBuilder.replace(AbstractStringBuilder.java:797)
 at java.lang.StringBuilder.replace(StringBuilder.java:271)
 at org.apache.solr.spelling.SpellCheckCollator.getCollation(SpellCheckCollator.java:190)
 at org.apache.solr.spelling.SpellCheckCollator.collate(SpellCheckCollator.java:75)
 at org.apache.solr.handler.component.SpellCheckComponent.addCollationsToResponse(SpellCheckComponent.java:203)
 at org.apache.solr.handler.component.SpellCheckComponent.process(SpellCheckComponent.java:180)
 at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:206)
 at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
 at org.apache.solr.core.SolrCore.execute(SolrCore.java:1699)
 at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:455)
 at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:276)
 at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1337)
 at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:484)
 at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:119)
 at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:524)
 at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:233)
 at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1065)
 at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:413)
 at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:192)
 at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:999)
 at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:117)
 at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:250)
 at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:149)
 at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:111)
 at org.eclipse.jetty.server.Server.handle(Server.java:351)
 at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:454)
 at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:47)
 at org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:890)
 at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:944)
 at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:634)
 at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:230)
 at org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:66)
 at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:254)
 at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:599)
 at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:534)
 at java.lang.Thread.run(Thread.java:680)
{noformat}
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
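As an illustration of how this class of error can arise (not the actual SpellCheckCollator code): substituting tokens of different lengths into a StringBuilder shifts all later offsets, and if the running shift is mis-tracked, the computed start index can go negative before replace() is called:

{code:java}
public class OffsetShiftDemo {
    public static void main(String[] args) {
        StringBuilder query = new StringBuilder("minecraft minecraft");
        int[][] tokenOffsets = { {0, 9}, {10, 19} }; // offsets in the original query
        String correction = "mine";                  // shorter replacement
        int shift = 0;
        for (int[] tok : tokenOffsets) {
            int start = tok[0] + shift;
            int end = tok[1] + shift;
            query.replace(start, end, correction);
            // The running shift here becomes -5; forgetting this update, or
            // applying it with the wrong sign, is how a negative index like
            // the "-5" in the report can reach StringBuilder.replace().
            shift += correction.length() - (tok[1] - tok[0]);
        }
        System.out.println(query); // prints "mine mine"
    }
}
{code}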
[jira] [Commented] (SOLR-4489) StringIndexOutOfBoundsException in SpellCheckComponent
[ https://issues.apache.org/jira/browse/SOLR-4489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13719741#comment-13719741 ] ASF subversion and git services commented on SOLR-4489: --- Commit 1507049 from [~jdyer] in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1507049 ] SOLR-4489: fix StringIndexOutOfBoundsException in SpellCheckComponent StringIndexOutOfBoundsException in SpellCheckComponent --- Key: SOLR-4489 URL: https://issues.apache.org/jira/browse/SOLR-4489
[jira] [Resolved] (SOLR-4489) StringIndexOutOfBoundsException in SpellCheckComponent
[ https://issues.apache.org/jira/browse/SOLR-4489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] James Dyer resolved SOLR-4489. -- Resolution: Fixed StringIndexOutOfBoundsException in SpellCheckComponent --- Key: SOLR-4489 URL: https://issues.apache.org/jira/browse/SOLR-4489 Project: Solr Issue Type: Bug Components: spellchecker Affects Versions: 4.4, 4.3.1 Environment: all Reporter: venkata marrapu Assignee: James Dyer Priority: Minor Fix For: 4.5 Attachments: SOLR-4489.patch, SOLR-4489.patch My SOLR request params are as shown below. spellcheck=trueenableElevation=truefacet=truespellcheck.q=minecraftspellcheck.extendedResults=truespellcheck.maxCollations=10spellcheck.collate=truewt=javabindefType=edismaxspellcheck.onlyMorePopular=true etc. Note: this work fine many use cases, however it fails for some query terms. Feb 22, 2013 11:06:04 AM org.apache.solr.common.SolrException log SEVERE: null:java.lang.StringIndexOutOfBoundsException: String index out of range: -5 at java.lang.AbstractStringBuilder.replace(AbstractStringBuilder.java:797) at java.lang.StringBuilder.replace(StringBuilder.java:271) at org.apache.solr.spelling.SpellCheckCollator.getCollation(SpellCheckCollator.java:190) at org.apache.solr.spelling.SpellCheckCollator.collate(SpellCheckCollator.java:75) at org.apache.solr.handler.component.SpellCheckComponent.addCollationsToResponse(SpellCheckComponent.java:203) at org.apache.solr.handler.component.SpellCheckComponent.process(SpellCheckComponent.java:180) at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:206) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1699) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:455) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:276) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1337) at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:484) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:119) at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:524) at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:233) at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1065) at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:413) at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:192) at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:999) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:117) at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:250) at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:149) at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:111) at org.eclipse.jetty.server.Server.handle(Server.java:351) at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:454) at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:47) at org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:890) at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:944) at 
org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:634) at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:230) at org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:66) at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:254) at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:599) at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:534) at java.lang.Thread.run(Thread.java:680) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
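The exception class in the trace above is easy to reproduce in isolation: StringBuilder.replace throws StringIndexOutOfBoundsException whenever the computed start offset goes negative, matching the "String index out of range: -5" message. A minimal sketch with illustrative values (not the actual SpellCheckCollator offsets):

public class CollationOffsetSketch {
  public static void main(String[] args) {
    StringBuilder collation = new StringBuilder("q=minecraft");
    int start = -5; // a miscomputed replacement offset, as in the trace above
    // throws java.lang.StringIndexOutOfBoundsException: String index out of range: -5
    collation.replace(start, start + "minecraft".length(), "minecart");
  }
}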
[jira] [Commented] (LUCENE-5133) AnalyzingInfixSuggester should return structured highlighted results instead of single String per result
[ https://issues.apache.org/jira/browse/LUCENE-5133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13719749#comment-13719749 ] Robert Muir commented on LUCENE-5133: - Why not use Object like the patch on LUCENE-4906 and try to get some consistency? I can easily see this becoming hell because different expert users want different things. It might work for your particular case to have String text + boolean, but other people might want to know crazy things like: * score for the passage * which multi-valued field instance they hit * position or something of the passage within the doc In general I also think it's really bad to add additional classes that users must learn (the previous API here is String, which everyone already knows). Anyway, I don't care too much for this class, but I'd hate for us to make this mistake over on LUCENE-4906. I feel like the other highlighters already introduce way too many new classes (besides already known simple ones like IndexSearcher, TopDocs, String, etc.) and it makes them difficult to use. AnalyzingInfixSuggester should return structured highlighted results instead of single String per result Key: LUCENE-5133 URL: https://issues.apache.org/jira/browse/LUCENE-5133 Project: Lucene - Core Issue Type: Improvement Reporter: Michael McCandless Fix For: 5.0, 4.5 Attachments: LUCENE-5133.patch, LUCENE-5133.patch Today it renders to an HTML string (<b>..</b> for hits) in protected methods that one can override to change the highlighting, but this is hard/inefficient to use for search servers that want to e.g. return JSON representation of the highlighted result. This is the same issue as LUCENE-4906 (PostingsHighlighter) but for AnalyzingInfixSuggester's highlights instead. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
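A minimal, self-contained sketch of the Object-returning hook being advocated here (names are illustrative, not the committed Lucene API): the default keeps returning the HTML String everyone already knows, while an expert subclass can return any structure it likes without the library defining new result classes.

import java.util.Collections;
import java.util.Locale;
import java.util.Set;

public class HighlightingSuggesterSketch {
  // Default: plain HTML, "the String everyone already knows".
  // Expert subclasses may override and return any Object they want.
  protected Object highlight(String text, Set<String> matchedTokens) {
    StringBuilder sb = new StringBuilder();
    for (String word : text.split("\\s+")) {
      if (sb.length() > 0) sb.append(' ');
      if (matchedTokens.contains(word.toLowerCase(Locale.ROOT))) {
        sb.append("<b>").append(word).append("</b>");
      } else {
        sb.append(word);
      }
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    Object result = new HighlightingSuggesterSketch()
        .highlight("a lightweight java importer", Collections.singleton("java"));
    System.out.println(result); // a lightweight <b>java</b> importer
  }
}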
[jira] [Commented] (LUCENE-5127) FixedGapTermsIndex should use monotonic compression
[ https://issues.apache.org/jira/browse/LUCENE-5127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13719752#comment-13719752 ] ASF subversion and git services commented on LUCENE-5127: - Commit 1507054 from [~rcmuir] in branch 'dev/branches/lucene5127' [ https://svn.apache.org/r1507054 ] LUCENE-5127: fix solr tests FixedGapTermsIndex should use monotonic compression --- Key: LUCENE-5127 URL: https://issues.apache.org/jira/browse/LUCENE-5127 Project: Lucene - Core Issue Type: Improvement Reporter: Robert Muir Attachments: LUCENE-5127.patch, LUCENE-5127.patch for the addresses in the big in-memory byte[] and disk blocks, we could save a good deal of RAM here. I think this codec just never got upgraded when we added these new packed improvements, but it might be interesting to try to use for the terms data of sorted/sortedset DV implementations. patch works, but has nocommits and currently ignores the divisor. The annoying problem there being that we have the shared interface with get(int) for PackedInts.Mutable/Reader, but no equivalent base class for monotonics get(long)... Still its enough that we could benchmark/compare for now. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
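For context, monotonic compression stores a monotonically increasing sequence (such as term index start offsets) as small signed deltas from a linear expectation; the deltas pack into far fewer bits than the raw longs. A toy illustration of the encoding, not Lucene's actual packed-ints implementation:

public class MonotonicSketch {
  public static void main(String[] args) {
    long[] addresses = {0, 17, 35, 50, 68, 85, 101}; // monotonically increasing
    long min = addresses[0];
    float avg = (float) (addresses[addresses.length - 1] - min) / (addresses.length - 1);
    long[] deltas = new long[addresses.length];
    for (int i = 0; i < addresses.length; i++) {
      // small values near zero; these are what would get bit-packed on disk
      deltas[i] = addresses[i] - min - (long) (avg * i);
    }
    // exact reconstruction from (min, avg, deltas):
    for (int i = 0; i < addresses.length; i++) {
      long restored = min + (long) (avg * i) + deltas[i];
      if (restored != addresses[i]) throw new AssertionError();
    }
  }
}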
[jira] [Commented] (LUCENE-5127) FixedGapTermsIndex should use monotonic compression
[ https://issues.apache.org/jira/browse/LUCENE-5127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13719775#comment-13719775 ] Michael McCandless commented on LUCENE-5127: This cleanup is awesome, thanks Rob! I think we should just nuke the special -1 ("don't load terms index") value? FixedGapTermsIndex should use monotonic compression --- Key: LUCENE-5127 URL: https://issues.apache.org/jira/browse/LUCENE-5127 Project: Lucene - Core Issue Type: Improvement Reporter: Robert Muir Attachments: LUCENE-5127.patch, LUCENE-5127.patch for the addresses in the big in-memory byte[] and disk blocks, we could save a good deal of RAM here. I think this codec just never got upgraded when we added these new packed improvements, but it might be interesting to try to use for the terms data of sorted/sortedset DV implementations. patch works, but has nocommits and currently ignores the divisor. The annoying problem there being that we have the shared interface with get(int) for PackedInts.Mutable/Reader, but no equivalent base class for monotonics get(long)... Still its enough that we could benchmark/compare for now. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (SOLR-5076) Make it possible to get list of collections with CollectionsHandler
Shawn Heisey created SOLR-5076: -- Summary: Make it possible to get list of collections with CollectionsHandler Key: SOLR-5076 URL: https://issues.apache.org/jira/browse/SOLR-5076 Project: Solr Issue Type: Improvement Reporter: Shawn Heisey Priority: Minor It would be very useful to have /admin/collections (CollectionsHandler) send a response similar to /admin/cores. This should probably be the default action, but requiring ?action=STATUS wouldn't be the end of the world. It would be very useful if CloudSolrServer were to implement a getCollections method, but that probably should be a separate issue. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: VOTE: RC1 Release apache-solr-ref-guide-4.4.pdf
: Crap, I just noticed Hoss's : https://cwiki.apache.org/confluence/display/solr/Upgrading+Solr, which : is a shorter version of the 4.4 upgrade notes page I just created. : It feels weird to have upgrade notes for different versions in multiple : places - maybe the previous release upgrade pages could stay where they : are, but add references to them from the current release upgrade notes? : Actually, it also seems weird that the previous 4.X upgrade notes are : under the Major Changes from Solr 3 to Solr 4 page in the left-hand : navigation pane. Once upon a time, Major Changes from Solr 3 to Solr 4 was a top-level section very early in the doc, and it had child pages for Upgrading to 4.x for each of the 4.x versions released so far -- this was primarily because Lucid only hosted a single version of the guide for all of 4.x. I had a discussion with Cassandra on IRC about eliminating that page and its children and having a single Upgrading page replace them (at the beginning of the doc). But then we decided that since this is the first official copy of the guide to be released by Apache, we should keep the Major Changes from Solr 3 to Solr 4 page around for at least one release as sort of an appendix. The fact that the other Upgrading to Solr 4.x pages were left as children was purely a mistake on my part -- I meant to delete those. Your new Upgrading to Solr 4.4 page is better than the one we already have, but I think we should rename it to simply Upgrading Solr so that it has a consistent name/url moving forward. I'll do the following: * delete all of the old upgrading pages * move your new upgrading page to the front of the doc and rename it to the general Upgrading Solr * add the one sentence I think is missing to your upgrading page (If you are upgrading from Solr 3.x, you should familiarize yourself with the Major Changes from Solr 3 to Solr 4.) * cut a new RC2 Cool? -Hoss - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5127) FixedGapTermsIndex should use monotonic compression
[ https://issues.apache.org/jira/browse/LUCENE-5127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13719793#comment-13719793 ] ASF subversion and git services commented on LUCENE-5127: - Commit 1507067 from [~rcmuir] in branch 'dev/branches/lucene5127' [ https://svn.apache.org/r1507067 ] LUCENE-5127: nuke mergeReader FixedGapTermsIndex should use monotonic compression --- Key: LUCENE-5127 URL: https://issues.apache.org/jira/browse/LUCENE-5127 Project: Lucene - Core Issue Type: Improvement Reporter: Robert Muir Attachments: LUCENE-5127.patch, LUCENE-5127.patch for the addresses in the big in-memory byte[] and disk blocks, we could save a good deal of RAM here. I think this codec just never got upgraded when we added these new packed improvements, but it might be interesting to try to use for the terms data of sorted/sortedset DV implementations. patch works, but has nocommits and currently ignores the divisor. The annoying problem there being that we have the shared interface with get(int) for PackedInts.Mutable/Reader, but no equivalent base class for monotonics get(long)... Still its enough that we could benchmark/compare for now. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5127) FixedGapTermsIndex should use monotonic compression
[ https://issues.apache.org/jira/browse/LUCENE-5127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13719798#comment-13719798 ] ASF subversion and git services commented on LUCENE-5127: - Commit 1507070 from [~rcmuir] in branch 'dev/branches/lucene5127' [ https://svn.apache.org/r1507070 ] LUCENE-5127: simplify seek-within-block FixedGapTermsIndex should use monotonic compression --- Key: LUCENE-5127 URL: https://issues.apache.org/jira/browse/LUCENE-5127 Project: Lucene - Core Issue Type: Improvement Reporter: Robert Muir Attachments: LUCENE-5127.patch, LUCENE-5127.patch for the addresses in the big in-memory byte[] and disk blocks, we could save a good deal of RAM here. I think this codec just never got upgraded when we added these new packed improvements, but it might be interesting to try to use for the terms data of sorted/sortedset DV implementations. patch works, but has nocommits and currently ignores the divisor. The annoying problem there being that we have the shared interface with get(int) for PackedInts.Mutable/Reader, but no equivalent base class for monotonics get(long)... Still its enough that we could benchmark/compare for now. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5127) FixedGapTermsIndex should use monotonic compression
[ https://issues.apache.org/jira/browse/LUCENE-5127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13719826#comment-13719826 ] ASF subversion and git services commented on LUCENE-5127: - Commit 1507075 from [~rcmuir] in branch 'dev/branches/lucene5127' [ https://svn.apache.org/r1507075 ] LUCENE-5127: explicit var gap testing part 1 FixedGapTermsIndex should use monotonic compression --- Key: LUCENE-5127 URL: https://issues.apache.org/jira/browse/LUCENE-5127 Project: Lucene - Core Issue Type: Improvement Reporter: Robert Muir Attachments: LUCENE-5127.patch, LUCENE-5127.patch for the addresses in the big in-memory byte[] and disk blocks, we could save a good deal of RAM here. I think this codec just never got upgraded when we added these new packed improvements, but it might be interesting to try to use for the terms data of sorted/sortedset DV implementations. patch works, but has nocommits and currently ignores the divisor. The annoying problem there being that we have the shared interface with get(int) for PackedInts.Mutable/Reader, but no equivalent base class for monotonics get(long)... Still its enough that we could benchmark/compare for now. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5127) FixedGapTermsIndex should use monotonic compression
[ https://issues.apache.org/jira/browse/LUCENE-5127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13719836#comment-13719836 ] ASF subversion and git services commented on LUCENE-5127: - Commit 1507078 from [~rcmuir] in branch 'dev/branches/lucene5127' [ https://svn.apache.org/r1507078 ] LUCENE-5127: explicit var gap testing part 2 FixedGapTermsIndex should use monotonic compression --- Key: LUCENE-5127 URL: https://issues.apache.org/jira/browse/LUCENE-5127 Project: Lucene - Core Issue Type: Improvement Reporter: Robert Muir Attachments: LUCENE-5127.patch, LUCENE-5127.patch for the addresses in the big in-memory byte[] and disk blocks, we could save a good deal of RAM here. I think this codec just never got upgraded when we added these new packed improvements, but it might be interesting to try to use for the terms data of sorted/sortedset DV implementations. patch works, but has nocommits and currently ignores the divisor. The annoying problem there being that we have the shared interface with get(int) for PackedInts.Mutable/Reader, but no equivalent base class for monotonics get(long)... Still its enough that we could benchmark/compare for now. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[JENKINS] Lucene-Solr-NightlyTests-4.x - Build # 325 - Still Failing
Build: https://builds.apache.org/job/Lucene-Solr-NightlyTests-4.x/325/ No tests ran. Build Log: [...truncated 6600 lines...] FATAL: hudson.remoting.RequestAbortedException: java.io.IOException: Unexpected termination of the channel hudson.remoting.RequestAbortedException: hudson.remoting.RequestAbortedException: java.io.IOException: Unexpected termination of the channel at hudson.remoting.RequestAbortedException.wrapForRethrow(RequestAbortedException.java:41) at hudson.remoting.RequestAbortedException.wrapForRethrow(RequestAbortedException.java:34) at hudson.remoting.Request.call(Request.java:174) at hudson.remoting.Channel.call(Channel.java:713) at hudson.remoting.RemoteInvocationHandler.invoke(RemoteInvocationHandler.java:167) at com.sun.proxy.$Proxy40.join(Unknown Source) at hudson.Launcher$RemoteLauncher$ProcImpl.join(Launcher.java:925) at hudson.Launcher$ProcStarter.join(Launcher.java:360) at hudson.tasks.Ant.perform(Ant.java:217) at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:19) at hudson.model.AbstractBuild$AbstractBuildExecution.perform(AbstractBuild.java:804) at hudson.model.Build$BuildExecution.build(Build.java:199) at hudson.model.Build$BuildExecution.doRun(Build.java:160) at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:586) at hudson.model.Run.execute(Run.java:1593) at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:46) at hudson.model.ResourceController.execute(ResourceController.java:88) at hudson.model.Executor.run(Executor.java:247) Caused by: hudson.remoting.RequestAbortedException: java.io.IOException: Unexpected termination of the channel at hudson.remoting.Request.abort(Request.java:299) at hudson.remoting.Channel.terminate(Channel.java:773) at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:69) Caused by: java.io.IOException: Unexpected termination of the channel at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:50) Caused by: java.io.EOFException at java.io.ObjectInputStream$BlockDataInputStream.peekByte(ObjectInputStream.java:2596) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1316) at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370) at hudson.remoting.Command.readFrom(Command.java:92) at hudson.remoting.ClassicCommandTransport.read(ClassicCommandTransport.java:72) at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:48) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5127) FixedGapTermsIndex should use monotonic compression
[ https://issues.apache.org/jira/browse/LUCENE-5127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13719873#comment-13719873 ] ASF subversion and git services commented on LUCENE-5127: - Commit 1507083 from [~rcmuir] in branch 'dev/branches/lucene5127' [ https://svn.apache.org/r1507083 ] LUCENE-5127: simplify vargap FixedGapTermsIndex should use monotonic compression --- Key: LUCENE-5127 URL: https://issues.apache.org/jira/browse/LUCENE-5127 Project: Lucene - Core Issue Type: Improvement Reporter: Robert Muir Attachments: LUCENE-5127.patch, LUCENE-5127.patch for the addresses in the big in-memory byte[] and disk blocks, we could save a good deal of RAM here. I think this codec just never got upgraded when we added these new packed improvements, but it might be interesting to try to use for the terms data of sorted/sortedset DV implementations. patch works, but has nocommits and currently ignores the divisor. The annoying problem there being that we have the shared interface with get(int) for PackedInts.Mutable/Reader, but no equivalent base class for monotonics get(long)... Still its enough that we could benchmark/compare for now. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
VOTE: RC2 Release apache-solr-ref-guide-4.4.pdf
Please VOTE to release the following PDF as apache-solr-ref-guide-4.4.pdf https://people.apache.org/~hossman/apache-solr-ref-guide-4.4_RC2.pdf Changes since RC1... * Additional info from dsmiley on several pages related to spatial * Improvements in organization of Upgrading instructions * minor corrections to the HDFS Admin UI pages -Hoss - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5133) AnalyzingInfixSuggester should return structured highlighted results instead of single String per result
[ https://issues.apache.org/jira/browse/LUCENE-5133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13719877#comment-13719877 ] Michael McCandless commented on LUCENE-5133: OK I'll try to cutover to Object instead. AnalyzingInfixSuggester should return structured highlighted results instead of single String per result Key: LUCENE-5133 URL: https://issues.apache.org/jira/browse/LUCENE-5133 Project: Lucene - Core Issue Type: Improvement Reporter: Michael McCandless Fix For: 5.0, 4.5 Attachments: LUCENE-5133.patch, LUCENE-5133.patch Today it renders to an HTML string (<b>..</b> for hits) in protected methods that one can override to change the highlighting, but this is hard/inefficient to use for search servers that want to e.g. return JSON representation of the highlighted result. This is the same issue as LUCENE-4906 (PostingsHighlighter) but for AnalyzingInfixSuggester's highlights instead. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: VOTE: RC2 Release apache-solr-ref-guide-4.4.pdf
+1 On Jul 25, 2013, at 2:24 PM, Chris Hostetter hossman_luc...@fucit.org wrote: Please VOTE to release the following PDF as apache-solr-ref-guide-4.4.pdf https://people.apache.org/~hossman/apache-solr-ref-guide-4.4_RC2.pdf Changes since RC1... * Additional info from dsmiley on several pages related to spatial * Improvements in organization of Upgrading instructions * minor corrections to the HDFS Admin UI pages -Hoss - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-2894) Implement distributed pivot faceting
[ https://issues.apache.org/jira/browse/SOLR-2894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Muldowney updated SOLR-2894: --- Attachment: SOLR-2894.patch Fixed an issue where commas in string fields would cause infinite refinement loops. Implement distributed pivot faceting Key: SOLR-2894 URL: https://issues.apache.org/jira/browse/SOLR-2894 Project: Solr Issue Type: Improvement Reporter: Erik Hatcher Fix For: 4.5 Attachments: SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894-reworked.patch Following up on SOLR-792, pivot faceting currently only supports undistributed mode. Distributed pivot faceting needs to be implemented. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
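The attachment comment doesn't include the patch details, but the bug class is familiar: if refinement requests serialize pivot values with a bare comma delimiter, a string value that itself contains a comma can never be matched on the refining shard, so refinement never converges. A hypothetical sketch of the usual remedy, escaping the delimiter before joining; the helper below is illustrative, not the actual SOLR-2894 patch:

import java.util.Arrays;
import java.util.List;

public class DelimiterEscapeSketch {
  // Hypothetical helper: escape the delimiter (and the escape char itself)
  // so "St. Louis, MO" survives a round trip through join/split.
  static String join(List<String> values) {
    StringBuilder sb = new StringBuilder();
    for (String v : values) {
      if (sb.length() > 0) sb.append(',');
      sb.append(v.replace("\\", "\\\\").replace(",", "\\,"));
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    System.out.println(join(Arrays.asList("St. Louis, MO", "Denver")));
    // prints: St. Louis\, MO,Denver
  }
}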
[jira] [Commented] (LUCENE-5127) FixedGapTermsIndex should use monotonic compression
[ https://issues.apache.org/jira/browse/LUCENE-5127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13719892#comment-13719892 ] ASF subversion and git services commented on LUCENE-5127: - Commit 1507086 from [~rcmuir] in branch 'dev/branches/lucene5127' [ https://svn.apache.org/r1507086 ] LUCENE-5127: simplify fixedgap FixedGapTermsIndex should use monotonic compression --- Key: LUCENE-5127 URL: https://issues.apache.org/jira/browse/LUCENE-5127 Project: Lucene - Core Issue Type: Improvement Reporter: Robert Muir Attachments: LUCENE-5127.patch, LUCENE-5127.patch for the addresses in the big in-memory byte[] and disk blocks, we could save a good deal of RAM here. I think this codec just never got upgraded when we added these new packed improvements, but it might be interesting to try to use for the terms data of sorted/sortedset DV implementations. patch works, but has nocommits and currently ignores the divisor. The annoying problem there being that we have the shared interface with get(int) for PackedInts.Mutable/Reader, but no equivalent base class for monotonics get(long)... Still its enough that we could benchmark/compare for now. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5127) FixedGapTermsIndex should use monotonic compression
[ https://issues.apache.org/jira/browse/LUCENE-5127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13719894#comment-13719894 ] ASF subversion and git services commented on LUCENE-5127: - Commit 1507087 from [~rcmuir] in branch 'dev/branches/lucene5127' [ https://svn.apache.org/r1507087 ] LUCENE-5127: fix indent FixedGapTermsIndex should use monotonic compression --- Key: LUCENE-5127 URL: https://issues.apache.org/jira/browse/LUCENE-5127 Project: Lucene - Core Issue Type: Improvement Reporter: Robert Muir Attachments: LUCENE-5127.patch, LUCENE-5127.patch for the addresses in the big in-memory byte[] and disk blocks, we could save a good deal of RAM here. I think this codec just never got upgraded when we added these new packed improvements, but it might be interesting to try to use for the terms data of sorted/sortedset DV implementations. patch works, but has nocommits and currently ignores the divisor. The annoying problem there being that we have the shared interface with get(int) for PackedInts.Mutable/Reader, but no equivalent base class for monotonics get(long)... Still its enough that we could benchmark/compare for now. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: VOTE: RC2 Release apache-solr-ref-guide-4.4.pdf
: Please VOTE to release the following PDF as apache-solr-ref-guide-4.4.pdf : : https://people.apache.org/~hossman/apache-solr-ref-guide-4.4_RC2.pdf +1 -Hoss - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[JENKINS-MAVEN] Lucene-Solr-Maven-trunk #919: POMs out of sync
Build: https://builds.apache.org/job/Lucene-Solr-Maven-trunk/919/ 2 tests failed. FAILED: org.apache.solr.cloud.BasicDistributedZkTest.org.apache.solr.cloud.BasicDistributedZkTest Error Message: 1 thread leaked from SUITE scope at org.apache.solr.cloud.BasicDistributedZkTest: 1) Thread[id=2181, name=recoveryCmdExecutor-1072-thread-1, state=RUNNABLE, group=TGRP-BasicDistributedZkTest] at java.net.PlainSocketImpl.socketConnect(Native Method) at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339) at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200) at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182) at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392) at java.net.Socket.connect(Socket.java:579) at org.apache.http.conn.scheme.PlainSocketFactory.connectSocket(PlainSocketFactory.java:127) at org.apache.http.impl.conn.DefaultClientConnectionOperator.openConnection(DefaultClientConnectionOperator.java:180) at org.apache.http.impl.conn.ManagedClientConnectionImpl.open(ManagedClientConnectionImpl.java:294) at org.apache.http.impl.client.DefaultRequestDirector.tryConnect(DefaultRequestDirector.java:645) at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:480) at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:906) at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:805) at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:784) at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:365) at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:180) at org.apache.solr.cloud.SyncStrategy$1.run(SyncStrategy.java:291) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:724) Stack Trace: com.carrotsearch.randomizedtesting.ThreadLeakError: 1 thread leaked from SUITE scope at org.apache.solr.cloud.BasicDistributedZkTest: 1) Thread[id=2181, name=recoveryCmdExecutor-1072-thread-1, state=RUNNABLE, group=TGRP-BasicDistributedZkTest] at java.net.PlainSocketImpl.socketConnect(Native Method) at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339) at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200) at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182) at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392) at java.net.Socket.connect(Socket.java:579) at org.apache.http.conn.scheme.PlainSocketFactory.connectSocket(PlainSocketFactory.java:127) at org.apache.http.impl.conn.DefaultClientConnectionOperator.openConnection(DefaultClientConnectionOperator.java:180) at org.apache.http.impl.conn.ManagedClientConnectionImpl.open(ManagedClientConnectionImpl.java:294) at org.apache.http.impl.client.DefaultRequestDirector.tryConnect(DefaultRequestDirector.java:645) at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:480) at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:906) at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:805) at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:784) at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:365) at 
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:180) at org.apache.solr.cloud.SyncStrategy$1.run(SyncStrategy.java:291) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:724) at __randomizedtesting.SeedInfo.seed([C4A07F58248377E0]:0) FAILED: org.apache.solr.cloud.BasicDistributedZkTest.org.apache.solr.cloud.BasicDistributedZkTest Error Message: There are still zombie threads that couldn't be terminated: 1) Thread[id=2181, name=recoveryCmdExecutor-1072-thread-1, state=RUNNABLE, group=TGRP-BasicDistributedZkTest] at java.net.PlainSocketImpl.socketConnect(Native Method) at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339) at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200) at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182) at
[jira] [Commented] (SOLR-5076) Make it possible to get list of collections with CollectionsHandler
[ https://issues.apache.org/jira/browse/SOLR-5076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13719919#comment-13719919 ] Shawn Heisey commented on SOLR-5076: Slightly off-topic: The initial motivation for this issue is getting the collection list from CloudSolrServer, but when I went looking for ways to get that information in a machine-readable way from Solr without getting into zookeeper objects, I couldn't find one. Within CloudSolrServer, it might make sense to use the ZK objects rather than /admin/collections, but I don't think we should force a user to do so. Make it possible to get list of collections with CollectionsHandler --- Key: SOLR-5076 URL: https://issues.apache.org/jira/browse/SOLR-5076 Project: Solr Issue Type: Improvement Reporter: Shawn Heisey Priority: Minor It would be very useful to have /admin/collections (CollectionsHandler) send a response similar to /admin/cores. This should probably be the default action, but requiring ?action=STATUS wouldn't be the end of the world. It would be very useful if CloudSolrServer were to implement a getCollections method, but that probably should be a separate issue. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
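For readers unfamiliar with the ZooKeeper-object route mentioned above, this is roughly what a client must do today; a sketch assuming the 4.x SolrJ CloudSolrServer/ClusterState API, which is exactly the dependency this issue would make optional:

import java.util.Set;
import org.apache.solr.client.solrj.impl.CloudSolrServer;

public class ListCollectionsSketch {
  public static void main(String[] args) throws Exception {
    CloudSolrServer server = new CloudSolrServer("zkhost1:2181,zkhost2:2181");
    server.connect(); // forces the initial cluster-state read from ZooKeeper
    // reach into cluster state to enumerate collections
    Set<String> collections =
        server.getZkStateReader().getClusterState().getCollections();
    System.out.println(collections);
    server.shutdown();
  }
}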
[jira] [Commented] (LUCENE-5127) FixedGapTermsIndex should use monotonic compression
[ https://issues.apache.org/jira/browse/LUCENE-5127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13719932#comment-13719932 ] ASF subversion and git services commented on LUCENE-5127: - Commit 1507097 from [~mikemccand] in branch 'dev/branches/lucene5127' [ https://svn.apache.org/r1507097 ] LUCENE-5127: add tests FixedGapTermsIndex should use monotonic compression --- Key: LUCENE-5127 URL: https://issues.apache.org/jira/browse/LUCENE-5127 Project: Lucene - Core Issue Type: Improvement Reporter: Robert Muir Attachments: LUCENE-5127.patch, LUCENE-5127.patch for the addresses in the big in-memory byte[] and disk blocks, we could save a good deal of RAM here. I think this codec just never got upgraded when we added these new packed improvements, but it might be interesting to try to use for the terms data of sorted/sortedset DV implementations. patch works, but has nocommits and currently ignores the divisor. The annoying problem there being that we have the shared interface with get(int) for PackedInts.Mutable/Reader, but no equivalent base class for monotonics get(long)... Still its enough that we could benchmark/compare for now. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: [CONF] Apache Solr Reference Guide Internal - How To Publish This Documentation
Hi, One question: should we also add signatures and checksums on the PDF artifact? In my opinion we should create those so we can verify that we all vote on the same PDF file created by the RM. The GPG signature would ensure this. Uwe Hoss Man (Confluence) conflue...@apache.org wrote: Space: Apache Solr Reference Guide (https://cwiki.apache.org/confluence/display/solr) Page: Internal - How To Publish This Documentation (https://cwiki.apache.org/confluence/display/solr/Internal+-+How+To+Publish+This+Documentation) Change Comment: - tweak pre/post publish actions to match how the Upgrade page currently exists Edited by Hoss Man: - {toc} h1. Pre-publication Actions * Make sure that the [Upgrading Solr] page is up to date for the current version. * Sanity check that none of the [post-publishing version number updating steps|#Update Links Version Numbers] from the last version published were skipped. h1. How To Export the PDF from Confluence * Load [The PDF Space Export Page|https://cwiki.apache.org/confluence/spaces/flyingpdf/flyingpdf.action?key=solr] in your browser * Uncheck the box next to [** Internal MetaDocs] to suppress it and its children from being included in the PDF * Click the Export button * On the subsequent page, wait for a Download here link to dynamically appear. * Click Download here and save the PDF to your local machine * Use scp to copy the PDF into your public_html directory on people.apache.org, named appropriately as a release candidate. For example... \\ {noformat}scp solr-220713-2054-17096.pdf people.apache.org:public_html/apache-solr-ref-guide-4.4_RC1.pdf{noformat} {note}The Export URLs returned by the Download here link won't work from curl on people.apache.org, so you have to make a local copy first.{note} h1. Hold a VOTE * Send an email to dev@lucene (CC general@lucene) with a Subject "VOTE: RC1 Release apache-solr-ref-guide-X.Y.pdf" and include the full URL from {{http://people.apache.org/~yourname/apache-solr-ref-guide-X.Y_RC1.pdf}}. * If there are problems with the RC that are fixed in Confluence, Export a new copy (using the instructions above) with a new name (RC2, RC3, etc...) and send out another VOTE thread. h1. Publish to SvnPubSub Mirrors Once [three PMC members have voted for a release, it may be published|http://www.apache.org/foundation/voting.html#ReleaseVotes]... * Check-out the {{lucene/solr/ref-guide}} directory from the dist repo (or svn update if you already have a checkout) ... \\ {noformat} svn co https://dist.apache.org/repos/dist/release/lucene/solr/ref-guide solr-ref-guide-dist # OR svn update solr-ref-guide-dist {noformat} * Copy the RC ref guide into this directory using its final name and commit... \\ {noformat} cp apache-solr-ref-guide-4.4_RC1.pdf solr-ref-guide-dist/apache-solr-ref-guide-4.4.pdf svn commit -m "4.4 ref guide" solr-ref-guide-dist {noformat} * Wait 24 hours to give the mirrors a chance to get the new release. The status of the mirrors can be monitored using {{dev-tools/scripts/poll-mirrors.pl}}... \\ {noformat} perl dev-tools/scripts/poll-mirrors.pl -details -p lucene/solr/ref-guide/apache-solr-ref-guide-X.Y.pdf {noformat} h1. Post Publish Actions Once most mirrors have been updated, we can link to (and announce) the new guide. h2. Update Links Version Numbers When linking to the current version of the ref guide, always use the download redirector. Example: {{https://www.apache.org/dyn/closer.cgi/lucene/solr/ref-guide/apache-solr-ref-guide.X.Y.pdf}} When linking to old versions of the ref guide, always use archive.apache.org.
Example: {{https://archive.apache.org/dist/lucene/solr/ref-guide/apache-solr-ref-guide.X.Y.pdf}} h3. Website (lucene.apache.org) * Update links on [the Solr documentation page|https://lucene.apache.org/solr/documentation.html] to point to the current version of the ref guide. * (!) :TODO: Other places to link from? (!) h3. Confluence * On the [Confluence Theme Configuration Page|https://cwiki.apache.org/confluence/spaces/doctheme/configuretheme.action?key=solr] for the Solr Ref Guide... ** Update the Left Nav to add a link to the current version of the ref guide. ** Update the Left Nav to change the link for the previous version(s) of the ref guide so that they use the archive URL. ** Update the Left Nav and Header Message to refer to the next version that the live copy of the documentation will refer to (ie: if the 4.4 ref guide has just been published, change _*4.4* Draft Ref Guide Topics_ to _*4.5* Draft Ref Guide Topics_ and _This Unreleased Guide Will Cover Apache Solr *4.4*_ to _This Unreleased Guide Will Cover Apache Solr *4.5*_) * On the [Confluence PDF Layout Page|https://cwiki.apache.org/confluence/spaces/flyingpdf/viewpdflayoutconfig.action?key=solr] for the Solr Ref Guide... ** Update the Title Page to refer to the next version (ie: 4.4 \-
[jira] [Updated] (LUCENE-5133) AnalyzingInfixSuggester should return structured highlighted results instead of single String per result
[ https://issues.apache.org/jira/browse/LUCENE-5133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-5133: --- Attachment: LUCENE-5133.patch New patch, cutover to Object. It's more work for the [very expert] user since they need to re-implement the entire highlight method ... but I think that's acceptable. AnalyzingInfixSuggester should return structured highlighted results instead of single String per result Key: LUCENE-5133 URL: https://issues.apache.org/jira/browse/LUCENE-5133 Project: Lucene - Core Issue Type: Improvement Reporter: Michael McCandless Fix For: 5.0, 4.5 Attachments: LUCENE-5133.patch, LUCENE-5133.patch, LUCENE-5133.patch Today it renders to an HTML string (<b>..</b> for hits) in protected methods that one can override to change the highlighting, but this is hard/inefficient to use for search servers that want to e.g. return JSON representation of the highlighted result. This is the same issue as LUCENE-4906 (PostingsHighlighter) but for AnalyzingInfixSuggester's highlights instead. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4876) IndexWriterConfig.clone should clone the MergeScheduler
[ https://issues.apache.org/jira/browse/LUCENE-4876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13719987#comment-13719987 ] Shai Erera commented on LUCENE-4876: Perhaps we can do a minor change -- stop having IW call IWC.clone() on init. We keep clone() on IWC, and the rest of the objects, and tell users that it's their responsibility to call IWC.clone() before passing to IW? That's like a 1-liner change (well, plus clarifying the jdocs) that will make 99% of the users happy. The rest should just do {{new IW(dir, conf.clone())}} ... that's simple enough? IndexWriterConfig.clone should clone the MergeScheduler --- Key: LUCENE-4876 URL: https://issues.apache.org/jira/browse/LUCENE-4876 Project: Lucene - Core Issue Type: Bug Reporter: Adrien Grand Assignee: Adrien Grand Fix For: 4.3 Attachments: LUCENE-4876.patch, LUCENE-4876.patch ConcurrentMergeScheduler has a List<MergeThread> member to track the running merging threads, so IndexWriterConfig.clone should clone the merge scheduler so that both IndexWriterConfig instances are independent. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
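Under that proposal the caller-side pattern would look like the following; a sketch against the 4.x API, showing each writer getting its own independent copy of the config:

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;

public class CloneConfigSketch {
  public static void main(String[] args) throws Exception {
    // one config, reused for two writers: the caller clones it per writer
    // so the instances (and their MergeSchedulers) stay independent
    IndexWriterConfig conf =
        new IndexWriterConfig(Version.LUCENE_44, new StandardAnalyzer(Version.LUCENE_44));
    Directory dir1 = new RAMDirectory();
    Directory dir2 = new RAMDirectory();
    IndexWriter w1 = new IndexWriter(dir1, conf.clone());
    IndexWriter w2 = new IndexWriter(dir2, conf.clone()); // independent copy
    w1.close();
    w2.close();
  }
}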
[jira] [Commented] (LUCENE-4876) IndexWriterConfig.clone should clone the MergeScheduler
[ https://issues.apache.org/jira/browse/LUCENE-4876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13719988#comment-13719988 ] Michael McCandless commented on LUCENE-4876: +1 IndexWriterConfig.clone should clone the MergeScheduler --- Key: LUCENE-4876 URL: https://issues.apache.org/jira/browse/LUCENE-4876 Project: Lucene - Core Issue Type: Bug Reporter: Adrien Grand Assignee: Adrien Grand Fix For: 4.3 Attachments: LUCENE-4876.patch, LUCENE-4876.patch ConcurrentMergeScheduler has a List<MergeThread> member to track the running merging threads, so IndexWriterConfig.clone should clone the merge scheduler so that both IndexWriterConfig instances are independent. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5127) FixedGapTermsIndex should use monotonic compression
[ https://issues.apache.org/jira/browse/LUCENE-5127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13720005#comment-13720005 ] ASF subversion and git services commented on LUCENE-5127: - Commit 1507111 from [~rcmuir] in branch 'dev/branches/lucene5127' [ https://svn.apache.org/r1507111 ] LUCENE-5127: clear nocommits FixedGapTermsIndex should use monotonic compression --- Key: LUCENE-5127 URL: https://issues.apache.org/jira/browse/LUCENE-5127 Project: Lucene - Core Issue Type: Improvement Reporter: Robert Muir Attachments: LUCENE-5127.patch, LUCENE-5127.patch for the addresses in the big in-memory byte[] and disk blocks, we could save a good deal of RAM here. I think this codec just never got upgraded when we added these new packed improvements, but it might be interesting to try to use for the terms data of sorted/sortedset DV implementations. patch works, but has nocommits and currently ignores the divisor. The annoying problem there being that we have the shared interface with get(int) for PackedInts.Mutable/Reader, but no equivalent base class for monotonics get(long)... Still its enough that we could benchmark/compare for now. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5127) FixedGapTermsIndex should use monotonic compression
[ https://issues.apache.org/jira/browse/LUCENE-5127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13720024#comment-13720024 ] ASF subversion and git services commented on LUCENE-5127: - Commit 1507116 from [~rcmuir] in branch 'dev/branches/lucene5127' [ https://svn.apache.org/r1507116 ] LUCENE-5127: fix TestLucene40PF and clean up some more outdated stuff FixedGapTermsIndex should use monotonic compression --- Key: LUCENE-5127 URL: https://issues.apache.org/jira/browse/LUCENE-5127 Project: Lucene - Core Issue Type: Improvement Reporter: Robert Muir Attachments: LUCENE-5127.patch, LUCENE-5127.patch for the addresses in the big in-memory byte[] and disk blocks, we could save a good deal of RAM here. I think this codec just never got upgraded when we added these new packed improvements, but it might be interesting to try to use for the terms data of sorted/sortedset DV implementations. patch works, but has nocommits and currently ignores the divisor. The annoying problem there being that we have the shared interface with get(int) for PackedInts.Mutable/Reader, but no equivalent base class for monotonics get(long)... Still its enough that we could benchmark/compare for now. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5127) FixedGapTermsIndex should use monotonic compression
[ https://issues.apache.org/jira/browse/LUCENE-5127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13720027#comment-13720027 ] ASF subversion and git services commented on LUCENE-5127: - Commit 1507118 from [~mikemccand] in branch 'dev/branches/lucene5127' [ https://svn.apache.org/r1507118 ] LUCENE-5127: fix false fail when terms dict is a ghostbuster FixedGapTermsIndex should use monotonic compression --- Key: LUCENE-5127 URL: https://issues.apache.org/jira/browse/LUCENE-5127 Project: Lucene - Core Issue Type: Improvement Reporter: Robert Muir Attachments: LUCENE-5127.patch, LUCENE-5127.patch for the addresses in the big in-memory byte[] and disk blocks, we could save a good deal of RAM here. I think this codec just never got upgraded when we added these new packed improvements, but it might be interesting to try to use for the terms data of sorted/sortedset DV implementations. patch works, but has nocommits and currently ignores the divisor. The annoying problem there being that we have the shared interface with get(int) for PackedInts.Mutable/Reader, but no equivalent base class for monotonics get(long)... Still its enough that we could benchmark/compare for now. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5127) FixedGapTermsIndex should use monotonic compression
[ https://issues.apache.org/jira/browse/LUCENE-5127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13720044#comment-13720044 ] ASF subversion and git services commented on LUCENE-5127: - Commit 1507120 from [~rcmuir] in branch 'dev/branches/lucene5127' [ https://svn.apache.org/r1507120 ] LUCENE-5127: clean up error msgs FixedGapTermsIndex should use monotonic compression --- Key: LUCENE-5127 URL: https://issues.apache.org/jira/browse/LUCENE-5127 Project: Lucene - Core Issue Type: Improvement Reporter: Robert Muir Attachments: LUCENE-5127.patch, LUCENE-5127.patch for the addresses in the big in-memory byte[] and disk blocks, we could save a good deal of RAM here. I think this codec just never got upgraded when we added these new packed improvements, but it might be interesting to try to use for the terms data of sorted/sortedset DV implementations. patch works, but has nocommits and currently ignores the divisor. The annoying problem there being that we have the shared interface with get(int) for PackedInts.Mutable/Reader, but no equivalent base class for monotonics get(long)... Still its enough that we could benchmark/compare for now. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (LUCENE-5136) Improve FacetRequest javadocs
Shai Erera created LUCENE-5136: -- Summary: Improve FacetRequest javadocs Key: LUCENE-5136 URL: https://issues.apache.org/jira/browse/LUCENE-5136 Project: Lucene - Core Issue Type: Improvement Components: modules/facet Reporter: Shai Erera Assignee: Shai Erera Fix For: 5.0, 4.5 While working on LUCENE-4985, I noticed that FacetRequest's jdocs are severely outdated. I rewrote them entirely, so I prefer to commit them separately from the rest of the changes. Will post a patch shortly. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-5136) Improve FacetRequest javadocs
[ https://issues.apache.org/jira/browse/LUCENE-5136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shai Erera updated LUCENE-5136: --- Attachment: LUCENE-5136.patch if others have suggestions for better wording, you are more than welcome to let me know. Otherwise, I will commit this tomorrow. Improve FacetRequest javadocs - Key: LUCENE-5136 URL: https://issues.apache.org/jira/browse/LUCENE-5136 Project: Lucene - Core Issue Type: Improvement Components: modules/facet Reporter: Shai Erera Assignee: Shai Erera Fix For: 5.0, 4.5 Attachments: LUCENE-5136.patch While working on LUCENE-4985, I noticed that FacetRequest's jdocs are severely outdated. I rewrote them entirely, so prefer to commit them separately than the rest of the changes. Will post a patch shortly. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: VOTE: RC2 Release apache-solr-ref-guide-4.4.pdf
+1 Nice job everyone! -Yonik http://lucidworks.com On Thu, Jul 25, 2013 at 2:24 PM, Chris Hostetter hossman_luc...@fucit.org wrote: Please VOTE to release the following PDF as apache-solr-ref-guide-4.4.pdf https://people.apache.org/~hossman/apache-solr-ref-guide-4.4_RC2.pdf Changes since RC1... * Additional info from dsmiley on several pages related to spatial * Improvements in organization of Upgrading instructions * minor corrections to the HDFS Admin UI pages -Hoss - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5136) Improve FacetRequest javadocs
[ https://issues.apache.org/jira/browse/LUCENE-5136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13720118#comment-13720118 ] Michael McCandless commented on LUCENE-5136: +1 Minor things: * sepcify - specify * Such requests will also usually won't use - Such requests won't use (?) Improve FacetRequest javadocs - Key: LUCENE-5136 URL: https://issues.apache.org/jira/browse/LUCENE-5136 Project: Lucene - Core Issue Type: Improvement Components: modules/facet Reporter: Shai Erera Assignee: Shai Erera Fix For: 5.0, 4.5 Attachments: LUCENE-5136.patch While working on LUCENE-4985, I noticed that FacetRequest's jdocs are severely outdated. I rewrote them entirely, so prefer to commit them separately than the rest of the changes. Will post a patch shortly. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: VOTE: RC1 Release apache-solr-ref-guide-4.4.pdf
On 7/25/13 4:29 AM, Steve Rowe sar...@gmail.com wrote: I also noticed that David Smiley made a bunch of modifications, AFAICT to spatial and related topics, and it would be good to include those. Yes, the spatial page needed an overhaul; I think it's much better now. It was a refactor to better express the existing information; it doesn't really convey anything new. I'll do a lot more to it in a future release. What do you mean by "it would be good to include those"? Include my changes where? It's in the PDF I saw that was just published, which I presumed it would be by virtue of me making my edits within the timeframe Hoss outlined. ~ David - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4876) IndexWriterConfig.clone should clone the MergeScheduler
[ https://issues.apache.org/jira/browse/LUCENE-4876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13720129#comment-13720129 ] Yonik Seeley commented on LUCENE-4876: -- bq. We keep clone() on IWC, and the rest of the objects, and tell users that it's their responsibility to call IWC.clone() +1 IndexWriterConfig.clone should clone the MergeScheduler --- Key: LUCENE-4876 URL: https://issues.apache.org/jira/browse/LUCENE-4876 Project: Lucene - Core Issue Type: Bug Reporter: Adrien Grand Assignee: Adrien Grand Fix For: 4.3 Attachments: LUCENE-4876.patch, LUCENE-4876.patch ConcurrentMergeScheduler has a List<MergeThread> member to track the running merging threads, so IndexWriterConfig.clone should clone the merge scheduler so that both IndexWriterConfig instances are independent. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (LUCENE-5137) UAX29URLEmailTokenizer.java causes NullPointerException in 4.3 and 4.4
Allan Rofer created LUCENE-5137: --- Summary: UAX29URLEmailTokenizer.java causes NullPointerException in 4.3 and 4.4 Key: LUCENE-5137 URL: https://issues.apache.org/jira/browse/LUCENE-5137 Project: Lucene - Core Issue Type: Bug Components: core/search Affects Versions: 4.3 Environment: Windows 7 Reporter: Allan Rofer There is a comment (best effort NPE if you dont call reset) in the getScannerFor method in UAX29URLEmailTokenizer. The callers of getScannerFor do NOT call reset, so an NPE is thrown in the parser which has a null Reader. If you put the line this.scanner.yyreset(input); after each call to getScannerFor, the NPE is avoided. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: VOTE: RC1 Release apache-solr-ref-guide-4.4.pdf
On Jul 25, 2013 6:00 PM, Smiley, David W. dsmi...@mitre.org wrote: On 7/25/13 4:29 AM, Steve Rowe sar...@gmail.com wrote: I also noticed that David Smiley made a bunch of modifications, AFAICT to spatial and related topics, and it would be good to include those. [...] What do you mean by it would be good to include those? Include my changes where? It's in the PDF I saw that was just published which I presumed it would be by virtue of me making my edits within the timeframe Hoss outlined. David, you made your edits after Hoss called the RC1 vote - I was arguing for an RC2 based partly on your changes. Steve
Re: [CONF] Apache Solr Reference Guide Internal - How To Publish This Documentation
: One question: should we also add signatures and checksums on the pdf : artifact? In my opinion we should create those so we can verify that we : all vote on the same pdf file created by the RM. The GPG signature would : ensure this. Good question. I briefly considered this a while back when I first started drafting up the process (I think I even asked about it on IRC and got no response) but ultimately didn't include it because... 1) I didn't see any risk from potentially rogue mirrors trying to modify the docs (not like with source code) 2) from the precedent I could see from httpd-docs, they didn't bother with signing or providing checksums for their doc releases 3) I was trying to keep things simple. But you're right -- particularly for ensuring that we are all voting on the same thing, having sigs/checksums is a good idea -- and if we're going to generate them, we might as well also push them to the mirrors. I'll update the docs, but in the meantime I don't think we need to call a new VOTE on a new RC -- but I'll reply to the existing RC2 thread with specifics on the sig/checksum. -Hoss - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: VOTE: RC2 Release apache-solr-ref-guide-4.4.pdf
: : Please VOTE to release the following PDF as apache-solr-ref-guide-4.4.pdf : : : : https://people.apache.org/~hossman/apache-solr-ref-guide-4.4_RC2.pdf For completeness, the RC2 artifact I'm voting +1 to is... 2973817acf6ea5e4b607e5eac2bd49d7857b5406 apache-solr-ref-guide-4.4_RC2.pdf Checksum & PGP sig... https://people.apache.org/~hossman/apache-solr-ref-guide-4.4_RC2.pdf.sha1 https://people.apache.org/~hossman/apache-solr-ref-guide-4.4_RC2.pdf.asc NOTE: My PGP key is brand new (I never needed one before today), and not really in the web of trust yet, but it has been slurped in by the ASF key management system, so folks should be able to verify that sig... https://people.apache.org/keys/committer/hossman.asc https://people.apache.org/keys/group/lucene-pmc.asc -Hoss - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
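For voters who want to double-check the artifact, a self-contained sketch (pure JDK; the file path is passed as an argument; verifying the GPG signature itself still requires gpg and the keys linked above) that computes the SHA-1 of the downloaded PDF for comparison against the posted .sha1 value:

import java.io.FileInputStream;
import java.io.InputStream;
import java.security.MessageDigest;

public class Sha1Check {
  public static void main(String[] args) throws Exception {
    MessageDigest md = MessageDigest.getInstance("SHA-1");
    // stream the file through the digest so large PDFs need not fit in memory
    try (InputStream in = new FileInputStream(args[0])) {
      byte[] buf = new byte[8192];
      for (int n = in.read(buf); n != -1; n = in.read(buf)) {
        md.update(buf, 0, n);
      }
    }
    StringBuilder hex = new StringBuilder();
    for (byte b : md.digest()) {
      hex.append(String.format("%02x", b));
    }
    System.out.println(hex); // should print 2973817acf6ea5e4b607e5eac2bd49d7857b5406 for RC2
  }
}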
[jira] [Resolved] (LUCENE-5137) UAX29URLEmailTokenizer.java causes NullPointerException in 4.3 and 4.4
[ https://issues.apache.org/jira/browse/LUCENE-5137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir resolved LUCENE-5137. - Resolution: Not A Problem The best-effort NPE is thrown because the consumer (you) isn't calling reset. See the javadocs of TokenStream: you must call reset() before the incrementToken() loop. UAX29URLEmailTokenizer.java causes NullPointerException in 4.3 and 4.4 -- Key: LUCENE-5137 URL: https://issues.apache.org/jira/browse/LUCENE-5137 Project: Lucene - Core Issue Type: Bug Components: core/search Affects Versions: 4.3 Environment: Windows 7 Reporter: Allan Rofer Original Estimate: 1h Remaining Estimate: 1h There is a comment (best effort NPE if you dont call reset) in the getScannerFor method in UAX29URLEmailTokenizer. The callers of getScannerFor do NOT call reset, so an NPE is thrown in the parser which has a null Reader. If you put the line this.scanner.yyreset(input); after each call to getScannerFor, the NPE is avoided. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
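To make that contract concrete, a minimal consumer sketch (assuming the Lucene 4.4 analysis API; the input string is made up) showing the workflow that avoids the NPE from the original report: reset() before the incrementToken() loop, then end() and close():

import java.io.StringReader;

import org.apache.lucene.analysis.standard.UAX29URLEmailTokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.util.Version;

public class TokenizerContractSketch {
  public static void main(String[] args) throws Exception {
    UAX29URLEmailTokenizer ts = new UAX29URLEmailTokenizer(
        Version.LUCENE_44, new StringReader("mail user@example.com or see http://lucene.apache.org"));
    CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
    ts.reset();                   // required: initializes the scanner's Reader (skipping this causes the NPE)
    while (ts.incrementToken()) {
      System.out.println(term);   // prints each token; the email and URL come out as single tokens
    }
    ts.end();                     // records the final offset state
    ts.close();                   // releases the Reader
  }
}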
[jira] [Commented] (LUCENE-5127) FixedGapTermsIndex should use monotonic compression
[ https://issues.apache.org/jira/browse/LUCENE-5127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13720357#comment-13720357 ] ASF subversion and git services commented on LUCENE-5127: - Commit 1507179 from [~rcmuir] in branch 'dev/branches/lucene5127' [ https://svn.apache.org/r1507179 ] LUCENE-5127: use less ram when writing the terms index FixedGapTermsIndex should use monotonic compression --- Key: LUCENE-5127 URL: https://issues.apache.org/jira/browse/LUCENE-5127 Project: Lucene - Core Issue Type: Improvement Reporter: Robert Muir Attachments: LUCENE-5127.patch, LUCENE-5127.patch For the addresses in the big in-memory byte[] and disk blocks, we could save a good deal of RAM here. I think this codec just never got upgraded when we added these new packed improvements, but it might be interesting to try to use it for the terms data of sorted/sortedset DV implementations. The patch works, but has nocommits and currently ignores the divisor. The annoying problem there being that we have the shared interface with get(int) for PackedInts.Mutable/Reader, but no equivalent base class for monotonic get(long)... Still, it's enough that we could benchmark/compare for now. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
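As a rough illustration of the idea (assuming the MonotonicAppendingLongBuffer utility from Lucene 4.x's packed-ints package; the term lengths are made up), monotonically increasing addresses such as term-start offsets compress well because the encoding only stores small deviations from a linear model:

import org.apache.lucene.util.packed.MonotonicAppendingLongBuffer;

public class MonotonicAddressSketch {
  public static void main(String[] args) {
    MonotonicAppendingLongBuffer addresses = new MonotonicAppendingLongBuffer();
    long offset = 0;
    for (int termLength : new int[] {5, 7, 4, 9, 6}) { // hypothetical term lengths
      addresses.add(offset); // offsets only ever grow, which the monotonic encoding exploits
      offset += termLength;
    }
    System.out.println(addresses.get(2)); // random access by term ordinal: prints 12
  }
}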
[jira] [Updated] (LUCENE-5127) FixedGapTermsIndex should use monotonic compression
[ https://issues.apache.org/jira/browse/LUCENE-5127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-5127: Attachment: LUCENE-5127.patch Patch for trunk, I think it's ready. FixedGapTermsIndex should use monotonic compression --- Key: LUCENE-5127 URL: https://issues.apache.org/jira/browse/LUCENE-5127 Project: Lucene - Core Issue Type: Improvement Reporter: Robert Muir Attachments: LUCENE-5127.patch, LUCENE-5127.patch, LUCENE-5127.patch For the addresses in the big in-memory byte[] and disk blocks, we could save a good deal of RAM here. I think this codec just never got upgraded when we added these new packed improvements, but it might be interesting to try to use it for the terms data of sorted/sortedset DV implementations. The patch works, but has nocommits and currently ignores the divisor. The annoying problem there being that we have the shared interface with get(int) for PackedInts.Mutable/Reader, but no equivalent base class for monotonic get(long)... Still, it's enough that we could benchmark/compare for now. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: VOTE: RC2 Release apache-solr-ref-guide-4.4.pdf
+1 Thanks Hoss, Cassandra, and to everyone else who contributed! - Author: http://www.packtpub.com/apache-solr-3-enterprise-search-server/book - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: VOTE: RC1 Release apache-solr-ref-guide-4.4.pdf
Oh; I should have read more carefully. Thanks! ~ David sarowe wrote David, you made your edits after Hoss called the RC1 vote - I was arguing for an RC2 based partly on your changes. Steve - Author: http://www.packtpub.com/apache-solr-3-enterprise-search-server/book - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5057) queryResultCache should not related with the order of fq's list
[ https://issues.apache.org/jira/browse/SOLR-5057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13720397#comment-13720397 ] Hoss Man commented on SOLR-5057: Similar to Yonik's initial point: In my experience, the situations where folks are going to be most concerned about having good cache usage are the situations where queries are generated programmatically and the order of the filter queries is already deterministic (or can be made deterministic easily enough by the client). My straw-man suggestion would be to not modify QueryResultKey at all, and instead write a new (optional) SearchComponent that did nothing but sort the getFilters() list in its prepare() method. Users who can't ensure that requests with equivalent fq params come in the same order can register it to run just after the query component and get good cache hit ratios, but it wouldn't affect performance in any way for users who send queries with fqs in a deterministic manner. queryResultCache should not related with the order of fq's list --- Key: SOLR-5057 URL: https://issues.apache.org/jira/browse/SOLR-5057 Project: Solr Issue Type: Improvement Components: search Affects Versions: 4.0, 4.1, 4.2, 4.3 Reporter: Feihong Huang Assignee: Erick Erickson Priority: Minor Attachments: SOLR-5057.patch, SOLR-5057.patch Original Estimate: 48h Remaining Estimate: 48h There are two queries with the same meaning below, but case2 can't use the queryResultCache after case1 is executed. case1: q=*:*&fq=field1:value1&fq=field2:value2 case2: q=*:*&fq=field2:value2&fq=field1:value1 I think queryResultCache should not be related with the order of fq's list. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
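A rough sketch of that straw man (the class name is hypothetical; it assumes ResponseBuilder.getFilters() returns the parsed fq queries, as in Solr 4.x): a component whose prepare() sorts the filters into a deterministic order, so equivalent requests build identical cache keys. It would be registered in solrconfig.xml and listed right after the query component for the handlers that need it.

import java.io.IOException;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

import org.apache.lucene.search.Query;
import org.apache.solr.handler.component.ResponseBuilder;
import org.apache.solr.handler.component.SearchComponent;

public class SortFiltersComponent extends SearchComponent {
  @Override
  public void prepare(ResponseBuilder rb) throws IOException {
    List<Query> filters = rb.getFilters();
    if (filters != null) {
      // any stable, deterministic order works; toString() is just a simple stand-in
      Collections.sort(filters, new Comparator<Query>() {
        @Override
        public int compare(Query a, Query b) {
          return a.toString().compareTo(b.toString());
        }
      });
    }
  }

  @Override
  public void process(ResponseBuilder rb) throws IOException {
    // no-op: all the work happens in prepare(), before the cache lookup
  }

  @Override
  public String getDescription() {
    return "sorts filter queries into a deterministic order for better queryResultCache hits";
  }

  @Override
  public String getSource() {
    return null; // sketch only
  }
}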
[jira] [Commented] (LUCENE-5136) Improve FacetRequest javadocs
[ https://issues.apache.org/jira/browse/LUCENE-5136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13720409#comment-13720409 ] ASF subversion and git services commented on LUCENE-5136: - Commit 1507194 from [~shaie] in branch 'dev/trunk' [ https://svn.apache.org/r1507194 ] LUCENE-5136: improve FacetRequest javadocs Improve FacetRequest javadocs - Key: LUCENE-5136 URL: https://issues.apache.org/jira/browse/LUCENE-5136 Project: Lucene - Core Issue Type: Improvement Components: modules/facet Reporter: Shai Erera Assignee: Shai Erera Fix For: 5.0, 4.5 Attachments: LUCENE-5136.patch While working on LUCENE-4985, I noticed that FacetRequest's jdocs are severely outdated. I rewrote them entirely, so I prefer to commit them separately from the rest of the changes. Will post a patch shortly. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5136) Improve FacetRequest javadocs
[ https://issues.apache.org/jira/browse/LUCENE-5136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13720412#comment-13720412 ] ASF subversion and git services commented on LUCENE-5136: - Commit 1507195 from [~shaie] in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1507195 ] LUCENE-5136: improve FacetRequest javadocs Improve FacetRequest javadocs - Key: LUCENE-5136 URL: https://issues.apache.org/jira/browse/LUCENE-5136 Project: Lucene - Core Issue Type: Improvement Components: modules/facet Reporter: Shai Erera Assignee: Shai Erera Fix For: 5.0, 4.5 Attachments: LUCENE-5136.patch While working on LUCENE-4985, I noticed that FacetRequest's jdocs are severely outdated. I rewrote them entirely, so I prefer to commit them separately from the rest of the changes. Will post a patch shortly. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (LUCENE-5136) Improve FacetRequest javadocs
[ https://issues.apache.org/jira/browse/LUCENE-5136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shai Erera resolved LUCENE-5136. Resolution: Fixed Thanks Mike. Committed to trunk and 4x. Improve FacetRequest javadocs - Key: LUCENE-5136 URL: https://issues.apache.org/jira/browse/LUCENE-5136 Project: Lucene - Core Issue Type: Improvement Components: modules/facet Reporter: Shai Erera Assignee: Shai Erera Fix For: 5.0, 4.5 Attachments: LUCENE-5136.patch While working on LUCENE-4985, I noticed that FacetRequest's jdocs are severely outdated. I rewrote them entirely, so I prefer to commit them separately from the rest of the changes. Will post a patch shortly. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org