[jira] Commented: (SOLR-72) specify max buffered docs memory for IndexWriter in solrconfig.xml

2006-11-22 Thread Yonik Seeley (JIRA)
[ 
http://issues.apache.org/jira/browse/SOLR-72?page=comments#action_12452076 ] 

Yonik Seeley commented on SOLR-72:
--

Perhaps add the memory usage of buffered documents to the statistics too.

> specify max buffered docs memory for IndexWriter in solrconfig.xml
> --
>
> Key: SOLR-72
> URL: http://issues.apache.org/jira/browse/SOLR-72
> Project: Solr
>  Issue Type: New Feature
>Reporter: Yonik Seeley
>Priority: Minor
>
> Take advantage of this: 
> https://issues.apache.org/jira/browse/LUCENE-709

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: 
http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] Commented: (SOLR-69) PATCH:MoreLikeThis support

2006-11-22 Thread Yonik Seeley (JIRA)
[ 
http://issues.apache.org/jira/browse/SOLR-69?page=comments#action_12452044 ] 

Yonik Seeley commented on SOLR-69:
--

I finally got around to checking this out... looks cool!
In your example URL, it looks like mindf=1 is repeated... is that right, or 
should one of them have been mintf=1?


> PATCH:MoreLikeThis support
> --
>
> Key: SOLR-69
> URL: http://issues.apache.org/jira/browse/SOLR-69
> Project: Solr
>  Issue Type: Improvement
>  Components: search
>Reporter: Bertrand Delacretaz
>Priority: Minor
> Attachments: lucene-queries-2.0.0.jar, SOLR-69.patch
>
>
> Here's a patch that implements simple support of Lucene's MoreLikeThis class.
> The MoreLikeThisHelper code is heavily based on (hmm..."lifted from" might be 
> more appropriate ;-) Erik Hatcher's example mentioned in 
> http://www.mail-archive.com/solr-user@lucene.apache.org/msg00878.html
> To use it, add at least the following parameters to a standard or dismax 
> query:
>   mlt=true
>   mlt.fl=list,of,fields,which,define,similarity
> See the MoreLikeThisHelper source code for more parameters.
> Here are two URLs that work with the example config, after loading all 
> documents found in exampledocs in the index (just to show that it seems to 
> work - of course you need a larger corpus to make it interesting):
> http://localhost:8983/solr/select/?stylesheet=&q=apache&qt=standard&mlt=true&mlt.fl=manu,cat&mlt.mindf=1&mlt.mindf=1&fl=id,score
> http://localhost:8983/solr/select/?stylesheet=&q=apache&qt=dismax&mlt=true&mlt.fl=manu,cat&mlt.mindf=1&mlt.mindf=1&fl=id,score
> Results are added to the output like this:
> 
>   ...
>   
> 
>   
> 1.5293242
> SOLR1000
>   
> 
> 
>   
> 1.5293242
> UTF8TEST
>   
> 
>   
> I haven't tested this extensively yet, will do in the next few days. But 
> comments are welcome of course.





Re: Cocoon-2.1.9 vs. SOLR-20 & SOLR-30

2006-11-22 Thread Yonik Seeley

On 11/22/06, Walter Underwood <[EMAIL PROTECTED]> wrote:
> > I took pains to make things streamable.. I'd hate to discard that.
> > How do other servers handle streaming back a response and hitting an error?
>
> Does Lucene access fetch information from disk while we iterate
> through the search results?

Yes.

Originally, all the documents were retrieved up-front, and the
response writer didn't even have access to the IndexReader.  After
seeing some users ask for some fields of *all* the documents in an
index on a different search product, I decided I'd better add
streamability to avoid OOM errors. A secondary consideration was
improving latency of the first document to the client when there are a
large number to be returned.

So Solr currently only records the ids (the internal integer lucene
docid) and optionally scores for documents to be returned.  During
response writing, the document for each id is read (which may involve
going to disk) right before it is written to the output stream.
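
A minimal Java sketch of that pattern, assuming nothing about Solr's real classes: StreamingSketch and the fetcher callback are hypothetical stand-ins for the response writer and the stored-document lookup. It only shows that each document is fetched right before it is written, so at most one document is held in memory at a time.

```java
import java.io.PrintWriter;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntFunction;

/** Sketch only: hypothetical names, not Solr or Lucene classes. */
class StreamingSketch {

    /**
     * Writes one line per document id. The fetcher is called for an id only
     * when that document is about to be written, never up-front.
     */
    static void writeDocs(int[] docIds, IntFunction<String> fetcher, PrintWriter out) {
        for (int id : docIds) {
            String doc = fetcher.apply(id); // stands in for a lookup that may go to disk
            out.print(doc);
            out.print('\n');
        }
        out.flush();
    }

    public static void main(String[] args) {
        List<Integer> fetchOrder = new ArrayList<>();
        StringWriter buffer = new StringWriter();
        writeDocs(new int[] {7, 3, 9}, id -> {
            fetchOrder.add(id); // record when each fetch happens
            return "doc-" + id;
        }, new PrintWriter(buffer));
        System.out.print(buffer);
        System.out.println("fetch order: " + fetchOrder);
    }
}
```

Only the int ids are kept between searching and writing; the memory cost of the documents themselves is paid one at a time.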

-Yonik


Re: Cocoon-2.1.9 vs. SOLR-20 & SOLR-30

2006-11-22 Thread Walter Underwood
On 11/20/06 5:51 PM, "Yonik Seeley" <[EMAIL PROTECTED]> wrote:
>> : If you really want to handle failure in an error response, write that
>> : to a string and if that fails, send a hard-coded string.
>> 
>> Hmmm... i could definitely get on board an idea like that.
> 
> I took pains to make things streamable.. I'd hate to discard that.
> How do other servers handle streaming back a response and hitting an error?

You found the design tradeoff! We can stream the results or we can
give reliable error codes for errors that happen during result processing.
We can't do both. Ultraseek does streaming, but we were generating
HTML, so we could print reasonable errors in-line.

Streaming is very useful for HTML pages, because it allows the first
pixels to be painted as soon as possible. It isn't as important on the
back end, unless someone has gone to the considerable trouble of making
their entire front-end able to stream the back-end results to HTML.

If we aren't calling Writer.flush occasionally, then the streaming is
just filling up a buffer smoothly. The client won't see anything until
TCP decides to send it.

Does Lucene access fetch information from disk while we iterate
through the search results? If that happens a few times, then
streaming might make a difference. If it is mostly CPU-bound,
then streaming probably doesn't help.
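
A rough Java sketch of the non-streaming side of that tradeoff, combined with the "write it to a string, else send a hard-coded string" idea quoted above. BufferedResponseSketch, Response and FALLBACK are hypothetical names for illustration, not Solr or servlet API classes.

```java
import java.util.function.Function;
import java.util.function.Supplier;

/** Sketch only: hypothetical names, not Solr or servlet classes. */
class BufferedResponseSketch {

    /** A status code plus a fully rendered body. */
    static final class Response {
        final int status;
        final String body;
        Response(int status, String body) {
            this.status = status;
            this.body = body;
        }
    }

    /** Hard-coded last resort, used when even the error body cannot be rendered. */
    static final String FALLBACK = "<error>internal error</error>";

    /**
     * Renders the whole body in memory before choosing a status code.
     * Nothing has been sent when rendering fails, so a 500 is still possible.
     */
    static Response render(Supplier<String> body,
                           Function<RuntimeException, String> errorBody) {
        try {
            return new Response(200, body.get());
        } catch (RuntimeException e) {
            try {
                return new Response(500, errorBody.apply(e));
            } catch (RuntimeException e2) {
                return new Response(500, FALLBACK);
            }
        }
    }

    public static void main(String[] args) {
        Response r = render(
                () -> { throw new IllegalStateException("disk read failed"); },
                e -> "<error>" + e.getMessage() + "</error>");
        System.out.println(r.status + " " + r.body);
    }
}
```

The cost is that the first byte cannot leave until the whole body exists, which is exactly what streaming avoids.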

wunder
-- 
Walter Underwood
Search Guru, Netflix




Re: SolrIndexSearcher HitCollector

2006-11-22 Thread Yonik Seeley

On 11/22/06, Peter Keegan <[EMAIL PROTECTED]> wrote:
> I see. So, does the trunk version always deliver docs in order, or is it bad
> to assume so?

Yes, the trunk version does, unless someone sets BooleanQuery.useScorer14 to true.

-Yonik


Re: SolrIndexSearcher HitCollector

2006-11-22 Thread Peter Keegan

I see. So, does the trunk version always deliver docs in order, or is it bad
to assume so?

Peter


On 11/22/06, Yonik Seeley <[EMAIL PROTECTED]> wrote:
> On 11/22/06, Peter Keegan <[EMAIL PROTECTED]> wrote:
> >   The following code is from the HitCollector of SolrIndexSearcher:
> >
> >   if (numHits[0]++ < lastDocRequested || score >= minScore) {
> >     // if docs are always delivered in order, we could use "score>minScore"
> >     // but might BooleanScorer14 still be used and deliver docs out-of-order?
> >     hq.insert(new ScoreDoc(doc, score));
> >     minScore = ((ScoreDoc)hq.top()).score;
> >   }
> >
> >   Could someone explain this conditional and whether or not it is valid when
> >   used with the trunk version of BooleanScorer2?
>
> Yes, this code is valid for both scorers.
>
> After being initially confused by my own comments, I just clarified them:
>   // TODO: if docs are always delivered in order, we could use "score>minScore"
>   // instead of "score>=minScore" and avoid tiebreaking scores
>   // in the priority queue.
>   // but might BooleanScorer14 still be used and deliver docs out-of-order?
>
> This is for the no-sort case (meaning sort-by-score).  To get a stable
> sort, a secondary sort is done on docid when the score matches.  If we
> knew that docs were always delivered in order, we could avoid putting
> docs with scores matching the current min score in the priority queue.
>
> That could be a decent optimization when there are many docs with the
> same score (think range query, terms with the same idf, etc)
>
> -Yonik



Re: SolrIndexSearcher HitCollector

2006-11-22 Thread Yonik Seeley

On 11/22/06, Peter Keegan <[EMAIL PROTECTED]> wrote:

>   The following code is from the HitCollector of SolrIndexSearcher:
>
>   if (numHits[0]++ < lastDocRequested || score >= minScore) {
>     // if docs are always delivered in order, we could use "score>minScore"
>     // but might BooleanScorer14 still be used and deliver docs out-of-order?
>     hq.insert(new ScoreDoc(doc, score));
>     minScore = ((ScoreDoc)hq.top()).score;
>   }
>
>   Could someone explain this conditional and whether or not it is valid when
>   used with the trunk version of BooleanScorer2?


Yes, this code is valid for both scorers.

After being initially confused by my own comments, I just clarified them:
   // TODO: if docs are always delivered in order, we could use "score>minScore"
   // instead of "score>=minScore" and avoid tiebreaking scores
   // in the priority queue.
   // but might BooleanScorer14 still be used and deliver docs out-of-order?

This is for the no-sort case (meaning sort-by-score).  To get a stable
sort, a secondary sort is done on docid when the score matches.  If we
knew that docs were always delivered in order, we could avoid putting
docs with scores matching the current min score in the priority queue.

That could be a decent optimization when there are many docs with the
same score (think range query, terms with the same idf, etc)
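
A self-contained Java sketch of that argument, for illustration only. It uses java.util.PriorityQueue instead of Lucene's own priority queue, and TopDocsSketch, Hit, Result and the assumeInOrder flag are hypothetical names, not Solr code. The queue keeps the worst hit on top (lowest score, and on ties the highest docid), so a later doc that merely ties the current minimum score is inserted and immediately evicted.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/** Sketch only: hypothetical names, not Solr or Lucene classes. */
class TopDocsSketch {

    static final class Hit {
        final int doc;
        final float score;
        Hit(int doc, float score) { this.doc = doc; this.score = score; }
    }

    static final class Result {
        final List<Integer> docs; // best first
        final int inserts;        // number of queue insertions performed
        Result(List<Integer> docs, int inserts) { this.docs = docs; this.inserts = inserts; }
    }

    /** Worst hit first: lowest score, and among equal scores the highest docid. */
    static final Comparator<Hit> WORST_FIRST = (a, b) -> {
        int c = Float.compare(a.score, b.score);
        return c != 0 ? c : Integer.compare(b.doc, a.doc);
    };

    /** Collects the top n hits (n must be at least 1). */
    static Result topN(int[] docs, float[] scores, int n, boolean assumeInOrder) {
        PriorityQueue<Hit> pq = new PriorityQueue<>(WORST_FIRST);
        float minScore = Float.NEGATIVE_INFINITY;
        int inserts = 0;
        for (int i = 0; i < docs.length; i++) {
            boolean full = pq.size() >= n;
            // With in-order delivery a tie on minScore can never win the docid
            // tiebreak, so ">" is enough; otherwise ">=" is needed.
            boolean competitive = assumeInOrder ? scores[i] > minScore : scores[i] >= minScore;
            if (!full || competitive) {
                pq.add(new Hit(docs[i], scores[i]));
                inserts++;
                if (pq.size() > n) {
                    pq.poll(); // evict the current worst hit
                }
                minScore = pq.peek().score;
            }
        }
        List<Integer> best = new ArrayList<>();
        while (!pq.isEmpty()) {
            best.add(0, pq.poll().doc); // the worst comes out first, so prepend
        }
        return new Result(best, inserts);
    }

    public static void main(String[] args) {
        int[] docs = {0, 1, 2, 3, 4};
        float[] scores = {1f, 2f, 1f, 2f, 1f};
        Result safe = topN(docs, scores, 2, false);
        Result fast = topN(docs, scores, 2, true);
        System.out.println(safe.docs + " inserts=" + safe.inserts);
        System.out.println(fast.docs + " inserts=" + fast.inserts);
    }
}
```

When docs arrive in increasing docid order, both variants return the same hits and the strict comparison skips the insertions for ties. With out-of-order delivery the strict comparison could wrongly drop a lower docid that ties the minimum, which is why the existing check keeps ">=".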

-Yonik


SolrIndexSearcher HitCollector

2006-11-22 Thread Peter Keegan

 The following code is from the HitCollector of SolrIndexSearcher:


 if (numHits[0]++ < lastDocRequested || score >= minScore) {
   // if docs are always delivered in order, we could use "score>minScore"
   // but might BooleanScorer14 still be used and deliver docs out-of-order?
   hq.insert(new ScoreDoc(doc, score));
   minScore = ((ScoreDoc)hq.top()).score;
 }

 Could someone explain this conditional and whether or not it is valid when
used with the trunk version of BooleanScorer2?

 Thanks,
 Peter


Re: XML vs. JSON, Python, Ruby

2006-11-22 Thread Erik Hatcher

Seconded. I'm happily using the Ruby format with a Rails application.

It is very nice that Solr has this flexible output capability.

Erik


On Nov 22, 2006, at 3:57 AM, Mike Klaas wrote:


> On 11/21/06, Fuad Efendi <[EMAIL PROTECTED]> wrote:
> > SOLR is a Web-Application with well-defined XML-based API:
> > - indexing service
> > - asynchronous; no need for 'real time' (content has well-defined TTL); can
> >   use HTTP Caching for increased performance
> > - provides native support for XSL
> >
> > The question: do we really need to maintain JSON/Ruby as a ServletOutput? We
> > can focus on 'Public XML API' only, and provide samples of XSL-to-JSON,
> > XML-to-WML, etc.
>
> -1.  Python, ruby, and JSON are going to be increasingly important on
> the web, and maintaining those interfaces is a feature that gives solr
> a more cutting-edge feel.
>
> The alternative interfaces can also be much more efficient for these languages.
>
> -Mike




Re: XML vs. JSON, Python, Ruby

2006-11-22 Thread Mike Klaas

On 11/21/06, Fuad Efendi <[EMAIL PROTECTED]> wrote:

> SOLR is a Web-Application with well-defined XML-based API:
> - indexing service
> - asynchronous; no need for 'real time' (content has well-defined TTL); can
>   use HTTP Caching for increased performance
> - provides native support for XSL
>
> The question: do we really need to maintain JSON/Ruby as a ServletOutput? We
> can focus on 'Public XML API' only, and provide samples of XSL-to-JSON,
> XML-to-WML, etc.


-1.  Python, ruby, and JSON are going to be increasingly important on
the web, and maintaining those interfaces is a feature that gives solr
a more cutting-edge feel.

The alternative interfaces can also be much more efficient for these languages.

-Mike