[jira] Commented: (SOLR-72) specify max buffered docs memory for IndexWriter in solrconfig.xml
[ http://issues.apache.org/jira/browse/SOLR-72?page=comments#action_12452076 ]

Yonik Seeley commented on SOLR-72:
----------------------------------

Perhaps add memory usage of buffered documents to the statistics too.

> specify max buffered docs memory for IndexWriter in solrconfig.xml
> ------------------------------------------------------------------
>
>          Key: SOLR-72
>          URL: http://issues.apache.org/jira/browse/SOLR-72
>      Project: Solr
>   Issue Type: New Feature
>     Reporter: Yonik Seeley
>     Priority: Minor
>
> Take advantage of this:
> https://issues.apache.org/jira/browse/LUCENE-709

--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
http://www.atlassian.com/software/jira
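LUCENE-709 lets IndexWriter flush buffered documents based on memory consumed rather than document count. A sketch of how that could surface in solrconfig.xml, next to the existing document-count knob — the <ramBufferSizeMB> element name here is an assumption for illustration, since the issue hasn't settled on one:

```xml
<!-- sketch only: <ramBufferSizeMB> is a hypothetical element name for the
     LUCENE-709 memory-based flush limit; <maxBufferedDocs> already exists -->
<indexDefaults>
  <maxBufferedDocs>1000</maxBufferedDocs>
  <ramBufferSizeMB>32</ramBufferSizeMB>
</indexDefaults>
```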
[jira] Commented: (SOLR-69) PATCH:MoreLikeThis support
[ http://issues.apache.org/jira/browse/SOLR-69?page=comments#action_12452044 ]

Yonik Seeley commented on SOLR-69:
----------------------------------

I finally got around to checking this out... looks cool!

In your example URL, it looks like mindf=1 is repeated... is that right,
or should one of them have been mintf=1?

> PATCH: MoreLikeThis support
> ---------------------------
>
>          Key: SOLR-69
>          URL: http://issues.apache.org/jira/browse/SOLR-69
>      Project: Solr
>   Issue Type: Improvement
>   Components: search
>     Reporter: Bertrand Delacretaz
>     Priority: Minor
>  Attachments: lucene-queries-2.0.0.jar, SOLR-69.patch
>
> Here's a patch that implements simple support of Lucene's MoreLikeThis
> class. The MoreLikeThisHelper code is heavily based on (hmm... "lifted
> from" might be more appropriate ;-) Erik Hatcher's example mentioned in
> http://www.mail-archive.com/solr-user@lucene.apache.org/msg00878.html
>
> To use it, add at least the following parameters to a standard or
> dismax query:
>
>   mlt=true
>   mlt.fl=list,of,fields,which,define,similarity
>
> See the MoreLikeThisHelper source code for more parameters.
>
> Here are two URLs that work with the example config, after loading all
> documents found in exampledocs in the index (just to show that it seems
> to work - of course you need a larger corpus to make it interesting):
>
> http://localhost:8983/solr/select/?stylesheet=&q=apache&qt=standard&mlt=true&mlt.fl=manu,cat&mlt.mindf=1&mlt.mindf=1&fl=id,score
> http://localhost:8983/solr/select/?stylesheet=&q=apache&qt=dismax&mlt=true&mlt.fl=manu,cat&mlt.mindf=1&mlt.mindf=1&fl=id,score
>
> Results are added to the output like this:
>
>   ...
>   1.5293242
>   SOLR1000
>   ...
>   1.5293242
>   UTF8TEST
>   ...
>
> I haven't tested this extensively yet, will do in the next few days.
> But comments are welcome of course.
Re: Cocoon-2.1.9 vs. SOLR-20 & SOLR-30
On 11/22/06, Walter Underwood <[EMAIL PROTECTED]> wrote:
> I took pains to make things streamable.. I'd hate to discard that.
> How do other servers handle streaming back a response and hitting an error?
>
> Does Lucene fetch information from disk while we iterate through the
> search results?

Yes. Originally, all the documents were retrieved up-front, and the
response writer didn't even have access to the IndexReader. After seeing
some users ask for some fields of *all* the documents in an index on a
different search product, I decided I'd better add streamability to
avoid OOM errors. A secondary consideration was improving latency of the
first document to the client when there are a large number to be
returned.

So Solr currently only records the ids (the internal integer Lucene
docid) and optionally scores for documents to be returned. During
response writing, the document for each id is read (which may involve
going to disk) right before it is written to the output stream.

-Yonik
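The pattern Yonik describes — keep only docids during the search, and fetch each stored document lazily while writing the response — can be sketched in isolation. The class and method names below are made up for illustration; they are not Solr's actual API:

```java
import java.io.IOException;
import java.io.Writer;
import java.util.List;
import java.util.function.IntFunction;

// Hypothetical sketch of lazy document retrieval during response writing.
// Only integer docids are held in memory; the full document is fetched
// (possibly from disk) right before it is streamed out, so a huge result
// set never has to be materialized at once.
public class StreamingWriterSketch {
    static void writeResults(List<Integer> docIds,
                             IntFunction<String> fetchDoc, // stands in for IndexReader.document(id)
                             Writer out) throws IOException {
        for (int id : docIds) {
            String doc = fetchDoc.apply(id); // may hit disk here
            out.write(doc);                  // written immediately, then discarded
            out.write('\n');
        }
    }
}
```

Because each document is dropped after writing, peak memory stays proportional to one document rather than the whole result set.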
Re: Cocoon-2.1.9 vs. SOLR-20 & SOLR-30
On 11/20/06 5:51 PM, "Yonik Seeley" <[EMAIL PROTECTED]> wrote:
>> : If you really want to handle failure in an error response, write that
>> : to a string and if that fails, send a hard-coded string.
>>
>> Hmmm... i could definitely get on board an idea like that.
>
> I took pains to make things streamable.. I'd hate to discard that.
> How do other servers handle streaming back a response and hitting an error?

You found the design tradeoff! We can stream the results or we can give
reliable error codes for errors that happen during result processing. We
can't do both.

Ultraseek does streaming, but we were generating HTML, so we could print
reasonable errors in-line.

Streaming is very useful for HTML pages, because it allows the first
pixels to be painted as soon as possible. It isn't as important on the
back end, unless someone has gone to the considerable trouble of making
their entire front end able to stream the back-end results to HTML.

If we aren't calling Writer.flush occasionally, then the streaming is
just filling up a buffer smoothly. The client won't see anything until
TCP decides to send it.

Does Lucene fetch information from disk while we iterate through the
search results? If that happens a few times, then streaming might make a
difference. If it is mostly CPU-bound, then streaming probably doesn't
help.

wunder
--
Walter Underwood
Search Guru, Netflix
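Walter's point about Writer.flush can be illustrated with a small sketch: unless the server flushes periodically, "streamed" output just accumulates in a buffer and the client sees nothing until the buffer happens to fill. The flush interval and class name below are invented for the example:

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.Writer;
import java.util.List;

// Hypothetical illustration: flush every N documents so the client
// actually receives early results instead of waiting on a full buffer.
public class FlushingStreamSketch {
    static final int FLUSH_EVERY = 50; // docs between flushes (illustrative)

    static void stream(List<String> docs, Writer out) throws IOException {
        BufferedWriter buf = new BufferedWriter(out);
        int n = 0;
        for (String doc : docs) {
            buf.write(doc);
            buf.write('\n');
            // Without this, early output sits in the buffer and streaming
            // gains the client nothing until the buffer fills.
            if (++n % FLUSH_EVERY == 0) buf.flush();
        }
        buf.flush(); // final flush so the tail is delivered
    }
}
```

The tradeoff is real: each flush costs a system call and may produce small TCP packets, so flushing every document would hurt throughput; flushing in batches keeps both latency and overhead reasonable.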
Re: SolrIndexSearcher HitCollector
On 11/22/06, Peter Keegan <[EMAIL PROTECTED]> wrote:
> I see. So, does the trunk version always deliver docs in order, or is
> it bad to assume so?

Yes, the trunk version does, unless someone sets BooleanQuery.useScorer14
to true.

-Yonik
Re: SolrIndexSearcher HitCollector
I see. So, does the trunk version always deliver docs in order, or is it
bad to assume so?

Peter

On 11/22/06, Yonik Seeley <[EMAIL PROTECTED]> wrote:
> On 11/22/06, Peter Keegan <[EMAIL PROTECTED]> wrote:
> > The following code is from the HitCollector of SolrIndexSearcher:
> >
> >     if (numHits[0]++ < lastDocRequested || score >= minScore) {
> >       // if docs are always delivered in order, we could use "score>minScore"
> >       // but might BooleanScorer14 might still be used and deliver docs out-of-order?
> >       hq.insert(new ScoreDoc(doc, score));
> >       minScore = ((ScoreDoc)hq.top()).score;
> >     }
> >
> > Could someone explain this conditional and whether or not it is valid
> > when used with the trunk version of BooleanScorer2?
>
> Yes, this code is valid for both scorers.
>
> After being initially confused by my own comments, I just clarified them:
>
>   // TODO: if docs are always delivered in order, we could use "score>minScore"
>   // instead of "score>=minScore" and avoid tiebreaking scores
>   // in the priority queue.
>   // but might BooleanScorer14 might still be used and deliver docs out-of-order?
>
> This is for the no-sort case (meaning sort-by-score). To get a stable
> sort, a secondary sort is done on docid when the score matches. If we
> knew that docs were always delivered in order, we could avoid putting
> docs with scores matching the current min score in the priority queue.
> That could be a decent optimization when there are many docs with the
> same score (think range query, terms with the same idf, etc.)
>
> -Yonik
Re: SolrIndexSearcher HitCollector
On 11/22/06, Peter Keegan <[EMAIL PROTECTED]> wrote:
> The following code is from the HitCollector of SolrIndexSearcher:
>
>     if (numHits[0]++ < lastDocRequested || score >= minScore) {
>       // if docs are always delivered in order, we could use "score>minScore"
>       // but might BooleanScorer14 might still be used and deliver docs out-of-order?
>       hq.insert(new ScoreDoc(doc, score));
>       minScore = ((ScoreDoc)hq.top()).score;
>     }
>
> Could someone explain this conditional and whether or not it is valid
> when used with the trunk version of BooleanScorer2?

Yes, this code is valid for both scorers.

After being initially confused by my own comments, I just clarified them:

  // TODO: if docs are always delivered in order, we could use "score>minScore"
  // instead of "score>=minScore" and avoid tiebreaking scores
  // in the priority queue.
  // but might BooleanScorer14 might still be used and deliver docs out-of-order?

This is for the no-sort case (meaning sort-by-score). To get a stable
sort, a secondary sort is done on docid when the score matches. If we
knew that docs were always delivered in order, we could avoid putting
docs with scores matching the current min score in the priority queue.
That could be a decent optimization when there are many docs with the
same score (think range query, terms with the same idf, etc.)

-Yonik
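The logic discussed in this thread — keep the top N hits in a priority queue, use "score >= minScore" because docs may arrive out of order, and break score ties by docid for a stable sort — can be sketched as a standalone class. This is not Solr's actual implementation; the class and helper names are invented for illustration:

```java
import java.util.Comparator;
import java.util.PriorityQueue;

// Hypothetical sketch of the top-N collection pattern from the thread:
// a bounded min-heap whose top is the current "worst" hit, with ">="
// admitting score ties so the docid tiebreak can keep the sort stable.
public class TopHitsSketch {
    static final class ScoreDoc {
        final int doc;
        final float score;
        ScoreDoc(int doc, float score) { this.doc = doc; this.score = score; }
    }

    // Returns the top-n hits: descending score, ascending docid on ties.
    static ScoreDoc[] topHits(ScoreDoc[] hits, int n) {
        // Min-heap ordering: lowest score first; for equal scores the
        // larger docid is "worse" and sits nearer the top for eviction.
        Comparator<ScoreDoc> worstFirst = (a, b) -> {
            if (a.score != b.score) return Float.compare(a.score, b.score);
            return Integer.compare(b.doc, a.doc);
        };
        PriorityQueue<ScoreDoc> hq = new PriorityQueue<>(n, worstFirst);
        float minScore = Float.NEGATIVE_INFINITY;
        int collected = 0;
        for (ScoreDoc hit : hits) {
            // Mirrors: if (numHits[0]++ < lastDocRequested || score >= minScore)
            // ">=" (not ">") because an out-of-order doc with a tied score
            // might still beat the heap's worst entry on docid.
            if (collected++ < n || hit.score >= minScore) {
                hq.offer(hit);
                if (hq.size() > n) hq.poll(); // evict the worst hit
                minScore = hq.peek().score;
            }
        }
        ScoreDoc[] out = new ScoreDoc[hq.size()];
        for (int i = out.length - 1; i >= 0; i--) out[i] = hq.poll();
        return out;
    }
}
```

This also shows why Yonik's optimization needs in-order delivery: if docs always arrived in docid order, a later doc with a merely tied score could never displace a queue entry, so "score > minScore" would be safe and tied docs could be skipped entirely.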
SolrIndexSearcher HitCollector
The following code is from the HitCollector of SolrIndexSearcher:

    if (numHits[0]++ < lastDocRequested || score >= minScore) {
      // if docs are always delivered in order, we could use "score>minScore"
      // but might BooleanScorer14 might still be used and deliver docs out-of-order?
      hq.insert(new ScoreDoc(doc, score));
      minScore = ((ScoreDoc)hq.top()).score;
    }

Could someone explain this conditional and whether or not it is valid
when used with the trunk version of BooleanScorer2?

Thanks,
Peter
Re: XML vs. JSON, Python, Ruby
Seconded. I'm happily using the Ruby format with a Rails application. It
is very nice that Solr has this flexible output capability.

Erik

On Nov 22, 2006, at 3:57 AM, Mike Klaas wrote:
> On 11/21/06, Fuad Efendi <[EMAIL PROTECTED]> wrote:
> > SOLR is a Web-Application with well-defined XML-based API:
> > - indexing service
> > - asynchronous; no need for 'real time' (content has well-defined
> >   TTL); can use HTTP Caching for increased performance
> > - provides native support for XSL
> >
> > The question: do we really need to maintain JSON/Ruby as a
> > ServletOutput? We can focus on 'Public XML API' only, and provide
> > samples of XSL-to-JSON, XML-to-WML, etc...
>
> -1. Python, Ruby, and JSON are going to be increasingly important on
> the web, and maintaining those interfaces is a feature that gives Solr
> a more cutting-edge feel. The alternative interfaces can also be much
> more efficient for these languages.
>
> -Mike
Re: XML vs. JSON, Python, Ruby
On 11/21/06, Fuad Efendi <[EMAIL PROTECTED]> wrote:
> SOLR is a Web-Application with well-defined XML-based API:
> - indexing service
> - asynchronous; no need for 'real time' (content has well-defined TTL);
>   can use HTTP Caching for increased performance
> - provides native support for XSL
>
> The question: do we really need to maintain JSON/Ruby as a
> ServletOutput? We can focus on 'Public XML API' only, and provide
> samples of XSL-to-JSON, XML-to-WML, etc...

-1. Python, Ruby, and JSON are going to be increasingly important on the
web, and maintaining those interfaces is a feature that gives Solr a more
cutting-edge feel. The alternative interfaces can also be much more
efficient for these languages.

-Mike