Re: [jira] Updated: (SOLR-52) Lazy Field loading
: updated version of patch. Addresses some of Hoss' (minor) comments.
: Also, the .doc() method of SolrIndexSearcher will add the unique key
: field unconditionally if it is present in the schema. It is used
: randomly in several places and including checks for it in other places
: decreases readability.

We probably don't want to add the unique key field directly to the Set passed by the client -- partially because it's bad form to modify a collection as a side effect of another method, but also because Set.add is an optional method that might throw UnsupportedOperationException.

-Hoss
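One way to honour both concerns is to copy the caller's set before adding the key. The following is only a minimal sketch: the class and method names (FieldSets, withUniqueKey) are hypothetical and are not taken from the SOLR-52 patch.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper for illustration; not part of SolrIndexSearcher.
class FieldSets {
    /**
     * Returns a set holding the requested field names plus the unique key.
     * The caller's set is never modified, so it may be unmodifiable.
     */
    static Set<String> withUniqueKey(Set<String> requested, String uniqueKey) {
        if (uniqueKey == null || requested.contains(uniqueKey)) {
            return requested; // nothing to add, so no copy is needed
        }
        Set<String> copy = new HashSet<>(requested);
        copy.add(uniqueKey);
        return copy;
    }

    public static void main(String[] args) {
        // Calling add("id") on this set would throw UnsupportedOperationException.
        Set<String> clientSet =
            Collections.unmodifiableSet(new HashSet<>(Arrays.asList("title")));
        Set<String> toLoad = withUniqueKey(clientSet, "id");
        System.out.println(toLoad.contains("id"));    // true
        System.out.println(clientSet.contains("id")); // false
    }
}
```

The copy costs one small allocation per call, and only when the key is actually missing from the request.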
Re: changes before release?
: http://issues.apache.org/jira/browse/SOLR-49 (XSLTResponseWriter)
: could be committed. It has no impact on existing code and can be
: useful for simple setups, demos, etc.

On the subject of stabilizing the external APIs, the one thing about your patch in its current format that I remember not being fond of is using XML node attributes to configure queryResponseWriters, instead of refactoring the code used to get requestHandler configuration using nested XML as a NamedList (which can easily be converted to SolrParams).

Other than that, I agree it would be a really handy thing to have in our first release.

-Hoss
Re: changes before release?
: - stabilize/review external api (query parameters,
: schema.xml/solrconfig.xml format, XML response format). (for

Right ... it's not something that I've thought about lately, but doing something with the XSD in SOLR-17 so that the XML output format can be validated would probably be a good idea for an "official" release.

-Hoss
Re: changes before release?
Hi,

On 10/9/06, Yonik Seeley <[EMAIL PROTECTED]> wrote:
> There is other stuff that needs to be done before we can release, such
> as a change of copyright notices to comply with new ASF requirements.
> I imagine incubator-general will go over the first release we make
> with a fine-tooth comb looking for compliance with ASF release rules.

Yes. Speaking with my incubator PMC hat on, I think we should do the release prep stuff (LICENSE and NOTICE files, license resolution, etc.) first before the technical things, because we know less about them, they're required, and we are likely to underestimate the effort.

On 10/9/06, Mike Klaas <[EMAIL PROTECTED]> wrote:
> The most important things that come to mind:
>
> - stabilize/review external api (query parameters,
> schema.xml/solrconfig.xml format, XML response format). (for
> instance, should debugQuery/explainOther combo be changed to
> debug/debug.otherQuery? Is the other query explain functionality
> important?)

Yes, important, but not essential. Whatever release we do first is likely to be an alpha release to show a healthy and active community and focus on non-technical issues towards graduating from the incubator.

> - review/trim internal api. Not as crucial as the above, but still
> important. An example is that fields have two write() methods, one
> for the old XMLWriter and another for a generic TextResponseWriter.
> Perhaps we could make a parent interface for output writing so that
> this can be reduced to one method (the methods are identical for most
> of defined fields).

Plenty of those examples around, all worthy of review and trimming, but see above.

> - remove all deprecated/compatibility code.

Yes, and this one actually has a bonus in that we don't have to worry about licensing / noting (in our release documentation) any source we don't ship...

Yoav
[jira] Commented: (SOLR-52) Lazy Field loading
[ http://issues.apache.org/jira/browse/SOLR-52?page=comments#action_12440974 ]

Mike Klaas commented on SOLR-52:

Note the above patch does not address the issue of lazy field use mismatch between two handlers (see solr-dev).

> Lazy Field loading
> ------------------
>
> Key: SOLR-52
> URL: http://issues.apache.org/jira/browse/SOLR-52
> Project: Solr
> Issue Type: Improvement
> Components: search
> Reporter: Mike Klaas
> Assigned To: Mike Klaas
> Priority: Minor
> Attachments: lazyfields_patch.diff, lazyfields_patch.diff
>
> Add lazy field loading to solr.
> Currently solr reads all stored fields and filters the undesired fields based
> on the field list. This is usually not a performance concern, but when using
> solr to store large numbers of fields, or just one large field (doc contents,
> e.g. for highlighting), it is perceptible.
> Now, there is a concern with the doc cache of SolrIndexSearcher, which
> assumes it has the whole document in the cache. To maintain this invariant,
> it is still the case that all the fields in a document are loaded in a
> searcher.doc(i) call. However, if a field set is given to the method, only
> the given fields are loaded directly, while the rest are loaded lazily.
> Some concerns about lazy field loading
> 1. Lazy fields are only valid while the IndexReader is open. I believe this
> is fine since the IndexReader is kept alive by the SolrIndexSearcher, so all
> docs in the cache have the reader available.
> 2. It is slower to read a field lazily and retrieve its value later than
> retrieve it directly to begin with (though I don't know how much--depends on
> i/o factors). We certainly don't want this to be the common case. I added
> an optional call which accumulates all the fields likely to be used in the
> request (highlighting, response writing), and populates the IndexSearcher
> cache a priori. This has the added advantage of concentrating doc retrieval
> in a single place, which is nice from a performance testing perspective.
> 3. LazyFields are incompatible with the sundry Field declarations scattered
> about Solr. I believe I've changed all the necessary locations to Fieldable.
> Comments appreciated

--
This message is automatically generated by JIRA.
- If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
- For more information on JIRA, see: http://www.atlassian.com/software/jira
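The split described in the issue (requested fields read up front, the rest read on first access and then kept) can be illustrated with a toy model. This is only a sketch under stated assumptions: LazyDoc and its reader function are hypothetical stand-ins, not Lucene or Solr classes, and the reader function merely imitates a seek-and-read against an open index.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

// Toy model only; none of these types exist in Lucene or Solr.
class LazyDoc {
    private final Map<String, String> loaded = new HashMap<>();
    private final Set<String> allFields;
    private final Function<String, String> reader; // stands in for seek + read on the index
    int lazyReads = 0; // counts reads deferred past construction

    LazyDoc(Set<String> allFields, Set<String> eager, Function<String, String> reader) {
        this.allFields = allFields;
        this.reader = reader;
        // Fields in the requested set are read immediately.
        for (String name : eager) {
            if (allFields.contains(name)) {
                loaded.put(name, reader.apply(name));
            }
        }
    }

    /** Returns the stored value, reading it on first access if it was left lazy. */
    String get(String name) {
        if (!allFields.contains(name)) {
            return null; // the document has no such stored field
        }
        if (!loaded.containsKey(name)) {
            loaded.put(name, reader.apply(name)); // load once, then reuse
            lazyReads++;
        }
        return loaded.get(name);
    }
}
```

In this model a lazily read value is fetched at most once, which is the load-once behaviour that concern 2 relies on; the reader must remain usable for as long as the document is cached, mirroring concern 1.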
[jira] Updated: (SOLR-52) Lazy Field loading
[ http://issues.apache.org/jira/browse/SOLR-52?page=all ]

Mike Klaas updated SOLR-52:

Attachment: lazyfields_patch.diff

Updated version of patch. Addresses some of Hoss' (minor) comments.

Also, the .doc() method of SolrIndexSearcher will add the unique key field unconditionally if it is present in the schema. It is used randomly in several places and including checks for it in other places decreases readability.
Re: changes before release?
On 10/9/06, Yonik Seeley <[EMAIL PROTECTED]> wrote:
> I think we should plan on making a release soon (end of October?)
>
> At the technical level, what do people think should be
> changed/committed before we do?
>
> Off the top of my head, perhaps
> http://issues.apache.org/jira/browse/SOLR-2
> because it has to do with interface, and none of the java clients are
> really locked down yet.

The most important things that come to mind:

- stabilize/review external api (query parameters, schema.xml/solrconfig.xml format, XML response format). (For instance, should the debugQuery/explainOther combo be changed to debug/debug.otherQuery? Is the other query explain functionality important?)

- review/trim internal api. Not as crucial as the above, but still important. An example is that fields have two write() methods, one for the old XMLWriter and another for a generic TextResponseWriter. Perhaps we could make a parent interface for output writing so that this can be reduced to one method (the methods are identical for most of the defined fields).

- remove all deprecated/compatibility code.

-Mike
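The parent-interface idea from the second bullet could look roughly like the sketch below. Every name here (FieldOutput, XmlOutput, JsonOutput, StrFieldSketch) is hypothetical; Solr's actual XMLWriter and TextResponseWriter are not reproduced, and escaping is left out to keep the sketch short.

```java
// Hypothetical types for illustration only; these are not Solr classes.
interface FieldOutput {
    void writeStr(String name, String value);
}

// Stand-in for an XML-style writer (no escaping, for brevity).
class XmlOutput implements FieldOutput {
    private final StringBuilder out;
    XmlOutput(StringBuilder out) { this.out = out; }
    public void writeStr(String name, String value) {
        out.append("<str name=\"").append(name).append("\">")
           .append(value).append("</str>");
    }
}

// Stand-in for a generic text writer, here a JSON-like format.
class JsonOutput implements FieldOutput {
    private final StringBuilder out;
    JsonOutput(StringBuilder out) { this.out = out; }
    public void writeStr(String name, String value) {
        out.append('"').append(name).append("\":\"").append(value).append('"');
    }
}

// A field type now needs a single write() method instead of one per writer.
class StrFieldSketch {
    void write(FieldOutput out, String name, String value) {
        out.writeStr(name, value);
    }
}
```

With this shape, a field type whose two write() methods were identical would keep only one, and only field types that truly format differently per writer would need anything more.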
Re: changes before release?
On 10/9/06, Yonik Seeley <[EMAIL PROTECTED]> wrote:
> On 10/9/06, Bertrand Delacretaz <[EMAIL PROTECTED]> wrote:
> > ...Not sure what you mean by "technical level"
>
> There is other stuff that needs to be done before we can release, such
> as a change of copyright notices to comply with new ASF requirements

Ok, got it now.

> ...Perhaps we could link your presentation here?
> http://wiki.apache.org/solr/SolrResources

Good idea, done!

My presentation was well received at the Cocoon GetTogether, and I talked with several people who are considering Solr for their projects. The question that usually comes up is how to convince a manager to use software that is still in incubation, so making a release is certainly a Good Thing.

-Bertrand
Re: changes before release?
On 10/9/06, Bertrand Delacretaz <[EMAIL PROTECTED]> wrote:
> On 10/9/06, Yonik Seeley <[EMAIL PROTECTED]> wrote:
> > ...At the technical level, what do people think should be
> > changed/committed before we do?...
>
> Not sure what you mean by "technical level"

There is other stuff that needs to be done before we can release, such as a change of copyright notices to comply with new ASF requirements. I imagine incubator-general will go over the first release we make with a fine-tooth comb looking for compliance with ASF release rules.

> ...if I'm allowed to suggest my own patch, I think
> http://issues.apache.org/jira/browse/SOLR-49 (XSLTResponseWriter)
> could be committed. It has no impact on existing code and can be
> useful for simple setups, demos, etc.

Definitely... I'll take another look at the latest version.

Oh, and thanks for all the work you've been doing getting the word out on Solr!
http://wiki.apache.org/cocoon/GT2006Notes
Perhaps we could link your presentation here?
http://wiki.apache.org/solr/SolrResources

-Yonik
Re: changes before release?
On 10/9/06, Yonik Seeley <[EMAIL PROTECTED]> wrote:
> ...At the technical level, what do people think should be
> changed/committed before we do?...

Not sure what you mean by "technical level"...if I'm allowed to suggest my own patch, I think http://issues.apache.org/jira/browse/SOLR-49 (XSLTResponseWriter) could be committed. It has no impact on existing code and can be useful for simple setups, demos, etc.

-Bertrand
changes before release?
I think we should plan on making a release soon (end of October?)

At the technical level, what do people think should be changed/committed before we do?

Off the top of my head, perhaps http://issues.apache.org/jira/browse/SOLR-2 because it has to do with interface, and none of the Java clients are really locked down yet.

-Yonik
[jira] Resolved: (SOLR-43) query parameter overhaul
[ http://issues.apache.org/jira/browse/SOLR-43?page=all ]

Yonik Seeley resolved SOLR-43.

Resolution: Fixed

closing... the removal of deprecated methods should probably be more tied to releases.

> query parameter overhaul
> ------------------------
>
> Key: SOLR-43
> URL: http://issues.apache.org/jira/browse/SOLR-43
> Project: Solr
> Issue Type: New Feature
> Reporter: Yonik Seeley
> Assigned To: Yonik Seeley
> Attachments: dismax-solrparams.patch, solrparams.patch, solrparams.patch
>
> Goals:
> - per field parameters that fall back to global values
> - defaults in solrconfig.xml per request handler, overridable per
> This is desirable for highlighting additions:
> http://issues.apache.org/jira/browse/SOLR-37
> last email thread:
> http://www.nabble.com/parameter-defaults-and-config-tf2020863.html#a5556298
Re: Re: [jira] Created: (SOLR-52) Lazy Field loading
On 10/9/06, Chris Hostetter <[EMAIL PROTECTED]> wrote:
> : I wouldn't expect there to be much of a difference. Lazy fields hold
> : on to a stream and an offset, and operate by seek()'ing to the right ...
>
> Hmmm... yeah it sounds like it shouldn't matter. If I get some time I'll
> try to do a micro benchmark to compare loading a doc with one field and
> then loading the rest lazily vs. loading the doc twice.

If lazy loading is ever shown to be a performance problem, a simple solution would be to have a switch in solrconfig.xml to disable it.

-Yonik
Re: Re: [jira] Created: (SOLR-52) Lazy Field loading
: I wouldn't expect there to be much of a difference. Lazy fields hold
: on to a stream and an offset, and operate by seek()'ing to the right ...

Hmmm... yeah it sounds like it shouldn't matter. If I get some time I'll try to do a micro benchmark to compare loading a doc with one field and then loading the rest lazily vs. loading the doc twice.

: > An alternate idea: the single arg version could check if an item found in
: > the cache contains lazy fields and if so re-fetch and recache the full
: > Document?
:
: That could work though I wonder if the O(num fields) cost per document
: access is worth it. Perhaps the document could be stored with a
: "lazy" flag in the cache, to make this check O(1).

Right ... checking the individual fields would be a very bad idea.

: Perhaps instead of a "lazy" flag, the number of real fields could be
: stored in the cache. On the next document request, if there are more
: than 1-2 more fields requested than "real" in the cache, the full
: document is returned.

That's the kind of crazy, heuristic/AI-ish solution I love! ... but probably not worth the effort unless we see a demonstrable problem.

: How much of a problem this is depends also on how often documents are
: hit once in the cache. If it is more than a few times, the load-once
: behaviour of lazy fields should amortize out the extra cost.

Right ... and if it's not more than a few times, you might as well skip the doc cache completely.

: > 2) why doesn't optimizePreFetchDocs use SolrIndexSearcher.readDocs (looks
: > like cut/paste of method body)
:
: Avoids the allocation of the Document[] array, and is three lines (vs.
: two lines to allocate array and call readDocs).

Ah ... that makes sense.

-Hoss
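The "lazy flag" variant discussed above (re-fetch and recache the full document when the single-argument call finds a partially loaded entry, with an O(1) check) might be sketched as follows. This is a toy model under assumptions: DocCacheSketch, Entry, and the in-memory map standing in for the index are all hypothetical, and real lazy fields are simplified to "fields not present in the entry".

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Toy model only; this is not SolrIndexSearcher's document cache.
class DocCacheSketch {
    static final class Entry {
        final Map<String, String> fields;
        final boolean partial; // true when some stored fields were left unloaded
        Entry(Map<String, String> fields, boolean partial) {
            this.fields = fields;
            this.partial = partial;
        }
    }

    private final Map<Integer, Map<String, String>> index; // stands in for the index
    private final Map<Integer, Entry> cache = new HashMap<>();
    int fullLoads = 0; // how many times a whole document was read

    DocCacheSketch(Map<Integer, Map<String, String>> index) {
        this.index = index;
    }

    /** Field-set version: loads only the wanted fields and flags the entry as partial. */
    Map<String, String> doc(int id, Set<String> wanted) {
        Entry e = cache.get(id);
        if (e != null && (!e.partial || e.fields.keySet().containsAll(wanted))) {
            return e.fields; // the cached entry already covers the request
        }
        if (e != null) {
            return doc(id); // partial entry lacks a wanted field: upgrade to full
        }
        Map<String, String> some = new HashMap<>(index.get(id));
        some.keySet().retainAll(wanted);
        cache.put(id, new Entry(some, true));
        return some;
    }

    /** Single-argument version: one flag test decides whether to re-fetch. */
    Map<String, String> doc(int id) {
        Entry e = cache.get(id);
        if (e == null || e.partial) { // O(1) check, no scan over the fields
            e = new Entry(new HashMap<>(index.get(id)), false);
            cache.put(id, e);
            fullLoads++;
        }
        return e.fields;
    }
}
```

The field-count heuristic from the thread would replace the boolean with an int and compare it against the size of the requested set; as noted above, that is probably only worth doing if a real problem shows up.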
[jira] Commented: (SOLR-52) Lazy Field loading
[ http://issues.apache.org/jira/browse/SOLR-52?page=comments#action_12440945 ]

Yonik Seeley commented on SOLR-52:

> hence the data will be stored in the field *twice* for some reason

FYI, I just checked in a Lucene fix for this.
Re: Re: [jira] Created: (SOLR-52) Lazy Field loading
> > 3) should we be concerned about letting people specify prefixes/suffixes
> > of the fields they want to forcibly load for dynamicFields instead of just
> > a Set of names? .. or should we cross that bridge when we come to
> > it? (I ask because we have no cache aware method that takes in a
> > FieldSelector, just the one that takes in the Set)
>
> It would be very easy to add a parallel method which takes a
> FieldSelector. My only concern with that is that it might make it
> hard to do cache flushing heuristics like you suggested above.

Yeah, I had thought about that and decided it was probably best left out for now... one can always get the IndexReader and use its methods to provide uncached doc access with a FieldSelector.

-Yonik
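For reference, the prefix/suffix idea raised in point 3 comes down to a per-field-name decision, which is the kind of decision a FieldSelector makes. The sketch below is deliberately Lucene-free: FieldNameMatcher is a hypothetical class, and wiring it into a real FieldSelector or into SolrIndexSearcher is left out.

```java
import java.util.Set;

// Hypothetical matcher for illustration; it does not implement Lucene's FieldSelector.
class FieldNameMatcher {
    private final Set<String> names;    // exact field names to load eagerly
    private final Set<String> prefixes; // e.g. "attr_" for a dynamicField pattern attr_*
    private final Set<String> suffixes; // e.g. "_t" for a dynamicField pattern *_t

    FieldNameMatcher(Set<String> names, Set<String> prefixes, Set<String> suffixes) {
        this.names = names;
        this.prefixes = prefixes;
        this.suffixes = suffixes;
    }

    /** True if the field should be loaded up front rather than lazily. */
    boolean loadEagerly(String field) {
        if (names.contains(field)) {
            return true;
        }
        for (String p : prefixes) {
            if (field.startsWith(p)) {
                return true;
            }
        }
        for (String s : suffixes) {
            if (field.endsWith(s)) {
                return true;
            }
        }
        return false;
    }
}
```

Unlike a plain Set of names, a matcher like this has no fixed size to compare against a cached entry, which is one way to read the concern about cache flushing heuristics becoming harder.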