date:20061009

Re: [jira] Updated: (SOLR-52) Lazy Field loading

2006-10-09 Thread Chris Hostetter


: updated version of patch.  Addresses some of Hoss' (minor) comments.
: Also, the .doc() method of SolrIndexSearcher will add the unique key
: field unconditionally if it is present in the schema.  It is used
: randomly in several places and including checks for it in other places
: decreases readability.

We probably don't want to add the unique key field directly to the Set
passed by the client -- partially because it's bad form to modify a
collection as a side effect of another method, but also because Set.add is
an optional method that might throw UnsupportedOperationException.
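
One way to avoid both problems is to copy the caller's Set before adding
anything to it.  A minimal sketch follows; the FieldSets class and the
withUniqueKey method are hypothetical names for illustration and are not
part of the patch:

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical helper; not part of SolrIndexSearcher. */
final class FieldSets {
    private FieldSets() {}

    /**
     * Returns a set holding the requested fields plus the unique key field.
     * The caller's set is never modified, so an unmodifiable set is safe here.
     * A null uniqueKey stands for "schema has no unique key" (an assumption).
     */
    static Set<String> withUniqueKey(Set<String> requested, String uniqueKey) {
        if (uniqueKey == null || requested.contains(uniqueKey)) {
            return requested; // nothing to add, so no copy is needed
        }
        Set<String> copy = new HashSet<>(requested);
        copy.add(uniqueKey); // safe: we own this copy
        return copy;
    }
}
```

The copy costs one small allocation per call, which seems a fair price for
leaving the client's collection alone.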



-Hoss

Re: changes before release?

2006-10-09 Thread Chris Hostetter


: http://issues.apache.org/jira/browse/SOLR-49 (XSLTResponseWriter)
: could be committed.  It has no impact on existing code and can be
: useful for simple setups, demos, etc.

On the subject of stabilizing the external APIs, the one thing about your
patch in its current format that I remember not being fond of is using XML
node attributes to configure queryResponseWriters instead of refactoring
the code used to get requestHandler configuration using nested XML as a
NamedList (which can easily be converted to SolrParams).

Other than that, i agree it would be a really handy thing to have in our
first release.



-Hoss

Re: changes before release?

2006-10-09 Thread Chris Hostetter


:  - stabilize/review external api (query parameters,
: schema.xml/solrconfig.xml format, XML response format).  (for

Right ... it's not something that i've thought about lately, but doing
something with the XSD in SOLR-17 so that the XML output format can be
validated would probably be a good idea for an "official" release.



-Hoss

Re: changes before release?

2006-10-09 Thread Yoav Shapira


Hi,

On 10/9/06, Yonik Seeley <[EMAIL PROTECTED]> wrote:

There is other stuff that needs to be done before we can release, such
as a change of copyright notices to comply with new ASF requirements.
I imagine incubator-general will go over the first release we make
with a fine-tooth comb looking for compliance with ASF release rules.


Yes.  Speaking with my incubator PMC hat on, I think we should do the
release prep stuff (LICENSE and NOTICE files, license resolution,
etc.) first before the technical things, because we know less about
them, they're required, and we are likely to underestimate the effort.

On 10/9/06, Mike Klaas  <[EMAIL PROTECTED]> wrote:

The most important things that come to mind:
- stabilize/review external api (query parameters,
schema.xml/solrconfig.xml format, XML response format).  (for
instance, should debugQuery/explainOther combo be changed to
debug/debug.otherQuery?  Is the other query explain functionality
important?)


Yes, important, but not essential.  Whatever release we do first is
likely to be an alpha release to show a healthy and active community
and focus on non-technical issues towards graduating from the
incubator.


- review/trim internal api.  Not as crucial as the above, but still
important.  An example is that fields have two write() methods, one
for the old XMLWriter and another for a generic TextResponseWriter.
Perhaps we could make a parent interface for output writing so that
this can be reduced to one method (the methods are identical for most
of defined fields).


Plenty of those examples around, all worthy of review and trimming,
but see above.


- remove all deprecated/compatibility code.


Yes, and this one actually has a bonus in that we don't have to worry
about licensing / noting (in our release documentation) any source we
don't ship...

Yoav

[jira] Commented: (SOLR-52) Lazy Field loading

2006-10-09 Thread Mike Klaas (JIRA)

[ 
http://issues.apache.org/jira/browse/SOLR-52?page=comments#action_12440974 ] 

Mike Klaas commented on SOLR-52:


Note the above patch does not address the issue of lazy field use mismatch 
between two handlers (see solr-dev)

> Lazy Field loading
> --
>
> Key: SOLR-52
> URL: http://issues.apache.org/jira/browse/SOLR-52
> Project: Solr
>  Issue Type: Improvement
>  Components: search
>Reporter: Mike Klaas
> Assigned To: Mike Klaas
>Priority: Minor
> Attachments: lazyfields_patch.diff, lazyfields_patch.diff
>
>
> Add lazy field loading to solr.
> Currently solr reads all stored fields and filters the undesired fields based 
> on the field list.  This is usually not a performance concern, but when using 
> solr to store large numbers of fields, or just one large field (doc contents, 
> eg. for highlighting), it is perceptible.
> Now, there is a concern with the doc cache of SolrIndexSearcher, which 
> assumes it has the whole document in the cache.  To maintain this invariant, 
> it is still the case that all the fields in a document are loaded in a 
> searcher.doc(i) call.  However, if a field set is given to the method, only 
> the given fields are loaded directly, while the rest are loaded lazily.
> Some concerns about lazy field loading
>   1. Lazy fields are only valid while the IndexReader is open.  I believe this 
> is fine since the IndexReader is kept alive by the SolrIndexSearcher, so all 
> docs in the cache have the reader available.  
>   2. It is slower to read a field lazily and retrieve its value later than 
> retrieve it directly to begin with (though I don't know how much--depends on 
> i/o factors).  We certainly don't want this to be the common case.  I added 
> an optional call which accumulates all the fields likely to be used in the 
> request (highlighting, response writing), and populates the IndexSearcher 
> cache a priori.  This has the added advantage of concentrating doc retrieval 
> in a single place, which is nice from a performance testing perspective.
>   3. LazyFields are incompatible with the sundry Field declarations scattered 
> about Solr.  I believe I've changed all the necessary locations to Fieldable.
> Comments appreciated
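
The load-on-first-access behaviour described in the issue can be sketched
roughly as below.  LazyValue is a hypothetical stand-in written for this
illustration, not Lucene's lazy field class, and the Supplier stands for
whatever seek-and-read the real implementation performs:

```java
import java.util.function.Supplier;

/** Hypothetical stand-in for a lazily loaded stored field value. */
final class LazyValue {
    private final Supplier<String> loader; // performs the deferred read
    private String value;
    private boolean loaded;

    LazyValue(Supplier<String> loader) {
        this.loader = loader;
    }

    /** Reads the value on first access and reuses it on later calls. */
    synchronized String get() {
        if (!loaded) {
            value = loader.get();
            loaded = true;
        }
        return value;
    }

    /** True once the deferred read has happened. */
    synchronized boolean isLoaded() {
        return loaded;
    }
}
```

In this sketch the loader only works while its underlying source is still
available, which mirrors concern 1 about the IndexReader staying open.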

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: 
http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] Updated: (SOLR-52) Lazy Field loading

2006-10-09 Thread Mike Klaas (JIRA)

 [ http://issues.apache.org/jira/browse/SOLR-52?page=all ]

Mike Klaas updated SOLR-52:
---

Attachment: lazyfields_patch.diff

updated version of patch.  Addresses some of Hoss' (minor) comments.  Also, the 
.doc() method of SolrIndexSearcher will add the unique key field 
unconditionally if it is present in the schema.  It is used randomly in several 
places and including checks for it in other places decreases readability. 


Re: changes before release?

2006-10-09 Thread Mike Klaas


On 10/9/06, Yonik Seeley <[EMAIL PROTECTED]> wrote:

I think we should plan on making a release soon (end of October?)
At the technical level, what do people think should be
changed/committed before we do?

Off the top of my head, perhaps
  http://issues.apache.org/jira/browse/SOLR-2
because it has to do with interface, and none of the java clients are
really locked down yet.


The most important things that come to mind:
- stabilize/review external api (query parameters,
schema.xml/solrconfig.xml format, XML response format).  (for
instance, should debugQuery/explainOther combo be changed to
debug/debug.otherQuery?  Is the other query explain functionality
important?)
- review/trim internal api.  Not as crucial as the above, but still
important.  An example is that fields have two write() methods, one
for the old XMLWriter and another for a generic TextResponseWriter.
Perhaps we could make a parent interface for output writing so that
this can be reduced to one method (the methods are identical for most
of defined fields).
- remove all deprecated/compatibility code.
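
The parent-interface idea in the second item might look roughly like the
sketch below.  FieldOutput, XmlOutput, TextOutput and StrFieldType are all
hypothetical names for illustration and do not match the real XMLWriter or
TextResponseWriter signatures; escaping is omitted to keep it short:

```java
/** Hypothetical common parent for the two kinds of writers. */
interface FieldOutput {
    void writeStr(String name, String value);
}

/** Stand-in for the XML writer (no escaping in this sketch). */
final class XmlOutput implements FieldOutput {
    final StringBuilder out = new StringBuilder();

    @Override
    public void writeStr(String name, String value) {
        out.append("<str name=\"").append(name).append("\">")
           .append(value).append("</str>");
    }
}

/** Stand-in for a generic text writer. */
final class TextOutput implements FieldOutput {
    final StringBuilder out = new StringBuilder();

    @Override
    public void writeStr(String name, String value) {
        out.append(name).append('=').append(value).append('\n');
    }
}

/** A field type now needs one write() method instead of two. */
final class StrFieldType {
    void write(FieldOutput out, String name, String value) {
        out.writeStr(name, value);
    }
}
```

Field types whose two write() methods are identical today would collapse to
the single method; the few that differ would still need some way to tell the
writers apart.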

-Mike

Re: changes before release?

2006-10-09 Thread Bertrand Delacretaz


On 10/9/06, Yonik Seeley <[EMAIL PROTECTED]> wrote:


On 10/9/06, Bertrand Delacretaz <[EMAIL PROTECTED]> wrote:
> ...Not sure what you mean by "technical level"

There is other stuff that needs to be done before we can release, such
as a change of copyright notices to comply with new ASF requirements


Ok, got it now.


...Perhaps we could link your presentation here?
http://wiki.apache.org/solr/SolrResources


Good idea, done!

My presentation was well received at the Cocoon GetTogether, and I
talked with several people who are considering Solr for their
projects.

The question that usually comes is: how to convince my manager to use
software that is still in incubation, so making a release is certainly
a Good Thing.

-Bertrand

Re: changes before release?

2006-10-09 Thread Yonik Seeley

On 10/9/06, Bertrand Delacretaz <[EMAIL PROTECTED]> wrote:

On 10/9/06, Yonik Seeley <[EMAIL PROTECTED]> wrote:
> ...At the technical level, what do people think should be
> changed/committed before we do?...

Not sure what you mean by "technical level"

There is other stuff that needs to be done before we can release, such
as a change of copyright notices to comply with new ASF requirements.
I imagine incubator-general will go over the first release we make
with a fine-tooth comb looking for compliance with ASF release rules.

...if I'm allowed to
suggest my own patch, I think
http://issues.apache.org/jira/browse/SOLR-49 (XSLTResponseWriter)
could be committed.  It has no impact on existing code and can be
useful for simple setups, demos, etc.

Definitely... I'll take another look at the latest version.

Oh, and thanks for all the work you've been doing getting the word out on Solr!
http://wiki.apache.org/cocoon/GT2006Notes
Perhaps we could link your presentation here?
http://wiki.apache.org/solr/SolrResources

-Yonik

Re: changes before release?

2006-10-09 Thread Bertrand Delacretaz


On 10/9/06, Yonik Seeley <[EMAIL PROTECTED]> wrote:

...At the technical level, what do people think should be
changed/committed before we do?...


Not sure what you mean by "technical level"...if I'm allowed to
suggest my own patch, I think
http://issues.apache.org/jira/browse/SOLR-49 (XSLTResponseWriter)
could be committed.  It has no impact on existing code and can be
useful for simple setups, demos, etc.

-Bertrand

changes before release?

2006-10-09 Thread Yonik Seeley


I think we should plan on making a release soon (end of October?)
At the technical level, what do people think should be
changed/committed before we do?

Off the top of my head, perhaps
 http://issues.apache.org/jira/browse/SOLR-2
because it has to do with interface, and none of the java clients are
really locked down yet.

-Yonik

[jira] Resolved: (SOLR-43) query parameter overhaul

2006-10-09 Thread Yonik Seeley (JIRA)

 [ http://issues.apache.org/jira/browse/SOLR-43?page=all ]

Yonik Seeley resolved SOLR-43.
--

Resolution: Fixed

closing... the removal of deprecated methods should probably be more tied to 
releases.

> query parameter overhaul
> 
>
> Key: SOLR-43
> URL: http://issues.apache.org/jira/browse/SOLR-43
> Project: Solr
>  Issue Type: New Feature
>Reporter: Yonik Seeley
> Assigned To: Yonik Seeley
> Attachments: dismax-solrparams.patch, solrparams.patch, 
> solrparams.patch
>
>
> Goals:
> - per field parameters that fall back to global values
> - defaults in solrconfig.xml per request handler, overridable per
> This is desirable for highlighting additions: 
> http://issues.apache.org/jira/browse/SOLR-37 
> last email thread: 
> http://www.nabble.com/parameter-defaults-and-config-tf2020863.html#a5556298


Re: Re: [jira] Created: (SOLR-52) Lazy Field loading

2006-10-09 Thread Yonik Seeley


On 10/9/06, Chris Hostetter <[EMAIL PROTECTED]> wrote:


: I wouldn't expect there to be much of a difference.  Lazy fields hold
: on to a stream and an offset, and operate by seek()'ing to the right
...

Hmmm... yeah it sounds like it shouldn't matter.  If i get some time i'll
try to do a micro benchmark to compare loading a doc with one field and
then loading the rest lazy vs loading the doc twice.


If lazy loading is ever shown to be a performance problem, a simple
solution would be to have a switch in solrconfig.xml to disable it.

-Yonik

Re: Re: [jira] Created: (SOLR-52) Lazy Field loading

2006-10-09 Thread Chris Hostetter


: I wouldn't expect there to be much of a difference.  Lazy fields hold
: on to a stream and an offset, and operate by seek()'ing to the right
...

Hmmm... yeah it sounds like it shouldn't matter.  If i get some time i'll
try to do a micro benchmark to compare loading a doc with one field and
then loading the rest lazy vs loading the doc twice.

: > An alternate idea: the single arg version could check if an item found in
: > the cache contains lazy fields and if so re-fetch and recache the full
: > Document?
:
: That could work though I wonder if the O(num fields) cost per document
: access is worth it.  Perhaps the document could be stored with a
: "lazy" flag in the cache, to make this check O(1).

right ... checking the individual fields would be a very bad idea.
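
A rough sketch of the flag idea, assuming a plain map-backed cache.
DocCacheSketch and its Entry type are hypothetical and stand in for the
real doc cache; documents are modelled as simple field-name-to-value maps:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.IntFunction;

/** Hypothetical doc cache that remembers whether an entry is partial. */
final class DocCacheSketch {
    /** Cached document plus a flag set when some fields were loaded lazily. */
    static final class Entry {
        final Map<String, String> fields;
        final boolean lazy;

        Entry(Map<String, String> fields, boolean lazy) {
            this.fields = fields;
            this.lazy = lazy;
        }
    }

    private final Map<Integer, Entry> cache = new HashMap<>();
    private final IntFunction<Map<String, String>> fullLoader;

    DocCacheSketch(IntFunction<Map<String, String>> fullLoader) {
        this.fullLoader = fullLoader;
    }

    /** Caches a document that was fetched with only some fields loaded. */
    void putPartial(int docId, Map<String, String> fields) {
        cache.put(docId, new Entry(fields, true));
    }

    /** Single-arg lookup: one flag check, then re-fetch and recache if needed. */
    Map<String, String> doc(int docId) {
        Entry e = cache.get(docId);
        if (e == null || e.lazy) {
            e = new Entry(fullLoader.apply(docId), false);
            cache.put(docId, e);
        }
        return e.fields;
    }
}
```

The check is a single boolean read per lookup, so nothing has to walk the
fields of the cached document.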

: Perhaps instead of a "lazy" flag, the number of real fields could be
: stored in the cache.  On the next document request, if there are more
: than 1-2 more fields requested than "real" in the cache, the full
: document is returned.

that's the kind of crazy, heuristic/AI-ish solution I
love! ... but probably not worth the effort unless we see a demonstrable
problem.

: How much of a problem this is depends also on how often documents are
: hit once in the cache.  If it is more than a few times, the load-once
: behaviour of lazy fields should amortize out the extra cost.

right ... and if it's not more than a few times, you might as well skip
the doc cache completely.

: > 2) why doesn't optimizePreFetchDocs use SolrIndexSearcher.readDocs (looks
: > like cut/paste of method body)
:
: Avoids the allocation of the Document[] array, and is three lines (vs.
: two lines to allocate array and call readDocs).

Ah ... that makes sense.



-Hoss

[jira] Commented: (SOLR-52) Lazy Field loading

2006-10-09 Thread Yonik Seeley (JIRA)

[ 
http://issues.apache.org/jira/browse/SOLR-52?page=comments#action_12440945 ] 

Yonik Seeley commented on SOLR-52:
--

> hence the data will be stored in the field *twice* for some reason

FYI, I just checked in a Lucene fix for this.


Re: Re: [jira] Created: (SOLR-52) Lazy Field loading

2006-10-09 Thread Yonik Seeley


> 3) should we be concerned about letting people specify prefixes/suffixes
> of the fields they want to forcibly load for dynamicFields instead of just
> a Set of names? .. or should we cross that bridge when we come to
> it?  (I ask because we have no cache aware method that takes in a
> FieldSelector, just the one that takes in the Set)

It would be very easy to add a parallel method which takes a
FieldSelector.  My only concern with that is that it might make it
hard to do cache flushing heuristics like you suggested above.


Yeah, I had thought about that and decided it was probably best left
out for now... one can always get the IndexReader and use its methods
to provide uncached doc access with a FieldSelector.

-Yonik
