Re: FieldCache usage for custom field collapse in solr 1.4

2010-12-06 Thread Adam H.
Hey Yonik.
Thanks for clarifying.
The reason I went with rolling my own: I asked previously whether there's any
plan to back-port field collapsing to Solr 1.4, and
I understood that it's not at all straightforward.

If you think it'd be fairly easy to look at the new code in Solr 4.0 trunk
and use that as a basis, for example, I'd go ahead and do that.

Q - does the field collapse component expect the field being collapsed on to be
stored, or does it also use FieldCache trickery?

Thanks,
Adam

On Mon, Dec 6, 2010 at 9:42 AM, Yonik Seeley yo...@lucidimagination.com wrote:

 On Sun, Dec 5, 2010 at 6:12 PM, Adam H. jimmoe...@gmail.com wrote:
  StringIndex fieldCacheVals = FieldCache.DEFAULT.getStringIndex(reader,
  collapseField);
 
  where 'reader' is the instance of the SolrIndexReader passed along to the
  component with the ResponseBuilder.SolrQueryRequest object.
 
  As I understand, this can double memory usage due to (re)loading this
  fieldcache on a reader-wide basis rather than on a per segment basis?

 Yep.  Sorting and function queries use per-segment FieldCache entries.
 So if you also request a FieldCache entry from the top-level reader, it
 won't reuse the per-segment caches and hence will take up 2x the memory
 compared with using per-segment entries alone.

 Solr's field collapsing already works on a per-segment basis... if
 your needs are at all general, it could make sense to try and get it
 rolled into solr rather than implementing custom code.

 -Yonik
 http://www.lucidimagination.com

 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org




Re: FieldCache usage for custom field collapse in solr 1.4

2010-12-06 Thread Yonik Seeley
On Mon, Dec 6, 2010 at 3:24 PM, Adam H. jimmoe...@gmail.com wrote:
 Hey Yonik.
 Thanks for clarifying.
 The reason I went with rolling my own: I asked previously whether there's any
 plan to back-port field collapsing to Solr 1.4, and
 I understood that it's not at all straightforward.

Ahhh... I'd just use trunk if possible ;-)

The risk of being in production on custom code that no one else uses
is perhaps greater than that of running a widely used development version.

But yes... I don't see a backport happening for 1.4

-Yonik
http://www.lucidimagination.com




Re: FieldCache usage for custom field collapse in solr 1.4

2010-12-06 Thread Adam H.
Fair enough - I might give it a shot if, to your mind, most functionality is
compatible with Solr 1.4.1 and it's fairly stable?

One last Q regarding correct usage of per-segment FieldCache in Solr
components -

Since this is something I might also have issues with elsewhere, and I
suspect other people who work on custom logic will as well,
I think it would be useful to have some documentation and/or a simple
programmatic interface for implementing the
correct access path to these inside a custom SolrComponent.

I looked around the Grouping code a bit and have yet to fully understand
what's going on, but is the ValueSource supposed to take care of access to the
underlying field?





Re: FieldCache usage for custom field collapse in solr 1.4

2010-12-06 Thread Yonik Seeley
On Mon, Dec 6, 2010 at 3:41 PM, Adam H. jimmoe...@gmail.com wrote:
 Fair enough - I might give it a shot if, to your mind, most functionality is
 compatible with Solr 1.4.1 and it's fairly stable?

Yes, the external APIs are very compatible.
The internal APIs - not so much.
You should reindex also.

 One last Q regarding correct usage of per-segment FieldCache in Solr
 components -

 Since this is something I might also have issues with elsewhere, and I
 suspect other people who work on custom logic will as well,
 I think it would be useful to have some documentation and/or a simple
 programmatic interface for implementing the
 correct access path to these inside a custom SolrComponent.

 I looked around the Grouping code a bit and have yet to fully understand
 what's going on, but is the ValueSource supposed to take care of access to the
 underlying field?

Yes - you can actually group on arbitrary function queries even.
That will be more useful when we add some bucketing functions.

-Yonik
http://www.lucidimagination.com




Re: FieldCache usage for custom field collapse in solr 1.4

2010-12-06 Thread Ryan McKinley
On Mon, Dec 6, 2010 at 4:02 PM, Yonik Seeley yo...@lucidimagination.com wrote:
 On Mon, Dec 6, 2010 at 3:41 PM, Adam H. jimmoe...@gmail.com wrote:
 Fair enough - I might give it a shot if, to your mind, most functionality is
 compatible with Solr 1.4.1 and it's fairly stable?

 Yes, the external APIs are very compatible.
 The internal APIs - not so much.
 You should reindex also.

And don't be (too) surprised if things change before the official 4.x
release -- the chances are good that something will change that may
require reindexing.

ryan




Re: FieldCache usage for custom field collapse in solr 1.4

2010-12-06 Thread Adam H.
So,
summing up all the information I now have, and given that I have some
additional custom components that use the FieldCache,
so that getting field collapsing by migrating to Solr 4.0
is not a complete solution to my problems,

it seems more and more like I might have to actually implement a
custom Solr QueryComponent, to which I would pass
multiple collectors (perhaps via some kind of MultiCollector interface,
similar to what Grouping uses) that do their appropriate field value
collection/aggregation
as results are being fetched.

In other words, using a per-segment FieldCache collection as a
post-processing step (e.g. after QueryComponent has done its collection) does not
seem at all trivial, if possible at all (is it possible?).
Is this accurate?

Thanks again for all the info here..

Adam





Re: FieldCache usage for custom field collapse in solr 1.4

2010-12-06 Thread Yonik Seeley
On Mon, Dec 6, 2010 at 5:48 PM, Adam H. jimmoe...@gmail.com wrote:
 In other words, using a per-segment FieldCache collection as a
 post-processing step (e.g. after QueryComponent has done its collection) does not
 seem at all trivial, if possible at all (is it possible?)

Sure, it's possible, and not too hard (as long as no sort field involves score).
Just instruct the QueryComponent to retrieve the set of all matching
documents, then you can use that to run them through whatever
collectors you want again.  I've been meaning to implement this
optimization for field collapsing...

Depending on the details, either replacing the QueryComponent with
your custom one, or inserting an additional component after the query
component could make sense.
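
A minimal sketch of the replay idea described above, written in plain Java so it runs without Lucene on the classpath. It assumes the matching documents are available as ascending top-level doc ids and that each segment's starting offset (docBase) is known. SegmentCollector, replay and trace are hypothetical names invented for illustration, not Solr or Lucene APIs. No scores are available on such a second pass, which fits the caveat about sort fields that involve score.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model: replay already-matched doc ids through a collector, one segment at a time. */
final class SegmentReplaySketch {

    /** Hypothetical collector shape (an assumption, not a real Lucene/Solr interface). */
    interface SegmentCollector {
        void setNextSegment(int segmentIndex, int docBase); // called before a segment's first hit
        void collect(int segmentLocalDoc);                  // doc id relative to that segment
    }

    /**
     * @param sortedDocIds matching top-level doc ids, ascending
     * @param docBases     first top-level doc id of each segment, ascending, docBases[0] == 0
     * @param maxDoc       total number of documents across all segments
     */
    static void replay(int[] sortedDocIds, int[] docBases, int maxDoc, SegmentCollector collector) {
        int seg = 0;
        int announced = -1; // last segment passed to setNextSegment
        for (int doc : sortedDocIds) {
            if (doc < docBases[seg] || doc >= maxDoc) {
                throw new IllegalArgumentException("doc id out of range or out of order: " + doc);
            }
            // Move forward until doc falls inside segment seg.
            while (seg + 1 < docBases.length && doc >= docBases[seg + 1]) {
                seg++;
            }
            if (seg != announced) {
                collector.setNextSegment(seg, docBases[seg]);
                announced = seg;
            }
            collector.collect(doc - docBases[seg]);
        }
    }

    /** Records the calls a collector would receive, for inspection. */
    static List<String> trace(int[] sortedDocIds, int[] docBases, int maxDoc) {
        final List<String> events = new ArrayList<String>();
        replay(sortedDocIds, docBases, maxDoc, new SegmentCollector() {
            @Override
            public void setNextSegment(int segmentIndex, int docBase) {
                events.add("segment " + segmentIndex + " base " + docBase);
            }

            @Override
            public void collect(int segmentLocalDoc) {
                events.add("collect " + segmentLocalDoc);
            }
        });
        return events;
    }

    public static void main(String[] args) {
        // Three segments starting at doc 0, 5 and 8; twelve documents in total.
        for (String event : trace(new int[] {1, 4, 9, 11}, new int[] {0, 5, 8}, 12)) {
            System.out.println(event);
        }
    }
}
```

Segments without hits are skipped entirely, so a collector only has to load per-segment values for segments that actually contain matches.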

-Yonik
http://www.lucidimagination.com




Re: FieldCache usage for custom field collapse in solr 1.4

2010-12-06 Thread Adam H.
Ah! Just so I can get cracking on this - can you be a little more
specific? E.g.:

In my component implementation that runs in the request handling after the
normal QueryComponent,
how would I access the specific field value for the documents that were
retrieved?

I.e. how would it fit into code like this, if at all:

// docList holds the matching documents for the given offset/rows/query
DocIterator it = docList.iterator();

while (it.hasNext()) {
    int docId = it.nextDoc();
    float score = it.score();

    // this would've worked if this were a stored field:
    // reader.document(docId).get(fieldName)
    // ?? what goes here for a non-stored field ??
}
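
One possible shape for the missing step, sketched in plain Java rather than against the real Solr classes. It assumes the caller already has each segment's starting offset and a per-segment value structure shaped like the order and lookup arrays of Lucene's FieldCache.StringIndex. How those are obtained from the reader in Solr 1.4 is not covered in this thread, so SegmentValues, segmentOf and fieldValue are hypothetical stand-ins.

```java
import java.util.Arrays;

/** Toy model: map a top-level doc id to its segment, then read that segment's cached value. */
final class PerSegmentLookupSketch {

    /** Stand-in for one segment's cached values, shaped like StringIndex's order/lookup arrays. */
    static final class SegmentValues {
        final int[] order;     // per segment-local doc: index into lookup
        final String[] lookup; // distinct values; in this model slot 0 is null ("no value")

        SegmentValues(int[] order, String[] lookup) {
            this.order = order;
            this.lookup = lookup;
        }

        String valueOf(int segmentLocalDoc) {
            return lookup[order[segmentLocalDoc]];
        }
    }

    /** Index of the segment containing topLevelDoc; docBases is ascending with no repeats. */
    static int segmentOf(int topLevelDoc, int[] docBases) {
        int idx = Arrays.binarySearch(docBases, topLevelDoc);
        // Exact hit: the doc is the first doc of that segment. Otherwise binarySearch
        // returns -(insertionPoint) - 1, and the owning segment is insertionPoint - 1.
        return idx >= 0 ? idx : -idx - 2;
    }

    /** Value of the field for a top-level doc id, read from the owning segment only. */
    static String fieldValue(int topLevelDoc, int[] docBases, SegmentValues[] segments) {
        int seg = segmentOf(topLevelDoc, docBases);
        return segments[seg].valueOf(topLevelDoc - docBases[seg]);
    }

    public static void main(String[] args) {
        // Hypothetical index: segment 0 holds docs 0..4, segment 1 holds docs 5..7.
        SegmentValues[] segments = {
            new SegmentValues(new int[] {1, 2, 1, 0, 2}, new String[] {null, "a", "b"}),
            new SegmentValues(new int[] {1, 1, 0}, new String[] {null, "c"})
        };
        int[] docBases = {0, 5};
        for (int doc = 0; doc < 8; doc++) {
            System.out.println(doc + " -> " + fieldValue(doc, docBases, segments));
        }
    }
}
```

Inside the loop above, the placeholder line would then become a call along the lines of fieldValue(docId, docBases, segments), with no top-level cache entry involved.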







Re: FieldCache usage for custom field collapse in solr 1.4

2010-12-06 Thread Adam H.
One more comment/question -
Having looked at the Solr stats panel, I do not see detailed memory usage
for the field I'm collapsing on in the Lucene FieldCache entry listings.

As I understand it (after having looked through this ticket:
https://issues.apache.org/jira/browse/SOLR-1292 ), this means that it's not
an 'insanity' instance,
and so I am not actually using double the memory, but rather only have this
field in the FieldCache at the whole-index level.

This got me thinking - if I'm not using any segment-level field caching for
this field, there's no reason not to use an index-wide one,
as long as I can guarantee that's the only use of this field in the
FieldCache... is this correct?
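
The memory accounting behind this question can be sketched with a toy cache. It assumes entries are keyed by reader instance plus field name, which is what the earlier explanation in this thread implies. ToyReader, ToyFieldCache and slotsAfter are hypothetical names and the document counts are made up. The sketch models only how many per-document slots end up cached, nothing else about the real FieldCache.

```java
import java.util.HashMap;
import java.util.IdentityHashMap;
import java.util.Map;

/** Toy model of why top-level plus per-segment cache entries add up to twice the memory. */
final class CacheKeySketch {

    /** Hypothetical stand-in for an index reader; only identity and maxDoc matter here. */
    static final class ToyReader {
        final int maxDoc;

        ToyReader(int maxDoc) {
            this.maxDoc = maxDoc;
        }
    }

    /** Toy cache holding one int-per-document array for each distinct (reader, field) pair. */
    static final class ToyFieldCache {
        private final Map<ToyReader, Map<String, int[]>> entries =
                new IdentityHashMap<ToyReader, Map<String, int[]>>();

        int[] getOrds(ToyReader reader, String field) {
            Map<String, int[]> perReader = entries.get(reader);
            if (perReader == null) {
                perReader = new HashMap<String, int[]>();
                entries.put(reader, perReader);
            }
            int[] ords = perReader.get(field);
            if (ords == null) {
                ords = new int[reader.maxDoc]; // a real cache would fill this from the index
                perReader.put(field, ords);
            }
            return ords;
        }

        /** Total number of cached per-document slots across all entries. */
        long cachedSlots() {
            long total = 0;
            for (Map<String, int[]> perReader : entries.values()) {
                for (int[] ords : perReader.values()) {
                    total += ords.length;
                }
            }
            return total;
        }
    }

    /** Slots cached after the given mix of per-segment and top-level requests. */
    static long slotsAfter(boolean perSegment, boolean topLevel) {
        ToyReader[] segments = {new ToyReader(600), new ToyReader(400)};
        ToyReader top = new ToyReader(1000); // spans both segments
        ToyFieldCache cache = new ToyFieldCache();
        if (perSegment) {
            for (ToyReader segment : segments) {
                cache.getOrds(segment, "collapseField");
            }
        }
        if (topLevel) {
            cache.getOrds(top, "collapseField");
        }
        return cache.cachedSlots();
    }

    public static void main(String[] args) {
        System.out.println("per-segment only: " + slotsAfter(true, false));
        System.out.println("top-level only:   " + slotsAfter(false, true));
        System.out.println("both:             " + slotsAfter(true, true));
    }
}
```

In this model a field requested only from the top-level reader costs the same as one requested only per segment; the doubling appears when both kinds of request hit the same field.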

Thanks again for helping me out with this delicate subject :)

Adam






FieldCache usage for custom field collapse in solr 1.4

2010-12-05 Thread Adam H.
Hey,
I'm trying to use the Lucene FieldCache for a custom field collapsing
implementation: basically I'm collapsing on a non-stored field,
and so am using the FieldCache to retrieve field values at run time.

I noticed I'm getting some OOMs after deploying it, and after looking into
it for a bit, figured that it might have to do with using a call like this:

StringIndex fieldCacheVals = FieldCache.DEFAULT.getStringIndex(reader,
collapseField);

where 'reader' is the instance of the SolrIndexReader passed along to the
component with the ResponseBuilder.SolrQueryRequest object.

As I understand, this can double memory usage due to (re)loading this
fieldcache on a reader-wide basis rather than on a per segment basis?
If so, what would be a way to migrate this code to use a per-segment cache?
I'm not sure I understand the semantics there at all...

Any help will be greatly appreciated, thanks a lot!

Adam