On Thu, Jan 31, 2013 at 7:07 AM, Gili Nachum <gi...@il.ibm.com> wrote:

> So, when loading the results I want to return (say 10 documents), if not
> all docs fit in RAM, I would incur up to 10 individual disk seek
> operations. Which will kill my performance. Is that correct?

Yes, 10 seeks, and that may or may not kill your performance ... it
depends on how fast your disks are (e.g. SSD seek cost is minor).

Also note that this is not an MMapDirectory problem: any time your
index cannot fit in RAM, no matter which directory impl you use,
you'll be seeking.
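
As a rough illustration of why the disk type matters, the worst case is
simply (documents loaded) x (cost of one seek). The sketch below uses
assumed round-number latencies (about 10 ms per seek for a spinning disk,
about 0.1 ms for an SSD); these are not measurements, and the class and
method names are made up for this example.

```java
class SeekCostEstimate {
    // Worst case: every document misses the OS page cache and costs one seek.
    static double worstCaseMillis(int docsToLoad, double seekMillis) {
        return docsToLoad * seekMillis;
    }

    public static void main(String[] args) {
        int docs = 10;           // documents returned per query, as in the question
        double hddSeekMs = 10.0; // ASSUMED typical spinning-disk seek latency
        double ssdSeekMs = 0.1;  // ASSUMED typical SSD random-read latency

        System.out.printf("HDD worst case: %.1f ms%n", worstCaseMillis(docs, hddSeekMs));
        System.out.printf("SSD worst case: %.1f ms%n", worstCaseMillis(docs, ssdSeekMs));
    }
}
```

Under these assumptions the same 10 seeks differ by about two orders of
magnitude, which is why the answer is "may or may not".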

> Considering my alternatives:
> 1. Create another separate lean index that would fit in RAM.
> 2. Keep stored fields to a minimum, store infrequently accessed stored
> fields outside of Lucene.

You could also use DocValues (in 4.0/4.1, but the API is likely changing
in 4.2), which stores fields "column stride".  This way your rarely
accessed fields would pay the seek cost, but your commonly accessed
fields, if they are small enough, could fit into available RAM.
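
To make "column stride" concrete, here is a toy sketch in plain Java. It
is NOT the Lucene DocValues API (which, as noted, is still changing); the
class ColumnStrideSketch and its methods are hypothetical. It only shows
the layout idea: one dense array per field, indexed by document ID, so a
small, frequently read field occupies a compact region of its own instead
of being interleaved with every other field of each document.

```java
import java.util.HashMap;
import java.util.Map;

class ColumnStrideSketch {
    // One dense array per field; the array index is the document ID.
    private final Map<String, long[]> columns = new HashMap<>();
    private final int maxDoc;

    ColumnStrideSketch(int maxDoc) {
        this.maxDoc = maxDoc;
    }

    void set(String field, int docId, long value) {
        columns.computeIfAbsent(field, f -> new long[maxDoc])[docId] = value;
    }

    long get(String field, int docId) {
        long[] column = columns.get(field);
        if (column == null) {
            throw new IllegalArgumentException("unknown field: " + field);
        }
        return column[docId];
    }

    // Size of one numeric column in this toy layout: 8 bytes per document.
    long bytesPerColumn() {
        return 8L * maxDoc;
    }

    public static void main(String[] args) {
        ColumnStrideSketch values = new ColumnStrideSketch(1_000_000);
        values.set("price", 42, 1999L);
        System.out.println("price of doc 42: " + values.get("price", 42));
        System.out.println("bytes for one column: " + values.bytesPerColumn());
    }
}
```

In this toy layout a numeric field over a million documents takes about
8 MB, so a handful of hot fields can stay cached even when the full
stored documents cannot.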

> In this particular use case, it would have really helped if I could tell
> Lucene which stored fields should be eagerly read when loading a document,
> and which should be lazily loaded from elsewhere on the disk, thereby
> fitting into memory those stored fields that are frequently needed.
> I guess my use case is too specific?

You could also make a custom StoredFieldsFormat (part of your Codec)
that does exactly this.
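
The eager/lazy split itself can be sketched in plain Java. This is only
the access pattern, not a StoredFieldsFormat implementation and not any
Lucene API; HotColdFields and the coldLoader function are hypothetical
names, and the loader stands in for a read that may hit the disk.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

class HotColdFields {
    private final Map<String, String> hot;              // loaded eagerly, kept small
    private final Function<String, String> coldLoader;  // stands in for a disk read
    private final Map<String, String> coldCache = new HashMap<>();
    int coldLoads = 0;                                   // counts simulated disk reads

    HotColdFields(Map<String, String> hot, Function<String, String> coldLoader) {
        this.hot = hot;
        this.coldLoader = coldLoader;
    }

    String get(String field) {
        String value = hot.get(field);
        if (value != null) {
            return value; // served from memory, no simulated seek
        }
        // Cold field: load on first access only, then reuse the cached value.
        return coldCache.computeIfAbsent(field, f -> {
            coldLoads++;
            return coldLoader.apply(f);
        });
    }

    public static void main(String[] args) {
        Map<String, String> hot = new HashMap<>();
        hot.put("title", "Lucene in RAM");
        HotColdFields doc = new HotColdFields(hot, f -> "<large value of " + f + ">");

        System.out.println(doc.get("title"));
        System.out.println(doc.get("body"));
        System.out.println("simulated disk reads: " + doc.coldLoads);
    }
}
```

Only the cold path pays the (simulated) seek, and only when a caller
actually asks for that field.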

Mike McCandless

http://blog.mikemccandless.com
