Shai Erera wrote:
I think you misunderstood me - ultimately, the process reached 128MB.
However it was flushing the .fdt file before it reached that. Your
explanation on stored fields explains that behavior, but it did
consume 128MB.

Ahh, phew.

Also, the CFS files that were written were larger than 200MB (but smaller than 256MB), which does not align with the 128MB setting. But I'm sure there's a
good explanation for that as well :-)

Yes: the fdt/fdx (and term vectors if you had used them) are included in that CFS file. Though, due to inefficiency of RAM usage I'd expect all non-stored-field files in a segment to be maybe 64 MB (assuming 50% RAM efficiency). This means you have really really big stored fields. Does that sound right?
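To make that arithmetic concrete, here is a rough back-of-the-envelope sketch in Java. The 50% RAM efficiency is only the guess stated above, the class and method names are hypothetical, and nothing here calls Lucene; it also ignores term vectors and any overhead of the CFS container itself.

```java
// Back-of-the-envelope estimate only; this is not Lucene code.
final class CfsSizeEstimate {
    // Estimated on-disk size (MB) of the non-stored-field files produced by one flush,
    // assuming the given RAM efficiency (0.5 is the 50% guess from the thread).
    static double nonStoredFieldsMB(double ramBufferMB, double ramEfficiency) {
        return ramBufferMB * ramEfficiency;
    }

    // Whatever remains of the observed CFS size is attributed to stored fields (fdt/fdx).
    static double storedFieldsMB(double cfsMB, double ramBufferMB, double ramEfficiency) {
        return cfsMB - nonStoredFieldsMB(ramBufferMB, ramEfficiency);
    }
}
```

With a 128 MB buffer, nonStoredFieldsMB(128, 0.5) gives 64 MB, so a CFS file between 200 MB and 256 MB would imply roughly 136 MB to 192 MB of stored fields per segment under these assumptions.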

As for the RAMDirectory usage, I would think that if Lucene stored a true directory in memory, with segments information and all, writing that to the file system would be as efficient as flushing big chunks of byte[], not
having to process the postings and flush them (god forbid) one posting
element at a time.

Not necessarily. By inserting an intermediate RAMDirectory in DocumentsWriter we could get better net RAM efficiency than what we have now, at hopefully not too much added time cost, as measured by "size of what's flushed to the filesystem divided by RAM buffer size", I think. Really it needs testing. DocumentsWriter is forced to waste some space (much less than before) in order to quickly update posting lists... so this tradeoff of "flush frequently & merge them in RAM" vs "only flush to the filesystem when the whole RAM buffer is full" may be worthwhile. (We only do the latter today.)

The reason I'm worried about the performance of RAM vs. maxBufferedDocs (MBD) is that I was hoping that with Lucene 2.3, if I have a machine with 4GB of RAM available for indexing, I'll be able to utilize it. But according to my small test, setting RAM to 128 MB or MBD to 10,000 (which consumed around 70 MB) gave the same performance. So I find myself asking whether flushing by RAM
usage is more useful than by MBD (as the documentation states).

I think this is just because performance levels off? Ie, if you set your RAM buffer size to 70 MB you should see about the same performance as well?
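A rough way to see why the two settings may land close together, sketched in Java. It assumes RAM usage grows linearly with document count, which is a simplification, and uses only the numbers reported above (10,000 docs consuming about 70 MB); the class and method names are hypothetical and this is not Lucene code.

```java
// Rough estimate only; assumes buffered RAM grows linearly with document count.
final class FlushEstimate {
    // Approximate number of documents buffered before a RAM-triggered flush,
    // scaled from an observed (docs, MB) measurement.
    static long docsPerFlush(long ramBufferMB, long observedDocs, long observedMB) {
        return ramBufferMB * observedDocs / observedMB;
    }
}
```

Under that assumption, docsPerFlush(128, 10000, 70) is about 18,285 documents per flush, less than twice the 10,000 of the MBD setting. If throughput has already levelled off around 70 MB, that difference would not be expected to show up in a small test.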

Mike
