Shai Erera wrote:
I think you misunderstood me - ultimately, the process reached 128MB.
However, it was flushing the .fdt file before it reached that. Your
explanation of stored fields accounts for that behavior, but it did
consume 128MB.
Ahh, phew.
Also, the CFS files that were written were of size >200MB (but less
than 256MB) - which does not align with the 128MB setting. But I'm sure
there's a good explanation for that as well :-)
Yes: the fdt/fdx (and term vectors if you had used them) are included
in that CFS file. Though, due to inefficiency of RAM usage I'd
expect all non-stored-field files in a segment to be maybe 64 MB
(assuming 50% RAM efficiency). This means you have really really big
stored fields. Does that sound right?
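
The arithmetic behind that estimate can be sketched as below. This is only a back-of-the-envelope illustration, not anything from Lucene itself: the class and method names are made up for the example, and the 50% RAM efficiency is the assumption stated above, not a measured value.

```java
public class SegmentSizeEstimate {

    // Estimated on-disk size (MB) of the non-stored-field files produced
    // when a full RAM buffer is flushed, given an assumed RAM efficiency.
    public static double nonStoredMb(double ramBufferMb, double ramEfficiency) {
        return ramBufferMb * ramEfficiency;
    }

    // Size (MB) left over for stored fields (and term vectors, if used)
    // once the non-stored estimate is subtracted from the observed CFS size.
    public static double impliedStoredMb(double cfsMb, double ramBufferMb,
                                         double ramEfficiency) {
        return cfsMb - nonStoredMb(ramBufferMb, ramEfficiency);
    }

    public static void main(String[] args) {
        double ramBufferMb = 128.0; // RAM buffer setting from the thread
        double efficiency = 0.5;    // ASSUMPTION: 50% RAM efficiency

        System.out.println("non-stored files (MB): "
                + nonStoredMb(ramBufferMb, efficiency));
        // Observed CFS size was between 200 MB and 256 MB.
        System.out.println("stored fields, lower bound (MB): "
                + impliedStoredMb(200.0, ramBufferMb, efficiency));
        System.out.println("stored fields, upper bound (MB): "
                + impliedStoredMb(256.0, ramBufferMb, efficiency));
    }
}
```

Under those assumptions the stored fields would account for somewhere between 136 MB and 192 MB of each CFS file, which is why they look so large relative to the 128 MB buffer.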
As for the RAMDirectory usage, I would think that if Lucene stored a
true directory in memory, with segments information and all, writing
that to the file system would be as efficient as flushing big chunks of
byte[], without having to process the postings and flush them (god
forbid) one posting element at a time.
Not necessarily. By inserting an intermediate RAMDirectory in
DocumentsWriter we could get better net RAM efficiency, at hopefully
not too much added time cost, than what we have now, as measured by
"size of what's flushed to the filesystem divided by RAM buffer
size", I think. Really it needs testing. DocumentsWriter is forced
to waste some space (much less than before) in order to quickly
update posting lists... so this tradeoff of "flush frequently & merge
them in RAM" vs "only flush to the filesystem when all RAM buffer is
full", may be worthwhile. (We only do the latter today).
The reason I'm worried about the performance of RAM vs.
maxBufferedDocs (MBD) is that I was hoping that with Lucene 2.3, if I
have a machine with 4GB of RAM available for indexing, I'll be able to
utilize it. But according to my small test, setting the RAM buffer to
128 MB or MBD to 10,000 (which consumed around 70 MB) gave the same
performance. So I find myself asking whether flush by RAM usage is more
useful than by MBD (as the documentation states).
I think this is just because performance levels off? I.e., if you set
your RAM buffer size to 70 MB you should see about the same
performance as well?
Mike