This is actually [sadly] expected. It shows that your RAM efficiency is ~50% (well, less, if the segment also has stored fields / term vectors).
This is because the in-RAM data structures cannot be 100% efficient: they must leave room to "grow" the individual postings. Once the segment is written to disk, the format is compacted compared to what's in RAM.

Mike

http://blog.mikemccandless.com

On Thu, Apr 14, 2011 at 7:21 AM, Shai Erera <[email protected]> wrote:
> Hi
>
> I'm indexing w/ IW, flush-by-RAM=off and flush-by-doc=MAX_INT. Whenever
> iw.ramSizeInBytes() >= threshold, I commit the changes, serialize the
> Directory somewhere and start with a new Directory and IW instance.
>
> The threshold is currently 32MB. I noticed though that the size of the
> serialized Directory is nearly half (<16 MB). Is that expected? Will I see
> that behavior every time (e.g. w/ large stored fields), or is it data
> dependent? I assume that the data can affect the compression, but I never
> thought that by 50% factor, from RAM to disk.
>
> Shai
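To put numbers on the ratio discussed in this thread, here is a minimal sketch of the arithmetic only. It does not call Lucene; the class and method names (RamToDiskEstimate, efficiency, ramThresholdFor) are hypothetical, and the 32 MB / 16 MB figures are the ones reported above. It assumes the observed ratio stays roughly constant, which may not hold since the ratio is data dependent.

```java
public final class RamToDiskEstimate {

    /** Observed fraction of the in-RAM size that ended up on disk (hypothetical helper). */
    public static double efficiency(long ramBytes, long diskBytes) {
        if (ramBytes <= 0) {
            throw new IllegalArgumentException("ramBytes must be positive");
        }
        return (double) diskBytes / ramBytes;
    }

    /**
     * RAM threshold that would give roughly targetDiskBytes on disk,
     * assuming the observed efficiency carries over to future segments.
     */
    public static long ramThresholdFor(long targetDiskBytes, double efficiency) {
        if (efficiency <= 0.0) {
            throw new IllegalArgumentException("efficiency must be positive");
        }
        return Math.round(targetDiskBytes / efficiency);
    }

    public static void main(String[] args) {
        long mb = 1L << 20;

        // Figures from the thread: 32 MB reported in RAM, about 16 MB once serialized.
        double observed = efficiency(32 * mb, 16 * mb);
        System.out.println(observed); // prints 0.5

        // To end up with ~32 MB on disk at that ratio, the RAM threshold would be ~64 MB.
        System.out.println(ramThresholdFor(32 * mb, observed) / mb); // prints 64
    }
}
```

Measuring the ratio on a few real batches before picking a threshold is probably safer than relying on the 50% figure, given that stored fields and term vectors change it.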
