Lubos Kosco <[email protected]> writes:

> Great job guys, looks awesome :)
> such optimizations would be needed for other analyzers as well ...
> e.g. in one of Doug's pictures there were other analyzers taking up
> a lot of memory too ...

Right, the way the analyzers currently work is suboptimal in several
ways when it comes to memory usage:

1) They're optimized to minimize reading from disk. But because we do
two passes per file (one to add it to the Lucene index and one to write
its xref to disk), the entire file needs to be kept in memory to avoid
reading it again in the second pass.

2) Each analyzer instance keeps its read buffer for reuse. Because of
(1), the buffer grows to fit the largest file the analyzer has seen.
So once a big file has been processed, the read buffer stays big and
never shrinks again.

3) The analyzers are cached in thread locals, so the total size of the
read buffers may grow up to (number of indexer threads * number of
analyzer classes * size of the largest file).
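Points (2) and (3) can be sketched with a toy example. All the names below are hypothetical stand-ins, not OpenGrok's actual classes; the point is just the pattern of a per-thread cached instance whose reusable buffer only ever grows:

```java
// Toy illustration of points (2) and (3); class and method names are
// made up, not OpenGrok's real code.
class SketchAnalyzer {
    private char[] buf = new char[1024];

    // Reuses the buffer across files, growing it whenever a file is
    // larger than anything seen before. It never shrinks back.
    char[] readBuffer(int fileSize) {
        if (buf.length < fileSize) {
            buf = new char[fileSize];
        }
        return buf;
    }

    int bufferSize() {
        return buf.length;
    }
}

public class ThreadLocalCacheDemo {
    // Point (3): one cached instance per thread (and, in the real
    // indexer, per analyzer class), so retained memory can approach
    // threads * analyzer classes * largest file.
    static final ThreadLocal<SketchAnalyzer> CACHE =
            ThreadLocal.withInitial(SketchAnalyzer::new);

    public static void main(String[] args) {
        SketchAnalyzer a = CACHE.get();
        a.readBuffer(10 * 1024 * 1024); // one 10 MB file...
        a.readBuffer(100);              // ...then a tiny one
        System.out.println(a.bufferSize()); // still 10485760
    }
}
```

Because the instance lives in a thread local, that 10 MB buffer stays reachable for the lifetime of the indexer thread, even if every later file is tiny.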

Some experiments we could run to see how they affect both memory
consumption and indexer performance:

a) Read the file from disk again when generating the xref. Then the
analysis phase doesn't need to make the read buffer fit the entire file,
so we avoid these enormous read buffers.

b) Remove the caching of analyzer instances in thread locals. Instead
generate a fresh instance when needed. Then the unnecessarily big read
buffers can be garbage collected.
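Experiment (b) might look roughly like this. Again a hedged sketch with made-up names, not the actual OpenGrok code: a throwaway analyzer is constructed per file, so an oversized buffer becomes garbage as soon as that file is finished.

```java
// Sketch of experiment (b); ThrowawayAnalyzer is a hypothetical
// stand-in for a real analyzer class.
class ThrowawayAnalyzer {
    private char[] buf = new char[1024];

    char[] readBuffer(int fileSize) {
        if (buf.length < fileSize) {
            buf = new char[fileSize];
        }
        return buf;
    }
}

public class FreshInstanceDemo {
    // A fresh instance per file: the buffer can be garbage collected
    // as soon as analyzeOnce returns, instead of lingering in a
    // thread local for the life of the indexer thread.
    static int analyzeOnce(int fileSize) {
        ThrowawayAnalyzer a = new ThrowawayAnalyzer();
        return a.readBuffer(fileSize).length;
    }

    public static void main(String[] args) {
        System.out.println(analyzeOnce(10 * 1024 * 1024)); // 10485760
        System.out.println(analyzeOnce(100)); // 1024: no 10 MB buffer survives
    }
}
```

The trade-off is extra allocation and GC pressure per file, which is exactly what measuring both memory consumption and indexer performance would tell us about.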

-- 
Knut Anders
_______________________________________________
opengrok-dev mailing list
[email protected]
http://mail.opensolaris.org/mailman/listinfo/opengrok-dev