>> In my case I have to switch to MMap/Buffers, Java behaves ugly with
>> 8Gb heaps.
> Do you mean that because garbage collection does not perform well
> on these larger heaps, one should avoid creating arrays that need heaps
> of that size, and rather use (direct) MMap/Buffers?

Yes, exactly. Keeping big Directories in heap is painful in many ways:

1. Old-gen GC is slow on big heaps. Our 3Gb heaps were collected for
6-8 seconds with the parallel collector on four-way machines. The
concurrent collector consistently core dumps, whatever the settings :)
Then we tried increasing heaps (up to 8Gb) in pursuit of fewer machines
in the cluster, and it just collected for an eternity. (Illustrative
collector flags are sketched right after this list.)

2. The eden-survivor-old chain showers sparks around when you feed it
huge arrays created in large numbers. Your new-gen GCs stay swift
(100-200ms), but happen too often, and as a consequence some short-lived
objects start leaking into old-gen.

3. You have to reserve space for merges. Fully optimizing an index is
very taxing; I cheat by turning away outside requests, switching off the
memory cache, optimizing, then putting everything back in place.
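To make the collector talk above concrete, the two collectors map to JVM
flags along these lines. A rough sketch only: the heap sizes are
illustrative rather than our production settings, and "Server" is a
stand-in for the real main class:

  java -Xmx3g -XX:+UseParallelGC Server        # parallel (throughput) collector
  java -Xmx8g -XX:+UseConcMarkSweepGC Server   # concurrent (CMS) collector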
I'm currently testing the mmap approach, and despite Sun's braindead
API, it works like a charm.

While I'm at it, I have two more questions about MMapDirectory:

How often is openInput() called for a file? Is it worth doing
getChannel().map() once, when the file is written and closed, and then
cloning the buffer for each openInput()?

Why don't you load() a newly mapped buffer? It would save the first few
searches that hit a new segment from page-faulting and waiting for the
segment to be read in.

--
Kirill Zakharenko/Кирилл Захаренко (ear...@gmail.com)
Home / Mobile: +7 (495) 683-567-4 / +7 (903) 5-888-423
ICQ: 104465785
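P.S. Here is roughly what I mean by map-once-and-clone, as a minimal
sketch. This is not MMapDirectory's actual code; the class and method
names are made up for illustration, and error handling is pared down:

  import java.io.File;
  import java.io.IOException;
  import java.io.RandomAccessFile;
  import java.nio.ByteBuffer;
  import java.nio.MappedByteBuffer;
  import java.nio.channels.FileChannel;

  public class MapOnceSketch {
      private final MappedByteBuffer master;

      // Map the whole file once, right after it has been written and closed.
      // NB: a single mapping is capped at 2Gb; bigger files need several buffers.
      public MapOnceSketch(File f) throws IOException {
          RandomAccessFile raf = new RandomAccessFile(f, "r");
          try {
              FileChannel ch = raf.getChannel();
              master = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
              // Touch every page up front, so the first searches against a
              // new segment don't stall on page faults.
              master.load();
          } finally {
              raf.close(); // the mapping stays valid after the channel closes
          }
      }

      // Per-reader "clone": duplicate() shares the mapped pages but gives
      // each caller its own independent position and limit.
      public ByteBuffer openInput() {
          return master.duplicate();
      }
  }

duplicate() copies nothing, so the per-openInput() cost is negligible;
the one-time load() simply moves the page-fault cost from the first few
searches to segment-open time.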