I've been running into an interesting situation that I wanted to ask about.
I've been doing some testing by building up indexes with code that looks like this...

    IndexWriter writer = null;
    try {
      writer = new IndexWriter("index", new StandardAnalyzer(), true);
      writer.mergeFactor = MERGE_FACTOR;

      PooledExecutor queue = new PooledExecutor(NUM_UPDATE_THREADS);
      queue.waitWhenBlocked();

      for (int min = low; min < high; min += BATCH_SIZE) {
        int max = min + BATCH_SIZE;
        if (high < max) {
          max = high;
        }
        queue.execute(new BatchIndexer(writer, min, max));
      }

      end = new Date();
      System.out.println("Build Time: " + (end.getTime() - start.getTime()) + "ms");
      start = end;

      writer.optimize();
    } finally {
      if (null != writer) {
        try {
          writer.close();
        } catch (Exception ignore) { /* NOOP */ }
      }
    }
    end = new Date();
    System.out.println("Optimize Time: " + (end.getTime() - start.getTime()) + "ms");

(BatchIndexer is a class I have that gets a DB connection, slurps all the records from my DB between min and max, builds some simple Documents out of them, and calls writer.addDocument(doc) on each one. A skeletal version is at the bottom of this message.)

This was working fine with small ranges, but then I tried building up a nice big index for doing some performance testing. I left it running overnight, and when I came back in the morning I discovered that after successfully building up the whole index (~112K docs, ~1.5GB on disk) it had crashed with an OutOfMemoryError while trying to optimize. I then realized I was only running my JVM with a 256m upper limit on RAM, and I figured that since the PooledExecutor was still in scope, maybe it was holding on to some state that was using up a lot of space, so I whipped up a quick little app to solve my problem...

    public static void main(String[] args) throws Exception {
      IndexWriter writer = null;
      try {
        writer = new IndexWriter("index", new StandardAnalyzer(), false);
        writer.optimize();
      } finally {
        if (null != writer) {
          try {
            writer.close();
          } catch (Exception ignore) { /* NOOP */ }
        }
      }
    }

...but I was disappointed to discover that even this couldn't run with only 256m of RAM. I bumped it up to 512m and then it managed to complete successfully (the final optimized index was only 1.1GB on disk).

This raises a few questions in my mind:

1) Is there a rule of thumb for knowing how much memory it takes to optimize an index?

2) Is there a "best practice" to follow when building up a large index from scratch in order to reduce the amount of memory needed to optimize once the whole index is built? (e.g., would spinning up a thread that called writer.optimize() every N minutes be a good idea? See the sketch at the end of this message for what I have in mind.)

3) Given an unoptimized index that's already been built (i.e., in the case where my builder crashed and I wanted to try to optimize it without rebuilding from scratch), is there any way to get IndexWriter to use less RAM and more disk -- trading speed for a smaller memory footprint, and apparently greater stability, so that the app doesn't crash?

I imagine the answers to #1 and #2 are largely dependent on the nature of the data in the index (i.e., the frequency of terms), but I'm wondering if there is a high-level formula that could be used to say "based on the nature of your data, you want to take this approach to optimizing when you build."

-Hoss
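P.S. In case it helps to see what the worker threads are doing, here is a skeletal version of BatchIndexer. The JDBC URL, table name, column names, and connection handling are simplified stand-ins for what I actually use, but the shape is the same: grab the rows in [min, max), build a Document per row, and hand each one to the shared writer.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;

    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.index.IndexWriter;

    public class BatchIndexer implements Runnable {
      private final IndexWriter writer;
      private final int min;
      private final int max;

      public BatchIndexer(IndexWriter writer, int min, int max) {
        this.writer = writer;
        this.min = min;
        this.max = max;
      }

      public void run() {
        try {
          // "jdbc:...", "records", "id", and "body" are placeholders,
          // not the real connection string or schema
          Connection conn = DriverManager.getConnection("jdbc:...");
          try {
            PreparedStatement ps = conn.prepareStatement(
                "SELECT id, body FROM records WHERE id >= ? AND id < ?");
            ps.setInt(1, min);
            ps.setInt(2, max);
            ResultSet rs = ps.executeQuery();
            while (rs.next()) {
              // one simple Document per DB record
              Document doc = new Document();
              doc.add(Field.Keyword("id", rs.getString("id")));
              doc.add(Field.Text("body", rs.getString("body")));
              writer.addDocument(doc);
            }
            rs.close();
            ps.close();
          } finally {
            conn.close();
          }
        } catch (Exception e) {
          e.printStackTrace();
        }
      }
    }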
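P.P.S. To clarify the parenthetical in question #2, the "optimize every N minutes" idea I'm picturing is something like the following. This is a completely untested sketch; the interval, the stop mechanism, and the class name are all made up, and whether it actually reduces the memory needed by the final optimize is exactly what I'm asking about.

    import java.io.IOException;

    import org.apache.lucene.index.IndexWriter;

    // Untested sketch of the "optimize every N minutes" idea from question #2.
    public class PeriodicOptimizer extends Thread {
      private final IndexWriter writer;
      private final long intervalMillis;
      private volatile boolean stopped = false;

      public PeriodicOptimizer(IndexWriter writer, long intervalMillis) {
        this.writer = writer;
        this.intervalMillis = intervalMillis;
        setDaemon(true);
      }

      public void stopOptimizing() {
        stopped = true;
        interrupt();
      }

      public void run() {
        while (!stopped) {
          try {
            Thread.sleep(intervalMillis);
            // merge down whatever has been built so far; how this interacts
            // with the indexing threads sharing the same writer is part of
            // the question
            writer.optimize();
          } catch (InterruptedException e) {
            return;
          } catch (IOException e) {
            e.printStackTrace();
          }
        }
      }
    }

In the builder above it would be started right after the PooledExecutor is created and stopped just before the final optimize()/close().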