Try the following settings in your nutch-site.xml:
<property>
  <name>io.map.index.skip</name>
  <value>7</value>
</property>
<property>
  <name>indexer.termIndexInterval</name>
  <value>1024</value>
</property>
The first causes data files to use considerably less memory: with a skip of 7, only every eighth index entry is held in memory.
The second affects index creation, so it must be set before you create the
index you search. It's okay if your segment indexes were created
without it: just (re-)merge the indexes, and the merged index will
pick up the setting and use less memory when searching.
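For reference, re-merging was done with Nutch's index merger tool; a minimal sketch, assuming the usual crawl directory layout (the paths here are examples, not from this mail, and the exact command form may differ by Nutch version):

```
# Merge the per-segment indexes under crawl/indexes into a single
# index at crawl/index. The merged index is written using whatever
# indexer.termIndexInterval is currently set in nutch-site.xml.
bin/nutch merge crawl/index crawl/indexes
```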
Combining these two, I have searched a 40+M page index on a machine using
about 500MB of RAM. That said, search times with such a large index are
not good. At some point, as your collection grows, you will want to
merge multiple indexes containing different subsets of segments, put
each on a separate box, and search them all with distributed search.
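To sketch what that setup looked like in Nutch of this era (hostnames, ports, and paths below are made up for illustration): each box runs a distributed-search server over its local index, and the web front end is pointed at a search-servers.txt file listing those servers via the searcher.dir property.

```
# conf/search-servers.txt on the web front end: one "host port" per line
searchbox1.example.com 9999
searchbox2.example.com 9999

# on each search box, serve its local crawl data:
bin/nutch server 9999 /path/to/crawl
```

The front end then fans each query out to all listed servers and merges the results, so per-box memory stays bounded by that box's subset of the index.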
Doug
Jay Pound wrote:
I'm testing an index of 30 million pages; it requires 1.5GB of RAM to search
using Tomcat 5. I plan on having an index with multiple billion pages, but
if this is how it scales, then even with 16GB of RAM I won't be able to have an
index larger than 320 million pages. How can I distribute the memory
requirements across multiple machines? Or is there another servlet container
(like Resin) that will require less memory to operate? Has anyone else run
into this?
Thanks,
-Jay Pound