We've run into a similar situation ourselves: we index about a million records into one index, and each record can be fairly large.
Now, what happened on our side was that the index files (very similar in structure to what you have below) grew to a 2 GB limit and stopped there, and the indexer started crashing every time it hit that limit. On your side, your index files don't look anywhere near that large; I think compiling with large file support only really kicks in once you hit the 2 GB mark.

A couple of thoughts that might help:

1. On our side, to keep the size down, I optimize the index every 100,000 documents. The optimize call also flushes the index, so newly added documents become visible to readers. (There's a sketch of this loop at the end of this message.)

2. Make sure you close the index once you've finished indexing. Small thing, but just making sure. Note the ferret-write.lck in your listing: an index that is never closed keeps holding that write lock.

3. With the index being this large, we actually keep two copies: one already-optimized copy that we search against, and another copy that does the indexing. That way, nothing is being searched while indexing is taking place. (Sketch at the end of this message.)

4. One neat thing I learned while indexing large items: I don't actually have to store everything. A field can be set to be tokenized but not stored, so it can still be searched but isn't returned in the results per se. Not storing the big fields kept my index size down. (Sketch at the end of this message.)

> From: "Ben Lee" <[EMAIL PROTECTED]>
> Reply-To: [email protected]
> Date: Tue, 10 Oct 2006 18:35:35 -0700
> To: [email protected]
> Subject: [Ferret-talk] Indexing problem 10.9/10.10
>
> Sorry if this is a repost - I wasn't sure if the www.ruby-forum.com
> list works for postings.
>
> I've been having trouble with indexing a large number of documents (2.4M).
>
> Essentially, I have one process that is following the tutorial,
> dumping documents to an index stored on the file system. If I open
> the index with another process and run the size() method, it is stuck
> at a number of documents much smaller than the number I've added to
> the index. E.g. 290k, when the indexer process has already gone
> through 1M.
>
> Additionally, if I search, I don't get results past an even smaller
> number of docs (22k). I've tried the two latest Ferret releases.
>
> Does this listing of the index directory look right?
>
> -rw------- 1 blee blee 3.8M Oct 10 17:06 _v.fdt
> -rw------- 1 blee blee  51K Oct 10 17:06 _v.fdx
> -rw------- 1 blee blee  12M Oct 10 16:49 _u.cfs
> -rw------- 1 blee blee   97 Oct 10 16:49 fields
> -rw------- 1 blee blee   78 Oct 10 16:49 segments
> -rw------- 1 blee blee  11M Oct 10 16:23 _t.cfs
> -rw------- 1 blee blee  11M Oct 10 15:56 _s.cfs
> -rw------- 1 blee blee  15M Oct 10 15:11 _r.cfs
> -rw------- 1 blee blee  13M Oct 10 14:48 _q.cfs
> -rw------- 1 blee blee  14M Oct 10 14:37 _p.cfs
> -rw------- 1 blee blee  13M Oct 10 14:28 _o.cfs
> -rw------- 1 blee blee  12M Oct 10 14:19 _n.cfs
> -rw------- 1 blee blee  12M Oct 10 14:16 _m.cfs
> -rw------- 1 blee blee 118M Oct 10 14:10 _l.cfs
> -rw------- 1 blee blee 129M Oct 10 13:24 _a.cfs
> -rw------- 1 blee blee    0 Oct 10 13:00 ferret-write.lck
>
> Thanks,
> Ben
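
Here are the sketches I mentioned. First, roughly what our indexing loop looks like. This is a minimal sketch against the 0.10-style Ruby API; my_records, the field names and the path are made up for illustration:

  require 'ferret'

  index = Ferret::Index::Index.new(:path => '/data/ferret_index')

  my_records.each_with_index do |record, i|
    index << {:title => record.title, :content => record.body}

    # optimize merges the segments down and also flushes buffered
    # documents to disk, so readers in other processes can see them
    index.optimize if (i + 1) % 100_000 == 0
  end

  # flushes anything still buffered and releases the write lock
  # (the ferret-write.lck you see in your listing)
  index.close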
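
The two-copy setup (point 3) is nothing fancy: after an optimize we publish a snapshot to a separate directory and point all searchers at that. A rough sketch, with directory names that are ours rather than anything Ferret requires:

  require 'ferret'
  require 'fileutils'

  WRITE_DIR  = '/data/index-writing'
  SEARCH_DIR = '/data/index-searching'

  # ... after a round of indexing into WRITE_DIR ...
  writer = Ferret::Index::Index.new(:path => WRITE_DIR)
  writer.optimize
  writer.close

  # publish the optimized copy; searchers only ever open SEARCH_DIR
  FileUtils.rm_rf(SEARCH_DIR)
  FileUtils.cp_r(WRITE_DIR, SEARCH_DIR)

  searcher = Ferret::Index::Index.new(:path => SEARCH_DIR)
  searcher.search_each('content:ferret') do |doc_id, score|
    puts "doc #{doc_id} scored #{score}"
  end

The rm_rf/cp_r step isn't atomic, so in practice you'd swap the directories with a rename (or close searchers around the copy) rather than deleting underneath a live searcher.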
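
For point 4, Ferret lets you declare per-field properties up front with FieldInfos. Another sketch; I'm assuming a big :content field that you want to search but never display, and if I remember right the field properties only take effect when the index is first created:

  require 'ferret'

  field_infos = Ferret::Index::FieldInfos.new(:store => :yes, :index => :yes)

  # :content is tokenized and searchable, but never written to the
  # stored-fields file (the .fdt in your listing), which is what
  # keeps the index size down
  field_infos.add_field(:content, :store => :no,  :index => :yes)
  field_infos.add_field(:title,   :store => :yes, :index => :yes)

  index = Ferret::Index::Index.new(:path => '/data/ferret_index',
                                   :field_infos => field_infos)

Searches against :content still work as usual; the hits just come back without that field, so you display :title (or re-fetch the original record) instead.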

