On 10/11/06, Ben Lee <[EMAIL PROTECTED]> wrote:
> Sorry if this is a repost. I wasn't sure if the www.ruby-forum.com
> list works for postings.
> I've been having trouble indexing a large number of documents (2.4M).
>
>
> Essentially, I have one process that follows the tutorial and
> dumps documents to an index stored on the file system.  If I open the
> index with another process and run the size() method, it is stuck at
> a number of documents much smaller than the number I've added to the index.
>
> E.g. 290k, when the indexer process has already gone through 1M.
>
> Additionally, if I search, I don't get results past an
> even smaller number of docs (22k). I've tried the two latest Ferret releases.
>
>
> Does this listing of the index directory look right?
>
> -rw-------  1 blee blee 3.8M Oct 10 17:06 _v.fdt
> -rw-------  1 blee blee  51K Oct 10 17:06 _v.fdx
> -rw-------  1 blee blee  12M Oct 10 16:49 _u.cfs
> -rw-------  1 blee blee   97 Oct 10 16:49 fields
> -rw-------  1 blee blee   78 Oct 10 16:49 segments
> -rw-------  1 blee blee  11M Oct 10 16:23 _t.cfs
> -rw-------  1 blee blee  11M Oct 10 15:56 _s.cfs
> -rw-------  1 blee blee  15M Oct 10 15:11 _r.cfs
> -rw-------  1 blee blee  13M Oct 10 14:48 _q.cfs
> -rw-------  1 blee blee  14M Oct 10 14:37 _p.cfs
> -rw-------  1 blee blee  13M Oct 10 14:28 _o.cfs
> -rw-------  1 blee blee  12M Oct 10 14:19 _n.cfs
> -rw-------  1 blee blee  12M Oct 10 14:16 _m.cfs
> -rw-------  1 blee blee 118M Oct 10 14:10 _l.cfs
> -rw-------  1 blee blee 129M Oct 10 13:24 _a.cfs
> -rw-------  1 blee blee    0 Oct 10 13:00 ferret-write.lck
>
> Thanks,
> Ben

I first thought this might be because Ferret wasn't compiled with
large-file support, but by the looks of it you aren't getting near
that limit yet. Going by the directory listing you have here, there is
no way you could have added more than 290K documents unless you set
:max_buffered_docs to a different value (> 10,000). Perhaps the index
is getting overwritten at some stage. Could you show us the code you
are using for indexing?
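
For anyone wondering where the 290K figure comes from, here is a rough
sketch in plain Ruby (no Ferret required) that reproduces the segment
names in the listing. It rests on several assumptions that are not
stated in this thread: every flush writes exactly :max_buffered_docs
(10,000) documents, segments are named with a base-36 counter starting
at _0, and ten equally sized segments are merged into one (a merge
factor of 10). The method name simulate is made up for illustration,
and the real merge policy may differ in detail.

```ruby
# Hypothetical simulation of segment creation, NOT Ferret's actual code.
MAX_BUFFERED_DOCS = 10_000 # value mentioned in the reply above
MERGE_FACTOR      = 10     # assumption: not stated in the thread

# Returns an array of [segment_name, doc_count] pairs after `flushes`
# buffers of MAX_BUFFERED_DOCS documents have been written to disk.
def simulate(flushes)
  segments = []
  counter  = 0 # assumed base-36 naming counter: _0, _1, ... _9, _a, ...
  flushes.times do
    segments << ["_#{counter.to_s(36)}", MAX_BUFFERED_DOCS]
    counter += 1
    # Merge the newest MERGE_FACTOR segments whenever they are all the same size.
    loop do
      tail = segments.last(MERGE_FACTOR)
      break unless tail.size == MERGE_FACTOR && tail.map(&:last).uniq.size == 1
      segments.pop(MERGE_FACTOR)
      segments << ["_#{counter.to_s(36)}", tail.sum(&:last)]
      counter += 1
    end
  end
  segments
end

segments = simulate(29)
puts segments.map(&:first).join(' ') # names of the committed segments
puts segments.sum(&:last)            # estimated number of committed documents
```

Under those assumptions, 29 flushes leave two merged segments of
100,000 documents each (_a and _l) plus nine unmerged segments of
10,000 each (_m to _u), which matches the .cfs files in the listing.
The _v.fdt and _v.fdx files would then belong to the segment still
being buffered.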

As for search results only showing for the first 22k documents, I'm not
sure what the problem might be. You need to make sure you open the
index reader or searcher after committing the index writer, otherwise
the latest documents won't show up. I don't think this is your problem
though, as I'm sure you opened the index reader well after the first
22k documents had been indexed.

Cheers,
Dave
_______________________________________________
Ferret-talk mailing list
[email protected]
http://rubyforge.org/mailman/listinfo/ferret-talk
