RE: Fastest batch indexing with 1.3-rc1

2003-08-20 Thread Dan Quaroni
Looks like I spoke too soon... As the index gets larger, the time to merge
becomes prohibitively high.  It appears to increase linearly.

Oh well.  I guess I'll just have to go with about 3ms/doc.

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: Fastest batch indexing with 1.3-rc1

2003-08-20 Thread Doug Cutting
As the index grows, disk I/O becomes the bottleneck.  The default 
indexing parameters do a pretty good job of optimizing for this.  But if 
you have lots of CPUs and lots of disks, you might try building several 
indexes in parallel, each containing a subset of the documents, 
optimizing each index, and finally merging them all into a single index 
at the end.  But you need lots of I/O capacity for this to pay off.

Doug
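The partition-then-merge idea above can be illustrated with a toy, self-contained Java sketch. It does not use Lucene: the "index" here is a hypothetical in-memory inverted index (each term maps to a sorted list of global document ids), and `buildIndex`, `merge` and `parallelIndex` are illustrative names, not Lucene methods. Documents are split into contiguous ranges, each range is indexed on its own thread, and the partial indexes are merged in partition order. The per-index optimize step is not modeled. In Lucene itself the final merge would go through `IndexWriter.addIndexes`, and the real payoff depends on disk I/O capacity, which a toy in-memory example cannot show.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Toy sketch (NOT Lucene's API): parallel sub-index builds followed by one merge. */
class ParallelIndexSketch {

    /** Builds an inverted index for docs[from, to); doc ids are global list positions. */
    static TreeMap<String, List<Integer>> buildIndex(List<String> docs, int from, int to) {
        TreeMap<String, List<Integer>> index = new TreeMap<>();
        for (int id = from; id < to; id++) {
            for (String term : docs.get(id).toLowerCase().split("\\s+")) {
                if (term.isEmpty()) continue; // leading whitespace yields an empty token
                List<Integer> postings = index.computeIfAbsent(term, t -> new ArrayList<>());
                // Skip duplicate ids when a term repeats within one document.
                if (postings.isEmpty() || postings.get(postings.size() - 1) != id) {
                    postings.add(id);
                }
            }
        }
        return index;
    }

    /** Merges partial indexes that were built over consecutive, increasing id ranges. */
    static TreeMap<String, List<Integer>> merge(List<TreeMap<String, List<Integer>>> parts) {
        TreeMap<String, List<Integer>> merged = new TreeMap<>();
        for (TreeMap<String, List<Integer>> part : parts) {
            for (Map.Entry<String, List<Integer>> e : part.entrySet()) {
                // Parts arrive in id-range order, so appending keeps each posting list sorted.
                merged.computeIfAbsent(e.getKey(), t -> new ArrayList<>()).addAll(e.getValue());
            }
        }
        return merged;
    }

    /** Splits docs into contiguous ranges, indexes each range on its own thread, then merges. */
    static TreeMap<String, List<Integer>> parallelIndex(List<String> docs, int partitions)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(partitions);
        try {
            List<Future<TreeMap<String, List<Integer>>>> futures = new ArrayList<>();
            int chunk = (docs.size() + partitions - 1) / partitions; // ceiling division
            for (int p = 0; p < partitions; p++) {
                int from = Math.min(p * chunk, docs.size());
                int to = Math.min(from + chunk, docs.size());
                futures.add(pool.submit(() -> buildIndex(docs, from, to)));
            }
            List<TreeMap<String, List<Integer>>> parts = new ArrayList<>();
            for (Future<TreeMap<String, List<Integer>>> f : futures) {
                parts.add(f.get()); // wait for each sub-index, keeping partition order
            }
            return merge(parts);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> docs = List.of(
            "fast batch indexing", "merge time grows", "batch merge", "disk io bottleneck");
        TreeMap<String, List<Integer>> parallel = parallelIndex(docs, 2);
        TreeMap<String, List<Integer>> serial = buildIndex(docs, 0, docs.size());
        System.out.println(parallel.equals(serial)); // true
        System.out.println(parallel.get("batch"));   // [0, 2]
        System.out.println(parallel.get("merge"));   // [1, 2]
    }
}
```

The cheap append-only merge works only because the partial indexes are combined in the same order as their id ranges; merging them out of order would leave posting lists unsorted.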




Re: Fastest batch indexing with 1.3-rc1

2003-08-20 Thread Leo Galambos
Isn't it better for Dan to skip the optimization phase before merging? I 
am not sure, but he could save some time that way (provided he has enough 
file handles, of course). What strategy do you use in Nutch?

THX

-g-

