RE: Lucene on Windows
Hi Tate (didn't know you were lurking on the list),

I've found that it's often not very clear what truly affects performance. Doing batch indexes with a data set of 250,000 docs (with 10 fields each) on a machine with 2 Gbytes of 400 DDR RAM, I've tested a few merge factors to discover that it seemed optimal at 50 and even then, performance wasn't much better than with a MF of 20.

Nowadays, there can be so many hidden optimisations by HDs and OSs, that it's often worth testing with each configuration used.

sv

On Tue, 21 Oct 2003, Tate Avery wrote:
> Doug,
>
> Re: high merge factor. I was building test indexes and writing out 300 segments of
> 300 docs and merging them every 90,000 kept the 'merging' time down to a minimum
> (for my slowish HD).
>
> I was assuming that 11 of these large merges during the indexing of 1,000,000 docs
> (plus a final optimize) would be faster than 10,000 little merges if the mergeFactor
> was set to 10 (for the same corpus).
>
> Maybe this is not the case.
>
> Tate
>
> -----Original Message-----
> From: Doug Cutting
> Sent: October 21, 2003 12:37 PM
> To: Lucene Users List
> Subject: Re: Lucene on Windows
>
> Tate Avery wrote:
> > You might have trouble with "too many open files" if you set your mergeFactor too
> > high. For example, on my Win2k, I can go up to mergeFactor=300 (or so). At 400 I
> > get a too many open files error. Note: the default mergeFactor of 10 should give
> > no trouble.
>
> Please note that it is never recommended that you set mergeFactor
> anywhere near this high. I don't know why folks do this. It really
> doesn't make indexing much faster, and it makes searching slower if you
> don't optimize. It's a bad idea. The default setting of 10 works
> pretty well. I've also had good experience setting it as high as 50 on
> big batch indexing runs, but do not recommend setting it much higher
> than that. Even then, this can cause problems if you need to use
> several indexes at once, or you have lots of fields.
>
> Doug
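The advice above to test each configuration can be turned into a small timing harness. The following is only a sketch: `time_configs` and `fake_build` are hypothetical names, and `fake_build` is a stand-in workload, not a Lucene call. To measure anything real, replace it with a function that performs your own batch indexing run with the given merge factor.

```python
import time


def time_configs(build, merge_factors):
    """Run build(mf) once per merge factor and return {mf: elapsed seconds}."""
    timings = {}
    for mf in merge_factors:
        start = time.perf_counter()
        build(mf)
        timings[mf] = time.perf_counter() - start
    return timings


def fake_build(merge_factor):
    # Hypothetical stand-in for a real indexing run; it ignores merge_factor
    # and just burns a little CPU so the harness has something to time.
    sum(range(10_000))


results = time_configs(fake_build, [10, 20, 50])
for mf, seconds in sorted(results.items()):
    print(f"mergeFactor={mf}: {seconds:.4f}s")
```

A single run per setting is noisy, since disk and OS caches warm up between runs, so repeating each configuration a few times and comparing the spread is probably worthwhile.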
RE: Lucene on Windows
Doug,

Re: high merge factor. I was building test indexes and writing out 300 segments of 300 docs and merging them every 90,000 kept the 'merging' time down to a minimum (for my slowish HD).

I was assuming that 11 of these large merges during the indexing of 1,000,000 docs (plus a final optimize) would be faster than 10,000 little merges if the mergeFactor was set to 10 (for the same corpus).

Maybe this is not the case.

Tate

-----Original Message-----
From: Doug Cutting
Sent: October 21, 2003 12:37 PM
To: Lucene Users List
Subject: Re: Lucene on Windows

Tate Avery wrote:
> You might have trouble with "too many open files" if you set your mergeFactor too
> high. For example, on my Win2k, I can go up to mergeFactor=300 (or so). At 400 I
> get a too many open files error. Note: the default mergeFactor of 10 should give no
> trouble.

Please note that it is never recommended that you set mergeFactor anywhere near this high. I don't know why folks do this. It really doesn't make indexing much faster, and it makes searching slower if you don't optimize. It's a bad idea. The default setting of 10 works pretty well. I've also had good experience setting it as high as 50 on big batch indexing runs, but do not recommend setting it much higher than that. Even then, this can cause problems if you need to use several indexes at once, or you have lots of fields.

Doug
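The merge counts in the message above can be reproduced with a few lines of arithmetic. This sketch counts only first-level merges (merges of freshly written segments) and ignores the cascading higher-level merges and the final optimize. The function name is hypothetical, and the 10 docs per segment used for the mergeFactor=10 case is an assumption chosen because it yields the 10,000 figure quoted in the message.

```python
def first_level_merges(total_docs, merge_factor, docs_per_segment):
    """Count merges of freshly written segments (first level only)."""
    docs_per_merge = merge_factor * docs_per_segment
    return total_docs // docs_per_merge


# 300 segments of 300 docs each: one merge every 90,000 docs.
large = first_level_merges(1_000_000, 300, 300)

# mergeFactor 10 with an assumed 10 docs per segment: one merge every 100 docs.
small = first_level_merges(1_000_000, 10, 10)

print(large, small)
```

The count alone says nothing about cost: each of the few large merges reads and rewrites far more data and keeps far more files open than one of the many small merges.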
Re: Lucene on Windows
Tate Avery wrote:
> You might have trouble with "too many open files" if you set your mergeFactor too
> high. For example, on my Win2k, I can go up to mergeFactor=300 (or so). At 400 I
> get a too many open files error. Note: the default mergeFactor of 10 should give no
> trouble.

Please note that it is never recommended that you set mergeFactor anywhere near this high. I don't know why folks do this. It really doesn't make indexing much faster, and it makes searching slower if you don't optimize. It's a bad idea. The default setting of 10 works pretty well. I've also had good experience setting it as high as 50 on big batch indexing runs, but do not recommend setting it much higher than that. Even then, this can cause problems if you need to use several indexes at once, or you have lots of fields.

Doug
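A rough model shows why a high mergeFactor, many fields, and several open indexes all push toward the "too many open files" limit. Everything here is an assumption rather than a figure from the thread: the function name is hypothetical, the model supposes that a merge keeps every file of every segment being merged open at once, and the default of 7 base files per segment plus one file per indexed field is a guess at the non-compound index layout. Adjust the parameters to match what you see in your own index directory.

```python
def estimated_open_files(merge_factor, indexed_fields,
                         files_per_segment=7, open_indexes=1):
    """Rough estimate of files held open while merging.

    Assumes (not measured) that each segment contributes a fixed set of
    files plus one file per indexed field, and that merge_factor segments
    are open simultaneously in each index.
    """
    per_segment = files_per_segment + indexed_fields
    return merge_factor * per_segment * open_indexes


for mf in (10, 50, 300):
    print(f"mergeFactor={mf}: ~{estimated_open_files(mf, indexed_fields=10)} files")
```

Under these assumptions the estimate grows linearly in each factor, which matches the qualitative warning above: even a moderate mergeFactor can become a problem once field count or the number of simultaneously open indexes goes up.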
Re: Lucene on Windows
A very rough and simple 'add a single document to the index' test shows that the Compound Index is marginally slower than the traditional one. I did not test searching.

Otis

--- Eric Jain wrote:
> > The CVS version of Lucene has a patch that allows one to use a
> > 'Compound Index' instead of the traditional one. This reduces the
> > number of open files. For more info, see/make the Javadocs for
> > IndexWriter.
>
> Interesting option. Do you have a rough idea of what the performance
> impact of using this setting is?
>
> --
> Eric Jain
Re: Lucene on Windows
> The CVS version of Lucene has a patch that allows one to use a
> 'Compound Index' instead of the traditional one. This reduces the
> number of open files. For more info, see/make the Javadocs for
> IndexWriter.

Interesting option. Do you have a rough idea of what the performance impact of using this setting is?

--
Eric Jain
Re: Lucene on Windows
I'm using Lucene on Windows without problems.

----- Original Message -----
From: Steve Jenkins
Sent: Monday, October 20, 2003 5:00 PM
Subject: Lucene on Windows

Hi,

Wonder if anyone can help. Has anyone used Lucene on a Windows environment? Anyone know of any documentation specifically focused on doing that? Or anyone know of any gotchas to avoid?

Thanks for any help,
Cheers
Steve.
RE: Lucene on Windows
The CVS version of Lucene has a patch that allows one to use a 'Compound Index' instead of the traditional one. This reduces the number of open files. For more info, see/make the Javadocs for IndexWriter.

Otis

--- Tate Avery wrote:
> You might have trouble with "too many open files" if you set your
> mergeFactor too high. For example, on my Win2k, I can go up to
> mergeFactor=300 (or so). At 400 I get a too many open files error.
> Note: the default mergeFactor of 10 should give no trouble.
>
> FYI - On my linux box, I got the 'too many open' error on
> mergeFactor=300 (and 200). So, I am using 100.
>
> Tate
>
> -----Original Message-----
> From: Erik Hatcher
> Sent: October 20, 2003 12:11 PM
> To: Lucene Users List
> Subject: Re: Lucene on Windows
>
> On Monday, October 20, 2003, at 12:00 PM, Steve Jenkins wrote:
> > Hi,
> >
> > Wonder if anyone can help. Has anyone used Lucene on a Windows
> > environment?
> > Anyone know of any documentation specifically focused on doing that?
> > Or anyone know of any gotchas to avoid?
>
> Yup, used Lucene on Windows lots. Is there a specific issue you feel
> is Windows related? It's pure Java and works the same on all supported
> platforms. So no real gotchas with respect to Windows.
>
> Erik
RE: Lucene on Windows
You might have trouble with "too many open files" if you set your mergeFactor too high. For example, on my Win2k, I can go up to mergeFactor=300 (or so). At 400 I get a too many open files error. Note: the default mergeFactor of 10 should give no trouble.

FYI - On my linux box, I got the 'too many open' error on mergeFactor=300 (and 200). So, I am using 100.

Tate

-----Original Message-----
From: Erik Hatcher
Sent: October 20, 2003 12:11 PM
To: Lucene Users List
Subject: Re: Lucene on Windows

On Monday, October 20, 2003, at 12:00 PM, Steve Jenkins wrote:
> Hi,
>
> Wonder if anyone can help. Has anyone used Lucene on a Windows
> environment?
> Anyone know of any documentation specifically focused on doing that?
> Or anyone know of any gotchas to avoid?

Yup, used Lucene on Windows lots. Is there a specific issue you feel is Windows related? It's pure Java and works the same on all supported platforms. So no real gotchas with respect to Windows.

Erik
Re: Lucene on Windows
On Monday, October 20, 2003, at 12:00 PM, Steve Jenkins wrote:
> Hi,
>
> Wonder if anyone can help. Has anyone used Lucene on a Windows
> environment?
> Anyone know of any documentation specifically focused on doing that?
> Or anyone know of any gotchas to avoid?

Yup, used Lucene on Windows lots. Is there a specific issue you feel is Windows related? It's pure Java and works the same on all supported platforms. So no real gotchas with respect to Windows.

Erik
Lucene on Windows
Hi,

Wonder if anyone can help. Has anyone used Lucene on a Windows environment? Anyone know of any documentation specifically focused on doing that? Or anyone know of any gotchas to avoid?

Thanks for any help,
Cheers
Steve.
LARM running with Lucene on Windows 2000?
Has anyone successfully integrated the LARM crawler with Lucene on the Windows 2000 platform? Thank you.