RE: Lucene on Windows

2003-10-21 Thread Stephane Vaucher
Hi Tate (didn't know you were lurking on the list),

I've found that it's often not very clear what truly affects performance. 
Doing batch indexes with a data set of 250,000 docs (with 10 fields each) 
on a machine with 2 GB of DDR400 RAM, I tested a few merge factors and 
found that 50 seemed optimal, and even then, performance wasn't much 
better than with a MF of 20. Nowadays, there can be so many hidden 
optimisations by HDs and OSs that it's often worth testing with each 
configuration used.

sv
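
A minimal sketch of the "test each configuration" idea from the message above, in case it helps anyone repeat the experiment. Everything here is hypothetical scaffolding: `MergeFactorTrial`, `timeNanos`, `fastest` and the placeholder `indexBatch` are names made up for illustration, and no Lucene calls are shown. You would replace `indexBatch` with your own batch indexing run.

```java
import java.util.function.IntToLongFunction;

/** Hypothetical harness: try several merge factors and keep the cheapest. */
final class MergeFactorTrial {

    /** Wall-clock time of one run of the job, in nanoseconds. */
    static long timeNanos(Runnable job) {
        long start = System.nanoTime();
        job.run();
        return System.nanoTime() - start;
    }

    /**
     * Returns the candidate with the lowest measured cost.
     * On a tie, the candidate listed first wins.
     */
    static int fastest(int[] candidates, IntToLongFunction costOf) {
        if (candidates.length == 0) {
            throw new IllegalArgumentException("no candidates given");
        }
        int best = candidates[0];
        long bestCost = costOf.applyAsLong(best);
        for (int i = 1; i < candidates.length; i++) {
            long cost = costOf.applyAsLong(candidates[i]);
            if (cost < bestCost) {
                best = candidates[i];
                bestCost = cost;
            }
        }
        return best;
    }

    /** Placeholder (assumption): stands in for a real batch indexing run. */
    static void indexBatch(int mergeFactor) {
        // Build your test index here with the given merge factor.
    }

    public static void main(String[] args) {
        int[] candidates = {10, 20, 50, 100};
        int best = fastest(candidates, mf -> timeNanos(() -> indexBatch(mf)));
        System.out.println("fastest merge factor in this run: " + best);
    }
}
```

A single timed run per setting is noisy, so in practice you would probably repeat each run a few times on the target machine before trusting the result.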

On Tue, 21 Oct 2003, Tate Avery wrote:

> Doug,
> 
> Re: high merge factor.  I was building test indexes, and writing out 300 segments of 
> 300 docs each and merging them every 90,000 docs kept the 'merging' time down to a 
> minimum (for my slowish HD).
> 
> I was assuming that 11 of these large merges during the indexing of 1,000,000 docs 
> (plus a final optimize) would be faster than 10,000 little merges if the mergeFactor 
> was set to 10 (for the same corpus).
> 
> Maybe this is not the case.
> 
> 
> 
> 
> Tate
> 
> 
> -----Original Message-----
> From: Doug Cutting [mailto:[EMAIL PROTECTED]
> Sent: October 21, 2003 12:37 PM
> To: Lucene Users List
> Subject: Re: Lucene on Windows
> 
> 
> Tate Avery wrote:
> > You might have trouble with "too many open files" if you set your mergeFactor too 
> > high.  For example, on my Win2k, I can go up to mergeFactor=300 (or so).  At 400 I 
> > get a too many open files error.  Note: the default mergeFactor of 10 should give 
> > no trouble.
> 
> Please note that it is never recommended that you set mergeFactor 
> anywhere near this high.  I don't know why folks do this.  It really 
> doesn't make indexing much faster, and it makes searching slower if you 
> don't optimize.  It's a bad idea.  The default setting of 10 works 
> pretty well.  I've also had good experience setting it as high as 50 on 
> big batch indexing runs, but do not recommend setting it much higher 
> than that.  Even then, this can cause problems if you need to use 
> several indexes at once, or you have lots of fields.
> 
> Doug
> 
> 
> -
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]





RE: Lucene on Windows

2003-10-21 Thread Tate Avery
Doug,

Re: high merge factor.  I was building test indexes, and writing out 300 segments of 
300 docs each and merging them every 90,000 docs kept the 'merging' time down to a 
minimum (for my slowish HD).

I was assuming that 11 of these large merges during the indexing of 1,000,000 docs 
(plus a final optimize) would be faster than 10,000 little merges if the mergeFactor 
was set to 10 (for the same corpus).

Maybe this is not the case.




Tate
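
The merge counts in the message above can be reproduced with some simple arithmetic. This is only a toy model, assuming a merge fires every mergeFactor^level documents; it is not Lucene's actual merge policy, and `MergeMath` and `mergesAtLevel` are hypothetical names used for illustration.

```java
/** Toy arithmetic for counting merges; not Lucene's actual merge policy. */
final class MergeMath {

    /**
     * Number of merges that each combine mergeFactor segments of
     * mergeFactor^(level-1) docs, while indexing totalDocs documents.
     * Assumes one such merge fires every mergeFactor^level documents.
     */
    static long mergesAtLevel(long totalDocs, int mergeFactor, int level) {
        long docsPerMerge = 1;
        for (int i = 0; i < level; i++) {
            docsPerMerge *= mergeFactor;
        }
        return totalDocs / docsPerMerge;
    }

    public static void main(String[] args) {
        long docs = 1_000_000L;
        // mergeFactor=300: one merge per 300 * 300 = 90,000 docs
        System.out.println(mergesAtLevel(docs, 300, 2)); // prints 11
        // mergeFactor=10: one merge per 10 * 10 = 100 docs
        System.out.println(mergesAtLevel(docs, 10, 2));  // prints 10000
    }
}
```

Note that each of the 11 merges touches 90,000 docs while each of the 10,000 merges touches only 100, so fewer merges does not by itself mean less total merge work.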


-----Original Message-----
From: Doug Cutting [mailto:[EMAIL PROTECTED]
Sent: October 21, 2003 12:37 PM
To: Lucene Users List
Subject: Re: Lucene on Windows


Tate Avery wrote:
> You might have trouble with "too many open files" if you set your mergeFactor too 
> high.  For example, on my Win2k, I can go up to mergeFactor=300 (or so).  At 400 I 
> get a too many open files error.  Note: the default mergeFactor of 10 should give no 
> trouble.

Please note that it is never recommended that you set mergeFactor 
anywhere near this high.  I don't know why folks do this.  It really 
doesn't make indexing much faster, and it makes searching slower if you 
don't optimize.  It's a bad idea.  The default setting of 10 works 
pretty well.  I've also had good experience setting it as high as 50 on 
big batch indexing runs, but do not recommend setting it much higher 
than that.  Even then, this can cause problems if you need to use 
several indexes at once, or you have lots of fields.

Doug





Re: Lucene on Windows

2003-10-21 Thread Doug Cutting
Tate Avery wrote:
> You might have trouble with "too many open files" if you set your mergeFactor too 
> high.  For example, on my Win2k, I can go up to mergeFactor=300 (or so).  At 400 I 
> get a too many open files error.  Note: the default mergeFactor of 10 should give 
> no trouble.

Please note that it is never recommended that you set mergeFactor 
anywhere near this high.  I don't know why folks do this.  It really 
doesn't make indexing much faster, and it makes searching slower if you 
don't optimize.  It's a bad idea.  The default setting of 10 works 
pretty well.  I've also had good experience setting it as high as 50 on 
big batch indexing runs, but do not recommend setting it much higher 
than that.  Even then, this can cause problems if you need to use 
several indexes at once, or you have lots of fields.

Doug
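
A rough back-of-envelope sketch of why a high mergeFactor and lots of fields can both push toward "too many open files". The numbers are assumptions for illustration only: it supposes that a merge holds every file of each segment being merged open at once, and that a non-compound segment has about 7 fixed files plus one extra file per indexed field. Check the file formats of your Lucene version before relying on this. `OpenFileEstimate` and its members are hypothetical names.

```java
/** Back-of-envelope estimate only; the constants are assumptions. */
final class OpenFileEstimate {

    /** Assumed number of per-segment files that do not depend on field count. */
    static final int ASSUMED_FIXED_FILES_PER_SEGMENT = 7;

    /**
     * Estimated files open while merging mergeFactor segments, assuming
     * one extra file per indexed field in each segment.
     */
    static long filesOpenDuringMerge(int mergeFactor, int indexedFields) {
        long filesPerSegment = ASSUMED_FIXED_FILES_PER_SEGMENT + indexedFields;
        return (long) mergeFactor * filesPerSegment;
    }

    public static void main(String[] args) {
        int fields = 10; // e.g. documents with 10 indexed fields
        for (int mergeFactor : new int[] {10, 50, 300}) {
            System.out.println("mergeFactor=" + mergeFactor + " -> about "
                    + filesOpenDuringMerge(mergeFactor, fields) + " files");
        }
    }
}
```

Under these assumptions the estimate grows linearly with both mergeFactor and field count, and it multiplies again if several indexes are open at once. The actual limit you hit depends on the OS and JVM.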



Re: Lucene on Windows

2003-10-21 Thread Otis Gospodnetic
A very rough and simple 'add a single document to the index' test shows
that the Compound Index is marginally slower than the traditional one.
I did not test searching.

Otis

--- Eric Jain <[EMAIL PROTECTED]> wrote:
> > The CVS version of Lucene has a patch that allows one to use a
> > 'Compound Index' instead of the traditional one.  This reduces the
> > number of open files.  For more info, see/make the Javadocs for
> > IndexWriter.
> 
> Interesting option. Do you have a rough idea of what the performance
> impact of using this setting is?
> 
> --
> Eric Jain





Re: Lucene on Windows

2003-10-21 Thread Eric Jain
> The CVS version of Lucene has a patch that allows one to use a
> 'Compound Index' instead of the traditional one.  This reduces the
> number of open files.  For more info, see/make the Javadocs for
> IndexWriter.

Interesting option. Do you have a rough idea of what the performance
impact of using this setting is?

--
Eric Jain





Re: Lucene on Windows

2003-10-20 Thread Marco Tedone
I'm using Lucene on Windows without problems. 
----- Original Message -----
From: "Steve Jenkins" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Cc: "Steve Jenkins" <[EMAIL PROTECTED]>
Sent: Monday, October 20, 2003 5:00 PM
Subject: Lucene on Windows


Hi,

Wonder if anyone can help. Has anyone used Lucene on a Windows environment?
Anyone know of any documentation specifically focused on doing that? 
Or anyone know of any gotchas to avoid?

Thanks for any help,
Cheers Steve.








RE: Lucene on Windows

2003-10-20 Thread Otis Gospodnetic
The CVS version of Lucene has a patch that allows one to use a
'Compound Index' instead of the traditional one.  This reduces the
number of open files.  For more info, see/make the Javadocs for
IndexWriter.

Otis

--- Tate Avery <[EMAIL PROTECTED]> wrote:
> 
> You might have trouble with "too many open files" if you set your
> mergeFactor too high.  For example, on my Win2k, I can go up to
> mergeFactor=300 (or so).  At 400 I get a too many open files error. 
> Note: the default mergeFactor of 10 should give no trouble.
> 
> FYI - On my linux box, I got the 'too many open' error on
> mergeFactor=300 (and 200).  So, I am using 100.
> 
> 
> Tate
> 
> 
> -----Original Message-----
> From: Erik Hatcher [mailto:[EMAIL PROTECTED]
> Sent: October 20, 2003 12:11 PM
> To: Lucene Users List
> Subject: Re: Lucene on Windows
> 
> 
> 
> On Monday, October 20, 2003, at 12:00  PM, Steve Jenkins wrote:
> > Hi,
> >
> > Wonder if anyone can help. Has anyone used Lucene on a Windows 
> > environment?
> > Anyone know of any documentation specifically focused on doing that?
> > Or anyone know of any gotchas to avoid?
> 
> Yup, used Lucene on Windows lots.  Is there a specific issue you feel 
> is Windows related?  It's pure Java and works the same on all supported 
> platforms.  So no real gotchas with respect to Windows.
> 
>   Erik
> 





RE: Lucene on Windows

2003-10-20 Thread Tate Avery

You might have trouble with "too many open files" if you set your mergeFactor too 
high.  For example, on my Win2k, I can go up to mergeFactor=300 (or so).  At 400 I get 
a too many open files error.  Note: the default mergeFactor of 10 should give no 
trouble.

FYI - On my linux box, I got the 'too many open' error on mergeFactor=300 (and 200).  
So, I am using 100.


Tate


-----Original Message-----
From: Erik Hatcher [mailto:[EMAIL PROTECTED]
Sent: October 20, 2003 12:11 PM
To: Lucene Users List
Subject: Re: Lucene on Windows



On Monday, October 20, 2003, at 12:00  PM, Steve Jenkins wrote:
> Hi,
>
> Wonder if anyone can help. Has anyone used Lucene on a Windows 
> environment?
> Anyone know of any documentation specifically focused on doing that?
> Or anyone know of any gotchas to avoid?

Yup, used Lucene on Windows lots.  Is there a specific issue you feel 
is Windows related?  It's pure Java and works the same on all supported 
platforms.  So no real gotchas with respect to Windows.

Erik





Re: Lucene on Windows

2003-10-20 Thread Erik Hatcher
On Monday, October 20, 2003, at 12:00  PM, Steve Jenkins wrote:
> Hi,
>
> Wonder if anyone can help. Has anyone used Lucene on a Windows 
> environment?
> Anyone know of any documentation specifically focused on doing that?
> Or anyone know of any gotchas to avoid?

Yup, used Lucene on Windows lots.  Is there a specific issue you feel 
is Windows related?  It's pure Java and works the same on all supported 
platforms.  So no real gotchas with respect to Windows.

	Erik



Lucene on Windows

2003-10-20 Thread Steve Jenkins
Hi,

Wonder if anyone can help. Has anyone used Lucene on a Windows environment?
Anyone know of any documentation specifically focused on doing that? 
Or anyone know of any gotchas to avoid?

Thanks for any help,
Cheers Steve.



LARM running with Lucene on Windows 2000?

2002-12-23 Thread TJ Tee
Has anyone successfully integrated the LARM crawler with Lucene on the Windows
2000 platform? Thank you.