Re: Sys properties Was: java.io.tmpdir as lock dir .... once again

2004-03-18 Thread Otis Gospodnetic
I added support for all items listed below, except commit/write lock
file name.  I don't see why one would want to change that, considering
those files are still limited to the index directory.

Otis

--- Stephane James Vaucher [EMAIL PROTECTED] wrote:
 How about (looking big rather than small):
 
 - MaxClause from BooleanQuery (I know there have been discussions on 
 the dev list, but I haven't been following them)
 - default commit_lock_name
 - default commit_lock_timeout
 - default maxFieldLength
 - default maxMergeDocs
 - default mergeFactor
 - default minMergeDocs
 - default write_lock_name
 - default write_lock_timeout
 
 I'm currently configuring parts of my app using sys properties,
 particularly the mergeFactor, because my prod system has 2GB of RAM and is
 Windows-based and my dev machine has 256MB and runs Linux. If no one takes a
 crack at this, I'll see what I can do in 2 weeks, after my vacation.
 
 Cheers,
 sv
 
 On Wed, 3 Mar 2004, Doug Cutting wrote:
 
  Stephane James Vaucher wrote:
   As I've stated in my earlier mail, I like this change. More importantly,
   could this become a standard way of changing configurations at runtime?
   For example, the default merge factor could also be set in this manner.
  
  Sure, that's reasonable, so this would be something like:
  
  private static final int DEFAULT_MERGE_FACTOR =
    Integer.parseInt(System.getProperty("org.apache.lucene.mergeFactor", "10"));
  
  In IndexWriter.java.
  
  What other candidates are there for this treatment?
  
  Doug
  
 
 -
  To unsubscribe, e-mail: [EMAIL PROTECTED]
  For additional commands, e-mail:
 [EMAIL PROTECTED]
  
 
 



RE: Sys properties Was: java.io.tmpdir as lock dir .... once again

2004-03-09 Thread hui
(ms) 16
39772 total within(ms) 32
18645 total within(ms) 32
995 total within(ms) 47
6 open index time:47
28367 total within(ms) 15
37970 total within(ms) 31
45169 total within(ms) 46
21168 total within(ms) 31
1112 total within(ms) 31
7 open index time:31
31424 total within(ms) 31
42002 total within(ms) 16
49994 total within(ms) 31
23432 total within(ms) 32
1223 total within(ms) 47
8 open index time:46
33895 total within(ms) 32
45292 total within(ms) 47
53957 total within(ms) 47
25230 total within(ms) 32
1352 total within(ms) 47
9 open index time:63
37320 total within(ms) 31
49922 total within(ms) 15
59412 total within(ms) 47
27830 total within(ms) 31
1474 total within(ms) 62
10 open index time:984
38475 total within(ms) 16
51552 total within(ms) 16
61389 total within(ms) 47
28638 total within(ms) 46
1530 total within(ms) 157

-Original Message-
From: Doug Cutting [mailto:[EMAIL PROTECTED] 
Sent: Monday, March 08, 2004 1:16 PM
To: Lucene Users List
Subject: Re: Sys properties Was: java.io.tmpdir as lock dir  once again

hui wrote:
 Index time: 
 compound format is 89 seconds slower.
 
 compound format:
 1389507 total milliseconds
 non-compound format:
 1300534 total milliseconds
 
 The index size is 85m with 4 fields only. The files are stored in the index.
 The compound format has only 3 files and the other has 13 files. 

Thanks for performing this benchmark!

It looks like the compound format is around 7% slower when indexing.  To 
my thinking that's acceptable, given the dramatic reduction in file 
handles.  If folks really need maximal indexing performance, then they 
can explicitly disable the compound format.

Would anyone object to making compound format the default for Lucene 
1.4?  This is an incompatible change, but I don't think it should break 
applications.

Doug




RE: Sys properties Was: java.io.tmpdir as lock dir .... once again

2004-03-08 Thread hui




Hi,

Here is the indexing performance testing result for the two index formats.


1000 megahertz Intel Pentium III (2 installed)
32 kilobyte primary memory cache
256 kilobyte secondary memory cache

SCSI hard drive, 145.45 GB
RAM: 3 GB

Windows 2000 Advanced Server, Service Pack 2

JDK 1.4.0
JVM memory: 512 MB

Indexed files: 66100 local text files, around 400 MB in total

Index time: 
compound format is 89 seconds slower.

compound format:
1389507 total milliseconds
non-compound format:
1300534 total milliseconds

The index size is 85 MB with only 4 fields. The file contents are stored in
the index. The compound format has only 3 files; the other has 13.

Search time (only the top 10 hits retrieved, no indexing at the same time, a
single search thread, indices optimized and already opened):
I do not see much consistent difference in this simple scenario.

compound format:
Query: iraq
4275 total within(ms) 110
Query: war
5728 total within(ms) 0
Query: iraq AND war
3182 total within(ms) 16

non-compound format:
Query: war
5728 total within(ms) 125
Query: iraq war
6821 total within(ms) 31
Query: iraq AND war
3182 total within(ms) 0



-Original Message-
From: Doug Cutting [mailto:[EMAIL PROTECTED] 
Sent: Thursday, March 04, 2004 11:54 AM
To: Lucene Users List
Subject: Re: Sys properties Was: java.io.tmpdir as lock dir  once again

hui wrote:
 Not yet. With the compound file format, as the files get bigger, if I
 frequently add a few new files, the bigger files have to be updated. Will
 that significantly affect search and produce heavier disk I/O compared with
 the traditional index format? It seems the OS cache makes quite a difference
 when the files are not changing.

The compound format slows indexing performance slightly, but should not 
affect search performance much.  It radically reduces the number of file 
handles used when searching, by a factor of eight or more, depending on 
how many indexed fields you have.

Perhaps the compound format should be the default format in 1.4.  Can 
folks provide any benchmarks for how it affects performance?

Doug




Re: Sys properties Was: java.io.tmpdir as lock dir .... once again

2004-03-08 Thread Andrzej Bialecki
hui wrote:



 Hi,
 
 Here is the indexing performance testing result for the two index formats.

A shameless plug: you can use Luke (http://www.getopt.org/luke) to 
convert the same index between the compound and non-compound formats, which 
could be useful to rule out any differences in the indexing/inserting 
process between the runs. Luke also provides a simple time measurement 
for query execution. Just FYI.

--
Best regards,
Andrzej Bialecki
-
Software Architect, System Integration Specialist
CEN/ISSS EC Workshop, ECIMF project chair
EU FP6 E-Commerce Expert/Evaluator
-
FreeBSD developer (http://www.freebsd.org)


RE: Sys properties Was: java.io.tmpdir as lock dir .... once again

2004-03-08 Thread hui
Thank you, the conversion option in Luke is really helpful for migrating
existing user indexes.
Regards,
Hui

-Original Message-
From: Andrzej Bialecki [mailto:[EMAIL PROTECTED] 
Sent: Monday, March 08, 2004 10:57 AM
To: Lucene Users List
Subject: Re: Sys properties Was: java.io.tmpdir as lock dir  once again

hui wrote:

 
 
 
 Hi,
 
 Here is the indexing performance testing result for the two index formats.

 A shameless plug: you can use Luke (http://www.getopt.org/luke) to 
 convert the same index between the compound and non-compound formats, which 
 could be useful to rule out any differences in the indexing/inserting 
 process between the runs. Luke also provides a simple time measurement 
 for query execution. Just FYI.

-- 
Best regards,
Andrzej Bialecki

-
Software Architect, System Integration Specialist
CEN/ISSS EC Workshop, ECIMF project chair
EU FP6 E-Commerce Expert/Evaluator
-
FreeBSD developer (http://www.freebsd.org)





Re: Sys properties Was: java.io.tmpdir as lock dir .... once again

2004-03-08 Thread Doug Cutting
hui wrote:
 Index time: 
 compound format is 89 seconds slower.
 
 compound format:
 1389507 total milliseconds
 non-compound format:
 1300534 total milliseconds
 
 The index size is 85m with 4 fields only. The files are stored in the index.
 The compound format has only 3 files and the other has 13 files. 

Thanks for performing this benchmark!

It looks like the compound format is around 7% slower when indexing.  To 
my thinking that's acceptable, given the dramatic reduction in file 
handles.  If folks really need maximal indexing performance, then they 
can explicitly disable the compound format.
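
For reference, the 7% figure follows from the two totals quoted above. A minimal sketch of the arithmetic, where the class and method names are hypothetical and only the two millisecond totals come from the benchmark:

```java
/** Hypothetical helper: relative slowdown between two wall-clock totals. */
final class CompoundOverhead {
    private CompoundOverhead() {}

    /** Returns how much slower slowerMillis is than fasterMillis, in percent. */
    static double percentSlower(long slowerMillis, long fasterMillis) {
        return 100.0 * (slowerMillis - fasterMillis) / fasterMillis;
    }

    public static void main(String[] args) {
        long compound = 1389507L;     // total indexing time, compound format
        long nonCompound = 1300534L;  // total indexing time, non-compound format

        // Absolute difference: 88973 ms, i.e. about 89 seconds.
        System.out.println("difference (ms): " + (compound - nonCompound));

        // Relative difference: a little under 7 percent for these totals.
        System.out.println("percent slower: " + percentSlower(compound, nonCompound));
    }
}
```

This is a single run on one machine, so the percentage is an indication rather than a precise measurement.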

Would anyone object to making compound format the default for Lucene 
1.4?  This is an incompatible change, but I don't think it should break 
applications.

Doug



Re: Sys properties Was: java.io.tmpdir as lock dir .... once again

2004-03-08 Thread Terry Steichen
I tend to agree (but with the same uncertainty as to why I feel that way).

Regards,

Terry
- Original Message - 
From: Otis Gospodnetic [EMAIL PROTECTED]
To: Lucene Users List [EMAIL PROTECTED]
Sent: Monday, March 08, 2004 2:34 PM
Subject: Re: Sys properties Was: java.io.tmpdir as lock dir  once again


 I can't explain why, but I feel like the old index format should stay
 the default.  I'd rather have a (slightly) faster index, and
 switch to the compound one when/IF I encounter problems, than have a
 safer, but slower index, and never realize that there is a faster
 option available.
 
 Weak argument, I know, but some instinct in me thinks that the current
 mode should remain.
 
 Otis
 
 
 --- Doug Cutting [EMAIL PROTECTED] wrote:
  hui wrote:
   Index time: 
   compound format is 89 seconds slower.
   
   compound format:
   1389507 total milliseconds
   non-compound format:
   1300534 total milliseconds
   
   The index size is 85m with 4 fields only. The files are stored in the index.
   The compound format has only 3 files and the other has 13 files. 
  
  Thanks for performing this benchmark!
  
  It looks like the compound format is around 7% slower when indexing.  To 
  my thinking that's acceptable, given the dramatic reduction in file 
  handles.  If folks really need maximal indexing performance, then they 
  can explicitly disable the compound format.
  
  Would anyone object to making compound format the default for Lucene 
  1.4?  This is an incompatible change, but I don't think it should break 
  applications.
  
  Doug
  



RE: Sys properties Was: java.io.tmpdir as lock dir .... once again

2004-03-04 Thread hui
Not yet. With the compound file format, as the files get bigger, if I
frequently add a few new files, the bigger files have to be updated. Will
that significantly affect search and produce heavier disk I/O compared with
the traditional index format? It seems the OS cache makes quite a difference
when the files are not changing.

Regards,
Hui

-Original Message-
From: Erik Hatcher [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, March 03, 2004 9:21 PM
To: Lucene Users List
Subject: Re: Sys properties Was: java.io.tmpdir as lock dir  once again


On Mar 3, 2004, at 4:25 PM, hui wrote:
 Another, similar issue: if we had a parameter to control the maximum
 number of files within the index, it would avoid the problem of running
 out of file handles.
 When the number of files in one index reaches the limit, optimization
 would be triggered.
 Right now, if the number of files in one index exceeds the limit of your
 Windows system, you lose the index.
 Thank you for the consideration.

Have you tried using the compound file format introduced in 1.3?






Re: Sys properties Was: java.io.tmpdir as lock dir .... once again

2004-03-04 Thread Doug Cutting
hui wrote:
 Not yet. With the compound file format, as the files get bigger, if I
 frequently add a few new files, the bigger files have to be updated. Will
 that significantly affect search and produce heavier disk I/O compared with
 the traditional index format? It seems the OS cache makes quite a difference
 when the files are not changing.

The compound format slows indexing performance slightly, but should not 
affect search performance much.  It radically reduces the number of file 
handles used when searching, by a factor of eight or more, depending on 
how many indexed fields you have.
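
As a rough illustration of why the reduction depends on the number of indexed fields, the sketch below estimates the number of files in an index. It assumes the Lucene 1.x on-disk layout of seven fixed files per segment (.fnm, .fdt, .fdx, .tis, .tii, .frq, .prx), one norms file per indexed field, and two index-wide files (segments, deletable). Those counts are an assumption, not something stated in this thread, and the class is hypothetical, although the result does agree with the 13 vs. 3 files hui reported for an optimized index with 4 fields.

```java
/** Hypothetical estimate of file counts; the per-segment counts are assumptions. */
final class SegmentFileEstimate {
    private SegmentFileEstimate() {}

    // Assumed fixed files per segment: .fnm .fdt .fdx .tis .tii .frq .prx
    static final int FIXED_FILES_PER_SEGMENT = 7;

    // Assumed index-wide files: "segments" and "deletable"
    static final int INDEX_WIDE_FILES = 2;

    /** Non-compound layout: fixed files plus one norms file per indexed field. */
    static int nonCompoundFiles(int segments, int indexedFields) {
        return INDEX_WIDE_FILES + segments * (FIXED_FILES_PER_SEGMENT + indexedFields);
    }

    /** Compound layout: a single compound file per segment. */
    static int compoundFiles(int segments) {
        return INDEX_WIDE_FILES + segments;
    }

    public static void main(String[] args) {
        // An optimized index has one segment; hui's index has 4 fields.
        System.out.println("non-compound: " + nonCompoundFiles(1, 4));
        System.out.println("compound:     " + compoundFiles(1));
    }
}
```

Under these assumptions an unoptimized index with many segments multiplies the per-segment difference, which is where file-handle limits start to matter.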

Perhaps the compound format should be the default format in 1.4.  Can 
folks provide any benchmarks for how it affects performance?

Doug



Re: Sys properties Was: java.io.tmpdir as lock dir .... once again

2004-03-03 Thread Stephane James Vaucher
How about (looking big rather than small):

- MaxClause from BooleanQuery (I know there have been discussions on 
the dev list, but I haven't been following them)
- default commit_lock_name
- default commit_lock_timeout
- default maxFieldLength
- default maxMergeDocs
- default mergeFactor
- default minMergeDocs
- default write_lock_name
- default write_lock_timeout

I'm currently configuring parts of my app using sys properties, 
particularly the mergeFactor, because my prod system has 2GB of RAM and is 
Windows-based and my dev machine has 256MB and runs Linux. If no one takes a 
crack at this, I'll see what I can do in 2 weeks, after my vacation.
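
To make the idea concrete, here is a minimal sketch of reading such defaults from system properties with a fallback. The helper class is hypothetical and not part of Lucene. Only the property name org.apache.lucene.mergeFactor appears in this thread; the maxFieldLength property name is an assumption that follows the same pattern.

```java
/** Hypothetical helper, not a Lucene class: integer defaults from system properties. */
final class IndexDefaults {
    private IndexDefaults() {}

    /**
     * Returns the integer value of the named system property, or the
     * fallback when the property is unset or is not a valid integer.
     */
    static int intProperty(String name, int fallback) {
        String raw = System.getProperty(name);
        if (raw == null) {
            return fallback;
        }
        try {
            return Integer.parseInt(raw.trim());
        } catch (NumberFormatException e) {
            // A malformed value should not prevent the class from loading.
            return fallback;
        }
    }

    public static void main(String[] args) {
        // Property name taken from the thread.
        int mergeFactor = intProperty("org.apache.lucene.mergeFactor", 10);
        // Assumed property name, by analogy with the one above.
        int maxFieldLength = intProperty("org.apache.lucene.maxFieldLength", 10000);
        System.out.println("mergeFactor=" + mergeFactor);
        System.out.println("maxFieldLength=" + maxFieldLength);
    }
}
```

With something like this in place, the production and development machines could run the same code and differ only in a command-line flag such as -Dorg.apache.lucene.mergeFactor=50. Falling back on a malformed value is a design choice; failing fast would be equally defensible.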

Cheers,
sv

On Wed, 3 Mar 2004, Doug Cutting wrote:

 Stephane James Vaucher wrote:
  As I've stated in my earlier mail, I like this change. More importantly, 
  could this become a standard way of changing configurations at runtime? 
  For example, the default merge factor could also be set in this manner.
 
 Sure, that's reasonable, so this would be something like:
 
 private static final int DEFAULT_MERGE_FACTOR =
   Integer.parseInt(System.getProperty("org.apache.lucene.mergeFactor", "10"));
 
 In IndexWriter.java.
 
 What other candidates are there for this treatment?
 
 Doug
 



RE: Sys properties Was: java.io.tmpdir as lock dir .... once again

2004-03-03 Thread hui
Another, similar issue: if we had a parameter to control the maximum
number of files within the index, it would avoid the problem of running
out of file handles.
When the number of files in one index reaches the limit, optimization
would be triggered.
Right now, if the number of files in one index exceeds the limit of your
Windows system, you lose the index.
Thank you for the consideration.
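
A minimal sketch of the kind of check described above, done at the application level. Nothing here is an existing Lucene parameter: the class, the method names, and the threshold are all hypothetical, and the caller would still have to trigger the optimization itself.

```java
import java.io.File;

/** Hypothetical application-side guard; not a Lucene API. */
final class IndexFileLimit {
    private IndexFileLimit() {}

    /** Counts the regular files directly inside the index directory. */
    static int countFiles(File indexDir) {
        File[] entries = indexDir.listFiles();
        if (entries == null) {
            return 0; // not a directory, or it could not be read
        }
        int count = 0;
        for (File entry : entries) {
            if (entry.isFile()) {
                count++;
            }
        }
        return count;
    }

    /** Returns true once the number of files has reached the chosen limit. */
    static boolean shouldOptimize(File indexDir, int maxFiles) {
        return countFiles(indexDir) >= maxFiles;
    }
}
```

An indexing loop could call shouldOptimize after each batch of documents and optimize the index when it returns true. The number of files on disk is only a proxy for the number of file handles a searcher will open.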

Regards,
hui

-Original Message-
From: Doug Cutting [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, March 03, 2004 3:46 PM
To: Lucene Users List
Subject: Re: Sys properties Was: java.io.tmpdir as lock dir  once again

Stephane James Vaucher wrote:
 As I've stated in my earlier mail, I like this change. More importantly, 
 could this become a standard way of changing configurations at runtime? 
 For example, the default merge factor could also be set in this manner.

Sure, that's reasonable, so this would be something like:

private static final int DEFAULT_MERGE_FACTOR =
  Integer.parseInt(System.getProperty("org.apache.lucene.mergeFactor", "10"));

In IndexWriter.java.

What other candidates are there for this treatment?

Doug




Re: Sys properties Was: java.io.tmpdir as lock dir .... once again

2004-03-03 Thread Erik Hatcher
On Mar 3, 2004, at 4:25 PM, hui wrote:
 Another, similar issue: if we had a parameter to control the maximum
 number of files within the index, it would avoid the problem of running
 out of file handles.
 When the number of files in one index reaches the limit, optimization
 would be triggered.
 Right now, if the number of files in one index exceeds the limit of your
 Windows system, you lose the index.
 Thank you for the consideration.

Have you tried using the compound file format introduced in 1.3?


