Re: Sys properties Was: java.io.tmpdir as lock dir .... once again
I added support for all items listed below, except the commit/write lock file names. I don't see why one would want to change those, considering the files are still limited to the index directory.

Otis

--- Stephane James Vaucher [EMAIL PROTECTED] wrote:

How about (looking big rather than small):
- MaxClause from BooleanQuery (I know there have been discussions on the dev list, but I haven't been following them)
- default commit_lock_name
- default commit_lock_timeout
- default maxFieldLength
- default maxMergeDocs
- default mergeFactor
- default minMergeDocs
- default write_lock_name
- default write_lock_timeout

I'm currently configuring parts of my app using sys properties, particularly the mergeFactor, because my prod system has 2GB of RAM and is Windows-based, while my dev machine has 256MB and runs Linux.

If no one takes a crack at this, I'll see what I can do in 2 weeks, after my vacation.

Cheers,
sv

On Wed, 3 Mar 2004, Doug Cutting wrote:

Stephane James Vaucher wrote:
As I've stated in my earlier mail, I like this change. More importantly, could this become a standard way of changing configurations at runtime? For example, the default merge factor could also be set in this manner.

Sure, that's reasonable, so this would be something like:

  private static final int DEFAULT_MERGE_FACTOR =
    Integer.parseInt(System.getProperty("org.apache.lucene.mergeFactor", "10"));

in IndexWriter.java. What other candidates are there for this treatment?

Doug

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
RE: Sys properties Was: java.io.tmpdir as lock dir .... once again
(ms) 16
  39772 total within(ms) 32
  18645 total within(ms) 32
  995 total within(ms) 47

6 open index time:47
  28367 total within(ms) 15
  37970 total within(ms) 31
  45169 total within(ms) 46
  21168 total within(ms) 31
  1112 total within(ms) 31

7 open index time:31
  31424 total within(ms) 31
  42002 total within(ms) 16
  49994 total within(ms) 31
  23432 total within(ms) 32
  1223 total within(ms) 47

8 open index time:46
  33895 total within(ms) 32
  45292 total within(ms) 47
  53957 total within(ms) 47
  25230 total within(ms) 32
  1352 total within(ms) 47

9 open index time:63
  37320 total within(ms) 31
  49922 total within(ms) 15
  59412 total within(ms) 47
  27830 total within(ms) 31
  1474 total within(ms) 62

10 open index time:984
  38475 total within(ms) 16
  51552 total within(ms) 16
  61389 total within(ms) 47
  28638 total within(ms) 46
  1530 total within(ms) 157

-----Original Message-----
From: Doug Cutting [mailto:[EMAIL PROTECTED]]
Sent: Monday, March 08, 2004 1:16 PM
To: Lucene Users List
Subject: Re: Sys properties Was: java.io.tmpdir as lock dir once again

hui wrote:
Index time: compound format is 89 seconds slower.
  compound format: 1389507 total milliseconds
  non-compound format: 1300534 total milliseconds
The index size is 85m with 4 fields only. The files are stored in the index. The compound format has only 3 files and the other has 13 files.

Thanks for performing this benchmark! It looks like the compound format is around 7% slower when indexing. To my thinking that's acceptable, given the dramatic reduction in file handles. If folks really need maximal indexing performance, then they can explicitly disable the compound format.

Would anyone object to making compound format the default for Lucene 1.4? This is an incompatible change, but I don't think it should break applications.

Doug
RE: Sys properties Was: java.io.tmpdir as lock dir .... once again
Hi,

Here is the indexing performance testing result for the two index formats.

Test machine:
- 1000 megahertz Intel Pentium III (2 installed)
- 32 kilobyte primary memory cache
- 256 kilobyte secondary memory cache
- SCSI hard drive, 145.45 GB
- RAM 3G
- Windows 2000 Advanced Server, Service Pack 2
- JDK 140
- JVM memory 512m

Indexed files: 66100 local text files, around 400m

Index time: compound format is 89 seconds slower.
  compound format: 1389507 total milliseconds
  non-compound format: 1300534 total milliseconds
The index size is 85m with 4 fields only. The files are stored in the index. The compound format has only 3 files and the other has 13 files.

Search time (with only the top 10 retrieved, no indexing at the same time, a single search thread, indices optimized and opened): I do not see much consistent difference for this simple situation.

compound format:
  Query: iraq          4275 total within(ms) 110
  Query: war           5728 total within(ms) 0
  Query: iraq AND war  3182 total within(ms) 16

non-compound format:
  Query: war           5728 total within(ms) 125
  Query: iraq war      6821 total within(ms) 31
  Query: iraq AND war  3182 total within(ms) 0

-----Original Message-----
From: Doug Cutting [mailto:[EMAIL PROTECTED]]
Sent: Thursday, March 04, 2004 11:54 AM
To: Lucene Users List
Subject: Re: Sys properties Was: java.io.tmpdir as lock dir once again

hui wrote:
Not yet. For the compound file format, when the files get bigger, if I frequently add a few new files, the bigger files have to be updated. Will that affect search a lot and produce heavier disk I/O compared with the traditional index format? It seems the OS cache makes quite a difference when the files are not changed.

The compound format slows indexing performance slightly, but should not affect search performance much. It radically reduces the number of file handles used when searching, by a factor of eight or more, depending on how many indexed fields you have.

Perhaps the compound format should be the default format in 1.4. Can folks provide any benchmarks for how it affects performance?

Doug
Re: Sys properties Was: java.io.tmpdir as lock dir .... once again
hui wrote:
Hi, Here is the indexing performance testing result for the two index formats.

A shameless plug: you can use Luke (http://www.getopt.org/luke) to convert the same index between compound and non-compound formats, which could be useful to rule out any possible differences in the indexing/inserting process between the runs. Luke also provides a simple time measurement for query execution. Just FYI.

--
Best regards,
Andrzej Bialecki

- Software Architect, System Integration Specialist
CEN/ISSS EC Workshop, ECIMF project chair
EU FP6 E-Commerce Expert/Evaluator
- FreeBSD developer (http://www.freebsd.org)
RE: Sys properties Was: java.io.tmpdir as lock dir .... once again
Thank you, the conversion option in Luke is really helpful for migrating existing user indexes.

Regards,
Hui

-----Original Message-----
From: Andrzej Bialecki [mailto:[EMAIL PROTECTED]]
Sent: Monday, March 08, 2004 10:57 AM
To: Lucene Users List
Subject: Re: Sys properties Was: java.io.tmpdir as lock dir once again

hui wrote:
Hi, Here is the indexing performance testing result for the two index formats.

A shameless plug: you can use Luke (http://www.getopt.org/luke) to convert the same index between compound and non-compound formats, which could be useful to rule out any possible differences in the indexing/inserting process between the runs. Luke also provides a simple time measurement for query execution. Just FYI.

--
Best regards,
Andrzej Bialecki

- Software Architect, System Integration Specialist
CEN/ISSS EC Workshop, ECIMF project chair
EU FP6 E-Commerce Expert/Evaluator
- FreeBSD developer (http://www.freebsd.org)
Re: Sys properties Was: java.io.tmpdir as lock dir .... once again
hui wrote:
Index time: compound format is 89 seconds slower.
  compound format: 1389507 total milliseconds
  non-compound format: 1300534 total milliseconds
The index size is 85m with 4 fields only. The files are stored in the index. The compound format has only 3 files and the other has 13 files.

Thanks for performing this benchmark! It looks like the compound format is around 7% slower when indexing. To my thinking that's acceptable, given the dramatic reduction in file handles. If folks really need maximal indexing performance, then they can explicitly disable the compound format.

Would anyone object to making compound format the default for Lucene 1.4? This is an incompatible change, but I don't think it should break applications.

Doug
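For readers who want to check the "around 7%" figure, the arithmetic can be reproduced from hui's two totals. This is only a small sketch: the class and method names (`CompoundOverhead`, `overheadPercent`) are made up for illustration, and the only inputs are the two millisecond totals quoted above.

```java
// Illustrative only: recomputes the indexing overhead from the two
// totals reported in the benchmark. Names here are hypothetical.
class CompoundOverhead {

    // Relative slowdown of the candidate run versus the baseline, in percent.
    static double overheadPercent(long baselineMs, long candidateMs) {
        return 100.0 * (candidateMs - baselineMs) / baselineMs;
    }

    public static void main(String[] args) {
        long nonCompoundMs = 1300534L; // non-compound format total
        long compoundMs = 1389507L;    // compound format total

        long extraMs = compoundMs - nonCompoundMs; // about 89 seconds
        System.out.println("extra time (ms): " + extraMs);
        System.out.println("overhead (%): " + overheadPercent(nonCompoundMs, compoundMs));
    }
}
```

The difference is 88973 ms, which matches the "89 seconds slower" in the report, and the ratio comes out a little under 7%, consistent with Doug's rounding.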
Re: Sys properties Was: java.io.tmpdir as lock dir .... once again
I tend to agree (but with the same uncertainty as to why I feel that way).

Regards,
Terry

----- Original Message -----
From: Otis Gospodnetic [EMAIL PROTECTED]
To: Lucene Users List [EMAIL PROTECTED]
Sent: Monday, March 08, 2004 2:34 PM
Subject: Re: Sys properties Was: java.io.tmpdir as lock dir once again

I can't explain why, but I feel like the old index format should stay by default. I feel like I'd rather have a (slightly) faster index, and switch to the compound one when/IF I encounter problems, than have a safer, but slower index, and never realize that there is a faster option available.

Weak argument, I know, but some instinct in me thinks that the current mode should remain.

Otis

--- Doug Cutting [EMAIL PROTECTED] wrote:

hui wrote:
Index time: compound format is 89 seconds slower.
  compound format: 1389507 total milliseconds
  non-compound format: 1300534 total milliseconds
The index size is 85m with 4 fields only. The files are stored in the index. The compound format has only 3 files and the other has 13 files.

Thanks for performing this benchmark! It looks like the compound format is around 7% slower when indexing. To my thinking that's acceptable, given the dramatic reduction in file handles. If folks really need maximal indexing performance, then they can explicitly disable the compound format.

Would anyone object to making compound format the default for Lucene 1.4? This is an incompatible change, but I don't think it should break applications.

Doug
RE: Sys properties Was: java.io.tmpdir as lock dir .... once again
Not yet. For the compound file format, when the files get bigger, if I frequently add a few new files, the bigger files have to be updated. Will that affect search a lot and produce heavier disk I/O compared with the traditional index format? It seems the OS cache makes quite a difference when the files are not changed.

Regards,
Hui

-----Original Message-----
From: Erik Hatcher [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, March 03, 2004 9:21 PM
To: Lucene Users List
Subject: Re: Sys properties Was: java.io.tmpdir as lock dir once again

On Mar 3, 2004, at 4:25 PM, hui wrote:
Another similar issue: if we could have a parameter to control the max number of files within the index, that would avoid the problem of running out of file handles. When the number of files within one index reaches the limit, optimization would be called. Right now, if the number of files within one index exceeds the limit of your Windows system, you lose the index. Thank you for the consideration.

Have you tried using the compound file format introduced in 1.3?
Re: Sys properties Was: java.io.tmpdir as lock dir .... once again
hui wrote:
Not yet. For the compound file format, when the files get bigger, if I frequently add a few new files, the bigger files have to be updated. Will that affect search a lot and produce heavier disk I/O compared with the traditional index format? It seems the OS cache makes quite a difference when the files are not changed.

The compound format slows indexing performance slightly, but should not affect search performance much. It radically reduces the number of file handles used when searching, by a factor of eight or more, depending on how many indexed fields you have.

Perhaps the compound format should be the default format in 1.4. Can folks provide any benchmarks for how it affects performance?

Doug
Re: Sys properties Was: java.io.tmpdir as lock dir .... once again
How about (looking big rather than small):
- MaxClause from BooleanQuery (I know there have been discussions on the dev list, but I haven't been following them)
- default commit_lock_name
- default commit_lock_timeout
- default maxFieldLength
- default maxMergeDocs
- default mergeFactor
- default minMergeDocs
- default write_lock_name
- default write_lock_timeout

I'm currently configuring parts of my app using sys properties, particularly the mergeFactor, because my prod system has 2GB of RAM and is Windows-based, while my dev machine has 256MB and runs Linux.

If no one takes a crack at this, I'll see what I can do in 2 weeks, after my vacation.

Cheers,
sv

On Wed, 3 Mar 2004, Doug Cutting wrote:

Stephane James Vaucher wrote:
As I've stated in my earlier mail, I like this change. More importantly, could this become a standard way of changing configurations at runtime? For example, the default merge factor could also be set in this manner.

Sure, that's reasonable, so this would be something like:

  private static final int DEFAULT_MERGE_FACTOR =
    Integer.parseInt(System.getProperty("org.apache.lucene.mergeFactor", "10"));

in IndexWriter.java. What other candidates are there for this treatment?

Doug
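The system-property default that Doug suggests in the quoted message can be tried outside Lucene with a few lines of plain Java. The sketch below is not the code that went into IndexWriter; `SysPropDefaults` and `intProperty` are hypothetical names, and the fallback on a malformed value is an assumption added here (the one-liner in the thread would throw NumberFormatException instead). Only the property name org.apache.lucene.mergeFactor and the default of 10 come from the thread.

```java
// A minimal sketch, assuming we want a default that can be overridden with
// -Dorg.apache.lucene.mergeFactor=N on the command line. Not Lucene code.
class SysPropDefaults {

    // Hypothetical helper: returns the int value of a system property, or
    // defaultValue when the property is unset or is not a valid integer.
    static int intProperty(String name, int defaultValue) {
        String raw = System.getProperty(name);
        if (raw == null) {
            return defaultValue;
        }
        try {
            return Integer.parseInt(raw.trim());
        } catch (NumberFormatException e) {
            return defaultValue; // assumption: ignore malformed values
        }
    }

    public static void main(String[] args) {
        int mergeFactor = intProperty("org.apache.lucene.mergeFactor", 10);
        System.out.println("mergeFactor = " + mergeFactor);
    }
}
```

Run with, for example, java -Dorg.apache.lucene.mergeFactor=50 SysPropDefaults on the production machine and without the flag on the development machine, which is the kind of per-environment setup Stephane describes.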
RE: Sys properties Was: java.io.tmpdir as lock dir .... once again
Another similar issue: if we could have a parameter to control the max number of files within the index, that would avoid the problem of running out of file handles. When the number of files within one index reaches the limit, optimization would be called. Right now, if the number of files within one index exceeds the limit of your Windows system, you lose the index. Thank you for the consideration.

Regards,
hui

-----Original Message-----
From: Doug Cutting [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, March 03, 2004 3:46 PM
To: Lucene Users List
Subject: Re: Sys properties Was: java.io.tmpdir as lock dir once again

Stephane James Vaucher wrote:
As I've stated in my earlier mail, I like this change. More importantly, could this become a standard way of changing configurations at runtime? For example, the default merge factor could also be set in this manner.

Sure, that's reasonable, so this would be something like:

  private static final int DEFAULT_MERGE_FACTOR =
    Integer.parseInt(System.getProperty("org.apache.lucene.mergeFactor", "10"));

in IndexWriter.java. What other candidates are there for this treatment?

Doug
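The file-count limit hui asks for is not something the thread shows Lucene providing, but the check itself can be approximated from application code. The following is a rough sketch under that assumption: `IndexFileLimit`, `countFiles`, and `shouldOptimize` are hypothetical names, the limit is whatever the application chooses, and the actual optimization call is left to the caller.

```java
import java.io.File;

// Hypothetical application-side check, not part of Lucene: counts the
// regular files in an index directory and compares against a chosen limit.
class IndexFileLimit {

    // Number of regular files directly inside indexDir (0 if it cannot be listed).
    static int countFiles(File indexDir) {
        File[] entries = indexDir.listFiles();
        if (entries == null) {
            return 0; // not a directory, or an I/O error occurred
        }
        int count = 0;
        for (File entry : entries) {
            if (entry.isFile()) {
                count++;
            }
        }
        return count;
    }

    // True once the file count has reached the limit, i.e. when the
    // application should trigger an optimize before adding more documents.
    static boolean shouldOptimize(File indexDir, int maxFiles) {
        return countFiles(indexDir) >= maxFiles;
    }

    public static void main(String[] args) {
        File indexDir = new File(args.length > 0 ? args[0] : ".");
        int maxFiles = 100; // assumed limit, pick one that suits the OS
        System.out.println("files: " + countFiles(indexDir));
        System.out.println("optimize now: " + shouldOptimize(indexDir, maxFiles));
    }
}
```

A check like this only counts files on disk; it says nothing about how many handles a searcher holds open, which is the problem the compound format addresses elsewhere in the thread.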
Re: Sys properties Was: java.io.tmpdir as lock dir .... once again
On Mar 3, 2004, at 4:25 PM, hui wrote:
Another similar issue: if we could have a parameter to control the max number of files within the index, that would avoid the problem of running out of file handles. When the number of files within one index reaches the limit, optimization would be called. Right now, if the number of files within one index exceeds the limit of your Windows system, you lose the index. Thank you for the consideration.

Have you tried using the compound file format introduced in 1.3?