Re: upgrade from Lucene 1.3 final to 1.4rc3 problem
Hi!

Thanks, the problem was solved by using Lucene 1.4 final.

Regards,
AlexAw

----- Original Message -----
From: Zilverline info [EMAIL PROTECTED]
To: Lucene Users List [EMAIL PROTECTED]
Sent: Wednesday, July 07, 2004 10:32 PM
Subject: Re: upgrade from Lucene 1.3 final to 1.4rc3 problem

This is a bug (see the posting 'Lockfile Problem Solved'). Upgrade to 1.4-final and you'll be fine.

Alex Aw Seat Kiong wrote:

Hi!

I'm currently using Lucene 1.3 final, and everything was working fine. But after I upgraded from Lucene 1.3 final to 1.4rc3 (simply replacing lucene-1.3-final.jar with lucene-1.4-rc3.jar and recompiling), the code recompiles successfully, but when we try to index a document it gives the error below:

java.lang.NullPointerException
    at org.apache.lucene.store.FSDirectory.create(FSDirectory.java:146)
    at org.apache.lucene.store.FSDirectory.init(FSDirectory.java:126)
    at org.apache.lucene.store.FSDirectory.getDirectory(FSDirectory.java:102)
    at org.apache.lucene.store.FSDirectory.getDirectory(FSDirectory.java:83)
    at org.apache.lucene.index.IndexWriter.init(IndexWriter.java:173)

What is wrong? Please help. Thanks.

Regards,
Alex

-----
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
upgrade from Lucene 1.3 final to 1.4rc3 problem
Hi!

I'm currently using Lucene 1.3 final, and everything was working fine. But after I upgraded from Lucene 1.3 final to 1.4rc3 (simply replacing lucene-1.3-final.jar with lucene-1.4-rc3.jar and recompiling), the code recompiles successfully, but when we try to index a document it gives the error below:

java.lang.NullPointerException
    at org.apache.lucene.store.FSDirectory.create(FSDirectory.java:146)
    at org.apache.lucene.store.FSDirectory.init(FSDirectory.java:126)
    at org.apache.lucene.store.FSDirectory.getDirectory(FSDirectory.java:102)
    at org.apache.lucene.store.FSDirectory.getDirectory(FSDirectory.java:83)
    at org.apache.lucene.index.IndexWriter.init(IndexWriter.java:173)

What is wrong? Please help. Thanks.

Regards,
Alex
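When jars are swapped by hand as described above, it can help to confirm which jar the JVM actually loads a class from before digging into the stack trace. The following is a minimal sketch using only the JDK. The class name JarLocator and the method locate are made up for this illustration and are not part of Lucene, and nothing in the thread says this check was performed.

```java
import java.security.CodeSource;

final class JarLocator {
    /** Returns where the named class was loaded from, or a short note if that is unknown. */
    static String locate(String className) {
        try {
            // initialize=false: we only want the class's origin, not to run its static initializers.
            Class<?> cls = Class.forName(className, false, JarLocator.class.getClassLoader());
            CodeSource source = cls.getProtectionDomain().getCodeSource();
            if (source == null || source.getLocation() == null) {
                // Typical for JDK classes loaded by the bootstrap loader.
                return className + ": no code source available";
            }
            return className + ": " + source.getLocation();
        } catch (ClassNotFoundException e) {
            return className + ": not on the classpath";
        }
    }

    public static void main(String[] args) {
        // With a Lucene jar on the classpath, this prints the jar's location.
        System.out.println(locate("org.apache.lucene.index.IndexWriter"));
    }
}
```

If the printed location still points at an older or unexpected jar, the application is not running against the version that was copied in.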
How to use QueryParser to get index summary info?
Hi!

How can I use QueryParser to query the index for summary info, such as:

a. The first and last indexed documents?
b. The size of each indexed document?
c. The total size of all indexed documents?
d. The total count of all indexed documents?

Does anyone know how to do this? Thanks.

Regards,
Alex
Does the Lucene index server support other languages, like Chinese?
Hi!

Does the Lucene index server support other languages, such as Chinese? What additional work needs to be done to support them?

Thanks,
Alex

----- Original Message -----
From: Otis Gospodnetic [EMAIL PROTECTED]
To: Lucene Users List [EMAIL PROTECTED]
Sent: Thursday, May 06, 2004 3:37 AM
Subject: Re: Where does the name lucene come from?

Funny, earlier today I started to reply to this message, and then decided not to answer this question any more. It is a FAQ entry now: http://www.jguru.com/faq/Lucene

Otis

--- Steven Rowe [EMAIL PROTECTED] wrote:

Til Schneider wrote:

Hi,

Working now for a few months with this really great search engine, I was wondering where the name Lucene comes from? What does it mean? Is there any deeper sense?

Doug Cutting's response: URL:http://tinyurl.com/2hh5c
(full original URL: URL:http://issues.apache.org/eyebrowse/[EMAIL PROTECTED] .apache.orgmsgId=961817 )

Otis, shouldn't this be an FAQ?

Steve
Re: Document Clustering
Hi!

I'm also interested in it. Kindly CC me on the latest progress of your clustering project.

Regards,
AlexAw

----- Original Message -----
From: Eric Jain [EMAIL PROTECTED]
To: Lucene Users List [EMAIL PROTECTED]
Sent: Tuesday, November 11, 2003 10:07 PM
Subject: Re: Document Clustering

I'm working on it. Classification and Clustering as well.

Very interesting... if you get something working, please don't forget to notify this list :-)

--
Eric Jain
lucene indexing and searching engine performance
Hi Doug Cutting!

That's really very helpful, thanks to Doug. I'm researching the performance of Lucene's indexing and search engine. Would you be able to give me more details on the following?

1. Searching

"But if you need to search two million 2kB documents on a 500Mhz Pentium with 128MB of RAM in a couple of seconds per query, you're probably okay."

What is the rest of the hardware spec, for example:
- SCSI or IDE hard disk? If SCSI, what are the hard disk model, the SCSI card model, and the RPM?
- Which OS was used for this performance testing?
- Which application server was used for this performance testing?

2. Indexing (assume the hardware and software spec is the same as for the search server)

Index space should generally be less than the original document size, right? Assuming 500MB of disk space for the application:
Max index size: should be more than 250,000 documents of 2 KB each, right?
Max indexing speed: ??? documents of 2KB each per hour

Could you share the performance tests that were done with us?

Thank you.

Regards,
AlexAw

----- Original Message -----
From: Maurice Coyle [EMAIL PROTECTED]
To: Lucene Users List [EMAIL PROTECTED]
Sent: Tuesday, October 28, 2003 6:50 PM
Subject: Re: large index query time

that's very helpful, thanks to all who replied. my index is definitely larger than my RAM so i guess the increase in query time is due to an increase in time to open the index/perform a search.

thanks again,
maurice

----- Original Message -----
From: Tate Avery [EMAIL PROTECTED]
To: Lucene Users List [EMAIL PROTECTED]
Sent: Friday, October 24, 2003 5:33 PM
Subject: RE: large index query time

Below are some posts from Doug (circa 2001) that I found very helpful with regard to understanding Lucene scalability. I am assuming that they are still generally applicable. You might also find them useful.

Tate

---

Performance for large indices is frequently governed by i/o performance. If an index is larger than RAM then searches will need to read data from disk.
This can quickly become a bottleneck. A search for a term that occurs in a million documents can require over 1MB of data, which can take some time to read. With multiple searching threads, the disk can easily become a bottleneck. Disk arrays can alleviate this, more RAM helps even more!

For some folks, queries that take over a second are unacceptable, for others, ten seconds is okay. Performance should be more-or-less linear: a two-million document index will be almost twice as slow to search as a one-million document index.

There are lots of factors, including document size, CPU-speed, RAM-size, i/o subsystem, but a rough rule-of-thumb for Lucene performance might be that, in a typical configuration, it can search a million documents per second. So if you need to search 20 million 100kB documents on a 100Mhz 386 with 8MB of RAM with sub-second response time, Lucene will probably fail. But if you need to search two million 2kB documents on a 500Mhz Pentium with 128MB of RAM in a couple of seconds per query, you're probably okay.

- Doug Cutting (10/08/2001)

Some more precise statements:

The cost to search for a term is proportional to the number of documents that contain that term.

The cost to search for a phrase is proportional to the sum of the number of occurrences of its constituent terms.

The cost to execute a boolean query is the sum of the costs of its sub-queries.

Longer documents contain more terms: usually both more unique terms and more occurrences.

Total vocabulary size is not a big factor in search performance. When you open an index Lucene does read one out of every 128 unique terms into a table, so an index with a large number of unique terms will be slower to open. Searching that table for query terms is also slower for bigger indexes, but the time to search that table is not significant in overall performance. Lucene also reads at index open one byte per document per indexed field (the normalization factor).
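The two open-time figures quoted here (one term in every 128 read into a table, and one byte per document per indexed field) can be turned into a rough back-of-the-envelope estimate. The sketch below is plain arithmetic on those figures, not Lucene code. The class IndexOpenEstimate, its method names, and the sample counts (one million unique terms, three indexed fields) are assumptions chosen purely for illustration.

```java
final class IndexOpenEstimate {
    /** From the quoted post: one out of every 128 unique terms is read into a table at open. */
    static final int TERM_INDEX_INTERVAL = 128;

    /** Approximate number of term-table entries read when the index is opened (rounded up). */
    static long termTableEntries(long uniqueTerms) {
        return (uniqueTerms + TERM_INDEX_INTERVAL - 1) / TERM_INDEX_INTERVAL;
    }

    /** Normalization factors: one byte per document per indexed field. */
    static long normBytes(long documents, int indexedFields) {
        return documents * indexedFields;
    }

    public static void main(String[] args) {
        long documents = 2_000_000L;   // the two-million-document case from the post
        long uniqueTerms = 1_000_000L; // assumed vocabulary size, for illustration only
        int indexedFields = 3;         // assumed number of indexed fields

        System.out.println("term table entries: " + termTableEntries(uniqueTerms)); // 7813
        System.out.println("norm bytes: " + normBytes(documents, indexedFields));   // 6000000
    }
}
```

Under these assumed counts the norms come to about 6MB, which is small on its own but grows with both the document count and the number of indexed fields.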
So an index with lots of documents and fields will also be slower to open. But, once opened, the cost of searching is largely dependent on the frequency characteristics of query terms. And, since IndexReaders and Searchers are thread safe, you don't need to open indexes very often.

- Doug Cutting (10/08/2001)

-----Original Message-----
From: Dan Quaroni [mailto:[EMAIL PROTECTED]
Sent: October 24, 2003 1:33 PM
To: 'Lucene Users List'
Subject: RE: large index query time

My experience is that the query time (and memory usage) can be affected greatly by booleans that retrieve lots of results. Are you finding it slow when doing a simple query that should return only a handful of results, or is it on more complex queries?

-----Original Message-----
From: Maurice Coyle [mailto:[EMAIL PROTECTED]
Sent: Friday, October 24, 2003 1:29 PM
To:
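Doug's three cost statements in this thread (term, phrase, boolean) compose naturally, which also fits Dan's observation that booleans retrieving many results are the expensive ones. The following is a small sketch of that model in plain Java. It is not Lucene's API: the class QueryCostModel, the Query interface, and the factory methods term, phrase, and bool are hypothetical names, and the costs are abstract relative units rather than times.

```java
final class QueryCostModel {
    /** A query whose relative cost can be estimated in abstract units. */
    interface Query {
        long cost();
    }

    /** Term query: cost proportional to the number of documents containing the term. */
    static Query term(long docFreq) {
        return () -> docFreq;
    }

    /** Phrase query: cost proportional to the summed occurrence counts of its terms. */
    static Query phrase(long... termOccurrences) {
        long sum = 0;
        for (long n : termOccurrences) {
            sum += n;
        }
        final long total = sum;
        return () -> total;
    }

    /** Boolean query: cost is the sum of the costs of its sub-queries. */
    static Query bool(Query... clauses) {
        return () -> {
            long sum = 0;
            for (Query clause : clauses) {
                sum += clause.cost();
            }
            return sum;
        };
    }

    public static void main(String[] args) {
        Query rare = term(50);            // assumed: term found in 50 documents
        Query common = term(1_000_000);   // assumed: term found in a million documents
        Query pair = phrase(120_000, 300_000);

        System.out.println(bool(rare, common).cost()); // 1000050
        System.out.println(bool(rare, pair).cost());   // 420050
    }
}
```

In this model the rare term contributes almost nothing; the total is dominated by whichever clause touches the most postings, so a query that looks simple can still be slow if one of its terms is very frequent.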