Re: upgrade from Lucene 1.3 final to 1.4rc3 problem

2004-07-07 Thread Alex Aw Seat Kiong
Hi!

Thanks, the problem was solved by upgrading to Lucene 1.4 final.

Regards,
AlexAw


- Original Message - 
From: Zilverline info [EMAIL PROTECTED]
To: Lucene Users List [EMAIL PROTECTED]
Sent: Wednesday, July 07, 2004 10:32 PM
Subject: Re: upgrade from Lucene 1.3 final to 1.4rc3 problem


 This is a bug (see the posting 'Lockfile Problem Solved'). Upgrade to
 1.4-final and you'll be fine.

 Alex Aw Seat Kiong wrote:

 Hi!
 
 I'm currently using Lucene 1.3 final, and everything was working fine.
 But after I upgraded from Lucene 1.3 final to 1.4rc3 (by simply replacing
 lucene-1.3-final.jar with lucene-1.4-rc3.jar and recompiling), the code
 compiles successfully, but when we try to index a document it gives the
 error below:
 java.lang.NullPointerException
 at org.apache.lucene.store.FSDirectory.create(FSDirectory.java:146)
 at org.apache.lucene.store.FSDirectory.init(FSDirectory.java:126)
 at org.apache.lucene.store.FSDirectory.getDirectory(FSDirectory.java:102)
 at org.apache.lucene.store.FSDirectory.getDirectory(FSDirectory.java:83)
 at org.apache.lucene.index.IndexWriter.init(IndexWriter.java:173)
 What is wrong? Please help.
 
 Thanks.
 
 Regards,
 Alex
 
 
 
 
 
 



 -
 To unsubscribe, e-mail: [EMAIL PROTECTED]
 For additional commands, e-mail: [EMAIL PROTECTED]






upgrade from Lucene 1.3 final to 1.4rc3 problem

2004-07-06 Thread Alex Aw Seat Kiong
Hi!

I'm currently using Lucene 1.3 final, and everything was working fine.
But after I upgraded from Lucene 1.3 final to 1.4rc3 (by simply replacing
lucene-1.3-final.jar with lucene-1.4-rc3.jar and recompiling), the code
compiles successfully, but when we try to index a document it gives the
error below:
java.lang.NullPointerException
at org.apache.lucene.store.FSDirectory.create(FSDirectory.java:146)
at org.apache.lucene.store.FSDirectory.init(FSDirectory.java:126)
at org.apache.lucene.store.FSDirectory.getDirectory(FSDirectory.java:102)
at org.apache.lucene.store.FSDirectory.getDirectory(FSDirectory.java:83)
at org.apache.lucene.index.IndexWriter.init(IndexWriter.java:173)
What is wrong? Please help.

Thanks.

Regards,
Alex





How to use QueryParser to query to get the index summary info?

2004-07-05 Thread Alex Aw Seat Kiong
Hi!

How can I use QueryParser to get index summary information, such as:
a. The first and last indexed documents?
b. The size of each indexed document?
c. The total size of all indexed documents?
d. The total count of all indexed documents?
Does anyone know how to do this?

Thanks.

Regards,
Alex

Are the lucene index server support for other language, like chinese?

2004-05-06 Thread Alex Aw Seat Kiong
Hi!

Does the Lucene index server support other languages, such as Chinese?
What additional work is needed to support them?


Thanks,
Alex




- Original Message - 
From: Otis Gospodnetic [EMAIL PROTECTED]
To: Lucene Users List [EMAIL PROTECTED]
Sent: Thursday, May 06, 2004 3:37 AM
Subject: Re: Where does the name lucene come from?


 Funny, earlier today I started to reply to this message, and then
 decided not to answer this question any more.  It is a FAQ entry now:
 http://www.jguru.com/faq/Lucene

 Otis

 --- Steven Rowe [EMAIL PROTECTED] wrote:
  Til Schneider wrote:
   Hi,
  
   Working now for a few months with this really great search engine, I was
   wondering where the name Lucene comes from? What does it mean? Is
   there any deeper sense?
 
  Doug Cutting's response:
  URL:http://tinyurl.com/2hh5c
 
  (full original URL:
 

URL:http://issues.apache.org/eyebrowse/[EMAIL PROTECTED]
.apache.orgmsgId=961817
  )
 
  Otis, shouldn't this be an FAQ?
 
  Steve
 



Re: Document Clustering

2003-11-11 Thread Alex Aw Seat Kiong
Hi!

I'm also interested in this. Kindly CC me on the latest progress of your
clustering project.

Regards,
AlexAw


- Original Message - 
From: Eric Jain [EMAIL PROTECTED]
To: Lucene Users List [EMAIL PROTECTED]
Sent: Tuesday, November 11, 2003 10:07 PM
Subject: Re: Document Clustering


  I'm working on it. Classification and Clustering as well.

 Very interesting... if you get something working, please don't forget to
 notify this list :-)

 --
 Eric Jain





lucene indexing and searching engine performance

2003-10-30 Thread Alex Aw Seat Kiong

Hi Doug Cutting!

That's really very helpful, thanks Doug.
I'm researching the indexing and searching performance of the Lucene
engine.
So, could you give me more details on the following?
1. Searching
But if you
 need to search two million 2kB documents on a 500Mhz Pentium with 128MB of
 RAM in a couple of seconds per query, you're probably okay.
What are the other hardware specs, such as:
- SCSI or IDE hard disk? If it's a SCSI hard disk, what are the models of
the hard disk and the SCSI card, and the RPM?
- Which OS was used for this performance testing?
- Which application server was used for this performance testing?

2. Indexing (assuming the hardware and software specs are the same as for the
searching server)
The index should generally take less space than the original documents, right?
Assuming 500MB of disk space for the application:
Max index size  : should hold more than 250,000 documents of 2KB
each, right?
Max speed of indexing : ??? 2KB documents per hour
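
A rough sketch of the arithmetic behind point 2, assuming decimal units
(1MB = 1,000,000 bytes, 1KB = 1,000 bytes) and that the 500MB holds only the
index. The index-to-source size ratio is a hypothetical parameter used for
illustration, not a measured Lucene figure.

```java
// Capacity estimate for a fixed disk budget. All figures are illustrative.
final class IndexCapacityEstimate {
    /**
     * @param diskBytes  disk space available for the index
     * @param docBytes   average size of one source document
     * @param indexRatio assumed index size divided by source size (hypothetical)
     * @return number of documents whose index fits in diskBytes
     */
    static long maxDocuments(long diskBytes, long docBytes, double indexRatio) {
        return (long) (diskBytes / (docBytes * indexRatio));
    }

    public static void main(String[] args) {
        long disk = 500_000_000L; // 500MB, decimal units assumed
        long doc = 2_000L;        // 2KB per document
        // Index as large as the source documents.
        System.out.println(maxDocuments(disk, doc, 1.0));
        // Index half the size of the source documents (assumed ratio).
        System.out.println(maxDocuments(disk, doc, 0.5));
    }
}
```

With a ratio of 1.0 this gives 250,000 documents; any ratio below 1.0 gives
more, which is the "more than 250,000" expectation in the question.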


Could you share with us the performance tests that were done?

Thank You.

Regards,
AlexAw





- Original Message - 
From: Maurice Coyle [EMAIL PROTECTED]
To: Lucene Users List [EMAIL PROTECTED]
Sent: Tuesday, October 28, 2003 6:50 PM
Subject: Re: large index query time


 that's very helpful, thanks to all who replied.

 my index is definitely larger than my RAM so i guess the increase in query
 time is due to an increase in time to open the index/perform a search.

 thanks again,
 maurice


 - Original Message -
 From: Tate Avery [EMAIL PROTECTED]
 To: Lucene Users List [EMAIL PROTECTED]
 Sent: Friday, October 24, 2003 5:33 PM
 Subject: RE: large index query time



 Below are some posts from Doug (circa 2001) that I found very helpful with
 regard to understanding Lucene scalability.  I am assuming that they are
 still generally applicable.  You might also find them useful.

 Tate


 ---


 Performance for large indices is frequently governed by i/o performance.  If
 an index is larger than RAM then searches will need to read data from disk.
 This can quickly become a bottleneck.  A search for a term that occurs in a
 million documents can require over 1MB of data, which can take some time to
 read.  With multiple searching threads, the disk can easily become a
 bottleneck.  Disk arrays can alleviate this, more RAM helps even more!

 For some folks, queries that take over a second are unacceptable, for
 others, ten seconds is okay.

 Performance should be more-or-less linear: a two-million document index will
 be almost twice as slow to search as a one-million document index.  There
 are lots of factors, including document size, CPU-speed, RAM-size, i/o
 subsystem, but a rough rule-of-thumb for Lucene performance might be that,
 in a typical configuration, it can search a million documents per second.

 So if you need to search 20 million 100kB documents on a 100Mhz 386 with 8MB
 of RAM with sub-second response time, Lucene will probably fail.  But if you
 need to search two million 2kB documents on a 500Mhz Pentium with 128MB of
 RAM in a couple of seconds per query, you're probably okay.

 - Doug Cutting (10/08/2001)


 Some more precise statements: The cost to search for a term is proportional
 to the number of documents that contain that term.  The cost to search for a
 phrase is proportional to the sum of the number of occurrences of its
 constituent terms.  The cost to execute a boolean query is the sum of the
 costs of its sub-queries.  Longer documents contain more terms: usually both
 more unique terms and more occurrences.

 Total vocabulary size is not a big factor in search performance.  When you
 open an index Lucene does read one out of every 128 unique terms into a
 table, so an index with a large number of unique terms will be slower to
 open.  Searching that table for query terms is also slower for bigger
 indexes, but the time to search that table is not significant in overall
 performance.  Lucene also reads at index open one byte per document per
 indexed field (the normalization factor).  So an index with lots of
 documents and fields will also be slower to open.  But, once opened, the
 cost of searching is largely dependent on the frequency characteristics of
 query terms.  And, since IndexReaders and Searchers are thread safe, you
 don't need to open indexes very often.

 - Doug Cutting (10/08/2001)
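
The cost statements in the post above can be written down as a small model.
This is only a sketch: the class and method names are made up for
illustration, the proportionality constants are assumed to be 1, and nothing
here calls Lucene itself.

```java
// Relative query cost in arbitrary units, following the three statements
// quoted above. Constants of proportionality are assumed to be 1.
final class QueryCostModel {
    // Term query: proportional to the number of documents containing the term.
    static long termCost(long docsContainingTerm) {
        return docsContainingTerm;
    }

    // Phrase query: proportional to the sum of the occurrence counts of its
    // constituent terms.
    static long phraseCost(long... termOccurrences) {
        long sum = 0;
        for (long n : termOccurrences) {
            sum += n;
        }
        return sum;
    }

    // Boolean query: the sum of the costs of its sub-queries.
    static long booleanCost(long... subQueryCosts) {
        long sum = 0;
        for (long c : subQueryCosts) {
            sum += c;
        }
        return sum;
    }

    public static void main(String[] args) {
        long term = termCost(1_000);            // term found in 1,000 documents
        long phrase = phraseCost(5_000, 3_000); // two terms: 5,000 and 3,000 occurrences
        System.out.println(booleanCost(term, phrase));
    }
}
```

The model shows why a boolean query over frequent terms is expensive even
when it returns few hits: every sub-query is paid for in full.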





 -Original Message-
 From: Dan Quaroni [mailto:[EMAIL PROTECTED]
 Sent: October 24, 2003 1:33 PM
 To: 'Lucene Users List'
 Subject: RE: large index query time


 My experience is that the query time (and memory usage) can be affected
 greatly by booleans that retrieve lots of results.

 Are you finding it slow when doing a simple query that should return only a
 handful of results, or is it on more complex queries?

 -Original Message-
 From: Maurice Coyle [mailto:[EMAIL PROTECTED]
 Sent: Friday, October 24, 2003 1:29 PM
 To: 

Re: large index query time

2003-10-29 Thread Alex Aw Seat Kiong
Hi Doug Cutting!

That's really very helpful, thanks Doug.
I'm researching the indexing and searching performance of the Lucene
engine.
So, could you give me more details on the following?
1. Searching
But if you
 need to search two million 2kB documents on a 500Mhz Pentium with 128MB of
 RAM in a couple of seconds per query, you're probably okay.
What are the other hardware specs, such as:
- SCSI or IDE hard disk? If it's a SCSI hard disk, what are the models of
the hard disk and the SCSI card, and the RPM?
- Which OS was used for this performance testing?
- Which application server was used for this performance testing?

2. Indexing (assuming the hardware and software specs are the same as for the
searching server)
The index should generally take less space than the original documents, right?
Assuming 500MB of disk space for the application:
Max index size  : should hold more than 250,000 documents of 2KB
each, right?
Max speed of indexing : ??? 2KB documents per hour


Could you share with us the performance tests that were done?

Thank You.

Regards,
AlexAw





- Original Message - 
From: Maurice Coyle [EMAIL PROTECTED]
To: Lucene Users List [EMAIL PROTECTED]
Sent: Tuesday, October 28, 2003 6:50 PM
Subject: Re: large index query time


 that's very helpful, thanks to all who replied.
 
 my index is definitely larger than my RAM so i guess the increase in query
 time is due to an increase in time to open the index/perform a search.
 
 thanks again,
 maurice
 
 
 - Original Message -
 From: Tate Avery [EMAIL PROTECTED]
 To: Lucene Users List [EMAIL PROTECTED]
 Sent: Friday, October 24, 2003 5:33 PM
 Subject: RE: large index query time
 
 
 
 Below are some posts from Doug (circa 2001) that I found very helpful with
 regard to understanding Lucene scalability.  I am assuming that they are
 still generally applicable.  You might also find them useful.
 
 Tate
 
 
 ---
 
 
 Performance for large indices is frequently governed by i/o performance.  If
 an index is larger than RAM then searches will need to read data from disk.
 This can quickly become a bottleneck.  A search for a term that occurs in a
 million documents can require over 1MB of data, which can take some time to
 read.  With multiple searching threads, the disk can easily become a
 bottleneck.  Disk arrays can alleviate this, more RAM helps even more!
 
 For some folks, queries that take over a second are unacceptable, for
 others, ten seconds is okay.
 
 Performance should be more-or-less linear: a two-million document index will
 be almost twice as slow to search as a one-million document index.  There
 are lots of factors, including document size, CPU-speed, RAM-size, i/o
 subsystem, but a rough rule-of-thumb for Lucene performance might be that,
 in a typical configuration, it can search a million documents per second.
 
 So if you need to search 20 million 100kB documents on a 100Mhz 386 with 8MB
 of RAM with sub-second response time, Lucene will probably fail.  But if you
 need to search two million 2kB documents on a 500Mhz Pentium with 128MB of
 RAM in a couple of seconds per query, you're probably okay.
 
 - Doug Cutting (10/08/2001)
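
As a back-of-the-envelope sketch, the rule of thumb above can be turned into
an estimate. It assumes strictly linear scaling and the quoted figure of a
million documents per second; the class name and constant are illustrative,
and real timings depend on the hardware factors listed in the post.

```java
// Rough per-query time estimate from the "million documents per second"
// rule of thumb. Assumes linear scaling; not a benchmark.
final class SearchTimeEstimate {
    // Assumed throughput taken from the rule of thumb, not a measured value.
    static final double DOCS_PER_SECOND = 1_000_000.0;

    static double estimatedSeconds(long documentCount) {
        return documentCount / DOCS_PER_SECOND;
    }

    public static void main(String[] args) {
        long[] indexSizes = {1_000_000L, 2_000_000L, 20_000_000L};
        for (long n : indexSizes) {
            System.out.printf("%,d documents: about %.1f s per query%n",
                    n, estimatedSeconds(n));
        }
    }
}
```

For two million documents the estimate is about two seconds, matching the
"couple of seconds per query" case; for 20 million it is about twenty
seconds, which is why the sub-second target is described as likely to fail.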
 
 
 Some more precise statements: The cost to search for a term is proportional
 to the number of documents that contain that term.  The cost to search for a
 phrase is proportional to the sum of the number of occurrences of its
 constituent terms.  The cost to execute a boolean query is the sum of the
 costs of its sub-queries.  Longer documents contain more terms: usually both
 more unique terms and more occurrences.
 
 Total vocabulary size is not a big factor in search performance.  When you
 open an index Lucene does read one out of every 128 unique terms into a
 table, so an index with a large number of unique terms will be slower to
 open.  Searching that table for query terms is also slower for bigger
 indexes, but the time to search that table is not significant in overall
 performance.  Lucene also reads at index open one byte per document per
 indexed field (the normalization factor).  So an index with lots of
 documents and fields will also be slower to open.  But, once opened, the
 cost of searching is largely dependent on the frequency characteristics of
 query terms.  And, since IndexReaders and Searchers are thread safe, you
 don't need to open indexes very often.
 
 - Doug Cutting (10/08/2001)
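
The two index-open costs described above (one of every 128 unique terms read
into a table, plus one byte per document per indexed field) can be estimated
as follows. This is a sketch with made-up names and example figures; it does
not read a real index and ignores the per-entry size of the term table.

```java
// Estimates what is read when an index is opened, per the description above.
final class IndexOpenEstimate {
    // From the post: one out of every 128 unique terms is read into a table.
    static final int TERM_INDEX_INTERVAL = 128;

    // Approximate number of term-table entries (rounded down).
    static long termTableEntries(long uniqueTerms) {
        return uniqueTerms / TERM_INDEX_INTERVAL;
    }

    // Normalization factors: one byte per document per indexed field.
    static long normBytes(long documents, int indexedFields) {
        return documents * indexedFields;
    }

    public static void main(String[] args) {
        // Example figures, chosen only for illustration.
        long uniqueTerms = 1_280_000L;
        long documents = 2_000_000L;
        int indexedFields = 3;
        System.out.println("term table entries: " + termTableEntries(uniqueTerms));
        System.out.println("norm bytes: " + normBytes(documents, indexedFields));
    }
}
```

Both quantities grow with index size, which is why the post recommends
opening the index once and sharing the reader or searcher across threads.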
 
 
 
 
 
 -Original Message-
 From: Dan Quaroni [mailto:[EMAIL PROTECTED]
 Sent: October 24, 2003 1:33 PM
 To: 'Lucene Users List'
 Subject: RE: large index query time
 
 
 My experience is that the query time (and memory usage) can be affected
 greatly by booleans that retrieve lots of results.
 
 Are you finding it slow when doing a simple query that should return only a
 handful of results, or is it on more complex queries?
 
 -Original Message-
 From: Maurice Coyle [mailto:[EMAIL PROTECTED]
 Sent: