Re: Range Query Somebody HELP please

2004-06-03 Thread Ype Kingma
On Thursday 03 June 2004 07:10, Karthik N S wrote:
> Hey Ype,
>
> The range query
>
>   +button +shirt +filename:[b10181_p100 TO b10181_p200]
>
> did not work for me, but the other form
>
>   +(button OR shirt) +filename:[b10181_p100 TO b10181_p200]
>
> gave me 2 hits, each page containing only one of the terms (button or
> shirt), never both.
>
> From the HTML files I can see that both words are present in more than 2
> files.
>
> Are there any other possibilities for getting both words?

Your index contains book pages as Lucene documents.
In this case you need to index larger parts of the books
as Lucene documents in order to retrieve books with multiple
subjects on different pages.


Kind regards,
Ype
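
To illustrate the point about page-sized documents, here is a toy sketch in plain Java. It uses no Lucene classes; the page contents and the assumption that keys look like "<book>_p<page>" are made up for illustration. An AND over two terms finds nothing when the terms sit on different pages, but it matches once the pages are merged into one document per book.

```java
import java.util.*;

// Toy in-memory model, not Lucene API: each "document" is just a set of terms.
class PageVersusBookDemo {

    // Returns the keys of all documents that contain every required term (AND semantics).
    static List<String> matchAll(Map<String, Set<String>> docs, String... required) {
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, Set<String>> e : docs.entrySet()) {
            if (e.getValue().containsAll(Arrays.asList(required))) {
                hits.add(e.getKey());
            }
        }
        return hits;
    }

    // Merges page-level documents into one document per book.
    // Assumes keys look like "<book>_p<page>" (hypothetical convention).
    static Map<String, Set<String>> mergeByBook(Map<String, Set<String>> pages) {
        Map<String, Set<String>> books = new TreeMap<>();
        for (Map.Entry<String, Set<String>> e : pages.entrySet()) {
            String key = e.getKey();
            String book = key.substring(0, key.lastIndexOf("_p"));
            books.computeIfAbsent(book, k -> new TreeSet<>()).addAll(e.getValue());
        }
        return books;
    }

    // Hypothetical sample data: "button" and "shirt" are on different pages.
    static Map<String, Set<String>> samplePages() {
        Map<String, Set<String>> pages = new TreeMap<>();
        pages.put("b10181_p100", new TreeSet<>(Arrays.asList("button", "sewing")));
        pages.put("b10181_p150", new TreeSet<>(Arrays.asList("shirt", "collar")));
        return pages;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> pages = samplePages();
        System.out.println("page-level hits: " + matchAll(pages, "button", "shirt"));
        System.out.println("book-level hits: " + matchAll(mergeByBook(pages), "button", "shirt"));
    }
}
```

With the sample data the page-level search returns no hits and the book-level search returns the single book b10181. In a real index the same effect would come from adding one Lucene document per book (or per chapter) instead of one per page.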


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



why the score is not 1.0?

2004-06-03 Thread uddam chukmol
Dear all,
 
I have another problem in one of my programs using Lucene. I index a string and then
search for the same string using the same analyzer. My code is as follows:
 
----------

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.Searcher;

public class LuceneIndexExample {

    private static final String _indexDir = "c://lucene-index-dir";

    public static void main(String args[]) throws Exception {

        String sss = "All work and no play makes Jack a dull boy";

        // Index the string as a single document; "true" creates a new index.
        Analyzer analyzer = new StandardAnalyzer();
        boolean flag = true;
        IndexWriter writer = new IndexWriter(_indexDir, analyzer, flag);

        Document document = new Document();
        document.add(Field.Text("champs", sss));
        writer.addDocument(document);
        writer.close();

        // Search for the same string with the same analyzer.
        Searcher s = new IndexSearcher(_indexDir);
        String str = "All work and no play makes Jack a dull boy";
        Query q = QueryParser.parse(str, "champs", analyzer);
        Hits hits = s.search(q);

        // Valid hit indexes run from 0 to hits.length() - 1.
        for (int i = 0; i < hits.length(); i++) {
            System.out.println("score of " + i + " = " + hits.score(i));
        }
        s.close();
    }
}

----------
 
I think I should get 1.0 as the score from the hits collection, but I got 0.30444607
instead.

Does anybody know why this happens? Please help!

Thanks in advance.
 
Uddam


-
Do you Yahoo!?
Friends.  Fun. Try the all-new Yahoo! Messenger

Re: why the score is not 1.0?

2004-06-03 Thread Erik Hatcher
Without looking at your code, a good first suggestion is to use
IndexSearcher.explain(Query, docId) to see why scores are the way they
are.

Erik
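
For what it's worth, the reported value can be reproduced by hand. The sketch below is plain Java, not Lucene code, and rests on several assumptions: that the classic default similarity applies (tf = sqrt(freq), idf = ln(numDocs / (docFreq + 1)) + 1, queryNorm = 1 / sqrt(sum of squared weights)), that the analyzer's default stop list drops "and", "no" and "a" so that 7 terms remain, and that the field's length norm 1/sqrt(7) is stored as 0.375 after norm encoding. As far as I know, Hits only rescales scores when the top raw score exceeds 1.0, so a raw score below 1.0 is returned unchanged.

```java
// Back-of-the-envelope check, not Lucene code: recomputes the reported score
// under the assumption that the classic default similarity formulas apply.
class ScoreSketch {

    // idf = ln(numDocs / (docFreq + 1)) + 1
    static double idf(int docFreq, int numDocs) {
        return Math.log((double) numDocs / (docFreq + 1)) + 1.0;
    }

    // Score of a document that matches each of `terms` distinct query terms exactly once.
    // fieldNorm is the (assumed) stored length norm of the field.
    static double score(int terms, int docFreq, int numDocs, double fieldNorm) {
        double idf = idf(docFreq, numDocs);
        double queryNorm = 1.0 / Math.sqrt(terms * idf * idf); // 1 / sqrt(sum of squared weights)
        double tf = Math.sqrt(1.0);                             // each term occurs once
        double queryWeight = idf * queryNorm;
        double fieldWeight = tf * idf * fieldNorm;
        double coord = 1.0;                                     // all query terms matched
        return coord * terms * queryWeight * fieldWeight;
    }

    public static void main(String[] args) {
        // Assumed: 7 terms survive stop-word removal (all, work, play, makes, jack, dull, boy)
        // and 1/sqrt(7) ~ 0.378 is stored as 0.375.
        System.out.println(score(7, 1, 1, 0.375));
    }
}
```

Under these assumptions the result is roughly 0.3044, which agrees with the 0.30444607 in the question. A score below 1.0 for an identical string is therefore expected in a one-document index; explain() on the real index is the way to confirm each factor.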
On Jun 3, 2004, at 7:21 AM, uddam chukmol wrote:
> I think I should get 1.0 as the score from the hits collection but
> got 0.30444607 instead.


flush an index directory

2004-06-03 Thread uddam chukmol
Hi all,
 
I'm having real trouble with the way Lucene organizes its index. I first ran an
application to index a text. Then I changed the text and ran the application again, but
the index is still not refreshed.

Is there any way to do this? Please help!

Thank you in advance!
 
Uddam




Re: flush an index directory

2004-06-03 Thread jt oob
If I understand your question correctly, you have a document, you index
it, you change the document, and you index the document again.

This leads to the document being in your index twice, once for each
version. Searches will return hits for either the old or the new
version of the document. If you change a document, you must remove it
from the index and then re-add it.

If you want to delete your entire index, just remove all files in your
index directory.

hope that helps,
jt
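
The remove-then-re-add pattern described above can be sketched with a toy append-only index in plain Java. This is not the Lucene API; the class, the "id" key and the method names are hypothetical and only model the behaviour that adding a document never replaces an earlier copy.

```java
import java.util.*;

// Toy append-only index, not Lucene API: add() never replaces an existing entry,
// which mirrors the behaviour described above.
class ReindexSketch {
    // Each entry is {id, text}. Both names are hypothetical.
    private final List<String[]> docs = new ArrayList<>();

    void add(String id, String text) {
        docs.add(new String[] {id, text});
    }

    // Removes every entry with the given id and returns how many were removed.
    int delete(String id) {
        int before = docs.size();
        docs.removeIf(d -> d[0].equals(id));
        return before - docs.size();
    }

    // Update = delete the old version, then add the new one.
    void update(String id, String text) {
        delete(id);
        add(id, text);
    }

    // Number of copies of the document currently in the index.
    int count(String id) {
        int n = 0;
        for (String[] d : docs) {
            if (d[0].equals(id)) n++;
        }
        return n;
    }

    public static void main(String[] args) {
        ReindexSketch index = new ReindexSketch();
        index.add("doc1", "first version");
        index.add("doc1", "second version");    // naive re-index: two copies now
        System.out.println(index.count("doc1"));
        index.update("doc1", "third version");  // delete + add: one copy
        System.out.println(index.count("doc1"));
    }
}
```

The naive re-index leaves two copies and the update leaves one. In Lucene itself this implies storing some unique key field with each document so that the old copy can be found and deleted before the new one is added.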




FileNotFoundException when trying to indexing.

2004-06-03 Thread Prasad Ganguri
I am using Lucene for building our document management system. I tested it on
Windows 2000 Professional and it ran successfully.

Recently, when we ported the code to Windows XP Professional, we started
getting the following exception. I tried to create the segments folder from my
code, but that throws an Access denied error.

Could someone help me find what is wrong with my code?

java.io.FileNotFoundException: C:\cms\index\segments (The system cannot find the file specified)
    at java.io.RandomAccessFile.open(Native Method)
    at java.io.RandomAccessFile.<init>(RandomAccessFile.java:204)
    at org.apache.lucene.store.FSInputStream$Descriptor.<init>(Unknown Source)
    at org.apache.lucene.store.FSInputStream.<init>(Unknown Source)
    at org.apache.lucene.store.FSDirectory.openFile(Unknown Source)
    at org.apache.lucene.index.SegmentInfos.read(Unknown Source)
    at org.apache.lucene.index.IndexWriter$1.doBody(Unknown Source)
    at org.apache.lucene.store.Lock$With.run(Unknown Source)
    at org.apache.lucene.index.IndexWriter.<init>(Unknown Source)
    at org.apache.lucene.index.IndexWriter.<init>(Unknown Source)
    at com.ganguri.cms.contentmanagement.index.FileIndexer.index(FileIndexer.java:62)
    at com.ganguri.cms.contentmanagement.filemanager.Document.moveFileToRepository(Document.java:215)
    at jsp_servlet._content._indexcardprocess._jspService(_indexcardprocess.java:193)
    at com.ganguri.cms.jsp.CMSJSPPage.service(CMSJSPPage.java:20)
    at weblogic.servlet.internal.ServletStubImpl.invokeServlet(ServletStubImpl.java:105)
    at weblogic.servlet.internal.ServletStubImpl.invokeServlet(ServletStubImpl.java:123)
    at weblogic.servlet.internal.ServletContextImpl.invokeServlet(ServletContextImpl.java:742)
    at weblogic.servlet.internal.ServletContextImpl.invokeServlet(ServletContextImpl.java:686)
    at weblogic.servlet.internal.ServletContextManager.invokeServlet(ServletContextManager.java:247)
    at weblogic.socket.MuxableSocketHTTP.invokeServlet(MuxableSocketHTTP.java:361)
    at weblogic.socket.MuxableSocketHTTP.execute(MuxableSocketHTTP.java:261)
    at weblogic.kernel.ExecuteThread.run(ExecuteThread.java:120)

The corresponding code is as follows:

public static void index(File indexDir, File dataDir, boolean isNew) throws Exception
{
    if (!dataDir.exists())
        throw new IOException(dataDir.getName() + " does not exist.");
    System.out.println(" indexDir existing.? " + indexDir.exists());
    IndexWriter writer = null;
    if (!indexDir.exists())
    {
        indexDir.mkdirs();
    }
    try
    {
        // Here the exception is thrown
        writer = new IndexWriter(indexDir, getAnalyzer(), isNew);
        if (dataDir.isFile())
            indexFile(writer, dataDir);
        else if (dataDir.isDirectory())
            indexDirectory(writer, dataDir);
        else
            return;
        writer.optimize();
    }
    catch (Exception e)
    {
        e.printStackTrace();
    }
    finally
    {
        // Close exactly once, whether or not indexing succeeded.
        if (writer != null)
            writer.close();
    }
}

Thanks in advance..


Prasad


disableLuceneLocks system property

2004-06-03 Thread Supun Edirisinghe
Why is disableLuceneLocks not in the list at
http://jakarta.apache.org/lucene/docs/systemproperties.html ?

Is it no longer advisable to use? Is it still valid?
Will it be supported in version 1.4?
How is it used: java ... -DdisableLuceneLocks ... or java ...
-DdisableLuceneLocks=true ?

thanks
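
On the usage question, the difference between the two command lines can be checked with the JDK alone. The sketch below assumes (I have not verified this against the Lucene source) that Lucene reads the flag through java.lang.Boolean.getBoolean; the class and method names are hypothetical. Boolean.getBoolean returns true only when the property value equals "true", ignoring case, so a property set with an empty value is treated as false.

```java
// Shows how java.lang.Boolean.getBoolean treats a property set with and without "=true".
// Assumption: Lucene reads disableLuceneLocks through Boolean.getBoolean.
class LockFlagSketch {

    // Simulates "-D<name>" (empty value) or "-D<name>=<value>" and reports what
    // Boolean.getBoolean sees. The property is cleared again afterwards.
    static boolean flagSeen(String name, String value) {
        System.setProperty(name, value);
        try {
            return Boolean.getBoolean(name);
        } finally {
            System.clearProperty(name);
        }
    }

    public static void main(String[] args) {
        System.out.println(flagSeen("disableLuceneLocks", ""));     // like -DdisableLuceneLocks
        System.out.println(flagSeen("disableLuceneLocks", "true")); // like -DdisableLuceneLocks=true
    }
}
```

This prints false and then true. If the assumption holds, only the -DdisableLuceneLocks=true form would actually switch the locks off.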


Re: FileNotFoundException when trying to indexing.

2004-06-03 Thread Terry Steichen
Prasad,

I think you'll have to provide more code so we can see what's actually going
on.  BTW, I don't see you calling the UseCompoundFile method (unless you do
it inside indexFile/Directory) - I wonder if that could be an issue?

Regards,

Terry

PS: I run on XP/Pro just fine, so there's nothing intrinsically wrong with
the platform.

- Original Message - 
From: Prasad Ganguri [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: Thursday, June 03, 2004 12:59 PM
Subject: FileNotFoundException when trying to indexing.


> java.io.FileNotFoundException: C:\cms\index\segments (The system cannot find
> the file specified)
> [...]
> writer = new IndexWriter(indexDir, getAnalyzer(), isNew);  // Here the
> exception is thrown



Re: problems with lucene in multithreaded environment

2004-06-03 Thread Supun Edirisinghe
I noticed delays when concurrent threads query an IndexSearcher too.
Our index is about 550MB with about 850,000 docs, each doc with 20-30
fields of which only 3 are indexed. Our queries are not very complex --
just 3 required term queries.

This is what my test did:
    initialize an array of terms that are known to appear in the indexed
    Keyword fields
    initialize an IndexSearcher
    start a number of threads; each thread
        picks random terms from the array and builds a boolean query
        queries the IndexSearcher and extracts all 20-30 fields from the
        first 10 hits
        waits .5 seconds
        does this 30 times

Typical queries returned 20 - 100 hits.

With just one thread: 30 queries ran over a span of about 20 seconds.
Search time for each query generally took 40ms to 75ms. The longest
search time was 445ms, but searches that took more than 100ms were rare.

With 5 threads: 150 queries ran over a span of 62 seconds. Search time
for each query for the most part increased to 120ms to 300ms. Big
delays were more prevalent and took 3 or 4 seconds.

With 10 or more threads things got bad, and I didn't run enough tests,
but most searches took 1 to 2 seconds and some searches did take 20 to
30 seconds.

When I ran the test with 5 concurrent threads each doing one query,
search times were around 100ms to 200ms with a max of 700ms.

I have not looked into the Lucene code much, and I didn't think queries
were queued.

I ran my test with -DdisableLuceneLocks on the command line, but I
wasn't sure it did anything.

I ran the test on Lucene 1.3 final on my PowerBook G4, and the tests ran
with a lot of other processes going on.

I was interested in this discussion because I could not figure out the
delay if queries are run in parallel.

On Jun 2, 2004, at 9:32 PM, Doug Cutting wrote:
> Jayant Kumar wrote:
>> We recently tested lucene with an index size of 2 GB
>> which has about 1,500,000 documents, each document
>> having about 25 fields. The frequency of search was
>> about 20 queries per second. This resulted in an
>> average response time of about 20 seconds approx
>> per search.
>
> That sounds slow, unless your queries are very complex.  What are your
> queries like?
>
>> What we observed was that lucene queues
>> the queries and does not release them until the
>> results are found. so the queries that have come in
>> later take up about 500 seconds. Please let us know
>> whether there is a technique to optimize lucene in
>> such circumstances.
>
> Multiple queries executed from different threads using a single
> searcher should not queue, but should run in parallel.  A technique to
> find out where threads are queueing is to get a thread dump and see
> where all of the threads are stuck.  In Solaris and Linux, sending the
> JVM a SIGQUIT will give a thread dump.  On Windows, use Control-Break.
>
> Doug


Writing a stemmer

2004-06-03 Thread Musku, Anil (LA)

Hi,

Can anyone provide some help on writing a stemmer for non-English languages?
How proficient must I be in a language for which I wish to write the stemmer?

Regards,
Anil




Re: FileNotFoundException when trying to indexing.

2004-06-03 Thread Prasad Ganguri
Hi Terry,

Thanks for your reply.

I identified the problem. I am creating a new index, but passing a parameter
that says the index already exists. So Lucene looks for the segments file
(assuming that the index exists).

You rightly pointed out that I am not calling the UseCompoundFile method. I
will incorporate this call.

regards
Prasad
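
One way to avoid passing the wrong flag is to derive it from the directory itself. The sketch below is plain java.io with no Lucene calls; the class and method names are hypothetical. It relies only on the detail visible in the stack trace, namely that opening an existing index looks for a file called "segments" in the index directory. Lucene's IndexReader may offer an indexExists check that does the same job, but I have not verified that for this version.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

// Plain java.io sketch, no Lucene calls: decides the value to pass as the
// "create" argument by checking for the "segments" file named in the stack trace.
class CreateFlagSketch {

    // True when the directory holds no index yet, so a new one must be created.
    static boolean shouldCreate(File indexDir) {
        return !new File(indexDir, "segments").exists();
    }

    public static void main(String[] args) throws IOException {
        File dir = Files.createTempDirectory("index-sketch").toFile();
        System.out.println(shouldCreate(dir));      // empty directory
        new File(dir, "segments").createNewFile();  // stand-in for a real index
        System.out.println(shouldCreate(dir));
    }
}
```

The result of shouldCreate(indexDir) could then be passed where the posted code passes isNew, so that a missing index is created rather than opened.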

- Original Message - 
From: Terry Steichen [EMAIL PROTECTED]
To: Lucene Users List [EMAIL PROTECTED]
Sent: Thursday, June 03, 2004 2:58 PM
Subject: Re: FileNotFoundException when trying to indexing.


> I think you'll have to provide more code so we can see what's actually
> going on.  BTW, I don't see you calling the UseCompoundFile method (unless
> you do it inside indexFile/Directory) - I wonder if that could be an issue?



Re: Writing a stemmer

2004-06-03 Thread Grant Ingersoll
Anil,

I suppose it depends on how complex the language is and what is acceptable for your
program.  I have written a couple of stemmers that are fairly straightforward, based on
papers that I have read, and they work well for the languages we are using.  Your best
bet is probably to do a literature search for the languages you are interested in and go
from there.

I am, of course, assuming stemmers for your languages don't already exist.  If your
languages are common, there probably is a stemmer available in some form that you can
use or adapt. You'd be surprised at what you get by doing a simple Google search for
"lang X stemmer" (without the quotes), where lang X is the language you are interested in.

Hooking them into Lucene is straightforward and there are several examples of this
available in the docs and code.

-Grant




Re: Writing a stemmer

2004-06-03 Thread Erik Hatcher
On Jun 3, 2004, at 4:09 PM, Musku, Anil (LA) wrote:
> Can anyone provide some help on writing a stemmer for non-english
> languages?

Have a look at the snowball project in the Lucene sandbox.  If it's for
non-European languages, I suspect it's quite complex.  It's
highly language dependent.

> How proficient must I be in a language for which I wish to write the
> stemmer?

I would venture to say you would need to be an expert in a language to
write a decent stemmer.  The SnowballAnalyzer is quite hairy
underneath, that's for sure.

Erik


bonus for exact case match

2004-06-03 Thread David Spencer
Does anyone have any experiences with giving a bonus for exactly 
matching case in queries?

One use case: in the Java world I may want to see references to
"Map" (java.util.Map) but not be interested in a (geographical) "map".

I believe, in the context of Lucene, one way is to have an Analyzer that
returns a TokenStream which, in cases where a word has some upper case
characters, returns the word twice in that position, once as-is and once
in lower case, using the magic of Token.getPositionIncrement(). Then
you'll need a query expander or whatnot which, when given a query like
"Map", expands it to "Map^2 map".

Thoughts/comments?
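
The two pieces of the idea above can be sketched in plain Java without any Lucene classes. The class and method names are hypothetical, whitespace splitting stands in for a real tokenizer, and a position increment of 0 is used to mean "same position as the previous token".

```java
import java.util.*;

// Sketch of the two pieces described above, without Lucene classes.
class CaseBonusSketch {

    // Index side: emits each word as {term, positionIncrement}. A word with upper-case
    // characters is followed by its lower-cased form with increment 0 (same position).
    static List<String[]> analyze(String text) {
        List<String[]> tokens = new ArrayList<>();
        for (String word : text.trim().split("\\s+")) {
            if (word.isEmpty()) continue;
            tokens.add(new String[] {word, "1"});
            String lower = word.toLowerCase(Locale.ROOT);
            if (!lower.equals(word)) {
                tokens.add(new String[] {lower, "0"});
            }
        }
        return tokens;
    }

    // Query side: "Map" becomes "Map^2 map"; words already in lower case are left alone.
    static String expand(String query) {
        StringBuilder out = new StringBuilder();
        for (String word : query.trim().split("\\s+")) {
            if (word.isEmpty()) continue;
            if (out.length() > 0) out.append(' ');
            String lower = word.toLowerCase(Locale.ROOT);
            out.append(lower.equals(word) ? word : word + "^2 " + lower);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        for (String[] t : analyze("a Map of the map")) {
            System.out.println(t[0] + " +" + t[1]);
        }
        System.out.println(expand("Map"));
    }
}
```

With this scheme a lower-case query still matches both spellings through the lower-cased token, while a capitalized query term scores higher on documents that contain the exact-case form.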


Re: Writing a stemmer

2004-06-03 Thread Leo Galambos
Erik Hatcher [EMAIL PROTECTED] wrote:
>> How proficient must I be in a language for which I wish to write the
>> stemmer?
> I would venture to say you would need to be an expert in a language to
> write a decent stemmer.

I'm sorry for the self-promo ;), but
the stemmer of the egothor project can be
adapted to any language, and you needn't be
a language expert. Moreover, the stemmer
achieves a better F-measure than Porter's stemmers.

Cheers,
Leo






Re: problems with lucene in multithreaded environment

2004-06-03 Thread Jayant Kumar
We conducted a test on our search for 500 requests
given in 27 seconds. We noticed that in the first 5
seconds, the results were coming in 100 to 500 ms. But
as the queue size kept increasing, the response time
of the search increased drastically to approx 80-100
seconds. 

Please find enclosed jvmdump.txt which contains a dump
of our search program after about 20 seconds of
starting the program.

Also enclosed is the file queries.txt which contains
few sample search queries.

Please note that this is done on a sample of 400,000
documents (450MB) on P4 having 1GB RAM.

Kindly let us know if this helps to identify the cause
of slow response.

Jayant
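
A dump like the enclosed one can be summarized mechanically to see which monitor the threads are queueing on. The sketch below is pure text processing in plain Java; the class name is hypothetical, and it assumes the dump uses the "waiting to lock ... (a ClassName)" wording seen in this thread, with or without angle brackets around the address.

```java
import java.util.*;
import java.util.regex.*;

// Counts, per monitor, how many threads in a thread dump are blocked on
// "waiting to lock". The dump format is assumed to match the excerpt in this thread.
class DumpSummarySketch {
    private static final Pattern WAITING =
        Pattern.compile("waiting to lock <?(0x[0-9a-fA-F]+)>? \\(a ([\\w.$]+)\\)");

    // Maps "address className" to the number of threads waiting for that monitor.
    static Map<String, Integer> blockedPerLock(String dump) {
        Map<String, Integer> counts = new TreeMap<>();
        Matcher m = WAITING.matcher(dump);
        while (m.find()) {
            counts.merge(m.group(1) + " " + m.group(2), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        String dump =
            "- waiting to lock 0x44c95228 (a org.apache.lucene.index.TermInfosReader)\n"
          + "- waiting to lock 0x44c95228 (a org.apache.lucene.index.TermInfosReader)\n";
        System.out.println(blockedPerLock(dump));
    }
}
```

If most threads report the same address, as the two visible here do for the TermInfosReader monitor, that lock is the likely point of contention.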

"Thread-14" prio=1 tid=0x080a7420 nid=0x468e waiting for monitor entry [4d61a000..4d61ac18]
    at org.apache.lucene.index.TermInfosReader.get(TermInfosReader.java:112)
    - waiting to lock <0x44c95228> (a org.apache.lucene.index.TermInfosReader)
    at org.apache.lucene.index.SegmentTermDocs.seek(SegmentTermDocs.java:51)
    at org.apache.lucene.index.IndexReader.termDocs(IndexReader.java:364)
    at org.apache.lucene.search.TermQuery$TermWeight.scorer(TermQuery.java:59)
    at org.apache.lucene.search.BooleanQuery$BooleanWeight.scorer(BooleanQuery.java:164)
    at org.apache.lucene.search.BooleanQuery$BooleanWeight.scorer(BooleanQuery.java:164)
    at org.apache.lucene.search.BooleanQuery$BooleanWeight.scorer(BooleanQuery.java:164)
    at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:85)
    at org.apache.lucene.search.Hits.getMoreDocs(Hits.java:64)
    at org.apache.lucene.search.Hits.<init>(Hits.java:43)
    at org.apache.lucene.search.Searcher.search(Searcher.java:33)
    at org.apache.lucene.search.Searcher.search(Searcher.java:27)
    at resdex.searchinc.getHits(searchinc.java:752)
    at resdex.searchinc.Search(searchinc.java:943)
    at resdex.searchinctest.conductTestSearch(searchinctest.java:99)
    at resdex.Server$Handler.run(Server.java:64)
    at java.lang.Thread.run(Thread.java:534)

"Thread-12" prio=1 tid=0x080a58e0 nid=0x468e waiting for monitor entry [4d51a000..4d51ad18]
    at org.apache.lucene.index.TermInfosReader.get(TermInfosReader.java:112)
    - waiting to lock <0x44c95228> (a org.apache.lucene.index.TermInfosReader)
    at org.apache.lucene.index.SegmentTermDocs.seek(SegmentTermDocs.java:51)
    at org.apache.lucene.index.IndexReader.termDocs(IndexReader.java:364)
    at org.apache.lucene.search.TermQuery$TermWeight.scorer(TermQuery.java:59)
    at org.apache.lucene.search.BooleanQuery$BooleanWeight.scorer(BooleanQuery.java:164)
    at org.apache.lucene.search.BooleanQuery$BooleanWeight.scorer(BooleanQuery.java:164)
    at org.apache.lucene.search.BooleanQuery$BooleanWeight.scorer(BooleanQuery.java:164)
    at org.apache.lucene.search.BooleanQuery$BooleanWeight.scorer(BooleanQuery.java:164)
    at org.apache.lucene.search.BooleanQuery$BooleanWeight.scorer(BooleanQuery.java:164)
    at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:85)
    at org.apache.lucene.search.Hits.getMoreDocs(Hits.java:64)
    at org.apache.lucene.search.Hits.<init>(Hits.java:43)
    at org.apache.lucene.search.Searcher.search(Searcher.java:33)
    at org.apache.lucene.search.Searcher.search(Searcher.java:27)
    at resdex.searchinc.getHits(searchinc.java:752)
    at resdex.searchinc.Search(searchinc.java:943)
    at resdex.searchinctest.conductTestSearch(searchinctest.java:99)
at