Performance issue when using multiple PhraseQueries against a 1+ million entries index

2014-05-19 Thread Liviu Matei
Hi,

To get a somewhat smarter search that also takes the context into
consideration, I decided to use PhraseQuery. I create ~100 phrase queries
from the input text, combine them with a boolean query into one query, and
issue a search against the index.
When the index is big (1+ million entries with a lot of content) I run
into performance problems: response time is ~30 seconds, which is not
acceptable. Can you please tell me if there is a way to tune the
PhraseQueries? Or is there another way to improve performance besides
reducing the number of queries? I've read a little about N-Gram queries but
am not sure whether they are suitable in this scenario.

Thanks and regards,
Liviu


Re: Performance issue when using multiple PhraseQueries against a 1+ million entries index

2014-05-19 Thread Jack Krupansky
Does your index fit fully in system memory - the OS file cache? If not, 
there could be a lot of thrashing (I/O) as Lucene accesses the index.


-- Jack Krupansky
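
A rough way to check the question above is to compare the on-disk size of the index directory with the RAM left over for the OS file cache once the JVM heap is subtracted. The sketch below uses only the Java standard library. The class name, the helper names, and the 16 GiB RAM figure are assumptions for illustration and do not come from this thread; the estimate also ignores memory used by other processes, so treat it as an upper bound.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

// Hypothetical helper, not part of Lucene.
class IndexCacheCheck {
    static final long GIB = 1024L * 1024 * 1024;

    // Sums the sizes of all regular files under the index directory.
    static long directorySize(Path dir) throws IOException {
        try (Stream<Path> files = Files.walk(dir)) {
            return files.filter(Files::isRegularFile).mapToLong(p -> {
                try {
                    return Files.size(p);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            }).sum();
        }
    }

    // Optimistic estimate: RAM not reserved for the heap is assumed
    // to be available to the OS file cache.
    static boolean fitsInPageCache(long indexBytes, long totalRamBytes, long heapBytes) {
        long availableForCache = totalRamBytes - heapBytes;
        return indexBytes <= availableForCache;
    }

    public static void main(String[] args) throws IOException {
        // Pass the index directory as the first argument (defaults to ".").
        Path indexDir = Paths.get(args.length > 0 ? args[0] : ".");
        long indexBytes = directorySize(indexDir);
        long heapBytes = Runtime.getRuntime().maxMemory(); // roughly the -Xmx value
        long totalRamBytes = 16 * GIB; // ASSUMPTION: replace with the machine's real RAM

        System.out.println("index bytes: " + indexBytes);
        System.out.println("heap bytes:  " + heapBytes);
        System.out.println("fits in OS file cache (estimate): "
                + fitsInPageCache(indexBytes, totalRamBytes, heapBytes));
    }
}
```

If the estimate says the index does not fit, queries that touch many postings and positions are likely to wait on disk I/O.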

-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org



Re: Performance issue when using multiple PhraseQueries against a 1+ million entries index

2014-05-19 Thread Liviu Matei
Thanks for the reply.
When you mention system memory, are you referring to RAM (or to the heap,
as this is running as a Java process)?
The index size is around 13G and the Java process is not given that much
memory (in terms of -Xmx).
Could this be the cause? My understanding from reading some articles on
the internet is that, when using MMapDirectory (as I do), it is not good
to allocate all the RAM to the Java process.

Thanks,
Liviu
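
The heap-versus-RAM distinction above can be illustrated with plain Java memory mapping, the mechanism that MMapDirectory builds on. The sketch below maps a small temporary file and checks that the resulting buffer is a direct (off-heap) buffer. Its contents are paged in by the OS file cache and are not governed by -Xmx. The class and method names are hypothetical, and the temporary file is only a stand-in for an index file.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical demo class, not part of Lucene.
class MmapSketch {
    // Maps the whole file read-only and reports whether the buffer
    // lives outside the Java heap.
    static boolean isMappedOutsideHeap(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            return buf.isDirect();
        }
    }

    // Creates a small temporary file (a stand-in for an index file) and maps it.
    static boolean mappedTempFileIsDirect() {
        try {
            Path tmp = Files.createTempFile("mmap-sketch", ".bin");
            tmp.toFile().deleteOnExit();
            Files.write(tmp, "stand-in for an index file".getBytes(StandardCharsets.UTF_8));
            return isMappedOutsideHeap(tmp);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("mapped buffer is off-heap: " + mappedTempFileIsDirect());
    }
}
```

Because mapped index data is served from RAM outside the heap, a larger -Xmx leaves less memory for the OS to cache the index files.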



