Lucene IndexSearcher PrefixQuery search getting really slow after a while

2016-11-03 Thread Jason Wu
Hi Team,

We have been using Lucene 4.8.1 for daily info searches for years.
Recently, however, we have run into performance issues that greatly slow
down the Lucene search.

After the application has been running for a while, IndexSearcher searches
with a PrefixQuery take much longer, as shown below:

[image: image002.png]

Our CPU and memory usage look fine, and we found no leak:
[image: image004.jpg]


However, the exact same Java instance running on another box performs the
same searches very quickly.

I/O, memory, and CPU all look fine on both boxes.

Do you know of anything that could cause this performance issue?

Thank you,
J.W





Lucene Indexing performance issue

2014-10-22 Thread Jason Wu
Hi Team,

I am a new user of Lucene 4.8.1. I have run into an indexing performance
issue that slows down my application greatly. I tried several approaches
found through Google searches but still couldn't resolve it. Any suggestions
from the experts here would help me a lot.

One of my applications uses a Lucene index for fast data searching. When I
start the application, it indexes all the necessary data from the database,
which produces about 88 MB of index data. In this case, indexing takes less
than 4 minutes.

A shell script runs every night and sends a JMX call to my application to
re-index all the data. The re-indexing method clears the current index
directory, reads the data from the database, and rebuilds the index from
scratch. Everything works fine at the beginning: indexing takes a little
more than 3 minutes. But after my application has been running for a while
(a day or two), re-indexing slows down greatly and now takes more than 22
minutes.

Here is the procedure of my Lucene indexing and re-indexing:

   1. If index data exists inside the index directory, remove all the index
   data.
   2. Create an IndexWriter with 200MB RAMBUFFERSIZE, (6.6) MaxMergesAndThreads
   3. Process the DB result set
      - When I loop over the result set, I reuse the same Document instance.
      - At the end of each iteration, I call indexWriter.addDocument(doc)
   4. IndexWriter.commit()
   5. IndexWriter.close();
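
Step 1 above could be sketched roughly as follows, using only the JDK. The
class and method names (IndexDirCleaner, clear) are hypothetical and not part
of Lucene or of the original application; the sketch assumes the index
directory is flat and that no reader or writer still has the files open.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper: empties a flat index directory before a full rebuild. */
final class IndexDirCleaner {

    /**
     * Deletes every regular file directly inside indexDir and returns how
     * many files were removed. Subdirectories are left untouched.
     */
    static int clear(Path indexDir) throws IOException {
        if (!Files.isDirectory(indexDir)) {
            return 0; // nothing to clear, e.g. on the very first start
        }
        int deleted = 0;
        try (DirectoryStream<Path> files = Files.newDirectoryStream(indexDir)) {
            for (Path file : files) {
                if (Files.isRegularFile(file)) {
                    Files.delete(file);
                    deleted++;
                }
            }
        }
        return deleted;
    }
}
```

Lucene's IndexWriterConfig also has an OpenMode.CREATE setting that overwrites
an existing index, which may make a manual step like this unnecessary.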


I profiled the application when it was slow and found that the
indexWriter.addDocument method took most of the time. I then added some
logging code, as below:

long start = System.currentTimeMillis();
indexWriter.addDocument(doc);
totalAddDocTime += (System.currentTimeMillis() - start);

After several tests, when indexing is slow, the total time taken by
indexWriter.addDocument(doc) is about 20 minutes.
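
A single running total does not show whether every addDocument call is slow
or only some of them are. One possible refinement, sketched here with a
hypothetical BatchTimer class (not part of Lucene or the original code), keeps
a subtotal per batch of calls so that periodic spikes become visible. The
batch size is an arbitrary choice.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: accumulates call time and keeps a subtotal per batch of calls. */
final class BatchTimer {
    private final int batchSize;
    private final List<Long> batchNanos = new ArrayList<Long>();
    private long currentBatchNanos;
    private int callsInBatch;
    private long totalNanos;

    BatchTimer(int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.batchSize = batchSize;
    }

    /** Runs the task and records how long it took. */
    void time(Runnable task) {
        long start = System.nanoTime();
        try {
            task.run();
        } finally {
            record(System.nanoTime() - start);
        }
    }

    /** Records one already-measured duration, e.g. around indexWriter.addDocument(doc). */
    void record(long elapsedNanos) {
        totalNanos += elapsedNanos;
        currentBatchNanos += elapsedNanos;
        if (++callsInBatch == batchSize) {
            batchNanos.add(currentBatchNanos); // close the batch and start a new one
            currentBatchNanos = 0;
            callsInBatch = 0;
        }
    }

    long totalNanos() {
        return totalNanos;
    }

    /** Subtotals of all completed batches, in call order. */
    List<Long> completedBatchNanos() {
        return batchNanos;
    }
}
```

With something like new BatchTimer(10000), the list of subtotals can be logged
at the end of a run and compared between a fast run and a slow one.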

During indexing, I also observed CPU usage sometimes going above 100%.

My application is assigned 6 GB of memory. During indexing, all other
processing modules are suspended, waiting for indexing to finish, and I
don't see any memory leak in my application.
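
One way to check whether garbage collection plays a part in the slow runs is
to compare GC counters before and after a re-index. The sketch below uses the
standard java.lang.management API; the GcSnapshot class itself is hypothetical
and only an illustration, not something taken from the application.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

/** Hypothetical helper: captures JVM-wide GC counters and used heap at one point in time. */
final class GcSnapshot {
    final long collections;      // total GC runs reported so far
    final long collectionMillis; // approximate accumulated GC time in milliseconds
    final long usedHeapBytes;    // heap in use when the snapshot was taken

    private GcSnapshot(long collections, long collectionMillis, long usedHeapBytes) {
        this.collections = collections;
        this.collectionMillis = collectionMillis;
        this.usedHeapBytes = usedHeapBytes;
    }

    static GcSnapshot take() {
        long count = 0;
        long millis = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Both getters return -1 when the collector does not report the value.
            count += Math.max(0, gc.getCollectionCount());
            millis += Math.max(0, gc.getCollectionTime());
        }
        Runtime rt = Runtime.getRuntime();
        return new GcSnapshot(count, millis, rt.totalMemory() - rt.freeMemory());
    }

    /** Describes what changed between an earlier snapshot and this one. */
    String deltaSince(GcSnapshot earlier) {
        return String.format("GC runs: %d, GC time: %d ms, used heap now: %d MB",
                collections - earlier.collections,
                collectionMillis - earlier.collectionMillis,
                usedHeapBytes / (1024 * 1024));
    }
}
```

Taking one snapshot before the re-index starts and one after it finishes, then
logging after.deltaSince(before), gives numbers that can be compared between a
3-minute run and a 22-minute run.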

Can you give me some suggestions about my issue?

Thank you,

Jason


Re: Making lucene indexing multi threaded

2014-10-27 Thread Jason Wu
Hi Nischal,

I had a similar indexing issue. My Lucene indexing took 22 minutes for 70 MB
of docs. When I debugged the problem, I found that
indexWriter.addDocument(doc) was taking a really long time.

Have you already found a solution for it?

Thank you,
Jason



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Making-lucene-indexing-multi-threaded-tp4087830p4166094.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org



RE: Making lucene indexing multi threaded

2014-10-27 Thread Jason Wu
Hi Fuad,

Thanks for your suggestions and quick response. I am adding docs with
single-threaded indexing. I will try multi-threaded indexing to see if it
resolves my issue.
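
A multi-threaded feed could look roughly like the sketch below. ParallelFeeder
and RowSink are hypothetical names used only for illustration; in the indexing
case the sink would build a document from the row and pass it to
indexWriter.addDocument. IndexWriter is documented as thread-safe, but a
single reused Document instance should not be shared between threads, so each
call (or each thread) would need its own.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical helper: spreads rows over a fixed pool of worker threads. */
final class ParallelFeeder {

    /** Hypothetical callback that consumes one row, e.g. by indexing it. */
    interface RowSink<T> {
        void accept(T row) throws Exception;
    }

    /** Feeds every row to the sink using the given number of threads; returns rows handled. */
    static <T> int feed(final List<T> rows, final int threads, final RowSink<T> sink)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Integer>> results = new ArrayList<Future<Integer>>();
            for (int t = 0; t < threads; t++) {
                // Worker t handles rows t, t + threads, t + 2 * threads, ...
                final int offset = t;
                results.add(pool.submit(new Callable<Integer>() {
                    @Override
                    public Integer call() throws Exception {
                        int handled = 0;
                        for (int i = offset; i < rows.size(); i += threads) {
                            sink.accept(rows.get(i));
                            handled++;
                        }
                        return handled;
                    }
                }));
            }
            int total = 0;
            for (Future<Integer> result : results) {
                total += result.get(); // a worker failure surfaces here as ExecutionException
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }
}
```

This assumes the rows are already in memory. When streaming from a database
result set, a queue between the reader and the workers would be the more
natural shape.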

This issue has only existed since I upgraded Lucene from 2.4.1 (with Java
1.6) to 4.8.1 (with Java 1.7). I didn't have this problem with the old
Lucene version.

Indexing is fast when I start the application, taking only 3 minutes. But
after my application has been running for a while (a day or so), once I send
a JMX call to my application to reindex the docs, indexing slows down and
takes 22 minutes.

Have you experienced anything similar before?

Thank you,
Jason






Re: Making lucene indexing multi threaded

2014-10-27 Thread Jason Wu
Hi Gary,

Thanks for your response. I only call commit after all my docs have been
added.

Here is the procedure of my Lucene indexing and re-indexing:

   1. If index data exists inside the index directory, remove all the index
   data.
   2. Create an IndexWriter with 256MB RAMBUFFERSIZE
   3. Process the DB result set
      - When I loop over the result set, I reuse the same Document instance.
      - At the end of each iteration, I call indexWriter.addDocument(doc)
   4. After all docs are added, call IndexWriter.commit()
   5. IndexWriter.close();

Thank you,
Jason


