RE: Performance Tuning

2007-01-30 Thread Fuad Efendi
After 8 hours:

552,955 documents -> 554,827
(average adds: about 3,000/hour, but many of them are updates of existing
records, hence the total grew by only 1,872)

Memory: 3.5Gb of 11.7Gb RAM, 0 swap.
Index files: 62 items, 200Mb; later, 197 files, 223Mb.


After stopping and issuing commit explicitly (it took maybe 20 seconds):
17 files, 207.1 Mb



I changed memory limits, restored the default solrconfig.xml, and I am trying
to execute it again:
4Gb - database
1Gb - Tomcat+Solr
512Mb - Tomcat+Cocoon
2Gb - Standalone Java Solr-Client
unknown - Apache Httpd (default settings; probably 1Gb total)



Re: Performance Tuning

2007-01-29 Thread Mike Klaas

Hi Fuad,

The point at which the thread is blocking is the only synchronization
on tracker, so another thread must be in the block.  Is it possible
that another thread is stalled during commit?  Do you have autowarming
enabled, and are these queries processor-intensive?

If it is in fact a deadlock situation, could you provide a full thread
dump (kill -QUIT pid)?

Thanks,
-Mike

On 1/29/07, Fuad Efendi [EMAIL PROTECTED] wrote:

I am doing some performance analysis.
There are currently more than 550,000 documents in SOLR, and Tokenizer
(web-crawler, http://www.tokenizer.org) adds about 2,000 new documents each
hour. I was forced to stop the crawler, but even after 20 minutes SOLR uses
about 60% CPU (two Opteron 252 processors, SLES 10).
I have autocommit set to 1000 docs, and the merge setting at its default of
1000. I didn't issue optimize yet, and I have 738 files in the /solr/data/index
folder. Usually optimize does not help (after I reached 400,000 docs).
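
For reference, the relevant pieces of solrconfig.xml look roughly like this.
This is a sketch based on the stock example config, so treat the element names
and values as an approximation of my setup rather than an exact copy:

<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <!-- commit automatically after this many added docs -->
    <maxDocs>1000</maxDocs>
  </autoCommit>
</updateHandler>

<indexDefaults>
  <!-- stock defaults: segment merge factor and the in-memory document buffer -->
  <mergeFactor>10</mergeFactor>
  <maxBufferedDocs>1000</maxBufferedDocs>
</indexDefaults>
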
I want to share some findings... Currently, the database size can easily reach
2 million docs (by adding some URLs from the USA); but I am forced to stop the
crawler.
Max number of open files is set to 65000 (SuSE 10 Enterprise Server).
8192 max open files didn't help - in fact, that number is enough, but the OS
has some kind of delay when it is overloaded: it shows 8000 open files when we
have only 3000 (it shows the correct number after some delay; it's not a truly
real-time number).

Solr runs in Tomcat with 4Gb: -Xms4096M -Xmx4096M

Ok. I issued a commit (via HTTP XML); it took maybe 10 seconds... I have 17
files now in the index folder, but SOLR still uses about 66% of both CPUs,
there are no incoming HTTP requests, and the robot has stopped crawling.
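
To be precise, "commit via HTTP XML" means POSTing Solr's standard XML update
message to the update servlet, along these lines:

<!-- POSTed to http://192.168.1.3:10080/solr/update -->
<commit/>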

The size of the SOLR index files on disk is 200Mb total, so I expect that 4Gb
for a dedicated Tomcat is more than enough.

I often see a message like this at the Admin screen:

No deadlock found.
Full Thread Dump:
http-10080-Processor50 Id=68 in BLOCKED on [EMAIL PROTECTED] total cpu time=72310.ms user time=71090.ms
 owned by http-10080-Processor9 Id=20
 at org.apache.solr.update.DirectUpdateHandler2.checkCommit(DirectUpdateHandler2.java:566)
 at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:271)
 at org.apache.solr.core.SolrCore.update(SolrCore.java:716)
 at org.apache.solr.servlet.SolrUpdateServlet.doPost(SolrUpdateServlet.java:53)
 at javax.servlet.http.HttpServlet.service(HttpServlet.java:709)
 at javax.servlet.http.HttpServlet.service(HttpServlet.java:802)
 at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:252)
 at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:173)
 at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:213)
 at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:178)
 at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:126)
 at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:105)
 at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:107)
 at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:148)
 at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:869)
 at org.apache.coyote.http11.Http11BaseProtocol$Http11ConnectionHandler.processConnection(Http11BaseProtocol.java:664)
 at org.apache.tomcat.util.net.PoolTcpEndpoint.processSocket(PoolTcpEndpoint.java:527)
 at org.apache.tomcat.util.net.LeaderFollowerWorkerThread.runIt(LeaderFollowerWorkerThread.java:80)
 at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:684)
 at java.lang.Thread.run(Thread.java:595)



No deadlock found, yet BLOCKED.

Usually after restart I have 3-5% CPU usage...

SOLR-1.1

Thanks,

Fuad

P.S.
It is still blocked:
http-10080-Processor50 Id=68 in BLOCKED on [EMAIL PROTECTED] total cpu time=72310.ms user time=71090.ms
Why does it show the same numbers after 5 minutes? I pressed F5 in Internet
Explorer, and I don't expect HTTP caching!

cpu time=72310.ms user time=71090.ms




RE: Performance Tuning

2007-01-29 Thread Fuad Efendi
Hi Mike,


Thanks for the response. I started the crawler again, trying to reproduce the
problem... Already 1 hour, no problems... Probably tomorrow evening I'll be
ready to execute kill -QUIT pid.

At the bottom of my message - from
http://192.168.1.3:10080/solr/admin/threaddump.jsp - was just a small part of
the full thread dump; sorry if that information was not enough...


I have a multithreaded SOLR client: 144 threads concurrently adding new
docs to SOLR. I have a thread-per-URL architecture, and each thread (robot)
sleeps about 5 seconds before its next execution. So SOLR should not be
overloaded, and there is no competition between threads (threads are
per-DNS-name; they never try to change the same doc concurrently).

Other threads come from the front end, a separate instance of Tomcat
(the web interface).

Sorry, I need to wait... the 'deadlock' might happen only after a few hours...

I can work around it by having a batch update/add job at night, but it is
very funny to see how the robot updates the item list in real time...

Thanks,
Fuad


-Original Message-
From: Mike Klaas [mailto:[EMAIL PROTECTED]]
Sent: Monday, January 29, 2007 8:01 PM
To: solr-dev@lucene.apache.org
Subject: Re: Performance Tuning


[snip: Mike's message and the quoted thread dump, unchanged from above]

Re: Performance Tuning

2007-01-29 Thread Mike Klaas

Hi Fuad,

Responses inline.

On 1/29/07, Fuad Efendi [EMAIL PROTECTED] wrote:


Thanks for the response. I started the crawler again, trying to reproduce the
problem... Already 1 hour, no problems... Probably tomorrow evening I'll be
ready to execute kill -QUIT pid.


Great.


I have a multithreaded SOLR client: 144 threads concurrently adding new
docs to SOLR. I have a thread-per-URL architecture, and each thread (robot)
sleeps about 5 seconds before its next execution. So SOLR should not be
overloaded, and there is no competition between threads (threads are
per-DNS-name; they never try to change the same doc concurrently).


I've never tried using so many threads simultaneously, but I can't
think of any theoretical reason why it wouldn't work.


I can work around it by having a batch update/add job at night, but it is
very funny to see how the robot updates the item list in real time...


I can't really parse that.

Let me know when you have a full thread dump. It would be good to
know if you do any autowarming on commit. Do you ever get "WARNING:
multiple on-deck searchers"?

-Mike


Re: Performance Tuning

2007-01-29 Thread Mike Klaas

On 1/29/07, Fuad Efendi [EMAIL PROTECTED] wrote:


Solr runs in Tomcat with 4Gb: -Xms4096M -Xmx4096M


How much physical RAM do you have on the machine? I'd suggest leaving
a healthy chunk free for the OS to cache the index.

-Mike


RE: Performance Tuning

2007-01-29 Thread Fuad Efendi
Hi Mike,


Looks like I need lower memory limits for the JVMs...

I have 12Gb RAM, 2 x Opteron 252, SuSE 10 ES, Java 5_09, Tomcat 5.5.20, SOLR
1.1

Of course, Java needs at least 20-40Mb per thread, but the SOLR client is
executed in a separate JVM.
4Gb for Tomcat
2Gb-4Gb for the client (144 threads)
6Gb for the database

Too much memory for a database and JVMs...


About the number of threads: I ran about 1024-2048 threads simultaneously a
while ago with 'The Grinder' load-stress simulator on Windows 2000 - no
problems (if RAM is enough)...


I'd suggest leaving a healthy chunk free for the OS to cache the index.

I thought that this cache lives inside JVM memory, inside the 4Gb allocated to
Tomcat... RAM is not enough: I have 9Gb of user memory and 11Gb of used
swap.


Currently I have about 100 threads (the rest of the threads have already
finished crawling and are sleeping for an hour); I have a setting of '7
seconds' before each subsequent transaction. On average, it gives about 15
transactions per second, and only a few of them are SOLR adds...



I am recalling... The problems began when I disabled programmatic commit and
started to rely on SOLR autocommit:

<autocommit>
  <maxDocs>1000</maxDocs>
</autocommit>

Before that, each thread executed add and delete, and a single static method
executed commit after every 100 or so total adds. Now I have disabled
real-time delete (going to make it a batch job), and I am relying on the SOLR
autocommit feature.
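
The adds and deletes themselves are the standard Solr XML update messages
POSTed to /solr/update; roughly like this (the field names here are just an
illustration, not my actual schema):

<add>
  <doc>
    <field name="id">http://example.com/some-item</field>
    <field name="title">Some item title</field>
  </doc>
</add>

<delete><id>http://example.com/some-item</id></delete>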

So far so good... Already 2 hours, same environment, no problems... Maybe
tomorrow... I set 2 hours of HTTP expiration headers, so we can't see
'real-time' SOLR updates (everything is cached by the front end)...

I need to check the Tomcat logs; 1.4Gb per day is too much... I can't even
open them...


Thanks,
Fuad


P.S.
I'll set lower memory limits, and rerun an application (tomorrow); looks
like I have huge swap file.

Now:
10.6 of 11.7Gb - user memory
9.6 of 16Gb - used swap
2 x CPU: 90-100%


P.P.S.
Right before sending this email, I tried to check the solr admin screen...
something happened... SOLR stopped itself! I can't even see the process id,
and the admin screen does not work, but the CPU is still at 95%...



Re: Performance Tuning

2007-01-29 Thread Yonik Seeley

On 1/29/07, Fuad Efendi [EMAIL PROTECTED] wrote:

Right before sending this email, I tried to check the solr admin screen...
something happened... SOLR stopped itself! I can't even see the process id,
and the admin screen does not work, but the CPU is still at 95%...


- I couldn't tell from your previous emails if this box was indexing
only, or indexing + searching.
- if it's also involved in searching, it could be hitting the
"multiple threads generating a fieldcache entry" issue. A thread dump
will help tell.
- there is a thread limit in most app servers; if all the threads are
busy doing something else, that could possibly block admin access too.

-Yonik


RE: Performance Tuning

2007-01-29 Thread Fuad Efendi
The 'thread limit in multiple app servers' is just a rumour from the old
prefork HTTPD days. You can't maintain KeepAlive for 1024 concurrent internet
users with the default 150 threads! On the Windows platform, each thread needs
20Mb for its execution stack, and that is the only limitation. With Apache
HTTPD, each multithreaded worker process needs about 20Mb of RAM. Each thread
listens for requests on a specific TCP socket, and sometimes just sleeps.

I have 144 client threads in a separate JVM, standalone JSE 5.

Searching + indexing + autocommit happen concurrently, and I have adds only
_once_per_second_ (for the whole bunch of 144 client threads, each sleeping 7
seconds out of every 7.1!)

I suspect the autocommit setting... I didn't have any problems before. I need
to check the source code too...

Thanks,
Fuad

-Original Message-
From: Yonik Seeley [mailto:[EMAIL PROTECTED]]
Sent: Monday, January 29, 2007 11:20 PM
To: solr-dev@lucene.apache.org
Subject: Re: Performance Tuning


[snip: quoted message, same as above]




Re: Performance Tuning

2007-01-29 Thread Yonik Seeley

On 1/29/07, Fuad Efendi [EMAIL PROTECTED] wrote:

The 'thread limit in multiple app servers' is just a rumour from the old
prefork HTTPD days.


No, I meant the servlet container. The limit on the number of threads
is often a feature, not a limitation. I'm not sure what happens when
the limit is exceeded... perhaps an error is returned instead of
blocking. I doubt that's your problem anyway.

Make sure to grep the solr logs for WARNING, as Mike suggested.

-Yonik


RE: Performance Tuning

2007-01-29 Thread Fuad Efendi
Hi Yonik,


I have a multithreaded SOLR client which runs in a separate JVM; I have Tomcat
with SOLR, and an additional Tomcat with Cocoon. Plus a database... and Apache
HTTPD... In total: 5 big applications inside the same double-Opteron box...

Yes, having add + search + commit could be a problem; I don't know the
architecture in depth... I am adding a new document each second, and I have
Internet users performing searches at the same time...

I changed memory limits, restored the default solrconfig.xml, and I am trying
to execute it again:
4Gb - database
1Gb - Tomcat+Solr
512Mb - Tomcat+Cocoon
2Gb - Standalone Java Solr-Client
unknown - Apache Httpd (default settings; probably 1Gb total)

I have 12Gb RAM in total, and I had a huge swap file before decreasing the
memory limits; let's see... I think Lucene does not allow running a Reader and
a Writer concurrently, but my current architecture does exactly this...

If you see an empty table at www.tokenizer.org, just click refresh; it might
help. I am using expiration headers and caching, so not all requests reach the
server... I just restarted Apache to clear the cache...


-Fuad


[snip: quoted message, same as above]




Re: Performance Tuning

2007-01-29 Thread Yonik Seeley

On 1/29/07, Mike Klaas [EMAIL PROTECTED] wrote:

On 1/29/07, Fuad Efendi [EMAIL PROTECTED] wrote:

<autocommit>
  <maxDocs>1000</maxDocs>
</autocommit>

Makes me suspect overlapping searchers even more strongly.  The
current autocommit implementation does not wait for the searcher to
finish warming...


There is a pressure-relief valve of sorts... maxWarmingSearchers
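
In solrconfig.xml that is something like the following (the value here is just
an illustration):

<maxWarmingSearchers>2</maxWarmingSearchers>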

-Yonik


RE: Performance Tuning

2007-01-29 Thread Fuad Efendi
Typo in previous message: the threads _do_not_ commit...

Threads do NOT commit, they only add.