Hi Anshum, I looked at the log and couldn't make too much sense of it.
But I have an update to my original suggestion: try the following command line parameters: -Xms1024m -XX:-UseParallelGC -XX:+ScavengeBeforeFullGC I got rid of the "-XX:+AggressiveOpts" because perhaps these are not always stable. I changed the "-XX:+UseConcMarkSweepGC" to "-XX:-UseParallelGC" as I've found it to be ~10% faster in both indexing and searching, in the multithreaded multi-core configuration that I have (YMMV). Let me know if the above parameters work OK. Other possibilities: 1 - -XX:+UseBiasedLocking See http://java.sun.com/performance/reference/whitepapers/tuning.html#section4.2.5 2 - -XX:+UseFastAccessorMethods 3 - -XX:CompileThreshold=2 But I would suggest doing some profiling to see where the bottlenexk is in your system, using either a commercial profiler or some the tools included in the JDK: http://java.sun.com/javase/6/webnotes/trouble/TSG-VM/html/tooldescr.html thanks, Glen 2008/4/25 Anshum <[EMAIL PROTECTED]>: > Hi Glen, > > After starting the JVM as you suggested, the daemon (actually the JVM) > crashed unexpectedly. > I didn't really understand the reason behind it from the log (attached), > could you just have a look and help me deciphering the same. > > Thanks, > Anshum > > > > > > On Thu, Apr 24, 2008 at 12:47 AM, Glen Newton <[EMAIL PROTECTED]> wrote: > > Hi Anshum, > > > > 2008/4/23 Anshum <[EMAIL PROTECTED]>: > > > > > Hi Glen, > > > > > > As far as stats for index/search are concerned, here they are: > > > * Yes, it is a web based application > > > * I am currently facing issues when the number of concurrent searches > goes > > > high. The search is not able to handle over 2.5 searches per second. > > What is the OS, hardware etc you are using? > > > > I am assuming you are only opening one instance of IndexSearcher for > > all of your threads/queries (I would do it in the servlet init() > > method) to use, as indicated in > > > http://lucene.apache.org/java/2_3_1/api/org/apache/lucene/search/IndexSearcher.html > > > > > > > * JVM command line parameters: -server mode; Max heap size 2GB (As it > is a > > > 32 bit machine/OS and that is the limit in the case) > > What JVM/version? > > > > Have you tried running a profiler to see what is happening or even > > jconsole (add -Dcom.sun.management.jmxremote to your command line > > parameters and (in Linux at least) run jconsole after the JVM has > > started. > > > > A general suggestion: > > -XX:+AggressiveOpts -XX:+ScavengeBeforeFullGC -Xms2000m > > -XX:+UseConcMarkSweepGC > > > > Although the garbage collection depends on your configuration. > > > > > > > * Performance expectations : > > > * Mean search time :1-2 seconds (though that is a lot considering that > I > > > have tried other engines, though with smaller indexes and they seem to > have > > > sub 0.01s searching time) > > > > Yes, I am also surprised by this poor performance. > > > > > > > * No, I do not store any fields. > > > * Yes, I am using multiple machines ( more than a couple ).serving the > same > > > index; a little more than just a vanilla load balancing mechanism; > sending > > > similar requests to the same server to better OS level caching > > > * We do not index and search on the same machine, so in that case, I > would > > > not think that changing the priority of these threads would matter. In > other > > > words, there is nothing but search that happens on the search server. > If > > > that is what you meant when you wrote about increasing the thead > priority. > > > > Your servlet environment has many threads that have to do a lot of > > things before they get around to the part of your code that does the > > searching. Once they get to the search part, you want them to have > > higher priority than the threads that are handling requests much > > further back in the servlet pipeline. So set the thread priority > > higher when it gets to your code. Of course you would need to test to > > see the impact on performance in your particular case. > > > > > > > > > > I had already started trying the TPE and guess it would provide a lot > of > > > improvement but 'm looking for other ways as well ! :) > > > > -glen > > > > > > > > > > > > > -- > > > Anshum > > > > > > > > > > > > On Tue, Apr 22, 2008 at 11:19 PM, Glen Newton <[EMAIL PROTECTED]> > wrote: > > > > > > > So even if you only have one index, this is the way to go to manage > > > > this kind of problem. > > > > > > > > Looking at the implementation and having used ThreadPoolExecutor > (TPE) > > > > a lot, I would make the following suggestions for this class so as to > > > > better support this particular use case: > > > > Better access to the configuration of the TPE is needed: the ability > > > > to choose the number of threads (pools size and max pool size), the > > > > type and nature of the queue, etc. Also, the default behaviour of TPE > > > > is to throw an exception when a job is submitted and the queue is > > > > full: throwing an exception is expensive, especially when dozens or > > > > hundreds of searches are being rejected and many exceptions are > > > > occurring in a high load situation. Instead, being able to set the > > > > RejectedExecutionHandler on the TPE would allow for a graceful > > > > handling of rejected queries (feedback to user application, etc). > > > > Also, TPE allows for a custom ThreadFactory, which I use to produce > > > > threads with the highest priority to do the searching. > > > > > > > > Right now the implementation sets the #threads and queue size as a > > > > function of the number of Searchables, which is reasonable for this > > > > use case. But it would generalize better if these were a function of > > > > the number of cores on the machine, or some combination. > > > > > > > > That said, I would suggest having adding the ability to set the > > > > ThreadPoolExecutor completely, with a getter/setter. This would allow > > > > this class to be useful beyond the use case of multiple indexes, > > > > becoming more generalizable to a number of use cases including > > > > allowing it to support the use case of one (or more indexes) and a > > > > high work load of queries that need to be managed. It could use the > > > > same defaults if the TPE is not set externally (is null). > > > > > > > > -Glen > > > > > > > > 2008/4/22 Renaud Waldura <[EMAIL PROTECTED]>: > > > > > > one solution is to set-up a ThreadPoolExecutor[2] with a fixed > > > > > > number of threads and a limited queue size (use a bound > > > > BlockingQueue[3]) > > > > > > > > > > Yes, this is precisely how the ConcurrentMultiSearcher works. > > > > > https://issues.apache.org/jira/browse/LUCENE-423 > > > > > > > > > > > > > > > > > > > > > > > > > -----Original Message----- > > > > > From: Glen Newton [mailto:[EMAIL PROTECTED] > > > > > Sent: Tuesday, April 22, 2008 5:40 AM > > > > > To: java-user@lucene.apache.org > > > > > Subject: Re: Binding lucene instance/threads to a particular > > > > processor(or > > > > > core) > > > > > > > > > > > > > > > > > > > > Anshun, > > > > > > > > > > I think I am dealing with an index of similar scale: 6.4 million > > > > records, 83 > > > > > GB index (see [1] for more info) > > > > > > > > > > I mistakenly thought from your original posting that you were > > > > interested in > > > > > binding threads to processors for indexing, but it is sounding > like you > > > > want > > > > > to do this for searching. I am not sure if this would work well > for > > > > > searching, as there is a great deal more ephemeral state. But I am > not > > > > sure. > > > > > > > > > > With respect to handling concurrency for search, could you > describe > > > > your > > > > > environment better? > > > > > - Is it an web-base application? > > > > > - What sort of problems do you have now? > > > > > - What are your java command line parameters (heap, etc.) > > > > > - What are the performance expectations, i.e. average search < N1 > > > > seconds; > > > > > median search time < N2 seconds > > > > > - Are you storing any fields and if so, do you need to store so > many? > > > > > - Do you have the choice of serving searches from multiple > machines? > > > > > i.e. load balance across >1 machine; We've found this to be one of > the > > > > best > > > > > solutions for scaling; > > > > > - One thing that I do for both indexing and searching is that the > > > > threads > > > > > that are doing these tasks I always shift their priority to MAX, > so > > > > that > > > > > they are run in preference to threads doing other things, like > > > > preparing > > > > > Documents, etc. > > > > > > > > > > If it is a web-type environment, one solution is to set-up a > > > > > ThreadPoolExecutor[2] with a fixed number of threads and a limited > > > > queue > > > > > size (use a bound BlockingQueue[3]) . You would have to experiment > with > > > > the > > > > > numbers to get the sweet-spot for your situation. > > > > > I would suggest starting with 2 times the number processors > (cores) and > > > > a > > > > > queue of say 20. Requests are queued, but if the request cannot be > > > > queued, > > > > > at least the application then knows that it is too busy and you > can > > > > give the > > > > > user a message "The system is too busy at this moment, please try > again > > > > in a > > > > > few seconds..." kind of thing. The advantage is also that when > things > > > > are > > > > > very busy, you have the accepted requests handled in a reasonable > > > > amount of > > > > > time, and some users being told things are too busy, as opposed to > all > > > > the > > > > > requests going in and the system thrashing (and perhaps > running-out of > > > > > memory at the same time) and everyone's queries being very slow. > > > > > > > > > > But I would suggest setting-up a test harness to emulate your > > > > production > > > > > conditions to try these things out... > > > > > > > > > > -Glen > > > > > > > > > > [1] > > > > > http://zzzoot.blogspot.com/2008/04/lucene-indexing-performance-benchmarks > > > > > .html > > > > > [2] > > > > > http://java.sun.com/j2se/1.5.0/docs/api/java/util/concurrent/ThreadPoolEx > > > > > ecutor.html > > > > > [3] > > > > > http://java.sun.com/j2se/1.5.0/docs/api/java/util/concurrent/BlockingQueu > > > > > e.html > > > > > > > > > > > > > > > 2008/4/22 Anshum <[EMAIL PROTECTED]>: > > > > > > The paper seems pretty good but I am still wondering if there > was a > > > > > > way to achieve this through the command line parameters. I'm > just > > > > > > trying this to optimize the code, if this works, would let all > know > > > > > > else would keep everyone informed :) Any other suggestions for > > > > > > handling a concurrency of over 7 search requests per second for > an > > > > > > index size of over 15Gigs containing over 13 million records? > > > > > > Also, could someone help me with obtaining a 'index size' - > > > > > > 'concurrency' - 'processor power' - 'memory' relationship > formula > > > > (or > > > > > something similar)? > > > > > > > > > > > > -- > > > > > > Anshum > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > On Tue, Apr 22, 2008 at 3:55 AM, Antony Bowesman > <[EMAIL PROTECTED]> > > > > > wrote: > > > > > > > > > > > > > That paper from 1997 is pretty old, but mirrors our > experiences in > > > > > > those > days. Then, we used Solaris processor sets to really > improve > > > > > > performance by > binding one of our processes to a particular > CPU > > > > > > while leaving the other > CPUs to manage the thread intensive > work. > > > > > > > > > > > > > > You can bind processes/LWPs to a CPU on Solaris with psrset. > > > > > > > > > > > > > > The Solaris thread model in the late '90s was also a > significant > > > > > > factor in > performance of multi-threaded programs. The > default > > > > > > thread library in > Solaris 8 implemented a MxN unbound thread > model > > > > > > (threads/LWPS). In those > days we found that it did not > perform > > > > > > well, so used the bound thread model > (i.e. 1:1) where a > Solaris > > > > > > thread was bound permanently to an LWP. That > improved > performance > > > > > > a lot. In Solaris 8, Sun had what they called the > > 'alternate' > > > > > > thread library (T2) around 2000, which became the default > > library > > > > > > in Solaris 9, and implemented a 1:1 model of Solaris threads to > > > > > > LWPs. > > > > > That new library had dramatic performance improvements over the > old. > > > > > > > > > > > > > > Some background info for Java and threading > > > > > > > > > http://java.sun.com/j2se/1.5.0/docs/guide/vm/thread-priorities.html > > > > > > > > > > > > > > Antony > > > > > > > > > > > > > > > > > > > > > > > > > > > > Glen Newton wrote: > > > > > > > > > > > > > > > I realised that not everyone on this list might be able to > > > > access > > > > > > the > > IEEE paper I pointed-out, so I will include the > abstract and > > > > > > some > > paragraphs from the paper which I have included below. > > > > > > > > > > > > > > > > Also of interest (and should be available to all): Fedorova > et > > > > al. > > > > > > > > 2005. Performance of Multithreaded Chip Multiprocessors And > > > > > > > > > Implications For Operating System Design. Usenix 2005. > > > > > > > > http://www.eecs.harvard.edu/margo/papers/usenix05/paper.pdf > > > > > > > > "Abstract: We investigated how operating system design > should be > > > > > > > > adapted for multithreaded chip multiprocessors (CMT) - a new > > > > > > > > > generation of processors that exploit thread-level parallelism > to > > > > mask > > > > > > > > the memory latency in modern workloads. We > > determined > that > > > > > > the L2 cache is a critical shared resource on CMT and > > that > an > > > > > > insufficient amount of L2 cache can undermine the ability to > > > > > > > hide > > > > > > memory latency on these processors. To use the L2 cache as > > > > > > > > efficiently as possible, we propose an L2-conscious scheduling > > > > > > > > > algorithm and quantify its performance potential. Using this > > > > algorithm > > > > > > > > it is possible to reduce miss ratios in the L2 cache by > 25-37% > > > > and > > > > > > > > improve processor throughput by 27-45%." > > > > > > > > > > > > > > > > > > > > > > > > From Lundberg, L. 1997: > > > > > > > > Abstract: "The default scheduling algorithm in Solaris and > other > > > > > > > > operating systems may result in frequent relocation of > threads at > > > > > > > > run-time. Excessive thread relocation cause > > poor memory > > > > > > performance. This can be avoided by binding threads to > > > > > > > > processors. However, binding threads to processors may result in > an > > > > > > > > > > > > unbalanced load. By considering a previously obtained > theoretical > > > > > > > > > > > > result and by evaluating a set of multithreaded Solaris > > > > > > > > programs using a multiprocessor with 8 processors, we are able > to > > > > > > > > > > > > bound the maximum performance loss due to binding threads, The > > > > > > > > > theoretical result is also recapitulated. By evaluating another > set > > > > of > > > > > > > > multithreaded programs, we show that the gain of binding > threads > > > > > > to > > processors may be substantial, particularly for programs > with > > > > > > fine > > grained parallelism." > > > > > > > > > > > > > > > > First paragraph: "The thread concept in Solaris [3][5] and > other > > > > > > > > operating systems makes it possible to write multithreaded > > > > > > programs, > > which can be executed in parallel on a > multiprocessor. > > > > > > Previous > > experience from real world programs [4] show that, > > > > using > > > > > > the default > > scheduling algorithm in Solaris, threads are > > > > > > frequently relocated from > > one processor > > to another at > > > > > > run-time. After each such relocation, the code and data > > > > > > > > associated with the relocated thread is moved from the cache > memory > > > > of > > > > > > > > the 0113 processor to the cache of the new processor. This > > > > reduces > > > > > > the > > performance. Run-time relocation of threads to > processors > > > > can > > > > > > also > > result in unpredictable response times. This is a > problem > > > > in > > > > > > systems > > which operate in a real-time environment. In order > to > > > > > > avoid poor > > memory performance and unpredictable real-time > > > > > > behaviour due to > > frequent thread relocation, threads can be > > > > bound > > > > > > to processors using > > the processor-bind directive [3] [5]. > The > > > > > > major problem with binding > > threads is that one can end up > with > > > > an > > > > > > unbalanced load, i.e. some > > processors may be extremely busy > > > > > > during some time periods while other > > processors are idle." > > > > > > > > > > > > > > > > -Glen > > > > > > > > > > > > > > > > On 21/04/2008, Glen Newton <[EMAIL PROTECTED]> wrote: > > > > > > > > > > > > > > > > > And this discussion on bound threads may also shed light > on > > > > things: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > http://coding.derkeiler.com/Archive/Java/comp.lang.java.programmer/200 > > > > > > 7-11/msg02801.html > > > > > > > > > > > > > > > > > > > > > > > > > > > -Glen > > > > > > > > > > > > > > > > > > > > > > > > > > > On 21/04/2008, Glen Newton <[EMAIL PROTECTED]> > wrote: > > > > > > > > > > BInding threads to processors - in many situations - > > > > > > improves > > > > throughput by reducing memory overhead. When > a > > > > > > thread is running > > > on a > > > > core, its state is > local; if > > > > > > it is timeshared-out and either 1) > > > > swapped back in on > the > > > > > > same core, it is likely that there will be > > > the > > > > > > > > > > core's L1 cache; or 2) onto another core, there will not be a > > > > > > > > > > cache > > > > hit and the state will have to be fetched from > L2 or > > > > > > main memory, > > > > incurring a performance hit, esp in the > > > > > > latter. See Lundberg, L. > > > > > > > > > 1997. > > > > > > > > > > Evaluating the Performance Implications of Binding > Threads > > > > > > to > > > > Processors. 393. > > > > > > > > > http://ieeexplore.ieee.org/iel3/5020/13768/00634520.pdf > > > > > > > > > > for more info. > > > > > > > > > > > > > > > > > > > > If you are using JVM on Solaris on SPARC, you should > take > > > > a > > > > > > look > > > at > > > > the following links for tuning (the > Sun JVM > > > > > > on Solaris SPARC has > > > many > > > > more performance > tuning > > > > > > parameters available), including > > > threading: > > > > > > > > > > - > http://java.sun.com/docs/hotspot/threads/threads.html > > > > > > > > > > - > > > > > > > > > > > > > > > > http://java.sun.com/j2se/1.5.0/docs/guide/vm/thread-priorities.html > > > > > > > > > > - > > > > > > > > > > > > > > > > > > > > http://www-1.ibm.com/support/docview.wss?rs=180&context=SSEQTP&uid=swg > > > > > > 21107291 > > > > - > > > > > > http://java.sun.com/javase/technologies/performance.jsp > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -Glen > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > On 21/04/2008, Ulf Dittmer <[EMAIL PROTECTED]> > wrote: > > > > > > > > > > > This sounds odd. Why would restricting it to a > single > > > > > > > > > > > core improve performance? The point of using multiple > > > > > > > > > > > > cores (and multiple threads) is to improve performance > > > > > > > > > > > > > > > > isn't it? I'd leave thread scheduling decisions to the > > > > > > > > > > > > > > > > JVM. Plus, I don't think there is anything in Java to > > > > > > > > > > > > facilitate this (short of using JNI). > > > > > > > > > > > > > > > > > > > > > > Are you talking about indexing or searching? You > may > > > > > > > > > > > be able to use multiple parallel threads to improve > > > > > > > > > > > > indexing performance. I don't think Lucene uses > > > > > > > > > > > > multi-threading for searching; not unless you have > > > > > > > > > > > multiple indices, anyway. > > > > > > > > > > > > > > > > > > > > > > Ulf > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > --- Anshum <[EMAIL PROTECTED]> wrote: > > > > > > > > > > > > > > > > > > > > > > > Hi, > > > > > > > > > > > > > > > > > > > > > > > > I have been trying to bind my lucene instance > (JVM - > > > > > > > > > > > > Sun Hotspot*) to a > > > > > > particular > core so > > > > > > as to improve the performance. Is > > > > > > there a way to > do > > > > so > > > > > > or > > > > > > is there support in lucene to explicitly > control > > > > > > the > > > > > > thread - processor > > > > > > linkup? > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > > > > > > > > -- > > > > > > > > > > > > The facts expressed here belong to everybody, > the > > > > > > > > > > > > > > > > opinions to me. > > > > > > > > > > > > The distinction is yours to draw............ > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > ______________________________________________________________________ > > > > > > ______________ > > > > > Be a better friend, newshound, and > > > > > > > > > > > > > > > > know-it-all with Yahoo! Mobile. Try it now. > > > > > > > > > > http://mobile.yahoo.com/;_ylt=Ahu06i62sR8HDtDypao8Wcj9tAcJ > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > --------------------------------------------------------------------- > > > > > > > > > > > To unsubscribe, e-mail: > > > > > > [EMAIL PROTECTED] > > > > > > > > > > > For additional commands, e-mail: > > > > > > > > > [EMAIL PROTECTED] > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > > > > - > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > > - > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > --------------------------------------------------------------------- > > > > > > > To unsubscribe, e-mail: > [EMAIL PROTECTED] > > > > > > > For additional commands, e-mail: > [EMAIL PROTECTED] > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > > > > > > > > > > > > > > -- > > > > > > The facts expressed here belong to everybody, the opinions to > me. > > > > > > The distinction is yours to draw............ > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > > > > > > - > > > > > > > > > > > > > > > > > > > > > --------------------------------------------------------------------- > > > > > To unsubscribe, e-mail: [EMAIL PROTECTED] > > > > > For additional commands, e-mail: [EMAIL PROTECTED] > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > > > > - > > > > > > > > --------------------------------------------------------------------- > > > > To unsubscribe, e-mail: [EMAIL PROTECTED] > > > > For additional commands, e-mail: [EMAIL PROTECTED] > > > > > > > > > > > > > > > > > -- > > > -- > > > The facts expressed here belong to everybody, the opinions to me. > > > The distinction is yours to draw............ > > > > > > > > > > > -- > > > > > > > > > > - > > > > --------------------------------------------------------------------- > > To unsubscribe, e-mail: [EMAIL PROTECTED] > > For additional commands, e-mail: [EMAIL PROTECTED] > > > > > > > > -- > -- > The facts expressed here belong to everybody, the opinions to me. > The distinction is yours to draw............ > --------------------------------------------------------------------- > To unsubscribe, e-mail: [EMAIL PROTECTED] > For additional commands, e-mail: [EMAIL PROTECTED] > -- - --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]