Re: Binding lucene instance/threads to a particular processor(or core)

Antony Bowesman Mon, 21 Apr 2008 15:26:34 -0700

That paper from 1997 is pretty old, but mirrors our experiences in those days.Then, we used Solaris processor sets to really improve performance by bindingone of our processes to a particular CPU while leaving the other CPUs to managethe thread intensive work.


You can bind processes/LWPs to a CPU on Solaris with psrset.

The Solaris thread model in the late '90s was also a significant factor inperformance of multi-threaded programs. The default thread library in Solaris 8implemented a MxN unbound thread model (threads/LWPS). In those days we foundthat it did not perform well, so used the bound thread model (i.e. 1:1) where aSolaris thread was bound permanently to an LWP. That improved performance alot. In Solaris 8, Sun had what they called the 'alternate' thread library (T2)around 2000, which became the default library in Solaris 9, and implemented a1:1 model of Solaris threads to LWPs. That new library had dramatic performanceimprovements over the old.


Some background info for Java and threading

http://java.sun.com/j2se/1.5.0/docs/guide/vm/thread-priorities.html

Antony


Glen Newton wrote:

I realised that not everyone on this list might be able to access the
IEEE paper I pointed-out, so I will include the abstract and some
paragraphs from the paper which I have included below.

Also of interest (and should be available to all): Fedorova et al.
2005. Performance of Multithreaded Chip Multiprocessors And
Implications For Operating System Design. Usenix 2005.
http://www.eecs.harvard.edu/margo/papers/usenix05/paper.pdf
"Abstract: We investigated how operating system design should be
adapted for multithreaded chip multiprocessors (CMT) – a new
generation of processors that exploit thread-level parallelism to mask
the memory latency in modern workloads. We
determined that the L2 cache is a critical shared resource on CMT and
that an insufficient amount of L2 cache can undermine the ability to
hide memory latency on these processors. To use the L2 cache as
efficiently as possible, we propose an L2-conscious scheduling
algorithm and quantify its performance potential. Using this algorithm
it is possible to reduce miss ratios in the L2 cache by 25-37% and
improve processor throughput by 27-45%."


From Lundberg, L. 1997:
Abstract: "The default scheduling algorithm in Solaris and other
operating systems may result in frequent relocation of threads at
run-time. Excessive thread relocation cause
poor memory performance. This can be avoided by binding threads to
processors. However, binding threads to processors may result in an
unbalanced load. By considering a previously obtained theoretical
result and by evaluating a set of multithreaded Solaris
programs using a multiprocessor with 8 processors, we are able to
bound the maximum performance loss due to binding threads, The
theoretical result is also recapitulated. By evaluating another set of
multithreaded programs, we show that the gain of binding threads to
processors may be substantial, particularly for programs with fine
grained parallelism."

First paragraph: "The thread concept in Solaris [3][5] and other
operating systems makes it possible to write multithreaded programs,
which can be executed in parallel on a multiprocessor. Previous
experience from real world programs [4] show that, using the default
scheduling algorithm in Solaris, threads are frequently relocated from
one processor
to another at run-time. After each such relocation, the code and data
associated with the relocated thread is moved from the cache memory of
the 0113 processor to the cache of the new processor. This reduces the
performance. Run-time relocation of threads to processors can also
result in unpredictable response times. This is a problem in systems
which operate in a real-time environment. In order to avoid poor
memory performance and unpredictable real-time behaviour due to
frequent thread relocation, threads can be bound to processors using
the processor-bind directive [3] [5]. The major problem with binding
threads is that one can end up with an unbalanced load, i.e. some
processors may be extremely busy during some time periods while other
processors are idle."

-Glen

On 21/04/2008, Glen Newton <[EMAIL PROTECTED]> wrote:

And this discussion on bound threads may also shed light on things:
 
http://coding.derkeiler.com/Archive/Java/comp.lang.java.programmer/2007-11/msg02801.html


 -Glen


 On 21/04/2008, Glen Newton <[EMAIL PROTECTED]> wrote:
 > BInding threads to processors - in many situations - improves
 >  throughput by reducing memory overhead. When a thread is running on a
 >  core, its state is local; if it is timeshared-out and either 1)
 >  swapped back in on the same core, it is likely that there will be  the
 >  core's L1 cache; or 2) onto another core, there will not be a cache
 >  hit and the state will have to be fetched from L2 or main memory,
 >  incurring a performance hit, esp in the latter. See Lundberg, L. 1997.
 >  Evaluating the Performance Implications of Binding Threads to
 >  Processors. 393.http://ieeexplore.ieee.org/iel3/5020/13768/00634520.pdf
 >  for more info.
 >
 >  If you are using JVM on Solaris on SPARC, you should take a look at
 >  the following links for tuning (the Sun JVM on Solaris SPARC has many
 >  more performance tuning parameters available), including threading:
 >  - http://java.sun.com/docs/hotspot/threads/threads.html
 >  - http://java.sun.com/j2se/1.5.0/docs/guide/vm/thread-priorities.html
 >  - 
http://www-1.ibm.com/support/docview.wss?rs=180&context=SSEQTP&uid=swg21107291
 >  - http://java.sun.com/javase/technologies/performance.jsp
 >
 >
 >  -Glen
 >
 >
 >
 >
 >
 >
 >
 >  On 21/04/2008, Ulf Dittmer <[EMAIL PROTECTED]> wrote:
 >  > This sounds odd. Why would restricting it to a single
 >  >  core improve performance? The point of using multiple
 >  >  cores (and multiple threads) is to improve performance
 >  >  isn't it? I'd leave thread scheduling decisions to the
 >  >  JVM. Plus, I don't think there is anything in Java to
 >  >  facilitate this (short of using JNI).
 >  >
 >  >  Are you talking about indexing or searching? You may
 >  >  be able to use multiple parallel threads to improve
 >  >  indexing performance. I don't think Lucene uses
 >  >  multi-threading for searching; not unless you have
 >  >  multiple indices, anyway.
 >  >
 >  >  Ulf
 >  >
 >  >
 >  >
 >  >  --- Anshum <[EMAIL PROTECTED]> wrote:
 >  >
 >  >  > Hi,
 >  >  >
 >  >  > I have been trying to bind my lucene instance (JVM -
 >  >  > Sun Hotspot*) to a
 >  >  > particular core so as to improve the performance. Is
 >  >  > there a way to do so or
 >  >  > is there support in lucene to explicitly control the
 >  >  > thread - processor
 >  >  > linkup?
 >  >  >
 >  >  > --
 >  >  > --
 >  >  > The facts expressed here belong to everybody, the
 >  >  > opinions to me.
 >  >  > The distinction is yours to draw............
 >  >  >
 >  >
 >  >
 >  >
 >  >
 >  >       
____________________________________________________________________________________
 >  >  Be a better friend, newshound, and
 >  >  know-it-all with Yahoo! Mobile.  Try it now.  
http://mobile.yahoo.com/;_ylt=Ahu06i62sR8HDtDypao8Wcj9tAcJ
 >  >
 >  >  ---------------------------------------------------------------------
 >  >  To unsubscribe, e-mail: [EMAIL PROTECTED]
 >  >  For additional commands, e-mail: [EMAIL PROTECTED]
 >  >
 >  >
 >
 >
 >
 > --
 >
 >  -
 >



--

 -



---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Re: Binding lucene instance/threads to a particular processor(or core)

Reply via email to