[OMPI devel] RFC: Fragmented sm Allocations

2009-01-15 Thread Eugene Loh
I put back more code changes and refreshed the RFC a little.  So, if you 
want a latest/greatest copy, here is the (slightly) amended RFC.  Thanks 
for the positive feedback so far, but more scrutiny is appreciated!




RFC: Fragmented sm Allocations


WHAT:  Dealing with the fragmented allocations of sm BTL FIFO
circular buffers (CB) during MPI_Init().


Also:

 Improve handling of error codes.
 Automate the sizing of the mmap file.



WHY:  To reduce consumption of shared memory, make job startup
more robust, and possibly improve the scalability of startup.


WHERE: In mca_btl_sm_add_procs(), there is a loop
over calls to ompi_fifo_init().  This is where CBs are
initialized one at a time, with each component of a CB allocated individually.
Changes can be seen in
ssh://www.open-mpi.org/~eugene/hg/sm-allocation.


WHEN:  Upon acceptance.


TIMEOUT:  January 30, 2009.



WHY (details)


The sm BTL establishes a FIFO for each non-self, on-node
connection.  Each FIFO is initialized during MPI_Init()
with a circular buffer (CB).  (More CBs can be added later in
program execution if a FIFO runs out of room.)


A CB has different components that are used in different ways:

 The "wrapper" is read by both sender and receiver,
 but is rarely written.
 The "queue" (FIFO entries) is accessed by both the
 sender and receiver.
 The "head" is accessed by the sender.
 The "tail" is accessed by the receiver.



For performance reasons, a CB is not allocated as one large data structure.
Rather, these components are laid out separately in memory and the
wrapper has pointers to the various locations.  Performance
considerations include:

 false sharing: a component used by one process should
 not share a cacheline with another component that is
 modified by another process (see the sketch below)
 NUMA: certain components should perhaps be mapped
 preferentially to memory pages that are close to the
 processes that access these components
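
A minimal sketch of the false-sharing point (illustrative only -- not
the actual OMPI layout; the 64-byte cacheline and the field types are
assumptions):

    /* If head and tail shared one cacheline, every sender write would
     * invalidate the receiver's cached copy (and vice versa), even
     * though neither process touches the other's field. */
    #define CACHELINE 64

    struct cb_ctrl {
        volatile int head;                   /* written by the sender   */
        char pad0[CACHELINE - sizeof(int)];  /* keep tail off this line */
        volatile int tail;                   /* written by the receiver */
        char pad1[CACHELINE - sizeof(int)];
    };

The actual sm BTL gets the same effect by placing the components in
separate, suitably aligned allocations rather than one padded struct.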



Currently, the sm BTL handles these issues by allocating
each component of each CB its own page.  (Actually, it couples
tails and queues together.)


As the number of on-node processes grows, however, the shared-memory
allocation skyrockets.  E.g., let's say there are n processes
on-node.  There are therefore n(n-1) = O(n^2)
FIFOs, each with 3 allocations (wrapper, head, and tail/queue).  The
shared-memory allocation for CBs becomes 3n^2 pages.
For large n, this dominates the shared-memory consumption,
even though most of the CB allocation is unused.  E.g., a 12-byte
"head" ends up consuming a full memory page!


Not only is the 3n^2-page allocation large, but it
is also not tunable via any MCA parameters.


Large shared-memory consumption has led to a number of
start-up and other user problems; see, e.g., the e-mail thread at
http://www.open-mpi.org/community/lists/devel/2008/11/4882.php.

WHAT (details)


Several actions are recommended here.

1.  Cacheline Rather than Pagesize Alignment


The first set of changes relaxes alignment from pagesize to cacheline.
Though mapping to pages is motivated by NUMA locality, note:

 The code already has NUMA locality optimizations
 (maffinity and mpools) anyhow.
 There is no data that I'm aware of substantiating the
 benefits of locality optimizations in this context.
 
 
 More to the point, I've tried some such experiments myself.
 I had two processes communicating via shared memory on a
 large SMP with a pronounced difference between remote and local
 memory access times.  I timed the roundtrip latency for
 pingpongs between the processes.  That time was correlated
 to the relative separation between the two processes, and
 not at all to the placement of the physical memory backing
 the shared variables.  It did not matter if the memory was
 local to the sender or receiver or remote from both!  (E.g.,
 colocal processes showed fast timings even if the shared
 memory were remote to both processes.)
 
 My results do not prove that all NUMA platforms behave in the
 same way.  My point is only that, though I understand the
 logic behind locality optimizations for FIFO placement, the
 only data I am aware of does not substantiate that logic.
 



The changes are:

 File: ompi/mca/mpool/sm/mpool_sm_module.c
 Function: mca_mpool_sm_alloc()
 Change: Use the alignment requested by the caller rather than
 imposing pagesize alignment on top of it.
 
 File: ompi/class/ompi_fifo.h
 Functions: ompi_fifo_init() and ompi_fifo_write_to_head()
 Change: Align the ompi_cb_fifo_wrapper_t structure on a cacheline
 rather than a page.
 
 File: ompi/class/ompi_circular_buffer_fifo.h
 Function: ompi_cb_fifo_init()
 Change: Request cacheline rather than page alignment in the two
 mpool_alloc calls.


2.  Aggregated Allocation


Another option is to lay out all the CBs at once and aggregate
their allocations.
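
For illustration, here is a minimal sketch of what such aggregation
might look like (hypothetical names and layout throughout -- this is
not the actual patch): one region sized up front for all FIFO control
structures, carved into cacheline-aligned slots per sender/receiver
pair.

    #include <stddef.h>

    #define CACHELINE 64

    /* One slot per FIFO: wrapper, head, and tail/queue control, each
     * padded to a cacheline inside a single aggregated region.  (The
     * n*n grid wastes the diagonal; this is a sketch, not a layout
     * proposal.) */
    static inline char *cb_slot(char *base, int sender, int receiver, int n)
    {
        return base + ((size_t)sender * n + receiver) * (3 * CACHELINE);
    }

A single allocation would amortize both the mmap-file bookkeeping and
the per-allocation padding over all n(n-1) FIFOs.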


This may have the added benefit of 

[OMPI devel] This is why we test

2009-01-15 Thread Jeff Squyres

Unfortunately, I have to throw the flag on the v1.3 release.  :-(

I ran ~16k tests via MTT yesterday on the rc5 and rc6 tarballs.  I  
found the following:


Found test runs: 15962
Passed: 15785 (98.89%)
Failed: 83 (0.52%)
--> Openib failures: 80 (0.50%)
Skipped: 46 (0.29%)
Timedout: 48 (0.30%)

The 80 openib failures are all seemingly random segv's.  I repeated a  
much smaller run this morning (about 700 runs) and still found a non- 
zero percentage of fails of the same flavor.


The timeouts are a little worrisome as well.

This unfortunately requires investigation.  :-(

--
Jeff Squyres
Cisco Systems



Re: [OMPI devel] -display-map

2009-01-15 Thread Ralph Castain
Here is what I was able to do - note that the resolve messages are  
associated with the specific hostname, not the overall map:











Will that work for you? If you like, I can remove the name= field from  
the noderesolve element since the info is specific to the host element  
that contains it. In other words, I can make it look like this:











if that would help.

Ralph


On Jan 14, 2009, at 7:57 AM, Ralph Castain wrote:

We -may- be able to do a more formal XML output at some point. The  
problem will be the natural interleaving of stdout/err from the  
various procs due to the async behavior of MPI. Mpirun receives  
fragmented output in the forwarding system, limited by the buffer  
sizes and the amount of data we can read at any one "bite" from the  
pipes connecting us to the procs. So even though the user -thinks-  
they output a single large line of stuff, it may show up at mpirun  
as a series of fragments. Hence, it gets tricky to know how to put  
appropriate XML brackets around it.


Given this input about when you actually want resolved name info, I  
can at least do something about that area. Won't be in 1.3.0, but  
should make 1.3.1.


As for XML-tagged stdout/err: the OMPI community asked me not to  
turn that feature "on" for 1.3.0 as they felt it hasn't been  
adequately tested yet. The code is present, but cannot be activated  
in 1.3.0. However, I believe it is activated on the trunk when you  
do --xml --tagged-output, so perhaps some testing will help us debug  
and validate it adequately for 1.3.1?


Thanks
Ralph


On Jan 14, 2009, at 7:02 AM, Greg Watson wrote:


Ralph,

The only time we use the resolved names is when we get a map, so we  
consider them part of the map output.


If quasi-XML is all that will ever be possible with 1.3, then you  
may as well leave as-is and we will attempt to clean it up in  
Eclipse. It would be nice if a future version of ompi could output  
correct XML (including stdout) as this would vastly simplify the  
parsing we need to do.


Regards,

Greg

On Jan 13, 2009, at 3:30 PM, Ralph Castain wrote:

Hmmm...well, I can't do either for 1.3.0 as it is departing this  
afternoon.


The first option would be very hard to do. I would have to expose  
the display-map option across the code base and check it prior to  
printing anything about resolving node names. I guess I should  
ask: do you only want noderesolve statements when we are  
displaying the map? Right now, I will output them regardless.


The second option could be done. I could check if any "display"  
option has been specified, and output the  root at that time  
(likewise for the end). Anything we output in-between would be  
encapsulated between the two, but that would include any user  
output to stdout and/or stderr - which for 1.3.0 is not in xml.


Any thoughts?

Ralph

PS. Guess I should clarify that I was not striving for true XML  
interaction here, but rather a quasi-XML format that would help  
you to filter the output. I have no problem trying to get to  
something more formally correct, but it could be tricky in some  
places to achieve it due to the inherent async nature of the beast.



On Jan 13, 2009, at 12:17 PM, Greg Watson wrote:


Ralph,

The XML is looking better now, but there is still one problem. To  
be valid, there needs to be only one root element, but currently  
you don't have any (or many). So rather than:














the XML should be:













or:















Would either of these be possible?

Thanks,

Greg

On Dec 8, 2008, at 2:18 PM, Greg Watson wrote:


Ok thanks. I'll test from trunk in future.

Greg

On Dec 8, 2008, at 2:05 PM, Ralph Castain wrote:


Working its way around the CMR process now.

Might be easier in the future if we could test/debug this in  
the trunk, though. Otherwise, the CMR procedure will fall  
behind and a fix might miss a release window.


Anyway, hopefully this one will make the 1.3.0 release cutoff.

Thanks
Ralph

On Dec 8, 2008, at 9:56 AM, Greg Watson wrote:


Hi Ralph,

This is now in 1.3rc2, thanks. However there are a couple of  
problems. Here is what I see:


[Jarrah.watson.ibm.com:58957] resolved="Jarrah.watson.ibm.com">


For some reason each line is prefixed with "[...]", any idea  
why this is? Also the end tag should be "/>" not ">".


Thanks,

Greg

On Nov 24, 2008, at 3:06 PM, Greg Watson wrote:


Great, thanks. I'll take a look once it comes over to

Re: [OMPI devel] This is why we test

2009-01-15 Thread Jeff Squyres
Pasha and I *think* we have a fix.  However, we're not quite clear on  
this part of the code, so we need some more testing and eyes on the  
code.


I'll start the tests now -- given that this is a low-frequency bug,  
I'm going to run a slightly larger MTT run (several thousand tests)  
that'll take a few hours (not the ~12 hours that my MTT run took  
yesterday) and see if we can get reasonable confidence that we fixed it.



On Jan 15, 2009, at 9:05 AM, Jeff Squyres wrote:


Unfortunately, I have to throw the flag on the v1.3 release.  :-(

I ran ~16k tests via MTT yesterday on the rc5 and rc6 tarballs.  I  
found the following:


Found test runs: 15962
Passed: 15785 (98.89%)
Failed: 83 (0.52%)
--> Openib failures: 80 (0.50%)
Skipped: 46 (0.29%)
Timedout: 48 (0.30%)

The 80 openib failures are all seemingly random segv's.  I repeated  
a much smaller run this morning (about 700 runs) and still found a  
non-zero percentage of fails of the same flavor.


The timeouts are a little worrisome as well.

This unfortunately requires investigation.  :-(

--
Jeff Squyres
Cisco Systems




--
Jeff Squyres
Cisco Systems



Re: [OMPI devel] -display-map

2009-01-15 Thread Greg Watson

Ralph,

I think the second form would be ideal and would simplify things  
greatly.


Greg

On Jan 15, 2009, at 10:53 AM, Ralph Castain wrote:

Here is what I was able to do - note that the resolve messages are  
associated with the specific hostname, not the overall map:











Will that work for you? If you like, I can remove the name= field  
from the noderesolve element since the info is specific to the host  
element that contains it. In other words, I can make it look like  
this:











if that would help.

Ralph


On Jan 14, 2009, at 7:57 AM, Ralph Castain wrote:

We -may- be able to do a more formal XML output at some point. The  
problem will be the natural interleaving of stdout/err from the  
various procs due to the async behavior of MPI. Mpirun receives  
fragmented output in the forwarding system, limited by the buffer  
sizes and the amount of data we can read at any one "bite" from the  
pipes connecting us to the procs. So even though the user -thinks-  
they output a single large line of stuff, it may show up at mpirun  
as a series of fragments. Hence, it gets tricky to know how to put  
appropriate XML brackets around it.


Given this input about when you actually want resolved name info, I  
can at least do something about that area. Won't be in 1.3.0, but  
should make 1.3.1.


As for XML-tagged stdout/err: the OMPI community asked me not to  
turn that feature "on" for 1.3.0 as they felt it hasn't been  
adequately tested yet. The code is present, but cannot be activated  
in 1.3.0. However, I believe it is activated on the trunk when you  
do --xml --tagged-output, so perhaps some testing will help us  
debug and validate it adequately for 1.3.1?


Thanks
Ralph


On Jan 14, 2009, at 7:02 AM, Greg Watson wrote:


Ralph,

The only time we use the resolved names is when we get a map, so  
we consider them part of the map output.


If quasi-XML is all that will ever be possible with 1.3, then you  
may as well leave as-is and we will attempt to clean it up in  
Eclipse. It would be nice if a future version of ompi could output  
correct XML (including stdout) as this would vastly simplify the  
parsing we need to do.


Regards,

Greg

On Jan 13, 2009, at 3:30 PM, Ralph Castain wrote:

Hmmm...well, I can't do either for 1.3.0 as it is departing this  
afternoon.


The first option would be very hard to do. I would have to expose  
the display-map option across the code base and check it prior to  
printing anything about resolving node names. I guess I should  
ask: do you only want noderesolve statements when we are  
displaying the map? Right now, I will output them regardless.


The second option could be done. I could check if any "display"  
option has been specified, and output the  root at that  
time (likewise for the end). Anything we output in-between would  
be encapsulated between the two, but that would include any user  
output to stdout and/or stderr - which for 1.3.0 is not in xml.


Any thoughts?

Ralph

PS. Guess I should clarify that I was not striving for true XML  
interaction here, but rather a quasi-XML format that would help  
you to filter the output. I have no problem trying to get to  
something more formally correct, but it could be tricky in some  
places to achieve it due to the inherent async nature of the beast.



On Jan 13, 2009, at 12:17 PM, Greg Watson wrote:


Ralph,

The XML is looking better now, but there is still one problem.  
To be valid, there needs to be only one root element, but  
currently you don't have any (or many). So rather than:














the XML should be:













or:















Would either of these be possible?

Thanks,

Greg

On Dec 8, 2008, at 2:18 PM, Greg Watson wrote:


Ok thanks. I'll test from trunk in future.

Greg

On Dec 8, 2008, at 2:05 PM, Ralph Castain wrote:


Working its way around the CMR process now.

Might be easier in the future if we could test/debug this in  
the trunk, though. Otherwise, the CMR procedure will fall  
behind and a fix might miss a release window.


Anyway, hopefully this one will make the 1.3.0 release cutoff.

Thanks
Ralph

On Dec 8, 2008, at 9:56 AM, Greg Watson wrote:


Hi Ralph,

This is now in 1.3rc2, thanks. However there are a couple of  
problems. Here is what I see:


[Jarrah.watson.ibm.com:58957] resolved="Jarrah.watson.ibm.com">


For some reason each line is prefixed with "[...]", any idea  
why this is? Also the end tag should be "/>" not ">".

Re: [OMPI devel] -display-map

2009-01-15 Thread Ralph Castain
Okay, it is in the trunk as of r20284 - I'll file the request to have  
it moved to 1.3.1.


Let me know if you get a chance to test the stdout/err stuff in the  
trunk - we should try and iterate it so any changes can make 1.3.1 as  
well.


Thanks!
Ralph


On Jan 15, 2009, at 11:03 AM, Greg Watson wrote:


Ralph,

I think the second form would be ideal and would simplify things  
greatly.


Greg

On Jan 15, 2009, at 10:53 AM, Ralph Castain wrote:

Here is what I was able to do - note that the resolve messages are  
associated with the specific hostname, not the overall map:











Will that work for you? If you like, I can remove the name= field  
from the noderesolve element since the info is specific to the host  
element that contains it. In other words, I can make it look like  
this:











if that would help.

Ralph


On Jan 14, 2009, at 7:57 AM, Ralph Castain wrote:

We -may- be able to do a more formal XML output at some point. The  
problem will be the natural interleaving of stdout/err from the  
various procs due to the async behavior of MPI. Mpirun receives  
fragmented output in the forwarding system, limited by the buffer  
sizes and the amount of data we can read at any one "bite" from  
the pipes connecting us to the procs. So even though the user - 
thinks- they output a single large line of stuff, it may show up  
at mpirun as a series of fragments. Hence, it gets tricky to know  
how to put appropriate XML brackets around it.


Given this input about when you actually want resolved name info,  
I can at least do something about that area. Won't be in 1.3.0,  
but should make 1.3.1.


As for XML-tagged stdout/err: the OMPI community asked me not to  
turn that feature "on" for 1.3.0 as they felt it hasn't been  
adequately tested yet. The code is present, but cannot be  
activated in 1.3.0. However, I believe it is activated on the  
trunk when you do --xml --tagged-output, so perhaps some testing  
will help us debug and validate it adequately for 1.3.1?


Thanks
Ralph


On Jan 14, 2009, at 7:02 AM, Greg Watson wrote:


Ralph,

The only time we use the resolved names is when we get a map, so  
we consider them part of the map output.


If quasi-XML is all that will ever be possible with 1.3, then you  
may as well leave as-is and we will attempt to clean it up in  
Eclipse. It would be nice if a future version of ompi could  
output correct XML (including stdout) as this would vastly  
simplify the parsing we need to do.


Regards,

Greg

On Jan 13, 2009, at 3:30 PM, Ralph Castain wrote:

Hmmm...well, I can't do either for 1.3.0 as it is departing this  
afternoon.


The first option would be very hard to do. I would have to  
expose the display-map option across the code base and check it  
prior to printing anything about resolving node names. I guess I  
should ask: do you only want noderesolve statements when we are  
displaying the map? Right now, I will output them regardless.


The second option could be done. I could check if any "display"  
option has been specified, and output the  root at that  
time (likewise for the end). Anything we output in-between would  
be encapsulated between the two, but that would include any user  
output to stdout and/or stderr - which for 1.3.0 is not in xml.


Any thoughts?

Ralph

PS. Guess I should clarify that I was not striving for true XML  
interaction here, but rather a quasi-XML format that would help  
you to filter the output. I have no problem trying to get to  
something more formally correct, but it could be tricky in some  
places to achieve it due to the inherent async nature of the  
beast.



On Jan 13, 2009, at 12:17 PM, Greg Watson wrote:


Ralph,

The XML is looking better now, but there is still one problem.  
To be valid, there needs to be only one root element, but  
currently you don't have any (or many). So rather than:














the XML should be:













or:















Would either of these be possible?

Thanks,

Greg

On Dec 8, 2008, at 2:18 PM, Greg Watson wrote:


Ok thanks. I'll test from trunk in future.

Greg

On Dec 8, 2008, at 2:05 PM, Ralph Castain wrote:


Working its way around the CMR process now.

Might be easier in the future if we could test/debug this in  
the trunk, though. Otherwise, the CMR procedure will fall  
behind and a fix might miss a release window.


Anyway, hopefully this one will make the 1.3.0 release cutoff.

Thanks
Ralph


[OMPI devel] RFC: Eliminate opal_round_up_to_nearest_pow2()

2009-01-15 Thread Eugene Loh
I don't know what scope of changes requires RFCs, but here's a trivial 
change.


==

RFC: Eliminate opal_round_up_to_nearest_pow2().

WHAT: Eliminate the function opal_round_up_to_nearest_pow2().

WHY: It's poorly written.  A clean rewrite would take only
one line of code, which is best written in-place rather than
making a function call.  Currently, there is only one call
site, and I expect it to go away.  Similar code already inlines
such computation rather than calling this obscure, unused function.

WHERE: opal/util

WHEN: Upon acceptance.  For 1.4.

TIMEOUT: January 30, 2009.

==

WHY (details):

The function opal_round_up_to_nearest_pow2() is defined in
opal/util/pow2.c and is declared in the corresponding include
file.  It returns the calling argument rounded up to the
nearest power of 2.

This code is difficult to read.  That is, it is difficult to
reason about the code's correctness or range of validity or
its behavior outside the range of validity.  Meanwhile, it
offers no compelling advantage -- e.g., fast performance or
increased robustness.  It is called by only one site, which
is going away.

To use the existing function, one must know of its existence,
include the proper header file, and make the appropriate function
call.

In contrast, a cleanly written version of this code takes only
one line of code.  Hence, calling sites are cleaner and faster
if they simply in-line this computation.  Further, such code is
customizable (e.g., round down to a power of 2).  Most "round to
power of 2" computations in OMPI today already simply implement
this computation themselves rather than calling this obscure
function.  E.g., search on "pow2" in
ompi/mca/coll/tuned/coll_tuned_decision_fixed.c.

WHAT (details):

- Remove the file opal/util/pow2.c.

- Remove the file opal/util/pow2.h.

- In opal/util/Makefile.am, remove pow2.h and pow2.c

- In ompi/class/ompi_circular_buffer_fifo.h, if the call
 to opal_round_up_to_nearest_pow2() has not already been
 removed, then remove

#include "opal/util/pow2.h"

 and replace

size = opal_round_up_to_nearest_pow2(size_of_fifo);

 with

size = 1;
while (size < size_of_fifo)
    size <<= 1;

 or the even more compact

for (size = 1; size < size_of_fifo; size <<= 1);


Re: [OMPI devel] RFC: Eliminate opal_round_up_to_nearest_pow2()

2009-01-15 Thread George Bosilca
Absolutely! Why wait until the 1.4 while we can have that in the  
1.3.1...


  george.

On Jan 15, 2009, at 16:39, Eugene Loh wrote:

I don't know what scope of changes require RFCs, but here's a  
trivial change.


==

RFC: Eliminate opal_round_up_to_nearest_pow2().

WHAT: Eliminate the function opal_round_up_to_nearest_pow2().

WHY: It's poorly written.  A clean rewrite would take only
one line of code, which is best written in-place rather than
making a function call.  Currently, there is only one call
site, and I expect it to go away.  Similar code already inlines
such computation rather than calling this obscure, unused function.

WHERE: opal/util

WHEN: Upon acceptance.  For 1.4.

TIMEOUT: January 30, 2009.

==

WHY (details):

The function opal_round_up_to_nearest_pow2() is defined in
opal/util/pow2.c and is declared in the corresponding include
file.  It returns the calling argument rounded up to the
nearest power of 2.

This code is difficult to read.  That is, it is difficult to
reason about the code's correctness or range of validity or
its behavior outside the range of validity.  Meanwhile, it
offers no compelling advantage -- e.g., fast performance or
increased robustness.  It is called by only one site, which
is going away.

To use the existing function, one must know of its existence,
include the proper header file, and make the appropriate function
call.

In contrast, a cleanly written version of this code takes only
one line of code.  Hence, calling sites are cleaner and faster
if they simply in-line this computation.  Further, such code is
customizable (e.g., round down to a power of 2).  Most "round to
power of 2" computations in OMPI today already simply implement
this computation themselves rather than calling this obscure
function.  E.g., search on "pow2" in
ompi/mca/coll/tuned/coll_tuned_decision_fixed.c.

WHAT (details):

- Remove the file opal/util/pow2.c.

- Remove the file opal/util/pow2.h.

- In opal/util/Makefile.am, remove pow2.h and pow2.c

- In ompi/class/ompi_circular_buffer_fifo.h, if the call
to opal_round_up_to_nearest_pow2() has not already been
removed, then remove

   #include "opal/util/pow2.h"

and replace

   size = opal_round_up_to_nearest_pow2(size_of_fifo);

with

   size = 1;
   while ( size < size_of_fifo)
   size <<= 1;

or the even more compact

   for (size = 1; size < size_of_fifo; size <<= 1);




Re: [OMPI devel] RFC: Eliminate opal_round_up_to_nearest_pow2()

2009-01-15 Thread Jeff Squyres

Ditto; kill it.

I marginally prefer 1.4 because it really doesn't affect anything in  
the now-more-or-less-static 1.3 series, right?



On Jan 15, 2009, at 5:01 PM, George Bosilca wrote:

Absolutely! Why wait until the 1.4 while we can have that in the  
1.3.1...


 george.

On Jan 15, 2009, at 16:39, Eugene Loh wrote:

I don't know what scope of changes require RFCs, but here's a  
trivial change.


==

RFC: Eliminate opal_round_up_to_nearest_pow2().

WHAT: Eliminate the function opal_round_up_to_nearest_pow2().

WHY: It's poorly written.  A clean rewrite would take only
one line of code, which is best written in-place rather than
making a function call.  Currently, there is only one call
site, and I expect it to go away.  Similar code already inlines
such computation rather than calling this obscure, unused function.

WHERE: opal/util

WHEN: Upon acceptance.  For 1.4.

TIMEOUT: January 30, 2009.

==

WHY (details):

The function opal_round_up_to_nearest_pow2() is defined in
opal/util/pow2.c and is declared in the corresponding include
file.  It returns the calling argument rounded up to the
nearest power of 2.

This code is difficult to read.  That is, it is difficult to
reason about the code's correctness or range of validity or
its behavior outside the range of validity.  Meanwhile, it
offers no compelling advantage -- e.g., fast performance or
increased robustness.  It is called by only one site, which
is going away.

To use the existing function, one must know of its existence,
include the proper header file, and make the appropriate function
call.

In contrast, a cleanly written version of this code takes only
one line of code.  Hence, calling sites are cleaner and faster
if they simply in-line this computation.  Further, such code is
customizable (e.g., round down to a power of 2).  Most "round to
power of 2" computations in OMPI today already simply implement
this computation themselves rather than calling this obscure
function.  E.g., search on "pow2" in
ompi/mca/coll/tuned/coll_tuned_decision_fixed.c.

WHAT (details):

- Remove the file opal/util/pow2.c.

- Remove the file opal/util/pow2.h.

- In opal/util/Makefile.am, remove pow2.h and pow2.c

- In ompi/class/ompi_circular_buffer_fifo.h, if the call
to opal_round_up_to_nearest_pow2() has not already been
removed, then remove

  #include "opal/util/pow2.h"

and replace

  size = opal_round_up_to_nearest_pow2(size_of_fifo);

with

  size = 1;
  while ( size < size_of_fifo)
  size <<= 1;

or the even more compact

  for (size = 1; size < size_of_fifo; size <<= 1);



--
Jeff Squyres
Cisco Systems



Re: [OMPI devel] RFC: Eliminate opal_round_up_to_nearest_pow2()

2009-01-15 Thread Tim Mattox
Just for reference, if a power-of-2 computation is in a performance-critical
path, using something like this would be better than a loop:
http://aggregate.org/MAGIC/#Next%20Largest%20Power%20of%202
(it's not exactly the same function; subtract 1 from the input before
starting to get the same rounding behavior, AFAIK.)
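
That approach looks roughly like this -- a sketch of the classic
technique, adapted per the note above by subtracting 1 first (32-bit
unsigned assumed; an input of 0 returns 0):

    #include <stdint.h>

    /* Round x up to the nearest power of 2, branch-free.  Smearing
     * the top set bit of x-1 rightward yields all-ones below it, so
     * adding 1 gives the next power of 2; exact powers of 2 map to
     * themselves. */
    static inline uint32_t round_up_pow2(uint32_t x)
    {
        x -= 1;
        x |= x >> 1;
        x |= x >> 2;
        x |= x >> 4;
        x |= x >> 8;
        x |= x >> 16;
        return x + 1;
    }

On most compilers this is a handful of straight-line instructions,
which is what makes it attractive on a hot path.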

On Thu, Jan 15, 2009 at 4:39 PM, Eugene Loh  wrote:
> I don't know what scope of changes require RFCs, but here's a trivial
> change.
>
> ==
>
> RFC: Eliminate opal_round_up_to_nearest_pow2().
>
> WHAT: Eliminate the function opal_round_up_to_nearest_pow2().
>
> WHY: It's poorly written.  A clean rewrite would take only
> one line of code, which is best written in-place rather than
> making a function call.  Currently, there is only one call
> site, and I expect it to go away.  Similar code already inlines
> such computation rather than calling this obscure, unused function.
>
> WHERE: opal/util
>
> WHEN: Upon acceptance.  For 1.4.
>
> TIMEOUT: January 30, 2009.
>
> ==
>
> WHY (details):
>
> The function opal_round_up_to_nearest_pow2() is defined in
> opal/util/pow2.c and is declared in the corresponding include
> file.  It returns the calling argument rounded up to the
> nearest power of 2.
>
> This code is difficult to read.  That is, it is difficult to
> reason about the code's correctness or range of validity or
> its behavior outside the range of validity.  Meanwhile, it
> offers no compelling advantage -- e.g., fast performance or
> increased robustness.  It is called by only one site, which
> is going away.
>
> To use the existing function, one must know of its existence,
> include the proper header file, and make the appropriate function
> call.
>
> In contrast, a cleanly written version of this code takes only
> one line of code.  Hence, calling sites are cleaner and faster
> if they simply in-line this computation.  Further, such code is
> customizable (e.g., round down to a power of 2).  Most "round to
> power of 2" computations in OMPI today already simply implement
> this computation themselves rather than calling this obscure
> function.  E.g., search on "pow2" in
> ompi/mca/coll/tuned/coll_tuned_decision_fixed.c.
>
> WHAT (details):
>
> - Remove the file opal/util/pow2.c.
>
> - Remove the file opal/util/pow2.h.
>
> - In opal/util/Makefile.am, remove pow2.h and pow2.c
>
> - In ompi/class/ompi_circular_buffer_fifo.h, if the call
>  to opal_round_up_to_nearest_pow2() has not already been
>  removed, then remove
>
>#include "opal/util/pow2.h"
>
>  and replace
>
>size = opal_round_up_to_nearest_pow2(size_of_fifo);
>
>  with
>
>size = 1;
>while ( size < size_of_fifo)
>size <<= 1;
>
>  or the even more compact
>
>for (size = 1; size < size_of_fifo; size <<= 1);



-- 
Tim Mattox, Ph.D. - http://homepage.mac.com/tmattox/
 tmat...@gmail.com || timat...@open-mpi.org
I'm a bright... http://www.the-brights.net/


[OMPI devel] Open MPI v1.3rc7 has been posted

2009-01-15 Thread Tim Mattox
Hi All,
The seventh release candidate of Open MPI v1.3 is now available:

 http://www.open-mpi.org/software/ompi/v1.3/

Please run it through its paces as best you can.
Anticipated release of 1.3 is Friday...  of what month or year, I don't know...

This only differs from rc6 with an openib change... see ticket #1753:
https://svn.open-mpi.org/trac/ompi/ticket/1753
-- 
Tim Mattox, Ph.D. - http://homepage.mac.com/tmattox/
 tmat...@gmail.com || timat...@open-mpi.org
I'm a bright... http://www.the-brights.net/


Re: [OMPI devel] RFC: Eliminate opal_round_up_to_nearest_pow2()

2009-01-15 Thread Eugene Loh
Thanks.  For my edification:  are such trivial changes deserving of 
RFCs?  Perfect for RFCs?  Good for RFCs while I'm still getting my feet 
wet, but unnecessary once I get the hang of things?


1.4 was poor counting on my part:  1.3+1=1.4.  The new math.  I guess 
actually 1.3+1=1.3.1.  I'm fine with 1.3.1.  It's a small, safe change.  
The sooner the better.  But, I'm open to expert opinion.


Jeff Squyres wrote:


Ditto; kill it.

I marginally prefer 1.4 because it really doesn't affect anything in  
the now-more-or-less-static 1.3 series, right?


On Jan 15, 2009, at 5:01 PM, George Bosilca wrote:

Absolutely! Why wait until the 1.4 while we can have that in the  
1.3.1...


On Jan 15, 2009, at 16:39, Eugene Loh wrote:

I don't know what scope of changes require RFCs, but here's a  
trivial change.

==
RFC: Eliminate opal_round_up_to_nearest_pow2().




Re: [OMPI devel] Open MPI v1.3rc7 has been posted

2009-01-15 Thread Jeff Squyres
I did a large MTT run (about 7k tests) with the openib patch on rc6  
and all came out good.  I'll do the same run on rc7 -- the results  
should be identical.  Will post results tomorrow morning.


On Jan 15, 2009, at 5:24 PM, Tim Mattox wrote:


Hi All,
The seventh release candidate of Open MPI v1.3 is now available:

http://www.open-mpi.org/software/ompi/v1.3/

Please run it through its paces as best you can.
Anticipated release of 1.3 is Friday...  of what month or year, I  
don't know...


This only differs from rc6 with an openib change... see ticket #1753:
https://svn.open-mpi.org/trac/ompi/ticket/1753
--
Tim Mattox, Ph.D. - http://homepage.mac.com/tmattox/
tmat...@gmail.com || timat...@open-mpi.org
   I'm a bright... http://www.the-brights.net/



--
Jeff Squyres
Cisco Systems



Re: [OMPI devel] RFC: Eliminate opal_round_up_to_nearest_pow2()

2009-01-15 Thread Jeff Squyres

On Jan 15, 2009, at 5:36 PM, Eugene Loh wrote:

Thanks.  For my edification:  are such trivial changes deserving of  
RFCs?  Perfect for RFCs?  Good for RFCs while I'm still getting my  
feet wet, but unnecessary once I get the hang of things?


I think that once you're comfortable you can omit RFCs for these kinds  
of small things.


1.4 was poor counting on my part:  1.3+1=1.4.  The new math.  I  
guess actually 1.3+1=1.3.1.  I'm fine with 1.3.1.  It's a small,  
safe change.  The sooner the better.  But, I'm open to expert opinion.


What I was trying to say (but said poorly) in my previous mail was: if  
this change is really only code cleanup and has no other effect on the  
v1.3 series, then just leave it on the trunk and let it go into v1.4.   
I say this because the v1.3 series is effectively done; there won't be  
much/any new development on it from here on out.


If there's a reason to put it into v1.3.1 (e.g., it's in the  
performance-critical path and the new one is better), then put it in  
for v1.3.1.


--
Jeff Squyres
Cisco Systems