Re: [OMPI devel] RFC 1/1: improvements to the "notifier" framework and ORTE WDC

2010-03-30 Thread Abhishek Kulkarni


On Mar 29, 2010, at 9:16 PM, Ralph Castain wrote:



On Mar 29, 2010, at 5:53 PM, Abhishek Kulkarni wrote:




On Mon, 29 Mar 2010, Sylvain Jeaugey wrote:


Hi Ralph,

For now, I think that yes, this is a unique identifier. However,
in my opinion, this could be improved in the future by replacing it
with a unique string.


Something like :

#define ORTE_NOTIFIER_DEFINE_EVENT(eventstr, associated_text) {
  static int event = -1;
  if (OPAL_UNLIKELY(event == -1)) {
    event = opal_sos_create_new_event(eventstr, associated_text);
  }
  ..
}

This would move the event numbering to the OPAL layer, making it
transparent to the developer.




This is a good suggestion, but then I think we end up relying on  
run-time generation of the event numbers and have to pay the extra  
cost of looking up the event in a list/array/hash each time we log  
the event.


Since it is -solely- intended to be in an error path, I fail to see  
the concern here.


My bad. Clearly I misunderstood here -- mostly because I vaguely
remember (from [1]) that the original motivation was to put
conditional #ifdef'd hooks in the "fast path" as well. But if they
ought to be on the "slow path", I think it would be fair enough to
consider Sylvain's suggestion of pushing the event numbering to SOS.
In that case, the SOS hashtable could map the notifier events to their
unique identifiers, and the threshold counter itself could be encoded
inside the identifier returned by SOS.

[1] http://www.open-mpi.org/community/lists/devel/2009/05/6132.php
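
As a rough illustration of what "encoding the threshold counter inside the identifier" could look like, here is a minimal sketch in C. Everything in it is an assumption made for illustration: the 16/16 bit split, the names event_pack(), event_id() and event_threshold(), and the use of a 32-bit value. None of these come from the SOS or WDC branches.

```c
#include <stdint.h>

/* Hypothetical layout: low 16 bits hold the event id,
 * high 16 bits hold the warning threshold. */
#define EVENT_ID_BITS 16u
#define EVENT_ID_MASK ((1u << EVENT_ID_BITS) - 1u)

/* Combine an id and a threshold into one 32-bit identifier. */
uint32_t event_pack(uint32_t id, uint32_t threshold)
{
    return (threshold << EVENT_ID_BITS) | (id & EVENT_ID_MASK);
}

/* Recover the event id from a packed identifier. */
uint32_t event_id(uint32_t packed)
{
    return packed & EVENT_ID_MASK;
}

/* Recover the threshold from a packed identifier. */
uint32_t event_threshold(uint32_t packed)
{
    return packed >> EVENT_ID_BITS;
}
```

With such a layout the logging code would need no table lookup to find the threshold, since both values travel in the one integer handed back at registration time.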





From what I understand, and from the discussions that took place
when this proposal was first put up on the devel list, is that since
the event tracing hooks could lie in the critical path, we want the
overhead to be as low as possible. By manually defining the unique
identifiers, we can generate the event tracing macro at compile-time
and have a minimal tracing impact.


Surely you jest - yes?? The event tracing hooks should -never- be in  
the critical path. The notifier is intended -solely- to be called  
when an error (or some other critical event) has already been  
detected. The idea was that we detect an error, and then (if  
selected) notify someone about it.


The last thing we want to do, IMHO, is put the notifier in a  
critical path. If we do, I personally will regret having created  
it :-)





My 2¢ of course.

Thanks
Abhishek


Just my 2 cents ...

Sylvain

On Mon, 29 Mar 2010, Ralph Castain wrote:


Hi Abhishek
I'm confused by the WDC wiki page, specifically the part about the
new ORTE_NOTIFIER_DEFINE_EVENT macro. Are you saying
that I (as the developer) have to provide this macro with a unique
notifier id? So that would mean that ORTE/OMPI would
have to maintain a global notifier id counter to ensure it is  
unique?


If so, that seems really cumbersome. Could you please clarify?

Thanks
Ralph

On Mar 29, 2010, at 8:57 AM, Abhishek Kulkarni wrote:

  
 ==
 [RFC 1/2]
 ==

 WHAT: Merge improvements to the "notifier" framework from the OPAL
 SOS and the ORTE WDC mercurial branches into the SVN trunk.

 WHY: Some improvements and interface changes were put into the ORTE
 notifier framework during the development of the OPAL SOS[1] and
 ORTE WDC[2] branches.

 WHERE: Mostly restricted to ORTE notifier files and files using the
 notifier interface in OMPI.

 TIMEOUT: The weekend of April 2-3.

 REFERENCE MERCURIAL REPOS:
 * SOS development: http://bitbucket.org/jsquyres/opal-sos-fixed/
 * WDC development: http://bitbucket.org/derbeyn/orte-wdc-fixed/

 ==

 BACKGROUND:

 The notifier interface and its components underwent a host of
 improvements and changes during the development of the SOS[1] and
 the WDC[2] branches.  The ORTE WDC (Warning Data Capture) branch
 enables accounting of events through the use of notifier interface,
 whereas OPAL SOS uses the notifier interface by setting up callbacks
 to relay out logged events.

 Some of the improvements include:

 - added more severity levels.
 - "ftb" notifier improvements.
 - "command" notifier improvements.
 - added "file" notifier component
 - changes in the notifier modules selection
 - activate only a subset of the callbacks
   (i.e. any combination of log, help, log_peer)
 - define different output media for any given callback (e.g.
   log_peer can be redirected to the syslog and smtp, while the
   show_help can be sent to the hnp).
 - ORTE_NOTIFIER_LOG_EVENT() (that accounts and warns about unusual
   events)

 Much more information is available on these t

Re: [OMPI devel] RFC 1/1: improvements to the "notifier" framework and ORTE WDC

2010-03-30 Thread Sylvain Jeaugey

On Mon, 29 Mar 2010, Abhishek Kulkarni wrote:


#define ORTE_NOTIFIER_DEFINE_EVENT(eventstr, associated_text) {
  static int event = -1;
  if (OPAL_UNLIKELY(event == -1)) {
    event = opal_sos_create_new_event(eventstr, associated_text);
  }
  ..
}

This is a good suggestion, but then I think we end up relying on run-time 
generation of the event numbers

Yes.

and have to pay the extra cost of looking up the event in a 
list/array/hash each time we log the event.
No. Of course not, that's the point of the "static int" here. The
"create_new_event" function will only be called once; the event is then
stored and used directly whenever we enter this code again.


But yes, I'm adding an "if", which may cost a little more than just the 
counter increment.
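
To make the caching concrete, here is a minimal sketch of the pattern in plain C. The create_new_event() function is a hypothetical stand-in for the proposed opal_sos_create_new_event(), the event string is made up, and OPAL_UNLIKELY is left out so the sketch compiles without the OPAL headers.

```c
/* Counts how many times the (hypothetical) registration routine ran. */
int registrations = 0;

/* Hypothetical stand-in for the proposed opal_sos_create_new_event():
 * hands out consecutive ids and records that it was called. */
static int create_new_event(const char *eventstr, const char *associated_text)
{
    static int next_id = 0;
    (void)eventstr;
    (void)associated_text;
    registrations++;
    return next_id++;
}

/* One logging site. The function-local static keeps the id between
 * calls, so registration happens on the first call only; later calls
 * pay for one comparison and nothing else. */
int log_link_down(void)
{
    static int event = -1;
    if (event == -1) {
        event = create_new_event("link_down", "network link went down");
    }
    return event;
}
```

Calling log_link_down() any number of times returns the same id, and the registration counter stays at 1 after the first call.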


From what I understand, and from the discussions that took place when this 
proposal was first put up on the devel list, is that since the event tracing 
hooks could lie in the critical path, we want the overhead to be as low as 
possible. By manually defining the unique identifiers, we can generate the 
event tracing macro at compile-time and have a minimal tracing impact.
Not in the critical path. And from my point of view, not on error paths
either. I prefer to talk about a "slow path": not critical, but slow.


Sylvain


On Mon, 29 Mar 2010, Ralph Castain wrote:


 Hi Abhishek
 I'm confused by the WDC wiki page, specifically the part about the
 new ORTE_NOTIFIER_DEFINE_EVENT macro. Are you saying
 that I (as the developer) have to provide this macro with a unique
 notifier id? So that would mean that ORTE/OMPI would
 have to maintain a global notifier id counter to ensure it is unique?

 If so, that seems really cumbersome. Could you please clarify?

 Thanks
 Ralph

 On Mar 29, 2010, at 8:57 AM, Abhishek Kulkarni wrote:


   ==
   [RFC 1/2]
   ==

   WHAT: Merge improvements to the "notifier" framework from the OPAL
   SOS
   and the ORTE WDC mercurial branches into the SVN trunk.

   WHY: Some improvements and interface changes were put into the ORTE
      notifier framework during the development of the OPAL SOS[1] and
      ORTE WDC[2] branches.

   WHERE: Mostly restricted to ORTE notifier files and files using the
    notifier interface in OMPI.

   TIMEOUT: The weekend of April 2-3.

   REFERENCE MERCURIAL REPOS:
   * SOS development: http://bitbucket.org/jsquyres/opal-sos-fixed/
   * WDC development: http://bitbucket.org/derbeyn/orte-wdc-fixed/

   ==

   BACKGROUND:

   The notifier interface and its components underwent a host of
   improvements and changes during the development of the SOS[1] and
   the
   WDC[2] branches.  The ORTE WDC (Warning Data Capture) branch enables
   accounting of events through the use of notifier interface, whereas
   OPAL SOS uses the notifier interface by setting up callbacks to
   relay
   out logged events.

   Some of the improvements include:

   - added more severity levels.
   - "ftb" notifier improvements.
   - "command" notifier improvements.
   - added "file" notifier component
   - changes in the notifier modules selection
   - activate only a subset of the callbacks
   (i.e. any combination of log, help, log_peer)
   - define different output media for any given callback (e.g.
   log_peer
   can be redirected to the syslog and smtp, while the show_help can be
   sent to the hnp).
   - ORTE_NOTIFIER_LOG_EVENT() (that accounts and warns about unusual
   events)

   Much more information is available on these two wiki pages:

   [1] http://svn.open-mpi.org/trac/ompi/wiki/ErrorMessages
   [2] http://svn.open-mpi.org/trac/ompi/wiki/ORTEWDC

   NOTE: This is first of a two-part RFC to bring the SOS and WDC
   branches
   to the trunk. This only brings in the "notifier" changes from the
   SOS
   branch, while the rest of the branch will be brought over after the
   timeout of the second RFC.

   ==
   ___
   devel mailing list
   de...@open-mpi.org
   http://www.open-mpi.org/mailman/listinfo.cgi/devel






[OMPI devel] Some questions about checkpoint/restart (8)

2010-03-30 Thread Takayuki Seki

The 8th question is as follows:

(8) The result of communication using derived datatypes constructed with
MPI_Type_vector or MPI_Type_indexed is incorrect after taking a checkpoint.

Framework : datatype
Component : datatype
The source file   : ompi/datatype/dt_copy.c
The function name : ompi_ddt_copy_content_same_ddt

Framework : crcp
Component : bkmrk
The source file   : ompi/mca/crcp/bkmrk/crcp_bkmrk_pml.c
The function name : ?

Here's the code that causes the problem:

#define SLPTIME 60
#define ITEMNUM 10

int buf[ITEMNUM][ITEMNUM];

  MPI_Type_vector(10,1,10,MPI_INT,&newdt);
  MPI_Type_commit(&newdt);
  MPI_Barrier(MPI_COMM_WORLD);

  if (rank == 0) {
    MPI_Isend(&buf[0][0],1,newdt,1,1000,MPI_COMM_WORLD,&req);
    printf(" rank=%d sleep start \n",rank); fflush(stdout);
    sleep(SLPTIME);  /** take checkpoint at this point **/
    printf(" rank=%d sleep end   \n",rank); fflush(stdout);
    MPI_Wait(&req,&sts);
    MPI_Type_free(&newdt);
  }
  else {
    printf(" rank=%d sleep start \n",rank); fflush(stdout);
    sleep(SLPTIME);  /** take checkpoint at this point **/
    printf(" rank=%d sleep end   \n",rank); fflush(stdout);
    MPI_Irecv(&buf[0][0],1,newdt,0,1000,MPI_COMM_WORLD,&req);
    MPI_Wait(&req,&sts);
    MPI_Type_free(&newdt);
  }

  for (i=0;isize=1]
  wait_quiesce_drained:xx=0 0
  wait_quiesce_drained:xx=1 100
  wait_quiesce_drained:xx=2 200
  wait_quiesce_drained:xx=3 300
  wait_quiesce_drained:xx=4 400
  wait_quiesce_drained:xx=5 500
  wait_quiesce_drained:xx=6 600
  wait_quiesce_drained:xx=7 700
  wait_quiesce_drained:xx=8 800
  wait_quiesce_drained:xx=9 900
  ompi_ddt_copy_content_same_ddt:Start size=40 flag=102/4 count=1

* I think the receiver received the message correctly in bkmrk.
  The received messages are contiguous.

* I think the problem is the copy processing in ompi_ddt_copy_content_same_ddt.
  Or is it wrong to use the ompi_ddt_copy_content_same_ddt function here?

* The first argument (datatype) of the ompi_ddt_copy_content_same_ddt function
  in drain_message_copy_remove is the one specified by the user's application.
  The hexadecimal value of datatype->flags is 0x102.
  It does not contain DT_FLAG_CONTIGUOUS, which suggests a derived datatype.

* I think the problem occurs in the following parts of the
  ompi_ddt_copy_content_same_ddt function.
  Both source and destination use the same datatype information, which is
  specified by the user's application.
  But the source (the messages received in bkmrk) is a simple contiguous
  message.

  ---
   destination += datatype->true_lb;
   source  += datatype->true_lb;

  ---
  ptrdiff_t extent = (datatype->ub - datatype->lb);
   destination += extent;
   source += extent;

  ---
  pStack = (dt_stack_t*)alloca( sizeof(dt_stack_t) * (datatype->btypes[DT_LOOP] + 1) );
   source  = (unsigned char*)source_base + pStack->disp;
   destination = (unsigned char*)destination_base + pStack->disp;

* If the source datatype is different from the destination datatype,
  should the ompi_ddt_copy_content_same_ddt function not be used?
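
The layout mismatch described above can be illustrated without MPI. The following is a minimal sketch in plain C, not Open MPI code: unpack_column() is a hypothetical helper that scatters a contiguous (packed) buffer into one column of a row-major ITEMNUM x ITEMNUM array, which is the layout that MPI_Type_vector(10,1,10,MPI_INT,...) describes. The source advances by one element per step while the destination advances by a whole row.

```c
#define ITEMNUM 10

/* Hypothetical helper: copy `count` contiguous ints into column `col`
 * of a row-major ITEMNUM x ITEMNUM array.
 * Source stride is 1 element; destination stride is ITEMNUM elements.
 * Applying the destination stride to the packed source as well would
 * read far beyond the end of the packed data. */
void unpack_column(int dst[ITEMNUM][ITEMNUM], int col,
                   const int *packed, int count)
{
    for (int i = 0; i < count; i++) {
        dst[i][col] = packed[i];
    }
}
```

If both sides are walked with the same strided description, only the first element lines up, which would be consistent with the incorrect result reported after restart.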


-bash-3.2$ cat t_mpi_question-8.c
#include 
#include 
#include 
#include "mpi.h"

#define SLPTIME 60
#define ITEMNUM 10

int buf[ITEMNUM][ITEMNUM];
int main(int ac,char **av)
{
  int rank,size,cc,i,j;
  MPI_Request req;
  MPI_Status sts;
  MPI_Datatype newdt;

  MPI_Init(&ac,&av);
  MPI_Comm_rank(MPI_COMM_WORLD,&rank);
  MPI_Comm_size(MPI_COMM_WORLD,&size);

  for (i=0;i
#include 
#include 
#include "mpi.h"

#define SLPTIME 60
#define ITEMNUM 10

int buf[ITEMNUM][ITEMNUM];
int main(int ac,char **av)
{
  int rank,size,cc,i,j;
  MPI_Request req;
  MPI_Status sts;
  MPI_Datatype newdt;
  int block_length[ITEMNUM];
  int disp[ITEMNUM];

  MPI_Init(&ac,&av);
  MPI_Comm_rank(MPI_COMM_WORLD,&rank);
  MPI_Comm_size(MPI_COMM_WORLD,&size);

  for (i=0;i
#include 
#include 
#include "mpi.h"

#define ITEMNUM_1  10
#define SLPTIME    60

int buf[ITEMNUM_1][ITEMNUM_1];
int main(int ac,char **av)
{
  int rank,size,cc,i,j,k;
  MPI_Request req;
  MPI_Status sts;
  MPI_Datatype newdt;
  int itmnum,newdt_size;
  int b_l[3];
  MPI_Aint dp[3],newdt_extent,newdt_lb,newdt_ub;
  MPI_Datatype dt[3];

  itmnum = 10;
  rank=0;
  MPI_Init(&ac,&av);
  MPI_Comm_rank(MPI_COMM_WORLD,&rank);
  MPI_Comm_size(MPI_COMM_WORLD,&size);

  for (i=0;i

Re: [OMPI devel] RFC 1/1: improvements to the "notifier" framework and ORTE WDC

2010-03-30 Thread Nadia Derbey
On Mon, 2010-03-29 at 09:37 -0600, Ralph Castain wrote:
> Hi Abhishek
> 
> 
> I'm confused by the WDC wiki page, specifically the part about the new
> ORTE_NOTIFIER_DEFINE_EVENT macro. Are you saying that I (as the
> developer) have to provide this macro with a unique notifier id?

Hi Ralph,

Actually ORTE_NOTIFIER_DEFINE_EVENT(, ) expands to a static
inline routine notifier_log_event_(). So I would say there is a one
to one relationship between an event id and a log_event routine. So
there is no need to do a lookup inside an array or a list.
So yes the event identifier needs to be unique, but only inside a single
source file: you can perfectly well call ORTE_NOTIFIER_DEFINE_EVENT(0,
) in a .c file and ORTE_NOTIFIER_DEFINE_EVENT(0, ) in
another one.
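
A minimal sketch of this idea, under stated assumptions: the macro name DEFINE_EVENT, the threshold parameter, the id-suffixed routine name and the return convention below are all hypothetical and are not the actual definition from the WDC branch. The sketch only shows how token pasting can give each event id its own static inline routine with its own counter, so that no lookup is needed and ids only have to be unique within one source file.

```c
/* Hypothetical simplification: generate one static inline logger per
 * event id. Being static, the generated routine is private to the
 * source file, so another .c file can reuse the same id. */
#define DEFINE_EVENT(id, threshold)                                    \
    static inline int notifier_log_event_##id(void)                    \
    {                                                                  \
        static int count = 0;   /* occurrences of this event so far */ \
        count++;                                                       \
        return count >= (threshold);  /* 1 once threshold is reached */\
    }

DEFINE_EVENT(0, 3)   /* warn from the third occurrence on */
DEFINE_EVENT(1, 1)   /* warn on every occurrence */

/* Example call sites; each resolves to its routine at compile time. */
int report_retry(void)   { return notifier_log_event_0(); }
int report_failure(void) { return notifier_log_event_1(); }
```

Here report_retry() returns 0 for the first two calls and 1 from the third call on, while report_failure() returns 1 immediately.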

Now, we could centralize the event ids in a .h file in the notifier
framework, but the purpose here would only be to have something
"cleaner".


>  So that would mean that ORTE/OMPI would have to maintain a global
> notifier id counter to ensure it is unique?

From what I said before, we don't need this.

Regards,
Nadia
> 
> 
> If so, that seems really cumbersome. Could you please clarify?
> 
> 
> Thanks
> Ralph
> 
> On Mar 29, 2010, at 8:57 AM, Abhishek Kulkarni wrote:
> 
> > 
> > ==
> > [RFC 1/2]
> > ==
> > 
> > WHAT: Merge improvements to the "notifier" framework from the OPAL
> > SOS
> > and the ORTE WDC mercurial branches into the SVN trunk.
> > 
> > WHY: Some improvements and interface changes were put into the ORTE
> >notifier framework during the development of the OPAL SOS[1] and
> >ORTE WDC[2] branches.
> > 
> > WHERE: Mostly restricted to ORTE notifier files and files using the
> >  notifier interface in OMPI.
> > 
> > TIMEOUT: The weekend of April 2-3.
> > 
> > REFERENCE MERCURIAL REPOS:
> > * SOS development: http://bitbucket.org/jsquyres/opal-sos-fixed/
> > * WDC development: http://bitbucket.org/derbeyn/orte-wdc-fixed/
> > 
> > ==
> > 
> > BACKGROUND:
> > 
> > The notifier interface and its components underwent a host of
> > improvements and changes during the development of the SOS[1] and
> > the
> > WDC[2] branches.  The ORTE WDC (Warning Data Capture) branch enables
> > accounting of events through the use of notifier interface, whereas
> > OPAL SOS uses the notifier interface by setting up callbacks to
> > relay
> > out logged events.
> > 
> > Some of the improvements include:
> > 
> > - added more severity levels.
> > - "ftb" notifier improvements.
> > - "command" notifier improvements.
> > - added "file" notifier component
> > - changes in the notifier modules selection
> > - activate only a subset of the callbacks
> > (i.e. any combination of log, help, log_peer)
> > - define different output media for any given callback (e.g.
> > log_peer
> > can be redirected to the syslog and smtp, while the show_help can be
> > sent to the hnp).
> > - ORTE_NOTIFIER_LOG_EVENT() (that accounts and warns about unusual
> > events)
> > 
> > Much more information is available on these two wiki pages:
> > 
> > [1] http://svn.open-mpi.org/trac/ompi/wiki/ErrorMessages
> > [2] http://svn.open-mpi.org/trac/ompi/wiki/ORTEWDC
> > 
> > NOTE: This is first of a two-part RFC to bring the SOS and WDC
> > branches
> > to the trunk. This only brings in the "notifier" changes from the
> > SOS
> > branch, while the rest of the branch will be brought over after the
> > timeout of the second RFC.
> > 
> > ==
-- 
Nadia Derbey