Re: [hpx-users] parallel_executor::post dominating APEX trace

2019-12-18 Thread Simberg Mikael
Kor,


I think you've just found a bug (all my fault...) at a very good time. There 
are indeed configuration options that affect how the scheduler steals tasks and 
it looks like I've set them to very inappropriate values. Stay tuned for a PR.


On the second point, you're probably seeing your hpx_main/main function labeled
as run_helper. Because of some changes in APEX, a task in the OTF file is
labeled with the first name it was given. We don't have a good solution for
this yet. As a hack, you could try doing this as the first thing in your main
function (note: that's main if you include hpx_main.hpp, or hpx_main if you
include hpx_init.hpp or hpx_start.hpp):


hpx::util::annotate_function annotation("my_main");

hpx::this_thread::yield();


Once the task is rescheduled it should have the label "my_main".


Mikael


From: hpx-users-boun...@stellar.cct.lsu.edu on behalf of Jong, K. de (Kor)
Sent: Wednesday, December 18, 2019 3:20:59 PM
To: hpx-users@stellar.cct.lsu.edu
Subject: Re: [hpx-users] parallel_executor::post dominating APEX trace

Hi Mikael,

Thank you for your detailed and helpful answer! It starts to make sense
to me now. My program almost behaves as I expect it should. I still have
two questions though. Maybe you or someone else can point me in the
right direction?

1. I run my program on a single node and expect that all threads receive
about the same number of same-sized tasks. The APEX trace in Vampir
shows that all threads start busy but after a while gaps appear on some
OS threads during which nothing seems to happen, while other threads are
still performing tasks. I would expect tasks to be more evenly
distributed and/or to be stolen from the task queues of other OS
threads. Is this assumption correct? Can I increase the tendency of the
scheduler to steal tasks to keep OS threads busy?

2. I perform scaling tests, and each time my tasks run in parallel there
is a serial 'run_helper' task that runs on a single OS thread. What is
this and can I somehow keep it out of my timings? Based on a quick look
at the HPX code I concluded that run_helper has to do with initializing
the HPX runtime. But even if I run my tasks multiple times (from the
same running process), run_helper takes up time before my tasks run.

I focus on a single node now (1-96 OS threads) and am not doing anything
too clever, I think. I don't tweak the bindings and scheduler yet.

Thanks!

Kor
___
hpx-users mailing list
hpx-users@stellar.cct.lsu.edu
https://mail.cct.lsu.edu/mailman/listinfo/hpx-users


Re: [hpx-users] parallel_executor::post dominating APEX trace

2019-12-18 Thread Jong, K. de (Kor)
Hi Mikael,

Thank you for your detailed and helpful answer! It starts to make sense 
to me now. My program almost behaves as I expect it should. I still have 
two questions though. Maybe you or someone else can point me in the
right direction?

1. I run my program on a single node and expect that all threads receive 
about the same number of same-sized tasks. The APEX trace in Vampir 
shows that all threads start busy but after a while gaps appear on some 
OS threads during which nothing seems to happen, while other threads are 
still performing tasks. I would expect tasks to be more evenly 
distributed and/or to be stolen from the task queues of other OS 
threads. Is this assumption correct? Can I increase the tendency of the 
scheduler to steal tasks to keep OS threads busy?

2. I perform scaling tests, and each time my tasks run in parallel there 
is a serial 'run_helper' task that runs on a single OS thread. What is 
this and can I somehow keep it out of my timings? Based on a quick look 
at the HPX code I concluded that run_helper has to do with initializing 
the HPX runtime. But even if I run my tasks multiple times (from the
same running process), run_helper takes up time before my tasks run.

I focus on a single node now (1-96 OS threads) and am not doing anything 
too clever, I think. I don't tweak the bindings and scheduler yet.

Thanks!

Kor


Re: [hpx-users] parallel_executor::post dominating APEX trace

2019-12-17 Thread Simberg Mikael
Hi Kor,


hpx::parallel::execution::parallel_executor::post is what eventually gets 
called if you use hpx::apply (or future::then) to create tasks. post itself 
should not take up much time, even in extreme cases. It's hard to say for sure 
without more information about your application but there are at least a few 
possibilities:


  1.  You haven't annotated your tasks and your tasks end up with the default 
name set by the executor. Have you annotated your tasks and do you see them at 
all in your traces?
  2.  You have annotated your tasks and we haven't fixed the annotations for 
all use cases. Are you actually using actions or just plain local functions? 
hpx::apply or hpx::async?
  3.  Your task size is too small for our overheads. What is a typical task 
size in your case? We typically recommend a task size of at least 1 ms to be on 
the safe side, but you can most likely go a bit smaller than that, especially 
if you don't have too many cores on your machine. I think if this were the case 
the symptoms would be a bit different though, so it's most likely not this.
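
A rough way to act on point 3, if tasks do turn out to be too small, is to
group several work items into one task so that each task runs for at least the
recommended 1 ms. The sketch below uses only the C++ standard library and is
not part of HPX. The helper names items_per_task and task_count are
hypothetical, and the per-item cost is assumed to have been measured
separately (for example with std::chrono::steady_clock).

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>

// Hypothetical helper (not an HPX API): number of work items to group into
// one task so that the task runs for at least `target`. The default target
// of 1 ms is the rule of thumb quoted above.
inline std::size_t items_per_task(
    std::chrono::nanoseconds per_item,
    std::chrono::nanoseconds target = std::chrono::milliseconds(1))
{
    assert(per_item.count() > 0);
    if (per_item >= target)
        return 1;    // a single item is already long enough
    // Round up so the grouped task is never shorter than the target.
    auto const n = (target.count() + per_item.count() - 1) / per_item.count();
    return static_cast<std::size_t>(n);
}

// Hypothetical helper: number of tasks that result from grouping
// `total_items` into chunks of `per_task` items (the last chunk may be
// smaller).
inline std::size_t task_count(std::size_t total_items, std::size_t per_task)
{
    assert(per_task > 0);
    return (total_items + per_task - 1) / per_task;
}
```

For example, with a measured cost of 10 us per item, items_per_task suggests
100 items per task, so 1000 items become 10 tasks. Larger groups reduce
scheduling overhead but also leave fewer tasks to balance across cores, so
the 1 ms target is a starting point rather than a fixed rule.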

Mikael

From: hpx-users-boun...@stellar.cct.lsu.edu on behalf of Jong, K. de (Kor)
Sent: Tuesday, December 17, 2019 4:14:51 PM
To: hpx-users@stellar.cct.lsu.edu
Subject: [hpx-users] parallel_executor::post dominating APEX trace

Hi list,

I am using HPX commit bbc3ad7 (1.4.0-rc2 + APEX linking fix) and APEX to
gain insights into the run time behavior of my program (and hopefully
improve it, based on that). The trace I am looking at shows that by far
most of the time is spent by

hpx::parallel::execution::parallel_executor::post

instead of in my actions. Maybe this makes complete sense. Can someone
maybe explain in what situations parallel_executor::post would take up a
lot of time?

Thanks!

Kor


[hpx-users] parallel_executor::post dominating APEX trace

2019-12-17 Thread Jong, K. de (Kor)
Hi list,

I am using HPX commit bbc3ad7 (1.4.0-rc2 + APEX linking fix) and APEX to 
gain insights into the run time behavior of my program (and hopefully 
improve it, based on that). The trace I am looking at shows that by far 
most of the time is spent by

hpx::parallel::execution::parallel_executor::post

instead of in my actions. Maybe this makes complete sense. Can someone 
maybe explain in what situations parallel_executor::post would take up a 
lot of time?

Thanks!

Kor