Re: Why do remaining HECs busy-wait during unsafe-FFI calls?

2013-05-06 Thread Andreas Voellmy
When an unsafe call is made, the OS thread currently running on the HEC
makes the call without releasing the HEC. If the main thread was on the run
queue of the HEC making the foreign unsafe call when the foreign call was
made, then no other HECs will pick up the main thread. Hence the two sleep
calls in your program happen sequentially instead of concurrently.
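
For contrast, here is a minimal sketch (not part of the original program, and not measured against GHC 7.6.3) of the same C function imported `safe`. With a safe import the calling OS thread releases its HEC before entering the foreign call, so other Haskell threads on that run queue can still be scheduled. The names c_sleep_safe and mainThreadDelay are hypothetical.

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}
module Main (main) where

import Control.Concurrent (forkIO, newEmptyMVar, putMVar, takeMVar, yield)
import Data.Time.Clock.POSIX (getPOSIXTime)
import Foreign.C.Types (CUInt (..))

-- Same C function as in the original program, but imported "safe":
-- the HEC is released for the duration of the foreign call.
foreign import ccall safe "unistd.h sleep"
  c_sleep_safe :: CUInt -> IO CUInt

-- Fork a thread that sleeps for one second in C, yield to it, and
-- return how many seconds passed before the main thread ran again.
mainThreadDelay :: IO Double
mainThreadDelay = do
  done <- newEmptyMVar
  t0 <- getPOSIXTime
  _ <- forkIO (c_sleep_safe 1 >> putMVar done ())
  yield                      -- let the forked thread enter the foreign call
  t1 <- getPOSIXTime
  takeMVar done              -- wait for the sleeper before returning
  return (realToFrac (t1 - t0))

main :: IO ()
main = do
  d <- mainThreadDelay
  putStrLn ("main thread resumed after " ++ show d ++ " s")
```

Built with -threaded, I would expect the reported delay to be close to zero, and close to the sleep length if the import is switched back to unsafe. Treat that as a prediction to check, not a measured result.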

I'm not completely sure what is causing the busy wait, but here is one
guess: when a GC is triggered on one HEC, it signals to all the other HECs
to stop the mutator and run the collection.  This waiting may be a busy
wait, because the wait is typically brief.  If this is true, then since one
thread is off in an unsafe foreign call, there is one HEC that refuses to
start the GC and all the other HECs are busy-waiting for the signal.  The
GC could be triggered by a period of inactivity.  Again, this is just a
guess - you might try to verify this by turning off the periodic triggering
of GC and checking whether the start GC barrier is a busy-wait.
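
One way to probe this guess, sketched under assumptions: request a GC explicitly from the main thread while a second HEC is inside an unsafe call, and time how long the request takes. The helper name gcDuringUnsafeCall is hypothetical, and the sketch assumes the main thread stays on capability 0 while the sleeper is pinned to capability 1 with forkOn.

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}
module Main (main) where

import Control.Concurrent (forkOn, threadDelay)
import Data.Time.Clock.POSIX (getPOSIXTime)
import Foreign.C.Types (CUInt (..))
import System.Mem (performGC)

foreign import ccall unsafe "unistd.h sleep"
  c_sleep_unsafe :: CUInt -> IO CUInt

-- Time an explicit major GC that is requested while another
-- capability is (presumably) stuck inside an unsafe foreign call.
gcDuringUnsafeCall :: IO Double
gcDuringUnsafeCall = do
  -- Pin the sleeper to capability 1 so it does not share a run
  -- queue with the main thread (assumed to be on capability 0).
  _ <- forkOn 1 (c_sleep_unsafe 2 >> return ())
  threadDelay 100000         -- 0.1 s: let the sleeper enter the call
  t0 <- getPOSIXTime
  performGC                  -- needs every HEC to reach the GC barrier
  t1 <- getPOSIXTime
  return (realToFrac (t1 - t0))

main :: IO ()
main = do
  d <- gcDuringUnsafeCall
  putStrLn ("performGC took " ++ show d ++ " s")
```

Compile with -threaded -rtsopts and run with +RTS -N2. If the guess is right, the reported time should be close to the remainder of the sleep, and watching the process table during that window would show whether the waiting HEC spins. As far as I know, the idle-time GC itself can be disabled with +RTS -I0, which would cover the other half of the experiment.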



On Mon, May 6, 2013 at 7:29 AM, Herbert Valerio Riedel h...@gnu.org wrote:

 Hello,

 Recently, I stumbled over E.Z.Yang's "Safety first: FFI and
 threading"[1] post and then, while experimenting with unsafe-imported FFI
 functions, I noticed a somewhat surprising behaviour:

 Consider the following contrived program:

 --8---cut here---start-8---
 import Foreign.C
 import Control.Concurrent
 import Control.Monad
 import Data.Time.Clock.POSIX (getPOSIXTime)

 foreign import ccall unsafe "unistd.h sleep"
     c_sleep_unsafe :: CUInt -> IO CUInt

 main :: IO ()
 main = do
     putStrLnTime "main started"
     _ <- forkIO (sleepLoop 10 >> putStrLnTime "sleepLoop finished")
     yield
     putStrLnTime "after forkIO"
     threadDelay (11*1000*1000) -- 11 seconds
     putStrLnTime "end of main"
   where
     putStrLnTime s = do
         t <- getPOSIXTime
         putStrLn $ init (show t) ++ "\t" ++ s

     -- sleep(3) returns the unslept seconds if interrupted; retry until 0
     sleepLoop n = do
         n' <- c_sleep_unsafe n
         unless (n' == 0) $ do
             putStrLnTime "c_sleep_unsafe got interrupted"
             sleepLoop n'

 --8---cut here---end---8---

 When compiled with GHC-7.6.3/linux/amd64 with -O2 -threaded and
 executed with +RTS -N4, the following output is emitted:

  1367838802.137419  main started
  1367838812.137727  after forkIO
  1367838812.137783  sleepLoop finished
  1367838823.148733  end of main

 which shows that the forkIO of the unsafe ccall effectively blocks the
 main thread;

 Moreover, when looking at the process table, I saw that 3 threads were
 occupying 100% CPU time each for 10 seconds until the 'after forkIO' was
 emitted.

 So what is happening here exactly, why do the 3 remaining HECs busy-wait
 during that FFI call instead of continuing the execution of the main
 thread?

 Do *all* foreign unsafe ccalls (even short ones) cause N-1 HECs to spend
 time in some kind of busy looping?


  [1]: http://blog.ezyang.com/2010/07/safety-first-ffi-and-threading/

 Cheers,
   hvr

 ___
 Glasgow-haskell-users mailing list
 Glasgow-haskell-users@haskell.org
 http://www.haskell.org/mailman/listinfo/glasgow-haskell-users



Re: Cloud Haskell and network latency issues with -threaded

2013-02-07 Thread Andreas Voellmy
Hi Edward,

I did two things to improve latency for my application: (1) rework the IO
manager and (2) stabilize the work pushing. (1) seems like a big win and we
are almost done with the work on that part. It is less clear whether (2)
will generally help much. It helped me when I developed it against 7.4.1,
but it doesn't seem to have much impact on HEAD on the few measurements I
did. The idea of (2) was to keep running averages of the run queue length
of each capability, then push work when these running averages get too
out-of-balance. The desired effect (which seems to work on my particular
application) is to avoid cases in which threads are pushed back and forth
among cores, which may make cache usage worse. You can see my patch here:
https://github.com/AndreasVoellmy/ghc-arv/commits/push-work-exchange-squashed
.
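
To make idea (2) concrete, here is a minimal sketch, not the code from the patch: it keeps a smoothed run queue length per capability and only signals a push when the smoothed lengths drift apart by more than a threshold. The exponential weighting, the smoothing factor, the threshold and all function names are assumptions made for illustration.

```haskell
module Main (main) where

-- Exponentially weighted running average of one run queue length.
-- alpha is a hypothetical smoothing factor in (0, 1].
updateAvg :: Double -> Double -> Int -> Double
updateAvg alpha avg len = alpha * fromIntegral len + (1 - alpha) * avg

-- Feed a series of samples (one queue length per capability in each
-- sample) into the per-capability running averages.
smooth :: Double -> [Double] -> [[Int]] -> [Double]
smooth alpha = foldl (zipWith (updateAvg alpha))

-- Push work only when the smoothed lengths are far enough apart.
shouldPush :: Double -> [Double] -> Bool
shouldPush threshold avgs =
  not (null avgs) && maximum avgs - minimum avgs > threshold

main :: IO ()
main = do
  let start     = [0, 0] :: [Double]
      spike     = [[8, 0]] :: [[Int]]            -- one momentary burst
      sustained = replicate 10 [8, 0] :: [[Int]] -- a lasting imbalance
  print (shouldPush 4 (smooth 0.2 start spike))      -- False
  print (shouldPush 4 (smooth 0.2 start sustained))  -- True
```

A single burst on one capability does not trigger a push, while a sustained imbalance does, which is the damping effect that is meant to stop threads from bouncing between cores.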

-Andi


On Fri, Feb 8, 2013 at 12:10 AM, Edward Z. Yang ezy...@mit.edu wrote:

 Hey folks,

 The latency changes sound relevant to some work on the scheduler I'm doing;
 is there a place I can see the changes?

 Thanks,
 Edward

 Excerpts from Simon Peyton-Jones's message of Wed Feb 06 10:10:10 -0800
 2013:
  I (with help from Kazu and helpful comments from Bryan and Johan) have
 nearly completed an overhaul to the IO manager based on my observations and
 we are in the final stages of getting it into GHC
 
  This is really helpful. Thank you very much Andreas, Kazu, Bryan, Johan.
 
  Simon
 
  From: parallel-hask...@googlegroups.com [mailto:
 parallel-hask...@googlegroups.com] On Behalf Of Andreas Voellmy
  Sent: 06 February 2013 14:28
  To: watson.timo...@gmail.com
  Cc: kosti...@gmail.com; parallel-haskell;
 glasgow-haskell-users@haskell.org
  Subject: Re: Cloud Haskell and network latency issues with -threaded
 
  Hi all,
 
  I haven't followed the conversations around CloudHaskell closely, but I
 noticed the discussion around latency using the threaded runtime system,
 and I thought I'd jump in here.
 
  I've been developing a server in Haskell that serves hundreds to
 thousands of clients over very long-lived TCP sockets. I also had latency
 problems with GHC. For example, with 100 clients I had a 10 ms
 (millisecond) latency and with 500 clients I had a 29ms latency. I looked
 into the problem and found that some bottlenecks in the threaded IO manager
 were the cause. I made some hacks there and got the latency for 100 and 500
 clients down to under 0.2 ms. I (with help from Kazu and helpful comments
 from Bryan and Johan) have nearly completed an overhaul to the IO manager
 based on my observations and we are in the final stages of getting it into
 GHC. Hopefully our work will also fix the latency issues in CloudHaskell
 programs :)
 
  It would be very helpful if someone has some benchmark CloudHaskell
 applications and workloads to test with. Does anyone have these handy?
 
  Cheers,
  Andi
 
  On Wed, Feb 6, 2013 at 9:09 AM, Tim Watson watson.timo...@gmail.com wrote:
  Hi Kostirya,
 
  I'm putting the parallel-haskell and ghc-users lists on cc, just in case
 other (better informed) folks want to chip in here.
 
  
 
  First of all, I'm assuming you're talking about network latency when
 compiling with -threaded - if not I apologise for misunderstanding!
 
  There is apparently an outstanding network latency issue when compiling
 with -threaded, but according to a conversation I had with the other
 developers on #haskell-distributed, this is not something that's specific
 to Cloud Haskell. It is something to do with the threaded runtime system,
 so would need to be solved for GHC (or is it just the Network package!?) in
 general. Writing up a simple C program and equivalent socket use in Haskell
 and comparing the latency using -threaded will show this up.
 
  See the latency section in
 http://haskell-distributed.github.com/wiki/networktransport.html for some
 more details. According to that, there *are* some things we might be able
 to do, but the 20% latency isn't going to change significantly on the face
 of things.
 
  We have an open ticket to look into this (
 https://cloud-haskell.atlassian.net/browse/NTTCP-4) and at some point
 we'll try and put together the sample programs in a github repository (if
 that's not already done - I might've missed previous spikes done by Edsko
 or others) and investigate further.
 
  One of the other (more experienced!) devs might be able to chip in and
 proffer a better explanation.
 
  Cheers,
  Tim
 
  On 6 Feb 2013, at 13:27, kosti...@gmail.com wrote:
 
   Haven't you ever needed to run Haskell in non-threaded mode during
 intensive network data exchange?
   I am seeing a twofold performance penalty in threaded mode, but I must
 use threaded mode because epoll and kevent are only available there.
  
 
  [snip]
 
  
  
   On Wednesday, 6 February 2013, 12:33:36 UTC+2, Tim Watson wrote:
   Hello all,
  
   It's been a busy week for Cloud

Re: Cloud Haskell and network latency issues with -threaded

2013-02-07 Thread Andreas Voellmy
On Fri, Feb 8, 2013 at 12:30 AM, Edward Z. Yang ezy...@mit.edu wrote:

 OK. I think it is high priority for us to get some latency benchmarks
 into nofib so that GHC devs (including me) can start measuring changes
 off them.  I know Edsko has some benchmarks here:
 http://www.edsko.net/2013/02/06/performance-problems-with-threaded/
 but they depend on network which makes it a little difficult to move into
 nofib.
 I'm working on other scheduler changes that may help you guys out; we
 should keep each other updated.


That would be great :)



 I noticed your patch also incorporates the "make yield actually work"
 patch;
 do you think the improvement in 7.4.1 was due to that specific change?


Actually, I believe that patch is irrelevant to the scheduler change and
probably should not be in there, strictly speaking. I actually needed that
patch for the IO manager revisions to work properly.


 (Have you instrumented the run queues and checked how your patch changes
 the distribution of jobs over your runtime?)

I didn't do this very rigorously, but I think I added some print
statements in the scheduler and looked at some eventlogs in ThreadScope
to see that the pushing of threads slows down after a while. I had planned
to write a script to analyze an event log file and extract these stats, but
I never got around to it.

-Andi



 Somewhat unrelatedly, if you have some good latency tests already,
 it may be worth trying to compile your copy of GHC with -fno-omit-yields,
 so that forced context switches get serviced more predictably.

 Cheers,
 Edward


Re: Cloud Haskell and network latency issues with -threaded

2013-02-06 Thread Andreas Voellmy
Hi Edsko,

Can you explain the figure linked to on that page a bit? E.g. how should
the axes be labelled?


On Wed, Feb 6, 2013 at 9:33 AM, Edsko de Vries ed...@well-typed.com wrote:

 Hi,

 Just for clarity's sake (as the author of that Latency section that Tim
 referred to): I have addressed all of the issues listed there in
 Network.Transport.TCP, with the exception of the first (the -threaded
 issue). As Tim points out, this is not a Cloud Haskell specific issue; I
 have written this up as a short blog post at
 http://www.edsko.net/2013/02/06/performance-problems-with-threaded .

 Edsko



 On Wednesday, 6 February 2013 14:09:22 UTC, Tim Watson wrote:

 Hi Kostirya,

 I'm putting the parallel-haskell and ghc-users lists on cc, just in case
 other (better informed) folks want to chip in here.

 

 First of all, I'm assuming you're talking about network latency when
 compiling with -threaded - if not I apologise for misunderstanding!

 There is apparently an outstanding network latency issue when compiling
 with -threaded, but according to a conversation I had with the other
 developers on #haskell-distributed, this is not something that's specific
 to Cloud Haskell. It is something to do with the threaded runtime system,
 so would need to be solved for GHC (or is it just the Network package!?) in
 general. Writing up a simple C program and equivalent socket use in Haskell
 and comparing the latency using -threaded will show this up.

 See the latency section in
 http://haskell-distributed.github.com/wiki/networktransport.html for some
 more details. According to that, there *are* some things we might be able
 to do, but the 20% latency isn't going to change significantly on the face
 of things.

 We have an open ticket to look into this (
 https://cloud-haskell.atlassian.net/browse/NTTCP-4) and at some point
 we'll try and put together the sample programs in a github repository (if
 that's not already done - I might've missed previous spikes done by Edsko
 or others) and investigate further.

 One of the other (more experienced!) devs might be able to chip in and
 proffer a better explanation.

 Cheers,
 Tim


 On 6 Feb 2013, at 13:27, kost...@gmail.com wrote:

  Haven't you ever needed to run Haskell in non-threaded mode during
 intensive network data exchange?
  I am seeing a twofold performance penalty in threaded mode, but I must
 use threaded mode because epoll and kevent are only available there.
 

 [snip]

 
 
  On Wednesday, 6 February 2013, 12:33:36 UTC+2, Tim Watson wrote:
  Hello all,
 
  It's been a busy week for Cloud Haskell and I wanted to share a few of
  our news items with you all.
 
  Firstly, we have a new home page at http://haskell-distributed.github.com,
  into which most of the documentation and wiki pages have been merged.
  Making sassy looking websites is not really my bag, so I'm very grateful
  to the various authors whose Creative Commons licensed designs and layouts
  made it easy to put together. We've already had some pull requests to fix
  minor problems on the site, so thanks very much to those who've
  contributed already!
 
  As well as the new site, you will find a few of us hanging out on the
  #haskell-distributed channel on freenode. Please do come along and join in
  the conversation.
 
  We also recently split up the distributed-process project into separate
  git repositories, one for each component that makes up Cloud Haskell. This
  was done partly for administrative purposes and partly because we're in
  the process of setting up CI builds for all the projects.
 
  Finally, we've moved from Github's issue tracker to a hosted Jira/Bamboo
  setup at https://cloud-haskell.atlassian.net - pull requests are naturally
  still welcome via Github! Although you can browse issues freely without
  logging in, you will need to provide an email address and get an account
  in order to submit new ones. If you have any difficulties logging in,
  please don't hesitate to contact me directly, via this forum or the
  cloud-haskell-developers mailing list (on google groups).
 
  As always, we'd be delighted to hear any feedback!
 
  Cheers,
  Tim

  --
 You received this message because you are subscribed to the Google Groups
 parallel-haskell group.
 To unsubscribe from this group and stop receiving emails from it, send an
 email to parallel-haskell+unsubscr...@googlegroups.com.
 For more options, visit https://groups.google.com/groups/opt_out.


