Re: Expanding a particular type family in type errors

2020-05-01 Thread Alexis King
Answering my own question: it looks like cleaning this up in GHC.Tc.Errors is 
rather impractical—there are just too many types buried inside other datatypes 
that would have to be updated—so it has to be wired into pprType. Fortunately, 
I can branch on userStyle to decide whether or not to do the expansion. This 
still seems like a bit of a hack, but I can’t think of a better way. Do tell me 
if there’s a better approach!
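
To make that concrete, the shape of the hack is roughly the sketch below. getPprStyle and userStyle are the real Outputable combinators I branch on; expandArrowStackTup and pprTypeRaw are placeholder names for illustration, not functions that actually exist in GHC.

    -- Sketch only: expand ArrowStackTup when printing for the user, so that
    -- Core dumps and -ddump-* output are left untouched.
    -- expandArrowStackTup and pprTypeRaw are hypothetical helpers.
    pprTypeForUser :: Type -> SDoc
    pprTypeForUser ty = getPprStyle $ \sty ->
      if userStyle sty
        then pprTypeRaw (expandArrowStackTup ty)  -- user-facing error messages
        else pprTypeRaw ty                        -- debug output: no expansion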

> On May 1, 2020, at 16:36, Alexis King wrote:
> 
> Hi all,
> 
> As part of my ongoing rework of arrow notation, I have introduced 
> two wired-in type families, which assist in the typechecking process. The 
> details are not terribly important (you can read the proposal if you’d like 
> for those), but the relevant detail is that equalities of the shape
> 
> ArrowStackTup a ~ b
> 
> are produced by the typechecker in key places. ArrowStackTup is one of my 
> wired-in type families, and it’s really an implementation detail. 
> Unfortunately, I often end up with type errors that leak this detail, with 
> expected/actual types like these ones:
> 
> Expected type: ArrowStackTup '[Int] -> Bool
>   Actual type: Int -> String
> 
> This is quite annoying, as it’s exceedingly rare that these type families 
> actually get stuck, so they can almost always be expanded in type errors. As 
> it happens, `ArrowStackTup '[a]` expands to simply `a`, so the type error I 
> would like to report is this one:
> 
> Expected type: Int -> Bool
>   Actual type: Int -> String
> 
> Not technically challenging, but I find myself faced with the question of 
> where this special expansion logic ought to go. It seems like it could go in 
> any of several places:
> 
> 1. It could be hard-wired into pprType, and perhaps selectively disabled with an
>    -fprint-* flag. This is nice in that it’s universal, but it’s almost certainly
>    a step too far: the casts for ArrowStackTup still end up in Core, and expanding
>    the type synonyms there would be quite confusing.
> 
> 2. The expansion could happen in tidyType, since it’s called before reporting an
>    error. But this seems even worse than putting it in pprType, since it’s still
>    used in too many places, and it isn’t supposed to actually change the type.
> 
> 3. It could be handled as an extra, ad-hoc preprocessing step in reportWanteds.
>    This is much closer to reasonable, though it feels like quite a hack.
> 
> 4. A separate error Reporter could catch these errors before the other reporters
>    do and perform the expansion there. But I don’t think this actually makes
>    sense, as the above example demonstrates that ArrowStackTup might be buried
>    inside another type and in fact might not actually be the source of the type
>    error at all!
> 
> 5. It could be done last-minute in mkEqErr. But I don’t know if this is too late,
>    and ArrowStackTup could leak into an error through some other code path.
> 
> Of those options, the best one I’ve come up with seems to be option 3, an 
> ad-hoc preprocessing step in reportWanteds. Does that seem reasonable? Or is 
> it too much of a kludge?
> 
> Thanks,
> Alexis



Expanding a particular type family in type errors

2020-05-01 Thread Alexis King
Hi all,

As part of my ongoing rework of arrow notation, I have introduced 
two wired-in type families, which assist in the typechecking process. The 
details are not terribly important (you can read the proposal if you’d like for 
those), but the relevant detail is that equalities of the shape

ArrowStackTup a ~ b

are produced by the typechecker in key places. ArrowStackTup is one of my 
wired-in type families, and it’s really an implementation detail. 
Unfortunately, I often end up with type errors that leak this detail, with 
expected/actual types like these ones:

Expected type: ArrowStackTup '[Int] -> Bool
  Actual type: Int -> String

This is quite annoying, as it’s exceedingly rare that these type families 
actually get stuck, so they can almost always be expanded in type errors. As it 
happens, `ArrowStackTup '[a]` expands to simply `a`, so the type error I would 
like to report is this one:

Expected type: Int -> Bool
  Actual type: Int -> String
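
(If you haven’t read the proposal, you can think of ArrowStackTup as behaving roughly like the source-level type family below. This is only an illustrative approximation: the real thing is wired into the compiler, and only the single-element equation matters for the example above.)

    {-# LANGUAGE DataKinds, TypeFamilies, TypeOperators #-}

    import Data.Kind (Type)

    -- Illustrative approximation: collapse a type-level stack into a tuple,
    -- with a one-element stack reducing to the element itself, so that
    -- ArrowStackTup '[Int] -> Bool is just Int -> Bool.
    type family ArrowStackTup (stack :: [Type]) :: Type where
      ArrowStackTup '[]       = ()                     -- illustrative guess
      ArrowStackTup '[a]      = a                      -- the equation used above
      ArrowStackTup (a ': as) = (a, ArrowStackTup as)  -- illustrative guess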

Not technically challenging, but I find myself faced with the question of where 
this special expansion logic ought to go. It seems like it could go in any of 
several places:

1. It could be hard-wired into pprType, and perhaps selectively disabled with an
   -fprint-* flag. This is nice in that it’s universal, but it’s almost certainly
   a step too far: the casts for ArrowStackTup still end up in Core, and expanding
   the type synonyms there would be quite confusing.

2. The expansion could happen in tidyType, since it’s called before reporting an
   error. But this seems even worse than putting it in pprType, since it’s still
   used in too many places, and it isn’t supposed to actually change the type.

3. It could be handled as an extra, ad-hoc preprocessing step in reportWanteds.
   This is much closer to reasonable, though it feels like quite a hack.

4. A separate error Reporter could catch these errors before the other reporters
   do and perform the expansion there. But I don’t think this actually makes
   sense, as the above example demonstrates that ArrowStackTup might be buried
   inside another type and in fact might not actually be the source of the type
   error at all!

5. It could be done last-minute in mkEqErr. But I don’t know if this is too late,
   and ArrowStackTup could leak into an error through some other code path.

Of those options, the best one I’ve come up with seems to be option 3, an 
ad-hoc preprocessing step in reportWanteds. Does that seem reasonable? Or is it 
too much of a kludge?

Thanks,
Alexis


Re: [RFC] Compiler pipeline timings visualization

2020-05-01 Thread Ben Gamari
Sergej Jaskiewicz via ghc-devs writes:

> tl;dr: I propose adding a new GHC flag for generating a log that allows
> visualizing how much time each stage in the compiler pipeline took, similar
> to Clang's -ftime-trace.
>
> Hello, everyone.

Hi Sergej,


[snip]

> The latter option is much closer to what we need. If we link the GHC 
> executable
> with the -eventlog flag, then various significant events will be written to
> a special log file. For example, "the parser began parsing the Main.hs file
> 5 ms after GHC started, and finished parsing it 3 ms later".
> The resulting log file can be parsed with the ghc-events library [4], and also
> visualized using the ThreadScope app [5].
>
I'm a bit short on time but here are a few points in no particular order:

Out of curiosity, have you seen Alp's work [1] in this area? This work
allows us to use the ghc-events-analyze package [2] to visualize (e.g. [3])
the existing withTiming eventlog output in a reasonably easy-to-consume
format.

That being said, I can certainly see the benefit of using the Chrome
tracing infrastructure for this; it would make for a nicer user
experience than the static image that ghc-events-analyze spits out.

I also know that Matthew Pickering has been working in this area and I'm
sure will have something interesting to add.

I will admit that I am a bit reluctant to add support for a *particular*
tracing format to GHC itself. I think it would make the most sense for
GHC to produce a consumer-agnostic trace representation (via our
existing eventlog mechanism) and for users to transform this into the
format their tools require.

Our current withTiming infrastructure is quite ad-hoc and could perhaps
expose more detail. It would be great to know what sorts of detail
people would find useful.

Cheers,

- Ben


[1] https://www.haskell.org/ghc/blog/20190924-eventful-ghc.html
[2] https://hackage.haskell.org/package/ghc-events-analyze
[3] https://www.haskell.org/ghc/blog/images/eventful-ghc/ghc-events-viz.svg




Re: [RFC] Compiler pipeline timings visualization

2020-05-01 Thread Ben Gamari
Andreas Klebinger writes:

> Hi Sergej,
>
> I think this is a good idea in general, and it seems you did some great
> work there already.
> Something like this can also help with pinpointing performance issues
> inside the compiler, so it would not just be useful to end users.
>
> Intuitively, I would assume that instead of adding another way
> to produce compiler events we should:
> * Ship GHC with eventlog enabled by default

For what it's worth, I have also been thinking about proposing this.
I've been doing a lot of performance characterisation recently where I
would have liked to use our binary distributions. Sadly this isn't an
option as the ghc executable doesn't support the eventlog.

I've not taken measurements on the eventlog overhead in ghc-bin but I
suspect that any overhead that *does* exist can be easily eliminated (as
I describe in #17949). We just need to ensure that
Debug.Trace.{traceEvent,traceMarker} know not to pack their buffers if
their respective tracing flags aren't enabled.
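
For anyone who hasn't used that API: these are the user-facing entry points whose overhead is in question. A minimal, self-contained example (plain base API, nothing GHC-specific):

    import Debug.Trace (traceEventIO, traceMarkerIO)

    -- Build with: ghc -eventlog Main.hs
    -- Run with:   ./Main +RTS -l
    -- The events land in Main.eventlog, readable with ghc-events or ThreadScope.
    main :: IO ()
    main = do
      traceMarkerIO "phase: start"
      traceEventIO  "work begin"
      -- ... real work would go here ...
      traceEventIO  "work end"
      traceMarkerIO "phase: end"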

Frankly, I also question the value of shipping the non-tracing-enabled
RTS at all. Enabling tracing by default would allow us to eliminate
three whole RTS builds from CI, which I suspect would be worthwhile.
The size overhead on a statically-linked executable appears to be fairly
small in the grand scheme of things:

-rw-r--r-- 1 ben users 6.0M Apr 26 15:11 libHSrts-1.0.a
-rw-r--r-- 1 ben users 6.4M Apr 26 15:21 libHSrts-1.0_l.a





Re: Justifying sched_yield() in the RTS

2020-05-01 Thread Ben Gamari
Ben Gamari writes:

> Travis Whitaker writes:
>
>> Hello GHC devs,
>>
>> Through the course of reading some recent material posted by Linus
>> Torvalds ^1 ^2, I remembered stumbling upon GHC Trac #3553 ^3 some
>> years ago.
>>
> For what it's worth Simon Marlow and I discussed this a few weeks ago
> and he agreed that it would be worth re-running the futex experiments.

Whoops. The above should have read "mutex" experiments. Of course,
perhaps direct futex usage is also worth evaluating but simpler
is better if there is no performance trade-off.

Cheers,

- Ben




Re: Justifying sched_yield() in the RTS

2020-05-01 Thread Ben Gamari
Travis Whitaker writes:

> Hello GHC devs,
>
> Through the course of reading some recent material posted by Linus
> Torvalds ^1 ^2, I remembered stumbling upon GHC Trac #3553 ^3 some
> years ago.
>
For what it's worth Simon Marlow and I discussed this a few weeks ago
and he agreed that it would be worth re-running the futex experiments.
I do suspect there is good money on the table here on larger/busier
machines. It would be great if someone could pick this up!

Cheers,

- Ben




Re: git prune?

2020-05-01 Thread Ben Gamari
Simon Peyton Jones via ghc-devs writes:

> I'm getting lots of the messages below.
> Running 'git prune' makes no difference.
> Simon
>
Indeed, the warning is on the server side (hence the "remote:" prefix).
I'll need to take care of this on the server.

Cheers,

- Ben




git prune?

2020-05-01 Thread Simon Peyton Jones via ghc-devs
I'm getting lots of the messages below.
Running 'git prune' makes no difference.
Simon


simonpj@MSRC-3645512:~/code/HEAD-5/testsuite/tests/th$ git push --set-upstream origin wip/T18121
Counting objects: 13, done.
Delta compression using up to 20 threads.
Compressing objects: 100% (13/13), done.
Writing objects: 100% (13/13), 1.64 KiB | 418.00 KiB/s, done.
Total 13 (delta 11), reused 0 (delta 0)
remote:
remote: To create a merge request for wip/T18121, visit:
remote:   https://gitlab.haskell.org/ghc/ghc/-/merge_requests/new?merge_request%5Bsource_branch%5D=wip%2FT18121
remote:
remote: warning: The last gc run reported the following. Please correct the root cause
remote: and remove gc.log.
remote: Automatic cleanup will not be performed until the file is removed.
remote:
remote: warning: There are too many unreachable loose objects; run 'git prune' to remove them.
remote:


Re: Justifying sched_yield() in the RTS

2020-05-01 Thread Niklas Hambüchen
There are more related updates in 
https://gitlab.haskell.org/ghc/ghc/issues/9221, also including a short 
discussion of Linus's post.

Simon Marlow's overall response was:

> I'm very supportive of making this better, but as usual I will require 
> thorough data to back up any changes :)
>
> Everything I tried in the past made things worse. Including an experiment I 
> did to use futexes directly: 
> https://gitlab.haskell.org/ghc/ghc/issues/3553?cversion=0&cnum_hist=14#note_39009

So it sounds like this topic is currently in the stage of:

Somebody needs to take the time to re-do that benchmark done 10 years ago.


RE: [RFC] Compiler pipeline timings visualization

2020-05-01 Thread Simon Peyton Jones via ghc-devs
This sounds great!

Event logging needn't add much cost if we don't log much -- eg just compiler 
phases.  It simply depends on how many events you generate!

For naïve uses, the log you show in [9] is a bit complicated.  Something even 
simpler might be good.

Simon

|  -Original Message-
|  From: ghc-devs  On Behalf Of Andreas
|  Klebinger
|  Sent: 01 May 2020 11:04
|  To: Sergej Jaskiewicz 
|  Cc: ghc-devs@haskell.org
|  Subject: Re: [RFC] Compiler pipeline timings visualization
|  
|  Hi Sergej,
|  
|  I think this is a good idea in general, and it seems you did some great
|  work there already.
|  Something like this can also help with pinpointing performance issues
|  inside the compiler, so it would not just be useful to end users.
|  
|  Intuitively, I would assume that instead of adding another way
|  to produce compiler events we should:
|  * Ship GHC with eventlog enabled by default
|  * Provide an official converter from eventLog to the chrome trace format.
|  
|  The eventlog format is quite flexible, and if it's unsuitable to what
|  you want I would
|  prefer to extend it rather than adding support for additional formats
|  inside GHC itself.
|  
|  This way we:
|  * Continue to have one unified way to dump events from haskell apps (the
|  eventlog)
|  * Users need not go to third party apps, as the converter could
|  reasonably come with GHC (like hp2ps)
|  * We are free to include information that can't (easily) be encoded in
|  the chrome format.
|  
|  The downside is that users will have to invoke ghc, followed by some
|  other tool to get the
|  chrome trace, but to me that seems like a worthwhile tradeoff for
|  keeping the compiler and
|  various visualization formats somewhat separated.
|  
|  The obvious question there is how much enabling the eventlog by default
|  would impact non-logging ghc
|  invocations. I have not measured this and it might rule out this
|  approach if it has a big impact and isn't
|  easily corrected.
|  
|  As a last point I want to encourage you to open a ticket about this.
|  Mailing list threads tend to be harder to follow and find down the line
|  than tickets in my experience.
|  
|  Cheers,
|  Andreas
|  
|  Sergej Jaskiewicz via ghc-devs wrote on 01.05.2020 at 11:09:
|  > tl;dr: I propose adding a new GHC flag for generating a log that allows
|  > visualizing how much time each stage in the compiler pipeline took,
|  similar
|  > to Clang's -ftime-trace.
|  >
|  > Hello, everyone.
|  >
|  > I'd like to tell you about a feature I've been working on recently,
|  namely,
|  > the ability to generate and visualize how long each stage of the
|  compilation
|  > pipeline takes to complete. This is basically about the same as
|  > the -ftime-trace flag that has landed in Clang several months ago [1].
|  >
|  > The initial motivation for this feature was the desire to understand
|  why
|  > compilation times are so long when using LLVM as backend. But later I
|  realized
|  > that this functionality is useful on its own for end users, not just
|  GHC devs,
|  > so it would make sense to add a new flag -ddump-time-trace.
|  >
|  > Since not only does Haskell have a complex type system, but also there is
|  > a variety of language extensions, we, the Haskell users, often
|  experience
|  > very long compilation times. For instance, the usage of the
|  TypeFamilies
|  > extension can considerably slow down the compilation process [2].
|  > It is useful to understand how you can fix your code so that it
|  compiles faster.
|  >
|  > There are two options for that right now:
|  > - Building GHC from source for profiling and using the just-built GHC
|  for
|  >compiling your problem code.
|  > - Building the compiler from source with event log support [3].
|  >
|  > The former option just doesn't do it, since the output would be
|  > "which GHC function calls took how much time", but there'd be no
|  information
|  > about which part of the user code was being compiled.
|  >
|  > The latter option is much closer to what we need. If we link the GHC
|  executable
|  > with the -eventlog flag, then various significant events will be
|  written to
|  > a special log file. For example, "the parser began parsing the Main.hs file
|  > 5 ms after GHC started, and finished parsing it 3 ms later".
|  > The resulting log file can be parsed with the ghc-events library [4],
|  and also
|  > visualized using the ThreadScope app [5].
|  >
|  > But this approach has its disadvantages.
|  >
|  > Firstly, if the user wants visualization, they'd have to install
|  ThreadScope.
|  > Some companies' internal policies prohibit installing third-party apps
|  from
|  > the internet. It would be good if we could generate a log that could be
|  > visualized on any computer with a browser. For that we could use
|  > the Chrome Trace Event format [6]. This is an ordinary JSON file with a
|  specific
|  > structure that can be viewed in the Chrome browser by going to
|  > the chrome://tracing page, or on https://speedscope.app.

Re: [RFC] Compiler pipeline timings visualization

2020-05-01 Thread Andreas Klebinger

Hi Sergej,

I think this is a good idea in general, and it seems you did some great
work there already.
Something like this can also help with pinpointing performance issues
inside the compiler, so it would not just be useful to end users.

Intuitively, I would assume that instead of adding another way
to produce compiler events we should:
* Ship GHC with eventlog enabled by default
* Provide an official converter from eventLog to the chrome trace format.

The eventlog format is quite flexible, and if it's unsuitable to what
you want I would
prefer to extend it rather than adding support for additional formats
inside GHC itself.

This way we:
* Continue to have one unified way to dump events from haskell apps (the
eventlog)
* Users need not go to third party apps, as the converter could
reasonably come with GHC (like hp2ps)
* We are free to include information that can't (easily) be encoded in
the chrome format.

The downside is that users will have to invoke ghc, followed by some
other tool to get the
chrome trace, but to me that seems like a worthwhile tradeoff for
keeping the compiler and
various visualization formats somewhat separated.

The obvious question there is how much enabling the eventlog by default
would impact non-logging ghc
invocations. I have not measured this and it might rule out this
approach if it has a big impact and isn't
easily corrected.

As a last point I want to encourage you to open a ticket about this.
Mailing list threads tend to be harder to follow and find down the line
than tickets in my experience.

Cheers,
Andreas

Sergej Jaskiewicz via ghc-devs wrote on 01.05.2020 at 11:09:

tl;dr: I propose adding a new GHC flag for generating a log that allows
visualizing how much time each stage in the compiler pipeline took, similar
to Clang's -ftime-trace.

Hello, everyone.

I'd like to tell you about a feature I've been working on recently, namely,
the ability to generate and visualize how long each stage of the compilation
pipeline takes to complete. This is basically about the same as
the -ftime-trace flag that has landed in Clang several months ago [1].

The initial motivation for this feature was the desire to understand why
compilation times are so long when using LLVM as backend. But later I realized
that this functionality is useful on its own for end users, not just GHC devs,
so it would make sense to add a new flag -ddump-time-trace.

Since not only does Haskell have a complex type system, but also there is
a variety of language extensions, we, the Haskell users, often experience
very long compilation times. For instance, the usage of the TypeFamilies
extension can considerably slow down the compilation process [2].
It is useful to understand how you can fix your code so that it compiles faster.

There are two options for that right now:
- Building GHC from source for profiling and using the just-built GHC for
   compiling your problem code.
- Building the compiler from source with event log support [3].

The former option just doesn't do it, since the output would be
"which GHC function calls took how much time", but there'd be no information
about which part of the user code was being compiled.

The latter option is much closer to what we need. If we link the GHC executable
with the -eventlog flag, then various significant events will be written to
a special log file. For example, "the parser began parsing the Main.hs file
5 ms after GHC started, and finished parsing it 3 ms later".
The resulting log file can be parsed with the ghc-events library [4], and also
visualized using the ThreadScope app [5].

But this approach has its disadvantages.

Firstly, if the user wants visualization, they'd have to install ThreadScope.
Some companies' internal policies prohibit installing third-party apps from
the internet. It would be good if we could generate a log that could be
visualized on any computer with a browser. For that we could use
the Chrome Trace Event format [6]. This is an ordinary JSON file with a specific
structure that can be viewed in the Chrome browser by going to
the chrome://tracing page, or on https://speedscope.app. A file in exactly this
format would be generated by Clang if you passed it the -ftime-trace flag.

Secondly, the event log contains many events that are not relevant to the end
user, for example, thread creation, memory allocation, etc.

As an initial proof of concept, I've developed a command line tool for
transforming event log files to Chrome Trace Event files [7].

Thirdly, in order for the event log to be generated, you'd still have to build
GHC from source. The majority of the GHC users won't go this way. Not only
would it require some basic understanding of the GHC building process, but also
building itself takes quite some time. It would be great if the needed
functionality came with GHC out of the box.

This is why I've added support for generating Trace Event files into my fork
of GHC [8], and I would like to propose including this feature into the mainline GHC.

[RFC] Compiler pipeline timings visualization

2020-05-01 Thread Sergej Jaskiewicz via ghc-devs
tl;dr: I propose adding a new GHC flag for generating a log that allows
visualizing how much time each stage in the compiler pipeline took, similar
to Clang's -ftime-trace.

Hello, everyone.

I'd like to tell you about a feature I've been working on recently, namely,
the ability to generate and visualize how long each stage of the compilation
pipeline takes to complete. This is basically about the same as
the -ftime-trace flag that has landed in Clang several months ago [1].

The initial motivation for this feature was the desire to understand why
compilation times are so long when using LLVM as backend. But later I realized
that this functionality is useful on its own for end users, not just GHC devs,
so it would make sense to add a new flag -ddump-time-trace.

Since not only does Haskell have a complex type system, but also there is
a variety of language extensions, we, the Haskell users, often experience
very long compilation times. For instance, the usage of the TypeFamilies
extension can considerably slow down the compilation process [2].
It is useful to understand how you can fix your code so that it compiles faster.

There are two options for that right now:
- Building GHC from source for profiling and using the just-built GHC for
  compiling your problem code.
- Building the compiler from source with event log support [3].

The former option just doesn't do it, since the output would be
"which GHC function calls took how much time", but there'd be no information
about which part of the user code was being compiled.

The latter option is much closer to what we need. If we link the GHC executable
with the -eventlog flag, then various significant events will be written to
a special log file. For example, "the parser began parsing the Main.hs file
5 ms after GHC started, and finished parsing it 3 ms later".
The resulting log file can be parsed with the ghc-events library [4], and also
visualized using the ThreadScope app [5].

But this approach has its disadvantages.

Firstly, if the user wants visualization, they'd have to install ThreadScope.
Some companies' internal policies prohibit installing third-party apps from
the internet. It would be good if we could generate a log that could be
visualized on any computer with a browser. For that we could use
the Chrome Trace Event format [6]. This is an ordinary JSON file with a specific
structure that can be viewed in the Chrome browser by going to
the chrome://tracing page, or on https://speedscope.app. A file in exactly this
format would be generated by Clang if you passed it the -ftime-trace flag.

Secondly, the event log contains many events that are not relevant to the end
user, for example, thread creation, memory allocation, etc.

As an initial proof of concept, I've developed a command line tool for
transforming event log files to Chrome Trace Event files [7].
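
To make the target format concrete, here is a rough sketch of the output side of such a converter (simplified for illustration rather than taken verbatim from [7]). Each compiler phase becomes one Chrome "complete" event ("ph": "X"), with timestamps and durations in microseconds and the module name carried in "args":

    import Data.List (intercalate)

    -- Hypothetical record for one timed compiler phase extracted from the eventlog.
    data Phase = Phase
      { phaseName   :: String   -- e.g. "Parser", "CodeGen"
      , phaseModule :: String   -- e.g. "Main"
      , phaseStart  :: Integer  -- microseconds since process start
      , phaseDur    :: Integer  -- microseconds
      }

    -- One Chrome Trace Event ("ph":"X" = complete event) per phase.
    renderEvent :: Phase -> String
    renderEvent p = concat
      [ "{\"name\":\"", phaseName p, "\""
      , ",\"ph\":\"X\",\"pid\":1,\"tid\":1"
      , ",\"ts\":", show (phaseStart p)
      , ",\"dur\":", show (phaseDur p)
      , ",\"args\":{\"module\":\"", phaseModule p, "\"}}"
      ]

    -- A complete file that chrome://tracing or https://speedscope.app can open.
    renderTrace :: [Phase] -> String
    renderTrace ps =
      "{\"traceEvents\":[" ++ intercalate "," (map renderEvent ps) ++ "]}"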

Thirdly, in order for the event log to be generated, you'd still have to build
GHC from source. The majority of the GHC users won't go this way. Not only
would it require some basic understanding of the GHC building process, but also
building itself takes quite some time. It would be great if the needed
functionality came with GHC out of the box.

This is why I've added support for generating Trace Event files into my fork
of GHC [8], and I would like to propose including this feature into the mainline
GHC.

(Note that my implementation is still a bit buggy, for example, it only works
in -j mode. This will be fixed.)

You can now execute the following command:

ghc -ddump-to-file -ddump-file-prefix="Main." -ddump-time-trace -j -O3 Main.hs

And receive this image [9]. Here, we've compiled two modules, one of which
depends on the other.

One more difference from the event log is that now various metadata
(like file and module names) are emitted as a separate JSON attribute, instead
of being encoded in the name of an event. For example, parsing the Main.hs file
and parsing the QSort.hs file in one compilation are events with the same name,
but different metadata. We can group them together if we want to know how much
time the parsing took overall. The event log format doesn't allow us to do it.

So, we can now generate a Trace Event file from a single GHC invocation.
However, most projects use build systems that invoke the compiler many times.
It would be good if we could form a log for the whole project.

This is solved by merging logs. However, there is one subtlety: the events in
logs use relative timestamps (the number of microseconds since the process
has started). To combine logs from multiple processes, we add a clock
synchronization point into each trace log in the form of an additional
'beginningOfTime' JSON attribute that contains the absolute time when
the process started.

There is a Python script that performs the actual merging [10].
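
In essence, the merging just rebases every event onto a common time origin, along these lines (a toy sketch with invented types; the real work is done by the script in [10]):

    -- Each per-process log carries an absolute 'beginningOfTime' plus events whose
    -- timestamps are relative to that process's start (all in microseconds).
    -- Rebasing against the earliest process start gives every event a shared origin.
    type Micros = Integer

    rebase :: [(Micros, [(String, Micros)])]  -- (beginningOfTime, [(event, relative ts)])
           -> [(String, Micros)]              -- merged events on a shared timeline
    rebase logs = concatMap shift logs
      where
        origin = minimum (map fst logs)       -- earliest process start
        shift (t0, evs) = [ (name, (t0 - origin) + ts) | (name, ts) <- evs ]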

This is what it looks like with multiple processes [11].

Also, I've implemented generation of the 'beginningOfTime' attribute in
LLVM [12],

Justifying sched_yield() in the RTS

2020-05-01 Thread Travis Whitaker
Hello GHC devs,

Through the course of reading some recent material posted by Linus
Torvalds ^1 ^2, I remembered stumbling upon GHC Trac #3553 ^3 some
years ago.

Looking past the harsh tone Linus used in his notes, I think he makes
some pretty reasonable points about the problems that can be caused by
using spinlocks in userspace. Specifically:

- A group of threads yielding while waiting for a spinlock are, in
effect, simply trying to coax the scheduler into scheduling the thread
that is holding the lock. This is much easier for the scheduler to do
correctly with a futex or similar, since the scheduler has some
context around which tasks are blocking/waking each other up.
- Apparently the Linux scheduler (and maybe other schedulers) has some
smarts for preserving cache locality while choosing which physical
execution units run which tasks, and reordering threads in the run
list in an ad-hoc way with sched_yield() interferes with this
mechanism.
- Using sched_yield() (or other random interference with run lists)
can cause disproportionate havoc on NUMA systems, where jostling
around the lock-holding thread by burning time slices and yielding is
especially costly.

All that said, there is perhaps one concrete advantage (aside from
absolute performance) to the current spinlock implementation: the only
functionality it requires from the host OS is a yield-like operation.
A yielding spinlock occupies a comfortable local optimum, achieving
decent performance across platforms with minimal exposure to cross-OS
API differences.
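
To be explicit about the pattern I mean by a "yielding spinlock", here is the idea in miniature. This is a toy Haskell rendering for illustration only; the RTS's real implementation is C and differs in detail:

    import Control.Concurrent (yield)
    import Data.IORef (IORef, atomicModifyIORef')

    -- Toy test-and-set lock: spin on the flag, calling yield on each failed
    -- attempt to coax the scheduler into running whoever holds the lock.
    acquire :: IORef Bool -> IO ()
    acquire lock = go
      where
        go = do
          wasFree <- atomicModifyIORef' lock (\held -> (True, not held))
          if wasFree then pure () else yield >> go

    release :: IORef Bool -> IO ()
    release lock = atomicModifyIORef' lock (\_ -> (False, ()))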

Revisiting #3553, it seems the only reason that a futex was not used
in place of a spinlock with sched_yield() was that, despite all of the
points against it, the spinlock code empirically performed better.
However, these tests were performed years ago and the futex
implementation in the Linux kernel has changed quite a bit.

I think there is a healthy amount of empirical evidence from the GHC
user community that there are problems afoot with the way parallel GC
does locking, and indeed I wonder if the bad cases Linus described are
playing a role. Advice like "use -qg" or "use -qnM with small M" is
often parroted (this Twitter thread ^4 contains both pieces of
advice). Furthermore, I would wager that an RTS using futexes for
locking rather than spinning plays nicer with other CPU intensive
tasks on the same machine. The "scheduler storm" caused by a large
number of threads waiting for a spinlock could have a negative impact
on neighboring processes, e.g. a large number of HECs spinning
certainly don't do any favors for a busy neighboring DB process.

I'm curious if others have thoughts on reviving the futex
investigation. Perhaps the futexes provided by modern Linux are more
competitive with the current spinlock implementation, or perhaps
better scaling on machines with high core counts is worth some cost.
In the case that futexes remain handily defeated by yielding
spinlocks, it's a worthy endeavor to explain precisely _why_ this
happens. Other programs have seen performance issues crop up when the
kernel makes subtle changes to how sched_yield() works. ^5


1: https://www.realworldtech.com/forum/?threadid=189711&curpostid=189723
2: https://www.realworldtech.com/forum/?threadid=189711&curpostid=189752
3: https://gitlab.haskell.org/ghc/ghc/issues/3553
4: https://twitter.com/ndm_haskell/status/1076187051610042368?s=20
5: https://lwn.net/Articles/31462/