Re: revenge of CALLOC_STRUCT

2022-01-07 Thread Michael Blumenkrantz
I've used it at least once too.


Mike

On Fri, 07 Jan 2022 17:20:11 -0800
Kenneth Graunke  wrote:

> I've definitely used the pipe_reference_described debugging code to
> track down some issues when working on iris.
> 
> --Ken
> 
> On Monday, December 27, 2021 4:18:16 PM PST Marek Olšák wrote:
> > I remember that it wasn't unusual on Windows to define malloc, calloc,
> > strdup, and free macros to redirect those calls to custom wrappers. That
> > would eliminate the need to have non-standard names like MALLOC.
> > 
> > There is also the pipe_reference debug code. Is anybody using that?
> > 
> > Marek
> > 
> > On Sun., Dec. 26, 2021, 05:37 Jose Fonseca,  wrote:
> >   
> > > I believe that as long as CALLOC_STRUCT continues to get paired with
> > > the right FREE call (or an equivalent, if these get renamed), either
> > > way should work for us.  Whichever proposed option gets followed,
> > > there's a risk these can get out of balance, but the risk seems
> > > comparable.
> > >
> > >
> > > For context, IIRC, the main reason these macros remain useful for VMware
> > > is the sad state of memory debugging tools on Windows.  AFAICT, the best
> > > hope of one day deprecating this would be to use AddressSanitizer, which
> > > is supported on MSVC [1], but unfortunately not yet on MinGW w/ GCC [2],
> > > which we rely upon a lot for day-to-day development, using Linux
> > > cross-compilation.  Using a MinGW w/ Clang cross compiler seems to be a
> > > way to overcome this difficulty, but that too was still in a somewhat
> > > experimental state when I last tried it.
> > >
> > >
> > > Jose
> > >
> > > [1]
> > > https://devblogs.microsoft.com/cppblog/asan-for-windows-x64-and-debug-build-support/
> > > [2]
> > > https://stackoverflow.com/questions/67619314/cannot-use-fsanitize-address-in-mingw-compiler
> > > --
> > > *From:* Dave Airlie 
> > > *Sent:* Wednesday, December 22, 2021 22:35
> > > *To:* mesa-dev ; Jose Fonseca <  
> > > jfons...@vmware.com>; Brian Paul   
> > > *Subject:* revenge of CALLOC_STRUCT
> > >
> > > Hey,
> > >
> > > Happy holidays, and a thought to consider over the break:
> > >
> > > We have the VMware MALLOC/FREE/CALLOC/CALLOC_STRUCT wrappers used
> > > throughout gallium.
> > >
> > > We have ST_CALLOC_STRUCT in use in the mesa state tracker; it isn't
> > > used in gallium.
> > >
> > > With the state tracker merging into mesa proper, and even prior to
> > > this, a few CALLOC_STRUCT uses have started to leak into src/mesa/*.
> > >
> > > Now, I don't think we want that, but CALLOC_STRUCT is a genuinely
> > > useful macro on its own, so I'm considering just defining a
> > > calloc_struct instead for the non-gallium use that goes with
> > > malloc/calloc/free.
> > >
> > > Any opinions, or should mesa just get u_memory.h support for Xmas?
> > >
> > > Dave.
> > >  
> >   
> 




Re: revenge of CALLOC_STRUCT

2022-01-07 Thread Kenneth Graunke
I've definitely used the pipe_reference_described debugging code to
track down some issues when working on iris.
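
For anyone who hasn't tried it: the idea is to log every retain/release
together with a short description of the object, so stray references can
be traced back to their owners.  A rough, self-contained sketch of the
pattern - illustrative names only, not Mesa's actual
pipe_reference_described API:

    /* Hypothetical sketch of "described" refcount debugging; the names
     * are made up and the real u_inlines.h/u_debug_refcnt machinery
     * differs in the details. */
    #include <stdbool.h>
    #include <stddef.h>
    #include <stdio.h>

    struct refcount {
       int count;
    };

    /* Callback that writes a short, human-readable description of the
     * object owning this refcount (e.g. "resource 0x1234"). */
    typedef void (*describe_fn)(const struct refcount *r, char *buf,
                                size_t size);

    /* Point *dst at src, logging both count changes.  Returns true when
     * the object previously held in *dst dropped to zero and the caller
     * should destroy it. */
    static bool
    refcount_set_described(struct refcount **dst, struct refcount *src,
                           describe_fn describe)
    {
       char buf[128];
       bool destroy = false;

       if (*dst == src)
          return false;

       if (src) {
          src->count++;
          describe(src, buf, sizeof(buf));
          fprintf(stderr, "ref   %s -> %d\n", buf, src->count);
       }

       if (*dst) {
          (*dst)->count--;
          describe(*dst, buf, sizeof(buf));
          fprintf(stderr, "unref %s -> %d\n", buf, (*dst)->count);
          destroy = ((*dst)->count == 0);
       }

       *dst = src;
       return destroy;
    }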

--Ken

On Monday, December 27, 2021 4:18:16 PM PST Marek Olšák wrote:
> I remember that it wasn't unusual on Windows to define malloc, calloc,
> strdup, and free macros to redirect those calls to custom wrappers. That
> would eliminate the need to have non-standard names like MALLOC.
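
For illustration, the kind of redirection Marek describes looks roughly
like the sketch below (hypothetical wrapper names, not the actual
VMware/Windows ones):

    /* Hypothetical sketch: after the standard headers are included, the
     * standard names are redefined as macros that forward to custom
     * debug wrappers, so callers keep writing plain malloc()/free(). */
    #include <stdlib.h>
    #include <string.h>

    void *dbg_malloc(size_t size, const char *file, int line);
    void *dbg_calloc(size_t count, size_t size, const char *file, int line);
    char *dbg_strdup(const char *s, const char *file, int line);
    void  dbg_free(void *ptr, const char *file, int line);

    #define malloc(size)        dbg_malloc(size, __FILE__, __LINE__)
    #define calloc(count, size) dbg_calloc(count, size, __FILE__, __LINE__)
    #define strdup(s)           dbg_strdup(s, __FILE__, __LINE__)
    #define free(ptr)           dbg_free(ptr, __FILE__, __LINE__)
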
> 
> There is also the pipe_reference debug code. Is anybody using that?
> 
> Marek
> 
> On Sun., Dec. 26, 2021, 05:37 Jose Fonseca,  wrote:
> 
> > I believe that as long as CALLOC_STRUCT continues to get paired with the
> > right FREE call (or an equivalent, if these get renamed), either way
> > should work for us.  Whichever proposed option gets followed, there's a
> > risk these can get out of balance, but the risk seems comparable.
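
For reference, in release builds the wrappers boil down to something like
the sketch below (paraphrased from memory of u_memory.h, with a made-up
example struct; debug builds route through instrumented allocators
instead):

    #include <stdlib.h>

    /* Roughly what u_memory.h provides (paraphrased): */
    #define MALLOC(size)      malloc(size)
    #define CALLOC(n, size)   calloc(n, size)
    #define FREE(ptr)         free(ptr)
    #define CALLOC_STRUCT(T)  (struct T *) CALLOC(1, sizeof(struct T))

    /* The pairing in question: anything allocated with CALLOC_STRUCT
     * must be released with FREE, never plain free(), so that a debug
     * build's instrumented allocator stays balanced. */
    struct demo_ctx {          /* made-up struct for illustration */
       int flags;
    };

    static struct demo_ctx *
    demo_ctx_create(void)
    {
       struct demo_ctx *ctx = CALLOC_STRUCT(demo_ctx);  /* zeroed */
       if (!ctx)
          return NULL;
       ctx->flags = 1;
       return ctx;
    }

    static void
    demo_ctx_destroy(struct demo_ctx *ctx)
    {
       FREE(ctx);   /* paired with CALLOC_STRUCT, not plain free() */
    }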
> >
> >
> > For context, IIRC, the main reason these macros remain useful for VMware
> > is the sad state of memory debugging tools on Windows.  AFAICT, the best
> > hope of one day deprecating this would be to use AddressSanitizer, which
> > is supported on MSVC [1], but unfortunately not yet on MinGW w/ GCC [2],
> > which we rely upon a lot for day-to-day development, using Linux
> > cross-compilation.  Using a MinGW w/ Clang cross compiler seems to be a
> > way to overcome this difficulty, but that too was still in a somewhat
> > experimental state when I last tried it.
> >
> >
> > Jose
> >
> > [1]
> > https://devblogs.microsoft.com/cppblog/asan-for-windows-x64-and-debug-build-support/
> > [2]
> > https://stackoverflow.com/questions/67619314/cannot-use-fsanitize-address-in-mingw-compiler
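
For the record, enabling ASan where it is supported is roughly a one-flag
affair; the spellings below are from memory and depend on the toolchain
version:

    # MSVC (per [1]):
    cl /fsanitize=address /Zi foo.c

    # GCC/Clang on Linux (and, in principle, a MinGW-targeting Clang):
    cc -fsanitize=address -g foo.c

    # Or for a Meson build like Mesa's:
    meson setup build/ -Db_sanitize=address
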
> > --
> > *From:* Dave Airlie 
> > *Sent:* Wednesday, December 22, 2021 22:35
> > *To:* mesa-dev ; Jose Fonseca <
> > jfons...@vmware.com>; Brian Paul 
> > *Subject:* revenge of CALLOC_STRUCT
> >
> > Hey,
> >
> > Happy holidays, and a thought to consider over the break:
> >
> > We have the VMware MALLOC/FREE/CALLOC/CALLOC_STRUCT wrappers used
> > throughout gallium.
> >
> > We have ST_CALLOC_STRUCT in use in the mesa state tracker; it isn't
> > used in gallium.
> >
> > With the state tracker merging into mesa proper, and even prior to
> > this, a few CALLOC_STRUCT uses have started to leak into src/mesa/*.
> >
> > Now, I don't think we want that, but CALLOC_STRUCT is a genuinely
> > useful macro on its own, so I'm considering just defining a
> > calloc_struct instead for the non-gallium use that goes with
> > malloc/calloc/free.
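
A minimal sketch of what that could look like (illustrative only, not an
agreed-upon definition):

    #include <stdlib.h>

    /* Hypothetical lower-case helper for code outside gallium: the same
     * convenience as CALLOC_STRUCT, but paired with plain calloc/free
     * rather than the u_memory.h MALLOC/CALLOC/FREE wrappers. */
    #define calloc_struct(T)  ((struct T *) calloc(1, sizeof(struct T)))

    /* Usage (made-up struct name):
     *    struct gl_thing *t = calloc_struct(gl_thing);
     *    ...
     *    free(t);
     */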
> >
> > Any opinions, or should mesa just get u_memory.h support for Xmas?
> >
> > Dave.
> >
> 





Re: git and Marge troubles this week

2022-01-07 Thread Connor Abbott
On Fri, Jan 7, 2022 at 6:32 PM Emma Anholt  wrote:
>
> On Fri, Jan 7, 2022 at 6:18 AM Connor Abbott  wrote:
> >
> > Unfortunately batch mode has only made it *worse* - I'm sure it's not
> > intentional, but it seems that it's still running the CI pipelines
> > individually after the batch pipeline passes and not merging them
> > right away, which completely defeats the point. See, for example,
> > !14213, which has gone through 8 cycles of being batched with earlier
> > MRs, 5 of those passing, only to have an earlier job in the batch
> > spuriously fail when actually merging, with Marge seemingly giving up on
> > merging it (???). As I type, it was "lucky" enough to be the first job
> > in a batch that passed and is currently running its pipeline and is
> > blocked on iris-whl-traces-performance (I have !14453 to disable that
> > broken job, but who knows, with the Marge chaos, when it's going to get
> > merged...).
> >
> > Stepping back, I think it was a bad idea to push an "I think this might
> > help" type change like this without first carefully monitoring things
> > afterwards. An hour or so of babysitting Marge would've caught that
> > this wasn't working, and would've prevented many hours of backlog and
> > perception of general CI instability.
>
> I spent the day watching marge, like I do every day.  Looking at the
> logs, we got 0 MRs in during my work hours PST, out of about 14 or so
> marge assignments that day.  Leaving marge broken for the night would
> have been indistinguishable from the status quo, was my assessment.

Yikes, that's awful - and I know it's definitely not easy keeping
everything running!

But unfortunately it seems like the problems that day were transient,
and as I said earlier, there were at least 6 MRs that succeeded and
would've been merged if they weren't batch MRs, so enabling batch mode
did wind up causing some damage compared to doing nothing. Doing it
the next day, when there was some possibility of following up on any
problems, would've been better. Not to take away from what you guys
are doing - just a lesson for next time.

Connor


Re: git and Marge troubles this week

2022-01-07 Thread Ian Romanick
Blarg. That all sounds awful.  I think (hope!) I speak for everyone when 
I say that we all appreciate your and daniels' efforts to keep this big 
piece of machinery working.


If the problems persist much longer, should we consider pushing out the 
22.0 branch point?


On 1/6/22 9:36 PM, Emma Anholt wrote:

As you've probably noticed, there have been issues with git access
this week.  The fd.o sysadmins are desperately trying to stay on
vacation because they do deserve a break, but have still been working
on the problem and a couple of solutions haven't worked out yet.
Hopefully we'll have some news soon.

Due to these ongoing git timeouts, our CI runners have been getting
bogged down with stalled jobs and causing a lot of spurious failures
where the pipeline doesn't get all its jobs assigned to runners before
Marge gives up.  Today, I asked daniels to bump Marge's pipeline
timeout to 4 hours (up from 1).  To get MRs flowing at a similar rate
despite the longer total pipeline times, we also enabled batch mode as
described at 
https://github.com/smarkets/marge-bot/blob/master/README.md#batching-merge-requests.
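
If memory serves, the knobs involved are roughly the marge-bot options
below; this is a sketch, not the exact fd.o invocation:

    # Sketch only - a real deployment also needs the usual auth/project
    # options.
    #   --ci-timeout 4h   wait up to 4 hours for a pipeline instead of 1
    #   --batch           test several assigned MRs together per CI run
    marge-bot --ci-timeout 4h --batch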

It means there are now theoretical cases as described in the README
where Marge might merge a set of code that leaves main broken.
However, those cases are pretty obscure, and I expect that failure
rate to be much lower than the existing "you can merge flaky code"
failure rate and worth the risk.

Hopefully this gets us all productive again.


Re: git and Marge troubles this week

2022-01-07 Thread Connor Abbott
On Fri, Jan 7, 2022 at 6:32 PM Emma Anholt  wrote:
>
> On Fri, Jan 7, 2022 at 6:18 AM Connor Abbott  wrote:
> >
> > Unfortunately batch mode has only made it *worse* - I'm sure it's not
> > intentional, but it seems that it's still running the CI pipelines
> > individually after the batch pipeline passes and not merging them
> > right away, which completely defeats the point. See, for example,
> > !14213, which has gone through 8 cycles of being batched with earlier
> > MRs, 5 of those passing, only to have an earlier job in the batch
> > spuriously fail when actually merging, with Marge seemingly giving up on
> > merging it (???). As I type, it was "lucky" enough to be the first job
> > in a batch that passed and is currently running its pipeline and is
> > blocked on iris-whl-traces-performance (I have !14453 to disable that
> > broken job, but who knows, with the Marge chaos, when it's going to get
> > merged...).
> >
> > Stepping back, I think it was a bad idea to push an "I think this might
> > help" type change like this without first carefully monitoring things
> > afterwards. An hour or so of babysitting Marge would've caught that
> > this wasn't working, and would've prevented many hours of backlog and
> > perception of general CI instability.
>
> I spent the day watching marge, like I do every day.  Looking at the
> logs, we got 0 MRs in during my work hours PST, out of about 14 or so
> marge assignments that day.  Leaving marge broken for the night would
> have been indistinguishable from the status quo, was my assessment.
>
> There was definitely some extra spam about trying batches, more than
> there were actual batches attempted.  My guess would be gitlab
> connection reliability stuff, but I'm not sure.
>
> Of the 5 batches marge attempted before the change was reverted, three
> fell to https://gitlab.freedesktop.org/mesa/mesa/-/issues/5837, one to
> the git fetch failures, and one to a new timeout I don't think I've seen
> before: https://gitlab.freedesktop.org/mesa/mesa/-/jobs/17357425#L1731.
> Of all the sub-MRs involved in those batches, I think two of those
> might have gotten through by dodging the LAVA lab failure.  Marge's batch
> backoff did work, and !14436 and maybe !14433 landed during that time.

Looks like I was a bit off with the numbers, but I double-checked and
these batch MRs containing !14213 all passed and yet it didn't get
merged: !14456, !14452, !14449, !14445, !14440, !14438... so actually
6.

!14436, for whatever reason, was never put into a batch - it worked as
before the change, probably because there weren't other MRs to combine
it with at the time. I've been looking through Marge's history and
can't find a single example where a successful batched merge happened.
Typically, when there's a successful batch MR, the first MR in the
batch gets rebased by Marge but not merged; instead, its pipeline gets
run and (seemingly) Marge moves on and picks some other MR, not even
waiting for it to finish. Since (iirc) Marge picks MRs by
least-recently-active and this generates activity, the MR gets shoved
to the back of the queue and then gets locked in a cycle (!14213 is
the worst, but there are others). I think this happens because Mesa
gates acceptance on the pipeline passing, and therefore when Marge
goes to merge the MRs in a batch she can't, and she just moves on to
the next one.

Connor


Re: git and Marge troubles this week

2022-01-07 Thread Emma Anholt
On Fri, Jan 7, 2022 at 6:18 AM Connor Abbott  wrote:
>
> Unfortunately batch mode has only made it *worse* - I'm sure it's not
> intentional, but it seems that it's still running the CI pipelines
> individually after the batch pipeline passes and not merging them
> right away, which completely defeats the point. See, for example,
> !14213, which has gone through 8 cycles of being batched with earlier
> MRs, 5 of those passing, only to have an earlier job in the batch
> spuriously fail when actually merging, with Marge seemingly giving up on
> merging it (???). As I type, it was "lucky" enough to be the first job
> in a batch that passed and is currently running its pipeline and is
> blocked on iris-whl-traces-performance (I have !14453 to disable that
> broken job, but who knows, with the Marge chaos, when it's going to get
> merged...).
>
> Stepping back, I think it was a bad idea to push an "I think this might
> help" type change like this without first carefully monitoring things
> afterwards. An hour or so of babysitting Marge would've caught that
> this wasn't working, and would've prevented many hours of backlog and
> perception of general CI instability.

I spent the day watching marge, like I do every day.  Looking at the
logs, we got 0 MRs in during my work hours PST, out of about 14 or so
marge assignments that day.  Leaving marge broken for the night would
have been indistinguishable from the status quo, was my assessment.

There was definitely some extra spam about trying batches, more than
there were actual batches attempted.  My guess would be gitlab
connection reliability stuff, but I'm not sure.

Of the 5 batches marge attempted before the change was reverted, three
fell to https://gitlab.freedesktop.org/mesa/mesa/-/issues/5837, one to
the git fetch failures, and one to a new timeout I don't think I've seen
before: https://gitlab.freedesktop.org/mesa/mesa/-/jobs/17357425#L1731.
Of all the sub-MRs involved in those batches, I think two of those
might have gotten through by dodging the LAVA lab failure.  Marge's batch
backoff did work, and !14436 and maybe !14433 landed during that time.


Re: git and Marge troubles this week

2022-01-07 Thread Connor Abbott
On Fri, Jan 7, 2022 at 3:18 PM Connor Abbott  wrote:
>
> Unfortunately batch mode has only made it *worse* - I'm sure it's not
> intentional, but it seems that it's still running the CI pipelines
> individually after the batch pipeline passes and not merging them
> right away, which completely defeats the point. See, for example,
> !14213, which has gone through 8 cycles of being batched with earlier
> MRs, 5 of those passing, only to have an earlier job in the batch
> spuriously fail when actually merging, with Marge seemingly giving up on
> merging it (???). As I type, it was "lucky" enough to be the first job
> in a batch that passed and is currently running its pipeline and is
> blocked on iris-whl-traces-performance (I have !14453 to disable that
> broken job, but who knows, with the Marge chaos, when it's going to get
> merged...).

Ah, I guess I spoke too soon and my analysis of what's happening was
wrong, because in the meantime she merged !14121 without waiting for
that broken job to time out. It's somehow just completely ignoring
!14213 and merging other things on top of it, I guess? Either way,
something is seriously wrong.

>
> Stepping back, I think it was a bad idea to push an "I think this might
> help" type change like this without first carefully monitoring things
> afterwards. An hour or so of babysitting Marge would've caught that
> this wasn't working, and would've prevented many hours of backlog and
> perception of general CI instability.
>
> Connor
>
> On Fri, Jan 7, 2022 at 6:36 AM Emma Anholt  wrote:
> >
> > As you've probably noticed, there have been issues with git access
> > this week.  The fd.o sysadmins are desperately trying to stay on
> > vacation because they do deserve a break, but have still been working
> > on the problem and a couple of solutions haven't worked out yet.
> > Hopefully we'll have some news soon.
> >
> > Due to these ongoing git timeouts, our CI runners have been getting
> > bogged down with stalled jobs and causing a lot of spurious failures
> > where the pipeline doesn't get all its jobs assigned to runners before
> > Marge gives up.  Today, I asked daniels to bump Marge's pipeline
> > timeout to 4 hours (up from 1).  To get MRs flowing at a similar rate
> > despite the longer total pipeline times, we also enabled batch mode as
> > described at 
> > https://github.com/smarkets/marge-bot/blob/master/README.md#batching-merge-requests.
> >
> > It means there are now theoretical cases as described in the README
> > where Marge might merge a set of code that leaves main broken.
> > However, those cases are pretty obscure, and I expect that failure
> > rate to be much lower than the existing "you can merge flaky code"
> > failure rate and worth the risk.
> >
> > Hopefully this gets us all productive again.


Re: git and Marge troubles this week

2022-01-07 Thread Connor Abbott
Unfortunately batch mode has only made it *worse* - I'm sure it's not
intentional, but it seems that it's still running the CI pipelines
individually after the batch pipeline passes and not merging them
right away, which completely defeats the point. See, for example,
!14213, which has gone through 8 cycles of being batched with earlier
MRs, 5 of those passing, only to have an earlier job in the batch
spuriously fail when actually merging, with Marge seemingly giving up
on merging it (???). As I type, it was "lucky" enough to be the first
job in a batch that passed and is currently running its pipeline and
is blocked on iris-whl-traces-performance (I have !14453 to disable
that broken job, but who knows, with the Marge chaos, when it's going
to get merged...).

Stepping back, I think it was a bad idea to push an "I think this might
help" type change like this without first carefully monitoring things
afterwards. An hour or so of babysitting Marge would've caught that
this wasn't working, and would've prevented many hours of backlog and
perception of general CI instability.

Connor

On Fri, Jan 7, 2022 at 6:36 AM Emma Anholt  wrote:
>
> As you've probably noticed, there have been issues with git access
> this week.  The fd.o sysadmins are desperately trying to stay on
> vacation because they do deserve a break, but have still been working
> on the problem and a couple of solutions haven't worked out yet.
> Hopefully we'll have some news soon.
>
> Due to these ongoing git timeouts, our CI runners have been getting
> bogged down with stalled jobs and causing a lot of spurious failures
> where the pipeline doesn't get all its jobs assigned to runners before
> Marge gives up.  Today, I asked daniels to bump Marge's pipeline
> timeout to 4 hours (up from 1).  To get MRs flowing at a similar rate
> despite the longer total pipeline times, we also enabled batch mode as
> described at 
> https://github.com/smarkets/marge-bot/blob/master/README.md#batching-merge-requests.
>
> It means there are now theoretical cases as described in the README
> where Marge might merge a set of code that leaves main broken.
> However, those cases are pretty obscure, and I expect that failure
> rate to be much lower than the existing "you can merge flaky code"
> failure rate and worth the risk.
>
> Hopefully this gets us all productive again.