Re: revenge of CALLOC_STRUCT
I've used it at least once too.

Mike

On Fri, 07 Jan 2022 17:20:11 -0800 Kenneth Graunke wrote:
> I've definitely used the pipe_reference_described debugging code to
> track down some issues when working on iris.
>
> --Ken
>
> On Monday, December 27, 2021 4:18:16 PM PST Marek Olšák wrote:
> > I remember that it wasn't unusual on Windows to define malloc, calloc,
> > strdup, and free macros to redirect those calls to custom wrappers.
> > That would eliminate the need to have non-standard names like MALLOC.
> >
> > There is also the pipe_reference debug code. Is anybody using that?
> >
> > Marek
> >
> > On Sun., Dec. 26, 2021, 05:37 Jose Fonseca wrote:
> > > I believe that as long as the CALLOC_STRUCT calls continue to get
> > > paired with the right FREE call (or equivalent, if these get
> > > renamed), either way should work for us. Whichever proposed option
> > > gets followed, there's a risk these can get out of balance, but the
> > > risk seems comparable.
> > >
> > > For context, IIRC, the main reason these macros remain useful for
> > > VMware is the sad state of memory debugging tools on Windows.
> > > AFAICT, the best hope of one day deprecating this would be to use
> > > AddressSanitizer, which is supported on MSVC [1] but unfortunately
> > > not yet on MinGW w/ GCC [2], which we rely upon a lot for day-to-day
> > > development, using Linux cross-compilation. Using a MinGW w/ Clang
> > > cross compiler seems to be a way to overcome this difficulty, but
> > > that too was still in a somewhat experimental state when I last
> > > tried it.
> > >
> > > Jose
> > >
> > > [1] https://devblogs.microsoft.com/cppblog/asan-for-windows-x64-and-debug-build-support/
> > > [2] https://stackoverflow.com/questions/67619314/cannot-use-fsanitize-address-in-mingw-compiler
> > > --
> > > *From:* Dave Airlie
> > > *Sent:* Wednesday, December 22, 2021 22:35
> > > *To:* mesa-dev ; Jose Fonseca < jfons...@vmware.com>; Brian Paul
> > > *Subject:* revenge of CALLOC_STRUCT
> > >
> > > Hey,
> > >
> > > Happy holidays, and a thought to consider over the break:
> > >
> > > We have the VMware MALLOC/FREE/CALLOC/CALLOC_STRUCT wrappers used
> > > throughout gallium.
> > >
> > > We have ST_CALLOC_STRUCT in use in the mesa state tracker, not used
> > > in gallium.
> > >
> > > With the state tracker merging into mesa proper, and even prior to
> > > this, a few CALLOC_STRUCT uses have started to leak into src/mesa/*.
> > >
> > > Now I don't think we want that, but CALLOC_STRUCT is a genuinely
> > > useful macro on its own, so I'm considering just defining a
> > > calloc_struct instead for the non-gallium use that goes with
> > > malloc/calloc/free.
> > >
> > > Any opinions, or should mesa just get u_memory.h support for Xmas?
> > >
> > > Dave.
Re: revenge of CALLOC_STRUCT
I've definitely used the pipe_reference_described debugging code to
track down some issues when working on iris.

--Ken

On Monday, December 27, 2021 4:18:16 PM PST Marek Olšák wrote:
> I remember that it wasn't unusual on Windows to define malloc, calloc,
> strdup, and free macros to redirect those calls to custom wrappers.
> That would eliminate the need to have non-standard names like MALLOC.
>
> There is also the pipe_reference debug code. Is anybody using that?
>
> Marek
>
> On Sun., Dec. 26, 2021, 05:37 Jose Fonseca wrote:
> > I believe that as long as the CALLOC_STRUCT calls continue to get
> > paired with the right FREE call (or equivalent, if these get renamed),
> > either way should work for us. Whichever proposed option gets
> > followed, there's a risk these can get out of balance, but the risk
> > seems comparable.
> >
> > For context, IIRC, the main reason these macros remain useful for
> > VMware is the sad state of memory debugging tools on Windows. AFAICT,
> > the best hope of one day deprecating this would be to use
> > AddressSanitizer, which is supported on MSVC [1] but unfortunately not
> > yet on MinGW w/ GCC [2], which we rely upon a lot for day-to-day
> > development, using Linux cross-compilation. Using a MinGW w/ Clang
> > cross compiler seems to be a way to overcome this difficulty, but that
> > too was still in a somewhat experimental state when I last tried it.
> >
> > Jose
> >
> > [1] https://devblogs.microsoft.com/cppblog/asan-for-windows-x64-and-debug-build-support/
> > [2] https://stackoverflow.com/questions/67619314/cannot-use-fsanitize-address-in-mingw-compiler
> > --
> > *From:* Dave Airlie
> > *Sent:* Wednesday, December 22, 2021 22:35
> > *To:* mesa-dev ; Jose Fonseca < jfons...@vmware.com>; Brian Paul
> > *Subject:* revenge of CALLOC_STRUCT
> >
> > Hey,
> >
> > Happy holidays, and a thought to consider over the break:
> >
> > We have the VMware MALLOC/FREE/CALLOC/CALLOC_STRUCT wrappers used
> > throughout gallium.
> >
> > We have ST_CALLOC_STRUCT in use in the mesa state tracker, not used
> > in gallium.
> >
> > With the state tracker merging into mesa proper, and even prior to
> > this, a few CALLOC_STRUCT uses have started to leak into src/mesa/*.
> >
> > Now I don't think we want that, but CALLOC_STRUCT is a genuinely
> > useful macro on its own, so I'm considering just defining a
> > calloc_struct instead for the non-gallium use that goes with
> > malloc/calloc/free.
> >
> > Any opinions, or should mesa just get u_memory.h support for Xmas?
> >
> > Dave.
Re: git and Marge troubles this week
On Fri, Jan 7, 2022 at 6:32 PM Emma Anholt wrote:
>
> On Fri, Jan 7, 2022 at 6:18 AM Connor Abbott wrote:
> >
> > Unfortunately batch mode has only made it *worse* - I'm sure it's not
> > intentional, but it seems that it's still running the CI pipelines
> > individually after the batch pipeline passes and not merging them
> > right away, which completely defeats the point. See, for example,
> > !14213, which has gone through 8 cycles of being batched with earlier
> > MRs, 5 of those passing only to have an earlier job in the batch
> > spuriously fail when actually merging, with Marge seemingly giving up
> > on merging it (???). As I type, it was "lucky" enough to be the first
> > job in a batch which passed, and is currently running its pipeline,
> > blocked on iris-whl-traces-performance (I have !14453 to disable that
> > broken job, but who knows with the Marge chaos when it's going to get
> > merged...).
> >
> > Stepping back, I think it was a bad idea to push an "I think this
> > might help" type change like this without first carefully monitoring
> > things afterwards. An hour or so of babysitting Marge would've caught
> > that this wasn't working, and would've prevented many hours of backlog
> > and the perception of general CI instability.
>
> I spent the day watching marge, like I do every day. Looking at the
> logs, we got 0 MRs in during my work hours PST, out of about 14 or so
> marge assignments that day. Leaving marge broken for the night would
> have been indistinguishable from the status quo, was my assessment.

Yikes, that's awful - and I know it's definitely not easy keeping
everything running! But unfortunately it seems like the problems that day
were transient, and as I said earlier there were at least 6 MRs that
succeeded and would've been merged if they weren't batch MRs, so enabling
batch mode did wind up causing some damage compared to doing nothing.

So doing it the next day, when there was some possibility of following up
on any problems, would've been better. Not to take away from what you
guys are doing, just a lesson for next time.

Connor
Re: git and Marge troubles this week
Blarg. That all sounds awful.

I think (hope!) I speak for everyone when I say that we all appreciate
your and daniels' efforts to keep this big piece of machinery working.

If the problems persist much longer, should we consider pushing out the
22.0 branch point?

On 1/6/22 9:36 PM, Emma Anholt wrote:
> As you've probably noticed, there have been issues with git access this
> week. The fd.o sysadmins are desperately trying to stay on vacation
> because they do deserve a break, but have still been working on the
> problem and a couple of solutions haven't worked out yet. Hopefully
> we'll have some news soon.
>
> Due to these ongoing git timeouts, our CI runners have been getting
> bogged down with stalled jobs and causing a lot of spurious failures
> where the pipeline doesn't get all its jobs assigned to runners before
> Marge gives up. Today, I asked daniels to bump Marge's pipeline timeout
> to 4 hours (up from 1). To get MRs flowing at a similar rate despite
> the longer total pipeline times, we also enabled batch mode as
> described at
> https://github.com/smarkets/marge-bot/blob/master/README.md#batching-merge-requests.
>
> It means there are now theoretical cases as described in the README
> where Marge might merge a set of code that leaves main broken.
> However, those cases are pretty obscure, and I expect that failure
> rate to be much lower than the existing "you can merge flaky code"
> failure rate and worth the risk.
>
> Hopefully this gets us all productive again.
Re: git and Marge troubles this week
On Fri, Jan 7, 2022 at 6:32 PM Emma Anholt wrote:
>
> On Fri, Jan 7, 2022 at 6:18 AM Connor Abbott wrote:
> >
> > Unfortunately batch mode has only made it *worse* - I'm sure it's not
> > intentional, but it seems that it's still running the CI pipelines
> > individually after the batch pipeline passes and not merging them
> > right away, which completely defeats the point. See, for example,
> > !14213, which has gone through 8 cycles of being batched with earlier
> > MRs, 5 of those passing only to have an earlier job in the batch
> > spuriously fail when actually merging, with Marge seemingly giving up
> > on merging it (???). As I type, it was "lucky" enough to be the first
> > job in a batch which passed, and is currently running its pipeline,
> > blocked on iris-whl-traces-performance (I have !14453 to disable that
> > broken job, but who knows with the Marge chaos when it's going to get
> > merged...).
> >
> > Stepping back, I think it was a bad idea to push an "I think this
> > might help" type change like this without first carefully monitoring
> > things afterwards. An hour or so of babysitting Marge would've caught
> > that this wasn't working, and would've prevented many hours of backlog
> > and the perception of general CI instability.
>
> I spent the day watching marge, like I do every day. Looking at the
> logs, we got 0 MRs in during my work hours PST, out of about 14 or so
> marge assignments that day. Leaving marge broken for the night would
> have been indistinguishable from the status quo, was my assessment.
>
> There was definitely some extra spam about trying batches, more than
> there were actual batches attempted. My guess would be gitlab
> connection reliability stuff, but I'm not sure.
>
> Of the 5 batches marge attempted before the change was reverted, three
> fell to https://gitlab.freedesktop.org/mesa/mesa/-/issues/5837, one to
> the git fetch fails, and one to a new timeout I don't think I've seen
> before: https://gitlab.freedesktop.org/mesa/mesa/-/jobs/17357425#L1731.
> Of all the sub-MRs involved in those batches, I think two of those
> might have gotten through by dodging the LAVA lab fail. Marge's batch
> backoff did work, and !14436 and maybe !14433 landed during that time.

Looks like I was a bit off with the numbers, but I double-checked, and
these batch MRs containing !14213 all passed and yet it didn't get
merged: !14456, !14452, !14449, !14445, !14440, !14438... so actually 6.
!14436, for whatever reason, was never put into a batch - it worked as
before the change, probably because there weren't other MRs to combine it
with at the time.

I've been looking through Marge's history and can't find a single example
where a successful batched merge happened. Typically, when there's a
successful batch MR, the first MR in the batch gets rebased by Marge but
not merged; instead, its pipeline gets run and (seemingly) Marge moves on
and picks some other MR, not even waiting for it to finish. Since (IIRC)
Marge picks MRs by least-recently-active and this generates activity, the
MR gets shoved to the back of the queue and then gets locked in a cycle
(!14213 is the worst case, but there are others).

I think this happens because Mesa gates acceptance on the pipeline
passing, and therefore when Marge goes to merge the MRs in a batch she
can't, so she just moves on to the next one.

Connor
Re: git and Marge troubles this week
On Fri, Jan 7, 2022 at 6:18 AM Connor Abbott wrote:
>
> Unfortunately batch mode has only made it *worse* - I'm sure it's not
> intentional, but it seems that it's still running the CI pipelines
> individually after the batch pipeline passes and not merging them right
> away, which completely defeats the point. See, for example, !14213,
> which has gone through 8 cycles of being batched with earlier MRs, 5 of
> those passing only to have an earlier job in the batch spuriously fail
> when actually merging, with Marge seemingly giving up on merging it
> (???). As I type, it was "lucky" enough to be the first job in a batch
> which passed, and is currently running its pipeline, blocked on
> iris-whl-traces-performance (I have !14453 to disable that broken job,
> but who knows with the Marge chaos when it's going to get merged...).
>
> Stepping back, I think it was a bad idea to push an "I think this might
> help" type change like this without first carefully monitoring things
> afterwards. An hour or so of babysitting Marge would've caught that
> this wasn't working, and would've prevented many hours of backlog and
> the perception of general CI instability.

I spent the day watching marge, like I do every day. Looking at the
logs, we got 0 MRs in during my work hours PST, out of about 14 or so
marge assignments that day. Leaving marge broken for the night would
have been indistinguishable from the status quo, was my assessment.

There was definitely some extra spam about trying batches, more than
there were actual batches attempted. My guess would be gitlab connection
reliability stuff, but I'm not sure.

Of the 5 batches marge attempted before the change was reverted, three
fell to https://gitlab.freedesktop.org/mesa/mesa/-/issues/5837, one to
the git fetch fails, and one to a new timeout I don't think I've seen
before: https://gitlab.freedesktop.org/mesa/mesa/-/jobs/17357425#L1731.
Of all the sub-MRs involved in those batches, I think two of those might
have gotten through by dodging the LAVA lab fail. Marge's batch backoff
did work, and !14436 and maybe !14433 landed during that time.
Re: git and Marge troubles this week
On Fri, Jan 7, 2022 at 3:18 PM Connor Abbott wrote:
>
> Unfortunately batch mode has only made it *worse* - I'm sure it's not
> intentional, but it seems that it's still running the CI pipelines
> individually after the batch pipeline passes and not merging them right
> away, which completely defeats the point. See, for example, !14213,
> which has gone through 8 cycles of being batched with earlier MRs, 5 of
> those passing only to have an earlier job in the batch spuriously fail
> when actually merging, with Marge seemingly giving up on merging it
> (???). As I type, it was "lucky" enough to be the first job in a batch
> which passed, and is currently running its pipeline, blocked on
> iris-whl-traces-performance (I have !14453 to disable that broken job,
> but who knows with the Marge chaos when it's going to get merged...).

Ah, I guess I spoke too soon, and my analysis of what's happening was
wrong, because in the meantime she merged !14121 without waiting for that
broken job to time out. It's somehow just completely ignoring !14213 and
merging other things on top of it, I guess? Either way, something is
seriously wrong.

> Stepping back, I think it was a bad idea to push an "I think this might
> help" type change like this without first carefully monitoring things
> afterwards. An hour or so of babysitting Marge would've caught that
> this wasn't working, and would've prevented many hours of backlog and
> the perception of general CI instability.
>
> Connor
>
> On Fri, Jan 7, 2022 at 6:36 AM Emma Anholt wrote:
> >
> > As you've probably noticed, there have been issues with git access
> > this week. The fd.o sysadmins are desperately trying to stay on
> > vacation because they do deserve a break, but have still been working
> > on the problem and a couple of solutions haven't worked out yet.
> > Hopefully we'll have some news soon.
> >
> > Due to these ongoing git timeouts, our CI runners have been getting
> > bogged down with stalled jobs and causing a lot of spurious failures
> > where the pipeline doesn't get all its jobs assigned to runners before
> > Marge gives up. Today, I asked daniels to bump Marge's pipeline
> > timeout to 4 hours (up from 1). To get MRs flowing at a similar rate
> > despite the longer total pipeline times, we also enabled batch mode as
> > described at
> > https://github.com/smarkets/marge-bot/blob/master/README.md#batching-merge-requests.
> >
> > It means there are now theoretical cases as described in the README
> > where Marge might merge a set of code that leaves main broken.
> > However, those cases are pretty obscure, and I expect that failure
> > rate to be much lower than the existing "you can merge flaky code"
> > failure rate and worth the risk.
> >
> > Hopefully this gets us all productive again.
Re: git and Marge troubles this week
Unfortunately batch mode has only made it *worse* - I'm sure it's not
intentional, but it seems that it's still running the CI pipelines
individually after the batch pipeline passes and not merging them right
away, which completely defeats the point. See, for example, !14213, which
has gone through 8 cycles of being batched with earlier MRs, 5 of those
passing only to have an earlier job in the batch spuriously fail when
actually merging, with Marge seemingly giving up on merging it (???). As
I type, it was "lucky" enough to be the first job in a batch which
passed, and is currently running its pipeline, blocked on
iris-whl-traces-performance (I have !14453 to disable that broken job,
but who knows with the Marge chaos when it's going to get merged...).

Stepping back, I think it was a bad idea to push an "I think this might
help" type change like this without first carefully monitoring things
afterwards. An hour or so of babysitting Marge would've caught that this
wasn't working, and would've prevented many hours of backlog and the
perception of general CI instability.

Connor

On Fri, Jan 7, 2022 at 6:36 AM Emma Anholt wrote:
>
> As you've probably noticed, there have been issues with git access
> this week. The fd.o sysadmins are desperately trying to stay on
> vacation because they do deserve a break, but have still been working
> on the problem and a couple of solutions haven't worked out yet.
> Hopefully we'll have some news soon.
>
> Due to these ongoing git timeouts, our CI runners have been getting
> bogged down with stalled jobs and causing a lot of spurious failures
> where the pipeline doesn't get all its jobs assigned to runners before
> Marge gives up. Today, I asked daniels to bump Marge's pipeline
> timeout to 4 hours (up from 1). To get MRs flowing at a similar rate
> despite the longer total pipeline times, we also enabled batch mode as
> described at
> https://github.com/smarkets/marge-bot/blob/master/README.md#batching-merge-requests.
>
> It means there are now theoretical cases as described in the README
> where Marge might merge a set of code that leaves main broken.
> However, those cases are pretty obscure, and I expect that failure
> rate to be much lower than the existing "you can merge flaky code"
> failure rate and worth the risk.
>
> Hopefully this gets us all productive again.