Re: How to reduce the time m-i is closed?
Philip Chee wrote:
> I thought that there was a plan to pre-allocate on startup some memory
> for the minidump/crash reporter?

For one thing, I'm not sure how far that went; for another, we are calling
a Windows function to generate the minidump, and I'm not sure we can
reasonably reserve the memory it needs beforehand.

KaiRo
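For context, the Windows function in question is presumably
MiniDumpWriteDump from dbghelp.dll. Below is a minimal sketch of what
writing a dump looks like at that API level; the helper name is
hypothetical, and this is not our actual crash reporter code:

    // Minimal sketch of writing a minidump via dbghelp.dll. Illustrative
    // only - not Breakpad's or Firefox's actual crash reporting code.
    #include <windows.h>
    #include <dbghelp.h>  // link against dbghelp.lib

    // Hypothetical helper: write a dump for the current process to a file.
    bool WriteMinidump(const wchar_t* path, EXCEPTION_POINTERS* exceptionInfo)
    {
        HANDLE file = CreateFileW(path, GENERIC_WRITE, 0, nullptr,
                                  CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL,
                                  nullptr);
        if (file == INVALID_HANDLE_VALUE)
            return false;

        MINIDUMP_EXCEPTION_INFORMATION mei;
        mei.ThreadId = GetCurrentThreadId();
        mei.ExceptionPointers = exceptionInfo;
        mei.ClientPointers = FALSE;

        // MiniDumpWriteDump itself needs memory while it runs, which is
        // why a process that is already OOM can fail to produce a dump.
        BOOL ok = MiniDumpWriteDump(GetCurrentProcess(),
                                    GetCurrentProcessId(), file,
                                    MiniDumpNormal,
                                    exceptionInfo ? &mei : nullptr,
                                    nullptr, nullptr);
        CloseHandle(file);
        return ok == TRUE;
    }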
Re: How to reduce the time m-i is closed?
On 11/21/2013 1:11 PM, Robert Kaiser wrote:
> Philip Chee wrote:
>> I thought that there was a plan to pre-allocate on startup some memory
>> for the minidump/crash reporter?
> For one thing, I'm not sure how far that went; for another, we are
> calling a Windows function to generate the minidump, and I'm not sure we
> can reasonably reserve the memory it needs beforehand.

We did this in bug 837835. We currently reserve 12MB of address space for
the crash reporter. This is apparently either not enough or doesn't work
for many crashes; it doesn't appear to have made a noticeable impact in
converting empty-dump crashes into useful minidumps.

--BDS
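For reference, a minimal sketch of the reserve-at-startup /
release-before-dump pattern that bug 837835 describes, assuming plain
VirtualAlloc/VirtualFree. Only the 12MB figure comes from this thread;
the function names and structure are an illustrative reconstruction, not
the actual crash reporter code:

    #include <windows.h>
    #include <cstddef>

    static void* gReservedBreathingRoom = nullptr;
    static const size_t kReservedSize = 12 * 1024 * 1024;  // 12MB, per thread

    // Called once at startup: reserve a range of address space. MEM_RESERVE
    // without MEM_COMMIT sets aside virtual addresses, not physical pages.
    void ReserveBreathingRoom()
    {
        gReservedBreathingRoom =
            VirtualAlloc(nullptr, kReservedSize, MEM_RESERVE, PAGE_NOACCESS);
    }

    // Called from the exception handler, before writing the minidump: give
    // the reserved range back so the dump writer has address space to use.
    void ReleaseBreathingRoom()
    {
        if (gReservedBreathingRoom) {
            VirtualFree(gReservedBreathingRoom, 0, MEM_RELEASE);
            gReservedBreathingRoom = nullptr;
        }
    }

Note that because MEM_RESERVE only sets aside address space, this scheme
helps when the dump fails for lack of virtual addresses; if the dump writer
also fails to commit physical pages, releasing the reservation would not
help, which may be part of why it hasn't made a noticeable difference.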
Re: How to reduce the time m-i is closed?
Nicholas Nethercote wrote:
> It also assumes that we can back out stuff to fix the problem; we tried
> that to some extent with the first OOM closure -- it is the standard
> response to test failure, of course -- but it didn't work.

Yes, in the case of the OOM issues that caused this closure, they are
probably just a symptom of a larger problem. We've seen a step-by-step
rise in OOM issues over quite some time now, most visibly as an increase
in crashes with empty dumps. I raised an alert about that in bug 837835,
but we couldn't track down a decent regression range (we mostly know in
which 6-week cycle we had regressions; with some assumptions we can narrow
things down a bit further on trunk, but not nearly well enough to get to
candidate checkins). Because of that, this has been lingering without any
real attempt to fix things, and from what I saw in the data, things even
got worse recently - and that's on the release channel, so whatever might
have increased troubles on trunk around this closure is on top of that.

Since in a lot of the cases we're seeing there's apparently too little
memory available for Windows to even create a minidump, we have pretty
little info about those issues - but we do have the additional annotations
we send along with the crash report, and those give us some info
suggesting that in many cases we're running out of virtual memory space
but not necessarily out of physical memory. As I'm told, that can happen
with VM fragmentation, for example, as well as with bugs that map the same
physical page over and over into virtual memory. We're not sure if that's
all in our code or whether system code or (graphics?) driver code exposes
issues to us there.

From what I know, bsmedberg and dmajor are looking into those issues more
closely, both now that we've had the tree closure problem and because it
has been a lingering stability issue for months. I'm sure any help in
those efforts is appreciated, as those are tough issues, and it might be
multiple problems that all contribute a share to the overall issue. Making
us more memory-efficient sounds like a worthwhile goal overall anyhow
(even though the bullet of running out of VM space can be dodged by
switching to Win64 and/or by e10s giving us multiple processes that each
have their own 32-bit virtual memory space - but I'm not sure those should
or will be our primary solutions).

I think in other cases, where a clear cause of a tree-closing issue is
easy to identify, a backout-based process can work better, but with these
OOM issues there's no clear patch or patch set to point to. IMHO, we
should work on the overall cluster of OOM issues, though.

KaiRo
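To make the distinction concrete, here is a minimal sketch of how one can
tell VM-space exhaustion and fragmentation apart from physical memory
pressure on Windows. Purely illustrative - this is not the code that
produces our crash report annotations:

    #include <windows.h>
    #include <cstdio>

    void ReportMemoryState()
    {
        MEMORYSTATUSEX stat = { sizeof(stat) };
        if (GlobalMemoryStatusEx(&stat)) {
            // Low ullAvailVirtual with plenty of ullAvailPhys left is the
            // "out of VM space but not physical memory" signature.
            printf("avail virtual: %llu MB, avail physical: %llu MB\n",
                   stat.ullAvailVirtual / (1024 * 1024),
                   stat.ullAvailPhys / (1024 * 1024));
        }

        // Walk the user address space. With heavy fragmentation, the total
        // free space can be large while the largest single free block is
        // too small for any sizable allocation.
        SYSTEM_INFO si;
        GetSystemInfo(&si);
        SIZE_T largestFree = 0;
        char* addr = static_cast<char*>(si.lpMinimumApplicationAddress);
        char* end = static_cast<char*>(si.lpMaximumApplicationAddress);
        MEMORY_BASIC_INFORMATION mbi;
        while (addr < end &&
               VirtualQuery(addr, &mbi, sizeof(mbi)) == sizeof(mbi)) {
            if (mbi.State == MEM_FREE && mbi.RegionSize > largestFree)
                largestFree = mbi.RegionSize;
            addr = static_cast<char*>(mbi.BaseAddress) + mbi.RegionSize;
        }
        printf("largest free VM block: %llu KB\n",
               static_cast<unsigned long long>(largestFree / 1024));
    }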
Re: How to reduce the time m-i is closed?
On 21/11/2013 00:20, Robert Kaiser wrote:
> Since in a lot of the cases we're seeing there's apparently too little
> memory available for Windows to even create a minidump, we have pretty
> little info about those issues - but we do have the additional
> annotations we send along with the crash report, and those give us some
> info suggesting that in many cases we're running out of virtual memory
> space but not necessarily out of physical memory. As I'm told, that can
> happen with VM fragmentation, for example, as well as with bugs that map
> the same physical page over and over into virtual memory. We're not sure
> if that's all in our code or whether system code or (graphics?) driver
> code exposes issues to us there.

I thought that there was a plan to pre-allocate on startup some memory for
the minidump/crash reporter?

> KaiRo

Phil
--
Philip Chee phi...@aleytys.pc.my, philip.c...@gmail.com
http://flashblock.mozdev.org/ http://xsidebar.mozdev.org
Guard us from the she-wolf and the wolf, and guard us from the thief,
oh Night, and so be good for us to pass.
Re: How to reduce the time m-i is closed?
On 2013-11-18 7:17 AM, Ed Morley wrote:
> On 16/11/2013 15:17, smaug wrote:
>> the recent OOM cases have been really annoying. They have slowed down
>> development, even for those who haven't been dealing with the actual
>> issue(s). Could we handle these kinds of cases differently? Perhaps
>> clone the bad state of m-i to some other repository we're tracking
>> using tbpl, back out stuff from m-i to the state where we can run it,
>> re-open it, and do the fixes in the clone.
>
> Unfortunately, as Nick mentioned, this wasn't possible; otherwise we
> would just have performed a backout similar to those performed several
> times a day when something breaks the tree in a more 'normal' way. The
> closure was due to a seemingly chronic issue that had only been
> highlighted by recent landings (and no one particular landing, since the
> one backout that was performed still didn't make the failures disappear
> entirely). Even if we had just reverted the last week's worth of
> changes, it would not have fixed the root cause - which was that any
> single patch could potentially tip us over the edge into OOM again.

But we still reopened without the root cause being fixed, didn't we? What
am I missing?

Cheers,
Ehsan
Re: How to reduce the time m-i is closed?
On Sun, Nov 17, 2013 at 2:17 AM, smaug <sm...@welho.com> wrote:
> the recent OOM cases have been really annoying. They have slowed down
> development, even for those who haven't been dealing with the actual
> issue(s). Could we handle these kinds of cases differently? Perhaps
> clone the bad state of m-i to some other repository we're tracking using
> tbpl, back out stuff from m-i to the state where we can run it, re-open
> it, and do the fixes in the clone. And then, say in a week, merge the
> clone back to m-i. If the state is still bad (no one has stepped up to
> fix the issues), then keep m-i closed until the issues have been fixed.

Sounds complicated. It also assumes that we can back out stuff to fix the
problem; we tried that to some extent with the first OOM closure -- it is
the standard response to test failure, of course -- but it didn't work.

More generally, I don't like the idea of making this kind of breakage
normal. I'd prefer to see effort go towards preventing it rather than
tolerating it.

Nick
How to reduce the time m-i is closed?
Hi all,

the recent OOM cases have been really annoying. They have slowed down
development, even for those who haven't been dealing with the actual
issue(s).

Could we handle these kinds of cases differently? Perhaps clone the bad
state of m-i to some other repository we're tracking using tbpl, back out
stuff from m-i to the state where we can run it, re-open it, and do the
fixes in the clone. And then, say in a week, merge the clone back to m-i.
If the state is still bad (no one has stepped up to fix the issues), then
keep m-i closed until the issues have been fixed.

thoughts?

-Olli