Re: Crash database requirements (was: The need for apport hooks (was: Re: SRUs for typo fixes in descriptions))
It might be good to look at the Android equivalent of this too:
http://android-developers.blogspot.com/2010/05/google-feedback-for-android.html

Martin

--
ubuntu-devel mailing list
ubuntu-devel@lists.ubuntu.com
Modify settings or unsubscribe at: https://lists.ubuntu.com/mailman/listinfo/ubuntu-devel
Re: Crash database requirements (was: The need for apport hooks (was: Re: SRUs for typo fixes in descriptions))
On Tue, Aug 9, 2011 at 5:37 PM, Christopher James Halse Rogers wrote:
> While we're using the terminology "crash report", I want to ensure that
> there's a sufficiently general understanding of what this means. I
> think we'd want this to cover at least:
> * Actual C-style crashes, with core.
> * Unhandled exceptions, such as you'd get from Python et al
> * Kernel oops and panics
> * Intel GPU dump output
> * dmesg & Xorg.0.log, triggered by GPU hangs

All of those fit what I've been talking about :) - I'd extend the list
with:
* Nonfatal but significant issues (e.g. in LP a page that is slower
  than N seconds is logged with full data gathering but does not show
  an error to the user). A desktop example might be requested X
  extensions that a driver doesn't support well (giving data to inform
  development decisions such as optimisation efforts).

-Rob
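The nonfatal case above can be sketched in a few lines of Python. This is only an illustration, not how Launchpad does it: the name `report_if_slow`, the report fields and the 2 second default standing in for "N seconds" are all assumptions made for the example.

```python
import time
from contextlib import contextmanager

# Hypothetical stand-in for the "N seconds" threshold mentioned above.
SLOW_THRESHOLD_SECONDS = 2.0


@contextmanager
def report_if_slow(name, reports, threshold=SLOW_THRESHOLD_SECONDS,
                   clock=time.monotonic, context=None):
    """Time a block of work and record a nonfatal report if it was slow.

    Nothing is raised or shown to the user; the report is only appended
    to `reports` (a plain list standing in for the crash database).
    """
    start = clock()
    try:
        yield
    finally:
        elapsed = clock() - start
        if elapsed > threshold:
            reports.append({
                "type": "slow-request",
                "name": name,
                "elapsed": elapsed,
                "fatal": False,
                # Whatever extra data was gathered for the report.
                "context": dict(context or {}),
            })
```

A request handler would wrap its work in `with report_if_slow("/bugs", reports): ...`; the `clock` argument exists only so the timing can be faked in tests.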
Re: Crash database requirements (was: The need for apport hooks (was: Re: SRUs for typo fixes in descriptions))
On Mon, 2011-08-08 at 00:34 -0700, Bryce Harrington wrote:
> On Mon, Aug 08, 2011 at 12:18:51AM -0700, Bryce Harrington wrote:
> > > I don't know if we've actually written down what we want out of a crash
> > > database, though. Do we have a requirements document for one? If the
> > > Launchpad team wanted to devote some time to adding a crash database do
> > > they know what we want out of such a beast?
> >
> > I agree, that seems like an important first step. At this point it's
> > unclear whether Launchpad's needs overlap with ours or if they're highly
> > divergent.
>
> Here's a start...
>
> * A collection of files are gathered client-side and inserted into the
>   crash database record.
>
> * Processed versions of files (i.e. retracer output) can be added
>   subsequently.
>
> * Some files must be kept private (i.e. core dumps)
>
> * Traces from multiple crash reports are algorithmically compared to
>   find exact-dupes and likely-dupes.
>
> * Crash reports can be grouped by package, by distro release, or by both.
>
> * Statistics are generated to show number of [exact|exact+likely] dupes
>   for each type of crash. Statistics can be provided by package, by
>   distro release, by date range, or a combination.
>
> * Bug report(s) can be associated with a given set of crashes.
>
> * The user should have some way to check back on the status of their
>   crash report; e.g. have some report ID they can look at to see
>   statistics and/or any associated bug #.

To this list I'd add:

* It should be brainlessly easy for users to submit this data. Either a
  single "Yes, submit this crash" confirmation, or a check box to
  automatically submit these crashes. One of the features that the X
  team really desire out of this sort of database is "how frequent is
  this kind of problem", which requires the widest possible sample
  space.

* For X and kernel crashes (at least), these reports need to be
  indexable by hardware. That is, we want to be able to answer both
  "how prevalent are GPU hangs on Intel hardware?" and "on what
  hardware does this GPU hang appear?". Probably either DMI data or
  PCI IDs or both are needed for this.

While we're using the terminology "crash report", I want to ensure that
there's a sufficiently general understanding of what this means. I
think we'd want this to cover at least:
* Actual C-style crashes, with core.
* Unhandled exceptions, such as you'd get from Python et al
* Kernel oops and panics
* Intel GPU dump output
* dmesg & Xorg.0.log, triggered by GPU hangs

CHRis.
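The two hardware questions above amount to indexing the same reports in both directions. A minimal sketch, assuming each report is a dict carrying a crash `signature` and a `pci_id` of the form `vendor:device`; the field names and helper functions are made up for illustration and are not apport's format. The only outside fact used is that 8086 is Intel's PCI vendor ID.

```python
from collections import Counter, defaultdict

INTEL_VENDOR_ID = "8086"  # PCI vendor ID assigned to Intel


def index_by_hardware(reports):
    """Build both lookups from an iterable of report dicts.

    Returns (by_signature, by_device):
      by_signature[signature] -> set of PCI IDs the crash was seen on
      by_device[pci_id]       -> Counter of signatures seen on that device
    """
    by_signature = defaultdict(set)
    by_device = defaultdict(Counter)
    for report in reports:
        by_signature[report["signature"]].add(report["pci_id"])
        by_device[report["pci_id"]][report["signature"]] += 1
    return by_signature, by_device


def count_for_vendor(by_device, vendor_id, signature_prefix=""):
    """How many reports matching the prefix came from one vendor's devices."""
    total = 0
    for pci_id, signatures in by_device.items():
        if pci_id.split(":")[0] != vendor_id:
            continue
        total += sum(count for sig, count in signatures.items()
                     if sig.startswith(signature_prefix))
    return total
```

With this, "on what hardware does this GPU hang appear?" is `by_signature["gpu-hang-A"]`, and "how prevalent are GPU hangs on Intel hardware?" is `count_for_vendor(by_device, INTEL_VENDOR_ID, "gpu-hang")`. DMI data could be indexed the same way alongside the PCI ID.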
Re: Crash database requirements (was: The need for apport hooks (was: Re: SRUs for typo fixes in descriptions))
On Mon, Aug 8, 2011 at 8:05 PM, Rick Spencer wrote:
> On Mon, 2011-08-08 at 19:44 +1200, Robert Collins wrote:
>> On Mon, Aug 8, 2011 at 6:25 PM, Christopher James Halse Rogers
>> wrote:
>> The LP team doesn't *currently* have work on this in our short term
>> plans; if the stakeholders wanted to negotiate queue jumping, we could
>> put a squad on it at the end of the next(ish) project, or some folk
>> may do something in scratch time (like my experiment with cassandra
>> has been).
>>
>
> I'm wondering why the launchpad team would work on this. It seems that
> the desire is to collect the crash data outside of lp bug reports.

LP isn't defined by the features we have - they shift over time :)

> Am I missing something?

There are a few potential reasons.

One is very simple: we need it ourselves; we logged about 14 thousand
crash reports yesterday from Launchpad itself into our first-gen crash
database (e.g. https://lp-oops.canonical.com/oops.py/?oopsid=1971CBA7).
We have some automated analysis of this today but it's a kludgy system:
high latency, no ad hoc analysis, single tenant (and as Launchpad moves
to a more SOA model, multitenancy or something like it will matter
more), no efficient searching. This is much the same reason that
Mozilla has built a crash database, for instance.

Secondly, one of the things Launchpad exists to do is to provide a
solid platform for folk writing software - I see a generic, extensible
crash database as something many projects would benefit from. That
would be a separate project of course, but a relatively small step up
from a just-for-our-own-crashes service. Right now I know of the
following Canonical sponsored-or-related projects which could use such
a service: Ubuntu One (server), Landscape (server) and Canonical ISD
(SSO etc). Not to mention Ubuntu, of course. I can well imagine
projects like Drizzle, OpenERP or OpenStack wanting to gather
anonymised crash data even when folk are running snapshots, or
unpackaged builds. (Packages on Ubuntu would naturally get reports
gathered without the upstream being a direct user, but there are lots
of unpackaged use cases for regular software, let alone web server
stacks (consider all the virtualenv based Django deployments out
there...).)

Thirdly, as a team Launchpad /started/ as a project to satisfy the
server-side infrastructure needs of Ubuntu; supporting Ubuntu's
collaboration with Debian, with Upstream, supporting Translations of
Ubuntu, and over time this has grown - soyuz was written for Ubuntu,
blueprints for UDS. Of course, Launchpad has users beyond Ubuntu, in
our capacity as a hosting forge for upstreams, and as a hosting site
for derivatives like the OEM flavours and Linaro, but providing
services that are primarily, or even exclusively, needed by Ubuntu is
well within our mandate. The crash database concept is clearly not one
of these exclusively-for-Ubuntu cases (even without aiming at
multitenancy, LP needs one itself). We had painted ourselves into a
process corner a while back which led to us having to push back on the
amount of things we undertake... but the team have been working
fearlessly to fix that, and I think we're over the hump now - on the
way back to short, high quality and fast iterations.

Fourth, the total workflow desired is to capture crashes and derive bug
reports from that: defining Launchpad's relationship to crashes by our
current implementation would be overly strict IMO: I think LP will
*need* to be involved, at minimum, in the crashdb -> bug mapping
process / mechanism (e.g. for questions like 'which crash reports are
related to this bug' to be answered in the web UI). It doesn't matter
whether the crash database is maintained by a different team or not,
we'll need integration.

Lastly, while the particular form of scalability needed is different to
that needed by Launchpad's existing services, writing web services is
what the Launchpad team is all about, and we've several services that
have similar scaling needs (though different in detail and in likely
implementation) - things like the librarian and codehosting. We're
already planning on how we will split those out into separate web
services which can be scaled separately and reused more effectively.

-Rob
Re: Crash database requirements (was: The need for apport hooks (was: Re: SRUs for typo fixes in descriptions))
On Mon, 2011-08-08 at 19:44 +1200, Robert Collins wrote:
> On Mon, Aug 8, 2011 at 6:25 PM, Christopher James Halse Rogers
> wrote:
> The LP team doesn't *currently* have work on this in our short term
> plans; if the stakeholders wanted to negotiate queue jumping, we could
> put a squad on it at the end of the next(ish) project, or some folk
> may do something in scratch time (like my experiment with cassandra
> has been).
>

I'm wondering why the launchpad team would work on this. It seems that
the desire is to collect the crash data outside of lp bug reports.

Am I missing something?

Cheers,
Rick
Re: Crash database requirements (was: The need for apport hooks (was: Re: SRUs for typo fixes in descriptions))
On Mon, Aug 8, 2011 at 6:25 PM, Christopher James Halse Rogers wrote:
> I know it's been proposed and discussed previously, and I believe Robert
> Collins both wants something similar for Launchpad and has done some
> work towards making it happen.

I have a draft implementation of a core db server intended for use with
LP's own crash reporting requirements, and hopefully scalable enough to
handle a high volume of reports (e.g. 1M/day).

> I don't know if we've actually written down what we want out of a crash
> database, though. Do we have a requirements document for one? If the
> Launchpad team wanted to devote some time to adding a crash database do
> they know what we want out of such a beast?

The LP team doesn't *currently* have work on this in our short term
plans; if the stakeholders wanted to negotiate queue jumping, we could
put a squad on it at the end of the next(ish) project, or some folk
may do something in scratch time (like my experiment with cassandra
has been).

> Is there a LEP for a crash database? If not, perhaps we could gather
> requirements in this thread and then write one?

Yes, there are two in active discussion I believe:

There is this: https://dev.launchpad.net/LEP/OopsDisplay (ignore the
name, which could be better).

And the Ubuntu folk & the Dublin sprint are putting some distro side
needs together - https://wiki.ubuntu.com/CrashTracker - which is on my
TODO list to provide some thoughts on the server design side.

As part of the Launchpad SOA process we'll be splitting out various
crash report tools that are currently part of the LP codebase itself -
into small reusable Python modules. I'd be delighted if they are
generic enough to fit in with Ubuntu's needs here - ideally through
migrating parts of apport into using the core facilities [we looked at
using apport for LP's needs, but the bits we need are different enough
that the engineers assessing it decided not to].

-Rob
Re: Crash database requirements (was: The need for apport hooks (was: Re: SRUs for typo fixes in descriptions))
On Mon, Aug 08, 2011 at 12:18:51AM -0700, Bryce Harrington wrote:
> > I don't know if we've actually written down what we want out of a crash
> > database, though. Do we have a requirements document for one? If the
> > Launchpad team wanted to devote some time to adding a crash database do
> > they know what we want out of such a beast?
>
> I agree, that seems like an important first step. At this point it's
> unclear whether Launchpad's needs overlap with ours or if they're highly
> divergent.

Here's a start...

* A collection of files are gathered client-side and inserted into the
  crash database record.

* Processed versions of files (i.e. retracer output) can be added
  subsequently.

* Some files must be kept private (i.e. core dumps)

* Traces from multiple crash reports are algorithmically compared to
  find exact-dupes and likely-dupes.

* Crash reports can be grouped by package, by distro release, or by both.

* Statistics are generated to show number of [exact|exact+likely] dupes
  for each type of crash. Statistics can be provided by package, by
  distro release, by date range, or a combination.

* Bug report(s) can be associated with a given set of crashes.

* The user should have some way to check back on the status of their
  crash report; e.g. have some report ID they can look at to see
  statistics and/or any associated bug #.

Bryce
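To make the list above a little more concrete, here is a rough Python sketch of what a record and the dupe statistics could look like. Everything in it is an assumption for illustration: the `CrashReport` fields, hashing the whole stack for exact dupes, and treating "same top three frames" as a likely dupe are not taken from apport or from any Launchpad design.

```python
import hashlib
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class CrashReport:
    report_id: str   # ID the user can check back on later
    package: str
    release: str
    frames: list     # function names, innermost first (e.g. retracer output)
    files: dict = field(default_factory=dict)         # name -> content
    private_files: set = field(default_factory=set)   # e.g. {"CoreDump"}
    bug_ids: set = field(default_factory=set)         # associated bug numbers


def _digest(frames):
    return hashlib.sha1("\n".join(frames).encode("utf-8")).hexdigest()


def exact_signature(report):
    """Two reports are exact dupes when their whole stacks match."""
    return _digest(report.frames)


def likely_signature(report, depth=3):
    """Assumed heuristic: likely dupes share their top `depth` frames."""
    return _digest(report.frames[:depth])


def dupe_statistics(reports, package=None, release=None):
    """Count reports per signature, optionally filtered by package/release.

    Returns (exact, likely) Counters. Exact dupes also share their top
    frames, so the likely counts are the "exact+likely" figure.
    """
    selected = [r for r in reports
                if (package is None or r.package == package)
                and (release is None or r.release == release)]
    exact = Counter(exact_signature(r) for r in selected)
    likely = Counter(likely_signature(r) for r in selected)
    return exact, likely
```

Filtering by date range would be one more optional argument of the same kind, and a status lookup for a user is then just the counts for their report's signature plus its `bug_ids`.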
Re: Crash database requirements (was: The need for apport hooks (was: Re: SRUs for typo fixes in descriptions))
On Mon, Aug 08, 2011 at 04:25:30PM +1000, Christopher James Halse Rogers wrote:
> On Sun, 2011-08-07 at 22:46 -0700, Bryce Harrington wrote:
> > On Mon, Aug 08, 2011 at 07:17:52AM +0200, Rick Spencer wrote:
> > > On Sat, 2011-08-06 at 15:46 -0700, Bryce Harrington wrote:
> > > > >
> > > > > I think we are doing it wrong: we should collect
> > > > > crashes on all supported releases.
> > > >
> > > > I agree. As designed, apport files a new bug report for each crash,
> > > > which can quickly lead to excessive numbers of dupe bug reports (there
> > > > are ways of making apport auto-dupe, but this takes effort to set up and
> > > > isn't always 100% reliable). This can quickly become unmanageable
> > > > especially for packages that lack someone to keep an eye on the bug
> > > > reports.
> > > >
> > > > In any case, these types of reports post-release are most useful in
> > > > aggregate rather than as individual bug reports. If they were filed in
> > > > some ultra simple crash database (with no signup required of the user)
> > > > we could get most of the value without incurring a lot of extra bug
> > > > labor.
> > > Has this very thing not been proposed? We should discuss doing this for
> > > 12.04.
> >
> > It's been proposed and discussed previously, like ScottK mentioned. The
> > issue has been finding someone with time to work on it.
>
> I know it's been proposed and discussed previously, and I believe Robert
> Collins both wants something similar for Launchpad and has done some
> work towards making it happen.

Yes, there's an open bug report against LP for it I ran across at one
point.

> I don't know if we've actually written down what we want out of a crash
> database, though. Do we have a requirements document for one? If the
> Launchpad team wanted to devote some time to adding a crash database do
> they know what we want out of such a beast?

I agree, that seems like an important first step. At this point it's
unclear whether Launchpad's needs overlap with ours or if they're
highly divergent.

> Is there a LEP for a crash database? If not, perhaps we could gather
> requirements in this thread and then write one?

There was a LEP started several years ago but it was abandoned and the
spec is considered obsolete by the LP folks. There are no active LEPs
that I know of.

Bryce