Re: Crash database requirements (was: The need for apport hooks (was: Re: SRUs for typo fixes in descriptions))

2011-08-09 Thread Martin Pool
It might be good to look at the Android equivalent of this too:

  
http://android-developers.blogspot.com/2010/05/google-feedback-for-android.html

Martin

-- 
ubuntu-devel mailing list
ubuntu-devel@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/ubuntu-devel


Re: Crash database requirements (was: The need for apport hooks (was: Re: SRUs for typo fixes in descriptions))

2011-08-08 Thread Robert Collins
On Tue, Aug 9, 2011 at 5:37 PM, Christopher James Halse Rogers
 wrote:
> While we're using the terminology "crash report", I want to ensure that
> there's a sufficiently general understanding of what this means.  I
> think we'd want this to cover at least:
>  * Actual C-style crashes, with core.
>  * Unhandled exceptions, such as you'd get from Python et al
>  * Kernel oops and panics
>  * Intel GPU dump output
>  * dmesg & Xorg.0.log, triggered by GPU hangs

All of those fit what I've been talking about :) - I'd extend the list with:
 * Nonfatal but significant issues (e.g. in LP a page that is slower
than N seconds is logged with full data gathering but does not show an
error to the user.) A desktop example might be requested X extensions
that a driver doesn't support well (giving data to inform development
decisions such as optimisation efforts).
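
To make the nonfatal case concrete, here is a rough sketch of "log a full report when something is slower than N seconds, but show no error". It is illustrative only: the threshold value, the `record_if_slow` helper and the report fields are all assumptions, not LP's actual code.

```python
import time
from contextlib import contextmanager

# Hypothetical value for "N seconds"; the real threshold is not given here.
SLOW_THRESHOLD_SECONDS = 2.0

# Stand-in for submitting to a crash database.
reports = []


@contextmanager
def record_if_slow(name, threshold=SLOW_THRESHOLD_SECONDS, clock=time.monotonic):
    """Time the wrapped block and file a nonfatal report if it was slow.

    Nothing is raised and nothing is shown to the user; the report is
    only recorded for later analysis.
    """
    start = clock()
    try:
        yield
    finally:
        elapsed = clock() - start
        if elapsed > threshold:
            reports.append(
                {"kind": "nonfatal-slow", "name": name, "elapsed": elapsed}
            )
```

Wrapping a request handler in `with record_if_slow("some-page"):` would then gather data on slow pages without changing what the user sees.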

-Rob



Re: Crash database requirements (was: The need for apport hooks (was: Re: SRUs for typo fixes in descriptions))

2011-08-08 Thread Christopher James Halse Rogers
On Mon, 2011-08-08 at 00:34 -0700, Bryce Harrington wrote:
> On Mon, Aug 08, 2011 at 12:18:51AM -0700, Bryce Harrington wrote:
> > > I don't know if we've actually written down what we want out of a crash
> > > database, though.  Do we have a requirements document for one?  If the
> > > Launchpad team wanted to devote some time to adding a crash database do
> > > they know what we want out of such a beast?
> > 
> > I agree, that seems like an important first step.  At this point it's
> > unclear whether Launchpad's needs overlap with ours or if they're highly
> > divergent.
> 
> Here's a start...
> 
>  * A collection of files are gathered client-side and inserted into the
>    crash database record.
> 
>  * Processed versions of files (i.e. retracer output) can be added
>    subsequently.
> 
>  * Some files must be kept private (i.e. core dumps)
> 
>  * Traces from multiple crash reports are algorithmically compared to
>    find exact-dupes and likely-dupes.
> 
>  * Crash reports can be grouped by package, by distro release, or by both.
> 
>  * Statistics are generated to show number of [exact|exact+likely] dupes
>    for each type of crash.  Statistics can be provided by package, by
>    distro release, by date range, or a combination.
> 
>  * Bug report(s) can be associated with a given set of crashes.
> 
>  * The user should have some way to check back on the status of their
>    crash report; e.g. have some report ID they can look at to see
>    statistics and/or any associated bug #.

To this list I'd add: 
 * It should be brainlessly easy for users to submit this data.  Either
a single "Yes, submit this crash" confirmation, or a check box to
automatically submit these crashes.  One of the features that the X team
really desire out of this sort of database is "how frequent is this kind
of problem", which requires the widest possible sample space.

 * For X and kernel crashes (at least), these reports need to be
indexable by hardware.  That is, we want to be able to answer both "how
prevalent are GPU hangs on Intel hardware?" and "on what hardware does
this GPU hang appear?".  Probably either DMI data or PCIIDs or both are
needed for this.
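
A minimal sketch of what "indexable by hardware" could mean in practice, assuming each report carries a PCI ID string and some crash signature. The `HardwareIndex` class and its method names are hypothetical, and DMI data could be used as the key in the same way.

```python
from collections import Counter, defaultdict


class HardwareIndex:
    """Two-way index between hardware (here a PCI ID) and crash signatures."""

    def __init__(self):
        # pci_id -> Counter of signature -> number of reports
        self._by_hardware = defaultdict(Counter)
        # signature -> set of pci_ids the crash was seen on
        self._by_signature = defaultdict(set)

    def add(self, pci_id, signature):
        self._by_hardware[pci_id][signature] += 1
        self._by_signature[signature].add(pci_id)

    def signatures_on(self, pci_id):
        """How prevalent is each kind of crash on this hardware?"""
        return dict(self._by_hardware.get(pci_id, {}))

    def hardware_for(self, signature):
        """On what hardware does this crash appear?"""
        return sorted(self._by_signature.get(signature, ()))
```

Keeping both directions up to date at insert time means either question is a single lookup rather than a scan over all reports.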

While we're using the terminology "crash report", I want to ensure that
there's a sufficiently general understanding of what this means.  I
think we'd want this to cover at least:
 * Actual C-style crashes, with core.
 * Unhandled exceptions, such as you'd get from Python et al
 * Kernel oops and panics
 * Intel GPU dump output
 * dmesg & Xorg.0.log, triggered by GPU hangs

CHRis.




Re: Crash database requirements (was: The need for apport hooks (was: Re: SRUs for typo fixes in descriptions))

2011-08-08 Thread Robert Collins
On Mon, Aug 8, 2011 at 8:05 PM, Rick Spencer  wrote:
> On Mon, 2011-08-08 at 19:44 +1200, Robert Collins wrote:
>> On Mon, Aug 8, 2011 at 6:25 PM, Christopher James Halse Rogers
>>  wrote:
>> The LP team doesn't *currently* have work on this in our short term
>> plans; if the stakeholders wanted to negotiate queue jumping, we could
>> put a squad on it at the end of the next(ish) project, or some folk
>> may do something in scratch time (like my experiment with cassandra
>> has been).
>>
>
> I'm wondering why the launchpad team would work on this. It seems that
> the desire is to collect the crash data outside of lp bug reports.

LP isn't defined by the features we have - they shift over time :)

> Am I missing something?

There are a few potential reasons.

One is very simple: we need it ourselves; we logged about 14 thousand
crash reports yesterday from Launchpad itself into our first-gen crash
database (e.g. https://lp-oops.canonical.com/oops.py/?oopsid=1971CBA7).
We have some automated analysis of this today but it's a kludgy system:
high latency, no adhoc analysis, single tenant (and as Launchpad moves
to a more SOA model multitenancy or something like it will matter
more), no efficient searching. This is much the same reason that
Mozilla has built a crash database, for instance.

Secondly, one of the things Launchpad exists to do is to provide a
solid platform for folk writing software - I see a generic, extensible
crash database as something many projects would benefit from. That
would be a separate project of course, but a relatively small step up
from a just-for-our-own-crashes service. Right now I know of the
following Canonical sponsored-or-related projects which could use such
a service: Ubuntu One (server), Landscape (server) and Canonical ISD
(SSO etc). Not to mention Ubuntu, of course. I can well imagine
projects like drizzle, openerp or openstack wanting to gather
anonymised crash data even when folk are running snapshots or
unpackaged builds. (Packages on Ubuntu would naturally get reports
gathered without the upstream being a direct user, but there are lots
of unpackaged use cases for regular software, let alone web server
stacks: consider all the virtualenv based django deployments out
there...)

Thirdly, as a team Launchpad /started/ as a project to satisfy the
server-side infrastructure needs of Ubuntu; supporting Ubuntu's
collaboration with Debian, with Upstream, supporting Translations of
Ubuntu, and over time this has grown - soyuz was written for Ubuntu,
blueprints for UDS. Of course, Launchpad has users beyond Ubuntu, in
our capacity as a hosting forge for upstreams, and as a hosting site
for derivatives like the OEM flavours and Linaro, but providing
services that are primarily, or even exclusively, needed by Ubuntu is
well within our mandate. The crash database concept is clearly not one
of these exclusively-for-Ubuntu cases (even without aiming at
multitenancy, LP needs one itself). We had painted ourselves into a
process corner a while back which led to us having to push back on
the amount of things we undertake... but the team have been working
fearlessly to fix that, and I think we're over the hump now - on the
way back to short, high quality and fast iterations.

Fourth, the total workflow desired is to capture crashes and derive
bug reports from that. Defining Launchpad's relationship to crashes by
our current implementation would be overly strict IMO: I think LP will
*need* to be involved, at minimum, in the crashdb -> bug mapping
process / mechanism (e.g. for questions like 'which crash reports are
related to this bug' to be answered in the web UI). It doesn't matter
whether the crash database is maintained by a different team or not,
we'll need integration.
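
To illustrate the crashdb -> bug mapping, here is a rough sketch of the kind of two-way link that would let a web UI answer 'which crash reports are related to this bug'. The `CrashBugLinks` class, its method names and the use of a crash signature as the key are assumptions for illustration; nothing here is an existing LP API.

```python
from collections import defaultdict


class CrashBugLinks:
    """Many-to-many links between crash signatures and bug numbers."""

    def __init__(self):
        self._crashes_by_bug = defaultdict(set)
        self._bugs_by_crash = defaultdict(set)

    def link(self, signature, bug_id):
        # Record the association in both directions.
        self._crashes_by_bug[bug_id].add(signature)
        self._bugs_by_crash[signature].add(bug_id)

    def crashes_for_bug(self, bug_id):
        """Which crash reports are related to this bug?"""
        return sorted(self._crashes_by_bug.get(bug_id, ()))

    def bugs_for_crash(self, signature):
        """Which bugs, if any, has this crash been associated with?"""
        return sorted(self._bugs_by_crash.get(signature, ()))
```

Whichever team ends up hosting the crash database, something shaped like this has to be reachable from both sides, which is why integration is needed either way.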

Lastly, while the particular form of scalability needed is different
to that needed by Launchpad's existing services, writing web services
is what the Launchpad team is all about, and we've several services
that have similar scaling needs (though different in detail and in
likely implementation) - things like the librarian and codehosting.
We're already planning on how we will split those out into separate
web services which can be scaled separately and reused more
effectively.

-Rob



Re: Crash database requirements (was: The need for apport hooks (was: Re: SRUs for typo fixes in descriptions))

2011-08-08 Thread Rick Spencer
On Mon, 2011-08-08 at 19:44 +1200, Robert Collins wrote:
> On Mon, Aug 8, 2011 at 6:25 PM, Christopher James Halse Rogers
>  wrote:
> The LP team doesn't *currently* have work on this in our short term
> plans; if the stakeholders wanted to negotiate queue jumping, we could
> put a squad on it at the end of the next(ish) project, or some folk
> may do something in scratch time (like my experiment with cassandra
> has been).
> 

I'm wondering why the launchpad team would work on this. It seems that
the desire is to collect the crash data outside of lp bug reports. 

Am I missing something?

Cheers, Rick




Re: Crash database requirements (was: The need for apport hooks (was: Re: SRUs for typo fixes in descriptions))

2011-08-08 Thread Robert Collins
On Mon, Aug 8, 2011 at 6:25 PM, Christopher James Halse Rogers
 wrote:
> I know it's been proposed and discussed previously, and I believe Robert
> Collins both wants something similar for Launchpad and has done some
> work towards making it happen.

I have a draft implementation of a core db server intended for use
with LP's own crash reporting requirements, and hopefully scalable
enough to handle a high volume of reports (e.g. 1M/day).
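
For a sense of scale, 1M reports/day averages out to roughly 11.6 reports per second. The arithmetic is sketched below; the peak multiplier is purely a hypothetical parameter, since real traffic will not arrive evenly and no burst figure is given here.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86400


def average_rate(reports_per_day):
    """Average reports per second for a given daily volume."""
    return reports_per_day / SECONDS_PER_DAY


def peak_rate(reports_per_day, peak_factor):
    """Sustained rate during a burst, for an assumed peak-to-average factor."""
    return average_rate(reports_per_day) * peak_factor


# 1,000,000 / 86,400 is about 11.57 reports per second on average.
print(round(average_rate(1_000_000), 2))
```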

> I don't know if we've actually written down what we want out of a crash
> database, though.  Do we have a requirements document for one?  If the
> Launchpad team wanted to devote some time to adding a crash database do
> they know what we want out of such a beast?

The LP team doesn't *currently* have work on this in our short term
plans; if the stakeholders wanted to negotiate queue jumping, we could
put a squad on it at the end of the next(ish) project, or some folk
may do something in scratch time (like my experiment with cassandra
has been).

> Is there a LEP for a crash database?  If not, perhaps we could gather
> requirements in this thread and then write one?

Yes, there are two in active discussion I believe:

There is this: https://dev.launchpad.net/LEP/OopsDisplay (ignore the
name, which could be better).

And the Ubuntu folk & the Dublin sprint are putting some distro side
needs together - https://wiki.ubuntu.com/CrashTracker - which is on my
TODO list to provide some thoughts on the server design side.

As part of the Launchpad SOA process we'll be splitting out various
crash report tools that are currently part of the LP codebase itself -
into small reusable python modules. I'd be delighted if they are
generic enough to fit in with Ubuntu's needs here - ideally through
migrating parts of apport into using the core facilities [we looked at
using apport for LP's needs, but the bits we need are different enough
that the engineers assessing it decided not to].

-Rob



Re: Crash database requirements (was: The need for apport hooks (was: Re: SRUs for typo fixes in descriptions))

2011-08-08 Thread Bryce Harrington
On Mon, Aug 08, 2011 at 12:18:51AM -0700, Bryce Harrington wrote:
> > I don't know if we've actually written down what we want out of a crash
> > database, though.  Do we have a requirements document for one?  If the
> > Launchpad team wanted to devote some time to adding a crash database do
> > they know what we want out of such a beast?
> 
> I agree, that seems like an important first step.  At this point it's
> unclear whether Launchpad's needs overlap with ours or if they're highly
> divergent.

Here's a start...

 * A collection of files are gathered client-side and inserted into the
   crash database record.

 * Processed versions of files (i.e. retracer output) can be added
   subsequently.

 * Some files must be kept private (i.e. core dumps)

 * Traces from multiple crash reports are algorithmically compared to
   find exact-dupes and likely-dupes.

 * Crash reports can be grouped by package, by distro release, or by both.

 * Statistics are generated to show number of [exact|exact+likely] dupes
   for each type of crash.  Statistics can be provided by package, by
   distro release, by date range, or a combination.

 * Bug report(s) can be associated with a given set of crashes.

 * The user should have some way to check back on the status of their
   crash report; e.g. have some report ID they can look at to see
   statistics and/or any associated bug #.
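
As a rough sketch of how several of these could fit together (record with files and a privacy flag, exact vs. likely dupes, statistics by package and release): the `CrashReport` fields are assumptions, and treating "same top few stack frames" as a likely-dupe is just one simple heuristic chosen for illustration, not how apport or any existing retracer does it.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class CrashReport:
    report_id: str
    package: str
    release: str
    frames: tuple  # function names from the trace, innermost first
    # file name -> (contents, is_private); core dumps would be private
    files: dict = field(default_factory=dict)


def exact_key(report):
    """Reports with identical full traces are exact dupes."""
    return (report.package, report.frames)


def likely_key(report, depth=3):
    """Reports sharing the innermost `depth` frames are likely dupes."""
    return (report.package, report.frames[:depth])


def dupe_stats(reports, key=exact_key, package=None, release=None):
    """Count reports per crash type, optionally filtered by package/release."""
    selected = (
        r for r in reports
        if (package is None or r.package == package)
        and (release is None or r.release == release)
    )
    return Counter(key(r) for r in selected)
```

Passing `key=likely_key` gives the exact+likely numbers; a date-range filter would slot into `dupe_stats` the same way as the package and release filters.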

Bryce



Re: Crash database requirements (was: The need for apport hooks (was: Re: SRUs for typo fixes in descriptions))

2011-08-08 Thread Bryce Harrington
On Mon, Aug 08, 2011 at 04:25:30PM +1000, Christopher James Halse Rogers wrote:
> On Sun, 2011-08-07 at 22:46 -0700, Bryce Harrington wrote:
> > On Mon, Aug 08, 2011 at 07:17:52AM +0200, Rick Spencer wrote:
> > > On Sat, 2011-08-06 at 15:46 -0700, Bryce Harrington wrote:
> > > > > 
> > > > >  I think we are doing it wrong: we should collect
> > > > > crashes on all supported releases. 
> > > > 
> > > > I agree.  As designed, apport files a new bug report for each crash,
> > > > which can quickly lead to excessive numbers of dupe bug reports (there
> > > > are ways of making apport auto-dupe, but this takes effort to set up and
> > > > isn't always 100% reliable).  This can quickly become unmanageable
> > > > especially for packages that lack someone to keep an eye on the bug
> > > > reports.
> > > > 
> > > > In any case, these types of reports post-release are most useful in
> > > > aggregate rather than as individual bug reports.  If they were filed in
> > > > some ultra simple crash database (with no signup required of the user)
> > > > we could get most of the value without incurring a lot of extra bug
> > > > labor.
> > > Has this very thing not been proposed? We should discuss doing this for
> > > 12.04.
> > 
> > It's been proposed and discussed previously, like ScottK mentioned.  The
> > issue has been finding someone with time to work on it.
> 
> I know it's been proposed and discussed previously, and I believe Robert
> Collins both wants something similar for Launchpad and has done some
> work towards making it happen.

Yes, there's an open bug report against LP for it I ran across at one
point.

> I don't know if we've actually written down what we want out of a crash
> database, though.  Do we have a requirements document for one?  If the
> Launchpad team wanted to devote some time to adding a crash database do
> they know what we want out of such a beast?

I agree, that seems like an important first step.  At this point it's
unclear whether Launchpad's needs overlap with ours or if they're highly
divergent.

> Is there a LEP for a crash database?  If not, perhaps we could gather
> requirements in this thread and then write one?

There was a LEP started several years ago but it was abandoned and the
spec is considered obsolete by the LP folks.  There are no active LEPs
that I know of.

Bryce

