[dev] crash reporter, what are the various distros' strategies?

2007-05-18 Thread Caolan McNamara
So when OOo crashes, what strategies, if any, do distros employ here?

For RH we don't build the crashreporter (as it's basically unusable info
for Sun), but we do configure to enable using it. We replace it in the
install set with a simple replacement that tests for the set of common
problems that historically caused crashes, e.g. nvidia drivers, a11y
enabled, SELinux settings and insane local library butchering, and
prompts the unfortunate user to log the supplied text to the RH
bugzilla. We then map the stack back to source with
http://people.redhat.com/caolanm/ooocvs/ooomapstack after throwing out
the reports from nvidia driver users.
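As a rough illustration of that kind of replacement reporter, here is a minimal Python sketch. It is not the actual RH script, and every check is an assumption: the nvidia test looks for the kernel module name in /proc/modules, "a11y enabled" is approximated by GTK_MODULES naming atk-bridge or gail, SELinux is read from its enforce flag, and "local library butchering" is approximated by LD_PRELOAD/LD_LIBRARY_PATH overrides. The inputs are passed in explicitly so the checks can be tested.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hint text also used later to drop nvidia-using reports (hypothetical wording).
NVIDIA_HINT = "The proprietary nvidia kernel module is loaded."


@dataclass
class Diagnosis:
    hints: List[str] = field(default_factory=list)


def diagnose(proc_modules: str, env: Dict[str, str], selinux_enforce: str) -> Diagnosis:
    """Check for problems that historically caused crashes.

    A real tool would read /proc/modules, os.environ and the SELinux
    enforce file itself; here they are parameters for testability.
    """
    diag = Diagnosis()
    # The first column of /proc/modules is the kernel module name.
    loaded = {line.split()[0] for line in proc_modules.splitlines() if line.strip()}
    if "nvidia" in loaded:
        diag.hints.append(NVIDIA_HINT)
    # Assumption: a11y shows up as GTK loading atk-bridge or gail.
    gtk_modules = env.get("GTK_MODULES", "")
    if "atk-bridge" in gtk_modules or "gail" in gtk_modules:
        diag.hints.append("Accessibility (a11y) support is enabled.")
    if selinux_enforce.strip() == "1":
        diag.hints.append("SELinux is in enforcing mode.")
    # Assumption: local library tampering approximated by LD_* overrides.
    for var in ("LD_PRELOAD", "LD_LIBRARY_PATH"):
        if env.get(var):
            diag.hints.append(f"{var} is set to {env[var]!r}.")
    return diag


def report_text(diag: Diagnosis, backtrace: str) -> str:
    """Text the user is asked to log in bugzilla."""
    lines = ["Possible known causes:"]
    lines += ["  - " + hint for hint in diag.hints] or ["  (none detected)"]
    lines += ["", "Backtrace:", backtrace]
    return "\n".join(lines)


def worth_mapping(text: str) -> bool:
    """Triage side: throw out reports from nvidia driver users before mapping."""
    return NVIDIA_HINT not in text
```

Keeping the nvidia filter on the triage side mirrors the description above: users still get prompted to file, and the filtering happens before the stack is mapped.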

At some stage in the past the ooobuild OOo would spawn off the GNOME
bug-buddy, but that's no longer the case, is it?

So do other distros have their own solutions here, or do they simply
crash out and/or dump core?

What I'm thinking of aiming at is a shared cross-distro crash
repository where we can auto-submit the distro OOo crashes and the
distros can plug in their various stack mappers, with quick and dirty
gnomebugzilla-alike tooling to merge the duplicate backtraces together.
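One way to picture that proposal is a minimal Python sketch, assuming a hypothetical repository in which each distro registers its own stack mapper and backtraces that map to identical frames within the same distro are merged into one group. The class, method and type names are illustrative only.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

# A stack mapper turns raw frames into source-level frames for one distro
# build, e.g. by wrapping a tool like ooomapstack (hypothetical signature).
StackMapper = Callable[[str, List[str]], List[str]]
GroupKey = Tuple[str, Tuple[str, ...]]


class CrashRepository:
    """Toy shared crash store; all names here are hypothetical."""

    def __init__(self) -> None:
        self.mappers: Dict[str, StackMapper] = {}
        self.groups: Dict[GroupKey, List[dict]] = defaultdict(list)

    def register_mapper(self, distro: str, mapper: StackMapper) -> None:
        self.mappers[distro] = mapper

    def submit(self, distro: str, package_version: str, frames: List[str]) -> GroupKey:
        mapper = self.mappers.get(distro)
        # Without a registered mapper the raw frames are kept as-is.
        mapped = mapper(package_version, frames) if mapper else list(frames)
        key = (distro, tuple(mapped))
        self.groups[key].append({"version": package_version, "frames": frames})
        return key

    def duplicates(self) -> Dict[GroupKey, int]:
        """Groups that received more than one report."""
        return {key: len(reps) for key, reps in self.groups.items() if len(reps) > 1}
```

Keying groups by distro keeps each distro's merges separate; cross-distro matching would need a second, looser comparison on top.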

C.

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



[dev] Additional thoughts about crash reporting, WAS: Re: [dev] crash reporter, what are the various distros' strategies?

2007-05-18 Thread Bernd Eilers

Caolan McNamara wrote:

Hi Caolan,


[... snip ...]

> What I'm thinking of aiming at is a shared cross-distro crash
> repository where we can auto-submit the distro OOo crashes and the
> distros can plug in their various stack mappers, with quick and dirty
> gnomebugzilla-alike tooling to merge the duplicate backtraces together.



Some additional thoughts about crash reporting that have just come to my
mind:


As far as I have heard, some distributions currently do not release
from MasterWorkspaces or ChildWorkspaces. Instead they use a more or
less complex system to release a mix of code from the current
MasterWorkspace plus patches integrating stuff from unfinished/not yet
integrated ChildWorkspaces, maybe plus patches which are not even on a
ChildWorkspace yet, or maybe even plus some code held back from
contributing for some time to create some kind of artificial feature
advantage etc. For such builds, any version information sent in the
reportmail.xml file is of course more or less unusable. If distributions
cannot agree on a common release mechanism based on releasing from
MasterWorkspaces / ChildWorkspaces, we might not be able to find common
ground on how to handle crash reports and on what data crash reporting
must send for storage in a cross-distro crash repository.


It may be a tedious and time-consuming task for developers of
distribution-A to sort out false positives in the crash repository:
regressions or new bugs in code which is not even in their distribution,
because it is based on code from distribution-B that was not yet
contributed back via ChildWorkspaces, or that sits in some private patch,
but whose report's stack trace just looked similar to another one caused
by a problem in common code.


OOo uses a numeric buildid, and there is also a build string in a config
file containing the buildid plus some other information used when
building, e.g. the name of a ChildWorkspace. The problem is that the build
string does not contain information like whether we have an SRC680 or
something like OOF680. Therefore the backend must be able to map the
numeric buildid to such information. Thus the numeric buildid needs to
become globally unique across distributions, which requires a mechanism
(a web service etc.) offered somewhere to request a new unique buildid.
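A minimal sketch of such an allocation mechanism, with an in-memory Python class standing in for the web service. The class and field names are hypothetical; only the SRC680/OOF680 code-line examples come from the text above.

```python
import itertools
import threading
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class BuildInfo:
    codeline: str      # e.g. "SRC680" or "OOF680"
    distro: str        # who produced the build
    build_string: str  # free-form build string from the config file


class BuildIdRegistry:
    """Hands out globally unique numeric buildids and maps them back."""

    def __init__(self, first_id: int = 1) -> None:
        self._next = itertools.count(first_id)
        self._lock = threading.Lock()  # concurrent requests get distinct ids
        self._by_id: Dict[int, BuildInfo] = {}

    def allocate(self, info: BuildInfo) -> int:
        with self._lock:
            build_id = next(self._next)
            self._by_id[build_id] = info
            return build_id

    def lookup(self, build_id: int) -> BuildInfo:
        """What the crash backend would call to recover the code line."""
        return self._by_id[build_id]
```

A real service would need persistent storage so that ids survive restarts; the in-memory dictionary only shows the mapping.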



> C.



Kind regards,
Bernd Eilers




Re: [dev] crash reporter, what are the various distros' strategies?

2007-05-18 Thread Bernd Eilers

Caolan McNamara wrote:

> So when OOo crashes, what strategies, if any, do distros employ here?



Sun has a SOAP receiving service for crash reports, a database to store
crash reports, a daemon service which tries to find similar stack traces
with a certain fuzziness, based on the information sent in the XML file,
a web frontend for development to look at the stuff stored in the
database, and an automated process which generates tasks for developers
from crash reports in a Sun-internal bug tracking system, based on the
frequency of crashes. The task autosubmission and the web frontend
provide some means to resolve the data sent with the crash report
against the debug information kept for released versions, using
platform-dependent utilities, e.g. kd.exe on Windows and addr2line on
Unix. The task autosubmission merely guesses a developer who could
become the first owner, by applying a random function to the library
owners of the first 5 stack frames. Mostly such tasks get reassigned by
the first owner to a better one, but sometimes the one initially getting
the task is already the right one. In the database, reports have their
own reportid, and similar ones are grouped together under a common
stackmatchid.
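The owner guess described above might look roughly like this Python sketch. It assumes frames arrive as hypothetical "library!symbol" strings plus a library-to-owners table; choosing uniformly among the distinct owners is an assumption, since the exact random function is not described.

```python
import random
from typing import Dict, List, Optional, Sequence


def guess_first_owner(frames: Sequence[str],
                      lib_owners: Dict[str, List[str]],
                      rng: random.Random,
                      depth: int = 5) -> Optional[str]:
    """Pick a first owner at random among the owners of the libraries
    in the top `depth` stack frames; None if no owner is known."""
    candidates: List[str] = []
    for frame in frames[:depth]:
        lib = frame.split("!", 1)[0]
        for owner in lib_owners.get(lib, []):
            if owner not in candidates:  # count each owner once
                candidates.append(owner)
    return rng.choice(candidates) if candidates else None
```

Passing the random generator in makes the choice reproducible in tests; the reassignment by the first owner, as described above, is what corrects a bad guess.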



> For RH we don't build the crashreporter (as it's basically unusable info
> for Sun),


This is because Sun would not have the debug information created while
building those releases, which is needed to 'resolve' the crash report,
i.e. to get stack information with source file names, classes and line
numbers out of it. The information in the XML file alone is not enough
for a developer to locate the origin of the problem; it's only enough
for checking possible similarity.
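To illustrate why the debug information matters: resolving a frame on Unix amounts to running addr2line against the binary that still carries its debug info. A minimal Python sketch, assuming each frame in the report has already been reduced to a module-relative offset and that a matching debug file is available; the paths and symbols in the test are hypothetical.

```python
import subprocess
from typing import List, Tuple


def addr2line_argv(debug_file: str, offsets: List[int]) -> List[str]:
    """-e selects the file with debug info, -f also prints function
    names, -C demangles C++ symbols."""
    return ["addr2line", "-f", "-C", "-e", debug_file] + [hex(o) for o in offsets]


def parse_addr2line(output: str) -> List[Tuple[str, str]]:
    """With -f, addr2line prints two lines per address: function, then file:line."""
    lines = output.splitlines()
    return [(lines[i], lines[i + 1]) for i in range(0, len(lines) - 1, 2)]


def resolve(debug_file: str, offsets: List[int]) -> List[Tuple[str, str]]:
    """Run addr2line (must be installed) and return (function, file:line) pairs."""
    out = subprocess.run(addr2line_argv(debug_file, offsets),
                         capture_output=True, text=True, check=True).stdout
    return parse_addr2line(out)
```

Without the debug file, addr2line can only answer "??" for the function and "??:0" for the location, which is the situation described above.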



> but we do configure to enable using it. We replace it in the
> install set with a simple replacement that tests for the set of common
> problems that historically caused crashes, e.g. nvidia drivers, a11y
> enabled, SELinux settings and insane local library butchering, and
> prompts the unfortunate user to log the supplied text to the RH
> bugzilla. We then map the stack back to source with
> http://people.redhat.com/caolanm/ooocvs/ooomapstack after throwing out
> the reports from nvidia driver users.



This ooomapstack utility will most likely also need access to the debug
information kept during the build, won't it?



> At some stage in the past the ooobuild OOo would spawn off the GNOME
> bug-buddy, but that's no longer the case, is it?
>
> So do other distros have their own solutions here, or do they simply
> crash out and/or dump core?
>
> What I'm thinking of aiming at is a shared cross-distro crash
> repository where we can auto-submit the distro OOo crashes and the
> distros can plug in their various stack mappers, with quick and dirty
> gnomebugzilla-alike tooling to merge the duplicate backtraces together.


I see a few potential problems with that:

1.) Crash data contains a memory dump, which in some cases might
contain personal information that end users would not want others to be
able to find on public websites. There are policies like
http://www.sun.com/privacy/, and I am quite sure others have similar
terms to adhere to. This means that anything we create cannot be
fully public, but must be limited to restricted group(s) of developers
who agree on common terms for handling possibly private data in a
secure manner.


2.) Who should host this repository? Or are you thinking about some kind 
of distributed or partly distributed repository?


3.) Resources have to be available to run something like RH's
ooomapstack utility or Sun's crashdebug utility on all supported platforms.


4.) Resources have to be available to keep the debug information for all
builds, so that 3.) can be done.


5.) 3.) + 4.) have to be on the same network, meaning that either one
contributor would have to provide disk space and computing resources for
everyone else, or everyone has to keep their own debug information and
provide computing resources for the community to 'resolve' crash reports
sent in for their distributions.


6.) At least what we (Sun) currently use as 'StackMapping' (in terms of
grouping together similar reports) often produces false matches in both
directions: the system sometimes thinks 2 reports are similar, but the
developer later finds out that they have a different root cause, and
the system sometimes thinks two reports are not similar, but the
developer later finds out that they do in fact have the same root cause.
Considering our 'fuzzy' algorithm, that's where I start to wonder what
that gnomebugzilla-alike tooling to merge the duped backtraces together
might look like! Handling those false positives / false negatives has
already become a time-consuming task, which would get worse with a
larger repository containing data for all distros.
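A toy Python sketch of why a fuzzy matcher errs in both directions. It compares the sets of symbol names in the top n frames (Jaccard similarity) against a threshold. This is not Sun's algorithm, and the "lib!symbol+0xoffset" frame format is an assumption.

```python
from typing import Sequence, Set


def top_symbols(frames: Sequence[str], n: int) -> Set[str]:
    """Symbol names of the top n frames; frames look like 'lib!symbol+0xoff'."""
    return {f.split("!", 1)[-1].split("+", 1)[0] for f in frames[:n]}


def similarity(a: Sequence[str], b: Sequence[str], n: int = 5) -> float:
    """Jaccard similarity of the top-n symbol sets (1.0 when both are empty)."""
    sa, sb = top_symbols(a, n), top_symbols(b, n)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def looks_like_dup(a: Sequence[str], b: Sequence[str],
                   n: int = 5, threshold: float = 0.6) -> bool:
    return similarity(a, b, n) >= threshold
```

Two crashes that share only a generic abort path look identical when n is small (a potential false positive), while a larger n separates them, at the cost of splitting genuine duplicates that reach the same fault through different call paths (a potential false negative). No single n or threshold removes both.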


7.) With the amount of data that we (Sun) currently already have in the 
database stackmatching 

Re: [dev] crash reporter, what are the various distros' strategies?

2007-05-18 Thread Caolan McNamara
On Fri, 2007-05-18 at 15:43 +0200, Bernd Eilers wrote:
> Caolan McNamara wrote:
> > So when OOo crashes, what strategies, if any, do distros employ here?
>
> Sun has a SOAP receiving service for crash reports,

I'm aware of the tooling there, but this is somewhat orthogonal as the
Sun tooling is internal and doesn't factor into this much.

> > For RH we don't build the crashreporter (as it's basically unusable info
> > for Sun),
>
> This is because Sun would not have the debug information created while
> building those releases, which is needed to 'resolve' the crash

But we do have it, of course, and we are both able to do the mapping and
actually do it. Other distros probably can as well; I'd assume any of the
distros using debuginfo rpms can use the simple RH ooomapstack tooling
without much modification.

> This ooomapstack utility will most likely also need access to the debug
> information kept during the build, won't it?

Yup, not a problem. We keep that info around for all our packages, e.g.
for the latest Fedora Core 6 it just needs an unpacked
http://download.fedora.redhat.com/pub/fedora/linux/core/updates/6/i386/debug/openoffice.org-debuginfo-2.0.4-5.5.22.i386.rpm
so any potential cross-distro crash server would just need a little
Fedora Core distro-specific plugin to go fetch the matching debuginfo rpm
and run ooomapstack on it.
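A sketch of the fetching half of that little Fedora Core plugin, in Python. The URL layout is inferred from the single debuginfo URL above and may not hold for other releases or mirrors; the unpack step uses the standard rpm2cpio | cpio pipeline. How ooomapstack is then invoked is left out, because its command line is not described here.

```python
import shlex

# Base URL taken from the Fedora Core 6 example above.
FC_UPDATES = "http://download.fedora.redhat.com/pub/fedora/linux/core/updates"


def fedora_debuginfo_url(fc_release: str, arch: str, version: str, release: str) -> str:
    """URL of the openoffice.org debuginfo RPM for one package build."""
    return (f"{FC_UPDATES}/{fc_release}/{arch}/debug/"
            f"openoffice.org-debuginfo-{version}-{release}.{arch}.rpm")


def unpack_command(rpm_path: str) -> str:
    """Shell pipeline extracting the RPM payload into the current directory
    (-i extract, -d create directories, -m keep modification times)."""
    return f"rpm2cpio {shlex.quote(rpm_path)} | cpio -idm"
```

The crash report would only need to carry the Fedora release, architecture and package version-release for the plugin to locate the matching debug information.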

> I see a few potential problems with that:
>
> 1.) Crash data contains a memory dump, which in some cases might
> contain personal information that end users would not want others to be
> able to find on public websites.

All that's at stake here are basic, simple stack traces without
memory dumps, so the issue doesn't arise.

> 2.) Who should host this repository? Or are you thinking about some kind
> of distributed or partly distributed repository?

Well, that's the nub of the issue, isn't it? But I don't see this as
affecting Sun in any way, i.e. this isn't intended as pumping more data
into Sun's database, just an alternative crash reporter for the distro
builders that might point at a shared database.

Certainly there may be issues of scalability and it may be impractical.
I was thinking of whipping something Red Hat specific together and giving
it a whirl during the upcoming development cycle, when there are a
limited number of users, to get a feel for how much data and server load
would be involved, leaving aside the vast number of Windows users, and
with all traces generally coming from a very homogeneous environment.

C.




Re: [dev] crash reporter, what are the various distros' strategies?

2007-05-18 Thread Bernd Eilers


ReHi,

Caolan McNamara wrote:

> On Fri, 2007-05-18 at 15:43 +0200, Bernd Eilers wrote:
>
> > Caolan McNamara wrote:

[... snip ...]

> Certainly there may be issues of scalability and it may be impractical.
> I was thinking of whipping something Red Hat specific together and giving
> it a whirl during the upcoming development cycle, when there are a
> limited number of users, to get a feel for how much data and server load
> would be involved, leaving aside the vast number of Windows users, and
> with all traces generally coming from a very homogeneous environment.



I think I already sent some information about the simple SOAP
interface used by crash reporting, and about how to configure the site
being used etc., to some mailing list some time ago. If you need any help
with stuff like that, you can drop me a note.


If a need to extend the XML file format used for crash reporting should
arise while you are at such a task, I would also like to get involved. I
consider it better to have one common file format specification, perhaps
with some optional data that some party does not handle, than to have two
or more derived formats from different contributors.


Note that adding, for example, XML attributes to reportmail.xml via
some CWS without us adding support for them in the DTD we keep for it in
the backend would crash our backend handling, so before integrating such
a feature please drop me a note ;-)
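A sketch of a pre-integration check a CWS author could run, in Python: it flags attributes that the backend's schema does not declare. The allow-list is a hypothetical stand-in for the DTD kept in the backend; "officeinfo", "build" and "product" are names mentioned in this thread, while the remaining attribute names are illustrative, not the real reportmail.xml vocabulary.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Set

# Hypothetical allow-list standing in for the backend DTD.
ALLOWED_ATTRIBUTES: Dict[str, Set[str]] = {
    "officeinfo": {"build", "product", "platform", "language"},
}


def local_name(name: str) -> str:
    """Strip an ElementTree '{namespace}' prefix."""
    return name.rsplit("}", 1)[-1]


def undeclared_attributes(xml_text: str) -> List[str]:
    """Return 'element/@attribute' for every attribute the allow-list lacks."""
    problems = []
    for elem in ET.fromstring(xml_text).iter():
        tag = local_name(elem.tag)
        allowed = ALLOWED_ATTRIBUTES.get(tag)
        if allowed is None:
            continue  # element not covered by this toy allow-list
        for attr in elem.attrib:
            if local_name(attr) not in allowed:
                problems.append(f"{tag}/@{local_name(attr)}")
    return problems
```

Running such a check against the backend's actual declarations before integration would turn the backend crash described above into an early, visible failure.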


> C.

Kind regards,
Bernd Eilers




Re: [dev] Additional thoughts about crash reporting, WAS: Re: [dev] crash reporter, what are the various distros' strategies?

2007-05-18 Thread Bernd Eilers

Caolan McNamara wrote:

> On Fri, 2007-05-18 at 16:23 +0200, Bernd Eilers wrote:
>
> > Caolan McNamara wrote:
> >
> > Hi Caolan,

ReHi,

> > As far as I have heard, some distributions currently do not release
> > from MasterWorkspaces or ChildWorkspaces. Instead they use a more or
> > less complex system to release a mix of code from the current
> > MasterWorkspace plus patches integrating stuff from unfinished/not yet
> > integrated ChildWorkspaces, maybe plus patches which are not even on a
> > ChildWorkspace yet, or maybe even plus some code held back from
> > contributing for some time to create some kind of artificial feature
> > advantage etc.



> Like a StarOffice email merge component, or a StarOffice headless vcl
> plugin, or a StarOffice blog component. That kind of thing maybe :-)



Nah, those are usually in different libs, or are even in a packaged
extension etc., and would thus not cause a same-match-different-problem
situation. Potential problems only arise if the code is in the same lib :-)




> > For such builds, any version information sent in the
> > reportmail.xml file is of course more or less unusable.



> I'm not totally convinced. Certainly the info is unusable unless you
> have the right source for the version which crashed, but again the
> distros have the matching source for each build, so if there's a
> distro-specific backported patch or an experimental addition in play,
> the data will only be relevant for that distro.
>
> I don't see it as a huge problem really. Just have the distro crash
> reporter autofill the distro and distro OOo package version when
> submitting the report. If the various stacks from different distros
> match exactly, then it's in code which has the same state across
> distros; if they don't, then it remains a problem for the afflicted
> distro only.
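That comparison rule can be sketched in a few lines of Python, assuming each report carries the autofilled distro and package version plus its source-level (mapped) frames, since raw addresses from different distro builds would never match. The report tuple layout is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# (distro, distro OOo package version, mapped frames); hypothetical layout
Report = Tuple[str, str, Tuple[str, ...]]


def distros_per_stack(reports: Iterable[Report]) -> Dict[Tuple[str, ...], Set[str]]:
    """Group exactly matching stacks and record which distros reported each."""
    seen: Dict[Tuple[str, ...], Set[str]] = defaultdict(set)
    for distro, _version, frames in reports:
        seen[frames].add(distro)
    return dict(seen)


def in_common_code(distros: Set[str]) -> bool:
    # Seen on more than one distro: code in the same state everywhere.
    # Seen on one distro only: a problem for the afflicted distro.
    return len(distros) > 1
```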



Yes, but there currently is no such thing as a distro and distro_version
variable defined in the reportmail.xml file format; there is only the
build string and the product variable. So to support this, and to let
distros that do not use a release-from-MasterWorkspace model map back to
their corresponding debug information, we must extend the file format
used for reportmail.xml and update its corresponding DTD so that this new
information can be sent. This can of course be done, but it must be
coordinated; see also my other reply about changing stuff like that.




> > It may be a tedious and time-consuming task for developers of
> > distribution-A to sort out false positives in the crash
> > repository: regressions or new bugs in code which is not
> > even in their distribution.



> Well, I'd foresee that it would be up to each distro to look at their own
> stacktraces, so I don't see that occurring, but I don't have a feel for
> the scale of how many reports might be in question.


Yeah, well, I first misunderstood your intentions a little bit, of course,
and thought you were talking about a common system that could be used by
ANY distro, ANY platform, ANY product, ANY developer, but this doesn't
seem to be the case.



> But indeed, distros that drift from a common base would have to re-do
> work done by others to note that their trace is the same as others. I
> naively see a quick auto comparison on the unmapped stack to see if it
> is exactly the same as something else and autodup it, and then for the
> remainder map back to source and do a fuzzy comparison to make a list
> of candidates and leave it up to the humans to make a decision.
> Perfection wouldn't be a goal; the odd crash here and there sucks, but
> it's the regular occurrences that I'm interested in.
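The two-stage triage described in that paragraph might be sketched in Python as follows. Stage one hashes the raw (unmapped) frames for an exact auto-dup; stage two maps the stack and ranks fuzzy candidates with difflib's sequence ratio, leaving the decision to a human. The map_stack callable stands in for a distro mapper such as ooomapstack, and the cutoff and limit values are arbitrary assumptions.

```python
import difflib
import hashlib
from typing import Callable, Dict, List, Sequence, Tuple, Union


def stack_key(raw_frames: Sequence[str]) -> str:
    """Exact-match key over the raw (unmapped) frames."""
    return hashlib.sha1("\n".join(raw_frames).encode("utf-8")).hexdigest()


def triage(raw_frames: Sequence[str],
           exact_index: Dict[str, str],
           mapped_by_bug: Dict[str, List[str]],
           map_stack: Callable[[Sequence[str]], List[str]],
           cutoff: float = 0.6,
           limit: int = 3) -> Tuple[str, Union[str, List[str]]]:
    # Stage 1: identical unmapped stack -> auto-dup, no mapping needed.
    bug = exact_index.get(stack_key(raw_frames))
    if bug is not None:
        return ("dup", bug)
    # Stage 2: map back to source, then rank fuzzy candidates for a human.
    mapped = map_stack(raw_frames)
    scored = []
    for bug_id, other in mapped_by_bug.items():
        ratio = difflib.SequenceMatcher(None, mapped, other).ratio()
        if ratio >= cutoff:
            scored.append((ratio, bug_id))
    scored.sort(reverse=True)
    return ("candidates", [bug_id for _, bug_id in scored[:limit]])
```

Only the cheap exact stage runs for the regular, identical crashes, so the expensive mapping is spent on the remainder.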


Agreed.

And unfortunately I must add that getting near perfection in that area is
not possible at all :-(





> > Therefore the backend must be able to map the numeric buildid to such
> > information. Thus the numeric buildid needs to become globally unique
> > across distributions



> Yeah, there needs to be a unique identifier for the version of OOo and
> the distro itself. I was (vaguely) thinking more of simply a config file
> for a crash reporting tool with a distro.id, and having the tool submit
> the standard report wrapped in some extra data like the distro.id and
> other useful data, like versions of known troublesome software.
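A minimal Python sketch of that wrapper idea, assuming a simple "key = value" config file holding distro.id and a hypothetical envelope element around the unchanged standard report. The element and attribute names are illustrative only.

```python
import xml.etree.ElementTree as ET
from typing import Dict


def read_config(text: str) -> Dict[str, str]:
    """Parse simple 'key = value' lines, skipping blanks and # comments."""
    conf: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        conf[key.strip()] = value.strip()
    return conf


def wrap_report(report_xml: str, conf: Dict[str, str],
                software_versions: Dict[str, str]) -> str:
    """Wrap the unchanged standard report in a hypothetical envelope element."""
    envelope = ET.Element("distroreport", {"distroid": conf["distro.id"]})
    for name, version in sorted(software_versions.items()):
        ET.SubElement(envelope, "software", {"name": name, "version": version})
    envelope.append(ET.fromstring(report_xml))
    return ET.tostring(envelope, encoding="unicode")
```

The design keeps the standard report byte-for-byte intact, so an existing backend could still process the inner element; the trade-off is a second, distro-only format, which is what the reply below argues against.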


I would rather have us extend the reportmail.xml file format spec for
that. The definition of that file already supports namespaces, which
group the different types of information sent in the file into
categories. So adding other useful data may be a matter of creating a
corresponding namespace for each type of useful data, while stuff like a
distroid might fit into existing namespaces, just adding new optional
attributes to available elements like those used for the officeinfo
version information.
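On the reading side, optional attributes keep old reports valid. A Python sketch with ElementTree, assuming hypothetical distro and distroversion attributes on the officeinfo element; the namespace URI is a placeholder, not the real one.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional

# Placeholder namespace URI; the real officeinfo namespace is not quoted here.
OFFICEINFO_NS = "http://example.invalid/officeinfo"


def read_officeinfo(xml_text: str) -> Dict[str, Optional[str]]:
    root = ET.fromstring(xml_text)
    info = next(root.iter(f"{{{OFFICEINFO_NS}}}officeinfo"), None)
    if info is None:
        raise ValueError("report has no officeinfo element")
    return {
        "build": info.get("build"),
        "product": info.get("product"),
        # New optional attributes: older reports simply leave them out.
        "distro": info.get("distro"),
        "distroversion": info.get("distroversion"),
    }
```

Because missing attributes read as None, one parser handles both old and extended reports, which is the single common format argued for above.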


For stuff that does not fit into the XML format there is of course 
always the possibility to have additional binary or text