[dev] crash reporter, what are the various distro's strategies ?
So when OOo crashes, what are the strategies that distros employ here, if any? For RH we don't build the crashreporter (as it's basically unusable info for Sun), but we do configure to enable using it, and replace it in the install set with a simple replacement that tests for the set of common problems that historically caused crashes, e.g. nvidia drivers, a11y enabled, selinux settings and insane local library butchering, and prompts the unfortunate user to log the supplied text to the RH bugzilla, and we map the stack back to source with http://people.redhat.com/caolanm/ooocvs/ooomapstack after throwing out the reports from nvidia driver users. At some stage in the past the ooobuild OOo would spawn off the GNOME bug-buddy, but that's no longer the case, is it? So do other distros have various solutions here, or just simply crash out and/or dump core? What I'm thinking about aiming at is a shared cross-distro crash repository where we can auto submit the distro OOo crashes, and the distros can plug in their various stack mappers, with quick and dirty gnomebugzilla-alike tooling to merge the duped backtraces together. C.
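To make the triage idea above concrete, here is a minimal Python sketch of the checks such a replacement crash handler could run. It is not the actual RH tool. The SystemSnapshot fields, the "nvidia" substring test, treating SELinux "enforcing" as suspicious and flagging libraries under /usr/local as "local library butchering" are all assumptions chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SystemSnapshot:
    """Facts a crash handler might collect (all fields are assumptions)."""
    loaded_libraries: List[str] = field(default_factory=list)
    accessibility_enabled: bool = False
    selinux_mode: str = "disabled"  # "enforcing", "permissive" or "disabled"


def known_problems(snap: SystemSnapshot) -> List[str]:
    """Return human-readable hints for historically crash-prone setups."""
    problems = []
    # Proprietary nvidia GL libraries mapped into the process.
    if any("nvidia" in lib.lower() for lib in snap.loaded_libraries):
        problems.append("proprietary nvidia driver loaded")
    # Accessibility support turned on.
    if snap.accessibility_enabled:
        problems.append("accessibility (a11y) support enabled")
    # SELinux actively enforcing policy.
    if snap.selinux_mode == "enforcing":
        problems.append("SELinux in enforcing mode")
    # Libraries picked up from /usr/local instead of the system copies.
    local = [lib for lib in snap.loaded_libraries if lib.startswith("/usr/local/")]
    if local:
        problems.append("libraries loaded from /usr/local: " + ", ".join(local))
    return problems


def report_text(snap: SystemSnapshot, backtrace: str) -> str:
    """Text the user would be asked to paste into a bug report."""
    hints = known_problems(snap) or ["no known problem detected"]
    lines = ["Possible causes:"] + ["  - " + h for h in hints]
    return "\n".join(lines + ["", "Backtrace:", backtrace])
```

A real handler would gather these facts from the crashed process and attach the resulting text to whatever it asks the user to file.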
[dev] Additional toughts about crash reporting, WAS: Re: [dev] crash reporter, what are the various distro's strategies ?
Caolan McNamara wrote: Hi Caolan, [... snip ...] What I'm thinking about aiming at is a shared cross-distro crash repository where we can auto submit the distro OOo crashes, and the distros can plug in their various stack mappers, with quick and dirty gnomebugzilla-alike tooling to merge the duped backtraces together. Some additional thoughts about crash reporting that have just come to my mind: As far as I have heard, there are distributions which currently do not release from MasterWorkspaces or ChildWorkspaces. Instead they use some more or less complex system to release a mix of code from the current MasterWorkspace, plus patches integrating stuff from unfinished or not yet integrated ChildWorkspaces, plus maybe patches which are not yet on any ChildWorkspace, or maybe even plus code held back from being contributed for some time to create some kind of artificial feature advantage. For such builds, of course, anything sent as version information in the reportmail.xml file is more or less unusable. If distributions cannot agree on a common release mechanism based on releasing from MasterWorkspaces / ChildWorkspaces, we might not be able to find common ground on how to handle crash reports and what data crash reporting must send for storing in a cross-distro crash repository. It may be a tedious and time-consuming task for developers of distribution A to sort out false positives in the crash repository. These could be regressions or new bugs in code which is not even in their distribution, because it is based on code of distribution B which was not yet contributed back via ChildWorkspaces or sits in some private patch, but whose stack trace just looked similar to another one caused by a problem in common code.
There is some kind of numeric buildid used by OOo, and there is also a build string in a config file containing that plus some other information used when building, e.g. the name of a ChildWorkspace. The problem is that the build string does not contain information such as whether we have an SRC680 or something like OOF680, therefore the backend must have means to map the numeric buildid to such information. Thus the numeric buildid needs to become globally unique across distributions, requiring a mechanism (web service etc.) offered somewhere to request a new unique buildid. C. Kind regards, Bernd Eilers
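Here is a minimal sketch of the kind of allocator described above, assuming a single central registry. The BuildInfo fields (distro, code line such as SRC680/OOF680, milestone, build string) and the BuildIdRegistry class are hypothetical. The web service that would wrap it is left out.

```python
import itertools
import threading
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class BuildInfo:
    distro: str        # e.g. "fedora-core-6" (hypothetical id scheme)
    code_line: str     # e.g. "SRC680" or "OOF680"
    milestone: str     # e.g. "m14"
    build_string: str  # the free-form build string from the config file


class BuildIdRegistry:
    """Hands out numeric buildids that are unique across all registered builds."""

    def __init__(self, start: int = 1) -> None:
        self._counter = itertools.count(start)
        self._lock = threading.Lock()
        self._by_id: Dict[int, BuildInfo] = {}
        self._by_info: Dict[BuildInfo, int] = {}

    def register(self, info: BuildInfo) -> int:
        """Return the buildid for `info`, allocating a new one if unseen."""
        with self._lock:
            if info in self._by_info:
                return self._by_info[info]
            build_id = next(self._counter)
            self._by_id[build_id] = info
            self._by_info[info] = build_id
            return build_id

    def lookup(self, build_id: int) -> Optional[BuildInfo]:
        """Map a buildid from a crash report back to its build details."""
        with self._lock:
            return self._by_id.get(build_id)
```

Registering the same build twice returns the same id, so a distro re-requesting an id for an unchanged build does not burn a new number.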
Re: [dev] crash reporter, what are the various distro's strategies ?
Caolan McNamara wrote: So when OOo crashes, what are the strategies that distros employ here, if any? Sun has the following pieces:
- a SOAP receiving service for crash reports;
- a database to store crash reports;
- a daemon service which tries, with a certain fuzziness, to find similar stack traces based on the information sent in the XML file;
- a web frontend for development to look at the stuff stored in the database;
- an automated process which generates tasks for developers in a Sun-internal bug tracking system from crash reports, based on the frequency of crashes.

The autosubmission of tasks and the web frontend provide some means to resolve the data sent with the crash report against the debug information kept for released versions, using platform-dependent utilities, e.g. kd.exe on Windows and addr2line on Unix. The autosubmission of tasks merely guesses at a developer who could become the first owner, by applying a random function to the library owners of the first 5 stack frames. Mostly such tasks get reassigned by the first owner to a better one, though sometimes the one initially getting the task is already the right one. In the database, reports have their own reportid, and similar ones are grouped together under a common stackmatchid. For RH we don't build the crashreporter (as it's basically unusable info for Sun), This is because Sun would not have the debug information created during building those releases, which would be needed to 'resolv' the crash report, i.e. to get stack information with source code filenames, classes and line numbers out of it. The information in the XML file is not enough for the developer to locate the origin of the problem; it's only enough for checking possible similarity. but we do configure to enable using it, and replace it in the install set with a simple replacement that tests for the set of common problems that historically caused crashes, e.g.
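The owner guess described above can be sketched in a few lines of Python. This is a hypothetical reconstruction, not Sun's code. Frames are assumed to be (library, symbol) pairs with the innermost frame first, and lib_owners is an assumed library-to-owner table. "Random function" is read here as a uniform random pick among the owners of the first five frames, so an owner appearing in several frames is proportionally more likely.

```python
import random
from typing import Dict, List, Optional, Sequence, Tuple

Frame = Tuple[str, str]  # (library, symbol), innermost frame first


def guess_first_owner(frames: Sequence[Frame],
                      lib_owners: Dict[str, str],
                      rng: Optional[random.Random] = None,
                      depth: int = 5) -> Optional[str]:
    """Pick a provisional task owner from the libraries in the top frames."""
    rng = rng or random.Random()
    # Owners of the first `depth` frames; libraries without an owner are skipped.
    candidates: List[str] = [lib_owners[lib] for lib, _ in frames[:depth]
                             if lib in lib_owners]
    if not candidates:
        return None  # leave the task for manual assignment
    return rng.choice(candidates)
```

Passing a seeded random.Random makes the choice reproducible, which helps when testing the assignment step.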
nvidia drivers, a11y enabled, selinux settings and insane local library butchering, and prompts the unfortunate user to log the supplied text to the RH bugzilla, and we map the stack back to source with http://people.redhat.com/caolanm/ooocvs/ooomapstack after throwing out the reports from nvidia driver users. This ooomapstack utility will most likely also need access to the debug information kept from the build, won't it? At some stage in the past the ooobuild OOo would spawn off the GNOME bug-buddy, but that's no longer the case, is it? So do other distros have various solutions here, or just simply crash out and/or dump core? What I'm thinking about aiming at is a shared cross-distro crash repository where we can auto submit the distro OOo crashes, and the distros can plug in their various stack mappers, with quick and dirty gnomebugzilla-alike tooling to merge the duped backtraces together. I see a few potential problems with that: 1.) Crash data does contain a dump of memory which in some cases might contain personal information that end users would not want others to be able to find on public websites. There are policies like http://www.sun.com/privacy/ and I am quite sure others have similar terms to adhere to. This means that anything we create cannot be fully public but must have access limited to restricted group(s) of developers who agree on common terms for handling possible private data in a secure manner. 2.) Who should host this repository? Or are you thinking about some kind of distributed or partly distributed repository? 3.) Resources have to be available to call something like RH's ooomapstack utility or Sun's crashdebug utility on all supported platforms. 4.) Resources have to be available to keep debug information for all builds, for doing 3.) 5.) 3.) + 4.)
have to be on the same network. This means that either one contributor would have to provide disk space and computing resources for everyone else, or everyone has to keep their own debug information and provide computing resources for the community to 'resolv' crash reports sent in for their distributions. 6.) At least what we (Sun) currently use as 'StackMapping' (in terms of grouping together similar reports) often produces false mappings in both directions. The system sometimes thinks two reports are similar but the developer later finds out that they have a different root cause, and the system also sometimes thinks two reports are not similar but the developer later finds out that they do in fact have the same root cause. Considering our 'fuzzy' algorithm, that's where I start to wonder what that gnomebugzilla-alike tooling to merge the duped backtraces together might look like! Handling those false positives / false negatives has already become a time-consuming task, which would get worse with a larger repository containing data for all distros. 7.) With the amount of data that we (Sun) currently already have in the database stackmatching
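To illustrate why any fuzzy grouping produces false matches in both directions, here is a small stand-in for a stackmatchid grouper. It is not Sun's algorithm. It scores the top frames of two traces with Python's difflib.SequenceMatcher and groups a new report with the best existing group if the score reaches an assumed threshold.

```python
from difflib import SequenceMatcher
from typing import Dict, List, Sequence, Tuple


def similarity(a: Sequence[str], b: Sequence[str], depth: int = 10) -> float:
    """Similarity in [0, 1] of the top `depth` frames of two traces."""
    return SequenceMatcher(None, list(a[:depth]), list(b[:depth])).ratio()


class StackMatcher:
    """Assigns each report a reportid and groups similar ones under a stackmatchid."""

    def __init__(self, threshold: float = 0.8) -> None:
        self.threshold = threshold
        self.groups: Dict[int, List[str]] = {}   # stackmatchid -> first trace seen
        self.report_group: Dict[int, int] = {}   # reportid -> stackmatchid
        self._next_report = 1
        self._next_group = 1

    def add(self, trace: Sequence[str]) -> Tuple[int, int]:
        reportid = self._next_report
        self._next_report += 1
        # Find the most similar existing group.
        best_id, best_score = None, 0.0
        for gid, representative in self.groups.items():
            score = similarity(trace, representative)
            if score > best_score:
                best_id, best_score = gid, score
        # Below the threshold the report starts a group of its own.
        if best_id is None or best_score < self.threshold:
            best_id = self._next_group
            self._next_group += 1
            self.groups[best_id] = list(trace)
        self.report_group[reportid] = best_id
        return reportid, best_id
```

Whatever threshold is chosen, both kinds of error remain possible. Raising it splits reports that share a root cause, and lowering it merges reports that do not.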
Re: [dev] crash reporter, what are the various distro's strategies ?
On Fri, 2007-05-18 at 15:43 +0200, Bernd Eilers wrote: Caolan McNamara wrote: So when OOo crashes, what are the strategies that distros employ here, if any? Sun has a SOAP receiving service for crash reports, I'm aware of the tooling there, but this is somewhat orthogonal, as the Sun tooling is internal and doesn't factor into this much. For RH we don't build the crashreporter (as it's basically unusable info for Sun), This is because Sun would not have the debug information created during building those releases, which would be needed to 'resolv' the crash But we do of course, and we are both able to do the mapping and actually do it. Probably other distros can as well, and I'd assume that any of the distros using debuginfo rpms can use the simple RH ooomapstack tooling without much modification. This ooomapstack utility will most likely also need access to the debug information kept from the build, won't it? Yup, not a problem. We keep that info around for all our packages. For the latest Fedora Core 6, for example, it just needs an unpacked http://download.fedora.redhat.com/pub/fedora/linux/core/updates/6/i386/debug/openoffice.org-debuginfo-2.0.4-5.5.22.i386.rpm so any potential cross-distro crash server would just need a little Fedora Core distro-specific plugin to go fetch the matching debuginfo rpm and run ooomapstack on it. I see a few potential problems with that: 1.) Crash data does contain a dump of memory which in some cases might contain personal information that end users would not want others to be able to find on public websites. All that's at stake here is basic, simple stack traces, without memory dumps, so the issue doesn't arise. 2.) Who should host this repository? Or are you thinking about some kind of distributed or partly distributed repository? Well, that's the nub of the issue, isn't it? But I don't see this as affecting Sun in any way, i.e.
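The "little Fedora Core distro-specific plugin" could start as nothing more than a function that derives the debuginfo package URL from the reported version. The sketch below assumes a hypothetical plugin registry keyed by distro id and only reproduces the URL pattern quoted in the mail. Downloading, unpacking and running ooomapstack are not shown, and the mirror layout may well have changed since.

```python
from typing import Callable, Dict

# distro id -> function(release, arch, version, pkg_release) -> debuginfo URL
DEBUGINFO_LOCATORS: Dict[str, Callable[[str, str, str, str], str]] = {}


def debuginfo_locator(distro_id: str):
    """Decorator registering a per-distro plugin (hypothetical interface)."""
    def register(fn: Callable[[str, str, str, str], str]):
        DEBUGINFO_LOCATORS[distro_id] = fn
        return fn
    return register


@debuginfo_locator("fedora-core")
def fedora_core_debuginfo(release: str, arch: str, version: str, pkg_release: str) -> str:
    # Mirrors the Fedora Core 6 updates URL quoted in the mail.
    base = "http://download.fedora.redhat.com/pub/fedora/linux/core/updates"
    return (f"{base}/{release}/{arch}/debug/"
            f"openoffice.org-debuginfo-{version}-{pkg_release}.{arch}.rpm")
```

Other distros would register their own locator under their own id, leaving the server code itself distro-neutral.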
this isn't intended as pumping more data into Sun's database, just an alternative crash reporter for the distro builders that might point at a shared database. Certainly there may be issues of scalability and it may be impractical. I was thinking of whipping something Red Hat-specific together and giving it a whirl during the upcoming development cycle, when there are a limited number of users. That would give a feel for how much data and server load would be involved, leaving aside the vast number of Windows users, and with all traces generally coming from a very homogeneous environment. C.
Re: [dev] crash reporter, what are the various distro's strategies ?
ReHi, Caolan McNamara wrote: On Fri, 2007-05-18 at 15:43 +0200, Bernd Eilers wrote: Caolan McNamara wrote: [... snip ...] Certainly there may be issues of scalability and it may be impractical. I was thinking of whipping something Red Hat-specific together and giving it a whirl during the upcoming development cycle, when there are a limited number of users. That would give a feel for how much data and server load would be involved, leaving aside the vast number of Windows users, and with all traces generally coming from a very homogeneous environment. Some time ago I sent information to a mailing list about the simple SOAP interface used by crash reporting and how to configure the site being used, etc. If you need any help with stuff like that, you can drop me a note. If some need should arise to extend the XML file format used for crash reporting while you are at such a task, I would also like to get involved. I consider it better to have one common file format specification, possibly with some optional data that some party does not handle, than to have two or more derived formats maintained by different contributors. Note that adding, for example, XML attributes to reportmail.xml via some CWS without us adding support for them to the DTD we keep for it in the backend would crash our backend handling. So before integrating that kind of feature, please drop me a note ;-) C. Kind regards, Bernd Eilers
Re: [dev] Additional toughts about crash reporting, WAS: Re: [dev] crash reporter, what are the various distro's strategies ?
Caolan McNamara wrote: On Fri, 2007-05-18 at 16:23 +0200, Bernd Eilers wrote: Caolan McNamara wrote: Hi Caolan, ReHi, As far as I have heard, there are distributions which currently do not release from MasterWorkspaces or ChildWorkspaces. Instead they use some more or less complex system to release a mix of code from the current MasterWorkspace, plus patches integrating stuff from unfinished or not yet integrated ChildWorkspaces, plus maybe patches which are not yet on any ChildWorkspace, or maybe even plus code held back from being contributed for some time to create some kind of artificial feature advantage. Like a StarOffice email merge component, or a StarOffice headless vcl plugin, or a StarOffice blog component. That kind of thing maybe :-) Nah, those are usually in different libs or are even in a packaged extension etc., and would thus not cause a same-match-different-problem situation. Potential problems arise only if the code is in the same lib :-) For such builds, of course, anything sent as version information in the reportmail.xml file is more or less unusable. I'm not totally convinced. Certainly the info is unusable unless you have the right source for the version which crashed, but again the distros have the matching source for each build. So if there's a distro-specific backported patch or an experimental addition in play, the data will only be relevant for that distro. I don't see it as a huge problem really. Just have the distro crash reporter autofill the distro and distro OOo package version when submitting the report. If the various stacks from different distros match exactly, then it's in code which has the same state across distros. If they don't, it remains a problem for the afflicted distro only.
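Here is a sketch of the exact-match rule suggested above. It assumes each report carries a distro id, the distro's OOo package version and a list of frames. Raw addresses differ between independently built packages, so the frames here are assumed to be symbol names that have already been mapped. The function name split_by_distro_spread is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Report = Tuple[str, str, List[str]]  # (distro, OOo package version, symbolized frames)


def split_by_distro_spread(reports: Iterable[Report]):
    """Separate stacks seen on several distros from distro-specific ones."""
    distros_per_stack: Dict[Tuple[str, ...], Set[str]] = defaultdict(set)
    for distro, _package_version, frames in reports:
        distros_per_stack[tuple(frames)].add(distro)
    # Identical stacks from more than one distro point at shared code.
    shared = {s: d for s, d in distros_per_stack.items() if len(d) > 1}
    # Everything else stays with the single distro that reported it.
    specific = {s: next(iter(d)) for s, d in distros_per_stack.items() if len(d) == 1}
    return shared, specific
```

The package version is not needed for the split itself, but a distro would use it afterwards to fetch the matching debug information for its own specific stacks.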
Yes, but there is currently no such thing as a distro or distro_version variable defined in the reportmail.xml file format; there is only the build string and the product variable. To support this, and to let distros that do not release from a MasterWorkspace map back to their corresponding debug information, we must extend the file format used for reportmail.xml and update its corresponding DTD so that this new information can be sent. This can of course be done but must be coordinated; see also my other reply about changing stuff like that. It may be a tedious and time-consuming task for developers of distribution A to sort out false positives in the crash repository for things like regressions or new bugs in code which is not even in their distribution. Well, I'd foresee that it would be up to each distro to look at their own stack traces, so I don't see that occurring, but I don't have a feel for the scale of how many reports might be in question. Yeah, well, at first I misunderstood your intentions a little and thought you were talking about a common system that could be used by ANY distro, ANY platform, ANY product, ANY developer, but this doesn't seem to be the case. But indeed, distros that drift from a common base would have to re-do work done by others to note that their trace is the same as others'. I naively see a quick auto-comparison on the unmapped stack to see if it is exactly the same as something else and autodup it. The remainder would then be mapped back to source and fuzzy-compared to make a list of candidates, leaving the decision up to the humans. Perfection wouldn't be a goal; the odd crash here and there sucks, but it's the regular occurrences that I'm interested in. Agreed. And unfortunately I must add that getting near perfection in that area is not possible at all :-( Therefore the backend must have means to map the numeric buildid to such information.
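The two-stage triage outlined above (exact match on the unmapped stack, then map and fuzzy-compare the rest) might look like this. Everything here is an assumption for illustration: the Triage class, hashing the raw frames with SHA-1, using difflib's ratio as the fuzzy score, and the map_stack callable standing in for a per-distro mapper such as ooomapstack. Fuzzy matches are only returned as candidates; nothing is merged automatically.

```python
import hashlib
from difflib import SequenceMatcher
from typing import Callable, Dict, List, Sequence, Tuple

StackMapper = Callable[[Sequence[str]], List[str]]  # raw frames -> symbolized frames


class Triage:
    def __init__(self, candidate_threshold: float = 0.6, max_candidates: int = 3) -> None:
        self.candidate_threshold = candidate_threshold
        self.max_candidates = max_candidates
        self.by_hash: Dict[str, int] = {}        # hash of unmapped stack -> bug id
        self.mapped: Dict[int, List[str]] = {}   # bug id -> symbolized stack
        self._next_bug = 1

    def submit(self, raw_frames: Sequence[str],
               map_stack: StackMapper) -> Tuple[str, int, List[int]]:
        # Stage 1: exact comparison on the unmapped stack, cheap and automatic.
        key = hashlib.sha1("\n".join(raw_frames).encode("utf-8")).hexdigest()
        if key in self.by_hash:
            return "duplicate", self.by_hash[key], []
        # Stage 2: map back to source, then fuzzy-compare against known bugs.
        mapped = map_stack(raw_frames)
        scored = sorted(
            ((SequenceMatcher(None, mapped, known).ratio(), bug)
             for bug, known in self.mapped.items()),
            reverse=True,
        )
        candidates = [bug for score, bug in scored
                      if score >= self.candidate_threshold][: self.max_candidates]
        # File it as new; a human decides whether any candidate is really the same.
        bug = self._next_bug
        self._next_bug += 1
        self.by_hash[key] = bug
        self.mapped[bug] = mapped
        return "new", bug, candidates
```

The cheap exact check runs first so that the expensive mapping step is only paid for stacks that have not been seen before.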
Thus the numeric buildid needs to become globally unique across distributions Yeah, there needs to be a unique identifier for the version of OOo and the distro itself. I was (vaguely) thinking more of simply a config file for a crash-reporting tool with a distro.id. The tool would then submit the standard report wrapped in some extra data, like the distro.id and other useful data such as versions of known troublesome software. I would rather have us extend the reportmail.xml file format spec for that. The definition of that file already supports namespaces, which group the different types of information sent in that file into categories. Adding other useful data may therefore be a matter of creating a corresponding namespace for each type. Stuff like a distroid might fit into existing namespaces, just adding new optional attributes to available elements like those used for officeinfo version information. For stuff that does not fit into the XML format there is of course always the possibility to have additional binary or text
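The suggestion of optional, namespaced attributes can be sketched with Python's xml.etree.ElementTree. The real reportmail.xml layout and its namespace URIs are not reproduced here. Both namespace URIs below and the distro:id / distro:version attribute names are placeholders. As noted earlier in the thread, the receiving backend's DTD would have to accept such attributes before any client sends them.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Placeholder namespace URIs; the real reportmail.xml ones are not reproduced here.
OFFICEINFO_NS = "http://example.invalid/officeinfo"
DISTRO_NS = "http://example.invalid/distroinfo"
ET.register_namespace("officeinfo", OFFICEINFO_NS)
ET.register_namespace("distro", DISTRO_NS)


def officeinfo_element(build: str, product: str,
                       distro_id: Optional[str] = None,
                       distro_version: Optional[str] = None) -> ET.Element:
    """Build an officeinfo-like element, adding distro data only when present."""
    el = ET.Element(f"{{{OFFICEINFO_NS}}}officeinfo",
                    {"build": build, "product": product})
    # Optional, namespaced attributes; a DTD-validating consumer must be
    # updated to allow them before clients start sending them.
    if distro_id is not None:
        el.set(f"{{{DISTRO_NS}}}id", distro_id)
    if distro_version is not None:
        el.set(f"{{{DISTRO_NS}}}version", distro_version)
    return el
```

Serializing with ET.tostring(el, encoding="unicode") emits the xmlns declarations for both prefixes. A report from a build without distro data simply omits the extra attributes.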