Re: Sparc build failure analysis (was Re: StrongARM tactics)
On Sun, Dec 11, 2005 at 05:30:24AM -0500, Kevin Mark wrote:
> has anyone ever considered a check in the buildd infrastructure to
> alert someone (buildd admin and/or others) if a build is taking too long
> (eg openoffice usually takes between 2-3 hours to build and the current
> build has been building for 10 hours+). Something like a database entry
> or a database of either previous build times or last build time. As a
> way to not have a buildd tied up with an obvious build issue and thus
> allow the issue to be addressed sooner, thus allowing more buildd
> throughput.

You mean something like the stats on http://www.buildd.net/ - for example:
http://buildd.net/cgi/ptracker.cgi?unstable_pkg=aptitude&searchtype=m68k

I intend to give some sort of "build is overdue" warning when I have
enough data to calculate when a build is overdue ;)

--
Ciao...            //      Fon: 0381-2744150
      Ingo       \X/       SIP: [EMAIL PROTECTED]

gpg pubkey: http://www.juergensmann.de/ij/public_key.asc
Re: Sparc build failure analysis (was Re: StrongARM tactics)
On Sun, Dec 11, 2005 at 05:55:23AM -0800, Steve Langasek wrote:
> > Indeed, for practical buildd maintenance purposes, the distinction is
> > not that important -- though 'Failed' is known to not benefit from a
> > requeue, while 'Building:Maybe-Failed' might or might not, it's unknown,
> > most archs should have enough surplus buildd power that retrying
> > everything once in a while doesn't hurt.
> >
> > The major benefit is though to make it apparent for porters what to look
> > into, without all the 'noise' in between of maybe-transient failures.
> > One could also make sure that the FTBFS bugs are tagged (user-tagged)
> > with [EMAIL PROTECTED] (etc) for example (or [EMAIL PROTECTED] There
> > doesn't exist a [EMAIL PROTECTED] for example...), so that one can get a
> > nice overview of all the porting bugs. It'd make sense to synchronise
> > this across all architectures, so that it is consistent.
>
> http://lists.debian.org/debian-alpha/2005/12/msg00028.html

I have a long list of bugs affecting amd64, but I haven't started with
usertags for it.

The (FTBFS) bugs I encounter (as buildd admin) are:
- General bugs affecting all arches.
- General bugs affecting 64 bit arches.
- Bugs affecting some arches (like not using -fPIC)
- Bugs only affecting amd64.

And the latter really is the minority of the problems.

Note that this does not cover runtime problems or something like that,
but they're very similar.

Do we need to have a special usertag for the first kind? This is
basically something everybody can look at. The only reason I can think
of that it requires some tag is that it's better than looking at those
that don't have a tag.

So, I'm open to suggestions on how to tag the first 3 of those.

Kurt

--
To UNSUBSCRIBE, email to [EMAIL PROTECTED]
with a subject of "unsubscribe". Trouble? Contact [EMAIL PROTECTED]
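[Editor's sketch, not part of Kurt's mail] The four kinds of FTBFS bug listed above could be told apart mechanically from the set of architectures on which a package fails. The following is a minimal sketch under loud assumptions: the architecture sets, the category names, and the function `classify_ftbfs` are all hypothetical and do not correspond to any existing usertag scheme.

```python
# Hypothetical architecture sets for illustration only; they are NOT taken
# from the thread and would have to match the archive's real arch list.
ALL_ARCHES = frozenset({
    "alpha", "amd64", "arm", "hppa", "i386", "ia64",
    "m68k", "mips", "mipsel", "powerpc", "s390", "sparc",
})
SIXTYFOUR_BIT = frozenset({"alpha", "amd64", "ia64"})


def classify_ftbfs(failing, all_arches=ALL_ARCHES, wide=SIXTYFOUR_BIT):
    """Map the set of failing architectures to one of four categories.

    The category names are made up for this sketch.
    """
    failing = frozenset(failing)
    if not failing:
        raise ValueError("no failing architectures given")
    if failing == all_arches:
        return "all-arches"      # general bug, anybody can look at it
    if failing == {"amd64"}:
        return "amd64-only"      # the minority case in Kurt's experience
    if failing <= wide:
        return "64-bit"          # fails only on 64-bit architectures
    return "some-arches"         # e.g. a missing -fPIC


print(classify_ftbfs({"amd64", "ia64"}))
```

Whether the first category deserves a tag at all is exactly the open question in the mail; the sketch only shows that the split itself is cheap to compute once per-arch build results are known.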
Re: Sparc build failure analysis (was Re: StrongARM tactics)
On Sun, Dec 11, 2005 at 02:38:35PM +0100, Jeroen van Wolffelaar wrote:
> On Sun, Dec 11, 2005 at 12:35:26AM -0800, Steve Langasek wrote:
> > On Sat, Dec 10, 2005 at 06:53:47AM -0800, Blars Blarson wrote:
> > > FAILED
> >
> > But FAILED is an advisory state anyway; it doesn't directly benefit the
> > port, at all, to have the package listed as "Failed", this is just a
> > convenience for folks sifting through the build failures (like the rarely
> > used "confirmed" BTS tag is for maintainers). In the long-term, one of two
> > things needs to happen with each of these build failures: the package needs
> > to be added to P-a-s so the arch no longer tries to build it, or the package
> > needs to be fixed -- via porter NMU if necessary.
> >
> > So as you have the list of these packages, as a porter you can proceed with
> > figuring out which of the two categories each falls into, and take the
> > necessary action without worrying about the "Failed" state, yes?
>
> Indeed, for practical buildd maintenance purposes, the distinction is
> not that important -- though 'Failed' is known to not benefit from a
> requeue, while 'Building:Maybe-Failed' might or might not, it's unknown,
> most archs should have enough surplus buildd power that retrying
> everything once in a while doesn't hurt.
>
> The major benefit is though to make it apparent for porters what to look
> into, without all the 'noise' in between of maybe-transient failures.
> One could also make sure that the FTBFS bugs are tagged (user-tagged)
> with [EMAIL PROTECTED] (etc) for example (or [EMAIL PROTECTED] There
> doesn't exist a [EMAIL PROTECTED] for example...), so that one can get a
> nice overview of all the porting bugs. It'd make sense to synchronise
> this across all architectures, so that it is consistent.

http://lists.debian.org/debian-alpha/2005/12/msg00028.html

Our porters can beat up your porters. ;)

--
Steve Langasek                   Give me a lever long enough and a Free OS
Debian Developer                   to set it on, and I can move the world.
[EMAIL PROTECTED]                                   http://www.debian.org/
Re: Sparc build failure analysis (was Re: StrongARM tactics)
On Sun, Dec 11, 2005 at 12:35:26AM -0800, Steve Langasek wrote:
> On Sat, Dec 10, 2005 at 06:53:47AM -0800, Blars Blarson wrote:
> > FAILED
>
> But FAILED is an advisory state anyway; it doesn't directly benefit the
> port, at all, to have the package listed as "Failed", this is just a
> convenience for folks sifting through the build failures (like the rarely
> used "confirmed" BTS tag is for maintainers). In the long-term, one of two
> things needs to happen with each of these build failures: the package needs
> to be added to P-a-s so the arch no longer tries to build it, or the package
> needs to be fixed -- via porter NMU if necessary.
>
> So as you have the list of these packages, as a porter you can proceed with
> figuring out which of the two categories each falls into, and take the
> necessary action without worrying about the "Failed" state, yes?

Indeed, for practical buildd maintenance purposes, the distinction is
not that important -- though 'Failed' is known to not benefit from a
requeue, while 'Building:Maybe-Failed' might or might not, it's unknown,
most archs should have enough surplus buildd power that retrying
everything once in a while doesn't hurt.

The major benefit is though to make it apparent for porters what to look
into, without all the 'noise' in between of maybe-transient failures.
One could also make sure that the FTBFS bugs are tagged (user-tagged)
with [EMAIL PROTECTED] (etc) for example (or [EMAIL PROTECTED] There
doesn't exist a [EMAIL PROTECTED] for example...), so that one can get a
nice overview of all the porting bugs. It'd make sense to synchronise
this across all architectures, so that it is consistent.

If that is done, and there were some way for porters to easily tag the
build failures themselves with the bug they correspond to (or with the
fact that they correspond to none), and especially to see which failures
are new and yet to be tagged, there'd be an easy and clear workflow for
porters to work on failures.

I don't think there has really been such a defined porter workflow for
build failures, and nobody so far has built/defined one to the best of
my knowledge. And this touches one of the core points Vancouver is
intended to solve: *porters* need to work on *porting*, and help
actively and actually fixing porting issues in the archive. If creating
a better interface for people to work on this is a part of achieving it,
so be it. I'll see whether I can hack something together for this,
extending buildd.d.o/~jeroen/status.

--Jeroen

--
Jeroen van Wolffelaar
[EMAIL PROTECTED] (also for Jabber & MSN; ICQ: 33944357)
http://Jeroen.A-Eskwadraat.nl
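[Editor's sketch, not part of Jeroen's mail] The workflow described above boils down to splitting build failures into three groups: tagged with a bug, tagged as "no bug", and not yet looked at. A minimal sketch of that split follows; the `Failure` record, the `triage` function, and the convention that `None` means "reviewed, no bug" are assumptions made for illustration, not an existing interface of buildd.d.o.

```python
from typing import Dict, Iterable, List, NamedTuple, Optional, Tuple


class Failure(NamedTuple):
    """One failed build; the fields are a hypothetical minimal key."""
    package: str
    version: str
    arch: str


def triage(
    failures: Iterable[Failure],
    tags: Dict[Failure, Optional[int]],
) -> Tuple[List[Failure], Dict[Failure, int], List[Failure]]:
    """Split failures into (new, with_bug, without_bug).

    `tags` maps a failure to a bug number, or to None when a porter has
    reviewed it and decided that no bug corresponds to it.
    """
    new: List[Failure] = []
    with_bug: Dict[Failure, int] = {}
    without_bug: List[Failure] = []
    for failure in failures:
        if failure not in tags:
            new.append(failure)            # nobody has looked at it yet
        elif tags[failure] is None:
            without_bug.append(failure)    # reviewed, e.g. transient
        else:
            with_bug[failure] = tags[failure]
    return new, with_bug, without_bug


# qterm and its bug number appear elsewhere in this thread; the kexi
# version string is invented for the example.
qterm = Failure("qterm", "0.4.0pre3-2+b1", "sparc")
kexi = Failure("kexi", "0.0-hypothetical", "sparc")
new, with_bug, without_bug = triage([qterm, kexi], {qterm: 342381})
print(new)
```

The "new" list is the part a porter would work through first; once every entry has a tag, the overview of porting bugs falls out of `with_bug`.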
Re: Sparc build failure analysis (was Re: StrongARM tactics)
On Sat, Dec 10, 2005 at 06:53:47AM -0800, Blars Blarson wrote:
> In article <[EMAIL PROTECTED]> [EMAIL PROTECTED] writes:
> >On Tue, Dec 06, 2005 at 05:21:46PM -0800, Blars Blarson wrote:
> >> I can do the analyzing, but what should I do with the results?
> >> [EMAIL PROTECTED] seems to be a black hole. You'll need to find
> >> someone willing to communicate with access to the buildd queues before
> >> the porters can do anything.
> >
> >I said that deciding which packages should belong in P-a-s is porter work;
> >as is filing bugs on failed packages that shouldn't, providing patches, and
> >doing porter NMUs if necessary.
>
> Again: what can I do with such a list? See the list below.
>
> >If the porters do this effectively, there's really not much need at all for
> >telling the buildd maintainers about transient build failures, because
> >they'll be pretty obvious (and account for the majority of failures, as it
> >should be).
>
> Just because it is obvious does not mean that the buildd administrator
> does the correct thing. kq was "uploaded" 51 days ago, trustedqsl was
> "uploaded" 25 days ago, neither is in the archive.
>
> openoffice.org has been "building" for 8 days, it only took 57 hours
> on my slower than any current sparc buildd pbuilder. kexi has been
> "building" for 6 days, it took less than 2 hours.

Hi Blars et al.,
has anyone ever considered a check in the buildd infrastructure to
alert someone (buildd admin and/or others) if a build is taking too long
(eg openoffice usually takes between 2-3 hours to build and the current
build has been building for 10 hours+)? Something like a database entry
or a database of either previous build times or last build time, as a
way to not have a buildd tied up with an obvious build issue and thus
allow the issue to be addressed sooner, allowing more buildd
throughput.

I'd help, but I have neither the skill nor the access to the buildd
infrastructure (as I'm not a DD or a buildd admin), so I try to give
ideas that I feel are helpful. Anyway, hope those buildds (and their
admins) are humming along smoothly!
Cheers,
Kev

--
counter.li.org #238656 -- goto counter.li.org and be counted!
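[Editor's sketch, not part of Kev's mail] The check proposed above can be illustrated with a few lines of Python. This is only a sketch under assumptions: the function name `is_overdue`, the use of the median of previous build times, the factor of 3, and the minimum of three samples are all made up for the example and are not taken from any buildd or buildd.net code.

```python
from statistics import median
from typing import Sequence


def is_overdue(
    previous_hours: Sequence[float],
    elapsed_hours: float,
    factor: float = 3.0,     # assumed tolerance, not from the thread
    min_samples: int = 3,    # assumed minimum history before warning
) -> bool:
    """Return True if the running build looks stuck.

    A build counts as overdue when it has run longer than `factor`
    times the median of earlier build times for the same package and
    architecture. With too little history no warning is raised.
    """
    if len(previous_hours) < min_samples:
        return False
    return elapsed_hours > factor * median(previous_hours)


# The openoffice example from the mail: usually 2-3 hours, now 10+.
print(is_overdue([2.0, 2.5, 3.0], 10.0))
```

The median is used rather than the mean so that a single earlier runaway build does not inflate the threshold; a per-architecture history would be needed, since the same package builds at very different speeds on, say, m68k and amd64.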
Re: Sparc build failure analysis (was Re: StrongARM tactics)
On Sat, Dec 10, 2005 at 06:53:47AM -0800, Blars Blarson wrote:
> >I said that deciding which packages should belong in P-a-s is porter work;
> >as is filing bugs on failed packages that shouldn't, providing patches, and
> >doing porter NMUs if necessary.

> Again: what can I do with such a list? See the list below.

Changes to the P-a-s list should be sent to the contacts listed at the top
of this file (http://buildd.debian.org/quinn-diff/Packages-arch-specific).

> >If the porters do this effectively, there's really not much need at all for
> >telling the buildd maintainers about transient build failures, because
> >they'll be pretty obvious (and account for the majority of failures, as it
> >should be).

> Just because it is obvious does not mean that the buildd administrator
> does the correct thing. kq was "uploaded" 51 days ago, trustedqsl was
> "uploaded" 25 days ago, neither is in the archive.

Well, release-wise, the practical impact here is quite small; for packages
that aren't needed in order to fix RC bugs, for my part I'm quite content to
have the buildd admins manage such give-backs on their own schedule. Being
more responsive to give-back requests may generally make people happier, but
there's also a context-switch cost associated with such status polling; in
the non-RC cases, does it really hurt anything to *not* have these packages
given back quickly? If not, isn't a better solution for people to be
understanding of this?

kq and pointless given back now, btw (not trustedqsl, which is "Failed"
rather than "Uploaded").

> openoffice.org has been "building" for 8 days, it only took 57 hours
> on my slower than any current sparc buildd pbuilder. kexi has been
> "building" for 6 days, it took less than 2 hours.

openoffice.org is listed as "building" because the buildd crashed mid-build.
Ryan Murray and the package maintainer discussed this on IRC when it
happened; it was not immediately given back because of concerns over whether
the build might take down a *second* buildd while there was still a
significant backlog due to the c2a transition.

No, this isn't perfectly transparent; but yes, it should be acceptable.
There's almost never a reason to fret over builds-gone-missing for at least,
say, a week and a half, which is about how long it would take for the
package to be eligible for testing. In OOo's case, try adding another couple
of weeks to that for the current c2a transition that it blocks on...

> The sparc buildd maintainer has in the past left transient build
> failures lie for over 6 months. For the past year he's been requeuing
> all maybe-failed packages every 1-3 months.

Well, a) we now have auto-dep-wait, so this is even less of a problem now
than it might have been before; b) in the general case, it's not much of a
problem.

> REQUEUE

> qterm 0.4.0pre3-2+b1 342381

Consider that this was a bug in a *toolchain* package, so more than a simple
give-back is required: gcc-4.0 needs to be upgraded on the buildds for this
to work. Given that this was caused by a stray \ in the source that didn't
belong, it would be advisable for the package maintainer to defensively
correct this as well.

> DEP-WAIT

> galago-sharp          libmono-dev
> dmraid                libklibc-dev
> motv                  ibzvbi0 (= 0.2.17-3.0.1)
> qmailadmin            vpopmail-bin
> wvstreams             libxplc0.3.13-dev
> sylpheed-claws-gtk2   libclamv-dev
> digikam               libartsl-dev (>=1.4.2)
> tulip                 mesa-common-dev (= 6.2.1-7)
> liferea               libdbus-glib-1-dev
> kwave                 kdemultimedia-dev
> ivtools               libace-dev
> fwbuilder             libfwbuilder-dev
> gtksharp2             libmono-dev
> libavg                libavcodeccvs-dev
> gnucash               slib
> memepack              r-base-dev
> r-cran-bayesm         r-base-dev

Um, at least some of these dep-waits are completely wrong; r-base-dev and
slib are arch: all packages, and it's not useful to set a dep-wait on a
package that *is* available. Could you please revise this list accordingly?

Also, setting an (= $version) dep-wait isn't particularly helpful; sometimes
newer source versions will be uploaded before binaries are uploaded for the
current version. So please clarify whether these are meant to be >= or
something else.

> FAILED

But FAILED is an advisory state anyway; it doesn't directly benefit the
port, at all, to have the package listed as "Failed", this is just a
convenience for folks sifting through the build failures (like the rarely
used "confirmed" BTS tag is for maintainers). In the long-term, one of two
things needs to happen with each of these build failures: the package needs
to be added to P-a-s so the arch no longer tries to build it, or the package
needs to be fixed -- via porter NMU if necessary.

So as you have the list of these packages, as a porter you can proceed with
figuring out which of the two categories each falls into, and take the
necessary action without worrying about the "Failed" state, yes?
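[Editor's sketch, not part of Steve's mail] The two objections to the DEP-WAIT list in this message (an unversioned wait on a package that is already available, and an exact "(= version)" wait) lend themselves to a mechanical sanity check. The sketch below is hypothetical: `review_dep_wait`, its message texts, and the `available` set of installable package names are assumptions for illustration, and the regular expression only handles the simple "name (op version)" form seen in the list, not full dependency syntax.

```python
import re
from typing import List, Set

# Matches "name" or "name (op version)" as written in the list above.
DEP_RE = re.compile(
    r"^(?P<name>[^\s(]+)"
    r"(?:\s*\(\s*(?P<op><<|<=|>=|>>|=)\s*(?P<ver>[^)\s]+)\s*\))?$"
)


def review_dep_wait(package: str, dependency: str,
                    available: Set[str]) -> List[str]:
    """Return a list of remarks about one proposed dep-wait entry.

    `available` is an assumed input: names of packages known to be
    installable on the architecture (for example arch: all packages).
    """
    match = DEP_RE.match(dependency.strip())
    if match is None:
        return [f"{package}: cannot parse {dependency!r}"]
    remarks = []
    if match["op"] is None and match["name"] in available:
        remarks.append(
            f"{package}: {match['name']} is already available, "
            "a dep-wait on it is not useful"
        )
    if match["op"] == "=":
        remarks.append(
            f"{package}: exact version wait on {match['name']}, "
            "check whether >= was meant"
        )
    return remarks


available = {"slib", "r-base-dev"}  # the two arch: all packages named above
for pkg, dep in [("gnucash", "slib"),
                 ("tulip", "mesa-common-dev (= 6.2.1-7)"),
                 ("liferea", "libdbus-glib-1-dev")]:
    for remark in review_dep_wait(pkg, dep, available):
        print(remark)
```

A versioned wait on an available package is deliberately not flagged, since whether it is satisfied depends on the version actually in the archive, which this sketch does not model.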
Re: Sparc build failure analysis (was Re: StrongARM tactics)
On Sat, Dec 10, 2005 at 06:53:47AM -0800, Blars Blarson wrote:
> numactl
>       only supports i386 amd64 ia64
>       appears to assume intel-style stuff, would need major redesign
>       for other architectures

There's nothing Intel-specific in here; rather, it assumes NUMA support in
the kernel. Obviously this is only the case for architectures that
support NUMA. I've actually tested it on ppc64, although not on Debian.

> libaio

This should build on all architectures. Currently it needs a tiny bit of
architecture-specific code, but that could be avoided by using proper
syscall macros instead of trying to do its own.