size of git repository (was Re: [BUG] New Kernel Bugs)

2007-11-18 Thread Pavel Machek

On Tue 2007-11-13 12:50:08, Mark Lord wrote:
 Ingo Molnar wrote:
 
 for example git-bisect was godsent. I remember that 
 years ago bisection of a bug was a very laborious task
 so that it was only used as a final, last-ditch
 approach for really nasty bugs. Today we can
 autonomously bisect build bugs via a simple shell
 command around git-bisect run, without any human 
 interaction! This freed up testing resources 
 ..
 
 It's only a godsend for the few people who happen to be 
 kernel developers
 and who happen to already use git.
 
 It's a 540MByte download over a slow link for everyone 
 else.

Hmmm, clean-cg is 7.7G on my machine, and yes I tried
git-prune-packed. What am I doing wrong?
Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

Re: size of git repository (was Re: [BUG] New Kernel Bugs)

2007-11-18 Thread Rene Herman


On 18-11-07 13:44, Pavel Machek wrote:

 On Tue 2007-11-13 12:50:08, Mark Lord wrote:
  It's a 540MByte download over a slow link for everyone
  else.

 Hmmm, clean-cg is 7.7G on my machine, and yes I tried
 git-prune-packed. What am I doing wrong?

clean-cg? But failure to run git repack -a -d every once in a while?

Rene.

Re: size of git repository (was Re: [BUG] New Kernel Bugs)

2007-11-18 Thread Ingo Molnar


* Pavel Machek [EMAIL PROTECTED] wrote:

 On Tue 2007-11-13 12:50:08, Mark Lord wrote:
  Ingo Molnar wrote:
  
  for example git-bisect was godsent. I remember that 
  years ago bisection of a bug was a very laborious task
  so that it was only used as a final, last-ditch
  approach for really nasty bugs. Today we can
  autonomously bisect build bugs via a simple shell
  command around git-bisect run, without any human 
  interaction! This freed up testing resources 
  ..
  
  It's only a godsend for the few people who happen to be 
  kernel developers
  and who happen to already use git.
  
  It's a 540MByte download over a slow link for everyone 
  else.
 
 Hmmm, clean-cg is 7.7G on my machine, and yes I tried 
 git-prune-packed. What am I doing wrong?

git-repack -a -d gives me ~220 MB:

  $ du -s .git
  222064  .git

anyone who can download a 43 MB tar.bz2 tarball for a kernel release 
should be able to afford a _one time_ download size of 250 MB (the size 
of the current kernel.org git repository). If not, burning a CD or DVD 
and carrying it home ought to do the trick. Git is very 
bandwidth-efficient after that point - lots of people behind narrow 
pipes are using it - it's just the initial clone that takes time. And 
given all the history and metadata that the git repository carries (full 
changelogs, annotations, etc.) it's a no-brainer that kernel developers 
should be using it.

(and you can shrink the 250 MB further down by using shallow clones, 
etc.)

yes, some people complained when distros stopped doing floppy installs. 
Some people complained when distros stopped doing CD installs. Yes, i've 
myself done a 250+ MB download over a 56 kbit modem in the past, and 
while it indeed took overnight to finish, it's very much doable. It's 
not really qualitatively different from the 1.5 hours a kernel tar.bz2 
took to download.
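
As a rough check on these figures, the ideal transfer times can be worked out directly. The sketch below is only a back-of-the-envelope estimate: it assumes decimal megabytes and a modem that sustains its nominal 56 kbit/s with no protocol overhead or compression, and the function name `download_hours` is made up for illustration.

```python
def download_hours(size_mb: float, link_kbit_per_s: float) -> float:
    """Ideal transfer time in hours (no overhead, no compression)."""
    bits = size_mb * 1_000_000 * 8             # decimal megabytes -> bits
    seconds = bits / (link_kbit_per_s * 1_000)  # kbit/s -> bit/s
    return seconds / 3600


# Sizes quoted in the message: 43 MB tarball, 250 MB git repository.
for label, size_mb in [("kernel tar.bz2", 43), ("full git clone", 250)]:
    print(f"{label}: {download_hours(size_mb, 56):.1f} h")
```

Under those assumptions the tarball comes out slightly above the 1.5 hours quoted, and the full clone at roughly ten hours, which fits the "overnight" description.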

Ingo

Re: [BUG] New Kernel Bugs

2007-11-15 Thread Theodore Tso

On Wed, Nov 14, 2007 at 06:23:34PM -0500, Daniel Barkalow wrote:
 I don't see any reason that we couldn't have a tool accessible to Ubuntu 
 users that does a real git bisect. Git is really good at being scripted 
 by fancy GUIs. It should be easy enough to have a drop down with all of 
 the Ubuntu kernel package releases, where the user selects what works and 
 what doesn't.

It's possible users who haven't yet downloaded a git repository have
to surmount some obstacles that might cause them to lose interest.
First, they have to download some 190 megs of git repository, and if
they have a slow link, that can take a while, and then they have to
build each kernel, which can take a while.  A full kernel build with
everything selected can take a good 30 minutes or more, and that's on a
fast dual-core machine with 4 gigs of memory and 7200rpm disk drives.
On a slower, memory-limited laptop, doing a single kernel build can
take more time than the user has patience for; multiply that by 7 or 8
build and test boots, and it starts to get tiresome.
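
The "7 or 8" figure is what one would expect from halving a range of a few hundred commits. A minimal sketch of that arithmetic, assuming a linear history and the 30-minute build time mentioned above (the helper `bisect_steps` is hypothetical, and real git bisect can differ by a step depending on the shape of the history):

```python
import math


def bisect_steps(commits: int) -> int:
    """Approximate number of build-and-test rounds to isolate one bad commit."""
    return math.ceil(math.log2(commits)) if commits > 1 else 0


BUILD_MINUTES = 30  # per-build time taken from the message; an assumption

for commits in (100, 200, 10_000):
    steps = bisect_steps(commits)
    hours = steps * BUILD_MINUTES / 60
    print(f"{commits} commits: {steps} builds, about {hours:.1f} h of building")
```

The cost grows only logarithmically with the number of commits, so the per-build time dominates the user's experience far more than the size of the range.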

And then on top of that there are the issues about whether there is
enough support for dealing with hitting kernel revisions that fail due
to other bugs getting merged in during the -rc1 process, etc.

I agree that a tool that automated the bisection process and walked
the user through it would be helpful, but I believe it would be
possible for us to do better.

- Ted

Re: [BUG] New Kernel Bugs

2007-11-15 Thread Ben Dooks

On Tue, Nov 13, 2007 at 10:34:37PM +0000, Russell King wrote:
 On Tue, Nov 13, 2007 at 06:25:16PM +0000, Alan Cox wrote:
   Given the wide range of ARM platforms today, it is utterly idiotic to
   expect a single person to be able to provide responses for all ARM bugs.
   I for one wish I'd never *VOLUNTEERED* to be a part of the kernel
   bugzilla, and really *WISH* I could pull out of that function.
  
  You can. Perhaps that bugzilla needs to point to some kind of
  [EMAIL PROTECTED] list for the various ARM platform
  maintainers ?
 
 That might work - though it would be hard to get all the platform
 maintainers to be signed up to yet another mailing list, I'm sure
 sufficient would do.

As long as it would just be bug reports, I'm sure that most of us
could be persuaded to subscribe. Another list for general
discussions is probably not going to be read; the current list
provides more than enough to keep us busy.

-- 
Ben

Q:  What's a light-year?
A:  One-third less calories than a regular year.

Re: [BUG] New Kernel Bugs

2007-11-15 Thread J. Bruce Fields

On Thu, Nov 15, 2007 at 01:50:43PM +1100, Neil Brown wrote:
 Virtual Folders.
 
 I use VM mode in EMACS, but I believe some other mail readers have the
 same functionality.
 I have a virtual folder called "nfs" which shows me all mail in my
 inbox which has the string 'nfs' or 'lockd' in a To, Cc, or Subject
 field.  When I visit that folder, I see all mail about nfs, whether it
 was sent to me personally, or to a relevant list, or to lkml.

Hm (googling around for mutt and virtual folders): looks like I can
get most of the way there in mutt with some macros based on its limit
command:

http://www.tummy.com/journals/entries/jafo_20060303_00

Thanks.--b.

Re: [BUG] New Kernel Bugs

2007-11-14 Thread Matthew Wilcox

On Wed, Nov 14, 2007 at 12:46:20AM -0700, Denys Vlasenko wrote:
 Finally they replied and asked to rediff it against their
 git tree. I did that and sent patches back. No reply since then.
 
 And mind you, the patch is not trying to do anything
 complex, it mostly moves code around, removes 'inline',
 adds 'const'. What should I think about it?

I'm waiting for an ACK/NAK from Hannes, the maintainer.  What should I
do?

-- 
Intel are signing my paycheques ... these opinions are still mine
Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours.  We can't possibly take such
a retrograde step.

Re: [BUG] New Kernel Bugs

2007-11-14 Thread Hannes Reinecke

Matthew Wilcox wrote:
 On Wed, Nov 14, 2007 at 12:46:20AM -0700, Denys Vlasenko wrote:
 Finally they replied and asked to rediff it against their
 git tree. I did that and sent patches back. No reply since then.

 And mind you, the patch is not trying to do anything
 complex, it mostly moves code around, removes 'inline',
 adds 'const'. What should I think about it?
 
 I'm waiting for an ACK/NAK from Hannes, the maintainer.  What should I
 do?
 
I haven't actually been able to test it here (too busy, sorry). If someone
else confirms it does its job then

Acked-by: Hannes Reinecke [EMAIL PROTECTED]

Cheers,

Hannes
-- 
Dr. Hannes Reinecke   zSeries  Storage
[EMAIL PROTECTED] +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Markus Rex, HRB 16746 (AG Nürnberg)

Re: [BUG] New Kernel Bugs

2007-11-14 Thread Pavel Machek

Hi!

  Suspend to RAM resume hangs on a tickless (NO_HZ) kernel
  http://bugzilla.kernel.org/show_bug.cgi?id=9275
  Kernel: 2.6.23
  This is HP notebook nc6320 T2400 945GM
 
 No response from developers

Maybe I'm optimistic, but I expected Ingo/Thomas to look after nohz
problems. nohz=off highres=off fixes more than one suspend problem...

...stuff I've seen with NOHZ even without suspend (cursor blinking
irregularly) makes me think that nohz perhaps should not be used in
production just yet...

Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

Re: [BUG] New Kernel Bugs

2007-11-14 Thread Ingo Molnar


* Randy Dunlap [EMAIL PROTECTED] wrote:

  (and this is in no way directed at the networking folks - it holds 
  for all of us. I have one main complaint about networking: the 
  separate netdev list is a bad idea - networking regressions should 
  be discussed and fixed on lkml, like most other subsystems are. Any 
  artificial split of the lk discussion space is bad.)
 
 but here I disagree.  LKML is already too busy and noisy. Major 
 subsystems need their own discussion areas.

That's a stupid argument. We lose much more by forced isolation of 
discussion than what we win by having less traffic! It's _MUCH_ easier 
to narrow down information (by filtering by threads, by topics, by people,
etc.) than it is to cobble information together from various fractured
sources. We learned it _again and again_ that isolation of kernel
discussions causes bad things.

In fact this thread is the very example: David points out that on netdev 
some of those bugs were already discussed and resolved. Had it been all 
on lkml we'd all be aware of it.

this is a single kernel project that is released together as one 
codebase, so a central place of discussion is obvious and common-sense.

so please stop this "too busy" and "too noisy" nonsense already. It was
nonsense 10 years ago and it's nonsense today. In 10 years the kernel
grew from a 1 million lines codebase to an 8 million lines codebase, so
what? Deal with it and be intelligent about filtering your information
influx instead of imposing a hard pre-filtering criterion that restricts
intelligent processing of information.

Ingo

Re: [BUG] New Kernel Bugs

2007-11-14 Thread Fabio Comolli

FWIW, I see the same problem with another HP notebook, DV4378EA with
radeon X700 video card. It does not happen frequently but I can say
that since I disabled the tickless feature I can't reproduce the
problem anymore.

On Nov 14, 2007 2:24 PM, Pavel Machek [EMAIL PROTECTED] wrote:
 Hi!

   Suspend to RAM resume hangs on a tickless (NO_HZ) kernel
   http://bugzilla.kernel.org/show_bug.cgi?id=9275
   Kernel: 2.6.23
   This is HP notebook nc6320 T2400 945GM
 
  No response from developers

 Maybe I'm optimistic, but I expected Ingo/Thomas to look after nohz
 problems. nohz=off highres=off fixes more than one suspend problem...

 ...stuff I've seen with NOHZ even without suspend (cursor blinking
 irregularly) makes me think that nohz perhaps should not be used in
 production just yet...

 Pavel
 --
 (english) http://www.livejournal.com/~pavelmachek
 (cesky, pictures) 
 http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

 -
 To unsubscribe from this list: send the line unsubscribe linux-kernel in
 the body of a message to [EMAIL PROTECTED]
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
 Please read the FAQ at  http://www.tux.org/lkml/

Re: [BUG] New Kernel Bugs

2007-11-14 Thread Ingo Molnar


* Mark Lord [EMAIL PROTECTED] wrote:

 You're assuming that everything in linux-2.6 was downloaded; that's 
 not true.  Everything in linux-2.6/.git was downloaded; but then you 
 do a checkout which happens to approximately double the size of the 
 linux-2.6 directory.
 ..

 Ah, I wondered why it took only half an hour to download.

and you can get even lower than the 260MB by downloading a shallow clone 
of v2.6.23 and then populating the git tree from that point on. (see the
--depth parameter of git-clone) [because most of the time you want to 
bisect back to the last stable release, not back to 2 years of git 
history.]

Ingo

Re: [BUG] New Kernel Bugs

2007-11-14 Thread Randy Dunlap

On Wed, 14 Nov 2007 15:08:47 +0100 Ingo Molnar wrote:

 
 * Randy Dunlap [EMAIL PROTECTED] wrote:
 
   (and this is in no way directed at the networking folks - it holds 
   for all of us. I have one main complaint about networking: the 
   separate netdev list is a bad idea - networking regressions should 
   be discussed and fixed on lkml, like most other subsystems are. Any 
   artificial split of the lk discussion space is bad.)
  
  but here I disagree.  LKML is already too busy and noisy. Major 
  subsystems need their own discussion areas.
 
 That's a stupid argument. We lose much more by forced isolation of 
 discussion than what we win by having less traffic! It's _MUCH_ easier 
 to narrow down information (by filtering by threads, by topics, by people,
 etc.) than it is to cobble information together from various fractured
 sources. We learned it _again and again_ that isolation of kernel 
 discussions causes bad things.
 
 In fact this thread is the very example: David points out that on netdev 
 some of those bugs were already discussed and resolved. Had it been all 
 on lkml we'd all be aware of it.

or had someone been on netdev.

 this is a single kernel project that is released together as one 
 codebase, so a central place of discussion is obvious and common-sense.

Central doesn't have to mean one-and-only-one-list-for-everything.

 so please stop this "too busy" and "too noisy" nonsense already. It was
 nonsense 10 years ago and it's nonsense today. In 10 years the kernel
 grew from a 1 million lines codebase to an 8 million lines codebase, so
 what? Deal with it and be intelligent about filtering your information
 influx instead of imposing a hard pre-filtering criterion that restricts
 intelligent processing of information.

So you have a preferred method of handling email.  Please don't
force it on the rest of us.

I'll plan to use lkml-list-only when you have convinced DaveM to drop
all of the other mailing lists at vger.kernel.org.  Yeah, sure.

---
~Randy

Re: [BUG] New Kernel Bugs

2007-11-14 Thread J. Bruce Fields

On Wed, Nov 14, 2007 at 09:38:20AM -0800, Randy Dunlap wrote:
 On Wed, 14 Nov 2007 15:08:47 +0100 Ingo Molnar wrote:
  so please stop this "too busy" and "too noisy" nonsense already. It was
  nonsense 10 years ago and it's nonsense today. In 10 years the kernel
  grew from a 1 million lines codebase to an 8 million lines codebase, so
  what? Deal with it and be intelligent about filtering your information
  influx instead of imposing a hard pre-filtering criterion that restricts
  intelligent processing of information.
 
 So you have a preferred method of handling email.  Please don't
 force it on the rest of us.

I'd be curious for any pointers on tools, actually.  I read (ok, skim)
lkml but still overlook relevant bug reports occasionally.
(Fortunately, between Trond and Andrew and others forwarding things it's
not actually a problem, but I'm still curious).

--b.

Re: [BUG] New Kernel Bugs

2007-11-14 Thread Kok, Auke

Denys Vlasenko wrote:
 On Wednesday 14 November 2007 00:27, Adrian Bunk wrote:
 You missed the following in my email:
 we slowly scare them away due to the many bug reports without any
  reaction.

 The problem is that bug reports take time. If you go away from easy
 things like compile errors then even things like describing what does
 no longer work, ideally producing a scenario where you can reproduce it
 and verifying whether it was present in previous kernels can easily take
 many hours that are spent before the initial bug report.

 If the bug report then gets ignored we discourage the person who sent
 the bug report to do any work related to the kernel again.
 
 Cannot agree more. I am in a similar position right now.
 My patch to aic7xxx driver was submitted four times
 with not much reaction from scsi guys.
 
 Finally they replied and asked to rediff it against their
 git tree. I did that and sent patches back. No reply since then.
 
 And mind you, the patch is not trying to do anything
 complex, it mostly moves code around, removes 'inline',
 adds 'const'. What should I think about it?

this has nothing to do with the bugs on bugzilla.

you're trying to send a janitor patch. It should be no surprise that the
response to that is neither heated nor a joyous reception :)

If you have a problem getting your cleanup patch to the driver maintainer,
send it to the subsystem maintainer instead, or even the janitors, or even
Adrian Bunk who will gladly push it to everyone. Or, even to Andrew Morton
who will carry it in -mm for a while and then harass the subsystem
maintainer to merge it for you!

Cheers,

Auke

Re: [BUG] New Kernel Bugs

2007-11-14 Thread Russell King

On Wed, Nov 14, 2007 at 02:07:06AM -0800, David Miller wrote:
 From: Russell King [EMAIL PROTECTED]
 Date: Wed, 14 Nov 2007 09:55:07 +0000

  On Tue, Nov 13, 2007 at 05:55:51PM -0800, David Miller wrote:
   I've created [EMAIL PROTECTED]

  By doing so you've just said (implicitly) that you can not tolerate
  someone having a different opinion from your own.

 I created a mailing list on a machine where I provide such services.

 People can choose to use or not use the new list, it is their choice.

  While I accept *your* right to run *your* lists how you please, you
  are unable to accept *my* right to run *my* lists how I see fit.

 I didn't tell you to take your list down or to run it in some other
 way.  I didn't tell you to unsubscribe everyone and move them over
 to the new list either.

I didn't say that you were.

 I've provided an alternative, and people can pick and choose how they
 see fit.  I'm letting natural selection run its course.  Are you
 able to cope with the fact that people might not want to use your
 list any longer?  Perhaps that is what bugs you so much about my
 giving people an alternative choice.

Absolutely, and if you'd have read my message you'd have seen that
I'd said effectively the same thing that you're saying here.

Having been flamed for not reading emails properly by AKPM shall
I flame you for not reading my emails properly?  Oh no, it's merely
human to occasionally have such misunderstandings.  Unless you're
rmk.

-- 
Russell King
 Linux kernel2.6 ARM Linux   - http://www.arm.linux.org.uk/
 maintainer of:

Re: [BUG] New Kernel Bugs

2007-11-14 Thread Russell King

On Wed, Nov 14, 2007 at 01:24:48PM +0000, Pavel Machek wrote:
 Hi!
 
   Suspend to RAM resume hangs on a tickless (NO_HZ) kernel
   http://bugzilla.kernel.org/show_bug.cgi?id=9275
   Kernel: 2.6.23
   This is HP notebook nc6320 T2400 945GM
  
  No response from developers
 
 Maybe I'm optimistic, but I expected Ingo/Thomas to look after nohz
 problems. nohz=off highres=off fixes more than one suspend problem...
 
 ...stuff I've seen with NOHZ even without suspend (cursor blinking
 irregularly) makes me think that nohz perhaps should not be used in
 production just yet...

It appears that bug 9229 has been solved, and the reporter of that
bug now says that:

  If I unset NO_TZ suspend/resume works.
  If I set it suspend/resume doesn't works.

So I think this guy is now suffering from bug #9275

-- 
Russell King
 Linux kernel2.6 ARM Linux   - http://www.arm.linux.org.uk/
 maintainer of:

Re: [BUG] New Kernel Bugs

2007-11-14 Thread Ingo Molnar


* Randy Dunlap [EMAIL PROTECTED] wrote:

 On Wed, 14 Nov 2007 21:16:39 +0100 Ingo Molnar wrote:
 
  countered by the underlined sentences above, just in case you missed 
  it.
 
 I didn't miss your claim.

ok, then you conceded it by not replying to it? good ;-)

Ingo

Re: [BUG] New Kernel Bugs

2007-11-14 Thread Ingo Molnar

* David Miller [EMAIL PROTECTED] wrote:

 From: Ingo Molnar [EMAIL PROTECTED]
 Date: Wed, 14 Nov 2007 15:08:47 +0100

  In fact this thread is the very example: David points out that on netdev 
  some of those bugs were already discussed and resolved. Had it been all 
  on lkml we'd all be aware of it.

 That's a ridiculous argument.

 One other reason these bugs are resolved, is that the networking 
 developers only need to subscribe to netdev and not have to listen to 
 all the noise on lkml.

what noise? If someone really wants networking discussions only, use 
this procmail rule:

  :0 HBc
  * .*net: *
  sched-patches

to separate it into an extra folder and use "net: " as an agreed upon
Subject line prefix if you really want to narrow things down. (But there
would still be all the other mail just in case the developer has to look at
the wider picture. There would be no "I'm only subscribed to netdev"
excuse.)

but there should still be one central repository for all kernel 
discussions - just like there is one central repository for all kernel 
code.

 People who want to manage bugs know what list to look on and contact 
 about problems.

i think that's the problem. Developers (and here i don't mean you) who
want to do development only, without being exposed to the global state
of the kernel and without being exposed to bugs. I think that's the
basic mindset difference. That is one of the factors that is causing
asymmetric allocation of developers and the increasing detachment from
reality.

 Dumping even more crap on lkml is not the answer.

that "crap" that i'd like to see dumped upon lkml would be netdev
traffic mainly - most of the other kernel development lists (and i'm
subscribed to many of them) are low-traffic. netdev is the main reason
why we cannot do a "one common discussion forum" approach.

Ingo

Re: [BUG] New Kernel Bugs

2007-11-14 Thread Ingo Molnar

* James Bottomley [EMAIL PROTECTED] wrote:

 On Wed, 2007-11-14 at 11:56 -0800, David Miller wrote:
  From: Ingo Molnar [EMAIL PROTECTED]
  Date: Wed, 14 Nov 2007 15:08:47 +0100

   In fact this thread is the very example: David points out that on netdev 
   some of those bugs were already discussed and resolved. Had it been all 
   on lkml we'd all be aware of it.

  That's a ridiculous argument.

  One other reason these bugs are resolved, is that the networking 
  developers only need to subscribe to netdev and not have to listen 
  to all the noise on lkml.

  People who want to manage bugs know what list to look on and contact 
  about problems.

  Dumping even more crap on lkml is not the answer.

 I agree totally with David, and this goes for SCSI too.  If it's not 
 reported on linux-scsi, there's a significant chance of us missing the 
 bug report.  The fact that some people notice bugs go past on LKML and 
 forward them to linux-scsi is a happy accident and not necessarily 
 something to rely on.

 LKML has 10-20x the traffic of linux-scsi and a much smaller signal to 
 noise ratio.  Having a specialist list where all the experts in the 
 field hang out actually enhances our ability to fix bugs.

you are actually proving my point. People have to scan lkml for SCSI 
regressions _anyway_, because otherwise _you_ would miss them. In the 
case a user is fortunate enough to realize that a regression is SCSI 
related, and he is lucky enough to pre-select the SCSI mailing list in 
the first go, he might get a fix from you. That already reduces the 
number of useful bugreports by about an order of magnitude.

Ingo

Re: [BUG] New Kernel Bugs

2007-11-14 Thread david


On Wed, 14 Nov 2007, Ingo Molnar wrote:

  Dumping even more crap on lkml is not the answer.

 that "crap" that i'd like to see dumped upon lkml would be netdev
 traffic mainly - most of the other kernel development lists (and i'm
 subscribed to many of them) are low-traffic. netdev is the main reason
 why we cannot do a "one common discussion forum" approach.

hmm, how much work would it be to tweak the mail software on vger to have
a [EMAIL PROTECTED] that got a copy of any linux-* list hosted by
vger?

this would solve half the problem (people on linux-kernel not seeing
discussions on the other lists)

David Lang

Re: [BUG] New Kernel Bugs

2007-11-14 Thread Daniel Barkalow

On Tue, 13 Nov 2007, Theodore Tso wrote:

 There are two parts to this.  One is a Ubuntu development kernel which
 we can give to large numbers of people to expand our testing pool.
 But if we don't do a better job of responding to bug reports that
 would be generated by expanded testing this won't necessarily help us.
 
 The other is an automated set of standard pre-built bisection points so
 that testers can more easily localize a bug down to a few hundred
 commits without needing to learn how to use git bisect (think Ubuntu
 users).

I don't see any reason that we couldn't have a tool accessible to Ubuntu 
users that does a real git bisect. Git is really good at being scripted 
by fancy GUIs. It should be easy enough to have a drop down with all of 
the Ubuntu kernel package releases, where the user selects what works and 
what doesn't. Then the tool clones a git repository with flags to only get 
relevant parts, and then leads a bisect run, where it's also 
configuring, building, and installing the kernels (as a different grub 
entry), and providing instructions in general. Fundamentally, git bisect 
is a really low-interaction process: you tell it a couple of commits, and
then it does stuff, and then you tell it "I tested, and it worked" or "I
tested, and it had the problem" or "Something else went wrong", and it
asks you something new. Other than that, it just takes time (and a build
system hook, which this tool would handle for the kernel). Eventually, it 
tells you what to report, and you do so.
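
The interaction loop described above can be sketched in a few lines. This is an illustration of the bisection idea only, not of git's implementation: it assumes a linear list of commits ordered oldest to newest, with the first known good and the last known bad, and it does not model the "Something else went wrong" case. The names `first_bad` and `is_bad` are hypothetical.

```python
from typing import Callable, Sequence, Tuple


def first_bad(commits: Sequence[str],
              is_bad: Callable[[str], bool]) -> Tuple[str, int]:
    """Return the first bad commit and the number of tests performed.

    commits[0] must be known good and commits[-1] known bad.
    """
    good, bad = 0, len(commits) - 1
    tests = 0
    while bad - good > 1:
        mid = (good + bad) // 2
        tests += 1
        if is_bad(commits[mid]):   # user answers "it had the problem"
            bad = mid
        else:                      # user answers "it worked"
            good = mid
    return commits[bad], tests


# Hypothetical history: c0 works, c100 is broken, c73 introduced the bug.
history = [f"c{i}" for i in range(101)]
culprit, tests = first_bad(history, lambda c: int(c[1:]) >= 73)
print(culprit, tests)
```

In a real tool the `is_bad` callback would be where the kernel gets configured, built, installed, and booted, and where the user is asked for a verdict.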

-Daniel
*This .sig left intentionally blank*

Re: [BUG] New Kernel Bugs

2007-11-14 Thread Neil Brown

On Tuesday November 13, [EMAIL PROTECTED] wrote:
 On Tuesday 13 November 2007 07:08, Mark Lord wrote:
  Ingo Molnar wrote:
  ..
 
   This is all QA-101 that _cannot be argued against on a rational basis_,
   it's just that these sorts of things have been largely ignored for
   years, in favor of the all-too-easy "open source means many eyeballs and
   that is our QA" answer, which is a _good_ answer but by far not the most
   intelligent answer! Today "many eyeballs" is simply not good enough and
   nature (and other OS projects) will route us around if we don't change.
 
  ..
 
  QA-101 and "many eyeballs" are not at all in opposition.
  The latter is how we find out about bugs on uncommon hardware,
  and the former is what we need to track them and overall quality.
 
  A HUGE problem I have with current efforts, is that once someone
  reports a bug, the onus seems to be 99% on the *reporter* to find
  the exact line of code or commit.  Ghad what a repressive method.
 
 This is the only method that scales.

That sounds overly harsh, and the rest of your mail sounds much more
moderate and sensible - I can only assume you were using hyperbole??

Putting the onus on the reporter is simply not going to work unless
you have a business relationship.  In the community, we are all
volunteering our time (well, maybe my employer is volunteering my time
to do community support, but the effect is the same).

I would hope that the focus of developers is to empower bug reporters
to provide further information (and as has been said, git bisect is
a great empowerer).  Some people will be incredibly helpful, especially
if you ask politely and say thank you.  Others won't for any of a
number of reasons - and maybe that means their bug won't get fixed.

To my eyes, the only method that scales is investing effort in
encouraging and training bug reporters.  Some of that effort might not
produce results, but when others among those you have encouraged start
answering the newbie questions on the list and save you the time, you
get a distinct feeling that it was all worth while.


I think we are in agreement - I just wanted to take issue with that
one sentence :-)  The rest is great.

NeilBrown

 
 Developer has only 24 hours in each day, and sometimes he needs to eat,
 sleep, and maybe even pay attention to e.g. his kids.
 
 But bug reporters are much more numerous and they have more
 hours in one day combined.
 
 BUT - it means that developers should try to increase user base,
 not scare users away.
 
  And if the developer who broke the damn thing, or who at least
  claims to be supporting that code, cannot reproduce the bug,
  they drop it completely.
 
 Developer should let reporter know that reporter needs to help
 a bit here. Sometimes a bit of hand holding is needed, but it
 pays off because you breed more qualified testers/bug reporters.
 
  Contrast that flawed approach with how Linus does things..
  he thinks through the symptoms, matches them to the code,
  and figures out what the few possibilities might be,
  and feeds back some trial balloon patches for the bug reporter to try.
 
  MUCH better.
 
  And remember, *I'm* an old-time Linux kernel developer.. just think about
  the people reporting bugs who haven't been around here since 1992..
 
 Yes. Developers should not grow more and more unhelpful
 and arrogant towards their users just because inexperienced
 users send incomplete/poorly written bug reports.
 They need to provide help, not humiliate/ignore.
 
 I think we agree here.
 --
 vda

Re: [BUG] New Kernel Bugs

2007-11-14 Thread Neil Brown

On Wednesday November 14, [EMAIL PROTECTED] wrote:
 On Wed, Nov 14, 2007 at 09:38:20AM -0800, Randy Dunlap wrote:
  On Wed, 14 Nov 2007 15:08:47 +0100 Ingo Molnar wrote:
   so please stop this "too busy" and "too noisy" nonsense already. It was
   nonsense 10 years ago and it's nonsense today. In 10 years the kernel
   grew from a 1 million lines codebase to an 8 million lines codebase, so
   what? Deal with it and be intelligent about filtering your information
   influx instead of imposing a hard pre-filtering criterion that restricts
   intelligent processing of information.
  
  So you have a preferred method of handling email.  Please don't
  force it on the rest of us.
 
 I'd be curious for any pointers on tools, actually.  I read (ok, skim)
 lkml but still overlook relevant bug reports occasionally.
 (Fortunately, between Trond and Andrew and others forwarding things it's
 not actually a problem, but I'm still curious).

Virtual Folders.

I use VM mode in EMACS, but I believe some other mail readers have the
same functionality.
I have a virtual folder called "nfs" which shows me all mail in my
inbox which has the string 'nfs' or 'lockd' in a To, Cc, or Subject
field.  When I visit that folder, I see all mail about nfs, whether it
was sent to me personally, or to a relevant list, or to lkml.

Admittedly if someone doesn't bother to choose a meaningful Subject,
then I might miss that.  I think this mostly happens when Andrew sends
a -mm announcement, asks people to change the subject line when
following up, and someone follows up without changing the subject line
and says "NFS doesn't work any more".

I have another virtual folder which matches "md" and "raid" and
"mdadm" in any header (so when the people from coraid.com talk about
ATA over Ethernet, that gets badly filed, but it is a small cost).

Then I have the "bkernel" (boring kernel) folder for all mail from
lkml that doesn't mention nfs or raid or md, and isn't from or to
me.  That folder I skim every week or so and just read the juicy
debates and look for interesting tidbits from interesting people -
then delete the whole folder, mostly unread.

I don't think I could cope with mail without virtual folders.

NeilBrown
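
A rule set like the one described in this message can be approximated outside a mail reader. What follows is only a minimal sketch under stated assumptions, not how VM mode works internally: `classify_mail` is a hypothetical helper name, folded (continuation) header lines are ignored, and the "isn't from or to me" condition for the bkernel folder is left out.

```shell
# Hypothetical helper: read one mail message on stdin and print the
# virtual folder(s) it would show up in. Only the headers are examined.
classify_mail() {
    # Headers end at the first blank line; the body is never inspected.
    headers=$(sed '/^$/q')
    matched=no

    # "nfs" folder: 'nfs' or 'lockd' in a To, Cc, or Subject field.
    if printf '%s\n' "$headers" | grep -Eiq '^(To|Cc|Subject):.*(nfs|lockd)'; then
        echo nfs
        matched=yes
    fi

    # "raid" folder: 'raid', 'md' or 'mdadm' in any header. 'raid' is a
    # plain substring match, so coraid.com is misfiled here as well.
    if printf '%s\n' "$headers" | grep -Eiq 'raid|(^|[^[:alnum:]])md(adm)?([^[:alnum:]]|$)'; then
        echo raid
        matched=yes
    fi

    # Everything else lands in the "boring kernel" folder.
    if [ "$matched" = no ]; then
        echo bkernel
    fi
}
```

Because virtual folders are views rather than exclusive mailboxes, the sketch prints every folder that matches instead of stopping at the first one. It could be called as `classify_mail < message.eml`, where `message.eml` is a placeholder file name.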

Re: [BUG] New Kernel Bugs

2007-11-13 Thread David Miller

From: Andrew Morton [EMAIL PROTECTED]
Date: Tue, 13 Nov 2007 03:15:53 -0800

  NETWORKING===

  RTNLGRP_ND_USEROPT does not report ifindex (IPv6)
  http://bugzilla.kernel.org/show_bug.cgi?id=9349
  Kernel: 2.6.24+

 No response from developers

That's funny, then how come there was a proper patch fix posted
and it's now in my tree ready to go to Linus?

I think you like just saying "No response from developers" over and
over again to make some sort of point about how developers are ignoring
lots of bugs.  That's fine, but at least be accurate about it :-)

Re: [BUG] New Kernel Bugs

2007-11-13 Thread David Miller

From: Andrew Morton [EMAIL PROTECTED]
Date: Tue, 13 Nov 2007 04:12:59 -0800

 On Tue, 13 Nov 2007 03:58:24 -0800 (PST) David Miller [EMAIL PROTECTED] 
 wrote:

  From: Andrew Morton [EMAIL PROTECTED]
  Date: Tue, 13 Nov 2007 03:49:16 -0800

   Do you believe that our response to bug reports is adequate?

  Do you feel that making us feel and look like shit helps?

 That doesn't answer my question.

 See, first we need to work out whether we have a problem.  If we do this,
 then we can then have a think about what to do about it.

 I tried to convince the 2006 KS attendees that we have a problem and I
 resoundingly failed.  People seemed to think that we're doing OK.

 But it appears that data such as this contradicts that belief.

 This is not a minor matter.  If the kernel _is_ slowly deteriorating then
 this won't become readily apparent until it has been happening for a number
 of years.  By that stage there will be so much work to do to get us back to
 an acceptable level that it will take a huge effort.  And it will take a
 long time after that for the kernel to get its reputation back.

 So it is important that we catch deterioration *early* if it is happening.

You tell me what I should spend my time working on, and I promise to
do it OK? :-)

For example, if I have a choice between a TCP crash just about anyone
can hit and some obscure issue only reported with some device nearly
nobody has, which one should I analyze and work on?

That's the problem.  All of us prioritize and it means the chaff
collects at the bottom.  You cannot fix that except by getting more
bug fixers so that the chaff pile has a chance to get smaller.

Luckily if the report being ignored isn't chaff, it will show up again
(and again and again) and this triggers a reprioritization because not
only is the bug no longer chaff, it also now got a lot of information
tagged to it so it's a double worthwhile investment to work on the
problem.

I think a lot of bugs that aren't getting looked at are simply
sitting in some early stage of this process.

Re: [BUG] New Kernel Bugs

2007-11-13 Thread Ingo Molnar


* Andrew Morton [EMAIL PROTECTED] wrote:

   Do you believe that our response to bug reports is adequate?
  
  Do you feel that making us feel and look like shit helps?
 
 That doesn't answer my question.
 
 See, first we need to work out whether we have a problem.  If we do 
 this, then we can then have a think about what to do about it.
 
 I tried to convince the 2006 KS attendees that we have a problem and I 
 resoundingly failed.  People seemed to think that we're doing OK.
 
 But it appears that data such as this contradicts that belief.
 
 This is not a minor matter.  If the kernel _is_ slowly deteriorating 
 then this won't become readily apparent until it has been happening 
 for a number of years.  By that stage there will be so much work to do 
 to get us back to an acceptable level that it will take a huge effort.  
 And it will take a long time after that for the kernel to get its 
 reputation back.
 
 So it is important that we catch deterioration *early* if it is 
 happening.

yes, yes, yes, and i agree with you that there is a problem. I tried to 
make this point at the 2007 KS: not only is degradation in quality not 
apparent for years, slow degradation in quality can give kernel 
developers the exact _opposite_ perception! (Fewer testers means fewer 
bugreports and that results in apparent improved quality and fewer 
reported regressions - while exactly the opposite is happening and 
testers are leaving us without giving us any indication that this is 
happening. We just don't notice.)

I'm not moaning about bugs that slip through - those are unavoidable 
facts of a high flux codebase. I'm moaning about recurring, avoidable 
bugs, i'm moaning about hostility towards testers, i'm moaning about 
hostility towards automated testing, i'm moaning about unnecessary hoops 
a willing (but unskilled) tester has to go through to help us out.

I tried to make the point that the only good approach is to remove our 
current subjective bias from quality metrics and to at least realize 
what a cavalier attitude we still have to QA. The moment we are able to 
_measure_ how bad we are, kernel developers will adapt in a second and 
will improve those metrics. Let's use more debug tools, both static and 
dynamic ones. Let's measure tester base and we need to measure _lost_ 
early adopters and the reasons why they are lost. Regression metrics are 
a very important first step too and i'm very happy about the increasing 
effort that is being spent on this.

This is all QA-101 that _cannot be argued against on a rational basis_, 
it's just that these sorts of things have been largely ignored for 
years, in favor of the all-too-easy "open source means many eyeballs and 
that is our QA" answer, which is a _good_ answer but by far not the most 
intelligent answer! Today "many eyeballs" is simply not good enough and 
nature (and other OS projects) will route us around if we don't change.

We kernel developers have been spoiled by years of abundance in testing 
resources. We squander tons of resources in this area, and we could be 
so much more economical about this without hindering our development model 
in any way. We could be so much better about QA and everyone would 
benefit without having to compromise on the incoming flux of changes - 
it's so much easier to write new features for a high quality kernel.

My current guesstimation is that we are utilizing our current testing 
resources at around 10% efficiency. (i.e. if we did an 'ideal' job we 
could fix 10 times as many bugs with the same size of tester effort!) It 
used to be around 5%. (and i mainly attribute the increase from 5% to 
10% to Andrew and the many other people who do kernel QA - kudos!) 10% 
is still awful and we very much suck.

Paradoxically, the end product is still considerably good quality in 
absolute terms because other pieces of our infrastructure are so good 
and powerful, but QA is still a 'weak link' of our path to the user that 
reduces the quality of the end result. We could _really_ be so much 
better without any compromises that hurt.

(and this is in no way directed at the networking folks - it holds for 
all of us. I have one main complaint about networking: the separate 
netdev list is a bad idea - networking regressions should be discussed 
and fixed on lkml, like most other subsystems are. Any artificial split 
of the lk discussion space is bad.)

Ingo

Re: [BUG] New Kernel Bugs

2007-11-13 Thread Mark Lord


Ingo Molnar wrote:
..
This is all QA-101 that _cannot be argued against on a rational basis_, 
it's just that these sorts of things have been largely ignored for 
years, in favor of the all-too-easy open source means many eyeballs and 
that is our QA answer, which is a _good_ answer but by far not the most 
intelligent answer! Today many eyeballs is simply not good enough and 
nature (and other OS projects) will route us around if we dont change.

..

QA-101 and many eyeballs are not at all in opposition.
The latter is how we find out about bugs on uncommon hardware,
and the former is what we need to track them and overall quality.

A HUGE problem I have with current efforts, is that once someone
reports a bug, the onus seems to be 99% on the *reporter* to find
the exact line of code or commit.  Ghad what a repressive method.

And if the developer who broke the damn thing, or who at least
claims to be supporting that code, cannot reproduce the bug,
they drop it completely.

Contrast that flawed approach with how Linus does things..
he thinks through the symptoms, matches them to the code,
and figures out what the few possibilities might be,
and feeds back some trial balloon patches for the bug reporter to try.

MUCH better. 


Linus also asks for a git bisect, but doesn't insist upon the reporter
learning an entire new (poorly documented) toolset just to report a bug.

Blah!

And remember, *I'm* an old-time Linux kernel developer.. just think about
the people reporting bugs who haven't been around here since 1992..

-ml

Re: [BUG] New Kernel Bugs

2007-11-13 Thread Mark Lord


Mark Lord wrote:

Andrew Morton wrote:
On Mon, 12 Nov 2007 22:42:32 -0800 Natalie Protasevich 
[EMAIL PROTECTED] wrote:

..

..

Suspend to RAM resume hangs on a tickless (NO_HZ) kernel
http://bugzilla.kernel.org/show_bug.cgi?id=9275
Kernel: 2.6.23
This is HP notebook nc6320 T2400 945GM

No response from developers

..

I *still* get very slow resume-from-RAM quite often here
(new in 2.6.22 kernel, wasn't there in early 2.6.23-rc*).

..

Typo.  That should have said:


(new in 2.6.23 kernel, wasn't there in early 2.6.23-rc*).

Re: [BUG] New Kernel Bugs

2007-11-13 Thread Bartlomiej Zolnierkiewicz

On Nov 13, 2007 12:15 PM, Andrew Morton [EMAIL PROTECTED] wrote:
 On Mon, 12 Nov 2007 22:42:32 -0800 Natalie Protasevich [EMAIL PROTECTED] 
 wrote:

  This is the listing of the open bugs that are relatively new, around
  2.6.22 and up. They are vaguely classified by specific area.
  (not a full list, there are more :)

[...]

  IDE/SATA=

[...]

  DVD-RAM umount and disk free bug
  http://bugzilla.kernel.org/show_bug.cgi?id=9265
  Kernel: 2.6.15  (asked to try current kernel)

 No response from developers

Bug was filed under IO/Storage-Other so it is assigned to
[EMAIL PROTECTED].

Could be a FS problem as well but it is best to wait for
confirmation with 2.6.23 before proceeding further...

Re: [BUG] New Kernel Bugs

2007-11-13 Thread Giacomo A. Catenazzi


Mark Lord wrote:

Ingo Molnar wrote:
..
This is all QA-101 that _cannot be argued against on a rational 
basis_, it's just that these sorts of things have been largely ignored 
for years, in favor of the all-too-easy open source means many 
eyeballs and that is our QA answer, which is a _good_ answer but by 
far not the most intelligent answer! Today many eyeballs is simply 
not good enough and nature (and other OS projects) will route us 
around if we dont change.

..

QA-101 and many eyeballs are not at all in opposition.
The latter is how we find out about bugs on uncommon hardware,
and the former is what we need to track them and overall quality.

A HUGE problem I have with current efforts, is that once someone
reports a bug, the onus seems to be 99% on the *reporter* to find
the exact line of code or commit.  Ghad what a repressive method.



As a long time kernel tester, I see some problems with the
newer development model. In the short merge windows,
after too much time, there are too many patches.
So there are problems bisecting bugs and getting the attention
of developers. My impression is that in a week there are
many more messages in lkml and too many bugs to be
handled in these few days.

I have two proposals:

- better patch quality. I would like every commit
to compile. So an automatic commit test and public
blame could increase the quality of first commits.
[bisecting with non-compilable points is not a trivial
task]

- slowing down patch inclusion in the merge windows
(aka: not too many big changes in the first days).
As a tester I would prefer that some big changes be
included in a secondary window (pre or rc release),
in a different period from the big patch rush.

ciao
cate

Re: [BUG] New Kernel Bugs

2007-11-13 Thread Benoit Boissinot

On Nov 13, 2007 3:08 PM, Mark Lord [EMAIL PROTECTED] wrote:

 Ingo Molnar wrote:
 ..
  This is all QA-101 that _cannot be argued against on a rational basis_,
  it's just that these sorts of things have been largely ignored for
  years, in favor of the all-too-easy open source means many eyeballs and
  that is our QA answer, which is a _good_ answer but by far not the most
  intelligent answer! Today many eyeballs is simply not good enough and
  nature (and other OS projects) will route us around if we dont change.
 ..

 QA-101 and many eyeballs are not at all in opposition.
 The latter is how we find out about bugs on uncommon hardware,
 and the former is what we need to track them and overall quality.

 A HUGE problem I have with current efforts, is that once someone
 reports a bug, the onus seems to be 99% on the *reporter* to find
 the exact line of code or commit.  Ghad what a repressive method.


Btw, I used to test every -mm kernel. But since I've switched distros
(gentoo -> ubuntu) and I have less time, I feel it's harder to test -rc
or -mm kernels (I know this isn't a lkml problem but more a distro
problem, but I would love having an ubuntu blessed repo with current
dev kernel for the latest stable ubuntu release).

For debugging, maybe it's time someone does an amazon ec2+s3 service
to automate the bisecting and create .deb/.rpm from git, I don't know
how much it would cost though.

regards,

Benoit

Re: [BUG] New Kernel Bugs

2007-11-13 Thread Ray Lee

On Nov 13, 2007 7:24 AM, Giacomo A. Catenazzi [EMAIL PROTECTED] wrote:
 As a long time kernel tester, I see some problem with the
 newer new development model. In the short merge windows,
 after too much time, there are too many patches.

I think the root issue there is that it's hard to get all testers to
run a bisect, but easy to ask them to test snapshots. Right now the
snapshots are generated nightly, but I think it would make more sense
if they were generated every N patches, for some value of N...

Of course, for that to really work, we have to ensure that the result
is always compilable, which has been getting better, but not perfect.

Ray
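
The "every N patches" idea can be sketched as a small filter over a commit list. This is only an illustration under assumptions: `every_nth` is a hypothetical helper, and feeding it the output of `git rev-list` treats the history as a flat list, which is a simplification for a tree with many merges.

```shell
# Hypothetical helper: print every n-th line of stdin (n is $1), plus the
# final line so that the newest commit always gets a snapshot.
every_nth() {
    awk -v n="$1" '
        { line = $0 }                          # remember the most recent line
        NR % n == 0 { print; printed = NR }    # emit every n-th line
        END { if (NR > 0 && printed != NR) print line }
    '
}

# Usage sketch (not executed here): snapshot points every 200 commits
# between two releases, oldest first.
#   git rev-list --reverse v2.6.20..v2.6.21 | every_nth 200
```

Each commit ID printed this way would be a candidate snapshot to build. The build step itself, and the requirement that every such point compiles, are outside this sketch.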

Re: [BUG] New Kernel Bugs

2007-11-13 Thread Thomas Gleixner

On Tue, 13 Nov 2007, Mark Lord wrote:

 Mark Lord wrote:
  Andrew Morton wrote:
   On Mon, 12 Nov 2007 22:42:32 -0800 Natalie Protasevich
   [EMAIL PROTECTED] wrote:
  ..
 ..
Suspend to RAM resume hangs on a tickless (NO_HZ) kernel
http://bugzilla.kernel.org/show_bug.cgi?id=9275
Kernel: 2.6.23
This is HP notebook nc6320 T2400 945GM
   No response from developers
  ..
  
  I *still* get very slow resume-from-RAM quite often here
  (new in 2.6.22 kernel, wasn't there in early 2.6.23-rc*).
 ..
 
 Typo.  That should have said:
 
  (new in 2.6.23 kernel, wasn't there in early 2.6.23-rc*).

Just asked that :) Is there a chance to bisect that ?

Thanks,

tglx

Re: [BUG] New Kernel Bugs

2007-11-13 Thread Adrian Bunk

On Tue, Nov 13, 2007 at 07:57:54AM -0800, Ray Lee wrote:
 On Nov 13, 2007 7:24 AM, Giacomo A. Catenazzi [EMAIL PROTECTED] wrote:
  As a long time kernel tester, I see some problem with the
  newer new development model. In the short merge windows,
  after too much time, there are too many patches.
 
 I think the root issue there is that it's hard to get all testers to
 run a bisect, but easy to ask them to test snapshots. Right now the
 snapshots are generated nightly, but I think it would make more sense
 if they were generated every N patches, for some value of N...
...

I don't see a point in doing that - that would be a more manual 
bisecting, and the result would not be one guilty commit.

Testers are not expected to be able to hack a kernel, but it's 
reasonable to expect testers to be able to build their own kernels
(and your proposal wouldn't change that).

The small instruction below is enough for everyone who is able to 
build his own kernel to do a git bisect.

 Ray

cu
Adrian


--  snip  --


# install git

# clone Linus' tree:
git clone \
    git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git

# start bisecting:
cd linux-2.6
git bisect start
git bisect bad v2.6.21
git bisect good v2.6.20
cp /path/to/.config .

# start a round
make oldconfig
make
# install kernel, check whether it's good or bad, then:
git bisect [bad|good]
# start next round


After about 10-15 reboots you'll have found the guilty commit
("...  is first bad commit").


More information on git bisecting:
  man git-bisect
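
The "10-15 reboots" figure follows from bisection halving the candidate range on every round, so roughly ceil(log2(n)) rounds are needed for n commits. A minimal sketch of that arithmetic, assuming a hypothetical helper named `bisect_rounds` and treating the history as a flat list (real git histories with merges can differ slightly):

```shell
# Hypothetical helper: estimate how many build/boot rounds a bisection
# needs for n candidate commits, i.e. ceil(log2(n)).
bisect_rounds() {
    n=$1
    rounds=0
    while [ "$n" -gt 1 ]; do
        n=$(( (n + 1) / 2 ))      # each round leaves at most half the range
        rounds=$((rounds + 1))
    done
    echo "$rounds"
}

# Usage sketch (not executed here): count the commits between two tags
# and estimate the number of rounds.
#   bisect_rounds "$(git rev-list v2.6.20..v2.6.21 | wc -l)"
```

By this estimate, 1024 commits take 10 rounds and 32768 commits take 15, which brackets the range quoted in the instructions.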

Re: [BUG] New Kernel Bugs

2007-11-13 Thread Theodore Tso

On Tue, Nov 13, 2007 at 04:52:32PM +0100, Benoit Boissinot wrote:
 Btw, I used to test every -mm kernel. But since I've switched distros
 (gentoo -> ubuntu) and I have less time, I feel it's harder to test -rc
 or -mm kernels (I know this isn't a lkml problem but more a distro
 problem, but I would love having an ubuntu blessed repo with current
 dev kernel for the latest stable ubuntu release).

There are two parts to this.  One is a Ubuntu development kernel which
we can give to large numbers of people to expand our testing pool.
But if we don't do a better job of responding to bug reports that
would be generated by expanded testing this won't necessarily help us.

The other is an automated set of standard pre-built bisection points so
that testers can more easily localize a bug down to a few hundred
commits without needing to learn how to use git bisect (think Ubuntu
users).

So for the first, I've actually been playing with some plans to put
together an unofficial kernel that is basically what Ted is using on his
laptop.  It generally has emergency bug fixes that haven't made it
into mainline, plus some other trees where I've been more aggressive
since I want the latest in wireless and powersaving technology, etc.
It has the property that if it breaks, you get to keep both pieces
--- and I've helpfully included the git ID in the package name so you
can do the bisection yourself.  If you want to try it, the first such
kernel is here:

   http://www.kernel.org/~tytso/tbek

I wasn't planning on talking about it until it was more fully baked,
but if people want something vaguely stable based on 2.6.24-rc2, this
might be interesting.

As for the second, I was just talking to Arjan over pizza and beer
last night, and we reached the same conclusion as Ingo, which is this
really isn't that hard.  It wouldn't be that hard to set up
infrastructure to do this, and it's just a matter of getting the disk
space and the network bandwidth together in the right place, plus a
relatively small amount of programming at least for the simplest
iteration of the idea.  (As is quite common when doing designs over
beer, we talked about some more grandiose web-based schemes to do
custom built kernels that were tied to the kernel bugzilla, but first
things first. :-)

- Ted

Re: [BUG] New Kernel Bugs

2007-11-13 Thread Randy Dunlap

On Tue, 13 Nov 2007 09:33:21 -0600 James Bottomley wrote:

 On Tue, 2007-11-13 at 03:15 -0800, Andrew Morton wrote:
  
  SCSI==
   
   qla2xxx: driver initialization does not complete when booting with
   Port connected
   http://bugzilla.kernel.org/show_bug.cgi?id=9267
   Kernel: 2.6.23.1
  
  No response from developers
 
 Urm, well, if no-one ever tells the SCSI list it's unrealistic to expect
 anyone to be working on it.  As far as I can tell, email was sent to
 Andrew Vasquez only on 31 October. However, the fault looks to be
 generic, so he probably just dropped it.

It seems that new SCSI bugs need to be sent to [EMAIL PROTECTED]

Martin, can you arrange that to happen automatically instead of
Andrew having to do it manually?

---
~Randy

Re: [BUG] New Kernel Bugs

2007-11-13 Thread Larry Finger

Theodore Tso wrote:
 On Tue, Nov 13, 2007 at 04:52:32PM +0100, Benoit Boissinot wrote:
 Btw, I used to test every -mm kernel. But since I've switched distros
 (gentoo -> ubuntu) and I have less time, I feel it's harder to test -rc
 or -mm kernels (I know this isn't a lkml problem but more a distro
 problem, but I would love having an ubuntu blessed repo with current
 dev kernel for the latest stable ubuntu release).
 
 There are two parts to this.  One is a Ubuntu development kernel which
 we can give to large numbers of people to expand our testing pool.
 But if we don't do a better job of responding to bug reports that
 would be generated by expanded testing this won't necessarily help us.

I'm very encouraged to read of your expanded testing efforts. As a bcm43xx
developer, Ubuntu has been our problem distro, mostly because your standard
kernels have debugging turned off for bcm43xx. When a Ubuntu user reports a
problem and we ask for the relevant output from dmesg, they have no
information. I ask two things of all distros: (1) Turn on debugging - we
don't spam the logs that badly, and (2) forward any bugs found by your
testing to the maintainer, and/or the bcm43xx mailing list.

Thanks,

Larry

Re: [BUG] New Kernel Bugs

2007-11-13 Thread Jan Kara

  FILE SYSTEMS===
  
  ext4: delalloc space accounting problem drops data
  http://bugzilla.kernel.org/show_bug.cgi?id=9329
  Kernel: 2.6.24-rc1
 No response from developers
  Actually, there has been a response (Eric asked on the mailing list,
created a bug, and got an answer on the mailing list):
http://marc.info/?l=linux-ext4&m=119454449014728&w=2

  POSIX Access Control Lists cause bogus file system check errors
  http://bugzilla.kernel.org/show_bug.cgi?id=9241
  Kernel: 2.6.23.1
 
 Andreas did some work, seemed to lose interest.
  As I read the bug it seems that the cause was a filesystem with errors
(which were in ACLs and thus the kernel failed to boot only with ACLs
enabled) and fsck fixed the problem... I would close this one as
invalid (OK, I know the filesystem had to be corrupted somehow but
unless this is at least occasionally reproducible, there's a low chance of
finding the bug).

Honza

-- 
Jan Kara [EMAIL PROTECTED]
SuSE CR Labs

Re: [BUG] New Kernel Bugs

2007-11-13 Thread Mark Lord


Ingo Molnar wrote:


for example git-bisect was godsent. I remember that years ago bisection 
of a bug was a very laborious task so that it was only used as a final, 
last-ditch approach for really nasty bugs. Today we can autonomously 
bisect build bugs via a simple shell command around git-bisect run, 
without any human interaction! This freed up testing resources 

..

It's only a godsend for the few people who happen to be kernel developers
and who happen to already use git.

It's a 540MByte download over a slow link for everyone else.

-ml
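
For readers unfamiliar with the "simple shell command around git-bisect run" mentioned in the quoted text, a rough sketch follows. It is an assumption about what such a wrapper might look like, not the script anyone in this thread actually uses. `git bisect run` treats exit status 0 as good, 125 as "skip this commit", any other status from 1 to 127 as bad, and aborts on anything else, so the hypothetical `build_verdict` helper below normalizes every build failure to 1.

```shell
# Hypothetical helper for use with `git bisect run`: run the given build
# command and map its result onto the exit codes git bisect understands.
build_verdict() {
    if "$@" >/dev/null 2>&1; then
        return 0    # build succeeded: this commit is "good"
    fi
    return 1        # any failure (whatever the raw status) counts as "bad"
}

# Usage sketch (not executed here; bisect-build.sh is a placeholder script
# that defines build_verdict and ends with: build_verdict make):
#   git bisect start v2.6.21 v2.6.20      # bad revision first, then good
#   git bisect run ./bisect-build.sh
```

This only automates hunting for build breakage. A bug that needs a reboot to observe still requires the manual good/bad rounds.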

Re: [BUG] New Kernel Bugs

2007-11-13 Thread Mark Lord


Thomas Gleixner wrote:

On Tue, 13 Nov 2007, Mark Lord wrote:


 Andrew Morton wrote:

  On Mon, 12 Nov 2007 22:42:32 -0800 Natalie Protasevich
  [EMAIL PROTECTED] wrote:

 ..

   with CONFIG_NO_HZ and/or CONFIG_HPET_TIMER set kernel 2.6.23 doesn't
   boot (ARM, Timer)
   http://bugzilla.kernel.org/show_bug.cgi?id=9229
   Kernel: 2.6.23
  
  No response from developers

 ..


The bug report is bogus. ARM has no CONFIG_HPET_TIMER. 
 

 Note:  that same bug exists/existed on i386 back when NO_HZ was
 introduced (2.6.21?).  I still see it from time to time on my Quad core
 system (very rare), but not any more on my Duo notebook where it used
 to happen about 1 in n boots (n  10).
 
 AFAICT no fix was ever released for it.


Hmm, at which point does the boot stop ? 

..

Just as it prints out these messages, sometimes one of them,
sometimes both (or all four on the quad core):

kernel: switched to high resolution mode on cpu 1
kernel: switched to high resolution mode on cpu 0

Re: [BUG] New Kernel Bugs

2007-11-13 Thread Russell King

On Tue, Nov 13, 2007 at 03:15:53AM -0800, Andrew Morton wrote:
 On Mon, 12 Nov 2007 22:42:32 -0800 Natalie Protasevich [EMAIL PROTECTED] 
 wrote:
  PLATFORM===
  
  xipImage is built so that uBoot cant run it (ARM)
  http://bugzilla.kernel.org/show_bug.cgi?id=9356
  Kernel: 2.6.21
 
 Zero responses from developers

For christ sake Andrew.  Some of us are not employed to do kernel work
24h x 365days a year.  You might be, I'm not.

First thing, it's not a regression.  Second thing, it's *not* a bug.

uboot requires kernel images to be specially wrapped up in their crappy
formats before uboot will recognise it.  This means that if someone wants
to boot a binary image with uboot, they need to either:

1. work out the correct 'mkimage' command and run that program after
   the kernel build has completed.

2. sort out adding a new target to the kernel makefiles to run this
   uboot specific 'mkimage' command automatically.

And Alexandre (the original feature-missing reporter) has linked to a
message where a patch was proposed to do (2).  So obviously it's no
longer a problem for the reporter.

  with CONFIG_NO_HZ and/or CONFIG_HPET_TIMER set kernel 2.6.23 doesn't
  boot (ARM, Timer)
  http://bugzilla.kernel.org/show_bug.cgi?id=9229
  Kernel: 2.6.23
 
 No response from developers

Bug was assigned to reporter, so I ignored it on the grounds that the
reporter was resolving it.  Plus, until recently I didn't have any
workable PXA systems to test stuff on.

In the end, a similar issue has been resolved anyway after a lot of
discussion on the ARM lists about how PXA should handle one-shot mode
with clockevents. It took absolutely ages to get agreement on what was
a simple patch.

commit 91bc51d8a10b00d8233dd5b6f07d7eb40828b87d
Author: Russell King [EMAIL PROTECTED]
Date:   Thu Nov 8 23:35:46 2007 +

[ARM] pxa: fix one-shot timer mode

One-shot timer mode on PXA has various bugs which prevent kernels
build with NO_HZ enabled booting.  They end up spinning on a
permanently asserted timer interrupt because we don't properly
clear it down - clearing the OIER bit does not stop the pending
interrupt status.  Fix this in the set_mode handler as well.

Moreover, the code which sets the next expiry point may race with
the hardware, and we might not set the match register sufficiently
in the future.  If we encounter that situation, return -ETIME so
the generic time code retries.

Acked-by: Thomas Gleixner [EMAIL PROTECTED]
Acked-by: Nicolas Pitre [EMAIL PROTECTED]
Signed-off-by: Russell King [EMAIL PROTECTED]

Ergo, the bug can be closed provided the reporter re-tests a recent git
snapshot.  Sorry, no idea how the above commit relates to Linus' releases
and/or git snapshots.

-- 
Russell King
 Linux kernel2.6 ARM Linux   - http://www.arm.linux.org.uk/
 maintainer of:

Re: [BUG] New Kernel Bugs

2007-11-13 Thread Adrian Bunk

On Tue, Nov 13, 2007 at 12:50:08PM -0500, Mark Lord wrote:
 Ingo Molnar wrote:

 for example git-bisect was godsent. I remember that years ago bisection of 
 a bug was a very laborious task so that it was only used as a final, 
 last-ditch approach for really nasty bugs. Today we can autonomously bisect 
 build bugs via a simple shell command around git-bisect run, without any 
 human interaction! This freed up testing resources 
 ..

 It's only a godsend for the few people who happen to be kernel developers

It's also a godsend for users who want a regression they observe fixed.

If you can tell which patch broke it you often turned a very hard to 
debug problem into a relatively easy fixable problem.

As an example, [1] was an issue a normal user could discover, and 
bisecting made the difference between nearly undebuggable and
easily fixable by reverting a commit.

 and who happen to already use git.

As already said in this thread, the required instructions for bisecting 
are relatively short and simple (assuming the user can build his own 
kernels).

 It's a 540MByte download over a slow link for everyone else.

Not everyone has a slow connection.

For me, the speed of cloning a tree from git.kernel.org is completely 
cpu bound and limited by the speed of the 1.8 GHz Athlon in my 
computer...

But if there is a real life problem like people with extremely slow and 
expensive internet connections not being able to bisect bugs these 
problems should be named and fixed (e.g. by sending CDs).

 -ml

cu
Adrian

[1] http://lkml.org/lkml/2007/11/12/154

-- 

   Is there not promise of rain? Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
   Only a promise, Lao Er said.
   Pearl S. Buck - Dragon Seed

Re: [BUG] New Kernel Bugs

2007-11-13 Thread Russell King

On Tue, Nov 13, 2007 at 05:07:21PM +0100, Thomas Gleixner wrote:
 On Tue, 13 Nov 2007, Mark Lord wrote:
 
  Andrew Morton wrote:
   On Mon, 12 Nov 2007 22:42:32 -0800 Natalie Protasevich
   [EMAIL PROTECTED] wrote:
  ..
with CONFIG_NO_HZ and/or CONFIG_HPET_TIMER set kernel 2.6.23 doesn't
boot (ARM, Timer)
http://bugzilla.kernel.org/show_bug.cgi?id=9229
Kernel: 2.6.23
   
   No response from developers
  ..
 
 The bug report is bogus. ARM has no CONFIG_HPET_TIMER. 

Plus we've just merged a fix for NO_HZ on PXA platforms due to an utterly
broken one-shot implementation.  So chances are this problem is now fixed.

However, I object strongly to Andrew's responses to these bugs.  He's
completely out of line.

Given the wide range of ARM platforms today, it is utterly idiotic to
expect a single person to be able to provide responses for all ARM bugs.
I for one wish I'd never *VOLUNTEERED* to be a part of the kernel
bugzilla, and really *WISH* I could pull out of that function.

-- 
Russell King
 Linux kernel2.6 ARM Linux   - http://www.arm.linux.org.uk/
 maintainer of:

Re: [BUG] New Kernel Bugs

2007-11-13 Thread Mark Lord


Adrian Bunk wrote:

On Tue, Nov 13, 2007 at 12:50:08PM -0500, Mark Lord wrote:

Ingo Molnar wrote:
for example git-bisect was godsent. I remember that years ago bisection of 
a bug was a very laborious task so that it was only used as a final, 
last-ditch approach for really nasty bugs. Today we can autonomously bisect 
build bugs via a simple shell command around git-bisect run, without any 
human interaction! This freed up testing resources 

..

It's only a godsend for the few people who happen to be kernel developers


It's also a godsend for users who want a regression they observe fixed.

If you can tell which patch broke it you often turned a very hard to 
debug problem into a relatively easy fixable problem.

..

Oh yes, definitely.  When that user happens to be a kernel dev + git user,
it saves the *fool who broke it* a hell of a lot of time, because they can
slough it off onto the poor bloke who notices it.

Mind you, no arguing that this is effective when that poor bloke
has a day free to download the git-tree and build/reboot a dozen times.

Re: [BUG] New Kernel Bugs

2007-11-13 Thread Adrian Bunk

On Tue, Nov 13, 2007 at 01:18:43PM -0500, Mark Lord wrote:
 Adrian Bunk wrote:
 On Tue, Nov 13, 2007 at 12:50:08PM -0500, Mark Lord wrote:
 Ingo Molnar wrote:
 for example git-bisect was godsent. I remember that years ago bisection 
 of a bug was a very laborious task so that it was only used as a final, 
 last-ditch approach for really nasty bugs. Today we can autonomously 
 bisect build bugs via a simple shell command around git-bisect run, 
 without any human interaction! This freed up testing resources 
 ..

 It's only a godsend for the few people who happen to be kernel developers

 It's also a godsend for users who want a regression they observe fixed.

 If you can tell which patch broke it you often turned a very hard to debug 
 problem into a relatively easy fixable problem.
 ..

 Oh yes, definitely.  When that user happens to be a kernel dev + git user,
 it saves the *fool who broke it* a hell of a lot of time, because they can
 slough it off onto the poor bloke who notices it.

"fool who broke it" are hard words. Bugs are part of software 
development, so you'd have to name everyone who develops software
a fool.

But the main point is that often you don't know who broke it until you 
know which commit broke it.

 Mind you, no arguing that this is effective when that poor bloke
 has a day free to download the git-tree and build/reboot a dozen times.

I did bisecting myself, and I know that it costs time and work.

But the first point is the above one that it makes otherwise nearly 
undebuggable problems debuggable and fixable.

Another point is that it shifts the work from the few experienced 
developers to the many users. Users (and voluntary testers) we have
many, but developer time for debugging bug reports is a quite scarce 
resource.

And why poor bloke? Bisecting takes time, but that's not different 
from e.g. writing code or cleaning up code or going through bug reports.
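
For concreteness, the "simple shell command around git-bisect run" quoted
at the top of this mail can be sketched as below. This is only an
illustration in a throwaway repository, not anyone's actual test setup:
the file name prog.sh, the six toy commits, and the use of "sh -n" as a
stand-in for a kernel build are all assumptions. git bisect run treats
exit status 0 as good and 1-127 (except 125) as bad, so any build
command can drive it.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Bisect Demo"
git config user.email "demo@example.invalid"

# Six commits to prog.sh; commit 4 introduces a syntax error
# (an unterminated "if") that is never fixed afterwards.
for n in 1 2 3 4 5 6; do
    if [ "$n" -ge 4 ]; then
        printf 'echo "rev %s"\nif true; then\n' "$n" > prog.sh
    else
        printf 'echo "rev %s"\n' "$n" > prog.sh
    fi
    git add prog.sh
    git commit -q -m "rev $n"
done

# Bisect between the root commit (known good) and HEAD (known bad).
good=$(git rev-list --max-parents=0 HEAD)
git bisect start HEAD "$good" >/dev/null

# "sh -n" only parses the script; a syntax error gives a non-zero exit,
# which git bisect run interprets as "bad".
git bisect run sh -n prog.sh >/dev/null 2>&1

# refs/bisect/bad now points at the first bad commit.
first_bad=$(git log -1 --format=%s refs/bisect/bad)
git bisect reset >/dev/null 2>&1
echo "first bad: $first_bad"
```

On a kernel tree the same pattern would be roughly "git bisect start
<bad> <good>" followed by "git bisect run make", assuming a suitable
.config is already in place.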

cu
Adrian

-- 

   Is there not promise of rain? Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
   Only a promise, Lao Er said.
   Pearl S. Buck - Dragon Seed

Re: [BUG] New Kernel Bugs

2007-11-13 Thread Matthew Wilcox

On Tue, Nov 13, 2007 at 12:50:08PM -0500, Mark Lord wrote:
 It's a 540MByte download over a slow link for everyone else.

Where do you get this number from?
$ du -sh .git/objects/pack/
249M    .git/objects/pack/
$ du -sh .git/objects/
253M    .git/objects/

ie about half what you claim.

-- 
Intel are signing my paycheques ... these opinions are still mine
Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours.  We can't possibly take such
a retrograde step.

Re: [BUG] New Kernel Bugs

2007-11-13 Thread Mark Lord


Matthew Wilcox wrote:

On Tue, Nov 13, 2007 at 12:50:08PM -0500, Mark Lord wrote:

It's a 540MByte download over a slow link for everyone else.


Where do you get this number from?
$ du -sh .git/objects/pack/
249M    .git/objects/pack/
$ du -sh .git/objects/
253M    .git/objects/

ie about half what you claim.

..

No, it's from earlier in this very thread:

Adrian Bunk wrote:
The small instruction below is enough for everyone who is able to 
build his own kernel to do a git bisect.

..

--  snip  --


# install git

# clone Linus' tree:
git clone \
  git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git

..

mkdir t
cd t
git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git
(wait half an hour)
/usr/bin/du -s linux-2.6
522732  linux-2.6

Re: [BUG] New Kernel Bugs

2007-11-13 Thread Matthew Wilcox

On Tue, Nov 13, 2007 at 01:43:53PM -0500, Mark Lord wrote:
 Matthew Wilcox wrote:
 ie about half what you claim.
 ..
 
 No, it's from earlier in this very thread:
 
 Adrian Bunk wrote:
 git clone \
   git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git
 ..
 
 mkdir t
 cd t
 git clone 
 git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git
 (wait half an hour)
 /usr/bin/du -s linux-2.6
 522732  linux-2.6

You're assuming that everything in linux-2.6 was downloaded; that's
not true.  Everything in linux-2.6/.git was downloaded; but then you do a
checkout which happens to approximately double the size of the linux-2.6
directory.  If you do git-clone -n, you'll get a closer estimate to the
size of the download.

I suppose git-clone should grow a -v option that it could pass to rsync
to let us find out how many bytes are actually transferred, but I'm
happy to go with 250MB as a close estimate to the amount of data to xfer.

When you compare it to the 60MB tarballs that are published, it's really
not that bad.
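
The arithmetic behind the two figures in this thread, as a rough sketch.
It assumes that "du -s" reported 1 KiB blocks (the GNU du default) and
simply reuses the numbers quoted above; nothing is measured here.

```shell
# Figures quoted in this thread.
total_kib=522732    # "du -s linux-2.6" after a full clone plus checkout
objects_mib=253     # "du -sh .git/objects/"

# Convert KiB to MiB (integer division) and see what is left for the checkout.
total_mib=$((total_kib / 1024))
rest_mib=$((total_mib - objects_mib))
echo "total: ${total_mib} MiB, objects: ${objects_mib} MiB, checkout and rest: ${rest_mib} MiB"

# To look at the download alone, skip the checkout (needs network, not run here):
#   git clone -n git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git
#   du -sh linux-2.6/.git
```

So roughly half of the directory is the downloaded object store and the
other half is the checked-out working tree.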

-- 
Intel are signing my paycheques ... these opinions are still mine
Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours.  We can't possibly take such
a retrograde step.

Re: [BUG] New Kernel Bugs

2007-11-13 Thread Theodore Tso

On Tue, Nov 13, 2007 at 11:33:44AM -0600, Larry Finger wrote:
 I'm very encouraged to read of your expanded testing efforts. As a
 bcm43xx developer, Ubuntu has been our problem distro, mostly
 because your standard kernels have debugging turned off for bcm43xx.
 When an Ubuntu user reports a problem and we ask for the relevant
 output from dmesg, they have no information. I ask two things of all
 distros: (1) Turn on debugging - we don't spam the logs that badly,
 and (2) forward any bugs found by your testing to the maintainer,
 and/or the bcm43xx mailing list.

Heh. I hadn't enabled CONFIG_BCM43XX_DEBUG myself, but I just changed
it for my next kernel build.  This is a slightly different issue,
which is that sometimes _DEBUG options shouldn't be turned on by
default (because they really trash performance and bloat log size),
and sometimes they are painless to turn on and don't cost much.

If that is the case, I'd suggest removing the option and just making
it compiled in by default with a run-time option to enable it.

  - Ted

Re: [BUG] New Kernel Bugs

2007-11-13 Thread Gabriel C

Adrian Bunk wrote:
 On Tue, Nov 13, 2007 at 12:13:56PM -0500, Theodore Tso wrote:
 On Tue, Nov 13, 2007 at 04:52:32PM +0100, Benoit Boissinot wrote:
 Btw, I used to test every -mm kernel. But since I've switched distros
 (gentoo-ubuntu) and I have less time, I feel it's harder to test -rc or
 -mm kernels (I know this isn't a lkml problem but more a distro problem,
 but I would love having an ubuntu blessed repo with current dev kernel
 for the latest stable ubuntu release).
 There are two parts to this.  One is an Ubuntu development kernel which
 we can give to large numbers of people to expand our testing pool.
 But if we don't do a better job of responding to bug reports that
 would be generated by expanded testing this won't necessarily help us.
 ...
 
 The main problem is finding experienced developers who spend time on 
 looking into bug reports.

There already are. IMO the problem is the development model.

There are tons of new features in each new kernel release, and 'tons of new
bugs' which are fixed neither during the release cycle nor in the .XX stable
kernels.

Maybe after XX kernel releases there should be one with only bug fixes,
_without_ any new features, e.g. clearing bugs from bugzilla, known
regressions, cleaning up code, removing broken drivers and the like.


 cu
 Adrian

Gabriel

Re: [BUG] New Kernel Bugs

2007-11-13 Thread Andrew Morton

On Tue, 13 Nov 2007 04:32:07 -0800 (PST) David Miller [EMAIL PROTECTED] wrote:

 From: Andrew Morton [EMAIL PROTECTED]
 Date: Tue, 13 Nov 2007 04:12:59 -0800

  On Tue, 13 Nov 2007 03:58:24 -0800 (PST) David Miller [EMAIL PROTECTED] 
  wrote:

   From: Andrew Morton [EMAIL PROTECTED]
   Date: Tue, 13 Nov 2007 03:49:16 -0800

Do you believe that our response to bug reports is adequate?

   Do you feel that making us feel and look like shit helps?

  That doesn't answer my question.

  See, first we need to work out whether we have a problem.  If we do this,
  then we can then have a think about what to do about it.

  I tried to convince the 2006 KS attendees that we have a problem and I
  resoundingly failed.  People seemed to think that we're doing OK.

  But it appears that data such as this contradicts that belief.

  This is not a minor matter.  If the kernel _is_ slowly deteriorating then
  this won't become readily apparent until it has been happening for a number
  of years.  By that stage there will be so much work to do to get us back to
  an acceptable level that it will take a huge effort.  And it will take a
  long time after that for the kernel to get its reputation back.

  So it is important that we catch deterioration *early* if it is happening.

 You tell me what I should spend my time working on, and I promise to
 do it OK? :-)

My suggestion: regressions.

If we're really active in chasing down the regressions then I think we can
be confident that the kernel isn't deteriorating.  Probably it will be
improving as we also fix some always-been-there bugs.

I think that we're fairly good about working the regressions in
Adrian/Michal/Rafael's lists but once Linus releases 2.6.x we tend to let
the unsolved ones slide, and we don't pay as much attention to the
regressions which 2.6.x testers report.

 For example, if I have a choice between a TCP crash just about anyone
 can hit and some obscure issue only reported with some device nearly
 nobody has, which one should I analyze and work on?

 That's the problem.  All of us prioritize and it means the chaff
 collects at the bottom.  You cannot fix that except by getting more
 bug fixers so that the chaff pile has a chance to get smaller.

 Luckily if the report being ignored isn't chaff, it will show up again
 (and again and again) and this triggers a reprioritization because not
 only is the bug no longer chaff, it also now got a lot of information
 tagged to it so it's a double worthwhile investment to work on the
 problem.

 I think a lot of bugs that aren't getting looked at are simply
 sitting in some early stage of this process.

Yes, that's a useful technique.  If multiple people are being hurt a lot by
a bug then that's a more important one to fix than the single-person
minor-irritant bug.

otoh that doesn't work very well with driver/platform bugs.  Often these
are regressions which only a single person can reproduce within the time
window which we have in which we can fix it.  If we don't fix it in that
window it'll go out to distros and presumably some more people will hit it.

So I don't see much alternative here to the traditional
work-with-the-originator way of resolving it.

git bisection should really help us with these regressions but it doesn't
appear that people are using it as much as one would like.  I'm hoping that
the very good http://www.kernel.org/doc/local/git-quick.html will help us
out here.  Thanks to the mystery person who prepared that.

Re: [BUG] New Kernel Bugs

2007-11-13 Thread Adrian Bunk

On Tue, Nov 13, 2007 at 01:47:10PM -0500, Mark Lord wrote:
 Adrian Bunk wrote:
 ...
 I did bisecting myself, and I know that it costs time and work.

 But the first point is the above one that it makes otherwise nearly 
 undebuggable problems debuggable and fixable.
 ..

 Definitely useful, no question.

 But the problem is now that kernel devs are addicted to it,
 many won't even consider resolving a problem any other way.

 That's not maintaining (or supporting) one's code.

What you replaced with two dots contained the answer to this:

Another point is that it shifts the work from the few experienced 
developers to the many users. Users (and voluntary testers) we have
many, but developer time for debugging bug reports is a quite scarce 
resource.

 And when a maintainer is too busy to find/fix their own bugs,
 that could be a sign that they've bitten off too big of a chunk
 of the kernel, and it's time for them to distribute code maintainership.

The problem is: Maintainers don't grow on trees.

You need people who are both technically capable and willing to spend 
time on the non-sexy task of debugging problems.

Where do you plan to find them?

If you don't believe me, please find a maintainer for the currently 
unmaintained parallel port support.

Or if you want a harder task, find a maintainer for the floppy driver...

 Cheers

cu
Adrian

-- 

   Is there not promise of rain? Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
   Only a promise, Lao Er said.
   Pearl S. Buck - Dragon Seed

Re: [BUG] New Kernel Bugs

2007-11-13 Thread Mark Lord


Adrian Bunk wrote:

On Tue, Nov 13, 2007 at 01:47:10PM -0500, Mark Lord wrote:

Adrian Bunk wrote:
...

I did bisecting myself, and I know that it costs time and work.

But the first point is the above one that it makes otherwise nearly 
undebuggable problems debuggable and fixable.

..

Definitely useful, no question.

But the problem is now that kernel devs are addicted to it,
many won't even consider resolving a problem any other way.

That's not maintaining (or supporting) one's code.


What you replaced with two dots contained the answer to this:

Another point is that it shifts the work from the few experienced 
developers to the many users. Users (and voluntary testers) we have
many, but developer time for debugging bug reports is a quite scarce 
resource.



And when a maintainer is too busy to find/fix their own bugs,
that could be a sign that they've bitten off too big of a chunk
of the kernel, and it's time for them to distribute code maintainership.


The problem is: Maintainers don't grow on trees.

You need people who are both technically capable and willing to spend 
time on the non-sexy task of debugging problems.


Where do you plan to find them?

If you don't believe me, please find a maintainer for the currently 
unmaintained parallel port support.


Or if you want a harder task, find a maintainer for the floppy driver...

..

Again, the problem is:


But the problem is now that kernel devs are addicted to it,
many won't even consider resolving a problem any other way.


And that's simply not good enough.

Re: [BUG] New Kernel Bugs

2007-11-13 Thread Russell King

On Tue, Nov 13, 2007 at 09:08:32AM -0500, Mark Lord wrote:
 Ingo Molnar wrote:
 ..
  This is all QA-101 that _cannot be argued against on a rational basis_,
  it's just that these sorts of things have been largely ignored for
  years, in favor of the all-too-easy "open source means many eyeballs and
  that is our QA answer", which is a _good_ answer but by far not the most
  intelligent answer! Today "many eyeballs" is simply not good enough and
  nature (and other OS projects) will route us around if we don't change.
 ..
 
 QA-101 and many eyeballs are not at all in opposition.
 The latter is how we find out about bugs on uncommon hardware,
 and the former is what we need to track them and overall quality.
 
 A HUGE problem I have with current efforts, is that once someone
 reports a bug, the onus seems to be 99% on the *reporter* to find
 the exact line of code or commit.  Ghad what a repressive method.

99% on the reporter?  Is that why I always try to understand the
reporter's problem (*provided* it's in an area I know about) and come
up with a patch to test a theory or fix the issue?

I'm _less_ inclined to provide such a service for lazy maintainers
who've moved off into new and wonderfully exciting technologies, to
churn out more patches for me to merge (and eventually provide a "free
to them" bug fixing service for.)

That's "less inclined", not "won't".

-- 
Russell King
 Linux kernel    2.6 ARM Linux   - http://www.arm.linux.org.uk/
 maintainer of:

Re: [BUG] New Kernel Bugs

2007-11-13 Thread Adrian Bunk

On Tue, Nov 13, 2007 at 02:26:05PM -0500, Mark Lord wrote:
 Adrian Bunk wrote:
 On Tue, Nov 13, 2007 at 01:47:10PM -0500, Mark Lord wrote:
 Adrian Bunk wrote:
 ..
 Another point is that it shifts the work from the few experienced 
 developers to the many users. Users (and voluntary testers) we have
 many, but developer time for debugging bug reports is a quite scarce 
 resource.

 And when a maintainer is too busy to find/fix their own bugs,
 that could be a sign that they've bitten off too big of a chunk
 of the kernel, and it's time for them to distribute code maintainership.

 The problem is: Maintainers don't grow on trees.
 ..

 Hey, if somebody has time to break things, then they damn well ought
 to be able to make time to fix them again.  And the best developers
 here on LKML do just that (fix what they break).

 You broke it, you fix it.  A simple rule.

 Translation for the particularly daft:

 If you've been making significant updates to a driver/subsystem,
 and people are reporting that it is now broken for them,

What are significant updates?

Sometimes one person makes one small patch and this patch contains
a typo.

 then it's your job to make it right.

We have some open drivers/ata/ regressions.

I see some person named Mark Lord being responsible for 4 commits.

What punishment do you plan for him if 2.6.24 ships with any libata
regressions?

Let George W. Bush wrongly accuse him of possessing weapons of
mass destruction and invade Canada?

 The reporters can help,
 and many may even git-bisect or send patches.  
 But you cannot *expect* or *insist* upon them doing your job.

Bullshit.

Bug fixing is not about finding someone to blame, it's about getting the 
bug fixed.

The bug reporter is the person who can reproduce the problem, and if 
it's a regression then bisecting is the natural way of getting nearer 
at getting it fixed.

cu
Adrian

-- 

   Is there not promise of rain? Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
   Only a promise, Lao Er said.
   Pearl S. Buck - Dragon Seed

Re: [BUG] New Kernel Bugs

2007-11-13 Thread Andrew Morton

On Tue, 13 Nov 2007 19:32:19 + Russell King [EMAIL PROTECTED] wrote:

 There's another issue I want to raise concerning bugzilla.  We have the
 classic case of not enough people reading bugzilla bugs - which is one
 of the biggest problems with bugzilla.  Virtually no one in the ARM
 community looks for ARM bugs in bugzilla.

Nor should they.

 Let's not forget that it would be a waste of time for people to manually
 check bugzilla for ARM bugs.  There are so few people reporting ARM bugs
 into bugzilla that a weekly manual check by every maintainer would just
 return the same old boring results for months and months at a time.

I screen all bugzilla reports.  100% of them.

- I'll try to establish whether it is a regression

- I'll solicit any extra information which I believe the developer will need

- I'll ensure that an appropriate developer has seen the report

And yes, the number of arm-specific reports in there is very small.

 It would be far more productive if the ARM category was deleted from
 bugzilla and the few people who use bugzilla reported their bugs on the
 mailing list.  We've a couple of thousand people on the ARM kernel
 mailing list at the moment - that's 3 orders of magnitude more of eyes
 than look at bugzilla.

Is that [EMAIL PROTECTED]

If so, MAINTAINERS claims that it is subscribers-only.  That would cause
some bug reporters to give up and go away.

Re: [BUG] New Kernel Bugs

2007-11-13 Thread Adrian Bunk

On Tue, Nov 13, 2007 at 03:13:46PM -0500, Mark Lord wrote:
 Adrian Bunk wrote:
 On Tue, Nov 13, 2007 at 02:26:05PM -0500, Mark Lord wrote:
 ..
 If you've been making significant updates to a driver/subsystem,
 and people are reporting that it is now broken for them,

 What are significant updates?

 Sometimes one person makes one small patch and this patch contains
 a typo.
 ..

 Then that person should double check their changes against
 the problems reported, and re-convince themselves that the
 breakage wasn't from those.  Simple. 

Simple?

Everything you have in mind with "should double check their changes" is
simply not realistic with dozens of known unfixed regressions within 
more than half a million changed or new lines of code written by more 
than 800 people - all numbers only counted since 2.6.23.

...
 The reporters can help,
 and many may even git-bisect or send patches.  But you cannot *expect* or 
 *insist* upon them doing your job.

 Bullshit.

 Bug fixing is not about finding someone to blame, it's about getting the 
 bug fixed.
 ..

 It's not about blame, it's about paying attention to breakages in code that a
 person claims to be supporting, and then doing their best to resolve the 
 issues.

Maintainers are just humans with limited time. 

You were the one who suggested to distribute code maintainership, 
so you should explain how to find the additional maintainers.

 Again, if one has the time to actively write/modify code such that something 
 breaks,
 then that person should also make time to fix the breakages.

code writer != subsystem maintainer

And git-bisect is the tool that tells you who broke it.

 The bug reporter is the person who can reproduce the problem, and if it's 
 a regression then bisecting is the natural way of getting nearer at 
 getting it fixed.
 ..
 For the third time, no disagreement here.  git-bisect can help in many cases,
 but not in all cases.  And it requires a great time commitment from somebody
 whose system used to work and now doesn't work.  The person who broke it has
 a fair bit of responsibility there, too.

git-bisect can help only for regressions, and it can help for most 
regressions.

And you shouldn't try to make a problem out of something that isn't a 
problem:

Bug submitters are either volunteers who test -rc or even -git or -mm 
kernels for finding bugs or people who want a problem they experience 
fixed.

In both cases the submitters are usually willing to invest some time for 
helping to get the bug fixed.

 cheers

cu
Adrian

--

   Is there not promise of rain? Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
   Only a promise, Lao Er said.
   Pearl S. Buck - Dragon Seed

Re: [BUG] New Kernel Bugs

2007-11-13 Thread Romano Giannetti


I jump in this discussion hoping to have some more insight on git and to
report my experience as a tester. I consider myself as half-literate in
this (I am here since 1991, more or less, and I am able to compile a
kernel and even hand-apply a patch, although I am in no way a kernel
programmer). 

On Tue, 2007-11-13 at 18:01 +0100, Adrian Bunk wrote:

 The small instruction below is enough for everyone who is able to 
 build his own kernel to do a git bisect.

 # start bisecting:
 cd linux-2.6
 git bisect start
 git bisect bad v2.6.21
 git bisect good v2.6.20
 cp /path/to/.config .
 

This was what I did in my (in the end almost successful) bisecting when
trying to find the mmc problem (see the thread named "2.6.24-rc1 eat my
SD card"). This is true in theory, but it has some problems. The "this
commit does not compile" case is the easiest, and man git-bisect
explains how to solve it. The changes in .config options, added or
removed, are another problem when jumping back and forth between versions
(I was bitten by the gazillion new options added to the hda-intel alsa
driver, but well, that is solvable with a bit of attention).

The main problem I had, and the one that stopped me from arriving at a
definite answer, is this situation:

j version-bad
i
h
g unrelated (but similar) bug corrected
f
e
d unrelated (but similar) bug introduced
c
b
a version-good 

(d was the series to change drivers to use sg helpers, and g was a "fix
fallout from sg helpers patch"). Now I have a series of kernels (d, e,
f) that did not work at all and so I cannot mark them good or bad. With
the number of patches added in the free-for-all week, this is a very
probable scenario. Is there a way out of this using bisect?

Romano 

PS: as a suggestion, I think that adding a "Reported-by", "Tested-by",
or "Debugged-by" attribution in the repository, as happened in the
MMC case, is a nice and welcome reward for the effort.

-- 
Sorry for the disclaimer --- ¡I cannot stop it!



--
This communication contains confidential information. It is for the exclusive 
use of the intended addressee. If you are not the intended addressee, please 
note that any form of distribution, copying or use of this communication or the 
information in it is strictly prohibited by law. If you have received this 
communication in error, please immediately notify the sender by reply e-mail 
and destroy this message. Thank you for your cooperation.

Re: [BUG] New Kernel Bugs

2007-11-13 Thread Jörn Engel

On Tue, 13 November 2007 15:18:07 -0500, Mark Lord wrote:
 
 I just find it weird that something can be known broken for several -rc*
 kernels before I happen to install it, discover it's broken on my own 
 machine,
 and then I track it down, fix it, and submit the patch, generally all 
 within a
 couple of hours.  Where the heck was the dude(ess) that broke it ??  AWOL.
 
 And when I receive hostility from the maintainers of said code for fixing
 their bugs, well.. that really motivates me to continue reporting new ones..

Given a decent bug report, I agree that having the bug not looked at is
shameful.  But what can a developer do if a bug report effectively reads
"there is some bug somewhere in recent kernels"?  How can I know that in
this particular case it is my bug that I introduced?  It could just as
easily be 50 other people and none of them are eager to debug it unless
they suspect it to be their bug.

This is a common problem and fairly unrelated to linux in general or the
kernel in particular.  Who is going to be the sucker that figures out
which developer the bug belongs to?  And I have yet to find a project,
commercial or opensource, where volunteers flock to become such a
sucker.

One option is to push this role to the bug reporter.  Another is to
strong-arm some developers into this role, by whatever means.  A third
would be for $LARGE_COMPANY to hire some people.  If you have a better
idea or would volunteer your time, I'd be grateful.  Simply blaming one
side, whether bug reporter or a random developer, for not being the
sucker doesn't help anyone.

Jörn

-- 
Joern's library part 2:
http://www.art.net/~hopkins/Don/unix-haters/tirix/embarrassing-memo.html

Re: [BUG] New Kernel Bugs

2007-11-13 Thread Rafael J. Wysocki

On Tuesday, 13 of November 2007, Mark Lord wrote:
 Matthew Wilcox wrote:
  On Tue, Nov 13, 2007 at 01:43:53PM -0500, Mark Lord wrote:
 
  mkdir t
  cd t
  git clone 
  git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git
  (wait half an hour)
  /usr/bin/du -s linux-2.6
  522732  linux-2.6
  
  You're assuming that everything in linux-2.6 was downloaded; that's
  not true.  Everything in linux-2.6/.git was downloaded; but then you do a
  checkout which happens to approximately double the size of the linux-2.6
  directory. 
 ..
 
 Ah, I wondered why it took only half an hour to download.
 
 ..
  When you compare it to the 60MB tarballs that are published, it's really
  not that bad.
 ..
 
 The tarballs I download are only 45MB.

You clone the git repo once.  Afterwards, you only update it, and that usually
takes little time and little effort.
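
A small sketch of that clone-once, update-afterwards pattern. To stay
self-contained it uses a local stand-in for the upstream tree; the
directory names "upstream" and "mine" and the toy commits are
assumptions for the demonstration, not anything from kernel.org.

```shell
set -e
work=$(mktemp -d)

# Stand-in for the upstream tree (hypothetical; a real setup would clone
# git://git.kernel.org/... once instead).
git init -q "$work/upstream"
cd "$work/upstream"
git config user.name "Demo"
git config user.email "demo@example.invalid"
echo one > file
git add file
git commit -q -m "first"

# One-time cost: the full clone.
git clone -q "$work/upstream" "$work/mine"

# Upstream moves on by one commit...
echo two > file
git commit -q -a -m "second"

# ...and the existing clone only has to fetch what is new.
cd "$work/mine"
git pull -q --ff-only
commits=$(git rev-list --count HEAD)
echo "commits after update: $commits"
```

The pull transfers only the objects that the clone does not have yet,
which is why later updates are much cheaper than the initial download.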

Greetings,
Rafael

Re: [BUG] New Kernel Bugs

2007-11-13 Thread Frans Pop

Romano Giannetti wrote:
 This was what I did in my (in the end almost successful) bisecting when
 trying to find the mmc problem (see the thread named "2.6.24-rc1 eat my
 SD card"). This is true in theory, but it has some problems. The "this
 commit does not compile" case is the easiest, and man git-bisect
 explains how to solve it. The changes in .config options, added or
 removed, are another problem when jumping back and forth between versions.
 
 The main problem I had, and the one that stopped me from arriving at a
 definite answer, is this situation:
[...] 
 (d was the series to change drivers to use sg helpers, and g was a "fix
 fallout from sg helpers patch"). Now I have a series of kernels (d, e,
 f) that did not work at all and so I cannot mark them good or bad. With
 the number of patches added in the free-for-all week, this is a very
 probable scenario. Is there a way out of this using bisect?

I think there are three strategies you can use in this case:
- create a kernel config that is as simple as possible, but still supports
  your hardware and reproduces your problem; a simpler config will often
  avoid compilation issues in parts of the kernel that you're not using
  anyway and has the benefit of speeding up the compiles too

- if you know/suspect in what part of the tree the bug is, first limit the
  bisection to that; you will have to verify that you did indeed find the
  correct (broken) change by doing a compile for the last good commit + 1

- if you find a broken commit, use 'git-reset --hard' to try to jump past
  the bad set of commits, but of course that does not help in the case:
g version-bad
f unrelated bug corrected
e
d the broken commit that caused your problem
c
b unrelated bug that breaks compilation or system introduced
a version-good
  in that case the best you can reasonably be expected to do is report that
  you narrowed it down to between a and g and leave the rest to the
  developers
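
With current git versions there is one more tool for the situation
Romano describes: "git bisect skip" marks a commit as untestable, and a
script driven by git bisect run can signal the same thing with exit
status 125. A minimal sketch in a throwaway repository follows. The
commit letters mirror Romano's diagram, but the file name "state", its
contents, and the placement of the real regression at h are invented
for the demonstration.

```shell
set -e
demo=$(mktemp -d)
cd "$demo"
git init -q .
git config user.name "Bisect Demo"
git config user.email "demo@example.invalid"

# Commits a..j: a-c and g work, d-f cannot be tested at all,
# and the regression being hunted first appears in h.
for c in a b c d e f g h i j; do
    case $c in
        a|b|c|g) state=ok ;;
        d|e|f)   state=untestable ;;
        *)       state=bug ;;
    esac
    echo "$c $state" > state
    git add state
    git commit -q -m "commit $c"
done

good=$(git rev-list --max-parents=0 HEAD)
git bisect start HEAD "$good" >/dev/null

# Exit status seen by git bisect run: 125 = skip, 1 = bad, 0 = good.
git bisect run sh -c '
    grep -q untestable state && exit 125   # cannot test this one: skip
    grep -q bug state && exit 1            # regression present: bad
    exit 0                                 # otherwise good
' >/dev/null 2>&1

culprit=$(git log -1 --format=%s refs/bisect/bad)
git bisect reset >/dev/null 2>&1
echo "first bad commit: $culprit"
```

When the real culprit sits inside the skipped range, git cannot decide
and only lists the remaining candidates, which matches the "narrowed it
down to between a and g" outcome above. For the second strategy, the
bisection can be limited to part of the tree with
"git bisect start <bad> <good> -- drivers/mmc" (the path is just an
example).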

Cheers,
FJP

Re: [BUG] New Kernel Bugs

2007-11-13 Thread Jörn Engel

On Tue, 13 November 2007 13:56:58 -0800, Andrew Morton wrote:
 
 It's relatively common that a regression in subsystem A will manifest as a
 failure in subsystem B, and the report initially lands on the desk of the
 subsystem B developers.
 
 But that's OK.  The subsystem B people are the ones with the expertise to
 be able to work out where the bug resides and to help the subsystem A
 people understand what went wrong.
 
 Alas, sometimes the B people will just roll eyes and do nothing because
 they know the problem wasn't in their code.  Sometimes.

And sometimes the A people will ignore the B people after the root cause
has been worked out.  Do you have a good idea how to shame A into
action?  Should I put you on Cc:?  Right now I'm in the eye-rolling
phase.

Jörn

-- 
The cost of changing business rules is much more expensive for software
than for a secretary.
-- unknown

Re: [BUG] New Kernel Bugs

2007-11-13 Thread Andrew Morton

On Tue, 13 Nov 2007 22:18:01 + Russell King [EMAIL PROTECTED] wrote:

 On Tue, Nov 13, 2007 at 12:52:22PM -0800, Andrew Morton wrote:
  On Tue, 13 Nov 2007 19:32:19 + Russell King [EMAIL PROTECTED] wrote:
   There's another issue I want to raise concerning bugzilla.  We have the
   classic case of not enough people reading bugzilla bugs - which is one
   of the biggest problems with bugzilla.  Virtually no one in the ARM
   community looks for ARM bugs in bugzilla.
  
  Nor should they.
 
 So what you're saying is...
 
   Let's not forget that it would be a waste of time for people to manually
   check bugzilla for ARM bugs.  There are so few people reporting ARM bugs
   into bugzilla that a weekly manual check by every maintainer would just
   return the same old boring results for months and months at a time.
  
  I screen all bugzilla reports.  100% of them.
  
  - I'll try to establish whether it is a regression
  
  - I'll solicit any extra information which I believe the developer will need
  
  - I'll ensure that an appropriate developer has seen the report
  
  And yes, the number of arm-specific reports in there is very small.
 
 that just because you do this everyone in a select clique, who you include
 me in, should be doing this as well.
 
 No.  Thank.  You.

No, I don't mean that at all and this was very plainly obvious from my very
clearly written email.  Let me try again.

No, no subsystem developer needs to monitor new bugzilla reports.  This is
because *I do it for them*.  I will actively make them aware of new reports
which I believe are legitimate and which contain sufficient information for
them to be able to take further action.

   It would be far more productive if the ARM category was deleted from
   bugzilla and the few people who use bugzilla reported their bugs on the
   mailing list.  We've a couple of thousand people on the ARM kernel
   mailing list at the moment - that's 3 orders of magnitude more of eyes
   than look at bugzilla.
  
  Is that [EMAIL PROTECTED]
 
 Yes.
 
  If so, MAINTAINERS claims that it is subscribers-only.  That would cause
  some bug reporters to give up and go away.
 
 Find some other mailing list; I'm not hosting *nor* am I willing to run a
 non-subscribers only mailing list.  Period.  Not negotiable, so don't even
 try to change my mind.

Making a list subscribers-only will cause some bug reports to be lost.

Tradeoffs are involved, against which decisions must be made.  You have
made yours.

Re: [BUG] New Kernel Bugs

2007-11-13 Thread Russell King

On Tue, Nov 13, 2007 at 06:25:16PM +, Alan Cox wrote:
  Given the wide range of ARM platforms today, it is utterly idiotic to
  expect a single person to be able to provide responses for all ARM bugs.
  I for one wish I'd never *VOLUNTEERED* to be a part of the kernel
  bugzilla, and really *WISH* I could pull out of that function.
 
 You can. Perhaps that bugzilla needs to point to some kind of
 [EMAIL PROTECTED] list for the various ARM platform
 maintainers ?

That might work - though it would be hard to get all the platform
maintainers to be signed up to yet another mailing list, I'm sure
sufficient would do.

-- 
Russell King
 Linux kernel    2.6 ARM Linux   - http://www.arm.linux.org.uk/
 maintainer of:

Re: [BUG] New Kernel Bugs

2007-11-13 Thread Andrew Morton

On Tue, 13 Nov 2007 23:24:14 +0100 Jörn Engel [EMAIL PROTECTED] wrote:

 On Tue, 13 November 2007 13:56:58 -0800, Andrew Morton wrote:
  
  It's relatively common that a regression in subsystem A will manifest as a
  failure in subsystem B, and the report initially lands on the desk of the
  subsystem B developers.
  
  But that's OK.  The subsystem B people are the ones with the expertise to
  be able to work out where the bug resides and to help the subsystem A
  people understand what went wrong.
  
  Alas, sometimes the B people will just roll eyes and do nothing because
  they know the problem wasn't in their code.  Sometimes.
 
 And sometimes the A people will ignore the B people after the root cause
 has been worked out.  Do you have a good idea how to shame A into
 action?  Should I put you on Cc:?  Right now I'm in the eye-rolling
 phase.
 

Well, that's the problem, isn't it?

The best I can come up with is to suggest that all the info be captured in
a bugzilla report so that at least it doesn't get forgotten about.

I suppose that other options are

a) try to fix it yourself.  I'll take the patch and as long as we make a
   big enough mess of it, someone who knows what they're doing might fix it
   for real.

b) If it was a regression, identify the offending commit and we'll just
   revert it.
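
For option (b), finding the offending commit can be automated with
"git bisect run", which decides good/bad/skip from a script's exit status.
A minimal sketch of such a script follows. The file name bisect_check.py and
the build and test commands are hypothetical and not taken from this thread;
only the exit-code convention (0 = good, 125 = skip, other codes below 128 =
bad) comes from git itself.

```python
import subprocess
import sys

# Exit codes understood by `git bisect run`.
GOOD, BAD, SKIP = 0, 1, 125


def classify(build_rc, test_rc=None):
    """Map build/test return codes onto git-bisect-run exit codes."""
    if build_rc != 0:
        # The tree does not build, so this commit cannot be judged.
        return SKIP
    if test_rc is None or test_rc == 0:
        return GOOD
    return BAD


def main(build_cmd, test_cmd):
    """Build the current checkout, then run the reproducer if the build works."""
    build = subprocess.run(build_cmd, shell=True)
    if build.returncode != 0:
        return classify(build.returncode)
    test = subprocess.run(test_cmd, shell=True)
    return classify(0, test.returncode)


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage:
    #   git bisect start <bad> <good>
    #   git bisect run python3 bisect_check.py "make -j4" "./reproduce.sh"
    sys.exit(main(sys.argv[1], sys.argv[2]))
```

When the regression is itself a build failure, the build step would have to
report BAD rather than SKIP; the mapping above assumes a runtime regression.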

Re: [BUG] New Kernel Bugs

2007-11-13 Thread Thomas Gleixner

On Tue, 13 Nov 2007, Mark Lord wrote:

 Thomas Gleixner wrote:
  On Tue, 13 Nov 2007, Mark Lord wrote:
  
Andrew Morton wrote:
  On Mon, 12 Nov 2007 22:42:32 -0800 Natalie Protasevich
  [EMAIL PROTECTED] wrote:
..
with CONFIG_NO_HZ and/or CONFIG_HPET_TIMER set kernel 2.6.23
 doesn't
boot (ARM, Timer)
http://bugzilla.kernel.org/show_bug.cgi?id=9229
Kernel: 2.6.23
No response from developers
..
  
  The bug report is bogus. ARM has no CONFIG_HPET_TIMER.  
Note:  that same bug exists/existed on i386 back when NO_HZ was
introduced (2.6.21?).  I still see it from time to time on my Quad core
system (very rare), but not any more on my Duo notebook where it used
to happen about 1 in n boots (n  10).
 AFAICT no fix was ever released for it.
  
  Hmm, at which point does the boot stop ? 
 ..
 
 Just as it prints out these messages, sometimes one of them,
 sometimes both (or all four on the quad core):
 
 kernel: switched to high resolution mode on cpu 1
 kernel: switched to high resolution mode on cpu 0 

It's completely dead afterwards ?

 tglx

Re: [BUG] New Kernel Bugs

2007-11-13 Thread Andrew Morton

On Tue, 13 Nov 2007 23:09:37 + Russell King [EMAIL PROTECTED] wrote:

 On Tue, Nov 13, 2007 at 02:32:01PM -0800, Andrew Morton wrote:
  On Tue, 13 Nov 2007 22:18:01 + Russell King [EMAIL PROTECTED] wrote:
   On Tue, Nov 13, 2007 at 12:52:22PM -0800, Andrew Morton wrote:
On Tue, 13 Nov 2007 19:32:19 + Russell King [EMAIL PROTECTED] 
wrote:
  No, I don't mean that at all and this was very plainly obvious from my very
  clearly written email.  Let me try again.
 
 If you screen all bugzilla reports then you'll know that bug #9356 arrived
 at about 1400 GMT yesterday.  It's hardly surprising then that your utterly
 crappy responses to Natalie's message (which, incidentally, wasn't copied
 to me) sent within 24 hours of that report cause *great* annoyance.
 
  No, no subsystem developer needs to monitor new bugzilla reports.  This is
  because *I do it for them*.  I will actively make them aware of new reports
  which I believe are legitimate and which contain sufficient information for
  them to be able to take further action.
 
 On the whole you do an excellent job with feeding the bug reports to
 people, and while I recognise that you're only human, things do
 occasionally go wrong.  For instance, sending clearly marked Samsung
 S3C bugs to me rather than Ben Dooks (who's in MAINTAINERS for those
 platforms.)


Well whatever, sorry.  But this is in the noise floor.  Point is: many bug
reports aren't being attended to.


 It would be far more productive if the ARM category was deleted from
 bugzilla and the few people who use bugzilla reported their bugs on the
 mailing list.  We've a couple of thousand people on the ARM kernel
 mailing list at the moment - that's 3 orders of magnitude more eyes
 than look at bugzilla.

Is that [EMAIL PROTECTED]
   
   Yes.
   
If so, MAINTAINERS claims that it is subscribers-only.  That would cause
some bug reporters to give up and go away.
   
   Find some other mailing list; I'm not hosting *nor* am I willing to run a
   non-subscribers only mailing list.  Period.  Not negotiable, so don't even
   try to change my mind.
  
  Making a list subscribers-only will cause some bug reports to be lost.
  
  Tradeoffs are involved, against which decisions must be made.  You have
  made yours.
 
 So how are they lost when they're held in a moderation queue and are
 either accepted, a useful response given to the original poster, or
 are forwarded to someone who can deal with the issue.
 
 I don't think subscribers only describes my lists - we don't devnull
 stuff just because the poster is not a subscriber.

Oh, OK, as long as there really is a human paying attention to those things
then that's fine.  When one is on the sending end of these things one never
knows how long it will take, nor whether it will even happen.

Re: [BUG] New Kernel Bugs

2007-11-13 Thread Russell King

On Tue, Nov 13, 2007 at 09:13:19PM +0100, Adrian Bunk wrote:
 On Tue, Nov 13, 2007 at 07:32:19PM +, Russell King wrote:
 ...
  There's another issue I want to raise concerning bugzilla.  We have the
  classic case of not enough people reading bugzilla bugs - which is one
  of the biggest problems with bugzilla.  Virtually no one in the ARM
  community looks for ARM bugs in bugzilla.
  
  Let's not forget that it would be a waste of time for people to manually
  check bugzilla for ARM bugs.  There are so few people reporting ARM bugs
  into bugzilla that a weekly manual check by every maintainer would just
  return the same old boring results for months and months at a time.
 ...
 
 What about having all ARM bugs in Bugzilla by default assigned to 
 [EMAIL PROTECTED] [1]

That would also work, probably much better than setting up yet another
list.

My experience of trying to get mbligh to do this when I stopped looking
after PCMCIA stuff was *extremely* painful.  Wonder if it's become any
easier of late?

-- 
Russell King
 Linux kernel    2.6 ARM Linux   - http://www.arm.linux.org.uk/
 maintainer of:

Re: [BUG] New Kernel Bugs

2007-11-13 Thread Mark Lord


Thomas Gleixner wrote:

On Tue, 13 Nov 2007, Mark Lord wrote:


Thomas Gleixner wrote:

On Tue, 13 Nov 2007, Mark Lord wrote:


Andrew Morton wrote:

On Mon, 12 Nov 2007 22:42:32 -0800 Natalie Protasevich
[EMAIL PROTECTED] wrote:

..

with CONFIG_NO_HZ and/or CONFIG_HPET_TIMER set kernel 2.6.23

doesn't

boot (ARM, Timer)
http://bugzilla.kernel.org/show_bug.cgi?id=9229
Kernel: 2.6.23
No response from developers

..
The bug report is bogus. ARM has no CONFIG_HPET_TIMER.  

Note:  that same bug exists/existed on i386 back when NO_HZ was
introduced (2.6.21?).  I still see it from time to time on my Quad core
system (very rare), but not any more on my Duo notebook where it used
to happen about 1 in n boots (n  10).

AFAICT no fix was ever released for it.
Hmm, at which point does the boot stop ? 

..

Just as it prints out these messages, sometimes one of them,
sometimes both (or all four on the quad core):

kernel: switched to high resolution mode on cpu 1
kernel: switched to high resolution mode on cpu 0 


It's completely dead afterwards ?


Yeah.  No magic sysrq key or anything.
There's gotta be a race somewhere that's causing it,
but it's not obvious where to look for it.

My regular 2-core notebook no longer suffers from it,
and subtle .config changes used to make it come and go
back when it first appeared.

The quad-core has only done it twice on me thus far.

Tracking this one down looks tricky.  It might require some early lockup
detection code to be tailor made or something.

Cheers

Re: [BUG] New Kernel Bugs

2007-11-13 Thread Russell King

On Tue, Nov 13, 2007 at 03:18:07PM -0500, Mark Lord wrote:
 Russell King wrote:
  On Tue, Nov 13, 2007 at 09:08:32AM -0500, Mark Lord wrote:
  Ingo Molnar wrote:
  ..
  This is all QA-101 that _cannot be argued against on a rational basis_, 
  it's just that these sorts of things have been largely ignored for 
  years, in favor of the all-too-easy "open source means many eyeballs and
  that is our QA answer", which is a _good_ answer but by far not the most
  intelligent answer! Today "many eyeballs" is simply not good enough and
  nature (and other OS projects) will route us around if we don't change.
  ..
 
  QA-101 and many eyeballs are not at all in opposition.
  The latter is how we find out about bugs on uncommon hardware,
  and the former is what we need to track them and overall quality.
 
  A HUGE problem I have with current efforts, is that once someone
  reports a bug, the onus seems to be 99% on the *reporter* to find
  the exact line of code or commit.  Ghad what a repressive method.
  
  99% on the reporter?  Is that why I always try to understand the
  reporter's problem (*provided* it's in an area I know about) and come
  up with a patch to test a theory or fix the issue?
 ..
 
 Same here.
 
 I just find it weird that something can be known broken for several -rc*
 kernels before I happen to install it, discover it's broken on my own machine,
 and then I track it down, fix it, and submit the patch, generally all within a
 couple of hours.  Where the heck was the dude(ess) that broke it ??  AWOL.

Same thing can be said for compile breakages as well.  Looking at the
latest kautobuild output:

ARM ep93xx defconfig has been broken since 2.6.23-git1 due to:

drivers/net/arm/ep93xx_eth.c:420: error: implicit declaration of function '__netif_rx_schedule_prep'

caused by: [NET]: Make NAPI polling independent of struct net_device objects.

ARM netx defconfig has been broken since 2.6.23-git1 due to:

drivers/net/netx-eth.c: In function 'netx_eth_hard_start_xmit':
drivers/net/netx-eth.c:131: error: 'dev' undeclared (first use in this function)
drivers/net/netx-eth.c:131: error: (Each undeclared identifier is reported only once
drivers/net/netx-eth.c:131: error: for each function it appears in.)
drivers/net/netx-eth.c: In function 'netx_eth_receive':
drivers/net/netx-eth.c:158: error: 'dev' undeclared (first use in this function)

caused by: [NET] drivers/net: statistics cleanup #1 -- save memory and shrink code

Haven't got a report for either of those, but Kautobuild lets people
know if folk can be bothered to subscribe to its mailing list and/or
look at the site occasionally.

I suspect the maintainers of the above drivers aren't aware that their
drivers are broken.
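
Compiler output like the above is regular enough to be summarised per source
file, which is roughly what an automated build report needs before it can be
routed to a maintainer. The following is only a sketch of that idea; the
function name errors_by_file and the regular expression are assumptions for
illustration, and nothing here describes how Kautobuild itself works.

```python
import re
from collections import defaultdict

# Matches gcc-style lines such as "path/file.c:131: error: message".
ERROR_RE = re.compile(r"^(?P<file>[^\s:]+):(?P<line>\d+): error: (?P<msg>.*)$")


def errors_by_file(log_text):
    """Group compiler error lines from a build log by source file."""
    found = defaultdict(list)
    for raw in log_text.splitlines():
        match = ERROR_RE.match(raw.strip())
        if match:
            entry = (int(match.group("line")), match.group("msg"))
            found[match.group("file")].append(entry)
    return dict(found)


# Sample lines taken from the build failures quoted above.
sample = """\
drivers/net/netx-eth.c: In function 'netx_eth_hard_start_xmit':
drivers/net/netx-eth.c:131: error: 'dev' undeclared (first use in this function)
drivers/net/netx-eth.c:158: error: 'dev' undeclared (first use in this function)
drivers/net/arm/ep93xx_eth.c:420: error: implicit declaration of function '__netif_rx_schedule_prep'
"""

report = errors_by_file(sample)
for path in sorted(report):
    print(path, len(report[path]), "error(s)")
```

The per-file grouping could then be matched against MAINTAINERS entries to
decide who should be told about each breakage.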

-- 
Russell King
 Linux kernel    2.6 ARM Linux   - http://www.arm.linux.org.uk/
 maintainer of:

Re: [BUG] New Kernel Bugs

2007-11-13 Thread Chuck Ebbert

On 11/13/2007 04:12 PM, Alan Cox wrote:
 Bug fixing is not about finding someone to blame, it's about getting the 
 bug fixed.
 
 Partly - it's also about understanding why the bug occurred and making it
 not happen again.

Very few people think about that part.

Re: [BUG] New Kernel Bugs

2007-11-13 Thread Stephen Hemminger

On Tue, 13 Nov 2007 19:52:17 -0500
Chuck Ebbert [EMAIL PROTECTED] wrote:

 On 11/13/2007 04:12 PM, Alan Cox wrote:
  Bug fixing is not about finding someone to blame, it's about getting the 
  bug fixed.
  
  Partly - it's also about understanding why the bug occurred and making it
  not happen again.
 
 Very few people think about that part.

Why does the kernel have very few useful tests?
 Lack of interest? resources? expertise?
Ideally each new feature would just be a small add on to an existing test.

Unlike developing new features, which seems to scale well with more developers,
bug fixing seems to be a scarcity process. There often seem to be only
a very few people who understand the problem well enough or have the necessary
hardware to reproduce and fix the problem.

Recent changes like tickless and the scheduler rework were well thought out and
caused very little impact to 90% of the users. The problem is the 10% who do
have problems. Worse, the developers often only hear about a small sample of
those.

-- 
Stephen Hemminger [EMAIL PROTECTED]

Re: [BUG] New Kernel Bugs

2007-11-13 Thread David Miller

From: Andrew Morton [EMAIL PROTECTED]
Date: Tue, 13 Nov 2007 14:32:01 -0800

 On Tue, 13 Nov 2007 22:18:01 + Russell King [EMAIL PROTECTED] wrote:

  Find some other mailing list; I'm not hosting *nor* am I willing to run a
  non-subscribers only mailing list.  Period.  Not negotiable, so don't even
  try to change my mind.

 Making a list subscribers-only will cause some bug reports to be lost.

 Tradeoffs are involved, against which decisions must be made.  You have
 made yours.

Russell doesn't have to worry any more, he doesn't have to
host it, and he doesn't have to be willing to run a
non-subscribers-only mailing list.

Because I am.

I've created [EMAIL PROTECTED]

Enjoy.

Re: [BUG] New Kernel Bugs

2007-11-13 Thread Andrew Morton

On Tue, 13 Nov 2007 17:11:36 -0800 Stephen Hemminger [EMAIL PROTECTED] wrote:

 On Tue, 13 Nov 2007 19:52:17 -0500
 Chuck Ebbert [EMAIL PROTECTED] wrote:
 
  On 11/13/2007 04:12 PM, Alan Cox wrote:
   Bug fixing is not about finding someone to blame, it's about getting the 
   bug fixed.
   
   Partly - it's also about understanding why the bug occurred and making it
   not happen again.
  
  Very few people think about that part.
 
 Why does the kernel have very few useful tests?

Tests would of course be nice, but they aren't very useful(!)

Looking at this list which Natalie has generated I see around thirty which
are dependent on the right hardware and ten which are not.  This ratio is
typical, I think.  In fact I'd say that more than 75% of reported bugs are
dependent on hardware.

So the best test of all for the kernel is to run it on a different machine.
This is why we are so dependent upon our volunteer testers/reporters to
be able to do kernel development.

  Lack of interest? resources? expertise?
 Ideally each new feature would just be a small add on to an existing test.

Sure.  For system-call-visible features it would be good to do that.

But this tends not to be where bugs get exposed.  Because the original
developer can 100% exercise such code.  That isn't the case with
driver/arch/platform changes.

 Unlike developing new features, which seems to scale well with more developers,
 bug fixing seems to be a scarcity process. There often seem to be only
 a very few people who understand the problem well enough or have the necessary
 hardware to reproduce and fix the problem.

We're 100% dead if having the hardware is a prerequisite to fixing a bug.
The terminal state there is that the kernel runs on about 200 machines
worldwide.  We have to work with reporters via email to fix these sorts of
things.  As we of course do.

 Recent changes like tickless and the scheduler rework were well thought out and
 caused very little impact to 90% of the users. The problem is the 10% who do
 have problems. Worse, the developers often only hear about a small sample of
 those.

Yes.  An unknown number of people just shrug and go back to an old kernel.

Re: [BUG] New Kernel Bugs

2007-11-13 Thread David Miller

From: Andrew Morton [EMAIL PROTECTED]
Date: Tue, 13 Nov 2007 18:27:00 -0800

 Let me just say - I'm astonished at how little spam gets through the vger
 lists.  Considering how many times those email addresses must have been
 added to spam databases.

 It must be a lot of work, and whoever is doing it does it well.

 I don't even know.  Is it Matti?  You?

Matti gets all the credit for setting up the bayesian et al.
filters we have and training it as needed.

 contemplates [EMAIL PROTECTED]  Shudders.

Yes, sourceforge is a complete joke.

Re: [BUG] New Kernel Bugs

2007-11-13 Thread Sam Ravnborg

 
  If so, MAINTAINERS claims that it is subscribers-only.  That would cause
  some bug reporters to give up and go away.
 
 Find some other mailing list; I'm not hosting *nor* am I willing to run a
 non-subscribers only mailing list.  Period.  Not negotiable, so don't even
 try to change my mind.

The postmasters at vger are pretty good at running mailing lists.
For linux-kbuild my effort so far has been to request it.
That's not a big deal.

So if they accept it you could have [EMAIL PROTECTED] for zero
overhead for you.

Sam

Re: [BUG] New Kernel Bugs

2007-11-13 Thread Sam Ravnborg

On Wed, Nov 14, 2007 at 06:56:06AM +0100, Sam Ravnborg wrote:
  
   If so, MAINTAINERS claims that it is subscribers-only.  That would cause
   some bug reporters to give up and go away.
  
  Find some other mailing list; I'm not hosting *nor* am I willing to run a
  non-subscribers only mailing list.  Period.  Not negotiable, so don't even
  try to change my mind.
 
 The postmasters at vger are pretty good at running mailing lists.
 For linux-kbuild my effort so far has been to request it.
 That's not a big deal.
 
 So if they accept it you could have [EMAIL PROTECTED] for zero
 overhead for you.

And in a later mail I saw davem already created it.

Sam

Re: [BUG] New Kernel Bugs

2007-11-13 Thread David Miller

From: Sam Ravnborg [EMAIL PROTECTED]
Date: Wed, 14 Nov 2007 06:56:06 +0100

   If so, MAINTAINERS claims that it is subscribers-only.  That would cause
   some bug reporters to give up and go away.

  Find some other mailing list; I'm not hosting *nor* am I willing to run a
  non-subscribers only mailing list.  Period.  Not negotiable, so don't even
  try to change my mind.

 The postmasters at vger are pretty good at running mailing lists.
 For linux-kbuild my effort so far has been to request it.
 That's not a big deal.

 So if they accept it you could have [EMAIL PROTECTED] for zero
 overhead for you.

I already did, get a little deeper in your mailbox before
replying :-)

Re: [BUG] New Kernel Bugs

2007-11-13 Thread Adrian Bunk

On Tue, Nov 13, 2007 at 05:39:45PM -0700, Denys Vlasenko wrote:
 On Tuesday 13 November 2007 10:56, Adrian Bunk wrote:
  On Tue, Nov 13, 2007 at 12:13:56PM -0500, Theodore Tso wrote:
   On Tue, Nov 13, 2007 at 04:52:32PM +0100, Benoit Boissinot wrote:
Btw, I used to test every -mm kernel. But since I've switched distros
(gentoo->ubuntu)
and I have less time, I feel it's harder to test -rc or -mm kernels (I
know this isn't a lkml problem
but more a distro problem, but I would love having an ubuntu blessed
repo with current dev kernel
for the latest stable ubuntu release).
  
   There are two parts to this.  One is a Ubuntu development kernel which
   we can give to large numbers of people to expand our testing pool.
   But if we don't do a better job of responding to bug reports that
   would be generated by expanded testing this won't necessarily help us.
  ...
 
  The main problem isn't missing testers [1] - we already have relatively
  experienced people testing kernels and/or reporting bugs, and we slowly
  scare them away due to the many bug reports without any reaction.
 
  The main problem is finding experienced developers who spend time on
  looking into bug reports.
 
  Getting many relatively inexperienced users (who need more guidance for
  debugging issues) as additional testers is therefore IMHO not
  necessarily a good idea.
 
 And where do experienced developers come from?
 They are not born with Linux kernel skills.
 They grow up from within user base.
 
 Bigger user base - more developers (eventually)

You missed the following in my email:
we slowly scare them away due to the many bug reports without any 
 reaction.

The problem is that bug reports take time. If you go away from easy 
things like compile errors then even things like describing what does
no longer work, ideally producing a scenario where you can reproduce it 
and verifying whether it was present in previous kernels can easily take 
many hours that are spent before the initial bug report.

If the bug report then gets ignored we discourage the person who sent 
the bug report to do any work related to the kernel again.

 vda

cu
Adrian

-- 

   Is there not promise of rain? Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
   Only a promise, Lao Er said.
   Pearl S. Buck - Dragon Seed

Re: [BUG] New Kernel Bugs

2007-11-13 Thread Denys Vlasenko

On Wednesday 14 November 2007 00:27, Adrian Bunk wrote:
 You missed the following in my email:
 we slowly scare them away due to the many bug reports without any
  reaction.

 The problem is that bug reports take time. If you go away from easy
 things like compile errors then even things like describing what does
 no longer work, ideally producing a scenario where you can reproduce it
 and verifying whether it was present in previous kernels can easily take
 many hours that are spent before the initial bug report.

 If the bug report then gets ignored we discourage the person who sent
 the bug report to do any work related to the kernel again.

Cannot agree more. I am in a similar position right now.
My patch to aic7xxx driver was submitted four times
with not much reaction from scsi guys.

Finally they replied and asked to rediff it against their
git tree. I did that and sent patches back. No reply since then.

And mind you, the patch is not trying to do anything
complex, it mostly moves code around, removes 'inline',
adds 'const'. What should I think about it?
--
vda
