Re: No dma_sync_* during pci_probe? (Sparc, post 2.6.22 regression)

2007-12-17 Thread Chris Newport

On Tue, 18 Dec 2007, Stefan Richter wrote:


It's a 100% reproducible oops on Sparc (with FireWire controller) for
2.6.23 and 2.6.24 kernels, but not 2.6.22.  The reporter confirmed that
the bug also happens


How do you get a Sparc system with FireWire?
AFAIK there is no SBUS FireWire card.

Only sparc64 and some rare JavaStations have PCI slots.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [2/3] 2.6.22-rc2: known regressions v2

2007-05-25 Thread Chris Newport


Sorry, I did not make myself clear.

Linus Torvalds wrote:


On Fri, 25 May 2007, Chris Newport wrote:
 


Maybe we should take a hint from Solaris.
   



No. Solaris is shit. They make their decisions based on "we control the 
hardware" kind of setup.
 


Not really a Solaris feature. This is a feature of the OpenBoot PROM,
which is also used by several other vendors.
The OpenBoot PROM knows how to write to disk. The same should
apply on Apple hardware and others which use the OpenBoot
convention.

If dumps are enabled (they are disabled by default in a file read at
boot), the crash() function need only set a couple of registers and do
a PROM interrupt. At this point the kernel is no longer involved, so
broken drivers etc. are not an issue.

The cute bit is that the SunOS debug program can be called as
debug $DUMPFILE and it takes you to the failure point just like
a tracefile.

Crashdumps should not be enabled by default. They can chew up rather
a lot of disk space, making a crashdump.datetime file every time
something breaks.


If the kernel crashes Solaris dumps core to swap and sets a flag.
At the next boot this image is copied to /var/adm/crashdump where
it is preserved for future debugging. Obviously swap needs to be
larger than core, but this is usually the case.
   



(a) it's not necessarily the case at all on many systems

(b) _most_ crashes that are real BUG()'s (rather than WARN_ON()'s) leave 
   the system in such a fragile state that trying to write to disk is the 
   _last_ thing you should do.


   Linux does the right thing: it tries to not make bugs fatal. 
   Generally, you should see an oops, and things continue. Or a 
   WARN_ON(), and things continue. But you should avoid the "the machine 
   is now dead" cases.


(c) have you looked at the size of drivers lately? I'd argue that *most* 
   bugs by far happen in something driver-related, and most of our source 
   code is likely drivers.


   Writing to disk when the biggest problem is a driver to begin with 
   is INSANE.


So the fact is, Solaris is crap, and to a large degree Solaris is crap 
exactly _because_ it assumes that it runs in a "controlled environment".


Yes, in a controlled environment, dumping the whole memory image to disk 
may be the right thing to do. BUT: in a controlled environment, you'll 
never get the kind of usage that Linux gets. Why do you think Linux (and 
Windows, for that matter) took away a lot of the market from traditional 
UNIX? 

Answer: the traditional UNIX hardware/control model doesn't _work_. People 
want more flexibility, both on a hardware side and on a usage side. And 
once you have the flexibility, the "dump everything to disk" is simply not 
an option any more.


Disk dumps etc are options at things like Wall Street. But look at the bug 
reports, and ask yourself how many of them happen at Wall Street, and how 
many of them would even be _relevant_ to somebody there? 

So forget about it. The whole model is totally broken. We need to make 
bug-reports short and sweet, enough so that random people can 
copy-and-paste them into an email or take a digital photo. Anything else 
IS TOTALLY INSANE AND USELESS!


Linus
 





Re: [2/3] 2.6.22-rc2: known regressions v2

2007-05-25 Thread Chris Newport

Ingo Molnar wrote:

A BUG_ON() has a (much) lower likelihood of being reported back - for 
most users it is a "X just hung hard, there was nothing in the syslog, i 
had to switch back to the older kernel" experience, and they do not have 
a serial console to hook up (newer hardware often doesn't even have a 
serial port). With the WARN_ON()s we have a _chance_ that despite the 
seriousness of the bug, the message makes it to the syslog, until the 
system comes to a screeching halt due to side-effects of the bug.


in that sense i am part of the problem: i was adding WARN_ON()s that 
weren't true 'warnings' but 'bugs'. So i'd very much like to fix that 
problem, but i'd also like to solve the (very serious and existing) 
problem of BUG_ON()s making it less likely to get bugs reported back. 

 

There is a fundamental problem in getting a decent log to debug a
crashed kernel. Maybe we should take a hint from Solaris.

If the kernel crashes, Solaris dumps core to swap and sets a flag.
At the next boot this image is copied to /var/adm/crashdump where
it is preserved for future debugging. Obviously swap needs to be
larger than core, but this is usually the case.

On Sun machines this is fairly easy because the dump can be
performed by the OBP; on other architectures it may be more
difficult to still have enough working kernel to achieve the dump
after a kernel panic.
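
The dump-to-swap, set-a-flag, copy-at-next-boot sequence can be
sketched roughly as below. This is only an illustration in plain
userspace C, not how Solaris or the OBP actually implement it: the
struct swap_area layout, the DUMP_MAGIC value and the names
crash_dump() and boot_savecore() are all made up for the example, and
the "swap" is just a small in-memory buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DUMP_MAGIC 0x44554D50u  /* arbitrary marker, hypothetical */

/* Hypothetical stand-in for a swap device with a small header. */
struct swap_area {
    unsigned int  magic;     /* the "flag": set when a dump is present */
    size_t        len;       /* number of valid bytes in data[] */
    unsigned char data[256]; /* room for the core image */
};

/*
 * Crash path: copy the core image into swap and set the flag.
 * Returns 0 on success, -1 if swap is smaller than the core image.
 */
int crash_dump(struct swap_area *swap, const void *core, size_t core_len)
{
    assert(swap != NULL && core != NULL);
    if (core_len > sizeof(swap->data))
        return -1;               /* swap must be at least as large as core */
    memcpy(swap->data, core, core_len);
    swap->len = core_len;
    swap->magic = DUMP_MAGIC;    /* set the flag last, once data is in place */
    return 0;
}

/*
 * Next boot: if the flag is set, copy the image out of swap (in real
 * life, to a file) and clear the flag. Returns the number of bytes saved.
 */
size_t boot_savecore(struct swap_area *swap, void *out, size_t out_len)
{
    size_t n;

    assert(swap != NULL && out != NULL);
    if (swap->magic != DUMP_MAGIC)
        return 0;                /* no dump pending */
    n = swap->len < out_len ? swap->len : out_len;
    memcpy(out, swap->data, n);
    swap->magic = 0;             /* clear the flag so it is saved only once */
    return n;
}
```

Setting the flag only after the copy means a crash that dies halfway
through the dump leaves no flag behind, so the next boot does not try
to preserve a truncated image.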

Just a thought ...




Re: possible endless loop in PROM initialization

2007-08-20 Thread Chris Newport

Markus Dahms wrote:


Hello again,

David Miller wrote:
 


When we boot the firmware provides a vector of function
pointers, and this is prom_nodeops.  So prom_nodeops->no_nextprop()
is a routine inside the PROM.
   



Thanks. So there is no real chance to fix it but to override this
function?
The strange thing is that the PROM prompt handles it correctly:

| ok devalias scsi /iommu/sbus/espdma/esp
| scsi isn't unique
 


Non-unique devalias entries are NASTY. Get rid of them; for example,
your second definition of scsi could be changed to scsidisk.
You should also check that any nvramrc entries do not cause conflicts.
It is common for nvramrc entries to refer to devalias entries, so you
may also need to change an nvramrc entry to match any devalias change.


You can disable the use of nvramrc from the OBP prompt with
setenv if the entry is non-essential. Such entries are most commonly
used where there is a mirrored root disk.
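
For the kernel side, the vector of function pointers David Miller
describes can be pictured roughly as below, together with one way a
caller could protect itself if a PROM routine never reports the end of
the property list. This is a userspace sketch under assumptions, not
the kernel's code: struct fake_nodeops, MAX_PROPS, count_props() and
the two fake firmware routines are hypothetical, and only the member
name no_nextprop is taken from the thread.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the vector the firmware hands over at boot. */
struct fake_nodeops {
    /* Returns the property name after `name`, or "" when there are no more. */
    const char *(*no_nextprop)(int node, const char *name);
};

#define MAX_PROPS 64  /* assumed sanity limit on properties per node */

/*
 * Walk a node's properties through the function pointer.
 * Returns the number of properties, or -1 if the firmware routine
 * never signals the end of the list within MAX_PROPS steps.
 */
int count_props(const struct fake_nodeops *ops, int node)
{
    const char *name = "";
    int n = 0;

    assert(ops != NULL && ops->no_nextprop != NULL);
    for (;;) {
        name = ops->no_nextprop(node, name);
        if (name == NULL || *name == '\0')
            return n;            /* normal end of the property list */
        if (++n > MAX_PROPS)
            return -1;           /* give up instead of looping forever */
    }
}

/* A well-behaved fake PROM routine with three properties. */
const char *good_nextprop(int node, const char *name)
{
    static const char *props[] = { "name", "reg", "device_type" };
    size_t i, n = sizeof(props) / sizeof(props[0]);

    (void)node;
    if (*name == '\0')
        return props[0];
    for (i = 0; i + 1 < n; i++)
        if (strcmp(name, props[i]) == 0)
            return props[i + 1];
    return "";
}

/* A misbehaving fake PROM routine that returns the same name every time. */
const char *stuck_nextprop(int node, const char *name)
{
    (void)node;
    (void)name;
    return "scsi";
}
```

A bound like this only turns a hang into an error return; cleaning up
the devalias and nvramrc entries is still the proper fix.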

