Re: [PATCH v2] builddeb: Support signing kernels with the module signing key
On Tue, 2022-02-08 at 13:10 +, Matthew Wilcox wrote:
> On Tue, Feb 08, 2022 at 12:01:22PM +0100, Julian Andres Klode wrote:
> > It's worth pointing out that in Ubuntu, the generated MOK key
> > is for module signing only (extended key usage
> > 1.3.6.1.4.1.2312.16.1.2), kernels signed with it will NOT be
> > bootable.
>
> Why should these be separate keys?  There's no meaningful security
> boundary between a kernel module and the kernel itself; a kernel
> module can, for example, write to CR3, and that's game over for
> any pretence at separation.

It's standard practice for any automated build private key to be
destroyed immediately to preserve security.  Thus the modules get
signed with a per-kernel ephemeral build key, but the MoK key is a
long-term key with a special signing infrastructure, usually burned
into the distro version of shim.  The kernel signing key usually has to
be long term because you want shim to boot multiple kernels; otherwise
upgrading becomes a nightmare.

James
Bug#959069: linux-image-5.5.0-2-amd64 won't boot in a AMD SEV Virtual Machine
Subject: linux-image-5.5.0-2-amd64 won't boot in a AMD SEV Virtual Machine
Package: src:linux
Version: 5.5.17-1
Severity: important

The boot failure is total: not even a console log can be seen, and seems
to be due to the necessary memory encryption option not being set in the
debian kernel:

# CONFIG_AMD_MEM_ENCRYPT is not set

In spite of the fact that the rest of the SEV encryption variables are
set:

CONFIG_KVM_AMD_SEV=y
CONFIG_USB_SEVSEG=m

So I'm reporting this on the assumption that it is supposed to work out
of the box and not setting AMD_MEM_ENCRYPT was an oversight.

Not setting this means that all the I/O devices are sending encrypted
memory pages through to QEMU which is what's causing the hang.  With
this set, the kernel would bounce all the encrypted pages into
unencrypted pages before sending them to devices.

James
Bug#941611: linux-image-5.2.0-2-amd64: Kernel 5.2 has terrible performance under load
On Wed, 2019-10-02 at 22:07 +0200, Salvatore Bonaccorso wrote:
> > Linux Kernel 5.2 is completely unusable on most of my systems.  The
> > problem seems to be something to do with memory compaction causing
> > intervals where the system becomes unresponsive.
> >
> > This is definitely an upstream issue (my laptop running the
> > upstream kernel is displaying the problem as well) so this bug is
> > really just a warning not to deploy the 5.2 kernel until a fix is
> > found.
>
> If so, could you point where it was reported upstream so we can set
> accordingly where it has been forwarded to?

Well the initial incarnation of this upstream patch set

https://marc.info/?t=15676268933

Seems to fix the problem in my testbeds.  I'm testing out the first two
patches only at the moment.

James
Bug#941611: linux-image-5.2.0-2-amd64: Kernel 5.2 has terrible performance under load
Package: src:linux
Version: 5.2.9-2
Severity: important
Tags: upstream

Dear Maintainer,

Linux Kernel 5.2 is completely unusable on most of my systems.  The
problem seems to be something to do with memory compaction causing
intervals where the system becomes unresponsive.

This is definitely an upstream issue (my laptop running the upstream
kernel is displaying the problem as well) so this bug is really just a
warning not to deploy the 5.2 kernel until a fix is found.

-- Package-specific info:
** Kernel log: boot messages should be attached

** Model information
sys_vendor:
product_name:
product_version:
chassis_vendor:
chassis_version:
bios_vendor: Intel Corp.
bios_version: BX97510J.86A.1209.2006.0601.1340
board_vendor: Intel Corporation
board_name: D975XBX
board_version: AAD27094-305

** PCI devices:
00:00.0 Host bridge [0600]: Intel Corporation 82975X Memory Controller Hub [8086:277c]
	Subsystem: Intel Corporation 82975X Memory Controller Hub [8086:5842]
	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- SERR-
	Kernel modules: i82975x_edac

00:01.0 PCI bridge [0604]: Intel Corporation 82975X PCI Express Root Port [8086:277d] (prog-if 00 [Normal decode])
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- SERR- TAbort- Reset- FastB2B-
		PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
	Capabilities: [88] Subsystem: Intel Corporation 82975X PCI Express Root Port [8086:]
	Capabilities: [80] Power Management version 2
		Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
		Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
	Capabilities: [90] MSI: Enable- Count=1/1 Maskable- 64bit-
		Address: Data:
	Capabilities: [a0] Express (v1) Root Port (Slot+), MSI 00
		DevCap: MaxPayload 128 bytes, PhantFunc 0 ExtTag- RBE-
		DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq- RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
			MaxPayload 128 bytes, MaxReadReq 128 bytes
		DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr- TransPend-
		LnkCap: Port #2, Speed 2.5GT/s, Width x16, ASPM L0s, Exit Latency L0s <256ns
			ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp-
		LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+
			ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
		LnkSta: Speed 2.5GT/s (ok), Width x16 (ok)
			TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
		SltCap: AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd- HotPlug- Surprise-
			Slot #0, PowerLimit 75.000W; Interlock- NoCompl-
		SltCtl: Enable: AttnBtn- PwrFlt- MRL- PresDet- CmdCplt- HPIrq- LinkChg-
			Control: AttnInd Off, PwrInd On, Power- Interlock-
		SltSta: Status: AttnBtn- PowerFlt- MRL- CmdCplt- PresDet+ Interlock-
			Changed: MRL- PresDet+ LinkState-
		RootCtl: ErrCorrectable- ErrNon-Fatal- ErrFatal- PMEIntEna- CRSVisible-
		RootCap: CRSVisible-
		RootSta: PME ReqID , PMEStatus- PMEPending-
	Capabilities: [100 v1] Virtual Channel
		Caps: LPEVC=0 RefClk=100ns PATEntryBits=1
		Arb: Fixed+ WRR32- WRR64- WRR128-
		Ctrl: ArbSelect=Fixed
		Status: InProgress-
		VC0: Caps: PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
			Arb: Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
			Ctrl: Enable+ ID=0 ArbSelect=Fixed TC/VC=01
			Status: NegoPending- InProgress-
	Capabilities: [140 v1] Root Complex Link
		Desc: PortNumber=02 ComponentID=01 EltType=Config
		Link0: Desc: TargetPort=00 TargetComponent=01 AssocRCRB- LinkType=MemMapped LinkValid+
			Addr: fed19000
		Link1: Desc: TargetPort=03 TargetComponent=01 AssocRCRB- LinkType=Config LinkValid+
			Addr: 00:03.0 CfgSpace=00018000
	Kernel driver in use: pcieport

00:1b.0 Audio device [0403]: Intel Corporation NM10/ICH7 Family High Definition Audio Controller [8086:27d8] (rev 01)
	Subsystem: Intel Corporation NM10/ICH7 Family High Definition Audio Controller [8086:0417]
	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- SERR- TAbort-
Re: [PATCH RESEND 2] mvsas: Recognise device/subsystem 9485/9485 as 88SE9485
On Wed, 2014-02-19 at 01:06 +, Ben Hutchings wrote:
> Matt Taggart reported that mvsas didn't bind to the Marvell
> SAS controller on a Supermicro AOC-SAS2LP-MV8 board.  lspci
> reports it as:
>
> 01:00.0 RAID bus controller [0104]: Marvell Technology Group Ltd. Device [1b4b:9485] (rev 03)
> 	Subsystem: Marvell Technology Group Ltd. Device [1b4b:9485]
> [...]
>
> Add it to the device table as chip_9485.

Adding Marvell maintainer to cc.

Can we get an ack on this ... or is mvsas dead and I can just apply it
anyway?

Thanks,

James

> Reported-by: Matt Taggart tagg...@debian.org
> Tested-by: Matt Taggart tagg...@debian.org
> Signed-off-by: Ben Hutchings b...@decadent.org.uk
> ---
>  drivers/scsi/mvsas/mv_init.c | 9 +
>  1 file changed, 9 insertions(+)
>
> diff --git a/drivers/scsi/mvsas/mv_init.c b/drivers/scsi/mvsas/mv_init.c
> index 7b7381d..83fa5f8 100644
> --- a/drivers/scsi/mvsas/mv_init.c
> +++ b/drivers/scsi/mvsas/mv_init.c
> @@ -729,6 +729,15 @@ static struct pci_device_id mvs_pci_table[] = {
>  		.class_mask	= 0,
>  		.driver_data	= chip_9485,
>  	},
> +	{
> +		.vendor		= PCI_VENDOR_ID_MARVELL_EXT,
> +		.device		= 0x9485,
> +		.subvendor	= PCI_ANY_ID,
> +		.subdevice	= 0x9485,
> +		.class		= 0,
> +		.class_mask	= 0,
> +		.driver_data	= chip_9485,
> +	},
>  	{ PCI_VDEVICE(OCZ, 0x1021), chip_9485}, /* OCZ RevoDrive3 */
>  	{ PCI_VDEVICE(OCZ, 0x1022), chip_9485}, /* OCZ RevoDrive3/zDriveR4 (exact model unknown) */
>  	{ PCI_VDEVICE(OCZ, 0x1040), chip_9485}, /* OCZ RevoDrive3/zDriveR4 (exact model unknown) */
>
> --
> Ben Hutchings
> One of the nice things about standards is that there are so many of them.

--
To UNSUBSCRIBE, email to debian-kernel-requ...@lists.debian.org
with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org
Archive: https://lists.debian.org/1397504559.2207.30.ca...@dabdike.int.hansenpartnership.com
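To make the effect of the new table entry concrete, here is a minimal userspace sketch of how a PCI ID table entry with a wildcard field is matched against a device. It is a simplified model, not the kernel's code: `struct id_entry`, `struct dev_ids`, `match_table`, `ANY_ID` and `CHIP_9485` are hypothetical stand-ins, and the class code used in the usage note is only illustrative. The vendor and device numbers are the ones lspci printed above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-ins for the kernel definitions (assumptions for
 * illustration; 0x1b4b is the vendor ID shown by lspci above). */
#define ANY_ID             0xffffffffu
#define VENDOR_MARVELL_EXT 0x1b4bu

enum { CHIP_9485 = 1 };  /* hypothetical value standing in for chip_9485 */

struct id_entry {        /* simplified model of a device table entry */
    uint32_t vendor, device;
    uint32_t subvendor, subdevice;
    uint32_t class_code, class_mask;
    int driver_data;
};

struct dev_ids {         /* the IDs a probed device reports */
    uint32_t vendor, device;
    uint32_t subvendor, subdevice;
    uint32_t class_code;
};

/* A table field matches if it is the wildcard or equals the device's value. */
static int field_matches(uint32_t want, uint32_t have)
{
    return want == ANY_ID || want == have;
}

/* Return the first table entry that matches dev, or NULL if none does. */
static const struct id_entry *match_table(const struct id_entry *tbl, size_t n,
                                          const struct dev_ids *dev)
{
    assert(tbl != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        const struct id_entry *e = &tbl[i];
        if (field_matches(e->vendor, dev->vendor) &&
            field_matches(e->device, dev->device) &&
            field_matches(e->subvendor, dev->subvendor) &&
            field_matches(e->subdevice, dev->subdevice) &&
            !((e->class_code ^ dev->class_code) & e->class_mask))
            return e;
    }
    return NULL;
}

/* Model of the entry added by the patch: any subvendor, subdevice 0x9485. */
static const struct id_entry demo_table[] = {
    { VENDOR_MARVELL_EXT, 0x9485, ANY_ID, 0x9485, 0, 0, CHIP_9485 },
};
```

With this model, a device reporting 1b4b:9485 with subsystem 1b4b:9485 matches the entry whatever its subvendor is, while the same chip with a different subdevice ID does not. A class mask of 0 means the class code is ignored.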
Remove scsi_wait_scan module
scsi_wait_scan was introduced with asynchronous host scanning as a hack
for distributions that weren't using a proper udev based wait for root to
appear in their initramfs scripts.  In 2.6.30

Commit c751085943362143f84346d274e0011419c84202
Author: Rafael J. Wysocki r...@sisk.pl
Date:   Sun Apr 12 20:06:56 2009 +0200

    PM/Hibernate: Wait for SCSI devices scan to complete during resume

Actually broke scsi_wait_scan because it renders
scsi_complete_async_scans() a nop for modular SCSI if you include
scsi_scan.h (which this module does).

The lack of bug reports is sufficient proof that this module is no
longer used.

James

---

diff --git a/drivers/scsi/Kconfig b/drivers/scsi/Kconfig
index bea04e5..ac6ea28 100644
--- a/drivers/scsi/Kconfig
+++ b/drivers/scsi/Kconfig
@@ -263,23 +263,6 @@ config SCSI_SCAN_ASYNC
 	  You can override this choice by specifying scsi_mod.scan=sync or
 	  async on the kernel's command line.
 
-config SCSI_WAIT_SCAN
-	tristate  # No prompt here, this is an invisible symbol.
-	default m
-	depends on SCSI
-	depends on MODULES
-# scsi_wait_scan is a loadable module which waits until all the async scans are
-# complete. The idea is to use it in initrd/ initramfs scripts. You modprobe
-# it after all the modprobes of the root SCSI drivers and it will wait until
-# they have all finished scanning their buses before allowing the boot to
-# proceed.  (This method is not applicable if targets boot independently in
-# parallel with the initiator, or with transports with non-deterministic target
-# discovery schemes, or if a transport driver does not support scsi_wait_scan.)
-#
-# This symbol is not exposed as a prompt because little is to be gained by
-# disabling it, whereas people who accidentally switch it off may wonder why
-# their mkinitrd gets into trouble.
-
 menu "SCSI Transports"
 	depends on SCSI
 
diff --git a/drivers/scsi/Makefile b/drivers/scsi/Makefile
index 8deedea..f188509 100644
--- a/drivers/scsi/Makefile
+++ b/drivers/scsi/Makefile
@@ -161,8 +161,6 @@ obj-$(CONFIG_SCSI_OSD_INITIATOR) += osd/
 
 # This goes last, so that real scsi devices probe earlier
 obj-$(CONFIG_SCSI_DEBUG)	+= scsi_debug.o
 
-obj-$(CONFIG_SCSI_WAIT_SCAN)	+= scsi_wait_scan.o
-
 scsi_mod-y			+= scsi.o hosts.o scsi_ioctl.o constants.o \
 				   scsicam.o scsi_error.o scsi_lib.o
 scsi_mod-$(CONFIG_SCSI_DMA)	+= scsi_lib_dma.o
diff --git a/drivers/scsi/scsi_wait_scan.c b/drivers/scsi/scsi_wait_scan.c
deleted file mode 100644
index 74708fc..0000000
--- a/drivers/scsi/scsi_wait_scan.c
+++ /dev/null
@@ -1,42 +0,0 @@
-/*
- * scsi_wait_scan.c
- *
- * Copyright (C) 2006 James Bottomley james.bottom...@steeleye.com
- *
- * This is a simple module to wait until all the async scans are
- * complete.  The idea is to use it in initrd/initramfs scripts.  You
- * modprobe it after all the modprobes of the root SCSI drivers and it
- * will wait until they have all finished scanning their busses before
- * allowing the boot to proceed
- */
-
-#include <linux/module.h>
-#include <linux/device.h>
-#include <scsi/scsi_scan.h>
-
-static int __init wait_scan_init(void)
-{
-	/*
-	 * First we need to wait for device probing to finish;
-	 * the drivers we just loaded might just still be probing
-	 * and might not yet have reached the scsi async scanning
-	 */
-	wait_for_device_probe();
-	/*
-	 * and then we wait for the actual asynchronous scsi scan
-	 * to finish.
-	 */
-	scsi_complete_async_scans();
-	return 0;
-}
-
-static void __exit wait_scan_exit(void)
-{
-}
-
-MODULE_DESCRIPTION("SCSI wait for scans");
-MODULE_AUTHOR("James Bottomley");
-MODULE_LICENSE("GPL");
-
-late_initcall(wait_scan_init);
-module_exit(wait_scan_exit);
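The "nop for modular SCSI" remark can be illustrated with a small userspace sketch. It assumes, as the message implies but does not show, that the header picks between the real function and an empty inline stub with a guard that only tests the built-in config symbol. For a tristate option set to "m", kbuild defines the `_MODULE` variant of the symbol instead, so the stub wins. All `DEMO_`/`demo_` names are hypothetical.

```c
#include <assert.h>

/* Simulate a modular build: for a tristate symbol set to "m" the build
 * system defines CONFIG_FOO_MODULE rather than CONFIG_FOO.  The DEMO_*
 * names are hypothetical stand-ins for the real config symbols. */
#define DEMO_CONFIG_SCSI_MODULE 1

static int demo_scans_waited;  /* counts calls that really waited */

/* Stand-in for the real implementation that waits for async scans. */
static int demo_real_complete_async_scans(void)
{
    demo_scans_waited++;
    return 0;
}

/* Header-style guard that only tests the built-in symbol (assumed shape). */
#ifdef DEMO_CONFIG_SCSI
static inline int demo_complete_async_scans(void)
{
    return demo_real_complete_async_scans();
}
#else
static inline int demo_complete_async_scans(void) { return 0; } /* stub */
#endif

/* For contrast only: a guard that also accepts the modular symbol. */
#if defined(DEMO_CONFIG_SCSI) || defined(DEMO_CONFIG_SCSI_MODULE)
static inline int demo_complete_async_scans_either(void)
{
    return demo_real_complete_async_scans();
}
#endif

/* What the wait module's init function does in this model. */
static int demo_wait_scan_init(void)
{
    demo_complete_async_scans();
    return 0;
}
```

In this model `demo_wait_scan_init()` returns success without ever reaching the real wait, which matches the behaviour described above: the module loads cleanly and does nothing.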
Bug#622997: libata-sff/pata_cmd64x problem with hardwired configurations
On Tue, 2011-04-19 at 11:20 +0200, Bartlomiej Zolnierkiewicz wrote:
> From: Bartlomiej Zolnierkiewicz bzoln...@gmail.com
> Subject: [PATCH v2] pata_cmd64x: add enablebits checking
>
> Fixes IDE -> libata regression.
>
> IDE's cmd64x host driver has been supporting enablebits checking
> since the initial driver's merge.

Actually, the thread discussing the proposed patches is here:

http://marc.info/?t=13031522715

I much prefer the dummy interface approach to the prereset one because
it prevents any possible poke at the registers which will crash the box.

James
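As a rough illustration of what "enablebits checking" means, the sketch below tests a per-port enable bit in a copy of PCI configuration space before deciding whether a port should be probed at all. This is not the code from the patch under discussion: `struct enable_bits`, `port_enabled`, `count_enabled_ports` and the register offset and masks in `demo_bits` are illustrative assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One enable-bit descriptor: config-space register, mask, expected value. */
struct enable_bits {
    uint8_t reg;
    uint8_t mask;
    uint8_t val;
};

/* A port counts as enabled when the masked register equals val. */
static int port_enabled(const uint8_t *cfg, const struct enable_bits *b)
{
    assert(cfg != NULL && b != NULL);
    return (cfg[b->reg] & b->mask) == b->val;
}

/* Illustrative descriptors only: the register offset and the masks are
 * assumptions for this sketch, not values taken from the patch. */
static const struct enable_bits demo_bits[2] = {
    { 0x51, 0x04, 0x04 },  /* primary port   */
    { 0x51, 0x08, 0x08 },  /* secondary port */
};

/* Count the ports worth probing.  A disabled port would be left alone
 * (for example by giving it a dummy interface) so that its registers
 * are never touched. */
static int count_enabled_ports(const uint8_t *cfg)
{
    int n = 0;
    for (int i = 0; i < 2; i++)
        if (port_enabled(cfg, &demo_bits[i]))
            n++;
    return n;
}
```

The point of checking first is the one made in the reply: on a box where the secondary port is not wired up, the safest code path is one that never reads or writes that port's registers.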
Bug#622997: Debian bug 622997
On Sun, 2011-04-17 at 11:11 -0400, John David Anglin wrote:
> > On Sat, 2011-04-16 at 19:35 -0400, John David Anglin wrote:
> > > On Sat, 16 Apr 2011, James Bottomley wrote:
> > > > Strike that one ... I enabled USB in my 2.6.39-rc3 build and it
> > > > inserts the OHCI module and discovers the ports just fine.
> > >
> > > Boot 2.6.39-rc3 fails for me with attached config.
> >
> > I can't quite build it.  With gcc version 4.2.4 (Debian 4.2.4-6) I'm
> > getting an ICE:
> >
> > net/wireless/reg.c: In function 'freq_reg_info_regd':
> > net/wireless/reg.c:645: internal compiler error: in expand_expr_real_1, at expr.c:8744
> > Please submit a full bug report,
> > with preprocessed source if appropriate.
> > See <URL:http://gcc.gnu.org/bugs.html> for instructions.
> > For Debian GNU/Linux specific bug reporting instructions,
> > see <URL:file:///usr/share/doc/gcc-4.2/README.Bugs>.
> > make[2]: *** [net/wireless/reg.o] Error 1
>
> This is probably fixed as it doesn't occur with gcc version 4.5.3 20110101
> (prerelease) [gcc-4_5-branch revision 168387] (GCC).  GCC 4.2 and 4.3 are
> no longer maintained and there won't be any further releases from these
> branches.  Without looking at the above, it's hard to tell whether the bug
> is a middle-end or backend bug.  Many middle-end bugs are fixed in more
> recent GCC versions.  Although newer versions may bring their own problems,
> we can get help in fixing problems particularly if they are regressions.
>
> The asm delay slot bug affected all GCC versions.  I backported the fix
> to the 4.3, 4.4 and 4.5 branches.  This is a problem in the kernel because
> of the following:
>
> /*
> ** The __asm__ op below simple prevents gcc/ld from reordering
> ** instructions across the mb() call.
> */
> #define mb()		__asm__ __volatile__("":::"memory")	/* barrier() */
>
> It's just a matter of chance whether a barrier ends up in the delay slot
> of a branch in a critical location.

I'll redo optimisation on that one and see if I can avoid this.

> > Plus there's a bug in my kernel code:
> >
> > drivers/usb/host/xhci-pci.c: In function 'xhci_pci_setup':
> > drivers/usb/host/xhci-pci.c:61: error: implicit declaration of function 'kzalloc'
> >
> > If I correct for these (add missing slab.h include and disable wireless)
>
> I had to add missing slab.h as well.  However, I didn't touch wireless with
> 4.5.3.
>
> > and build, the last message I see is
> >
> > turn off boot console ttyB0
> >
> > Which indicates it's got a problem with the console configuration (I
> > don't see any console registration for the DIVA serial port on ttyS1 in
> > the boot log).
>
> Comparing the console output that I recorded for the debian kernel, I see
> udev starts much earlier.  It only has the initial message from the tg3
> driver and SCSI subsystem.

It's most likely a driver module that's getting loaded which is turned
off in the booting configuration ... finding it isn't going to be easy,
though ...

James
Bug#622997: Debian bug 622997
On Sun, 2011-04-17 at 10:23 -0500, James Bottomley wrote:
> It's most likely a driver module that's getting loaded which is turned
> off in the booting configuration ... finding it isn't going to be easy,
> though ...

Finally got a build (had to swap out -Os for -O2).

I traced the module loads and successful inits and found it; it's
pata_cmd64x ... it loads but never returns from init.  I bet it's trying
to poke into ISA space which causes the HPMC.  Removing this one module
from the system allows it to boot again.

I'd suggest just disabling it in the parisc config for now.  Using an ATA
based CD/DVD instead of a SCSI one is a very recent thing.

I'll see if I can get it working, but ATA controllers tend to be
somewhat nasty and x86 specific ...

James
Bug#622997: Debian bug 622997
On Sun, 2011-04-17 at 20:37 +0100, Ben Hutchings wrote:
> On Sun, 2011-04-17 at 14:28 -0500, James Bottomley wrote:
> > I traced the module loads and successful inits and found it; it's
> > pata_cmd64x ... it loads but never returns from init.  I bet it's trying
> > to poke into ISA space which causes the HPMC.  Removing this one module
> > from the system allows it to boot again.
> [...]
>
> We also had a recent report that this driver is also bust on some sparc
> systems.  We could swap back to cmd64x on these architectures but I
> would rather get pata_cmd64x fixed.

Well, I've got a working pata_cmd64x (and now a working CD drive).

The specific issue on parisc (and probably sparc) is that we're using
this siimage chip hard wired to a single DVD drive.  We have no use for
a secondary port, so there isn't one.  The registers for the secondary
port are pointing off into empty space.  When libata-sff tries to touch
the secondary port, we get an instant High Priority Machine Check
because on most non-x86 systems, it's a fault to touch non-responding
memory.

I got it to work by making libata-sff only probe a single port.  Now,
here's the problem: the libata-sff driver is hardwired to probe two
ports, so it will require major surgery to check dynamically how many
ports there are ... and the second problem is that I don't even know how
to check this.

I'll ask about this on linux-ide.

James
Bug#622997: libata-sff/pata_cmd64x problem with hardwired configurations
I've got a parisc system where the DVD drive is hardwired to a silicon
image controller:

00:02.0 IDE interface: Silicon Image, Inc. SiI 0649 Ultra ATA/100 PCI to ATA Host Controller (rev 02) (prog-if 8f [Master SecP SecO PriP PriO])
	Subsystem: Silicon Image, Inc. SiI 0649 Ultra ATA/100 PCI to ATA Host Controller
	Flags: bus master, medium devsel, latency 64, IRQ 69
	I/O ports at 0d18 [size=8]
	I/O ports at 0d24 [size=4]
	I/O ports at 0d10 [size=8]
	I/O ports at 0d20 [size=4]
	I/O ports at 0d00 [size=16]
	Capabilities: [60] Power Management version 2
	Kernel driver in use: pata_cmd64x

The specific problem is that any access to the registers where the
secondary port should be causes an instant fault on the box (I think
because the second port just isn't wired up internally, so the memory
doesn't respond), so the default libata-sff driver that pata_cmd64x is
attached to causes this by insisting on probing both ports.

I can get all of this working by fixing up all the hard coded knowledge
in libata-sff only to use a single port.  However, I can't fix the
libata-sff driver until I know how to tell there's only one port wired.
Does anyone with cmd649 knowledge have any idea how I might tell this?

James
Bug#622997: Debian bug 622997
On Sun, 2011-04-17 at 21:25 -0400, John David Anglin wrote:
> This is excellent detective work.  If I might ask, how did you trace the
> module loads and successful inits?

Heh, you're expecting me to name magic tracing tools?  Well (shuffles
feet) I just put printks in kernel/module.c to do it.  It's basically
impossible to trace a boot problem like this any other way, because we
don't have enough of the system up to use any tools.

James
Bug#622997: Debian bug 622997
On Sat, 2011-04-16 at 14:07 -0400, John David Anglin wrote:
> I posted this debian bug report because the most recent debian SMP kernel
> build fails to boot on my rp3440:
> http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=622997
>
> I don't think debian kernels have worked since lenny.

Hmm, well upstream ones have: so it's likely a patch debian has but
upstream doesn't, or it could be a toolchain issue ... I didn't think
gcc-4.4.5 worked properly on 64 bit without a few patches?

James
Bug#622997: Debian bug 622997
On Sat, 2011-04-16 at 15:29 -0400, John David Anglin wrote:
> > On Sat, 2011-04-16 at 14:07 -0400, John David Anglin wrote:
> > > I posted this debian bug report because the most recent debian SMP
> > > kernel build fails to boot on my rp3440:
> > > http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=622997
> > >
> > > I don't think debian kernels have worked since lenny.
> >
> > Hmm, well upstream ones have: so it's likely a patch debian has but
> > upstream doesn't, or it could be a toolchain issue ... I didn't think
> > gcc-4.4.5 worked properly on 64 bit without a few patches?
>
> Yes, but debian tends to build almost everything.  For some reason, I've
> turned off ipv6.  Unlike many kernel bugs, this one is completely
> reproducible.

I suppose it could be USB ... before I got ion, I didn't have any parisc
systems with USB, so it's turned off in my build.  I'll turn it on and
see if there's a problem there.

James
Bug#622997: Debian bug 622997
On Sat, 2011-04-16 at 15:48 -0500, James Bottomley wrote:
> On Sat, 2011-04-16 at 15:29 -0400, John David Anglin wrote:
> > > On Sat, 2011-04-16 at 14:07 -0400, John David Anglin wrote:
> > > > I posted this debian bug report because the most recent debian SMP
> > > > kernel build fails to boot on my rp3440:
> > > > http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=622997
> > > >
> > > > I don't think debian kernels have worked since lenny.
> > >
> > > Hmm, well upstream ones have: so it's likely a patch debian has but
> > > upstream doesn't, or it could be a toolchain issue ... I didn't think
> > > gcc-4.4.5 worked properly on 64 bit without a few patches?
> >
> > Yes, but debian tends to build almost everything.  For some reason, I've
> > turned off ipv6.  Unlike many kernel bugs, this one is completely
> > reproducible.
>
> I suppose it could be USB ... before I got ion, I didn't have any parisc
> systems with USB, so it's turned off in my build.  I'll turn it on and
> see if there's a problem there.

Strike that one ... I enabled USB in my 2.6.39-rc3 build and it inserts
the OHCI module and discovers the ports just fine.

James
Bug#622997: Debian bug 622997
On Sat, 2011-04-16 at 19:35 -0400, John David Anglin wrote:
> On Sat, 16 Apr 2011, James Bottomley wrote:
> > Strike that one ... I enabled USB in my 2.6.39-rc3 build and it inserts
> > the OHCI module and discovers the ports just fine.
>
> Boot 2.6.39-rc3 fails for me with attached config.

I can't quite build it.  With gcc version 4.2.4 (Debian 4.2.4-6) I'm
getting an ICE:

net/wireless/reg.c: In function 'freq_reg_info_regd':
net/wireless/reg.c:645: internal compiler error: in expand_expr_real_1, at expr.c:8744
Please submit a full bug report,
with preprocessed source if appropriate.
See <URL:http://gcc.gnu.org/bugs.html> for instructions.
For Debian GNU/Linux specific bug reporting instructions,
see <URL:file:///usr/share/doc/gcc-4.2/README.Bugs>.
make[2]: *** [net/wireless/reg.o] Error 1

Plus there's a bug in my kernel code:

drivers/usb/host/xhci-pci.c: In function 'xhci_pci_setup':
drivers/usb/host/xhci-pci.c:61: error: implicit declaration of function 'kzalloc'

If I correct for these (add missing slab.h include and disable wireless)
and build, the last message I see is

turn off boot console ttyB0

Which indicates it's got a problem with the console configuration (I
don't see any console registration for the DIVA serial port on ttyS1 in
the boot log).

James
Bug#561203: threads and fork on machine with VIPT-WB cache
On Tue, 2010-04-06 at 08:37 -0500, James Bottomley wrote:
> > (5) Child process B is waken up and sees old value at x in oldpage,
> >     through different cache line.  B sleeps.
>
> This isn't possible.  At this point, A and B have the same virtual
> address and mapping for oldpage; this means they are the same cache
> colour, so they both see the cached value.

Perhaps to add more detail to this.  In spite of what the arch manual
says (it says the congruence stride is 16MB), the congruence stride on
all manufactured parisc processors is 4MB.  This means that any virtual
addresses, regardless of space id, that are equal modulo 4MB have the
same cache colour.

James
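The "equal modulo 4MB" rule can be written down as a tiny check. This is a simplified sketch of the arithmetic only: `cache_colour` and `same_colour` are hypothetical helper names, the 4MB stride is the figure quoted above, and the space id is deliberately not an input because the message says it plays no part.

```c
#include <assert.h>
#include <stdint.h>

/* Congruence stride quoted in the message above: 4MB. */
#define CONGRUENCE_STRIDE (4u * 1024u * 1024u)

/* Offset of a virtual address within the congruence stride.  The space
 * id is not an argument: per the text it does not affect the colour. */
static uint32_t cache_colour(uint64_t vaddr)
{
    return (uint32_t)(vaddr % CONGRUENCE_STRIDE);
}

/* Two virtual addresses share cache lines when their colours are equal. */
static int same_colour(uint64_t va_a, uint64_t va_b)
{
    return cache_colour(va_a) == cache_colour(va_b);
}
```

Parent and child map oldpage at the same virtual address after fork, so `same_colour()` is trivially true for them. That is the reason given for step (5) of the quoted scenario not being possible.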
Bug#561203: threads and fork on machine with VIPT-WB cache
On Tue, 2010-04-06 at 13:57 +0900, NIIBE Yutaka wrote:
> John David Anglin wrote:
> > It is interesting that in the case of the Debian bug that a thread of
> > the parent process causes the COW break and thereby corrupts its own
> > memory.  As far as I can tell, the fork'd child never writes to the
> > memory that causes the fault.
>
> Thanks for writing and testing a patch.
>
> The case of #561203 is second scenario.  I think that this case is
> relevant to VIVT-WB machine too (provided kernel does copy by kernel
> address).
>
> James Bottomley wrote:
> > So this is going to be a hard sell because of the arch churn.  There
> > are, however, three ways to do it with the original signature.
>
> Currently, I think that signature change would be inevitable for
> ptep_set_wrprotect.

Well we can't do it by claiming several architectures are wrong in their
implementation.  We might do it by claiming to need vma knowledge ...
however, even if you want the flush, as I said, you don't need to change
the signature.

> > 1. implement copy_user_highpage ... this allows us to copy through the
> >    child's page cache (which is coherent with the parent's before the
> >    cow) and thus pick up any cache changes without a flush
>
> Let me think about this way.
>
> Well, this would improve both cases of the first scenario of mine and
> the second scenario.
>
> But... I think that even if we would have copy_user_highpage which
> does copy by user address, we need to flush at ptep_set_wrprotect.  I
> think that we need to keep the condition: no dirty cache for COW page.
>
> Think about third scenario of threads and fork:
>
> (1) In process A, there are multiple threads, and a thread A-1 invokes
>     fork.  We have process B, with a different space identifier color.

I don't understand what you mean by space colour ... there's cache
colour which refers to the line in the cache to which the physical
memory maps.  The way PA is set up, space ID doesn't factor into cache
colour.

> (2) Another thread A-2 in process A runs while A-1 copies memory by
>     dup_mmap.  A-2 writes to the address x in a page.  Let's call this
>     page oldpage.
>
> (3) We have dirty cache for x by A-2 at the time of
>     ptep_set_wrprotect of thread A-1.  Suppose that we don't flush
>     here.
>
> (4) A-1 finishes copy, and sleeps.
>
> (5) Child process B is waken up and sees old value at x in oldpage,
>     through different cache line.  B sleeps.

This isn't possible.  At this point, A and B have the same virtual
address and mapping for oldpage; this means they are the same cache
colour, so they both see the cached value.

James

> (6) A-2 is waken up.  A-2 touches the memory again, breaks COW.  A-2
>     copies data on oldpage to newpage.  OK, newpage is consistent with
>     copy_user_highpage by user address.
>
>     Note that during this copy, the cache line of x by A-2 is flushed
>     out to oldpage.  It invokes another memory fault and COW break.
>     (I think that this memory fault is unhealthy.)
>     Then, new value goes to x on oldpage (when it's physically tagged
>     cache).
>
>     A-2 sleeps.
>
> (7) Child process B is waken up.  When it accesses at x, it sees new
>     value suddenly.
>
> If we flush cache to oldpage at ptep_set_wrprotect, this couldn't
> occur.
>
>                   * * *
>
> I know that we should not do threads and fork.  It is difficult to
> define clean semantics.  Because another thread may touch memory while
> a thread which does memory copy for fork, the memory what the child
> process will see may be inconsistent.  For the child, a page might be
> new, while another page might be old.
>
> For VIVT-WB cache machine, I am considering a possibility for the
> child process to have inconsistent memory even within a single page
> (when we have no flush at ptep_set_wrprotect).
>
> It will be needed for me to talk to linux-arch soon or later.
Bug#561203: threads and fork on machine with VIPT-WB cache
On Sun, 2010-04-04 at 22:51 -0400, John David Anglin wrote: Thanks a lot for the discussion. James Bottomley wrote: So your theory is that the data the kernel sees doing the page copy can be stale because of dirty cache lines in userspace (which is certainly possible in the ordinary way)? Yes. By design that shouldn't happen: the idea behind COW breaking is that before it breaks, the page is read only ... this means that processes can have clean cache copies of it, but never dirty cache copies (because writes are forbidden). That must be design, I agree. To keep this condition (no dirty cache for COW page), we need to flush cache before ptep_set_wrprotect. That's my point. Please look at the code path: (kernel/fork.c) do_fork - copy_process - copy_mm - dup_mm - dup_mmap - (mm/memory.c) copy_page_range - copy_p*d_range - copy_one_pte - ptep_set_wrprotect The function flush_cache_dup_mm is called from dup_mmap, that's enough for a case of a process with single thread. I think that: We need to flush cache before ptep_set_wrprotect for a process with multiple threads. Other threads may change memory after a thread invokes do_fork and before calling ptep_set_wrprotect. Specifically, a process may sleep at pte_alloc function to get a page. I agree. It is interesting that in the case of the Debian bug that a thread of the parent process causes the COW break and thereby corrupts its own memory. As far as I can tell, the fork'd child never writes to the memory that causes the fault. My testing indicates that your suggested change fixes the Debian bug. I've attached below my latest test version. This seems to fix the bug on both SMP and UP kernels. However, it doesn't fix all page/cache related issues on parisc SMP kernels that I commonly see. 
> My first inclination even before reading your analysis was to assume that copy_user_page was broken (i.e., that even if a processor cache was dirty when the COW page was write protected, it should be possible to do the flush before the page is copied). However, this didn't seem to work...
>
> Possibly, there are issues with aliased addresses. I note that sparc flushes the entire cache and purges the entire tlb in kmap_atomic/kunmap_atomic for highmem. Although the breakage that I see is not limited to PA8800/PA8900, I'm not convinced that we maintain coherency that is required for these processors in copy_user_page when we have multiple threads.
>
> As a side note, kmap_atomic/kunmap_atomic seem to lack calls to pagefault_disable()/pagefault_enable() on PA8800.
>
> Dave
> --
> J. David Anglin dave.ang...@nrc-cnrc.gc.ca
> National Research Council of Canada (613) 990-0752 (FAX: 952-6602)
>
> diff --git a/arch/parisc/include/asm/pgtable.h b/arch/parisc/include/asm/pgtable.h
> index a27d2e2..b140d5c 100644
> --- a/arch/parisc/include/asm/pgtable.h
> +++ b/arch/parisc/include/asm/pgtable.h
> @@ -14,6 +14,7 @@
>  #include <linux/bitops.h>
>  #include <asm/processor.h>
>  #include <asm/cache.h>
> +extern void flush_cache_page(struct vm_area_struct *vma, unsigned long vmaddr, unsigned long pfn);
>
>  /*
>   * kern_addr_valid(ADDR) tests if ADDR is pointing to valid kernel
> @@ -456,17 +457,22 @@ static inline pte_t ptep_get_and_clear(struct mm_struct *mm, unsigned long addr,
>          return old_pte;
>  }
>
> -static inline void ptep_set_wrprotect(struct mm_struct *mm, unsigned long addr, pte_t *ptep)
> +static inline void ptep_set_wrprotect(struct vm_area_struct *vma, struct mm_struct *mm, unsigned long addr, pte_t *ptep)
>  {
>  #ifdef CONFIG_SMP
>          unsigned long new, old;
> +#endif
> +        pte_t old_pte = *ptep;
> +
> +        if (atomic_read(&mm->mm_users) > 1)

Just to verify there's nothing this is hiding, can you make this

if (pte_dirty(old_pte))

and reverify? The if clause should only trip on the case where the parent has dirtied the line between flush and now.
> +                flush_cache_page(vma, addr, pte_pfn(old_pte));
> +#ifdef CONFIG_SMP
>          do {
>                  old = pte_val(*ptep);
>                  new = pte_val(pte_wrprotect(__pte (old)));
>          } while (cmpxchg((unsigned long *) ptep, old, new) != old);
>  #else
> -        pte_t old_pte = *ptep;
>          set_pte_at(mm, addr, ptep, pte_wrprotect(old_pte));
>  #endif
>  }
> diff --git a/mm/memory.c b/mm/memory.c
> index 09e4b1b..21c2916 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -616,7 +616,7 @@ copy_one_pte(struct mm_struct *dst_mm, struct mm_struct *src_mm,
>           * in the parent and the child
>           */
>          if (is_cow_mapping(vm_flags)) {
> -                ptep_set_wrprotect(src_mm, addr, src_pte);
> +                ptep_set_wrprotect(vma, src_mm, addr, src_pte);

So this is going to be a hard sell because of the arch churn. There are, however, three ways to do it with the original signature.

1. implement copy_user_highpage ... this allows us
Bug#545229: linux-image-2.6.30-1-parisc: panic on boot
Package: linux-image-2.6.30-1-parisc Version: 2.6.30-6 Severity: critical Tags: patch Justification: breaks the whole system -- Package-specific info: -- System Information: Debian Release: squeeze/sid APT prefers testing APT policy: (650, 'testing'), (500, 'stable') Architecture: hppa (parisc) Kernel: Linux 2.6.26-2-parisc Locale: LANG=C, LC_CTYPE=C (charmap=ANSI_X3.4-1968) Shell: /bin/sh linked to /bin/bash Versions of packages linux-image-2.6.30-1-parisc depends on: ii debconf [debconf-2.0] 1.5.27 Debian configuration management sy ii initramfs-tools [linux-initra 0.93.4 tools for generating an initramfs ii module-init-tools 3.10-3 tools for managing Linux kernel mo linux-image-2.6.30-1-parisc recommends no packages. Versions of packages linux-image-2.6.30-1-parisc suggests: pn linux-doc-2.6.30 none (no description available) ii palo 1.16+nmu1 Linux boot loader for parisc/hppa -- debconf information: linux-image-2.6.30-1-parisc/postinst/kimage-is-a-directory: linux-image-2.6.30-1-parisc/postinst/old-initrd-link-2.6.30-1-parisc: true linux-image-2.6.30-1-parisc/preinst/lilo-has-ramdisk: linux-image-2.6.30-1-parisc/preinst/abort-overwrite-2.6.30-1-parisc: linux-image-2.6.30-1-parisc/postinst/old-system-map-link-2.6.30-1-parisc: true linux-image-2.6.30-1-parisc/preinst/failed-to-move-modules-2.6.30-1-parisc: linux-image-2.6.30-1-parisc/prerm/removing-running-kernel-2.6.30-1-parisc: true linux-image-2.6.30-1-parisc/postinst/bootloader-test-error-2.6.30-1-parisc: linux-image-2.6.30-1-parisc/postinst/create-kimage-link-2.6.30-1-parisc: true linux-image-2.6.30-1-parisc/postinst/depmod-error-initrd-2.6.30-1-parisc: false shared/kernel-image/really-run-bootloader: true linux-image-2.6.30-1-parisc/preinst/lilo-initrd-2.6.30-1-parisc: true linux-image-2.6.30-1-parisc/postinst/old-dir-initrd-link-2.6.30-1-parisc: true linux-image-2.6.30-1-parisc/preinst/elilo-initrd-2.6.30-1-parisc: true linux-image-2.6.30-1-parisc/preinst/overwriting-modules-2.6.30-1-parisc: true 
linux-image-2.6.30-1-parisc/preinst/abort-install-2.6.30-1-parisc:
linux-image-2.6.30-1-parisc/postinst/bootloader-error-2.6.30-1-parisc:
linux-image-2.6.30-1-parisc/preinst/bootloader-initrd-2.6.30-1-parisc: true
linux-image-2.6.30-1-parisc/postinst/depmod-error-2.6.30-1-parisc: false
linux-image-2.6.30-1-parisc/preinst/initrd-2.6.30-1-parisc:
linux-image-2.6.30-1-parisc/prerm/would-invalidate-boot-loader-2.6.30-1-parisc: true

---

All current debian 2.6.30-1 kernels panic on boot on parisc systems when loading the initial modules. The problem is actually caused by binutils outputting duplicate .text section names. However, this trips a panic on boot because kernel/module.c has insufficient error checking for this case.

Patches to fix this are

From 1b364bf438cf337a3818aee77d68c0713f3e1fc4 Mon Sep 17 00:00:00 2001
From: James Bottomley james.bottom...@hansenpartnership.com
Date: Wed, 26 Aug 2009 22:04:12 +0930
Subject: module: workaround duplicate section names

and to fix up that patch

From ea6bff368548d79529421a9dc0710fc5330eb504 Mon Sep 17 00:00:00 2001
From: Ingo Molnar mi...@elte.hu
Date: Fri, 28 Aug 2009 10:44:56 +0200
Subject: modules: Fix build error in the !CONFIG_KALLSYMS case

From 1b364bf438cf337a3818aee77d68c0713f3e1fc4 Mon Sep 17 00:00:00 2001
From: James Bottomley james.bottom...@hansenpartnership.com
Date: Wed, 26 Aug 2009 22:04:12 +0930
Subject: module: workaround duplicate section names

The root cause is a duplicate section name (.text); is this legal? [ Amerigo Wang: AFAIK, yes.
]

However, there's a problem with commit 6d76013381ed28979cd122eb4b249a88b5e384fa in that if you fail to allocate a mod->sect_attrs (in this case it's null because of the duplication), it still gets used without checking in add_notes_attrs()

This should fix it

[ This patch leaves other problems, particularly the sections directory, but recent parisc toolchains seem to produce these modules and this prevents a crash and is a minimal change -- RR ]

Signed-off-by: James Bottomley james.bottom...@suse.de
Signed-off-by: Rusty Russell ru...@rustcorp.com.au
Tested-by: Helge Deller del...@gmx.de
Signed-off-by: Linus Torvalds torva...@linux-foundation.org
---
 kernel/module.c |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/kernel/module.c b/kernel/module.c
index 07c80e6..eccb561 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -2355,7 +2355,8 @@ static noinline struct module *load_module(void __user *umod,
         if (err < 0)
                 goto unlink;
         add_sect_attrs(mod, hdr->e_shnum, secstrings, sechdrs);
-        add_notes_attrs(mod, hdr->e_shnum, secstrings, sechdrs);
+        if (mod->sect_attrs)
+                add_notes_attrs(mod, hdr->e_shnum, secstrings, sechdrs);

         /* Get rid of temporary copy */
         vfree(hdr);
--
1.6.0.2
Bug#541702: linux-image-2.6.30-1-686: Kernel fails to start networking because no e100 firmware
On Sat, 2009-08-15 at 19:21 +0100, Ben Hutchings wrote:
> On Sat, 2009-08-15 at 10:47 -0700, james.bottom...@hansenpartnership.com wrote:
> > Package: linux-image-2.6.30-1-686
> > Version: 2.6.30-5
> > Severity: serious
> > Justification: Policy 2.2.1
>
> That very same section explains why we cannot do what you are suggesting!

No, it doesn't ... the decision to put firmware-linux in non-free is obviously wrong, since the same firmware was shipped as is in main with 2.6.26-2

> > On upgrade from 2.6.30-2-686 networking (on a remote machine) failed to start, meaning that a support ticket had to be opened for KVM access.
>
> I don't recommend running unstable on production machines.

If you bother to read the bug report, you'd see it's actually running testing.

> > Diagnosis revealed that the e100 driver in 2.6.26-2-686 required no firmware, so the firmware-linux package wasn't installed. Apparently 2.6.30-1-686 was built with external firmware for the e100 so it now depends on the firmware-linux package.
> >
> > This is a serious policy violation because required hardware stops working after the upgrade.
>
> No, most systems do not require the firmware-linux package.

That's not really relevant, is it? linux-image ships with a ton of drivers most systems don't use as well. The point is that what was working before the upgrade didn't work after it.

> > Fix suggested is to make 2.6.30-1-686 depend on linux-firmware so that on upgrade the necessary firmware is present.
>
> I intend to ensure that firmware-linux is mentioned in the release notes for squeeze, but it cannot be recommended or made a dependency.

So this amounts to ... assuming the user can find the notice (because there's a blizzard of notices that go with each upgrade, particularly if they're going from lenny -> squeeze) you'll tell them that you broke their system?

The point here is to try and ensure large numbers of systems don't break before this exits testing for stable.
James
Bug#541702: linux-image-2.6.30-1-686: Kernel fails to start networking because no e100 firmware
OK, so let's go back to basics here. The point of a bug report is to report a bug. The bug here is that large numbers of systems will break on upgrade to this kernel once it hits stable. This is the problem that needs fixing.

The fact that you find the suggested fix politically incorrect, or that you don't think I should have been able to find the bug in the first place, is irrelevant to the fact that the bug exists.

Apart from being appallingly bad release practice, breaking a significant fraction of users on an upgrade is also a debian policy violation as I've cited (the package is too buggy to release because of all the breakage).

Trying to describe this as fixed because you'll put it in the release notes is wrong in principle because it doesn't prevent the existing users from suffering breakage a priori.

A pre upgrade script that detected the problem based on the runtime detection that the user needed modules with firmware now in firmware-linux would be acceptable. Just stop, print the warning and allow them to OK or cancel. The list of modules now requiring firmware surely isn't non-free and it can be derived from the linux build system fairly easily.
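A minimal sketch of the kind of pre upgrade check described above, assuming a POSIX shell, /proc/modules, and a modinfo that accepts -k to select the kernel version. The function names missing_firmware and check_upgrade are hypothetical, as is the choice to look only at the currently loaded modules; this is an illustration, not what the Debian packaging actually does.

```shell
#!/bin/sh
# Sketch only: both function names are made up for illustration.

# missing_firmware DIR FILE...
# Print each firmware FILE that does not exist under DIR.
missing_firmware() {
    dir=$1
    shift
    for fw in "$@"; do
        [ -e "$dir/$fw" ] || printf '%s\n' "$fw"
    done
}

# check_upgrade KERNEL_VERSION [FIRMWARE_DIR]
# For every module loaded right now, ask the *new* kernel's copy of that
# module which firmware files it declares, and warn about absent ones.
# Returns non-zero if anything is missing, so a caller can stop and ask
# the user to OK or cancel.
check_upgrade() {
    kver=$1
    fwdir=${2:-/lib/firmware}
    while read -r mod rest; do
        modinfo -k "$kver" -F firmware "$mod" 2>/dev/null
    done < /proc/modules | sort -u | (
        status=0
        while read -r fw; do
            if [ -n "$(missing_firmware "$fwdir" "$fw")" ]; then
                echo "WARNING: $fw is needed by the new kernel but is not in $fwdir"
                status=1
            fi
        done
        exit $status
    )
}
```

A preinst script could then call something like `check_upgrade 2.6.30-1-686` and prompt before continuing when it returns non-zero.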
Bug#527265: [scott.bai...@eds.com: Bug#527265: linux-image-2.6.29-1-alpha-smp: detached firmware qlogic/1040.bin fails to load for qla1280]
On Wed, 2009-05-06 at 16:19 +0200, maximilian attems wrote:
> [4194023.390744] [ cut here ]
> [4194023.448362] WARNING: at /build/buildd-linux-2.6_2.6.29-3-alpha-bvFcox/linux-2.6-2.6.29/debian/build/source_alpha_none/kernel/so

Is there any way we can get what that file and line actually is? It looks like the kernel build hasn't truncated the path name to top of tree for some reason (did you build with non standard options)?

I suspect it might just be a lockdep error about calling request firmware with interrupts disabled.

Could you also check to see you have this fix in your kernel:

commit 0ce49d6da993adf8b17b7f3ed9805ade14a6a6f3
Author: David Woodhouse david.woodho...@intel.com
Date: Wed Apr 8 01:22:36 2009 -0700

    qla1280: Fix off-by-some error in firmware loading.

James
Re: Debian parisc config for 2.6.26 broke the real time clock
On Tue, 2008-09-09 at 10:58 -0600, dann frazier wrote:
> On Mon, Sep 08, 2008 at 11:58:22AM -0500, James Bottomley wrote:
> > On Mon, 2008-09-08 at 18:37 +0200, Bastian Blank wrote:
> > > On Sat, Sep 06, 2008 at 10:06:26AM -0500, James Bottomley wrote:
> > > > Parisc is a CONFIG_GEN_RTC architecture (we use the generic real time clock driver).
> > >
> > > Well.
> > >
> > > > Starting with 2.6.26, debian is now enabling CONFIG_RTC_CLASS (for platforms with specific RTC drivers) which disables CONFIG_GEN_RTC and means that hwclock (and ntp tracking) are broken on parisc with debian kernels 2.6.26 and above.
> > >
> > > Yes. Most arches already needs it anyway.
> > >
> > > > All of the arch/parisc/config files get this right, so someone at debian must have screwed up somehow. The config option CONFIG_RTC_CLASS must be set to 'N' for all parisc systems.
> > >
> > > Apparently hppa uses its special rtc type. I propose that you take a look at drivers/rtc/rtc-ppc.c and write a wrapper rtc module.
> >
> > That may be the way forwards depending on what the RTC developers say, but it certainly won't fix the 2.6.26 regression.
> >
> > > > I'd suggest checking the debian kernel configs against the in-tree default files to see if there are any other cockups like this.
> > >
> > > If there are, this are bugs in the kernel themself. I'm no hppa developer so I won't waste my time with such.
> >
> > Hardly, the bug is actually in the debian configs. You have CONFIG_RTC_CLASS as a generic override. This is wrong, it needs to be subordinate to CONFIG_GEN_RTC.
> >
> > The quick fix would be to move the CONFIG_RTC_CLASS sequence down from generic to the architectures ... if you're incapable of doing that, I might be able to find time to look at doing it for you.
> >
> > The real bug looks to be the debian config system which relies on concatenation ... what's really needed is a way of turning CONFIG_RTC_CLASS off on parisc while keeping RTC_CLASS generic for those architectures that need it.
> fyi, I've got an hppa build in progress - disabling RTC_CLASS causes the symbols below to be removed (essentially, ^rtc_*)

Thanks Dann ... are you the person in charge of the builds then?

> I'm not sure what's behind the hppa/ABI removal commits - I didn't see this discussed on the list - but I think it should be safe to remove these symbols from the ABI files if we can demonstrate that none of the conglomerate packages use them.
>
> rtc_class_close
> rtc_class_open
> rtc_device_register
> rtc_device_unregister
> rtc_irq_register
> rtc_irq_set_freq
> rtc_irq_set_state
> rtc_irq_unregister
> rtc_month_days
> rtc_read_alarm
> rtc_read_time
> rtc_set_alarm
> rtc_set_mmss
> rtc_set_time
> rtc_time_to_tm
> rtc_tm_to_time
> rtc_update_irq
> rtc_valid_tm
> rtc_year_days

They certainly have to be inessential to the parisc ABI ... they don't work if anything's actually trying to use them.

James
Re: Debian parisc config for 2.6.26 broke the real time clock
On Tue, 2008-09-09 at 19:38 +0200, Bastian Blank wrote:
> On Tue, Sep 09, 2008 at 10:58:35AM -0600, dann frazier wrote:
> > fyi, I've got an hppa build in progress - disabling RTC_CLASS causes the symbols below to be removed (essentially, ^rtc_*)
>
> The whole new-style rtc support.

But it doesn't work, that's rather the point of all of this. If it doesn't work, it can hardly be a committed ABI.

> > I'm not sure what's behind the hppa/ABI removal commits - I didn't see this discussed on the list - but I think it should be safe to remove these symbols from the ABI files if we can demonstrate that none of the conglomerate packages use them.
>
> Why does noone write this 30 lines module and fix it for all?

Because only an idiot would fix a bug in a released product by introducing a new feature: that's QA and release process 101. New features belong in the next merge window, which will be for 2.6.28, by which time it should have been reasonably QA'd by the parisc developers.

James
Re: Debian parisc config for 2.6.26 broke the real time clock
On Tue, 2008-09-09 at 20:01 +0200, Bastian Blank wrote:
> On Tue, Sep 09, 2008 at 12:48:35PM -0500, James Bottomley wrote:
> > They certainly have to be inessential to the parisc ABI ... they don't work if anything's actually trying to use them.
>
> Really? Which sort of don't work is this? Why should a I2C rtc device (some dallas chip) not work?

Um, because the architecture doesn't have an i2c bus.

James
Re: Debian parisc config for 2.6.26 broke the real time clock
On Tue, 2008-09-09 at 20:29 +0200, Bastian Blank wrote:
> On Tue, Sep 09, 2008 at 01:12:01PM -0500, James Bottomley wrote:
> > On Tue, 2008-09-09 at 20:01 +0200, Bastian Blank wrote:
> > > On Tue, Sep 09, 2008 at 12:48:35PM -0500, James Bottomley wrote:
> > > > They certainly have to be inessential to the parisc ABI ... they don't work if anything's actually trying to use them.
> > >
> > > Really? Which sort of don't work is this? Why should a I2C rtc device (some dallas chip) not work?
> >
> > Um, because the architecture doesn't have an i2c bus.
>
> Well, it have USB, so can also power usb-to-i2c adapters. And there is even the rtc test module.

Um you mean i2c_tiny_usb? It doesn't drive any supported hardware ... you have to build the connection yourself. Plus only the latest revs of PA actually supported USB ...

> Which don't work do you refer to?
> - Does not work because there is no binding to the hardware.
> - Does not work because a fundamental problem in the whole subsystem.
> (- Does not work because ...)

Well, like most real world systems, you can artificially construct pathological failure cases. If I were you I'd stop looking for the Heath Robinson ones. No-one in their right mind is going to construct a USB to I2C interface for the purpose of running an I2C RTC; the set of users is clearly empty. The way you would get an external RTC is via a more credible interface like PCI (or EISA/ISA), from a watchdog card ... however, no-one's apparently written a RTC interface for any of those yet.

James
Re: Debian parisc config for 2.6.26 broke the real time clock
On Mon, 2008-09-08 at 18:37 +0200, Bastian Blank wrote:
> On Sat, Sep 06, 2008 at 10:06:26AM -0500, James Bottomley wrote:
> > Parisc is a CONFIG_GEN_RTC architecture (we use the generic real time clock driver).
>
> Well.
>
> > Starting with 2.6.26, debian is now enabling CONFIG_RTC_CLASS (for platforms with specific RTC drivers) which disables CONFIG_GEN_RTC and means that hwclock (and ntp tracking) are broken on parisc with debian kernels 2.6.26 and above.
>
> Yes. Most arches already needs it anyway.
>
> > All of the arch/parisc/config files get this right, so someone at debian must have screwed up somehow. The config option CONFIG_RTC_CLASS must be set to 'N' for all parisc systems.
>
> Apparently hppa uses its special rtc type. I propose that you take a look at drivers/rtc/rtc-ppc.c and write a wrapper rtc module.

That may be the way forwards depending on what the RTC developers say, but it certainly won't fix the 2.6.26 regression.

> > I'd suggest checking the debian kernel configs against the in-tree default files to see if there are any other cockups like this.
>
> If there are, this are bugs in the kernel themself. I'm no hppa developer so I won't waste my time with such.

Hardly, the bug is actually in the debian configs. You have CONFIG_RTC_CLASS as a generic override. This is wrong, it needs to be subordinate to CONFIG_GEN_RTC.

The quick fix would be to move the CONFIG_RTC_CLASS sequence down from generic to the architectures ... if you're incapable of doing that, I might be able to find time to look at doing it for you.

The real bug looks to be the debian config system which relies on concatenation ... what's really needed is a way of turning CONFIG_RTC_CLASS off on parisc while keeping RTC_CLASS generic for those architectures that need it.

James
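To illustrate the difference between plain concatenation and a real override, here is a sketch of a merge in which a later (per-architecture) fragment replaces a setting made by an earlier (generic) one. The function name merge_config and the idea of passing fragments as files are assumptions made for the example; this is not how the Debian config tooling is actually implemented.

```shell
#!/bin/sh
# Sketch only: merge_config is a hypothetical helper.

# merge_config FRAGMENT...
# Read kernel config fragments in order and print one line per symbol.
# When a symbol appears more than once, the last definition wins, so an
# architecture fragment can switch off something the generic one enabled.
merge_config() {
    awk '
        # Remember the latest line seen for a symbol, keeping the
        # position of its first appearance for stable output order.
        function note(sym) {
            if (!(sym in val)) order[++n] = sym
            val[sym] = $0
        }
        /^# CONFIG_[A-Za-z0-9_]+ is not set$/ { note($2); next }
        /^CONFIG_[A-Za-z0-9_]+=/ {
            sym = $0
            sub(/=.*/, "", sym)
            note(sym)
            next
        }
        END { for (i = 1; i <= n; i++) print val[order[i]] }
    ' "$@"
}
```

With a generic fragment containing `CONFIG_RTC_CLASS=y` and a parisc fragment containing `# CONFIG_RTC_CLASS is not set` followed by `CONFIG_GEN_RTC=y`, calling `merge_config generic parisc` leaves RTC_CLASS off for parisc only, while other architectures that merge just the generic fragment keep it enabled.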
Bug#476285: linux-image-2.6.24-1-parisc: panics on boot in cmpxchg_futex_value_locked
Package: linux-image-2.6.24-1-parisc
Version: 2.6.24-5
Severity: critical
Tags: patch
Justification: breaks the whole system

This actually isn't just a bug in debian; it affects every distro which uses the stable tree as a base. For instance, the gentoo bug is here:

http://bugs.gentoo.org/show_bug.cgi?id=217030

The panic is:

backtrace:
 [10587970] init+0x20/0xc4
 [105807e0] kernel_init+0xf4/0x328
 [10109c5c] ret_from_kernel_thread+0x1c/0x24

Kernel Fault: Code=26 regs=8fc241c0 (Addr=)
 YZrvWESTHLNXBCVMcbcbcbcbOGFRQPDI
PSW: 0100 Not tainted
r00-03 0004ff0f 104fc140 10587970 f0412000
r04-07 105b57c0
r08-11 1059b810 105b5810 104c3810
r12-15 10568810 1059b810 8fc24088 3b9aca00
r16-19 f8c4 f17c f174
r20-23 4000 07ff 10587950 0001
r24-27 104c6010
r28-31 8fc24000 c99f4bdd 8fc241c0 105807e0
sr00-03
sr04-07

IASQ: IAOQ: 101433b8 101433bc
 IIR: 0f401089ISR: IOR:
 CPU:0 CR30: 8fc24000 CR31:
 ORIG_R28:
 IAOQ[0]: cmpxchg_futex_value_locked+0x28/0x9c
 IAOQ[1]: cmpxchg_futex_value_locked+0x2c/0x9c
 RP(r2): init+0x20/0xc4
Kernel panic - not syncing: Kernel Fault

The root cause is a backport of this commit:

commit a0c1e9073ef7428a14309cba010633a6cd6719ea
Author: Thomas Gleixner [EMAIL PROTECTED]
Date: Sat Feb 23 15:23:57 2008 -0800

    futex: runtime enable pi and robust functionality

to the stable tree (went in for 2.6.24.4). This breaks parisc because we weren't set up to process NULL as a futex cmpxchg address. We found and fixed the bug upstream as:

commit c20a84c91048c76c1379011c96b1a5cee5c7d9a0
Author: Kyle McMartin [EMAIL PROTECTED]
Date: Sat Mar 1 10:25:52 2008 -0800

    [PARISC] futex: special case cmpxchg NULL in kernel space

but, because we didn't know tglx had requested a backport, the fix wasn't backported to stable. I'll send the necessary patch into stable, but to get parisc working again on debian it has to be applied on top of the current kernel.

NOTE: This bug was introduced into 2.6.24.4; 2.6.24.3 doesn't have it.
-- System Information:
Debian Release: lenny/sid
  APT prefers testing
  APT policy: (650, 'testing')
Architecture: hppa (parisc)

Kernel: Linux 2.6.22-3-parisc
Locale: LANG=C, LC_CTYPE=C (charmap=ANSI_X3.4-1968)
Shell: /bin/sh linked to /bin/bash

Versions of packages linux-image-2.6.24-1-parisc depends on:
ii  debconf [debconf-2.0]        1.5.20       Debian configuration management sy
ii  initramfs-tools [linux-initr 0.91e        tools for generating an initramfs
ii  module-init-tools            3.3-pre11-4  tools for managing Linux kernel mo

linux-image-2.6.24-1-parisc recommends no packages.

-- debconf information excluded

*** parisc-cmpxchg-fix.diff
From c8d402df60b3aad85b30cfe7df20f829ef6eb895 Mon Sep 17 00:00:00 2001
From: Kyle McMartin [EMAIL PROTECTED]
Date: Sat, 1 Mar 2008 10:25:52 -0800
Subject: [PARISC] futex: special case cmpxchg NULL in kernel space

Commit a0c1e9073ef7428a14309cba010633a6cd6719ea added code to futex.c to detect whether futex_atomic_cmpxchg_inatomic was implemented at run time:

+ curval = cmpxchg_futex_value_locked(NULL, 0, 0);
+ if (curval == -EFAULT)
+ futex_cmpxchg_enabled = 1;

This is bogus on parisc, since page zero in kernel virtual space is the gateway page for syscall entry, and should not be read from the kernel. (That, and we really don't like the kernel faulting on its own address space...)

Signed-off-by: Kyle McMartin [EMAIL PROTECTED]
---
 include/asm-parisc/futex.h | 10 --
 1 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/include/asm-parisc/futex.h b/include/asm-parisc/futex.h
index dbee6e6..fdc6d05 100644
--- a/include/asm-parisc/futex.h
+++ b/include/asm-parisc/futex.h
@@ -56,6 +56,12 @@ futex_atomic_cmpxchg_inatomic(int __user *uaddr, int oldval, int newval)
         int err = 0;
         int uval;
+        /* futex.c wants to do a cmpxchg_inatomic on kernel NULL, which is
+         * our gateway page, and causes no end of trouble...
+         */
+        if (segment_eq(KERNEL_DS, get_fs()) && !uaddr)
+                return -EFAULT;
+
         if (!access_ok(VERIFY_WRITE, uaddr, sizeof(int)))
                 return -EFAULT;
@@ -67,5 +73,5 @@ futex_atomic_cmpxchg_inatomic(int __user *uaddr, int oldval, int newval)
         return uval;
 }
-#endif
-#endif
+#endif /*__KERNEL__*/
+#endif /*_ASM_PARISC_FUTEX_H*/
--
1.5.3.8

-- To
Bug#476292: linux-image-2.6.24-1-parisc64: 64 bit kernel panics on boot in handle_interruption
Package: linux-image-2.6.24-1-parisc64 Version: 2.6.24-5 Severity: critical Tags: patch Justification: breaks the whole system The parisc 64 bit kernel panics on boot with this: CC net/ipv4/netfilter/iptable_raw.mod.o CC net/ipv4/tcp_diag.mod.o CC net/ipv4/tunnel4.mod.o CC net/ipv4/xfrm4_mode_beet.mod.o CC net/ipv4/xfrm4_tunnel.mod.o CC net/key/af_key.mod.o CC net/llc/llc.mod.o CC net/llc/llc2.mod.o CC net/netfilter/nfnetlink_log.mod.o CC net/netfilter/nfnetlink.mod.o CC net/netfilter/nfnetlink_queue.mod.o CC net/netfilter/xt_CLASSIFY.mod.o CC net/netfilter/x_tables.mod.o CC net/netfilter/xt_DSCP.mod.o CC net/netfilter/xt_MARK.mod.o CC net/netfilter/xt_NFQUEUE.mod.o CC net/netfilter/xt_comment.mod.o CC net/netfilter/xt_dccp.mod.o CC net/netfilter/xt_dscp.mod.o CC net/netfilter/xt_esp.mod.o CC net/netfilter/xt_length.mod.o CC net/netfilter/xt_limit.mod.o CC net/netfilter/xt_mac.mod.o CC net/netfilter/xt_mark.mod.o CC net/netfilter/xt_multiport.mod.o CC net/netfilter/xt_pkttype.mod.o CC net/netfilter/xt_policy.mod.o CC net/netfilter/xt_realm.mod.o CC net/netfilter/xt_sctp.mod.o CC net/netfilter/xt_string.mod.o CC net/netfilter/xt_tcpmss.mod.o CC net/netfilter/xt_tcpudp.mod.o CC net/packet/af_packet.mod.o CC net/sctp/sctp.mod.o CC net/sunrpc/auth_gss/auth_rpcgss.mod.o CC net/sunrpc/auth_gss/rpcsec_gss_krb5.mod.o CC net/sunrpc/auth_gss/rpcsec_gss_spkm3.mod.o CC net/sunrpc/sunrpc.mod.o CC net/tipc/tipc.mod.o CC net/xfrm/xfrm_user.mod.o CC sound/ac97_bus.mod.o CC sound/core/oss/snd-mixer-oss.mod.o CC sound/core/oss/snd-pcm-oss.mod.o CC sound/core/seq/oss/snd-seq-oss.mod.o CC sound/core/seq/snd-seq-device.mod.o CC sound/core/seq/snd-seq-dummy.mod.o CC sound/core/seq/snd-seq-midi-event.mod.o CC sound/core/seq/snd-seq-midi.mod.o CC sound/core/seq/snd-seq.mod.o CC sound/core/snd-hwdep.mod.o CC sound/core/snd-page-alloc.mod.o CC sound/core/snd-pcm.mod.o CC sound/core/snd-rawmidi.mod.o CC sound/core/snd-timer.mod.o CC sound/core/snd.mod.o CC sound/parisc/snd-harmony.mod.o CC 
sound/pci/ac97/snd-ac97-codec.mod.o CC sound/pci/rme9652/snd-hdspm.mod.o CC sound/pci/snd-ad1889.mod.o LD [M] crypto/aes_generic.ko CC sound/soundcore.mod.o LD [M] crypto/anubis.ko LD [M] crypto/arc4.ko LD [M] crypto/blkcipher.ko LD [M] crypto/blowfish.ko LD [M] crypto/cast5.ko LD [M] crypto/cast6.ko LD [M] crypto/cbc.ko LD [M] crypto/crc32c.ko LD [M] crypto/crypto_null.ko LD [M] crypto/deflate.ko LD [M] crypto/des_generic.ko LD [M] crypto/ecb.ko LD [M] crypto/khazad.ko LD [M] crypto/gf128mul.ko LD [M] crypto/md4.ko LD [M] crypto/md5.ko LD [M] crypto/michael_mic.ko LD [M] crypto/serpent.ko LD [M] crypto/sha256_generic.ko LD [M] crypto/sha512.ko LD [M] crypto/tcrypt.ko LD [M] crypto/tea.ko LD [M] crypto/tgr192.ko LD [M] crypto/twofish.ko LD [M] crypto/twofish_common.ko LD [M] crypto/wp512.ko LD [M] drivers/base/firmware_class.ko LD [M] drivers/block/aoe/aoe.ko LD [M] drivers/block/cryptoloop.ko LD [M] drivers/block/loop.ko LD [M] drivers/block/pktcdvd.ko LD [M] drivers/block/sx8.ko LD [M] drivers/block/ub.ko LD [M] drivers/block/umem.ko LD [M] drivers/cdrom/cdrom.ko LD [M] drivers/char/lp.ko LD [M] drivers/char/agp/parisc-agp.ko LD [M] drivers/char/raw.ko LD [M] drivers/hid/usbhid/usbhid.ko LD [M] drivers/input/keyboard/hil_kbd.ko LD [M] drivers/input/keyboard/hilkbd.ko LD [M] drivers/input/misc/hp_sdc_rtc.ko LD [M] drivers/input/misc/uinput.ko LD [M] drivers/input/mouse/hil_ptr.ko LD [M] drivers/input/mouse/psmouse.ko LD [M] drivers/input/mouse/sermouse.ko LD [M] drivers/input/serio/parkbd.ko LD [M] drivers/input/serio/pcips2.ko LD [M] drivers/input/serio/serio_raw.ko LD [M] drivers/md/dm-crypt.ko LD [M] drivers/input/serio/serport.ko LD [M] drivers/md/dm-emc.ko LD [M] drivers/md/dm-mirror.ko LD [M] drivers/md/dm-mod.ko LD [M] drivers/md/dm-multipath.ko LD [M] drivers/md/dm-round-robin.ko LD [M] drivers/md/dm-snapshot.ko LD [M] drivers/md/dm-zero.ko LD [M] drivers/md/faulty.ko LD [M] drivers/md/linear.ko LD [M] drivers/md/md-mod.ko LD [M] drivers/md/multipath.ko LD 
[M] drivers/md/raid1.ko LD [M] drivers/md/raid0.ko LD [M] drivers/md/raid10.ko LD [M] drivers/message/fusion/mptbase.ko LD [M] drivers/message/fusion/mptctl.ko LD [M] drivers/message/fusion/mptfc.ko LD [M] drivers/message/fusion/mptsas.ko LD [M] drivers/message/fusion/mptscsih.ko LD [M] drivers/message/fusion/mptspi.ko LD [M] drivers/net/3c59x.ko LD [M]
Bug#374792: Dell CERC ATA100/4ch support
On Wed, 2007-05-16 at 14:36 +0100, Leigh Blackwell wrote:
> I have been looking at the issue with theses cerc devices, has this bug 374792 been closed based on people reverting the firmware to 6.61. Unfortunately Dell doesn't support a Firmware version that old on our Server, is it possible to re-open this bug? I have been unable to get the current etch install to recognize my driver controller with any of the megaraid drivers.

Umm, but this is a bug in Dell Support isn't it? I don't think there's a kernel fix for that. LSI's position is that in current kernels they only support this device with the new megaraid driver and only for firmware version <= 6.61. Surely you just need to get Dell and LSI on the same page?

James
Bug#391384: linux-image-2.6.18-1-686: Compaq Proliant DL380 fails to boot
On Sun, 2006-10-08 at 14:40 -0700, Matt Taggart wrote:
> dann frazier writes...
> > hey Grant/James,
> > It looks like we're still having cpqarray/sym2 conflicts under 2.6.18 - any idea what this problem may be?
>
> This is for dl380. At the very bottom (after the close of the bug) of
>
> http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=380272
>
> someone suggests a fix for dl380. jejb/ggg, Does that look like the right fix?

Er ... you mean the email that I sent pointing to a fix in the scsi-rc-fixes tree? Then yes, I think it's a correct fix. It's already in 2.6.18

James
Re: [PATCH] MODULE_FIRMWARE for binary firmware(s)
This is a reference implementation with the Debian mkinitrd-tools package. It shows how to identify the firmware files necessary for drivers in the initrd and also includes a primitive system for loading them.

I've tested this with the aic94xx driver using the new MODULE_FIRMWARE() tag.

Initramfs should be much easier because it already includes most of the boot time loading; all it has to do is the piece identifying the firmware for the selected modules.

James

---
Index: initrd-tools-0.1.84.1/mkinitrd
===
--- initrd-tools-0.1.84.1.orig/mkinitrd 2006-08-28 13:37:30.0 -0500
+++ initrd-tools-0.1.84.1/mkinitrd 2006-08-28 16:33:28.0 -0500
@@ -950,6 +950,7 @@ add_modules_dep() {
 		return
 	elif ! [ $oldstyle ]; then
 		add_modules_dep_2_5 $VERSION
+		add_firmware $VERSION
 		return
 	fi
 
@@ -1016,6 +1017,25 @@ add_modules_dep_2_5() {
 	fi
 }
 
+add_firmware() {
+	ver=$1
+	set -- $FSTYPES
+	unset IFS
+
+	cat modules.? |
+	while read junk mod junk; do
+		modpath=$(modprobe --set-version $ver --list $mod)
+		if [ -z "$modpath" ]; then
+			continue;
+		fi
+		p=$(modinfo -F firmware $modpath |sed 's/^/\/lib\/firmware\//')
+		if [ -n "$p" ]; then
+			echo $p
+			echo /usr/sbin/firmware_loader
+		fi
+	done
+}
+
 add_command() {
 	if [ -h initrd/$1 ]; then
 		return
Index: initrd-tools-0.1.84.1/firmware_loader
===
--- /dev/null 1970-01-01 00:00:00.0 +
+++ initrd-tools-0.1.84.1/firmware_loader 2006-08-28 16:56:18.0 -0500
@@ -0,0 +1,29 @@
+#!/bin/sh -e
+#
+# firmware loader agent
+#
+FIRMWARE_DIRS=/lib/firmware
+
+if [ "$SUBSYSTEM" != firmware ]; then
+	exit 0;
+fi
+
+if [ ! -e /sys/$DEVPATH/loading ]; then
+	echo "/sys/$DEVPATH/ does not exist"
+	exit 1
+fi
+
+for DIR in $FIRMWARE_DIRS; do
+	[ -e "$DIR/$FIRMWARE" ] || continue
+	echo 1 > /sys/$DEVPATH/loading
+	cat "$DIR/$FIRMWARE" > /sys/$DEVPATH/data
+	echo 0 > /sys/$DEVPATH/loading
+	exit 0
+done
+
+# the firmware was not found
+echo -1 > /sys/$DEVPATH/loading
+
+echo "Cannot find the $FIRMWARE firmware"
+exit 1
+
Index: initrd-tools-0.1.84.1/debian/rules
===
--- initrd-tools-0.1.84.1.orig/debian/rules 2006-08-28 16:07:52.0 -0500
+++ initrd-tools-0.1.84.1/debian/rules 2006-08-28 16:08:56.0 -0500
@@ -35,7 +35,7 @@ install:
 	install -o root -g root -m 644 \
 		echo init linuxrc debian/initrd-tools/usr/share/initrd-tools/
 	install -o root -g root -m 755 \
-		mkinitrd debian/initrd-tools/usr/sbin/
+		mkinitrd firmware_loader debian/initrd-tools/usr/sbin/
 	install -o root -g root -m 644 \
 		mkinitrd.conf modules debian/initrd-tools/etc/mkinitrd/
 ifeq ($(DEB_HOST_ARCH),powerpc)
Index: initrd-tools-0.1.84.1/linuxrc
===
--- initrd-tools-0.1.84.1.orig/linuxrc 2006-08-28 16:30:30.0 -0500
+++ initrd-tools-0.1.84.1/linuxrc 2006-08-28 16:40:45.0 -0500
@@ -10,3 +10,7 @@
 echo 256 > proc/sys/kernel/real-root-dev
 mount -nt tmpfs tmpfs bin || mount -nt ramfs ramfs bin
 echo $root > bin/root
+if [ -x /usr/sbin/firmware_loader ]; then
+	echo /usr/sbin/firmware_loader > /proc/sys/kernel/hotplug
+fi
+
Index: initrd-tools-0.1.84.1/init
===
--- initrd-tools-0.1.84.1.orig/init 2006-08-28 16:54:52.0 -0500
+++ initrd-tools-0.1.84.1/init 2006-08-28 16:55:01.0 -0500
@@ -366,6 +366,7 @@ get_cmdline
 [ -c /dev/.devfsd ] && DEVFS=yes
 mount -nt devfs devfs devfs
+mount -nt sysfs sysfs sys
 if [ $IDE_CORE != none ] && [ -n "$ide_options" ]; then
 	echo modprobe -k $IDE_CORE "options=\"$ide_options\""
 	modprobe -k $IDE_CORE "options=$ide_options"
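For readers who want to try the loading handshake outside an initrd, here is a minimal sketch of the same three-step sequence (write 1 to loading, copy the image to data, write 0 to loading, or -1 on failure). The function name load_firmware and its three arguments are assumptions made for this example so that it can be pointed at a scratch directory; the agent in the patch takes DEVPATH and FIRMWARE from the hotplug environment instead.

```sh
#!/bin/sh
# Sketch of the sysfs firmware-loading handshake.
# load_firmware and its arguments are hypothetical names for illustration.
load_firmware() {
    dev=$1      # device directory, e.g. /sys/$DEVPATH
    fwdir=$2    # firmware directory, e.g. /lib/firmware
    name=$3     # firmware file name requested by the driver

    # Without a "loading" control file there is no request to answer.
    if [ ! -e "$dev/loading" ]; then
        echo "$dev/loading does not exist" >&2
        return 1
    fi

    if [ -e "$fwdir/$name" ]; then
        echo 1 > "$dev/loading"             # announce the start of the transfer
        cat "$fwdir/$name" > "$dev/data"    # copy the firmware image
        echo 0 > "$dev/loading"             # signal that the transfer is complete
        return 0
    fi

    echo -1 > "$dev/loading"                # abort the request: nothing to load
    echo "Cannot find the $name firmware" >&2
    return 1
}
```

Passing the directories as arguments is only a convenience for experimenting; it does not change the order of writes, which is the part the kernel cares about.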
Re: [PATCH] MODULE_FIRMWARE for binary firmware(s)
On Tue, 2006-08-29 at 01:04 +0200, Sven Luther wrote:
> Notice that mkinitrd-tools is dead, and will probably be removed from etch. mkinitramfs-tools and yaird are the two currently used tools.

Yes ... I'm aware of that. That's why this is a reference implementation. initramfs should be easier ... I just don't have any initramfs systems at the moment, so I did what I had and could verify.

James
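The identification piece that an initramfs generator would need can be sketched on its own. This is only an illustration, assuming firmware names arrive one per line, as modinfo -F firmware prints them; firmware_paths is a hypothetical helper name, not part of any existing tool.

```sh
# Sketch: turn firmware names read from stdin into paths under a
# firmware directory (default /lib/firmware). Blank lines are skipped.
# firmware_paths is a hypothetical helper name for illustration.
firmware_paths() {
    fwdir=${1:-/lib/firmware}
    while read -r name; do
        if [ -n "$name" ]; then
            printf '%s/%s\n' "$fwdir" "$name"
        fi
    done
    return 0
}
```

A generator could feed it with something like `modinfo -F firmware aic94xx | firmware_paths` and add every path printed to the image.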
Bug#380272: kernel-image-2.6-686-smp: cpqarray module fails to detect arrays
On Fri, 2006-08-18 at 12:39 -0400, Kyle McMartin wrote:
> The problem is because they both claim support for the same PCI Ids:

That's this fix, isn't it?

http://www.kernel.org/git/?p=linux/kernel/git/jejb/scsi-rc-fixes-2.6.git;a=commit;h=b2b3c121076961333977f485f0d54c22121df920

James
Bug#338089: New aic7xxx driver fails spectacularly on 2940UW
On Sun, 2005-11-20 at 21:21 -0500, Graham Knap wrote:
> Sure enough, the kernel now boots. I'll attach the dmesg output here.
>
> Do you guys have a final patch in mind? Let me know if there are other tests you'd like me to run. Now that I know how to do this, I should be able to turn around test results fairly quickly.

OK, try the attached. If it works out, I'll soak it in -mm for a while and then try to put it in as a bug fix for 2.6.15.

James

diff --git a/drivers/scsi/scsi_transport_spi.c b/drivers/scsi/scsi_transport_spi.c
--- a/drivers/scsi/scsi_transport_spi.c
+++ b/drivers/scsi/scsi_transport_spi.c
@@ -812,12 +812,10 @@ spi_dv_device_internal(struct scsi_devic
 	if (!scsi_device_sync(sdev) && !scsi_device_dt(sdev))
 		return;
 
-	/* see if the device has an echo buffer.  If it does we can
-	 * do the SPI pattern write tests */
-
-	len = 0;
-	if (scsi_device_dt(sdev))
-		len = spi_dv_device_get_echo_buffer(sdev, buffer);
+	/* len == -1 is the signal that we need to ascertain the
+	 * presence of an echo buffer before trying to use it.  len ==
+	 * 0 means we don't have an echo buffer */
+	len = -1;
 
  retry:
 
@@ -840,11 +838,23 @@ spi_dv_device_internal(struct scsi_devic
 		if (spi_min_period(starget) == 8)
 			DV_SET(pcomp_en, 1);
 	}
+	/* Do the read only INQUIRY tests */
+	spi_dv_retrain(sdev, buffer, buffer + sdev->inquiry_len,
+		       spi_dv_device_compare_inquiry);
+	/* See if we actually managed to negotiate and sustain DT */
+	if (i->f->get_dt)
+		i->f->get_dt(starget);
+
+	/* see if the device has an echo buffer.  If it does we can do
+	 * the SPI pattern write tests.  Because of some broken
+	 * devices, we *only* try this on a device that has actually
+	 * negotiated DT */
+
+	if (len == -1 && spi_dt(starget))
+		len = spi_dv_device_get_echo_buffer(sdev, buffer);
 
-	if (len == 0) {
+	if (len <= 0) {
 		starget_printk(KERN_INFO, starget, "Domain Validation skipping write tests\n");
-		spi_dv_retrain(sdev, buffer, buffer + len,
-			       spi_dv_device_compare_inquiry);
 		return;
 	}
 
Bug#338089: New aic7xxx driver fails spectacularly on 2940UW
On Sun, 2005-11-13 at 12:41 -0500, Doug Ledford wrote:
> If the drive is unaccessible after the DV failure, even on a warm reboot (which includes a SCSI bus reset), then the drive is flat hung. Something done in the current code is breaking it. Can you get a boot with DV turned off and capture the log messages and post them here please? You already said it didn't help with the problem, but I'd like to see the failure scenario with it off, that might help determine the true root cause of the issue.

Yes, you're right ... the sequencer code seems to identify the WRITE_BUFFER as the failing command. Can you try with the attached patch, which will force DV to ignore the echo buffer write tests?

Thanks,

James

diff --git a/drivers/scsi/scsi_transport_spi.c b/drivers/scsi/scsi_transport_spi.c
--- a/drivers/scsi/scsi_transport_spi.c
+++ b/drivers/scsi/scsi_transport_spi.c
@@ -816,8 +816,10 @@ spi_dv_device_internal(struct scsi_devic
 	 * do the SPI pattern write tests */
 
 	len = 0;
+#if 0
 	if (scsi_device_dt(sdev))
 		len = spi_dv_device_get_echo_buffer(sdev, buffer);
+#endif
 
  retry:
 
Bug#338089: New aic7xxx driver fails spectacularly on 2940UW
On Sun, 2005-11-13 at 13:03 -0500, Graham Knap wrote:
> Doug Ledford [EMAIL PROTECTED] wrote:
> > You already said it didn't help with the problem,
>
> I meant that I don't think I successfully disabled DV, because the boot messages were *identical*, except for the line where the kernel shows the Kernel command line. I had added this argument at the end of the line:
>
> aic7xxx=dv:{0}
>
> I've re-read aic7xxx.txt and I'm not sure what I'm doing wrong. If you can tell me how to disable DV, I'd be happy to give it a try.

aic7xxx.txt is out of date. The aic7xxx (and 79xx) drivers use the generic domain validation code now rather than the old aic specific ones (which is what the dv:{0} option is referring to).

If you try the code in the prior email, I think that will disable the piece of DV that's causing the problem.

If the test code succeeds, the problem is pretty nasty: Apparently the device claims DT support but in fact rejects DT in the negotiation. We use DT support to begin the check for an echo buffer, which starts with READ_BUFFERS for the descriptor. Apparently this device returns a valid descriptor with a reasonable echo buffer size and then promptly throws a wobbly when we try to use it.

James
Bug#338089: New aic7xxx driver fails spectacularly on 2940UW
On Sun, 2005-11-13 at 14:42 -0500, Doug Ledford wrote:
> The device is on a non-LVD bus. Certain devices were created back when the spec still stated that using PPR negotiation messages on a non-LVD bus was a no-no. As the echo buffer was an addition to support DV, and originally DV wasn't intended to be used on non-LVD busses, it might stand to reason that this device simply is going tits up because we are attempting to use the echo buffer while in SE mode. Checking that PPR/DT is valid (not just between controller and device, but also given bus mode) and only using echo buffer DV when all LVD conditions are met would likely solve the problem (assuming that the problem is what you are referring to).

I think so (pending confirmation of the patch working). The current DV code assumes that if the device claims DT support in the INQUIRY data *and* it returns a valid descriptor to the READ_BUFFER descriptors command then enhanced DV should be attempted.

What I'm contemplating doing (which is what you also suggest) is tightening up the check so if the standard DV read tests produce a negotiation that doesn't set DT then we won't attempt enhanced DV.

James
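To make the difference between the two checks concrete, here is a small model of the decision, written as shell functions. It is a sketch of the reasoning in this message, not kernel code: the function names and the three positional arguments are assumptions made for illustration.

```sh
# Model of the enhanced-DV decision (illustrative only, not kernel code).
# Arguments, all hypothetical names for this sketch:
#   $1 = 1 if the INQUIRY data claims DT support, else 0
#   $2 = 1 if the read-only DV pass actually negotiated DT, else 0
#   $3 = echo buffer size from the READ_BUFFER descriptor (0 if none)
# Each function prints "enhanced" if the write tests would be attempted
# and "skip" otherwise.

# Current behaviour: trust the INQUIRY claim plus a valid descriptor.
dv_old_policy() {
    if [ "$1" -eq 1 ] && [ "$3" -gt 0 ]; then echo enhanced; else echo skip; fi
}

# Tightened behaviour: require that DT was really negotiated.
dv_new_policy() {
    if [ "$2" -eq 1 ] && [ "$3" -gt 0 ]; then echo enhanced; else echo skip; fi
}
```

A device like the one in this report would be described as claiming DT, not negotiating it, and still returning a non-zero buffer size; under those inputs the first function chooses the write tests and the second skips them.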
Bug#338089: New aic7xxx driver fails spectacularly on 2940UW
On Tue, 2005-11-08 at 20:47 -0500, Graham Knap wrote:
> Target 0 Negotiation Settings
> User: 40.000MB/s transfers (20.000MHz, offset 127, 16bit)
> Goal: 40.000MB/s transfers (20.000MHz, offset 8, 16bit)
> Curr: 40.000MB/s transfers (20.000MHz, offset 8, 16bit)

That's a bit unfortunate ... it shows that the domain validation code negotiated identical settings in the old kernel, so it doesn't look like that's the problem.

My best guess would be that the bus is slightly marginal. The aic7xxx drivers are notoriously sensitive to bus problems. Could you try lowering the bus speed to 10MHz in the aic7xxx bios and see if that helps?

Thanks,

James
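As a quick check on the figures quoted above, the reported rate is just the clock multiplied by the bus width in bytes. The sketch below assumes single-transition (non-DT) transfers, which is what a 20MHz, 16bit, 40MB/s line implies; spi_rate_mbs is a hypothetical helper name.

```sh
# Sketch: peak transfer rate in MB/s for a parallel SCSI bus using
# single-transition clocking: clock (MHz) * bus width (bytes).
# spi_rate_mbs is a hypothetical helper name for illustration.
spi_rate_mbs() {
    mhz=$1
    width_bits=$2
    echo $(( mhz * width_bits / 8 ))
}
```

With 20 and 16 as inputs this gives 40, matching the settings shown; dropping the clock to 10MHz on the same 16-bit bus halves that to 20.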
Bug#338089: New aic7xxx driver fails spectacularly on 2940UW
On Tue, 2005-11-08 at 12:31 +0900, Horms wrote:
> On Mon, Nov 07, 2005 at 09:45:23PM -0500, Graham Knap wrote:
> > Package: linux-image-2.6.14-1-686
> > Version: 2.6.14-2
> >
> > Recent versions of the aic7xxx driver will not boot on my secondary PC. The 2.6.8 kernel shipped with sarge works perfectly, but neither the 2.6.12 kernel in testing nor the 2.6.14 kernel in unstable will boot.
> >
> > This is an older system:
> > Asus P2L-B, Celeron 500MHz, 384MB RAM, GeForce2 MX AGP
> > Adaptec 2940UW, IBM DDYS-T09170 (9GB disk)
> >
> > I can't understand what exactly is failing, but I will attach a boot log. (So null modem cables *are* still useful for something!)
> >
> > I've tried adding aic7xxx=dv:{0} to the boot arguments but that doesn't seem to make a difference. Also, aic7xxx=verbose doesn't seem to do anything either.
> >
> > I don't know if this makes a difference but my 2940UW reports its BIOS revision as 1.34.3 during POST.
> >
> > Any help would be much appreciated.
>
> Hi Graham,
>
> thanks for your detailed report. This does smell a lot like a driver bug, and as such, it's probably best passed on to the upstream maintainers. As such I've CCed James Bottomley and linux-scsi for comment.
>
> The other main possibility is that perhaps the aic7xxx_old driver would work. Or perhaps some other module loading foo, though it seems the module is loaded fine, it just doesn't like your card very much.

This is an older drive, so it looks like it passes domain validation (read only) but then chokes on the next command.

On 2.6.8, what do the transport settings report? (That's cat /proc/scsi/aic7xxx/0.)

Thanks,

James