from:"James Bottomley"

Re: [PATCH v2] builddeb: Support signing kernels with the module signing key

2022-02-09 Thread James Bottomley

On Tue, 2022-02-08 at 13:10 +, Matthew Wilcox wrote:
> On Tue, Feb 08, 2022 at 12:01:22PM +0100, Julian Andres Klode wrote:
> > It's worth pointing out that in Ubuntu, the generated MOK key
> > is for module signing only (extended key usage
> > 1.3.6.1.4.1.2312.16.1.2), kernels signed with it will NOT be
> > bootable.
> 
> Why should these be separate keys?  There's no meaningful security
> boundary between a kernel module and the kernel itself; a kernel
> module can, for example, write to CR3, and that's game over for
> any pretence at separation.

It's standard practice for any automated build private key to be
destroyed immediately after use to preserve security.  Thus the modules get
signed with a per-kernel ephemeral build key, but the MOK key is a long-term
key with a special signing infrastructure, usually burned into the
distro version of shim.  The kernel signing key usually has to be long-term,
because you want shim to boot multiple kernels; otherwise upgrading
becomes a nightmare.

James

Bug#959069: linux-image-5.5.0-2-amd64 won't boot in a AMD SEV Virtual Machine

2020-04-28 Thread James Bottomley

Subject: linux-image-5.5.0-2-amd64 won't boot in a AMD SEV Virtual Machine
Package: src:linux
Version: 5.5.17-1
Severity: important

The boot failure is total: not even a console log can be seen.  It
seems to be due to the necessary memory encryption option not being set
in the Debian kernel:

# CONFIG_AMD_MEM_ENCRYPT is not set

This is in spite of the fact that the rest of the SEV encryption options are
set:

CONFIG_KVM_AMD_SEV=y
CONFIG_USB_SEVSEG=m
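One way to catch this before booting a kernel in an SEV guest is to check its config.  The sketch below scans the text of a kernel .config (on Debian usually installed as /boot/config-<version>) and reports how one option is set; the function and enum names are hypothetical, and only the `=y`, `=m` and "is not set" forms that Kconfig writes are distinguished.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

enum kopt { KOPT_ABSENT, KOPT_NOT_SET, KOPT_BUILTIN, KOPT_MODULE, KOPT_OTHER };

/* Look up one option in the text of a kernel .config file. */
static enum kopt kconfig_lookup(const char *config_text, const char *opt)
{
	char notset[128];
	size_t optlen = strlen(opt);
	const char *line = config_text;

	/* Kconfig records disabled bool/tristate options as a comment. */
	snprintf(notset, sizeof(notset), "# %s is not set", opt);

	while (line != NULL && *line != '\0') {
		const char *end = strchr(line, '\n');
		size_t len = end ? (size_t)(end - line) : strlen(line);

		if (len == strlen(notset) && strncmp(line, notset, len) == 0)
			return KOPT_NOT_SET;
		if (len > optlen && strncmp(line, opt, optlen) == 0 &&
		    line[optlen] == '=') {
			const char *val = line + optlen + 1;
			size_t vlen = len - optlen - 1;

			if (vlen == 1 && val[0] == 'y')
				return KOPT_BUILTIN;
			if (vlen == 1 && val[0] == 'm')
				return KOPT_MODULE;
			return KOPT_OTHER;	/* strings, numbers, hex values */
		}
		line = end ? end + 1 : NULL;
	}
	return KOPT_ABSENT;
}
```

Matching on the full option name followed by `=` keeps a lookup of `CONFIG_KVM_AMD` from accidentally matching the `CONFIG_KVM_AMD_SEV=y` line.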

So I'm reporting this on the assumption that it is supposed to work out
of the box and that not setting AMD_MEM_ENCRYPT was an oversight.  Not
setting it means that all the I/O devices are sending encrypted
memory pages through to QEMU, which is what's causing the hang.  With
it set, the kernel would bounce all the encrypted pages into
unencrypted pages before sending them to devices.
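A toy model of why the bounce matters.  Assumptions: XOR with a fixed byte stands in for the hardware memory encryption, a 16-byte array stands in for a page, and all names are hypothetical.  In a real SEV guest the bouncing is done by the kernel's SWIOTLB layer; the point here is only that the emulated device reads raw memory without the key, so it must be handed a copy in an unencrypted page.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_PAGE 16		/* toy page size */
#define TOY_KEY  0x5a		/* stand-in for the per-VM encryption key */

/* A guest page as it sits in host-visible RAM. */
struct guest_page {
	unsigned char raw[TOY_PAGE];	/* what the host (QEMU) actually sees */
	int encrypted;			/* mimics the page's encryption bit */
};

/* CPU accesses are transparently encrypted/decrypted for private pages. */
static void cpu_write(struct guest_page *p, const unsigned char *src, size_t n)
{
	for (size_t i = 0; i < n; i++)
		p->raw[i] = p->encrypted ? (unsigned char)(src[i] ^ TOY_KEY) : src[i];
}

static void cpu_read(const struct guest_page *p, unsigned char *dst, size_t n)
{
	for (size_t i = 0; i < n; i++)
		dst[i] = p->encrypted ? (unsigned char)(p->raw[i] ^ TOY_KEY) : p->raw[i];
}

/* The emulated device reads guest RAM directly, without the key. */
static void device_dma_read(const struct guest_page *p, unsigned char *dst, size_t n)
{
	memcpy(dst, p->raw, n);
}

/* Hand n (<= TOY_PAGE) bytes to the device, optionally bouncing them
 * through an unencrypted shared page first. */
static void send_to_device(const struct guest_page *buf, struct guest_page *bounce,
			   int use_bounce, unsigned char *device_sees, size_t n)
{
	if (use_bounce) {
		unsigned char tmp[TOY_PAGE];

		cpu_read(buf, tmp, n);		/* decrypted by the CPU */
		cpu_write(bounce, tmp, n);	/* stored in the clear */
		device_dma_read(bounce, device_sees, n);
	} else {
		device_dma_read(buf, device_sees, n);	/* device gets ciphertext */
	}
}
```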

James

Bug#941611: linux-image-5.2.0-2-amd64: Kernel 5.2 has terrible performance under load

2019-10-03 Thread James Bottomley

On Wed, 2019-10-02 at 22:07 +0200, Salvatore Bonaccorso wrote:
> > Linux Kernel 5.2 is completely unusable on most of my systems.  The
> > problem seems to be something to do with memory compaction causing
> > intervals where the system becomes unresponsive.
> > 
> > This is definitely an upstream issue (my laptop running the
> > upstream kernel is displaying the problem as well) so this bug is
> > really just a warning not to deploy the 5.2 kernel until a fix is
> > found.
> 
> If so, could you point where it was reported upstream so we can set
> accordingly where it has been forwarded to?

Well, the initial incarnation of this upstream patch set

https://marc.info/?t=15676268933

seems to fix the problem in my testbeds.  I'm testing out the first two
patches only at the moment.

James

Bug#941611: linux-image-5.2.0-2-amd64: Kernel 5.2 has terrible performance under load

2019-10-02 Thread James Bottomley

Package: src:linux
Version: 5.2.9-2
Severity: important
Tags: upstream

Dear Maintainer,

Linux Kernel 5.2 is completely unusable on most of my systems.  The problem
seems to be something to do with memory compaction causing intervals where
the system becomes unresponsive.

This is definitely an upstream issue (my laptop running the upstream
kernel is displaying the problem as well) so this bug is really just a
warning not to deploy the 5.2 kernel until a fix is found.

-- Package-specific info:
** Kernel log: boot messages should be attached

** Model information
sys_vendor: 
product_name: 
product_version: 
chassis_vendor: 
chassis_version: 
bios_vendor: Intel Corp.
bios_version: BX97510J.86A.1209.2006.0601.1340
board_vendor: Intel Corporation
board_name: D975XBX
board_version: AAD27094-305

** PCI devices:
00:00.0 Host bridge [0600]: Intel Corporation 82975X Memory Controller Hub [8086:277c]
	Subsystem: Intel Corporation 82975X Memory Controller Hub [8086:5842]
	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- SERR-
	Kernel modules: i82975x_edac

00:01.0 PCI bridge [0604]: Intel Corporation 82975X PCI Express Root Port [8086:277d] (prog-if 00 [Normal decode])
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- SERR- TAbort- Reset- FastB2B-
		PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
	Capabilities: [88] Subsystem: Intel Corporation 82975X PCI Express Root Port [8086:]
	Capabilities: [80] Power Management version 2
		Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
		Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
	Capabilities: [90] MSI: Enable- Count=1/1 Maskable- 64bit-
		Address:   Data:
	Capabilities: [a0] Express (v1) Root Port (Slot+), MSI 00
		DevCap: MaxPayload 128 bytes, PhantFunc 0
			ExtTag- RBE-
		DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq-
			RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
			MaxPayload 128 bytes, MaxReadReq 128 bytes
		DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr- TransPend-
		LnkCap: Port #2, Speed 2.5GT/s, Width x16, ASPM L0s, Exit Latency L0s <256ns
			ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp-
		LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+
			ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
		LnkSta: Speed 2.5GT/s (ok), Width x16 (ok)
			TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
		SltCap: AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd- HotPlug- Surprise-
			Slot #0, PowerLimit 75.000W; Interlock- NoCompl-
		SltCtl: Enable: AttnBtn- PwrFlt- MRL- PresDet- CmdCplt- HPIrq- LinkChg-
			Control: AttnInd Off, PwrInd On, Power- Interlock-
		SltSta: Status: AttnBtn- PowerFlt- MRL- CmdCplt- PresDet+ Interlock-
			Changed: MRL- PresDet+ LinkState-
		RootCtl: ErrCorrectable- ErrNon-Fatal- ErrFatal- PMEIntEna- CRSVisible-
		RootCap: CRSVisible-
		RootSta: PME ReqID , PMEStatus- PMEPending-
	Capabilities: [100 v1] Virtual Channel
		Caps:   LPEVC=0 RefClk=100ns PATEntryBits=1
		Arb:    Fixed+ WRR32- WRR64- WRR128-
		Ctrl:   ArbSelect=Fixed
		Status: InProgress-
		VC0:    Caps:   PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
			Arb:    Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
			Ctrl:   Enable+ ID=0 ArbSelect=Fixed TC/VC=01
			Status: NegoPending- InProgress-
	Capabilities: [140 v1] Root Complex Link
		Desc:   PortNumber=02 ComponentID=01 EltType=Config
		Link0:  Desc:   TargetPort=00 TargetComponent=01 AssocRCRB- LinkType=MemMapped LinkValid+
			Addr:   fed19000
		Link1:  Desc:   TargetPort=03 TargetComponent=01 AssocRCRB- LinkType=Config LinkValid+
			Addr:   00:03.0  CfgSpace=00018000
	Kernel driver in use: pcieport

00:1b.0 Audio device [0403]: Intel Corporation NM10/ICH7 Family High Definition Audio Controller [8086:27d8] (rev 01)
	Subsystem: Intel Corporation NM10/ICH7 Family High Definition Audio Controller [8086:0417]
	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- SERR- TAbort-

Re: [PATCH RESEND 2] mvsas: Recognise device/subsystem 9485/9485 as 88SE9485

2014-04-14 Thread James Bottomley

On Wed, 2014-02-19 at 01:06 +, Ben Hutchings wrote:
 Matt Taggart reported that mvsas didn't bind to the Marvell
 SAS controller on a Supermicro AOC-SAS2LP-MV8 board.
 
 lspci reports it as:
 
 01:00.0 RAID bus controller [0104]: Marvell Technology Group Ltd. Device 
 [1b4b:9485] (rev 03)
 Subsystem: Marvell Technology Group Ltd. Device [1b4b:9485]
 [...]
 
 Add it to the device table as chip_9485.

Adding Marvell maintainer to cc.  Can we get an ack on this ... or is
mvsas dead and I can just apply it anyway?

Thanks,

James


 Reported-by: Matt Taggart <tagg...@debian.org>
 Tested-by: Matt Taggart <tagg...@debian.org>
 Signed-off-by: Ben Hutchings <b...@decadent.org.uk>
 ---
  drivers/scsi/mvsas/mv_init.c | 9 +
  1 file changed, 9 insertions(+)
 
 diff --git a/drivers/scsi/mvsas/mv_init.c b/drivers/scsi/mvsas/mv_init.c
 index 7b7381d..83fa5f8 100644
 --- a/drivers/scsi/mvsas/mv_init.c
 +++ b/drivers/scsi/mvsas/mv_init.c
 @@ -729,6 +729,15 @@ static struct pci_device_id mvs_pci_table[] = {
  		.class_mask	= 0,
  		.driver_data	= chip_9485,
  	},
 +	{
 +		.vendor		= PCI_VENDOR_ID_MARVELL_EXT,
 +		.device		= 0x9485,
 +		.subvendor	= PCI_ANY_ID,
 +		.subdevice	= 0x9485,
 +		.class		= 0,
 +		.class_mask	= 0,
 +		.driver_data	= chip_9485,
 +	},
  	{ PCI_VDEVICE(OCZ, 0x1021), chip_9485}, /* OCZ RevoDrive3 */
  	{ PCI_VDEVICE(OCZ, 0x1022), chip_9485}, /* OCZ RevoDrive3/zDriveR4 (exact model unknown) */
  	{ PCI_VDEVICE(OCZ, 0x1040), chip_9485}, /* OCZ RevoDrive3/zDriveR4 (exact model unknown) */
 
 
 -- 
 Ben Hutchings
 One of the nice things about standards is that there are so many of them.
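To see why the new entry makes the driver bind, here is a user-space sketch modelled on the field-by-field test the kernel's PCI core applies to each `pci_device_id` entry (`pci_match_one_device()`): every ID field must match exactly or be the `PCI_ANY_ID` wildcard, and the class is compared only under `class_mask`.  The struct and function names below are simplified stand-ins, not the kernel's definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-ins for the kernel definitions. */
#define PCI_ANY_ID			(~0u)
#define PCI_VENDOR_ID_MARVELL_EXT	0x1b4b

struct toy_pci_id {			/* modelled on struct pci_device_id */
	uint32_t vendor, device;
	uint32_t subvendor, subdevice;
	uint32_t class_code, class_mask;	/* .class / .class_mask */
};

struct toy_pci_dev {			/* the IDs lspci reports */
	uint32_t vendor, device;
	uint32_t subsystem_vendor, subsystem_device;
	uint32_t class_code;
};

/* Exact match or wildcard on each ID; class only under class_mask, so a
 * class_mask of 0 ignores the class entirely. */
static bool toy_id_matches(const struct toy_pci_id *id, const struct toy_pci_dev *dev)
{
	return (id->vendor == PCI_ANY_ID || id->vendor == dev->vendor) &&
	       (id->device == PCI_ANY_ID || id->device == dev->device) &&
	       (id->subvendor == PCI_ANY_ID ||
		id->subvendor == dev->subsystem_vendor) &&
	       (id->subdevice == PCI_ANY_ID ||
		id->subdevice == dev->subsystem_device) &&
	       !((id->class_code ^ dev->class_code) & id->class_mask);
}

/* The entry added by the patch. */
static const struct toy_pci_id mvsas_9485_entry = {
	.vendor     = PCI_VENDOR_ID_MARVELL_EXT,
	.device     = 0x9485,
	.subvendor  = PCI_ANY_ID,
	.subdevice  = 0x9485,
	.class_code = 0,
	.class_mask = 0,
};
```

The board Matt reported (1b4b:9485, subsystem 1b4b:9485, RAID class 0104) satisfies every clause; a card with the same chip but a different subsystem device ID would not match this particular entry.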
 




-- 
To UNSUBSCRIBE, email to debian-kernel-requ...@lists.debian.org
with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org
Archive: 
https://lists.debian.org/1397504559.2207.30.ca...@dabdike.int.hansenpartnership.com

Remove scsi_wait_scan module

2012-05-27 Thread James Bottomley

scsi_wait_scan was introduced with asynchronous host scanning as a hack
for distributions that weren't using a proper udev-based wait for root to
appear in their initramfs scripts.  In 2.6.30, commit

c751085943362143f84346d274e0011419c84202
Author: Rafael J. Wysocki <r...@sisk.pl>
Date:   Sun Apr 12 20:06:56 2009 +0200

    PM/Hibernate: Wait for SCSI devices scan to complete during resume

actually broke scsi_wait_scan because it renders
scsi_complete_async_scans() a no-op for modular SCSI if you include
scsi_scan.h (which this module does).

The lack of bug reports is sufficient proof that this module is no
longer used.
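The failure mode is a preprocessor guard that only recognises the built-in case.  A minimal sketch, assuming the header declared the real function under `#ifdef CONFIG_SCSI` and otherwise supplied an inline stub returning 0: when SCSI is built as a module, Kconfig defines `CONFIG_SCSI_MODULE` rather than `CONFIG_SCSI`, so a module including the header silently gets the stub.  The function names here are hypothetical; in-kernel code would more typically test `IS_ENABLED(CONFIG_SCSI)`, which covers both `=y` and `=m`.

```c
#include <assert.h>

/* Simulate SCSI=m: Kconfig defines CONFIG_SCSI_MODULE, not CONFIG_SCSI. */
#define CONFIG_SCSI_MODULE 1

static int scans_waited;	/* counts calls that reach the real wait */

/* Stand-in for the real implementation in scsi_scan.c. */
static int real_complete_async_scans(void)
{
	scans_waited++;
	return 0;
}

/* Guard that tests only the built-in symbol (the assumed bug). */
#ifdef CONFIG_SCSI
static int complete_scans_builtin_only(void) { return real_complete_async_scans(); }
#else
static int complete_scans_builtin_only(void) { return 0; }	/* silent no-op */
#endif

/* Guard that also accepts the modular case. */
#if defined(CONFIG_SCSI) || defined(CONFIG_SCSI_MODULE)
static int complete_scans_any(void) { return real_complete_async_scans(); }
#else
static int complete_scans_any(void) { return 0; }
#endif
```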

James

---

diff --git a/drivers/scsi/Kconfig b/drivers/scsi/Kconfig
index bea04e5..ac6ea28 100644
--- a/drivers/scsi/Kconfig
+++ b/drivers/scsi/Kconfig
@@ -263,23 +263,6 @@ config SCSI_SCAN_ASYNC
 	  You can override this choice by specifying scsi_mod.scan=sync
 	  or async on the kernel's command line.
 
-config SCSI_WAIT_SCAN
-	tristate  # No prompt here, this is an invisible symbol.
-	default m
-	depends on SCSI
-	depends on MODULES
-# scsi_wait_scan is a loadable module which waits until all the async scans are
-# complete.  The idea is to use it in initrd/ initramfs scripts.  You modprobe
-# it after all the modprobes of the root SCSI drivers and it will wait until
-# they have all finished scanning their buses before allowing the boot to
-# proceed.  (This method is not applicable if targets boot independently in
-# parallel with the initiator, or with transports with non-deterministic target
-# discovery schemes, or if a transport driver does not support scsi_wait_scan.)
-#
-# This symbol is not exposed as a prompt because little is to be gained by
-# disabling it, whereas people who accidentally switch it off may wonder why
-# their mkinitrd gets into trouble.
-
 menu "SCSI Transports"
 	depends on SCSI
 
diff --git a/drivers/scsi/Makefile b/drivers/scsi/Makefile
index 8deedea..f188509 100644
--- a/drivers/scsi/Makefile
+++ b/drivers/scsi/Makefile
@@ -161,8 +161,6 @@ obj-$(CONFIG_SCSI_OSD_INITIATOR) += osd/
 # This goes last, so that real scsi devices probe earlier
 obj-$(CONFIG_SCSI_DEBUG)   += scsi_debug.o
 
-obj-$(CONFIG_SCSI_WAIT_SCAN)   += scsi_wait_scan.o
-
 scsi_mod-y += scsi.o hosts.o scsi_ioctl.o constants.o \
   scsicam.o scsi_error.o scsi_lib.o
 scsi_mod-$(CONFIG_SCSI_DMA)+= scsi_lib_dma.o
diff --git a/drivers/scsi/scsi_wait_scan.c b/drivers/scsi/scsi_wait_scan.c
deleted file mode 100644
index 74708fc..000
--- a/drivers/scsi/scsi_wait_scan.c
+++ /dev/null
@@ -1,42 +0,0 @@
-/*
- * scsi_wait_scan.c
- *
- * Copyright (C) 2006 James Bottomley <james.bottom...@steeleye.com>
- *
- * This is a simple module to wait until all the async scans are
- * complete.  The idea is to use it in initrd/initramfs scripts.  You
- * modprobe it after all the modprobes of the root SCSI drivers and it
- * will wait until they have all finished scanning their busses before
- * allowing the boot to proceed
- */
-
-#include <linux/module.h>
-#include <linux/device.h>
-#include <scsi/scsi_scan.h>
-
-static int __init wait_scan_init(void)
-{
-	/*
-	 * First we need to wait for device probing to finish;
-	 * the drivers we just loaded might just still be probing
-	 * and might not yet have reached the scsi async scanning
-	 */
-	wait_for_device_probe();
-	/*
-	 * and then we wait for the actual asynchronous scsi scan
-	 * to finish.
-	 */
-	scsi_complete_async_scans();
-	return 0;
-}
-
-static void __exit wait_scan_exit(void)
-{
-}
-
-MODULE_DESCRIPTION("SCSI wait for scans");
-MODULE_AUTHOR("James Bottomley");
-MODULE_LICENSE("GPL");
-
-late_initcall(wait_scan_init);
-module_exit(wait_scan_exit);

Archive: 
http://lists.debian.org/1338110026.2957.5.ca...@dabdike.int.hansenpartnership.com

Bug#622997: libata-sff/pata_cmd64x problem with hardwired configurations

2011-04-19 Thread James Bottomley

On Tue, 2011-04-19 at 11:20 +0200, Bartlomiej Zolnierkiewicz wrote:
 From: Bartlomiej Zolnierkiewicz bzoln...@gmail.com
 Subject: [PATCH v2] pata_cmd64x: add enablebits checking

 Fixes IDE - libata regression.

 IDE's cmd64x host driver has been supporting enablebits checking
 since the initial driver's merge.

Actually, the thread discussing the proposed patches is here:

http://marc.info/?t=13031522715

I much prefer the dummy interface approach to the prereset one because
it prevents any possible poke at the registers which will crash the box.
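The dummy-interface idea can be sketched as a decision made before any probing: read an enable bit from config space and give an absent port "dummy" operations that never touch hardware.  libata's real mechanism for this is `ata_dummy_port_info`.  Everything else below is a toy: the config-space array and struct names are hypothetical, and the register offset 0x51 with bit 0x08 for the secondary channel is an assumption (believed to match the enable bits the old IDE cmd64x driver checked), to be verified against the chip documentation.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed enable register and secondary-channel bit; verify before use. */
#define CMD64X_CNTRL		0x51
#define CMD64X_CNTRL_CH1	0x08

struct toy_port_ops {
	const char *name;
	int touches_registers;	/* does probing read the port's taskfile? */
};

static const struct toy_port_ops sff_ops   = { "sff", 1 };
static const struct toy_port_ops dummy_ops = { "dummy", 0 };

/* cfg[] stands in for the controller's 256-byte PCI config space. */
static uint8_t toy_read_config_byte(const uint8_t *cfg, unsigned int off)
{
	return cfg[off];
}

/* Decide per port before anything is probed.  A port whose enable bit is
 * clear gets dummy ops, so its (possibly unwired) registers are never
 * accessed.  The primary port is assumed always present. */
static void choose_port_ops(const uint8_t *cfg, const struct toy_port_ops *ops[2])
{
	uint8_t cntrl = toy_read_config_byte(cfg, CMD64X_CNTRL);

	ops[0] = &sff_ops;
	ops[1] = (cntrl & CMD64X_CNTRL_CH1) ? &sff_ops : &dummy_ops;
}
```

The contrast with a prereset check is where the decision happens: a prereset hook runs inside the normal probe path, whereas substituting dummy ops up front means the generic code never gets a chance to poke the missing port.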

James

Archive: http://lists.debian.org/1303221565.3171.12.ca...@mulgrave.site

Bug#622997: Debian bug 622997

2011-04-17 Thread James Bottomley

On Sun, 2011-04-17 at 11:11 -0400, John David Anglin wrote:
  On Sat, 2011-04-16 at 19:35 -0400, John David Anglin wrote: 
   On Sat, 16 Apr 2011, James Bottomley wrote:
   
Strike that one ... I enabled USB in my 2.6.39-rc3 build and it inserts
the OHCI module and discovers the ports just fine.
   
   Boot 2.6.39-rc3 fails for me with attached config.
  
  I can't quite build it.  With gcc version 4.2.4 (Debian 4.2.4-6) I'm
  getting an ICE:
  
  net/wireless/reg.c: In function 'freq_reg_info_regd':
  net/wireless/reg.c:645: internal compiler error: in expand_expr_real_1,
  at expr.c:8744
  Please submit a full bug report,
  with preprocessed source if appropriate.
  See URL:http://gcc.gnu.org/bugs.html for instructions.
  For Debian GNU/Linux specific bug reporting instructions,
  see URL:file:///usr/share/doc/gcc-4.2/README.Bugs.
  make[2]: *** [net/wireless/reg.o] Error 1
 
 This is probably fixed as it doesn't occur with 
 gcc version 4.5.3 20110101 (prerelease) [gcc-4_5-branch revision 168387] 
 (GCC) .
 GCC 4.2 and 4.3 are no longer maintained and there won't be any further
 releases from these branches.
 
 Without looking at the above, it's hard to tell whether the bug is a
 middle-end or backend bug.  Many middle-end bugs are fixed in more recent
 GCC versions.  Although newer versions may bring their own problems,
 we can get help in fixing problems particularly if they are regressions. 
 
 The asm delay slot bug affected all GCC versions.  I backported the fix to
 the 4.3, 4.4 and 4.5 branches.  This is a problem in the kernel because of
 the following:
 
 ** The __asm__ op below simple prevents gcc/ld from reordering
 ** instructions across the mb() call.
 */
 #define mb()		__asm__ __volatile__("":::"memory")	/* barrier() */
 
 It's just a matter of chance whether a barrier ends up in the delay slot
 of a branch in a critical location.
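For reference, the quoted mb() is a pure compiler barrier: an empty asm statement with a "memory" clobber (a GCC/Clang extension).  It emits no instructions; it only stops the compiler from moving memory accesses across it or keeping memory values cached in registers across it, so any hardware ordering has to come from elsewhere.  A minimal sketch of its use, with hypothetical variable names:

```c
#include <assert.h>

/* Same form as the parisc mb() quoted above. */
#define compiler_barrier() __asm__ __volatile__("" ::: "memory")

static int payload;
static int payload_ready;

/* Store the data, then the flag; the barrier keeps the compiler from
 * reordering the two stores. */
static void publish(int v)
{
	payload = v;
	compiler_barrier();
	payload_ready = 1;
}
```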

I'll redo optimisation on that one and see if I can avoid this.

  Plus there's a bug in my kernel code:
  
  drivers/usb/host/xhci-pci.c: In function 'xhci_pci_setup':
  drivers/usb/host/xhci-pci.c:61: error: implicit declaration of function
  'kzalloc
  
  If I correct for these (add missing slab.h include and disable wireless)
 
 I had to add missing slab.h as well.  However, I didn't touch wireless
 with 4.5.3.
 
  and build, the last message I see is
  
  turn off boot console ttyB0
  
  Which indicates it's got a problem with the console configuration (I
  don't see any console registration for the DIVA serial port on ttyS1 in
  the boot log).
 
 Comparing the console output that I recorded for the debian kernel, I
 see udev starts much earlier.  It only has the initial message from the
 tg3 driver and SCSI subsystem.

It's most likely a driver module that's getting loaded which is turned
off in the booting configuration ... finding it isn't going to be easy,
though ...

James

Archive: http://lists.debian.org/1303053807.2583.1.ca...@mulgrave.site

Bug#622997: Debian bug 622997

2011-04-17 Thread James Bottomley

On Sun, 2011-04-17 at 10:23 -0500, James Bottomley wrote:
 It's most likely a driver module that's getting loaded which is turned
 off in the booting configuration ... finding it isn't going to be easy,
 though ...

Finally got a build (had to swap out -Os for -O2).

I traced the module loads and successful inits and found it; it's
pata_cmd64x  ... it loads but never returns from init.  I bet it's
trying to poke into ISA space which causes the HPMC.

Removing this one module from the system allows it to boot again.

I'd suggest just disabling it in the parisc config for now.  Using an
ATA-based CD/DVD instead of a SCSI one is a very recent thing.  I'll see if
I can get it working, but ATA controllers tend to be somewhat nasty and
x86-specific ...

James

Archive: http://lists.debian.org/1303068487.2583.7.ca...@mulgrave.site

Bug#622997: Debian bug 622997

2011-04-17 Thread James Bottomley

On Sun, 2011-04-17 at 20:37 +0100, Ben Hutchings wrote:
 On Sun, 2011-04-17 at 14:28 -0500, James Bottomley wrote:
  I traced the module loads and successful inits and found it; it's
  pata_cmd64x  ... it loads but never returns from init.  I bet it's
  trying to poke into ISA space which causes the HPMC.
  
  Removing this one module from the system allows it to boot again.
 [...]
 
 We also had a recent report that this driver is also bust on some sparc
 systems.  We could swap back to cmd64x on these architectures but I
 would rather get pata_cmd64x fixed.

Well, I've got a working pata_cmd64x (and now a working CD drive).

The specific issue on parisc (and probably sparc) is that we're using
this siimage chip hardwired to a single DVD drive.  We have no use for
a secondary port, so there isn't one.  The registers for the secondary
port are pointing off into empty space.  When libata-sff tries to touch
the secondary port, we get an instant High Priority Machine Check
because on most non-x86 systems, it's a fault to touch non-responding
memory.

I got it to work by making libata-sff only probe a single port.  Now,
here's the problem: the libata-sff driver is hardwired to probe two
ports, so it will require major surgery to check dynamically how many
ports there are ... and the second problem is that I don't even know how
to check this.  I'll ask about this on linux-ide.

James

Archive: http://lists.debian.org/1303083217.2583.14.ca...@mulgrave.site

Bug#622997: libata-sff/pata_cmd64x problem with hardwired configurations

2011-04-17 Thread James Bottomley

I've got a parisc system where the DVD drive is hardwired to a Silicon
Image controller:

00:02.0 IDE interface: Silicon Image, Inc. SiI 0649 Ultra ATA/100 PCI to ATA Host Controller (rev 02) (prog-if 8f [Master SecP SecO PriP PriO])
	Subsystem: Silicon Image, Inc. SiI 0649 Ultra ATA/100 PCI to ATA Host Controller
	Flags: bus master, medium devsel, latency 64, IRQ 69
	I/O ports at 0d18 [size=8]
	I/O ports at 0d24 [size=4]
	I/O ports at 0d10 [size=8]
	I/O ports at 0d20 [size=4]
	I/O ports at 0d00 [size=16]
	Capabilities: [60] Power Management version 2
	Kernel driver in use: pata_cmd64x

The specific problem is that any access to the registers where the
secondary port should be causes an instant fault on the box (I think
because the second port just isn't wired up internally, so the memory
doesn't respond), so the default libata-sff driver that pata_cmd64x is
attached to causes this by insisting on probing both ports.

I can get all of this working by fixing up all the hard-coded knowledge
in libata-sff so that it uses only a single port.

However, I can't fix the libata-sff driver until I know how to tell
there's only one port wired.  Does anyone with cmd649 knowledge have any
idea how I might tell this?

James

Archive: http://lists.debian.org/1303084704.2583.23.ca...@mulgrave.site

Bug#622997: Debian bug 622997

2011-04-17 Thread James Bottomley

On Sun, 2011-04-17 at 21:25 -0400, John David Anglin wrote:
 This is excellent detective work.  If I might ask, how did you trace
 the module loads and successful inits?

Heh, you're expecting me to name magic tracing tools?  Well (shuffles
feet) I just put printks in kernel/module.c to do it.  It's basically
impossible to trace a boot problem like this any other way, because we
don't have enough of the system up to use any tools.
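A user-space sketch of the same technique: log a line before and after each module init call, then look for the last "calling" line with no matching "returned" line; that is the init that never came back (pata_cmd64x in this case).  The log format and function names are hypothetical; in the kernel the two log lines would be printk()s placed around the init call in kernel/module.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

typedef int (*initcall_t)(void);

/* Bracket one init call with a pair of log lines (stderr stands in for
 * the console). */
static int traced_init(const char *name, initcall_t init)
{
	int ret;

	fprintf(stderr, "init calling %s\n", name);
	ret = init();
	fprintf(stderr, "init returned %s (%d)\n", name, ret);
	return ret;
}

/* Scan captured console lines; return the name of the last module whose
 * "calling" line has no later "returned" line, or NULL if all returned. */
static const char *find_hung_init(const char *const lines[], size_t n)
{
	static const char call[] = "init calling ";
	static const char done[] = "init returned ";
	const size_t call_len = sizeof(call) - 1, done_len = sizeof(done) - 1;

	for (size_t i = n; i-- > 0; ) {
		if (strncmp(lines[i], call, call_len) != 0)
			continue;
		const char *name = lines[i] + call_len;
		size_t len = strlen(name);
		int returned = 0;

		for (size_t j = i + 1; j < n && !returned; j++) {
			if (strncmp(lines[j], done, done_len) != 0)
				continue;
			const char *r = lines[j] + done_len;
			/* whole-name match: "tg3" must not match "tg3x" */
			if (strncmp(r, name, len) == 0 &&
			    (r[len] == ' ' || r[len] == '\0'))
				returned = 1;
		}
		if (!returned)
			return name;
	}
	return NULL;
}

/* A stand-in init function for the usage example. */
static int demo_init(void) { return 0; }
```

The console output only needs to survive long enough for the last "calling" line to be seen, which is why this works even when the machine dies before any userspace tools are available.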

James

Archive: http://lists.debian.org/1303096956.2583.25.ca...@mulgrave.site

Bug#622997: Debian bug 622997

2011-04-16 Thread James Bottomley

On Sat, 2011-04-16 at 14:07 -0400, John David Anglin wrote:
 I posted this debian bug report because the most recent debian
 SMP kernel build fails to boot on my rp3440:
 http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=622997
 
 I don't think debian kernels have worked since lenny.

Hmm, well upstream ones have: so it's likely a patch debian has but
upstream doesn't, or it could be a toolchain issue ... I didn't think
gcc-4.4.5 worked properly on 64 bit without a few patches?

James

Archive: http://lists.debian.org/1302980251.4058.11.ca...@mulgrave.site

Bug#622997: Debian bug 622997

2011-04-16 Thread James Bottomley

On Sat, 2011-04-16 at 15:29 -0400, John David Anglin wrote:
  On Sat, 2011-04-16 at 14:07 -0400, John David Anglin wrote:
   I posted this debian bug report because the most recent debian
   SMP kernel build fails to boot on my rp3440:
   http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=622997
   
   I don't think debian kernels have worked since lenny.
  
  Hmm, well upstream ones have: so it's likely a patch debian has but
  upstream doesn't, or it could be a toolchain issue ... I didn't think
  gcc-4.4.5 worked properly on 64 bit without a few patches?
 
 Yes, but debian tends to build almost everything.  For some reason,
 I've turned off ipv6.  Unlike many kernel bugs, this one is completely
 reproducible.

I suppose it could be USB ... before I got ion, I didn't have any parisc
systems with USB, so it's turned off in my build.  I'll turn it on and
see if there's a problem there.

James

Archive: http://lists.debian.org/1302986887.4058.13.ca...@mulgrave.site

Bug#622997: Debian bug 622997

2011-04-16 Thread James Bottomley

On Sat, 2011-04-16 at 15:48 -0500, James Bottomley wrote:
 On Sat, 2011-04-16 at 15:29 -0400, John David Anglin wrote:
   On Sat, 2011-04-16 at 14:07 -0400, John David Anglin wrote:
I posted this debian bug report because the most recent debian
SMP kernel build fails to boot on my rp3440:
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=622997

I don't think debian kernels have worked since lenny.
   
   Hmm, well upstream ones have: so it's likely a patch debian has but
   upstream doesn't, or it could be a toolchain issue ... I didn't think
   gcc-4.4.5 worked properly on 64 bit without a few patches?
  
  Yes, but debian tends to build almost everything.  For some reason,
  I've turned off ipv6.  Unlike many kernel bugs, this one is completely
  reproducible.
 
 I suppose it could be USB ... before I got ion, I didn't have any parisc
 systems with USB, so it's turned off in my build.  I'll turn it on and
 see if there's a problem there.

Strike that one ... I enabled USB in my 2.6.39-rc3 build and it inserts
the OHCI module and discovers the ports just fine.

James

Archive: http://lists.debian.org/1302988479.7967.0.ca...@mulgrave.site

Bug#622997: Debian bug 622997

2011-04-16 Thread James Bottomley

On Sat, 2011-04-16 at 19:35 -0400, John David Anglin wrote: 
 On Sat, 16 Apr 2011, James Bottomley wrote:
 
  Strike that one ... I enabled USB in my 2.6.39-rc3 build and it inserts
  the OHCI module and discovers the ports just fine.
 
 Boot 2.6.39-rc3 fails for me with attached config.

I can't quite build it.  With gcc version 4.2.4 (Debian 4.2.4-6) I'm
getting an ICE:

net/wireless/reg.c: In function 'freq_reg_info_regd':
net/wireless/reg.c:645: internal compiler error: in expand_expr_real_1,
at expr.c:8744
Please submit a full bug report,
with preprocessed source if appropriate.
See URL:http://gcc.gnu.org/bugs.html for instructions.
For Debian GNU/Linux specific bug reporting instructions,
see URL:file:///usr/share/doc/gcc-4.2/README.Bugs.
make[2]: *** [net/wireless/reg.o] Error 1

Plus there's a bug in my kernel code:

drivers/usb/host/xhci-pci.c: In function 'xhci_pci_setup':
drivers/usb/host/xhci-pci.c:61: error: implicit declaration of function
'kzalloc

If I correct for these (add missing slab.h include and disable wireless)
and build, the last message I see is

turn off boot console ttyB0

Which indicates it's got a problem with the console configuration (I
don't see any console registration for the DIVA serial port on ttyS1 in
the boot log).

James

Archive: http://lists.debian.org/1303016224.5167.7.ca...@mulgrave.site

Bug#561203: threads and fork on machine with VIPT-WB cache

2010-04-06 Thread James Bottomley

On Tue, 2010-04-06 at 08:37 -0500, James Bottomley wrote:
  (5) Child process B is waken up and sees old value at x in
 oldpage,
  through different cache line.  B sleeps.
 
 This isn't possible.  At this point, A and B have the same virtual
 address and mapping for oldpage; this means they are the same cache
 colour, so they both see the cached value.

Perhaps to add more detail to this.  In spite of what the arch manual
says (it says the congruence stride is 16MB), the congruence stride on
all manufactured parisc processors is 4MB.  This means that any virtual
addresses, regardless of space id, that are equal modulo 4MB have the
same cache colour.
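The congruence rule can be written down directly.  A minimal sketch, assuming the 4MB stride stated above and 4KB pages (the page size and the function names are assumptions for illustration): the colour of a page is its page index within one stride, and the space ID is deliberately not an input because it does not affect the colour.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CONGRUENCE_STRIDE	(4UL << 20)	/* 4MB, per the text above */
#define TOY_PAGE_SIZE		4096UL		/* assumed page size */

/* Colour of the page containing vaddr: its page index within one stride. */
static unsigned long cache_colour(uint64_t vaddr)
{
	return (unsigned long)((vaddr % CONGRUENCE_STRIDE) / TOY_PAGE_SIZE);
}

/* Virtual addresses that are equal modulo the stride index the same cache
 * location, whatever their space IDs. */
static bool congruent(uint64_t va, uint64_t vb)
{
	return (va % CONGRUENCE_STRIDE) == (vb % CONGRUENCE_STRIDE);
}
```

So a parent and child mapping oldpage at the same virtual address are trivially congruent, while a kernel mapping of the same physical page at an unrelated address generally is not.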

James
 
Archive: http://lists.debian.org/1270561481.4493.40.ca...@mulgrave.site

Bug#561203: threads and fork on machine with VIPT-WB cache

2010-04-06 Thread James Bottomley

On Tue, 2010-04-06 at 13:57 +0900, NIIBE Yutaka wrote:
 John David Anglin wrote:
  It is interesting that in the case of the Debian bug that
  a thread of the parent process causes the COW break and thereby corrupts
  its own memory.  As far as I can tell, the fork'd child never writes
  to the memory that causes the fault.
 
 Thanks for writing and testing a patch.
 
 The case of #561203 is second scenario.  I think that this case is
 relevant to VIVT-WB machine too (provided kernel does copy by kernel
 address).
 
 James Bottomley wrote:
  So this is going to be a hard sell because of the arch churn. There are,
  however, three ways to do it with the original signature.
 
 Currently, I think that signature change would be inevitable for
 ptep_set_wrprotect.

Well we can't do it by claiming several architectures are wrong in their
implementation.  We might do it by claiming to need vma knowledge ...
however, even if you want the flush, as I said, you don't need to change
the signature.

   1. implement copy_user_highpage ... this allows us to copy through
  the child's page cache (which is coherent with the parent's
  before the cow) and thus pick up any cache changes without a
  flush
 
 Let me think about this way.
 
 Well, this would improve both cases of the first scenario of mine and
 the second scenario.
 
 But... I think that even if we would have copy_user_highpage which
 does copy by user address, we need to flush at ptep_set_wrprotect.  I
 think that we need to keep the condition: no dirty cache for COW page.
 
 Think about third scenario of threads and fork:
 
 (1) In process A, there are multiple threads, and a thread A-1 invokes
 fork.  We have process B, with a different space identifier color.

I don't understand what you mean by space colour ... there's cache
colour, which refers to the line in the cache to which the physical
memory maps.  The way PA is set up, space ID doesn't factor into cache
colour.

 (2) Another thread A-2 in process A runs while A-1 copies memory by
 dup_mmap.  A-2 writes to the address x in a page.  Let's call
 this page oldpage.
 
 (3) We have dirty cache for x by A-2 at the time of
 ptep_set_wrprotect of thread A-1.  Suppose that we don't flush
 here.
 
 (4) A-1 finishes copy, and sleeps.
 
 (5) Child process B is waken up and sees old value at x in oldpage,
 through different cache line.  B sleeps.

This isn't possible.  At this point, A and B have the same virtual
address and mapping for oldpage; this means they are the same cache
colour, so they both see the cached value.

James

 (6) A-2 is waken up.  A-2 touches the memory again, breaks COW.  A-2
 copies data on oldpage to newpage.  OK, newpage is
 consistent with copy_user_highpage by user address.
 
 Note that during this copy, the cache line of x by A-2 is
 flushed out to oldpage.  It invokes another memory fault and COW
 break.  (I think that this memory fault is unhealthy.)
 Then, new value goes to x on oldpage (when it's physically
 tagged cache).
 
 A-2 sleeps.
 
 (7) Child process B is waken up.  When it accesses at x, it sees new
 value suddenly.
 
 
 If we flush cache to oldpage at ptep_set_wrprotect, this couldn't
 occur.
 
 
   *   *   *
 
 
 I know that we should not do threads and fork.  It is difficult to
 define clean semantics.  Because another thread may touch memory while
 a thread which does memory copy for fork, the memory what the child
 process will see may be inconsistent.  For the child, a page might be
 new, while another page might be old.
 
 For VIVT-WB cache machine, I am considering a possibility for the
 child process to have inconsistent memory even within a single page
 (when we have no flush at ptep_set_wrprotect).
 
 It will be needed for me to talk to linux-arch soon or later.

Archive: http://lists.debian.org/1270561069.4493.29.ca...@mulgrave.site

Bug#561203: threads and fork on machine with VIPT-WB cache

2010-04-05 Thread James Bottomley

On Sun, 2010-04-04 at 22:51 -0400, John David Anglin wrote:
  Thanks a lot for the discussion.
  
  James Bottomley wrote:
   So your theory is that the data the kernel sees doing the page copy can
   be stale because of dirty cache lines in userspace (which is certainly
   possible in the ordinary way)?
  
  Yes.
  
   By design that shouldn't happen: the idea behind COW breaking is
   that before it breaks, the page is read only ... this means that
   processes can have clean cache copies of it, but never dirty cache
   copies (because writes are forbidden).
  
  That must be the design, I agree.
  
  To keep this condition (no dirty cache for COW page), we need to flush
  cache before ptep_set_wrprotect.  That's my point.
  
  Please look at the code path:
 (kernel/fork.c)
 do_fork - copy_process - copy_mm - dup_mm - dup_mmap -
 (mm/memory.c)
 copy_page_range - copy_p*d_range - copy_one_pte - ptep_set_wrprotect
  
  The function flush_cache_dup_mm is called from dup_mmap; that's enough
  for the case of a single-threaded process.
  I think that:
  We need to flush cache before ptep_set_wrprotect for a process with
  multiple threads.  Other threads may change memory after a thread
  invokes do_fork and before calling ptep_set_wrprotect.  Specifically,
  a process may sleep in the pte_alloc function to get a page.
 
 I agree.  It is interesting that, in the case of the Debian bug,
 a thread of the parent process causes the COW break and thereby corrupts
 its own memory.  As far as I can tell, the fork'd child never writes
 to the memory that causes the fault.
 
 My testing indicates that your suggested change fixes the Debian
 bug.  I've attached below my latest test version.  This seems to fix
 the bug on both SMP and UP kernels.
 
 However, it doesn't fix all page/cache related issues on parisc
 SMP kernels that I commonly see.
 
 My first inclination, even before reading your analysis, was
 to assume that copy_user_page was broken (i.e., that even if a
 processor cache was dirty when the COW page was write protected,
 it should be possible to do the flush before the page is copied).
 However, this didn't seem to work...  Possibly, there are issues
 with aliased addresses.
 
 I note that sparc flushes the entire cache and purges the entire
 tlb in kmap_atomic/kunmap_atomic for highmem.  Although the breakage
 that I see is not limited to PA8800/PA8900, I'm not convinced
 that we maintain coherency that is required for these processors
 in copy_user_page when we have multiple threads.
 
 As a side note, kmap_atomic/kunmap_atomic seem to lack calls to
 pagefault_disable()/pagefault_enable() on PA8800.
 
 Dave
 -- 
 J. David Anglin  dave.ang...@nrc-cnrc.gc.ca
 National Research Council of Canada  (613) 990-0752 (FAX: 952-6602)
 
 diff --git a/arch/parisc/include/asm/pgtable.h b/arch/parisc/include/asm/pgtable.h
 index a27d2e2..b140d5c 100644
 --- a/arch/parisc/include/asm/pgtable.h
 +++ b/arch/parisc/include/asm/pgtable.h
 @@ -14,6 +14,7 @@
  #include <linux/bitops.h>
  #include <asm/processor.h>
  #include <asm/cache.h>
 +extern void flush_cache_page(struct vm_area_struct *vma, unsigned long vmaddr, unsigned long pfn);
  
  /*
   * kern_addr_valid(ADDR) tests if ADDR is pointing to valid kernel
 @@ -456,17 +457,22 @@ static inline pte_t ptep_get_and_clear(struct mm_struct *mm, unsigned long addr,
   return old_pte;
  }
  
 -static inline void ptep_set_wrprotect(struct mm_struct *mm, unsigned long addr, pte_t *ptep)
 +static inline void ptep_set_wrprotect(struct vm_area_struct *vma, struct mm_struct *mm, unsigned long addr, pte_t *ptep)
  {
  #ifdef CONFIG_SMP
   unsigned long new, old;
 +#endif
 + pte_t old_pte = *ptep;
 +
 + if (atomic_read(&mm->mm_users) > 1)

Just to verify there's nothing this is hiding, can you make this 

if (pte_dirty(old_pte))

and reverify?  The if clause should only trip on the case where the
parent has dirtied the line between flush and now.

 + flush_cache_page(vma, addr, pte_pfn(old_pte));
  
 +#ifdef CONFIG_SMP
   do {
   old = pte_val(*ptep);
   new = pte_val(pte_wrprotect(__pte (old)));
   } while (cmpxchg((unsigned long *) ptep, old, new) != old);
  #else
 - pte_t old_pte = *ptep;
   set_pte_at(mm, addr, ptep, pte_wrprotect(old_pte));
  #endif
  }
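For readers without the parisc tree to hand, the patched ptep_set_wrprotect() can be modelled in user space. This is a sketch under stated assumptions: the PTE bit positions and the toy_* names are hypothetical, the cache flush is just a counter, and the flush condition is the pte_dirty() test suggested in the reply rather than the mm_users check in the patch. C11 atomic_compare_exchange_weak stands in for the kernel's cmpxchg() in the CONFIG_SMP branch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical bit layout; not the real PA-RISC PTE format. */
#define TOY_PTE_WRITE (1ul << 0)
#define TOY_PTE_DIRTY (1ul << 1)

static unsigned flushes; /* counts calls to the stand-in flush below */

static void toy_flush_cache_page(void) { flushes++; }

static bool toy_pte_dirty(unsigned long pte) { return pte & TOY_PTE_DIRTY; }

/* Flush only when the PTE is dirty, then clear the write bit with a
 * compare-and-swap loop, as the CONFIG_SMP branch does. */
static void toy_ptep_set_wrprotect(_Atomic unsigned long *ptep)
{
    unsigned long old = atomic_load(ptep);

    if (toy_pte_dirty(old))
        toy_flush_cache_page();

    /* On failure, 'old' is reloaded with the current value, so the new
     * value is recomputed against whatever another CPU wrote. */
    while (!atomic_compare_exchange_weak(ptep, &old, old & ~TOY_PTE_WRITE))
        ;
}
```

A clean, writable PTE is write-protected without a flush; a dirty one triggers exactly one.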
 diff --git a/mm/memory.c b/mm/memory.c
 index 09e4b1b..21c2916 100644
 --- a/mm/memory.c
 +++ b/mm/memory.c
 @@ -616,7 +616,7 @@ copy_one_pte(struct mm_struct *dst_mm, struct mm_struct *src_mm,
    * in the parent and the child
    */
   if (is_cow_mapping(vm_flags)) {
 - ptep_set_wrprotect(src_mm, addr, src_pte);
 + ptep_set_wrprotect(vma, src_mm, addr, src_pte);

So this is going to be a hard sell because of the arch churn. There are,
however, three ways to do it with the original signature.

 1. implement copy_user_highpage ... this allows us

Bug#545229: linux-image-2.6.30-1-parisc: panic on boot

2009-09-05 Thread James Bottomley

Package: linux-image-2.6.30-1-parisc
Version: 2.6.30-6
Severity: critical
Tags: patch
Justification: breaks the whole system



-- Package-specific info:

-- System Information:
Debian Release: squeeze/sid
  APT prefers testing
  APT policy: (650, 'testing'), (500, 'stable')
Architecture: hppa (parisc)

Kernel: Linux 2.6.26-2-parisc
Locale: LANG=C, LC_CTYPE=C (charmap=ANSI_X3.4-1968)
Shell: /bin/sh linked to /bin/bash

Versions of packages linux-image-2.6.30-1-parisc depends on:
ii  debconf [debconf-2.0] 1.5.27 Debian configuration management sy
ii  initramfs-tools [linux-initra 0.93.4 tools for generating an initramfs
ii  module-init-tools 3.10-3 tools for managing Linux kernel mo

linux-image-2.6.30-1-parisc recommends no packages.

Versions of packages linux-image-2.6.30-1-parisc suggests:
pn  linux-doc-2.6.30  <none>  (no description available)
ii  palo  1.16+nmu1  Linux boot loader for parisc/hppa

-- debconf information:
  linux-image-2.6.30-1-parisc/postinst/kimage-is-a-directory:
  linux-image-2.6.30-1-parisc/postinst/old-initrd-link-2.6.30-1-parisc: true
  linux-image-2.6.30-1-parisc/preinst/lilo-has-ramdisk:
  linux-image-2.6.30-1-parisc/preinst/abort-overwrite-2.6.30-1-parisc:
  linux-image-2.6.30-1-parisc/postinst/old-system-map-link-2.6.30-1-parisc: true
  linux-image-2.6.30-1-parisc/preinst/failed-to-move-modules-2.6.30-1-parisc:
  linux-image-2.6.30-1-parisc/prerm/removing-running-kernel-2.6.30-1-parisc: true
  linux-image-2.6.30-1-parisc/postinst/bootloader-test-error-2.6.30-1-parisc:
  linux-image-2.6.30-1-parisc/postinst/create-kimage-link-2.6.30-1-parisc: true
  linux-image-2.6.30-1-parisc/postinst/depmod-error-initrd-2.6.30-1-parisc: false
  shared/kernel-image/really-run-bootloader: true
  linux-image-2.6.30-1-parisc/preinst/lilo-initrd-2.6.30-1-parisc: true
  linux-image-2.6.30-1-parisc/postinst/old-dir-initrd-link-2.6.30-1-parisc: true
  linux-image-2.6.30-1-parisc/preinst/elilo-initrd-2.6.30-1-parisc: true
  linux-image-2.6.30-1-parisc/preinst/overwriting-modules-2.6.30-1-parisc: true
  linux-image-2.6.30-1-parisc/preinst/abort-install-2.6.30-1-parisc:
  linux-image-2.6.30-1-parisc/postinst/bootloader-error-2.6.30-1-parisc:
  linux-image-2.6.30-1-parisc/preinst/bootloader-initrd-2.6.30-1-parisc: true
  linux-image-2.6.30-1-parisc/postinst/depmod-error-2.6.30-1-parisc: false
  linux-image-2.6.30-1-parisc/preinst/initrd-2.6.30-1-parisc:
  
  linux-image-2.6.30-1-parisc/prerm/would-invalidate-boot-loader-2.6.30-1-parisc: true

---

All current debian 2.6.30-1 kernels panic on boot on parisc systems when
loading the initial modules.

The problem is actually caused by binutils outputting duplicate .text
section names.  However, this trips a panic on boot because
kernel/module.c has insufficient error checking for this case.
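The trigger is a module containing two sections with the same name. As a minimal sketch, assuming the section names have already been read out of the ELF section-header string table into an array (that step is omitted), spotting the duplicate is a simple pairwise scan. first_duplicate_section is a hypothetical helper, not a kernel function.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns the index of the first section whose name repeats an earlier
 * one, or -1 if all names are unique.  O(n^2), which is fine for the
 * few dozen sections a module typically has. */
static int first_duplicate_section(const char *const names[], size_t n)
{
    assert(names != NULL || n == 0);
    for (size_t i = 1; i < n; i++)
        for (size_t j = 0; j < i; j++)
            if (strcmp(names[i], names[j]) == 0)
                return (int)i;
    return -1;
}
```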

The patches that fix this are:

From 1b364bf438cf337a3818aee77d68c0713f3e1fc4 Mon Sep 17 00:00:00 2001
From: James Bottomley james.bottom...@hansenpartnership.com
Date: Wed, 26 Aug 2009 22:04:12 +0930
Subject: module: workaround duplicate section names

and to fix up that patch

From ea6bff368548d79529421a9dc0710fc5330eb504 Mon Sep 17 00:00:00 2001
From: Ingo Molnar mi...@elte.hu
Date: Fri, 28 Aug 2009 10:44:56 +0200
Subject: modules: Fix build error in the !CONFIG_KALLSYMS case
From 1b364bf438cf337a3818aee77d68c0713f3e1fc4 Mon Sep 17 00:00:00 2001
From: James Bottomley james.bottom...@hansenpartnership.com
Date: Wed, 26 Aug 2009 22:04:12 +0930
Subject: module: workaround duplicate section names

The root cause is a duplicate section name (.text); is this legal?
[ Amerigo Wang: AFAIK, yes. ]

However, there's a problem with commit
6d76013381ed28979cd122eb4b249a88b5e384fa in that if you fail to allocate
a mod->sect_attrs (in this case it's null because of the duplication),
it still gets used without checking in add_notes_attrs().

This should fix it

[ This patch leaves other problems, particularly the sections directory,
  but recent parisc toolchains seem to produce these modules and this
  prevents a crash and is a minimal change -- RR ]

Signed-off-by: James Bottomley james.bottom...@suse.de
Signed-off-by: Rusty Russell ru...@rustcorp.com.au
Tested-by: Helge Deller del...@gmx.de
Signed-off-by: Linus Torvalds torva...@linux-foundation.org
---
 kernel/module.c |3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/kernel/module.c b/kernel/module.c
index 07c80e6..eccb561 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -2355,7 +2355,8 @@ static noinline struct module *load_module(void __user *umod,
 	if (err < 0)
 		goto unlink;
 	add_sect_attrs(mod, hdr->e_shnum, secstrings, sechdrs);
-	add_notes_attrs(mod, hdr->e_shnum, secstrings, sechdrs);
+	if (mod->sect_attrs)
+		add_notes_attrs(mod, hdr->e_shnum, secstrings, sechdrs);
 
 	/* Get rid of temporary copy */
 	vfree(hdr);
-- 
1.6.0.2

Bug#541702: linux-image-2.6.30-1-686: Kernel fails to start networking because no e100 firmware

2009-08-15 Thread James Bottomley

On Sat, 2009-08-15 at 19:21 +0100, Ben Hutchings wrote:
 On Sat, 2009-08-15 at 10:47 -0700, james.bottom...@hansenpartnership.com wrote:
  Package: linux-image-2.6.30-1-686
  Version: 2.6.30-5
  Severity: serious
  Justification: Policy 2.2.1
 
 That very same section explains why we cannot do what you are
 suggesting!

No, it doesn't ... the decision to put firmware-linux in non-free is
obviously wrong, since the same firmware was shipped as-is in main with
2.6.26-2.

  On upgrade from 2.6.26-2-686, networking (on a remote machine) failed to
  start, meaning that a support ticket had to be opened for KVM access.
 
 I don't recommend running unstable on production machines.

If you bother to read the bug report, you'd see it's actually running
testing.

  Diagnosis revealed that the e100 driver in 2.6.26-2-686 required no
  firmware, so the firmware-linux package wasn't installed.  Apparently
  2.6.30-1-686 was built with external firmware for the e100 so it now
  depends on the firmware-linux package.
  
  This is a serious policy violation because required hardware stops
  working after the upgrade.
 
 No, most systems do not require the firmware-linux package.

That's not really relevant, is it?  linux-image ships with a ton of
drivers most systems don't use as well.

The point is that what was working before the upgrade didn't work after
it.

  The suggested fix is to make 2.6.30-1-686 depend on firmware-linux so that
  on upgrade the necessary firmware is present.
 
 I intend to ensure that firmware-linux is mentioned in the release notes
 for squeeze, but it cannot be recommended or made a dependency.

So this amounts to ... assuming the user can find the notice (because
there's a blizzard of notices that go with each upgrade, particularly if
they're going from lenny -> squeeze) you'll tell them that you broke
their system?

The point here is to try and ensure large numbers of systems don't break
before this exits testing for stable.

James






Bug#541702: linux-image-2.6.30-1-686: Kernel fails to start networking because no e100 firmware

2009-08-15 Thread James Bottomley

OK, so let's go back to basics here.

The point of a bug report is to report a bug.  The bug here is that
large numbers of systems will break on upgrade to this kernel once it
hits stable.  This is the problem that needs fixing.

The fact that you find the suggested fix politically incorrect, or that
you don't think I should have been able to find the bug in the first
place, is irrelevant to the fact that the bug exists.

Apart from being appallingly bad release practice, breaking a
significant fraction of users on an upgrade is also a debian policy
violation as I've cited (the package is too buggy to release because of
all the breakage).

Trying to describe this as fixed because you'll put it in the release
notes is wrong in principle because it doesn't prevent the existing
users from suffering breakage a priori.

A pre-upgrade script that detected at runtime that the user needs
modules whose firmware is now in firmware-linux would be acceptable.
Just stop, print the warning and allow them to OK or cancel.  The list
of modules now requiring firmware surely isn't non-free, and it can be
derived from the linux build system fairly easily.
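A minimal sketch of the detection step such a script could perform, assuming it reads text in /proc/modules format (the module name is the first field on each line). The list needs_firmware_assumed is a placeholder: only e100 comes from this report, and the real list would have to be generated from the kernel build as suggested above. count_firmware_modules is a hypothetical name.

```c
#include <assert.h>
#include <string.h>

/* Placeholder list; the real one would be derived from the kernel build. */
static const char *const needs_firmware_assumed[] = { "e100", NULL };

/* Count loaded modules (first field of each /proc/modules-style line)
 * that appear in the list above.  A non-zero result is the cue to stop,
 * print a warning and ask the user to OK or cancel the upgrade. */
static int count_firmware_modules(const char *proc_modules)
{
    int hits = 0;
    const char *line = proc_modules;

    assert(proc_modules != NULL);
    while (*line) {
        size_t len = strcspn(line, " \n"); /* length of the module name */

        for (size_t i = 0; needs_firmware_assumed[i] != NULL; i++)
            if (strlen(needs_firmware_assumed[i]) == len &&
                strncmp(line, needs_firmware_assumed[i], len) == 0)
                hits++;

        const char *nl = strchr(line, '\n');
        if (nl == NULL)
            break;
        line = nl + 1;
    }
    return hits;
}
```

Only the name field is matched, so a module that merely lists e100 among its users is not counted.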






Bug#527265: [scott.bai...@eds.com: Bug#527265: linux-image-2.6.29-1-alpha-smp: detached firmware qlogic/1040.bin fails to load for qla1280]

2009-05-06 Thread James Bottomley

On Wed, 2009-05-06 at 16:19 +0200, maximilian attems wrote:
  [4194023.390744] [ cut here ]
  [4194023.448362] WARNING: at 
 /build/buildd-linux-2.6_2.6.29-3-alpha-bvFcox/linux-2.6-2.6.29/debian/build/source_alpha_none/kernel/so

Is there any way we can find out what that file and line actually are?
It looks like the kernel build hasn't truncated the path name to top of
tree for some reason (did you build with non-standard options)?

I suspect it might just be a lockdep error about calling
request_firmware with interrupts disabled.

Could you also check to see you have this fix in your kernel:

commit 0ce49d6da993adf8b17b7f3ed9805ade14a6a6f3
Author: David Woodhouse david.woodho...@intel.com
Date:   Wed Apr 8 01:22:36 2009 -0700

qla1280: Fix off-by-some error in firmware loading.

James






Re: Debian parisc config for 2.6.26 broke the real time clock

2008-09-09 Thread James Bottomley

On Tue, 2008-09-09 at 10:58 -0600, dann frazier wrote:
 On Mon, Sep 08, 2008 at 11:58:22AM -0500, James Bottomley wrote:
  On Mon, 2008-09-08 at 18:37 +0200, Bastian Blank wrote:
   On Sat, Sep 06, 2008 at 10:06:26AM -0500, James Bottomley wrote:
Parisc is a CONFIG_GEN_RTC architecture (we use the generic real time
clock driver).
   
   Well.
   
Starting with 2.6.26, debian is now enabling
CONFIG_RTC_CLASS (for platforms with specific RTC drivers) which
disables CONFIG_GEN_RTC and means that hwclock (and ntp tracking) are
broken on parisc with debian kernels 2.6.26 and above.
   
   Yes. Most arches already need it anyway.
   
All of the arch/parisc/config files get this right, so someone at debian
must have screwed up somehow.  The config option CONFIG_RTC_CLASS must
be set to 'N' for all parisc systems.
   
   Apparently hppa uses its special rtc type.  I propose that you take a
   look at drivers/rtc/rtc-ppc.c and write a wrapper rtc module.
  
  That may be the way forwards depending on what the RTC developers say,
  but it certainly won't fix the 2.6.26 regression.
  
   I'd suggest checking the debian
kernel configs against the in-tree default files to see if there are any
other cockups like this.
   
   If there are, these are bugs in the kernel itself. I'm no hppa
   developer so I won't waste my time with such.
  
  Hardly, the bug is actually in the debian configs.  You have
  CONFIG_RTC_CLASS as a generic override.  This is wrong, it needs to be
  subordinate to CONFIG_GEN_RTC.  The quick fix would be to move the
  CONFIG_RTC_CLASS sequence down from generic to the architectures ... if
  you're incapable of doing that, I might be able to find time to look at
  doing it for you.
  
  The real bug looks to be the debian config system which relies on
  concatenation ... what's really needed is a way of turning
  CONFIG_RTC_CLASS off on parisc while keeping RTC_CLASS generic for those
  architectures that need it.
 
 fyi, I've got an hppa build in progress - disabling RTC_CLASS causes
 the symbols below to be removed (essentially, ^rtc_*)

Thanks Dann ... are you the person in charge of the builds then?

 I'm not sure what's behind the hppa/ABI removal commits - I didn't see
 this discussed on the list - but I think it should be safe to remove
 these symbols from the ABI files if we can demonstrate that none of
 the conglomerate packages use them.
 
 rtc_class_close
 rtc_class_open
 rtc_device_register
 rtc_device_unregister
 rtc_irq_register
 rtc_irq_set_freq
 rtc_irq_set_state
 rtc_irq_unregister
 rtc_month_days
 rtc_read_alarm
 rtc_read_time
 rtc_set_alarm
 rtc_set_mmss
 rtc_set_time
 rtc_time_to_tm
 rtc_tm_to_time
 rtc_update_irq
 rtc_valid_tm
 rtc_year_days

They certainly have to be inessential to the parisc ABI ... they don't
work if anything's actually trying to use them.

James




Re: Debian parisc config for 2.6.26 broke the real time clock

2008-09-09 Thread James Bottomley

On Tue, 2008-09-09 at 19:38 +0200, Bastian Blank wrote:
 On Tue, Sep 09, 2008 at 10:58:35AM -0600, dann frazier wrote:
  fyi, I've got an hppa build in progress - disabling RTC_CLASS causes
  the symbols below to be removed (essentially, ^rtc_*)
 
 The whole new-style rtc support.

But it doesn't work, that's rather the point of all of this.  If it
doesn't work, it can hardly be a committed ABI.

  I'm not sure what's behind the hppa/ABI removal commits - I didn't see
  this discussed on the list - but I think it should be safe to remove
  these symbols from the ABI files if we can demonstrate that none of
  the conglomerate packages use them.
 
 Why does no one write this 30-line module and fix it for all?

Because only an idiot would fix a bug in a released product by
introducing a new feature: that's QA and release process 101.  New
features belong in the next merge window, which will be for 2.6.28, by
which time it should have been reasonably QA'd by the parisc developers.

James




Re: Debian parisc config for 2.6.26 broke the real time clock

2008-09-09 Thread James Bottomley

On Tue, 2008-09-09 at 20:01 +0200, Bastian Blank wrote:
 On Tue, Sep 09, 2008 at 12:48:35PM -0500, James Bottomley wrote:
  They certainly have to be inessential to the parisc ABI ... they don't
  work if anything's actually trying to use them.
 
 Really? Which sort of don't work is this? Why should an I2C rtc device
 (some dallas chip) not work?

Um, because the architecture doesn't have an i2c bus.

James




Re: Debian parisc config for 2.6.26 broke the real time clock

2008-09-09 Thread James Bottomley

On Tue, 2008-09-09 at 20:29 +0200, Bastian Blank wrote:
 On Tue, Sep 09, 2008 at 01:12:01PM -0500, James Bottomley wrote:
  On Tue, 2008-09-09 at 20:01 +0200, Bastian Blank wrote:
   On Tue, Sep 09, 2008 at 12:48:35PM -0500, James Bottomley wrote:
They certainly have to be inessential to the parisc ABI ... they don't
work if anything's actually trying to use them.
   Really? Which sort of don't work is this? Why should an I2C rtc device
   (some dallas chip) not work?
  Um, because the architecture doesn't have an i2c bus.
 
 Well, it has USB, so it can also power usb-to-i2c adapters. And there is
 even the rtc test module.

Um, you mean i2c_tiny_usb?  It doesn't drive any supported hardware ...
you have to build the connection yourself.  Plus only the latest revs of
PA actually supported USB ...

 Which don't work do you refer to?
 - Does not work because there is no binding to the hardware.
 - Does not work because a fundamental problem in the whole subsystem.
 (- Does not work because ...)

Well, like most real world systems, you can artificially construct
pathological failure cases.  If I were you I'd stop looking for the
Heath Robinson ones.  No-one in their right mind is going to construct a
USB to I2C interface for the purpose of running an I2C RTC; the set of
users is clearly empty.

The way you would get an external RTC via a more credible interface
like PCI (or EISA/ISA) is from a watchdog card ... however, no-one's
apparently written an RTC interface for any of those yet.

James




Re: Debian parisc config for 2.6.26 broke the real time clock

2008-09-08 Thread James Bottomley

On Mon, 2008-09-08 at 18:37 +0200, Bastian Blank wrote:
 On Sat, Sep 06, 2008 at 10:06:26AM -0500, James Bottomley wrote:
  Parisc is a CONFIG_GEN_RTC architecture (we use the generic real time
  clock driver).
 
 Well.
 
  Starting with 2.6.26, debian is now enabling
  CONFIG_RTC_CLASS (for platforms with specific RTC drivers) which
  disables CONFIG_GEN_RTC and means that hwclock (and ntp tracking) are
  broken on parisc with debian kernels 2.6.26 and above.
 
 Yes. Most arches already need it anyway.
 
  All of the arch/parisc/config files get this right, so someone at debian
  must have screwed up somehow.  The config option CONFIG_RTC_CLASS must
  be set to 'N' for all parisc systems.
 
 Apparently hppa uses its special rtc type.  I propose that you take a
 look at drivers/rtc/rtc-ppc.c and write a wrapper rtc module.

That may be the way forwards depending on what the RTC developers say,
but it certainly won't fix the 2.6.26 regression.

 I'd suggest checking the debian
  kernel configs against the in-tree default files to see if there are any
  other cockups like this.
 
 If there are, these are bugs in the kernel itself. I'm no hppa
 developer so I won't waste my time with such.

Hardly, the bug is actually in the debian configs.  You have
CONFIG_RTC_CLASS as a generic override.  This is wrong, it needs to be
subordinate to CONFIG_GEN_RTC.  The quick fix would be to move the
CONFIG_RTC_CLASS sequence down from generic to the architectures ... if
you're incapable of doing that, I might be able to find time to look at
doing it for you.

The real bug looks to be the debian config system which relies on
concatenation ... what's really needed is a way of turning
CONFIG_RTC_CLASS off on parisc while keeping RTC_CLASS generic for those
architectures that need it.
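The concatenation problem can be sketched as a resolver over the concatenated fragments, assuming last-assignment-wins semantics (kconfig warns about, then honours, a later reassignment of the same symbol). resolve_option is a hypothetical helper, not part of the Debian or kernel tooling. With only the generic fragment, CONFIG_RTC_CLASS resolves to y; an architecture fragment appended afterwards with "# CONFIG_RTC_CLASS is not set" turns it back off for parisc alone.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Resolve OPT (e.g. "CONFIG_RTC_CLASS") in .config-style text built by
 * concatenating fragments.  The last line mentioning OPT wins:
 * "OPT=val" yields val, "# OPT is not set" yields "n"; an option that
 * never appears also reads as "n".  The result is copied into out. */
static void resolve_option(const char *text, const char *opt,
                           char *out, size_t outlen)
{
    char unset[128];
    size_t optlen = strlen(opt);
    size_t unsetlen;

    assert(outlen > 1 && optlen + 16 < sizeof unset);
    snprintf(out, outlen, "n");
    snprintf(unset, sizeof unset, "# %s is not set", opt);
    unsetlen = strlen(unset);

    for (const char *line = text; *line != '\0'; ) {
        size_t len = strcspn(line, "\n");

        /* require '=' right after the name so CONFIG_X_Y never matches CONFIG_X */
        if (len > optlen && strncmp(line, opt, optlen) == 0 &&
            line[optlen] == '=')
            snprintf(out, outlen, "%.*s", (int)(len - optlen - 1),
                     line + optlen + 1);
        else if (len == unsetlen && strncmp(line, unset, len) == 0)
            snprintf(out, outlen, "n");

        line += len;              /* advance to the newline or final NUL */
        if (*line == '\n')
            line++;
    }
}
```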

James




Bug#476285: linux-image-2.6.24-1-parisc: panics on boot in cmpxchg_futex_value_locked

2008-04-15 Thread James Bottomley

Package: linux-image-2.6.24-1-parisc
Version: 2.6.24-5
Severity: critical
Tags: patch
Justification: breaks the whole system


This actually isn't just a bug in debian; it affects every distro which
uses the stable tree as a base.

For instance, the gentoo bug is here:

http://bugs.gentoo.org/show_bug.cgi?id=217030

The panic is:

backtrace:
 [10587970] init+0x20/0xc4
 [105807e0] kernel_init+0xf4/0x328
 [10109c5c] ret_from_kernel_thread+0x1c/0x24


Kernel Fault: Code=26 regs=8fc241c0 (Addr=)

 YZrvWESTHLNXBCVMcbcbcbcbOGFRQPDI
PSW: 0100 Not tainted
r00-03  0004ff0f 104fc140 10587970 f0412000
r04-07   105b57c0  
r08-11   1059b810 105b5810 104c3810
r12-15  10568810 1059b810 8fc24088 3b9aca00
r16-19  f8c4 f17c f174 
r20-23  4000 07ff 10587950 0001
r24-27     104c6010
r28-31  8fc24000 c99f4bdd 8fc241c0 105807e0
sr00-03     
sr04-07     

IASQ:   IAOQ: 101433b8 101433bc 
 IIR: 0f401089ISR:   IOR:   
 CPU:0   CR30: 8fc24000 CR31:   
 ORIG_R28:  
 IAOQ[0]: cmpxchg_futex_value_locked+0x28/0x9c  
 IAOQ[1]: cmpxchg_futex_value_locked+0x2c/0x9c  
 RP(r2): init+0x20/0xc4 
Kernel panic - not syncing: Kernel Fault   


The root cause is a backport of this commit:

commit a0c1e9073ef7428a14309cba010633a6cd6719ea
Author: Thomas Gleixner [EMAIL PROTECTED]
Date:   Sat Feb 23 15:23:57 2008 -0800

futex: runtime enable pi and robust functionality

To the stable tree (went in for 2.6.24.4).  This breaks parisc because
we weren't set up to process NULL as a futex cmpxchg address.  We
found and fixed the bug upstream as:

commit c20a84c91048c76c1379011c96b1a5cee5c7d9a0
Author: Kyle McMartin [EMAIL PROTECTED]
Date:   Sat Mar 1 10:25:52 2008 -0800

[PARISC] futex: special case cmpxchg NULL in kernel space

but, because we didn't know tglx had requested a backport, the fix
wasn't backported to stable.

I'll send the necessary patch into stable, but to get parisc working
again on debian it has to be applied on top of the current kernel.

NOTE: This bug was introduced into 2.6.24.4; 2.6.24.3 doesn't have it.


-- System Information:
Debian Release: lenny/sid
  APT prefers testing
  APT policy: (650, 'testing')
Architecture: hppa (parisc)

Kernel: Linux 2.6.22-3-parisc
Locale: LANG=C, LC_CTYPE=C (charmap=ANSI_X3.4-1968)
Shell: /bin/sh linked to /bin/bash

Versions of packages linux-image-2.6.24-1-parisc depends on:
ii  debconf [debconf-2.0]1.5.20  Debian configuration management sy
ii  initramfs-tools [linux-initr 0.91e   tools for generating an initramfs
ii  module-init-tools3.3-pre11-4 tools for managing Linux kernel mo

linux-image-2.6.24-1-parisc recommends no packages.

-- debconf information excluded

*** parisc-cmpxchg-fix.diff
From c8d402df60b3aad85b30cfe7df20f829ef6eb895 Mon Sep 17 00:00:00 2001
From: Kyle McMartin [EMAIL PROTECTED]
Date: Sat, 1 Mar 2008 10:25:52 -0800
Subject: [PARISC] futex: special case cmpxchg NULL in kernel space

Commit a0c1e9073ef7428a14309cba010633a6cd6719ea added code to futex.c
to detect whether futex_atomic_cmpxchg_inatomic was implemented at run
time:

+   curval = cmpxchg_futex_value_locked(NULL, 0, 0);
+   if (curval == -EFAULT)
+   futex_cmpxchg_enabled = 1;

This is bogus on parisc, since page zero in kernel virtual space is the
gateway page for syscall entry, and should not be read from the kernel.
(That, and we really don't like the kernel faulting on its own address
 space...)

Signed-off-by: Kyle McMartin [EMAIL PROTECTED]
---
 include/asm-parisc/futex.h |   10 ++++++++--
 1 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/include/asm-parisc/futex.h b/include/asm-parisc/futex.h
index dbee6e6..fdc6d05 100644
--- a/include/asm-parisc/futex.h
+++ b/include/asm-parisc/futex.h
@@ -56,6 +56,12 @@ futex_atomic_cmpxchg_inatomic(int __user *uaddr, int oldval, int newval)
 	int err = 0;
 	int uval;
 
+	/* futex.c wants to do a cmpxchg_inatomic on kernel NULL, which is
+	 * our gateway page, and causes no end of trouble...
+	 */
+	if (segment_eq(KERNEL_DS, get_fs()) && !uaddr)
+		return -EFAULT;
+
 	if (!access_ok(VERIFY_WRITE, uaddr, sizeof(int)))
 		return -EFAULT;
 
@@ -67,5 +73,5 @@ futex_atomic_cmpxchg_inatomic(int __user *uaddr, int oldval, int newval)
 	return uval;
 }
 
-#endif
-#endif
+#endif /*__KERNEL__*/
+#endif /*_ASM_PARISC_FUTEX_H*/
-- 
1.5.3.8
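The runtime probe from commit a0c1e9073ef7 and the parisc fix can be modelled in user space. This is a simplified sketch: toy_cmpxchg_inatomic stands in for futex_atomic_cmpxchg_inatomic, the implemented flag stands in for whether the architecture provides one (the generic fallback of that era returns -ENOSYS), and the KERNEL_DS test from the patch is collapsed into a plain NULL check. The point is that the probe only needs -EFAULT back; with the fix, parisc returns it without touching the gateway page.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for the arch cmpxchg helper; not a kernel API. */
static int toy_cmpxchg_inatomic(int *uaddr, int oldval, int newval,
                                bool implemented)
{
    if (!implemented)
        return -ENOSYS;  /* no arch support: the probe must not enable */
    /* The fix: refuse NULL up front instead of dereferencing it
     * (on parisc, kernel page zero is the gateway page). */
    if (uaddr == NULL)
        return -EFAULT;
    int cur = *uaddr;
    if (cur == oldval)
        *uaddr = newval;
    return cur;          /* caller compares this with oldval */
}

/* The probe: PI/robust futexes are enabled only if a NULL cmpxchg
 * reports -EFAULT, i.e. the operation exists and checked the address. */
static bool toy_futex_cmpxchg_enabled(bool implemented)
{
    return toy_cmpxchg_inatomic(NULL, 0, 0, implemented) == -EFAULT;
}
```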




Bug#476292: linux-image-2.6.24-1-parisc64: 64 bit kernel panics on boot in handle_interruption

2008-04-15 Thread James Bottomley

Package: linux-image-2.6.24-1-parisc64
Version: 2.6.24-5
Severity: critical
Tags: patch
Justification: breaks the whole system


The parisc 64 bit kernel panics on boot with this:

  CC  net/ipv4/netfilter/iptable_raw.mod.o
  CC  net/ipv4/tcp_diag.mod.o
  CC  net/ipv4/tunnel4.mod.o
  CC  net/ipv4/xfrm4_mode_beet.mod.o
  CC  net/ipv4/xfrm4_tunnel.mod.o
  CC  net/key/af_key.mod.o
  CC  net/llc/llc.mod.o
  CC  net/llc/llc2.mod.o
  CC  net/netfilter/nfnetlink_log.mod.o
  CC  net/netfilter/nfnetlink.mod.o
  CC  net/netfilter/nfnetlink_queue.mod.o
  CC  net/netfilter/xt_CLASSIFY.mod.o
  CC  net/netfilter/x_tables.mod.o
  CC  net/netfilter/xt_DSCP.mod.o
  CC  net/netfilter/xt_MARK.mod.o
  CC  net/netfilter/xt_NFQUEUE.mod.o
  CC  net/netfilter/xt_comment.mod.o
  CC  net/netfilter/xt_dccp.mod.o
  CC  net/netfilter/xt_dscp.mod.o
  CC  net/netfilter/xt_esp.mod.o
  CC  net/netfilter/xt_length.mod.o
  CC  net/netfilter/xt_limit.mod.o
  CC  net/netfilter/xt_mac.mod.o
  CC  net/netfilter/xt_mark.mod.o
  CC  net/netfilter/xt_multiport.mod.o
  CC  net/netfilter/xt_pkttype.mod.o
  CC  net/netfilter/xt_policy.mod.o
  CC  net/netfilter/xt_realm.mod.o
  CC  net/netfilter/xt_sctp.mod.o
  CC  net/netfilter/xt_string.mod.o
  CC  net/netfilter/xt_tcpmss.mod.o
  CC  net/netfilter/xt_tcpudp.mod.o
  CC  net/packet/af_packet.mod.o
  CC  net/sctp/sctp.mod.o
  CC  net/sunrpc/auth_gss/auth_rpcgss.mod.o
  CC  net/sunrpc/auth_gss/rpcsec_gss_krb5.mod.o
  CC  net/sunrpc/auth_gss/rpcsec_gss_spkm3.mod.o
  CC  net/sunrpc/sunrpc.mod.o
  CC  net/tipc/tipc.mod.o
  CC  net/xfrm/xfrm_user.mod.o
  CC  sound/ac97_bus.mod.o
  CC  sound/core/oss/snd-mixer-oss.mod.o
  CC  sound/core/oss/snd-pcm-oss.mod.o
  CC  sound/core/seq/oss/snd-seq-oss.mod.o
  CC  sound/core/seq/snd-seq-device.mod.o
  CC  sound/core/seq/snd-seq-dummy.mod.o
  CC  sound/core/seq/snd-seq-midi-event.mod.o
  CC  sound/core/seq/snd-seq-midi.mod.o
  CC  sound/core/seq/snd-seq.mod.o
  CC  sound/core/snd-hwdep.mod.o
  CC  sound/core/snd-page-alloc.mod.o
  CC  sound/core/snd-pcm.mod.o
  CC  sound/core/snd-rawmidi.mod.o
  CC  sound/core/snd-timer.mod.o
  CC  sound/core/snd.mod.o
  CC  sound/parisc/snd-harmony.mod.o
  CC  sound/pci/ac97/snd-ac97-codec.mod.o
  CC  sound/pci/rme9652/snd-hdspm.mod.o
  CC  sound/pci/snd-ad1889.mod.o
  LD [M]  crypto/aes_generic.ko
  CC  sound/soundcore.mod.o
  LD [M]  crypto/anubis.ko
  LD [M]  crypto/arc4.ko
  LD [M]  crypto/blkcipher.ko
  LD [M]  crypto/blowfish.ko
  LD [M]  crypto/cast5.ko
  LD [M]  crypto/cast6.ko
  LD [M]  crypto/cbc.ko
  LD [M]  crypto/crc32c.ko
  LD [M]  crypto/crypto_null.ko
  LD [M]  crypto/deflate.ko
  LD [M]  crypto/des_generic.ko
  LD [M]  crypto/ecb.ko
  LD [M]  crypto/khazad.ko
  LD [M]  crypto/gf128mul.ko
  LD [M]  crypto/md4.ko
  LD [M]  crypto/md5.ko
  LD [M]  crypto/michael_mic.ko
  LD [M]  crypto/serpent.ko
  LD [M]  crypto/sha256_generic.ko
  LD [M]  crypto/sha512.ko
  LD [M]  crypto/tcrypt.ko
  LD [M]  crypto/tea.ko
  LD [M]  crypto/tgr192.ko
  LD [M]  crypto/twofish.ko
  LD [M]  crypto/twofish_common.ko
  LD [M]  crypto/wp512.ko
  LD [M]  drivers/base/firmware_class.ko
  LD [M]  drivers/block/aoe/aoe.ko
  LD [M]  drivers/block/cryptoloop.ko
  LD [M]  drivers/block/loop.ko
  LD [M]  drivers/block/pktcdvd.ko
  LD [M]  drivers/block/sx8.ko
  LD [M]  drivers/block/ub.ko
  LD [M]  drivers/block/umem.ko
  LD [M]  drivers/cdrom/cdrom.ko
  LD [M]  drivers/char/lp.ko
  LD [M]  drivers/char/agp/parisc-agp.ko
  LD [M]  drivers/char/raw.ko
  LD [M]  drivers/hid/usbhid/usbhid.ko
  LD [M]  drivers/input/keyboard/hil_kbd.ko
  LD [M]  drivers/input/keyboard/hilkbd.ko
  LD [M]  drivers/input/misc/hp_sdc_rtc.ko
  LD [M]  drivers/input/misc/uinput.ko
  LD [M]  drivers/input/mouse/hil_ptr.ko
  LD [M]  drivers/input/mouse/psmouse.ko
  LD [M]  drivers/input/mouse/sermouse.ko
  LD [M]  drivers/input/serio/parkbd.ko
  LD [M]  drivers/input/serio/pcips2.ko
  LD [M]  drivers/input/serio/serio_raw.ko
  LD [M]  drivers/md/dm-crypt.ko
  LD [M]  drivers/input/serio/serport.ko
  LD [M]  drivers/md/dm-emc.ko
  LD [M]  drivers/md/dm-mirror.ko
  LD [M]  drivers/md/dm-mod.ko
  LD [M]  drivers/md/dm-multipath.ko
  LD [M]  drivers/md/dm-round-robin.ko
  LD [M]  drivers/md/dm-snapshot.ko
  LD [M]  drivers/md/dm-zero.ko
  LD [M]  drivers/md/faulty.ko
  LD [M]  drivers/md/linear.ko
  LD [M]  drivers/md/md-mod.ko
  LD [M]  drivers/md/multipath.ko
  LD [M]  drivers/md/raid1.ko
  LD [M]  drivers/md/raid0.ko
  LD [M]  drivers/md/raid10.ko
  LD [M]  drivers/message/fusion/mptbase.ko
  LD [M]  drivers/message/fusion/mptctl.ko
  LD [M]  drivers/message/fusion/mptfc.ko
  LD [M]  drivers/message/fusion/mptsas.ko
  LD [M]  drivers/message/fusion/mptscsih.ko
  LD [M]  drivers/message/fusion/mptspi.ko
  LD [M]  drivers/net/3c59x.ko
  LD [M]

Bug#374792: Dell CERC ATA100/4ch support

2007-05-16 Thread James Bottomley

On Wed, 2007-05-16 at 14:36 +0100, Leigh Blackwell wrote:
> I have been looking at the issue with these CERC devices; has this
> bug 374792 been closed based on people reverting the firmware to 6.61?
>
> Unfortunately Dell doesn't support a firmware version that old on our
> server; is it possible to re-open this bug? I have been unable to get the
> current etch install to recognize my drive controller with any of the
> megaraid drivers.

Umm, but this is a bug in Dell Support, isn't it?  I don't think there's
a kernel fix for that.

LSI's position is that in current kernels they only support this device
with the new megaraid driver and only for firmware version <= 6.61.
Surely you just need to get Dell and LSI on the same page?

James




-- 
To UNSUBSCRIBE, email to [EMAIL PROTECTED]
with a subject of unsubscribe. Trouble? Contact [EMAIL PROTECTED]

Bug#391384: linux-image-2.6.18-1-686: Compaq Proliant DL380 fails to boot

2006-10-08 Thread James Bottomley

On Sun, 2006-10-08 at 14:40 -0700, Matt Taggart wrote:
> dann frazier writes...
>
> > hey Grant/James,
> > It looks like we're still having cpqarray/sym2 conflicts under
> > 2.6.18 - any idea what this problem may be?
>
> This is for dl380. At the very bottom (after the close of the bug) of
>
>   http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=380272
>
> someone suggests a fix for dl380.
>
> jejb/ggg,
>
> Does that look like the right fix?

Er ... you mean the email that I sent pointing to a fix in the
scsi-rc-fixes tree?  Then yes, I think it's a correct fix.  It's already
in 2.6.18.

James


Re: [PATCH] MODULE_FIRMWARE for binary firmware(s)

2006-08-28 Thread James Bottomley

This is a reference implementation for the Debian mkinitrd-tools
package.  It shows how to identify the firmware files necessary for
drivers in the initrd and also includes a primitive system for loading
them.

I've tested this with the aic94xx driver using the new MODULE_FIRMWARE()
tag.  Initramfs should be much easier because it already includes most
of the boot-time loading; all it has to do is the piece identifying the
firmware for the selected modules.
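For illustration only, here is the loading/data handshake that the firmware_loader agent in the patch performs, sketched in C. It assumes the hotplug-era sysfs interface the agent relies on (a `loading` and a `data` attribute per firmware request). The function names are hypothetical, and the device directory and search path are parameters (standing in for /sys/$DEVPATH and FIRMWARE_DIRS) so the sketch can be run against an ordinary directory instead of real sysfs.

```c
#include <assert.h>
#include <stdio.h>

/* Write a short value into one sysfs-style attribute file under devdir. */
static int write_attr(const char *devdir, const char *attr, const char *val)
{
	char path[4096];
	FILE *f;

	snprintf(path, sizeof(path), "%s/%s", devdir, attr);
	f = fopen(path, "w");
	if (!f)
		return -1;
	fputs(val, f);
	return fclose(f) == 0 ? 0 : -1;
}

/* Copy all of 'in' to 'out'; returns 0 on success, -1 on error. */
static int copy_stream(FILE *in, FILE *out)
{
	char buf[4096];
	size_t n;

	while ((n = fread(buf, 1, sizeof(buf), in)) > 0)
		if (fwrite(buf, 1, n, out) != n)
			return -1;
	return ferror(in) ? -1 : 0;
}

/*
 * Search fwdirs for 'firmware' and feed the first match through the
 * loading/data handshake.  Returns 0 on success, -1 if the image is
 * missing or could not be copied.
 */
int load_firmware(const char *devdir, const char *const fwdirs[],
		  size_t nfwdirs, const char *firmware)
{
	char path[4096];
	size_t i;

	assert(devdir != NULL && firmware != NULL);
	for (i = 0; i < nfwdirs; i++) {
		FILE *in, *out;
		int rc;

		snprintf(path, sizeof(path), "%s/%s", fwdirs[i], firmware);
		in = fopen(path, "rb");
		if (!in)
			continue;	/* not in this directory, try the next */

		write_attr(devdir, "loading", "1");	/* start the transfer */
		snprintf(path, sizeof(path), "%s/data", devdir);
		out = fopen(path, "wb");
		rc = out ? copy_stream(in, out) : -1;
		fclose(in);
		if (out && fclose(out) != 0)
			rc = -1;
		/* "0" commits the image, "-1" tells the kernel to give up */
		write_attr(devdir, "loading", rc == 0 ? "0" : "-1");
		return rc;
	}

	/* the firmware was not found in any directory */
	write_attr(devdir, "loading", "-1");
	return -1;
}
```

As in the shell agent, the first directory containing the image wins, and a missing image is reported to the kernel as "-1" rather than leaving the request hanging.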

James

---
Index: initrd-tools-0.1.84.1/mkinitrd
===
--- initrd-tools-0.1.84.1.orig/mkinitrd 2006-08-28 13:37:30.0 -0500
+++ initrd-tools-0.1.84.1/mkinitrd  2006-08-28 16:33:28.0 -0500
@@ -950,6 +950,7 @@ add_modules_dep() {
 		return
 	elif ! [ $oldstyle ]; then
 		add_modules_dep_2_5 $VERSION
+		add_firmware $VERSION
 		return
 	fi
 
@@ -1016,6 +1017,25 @@ add_modules_dep_2_5() {
 	fi
 }
 
+add_firmware() {
+	ver=$1
+	set -- $FSTYPES
+	unset IFS
+
+	cat modules.? |
+	while read junk mod junk; do
+		modpath=$(modprobe --set-version $ver --list $mod)
+		if [ -z "$modpath" ]; then
+			continue;
+		fi
+		p=$(modinfo -F firmware $modpath |sed 's/^/\/lib\/firmware\//')
+		if [ -n "$p" ]; then
+			echo $p
+			echo /usr/sbin/firmware_loader
+		fi
+	done
+}
+
 add_command() {
 	if [ -h initrd/$1 ]; then
 		return
Index: initrd-tools-0.1.84.1/firmware_loader
===
--- /dev/null   1970-01-01 00:00:00.0 +
+++ initrd-tools-0.1.84.1/firmware_loader   2006-08-28 16:56:18.0 -0500
@@ -0,0 +1,29 @@
+#!/bin/sh -e
+#
+# firmware loader agent
+#
+FIRMWARE_DIRS=/lib/firmware
+
+if [ "$SUBSYSTEM" != "firmware" ]; then
+	exit 0;
+fi
+
+if [ ! -e /sys/$DEVPATH/loading ]; then
+	echo "/sys/$DEVPATH/ does not exist"
+	exit 1
+fi
+
+for DIR in $FIRMWARE_DIRS; do
+	[ -e "$DIR/$FIRMWARE" ] || continue
+	echo 1 > /sys/$DEVPATH/loading
+	cat "$DIR/$FIRMWARE" > /sys/$DEVPATH/data
+	echo 0 > /sys/$DEVPATH/loading
+	exit 0
+done
+
+# the firmware was not found
+echo -1 > /sys/$DEVPATH/loading
+
+echo "Cannot find the $FIRMWARE firmware"
+exit 1
+
Index: initrd-tools-0.1.84.1/debian/rules
===
--- initrd-tools-0.1.84.1.orig/debian/rules 2006-08-28 16:07:52.0 -0500
+++ initrd-tools-0.1.84.1/debian/rules  2006-08-28 16:08:56.0 -0500
@@ -35,7 +35,7 @@ install: 
install -o root -g root -m 644 \
echo init linuxrc debian/initrd-tools/usr/share/initrd-tools/
install -o root -g root -m 755 \
-   mkinitrd debian/initrd-tools/usr/sbin/
+   mkinitrd firmware_loader debian/initrd-tools/usr/sbin/
install -o root -g root -m 644 \
mkinitrd.conf modules debian/initrd-tools/etc/mkinitrd/
 ifeq ($(DEB_HOST_ARCH),powerpc)
Index: initrd-tools-0.1.84.1/linuxrc
===
--- initrd-tools-0.1.84.1.orig/linuxrc  2006-08-28 16:30:30.0 -0500
+++ initrd-tools-0.1.84.1/linuxrc   2006-08-28 16:40:45.0 -0500
@@ -10,3 +10,7 @@ echo 256 > proc/sys/kernel/real-root-dev
 mount -nt tmpfs tmpfs bin ||
 	mount -nt ramfs ramfs bin
 echo $root > bin/root
+if [ -x /usr/sbin/firmware_loader ]; then
+	echo /usr/sbin/firmware_loader > /proc/sys/kernel/hotplug
+fi
+
+
Index: initrd-tools-0.1.84.1/init
===
--- initrd-tools-0.1.84.1.orig/init 2006-08-28 16:54:52.0 -0500
+++ initrd-tools-0.1.84.1/init  2006-08-28 16:55:01.0 -0500
@@ -366,6 +366,7 @@ get_cmdline
 [ -c /dev/.devfsd ] && DEVFS=yes
 
 mount -nt devfs devfs devfs
+mount -nt sysfs sysfs sys
 if [ "$IDE_CORE" != none ] && [ -n "$ide_options" ]; then
 	echo "modprobe -k $IDE_CORE options=\"$ide_options\""
 	modprobe -k $IDE_CORE options="$ide_options"


Re: [PATCH] MODULE_FIRMWARE for binary firmware(s)

2006-08-28 Thread James Bottomley

On Tue, 2006-08-29 at 01:04 +0200, Sven Luther wrote:
> Notice that mkinitrd-tools is dead, and will probably be removed from etch.
>
> mkinitramfs-tools and yaird are the two currently used tools.

Yes ... I'm aware of that.  That's why this is a reference
implementation.  initramfs should be easier ... I just don't have any
initramfs systems at the moment, so I did the one I had and could verify.

James


Bug#380272: kernel-image-2.6-686-smp: cpqarray module fails to detect arrays

2006-08-18 Thread James Bottomley

On Fri, 2006-08-18 at 12:39 -0400, Kyle McMartin wrote:
> The problem is because they both claim support for the same PCI IDs:

That's this fix, isn't it?

http://www.kernel.org/git/?p=linux/kernel/git/jejb/scsi-rc-fixes-2.6.git;a=commit;h=b2b3c121076961333977f485f0d54c22121df920
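The conflict Kyle describes arises because a device is offered to any driver whose ID table matches it. Below is a standalone re-creation of the usual matching rule, where a wildcard (in the spirit of PCI_ANY_ID) matches any value; it is not the kernel's struct pci_device_id or its matching code, and the IDs in the usage are placeholders, not the real cpqarray or sym2 entries.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define ANY_ID 0xffffffffu	/* wildcard, in the spirit of PCI_ANY_ID */

/* Used both for a driver's table entries and for a probed device. */
struct pci_ids {
	unsigned int vendor, device, subvendor, subdevice;
};

/* A field matches if the table asks for "any" or for the exact value. */
static bool field_ok(unsigned int want, unsigned int have)
{
	return want == ANY_ID || want == have;
}

static bool entry_matches(const struct pci_ids *e, const struct pci_ids *dev)
{
	return field_ok(e->vendor, dev->vendor) &&
	       field_ok(e->device, dev->device) &&
	       field_ok(e->subvendor, dev->subvendor) &&
	       field_ok(e->subdevice, dev->subdevice);
}

/* True if any of the n entries in a driver's ID table matches dev. */
bool table_matches(const struct pci_ids *tbl, size_t n,
		   const struct pci_ids *dev)
{
	size_t i;

	assert(dev != NULL && (tbl != NULL || n == 0));
	for (i = 0; i < n; i++)
		if (entry_matches(&tbl[i], dev))
			return true;
	return false;
}
```

With a generic table entry of {0x1234, 0x0001, ANY_ID, ANY_ID} and a RAID-specific entry of {0x1234, 0x0001, 0x5678, 0x0042}, a device with subsystem 0x5678/0x0042 matches both tables, so whichever driver probes first claims it. Narrowing one of the tables removes the overlap; this illustrates the conflict and is not a description of the linked commit.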

James


Bug#338089: New aic7xxx driver fails spectacularly on 2940UW

2005-11-28 Thread James Bottomley

On Sun, 2005-11-20 at 21:21 -0500, Graham Knap wrote:
> Sure enough, the kernel now boots. I'll attach the dmesg output here.
>
> Do you guys have a final patch in mind?
>
> Let me know if there are other tests you'd like me to run. Now that I
> know how to do this, I should be able to turn around test results
> fairly quickly.

OK, try the attached.  If it works out, I'll soak it in -mm for a while
and then try to put it in as a bug fix for 2.6.15.

James

diff --git a/drivers/scsi/scsi_transport_spi.c b/drivers/scsi/scsi_transport_spi.c
--- a/drivers/scsi/scsi_transport_spi.c
+++ b/drivers/scsi/scsi_transport_spi.c
@@ -812,12 +812,10 @@ spi_dv_device_internal(struct scsi_devic
 	if (!scsi_device_sync(sdev) && !scsi_device_dt(sdev))
 		return;
 
-	/* see if the device has an echo buffer.  If it does we can
-	 * do the SPI pattern write tests */
-
-	len = 0;
-	if (scsi_device_dt(sdev))
-		len = spi_dv_device_get_echo_buffer(sdev, buffer);
+	/* len == -1 is the signal that we need to ascertain the
+	 * presence of an echo buffer before trying to use it.  len ==
+	 * 0 means we don't have an echo buffer */
+	len = -1;
 
  retry:
 
@@ -840,11 +838,23 @@ spi_dv_device_internal(struct scsi_devic
 		if (spi_min_period(starget) == 8)
 			DV_SET(pcomp_en, 1);
 	}
+	/* Do the read only INQUIRY tests */
+	spi_dv_retrain(sdev, buffer, buffer + sdev->inquiry_len,
+		       spi_dv_device_compare_inquiry);
+	/* See if we actually managed to negotiate and sustain DT */
+	if (i->f->get_dt)
+		i->f->get_dt(starget);
+
+	/* see if the device has an echo buffer.  If it does we can do
+	 * the SPI pattern write tests.  Because of some broken
+	 * devices, we *only* try this on a device that has actually
+	 * negotiated DT */
+
+	if (len == -1 && spi_dt(starget))
+		len = spi_dv_device_get_echo_buffer(sdev, buffer);
 
-	if (len == 0) {
+	if (len <= 0) {
 		starget_printk(KERN_INFO, starget, "Domain Validation skipping write tests\n");
-		spi_dv_retrain(sdev, buffer, buffer + len,
-			       spi_dv_device_compare_inquiry);
 		return;
 	}
 


Bug#338089: New aic7xxx driver fails spectacularly on 2940UW

2005-11-13 Thread James Bottomley

On Sun, 2005-11-13 at 12:41 -0500, Doug Ledford wrote:
> If the drive is inaccessible after the DV failure, even on a warm reboot
> (which includes a SCSI bus reset), then the drive is flat hung.
> Something done in the current code is breaking it.  Can you get a boot
> with DV turned off and capture the log messages and post them here
> please?  You already said it didn't help with the problem, but I'd like
> to see the failure scenario with it off, that might help determine the
> true root cause of the issue.

Yes, you're right ... the sequencer code seems to identify the
WRITE_BUFFER as the failing command.  Can you try with the attached
patch, which will force DV to ignore the echo buffer write tests?

Thanks,

James

diff --git a/drivers/scsi/scsi_transport_spi.c b/drivers/scsi/scsi_transport_spi.c
--- a/drivers/scsi/scsi_transport_spi.c
+++ b/drivers/scsi/scsi_transport_spi.c
@@ -816,8 +816,10 @@ spi_dv_device_internal(struct scsi_devic
 	 * do the SPI pattern write tests */
 
 	len = 0;
+#if 0
 	if (scsi_device_dt(sdev))
 		len = spi_dv_device_get_echo_buffer(sdev, buffer);
+#endif
 
  retry:
 


Bug#338089: New aic7xxx driver fails spectacularly on 2940UW

2005-11-13 Thread James Bottomley

On Sun, 2005-11-13 at 13:03 -0500, Graham Knap wrote:
> Doug Ledford <[EMAIL PROTECTED]> wrote:
> > You already said it didn't help with the problem,
>
> I meant that I don't think I successfully disabled DV, because the boot
> messages were *identical*, except for the line where the kernel shows
> the Kernel command line.
>
> I had added this argument at the end of the line:  aic7xxx=dv:{0}
>
> I've re-read aic7xxx.txt and I'm not sure what I'm doing wrong. If
> you can tell me how to disable DV, I'd be happy to give it a try.

aic7xxx.txt is out of date.  The aic7xxx (and 79xx) drivers now use the
generic domain validation code rather than the old aic-specific code
(which is what the dv:{0} option refers to).  If you try the patch in
my previous email, I think that will disable the piece of DV that's
causing the problem.

If the test code succeeds, the problem is pretty nasty:  Apparently the
device claims DT support but in fact rejects DT in the negotiation.  We
use the claimed DT support as the trigger for the echo buffer check,
which starts with a READ_BUFFER for the descriptor.  Apparently this
device returns a valid descriptor with a reasonable echo buffer size and
then promptly throws a wobbly when we try to use it.
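A rough sketch of the descriptor step described above, based on my reading of SPC: READ BUFFER(10) (opcode 0x3C) in echo-buffer-descriptor mode (0x0B) with a 4-byte allocation length, and a 13-bit buffer capacity taken from the low five bits of byte 2 plus byte 3. The helper names are hypothetical. The point it makes is that a plausible capacity is just two bytes of reply data and says nothing about whether the device will tolerate echo buffer writes in the mode actually negotiated.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define READ_BUFFER_OPCODE	0x3c	/* SCSI READ BUFFER(10) */
#define RB_MODE_ECHO_DESCRIPTOR	0x0b	/* echo buffer descriptor mode */

/* Fill a 10-byte CDB asking for the 4-byte echo buffer descriptor. */
void build_echo_descriptor_cdb(uint8_t cdb[10])
{
	memset(cdb, 0, 10);
	cdb[0] = READ_BUFFER_OPCODE;
	cdb[1] = RB_MODE_ECHO_DESCRIPTOR;
	/* bytes 6..8 hold the 24-bit allocation length: 4 bytes */
	cdb[8] = 4;
}

/*
 * Decode the buffer capacity: bits 4..0 of byte 2 are the high part,
 * byte 3 is the low part, giving a 13-bit size in bytes.
 */
unsigned int echo_buffer_capacity(const uint8_t desc[4])
{
	assert(desc != NULL);
	return ((unsigned int)(desc[2] & 0x1f) << 8) | desc[3];
}
```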

James


Bug#338089: New aic7xxx driver fails spectacularly on 2940UW

2005-11-13 Thread James Bottomley

On Sun, 2005-11-13 at 14:42 -0500, Doug Ledford wrote:
> The device is on a non-LVD bus.  Certain devices were created back when
> the spec still stated that using PPR negotiation messages on a non-LVD
> bus was a no-no.  As the echo buffer was an addition to support DV, and
> originally DV wasn't intended to be used on non-LVD busses, it might
> stand to reason that this device simply is going tits up because we are
> attempting to use the echo buffer while in SE mode.  Checking that
> PPR/DT is valid (not just between controller and device, but also given
> bus mode) and only using echo buffer DV when all LVD conditions are met
> would likely solve the problem (assuming that the problem is what you
> are referring to).

I think so (pending confirmation of the patch working).  The current DV
code assumes that if the device claims DT support in the INQUIRY data
*and* it returns a valid descriptor to the READ_BUFFER descriptors
command then enhanced DV should be attempted.

What I'm contemplating doing (which is what you also suggest) is
tightening up the check so that if the standard DV read tests produce a
negotiation that doesn't set DT, we won't attempt enhanced DV.

James


Bug#338089: New aic7xxx driver fails spectacularly on 2940UW

2005-11-12 Thread James Bottomley

On Tue, 2005-11-08 at 20:47 -0500, Graham Knap wrote:
> Target 0 Negotiation Settings
> User: 40.000MB/s transfers (20.000MHz, offset 127, 16bit)
> Goal: 40.000MB/s transfers (20.000MHz, offset 8, 16bit)
> Curr: 40.000MB/s transfers (20.000MHz, offset 8, 16bit)

That's a bit unfortunate ... it shows that the domain validation code
negotiated identical settings in the old kernel, so it doesn't look like
that's the problem.

My best guess would be that the bus is slightly marginal.  The aic7xxx
drivers are notoriously sensitive to bus problems.  Could you try
lowering the bus speed to 10MHz in the aic7xxx BIOS and see if that
helps?
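The figures in the negotiation settings follow from simple arithmetic: a 16-bit bus moves 2 bytes per transfer, so 20 MHz gives 40 MB/s, and dropping to 10 MHz halves that to 20 MB/s. The sketch below assumes the SPI period-factor encoding as I understand it from the SPI standards (0x08 to 0x0C are special-cased, with 0x0C = 50 ns; 0x0D and up mean factor x 4 ns, so 25 = 100 ns); the function names are hypothetical.

```c
#include <assert.h>

/* Transfer period in nanoseconds for an SPI period factor. */
double spi_period_ns(unsigned int factor)
{
	assert(factor >= 0x08);	/* smaller factors are reserved */
	switch (factor) {
	case 0x08: return 6.25;
	case 0x09: return 12.5;
	case 0x0a: return 25.0;
	case 0x0b: return 30.3;
	case 0x0c: return 50.0;
	default:   return factor * 4.0;	/* 0x0d and up */
	}
}

/* MB/s = transfers per microsecond (MHz) times bytes per transfer. */
double spi_rate_mb_s(double mhz, unsigned int width_bits)
{
	assert(width_bits == 8 || width_bits == 16);
	return mhz * (width_bits / 8);
}
```

Usage: 1000.0 / spi_period_ns(0x0c) is 20 MHz, and spi_rate_mb_s(20.0, 16) is the 40 MB/s in the report; factor 25 gives the 10 MHz / 20 MB/s setting suggested above.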

Thanks,

James


Bug#338089: New aic7xxx driver fails spectacularly on 2940UW

2005-11-08 Thread James Bottomley

On Tue, 2005-11-08 at 12:31 +0900, Horms wrote:
> On Mon, Nov 07, 2005 at 09:45:23PM -0500, Graham Knap wrote:
> > Package: linux-image-2.6.14-1-686
> > Version: 2.6.14-2
> >
> > Recent versions of the aic7xxx driver will not boot on my secondary PC.
> > The 2.6.8 kernel shipped with sarge works perfectly, but neither the
> > 2.6.12 kernel in testing nor the 2.6.14 kernel in unstable will boot.
> >
> > This is an older system:
> > Asus P2L-B, Celeron 500MHz, 384MB RAM, GeForce2 MX AGP
> > Adaptec 2940UW, IBM DDYS-T09170 (9GB disk)
> >
> > I can't understand what exactly is failing, but I will attach a boot
> > log. (So null modem cables *are* still useful for something!)
> >
> > I've tried adding aic7xxx=dv:{0} to the boot arguments but that
> > doesn't seem to make a difference. Also, aic7xxx=verbose doesn't seem
> > to do anything either.
> >
> > I don't know if this makes a difference but my 2940UW reports its BIOS
> > revision as 1.34.3 during POST.
> >
> > Any help would be much appreciated.
>
> Hi Graham,
>
> thanks for your detailed report. This does smell a lot like a driver
> bug, and as such, it's probably best passed on to the upstream maintainers.
> As such I've CCed James Bottomley and linux-scsi for comment.
>
> The other main possibility is that perhaps the aic7xxx_old driver would
> work. Or perhaps some other module loading foo, though it seems the
> module is loaded fine, it just doesn't like your card very much.

This is an older drive; it looks like it passes domain validation
(read only) but then chokes on the next command.  On 2.6.8, what do the
transport settings report (that's cat /proc/scsi/aic7xxx/0)?

Thanks,

James
