A few weeks ago I had a scare when a reboot panicked the kernel with a complaint 
that it could not find the root device (/dev/sde), and further reboots couldn't 
even see the USB keyboard.  Leaving the system powered off overnight "fixed" the 
problem and the system has been working fine ever since.

I have since had some time to explore this and found that it is related to the 
kernel version: 3.6.10 works fine, while 3.7.1 fails.  If I reset during the 
3.7.1 boot while it is spewing its error messages, but before the kernel 
ultimately panics, I can reboot with 3.6.10.  But if 3.7.1 goes all the way to 
the panic, I have to power off and wait a few minutes before a 3.6.10 reboot is 
successful.  This is repeatable, but I haven't bothered to measure how long the 
system must be off; "a few minutes" is enough.

This is a ~amd64 system, dual Opterons, Tyan S2882, Thunder K8S Pro.  The dmesg 
times here start around 30 seconds because it spends 15 seconds on each of two 
SCSI hosts probing for nonexistent drives.  udev etc are all frozen pre-systemd 
nonsense.  Disks are two SSDs, two 4T drives, two 300G drives, and one 320G 
IDE/PATA drive; the main board is so old that there are only three boot 
options: IDE, DVD, network.

There are two error messages during the 3.7.1 boot, repeated for all SATA 
drives:

    ata5.00: qc timeout (cmd 0x2f)
    ata5.00: failed to set xfermode (err_mask=0x40)
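
For anyone who wants to tally these failures per port from a transcribed or 
captured log, here is a rough Python sketch.  The function name and the sample 
text are made up for illustration; the only assumption about the log format is 
the "ataN.MM: message" shape shown above, with an optional dmesg timestamp in 
front.

```python
import re
from collections import Counter

# Matches libata-style lines such as "ata5.00: qc timeout (cmd 0x2f)",
# optionally preceded by a dmesg timestamp like "[   31.123456]".
LINE_RE = re.compile(r"^\s*(?:\[\s*\d+\.\d+\]\s*)?(ata\d+)(?:\.\d+)?: (.*)$")


def count_xfermode_failures(log_text):
    """Return a Counter mapping port name (e.g. 'ata5') to the number of
    'failed to set xfermode' lines seen for that port."""
    failures = Counter()
    for line in log_text.splitlines():
        m = LINE_RE.match(line)
        if m and m.group(2).startswith("failed to set xfermode"):
            failures[m.group(1)] += 1
    return failures


# Hypothetical sample built from the two messages quoted above.
sample = "\n".join([
    "    ata5.00: qc timeout (cmd 0x2f)",
    "    ata5.00: failed to set xfermode (err_mask=0x40)",
    "    ata5: SATA link up 1.5 Gbps (SStatus 113 SControl 300)",
    "    ata5.00: qc timeout (cmd 0x2f)",
    "    ata5.00: failed to set xfermode (err_mask=0x40)",
])

for port, n in sorted(count_xfermode_failures(sample).items()):
    print(port, n)
```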

Google does not enlighten me.  One suggestion was to change the SATA cable, but 
the errors hit every SATA drive and appear only with the change from 3.6.10 to 
3.7.1, so a bad cable seems unlikely.

So here are some details ... You can see everything at 
https://www.dropbox.com/sh/o8j80rps3agvvcf/FBjJLcykRS

I am willing to try reasonable config changes for a new reboot attempt, but it 
is my main home server, not an experimental toy :-)

================ dmesg differences

I took some pictures during the boot process and transcribed the results.  My 
transcription of the 3.6.10 boot matches the 3.6.10 dmesg, but of course I 
can't get a 3.7.1 dmesg.

Both 3.6.10 and 3.7.1 appear to be the same up to this point:

    ata13.00: ATA-8: WDC WD3200AAJB-00J3A0, 01.03E01, max UDMA/133
    ata13.00: 625142448 sectors, multi 16: LBA48
    ata13.00: configured for UDMA/133
    ata1: SATA link down (SStatus 0 SControl 300)
    ata9: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
    ata9.00: ATA-9: M4-CT512M4SD2, 000F, max UDMA/100
    ata9.00: 1000215216 sectors, multi 16: LBA48 NCQ (depth 0/32)
    ata9.00: configured for UDMA/100
    ata2: SATA link down (SStatus 0 SControl 300)
    ata3: SATA link down (SStatus 0 SControl 300)
    ata4: SATA link down (SStatus 0 SControl 300)
    ata5: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
    ata5.00: ATA-7: Maxtor 6B300S0, BANC17M0, max UDMA/133
    ata5.00: 586114704 sectors, multi 0: LBA48 NCQ (not used)

Around here 3.6.10 begins scrolling so fast that I could not get any pictures, 
so this is from the 3.6.10 dmesg, where it diverges from 3.7.1:

    ata5.00: configured for UDMA/133
    scsi 6:0:0:0: Direct-Access     ATA      Maxtor 6B300S0   BANC PQ: 0 ANSI: 5
    sd 6:0:0:0: [sda] 586114704 512-byte logical blocks: (300 GB/279 GiB)
    sd 6:0:0:0: [sda] Write Protect is off
    sd 6:0:0:0: [sda] Mode Sense: 00 3a 00 00
    sd 6:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
     sda:
    sd 6:0:0:0: [sda] Attached SCSI disk
    ata6: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
    ata6.00: ATA-7: Maxtor 6B300S0, BANC17M0, max UDMA/133
    ata6.00: 586114704 sectors, multi 0: LBA48 NCQ (not used)
    ata6.00: configured for UDMA/133
    scsi 7:0:0:0: Direct-Access     ATA      Maxtor 6B300S0   BANC PQ: 0 ANSI: 5
    sd 7:0:0:0: [sdb] 586114704 512-byte logical blocks: (300 GB/279 GiB)
    sd 7:0:0:0: [sdb] Write Protect is off
    sd 7:0:0:0: [sdb] Mode Sense: 00 3a 00 00
    sd 7:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
     sdb: unknown partition table
    sd 7:0:0:0: [sdb] Attached SCSI disk
    ... and on and on until it boots.  (The unknown partition table is an LVM volume.)

But 3.7.1 pokes along slowly enough while generating its errors that I did get 
some pictures to transcribe, and this is where it diverges from 3.6.10.

    ata5.00: qc timeout (cmd 0x2f)
    ata5.00: failed to set xfermode (err_mask=0x40)
    ata5: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
    ata5.00: qc timeout (cmd 0x2f)
    ata5.00: failed to set xfermode (err_mask=0x40)
    ata5: limiting SATA link speed to 1.5 Gbps
    ata5.00: limiting speed to UDMA/133:PIO3
    ata5: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
    ata5.00: qc timeout (cmd 0x2f)
    ata5.00: failed to set xfermode (err_mask=0x40)
    ata5.00: disabled
    ata5: hard resetting link
    ata5: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
    ata5: EH complete
    ... for all ATA drives until it eventually panics because the root device, /dev/sde, is not found.
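
The comparison above was done by eye.  A small Python sketch along these lines 
could locate the first line where a saved dmesg and a hand transcription part 
ways.  It is only a rough aid: it assumes both logs list messages in the same 
order, the function names are made up, and the timestamps in the sample are 
invented placeholders rather than values from the real logs.

```python
import re

# dmesg lines usually start with "[  seconds.micros]"; transcriptions may not.
TIMESTAMP_RE = re.compile(r"^\[\s*\d+\.\d+\]\s*")


def normalize(line):
    """Strip surrounding whitespace and any leading dmesg timestamp."""
    return TIMESTAMP_RE.sub("", line.strip())


def first_divergence(good_lines, bad_lines):
    """Return (index, good_line, bad_line) for the first mismatch between
    the two logs, or None if they are identical after normalization."""
    good = [normalize(l) for l in good_lines if l.strip()]
    bad = [normalize(l) for l in bad_lines if l.strip()]
    for i, (g, b) in enumerate(zip(good, bad)):
        if g != b:
            return i, g, b
    if len(good) != len(bad):
        # One log is a strict prefix of the other.
        i = min(len(good), len(bad))
        return (i,
                good[i] if i < len(good) else None,
                bad[i] if i < len(bad) else None)
    return None


# Sample lines taken from the excerpts above; the timestamps are invented.
good_log = [
    "[   31.000000] ata5.00: 586114704 sectors, multi 0: LBA48 NCQ (not used)",
    "[   31.010000] ata5.00: configured for UDMA/133",
]
bad_log = [
    "    ata5.00: 586114704 sectors, multi 0: LBA48 NCQ (not used)",
    "    ata5.00: qc timeout (cmd 0x2f)",
]

print(first_divergence(good_log, bad_log))
```

Since kernel messages from different subsystems can interleave differently 
from boot to boot, it may help to filter both inputs down to the "ata" lines 
before comparing.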


================ 3.6.10 ---> 3.7.1 conf changes

I rebuilt the 3.7.1 kernel and logged all the new config items.

Cputime accounting
> 1. Simple tick based cputime accounting (TICK_CPU_ACCOUNTING) (NEW)
  2. Fine granularity task level IRQ time accounting (IRQ_TIME_ACCOUNTING)
choice[1-2]:

Consider userspace as in RCU extended quiescent state (RCU_USER_QS) [N/y/?] (NEW)

  Module signature verification (MODULE_SIG) [N/y/?] (NEW) 

Supervisor Mode Access Prevention (X86_SMAP) [Y/n/?] (NEW) n

  Legacy cpb sysfs knob support for AMD CPUs (X86_ACPI_CPUFREQ_CPB) [Y/n/?] (NEW)

Enable core dump support (COREDUMP) [Y/n/?] (NEW) 

  Packet: sockets monitoring interface (PACKET_DIAG) [N/m/y/?] (NEW) m

    IPv4 NAT (NF_NAT_IPV4) [N/m/?] (NEW) m

OMAP OCP2SCP DRIVER (OMAP_OCP2SCP) [N/m/y/?] (NEW) m

      Calxeda Highbank SATA support (SATA_HIGHBANK) [N/m/y/?] (NEW) m

    Virtual eXtensible Local Area Network (VXLAN) (VXLAN) [N/m/y/?] (NEW) m

      Solarflare SFC9000-family PTP support (SFC_PTP) [Y/n/?] (NEW) 

    Microchip MRF24J40 transceiver driver (IEEE802154_MRF24J40) [N/m/?] (NEW) m

  8250/16550 PNP device support (SERIAL_8250_PNP) [Y/n/?] (NEW) 

MAX310X support (SERIAL_MAX310X) [N/y/?] (NEW) 

SCCNXP serial port support (SERIAL_SCCNXP) [N/m/y/?] (NEW) m

TPM HW Random Number Generator support (HW_RANDOM_TPM) [M/n/?] (NEW) 

  TPM Interface Specification 1.2 Interface (I2C - Infineon) (TCG_TIS_I2C_INFINEON) [N/m/?] (NEW) m

  NXP SC18IS602/602B/603 I2C to SPI bridge (SPI_SC18IS602) [N/m/y/?] (NEW) m

  Dialog DA9052 GPIO (GPIO_DA9052) [N/m/y/?] (NEW) m

  TWL6040 GPO (GPIO_TWL6040) [N/m/y/?] (NEW) m

OMAP HDQ driver (HDQ_MASTER_OMAP) [N/m/?] (NEW) m

  Marvell 88PM860x battery driver (BATTERY_88PM860X) [N/m/y/?] (NEW) m

  Dialog DA9052 Battery (BATTERY_DA9052) [N/m/y/?] (NEW) m

  Marvell 88PM860x Charger driver (CHARGER_88PM860X) [N/m/?] (NEW) m

  Analog Devices ADT7410 (SENSORS_ADT7410) [N/m/?] (NEW) m

  Maxim MAX197 and compatibles (SENSORS_MAX197) [N/m/?] (NEW) m

  generic cpu cooling support (CPU_THERMAL) [N/y/?] (NEW) 

Support for the SMSC ECE1099 series chips (MFD_SMSC) [N/y/?] (NEW) 

Dialog Semiconductor DA9055 PMIC Support (MFD_DA9055) [N/y/?] (NEW) 

Texas Instruments LP8788 Power Management Unit Driver (MFD_LP8788) [N/y/?] (NEW)

Maxim Semiconductor MAX8907 PMIC Support (MFD_MAX8907) [N/m/y/?] (NEW) m

  Fairchild FAN53555 Regulator (REGULATOR_FAN53555) [N/m/y/?] (NEW) m

  Maxim 8907 voltage regulator (REGULATOR_MAX8907) [N/m/?] (NEW) m

  TechnoTrend USB IR Receiver (IR_TTUSBIR) [N/m/?] (NEW) m

Media USB Adapters (MEDIA_USB_SUPPORT) [N/y/?] (NEW) y

  STK1160 USB video capture support (VIDEO_STK1160) [N/m/?] (NEW) m

    STK1160 AC97 codec support (VIDEO_STK1160_AC97) [N/y/?] (NEW) y

  Support for various USB DVB devices v2 (DVB_USB_V2) [N/m/?] (NEW) m

    Enable debug for the B2C2 FlexCop drivers (DVB_B2C2_FLEXCOP_USB_DEBUG) [N/y/?] (NEW)

Media PCI Adapters (MEDIA_PCI_SUPPORT) [N/y/?] (NEW) 

Media test drivers (V4L_TEST_DRIVERS) [N/y] (NEW) 

ISA and parallel port devices (MEDIA_PARPORT_SUPPORT) [N/y/?] (NEW) 

  Autoselect tuners and i2c modules to build (MEDIA_SUBDRV_AUTOSELECT) [Y/n/?] (NEW)

  Maximum debug level (NOUVEAU_DEBUG) [5] (NEW) 

  Default debug level (NOUVEAU_DEBUG_DEFAULT) [3] (NEW) 

    Backlight Driver for LM3630 (BACKLIGHT_LM3630) [N/m/y/?] (NEW) m

    Backlight Driver for LM3639 (BACKLIGHT_LM3639) [N/m/y/?] (NEW) m

    TPS65217 Backlight (BACKLIGHT_TPS65217) [N/m/?] (NEW) m

  Default time-out for HD-audio power-save mode (SND_HDA_POWER_SAVE_DEFAULT) [0] (NEW)

  CIR via RC class (HID_PICOLCD_CIR) [N/y/?] (NEW) 

Sony PS3 BD Remote Control (HID_PS3REMOTE) [N/m/?] (NEW) m

HID Sensors framework support (HID_SENSOR_HUB) [N/m/?] (NEW) m

  ZTE USB serial driver (USB_SERIAL_ZTE) [N/m/?] (NEW) m

  OMAP USB2 PHY Driver (OMAP_USB2) [N/m/y/?] (NEW) m

  LED support for LM3642 Chip (LEDS_LM3642) [N/m/y/?] (NEW) m

  LED support for LM355x Chips, LM3554 and LM3556 (LEDS_LM355x) [N/m/y/?] (NEW) m

  LED CPU Trigger (LEDS_TRIGGER_CPU) [N/y/?] (NEW) 

  Dynamic compression of swap pages and clean pagecache pages (ZCACHE2) [N/y/?] (NEW)

  Silicom devices (NET_VENDOR_SILICOM) [Y/n/?] (NEW) 

    Silicom BypassCTL library support (SBYPASS) [N/m/?] (NEW) m

    Silicom BypassCTL net support (BPCTL) [N/m/?] (NEW) m

  Cambridge Electronic Design 1401 USB support (CED1401) [N/m/?] (NEW) m

  Digi Realport driver (DGRP) [N/m/y/?] (NEW) m

STE-Modem remoteproc support (STE_MODEM_RPROC) [N/m/y/?] (NEW) m

    SMB2 network file system support (EXPERIMENTAL) (CIFS_SMB2) [N/y/?] (NEW) 

RCU debugging: preemptible RCU race provocation (PROVE_RCU_DELAY) [N/y/?] (NEW) 

Red-Black tree test (RBTREE_TEST) [N/m/?] (NEW) m

Interval tree test (INTERVAL_TREE_TEST) [N/m/?] (NEW) m

  CAST5 (CAST-128) cipher algorithm (x86_64/AVX) (CRYPTO_CAST5_AVX_X86_64) [N/m/y/?] (NEW) m

  CAST6 (CAST-256) cipher algorithm (x86_64/AVX) (CRYPTO_CAST6_AVX_X86_64) [N/m/y/?] (NEW) m

  Asymmetric (public-key cryptographic) key type (ASYMMETRIC_KEY_TYPE) [N/m/y/?] (NEW) m

    Asymmetric public-key crypto algorithm subtype (ASYMMETRIC_PUBLIC_KEY_SUBTYPE) [N/m/?] (NEW) m

      RSA public-key algorithm (PUBLIC_KEY_ALGO_RSA) [N/m/?] (NEW) m

      X.509 certificate parser (X509_CERTIFICATE_PARSER) [N/m/?] (NEW) m
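
To turn a log like the one above into a symbol/answer table that is easier to 
scan for suspects, a rough Python sketch follows.  It assumes the unwrapped 
log, with one prompt per line in the form "description (SYMBOL) [choices] 
(NEW) answer", and it assumes that an empty answer means the default was 
accepted, the default being the capitalized choice or, for numeric prompts, 
the bracketed value.  Choice menus such as the cputime accounting one have no 
bracketed list and are skipped.  The function name is made up.

```python
import re

# One oldconfig prompt per line: "... (SYMBOL) [choices] (NEW) answer"
PROMPT_RE = re.compile(
    r"\((?P<symbol>[A-Za-z0-9_]+)\)\s+"
    r"\[(?P<choices>[^\]]*)\]\s+"
    r"\(NEW\)[ \t]*(?P<answer>\S*)"
)


def parse_new_options(log_text):
    """Return a dict mapping each new config symbol to the answer given.

    An empty answer is treated as 'default accepted' (an assumption): the
    upper-case choice if there is one, otherwise the bracketed value itself.
    """
    result = {}
    for line in log_text.splitlines():
        m = PROMPT_RE.search(line)
        if not m:
            continue  # blank lines, choice menus, wrapped fragments
        answer = m.group("answer")
        if not answer:
            choices = m.group("choices")
            defaults = [c for c in choices.split("/")
                        if c.isalpha() and c.isupper()]
            answer = defaults[0] if defaults else choices
        result[m.group("symbol")] = answer
    return result


# A few prompts copied from the list above.
sample_log = "\n".join([
    "Supervisor Mode Access Prevention (X86_SMAP) [Y/n/?] (NEW) n",
    "      Calxeda Highbank SATA support (SATA_HIGHBANK) [N/m/y/?] (NEW) m",
    "  Module signature verification (MODULE_SIG) [N/y/?] (NEW) ",
    "  Maximum debug level (NOUVEAU_DEBUG) [5] (NEW) ",
])

for symbol, answer in parse_new_options(sample_log).items():
    print(symbol, answer)
```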

-- 
            ... _._. ._ ._. . _._. ._. ___ .__ ._. . .__. ._ .. ._.
     Felix Finch: scarecrow repairman & rocket surgeon / fe...@crowfix.com
  GPG = E987 4493 C860 246C 3B1E  6477 7838 76E9 182E 8151 ITAR license #4933
I've found a solution to Fermat's Last Theorem but I see I've run out of room o
