I have an old tower which I use to test multiple operating systems. Each OS lives on a separate drive in a removable tray, so the drives can be swapped as needed. Once in a while the system would hang when the BIOS was set to auto-detect the drives at every boot, or I would see an occasional failure to mount the ATA boot device when Linux was started in verbose mode--and Windows would simply freeze randomly. The problem was traced to the power connector on a drive tray: I had to extract the pins from the connector with a special tool, cut off the wires, soak the pins in contact cleaner, and solder the wires back on, because corrosion in the crimped connections had made the connector unreliable.
http://en.wikipedia.org/wiki/Molex_connector#Disk_drive_connector_.28AMP_MATE-N-LOK_1-480424-0_Power_Connector.29
http://www.molex.com/molex/products/family?key=disk_drive_power_connector&channel=PRODUCTS&chanName=family&pageTitle=Introduction

I never had a problem with these connectors before, except for the ones in the Enermax trays (which seem to be made of the cheapest materials they could find).

Before I repaired the power connector, I encountered that read-only bug in Ubuntu. When this occurred, ALL physical volumes attached to the machine became read-only, including other hard drives and all external USB storage devices. Even new USB devices attached later were not writable. The only thing I could write to was a network share. If this happens on all affected platforms, it might give developers some idea of what to look for in the source code.

I also wonder if some power management feature could be involved:

GRUB_CMDLINE_LINUX="libata.dma=0 libata.noacpi=1"
http://ubuntuforums.org/showthread.php?t=1892483

I believe this bug can be triggered by other things too, such as a system BIOS bug or an AHCI setting, a drive firmware bug, defective electrolytic capacitors on an old mainboard, bad solder joints just about anywhere, or a defective (or overloaded) power supply. But in the case of SSDs it could also be a latency issue:

Why Solid-State Drives Slow Down As You Fill Them Up (Ubuntu should warn about this)
"When filling up an empty drive, they found high write performance very early in the process and a significant drop as the write operations continued to fill up the drive... If you have a solid-state drive, you should try to avoid using more than 75% of its capacity."
http://www.howtogeek.com/165542/why-solid-state-drives-slow-down-as-you-fill-them-up/

(for general reference on dual-boot systems):
12 Things You Must Do When Running a Solid State Drive in Windows 7
http://www.maketecheasier.com/12-things-you-must-do-when-running-a-solid-state-drive-in-windows-7/

I suspect that people who experience read-only issues today were experiencing silent write retries in previous kernel versions and simply did not notice because the retry was successful. It seems like the common thread is that the drive was not ready to accept writes for some reason, and the kernel did not detect this condition. I tried to simulate this by removing power to the drive momentarily. During this time, CPU usage was very high, but it returned to normal when power was restored, and the read-only bug was not triggered.

On various other platforms I have seen S.M.A.R.T. drives which are NOT defective logging an "Interface CRC error" when a 'READ DMA EXT' command was issued, due to a cable or connector fault. When the drive was moved to another system, the errors stopped. So the drive is not necessarily failing just because you see the error count going up.

I think that a S.M.A.R.T. status monitor should be included with the base installation: the S.M.A.R.T. feature is not only useful for diagnosing faults within the drive, it sometimes permits you to infer something about the quality of the power & data connection over time. If you can consistently correlate some particular S.M.A.R.T. error code with the behavior that causes the volume to turn read-only, then you may have found a way to distinguish a cable fault from a kernel or firmware bug, and the OS could use it to generate more helpful error messages. So it might be good to report which (if any) of the drive's S.M.A.R.T. counters were incremented when you experience that read-only problem.
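To make that kind of report easier, here is a minimal sketch (not a tested tool) that compares two snapshots of the S.M.A.R.T. attribute table and lists the counters whose raw value changed. It assumes the table layout printed by `smartctl -A` from smartmontools. The function names and the sample rows are my own, made up for illustration; attribute names such as UDMA_CRC_Error_Count vary between drive vendors.

```python
import re

# One row of the attribute table printed by `smartctl -A`:
# ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
ROW = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+0x[0-9a-fA-F]+"   # ID, name, flag
    r"\s+\d+\s+\d+\s+\S+"                   # value, worst, thresh
    r"\s+\S+\s+\S+\s+\S+"                   # type, updated, when_failed
    r"\s+(\d+)"                             # leading integer of the raw value
)


def parse_smart_attributes(text):
    """Map attribute name -> raw counter for every table row found in text."""
    counters = {}
    for line in text.splitlines():
        match = ROW.match(line)
        if match:
            counters[match.group(2)] = int(match.group(3))
    return counters


def changed_counters(before, after):
    """Return {name: (old, new)} for counters whose raw value changed."""
    return {
        name: (before.get(name, 0), new)
        for name, new in after.items()
        if new != before.get(name, 0)
    }


# Made-up sample rows, for illustration only (not taken from a real drive).
BEFORE = """\
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
"""
AFTER = """\
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       3
"""

if __name__ == "__main__":
    print(changed_counters(parse_smart_attributes(BEFORE),
                           parse_smart_attributes(AFTER)))
```

In practice the two snapshots would be the saved output of `smartctl -A /dev/sda`, taken once while the system is healthy and once right after the volume turns read-only. A CRC-type counter that rises while the reallocation counters stay flat would point toward the cable or connector rather than the media.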
I am not too familiar with the specifications, but developers might also want to investigate the possibility of using the System Management bus or Power Management bus to assist in characterizing these failures if the platform collects any useful information.

For those who solved the problem by disabling NCQ: there was an NCQ drive blacklist for the Linux kernel until (I believe) 2.6.24. This implies some incompatibility with particular models.

"there are drives with firmware bugs that deliberately lie about when data has been physically written."
http://serverfault.com/questions/460864/safety-of-write-cache-on-sata-drives-with-barriers
_____

"One little-known feature of NCQ is that the host can specify whether it wants to be notified of completion when the data hits the disk's platters or when it hits the disk's buffer (on-board cache)."
(Does the kernel do this correctly?)

"NCQ can negatively interfere with the operating system's I/O scheduler, actually decreasing performance; this has been observed in practice on Linux with RAID-5. There is no mechanism in NCQ for the host to specify any sort of deadlines for an I/O, like how many times a request can be ignored in favor of others. In theory, a NCQ-ed request can be delayed by the drive an arbitrary amount of time while it is serving other (possibly new) requests under I/O pressure. Since the algorithms used inside drive firmware for NCQ dispatch ordering are generally not publicly known, this introduces another level of uncertainty for hardware/firmware performance. Tests at Google around 2008 have shown that NCQ can delay an I/O for up to 1-2 seconds."
http://en.wikipedia.org/wiki/Native_Command_Queuing
_____

Test if NCQ is enabled:
dmesg | grep -i ncq

Write-protect & cache status:
dmesg | grep sda
_____

Operational theory / Educational resources:

Modern disk write caches and how they get dealt with
http://utcc.utoronto.ca/~cks/space/blog/tech/ModernDiskWriteCaches

How to force a disk write cache flush operation on Linux
http://utcc.utoronto.ca/~cks/space/blog/linux/ForceDiskFlushes

--
You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1063354

Title:
[Dell Studio XPS 1640] Sudden Read-Only Filesystems

Status in “linux” package in Ubuntu:
Incomplete

Bug description:
After upgrading to Ubuntu 12.10, I experience sudden locks of my filesystems (I have a root and a home partition with ext4), in which the filesystems suddenly become mounted read-only. /var/log/syslog shows the following entries:

Oct 7 20:00:42 StudioXPS signond[3510]: signondaemon.cpp 345 init Failed to SUID root. Secure storage will not be available.
Oct 7 20:02:12 StudioXPS kernel: [ 249.193555] ata1.00: exception Emask 0x0 SAct 0x7 SErr 0x0 action 0x0
Oct 7 20:02:12 StudioXPS kernel: [ 249.193561] ata1.00: irq_stat 0x40000001
Oct 7 20:02:12 StudioXPS kernel: [ 249.193565] ata1.00: failed command: READ FPDMA QUEUED
Oct 7 20:02:12 StudioXPS kernel: [ 249.193572] ata1.00: cmd 60/20:00:90:6f:53/00:00:1a:00:00/40 tag 0 ncq 16384 in
Oct 7 20:02:12 StudioXPS kernel: [ 249.193572] res 41/40:20:98:6f:53/00:00:1a:00:00/40 Emask 0x409 (media error) <F>
Oct 7 20:02:12 StudioXPS kernel: [ 249.193575] ata1.00: status: { DRDY ERR }
Oct 7 20:02:12 StudioXPS kernel: [ 249.193578] ata1.00: error: { UNC }
Oct 7 20:02:12 StudioXPS kernel: [ 249.193581] ata1.00: failed command: WRITE FPDMA QUEUED
Oct 7 20:02:12 StudioXPS kernel: [ 249.193587] ata1.00: cmd 61/18:08:18:fb:0e/00:00:2b:00:00/40 tag 1 ncq 12288 out
Oct 7 20:02:12 StudioXPS kernel: [ 249.193587] res 41/40:08:98:6f:53/00:00:1a:00:00/40 Emask 0x9 (media error)
Oct 7 20:02:12 StudioXPS kernel: [ 249.193590] ata1.00: status: { DRDY ERR }
Oct 7 20:02:12 StudioXPS kernel: [ 249.193593] ata1.00: error: { UNC }
Oct 7 20:02:12 StudioXPS kernel: [ 249.193596] ata1.00: failed command: WRITE FPDMA QUEUED
Oct 7 20:02:12 StudioXPS kernel: [ 249.193602] ata1.00: cmd 61/d8:10:a0:bd:8b/00:00:0d:00:00/40 tag 2 ncq 110592 out
Oct 7 20:02:12 StudioXPS kernel: [ 249.193602] res 41/40:08:98:6f:53/00:00:1a:00:00/40 Emask 0x9 (media error)
Oct 7 20:02:12 StudioXPS kernel: [ 249.193605] ata1.00: status: { DRDY ERR }
Oct 7 20:02:12 StudioXPS kernel: [ 249.193607] ata1.00: error: { UNC }
Oct 7 20:02:12 StudioXPS kernel: [ 249.196606] ata1.00: configured for UDMA/100
Oct 7 20:02:12 StudioXPS kernel: [ 249.196622] sd 0:0:0:0: >[sda] Unhandled sense code
Oct 7 20:02:12 StudioXPS kernel: [ 249.196624] sd 0:0:0:0: >[sda]
Oct 7 20:02:12 StudioXPS kernel: [ 249.196626] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Oct 7 20:02:12 StudioXPS kernel: [ 249.196628] sd 0:0:0:0: >[sda]
Oct 7 20:02:12 StudioXPS kernel: [ 249.196629] Sense Key : Medium Error [current] [descriptor]
Oct 7 20:02:12 StudioXPS kernel: [ 249.196633] Descriptor sense data with sense descriptors (in hex):
Oct 7 20:02:12 StudioXPS kernel: [ 249.196634] 72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00
Oct 7 20:02:12 StudioXPS kernel: [ 249.196642] 1a 53 6f 98
Oct 7 20:02:12 StudioXPS kernel: [ 249.196645] sd 0:0:0:0: >[sda]
Oct 7 20:02:12 StudioXPS kernel: [ 249.196648] Add. Sense: Unrecovered read error - auto reallocate failed
Oct 7 20:02:12 StudioXPS kernel: [ 249.196650] sd 0:0:0:0: >[sda] CDB:
Oct 7 20:02:12 StudioXPS kernel: [ 249.196651] Read(10): 28 00 1a 53 6f 90 00 00 20 00
Oct 7 20:02:12 StudioXPS kernel: [ 249.196658] end_request: I/O error, dev sda, sector 441675672
Oct 7 20:02:12 StudioXPS kernel: [ 249.196674] sd 0:0:0:0: >[sda] Unhandled sense code
Oct 7 20:02:12 StudioXPS kernel: [ 249.196676] sd 0:0:0:0: >[sda]
Oct 7 20:02:12 StudioXPS kernel: [ 249.196678] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Oct 7 20:02:12 StudioXPS kernel: [ 249.196679] sd 0:0:0:0: >[sda]
Oct 7 20:02:12 StudioXPS kernel: [ 249.196681] Sense Key : Medium Error [current] [descriptor]
Oct 7 20:02:12 StudioXPS kernel: [ 249.196683] Descriptor sense data with sense descriptors (in hex):
Oct 7 20:02:12 StudioXPS kernel: [ 249.196684] 72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00
Oct 7 20:02:12 StudioXPS kernel: [ 249.196692] 1a 53 6f 98
Oct 7 20:02:12 StudioXPS kernel: [ 249.196695] sd 0:0:0:0: >[sda]
Oct 7 20:02:12 StudioXPS kernel: [ 249.196697] Add. Sense: Unrecovered read error - auto reallocate failed
Oct 7 20:02:12 StudioXPS kernel: [ 249.196699] sd 0:0:0:0: >[sda] CDB:
Oct 7 20:02:12 StudioXPS kernel: [ 249.196700] Write(10): 2a 00 2b 0e fb 18 00 00 18 00
Oct 7 20:02:12 StudioXPS kernel: [ 249.196706] end_request: I/O error, dev sda, sector 722402072
Oct 7 20:02:12 StudioXPS kernel: [ 249.196710] Buffer I/O error on device sda6, logical block 82899555
Oct 7 20:02:12 StudioXPS kernel: [ 249.196718] Buffer I/O error on device sda6, logical block 82899556
Oct 7 20:02:12 StudioXPS kernel: [ 249.196722] Buffer I/O error on device sda6, logical block 82899557
Oct 7 20:02:12 StudioXPS kernel: [ 249.196725] EXT4-fs warning (device sda6): ext4_end_bio:250: I/O error writing to inode 20709582 (offset 0 size 12288 starting block 90300262)
Oct 7 20:02:12 StudioXPS kernel: [ 249.196726] JBD2: Detected IO errors while flushing file data on sda6-8
Oct 7 20:02:12 StudioXPS kernel: [ 249.196737] sd 0:0:0:0: >[sda] Unhandled sense code
Oct 7 20:02:12 StudioXPS kernel: [ 249.196739] sd 0:0:0:0: >[sda]
Oct 7 20:02:12 StudioXPS kernel: [ 249.196740] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Oct 7 20:02:12 StudioXPS kernel: [ 249.196742] sd 0:0:0:0: >[sda]
Oct 7 20:02:12 StudioXPS kernel: [ 249.196743] Sense Key : Medium Error [current] [descriptor]
Oct 7 20:02:12 StudioXPS kernel: [ 249.196745] Descriptor sense data with sense descriptors (in hex):
Oct 7 20:02:12 StudioXPS kernel: [ 249.196746] 72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00
Oct 7 20:02:12 StudioXPS kernel: [ 249.196754] 1a 53 6f 98
Oct 7 20:02:12 StudioXPS kernel: [ 249.196758] sd 0:0:0:0: >[sda]
Oct 7 20:02:12 StudioXPS kernel: [ 249.196759] Add. Sense: Unrecovered read error - auto reallocate failed
Oct 7 20:02:12 StudioXPS kernel: [ 249.196761] sd 0:0:0:0: >[sda] CDB:
Oct 7 20:02:12 StudioXPS kernel: [ 249.196762] Write(10): 2a 00 0d 8b bd a0 00 00 d8 00
Oct 7 20:02:12 StudioXPS kernel: [ 249.196768] end_request: I/O error, dev sda, sector 227261856
Oct 7 20:02:12 StudioXPS kernel: [ 249.196781] ata1: EH complete
Oct 7 20:02:12 StudioXPS kernel: [ 249.196810] Aborting journal on device sda6-8.
Oct 7 20:02:12 StudioXPS kernel: [ 249.197216] EXT4-fs error (device sda6): ext4_journal_start_sb:370: Detected aborted journal
Oct 7 20:02:12 StudioXPS kernel: [ 249.197219] EXT4-fs (sda6): Remounting filesystem read-only
Oct 7 20:02:13 StudioXPS kernel: [ 250.934678] ecryptfs_encrypt_page: Error attempting to write lower page; rc = [-30]
Oct 7 20:02:13 StudioXPS kernel: [ 250.934691] ecryptfs_write_end: Error encrypting page (upper index [0x0000000000000078])
Oct 7 20:02:13 StudioXPS kernel: [ 250.938886] ecryptfs_encrypt_page: Error attempting to write lower page; rc = [-30]
Oct 7 20:02:13 StudioXPS kernel: [ 250.938896] ecryptfs_write_end: Error encrypting page (upper index [0x0000000000000050])
Oct 7 20:02:13 StudioXPS kernel: [ 250.939062] ecryptfs_encrypt_page: Error attempting to write lower page; rc = [-30]
Oct 7 20:02:13 StudioXPS kernel: [ 250.939068] ecryptfs_writepage: Error encrypting page (upper index [0x0000000000000000])
Oct 7 20:02:21 StudioXPS kernel: [ 259.082126] ecryptfs_encrypt_page: Error attempting to write lower page; rc = [-30]
Oct 7 20:02:21 StudioXPS kernel: [ 259.082138] ecryptfs_write_end: Error encrypting page (upper index [0x0000000000000005])
Oct 7 20:02:21 StudioXPS kernel: [ 259.082257] ecryptfs_encrypt_page: Error attempting to write lower page; rc = [-30]
Oct 7 20:02:21 StudioXPS kernel: [ 259.082262] ecryptfs_write_end: Error encrypting page (upper index [0x0000000000000003])
Oct 7 20:02:21 StudioXPS kernel: [ 259.082376] ecryptfs_encrypt_page: Error attempting to write lower page; rc = [-30]
Oct 7 20:02:21 StudioXPS kernel: [ 259.082381] ecryptfs_write_end: Error encrypting page (upper index [0x0000000000000000])
Oct 7 20:05:16 StudioXPS kernel: [ 433.841434] ecryptfs_encrypt_page: Error attempting to write lower page; rc = [-30]
Oct 7 20:05:16 StudioXPS kernel: [ 433.841448] ecryptfs_write_end: Error encrypting page (upper index [0x00000000000000c9])
Oct 7 20:07:57 StudioXPS sudo: pam_ecryptfs: pam_sm_authenticate: /home/lars is already mounted

The hard drive is one month old and has no defects (AFAIK). The problem arises anywhere between directly after boot and 3h into working. A remount with mount -o remount,rw is not possible and is aborted with an error. Since I will most certainly lose data during work, this renders my system unusable for the moment. The problem did not occur when running 12.04.

ProblemType: Bug
DistroRelease: Ubuntu 12.10
Package: linux-image-3.5.0-17-generic 3.5.0-17.27
ProcVersionSignature: Ubuntu 3.5.0-17.27-generic 3.5.5
Uname: Linux 3.5.0-17-generic x86_64
ApportVersion: 2.6.1-0ubuntu1
Architecture: amd64
AudioDevicesInUse:
 USER PID ACCESS COMMAND
 /dev/snd/controlC1: lars 2341 F.... pulseaudio
 /dev/snd/controlC0: lars 2341 F.... pulseaudio
Date: Sun Oct 7 20:00:11 2012
EcryptfsInUse: Yes
InstallationMedia: Ubuntu 12.10 "Quantal Quetzal" - Beta amd64 (20120926)
MachineType: Dell Inc. Studio XPS 1640
ProcFB: 0 radeondrmfb
ProcKernelCmdLine: BOOT_IMAGE=/vmlinuz-3.5.0-17-generic root=UUID=68856248-4726-45a0-84b2-670a468cce31 ro quiet splash
RelatedPackageVersions:
 linux-restricted-modules-3.5.0-17-generic N/A
 linux-backports-modules-3.5.0-17-generic N/A
 linux-firmware 1.94
RfKill:
 0: phy0: Wireless LAN
 Soft blocked: no
 Hard blocked: yes
SourcePackage: linux
UpgradeStatus: No upgrade log present (probably fresh install)
dmi.bios.date: 11/19/2009
dmi.bios.vendor: Dell Inc.
dmi.bios.version: A12
dmi.board.name: 0W497D
dmi.board.vendor: Dell Inc.
dmi.board.version: A12
dmi.chassis.type: 8
dmi.chassis.vendor: Dell Inc.
dmi.chassis.version: A12
dmi.modalias: dmi:bvnDellInc.:bvrA12:bd11/19/2009:svnDellInc.:pnStudioXPS1640:pvrA123:rvnDellInc.:rn0W497D:rvrA12:cvnDellInc.:ct8:cvrA12:
dmi.product.name: Studio XPS 1640
dmi.product.version: A123
dmi.sys.vendor: Dell Inc.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1063354/+subscriptions

--
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp