Re: [PATCH] Avoid that a kernel warning appears during system resume
Hi Bart.

Bart Van Assche - 16.03.19, 22:28:
> On 3/16/19 3:43 AM, Martin Steigerwald wrote:
> > Bart Van Assche - 16.03.19, 00:27:
> >> Since scsi_device_quiesce() skips SCSI devices that are in a state
> >> other than RUNNING, OFFLINE or TRANSPORT_OFFLINE,
> >> scsi_device_resume() should not complain about SCSI devices that
> >> have been skipped. Hence this patch, which prevents the following
> >> warning from appearing during resume:
> > Am I on CC because one of those warnings appeared in bug reports of
> > mine from quite some time ago?
[…]
> That's correct. I hope that you don't mind that I cc'ed you?

No, I don't mind. Would you like me to apply the patch in order to test
it? As it is just about suppressing a kernel warning and not changing any
major functionality, I wondered whether you would like me to test it or
whether the CC is more JFYI.

> >> WARNING: CPU: 3 PID: 1039 at blk_clear_pm_only+0x2a/0x30
> >> CPU: 3 PID: 1039 Comm: kworker/u8:49 Not tainted 5.0.0+ #1
> >> Hardware name: LENOVO 4180F42/4180F42, BIOS 83ET75WW (1.45 )
> > This at least does not appear to be this ThinkPad T520, as I have
> > BIOS version 1.49 already.
> The call trace in the patch description is only an example. I think
> the problem description and the patch apply to all systems that have
> one or more SCSI disks.

Best,
--
Martin
Re: [PATCH] Avoid that a kernel warning appears during system resume
Hi Bart.

Bart Van Assche - 16.03.19, 00:27:
> Since scsi_device_quiesce() skips SCSI devices that are in a state other
> than RUNNING, OFFLINE or TRANSPORT_OFFLINE, scsi_device_resume() should
> not complain about SCSI devices that have been skipped. Hence this
> patch, which prevents the following warning from appearing during
> resume:

Am I on CC because one of those warnings appeared in bug reports of mine
from quite some time ago?

> WARNING: CPU: 3 PID: 1039 at blk_clear_pm_only+0x2a/0x30
> CPU: 3 PID: 1039 Comm: kworker/u8:49 Not tainted 5.0.0+ #1
> Hardware name: LENOVO 4180F42/4180F42, BIOS 83ET75WW (1.45 )

This at least does not appear to be this ThinkPad T520, as I have BIOS
version 1.49 already.

Thanks and have a good weekend,
Martin

> 05/10/2013 Workqueue: events_unbound async_run_entry_fn
> RIP: 0010:blk_clear_pm_only+0x2a/0x30
> Call Trace:
>  ? scsi_device_resume+0x28/0x50
>  ? scsi_dev_type_resume+0x2b/0x80
>  ? async_run_entry_fn+0x2c/0xd0
>  ? process_one_work+0x1f0/0x3f0
>  ? worker_thread+0x28/0x3c0
>  ? process_one_work+0x3f0/0x3f0
>  ? kthread+0x10c/0x130
>  ? __kthread_create_on_node+0x150/0x150
>  ? ret_from_fork+0x1f/0x30
>
> Cc: Christoph Hellwig
> Cc: Hannes Reinecke
> Cc: Ming Lei
> Cc: Johannes Thumshirn
> Cc: Oleksandr Natalenko
> Cc: Martin Steigerwald
> Cc:
> Reported-by: Jisheng Zhang
> Tested-by: Jisheng Zhang
> Fixes: 3a0a529971ec ("block, scsi: Make SCSI quiesce and resume work
> reliably") # v4.15
> Signed-off-by: Bart Van Assche
> ---
>  drivers/scsi/scsi_lib.c | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> index 324f830ee9fa..54ad751b42b8 100644
> --- a/drivers/scsi/scsi_lib.c
> +++ b/drivers/scsi/scsi_lib.c
> @@ -2541,8 +2541,10 @@ void scsi_device_resume(struct scsi_device *sdev)
>  	 * device deleted during suspend)
>  	 */
>  	mutex_lock(&sdev->state_mutex);
> -	sdev->quiesced_by = NULL;
> -	blk_clear_pm_only(sdev->request_queue);
> +	if (sdev->quiesced_by) {
> +		sdev->quiesced_by = NULL;
> +		blk_clear_pm_only(sdev->request_queue);
> +	}
>  	if (sdev->sdev_state == SDEV_QUIESCE)
>  		scsi_device_set_state(sdev, SDEV_RUNNING);
>  	mutex_unlock(&sdev->state_mutex);

--
Martin
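To see why the guard above is sufficient, here is a minimal sketch of the
quiesce side it pairs with. This is not the verbatim kernel source; the
field names follow the quoted patch, everything else is simplified:

/*
 * Simplified sketch (not verbatim kernel code): scsi_device_quiesce()
 * only marks devices it could actually transition to SDEV_QUIESCE.
 * For skipped devices quiesced_by stays NULL, which is exactly what
 * the guarded resume path above tests before blk_clear_pm_only().
 */
static int sketch_scsi_device_quiesce(struct scsi_device *sdev)
{
	int err;

	blk_set_pm_only(sdev->request_queue);

	mutex_lock(&sdev->state_mutex);
	err = scsi_device_set_state(sdev, SDEV_QUIESCE);
	if (err == 0)
		sdev->quiesced_by = current;	/* resume will clear this */
	else
		blk_clear_pm_only(sdev->request_queue);	/* skipped device */
	mutex_unlock(&sdev->state_mutex);

	return err;
}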
Re: [PATCH] Change synchronize_rcu() in scsi_device_quiesce() into synchronize_sched()
Hi Bart.

Bart Van Assche - 16.03.18, 22:51:
> On 03/16/18 14:42, Martin Steigerwald wrote:
> > What is this one about?
> >
> > Fix for the regression I (and others?) reported?¹
> >
> > [1] [Bug 199077] [Possible REGRESSION, 4.16-rc4] Error updating SMART
> > data during runtime and boot failures with blk_mq_terminate_expired
> > in backtrace
> >
> > https://bugzilla.kernel.org/show_bug.cgi?id=199077
[…]
> This patch is a fix for the fix for the bug that you and others had
> reported. See also "I/O hangs after resuming from suspend-to-ram"
> (https://marc.info/?l=linux-block&m=150340235201348).
>
> This patch fixes an API violation. For those users for whom suspend and
> resume works fine with commit 3a0a529971ec applied, it should still work
> fine with this patch applied, since this patch may only cause
> scsi_device_quiesce() to wait longer.

Okay, so if I understand you correctly, this is not related to the
regression I mentioned above. Testing anyway.

Thanks,
--
Martin
Re: [PATCH] Change synchronize_rcu() in scsi_device_quiesce() into synchronize_sched()
Hello Bart.

What is this one about? Fix for the regression I (and others?) reported?¹

[1] [Bug 199077] [Possible REGRESSION, 4.16-rc4] Error updating SMART data
during runtime and boot failures with blk_mq_terminate_expired in backtrace

https://bugzilla.kernel.org/show_bug.cgi?id=199077

Thanks,
Martin

Bart Van Assche - 16.03.18, 18:35:
> Since blk_queue_enter() uses rcu_read_lock_sched(),
> scsi_device_quiesce() must use synchronize_sched().
>
> Reported-by: Tejun Heo
> Fixes: 3a0a529971ec ("block, scsi: Make SCSI quiesce and resume work
> reliably")
> Signed-off-by: Bart Van Assche
> Cc: Hannes Reinecke
> Cc: Ming Lei
> Cc: Christoph Hellwig
> Cc: Johannes Thumshirn
> Cc: Tejun Heo
> Cc: Oleksandr Natalenko
> Cc: Martin Steigerwald
> Cc: sta...@vger.kernel.org # v4.15
> ---
>  drivers/scsi/scsi_lib.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> index 1d83f29aee74..0b99ee2fbbb5 100644
> --- a/drivers/scsi/scsi_lib.c
> +++ b/drivers/scsi/scsi_lib.c
> @@ -3014,7 +3014,7 @@ scsi_device_quiesce(struct scsi_device *sdev)
>  	 * unfreeze even if the queue was already frozen before this
>  	 * function was called. See also https://lwn.net/Articles/573497/.
>  	 */
> -	synchronize_rcu();
> +	synchronize_sched();
>  	blk_mq_unfreeze_queue(q);
>
>  	mutex_lock(&sdev->state_mutex);

--
Martin
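For context on why the flavors must match, here is a minimal sketch of
the pairing; it is not the actual block layer code, and the sketch_
names are made up for illustration. In kernels before the RCU flavor
consolidation (around v4.20), synchronize_rcu() was not guaranteed to
wait for rcu_read_lock_sched() sections:

#include <linux/rcupdate.h>

static bool sketch_preempt_only;	/* stands in for the queue flag */

static bool sketch_queue_enter(void)
{
	bool ok;

	rcu_read_lock_sched();		/* sched-flavor read-side section */
	ok = !READ_ONCE(sketch_preempt_only);
	rcu_read_unlock_sched();
	return ok;
}

static void sketch_quiesce(void)
{
	WRITE_ONCE(sketch_preempt_only, true);
	/*
	 * Must pair with the sched-flavor readers above: once this
	 * returns, every sketch_queue_enter() that could still have
	 * observed the old flag value has finished.
	 */
	synchronize_sched();
}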
Re: [LSF/MM TOPIC] Two blk-mq related topics
Ming Lei - 30.01.18, 02:24:
> > > SCSI_MQ was first enabled in v3.17, but disabled by default. In
> > > v4.13-rc1 it was enabled by default, but the patch was reverted in
> > > v4.13-rc7, and it became disabled by default again.
> > >
> > > Now both the originally reported PM issue (actually SCSI quiesce)
> > > and the sequential IO performance issue have been addressed.
> >
> > Is the blocker bug just not closed because no-one thought to do it:
> >
> > https://bugzilla.kernel.org/show_bug.cgi?id=178381
> >
> > (have we confirmed with the original reporter that this issue is now
> > fixed?)
>
> From a developer's view, this issue is fixed by the following commit:
> 3a0a52997 ("block, scsi: Make SCSI quiesce and resume work reliably"),
> and it has been verified by the reporter on the kernel list.

I have never seen any suspend/hibernate-related issues with blk-mq + BFQ
since then, using a heavily utilized BTRFS dual-SSD RAID 1.

% egrep "MQ|BFQ" /boot/config-4.15.0-tp520-btrfstrim+
CONFIG_POSIX_MQUEUE=y
CONFIG_POSIX_MQUEUE_SYSCTL=y
CONFIG_BLK_WBT_MQ=y
CONFIG_BLK_MQ_PCI=y
CONFIG_BLK_MQ_VIRTIO=y
CONFIG_MQ_IOSCHED_DEADLINE=m
CONFIG_MQ_IOSCHED_KYBER=m
CONFIG_IOSCHED_BFQ=m
CONFIG_BFQ_GROUP_IOSCHED=y
CONFIG_NET_SCH_MQPRIO=m
# CONFIG_SCSI_MQ_DEFAULT is not set
# CONFIG_DM_MQ_DEFAULT is not set
CONFIG_DM_CACHE_SMQ=m

% cat /proc/cmdline
BOOT_IMAGE=/vmlinuz-4.15.0-tp520-btrfstrim+ root=UUID=[…] ro
rootflags=subvol=debian resume=/dev/mapper/sata-swap init=/bin/systemd
thinkpad_acpi.fan_control=1 systemd.restore_state=0
scsi_mod.use_blk_mq=1

% cat /sys/block/sda/queue/scheduler
[bfq] none

Thanks,
--
Martin
Re: [PATCH v7 9/9] block, scsi: Make SCSI quiesce and resume work reliably
Bart Van Assche - 10.10.17, 15:27:
> On Tue, 2017-10-10 at 09:57 +0200, Martin Steigerwald wrote:
> > Bart Van Assche - 09.10.17, 16:14:
> > > The contexts from which a SCSI device can be quiesced or resumed
> > > are:
> > [ ... ]
> >
> > Does this fix the issue as reliably as the patches from Ming? I mean
> > in *real world* scenarios? Or is it just about the same approach as
> > Ming has taken?
> >
> > I ask because I don't see any Tested-by:'s here. I know I tested
> > Ming's patch series and I know it fixes the hang after resume from
> > suspend with blk-mq + BFQ for me. I have an uptime of 7 days and I
> > didn't see any uptime even remotely like that in a long time (before
> > that issue, Intel gfx drivers caused hangs, but thankfully that seems
> > fixed meanwhile).
> >
> > I'd be willing to test. Do you have a 4.14.x tree available with
> > these patches applied that I can just add as a remote and fetch from?
>
> A previous version of this patch series passed Oleksandr's tests. This
> series is close to that previous version, so I think it is safe to
> assume that this version will also pass Oleksandr's tests. Anyway, more
> testing is definitely welcome, so if you want to verify this series
> please start from the following tree (Jens' for-next branch + this
> series):
> https://github.com/bvanassche/linux/tree/blk-mq-pm-v7

Several suspend-to-RAM and resume and suspend-to-disk and resume cycles
completed without issues. So I think this is good.

Tested-by: Martin Steigerwald

Thanks,
--
Martin
Re: [PATCH v7 9/9] block, scsi: Make SCSI quiesce and resume work reliably
Bart Van Assche - 09.10.17, 16:14:
> The contexts from which a SCSI device can be quiesced or resumed are:
> * Writing into /sys/class/scsi_device/*/device/state.
> * SCSI parallel (SPI) domain validation.
> * The SCSI device power management methods. See also scsi_bus_pm_ops.
>
> It is essential during suspend and resume that neither the filesystem
> state nor the filesystem metadata in RAM changes. This is why SCSI
> devices are quiesced while the hibernation image is being written or
> restored. The SCSI core quiesces devices through scsi_device_quiesce()
> and scsi_device_resume(). In the SDEV_QUIESCE state execution of
> non-preempt requests is deferred. This is realized by returning
> BLKPREP_DEFER from inside scsi_prep_state_check() for quiesced SCSI
> devices. Avoid a full queue preventing power management requests from
> being submitted by deferring allocation of non-preempt requests for
> devices in the quiesced state. This patch has been tested by running
> the following commands and by verifying that after resume the fio job
> is still running:
>
> for d in /sys/class/block/sd*[a-z]; do
>   hcil=$(readlink "$d/device")
>   hcil=${hcil#../../../}
>   echo 4 > "$d/queue/nr_requests"
>   echo 1 > "/sys/class/scsi_device/$hcil/device/queue_depth"
> done
> bdev=$(readlink /dev/disk/by-uuid/5217d83f-213e-4b42-b86e-20013325ba6c)
> bdev=${bdev#../../}
> hcil=$(readlink "/sys/block/$bdev/device")
> hcil=${hcil#../../../}
> fio --name="$bdev" --filename="/dev/$bdev" --buffered=0 --bs=512 \
>   --rw=randread --ioengine=libaio --numjobs=4 --iodepth=16 \
>   --iodepth_batch=1 --thread --loops=$((2**31)) &
> pid=$!
> sleep 1
> systemctl hibernate
> sleep 10
> kill $pid
>
> Reported-by: Oleksandr Natalenko
> References: "I/O hangs after resuming from suspend-to-ram"
> (https://marc.info/?l=linux-block&m=150340235201348)
> Signed-off-by: Bart Van Assche
> Cc: Martin K. Petersen
> Cc: Ming Lei
> Cc: Christoph Hellwig
> Cc: Hannes Reinecke
> Cc: Johannes Thumshirn

Does this fix the issue as reliably as the patches from Ming? I mean in
*real world* scenarios? Or is it just about the same approach as Ming has
taken?

I ask because I don't see any Tested-by:'s here. I know I tested Ming's
patch series and I know it fixes the hang after resume from suspend with
blk-mq + BFQ for me. I have an uptime of 7 days and I didn't see any
uptime even remotely like that in a long time (before that issue, Intel
gfx drivers caused hangs, but thankfully that seems fixed meanwhile).

I'd be willing to test. Do you have a 4.14.x tree available with these
patches applied that I can just add as a remote and fetch from?
Thanks,
Martin

> ---
>  block/blk-core.c        | 42 +++---
>  block/blk-mq.c          |  4 ++--
>  block/blk-timeout.c     |  2 +-
>  drivers/scsi/scsi_lib.c | 28 ++--
>  fs/block_dev.c          |  4 ++--
>  include/linux/blkdev.h  |  2 +-
>  6 files changed, 59 insertions(+), 23 deletions(-)
>
> diff --git a/block/blk-core.c b/block/blk-core.c
> index ed992cbd107f..3847ea42e341 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -372,6 +372,7 @@ void blk_clear_preempt_only(struct request_queue *q)
>
>  	spin_lock_irqsave(q->queue_lock, flags);
>  	queue_flag_clear(QUEUE_FLAG_PREEMPT_ONLY, q);
> +	wake_up_all(&q->mq_freeze_wq);
>  	spin_unlock_irqrestore(q->queue_lock, flags);
>  }
>  EXPORT_SYMBOL_GPL(blk_clear_preempt_only);
> @@ -793,15 +794,40 @@ struct request_queue *blk_alloc_queue(gfp_t gfp_mask)
>  }
>  EXPORT_SYMBOL(blk_alloc_queue);
>
> -int blk_queue_enter(struct request_queue *q, bool nowait)
> +/**
> + * blk_queue_enter() - try to increase q->q_usage_counter
> + * @q: request queue pointer
> + * @flags: BLK_MQ_REQ_NOWAIT and/or BLK_MQ_REQ_PREEMPT
> + */
> +int blk_queue_enter(struct request_queue *q, unsigned int flags)
>  {
> +	const bool preempt = flags & BLK_MQ_REQ_PREEMPT;
> +
>  	while (true) {
> +		bool success = false;
>  		int ret;
>
> -		if (percpu_ref_tryget_live(&q->q_usage_counter))
> +		rcu_read_lock_sched();
> +		if (percpu_ref_tryget_live(&q->q_usage_counter)) {
> +			/*
> +			 * The code that sets the PREEMPT_ONLY flag is
> +			 * responsible for ensuring that that flag is
> +			 * globally visible before the queue is unfrozen.
> +			 */
> +			if (preempt || !blk_queue_preempt_only(q)) {
> +				success = true;
> +			} else {
> +				percpu_ref_put(&q->q_usage_counter);
> +				WARN_ONCE("%s: Attempt to allocate non-preempt request in preempt-only mode.\n",
> +					  kobject_name(q->kobj.parent));
> +			}
> +		}
> +		rcu_read_unlock_sched();
> +
> +
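The quoted patch breaks off above. As orientation, here is a rough sketch
of how the remainder of blk_queue_enter() behaves in this series; this is
simplified, not the verbatim patch text, and sketch_tryget() stands in
for the tryget/preempt-only check quoted above:

/*
 * Rough sketch of the rest of blk_queue_enter() in this series; not
 * the verbatim patch.
 */
int sketch_blk_queue_enter(struct request_queue *q, unsigned int flags)
{
	const bool preempt = flags & BLK_MQ_REQ_PREEMPT;

	while (true) {
		if (sketch_tryget(q, preempt))
			return 0;

		if (flags & BLK_MQ_REQ_NOWAIT)
			return -EBUSY;

		/*
		 * Sleep until the queue is unfrozen and (for non-preempt
		 * callers) no longer in preempt-only mode.  Both queue
		 * unfreeze and blk_clear_preempt_only() perform a
		 * wake_up_all(&q->mq_freeze_wq).
		 */
		wait_event(q->mq_freeze_wq,
			   (atomic_read(&q->mq_freeze_depth) == 0 &&
			    (preempt || !blk_queue_preempt_only(q))) ||
			   blk_queue_dying(q));
		if (blk_queue_dying(q))
			return -ENODEV;
	}
}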
Re: [PATCH V7 0/6] block/scsi: safe SCSI quiescing
Hi Ming.

Ming Lei - 30.09.17, 14:12:
> Please consider this patchset for v4.15. It fixes one kind of long-term
> I/O hang issue, in both the legacy block path and blk-mq.
>
> The current SCSI quiesce implementation isn't safe and it is easy to
> trigger I/O deadlock.

Isn't that material for -stable as well? I'd love to see this go into
4.14, especially as it's an LTS release.

Thanks,
Martin

> Once a SCSI device is put into QUIESCE, no new request except for
> RQF_PREEMPT can be dispatched to SCSI successfully, and
> scsi_device_quiesce() just simply waits for completion of the I/Os
> already dispatched to the SCSI stack. That isn't enough at all.
>
> New requests can still come in, but none of the allocated requests can
> be dispatched successfully, so the request pool can be used up easily.
>
> Then a request with RQF_PREEMPT can't be allocated and waits forever,
> and the system hangs forever, for example during system suspend or
> SCSI domain validation in the case of transport_spi.
>
> Both I/O hangs during system suspend [1] and during SCSI domain
> validation have been reported before.
>
> This patchset introduces a preempt-only mode, and solves the issue by
> allowing only RQF_PREEMPT requests during SCSI quiesce.
>
> Both SCSI and SCSI_MQ have this I/O deadlock issue; this patchset
> fixes them both.
>
> V7:
> - add Reviewed-by & Tested-by
> - one line change in patch 5 for checking preempt requests
>
> V6:
> - borrow Bart's idea of preempt only, with a clean
>   implementation (patch 5/patch 6)
> - doesn't need any external driver dependency, such as MD's change
>
> V5:
> - fix one tiny race by introducing blk_queue_enter_preempt_freeze();
>   given this change is small enough compared with V4, I added
>   tested-by directly
>
> V4:
> - reorganize patch order to make it more reasonable
> - support nested preempt freeze, as required by SCSI transport spi
> - check preempt freezing in the slow path of blk_queue_enter()
> - add "SCSI: transport_spi: resume a quiesced device"
> - wake up the freeze queue when setting dying, for both blk-mq and
>   legacy
> - rename blk_mq_[freeze|unfreeze]_queue() in one patch
> - rename .mq_freeze_wq and .mq_freeze_depth
> - improve comments
>
> V3:
> - introduce q->preempt_unfreezing to fix one bug of preempt freeze
> - call blk_queue_enter_live() only when the queue is preempt frozen
> - clean up the implementation of preempt freeze a bit
> - only patches 6 and 7 are changed
>
> V2:
> - drop the 1st patch in V1 because percpu_ref_is_dying() is
>   enough, as pointed out by Tejun
> - introduce preempt versions of blk_[freeze|unfreeze]_queue()
> - sync between preempt freeze and normal freeze
> - fix warning from percpu-refcount as reported by Oleksandr
>
> [1] https://marc.info/?t=150340250100013&r=3&w=2
>
> Thanks,
> Ming
>
> Ming Lei (6):
>   blk-mq: only run hw queues for blk-mq
>   block: tracking request allocation with q_usage_counter
>   block: pass flags to blk_queue_enter()
>   block: prepare for passing RQF_PREEMPT to request allocation
>   block: support PREEMPT_ONLY
>   SCSI: set block queue at preempt only when SCSI device is put into
>     quiesce
>
>  block/blk-core.c        | 63 +++--
>  block/blk-mq.c          | 14 ---
>  block/blk-timeout.c     |  2 +-
>  drivers/scsi/scsi_lib.c | 25 +---
>  fs/block_dev.c          |  4 ++--
>  include/linux/blk-mq.h  |  7 +++---
>  include/linux/blkdev.h  | 27 ++---
>  7 files changed, 107 insertions(+), 35 deletions(-)

--
Martin
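To make the preempt-only mechanism concrete, here is a hedged sketch of a
caller that is still admitted while a queue is in preempt-only mode. The
allocation helper name and its signature are assumptions based on the API
of this patch era, not something stated in the cover letter:

/*
 * Hedged sketch of a power-management path allocating a request while
 * the SCSI device is quiesced (queue in preempt-only mode).  The
 * helper blk_get_request_flags() and its signature are assumptions;
 * treat this as illustrative only.
 */
struct request *rq;

rq = blk_get_request_flags(sdev->request_queue, REQ_OP_SCSI_IN,
			   BLK_MQ_REQ_PREEMPT);
if (IS_ERR(rq))
	return PTR_ERR(rq);

/*
 * BLK_MQ_REQ_PREEMPT lets blk_queue_enter() admit the allocation and
 * marks the request RQF_PREEMPT, so it can still be dispatched while
 * the device is in SDEV_QUIESCE.
 */
blk_execute_rq(rq->q, NULL, rq, /* at_head */ 1);
blk_put_request(rq);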
Re: [PATCH V6 0/6] block/scsi: safe SCSI quiescing
Ming Lei - 27.09.17, 16:27:
> On Wed, Sep 27, 2017 at 09:57:37AM +0200, Martin Steigerwald wrote:
> > Hi Ming.
> >
> > Ming Lei - 27.09.17, 13:48:
> > > Hi,
> > >
> > > The current SCSI quiesce implementation isn't safe and it is easy
> > > to trigger I/O deadlock.
> > >
> > > Once a SCSI device is put into QUIESCE, no new request except for
> > > RQF_PREEMPT can be dispatched to SCSI successfully, and
> > > scsi_device_quiesce() just simply waits for completion of the I/Os
> > > already dispatched to the SCSI stack. That isn't enough at all.
> > >
> > > New requests can still come in, but none of the allocated requests
> > > can be dispatched successfully, so the request pool can be used up
> > > easily.
> > >
> > > Then a request with RQF_PREEMPT can't be allocated and waits
> > > forever; meanwhile scsi_device_resume() waits for completion of the
> > > RQF_PREEMPT request, and the system hangs forever, for example
> > > during system suspend or SCSI domain validation.
> > >
> > > Both I/O hangs during system suspend [1] and during SCSI domain
> > > validation have been reported before.
> > >
> > > This patchset introduces a preempt-only mode, and solves the issue
> > > by allowing only RQF_PREEMPT requests during SCSI quiesce.
> > >
> > > Both SCSI and SCSI_MQ have this I/O deadlock issue; this patchset
> > > fixes them both.
> > >
> > > V6:
> > > - borrow Bart's idea of preempt only, with a clean
> > >   implementation (patch 5/patch 6)
> > > - doesn't need any external driver dependency, such as MD's change
> >
> > Do you want me to test with v6 of the patch set? If so, it would be
> > nice if you'd make a v6 branch in your git repo.
>
> Hi Martin,
>
> I would appreciate it very much if you could run V6 and provide your
> test result. Here is the branch:
>
> https://github.com/ming1/linux/tree/blk_safe_scsi_quiesce_V6
>
> https://github.com/ming1/linux.git #blk_safe_scsi_quiesce_V6
>
> > After an uptime of almost 6 days I am pretty confident that the V5
> > one fixes the issue for me. So
> >
> > Tested-by: Martin Steigerwald
> >
> > for V5.
>
> Thanks for your test!

Two days and almost 6 hours, no hang yet. I bet the whole thing works.
(3e45474d7df3bfdabe4801b5638d197df9810a79)

Tested-by: Martin Steigerwald

(It could still hang after three days, but usually I got the first hang
within the first two days.)

Thanks,
--
Martin
Re: [PATCH V6 0/6] block/scsi: safe SCSI quiescing
Hi Ming.

Ming Lei - 27.09.17, 13:48:
> Hi,
>
> The current SCSI quiesce implementation isn't safe and it is easy to
> trigger I/O deadlock.
>
> Once a SCSI device is put into QUIESCE, no new request except for
> RQF_PREEMPT can be dispatched to SCSI successfully, and
> scsi_device_quiesce() just simply waits for completion of the I/Os
> already dispatched to the SCSI stack. That isn't enough at all.
>
> New requests can still come in, but none of the allocated requests can
> be dispatched successfully, so the request pool can be used up easily.
>
> Then a request with RQF_PREEMPT can't be allocated and waits forever;
> meanwhile scsi_device_resume() waits for completion of the RQF_PREEMPT
> request, and the system hangs forever, for example during system
> suspend or SCSI domain validation.
>
> Both I/O hangs during system suspend [1] and during SCSI domain
> validation have been reported before.
>
> This patchset introduces a preempt-only mode, and solves the issue by
> allowing only RQF_PREEMPT requests during SCSI quiesce.
>
> Both SCSI and SCSI_MQ have this I/O deadlock issue; this patchset
> fixes them both.
>
> V6:
> - borrow Bart's idea of preempt only, with a clean
>   implementation (patch 5/patch 6)
> - doesn't need any external driver dependency, such as MD's change

Do you want me to test with v6 of the patch set? If so, it would be nice
if you'd make a v6 branch in your git repo.

After an uptime of almost 6 days I am pretty confident that the V5 one
fixes the issue for me. So

Tested-by: Martin Steigerwald

for V5.

Thanks,
Martin

> V5:
> - fix one tiny race by introducing blk_queue_enter_preempt_freeze();
>   given this change is small enough compared with V4, I added
>   tested-by directly
>
> V4:
> - reorganize patch order to make it more reasonable
> - support nested preempt freeze, as required by SCSI transport spi
> - check preempt freezing in the slow path of blk_queue_enter()
> - add "SCSI: transport_spi: resume a quiesced device"
> - wake up the freeze queue when setting dying, for both blk-mq and
>   legacy
> - rename blk_mq_[freeze|unfreeze]_queue() in one patch
> - rename .mq_freeze_wq and .mq_freeze_depth
> - improve comments
>
> V3:
> - introduce q->preempt_unfreezing to fix one bug of preempt freeze
> - call blk_queue_enter_live() only when the queue is preempt frozen
> - clean up the implementation of preempt freeze a bit
> - only patches 6 and 7 are changed
>
> V2:
> - drop the 1st patch in V1 because percpu_ref_is_dying() is
>   enough, as pointed out by Tejun
> - introduce preempt versions of blk_[freeze|unfreeze]_queue()
> - sync between preempt freeze and normal freeze
> - fix warning from percpu-refcount as reported by Oleksandr
>
> [1] https://marc.info/?t=150340250100013&r=3&w=2
>
> Thanks,
> Ming
>
> Ming Lei (6):
>   blk-mq: only run hw queues for blk-mq
>   block: tracking request allocation with q_usage_counter
>   block: pass flags to blk_queue_enter()
>   block: prepare for passing RQF_PREEMPT to request allocation
>   block: support PREEMPT_ONLY
>   SCSI: set block queue at preempt only when SCSI device is put into
>     quiesce
>
>  block/blk-core.c        | 62 ++---
>  block/blk-mq.c          | 14 ---
>  block/blk-timeout.c     |  2 +-
>  drivers/scsi/scsi_lib.c | 25 +---
>  fs/block_dev.c          |  4 ++--
>  include/linux/blk-mq.h  |  7 +++---
>  include/linux/blkdev.h  | 27 ++---
>  7 files changed, 106 insertions(+), 35 deletions(-)

--
Martin
Re: Race to power off harming SATA SSDs
On Tuesday, 11 April 2017, 11:31:29 CEST, Henrique de Moraes Holschuh wrote:
> On Tue, 11 Apr 2017, Martin Steigerwald wrote:
> > I do have a Crucial M500 and I do have an increase of that counter:
> >
> > martin@merkaba:~[…]/Crucial-M500> grep "^174" smartctl-a-201*
> > smartctl-a-2014-03-05.txt:174 Unexpect_Power_Loss_Ct 0x0032 100 100 000 Old_age Always - 1
> > smartctl-a-2014-10-11-nach-prüfsummenfehlern.txt:174 Unexpect_Power_Loss_Ct 0x0032 100 100 000 Old_age Always - 67
> > smartctl-a-2015-05-01.txt:174 Unexpect_Power_Loss_Ct 0x0032 100 100 000 Old_age Always - 105
> > smartctl-a-2016-02-06.txt:174 Unexpect_Power_Loss_Ct 0x0032 100 100 000 Old_age Always - 148
> > smartctl-a-2016-07-08-unreadable-sector.txt:174 Unexpect_Power_Loss_Ct 0x0032 100 100 000 Old_age Always - 201
> > smartctl-a-2017-04-11.txt:174 Unexpect_Power_Loss_Ct 0x0032 100 100 000 Old_age Always - 272
> >
> > I mostly didn't notice anything, except for one time where I indeed
> > had a BTRFS checksum error, luckily within a BTRFS RAID 1 with an
> > Intel SSD (which also has an attribute for unclean shutdowns that
> > increases).
>
> The Crucial M500 has something called "RAIN", which it got unmodified
> from its Micron datacenter siblings of the time, along with a large
> amount of flash overprovisioning. Too bad it lost the overprovisioned
> supercapacitor bank present on the Microns.

I think I read about this some time ago. I decided on a Crucial M500
because in tests it wasn't the fastest, but there were hints that it
might be one of the most reliable mSATA SSDs of that time.

[… RAIN explanation …]

> > The write-up by Henrique gave me the idea that maybe it wasn't a
> > user-triggered unclean shutdown that caused the issue, but an unclean
> > shutdown triggered by the Linux kernel's SSD shutdown procedure
> > implementation.
>
> Maybe. But that corruption could easily have been caused by something
> else. There is no shortage of possible culprits.

Yes.

> I expect most damage caused by unclean SSD power-offs to be hidden from
> the user/operating system/filesystem by the extensive recovery
> facilities present on most SSDs.
>
> Note that the fact that data was transparently (and successfully)
> recovered doesn't mean damage did not happen, or that the unit was not
> harmed by it: it likely got some extra flash wear at the very least.

Okay, I understand.

Well, my guess back then (I didn't fully elaborate on it in the initial
mail, but did so in the blog post) was based exactly on the fact that I
didn't see any capacitor on the mSATA SSD board. But I know the Intel
SSD 320 has capacitors. So I thought, okay, maybe there really was a
sudden power loss because I tried to exchange the battery during suspend
to RAM / standby, without remembering the event. And I thought, okay,
without a capacitor the SSD then didn't get a chance to write some of
the data. But again, this is just a guess.

I can provide the SMART data files in case you want to have a look at
them.

> BTW, for the record, Windows 7 also appears to have had (and maybe
> still has) this issue as far as I can tell. Almost every user report of
> excessive unclean power-off alerts (and also of SSD bricking) to be
> found on SSD vendor forums comes from Windows users.

Interesting.

Thanks,
--
Martin
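For readers who want to track such a counter over time, here is a
throwaway C sketch along the lines of the grep above. It assumes
smartmontools is installed, sufficient privilege, and that the drive
reports the counter as attribute 174, which is vendor-specific as
Henrique notes:

/*
 * Throwaway sketch: print the unexpected-power-loss counter by running
 * smartctl and filtering its attribute table.  Assumptions: smartctl is
 * in PATH, we run with enough privilege, and the drive reports the
 * counter as attribute 174 (vendor-specific; check the drive's docs).
 */
#include <stdio.h>
#include <string.h>

int main(int argc, char **argv)
{
	char cmd[256], line[512];
	FILE *p;

	if (argc != 2) {
		fprintf(stderr, "usage: %s /dev/sdX\n", argv[0]);
		return 1;
	}
	snprintf(cmd, sizeof(cmd), "smartctl -A %s", argv[1]);
	p = popen(cmd, "r");
	if (!p) {
		perror("popen");
		return 1;
	}
	while (fgets(line, sizeof(line), p)) {
		/* Attribute rows start with the numeric attribute ID. */
		if (strncmp(line, "174 ", 4) == 0)
			fputs(line, stdout);
	}
	pclose(p);
	return 0;
}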
Re: Race to power off harming SATA SSDs
On Tuesday, 11 April 2017, 08:52:06 CEST, Tejun Heo wrote:
> > Evidently, how often the SSD will lose the race depends on the
> > platform and SSD combination, and also on how often the system is
> > powered off. A sluggish firmware that takes its time to cut power can
> > save the day...
> >
> > Observing the effects:
> >
> > An unclean SSD power-off will be signaled by the SSD device through
> > an increase of a specific S.M.A.R.T. attribute. These SMART
> > attributes can be read using the smartmontools package from
> > www.smartmontools.org, which should be available in just about every
> > Linux distro.
> >
> > smartctl -A /dev/sd#
> >
> > The SMART attribute related to unclean power-off is vendor-specific,
> > so one might have to track down the SSD datasheet to know which
> > attribute a particular SSD uses. The naming of the attribute also
> > varies.
> >
> > For a Crucial M500 SSD with up-to-date firmware, this would be
> > attribute 174 "Unexpect_Power_Loss_Ct", for example.
> >
> > NOTE: unclean SSD power-offs are dangerous and may brick the device
> > in the worst case, or otherwise harm it (reduce longevity, damage
> > flash blocks). It is also not impossible to get data corruption.
>
> I get that the incrementing counters might not be pretty, but I'm a bit
> skeptical about this being an actual issue, because if that were true,
> the device would be bricking itself after any sort of power loss, be
> that an actual power loss, battery rundown or hard power-off after a
> crash.

The write-up by Henrique has been a very informative and interesting read
for me. I wondered about the same question, though.

I do have a Crucial M500 and I do have an increase of that counter:

martin@merkaba:~[…]/Crucial-M500> grep "^174" smartctl-a-201*
smartctl-a-2014-03-05.txt:174 Unexpect_Power_Loss_Ct 0x0032 100 100 000 Old_age Always - 1
smartctl-a-2014-10-11-nach-prüfsummenfehlern.txt:174 Unexpect_Power_Loss_Ct 0x0032 100 100 000 Old_age Always - 67
smartctl-a-2015-05-01.txt:174 Unexpect_Power_Loss_Ct 0x0032 100 100 000 Old_age Always - 105
smartctl-a-2016-02-06.txt:174 Unexpect_Power_Loss_Ct 0x0032 100 100 000 Old_age Always - 148
smartctl-a-2016-07-08-unreadable-sector.txt:174 Unexpect_Power_Loss_Ct 0x0032 100 100 000 Old_age Always - 201
smartctl-a-2017-04-11.txt:174 Unexpect_Power_Loss_Ct 0x0032 100 100 000 Old_age Always - 272

I mostly didn't notice anything, except for one time where I indeed had a
BTRFS checksum error, luckily within a BTRFS RAID 1 with an Intel SSD
(which also has an attribute for unclean shutdowns that increases).

I blogged about this in German quite some time ago:

https://blog.teamix.de/2015/01/19/btrfs-raid-1-selbstheilung-in-aktion/

(I think it's easy enough to get the point of the blog post even without
understanding German.)

Result of scrub:

scrub started at Thu Oct 9 15:52:00 2014 and finished after 564 seconds
total bytes scrubbed: 268.36GiB with 60 errors
error details: csum=60
corrected errors: 60, uncorrectable errors: 0, unverified errors: 0

Device errors were on:

merkaba:~> btrfs device stats /home
[/dev/mapper/msata-home].write_io_errs   0
[/dev/mapper/msata-home].read_io_errs    0
[/dev/mapper/msata-home].flush_io_errs   0
[/dev/mapper/msata-home].corruption_errs 60
[/dev/mapper/msata-home].generation_errs 0
[…]

(that's the Crucial M500)

I didn't have any explanation for this, but I suspected some unclean
shutdown, even though I remembered no unclean shutdown.
I take good care to always have a battery in this ThinkPad T520, due to
unclean shutdown issues with the Intel SSD 320 (a bricked device which
reports 8 MiB as its capacity, probably fixed by the firmware update I
applied back then).

The write-up by Henrique gave me the idea that maybe it wasn't a
user-triggered unclean shutdown that caused the issue, but an unclean
shutdown triggered by the Linux kernel's SSD shutdown procedure
implementation.

Of course, I don't know whether this is the case, and I think there is no
way to prove or falsify it years after it happened. I never had this
happen again.

Thanks,
--
Martin