Re: [PATCH 05/11] fs: add iterate_supers_excl() and iterate_supers_reverse_excl()

2017-11-29 Thread Rafael J. Wysocki
On Thu, Nov 30, 2017 at 2:34 AM, Dave Chinner  wrote:
> On Thu, Nov 30, 2017 at 12:48:15AM +0100, Rafael J. Wysocki wrote:
>> On Thu, Nov 30, 2017 at 12:23 AM, Luis R. Rodriguez  wrote:
>> > There are use cases where we wish to traverse the superblock list
>> > but also capture errors, and in which case we want to avoid having
>> > our callers issue a lock themselves since we can do the locking for
>> > the callers. Provide an iterate_supers_excl() which calls a function
>> > with the write lock held. If an error occurs we capture it and
>> > propagate it.
>> >
>> > Likewise there are use cases where we wish to traverse the superblock
>> > list but in reverse order. The new iterate_supers_reverse_excl() helper
>> > does this but also captures any errors encountered.
>> >
>> > Signed-off-by: Luis R. Rodriguez 
>> > ---
>> >  fs/super.c | 91 ++
>> >  include/linux/fs.h |  2 ++
>> >  2 files changed, 93 insertions(+)
>> >
>> > diff --git a/fs/super.c b/fs/super.c
>> > index a63513d187e8..885711c1d35b 100644
>> > --- a/fs/super.c
>> > +++ b/fs/super.c
>> > @@ -605,6 +605,97 @@ void iterate_supers(void (*f)(struct super_block *, void *), void *arg)
>> > spin_unlock(&sb_lock);
>> >  }
>> >
>> > +/**
>> > + * iterate_supers_excl - exclusively call func for all active superblocks
>> > + * @f: function to call
>> > + * @arg: argument to pass to it
>> > + *
>> > + * Scans the superblock list and calls given function, passing it
>> > + * locked superblock and given argument. Returns 0 unless an error
>> > + * occurred on calling the function on any superblock.
>> > + */
>> > +int iterate_supers_excl(int (*f)(struct super_block *, void *), void *arg)
>> > +{
>> > +   struct super_block *sb, *p = NULL;
>> > +   int error = 0;
>> > +
>> > +   spin_lock(&sb_lock);
>> > +   list_for_each_entry(sb, &super_blocks, s_list) {
>> > +   if (hlist_unhashed(&sb->s_instances))
>> > +   continue;
>> > +   sb->s_count++;
>> > +   spin_unlock(&sb_lock);
>>
>> Can anything bad happen if the list is modified at this point by a
>> concurrent thread?
>
> No. We hold a valid reference via sb->s_count and that keeps the sb on
> the list while we have the lock dropped. The sb reference isn't
> dropped until we've iterated to the next sb on the list and taken a
> reference to that, hence it's safe to drop and regain the list lock
> without needing to restart the iteration.
>
>> > +
>> > +   down_write(&sb->s_umount);
>> > +   if (sb->s_root && (sb->s_flags & SB_BORN)) {
>> > +   error = f(sb, arg);
>> > +   if (error) {
>> > +   up_write(&sb->s_umount);
>> > +   spin_lock(&sb_lock);
>> > +   __put_super(sb);
>> > +   break;
>> > +   }
>> > +   }
>> > +   up_write(&sb->s_umount);
>> > +
>> > +   spin_lock(&sb_lock);
>> > +   if (p)
>> > +   __put_super(p);
>> > +   p = sb;
>
> This code here is what drops the reference to the previous sb
> we've iterated past.
>
> FWIW, this "hold until next is held" iteration pattern is used
> frequently for inodes, dentries, and other reference counted VFS
> objects so we can iterate the list without needing to hold the
> list lock for the entire iteration.

OK, thanks!


Re: [PATCH 05/11] fs: add iterate_supers_excl() and iterate_supers_reverse_excl()

2017-11-29 Thread Dave Chinner
On Thu, Nov 30, 2017 at 12:48:15AM +0100, Rafael J. Wysocki wrote:
> On Thu, Nov 30, 2017 at 12:23 AM, Luis R. Rodriguez  wrote:
> > There are use cases where we wish to traverse the superblock list
> > but also capture errors, and in which case we want to avoid having
> > our callers issue a lock themselves since we can do the locking for
> > the callers. Provide an iterate_supers_excl() which calls a function
> > with the write lock held. If an error occurs we capture it and
> > propagate it.
> >
> > Likewise there are use cases where we wish to traverse the superblock
> > list but in reverse order. The new iterate_supers_reverse_excl() helper
> > does this but also captures any errors encountered.
> >
> > Signed-off-by: Luis R. Rodriguez 
> > ---
> >  fs/super.c | 91 ++
> >  include/linux/fs.h |  2 ++
> >  2 files changed, 93 insertions(+)
> >
> > diff --git a/fs/super.c b/fs/super.c
> > index a63513d187e8..885711c1d35b 100644
> > --- a/fs/super.c
> > +++ b/fs/super.c
> > @@ -605,6 +605,97 @@ void iterate_supers(void (*f)(struct super_block *, void *), void *arg)
> > spin_unlock(&sb_lock);
> >  }
> >
> > +/**
> > + * iterate_supers_excl - exclusively call func for all active superblocks
> > + * @f: function to call
> > + * @arg: argument to pass to it
> > + *
> > + * Scans the superblock list and calls given function, passing it
> > + * locked superblock and given argument. Returns 0 unless an error
> > + * occurred on calling the function on any superblock.
> > + */
> > +int iterate_supers_excl(int (*f)(struct super_block *, void *), void *arg)
> > +{
> > +   struct super_block *sb, *p = NULL;
> > +   int error = 0;
> > +
> > +   spin_lock(&sb_lock);
> > +   list_for_each_entry(sb, &super_blocks, s_list) {
> > +   if (hlist_unhashed(&sb->s_instances))
> > +   continue;
> > +   sb->s_count++;
> > +   spin_unlock(&sb_lock);
> 
> Can anything bad happen if the list is modified at this point by a
> concurrent thread?

No. We hold a valid reference via sb->s_count and that keeps the sb on
the list while we have the lock dropped. The sb reference isn't
dropped until we've iterated to the next sb on the list and taken a
reference to that, hence it's safe to drop and regain the list lock
without needing to restart the iteration.

> > +
> > +   down_write(&sb->s_umount);
> > +   if (sb->s_root && (sb->s_flags & SB_BORN)) {
> > +   error = f(sb, arg);
> > +   if (error) {
> > +   up_write(&sb->s_umount);
> > +   spin_lock(&sb_lock);
> > +   __put_super(sb);
> > +   break;
> > +   }
> > +   }
> > +   up_write(&sb->s_umount);
> > +
> > +   spin_lock(&sb_lock);
> > +   if (p)
> > +   __put_super(p);
> > +   p = sb;

This code here is what drops the reference to the previous sb
we've iterated past.

FWIW, this "hold until next is held" iteration pattern is used
frequently for inodes, dentries, and other reference counted VFS
objects so we can iterate the list without needing to hold the
list lock for the entire iteration.
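
For readers unfamiliar with it, here is a distilled sketch of that pattern
(nothing new, just the walk from the patch above with the callback elided):

	struct super_block *sb, *p = NULL;

	spin_lock(&sb_lock);
	list_for_each_entry(sb, &super_blocks, s_list) {
		sb->s_count++;			/* pin the current entry */
		spin_unlock(&sb_lock);		/* it can't be freed now */

		/* ... operate on sb with sb_lock dropped ... */

		spin_lock(&sb_lock);
		if (p)
			__put_super(p);		/* drop the previous entry */
		p = sb;				/* current becomes previous */
	}
	if (p)
		__put_super(p);
	spin_unlock(&sb_lock);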

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com


Re: [PATCH 05/11] fs: add iterate_supers_excl() and iterate_supers_reverse_excl()

2017-11-29 Thread Luis R. Rodriguez
On Thu, Nov 30, 2017 at 12:48:15AM +0100, Rafael J. Wysocki wrote:
> On Thu, Nov 30, 2017 at 12:23 AM, Luis R. Rodriguez  wrote:
> > +int iterate_supers_excl(int (*f)(struct super_block *, void *), void *arg)
> > +{
> > +   struct super_block *sb, *p = NULL;
> > +   int error = 0;
> > +
> > +   spin_lock(&sb_lock);
> > +   list_for_each_entry(sb, &super_blocks, s_list) {
> > +   if (hlist_unhashed(&sb->s_instances))
> > +   continue;
> > +   sb->s_count++;
> > +   spin_unlock(&sb_lock);
> 
> Can anything bad happen if the list is modified at this point by a
> concurrent thread?

The race is theoretical and applies to all users of iterate_supers() as well.

It's certainly worth considering; however, given other code is implicated, it's
not a *new* issue or race. It's the best we can do with the current design.

That said, as I looked at all this code I considered that perhaps super_blocks
deserves its own RCU lock to enable us to do full swap operations on the list,
without having to do these nasty releases in between.

If that's possible / desirable I'd consider it a welcome future optimization,
and I could give it a shot, however it's unclear if this is a requirement for
this feature at this time.
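
For illustration only, a read-side walk could then look like the sketch
below; this assumes a hypothetical conversion of super_blocks to an
RCU-protected list, which is *not* something this series does (and writers
would still need sb_lock plus an RCU-aware free of struct super_block):

	rcu_read_lock();
	list_for_each_entry_rcu(sb, &super_blocks, s_list) {
		if (hlist_unhashed(&sb->s_instances))
			continue;
		/* Sleeping is not allowed here, so taking
		 * down_write(&sb->s_umount) would still require pinning
		 * sb and leaving the RCU read-side section first. */
	}
	rcu_read_unlock();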

  Luis


[PATCH V3] block: drain queue before waiting for q_usage_counter becoming zero

2017-11-29 Thread Ming Lei
Commit 055f6e18e08f ("block: Make q_usage_counter also track legacy
requests") made .q_usage_counter track legacy requests, but it never
drains the legacy queue before waiting for this counter to become zero,
so an IO hang is caused in the test of pulling a disk during IO.

This patch fixes the issue by draining requests before waiting for
q_usage_counter to become zero. Both Mauricio and chenxiang reported
this issue and observed that it is fixed by this patch.

Link: https://marc.info/?l=linux-block&m=151192424731797&w=2
Fixes: 055f6e18e08f ("block: Make q_usage_counter also track legacy requests")
Cc: Wen Xiong 
Tested-by: "chenxiang (M)" 
Tested-by: Mauricio Faria de Oliveira 
Signed-off-by: Ming Lei 
---
V3:
- V2 can't cover chenxiang's issue, so we have to drain the queue via
blk_drain_queue() and fall back to the original post (V1)


 block/blk-core.c | 9 +++--
 block/blk-mq.c   | 2 ++
 block/blk.h  | 2 ++
 3 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index e5a623b45a1d..01329367e51c 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -562,6 +562,13 @@ static void __blk_drain_queue(struct request_queue *q, bool drain_all)
}
 }
 
+void blk_drain_queue(struct request_queue *q)
+{
+   spin_lock_irq(q->queue_lock);
+   __blk_drain_queue(q, true);
+   spin_unlock_irq(q->queue_lock);
+}
+
 /**
  * blk_queue_bypass_start - enter queue bypass mode
  * @q: queue of interest
@@ -689,8 +696,6 @@ void blk_cleanup_queue(struct request_queue *q)
 */
blk_freeze_queue(q);
spin_lock_irq(lock);
-   if (!q->mq_ops)
-   __blk_drain_queue(q, true);
queue_flag_set(QUEUE_FLAG_DEAD, q);
spin_unlock_irq(lock);
 
diff --git a/block/blk-mq.c b/block/blk-mq.c
index c94a8d225b63..e4c2e5c17343 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -161,6 +161,8 @@ void blk_freeze_queue(struct request_queue *q)
 * exported to drivers as the only user for unfreeze is blk_mq.
 */
blk_freeze_queue_start(q);
+   if (!q->mq_ops)
+   blk_drain_queue(q);
blk_mq_freeze_queue_wait(q);
 }
 
diff --git a/block/blk.h b/block/blk.h
index 3f1446937aec..442098aa9463 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -330,4 +330,6 @@ static inline void blk_queue_bounce(struct request_queue *q, struct bio **bio)
 }
 #endif /* CONFIG_BOUNCE */
 
+extern void blk_drain_queue(struct request_queue *q);
+
 #endif /* BLK_INTERNAL_H */
-- 
2.9.5



Re: [PATCH 05/11] fs: add iterate_supers_excl() and iterate_supers_reverse_excl()

2017-11-29 Thread Rafael J. Wysocki
On Thu, Nov 30, 2017 at 12:23 AM, Luis R. Rodriguez  wrote:
> There are use cases where we wish to traverse the superblock list
> but also capture errors, and in which case we want to avoid having
> our callers issue a lock themselves since we can do the locking for
> the callers. Provide an iterate_supers_excl() which calls a function
> with the write lock held. If an error occurs we capture it and
> propagate it.
>
> Likewise there are use cases where we wish to traverse the superblock
> list but in reverse order. The new iterate_supers_reverse_excl() helper
> does this but also captures any errors encountered.
>
> Signed-off-by: Luis R. Rodriguez 
> ---
>  fs/super.c | 91 ++
>  include/linux/fs.h |  2 ++
>  2 files changed, 93 insertions(+)
>
> diff --git a/fs/super.c b/fs/super.c
> index a63513d187e8..885711c1d35b 100644
> --- a/fs/super.c
> +++ b/fs/super.c
> @@ -605,6 +605,97 @@ void iterate_supers(void (*f)(struct super_block *, void *), void *arg)
> spin_unlock(&sb_lock);
>  }
>
> +/**
> + * iterate_supers_excl - exclusively call func for all active superblocks
> + * @f: function to call
> + * @arg: argument to pass to it
> + *
> + * Scans the superblock list and calls given function, passing it
> + * locked superblock and given argument. Returns 0 unless an error
> + * occurred on calling the function on any superblock.
> + */
> +int iterate_supers_excl(int (*f)(struct super_block *, void *), void *arg)
> +{
> +   struct super_block *sb, *p = NULL;
> +   int error = 0;
> +
> +   spin_lock(&sb_lock);
> +   list_for_each_entry(sb, &super_blocks, s_list) {
> +   if (hlist_unhashed(&sb->s_instances))
> +   continue;
> +   sb->s_count++;
> +   spin_unlock(&sb_lock);

Can anything bad happen if the list is modified at this point by a
concurrent thread?

> +
> +   down_write(&sb->s_umount);
> +   if (sb->s_root && (sb->s_flags & SB_BORN)) {
> +   error = f(sb, arg);
> +   if (error) {
> +   up_write(&sb->s_umount);
> +   spin_lock(&sb_lock);
> +   __put_super(sb);
> +   break;
> +   }
> +   }
> +   up_write(&sb->s_umount);
> +
> +   spin_lock(&sb_lock);
> +   if (p)
> +   __put_super(p);
> +   p = sb;
> +   }
> +   if (p)
> +   __put_super(p);
> +   spin_unlock(&sb_lock);
> +
> +   return error;
> +}
> +


[PATCH 00/11] fs: use freeze_fs on suspend/hibernate

2017-11-29 Thread Luis R. Rodriguez
This is a followup to the original RFC which proposed to start
killing kthread freezing altogether [0]. Instead of going straight
for the jugular of kthread freezing, this series only addresses
killing freezer calls on filesystems which implement freeze_fs, after
we let the kernel freeze these filesystems for us on suspend.

This approach puts us on a slow but steady path towards the original goal
though. Each subsystem could look for similar solutions. Even with
filesystems we're not all done yet; after this we'll still have to
decide what to do about filesystems which do not implement freeze_fs().

Motivation and problem:

kthreads have some semantics for freezing, which help the kernel
freeze them when a system is going into suspend or hibernation. These
semantics are not well defined though, and it actually turns out
pretty hard to get right.

Without a proper solution, suspend and hibernation are fragile with
respect to filesystems; they can easily break suspend, and fixing such
issues is in no way trivial [1] [2].

Proposed solution:

Instead of fixing such semantics and trying to get all filesystems to do it
right, we can easily do away with all freezing calls if the filesystem
implements a proper freeze_fs() callback. The following eight filesystems
implement freeze_fs(), so we can let the kernel issue the callback upon
suspend and thaw on resume automatically on our behalf (a sketch of how a
filesystem advertises the callback follows the list):

  o xfs
  o reiserfs
  o nilfs2
  o jfs
  o f2fs
  o ext4
  o ext2
  o btrfs
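
A minimal sketch of how a filesystem advertises the callback through its
super_operations (hypothetical filesystem shown purely for illustration;
see the filesystems listed above for the real instances):

	static int examplefs_freeze(struct super_block *sb)
	{
		/* quiesce internal writers, flush in-flight transactions */
		return 0;
	}

	static int examplefs_unfreeze(struct super_block *sb)
	{
		/* restart internal writers */
		return 0;
	}

	static const struct super_operations examplefs_sops = {
		.freeze_fs	= examplefs_freeze,
		.unfreeze_fs	= examplefs_unfreeze,
	};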

Of these, the following have freezer helpers, which can then be removed
after the kernel automatically calls freeze_fs for us on suspend:

  o xfs
  o nilfs2
  o jfs
  o f2fs
  o ext4

I've tested this on a system with ext4 and XFS, and have let 0-day go at
it without issues. I have branches available for linux-next [3] and Linus'
latest tree [4].

Further testing, thoughts, reviews, flames are all equally appreciated.

[0] https://lkml.kernel.org/r/20171003185313.1017-1-mcg...@kernel.org
[1] https://bugzilla.suse.com/show_bug.cgi?id=1043449
[2] https://lkml.kernel.org/r/20171113103139.ga18...@yu-chen.sh.intel.com
[3] https://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux-next.git/log/?h=20171129-fs-freeze-cleanup
[4] https://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux.git/log/?h=20171129-fs-freeze-cleanup

Luis R. Rodriguez (11):
  fs: provide unlocked helper for freeze_super()
  fs: provide unlocked helper thaw_super()
  fs: add frozen sb state helpers
  fs: distinguish between user initiated freeze and kernel initiated
freeze
  fs: add iterate_supers_excl() and iterate_supers_reverse_excl()
  fs: freeze on suspend and thaw on resume
  xfs: remove not needed freezing calls
  ext4: remove not needed freezing calls
  f2fs: remove not needed freezing calls
  nilfs2: remove not needed freezing calls
  jfs: remove not needed freezing calls

 fs/ext4/ext4_jbd2.c|   2 +-
 fs/ext4/super.c|   2 -
 fs/f2fs/gc.c   |   5 +-
 fs/f2fs/segment.c  |   6 +-
 fs/jfs/jfs_logmgr.c|  11 +-
 fs/jfs/jfs_txnmgr.c|  31 ++---
 fs/nilfs2/segment.c|  48 
 fs/super.c | 320 -
 fs/xfs/xfs_trans.c |   2 +-
 fs/xfs/xfs_trans_ail.c |   7 +-
 include/linux/fs.h |  63 +-
 kernel/power/process.c |  15 ++-
 12 files changed, 378 insertions(+), 134 deletions(-)

-- 
2.15.0



[PATCH 03/11] fs: add frozen sb state helpers

2017-11-29 Thread Luis R. Rodriguez
The question of whether or not a superblock is frozen needs to be
augmented in the future to account for differences between a user
initiated freeze and a kernel initiated freeze done automatically
on behalf of the kernel.

Provide helpers that can be used instead, so that we don't have to
expand the checks later at these same call sites as we extend the
definition of a frozen superblock.

Signed-off-by: Luis R. Rodriguez 
---
 fs/ext4/ext4_jbd2.c |  2 +-
 fs/super.c  |  4 ++--
 fs/xfs/xfs_trans.c  |  2 +-
 include/linux/fs.h  | 33 +
 4 files changed, 37 insertions(+), 4 deletions(-)

diff --git a/fs/ext4/ext4_jbd2.c b/fs/ext4/ext4_jbd2.c
index 2d593201cf7a..090b8cd4551a 100644
--- a/fs/ext4/ext4_jbd2.c
+++ b/fs/ext4/ext4_jbd2.c
@@ -50,7 +50,7 @@ static int ext4_journal_check_start(struct super_block *sb)
 
if (sb_rdonly(sb))
return -EROFS;
-   WARN_ON(sb->s_writers.frozen == SB_FREEZE_COMPLETE);
+   WARN_ON(sb_is_frozen(sb));
journal = EXT4_SB(sb)->s_journal;
/*
 * Special case here: if the journal has aborted behind our
diff --git a/fs/super.c b/fs/super.c
index cecc279beecd..e8f5a7139b8f 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -1392,7 +1392,7 @@ static int freeze_locked_super(struct super_block *sb)
 {
int ret;
 
-   if (sb->s_writers.frozen != SB_UNFROZEN)
+   if (!sb_is_unfrozen(sb))
return -EBUSY;
 
if (!(sb->s_flags & SB_BORN))
@@ -1498,7 +1498,7 @@ static int thaw_locked_super(struct super_block *sb)
 {
int error;
 
-   if (sb->s_writers.frozen != SB_FREEZE_COMPLETE)
+   if (!sb_is_frozen(sb))
return -EINVAL;
 
if (sb_rdonly(sb)) {
diff --git a/fs/xfs/xfs_trans.c b/fs/xfs/xfs_trans.c
index a87f657f59c9..b1180c26d902 100644
--- a/fs/xfs/xfs_trans.c
+++ b/fs/xfs/xfs_trans.c
@@ -241,7 +241,7 @@ xfs_trans_alloc(
if (!(flags & XFS_TRANS_NO_WRITECOUNT))
sb_start_intwrite(mp->m_super);
 
-   WARN_ON(mp->m_super->s_writers.frozen == SB_FREEZE_COMPLETE);
+   WARN_ON(sb_is_frozen(mp->m_super));
atomic_inc(&mp->m_active_trans);
 
tp = kmem_zone_zalloc(xfs_trans_zone,
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 511fbaabf624..1e10239c1d3b 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1589,6 +1589,39 @@ static inline void sb_start_intwrite(struct super_block *sb)
__sb_start_write(sb, SB_FREEZE_FS, true);
 }
 
+/**
+ * sb_is_frozen_by_user - is superblock frozen by a user call
+ * @sb: the super to check
+ *
+ * Returns true if the super freeze was initiated by userspace, for instance,
+ * an ioctl call.
+ */
+static inline bool sb_is_frozen_by_user(struct super_block *sb)
+{
+   return sb->s_writers.frozen == SB_FREEZE_COMPLETE;
+}
+
+/**
+ * sb_is_frozen - is superblock frozen
+ * @sb: the super to check
+ *
+ * Returns true if the super is frozen.
+ */
+static inline bool sb_is_frozen(struct super_block *sb)
+{
+   return sb_is_frozen_by_user(sb);
+}
+
+/**
+ * sb_is_unfrozen - is superblock unfrozen
+ * @sb: the super to check
+ *
+ * Returns true if the super is unfrozen.
+ */
+static inline bool sb_is_unfrozen(struct super_block *sb)
+{
+   return sb->s_writers.frozen == SB_UNFROZEN;
+}
 
 extern bool inode_owner_or_capable(const struct inode *inode);
 
-- 
2.15.0



[PATCH 04/11] fs: distinguish between user initiated freeze and kernel initiated freeze

2017-11-29 Thread Luis R. Rodriguez
Userspace can initiate a freeze call using ioctls. If the kernel decides
to freeze a filesystem later it must be able to distinguish whether
userspace had initiated the freeze, so that it does not unfreeze it
automatically on resume.

Likewise, if the kernel is initiating a freeze on its own it should *not*
fail to freeze a filesystem a user had already frozen on our behalf.
The same concept applies to thawing, even if it's not possible for
userspace to beat the kernel in thawing a filesystem. This logic however
has never applied to userspace freezing and thawing: two consecutive
userspace freeze calls will result in only the first one succeeding, so
we must retain that behaviour for userspace.

This doesn't yet implement kernel initiated filesystem freeze calls;
that will be done in subsequent patches. This change should introduce
no functional changes, it just extends the definition of a frozen
filesystem to account for future kernel initiated filesystem freezes.
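
Put concretely, the intended semantics are (illustrative sketch of return
values only, not text from the diff below; both helpers are called with
sb->s_umount held):

	/* On a superblock already frozen by userspace: */
	freeze_locked_super(sb, true);	/* userspace again: -EBUSY, as before */
	freeze_locked_super(sb, false);	/* kernel: 0, already frozen is fine */

	/* On an unfrozen superblock: */
	thaw_locked_super(sb, false);	/* kernel: 0, nothing to thaw */
	thaw_locked_super(sb, true);	/* userspace: -EINVAL, as before */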

Signed-off-by: Luis R. Rodriguez 
---
 fs/super.c | 19 ++-
 include/linux/fs.h | 17 +++--
 2 files changed, 29 insertions(+), 7 deletions(-)

diff --git a/fs/super.c b/fs/super.c
index e8f5a7139b8f..a63513d187e8 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -1388,10 +1388,13 @@ static void sb_freeze_unlock(struct super_block *sb)
 }
 
 /* Caller takes lock and handles active count */
-static int freeze_locked_super(struct super_block *sb)
+static int freeze_locked_super(struct super_block *sb, bool usercall)
 {
int ret;
 
+   if (!usercall && sb_is_frozen(sb))
+   return 0;
+
if (!sb_is_unfrozen(sb))
return -EBUSY;
 
@@ -1436,7 +1439,10 @@ static int freeze_locked_super(struct super_block *sb)
 * For debugging purposes so that fs can warn if it sees write activity
 * when frozen is set to SB_FREEZE_COMPLETE, and for thaw_super().
 */
-   sb->s_writers.frozen = SB_FREEZE_COMPLETE;
+   if (usercall)
+   sb->s_writers.frozen = SB_FREEZE_COMPLETE;
+   else
+   sb->s_writers.frozen = SB_FREEZE_COMPLETE_AUTO;
lockdep_sb_freeze_release(sb);
return 0;
 }
@@ -1481,7 +1487,7 @@ int freeze_super(struct super_block *sb)
atomic_inc(&sb->s_active);
 
down_write(&sb->s_umount);
-   error = freeze_locked_super(sb);
+   error = freeze_locked_super(sb, true);
if (error) {
deactivate_locked_super(sb);
goto out;
@@ -1494,10 +1500,13 @@ int freeze_super(struct super_block *sb)
 EXPORT_SYMBOL(freeze_super);
 
 /* Caller takes lock and handles active count */
-static int thaw_locked_super(struct super_block *sb)
+static int thaw_locked_super(struct super_block *sb, bool usercall)
 {
int error;
 
+   if (!usercall && sb_is_unfrozen(sb))
+   return 0;
+
if (!sb_is_frozen(sb))
return -EINVAL;
 
@@ -1536,7 +1545,7 @@ int thaw_super(struct super_block *sb)
int error;
 
down_write(&sb->s_umount);
-   error = thaw_locked_super(sb);
+   error = thaw_locked_super(sb, true);
if (error) {
up_write(&sb->s_umount);
goto out;
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 1e10239c1d3b..107725b20fad 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1324,9 +1324,10 @@ enum {
SB_FREEZE_FS = 3,   /* For internal FS use (e.g. to stop
 * internal threads if needed) */
SB_FREEZE_COMPLETE = 4, /* ->freeze_fs finished successfully */
+   SB_FREEZE_COMPLETE_AUTO = 5,/* same but initiated automatically */
 };
 
-#define SB_FREEZE_LEVELS (SB_FREEZE_COMPLETE - 1)
+#define SB_FREEZE_LEVELS (SB_FREEZE_COMPLETE_AUTO - 1)
 
 struct sb_writers {
int frozen; /* Is sb frozen? */
@@ -1601,6 +1602,18 @@ static inline bool sb_is_frozen_by_user(struct super_block *sb)
return sb->s_writers.frozen == SB_FREEZE_COMPLETE;
 }
 
+/**
+ * sb_is_frozen_by_kernel - is superblock frozen by the kernel automatically
+ * @sb: the super to check
+ *
+ * Returns true if the super freeze was initiated by the kernel, automatically,
+ * for instance during system sleep or hibernation.
+ */
+static inline bool sb_is_frozen_by_kernel(struct super_block *sb)
+{
+   return sb->s_writers.frozen == SB_FREEZE_COMPLETE_AUTO;
+}
+
 /**
  * sb_is_frozen - is superblock frozen
  * @sb: the super to check
@@ -1609,7 +1622,7 @@ static inline bool sb_is_frozen_by_user(struct super_block *sb)
  */
 static inline bool sb_is_frozen(struct super_block *sb)
 {
-   return sb_is_frozen_by_user(sb);
+   return sb_is_frozen_by_user(sb) || sb_is_frozen_by_kernel(sb);
 }
 
 /**
-- 
2.15.0



[PATCH 01/11] fs: provide unlocked helper for freeze_super()

2017-11-29 Thread Luis R. Rodriguez
freeze_super() holds a write lock; however, we wish to also enable
callers which already hold the write lock. To do this, provide a helper
and make freeze_super() use it. This way, all that freeze_super() does
now is lock handling and active count management.

This change introduces no functional changes.

Suggested-by: Dave Chinner 
Signed-off-by: Luis R. Rodriguez 
---
 fs/super.c | 100 +
 1 file changed, 55 insertions(+), 45 deletions(-)

diff --git a/fs/super.c b/fs/super.c
index d4e33e8f1e6f..a7650ff22f0e 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -1387,59 +1387,20 @@ static void sb_freeze_unlock(struct super_block *sb)
percpu_up_write(sb->s_writers.rw_sem + level);
 }
 
-/**
- * freeze_super - lock the filesystem and force it into a consistent state
- * @sb: the super to lock
- *
- * Syncs the super to make sure the filesystem is consistent and calls the fs's
- * freeze_fs.  Subsequent calls to this without first thawing the fs will return
- * -EBUSY.
- *
- * During this function, sb->s_writers.frozen goes through these values:
- *
- * SB_UNFROZEN: File system is normal, all writes progress as usual.
- *
- * SB_FREEZE_WRITE: The file system is in the process of being frozen.  New
- * writes should be blocked, though page faults are still allowed. We wait for
- * all writes to complete and then proceed to the next stage.
- *
- * SB_FREEZE_PAGEFAULT: Freezing continues. Now also page faults are blocked
- * but internal fs threads can still modify the filesystem (although they
- * should not dirty new pages or inodes), writeback can run etc. After waiting
- * for all running page faults we sync the filesystem which will clean all
- * dirty pages and inodes (no new dirty pages or inodes can be created when
- * sync is running).
- *
- * SB_FREEZE_FS: The file system is frozen. Now all internal sources of fs
- * modification are blocked (e.g. XFS preallocation truncation on inode
- * reclaim). This is usually implemented by blocking new transactions for
- * filesystems that have them and need this additional guard. After all
- * internal writers are finished we call ->freeze_fs() to finish filesystem
- * freezing. Then we transition to SB_FREEZE_COMPLETE state. This state is
- * mostly auxiliary for filesystems to verify they do not modify frozen fs.
- *
- * sb->s_writers.frozen is protected by sb->s_umount.
- */
-int freeze_super(struct super_block *sb)
+/* Caller takes lock and handles active count */
+static int freeze_locked_super(struct super_block *sb)
 {
int ret;
 
-   atomic_inc(&sb->s_active);
-   down_write(&sb->s_umount);
-   if (sb->s_writers.frozen != SB_UNFROZEN) {
-   deactivate_locked_super(sb);
+   if (sb->s_writers.frozen != SB_UNFROZEN)
return -EBUSY;
-   }
 
-   if (!(sb->s_flags & SB_BORN)) {
-   up_write(&sb->s_umount);
+   if (!(sb->s_flags & SB_BORN))
return 0;   /* sic - it's "nothing to do" */
-   }
 
if (sb_rdonly(sb)) {
/* Nothing to do really... */
sb->s_writers.frozen = SB_FREEZE_COMPLETE;
-   up_write(&sb->s_umount);
return 0;
}
 
@@ -1468,7 +1429,6 @@ int freeze_super(struct super_block *sb)
sb->s_writers.frozen = SB_UNFROZEN;
sb_freeze_unlock(sb);
wake_up(&sb->s_writers.wait_unfrozen);
-   deactivate_locked_super(sb);
return ret;
}
}
@@ -1478,9 +1438,59 @@ int freeze_super(struct super_block *sb)
 */
sb->s_writers.frozen = SB_FREEZE_COMPLETE;
lockdep_sb_freeze_release(sb);
-   up_write(&sb->s_umount);
return 0;
 }
+
+/**
+ * freeze_super - lock the filesystem and force it into a consistent state
+ * @sb: the super to lock
+ *
+ * Syncs the super to make sure the filesystem is consistent and calls the fs's
+ * freeze_fs.  Subsequent calls to this without first thawing the fs will return
+ * -EBUSY.
+ *
+ * During this function, sb->s_writers.frozen goes through these values:
+ *
+ * SB_UNFROZEN: File system is normal, all writes progress as usual.
+ *
+ * SB_FREEZE_WRITE: The file system is in the process of being frozen.  New
+ * writes should be blocked, though page faults are still allowed. We wait for
+ * all writes to complete and then proceed to the next stage.
+ *
+ * SB_FREEZE_PAGEFAULT: Freezing continues. Now also page faults are blocked
+ * but internal fs threads can still modify the filesystem (although they
+ * should not dirty new pages or inodes), writeback can run etc. After waiting
+ * for all running page faults we sync the filesystem which will clean all
+ * dirty pages and inodes (no new dirty pages or inodes can be created when
+ * sync is running).
+ *
+ * SB_FREEZE_FS: The file system is frozen. Now all internal sources 
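
[ The message is truncated here. For orientation, a sketch of the resulting
wrapper, reconstructed from the hunk context above and from how patch 04
later modifies it (so a sketch, not the literal patch text):

	int freeze_super(struct super_block *sb)
	{
		int error;

		atomic_inc(&sb->s_active);
		down_write(&sb->s_umount);
		error = freeze_locked_super(sb);
		if (error) {
			deactivate_locked_super(sb);
			goto out;
		}
		up_write(&sb->s_umount);
	out:
		return error;
	}
]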

[PATCH 02/11] fs: provide unlocked helper thaw_super()

2017-11-29 Thread Luis R. Rodriguez
thaw_super() holds a write lock; however, we wish to also enable
callers which already hold the write lock. To do this, provide a helper
and make thaw_super() use it. This way, all that thaw_super() does
now is lock handling and active count management.

This change introduces no functional changes.

Suggested-by: Dave Chinner 
Signed-off-by: Luis R. Rodriguez 
---
 fs/super.c | 39 ++-
 1 file changed, 26 insertions(+), 13 deletions(-)

diff --git a/fs/super.c b/fs/super.c
index a7650ff22f0e..cecc279beecd 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -1493,21 +1493,13 @@ int freeze_super(struct super_block *sb)
 }
 EXPORT_SYMBOL(freeze_super);
 
-/**
- * thaw_super -- unlock filesystem
- * @sb: the super to thaw
- *
- * Unlocks the filesystem and marks it writeable again after freeze_super().
- */
-int thaw_super(struct super_block *sb)
+/* Caller takes lock and handles active count */
+static int thaw_locked_super(struct super_block *sb)
 {
int error;
 
-   down_write(&sb->s_umount);
-   if (sb->s_writers.frozen != SB_FREEZE_COMPLETE) {
-   up_write(&sb->s_umount);
+   if (sb->s_writers.frozen != SB_FREEZE_COMPLETE)
return -EINVAL;
-   }
 
if (sb_rdonly(sb)) {
sb->s_writers.frozen = SB_UNFROZEN;
@@ -1522,7 +1514,6 @@ int thaw_super(struct super_block *sb)
printk(KERN_ERR
"VFS:Filesystem thaw failed\n");
lockdep_sb_freeze_release(sb);
-   up_write(&sb->s_umount);
return error;
}
}
@@ -1531,7 +1522,29 @@ int thaw_super(struct super_block *sb)
sb_freeze_unlock(sb);
 out:
wake_up(&sb->s_writers.wait_unfrozen);
-   deactivate_locked_super(sb);
return 0;
 }
+
+/**
+ * thaw_super -- unlock filesystem
+ * @sb: the super to thaw
+ *
+ * Unlocks the filesystem and marks it writeable again after freeze_super().
+ */
+int thaw_super(struct super_block *sb)
+{
+   int error;
+
+   down_write(&sb->s_umount);
+   error = thaw_locked_super(sb);
+   if (error) {
+   up_write(&sb->s_umount);
+   goto out;
+   }
+
+   deactivate_locked_super(sb);
+
+out:
+   return error;
+}
 EXPORT_SYMBOL(thaw_super);
-- 
2.15.0



[PATCH 05/11] fs: add iterate_supers_excl() and iterate_supers_reverse_excl()

2017-11-29 Thread Luis R. Rodriguez
There are use cases where we wish to traverse the superblock list
but also capture errors, and in which case we want to avoid having
our callers issue a lock themselves since we can do the locking for
the callers. Provide an iterate_supers_excl() which calls a function
with the write lock held. If an error occurs we capture it and
propagate it.

Likewise there are use cases where we wish to traverse the superblock
list but in reverse order. The new iterate_supers_reverse_excl() helper
does this but also captures any errors encountered.

Signed-off-by: Luis R. Rodriguez 
---
 fs/super.c | 91 ++
 include/linux/fs.h |  2 ++
 2 files changed, 93 insertions(+)

diff --git a/fs/super.c b/fs/super.c
index a63513d187e8..885711c1d35b 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -605,6 +605,97 @@ void iterate_supers(void (*f)(struct super_block *, void *), void *arg)
spin_unlock(&sb_lock);
 }
 
+/**
+ * iterate_supers_excl - exclusively call func for all active superblocks
+ * @f: function to call
+ * @arg: argument to pass to it
+ *
+ * Scans the superblock list and calls given function, passing it
+ * locked superblock and given argument. Returns 0 unless an error
+ * occurred on calling the function on any superblock.
+ */
+int iterate_supers_excl(int (*f)(struct super_block *, void *), void *arg)
+{
+   struct super_block *sb, *p = NULL;
+   int error = 0;
+
+   spin_lock(&sb_lock);
+   list_for_each_entry(sb, &super_blocks, s_list) {
+   if (hlist_unhashed(&sb->s_instances))
+   continue;
+   sb->s_count++;
+   spin_unlock(&sb_lock);
+
+   down_write(&sb->s_umount);
+   if (sb->s_root && (sb->s_flags & SB_BORN)) {
+   error = f(sb, arg);
+   if (error) {
+   up_write(&sb->s_umount);
+   spin_lock(&sb_lock);
+   __put_super(sb);
+   break;
+   }
+   }
+   up_write(&sb->s_umount);
+
+   spin_lock(&sb_lock);
+   if (p)
+   __put_super(p);
+   p = sb;
+   }
+   if (p)
+   __put_super(p);
+   spin_unlock(&sb_lock);
+
+   return error;
+}
+
+/**
+ * iterate_supers_reverse_excl - exclusively calls func in reverse order
+ * @f: function to call
+ * @arg: argument to pass to it
+ *
+ * Scans the superblock list and calls given function, passing it
+ * locked superblock and given argument, in reverse order, and holding
+ * the s_umount write lock. Returns 0 unless an error occurred.
+ */
+int iterate_supers_reverse_excl(int (*f)(struct super_block *, void *),
+void *arg)
+{
+   struct super_block *sb, *p = NULL;
+   int error = 0;
+
+   spin_lock(&sb_lock);
+   list_for_each_entry_reverse(sb, &super_blocks, s_list) {
+   if (hlist_unhashed(&sb->s_instances))
+   continue;
+   sb->s_count++;
+   spin_unlock(&sb_lock);
+
+   down_write(&sb->s_umount);
+   if (sb->s_root && (sb->s_flags & SB_BORN)) {
+   error = f(sb, arg);
+   if (error) {
+   up_write(&sb->s_umount);
+   spin_lock(&sb_lock);
+   __put_super(sb);
+   break;
+   }
+   }
+   up_write(&sb->s_umount);
+
+   spin_lock(&sb_lock);
+   if (p)
+   __put_super(p);
+   p = sb;
+   }
+   if (p)
+   __put_super(p);
+   spin_unlock(&sb_lock);
+
+   return error;
+}
+
 /**
  * iterate_supers_type - call function for superblocks of given type
  * @type: fs type
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 107725b20fad..fe90b6542697 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -3164,6 +3164,8 @@ extern struct super_block *get_active_super(struct block_device *bdev);
 extern void drop_super(struct super_block *sb);
 extern void drop_super_exclusive(struct super_block *sb);
 extern void iterate_supers(void (*)(struct super_block *, void *), void *);
+extern int iterate_supers_excl(int (*f)(struct super_block *, void *), void *arg);
+extern int iterate_supers_reverse_excl(int (*)(struct super_block *, void *), void *);
 extern void iterate_supers_type(struct file_system_type *,
void (*)(struct super_block *, void *), void *);
 
-- 
2.15.0



[PATCH 07/11] xfs: remove not needed freezing calls

2017-11-29 Thread Luis R. Rodriguez
This removes superfluous freezer calls as they are no longer needed
now that the VFS performs filesystem freezing/thawing if the filesystem
has support for it.

The following Coccinelle rule was used, invoked as follows:

spatch --sp-file fs-freeze-cleanup.cocci --in-place fs/$FS/

@ has_freeze_fs @
identifier super_ops;
expression freeze_op;
@@

struct super_operations super_ops = {
.freeze_fs = freeze_op,
};

@ remove_set_freezable depends on has_freeze_fs @
expression time;
statement S, S2, S3;
expression task;
@@

(
-   set_freezable();
|
-   if (try_to_freeze())
-   continue;
|
-   try_to_freeze();
|
-   freezable_schedule();
+   schedule();
|
-   freezable_schedule_timeout(time);
+   schedule_timeout(time);
|
-   if (freezing(task)) { S }
|
-   if (freezing(task)) { S }
-   else
{ S2 }
|
-   freezing(current)
)

Generated-by: Coccinelle SmPL
Signed-off-by: Luis R. Rodriguez 
---
 fs/xfs/xfs_trans_ail.c | 7 ++-
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/fs/xfs/xfs_trans_ail.c b/fs/xfs/xfs_trans_ail.c
index cef89f7127d3..1f3dd10a9d00 100644
--- a/fs/xfs/xfs_trans_ail.c
+++ b/fs/xfs/xfs_trans_ail.c
@@ -513,7 +513,6 @@ xfsaild(
longtout = 0;   /* milliseconds */
 
current->flags |= PF_MEMALLOC;
-   set_freezable();
 
while (1) {
if (tout && tout <= 20)
@@ -551,19 +550,17 @@ xfsaild(
if (!xfs_ail_min(ailp) &&
ailp->xa_target == ailp->xa_target_prev) {
spin_unlock(&ailp->xa_lock);
-   freezable_schedule();
+   schedule();
tout = 0;
continue;
}
spin_unlock(&ailp->xa_lock);
 
if (tout)
-   freezable_schedule_timeout(msecs_to_jiffies(tout));
+   schedule_timeout(msecs_to_jiffies(tout));
 
__set_current_state(TASK_RUNNING);
 
-   try_to_freeze();
-
tout = xfsaild_push(ailp);
}
 
-- 
2.15.0



[PATCH 06/11] fs: freeze on suspend and thaw on resume

2017-11-29 Thread Luis R. Rodriguez
This uses the existing filesystem freeze and thaw callbacks to
freeze each filesystem on suspend/hibernation and thaw upon resume.

This is needed so that we properly stop IO in flight, without races,
after userspace has been frozen. Without this we rely on
kthread freezing and its semantics are loose and error prone.
For instance, even though a kthread may use try_to_freeze() and end
up being frozen, we have no way of being sure that everything that
has been spawned asynchronously from it (such as timers) has also
been stopped.

A long term advantage of also adding filesystem freeze / thaw
support during suspend / hibernation is that we may eventually
be able to drop the kernel's thread freezing completely, as it
was originally added to stop disk IO in flight as we hibernate
or suspend.

This also implies that many kthread users exist which have been
adding freezer semantics onto their kthreads without need. These
will also need to be reviewed later.

This is based on prior work originally by Rafael Wysocki and later by
Jiri Kosina.

Signed-off-by: Luis R. Rodriguez 
---
 fs/super.c | 85 ++
 include/linux/fs.h | 13 
 kernel/power/process.c | 15 -
 3 files changed, 112 insertions(+), 1 deletion(-)

diff --git a/fs/super.c b/fs/super.c
index 885711c1d35b..c3a2842e5690 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -1648,3 +1648,88 @@ int thaw_super(struct super_block *sb)
return error;
 }
 EXPORT_SYMBOL(thaw_super);
+
+#ifdef CONFIG_PM_SLEEP
+static bool super_should_freeze(struct super_block *sb)
+{
+   if (!sb->s_root)
+   return false;
+   if (!(sb->s_flags & MS_BORN))
+   return false;
+   /*
+* We don't freeze virtual filesystems, we skip those filesystems with
+* no backing device.
+*/
+   if (sb->s_bdi == &noop_backing_dev_info)
+   return false;
+   /* No need to freeze read-only filesystems */
+   if (sb->s_flags & MS_RDONLY)
+   return false;
+
+   return true;
+}
+
+static int fs_suspend_freeze_sb(struct super_block *sb, void *priv)
+{
+   int error = 0;
+
+   spin_lock(&sb_lock);
+   if (!super_should_freeze(sb))
+   goto out;
+
+   pr_info("%s (%s): freezing\n", sb->s_type->name, sb->s_id);
+
+   spin_unlock(&sb_lock);
+
+   atomic_inc(&sb->s_active);
+   error = freeze_locked_super(sb, false);
+   if (error)
+   atomic_dec(&sb->s_active);
+
+   spin_lock(&sb_lock);
+   if (error && error != -EBUSY)
+   pr_notice("%s (%s): Unable to freeze, error=%d",
+ sb->s_type->name, sb->s_id, error);
+
+out:
+   spin_unlock(&sb_lock);
+   return error;
+}
+
+int fs_suspend_freeze(void)
+{
+   return iterate_supers_reverse_excl(fs_suspend_freeze_sb, NULL);
+}
+
+static int fs_suspend_thaw_sb(struct super_block *sb, void *priv)
+{
+   int error = 0;
+
+   spin_lock(&sb_lock);
+   if (!super_should_freeze(sb))
+   goto out;
+
+   pr_info("%s (%s): thawing\n", sb->s_type->name, sb->s_id);
+
+   spin_unlock(&sb_lock);
+
+   error = thaw_locked_super(sb, false);
+   if (!error)
+   atomic_dec(&sb->s_active);
+
+   spin_lock(&sb_lock);
+   if (error && error != -EBUSY)
+   pr_notice("%s (%s): Unable to unfreeze, error=%d",
+ sb->s_type->name, sb->s_id, error);
+
+out:
+   spin_unlock(&sb_lock);
+   return error;
+}
+
+int fs_resume_unfreeze(void)
+{
+   return iterate_supers_excl(fs_suspend_thaw_sb, NULL);
+}
+
+#endif
diff --git a/include/linux/fs.h b/include/linux/fs.h
index fe90b6542697..dbaa69c3a4cf 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2237,6 +2237,19 @@ extern int user_statfs(const char __user *, struct kstatfs *);
 extern int fd_statfs(int, struct kstatfs *);
 extern int freeze_super(struct super_block *super);
 extern int thaw_super(struct super_block *super);
+#ifdef CONFIG_PM_SLEEP
+int fs_suspend_freeze(void);
+int fs_resume_unfreeze(void);
+#else
+static inline int fs_suspend_freeze(void)
+{
+   return 0;
+}
+static inline int fs_resume_unfreeze(void)
+{
+   return 0;
+}
+#endif
 extern bool our_mnt(struct vfsmount *mnt);
 extern __printf(2, 3)
 int super_setup_bdi_name(struct super_block *sb, char *fmt, ...);
diff --git a/kernel/power/process.c b/kernel/power/process.c
index c326d7235c5f..7a44f8310968 100644
--- a/kernel/power/process.c
+++ b/kernel/power/process.c
@@ -145,6 +145,16 @@ int freeze_processes(void)
pr_cont("\n");
BUG_ON(in_atomic());
 
+   pr_info("Freezing filesystems ... ");
+   error = fs_suspend_freeze();
+   if (error) {
+   pr_cont("failed\n");
+   fs_resume_unfreeze();
+   thaw_processes();
+   return error;
+   }
+   pr_cont("done.\n");
+
/*
 * 

[PATCH 09/11] f2fs: remove not needed freezing calls

2017-11-29 Thread Luis R. Rodriguez
This removes superfluous freezer calls as they are no longer needed
now that the VFS performs filesystem freezing/thawing if the filesystem
has support for it.

The following Coccinelle rule was used, invoked as follows:

spatch --sp-file fs-freeze-cleanup.cocci --in-place fs/$FS/

@ has_freeze_fs @
identifier super_ops;
expression freeze_op;
@@

struct super_operations super_ops = {
.freeze_fs = freeze_op,
};

@ remove_set_freezable depends on has_freeze_fs @
expression time;
statement S, S2, S3;
expression task;
@@

(
-   set_freezable();
|
-   if (try_to_freeze())
-   continue;
|
-   try_to_freeze();
|
-   freezable_schedule();
+   schedule();
|
-   freezable_schedule_timeout(time);
+   schedule_timeout(time);
|
-   if (freezing(task)) { S }
|
-   if (freezing(task)) { S }
-   else
{ S2 }
|
-   freezing(current)
)

Generated-by: Coccinelle SmPL
Signed-off-by: Luis R. Rodriguez 
---
 fs/f2fs/gc.c  | 5 +
 fs/f2fs/segment.c | 6 +-
 2 files changed, 2 insertions(+), 9 deletions(-)

diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
index d844dcb80570..1032d6aa1756 100644
--- a/fs/f2fs/gc.c
+++ b/fs/f2fs/gc.c
@@ -32,10 +32,9 @@ static int gc_thread_func(void *data)
 
wait_ms = gc_th->min_sleep_time;
 
-   set_freezable();
do {
wait_event_interruptible_timeout(*wq,
-   kthread_should_stop() || freezing(current) ||
+   kthread_should_stop() ||
gc_th->gc_wake,
msecs_to_jiffies(wait_ms));
 
@@ -43,8 +42,6 @@ static int gc_thread_func(void *data)
if (gc_th->gc_wake)
gc_th->gc_wake = 0;
 
-   if (try_to_freeze())
-   continue;
if (kthread_should_stop())
break;
 
diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
index c117e0913f2a..a55e456e67ee 100644
--- a/fs/f2fs/segment.c
+++ b/fs/f2fs/segment.c
@@ -1382,18 +1382,14 @@ static int issue_discard_thread(void *data)
unsigned int wait_ms = DEF_MIN_DISCARD_ISSUE_TIME;
int issued;
 
-   set_freezable();
-
do {
init_discard_policy(&dpolicy, DPOLICY_BG,
dcc->discard_granularity);
 
wait_event_interruptible_timeout(*q,
-   kthread_should_stop() || freezing(current) ||
+   kthread_should_stop() ||
dcc->discard_wake,
msecs_to_jiffies(wait_ms));
-   if (try_to_freeze())
-   continue;
if (kthread_should_stop())
return 0;
 
-- 
2.15.0



[PATCH 08/11] ext4: remove not needed freezing calls

2017-11-29 Thread Luis R. Rodriguez
This removes superfluous freezer calls as they are no longer needed
now that the VFS performs filesystem freezing/thawing if the filesystem
has support for it.

The following Coccinelle rule was used, invoked as follows:

spatch --sp-file fs-freeze-cleanup.cocci --in-place fs/$FS/

@ has_freeze_fs @
identifier super_ops;
expression freeze_op;
@@

struct super_operations super_ops = {
.freeze_fs = freeze_op,
};

@ remove_set_freezable depends on has_freeze_fs @
expression time;
statement S, S2, S3;
expression task;
@@

(
-   set_freezable();
|
-   if (try_to_freeze())
-   continue;
|
-   try_to_freeze();
|
-   freezable_schedule();
+   schedule();
|
-   freezable_schedule_timeout(time);
+   schedule_timeout(time);
|
-   if (freezing(task)) { S }
|
-   if (freezing(task)) { S }
-   else
{ S2 }
|
-   freezing(current)
)

Generated-by: Coccinelle SmPL
Signed-off-by: Luis R. Rodriguez 
---
 fs/ext4/super.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 57a8fa451d3e..8a510b1c2d92 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -2980,8 +2980,6 @@ static int ext4_lazyinit_thread(void *arg)
}
mutex_unlock(&eli->li_list_mtx);
 
-   try_to_freeze();
-
cur = jiffies;
if ((time_after_eq(cur, next_wakeup)) ||
(MAX_JIFFY_OFFSET == next_wakeup)) {
-- 
2.15.0



[PATCH 10/11] nilfs2: remove not needed freezing calls

2017-11-29 Thread Luis R. Rodriguez
This removes superfluous freezer calls as they are no longer needed
now that the VFS performs filesystem freezing/thawing if the filesystem
has support for it.

The following Coccinelle rule was used, invoked as follows:

spatch --sp-file fs-freeze-cleanup.cocci --in-place fs/$FS/

@ has_freeze_fs @
identifier super_ops;
expression freeze_op;
@@

struct super_operations super_ops = {
.freeze_fs = freeze_op,
};

@ remove_set_freezable depends on has_freeze_fs @
expression time;
statement S, S2, S3;
expression task;
@@

(
-   set_freezable();
|
-   if (try_to_freeze())
-   continue;
|
-   try_to_freeze();
|
-   freezable_schedule();
+   schedule();
|
-   freezable_schedule_timeout(time);
+   schedule_timeout(time);
|
-   if (freezing(task)) { S }
|
-   if (freezing(task)) { S }
-   else
{ S2 }
|
-   freezing(current)
)

Generated-by: Coccinelle SmPL
Signed-off-by: Luis R. Rodriguez 
---
 fs/nilfs2/segment.c | 48 
 1 file changed, 20 insertions(+), 28 deletions(-)

diff --git a/fs/nilfs2/segment.c b/fs/nilfs2/segment.c
index 9f3ffba41533..407e12a60145 100644
--- a/fs/nilfs2/segment.c
+++ b/fs/nilfs2/segment.c
@@ -2543,6 +2543,8 @@ static int nilfs_segctor_thread(void *arg)
struct nilfs_sc_info *sci = (struct nilfs_sc_info *)arg;
struct the_nilfs *nilfs = sci->sc_super->s_fs_info;
int timeout = 0;
+   DEFINE_WAIT(wait);
+   int should_sleep = 1;
 
sci->sc_timer_task = current;
 
@@ -2574,38 +2576,28 @@ static int nilfs_segctor_thread(void *arg)
timeout = 0;
}
 
+   prepare_to_wait(&sci->sc_wait_daemon, &wait,
+   TASK_INTERRUPTIBLE);
 
-   if (freezing(current)) {
+   if (sci->sc_seq_request != sci->sc_seq_done)
+   should_sleep = 0;
+   else if (sci->sc_flush_request)
+   should_sleep = 0;
+   else if (sci->sc_state & NILFS_SEGCTOR_COMMIT)
+   should_sleep = time_before(jiffies,
+   sci->sc_timer.expires);
+
+   if (should_sleep) {
spin_unlock(&sci->sc_state_lock);
-   try_to_freeze();
+   schedule();
spin_lock(&sci->sc_state_lock);
-   } else {
-   DEFINE_WAIT(wait);
-   int should_sleep = 1;
-
-   prepare_to_wait(&sci->sc_wait_daemon, &wait,
-   TASK_INTERRUPTIBLE);
-
-   if (sci->sc_seq_request != sci->sc_seq_done)
-   should_sleep = 0;
-   else if (sci->sc_flush_request)
-   should_sleep = 0;
-   else if (sci->sc_state & NILFS_SEGCTOR_COMMIT)
-   should_sleep = time_before(jiffies,
-   sci->sc_timer.expires);
-
-   if (should_sleep) {
-   spin_unlock(&sci->sc_state_lock);
-   schedule();
-   spin_lock(&sci->sc_state_lock);
-   }
-   finish_wait(&sci->sc_wait_daemon, &wait);
-   timeout = ((sci->sc_state & NILFS_SEGCTOR_COMMIT) &&
-  time_after_eq(jiffies, sci->sc_timer.expires));
-
-   if (nilfs_sb_dirty(nilfs) && nilfs_sb_need_update(nilfs))
-   set_nilfs_discontinued(nilfs);
}
+   finish_wait(&sci->sc_wait_daemon, &wait);
+   timeout = ((sci->sc_state & NILFS_SEGCTOR_COMMIT) &&
+  time_after_eq(jiffies, sci->sc_timer.expires));
+
+   if (nilfs_sb_dirty(nilfs) && nilfs_sb_need_update(nilfs))
+   set_nilfs_discontinued(nilfs);
goto loop;
 
  end_thread:
-- 
2.15.0



[PATCH 11/11] jfs: remove not needed freezing calls

2017-11-29 Thread Luis R. Rodriguez
This removes superfluous freezer calls as they are no longer needed
now that the VFS performs filesystem freezing/thawing if the filesystem
has support for it.

The following Coccinelle rule was used, invoked as follows:

spatch --sp-file fs-freeze-cleanup.cocci --in-place fs/$FS/

@ has_freeze_fs @
identifier super_ops;
expression freeze_op;
@@

struct super_operations super_ops = {
.freeze_fs = freeze_op,
};

@ remove_set_freezable depends on has_freeze_fs @
expression time;
statement S, S2, S3;
expression task;
@@

(
-   set_freezable();
|
-   if (try_to_freeze())
-   continue;
|
-   try_to_freeze();
|
-   freezable_schedule();
+   schedule();
|
-   freezable_schedule_timeout(time);
+   schedule_timeout(time);
|
-   if (freezing(task)) { S }
|
-   if (freezing(task)) { S }
-   else
{ S2 }
|
-   freezing(current)
)

Generated-by: Coccinelle SmPL
Signed-off-by: Luis R. Rodriguez 
---
 fs/jfs/jfs_logmgr.c | 11 +++
 fs/jfs/jfs_txnmgr.c | 31 +--
 2 files changed, 12 insertions(+), 30 deletions(-)

diff --git a/fs/jfs/jfs_logmgr.c b/fs/jfs/jfs_logmgr.c
index 0e5d412c0b01..fa5a95d8fba8 100644
--- a/fs/jfs/jfs_logmgr.c
+++ b/fs/jfs/jfs_logmgr.c
@@ -2344,14 +2344,9 @@ int jfsIOWait(void *arg)
spin_lock_irq(&log_redrive_lock);
}
 
-   if (freezing(current)) {
-   spin_unlock_irq(&log_redrive_lock);
-   try_to_freeze();
-   } else {
-   set_current_state(TASK_INTERRUPTIBLE);
-   spin_unlock_irq(&log_redrive_lock);
-   schedule();
-   }
+   set_current_state(TASK_INTERRUPTIBLE);
+   spin_unlock_irq(&log_redrive_lock);
+   schedule();
} while (!kthread_should_stop());
 
jfs_info("jfsIOWait being killed!");
diff --git a/fs/jfs/jfs_txnmgr.c b/fs/jfs/jfs_txnmgr.c
index 4d973524c887..a313300c4651 100644
--- a/fs/jfs/jfs_txnmgr.c
+++ b/fs/jfs/jfs_txnmgr.c
@@ -2747,6 +2747,7 @@ int jfs_lazycommit(void *arg)
struct tblock *tblk;
unsigned long flags;
struct jfs_sb_info *sbi;
+   DECLARE_WAITQUEUE(wq, current);
 
do {
LAZY_LOCK(flags);
@@ -2793,19 +2794,11 @@ int jfs_lazycommit(void *arg)
}
/* In case a wakeup came while all threads were active */
jfs_commit_thread_waking = 0;
-
-   if (freezing(current)) {
-   LAZY_UNLOCK(flags);
-   try_to_freeze();
-   } else {
-   DECLARE_WAITQUEUE(wq, current);
-
-   add_wait_queue(&jfs_commit_thread_wait, &wq);
-   set_current_state(TASK_INTERRUPTIBLE);
-   LAZY_UNLOCK(flags);
-   schedule();
-   remove_wait_queue(&jfs_commit_thread_wait, &wq);
-   }
+   add_wait_queue(&jfs_commit_thread_wait, &wq);
+   set_current_state(TASK_INTERRUPTIBLE);
+   LAZY_UNLOCK(flags);
+   schedule();
+   remove_wait_queue(&jfs_commit_thread_wait, &wq);
} while (!kthread_should_stop());
 
if (!list_empty(&TxAnchor.unlock_queue))
@@ -2982,15 +2975,9 @@ int jfs_sync(void *arg)
}
/* Add anon_list2 back to anon_list */
list_splice_init(&anon_list2, &anon_list);
-
-   if (freezing(current)) {
-   TXN_UNLOCK();
-   try_to_freeze();
-   } else {
-   set_current_state(TASK_INTERRUPTIBLE);
-   TXN_UNLOCK();
-   schedule();
-   }
+   set_current_state(TASK_INTERRUPTIBLE);
+   TXN_UNLOCK();
+   schedule();
} while (!kthread_should_stop());
 
jfs_info("jfs_sync being killed");
-- 
2.15.0



Re: [RFC 5/5] pm: remove kernel thread freezing

2017-11-29 Thread Luis R. Rodriguez
On Wed, Oct 04, 2017 at 01:03:54AM +, Bart Van Assche wrote:
> On Wed, 2017-10-04 at 02:47 +0200, Luis R. Rodriguez wrote:
> >   3) Lookup for kthreads which seem to generate IO -- address / review if
> >  removal of the freezer API can be done somehow with quiescing. This
> >  is currently for example being done on SCSI / md.
> >  4) Only after all the above is done should we consider this patch or some
> > form of it.
> 
> After having given this more thought, I think we should omit these last two
> steps. Modifying the md driver such that it does not submit I/O requests while
> processes are frozen requires either to use the freezer API or to open-code it.
> I think there is general agreement in the kernel community that open-coding a
> single mechanism in multiple drivers is wrong. Does this mean that instead of
> removing the freezer API we should keep it and review all its users instead?

Agreed.

  Luis


Re: [PATCH] [RFC] um: Convert ubd driver to blk-mq

2017-11-29 Thread Christoph Hellwig
On Sun, Nov 26, 2017 at 02:10:53PM +0100, Richard Weinberger wrote:
> MAX_SG is 64, used for blk_queue_max_segments(). This comes from
> a0044bdf60c2 ("uml: batch I/O requests"). Is this still a good/sane
> value for blk-mq?

blk-mq itself doesn't change the tradeoff.

> The driver does IO batching, for each request it issues many UML struct
> io_thread_req request to the IO thread on the host side.
> One io_thread_req per SG page.
> Before the conversion the driver used blk_end_request() to indicate that
> a part of the request is done.
> blk_mq_end_request() does not take a length parameter, therefore we can
> only mark the whole request as done. See the new is_last property on the
> driver.
> Maybe there is a way to partially end requests too in blk-mq?

You can, take a look at scsi_end_request which handles this for blk-mq
and the legacy layer.  That being said I wonder if batching really
makes that much sense if you execute each segment separately?

> Another obstacle with IO batching is that UML IO thread requests can
> fail. Not only due to OOM, also because the pipe between the UML kernel
> process and the host IO thread can return EAGAIN.
> In this case the driver puts the request into a list and retried later
> again when the pipe turns writable.
> I’m not sure whether this restart logic makes sense with blk-mq, maybe
> there is a way in blk-mq to put back a (partial) request?

blk_mq_requeue_request requeues requests that have been partially
executed (or not at all for that matter).
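
For reference, a minimal sketch of partial completion under blk-mq, along
the lines of what scsi_end_request does (the ubd_* helper name here is
made up for illustration):

	/* Complete @bytes of @rq; only finish the request once nothing
	 * remains, so a request can be ended one segment at a time. */
	static void ubd_end_partial(struct request *rq, unsigned int bytes)
	{
		if (!blk_update_request(rq, BLK_STS_OK, bytes))
			__blk_mq_end_request(rq, BLK_STS_OK);
	}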


Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk (also 4.12 stable)

2017-11-29 Thread Christian Borntraeger

On 11/29/2017 08:18 PM, Christian Borntraeger wrote:
> Works fine under KVM with virtio-blk, but still hangs during boot in an LPAR.
> FWIW, the system not only has scsi disks via fcp but also DASDs as a boot disk.
> Seems that this is the place where the system stops. (see the sysrq-t output
> at the bottom).

FWIW, the failing kernel had CONFIG_NR_CPUS=256 and 32 CPUs (with SMT2) == 64
threads; with CONFIG_NR_CPUS=16 the system booted fine.



Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk (also 4.12 stable)

2017-11-29 Thread Christian Borntraeger
Works fine under KVM with virtio-blk, but still hangs during boot in an LPAR.
FWIW, the system not only has scsi disks via fcp but also DASDs as a boot disk.
Seems that this is the place where the system stops. (see the sysrq-t output
at the bottom).


[0.247484] Linux version 4.15.0-rc1+ (cborntra@s38lp08) (gcc version 6.3.1 20161221 (Red Hat 6.3.1-1.0.ibm) (GCC)) #229 SMP Wed Nov 29 20:05:35 CET 2017
[0.247489] setup: Linux is running natively in 64-bit mode
[0.247661] setup: The maximum memory size is 1048576MB
[0.247670] setup: Reserving 1024MB of memory at 1047552MB for crashkernel (System RAM: 1047552MB)
[0.247688] numa: NUMA mode: plain
[0.247794] cpu: 64 configured CPUs, 0 standby CPUs
[0.247834] cpu: The CPU configuration topology of the machine is: 0 0 4 2 3 8 / 4
[0.248279] Write protected kernel read-only data: 12456k
[0.265131] Zone ranges:
[0.265134]   DMA  [mem 0x-0x7fff]
[0.265136]   Normal   [mem 0x8000-0x00ff]
[0.265137] Movable zone start for each node
[0.265138] Early memory node ranges
[0.265139]   node   0: [mem 0x-0x00ff]
[0.265141] Initmem setup node 0 [mem 0x-0x00ff]
[7.445561] random: fast init done
[7.449194] percpu: Embedded 23 pages/cpu @00fbbe60 s56064 r8192 d29952 u94208
[7.449380] Built 1 zonelists, mobility grouping on.  Total pages: 264241152
[7.449381] Policy zone: Normal
[7.449384] Kernel command line: elevator=deadline audit_enable=0 audit=0 audit_debug=0 selinux=0 crashkernel=1024M printk.time=1 zfcp.dbfsize=100 dasd=241c,241d,241e,241f root=/dev/dasda1 kvm.nested=1  BOOT_IMAGE=0
[7.449420] audit: disabled (until reboot)
[7.450513] log_buf_len individual max cpu contribution: 4096 bytes
[7.450514] log_buf_len total cpu_extra contributions: 1044480 bytes
[7.450515] log_buf_len min size: 131072 bytes
[7.450788] log_buf_len: 2097152 bytes
[7.450789] early log buf free: 125076(95%)
[   11.040620] Memory: 1055873868K/1073741824K available (8248K kernel code, 1078K rwdata, 4204K rodata, 812K init, 700K bss, 17867956K reserved, 0K cma-reserved)
[   11.040938] SLUB: HWalign=256, Order=0-3, MinObjects=0, CPUs=256, Nodes=1
[   11.040969] ftrace: allocating 26506 entries in 104 pages
[   11.051476] Hierarchical RCU implementation.
[   11.051476]  RCU event tracing is enabled.
[   11.051478]  RCU debug extended QS entry/exit.
[   11.053263] NR_IRQS: 3, nr_irqs: 3, preallocated irqs: 3
[   11.053444] clocksource: tod: mask: 0x max_cycles: 0x3b0a9be803b0a9, max_idle_ns: 1805497147909793 ns
[   11.160192] console [ttyS0] enabled
[   11.308228] pid_max: default: 262144 minimum: 2048
[   11.308298] Security Framework initialized
[   11.308300] SELinux:  Disabled at boot.
[   11.354028] Dentry cache hash table entries: 33554432 (order: 16, 268435456 bytes)
[   11.376945] Inode-cache hash table entries: 16777216 (order: 15, 134217728 bytes)
[   11.377685] Mount-cache hash table entries: 524288 (order: 10, 4194304 bytes)
[   11.378401] Mountpoint-cache hash table entries: 524288 (order: 10, 4194304 bytes)
[   11.378984] Hierarchical SRCU implementation.
[   11.380032] smp: Bringing up secondary CPUs ...
[   11.393634] smp: Brought up 1 node, 64 CPUs
[   11.585458] devtmpfs: initialized
[   11.588589] clocksource: jiffies: mask: 0x max_cycles: 0x, max_idle_ns: 1911260446275 ns
[   11.588998] futex hash table entries: 65536 (order: 12, 16777216 bytes)
[   11.591926] NET: Registered protocol family 16
[   11.596413] HugeTLB registered 1.00 MiB page size, pre-allocated 0 pages
[   11.597604] SCSI subsystem initialized
[   11.597611] pps_core: LinuxPPS API ver. 1 registered
[   11.597612] pps_core: Software ver. 5.3.6 - Copyright 2005-2007 Rodolfo Giometti
[   11.597614] PTP clock support registered
[   11.599088] NetLabel: Initializing
[   11.599089] NetLabel:  domain hash size = 128
[   11.599090] NetLabel:  protocols = UNLABELED CIPSOv4 CALIPSO
[   11.599101] NetLabel:  unlabeled traffic allowed by default
[   11.612542] PCI host bridge to bus :00
[   11.612546] pci_bus :00: root bus resource [mem 0x8000-0x8000007f 64bit pref]
[   11.612548] pci_bus :00: No busn resource found for root bus, will use [bus 00-ff]
[   11.616458] iommu: Adding device :00:00.0 to group 0
[   12.291894] VFS: Disk quotas dquot_6.6.0
[   12.291942] VFS: Dquot-cache hash table entries: 512 (order 0, 4096 bytes)
[   12.292226] NET: Registered protocol family 2
[   12.292662] TCP established hash table entries: 524288 (order: 
Re: [GIT PULL] nvme fixes for Linux 4.15

2017-11-29 Thread Jens Axboe
On 11/29/2017 09:16 AM, Christoph Hellwig wrote:
> Hi Jens,
> 
> a few more nvme updates for 4.15.  A single small PCIe fix, and a number
> of patches for RDMA that are a little larger than what I'd like to see
> for -rc2, but they fix important issues seen in the wild.

Looks good to me, pulled. Thanks.

-- 
Jens Axboe



[PATCH V15 02/22] mmc: block: Simplify cleaning up the queue

2017-11-29 Thread Adrian Hunter
Use blk_cleanup_queue() to shut down the queue when the driver is removed,
and take an extra reference to the queue so that it cannot be freed before
the final mmc_blk_put().
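
The resulting lifetime rule can be condensed as follows (a sketch, not part
of the patch; only the calls involved are shown):

/*
 * alloc (mmc_blk_alloc_req):
 *	blk_get_queue(md->queue.queue);   extra reference held by 'md'
 *
 * driver removal (mmc_blk_remove_req):
 *	mmc_cleanup_queue(&md->queue);    now calls blk_cleanup_queue();
 *	                                  the queue stops accepting requests
 *
 * last reference (mmc_blk_put, md->usage == 0):
 *	blk_put_queue(md->queue.queue);   extra reference dropped; only
 *	                                  now may the queue memory go away
 */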

Signed-off-by: Adrian Hunter 
---
 drivers/mmc/core/block.c | 17 -
 drivers/mmc/core/queue.c |  2 ++
 2 files changed, 14 insertions(+), 5 deletions(-)

diff --git a/drivers/mmc/core/block.c b/drivers/mmc/core/block.c
index ccfa98af1dd3..e44f6d90aeb4 100644
--- a/drivers/mmc/core/block.c
+++ b/drivers/mmc/core/block.c
@@ -189,7 +189,7 @@ static void mmc_blk_put(struct mmc_blk_data *md)
md->usage--;
if (md->usage == 0) {
int devidx = mmc_get_devidx(md->disk);
-   blk_cleanup_queue(md->queue.queue);
+   blk_put_queue(md->queue.queue);
ida_simple_remove(&mmc_blk_ida, devidx);
put_disk(md->disk);
kfree(md);
@@ -2156,6 +2156,17 @@ static struct mmc_blk_data *mmc_blk_alloc_req(struct mmc_card *card,
 
md->queue.blkdata = md;
 
+   /*
+* Keep an extra reference to the queue so that we can shutdown the
+* queue (i.e. call blk_cleanup_queue()) while there are still
+* references to the 'md'. The corresponding blk_put_queue() is in
+* mmc_blk_put().
+*/
+   if (!blk_get_queue(md->queue.queue)) {
+   mmc_cleanup_queue(&md->queue);
+   goto err_putdisk;
+   }
+
md->disk->major = MMC_BLOCK_MAJOR;
md->disk->first_minor = devidx * perdev_minors;
md->disk->fops = &mmc_bdops;
@@ -2471,10 +2482,6 @@ static void mmc_blk_remove_req(struct mmc_blk_data *md)
 * from being accepted.
 */
card = md->queue.card;
-   spin_lock_irq(md->queue.queue->queue_lock);
-   queue_flag_set(QUEUE_FLAG_BYPASS, md->queue.queue);
-   spin_unlock_irq(md->queue.queue->queue_lock);
-   blk_set_queue_dying(md->queue.queue);
mmc_cleanup_queue(&md->queue);
if (md->disk->flags & GENHD_FL_UP) {
device_remove_file(disk_to_dev(md->disk), &md->force_ro);
diff --git a/drivers/mmc/core/queue.c b/drivers/mmc/core/queue.c
index 26f8da30ebe5..ae6d9da68735 100644
--- a/drivers/mmc/core/queue.c
+++ b/drivers/mmc/core/queue.c
@@ -268,6 +268,8 @@ void mmc_cleanup_queue(struct mmc_queue *mq)
blk_start_queue(q);
spin_unlock_irqrestore(q->queue_lock, flags);
 
+   blk_cleanup_queue(q);
+
mq->card = NULL;
 }
 
-- 
1.9.1



[PATCH V15 03/22] mmc: core: Make mmc_pre_req() and mmc_post_req() available

2017-11-29 Thread Adrian Hunter
Make mmc_pre_req() and mmc_post_req() available to the card drivers. Later
patches will make use of this.
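
The intended pairing for a card driver is roughly the following (a minimal
sketch; cur_mrq and next_mrq are hypothetical requests owned by the caller):

	/* Prepare (e.g. DMA-map) the next request while the current one
	 * is still running on the host ...
	 */
	mmc_pre_req(host, next_mrq);

	/* ... then wait for the current request and post-process (unmap)
	 * it.  Passing a non-zero error instead undoes the pre_req work.
	 */
	wait_for_completion(&cur_mrq->completion);
	mmc_post_req(host, cur_mrq, 0);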

Signed-off-by: Adrian Hunter 
---
 drivers/mmc/core/core.c | 31 ---
 drivers/mmc/core/core.h | 31 +++
 2 files changed, 31 insertions(+), 31 deletions(-)

diff --git a/drivers/mmc/core/core.c b/drivers/mmc/core/core.c
index 1f0f44f4dd5f..7ca6e4866a8b 100644
--- a/drivers/mmc/core/core.c
+++ b/drivers/mmc/core/core.c
@@ -658,37 +658,6 @@ bool mmc_is_req_done(struct mmc_host *host, struct mmc_request *mrq)
 EXPORT_SYMBOL(mmc_is_req_done);
 
 /**
- * mmc_pre_req - Prepare for a new request
- * @host: MMC host to prepare command
- * @mrq: MMC request to prepare for
- *
- * mmc_pre_req() is called in prior to mmc_start_req() to let
- * host prepare for the new request. Preparation of a request may be
- * performed while another request is running on the host.
- */
-static void mmc_pre_req(struct mmc_host *host, struct mmc_request *mrq)
-{
-   if (host->ops->pre_req)
-   host->ops->pre_req(host, mrq);
-}
-
-/**
- * mmc_post_req - Post process a completed request
- * @host: MMC host to post process command
- * @mrq: MMC request to post process for
- * @err: Error, if non zero, clean up any resources made in pre_req
- *
- * Let the host post process a completed request. Post processing of
- * a request may be performed while another reuqest is running.
- */
-static void mmc_post_req(struct mmc_host *host, struct mmc_request *mrq,
-int err)
-{
-   if (host->ops->post_req)
-   host->ops->post_req(host, mrq, err);
-}
-
-/**
  * mmc_finalize_areq() - finalize an asynchronous request
  * @host: MMC host to finalize any ongoing request on
  *
diff --git a/drivers/mmc/core/core.h b/drivers/mmc/core/core.h
index b2877e2d740f..3e3d21304e5f 100644
--- a/drivers/mmc/core/core.h
+++ b/drivers/mmc/core/core.h
@@ -150,4 +150,35 @@ static inline void mmc_claim_host(struct mmc_host *host)
 void mmc_cqe_post_req(struct mmc_host *host, struct mmc_request *mrq);
 int mmc_cqe_recovery(struct mmc_host *host);
 
+/**
+ * mmc_pre_req - Prepare for a new request
+ * @host: MMC host to prepare command
+ * @mrq: MMC request to prepare for
+ *
+ * mmc_pre_req() is called in prior to mmc_start_req() to let
+ * host prepare for the new request. Preparation of a request may be
+ * performed while another request is running on the host.
+ */
+static inline void mmc_pre_req(struct mmc_host *host, struct mmc_request *mrq)
+{
+   if (host->ops->pre_req)
+   host->ops->pre_req(host, mrq);
+}
+
+/**
+ * mmc_post_req - Post process a completed request
+ * @host: MMC host to post process command
+ * @mrq: MMC request to post process for
+ * @err: Error, if non zero, clean up any resources made in pre_req
+ *
+ * Let the host post process a completed request. Post processing of
+ * a request may be performed while another request is running.
+ */
+static inline void mmc_post_req(struct mmc_host *host, struct mmc_request *mrq,
+   int err)
+{
+   if (host->ops->post_req)
+   host->ops->post_req(host, mrq, err);
+}
+
 #endif
-- 
1.9.1



[PATCH V15 05/22] mmc: core: Add parameter use_blk_mq

2017-11-29 Thread Adrian Hunter
Until mmc's blk-mq support is fully implemented and tested, add a parameter
use_blk_mq, which is set to true if the config option MMC_MQ_DEFAULT is
selected (as it is by default).
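
Note that, as the help text below says, the default can be overridden either
way at boot with mmc_core.use_blk_mq=y or =n; and since the parameter is
registered with S_IWUSR | S_IRUGO, the current value is visible under
/sys/module/mmc_core/parameters/use_blk_mq.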

Signed-off-by: Adrian Hunter 
---
 drivers/mmc/Kconfig  | 10 ++
 drivers/mmc/core/core.c  |  7 +++
 drivers/mmc/core/core.h  |  2 ++
 drivers/mmc/core/host.c  |  2 ++
 drivers/mmc/core/host.h  |  4 
 include/linux/mmc/host.h |  1 +
 6 files changed, 26 insertions(+)

diff --git a/drivers/mmc/Kconfig b/drivers/mmc/Kconfig
index ec21388311db..42565562577c 100644
--- a/drivers/mmc/Kconfig
+++ b/drivers/mmc/Kconfig
@@ -12,6 +12,16 @@ menuconfig MMC
  If you want MMC/SD/SDIO support, you should say Y here and
  also to your specific host controller driver.
 
+config MMC_MQ_DEFAULT
+   bool "MMC: use blk-mq I/O path by default"
+   depends on MMC && BLOCK
+   default y
+   ---help---
+ This option enables the new blk-mq based I/O path for MMC block
+ devices by default.  With the option the mmc_core.use_blk_mq
+ module/boot option defaults to Y, without it to N, but it can
+ still be overridden either way.
+
 if MMC
 
 source "drivers/mmc/core/Kconfig"
diff --git a/drivers/mmc/core/core.c b/drivers/mmc/core/core.c
index 7ca6e4866a8b..617802f45386 100644
--- a/drivers/mmc/core/core.c
+++ b/drivers/mmc/core/core.c
@@ -66,6 +66,13 @@
 bool use_spi_crc = 1;
 module_param(use_spi_crc, bool, 0);
 
+#ifdef CONFIG_MMC_MQ_DEFAULT
+bool mmc_use_blk_mq = true;
+#else
+bool mmc_use_blk_mq = false;
+#endif
+module_param_named(use_blk_mq, mmc_use_blk_mq, bool, S_IWUSR | S_IRUGO);
+
 static int mmc_schedule_delayed_work(struct delayed_work *work,
 unsigned long delay)
 {
diff --git a/drivers/mmc/core/core.h b/drivers/mmc/core/core.h
index 3e3d21304e5f..136617d2f971 100644
--- a/drivers/mmc/core/core.h
+++ b/drivers/mmc/core/core.h
@@ -35,6 +35,8 @@ struct mmc_bus_ops {
int (*reset)(struct mmc_host *);
 };
 
+extern bool mmc_use_blk_mq;
+
 void mmc_attach_bus(struct mmc_host *host, const struct mmc_bus_ops *ops);
 void mmc_detach_bus(struct mmc_host *host);
 
diff --git a/drivers/mmc/core/host.c b/drivers/mmc/core/host.c
index 64b03d6eaf18..409a68a96a0a 100644
--- a/drivers/mmc/core/host.c
+++ b/drivers/mmc/core/host.c
@@ -404,6 +404,8 @@ struct mmc_host *mmc_alloc_host(int extra, struct device *dev)
 
host->fixed_drv_type = -EINVAL;
 
+   host->use_blk_mq = mmc_use_blk_mq;
+
return host;
 }
 
diff --git a/drivers/mmc/core/host.h b/drivers/mmc/core/host.h
index fb689a1065ed..6eaf558e62d6 100644
--- a/drivers/mmc/core/host.h
+++ b/drivers/mmc/core/host.h
@@ -74,6 +74,10 @@ static inline bool mmc_card_hs400es(struct mmc_card *card)
return card->host->ios.enhanced_strobe;
 }
 
+static inline bool mmc_host_use_blk_mq(struct mmc_host *host)
+{
+   return host->use_blk_mq;
+}
 
 #endif
 
diff --git a/include/linux/mmc/host.h b/include/linux/mmc/host.h
index e7743eca1021..ce2075d6f429 100644
--- a/include/linux/mmc/host.h
+++ b/include/linux/mmc/host.h
@@ -380,6 +380,7 @@ struct mmc_host {
 	unsigned int		doing_retune:1;	/* re-tuning in progress */
 	unsigned int		retune_now:1;	/* do re-tuning at next req */
 	unsigned int		retune_paused:1; /* re-tuning is temporarily disabled */
+	unsigned int		use_blk_mq:1;	/* use blk-mq */
 
 	int			rescan_disable;	/* disable card detection */
 	int			rescan_entered;	/* used with nonremovable devices */
-- 
1.9.1



[PATCH V15 06/22] mmc: block: Add blk-mq support

2017-11-29 Thread Adrian Hunter
Define and use a blk-mq queue. Discards and flushes are processed
synchronously, but reads and writes asynchronously. To support slow DMA
unmapping, a request is not unmapped (and therefore not completed) until
after the next request has been started. If there is no next request, the
completion is instead done by queued work.
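
The deferred-unmap scheme can be sketched like this (a simplified
illustration; the function and the prev/next bookkeeping are hypothetical,
condensed from the real issue path in the diff below):

static void example_issue_next(struct mmc_host *host,
			       struct mmc_request *next,
			       struct mmc_request *prev,
			       struct request *prev_req)
{
	/* DMA-map and start the next request first ...
	 * (error handling omitted for brevity)
	 */
	mmc_pre_req(host, next);
	mmc_start_request(host, next);

	/* ... then unmap the previous one, so the (possibly slow) unmap
	 * overlaps with the transfer now in flight.  Only at this point
	 * is the previous request completed.
	 */
	if (prev) {
		mmc_post_req(host, prev, 0);
		blk_mq_complete_request(prev_req);
	}
}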

Signed-off-by: Adrian Hunter 
---
 drivers/mmc/core/block.c | 502 ++-
 drivers/mmc/core/block.h |   9 +
 drivers/mmc/core/queue.c | 296 +---
 drivers/mmc/core/queue.h |  32 +++
 4 files changed, 808 insertions(+), 31 deletions(-)

diff --git a/drivers/mmc/core/block.c b/drivers/mmc/core/block.c
index 7dcd5d5b203b..7874c3bbf6b5 100644
--- a/drivers/mmc/core/block.c
+++ b/drivers/mmc/core/block.c
@@ -1220,6 +1220,14 @@ static inline void mmc_blk_reset_success(struct mmc_blk_data *md, int type)
md->reset_done &= ~type;
 }
 
+static void mmc_blk_end_request(struct request *req, blk_status_t error)
+{
+   if (req->mq_ctx)
+   blk_mq_end_request(req, error);
+   else
+   blk_end_request_all(req, error);
+}
+
 /*
  * The non-block commands come back from the block layer after it queued it and
  * processed it with all other requests and then they get issued in this
@@ -1281,7 +1289,7 @@ static void mmc_blk_issue_drv_op(struct mmc_queue *mq, struct request *req)
break;
}
mq_rq->drv_op_result = ret;
-   blk_end_request_all(req, ret ? BLK_STS_IOERR : BLK_STS_OK);
+   mmc_blk_end_request(req, ret ? BLK_STS_IOERR : BLK_STS_OK);
 }
 
 static void mmc_blk_issue_discard_rq(struct mmc_queue *mq, struct request *req)
@@ -1324,7 +1332,7 @@ static void mmc_blk_issue_discard_rq(struct mmc_queue *mq, struct request *req)
else
mmc_blk_reset_success(md, type);
 fail:
-   blk_end_request(req, status, blk_rq_bytes(req));
+   mmc_blk_end_request(req, status);
 }
 
 static void mmc_blk_issue_secdiscard_rq(struct mmc_queue *mq,
@@ -1394,7 +1402,7 @@ static void mmc_blk_issue_secdiscard_rq(struct mmc_queue *mq,
if (!err)
mmc_blk_reset_success(md, type);
 out:
-   blk_end_request(req, status, blk_rq_bytes(req));
+   mmc_blk_end_request(req, status);
 }
 
 static void mmc_blk_issue_flush(struct mmc_queue *mq, struct request *req)
@@ -1404,7 +1412,7 @@ static void mmc_blk_issue_flush(struct mmc_queue *mq, struct request *req)
int ret = 0;
 
ret = mmc_flush_cache(card);
-   blk_end_request_all(req, ret ? BLK_STS_IOERR : BLK_STS_OK);
+   mmc_blk_end_request(req, ret ? BLK_STS_IOERR : BLK_STS_OK);
 }
 
 /*
@@ -1481,11 +1489,9 @@ static void mmc_blk_eval_resp_error(struct mmc_blk_request *brq)
}
 }
 
-static enum mmc_blk_status mmc_blk_err_check(struct mmc_card *card,
-struct mmc_async_req *areq)
+static enum mmc_blk_status __mmc_blk_err_check(struct mmc_card *card,
+  struct mmc_queue_req *mq_mrq)
 {
-   struct mmc_queue_req *mq_mrq = container_of(areq, struct mmc_queue_req,
-   areq);
struct mmc_blk_request *brq = &mq_mrq->brq;
struct request *req = mmc_queue_req_to_req(mq_mrq);
int need_retune = card->host->need_retune;
@@ -1591,6 +1597,15 @@ static enum mmc_blk_status mmc_blk_err_check(struct mmc_card *card,
return MMC_BLK_SUCCESS;
 }
 
+static enum mmc_blk_status mmc_blk_err_check(struct mmc_card *card,
+struct mmc_async_req *areq)
+{
+   struct mmc_queue_req *mq_mrq = container_of(areq, struct mmc_queue_req,
+   areq);
+
+   return __mmc_blk_err_check(card, mq_mrq);
+}
+
 static void mmc_blk_data_prep(struct mmc_queue *mq, struct mmc_queue_req *mqrq,
  int disable_multi, bool *do_rel_wr_p,
  bool *do_data_tag_p)
@@ -1783,6 +1798,477 @@ static void mmc_blk_rw_rq_prep(struct mmc_queue_req *mqrq,
mqrq->areq.err_check = mmc_blk_err_check;
 }
 
+#define MMC_MAX_RETRIES			5
+#define MMC_NO_RETRIES			(MMC_MAX_RETRIES + 1)
+
+#define MMC_READ_SINGLE_RETRIES		2
+
+/* Single sector read during recovery */
+static void mmc_blk_read_single(struct mmc_queue *mq, struct request *req)
+{
+   struct mmc_queue_req *mqrq = req_to_mmc_queue_req(req);
+   struct mmc_request *mrq = &mqrq->brq.mrq;
+   struct mmc_card *card = mq->card;
+   struct mmc_host *host = card->host;
+   blk_status_t error = BLK_STS_OK;
+   int retries = 0;
+
+   do {
+   u32 status;
+   int err;
+
+   mmc_blk_rw_rq_prep(mqrq, card, 1, mq);
+
+   mmc_wait_for_req(host, mrq);
+
+   err = mmc_send_status(card, &status);
+   if (err)

[PATCH V15 07/22] mmc: block: Add CQE support

2017-11-29 Thread Adrian Hunter
Add CQE support to the block driver, including:
- optionally using DCMD for flush requests
- "manually" issuing discard requests
- issuing read / write requests to the CQE
- supporting block-layer timeouts
- handling recovery
- supporting re-tuning

CQE offers 25% - 50% better random multi-threaded I/O.  There is a slight
(e.g. 2%) drop in sequential read speed but no observable change to sequential
write.

CQE automatically sends the commands to complete requests.  However it only
supports reads / writes and so-called "direct commands" (DCMD).  Furthermore
DCMD is limited to one command at a time, but discards require 3 commands.
That makes issuing discards through CQE very awkward, but some CQEs don't
support DCMD anyway.  So for discards, the existing non-CQE approach is
taken, where the mmc core code issues the 3 commands one at a time, i.e.
mmc_erase().  DCMD is used only for issuing flushes.
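
The resulting dispatch amounts to the following (a sketch; the example_*
helpers are hypothetical stand-ins for the code added below):

static void example_cqe_issue(struct mmc_queue *mq, struct request *req)
{
	switch (req_op(req)) {
	case REQ_OP_FLUSH:
		/* one DCMD at a time, where the CQE supports DCMD */
		example_issue_flush_dcmd(mq, req);
		break;
	case REQ_OP_DISCARD:
		/* needs 3 commands: use the non-CQE path, i.e. mmc_erase() */
		example_issue_discard_sync(mq, req);
		break;
	default:
		/* reads and writes are queued to the CQE asynchronously */
		example_issue_cqe_rw(mq, req);
		break;
	}
}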

Signed-off-by: Adrian Hunter 
---
 drivers/mmc/core/block.c | 150 ++-
 drivers/mmc/core/block.h |   2 +
 drivers/mmc/core/queue.c | 162 +--
 drivers/mmc/core/queue.h |  18 ++
 4 files changed, 326 insertions(+), 6 deletions(-)

diff --git a/drivers/mmc/core/block.c b/drivers/mmc/core/block.c
index 7874c3bbf6b5..7275ac5d6799 100644
--- a/drivers/mmc/core/block.c
+++ b/drivers/mmc/core/block.c
@@ -112,6 +112,7 @@ struct mmc_blk_data {
 #define MMC_BLK_WRITE  BIT(1)
 #define MMC_BLK_DISCARDBIT(2)
 #define MMC_BLK_SECDISCARD BIT(3)
+#define MMC_BLK_CQE_RECOVERY   BIT(4)
 
/*
 * Only set in main mmc_blk_data associated
@@ -1730,6 +1731,138 @@ static void mmc_blk_data_prep(struct mmc_queue *mq, struct mmc_queue_req *mqrq,
*do_data_tag_p = do_data_tag;
 }
 
+#define MMC_CQE_RETRIES 2
+
+static void mmc_blk_cqe_complete_rq(struct mmc_queue *mq, struct request *req)
+{
+   struct mmc_queue_req *mqrq = req_to_mmc_queue_req(req);
+   struct mmc_request *mrq = &mqrq->brq.mrq;
+   struct request_queue *q = req->q;
+   struct mmc_host *host = mq->card->host;
+   unsigned long flags;
+   bool put_card;
+   int err;
+
+   mmc_cqe_post_req(host, mrq);
+
+   if (mrq->cmd && mrq->cmd->error)
+   err = mrq->cmd->error;
+   else if (mrq->data && mrq->data->error)
+   err = mrq->data->error;
+   else
+   err = 0;
+
+   if (err) {
+   if (mqrq->retries++ < MMC_CQE_RETRIES)
+   blk_mq_requeue_request(req, true);
+   else
+   blk_mq_end_request(req, BLK_STS_IOERR);
+   } else if (mrq->data) {
+   if (blk_update_request(req, BLK_STS_OK, mrq->data->bytes_xfered))
+   blk_mq_requeue_request(req, true);
+   else
+   __blk_mq_end_request(req, BLK_STS_OK);
+   } else {
+   blk_mq_end_request(req, BLK_STS_OK);
+   }
+
+   spin_lock_irqsave(q->queue_lock, flags);
+
+   mq->in_flight[mmc_issue_type(mq, req)] -= 1;
+
+   put_card = (mmc_tot_in_flight(mq) == 0);
+
+   mmc_cqe_check_busy(mq);
+
+   spin_unlock_irqrestore(q->queue_lock, flags);
+
+   if (!mq->cqe_busy)
+   blk_mq_run_hw_queues(q, true);
+
+   if (put_card)
+   mmc_put_card(mq->card, &mq->ctx);
+}
+
+void mmc_blk_cqe_recovery(struct mmc_queue *mq)
+{
+   struct mmc_card *card = mq->card;
+   struct mmc_host *host = card->host;
+   int err;
+
+   pr_debug("%s: CQE recovery start\n", mmc_hostname(host));
+
+   err = mmc_cqe_recovery(host);
+   if (err)
+   mmc_blk_reset(mq->blkdata, host, MMC_BLK_CQE_RECOVERY);
+   else
+   mmc_blk_reset_success(mq->blkdata, MMC_BLK_CQE_RECOVERY);
+
+   pr_debug("%s: CQE recovery done\n", mmc_hostname(host));
+}
+
+static void mmc_blk_cqe_req_done(struct mmc_request *mrq)
+{
+   struct mmc_queue_req *mqrq = container_of(mrq, struct mmc_queue_req,
+ brq.mrq);
+   struct request *req = mmc_queue_req_to_req(mqrq);
+   struct request_queue *q = req->q;
+   struct mmc_queue *mq = q->queuedata;
+
+   /*
+* Block layer timeouts race with completions which means the normal
+* completion path cannot be used during recovery.
+*/
+   if (mq->in_recovery)
+   mmc_blk_cqe_complete_rq(mq, req);
+   else
+   blk_mq_complete_request(req);
+}
+
+static int mmc_blk_cqe_start_req(struct mmc_host *host, struct mmc_request 
*mrq)
+{
+   mrq->done   = mmc_blk_cqe_req_done;
+   mrq->recovery_notifier  = mmc_cqe_recovery_notifier;
+
+   return mmc_cqe_start_req(host, mrq);
+}
+
+static struct mmc_request *mmc_blk_cqe_prep_dcmd(struct mmc_queue_req *mqrq,
+

[PATCH V15 10/22] mmc: block: blk-mq: Add support for direct completion

2017-11-29 Thread Adrian Hunter
For blk-mq, add support for completing requests directly in the ->done
callback. That means that error handling and urgent background operations
must be handled by recovery_work in that case.
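
In outline, the ->done callback now branches like this (a sketch of the
logic visible in the diff below; schedule_recovery is a hypothetical
stand-in):

static void example_req_done(struct mmc_queue *mq, struct request *req)
{
	if (!mmc_host_done_complete(mq->card->host)) {
		/* As before: hand the request to the waiting dispatch task
		 * or to kblockd's complete_work.
		 */
		return;
	}

	/* Direct completion in the ->done callback: errors and urgent
	 * background operations cannot be handled here, so defer them.
	 */
	if (mmc_blk_rq_error(&req_to_mmc_queue_req(req)->brq))
		schedule_recovery(mq, req);
	else
		blk_mq_complete_request(req);
}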

Signed-off-by: Adrian Hunter 
---
 drivers/mmc/core/block.c | 129 ++-
 drivers/mmc/core/block.h |   1 +
 drivers/mmc/core/host.h  |   5 ++
 drivers/mmc/core/queue.c |   5 +-
 drivers/mmc/core/queue.h |   1 +
 include/linux/mmc/host.h |   1 +
 6 files changed, 116 insertions(+), 26 deletions(-)

diff --git a/drivers/mmc/core/block.c b/drivers/mmc/core/block.c
index 7275ac5d6799..a710a6e95307 100644
--- a/drivers/mmc/core/block.c
+++ b/drivers/mmc/core/block.c
@@ -2131,6 +2131,22 @@ static void mmc_blk_mq_rw_recovery(struct mmc_queue *mq, struct request *req)
}
 }
 
+static inline bool mmc_blk_rq_error(struct mmc_blk_request *brq)
+{
+   mmc_blk_eval_resp_error(brq);
+
+   return brq->sbc.error || brq->cmd.error || brq->stop.error ||
+  brq->data.error || brq->cmd.resp[0] & CMD_ERRORS;
+}
+
+static inline void mmc_blk_rw_reset_success(struct mmc_queue *mq,
+   struct request *req)
+{
+   int type = rq_data_dir(req) == READ ? MMC_BLK_READ : MMC_BLK_WRITE;
+
+   mmc_blk_reset_success(mq->blkdata, type);
+}
+
 static void mmc_blk_mq_complete_rq(struct mmc_queue *mq, struct request *req)
 {
struct mmc_queue_req *mqrq = req_to_mmc_queue_req(req);
@@ -2213,14 +2229,43 @@ static void mmc_blk_mq_post_req(struct mmc_queue *mq, struct request *req)
 
mmc_post_req(host, mrq, 0);
 
-   blk_mq_complete_request(req);
+   /*
+* Block layer timeouts race with completions which means the normal
+* completion path cannot be used during recovery.
+*/
+   if (mq->in_recovery)
+   mmc_blk_mq_complete_rq(mq, req);
+   else
+   blk_mq_complete_request(req);
 
mmc_blk_mq_dec_in_flight(mq, req);
 }
 
+void mmc_blk_mq_recovery(struct mmc_queue *mq)
+{
+   struct request *req = mq->recovery_req;
+   struct mmc_host *host = mq->card->host;
+   struct mmc_queue_req *mqrq = req_to_mmc_queue_req(req);
+
+   mq->recovery_req = NULL;
+   mq->rw_wait = false;
+
+   if (mmc_blk_rq_error(&mqrq->brq)) {
+   mmc_retune_hold_now(host);
+   mmc_blk_mq_rw_recovery(mq, req);
+   }
+
+   mmc_blk_urgent_bkops(mq, mqrq);
+
+   mmc_blk_mq_post_req(mq, req);
+}
+
 static void mmc_blk_mq_complete_prev_req(struct mmc_queue *mq,
 struct request **prev_req)
 {
+   if (mmc_host_done_complete(mq->card->host))
+   return;
+
mutex_lock(&mq->complete_lock);
 
if (!mq->complete_req)
@@ -2254,29 +2299,56 @@ static void mmc_blk_mq_req_done(struct mmc_request *mrq)
struct request *req = mmc_queue_req_to_req(mqrq);
struct request_queue *q = req->q;
struct mmc_queue *mq = q->queuedata;
+   struct mmc_host *host = mq->card->host;
unsigned long flags;
-   bool waiting;
 
-   /*
-* We cannot complete the request in this context, so record that there
-* is a request to complete, and that a following request does not need
-* to wait (although it does need to complete complete_req first).
-*/
-   spin_lock_irqsave(q->queue_lock, flags);
-   mq->complete_req = req;
-   mq->rw_wait = false;
-   waiting = mq->waiting;
-   spin_unlock_irqrestore(q->queue_lock, flags);
+   if (!mmc_host_done_complete(host)) {
+   bool waiting;
 
-   /*
-* If 'waiting' then the waiting task will complete this request,
-* otherwise queue a work to do it. Note that complete_work may still
-* race with the dispatch of a following request.
-*/
-   if (waiting)
+   /*
+* We cannot complete the request in this context, so record
+* that there is a request to complete, and that a following
+* request does not need to wait (although it does need to
+* complete complete_req first).
+*/
+   spin_lock_irqsave(q->queue_lock, flags);
+   mq->complete_req = req;
+   mq->rw_wait = false;
+   waiting = mq->waiting;
+   spin_unlock_irqrestore(q->queue_lock, flags);
+
+   /*
+* If 'waiting' then the waiting task will complete this
+* request, otherwise queue a work to do it. Note that
+* complete_work may still race with the dispatch of a following
+* request.
+*/
+   if (waiting)
+			wake_up(&mq->wait);
+		else
+			kblockd_schedule_work(&mq->complete_work);
+
+   return;
+   }
+
+   /* Take the recovery path 

[PATCH V15 11/22] mmc: block: blk-mq: Separate card polling from recovery

2017-11-29 Thread Adrian Hunter
Recovery is simpler to understand if it is only used for errors. Create a
separate function for card polling.

Signed-off-by: Adrian Hunter 
---
 drivers/mmc/core/block.c | 29 -
 1 file changed, 28 insertions(+), 1 deletion(-)

diff --git a/drivers/mmc/core/block.c b/drivers/mmc/core/block.c
index a710a6e95307..6d2c42c1c33a 100644
--- a/drivers/mmc/core/block.c
+++ b/drivers/mmc/core/block.c
@@ -2139,6 +2139,26 @@ static inline bool mmc_blk_rq_error(struct mmc_blk_request *brq)
   brq->data.error || brq->cmd.resp[0] & CMD_ERRORS;
 }
 
+static int mmc_blk_card_busy(struct mmc_card *card, struct request *req)
+{
+   struct mmc_queue_req *mqrq = req_to_mmc_queue_req(req);
+   bool gen_err = false;
+   int err;
+
+   if (mmc_host_is_spi(card->host) || rq_data_dir(req) == READ)
+   return 0;
+
+   err = card_busy_detect(card, MMC_BLK_TIMEOUT_MS, false, req, &gen_err);
+
+   /* Copy the general error bit so it will be seen later on */
+   if (gen_err) {
+   mqrq->brq.stop.resp[0] |= R1_ERROR;
+   err = err ? err : -EIO;
+   }
+
+   return err;
+}
+
 static inline void mmc_blk_rw_reset_success(struct mmc_queue *mq,
struct request *req)
 {
@@ -2197,8 +2217,15 @@ static void mmc_blk_mq_poll_completion(struct mmc_queue *mq,
   struct request *req)
 {
struct mmc_queue_req *mqrq = req_to_mmc_queue_req(req);
+   struct mmc_host *host = mq->card->host;
 
-   mmc_blk_mq_rw_recovery(mq, req);
+   if (mmc_blk_rq_error(&mqrq->brq) ||
+   mmc_blk_card_busy(mq->card, req)) {
+   mmc_blk_mq_rw_recovery(mq, req);
+   } else {
+   mmc_blk_rw_reset_success(mq, req);
+   mmc_retune_release(host);
+   }
 
mmc_blk_urgent_bkops(mq, mqrq);
 }
-- 
1.9.1



[PATCH V15 13/22] mmc: block: blk-mq: Check error bits and save the exception bit when polling card busy

2017-11-29 Thread Adrian Hunter
Check error bits and save the exception bit when polling card busy.

Signed-off-by: Adrian Hunter 
---
 drivers/mmc/core/block.c | 37 -
 1 file changed, 28 insertions(+), 9 deletions(-)

diff --git a/drivers/mmc/core/block.c b/drivers/mmc/core/block.c
index 30fc012353ae..c446d17b48c4 100644
--- a/drivers/mmc/core/block.c
+++ b/drivers/mmc/core/block.c
@@ -1457,15 +1457,18 @@ static inline void mmc_apply_rel_rw(struct mmc_blk_request *brq,
}
 }
 
-#define CMD_ERRORS \
-   (R1_OUT_OF_RANGE |  /* Command argument out of range */ \
-R1_ADDRESS_ERROR | /* Misaligned address */\
+#define CMD_ERRORS_EXCL_OOR\
+   (R1_ADDRESS_ERROR | /* Misaligned address */\
 R1_BLOCK_LEN_ERROR |   /* Transferred block length incorrect */\
 R1_WP_VIOLATION |  /* Tried to write to protected block */ \
 R1_CARD_ECC_FAILED |   /* Card ECC failed */   \
 R1_CC_ERROR |  /* Card controller error */ \
 R1_ERROR)  /* General/unknown error */
 
+#define CMD_ERRORS \
+   (CMD_ERRORS_EXCL_OOR |  \
+R1_OUT_OF_RANGE)   /* Command argument out of range */ \
+
 static void mmc_blk_eval_resp_error(struct mmc_blk_request *brq)
 {
u32 val;
@@ -2157,24 +2160,40 @@ static inline bool mmc_blk_rq_error(struct mmc_blk_request *brq)
   brq->data.error || brq->cmd.resp[0] & CMD_ERRORS;
 }
 
+static inline bool mmc_blk_oor_valid(struct mmc_blk_request *brq)
+{
+   return !!brq->mrq.sbc;
+}
+
+static inline u32 mmc_blk_stop_err_bits(struct mmc_blk_request *brq)
+{
+   return mmc_blk_oor_valid(brq) ? CMD_ERRORS : CMD_ERRORS_EXCL_OOR;
+}
+
 static int mmc_blk_card_busy(struct mmc_card *card, struct request *req)
 {
struct mmc_queue_req *mqrq = req_to_mmc_queue_req(req);
-   bool gen_err = false;
+   u32 status = 0;
int err;
 
if (mmc_host_is_spi(card->host) || rq_data_dir(req) == READ)
return 0;
 
-	err = card_busy_detect_err(card, MMC_BLK_TIMEOUT_MS, false, req,
-				   &gen_err);
+	err = card_busy_detect(card, MMC_BLK_TIMEOUT_MS, false, req, &status);
 
-   /* Copy the general error bit so it will be seen later on */
-   if (gen_err) {
-   mqrq->brq.stop.resp[0] |= R1_ERROR;
+   /*
+* Do not assume data transferred correctly if there are any error bits
+* set.
+*/
+   if (status & mmc_blk_stop_err_bits(&mqrq->brq)) {
+   mqrq->brq.data.bytes_xfered = 0;
err = err ? err : -EIO;
}
 
+   /* Copy the exception bit so it will be seen later on */
+   if (mmc_card_mmc(card) && status & R1_EXCEPTION_EVENT)
+   mqrq->brq.cmd.resp[0] |= R1_EXCEPTION_EVENT;
+
return err;
 }
 
-- 
1.9.1



[PATCH V15 14/22] mmc: block: Check the timeout correctly in card_busy_detect()

2017-11-29 Thread Adrian Hunter
Pedantically, ensure the status is checked for the last time after the full
timeout has passed.
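
The pattern generalizes (a sketch with hypothetical query_status() and
card_ready() helpers): sample the deadline before issuing the status
command and act on it afterwards, so the last status value examined is
always one obtained after the timeout expired.

	unsigned long deadline = jiffies + msecs_to_jiffies(timeout_ms);

	do {
		bool done = time_after(jiffies, deadline);

		err = query_status(card, &status);
		if (err)
			return err;
		if (card_ready(status))
			return 0;
		if (done)	/* this check ran after the full timeout */
			return -ETIMEDOUT;
	} while (1);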

Signed-off-by: Adrian Hunter 
---
 drivers/mmc/core/block.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/mmc/core/block.c b/drivers/mmc/core/block.c
index c446d17b48c4..f7c387c27ac0 100644
--- a/drivers/mmc/core/block.c
+++ b/drivers/mmc/core/block.c
@@ -931,6 +931,8 @@ static int card_busy_detect(struct mmc_card *card, unsigned int timeout_ms,
u32 status;
 
do {
+   bool done = time_after(jiffies, timeout);
+
err = __mmc_send_status(card, &status, 5);
if (err) {
pr_err("%s: error %d requesting status\n",
@@ -951,7 +953,7 @@ static int card_busy_detect(struct mmc_card *card, unsigned int timeout_ms,
 * Timeout if the device never becomes ready for data and never
 * leaves the program state.
 */
-   if (time_after(jiffies, timeout)) {
+   if (done) {
pr_err("%s: Card stuck in programming state! %s %s\n",
mmc_hostname(card->host),
req->rq_disk->disk_name, __func__);
-- 
1.9.1



[PATCH V15 16/22] mmc: block: Add timeout_clks when calculating timeout

2017-11-29 Thread Adrian Hunter
According to the specification, total access time is derived from both TAAC
and NSAC, which means the timeout should add both timeout_ns and
timeout_clks. Host drivers do that, so make the block driver do that too.

Signed-off-by: Adrian Hunter 
---
 drivers/mmc/core/block.c | 42 +++---
 1 file changed, 35 insertions(+), 7 deletions(-)

diff --git a/drivers/mmc/core/block.c b/drivers/mmc/core/block.c
index 0b40fc2ebf77..46e63aec1fcb 100644
--- a/drivers/mmc/core/block.c
+++ b/drivers/mmc/core/block.c
@@ -922,6 +922,34 @@ static int mmc_sd_num_wr_blocks(struct mmc_card *card, u32 *written_blocks)
return 0;
 }
 
+static unsigned int mmc_blk_clock_khz(struct mmc_host *host)
+{
+   if (host->actual_clock)
+   return host->actual_clock / 1000;
+
+   /* Clock may be subject to a divisor, fudge it by a factor of 2. */
+   if (host->ios.clock)
+   return host->ios.clock / 2000;
+
+   /* How can there be no clock */
+   WARN_ON_ONCE(1);
+   return 100; /* 100 kHz is minimum possible value */
+}
+
+static unsigned int mmc_blk_data_timeout_ms(struct mmc_host *host,
+   struct mmc_data *data)
+{
+   unsigned int ms = DIV_ROUND_UP(data->timeout_ns, 1000000);
+   unsigned int khz;
+
+   if (data->timeout_clks) {
+   khz = mmc_blk_clock_khz(host);
+   ms += DIV_ROUND_UP(data->timeout_clks, khz);
+   }
+
+   return ms;
+}
+
 static inline bool mmc_blk_in_tran_state(u32 status)
 {
/*
@@ -1169,9 +1197,10 @@ static int mmc_blk_cmd_recovery(struct mmc_card *card, struct request *req,
 */
if (R1_CURRENT_STATE(status) == R1_STATE_DATA ||
R1_CURRENT_STATE(status) == R1_STATE_RCV) {
-		err = send_stop(card,
-				DIV_ROUND_UP(brq->data.timeout_ns, 1000000),
-				req, gen_err, &stop_status);
+   unsigned int timeout;
+
+   timeout = mmc_blk_data_timeout_ms(card->host, &brq->data);
+   err = send_stop(card, timeout, req, gen_err, _status);
if (err) {
pr_err("%s: error %d sending stop command\n",
   req->rq_disk->disk_name, err);
@@ -1977,6 +2006,7 @@ static void mmc_blk_read_single(struct mmc_queue *mq, struct request *req)
struct mmc_host *host = card->host;
blk_status_t error = BLK_STS_OK;
int retries = 0;
+   unsigned int timeout = mmc_blk_data_timeout_ms(host, mrq->data);
 
do {
u32 status;
@@ -1995,10 +2025,8 @@ static void mmc_blk_read_single(struct mmc_queue *mq, struct request *req)
u32 stop_status = 0;
bool gen_err = false;
 
-			err = send_stop(card,
-					DIV_ROUND_UP(mrq->data->timeout_ns,
-						     1000000),
-					req, &gen_err, &stop_status);
+			err = send_stop(card, timeout, req, &gen_err,
+					&stop_status);
if (err)
goto error_exit;
}
-- 
1.9.1



[PATCH V15 17/22] mmc: block: Reduce polling timeout from 10 minutes to 10 seconds

2017-11-29 Thread Adrian Hunter
Set a 10 second timeout for polling write request busy state. Note, mmc
core is setting a 3 second timeout for SD cards, and SDHCI has long had a
10 second software timer to timeout the whole request, so 10 seconds should
be ample.

Signed-off-by: Adrian Hunter 
---
 drivers/mmc/core/block.c | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/mmc/core/block.c b/drivers/mmc/core/block.c
index 46e63aec1fcb..9d323ed34f82 100644
--- a/drivers/mmc/core/block.c
+++ b/drivers/mmc/core/block.c
@@ -63,7 +63,13 @@
 #endif
 #define MODULE_PARAM_PREFIX "mmcblk."
 
-#define MMC_BLK_TIMEOUT_MS  (10 * 60 * 1000) /* 10 minute timeout */
+/*
+ * Set a 10 second timeout for polling write request busy state. Note, mmc core
+ * is setting a 3 second timeout for SD cards, and SDHCI has long had a 10
+ * second software timer to timeout the whole request, so 10 seconds should be
+ * ample.
+ */
+#define MMC_BLK_TIMEOUT_MS  (10 * 1000)
 #define MMC_SANITIZE_REQ_TIMEOUT 240000
 #define MMC_EXTRACT_INDEX_FROM_ARG(x) ((x & 0x00FF0000) >> 16)
 
-- 
1.9.1



[PATCH V15 18/22] mmc: block: blk-mq: Stop using legacy recovery

2017-11-29 Thread Adrian Hunter
There are only a few things the recovery needs to do. Primarily, it just
needs to:

    Determine the number of bytes transferred
    Get the card back to transfer state
    Determine whether to retry

There are also a couple of additional features:

    Reset the card before the last retry
    Read one sector at a time

The legacy code spent much effort analyzing command errors, but commands
fail fast, so it is simpler just to give all command errors the same number
of retries.
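
Stated as code, the retry policy reduces to roughly this (a sketch; the
example_* helpers are hypothetical, cf. mqrq->retries and MMC_MAX_RETRIES
in the diff below):

	/* Commands fail fast, so every command error shares one budget. */
	if (mqrq->retries++ < MMC_MAX_RETRIES) {
		if (mqrq->retries == MMC_MAX_RETRIES)
			example_reset_card(mq);	/* reset before last retry */
		example_requeue(req);
	} else {
		example_fail(req);
	}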

Signed-off-by: Adrian Hunter 
---
 drivers/mmc/core/block.c | 304 +--
 1 file changed, 161 insertions(+), 143 deletions(-)

diff --git a/drivers/mmc/core/block.c b/drivers/mmc/core/block.c
index 9d323ed34f82..bd7ead343500 100644
--- a/drivers/mmc/core/block.c
+++ b/drivers/mmc/core/block.c
@@ -1557,9 +1557,11 @@ static void mmc_blk_eval_resp_error(struct mmc_blk_request *brq)
}
 }
 
-static enum mmc_blk_status __mmc_blk_err_check(struct mmc_card *card,
-  struct mmc_queue_req *mq_mrq)
+static enum mmc_blk_status mmc_blk_err_check(struct mmc_card *card,
+struct mmc_async_req *areq)
 {
+   struct mmc_queue_req *mq_mrq = container_of(areq, struct mmc_queue_req,
+   areq);
struct mmc_blk_request *brq = &mq_mrq->brq;
struct request *req = mmc_queue_req_to_req(mq_mrq);
int need_retune = card->host->need_retune;
@@ -1665,15 +1667,6 @@ static enum mmc_blk_status __mmc_blk_err_check(struct mmc_card *card,
return MMC_BLK_SUCCESS;
 }
 
-static enum mmc_blk_status mmc_blk_err_check(struct mmc_card *card,
-struct mmc_async_req *areq)
-{
-   struct mmc_queue_req *mq_mrq = container_of(areq, struct mmc_queue_req,
-   areq);
-
-   return __mmc_blk_err_check(card, mq_mrq);
-}
-
 static void mmc_blk_data_prep(struct mmc_queue *mq, struct mmc_queue_req *mqrq,
  int disable_multi, bool *do_rel_wr_p,
  bool *do_data_tag_p)
@@ -1999,8 +1992,39 @@ static void mmc_blk_rw_rq_prep(struct mmc_queue_req *mqrq,
 }
 
 #define MMC_MAX_RETRIES			5
+#define MMC_DATA_RETRIES		2
 #define MMC_NO_RETRIES			(MMC_MAX_RETRIES + 1)
 
+static int mmc_blk_send_stop(struct mmc_card *card, unsigned int timeout)
+{
+   struct mmc_command cmd = {
+   .opcode = MMC_STOP_TRANSMISSION,
+   .flags = MMC_RSP_SPI_R1 | MMC_RSP_R1 | MMC_CMD_AC,
+   /* Some hosts wait for busy anyway, so provide a busy timeout */
+   .busy_timeout = timeout,
+   };
+
+   return mmc_wait_for_cmd(card->host, &cmd, 5);
+}
+
+static int mmc_blk_fix_state(struct mmc_card *card, struct request *req)
+{
+	struct mmc_queue_req *mqrq = req_to_mmc_queue_req(req);
+	struct mmc_blk_request *brq = &mqrq->brq;
+	unsigned int timeout = mmc_blk_data_timeout_ms(card->host, &brq->data);
+   int err;
+
+   mmc_retune_hold_now(card->host);
+
+   mmc_blk_send_stop(card, timeout);
+
+   err = card_busy_detect(card, timeout, false, req, NULL);
+
+   mmc_retune_release(card->host);
+
+   return err;
+}
+
 #define MMC_READ_SINGLE_RETRIES2
 
 /* Single sector read during recovery */
@@ -2012,7 +2036,6 @@ static void mmc_blk_read_single(struct mmc_queue *mq, struct request *req)
struct mmc_host *host = card->host;
blk_status_t error = BLK_STS_OK;
int retries = 0;
-   unsigned int timeout = mmc_blk_data_timeout_ms(host, mrq->data);
 
do {
u32 status;
@@ -2027,12 +2050,8 @@ static void mmc_blk_read_single(struct mmc_queue *mq, struct request *req)
goto error_exit;
 
if (!mmc_host_is_spi(host) &&
-   R1_CURRENT_STATE(status) != R1_STATE_TRAN) {
-   u32 stop_status = 0;
-   bool gen_err = false;
-
-			err = send_stop(card, timeout, req, &gen_err,
-					&stop_status);
+   !mmc_blk_in_tran_state(status)) {
+   err = mmc_blk_fix_state(card, req);
if (err)
goto error_exit;
}
@@ -2062,6 +2081,60 @@ static void mmc_blk_read_single(struct mmc_queue *mq, struct request *req)
mqrq->retries = MMC_MAX_RETRIES - 1;
 }
 
+static inline bool mmc_blk_oor_valid(struct mmc_blk_request *brq)
+{
+   return !!brq->mrq.sbc;
+}
+
+static inline u32 mmc_blk_stop_err_bits(struct mmc_blk_request *brq)
+{
+   return mmc_blk_oor_valid(brq) ? CMD_ERRORS : CMD_ERRORS_EXCL_OOR;
+}
+
+/*
+ * Check for errors the host controller driver might not have seen such as
+ * response mode errors or invalid card state.

[PATCH V15 19/22] mmc: mmc_test: Do not use mmc_start_areq() anymore

2017-11-29 Thread Adrian Hunter
The block driver's blk-mq paths do not use mmc_start_areq(). In order to
remove mmc_start_areq() entirely, start by removing it from mmc_test.

Signed-off-by: Adrian Hunter 
---
 drivers/mmc/core/mmc_test.c | 122 
 1 file changed, 54 insertions(+), 68 deletions(-)

diff --git a/drivers/mmc/core/mmc_test.c b/drivers/mmc/core/mmc_test.c
index 478869805b96..9311c8de2061 100644
--- a/drivers/mmc/core/mmc_test.c
+++ b/drivers/mmc/core/mmc_test.c
@@ -171,11 +171,6 @@ struct mmc_test_multiple_rw {
enum mmc_test_prep_media prepare;
 };
 
-struct mmc_test_async_req {
-   struct mmc_async_req areq;
-   struct mmc_test_card *test;
-};
-
 /***/
 /*  General helper functions   */
 /***/
@@ -741,30 +736,6 @@ static int mmc_test_check_result(struct mmc_test_card *test,
return ret;
 }
 
-static enum mmc_blk_status mmc_test_check_result_async(struct mmc_card *card,
-  struct mmc_async_req *areq)
-{
-   struct mmc_test_async_req *test_async =
-   container_of(areq, struct mmc_test_async_req, areq);
-   int ret;
-
-   mmc_test_wait_busy(test_async->test);
-
-   /*
-* FIXME: this would earlier just casts a regular error code,
-* either of the kernel type -ERRORCODE or the local test framework
-* RESULT_* errorcode, into an enum mmc_blk_status and return as
-* result check. Instead, convert it to some reasonable type by just
-* returning either MMC_BLK_SUCCESS or MMC_BLK_CMD_ERR.
-* If possible, a reasonable error code should be returned.
-*/
-   ret = mmc_test_check_result(test_async->test, areq->mrq);
-   if (ret)
-   return MMC_BLK_CMD_ERR;
-
-   return MMC_BLK_SUCCESS;
-}
-
 /*
  * Checks that a "short transfer" behaved as expected
  */
@@ -831,6 +802,45 @@ static struct mmc_test_req *mmc_test_req_alloc(void)
return rq;
 }
 
+static void mmc_test_wait_done(struct mmc_request *mrq)
+{
+   complete(&mrq->completion);
+}
+
+static int mmc_test_start_areq(struct mmc_test_card *test,
+  struct mmc_request *mrq,
+  struct mmc_request *prev_mrq)
+{
+   struct mmc_host *host = test->card->host;
+   int err = 0;
+
+   if (mrq) {
+   init_completion(&mrq->completion);
+   mrq->done = mmc_test_wait_done;
+   mmc_pre_req(host, mrq);
+   }
+
+   if (prev_mrq) {
+   wait_for_completion(&prev_mrq->completion);
+   err = mmc_test_wait_busy(test);
+   if (!err)
+   err = mmc_test_check_result(test, prev_mrq);
+   }
+
+   if (!err && mrq) {
+   err = mmc_start_request(host, mrq);
+   if (err)
+   mmc_retune_release(host);
+   }
+
+   if (prev_mrq)
+   mmc_post_req(host, prev_mrq, 0);
+
+   if (err && mrq)
+   mmc_post_req(host, mrq, err);
+
+   return err;
+}
 
 static int mmc_test_nonblock_transfer(struct mmc_test_card *test,
  struct scatterlist *sg, unsigned sg_len,
@@ -838,17 +848,10 @@ static int mmc_test_nonblock_transfer(struct mmc_test_card *test,
  unsigned blksz, int write, int count)
 {
struct mmc_test_req *rq1, *rq2;
-   struct mmc_test_async_req test_areq[2];
-   struct mmc_async_req *done_areq;
-	struct mmc_async_req *cur_areq = &test_areq[0].areq;
-	struct mmc_async_req *other_areq = &test_areq[1].areq;
-   enum mmc_blk_status status;
+   struct mmc_request *mrq, *prev_mrq;
int i;
int ret = RESULT_OK;
 
-   test_areq[0].test = test;
-   test_areq[1].test = test;
-
rq1 = mmc_test_req_alloc();
rq2 = mmc_test_req_alloc();
if (!rq1 || !rq2) {
@@ -856,33 +859,25 @@ static int mmc_test_nonblock_transfer(struct mmc_test_card *test,
goto err;
}
 
-	cur_areq->mrq = &rq1->mrq;
-	cur_areq->err_check = mmc_test_check_result_async;
-	other_areq->mrq = &rq2->mrq;
-	other_areq->err_check = mmc_test_check_result_async;
+	mrq = &rq1->mrq;
+   prev_mrq = NULL;
 
for (i = 0; i < count; i++) {
-		mmc_test_prepare_mrq(test, cur_areq->mrq, sg, sg_len, dev_addr,
-				     blocks, blksz, write);
-		done_areq = mmc_start_areq(test->card->host, cur_areq, &status);
-
-   if (status != MMC_BLK_SUCCESS || (!done_areq && i > 0)) {
-   ret = RESULT_FAIL;
+   mmc_test_req_reset(container_of(mrq, struct mmc_test_req, mrq));
+   mmc_test_prepare_mrq(test, mrq, sg, sg_len, dev_addr, blocks,
+   

[PATCH V15 20/22] mmc: core: Remove option not to use blk-mq

2017-11-29 Thread Adrian Hunter
Remove config option MMC_MQ_DEFAULT and parameter mmc_use_blk_mq, so that
blk-mq must be used always.

Signed-off-by: Adrian Hunter 
---
 drivers/mmc/Kconfig | 10 --
 drivers/mmc/core/core.c |  7 ---
 drivers/mmc/core/core.h |  2 --
 drivers/mmc/core/host.c |  2 --
 drivers/mmc/core/host.h |  2 +-
 5 files changed, 1 insertion(+), 22 deletions(-)

diff --git a/drivers/mmc/Kconfig b/drivers/mmc/Kconfig
index 42565562577c..ec21388311db 100644
--- a/drivers/mmc/Kconfig
+++ b/drivers/mmc/Kconfig
@@ -12,16 +12,6 @@ menuconfig MMC
  If you want MMC/SD/SDIO support, you should say Y here and
  also to your specific host controller driver.
 
-config MMC_MQ_DEFAULT
-   bool "MMC: use blk-mq I/O path by default"
-   depends on MMC && BLOCK
-   default y
-   ---help---
- This option enables the new blk-mq based I/O path for MMC block
- devices by default.  With the option the mmc_core.use_blk_mq
- module/boot option defaults to Y, without it to N, but it can
- still be overridden either way.
-
 if MMC
 
 source "drivers/mmc/core/Kconfig"
diff --git a/drivers/mmc/core/core.c b/drivers/mmc/core/core.c
index 617802f45386..7ca6e4866a8b 100644
--- a/drivers/mmc/core/core.c
+++ b/drivers/mmc/core/core.c
@@ -66,13 +66,6 @@
 bool use_spi_crc = 1;
 module_param(use_spi_crc, bool, 0);
 
-#ifdef CONFIG_MMC_MQ_DEFAULT
-bool mmc_use_blk_mq = true;
-#else
-bool mmc_use_blk_mq = false;
-#endif
-module_param_named(use_blk_mq, mmc_use_blk_mq, bool, S_IWUSR | S_IRUGO);
-
 static int mmc_schedule_delayed_work(struct delayed_work *work,
 unsigned long delay)
 {
diff --git a/drivers/mmc/core/core.h b/drivers/mmc/core/core.h
index 136617d2f971..3e3d21304e5f 100644
--- a/drivers/mmc/core/core.h
+++ b/drivers/mmc/core/core.h
@@ -35,8 +35,6 @@ struct mmc_bus_ops {
int (*reset)(struct mmc_host *);
 };
 
-extern bool mmc_use_blk_mq;
-
 void mmc_attach_bus(struct mmc_host *host, const struct mmc_bus_ops *ops);
 void mmc_detach_bus(struct mmc_host *host);
 
diff --git a/drivers/mmc/core/host.c b/drivers/mmc/core/host.c
index 409a68a96a0a..64b03d6eaf18 100644
--- a/drivers/mmc/core/host.c
+++ b/drivers/mmc/core/host.c
@@ -404,8 +404,6 @@ struct mmc_host *mmc_alloc_host(int extra, struct device *dev)
 
host->fixed_drv_type = -EINVAL;
 
-   host->use_blk_mq = mmc_use_blk_mq;
-
return host;
 }
 
diff --git a/drivers/mmc/core/host.h b/drivers/mmc/core/host.h
index 8ca284e079e3..6d896869e5c6 100644
--- a/drivers/mmc/core/host.h
+++ b/drivers/mmc/core/host.h
@@ -81,7 +81,7 @@ static inline bool mmc_card_hs400es(struct mmc_card *card)
 
 static inline bool mmc_host_use_blk_mq(struct mmc_host *host)
 {
-   return host->use_blk_mq;
+   return true;
 }
 
 #endif
-- 
1.9.1



[PATCH V15 21/22] mmc: block: Remove code no longer needed after the switch to blk-mq

2017-11-29 Thread Adrian Hunter
Remove code no longer needed after the switch to blk-mq.

Signed-off-by: Adrian Hunter 
---
 drivers/mmc/core/block.c | 723 +--
 drivers/mmc/core/block.h |   2 -
 drivers/mmc/core/queue.c | 240 +---
 drivers/mmc/core/queue.h |  15 -
 4 files changed, 16 insertions(+), 964 deletions(-)

diff --git a/drivers/mmc/core/block.c b/drivers/mmc/core/block.c
index bd7ead343500..a1fca9748898 100644
--- a/drivers/mmc/core/block.c
+++ b/drivers/mmc/core/block.c
@@ -967,8 +967,7 @@ static inline bool mmc_blk_in_tran_state(u32 status)
 }
 
 static int card_busy_detect(struct mmc_card *card, unsigned int timeout_ms,
-   bool hw_busy_detect, struct request *req,
-   u32 *resp_errs)
+   struct request *req, u32 *resp_errs)
 {
unsigned long timeout = jiffies + msecs_to_jiffies(timeout_ms);
int err = 0;
@@ -988,11 +987,6 @@ static int card_busy_detect(struct mmc_card *card, unsigned int timeout_ms,
if (resp_errs)
*resp_errs |= status;
 
-   /* We may rely on the host hw to handle busy detection.*/
-   if ((card->host->caps & MMC_CAP_WAIT_WHILE_BUSY) &&
-   hw_busy_detect)
-   break;
-
/*
 * Timeout if the device never becomes ready for data and never
 * leaves the program state.
@@ -1014,243 +1008,6 @@ static int card_busy_detect(struct mmc_card *card, unsigned int timeout_ms,
return err;
 }
 
-static int card_busy_detect_err(struct mmc_card *card, unsigned int timeout_ms,
-   bool hw_busy_detect, struct request *req,
-   bool *gen_err)
-{
-   u32 resp_errs = 0;
-   int err;
-
-	err = card_busy_detect(card, timeout_ms, hw_busy_detect, req,
-			       &resp_errs);
-   if (resp_errs & R1_ERROR) {
-   pr_err("%s: %s: error sending status cmd, status %#x\n",
-  req->rq_disk->disk_name, __func__, resp_errs);
-   *gen_err = true;
-   }
-
-   return err;
-}
-
-static int send_stop(struct mmc_card *card, unsigned int timeout_ms,
-   struct request *req, bool *gen_err, u32 *stop_status)
-{
-   struct mmc_host *host = card->host;
-   struct mmc_command cmd = {};
-   int err;
-   bool use_r1b_resp = rq_data_dir(req) == WRITE;
-
-   /*
-* Normally we use R1B responses for WRITE, but in cases where the host
-* has specified a max_busy_timeout we need to validate it. A failure
-* means we need to prevent the host from doing hw busy detection, which
-* is done by converting to a R1 response instead.
-*/
-   if (host->max_busy_timeout && (timeout_ms > host->max_busy_timeout))
-   use_r1b_resp = false;
-
-   cmd.opcode = MMC_STOP_TRANSMISSION;
-   if (use_r1b_resp) {
-   cmd.flags = MMC_RSP_SPI_R1B | MMC_RSP_R1B | MMC_CMD_AC;
-   cmd.busy_timeout = timeout_ms;
-   } else {
-   cmd.flags = MMC_RSP_SPI_R1 | MMC_RSP_R1 | MMC_CMD_AC;
-   }
-
-	err = mmc_wait_for_cmd(host, &cmd, 5);
-   if (err)
-   return err;
-
-   *stop_status = cmd.resp[0];
-
-   /* No need to check card status in case of READ. */
-   if (rq_data_dir(req) == READ)
-   return 0;
-
-   if (!mmc_host_is_spi(host) &&
-   (*stop_status & R1_ERROR)) {
-   pr_err("%s: %s: general error sending stop command, resp %#x\n",
-   req->rq_disk->disk_name, __func__, *stop_status);
-   *gen_err = true;
-   }
-
-   return card_busy_detect_err(card, timeout_ms, use_r1b_resp, req,
-   gen_err);
-}
-
-#define ERR_NOMEDIUM   3
-#define ERR_RETRY  2
-#define ERR_ABORT  1
-#define ERR_CONTINUE   0
-
-static int mmc_blk_cmd_error(struct request *req, const char *name, int error,
-   bool status_valid, u32 status)
-{
-   switch (error) {
-   case -EILSEQ:
-   /* response crc error, retry the r/w cmd */
-   pr_err("%s: %s sending %s command, card status %#x\n",
-   req->rq_disk->disk_name, "response CRC error",
-   name, status);
-   return ERR_RETRY;
-
-   case -ETIMEDOUT:
-   pr_err("%s: %s sending %s command, card status %#x\n",
-   req->rq_disk->disk_name, "timed out", name, status);
-
-   /* If the status cmd initially failed, retry the r/w cmd */
-   if (!status_valid) {
-   pr_err("%s: status not valid, retrying timeout\n",
-   req->rq_disk->disk_name);
-   return ERR_RETRY;
-   }
-
-   /*
-* If 

[PATCH V15 22/22] mmc: core: Remove code no longer needed after the switch to blk-mq

2017-11-29 Thread Adrian Hunter
Remove code no longer needed after the switch to blk-mq.

Signed-off-by: Adrian Hunter 
---
 drivers/mmc/core/bus.c   |   2 -
 drivers/mmc/core/core.c  | 185 +--
 drivers/mmc/core/core.h  |   8 --
 drivers/mmc/core/host.h  |   5 --
 include/linux/mmc/host.h |   3 -
 5 files changed, 1 insertion(+), 202 deletions(-)

diff --git a/drivers/mmc/core/bus.c b/drivers/mmc/core/bus.c
index 7586ff2ad1f1..fc92c6c1c9a4 100644
--- a/drivers/mmc/core/bus.c
+++ b/drivers/mmc/core/bus.c
@@ -351,8 +351,6 @@ int mmc_add_card(struct mmc_card *card)
 #ifdef CONFIG_DEBUG_FS
mmc_add_card_debugfs(card);
 #endif
-   mmc_init_context_info(card->host);
-
card->dev.of_node = mmc_of_find_child_device(card->host, 0);
 
device_enable_async_suspend(>dev);
diff --git a/drivers/mmc/core/core.c b/drivers/mmc/core/core.c
index 7ca6e4866a8b..e5c8727c16ad 100644
--- a/drivers/mmc/core/core.c
+++ b/drivers/mmc/core/core.c
@@ -361,20 +361,6 @@ int mmc_start_request(struct mmc_host *host, struct mmc_request *mrq)
 }
 EXPORT_SYMBOL(mmc_start_request);
 
-/*
- * mmc_wait_data_done() - done callback for data request
- * @mrq: done data request
- *
- * Wakes up mmc context, passed as a callback to host controller driver
- */
-static void mmc_wait_data_done(struct mmc_request *mrq)
-{
-   struct mmc_context_info *context_info = &mrq->host->context_info;
-
-   context_info->is_done_rcv = true;
-   wake_up_interruptible(&context_info->wait);
-}
-
 static void mmc_wait_done(struct mmc_request *mrq)
 {
complete(&mrq->completion);
@@ -392,37 +378,6 @@ static inline void mmc_wait_ongoing_tfr_cmd(struct mmc_host *host)
wait_for_completion(&ongoing_mrq->cmd_completion);
 }
 
-/*
- *__mmc_start_data_req() - starts data request
- * @host: MMC host to start the request
- * @mrq: data request to start
- *
- * Sets the done callback to be called when request is completed by the card.
- * Starts data mmc request execution
- * If an ongoing transfer is already in progress, wait for the command line
- * to become available before sending another command.
- */
-static int __mmc_start_data_req(struct mmc_host *host, struct mmc_request *mrq)
-{
-   int err;
-
-   mmc_wait_ongoing_tfr_cmd(host);
-
-   mrq->done = mmc_wait_data_done;
-   mrq->host = host;
-
-   init_completion(>cmd_completion);
-
-   err = mmc_start_request(host, mrq);
-   if (err) {
-   mrq->cmd->error = err;
-   mmc_complete_cmd(mrq);
-   mmc_wait_data_done(mrq);
-   }
-
-   return err;
-}
-
 static int __mmc_start_req(struct mmc_host *host, struct mmc_request *mrq)
 {
int err;
@@ -650,133 +605,11 @@ int mmc_cqe_recovery(struct mmc_host *host)
  */
 bool mmc_is_req_done(struct mmc_host *host, struct mmc_request *mrq)
 {
-   if (host->areq)
-   return host->context_info.is_done_rcv;
-   else
-   return completion_done(>completion);
+   return completion_done(>completion);
 }
 EXPORT_SYMBOL(mmc_is_req_done);
 
 /**
- * mmc_finalize_areq() - finalize an asynchronous request
- * @host: MMC host to finalize any ongoing request on
- *
- * Returns the status of the ongoing asynchronous request, but
- * MMC_BLK_SUCCESS if no request was going on.
- */
-static enum mmc_blk_status mmc_finalize_areq(struct mmc_host *host)
-{
-   struct mmc_context_info *context_info = >context_info;
-   enum mmc_blk_status status;
-
-   if (!host->areq)
-   return MMC_BLK_SUCCESS;
-
-   while (1) {
-   wait_event_interruptible(context_info->wait,
-   (context_info->is_done_rcv ||
-context_info->is_new_req));
-
-   if (context_info->is_done_rcv) {
-   struct mmc_command *cmd;
-
-   context_info->is_done_rcv = false;
-   cmd = host->areq->mrq->cmd;
-
-   if (!cmd->error || !cmd->retries ||
-   mmc_card_removed(host->card)) {
-   status = host->areq->err_check(host->card,
-  host->areq);
-   break; /* return status */
-   } else {
-   mmc_retune_recheck(host);
-			pr_info("%s: req failed (CMD%u): %d, retrying...\n",
-   mmc_hostname(host),
-   cmd->opcode, cmd->error);
-   cmd->retries--;
-   cmd->error = 0;
-   __mmc_start_request(host, host->areq->mrq);
-   continue; /* wait for done/new event again */
-   }
-   }
-
-   return MMC_BLK_NEW_REQUEST;
-   }
-
-   

Re: [PATCH 00/12 v5] Multiqueue for MMC/SD

2017-11-29 Thread Linus Walleij
On Wed, Nov 15, 2017 at 2:50 PM, Adrian Hunter  wrote:
> On 14/11/17 23:17, Linus Walleij wrote:
>> We have the following risk factors:
>>
>> - Observed performance degradation of 1% (on x86 SDHI I guess)
>> - The kernel crashes if SD card is removed (both patch sets)
>
> I haven't been able to reproduce that.  Do you have more information?

I saw it in an earlier version of the patch set, but it might be due to
some confusion on my side.

I will try to get this series going and stress it a bit and see what happens.

Yours,
Linus Walleij