On 05/10/2017 01:12 AM, David Sterba wrote:
On Sat, May 06, 2017 at 07:17:54AM +0800, Anand Jain wrote:
This adds comments to the flush error handling part of
the code, and hopes to maintain the same logic with a
framework which can be used to handle the errors at the
volume level.

Signed-off-by: Anand Jain <anand.j...@oracle.com>

Reviewed-by: David Sterba <dste...@suse.com>

 Thanks.

So the next step is to preallocate the flush_bio and remove the ENOMEM
handling.

 one or the other way ENOMEM is going to go away. Thinking to fix
 blkdev_issue_flush(), which IMO is the right way, its RFC is sent out.


submit_bio can fail, the error code will be stored to the bio and this
will be checked in the waiting side.

Then write_dev_flush can be split into 2 functions by the parameter
'wait', as the paths are completely independent.

 The new KPI blkdev_issue_flush_no_wait() makes it much simpler,
 I have sent out the btrfs part of the RFC patch as well. Thanks.

+static int check_barrier_error(struct btrfs_fs_devices *fsdevs)
+{
+       int submit_flush_error = 0;
+       int dev_flush_error = 0;
+       struct btrfs_device *dev;
+       int tolerance;
+
+       list_for_each_entry_rcu(dev, &fsdevs->devices, dev_list) {
+               if (!dev->bdev) {
+                       submit_flush_error++;
+                       dev_flush_error++;
+                       continue;
+               }
+               if (dev->last_flush_error == -ENOMEM)
+                       submit_flush_error++;
+               if (dev->last_flush_error && dev->last_flush_error != -ENOMEM)
+                       dev_flush_error++;
+       }
+
+       tolerance = fsdevs->fs_info->num_tolerated_disk_barrier_failures;
+       if (submit_flush_error > tolerance || dev_flush_error > tolerance)
+               return -EIO;

Actually, after reading submit_bio and the comment above, do we need to
care about the submission errors? As submit_bio could return -EIO
immeditaelly, but does not help us anyway. We'll wait and check again
later.  It's the bio completion what counts.

This should simplify the code.

 Yeah. we don;t have to worry about the send part of the error, when
 all the error that may encounter is all related to the device itself.

 There are RFC patches for the device state: offline and failed, now
 depending on whether the flush returned EIO or transport error ENXIO
 I intend to choice the device state such as of failed or offline
 respectively.

Thanks, Anand
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Reply via email to