Steven,

I think there is probably another way we can solve this problem, but I first want to get a better understanding of the corruption. We have not integrated the TRIM support upstream, and I suspect that's the source of most of the problems. Can you confirm that with TRIM disabled most of the corruption you've seen does not occur? I'm trying to get context here, since we've not seen this type of failure elsewhere.

Since I'm not familiar with the TRIM implementation in FreeBSD, I was wondering if you could explain the scenario that leads to the corruption. The fact that the pipeline doesn't allow the zio to change mid-pipeline is intentional, so I don't think we want to make a change that allows this to occur. From code inspection it does look like the vdev_*_io_start() routines should never return ZIO_PIPELINE_CONTINUE; I will look at this more closely, but it looks like there is a bug there.

Anyway, if you could give me more details about the corruption, I would be happy to help you design a way this can be implemented while still ensuring that a zio cannot change while the pipeline is active. Thanks for diving into this; I will post more about the bugs that appear to exist in the vdev_*_io_start() routines.

Thanks,
George

On 4/25/14, 8:53 PM, Steven Hartland wrote:
I've been working on adding IO priority support for TRIM back into
FreeBSD after the import of the new IO scheduling from illumos.

Based on avg's initial work, and having got my head around the
requirements of the new scheduler, I came up with the attached
zz-zfs-trim-priority.patch.

Most of the time this worked fine, but as soon as bio_delete requests
were disabled using the following, I started getting panics:
sysctl vfs.zfs.vdev.bio_delete_disable=1

A simple dd is enough to trigger the panic e.g.
dd if=/dev/zero of=/data/random.dd bs=1m count=10240

The wide selection of panics all seemed to indicate queue corruption,
with the main one failing in vdev_queue_io_to_issue on the line:
zio = avl_first(&vqc->vqc_queued_tree);

After a day of debugging and adding lots of additional validation
checks, it became apparent that after removing a zio from
vq_active_tree, both vq_active_tree and the associated vqc_queued_tree
became corrupt.

By corrupt I mean that avl_numnodes is no longer in sync with a manual
count of the nodes using a tree walk.

In each case the vq_active_tree.avl_numnodes is one less than its actual
number of nodes and vqc_queued_tree.avl_numnodes is one greater than
its actual number of nodes.
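
For illustration, a minimal sketch of that kind of consistency check,
using only the standard avl interfaces (the function name is made up
for the example; my actual debug checks were more ad hoc):

/*
 * Illustrative check: walk the tree and compare the counted nodes
 * against avl_numnodes().
 */
static void
vdev_queue_verify_tree(avl_tree_t *tree, const char *name)
{
    void *node;
    ulong_t walked = 0;

    for (node = avl_first(tree); node != NULL;
        node = AVL_NEXT(tree, node))
        walked++;

    if (walked != avl_numnodes(tree))
        panic("%s: avl_numnodes %lu != walked %lu",
            name, avl_numnodes(tree), walked);
}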

After adding queue tracking to zios, it turned out that
vdev_queue_pending_remove was trying to remove a zio from
vq_active_tree which wasn't in that tree but was still in the write
vqc_queued_tree.

As avl_remove doesn't do any validation that the node is actually
present in the tree, it merrily tried to remove it anyway, resulting
in nastiness in both trees.
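
To illustrate the missing validation, a minimal sketch of a checked
removal, assuming only avl_find() and avl_remove() (the wrapper name
is hypothetical, not something from the patch):

/*
 * Illustrative only: avl_remove() never verifies membership, so look
 * the node up with avl_find() first and fail loudly on a mismatch.
 */
static void
vdev_queue_checked_remove(avl_tree_t *tree, zio_t *zio)
{
    if (avl_find(tree, zio, NULL) != zio)
        panic("zio %p is not in tree %p", zio, tree);

    avl_remove(tree, zio);
}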

The cause of this is in zio_vdev_io_start, specifically:
if ((zio = vdev_queue_io(zio)) == NULL)
    return (ZIO_PIPELINE_STOP);

This can result in a different zio reaching:
return (vd->vdev_ops->vdev_op_io_start(zio));

When this happens and vdev_op_io_start returns ZIO_PIPELINE_CONTINUE,
e.g. for TRIM requests when bio_delete_disable=1 is set, the calling
zio_execute continues the pipeline for the zio it called
zio_vdev_io_start with; but that zio hasn't been processed, and hence
isn't in vq_active_tree but in one of the vqc_queued_trees.
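
To make that concrete, here is a simplified paraphrase of the tail end
of zio_vdev_io_start (not the verbatim source, just the shape of the
problem as I read it):

/* Simplified paraphrase of zio_vdev_io_start(), not the real code. */
static int
zio_vdev_io_start(zio_t *zio)
{
    vdev_t *vd = zio->io_vd;

    /*
     * vdev_queue_io() may queue the zio it was handed and return a
     * different zio to issue now, so from here on the local 'zio'
     * need not be the zio that zio_execute() is driving.
     */
    if ((zio = vdev_queue_io(zio)) == NULL)
        return (ZIO_PIPELINE_STOP);

    /*
     * If this returns ZIO_PIPELINE_CONTINUE (e.g. a TRIM zio with
     * bio_delete_disable=1 set), zio_execute() resumes the pipeline
     * for its original zio, which is still sitting in a
     * vqc_queued_tree rather than in vq_active_tree.
     */
    return (vd->vdev_ops->vdev_op_io_start(zio));
}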

It's not clear whether any other paths can have their vdev_op_io_start
return ZIO_PIPELINE_CONTINUE, but it definitely looks that way, and
may well explain other panics I've seen in this area when, for
example, disks dropped out.

I'm unsure if there's a more elegant fix, but allowing pipeline stages
to change the zio being processed, by passing in a zio_t **ziop
instead of a zio_t *zio as in the attached zfs-zio-queue-reorder.patch,
fixes the issue.
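
The shape of that change is roughly the following (a sketch of the
idea only; the actual diff is in the attached patch):

/* Sketch of the zio_t **ziop idea; see zfs-zio-queue-reorder.patch. */
static int
zio_vdev_io_start(zio_t **ziop)
{
    zio_t *zio = *ziop;
    vdev_t *vd = zio->io_vd;

    if ((zio = vdev_queue_io(zio)) == NULL)
        return (ZIO_PIPELINE_STOP);

    /*
     * Publish the substituted zio back to the caller so that
     * zio_execute() continues the pipeline for the zio actually
     * handed to vdev_op_io_start(), not the one it started with.
     */
    *ziop = zio;

    return (vd->vdev_ops->vdev_op_io_start(zio));
}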

Note: the patches are based on FreeBSD 10-RELEASE plus some backports
from 10-STABLE (mainly r260763: 4045 zfs write throttle & i/o
scheduler), so they should apply to 10-STABLE and 11-CURRENT.

   Regards
   Steve


_______________________________________________
developer mailing list
developer@open-zfs.org
http://lists.open-zfs.org/mailman/listinfo/developer
