Hi, here's how I managed to recover from a BTRFS replace panic that happened even on kernel 4.8.4.
The kernel didn't seem to handle our raid10 filesystem with a missing device correctly, even though it passed a precautionary scrub before we removed the device:
- replace didn't work and triggered a kernel panic,
- we saw PostgreSQL corruption (duplicate entries in indexes and write errors), both for database clusters using NoCoW and for those using CoW (we run several clusters on this filesystem and configure them differently based on our needs).

What finally worked was adding devices to the filesystem, then balancing, which moved the data allocated to the missing device elsewhere, and then deleting the missing device. I added skip_balance to the fstab mount options in case the balance triggered a panic like replace did. I didn't dare delete without balancing first, because I couldn't get confirmation that skip_balance would also stop the balance triggered by delete. If it didn't, we could have hit a panic each time we tried to mount the filesystem. In the end, balancing before deleting seems to do the same work: balance correctly detects that it shouldn't use the missing device and reallocates all data properly.

The sad result is that we are currently forced to check/restore most of the data just because we had to replace a single disk. Clearly BTRFS doesn't handle this situation properly until the missing device is completely removed. That's not what I expected to have to do when using raid10 :-(

Best regards,
Lionel
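For reference, here is a minimal dry-run sketch of the order of operations described above: add devices, balance, then delete the missing device. It only prints the btrfs commands and does not run them. The mount point /mnt/data and the device names /dev/sdx and /dev/sdy are hypothetical placeholders. The sketch also assumes the filesystem is already mounted with the degraded and skip_balance options. The degraded option is an assumption and is not stated in the post.

```python
import shlex

# Hypothetical placeholders for illustration only; adjust to the real system.
MOUNT_POINT = "/mnt/data"
NEW_DEVICES = ["/dev/sdx", "/dev/sdy"]


def build_recovery_plan(mount_point, new_devices):
    """Return the recovery steps as argv lists, in the order described:
    add devices, balance, then delete the missing device.

    Assumes the filesystem is already mounted (e.g. with
    degraded,skip_balance); that mount step is not generated here.
    """
    plan = []
    # 1. Add the new devices to the mounted, degraded filesystem.
    for dev in new_devices:
        plan.append(["btrfs", "device", "add", dev, mount_point])
    # 2. Balance so that chunks which lived on the missing device are
    #    reallocated onto the remaining and newly added devices.
    plan.append(["btrfs", "balance", "start", mount_point])
    # 3. Only after the balance, drop the missing device from the filesystem.
    plan.append(["btrfs", "device", "delete", "missing", mount_point])
    return plan


if __name__ == "__main__":
    # Dry run: print each command instead of executing it.
    for argv in build_recovery_plan(MOUNT_POINT, NEW_DEVICES):
        print(shlex.join(argv))
```

Running it with Python 3.8 or later, which `shlex.join` requires, prints one command per line. The delete step comes last on purpose, so that the balance has already moved all data off the missing device before it is removed.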