On Thu, Dec 08, 2016 at 11:47:41AM +0100, Jan Kara wrote:
> On Wed 07-12-16 17:15:42, Chris Mason wrote:
> > On 12/07/2016 04:45 PM, Liu Bo wrote:
> > >This has implemented DAX support for btrfs with nocow and single-device.
> > >
> > >DAX is developed for block devices that are memory-like in order to avoid
> > >double buffering in both the page cache and the storage, so DAX can perform
> > >reads and writes directly to the storage device, and for those who prefer
> > >using a filesystem, filesystem DAX support can help to map the storage into
> > >userspace for file-mapping.
> > >
> > >Since I haven't figured out how to map multiple devices to userspace without
> > >pagecache, this DAX support is only for single-device, and since I don't
> > >think DAX (Direct Access) can work with cow, this is limited to the nocow
> > >case.  I did this by setting nodatacow in the dax mount option.
> > 
> > Interesting, this is a nice small start.  It might make more sense to limit
> > snapshots to readonly in DAX mode until we can figure out how to cow
> > properly.  I think it can be done, I just need to sit down with the dax code
> > to do a good review.
> > 
> > But bigger picture, if we can't cow and we can't crc and we can't
> > multi-device, I'd rather let XFS/ext4 sort out the dax space until we pull
> > in more of the btrfs features too.
> 
> So normal DAX IO (via read(2) and write(2)) is very similar to direct IO so
> I don't think there would be any obstacle to support all the features with
> that.

For DAX IO via read(2)/write(2), cow is OK, but multiple devices are a
problem, as iomap_dax_actor currently takes only one <device, blocknum>
pair:

- raid 0: one device is written at a time
- raid 1/10 and others: 2 or more devices need to be written each time
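
To make the difference concrete, here is a rough userspace sketch of the
mapping step, not kernel code. All names in it (struct dev_block, enum
profile, map_write_targets) are hypothetical, and the model is simplified:
stripes are one block wide and "raid 1" mirrors to every device. It only
shows that striping yields a single <device, blocknum> pair per block while
mirroring yields several, which a single-pair interface cannot express.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model type, not a kernel structure. */
struct dev_block {
	int dev;		/* index of the target device */
	uint64_t block;		/* block number on that device */
};

/* Simplified profiles; real btrfs profiles have more detail. */
enum profile { PROFILE_SINGLE, PROFILE_RAID0, PROFILE_RAID1 };

/*
 * Map one logical block to the <device, block> pairs a write must hit.
 * Returns the number of pairs stored in out[].
 */
size_t map_write_targets(enum profile p, int ndev, uint64_t logical,
			 struct dev_block *out, size_t out_len)
{
	assert(ndev > 0 && out_len >= (size_t)ndev);

	switch (p) {
	case PROFILE_SINGLE:
		out[0] = (struct dev_block){ .dev = 0, .block = logical };
		return 1;
	case PROFILE_RAID0:
		/* Striping (one-block stripes): exactly one device per block. */
		out[0] = (struct dev_block){
			.dev = (int)(logical % (uint64_t)ndev),
			.block = logical / (uint64_t)ndev,
		};
		return 1;
	case PROFILE_RAID1:
		/* Mirroring (simplified): every device receives a copy. */
		for (int i = 0; i < ndev; i++)
			out[i] = (struct dev_block){ .dev = i, .block = logical };
		return (size_t)ndev;
	}
	return 0;
}
```

With two devices, a raid 0 write in this model returns one pair and a raid 1
write returns two, so only the former fits an interface that carries a single
pair.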

> For mmap(2) things get more difficult but still: The filesystem gets
> normal ->fault notifications when the page is first faulted in. So you
> can COW if you need to at that moment.

Right.

> Also DAX PTEs can be write-protected (well, as of the coming merge
> window) as normal PTEs and then you'll get ->pfn_mkwrite /
> ->page_mkwrite notification when someone tries to write via mmap and
> you can do your stuff at that point.

That's right, but I think the problem comes from the fact that only
->fault with FAULT_FLAG_WRITE reaches space allocation, where we could
cow to a new location.
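
As a toy illustration of cow at write-fault time (a userspace model under
my own assumptions, not the btrfs or DAX implementation), the sketch below
uses hypothetical names (struct toy_dev, struct toy_file, toy_fault) and a
trivial bump allocator. A read fault, or a write fault on an exclusively
owned block, maps the block in place; a write fault on a shared block
allocates a new block, copies the data, and remaps.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define BLOCK_SIZE 16
#define NBLOCKS 8

/* Toy "storage": hypothetical stand-in for a memory-like device. */
struct toy_dev {
	unsigned char data[NBLOCKS][BLOCK_SIZE];
	int refs[NBLOCKS];	/* how many owners reference each block */
	int next_free;		/* trivial bump allocator */
};

/* One-block "file" that maps to a single physical block. */
struct toy_file {
	int phys;
};

/*
 * Handle a fault on the file's block. Returns the physical block the
 * mapping should point to afterwards, or -1 if no space is left.
 */
int toy_fault(struct toy_dev *dev, struct toy_file *f, bool write)
{
	assert(f->phys >= 0 && f->phys < NBLOCKS);

	/* Read fault, or write to an exclusive block: map in place. */
	if (!write || dev->refs[f->phys] <= 1)
		return f->phys;

	if (dev->next_free >= NBLOCKS)
		return -1;

	/* Write fault on a shared block: allocate, copy, remap. */
	int new_phys = dev->next_free++;
	memcpy(dev->data[new_phys], dev->data[f->phys], BLOCK_SIZE);
	dev->refs[f->phys]--;
	dev->refs[new_phys] = 1;
	f->phys = new_phys;
	return new_phys;
}
```

The point of the model is only that allocation and remapping happen inside
the write-fault path; there is no later writeback step where the new
location could be chosen.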

For page_mkwrite, btrfs does cow while writing back a dirty page, but
dax doesn't do delayed allocation, so dax_writeback_one has no place
to do cow.

Also, thank you for the great write-protect patch: another reason
I decided to disable cow is that there was no write protection on DAX
PTEs, so even if we could do cow, we would have no way to
update every pte pointing to our cow'd dax pfn.

> So DAX mappings are not that
> different from filesystem point of view. There are some differences wrt.
> locking (you don't have page lock, but you use a lock bit in radix tree
> entry instead for that) but that's about it. So I don't see a fundamental
> reason why we cannot support all btrfs features for DAX... But if you see
> some problem, let me know and we can talk if we could somehow help from the
> DAX side.

Yeah, looks like we have two problems at least, one is dax_writeback_one
and the other is iomap.

> 
> BTW, I also don't see how the multiple devices are a problem. Actually XFS
> supports that (with its real-time devices) just fine - your ->iomap_begin()
> returns a <device, blocknumber> pair and that should be all that's needed,
> no?

xfs is a bit different, it only writes to one device at a time, sort of
a raid0.

Thanks,

-liubo