On Thu, Dec 08, 2016 at 11:47:41AM +0100, Jan Kara wrote:
> On Wed 07-12-16 17:15:42, Chris Mason wrote:
> > On 12/07/2016 04:45 PM, Liu Bo wrote:
> > >This has implemented DAX support for btrfs with nocow and single-device.
> > >
> > >DAX is developed for block devices that are memory-like in order to avoid
> > >double buffer in both page cache and the storage, so DAX can performs
> > >reads and writes directly to the storage device, and for those who prefer
> > >to using filesystem, filesystem dax support can help to map the storage
> > >into userspace for file-mapping.
> > >
> > >Since I haven't figure out how to map multiple devices to userspace
> > >without pagecache, this DAX support is only for single-device, and I
> > >don't think DAX(Direct Access) can work with cow, this is limited to
> > >nocow case. I made this by setting nodatacow in dax mount option.
> >
> > Interesting, this is a nice small start. It might make more sense to limit
> > snapshots to readonly in DAX mode until we can figure out how to cow
> > properly. I think it can be done, I just need to sit down with the dax code
> > to do a good review.
> >
> > But bigger picture, if we can't cow and we can't crc and we can't
> > multi-device, I'd rather let XFS/ext4 sort out the dax space until we pull
> > in more of the btrfs features too.
>
> So normal DAX IO (via read(2) and write(2)) is very similar to direct IO so
> I don't think there would be any obstacle to support all the features with
> that.
For DAX IO via read(2)/write(2), cow is OK, while multiple devices are a
problem, as currently iomap_dax_actor only takes one <device, blocknum> pair:

- raid 0: one device is written at a time
- raid 1/10 and others: 2 or more devices need to be written each time

> For mmap(2) things get more difficult but still: The filesystem gets
> normal ->fault notifications when the page is first faulted in. So you
> can COW if you need to at that moment.

Right.

> Also DAX PTEs can be write-protected (well, as of the coming merge
> window) as normal PTEs and then you'll get ->pfn_mkwrite /
> ->page_mkwrite notification when someone tries to write via mmap and
> you can do your stuff at that point.

That's right, but I think the problem comes from the fact that only ->fault
with FAULT_FLAG_WRITE gets to space allocation, where we could cow to a new
location. For page_mkwrite, btrfs does cow while writing back a dirty page,
but dax doesn't do delayed allocation, so dax_writeback_one has no place to
do cow.

Also, thank you for the great write-protect patch. Another reason I decided
to disable cow is that there has been no write protection on DAX PTEs, so
without it, even if we could do cow, we would have no way to update every
pte pointing to our cow'd dax pfn.

> So DAX mappings are not that
> different from filesystem point of view. There are some differences wrt.
> locking (you don't have page lock, but you use a lock bit in radix tree
> entry instead for that) but that's about it. So I don't see a principial
> reason why we cannot support all btrfs features for DAX... But if you see
> some problem, let me know and we can talk if we could somehow help from the
> DAX side.

Yeah, looks like we have two problems at least: one is dax_writeback_one and
the other is iomap.

>
> BTW, I also don't see how the multiple devices are a problem.
> Actually XFS supports that (with its real-time devices) just fine - your
> ->iomap_begin() returns a <device, blocknumber> pair and that should be all
> that's needed, no?

xfs is a bit different: it only writes to one device at a time, sort of like
raid0.

Thanks,

-liubo
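
To illustrate the raid0 vs. raid1 point above, here is a minimal userspace
sketch, not kernel code. The names struct target, map_raid0() and map_raid1()
are hypothetical, and the layouts are toy ones (one-block stripes, two mirrors
at the same offset). It only shows that a striped write resolves to a single
<device, blocknum> pair, while a mirrored write needs one pair per copy, which
an interface that returns a single pair cannot express.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_COPIES 2	/* assumption: two-way mirroring in this toy model */

/* Hypothetical stand-in for one <device, blocknum> pair. */
struct target {
	int dev;
	uint64_t block;
};

/*
 * Toy raid0: logical blocks are striped round-robin over ndevs devices,
 * one block per stripe. A write always resolves to exactly one target.
 */
static size_t map_raid0(uint64_t lblock, int ndevs, struct target *out)
{
	assert(ndevs > 0);
	out[0].dev = (int)(lblock % (uint64_t)ndevs);
	out[0].block = lblock / (uint64_t)ndevs;
	return 1;
}

/*
 * Toy raid1: every logical block lives at the same offset on each mirror,
 * so a write has to reach MAX_COPIES targets.
 */
static size_t map_raid1(uint64_t lblock, struct target *out)
{
	for (int i = 0; i < MAX_COPIES; i++) {
		out[i].dev = i;
		out[i].block = lblock;
	}
	return MAX_COPIES;
}
```

In this model, map_raid0(5, 2, t) returns 1 and fills t[0] with device 1,
block 2, whereas map_raid1(5, t) returns 2 targets, one per mirror. A caller
that can only accept one pair would silently drop the second copy.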