On 23.05.2018 09:37, Christoph Hellwig wrote: > On Tue, May 22, 2018 at 02:31:36PM -0400, Chris Mason wrote: >>> And what protects two writes from interleaving their results now? >> >> page locks...ish, we at least won't have results interleaved in a single >> page. For btrfs it'll actually be multiple pages since we try to do more >> than one at a time. > > I think you are going to break just about every assumption people > have that any single write is going to be atomic vs another write. > > E.g. this comes from the posix read definition reference by the > write definition: > > "I/O is intended to be atomic to ordinary files and pipes and FIFOs. > Atomic means that all the bytes from a single operation that started out > together end up together, without interleaving from other I/O > operations. It is a known attribute of terminals that this is not > honored, and terminals are explicitly (and implicitly permanently) > excepted, making the behavior unspecified. The behavior for other device > types is also left unspecified, but the wording is intended to imply > that future standards might choose to specify atomicity (or not). > " > > Without taking the inode lock (or some sort of range lock) you can > easily interleave data from two write requests. > >> But we're not avoiding the inode lock completely, we're just dropping it for >> the expensive parts of writing to the file. A quick guess about what the >> expensive parts are: > > The way I read the patch it basically 'avoids' the inode lock for almost > the whole write call, just minus some setup.
You are right, this drops the inode_lock for some rather crucial ops: prepare_pages - this one allocates pages in the page cache mapping for this inode and locks them lock_and_cleanup_extent_if_need - this one completes any previous writes for the range currenlty being written and locks the range of extents. Looking at the code I *think* the locking provided by prepare_pages/lock_and_cleanup_extent_if_need *should* provide the necessary exclusivity to guarantee atomicity w.r.t to different writes. So Robbie, This reliance on btrfs-internal locks to provide exclusivity *must* be explicitly documented in the commit log. > -- > To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html