Re: [PATCH 01/10] arch:powerpc: return -ENOMEM on failed allocation
> I think the changelog for this series of conversions > should show that you've validated the change by > inspecting the return call chain at each modified line. > > Also, it seems you've cc'd the same mailing lists for > all of the patches modified by this series. > > It would be better to individually address each patch > in the series only cc'ing the appropriate maintainers > and mailing lists. > > A cover letter would be good too. Point noted. Thanks. - Allen -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Btrfs: do not backup tree roots when fsync
On 2017年09月14日 02:25, Liu Bo wrote: It doens't make sense to backup tree roots when doing fsync, since during fsync those tree roots have not been consistent on disk. Signed-off-by: Liu Bo Reviewed-by: Qu Wenruo With a pit can be improved. --- fs/btrfs/disk-io.c | 9 - 1 file changed, 8 insertions(+), 1 deletion(-) diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index 79ac228..a145a88 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -3668,7 +3668,14 @@ int write_all_supers(struct btrfs_fs_info *fs_info, int max_mirrors) u64 flags; do_barriers = !btrfs_test_opt(fs_info, NOBARRIER); - backup_super_roots(fs_info); + + /* +* max_mirrors == 0 indicates we're from commit_transaction, +* not from fsync where the tree roots in fs_info have not +* been consistent on disk. +*/ + if (max_mirrors == 0) + backup_super_roots(fs_info); BTW, the @max_mirrors naming here is really confusing. Normally I would expect max_mirrors == 0 means we don't need to backup super roots... And since there are only two callers it won't be a big thing to change. Thanks, Qu sb = fs_info->super_for_commit; dev_item = &sb->dev_item; -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] Btrfs: do not backup tree roots when fsync
It doens't make sense to backup tree roots when doing fsync, since during fsync those tree roots have not been consistent on disk. Signed-off-by: Liu Bo --- fs/btrfs/disk-io.c | 9 - 1 file changed, 8 insertions(+), 1 deletion(-) diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index 79ac228..a145a88 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -3668,7 +3668,14 @@ int write_all_supers(struct btrfs_fs_info *fs_info, int max_mirrors) u64 flags; do_barriers = !btrfs_test_opt(fs_info, NOBARRIER); - backup_super_roots(fs_info); + + /* +* max_mirrors == 0 indicates we're from commit_transaction, +* not from fsync where the tree roots in fs_info have not +* been consistent on disk. +*/ + if (max_mirrors == 0) + backup_super_roots(fs_info); sb = fs_info->super_for_commit; dev_item = &sb->dev_item; -- 2.9.4 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 2/2 v2] Btrfs: remove bio_flags which indicates a meta block of log-tree
Since both committing transaction and writing log-tree are doing plugging on metadata IO, we can unify to use %sync_writers to benefit both cases, instead of checking bio_flags while writing meta blocks of log-tree. We can remove this bio_flags because in order to write dirty blocks, log tree also uses btrfs_write_marked_extents(), inside which we has enabled %sync_writers, therefore, every write goes in a synchronous way, so does checksuming. Please also note that, bio_flags is applied per-context while %sync_writers is applied per-inode, so this might incur some overhead, ie. 1) while log tree is flushing its dirty blocks via btrfs_write_marked_extents(), in which %sync_writers is increased by one. 2) in the meantime, some writeback operations may happen upon btrfs's metadata inode, so these writes go synchronously, too. However, AFAICS, the overhead is not a big one while the win is that we unify the two places that needs synchronous way and remove a special hack/flag. This removes the bio_flags related stuff for writing log-tree. Signed-off-by: Liu Bo --- v2: Improve the commit log to offer more details why this change makes sense. fs/btrfs/disk-io.c | 6 ++ fs/btrfs/extent_io.c | 13 ++--- fs/btrfs/extent_io.h | 1 - 3 files changed, 4 insertions(+), 16 deletions(-) diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index 8d097ba..b0e353e 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -1005,12 +1005,10 @@ static blk_status_t __btree_submit_bio_done(void *private_data, struct bio *bio, return ret; } -static int check_async_write(struct btrfs_inode *bi, unsigned long bio_flags) +static int check_async_write(struct btrfs_inode *bi) { if (atomic_read(&bi->sync_writers)) return 0; - if (bio_flags & EXTENT_BIO_TREE_LOG) - return 0; #ifdef CONFIG_X86 if (static_cpu_has(X86_FEATURE_XMM4_2)) return 0; @@ -1024,7 +1022,7 @@ static blk_status_t btree_submit_bio_hook(void *private_data, struct bio *bio, { struct inode *inode = private_data; struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb); - int async = check_async_write(BTRFS_I(inode), bio_flags); + int async = check_async_write(BTRFS_I(inode)); blk_status_t ret; if (bio_op(bio) != REQ_OP_WRITE) { diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c index 0aff9b2..b61d68f 100644 --- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c @@ -110,7 +110,6 @@ struct extent_page_data { struct bio *bio; struct extent_io_tree *tree; get_extent_t *get_extent; - unsigned long bio_flags; /* tells writepage not to lock the state bits for this range * it still does the unlocking @@ -3713,7 +3712,6 @@ static noinline_for_stack int write_one_eb(struct extent_buffer *eb, u64 offset = eb->start; u32 nritems; unsigned long i, num_pages; - unsigned long bio_flags = 0; unsigned long start, end; int write_flags = (epd->sync_io ? REQ_SYNC : 0) | REQ_META; int ret = 0; @@ -3721,8 +3719,6 @@ static noinline_for_stack int write_one_eb(struct extent_buffer *eb, clear_bit(EXTENT_BUFFER_WRITE_ERR, &eb->bflags); num_pages = num_extent_pages(eb->start, eb->len); atomic_set(&eb->io_pages, num_pages); - if (btrfs_header_owner(eb) == BTRFS_TREE_LOG_OBJECTID) - bio_flags = EXTENT_BIO_TREE_LOG; /* set btree blocks beyond nritems with 0 to avoid stale content. */ nritems = btrfs_header_nritems(eb); @@ -3749,8 +3745,7 @@ static noinline_for_stack int write_one_eb(struct extent_buffer *eb, p, offset >> 9, PAGE_SIZE, 0, bdev, &epd->bio, end_bio_extent_buffer_writepage, -0, epd->bio_flags, bio_flags, false); - epd->bio_flags = bio_flags; +0, 0, 0, false); if (ret) { set_btree_ioerr(p); if (PageWriteback(p)) @@ -3787,7 +3782,6 @@ int btree_write_cache_pages(struct address_space *mapping, .tree = tree, .extent_locked = 0, .sync_io = wbc->sync_mode == WB_SYNC_ALL, - .bio_flags = 0, }; int ret = 0; int done = 0; @@ -4063,7 +4057,7 @@ static void flush_epd_write_bio(struct extent_page_data *epd) bio_set_op_attrs(epd->bio, REQ_OP_WRITE, epd->sync_io ? REQ_SYNC : 0); - ret = submit_one_bio(epd->bio, 0, epd->bio_flags); + ret = submit_one_bio(epd->bio, 0, 0); BUG_ON(ret < 0); /* -ENOMEM */ epd->bio = NULL; } @@ -4086,7 +4080,6 @@ int extent_write_full_page(struct extent_io_tree
[PATCH] Btrfs: fix confusing worker helper info
We've seen the following backtrace stack in ftrace or dmesg log, kworker/u16:10-4244 [000] 241942.480955: function: btrfs_put_ordered_extent kworker/u16:10-4244 [000] 241942.480956: kernel_stack: => finish_ordered_fn (a0384475) => btrfs_scrubparity_helper (a03ca577)<-"incorrect" => btrfs_freespace_write_helper (a03ca98e)<-"correct" => process_one_work (81117b2f) => worker_thread (81118c2a) => kthread (81121de0) => ret_from_fork (81d7087a) btrfs_freespace_write_helper is actually calling normal_worker_helper instead of btrfs_scrubparity_helper, so somehow kernel has parsed the incorrect function address while unwinding the stack, btrfs_scrubparity_helper really shouldn't be shown up. It's caused by compiler doing inline for our helper function, adding a noinline tag can fix that. Signed-off-by: Liu Bo cc: David Sterba --- fs/btrfs/async-thread.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/btrfs/async-thread.c b/fs/btrfs/async-thread.c index ff0b0be..593709a 100644 --- a/fs/btrfs/async-thread.c +++ b/fs/btrfs/async-thread.c @@ -67,7 +67,7 @@ struct btrfs_workqueue { static void normal_work_helper(struct btrfs_work *work); #define BTRFS_WORK_HELPER(name)\ -void btrfs_##name(struct work_struct *arg) \ +noinline void btrfs_##name(struct work_struct *arg)\ { \ struct btrfs_work *work = container_of(arg, struct btrfs_work, \ normal_work);\ -- 2.9.4 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/2] Btrfs: remove bio_flags which indicates a meta block of log-tree
On Wed, Sep 13, 2017 at 06:43:30PM +0200, David Sterba wrote: > On Mon, Aug 21, 2017 at 03:50:00PM -0600, Liu Bo wrote: > > Since both committing transaction and writing log-tree are doing > > plugging on metadata IO, we can unify to use %sync_writers to benefit > > both cases, instead of checking bio_flags while writing meta blocks of > > log-tree. > > > > This removes the bio_flags related stuff for writing log-tree. > > I'm not sure if this preserves the intentions introduced in > de0022b9da616b95ea5, as it's meant to do the checksums synchronously > (without offloading to other threads), if it's for the tree-log. So now > it depends on the sync_writers count to do that, while with the bio flag > it was independent. > > The sync_writers is per-inode, while the bio flag is from the > submit_bio hook, so it's per-context. Again, blame me for not making the commit log clear enough. We can remove this bio_flags because in order to write dirty blocks, log tree also uses btrfs_write_marked_extents(), inside which PATCH 1 has enabled %sync_writers, therefore, every write goes in a synchronous way, so does checksuming. WRT per-inode vs per-context, yes, they're a bit different, but the only overhead I could think of is that 1) while log tree is flushing its dirty blocks via btrfs_write_marked_extents(), in which %sync_writers is increased by one. 2) in the meantime, some writeback operations may happen upon btrfs's metadata inode, so these writes go synchronously, too. However, AFAICS, the overhead is not a big one while the win is that we unify the two places that needs synchronous way and remove a special hack/flag. Thanks, -liubo -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: qemu-kvm VM died during partial raid1 problems of btrfs
On 09/13/2017 04:15 PM, Marat Khalili wrote: > On 13/09/17 16:23, Chris Murphy wrote: >> Right, known problem. To use o_direct implies also using nodatacow (or >> at least nodatasum), e.g. xattr +C is set, done by qemu-img -o >> nocow=on >> https://www.spinics.net/lists/linux-btrfs/msg68244.html > Can you please elaborate? I don't have exactly the same problem as described > by the link, but I'm still worried that particularly qemu can be less > resilient to partial raid1 failures even on newer kernels, due to missing > checksums for instance. (BTW I didn't find any xattrs on my VM images, nor do > I plan to set any.) >From what Josef Bacik wrote, I understood that it is not only related to >RAID1. I tried to ask further clarifications without success :( It seems that simply using O_DIRECT could allow checksums mismatch. My understand is that to avoid to copy the data between buffer, the checksum computation is subject to data race: i.e. it is possible that the kernel computes the checksum *and* the user space program change the data. This lead to an io error during a subsequent read. To avoid that BTRFS should copy in a temporary buffer the data, and then compute the checksum. But this is what the common sense suggest that O_DIRECT should avoid. If I understood correctly (which is a BIG if), i think that O_DIRECT should be unsupported (i.e. return -EINVAL) if the file is "not marked" as "nodatacsum" I looked to what ZFSOL does: it seems that it doesn't support O_DIRECT [1] for the same reason (see the comments ' ryao commented on Jul 23, 2015' for further details). Anyway I suggest to read what the open(2) man page says about O_DIRECT: it seems that O_DIRECT has to be used carefully when doing fork; the man page concludes: [...] In summary, O_DIRECT is a potentially powerful tool that should be used with caution. It is recommended that applications treat use of O_DIRECT as a performance option which is disabled by default. [...] [1] https://github.com/zfsonlinux/zfs/issues/224 > > -- > > With Best Regards, > Marat Khalili > -- > To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > BR G.Baroncelli -- gpg @keyserver.linux.it: Goffredo Baroncelli Key fingerprint BBF5 1610 0B64 DAC6 5F7D 17B2 0EDA 9B37 8B82 E0B5 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/2 RESEND] Btrfs: make plug in writing meta blocks really work
On Wed, Sep 13, 2017 at 05:51:55PM +0200, David Sterba wrote: > On Mon, Aug 21, 2017 at 03:49:59PM -0600, Liu Bo wrote: > > We have started plug in btrfs_write_and_wait_marked_extents() but the > > generated IOs actually go to device's schedule IO list where the work > > is doing in another task, thus the started plug doesn't make any > > sense. > > > > And since we wait for IOs immediately after writing meta blocks, it's > > the same case as writing log tree, doing sync submit can merge more > > IOs. > > > > Signed-off-by: Liu Bo > > Reviewed-by: David Sterba > > The original patch c6adc9cc082e3 adds too many randomly scattered > blg_plugs that, we could use some high-level comments about the purpose. The commit log of that patch wasn't explaining well why the old way is not efficient, in most cases, cow'd metadata blocks in the log tree and the log root tree are contiguous on disk, so putting a blk_plug pair can merge writes on those meta blocks so that we could get better performance out of it. And yes, adding comments mentioning the reason will be helpful. Thanks, -liubo -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] btrfs: remove BTRFS_FS_QUOTA_DISABLING flag
On Wed, Aug 30, 2017 at 04:33:16PM +0900, Misono, Tomohiro wrote: > Currently, "btrfs quota enable" would fail after "btrfs quota disable" on > the first time with syslog output "qgroup_rescan_init failed with -22", but > it would succeed on the second time. > > When "quota disable" is called, BTRFS_FS_QUOTA_DISABLING flag bit will be > set in fs_info->flags in btrfs_quota_disable(), but it will not be droppd > in btrfs_run_qgroups() (which is called in btrfs_commit_transaction()) > because quota_root has already been freed. If "quota enable" is called > after that, both BTRFS_FS_QUOTA_DISABLING and BTRFS_FS_QUOTA_ENABLED flag > would be dropped in the btrfs_run_qgroups() since quota_root is not NULL. > This leads to the failure of "quota enable" on the first time. > > BTRFS_FS_QUOTA_DISABLING flag is not used outside of "quota disable" > context and is equivalent to whether quota_root is NULL or not. > btrfs_run_qgroups() checks whether quota_root is NULL or not in the first > place. > > So, let's remove BTRFS_FS_QUOTA_DISABLING flag. The state bits and transitions are messy and keeping them consistent could lead to the problmes you observe. By the analysis above it looks correct to me to remove the 'disabling' bit. Reviewed-by: David Sterba -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: send | receive: received snapshot is missing recent files
On Mon, Sep 11, 2017 at 11:19 PM, Andrei Borzenkov wrote: > 11.09.2017 20:53, Axel Burri пишет: >> On 2017-09-08 06:44, Dave wrote: >>> I'm referring to the link below. Using "btrfs subvolume snapshot -r" >>> copies the Received UUID from the source into the new snapshot. The >>> btrbk FAQ entry suggests otherwise. Has something changed? >> >> I don't think something has changed, the description for the read-only >> subvolumes on the btrbk FAQ was just wrong (fixed now). >> >>> The only way I see to remove a Received UUID is to create a rw >>> snapshot (above command without the "-r"), which is not ideal in this >>> situation when cleaning up readonly source snapshots. >>> >>> Any suggestions? Thanks >> >> No suggestions from my part, as far as I know there is no way to easily >> remove/change a received_uuid from a subvolume. >> > > There is BTRFS_IOC_SET_RECEIVED_SUBVOL IOCTL which is used by "btrfs > received". My understanding is that it can also be set to empty (this > clearing it). You could write small program to do it. > > In general it sounds like a bug - removing read-only flag from subvolume > by any means should also clear Received UUID as we cannot anymore > guarantee that subvolume content is the same. Yes! That makes a great deal of sense. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/2] Btrfs: remove bio_flags which indicates a meta block of log-tree
On Mon, Aug 21, 2017 at 03:50:00PM -0600, Liu Bo wrote: > Since both committing transaction and writing log-tree are doing > plugging on metadata IO, we can unify to use %sync_writers to benefit > both cases, instead of checking bio_flags while writing meta blocks of > log-tree. > > This removes the bio_flags related stuff for writing log-tree. I'm not sure if this preserves the intentions introduced in de0022b9da616b95ea5, as it's meant to do the checksums synchronously (without offloading to other threads), if it's for the tree-log. So now it depends on the sync_writers count to do that, while with the bio flag it was independent. The sync_writers is per-inode, while the bio flag is from the submit_bio hook, so it's per-context. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 06/10] drivers:ethernet: return -ENOMEM on allocation failure.
From: Allen Pais Date: Wed, 13 Sep 2017 13:02:15 +0530 > Signed-off-by: Allen Pais This is quite pointless as the caller doesn't do anything with the value, it just tests whether a negative value is returned or not. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Btrfs Issues
Hello, We are researchers from Columbia University, New York. As part of our current research we have found some semantic discrepancies between btrfs and other popular filesystems. We have attached two cases. The first one involves an invalid O_DIRECT write() that fails back to buffered write instead of failing with error EINVAL. In directory 2, we discovered that btrfs calculates write_bytes in __btrfs_buffered_write differently from that in generic_perform_writes in fs/mmap.c. This can cause inconsistent behavior between btrfs and other filesystems when program invokes the same writev/write() syscall. In each directory, you will find a Readme.md describing the issue and pointing to the code that may cause the problem. For your convenience, we also included test programs (min.cpp) and instructions in Readme to help reproduce the issues. We would appreciate very much if you could confirm the two cases at your conveniences. Thanks, Amy btrfs_issues.tar.gz Description: GNU Zip compressed data
Re: [PATCH 1/2 RESEND] Btrfs: make plug in writing meta blocks really work
On Mon, Aug 21, 2017 at 03:49:59PM -0600, Liu Bo wrote: > We have started plug in btrfs_write_and_wait_marked_extents() but the > generated IOs actually go to device's schedule IO list where the work > is doing in another task, thus the started plug doesn't make any > sense. > > And since we wait for IOs immediately after writing meta blocks, it's > the same case as writing log tree, doing sync submit can merge more > IOs. > > Signed-off-by: Liu Bo Reviewed-by: David Sterba The original patch c6adc9cc082e3 adds too many randomly scattered blg_plugs that, we could use some high-level comments about the purpose. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH RFC] Btrfs: fix confusing worker helper info
On Fri, Sep 08, 2017 at 03:34:45PM -0600, Liu Bo wrote: > We've seen the following backtrace stack in ftrace or dmesg log, > > kworker/u16:10-4244 [000] 241942.480955: function: > btrfs_put_ordered_extent > kworker/u16:10-4244 [000] 241942.480956: kernel_stack: trace> > => finish_ordered_fn (a0384475) > => btrfs_scrubparity_helper (a03ca577) > => btrfs_freespace_write_helper (a03ca98e) > => process_one_work (81117b2f) > => worker_thread (81118c2a) > => kthread (81121de0) > => ret_from_fork (81d7087a) > > btrfs_scrubparity_helper really shouldn't be shown up. > > It's caused by compiler doing inline for our helper function, adding a > noinline tag can fix that. Isn't it the other way around then? Noninline would make the function object exist separately and then it would appear in the stacktrace. And I think this is desired, so you see the call stack withtou the shortcuts that the inlining can cause. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] btrfs: propagate error to btrfs_cmp_data_prepare caller
On Fri, Sep 08, 2017 at 05:48:55PM +0900, Naohiro Aota wrote: > btrfs_cmp_data_prepare() (almost) always returns 0 i.e. ignoring errors > from gather_extent_pages(). While the pages are freed by > btrfs_cmp_data_free(), cmp->num_pages still has > 0. Then, > btrfs_extent_same() try to access the already freed pages causing faults > (or violates PageLocked assertion). > > This patch just return the error as is so that the caller stop the process. > > Signed-off-by: Naohiro Aota > Fixes: f441460202cb ("btrfs: fix deadlock with extent-same and readpage") > Cc: # 4.2 Reviewed-by: David Sterba -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: qemu-kvm VM died during partial raid1 problems of btrfs
On 2017-09-13 10:47, Martin Raiber wrote: Hi, On 12.09.2017 23:13 Adam Borowski wrote: On Tue, Sep 12, 2017 at 04:12:32PM -0400, Austin S. Hemmelgarn wrote: On 2017-09-12 16:00, Adam Borowski wrote: Noted. Both Marat's and my use cases, though, involve VMs that are off most of the time, and at least for me, turned on only to test something. Touching mtime makes rsync run again, and it's freaking _slow_: worse than 40 minutes for a 40GB VM (source:SSD target:deduped HDD). 40 minutes for 40GB is insanely slow (that's just short of 18 MB/s) if you're going direct to a hard drive. I get better performance than that on my somewhat pathetic NUC based storage cluster (I get roughly 20 MB/s there, but it's for archival storage so I don't really care). I'm actually curious what the exact rsync command you are using is (you can obviously redact paths as you see fit), as the only way I can think of that it should be that slow is if you're using both --checksum (but if you're using this, you can tell rsync to skip the mtime check, and that issue goes away) and --inplace, _and_ your HDD is slow to begin with. rsync -axX --delete --inplace --numeric-ids /mnt/btr1/qemu/ mordor:$BASE/qemu The target is single, compress=zlib SAMSUNG HD204UI, 34976 hours old but with nothing notable on SMART, in a Qnap 253a, kernel 4.9. Both source and target are btrfs, but here switching to send|receive wouldn't give much as this particular guest is Win10 Insider Edition -- a thingy that shows what the folks from Redmond have cooked up, with roughly weekly updates to the tune of ~10GB writes 10GB deletions (if they do incremental transfers, installation still rewrites everything system). Lemme look a bit more, rsync performance is indeed really abysmal compared to what it should be. self promo, but consider using UrBackup (OSS software, too) instead? For Windows VMs I would install the client in the VM. It excludes unnessary stuff like e.g. page files or the shadow storage area from the image backups, as well and has a mode to store image backups as raw btrfs files. Linux VMs I'd backup as files either from the hypervisor or from in VM. If you want to backup big btrfs image files it can do that too, and faster than rsync plus it can do incremental backups with sparse files. Even without UrBackup (I'll need to look into that actually, we're looking for new backup software where I work since MS has been debating removing File History, and the custom scripts my predecessor wrote are showing their 20+ year age at this point), it's usually better to just run the backup from inside the VM if at all possible. You end up saving space, and don't waste time backing up stuff you don't need. In this particular use case, it would also save other system resources, since you only need to back up the VM if something has changed, and by definition nothing could have changed in the VM (at least, nothing could have legitimately changed) if it's not running. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] btrfs: prevent to set invalid default subvolid
On Tue, Sep 12, 2017 at 10:42:52PM +0900, Satoru Takeuchi wrote: > `btrfs sub set-default` succeeds to set an ID which isn't corresponding to any > fs/file tree. If such the bad ID is set to a filesystem, we can't mount this > filesystem without specifying `subvol` or `subvolid` mount options. > > Signed-off-by: Satoru Takeuchi Fixes: 6ef5ed0d386b Reviewed-by: David Sterba -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 10/10] fs:btrfs: return -ENOMEM on allocation failure.
On Wed, Sep 13, 2017 at 01:02:19PM +0530, Allen Pais wrote: > Signed-off-by: Allen Pais > --- > fs/btrfs/check-integrity.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/fs/btrfs/check-integrity.c b/fs/btrfs/check-integrity.c > index 7d5a9b5..efa4c23 100644 > --- a/fs/btrfs/check-integrity.c > +++ b/fs/btrfs/check-integrity.c > @@ -2913,7 +2913,7 @@ int btrfsic_mount(struct btrfs_fs_info *fs_info, > state = kvzalloc(sizeof(*state), GFP_KERNEL); > if (!state) { > pr_info("btrfs check-integrity: allocation failed!\n"); > - return -1; > + return -ENOMEM; Makes sense, also please fix the -1 a few lines below that also result from failed memory allocation, indirectly from btrfsic_dev_state_alloc(). > } > > if (!btrfsic_is_initialized) { > -- > 2.7.4 > > -- > To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 01/10] arch:powerpc: return -ENOMEM on failed allocation
On Wed, 2017-09-13 at 13:02 +0530, Allen Pais wrote: > Signed-off-by: Allen Pais I think the changelog for this series of conversions should show that you've validated the change by inspecting the return call chain at each modified line. Also, it seems you've cc'd the same mailing lists for all of the patches modified by this series. It would be better to individually address each patch in the series only cc'ing the appropriate maintainers and mailing lists. A cover letter would be good too. > --- > arch/powerpc/platforms/cell/spider-pci.c | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/arch/powerpc/platforms/cell/spider-pci.c > b/arch/powerpc/platforms/cell/spider-pci.c > index d1e61e2..82aa3f7 100644 > --- a/arch/powerpc/platforms/cell/spider-pci.c > +++ b/arch/powerpc/platforms/cell/spider-pci.c > @@ -106,7 +106,7 @@ static int __init spiderpci_pci_setup_chip(struct > pci_controller *phb, > dummy_page_va = kmalloc(PAGE_SIZE, GFP_KERNEL); > if (!dummy_page_va) { > pr_err("SPIDERPCI-IOWA:Alloc dummy_page_va failed.\n"); > - return -1; > + return -ENOMEM; > } > > dummy_page_da = dma_map_single(phb->parent, dummy_page_va, > @@ -137,7 +137,7 @@ int __init spiderpci_iowa_init(struct iowa_bus *bus, void > *data) > if (!priv) { > pr_err("SPIDERPCI-IOWA:" > "Can't allocate struct spiderpci_iowa_private"); > - return -1; > + return -ENOMEM; > } > > if (of_address_to_resource(np, 0, &r)) { -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: qemu-kvm VM died during partial raid1 problems of btrfs
Hi, On 12.09.2017 23:13 Adam Borowski wrote: > On Tue, Sep 12, 2017 at 04:12:32PM -0400, Austin S. Hemmelgarn wrote: >> On 2017-09-12 16:00, Adam Borowski wrote: >>> Noted. Both Marat's and my use cases, though, involve VMs that are off most >>> of the time, and at least for me, turned on only to test something. >>> Touching mtime makes rsync run again, and it's freaking _slow_: worse than >>> 40 minutes for a 40GB VM (source:SSD target:deduped HDD). >> 40 minutes for 40GB is insanely slow (that's just short of 18 MB/s) if >> you're going direct to a hard drive. I get better performance than that on >> my somewhat pathetic NUC based storage cluster (I get roughly 20 MB/s there, >> but it's for archival storage so I don't really care). I'm actually curious >> what the exact rsync command you are using is (you can obviously redact >> paths as you see fit), as the only way I can think of that it should be that >> slow is if you're using both --checksum (but if you're using this, you can >> tell rsync to skip the mtime check, and that issue goes away) and --inplace, >> _and_ your HDD is slow to begin with. > rsync -axX --delete --inplace --numeric-ids /mnt/btr1/qemu/ mordor:$BASE/qemu > The target is single, compress=zlib SAMSUNG HD204UI, 34976 hours old but > with nothing notable on SMART, in a Qnap 253a, kernel 4.9. > > Both source and target are btrfs, but here switching to send|receive > wouldn't give much as this particular guest is Win10 Insider Edition -- > a thingy that shows what the folks from Redmond have cooked up, with roughly > weekly updates to the tune of ~10GB writes 10GB deletions (if they do > incremental transfers, installation still rewrites everything system). > > Lemme look a bit more, rsync performance is indeed really abysmal compared > to what it should be. self promo, but consider using UrBackup (OSS software, too) instead? For Windows VMs I would install the client in the VM. It excludes unnessary stuff like e.g. page files or the shadow storage area from the image backups, as well and has a mode to store image backups as raw btrfs files. Linux VMs I'd backup as files either from the hypervisor or from in VM. If you want to backup big btrfs image files it can do that too, and faster than rsync plus it can do incremental backups with sparse files. Regards, Martin Raiber -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: qemu-kvm VM died during partial raid1 problems of btrfs
On 13/09/17 16:23, Chris Murphy wrote: Right, known problem. To use o_direct implies also using nodatacow (or at least nodatasum), e.g. xattr +C is set, done by qemu-img -o nocow=on https://www.spinics.net/lists/linux-btrfs/msg68244.html Can you please elaborate? I don't have exactly the same problem as described by the link, but I'm still worried that particularly qemu can be less resilient to partial raid1 failures even on newer kernels, due to missing checksums for instance. (BTW I didn't find any xattrs on my VM images, nor do I plan to set any.) -- With Best Regards, Marat Khalili -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: qemu-kvm VM died during partial raid1 problems of btrfs
On Tue, Sep 12, 2017 at 10:02 AM, Marat Khalili wrote: > (3) it is possible that it uses O_DIRECT or something, and btrfs raid1 does > not fully protect this kind of access. Right, known problem. To use o_direct implies also using nodatacow (or at least nodatasum), e.g. xattr +C is set, done by qemu-img -o nocow=on https://www.spinics.net/lists/linux-btrfs/msg68244.html -- Chris Murphy -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
discard and rollback
Hi, I've been running discard mount option continuously for about 9 months with an HP Spectre which contains a SAMSUNG MZVLV256HCHP-000H1 as reported by smartctl. There have been no problems. However I realize today that within seconds of a super being updated with a new root tree, the old address returns nothing: [chris@f26h ~]$ sudo btrfs-debug-tree -b 84258750464 /dev/nvme0n1p8 btrfs-progs v4.12 node 84258750464 level 1 items 2 free 491 generation 200994 owner 1 fs uuid 2662057f-e6c7-47fa-8af9-ad933a22f6ec chunk uuid 1df72dcf-f515-404a-894a-f7345f988793 key (EXTENT_TREE ROOT_ITEM 0) block 84258783232 (5142748) gen 200994 key (452 INODE_ITEM 0) block 84258881536 (5142754) gen 200994 [chris@f26h ~]$ sudo btrfs-debug-tree -b 84258750464 /dev/nvme0n1p8 btrfs-progs v4.12 checksum verify failed on 84258750464 found E4E3BDB6 wanted checksum verify failed on 84258750464 found E4E3BDB6 wanted bytenr mismatch, want=84258750464, have=0 ERROR: failed to read 84258750464 [chris@f26h ~]$ This suggests a problem for automatic recovery, should it be needed at next mount, as well as the use of 'usebackuproot'. And it could make Btrfs much more fragile than it would otherwise be. It's not broken, so I have no fix proposal. Of course discard is not the default mount option, and a viable work around is fstrim.timer executed once a week or even once per hour. Possibly better would be a delayed discard list. i.e. current discard list accumulates discards for e.g. 5 minutes, and then current list becomes a static delayed list, and a new current list is started. After another 5 minutes, delayed list is executed for discards, and current list becomes delayed list. That way discarded blocks are a minimum of 5 minutes old. I guess there are security implications by delaying discard, so there might be a use case for non-delayed discards (?) -- Chris Murphy -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 05/10] drivers:net: return -ENOMEM on allocation failure.
> propagates the -1. That is only called by bond_open() with: > > if (bond_alb_initialize(bond, (BOND_MODE(bond) == BOND_MODE_ALB))) > return -ENOMEM; > > So you might want to also modify this code, to return the return > value, rather than use the hard coded ENOMEM. > I'll modify the above and send it out a separate patch. Thank you. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 06/10] drivers:ethernet: return -ENOMEM on allocation failure.
> > static int cas_alloc_rxds(struct cas *cp) > { > int i; > > for (i = 0; i < N_RX_DESC_RINGS; i++) { > if (cas_alloc_rx_desc(cp, i) < 0) { > cas_free_rxds(cp); > return -1; > } > } > return 0; > } > > Again, your change is correct, but in the end the value is not used. > And if you fix it at the cas_alloc_rxds level, you also need a fix at > the next level up: > > err = -ENOMEM; > if (cas_tx_tiny_alloc(cp) < 0) > goto err_unlock; > > /* alloc rx descriptors */ > if (cas_alloc_rxds(cp) < 0) > goto err_tx_tiny; > > again, the return value is discarded. I agree. I could send out v2 with fixes at both level. - Allen -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
discard and rollback
Hi, I've been running discard mount option continuously for about 9 months with an HP Spectre which contains a SAMSUNG MZVLV256HCHP-000H1 as reported by smartctl. There have been no problems. However I realize today that within seconds of a super being updated with a new root tree, the old address returns nothing: [chris@f26h ~]$ sudo btrfs-debug-tree -b 84258750464 /dev/nvme0n1p8 btrfs-progs v4.12 node 84258750464 level 1 items 2 free 491 generation 200994 owner 1 fs uuid 2662057f-e6c7-47fa-8af9-ad933a22f6ec chunk uuid 1df72dcf-f515-404a-894a-f7345f988793 key (EXTENT_TREE ROOT_ITEM 0) block 84258783232 (5142748) gen 200994 key (452 INODE_ITEM 0) block 84258881536 (5142754) gen 200994 [chris@f26h ~]$ sudo btrfs-debug-tree -b 84258750464 /dev/nvme0n1p8 btrfs-progs v4.12 checksum verify failed on 84258750464 found E4E3BDB6 wanted checksum verify failed on 84258750464 found E4E3BDB6 wanted bytenr mismatch, want=84258750464, have=0 ERROR: failed to read 84258750464 [chris@f26h ~]$ This suggests a problem for automatic recovery, should it be needed at next mount, as well as the use of 'usebackuproot'. And it could make Btrfs much more fragile than it would otherwise be. It's not broken, so I have no fix proposal. Of course discard is not the default mount option, and a viable work around is fstrim.timer executed once a week or even once per hour. Possibly a better option is a delayed discard list. i.e. current discard list accumulates discards for e.g. 5 minutes, and then current list becomes a static delayed list, and a new current list is started. After another 5 minutes, delayed list is executed for discards, and current list becomes delayed list. That way discarded blocks are a minimum of 5 minutes old. I guess there are security implications by delaying discard, so there might be a use case for non-delayed discards (?) -- Chris Murphy -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: qemu-kvm VM died during partial raid1 problems of btrfs
On 2017-09-12 20:52, Timofey Titovets wrote: No, no, no, no... No new ioctl, no change in fallocate. Fisrt: VM can do punch hole, if you use qemu -> qemu know how to do it. Windows Guest also know how to do it. Different Hypervisor? -> google -> Make issue to support, all Linux/Windows/Mac OS support holes in files. Not everybody who uses sparse files is using virtual machines. No new code, no new strange stuff to fix not broken things. Um, the fallocate PUNCH_HOLE mode _is_ broken. There's a race condition that can trivially cause data loss. You want replace zeroes? EXTENT_SAME can do that. But only on a small number of filesystems, and it requires extra work that shouldn't be necessary. truncate -s 4M test_hole dd if=/dev/zero of=./test_zero bs=4M duperemove -vhrd ./test_hole ./test_zero And performance for this approach is absolute shit compared to fallocate -d. Actual numbers, using a 4G test file (which is still small for what you're talking about) and a 4M hole file: fallocate -d: 0.19 user, 0.85 system, 1.26 real duperemove -vhrd: 0.75 user, 137.70 system, 144.80 real So, for a 4G file, it took duperemove (and the EXTENT_SAME ioctl) 114.92 times as long to achieve the same net effect. From a practical perspective, this isn't viable for regular usage just because of how long it takes. Most of that overhead is that the EXTENT_SAME ioctl does a byte-by-byte comparison of the ranges to make sure they match, but that isn't strictly necessary to avoid this race condition. All that's actually needed is determining if there is outstanding I/O on that region, and if so, some special handling prior to freezing the region is needed. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Storage and snapshots as historical yearly
On 2017-09-13 07:51, Pete wrote: On 09/12/2017 01:16 PM, Austin S. Hemmelgarn wrote: Diverting away from the original topic, what issues with overlayfs and btrfs? As mentioned, I thought whiteout support was missing, but if you're using it without issue, I might be wrong. Whiteout works fine. Upper and lower layers and working directory are all on btrfs subvolumes. Snapshotting seems fine. Hmm, just double checked myself. Apparently I was operating based on old information. I'm using btrfs to create 'base' operating system containers (btrfs) and then using overlayfs for a few 'upper' containers for specific applications, so the upper parts of the overlays contain only the config and data files and I can apply OS updates only on the lower ones. I do note that changes in the 'base' os can take time to propagate to the upper containers and I'm probably not being sensible in not stopping the upper containers when updating the lower ones. This is also does not seem to be what overlaysfs is intended for. However, for my light usage it generally works OK and is useful to me. Actually, this is pretty well in-line with one of the intended use cases (it was mostly designed for efficient multiple instantiation of Docker or LXC containers). The other big use case is 'live' systems that only retain state while powered on, like most install images. OK, I only spotted the latter use case when reading up, apart from one website which seemed to mention using it for containers. Yeah, containers are the other big one. It's not unusual for someone to need to spin up a dozen or more instances of essentially the same container (think large build systems), and without an overlay mount, you end up multiplying the space for the base image times the number of containers. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: qemu-kvm VM died during partial raid1 problems of btrfs
On 2017-09-12 17:13, Adam Borowski wrote: On Tue, Sep 12, 2017 at 04:12:32PM -0400, Austin S. Hemmelgarn wrote: On 2017-09-12 16:00, Adam Borowski wrote: Noted. Both Marat's and my use cases, though, involve VMs that are off most of the time, and at least for me, turned on only to test something. Touching mtime makes rsync run again, and it's freaking _slow_: worse than 40 minutes for a 40GB VM (source:SSD target:deduped HDD). 40 minutes for 40GB is insanely slow (that's just short of 18 MB/s) if you're going direct to a hard drive. I get better performance than that on my somewhat pathetic NUC based storage cluster (I get roughly 20 MB/s there, but it's for archival storage so I don't really care). I'm actually curious what the exact rsync command you are using is (you can obviously redact paths as you see fit), as the only way I can think of that it should be that slow is if you're using both --checksum (but if you're using this, you can tell rsync to skip the mtime check, and that issue goes away) and --inplace, _and_ your HDD is slow to begin with. rsync -axX --delete --inplace --numeric-ids /mnt/btr1/qemu/ mordor:$BASE/qemu The target is single, compress=zlib SAMSUNG HD204UI, 34976 hours old but with nothing notable on SMART, in a Qnap 253a, kernel 4.9. compress=zlib is probably your biggest culprit. As odd as this sounds, I'd suggest switching that to lzo (seriously, the performance difference is ludicrous), and then setting up a cron job (or systemd timer) to run defrag over things to switch to zlib. As a general point of comparison, we do archival backups to a file server running BTRFS where I work, and the archiving process runs about four to ten times faster if we take this approach (LZO for initial compression, then recompress using defrag once the initial transfer is done) than just using zlib directly. `--inplace` is probably not helping (especially if most of the file changed, on BTRFS, it actually is marginally more efficient to just write out a whole new file and then replace the old one with a rename if you're rewriting most of the file), but is probably not as much of an issue as compress=zlib. Both source and target are btrfs, but here switching to send|receive wouldn't give much as this particular guest is Win10 Insider Edition -- a thingy that shows what the folks from Redmond have cooked up, with roughly weekly updates to the tune of ~10GB writes 10GB deletions (if they do incremental transfers, installation still rewrites everything system). > Lemme look a bit more, rsync performance is indeed really abysmal compared to what it should be. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 06/10] drivers:ethernet: return -ENOMEM on allocation failure.
On Wed, Sep 13, 2017 at 01:02:15PM +0530, Allen Pais wrote: > Signed-off-by: Allen Pais > --- > drivers/net/ethernet/sun/cassini.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/net/ethernet/sun/cassini.c > b/drivers/net/ethernet/sun/cassini.c > index 382993c..fc0ea3a 100644 > --- a/drivers/net/ethernet/sun/cassini.c > +++ b/drivers/net/ethernet/sun/cassini.c > @@ -3984,7 +3984,7 @@ static inline int cas_alloc_rx_desc(struct cas *cp, int > ring) > size = RX_DESC_RINGN_SIZE(ring); > for (i = 0; i < size; i++) { > if ((page[i] = cas_page_alloc(cp, GFP_KERNEL)) == NULL) > - return -1; > + return -ENOMEM; > } > return 0; > } static int cas_alloc_rxds(struct cas *cp) { int i; for (i = 0; i < N_RX_DESC_RINGS; i++) { if (cas_alloc_rx_desc(cp, i) < 0) { cas_free_rxds(cp); return -1; } } return 0; } Again, your change is correct, but in the end the value is not used. And if you fix it at the cas_alloc_rxds level, you also need a fix at the next level up: err = -ENOMEM; if (cas_tx_tiny_alloc(cp) < 0) goto err_unlock; /* alloc rx descriptors */ if (cas_alloc_rxds(cp) < 0) goto err_tx_tiny; again, the return value is discarded. Andrew -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 05/10] drivers:net: return -ENOMEM on allocation failure.
On Wed, Sep 13, 2017 at 01:02:14PM +0530, Allen Pais wrote: > Signed-off-by: Allen Pais Hi Allen Although correct, if you look higher up the call chain, this appears to be not so useful. rlb_initialize() is only called by bond_alb_initialize(), and it propagates the -1. That is only called by bond_open() with: if (bond_alb_initialize(bond, (BOND_MODE(bond) == BOND_MODE_ALB))) return -ENOMEM; So you might want to also modify this code, to return the return value, rather than use the hard coded ENOMEM. Since you code is OK as far as it goes: Reviewed-by: Andrew Lunn Andrew -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Storage and snapshots as historical yearly
On 09/12/2017 01:16 PM, Austin S. Hemmelgarn wrote: >> Diverting away from the original topic, what issues with overlayfs and >> btrfs? > As mentioned, I thought whiteout support was missing, but if you're > using it without issue, I might be wrong. Whiteout works fine. Upper and lower layers and working directory are all on btrfs subvolumes. Snapshotting seems fine. >> I'm using btrfs to create 'base' operating system containers (btrfs) and >> then using overlayfs for a few 'upper' containers for specific >> applications, so the upper parts of the overlays contain only the config >> and data files and I can apply OS updates only on the lower ones. >> >> I do note that changes in the 'base' os can take time to propagate to >> the upper containers and I'm probably not being sensible in not stopping >> the upper containers when updating the lower ones. This is also does >> not seem to be what overlaysfs is intended for. However, for my light >> usage it generally works OK and is useful to me. > Actually, this is pretty well in-line with one of the intended use cases > (it was mostly designed for efficient multiple instantiation of Docker > or LXC containers). The other big use case is 'live' systems that only > retain state while powered on, like most install images. OK, I only spotted the latter use case when reading up, apart from one website which seemed to mention using it for containers. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Btrfs: incremental send, apply asynchronous page cache readahead
On Wed, Sep 13, 2017 at 7:38 AM, peterh wrote: > From: Kuanling Huang > > By analyzing the perf on btrfs send, we found it take large > amount of cpu time on page_cache_sync_readahead. This effort > can be reduced after switching to asynchronous one. Overall > performance gain on HDD and SSD were 9 and 15 respectively if > simply send a large file. Besides what was pointed before, about saying what those 9 and 15 are, the subject mentions incremental send, but there's nothing here that is specific to incremental sends, as it applies to full send operations as well, so please also change the subject. > > Signed-off-by: Kuanling Huang > --- > fs/btrfs/send.c | 21 - > 1 file changed, 16 insertions(+), 5 deletions(-) > > diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c > index 63a6152..ac67ff6 100644 > --- a/fs/btrfs/send.c > +++ b/fs/btrfs/send.c > @@ -4475,16 +4475,27 @@ static ssize_t fill_read_buf(struct send_ctx *sctx, > u64 offset, u32 len) > /* initial readahead */ > memset(&sctx->ra, 0, sizeof(struct file_ra_state)); > file_ra_state_init(&sctx->ra, inode->i_mapping); > - btrfs_force_ra(inode->i_mapping, &sctx->ra, NULL, index, > - last_index - index + 1); > > while (index <= last_index) { > unsigned cur_len = min_t(unsigned, len, > PAGE_CACHE_SIZE - pg_offset); > - page = find_or_create_page(inode->i_mapping, index, GFP_NOFS); You based this patch on an old code base. Currently it is GFP_KERNEL and not GFP_NOFS anymore. Please update the patch. > + page = find_lock_page(inode->i_mapping, index); > if (!page) { > - ret = -ENOMEM; > - break; > + page_cache_sync_readahead(inode->i_mapping, > + &sctx->ra, NULL, index, > + last_index + 1 - index); > + > + page = find_or_create_page(inode->i_mapping, index, > GFP_NOFS); > + if (unlikely(!page)) { Please avoid polluting the code with unlikely/likely macros (unless there's really a significant performance win, which isn't the case here I bet). > + ret = -ENOMEM; > + break; > + } > + } > + > + if (PageReadahead(page)) { > + page_cache_async_readahead(inode->i_mapping, > + &sctx->ra, NULL, page, index, > + last_index + 1 - index); > } > > if (!PageUptodate(page)) { > -- > 1.9.1 > > -- > To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- Filipe David Manana, “Whether you think you can, or you think you can't — you're right.” -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Btrfs: incremental send, apply asynchronous page cache readahead
Hi Pasi Sorry for missing the word. 9 and 15 percent performance gain on HDD and SSD after the patch is applied. Kuanling Pasi Kärkkäinen 於 2017-09-13 17:33 寫到: Hi, On Wed, Sep 13, 2017 at 02:38:49PM +0800, peterh wrote: From: Kuanling Huang By analyzing the perf on btrfs send, we found it take large amount of cpu time on page_cache_sync_readahead. This effort can be reduced after switching to asynchronous one. Overall performance gain on HDD and SSD were 9 and 15 respectively if simply send a large file. hmm, 9 and 15 what? -- Pasi Signed-off-by: Kuanling Huang --- fs/btrfs/send.c | 21 - 1 file changed, 16 insertions(+), 5 deletions(-) diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c index 63a6152..ac67ff6 100644 --- a/fs/btrfs/send.c +++ b/fs/btrfs/send.c @@ -4475,16 +4475,27 @@ static ssize_t fill_read_buf(struct send_ctx *sctx, u64 offset, u32 len) /* initial readahead */ memset(&sctx->ra, 0, sizeof(struct file_ra_state)); file_ra_state_init(&sctx->ra, inode->i_mapping); - btrfs_force_ra(inode->i_mapping, &sctx->ra, NULL, index, - last_index - index + 1); while (index <= last_index) { unsigned cur_len = min_t(unsigned, len, PAGE_CACHE_SIZE - pg_offset); - page = find_or_create_page(inode->i_mapping, index, GFP_NOFS); + page = find_lock_page(inode->i_mapping, index); if (!page) { - ret = -ENOMEM; - break; + page_cache_sync_readahead(inode->i_mapping, + &sctx->ra, NULL, index, + last_index + 1 - index); + + page = find_or_create_page(inode->i_mapping, index, GFP_NOFS); + if (unlikely(!page)) { + ret = -ENOMEM; + break; + } + } + + if (PageReadahead(page)) { + page_cache_async_readahead(inode->i_mapping, + &sctx->ra, NULL, page, index, + last_index + 1 - index); } if (!PageUptodate(page)) { -- 1.9.1 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Btrfs: incremental send, apply asynchronous page cache readahead
Hi, On Wed, Sep 13, 2017 at 02:38:49PM +0800, peterh wrote: > From: Kuanling Huang > > By analyzing the perf on btrfs send, we found it take large > amount of cpu time on page_cache_sync_readahead. This effort > can be reduced after switching to asynchronous one. Overall > performance gain on HDD and SSD were 9 and 15 respectively if > simply send a large file. > hmm, 9 and 15 what? -- Pasi > Signed-off-by: Kuanling Huang > --- > fs/btrfs/send.c | 21 - > 1 file changed, 16 insertions(+), 5 deletions(-) > > diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c > index 63a6152..ac67ff6 100644 > --- a/fs/btrfs/send.c > +++ b/fs/btrfs/send.c > @@ -4475,16 +4475,27 @@ static ssize_t fill_read_buf(struct send_ctx *sctx, > u64 offset, u32 len) > /* initial readahead */ > memset(&sctx->ra, 0, sizeof(struct file_ra_state)); > file_ra_state_init(&sctx->ra, inode->i_mapping); > - btrfs_force_ra(inode->i_mapping, &sctx->ra, NULL, index, > -last_index - index + 1); > > while (index <= last_index) { > unsigned cur_len = min_t(unsigned, len, >PAGE_CACHE_SIZE - pg_offset); > - page = find_or_create_page(inode->i_mapping, index, GFP_NOFS); > + page = find_lock_page(inode->i_mapping, index); > if (!page) { > - ret = -ENOMEM; > - break; > + page_cache_sync_readahead(inode->i_mapping, > + &sctx->ra, NULL, index, > + last_index + 1 - index); > + > + page = find_or_create_page(inode->i_mapping, index, > GFP_NOFS); > + if (unlikely(!page)) { > + ret = -ENOMEM; > + break; > + } > + } > + > + if (PageReadahead(page)) { > + page_cache_async_readahead(inode->i_mapping, > + &sctx->ra, NULL, page, index, > + last_index + 1 - index); > } > > if (!PageUptodate(page)) { > -- > 1.9.1 > -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 03/10] driver:gpu: return -ENOMEM on allocation failure.
Signed-off-by: Allen Pais --- drivers/gpu/drm/gma500/mid_bios.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/gma500/mid_bios.c b/drivers/gpu/drm/gma500/mid_bios.c index d75ecb3..1fa1633 100644 --- a/drivers/gpu/drm/gma500/mid_bios.c +++ b/drivers/gpu/drm/gma500/mid_bios.c @@ -237,7 +237,7 @@ static int mid_get_vbt_data_r10(struct drm_psb_private *dev_priv, u32 addr) gct = kmalloc(sizeof(*gct) * vbt.panel_count, GFP_KERNEL); if (!gct) - return -1; + return -ENOMEM; gct_virtual = ioremap(addr + sizeof(vbt), sizeof(*gct) * vbt.panel_count); -- 2.7.4 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 01/10] arch:powerpc: return -ENOMEM on failed allocation
Signed-off-by: Allen Pais --- arch/powerpc/platforms/cell/spider-pci.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/arch/powerpc/platforms/cell/spider-pci.c b/arch/powerpc/platforms/cell/spider-pci.c index d1e61e2..82aa3f7 100644 --- a/arch/powerpc/platforms/cell/spider-pci.c +++ b/arch/powerpc/platforms/cell/spider-pci.c @@ -106,7 +106,7 @@ static int __init spiderpci_pci_setup_chip(struct pci_controller *phb, dummy_page_va = kmalloc(PAGE_SIZE, GFP_KERNEL); if (!dummy_page_va) { pr_err("SPIDERPCI-IOWA:Alloc dummy_page_va failed.\n"); - return -1; + return -ENOMEM; } dummy_page_da = dma_map_single(phb->parent, dummy_page_va, @@ -137,7 +137,7 @@ int __init spiderpci_iowa_init(struct iowa_bus *bus, void *data) if (!priv) { pr_err("SPIDERPCI-IOWA:" "Can't allocate struct spiderpci_iowa_private"); - return -1; + return -ENOMEM; } if (of_address_to_resource(np, 0, &r)) { -- 2.7.4 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 05/10] drivers:net: return -ENOMEM on allocation failure.
Signed-off-by: Allen Pais --- drivers/net/bonding/bond_alb.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/bonding/bond_alb.c b/drivers/net/bonding/bond_alb.c index c02cc81..89df377 100644 --- a/drivers/net/bonding/bond_alb.c +++ b/drivers/net/bonding/bond_alb.c @@ -864,7 +864,7 @@ static int rlb_initialize(struct bonding *bond) new_hashtbl = kmalloc(size, GFP_KERNEL); if (!new_hashtbl) - return -1; + return -ENOMEM; spin_lock_bh(&bond->mode_lock); -- 2.7.4 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 02/10] drivers:crypto: return -ENOMEM on allocation failure.
Signed-off-by: Allen Pais --- drivers/crypto/omap-aes-gcm.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/crypto/omap-aes-gcm.c b/drivers/crypto/omap-aes-gcm.c index 7d4f8a4..2542224 100644 --- a/drivers/crypto/omap-aes-gcm.c +++ b/drivers/crypto/omap-aes-gcm.c @@ -186,7 +186,7 @@ static int do_encrypt_iv(struct aead_request *req, u32 *tag, u32 *iv) sk_req = skcipher_request_alloc(ctx->ctr, GFP_KERNEL); if (!sk_req) { pr_err("skcipher: Failed to allocate request\n"); - return -1; + return -ENOMEM; } init_completion(&result.completion); -- 2.7.4 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 04/10] drivers:mpt: return -ENOMEM on allocation failure.
Signed-off-by: Allen Pais --- drivers/message/fusion/mptbase.c | 12 ++-- 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/drivers/message/fusion/mptbase.c b/drivers/message/fusion/mptbase.c index 84eab28..7920b2b 100644 --- a/drivers/message/fusion/mptbase.c +++ b/drivers/message/fusion/mptbase.c @@ -4328,15 +4328,15 @@ initChainBuffers(MPT_ADAPTER *ioc) if (ioc->ReqToChain == NULL) { sz = ioc->req_depth * sizeof(int); mem = kmalloc(sz, GFP_ATOMIC); - if (mem == NULL) - return -1; + if (!mem) + return -ENOMEM; ioc->ReqToChain = (int *) mem; dinitprintk(ioc, printk(MYIOC_s_DEBUG_FMT "ReqToChain alloc @ %p, sz=%d bytes\n", ioc->name, mem, sz)); mem = kmalloc(sz, GFP_ATOMIC); - if (mem == NULL) - return -1; + if (!mem) + return -ENOMEM; ioc->RequestNB = (int *) mem; dinitprintk(ioc, printk(MYIOC_s_DEBUG_FMT "RequestNB alloc @ %p, sz=%d bytes\n", @@ -4402,8 +4402,8 @@ initChainBuffers(MPT_ADAPTER *ioc) sz = num_chain * sizeof(int); if (ioc->ChainToChain == NULL) { mem = kmalloc(sz, GFP_ATOMIC); - if (mem == NULL) - return -1; + if (!mem) + return -ENOMEM; ioc->ChainToChain = (int *) mem; dinitprintk(ioc, printk(MYIOC_s_DEBUG_FMT "ChainToChain alloc @ %p, sz=%d bytes\n", -- 2.7.4 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 07/10] driver:megaraid: return -ENOMEM on allocation failure.
Signed-off-by: Allen Pais --- drivers/scsi/megaraid/megaraid_mbox.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/scsi/megaraid/megaraid_mbox.c b/drivers/scsi/megaraid/megaraid_mbox.c index ec3c438..b09a0a6 100644 --- a/drivers/scsi/megaraid/megaraid_mbox.c +++ b/drivers/scsi/megaraid/megaraid_mbox.c @@ -724,8 +724,8 @@ megaraid_init_mbox(adapter_t *adapter) * controllers */ raid_dev = kzalloc(sizeof(mraid_device_t), GFP_KERNEL); - if (raid_dev == NULL) return -1; - + if (!raid_dev) + return -ENOMEM; /* * Attach the adapter soft state to raid device soft state -- 2.7.4 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 06/10] drivers:ethernet: return -ENOMEM on allocation failure.
Signed-off-by: Allen Pais --- drivers/net/ethernet/sun/cassini.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/ethernet/sun/cassini.c b/drivers/net/ethernet/sun/cassini.c index 382993c..fc0ea3a 100644 --- a/drivers/net/ethernet/sun/cassini.c +++ b/drivers/net/ethernet/sun/cassini.c @@ -3984,7 +3984,7 @@ static inline int cas_alloc_rx_desc(struct cas *cp, int ring) size = RX_DESC_RINGN_SIZE(ring); for (i = 0; i < size; i++) { if ((page[i] = cas_page_alloc(cp, GFP_KERNEL)) == NULL) - return -1; + return -ENOMEM; } return 0; } -- 2.7.4 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 08/10] driver:cxgbit: return -NOMEM on allocation failure.
Signed-off-by: Allen Pais --- drivers/target/iscsi/cxgbit/cxgbit_target.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/target/iscsi/cxgbit/cxgbit_target.c b/drivers/target/iscsi/cxgbit/cxgbit_target.c index 514986b..47127d6 100644 --- a/drivers/target/iscsi/cxgbit/cxgbit_target.c +++ b/drivers/target/iscsi/cxgbit/cxgbit_target.c @@ -399,7 +399,7 @@ cxgbit_map_skb(struct iscsi_cmd *cmd, struct sk_buff *skb, u32 data_offset, if (padding) { page = alloc_page(GFP_KERNEL | __GFP_ZERO); if (!page) - return -1; + return -ENOMEM; skb_fill_page_desc(skb, i, page, 0, padding); skb->data_len += padding; skb->len += padding; -- 2.7.4 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 09/10] driver:video: return -ENOMEM on allocation failure.
Signed-off-by: Allen Pais --- drivers/video/fbdev/matrox/matroxfb_base.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/video/fbdev/matrox/matroxfb_base.c b/drivers/video/fbdev/matrox/matroxfb_base.c index f6a0b9a..5cd238d 100644 --- a/drivers/video/fbdev/matrox/matroxfb_base.c +++ b/drivers/video/fbdev/matrox/matroxfb_base.c @@ -2058,7 +2058,7 @@ static int matroxfb_probe(struct pci_dev* pdev, const struct pci_device_id* dumm minfo = kzalloc(sizeof(*minfo), GFP_KERNEL); if (!minfo) - return -1; + return -ENOMEM; minfo->pcidev = pdev; minfo->dead = 0; -- 2.7.4 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 10/10] fs:btrfs: return -ENOMEM on allocation failure.
Signed-off-by: Allen Pais --- fs/btrfs/check-integrity.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/btrfs/check-integrity.c b/fs/btrfs/check-integrity.c index 7d5a9b5..efa4c23 100644 --- a/fs/btrfs/check-integrity.c +++ b/fs/btrfs/check-integrity.c @@ -2913,7 +2913,7 @@ int btrfsic_mount(struct btrfs_fs_info *fs_info, state = kvzalloc(sizeof(*state), GFP_KERNEL); if (!state) { pr_info("btrfs check-integrity: allocation failed!\n"); - return -1; + return -ENOMEM; } if (!btrfsic_is_initialized) { -- 2.7.4 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html