Re: btrfs/git question.
On Tue, Nov 29, 2011 at 10:22 PM, Chris Mason wrote:
> On Tue, Nov 29, 2011 at 09:33:37AM +0700, Fajar A. Nugraha wrote:
>> On Tue, Nov 29, 2011 at 8:58 AM, Phillip Susi wrote:
>> > On 11/28/2011 12:53 PM, Ken D'Ambrosio wrote:
>> >> Seems I've picked up a wireless regression, and randomly drop my WiFi
>> >> connection with more recent kernels.  While I'd love to try to track
>> >> down the issue, the sporadic nature makes it difficult.  But I don't
>> >> want to revert to a flat-out old kernel because of all the btrfs
>> >> modifications.  Is it possible using git to add *just* btrfs patches
>> >> to an older kernel?
>> >
>> > Sure: use git rebase to apply the patches to the older kernel.
>>
>> ... or use 3.1.2, and get ONLY fs/btrfs from Chris' for-linus tree,
>> compile it out-of-tree, and use it to replace the original btrfs.ko.
>
> If you're on a 3.1 kernel, you can pull my for-linus directly on top of
> it with git pull.  I always keep a btrfs tree against the previous
> kernel so that people can use the latest btrfs goodness without having
> to use an rc kernel.

Yes, thanks for that. My suggestion is simply an alternative (instead
of git pull) for people who:
- aren't quite familiar with git, but know enough to grab a directory
  snapshot from gitweb (e.g.
  http://git.kernel.org/?p=linux/kernel/git/mason/linux-btrfs.git;a=tree;f=fs/btrfs;h=5f51bd7e3b8b6c4825681408450e6580bdbccce1;hb=refs/heads/for-linus)
- know how to build a module out-of-tree
- are on the latest stable, but don't want to re-compile the whole
  kernel just to get a btrfs fix

--
Fajar
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/2] Btrfs: set the i_nlink to 2 for an initial dir inode
On 11/29/2011 11:48 PM, Chris Mason wrote:
> On Tue, Nov 29, 2011 at 02:04:37PM +0800, Jeff Liu wrote:
>> Please ignore this patch for now, it can leave the file system corrupted
>> and unable to mount again, sorry for the noise!
>
> Directories always have a link count of 1 in btrfs.  This tells find not
> to use the link count as the count of subdirectories in the directory.

Thank you for your clarification!

-Jeff

>
> -chris
[RFC][PATCH] Sector Size check during Mount
Gracefully fail when trying to mount a BTRFS file system that has a
sectorsize smaller than PAGE_SIZE.

On PPC it is possible to build a FS while using a 4k PAGE_SIZE kernel
and then boot into a 64K PAGE_SIZE kernel.  Presently open_ctree fails in
an endless loop and hangs the machine in this situation.

My debugging has shown this Sector size < Page size case to be a
non-trivial situation, and a graceful exit from it would be nice for the
time being.

Signed-off-by: Keith Mannthey
---

diff -urN a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
--- a/fs/btrfs/disk-io.c 2011-10-09 21:53:11.0 -0500
+++ b/fs/btrfs/disk-io.c 2011-11-29 17:33:15.0 -0600
@@ -1916,6 +1916,12 @@
                 goto fail_sb_buffer;
         }
 
+        if (sectorsize < PAGE_SIZE) {
+                printk(KERN_WARNING "btrfs: Incompatible sector size "
+                       "found on %s\n", sb->s_id);
+                goto fail_sb_buffer;
+        }
+
         mutex_lock(&fs_info->chunk_mutex);
         ret = btrfs_read_sys_array(tree_root);
         mutex_unlock(&fs_info->chunk_mutex);
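The check proposed in the patch above comes down to one comparison made at mount time. A minimal userspace sketch in Python, using the 4K and 64K page sizes from the PPC scenario in the message; `can_mount` is a hypothetical helper written for illustration, not a btrfs or kernel function:

```python
def can_mount(sectorsize: int, page_size: int) -> bool:
    """Mirror the patch's test: refuse the mount when sectorsize < PAGE_SIZE."""
    return sectorsize >= page_size


# Filesystem created under a 4K-page kernel (assumed sectorsize of 4096).
FS_SECTORSIZE = 4 * 1024

for page_size in (4 * 1024, 64 * 1024):
    verdict = "mount" if can_mount(FS_SECTORSIZE, page_size) else "refuse"
    print(f"PAGE_SIZE={page_size}: {verdict}")
```

The sketch only models the decision itself; the patch additionally logs a warning and jumps to the existing `fail_sb_buffer` cleanup path.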
Re: [PATCH] Btrfs: fix submit_worker congestion
On Tue, Nov 29, 2011 at 09:40:56PM +0100, Arne Jansen wrote:
> Write bios are submitted from the submit_worker. The worker pumps down
> bios into the block layer until it signals a congestion. At least this
> is the theory. In practice submit_bio just blocks before any signalling
> happens. As the bios are queued per device, this can lead to a situation
> where only one device is served until all bios are submitted, and only
> then the next device is served. This is obviously suboptimal.
> This patch just throws out the congestion detection and reschedules the
> worker every 8 requests. This way, all devices can be kept busy.
> This is only a temporary fix until the block layer provides a non-blocking
> submit_bio. Then the whole submit_worker mechanism can be killed.

The problem with the every 8 requests logic is that we've still got a
pretty good chance of getting stuck behind get_request_wait.

The way the elevator batching works is that it should give us a batch of
requests, and once that batch is done we wait.  If we jump around every
8 requests, we've turned this:

[ dev A bio 1-8, dev A bio 8-16, dev A bio 16-32, dev B bio 1-8, dev B ... ]

into:

[ dev A bio 1-8, dev B bio 1-8, dev A bio 8-16, dev B bio 8-16 ]

They look like the same IO, but if we wait for a request when we do (dev
B bio 1-8) then our dev A bio 1-8 bio is likely to dispatch without all
the other dev A bios we had queued.

As you said in IRC, we'd be better off with one thread per device or (my
preference) with a real non-blocking submit_bio.

What kind of results did you get with your test from bumping the
nr_requests?

-chris
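The two submission orders contrasted in the reply above can be generated with a few lines of Python. This is only an illustration of the ordering, not of the block layer: the function names and the queue sizes (24 bios for device A, 16 for device B) are made up for the example, and the batches are written as non-overlapping ranges.

```python
from itertools import zip_longest

BATCH = 8  # the patch reschedules the worker after this many requests


def batches(dev, nr_bios, batch=BATCH):
    """Split one device's queued bios into (dev, first, last) batches."""
    return [(dev, lo, min(lo + batch - 1, nr_bios))
            for lo in range(1, nr_bios + 1, batch)]


def drain_one_device_at_a_time(queues):
    """Serve every batch of one device before moving to the next device."""
    return [b for dev, n in queues.items() for b in batches(dev, n)]


def switch_every_batch(queues):
    """Move on to the next device after each batch of BATCH requests."""
    per_dev = [batches(dev, n) for dev, n in queues.items()]
    return [b for rnd in zip_longest(*per_dev) for b in rnd if b is not None]


queues = {"A": 24, "B": 16}  # hypothetical number of pending bios per device
print(drain_one_device_at_a_time(queues))
print(switch_every_batch(queues))
```

Both functions emit the same set of batches; only the order differs, which is the point of the objection: a wait that happens while submitting to device B separates device A's first batch from the rest of device A's queue.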
Re: [PATCH 02/20] Btrfs: initialize new bitmaps' list
2011/11/28 Alexandre Oliva :
> We're failing to create clusters with bitmaps because
> setup_cluster_no_bitmap checks that the list is empty before inserting
> the bitmap entry in the list for setup_cluster_bitmap, but the list
> field is only initialized when it is restored from the on-disk free
> space cache, or when it is written out to disk.
>
> Besides a potential race condition due to the multiple use of the list
> field, filesystem performance severely degrades over time: as we use
> up all non-bitmap free extents, the try-to-set-up-cluster dance is
> done at every metadata block allocation.  For every block group, we
> fail to set up a cluster, and after failing on them all up to twice,
> we fall back to the much slower unclustered allocation.

This matches exactly what I've been observing in our ceph cluster.
I've now installed your patches (1-11) on two servers.
The cluster setup problem seems to be gone. - A big thanks for that!

However, another thing is giving me a headache:
when I'm doing heavy reading in our ceph cluster, the load and wait-io
on the patched servers are higher than on the unpatched ones.

Dstat from an unpatched server:

----total-cpu-usage---- -dsk/total- -net/total- ---paging-- ---system--
usr sys idl wai hiq siq| read  writ| recv  send|  in   out | int   csw
  1   6  83   8   0   1|  22M  348k| 336k   93M|   0     0 |8445  3715
  1   5  87   7   0   1|  12M 1808k| 214k   65M|   0     0 |5461  1710
  1   3  85  10   0   0|  11M  640k| 313k   49M|   0     0 |5919  2853
  1   6  84   9   0   1|  12M  608k| 358k   69M|   0     0 |7406  3645
  1   7  78  13   0   1|  15M 5344k| 348k  105M|   0     0 |9765  4403
  1   7  80  10   0   1|  22M 1368k| 358k   89M|   0     0 |8036  3202
  1   9  72  16   0   1|  22M 2424k| 646k  137M|   0     0 | 12k  5527

Dstat from a patched server:

----total-cpu-usage---- -dsk/total- -net/total- ---paging-- ---system--
usr sys idl wai hiq siq| read  writ| recv  send|  in   out | int   csw
  1   2  61  35   0   0|2500k 2736k| 141k   34M|   0     0 |4415  1603
  1   4  48  47   0   1|  10M 3924k| 353k   61M|   0     0 |6871  3771
  1   5  55  38   0   1|  10M 1728k| 385k   92M|   0     0 |8030  2617
  2   8  69  20   0   1|  18M 1384k| 435k  130M|   0     0 | 10k  4493
  1   5  85   8   0   1|7664k   84k| 287k   97M|   0     0 |6231  1357
  1   3  91   5   0   0|  10M  144k| 194k   44M|   0     0 |3807  1081
  1   7  66  25   0   1|  20M 1248k| 404k  101M|   0     0 |8676  3632
  0   3  38  58   0   0|8104k 2660k| 176k   40M|   0     0 |4841  2093

This seems to be coming from "btrfs-endio-1", a kernel thread that has
not caught my attention on unpatched systems yet.

I did some tracing on that process with ftrace and I can see that the
time is wasted in end_bio_extent_readpage(). In a single call to
end_bio_extent_readpage() the functions unlock_extent_cached(),
unlock_page() and btrfs_readpage_end_io_hook() are invoked 128 times
(each).

Do you have any idea what's going on here?

(Please note that the filesystem is still unmodified - metadata
overhead is large).

Thanks,
Christian
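To put a number on "higher", the `wai` column of the two dstat listings above can be averaged. A small sketch in Python; the lists are copied by hand from the listings, and with only seven and eight samples taken at different moments the result is a rough indication, not a measurement:

```python
from statistics import mean

# "wai" (wait-io %) column copied from the two dstat listings in the message
wai_unpatched = [8, 7, 10, 9, 13, 10, 16]
wai_patched = [35, 47, 38, 20, 8, 5, 25, 58]

avg_unpatched = mean(wai_unpatched)
avg_patched = mean(wai_patched)

print(f"unpatched: {avg_unpatched:.1f}%  patched: {avg_patched:.1f}%")
```

On these samples the patched server spends roughly three times as much time in wait-io as the unpatched one, which matches the impression described in the message.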
[PATCH] Btrfs: fix submit_worker congestion
Write bios are submitted from the submit_worker. The worker pumps down
bios into the block layer until it signals a congestion. At least this
is the theory. In practice submit_bio just blocks before any signalling
happens. As the bios are queued per device, this can lead to a situation
where only one device is served until all bios are submitted, and only
then the next device is served. This is obviously suboptimal.
This patch just throws out the congestion detection and reschedules the
worker every 8 requests. This way, all devices can be kept busy.
This is only a temporary fix until the block layer provides a non-blocking
submit_bio. Then the whole submit_worker mechanism can be killed.

Signed-off-by: Arne Jansen
---
 fs/btrfs/volumes.c |   30 +-----------------------------
 1 files changed, 1 insertions(+), 29 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index c37433d..5b01742 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -257,36 +257,8 @@ loop_lock:
                  * is now congested.  Back off and let other work structs
                  * run instead
                  */
-                if (pending && bdi_write_congested(bdi) && batch_run > 8 &&
+                if (pending && batch_run > 8 &&
                     fs_info->fs_devices->open_devices > 1) {
-                        struct io_context *ioc;
-
-                        ioc = current->io_context;
-
-                        /*
-                         * the main goal here is that we don't want to
-                         * block if we're going to be able to submit
-                         * more requests without blocking.
-                         *
-                         * This code does two great things, it pokes into
-                         * the elevator code from a filesystem _and_
-                         * it makes assumptions about how batching works.
-                         */
-                        if (ioc && ioc->nr_batch_requests > 0 &&
-                            time_before(jiffies, ioc->last_waited + HZ/50UL) &&
-                            (last_waited == 0 ||
-                             ioc->last_waited == last_waited)) {
-                                /*
-                                 * we want to go through our batch of
-                                 * requests and stop.  So, we copy out
-                                 * the ioc->last_waited time and test
-                                 * against it before looping
-                                 */
-                                last_waited = ioc->last_waited;
-                                if (need_resched())
-                                        cond_resched();
-                                continue;
-                        }
                         spin_lock(&device->io_lock);
                         requeue_list(pending_bios, pending, tail);
                         device->running_pending = 1;
--
1.7.3.4
Re: kernel BUG at /build/buildd/linux-3.2.0/fs/btrfs/extent-tree.c:4816!
On Tue, Nov 29, 2011 at 05:47:46PM +0100, David Sterba wrote:
> On Tue, Nov 29, 2011 at 10:49:13AM -0500, Chris Mason wrote:
> > The good news about this one is that it is very clear cut.  The hard
> > part is figuring out where these bogus link counts came from.
> >
> > I'd suggest that you spend some time running memtest on the machine.
>
> Just to add some evidence from the log:
>
> Nov 28 00:11:14 karl-workstation kernel: [212918.235050] kernel BUG at /home/apw/COD/linux/fs/btrfs/extent-tree.c:4775!
> Nov 28 00:11:14 karl-workstation kernel: [212918.235118] RAX: ea01 RBX: 880412c3ab40 RCX: 880380173900
>
>
> 4765                 ret = btrfs_search_slot(trans, extent_root,
> 4766                                         &key, path, -1, 1);
> 4767                 if (ret) {
> 4768                         printk(KERN_ERR "umm, got %d back from search"
> 4769                                ", was looking for %llu\n", ret,
> 4770                                (unsigned long long)bytenr);
> 4771                         if (ret > 0)
> 4772                                 btrfs_print_leaf(extent_root,
> 4773                                                  path->nodes[0]);
> 4774                 }
> 4775                 BUG_ON(ret);
>
> the ret value comes from btrfs_search_slot, returning " < 0" or 1, but
> RAX has some extra bits set, this could really be a RAM failure.
>
>
> david

Interesting, look at this:

> karl@karl-precise:~/git/btrfs-progs$ sudo ./btrfsck /dev/md0
> ref mismatch on [2176962560 8192] extent item 480, found 1
> Incorrect local backref count on 2176970752 root 5 owner 2101705
> offset 368640 found 1 wanted 3925868545
> backpointer mismatch on [2176970752 4096]

3925868545 == 0xEA000001

Are you sure this is the BUG_ON he was triggering?

-chris
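The observation in the reply above can be checked with plain integer arithmetic. The sketch below, in Python, converts the "wanted" backref count from the btrfsck output to hex and lists the bit positions in which it differs from the "found" count of 1. The register dump quoted in the thread shows RAX as `ea01` in this archive copy; that would be consistent with the same value if runs of zeros were lost in extraction, but that reading is an assumption.

```python
wanted = 3925868545  # "wanted" backref count reported by btrfsck
found = 1            # "found" count reported by btrfsck

print(hex(wanted))  # 0xea000001

# XOR leaves a 1 in every bit position where the two values differ.
flipped = wanted ^ found
bits = [i for i in range(64) if (flipped >> i) & 1]
print(hex(flipped), bits)
```

Five bits differ, all within one byte (bits 24 to 31), while the low bits still encode the value 1. The arithmetic alone does not say how those bits got set; bad RAM is the hypothesis raised in the thread.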
[PATCH] btrfs: simplify move_pages and copy_pages
After commit a65917156e34594 ("Btrfs: stop using highmem for extent_buffers")
we don't need to kmap_atomic anymore and can simplify both functions.

Signed-off-by: David Sterba
---
 fs/btrfs/extent_io.c |   19 ++++---------------
 1 files changed, 4 insertions(+), 15 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 9472d3d..9e04d9b 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -4238,16 +4238,9 @@ static void move_pages(struct page *dst_page, struct page *src_page,
                        unsigned long len)
 {
         char *dst_kaddr = page_address(dst_page);
-        if (dst_page == src_page) {
-                memmove(dst_kaddr + dst_off, dst_kaddr + src_off, len);
-        } else {
-                char *src_kaddr = page_address(src_page);
-                char *p = dst_kaddr + dst_off + len;
-                char *s = src_kaddr + src_off + len;
+        char *src_kaddr = page_address(src_page);
 
-                while (len--)
-                        *--p = *--s;
-        }
+        memmove(dst_kaddr + dst_off, src_kaddr + src_off, len);
 }
 
 static inline bool areas_overlap(unsigned long src, unsigned long dst, unsigned long len)
@@ -4261,14 +4254,10 @@ static void copy_pages(struct page *dst_page, struct page *src_page,
                        unsigned long len)
 {
         char *dst_kaddr = page_address(dst_page);
-        char *src_kaddr;
+        char *src_kaddr = page_address(src_page);
 
-        if (dst_page != src_page) {
-                src_kaddr = page_address(src_page);
-        } else {
-                src_kaddr = dst_kaddr;
+        if (dst_page == src_page)
                 BUG_ON(areas_overlap(src_off, dst_off, len));
-        }
 
         memcpy(dst_kaddr + dst_off, src_kaddr + src_off, len);
 }
--
1.7.7.3
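The patch above keeps memmove for move_pages and guards the memcpy in copy_pages with an overlap check when both offsets are in the same page. The reason can be shown in a userspace sketch in Python. The names mirror the patch, but the bodies are illustrative assumptions: `areas_overlap` is written as a distance test, `forward_copy` stands in for a plain front-to-back byte copy (one way a memcpy may behave), and `move` stands in for memmove.

```python
def areas_overlap(src: int, dst: int, length: int) -> bool:
    """True when [src, src+length) and [dst, dst+length) share any byte."""
    return abs(src - dst) < length


def forward_copy(buf: bytearray, dst: int, src: int, length: int) -> None:
    """Front-to-back byte copy; not safe for overlapping ranges with dst > src."""
    for i in range(length):
        buf[dst + i] = buf[src + i]


def move(buf: bytearray, dst: int, src: int, length: int) -> None:
    """memmove-like copy: snapshot the source before writing."""
    buf[dst:dst + length] = bytes(buf[src:src + length])


a = bytearray(b"abcdef")
move(a, 2, 0, 4)           # overlapping ranges, handled correctly
print(a)

b = bytearray(b"abcdef")
forward_copy(b, 2, 0, 4)   # same request, source gets overwritten mid-copy
print(b)
```

With offsets 0 and 2 and a length of 4 the ranges overlap, so the two copies give different results, which is the situation the BUG_ON in copy_pages is meant to catch.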
Re: kernel BUG at /build/buildd/linux-3.2.0/fs/btrfs/extent-tree.c:4816!
On Tue, Nov 29, 2011 at 10:49:13AM -0500, Chris Mason wrote:
> The good news about this one is that it is very clear cut.  The hard
> part is figuring out where these bogus link counts came from.
>
> I'd suggest that you spend some time running memtest on the machine.

Just to add some evidence from the log:

Nov 28 00:11:14 karl-workstation kernel: [212918.235050] kernel BUG at /home/apw/COD/linux/fs/btrfs/extent-tree.c:4775!
Nov 28 00:11:14 karl-workstation kernel: [212918.235118] RAX: ea01 RBX: 880412c3ab40 RCX: 880380173900


4765                 ret = btrfs_search_slot(trans, extent_root,
4766                                         &key, path, -1, 1);
4767                 if (ret) {
4768                         printk(KERN_ERR "umm, got %d back from search"
4769                                ", was looking for %llu\n", ret,
4770                                (unsigned long long)bytenr);
4771                         if (ret > 0)
4772                                 btrfs_print_leaf(extent_root,
4773                                                  path->nodes[0]);
4774                 }
4775                 BUG_ON(ret);

the ret value comes from btrfs_search_slot, returning " < 0" or 1, but
RAX has some extra bits set, this could really be a RAM failure.


david
Re: [PATCH 0/5] fix bugs of sub transid -- WARNING: at fs/btrfs/ctree.c:432
On Tue, Nov 29, 2011 at 09:18:35AM +0800, Liu Bo wrote:
> a) For the first one (last_snapshot bug),
>
> The test involves three processes (derived from Chris):
>
> mkfs.btrfs /dev/xxx
> mount /dev/xxx /mnt
>
> 1) run compilebench -i 30 --makej -D /mnt
>
> Let compilebench run until it starts the create phase.
>
> 2) run synctest -f -u -n 200 -t 3 /mnt
> 3) for x in `seq 1 200` ; do btrfs subvol snap /mnt /mnt/snap$x ; sleep 0.5 ; done

I have hit the following 2 warnings during this test. Phase 1 was at the
compile stage, 2 and 3 were running. I did not see them during the first
run, and the other activity on the filesystem was 'du -sh /mnt'.

mount options: compress-force=lzo,discard,space_cache,autodefrag,inode_cache

Label: none  uuid: 79f4160b-81f8-46ed-968c-968cb17a2e87
        Total devices 4 FS bytes used 7.76GB
        devid 4 size 13.96GB used 2.26GB path /dev/sdb4
        devid 3 size 13.96GB used 2.26GB path /dev/sdb3
        devid 2 size 13.96GB used 3.00GB path /dev/sdb2
        devid 1 size 13.96GB used 3.02GB path /dev/sdb1

fresh and default mkfs

 430         WARN_ON(root->ref_cows && trans->transaction->transid !=
 431                 root->fs_info->running_transaction->transid);
 432         WARN_ON(root->ref_cows && trans->transid < root->last_trans);

[20433.473713] ------------[ cut here ]------------
[20433.478825] WARNING: at fs/btrfs/ctree.c:432 __btrfs_cow_block+0x429/0x5e0 [btrfs]()
[20433.487148] Hardware name: Santa Rosa platform
[20433.487150] Modules linked in: btrfs aoe sr_mod ide_cd_mod cdrom loop [last unloaded: btrfs]
[20433.487162] Pid: 12099, comm: btrfs Tainted: GW 3.1.0-default+ #80
[20433.487165] Call Trace:
[20433.487174] [] warn_slowpath_common+0x7f/0xc0
[20433.487179] [] warn_slowpath_null+0x1a/0x20
[20433.487190] [] __btrfs_cow_block+0x429/0x5e0 [btrfs]
[20433.487196] [] ? trace_hardirqs_off_caller+0x29/0xc0
[20433.487201] [] ? lock_release_holdtime+0x3d/0x1c0
[20433.487218] [] ? btrfs_set_lock_blocking_rw+0x50/0xb0 [btrfs]
[20433.487230] [] btrfs_cow_block+0x1a6/0x3d0 [btrfs]
[20433.487236] [] ? _raw_write_unlock+0x2b/0x50
[20433.487247] [] btrfs_search_slot+0x300/0xd20 [btrfs]
[20433.487262] [] btrfs_lookup_inode+0x2f/0xa0 [btrfs]
[20433.487279] [] btrfs_update_inode_item+0x66/0x120 [btrfs]
[20433.487296] [] btrfs_update_inode+0xab/0xc0 [btrfs]
[20433.487313] [] ? lookup_free_ino_inode+0x51/0xe0 [btrfs]
[20433.487327] [] btrfs_save_ino_cache+0x145/0x2f0 [btrfs]
[20433.487342] [] ? commit_fs_roots+0xa4/0x1c0 [btrfs]
[20433.487357] [] commit_fs_roots+0xd4/0x1c0 [btrfs]
[20433.487373] [] btrfs_commit_transaction+0x454/0x900 [btrfs]
[20433.487378] [] ? lock_release_holdtime+0x3d/0x1c0
[20433.487395] [] ? btrfs_mksubvol+0x298/0x360 [btrfs]
[20433.487400] [] ? wake_up_bit+0x40/0x40
[20433.487405] [] ? do_raw_spin_unlock+0x5e/0xb0
[20433.487421] [] btrfs_mksubvol+0x358/0x360 [btrfs]
[20433.487427] [] ? might_fault+0x53/0xb0
[20433.487443] [] btrfs_ioctl_snap_create_transid+0x100/0x160 [btrfs]
[20433.487448] [] ? might_fault+0x53/0xb0
[20433.487464] [] btrfs_ioctl_snap_create_v2.clone.0+0xfd/0x110 [btrfs]
[20433.487482] [] btrfs_ioctl+0x588/0x1080 [btrfs]
[20433.487487] [] ? do_page_fault+0x2d0/0x580
[20433.487492] [] ? local_clock+0x6f/0x80
[20433.487498] [] do_vfs_ioctl+0x98/0x560
[20433.487502] [] ? retint_swapgs+0x13/0x1b
[20433.487507] [] sys_ioctl+0x4f/0x80
[20433.487512] [] system_call_fastpath+0x16/0x1b
[20433.487515] ---[ end trace d93007cf8d0a8eac ]---
[20433.487576] ------------[ cut here ]------------
[20433.487587] WARNING: at fs/btrfs/ctree.c:432 __btrfs_cow_block+0x429/0x5e0 [btrfs]()
[20433.487590] Hardware name: Santa Rosa platform
[20433.487592] Modules linked in: btrfs aoe sr_mod ide_cd_mod cdrom loop [last unloaded: btrfs]
[20433.487601] Pid: 12099, comm: btrfs Tainted: GW 3.1.0-default+ #80
[20433.487603] Call Trace:
[20433.487608] [] warn_slowpath_common+0x7f/0xc0
[20433.487613] [] warn_slowpath_null+0x1a/0x20
[20433.487623] [] __btrfs_cow_block+0x429/0x5e0 [btrfs]
[20433.487628] [] ? trace_hardirqs_off_caller+0x29/0xc0
[20433.487633] [] ? lock_release_holdtime+0x3d/0x1c0
[20433.487649] [] ? btrfs_set_lock_blocking_rw+0x50/0xb0 [btrfs]
[20433.487660] [] btrfs_cow_block+0x1a6/0x3d0 [btrfs]
[20433.487665] [] ? _raw_write_unlock+0x2b/0x50
[20433.487676] [] btrfs_search_slot+0x300/0xd20 [btrfs]
[20433.487691] [] btrfs_lookup_inode+0x2f/0xa0 [btrfs]
[20433.487707] [] btrfs_update_inode_item+0x66/0x120 [btrfs]
[20433.487723] [] btrfs_update_inode+0xab/0xc0 [btrfs]
[20433.487739] [] ? lookup_free_ino_inode+0x51/0xe0 [btrfs]
[20433.487753] [] btrfs_save_ino_cache+0x145/0x2f0 [btrfs]
[20433.487769] [] ? commit_fs_roots+0xa4/0x1c0 [btrfs]
[20433.487784] [] commit_fs_roots+0xd4/0x1c0 [btrfs]
[20433.487800] [] btrfs_commit_transaction+0x454/0x900 [btrfs]
[20433.487805] [] ? lock_release_holdtime+0x3d/0x1c0
[20433.487821] [] ? btrfs_mksubvol+0x298/0x360 [btrfs]
[20433.487826]
Re: [PATCH 2/2] Btrfs: set the i_nlink to 2 for an initial dir inode
On 29.11.2011 16:48, Chris Mason wrote:
> On Tue, Nov 29, 2011 at 02:04:37PM +0800, Jeff Liu wrote:
>> Please ignore this patch for now, it can leave the file system corrupted
>> and unable to mount again, sorry for the noise!
>
> Directories always have a link count of 1 in btrfs.  This tells find not
> to use the link count as the count of subdirectories in the directory.

I'm surprised. Now I see why my thread "Creation of pseudo items leads
to (seemingly) duplicate inodes (BUG inside)" suffered from little
attention :-)

-Jan
Re: [PATCH] fs: push file_update_time into ->page_mkwrite
On Tue, Nov 29, 2011 at 04:50:20PM +0100, Jan Kara wrote:
> On Tue 29-11-11 10:40:59, Josef Bacik wrote:
> > The fault code has been calling file_update_time after ->page_mkwrite after it
> > drops the page lock, but this is annoying because this calls mark_inode_dirty
> > which can fail in Btrfs, so we want to be able to do these updates in
> > ->page_mkwrite so we can get an error back to the user.  So get rid of the
> > file_update_time calls in the fault code and push it into everybody who has a
> > ->page_mkwrite.  I didn't do this for ubifs because it appears that ubifs
> > already updates the time itself in ->page_mkwrite, presumably for the same
> > reasons as btrfs, so I left it as is.  Thanks,
>   But this effectively disables atime updates on mmaped writes for ext2,
> ext3, and similar filesystems which is a no-go IMHO.
>

Heh doh you're right, I have vacation brain.  Thanks,

Josef
Re: kernel BUG at /build/buildd/linux-3.2.0/fs/btrfs/extent-tree.c:4816!
On Tue, Nov 29, 2011 at 04:29:54PM +0100, Karl Mardoff Kittilsen wrote:
> On 29 Nov 2011 16:12, Chris Mason wrote:
> >On Tue, Nov 29, 2011 at 02:39:26AM +0100, Karl Mardoff Kittilsen wrote:
> >>Hi!
> >>
> >>Sending a mail on this issue, as advised on IRC.
> >>
> >>My /home file system fails to mount and the kernel seems to freeze
> >>and I need to do the Alt+SysRq RSNEIUB routine to boot it safely.
> >>The corruption happened on a 3.2-rc kernel and Ubuntu
> >>11.10, but I am now running on Ubuntu 12.04 with the 3.2.0-2-generic
> >>kernel to see if that helped; it did not.
> >>btrfsck from the latest btrfs-tools returns:
> >>
> >>karl@karl-precise:~/git/btrfs-progs$ sudo ./btrfsck /dev/md0
> >>ref mismatch on [2176962560 8192] extent item 480, found 1
> >>Incorrect local backref count on 2176970752 root 5 owner 2101705
> >>offset 368640 found 1 wanted 3925868545
> >>backpointer mismatch on [2176970752 4096]
> >
> >So the crashes below were because we tried to free one of these extents.
> >You have two extents whose reference counts are way off.
> >
> >Unfortunately this is stored on disk, so different kernels aren't going
> >to fix it (yet).  One of the extents is in a file with inode number
> >2101705, and the other is in a btree block (2176962560).
> >
> >I'll be able to fix this soon, but we can also make a patch that changes
> >those BUG_ONs to just deal with the mismatch.  The worst case here would
> >be leaking those two extents, about 12K of data.
> >
> >-chris
>
> Thank you for looking into it, and that does sound really
> promising. I am available to test any patches you want tested. Is
> there anything else that I can do to help get this issue fixed?

The good news about this one is that it is very clear cut.  The hard
part is figuring out where these bogus link counts came from.

I'd suggest that you spend some time running memtest on the machine.

-chris
Re: [PATCH] fs: push file_update_time into ->page_mkwrite
On Tue 29-11-11 10:40:59, Josef Bacik wrote:
> The fault code has been calling file_update_time after ->page_mkwrite after it
> drops the page lock, but this is annoying because this calls mark_inode_dirty
> which can fail in Btrfs, so we want to be able to do these updates in
> ->page_mkwrite so we can get an error back to the user.  So get rid of the
> file_update_time calls in the fault code and push it into everybody who has a
> ->page_mkwrite.  I didn't do this for ubifs because it appears that ubifs
> already updates the time itself in ->page_mkwrite, presumably for the same
> reasons as btrfs, so I left it as is.  Thanks,
  But this effectively disables atime updates on mmaped writes for ext2,
ext3, and similar filesystems which is a no-go IMHO.

                                                                Honza
>
> Signed-off-by: Josef Bacik
> ---
>  fs/9p/vfs_file.c             |    1 +
>  fs/btrfs/inode.c             |    1 +
>  fs/buffer.c                  |    1 +
>  fs/ceph/addr.c               |    1 +
>  fs/cifs/file.c               |    1 +
>  fs/ext4/inode.c              |    1 +
>  fs/fuse/file.c               |    1 +
>  fs/gfs2/file.c               |    1 +
>  fs/nfs/file.c                |    1 +
>  fs/nilfs2/file.c             |    1 +
>  fs/ocfs2/mmap.c              |    1 +
>  fs/sysfs/bin.c               |    1 +
>  kernel/events/core.c         |    1 +
>  mm/memory.c                  |    8 --------
>  security/selinux/selinuxfs.c |    1 +
>  15 files changed, 14 insertions(+), 8 deletions(-)
>
> diff --git a/fs/9p/vfs_file.c b/fs/9p/vfs_file.c
> index 62857a8..ae2968f 100644
> --- a/fs/9p/vfs_file.c
> +++ b/fs/9p/vfs_file.c
> @@ -610,6 +610,7 @@ v9fs_vm_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf)
>          P9_DPRINTK(P9_DEBUG_VFS, "page %p fid %lx\n",
>                     page, (unsigned long)filp->private_data);
>
> +        file_update_time(filp);
>          v9inode = V9FS_I(inode);
>          /* make sure the cache has finished storing the page */
>          v9fs_fscache_wait_on_page_write(inode, page);
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index e16215f..c272b91 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -6313,6 +6313,7 @@ int btrfs_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf)
>          }
>
>          ret = VM_FAULT_NOPAGE; /* make the VM retry the fault */
> +        file_update_time(vma->vm_file);
>  again:
>          lock_page(page);
>          size = i_size_read(inode);
> diff --git a/fs/buffer.c b/fs/buffer.c
> index 1a80b04..c949a11 100644
> --- a/fs/buffer.c
> +++ b/fs/buffer.c
> @@ -2347,6 +2347,7 @@ int __block_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf,
>          loff_t size;
>          int ret;
>
> +        file_update_time(vma->vm_file);
>          lock_page(page);
>          size = i_size_read(inode);
>          if ((page->mapping != inode->i_mapping) ||
> diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c
> index 5a3953d..1cf89aa 100644
> --- a/fs/ceph/addr.c
> +++ b/fs/ceph/addr.c
> @@ -1137,6 +1137,7 @@ static int ceph_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf)
>          dout("page_mkwrite %p %llu~%llu page %p idx %lu\n", inode,
>               off, len, page, page->index);
>
> +        file_update_time(vma->vm_file);
>          lock_page(page);
>
>          ret = VM_FAULT_NOPAGE;
> diff --git a/fs/cifs/file.c b/fs/cifs/file.c
> index 9f41a10..410b11c 100644
> --- a/fs/cifs/file.c
> +++ b/fs/cifs/file.c
> @@ -1910,6 +1910,7 @@ cifs_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf)
>  {
>          struct page *page = vmf->page;
>
> +        file_update_time(vma->vm_file);
>          lock_page(page);
>          return VM_FAULT_LOCKED;
>  }
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index 986e238..e995f2c 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -4372,6 +4372,7 @@ int ext4_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf)
>                  goto out_ret;
>          }
>
> +        file_update_time(vma->vm_file);
>          lock_page(page);
>          size = i_size_read(inode);
>          /* Page got truncated from under us? */
> diff --git a/fs/fuse/file.c b/fs/fuse/file.c
> index 594f07a..4f92651 100644
> --- a/fs/fuse/file.c
> +++ b/fs/fuse/file.c
> @@ -1323,6 +1323,7 @@ static int fuse_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf)
>           */
>          struct inode *inode = vma->vm_file->f_mapping->host;
>
> +        file_update_time(vma->vm_file);
>          fuse_wait_on_page_writeback(inode, page->index);
>          return 0;
>  }
> diff --git a/fs/gfs2/file.c b/fs/gfs2/file.c
> index edeb9e8..ba22704 100644
> --- a/fs/gfs2/file.c
> +++ b/fs/gfs2/file.c
> @@ -359,6 +359,7 @@ static int gfs2_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf)
>          struct gfs2_alloc *al;
>          int ret;
>
> +        file_update_time(vma->vm_file);
>          gfs2_holder_init(ip->i_gl, LM_ST_EXCLUSIVE, 0, &gh);
>          ret = gfs2_glock_nq(&gh);
Re: [PATCH 2/2] Btrfs: set the i_nlink to 2 for an initial dir inode
On Tue, Nov 29, 2011 at 02:04:37PM +0800, Jeff Liu wrote:
> Please ignore this patch for now, it can leave the file system corrupted
> and unable to mount again, sorry for the noise!

Directories always have a link count of 1 in btrfs.  This tells find not
to use the link count as the count of subdirectories in the directory.

-chris
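The link-count convention referred to above can be written down as a small heuristic. On traditional Unix filesystems a directory's link count is 2 plus its number of subdirectories, so a tool can stop looking for subdirectories once it has seen `nlink - 2` of them; a count below 2 signals that the convention does not hold and nothing can be inferred. The sketch below, in Python, is an illustration of that rule, not find's actual code, and both function names are hypothetical:

```python
import os
from typing import Optional


def subdir_count_hint(nlink: int) -> Optional[int]:
    """Return nlink - 2 when the classic convention can apply, else None."""
    if nlink < 2:
        return None  # e.g. btrfs directories, which report a link count of 1
    return nlink - 2


def hint_for(path: str) -> Optional[int]:
    """Apply the heuristic to a real directory via its st_nlink field."""
    return subdir_count_hint(os.stat(path).st_nlink)


print(subdir_count_hint(1))  # None: the link count carries no information
print(subdir_count_hint(5))  # 3: three subdirectories under the convention
```

Under this rule a directory created with a link count of 2 would be read as "no subdirectories", while a count of 1 makes the tool fall back to examining every entry.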
[PATCH] fs: push file_update_time into ->page_mkwrite
The fault code has been calling file_update_time after ->page_mkwrite after it drops the page lock, but this is annoying because this calls mark_inode_dirty which can fail in Btrfs, so we want to be able to do these updates in ->page_mkwrite so we can get an error back to the user. So get rid of the file_update_time calls in the fault code and push them into everybody who has a ->page_mkwrite. I didn't do this for ubifs because it appears that ubifs already updates the time itself in ->page_mkwrite, presumably for the same reasons as btrfs, so I left it as is. Thanks,

Signed-off-by: Josef Bacik
---
 fs/9p/vfs_file.c             |    1 +
 fs/btrfs/inode.c             |    1 +
 fs/buffer.c                  |    1 +
 fs/ceph/addr.c               |    1 +
 fs/cifs/file.c               |    1 +
 fs/ext4/inode.c              |    1 +
 fs/fuse/file.c               |    1 +
 fs/gfs2/file.c               |    1 +
 fs/nfs/file.c                |    1 +
 fs/nilfs2/file.c             |    1 +
 fs/ocfs2/mmap.c              |    1 +
 fs/sysfs/bin.c               |    1 +
 kernel/events/core.c         |    1 +
 mm/memory.c                  |    8 --------
 security/selinux/selinuxfs.c |    1 +
 15 files changed, 14 insertions(+), 8 deletions(-)

diff --git a/fs/9p/vfs_file.c b/fs/9p/vfs_file.c
index 62857a8..ae2968f 100644
--- a/fs/9p/vfs_file.c
+++ b/fs/9p/vfs_file.c
@@ -610,6 +610,7 @@ v9fs_vm_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf)
 	P9_DPRINTK(P9_DEBUG_VFS, "page %p fid %lx\n", page, (unsigned long)filp->private_data);
+	file_update_time(filp);
 	v9inode = V9FS_I(inode);
 	/* make sure the cache has finished storing the page */
 	v9fs_fscache_wait_on_page_write(inode, page);
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index e16215f..c272b91 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -6313,6 +6313,7 @@ int btrfs_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf)
 	}
 	ret = VM_FAULT_NOPAGE; /* make the VM retry the fault */
+	file_update_time(vma->vm_file);
 again:
 	lock_page(page);
 	size = i_size_read(inode);
diff --git a/fs/buffer.c b/fs/buffer.c
index 1a80b04..c949a11 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -2347,6 +2347,7 @@ int __block_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf,
 	loff_t size;
 	int ret;
+	file_update_time(vma->vm_file);
 	lock_page(page);
 	size = i_size_read(inode);
 	if ((page->mapping != inode->i_mapping) ||
diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c
index 5a3953d..1cf89aa 100644
--- a/fs/ceph/addr.c
+++ b/fs/ceph/addr.c
@@ -1137,6 +1137,7 @@ static int ceph_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf)
 	dout("page_mkwrite %p %llu~%llu page %p idx %lu\n", inode, off, len, page, page->index);
+	file_update_time(vma->vm_file);
 	lock_page(page);
 	ret = VM_FAULT_NOPAGE;
diff --git a/fs/cifs/file.c b/fs/cifs/file.c
index 9f41a10..410b11c 100644
--- a/fs/cifs/file.c
+++ b/fs/cifs/file.c
@@ -1910,6 +1910,7 @@ cifs_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf)
 {
 	struct page *page = vmf->page;
+	file_update_time(vma->vm_file);
 	lock_page(page);
 	return VM_FAULT_LOCKED;
 }
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 986e238..e995f2c 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -4372,6 +4372,7 @@ int ext4_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf)
 		goto out_ret;
 	}
+	file_update_time(vma->vm_file);
 	lock_page(page);
 	size = i_size_read(inode);
 	/* Page got truncated from under us? */
diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 594f07a..4f92651 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -1323,6 +1323,7 @@ static int fuse_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf)
 	 */
 	struct inode *inode = vma->vm_file->f_mapping->host;
+	file_update_time(vma->vm_file);
 	fuse_wait_on_page_writeback(inode, page->index);
 	return 0;
 }
diff --git a/fs/gfs2/file.c b/fs/gfs2/file.c
index edeb9e8..ba22704 100644
--- a/fs/gfs2/file.c
+++ b/fs/gfs2/file.c
@@ -359,6 +359,7 @@ static int gfs2_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf)
 	struct gfs2_alloc *al;
 	int ret;
+	file_update_time(vma->vm_file);
 	gfs2_holder_init(ip->i_gl, LM_ST_EXCLUSIVE, 0, &gh);
 	ret = gfs2_glock_nq(&gh);
 	if (ret)
diff --git a/fs/nfs/file.c b/fs/nfs/file.c
index 28b8c3f..bfa0c48 100644
--- a/fs/nfs/file.c
+++ b/fs/nfs/file.c
@@ -571,6 +571,7 @@ static int nfs_vm_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf)
 	filp->f_mapping->host->i_ino, (long long)page_offset(page));
+	file_update_time(filp);
 	/* make sure the cache has finished storing the page */
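For readers following the motivation in the commit message: the hunks shown here only move the file_update_time() call into each ->page_mkwrite and do not check a return value. The point of the move is that, once the update happens inside the handler, a failure could be turned into a fault result. Below is a minimal user-space sketch of that control flow, not kernel code. Every name in it (mock_file, mock_update_time, mock_page_mkwrite, the MOCK_FAULT_* values) is hypothetical and invented for illustration, and it assumes a timestamp update that returns an error code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for fault results; the values are arbitrary. */
enum mock_fault { MOCK_FAULT_LOCKED = 0, MOCK_FAULT_SIGBUS = 1 };

/* Hypothetical file object holding only what the sketch needs. */
struct mock_file {
	bool dirty_fails;  /* simulate the inode-dirtying step failing */
	long mtime;        /* pretend modification time */
	bool page_locked;  /* set once the handler has "locked" the page */
};

/* Assumed variant of a timestamp update that can report failure. */
static int mock_update_time(struct mock_file *f, long now)
{
	if (f->dirty_fails)
		return -ENOSPC;
	f->mtime = now;
	return 0;
}

/*
 * The handler updates the time first, before taking the page lock,
 * and reports an error to the caller instead of silently dropping it.
 */
static enum mock_fault mock_page_mkwrite(struct mock_file *f, long now)
{
	assert(f != NULL);
	if (mock_update_time(f, now) != 0)
		return MOCK_FAULT_SIGBUS;
	f->page_locked = true;
	return MOCK_FAULT_LOCKED;
}
```

With dirty_fails set, the handler returns the error result and leaves both the timestamp and the page state untouched. That is the behaviour the generic fault code could not provide while it called the update itself after ->page_mkwrite had returned.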
Re: kernel BUG at /build/buildd/linux-3.2.0/fs/btrfs/extent-tree.c:4816!
On 29 Nov 2011 at 16:12, Chris Mason wrote:
> On Tue, Nov 29, 2011 at 02:39:26AM +0100, Karl Mardoff Kittilsen wrote:
>> Hi!
>>
>> Sending a mail on this issue, as advised on IRC.
>>
>> My /home file system fails to mount and the kernel seem to freeze
>> and I need to do the Alt+SysRq RSNEIUB routine to boot it safely.
>> The corruption happened on a 3.2-rc kernel and Ubuntu
>> 11.10, but I am now running on Ubuntu 12.04 with the 3.2.0-2-generic
>> kernel to see if that helped, it did not.
>> btrfsck from the latest btrfs-tools returns:
>>
>> karl@karl-precise:~/git/btrfs-progs$ sudo ./btrfsck /dev/md0
>> ref mismatch on [2176962560 8192] extent item 480, found 1
>> Incorrect local backref count on 2176970752 root 5 owner 2101705
>> offset 368640 found 1 wanted 3925868545
>> backpointer mismatch on [2176970752 4096]
>
> So the crashes below were because we tried to free one of these extents.
> You have two extents whose reference counts are way off. Unfortunately
> this is stored on disk, so different kernels aren't going to fix it (yet).
>
> One of the extents is in a file with inode number 2101705, and the other
> is in a btree block (2176962560).
>
> I'll be able to fix this soon, but we can also make a patch that changes
> those BUG_ONs to just deal with the mismatch. The worst case here would
> be leaking those two extents, about 12K of data.
>
> -chris

Thank you for looking into it, and that does sound really promising. I am available to test any patches you want tested. Is there anything else that I can do to help get this issue fixed?

Karl
Re: btrfs/git question.
On Tue, Nov 29, 2011 at 09:33:37AM +0700, Fajar A. Nugraha wrote:
> On Tue, Nov 29, 2011 at 8:58 AM, Phillip Susi wrote:
> > On 11/28/2011 12:53 PM, Ken D'Ambrosio wrote:
> >> Seems I've picked up a wireless regression, and randomly drop my WiFi
> >> connection with more recent kernels. While I'd love to try to track down the
> >> issue, the sporadic nature makes it difficult. But I don't want to revert to a
> >> flat-out old kernel because of all the btrfs modifications. Is it possible
> >> using git to add *just* btrfs patches to an older kernel?
> >
> > Sure: use git rebase to apply the patches to the older kernel.
>
> ... or use 3.1.2, and get ONLY fs/btrfs from Chris' for-linus tree,
> compile it out-of-tree, and use it to replace the original btrfs.ko.

If you're on a 3.1 kernel, you can pull my for-linus directly on top of it with git pull. I always keep a btrfs tree against the previous kernel so that people can use the latest btrfs goodness without having to use an rc kernel.

-chris
Re: kernel BUG at /build/buildd/linux-3.2.0/fs/btrfs/extent-tree.c:4816!
On Tue, Nov 29, 2011 at 02:39:26AM +0100, Karl Mardoff Kittilsen wrote:
> Hi!
>
> Sending a mail on this issue, as advised on IRC.
>
> My /home file system fails to mount and the kernel seem to freeze
> and I need to do the Alt+SysRq RSNEIUB routine to boot it safely.
> The corruption happened on a 3.2-rc kernel and Ubuntu
> 11.10, but I am now running on Ubuntu 12.04 with the 3.2.0-2-generic
> kernel to see if that helped, it did not.
> btrfsck from the latest btrfs-tools returns:
>
> karl@karl-precise:~/git/btrfs-progs$ sudo ./btrfsck /dev/md0
> ref mismatch on [2176962560 8192] extent item 480, found 1
> Incorrect local backref count on 2176970752 root 5 owner 2101705
> offset 368640 found 1 wanted 3925868545
> backpointer mismatch on [2176970752 4096]

So the crashes below were because we tried to free one of these extents. You have two extents whose reference counts are way off. Unfortunately this is stored on disk, so different kernels aren't going to fix it (yet).

One of the extents is in a file with inode number 2101705, and the other is in a btree block (2176962560).

I'll be able to fix this soon, but we can also make a patch that changes those BUG_ONs to just deal with the mismatch. The worst case here would be leaking those two extents, about 12K of data.

-chris
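As a quick check of the "about 12K" figure: the two ranges btrfsck flagged are [2176962560 8192] and [2176970752 4096], so leaking both would lose 8192 + 4096 = 12288 bytes. The small sketch below redoes that arithmetic and also notes that the first range ends exactly where the second begins. The struct and function names (extent, total_bytes, adjacent) are made up for this illustration and are not btrfs-progs or kernel interfaces.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record for one [start length] pair from the btrfsck output. */
struct extent {
	uint64_t start;  /* byte offset reported by btrfsck */
	uint64_t len;    /* length in bytes */
};

/* The two ranges quoted in the report above. */
static const struct extent flagged[] = {
	{ 2176962560ULL, 8192 },  /* the btree block */
	{ 2176970752ULL, 4096 },  /* data extent owned by inode 2101705 */
};

/* Sum the lengths: the worst-case amount of space leaked. */
static uint64_t total_bytes(const struct extent *e, size_t n)
{
	assert(e != NULL || n == 0);
	uint64_t sum = 0;
	for (size_t i = 0; i < n; i++)
		sum += e[i].len;
	return sum;
}

/* True when extent b starts exactly where extent a ends. */
static int adjacent(const struct extent *a, const struct extent *b)
{
	return a->start + a->len == b->start;
}
```

Calling total_bytes(flagged, 2) gives 12288, i.e. 12 KiB, which matches the estimate in the message. adjacent(&flagged[0], &flagged[1]) is true because 2176962560 + 8192 = 2176970752.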