Re: btrfs/git question.

2011-11-29 Thread Fajar A. Nugraha
On Tue, Nov 29, 2011 at 10:22 PM, Chris Mason  wrote:
> On Tue, Nov 29, 2011 at 09:33:37AM +0700, Fajar A. Nugraha wrote:
>> On Tue, Nov 29, 2011 at 8:58 AM, Phillip Susi  wrote:
>> > On 11/28/2011 12:53 PM, Ken D'Ambrosio wrote:
>> >> Seems I've picked up a wireless regression, and randomly drop my WiFi
>> >> connection with more recent kernels.  While I'd love to try to track
>> >> down the issue, the sporadic nature makes it difficult.  But I don't
>> >> want to revert to a flat-out old kernel because of all the btrfs
>> >> modifications.  Is it possible using git to add *just* btrfs patches
>> >> to an older kernel?
>> >
>> > Sure: use git rebase to apply the patches to the older kernel.
>>
>> ... or use 3.1.2, and get ONLY fs/btrfs from Chris' for-linus tree,
>> compile it out-of-tree, and use it to replace the original btrfs.ko.
>
> If you're on a 3.1 kernel, you can pull my for-linus directly on top of
> it with git pull.  I always keep a btrfs tree against the previous
> kernel so that people can use the latest btrfs goodness without having
> to use an rc kernel.

Yes, thanks for that.

My suggestion is simply an alternative (instead of git pull) for people who:
- aren't quite familiar with git, but know enough to grab a directory
snapshot from gitweb (e.g.
http://git.kernel.org/?p=linux/kernel/git/mason/linux-btrfs.git;a=tree;f=fs/btrfs;h=5f51bd7e3b8b6c4825681408450e6580bdbccce1;hb=refs/heads/for-linus)
- know how to build a module out-of-tree
- are on the latest stable kernel, but don't want to recompile the whole
kernel just to get a btrfs fix

-- 
Fajar
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] Btrfs: set the i_nlink to 2 for an initial dir inode

2011-11-29 Thread Jeff Liu
On 11/29/2011 11:48 PM, Chris Mason wrote:

> On Tue, Nov 29, 2011 at 02:04:37PM +0800, Jeff Liu wrote:
>> Please ignore this patch for now; it can corrupt the file system so
>> that it fails to mount again. Sorry for the noise!
> 
> Directories always have a link count of 1 in btrfs.  This tells find not
> to use the link count as the count of subdirectories in the directory.

Thank you for your clarification!

-Jeff

> 
> -chris


[RFC][PATCH] Sector Size check during Mount

2011-11-29 Thread Keith Mannthey
Gracefully fail when trying to mount a BTRFS file system that has a
sectorsize smaller than PAGE_SIZE.  

On PPC it is possible to build a FS while using a 4k PAGE_SIZE kernel
then boot into a 64K PAGE_SIZE kernel.  Presently open_ctree fails in an
endless loop and hangs the machine in this situation. 

My debugging has shown this sector size < page size case to be a
non-trivial situation, and a graceful exit would be nice for the
time being.


Signed-off-by:  Keith Mannthey 

---

diff -urN a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
--- a/fs/btrfs/disk-io.c	2011-10-09 21:53:11.0 -0500
+++ b/fs/btrfs/disk-io.c	2011-11-29 17:33:15.0 -0600
@@ -1916,6 +1916,12 @@
 		goto fail_sb_buffer;
 	}
 
+	if (sectorsize < PAGE_SIZE) {
+		printk(KERN_WARNING "btrfs: Incompatible sector size "
+		       "found on %s\n", sb->s_id);
+		goto fail_sb_buffer;
+	}
+
 	mutex_lock(&fs_info->chunk_mutex);
 	ret = btrfs_read_sys_array(tree_root);
 	mutex_unlock(&fs_info->chunk_mutex);
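
For readers following along, the condition this patch adds can be sketched
in user space. This is only an illustration in Python, not kernel code: the
function name sectorsize_mountable is hypothetical, and it mirrors nothing
more than the "sectorsize < PAGE_SIZE" test from the hunk above (the patch
does not say anything about a sectorsize larger than PAGE_SIZE).

```python
import mmap


def sectorsize_mountable(sectorsize, page_size=mmap.PAGESIZE):
    """Mirror the patch's check: refuse a sectorsize smaller than the page size.

    Hypothetical helper for illustration; page_size defaults to the page
    size of the machine running this script.
    """
    return sectorsize >= page_size


# A filesystem made on a 4K-page kernel, then mounted on a 64K-page kernel:
print(sectorsize_mountable(4096, page_size=65536))
# The same filesystem on the kernel it was created with:
print(sectorsize_mountable(4096, page_size=4096))
```

The first call models the PPC case described in the patch description, where
the mount should fail cleanly instead of looping in open_ctree.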




Re: [PATCH] Btrfs: fix submit_worker congestion

2011-11-29 Thread Chris Mason
On Tue, Nov 29, 2011 at 09:40:56PM +0100, Arne Jansen wrote:
> Write bios are submitted from the submit_worker. The worker pumps down
> bios into the block layer until it signals congestion. At least this
> is the theory. In practice submit_bio just blocks before any signalling
> happens. As the bios are queued per device, this can lead to a situation
> where only one device is served until all bios are submitted, and only
> then the next device is served. This is obviously suboptimal.
> This patch just throws out the congestion detection and reschedules the
> worker every 8 requests. This way, all devices can be kept busy.
> This is only a temporary fix until the block layer provides a non-blocking
> submit_bio. Then the whole submit_worker mechanism can be killed.

The problem with the every 8 requests logic is that we've still got a
pretty good chance of getting stuck behind get_request_wait.  The way
the elevator batching works is that it should give us a batch of
requests, and once that batch is done we wait.

If we jump around every 8 requests, we've turned this:

[ dev A bio 1-8, dev A bio 8-16, dev A bio 16-32, dev B bio 1-8, dev B ... ]

into:

[ dev A bio 1-8, dev B bio 1-8, dev A bio 8-16, dev B bio 8-16 ]

They look like the same IO, but if we wait for a request when we do
(dev B bio 1-8) then our dev A bio 1-8 bio is likely to dispatch without
all the other dev A bios we had queued.
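
The two submission orders above can be reproduced with a toy model. This is
a rough Python sketch, not the submit_worker code: the function names are
made up for illustration, bios are plain integers, the batch size of 8 is
taken from the patch description, and blocking in get_request_wait is not
modelled at all. It also numbers the second batch 9-16 rather than 8-16.

```python
from collections import deque


def drain_per_device(pending):
    """Submit every bio of one device before moving to the next device."""
    order = []
    for dev, bios in pending.items():
        order.extend((dev, bio) for bio in bios)
    return order


def round_robin(pending, batch=8):
    """Submit up to `batch` bios per device, then requeue it and move on."""
    queues = deque((dev, deque(bios)) for dev, bios in pending.items())
    order = []
    while queues:
        dev, bios = queues.popleft()
        for _ in range(min(batch, len(bios))):
            order.append((dev, bios.popleft()))
        if bios:
            queues.append((dev, bios))
    return order


def summarize(order):
    """Collapse consecutive bios of one device into (dev, first, last) runs."""
    runs = []
    for dev, bio in order:
        if runs and runs[-1][0] == dev and runs[-1][2] == bio - 1:
            runs[-1] = (dev, runs[-1][1], bio)
        else:
            runs.append((dev, bio, bio))
    return runs


pending = {"A": range(1, 17), "B": range(1, 17)}
print(summarize(drain_per_device(pending)))
print(summarize(round_robin(pending)))
```

The second list shows dev A's bios split into two runs with dev B in
between, which is the case where the first dev A batch may dispatch without
the rest of the queued dev A bios.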

As you said in IRC, we'd be better off with one thread per device or (my
preference) with a real non-blocking submit_bio.  What kind of results
did you get with your test from bumping the nr_requests?

-chris


Re: [PATCH 02/20] Btrfs: initialize new bitmaps' list

2011-11-29 Thread Christian Brunner
2011/11/28 Alexandre Oliva :
> We're failing to create clusters with bitmaps because
> setup_cluster_no_bitmap checks that the list is empty before inserting
> the bitmap entry in the list for setup_cluster_bitmap, but the list
> field is only initialized when it is restored from the on-disk free
> space cache, or when it is written out to disk.
>
> Besides a potential race condition due to the multiple use of the list
> field, filesystem performance severely degrades over time: as we use
> up all non-bitmap free extents, the try-to-set-up-cluster dance is
> done at every metadata block allocation.  For every block group, we
> fail to set up a cluster, and after failing on them all up to twice,
> we fall back to the much slower unclustered allocation.

This matches exactly what I've been observing in our ceph cluster.
I've now installed your patches (1-11) on two servers.
The cluster setup problem seems to be gone. A big thanks for that!

However, another thing is causing me some headache:

When I'm doing heavy reading in our ceph cluster, the load and wait-io
on the patched servers are higher than on the unpatched ones.

Dstat from an unpatched server:

---total-cpu-usage -dsk/total- -net/total- ---paging-- ---system--
usr sys idl wai hiq siq| read  writ| recv  send|  in   out | int   csw
  1   6  83   8   0   1|  22M  348k| 336k   93M|   0 0 |8445  3715
  1   5  87   7   0   1|  12M 1808k| 214k   65M|   0 0 |5461  1710
  1   3  85  10   0   0|  11M  640k| 313k   49M|   0 0 |5919  2853
  1   6  84   9   0   1|  12M  608k| 358k   69M|   0 0 |7406  3645
  1   7  78  13   0   1|  15M 5344k| 348k  105M|   0 0 |9765  4403
  1   7  80  10   0   1|  22M 1368k| 358k   89M|   0 0 |8036  3202
  1   9  72  16   0   1|  22M 2424k| 646k  137M|   0 0 |  12k 5527

Dstat from a patched server:

---total-cpu-usage -dsk/total- -net/total- ---paging-- ---system--
usr sys idl wai hiq siq| read  writ| recv  send|  in   out | int   csw
  1   2  61  35   0   0|2500k 2736k| 141k   34M|   0 0 |4415  1603
  1   4  48  47   0   1|  10M 3924k| 353k   61M|   0 0 |6871  3771
  1   5  55  38   0   1|  10M 1728k| 385k   92M|   0 0 |8030  2617
  2   8  69  20   0   1|  18M 1384k| 435k  130M|   0 0 |  10k 4493
  1   5  85   8   0   1|7664k   84k| 287k   97M|   0 0 |6231  1357
  1   3  91   5   0   0|  10M  144k| 194k   44M|   0 0 |3807  1081
  1   7  66  25   0   1|  20M 1248k| 404k  101M|   0 0 |8676  3632
  0   3  38  58   0   0|8104k 2660k| 176k   40M|   0 0 |4841  2093
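
To put a number on the difference, the idl and wai columns of the two
samples can be averaged. This is a small Python sketch using only the values
shown above; with seven and eight samples it is a rough indication, not a
benchmark.

```python
from statistics import mean

# (idl, wai) columns copied from the two dstat samples above.
unpatched = [(83, 8), (87, 7), (85, 10), (84, 9), (78, 13), (80, 10), (72, 16)]
patched = [(61, 35), (48, 47), (55, 38), (69, 20),
           (85, 8), (91, 5), (66, 25), (38, 58)]


def averages(samples):
    """Return (mean idle %, mean iowait %), each rounded to one decimal."""
    idle = mean(s[0] for s in samples)
    wait = mean(s[1] for s in samples)
    return round(idle, 1), round(wait, 1)


print("unpatched idl/wai:", averages(unpatched))
print("patched   idl/wai:", averages(patched))
```

On these samples the mean iowait on the patched server is roughly three
times that of the unpatched one.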


This seems to be coming from "btrfs-endio-1", a kernel thread that has
not caught my attention on unpatched systems yet.

I did some tracing on that process with ftrace and I can see that the
time is spent in end_bio_extent_readpage(). In a single call to
end_bio_extent_readpage(), the functions unlock_extent_cached(),
unlock_page() and btrfs_readpage_end_io_hook() are each invoked 128
times.

Do you have any idea what's going on here?

(Please note that the filesystem is still unmodified - metadata
overhead is large).

Thanks,
Christian


[PATCH] Btrfs: fix submit_worker congestion

2011-11-29 Thread Arne Jansen
Write bios are submitted from the submit_worker. The worker pumps down
bios into the block layer until it signals congestion. At least this
is the theory. In practice submit_bio just blocks before any signalling
happens. As the bios are queued per device, this can lead to a situation
where only one device is served until all bios are submitted, and only
then the next device is served. This is obviously suboptimal.
This patch just throws out the congestion detection and reschedules the
worker every 8 requests. This way, all devices can be kept busy.
This is only a temporary fix until the block layer provides a non-blocking
submit_bio. Then the whole submit_worker mechanism can be killed.

Signed-off-by: Arne Jansen 
---
 fs/btrfs/volumes.c |   30 +-
 1 files changed, 1 insertions(+), 29 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index c37433d..5b01742 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -257,36 +257,8 @@ loop_lock:
 		 * is now congested.  Back off and let other work structs
 		 * run instead
 		 */
-		if (pending && bdi_write_congested(bdi) && batch_run > 8 &&
+		if (pending && batch_run > 8 &&
 		    fs_info->fs_devices->open_devices > 1) {
-			struct io_context *ioc;
-
-			ioc = current->io_context;
-
-			/*
-			 * the main goal here is that we don't want to
-			 * block if we're going to be able to submit
-			 * more requests without blocking.
-			 *
-			 * This code does two great things, it pokes into
-			 * the elevator code from a filesystem _and_
-			 * it makes assumptions about how batching works.
-			 */
-			if (ioc && ioc->nr_batch_requests > 0 &&
-			    time_before(jiffies, ioc->last_waited + HZ/50UL) &&
-			    (last_waited == 0 ||
-			     ioc->last_waited == last_waited)) {
-				/*
-				 * we want to go through our batch of
-				 * requests and stop.  So, we copy out
-				 * the ioc->last_waited time and test
-				 * against it before looping
-				 */
-				last_waited = ioc->last_waited;
-				if (need_resched())
-					cond_resched();
-				continue;
-			}
 			spin_lock(&device->io_lock);
 			requeue_list(pending_bios, pending, tail);
 			device->running_pending = 1;
-- 
1.7.3.4



Re: kernel BUG at /build/buildd/linux-3.2.0/fs/btrfs/extent-tree.c:4816!

2011-11-29 Thread Chris Mason
On Tue, Nov 29, 2011 at 05:47:46PM +0100, David Sterba wrote:
> On Tue, Nov 29, 2011 at 10:49:13AM -0500, Chris Mason wrote:
> > The good news about this one is that it is very clear cut.  The hard
> > part is figuring out where these bogus link counts came from.
> > 
> > I'd suggest that you spend some time running memtest on the machine.
> 
> Just to add some evidence from the log:
> 
> Nov 28 00:11:14 karl-workstation kernel: [212918.235050] kernel BUG at
> /home/apw/COD/linux/fs/btrfs/extent-tree.c:4775!
> Nov 28 00:11:14 karl-workstation kernel: [212918.235118] RAX:
> ea01 RBX: 880412c3ab40 RCX: 880380173900
> 
> 
> 4765         ret = btrfs_search_slot(trans, extent_root,
> 4766                                 &key, path, -1, 1);
> 4767         if (ret) {
> 4768                 printk(KERN_ERR "umm, got %d back from search"
> 4769                        ", was looking for %llu\n", ret,
> 4770                        (unsigned long long)bytenr);
> 4771                 if (ret > 0)
> 4772                         btrfs_print_leaf(extent_root,
> 4773                                          path->nodes[0]);
> 4774         }
> 4775         BUG_ON(ret);
> 
> the ret value comes from btrfs_search_slot, returning " < 0" or 1, but
> RAX has some extra bits set, this could really be a RAM failure.
> 
> 
> david

Interesting, look at this:

> karl@karl-precise:~/git/btrfs-progs$ sudo ./btrfsck /dev/md0
> ref mismatch on [2176962560 8192] extent item 480, found 1
> Incorrect local backref count on 2176970752 root 5 owner 2101705
> offset 368640 found 1 wanted 3925868545
> backpointer mismatch on [2176970752 4096]

3925868545 == 0xEA000001
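
The arithmetic behind that observation can be checked with a few lines of
Python. The decimal value is the "wanted" backref count from the btrfsck
output, and 1 is the "found" count; the helper to_signed32 is a name made up
here. The RAX value in this copy of the oops reads "ea01", which may be the
same number with its zero digits lost in archiving, but that is a guess.

```python
wanted = 3925868545   # "wanted" backref count printed by btrfsck
found = 1             # "found" backref count printed by btrfsck


def to_signed32(value):
    """Interpret the low 32 bits of value as a signed C int."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


print(hex(wanted))                      # the count in hexadecimal
print(hex(wanted ^ found))              # bits that differ from the found count
print(bin(wanted ^ found).count("1"))   # how many bits differ
print(to_signed32(wanted))              # the value read back as a 32-bit int
```

The on-disk count differs from the expected value of 1 only in its top
byte, which is the sort of pattern that makes a memory test worthwhile.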

Are you sure this is the BUG_ON he was triggering?

-chris



[PATCH] btrfs: simplify move_pages and copy_pages

2011-11-29 Thread David Sterba
After commit a65917156e34594 ("Btrfs: stop using highmem for
extent_buffers") we don't need to kmap_atomic anymore and can simplify
both functions.

Signed-off-by: David Sterba 
---
 fs/btrfs/extent_io.c |   19 ---
 1 files changed, 4 insertions(+), 15 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 9472d3d..9e04d9b 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -4238,16 +4238,9 @@ static void move_pages(struct page *dst_page, struct page *src_page,
 		       unsigned long len)
 {
 	char *dst_kaddr = page_address(dst_page);
-	if (dst_page == src_page) {
-		memmove(dst_kaddr + dst_off, dst_kaddr + src_off, len);
-	} else {
-		char *src_kaddr = page_address(src_page);
-		char *p = dst_kaddr + dst_off + len;
-		char *s = src_kaddr + src_off + len;
+	char *src_kaddr = page_address(src_page);
 
-		while (len--)
-			*--p = *--s;
-	}
+	memmove(dst_kaddr + dst_off, src_kaddr + src_off, len);
 }
 
 static inline bool areas_overlap(unsigned long src, unsigned long dst, unsigned long len)
@@ -4261,14 +4254,10 @@ static void copy_pages(struct page *dst_page, struct page *src_page,
 		       unsigned long len)
 {
 	char *dst_kaddr = page_address(dst_page);
-	char *src_kaddr;
+	char *src_kaddr = page_address(src_page);
 
-	if (dst_page != src_page) {
-		src_kaddr = page_address(src_page);
-	} else {
-		src_kaddr = dst_kaddr;
+	if (dst_page == src_page)
 		BUG_ON(areas_overlap(src_off, dst_off, len));
-	}
 
 	memcpy(dst_kaddr + dst_off, src_kaddr + src_off, len);
 }
-- 
1.7.7.3
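
As background for why copy_pages keeps the overlap check while move_pages
can rely on memmove, here is a user-space sketch in Python. It is not the
kernel code: areas_overlap is written from the usual definition (two ranges
of equal length overlap when their start offsets are closer than the
length), and naive_forward_copy and move_bytes are hypothetical stand-ins
for an overlap-unsafe copy and a memmove-like copy.

```python
def areas_overlap(src, dst, length):
    """True when [src, src+length) and [dst, dst+length) share any byte."""
    return abs(src - dst) < length


def naive_forward_copy(buf, dst, src, length):
    """Byte-by-byte forward copy; clobbers the source when dst > src overlaps."""
    for i in range(length):
        buf[dst + i] = buf[src + i]


def move_bytes(buf, dst, src, length):
    """memmove-like copy: snapshot the source first, so overlap is harmless."""
    buf[dst:dst + length] = bytes(buf[src:src + length])


safe = bytearray(b"abcdefgh")
move_bytes(safe, 2, 0, 4)
print(safe)

unsafe = bytearray(b"abcdefgh")
naive_forward_copy(unsafe, 2, 0, 4)
print(unsafe)

print(areas_overlap(0, 2, 4), areas_overlap(0, 4, 4))
```

The two printed buffers differ because the forward copy re-reads bytes it
has already overwritten, which is the situation the BUG_ON in copy_pages
guards against.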



Re: kernel BUG at /build/buildd/linux-3.2.0/fs/btrfs/extent-tree.c:4816!

2011-11-29 Thread David Sterba
On Tue, Nov 29, 2011 at 10:49:13AM -0500, Chris Mason wrote:
> The good news about this one is that it is very clear cut.  The hard
> part is figuring out where these bogus link counts came from.
> 
> I'd suggest that you spend some time running memtest on the machine.

Just to add some evidence from the log:

Nov 28 00:11:14 karl-workstation kernel: [212918.235050] kernel BUG at
/home/apw/COD/linux/fs/btrfs/extent-tree.c:4775!
Nov 28 00:11:14 karl-workstation kernel: [212918.235118] RAX:
ea01 RBX: 880412c3ab40 RCX: 880380173900


4765         ret = btrfs_search_slot(trans, extent_root,
4766                                 &key, path, -1, 1);
4767         if (ret) {
4768                 printk(KERN_ERR "umm, got %d back from search"
4769                        ", was looking for %llu\n", ret,
4770                        (unsigned long long)bytenr);
4771                 if (ret > 0)
4772                         btrfs_print_leaf(extent_root,
4773                                          path->nodes[0]);
4774         }
4775         BUG_ON(ret);

the ret value comes from btrfs_search_slot, returning " < 0" or 1, but
RAX has some extra bits set, this could really be a RAM failure.


david


Re: [PATCH 0/5] fix bugs of sub transid -- WARNING: at fs/btrfs/ctree.c:432

2011-11-29 Thread David Sterba
On Tue, Nov 29, 2011 at 09:18:35AM +0800, Liu Bo wrote:
> a) For the first one (last_snapshot bug),
> 
> The test involves three processes (derived from Chris):
> 
> mkfs.btrfs /dev/xxx
> mount /dev/xxx /mnt
> 
> 1) run compilebench -i 30 --makej -D /mnt
> 
> Let compilebench run until it starts the create phase.
> 
> 2) run synctest -f -u -n 200 -t 3 /mnt
> 3) for x in `seq 1 200` ; do btrfs subvol snap /mnt /mnt/snap$x ; sleep 0.5 ; done

I hit the following 2 warnings during this test. Phase 1 was at the compile
stage; 2 and 3 were running. I did not see them during the first run, and the
only other activity on the filesystem was 'du -sh /mnt'.

mount options: compress-force=lzo,discard,space_cache,autodefrag,inode_cache

Label: none  uuid: 79f4160b-81f8-46ed-968c-968cb17a2e87
	Total devices 4 FS bytes used 7.76GB
	devid    4 size 13.96GB used 2.26GB path /dev/sdb4
	devid    3 size 13.96GB used 2.26GB path /dev/sdb3
	devid    2 size 13.96GB used 3.00GB path /dev/sdb2
	devid    1 size 13.96GB used 3.02GB path /dev/sdb1

fresh and default mkfs

 430 WARN_ON(root->ref_cows && trans->transaction->transid !=
 431 root->fs_info->running_transaction->transid);
 432 WARN_ON(root->ref_cows && trans->transid < root->last_trans);


[20433.473713] ------------[ cut here ]------------
[20433.478825] WARNING: at fs/btrfs/ctree.c:432 __btrfs_cow_block+0x429/0x5e0 [btrfs]()
[20433.487148] Hardware name: Santa Rosa platform
[20433.487150] Modules linked in: btrfs aoe sr_mod ide_cd_mod cdrom loop [last unloaded: btrfs]
[20433.487162] Pid: 12099, comm: btrfs Tainted: GW   3.1.0-default+ #80
[20433.487165] Call Trace:
[20433.487174]  [] warn_slowpath_common+0x7f/0xc0
[20433.487179]  [] warn_slowpath_null+0x1a/0x20
[20433.487190]  [] __btrfs_cow_block+0x429/0x5e0 [btrfs]
[20433.487196]  [] ? trace_hardirqs_off_caller+0x29/0xc0
[20433.487201]  [] ? lock_release_holdtime+0x3d/0x1c0
[20433.487218]  [] ? btrfs_set_lock_blocking_rw+0x50/0xb0 [btrfs]
[20433.487230]  [] btrfs_cow_block+0x1a6/0x3d0 [btrfs]
[20433.487236]  [] ? _raw_write_unlock+0x2b/0x50
[20433.487247]  [] btrfs_search_slot+0x300/0xd20 [btrfs]
[20433.487262]  [] btrfs_lookup_inode+0x2f/0xa0 [btrfs]
[20433.487279]  [] btrfs_update_inode_item+0x66/0x120 [btrfs]
[20433.487296]  [] btrfs_update_inode+0xab/0xc0 [btrfs]
[20433.487313]  [] ? lookup_free_ino_inode+0x51/0xe0 [btrfs]
[20433.487327]  [] btrfs_save_ino_cache+0x145/0x2f0 [btrfs]
[20433.487342]  [] ? commit_fs_roots+0xa4/0x1c0 [btrfs]
[20433.487357]  [] commit_fs_roots+0xd4/0x1c0 [btrfs]
[20433.487373]  [] btrfs_commit_transaction+0x454/0x900 [btrfs]
[20433.487378]  [] ? lock_release_holdtime+0x3d/0x1c0
[20433.487395]  [] ? btrfs_mksubvol+0x298/0x360 [btrfs]
[20433.487400]  [] ? wake_up_bit+0x40/0x40
[20433.487405]  [] ? do_raw_spin_unlock+0x5e/0xb0
[20433.487421]  [] btrfs_mksubvol+0x358/0x360 [btrfs]
[20433.487427]  [] ? might_fault+0x53/0xb0
[20433.487443]  [] btrfs_ioctl_snap_create_transid+0x100/0x160 [btrfs]
[20433.487448]  [] ? might_fault+0x53/0xb0
[20433.487464]  [] btrfs_ioctl_snap_create_v2.clone.0+0xfd/0x110 [btrfs]
[20433.487482]  [] btrfs_ioctl+0x588/0x1080 [btrfs]
[20433.487487]  [] ? do_page_fault+0x2d0/0x580
[20433.487492]  [] ? local_clock+0x6f/0x80
[20433.487498]  [] do_vfs_ioctl+0x98/0x560
[20433.487502]  [] ? retint_swapgs+0x13/0x1b
[20433.487507]  [] sys_ioctl+0x4f/0x80
[20433.487512]  [] system_call_fastpath+0x16/0x1b
[20433.487515] ---[ end trace d93007cf8d0a8eac ]---
[20433.487576] ------------[ cut here ]------------
[20433.487587] WARNING: at fs/btrfs/ctree.c:432 __btrfs_cow_block+0x429/0x5e0 [btrfs]()
[20433.487590] Hardware name: Santa Rosa platform
[20433.487592] Modules linked in: btrfs aoe sr_mod ide_cd_mod cdrom loop [last unloaded: btrfs]
[20433.487601] Pid: 12099, comm: btrfs Tainted: GW   3.1.0-default+ #80
[20433.487603] Call Trace:
[20433.487608]  [] warn_slowpath_common+0x7f/0xc0
[20433.487613]  [] warn_slowpath_null+0x1a/0x20
[20433.487623]  [] __btrfs_cow_block+0x429/0x5e0 [btrfs]
[20433.487628]  [] ? trace_hardirqs_off_caller+0x29/0xc0
[20433.487633]  [] ? lock_release_holdtime+0x3d/0x1c0
[20433.487649]  [] ? btrfs_set_lock_blocking_rw+0x50/0xb0 [btrfs]
[20433.487660]  [] btrfs_cow_block+0x1a6/0x3d0 [btrfs]
[20433.487665]  [] ? _raw_write_unlock+0x2b/0x50
[20433.487676]  [] btrfs_search_slot+0x300/0xd20 [btrfs]
[20433.487691]  [] btrfs_lookup_inode+0x2f/0xa0 [btrfs]
[20433.487707]  [] btrfs_update_inode_item+0x66/0x120 [btrfs]
[20433.487723]  [] btrfs_update_inode+0xab/0xc0 [btrfs]
[20433.487739]  [] ? lookup_free_ino_inode+0x51/0xe0 [btrfs]
[20433.487753]  [] btrfs_save_ino_cache+0x145/0x2f0 [btrfs]
[20433.487769]  [] ? commit_fs_roots+0xa4/0x1c0 [btrfs]
[20433.487784]  [] commit_fs_roots+0xd4/0x1c0 [btrfs]
[20433.487800]  [] btrfs_commit_transaction+0x454/0x900 [btrfs]
[20433.487805]  [] ? lock_release_holdtime+0x3d/0x1c0
[20433.487821]  [] ? btrfs_mksubvol+0x298/0x360 [btrfs]
[20433.487826]

Re: [PATCH 2/2] Btrfs: set the i_nlink to 2 for an initial dir inode

2011-11-29 Thread Jan Schmidt
On 29.11.2011 16:48, Chris Mason wrote:
> On Tue, Nov 29, 2011 at 02:04:37PM +0800, Jeff Liu wrote:
>> Please ignore this patch for now; it can corrupt the file system so
>> that it fails to mount again. Sorry for the noise!
> 
> Directories always have a link count of 1 in btrfs.  This tells find not
> to use the link count as the count of subdirectories in the directory.

I'm surprised.

Now I see why my thread "Creation of pseudo items leads to (seemingly)
duplicate inodes (BUG inside)" suffered from little attention :-)

-Jan


Re: [PATCH] fs: push file_update_time into ->page_mkwrite

2011-11-29 Thread Josef Bacik
On Tue, Nov 29, 2011 at 04:50:20PM +0100, Jan Kara wrote:
> On Tue 29-11-11 10:40:59, Josef Bacik wrote:
> > The fault code has been calling file_update_time after ->page_mkwrite after it
> > drops the page lock, but this is annoying because this calls mark_inode_dirty
> > which can fail in Btrfs, so we want to be able to do these updates in
> > ->page_mkwrite so we can get an error back to the user.  So get rid of the
> > file_update_time calls in the fault code and push it into everybody who has a
> > ->page_mkwrite.  I didn't do this for ubifs because it appears that ubifs
> > already updates the time itself in ->page_mkwrite, presumably for the same
> > reasons as btrfs, so I left it as is.  Thanks,
>   But this effectively disables atime updates on mmaped writes for ext2,
> ext3, and similar filesystems which is a no-go IMHO.
>

Heh doh you're right, I have vacation brain.  Thanks,

Josef 


Re: kernel BUG at /build/buildd/linux-3.2.0/fs/btrfs/extent-tree.c:4816!

2011-11-29 Thread Chris Mason
On Tue, Nov 29, 2011 at 04:29:54PM +0100, Karl Mardoff Kittilsen wrote:
> Den 29. nov. 2011 16:12, skrev Chris Mason:
> >On Tue, Nov 29, 2011 at 02:39:26AM +0100, Karl Mardoff Kittilsen wrote:
> >>Hi!
> >>
> >>Sending a mail on this issue, as advised on IRC.
> >>
> >>My /home file system fails to mount and the kernel seems to freeze
> >>and I need to do the Alt+SysRq RSNEIUB routine to boot it safely.
> >>The corruption happened on a 3.2-rc  kernel and Ubuntu
> >>11.10, but I am now running on Ubuntu 12.04 with the 3.2.0-2-generic
> >>kernel to see if that helped, it did not.
> >>btrfsck from the latest btrfs-tools returns:
> >>
> >>karl@karl-precise:~/git/btrfs-progs$ sudo ./btrfsck /dev/md0
> >>ref mismatch on [2176962560 8192] extent item 480, found 1
> >>Incorrect local backref count on 2176970752 root 5 owner 2101705
> >>offset 368640 found 1 wanted 3925868545
> >>backpointer mismatch on [2176970752 4096]
> >
> >So the crashes below were because we tried to free one of these extents.
> >You have two extents whose reference counts are way off.
> >
> >Unfortunately this is stored on disk, so different kernels aren't going
> >to fix it (yet).  One of the extents is in a file with inode number
> >2101705, and the other is in a btree block (2176962560).
> >
> >I'll be able to fix this soon, but we can also make a patch that changes
> >those BUG_ONs to just deal with the mismatch.  The worst case here would
> >be leaking those two extents, about 12K of data.
> >
> >-chris
> 
> Thank you for looking into it, and that does sound really
> promising. I am available to test any patches you want tested. Is
> there anything else that I can do to help getting this issue fixed?

The good news about this one is that it is very clear cut.  The hard
part is figuring out where these bogus link counts came from.

I'd suggest that you spend some time running memtest on the machine.

-chris



Re: [PATCH] fs: push file_update_time into ->page_mkwrite

2011-11-29 Thread Jan Kara
On Tue 29-11-11 10:40:59, Josef Bacik wrote:
> The fault code has been calling file_update_time after ->page_mkwrite after it
> drops the page lock, but this is annoying because this calls mark_inode_dirty
> which can fail in Btrfs, so we want to be able to do these updates in
> ->page_mkwrite so we can get an error back to the user.  So get rid of the
> file_update_time calls in the fault code and push it into everybody who has a
> ->page_mkwrite.  I didn't do this for ubifs because it appears that ubifs
> already updates the time itself in ->page_mkwrite, presumably for the same
> reasons as btrfs, so I left it as is.  Thanks,
  But this effectively disables atime updates on mmaped writes for ext2,
ext3, and similar filesystems which is a no-go IMHO.
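
The objection can be seen in a toy model of the write-fault path. This is a
Python sketch, not kernel code: write_fault is a made-up function, and it
assumes, as the objection implies, that ext2 and ext3 do not provide a
->page_mkwrite hook of their own.

```python
def write_fault(has_page_mkwrite, patched):
    """Return True if the file's timestamps are updated on a write fault.

    has_page_mkwrite: whether the filesystem implements ->page_mkwrite.
    patched: whether the proposed patch is applied.
    """
    updated = False
    if has_page_mkwrite and patched:
        # With the patch, each ->page_mkwrite calls file_update_time itself.
        updated = True
    if not patched:
        # Without the patch, the generic fault code calls file_update_time
        # for every filesystem, hook or no hook.
        updated = True
    return updated


for name, has_hook in (("btrfs", True), ("ext2/ext3 (assumed no hook)", False)):
    print(name, "before:", write_fault(has_hook, patched=False),
          "after:", write_fault(has_hook, patched=True))
```

Under that assumption the only combination that loses the update is a
filesystem without the hook once the patch is applied.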

Honza
> 
> Signed-off-by: Josef Bacik 
> ---
>  fs/9p/vfs_file.c |1 +
>  fs/btrfs/inode.c |1 +
>  fs/buffer.c  |1 +
>  fs/ceph/addr.c   |1 +
>  fs/cifs/file.c   |1 +
>  fs/ext4/inode.c  |1 +
>  fs/fuse/file.c   |1 +
>  fs/gfs2/file.c   |1 +
>  fs/nfs/file.c|1 +
>  fs/nilfs2/file.c |1 +
>  fs/ocfs2/mmap.c  |1 +
>  fs/sysfs/bin.c   |1 +
>  kernel/events/core.c |1 +
>  mm/memory.c  |8 
>  security/selinux/selinuxfs.c |1 +
>  15 files changed, 14 insertions(+), 8 deletions(-)
> 
> diff --git a/fs/9p/vfs_file.c b/fs/9p/vfs_file.c
> index 62857a8..ae2968f 100644
> --- a/fs/9p/vfs_file.c
> +++ b/fs/9p/vfs_file.c
> @@ -610,6 +610,7 @@ v9fs_vm_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf)
>   P9_DPRINTK(P9_DEBUG_VFS, "page %p fid %lx\n",
>  page, (unsigned long)filp->private_data);
>  
> + file_update_time(filp);
>   v9inode = V9FS_I(inode);
>   /* make sure the cache has finished storing the page */
>   v9fs_fscache_wait_on_page_write(inode, page);
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index e16215f..c272b91 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -6313,6 +6313,7 @@ int btrfs_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf)
>   }
>  
>   ret = VM_FAULT_NOPAGE; /* make the VM retry the fault */
> + file_update_time(vma->vm_file);
>  again:
>   lock_page(page);
>   size = i_size_read(inode);
> diff --git a/fs/buffer.c b/fs/buffer.c
> index 1a80b04..c949a11 100644
> --- a/fs/buffer.c
> +++ b/fs/buffer.c
> @@ -2347,6 +2347,7 @@ int __block_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf,
>   loff_t size;
>   int ret;
>  
> + file_update_time(vma->vm_file);
>   lock_page(page);
>   size = i_size_read(inode);
>   if ((page->mapping != inode->i_mapping) ||
> diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c
> index 5a3953d..1cf89aa 100644
> --- a/fs/ceph/addr.c
> +++ b/fs/ceph/addr.c
> @@ -1137,6 +1137,7 @@ static int ceph_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf)
>   dout("page_mkwrite %p %llu~%llu page %p idx %lu\n", inode,
>off, len, page, page->index);
>  
> + file_update_time(vma->vm_file);
>   lock_page(page);
>  
>   ret = VM_FAULT_NOPAGE;
> diff --git a/fs/cifs/file.c b/fs/cifs/file.c
> index 9f41a10..410b11c 100644
> --- a/fs/cifs/file.c
> +++ b/fs/cifs/file.c
> @@ -1910,6 +1910,7 @@ cifs_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf)
>  {
>   struct page *page = vmf->page;
>  
> + file_update_time(vma->vm_file);
>   lock_page(page);
>   return VM_FAULT_LOCKED;
>  }
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index 986e238..e995f2c 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -4372,6 +4372,7 @@ int ext4_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf)
>   goto out_ret;
>   }
>  
> + file_update_time(vma->vm_file);
>   lock_page(page);
>   size = i_size_read(inode);
>   /* Page got truncated from under us? */
> diff --git a/fs/fuse/file.c b/fs/fuse/file.c
> index 594f07a..4f92651 100644
> --- a/fs/fuse/file.c
> +++ b/fs/fuse/file.c
> @@ -1323,6 +1323,7 @@ static int fuse_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf)
>*/
>   struct inode *inode = vma->vm_file->f_mapping->host;
>  
> + file_update_time(vma->vm_file);
>   fuse_wait_on_page_writeback(inode, page->index);
>   return 0;
>  }
> diff --git a/fs/gfs2/file.c b/fs/gfs2/file.c
> index edeb9e8..ba22704 100644
> --- a/fs/gfs2/file.c
> +++ b/fs/gfs2/file.c
> @@ -359,6 +359,7 @@ static int gfs2_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf)
>   struct gfs2_alloc *al;
>   int ret;
>  
> + file_update_time(vma->vm_file);
>   gfs2_holder_init(ip->i_gl, LM_ST_EXCLUSIVE, 0, &gh);
>   ret = gfs2_glock_nq(&gh);

Re: [PATCH 2/2] Btrfs: set the i_nlink to 2 for an initial dir inode

2011-11-29 Thread Chris Mason
On Tue, Nov 29, 2011 at 02:04:37PM +0800, Jeff Liu wrote:
> Please ignore this patch for now; it can corrupt the file system so
> that it fails to mount again. Sorry for the noise!

Directories always have a link count of 1 in btrfs.  This tells find not
to use the link count as the count of subdirectories in the directory.
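
For context: on traditional Unix filesystems a directory's link count is 2
plus the number of its subdirectories (its own "." entry, its name in the
parent, and one ".." per subdirectory), and tools can use that to skip
needless stat calls. A link count of 1 signals that the shortcut does not
apply. A minimal Python sketch of that convention follows; subdir_hint and
subdir_hint_for are hypothetical names and this is not find's actual code.

```python
import os


def subdir_hint(st_nlink):
    """Number of subdirectories implied by a directory's link count.

    Returns None when the count cannot be trusted (fewer than 2 links),
    which is the case for btrfs directories as described above.
    """
    if st_nlink < 2:
        return None
    return st_nlink - 2


def subdir_hint_for(path):
    """Apply subdir_hint to a real directory on the local machine."""
    return subdir_hint(os.stat(path).st_nlink)


print(subdir_hint(5))   # a directory with three subdirectories
print(subdir_hint(1))   # btrfs-style: no information, caller must scan
```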

-chris


[PATCH] fs: push file_update_time into ->page_mkwrite

2011-11-29 Thread Josef Bacik
The fault code has been calling file_update_time after ->page_mkwrite after it
drops the page lock, but this is annoying because this calls mark_inode_dirty
which can fail in Btrfs, so we want to be able to do these updates in
->page_mkwrite so we can get an error back to the user.  So get rid of the
file_update_time calls in the fault code and push it into everybody who has a
->page_mkwrite.  I didn't do this for ubifs because it appears that ubifs
already updates the time itself in ->page_mkwrite, presumably for the same
reasons as btrfs, so I left it as is.  Thanks,

Signed-off-by: Josef Bacik 
---
 fs/9p/vfs_file.c |1 +
 fs/btrfs/inode.c |1 +
 fs/buffer.c  |1 +
 fs/ceph/addr.c   |1 +
 fs/cifs/file.c   |1 +
 fs/ext4/inode.c  |1 +
 fs/fuse/file.c   |1 +
 fs/gfs2/file.c   |1 +
 fs/nfs/file.c|1 +
 fs/nilfs2/file.c |1 +
 fs/ocfs2/mmap.c  |1 +
 fs/sysfs/bin.c   |1 +
 kernel/events/core.c |1 +
 mm/memory.c  |8 
 security/selinux/selinuxfs.c |1 +
 15 files changed, 14 insertions(+), 8 deletions(-)

diff --git a/fs/9p/vfs_file.c b/fs/9p/vfs_file.c
index 62857a8..ae2968f 100644
--- a/fs/9p/vfs_file.c
+++ b/fs/9p/vfs_file.c
@@ -610,6 +610,7 @@ v9fs_vm_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf)
P9_DPRINTK(P9_DEBUG_VFS, "page %p fid %lx\n",
   page, (unsigned long)filp->private_data);
 
+   file_update_time(filp);
v9inode = V9FS_I(inode);
/* make sure the cache has finished storing the page */
v9fs_fscache_wait_on_page_write(inode, page);
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index e16215f..c272b91 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -6313,6 +6313,7 @@ int btrfs_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf)
}
 
ret = VM_FAULT_NOPAGE; /* make the VM retry the fault */
+   file_update_time(vma->vm_file);
 again:
lock_page(page);
size = i_size_read(inode);
diff --git a/fs/buffer.c b/fs/buffer.c
index 1a80b04..c949a11 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -2347,6 +2347,7 @@ int __block_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf,
loff_t size;
int ret;
 
+   file_update_time(vma->vm_file);
lock_page(page);
size = i_size_read(inode);
if ((page->mapping != inode->i_mapping) ||
diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c
index 5a3953d..1cf89aa 100644
--- a/fs/ceph/addr.c
+++ b/fs/ceph/addr.c
@@ -1137,6 +1137,7 @@ static int ceph_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf)
dout("page_mkwrite %p %llu~%llu page %p idx %lu\n", inode,
 off, len, page, page->index);
 
+   file_update_time(vma->vm_file);
lock_page(page);
 
ret = VM_FAULT_NOPAGE;
diff --git a/fs/cifs/file.c b/fs/cifs/file.c
index 9f41a10..410b11c 100644
--- a/fs/cifs/file.c
+++ b/fs/cifs/file.c
@@ -1910,6 +1910,7 @@ cifs_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf)
 {
struct page *page = vmf->page;
 
+   file_update_time(vma->vm_file);
lock_page(page);
return VM_FAULT_LOCKED;
 }
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 986e238..e995f2c 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -4372,6 +4372,7 @@ int ext4_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf)
goto out_ret;
}
 
+   file_update_time(vma->vm_file);
lock_page(page);
size = i_size_read(inode);
/* Page got truncated from under us? */
diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 594f07a..4f92651 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -1323,6 +1323,7 @@ static int fuse_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf)
 */
struct inode *inode = vma->vm_file->f_mapping->host;
 
+   file_update_time(vma->vm_file);
fuse_wait_on_page_writeback(inode, page->index);
return 0;
 }
diff --git a/fs/gfs2/file.c b/fs/gfs2/file.c
index edeb9e8..ba22704 100644
--- a/fs/gfs2/file.c
+++ b/fs/gfs2/file.c
@@ -359,6 +359,7 @@ static int gfs2_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf)
struct gfs2_alloc *al;
int ret;
 
+   file_update_time(vma->vm_file);
gfs2_holder_init(ip->i_gl, LM_ST_EXCLUSIVE, 0, &gh);
ret = gfs2_glock_nq(&gh);
if (ret)
diff --git a/fs/nfs/file.c b/fs/nfs/file.c
index 28b8c3f..bfa0c48 100644
--- a/fs/nfs/file.c
+++ b/fs/nfs/file.c
@@ -571,6 +571,7 @@ static int nfs_vm_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf)
filp->f_mapping->host->i_ino,
(long long)page_offset(page));
 
+   file_update_time(filp);
/* make sure the cache has finished storing the page */
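
The ordering the patch description argues for can be modelled in plain
userspace C.  This is only a sketch under stated assumptions: every name
below is a stand-in rather than a kernel definition, and it assumes a
time-update helper that returns an error code, whereas the hunks above
call file_update_time() without checking any result.

```c
#include <errno.h>
#include <stdbool.h>

/* Stand-in result codes for this model; not the kernel's VM_FAULT_* values. */
enum fault_result { FAULT_LOCKED, FAULT_SIGBUS };

/* Stand-in for the file being faulted on. */
struct model_file {
	bool update_will_fail;	/* simulate a failing inode update */
	long mtime;		/* pretend timestamp */
	bool page_locked;	/* did the handler go on to lock the page? */
};

/* Assumed variant of the time update that can report failure. */
static int model_file_update_time(struct model_file *f, long now)
{
	if (f->update_will_fail)
		return -ENOSPC;
	f->mtime = now;
	return 0;
}

/*
 * Model of a ->page_mkwrite handler: update the timestamp first, before
 * taking the page lock, so that a failure can be turned into a fault
 * result the caller sees.
 */
static enum fault_result model_page_mkwrite(struct model_file *f, long now)
{
	if (model_file_update_time(f, now) != 0)
		return FAULT_SIGBUS;
	f->page_locked = true;
	return FAULT_LOCKED;
}
```

With the update done by the generic fault code after the handler returns,
the failing case in this model would have nowhere to report its error.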

Re: kernel BUG at /build/buildd/linux-3.2.0/fs/btrfs/extent-tree.c:4816!

2011-11-29 Thread Karl Mardoff Kittilsen

On 29 Nov 2011 16:12, Chris Mason wrote:

> On Tue, Nov 29, 2011 at 02:39:26AM +0100, Karl Mardoff Kittilsen wrote:
> > Hi!
> >
> > Sending a mail on this issue, as advised on IRC.
> >
> > My /home file system fails to mount and the kernel seem to freeze
> > and I need to do the Alt+SysRq RSNEIUB routine to boot it safely.
> > The corruption happened on a 3.2-rc kernel and Ubuntu
> > 11.10, but I am now running on Ubuntu 12.04 with the 3.2.0-2-generic
> > kernel to see if that helped, it did not.
> > btrfsck from the latest btrfs-tools returns:
> >
> > karl@karl-precise:~/git/btrfs-progs$ sudo ./btrfsck /dev/md0
> > ref mismatch on [2176962560 8192] extent item 480, found 1
> > Incorrect local backref count on 2176970752 root 5 owner 2101705
> > offset 368640 found 1 wanted 3925868545
> > backpointer mismatch on [2176970752 4096]
>
> So the crashes below were because we tried to free one of these extents.
> You have two extents whose reference counts are way off.
>
> Unfortunately this is stored on disk, so different kernels aren't going
> to fix it (yet).  One of the extents is in a file with inode number
> 2101705, and the other is in a btree block (2176962560).
>
> I'll be able to fix this soon, but we can also make a patch that changes
> those BUG_ONs to just deal with the mismatch.  The worst case here would
> be leaking those two extents, about 12K of data.
>
> -chris


Thank you for looking into it; that does sound really promising. I
am available to test any patches you want tested. Is there anything else
I can do to help get this issue fixed?


Karl


Re: btrfs/git question.

2011-11-29 Thread Chris Mason
On Tue, Nov 29, 2011 at 09:33:37AM +0700, Fajar A. Nugraha wrote:
> On Tue, Nov 29, 2011 at 8:58 AM, Phillip Susi  wrote:
> > On 11/28/2011 12:53 PM, Ken D'Ambrosio wrote:
> >> Seems I've picked up a wireless regression, and randomly drop my WiFi
> >> connection with more recent kernels.  While I'd love to try to track
> >> down the issue, the sporadic nature makes it difficult.  But I don't
> >> want to revert to a flat-out old kernel because of all the btrfs
> >> modifications.  Is it possible using git to add *just* btrfs patches
> >> to an older kernel?
> >
> > Sure: use git rebase to apply the patches to the older kernel.
> 
> ... or use 3.1.2, and get ONLY fs/btrfs from Chris' for-linus tree,
> compile it out-of-tree, and use it to replace the original btrfs.ko.

If you're on a 3.1 kernel, you can pull my for-linus directly on top of
it with git pull.  I always keep a btrfs tree against the previous
kernel so that people can use the latest btrfs goodness without having
to use an rc kernel.

-chris


Re: kernel BUG at /build/buildd/linux-3.2.0/fs/btrfs/extent-tree.c:4816!

2011-11-29 Thread Chris Mason
On Tue, Nov 29, 2011 at 02:39:26AM +0100, Karl Mardoff Kittilsen wrote:
> Hi!
> 
> Sending a mail on this issue, as advised on IRC.
> 
> My /home file system fails to mount and the kernel seem to freeze
> and I need to do the Alt+SysRq RSNEIUB routine to boot it safely.
> The corruption happened on a 3.2-rc kernel and Ubuntu
> 11.10, but I am now running on Ubuntu 12.04 with the 3.2.0-2-generic
> kernel to see if that helped, it did not.
> btrfsck from the latest btrfs-tools returns:
> 
> karl@karl-precise:~/git/btrfs-progs$ sudo ./btrfsck /dev/md0
> ref mismatch on [2176962560 8192] extent item 480, found 1
> Incorrect local backref count on 2176970752 root 5 owner 2101705
> offset 368640 found 1 wanted 3925868545
> backpointer mismatch on [2176970752 4096]

So the crashes below were because we tried to free one of these extents.
You have two extents whose reference counts are way off.

Unfortunately this is stored on disk, so different kernels aren't going
to fix it (yet).  One of the extents is in a file with inode number
2101705, and the other is in a btree block (2176962560).
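
As a quick sanity check on the numbers in the btrfsck output, the short C
sketch below does nothing more than arithmetic on the printed values.  The
struct and helper names are invented for illustration, and the sketch says
nothing about how the count came to be wrong.

```c
#include <stdbool.h>
#include <stdint.h>

/* Byte range as printed by btrfsck: [start length]. */
struct extent {
	uint64_t start;
	uint64_t len;
};

/* The two ranges copied from the output quoted above. */
static const struct extent reported[2] = {
	{ 2176962560ULL, 8192ULL },	/* the btree block */
	{ 2176970752ULL, 4096ULL },	/* the file extent, owner 2101705 */
};

/* True when b begins exactly where a ends. */
static bool extents_adjacent(const struct extent *a, const struct extent *b)
{
	return a->start + a->len == b->start;
}

/* Low 24 bits of a value, to compare a count against a small expected one. */
static uint64_t low24(uint64_t v)
{
	return v & 0xFFFFFFULL;
}
```

Run against these values, the two ranges turn out to be adjacent on disk,
and the "wanted" count 3925868545 is 0xEA000001 in hex, so its low 24 bits
equal the count of 1 that btrfsck actually found.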

I'll be able to fix this soon, but we can also make a patch that changes
those BUG_ONs to just deal with the mismatch.  The worst case here would
be leaking those two extents, about 12K of data.
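
A userspace model of "deal with the mismatch instead of BUG_ON" might look
like the sketch below.  It is an illustration under assumptions, not the
eventual btrfs patch: the struct, the field mapping of the btrfsck numbers,
and both function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical record: a byte range plus the two disagreeing counts. */
struct model_extent {
	uint64_t start;
	uint64_t len;
	uint64_t refs_on_disk;	/* count stored in the extent item */
	uint64_t refs_found;	/* count found by walking the references */
};

/*
 * Instead of asserting that the counts agree, warn and leave the extent
 * allocated.  Returns the number of bytes leaked (0 if it was freed).
 */
static uint64_t model_free_extent(const struct model_extent *e)
{
	if (e->refs_on_disk != e->refs_found) {
		fprintf(stderr, "ref mismatch on [%llu %llu], leaking extent\n",
			(unsigned long long)e->start,
			(unsigned long long)e->len);
		return e->len;
	}
	return 0;
}

/* Total bytes leaked across a set of extents. */
static uint64_t model_total_leaked(const struct model_extent *v, size_t n)
{
	uint64_t total = 0;

	for (size_t i = 0; i < n; i++)
		total += model_free_extent(&v[i]);
	return total;
}
```

Fed the two extents from the report (8192 and 4096 bytes), the model leaks
12288 bytes in total, which matches the "about 12K" worst case.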

-chris