date:20140820

[PATCH] btrfs: Fix a deadlock in btrfs_dev_replace_finishing()

2014-08-20 Thread Qu Wenruo

btrfs-transacion:5657
[stack snip]
btrfs_bio_map()
  btrfs_bio_counter_inc_blocked()
    percpu_counter_inc(&fs_info->bio_counter)  ###bio_counter > 0(A)
  __btrfs_bio_map()
    btrfs_dev_replace_lock()
      mutex_lock(dev_replace->lock)  ###wait mutex(B)

btrfs:32612
[stack snip]
btrfs_dev_replace_start()
  btrfs_dev_replace_lock()
    mutex_lock(dev_replace->lock)  ###hold mutex(B)
  btrfs_dev_replace_finishing()
    btrfs_rm_dev_replace_blocked()
      wait until percpu_counter_sum == 0 ###wait on bio_counter(A)

This bug can be triggered quite easily by the following test script:
http://pastebin.com/MQmb37Cy

This patch will fix the ABBA problem by calling
btrfs_dev_replace_unlock() before btrfs_rm_dev_replace_blocked().

The consistency of btrfs devices list and their superblocks is protected
by device_list_mutex, not btrfs_dev_replace_lock/unlock().
So it is safe to move btrfs_dev_replace_unlock() before
btrfs_rm_dev_replace_blocked().

Reported-by: Zhao Lei 
Signed-off-by: Qu Wenruo 
Cc: Stefan Behrens 
---
 fs/btrfs/dev-replace.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c
index eea26e1..d738ff8 100644
--- a/fs/btrfs/dev-replace.c
+++ b/fs/btrfs/dev-replace.c
@@ -567,6 +567,8 @@ static int btrfs_dev_replace_finishing(struct btrfs_fs_info *fs_info,
btrfs_kobj_rm_device(fs_info, src_device);
btrfs_kobj_add_device(fs_info, tgt_device);
 
+   btrfs_dev_replace_unlock(dev_replace);
+
btrfs_rm_dev_replace_blocked(fs_info);
 
btrfs_rm_dev_replace_srcdev(fs_info, src_device);
@@ -580,7 +582,6 @@ static int btrfs_dev_replace_finishing(struct btrfs_fs_info *fs_info,
 * superblock is scratched out so that it is no longer marked to
 * belong to this filesystem.
 */
-   btrfs_dev_replace_unlock(dev_replace);
mutex_unlock(&root->fs_info->fs_devices->device_list_mutex);
mutex_unlock(&root->fs_info->chunk_mutex);
 
-- 
2.0.4
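The ABBA ordering described above can be reproduced with a small user-space model. This is a hedged sketch, not btrfs code. `replace_lock` stands in for `dev_replace->lock` (B) and `bio_counter` for `fs_info->bio_counter` (A). All names in the block are hypothetical, and a pthread mutex plus a C11 atomic replace the kernel mutex and percpu counter.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>

/* Hypothetical user-space model; none of these names are btrfs APIs.
 * replace_lock models dev_replace->lock (B),
 * bio_counter models fs_info->bio_counter (A). */
static pthread_mutex_t replace_lock = PTHREAD_MUTEX_INITIALIZER;
static atomic_int bio_counter;

/* Models btrfs_bio_map(): take a reference on (A), then wait for (B). */
static void *bio_map_thread(void *arg)
{
	(void)arg;
	atomic_fetch_add(&bio_counter, 1);	/* bio_counter > 0 (A) */
	pthread_mutex_lock(&replace_lock);	/* wait mutex (B) */
	/* ... map the bio ... */
	pthread_mutex_unlock(&replace_lock);
	atomic_fetch_sub(&bio_counter, 1);	/* bio done, drop (A) */
	return NULL;
}

/* Models the patched btrfs_dev_replace_finishing(); the caller already
 * holds replace_lock (B). */
static void replace_finishing_locked(void)
{
	/* ... swap source and target devices while holding (B) ... */
	pthread_mutex_unlock(&replace_lock);	/* the fix: drop (B) first */

	/* Models btrfs_rm_dev_replace_blocked(): wait for (A) to drain.
	 * Spinning here while still holding (B) would never finish,
	 * because bio_map_thread() cannot drop (A) until it gets (B). */
	while (atomic_load(&bio_counter) != 0)
		sched_yield();
}
```

To exercise the model, lock `replace_lock`, start `bio_map_thread`, and wait until `bio_counter` is non-zero so the thread holds (A) and blocks on (B). Then call `replace_finishing_locked()`. With the unlock moved below the wait loop (the pre-patch order), the same sequence hangs.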

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] Btrfs: fix crash on endio of reading corrupted block

2014-08-20 Thread Liu Bo

On Tue, Aug 19, 2014 at 04:42:42PM -0500, Eric Sandeen wrote:
> On 8/19/14, 10:33 AM, Liu Bo wrote:
> > The crash is
> > 
> > [ cut here ]
> > kernel BUG at fs/btrfs/extent_io.c:2124!
> > [...]
> > Workqueue: btrfs-endio normal_work_helper [btrfs]
> > RIP: 0010:[]  [] end_bio_extent_readpage+0xb45/0xcd0 [btrfs]
> > 
> > This is in fact a regression.
> 
> It'd be helpful to identify the commit, or at least kernel release, which caused
> the regression.

Okay, got it.

> 
> > It is because we forgot to increase @offset properly in reading corrupted block,
> > so that the @offset remains, and this leads to checksum errors while reading
> > left blocks queued up in the same bio, and then ends up with hitting the above
> > BUG_ON.
> 
> So does that mean that any checksum error on this path will crash the kernel?
> 
> That sounds like this bug has exposed a more fundamental problem, no?

Eric, you're right, I was hiding some details, now writing a new commit log...

thanks,
-liubo

> 
> Thanks,
> -Eric
> 
> > Reported-by: Chris Murphy 
> > Signed-off-by: Liu Bo 
> > ---
> >  fs/btrfs/extent_io.c | 1 +
> >  1 file changed, 1 insertion(+)
> > 
> > diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> > index 3af4966..be41e4d 100644
> > --- a/fs/btrfs/extent_io.c
> > +++ b/fs/btrfs/extent_io.c
> > @@ -2602,6 +2602,7 @@ static void end_bio_extent_readpage(struct bio *bio, int err)
> > test_bit(BIO_UPTODATE, &bio->bi_flags);
> > if (err)
> > uptodate = 0;
> > +   offset += len;
> > continue;
> > }
> > }
> > 
> 

Re: [PATCH] Btrfs: cleanup the same name in end_bio_extent_readpage

2014-08-20 Thread Liu Bo

On Tue, Aug 19, 2014 at 03:31:00PM -0400, Chris Mason wrote:
> On 08/19/2014 11:32 AM, Liu Bo wrote:
> > We've defined a 'offset' out of bio_for_each_segment_all.
> 
> This isn't causing problems though?  It should just be shadowing the
> bio_for_each_segment_all variable for the duration of the curlies.
> 
> No objection as a cleanup, just making sure I'm not missing something.

It didn't cause any problems in my tests at least, so it's just a cleanup.

thanks,
-liubo
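Chris's point about shadowing can be shown in isolation. Below is a hedged, hypothetical illustration, not the btrfs code and not the actual `bio_for_each_segment_all` macro. An `offset` declared inside the loop body hides the outer one until the closing brace, so nothing written inside the braces reaches the outer variable. Compilers can flag this pattern, for example GCC with `-Wshadow`.

```c
#include <assert.h>

/* Hypothetical illustration: the inner declaration shadows the outer
 * `offset` for the duration of the braces. */
static int shadow_demo(int iterations)
{
	int offset = 100;			/* outer variable */

	for (int i = 0; i < iterations; i++) {
		int offset = i * 10;		/* shadows the outer `offset` */

		offset += 1;			/* modifies only the inner copy */
		(void)offset;
	}
	return offset;				/* still the outer value: 100 */
}
```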


[PATCH v5] Btrfs: send, lower mem requirements for processing xattrs

2014-08-20 Thread Filipe Manana

Maximum xattr size can be up to nearly the leaf size. For an fs with a
leaf size larger than the page size, using kmalloc requires allocating
multiple pages that are contiguous, which might not be possible if
there's heavy memory fragmentation. Therefore fall back to vmalloc if
we fail to allocate with kmalloc. Also start with a smaller buffer size,
since xattr values typically are smaller than a page.

Reported-by: Chris Murphy 
Signed-off-by: Filipe Manana 
---

V2: Use is_vmalloc_addr() instead of keeping a boolean variable around.
V3: Use krealloc instead of kfree + kmalloc.
V4: Fixed a checkpatch warning about missing blank line after var declaration.
V5: Use kvfree() and pass __GFP_NOWARN to krealloc().

 fs/btrfs/send.c | 40 
 1 file changed, 32 insertions(+), 8 deletions(-)

diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
index 3c63b29..3290da9 100644
--- a/fs/btrfs/send.c
+++ b/fs/btrfs/send.c
@@ -1006,11 +1006,13 @@ static int iterate_dir_item(struct btrfs_root *root, struct btrfs_path *path,
int num;
u8 type;
 
-   if (found_key->type == BTRFS_XATTR_ITEM_KEY)
-   buf_len = BTRFS_MAX_XATTR_SIZE(root);
-   else
-   buf_len = PATH_MAX;
-
+   /*
+* Start with a small buffer (1 page). If later we end up needing more
+* space, which can happen for xattrs on a fs with a leaf size greater
+* then the page size, attempt to increase the buffer. Typically xattr
+* values are small.
+*/
+   buf_len = PATH_MAX;
buf = kmalloc(buf_len, GFP_NOFS);
if (!buf) {
ret = -ENOMEM;
@@ -1037,7 +1039,7 @@ static int iterate_dir_item(struct btrfs_root *root, struct btrfs_path *path,
ret = -ENAMETOOLONG;
goto out;
}
-   if (name_len + data_len > buf_len) {
+   if (name_len + data_len > BTRFS_MAX_XATTR_SIZE(root)) {
ret = -E2BIG;
goto out;
}
@@ -1045,12 +1047,34 @@ static int iterate_dir_item(struct btrfs_root *root, struct btrfs_path *path,
/*
 * Path too long
 */
-   if (name_len + data_len > buf_len) {
+   if (name_len + data_len > PATH_MAX) {
ret = -ENAMETOOLONG;
goto out;
}
}
 
+   if (name_len + data_len > buf_len) {
+   buf_len = name_len + data_len;
+   if (is_vmalloc_addr(buf)) {
+   vfree(buf);
+   buf = NULL;
+   } else {
+   char *tmp = krealloc(buf, buf_len,
+GFP_NOFS | __GFP_NOWARN);
+
+   if (!tmp)
+   kfree(buf);
+   buf = tmp;
+   }
+   if (!buf) {
+   buf = vmalloc(buf_len);
+   if (!buf) {
+   ret = -ENOMEM;
+   goto out;
+   }
+   }
+   }
+
read_extent_buffer(eb, buf, (unsigned long)(di + 1),
name_len + data_len);
 
@@ -1071,7 +1095,7 @@ static int iterate_dir_item(struct btrfs_root *root, struct btrfs_path *path,
}
 
 out:
-   kfree(buf);
+   kvfree(buf);
return ret;
 }
 
-- 
1.9.1
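The "start small, grow on demand" logic of the patch can be sketched in user space. This is a hedged analogue with a hypothetical helper name. `realloc()` stands in for `krealloc()`. User space has no `vmalloc()`, so the patch's fallback path, freeing a vmalloc'd buffer before reallocating and releasing either kind with `kvfree()`, is not modelled. Here a failed grow simply reports `-ENOMEM` and leaves the old buffer with the caller.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper: make sure *buf can hold `needed` bytes, growing it
 * only when the current size is too small (mirrors the patch's
 * "name_len + data_len > buf_len" check). */
static int ensure_buf(char **buf, size_t *buf_len, size_t needed)
{
	char *tmp;

	if (needed <= *buf_len)
		return 0;		/* current buffer is already big enough */

	tmp = realloc(*buf, needed);
	if (!tmp)
		return -ENOMEM;		/* *buf untouched; caller still frees it */

	*buf = tmp;
	*buf_len = needed;
	return 0;
}
```

A caller starts with a small allocation (the patch uses `PATH_MAX`) and calls `ensure_buf()` per item, so a large allocation happens only for the rare xattr that needs it.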


[PATCH v2] Btrfs: fix crash on endio of reading corrupted block

2014-08-20 Thread Liu Bo

The crash is

[ cut here ]
kernel BUG at fs/btrfs/extent_io.c:2124!
invalid opcode:  [#1] SMP
...
CPU: 3 PID: 88 Comm: kworker/u8:7 Not tainted 3.17.0-0.rc1.git0.1.fc22.x86_64 #1
Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
Workqueue: btrfs-endio normal_work_helper [btrfs]
task: 8800d7152700 ti: 8800d729c000 task.ti: 8800d729c000
RIP: 0010:[]  [] end_bio_extent_readpage+0xb45/0xcd0 [btrfs]
Call Trace:
  [] ? __enqueue_entity+0x78/0x80
  [] ? enqueue_entity+0x2e9/0x990
  [] bio_endio+0x6b/0xa0
  [] bio_endio_nodec+0x12/0x20
  [] end_workqueue_fn+0x37/0x40 [btrfs]
  [] normal_work_helper+0xbd/0x280 [btrfs]
  [] process_one_work+0x17e/0x430
  [] worker_thread+0x6b/0x4a0
  [] ? rescuer_thread+0x2a0/0x2a0
  [] kthread+0xea/0x100
  [] ? kthread_create_on_node+0x1a0/0x1a0
  [] ret_from_fork+0x7c/0xb0
  [] ? kthread_create_on_node+0x1a0/0x1a0

This is in fact a regression.

It happens because we forgot to advance @offset when reading a corrupted block,
so @offset stays unchanged. This leads to checksum errors for the remaining
blocks queued up in the same bio; btrfs then iterates over the other copies of
those blocks to get good data, and hits the BUG_ON() that we set to avoid
looking for good copies of blocks that have no problems.

Reported-by: Chris Murphy 
Signed-off-by: Liu Bo 
---
v2:
   - Improve the commit log to be clear, suggested by Eric.

 fs/btrfs/extent_io.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 3af4966..be41e4d 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2602,6 +2602,7 @@ static void end_bio_extent_readpage(struct bio *bio, int err)
test_bit(BIO_UPTODATE, &bio->bi_flags);
if (err)
uptodate = 0;
+   offset += len;
continue;
}
}
-- 
1.8.1.4
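The effect of the missing `offset += len;` can be reproduced with a simplified, hypothetical model of the per-segment walk. This is not the real `end_bio_extent_readpage()`. Each segment covers `len` bytes. The model assumes its checksum only "verifies" when it is looked up at the segment's true file offset (`expected_offset`).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one bio segment. */
struct seg {
	uint64_t expected_offset;	/* file offset the csum belongs to */
	uint32_t len;			/* bytes covered by the segment */
	int corrupted;			/* nonzero: the read already failed */
};

static size_t count_csum_errors(const struct seg *segs, size_t n, uint64_t start)
{
	uint64_t offset = start;
	size_t errors = 0;

	for (size_t i = 0; i < n; i++) {
		if (segs[i].corrupted) {
			errors++;
			offset += segs[i].len;	/* the fix: advance before continue */
			continue;
		}
		if (offset != segs[i].expected_offset)
			errors++;		/* csum looked up at the wrong offset */
		offset += segs[i].len;
	}
	return errors;
}
```

With four 4096-byte segments where only the second is corrupted, the walk reports one error. If the marked line is removed, every later segment is checked at a stale offset and also fails. That cascade is what sent btrfs looking for good copies of blocks that were fine.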


Re: Questions on using BtrFS for fileserver

2014-08-20 Thread Tomasz Chmielewski


we are thinking about using BtrFS on standard hardware for a
fileserver with about 50T (100T raw) of storage (25×4TByte).



I would recommend carefully reading this thread titled: "1 week to
rebuid 4x 3TB raid10 is a long time!"


So I have 2 x 2.6 TB devices in btrfs RAID-1, 716G used. Linux 3.16.

One of the disks failed.

"btrfs device delete missing /home" is taking 9 days so far, on an idle system:


root  4828  0.3  0.0  17844   260 pts/1    D+   Aug11  38:18 btrfs device delete missing /home


There is some kind of btrfs debug info printed in dmesg which seems to tell me that the operation is working, like:


[744657.598810] BTRFS info (device sda4): relocating block group 908951814144 flags 17
[744672.021612] BTRFS info (device sda4): found 4784 extents
[744688.604997] BTRFS info (device sda4): found 4784 extents
[744689.133397] BTRFS info (device sda4): relocating block group 91002968 flags 17
[744701.162678] BTRFS info (device sda4): found 4196 extents
[744725.000459] BTRFS info (device sda4): found 4196 extents


but other than that, the recovery time doesn't look optimistic to me, 
there is no ability to check the progress etc.



--
Tomasz Chmielewski
http://www.sslrack.com


Re: [btrfs] 8d875f95: xfstests.generic.226.fail

2014-08-20 Thread Miao Xie

On Tue, 19 Aug 2014 10:58:09 -0400, Chris Mason wrote:
> On 08/19/2014 10:23 AM, David Sterba wrote:
>> On Tue, Aug 19, 2014 at 07:58:20PM +0800, Fengguang Wu wrote:
>>> We noticed an xfstests failure on commit
>>>
>>> 8d875f95da43c6a8f18f77869f2ef26e9594fecc ("btrfs: disable strict file flushes for renames and truncates")
>>>
>>> It's 100% reproducible in the 5 test runs.
>>
>> Same here, different mkfs configurations.
>>
>> generic/226 28s ...[16:11:52] [16:12:55] - output mismatch (see /root/xfstests/results//generic/226.out.bad)
>> --- tests/generic/226.out   2013-05-29 17:16:03.0 +0200
>> +++ /root/xfstests/results//generic/226.out.bad 2014-08-19 16:12:55.0 +0200
>> @@ -1,6 +1,8 @@
>>  QA output created by 226
>>  --> mkfs 256m filesystem
>>  --> 16 buffered 64m writes in a loop
>> -1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
>> +1 2 3 4 pwrite64: No space left on device
>> +5 6 7 8 9 10 11 12 pwrite64: No space left on device
>> +13 14 15 16
>>
>> enospc on a small filesystem (256M)
> 
> I'm calling filemap flush more often, but otherwise everything else is
> the same.  I'll take a look.

The above patch also introduced a performance regression (~70% down).
We can reproduce this regression with fio; here is the config:

[global]
ioengine=falloc
iodepth=1
direct=0
buffered=0
directory=
nrfiles=1
filesize=100m
group_reporting

[sequential aio-dio write]
stonewall
ioengine=posixaio
numjobs=1
iodepth=128
buffered=0
direct=0
rw=write
bs=64k
filename=fragmented_file

I found the problem is caused by the following function:

int btrfs_release_file(struct inode *inode, struct file *filp)
{
...
filemap_flush(inode->i_mapping);
return 0;
}

I don't think we need to flush the file in most situations. Ext4 flushes the file
only after someone has truncated it to zero length. I don't know the real reason
why ext4 flushes the file only after it is truncated; someone said it is to
reduce the risk that users find a zero-length file after a crash that happens
after a truncate-write-close sequence.

If we change btrfs_release_file to follow ext4's implementation, both the
xfstests generic/226 failure and the performance regression can be fixed.
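The "flag on truncate to zero, flush on close" pattern described above can be modelled in user space. This is a hedged sketch. The struct, the bit number, and the helper names are hypothetical, not btrfs or ext4 definitions. A C11 atomic replaces the kernel's `runtime_flags` bit operations, and a counter stands in for `filemap_flush()`.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define MODEL_ORDERED_DATA_CLOSE 0	/* hypothetical bit number */

struct model_inode {
	atomic_ulong runtime_flags;
	unsigned int flushes;		/* counts simulated filemap_flush() calls */
};

/* Atomically clear `bit` and report whether it was set, in the spirit
 * of the kernel's test_and_clear_bit(). */
static bool model_test_and_clear_bit(int bit, atomic_ulong *flags)
{
	unsigned long mask = 1UL << bit;

	return (atomic_fetch_and(flags, ~mask) & mask) != 0;
}

/* Truncate: only shrinking a non-empty file to zero arms the flag. */
static void model_setsize(struct model_inode *inode, unsigned long oldsize,
			  unsigned long newsize)
{
	if (oldsize > 0 && newsize == 0)
		atomic_fetch_or(&inode->runtime_flags,
				1UL << MODEL_ORDERED_DATA_CLOSE);
}

/* Release (close): flush only if a truncate armed the flag. */
static void model_release(struct model_inode *inode)
{
	if (model_test_and_clear_bit(MODEL_ORDERED_DATA_CLOSE,
				     &inode->runtime_flags))
		inode->flushes++;
}
```

An ordinary close does no flush. A close after a truncate to zero flushes once, and the test-and-clear makes the next close cheap again. That is the shape of the `BTRFS_INODE_ORDERED_DATA_CLOSE` check Chris restores later in this thread.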

Thanks
Miao

> 
> -chris


Re: [PATCH 1/3] btrfs-progs: random fixes of btrfs-filesystem documentation

2014-08-20 Thread David Sterba

On Tue, Aug 19, 2014 at 10:33:44AM -0500, Eric Sandeen wrote:
> Seems like using /proc/partitions would make more sense in that case
> than a recursive scan of every file under /dev, wouldn't it?
> Any details on those reports?

It does make sense.

> I'm just wondering when you might possibly have success looking deep
> into the /dev tree if you didn't have success in /proc/partitions.

I haven't figured out any advantage of /dev.

> It just seems a bit bizarre to have so many ways to get the same info.

I think keeping blkid as the default and /proc/partitions as the fallback
should cover the use cases. This means that the option -d stays, and the
scanning method can be changed to BTRFS_SCAN_PROC (fi show and dev scan).
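Scanning /proc/partitions only requires picking the device name out of each data row (`major minor #blocks name`). The helper below is a hedged, hypothetical sketch, not the btrfs-progs `BTRFS_SCAN_PROC` implementation. It takes one line of text, so it can be tested without reading the real file.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: extract the device name from one /proc/partitions
 * data row. Returns 1 and fills `name` on success, 0 for the header,
 * blank lines, or a name that does not fit. */
static int parse_partitions_row(const char *line, char *name, size_t name_len)
{
	unsigned int major, minor;
	unsigned long long blocks;
	char buf[64];

	if (sscanf(line, "%u %u %llu %63s", &major, &minor, &blocks, buf) != 4)
		return 0;		/* header ("major minor ...") or blank */
	if (strlen(buf) >= name_len)
		return 0;		/* caller's buffer too small */
	strcpy(name, buf);
	return 1;
}
```

A caller could read the file line by line with `fgets()` and probe `/dev/<name>` for a btrfs superblock only for rows this helper accepts. That avoids a recursive walk of /dev.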

Re: Questions on using BtrFS for fileserver

2014-08-20 Thread Benjamin O'Connor

As a counter-argument, I use BTRFS on a filesystem with about 280TB raw right now as a
fileserver. Note that this is mostly transient data (raid 0), and I have stuck with 3.14,
hearing the horror stories of 3.15/3.16 locking up.

Even at that size (29TB LUNs), I have been able to add and remove devices and rebalance
with no issues other than it causing increased IO and taking several weeks to move that
much data around.

Definitely avoid 3.15/3.16 and test out your workload first if possible to make sure it
performs properly. Also build the filesystem and put data on it and test device
removals/rebuilds to make sure it works with your OS, btrfs tools, and kernel version.
All that being said it works great for us where we value COW and expansion/rebalancing
over performance and redundancy.

The filesystem is exported via NFS and rsync to over 200 clients over 10 Gb/s Ethernet,
and hits around 5-7 Gb/s balanced between reads and writes.

One of our alternative file storage needs that I'd also hoped to move to BTRFS consisted
of subtrees of 255 directories, each with 255 directories under them, and 255 directories
under them with 1 file in each (don't ask). That *did not* work well under BTRFS --
probably due to the metadata juggling required in creating or removing any one file that
far down in such a bizarre tree. We kept that particular area under XFS.

-ben

--
-
Benjamin O'Connor
TechOps Systems Administrator
TripAdvisor Media Group

bocon...@tripadvisor.com
c. 617-312-9072
-


Re: [btrfs] 8d875f95: xfstests.generic.226.fail

2014-08-20 Thread Chris Mason



On 08/20/2014 06:52 AM, Miao Xie wrote:
> On Tue, 19 Aug 2014 10:58:09 -0400, Chris Mason wrote:
>> On 08/19/2014 10:23 AM, David Sterba wrote:
>>> On Tue, Aug 19, 2014 at 07:58:20PM +0800, Fengguang Wu wrote:
>>>> We noticed an xfstests failure on commit
>>>>
>>>> 8d875f95da43c6a8f18f77869f2ef26e9594fecc ("btrfs: disable strict file flushes for renames and truncates")
>>>>
>>>> It's 100% reproducible in the 5 test runs.
>>>
>>> Same here, different mkfs configurations.
>>>
>>> generic/226 28s ...[16:11:52] [16:12:55] - output mismatch (see /root/xfstests/results//generic/226.out.bad)
>>> --- tests/generic/226.out   2013-05-29 17:16:03.0 +0200
>>> +++ /root/xfstests/results//generic/226.out.bad 2014-08-19 16:12:55.0 +0200
>>> @@ -1,6 +1,8 @@
>>>  QA output created by 226
>>>  --> mkfs 256m filesystem
>>>  --> 16 buffered 64m writes in a loop
>>> -1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
>>> +1 2 3 4 pwrite64: No space left on device
>>> +5 6 7 8 9 10 11 12 pwrite64: No space left on device
>>> +13 14 15 16
>>>
>>> enospc on a small filesystem (256M)
>>
>> I'm calling filemap flush more often, but otherwise everything else is
>> the same.  I'll take a look.
> 
> I found the problem is caused by the following function:
> 
> int btrfs_release_file(struct inode *inode, struct file *filp)
> {
>   ...
>   filemap_flush(inode->i_mapping);
>   return 0;
> }
> 
> I don't think we need flush file at most situation. Ext4 flushes the file only
> after someone truncate the file to be zero-length, I don't know the real reason
> why ext4 flush the file only after the file is truncated, someone said it is to
> reduce the risk that the users find a zero-length file after a crash, which happens
> after truncate-write-close process.
> 
> If we change btrfs_release_file by ext4's implementation, both the failure of
> xfstests's generic/226  and performance regression can be fixed.
> 

You're completely right, my original had more checks here and I stripped
them out by accident.  Fixing, thanks!

-chris


Re: [PATCH] Btrfs: show real function name in btrfs workqueue tracepoint

2014-08-20 Thread David Sterba

On Fri, Aug 15, 2014 at 11:38:06PM +0800, Liu Bo wrote:
> Use %pf instead of %p, just same as kernel workqueue tracepoints.
> 
> Signed-off-by: Liu Bo 

Reviewed-by: David Sterba 

Re: [PATCH] Btrfs: send, lower mem requirements for processing xattrs

2014-08-20 Thread David Sterba

On Sun, Aug 10, 2014 at 11:52:09PM +, Josef Bacik wrote:
> Sigh I can only top post from my phone.

Can you at least snip the original text?

Re: [PATCH 8/8 v2] btrfs: rename total_bytes to avoid confusion

2014-08-20 Thread David Sterba

On Wed, Aug 20, 2014 at 10:54:17AM +0800, Anand Jain wrote:
> we are assigning number_devices to the total_bytes,
> that's very confusing for a moment
> 
> Signed-off-by: Anand Jain 

Thanks.

Reviewed-by: David Sterba 

[PATCH] Btrfs: fix filemap_flush call in btrfs_file_release

2014-08-20 Thread Chris Mason


We should only be flushing on close if the file was flagged as needing
it during truncate.  I broke this with my ordered data vs transaction
commit deadlock fix.

Thanks to Miao Xie for catching this.

Signed-off-by: Chris Mason 
Reported-by: Miao Xie 
Reported-by: Fengguang Wu 

diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index f15c13f..36861b7 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -1840,7 +1840,15 @@ int btrfs_release_file(struct inode *inode, struct file *filp)
 {
if (filp->private_data)
btrfs_ioctl_trans_end(filp);
-   filemap_flush(inode->i_mapping);
+   /*
+* ordered_data_close is set by settattr when we are about to truncate
+* a file from a non-zero size to a zero size.  This tries to
+* flush down new bytes that may have been written if the
+* application were using truncate to replace a file in place.
+*/
+   if (test_and_clear_bit(BTRFS_INODE_ORDERED_DATA_CLOSE,
+  &BTRFS_I(inode)->runtime_flags))
+   filemap_flush(inode->i_mapping);
return 0;
 }
 

[PATCH -rc2] btrfs: remove stale code after removing ordered operations

2014-08-20 Thread David Sterba

The commit "btrfs: disable strict file flushes for renames and
truncates" (8d875f95da43c6a8f18f77869f2ef26e9594fecc) left some unused
code and defines.

Signed-off-by: David Sterba 
---

This is a cleanup after a 3.17-rc1 patch.

 fs/btrfs/btrfs_inode.h |  8 
 fs/btrfs/ctree.h   |  7 ---
 fs/btrfs/inode.c   | 10 --
 3 files changed, 25 deletions(-)

diff --git a/fs/btrfs/btrfs_inode.h b/fs/btrfs/btrfs_inode.h
index 43527fd78825..ee9d37d4f883 100644
--- a/fs/btrfs/btrfs_inode.h
+++ b/fs/btrfs/btrfs_inode.h
@@ -25,14 +25,6 @@
 #include "ordered-data.h"
 #include "delayed-inode.h"
 
-/*
- * ordered_data_close is set by truncate when a file that used
- * to have good data has been truncated to zero.  When it is set
- * the btrfs file release call will add this inode to the
- * ordered operations list so that we make sure to flush out any
- * new data the application may have written before commit.
- */
-#define BTRFS_INODE_ORDERED_DATA_CLOSE 0
 #define BTRFS_INODE_ORPHAN_META_RESERVED   1
 #define BTRFS_INODE_DUMMY  2
 #define BTRFS_INODE_IN_DEFRAG  3
diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 8e29b614fe93..69cd3262193e 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -62,13 +62,6 @@ struct btrfs_ordered_sum;
 
 #define BTRFS_COMPAT_EXTENT_TREE_V0
 
-/*
- * files bigger than this get some pre-flushing when they are added
- * to the ordered operations list.  That way we limit the total
- * work done by the commit
- */
-#define BTRFS_ORDERED_OPERATIONS_FLUSH_LIMIT (8 * 1024 * 1024)
-
 /* holds pointers to all of the tree roots */
 #define BTRFS_ROOT_TREE_OBJECTID 1ULL
 
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 03708ef3deef..ded36a4e47a8 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -4545,16 +4545,6 @@ static int btrfs_setsize(struct inode *inode, struct iattr *attr)
ret = btrfs_update_inode(trans, root, inode);
btrfs_end_transaction(trans, root);
} else {
-
-   /*
-* We're truncating a file that used to have good data down to
-* zero. Make sure it gets into the ordered flush list so that
-* any new writes get down to disk quickly.
-*/
-   if (newsize == 0)
-   set_bit(BTRFS_INODE_ORDERED_DATA_CLOSE,
-   &BTRFS_I(inode)->runtime_flags);
-
/*
 * 1 for the orphan item we're going to add
 * 1 for the orphan item deletion.
-- 
1.9.0


Re: [PATCH -rc2] btrfs: remove stale code after removing ordered operations

2014-08-20 Thread Chris Mason

On 08/20/2014 11:03 AM, David Sterba wrote:
> The commit "btrfs: disable strict file flushes for renames and
> truncates" (8d875f95da43c6a8f18f77869f2ef26e9594fecc) left some unused
> code and defines.

See my followup for the perf regression, we still need these.  I was
never able to trigger the enospc problem with xfstests here, but I did
double check that we're flushing or not properly now.

-chris

Re: Questions on using BtrFS for fileserver

2014-08-20 Thread Austin S Hemmelgarn

On 08/19/2014 05:38 PM, Andrej Manduch wrote:
> Hi,
> 
> On 08/19/2014 06:21 PM, M G Berberich wrote:
>> · Are there any reports/papers/web-pages about BtrFS-systems this size
>>   in use? Praises, complaints, performance-reviews, whatever…
> 
> I don't know about papers or benchmarks but a few weeks ago there was a
> guy who had a problem with really long mount times with btrfs of similar size.
> https://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg36226.html
> 
> And I would not recommend 3TB disks. *I'm not btrfs dev* but as far as I
> know there is quite a difference between rebuilding a disk on real RAID and
> btrfs RAID. The problem is btrfs has RAID on filesystem level not on hw
> level so there is bigger mechanical overhead on drives and thus it takes
> significantly longer than regular RAID.
It really surprises me that so many people come to this conclusion, but
maybe they don't provide as much slack space as I do on my systems.  In
general you will only have a longer rebuild on BTRFS than on hardware
RAID if the filesystem is more than about 50% full.  On my desktop array
(4x 1TB disks using BTRFS RAID10), I've replaced disks before and it
took less than an hour for the operation.  Of course that array is
usually not more than 10% full.  Interestingly, it took less time to
rebuild this array the last time I lost a disk than it did back when it
was 3x 1TB disks in a BTRFS RAID1, so things might improve overall with
a larger number of disks in the array.

[PATCH] btrfs: remove stale define after removing ordered operations

2014-08-20 Thread David Sterba

Last user removed in commit "btrfs: disable strict file flushes for
renames and truncates" (8d875f95da43c6a8f18f77869f2ef26e9594fecc).

Signed-off-by: David Sterba 
---

 fs/btrfs/ctree.h | 7 ---
 1 file changed, 7 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 8e29b614fe93..69cd3262193e 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -62,13 +62,6 @@ struct btrfs_ordered_sum;
 
 #define BTRFS_COMPAT_EXTENT_TREE_V0
 
-/*
- * files bigger than this get some pre-flushing when they are added
- * to the ordered operations list.  That way we limit the total
- * work done by the commit
- */
-#define BTRFS_ORDERED_OPERATIONS_FLUSH_LIMIT (8 * 1024 * 1024)
-
 /* holds pointers to all of the tree roots */
 #define BTRFS_ROOT_TREE_OBJECTID 1ULL
 
-- 
1.9.0

