Re: [PATCH] Btrfs: Add code to support file creation time.

2012-09-30 Thread Arne Jansen
On 07/04/12 13:04, Alexander Block wrote:
 On Wed, Jul 4, 2012 at 9:56 AM, Li Zefan lize...@huawei.com wrote:
 On 2012/7/4 15:18, chandan r wrote:

 This patch adds a new member to the 'struct btrfs_inode' structure to hold
 the file creation time.



 Well, how do users use this file creation time? There's no syscall and 
 there's
 no ioctl that exports this information. That xstat syscall hasn't been 
 accepted,
 so you can revise and repost the patch when you see it happens.
 In my opinion we should still include this patch. Currently, otime is never 
 even
 initialized, having undefined values. If it ever gets possible to
 access otime, we
 would at least have some inodes with valid otime fields.

I'll second that, even if by now the fields get correctly initialized.
Why should we zero the fields instead of setting them to the correct
values?

-Arne

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


MDaemon Notification -- Attachment Removed

2012-09-30 Thread Postmaster
The following message contained restricted attachment(s) which have been 
removed:

From  : linux-btrfs@vger.kernel.org
To: bh...@vista.gov.vn
Subject   : bh...@vista.gov.vn
Message-ID: 

Attachment(s) removed:
-
mail.zip (mail.scr)


--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Btrfs: try to avoid doing a search in btrfs_next_leaf

2012-09-30 Thread Alex Lyakas
Hi Josef,
I guess I am missing something, but I thought btrfs_next_leaf() should
just jump to the next leaf (or item, if it was added meanwhile)
irrespective of the key that is in the last slot of the current leaf.
The change you added is effective when there is a next leaf, but you
refuse to go there unless its first key has the same objectid. (I
think you use the ctree property that the key in the first node/leaf
of a tree block is equal to its parent's key).
Can you pls explain why you insist on the same objectid?

Thanks,
Alex.


On Mon, Sep 24, 2012 at 10:02 PM, Josef Bacik jba...@fusionio.com wrote:
 Things like btrfs_drop_extents call btrfs_next_leaf to see if there is
 anything else they need on the next leaf.  This will result in a re-search,
 but if we are already at the last leaf in the tree or if the first item in
 the next leaf doesn't match the objectid of the one we want we can just
 return without doing the search at all.  This helps in things like fsync()
 where our tree is pretty shallow and we're likely to be on the last leaf
 often.  Thanks,

 Signed-off-by: Josef Bacik jba...@fusionio.com
 ---
  fs/btrfs/ctree.c |   27 +++
  fs/btrfs/ctree.h |1 +
  2 files changed, 28 insertions(+), 0 deletions(-)

 diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
 index 6d183f6..64ea61c 100644
 --- a/fs/btrfs/ctree.c
 +++ b/fs/btrfs/ctree.c
 @@ -2441,6 +2441,7 @@ int btrfs_search_slot(struct btrfs_trans_handle *trans, 
 struct btrfs_root
 lowest_level = p-lowest_level;
 WARN_ON(lowest_level  ins_len  0);
 WARN_ON(p-nodes[0] != NULL);
 +   p-shecantgoanyfarthercapt = 1;

 if (ins_len  0) {
 lowest_unlock = 2;
 @@ -2568,6 +2569,13 @@ cow_done:

 if (level != 0) {
 int dec = 0;
 +
 +   /*
 +* Slot is not the last in the node, we can go farther
 +* capt.
 +*/
 +   if (slot  btrfs_header_nritems(b))
 +   p-shecantgoanyfarthercapt = 0;
 if (ret  slot  0) {
 dec = 1;
 slot -= 1;
 @@ -5612,8 +5620,27 @@ int btrfs_next_old_leaf(struct btrfs_root *root, 
 struct btrfs_path *path,
 nritems = btrfs_header_nritems(path-nodes[0]);
 if (nritems == 0)
 return 1;
 +   if (path-shecantgoanyfarthercapt)
 +   return 1;
 +   if (!path-nodes[1])
 +   return 1;

 btrfs_item_key_to_cpu(path-nodes[0], key, nritems - 1);
 +
 +   /*
 +* If we have the level above us locked already just check and see if
 +* the key in the next leaf even has the same objectid, and if not
 +* return 1 and avoid the search.
 +*/
 +   if (path-locks[1] 
 +   path-slots[1] + 1  btrfs_header_nritems(path-nodes[1])) {
 +   struct btrfs_key tmp;
 +
 +   btrfs_node_key_to_cpu(path-nodes[1], tmp,
 + path-slots[1] + 1);
 +   if (key.objectid != tmp.objectid)
 +   return 1;
 +   }
  again:
 level = 1;
 next = NULL;
 diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
 index 6f2e7e6..2e5c6c5 100644
 --- a/fs/btrfs/ctree.h
 +++ b/fs/btrfs/ctree.h
 @@ -571,6 +571,7 @@ struct btrfs_path {
 unsigned int skip_locking:1;
 unsigned int leave_spinning:1;
 unsigned int search_commit_root:1;
 +   unsigned int shecantgoanyfarthercapt:1;
  };

  /*
 --
 1.7.7.6

 --
 To unsubscribe from this list: send the line unsubscribe linux-btrfs in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 3/3] btrfs-progs: btrfsck: Print feedback about fscking to stdout.

2012-09-30 Thread Dieter Ries
Status reports of the checking process should be printed to stdout
instead of stderr, as that is normal program output and not related to
problems in btrfsck. This patch changes this behaviour and adds the
output Done! after each of the parts.

Signed-off-by: Dieter Ries m...@dieterries.net
---
 btrfsck.c |   19 ++-
 1 files changed, 14 insertions(+), 5 deletions(-)

diff --git a/btrfsck.c b/btrfsck.c
index dde63e7..245ba7d 100644
--- a/btrfsck.c
+++ b/btrfsck.c
@@ -3557,7 +3557,7 @@ int main(int ac, char **av)
 
root = info-fs_root;
 
-   fprintf(stderr, checking extents\n);
+   printf(checking extents... );
if (rw)
trans = btrfs_start_transaction(root, 1);
 
@@ -3565,22 +3565,31 @@ int main(int ac, char **av)
fprintf(stderr, Reinit crc root\n);
ret = btrfs_fsck_reinit_root(trans, info-csum_root);
if (ret) {
+   printf(\n);
fprintf(stderr, crc root initialization failed\n);
return -EIO;
}
goto out;
}
ret = check_extents(trans, root, repair);
-   if (ret)
+   if (ret) {
fprintf(stderr, Errors found in extent allocation tree\n);
+   printf(\n);
+   }
+   else
+   printf(Done!\n);
 
-   fprintf(stderr, checking fs roots\n);
+   printf(checking fs roots... );
ret = check_fs_roots(root, root_cache);
-   if (ret)
+   if (ret) {
+   printf(\n);
goto out;
+   else
+   printf(Done!\n);
 
-   fprintf(stderr, checking root refs\n);
+   printf(checking root refs... );
ret = check_root_refs(root, root_cache);
+   printf(Done!\n);
 out:
free_root_recs(root_cache);
if (rw) {
-- 
1.7.3.GIT

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/3] btrfs-progs: Remove redundant Btrfs string from version string

2012-09-30 Thread Dieter Ries
In the first line of version.sh, $v was set to Btrfs vx.yy, and in
the end Btrfs $v was echoed to the version.h file. This resulted in
the version string Btrfs Btrfs vx.yy. This patch removes the second
occurrence of Btrfs.

Signed-off-by: Dieter Ries m...@dieterries.net
---
 version.sh |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/version.sh b/version.sh
index af3e441..2c1aff1 100644
--- a/version.sh
+++ b/version.sh
@@ -46,7 +46,7 @@ fi
  
 echo #ifndef __BUILD_VERSION  .build-version.h
 echo #define __BUILD_VERSION  .build-version.h
-echo #define BTRFS_BUILD_VERSION \Btrfs $v\  .build-version.h
+echo #define BTRFS_BUILD_VERSION \$v\  .build-version.h
 echo #endif  .build-version.h
 
 diff -q version.h .build-version.h  /dev/null
-- 
1.7.3.GIT

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/3] btrfs-progs: btrfsck: Print which filesystem to be checked to stdout

2012-09-30 Thread Dieter Ries
This patch makes btrfsck print the filesystem, which is to be checked,
to stdout. This should be helpful when analyzing (copied and pasted)
output of btrfsck.

Signed-off-by: Dieter Ries m...@dieterries.net
---
 btrfsck.c |2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/btrfsck.c b/btrfsck.c
index 088b9f4..dde63e7 100644
--- a/btrfsck.c
+++ b/btrfsck.c
@@ -3538,6 +3538,8 @@ int main(int ac, char **av)
} else if(ret) {
fprintf(stderr, %s is currently mounted. Aborting.\n, 
av[optind]);
return -EBUSY;
+   } else {
+   printf(Checking filesystem on %s\n,av[optind]);
}
 
info = open_ctree_fs_info(av[optind], bytenr, rw, 1);
-- 
1.7.3.GIT

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 0/3] btrfs-progs: Some cosmetic changes (mainly) to btrfsck

2012-09-30 Thread Dieter Ries
Hi,

this patch series implements some mainly cosmetic changes to 
btrfs-progs, most in btrfsck.

As this is my first contribution here, I'd kindly ask you for feedback,
and if work like this is appreciated in general.

Cheers,

Dieter

Dieter Ries (3):
  btrfs-progs: Remove redundant Btrfs string from version string
  btrfs-progs: btrfsck: Print which filesystem to be checked to stdout
  btrfs-progs: btrfsck: Print feedback about fscking to stdout.

 btrfsck.c  |   21 -
 version.sh |2 +-
 2 files changed, 17 insertions(+), 6 deletions(-)

-- 
1.7.3.GIT

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Btrfsck: root 256 inode 581419 errors 100. Can I heal this file system?

2012-09-30 Thread David Sterba
On Sun, Sep 30, 2012 at 12:26:36AM +0200, Adam Ryczkowski wrote:
 I was using this btrfs file system on kernel 3.6 rc4 and rc5 and 3.5.4 on
 Mint 13 (which is based on Ubuntu 12.04). Now, I found the following output
 of btrfsck /dev/sda8 (which was of course off-line at that time):
 checking extents
 checking fs roots
 root 256 inode 581419 errors 100

If I'm reading fsck.c right, it means that there are missing extents at
the end of the file, ie some unaccounted range.

Please run 'filefrag -vb mountpoint/@/var/cache/hald/fdi-cache', save
the last line, and add the 2nd and last numbers

  22 1046528 52330076 63340363   2048 eof
 ^^^ 

multiply by 1024 and this will give end of the accounted extents. Next,
run 'stat' on the file to find it's size. Tell me those numbers and I'll
tell you if your filesystem is broken :)

I think the error comes from a truncate that made extents and file size
disagree and this can be further observed by reading the file beyond the
last extent. In ideal case it returns zeros in that range, or refuse to
read anything due to missing checksum.

Do you recall any unclean shutdown? As it's a cache of a system(!d)
service, the file will be probably open for a long time and will last
out to all unexpected shutdowns.

 I understand, that the line root 256 inode 581419 errors 100 tries to tell
 me, that there is a problem with inode 581419. (Which happen to be
 mountpoint/@/var/cache/hald/fdi-cache; I use btrfs as a root file system for
 Linux mint 13). I pulled the recent btrfs-tools from Ubuntu 12.10
 (https://launchpad.net/ubuntu/quantal/+package/btrfs-tools; version
 0.19+20120328-7ubuntu1 
 https://launchpad.net/ubuntu/quantal/+source/btrfs-tools/0.19+20120328-7ubuntu1).
 The btrfsck doesn't make the error disapear. I don't experience any problems
 with the file system, but maybe I am only lucky.

Not sure here, it's fsck after all, but the particular error could be
fixed by resetting the inode size to the last extent end or adding a
preallocated range for the gap.

 Is there any way to heal the file system?
 
 I can read the file just fine. But I can't confirm if the contents are valid
 though; they definitely don't look random. Would deleting the file help? Or
 re-writing it?

If it's a cache, the application will be able to recreate it if you
delete it. Howver I'm not sure if this would not provoke a runtime bug.

Maybe other developers would like to investigate, so please do not
delete it yet. Otherwise, 'rm', or 'truncate -s0 file; rm file' should work.


david
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Btrfs: shrink_delalloc check bdi write congested

2012-09-30 Thread David Sterba
Hi again,

On Sun, Sep 30, 2012 at 11:11:10AM +0900, Itaru Kitayama wrote:
 This is the correct one.
 
 Signed-off-by: Itaru Kitayama kitay...@cl.bb4.ne.jp
 
 diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
 index efb044e..c032dbe 100644
 --- a/fs/btrfs/extent-tree.c
 +++ b/fs/btrfs/extent-tree.c
 @@ -3712,8 +3712,9 @@ static void shrink_delalloc(struct btrfs_root
 *root, u64 to_reclaim, u64 orig,
 while (delalloc_bytes  loops  3) {
 max_reclaim = min(delalloc_bytes, to_reclaim);
 nr_pages = max_reclaim  PAGE_CACHE_SHIFT;
 -   writeback_inodes_sb_nr_if_idle(root-fs_info-sb, nr_pages,
 -  WB_REASON_FS_FREE_SPACE);
 +   if (!bdi_write_congested(root-fs_info-sb-s_bdi))
 +   writeback_inodes_sb_nr_if_idle(root-fs_info, 
 nr_pages,
 +  
 WB_REASON_FS_FREE_SPACE);

You don't pass the same argument to writeback_inodes_sb_nr_if_idle in
the changed code, this would not compile (root-fs_info-sb), but it's
a minor thing.

I'm more interested in the background motivation of the change, it's
clear that it tries to avoid writing data if the devices are congested,
have you measured an improvement against original behaviour?

writeback_inodes_sb_nr_if_idle checks if writeback is in progress and
does not start if this is true. That way this will not hammer the device
unnecessarily.

Your code tries to avoid even more hammering of the device when the
writes do not come from writeback. This may or may not be a good thing
(and hard to guess without a few tests, or at least I don't see that).

Shrink delalloc starts the writeback for a given number of pages and
then hopes they'll be flushed so the reserved space can be reclaimed
back. If the device is congested, this will not start the writeback and
it would be very unlikely that total delalloc bytes shrinks. The rest of
the function relies on asynchronous behaviour, it's even less clear what
it would do without the writeback call. In the worst case it could block
on 'wait_event' or at 'btrfs_wait_ordered_extents' in some case, though
this is just more of a speculation.


david
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 1/2] btrfs-progs: limit the max value of leafsize and nodesize

2012-09-30 Thread David Sterba
On Fri, Sep 28, 2012 at 11:02:39AM +0800, Robin Dong wrote:
 From: Robin Dong san...@taobao.com
 --- a/mkfs.c
 +++ b/mkfs.c
 @@ -1201,6 +1201,27 @@ static int zero_output_file(int out_fd, u64 size, u32 
 sectorsize)
   return ret;
  }
  
 +static int check_leaf_or_node_size(u32 size, u32 sectorsize)
 +{
 + if (size  sectorsize) {
 + fprintf(stderr,
 + Illegal leafsize (or nodesize) %u (smaller than %u)\n,
 + size, sectorsize);

First looking at the message, the absolute limit value might be
confusing, the condition is really 'if node/leaf is smaller than
sectorsize', and sectorsize will become a varaible eventually (though
now it's fixed to the system's PAGE_SIZE and 4k in most cases).

So, even if this will make the erro message more verbose, I suggest to
enhance the explanation with

... (smaller than sectorsize %u)

(If you se a better way how to format the value and the word, please don't
hasitate to change it.)

 + return -1;
 + } else if (size  BTRFS_MAX_METADATA_BLOCKSIZE) {
 + fprintf(stderr,
 + Illegal leafsize (or nodesize) %u (larger than %u)\n,
 + size, BTRFS_MAX_METADATA_BLOCKSIZE);
 + return -1;
 + } else if (size  (sectorsize - 1)) {
 + fprintf(stderr,
 + Illegal leafsize (or nodesize) %u (not align to %u)\n,

align - aligned

 + size, sectorsize);
 + return -1;
 + }
 + return 0;
 +}
 +

thanks,
david
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 2/2] btrfs-progs: limit the min value of total_bytes

2012-09-30 Thread David Sterba
On Fri, Sep 28, 2012 at 11:02:40AM +0800, Robin Dong wrote:
 From: Robin Dong san...@taobao.com
 
 Using mkfs.btrfs like:
 
 mkfs.btrfs -b 1048576 /dev/sda
 
 will report error:
 
   mkfs.btrfs: volumes.c:796: btrfs_alloc_chunk: Assertion `!(ret)' failed.
   Aborted
 
 because the length of dev_extent is 4MB.
 
 But if we use mkfs.btrfs with 8MB total bytes, the newly mounted btrfs 
 filesystem
 would not contain even one empty file. So 12MB will be good min-value for 
 block_count.

I'm not able to create a single file even on a 12MB filesystem
(with -d single -m single --mixed), so any limit that would let the mkfs
finish normally should be fine. For the single/single case it's 5MB but
for the dup/dup it's 156MB. It's due to the known bug in the blockgroup
creation with multiple devices (applies on dup as well here) that leads
to:

# btrfs fi df .
System, DUP: total=8.00MB, used=4.00KB
System: total=4.00MB, used=0.00
Data+Metadata, DUP: total=64.00MB, used=24.00KB
Data+Metadata: total=8.00MB, used=0.00

8*2 + 4 + 64*2 + 8 = 156

so, 12M is too small to avoid the mkfs crash.

david
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Btrfs-progs: btrfs subvolume delete could delete subvolumes

2012-09-30 Thread David Sterba
On Mon, Sep 24, 2012 at 08:36:47AM -0600, cwillu wrote:
 For what it's worth, rmdir's behaviour is to continue after errors
 (i.e., mkdir 1; mkdir 3; rmdir 1 2 3 deletes 1 and 3, and exits with
 a non-zero exit code); unless there's a good reason to do otherwise,
 matching that behaviour is probably best.

Thanks for your input. I have tried it and agree with the proposed
implementation (ie. to process all arguments and skip non-subvols).

david
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] btrfs: move inline function code to header file

2012-09-30 Thread David Sterba
On Sat, Sep 29, 2012 at 04:07:47PM +0800, Robin Dong wrote:
 From: Robin Dong san...@taobao.com
 
 When building btrfs from kernel code, it will report:
 
   fs/btrfs/extent_io.h:281: warning: 'extent_buffer_page' declared inline 
 after being called
   fs/btrfs/extent_io.h:281: warning: previous declaration of 
 'extent_buffer_page' was here
   fs/btrfs/extent_io.h:280: warning: 'num_extent_pages' declared inline 
 after being called
   fs/btrfs/extent_io.h:280: warning: previous declaration of 
 'num_extent_pages' was here
 
 because of the wrong declaration of inline functions.
 
 Signed-off-by: Robin Dong san...@taobao.com

Yep, makes sense, thanks.

david
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] btrfs: remove unused function btrfs_insert_some_items()

2012-09-30 Thread David Sterba
On Sat, Sep 29, 2012 at 04:07:46PM +0800, Robin Dong wrote:
 From: Robin Dong san...@taobao.com
 
 The function btrfs_insert_some_items() would not be called by any other 
 functions,
 so remove it.
 
 Signed-off-by: Robin Dong san...@taobao.com

Ok from me, thanks.

david
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Btrfs-progs: end user may like full path for the subvol list display

2012-09-30 Thread David Sterba
On Fri, Sep 28, 2012 at 08:56:33PM +0800, Anand jain wrote:
 From: Anand Jain anand.j...@oracle.com

 [PATCH] Btrfs-progs: end user may like full path for the subvol list
 display

Sure, why not, but please make it optional, eg. with --full-subvol-path
or similar ugly command line option :)

There are several patches touching the output format of the subvolume
listing, it would be great to put them all into one branch and then
unify the options.

It may end up in a more invasive format change where the one-line would
stay as is now and another --verbose format where the rest of the
information (generation, timestamp, uuid, rw/ro, parent) will be listed
separately under the subvol name in a parseable format.


david
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Btrfs: shrink_delalloc check bdi write congested

2012-09-30 Thread Itaru Kitayama
Hi David,
Thank you for your comments. I wanted to fix a lockdep warning on a
possible deadlock
case encountered during the xfstests with a scratch space almost full.

You are right I encountered the worst scenario you described below, I
drop this patch and
I'll look at btrfs_congested_fn more to examine the mechanisms
implemented there are
working as expected.

Thanks,

Itaru

On Mon, Oct 1, 2012 at 7:29 AM, David Sterba d...@jikos.cz wrote:
 Hi again,

 On Sun, Sep 30, 2012 at 11:11:10AM +0900, Itaru Kitayama wrote:
 This is the correct one.

 Signed-off-by: Itaru Kitayama kitay...@cl.bb4.ne.jp

 diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
 index efb044e..c032dbe 100644
 --- a/fs/btrfs/extent-tree.c
 +++ b/fs/btrfs/extent-tree.c
 @@ -3712,8 +3712,9 @@ static void shrink_delalloc(struct btrfs_root
 *root, u64 to_reclaim, u64 orig,
 while (delalloc_bytes  loops  3) {
 max_reclaim = min(delalloc_bytes, to_reclaim);
 nr_pages = max_reclaim  PAGE_CACHE_SHIFT;
 -   writeback_inodes_sb_nr_if_idle(root-fs_info-sb, nr_pages,
 -  WB_REASON_FS_FREE_SPACE);
 +   if (!bdi_write_congested(root-fs_info-sb-s_bdi))
 +   writeback_inodes_sb_nr_if_idle(root-fs_info, 
 nr_pages,
 +  
 WB_REASON_FS_FREE_SPACE);

 You don't pass the same argument to writeback_inodes_sb_nr_if_idle in
 the changed code, this would not compile (root-fs_info-sb), but it's
 a minor thing.

 I'm more interested in the background motivation of the change, it's
 clear that it tries to avoid writing data if the devices are congested,
 have you measured an improvement against original behaviour?

 writeback_inodes_sb_nr_if_idle checks if writeback is in progress and
 does not start if this is true. That way this will not hammer the device
 unnecessarily.

 Your code tries to avoid even more hammering of the device when the
 writes do not come from writeback. This may or may not be a good thing
 (and hard to guess without a few tests, or at least I don't see that).

 Shrink delalloc starts the writeback for a given number of pages and
 then hopes they'll be flushed so the reserved space can be reclaimed
 back. If the device is congested, this will not start the writeback and
 it would be very unlikely that total delalloc bytes shrinks. The rest of
 the function relies on asynchronous behaviour, it's even less clear what
 it would do without the writeback call. In the worst case it could block
 on 'wait_event' or at 'btrfs_wait_ordered_extents' in some case, though
 this is just more of a speculation.


 david
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: ENOSPC design issues

2012-09-30 Thread David Sterba
On Tue, Sep 25, 2012 at 01:02:39PM -0400, Josef Bacik wrote:
  implemented as follows on top of next/master, in short:
  * disable overcommit completely
  * do the optimistically best guess for the metadata and reserve only up
to the current tree height
 
 So I had tried to do this before, the problem is when height changes our 
 reserve
 changes.  So for things like delalloc we say we have X number of extents and 
 we
 reserve that much space, but then when we run delalloc we re-calculate the
 metadata size for X number extents we've removed and that number could come 
 out
 differently since the height of the tree would have changed.

Understood, I did not came accross this during testing because I did not
reproduce the required conditions that do not seem frequent but
definetelly are possible.

Let me proceed with V2:

* disable overcommit completely
* reserve only up to the current tree height + 1

You can object that the tree height may grow by 2, namely at the early
stage from 0 - 2 if the number of operations is quite high and all of
this finishes within one transaction.

So improvement that also covers that:

* reserve up to MIN( BTRFS_MAX_LEVEL, max(tree-height + 1, 3) )

with the assumption that growing three from 3-4 will take more
operations that could fit into one transaction period even on a fast
device. (More accurate analysis required, but this should give the
idea).

The estimated savings in metadata block reservations start at 3/8 (and
decreases with the larger filesystem use). This estimate does not have
the property of being 'almost exact' as the V1 proposal, so some sort of
overcommit math should also take place. I leave this out for now.

thanks,
david
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html