[PATCH 0/6] random bugfixes of the space management

2010-12-22 Thread Miao Xie
Hello,

I have a bunch of random fixes for the space management code in

git://repo.or.cz/linux-btrfs-devel.git space-manage

They are ENOSPC fixes, as well as fixes for the df command.
The first and the last patches fix the wrong free space information reported
by the df command. The second one fixes ENOSPC when there is very little space
left in the filesystem. The third fixes the wrong calculation of the stripe
size. The 4th and 5th patches fix the chunk allocation problem when the block
devices do not have enough space to allocate a default-size chunk.

---
 fs/btrfs/ctree.h       |    2 +
 fs/btrfs/extent-tree.c |   71 ++-
 fs/btrfs/super.c       |  147 +++-
 fs/btrfs/volumes.c     |  606 +++-
 fs/btrfs/volumes.h     |   27 +++
 5 files changed, 682 insertions(+), 171 deletions(-)
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/6] btrfs: fix wrong data space statistics

2010-12-22 Thread Miao Xie
Josef has implemented mixed data/metadata chunks, so we must account for the
space of those chunks in the same way as for data chunks.

Signed-off-by: Miao Xie <mi...@cn.fujitsu.com>
---
 fs/btrfs/super.c |    7 +++----
 1 files changed, 3 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index 61bd79a..1d21208 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -768,11 +768,10 @@ static int btrfs_statfs(struct dentry *dentry, struct kstatfs *buf)
 
 	rcu_read_lock();
 	list_for_each_entry_rcu(found, head, list) {
-		if (found->flags & (BTRFS_BLOCK_GROUP_METADATA |
-				    BTRFS_BLOCK_GROUP_SYSTEM))
-			total_used_data += found->disk_total;
-		else
+		if (found->flags & BTRFS_BLOCK_GROUP_DATA)
 			total_used_data += found->disk_used;
+		else
+			total_used_data += found->disk_total;
 		total_used += found->disk_used;
 	}
 	rcu_read_unlock();
-- 
1.7.0.1
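
The accounting rule in the patch above can be tried outside the kernel. The
following is only a minimal sketch: struct space_info_sketch, the SKETCH_*
macros and total_used_data_sketch() are hypothetical stand-ins invented for
illustration (the flag bit values are the btrfs block group type bits). A mixed
data/metadata space carries the DATA bit, so only its used bytes are counted,
while pure metadata and system spaces are counted in full.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same bit values as the btrfs block group type flags. */
#define SKETCH_BLOCK_GROUP_DATA     (1ULL << 0)
#define SKETCH_BLOCK_GROUP_SYSTEM   (1ULL << 1)
#define SKETCH_BLOCK_GROUP_METADATA (1ULL << 2)

/* Hypothetical stand-in for the space_info fields used by the patch. */
struct space_info_sketch {
	uint64_t flags;
	uint64_t disk_used;
	uint64_t disk_total;
};

/*
 * Bytes that can no longer hold data: the used part of every space that
 * may hold data (pure data or mixed), plus all of every other space.
 */
uint64_t total_used_data_sketch(const struct space_info_sketch *infos, size_t n)
{
	uint64_t total_used_data = 0;
	size_t i;

	assert(infos != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (infos[i].flags & SKETCH_BLOCK_GROUP_DATA)
			total_used_data += infos[i].disk_used;
		else
			total_used_data += infos[i].disk_total;
	}
	return total_used_data;
}
```

With the old test (METADATA or SYSTEM bit set), a mixed space would have been
charged with its whole disk_total, which is the error the patch removes.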


[PATCH 2/6] btrfs: try to reclaim some space when chunk allocation fails

2010-12-22 Thread Miao Xie
We cannot write data into files when there is very little space left in the
filesystem.

Steps to reproduce:
 # mkfs.btrfs /dev/sda1
 # mount /dev/sda1 /mnt
 # dd if=/dev/zero of=/mnt/tmpfile0 bs=4K count=1
 # dd if=/dev/zero of=/mnt/tmpfile1 bs=4K count=99
   (fill the filesystem)
 # umount /mnt
 # mount /dev/sda1 /mnt
 # rm -f /mnt/tmpfile0
 # dd if=/dev/zero of=/mnt/tmpfile0 bs=4K count=1
   (failed with ENOSPC)

But if we repeat the last step, the write succeeds. The cause of the problem
is that btrfs did not try to commit the current transaction and reclaim some
space when chunk allocation failed.

This patch fixes it by committing the current transaction to reclaim some
space when chunk allocation fails.

Signed-off-by: Miao Xie <mi...@cn.fujitsu.com>
---
 fs/btrfs/extent-tree.c |    9 +++++++--
 1 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 7e5162e..4bcd875 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -3162,8 +3162,12 @@ alloc:
 					     bytes + 2 * 1024 * 1024,
 					     alloc_target, 0);
 			btrfs_end_transaction(trans, root);
-			if (ret < 0)
-				return ret;
+			if (ret < 0) {
+				if (ret != -ENOSPC)
+					return ret;
+				else
+					goto commit_trans;
+			}
 
 			if (!data_sinfo) {
 				btrfs_set_inode_space_info(root, inode);
@@ -3174,6 +3178,7 @@ alloc:
 		spin_unlock(&data_sinfo->lock);
 
 		/* commit the current transaction and try again */
+commit_trans:
 		if (!committed && !root->fs_info->open_ioctl_trans) {
 			committed = 1;
 			trans = btrfs_join_transaction(root, 1);
-- 
1.7.0.1


[PATCH 4/6] btrfs: restructure find_free_dev_extent()

2010-12-22 Thread Miao Xie
- make it return the start position and length of the largest free space when
  it cannot find a suitable free space.
- make it more readable

Signed-off-by: Miao Xie <mi...@cn.fujitsu.com>
---
 fs/btrfs/extent-tree.c |    4 +-
 fs/btrfs/volumes.c     |  155 +++
 2 files changed, 91 insertions(+), 68 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 4bcd875..7c1a053 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -8098,7 +8098,7 @@ int btrfs_can_relocate(struct btrfs_root *root, u64 bytenr)
 	mutex_lock(&root->fs_info->chunk_mutex);
 	list_for_each_entry(device, &fs_devices->alloc_list, dev_alloc_list) {
 		u64 min_free = btrfs_block_group_used(&block_group->item);
-		u64 dev_offset, max_avail;
+		u64 dev_offset;
 
 		/*
 		 * check to make sure we can actually find a chunk with enough
@@ -8106,7 +8106,7 @@ int btrfs_can_relocate(struct btrfs_root *root, u64 bytenr)
 		 */
 		if (device->total_bytes > device->bytes_used + min_free) {
 			ret = find_free_dev_extent(NULL, device, min_free,
-						   &dev_offset, &max_avail);
+						   &dev_offset, NULL);
 			if (!ret)
 				break;
 			ret = -1;
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index e1028f4..15e8c3f 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -729,58 +729,82 @@ error:
 }
 
 /*
+ * find_free_dev_extent - find free space in the specified device
+ * @trans:	transaction handler
+ * @device:	the device which we search the free space in
+ * @num_bytes:	the size of the free space that we need
+ * @start:	store the start of the free space.
+ * @len:	the size of the free space. that we find, or the size of the max
+ * 		free space if we don't find suitable free space
+ *
  * this uses a pretty simple search, the expectation is that it is
  * called very infrequently and that a given device has a small number
  * of extents
+ *
+ * @start is used to store the start of the free space if we find. But if we
+ * don't find suitable free space, it will be used to store the start position
+ * of the max free space.
+ *
+ * @len is used to store the size of the free space that we find.
+ * But if we don't find suitable free space, it is used to store the size of
+ * the max free space.
  */
 int find_free_dev_extent(struct btrfs_trans_handle *trans,
 			 struct btrfs_device *device, u64 num_bytes,
-			 u64 *start, u64 *max_avail)
+			 u64 *start, u64 *len)
 {
 	struct btrfs_key key;
 	struct btrfs_root *root = device->dev_root;
-	struct btrfs_dev_extent *dev_extent = NULL;
+	struct btrfs_dev_extent *dev_extent;
 	struct btrfs_path *path;
-	u64 hole_size = 0;
-	u64 last_byte = 0;
-	u64 search_start = 0;
+	u64 hole_size;
+	u64 max_hole_start;
+	u64 max_hole_size;
+	u64 extent_end;
+	u64 search_start;
 	u64 search_end = device->total_bytes;
 	int ret;
-	int slot = 0;
-	int start_found;
+	int slot;
 	struct extent_buffer *l;
 
-	path = btrfs_alloc_path();
-	if (!path)
-		return -ENOMEM;
-	path->reada = 2;
-	start_found = 0;
-
 	/* FIXME use last free of some kind */
 
 	/* we don't want to overwrite the superblock on the drive,
 	 * so we make sure to start at an offset of at least 1MB
 	 */
-	search_start = max((u64)1024 * 1024, search_start);
+	search_start = 1024 * 1024;
 
-	if (root->fs_info->alloc_start + num_bytes <= device->total_bytes)
+	if (root->fs_info->alloc_start + num_bytes <= search_end)
 		search_start = max(root->fs_info->alloc_start, search_start);
 
+	max_hole_start = search_start;
+	max_hole_size = 0;
+
+	if (search_start >= search_end) {
+		ret = -ENOSPC;
+		goto error;
+	}
+
+	path = btrfs_alloc_path();
+	if (!path) {
+		ret = -ENOMEM;
+		goto error;
+	}
+	path->reada = 2;
+
 	key.objectid = device->devid;
 	key.offset = search_start;
 	key.type = BTRFS_DEV_EXTENT_KEY;
+
 	ret = btrfs_search_slot(trans, root, &key, path, 0, 0);
 	if (ret < 0)
-		goto error;
+		goto out;
 	if (ret > 0) {
 		ret = btrfs_previous_item(root, path, key.objectid, key.type);
 		if (ret < 0)
-			goto error;
-		if (ret > 0)
-			start_found = 1;
+			goto out;
 	}
-	l = path->nodes[0];
-	btrfs_item_key_to_cpu(l, &key, path->slots[0]);
+
 	while (1) {
  

[PATCH 3/6] btrfs: fix wrong calculation of stripe size

2010-12-22 Thread Miao Xie
There are two tiny problems:
- One is that when we check whether the chunk size is greater than the max
  chunk size, we should take mirrors into account, but the original code
  didn't.
- The other is that btrfs shouldn't use the size of the residual free space as
  the length of a dup chunk when doing chunk allocation. The device space that
  a dup chunk needs is twice as large as the chunk size, so if we use the size
  of the residual free space as the length of a dup chunk, we cannot get
  enough free space. Fix it.

Signed-off-by: Miao Xie <mi...@cn.fujitsu.com>
---
 fs/btrfs/volumes.c |   10 ++++++++--
 1 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 177b731..e1028f4 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -2177,6 +2177,7 @@ static int __btrfs_alloc_chunk(struct btrfs_trans_handle *trans,
 	int num_stripes = 1;
 	int min_stripes = 1;
 	int sub_stripes = 0;
+	int ncopies = 1;
 	int looped = 0;
 	int ret;
 	int index;
@@ -2197,12 +2198,14 @@ static int __btrfs_alloc_chunk(struct btrfs_trans_handle *trans,
 	if (type & (BTRFS_BLOCK_GROUP_DUP)) {
 		num_stripes = 2;
 		min_stripes = 2;
+		ncopies = 2;
 	}
 	if (type & (BTRFS_BLOCK_GROUP_RAID1)) {
 		if (fs_devices->rw_devices < 2)
 			return -ENOSPC;
 		num_stripes = 2;
 		min_stripes = 2;
+		ncopies = 2;
 	}
 	if (type & (BTRFS_BLOCK_GROUP_RAID10)) {
 		num_stripes = fs_devices->rw_devices;
@@ -2210,6 +2213,7 @@ static int __btrfs_alloc_chunk(struct btrfs_trans_handle *trans,
 			return -ENOSPC;
 		num_stripes &= ~(u32)1;
 		sub_stripes = 2;
+		ncopies = 2;
 		min_stripes = 4;
 	}
 
@@ -2239,8 +2243,8 @@ again:
 		map->num_stripes = num_stripes;
 	}
 
-	if (calc_size * num_stripes > max_chunk_size) {
-		calc_size = max_chunk_size;
+	if (calc_size * num_stripes / ncopies > max_chunk_size) {
+		calc_size = max_chunk_size * ncopies;
 		do_div(calc_size, num_stripes);
 		do_div(calc_size, stripe_len);
 		calc_size *= stripe_len;
@@ -2321,6 +2325,8 @@ again:
 		if (!looped && max_avail > 0) {
 			looped = 1;
 			calc_size = max_avail;
+			if (type & BTRFS_BLOCK_GROUP_DUP)
+				calc_size /= 2;
 			goto again;
 		}
 		kfree(map);
-- 
1.7.0.1
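
The arithmetic of both fixes can be checked in userspace. The sketch below
mirrors the patched expressions with plain 64-bit division in place of
do_div(); clamp_stripe_size_sketch() and retry_stripe_size_sketch() are
hypothetical names, not functions in the kernel.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Limit the per-device stripe size so that the logical chunk size,
 * calc_size * num_stripes / ncopies, does not exceed max_chunk_size.
 * The result is rounded down to a multiple of stripe_len.
 */
uint64_t clamp_stripe_size_sketch(uint64_t calc_size, uint64_t num_stripes,
				  uint64_t ncopies, uint64_t max_chunk_size,
				  uint64_t stripe_len)
{
	assert(num_stripes > 0 && ncopies > 0 && stripe_len > 0);
	if (calc_size * num_stripes / ncopies > max_chunk_size) {
		calc_size = max_chunk_size * ncopies;
		calc_size /= num_stripes;
		calc_size /= stripe_len;
		calc_size *= stripe_len;
	}
	return calc_size;
}

/*
 * Stripe size for the second allocation attempt: a dup chunk places both
 * copies on one device, so each stripe may use only half of the largest
 * free area found there.
 */
uint64_t retry_stripe_size_sketch(uint64_t max_avail, int is_dup)
{
	return is_dup ? max_avail / 2 : max_avail;
}
```

For example, with a 1 GiB max chunk size, a dup chunk (2 stripes, 2 copies)
with 1 GiB stripes stays at 1 GiB per stripe under the new check, whereas the
old comparison without ncopies would have halved it.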


[PATCH 6/6] btrfs: fix wrong free space information of btrfs

2010-12-22 Thread Miao Xie
When we store data with a RAID profile in btrfs on two or more disks of
different sizes, the df command shows that there is some free space in the
filesystem, but the user cannot actually write any data. In other words, the
df command shows wrong free space information for btrfs.

 # mkfs.btrfs -d raid1 /dev/sda9 /dev/sda10
 # btrfs-show
 Label: none  uuid: a95cd49e-6e33-45b8-8741-a36153ce4b64
 	Total devices 2 FS bytes used 28.00KB
 	devid    1 size 5.01GB used 2.03GB path /dev/sda9
 	devid    2 size 10.00GB used 2.01GB path /dev/sda10
 # btrfs device scan /dev/sda9 /dev/sda10
 # mount /dev/sda9 /mnt
 # dd if=/dev/zero of=tmpfile0 bs=4K count=99
   (fill the filesystem)
 # sync
 # df -TH
 Filesystem     Type    Size   Used  Avail  Use%  Mounted on
 /dev/sda9      btrfs    17G   8.6G   5.4G   62%  /mnt
 # btrfs-show
 Label: none  uuid: a95cd49e-6e33-45b8-8741-a36153ce4b64
 	Total devices 2 FS bytes used 3.99GB
 	devid    1 size 5.01GB used 5.01GB path /dev/sda9
 	devid    2 size 10.00GB used 4.99GB path /dev/sda10

This happens because btrfs cannot allocate chunks when one of the paired disks
has no space left. The free space on the other disks can never be used and
should be subtracted from the total space, but btrfs doesn't subtract it from
the total. This is confusing for the user.

This patch fixes it by calculating the free space that can be used to allocate
chunks.

Implementation:
1. get the free space of all the devices, and align it to the stripe length.
2. sort the devices by free space.
3. check the free space of each device,
   3.1. if it is not zero, check the number of devices that have
        more free space than this device:
        if the number of devices is beyond the min stripe number, the free
        space can be used, and it is added to the total free space.
        if the number of devices is below the min stripe number, we cannot
        use the free space, and the check ends.
   3.2. if the free space is zero, check the next device, goto 3.1

This implementation is just like a fake chunk allocation.
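
The steps above can be sketched in userspace C. This is one possible reading
of the description, not the code of the patch: usable_raw_space_sketch() and
its parameters are hypothetical, a fixed number of stripes per chunk is
assumed, and the returned value is raw device space (for RAID1 the space
available for data would be half of it).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* qsort comparator: larger free space first. */
static int cmp_free_desc(const void *a, const void *b)
{
	uint64_t x = *(const uint64_t *)a;
	uint64_t y = *(const uint64_t *)b;

	return (x < y) - (x > y);
}

/*
 * "Fake" chunk allocation: repeatedly take the device with the least free
 * space among those still considered, and allocate a stripe of that size on
 * it and on the (num_stripes - 1) devices ranked just above it. Stops when
 * fewer than num_stripes devices remain. free_bytes[] is modified in place.
 */
uint64_t usable_raw_space_sketch(uint64_t *free_bytes, size_t n,
				 size_t num_stripes, uint64_t stripe_len)
{
	uint64_t total = 0;
	size_t remaining, i, j;

	assert(free_bytes != NULL || n == 0);
	if (num_stripes == 0 || stripe_len == 0)
		return 0;

	/* Step 1: align each device's free space down to the stripe length. */
	for (i = 0; i < n; i++)
		free_bytes[i] = free_bytes[i] / stripe_len * stripe_len;

	/* Step 2: sort the devices by free space, largest first. */
	qsort(free_bytes, n, sizeof(*free_bytes), cmp_free_desc);

	/* Step 3: walk from the smallest device towards the largest. */
	for (remaining = n; remaining >= num_stripes; remaining--) {
		uint64_t size;

		i = remaining - 1;
		size = free_bytes[i];
		if (size == 0)
			continue;	/* step 3.2: nothing left on this device */
		total += size * num_stripes;
		for (j = remaining - num_stripes; j <= i; j++)
			free_bytes[j] -= size;
	}
	return total;
}
```

With the two devices from the example (about 5 GB and 10 GB free, two stripes),
the sketch counts 5 GB on each device and leaves the remaining 5 GB on the
larger device out of the total.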

After applying this patch, df shows the correct space information:
 # df -TH
 Filesystem     Type    Size   Used  Avail  Use%  Mounted on
 /dev/sda9      btrfs    17G   8.6G      0  100%  /mnt

Signed-off-by: Miao Xie <mi...@cn.fujitsu.com>
---
 fs/btrfs/ctree.h       |    2 +
 fs/btrfs/extent-tree.c |   58 +++-
 fs/btrfs/super.c       |  146 ++--
 fs/btrfs/volumes.c     |   84 +++
 fs/btrfs/volumes.h     |    3 +
 5 files changed, 286 insertions(+), 7 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index af52f6d..a068a5d 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -2145,6 +2145,7 @@ int btrfs_make_block_group(struct btrfs_trans_handle *trans,
 int btrfs_remove_block_group(struct btrfs_trans_handle *trans,
 			     struct btrfs_root *root, u64 group_start);
 u64 btrfs_reduce_alloc_profile(struct btrfs_root *root, u64 flags);
+u64 btrfs_get_alloc_profile(struct btrfs_root *root, int data);
 void btrfs_set_inode_space_info(struct btrfs_root *root, struct inode *ionde);
 void btrfs_clear_space_info_full(struct btrfs_fs_info *info);
 int btrfs_check_data_free_space(struct inode *inode, u64 bytes);
@@ -2188,6 +2189,7 @@ int btrfs_set_block_group_ro(struct btrfs_root *root,
 int btrfs_set_block_group_rw(struct btrfs_root *root,
 			     struct btrfs_block_group_cache *cache);
 void btrfs_put_block_group_cache(struct btrfs_fs_info *info);
+u64 btrfs_account_ro_block_groups_free_space(struct btrfs_space_info *sinfo);
 /* ctree.c */
 int btrfs_bin_search(struct extent_buffer *eb, struct btrfs_key *key,
 		     int level, int *slot);
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 7c1a053..fd465e1 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -3090,7 +3090,7 @@ static u64 get_alloc_profile(struct btrfs_root *root, u64 flags)
 	return btrfs_reduce_alloc_profile(root, flags);
 }
 
-static u64 btrfs_get_alloc_profile(struct btrfs_root *root, int data)
+u64 btrfs_get_alloc_profile(struct btrfs_root *root, int data)
 {
 	u64 flags;
 
@@ -8018,6 +8018,62 @@ out:
 	return ret;
 }
 
+/*
+ * helper to account the unused space of all the readonly block group in the
+ * list. takes mirrors into account.
+ */
+static u64 __btrfs_get_ro_block_group_free_space(struct list_head *groups_list)
+{
+	struct btrfs_block_group_cache *block_group;
+	u64 free_bytes = 0;
+	int factor;
+
+	list_for_each_entry(block_group, groups_list, list) {
+		spin_lock(&block_group->lock);
+
+		if (!block_group->ro) {
+			spin_unlock(&block_group->lock);
+			continue;
+		}
+
+		if (block_group->flags & (BTRFS_BLOCK_GROUP_RAID1 |
+ 

Re: [PATCH 4/6] btrfs: restructure find_free_dev_extent()

2010-12-22 Thread Arne Jansen
Hi,

this patch seems to have the same intention as the patch I sent to the
list on Dec 11, "Fixing the chunk allocator to allow it to better
utilize the devices". The result is quite similar, except that you
left the line

	search_start = max(root->fs_info->alloc_start, search_start);

in place, which could lead to disregarding the configured alloc_start.

As both patches address the same problem, it might be good to compare
them in more detail.

--
Arne

On 22.12.2010 11:47, Miao Xie wrote:
> - make it return the start position and length of the largest free space
>   when it cannot find a suitable free space.
> - make it more readable
 

Re: [PATCH 2/6] btrfs: try to reclaim some space when chunk allocation fails

2010-12-22 Thread Josef Bacik
On Wed, Dec 22, 2010 at 06:47:20PM +0800, Miao Xie wrote:
> We cannot write data into files when there is very little space left in the
> filesystem.
> 
> Steps to reproduce:
>  # mkfs.btrfs /dev/sda1
>  # mount /dev/sda1 /mnt
>  # dd if=/dev/zero of=/mnt/tmpfile0 bs=4K count=1
>  # dd if=/dev/zero of=/mnt/tmpfile1 bs=4K count=99
>    (fill the filesystem)
>  # umount /mnt
>  # mount /dev/sda1 /mnt
>  # rm -f /mnt/tmpfile0
>  # dd if=/dev/zero of=/mnt/tmpfile0 bs=4K count=1
>    (failed with ENOSPC)
> 
> But if we repeat the last step, the write succeeds. The cause of the problem
> is that btrfs did not try to commit the current transaction and reclaim some
> space when chunk allocation failed.
> 
> This patch fixes it by committing the current transaction to reclaim some
> space when chunk allocation fails.
> 
> Signed-off-by: Miao Xie <mi...@cn.fujitsu.com>

Reviewed-by: Josef Bacik <jo...@redhat.com>

Thanks,

Josef


Re: [PATCH 0/6] random bugfixes of the space management

2010-12-22 Thread Chris Mason
Excerpts from Josef Bacik's message of 2010-12-22 14:22:54 -0500:
> On Wed, Dec 22, 2010 at 06:47:08PM +0800, Miao Xie wrote:
> > Hello,
> > 
> > I have a bunch of random fixes for the space management code in
> > 
> > git://repo.or.cz/linux-btrfs-devel.git space-manage
> > 
> > They are ENOSPC fixes, as well as fixes for the df command.
> > The first and the last patches fix the wrong free space information
> > reported by the df command. The second one fixes ENOSPC when there is very
> > little space left in the filesystem. The third fixes the wrong calculation
> > of the stripe size. The 4th and 5th patches fix the chunk allocation
> > problem when the block devices do not have enough space to allocate a
> > default-size chunk.
> 
> I'll review the rest of them when I have more time, thanks for these Miao.

For now they are going with the new compression code into a new branch
for 2.6.38 in my git tree.  I might have to rebase as patches are added
and removed, but these will all go in.

-chris


Re: btrfs: 21 minutes to read 1.2M file directory

2010-12-22 Thread Andy Isaacson
On Tue, Dec 21, 2010 at 03:07:33AM +0200, Felipe Contreras wrote:
> On Tue, Dec 21, 2010 at 12:24 AM, Andy Isaacson <a...@hexapodia.org> wrote:
> > I have a directory with 1.2M files in it, which makes readdir very slow
> > on btrfs with cold caches (although it's reasonably fast with hot caches
> > as in the first example below):
> 
> Sounds like:
> 
> Bug 21562 - btrfs is dead slow due to fragmentation
> https://bugzilla.kernel.org/show_bug.cgi?id=21562

Hmmm, how do I look at the btree layout for a given inode?

btrfs-image for this filesystem is 1.7GiB .bz2, so I'm afraid it's not
reasonable to publish it.

-andy


Re: btrfs: 21 minutes to read 1.2M file directory

2010-12-22 Thread Hugo Mills
On Wed, Dec 22, 2010 at 12:39:15PM -0800, Andy Isaacson wrote:
> On Tue, Dec 21, 2010 at 03:07:33AM +0200, Felipe Contreras wrote:
> > On Tue, Dec 21, 2010 at 12:24 AM, Andy Isaacson <a...@hexapodia.org> wrote:
> > > I have a directory with 1.2M files in it, which makes readdir very slow
> > > on btrfs with cold caches (although it's reasonably fast with hot caches
> > > as in the first example below):
> > 
> > Sounds like:
> > 
> > Bug 21562 - btrfs is dead slow due to fragmentation
> > https://bugzilla.kernel.org/show_bug.cgi?id=21562
> 
> Hmmm, how do I look at the btree layout for a given inode?

   There's documentation on the tree structures at [1] and [2]. If you
know the inode number of the object you're interested in, you need to
look in the FS tree for the subvolume it's in and find the
(inode_number, EXTENT_DATA, ...) keys for the file. Each of those
records will reference an individual disk extent -- and you can get
the disk start position and length of the extent from the data stored
under the key.
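
As a rough illustration of the lookup described above: btrfs keys are
(objectid, type, offset) triples compared in that order, so all EXTENT_DATA
items of one inode sit next to each other in the FS tree. The sketch below is
not btrfs-progs code; struct key_sketch and both functions are hypothetical,
and it assumes the key type values 1 (INODE_ITEM) and 108 (EXTENT_DATA) and
that the key offset of an EXTENT_DATA item is the byte offset within the file.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_INODE_ITEM_KEY  1
#define SKETCH_EXTENT_DATA_KEY 108

/* In-memory form of a tree key: (objectid, type, offset). */
struct key_sketch {
	uint64_t objectid;	/* inode number for items in an FS tree */
	uint8_t type;
	uint64_t offset;
};

/* Order keys by objectid, then type, then offset. */
int key_cmp_sketch(const struct key_sketch *a, const struct key_sketch *b)
{
	if (a->objectid != b->objectid)
		return a->objectid < b->objectid ? -1 : 1;
	if (a->type != b->type)
		return a->type < b->type ? -1 : 1;
	if (a->offset != b->offset)
		return a->offset < b->offset ? -1 : 1;
	return 0;
}

/* Count the (ino, EXTENT_DATA, *) keys, i.e. the file extent items of ino. */
size_t count_file_extents_sketch(const struct key_sketch *keys, size_t n,
				 uint64_t ino)
{
	size_t count = 0;
	size_t i;

	assert(keys != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (keys[i].objectid == ino &&
		    keys[i].type == SKETCH_EXTENT_DATA_KEY)
			count++;
	}
	return count;
}
```

A file with many small, scattered extents shows up here as a large number of
EXTENT_DATA keys for its inode number.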

   Hugo.

[1] https://btrfs.wiki.kernel.org/index.php/Btree_Items
[2] https://btrfs.wiki.kernel.org/index.php/Data_Structures

-- 
=== Hugo Mills: h...@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
   --- Hail and greetings.  We are a flat-pack invasion force from ---   
 Planet Ikea. We come in pieces. 



Re: [PATCH 4/6] btrfs: restructure find_free_dev_extent()

2010-12-22 Thread Miao Xie
On Wed, 22 Dec 2010 13:07:18 +0100, Arne Jansen wrote:
> this patch seems to have the same intention as the patch I sent to the
> list on Dec 11, "Fixing the chunk allocator to allow it to better
> utilize the devices". The result is quite similar, except that you
> left the line

Ah, part of the code is indeed similar. But I think this patch is different
from yours. This function can now return the offset of the largest free space
when it cannot find a suitable free space; that is the main purpose of this
patch. I think this is also the biggest difference between this patch and
yours.

The original function is what I need, so I kept it, and this is why the result
is similar.
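
The behaviour described here (return the first hole that is large enough,
otherwise report the largest hole seen) can be sketched over a plain sorted
array instead of the device tree. Everything in this sketch is hypothetical
naming for illustration; only the output convention for start and len follows
the comment block in the patch, including the optional NULL len.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* One allocated extent on a device; the array must be sorted by start. */
struct dev_extent_sketch {
	uint64_t start;
	uint64_t len;
};

/*
 * Returns 0 and the first hole of at least num_bytes in *start / *len.
 * Returns -ENOSPC if no hole is large enough; *start / *len then describe
 * the largest hole that was seen. len may be NULL.
 */
int find_free_hole_sketch(const struct dev_extent_sketch *extents, size_t n,
			  uint64_t search_start, uint64_t search_end,
			  uint64_t num_bytes, uint64_t *start, uint64_t *len)
{
	uint64_t max_hole_start = search_start;
	uint64_t max_hole_size = 0;
	uint64_t pos = search_start;
	int ret = -ENOSPC;
	size_t i;

	assert(start != NULL);
	for (i = 0; i <= n && pos < search_end; i++) {
		/* A hole ends at the next extent or at the end of the device. */
		uint64_t hole_end = (i < n) ? extents[i].start : search_end;

		if (hole_end > search_end)
			hole_end = search_end;
		if (hole_end > pos) {
			uint64_t hole_size = hole_end - pos;

			if (hole_size > max_hole_size) {
				max_hole_start = pos;
				max_hole_size = hole_size;
			}
			if (hole_size >= num_bytes) {
				ret = 0;
				break;
			}
		}
		if (i < n) {
			uint64_t extent_end = extents[i].start + extents[i].len;

			if (extent_end > pos)
				pos = extent_end;
		}
	}

	*start = max_hole_start;
	if (len)
		*len = max_hole_size;
	return ret;
}
```

A caller that gets -ENOSPC can still use the reported hole to retry with a
smaller size, which is what a chunk allocator needs when no default-size chunk
fits.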

> search_start = max(root->fs_info->alloc_start, search_start);
> 
> in place, which could lead to disregarding the configured alloc_start.

According to the original code, I think alloc_start is just a suggested
parameter; if there is not enough space on the device, we just ignore it.

Sometimes we must retain the original semantics.

Thanks
Miao
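
To make the "suggested parameter" reading concrete, here is a minimal sketch
of how the search start is chosen in the patched function, written as a
standalone helper. pick_search_start_sketch() is a hypothetical name; the 1MB
floor and the comparison are taken from the diff.

```c
#include <assert.h>
#include <stdint.h>

/*
 * The search never starts below 1MB, to stay clear of the superblock.
 * alloc_start is honoured only if a request of num_bytes placed at
 * alloc_start would still end within the device.
 */
uint64_t pick_search_start_sketch(uint64_t alloc_start, uint64_t num_bytes,
				  uint64_t device_size)
{
	uint64_t search_start = 1024 * 1024;

	assert(device_size > 0);
	if (alloc_start + num_bytes <= device_size &&
	    alloc_start > search_start)
		search_start = alloc_start;
	return search_start;
}
```

When the request would not fit behind alloc_start, the hint is dropped and the
search falls back to the 1MB floor, which is the case Arne's remark is about.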

> As both patches address the same problem, it might be good to compare
> them in more detail.
> 
> --
> Arne
> 
> On 22.12.2010 11:47, Miao Xie wrote:
>> - make it return the start position and length of the largest free space
>>   when it cannot find a suitable free space.
>> - make it more readable
