Re: [RFC v4+ hot_track 03/19] vfs: add I/O frequency update function

2012-11-05 Thread Steven Whitehouse
Hi,

On Mon, 2012-10-29 at 12:30 +0800, zwu.ker...@gmail.com wrote:
 From: Zhi Yong Wu wu...@linux.vnet.ibm.com
 
   Add some util helpers to update access frequencies
 for one file or its range.
 
 Signed-off-by: Zhi Yong Wu wu...@linux.vnet.ibm.com
 ---
  fs/hot_tracking.c|  179 
 ++
  fs/hot_tracking.h|7 ++
  include/linux/hot_tracking.h |2 +
  3 files changed, 188 insertions(+), 0 deletions(-)
 
 diff --git a/fs/hot_tracking.c b/fs/hot_tracking.c
 index 68591f0..0a7d9a3 100644
 --- a/fs/hot_tracking.c
 +++ b/fs/hot_tracking.c
 @@ -172,6 +172,137 @@ static void hot_inode_tree_exit(struct hot_info *root)
   }
  }
  
 +struct hot_inode_item
 +*hot_inode_item_find(struct hot_info *root, u64 ino)
 +{
 +	struct hot_inode_item *he;
 +	int ret;
 +
 +again:
 +	spin_lock(&root->lock);
 +	he = radix_tree_lookup(&root->hot_inode_tree, ino);
 +	if (he) {
 +		kref_get(&he->hot_inode.refs);
 +		spin_unlock(&root->lock);
 +		return he;
 +	}
 +	spin_unlock(&root->lock);
 +
 +	he = kmem_cache_zalloc(hot_inode_item_cachep,
 +				GFP_KERNEL | GFP_NOFS);
This doesn't look quite right... which of these two did you mean? I
assume probably just GFP_NOFS

 +	if (!he)
 +		return ERR_PTR(-ENOMEM);
 +
 +	hot_inode_item_init(he, ino, &root->hot_inode_tree);
 +
 +	ret = radix_tree_preload(GFP_NOFS & ~__GFP_HIGHMEM);
 +	if (ret) {
 +		kmem_cache_free(hot_inode_item_cachep, he);
 +		return ERR_PTR(ret);
 +	}
 +
 +	spin_lock(&root->lock);
 +	ret = radix_tree_insert(&root->hot_inode_tree, ino, he);
 +	if (ret == -EEXIST) {
 +		kmem_cache_free(hot_inode_item_cachep, he);
 +		spin_unlock(&root->lock);
 +		radix_tree_preload_end();
 +		goto again;
 +	}
 +	spin_unlock(&root->lock);
 +	radix_tree_preload_end();
 +
 +	kref_get(&he->hot_inode.refs);
 +	return he;
 +}
 +EXPORT_SYMBOL_GPL(hot_inode_item_find);
 +
 +static struct hot_range_item
 +*hot_range_item_find(struct hot_inode_item *he,
 +			u32 start)
 +{
 +	struct hot_range_item *hr;
 +	int ret;
 +
 +again:
 +	spin_lock(&he->lock);
 +	hr = radix_tree_lookup(&he->hot_range_tree, start);
 +	if (hr) {
 +		kref_get(&hr->hot_range.refs);
 +		spin_unlock(&he->lock);
 +		return hr;
 +	}
 +	spin_unlock(&he->lock);
 +
 +	hr = kmem_cache_zalloc(hot_range_item_cachep,
 +				GFP_KERNEL | GFP_NOFS);
Likewise, here too.
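
The two allocations flagged above pass GFP_KERNEL | GFP_NOFS. In kernels of this period GFP_KERNEL is GFP_NOFS with __GFP_FS added, so OR-ing the two yields plain GFP_KERNEL and the "no filesystem re-entry" restriction is silently lost. A minimal userspace sketch of that relationship follows; the SK_* names and the bit values are made up for illustration and are not the kernel's definitions.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative bit values (assumptions for this sketch; the real ones
 * live in include/linux/gfp.h). Only the mask relationship matters. */
#define SK_GFP_WAIT 0x10u	/* allocation may sleep */
#define SK_GFP_IO   0x40u	/* allocation may start block I/O */
#define SK_GFP_FS   0x80u	/* allocation may call back into the fs */

#define SK_GFP_NOFS   (SK_GFP_WAIT | SK_GFP_IO)
#define SK_GFP_KERNEL (SK_GFP_WAIT | SK_GFP_IO | SK_GFP_FS)

/* True if an allocation with these flags may re-enter filesystem code. */
static bool may_enter_fs(unsigned int gfp)
{
	return (gfp & SK_GFP_FS) != 0;
}

/* The mask the patch hands to kmem_cache_zalloc(). */
static unsigned int patch_flags(void)
{
	return SK_GFP_KERNEL | SK_GFP_NOFS;
}

/* Self-check: the OR collapses to GFP_KERNEL, so __GFP_FS stays set. */
static void check_masks(void)
{
	assert(patch_flags() == SK_GFP_KERNEL);
	assert(may_enter_fs(patch_flags()));
	assert(!may_enter_fs(SK_GFP_NOFS));
}
```

If the intent is to avoid recursing into the filesystem from the allocator, passing the NOFS mask alone is what expresses it.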

Steve.



--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC v4+ hot_track 09/19] vfs: add one work queue

2012-11-05 Thread Steven Whitehouse
Hi,

On Mon, 2012-10-29 at 12:30 +0800, zwu.ker...@gmail.com wrote:
 From: Zhi Yong Wu wu...@linux.vnet.ibm.com
 
   Add a per-superblock workqueue and a delayed_work
 to run periodic work to update map info on each superblock.
 
 Signed-off-by: Zhi Yong Wu wu...@linux.vnet.ibm.com
 ---
  fs/hot_tracking.c|   85 
 ++
  fs/hot_tracking.h|3 +
  include/linux/hot_tracking.h |3 +
  3 files changed, 91 insertions(+), 0 deletions(-)
 
 diff --git a/fs/hot_tracking.c b/fs/hot_tracking.c
 index fff0038..0ef9cad 100644
 --- a/fs/hot_tracking.c
 +++ b/fs/hot_tracking.c
 @@ -15,9 +15,12 @@
  #include <linux/module.h>
  #include <linux/spinlock.h>
  #include <linux/hardirq.h>
 +#include <linux/kthread.h>
 +#include <linux/freezer.h>
  #include <linux/fs.h>
  #include <linux/blkdev.h>
  #include <linux/types.h>
 +#include <linux/list_sort.h>
  #include <linux/limits.h>
  #include "hot_tracking.h"
  
 @@ -557,6 +560,67 @@ static void hot_map_array_exit(struct hot_info *root)
   }
  }
  
 +/* Temperature compare function*/
 +static int hot_temp_cmp(void *priv, struct list_head *a,
 +			struct list_head *b)
 +{
 +	struct hot_comm_item *ap =
 +			container_of(a, struct hot_comm_item, n_list);
 +	struct hot_comm_item *bp =
 +			container_of(b, struct hot_comm_item, n_list);
 +
 +	int diff = ap->hot_freq_data.last_temp
 +				- bp->hot_freq_data.last_temp;
 +	if (diff  0)
 +		return -1;
 +	if (diff  0)
 +		return 1;
 +	return 0;
 +}
 +
 +/*
 + * Every sync period we update temperatures for
 + * each hot inode item and hot range item for aging
 + * purposes.
 + */
 +static void hot_update_worker(struct work_struct *work)
 +{
 +	struct hot_info *root = container_of(to_delayed_work(work),
 +					struct hot_info, update_work);
 +	struct hot_inode_item *hi_nodes[8];
 +	u64 ino = 0;
 +	int i, n;
 +
 +	while (1) {
 +		n = radix_tree_gang_lookup(&root->hot_inode_tree,
 +					(void **)hi_nodes, ino,
 +					ARRAY_SIZE(hi_nodes));
 +		if (!n)
 +			break;
 +
 +		ino = hi_nodes[n - 1]->i_ino + 1;
 +		for (i = 0; i < n; i++) {
 +			kref_get(&hi_nodes[i]->hot_inode.refs);
 +			hot_map_array_update(
 +				&hi_nodes[i]->hot_inode.hot_freq_data, root);
 +			hot_range_update(hi_nodes[i], root);
 +			hot_inode_item_put(hi_nodes[i]);
 +		}
 +	}
 +
 +	/* Sort temperature map info */
 +	for (i = 0; i < HEAT_MAP_SIZE; i++) {
 +		list_sort(NULL, &root->heat_inode_map[i].node_list,
 +			hot_temp_cmp);
 +		list_sort(NULL, &root->heat_range_map[i].node_list,
 +			hot_temp_cmp);
 +	}
 +

If this list can potentially have one (or more) entries per inode, then
filesystems with a lot of inodes (millions) may potentially exceed the
max size of list which list_sort() can handle. If that happens it still
works, but you'll get a warning message and it won't be as efficient.

It is something that we've run into with list_sort() and GFS2, but it
only happens very rarely.
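
A sketch of why an over-long list still sorts correctly but less efficiently, assuming the bottom-up merge sort that lib/list_sort.c used at the time: sorted runs of 2^k elements are parked in a fixed array of bins, and once the top bin is reached every later run is merged into it, so the merges become lopsided. The code below is a userspace model with only 3 bins so the effect is easy to trigger; the kernel's limit was, to my knowledge, 20 bits (about a million entries). All names here are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

struct node {
	int key;
	struct node *next;
};

#define SK_MAX_BITS 3	/* tiny on purpose; assumed to be 20 in the kernel */

/* Merge two sorted, NULL-terminated lists; ties are taken from a. */
static struct node *merge(struct node *a, struct node *b)
{
	struct node head, *tail = &head;

	while (a && b) {
		if (a->key <= b->key) {
			tail->next = a;
			a = a->next;
		} else {
			tail->next = b;
			b = b->next;
		}
		tail = tail->next;
	}
	tail->next = a ? a : b;
	return head.next;
}

/* Sort list; *overflowed is set to 1 if the bins were exhausted. */
static struct node *bounded_sort(struct node *list, int *overflowed)
{
	/* part[k] holds a sorted run of 2^k nodes; last slot stays NULL */
	struct node *part[SK_MAX_BITS + 1] = { NULL };
	int lev, max_lev = 0;

	assert(overflowed != NULL);
	*overflowed = 0;

	while (list) {
		struct node *cur = list;

		list = list->next;
		cur->next = NULL;

		/* Carry upwards, like incrementing a binary counter. */
		for (lev = 0; part[lev]; lev++) {
			cur = merge(part[lev], cur);
			part[lev] = NULL;
		}
		if (lev > max_lev) {
			if (lev >= SK_MAX_BITS) {
				/* Out of bins: keep piling into the top one. */
				*overflowed = 1;
				lev--;
			}
			max_lev = lev;
		}
		part[lev] = cur;
	}

	/* Fold the remaining runs together, smallest bin first. */
	for (lev = 0; lev <= max_lev; lev++)
		if (part[lev])
			list = merge(part[lev], list);
	return list;
}

/* Return 1 if the list is in non-decreasing key order. */
static int is_sorted(const struct node *l)
{
	for (; l && l->next; l = l->next)
		if (l->key > l->next->key)
			return 0;
	return 1;
}
```

With SK_MAX_BITS set to 3 the overflow flag is raised from the eighth element onwards, yet the result is still fully sorted; only the balance of the merges suffers.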

Steve.





Re: [RFC v4+ hot_track 03/19] vfs: add I/O frequency update function

2012-11-05 Thread Zhi Yong Wu
On Mon, Nov 5, 2012 at 7:07 PM, Steven Whitehouse swhit...@redhat.com wrote:
 Hi,

 On Mon, 2012-10-29 at 12:30 +0800, zwu.ker...@gmail.com wrote:
 From: Zhi Yong Wu wu...@linux.vnet.ibm.com

   Add some util helpers to update access frequencies
 for one file or its range.

 Signed-off-by: Zhi Yong Wu wu...@linux.vnet.ibm.com
 ---
  fs/hot_tracking.c|  179 
 ++
  fs/hot_tracking.h|7 ++
  include/linux/hot_tracking.h |2 +
  3 files changed, 188 insertions(+), 0 deletions(-)

 diff --git a/fs/hot_tracking.c b/fs/hot_tracking.c
 index 68591f0..0a7d9a3 100644
 --- a/fs/hot_tracking.c
 +++ b/fs/hot_tracking.c
 @@ -172,6 +172,137 @@ static void hot_inode_tree_exit(struct hot_info *root)
   }
  }

 +struct hot_inode_item
 +*hot_inode_item_find(struct hot_info *root, u64 ino)
 +{
 +	struct hot_inode_item *he;
 +	int ret;
 +
 +again:
 +	spin_lock(&root->lock);
 +	he = radix_tree_lookup(&root->hot_inode_tree, ino);
 +	if (he) {
 +		kref_get(&he->hot_inode.refs);
 +		spin_unlock(&root->lock);
 +		return he;
 +	}
 +	spin_unlock(&root->lock);
 +
 +	he = kmem_cache_zalloc(hot_inode_item_cachep,
 +				GFP_KERNEL | GFP_NOFS);
 This doesn't look quite right... which of these two did you mean? I
 assume probably just GFP_NOFS
Yes, good catch, thanks.

 +	if (!he)
 +		return ERR_PTR(-ENOMEM);
 +
 +	hot_inode_item_init(he, ino, &root->hot_inode_tree);
 +
 +	ret = radix_tree_preload(GFP_NOFS & ~__GFP_HIGHMEM);
 +	if (ret) {
 +		kmem_cache_free(hot_inode_item_cachep, he);
 +		return ERR_PTR(ret);
 +	}
 +
 +	spin_lock(&root->lock);
 +	ret = radix_tree_insert(&root->hot_inode_tree, ino, he);
 +	if (ret == -EEXIST) {
 +		kmem_cache_free(hot_inode_item_cachep, he);
 +		spin_unlock(&root->lock);
 +		radix_tree_preload_end();
 +		goto again;
 +	}
 +	spin_unlock(&root->lock);
 +	radix_tree_preload_end();
 +
 +	kref_get(&he->hot_inode.refs);
 +	return he;
 +}
 +EXPORT_SYMBOL_GPL(hot_inode_item_find);
 +
 +static struct hot_range_item
 +*hot_range_item_find(struct hot_inode_item *he,
 +			u32 start)
 +{
 +	struct hot_range_item *hr;
 +	int ret;
 +
 +again:
 +	spin_lock(&he->lock);
 +	hr = radix_tree_lookup(&he->hot_range_tree, start);
 +	if (hr) {
 +		kref_get(&hr->hot_range.refs);
 +		spin_unlock(&he->lock);
 +		return hr;
 +	}
 +	spin_unlock(&he->lock);
 +
 +	hr = kmem_cache_zalloc(hot_range_item_cachep,
 +				GFP_KERNEL | GFP_NOFS);
 Likewise, here too.
ditto

 Steve.






-- 
Regards,

Zhi Yong Wu


Re: [RFC v4+ hot_track 09/19] vfs: add one work queue

2012-11-05 Thread Zhi Yong Wu
On Mon, Nov 5, 2012 at 7:21 PM, Steven Whitehouse swhit...@redhat.com wrote:
 Hi,

 On Mon, 2012-10-29 at 12:30 +0800, zwu.ker...@gmail.com wrote:
 From: Zhi Yong Wu wu...@linux.vnet.ibm.com

   Add a per-superblock workqueue and a delayed_work
 to run periodic work to update map info on each superblock.

 Signed-off-by: Zhi Yong Wu wu...@linux.vnet.ibm.com
 ---
  fs/hot_tracking.c|   85 
 ++
  fs/hot_tracking.h|3 +
  include/linux/hot_tracking.h |3 +
  3 files changed, 91 insertions(+), 0 deletions(-)

 diff --git a/fs/hot_tracking.c b/fs/hot_tracking.c
 index fff0038..0ef9cad 100644
 --- a/fs/hot_tracking.c
 +++ b/fs/hot_tracking.c
 @@ -15,9 +15,12 @@
  #include <linux/module.h>
  #include <linux/spinlock.h>
  #include <linux/hardirq.h>
 +#include <linux/kthread.h>
 +#include <linux/freezer.h>
  #include <linux/fs.h>
  #include <linux/blkdev.h>
  #include <linux/types.h>
 +#include <linux/list_sort.h>
  #include <linux/limits.h>
  #include "hot_tracking.h"

 @@ -557,6 +560,67 @@ static void hot_map_array_exit(struct hot_info *root)
   }
  }

 +/* Temperature compare function*/
 +static int hot_temp_cmp(void *priv, struct list_head *a,
 +			struct list_head *b)
 +{
 +	struct hot_comm_item *ap =
 +			container_of(a, struct hot_comm_item, n_list);
 +	struct hot_comm_item *bp =
 +			container_of(b, struct hot_comm_item, n_list);
 +
 +	int diff = ap->hot_freq_data.last_temp
 +				- bp->hot_freq_data.last_temp;
 +	if (diff  0)
 +		return -1;
 +	if (diff  0)
 +		return 1;
 +	return 0;
 +}
 +
 +/*
 + * Every sync period we update temperatures for
 + * each hot inode item and hot range item for aging
 + * purposes.
 + */
 +static void hot_update_worker(struct work_struct *work)
 +{
 +	struct hot_info *root = container_of(to_delayed_work(work),
 +					struct hot_info, update_work);
 +	struct hot_inode_item *hi_nodes[8];
 +	u64 ino = 0;
 +	int i, n;
 +
 +	while (1) {
 +		n = radix_tree_gang_lookup(&root->hot_inode_tree,
 +					(void **)hi_nodes, ino,
 +					ARRAY_SIZE(hi_nodes));
 +		if (!n)
 +			break;
 +
 +		ino = hi_nodes[n - 1]->i_ino + 1;
 +		for (i = 0; i < n; i++) {
 +			kref_get(&hi_nodes[i]->hot_inode.refs);
 +			hot_map_array_update(
 +				&hi_nodes[i]->hot_inode.hot_freq_data, root);
 +			hot_range_update(hi_nodes[i], root);
 +			hot_inode_item_put(hi_nodes[i]);
 +		}
 +	}
 +
 +	/* Sort temperature map info */
 +	for (i = 0; i < HEAT_MAP_SIZE; i++) {
 +		list_sort(NULL, &root->heat_inode_map[i].node_list,
 +			hot_temp_cmp);
 +		list_sort(NULL, &root->heat_range_map[i].node_list,
 +			hot_temp_cmp);
 +	}
 +

 If this list can potentially have one (or more) entries per inode, then
There is only one hot_inode_item per inode, but there may be multiple
hot_range_items per inode.
 filesystems with a lot of inodes (millions) may potentially exceed the
 max size of list which list_sort() can handle. If that happens it still
 works, but you'll get a warning message and it won't be as efficient.
I haven't run a test at that scale yet. To expose that issue we would
need a large-scale performance test; before that, I want to make sure
the code change is correct.
To be honest, I share your concern about the issue you point out. But
list_sort() performance looks good according to the test results at
the following URL:
https://lkml.org/lkml/2010/1/20/485


 It is something that we've run into with list_sort() and GFS2, but it
 only happens very rarely,
Besides list_sort(), is there any other approach you could share? How
does GFS2 resolve this concern?


 Steve.






-- 
Regards,

Zhi Yong Wu


Corruption at start of files

2012-11-05 Thread Gabriel
Here is what I see in my kern.log (see below).

For me this first happened when the filesystem was close to full (less
than 1GB left), but someone on the irc channel mentioned a similar
problem on suspend to ram.

The files that have checksum failures end up with their first 4k filled
with 0x01 bytes. They were seeing a lot of writes; things like firefox
session data and cookie data, plus files that disappeared before I
could call inode-resolve on them.
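
For anyone wanting to scan for files showing the same symptom, the check described above (first 4k filled with 0x01 bytes) can be sketched as a small helper. This is only an illustration; the function name and the SK_BLOCK constant are assumptions, and it works on a buffer that the caller has already read from the file.

```c
#include <assert.h>
#include <stddef.h>

#define SK_BLOCK 4096u	/* the report mentions the first 4k of each file */

/* Return 1 if the first SK_BLOCK bytes of buf are all 0x01, else 0.
 * Buffers shorter than SK_BLOCK never match. */
static int head_is_all_ones(const unsigned char *buf, size_t len)
{
	size_t i;

	assert(buf != NULL);
	if (len < SK_BLOCK)
		return 0;
	for (i = 0; i < SK_BLOCK; i++)
		if (buf[i] != 0x01)
			return 0;
	return 1;
}
```

A match is only a hint, since a file could legitimately start with such a pattern; comparing against the checksum failures in the kernel log remains the real test.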

I was running 3.6.3 when this happened; I've upgraded to -rcs since
but I haven't tried to reproduce the bug deliberately. I didn't see
relevant changes in the changelog.

Oct 31 17:06:31 moulinex kernel: [93539.008465] BTRFS warning (device dm-16): Aborting unused transaction.
Oct 31 17:06:31 moulinex kernel: [93539.011257] BTRFS warning (device dm-16): Aborting unused transaction.
Oct 31 17:06:31 moulinex kernel: [93539.017137] BTRFS warning (device dm-16): Aborting unused transaction.
Oct 31 17:06:46 moulinex kernel: [93554.728793] use_block_rsv: 16 callbacks suppressed
Oct 31 17:06:46 moulinex kernel: [93554.728795] btrfs: block rsv returned -28
Oct 31 17:06:46 moulinex kernel: [93554.728796] ------------[ cut here ]------------
Oct 31 17:06:46 moulinex kernel: [93554.728818] WARNING: at /home/apw/COD/linux/fs/btrfs/extent-tree.c:6323 use_block_rsv+0x19f/0x1b0 [btrfs]()
Oct 31 17:06:46 moulinex kernel: [93554.728819] Hardware name: System Product Name
Oct 31 17:06:46 moulinex kernel: [93554.728820] Modules linked in: 
snd_seq_dummy vhost_net macvtap macvlan xt_recent bnep rfcomm bluetooth 
snd_hrtimer nls_utf8 sch_fq_codel ebtable_nat ebtables xt_CHECKSUM 
iptable_mangle ipt_MASQUERADE iptable_nat bridge stp llc ppdev lp parport 
deflate ctr twofish_generic twofish_x86_64_3way twofish_x86_64 twofish_common 
camellia_generic camellia_x86_64 serpent_sse2_x86_64 glue_helper lrw 
serpent_generic xts gf128mul blowfish_generic blowfish_x86_64 blowfish_common 
cast5 des_generic xcbc rmd160 sha512_generic crypto_null af_key xfrm_algo 
binfmt_misc dm_crypt snd_hda_codec_hdmi snd_hda_codec_realtek eeepc_wmi 
asus_wmi sparse_keymap coretemp kvm_intel kvm dm_multipath scsi_dh microcode 
arc4 joydev snd_hda_intel snd_hda_codec snd_hwdep snd_pcm rt61pci rt2x00pci 
rt2x00lib snd_seq_midi snd_rawmidi mac80211 snd_seq_midi_event snd_seq 
snd_timer snd_seq_device snd cfg80211 soundcore snd_page_alloc eeprom_93cx6 
serio_raw lpc_ich mei mac_hid k8temp hwmon_vid i2c_nforce2 firewire_sbp2 firew
Oct 31 17:06:46 moulinex kernel: ire_core crc_itu_t psmouse ip6t_REJECT xt_hl 
ip6t_rt nf_conntrack_ipv6 nf_defrag_ipv6 ipt_REJECT xt_multiport xt_limit 
xt_tcpudp xt_addrtype xt_state ip6table_filter ip6_tables 
nf_conntrack_netbios_ns nf_conntrack_broadcast nf_nat_ftp nf_nat 
nf_conntrack_ipv4 nf_defrag_ipv4 nf_conntrack_ftp nf_conntrack iptable_filter 
ip_tables x_tables btrfs zlib_deflate libcrc32c raid10 raid0 multipath linear 
raid456 async_pq async_xor xor async_memcpy async_raid6_recov hid_generic 
raid6_pq async_tx hid_cherry usbhid hid raid1 ghash_clmulni_intel sata_via wmi 
aesni_intel ablk_helper cryptd aes_x86_64 r8169 i915 drm_kms_helper drm 
i2c_algo_bit video [last unloaded: ipmi_msghandler]
Oct 31 17:06:46 moulinex kernel: [93554.728873] Pid: 2230, comm: btrfs-endio-wri Tainted: G W 3.6.3-030603-generic #201210211349
Oct 31 17:06:46 moulinex kernel: [93554.728874] Call Trace:
Oct 31 17:06:46 moulinex kernel: [93554.728880]  [81056f6f] warn_slowpath_common+0x7f/0xc0
Oct 31 17:06:46 moulinex kernel: [93554.728882]  [81056fca] warn_slowpath_null+0x1a/0x20
Oct 31 17:06:46 moulinex kernel: [93554.728889]  [a01feedf] use_block_rsv+0x19f/0x1b0 [btrfs]
Oct 31 17:06:46 moulinex kernel: [93554.728897]  [a020260d] btrfs_alloc_free_block+0x3d/0x220 [btrfs]
Oct 31 17:06:46 moulinex kernel: [93554.728904]  [a01ef38d] ? balance_level+0xcd/0x890 [btrfs]
Oct 31 17:06:46 moulinex kernel: [93554.728906]  [81332e10] ? rb_insert_color+0x110/0x150
Oct 31 17:06:46 moulinex kernel: [93554.728916]  [a022f16c] ? read_extent_buffer+0xbc/0x120 [btrfs]
Oct 31 17:06:46 moulinex kernel: [93554.728918]  [81178ebd] ? kmem_cache_alloc_trace+0x12d/0x150
Oct 31 17:06:46 moulinex kernel: [93554.728925]  [a01ee3b2] __btrfs_cow_block+0x122/0x4f0 [btrfs]
Oct 31 17:06:46 moulinex kernel: [93554.728927]  [81136892] ? set_page_dirty+0x62/0x70
Oct 31 17:06:46 moulinex kernel: [93554.728930]  [8169f37e] ? _raw_spin_lock+0xe/0x20
Oct 31 17:06:46 moulinex kernel: [93554.728936]  [a01ee87c] btrfs_cow_block+0xfc/0x220 [btrfs]
Oct 31 17:06:46 moulinex kernel: [93554.728943]  [a01f29f8] btrfs_search_slot+0x368/0x740 [btrfs]
Oct 31 17:06:46 moulinex kernel: [93554.728951]  [a0206e84] btrfs_lookup_csum+0x74/0x190 [btrfs]
Oct 31 17:06:46 moulinex kernel: [93554.728953]  [81179cfc] ? kmem_cache_alloc+0x11c/0x150
Oct 31 17:06:46 moulinex kernel: [93554.728960]  

Production use with vanilla 3.6.6

2012-11-05 Thread Stefan Priebe - Profihost AG

Hello list,

Is btrfs ready for production use in 3.6.6, or should I backport fixes
from 3.7-rc?


Is it planned to have a stable kernel which will get all btrfs fixes 
backported?


Greets
Stefan


Re: [RFC v4+ hot_track 09/19] vfs: add one work queue

2012-11-05 Thread Steven Whitehouse
Hi,

On Mon, 2012-11-05 at 19:55 +0800, Zhi Yong Wu wrote:
 On Mon, Nov 5, 2012 at 7:21 PM, Steven Whitehouse swhit...@redhat.com wrote:
  Hi,
 
  On Mon, 2012-10-29 at 12:30 +0800, zwu.ker...@gmail.com wrote:
  From: Zhi Yong Wu wu...@linux.vnet.ibm.com
 
Add a per-superblock workqueue and a delayed_work
  to run periodic work to update map info on each superblock.
 
  Signed-off-by: Zhi Yong Wu wu...@linux.vnet.ibm.com
  ---
   fs/hot_tracking.c|   85 
  ++
   fs/hot_tracking.h|3 +
   include/linux/hot_tracking.h |3 +
   3 files changed, 91 insertions(+), 0 deletions(-)
 
  diff --git a/fs/hot_tracking.c b/fs/hot_tracking.c
  index fff0038..0ef9cad 100644
  --- a/fs/hot_tracking.c
  +++ b/fs/hot_tracking.c
  @@ -15,9 +15,12 @@
   #include <linux/module.h>
   #include <linux/spinlock.h>
   #include <linux/hardirq.h>
  +#include <linux/kthread.h>
  +#include <linux/freezer.h>
   #include <linux/fs.h>
   #include <linux/blkdev.h>
   #include <linux/types.h>
  +#include <linux/list_sort.h>
   #include <linux/limits.h>
   #include "hot_tracking.h"
 
  @@ -557,6 +560,67 @@ static void hot_map_array_exit(struct hot_info *root)
}
   }
 
  +/* Temperature compare function*/
  +static int hot_temp_cmp(void *priv, struct list_head *a,
  +			struct list_head *b)
  +{
  +	struct hot_comm_item *ap =
  +			container_of(a, struct hot_comm_item, n_list);
  +	struct hot_comm_item *bp =
  +			container_of(b, struct hot_comm_item, n_list);
  +
  +	int diff = ap->hot_freq_data.last_temp
  +				- bp->hot_freq_data.last_temp;
  +	if (diff  0)
  +		return -1;
  +	if (diff  0)
  +		return 1;
  +	return 0;
  +}
  +
  +/*
  + * Every sync period we update temperatures for
  + * each hot inode item and hot range item for aging
  + * purposes.
  + */
  +static void hot_update_worker(struct work_struct *work)
  +{
  +	struct hot_info *root = container_of(to_delayed_work(work),
  +					struct hot_info, update_work);
  +	struct hot_inode_item *hi_nodes[8];
  +	u64 ino = 0;
  +	int i, n;
  +
  +	while (1) {
  +		n = radix_tree_gang_lookup(&root->hot_inode_tree,
  +					(void **)hi_nodes, ino,
  +					ARRAY_SIZE(hi_nodes));
  +		if (!n)
  +			break;
  +
  +		ino = hi_nodes[n - 1]->i_ino + 1;
  +		for (i = 0; i < n; i++) {
  +			kref_get(&hi_nodes[i]->hot_inode.refs);
  +			hot_map_array_update(
  +				&hi_nodes[i]->hot_inode.hot_freq_data, root);
  +			hot_range_update(hi_nodes[i], root);
  +			hot_inode_item_put(hi_nodes[i]);
  +		}
  +	}
  +
  +	/* Sort temperature map info */
  +	for (i = 0; i < HEAT_MAP_SIZE; i++) {
  +		list_sort(NULL, &root->heat_inode_map[i].node_list,
  +			hot_temp_cmp);
  +		list_sort(NULL, &root->heat_range_map[i].node_list,
  +			hot_temp_cmp);
  +	}
  +
 
  If this list can potentially have one (or more) entries per inode, then
 There is only one hot_inode_item per inode, but there may be multiple
 hot_range_items per inode.
  filesystems with a lot of inodes (millions) may potentially exceed the
  max size of list which list_sort() can handle. If that happens it still
  works, but you'll get a warning message and it won't be as efficient.
 I haven't run a test at that scale yet. To expose that issue we would
 need a large-scale performance test; before that, I want to make sure
 the code change is correct.
 To be honest, I share your concern about the issue you point out. But
 list_sort() performance looks good according to the test results at
 the following URL:
 https://lkml.org/lkml/2010/1/20/485
 
Yes, I think it is good. Also, even when it says that its performance
is poor (via the warning message) it is still much better than the
alternative (of not sorting) in the GFS2 case. So currently our
workaround is to ignore the warning. Due to what we are using it for
(sorting the data blocks for ordered writeback) we only see it very
occasionally, when there has been lots of data write activity with little
journal activity on a node with lots of RAM.

 
  It is something that we've run into with list_sort() and GFS2, but it
  only happens very rarely,
 Besides list_sort(), is there any other approach you could share? How
 does GFS2 resolve this concern?
 
That is an ongoing investigation :-)

I've pondered various options... increase temp variable space in
list_sort(), not using list_sort() and insertion sorting the blocks
instead, flushing the ordered write data early if the list gets too
long, figuring out how to remove blocks written back by the VM from the
list before the sort, and various other 

Re: [RFC v4+ hot_track 09/19] vfs: add one work queue

2012-11-05 Thread Zhi Yong Wu
On Mon, Nov 5, 2012 at 8:07 PM, Steven Whitehouse swhit...@redhat.com wrote:
 Hi,

 On Mon, 2012-11-05 at 19:55 +0800, Zhi Yong Wu wrote:
 On Mon, Nov 5, 2012 at 7:21 PM, Steven Whitehouse swhit...@redhat.com 
 wrote:
  Hi,
 
  On Mon, 2012-10-29 at 12:30 +0800, zwu.ker...@gmail.com wrote:
  From: Zhi Yong Wu wu...@linux.vnet.ibm.com
 
Add a per-superblock workqueue and a delayed_work
  to run periodic work to update map info on each superblock.
 
  Signed-off-by: Zhi Yong Wu wu...@linux.vnet.ibm.com
  ---
   fs/hot_tracking.c|   85 
  ++
   fs/hot_tracking.h|3 +
   include/linux/hot_tracking.h |3 +
   3 files changed, 91 insertions(+), 0 deletions(-)
 
  diff --git a/fs/hot_tracking.c b/fs/hot_tracking.c
  index fff0038..0ef9cad 100644
  --- a/fs/hot_tracking.c
  +++ b/fs/hot_tracking.c
  @@ -15,9 +15,12 @@
   #include <linux/module.h>
   #include <linux/spinlock.h>
   #include <linux/hardirq.h>
  +#include <linux/kthread.h>
  +#include <linux/freezer.h>
   #include <linux/fs.h>
   #include <linux/blkdev.h>
   #include <linux/types.h>
  +#include <linux/list_sort.h>
   #include <linux/limits.h>
   #include "hot_tracking.h"
 
  @@ -557,6 +560,67 @@ static void hot_map_array_exit(struct hot_info *root)
}
   }
 
  +/* Temperature compare function*/
  +static int hot_temp_cmp(void *priv, struct list_head *a,
  +			struct list_head *b)
  +{
  +	struct hot_comm_item *ap =
  +			container_of(a, struct hot_comm_item, n_list);
  +	struct hot_comm_item *bp =
  +			container_of(b, struct hot_comm_item, n_list);
  +
  +	int diff = ap->hot_freq_data.last_temp
  +				- bp->hot_freq_data.last_temp;
  +	if (diff  0)
  +		return -1;
  +	if (diff  0)
  +		return 1;
  +	return 0;
  +}
  +
  +/*
  + * Every sync period we update temperatures for
  + * each hot inode item and hot range item for aging
  + * purposes.
  + */
  +static void hot_update_worker(struct work_struct *work)
  +{
  +	struct hot_info *root = container_of(to_delayed_work(work),
  +					struct hot_info, update_work);
  +	struct hot_inode_item *hi_nodes[8];
  +	u64 ino = 0;
  +	int i, n;
  +
  +	while (1) {
  +		n = radix_tree_gang_lookup(&root->hot_inode_tree,
  +					(void **)hi_nodes, ino,
  +					ARRAY_SIZE(hi_nodes));
  +		if (!n)
  +			break;
  +
  +		ino = hi_nodes[n - 1]->i_ino + 1;
  +		for (i = 0; i < n; i++) {
  +			kref_get(&hi_nodes[i]->hot_inode.refs);
  +			hot_map_array_update(
  +				&hi_nodes[i]->hot_inode.hot_freq_data, root);
  +			hot_range_update(hi_nodes[i], root);
  +			hot_inode_item_put(hi_nodes[i]);
  +		}
  +	}
  +
  +	/* Sort temperature map info */
  +	for (i = 0; i < HEAT_MAP_SIZE; i++) {
  +		list_sort(NULL, &root->heat_inode_map[i].node_list,
  +			hot_temp_cmp);
  +		list_sort(NULL, &root->heat_range_map[i].node_list,
  +			hot_temp_cmp);
  +	}
  +
 
  If this list can potentially have one (or more) entries per inode, then
 There is only one hot_inode_item per inode, but there may be multiple
 hot_range_items per inode.
  filesystems with a lot of inodes (millions) may potentially exceed the
  max size of list which list_sort() can handle. If that happens it still
  works, but you'll get a warning message and it won't be as efficient.
 I haven't run a test at that scale yet. To expose that issue we would
 need a large-scale performance test; before that, I want to make sure
 the code change is correct.
 To be honest, I share your concern about the issue you point out. But
 list_sort() performance looks good according to the test results at
 the following URL:
 https://lkml.org/lkml/2010/1/20/485

 Yes, I think it is good. Also, even when it says that its performance
 is poor (via the warning message) it is still much better than the
 alternative (of not sorting) in the GFS2 case. So currently our
 workaround is to ignore the warning. Due to what we are using it for
 (sorting the data blocks for ordered writeback) we only see it very
 occasionally, when there has been lots of data write activity with little
 journal activity on a node with lots of RAM.
OK.

 
  It is something that we've run into with list_sort() and GFS2, but it
  only happens very rarely,
 Besides list_sort(), is there any other approach you could share? How
 does GFS2 resolve this concern?

 That is an ongoing investigation :-)

 I've pondered various options... increase temp variable space in
 list_sort(), not using list_sort() and insertion sorting the blocks
 instead, flushing the ordered write data early if the list gets too
 long, figuring 

[PATCH 1/2] Btrfs: fix a deadlock in aborting transaction due to ENOSPC

2012-11-05 Thread Liu Bo
When committing a transaction, we may bail out of running delayed refs
due to ENOSPC, and then abort the current transaction to flip the
filesystem into read-only mode.

But we then hit a deadlock on the ref head's lock, because we forget to
release that lock and do the other cleanup.

Signed-off-by: Liu Bo bo.li@oracle.com
---
 fs/btrfs/extent-tree.c |7 +++
 1 files changed, 7 insertions(+), 0 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 3d3e2c1..e0c4809 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -2314,6 +2314,9 @@ static noinline int run_clustered_refs(struct btrfs_trans_handle *trans,
 			kfree(extent_op);
 
 			if (ret) {
+				list_del_init(&locked_ref->cluster);
+				mutex_unlock(&locked_ref->mutex);
+
 				printk(KERN_DEBUG "btrfs: run_delayed_extent_op returned %d\n", ret);
 				spin_lock(&delayed_refs->lock);
 				return ret;
@@ -2356,6 +2359,10 @@ static noinline int run_clustered_refs(struct btrfs_trans_handle *trans,
 		count++;
 
 		if (ret) {
+			if (locked_ref) {
+				list_del_init(&locked_ref->cluster);
+				mutex_unlock(&locked_ref->mutex);
+			}
 			printk(KERN_DEBUG "btrfs: run_one_delayed_ref returned %d\n", ret);
 			spin_lock(&delayed_refs->lock);
 			return ret;
-- 
1.7.7.6
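
The bug class fixed by this patch, an early return that leaves a lock held, can be modelled in a few lines. The sketch below uses a toy lock that only records whether it is held, so a second acquisition (which would deadlock with a real mutex) trips an assertion instead. Every name in it is hypothetical; it is not the btrfs code.

```c
#include <assert.h>
#include <errno.h>

/* Toy stand-in for a ref head: records lock and list state only. */
struct toy_ref_head {
	int locked;	/* 1 while the "mutex" is held */
	int on_cluster;	/* 1 while linked on the cluster list */
};

static void toy_lock(struct toy_ref_head *h)
{
	assert(!h->locked);	/* a real mutex would deadlock here */
	h->locked = 1;
}

static void toy_unlock(struct toy_ref_head *h)
{
	assert(h->locked);
	h->locked = 0;
}

/* Process one head; step_err simulates the result of running a ref. */
static int run_one(struct toy_ref_head *h, int step_err)
{
	toy_lock(h);
	h->on_cluster = 1;

	if (step_err) {
		/* Error path: undo everything before bailing out. */
		h->on_cluster = 0;
		toy_unlock(h);
		return step_err;
	}

	h->on_cluster = 0;
	toy_unlock(h);
	return 0;
}
```

Dropping the two cleanup lines from the error branch makes the next run_one() call on the same head hit the assertion in toy_lock(), which is the toy equivalent of the deadlock described in the commit message.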



[PATCH 2/2] Btrfs: fix a double free on pending snapshots in error handling

2012-11-05 Thread Liu Bo
When creating a snapshot, a failure to commit the transaction can end up
aborting the transaction, followed by a cleanup of it, in which
we free all snapshots that are still pending.

So check for the abort and avoid a double free of the pending snapshot.

Signed-off-by: Liu Bo bo.li@oracle.com
---
 fs/btrfs/ioctl.c |6 +-
 1 files changed, 5 insertions(+), 1 deletions(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 8fcf9a5..4e1a1ce 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -571,8 +571,12 @@ static int create_snapshot(struct btrfs_root *root, struct dentry *dentry,
 		ret = btrfs_commit_transaction(trans,
 					       root->fs_info->extent_root);
 	}
-	if (ret)
+	if (ret) {
+		/* cleanup_transaction has freed this for us */
+		if (trans->aborted)
+			pending_snapshot = NULL;
 		goto fail;
+	}
 
 	ret = pending_snapshot->error;
 	if (ret)
-- 
1.7.7.6
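
The ownership hand-off behind this fix can be sketched in userspace, assuming (the diff context does not show it) that the fail label ends by freeing pending_snapshot. Once the abort path has freed the object, the caller clears its own pointer so that the common exit frees NULL, which is a no-op. All toy_* names are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

struct toy_trans {
	int aborted;
	void *pending;	/* object the transaction owns once queued */
};

static int toy_free_calls;	/* counts real frees, for the demo only */

static void toy_free(void *p)
{
	if (p)
		toy_free_calls++;
	free(p);	/* free(NULL) does nothing */
}

/* Simulated commit: on failure it aborts and frees what it owns. */
static int toy_commit(struct toy_trans *t, int fail)
{
	assert(t != NULL);
	if (!fail)
		return 0;
	t->aborted = 1;
	toy_free(t->pending);
	t->pending = NULL;
	return -1;
}

static int toy_create_snapshot(int fail_commit)
{
	struct toy_trans t = { 0, NULL };
	void *pending = malloc(16);
	int ret;

	if (!pending)
		return -1;
	t.pending = pending;

	ret = toy_commit(&t, fail_commit);
	if (ret && t.aborted)
		pending = NULL;	/* already freed by the abort path */

	toy_free(pending);	/* common exit path */
	return ret;
}
```

Whether the commit succeeds or fails, the object is freed exactly once; removing the NULL assignment would free it twice on the failure path.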



[PATCH] Btrfs: Don't trust the superblock label and simply printk("%s") it

2012-11-05 Thread Stefan Behrens
Someone who is root or capable(CAP_SYS_ADMIN) could corrupt the
superblock and make Btrfs printk("%s") crash while holding the
uuid_mutex since nobody forces a limit on the string. Since the
uuid_mutex is significant, the system would be unusable
afterwards.

Signed-off-by: Stefan Behrens sbehr...@giantdisaster.de
---
 fs/btrfs/volumes.c | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index eeed97d..a429cc6 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -764,10 +764,13 @@ int btrfs_scan_one_device(const char *path, fmode_t flags, void *holder,
 	devid = btrfs_stack_device_id(&disk_super->dev_item);
 	transid = btrfs_super_generation(disk_super);
 	total_devices = btrfs_super_num_devices(disk_super);
-	if (disk_super->label[0])
+	if (disk_super->label[0]) {
+		if (disk_super->label[BTRFS_LABEL_SIZE - 1])
+			disk_super->label[BTRFS_LABEL_SIZE - 1] = '\0';
 		printk(KERN_INFO "device label %s ", disk_super->label);
-	else
+	} else {
 		printk(KERN_INFO "device fsid %pU ", disk_super->fsid);
+	}
 	printk(KERN_CONT "devid %llu transid %llu %s\n",
 	       (unsigned long long)devid, (unsigned long long)transid, path);
 	ret = device_list_add(path, disk_super, devid, fs_devices_ret);
-- 
1.8.0
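
A userspace sketch of the idea in this patch: a fixed-size on-disk string must not be handed to "%s" unless it is known to be NUL-terminated. The first helper mirrors what the patch does (force the last byte to NUL); the second is a possible alternative, not part of the patch, that bounds the read with a precision instead of modifying the buffer. SK_LABEL_SIZE is assumed to match BTRFS_LABEL_SIZE, and both function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define SK_LABEL_SIZE 256	/* assumed to mirror BTRFS_LABEL_SIZE */

/* Force NUL termination of a fixed-size label in place.
 * Returns 1 if the last byte had to be overwritten, 0 otherwise. */
static int terminate_label(char *label, size_t size)
{
	assert(label != NULL && size > 0);
	if (label[size - 1] == '\0')
		return 0;
	label[size - 1] = '\0';
	return 1;
}

/* Alternative: leave the buffer untouched and cap the read with a
 * precision, so at most size bytes are consumed even without a NUL. */
static int format_label(char *out, size_t outsz,
			const char *label, size_t size)
{
	assert(out != NULL && label != NULL);
	return snprintf(out, outsz, "device label %.*s", (int)size, label);
}
```

The in-place variant silently truncates a label that fills the whole field, which matches the behaviour discussed in the review of this patch.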



Re: [PATCH] Btrfs: Don't trust the superblock label and simply printk("%s") it

2012-11-05 Thread David Sterba
On Mon, Nov 05, 2012 at 02:10:49PM +0100, Stefan Behrens wrote:
 --- a/fs/btrfs/volumes.c
 +++ b/fs/btrfs/volumes.c
 @@ -764,10 +764,13 @@ int btrfs_scan_one_device(const char *path, fmode_t flags, void *holder,
  	devid = btrfs_stack_device_id(&disk_super->dev_item);
  	transid = btrfs_super_generation(disk_super);
  	total_devices = btrfs_super_num_devices(disk_super);
 -	if (disk_super->label[0])
 +	if (disk_super->label[0]) {
 +		if (disk_super->label[BTRFS_LABEL_SIZE - 1])
 +			disk_super->label[BTRFS_LABEL_SIZE - 1] = '\0';

The label set via 'btrfs fi label' will also set the last-1 byte to 0,
so this keeps it as expected, although it is silent.

thanks,
Reviewed-by: David Sterba dste...@suse.cz



Re: [PATCH 13/16] fs/btrfs: use WARN

2012-11-05 Thread David Sterba
On Sat, Nov 03, 2012 at 11:58:34AM +0100, Julia Lawall wrote:
 From: Julia Lawall julia.law...@lip6.fr
 
 Use WARN rather than printk followed by WARN_ON(1), for conciseness.
 
 A simplified version of the semantic patch that makes this transformation
 is as follows: (http://coccinelle.lip6.fr/)
 
 // <smpl>
 @@
 expression list es;
 @@
 
 -printk(
 +WARN(1,
   es);
 -WARN_ON(1);
 // </smpl>
 
 Signed-off-by: Julia Lawall julia.law...@lip6.fr
Reviewed-by: David Sterba dste...@suse.cz


Re: [PATCH 1/8] fs/btrfs: drop if around WARN_ON

2012-11-05 Thread David Sterba
On Sat, Nov 03, 2012 at 09:30:18PM +0100, Julia Lawall wrote:
 From: Julia Lawall julia.law...@lip6.fr
 
 Just use WARN_ON rather than an if containing only WARN_ON(1).
 
 A simplified version of the semantic patch that makes this transformation
 is as follows: (http://coccinelle.lip6.fr/)
 
 // <smpl>
 @@
 expression e;
 @@
 - if (e) WARN_ON(1);
 + WARN_ON(e);
 // </smpl>
 
 Signed-off-by: Julia Lawall julia.law...@lip6.fr
Reviewed-by: David Sterba dste...@suse.cz
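
As a rough userspace sketch of why the transformation above preserves behaviour: the stand-in WARN_ON below is an assumption for illustration only (the kernel macro also dumps a stack trace), and warn_on_report, warn_count, check_before and check_after are hypothetical names. Both forms warn exactly when the condition is true; the second simply lets the macro evaluate the condition itself.

```c
#include <stdio.h>

static int warn_count;	/* number of warnings emitted so far */

/* Report a warning when cond is true and hand the condition back. */
static int warn_on_report(int cond, const char *file, int line)
{
	if (cond) {
		warn_count++;
		fprintf(stderr, "WARNING: at %s:%d\n", file, line);
	}
	return cond;
}

/* Userspace stand-in: evaluates to 1 if the condition held, else 0. */
#define WARN_ON(cond) warn_on_report(!!(cond), __FILE__, __LINE__)

/* Before the patch: an if that contains only WARN_ON(1). */
static void check_before(int e)
{
	if (e)
		WARN_ON(1);
}

/* After the patch: the condition is passed to WARN_ON directly. */
static void check_after(int e)
{
	WARN_ON(e);
}
```

Because the stand-in returns the truth value of the condition, it can also be used inside an if, as in if (WARN_ON(e)) return;.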


Re: merging printk and WARN

2012-11-05 Thread David Sterba
On Sun, Nov 04, 2012 at 09:25:53PM +0100, Julia Lawall wrote:
 It looks like these patches were not a good idea, because in each case the
 printk provides an error level, and WARN then provides another one.

I think this is not a problem within btrfs at the place where this has
changed.

david


Re: (late) REQUEST: Default mkfs.btrfs block size

2012-11-05 Thread David Sterba
On Wed, Oct 31, 2012 at 12:20:39PM +, Alex wrote:
 As one 'stuck' with 4k leaves on my main machine for the moment, can I request
 the btrfs-progs v0.20 defaults to more efficient decent block sizes before
 release. Most distro install programs for the moment don't give access to the
 options at install time and there seems to be is a significant advantage to
 16k or 32k

IMHO this should be fixed inside the installer, changing defaults for a
core utility will affect everybody. 4k is the most tested option and
thus can be considered safe for everybody.

The installer may let you enter a shell and create the filesystem by
hand, then point it at that filesystem for the installation.

david


Re: (late) REQUEST: Default mkfs.btrfs block size

2012-11-05 Thread cwillu
On Mon, Nov 5, 2012 at 10:06 AM, David Sterba d...@jikos.cz wrote:
 On Wed, Oct 31, 2012 at 12:20:39PM +, Alex wrote:
 As one 'stuck' with 4k leaves on my main machine for the moment, can I request
 the btrfs-progs v0.20 defaults to more efficient decent block sizes before
 release. Most distro install programs for the moment don't give access to the
 options at install time and there seems to be is a significant advantage to
 16k or 32k

 IMHO this should be fixed inside the installer, changing defaults for a
 core utility will affect everybody. 4k is the most tested option and
 thus can be considered safe for everybody.

 The installer may let you to enter a shell and create the filesystem by
 hand, then point it to use it for installation.

If we know a better setting, we should default to it.  Punting the
decision to the distro just means I'll spend the next 3 years telling
people "yeah, distro X doesn't set it to the recommended setting
(which isn't the mkfs default), and there's no way to change it
without wiping and reinstalling using manual partitioning blah blah
blah."


Re: How to find (out if) files sharing content?

2012-11-05 Thread David Sterba
On Wed, Oct 31, 2012 at 09:02:15PM +0800, Jeff Liu wrote:
 I propose this because OCFS2 report shared space in this way combine with du(1).
 
 An old patch set to teach du(1) aware of reflinked file:
 https://oss.oracle.com/pipermail/ocfs2-devel/2010-September/007293.html

Patch looks ok, the shared size is requested by an option.

 Do you means that the costs is very expensive for userland extent status checkup
 per file?

The most expensive part is IMO not in userspace, it does in-memory lookups.

  And without any possibility to turn this off, I'm afraid this will render
  FIEMAP unusable in practice.
 For OCFS2, the FIEMAP_EXTENT_SHARED flag will be set upon fiemap ioctl(2) if
 an extent is OCFS2_EXT_REFCOUNTED (i.e. reflinked or cloned), which means that
 FIEMAP_EXTENT_SHARED is not a persistent flag, but I have no idea how Btrfs
 would be in this point. :(

After some research, I think this could work for btrfs without
unwanted performance penalties.

There's the fiemap::fm_flags field that can be extended to request the
shared extent info from fiemap, so the information is not computed
unconditionally (that was my concern before). The rest is only
implementation details of how to speed up the file extent -> refcount
info lookups.

david
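
A hedged sketch of how userspace could total the shared bytes of a file from fiemap results, using the existing FS_IOC_FIEMAP ioctl and FIEMAP_EXTENT_SHARED extent flag from the Linux headers. The opt-in request flag proposed above does not exist in this sketch; only the existing FIEMAP_FLAG_SYNC is passed, and the comment marks where such a flag would go. The function names shared_bytes and file_shared_bytes are hypothetical, and whether a filesystem reports the shared flag at all depends on its fiemap implementation.

```c
#include <stdint.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <linux/fs.h>		/* FS_IOC_FIEMAP */
#include <linux/fiemap.h>	/* struct fiemap, FIEMAP_EXTENT_SHARED */

/* Sum the lengths of all extents that carry the SHARED flag. */
static uint64_t shared_bytes(const struct fiemap_extent *ext, unsigned int n)
{
	uint64_t total = 0;

	for (unsigned int i = 0; i < n; i++)
		if (ext[i].fe_flags & FIEMAP_EXTENT_SHARED)
			total += ext[i].fe_length;
	return total;
}

/*
 * Map up to max_extents extents of an open file and return the number
 * of shared bytes among them, or -1 on allocation or ioctl failure.
 * Files with more extents would need repeated calls (not shown).
 */
static int64_t file_shared_bytes(int fd, unsigned int max_extents)
{
	size_t sz = sizeof(struct fiemap) +
		    (size_t)max_extents * sizeof(struct fiemap_extent);
	struct fiemap *fm = calloc(1, sz);
	int64_t result = -1;

	if (!fm)
		return -1;
	fm->fm_start = 0;
	fm->fm_length = FIEMAP_MAX_OFFSET;
	/* A hypothetical "report shared extents" request flag would be ORed in here. */
	fm->fm_flags = FIEMAP_FLAG_SYNC;
	fm->fm_extent_count = max_extents;
	if (ioctl(fd, FS_IOC_FIEMAP, fm) == 0)
		result = (int64_t)shared_bytes(fm->fm_extents,
					       fm->fm_mapped_extents);
	free(fm);
	return result;
}
```

Keeping the summation separate from the ioctl call means the expensive part stays in the kernel lookup, which is the cost the opt-in flag is meant to avoid for callers that do not need the information.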


Re: btrfs defrag problem

2012-11-05 Thread David Sterba
On Thu, Nov 01, 2012 at 05:17:04AM +0800, ching wrote:
  3. Is any possible to online defrag a btrfs partition without hindered by 
  mount point/polyinstantied directories?
  Sorry, I do not understand the question.
 
 when a device is mounted under a directory, files in the directory is 
 hidden, and files in the device is available, right?
 when a directory is polyinstantied, files in the original directory is 
 hidden, and files in the polyinstantied directory is available,
 
 How to get past them and pass those hidden files to defrag command?

I hope I get it right, so unless you have a reference to the directory
with hidden files (using your term), there's no way to access them. And
this is a more generic question, not related to btrfs itself. The hidden
files may also belong to a different filesystem.

david


Re: What's the minimum size I can shrink my FS to?

2012-11-05 Thread David Sterba
On Sat, Nov 03, 2012 at 03:10:52PM +1030, Jordan Windsor wrote:
 [root@archpc ~]# btrfs fi df /home/jordan/Storage/
 Data: total=580.88GB, used=490.88GB

This is getting full, 84%, there is not much chance of getting rid of
substantially many 1G-chunks through the 'usage=1' balance filter.
Some of the space between 490G and 580G will be spent on slack space and
fragmentation, the rest may be packed together by a higher usage= value
(but will be slower due to relocating more data).

 System, DUP: total=32.00MB, used=76.00KB
 System: total=4.00MB, used=0.00
 Metadata, DUP: total=13.01GB, used=1001.83MB

If you intend to shrink a filesystem, all space group types must be
taken into account, so here you have at least

580G + 2x32M + 4M + 2x13G = ~607G

 [root@archpc ~]# btrfs fi sh
 failed to read /dev/sr0
 Label: 'Storage'  uuid: 717d4a43-38b3-495f-841b-d223068584de
     Total devices 1 FS bytes used 491.86GB
     devid    1 size 612.04GB used 606.96GB path /dev/sda6
                                   ^^^^^^^^
confirmed :)

So basically you cannot go under this number when shrinking. I think you
can squeeze the metadata space down to 2G (or maybe to 1G, it's getting
very close to 1G so hard to guess) by the -musage= filter AND using at
least 3.7 kernel (or 3.6+ chris' for-linus branch) with the fixed
over-allocation bug (otherwise the size will stay pinned at 2% of the
filesystem size).

david
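
The arithmetic in the message above can be sketched as a small helper, assuming the lower bound is simply the sum of all allocated chunk totals from 'btrfs fi df', with DUP profiles counted twice. The struct chunk_total, the function min_shrink_size_gb and the example table are hypothetical names for this illustration; the figures are the ones quoted in the thread, with MB values converted to GB by dividing by 1024.

```c
/* One line of 'btrfs fi df' output: allocated total and copies on disk. */
struct chunk_total {
	const char *type;
	double total_gb;	/* "total=" value, in the GB units btrfs prints */
	int copies;		/* 2 for DUP, 1 otherwise */
};

/* Values quoted in the thread. */
static const struct chunk_total example[] = {
	{ "Data",          580.88,        1 },
	{ "System, DUP",   32.0 / 1024.0, 2 },
	{ "System",        4.0 / 1024.0,  1 },
	{ "Metadata, DUP", 13.01,         2 },
};

/* Lower bound for a shrink: every allocated chunk must still fit. */
static double min_shrink_size_gb(const struct chunk_total *c, int n)
{
	double sum = 0.0;

	for (int i = 0; i < n; i++)
		sum += c[i].total_gb * c[i].copies;
	return sum;
}
```

For the example table this comes to a little under 607GB, in line with the 606.96GB "used" figure that 'btrfs fi show' reports for the device.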


Re: [Request for review] [RFC] Add label support for snapshots and subvols

2012-11-05 Thread David Sterba
On Mon, Nov 05, 2012 at 03:24:48PM +0800, Anand Jain wrote:
  feature   xattr   btrfs-kernel-way
  [1]       No      Yes
  [2]       No      Yes
  [3]       Yes     No
 
 
  [1]. Ability to read subvol label without mount

It is possible to read it offline: one can traverse the data structures
the same way as the kernel does, i.e. root_tree -> subvolume fs_tree ->
root directory item -> xattr item.

  [2]. Full-ability to log and track the property
   when it is modified

What is expected to happen when the label changes? I understand that
somebody may change the xattr value silently, but let's say this is
changed through kernel -- do you intend to prohibit any changes or issue
some notification or whatever?

  [3]. risk-free patch ?

No patch is risk free :) but yes, xattrs use an established and tested
infrastructure.

david


Re: btrfs defrag problem

2012-11-05 Thread ching
On 11/06/2012 06:57 AM, David Sterba wrote:
 On Thu, Nov 01, 2012 at 05:17:04AM +0800, ching wrote:
 3. Is any possible to online defrag a btrfs partition without hindered by 
 mount point/polyinstantied directories?
 Sorry, I do not understand the question.
 when a device is mounted under a directory, files in the directory is 
 hidden, and files in the device is available, right?
 when a directory is polyinstantied, files in the original directory is 
 hidden, and files in the polyinstantied directory is available,

 How to get past them and pass those hidden files to defrag command?
 I hope I get it right, so unless you have a reference to the directory
 with hidden files (using your term), there's no way to access them. And
 this is a more generic question, not related to btrfs itself. The hidden
 files may also belong to a different filesystem.

 david


Thanks for your explanation.

ching


Re: How to find (out if) files sharing content?

2012-11-05 Thread Jeff Liu
On 11/06/2012 06:45 AM, David Sterba wrote:
 On Wed, Oct 31, 2012 at 09:02:15PM +0800, Jeff Liu wrote:
 I propose this because OCFS2 report shared space in this way combine with 
 du(1).

 An old patch set to teach du(1) aware of reflinked file:
 https://oss.oracle.com/pipermail/ocfs2-devel/2010-September/007293.html
 
 Patch looks ok, the shared size is requested by an option.
 
 Do you means that the costs is very expensive for userland extent status 
 checkup per file?
 
 The most expensive part is IMO not in userspace, it does in-memory lookups.
 
 And without any possibility to turn this off,I'm afraid this will render 
 FIEMAP unusable in practice.
 For OCFS2, the FIEMAP_EXTENT_SHARED flag will be set upon fiemap ioctl(2) if 
 an extent
 is OCFS2_EXT_REFCOUNTED(i.e. reflinked or cloned), which means that 
 FIEMAP_EXTENT_SHARED
 is not a persistent flag, but I have no idea how Btrfs would be in this 
 point. :(
 
 After some research, I think this could work for btrfs without
 unwanted performance penalties.
 
 There's the fiemap::fm_flags field that can be extended to request the
 shared extent info from fiemap, so the information is not computed
 unconditionally (that was my concern before). The rest is only
 implementation details how to speed up the file extent - refcount info
 lookups.
Thanks for your confirmation.

-Jeff
 
 david


slow after btrfs balance

2012-11-05 Thread william L'Heureux

Hello,
 
Here is a little background on my setup: mdadm -> lvm -> dmcrypt -> btrfs.
 
The filesystem was originally ext4, and I converted it to btrfs.
When I first did the conversion, everything was fine. After a while, I ran a
btrfs balance, and since that day the write speed has been very slow. When I
do things like unzip/unrar, the load average climbs to 15+, which makes the
system almost unusable. During the whole process I went from 3.2 wheezy to 3.5.
I went back to 3.2 to check the stability, and the results are the same.
I was wondering whether simply changing the chunk would solve my problem?
 
Thanks
 
William


Re: Production use with vanilla 3.6.6

2012-11-05 Thread Fajar A. Nugraha
On Mon, Nov 5, 2012 at 7:07 PM, Stefan Priebe - Profihost AG
s.pri...@profihost.ag wrote:
 Hello list,

 is btrfs ready for production use in 3.6.6? Or should i backport fixes from
 3.7-rc?

 Is it planned to have a stable kernel which will get all btrfs fixes
 backported?

I would say no to both, but you should check with the distros that
support btrfs (Oracle Linux and SLES). In particular, ask whether they
backport fixes, and what exactly "supported" status gives you
when you buy support for that distro.

-- 
Fajar