Re: [PATCH v3] lib: add size unit t/p/e to memparse

2014-06-13 Thread Brendan Hide

On 12/06/14 23:15, Andrew Morton wrote:

On Wed, 2 Apr 2014 16:54:37 +0800 Gui Hecheng guihc.f...@cn.fujitsu.com wrote:


For modern filesystems such as btrfs, t/p/e size level operations
are common.
add size unit t/p/e parsing to memparse

Signed-off-by: Gui Hecheng guihc.f...@cn.fujitsu.com
---
changelog
v1-v2: replace kilobyte with kibibyte, and others
v2-v3: add missing unit bytes in comment
---
  lib/cmdline.c | 25 -
  1 file changed, 20 insertions(+), 5 deletions(-)

diff --git a/lib/cmdline.c b/lib/cmdline.c
index eb67911..511b9be 100644
--- a/lib/cmdline.c
+++ b/lib/cmdline.c
@@ -119,11 +119,17 @@ char *get_options(const char *str, int nints, int *ints)
   *@retptr: (output) Optional pointer to next char after parse completes
   *
   *Parses a string into a number.  The number stored at @ptr is
- * potentially suffixed with %K (for kilobytes, or 1024 bytes),
- * %M (for megabytes, or 1048576 bytes), or %G (for gigabytes, or
- * 1073741824).  If the number is suffixed with K, M, or G, then
- * the return value is the number multiplied by one kilobyte, one
- * megabyte, or one gigabyte, respectively.
+ * potentially suffixed with
+ * %K (for kibibytes, or 1024 bytes),
+ * %M (for mebibytes, or 1048576 bytes),
+ * %G (for gibibytes, or 1073741824 bytes),
+ * %T (for tebibytes, or 1099511627776 bytes),
+ * %P (for pebibytes, or 1125899906842624 bytes),
+ * %E (for exbibytes, or 1152921504606846976 bytes).

I'm afraid I find these names quite idiotic - we all know what the
traditional terms mean so why go and muck with it.

Also, kibibytes sounds like cat food.

Hi, Andrew

While I agree it sounds like cat food, it seemed like a good opportunity 
to fix a minor issue that is otherwise unlikely to be fixed for a very 
long time. Should we feel uncomfortable with the patch, as is, because 
of language/correctness friction? Pedantry included, the patch is 
correct. ;)


Thanks

--
__
Brendan Hide
http://swiftspirit.co.za/
http://www.webafrica.co.za/?AFF1E97

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v4] lib: add size unit t/p/e to memparse

2014-06-13 Thread Brendan Hide

On 13/06/14 03:42, Gui Hecheng wrote:

For modern filesystems such as btrfs, t/p/e size level operations
are common.
add size unit t/p/e parsing to memparse

Signed-off-by: Gui Hecheng guihc.f...@cn.fujitsu.com
---
changelog
v1-v2: replace kilobyte with kibibyte, and others
v2-v3: add missing unit bytes in comment
v3-v4: remove idiotic name for K,M,G,P,T,E
---
  lib/cmdline.c | 15 ++-
  1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/lib/cmdline.c b/lib/cmdline.c
index d4932f7..76a712e 100644
--- a/lib/cmdline.c
+++ b/lib/cmdline.c
@@ -121,11 +121,7 @@ EXPORT_SYMBOL(get_options);
   *@retptr: (output) Optional pointer to next char after parse completes
   *
   *Parses a string into a number.  The number stored at @ptr is
- * potentially suffixed with %K (for kilobytes, or 1024 bytes),
- * %M (for megabytes, or 1048576 bytes), or %G (for gigabytes, or
- * 1073741824).  If the number is suffixed with K, M, or G, then
- * the return value is the number multiplied by one kilobyte, one
- * megabyte, or one gigabyte, respectively.
+ * potentially suffixed with K, M, G, T, P, E.
   */
  
  unsigned long long memparse(const char *ptr, char **retptr)

@@ -135,6 +131,15 @@ unsigned long long memparse(const char *ptr, char **retptr)
unsigned long long ret = simple_strtoull(ptr, endptr, 0);
  
  	switch (*endptr) {

+   case 'E':
+   case 'e':
+   ret = 10;
+   case 'P':
+   case 'p':
+   ret = 10;
+   case 'T':
+   case 't':
+   ret = 10;
case 'G':
case 'g':
ret = 10;

Ah, I see - you've removed all reference to their names. That's good too. :)

--
__
Brendan Hide
http://swiftspirit.co.za/
http://www.webafrica.co.za/?AFF1E97

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v4] lib: add size unit t/p/e to memparse

2014-06-13 Thread Gui Hecheng
On Fri, 2014-06-13 at 07:55 +0200, Brendan Hide wrote:
 On 13/06/14 03:42, Gui Hecheng wrote:
  For modern filesystems such as btrfs, t/p/e size level operations
  are common.
  add size unit t/p/e parsing to memparse
 
  Signed-off-by: Gui Hecheng guihc.f...@cn.fujitsu.com
  ---
  changelog
  v1-v2: replace kilobyte with kibibyte, and others
  v2-v3: add missing unit bytes in comment
  v3-v4: remove idiotic name for K,M,G,P,T,E
  ---
lib/cmdline.c | 15 ++-
1 file changed, 10 insertions(+), 5 deletions(-)
 
  diff --git a/lib/cmdline.c b/lib/cmdline.c
  index d4932f7..76a712e 100644
  --- a/lib/cmdline.c
  +++ b/lib/cmdline.c
  @@ -121,11 +121,7 @@ EXPORT_SYMBOL(get_options);
 *@retptr: (output) Optional pointer to next char after parse 
  completes
 *
 *Parses a string into a number.  The number stored at @ptr is
  - * potentially suffixed with %K (for kilobytes, or 1024 bytes),
  - * %M (for megabytes, or 1048576 bytes), or %G (for gigabytes, or
  - * 1073741824).  If the number is suffixed with K, M, or G, then
  - * the return value is the number multiplied by one kilobyte, one
  - * megabyte, or one gigabyte, respectively.
  + * potentially suffixed with K, M, G, T, P, E.
 */

unsigned long long memparse(const char *ptr, char **retptr)
  @@ -135,6 +131,15 @@ unsigned long long memparse(const char *ptr, char 
  **retptr)
  unsigned long long ret = simple_strtoull(ptr, endptr, 0);

  switch (*endptr) {
  +   case 'E':
  +   case 'e':
  +   ret = 10;
  +   case 'P':
  +   case 'p':
  +   ret = 10;
  +   case 'T':
  +   case 't':
  +   ret = 10;
  case 'G':
  case 'g':
  ret = 10;
 Ah, I see - you've removed all reference to their names. That's good too. :)
 
Thank you for your review!
I think maybe more people would like cleaner things in the kernel.

-Gui

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Deadlock/high load

2014-06-13 Thread Alin Dobre
On 13/06/14 04:37, Liu Bo wrote:
 The output of 'btrfs filesystem df' is appreciate, it can help determine if 
 the
 FS has entered into 'almost full' situation, and that may cause a bug that 
 pages
 are not marked with writeback tag and lead to processes's endless waiting.

I'll send it as soon as it happens again, thanks!

Cheers,
Alin.
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Deadlock/high load

2014-06-13 Thread Alin Dobre
On 13/06/14 04:37, Liu Bo wrote:
 The output of 'btrfs filesystem df' is appreciate, it can help determine if 
 the
 FS has entered into 'almost full' situation, and that may cause a bug that 
 pages
 are not marked with writeback tag and lead to processes's endless waiting.

The output now (when the host is stable) is:
Data, single: total=54.01GiB, used=24.48GiB
System, single: total=4.00MiB, used=16.00KiB
Metadata, single: total=3.01GiB, used=1.31GiB
unknown, single: total=464.00MiB, used=0.00

Cheers,
Alin.
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3] lib: add size unit t/p/e to memparse

2014-06-13 Thread Hugo Mills
On Fri, Jun 13, 2014 at 07:54:44AM +0200, Brendan Hide wrote:
 On 12/06/14 23:15, Andrew Morton wrote:
 On Wed, 2 Apr 2014 16:54:37 +0800 Gui Hecheng guihc.f...@cn.fujitsu.com 
 wrote:
 + * %K (for kibibytes, or 1024 bytes),
 + * %M (for mebibytes, or 1048576 bytes),
 + * %G (for gibibytes, or 1073741824 bytes),
 + * %T (for tebibytes, or 1099511627776 bytes),
 + * %P (for pebibytes, or 1125899906842624 bytes),
 + * %E (for exbibytes, or 1152921504606846976 bytes).
 I'm afraid I find these names quite idiotic - we all know what the
 traditional terms mean so why go and muck with it.
 
 Also, kibibytes sounds like cat food.
 Hi, Andrew
 
 While I agree it sounds like cat food, it seemed like a good opportunity to
 fix a minor issue that is otherwise unlikely to be fixed for a very long
 time. Should we feel uncomfortable with the patch, as is, because of
 language/correctness friction? Pedantry included, the patch is correct. ;)

   Last night, I wrote a very grumpy reply to Andrew. I'm glad I
didn't send it, because Brendan has managed to cover at least one of
my points much more politely than I did.

   My other comment is that TB vs TiB is a 10% difference in the
magnitude of the number, and so the accumulated error is now no longer
small enough to be brushed under the carpet as we all did in days
past. By Andrew's thinking, a 4 TB disk is 3.638 TB in size. I'd say a
4 TB disk is 3.638 TiB in size, and I can be precise (±1GB in the
latter case) with both values.

   Hugo.

PS. Let's just not talk about 1.44 MB floppy disks.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
 --- 2 + 2 = 5,  for sufficiently large values of 2. --- 


signature.asc
Description: Digital signature


Re: [PATCH v2] Btrfs: fix RCU correctness warning when running sanity tests

2014-06-13 Thread Filipe David Manana
On Tue, Jun 10, 2014 at 7:17 PM, Filipe David Borba Manana
fdman...@gmail.com wrote:
 When CONFIG_PROVE_RCU=y and CONFIG_PROVE_RCU_REPEATEDLY=y, the
 following was dumped in dmesg:

 [ 3197.218064] ===
 [ 3197.218064] [ INFO: suspicious RCU usage. ]
 [ 3197.218066] 3.15.0-rc8-fdm-btrfs-next-33+ #4 Not tainted
 [ 3197.218067] ---
 [ 3197.218068] include/linux/radix-tree.h:196 suspicious 
 rcu_dereference_check() usage!
 [ 3197.218068]
 [ 3197.218068] other info that might help us debug this:
 [ 3197.218068]
 [ 3197.218070]
 [ 3197.218070] rcu_scheduler_active = 1, debug_locks = 1
 [ 3197.218071] 1 lock held by modprobe/12024:
 [ 3197.218072]  #0:  ((fs_info-buffer_lock)-rlock){+.+...}, at: 
 [a025c5fa] btrfs_free_dummy_root+0x5a/0x1d0 [btrfs]
 [ 3197.218093]
 [ 3197.218093] stack backtrace:
 [ 3197.218095] CPU: 3 PID: 12024 Comm: modprobe Not tainted 
 3.15.0-rc8-fdm-btrfs-next-33+ #4
 [ 3197.218096] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
 [ 3197.218097]  0001 8800af18fc18 81685c5a 
 feb0
 [ 3197.218099]  8800cf6ccb40 8800af18fc48 810a6316 
 8801d955f640
 [ 3197.218101]  8800d719e328 8800d719e370 8800d719c000 
 8800af18fcb8
 [ 3197.218102] Call Trace:
 [ 3197.218105]  [81685c5a] dump_stack+0x4e/0x68
 [ 3197.218108]  [810a6316] lockdep_rcu_suspicious+0xe6/0x130
 [ 3197.218119]  [a025c728] btrfs_free_dummy_root+0x188/0x1d0 [btrfs]
 [ 3197.218129]  [a025f56a] btrfs_test_qgroups+0xea/0x1bb [btrfs]
 [ 3197.218137]  [a03a19d2] ? 
 ftrace_define_fields_btrfs_space_reservation+0xfd/0xfd [btrfs]
 [ 3197.218144]  [a03a19d2] ? 
 ftrace_define_fields_btrfs_space_reservation+0xfd/0xfd [btrfs]
 [ 3197.218151]  [a03a1ab7] init_btrfs_fs+0xe5/0x184 [btrfs]
 [ 3197.218154]  [81000352] do_one_initcall+0x102/0x150
 [ 3197.218157]  [8103d223] ? set_memory_nx+0x43/0x50
 [ 3197.218160]  [81682668] ? set_section_ro_nx+0x6d/0x74
 [ 3197.218162]  [810d91cc] load_module+0x1cdc/0x2630
 [ 3197.218164]  [810d4e90] ? show_initstate+0x60/0x60
 [ 3197.218166]  [810d9c9e] SyS_finit_module+0x8e/0x90
 [ 3197.218168]  [81698212] system_call_fastpath+0x16/0x1b

 Signed-off-by: Filipe David Borba Manana fdman...@gmail.com

Chris,

Please drop this patch from your integration branch.
Sasha fixed this too but in a simpler way:
https://patchwork.kernel.org/patch/4337091/
(In fact both patches applied probably trigger another warning the rcu
usage correctness checker)

Thanks

 ---

 V2: Added missing rcu read unlock if a retry is needed.

  fs/btrfs/tests/btrfs-tests.c | 12 +++-
  1 file changed, 7 insertions(+), 5 deletions(-)

 diff --git a/fs/btrfs/tests/btrfs-tests.c b/fs/btrfs/tests/btrfs-tests.c
 index a5dcacb..bdb1f05 100644
 --- a/fs/btrfs/tests/btrfs-tests.c
 +++ b/fs/btrfs/tests/btrfs-tests.c
 @@ -130,8 +130,8 @@ static void btrfs_free_dummy_fs_info(struct btrfs_fs_info 
 *fs_info)
 struct radix_tree_iter iter;
 void **slot;

 -   spin_lock(fs_info-buffer_lock);
  restart:
 +   rcu_read_lock();
 radix_tree_for_each_slot(slot, fs_info-buffer_radix, iter, 0) {
 struct extent_buffer *eb;

 @@ -140,15 +140,17 @@ restart:
 continue;
 /* Shouldn't happen but that kind of thinking creates CVE's */
 if (radix_tree_exception(eb)) {
 -   if (radix_tree_deref_retry(eb))
 +   if (radix_tree_deref_retry(eb)) {
 +   rcu_read_unlock();
 goto restart;
 +   }
 continue;
 }
 -   spin_unlock(fs_info-buffer_lock);
 +   rcu_read_unlock();
 free_extent_buffer_stale(eb);
 -   spin_lock(fs_info-buffer_lock);
 +   goto restart;
 }
 -   spin_unlock(fs_info-buffer_lock);
 +   rcu_read_unlock();

 btrfs_free_qgroup_config(fs_info);
 btrfs_free_fs_roots(fs_info);
 --
 1.9.1




-- 
Filipe David Manana,

Reasonable men adapt themselves to the world.
 Unreasonable men adapt the world to themselves.
 That's why all progress depends on unreasonable men.
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Btrfs filesystem read only on error

2014-06-13 Thread David Bloquel
Hi,

Today my btrfs disk suddently turned to read only after than an error
occured, I guess it's a security measure to avoid any corrupted data
after a filesystem error.

Here is the syslog:

Jun 13 04:59:39 nas1 kernel: [110663.709572] btrfs: page allocation
failure: order:1, mode:0x200020
Jun 13 04:59:39 nas1 kernel: [110663.709579] CPU: 3 PID: 21177 Comm:
btrfs Not tainted 3.12-0.bpo.1-amd64 #1 Debian 3.12.9-1~bpo70+1
Jun 13 04:59:39 nas1 kernel: [110663.709581] Hardware name: Supermicro
A1SAi/A1SAi, BIOS 1.0b 11/06/2013
Jun 13 04:59:39 nas1 kernel: [110663.709584]  
0001 814be0b3 00200020
Jun 13 04:59:39 nas1 kernel: [110663.709590]  8112535c
88047ffefb00  00020002
Jun 13 04:59:39 nas1 kernel: [110663.709594]  88047ffefb08
0002  
Jun 13 04:59:39 nas1 kernel: [110663.709599] Call Trace:
Jun 13 04:59:39 nas1 kernel: [110663.709608]  [814be0b3] ?
dump_stack+0x41/0x51
Jun 13 04:59:39 nas1 kernel: [110663.709614]  [8112535c] ?
warn_alloc_failed+0x10c/0x160
Jun 13 04:59:39 nas1 kernel: [110663.709618]  [811292b2] ?
__alloc_pages_nodemask+0x9c2/0xaa0
Jun 13 04:59:39 nas1 kernel: [110663.709623]  [8116d5c5] ?
kmem_getpages+0x65/0x1a0
Jun 13 04:59:39 nas1 kernel: [110663.709627]  [8116efd2] ?
fallback_alloc+0x172/0x260
Jun 13 04:59:39 nas1 kernel: [110663.709631]  [8116f904] ?
kmem_cache_alloc+0x144/0x1f0
Jun 13 04:59:39 nas1 kernel: [110663.709635]  [81281f98] ?
__idr_pre_get+0x68/0x90
Jun 13 04:59:39 nas1 kernel: [110663.709639]  [81282128] ?
ida_pre_get+0x18/0x90
Jun 13 04:59:39 nas1 kernel: [110663.709644]  [81187561] ?
get_anon_bdev+0x21/0xe0
Jun 13 04:59:39 nas1 kernel: [110663.709665]  [a048a768] ?
btrfs_init_fs_root+0xa8/0xf0 [btrfs]
Jun 13 04:59:39 nas1 kernel: [110663.709677]  [a048b32e] ?
btrfs_get_fs_root+0xce/0x220 [btrfs]
Jun 13 04:59:39 nas1 kernel: [110663.709689]  [a048e5fe] ?
create_pending_snapshot+0x6ee/0x980 [btrfs]
Jun 13 04:59:39 nas1 kernel: [110663.709701]  [a048e919] ?
create_pending_snapshots+0x89/0xa0 [btrfs]
Jun 13 04:59:39 nas1 kernel: [110663.709713]  [a048fd1a] ?
btrfs_commit_transaction+0x46a/0x9f0 [btrfs]
Jun 13 04:59:39 nas1 kernel: [110663.709718]  [81082d20] ?
add_wait_queue+0x60/0x60
Jun 13 04:59:39 nas1 kernel: [110663.709731]  [a04bf39d] ?
btrfs_mksubvol.isra.59+0x3cd/0x3f0 [btrfs]
Jun 13 04:59:39 nas1 kernel: [110663.709743]  [a04bf4d0] ?
btrfs_ioctl_snap_create_transid+0x110/0x1a0 [btrfs]
Jun 13 04:59:39 nas1 kernel: [110663.709755]  [a04bf621] ?
btrfs_ioctl_snap_create_v2+0x41/0x140 [btrfs]
Jun 13 04:59:39 nas1 kernel: [110663.709767]  [a04bf6e9] ?
btrfs_ioctl_snap_create_v2+0x109/0x140 [btrfs]
Jun 13 04:59:39 nas1 kernel: [110663.709780]  [a04c2bd2] ?
btrfs_ioctl+0xc32/0x1d30 [btrfs]
Jun 13 04:59:39 nas1 kernel: [110663.709785]  [81151544] ?
mmap_region+0x274/0x600
Jun 13 04:59:39 nas1 kernel: [110663.709790]  [814c75b8] ?
__do_page_fault+0x2b8/0x540
Jun 13 04:59:39 nas1 kernel: [110663.709795]  [811971ca] ?
do_vfs_ioctl+0x8a/0x4f0
Jun 13 04:59:39 nas1 kernel: [110663.709799]  [811976d0] ?
SyS_ioctl+0xa0/0xc0
Jun 13 04:59:39 nas1 kernel: [110663.709803]  [814cb7b9] ?
system_call_fastpath+0x16/0x1b
 Jun 13 04:59:39 nas1 kernel: [110663.709805] Mem-Info:
Jun 13 04:59:39 nas1 kernel: [110663.709807] Node 0 DMA per-cpu:
Jun 13 04:59:39 nas1 kernel: [110663.709810] CPU0: hi:0, btch:
  1 usd:   0
Jun 13 04:59:39 nas1 kernel: [110663.709812] CPU1: hi:0, btch:
  1 usd:   0
Jun 13 04:59:39 nas1 kernel: [110663.709814] CPU2: hi:0, btch:
  1 usd:   0
Jun 13 04:59:39 nas1 kernel: [110663.709816] CPU3: hi:0, btch:
  1 usd:   0
Jun 13 04:59:39 nas1 kernel: [110663.709818] CPU4: hi:0, btch:
  1 usd:   0
Jun 13 04:59:39 nas1 kernel: [110663.709820] CPU5: hi:0, btch:
  1 usd:   0
Jun 13 04:59:39 nas1 kernel: [110663.709822] CPU6: hi:0, btch:
  1 usd:   0
Jun 13 04:59:39 nas1 kernel: [110663.709824] CPU7: hi:0, btch:
  1 usd:   0
Jun 13 04:59:39 nas1 kernel: [110663.709825] Node 0 DMA32 per-cpu:
Jun 13 04:59:39 nas1 kernel: [110663.709828] CPU0: hi:  186, btch:
 31 usd:   0
Jun 13 04:59:39 nas1 kernel: [110663.709830] CPU1: hi:  186, btch:
 31 usd:   0
Jun 13 04:59:39 nas1 kernel: [110663.709832] CPU2: hi:  186, btch:
 31 usd:   0
Jun 13 04:59:39 nas1 kernel: [110663.709834] CPU3: hi:  186, btch:
 31 usd:   0
Jun 13 04:59:39 nas1 kernel: [110663.709836] CPU4: hi:  186, btch:
 31 usd:   0
Jun 13 04:59:39 nas1 kernel: [110663.709838] CPU5: hi:  186, btch:
 31 usd:   0
Jun 13 04:59:39 nas1 kernel: [110663.709840] CPU6: hi:  186, btch:
 31 usd:   0
Jun 13 04:59:39 nas1 kernel: [110663.709842] CPU7: hi:  186, btch:
 31 usd:   0
Jun 13 04:59:39 nas1 kernel: [110663.709844] 

[GIT PULL] Btrfs #2 for 3.16

2014-06-13 Thread Chris Mason

Hi Linus,

Please pull my for-linus branch:

git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs.git for-linus

This has a few fixes since our last pull and a new ioctl for doing btree
searches from userland.  It's very similar to the existing ioctl, but
lets us return larger items back down to the app.

Gerhard Heift (7) commits (+183/-38):
btrfs: tree_search, copy_to_sk: return EOVERFLOW for too small buffer 
(+26/-2)
btrfs: tree_search, copy_to_sk: return needed size on EOVERFLOW (+15/-9)
btrfs: tree_search, search_ioctl: direct copy to userspace (+33/-15)
btrfs: tree_search, search_ioctl: accept varying buffer (+11/-7)
btrfs: tree_search: eliminate redundant nr_items check (+7/-5)
btrfs: new function read_extent_buffer_to_user (+40/-0)
btrfs: new ioctl TREE_SEARCH_V2 (+51/-0)

Eric Sandeen (3) commits (+11/-7):
btrfs: free ulist in qgroup_shared_accounting() error path (+3/-1)
btrfs: fix use of uninit ret in end_extent_writepage() (+1/-1)
btrfs: fix error handling in create_pending_snapshot (+7/-5)

Wang Shilong (1) commits (+7/-2):
Btrfs: fix unfinished readahead thread for raid5/6 degraded mounting

Filipe Manana (1) commits (+2/-0):
Btrfs: fix qgroups sanity test crash or hang

Sasha Levin (1) commits (+1/-1):
btrfs: prevent RCU warning when dereferencing radix tree slot

Total: (13) commits

 fs/btrfs/extent_io.c  |  39 ++-
 fs/btrfs/extent_io.h  |   3 +
 fs/btrfs/ioctl.c  | 147 ++
 fs/btrfs/qgroup.c |   4 +-
 fs/btrfs/reada.c  |   9 ++-
 fs/btrfs/tests/btrfs-tests.c  |   2 +-
 fs/btrfs/tests/qgroup-tests.c |   2 +
 fs/btrfs/transaction.c|  12 ++--
 include/uapi/linux/btrfs.h|  10 +++
 9 files changed, 192 insertions(+), 36 deletions(-)
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] Remove a leading '+'.

2014-06-13 Thread Eric Sandeen
On 6/13/14, 4:34 PM, Adam Buchbinder wrote:
 It was added in commit aab7f3ae3eb02fb7e8a6a31d7e2ada704a3dcd5e,
 most likely as a copy-pasteo.
 
 Signed-off-by: Adam Buchbinder abuchbin...@google.com
 ---
  btrfs-show-super.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)
 
 diff --git a/btrfs-show-super.c b/btrfs-show-super.c
 index f42cd15..ed0311f 100644
 --- a/btrfs-show-super.c
 +++ b/btrfs-show-super.c
 @@ -50,7 +50,7 @@ static void print_usage(void)
   fprintf(stderr, \t-f : print full superblock information\n);
   fprintf(stderr, \t-a : print information of all superblocks\n);
   fprintf(stderr, \t-i super_mirror : specify which mirror to print 
 out\n);
 -+fprintf(stderr, \t-F : attempt to dump superblocks with bad magic\n);
 + fprintf(stderr, \t-F : attempt to dump superblocks with bad magic\n);
   fprintf(stderr, %s\n, BTRFS_BUILD_VERSION);
  }
  
 

Whoa, git blame blames me, but that's not the patch I sent.  ;)

Reviewed-by: Eric Sandeen sand...@redhat.com

thanks,
-Eric

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: R: Re: Slow startup of systemd-journal on BTRFS

2014-06-13 Thread Goffredo Baroncelli
Hi Dave

On 06/13/2014 01:24 AM, Dave Chinner wrote:
 On Thu, Jun 12, 2014 at 12:37:13PM +, Duncan wrote:
 Goffredo Baroncelli kreij...@libero.it posted on Thu, 12 Jun 2014
 13:13:26 +0200 as excerpted:

 systemd has a very stupid journal write pattern. It checks if there is
 space in the file for the write, and if not it fallocates the small
 amount of space it needs (it does *4 byte* fallocate calls!) and then
 does the write to it.  All this does is fragment the crap out of the log
 files because the filesystems cannot optimise the allocation patterns.

 I checked the code, and to me it seems that the fallocate() are done in
 FILE_SIZE_INCREASE unit (actually 8MB).

 FWIW, either 4 byte or 8 MiB fallocate calls would be bad, I think 
 actually pretty much equally bad without NOCOW set on the file.
 
 So maybe it's been fixed in systemd since the last time I looked.
 Yup:
 
 http://cgit.freedesktop.org/systemd/systemd/commit/src/journal/journal-file.c?id=eda4b58b50509dc8ad0428a46e20f6c5cf516d58
 
 The reason it was changed? To save a syscall per append, not to
 prevent fragmentation of the file, which was the problem everyone
 was complaining about...

thanks for pointing that. However I am performing my tests on a fedora 20 with 
systemd-208, which seems have this change
 
 Why?  Because btrfs data blocks are 4 KiB.  With COW, the effect for 
 either 4 byte or 8 MiB file allocations is going to end up being the 
 same, forcing (repeated until full) rewrite of each 4 KiB block into its 
 own extent.


I am reaching the conclusion that fallocate is not the problem. The fallocate 
increase the filesize of about 8MB, which is enough for some logging. So it is 
not called very often.

I have to investigate more what happens when the log are copied from /run to 
/var/log/journal: this is when journald seems to slow all.

I am prepared a PC which reboot continuously; I am collecting the time required 
to finish the boot vs the fragmentation of the system.journal file vs the 
number of boot. The results are dramatic: after 20 reboot, the boot time 
increase of 20-30 seconds. Doing a defrag of system.journal reduces the boot 
time to the original one, but after another 20 reboot, the boot time still 
requires 20-30 seconds more

It is a slow PC, but I saw the same behavior also on a more modern pc (i5 with 
8GB).

For both PC the HD is a mechanical one...

 
 And that's now a btrfs problem :/

Are you sure ?

ghigo@venice:/var/log$ sudo filefrag messages
messages: 29 extents found

ghigo@venice:/var/log$ sudo filefrag journal/*/system.journal
journal/41d686199835445395ac629d576dfcb9/system.journal: 1378 extents found

So the old rsyslog creates files with fewer fragments. BTRFS (but it seems also 
xfs) for sure highlights more this problem than other filesystem. But also 
systemd seems to create a lot of extens.

BR
G.Baroncelli



 
 Cheers,
 
 Dave.
 


-- 
gpg @keyserver.linux.it: Goffredo Baroncelli (kreijackATinwind.it
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] Fix a use-after-free in the volumes code.

2014-06-13 Thread Adam Buchbinder
When a struct btrfs_fs_devices was being torn down by
btrfs_close_devices(), there was an invalidated pointer in the global
list fs_uuids which still pointed to it; if a device was closed and
then reopened (which btrfs-convert does), freed memory would be
accessed.

This was found using ThreadSanitizer (pretty much doing what
AddressSanitizer would, but not exiting after the first failure).
To reproduce, build with -fsanitize=thread and run 'make test'.
Representative output is below.

This change makes the current tests TSan-clean.

WARNING: ThreadSanitizer: heap-use-after-free (pid=29161)
  Read of size 8 at 0x7d18eee0 by main thread:
#0 memcmp ??:0
#1 find_fsid .../volumes.c:81
#2 device_list_add .../volumes.c:95
#3 btrfs_scan_one_device .../volumes.c:259
#4 btrfs_scan_fs_devices .../disk-io.c:1002
#5 __open_ctree_fd .../disk-io.c:1090
#6 open_ctree_fd .../disk-io.c:1191
#7 do_convert .../btrfs-convert.c:2317
#8 main .../btrfs-convert.c:2745

  Previous write of size 8 at 0x7d18eee0 by main thread:
#0 free ??:0
#1 btrfs_close_devices .../volumes.c:191
#2 close_ctree .../disk-io.c:1401
#3 do_convert .../btrfs-convert.c:2300
#4 main .../btrfs-convert.c:2745

  Location is heap block of size 96 at 0x7d18eee0 allocated by main thread:
#0 calloc ??:0 (exe+0x0002acc6)
#1 device_list_add .../volumes.c:97
#2 btrfs_scan_one_device .../volumes.c:259
#3 btrfs_scan_fs_devices .../disk-io.c:1002
#4 __open_ctree_fd .../disk-io.c:1090
#5 open_ctree_fd .../disk-io.c:1191
#6 do_convert .../btrfs-convert.c:2256
#7 main .../btrfs-convert.c:2745

Signed-off-by: Adam Buchbinder abuchbin...@google.com
---
 volumes.c | 1 +
 volumes.h | 2 +-
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/volumes.c b/volumes.c
index a61928c..36f6050 100644
--- a/volumes.c
+++ b/volumes.c
@@ -188,6 +188,7 @@ again:
goto again;
}
 
+   list_del(fs_devices-list);
free(fs_devices);
return 0;
 }
diff --git a/volumes.h b/volumes.h
index b1ff3d0..2e960b5 100644
--- a/volumes.h
+++ b/volumes.h
@@ -67,7 +67,7 @@ struct btrfs_device {
 struct btrfs_fs_devices {
u8 fsid[BTRFS_FSID_SIZE]; /* FS specific uuid */
 
-   /* the device with this id has the most recent coyp of the super */
+   /* the device with this id has the most recent copy of the super */
u64 latest_devid;
u64 latest_trans;
u64 lowest_devid;
-- 
2.0.0.526.g5318336

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Properly size the leafsize field in the mdrestore_struct struct.

2014-06-13 Thread Adam Buchbinder
I'd like to follow up on this a bit, because the way I found it was *weird*.

MSan found an uninitialized write. Reproducing the issue through GDB
showed that there's a struct mdrestore_struct type with a member of
type u64 called 'leafsize' which was... half-initialized? Four bytes
were uninitialized, four were initialized. Looking further, it was
being set by a call to a getter function defined in ctree.h, which
pulls a particular member from a superblock struct... which is of type
u32. Aha! So the struct member was missized. Easy enough to fix.
Except that the mdrestore_struct is initialized using calloc, so the
part that wasn't getting written to was
actually initialized, and the part that *was* was showing up as
uninitialized. And actually contained a sane value, reliably. What was
going on?

Turns out that btrfs's use of zlib, which was the uninstrumented
system version, caused anything allocated by uncompress() to show up
as uninitialized. (Trying to use __msan_unpoison did not work, as I
have an iffy install of the
sanitizer in the first place (it's whatever happened to come with
clang-3.3), and MSanDR (the dynamic-instrumentation tool which marks
all uninstrumented libraries as good) is apparently not trivial to
install. I may just build zlib.) But the point is, it would have been
annoying and sad if this had been a false positive through and
through, but I *did* find an actual issue here using MSan, even though
MSan was pointing in the exact wrong place.

Adam Buchbinder
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: some project ideas: NFS4 ACLs, resilience on the same device, allowing to specify which devices are distinct in a RAID

2014-06-13 Thread Christoph Anton Mitterer
On Sat, 2014-06-07 at 10:25 +0200, Goffredo Baroncelli wrote: 
 You can add new ideas to the wiki pages, supporting by link and other info 
 were available. This is the real nature of the wiki pages.
I've added some stuff now:
https://btrfs.wiki.kernel.org/index.php?title=Project_ideasaction=historysubmitdiff=27001oldid=26611

@upstream-devs: Just drop the ideas wich don't fit :)


Cheers,
Chris.


smime.p7s
Description: S/MIME cryptographic signature


Re: Slow startup of systemd-journal on BTRFS

2014-06-13 Thread Duncan
Goffredo Baroncelli posted on Sat, 14 Jun 2014 00:19:31 +0200 as
excerpted:

 On 06/13/2014 01:24 AM, Dave Chinner wrote:
 On Thu, Jun 12, 2014 at 12:37:13PM +, Duncan wrote:

 FWIW, either 4 byte or 8 MiB fallocate calls would be bad, I think
 actually pretty much equally bad without NOCOW set on the file.
 
 So maybe it's been fixed in systemd since the last time I looked.
 Yup:
 
 http://cgit.freedesktop.org/systemd/systemd/commit/src/journal/journal-
file.c?id=eda4b58b50509dc8ad0428a46e20f6c5cf516d58
 
 The reason it was changed? To save a syscall per append, not to
 prevent fragmentation of the file, which was the problem everyone was
 complaining about...
 
 thanks for pointing that. However I am performing my tests on a fedora
 20 with systemd-208, which seems have this change
 
 Why?  Because btrfs data blocks are 4 KiB.  With COW, the effect for
 either 4 byte or 8 MiB file allocations is going to end up being the
 same, forcing (repeated until full) rewrite of each 4 KiB block into
 its own extent.
 
 I am reaching the conclusion that fallocate is not the problem. The
 fallocate increase the filesize of about 8MB, which is enough for some
 logging. So it is not called very often.

But... 

If a file isn't (properly[1]) set NOCOW (and the btrfs isn't mounted with 
nodatacow), then an fallocate of 8 MiB will increase the file size by 8 
MiB and write that out.  So far so good as at that point the 8 MiB should 
be a single extent.  But then, data gets written into 4 KiB blocks of 
that 8 MiB one at a time, and because btrfs is COW, the new data in the 
block must be written to a new location.

Which effectively means that by the time the 8 MiB is filled, each 4 KiB 
block has been rewritten to a new location and is now an extent unto 
itself.  So now that 8 MiB is composed of 2048 new extents, each one a 
single 4 KiB block in size.

=:^(

Tho as I already stated, for file sizes upto 128 MiB or so anyway[2], the 
btrfs autodefrag mount option should at least catch that and rewrite 
(again), this time sequentially.

 I have to investigate more what happens when the log are copied from
 /run to /var/log/journal: this is when journald seems to slow all.

That's an interesting point.

At least in theory, during normal operation journald will write to 
/var/log/journal, but there's a point during boot at which it flushes the 
information accumulated during boot from the volatile /run location to 
the non-volatile /var/log location.  /That/ write, at least, should be 
sequential, since there will be  4 KiB of journal accumulated that needs 
to be transferred at once.  However, if it's being handled by the forced 
pre-write fallocate described above, then that's not going to be the 
case, as it'll then be a rewrite of already fallocated file blocks and 
thus will get COWed exactly as I described above.

=:^(


 I am prepared a PC which reboot continuously; I am collecting the time
 required to finish the boot vs the fragmentation of the system.journal
 file vs the number of boot. The results are dramatic: after 20 reboot,
 the boot time increase of 20-30 seconds. Doing a defrag of
 system.journal reduces the boot time to the original one, but after
 another 20 reboot, the boot time still requires 20-30 seconds more
 
 It is a slow PC, but I saw the same behavior also on a more modern pc
 (i5 with 8GB).
 
 For both PC the HD is a mechanical one...

The problem's duplicable.  That's the first step toward a fix. =:^)

 And that's now a btrfs problem :/
 
 Are you sure ?

As they say, Whoosh!

At least here, I interpreted that remark as primarily sarcastic 
commentary on the systemd devs' apparent attitude, which can be 
(controversially) summarized as: Systemd doesn't have problems because 
it's perfect.  Therefore, any problems you have with systemd must instead 
be with other components which systemd depends on.

IOW, it's a btrfs problem now in practice, not because it is so in a 
technical sense, but because systemd defines it as such and is unlikely 
to budge, so the only way to achieve progress is for btrfs to deal with 
it. 

An arguably fairer and more impartial assessment of this particular 
situations suggests that neither btrfs, which as a COW-based filesystem, 
like all COW-based filesystems has the existing-file-rewrite as a major 
technical challenge that it must deal with /somehow/, nor systemd, which 
in choosing to use fallocate is specifically putting itself in that 
existing-file-rewrite class, are entirely at fault.

But that doesn't matter if one side refuses to budge, because then the 
other side must do so regardless of where the fault was, if there is to 
be any progress at all.

Meanwhile, I've predicted before and do so here again, that as btrfs 
moves toward mainstream and starts supplanting ext* as the assumed Linux 
default filesystem, some of these problems will simply go away, because 
at that point, various apps are no longer optimized for the assumed 
default 

Re: Btrfs filesystem read only on error

2014-06-13 Thread Duncan
David Bloquel posted on Fri, 13 Jun 2014 12:51:32 +0200 as excerpted:

 Informations about my system:
 Linux nas1 3.12-0.bpo.1-amd64 #1 SMP Debian 3.12.9-1~bpo70+1
 (2014-02-07)
 x86_64 GNU/Linux brtfs version:
 Btrfs v0.19

Yikes, that's old!  Particularly the 0.19 btrfs-progs, tho there have 
been a lot of kernel btrfs fixes since the 3.12 series as well, not all 
of which will have been backported.

Due to the rate of fixes going into btrfs, users are strongly recommended 
to keep up with current stable kernels, now 3.15, and not get /too/ far 
behind on btrfs-progs either.

I'm actually a bit behind myself ATM as I've been focused on other 
things, kernel from git but not rebuild since 3.14, and btrfs-progs from 
git at v3.14.2.  I'd recommend kernel 3.14 if not 3.15, and a btrfs-progs 
no older than 3.12.  I think there are some snapshotting fixes in kernel 
3.15 so you'll probably want that given the snapshotting you report 
doing.  (I do little snapshotting and 3.14 has been pretty stable for me, 
the reason I've let myself get a bit behind, what with other things I 
have been doing.)

-- 
Duncan - List replies preferred.   No HTML msgs.
Every nonfree program has a lord, a master --
and if you use the program, he is your master.  Richard Stallman

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html