Re: Btrfs ENOSPC issue

2015-04-04 Thread Filipe David Manana
On Sat, Apr 4, 2015 at 12:36 AM, Justin Maggard jmaggar...@gmail.com wrote:
 Hi,

 We're hitting a consistently reproducible ENOSPC condition with a
 simple test case:

 # truncate -s 1T btrfs.fs
 # mkfs.btrfs btrfs.fs
 # mount btrfs.fs /mnt/
 # fallocate -l 1021G /mnt/fallocate
 # btrfs fi sync /mnt/
 # dd if=/dev/zero of=/mnt/dd bs=1G
 # btrfs fi sync /mnt/
 # rm /mnt/fallocate
 # btrfs fi sync /mnt/
 # fallocate -l 20G /mnt/fallocate
 fallocate: /mnt/fallocate: fallocate failed: No space left on device

 I continue to get ENOSPC even after unmount / mount.

 Here we have 1022GB free as reported by df, yet we can't allocate
 20GB.  I tried the integration-4.1 tree, which had the same results.
 I also added Zhao Lei's most recent ENOSPC patchset from today, but it
 didn't seem to help.

Have you also tried the following patch (not yet in any release or rc)?

https://patchwork.kernel.org/patch/5800231/


 So it appears that when allocating the first chunk,
 find_free_dev_extent() finds a huge hole, and allocates a portion of
 that free 1022GB.  Real chunk allocation is delayed until transaction
 submit and does not insert the DEV_EXTENT item into the device tree
 immediately, so transaction->pending_chunks is used to record pending
 chunks.

 When it comes to the next chunk allocation, find_free_dev_extent()
 detects the same huge hole, but contains_pending_extent() returns true
 and sets hole_size to 0.  This means we skip our one and only huge
 free space hole and try to search for some other free space holes.

 The problem occurs when there is not enough space for chunk allocation
 if we skip that huge hole, and find_free_dev_extent() eventually
 returns -ENOSPC.
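
The difference between the current behaviour and the proposed change can be modelled outside the kernel. The sketch below is a simplified user-space model, not the kernel code: struct pending_extent, usable_hole_old() and usable_hole_new() are hypothetical names, and a plain array stands in for the list of pending chunks. It assumes a single device and ignores everything except the overlap test.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one stripe of a pending (uncommitted) chunk. */
struct pending_extent {
    uint64_t physical; /* start offset on the device, in bytes */
    uint64_t length;   /* bytes reserved by the pending chunk */
};

/* Current behaviour as described above: any overlap with a pending
 * extent makes the caller treat the whole hole as unusable. */
static uint64_t usable_hole_old(const struct pending_extent *p, size_t n,
                                uint64_t start, uint64_t len)
{
    for (size_t i = 0; i < n; i++) {
        if (p[i].physical >= start + len ||
            p[i].physical + p[i].length <= start)
            continue; /* no overlap with this pending extent */
        return 0;     /* overlap: hole_size is forced to 0 */
    }
    return len;
}

/* Proposed behaviour: skip past each overlapping pending extent and
 * shrink the hole by that extent's length instead of discarding it. */
static uint64_t usable_hole_new(const struct pending_extent *p, size_t n,
                                uint64_t *start, uint64_t len)
{
    for (size_t i = 0; i < n; i++) {
        if (p[i].physical >= *start + len ||
            p[i].physical + p[i].length <= *start)
            continue;
        *start = p[i].physical + p[i].length;
        len = (len > p[i].length) ? len - p[i].length : 0;
    }
    return len;
}
```

With one 1 GiB pending extent at the start of a 1000 GiB hole, the old model reports no usable space, while the new model reports 999 GiB starting just past the pending extent.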

 The following patch makes it work for me, but I certainly may have
 missed some subtleties in how btrfs allocation works; so if something
 is incorrect here, I'd appreciate feedback.  If this is the proper way
 to go about fixing it, I can whip up a proper patch and post it to the
 list.

 diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
 index a73acf4..d056448 100644
 --- a/fs/btrfs/volumes.c
 +++ b/fs/btrfs/volumes.c
 @@ -1053,7 +1053,7 @@ out:

  static int contains_pending_extent(struct btrfs_trans_handle *trans,
                                     struct btrfs_device *device,
 -                                   u64 *start, u64 len)
 +                                   u64 *start, u64 *len)
  {
          struct extent_map *em;
          struct list_head *search_list = &trans->transaction->pending_chunks;
 @@ -1068,12 +1068,16 @@ again:
                  for (i = 0; i < map->num_stripes; i++) {
                          if (map->stripes[i].dev != device)
                                  continue;
 -                        if (map->stripes[i].physical >= *start + len ||
 +                        if (map->stripes[i].physical >= *start + *len ||
                              map->stripes[i].physical + em->orig_block_len <=
                              *start)
                                  continue;
                          *start = map->stripes[i].physical +
                                  em->orig_block_len;
 +                        if (*len > em->orig_block_len)
 +                                *len -= em->orig_block_len;
 +                        else
 +                                *len = 0;
                          ret = 1;
                  }
          }
 @@ -1191,10 +1195,9 @@ again:
                           * Have to check before we set max_hole_start, otherwise
                           * we could end up sending back this offset anyway.
                           */
 -                        if (contains_pending_extent(trans, device,
 -                                                    &search_start,
 -                                                    hole_size))
 -                                hole_size = 0;
 +                        contains_pending_extent(trans, device,
 +                                                &search_start,
 +                                                &hole_size);

                          if (hole_size > max_hole_size) {
                                  max_hole_start = search_start;
 @@ -1239,7 +1242,7 @@ next:
                  max_hole_size = hole_size;
          }

 -        if (contains_pending_extent(trans, device, &search_start, hole_size)) {
 +        if (contains_pending_extent(trans, device, &search_start, &hole_size)) {
                  btrfs_release_path(path);
                  goto again;
          }

 Thanks,
 -Justin
 --
 To unsubscribe from this list: send the line unsubscribe linux-btrfs in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html



-- 
Filipe David Manana,

Reasonable men adapt themselves to the world.
 Unreasonable men adapt the world to themselves.
 That's why all progress depends on unreasonable men.

Re: Btrfs hangs 3.19-10

2015-04-04 Thread Russell Coker
On Fri, 3 Apr 2015 05:14:12 AM Duncan wrote:
 Well, btrfs itself isn't really stable yet...  Stable series should be 
 stable at least to the extent that whatever you're using in them is, but 
 with btrfs itself not yet entirely stable... 

Also for stable operation you want both forward and backward compatibility.
You could make an Ext3 filesystem and expect that any random ancient Linux box 
you are likely to encounter can read it.  Even Ext4 has been supported for a 
long time and most systems you are likely to encounter won't have any problems 
with it.

I recently made a BTRFS filesystem on a Debian/Jessie system (kernel 3.16.7) 
with default options and discovered that Debian/Wheezy (kernel 3.2.65) can't 
read it.  I think that one criterion for stable in a filesystem is that
kernels from a couple of previous releases can mount it.  By that criterion
BTRFS won't be stable for use in Debian for about 4 years.

As an aside are there options to mkfs.btrfs that would make a filesystem 
mountable by kernel 3.2.65?  If so I'll file a Debian/Jessie bug report 
requesting that a specific mention be added to the man page.

-- 
My Main Blog http://etbe.coker.com.au/
My Documents Blog http://doc.coker.com.au/


Re: Btrfs hangs 3.19-10

2015-04-04 Thread Hugo Mills
On Sat, Apr 04, 2015 at 12:55:08PM +, Russell Coker wrote:
 On Fri, 3 Apr 2015 05:14:12 AM Duncan wrote:
  Well, btrfs itself isn't really stable yet...  Stable series should be 
  stable at least to the extent that whatever you're using in them is, but 
  with btrfs itself not yet entirely stable... 
 
 Also for stable operation you want both forward and backward compatibility.
 You could make an Ext3 filesystem and expect that any random ancient Linux box
 you are likely to encounter can read it.  Even Ext4 has been supported for a
 long time and most systems you are likely to encounter won't have any problems
 with it.
 
 I recently made a BTRFS filesystem on a Debian/Jessie system (kernel 3.16.7)
 with default options and discovered that Debian/Wheezy (kernel 3.2.65) can't
 read it.  I think that one criterion for stable in a filesystem is that
 kernels from a couple of previous releases can mount it.  By that criterion
 BTRFS won't be stable for use in Debian for about 4 years.
 
 As an aside are there options to mkfs.btrfs that would make a filesystem 
 mountable by kernel 3.2.65?  If so I'll file a Debian/Jessie bug report 
 requesting that a specific mention be added to the man page.

   Yes, there are. It's probably -O^extref, but if you can show the
dmesg output from the 3.2 kernel on the failed mount (so that it shows
what the actual failure was), we should be able to give you a more
precise answer.

   Hugo.

-- 
Hugo Mills | Welcome to Hollywood, a land just off the coast of
hugo@... carfax.org.uk | Planet Earth
http://carfax.org.uk/  |
PGP: 65E74AC0  |The Cat's Meow



Re: Btrfs hangs 3.19-10

2015-04-04 Thread Russell Coker
On Sun, 5 Apr 2015 03:16:21 AM Duncan wrote:
 Hugo Mills posted on Sat, 04 Apr 2015 13:00:47 + as excerpted:
  On Sat, Apr 04, 2015 at 12:55:08PM +, Russell Coker wrote:
  As an aside are there options to mkfs.btrfs that would make a
  filesystem mountable by kernel 3.2.65?  If so I'll file a Debian/Jessie
  bug report requesting that a specific mention be added to the man page.
  
  Yes, there are. It's probably -O^extref, but if you can show the
  dmesg output from the 3.2 kernel on the failed mount (so that it shows
  what the actual failure was), we should be able to give you a more
  precise answer.

[698190.987065] Btrfs loaded
[698190.92] device fsid 118e2c64-6ce1-4f21-85e2-2d6aea8f0fa5 devid 1 transid 426 /dev/sdf1
[698191.000981] btrfs: disk space caching is enabled
[698191.000986] BTRFS: couldn't mount because of unsupported optional features (60).
[698191.018176] btrfs: open_ctree failed
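
For reference, the value in parentheses can be read as a bitmask of incompat feature flags that the 3.2 kernel does not know. The sketch below decodes it under two assumptions: that the kernel prints the value in hexadecimal (so 60 means 0x60), and that the bit positions in the table are right. They are written from memory of the kernel's BTRFS_FEATURE_INCOMPAT_* definitions and should be checked against the headers. The table, the shortened names and decode_incompat() are illustrative only and are not part of btrfs-progs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

struct incompat_bit {
    uint64_t bit;
    const char *name;
};

/* Assumed bit assignments; verify against the kernel headers. */
static const struct incompat_bit incompat_bits[] = {
    { 1ULL << 0, "mixed_backref" },
    { 1ULL << 1, "default_subvol" },
    { 1ULL << 2, "mixed_groups" },
    { 1ULL << 3, "compress_lzo" },
    { 1ULL << 5, "big_metadata" },
    { 1ULL << 6, "extended_iref" },
    { 1ULL << 7, "raid56" },
    { 1ULL << 8, "skinny_metadata" },
    { 1ULL << 9, "no_holes" },
};

/* Write the space-separated names of all set bits into out and return out.
 * The output is always NUL-terminated and may be truncated if out is small. */
static char *decode_incompat(uint64_t flags, char *out, size_t outlen)
{
    size_t used = 0;

    if (outlen == 0)
        return out;
    out[0] = '\0';
    for (size_t i = 0; i < sizeof(incompat_bits) / sizeof(incompat_bits[0]); i++) {
        if (!(flags & incompat_bits[i].bit))
            continue;
        int n = snprintf(out + used, outlen - used, "%s%s",
                         used ? " " : "", incompat_bits[i].name);
        if (n < 0 || (size_t)n >= outlen - used)
            break; /* buffer full: stop with a terminated string */
        used += (size_t)n;
    }
    return out;
}
```

Under those assumptions, decoding 0x60 gives "big_metadata extended_iref". That would be consistent with the -O^extref guess, and it hints that a second feature (large metadata blocks) may also have to be avoided.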

 So I was thinking about this, and the several other earlier options where
 support wasn't added until kernel X, and had an idea...
 
 How easy and useful might it be to add to mkfs.btrfs appropriate option-
 group aliases such that if one knew the oldest kernel one was likely to
 deal with, all one would need to do for the mkfs would be to set for
 example, -O3.2, or even simply --3.2 (or maybe even --32), and have
 mkfs.btrfs automatically set/unset the appropriate options so it would
 just work with that kernel and anything newer?

That would be really useful.  Also it would be good if the code structure 
allowed adding extra aliases, so for Debian we could add an option -Owheezy.

-- 
My Main Blog http://etbe.coker.com.au/
My Documents Blog http://doc.coker.com.au/


Re: Btrfs hangs 3.19-10

2015-04-04 Thread Duncan
Hugo Mills posted on Sat, 04 Apr 2015 13:00:47 + as excerpted:

 On Sat, Apr 04, 2015 at 12:55:08PM +, Russell Coker wrote:
 
 As an aside are there options to mkfs.btrfs that would make a
 filesystem mountable by kernel 3.2.65?  If so I'll file a Debian/Jessie
 bug report requesting that a specific mention be added to the man page.
 
 Yes, there are. It's probably -O^extref, but if you can show the
 dmesg output from the 3.2 kernel on the failed mount (so that it shows
 what the actual failure was), we should be able to give you a more
 precise answer.

So I was thinking about this, and the several other earlier options where 
support wasn't added until kernel X, and had an idea...

How easy and useful might it be to add to mkfs.btrfs appropriate option-
group aliases such that if one knew the oldest kernel one was likely to 
deal with, all one would need to do for the mkfs would be to set for 
example, -O3.2, or even simply --3.2 (or maybe even --32), and have 
mkfs.btrfs automatically set/unset the appropriate options so it would 
just work with that kernel and anything newer?

I imagine that could be a very useful feature for some, and I can't
imagine it being too hard to set up the aliases since after all that
should be all that's necessary, so what's left is to decide if it'd
actually be useful enough to enough people to bother implementing and
documenting...
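
One way such aliases could work is a table of features and the first kernel that supports each, from which the list of features to turn off is derived. The following is only a sketch of that idea, not anything mkfs.btrfs does today: feature_table and features_to_disable() are hypothetical, the table is incomplete, and the kernel versions in it are approximate recollections that would need checking before use.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

struct feature_min {
    const char *name; /* name as passed to mkfs.btrfs -O */
    int major;        /* first kernel assumed to support it */
    int minor;
};

/* Assumed, illustrative values; not an authoritative list. */
static const struct feature_min feature_table[] = {
    { "extref", 3, 7 },
    { "skinny-metadata", 3, 10 },
    { "no-holes", 3, 14 },
};

/* Build a comma-separated "^feature" list of everything the given
 * kernel version is too old for. Returns out, always NUL-terminated. */
static char *features_to_disable(int major, int minor, char *out, size_t outlen)
{
    size_t used = 0;

    if (outlen == 0)
        return out;
    out[0] = '\0';
    for (size_t i = 0; i < sizeof(feature_table) / sizeof(feature_table[0]); i++) {
        const struct feature_min *f = &feature_table[i];
        int supported = major > f->major ||
                        (major == f->major && minor >= f->minor);
        if (supported)
            continue;
        int n = snprintf(out + used, outlen - used, "%s^%s",
                         used ? "," : "", f->name);
        if (n < 0 || (size_t)n >= outlen - used)
            break; /* buffer full */
        used += (size_t)n;
    }
    return out;
}
```

For an oldest kernel of 3.2 this table yields "^extref,^skinny-metadata,^no-holes", which could then be handed to -O. A distribution alias such as the suggested -Owheezy would be one more lookup that maps a name to a kernel version.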

-- 
Duncan - List replies preferred.   No HTML msgs.
Every nonfree program has a lord, a master --
and if you use the program, he is your master.  Richard Stallman



[PATCH 05/16] btrfs: use is_xxx_kiocb instead of filp->fl_flags

2015-04-04 Thread Dmitry Monakhov
Cc: linux-btrfs@vger.kernel.org
Signed-off-by: Dmitry Monakhov dmonak...@openvz.org
---
 fs/btrfs/file.c |4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index aee18f8..4dc3856 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -1747,7 +1747,7 @@ static ssize_t btrfs_file_write_iter(struct kiocb *iocb,
        mutex_lock(&inode->i_mutex);
 
        current->backing_dev_info = inode_to_bdi(inode);
-       err = generic_write_checks(file, &pos, &count, S_ISBLK(inode->i_mode));
+       err = generic_write_checks(iocb, &pos, &count, S_ISBLK(inode->i_mode));
        if (err) {
                mutex_unlock(&inode->i_mutex);
                goto out;
@@ -1800,7 +1800,7 @@ static ssize_t btrfs_file_write_iter(struct kiocb *iocb,
        if (sync)
                atomic_inc(&BTRFS_I(inode)->sync_writers);
 
-       if (file->f_flags & O_DIRECT) {
+       if (is_direct_kiocb(iocb)) {
                num_written = __btrfs_direct_write(iocb, from, pos);
        } else {
                num_written = __btrfs_buffered_write(file, from, pos);
-- 
1.7.1



kernel BUG at fs/btrfs/inode.c:3142

2015-04-04 Thread Ochi

Hi,

it seems I triggered a bug after deleting some (actually all)
subvolumes from a 2 TB backup volume (about 1.5 TB of data in around
20 subvolumes; btrfs-cleaner took quite a long time) and then running
"btrfs filesystem defrag ." within the volume once the cleaner seemed
to have finished. I rebooted (I had to reset because the shutdown
process didn't finish) and tried the defrag again, which immediately
triggered the same bug.


dmesg:

[38016.025970] [ cut here ]
[38016.025976] kernel BUG at fs/btrfs/inode.c:3142!
[38016.025978] invalid opcode:  [#1] PREEMPT SMP
[38016.025980] Modules linked in: ses enclosure uas usb_storage 
nvidia_uvm(PO) fuse xt_addrtype xt_conntrack ipt_MASQUERADE 
nf_nat_masquerade_ipv4 iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 
nf_nat_ipv4 nf_nat nf_conntrack iptable_filter ip_tables x_tables bridge 
stp llc cfg80211 rfkill snd_hda_codec_hdmi ext4 crc16 mbcache jbd2 
snd_hda_codec_realtek iTCO_wdt snd_hda_codec_generic iTCO_vendor_support 
gpio_ich coretemp mousedev nvidia(PO) ppdev mxm_wmi evdev psmouse 
kvm_intel serio_raw mac_hid kvm winbond_cir i2c_i801 lpc_ich rc_core 
led_class tpm_tis drm tpm parport_pc acpi_cpufreq snd_hda_intel parport 
wmi snd_hda_controller processor snd_hda_codec button snd_hwdep snd_pcm 
e1000e snd_timer snd soundcore i7core_edac shpchp ptp pps_core edac_core 
i5500_temp sch_fq_codel asc7621 hwmon i2c_core nfs lockd
[38016.026011]  grace sunrpc fscache btrfs xor raid6_pq xts gf128mul 
algif_skcipher af_alg dm_crypt dm_mod ata_generic pata_acpi hid_generic 
usbhid hid sr_mod cdrom sd_mod pata_marvell atkbd libps2 crc32c_intel 
ahci libahci firewire_ohci libata ehci_pci uhci_hcd firewire_core 
crc_itu_t ehci_hcd scsi_mod usbcore usb_common i8042 serio
[38016.026029] CPU: 1 PID: 8534 Comm: btrfs-cleaner Tainted: P IO   3.19.2-1-ARCH #1
[38016.026031] Hardware name:  /DX58SO, BIOS SOX5810J.86A.5599.2012.0529.2218 05/29/2012
[38016.026032] task: 8803206193e0 ti: 8800b49ec000 task.ti: 8800b49ec000
[38016.026034] RIP: 0010:[a035dea0]  [a035dea0] btrfs_orphan_add+0x1c0/0x1e0 [btrfs]
[38016.026049] RSP: 0018:8800b49efc38  EFLAGS: 00010286
[38016.026051] RAX: ffe4 RBX: 8800cb1b7000 RCX: 002d
[38016.026052] RDX: 0001 RSI: 0001 RDI: 8801f057e138
[38016.026053] RBP: 8800b49efc78 R08: 0001b9d0 R09: 88003251f3f0
[38016.026054] R10: 88032fc3c540 R11: ea0008d0c240 R12: 88001ab1bad0
[38016.026055] R13: 8800cacbef20 R14: 8800cb1b7458 R15: 0001
[38016.026057] FS:  () GS:88032fc2() knlGS:
[38016.026058] CS:  0010 DS:  ES:  CR0: 8005003b
[38016.026059] CR2: 7fcb1f50d090 CR3: 01811000 CR4: 07e0
[38016.026060] Stack:
[38016.026061]  8800b49efc78 a039f355 8801f057e000 880313981800
[38016.026063]  88003251f3f0 88001ab1bad0 88031d5eda00 880233fb7480
[38016.026065]  8800b49efd08 a0346c99 88003251f3f8 88003251f470

[38016.026067] Call Trace:
[38016.026078]  [a039f355] ? lookup_free_space_inode+0x45/0xf0 [btrfs]
[38016.026087]  [a0346c99] btrfs_remove_block_group+0x149/0x780 [btrfs]
[38016.026097]  [a03823db] btrfs_remove_chunk+0x6fb/0x7e0 [btrfs]
[38016.026105]  [a0347519] btrfs_delete_unused_bgs+0x249/0x270 [btrfs]
[38016.026114]  [a034eae4] cleaner_kthread+0x144/0x1a0 [btrfs]
[38016.026123]  [a034e9a0] ? btrfs_destroy_pinned_extent+0xe0/0xe0 [btrfs]
[38016.026128]  [81091748] kthread+0xd8/0xf0
[38016.026130]  [81091670] ? kthread_create_on_node+0x1c0/0x1c0
[38016.026133]  [81562758] ret_from_fork+0x58/0x90
[38016.026135]  [81091670] ? kthread_create_on_node+0x1c0/0x1c0
[38016.026136] Code: 60 04 00 00 e9 b0 fe ff ff 66 90 89 45 c8 f0 41 80 64 24 80 fd 4c 89 e7 e8 2e 14 fe ff 8b 45 c8 e9 1b ff ff ff 66 0f 1f 44 00 00 0f 0b b8 f4 ff ff ff e9 10 ff ff ff 4c 89 f7 45 31 f6 e8 99 40
[38016.026156] RIP  [a035dea0] btrfs_orphan_add+0x1c0/0x1e0 [btrfs]
[38016.026164]  RSP 8800b49efc38
[38016.026167] ---[ end trace d42bede17d45ec34 ]---


btrfs fi df:

Data, single: total=1.02TiB, used=437.50MiB
System, DUP: total=8.00MiB, used=128.00KiB
System, single: total=4.00MiB, used=0.00B
Metadata, DUP: total=7.00GiB, used=1.02MiB
Metadata, single: total=8.00MiB, used=0.00B
GlobalReserve, single: total=4.00MiB, used=3.87MiB


BTW, it's interesting that 437 MB of data are used since there are no 
files left on the volume.


Please let me know how I can help you to debug this.


Best regards,
Sebastian