subject:"No space left on device"

Re: "no space left on device" from tar on ppc64le

2019-06-19 Thread Jean-Denis Girard

Hi Rich,

On 18/06/2019 at 13:19, Chris Murphy wrote:
> On Tue, Jun 18, 2019 at 4:23 PM Rich Turner  wrote:
>>
>> tar: ./lib/modules/4.4.73-7-default/kernel/drivers/md/faulty.ko: Cannot 
>> open: No space left on device
> 
> If this really is a 4.4.73 based kernel, I expect the report is out of
> scope for this list. There have been 109 subsequent stable releases of
> the 4.4 kernel since. There are nearly 3000 commits between 4.4 and 5.1.

I would also recommend using a newer kernel. I'm the one who reported
the problem back in October, and I had to make new SD cards 2 weeks ago.
I used the exact same script on kernel 5.1.x and the problem did not
come up again, i.e. I was able to untar without the hack to slow down
the write speed.

Thanks to the Btrfs developers!

Best regards,
-- 
Jean-Denis Girard

SysNux   Systèmes   Linux   en   Polynésie  française
https://www.sysnux.pf/   Tél: +689 40.50.10.40 / GSM: +689 87.797.527

Re: "no space left on device" from tar on ppc64le

2019-06-18 Thread Chris Murphy

On Tue, Jun 18, 2019 at 4:23 PM Rich Turner  wrote:
>
> tar: ./lib/modules/4.4.73-7-default/kernel/drivers/md/faulty.ko: Cannot open: 
> No space left on device

If this really is a 4.4.73 based kernel, I expect the report is out of
scope for this list. There have been 109 subsequent stable releases of
the 4.4 kernel since. There are nearly 3000 commits between 4.4 and 5.1.

Ideally you'd retest with 5.2-rc5 since that's the current mainline
that this upstream development list is actively working on fixing bugs
in. But I'd suggest at the oldest, 4.19.52 or whatever most recent
version has Btrfs backports. If it does reproduce with a recent
kernel, I suggest remounting with option enospc_debug to include
additional information; and also include the output from

$ sudo ./btrfs-debugfs -b / mntpoint

The btrfs-debugfs script can be found in upstream btrfs-progs, I'm not
sure if it's typically packaged by distributions.
https://github.com/kdave/btrfs-progs
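
Before reproducing, it may help to confirm that the remount with
enospc_debug actually took effect. The following is a minimal Python
sketch, not part of btrfs-progs: it looks for the option in
/proc/mounts-style text. The helper names and the sample line are
illustrative assumptions.

```python
def mount_options(mounts_text, mountpoint):
    """Return the option list for mountpoint from /proc/mounts-style text.

    Each line has the form: device mountpoint fstype options dump pass.
    Returns None when the mount point is not listed.
    """
    found = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[1] == mountpoint:
            # Later entries shadow earlier ones, so keep the last match.
            found = fields[3].split(",")
    return found


def has_enospc_debug(mounts_text, mountpoint):
    """True when the mount point is listed with the enospc_debug option."""
    options = mount_options(mounts_text, mountpoint)
    return options is not None and "enospc_debug" in options


# Hypothetical sample line; on a real system read the text from /proc/mounts.
sample = (
    "/dev/mapper/system-turner_lv /turner btrfs "
    "rw,relatime,space_cache,enospc_debug 0 0\n"
)
print(has_enospc_debug(sample, "/turner"))
```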

> PRETTY_NAME="SUSE Linux Enterprise Server 12 SP3"

In the case you can't test with something recent, then I suggest
reporting it to this distribution's support. That's the way it's
supposed to work.

-- 
Chris Murphy

"no space left on device" from tar on ppc64le

2019-06-18 Thread Rich Turner

Hi List,

I am attempting to extract a tar archive into a btrfs filesystem and more often 
than not tar will fail with “No space left on device” error. I say “more often 
than not” because sometimes the extraction completes successfully even though I 
am following the same steps.

# mkfs.btrfs -f -s 65536 -n 65536 -d single -m dup /dev/system/turner_lv
# mount -v -o rw,relatime,space_cache /dev/system/turner_lv /turner
# cd /turner
# tar -xpf ~/turner.tar --selinux --xattrs --xattrs-include=*

Here is the error message for the latest attempt. Note that filename in the “no 
space” message from tar is not consistent in that it could fail on a different 
file on a separate attempt.

tar: ./lib/firmware/keyspan/usa18x.fw: Cannot open: No space left on device
tar: ./lib/firmware/nvidia/gm200/gr/sw_ctx.bin: Cannot open: No space left on device
tar: ./lib/modules/4.4.73-7-default/kernel/drivers/md/faulty.ko: Cannot open: No space left on device
tar: ./lib/modules/4.4.73-7-default/kernel/net/dns_resolver/dns_resolver.ko: Cannot open: No space left on device
tar: ./lib/modules/4.4.73-7-default/kernel/net/ipv4/ip_gre.ko;5cc1f748: Cannot open: No space left on device
tar: Exiting with failure status due to previous errors

The tar archive contains about 566M when extracted and the btrfs filesystem is 
5G in size.

A few other tidbits that I have noticed:
- this behavior is only so far found on ppc64le. I have performed the same 
steps on x86-64 without issue.
- when I slow down the write speed I have more successes 
(https://www.spinics.net/lists/linux-btrfs/msg83384.html)
- the btrfs filesystem is created on a LVM logical volume, whose volume group 
is using a device-mapper multipath device as the physical volume. note that I 
have replicated this same issue by removing LVM and multipath.
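
For readers who want to try the "slow down the write speed" workaround
mentioned in the list above, here is a rough Python sketch of a
rate-limited stream copy. It is an assumption about how such throttling
could be done, not the method used in the linked message; the function
name and parameters are hypothetical.

```python
import io
import time


def throttled_copy(src, dst, max_bytes_per_sec, chunk_size=64 * 1024,
                   sleep=time.sleep, clock=time.monotonic):
    """Copy src to dst (binary file objects) at no more than the given rate.

    After each chunk, sleep until the total copied so far is allowed by
    the rate limit. Returns the number of bytes copied.
    """
    start = clock()
    copied = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(chunk)
        copied += len(chunk)
        # Earliest time at which this many bytes may have been written.
        earliest = start + copied / max_bytes_per_sec
        delay = earliest - clock()
        if delay > 0:
            sleep(delay)
    return copied


# Small in-memory demonstration with a very high limit, so it does not wait.
source = io.BytesIO(b"x" * 200000)
sink = io.BytesIO()
print(throttled_copy(source, sink, max_bytes_per_sec=10**9))
```

In practice the destination could be the standard input of a
"tar -xpf -" process started in the target directory, so that the
archive is fed to tar at a bounded rate.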

# uname -a
Linux mpath6pwr8 4.4.73-7-default #1 SMP Fri Jul 21 13:26:40 UTC 2017 (6beeafd) ppc64le ppc64le ppc64le GNU/Linux

# cat /etc/os-release 
NAME="SLES"
VERSION="12-SP3"
VERSION_ID="12.3"
PRETTY_NAME="SUSE Linux Enterprise Server 12 SP3"
ID="sles"
ANSI_COLOR="0;32"
CPE_NAME="cpe:/o:suse:sles_sap:12:sp3"

# btrfs --version
btrfs-progs v4.5.3+20160729

# mkfs.btrfs --version
mkfs.btrfs, part of btrfs-progs v4.5.3+20160729

# tar --version
tar (GNU tar) 1.27.1

# btrfs fi show
Label: none  uuid: 5895b3f8-6d36-4d37-aabd-0cbdbca62144
        Total devices 1 FS bytes used 544.62MiB
        devid    1 size 5.00GiB used 1.32GiB path /dev/mapper/system-root

Label: none  uuid: 8e71ab0d-0ef7-4b21-af05-c792ad0ac28d
        Total devices 1 FS bytes used 9.76GiB
        devid    1 size 15.00GiB used 12.02GiB path /dev/mapper/system-usr_lv

Label: none  uuid: bf684abc-34db-4033-872f-c7cb46139263
        Total devices 1 FS bytes used 1.56MiB
        devid    1 size 5.00GiB used 536.00MiB path /dev/mapper/system-home_lv

Label: none  uuid: cb0e6369-4579-4145-9c0e-a94f0fdffcc5
        Total devices 1 FS bytes used 406.56MiB
        devid    1 size 2.00GiB used 852.75MiB path /dev/mapper/system-root_lv

Label: none  uuid: ba3219c3-f5f4-438c-86c6-80ac2fa2454d
        Total devices 1 FS bytes used 92.31MiB
        devid    1 size 5.00GiB used 1.02GiB path /dev/mapper/system-tmp_lv

Label: none  uuid: 43418ec3-4a96-4fe9-9646-f6d2e976f85d
        Total devices 1 FS bytes used 212.12MiB
        devid    1 size 5.00GiB used 1.02GiB path /dev/mapper/system-var_lv

Label: none  uuid: 8917ac61-6d5a-40c7-8605-7c48990baf8a
        Total devices 1 FS bytes used 19.19MiB
        devid    1 size 5.00GiB used 1.02GiB path /dev/mapper/system-opt_lv

Label: none  uuid: 4b47b636-785a-4498-af16-8496deee26ac
        Total devices 1 FS bytes used 543.75MiB
        devid    1 size 5.00GiB used 1.52GiB path /dev/mapper/system-turner_lv

# btrfs fi df /turner
Data, single: total=1.01GiB, used=538.69MiB
System, DUP: total=8.00MiB, used=64.00KiB
Metadata, DUP: total=256.00MiB, used=5.00MiB
GlobalReserve, single: total=16.00MiB, used=0.00B

# btrfs fi usage /turner
Overall:
    Device size:                   5.00GiB
    Device allocated:              1.52GiB
    Device unallocated:            3.48GiB
    Device missing:                  0.00B
    Used:                        548.81MiB
    Free (estimated):              3.96GiB      (min: 2.22GiB)
    Data ratio:                       1.00
    Metadata ratio:                   2.00
    Global reserve:               16.00MiB      (used: 0.00B)

Data,single: Size:1.01GiB, Used:538.69MiB
   /dev/mapper/system-turner_lv    1.01GiB

Metadata,DUP: Size:256.00MiB, Used:5.00MiB
   /dev/mapper/system-turner_lv  512.00MiB

System,DUP: Size:8.00MiB, Used:64.00KiB
   /dev/mapper/system-turner_lv   16.00MiB

Unallocated:
   /dev/mapper/system-turner_lv    3.48GiB
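
To make the numbers above easier to compare, the "Overall" fields of
"btrfs fi usage" can be converted to bytes. This is only a sketch under
the assumption that the output looks like the transcript in this
message; the regular expression and function name are not part of any
btrfs tool.

```python
import re

# Binary unit suffixes as printed by btrfs-progs in the output above.
UNIT_BYTES = {"B": 1, "KiB": 1024, "MiB": 1024**2, "GiB": 1024**3, "TiB": 1024**4}

# "Name:  <number><unit>" at the start of a line; ratio lines carry no
# unit and per-profile lines contain a comma in the name, so both are skipped.
FIELD = re.compile(r"\s*([A-Za-z][A-Za-z ()]*):\s*([\d.]+)(B|KiB|MiB|GiB|TiB)\b")


def parse_overall(usage_text):
    """Map each sized 'Overall' field to an integer number of bytes."""
    fields = {}
    for line in usage_text.splitlines():
        match = FIELD.match(line)
        if match:
            name, number, unit = match.groups()
            fields[name.strip()] = int(float(number) * UNIT_BYTES[unit])
    return fields


# Values copied from the /turner output in this message.
sample = """Overall:
    Device size:                   5.00GiB
    Device allocated:              1.52GiB
    Device unallocated:            3.48GiB
    Device missing:                  0.00B
    Used:                        548.81MiB
    Free (estimated):              3.96GiB      (min: 2.22GiB)
    Data ratio:                       1.00
    Global reserve:               16.00MiB      (used: 0.00B)
Data,single: Size:1.01GiB, Used:538.69MiB
"""
overall = parse_overall(sample)
print(overall["Device unallocated"] > overall["Used"])
```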





dmesg.out
Description: Binary data

btrfs error: cmds-check.c:4869: add_data_backref: Assertion `!back` failed. no space left on device

2019-03-10 Thread Leszek Dubiel




Hello!

I have a problem with a btrfs device.
It shows 355 GB of free space.
I have run a scrub on it.

It reports that there is no space left on the device.
Simple operations (mkdir, touch, find) are extremely slow.
Running btrfsck shows an add_data_backref error (see below).


Do you think I could run: btrfs check --repair ?


root@gamma:~# btrfsck -p /dev/sda1
Checking filesystem on /dev/sda1
UUID: 666a7089-d716-44ff-8081-56b969b58eff
cmds-check.c:4869: add_data_backref: Assertion `!back` failed.



root@gamma:~# df -h /dev/sda1
Filesystem  Size  Used Avail Use% Mounted on
/dev/sda1   1.9T  1.5T  355G  81% /tmp/rescue



root@gamma:~# btrfs --version
btrfs-progs v4.7.3

Re: btrfs filesystem failing with 'No space left on device' after 4 hours

2019-03-06 Thread Chris Murphy

On Wed, Mar 6, 2019 at 7:29 AM Michael Firth  wrote:
>
> Hi,
>
> I have a BTRFS filesystem that seems to have become very ill. After 4 hours 
> of being mounted, it will fail with every write attempt saying "No space left 
> on device".

What program/process is trying to write to the volume? Even "touch
~/hello" fails with this message? What happens if you strace the
command? If you strace and output to a file, make sure you direct the
file to a file system OTHER than root and the file system you've had
this problem with (redirect it to a USB stick or to /tmp), but if the
problem happens even with touch, or writing some zeros with dd, you
should be able to strace just to stdout and copy and paste the results
into a file.
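
As a complement to strace (which remains the way to see the exact
failing system call), a tiny probe can show which errno a plain file
creation returns on the affected file system. This is a hedged Python
sketch; the function name is hypothetical and the demonstration runs
against a temporary directory rather than the problem volume.

```python
import errno
import os
import tempfile


def probe_create(path):
    """Try to create and remove an empty file; report the errno name.

    Returns "OK" on success, otherwise a symbolic name such as "ENOSPC".
    """
    try:
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    except OSError as exc:
        return errno.errorcode.get(exc.errno, str(exc.errno))
    os.close(fd)
    os.unlink(path)
    return "OK"


# Demonstration on a scratch directory; point it at the affected mount
# (for example a file under /home) to test the real case.
scratch = tempfile.mkdtemp()
print(probe_create(os.path.join(scratch, "hello")))
```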

>
> Unmounting and remounting the filesystem clears the issue for another 4 hours
>
> From every check I have done, no messages are logged at the point of the 
> failure to "dmesg" or any system log.

The lack of a message doesn't sound like the usual enospc. If the file
system runs out of space, even if it's wrong and it's a bug, Btrfs
will log a warning or info message in dmesg.

>
> The output of the three (why on earth are there three?) disk space commands 
> on the filesystem are:

The three come from different eras, and the legacy 'btrfs filesystem
df' and 'btrfs filesystem show' commands were kept around for script
support I assume. I personally find it ridiculous, but also I know
developers are busy with other important issues. I think there should
be one command for humans and when meaningful improvements are made,
the old way is flat out removed. And there should be a switch to
output machine readable raw spew for scripts and such. But whatever,
not up to me!

>
> From my understanding of the output in this, there don't seem to be any areas 
> that are even close to full. And if it was a genuine full condition, even due 
> to running out of metadata or something, then I wouldn't expect unmounting 
> and remounting to clear the issue.

Yep, it's suspicious that it is kernel related. But there's a lot that
happens at umount (you can strace umount and see some of it!) that's
not just implicating Btrfs as a possible cause. It could be something
else. The lack of Btrfs errors strongly suggests it's not directly
related to Btrfs. The program is getting some idea that there's no
space left so that needs to be tracked down why it thinks this. Btrfs
doesn't think that because when it does, it reports it to dmesg.

I don't know anything about Debian and its default kernel console
message logging level, but sometimes I see for some distros that
'dmesg -n 7' needs to be issued before reproducing a problem. Maybe in
your case a hint is just not being retained by dmesg? If you're
running systemd an alternative is to get kernel messages from
'journalctl -k' for the current boot; or also 'journalctl -k
--no-pager' or output with monotonic time 'journalctl -k -o
short-monotonic > journal.txt' and so on.

> Is there any known issue that may cause this behaviour?

This list is upstream development. You'll find a similar notion on the
ext4 and XFS lists: distro kernels are supported by distros, not
upstream. It's a function of almost pure luck if you get the attention
of a developer who knows something about a 2 year old kernel. And 4.9
is more than 2 years old from a Btrfs development perspective, closer
to three years. Current development is happening on kernel 5.2; where
bug fixes are happening for 5.1. For practical purposes it's ordinary
to be asked to use a mainline or stable (5.0 or 4.20) kernel to see if
the problem still happens. If it does, then you've likely discovered
an unfixed bug. If it doesn't happen, you've discovered a fixed bug.
For various reasons it can be difficult to backport all bug fixes so
maybe it's in a 4.19 Debian built kernel, you'd have to test it. But
the way to limit the testing as much as possible is go straight to
5.0. If it happens there you've almost certainly found a bug that's
not yet fixed.
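
To make the version reasoning concrete, here is a small Python sketch
that compares a kernel release string against a minimum version. The
threshold of 4.19 is taken from the suggestions in this thread and is
otherwise an arbitrary assumption; the helper names are hypothetical.

```python
import re


def kernel_version(release):
    """Parse 'uname -r' style strings such as '4.9.144-3.1' into a tuple."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if not match:
        raise ValueError("unrecognised kernel release: %r" % release)
    # A missing patch level (as in '5.2-rc5') is treated as 0.
    return tuple(int(part) if part else 0 for part in match.groups())


def is_older_than(release, minimum=(4, 19)):
    """True when the release is older than the minimum (major, minor)."""
    return kernel_version(release)[:len(minimum)] < tuple(minimum)


for release in ("4.9.144-3.1", "4.19.52", "5.2-rc5"):
    print(release, is_older_than(release))
```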

But even before changing kernels in your case I suggest stracing the
simplest program that reproduces the error, like even touch or cp. We
need to have some idea why the program thinks there's no more space
left while the kernel isn't reporting it.

>
> Is there any way to get more debugging from what is going on?

dmesg -n 7
and reproduce with strace plus some simple command; the simpler the reproduction, the better.

>
> My initial thought was that it might be related to snapshots, as I was 
> generating regular snapshots (for a 'previous versions' feature), and many of 
> the failures were just after a snapshot was created. However, I have now 
> disabled the snapshot creation and

Re: btrfs filesystem failing with 'No space left on device' after 4 hours

2019-03-06 Thread Patrik Lundquist

On Wed, 6 Mar 2019 at 16:53, Michael Firth  wrote:
>
> Is there any way to get more debugging from what is going on?

Try mounting with enospc_debug.

> The system is running stock Debian 9 (Stretch). It was running their latest 
> 4.9 kernel (Rev 4.9.144-3.1) when the problem first occurred. After two 
> instances of the problem, I rolled back to their previous kernel (Rev 
> 4.9.130-2), which the system had been running error free for several months, 
> but the failures have continued.
>

4.9 is pretty old for Btrfs. I'd use the backported kernel which
currently is 4.19.

btrfs filesystem failing with 'No space left on device' after 4 hours

2019-03-06 Thread Michael Firth

Hi,

I have a BTRFS filesystem that seems to have become very ill. After 4 hours of 
being mounted, it will fail with every write attempt saying "No space left on 
device".

Unmounting and remounting the filesystem clears the issue for another 4 hours

From every check I have done, no messages are logged at the point of the 
failure to "dmesg" or any system log.

I'm over 99% sure there is not a space issue on the filesystem - it has over 
100GB free, and I've run a full "balance" which has not changed the behaviour. 
A "scrub" on the filesystem hasn't reported any issues.

The output of the three (why on earth are there three?) disk space commands on 
the filesystem are:

--
$ sudo btrfs filesystem usage /home
Overall:
    Device size:                 450.00GiB
    Device allocated:            319.06GiB
    Device unallocated:          130.94GiB
    Device missing:                  0.00B
    Used:                        305.95GiB
    Free (estimated):            131.77GiB      (min: 66.30GiB)
    Data ratio:                       1.00
    Metadata ratio:                   2.00
    Global reserve:              512.00MiB      (used: 0.00B)

Data,single: Size:299.00GiB, Used:298.16GiB
   /dev/mapper/VG-HomeVol        299.00GiB

Metadata,DUP: Size:10.00GiB, Used:3.89GiB
   /dev/mapper/VG-HomeVol         20.00GiB

System,DUP: Size:32.00MiB, Used:80.00KiB
   /dev/mapper/VG-HomeVol         64.00MiB

Unallocated:
   /dev/mapper/VG-HomeVol        130.94GiB

$ sudo btrfs filesystem df /home
Data, single: total=299.00GiB, used=298.16GiB
System, DUP: total=32.00MiB, used=80.00KiB
Metadata, DUP: total=10.00GiB, used=3.89GiB
GlobalReserve, single: total=512.00MiB, used=0.00B

$ sudo btrfs filesystem show /home
Label: none  uuid: 550e6e7c-d669-4128-9b0d-b61ef4f3f1c1
        Total devices 1 FS bytes used 302.07GiB
        devid    1 size 450.00GiB used 319.06GiB path /dev/mapper/VG-HomeVol
--

From my understanding of the output in this, there don't seem to be any areas 
that are even close to full. And if it was a genuine full condition, even due 
to running out of metadata or something, then I wouldn't expect unmounting and 
remounting to clear the issue.
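
One way to check the "close to full" question per block group type is
to compute used/total from the "btrfs filesystem df" lines. The Python
sketch below assumes the line format shown in this message; the names
are hypothetical. Note that it measures how full the already allocated
chunks are, which is separate from the unallocated space reported by
"btrfs filesystem usage".

```python
import re

UNIT_BYTES = {"B": 1, "KiB": 1024, "MiB": 1024**2, "GiB": 1024**3, "TiB": 1024**4}

# Example line: "Data, single: total=299.00GiB, used=298.16GiB"
DF_LINE = re.compile(
    r"(\w+), (\w+): total=([\d.]+)([KMGT]iB|B), used=([\d.]+)([KMGT]iB|B)"
)


def parse_fi_df(text):
    """Return {type: {profile, total, used, fill}} with sizes in bytes."""
    rows = {}
    for line in text.splitlines():
        match = DF_LINE.match(line.strip())
        if not match:
            continue
        kind, profile, total, total_unit, used, used_unit = match.groups()
        total_bytes = float(total) * UNIT_BYTES[total_unit]
        used_bytes = float(used) * UNIT_BYTES[used_unit]
        rows[kind] = {
            "profile": profile,
            "total": total_bytes,
            "used": used_bytes,
            # Fraction of the allocated chunks that is in use.
            "fill": used_bytes / total_bytes if total_bytes else 0.0,
        }
    return rows


# Lines copied from the /home output in this message.
sample = """Data, single: total=299.00GiB, used=298.16GiB
System, DUP: total=32.00MiB, used=80.00KiB
Metadata, DUP: total=10.00GiB, used=3.89GiB
GlobalReserve, single: total=512.00MiB, used=0.00B
"""
for kind, row in parse_fi_df(sample).items():
    print(kind, row["profile"], round(row["fill"] * 100, 1))
```

With these figures the allocated data chunks are almost entirely in
use, while metadata chunks are well under half full; the usage output
above still lists 130.94GiB as unallocated.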

Is there any known issue that may cause this behaviour?

Is there any way to get more debugging from what is going on?

My initial thought was that it might be related to snapshots, as I was 
generating regular snapshots (for a 'previous versions' feature), and many of 
the failures were just after a snapshot was created. However, I have now 
disabled the snapshot creation and I am still seeing regular failures.

The system is running stock Debian 9 (Stretch). It was running their latest 4.9 
kernel (Rev 4.9.144-3.1) when the problem first occurred. After two instances 
of the problem, I rolled back to their previous kernel (Rev 4.9.130-2), which 
the system had been running error free for several months, but the failures 
have continued.

I'm happy to get any other information that would be needed to debug this, if 
someone can point me to how to do it.

Currently my faith in BTRFS is approaching zero (it was knocked after a data 
loss in October, but had grown again). It has a lot of nice features, but 
(despite comments on the Wiki) really does not seem stable, at least not in 
Debian.

Thanks

Michael

Re: a new kind of "No space left on device" error

2018-10-29 Thread Henk Slager

On Mon, Oct 29, 2018 at 7:20 AM Dave  wrote:
>
> This is one I have not seen before.
>
> When running a simple, well-tested and well-used script that makes
> backups using btrfs send | receive, I got these two errors:
>
> At subvol snapshot
> ERROR: rename o131621-1091-0 ->
> usr/lib/node_modules/node-gyp/gyp/pylib/gyp/MSVSVersion.py failed: No
> space left on device
>
> At subvol snapshot
> ERROR: rename o259-1095-0 -> myser/.bash_profile failed: No space left on 
> device
>
> I have run this script many, many times and never seen errors like
> this. There is plenty of room on the device:
>
> # btrfs fi df /mnt/
> Data, single: total=18.01GiB, used=16.53GiB
> System, DUP: total=8.00MiB, used=16.00KiB
> Metadata, DUP: total=1.00GiB, used=145.12MiB
> GlobalReserve, single: total=24.53MiB, used=0.00B
>
> # df -h /mnt/
> Filesystem  Size  Used Avail Use% Mounted on
> /dev/sdc2        54G   17G   36G  33% /mnt
>
> The send | receive appears to have mostly succeeded because the final
> expected size is about 17G, as shown above. That will use only about
> 1/3 of the available disk space, when completed. I don't see any
> reason for "No space left on device" errors, but maybe somebody here
> can spot a problem I am missing.
What kernel and progs versions?
What are the mount options for the filesystem?
Can you tell something about the device /dev/sdc2 (SSD, HDD, SD-card,
USBstick, LANstorage, etc)?
Could it be that your ENOSPC has the same cause as this:
https://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg81554.html

a new kind of "No space left on device" error

2018-10-28 Thread Dave

This is one I have not seen before.

When running a simple, well-tested and well-used script that makes
backups using btrfs send | receive, I got these two errors:

At subvol snapshot
ERROR: rename o131621-1091-0 ->
usr/lib/node_modules/node-gyp/gyp/pylib/gyp/MSVSVersion.py failed: No
space left on device

At subvol snapshot
ERROR: rename o259-1095-0 -> myser/.bash_profile failed: No space left on device

I have run this script many, many times and never seen errors like
this. There is plenty of room on the device:

# btrfs fi df /mnt/
Data, single: total=18.01GiB, used=16.53GiB
System, DUP: total=8.00MiB, used=16.00KiB
Metadata, DUP: total=1.00GiB, used=145.12MiB
GlobalReserve, single: total=24.53MiB, used=0.00B

# df -h /mnt/
Filesystem  Size  Used Avail Use% Mounted on
/dev/sdc2        54G   17G   36G  33% /mnt

The send | receive appears to have mostly succeeded because the final
expected size is about 17G, as shown above. That will use only about
1/3 of the available disk space, when completed. I don't see any
reason for "No space left on device" errors, but maybe somebody here
can spot a problem I am missing.

Re: btrfs send receive: No space left on device

2018-10-17 Thread Henk Slager

On Wed, Oct 17, 2018 at 10:29 AM Libor Klepáč  wrote:
>
> Hello,
> I have a new 32GB SSD in my Intel NUC and installed Debian 9 on it, using
> btrfs as the rootfs.
> Then I created subvolumes /system and /home and moved the system there.
>
> The system was installed using kernel 4.9.x and the filesystem was created
> using btrfs-progs 4.7.x.
> Details follow:
> main filesystem
>
> # btrfs filesystem usage /mnt/btrfs/ssd/
> Overall:
> Device size:  29.08GiB
> Device allocated:  4.28GiB
> Device unallocated:   24.80GiB
> Device missing:  0.00B
> Used:  2.54GiB
> Free (estimated): 26.32GiB  (min: 26.32GiB)
> Data ratio:   1.00
> Metadata ratio:   1.00
> Global reserve:   16.00MiB  (used: 0.00B)
>
> Data,single: Size:4.00GiB, Used:2.48GiB
>/dev/sda3   4.00GiB
>
> Metadata,single: Size:256.00MiB, Used:61.05MiB
>/dev/sda3 256.00MiB
>
> System,single: Size:32.00MiB, Used:16.00KiB
>/dev/sda3  32.00MiB
>
> Unallocated:
>/dev/sda3  24.80GiB
>
> #/etc/fstab
> UUID=d801da52-813d-49da-bdda-87fc6363e0ac   /mnt/btrfs/ssd  btrfs   noatime,space_cache=v2,compress=lzo,commit=300,subvolid=5 0   0
> UUID=d801da52-813d-49da-bdda-87fc6363e0ac   /   btrfs   noatime,space_cache=v2,compress=lzo,commit=300,subvol=/system 0   0
> UUID=d801da52-813d-49da-bdda-87fc6363e0ac   /home   btrfs   noatime,space_cache=v2,compress=lzo,commit=300,subvol=/home 0   0
>
> -
> Then i installed kernel from backports:
> 4.18.0-0.bpo.1-amd64 #1 SMP Debian 4.18.6-1~bpo9+1
> and btrfs-progs 4.17
>
> For backups, I created a 16GB iSCSI device on my QNAP, created a
> filesystem on it, and mounted it like this:
> LABEL=backup   /mnt/btrfs/backup   btrfs   noatime,space_cache=v2,compress=lzo,subvolid=5,nofail,noauto 0   0
>
> After send-receive operation on /home subvolume, usage looks like this:
>
> # btrfs filesystem usage /mnt/btrfs/backup/
> Overall:
> Device size:  16.00GiB
> Device allocated:  1.27GiB
> Device unallocated:   14.73GiB
> Device missing:  0.00B
> Used:                844.18MiB
> Free (estimated): 14.92GiB  (min: 14.92GiB)
> Data ratio:   1.00
> Metadata ratio:   1.00
> Global reserve:   16.00MiB  (used: 0.00B)
>
> Data,single: Size:1.01GiB, Used:833.36MiB
>/dev/sdb   1.01GiB
>
> Metadata,single: Size:264.00MiB, Used:10.80MiB
>/dev/sdb  264.00MiB
>
> System,single: Size:4.00MiB, Used:16.00KiB
>/dev/sdb   4.00MiB
>
> Unallocated:
>/dev/sdb   14.73GiB
>
>
> Problem is, during send-receive of system subvolume, it runs out of space:
>
> # btrbk run /mnt/btrfs/ssd/system/ -v
> btrbk command line client, version 0.26.1  (Wed Oct 17 09:51:20 2018)
> Using configuration: /etc/btrbk/btrbk.conf
> Using transaction log: /var/log/btrbk.log
> Creating subvolume snapshot for: /mnt/btrfs/ssd/system
> [snapshot] source: /mnt/btrfs/ssd/system
> [snapshot] target: /mnt/btrfs/ssd/_snapshots/system.20181017T0951
> Checking for missing backups of subvolume "/mnt/btrfs/ssd/system" in 
> "/mnt/btrfs/backup/"
> Creating subvolume backup (send-receive) for: 
> /mnt/btrfs/ssd/_snapshots/system.20181016T2034
> No common parent subvolume present, creating full backup...
> [send/receive] source: /mnt/btrfs/ssd/_snapshots/system.20181016T2034
> [send/receive] target: /mnt/btrfs/backup/system.20181016T2034
> mbuffer: error: outputThread: error writing to  at offset 0x4b5bd000: 
> Broken pipe
> mbuffer: warning: error during output to : Broken pipe
> WARNING: [send/receive] (send=/mnt/btrfs/ssd/_snapshots/system.20181016T2034, 
> receive=/mnt/btrfs/backup) At subvol 
> /mnt/btrfs/ssd/_snapshots/system.20181016T2034
> WARNING: [send/receive] (send=/mnt/btrfs/ssd/_snapshots/system.20181016T2034, 
> receive=/mnt/btrfs/backup) At subvol system.20181016T2034
> ERROR: rename o77417-5519-0 -> 
> lib/modules/4.18.0-0.bpo.1-amd64/kernel/drivers/watchdog/pcwd_pci.ko failed: 
> No space left on device
> ERROR: Failed to send/receive btrfs subvolume: 
> /mnt/btrfs/ssd/_snapshots/system.20181016T2034  -> /mnt/btrfs/backup
> [delete] options: commit-after
> [delete] target: /mnt/btrfs/backup/syste

btrfs send receive: No space left on device

2018-10-17 Thread Libor Klepáč

Hello,
I have a new 32GB SSD in my Intel NUC and installed Debian 9 on it, using btrfs
as the rootfs.
Then I created subvolumes /system and /home and moved the system there.

The system was installed using kernel 4.9.x and the filesystem was created using
btrfs-progs 4.7.x.
Details follow:
main filesystem

# btrfs filesystem usage /mnt/btrfs/ssd/
Overall:
Device size:  29.08GiB
Device allocated:  4.28GiB
Device unallocated:   24.80GiB
Device missing:  0.00B
Used:  2.54GiB
Free (estimated): 26.32GiB  (min: 26.32GiB)
Data ratio:   1.00
Metadata ratio:   1.00
Global reserve:   16.00MiB  (used: 0.00B)

Data,single: Size:4.00GiB, Used:2.48GiB
   /dev/sda3   4.00GiB

Metadata,single: Size:256.00MiB, Used:61.05MiB
   /dev/sda3 256.00MiB

System,single: Size:32.00MiB, Used:16.00KiB
   /dev/sda3  32.00MiB

Unallocated:
   /dev/sda3  24.80GiB

#/etc/fstab
UUID=d801da52-813d-49da-bdda-87fc6363e0ac   /mnt/btrfs/ssd  btrfs   noatime,space_cache=v2,compress=lzo,commit=300,subvolid=5 0   0
UUID=d801da52-813d-49da-bdda-87fc6363e0ac   /   btrfs   noatime,space_cache=v2,compress=lzo,commit=300,subvol=/system 0   0
UUID=d801da52-813d-49da-bdda-87fc6363e0ac   /home   btrfs   noatime,space_cache=v2,compress=lzo,commit=300,subvol=/home 0   0

-
Then I installed a kernel from backports:
4.18.0-0.bpo.1-amd64 #1 SMP Debian 4.18.6-1~bpo9+1
and btrfs-progs 4.17

For backups, I created a 16GB iSCSI device on my QNAP, created a filesystem on
it, and mounted it like this:
LABEL=backup   /mnt/btrfs/backup   btrfs   noatime,space_cache=v2,compress=lzo,subvolid=5,nofail,noauto 0   0

After send-receive operation on /home subvolume, usage looks like this:

# btrfs filesystem usage /mnt/btrfs/backup/
Overall:
Device size:  16.00GiB
Device allocated:  1.27GiB
Device unallocated:   14.73GiB
Device missing:  0.00B
Used:                844.18MiB
Free (estimated): 14.92GiB  (min: 14.92GiB)
Data ratio:   1.00
Metadata ratio:   1.00
Global reserve:   16.00MiB  (used: 0.00B)

Data,single: Size:1.01GiB, Used:833.36MiB
   /dev/sdb   1.01GiB

Metadata,single: Size:264.00MiB, Used:10.80MiB
   /dev/sdb  264.00MiB

System,single: Size:4.00MiB, Used:16.00KiB
   /dev/sdb   4.00MiB

Unallocated:
   /dev/sdb   14.73GiB


The problem is that during send-receive of the system subvolume, it runs out of space:

# btrbk run /mnt/btrfs/ssd/system/ -v  
btrbk command line client, version 0.26.1  (Wed Oct 17 09:51:20 2018)
Using configuration: /etc/btrbk/btrbk.conf
Using transaction log: /var/log/btrbk.log
Creating subvolume snapshot for: /mnt/btrfs/ssd/system
[snapshot] source: /mnt/btrfs/ssd/system
[snapshot] target: /mnt/btrfs/ssd/_snapshots/system.20181017T0951
Checking for missing backups of subvolume "/mnt/btrfs/ssd/system" in 
"/mnt/btrfs/backup/"
Creating subvolume backup (send-receive) for: 
/mnt/btrfs/ssd/_snapshots/system.20181016T2034
No common parent subvolume present, creating full backup...
[send/receive] source: /mnt/btrfs/ssd/_snapshots/system.20181016T2034
[send/receive] target: /mnt/btrfs/backup/system.20181016T2034
mbuffer: error: outputThread: error writing to  at offset 0x4b5bd000: 
Broken pipe
mbuffer: warning: error during output to : Broken pipe
WARNING: [send/receive] (send=/mnt/btrfs/ssd/_snapshots/system.20181016T2034, 
receive=/mnt/btrfs/backup) At subvol 
/mnt/btrfs/ssd/_snapshots/system.20181016T2034
WARNING: [send/receive] (send=/mnt/btrfs/ssd/_snapshots/system.20181016T2034, 
receive=/mnt/btrfs/backup) At subvol system.20181016T2034
ERROR: rename o77417-5519-0 -> 
lib/modules/4.18.0-0.bpo.1-amd64/kernel/drivers/watchdog/pcwd_pci.ko failed: No 
space left on device
ERROR: Failed to send/receive btrfs subvolume: 
/mnt/btrfs/ssd/_snapshots/system.20181016T2034  -> /mnt/btrfs/backup
[delete] options: commit-after
[delete] target: /mnt/btrfs/backup/system.20181016T2034
WARNING: Deleted partially received (garbled) subvolume: 
/mnt/btrfs/backup/system.20181016T2034
ERROR: Error while resuming backups, aborting
Created 0/2 missing backups
WARNING: Skipping cleanup of snapshots for subvolume "/mnt/btrfs/ssd/system", 
as at least one target aborted earlier
Completed within: 116s  (Wed Oct 17 09:53:16 2018)

Backup Summary (btrbk command line client, version 0.26.1)

Date:   Wed Oct 17 09:51:20 2018
Config: /etc/btrbk/btrbk.conf
Filter: subvolume=/mnt/btrfs/ssd/system

Legend:
===  up-to-date subvolume (source snapshot

Re: how to run balance successfully (No space left on device)?

2017-11-10 Thread Martin Raiber

On 10.11.2017 22:51 Chris Murphy wrote:
>> Combined with evidence that "No space left on device" during balance can
>> lead to various file corruption (we've witnessed it with MySQL), I'd say
>> btrfs balance is a dangerous operation and decision to use it should be
>> considered very thoroughly.
> I've never heard of this. Balance is COW at the chunk level. The old
> chunk is not dereferenced until it's written in the new location
> correctly. Corruption during balance shouldn't be possible so if you
> have a reproducer, the devs need to know about it.

I didn't say anything before, because I could not reproduce the problem.
I had (I guess) a corruption caused by balance as well. It had ENOSPC in
spite of enough free space (4.9.x), which made me balance it regularly
to keep unallocated space around. Corruption occurred probably after or
shortly before power reset during a balance -- no skip_balance specified
so it continued directly after mount -- data was moved relatively fast
after the mount operation (copy file then delete old file). I think
space_cache=v2 was active at the time. I'm of course not completely sure
it was btrfs's fault and as usual not all the conditions may be
relevant. It could instead be an upper layer error (Hyper-V storage),
a memory issue or an application error.

Regards,
Martin Raiber

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: how to run balance successfully (No space left on device)?

2017-11-10 Thread Chris Murphy

On Fri, Nov 10, 2017 at 12:42 AM, Tomasz Chmielewski  wrote:
> On 2017-11-07 23:49, E V wrote:
>
>> Hmm, I used to see these phantom no space issues quite a bit on older
>> 4.x kernels, and haven't seen them since switching to space_cache=v2.
>> So it could be space cache corruption. You might try either clearing
>> your space cache, or mounting with nospace_cache, or try converting to
>> space_cache=v2 after reading up on its caveats.
>
>
> We have space_cache=v2.

I have no idea if it's related or not, as this isn't a default mount
option and is still under testing.

>
> Unfortunately yet one more system running 4.14-rc8 with "No space left"
> during balance:
>
>
> [68443.535664] BTRFS info (device sdb3): relocating block group 591771009024
> flags data|raid1
> [68463.203330] BTRFS info (device sdb3): found 8578 extents
> [68492.238676] BTRFS info (device sdb3): found 8559 extents
> [68500.751792] BTRFS info (device sdb3): 1 enospc errors during balance
>
>
> # btrfs balance start /var/lib/lxd
> WARNING:
>
> Full balance without filters requested. This operation is very
> intense and takes potentially very long. It is recommended to
> use the balance filters to narrow down the balanced data.
> Use 'btrfs balance start --full-balance' option to skip this
> warning. The operation will start in 10 seconds.
>     Use Ctrl-C to stop it.
> 10 9 8 7 6 5 4 3 2 1
> Starting balance without any filters.
> ERROR: error during balancing '/var/lib/lxd': No space left on device
> There may be more info in syslog - try dmesg | tail

OK I wonder if this is a bug in the user space tool's error handling?
Because what you have in kernel messages is BTRFS info. It is not a
warning or an error. I interpret this as enospc error happened but it
recovered, so it was not an unhandled error condition, and definitely
non-fatal. But the user space tool is reporting a bogus "No space left
on device". It's plainly bogus because you have a lot of space on the
device, including unallocated space. So the user space tool needs to
either ignore this type of informational enospc or it needs a
different message to make it clear this is not a fatal error and was
properly handled.
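
The distinction drawn here (an "info" level message rather than a
warning or error) can be checked mechanically. The following Python
sketch assumes kernel log lines in the form quoted above; the function
names are hypothetical and the sample lines are copied from this
thread.

```python
import re

# Example: "[68500.751792] BTRFS info (device sdb3): 1 enospc errors during balance"
BTRFS_LINE = re.compile(
    r"\[\s*(\d+\.\d+)\]\s+BTRFS (\w+) \(device ([^)]+)\): (.*)"
)


def parse_btrfs_messages(log_text):
    """Extract timestamp, level, device and message from BTRFS log lines."""
    entries = []
    for line in log_text.splitlines():
        match = BTRFS_LINE.search(line)
        if match:
            stamp, level, device, message = match.groups()
            entries.append({
                "time": float(stamp),
                "level": level,
                "device": device,
                "message": message,
            })
    return entries


def enospc_messages(log_text):
    """Return only the entries whose message mentions enospc."""
    return [e for e in parse_btrfs_messages(log_text)
            if "enospc" in e["message"].lower()]


sample = """[68463.203330] BTRFS info (device sdb3): found 8578 extents
[68492.238676] BTRFS info (device sdb3): found 8559 extents
[68500.751792] BTRFS info (device sdb3): 1 enospc errors during balance
"""
for entry in enospc_messages(sample):
    print(entry["level"], entry["device"], entry["message"])
```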

Do you get any additional information when using enospc_debug mount
option and reproduce this problem?

> Unallocated:
>/dev/sda3 112.00GiB
>/dev/sdb3 112.00GiB

Metric shittons of space. The error is certainly bogus.

> Combined with evidence that "No space left on device" during balance can
> lead to various file corruption (we've witnessed it with MySQL), I'd say
> btrfs balance is a dangerous operation and decision to use it should be
> considered very thoroughly.

I've never heard of this. Balance is COW at the chunk level. The old
chunk is not dereferenced until it's written in the new location
correctly. Corruption during balance shouldn't be possible so if you
have a reproducer, the devs need to know about it.

-- 
Chris Murphy
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: how to run balance successfully (No space left on device)?

2017-11-09 Thread Tomasz Chmielewski


On 2017-11-07 23:49, E V wrote:


Hmm, I used to see these phantom no space issues quite a bit on older
4.x kernels, and haven't seen them since switching to space_cache=v2.
So it could be space cache corruption. You might try either clearing
your space cache, or mounting with nospace_cache, or try converting to
space_cache=v2 after reading up on its caveats.


We have space_cache=v2.

Unfortunately yet one more system running 4.14-rc8 with "No space left" 
during balance:



[68443.535664] BTRFS info (device sdb3): relocating block group 
591771009024 flags data|raid1

[68463.203330] BTRFS info (device sdb3): found 8578 extents
[68492.238676] BTRFS info (device sdb3): found 8559 extents
[68500.751792] BTRFS info (device sdb3): 1 enospc errors during balance


# btrfs balance start /var/lib/lxd
WARNING:

Full balance without filters requested. This operation is very
intense and takes potentially very long. It is recommended to
use the balance filters to narrow down the balanced data.
Use 'btrfs balance start --full-balance' option to skip this
warning. The operation will start in 10 seconds.
Use Ctrl-C to stop it.
10 9 8 7 6 5 4 3 2 1
Starting balance without any filters.
ERROR: error during balancing '/var/lib/lxd': No space left on device
There may be more info in syslog - try dmesg | tail


# btrfs fi usage /var/lib/lxd
Overall:
    Device size:             846.26GiB
    Device allocated:        622.27GiB
    Device unallocated:      223.99GiB
    Device missing:              0.00B
    Used:                    606.40GiB
    Free (estimated):        116.68GiB  (min: 116.68GiB)
    Data ratio:                   2.00
    Metadata ratio:               2.00
    Global reserve:          512.00MiB  (used: 0.00B)

Data,RAID1: Size:306.00GiB, Used:301.31GiB
   /dev/sda3 306.00GiB
   /dev/sdb3 306.00GiB

Metadata,RAID1: Size:5.10GiB, Used:1.89GiB
   /dev/sda3   5.10GiB
   /dev/sdb3   5.10GiB

System,RAID1: Size:32.00MiB, Used:80.00KiB
   /dev/sda3  32.00MiB
   /dev/sdb3  32.00MiB

Unallocated:
   /dev/sda3 112.00GiB
   /dev/sdb3 112.00GiB
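
The "Free (estimated)" figure in the output above can be reproduced with simple arithmetic. The sketch below assumes a model in which free space is the slack inside existing data chunks plus the unallocated raw space divided by the data ratio; that model is an assumption for illustration, although it agrees with the reported 116.68GiB to within rounding. The variable names are made up for this example.

```python
# Figures copied from the `btrfs fi usage /var/lib/lxd` output above (GiB).
device_unallocated = 223.99   # raw space not yet assigned to any chunk
data_size = 306.00            # logical size of the Data,RAID1 chunks
data_used = 301.31            # logical bytes used inside those chunks
data_ratio = 2.00             # RAID1 stores every byte twice

# Assumed model: free = slack inside existing data chunks
#                     + unallocated raw space divided by the data ratio.
free_in_chunks = data_size - data_used
free_from_unallocated = device_unallocated / data_ratio
free_estimated = free_in_chunks + free_from_unallocated

print(f"slack in data chunks:   {free_in_chunks:.2f} GiB")
print(f"usable from unallocated: {free_from_unallocated:.2f} GiB")
print(f"estimated free:          {free_estimated:.2f} GiB")
```

Most of the estimate comes from unallocated space rather than from slack in existing chunks, which is why the balance failure looks inconsistent with the usage numbers.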


# btrfs fi show /var/lib/lxd
Label: 'btrfs'  uuid: 6340f5de-f635-4d09-bbb2-1e03b1e1b160
    Total devices 2 FS bytes used 303.20GiB
    devid    1 size 423.13GiB used 311.13GiB path /dev/sda3
    devid    2 size 423.13GiB used 311.13GiB path /dev/sdb3


# btrfs fi df /var/lib/lxd
Data, RAID1: total=306.00GiB, used=301.32GiB
System, RAID1: total=32.00MiB, used=80.00KiB
Metadata, RAID1: total=5.10GiB, used=1.89GiB
GlobalReserve, single: total=512.00MiB, used=0.00B



So far out of all systems which were giving us "No space left on device" 
with 4.13.x, all but one are still giving us "No space left on device" 
during balance with 4.14-rc7 and later.
We've seen it on a mix of servers with SSD or HDD disks, with 
filesystems ranging from 0.5 TB to 20 TB, and use % from 30% to 90%.


Combined with evidence that "No space left on device" during balance can 
lead to various file corruption (we've witnessed it with MySQL), I'd say 
btrfs balance is a dangerous operation and decision to use it should be 
considered very thoroughly.



Shouldn't "Balance" be marked as "mostly OK" or "Unstable" here? Giving 
it "OK" status is misleading.


https://btrfs.wiki.kernel.org/index.php/Status


Tomasz Chmielewski
https://lxadm.com

Re: Error During Balancing - No Space Left on Device

2017-11-07 Thread Ben Hooper

For the archives…

I replaced the 2x2TB drives with 8TB drives and now I see the correct amount of 
space and the balancing process is working without error so far.

# btrfs replace start 2 /dev/sdaa /data

(replace old 2TB /dev/sdj with new 8TB drive /dev/sdaa)

# btrfs replace start 3 /dev/sdab /data

(replace old 2TB /dev/sdk with new 8TB drive /dev/sdab)

Then I physically removed the old drives, rebooted and resized the disks:

# btrfs filesystem resize 2:max /data
# btrfs filesystem resize 3:max /data
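
The replace-then-resize sequence above can be written down as a small command builder. This is only a sketch: the function name `replace_and_resize_commands` is hypothetical, it prints the commands rather than running them, and it assumes each replace has finished before the matching resize is issued.

```python
import shlex

def replace_and_resize_commands(mountpoint, replacements):
    """Build (but do not run) the btrfs commands for swapping devices.

    `replacements` maps a btrfs devid to the path of its new, larger device.
    """
    commands = []
    # First swap every old device for its replacement.
    for devid, new_device in replacements.items():
        commands.append(
            ["btrfs", "replace", "start", str(devid), new_device, mountpoint]
        )
    # Then grow each devid to the full size of its new device.
    for devid in replacements:
        commands.append(
            ["btrfs", "filesystem", "resize", f"{devid}:max", mountpoint]
        )
    return commands

cmds = replace_and_resize_commands("/data", {2: "/dev/sdaa", 3: "/dev/sdab"})
for argv in cmds:
    print(shlex.join(argv))
```

Without the resize step the filesystem keeps using only the old device size on each replaced devid, so the extra capacity would not show up.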

# df -h | grep data
/dev/sdk  73T   60T   12T  85% /data

# uname -a
Linux nas 4.14.0-rc7+ #1 SMP Thu Nov 2 00:35:35 HKT 2017 x86_64 x86_64 x86_64 
GNU/Linux

# btrfs filesystem show /data
Label: 'data'  uuid: 3742056f-7ff0-4ce1-9131-ab2cfd7b8736
    Total devices 24 FS bytes used 59.87TiB
    devid    2 size 7.28TiB used 1.82TiB path /dev/sdk
    devid    3 size 7.28TiB used 1.80TiB path /dev/sdj
    devid    5 size 3.64TiB used 3.64TiB path /dev/sdo
    devid    7 size 2.73TiB used 2.73TiB path /dev/sdr
    devid   11 size 3.64TiB used 3.64TiB path /dev/sdm
    devid   12 size 3.64TiB used 3.64TiB path /dev/sdn
    devid   13 size 3.64TiB used 3.64TiB path /dev/sdp
    devid   14 size 3.64TiB used 3.64TiB path /dev/sdq
    devid   15 size 3.64TiB used 3.64TiB path /dev/sdw
    devid   16 size 3.64TiB used 3.64TiB path /dev/sdx
    devid   17 size 7.28TiB used 7.28TiB path /dev/sdv
    devid   18 size 7.28TiB used 7.28TiB path /dev/sdu
    devid   19 size 7.28TiB used 7.28TiB path /dev/sds
    devid   20 size 7.28TiB used 7.28TiB path /dev/sdt
    devid   21 size 7.28TiB used 7.28TiB path /dev/sdc
    devid   22 size 7.28TiB used 7.28TiB path /dev/sdd
    devid   23 size 7.28TiB used 7.27TiB path /dev/sdf
    devid   24 size 7.28TiB used 7.27TiB path /dev/sde
    devid   25 size 7.28TiB used 7.26TiB path /dev/sdl
    devid   26 size 7.28TiB used 7.26TiB path /dev/sdh
    devid   27 size 7.28TiB used 7.26TiB path /dev/sdi
    devid   28 size 7.28TiB used 6.95TiB path /dev/sda
    devid   29 size 7.28TiB used 690.25GiB path /dev/sdg
    devid   30 size 7.28TiB used 690.25GiB path /dev/sdb

> On 2 Nov 2017, at 11:56 pm, Ben Hooper  wrote:
> 
> Duncan, 
> 
> Thanks for the reply (apologies, I am not subscribed to the list). 
> 
> I compiled 4.14.0-rc7 and tried balance again with increasing amounts of 
> dusage but it errored out again at dusage=60 with enospc. Will try 
> downgrading to 4.9.x and try again.
> 
> # uname -a
> Linux nas 4.14.0-rc7+ #1 SMP Thu Nov 2 00:35:35 HKT 2017 x86_64 x86_64 x86_64 
> GNU/Linux
> 
> # dmesg | grep enospc
> [ 6989.220434] BTRFS info (device sdm): 8 enospc errors during balance
> 
> Cheers,
> 
> Ben
> 
> 
>> On 2 Nov 2017, at 12:18 am, Ben Hooper  wrote:
>> 
>> Hello,
>> 
>> I am trying to upgrade capacity on my btrfs filesystem by replacing smaller 
>> disks with larger ones. I added 2x8TB drives to the existing RAID10 but am 
>> not seeing the expected increase in space and am experiencing enospc errors 
>> during balance. This array has been extended several times but this is the 
>> first time I have seen any issues.
>> 
>> Looking at the list archives, it seems that some others have had similar 
>> problems. Has anyone found a solution or any recommendations?
>> 
>> Thanks,
>> 
>> Ben
>> 
>> Cannot remove device, no space left on device
>> https://marc.info/?l=linux-btrfs&m=150684414519356&w=2
>> 
>> how to run balance successfully (No space left on device)?
>> https://marc.info/?l=linux-btrfs&m=150566058009527&w=2
>> 
>> 
>> # btrfs balance start -v -dusage=0 /data
>> Dumping filters: flags 0x1, state 0x0, force is off
>> DATA (flags 0x2): balancing, usage=0
>> ERROR: error during balancing '/data': No space left on device
>> There may be more info in syslog - try dmesg | tail
>> 
>> # dmesg | tail
>> [49518.949915] BTRFS info (device sdg): relocating block group 
>> 241419042029568 flags data|raid10
>> [49519.899061] BTRFS info (device sdg): relocating block group 
>> 241419041767424 flags data|raid10
>> [49521.448691] BTRFS info (device sdg): relocating block group 
>> 241419041505280 flags data|raid10
>> [49522.136725] BTRFS info (device sdg): relocating block group 
>> 241419041243136 flags data|raid10
>> [49522.877266] BTRFS info (device sdg): relocating block group 
>> 241419040980992 flags data|raid10
>> [49524.702461] BTRFS info (device sdg): relocating block group 
>> 241419040718848 flags data|raid10
>> [49525.068713] BTRFS info (device sdg): relocating block group 
>

Re: how to run balance successfully (No space left on device)?

2017-11-06 Thread Tomasz Chmielewski


On 2017-10-31 23:18, Tomasz Chmielewski wrote:

On 2017-09-18 17:20, Tomasz Chmielewski wrote:

# df -h /var/lib/lxd

FWIW, standard (aka util-linux) df is effectively useless in a situation
such as this, as it really doesn't give you the information you need (it
can say you have lots of space available, but if btrfs has all of it
allocated into chunks, even if the chunks have space in them still, there
can be problems).


I see here on RAID-1 that "df -h" shows pretty much the same amount of
free space as "btrfs fi show":

- "df -h" shows 105G free
- "btrfs fi show" says: Free (estimated): 104.28GiB (min: 104.28GiB)




But chances are pretty good that once you get that patch integrated,
whether by integrating it yourself to what you have currently, or by
trying 4.14-rc1 or waiting until it hits release or stable, that bug will
have been squashed! =:^)


OK, will wait for 4.14.


So I've tried to run balance with 4.14-rc6.


I've also tried with 4.14-rc7 on a server which was failing with "no 
space left" - unfortunately, it's still failing:



# time btrfs balance start /srv
WARNING:

Full balance without filters requested. This operation is very
intense and takes potentially very long. It is recommended to
use the balance filters to narrow down the scope of balance.
Use 'btrfs balance start --full-balance' option to skip this
warning. The operation will start in 10 seconds.
Use Ctrl-C to stop it.
10 9 8 7 6 5 4 3 2 1
Starting balance without any filters.
ERROR: error during balancing '/srv': No space left on device
There may be more info in syslog - try dmesg | tail

real    8731m13.424s
user    0m0.000s
sys     560m36.363s



# dmesg -c
(...)
[546228.496902] BTRFS info (device sda4): relocating block group 
297455845376 flags data|raid1

[546251.393541] BTRFS info (device sda4): found 107799 extents
[546512.346360] BTRFS info (device sda4): found 107799 extents
[546529.407077] BTRFS info (device sda4): relocating block group 
296382103552 flags metadata|raid1

[546692.465746] BTRFS info (device sda4): found 35202 extents
[546733.294172] BTRFS info (device sda4): found 2586 extents
[546738.487556] BTRFS info (device sda4): relocating block group 
295308361728 flags data|raid1

[546770.474409] BTRFS info (device sda4): found 140906 extents
[547037.744023] BTRFS info (device sda4): found 140906 extents
[547065.840993] BTRFS info (device sda4): 117 enospc errors during 
balance



# btrfs fi df /srv
Data, RAID1: total=2.46TiB, used=2.35TiB
System, RAID1: total=32.00MiB, used=416.00KiB
Metadata, RAID1: total=19.00GiB, used=12.92GiB
GlobalReserve, single: total=512.00MiB, used=0.00B


# btrfs fi show /srv
Label: 'btrfs'  uuid: 105b2e0c-8af2-45ee-b4c8-14ff0a3ca899
    Total devices 2 FS bytes used 2.36TiB
    devid    1 size 2.63TiB used 2.48TiB path /dev/sda4
    devid    2 size 2.63TiB used 2.48TiB path /dev/sdb4


# btrfs fi usage /srv
Overall:
    Device size:               5.25TiB
    Device allocated:          4.96TiB
    Device unallocated:      302.00GiB
    Device missing:              0.00B
    Used:                      4.72TiB
    Free (estimated):        268.66GiB  (min: 268.66GiB)
    Data ratio:                   2.00
    Metadata ratio:               2.00
    Global reserve:          512.00MiB  (used: 0.00B)

Data,RAID1: Size:2.46TiB, Used:2.35TiB
   /dev/sda4   2.46TiB
   /dev/sdb4   2.46TiB

Metadata,RAID1: Size:19.00GiB, Used:12.92GiB
   /dev/sda4  19.00GiB
   /dev/sdb4  19.00GiB

System,RAID1: Size:32.00MiB, Used:416.00KiB
   /dev/sda4  32.00MiB
   /dev/sdb4  32.00MiB

Unallocated:
   /dev/sda4 151.00GiB
   /dev/sdb4 151.00GiB


Tomasz Chmielewski
https://lxadm.com

Re: Error During Balancing - No Space Left on Device

2017-11-02 Thread Ben Hooper

Duncan, 

Thanks for the reply (apologies, I am not subscribed to the list). 

I compiled 4.14.0-rc7 and tried balance again with increasing amounts of dusage 
but it errored out again at dusage=60 with enospc. Will try downgrading to 
4.9.x and try again.
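
A stepped balance of the kind described above can be sketched as a list of commands with rising `-dusage` thresholds. The function name `stepped_balance_commands` and the particular threshold values are assumptions for illustration; the only figure taken from this message is that the run failed at dusage=60. The sketch prints the commands and does not execute anything.

```python
import shlex

def stepped_balance_commands(mountpoint, thresholds=(0, 5, 10, 20, 40, 60)):
    """Return one `btrfs balance start -dusage=N` command per threshold.

    Thresholds are de-duplicated and sorted so that the emptiest data
    chunks are relocated first.
    """
    return [
        ["btrfs", "balance", "start", "-v", f"-dusage={pct}", mountpoint]
        for pct in sorted(set(thresholds))
    ]

steps = stepped_balance_commands("/data")
for argv in steps:
    print(shlex.join(argv))
```

Starting with low thresholds is meant to free whole chunks cheaply before the more expensive passes run.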

# uname -a
Linux nas 4.14.0-rc7+ #1 SMP Thu Nov 2 00:35:35 HKT 2017 x86_64 x86_64 x86_64 
GNU/Linux
 
# dmesg | grep enospc
[ 6989.220434] BTRFS info (device sdm): 8 enospc errors during balance

Cheers,

Ben


> On 2 Nov 2017, at 12:18 am, Ben Hooper  wrote:
> 
> Hello,
> 
> I am trying to upgrade capacity on my btrfs filesystem by replacing smaller 
> disks with larger ones. I added 2x8TB drives to the existing RAID10 but am 
> not seeing the expected increase in space and am experiencing enospc errors 
> during balance. This array has been extended several times but this is the 
> first time I have seen any issues.
> 
> Looking at the list archives, it seems that some others have had similar 
> problems. Has anyone found a solution or any recommendations?
> 
> Thanks,
> 
> Ben
> 
> Cannot remove device, no space left on device
> https://marc.info/?l=linux-btrfs&m=150684414519356&w=2
> 
> how to run balance successfully (No space left on device)?
> https://marc.info/?l=linux-btrfs&m=150566058009527&w=2
> 
> 
> # btrfs balance start -v -dusage=0 /data
> Dumping filters: flags 0x1, state 0x0, force is off
>  DATA (flags 0x2): balancing, usage=0
> ERROR: error during balancing '/data': No space left on device
> There may be more info in syslog - try dmesg | tail
> 
> # dmesg | tail
> [49518.949915] BTRFS info (device sdg): relocating block group 
> 241419042029568 flags data|raid10
> [49519.899061] BTRFS info (device sdg): relocating block group 
> 241419041767424 flags data|raid10
> [49521.448691] BTRFS info (device sdg): relocating block group 
> 241419041505280 flags data|raid10
> [49522.136725] BTRFS info (device sdg): relocating block group 
> 241419041243136 flags data|raid10
> [49522.877266] BTRFS info (device sdg): relocating block group 
> 241419040980992 flags data|raid10
> [49524.702461] BTRFS info (device sdg): relocating block group 
> 241419040718848 flags data|raid10
> [49525.068713] BTRFS info (device sdg): relocating block group 
> 241419040456704 flags data|raid10
> [49525.656543] BTRFS info (device sdg): relocating block group 
> 241419040194560 flags data|raid10
> [49549.168836] BTRFS info (device sdg): relocating block group 
> 241419039932416 flags data|raid10
> [49622.578101] BTRFS info (device sdg): 1 enospc errors during balance
> 
> 
> # btrfs --version
> btrfs-progs v4.13.3
> 
> # uname -a
> Linux nas 4.13.9-1.el7.elrepo.x86_64 #1 SMP Sun Oct 22 10:02:34 EDT 2017 
> x86_64 x86_64 x86_64 GNU/Linux
> 
> # btrfs filesystem df /data
> Data, RAID10: total=59.79TiB, used=59.77TiB
> System, RAID10: total=176.00MiB, used=133.80MiB
> Metadata, RAID10: total=97.69GiB, used=96.47GiB
> GlobalReserve, single: total=512.00MiB, used=16.00KiB
> 
> # df -h | grep data
> /dev/sdj  67T   60T  269G 100% /data
> 
> 
> # btrfs filesystem show /data
> Label: 'data'  uuid: 3742056f-7ff0-4ce1-9131-ab2cfd7b8736
>     Total devices 24 FS bytes used 59.87TiB
>     devid    2 size 1.82TiB used 1.82TiB path /dev/sdj
>     devid    3 size 1.82TiB used 1.70TiB path /dev/sdk
>     devid    5 size 3.64TiB used 3.64TiB path /dev/sdo
>     devid    7 size 2.73TiB used 2.73TiB path /dev/sdr
>     devid   11 size 3.64TiB used 3.64TiB path /dev/sdm
>     devid   12 size 3.64TiB used 3.64TiB path /dev/sdn
>     devid   13 size 3.64TiB used 3.64TiB path /dev/sdp
>     devid   14 size 3.64TiB used 3.64TiB path /dev/sdq
>     devid   15 size 3.64TiB used 3.64TiB path /dev/sdw
>     devid   16 size 3.64TiB used 3.64TiB path /dev/sdx
>     devid   17 size 7.28TiB used 7.28TiB path /dev/sdv
>     devid   18 size 7.28TiB used 7.28TiB path /dev/sdu
>     devid   19 size 7.28TiB used 7.28TiB path /dev/sds
>     devid   20 size 7.28TiB used 7.28TiB path /dev/sdt
>     devid   21 size 7.28TiB used 7.28TiB path /dev/sdc
>     devid   22 size 7.28TiB used 7.28TiB path /dev/sdd
>     devid   23 size 7.28TiB used 7.28TiB path /dev/sdf
>     devid   24 size 7.28TiB used 7.27TiB path /dev/sde
>     devid   25 size 7.28TiB used 7.28TiB path /dev/sdl
>     devid   26 size 7.28TiB used 7.28TiB path /dev/sdh
>     devid   27 size 7.28TiB used 7.28TiB path /dev/sdi
>     devid   28 size 7.28TiB used 6.86TiB path /dev/sda
>     devid   29 size 7.28TiB used 586.88GiB path /dev/sdg
>     devid   30 size 7.28TiB used 586.88GiB path /dev/sdb
> 
> 
> # btrfs device us

Re: Error During Balancing - No Space Left on Device

2017-11-01 Thread Duncan

Ben Hooper posted on Wed, 01 Nov 2017 16:18:25 +0000 as excerpted:

> Hello,
> 
> I am trying to upgrade capacity on my btrfs filesystem by replacing
> smaller disks with larger ones. I added 2x8TB drives to the existing
> RAID10 but am not seeing the expected increase in space and am
> experiencing enospc errors during balance. This array has been extended
> several times but this is the first time I have seen any issues.
> 
> Looking at the list archives, it seems that some others have had similar
> problems. Has anyone found a solution or any recommendations?


> # btrfs balance start -v -dusage=0 /data
> Dumping filters: flags 0x1, state 0x0, force is off
>   DATA (flags 0x2): balancing, usage=0
> ERROR: error during balancing '/data': No space left on device
> There may be more info in syslog - try dmesg | tail

 
> # uname -a
> Linux nas 4.13.9-1.el7.elrepo.x86_64 #1 SMP
> Sun Oct 22 10:02:34 EDT 2017 x86_64 x86_64 x86_64 GNU/Linux

There's a known bug with kernel 4.13 with balance on TB+ sized 
filesystems.  A reserve-space calculation goes haywire and attempts to 
reserve orders of magnitude more space than it actually needs, and given 
that PB-sized storage isn't particularly common yet, more space than it 
actually has, as well.  (Not that PB would fix it, the problem seems to 
be one of scale, hundred-GB sized filesystems don't seem to be as badly 
affected, so PB-sized filesystems may actually make the bug worse and 
it'd think it needed EB-sized!)

Try waiting for 4.14 if it's not urgent, or try the latest 4.14-rc or 
downgrade to, say the latest LTS series 4.9.x kernel, and try the balance 
again.  In theory 4.13 stable series should get the fix as well, but in 
practice, not being an LTS, as late in the 4.14 cycle as it is already, 
I'm not sure whether the fix will make it to 4.13 before it goes 
unsupported, or not.
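
To make the advice above easier to apply across several machines, here is a minimal sketch that checks whether a `uname -r` string belongs to a given kernel series. The helper names are hypothetical, the default "affected" set reflects only the claim made in this message (4.13) and is not an authoritative list, and "4.9.60" is an invented example release string.

```python
import re

def kernel_series(release):
    """Extract (major, minor) from a `uname -r` style release string."""
    m = re.match(r"(\d+)\.(\d+)", release)
    if not m:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return int(m.group(1)), int(m.group(2))

def in_affected_series(release, affected=((4, 13),)):
    """True if the release belongs to a series listed in `affected`.

    The default mirrors only what is said in this thread about 4.13.
    """
    return kernel_series(release) in affected

# Release string taken from the report quoted above.
print(in_affected_series("4.13.9-1.el7.elrepo.x86_64"))
# Hypothetical 4.9.x LTS release string.
print(in_affected_series("4.9.60"))
```

A check like this says nothing about whether a particular distribution kernel carries a backported fix.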

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman


Error During Balancing - No Space Left on Device

2017-11-01 Thread Ben Hooper

Hello,

I am trying to upgrade capacity on my btrfs filesystem by replacing smaller 
disks with larger ones. I added 2x8TB drives to the existing RAID10 but am not 
seeing the expected increase in space and am experiencing enospc errors during 
balance. This array has been extended several times but this is the first time 
I have seen any issues.

Looking at the list archives, it seems that some others have had similar 
problems. Has anyone found a solution or any recommendations?

Thanks,

Ben

Cannot remove device, no space left on device
https://marc.info/?l=linux-btrfs&m=150684414519356&w=2

how to run balance successfully (No space left on device)?
https://marc.info/?l=linux-btrfs&m=150566058009527&w=2


# btrfs balance start -v -dusage=0 /data
Dumping filters: flags 0x1, state 0x0, force is off
  DATA (flags 0x2): balancing, usage=0
ERROR: error during balancing '/data': No space left on device
There may be more info in syslog - try dmesg | tail

# dmesg | tail
[49518.949915] BTRFS info (device sdg): relocating block group 241419042029568 
flags data|raid10
[49519.899061] BTRFS info (device sdg): relocating block group 241419041767424 
flags data|raid10
[49521.448691] BTRFS info (device sdg): relocating block group 241419041505280 
flags data|raid10
[49522.136725] BTRFS info (device sdg): relocating block group 241419041243136 
flags data|raid10
[49522.877266] BTRFS info (device sdg): relocating block group 241419040980992 
flags data|raid10
[49524.702461] BTRFS info (device sdg): relocating block group 241419040718848 
flags data|raid10
[49525.068713] BTRFS info (device sdg): relocating block group 241419040456704 
flags data|raid10
[49525.656543] BTRFS info (device sdg): relocating block group 241419040194560 
flags data|raid10
[49549.168836] BTRFS info (device sdg): relocating block group 241419039932416 
flags data|raid10
[49622.578101] BTRFS info (device sdg): 1 enospc errors during balance


# btrfs --version
btrfs-progs v4.13.3

# uname -a
Linux nas 4.13.9-1.el7.elrepo.x86_64 #1 SMP Sun Oct 22 10:02:34 EDT 2017 x86_64 
x86_64 x86_64 GNU/Linux

# btrfs filesystem df /data
Data, RAID10: total=59.79TiB, used=59.77TiB
System, RAID10: total=176.00MiB, used=133.80MiB
Metadata, RAID10: total=97.69GiB, used=96.47GiB
GlobalReserve, single: total=512.00MiB, used=16.00KiB

# df -h | grep data
/dev/sdj  67T   60T  269G 100% /data


# btrfs filesystem show /data
Label: 'data'  uuid: 3742056f-7ff0-4ce1-9131-ab2cfd7b8736
    Total devices 24 FS bytes used 59.87TiB
    devid    2 size 1.82TiB used 1.82TiB path /dev/sdj
    devid    3 size 1.82TiB used 1.70TiB path /dev/sdk
    devid    5 size 3.64TiB used 3.64TiB path /dev/sdo
    devid    7 size 2.73TiB used 2.73TiB path /dev/sdr
    devid   11 size 3.64TiB used 3.64TiB path /dev/sdm
    devid   12 size 3.64TiB used 3.64TiB path /dev/sdn
    devid   13 size 3.64TiB used 3.64TiB path /dev/sdp
    devid   14 size 3.64TiB used 3.64TiB path /dev/sdq
    devid   15 size 3.64TiB used 3.64TiB path /dev/sdw
    devid   16 size 3.64TiB used 3.64TiB path /dev/sdx
    devid   17 size 7.28TiB used 7.28TiB path /dev/sdv
    devid   18 size 7.28TiB used 7.28TiB path /dev/sdu
    devid   19 size 7.28TiB used 7.28TiB path /dev/sds
    devid   20 size 7.28TiB used 7.28TiB path /dev/sdt
    devid   21 size 7.28TiB used 7.28TiB path /dev/sdc
    devid   22 size 7.28TiB used 7.28TiB path /dev/sdd
    devid   23 size 7.28TiB used 7.28TiB path /dev/sdf
    devid   24 size 7.28TiB used 7.27TiB path /dev/sde
    devid   25 size 7.28TiB used 7.28TiB path /dev/sdl
    devid   26 size 7.28TiB used 7.28TiB path /dev/sdh
    devid   27 size 7.28TiB used 7.28TiB path /dev/sdi
    devid   28 size 7.28TiB used 6.86TiB path /dev/sda
    devid   29 size 7.28TiB used 586.88GiB path /dev/sdg
    devid   30 size 7.28TiB used 586.88GiB path /dev/sdb
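
The `btrfs filesystem show` listing above can be reduced to "which devices still have unallocated space". The sketch below is an illustration only: the parser, its names and the 1 GiB cut-off are assumptions, and the sample is a subset of the listing. Since RAID10 chunks are striped across several devices, the number of devices with room matters more than the total, although nothing here proves that this is the cause of the failure in this report.

```python
import re

UNITS = {"B": 1, "KiB": 1024, "MiB": 1024**2, "GiB": 1024**3, "TiB": 1024**4}

def to_bytes(text):
    """Convert a size such as '7.28TiB' or '586.88GiB' to bytes."""
    m = re.fullmatch(r"([\d.]+)(B|KiB|MiB|GiB|TiB)", text)
    if not m:
        raise ValueError(f"cannot parse size: {text!r}")
    return float(m.group(1)) * UNITS[m.group(2)]

DEVID_LINE = re.compile(
    r"devid\s*(\d+)\s+size\s+(\S+)\s+used\s+(\S+)\s+path\s+(\S+)"
)

def unallocated_per_device(show_output):
    """Map device path -> unallocated bytes (size minus used, floored at 0)."""
    result = {}
    for m in DEVID_LINE.finditer(show_output):
        _, size, used, path = m.groups()
        result[path] = max(0.0, to_bytes(size) - to_bytes(used))
    return result

# Subset of the listing above; the omitted devices all show size == used.
sample = """
devid    2 size 1.82TiB used 1.82TiB path /dev/sdj
devid    3 size 1.82TiB used 1.70TiB path /dev/sdk
devid   17 size 7.28TiB used 7.28TiB path /dev/sdv
devid   24 size 7.28TiB used 7.27TiB path /dev/sde
devid   28 size 7.28TiB used 6.86TiB path /dev/sda
devid   29 size 7.28TiB used 586.88GiB path /dev/sdg
devid   30 size 7.28TiB used 586.88GiB path /dev/sdb
"""

free = unallocated_per_device(sample)
# Devices with at least 1 GiB unallocated (arbitrary cut-off for the sketch).
roomy = sorted(p for p, b in free.items() if b >= 1024**3)
print(roomy)
```

The sizes in the listing are rounded to two decimals, so a device shown with size equal to used may still have a small amount of unallocated space that this sketch cannot see.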


# btrfs device usage /data
/dev/sda, ID: 28
   Device size:          7.28TiB
   Device slack:           0.00B
   Data,RAID10:          1.16TiB
   Data,RAID10:        447.48GiB
   Data,RAID10:        687.27GiB
   Data,RAID10:          8.00GiB
   Data,RAID10:        371.32GiB
   Data,RAID10:        173.51GiB
   Data,RAID10:        144.21GiB
   Data,RAID10:         14.36GiB
   Data,RAID10:        437.67GiB
   Data,RAID10:         36.90GiB
   Data,RAID10:         80.34MiB
   Metadata,RAID10:     99.00MiB
   Metadata,RAID10:      4.41MiB
   Metadata,RAID10:      2.03GiB
   Metadata,RAID10:     56.06MiB
   Metadata,RAID10:      1.42GiB
   Metadata,RAID10:      1.06MiB
   Metadata,RAID10:    200.31MiB
   Metadata,RAID10:    104.00MiB
   Metadata,RAID10:    121.81MiB
   System,RAID10:        8.00MiB
   Unallocated:          3.84TiB

/dev/sdb, ID: 30
   Device size: 7.28TiB
   Device slack:  0.00B
   Data

Re: how to run balance successfully (No space left on device)?

2017-10-31 Thread Tomasz Chmielewski


On 2017-10-31 23:18, Tomasz Chmielewski wrote:


On a different server, however, it failed badly:

# time btrfs balance start /srv
WARNING:

Full balance without filters requested. This operation is very
intense and takes potentially very long. It is recommended to
use the balance filters to narrow down the scope of balance.
Use 'btrfs balance start --full-balance' option to skip this
warning. The operation will start in 10 seconds.
Use Ctrl-C to stop it.
10 9 8 7 6 5 4 3 2 1
Starting balance without any filters.
ERROR: error during balancing '/srv': Read-only file system
There may be more info in syslog - try dmesg | tail

[312304.050731] BTRFS info (device sda4): found 15073 extents
[313555.971253] BTRFS info (device sda4): relocating block group
1208022466560 flags data|raid1
[314963.506580] BTRFS: Transaction aborted (error -28)
[314963.506608] ------------[ cut here ]------------
[314963.506639] WARNING: CPU: 2 PID: 27854 at
/home/kernel/COD/linux/fs/btrfs/extent-tree.c:3089
btrfs_run_delayed_refs+0x244/0x250 [btrfs]


(...)


[314963.506955] BTRFS: error (device sda4) in
btrfs_run_delayed_refs:3089: errno=-28 No space left
[314963.507032] BTRFS info (device sda4): forced readonly
[314963.510570] BTRFS warning (device sda4): Skipping commit of
aborted transaction.
[314963.510577] BTRFS: error (device sda4) in
cleanup_transaction:1873: errno=-28 No space left
[314970.954768] mail[32290]: segfault at c0 ip 7f6b507ae33b sp
7ffec4849ac0 error 4 in libmailutils.so.4.0.0[7f6b50724000+b]
[314983.475988] BTRFS error (device sda4): pending csums is 167936
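
The -28 in the kernel messages above is a negated errno value. As a small aid for reading such logs, the sketch below maps a kernel-style negative code to its symbolic name and message using Python's standard `errno` and `os` modules; the helper name `describe_kernel_errno` is made up for this example. It also looks up EROFS, which corresponds to the "Read-only file system" message that the balance command printed after the filesystem was forced read-only.

```python
import errno
import os

def describe_kernel_errno(value):
    """Translate a negative kernel-style error code into name and message."""
    code = abs(value)
    name = errno.errorcode.get(code, "UNKNOWN")
    return name, os.strerror(code)

# The transaction abort in the log reported "error -28".
print(describe_kernel_errno(-28))
# The user space tool then reported the read-only condition instead.
print((errno.errorcode[errno.EROFS], os.strerror(errno.EROFS)))
```

The numeric values are platform-specific; the mapping shown here is the one used on Linux.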



And btrfs balance can be a real database killer :(


root@backupslave01:/var/log/mysql# tail -f mysql-error.log
InnoDB: Doing recovery: scanned up to log sequence number 2206178343424
InnoDB: Doing recovery: scanned up to log sequence number 2206183586304
InnoDB: Doing recovery: scanned up to log sequence number 2206188829184
InnoDB: Doing recovery: scanned up to log sequence number 2206194072064
InnoDB: Doing recovery: scanned up to log sequence number 2206199314944
InnoDB: Doing recovery: scanned up to log sequence number 2206204557824
InnoDB: Doing recovery: scanned up to log sequence number 2206209800704
InnoDB: Doing recovery: scanned up to log sequence number 2206215043584
InnoDB: Doing recovery: scanned up to log sequence number 2206220286464
InnoDB: Doing recovery: scanned up to log sequence number 2206220752384

InnoDB: 1 transaction(s) which must be rolled back or cleaned up
InnoDB: in total 1 row operations to undo
InnoDB: Trx id counter is 21145843968
2017-10-31 14:46:59 4359 [Note] InnoDB: Starting an apply batch of log 
records to the database...

InnoDB: Progress in percent: 14:46:59 UTC - mysqld got signal 11 ;
This could be because you hit a bug. It is also possible that this 
binary
or one of the libraries it was linked against is corrupt, improperly 
built,
or misconfigured. This error can also be caused by malfunctioning 
hardware.

We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.
Please help us make Percona Server better by reporting any
bugs at http://bugs.percona.com/

key_buffer_size=33554432
read_buffer_size=131072
max_used_connections=0
max_threads=502
thread_count=0
connection_count=0
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 
232495 K  bytes of memory

Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0 thread_stack 0x4
/usr/sbin/mysqld(my_print_stacktrace+0x3b)[0x8d444b]
/usr/sbin/mysqld(handle_fatal_signal+0x49a)[0x649b0a]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x11390)[0x7f74b90bf390]
/usr/sbin/mysqld[0x99fcae]
/usr/sbin/mysqld[0x9a17ed]
/usr/sbin/mysqld[0x9881ea]
/usr/sbin/mysqld[0x989fc7]
/usr/sbin/mysqld[0xa6dd87]
/usr/sbin/mysqld[0xab8cd8]
/usr/sbin/mysqld[0xa08300]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x76ba)[0x7f74b90b56ba]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7f74b854a3dd]



Tomasz Chmielewski
https://lxadm.com

Re: how to run balance successfully (No space left on device)?

2017-10-31 Thread Tomasz Chmielewski


On 2017-09-18 17:20, Tomasz Chmielewski wrote:

# df -h /var/lib/lxd

FWIW, standard (aka util-linux) df is effectively useless in a situation
such as this, as it really doesn't give you the information you need (it
can say you have lots of space available, but if btrfs has all of it
allocated into chunks, even if the chunks have space in them still, there
can be problems).


I see here on RAID-1 that "df -h" shows pretty much the same amount of
free space as "btrfs fi show":

- "df -h" shows 105G free
- "btrfs fi show" says: Free (estimated): 104.28GiB (min: 104.28GiB)




But chances are pretty good that once you get that patch integrated,
whether by integrating it yourself to what you have currently, or by
trying 4.14-rc1 or waiting until it hits release or stable, that bug will
have been squashed! =:^)


OK, will wait for 4.14.


So I've tried to run balance with 4.14-rc6.

It succeeded on one server where it was failing with 4.13.x.


On a different server, however, it failed badly:

# time btrfs balance start /srv
WARNING:

Full balance without filters requested. This operation is very
intense and takes potentially very long. It is recommended to
use the balance filters to narrow down the scope of balance.
Use 'btrfs balance start --full-balance' option to skip this
warning. The operation will start in 10 seconds.
Use Ctrl-C to stop it.
10 9 8 7 6 5 4 3 2 1
Starting balance without any filters.
ERROR: error during balancing '/srv': Read-only file system
There may be more info in syslog - try dmesg | tail

real    5194m41.749s
user    0m0.000s
sys     301m10.928s


[312304.050731] BTRFS info (device sda4): found 15073 extents
[313555.971253] BTRFS info (device sda4): relocating block group 
1208022466560 flags data|raid1

[314963.506580] BTRFS: Transaction aborted (error -28)
[314963.506608] ------------[ cut here ]------------
[314963.506639] WARNING: CPU: 2 PID: 27854 at 
/home/kernel/COD/linux/fs/btrfs/extent-tree.c:3089 
btrfs_run_delayed_refs+0x244/0x250 [btrfs]
[314963.506640] Modules linked in: vhost_net vhost tap xt_REDIRECT 
nf_nat_redirect xt_NFLOG nfnetlink_log nfnetlink xt_conntrack veth 
ebtable_filter ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 
nf_nat_ipv6 ip6table_filter ip6_tables xt_comment xt_CHECKSUM 
binfmt_misc iptable_mangle nf_log_ipv4 nf_log_common xt_LOG 
ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_conntrack_ipv4 
nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack ipt_REJECT nf_reject_ipv4 
xt_tcpudp iptable_filter ip_tables x_tables bridge stp llc btrfs 
zstd_compress shpchp intel_rapl lpc_ich x86_pkg_temp_thermal 
intel_powerclamp input_leds tpm_infineon ie31200_edac serio_raw coretemp 
crct10dif_pclmul crc32_pclmul ghash_clmulni_intel kvm_intel pcbc kvm 
aesni_intel irqbypass aes_x86_64 mac_hid crypto_simd glue_helper cryptd 
intel_cstate
[314963.506684]  eeepc_wmi asus_wmi sparse_keymap intel_rapl_perf 
wmi_bmof nfsd auth_rpcgss nfs_acl lockd grace sunrpc lp parport autofs4 
raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor 
async_tx xor raid6_pq libcrc32c raid0 multipath linear raid1 e1000e ahci 
libahci ptp pps_core wmi video
[314963.506710] CPU: 2 PID: 27854 Comm: sadc Tainted: GW   
4.14.0-041400rc6-generic #201710230731
[314963.506711] Hardware name: System manufacturer System Product 
Name/P8B WS, BIOS 0904 10/24/2011

[314963.506713] task: 8bc0fd39ae00 task.stack: b28d4949
[314963.506732] RIP: 0010:btrfs_run_delayed_refs+0x244/0x250 [btrfs]
[314963.506734] RSP: 0018:b28d49493d30 EFLAGS: 00010286
[314963.506736] RAX: 0026 RBX: ffe4 RCX: 

[314963.506737] RDX:  RSI: 8bc8afa8dc98 RDI: 
8bc8afa8dc98
[314963.506738] RBP: b28d49493d88 R08: 0001 R09: 
242b
[314963.506740] R10: b28d49493c20 R11:  R12: 
8bc883a81078
[314963.506741] R13: 8bc887eb R14: 8bc1876ec400 R15: 
0018ba90
[314963.506743] FS:  7f62a12d9700() GS:8bc8afa8() 
knlGS:

[314963.506744] CS:  0010 DS:  ES:  CR0: 80050033
[314963.506746] CR2: 7f25f6f53880 CR3: 0003cf4f7004 CR4: 
000626e0

[314963.506747] Call Trace:
[314963.506773]  btrfs_commit_transaction+0x9b/0x8d0 [btrfs]
[314963.506799]  ? btrfs_wait_ordered_range+0x9c/0x110 [btrfs]
[314963.506821]  btrfs_sync_file+0x348/0x410 [btrfs]
[314963.506826]  vfs_fsync_range+0x4b/0xb0
[314963.506828]  do_fsync+0x3d/0x70
[314963.506831]  SyS_fdatasync+0x13/0x20
[314963.506834]  do_syscall_64+0x61/0x120
[314963.506838]  entry_SYSCALL64_slow_path+0x25/0x25
[314963.506840] RIP: 0033:0x7f62a0dfec30
[314963.506841] RSP: 002b:7fffca89f288 EFLAGS: 0246 ORIG_RAX: 
004b
[314963.506844] RAX: ffda RBX: 0001 RCX: 
7f62a0dfec30
[314963.506845] RDX:  RSI: 7f62a10c47a0 RDI: 
0003

Re: Cannot remove device, no space left on device

2017-10-22 Thread Adam Bahe

I have upgraded to kernel 4.13.8-1 and still cannot delete this disk.

I find it weird that I cannot remove a disk from my array, especially on
one of the newest kernels available, sourced straight from kernel.org.

On Sun, Oct 1, 2017 at 4:57 AM, Duncan <1i5t5.dun...@cox.net> wrote:
> Adam Bahe posted on Sun, 01 Oct 2017 02:48:19 -0500 as excerpted:
>
>> Hello,
>
> Hi, Just a user and list regular here, not a dev, but perhaps some of
> this will be of help.
>
>> I have a hard drive that is about a year old with some pending sectors
>> on it. I'd like to RMA this drive out of an abundance of caution. Doing
>> so requires me removing it from my raid10 array. However I am unable to
>> do so as it eventually errors out by saying there is no space left on
>> the device. I have 21 drives in a raid10 array. Totalling about 100TB
>> raw. I'm using around 28TB. So I should have plenty of space left.
>
> Yes, and your btrfs * outputs below reflect plenty of space...
>
>> I have done a number of balances with incremental increases in dusage
>> and musage values from 5-100%. Each balance completed successfully. So
>> it looks as though my filesystem is balanced fine. I'm on kernel 4.10
>
> FWIW, this list, being btrfs development focused, with btrfs itself still
> stabilizing, not fully stable and mature, tends to focus forward rather
> than backward.  As such, our recommendation for best support is one of
> the latest two mainline kernel series in either current or LTS track.
> With the current kernel being 4.13, 4.13 and 4.12 are supported there.
> On the LTS track 4.9 is the latest, with 4.4 the second latest.  4.14 is
> scheduled to be an LTS release as well, which is good because 4.4 was
> quite a long time ago in btrfs history and is getting hard to support.
>
> Your 4.10 is a bit dated for current, and isn't an LTS, so the
> recommendation would be to try a newer 4.12 or 4.13, or drop a notch to
> 4.9 LTS.
>
> We do still try to support out of the above range, but it won't be as
> well, and similarly if you're running a distro kernel, because we don't
> track what they've added or backported and what they haven't backported.
> Of course in the distro kernel case they're better placed to provide
> support as they know what they've backported, etc.
>
> Meanwhile, as it happens there's a patch that should be in 4.14-rcs and
> will eventually be backported to the stable series tho I'm not sure it
> has been yet, that fixes an erroneous ENOSPC condition that triggers most
> frequently during balances.  There was something reserving (or attempting
> to reserve) waaayyy too much space in such large transactions, triggering
> the ENOSPCs.
>
> Given your time constraints, I'd suggest trying first the latest 4.13.x
> stable series kernel and hope it has that patch (which I haven't tracked
> well enough to give you the summary of, or I would and you could check),
> and if it doesn't work, 4.14-rc3, which should be out late today (Sunday,
> US time), because your symptoms fit the description and it's very likely
> to be fixed in at least the latest 4.14-rcs.
>
> Another less pressing note below...
>
>> btrfs device usage:
>>
>> /dev/sdc, ID: 19
>>    Device size:             9.10TiB
>>    Device slack:              0.00B
>>    Data,RAID10:           463.85GiB
>>    Data,RAID10:            61.43GiB
>>    Data,RAID10:           115.98GiB
>>    Data,RAID10:           118.31GiB
>>    Data,RAID10:            10.93GiB
>>    Data,RAID10:           776.75GiB
>>    Metadata,RAID10:         1.13GiB
>>    Metadata,RAID10:        99.00MiB
>>    Metadata,RAID10:       211.75MiB
>>    Metadata,RAID10:        59.09MiB
>>    System,RAID10:           2.16MiB
>>    Unallocated:             7.58TiB
>
> [Other devices similar]
>
> Those multiple entries for the same chunk type indicate chunks of
> differing stripe widths.  That won't hurt but you might want the better
> performance of a full stripe, and all those extra lines in the listing
> would bother me.
>
> Once you get that device removed and are in normal operation again, you
> can, if desired, try balancing using the "stripes=" balance filter to try
> to get them all to full stripe width, at least until your space on the
> smallest drives is out and you have to drop to a lower stripe width.
> You'll need a reasonably new btrfs-progs to recognize the stripes=
> filter.  See the btrfs-balance manpage and/or previous threads here.  (On
> a quick look I didn't see it on the wiki yet, but it's possible I missed
> it.)
>
> --
> Duncan - List replies preferred.   No HTML msgs.
> "Every nonfree program has a lord, a master --
> and if you use the program, he is your master."  Richard Stallman
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

"No space left on device"

2017-10-06 Thread Nick Gilmour

Hi all,

I'm getting an error "No space left on device" on a VM in VirtualBox.
It started as I was trying to convert the .vdi to .img. I wanted to
shrink the size of the disk first and I followed the steps from here:
https://superuser.com/questions/529149/how-to-compact-virtualboxs-vdi-file-size#529183

and then I got the error:
$ dd if=/dev/zero of=/tmp/bigemptyfile bs=4096k
dd: error writing '/tmp/bigemptyfile': No space left on device
94174+0 records in
94173+0 records out
394990190592 bytes (395 GB, 368 GiB) copied, 685.984 s, 576 MB/s
$ rm /tmp/bigemptyfile

After I rebooted I got only a terminal prompt, no desktop environment,
and I couldn't start it with startx (even bash completion doesn't
work).

I have tried balancing, following the steps from here:
https://unix.stackexchange.com/questions/174446/btrfs-error-error-during-balancing-no-space-left-on-device

but I keep getting:
"Done, had to relocate 0 out of XX chunks"
regardless of the increase of the dusage parameter.


Debug information (Copy & Paste was not possible, some text is missing...):

uname -a
Linux VM-Ubuntu 4.4.0-83-generic

btrfs --version
btrfs-progs v4.4

btrfs fi show
Label: none uuid: x
   Total devices 1 FS bytes used 473.68GiB
devid 1 size 492.00 GiB used 492.00GiB path /dev/sda1

Label: 'extra' uuid: y
   Total devices 1 FS bytes used 112.00KiB
devid 1 size 100.00 GiB used 2.02GiB path /dev/sdb1

btrfs fi df /home
Data, single: total=462.23GiB, used=462.23GiB
System, DUP: total=8.00MiB, used=80.00KiB
GlobalReserve, single: total=512.00 MiB, used=160.00KiB

dmesg > dmesg.log
dmesg: write failed: No space left on device
dmesg: write error


Any ideas how I can fix this?
Thanks.

Regards,
Nick

Re: Cannot remove device, no space left on device

2017-10-01 Thread Duncan

Adam Bahe posted on Sun, 01 Oct 2017 02:48:19 -0500 as excerpted:

> Hello,

Hi, Just a user and list regular here, not a dev, but perhaps some of 
this will be of help.

> I have a hard drive that is about a year old with some pending sectors
> on it. I'd like to RMA this drive out of an abundance of caution. Doing
> so requires me removing it from my raid10 array. However I am unable to
> do so as it eventually errors out by saying there is no space left on
> the device. I have 21 drives in a raid10 array. Totalling about 100TB
> raw. I'm using around 28TB. So I should have plenty of space left.

Yes, and your btrfs * outputs below reflect plenty of space...

> I have done a number of balances with incremental increases in dusage
> and musage values from 5-100%. Each balance completed successfully. So
> it looks as though my filesystem is balanced fine. I'm on kernel 4.10

FWIW, this list, being btrfs development focused, with btrfs itself still 
stabilizing, not fully stable and mature, tends to focus forward rather 
than backward.  As such, our recommendation for best support is one of 
the latest two mainline kernel series in either current or LTS track.  
With the current kernel being 4.13, 4.13 and 4.12 are supported there.  
On the LTS track 4.9 is the latest, with 4.4 the second latest.  4.14 is 
scheduled to be an LTS release as well, which is good because 4.4 was 
quite a long time ago in btrfs history and is getting hard to support.

Your 4.10 is a bit dated for current, and isn't an LTS, so the 
recommendation would be to try a newer 4.12 or 4.13, or drop a notch to 
4.9 LTS.

We do still try to support out of the above range, but it won't be as 
well, and similarly if you're running a distro kernel, because we don't 
track what they've added or backported and what they haven't backported.  
Of course in the distro kernel case they're better placed to provide 
support as they know what they've backported, etc.

Meanwhile, as it happens there's a patch that should be in 4.14-rcs and 
will eventually be backported to the stable series tho I'm not sure it 
has been yet, that fixes an erroneous ENOSPC condition that triggers most 
frequently during balances.  There was something reserving (or attempting 
to reserve) waaayyy too much space in such large transactions, triggering 
the ENOSPCs.

Given your time constraints, I'd suggest trying first the latest 4.13.x 
stable series kernel and hope it has that patch (which I haven't tracked 
well enough to give you the summary of, or I would and you could check), 
and if it doesn't work, 4.14-rc3, which should be out late today (Sunday, 
US time), because your symptoms fit the description and it's very likely 
to be fixed in at least the latest 4.14-rcs.

Another less pressing note below...

> btrfs device usage:
> 
> /dev/sdc, ID: 19
>    Device size:             9.10TiB
>    Device slack:              0.00B
>    Data,RAID10:           463.85GiB
>    Data,RAID10:            61.43GiB
>    Data,RAID10:           115.98GiB
>    Data,RAID10:           118.31GiB
>    Data,RAID10:            10.93GiB
>    Data,RAID10:           776.75GiB
>    Metadata,RAID10:         1.13GiB
>    Metadata,RAID10:        99.00MiB
>    Metadata,RAID10:       211.75MiB
>    Metadata,RAID10:        59.09MiB
>    System,RAID10:           2.16MiB
>    Unallocated:             7.58TiB

[Other devices similar]

Those multiple entries for the same chunk type indicate chunks of 
differing stripe widths.  That won't hurt but you might want the better 
performance of a full stripe, and all those extra lines in the listing 
would bother me.

Once you get that device removed and are in normal operation again, you 
can, if desired, try balancing using the "stripes=" balance filter to try 
to get them all to full stripe width, at least until your space on the 
smallest drives is out and you have to drop to a lower stripe width.  
You'll need a reasonably new btrfs-progs to recognize the stripes= 
filter.  See the btrfs-balance manpage and/or previous threads here.  (On 
a quick look I didn't see it on the wiki yet, but it's possible I missed 
it.)

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman


Cannot remove device, no space left on device

2017-10-01 Thread Adam Bahe

Hello,

I have a hard drive that is about a year old with some pending sectors
on it. I'd like to RMA this drive out of an abundance of caution.
Doing so requires me removing it from my raid10 array. However I am
unable to do so as it eventually errors out by saying there is no
space left on the device. I have 21 drives in a raid10 array.
Totalling about 100TB raw. I'm using around 28TB. So I should have
plenty of space left.

I have done a number of balances with incremental increases in dusage
and musage values from 5-100%. Each balance completed successfully. So
it looks as though my filesystem is balanced fine. I'm on kernel 4.10

I also tried adding more space. I threw in another 4TB hard drive and
it added mostly fine. It took 3-4 tries at a balance before it was
fully balanced into the array. The same no space left on device error
occurred when adding the drive. But it did eventually add.

But I still can't seem to remove the hard drive I want to RMA. Here
are some statistics. Let me know if there is any more info I can
provide. But I really need to get this drive removed as my RMA window
is only open for 30 days once I submit.

/dev/sdo is the drive I would like to remove, why am I unable to do so?

btrfs fi show:

Label: 'nas'  uuid: 4fcd5725-b6c6-4d8a-9860-f2fc5474cbcb
        Total devices 20 FS bytes used 24.12TiB
        devid    1 size 3.64TiB used 2.86TiB path /dev/sdm
        devid    2 size 3.64TiB used 2.86TiB path /dev/sde
        devid    3 size 7.28TiB used 3.02TiB path /dev/sdt
        devid    4 size 7.28TiB used 2.08TiB path /dev/sdo
        devid    5 size 7.28TiB used 3.02TiB path /dev/sdi
        devid    6 size 7.28TiB used 3.02TiB path /dev/sdd
        devid    7 size 1.82TiB used 1.82TiB path /dev/sdp
        devid    9 size 1.82TiB used 1.82TiB path /dev/sdv
        devid   10 size 1.82TiB used 1.82TiB path /dev/sdk
        devid   11 size 1.82TiB used 1.82TiB path /dev/sdq
        devid   12 size 1.82TiB used 1.82TiB path /dev/sdg
        devid   13 size 1.82TiB used 1.82TiB path /dev/sdl
        devid   14 size 1.82TiB used 1.82TiB path /dev/sdr
        devid   15 size 1.82TiB used 1.82TiB path /dev/sdf
        devid   16 size 5.46TiB used 3.02TiB path /dev/sds
        devid   17 size 9.10TiB used 3.02TiB path /dev/sdn
        devid   18 size 9.10TiB used 3.02TiB path /dev/sdh
        devid   19 size 9.10TiB used 3.02TiB path /dev/sdc
        devid   20 size 9.10TiB used 3.02TiB path /dev/sdu
        devid   21 size 3.64TiB used 1.76TiB path /dev/sdj

btrfs fi df

[root@nas ~]# btrfs fi df /mnt2/nas
Data, RAID10: total=24.13TiB, used=24.10TiB
System, RAID10: total=30.19MiB, used=5.39MiB
Metadata, RAID10: total=25.51GiB, used=24.98GiB
GlobalReserve, single: total=512.00MiB, used=0.00B

btrfs fi usage

Overall:
    Device size:                  96.42TiB
    Device allocated:             48.31TiB
    Device unallocated:           48.11TiB
    Device missing:                  0.00B
    Used:                         48.24TiB
    Free (estimated):             24.09TiB  (min: 24.09TiB)
    Data ratio:                       2.00
    Metadata ratio:                   2.00
    Global reserve:              512.00MiB  (used: 0.00B)

Data,RAID10: Size:24.13TiB, Used:24.10TiB
   /dev/sdc       1.51TiB
   /dev/sdd       1.51TiB
   /dev/sde       1.43TiB
   /dev/sdf     930.43GiB
   /dev/sdg     930.59GiB
   /dev/sdh       1.51TiB
   /dev/sdi       1.51TiB
   /dev/sdj     900.48GiB
   /dev/sdk     930.17GiB
   /dev/sdl     930.42GiB
   /dev/sdm       1.43TiB
   /dev/sdn       1.51TiB
   /dev/sdo       1.04TiB
   /dev/sdp     929.93GiB
   /dev/sdq     930.42GiB
   /dev/sdr     930.04GiB
   /dev/sds       1.51TiB
   /dev/sdt       1.51TiB
   /dev/sdu       1.51TiB
   /dev/sdv     930.48GiB

Metadata,RAID10: Size:25.51GiB, Used:24.98GiB
   /dev/sdc       1.49GiB
   /dev/sdd       1.49GiB
   /dev/sde       1.49GiB
   /dev/sdf       1.04GiB
   /dev/sdg     903.66MiB
   /dev/sdh       1.49GiB
   /dev/sdi       1.49GiB
   /dev/sdj       1.49GiB
   /dev/sdk       1.27GiB
   /dev/sdl       1.01GiB
   /dev/sdm       1.49GiB
   /dev/sdn       1.49GiB
   /dev/sdp       1.49GiB
   /dev/sdq       1.04GiB
   /dev/sdr       1.37GiB
   /dev/sds       1.49GiB
   /dev/sdt       1.49GiB
   /dev/sdu       1.49GiB
   /dev/sdv    1005.44MiB

System,RAID10: Size:30.19MiB, Used:5.39MiB
   /dev/sdc       2.16MiB
   /dev/sdd       2.16MiB
   /dev/sde       2.16MiB
   /dev/sdh       2.16MiB
   /dev/sdi       2.16MiB
   /dev/sdj       2.16MiB
   /dev/sdk       2.16MiB
   /dev/sdm       2.16MiB
   /dev/sdn       2.16MiB
   /dev/sdp       2.16MiB
   /dev/sdr       2.16MiB
   /dev/sds       2.16MiB
   /dev/sdt       2.16MiB
   /dev/sdu       2.16MiB

Unallocated:
   /dev/sdc       7.58TiB
   /dev/sdd       5.76TiB
   /dev/sde       2.21TiB
   /dev/sdf     931.55GiB
   /dev/sdg     931.54GiB
   /dev/sdh       7.58TiB
   /dev/sdi       5.76TiB
   /dev/sdj       2.76TiB
   /dev/sdk     931.57GiB
   /dev/sdl     931.58GiB
   /dev/sdm       2.21TiB
   /dev/sdn       7.58TiB
   /dev/sdo       6.24TiB
   /dev/sdp     931.59GiB
   /dev/sdq     931.55GiB
   /dev/sdr     931.61GiB
   /dev/sds       3.95TiB
   /dev/sdt       5.76TiB
/dev/sdu7

Re: how to run balance successfully (No space left on device)?

2017-09-18 Thread Duncan

Tomasz Chmielewski posted on Mon, 18 Sep 2017 18:27:09 +0900 as excerpted:

> And perhaps more important - can I assume that right now, with the
> latest stable kernel (4.13.2 right now), running "btrfs balance" is not
> safe and can lead to data corruption or loss?
> 
> 
> Consider the following case:
> 
> - system admin runs btrfs balance on a filesystem with 100 GB free and
> assumes it is enough space to complete successfully
> 
> - btrfs balance fails due to some bug with "No space left on device"
> 
> - at the same time, a database using this filesystem will fail with "No
> space left on device", apt/rpm will fail a package upgrade, some program
> using temp space will fail, log collector will fail to catch some data,
> because of "No space left on device" and so on?

To the best of my knowledge that shouldn't be a problem, certainly not 
one I'd worry about if you're following the sysadmin's first rule of 
backups: the true value of data to you is defined not by any claims but 
by the number of backups you consider it worth having of that data.  It 
follows that no backups means you've defined the data as worth less than 
the time/trouble/resources it would take to create at least that one 
backup.

The ENOSPC is because the internal calculation for the reserved-space 
requirement is buggy ATM, but AFAIK it's just that, an /internal/ 
calculation that goes waayyy wild and aborts the operation before it 
can actually reserve the space, so it never gets to the point of 
affecting anything else.

Speaking of which... I've not seen it mentioned in the bug discussion, 
but I wonder if doing a btrfs balance start -d, followed by another 
balance with -m replacing the -d, thus separating the data and metadata 
balances, might work around the problem.  At least you could know for 
sure which is causing it that way, and complete a balance of the other 
one.  And if that blocks on one or the other, you could split the job up 
further using the devid= and drange= filters (see the btrfs-balance 
manpage), doing only part of the filesystem at a time.  My speculation is 
that you should be able to divide the operation up enough so that even if 
the reserve space calculation is off, it'll still complete.
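
The split suggested above can be sketched as a small helper that only builds the command lines, without running anything. This is a minimal sketch, assuming the -d / -m type options and the devid= filter behave as described in the btrfs-balance manpage; the function name split_balance_commands is hypothetical and not part of btrfs-progs.

```python
def split_balance_commands(mountpoint, devids=()):
    """Build argv lists for separate data (-d) and metadata (-m) balances.

    If devids is given, each pass is further split per device using the
    devid= balance filter. Nothing is executed here; the caller decides
    whether and how to run the commands (hypothetical helper).
    """
    commands = []
    for type_flag in ("-d", "-m"):
        if devids:
            for devid in devids:
                commands.append(
                    ["btrfs", "balance", "start",
                     f"{type_flag}devid={devid}", mountpoint]
                )
        else:
            commands.append(["btrfs", "balance", "start", type_flag, mountpoint])
    return commands


if __name__ == "__main__":
    # Data first, then metadata, one device at a time.
    for argv in split_balance_commands("/var/lib/lxd", devids=(1, 2)):
        print(" ".join(argv))
```

Running the passes one at a time at least shows whether the data or the metadata balance is the one that hits the ENOSPC.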

Meanwhile, I don't believe it's just balance that's affected, either, tho 
it's the most commonly reported.  By my understanding, any sufficiently 
large operation could trigger it, tho obviously a full btrfs balance is 
about the largest operation a btrfs is likely to have, so it stands to 
reason that would trigger it more reliably than common generic filesystem 
operations.

Of course if you're paranoid, you can refrain from doing balances until 
you know the bug is fixed, but then I'd have to ask, if you're that 
paranoid of a filesystem failure, why are you running the still 
stabilizing, not yet entirely stable and mature, btrfs, in the first 
place?  Seems a bit like the folks still running RHEL/CentOS 6 with their 
stable kernels because they want stability, yet choosing to run the still 
not entirely stable btrfs, definitely not entirely stable on that old a 
kernel, on top of them.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman


Re: how to run balance successfully (No space left on device)?

2017-09-18 Thread Tomasz Chmielewski


On 2017-09-18 22:44, Peter Becker wrote:

> I'm not sure if it would help, but maybe you could try adding an 8GB
> (or more) USB flash drive to the pool and try to start balance.
> If it works out, you can remove it from the pool after that.


I really can't, it's an "online server".

But I've removed some 65 GB data, so now it's 171 GB free, or, 60% used 
filesystem.


The balance still fails.


Tomasz Chmielewski
https://lxadm.com

Re: how to run balance successfully (No space left on device)?

2017-09-18 Thread Peter Becker

I'm not sure if it would help, but maybe you could try adding an 8GB
(or more) USB flash drive to the pool and try to start balance.
If it works out, you can remove it from the pool after that.

Re: how to run balance successfully (No space left on device)?

2017-09-18 Thread Tomasz Chmielewski


On 2017-09-18 17:29, Andrei Borzenkov wrote:
> On Mon, Sep 18, 2017 at 11:20 AM, Tomasz Chmielewski wrote:
>>>> # df -h /var/lib/lxd
>>>>
>>>> FWIW, standard (aka util-linux) df is effectively useless in a situation
>>>> such as this, as it really doesn't give you the information you need (it
>>>> can say you have lots of space available, but if btrfs has all of it
>>>> allocated into chunks, even if the chunks have space in them still, there
>>>> can be problems).
>>
>> I see here on RAID-1, "df -h" shows pretty much the same amount of free
>> space as "btrfs fi show":
>>
>> - "df -h" shows 105G free
>> - "btrfs fi show" says: Free (estimated): 104.28GiB  (min: 104.28GiB)
>
> I think both use the same algorithm to compute free space (df at the
> end just shows what kernel returns). The problem is that this
> algorithm itself is just approximation in general case. For uniform
> RAID1 profile it should be correct though.


And perhaps more important - can I assume that right now, with the 
latest stable kernel (4.13.2 right now), running "btrfs balance" is not 
safe and can lead to data corruption or loss?



Consider the following case:

- system admin runs btrfs balance on a filesystem with 100 GB free and 
assumes it is enough space to complete successfully


- btrfs balance fails due to some bug with "No space left on device"

- at the same time, a database using this filesystem will fail with "No 
space left on device", apt/rpm will fail a package upgrade, some program 
using temp space will fail, log collector will fail to catch some data, 
because of "No space left on device" and so on?




Tomasz Chmielewski
https://lxadm.com

Re: how to run balance successfully (No space left on device)?

2017-09-18 Thread Andrei Borzenkov

On Mon, Sep 18, 2017 at 11:20 AM, Tomasz Chmielewski  wrote:
>>> # df -h /var/lib/lxd
>>>
>>> FWIW, standard (aka util-linux) df is effectively useless in a situation
>>> such as this, as it really doesn't give you the information you need (it
>>> can say you have lots of space available, but if btrfs has all of it
>>> allocated into chunks, even if the chunks have space in them still, there
>>> can be problems).
>
>
> I see here on RAID-1, "df -h" shows pretty much the same amount of free
> space as "btrfs fi show":
>
> - "df -h" shows 105G free
> - "btrfs fi show" says: Free (estimated):104.28GiB  (min:
> 104.28GiB)
>

I think both use the same algorithm to compute free space (df at the
end just shows what kernel returns). The problem is that this
algorithm itself is just approximation in general case. For uniform
RAID1 profile it should be correct though.
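
As a rough cross-check of the estimate discussed above, the "Free (estimated)" figures posted in these threads are consistent with a simple formula: unallocated space divided by the data ratio, plus the slack inside already-allocated data chunks. This is a minimal sketch; the formula is an assumption inferred from the posted numbers, not taken from the btrfs-progs or kernel source, and estimate_free is a hypothetical name.

```python
def estimate_free(unallocated, data_ratio, data_total, data_used):
    """Approximate free space for a uniform-profile filesystem.

    All sizes must be in the same unit (GiB or TiB). The formula is an
    assumption that happens to match the posted btrfs fi usage output.
    """
    return unallocated / data_ratio + (data_total - data_used)


if __name__ == "__main__":
    # RAID1 filesystem from this thread (GiB): reported 104.28GiB
    print(f"{estimate_free(200.19, 2.00, 318.00, 313.82):.2f} GiB")
    # 20-device RAID10 array from the other thread (TiB): reported 24.09TiB
    print(f"{estimate_free(48.11, 2.00, 24.13, 24.10):.2f} TiB")
```

With mixed profiles or unevenly sized devices the unallocated space cannot always be paired up, which is where such an estimate stops being reliable.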

Re: how to run balance successfully (No space left on device)?

2017-09-18 Thread Tomasz Chmielewski


>> # df -h /var/lib/lxd
>
> FWIW, standard (aka util-linux) df is effectively useless in a situation
> such as this, as it really doesn't give you the information you need (it
> can say you have lots of space available, but if btrfs has all of it
> allocated into chunks, even if the chunks have space in them still, there
> can be problems).

I see here on RAID-1, "df -h" shows pretty much the same amount of
free space as "btrfs fi show":

- "df -h" shows 105G free
- "btrfs fi show" says: Free (estimated): 104.28GiB  (min: 104.28GiB)

> But chances are pretty good that once you get that patch integrated,
> whether by integrating it yourself to what you have currently, or by
> trying 4.14-rc1 or waiting until it hits release or stable, that bug will
> have been squashed! =:^)

OK, will wait for 4.14.


Tomasz Chmielewski
https://lxadm.com

Re: how to run balance successfully (No space left on device)?

2017-09-17 Thread Duncan

Tomasz Chmielewski posted on Mon, 18 Sep 2017 00:02:46 +0900 as excerpted:

> I'm trying to run balance on a 4.13.2 kernel without much luck:
> 
> # time btrfs balance start -v /var/lib/lxd -dusage=5 -musage=5
> [works, but only 1 chunk balanced]

> # time btrfs balance start -v /var/lib/lxd -dusage=0 -musage=0
> [no chunks with 0 usage to balance]
> 
> 
> # time btrfs balance start -v /var/lib/lxd
> [...]
> ERROR: error during balancing '/var/lib/lxd': No space left on device

OK, that fails.  Let's see what your unallocated space looks like, 
below...

> # df -h /var/lib/lxd

FWIW, standard (aka util-linux) df is effectively useless in a situation 
such as this, as it really doesn't give you the information you need (it 
can say you have lots of space available, but if btrfs has all of it 
allocated into chunks, even if the chunks have space in them still, there 
can be problems).

And actually, (util-linux) df really doesn't give you a whole lot of 
useful information on a btrfs in enough cases that most list regulars 
tend to discount its output almost entirely.  The only thing it's really 
useful for is getting a reasonable idea as to whether your next major 
file operation can be expected to succeed or not -- if it says you have 
50 MB left and you're trying to put a new 1 GiB file on the btrfs, it's 
unlikely to work, but if it says you have 300 GiB left in a multi-TB 
multi-device filesystem, you might have 300, or 3000 (its estimates are 
deliberately on the pessimistic side).

For better numbers, always use the btrfs tools, btrfs fi usage is the one 
I tend to use most, but btrfs dev usage can be very useful if you're more 
interested in a per-device listing, and btrfs fi show combined with btrfs 
fi df provide much the same information, tho it needs a bit more 
interpreting.

But you do provide them too. =:^)

> # btrfs fi df /var/lib/lxd
> Data, RAID1: total=318.00GiB, used=313.82GiB
> System, RAID1: total=32.00MiB, used=80.00KiB
> Metadata, RAID1: total=5.00GiB, used=3.17GiB
> GlobalReserve, single: total=512.00MiB, used=0.00B

Looks reasonably healthy.  No global reserve used, good as that's a major 
indicator of problems, and data and metadata usage is reasonably close to 
totals -- no huge number of mostly empty allocated chunks.
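
To put a number on "reasonably close to totals", the used/total ratio per chunk type can be computed from the btrfs fi df lines. This is a minimal sketch, assuming the "Type, PROFILE: total=X, used=Y" line format shown above; chunk_fill is a hypothetical helper, not a btrfs-progs feature.

```python
import re

UNIT_BYTES = {"B": 1, "KiB": 1024, "MiB": 1024**2, "GiB": 1024**3, "TiB": 1024**4}

# Matches lines such as "Data, RAID1: total=318.00GiB, used=313.82GiB".
LINE_RE = re.compile(
    r"(\w+), \w+: total=([\d.]+)((?:[KMGT]i)?B), used=([\d.]+)((?:[KMGT]i)?B)"
)


def chunk_fill(fi_df_text):
    """Return {chunk type: used/total} parsed from btrfs fi df output."""
    fill = {}
    for match in LINE_RE.finditer(fi_df_text):
        kind, total, total_unit, used, used_unit = match.groups()
        total_bytes = float(total) * UNIT_BYTES[total_unit]
        used_bytes = float(used) * UNIT_BYTES[used_unit]
        fill[kind] = used_bytes / total_bytes
    return fill


SAMPLE = """\
Data, RAID1: total=318.00GiB, used=313.82GiB
System, RAID1: total=32.00MiB, used=80.00KiB
Metadata, RAID1: total=5.00GiB, used=3.17GiB
GlobalReserve, single: total=512.00MiB, used=0.00B
"""

if __name__ == "__main__":
    for kind, ratio in chunk_fill(SAMPLE).items():
        print(f"{kind}: {ratio:.1%} of allocated chunk space in use")
```

A low ratio for Data or Metadata would point at many mostly empty chunks, which is the case a usage-filtered balance is meant to clean up.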

> # btrfs fi show /var/lib/lxd
> Label: 'btrfs'  uuid: f5f30428-ec5b-4497-82de-6e20065e6f61
>  Total devices 2 FS bytes used 316.98GiB
>  devid    1 size 423.13GiB used 323.03GiB path /dev/sda3
>  devid    2 size 423.13GiB used 323.03GiB path /dev/sdb3

OK, given the ENOSPC error on balance above, those device lines are the 
real interesting numbers, and...

Healthy here too.  Very much so, in fact, as only 323 gigs out of 423 is 
allocated on each device -- 100 gigs not chunk-allocated and therefore 
free for chunk allocation on each device. =:^)
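
The per-device arithmetic above (size minus chunk-allocated space) can be sketched directly from the devid lines of btrfs fi show. This is a minimal sketch that assumes all sizes are reported in GiB, as they are for this filesystem; unallocated_per_device is a hypothetical helper.

```python
import re

# Matches "devid    1 size 423.13GiB used 323.03GiB path /dev/sda3".
# \s* after "devid" also tolerates output whose column spacing was lost.
DEVID_RE = re.compile(
    r"devid\s*(\d+)\s+size\s+([\d.]+)GiB\s+used\s+([\d.]+)GiB\s+path\s+(\S+)"
)


def unallocated_per_device(fi_show_text):
    """Return {device path: GiB not yet allocated to any chunk}."""
    result = {}
    for match in DEVID_RE.finditer(fi_show_text):
        size_gib = float(match.group(2))
        used_gib = float(match.group(3))
        result[match.group(4)] = size_gib - used_gib
    return result


FI_SHOW = """\
Label: 'btrfs'  uuid: f5f30428-ec5b-4497-82de-6e20065e6f61
        Total devices 2 FS bytes used 316.98GiB
        devid    1 size 423.13GiB used 323.03GiB path /dev/sda3
        devid    2 size 423.13GiB used 323.03GiB path /dev/sdb3
"""

if __name__ == "__main__":
    for path, free_gib in unallocated_per_device(FI_SHOW).items():
        print(f"{path}: {free_gib:.2f} GiB unallocated")
```

The result matches the Unallocated section of btrfs fi usage for the same filesystem, which lists 100.10GiB per device.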

The ENOSPC is therefore a bug -- it shouldn't be happening.

And as it happens, AFAIK from reading the list, there's a currently known 
bug with over-reservation under certain circumstances that among other 
things, can (wrongly) trigger ENOSPC on balances, when there's plenty of 
space.

Also AFAIK, there's a patch on-list and (I think) in 4.14-rc1, that is I 
believe marked for stable as well, that will very likely fix your 
problem.  If it doesn't, there's another bug triggering similar symptoms.

But I'm not a dev and haven't been tracking the specific patch, so you'll 
need to either track it down (or wait to see if a dev or someone else 
points you at it) and apply it on your 4.13.x, or wait until it hits 
stable backports and you can get it there, or try 4.14-rc1 or wait until 
later/safer rcs or full release.

Meanwhile...

> # btrfs fi usage /var/lib/lxd
> Overall:
>  Device size:                 846.25GiB
>  Device allocated:            646.06GiB
>  Device unallocated:          200.19GiB
>  Device missing:                  0.00B
>  Used:                        633.97GiB
>  Free (estimated):            104.28GiB  (min: 104.28GiB)
>  Data ratio:                       2.00
>  Metadata ratio:                   2.00
>  Global reserve:              512.00MiB  (used: 0.00B)
> 
> Data,RAID1: Size:318.00GiB, Used:313.82GiB
> /dev/sda3 318.00GiB
> /dev/sdb3 318.00GiB
> 
> Metadata,RAID1: Size:5.00GiB, Used:3.17GiB
> /dev/sda3   5.00GiB
> /dev/sdb3   5.00GiB
> 
> System,RAID1: Size:32.00MiB, Used:80.00KiB
> /dev/sda3  32.00MiB
> /dev/sdb3  32.00MiB
> 
> Unallocated:
> /dev/sda3 100.10GiB
> /dev/sdb3 100.10GiB

As I said above, btrfs fi usage output provides much of the same info, 
but in a much nicer format and with a bit more detail, th

how to run balance successfully (No space left on device)?

2017-09-17 Thread Tomasz Chmielewski


I'm trying to run balance on a 4.13.2 kernel without much luck:

# time btrfs balance start -v /var/lib/lxd -dusage=5 -musage=5
Dumping filters: flags 0x7, state 0x0, force is off
  DATA (flags 0x2): balancing, usage=5
  METADATA (flags 0x2): balancing, usage=5
  SYSTEM (flags 0x2): balancing, usage=5
Done, had to relocate 1 out of 353 chunks

real    0m2.356s
user    0m0.005s
sys     0m0.175s


# time btrfs balance start -v /var/lib/lxd -dusage=0 -musage=0
Dumping filters: flags 0x7, state 0x0, force is off
  DATA (flags 0x2): balancing, usage=0
  METADATA (flags 0x2): balancing, usage=0
  SYSTEM (flags 0x2): balancing, usage=0
Done, had to relocate 0 out of 353 chunks

real    0m0.076s
user    0m0.004s
sys     0m0.008s


# time btrfs balance start -v /var/lib/lxd
Dumping filters: flags 0x7, state 0x0, force is off
  DATA (flags 0x0): balancing
  METADATA (flags 0x0): balancing
  SYSTEM (flags 0x0): balancing
WARNING:

Full balance without filters requested. This operation is very
intense and takes potentially very long. It is recommended to
use the balance filters to narrow down the balanced data.
Use 'btrfs balance start --full-balance' option to skip this
warning. The operation will start in 10 seconds.
Use Ctrl-C to stop it.
10 9 8 7 6 5 4 3 2 1
Starting balance without any filters.
ERROR: error during balancing '/var/lib/lxd': No space left on device
There may be more info in syslog - try dmesg | tail

real    284m58.541s
user    0m0.000s
sys     47m39.037s




# df -h /var/lib/lxd
Filesystem  Size  Used Avail Use% Mounted on
/dev/sda3   424G  318G  105G  76% /var/lib/lxd


# btrfs fi df /var/lib/lxd
Data, RAID1: total=318.00GiB, used=313.82GiB
System, RAID1: total=32.00MiB, used=80.00KiB
Metadata, RAID1: total=5.00GiB, used=3.17GiB
GlobalReserve, single: total=512.00MiB, used=0.00B


# btrfs fi show /var/lib/lxd
Label: 'btrfs'  uuid: f5f30428-ec5b-4497-82de-6e20065e6f61
        Total devices 2 FS bytes used 316.98GiB
        devid    1 size 423.13GiB used 323.03GiB path /dev/sda3
        devid    2 size 423.13GiB used 323.03GiB path /dev/sdb3


# btrfs fi usage /var/lib/lxd
Overall:
    Device size:                 846.25GiB
    Device allocated:            646.06GiB
    Device unallocated:          200.19GiB
    Device missing:                  0.00B
    Used:                        633.97GiB
    Free (estimated):            104.28GiB  (min: 104.28GiB)
    Data ratio:                       2.00
    Metadata ratio:                   2.00
    Global reserve:              512.00MiB  (used: 0.00B)

Data,RAID1: Size:318.00GiB, Used:313.82GiB
   /dev/sda3 318.00GiB
   /dev/sdb3 318.00GiB

Metadata,RAID1: Size:5.00GiB, Used:3.17GiB
   /dev/sda3   5.00GiB
   /dev/sdb3   5.00GiB

System,RAID1: Size:32.00MiB, Used:80.00KiB
   /dev/sda3  32.00MiB
   /dev/sdb3  32.00MiB

Unallocated:
   /dev/sda3 100.10GiB
   /dev/sdb3 100.10GiB
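
The "Free (estimated)" figure in the output above appears consistent with the unallocated raw space divided by the data ratio, plus the unused space inside the data chunks that are already allocated. A minimal sketch of that arithmetic follows. The formula is inferred from the numbers in this post, not taken from the btrfs-progs source, and the function name is made up for illustration.

```python
def estimated_free_gib(unallocated_gib, data_ratio, data_total_gib, data_used_gib):
    """Rough 'Free (estimated)' figure, in GiB.

    Assumption: unallocated raw space divided by the data ratio, plus the
    unused space inside data chunks that are already allocated.
    """
    return unallocated_gib / data_ratio + (data_total_gib - data_used_gib)


# Figures copied from the `btrfs fi usage /var/lib/lxd` output above.
estimate = estimated_free_gib(200.19, 2.00, 318.00, 313.82)
print(f"{estimate:.3f} GiB")
```

The result lands within rounding of the 104.28GiB that the tool reports, which suggests the balance is not failing for lack of raw space.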


Mount flags in /etc/fstab are:

LABEL=btrfs /var/lib/lxd btrfs defaults,noatime,space_cache=v2,device=/dev/sda3,device=/dev/sdb3,discard 0 0




Last pieces logged in dmesg:

[46867.225334] BTRFS info (device sda3): relocating block group 2996254998528 flags data|raid1
[46874.563631] BTRFS info (device sda3): found 9250 extents
[46894.827895] BTRFS info (device sda3): found 9250 extents
[46898.463053] BTRFS info (device sda3): found 201 extents
[46898.562564] BTRFS info (device sda3): relocating block group 2995181256704 flags data|raid1
[46903.555976] BTRFS info (device sda3): found 7299 extents
[46914.188044] BTRFS info (device sda3): found 7299 extents
[46914.303476] BTRFS info (device sda3): relocating block group 2947936616448 flags metadata|raid1
[46939.570810] BTRFS info (device sda3): found 42022 extents
[46945.053488] BTRFS info (device sda3): 2 enospc errors during balance



Tomasz Chmielewski
https://lxadm.com
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: strange No space left on device issues

2017-07-20 Thread Hans van Kranenburg

On 07/20/2017 11:53 PM, Christoph Anton Mitterer wrote:
> On Thu, 2017-07-20 at 14:48 -0700, Omar Sandoval wrote:
> [...]
> 
>>> I assume you'll take care to get that patch into stable kernels?
>>> Is this patch alone enough to recommend the Debian maintainers to
>>> include it into their 4.9 long term stable kernels?
>>
>> I'll mark it for stable, assuming Debian tracks the upstream LTS
>> releases it should get in.

+1 \o/

> Okay :-)
> 
> Nevertheless I'll open a bug at their BTS, just to be safe.

The Debian kernel team tries to minimize the number of patches on top
of the original kernel code. Every patch added comes with a maintenance
burden.

So, the answer will probably be: Please get it in 4.9 itself.

-- 
Hans van Kranenburg

Re: strange No space left on device issues

2017-07-20 Thread Omar Sandoval

On Thu, Jul 20, 2017 at 11:53:16PM +0200, Christoph Anton Mitterer wrote:
> On Thu, 2017-07-20 at 14:48 -0700, Omar Sandoval wrote:
> > Just to be sure, did you explicitly write 0 to these?
> Nope... that seemed to have been the default value, i.e. I used
> sysctl(8) in read (and not set) mode here.

Okay, all good then.

> > These sysctls are
> > really confusing, see https://www.kernel.org/doc/Documentation/sysctl
> > /vm.txt.
> > Basically, there are two ways to specify these, either as a ratio of
> > system memory (vm.dirty_ratio and vm.dirty_background_ratio) or a
> > static
> > number of bytes (vm.dirty_bytes and vm.dirty_background_bytes). If
> > you
> > set one, the other appears as 0, and the kernel sets the ratios by
> > default. But if you explicitly set them to 0, the kernel is going to
> > flush stuff extremely aggressively.
> I see,... not sure why both are 0 here... at least I didn't change it
> myself - must be something from the distro?

That's normal, the default is to have the ratio set instead:

$ sysctl vm.dirty_{,background_}{bytes,ratio}
vm.dirty_bytes = 0
vm.dirty_ratio = 20
vm.dirty_background_bytes = 0
vm.dirty_background_ratio = 10

> > Awesome, glad to hear it! I hadn't been able to reproduce the issue
> > outside of Facebook. Can I add your tested-by?
> Sure, but better use my other mail address for it, if you don't mind:
> Christoph Anton Mitterer 

No problem. I'll resend the patch with that shortly.

> > > I assume you'll take care to get that patch into stable kernels?
> > > Is this patch alone enough to recommend the Debian maintainers to
> > > include it into their 4.9 long term stable kernels?
> > 
> > I'll mark it for stable, assuming Debian tracks the upstream LTS
> > releases it should get in.
> Okay :-)
> 
> Nevertheless I'll open a bug at their BTS, just to be safe.
> 
> 
> Thanks :)
> 
> Chris.



Re: strange No space left on device issues

2017-07-20 Thread Christoph Anton Mitterer

On Thu, 2017-07-20 at 14:48 -0700, Omar Sandoval wrote:
> Just to be sure, did you explicitly write 0 to these?
Nope... that seemed to have been the default value, i.e. I used
sysctl(8) in read (and not set) mode here.



> These sysctls are
> really confusing, see https://www.kernel.org/doc/Documentation/sysctl
> /vm.txt.
> Basically, there are two ways to specify these, either as a ratio of
> system memory (vm.dirty_ratio and vm.dirty_background_ratio) or a
> static
> number of bytes (vm.dirty_bytes and vm.dirty_background_bytes). If
> you
> set one, the other appears as 0, and the kernel sets the ratios by
> default. But if you explicitly set them to 0, the kernel is going to
> flush stuff extremely aggressively.
I see,... not sure why both are 0 here... at least I didn't change it
myself - must be something from the distro?


> Awesome, glad to hear it! I hadn't been able to reproduce the issue
> outside of Facebook. Can I add your tested-by?
Sure, but better use my other mail address for it, if you don't mind:
Christoph Anton Mitterer 


> > I assume you'll take care to get that patch into stable kernels?
> > Is this patch alone enough to recommend the Debian maintainers to
> > include it into their 4.9 long term stable kernels?
> 
> I'll mark it for stable, assuming Debian tracks the upstream LTS
> releases it should get in.
Okay :-)

Nevertheless I'll open a bug at their BTS, just to be safe.


Thanks :)

Chris.


Re: strange No space left on device issues

2017-07-20 Thread Omar Sandoval

On Thu, Jul 20, 2017 at 11:33:56PM +0200, Christoph Anton Mitterer wrote:
> On Thu, 2017-07-20 at 10:32 -0700, Omar Sandoval wrote:
> > If that doesn't work, could you please also try
> > https://patchwork.kernel.org/patch/9829593/?
> 
> Okay, tried the patch now, applied upon:
> Linux 4.12.0-trunk-amd64 #1 SMP Debian 4.12.2-1~exp1 (2017-07-18) x86_64 
> GNU/Linux
> (that is the Debian source package, with all their further patches and
> their kernel config).
> 
> with the parameters at their defaults:
> # sysctl vm.dirty_bytes
> vm.dirty_bytes = 0
> # sysctl vm.dirty_background_bytes
> vm.dirty_background_bytes = 0

Just to be sure, did you explicitly write 0 to these? These sysctls are
really confusing, see https://www.kernel.org/doc/Documentation/sysctl/vm.txt.
Basically, there are two ways to specify these, either as a ratio of
system memory (vm.dirty_ratio and vm.dirty_background_ratio) or a static
number of bytes (vm.dirty_bytes and vm.dirty_background_bytes). If you
set one, the other appears as 0, and the kernel sets the ratios by
default. But if you explicitly set them to 0, the kernel is going to
flush stuff extremely aggressively.
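
To make the bytes-versus-ratio pairing described above concrete, here is a small sketch that reports which variant of each threshold is in effect. It assumes the four values live under /proc/sys/vm as on a typical Linux system. The helper names `read_vm_sysctls` and `describe` are made up for illustration, and the "both 0" branch simply restates the warning above rather than anything verified against the kernel source.

```python
from pathlib import Path

NAMES = ("dirty_bytes", "dirty_ratio",
         "dirty_background_bytes", "dirty_background_ratio")


def read_vm_sysctls(base="/proc/sys/vm"):
    """Read the four writeback thresholds as integers (Linux only)."""
    return {name: int((Path(base) / name).read_text()) for name in NAMES}


def describe(values, prefix):
    """Say whether the bytes or the ratio variant of one threshold is in effect."""
    nbytes = values[f"{prefix}_bytes"]
    ratio = values[f"{prefix}_ratio"]
    if nbytes:
        return f"vm.{prefix}_bytes = {nbytes} (bytes limit active)"
    if ratio:
        return f"vm.{prefix}_ratio = {ratio}% (ratio active, bytes reads as 0)"
    return f"vm.{prefix}_*: both 0, expect very aggressive flushing"


# Default values as they appear elsewhere in this thread; on a live system
# use `values = read_vm_sysctls()` instead.
values = {"dirty_bytes": 0, "dirty_ratio": 20,
          "dirty_background_bytes": 0, "dirty_background_ratio": 10}
for prefix in ("dirty", "dirty_background"):
    print(describe(values, prefix))
```

Seeing `vm.dirty_bytes = 0` on its own is therefore not alarming; it only matters if the matching ratio is 0 as well.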

> Tried copying the whole image three times (before, I hadn't had a single
> copy of the whole image without at least one error, so that should be
> "proof" enough that it fixes the issue) upon the btrfs fs,... no errors
> this time...
> 
> Looks good :-)

Awesome, glad to hear it! I hadn't been able to reproduce the issue
outside of Facebook. Can I add your tested-by?

> I assume you'll take care to get that patch into stable kernels?
> Is this patch alone enough to recommend the Debian maintainers to
> include it into their 4.9 long term stable kernels?

I'll mark it for stable, assuming Debian tracks the upstream LTS
releases it should get in.

> And would you recommend this as an "urgent" fix?

This bug has been around since 4.8, so it's not _that_ urgent, but it
sucks.

Thanks!

> Cheers,
> Chris.


Re: strange No space left on device issues

2017-07-20 Thread Christoph Anton Mitterer

On Thu, 2017-07-20 at 10:32 -0700, Omar Sandoval wrote:
> If that doesn't work, could you please also try
> https://patchwork.kernel.org/patch/9829593/?

Okay, tried the patch now, applied upon:
Linux 4.12.0-trunk-amd64 #1 SMP Debian 4.12.2-1~exp1 (2017-07-18) x86_64 GNU/Linux
(that is the Debian source package, with all their further patches and
their kernel config).

with the parameters at their defaults:
# sysctl vm.dirty_bytes
vm.dirty_bytes = 0
# sysctl vm.dirty_background_bytes
vm.dirty_background_bytes = 0

Tried copying the whole image three times (before, I hadn't had a single
copy of the whole image without at least one error, so that should be
"proof" enough that it fixes the issue) upon the btrfs fs,... no errors
this time...

Looks good :-)

I assume you'll take care to get that patch into stable kernels?
Is this patch alone enough to recommend the Debian maintainers to
include it into their 4.9 long term stable kernels?

And would you recommend this as an "urgent" fix?

Cheers,
Chris.


Re: strange No space left on device issues

2017-07-20 Thread Omar Sandoval

On Thu, Jul 20, 2017 at 10:28:15PM +0200, Christoph Anton Mitterer wrote:
> On Thu, 2017-07-20 at 11:14 -0700, Omar Sandoval wrote:
> > Yes, that's a safe enough workaround. It's a good idea to change the
> > parameters back after the copy.
> you mean even without having the fix, right?

Yes, even without the fix you should be okay.

> So AFAIU, the bug doesn't really cause FS corruption, but just "false"
> ENOSPC and these happen during heavy meta-data creation (e.g. during
> operations like mine) only?

Right.

Re: strange No space left on device issues

2017-07-20 Thread Christoph Anton Mitterer

On Thu, 2017-07-20 at 11:14 -0700, Omar Sandoval wrote:
> Yes, that's a safe enough workaround. It's a good idea to change the
> parameters back after the copy.
you mean even without having the fix, right?

So AFAIU, the bug doesn't really cause FS corruption, but just "false"
ENOSPC and these happen during heavy meta-data creation (e.g. during
operations like mine) only?


Re: strange No space left on device issues

2017-07-20 Thread Omar Sandoval

On Thu, Jul 20, 2017 at 08:06:52PM +0200, Christoph Anton Mitterer wrote:
> On Thu, 2017-07-20 at 10:55 -0700, Omar Sandoval wrote:
> > Against 4.12 would be best, thanks!
> okay,.. but that will take a while to compile...
> 
> 
> in the meantime... do you know whether it's more or less safe to use
> the 4.9 kernel without any fix, when I change the parameters mentioned
> before, during the massive copying?

Yes, that's a safe enough workaround. It's a good idea to change the
parameters back after the copy.
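
The workaround discussed here (lower the dirty thresholds for the duration of the copy, then put the old settings back) could be scripted roughly as follows. This is a sketch only: the helper name `temporary_dirty_limits` is made up, writes to /proc/sys/vm need root, and the values in the usage comment are the ones reported as working elsewhere in this thread.

```python
from contextlib import contextmanager
from pathlib import Path

VM = Path("/proc/sys/vm")


def read_sysctl(name):
    return int((VM / name).read_text())


def write_sysctl(name, value):
    (VM / name).write_text(str(value))  # needs root


@contextmanager
def temporary_dirty_limits(dirty_bytes, background_bytes,
                           read=read_sysctl, write=write_sysctl):
    """Lower the writeback thresholds for the duration of a `with` block."""
    pairs = (("dirty_bytes", "dirty_ratio", dirty_bytes),
             ("dirty_background_bytes", "dirty_background_ratio",
              background_bytes))
    # Remember all four values before touching anything.
    saved = {name: read(name) for b, r, _ in pairs for name in (b, r)}
    try:
        for bytes_name, _, value in pairs:
            write(bytes_name, value)
        yield
    finally:
        # Setting one of a pair makes its counterpart read as 0, so put back
        # whichever of the two was non-zero before. If both were 0, nothing
        # is restored for that pair.
        for bytes_name, ratio_name, _ in pairs:
            if saved[bytes_name]:
                write(bytes_name, saved[bytes_name])
            elif saved[ratio_name]:
                write(ratio_name, saved[ratio_name])


# Usage (as root), with the copy command from the original report:
#   import subprocess
#   with temporary_dirty_limits(125829120, 31457280):
#       subprocess.run(["cp", "-a", "/media/", "/mnt/subvol/"], check=True)
```

Restoring the ratio rather than writing 0 back to the bytes setting is deliberate: on a default system the ratio is the value that was in effect before the change.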

> Cheers,
> Chris.

Re: strange No space left on device issues

2017-07-20 Thread Christoph Anton Mitterer

On Thu, 2017-07-20 at 10:55 -0700, Omar Sandoval wrote:
> Against 4.12 would be best, thanks!
okay,.. but that will take a while to compile...


in the meantime... do you know whether it's more or less safe to use
the 4.9 kernel without any fix, when I change the parameters mentioned
before, during the massive copying?

Cheers,
Chris.


Re: strange No space left on device issues

2017-07-20 Thread Omar Sandoval

On Thu, Jul 20, 2017 at 07:48:24PM +0200, Christoph Anton Mitterer wrote:
> On Thu, 2017-07-20 at 10:32 -0700, Omar Sandoval wrote:
> > Could you try 4.12?
> Linux 4.12.0-trunk-amd64 #1 SMP Debian 4.12.2-1~exp1 (2017-07-18)
> x86_64 GNU/Linux
> from Debian experimental, doesn't fix the issue...

Okay, didn't think it would :)

> >  If that doesn't work, could you please also try
> > https://patchwork.kernel.org/patch/9829593/?
> Against 4.9?

Against 4.12 would be best, thanks!

Re: strange No space left on device issues

2017-07-20 Thread Christoph Anton Mitterer

On Thu, 2017-07-20 at 10:32 -0700, Omar Sandoval wrote:
> Could you try 4.12?
Linux 4.12.0-trunk-amd64 #1 SMP Debian 4.12.2-1~exp1 (2017-07-18)
x86_64 GNU/Linux
from Debian experimental, doesn't fix the issue...


>  If that doesn't work, could you please also try
> https://patchwork.kernel.org/patch/9829593/?
Against 4.9?


Cheers,
Chris.


Re: strange No space left on device issues

2017-07-20 Thread Omar Sandoval

On Thu, Jul 20, 2017 at 05:20:13PM +0200, Christoph Anton Mitterer wrote:
> On Thu, 2017-07-20 at 15:00 +, Martin Raiber wrote:
> > It would be interesting if lowering the dirty ratio is a viable
> > work-around (sysctl vm.dirty_background_bytes=314572800 && sysctl
> > vm.dirty_bytes=1258291200).
> > 
> > Regards,
> > Martin
> 
> I took away a trailing 0 for each of them... and then it goes through
> without error
> 
> sysctl vm.dirty_bytes=125829120
> vm.dirty_bytes = 125829120
> sysctl vm.dirty_background_bytes=31457280
> vm.dirty_background_bytes = 31457280
> 
> 
> But what does that mean now... could there be still any corruptions?
> And do you need to permanently set the value (until this is fixed in
> stable), or is this just necessary when I had this large copying
> operation?
> 
> 
> Cheers,
> Chris.

Could you try 4.12? If that doesn't work, could you please also try
https://patchwork.kernel.org/patch/9829593/?

Re: strange No space left on device issues

2017-07-20 Thread Christoph Anton Mitterer

On Thu, 2017-07-20 at 15:00 +, Martin Raiber wrote:
> It would be interesting if lowering the dirty ratio is a viable
> work-around (sysctl vm.dirty_background_bytes=314572800 && sysctl
> vm.dirty_bytes=1258291200).
> 
> Regards,
> Martin

I took away a trailing 0 for each of them... and then it goes through
without error

sysctl vm.dirty_bytes=125829120
vm.dirty_bytes = 125829120
sysctl vm.dirty_background_bytes=31457280
vm.dirty_background_bytes = 31457280
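
For orientation, the byte counts in this exchange are round numbers of MiB. A quick conversion, using only the values quoted in this message (the "suggested" and "used" labels are added here for readability):

```python
MIB = 1024 * 1024

settings = {
    "vm.dirty_background_bytes (suggested)": 314572800,
    "vm.dirty_bytes (suggested)": 1258291200,
    "vm.dirty_background_bytes (used)": 31457280,
    "vm.dirty_bytes (used)": 125829120,
}
for name, value in settings.items():
    print(f"{name}: {value} bytes = {value // MIB} MiB")
```

So the suggested limits were 300 MiB and 1200 MiB, and the values that let the copy go through were 30 MiB and 120 MiB.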

But what does that mean now... could there be still any corruptions?
And do you need to permanently set the value (until this is fixed in
stable), or is this just necessary when I had this large copying
operation?

Cheers,
Chris.


Re: strange No space left on device issues

2017-07-20 Thread Christoph Anton Mitterer

On Thu, 2017-07-20 at 15:00 +, Martin Raiber wrote:
> there are patches on this list/upstream which could fix this ( e.g.
> "fix
> delalloc accounting leak caused by u32 overflow"/"fix early ENOSPC
> due
> to delalloc").

mhh... it's a bit problematic to test these on those nodes...


> Do you use compression?

nope...


> It would be interesting if lowering the dirty ratio is a viable
> work-around (sysctl vm.dirty_background_bytes=314572800 && sysctl
> vm.dirty_bytes=1258291200).

doesn't seem to change anything.


Re: strange No space left on device issues

2017-07-20 Thread Christoph Anton Mitterer

Oh and I should add:
After such error, cp goes on copying (with other files)...

Same issue occurs when I do something like tar -cf - /media | tar -xf


Cheers,
Chris.


strange No space left on device issues

2017-07-20 Thread Christoph Anton Mitterer

Hey.

The following happens on Debian stretch systems:
# uname -a
Linux lcg-lrz-admin 4.9.0-3-amd64 #1 SMP Debian 4.9.30-2+deb9u2 (2017-06-26) x86_64 GNU/Linux

What I have are VMs, which run with root fs as ext4 and which I want to
migrate to btrfs.
So I've added further disk images and then something like this:
- mkfs.btrfs --nodiscard --label system /dev/sdc2 (i.e. the new image)
- mounted that at /mnt
- created a subvol "root" in it
- stopped all services on the node
- remount,ro /
- mount --bind / /media
- cp -a /media/ /mnt/subvol/
- and then I'd go on to move everything in place, install bootloader etc.

That used to always work, and does when I try the same with ext4
instead of btrfs on the new images.

But with btrfs I get spurious "No space" errors like:
cp: cannot create regular file '/mnt/root/X/media/usr/share/doc/openjdk-8-jre-headless/api/java/security/PrivilegedExceptionAction.html': No space left on device
cp: cannot create regular file '/mnt/root/X/media/usr/share/doc/openjdk-8-jre-headless/api/java/security/Provider.Service.html': No space left on device
cp: cannot create regular file '/mnt/root/X/media/usr/share/doc/openjdk-8-jre-headless/api/javax/script/AbstractScriptEngine.html': No space left on device

or:
cp: preserving permissions for ‘/mnt/root/X/usr/include/c++/6/gnu/javax/crypto/keyring/BaseKeyring.h’: No space left on device
cp: preserving permissions for ‘/mnt/root/X/usr/share/doc/cmake-data/html/variable/CMAKE_CXX_STANDARD_REQUIRED.html’: No space left on device


All these happen every time (when I create a fresh btrfs on the volume and
start over), with different files each time... and btrfs filesystem df shows
plenty of space left, more than 15GB.


Any ideas?

Cheers,
Chris.


Re: No space left on device when doing "mkdir"

2017-05-01 Thread Gerard Saraber

It did it again:
shrapnel share # touch test.txt
touch: cannot touch 'test.txt': No space left on device
shrapnel share # df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/root        35G   19G   15G  56% /
devtmpfs         10M     0   10M   0% /dev
tmpfs           3.2G  1.2M  3.2G   1% /run
shm              16G     0   16G   0% /dev/shm
cgroup_root      10M     0   10M   0% /sys/fs/cgroup
/dev/sdb         35T   22T   14T  62% /home/exports
shrapnel share # grep -IR . /sys/fs/btrfs/7af2e65c-3935-4e0d-aa63-9ef6be991cb9/allocation/
/sys/fs/btrfs/7af2e65c-3935-4e0d-aa63-9ef6be991cb9/allocation/system/flags:2
/sys/fs/btrfs/7af2e65c-3935-4e0d-aa63-9ef6be991cb9/allocation/system/raid1/used_bytes:3997696
/sys/fs/btrfs/7af2e65c-3935-4e0d-aa63-9ef6be991cb9/allocation/system/raid1/total_bytes:33554432
/sys/fs/btrfs/7af2e65c-3935-4e0d-aa63-9ef6be991cb9/allocation/system/bytes_pinned:0
/sys/fs/btrfs/7af2e65c-3935-4e0d-aa63-9ef6be991cb9/allocation/system/disk_total:67108864
/sys/fs/btrfs/7af2e65c-3935-4e0d-aa63-9ef6be991cb9/allocation/system/bytes_may_use:0
/sys/fs/btrfs/7af2e65c-3935-4e0d-aa63-9ef6be991cb9/allocation/system/bytes_readonly:0
/sys/fs/btrfs/7af2e65c-3935-4e0d-aa63-9ef6be991cb9/allocation/system/bytes_used:3997696
/sys/fs/btrfs/7af2e65c-3935-4e0d-aa63-9ef6be991cb9/allocation/system/bytes_reserved:0
/sys/fs/btrfs/7af2e65c-3935-4e0d-aa63-9ef6be991cb9/allocation/system/disk_used:7995392
/sys/fs/btrfs/7af2e65c-3935-4e0d-aa63-9ef6be991cb9/allocation/system/total_bytes_pinned:0
/sys/fs/btrfs/7af2e65c-3935-4e0d-aa63-9ef6be991cb9/allocation/system/total_bytes:33554432
/sys/fs/btrfs/7af2e65c-3935-4e0d-aa63-9ef6be991cb9/allocation/metadata/flags:4
/sys/fs/btrfs/7af2e65c-3935-4e0d-aa63-9ef6be991cb9/allocation/metadata/raid1/used_bytes:66595684352
/sys/fs/btrfs/7af2e65c-3935-4e0d-aa63-9ef6be991cb9/allocation/metadata/raid1/total_bytes:280246616064
/sys/fs/btrfs/7af2e65c-3935-4e0d-aa63-9ef6be991cb9/allocation/metadata/bytes_pinned:835584
/sys/fs/btrfs/7af2e65c-3935-4e0d-aa63-9ef6be991cb9/allocation/metadata/disk_total:560493232128
/sys/fs/btrfs/7af2e65c-3935-4e0d-aa63-9ef6be991cb9/allocation/metadata/bytes_may_use:2014478974976
/sys/fs/btrfs/7af2e65c-3935-4e0d-aa63-9ef6be991cb9/allocation/metadata/bytes_readonly:0
/sys/fs/btrfs/7af2e65c-3935-4e0d-aa63-9ef6be991cb9/allocation/metadata/bytes_used:66595684352
/sys/fs/btrfs/7af2e65c-3935-4e0d-aa63-9ef6be991cb9/allocation/metadata/bytes_reserved:16384
/sys/fs/btrfs/7af2e65c-3935-4e0d-aa63-9ef6be991cb9/allocation/metadata/disk_used:133191368704
/sys/fs/btrfs/7af2e65c-3935-4e0d-aa63-9ef6be991cb9/allocation/metadata/total_bytes_pinned:1048576
/sys/fs/btrfs/7af2e65c-3935-4e0d-aa63-9ef6be991cb9/allocation/metadata/total_bytes:280246616064
/sys/fs/btrfs/7af2e65c-3935-4e0d-aa63-9ef6be991cb9/allocation/global_rsv_size:536870912
/sys/fs/btrfs/7af2e65c-3935-4e0d-aa63-9ef6be991cb9/allocation/data/flags:1
/sys/fs/btrfs/7af2e65c-3935-4e0d-aa63-9ef6be991cb9/allocation/data/raid1/used_bytes:23249396273152
/sys/fs/btrfs/7af2e65c-3935-4e0d-aa63-9ef6be991cb9/allocation/data/raid1/total_bytes:23320598675456
/sys/fs/btrfs/7af2e65c-3935-4e0d-aa63-9ef6be991cb9/allocation/data/bytes_pinned:1835008
/sys/fs/btrfs/7af2e65c-3935-4e0d-aa63-9ef6be991cb9/allocation/data/disk_total:46641197350912
/sys/fs/btrfs/7af2e65c-3935-4e0d-aa63-9ef6be991cb9/allocation/data/bytes_may_use:262144
/sys/fs/btrfs/7af2e65c-3935-4e0d-aa63-9ef6be991cb9/allocation/data/bytes_readonly:1769472
/sys/fs/btrfs/7af2e65c-3935-4e0d-aa63-9ef6be991cb9/allocation/data/bytes_used:23249396273152
/sys/fs/btrfs/7af2e65c-3935-4e0d-aa63-9ef6be991cb9/allocation/data/bytes_reserved:0
/sys/fs/btrfs/7af2e65c-3935-4e0d-aa63-9ef6be991cb9/allocation/data/disk_used:46498792546304
/sys/fs/btrfs/7af2e65c-3935-4e0d-aa63-9ef6be991cb9/allocation/data/total_bytes_pinned:2097152
/sys/fs/btrfs/7af2e65c-3935-4e0d-aa63-9ef6be991cb9/allocation/data/total_bytes:23320598675456
/sys/fs/btrfs/7af2e65c-3935-4e0d-aa63-9ef6be991cb9/allocation/global_rsv_reserved:536018944
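
One number in the dump above stands out: metadata bytes_may_use is 2014478974976 (about 1.83 TiB) while metadata total_bytes is 280246616064 (261 GiB). In the 2017-04-28 dump, taken while things were working, bytes_may_use was 739508224 (about 705 MiB). The sketch below parses such a dump and adds up the per-type counters so that they can be compared with total_bytes. The comparison is a heuristic chosen for illustration, not the kernel's own ENOSPC check. As far as I understand, metadata reservations may overcommit, so exceeding total_bytes is not an error by itself, but a reservation several times larger than the allocated space looks like leaked accounting. The function names are made up.

```python
import re

# Metadata lines copied from the 2017-05-01 dump above.
DUMP = """\
/sys/fs/btrfs/7af2e65c-3935-4e0d-aa63-9ef6be991cb9/allocation/metadata/bytes_pinned:835584
/sys/fs/btrfs/7af2e65c-3935-4e0d-aa63-9ef6be991cb9/allocation/metadata/bytes_may_use:2014478974976
/sys/fs/btrfs/7af2e65c-3935-4e0d-aa63-9ef6be991cb9/allocation/metadata/bytes_readonly:0
/sys/fs/btrfs/7af2e65c-3935-4e0d-aa63-9ef6be991cb9/allocation/metadata/bytes_used:66595684352
/sys/fs/btrfs/7af2e65c-3935-4e0d-aa63-9ef6be991cb9/allocation/metadata/bytes_reserved:16384
/sys/fs/btrfs/7af2e65c-3935-4e0d-aa63-9ef6be991cb9/allocation/metadata/total_bytes:280246616064
"""

COUNTERS = ("bytes_used", "bytes_reserved", "bytes_pinned",
            "bytes_readonly", "bytes_may_use")


def parse(dump):
    """Map each block group type (data, metadata, system) to its counters."""
    info = {}
    for line in dump.splitlines():
        # Only top-level counters; per-profile files (raid1/...) are skipped.
        m = re.search(r"/allocation/(\w+)/(\w+):(\d+)$", line.strip())
        if m:
            kind, field, value = m.groups()
            info.setdefault(kind, {})[field] = int(value)
    return info


def accounted(fields):
    """Sum of everything counted against this block group type."""
    return sum(fields.get(name, 0) for name in COUNTERS)


for kind, fields in parse(DUMP).items():
    total = fields["total_bytes"]
    ratio = accounted(fields) / total
    flag = "  <-- more than allocated" if ratio > 1 else ""
    print(f"{kind}: accounted {accounted(fields)} of {total} bytes "
          f"({ratio:.2f}x){flag}")
```

Feeding the whole `grep -IR .` output to `parse` works the same way; the global_rsv_* lines and the raid1/ sub-entries simply do not match the pattern.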

On Fri, Apr 28, 2017 at 8:56 AM, Gerard Saraber  wrote:
> Dmarc is off, here's the output of the allocations: it's working
> correctly right now, I'll update when it does it again.
>
> /sys/fs/btrfs/7af2e65c-3935-4e0d-aa63-9ef6be991cb9/allocation/system/flags:2
> /sys/fs/btrfs/7af2e65c-3935-4e0d-aa63-9ef6be991cb9/allocation/system/raid1/used_bytes:3948544
> /sys/fs/btrfs/7af2e65c-3935-4e0d-aa63-9ef6be991cb9/allocation/system/raid1/total_bytes:33554432
> /sys/fs/btrfs/7af2e65c-3935-4e0d-aa63-9ef6be991cb9/allocation/system/bytes_pinned:0
> /sys/fs/btrfs/7af2e65c-3935-4e0d-aa63-9ef6be991cb9/allocation/system/disk_total:67108864
> /sys/fs/btrfs/7af2e65c-3935-4e0d-aa63-9ef6be991cb9/allocation/system/bytes_may_use:0
> /sys/fs/btrfs/7af2e65c-3935-4e0d-aa63-9ef6be991cb9/allocation/system/bytes_readonly:0
> /sys/fs/btrfs/7af2e65c-3935-4e0d-aa63-9ef6be991cb9/allocation/system/bytes_used:3948544
> /sys/fs/btrfs/7af2e65c-3935-4e0d-aa63-9e

Re: No space left on device when doing "mkdir"

2017-04-28 Thread Gerard Saraber

Dmarc is off, here's the output of the allocations: it's working
correctly right now, I'll update when it does it again.

/sys/fs/btrfs/7af2e65c-3935-4e0d-aa63-9ef6be991cb9/allocation/system/flags:2
/sys/fs/btrfs/7af2e65c-3935-4e0d-aa63-9ef6be991cb9/allocation/system/raid1/used_bytes:3948544
/sys/fs/btrfs/7af2e65c-3935-4e0d-aa63-9ef6be991cb9/allocation/system/raid1/total_bytes:33554432
/sys/fs/btrfs/7af2e65c-3935-4e0d-aa63-9ef6be991cb9/allocation/system/bytes_pinned:0
/sys/fs/btrfs/7af2e65c-3935-4e0d-aa63-9ef6be991cb9/allocation/system/disk_total:67108864
/sys/fs/btrfs/7af2e65c-3935-4e0d-aa63-9ef6be991cb9/allocation/system/bytes_may_use:0
/sys/fs/btrfs/7af2e65c-3935-4e0d-aa63-9ef6be991cb9/allocation/system/bytes_readonly:0
/sys/fs/btrfs/7af2e65c-3935-4e0d-aa63-9ef6be991cb9/allocation/system/bytes_used:3948544
/sys/fs/btrfs/7af2e65c-3935-4e0d-aa63-9ef6be991cb9/allocation/system/bytes_reserved:0
/sys/fs/btrfs/7af2e65c-3935-4e0d-aa63-9ef6be991cb9/allocation/system/disk_used:7897088
/sys/fs/btrfs/7af2e65c-3935-4e0d-aa63-9ef6be991cb9/allocation/system/total_bytes_pinned:0
/sys/fs/btrfs/7af2e65c-3935-4e0d-aa63-9ef6be991cb9/allocation/system/total_bytes:33554432
/sys/fs/btrfs/7af2e65c-3935-4e0d-aa63-9ef6be991cb9/allocation/metadata/flags:4
/sys/fs/btrfs/7af2e65c-3935-4e0d-aa63-9ef6be991cb9/allocation/metadata/raid1/used_bytes:65864957952
/sys/fs/btrfs/7af2e65c-3935-4e0d-aa63-9ef6be991cb9/allocation/metadata/raid1/total_bytes:83751862272
/sys/fs/btrfs/7af2e65c-3935-4e0d-aa63-9ef6be991cb9/allocation/metadata/bytes_pinned:0
/sys/fs/btrfs/7af2e65c-3935-4e0d-aa63-9ef6be991cb9/allocation/metadata/disk_total:167503724544
/sys/fs/btrfs/7af2e65c-3935-4e0d-aa63-9ef6be991cb9/allocation/metadata/bytes_may_use:739508224
/sys/fs/btrfs/7af2e65c-3935-4e0d-aa63-9ef6be991cb9/allocation/metadata/bytes_readonly:0
/sys/fs/btrfs/7af2e65c-3935-4e0d-aa63-9ef6be991cb9/allocation/metadata/bytes_used:65864957952
/sys/fs/btrfs/7af2e65c-3935-4e0d-aa63-9ef6be991cb9/allocation/metadata/bytes_reserved:1835008
/sys/fs/btrfs/7af2e65c-3935-4e0d-aa63-9ef6be991cb9/allocation/metadata/disk_used:131729915904
/sys/fs/btrfs/7af2e65c-3935-4e0d-aa63-9ef6be991cb9/allocation/metadata/total_bytes_pinned:1884160
/sys/fs/btrfs/7af2e65c-3935-4e0d-aa63-9ef6be991cb9/allocation/metadata/total_bytes:83751862272
/sys/fs/btrfs/7af2e65c-3935-4e0d-aa63-9ef6be991cb9/allocation/global_rsv_size:536870912
/sys/fs/btrfs/7af2e65c-3935-4e0d-aa63-9ef6be991cb9/allocation/data/flags:1
/sys/fs/btrfs/7af2e65c-3935-4e0d-aa63-9ef6be991cb9/allocation/data/raid1/used_bytes:23029876707328
/sys/fs/btrfs/7af2e65c-3935-4e0d-aa63-9ef6be991cb9/allocation/data/raid1/total_bytes:23175643529216
/sys/fs/btrfs/7af2e65c-3935-4e0d-aa63-9ef6be991cb9/allocation/data/bytes_pinned:0
/sys/fs/btrfs/7af2e65c-3935-4e0d-aa63-9ef6be991cb9/allocation/data/disk_total:46351287058432
/sys/fs/btrfs/7af2e65c-3935-4e0d-aa63-9ef6be991cb9/allocation/data/bytes_may_use:36474880
/sys/fs/btrfs/7af2e65c-3935-4e0d-aa63-9ef6be991cb9/allocation/data/bytes_readonly:1703936
/sys/fs/btrfs/7af2e65c-3935-4e0d-aa63-9ef6be991cb9/allocation/data/bytes_used:23029876707328
/sys/fs/btrfs/7af2e65c-3935-4e0d-aa63-9ef6be991cb9/allocation/data/bytes_reserved:15003648
/sys/fs/btrfs/7af2e65c-3935-4e0d-aa63-9ef6be991cb9/allocation/data/disk_used:46059753414656
/sys/fs/btrfs/7af2e65c-3935-4e0d-aa63-9ef6be991cb9/allocation/data/total_bytes_pinned:0
/sys/fs/btrfs/7af2e65c-3935-4e0d-aa63-9ef6be991cb9/allocation/data/total_bytes:23175643529216
/sys/fs/btrfs/7af2e65c-3935-4e0d-aa63-9ef6be991cb9/allocation/global_rsv_reserved:536870912


On Thu, Apr 27, 2017 at 6:35 PM, Chris Murphy  wrote:
> On Thu, Apr 27, 2017 at 10:46 AM, Gerard Saraber  wrote:
>> After a reboot, I found this in the logs:
>> [  322.510152] BTRFS info (device sdm): The free space cache file
>> (36114966511616) is invalid. skip it
>> [  488.702570] btrfs_printk: 847 callbacks suppressed
>>
>>
>>
>> On Thu, Apr 27, 2017 at 10:18 AM, Gerard Saraber  wrote:
>>> no snapshots and no qgroups, just a straight up large volume.
>>>
>>> shrapnel gerard-store # btrfs fi df /home/exports
>>> Data, RAID1: total=20.93TiB, used=20.86TiB
>>> System, RAID1: total=32.00MiB, used=3.73MiB
>>> Metadata, RAID1: total=79.00GiB, used=61.10GiB
>>> GlobalReserve, single: total=512.00MiB, used=544.00KiB
>>>
>>> shrapnel gerard-store # btrfs filesystem usage /home/exports
>>> Overall:
>>> Device size:  69.13TiB
>>> Device allocated: 42.01TiB
>>> Device unallocated:   27.13TiB
>>> Device missing:  0.00B
>>> Used: 41.84TiB
>>> Free (estimated): 13.63TiB  (min: 13.63TiB)
>>> Data ratio:   2.00
>>> Metadata ratio:   2.00
>>> Global reserve:  512.00MiB  (used: 1.52MiB)
>>>
>>> On Thu, Apr 27, 2017 at 9:07 AM, Roman Mamedov  wrote:
>>>> On Thu, 27 Apr 2017 08:52:30 -0500
>>>> Gerard Saraber  wrote:
>>>>
>>>>> I could just reboot the sy

Re: No space left on device when doing "mkdir"

2017-04-27 Thread Chris Murphy

On Thu, Apr 27, 2017 at 10:46 AM, Gerard Saraber  wrote:
> After a reboot, I found this in the logs:
> [  322.510152] BTRFS info (device sdm): The free space cache file
> (36114966511616) is invalid. skip it
> [  488.702570] btrfs_printk: 847 callbacks suppressed
>
>
>
> On Thu, Apr 27, 2017 at 10:18 AM, Gerard Saraber  wrote:
>> no snapshots and no qgroups, just a straight up large volume.
>>
>> shrapnel gerard-store # btrfs fi df /home/exports
>> Data, RAID1: total=20.93TiB, used=20.86TiB
>> System, RAID1: total=32.00MiB, used=3.73MiB
>> Metadata, RAID1: total=79.00GiB, used=61.10GiB
>> GlobalReserve, single: total=512.00MiB, used=544.00KiB
>>
>> shrapnel gerard-store # btrfs filesystem usage /home/exports
>> Overall:
>> Device size:  69.13TiB
>> Device allocated: 42.01TiB
>> Device unallocated:   27.13TiB
>> Device missing:  0.00B
>> Used: 41.84TiB
>> Free (estimated): 13.63TiB  (min: 13.63TiB)
>> Data ratio:   2.00
>> Metadata ratio:   2.00
>> Global reserve:  512.00MiB  (used: 1.52MiB)
>>
>> On Thu, Apr 27, 2017 at 9:07 AM, Roman Mamedov  wrote:
>>> On Thu, 27 Apr 2017 08:52:30 -0500
>>> Gerard Saraber  wrote:
>>>
>>>> I could just reboot the system and be fine for a week or so, but is
>>>> there any way to diagnose this?
>>>
>>> `btrfs fi df` for a start.
>>>
>>> Also obligatory questions: do you have a lot of snapshots, and do you use
>>> qgroups?
>>>

A dev might find this helpful
$ grep -IR . /sys/fs/btrfs/usevolumeUUIDhere/allocation/
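
If the output needs to be collected from a script rather than a shell, roughly the same dump can be produced in Python. A minimal sketch, assuming the sysfs layout shown in this thread; `dump_allocation` is a made-up name, and the UUID in the usage comment is a placeholder to be replaced with the real volume UUID.

```python
from pathlib import Path


def dump_allocation(root):
    """Yield 'path:line' for every non-empty line of every readable text
    file under root, similar to `grep -IR . root`."""
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        try:
            text = path.read_text()
        except (OSError, UnicodeDecodeError):
            continue  # unreadable or binary file: skip it
        for line in text.splitlines():
            if line:
                yield f"{path}:{line}"


# Usage:
#   for entry in dump_allocation("/sys/fs/btrfs/<volume-UUID>/allocation"):
#       print(entry)
```

The entries come out sorted by path, which makes two dumps taken at different times easy to diff.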


Also note that a lot of people on Btrfs aren't getting Gerard's
emails, because anyone using gmail and some other agents see it as
spam because of DMARC failure. Basically rarcoa.com is configured to
tell mail senders to fail to (re)send emails, they can only be sent
from raroa.com. Anyway, I think this is supposed to be fixed in
mailing list servers, they need to strip these headers and insert
their own rather than leaving them intact only later to get rejected
due to honoring the header's stated policy.

-- 
Chris Murphy

Re: No space left on device when doing "mkdir"

2017-04-27 Thread Gerard Saraber

After a reboot, I found this in the logs:
[  322.510152] BTRFS info (device sdm): The free space cache file
(36114966511616) is invalid. skip it
[  488.702570] btrfs_printk: 847 callbacks suppressed



On Thu, Apr 27, 2017 at 10:18 AM, Gerard Saraber  wrote:
> no snapshots and no qgroups, just a straight up large volume.
>
> shrapnel gerard-store # btrfs fi df /home/exports
> Data, RAID1: total=20.93TiB, used=20.86TiB
> System, RAID1: total=32.00MiB, used=3.73MiB
> Metadata, RAID1: total=79.00GiB, used=61.10GiB
> GlobalReserve, single: total=512.00MiB, used=544.00KiB
>
> shrapnel gerard-store # btrfs filesystem usage /home/exports
> Overall:
> Device size:  69.13TiB
> Device allocated: 42.01TiB
> Device unallocated:   27.13TiB
> Device missing:  0.00B
> Used: 41.84TiB
> Free (estimated): 13.63TiB  (min: 13.63TiB)
> Data ratio:   2.00
> Metadata ratio:   2.00
> Global reserve:  512.00MiB  (used: 1.52MiB)
>
> On Thu, Apr 27, 2017 at 9:07 AM, Roman Mamedov  wrote:
>> On Thu, 27 Apr 2017 08:52:30 -0500
>> Gerard Saraber  wrote:
>>
>>> I could just reboot the system and be fine for a week or so, but is
>>> there any way to diagnose this?
>>
>> `btrfs fi df` for a start.
>>
>> Also obligatory questions: do you have a lot of snapshots, and do you use
>> qgroups?
>>
>> --
>> With respect,
>> Roman

Re: No space left on device when doing "mkdir"

2017-04-27 Thread Gerard Saraber

no snapshots and no qgroups, just a straight up large volume.

shrapnel gerard-store # btrfs fi df /home/exports
Data, RAID1: total=20.93TiB, used=20.86TiB
System, RAID1: total=32.00MiB, used=3.73MiB
Metadata, RAID1: total=79.00GiB, used=61.10GiB
GlobalReserve, single: total=512.00MiB, used=544.00KiB

shrapnel gerard-store # btrfs filesystem usage /home/exports
Overall:
Device size:  69.13TiB
Device allocated: 42.01TiB
Device unallocated:   27.13TiB
Device missing:  0.00B
Used: 41.84TiB
Free (estimated): 13.63TiB  (min: 13.63TiB)
Data ratio:   2.00
Metadata ratio:   2.00
Global reserve:  512.00MiB  (used: 1.52MiB)
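As a rough cross-check of the figures above (a sketch, not btrfs-progs code), the snippet below parses the `btrfs fi df` lines from this message and reports how full each chunk type is. The `estimate_free` helper is hypothetical: it assumes that "Free (estimated)" is approximately the slack inside the allocated data chunks plus the unallocated raw space divided by the data ratio. With the numbers posted here, that assumption lands within rounding of the 13.63TiB shown.

```python
import re

# Output copied from the message above.
FI_DF = """\
Data, RAID1: total=20.93TiB, used=20.86TiB
System, RAID1: total=32.00MiB, used=3.73MiB
Metadata, RAID1: total=79.00GiB, used=61.10GiB
GlobalReserve, single: total=512.00MiB, used=544.00KiB
"""

UNITS = {"B": 1, "KiB": 2**10, "MiB": 2**20, "GiB": 2**30, "TiB": 2**40}
LINE = re.compile(r"^(\w+), (\w+): total=([\d.]+)(\w+), used=([\d.]+)(\w+)$")


def parse_fi_df(text):
    """Return {chunk_type: (profile, total_bytes, used_bytes)}."""
    out = {}
    for line in text.splitlines():
        m = LINE.match(line.strip())
        if not m:
            continue
        kind, profile, total, tunit, used, uunit = m.groups()
        out[kind] = (profile,
                     float(total) * UNITS[tunit],
                     float(used) * UNITS[uunit])
    return out


def estimate_free(data_total, data_used, unallocated_raw, data_ratio):
    """Hypothetical helper: slack in data chunks + unallocated / ratio.

    This is an assumption about how the estimate is formed, checked only
    against the numbers in this thread.
    """
    return (data_total - data_used) + unallocated_raw / data_ratio


info = parse_fi_df(FI_DF)
for kind, (profile, total, used) in info.items():
    print(f"{kind:14s} {profile:7s} {100 * used / total:5.1f}% used")

_, data_total, data_used = info["Data"]
free = estimate_free(data_total, data_used, 27.13 * UNITS["TiB"], 2.0)
print(f"estimated free: {free / UNITS['TiB']:.2f} TiB")
```

The point of the exercise: data chunks are almost completely full, but more than a third of the raw device space is still unallocated, so a genuine lack of space does not explain the failing mkdir.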

On Thu, Apr 27, 2017 at 9:07 AM, Roman Mamedov  wrote:
> On Thu, 27 Apr 2017 08:52:30 -0500
> Gerard Saraber  wrote:
>
>> I could just reboot the system and be fine for a week or so, but is
>> there any way to diagnose this?
>
> `btrfs fi df` for a start.
>
> Also obligatory questions: do you have a lot of snapshots, and do you use
> qgroups?
>
> --
> With respect,
> Roman

Re: No space left on device when doing "mkdir"

2017-04-27 Thread Roman Mamedov

On Thu, 27 Apr 2017 08:52:30 -0500
Gerard Saraber  wrote:

> I could just reboot the system and be fine for a week or so, but is
> there any way to diagnose this?

`btrfs fi df` for a start.

Also obligatory questions: do you have a lot of snapshots, and do you use
qgroups?

-- 
With respect,
Roman

No space left on device when doing "mkdir"

2017-04-27 Thread Gerard Saraber

Hi everyone,

I'm running: Linux shrapnel 4.11.0-rc8 #2 SMP Mon Apr 24 08:47:46 CDT
2017 x86_64 Intel(R) Xeon(R) CPU 5140 @ 2.33GHz GenuineIntel GNU/Linux
32GB memory

There's a kworker taking 100% cpu:
30856 root  20   0   0  0  0 R 100.0  0.0  19:26.90
kworker/u8:0

and writing about 3-5MB/sec according to iotop:
30856 be/4 root0.00 B/s5.07 M/s  0.00 %  0.00 % [kworker/u8:0]

My filesystem is a Raid1 and has plenty of space:
/dev/sdb 35T   21T   14T  61% /home/exports

and I am using NFS to write to the volume, a lot

Label: none  uuid: 7af2e65c-3935-4e0d-aa63-9ef6be991cb9
Total devices 18 FS bytes used 20.92TiB
devid2 size 3.64TiB used 2.24TiB path /dev/sdb
devid3 size 3.64TiB used 2.46TiB path /dev/sdc
devid4 size 3.64TiB used 2.00TiB path /dev/sde
devid7 size 2.73TiB used 1.42TiB path /dev/sdl
devid8 size 2.73TiB used 1.12TiB path /dev/sdn
devid9 size 3.64TiB used 2.35TiB path /dev/sdq
devid   10 size 3.64TiB used 2.07TiB path /dev/sdr
devid   11 size 5.46TiB used 4.19TiB path /dev/sda
devid   12 size 5.46TiB used 4.23TiB path /dev/sdf
devid   13 size 5.46TiB used 4.08TiB path /dev/sdh
devid   14 size 3.64TiB used 1.98TiB path /dev/sdo
devid   15 size 2.73TiB used 1.07TiB path /dev/sdm
devid   17 size 5.46TiB used 3.80TiB path /dev/sdd
devid   18 size 2.73TiB used 1.07TiB path /dev/sdj
devid   19 size 2.73TiB used 1.07TiB path /dev/sdg
devid   20 size 2.73TiB used 1.07TiB path /dev/sdk
devid   21 size 3.64TiB used 1.98TiB path /dev/sdp
devid   22 size 5.46TiB used 3.80TiB path /dev/sds

The problem:

shrapnel gerard-store # mkdir gopro
mkdir: cannot create directory 'gopro': No space left on device

^^ that error message pops up after 10-15 seconds.

I could just reboot the system and be fine for a week or so, but is
there any way to diagnose this?

Thanks!
-Gerard

"btrfs receive" throws "No space left on device" on empty and large enough fs

2017-02-14 Thread Luca Citi


Hi all

I recently submitted a bug report to launchpad ( 
https://bugs.launchpad.net/ubuntu/+source/btrfs-tools/+bug/1664013 ) but 
I now found out about this mailing list, which may be a better place to 
post.


Basically, whenever many files are created at high throughput in my 
empty btrfs file system, it throws an ENOSPC error. I have had this 
happen with both "btrfs receive" and with "rsync" but not when creating 
a single large file. Both "rsync" and "btrfs receive" fail not when 
creating a new file but when renaming/moving it (see link).


I hope the description of the error in launchpad is helpful (please let 
me know if I need to re-post it here).
I am happy to provide additional information or run other tests if this 
can help.


Thanks!
Luca

Re: [SOLVED] BTRFS constantly reports "No space left on device" even with a huge unallocated space

2016-09-23 Thread Ronan Arraes Jardim Chagas

Hi guys!

After a week without experiencing the problem, I think we can mark this
problem as solved. I want to thank all the devs on this list. You were
always very helpful. For anyone who is still experiencing the reported
problem, upgrade to kernel 4.7.3 and I think you will be fine :)

Best regards and thank you all,
Ronan Arraes

Re: BTRFS constantly reports "No space left on device" even with a huge unallocated space

2016-09-22 Thread Ronan Arraes Jardim Chagas

Hi Josef,

On Thu, 2016-09-22 at 13:49 -0400, Josef Bacik wrote:
> That patch fixed a problem where we would screw up the ENOSPC
> accounting, and 
> would slowly leak space into one of the counters.  So eventually (or
> often in 
> your case) you'd hit ENOSPC, but have plenty of space available.  If
> you 
> unmounted and mounted again, or simply rebooted, everything would
> have been 
> fine.  You can still use the fs, the accounting is purely in memory
> so it's not 
> like your FS is permanently screwed.  Thanks,


Thank you very much for the explanation. I am very glad it is finally
fixed here :)

Best regards,
Ronan Arraes

Re: BTRFS constantly reports "No space left on device" even with a huge unallocated space

2016-09-22 Thread Josef Bacik


On 09/22/2016 01:06 PM, Ronan Arraes Jardim Chagas wrote:

> Hi Josef,
>
> On Thu, 2016-09-22 at 10:39 -0400, Josef Bacik wrote:
>> This is what fixed it.  I thought it was in 4.7 which is why I
>> started paying
>> attention, but I guess I was wrong.  Glad your problem is
>> resolved.  Thanks,
>
> Do you have any explanations why the problem solved by the patch was
> causing me the ENOSPC? Also, is it necessary to format my partition or
> should I consider it good for use after the installation of the new
> kernel?


That patch fixed a problem where we would screw up the ENOSPC accounting, and 
would slowly leak space into one of the counters.  So eventually (or often in 
your case) you'd hit ENOSPC, but have plenty of space available.  If you 
unmounted and mounted again, or simply rebooted, everything would have been 
fine.  You can still use the fs, the accounting is purely in memory so it's not 
like your FS is permanently screwed.  Thanks,
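To make the explanation above concrete, here is a toy model of a leaking in-memory reservation counter. It is only an illustration under simplified assumptions: `ToySpaceInfo`, `reserve`, `release` and `remount` are made-up names, and none of this is the actual btrfs accounting code. Each reserve/release cycle forgets to give back one byte, so the counter creeps up until reservations fail even though nothing has been written; resetting the in-memory state (the remount) clears it.

```python
import errno


class ToySpaceInfo:
    """Toy model of in-memory space accounting; NOT the btrfs code."""

    def __init__(self, total):
        self.total = total    # bytes available in this space
        self.used = 0         # bytes actually written (persistent)
        self.may_use = 0      # outstanding reservations (memory only)

    def reserve(self, n):
        """Reserve n bytes, or fail with ENOSPC if they do not fit."""
        if self.used + self.may_use + n > self.total:
            raise OSError(errno.ENOSPC, "No space left on device")
        self.may_use += n

    def release(self, n, leak=0):
        """Give a reservation back; `leak` bytes are wrongly kept."""
        self.may_use -= n - leak

    def remount(self):
        """Reservations are not persisted, so a remount starts at zero."""
        self.may_use = 0


def cycles_until_enospc(space, size, leak):
    """Count reserve/release cycles until a reservation fails."""
    count = 0
    while True:
        try:
            space.reserve(size)
        except OSError:
            return count
        space.release(size, leak=leak)
        count += 1


space = ToySpaceInfo(total=1000)
cycles = cycles_until_enospc(space, size=100, leak=1)
print(f"ENOSPC after {cycles} cycles, used={space.used}, "
      f"may_use={space.may_use}")

space.remount()
space.reserve(100)  # works again after the in-memory state is reset
print(f"after remount: may_use={space.may_use}")
```

In the toy, `used` never moves off zero, which mirrors the symptom in this thread: ENOSPC with plenty of space, cured by a reboot.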


Josef


Re: BTRFS constantly reports "No space left on device" even with a huge unallocated space

2016-09-22 Thread Ronan Arraes Jardim Chagas

Hi Josef,

On Thu, 2016-09-22 at 10:39 -0400, Josef Bacik wrote:
> This is what fixed it.  I thought it was in 4.7 which is why I
> started paying 
> attention, but I guess I was wrong.  Glad your problem is
> resolved.  Thanks,

Do you have any explanations why the problem solved by the patch was
causing me the ENOSPC? Also, is it necessary to format my partition or
should I consider it good for use after the installation of the new
kernel?

Best regards,
Ronan Arraes

Re: BTRFS constantly reports "No space left on device" even with a huge unallocated space

2016-09-22 Thread Josef Bacik


On 09/22/2016 10:03 AM, Ronan Arraes Jardim Chagas wrote:

> On Thu, 2016-09-22 at 09:41 -0400, Austin S. Hemmelgarn wrote:
>> Most likely the kernel upgrade fixed things.  It's possible that the
>> large allocation is impacting something and making it work, but I
>> don't
>> think that that is very likely.
>
> The patches related to btrfs I could find in kernel 4.7.2 and 4.7.3
> changelog are:
>
> commit 8d32aaa89067225d4202a362dc201280e2514952
> Author: Chris Mason
> Date:   Tue Jul 19 05:52:36 2016 -0700
>
> Btrfs: fix delalloc accounting after copy_from_user faults


This is what fixed it.  I thought it was in 4.7 which is why I started paying 
attention, but I guess I was wrong.  Glad your problem is resolved.  Thanks,


Josef

Re: BTRFS constantly reports "No space left on device" even with a huge unallocated space

2016-09-22 Thread Ronan Arraes Jardim Chagas

On Thu, 2016-09-22 at 09:41 -0400, Austin S. Hemmelgarn wrote:
> Most likely the kernel upgrade fixed things.  It's possible that the 
> large allocation is impacting something and making it work, but I
> don't 
> think that that is very likely.

The patches related to btrfs I could find in kernel 4.7.2 and 4.7.3
changelog are:

commit 8d32aaa89067225d4202a362dc201280e2514952
Author: Chris Mason 
Date:   Tue Jul 19 05:52:36 2016 -0700

Btrfs: fix delalloc accounting after copy_from_user faults

commit f495a60eb6351bf2f29fdbc1854375df9fe4022b
Author: Paolo Valente 
Date:   Wed Jul 27 07:22:05 2016 +0200

block: add missing group association in bio-cloning functions
    Fixes: da2f0f74cf7d ("Btrfs: add support for blkio controllers")

commit ff3235105fc7e4ecf04eb308940821d4a098c08d
Author: Jeff Mahoney 
Date:   Wed Aug 17 21:58:33 2016 -0400

btrfs: don't create or leak aliased root while cleaning up orphans

commit 64563a38fde57a26f4d68d488d0d4918f843547c
Author: Jeff Mahoney 
Date:   Mon Aug 15 12:10:33 2016 -0400

btrfs: properly track when rescan worker is running

commit 69b69167965e108a775ef20decabcc76fbe4fc08
Author: Jeff Mahoney 
Date:   Mon Aug 8 22:08:06 2016 -0400

btrfs: waiting on qgroup rescan should not always be interruptible

Best regards,
Ronan Arraes

Re: BTRFS constantly reports "No space left on device" even with a huge unallocated space

2016-09-22 Thread Austin S. Hemmelgarn


On 2016-09-22 09:20, Ronan Arraes Jardim Chagas wrote:

> Guys,
>
> Something very strange happened. I have not seen the problem since
> Monday, which is pretty much the first time ever I work more than 3
> days without seeing it.
>
> Ok, it can be a coincidence. Notice that I did not change anything
> related to my work behavior. However, I did do two things:
>
> - Updated the kernel to 4.7.2; and
> - Created 50 dummy files of 3.0 GiB each.
>
> Can anyone, please, tell me if these things seem to be correlated?

Most likely the kernel upgrade fixed things.  It's possible that the 
large allocation is impacting something and making it work, but I don't 
think that that is very likely.



Re: BTRFS constantly reports "No space left on device" even with a huge unallocated space

2016-09-22 Thread Jeff Mahoney

On 9/18/16 10:38 PM, Wang Xiaoguang wrote:
> hi,
> 
> On 09/14/2016 10:25 PM, Jeff Mahoney wrote:
>> On 9/13/16 10:24 PM, Josef Bacik wrote:
>>> On 09/08/2016 07:02 PM, Jeff Mahoney wrote:
 On 9/8/16 2:49 PM, Jeff Mahoney wrote:
> On 9/8/16 2:24 PM, Ronan Arraes Jardim Chagas wrote:
>> Hi all!
>>
>> On Mon, 2016-09-05 at 16:49 +0800, Qu Wenruo wrote:
>>> Just like what Wang has mentioned, would you please paste all the
>>> output
>>> of the contents of /sys/fs/btrfs//allocation?
>>>
>>> It's recommended to use "grep . -IR " to get all the data as
>>> it
>>> will show the file name.
>> So, one more time, I see the problem. This time I was just using
>> Firefox and I cannot recover using `btrfs balance`. I think that, one
>> more time, I will need to reboot this machine. This problem is really
>> causing me a lot of troubles :(
> I have a hunch the list is about to be flooded with similar reports if
> we don't find this one before 4.8.
>
> commit d555b6c380c644af63dbdaa7cc14bba041a4e4dd
> Author: Josef Bacik 
> Date:   Fri Mar 25 13:25:51 2016 -0400
>
>  Btrfs: warn_on for unaccounted spaces
>
> This commit isn't the source of the bug, but it's making it a lot more
> noisy.  I spent a few hours last night trying to track down why
> xfstests
> was throwing these warnings and I was able to reproduce them at
> least as
> far back as 4.4-vanilla with -oenospc_debug enabled.
>
> Speaking of which, can you turn on mounting with -oenospc_debug if you
> haven't already?
>
> In my case, space_info->bytes_may_use was getting accounted
> incorrectly.
>
> I am able to reproduce that even with the following commit:
> commit 18513091af9483ba84328d42092bd4d42a3c958f
> Author: Wang Xiaoguang 
> Date:   Mon Jul 25 15:51:40 2016 +0800
>
>  btrfs: update btrfs_space_info's bytes_may_use timely
 And the btrfs_free_reserved_data_space_noquota WARN_ON I was seeing is
 fixed by:

 commit ed7a6948394305b810d0c6203268648715e5006f
 Author: Wang Xiaoguang 
 Date:   Fri Aug 26 11:33:14 2016 +0800

  btrfs: do not decrease bytes_may_use when replaying extents

 ... which shouldn't change anything for your issue, unfortunately.

 I still see these:
 WARNING: CPU: 2 PID: 8166 at ../fs/btrfs/extent-tree.c:9582
 btrfs_free_block_groups+0x2a8/0x400 [btrfs]()
 Modules linked in: loop dm_flakey af_packet iscsi_ibft iscsi_boot_sysfs
 msr ext4 crc16 mbcache jbd2 ipmi_ssif dm_mod igb ptp pps_core
 acpi_cpufreq tpm_infineon kvm_amd ipmi_si kvm dca pcspkr
 ipmi_msghandler
 8250_fintek sp5100_tco fjes irqbypass i2c_piix4 shpchp processor button
 amd64_edac_mod edac_mce_amd edac_core k10temp btrfs xor raid6_pq sd_mod
 ata_generic mgag200 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect
 ohci_pci sysimgblt ehci_pci serio_raw ohci_hcd fb_sys_fops pata_atiixp
 ehci_hcd ttm ahci libahci drm usbcore libata usb_common sg scsi_mod
 autofs4
 CPU: 2 PID: 8166 Comm: umount Tainted: GW
 4.4.19-11.g81405db-vanilla #1
 Hardware name: HP ProLiant DL165 G7, BIOS O37 10/17/2012
    880230317d10 813170ec 
   a0472528 880230317d48 8107d816 
   88009ab03600 8800ba106288 8800ab75a000 8800ba106200
 Call Trace:
   [] dump_stack+0x63/0x87
   [] warn_slowpath_common+0x86/0xc0
   [] warn_slowpath_null+0x1a/0x20
   [] btrfs_free_block_groups+0x2a8/0x400 [btrfs]
   [] close_ctree+0x15b/0x330 [btrfs]
   [] btrfs_put_super+0x19/0x20 [btrfs]
   [] generic_shutdown_super+0x6f/0x100
   [] kill_anon_super+0x12/0x20
   [] btrfs_kill_super+0x18/0x120 [btrfs]
   [] deactivate_locked_super+0x43/0x70
   [] deactivate_super+0x46/0x60
   [] cleanup_mnt+0x3f/0x80
   [] __cleanup_mnt+0x12/0x20
   [] task_work_run+0x86/0xb0
   [] exit_to_usermode_loop+0x73/0xa2
   [] syscall_return_slowpath+0x8d/0xa0
   [] int_ret_from_sys_call+0x25/0x8f
 ---[ end trace 09a0cc2892b6305c ]---
 BTRFS: space_info 1 has 7946240 free, is not full
 BTRFS: space_info total=8388608, used=442368, pinned=0, reserved=0,
 may_use=4096, readonly=0

 ... where the value of may_use varies.

>>> What test are you seeing this with?  Thanks,
>> btrfs/022 hits it every time for me.
> btrfs/022 is not related to this enospc error.
> Qu wenruo's patch “ btrfs: Fix leaking bytes_may_use after hitting
> EDQUOTA” has
> fixed this warning, please check his patch for detailed commit message.

Yep, that's understood.  This was just something I happened to encounter
while looking at this.

-Jeff


-- 
Jeff Mahoney
SUSE Labs




Re: BTRFS constantly reports "No space left on device" even with a huge unallocated space

2016-09-22 Thread Ronan Arraes Jardim Chagas

Guys,

Something very strange happened. I have not seen the problem since
Monday, which is pretty much the first time ever I work more than 3
days without seeing it.

Ok, it can be a coincidence. Notice that I did not change anything
related to my work behavior. However, I did do two things:

- Updated the kernel to 4.7.2; and
- Created 50 dummy files of 3.0 GiB each.

Can anyone, please, tell me if these things seem to be correlated?

Best regards,
Ronan Arraes

Re: BTRFS constantly reports "No space left on device" even with a huge unallocated space

2016-09-18 Thread Wang Xiaoguang


hi,

On 09/14/2016 10:25 PM, Jeff Mahoney wrote:

On 9/13/16 10:24 PM, Josef Bacik wrote:

On 09/08/2016 07:02 PM, Jeff Mahoney wrote:

On 9/8/16 2:49 PM, Jeff Mahoney wrote:

On 9/8/16 2:24 PM, Ronan Arraes Jardim Chagas wrote:

Hi all!

On Mon, 2016-09-05 at 16:49 +0800, Qu Wenruo wrote:

Just like what Wang has mentioned, would you please paste all the
output
of the contents of /sys/fs/btrfs//allocation?

It's recommended to use "grep . -IR " to get all the data as
it
will show the file name.

So, one more time, I see the problem. This time I was just using
Firefox and I cannot recover using `btrfs balance`. I think that, one
more time, I will need to reboot this machine. This problem is really
causing me a lot of troubles :(

I have a hunch the list is about to be flooded with similar reports if
we don't find this one before 4.8.

commit d555b6c380c644af63dbdaa7cc14bba041a4e4dd
Author: Josef Bacik 
Date:   Fri Mar 25 13:25:51 2016 -0400

 Btrfs: warn_on for unaccounted spaces

This commit isn't the source of the bug, but it's making it a lot more
noisy.  I spent a few hours last night trying to track down why xfstests
was throwing these warnings and I was able to reproduce them at least as
far back as 4.4-vanilla with -oenospc_debug enabled.

Speaking of which, can you turn on mounting with -oenospc_debug if you
haven't already?

In my case, space_info->bytes_may_use was getting accounted incorrectly.

I am able to reproduce that even with the following commit:
commit 18513091af9483ba84328d42092bd4d42a3c958f
Author: Wang Xiaoguang 
Date:   Mon Jul 25 15:51:40 2016 +0800

 btrfs: update btrfs_space_info's bytes_may_use timely

And the btrfs_free_reserved_data_space_noquota WARN_ON I was seeing is
fixed by:

commit ed7a6948394305b810d0c6203268648715e5006f
Author: Wang Xiaoguang 
Date:   Fri Aug 26 11:33:14 2016 +0800

 btrfs: do not decrease bytes_may_use when replaying extents

... which shouldn't change anything for your issue, unfortunately.

I still see these:
WARNING: CPU: 2 PID: 8166 at ../fs/btrfs/extent-tree.c:9582
btrfs_free_block_groups+0x2a8/0x400 [btrfs]()
Modules linked in: loop dm_flakey af_packet iscsi_ibft iscsi_boot_sysfs
msr ext4 crc16 mbcache jbd2 ipmi_ssif dm_mod igb ptp pps_core
acpi_cpufreq tpm_infineon kvm_amd ipmi_si kvm dca pcspkr ipmi_msghandler
8250_fintek sp5100_tco fjes irqbypass i2c_piix4 shpchp processor button
amd64_edac_mod edac_mce_amd edac_core k10temp btrfs xor raid6_pq sd_mod
ata_generic mgag200 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect
ohci_pci sysimgblt ehci_pci serio_raw ohci_hcd fb_sys_fops pata_atiixp
ehci_hcd ttm ahci libahci drm usbcore libata usb_common sg scsi_mod
autofs4
CPU: 2 PID: 8166 Comm: umount Tainted: GW
4.4.19-11.g81405db-vanilla #1
Hardware name: HP ProLiant DL165 G7, BIOS O37 10/17/2012
   880230317d10 813170ec 
  a0472528 880230317d48 8107d816 
  88009ab03600 8800ba106288 8800ab75a000 8800ba106200
Call Trace:
  [] dump_stack+0x63/0x87
  [] warn_slowpath_common+0x86/0xc0
  [] warn_slowpath_null+0x1a/0x20
  [] btrfs_free_block_groups+0x2a8/0x400 [btrfs]
  [] close_ctree+0x15b/0x330 [btrfs]
  [] btrfs_put_super+0x19/0x20 [btrfs]
  [] generic_shutdown_super+0x6f/0x100
  [] kill_anon_super+0x12/0x20
  [] btrfs_kill_super+0x18/0x120 [btrfs]
  [] deactivate_locked_super+0x43/0x70
  [] deactivate_super+0x46/0x60
  [] cleanup_mnt+0x3f/0x80
  [] __cleanup_mnt+0x12/0x20
  [] task_work_run+0x86/0xb0
  [] exit_to_usermode_loop+0x73/0xa2
  [] syscall_return_slowpath+0x8d/0xa0
  [] int_ret_from_sys_call+0x25/0x8f
---[ end trace 09a0cc2892b6305c ]---
BTRFS: space_info 1 has 7946240 free, is not full
BTRFS: space_info total=8388608, used=442368, pinned=0, reserved=0,
may_use=4096, readonly=0

... where the value of may_use varies.
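The two space_info lines in the trace above can be checked by hand. The sketch below parses them and recomputes the "free" figure. The formula used (total minus used, pinned, reserved and readonly, without subtracting may_use) is inferred from the numbers in this message, not quoted from the kernel source, so treat it as an assumption that happens to match.

```python
import re

# Lines copied from the trace above (the second one rejoined after wrapping).
HEADER = "BTRFS: space_info 1 has 7946240 free, is not full"
DETAIL = ("BTRFS: space_info total=8388608, used=442368, pinned=0, "
          "reserved=0, may_use=4096, readonly=0")


def parse_counters(line):
    """Extract every name=value pair from a space_info dump line."""
    return {key: int(value) for key, value in re.findall(r"(\w+)=(\d+)", line)}


counters = parse_counters(DETAIL)
reported_free = int(re.search(r"has (\d+) free", HEADER).group(1))

# Assumed formula; may_use is deliberately left out because the
# reported value only matches without it.
computed_free = (counters["total"] - counters["used"] - counters["pinned"]
                 - counters["reserved"] - counters["readonly"])

print("reported free:", reported_free)
print("computed free:", computed_free)
print("leftover may_use at unmount:", counters["may_use"])
```

The nonzero may_use left over at unmount time is the unaccounted reservation that the WARN_ON is complaining about.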


What test are you seeing this with?  Thanks,

btrfs/022 hits it every time for me.

btrfs/022 is not related to this enospc error.
Qu Wenruo's patch "btrfs: Fix leaking bytes_may_use after hitting EDQUOTA" has
fixed this warning; please check his patch for the detailed commit message.

Regards,
Xiaoguang Wang


-Jeff






Re: BTRFS constantly reports "No space left on device" even with a huge unallocated space

2016-09-14 Thread Ronan Arraes Jardim Chagas

Hi Chris,

On Wed, 2016-09-14 at 16:25 -0600, Chris Murphy wrote:
> All I can think of is the file system has gotten into a unique state
> through a combination of events. I'm still suspicious that qgroups is
> contributing to the problem even after being disabled. The workload
> you're talking about is completely ordinary and trivial.

This seems reasonable. However, I formatted the computer and after two
days, if I remember correctly, I started to see the problems again. I
still think it may also be related to my HDD (7200 RPM). In all
my other computers, everything is fine and I use SSD.

> The openSUSE layout is basically impossible to backup and restore,
> there's an astronomical number of snapshots, there's no recursive btrfs
> send/receive to try and migrate it to a new file system intact, so
> you'd pretty much just have to reinstall it no matter what. If it
> were
> me, reinstall with Btrfs same as now, and first thing before anything
> else I'd disable quotas. Or yeah, it's completely reasonable for you
> to move to a different file system, it's really a coin toss for ext4
> vs XFS, but at least XFS now checksums metadata and the journal by
> default so if I thought about it at the time of the installation I'd
> do that.

Thanks! 

> Yeah FWIW, the devs seem to prefer the output from 'grep . -IR
> /sys/fs/btrfs//allocation/' so for these kinds of problems
> I'd
> report that.

Yeah, unfortunately I forgot this one today :(

> If you *really* want to, you could grab a Fedora Rawhide nightly that
> has kernel 4.8 rc6 on it, with debug stuff enabled. If it face
> plants,
> it should catch useful stuff for Josef. If it doesn't, maybe it fixes
> enough things that you can get back to work for a while longer until
> a
> long term fix becomes available. The only way to know for sure is to
> test it. But it's completely sane to just switch to XFS and get back
> to work also.
> 
> Current
> https://kojipkgs.fedoraproject.org/compose/rawhide/Fedora-Rawhide-201
> 60914.n.0/compose/Everything/x86_64/iso/Fedora-Everything-netinst-
> x86_64-Rawhide-20160914.n.0.iso.n.0.iso
> 
> Use 'dd if=ISO of=USBstick bs=256K' that will boot anything, BIOS or
> UEFI. At the menu, choose Troubleshooting, then the Rescue option, at
> the next text menu choose 3 to get to a shell. And from there you can
> mount with enospc_debug, and do a balance of the file system. To get
> logs off the system, use a 2nd USB stick, or if you have wired
> ethernet use scp, or if you know nmcli you can maybe get the wireless
> up by command line.

This seems good. However, I just have access to that machine during my
working period, and I just do not have time to test this, sorry :(

Nevertheless, when you mentioned the `dd` command, I had a great idea
that can help me to live with this problem until I have access to
kernel 4.8. I will use `dd` to create, let's say, 100 files with 3 GiB
each in my /home directory. Hence, when I see ENOSPC, I will just need
to delete some of these files. I think this should work.
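A minimal sketch of the ballast-file idea described above, in Python rather than `dd`. The function names, file naming scheme and chunk size are all made up for illustration, and whether freeing ballast really clears this particular ENOSPC is untested here; it only shows the mechanics of creating the files and deleting some of them later.

```python
import os
from pathlib import Path


def create_ballast(directory, count, size_bytes, chunk=1 << 20):
    """Create `count` zero-filled files of `size_bytes` each.

    Returns the list of created paths. File names are hypothetical.
    """
    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)
    block = b"\0" * chunk
    paths = []
    for i in range(count):
        path = directory / f"ballast-{i:03d}.bin"
        remaining = size_bytes
        with open(path, "wb") as fh:
            while remaining > 0:
                n = min(remaining, chunk)
                fh.write(block[:n])
                remaining -= n
            fh.flush()
            os.fsync(fh.fileno())  # make sure the space is really allocated
        paths.append(path)
    return paths


def release_ballast(paths, how_many):
    """Delete the first `how_many` ballast files; return the remaining ones."""
    for path in paths[:how_many]:
        path.unlink()
    return paths[how_many:]
```

For the plan in this message the call would be something like `create_ballast("/home/<user>/ballast", 100, 3 * 2**30)`, with the directory being a placeholder. Note that on a filesystem mounted with compression, zero-filled files may occupy far less space than their nominal size.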

Thanks for all the advice, Chris!

Best regards,
Ronan Arraes

Re: BTRFS constantly reports "No space left on device" even with a huge unallocated space

2016-09-14 Thread Chris Murphy

All I can think of is the file system has gotten into a unique state
through a combination of events. I'm still suspicious that qgroups is
contributing to the problem even after being disabled. The workload
you're talking about is completely ordinary and trivial.

The openSUSE layout is basically impossible to backup and restore,
there's an astronomical number of snapshots, there's no recursive btrfs
send/receive to try and migrate it to a new file system intact, so
you'd pretty much just have to reinstall it no matter what. If it were
me, reinstall with Btrfs same as now, and first thing before anything
else I'd disable quotas. Or yeah, it's completely reasonable for you
to move to a different file system, it's really a coin toss for ext4
vs XFS, but at least XFS now checksums metadata and the journal by
default so if I thought about it at the time of the installation I'd
do that.


> Look what happened to my METADATA during the update:
>
> 1) When the problem occurred:
>
> # btrfs fi usage /

Yeah FWIW, the devs seem to prefer the output from 'grep . -IR
/sys/fs/btrfs//allocation/' so for these kinds of problems I'd
report that.
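For anyone who wants the same information from a script, here is a rough Python equivalent of that grep invocation: walk a directory tree and print every non-empty line prefixed with its file name, skipping files that cannot be read as text. The function name is made up, and the sysfs path, including the filesystem UUID, is left as a command-line argument because it differs per system.

```python
import sys
from pathlib import Path


def dump_tree(root):
    """Return 'path:line' strings for every non-empty line under root.

    Files that cannot be read or decoded as text are skipped, which
    roughly mirrors grep's -I (ignore binary files) behaviour.
    """
    lines = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        try:
            text = path.read_text()
        except (OSError, UnicodeDecodeError):
            continue
        for line in text.splitlines():
            if line:
                lines.append(f"{path}:{line}")
    return lines


if __name__ == "__main__" and len(sys.argv) > 1:
    # Pass the allocation directory of the affected filesystem as argv[1].
    print("\n".join(dump_tree(sys.argv[1])))
```

The output is one counter per line with its file name, which is the form the developers asked for in this thread.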




>
> 4) After another rebalance (I saw the ENOSPC again):

> Metadata,DUP: Size:150.50GiB, Used:1.17GiB
>/dev/sda6 301.00GiB

Yeah holy crap weird.

But the fs is already in some funky state so at this point it's not
surprising it continues to do crazy things. If the devs knew exactly
what was going on, they'd say so. If they had a fix, they'd post it or
at least an ETA. And while ostensibly the enospc work in 4.8 would
work around this problem, it's unknown until it's tested.

If you *really* want to, you could grab a Fedora Rawhide nightly that
has kernel 4.8 rc6 on it, with debug stuff enabled. If it face plants,
it should catch useful stuff for Josef. If it doesn't, maybe it fixes
enough things that you can get back to work for a while longer until a
long term fix becomes available. The only way to know for sure is to
test it. But it's completely sane to just switch to XFS and get back
to work also.

Current
https://kojipkgs.fedoraproject.org/compose/rawhide/Fedora-Rawhide-20160914.n.0/compose/Everything/x86_64/iso/Fedora-Everything-netinst-x86_64-Rawhide-20160914.n.0.iso.n.0.iso

Use 'dd if=ISO of=USBstick bs=256K' that will boot anything, BIOS or
UEFI. At the menu, choose Troubleshooting, then the Rescue option, at
the next text menu choose 3 to get to a shell. And from there you can
mount with enospc_debug, and do a balance of the file system. To get
logs off the system, use a 2nd USB stick, or if you have wired
ethernet use scp, or if you know nmcli you can maybe get the wireless
up by command line.


> This problem is really causing me problems. I am starting to think that
> Tumbleweed, at least, should not choose BTRFS as the default file
> system, since this distribution is supposed to be stable. I think that
> BTRFS has some serious problems at least in kernels 4.6 and 4.7.
>
> I reported this problem more than 1 month ago, and yet nobody could
> provide me at least a workaround so I can keep working here. I think
> the best will be to format this machine (**again**) and use EXT4 or
> XFS, if nobody could help me to fix or avoid this problem in the
> following days.

Yep, completely reasonable.


-- 
Chris Murphy

Re: BTRFS constantly reports "No space left on device" even with a huge unallocated space

2016-09-14 Thread Ronan Arraes Jardim Chagas

Hi guys,

The problem happened again, but now it was way more serious. I was
doing a big Tumbleweed update (4680 packages) and I got the ENOSPC
during the update. To avoid being left with a broken system, as it has
already happened in the past, I, unfortunately, needed to delete data
that I really was not planning to. This is a disaster, because I have
more than 1 TiB of **free space**.

After deleting 7GiB of data, I could run rebalance and the update
finished successfully. However, the ENOSPC happened 3 more times (!)
and I always needed to run rebalance to keep the update going.

Sometimes, during the rebalance, I saw the message:

[28736.688266] BTRFS info (device sda6): relocating block group
389998968832 flags 34
[28737.376302] BTRFS info (device sda6): found 4 extents
[28737.712815] BTRFS info (device sda6): relocating block group
343760961536 flags 36
[28738.010030] BTRFS info (device sda6): relocating block group
343224090624 flags 36
[28738.343461] BTRFS info (device sda6): relocating block group
342687219712 flags 36
[28738.660023] BTRFS info (device sda6): relocating block group
342150348800 flags 36
[28738.665241] use_block_rsv: 11 callbacks suppressed
[28738.665247] [ cut here ]
[28738.665290] WARNING: CPU: 10 PID: 639 at ../fs/btrfs/extent-
tree.c:8097 btrfs_alloc_tree_block+0x3f1/0x4c0 [btrfs]
[28738.665292] BTRFS: block rsv returned -28
[28738.665295] Modules linked in: dm_mod fuse nf_log_ipv6 xt_pkttype
nf_log_ipv4 nf_log_common xt_LOG xt_limit af_packet iscsi_ibft
iscsi_boot_sysfs msr ip6t_REJECT nf_reject_ipv6 xt_tcpudp
nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_raw ipt_REJECT nf_reject_ipv4
iptable_raw xt_CT snd_hda_codec_hdmi snd_hda_codec_realtek
nvidia_drm(PO) snd_hda_codec_generic snd_hda_intel nvidia_modeset(PO)
snd_hda_codec snd_hda_core snd_hwdep iptable_filter nvidia(PO) joydev
drm_kms_helper intel_rapl drm fb_sys_fops iTCO_wdt mei_wdt syscopyarea
snd_pcm snd_timer iTCO_vendor_support sysfillrect sb_edac snd i2c_i801
mei_me lpc_ich edac_core sysimgblt ip6table_mangle x86_pkg_temp_thermal
intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul
crc32_pclmul ghash_clmulni_intel aesni_intel soundcore mei aes_x86_64
[28738.665359]  lrw gf128mul glue_helper ablk_helper cryptd e1000e
hp_wmi ioatdma fjes nf_conntrack_netbios_ns ptp shpchp pps_core
sparse_keymap pcspkr mfd_core nf_conntrack_broadcast rfkill
tpm_infineon tpm_tis dca tpm nf_conntrack_ipv4 nf_defrag_ipv4 ip_tables
xt_conntrack nf_conntrack ip6table_filter ip6_tables x_tables btrfs xor
raid6_pq hid_generic usbhid crc32c_intel serio_raw xhci_pci ehci_pci
sr_mod firewire_ohci xhci_hcd ehci_hcd cdrom firewire_core crc_itu_t
usbcore isci usb_common libsas ata_generic mpt3sas raid_class
scsi_transport_sas wmi button sg
[28738.665419] CPU: 10 PID: 639 Comm: systemd-journal Tainted: P W O 4.7.1-1-default #1
[28738.665421] Hardware name: Hewlett-Packard HP Z820 Workstation/158B,
BIOS J63 v03.65 12/19/2013
[28738.665425]   81393104 88080bc63a68

[28738.665430]  8107ca1e 8804eaa73300 88080bc63ab8
4000
[28738.665434]   88017be9a000 880f51b31760
8107ca8f
[28738.665438] Call Trace:
[28738.665464]  [] dump_trace+0x5e/0x320
[28738.665472]  [] show_stack_log_lvl+0x10c/0x180
[28738.665478]  [] show_stack+0x21/0x40
[28738.665486]  [] dump_stack+0x5c/0x78
[28738.665496]  [] __warn+0xbe/0xe0
[28738.665503]  [] warn_slowpath_fmt+0x4f/0x60
[28738.665529]  [] btrfs_alloc_tree_block+0x3f1/0x4c0
[btrfs]
[28738.665560]  [] btrfs_copy_root+0xf2/0x280 [btrfs]
[28738.665593]  [] create_reloc_root+0x171/0x1e0
[btrfs]
[28738.665623]  [] btrfs_init_reloc_root+0x8f/0xa0
[btrfs]
[28738.665652]  [] record_root_in_trans+0xb2/0x110
[btrfs]
[28738.665679]  []
btrfs_record_root_in_trans+0x41/0x70 [btrfs]
[28738.665704]  [] start_transaction+0xa0/0x4f0
[btrfs]
[28738.665732]  [] btrfs_dirty_inode+0x33/0xc0
[btrfs]
[28738.665741]  [] file_update_time+0x99/0xf0
[28738.665770]  [] btrfs_page_mkwrite+0xa3/0x450
[btrfs]
[28738.665779]  [] do_page_mkwrite+0x69/0xc0
[28738.665785]  [] handle_pte_fault+0xf4/0x1760
[28738.665792]  [] handle_mm_fault+0x29e/0x5a0
[28738.665798]  [] __do_page_fault+0x1e0/0x510
[28738.665809]  [] page_fault+0x28/0x30
[28738.669296] DWARF2 unwinder stuck at page_fault+0x28/0x30

[28738.669300] Leftover inexact backtrace:

[28738.669327] ---[ end trace 8ef9cfba38cc9bfc ]---
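
The numeric fields in the log above can be decoded by hand. The following is a minimal sketch, assuming the standard btrfs block group flag bits (DATA=1, SYSTEM=2, METADATA=4, DUP=32, and so on) and the usual Linux errno numbering; the helper names are made up for illustration and are not part of any btrfs tool.

```python
import errno

# Block group flag bits as used by the btrfs on-disk format.
BLOCK_GROUP_FLAGS = {
    1 << 0: "DATA",
    1 << 1: "SYSTEM",
    1 << 2: "METADATA",
    1 << 3: "RAID0",
    1 << 4: "RAID1",
    1 << 5: "DUP",
    1 << 6: "RAID10",
    1 << 7: "RAID5",
    1 << 8: "RAID6",
}


def decode_block_group_flags(flags):
    """Return the names of all known flag bits set in `flags` (hypothetical helper)."""
    return [name for bit, name in BLOCK_GROUP_FLAGS.items() if flags & bit]


def errno_name(code):
    """Map a (possibly negative) kernel return code to its errno name."""
    return errno.errorcode.get(abs(code), "unknown")


if __name__ == "__main__":
    # "flags 34" and "flags 36" from the relocation messages.
    for flags in (34, 36):
        print(flags, "|".join(decode_block_group_flags(flags)))
    # "block rsv returned -28" from the warning.
    print(-28, errno_name(-28))
```

Read this way, the balance was relocating SYSTEM and METADATA block groups (both DUP) when the tree block reservation failed with ENOSPC.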

Look what happened to my METADATA during the update:

1) When the problem occurred:

# btrfs fi usage /
Overall:
    Device size:                   1.26TiB
    Device allocated:             63.07GiB
    Device unallocated:            1.20TiB
    Device missing:                  0.00B
    Used:                         50.21GiB
    Free (estimated):              1.20TiB  (min: 612.49GiB)
    Data ratio:                       1.00
    Metadata ratio:                   2.00
    Global reserve:              400.00MiB  (used:

Re: BTRFS constantly reports "No space left on device" even with a huge unallocated space

2016-09-14 Thread Ronan Arraes Jardim Chagas

Hi Josef,

On Tue, 2016-09-13 at 17:01 -0400, Josef Bacik wrote:
> I just started paying attention to this, the last kernel I saw you
> were running 
> was 4.7.  Have you tried a recent kernel, like chris's tree?
> 
> 
> git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs.git
> for-linus-4.8
> 
> is what I would like you to try if not.  Thanks,
> 
> Josef

Unfortunately, since this is a production machine, I am not allowed to
install unreleased kernels. If this is the only solution, I will need
to wait for 4.8 or search if anyone has already backported the BTRFS
patches for 4.7.

Best regards,
Ronan Arraes
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: BTRFS constantly reports "No space left on device" even with a huge unallocated space

2016-09-14 Thread Jeff Mahoney

On 9/13/16 10:24 PM, Josef Bacik wrote:
> On 09/08/2016 07:02 PM, Jeff Mahoney wrote:
>> On 9/8/16 2:49 PM, Jeff Mahoney wrote:
>>> On 9/8/16 2:24 PM, Ronan Arraes Jardim Chagas wrote:
>>>> Hi all!
>>>>
>>>> On Mon, 2016-09-05 at 16:49 +0800, Qu Wenruo wrote:
>>>>> Just like what Wang has mentioned, would you please paste all the
>>>>> output
>>>>> of the contents of /sys/fs/btrfs//allocation?
>>>>>
>>>>> It's recommended to use "grep . -IR " to get all the data as
>>>>> it
>>>>> will show the file name.
>>>>
>>>> So, one more time, I see the problem. This time I was just using
>>>> Firefox and I cannot recover using `btrfs balance`. I think that, one
>>>> more time, I will need to reboot this machine. This problem is really
>>>> causing me a lot of troubles :(
>>>
>>> I have a hunch the list is about to be flooded with similar reports if
>>> we don't find this one before 4.8.
>>>
>>> commit d555b6c380c644af63dbdaa7cc14bba041a4e4dd
>>> Author: Josef Bacik 
>>> Date:   Fri Mar 25 13:25:51 2016 -0400
>>>
>>> Btrfs: warn_on for unaccounted spaces
>>>
>>> This commit isn't the source of the bug, but it's making it a lot more
>>> noisy.  I spent a few hours last night trying to track down why xfstests
>>> was throwing these warnings and I was able to reproduce them at least as
>>> far back as 4.4-vanilla with -oenospc_debug enabled.
>>>
>>> Speaking of which, can you turn on mounting with -oenospc_debug if you
>>> haven't already?
>>>
>>> In my case, space_info->bytes_may_use was getting accounted incorrectly.
>>>
>>> I am able to reproduce that even with the following commit:
>>> commit 18513091af9483ba84328d42092bd4d42a3c958f
>>> Author: Wang Xiaoguang 
>>> Date:   Mon Jul 25 15:51:40 2016 +0800
>>>
>>> btrfs: update btrfs_space_info's bytes_may_use timely
>>
>> And the btrfs_free_reserved_data_space_noquota WARN_ON I was seeing is
>> fixed by:
>>
>> commit ed7a6948394305b810d0c6203268648715e5006f
>> Author: Wang Xiaoguang 
>> Date:   Fri Aug 26 11:33:14 2016 +0800
>>
>> btrfs: do not decrease bytes_may_use when replaying extents
>>
>> ... which shouldn't change anything for your issue, unfortunately.
>>
>> I still see these:
>> WARNING: CPU: 2 PID: 8166 at ../fs/btrfs/extent-tree.c:9582
>> btrfs_free_block_groups+0x2a8/0x400 [btrfs]()
>> Modules linked in: loop dm_flakey af_packet iscsi_ibft iscsi_boot_sysfs
>> msr ext4 crc16 mbcache jbd2 ipmi_ssif dm_mod igb ptp pps_core
>> acpi_cpufreq tpm_infineon kvm_amd ipmi_si kvm dca pcspkr ipmi_msghandler
>> 8250_fintek sp5100_tco fjes irqbypass i2c_piix4 shpchp processor button
>> amd64_edac_mod edac_mce_amd edac_core k10temp btrfs xor raid6_pq sd_mod
>> ata_generic mgag200 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect
>> ohci_pci sysimgblt ehci_pci serio_raw ohci_hcd fb_sys_fops pata_atiixp
>> ehci_hcd ttm ahci libahci drm usbcore libata usb_common sg scsi_mod
>> autofs4
>> CPU: 2 PID: 8166 Comm: umount Tainted: G W 4.4.19-11.g81405db-vanilla #1
>> Hardware name: HP ProLiant DL165 G7, BIOS O37 10/17/2012
>>   880230317d10 813170ec 
>>  a0472528 880230317d48 8107d816 
>>  88009ab03600 8800ba106288 8800ab75a000 8800ba106200
>> Call Trace:
>>  [] dump_stack+0x63/0x87
>>  [] warn_slowpath_common+0x86/0xc0
>>  [] warn_slowpath_null+0x1a/0x20
>>  [] btrfs_free_block_groups+0x2a8/0x400 [btrfs]
>>  [] close_ctree+0x15b/0x330 [btrfs]
>>  [] btrfs_put_super+0x19/0x20 [btrfs]
>>  [] generic_shutdown_super+0x6f/0x100
>>  [] kill_anon_super+0x12/0x20
>>  [] btrfs_kill_super+0x18/0x120 [btrfs]
>>  [] deactivate_locked_super+0x43/0x70
>>  [] deactivate_super+0x46/0x60
>>  [] cleanup_mnt+0x3f/0x80
>>  [] __cleanup_mnt+0x12/0x20
>>  [] task_work_run+0x86/0xb0
>>  [] exit_to_usermode_loop+0x73/0xa2
>>  [] syscall_return_slowpath+0x8d/0xa0
>>  [] int_ret_from_sys_call+0x25/0x8f
>> ---[ end trace 09a0cc2892b6305c ]---
>> BTRFS: space_info 1 has 7946240 free, is not full
>> BTRFS: space_info total=8388608, used=442368, pinned=0, reserved=0,
>> may_use=4096, readonly=0
>>
>> ... where the value of may_use varies.
>>
> 
> What test are you seeing this with?  Thanks,

btrfs/022 hits it every time for me.

-Jeff

-- 
Jeff Mahoney
SUSE Labs




Re: BTRFS constantly reports "No space left on device" even with a huge unallocated space

2016-09-13 Thread Josef Bacik


On 09/13/2016 04:49 PM, Ronan Arraes Jardim Chagas wrote:

Hi guys,

One more time I saw the problem. It begins to happen on a daily basis
now. Unfortunately the `enospc_debug` flag did not help. I did not see
any new information in the logs. This time, only a hard reset worked. I
could not even reboot using gnome panel.


I just started paying attention to this, the last kernel I saw you were running 
was 4.7.  Have you tried a recent kernel, like chris's tree?



git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs.git 
for-linus-4.8

is what I would like you to try if not.  Thanks,

Josef

Re: BTRFS constantly reports "No space left on device" even with a huge unallocated space

2016-09-13 Thread Ronan Arraes Jardim Chagas

Hi guys,

One more time I saw the problem. It begins to happen on a daily basis
now. Unfortunately the `enospc_debug` flag did not help. I did not see
any new information in the logs. This time, only a hard reset worked. I
could not even reboot using gnome panel.

Best regards,
Ronan Arraes

Re: BTRFS constantly reports "No space left on device" even with a huge unallocated space

2016-09-13 Thread Josef Bacik


On 09/08/2016 07:02 PM, Jeff Mahoney wrote:

On 9/8/16 2:49 PM, Jeff Mahoney wrote:

On 9/8/16 2:24 PM, Ronan Arraes Jardim Chagas wrote:

Hi all!

On Mon, 2016-09-05 at 16:49 +0800, Qu Wenruo wrote:

Just like what Wang has mentioned, would you please paste all the
output
of the contents of /sys/fs/btrfs//allocation?

It's recommended to use "grep . -IR " to get all the data as
it
will show the file name.


So, one more time, I see the problem. This time I was just using
Firefox and I cannot recover using `btrfs balance`. I think that, one
more time, I will need to reboot this machine. This problem is really
causing me a lot of troubles :(


I have a hunch the list is about to be flooded with similar reports if
we don't find this one before 4.8.

commit d555b6c380c644af63dbdaa7cc14bba041a4e4dd
Author: Josef Bacik 
Date:   Fri Mar 25 13:25:51 2016 -0400

Btrfs: warn_on for unaccounted spaces

This commit isn't the source of the bug, but it's making it a lot more
noisy.  I spent a few hours last night trying to track down why xfstests
was throwing these warnings and I was able to reproduce them at least as
far back as 4.4-vanilla with -oenospc_debug enabled.

Speaking of which, can you turn on mounting with -oenospc_debug if you
haven't already?

In my case, space_info->bytes_may_use was getting accounted incorrectly.

I am able to reproduce that even with the following commit:
commit 18513091af9483ba84328d42092bd4d42a3c958f
Author: Wang Xiaoguang 
Date:   Mon Jul 25 15:51:40 2016 +0800

btrfs: update btrfs_space_info's bytes_may_use timely


And the btrfs_free_reserved_data_space_noquota WARN_ON I was seeing is
fixed by:

commit ed7a6948394305b810d0c6203268648715e5006f
Author: Wang Xiaoguang 
Date:   Fri Aug 26 11:33:14 2016 +0800

btrfs: do not decrease bytes_may_use when replaying extents

... which shouldn't change anything for your issue, unfortunately.

I still see these:
WARNING: CPU: 2 PID: 8166 at ../fs/btrfs/extent-tree.c:9582
btrfs_free_block_groups+0x2a8/0x400 [btrfs]()
Modules linked in: loop dm_flakey af_packet iscsi_ibft iscsi_boot_sysfs
msr ext4 crc16 mbcache jbd2 ipmi_ssif dm_mod igb ptp pps_core
acpi_cpufreq tpm_infineon kvm_amd ipmi_si kvm dca pcspkr ipmi_msghandler
8250_fintek sp5100_tco fjes irqbypass i2c_piix4 shpchp processor button
amd64_edac_mod edac_mce_amd edac_core k10temp btrfs xor raid6_pq sd_mod
ata_generic mgag200 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect
ohci_pci sysimgblt ehci_pci serio_raw ohci_hcd fb_sys_fops pata_atiixp
ehci_hcd ttm ahci libahci drm usbcore libata usb_common sg scsi_mod autofs4
CPU: 2 PID: 8166 Comm: umount Tainted: G W 4.4.19-11.g81405db-vanilla #1
Hardware name: HP ProLiant DL165 G7, BIOS O37 10/17/2012
  880230317d10 813170ec 
 a0472528 880230317d48 8107d816 
 88009ab03600 8800ba106288 8800ab75a000 8800ba106200
Call Trace:
 [] dump_stack+0x63/0x87
 [] warn_slowpath_common+0x86/0xc0
 [] warn_slowpath_null+0x1a/0x20
 [] btrfs_free_block_groups+0x2a8/0x400 [btrfs]
 [] close_ctree+0x15b/0x330 [btrfs]
 [] btrfs_put_super+0x19/0x20 [btrfs]
 [] generic_shutdown_super+0x6f/0x100
 [] kill_anon_super+0x12/0x20
 [] btrfs_kill_super+0x18/0x120 [btrfs]
 [] deactivate_locked_super+0x43/0x70
 [] deactivate_super+0x46/0x60
 [] cleanup_mnt+0x3f/0x80
 [] __cleanup_mnt+0x12/0x20
 [] task_work_run+0x86/0xb0
 [] exit_to_usermode_loop+0x73/0xa2
 [] syscall_return_slowpath+0x8d/0xa0
 [] int_ret_from_sys_call+0x25/0x8f
---[ end trace 09a0cc2892b6305c ]---
BTRFS: space_info 1 has 7946240 free, is not full
BTRFS: space_info total=8388608, used=442368, pinned=0, reserved=0,
may_use=4096, readonly=0

... where the value of may_use varies.



What test are you seeing this with?  Thanks,

Josef


Re: BTRFS constantly reports "No space left on device" even with a huge unallocated space

2016-09-13 Thread Ronan Arraes Jardim Chagas

Hi!

On Tue, 2016-09-13 at 11:17 +0800, Wang Xiaoguang wrote:
> It may be an irrelevant question, but do you have compression enabled?
> 
> Regards,
> Xiaoguang Wang

No, I do not have compression enabled. I'm using openSUSE's default
configuration.

By the way, I was wrongly mounting the filesystem with `enospc_debug`.
It turns out that I modified the fstab in a backup directory, sorry :)
Now, I did it correctly so, hopefully, we will have much more
information about the problem the next time I see it!
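
Editing the wrong fstab is easy to miss, so it can help to check which options the kernel actually applied. A minimal sketch follows, assuming that btrfs lists `enospc_debug` among the mount options in /proc/mounts when it is active; the function names are hypothetical and the parser only handles the usual whitespace-separated format.

```python
def mount_options(mounts_text, mountpoint):
    """Return the option list for `mountpoint` from /proc/mounts-style text.

    The last matching entry wins, because a later mount shadows an
    earlier one on the same mount point.
    """
    options = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[1] == mountpoint:
            options = fields[3].split(",")
    return options


def has_option(mounts_text, mountpoint, option):
    """True if `option` is listed for `mountpoint`."""
    options = mount_options(mounts_text, mountpoint)
    return options is not None and option in options


def check_live(mountpoint="/", option="enospc_debug"):
    """Check the running system (Linux only, reads /proc/mounts)."""
    with open("/proc/mounts") as f:
        return has_option(f.read(), mountpoint, option)
```

Calling `check_live()` after a remount or reboot returns True only if the option took effect on the root filesystem, regardless of which fstab copy was edited.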

Best regards,
Ronan Arraes

Re: BTRFS constantly reports "No space left on device" even with a huge unallocated space

2016-09-12 Thread Wang Xiaoguang


hello,

On 08/13/2016 01:36 AM, Ronan Arraes Jardim Chagas wrote:

Hi guys,

I'm facing a daily problem with BTRFS. Almost everyday, I get the
message "No space left on device". Sometimes I can recover by balancing
the system but sometimes even balancing does not work due to the lack
of space. In this case, only a hard reset works if I can't delete some
files. The problem is that I have a huge unallocated space as you can
see here:

# btrfs fi usage /
Overall:
    Device size:                   1.26TiB
    Device allocated:            119.07GiB
    Device unallocated:            1.14TiB
    Device missing:                  0.00B
    Used:                        115.08GiB
    Free (estimated):              1.14TiB  (min: 586.21GiB)
    Data ratio:                       1.00
    Metadata ratio:                   2.00
    Global reserve:              512.00MiB  (used: 0.00B)

Data,single: Size:113.01GiB, Used:111.19GiB
   /dev/sda6     113.01GiB

Metadata,DUP: Size:3.00GiB, Used:1.94GiB
   /dev/sda6       6.00GiB

System,DUP: Size:32.00MiB, Used:16.00KiB
   /dev/sda6      64.00MiB

Unallocated:
   /dev/sda6       1.14TiB
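
The "Device allocated" figure in this output can be cross-checked against the per-type chunk sizes. The sketch below assumes only that the DUP profile stores two copies of each chunk on the device (which is what "Metadata ratio: 2.00" reflects) and that `single` stores one; the variable names are illustrative.

```python
# Logical chunk sizes in GiB, copied from the `btrfs fi usage /` output.
chunks = [
    # (type, profile, logical size in GiB)
    ("Data", "single", 113.01),
    ("Metadata", "DUP", 3.00),
    ("System", "DUP", 32.0 / 1024),  # 32.00MiB
]

# Copies stored on disk per profile (only the two profiles seen here).
COPIES = {"single": 1, "DUP": 2}


def raw_allocated(chunks):
    """Sum the raw device space consumed by all chunks."""
    return sum(size * COPIES[profile] for _, profile, size in chunks)


total = raw_allocated(chunks)
print(f"Device allocated: {total:.2f}GiB")
```

The sum is 113.01 + 6.00 + 0.0625 GiB, which matches the reported 119.07GiB. In other words, less than 10% of the 1.26TiB device is allocated to chunks at all, so the ENOSPC cannot be explained by the device being full.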

It is not easy to trigger the problem. But I do find some correlation
between two things:

1) When I started to create jails to build openSUSE packages locally,
then the problem happens more often. In these jails, some directories
like /dev/, /dev/pts, /proc, are mounted inside the jail.

2) When I open my KVM, I also see this problem more often. Notice,
however, that the KVM disk is stored in another EXT4 partition.

I would be glad if anyone can help me to fix it. In the following, I'm
providing more information about my system:

# uname -a
Linux ronanarraes-osd 4.7.0-1-default #1 SMP PREEMPT Mon Jul 25
08:42:47 UTC 2016 (89a2ada) x86_64 x86_64 x86_64 GNU/Linux

# btrfs --version
btrfs-progs v4.6.1+20160714

# btrfs fi show
Label: none  uuid: 80381f7f-8cef-4bd8-bdbc-3487253ee566
        Total devices 1 FS bytes used 113.13GiB
        devid    1 size 1.26TiB used 119.07GiB path /dev/sda6

# btrfs fi df /
Data, single: total=113.01GiB, used=111.19GiB
System, DUP: total=32.00MiB, used=16.00KiB
Metadata, DUP: total=3.00GiB, used=1.94GiB
GlobalReserve, single: total=512.00MiB, used=0.00B

Regards,
Ronan Arraes

It may be an irrelevant question, but do you have compression enabled?

Regards,
Xiaoguang Wang



Re: BTRFS constantly reports "No space left on device" even with a huge unallocated space

2016-09-08 Thread Jeff Mahoney

On 9/8/16 2:49 PM, Jeff Mahoney wrote:
> On 9/8/16 2:24 PM, Ronan Arraes Jardim Chagas wrote:
>> Hi all!
>>
>> On Mon, 2016-09-05 at 16:49 +0800, Qu Wenruo wrote:
>>> Just like what Wang has mentioned, would you please paste all the
>>> output 
>>> of the contents of /sys/fs/btrfs//allocation?
>>>
>>> It's recommended to use "grep . -IR " to get all the data as
>>> it 
>>> will show the file name.
>>
>> So, one more time, I see the problem. This time I was just using
>> Firefox and I cannot recover using `btrfs balance`. I think that, one
>> more time, I will need to reboot this machine. This problem is really
>> causing me a lot of troubles :(
> 
> I have a hunch the list is about to be flooded with similar reports if
> we don't find this one before 4.8.
> 
> commit d555b6c380c644af63dbdaa7cc14bba041a4e4dd
> Author: Josef Bacik 
> Date:   Fri Mar 25 13:25:51 2016 -0400
> 
> Btrfs: warn_on for unaccounted spaces
> 
> This commit isn't the source of the bug, but it's making it a lot more
> noisy.  I spent a few hours last night trying to track down why xfstests
> was throwing these warnings and I was able to reproduce them at least as
> far back as 4.4-vanilla with -oenospc_debug enabled.
> 
> Speaking of which, can you turn on mounting with -oenospc_debug if you
> haven't already?
> 
> In my case, space_info->bytes_may_use was getting accounted incorrectly.
> 
> I am able to reproduce that even with the following commit:
> commit 18513091af9483ba84328d42092bd4d42a3c958f
> Author: Wang Xiaoguang 
> Date:   Mon Jul 25 15:51:40 2016 +0800
> 
> btrfs: update btrfs_space_info's bytes_may_use timely

And the btrfs_free_reserved_data_space_noquota WARN_ON I was seeing is
fixed by:

commit ed7a6948394305b810d0c6203268648715e5006f
Author: Wang Xiaoguang 
Date:   Fri Aug 26 11:33:14 2016 +0800

btrfs: do not decrease bytes_may_use when replaying extents

... which shouldn't change anything for your issue, unfortunately.

I still see these:
WARNING: CPU: 2 PID: 8166 at ../fs/btrfs/extent-tree.c:9582
btrfs_free_block_groups+0x2a8/0x400 [btrfs]()
Modules linked in: loop dm_flakey af_packet iscsi_ibft iscsi_boot_sysfs
msr ext4 crc16 mbcache jbd2 ipmi_ssif dm_mod igb ptp pps_core
acpi_cpufreq tpm_infineon kvm_amd ipmi_si kvm dca pcspkr ipmi_msghandler
8250_fintek sp5100_tco fjes irqbypass i2c_piix4 shpchp processor button
amd64_edac_mod edac_mce_amd edac_core k10temp btrfs xor raid6_pq sd_mod
ata_generic mgag200 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect
ohci_pci sysimgblt ehci_pci serio_raw ohci_hcd fb_sys_fops pata_atiixp
ehci_hcd ttm ahci libahci drm usbcore libata usb_common sg scsi_mod autofs4
CPU: 2 PID: 8166 Comm: umount Tainted: G W 4.4.19-11.g81405db-vanilla #1
Hardware name: HP ProLiant DL165 G7, BIOS O37 10/17/2012
  880230317d10 813170ec 
 a0472528 880230317d48 8107d816 
 88009ab03600 8800ba106288 8800ab75a000 8800ba106200
Call Trace:
 [] dump_stack+0x63/0x87
 [] warn_slowpath_common+0x86/0xc0
 [] warn_slowpath_null+0x1a/0x20
 [] btrfs_free_block_groups+0x2a8/0x400 [btrfs]
 [] close_ctree+0x15b/0x330 [btrfs]
 [] btrfs_put_super+0x19/0x20 [btrfs]
 [] generic_shutdown_super+0x6f/0x100
 [] kill_anon_super+0x12/0x20
 [] btrfs_kill_super+0x18/0x120 [btrfs]
 [] deactivate_locked_super+0x43/0x70
 [] deactivate_super+0x46/0x60
 [] cleanup_mnt+0x3f/0x80
 [] __cleanup_mnt+0x12/0x20
 [] task_work_run+0x86/0xb0
 [] exit_to_usermode_loop+0x73/0xa2
 [] syscall_return_slowpath+0x8d/0xa0
 [] int_ret_from_sys_call+0x25/0x8f
---[ end trace 09a0cc2892b6305c ]---
BTRFS: space_info 1 has 7946240 free, is not full
BTRFS: space_info total=8388608, used=442368, pinned=0, reserved=0,
may_use=4096, readonly=0

... where the value of may_use varies.
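
The two space_info lines above can be checked mechanically. This is a sketch under two assumptions that are inferred from the log rather than confirmed in the thread: that the "free" figure is total minus used, pinned, reserved and readonly, and that pinned, reserved and may_use are all expected to be zero by the time the filesystem is unmounted. The function names are hypothetical.

```python
import re

LINE = ("BTRFS: space_info total=8388608, used=442368, pinned=0, "
        "reserved=0, may_use=4096, readonly=0")


def parse_space_info(line):
    """Extract the key=value counters from a space_info dump line."""
    return {k: int(v) for k, v in re.findall(r"(\w+)=(-?\d+)", line)}


def free_bytes(info):
    """Free space as total minus used, pinned, reserved and readonly."""
    return (info["total"] - info["used"] - info["pinned"]
            - info["reserved"] - info["readonly"])


def leaked_counters(info):
    """Counters assumed to be zero at unmount that are not."""
    return {k: info[k] for k in ("pinned", "reserved", "may_use") if info[k]}


info = parse_space_info(LINE)
print(free_bytes(info))
print(leaked_counters(info))
```

With the values from the log, the computed free space agrees with the "has 7946240 free" line, and the only non-zero leftover is may_use, which fits the description of bytes_may_use being accounted incorrectly.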

-Jeff

> 
>> grep . -IR /sys/fs/btrfs/e9efaa0c-d477-4249-830f-ee5956768b29/allocation
>> allocation/data/flags:1
>> allocation/data/bytes_pinned:0
>> allocation/data/bytes_may_use:0
>> allocation/data/total_bytes_pinned:202973265920
> 
> That adds up to ~ 189 GB.  total_bytes is only about 42 GB.
> 
>> allocation/data/bytes_reserved:0
>> allocation/data/bytes_used:45623730176
>> allocation/data/single/used_bytes:45623730176
>> allocation/data/single/total_bytes:46179287040
>> allocation/data/total_bytes:46179287040
>> allocation/data/disk_total:46179287040
>> allocation/data/disk_used:45623730176
>> allocation/metadata/dup/used_bytes:1120698368
>> allocation/metadata/dup/total_bytes:6979321856
>> allocation/metadata/flags:4
>> allocation/metadata/bytes_pinned:0
>> allocation/metadata/bytes_may_use:88521768960
>> allocation/metadata/total_bytes_pinned:-44285952
> 
> ... well that's certainly interesting.  It looks like we'll need to see
> how that happened.  It seems like we've messed up at least that portion
> of accounting.
> 
> -Jeff
> 
>> allocation/metadata/bytes_reserved:0
>> allocation/metadata/bytes_used:1120698368
>> allocation/metadata/total_bytes:6979

Re: BTRFS constantly reports "No space left on device" even with a huge unallocated space

2016-09-08 Thread Jeff Mahoney

On 9/8/16 2:24 PM, Ronan Arraes Jardim Chagas wrote:
> Hi all!
> 
> On Mon, 2016-09-05 at 16:49 +0800, Qu Wenruo wrote:
>> Just like what Wang has mentioned, would you please paste all the
>> output 
>> of the contents of /sys/fs/btrfs//allocation?
>>
>> It's recommended to use "grep . -IR " to get all the data as
>> it 
>> will show the file name.
> 
> So, one more time, I see the problem. This time I was just using
> Firefox and I cannot recover using `btrfs balance`. I think that, one
> more time, I will need to reboot this machine. This problem is really
> causing me a lot of troubles :(

I have a hunch the list is about to be flooded with similar reports if
we don't find this one before 4.8.

commit d555b6c380c644af63dbdaa7cc14bba041a4e4dd
Author: Josef Bacik 
Date:   Fri Mar 25 13:25:51 2016 -0400

Btrfs: warn_on for unaccounted spaces

This commit isn't the source of the bug, but it's making it a lot more
noisy.  I spent a few hours last night trying to track down why xfstests
was throwing these warnings and I was able to reproduce them at least as
far back as 4.4-vanilla with -oenospc_debug enabled.

Speaking of which, can you turn on mounting with -oenospc_debug if you
haven't already?

In my case, space_info->bytes_may_use was getting accounted incorrectly.

I am able to reproduce that even with the following commit:
commit 18513091af9483ba84328d42092bd4d42a3c958f
Author: Wang Xiaoguang 
Date:   Mon Jul 25 15:51:40 2016 +0800

btrfs: update btrfs_space_info's bytes_may_use timely


> grep . -IR /sys/fs/btrfs/e9efaa0c-d477-4249-830f-ee5956768b29/allocation
> allocation/data/flags:1
> allocation/data/bytes_pinned:0
> allocation/data/bytes_may_use:0
> allocation/data/total_bytes_pinned:202973265920

That adds up to ~ 189 GB.  total_bytes is only about 42 GB.

> allocation/data/bytes_reserved:0
> allocation/data/bytes_used:45623730176
> allocation/data/single/used_bytes:45623730176
> allocation/data/single/total_bytes:46179287040
> allocation/data/total_bytes:46179287040
> allocation/data/disk_total:46179287040
> allocation/data/disk_used:45623730176
> allocation/metadata/dup/used_bytes:1120698368
> allocation/metadata/dup/total_bytes:6979321856
> allocation/metadata/flags:4
> allocation/metadata/bytes_pinned:0
> allocation/metadata/bytes_may_use:88521768960
> allocation/metadata/total_bytes_pinned:-44285952

... well that's certainly interesting.  It looks like we'll need to see
how that happened.  It seems like we've messed up at least that portion
of accounting.
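
The counters commented on above can be converted and sanity-checked in a few lines. This is only a sketch: the rule "a counter should be neither negative nor larger than the total_bytes of its space" is a heuristic chosen for illustration, not a check taken from the kernel, and the helper names are made up.

```python
GIB = 1 << 30

# Values copied from the quoted `grep . -IR .../allocation` output.
SAMPLE = """\
allocation/data/total_bytes_pinned:202973265920
allocation/data/total_bytes:46179287040
allocation/metadata/bytes_may_use:88521768960
allocation/metadata/total_bytes_pinned:-44285952
allocation/metadata/total_bytes:6979321856
"""


def parse(text):
    """Turn 'path:value' lines into {path: int}."""
    out = {}
    for line in text.splitlines():
        path, _, value = line.rpartition(":")
        if path:
            out[path] = int(value)
    return out


def suspicious(counters):
    """Flag negative counters and counters above their space's total_bytes."""
    flagged = []
    for path, value in counters.items():
        space = path.rsplit("/", 1)[0]
        total = counters.get(space + "/total_bytes")
        if value < 0:
            flagged.append((path, "negative"))
        elif total is not None and value > total:
            flagged.append((path, "exceeds total_bytes"))
    return flagged


counters = parse(SAMPLE)
for path, value in counters.items():
    print(f"{path}: {value / GIB:.2f} GiB")
for path, reason in suspicious(counters):
    print("suspicious:", path, reason)
```

This reproduces the roughly 189 GiB figure for data total_bytes_pinned, and it also flags metadata bytes_may_use, which at about 82 GiB is far larger than the 6.5 GiB of metadata total_bytes.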

-Jeff

> allocation/metadata/bytes_reserved:0
> allocation/metadata/bytes_used:1120698368
> allocation/metadata/total_bytes:6979321856
> allocation/metadata/disk_total:13958643712
> allocation/metadata/disk_used:2241396736
> allocation/global_rsv_size:385875968
> allocation/global_rsv_reserved:385875968
> allocation/system/dup/used_bytes:16384
> allocation/system/dup/total_bytes:33554432
> allocation/system/flags:2
> allocation/system/bytes_pinned:0
> allocation/system/bytes_may_use:0
> allocation/system/total_bytes_pinned:0
> allocation/system/bytes_reserved:0
> allocation/system/bytes_used:16384
> allocation/system/total_bytes:33554432
> allocation/system/disk_total:67108864
> allocation/system/disk_used:32768
> 
> Additional information:
> 
> btrfs fi usage /
> Overall:
>     Device size:                   1.26TiB
>     Device allocated:             56.07GiB
>     Device unallocated:            1.20TiB
>     Device missing:                  0.00B
>     Used:                         44.58GiB
>     Free (estimated):              1.20TiB  (min: 616.41GiB)
>     Data ratio:                       1.00
>     Metadata ratio:                   2.00
>     Global reserve:              368.00MiB  (used: 0.00B)
> 
> Data,single: Size:43.01GiB, Used:42.49GiB
>    /dev/sda6      43.01GiB
> 
> Metadata,DUP: Size:6.50GiB, Used:1.04GiB
>    /dev/sda6      13.00GiB
> 
> System,DUP: Size:32.00MiB, Used:16.00KiB
>    /dev/sda6      64.00MiB
> 
> Unallocated:
>    /dev/sda6       1.20TiB
> 
> Can anyone help me?
> 
> Best regards,
> Ronan Arraes
> 


-- 
Jeff Mahoney
SUSE Labs




Re: BTRFS constantly reports "No space left on device" even with a huge unallocated space

2016-09-08 Thread Ronan Arraes Jardim Chagas

Hi all!

On Mon, 2016-09-05 at 16:49 +0800, Qu Wenruo wrote:
> Just like what Wang has mentioned, would you please paste all the
> output 
> of the contents of /sys/fs/btrfs//allocation?
> 
> It's recommended to use "grep . -IR " to get all the data as
> it 
> will show the file name.

So, one more time, I see the problem. This time I was just using
Firefox and I cannot recover using `btrfs balance`. I think that, one
more time, I will need to reboot this machine. This problem is really
causing me a lot of troubles :(

I have disabled the quotas and the first error message after the
problem was:

[ 2444.592255] ------------[ cut here ]------------
[ 2444.592314] WARNING: CPU: 4 PID: 289 at ../fs/btrfs/extent-tree.c:4303 btrfs_free_reserved_data_space_noquota+0xfe/0x110 [btrfs]
[ 2444.592317] Modules linked in: fuse nf_log_ipv6 xt_pkttype
nf_log_ipv4 nf_log_common xt_LOG xt_limit af_packet iscsi_ibft
iscsi_boot_sysfs msr ip6t_REJECT nf_reject_ipv6 xt_tcpudp
nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_raw nvidia_drm(PO) ipt_REJECT
nf_reject_ipv4 snd_hda_codec_hdmi nvidia_modeset(PO) intel_rapl sb_edac
edac_core x86_pkg_temp_thermal intel_powerclamp nvidia(PO) coretemp
snd_hda_codec_realtek iTCO_wdt snd_hda_codec_generic iptable_raw
drm_kms_helper snd_hda_intel drm xt_CT snd_hda_codec snd_hda_core
snd_hwdep kvm_intel snd_pcm snd_timer joydev mei_wdt fb_sys_fops
iTCO_vendor_support i2c_i801 lpc_ich kvm syscopyarea snd sysfillrect
irqbypass mei_me hp_wmi sysimgblt iptable_filter crct10dif_pclmul
crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul
glue_helper ablk_helper
[ 2444.592386]  cryptd soundcore mei sparse_keymap rfkill e1000e shpchp
pcspkr ioatdma mfd_core tpm_infineon tpm_tis dca tpm fjes ptp pps_core
ip6table_mangle nf_conntrack_netbios_ns nf_conntrack_broadcast
nf_conntrack_ipv4 nf_defrag_ipv4 ip_tables xt_conntrack nf_conntrack
ip6table_filter ip6_tables x_tables btrfs xor raid6_pq hid_generic
usbhid crc32c_intel serio_raw xhci_pci ehci_pci xhci_hcd ehci_hcd
firewire_ohci sr_mod firewire_core cdrom crc_itu_t usbcore isci
usb_common libsas ata_generic mpt3sas raid_class scsi_transport_sas wmi
button sg
[ 2444.592447] CPU: 4 PID: 289 Comm: kworker/u65:7 Tainted: P W O 4.7.1-1-default #1
[ 2444.592450] Hardware name: Hewlett-Packard HP Z820 Workstation/158B,
BIOS J63 v03.65 12/19/2013
[ 2444.592458] Workqueue: writeback wb_workfn (flush-btrfs-1)
[ 2444.592462]   81393104 

[ 2444.592468]  8107ca1e 88080de6d800 9000
88080c437a00
[ 2444.592472]  880634b379ac 9000 88080dcfb73c
a02af98e
[ 2444.592477] Call Trace:
[ 2444.592499]  [] dump_trace+0x5e/0x320
[ 2444.592507]  [] show_stack_log_lvl+0x10c/0x180
[ 2444.592514]  [] show_stack+0x21/0x40
[ 2444.592523]  [] dump_stack+0x5c/0x78
[ 2444.592531]  [] __warn+0xbe/0xe0
[ 2444.592561]  []
btrfs_free_reserved_data_space_noquota+0xfe/0x110 [btrfs]
[ 2444.592602]  [] btrfs_clear_bit_hook+0x296/0x380
[btrfs]
[ 2444.592642]  [] clear_state_bit+0x55/0x1d0 [btrfs]
[ 2444.592676]  [] __clear_extent_bit+0x13d/0x3f0
[btrfs]
[ 2444.592707]  []
extent_clear_unlock_delalloc+0x62/0x280 [btrfs]
[ 2444.592739]  [] cow_file_range+0x299/0x440 [btrfs]
[ 2444.592768]  [] run_delalloc_range+0x392/0x3b0
[btrfs]
[ 2444.592801]  []
writepage_delalloc.isra.40+0x100/0x170 [btrfs]
[ 2444.592834]  [] __extent_writepage+0xc3/0x340
[btrfs]
[ 2444.592864]  []
extent_write_cache_pages.isra.36.constprop.53+0x23b/0x350 [btrfs]
[ 2444.592894]  [] extent_writepages+0x4e/0x60
[btrfs]
[ 2444.592900]  []
__writeback_single_inode+0x3d/0x3b0
[ 2444.592907]  [] writeback_sb_inodes+0x20a/0x440
[ 2444.592914]  [] __writeback_inodes_wb+0x87/0xb0
[ 2444.592921]  [] wb_writeback+0x28d/0x330
[ 2444.592927]  [] wb_workfn+0x222/0x3f0
[ 2444.592934]  [] process_one_work+0x1ed/0x4e0
[ 2444.592942]  [] worker_thread+0x47/0x4c0
[ 2444.592947]  [] kthread+0xbd/0xe0
[ 2444.592954]  [] ret_from_fork+0x1f/0x40
[ 2444.596679] DWARF2 unwinder stuck at ret_from_fork+0x1f/0x40

[ 2444.596683] Leftover inexact backtrace:

[ 2444.596689]  [] ? kthread_worker_fn+0x170/0x170

I will also provide the information requested by Qu:

grep . -IR /sys/fs/btrfs/e9efaa0c-d477-4249-830f-ee5956768b29/allocation
allocation/data/flags:1
allocation/data/bytes_pinned:0
allocation/data/bytes_may_use:0
allocation/data/total_bytes_pinned:202973265920
allocation/data/bytes_reserved:0
allocation/data/bytes_used:45623730176
allocation/data/single/used_bytes:45623730176
allocation/data/single/total_bytes:46179287040
allocation/data/total_bytes:46179287040
allocation/data/disk_total:46179287040
allocation/data/disk_used:45623730176
allocation/metadata/dup/used_bytes:1120698368
allocation/metadata/dup/total_bytes:6979321856
allocation/metadata/flags:4
allocation/metadata/bytes_pinned:0
allocation/metadata/bytes_may_use:88521768960
allocation/metadata/total_bytes_pinned:-44285952
allocation/metadata/bytes_res

Re: BTRFS constantly reports "No space left on device" even with a huge unallocated space

2016-09-05 Thread Qu Wenruo

Just like what Wang has mentioned, would you please paste all the output 
of the contents of /sys/fs/btrfs//allocation?


It's recommended to use "grep . -IR " to get all the data as it 
will show the file name.
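
For systems where a script is more convenient than the shell, the same "file name plus content" dump can be approximated as follows. This is a rough sketch, not a replacement for grep: it does not skip binary files the way `-I` does, the function name is hypothetical, and the allocation directory (whose filesystem UUID is left out in the message above) has to be supplied by the caller.

```python
import os


def dump_tree(root):
    """Yield 'relative/path:line' for every non-empty line under `root`."""
    for dirpath, _dirnames, filenames in sorted(os.walk(root)):
        for name in sorted(filenames):
            full = os.path.join(dirpath, name)
            try:
                with open(full, "r", errors="replace") as f:
                    for line in f:
                        line = line.rstrip("\n")
                        if line:
                            yield f"{os.path.relpath(full, root)}:{line}"
            except OSError:
                # Some sysfs attributes cannot be read; skip them.
                continue
```

Iterating over `dump_tree()` with the allocation directory of the affected filesystem and printing each entry gives one `path:value` line per sysfs attribute, which is the form requested here.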


Thanks,
Qu

At 09/03/2016 03:25 AM, Ronan Arraes Jardim Chagas wrote:

Hi guys!

Jeff was right. I had the problem again today and quotas are disabled
now. I couldn't get any useful message in log this time. Look at the
metadata:

btrfs fi usage /
Overall:
Device size:   1.26TiB
Device allocated: 43.07GiB
Device unallocated:1.21TiB
Device missing:  0.00B
Used: 41.94GiB
Free (estimated):  1.21TiB  (min: 622.46GiB)
Data ratio:   1.00
Metadata ratio:   2.00
Global reserve:  352.00MiB  (used: 0.00B)

Data,single: Size:40.01GiB, Used:39.94GiB
   /dev/sda6  40.01GiB

Metadata,DUP: Size:1.50GiB, Used:1.00GiB
   /dev/sda6   3.00GiB

System,DUP: Size:32.00MiB, Used:16.00KiB
   /dev/sda6  64.00MiB

Unallocated:
   /dev/sda6   1.21TiB

Any ideas to help me?

Regards,
Ronan Arraes






Re: BTRFS constantly reports "No space left on device" even with a huge unallocated space

2016-09-02 Thread Chris Murphy

On Fri, Sep 2, 2016 at 9:47 PM, Ronan Arraes Jardim Chagas
 wrote:
> Hi Chris,
>
> Em Sex, 2016-09-02 às 21:41 -0600, Chris Murphy escreveu:
>> I suggest removing the hardware, and the proprietary driver, and
>> retest the system with the existing Tumbleweed 4.7.0 kernel; and if
>> that still fails, then try the Leap 4.4 kernel.
>>
>> Proprietary kernels can do all kinds of crazy things they shouldn't
>> so
>> it's entirely possible that driver is a factor in the problem.
>
> Actually it is just a module that I load. It is only loaded when I need
> to work with it. However, I can assure this is not the problem because
> I installed the board one month ago +-, but I have been seeing ENOSPC
> since the beginning of the year IIRC. I am using Tumbleweed default
> kernel right now, but I just can try Leap when 42.2 is released.

If you want a workaround sooner rather than later, pick up one of the
latest Leap 42.2 kernels from the URL I provided. I haven't tried it,
but it ought to work. Leap 42.2 isn't going to be released for another
2.5 months.


-- 
Chris Murphy

Re: BTRFS constantly reports "No space left on device" even with a huge unallocated space

2016-09-02 Thread Ronan Arraes Jardim Chagas

Hi Chris,

Em Sex, 2016-09-02 às 21:41 -0600, Chris Murphy escreveu:
> I suggest removing the hardware, and the proprietary driver, and
> retest the system with the existing Tumbleweed 4.7.0 kernel; and if
> that still fails, then try the Leap 4.4 kernel.
> 
> Proprietary kernels can do all kinds of crazy things they shouldn't
> so
> it's entirely possible that driver is a factor in the problem.

Actually it is just a module that I load, and only when I need
to work with it. However, I can assure you this is not the problem: I
installed the board about one month ago, but I have been seeing ENOSPC
since the beginning of the year, IIRC. I am using the default Tumbleweed
kernel right now, and I can only try Leap once 42.2 is released.

Re: BTRFS constantly reports "No space left on device" even with a huge unallocated space

2016-09-02 Thread Chris Murphy

On Fri, Sep 2, 2016 at 8:47 PM, Ronan Arraes Jardim Chagas
 wrote:
> Hi guys!
>
> Em Sex, 2016-09-02 às 16:39 -0600, Chris Murphy escreveu:
>> Worth a shot, considering the opensuse/SLE 4.4 kernel has a shittonne
>> of backports. It seems unlikely to me opensuse intends to not support
>> your hardware (skylake?)
>
> Actually it is a peripheral we use to program embedded systems here and
> the (proprietary) driver requires kernel >= 4.6. I barely use it. I am
> really thinking to transfer it to another machine just to be able to
> change my kernel.

I suggest removing the hardware, and the proprietary driver, and
retest the system with the existing Tumbleweed 4.7.0 kernel; and if
that still fails, then try the Leap 4.4 kernel.

Proprietary kernel drivers can do all kinds of crazy things they
shouldn't, so it's entirely possible that driver is a factor in the
problem.


-- 
Chris Murphy

Re: BTRFS constantly reports "No space left on device" even with a huge unallocated space

2016-09-02 Thread Ronan Arraes Jardim Chagas

Hi guys!

Em Sex, 2016-09-02 às 16:39 -0600, Chris Murphy escreveu:
> Worth a shot, considering the opensuse/SLE 4.4 kernel has a shittonne
> of backports. It seems unlikely to me opensuse intends to not support
> your hardware (skylake?)

Actually it is a peripheral we use to program embedded systems here, and
the (proprietary) driver requires kernel >= 4.6. I barely use it. I am
seriously thinking of moving it to another machine just to be able to
change my kernel.

I will post here one thing I already posted on the openSUSE mailing list:

I think I forgot to mention one very important thing: I have been using
Tumbleweed+BTRFS on this machine for a very, very long time. I
think I installed it just after it changed to the current model. At
that time, I was using the same machine but without the one peripheral that
requires a "new" kernel (HDD, processor, RAM, everything was the same).
AFAIK, the first time I saw this problem was this year. So I think it
must be a regression introduced by some kernel / btrfs-progs update.

Best regards,
Ronan Arraes

Re: BTRFS constantly reports "No space left on device" even with a huge unallocated space

2016-09-02 Thread Chris Murphy

On Fri, Sep 2, 2016 at 4:13 PM, Ronan Arraes Jardim Chagas
 wrote:
> Hi!
>
> Em Sex, 2016-09-02 às 15:34 -0600, Chris Murphy escreveu:
>> Except for your software build case, I have about the same workload
>> you have with two machines, one SSD one HDD, using 4.7.0 for a month,
>> and then 4.7.2 for the last week. I haven't had any enospc on these
>> two systems.
>>
>> I think for you the path of least resistance that also permits
>> further
>> testing is to see if you can track down the leap 42.2 beta kernel
>> which is 4.4.19-1-default. I'm not easily finding that particular
>> one,
>> but I did find something a bit more recent:
>> http://download.opensuse.org/repositories/Kernel:/openSUSE-42.2/stand
>> ard/x86_64/
>
> Unfortunately, it will not be possible since my actual hardware depends
> on kernel >= 4.6 :(

Worth a shot, considering the opensuse/SLE 4.4 kernel has a shittonne
of backports. It seems unlikely to me opensuse intends to not support
your hardware (skylake?)



>
> Just now, I saw the problem again. For the first time, it happened
> twice in a small period. I was copying the e-mail from one IMAP server
> to my local HD. I use offlineimap, but this time it changed the backend
> to sqlite and started to create tons of database files, I think. My HDD
> IO stayed at 60/70% for a very long period.
>
> Hence, let's do a review of situations in which I saw the problem:
>
> 1) Local builds using `osc`;
> 2) During `zypper dup`;
> 3) When offlineimap created tons of database files;
> 4) During rsync-ing /home;
> 4) During usage of a virtual machine (the disk image was in an EXT4
> partition).

I don't think there's anything remarkable about any of these. And I
even do VM stuff on Btrfs. I also don't think it's the drive.

What sounds possible is that the file system is now in some kind of
weird metadata state and keeps tripping up on that. There may be more
than one bug going on: one that gets it into this state, and another
that face plants with enospc when that state is encountered.




-- 
Chris Murphy

Re: BTRFS constantly reports "No space left on device" even with a huge unallocated space

2016-09-02 Thread Ronan Arraes Jardim Chagas

Hi!

Em Sex, 2016-09-02 às 15:34 -0600, Chris Murphy escreveu:
> Except for your software build case, I have about the same workload
> you have with two machines, one SSD one HDD, using 4.7.0 for a month,
> and then 4.7.2 for the last week. I haven't had any enospc on these
> two systems.
> 
> I think for you the path of least resistance that also permits
> further
> testing is to see if you can track down the leap 42.2 beta kernel
> which is 4.4.19-1-default. I'm not easily finding that particular
> one,
> but I did find something a bit more recent:
> http://download.opensuse.org/repositories/Kernel:/openSUSE-42.2/stand
> ard/x86_64/

Unfortunately, that will not be possible, since my current hardware depends
on kernel >= 4.6 :(

Just now, I saw the problem again. For the first time, it happened
twice within a short period. I was copying e-mail from an IMAP server
to my local HDD. I use offlineimap, but this time it switched the backend
to sqlite and started to create tons of database files, I think. My HDD
I/O stayed at 60-70% for a very long period.

Hence, let's review the situations in which I have seen the problem:

1) Local builds using `osc`;
2) During `zypper dup`;
3) When offlineimap created tons of database files;
4) While rsync-ing /home;
5) While using a virtual machine (the disk image was on an EXT4
partition).

I think we can conclude that this problem is tightly coupled with
actions that require a lot of writing to the HDD. Here is the
specification of my HDD:

hdparm -I /dev/sda

/dev/sda:

ATA device, with non-removable media
Model Number:   ST2000DM001-1CH164  
Serial Number:  W1E73CF5
Firmware Revision:  HP34
Transport:  Serial, SATA 1.0a, SATA II Extensions, SATA Rev 2.5, SATA Rev 2.6, SATA Rev 3.0
Standards:
Used: unknown (minor revision code 0x001f) 
Supported: 9 8 7 6 5 
Likely used: 9
Configuration:
Logical         max     current
cylinders       16383   16383
heads           16      16
sectors/track   63      63
--
CHS current addressable sectors:   16514064
LBA    user addressable sectors:  268435455
LBA48  user addressable sectors: 3907029168
Logical  Sector size:   512 bytes
Physical Sector size:  4096 bytes
Logical Sector-0 offset:  0 bytes
device size with M = 1024*1024: 1907729 MBytes
device size with M = 1000*1000: 2000398 MBytes (2000 GB)
cache/buffer size  = unknown
Form Factor: 3.5 inch
Nominal Media Rotation Rate: 7200
Capabilities:
LBA, IORDY(can be disabled)
Queue depth: 32
Standby timer values: spec'd by Standard, no device specific minimum
R/W multiple sector transfer: Max = 16  Current = ?
Advanced power management level: 128
Recommended acoustic management value: 208, current value: 0
DMA: mdma0 mdma1 mdma2 udma0 udma1 udma2 udma3 udma4 *udma5 
 Cycle time: min=120ns recommended=120ns
PIO: pio0 pio1 pio2 pio3 pio4 
 Cycle time: no flow control=120ns  IORDY flow control=120ns
Commands/features:
Enabled Supported:
   *SMART feature set
Security Mode feature set
   *Power Management feature set
   *Write cache
   *Look-ahead
   *WRITE_BUFFER command
   *READ_BUFFER command
   *DOWNLOAD_MICROCODE
   *Advanced Power Management feature set
Power-Up In Standby feature set
   *SET_FEATURES required to spinup after power up
   *48-bit Address feature set
   *Device Configuration Overlay feature set
   *Mandatory FLUSH_CACHE
   *FLUSH_CACHE_EXT
   *SMART error logging
   *SMART self-test
   *General Purpose Logging feature set
   *64-bit World wide name
   *WRITE_UNCORRECTABLE_EXT command
   *{READ,WRITE}_DMA_EXT_GPL commands
   *Segmented DOWNLOAD_MICROCODE
   *Gen1 signaling speed (1.5Gb/s)
   *Gen2 signaling speed (3.0Gb/s)
   *Gen3 signaling speed (6.0Gb/s)
   *Native Command Queueing (NCQ)
   *Phy event counters
   *READ_LOG_DMA_EXT equivalent to READ_LOG_EXT
   *DMA Setup Auto-Activate optimization
Device-initiated interface power management
   *Software settings preservation
   *SMART Command Transport (SCT) feature set
   *SCT Read/Write Long (AC1), obsolete
   *SCT Error Recovery Control (AC3)
   *SCT Features Control (AC4)
   *SCT Data Tables (AC5)
unknown 206[12] (vendor specific)
unknown 206[13] (vendor specific)
Secu

Re: BTRFS constantly reports "No space left on device" even with a huge unallocated space

2016-09-02 Thread Chris Murphy

On Fri, Sep 2, 2016 at 1:56 PM, Ronan Arraes Jardim Chagas
 wrote:
> Hi again guys!
>
> After I rebooted the computer, I still can't run balance on metatada:

Except for your software build case, I have about the same workload
you have with two machines, one SSD one HDD, using 4.7.0 for a month,
and then 4.7.2 for the last week. I haven't had any enospc on these
two systems.

I think for you the path of least resistance that also permits further
testing is to see if you can track down the leap 42.2 beta kernel
which is 4.4.19-1-default. I'm not easily finding that particular one,
but I did find something a bit more recent:
http://download.opensuse.org/repositories/Kernel:/openSUSE-42.2/standard/x86_64/

-- 
Chris Murphy

Re: BTRFS constantly reports "No space left on device" even with a huge unallocated space

2016-09-02 Thread Ronan Arraes Jardim Chagas

Hi again guys!

After I rebooted the computer, I still can't run balance on metadata:

btrfs balance start -musage=1 /
ERROR: error during balancing '/': No space left on device
There may be more info in syslog - try dmesg | tail

dmesg shows:

[ 2022.530285] BTRFS info (device sda6): relocating block group 128509280256 flags 36
[ 2023.355206] BTRFS info (device sda6): relocating block group 127972409344 flags 36
[ 2024.265313] BTRFS info (device sda6): relocating block group 127435538432 flags 36
[ 2025.646712] BTRFS info (device sda6): relocating block group 126898667520 flags 36
[ 2026.794791] BTRFS info (device sda6): relocating block group 126361796608 flags 36
[ 2028.023517] BTRFS info (device sda6): relocating block group 125824925696 flags 36
[ 2028.881287] BTRFS info (device sda6): relocating block group 125288054784 flags 36
[ 2029.739342] BTRFS info (device sda6): relocating block group 124751183872 flags 36
[ 2030.631990] BTRFS info (device sda6): relocating block group 124214312960 flags 36
[ 2031.523176] BTRFS info (device sda6): relocating block group 123677442048 flags 36
[ 2032.407859] BTRFS info (device sda6): relocating block group 123140571136 flags 36
[ 2033.806672] BTRFS info (device sda6): relocating block group 122603700224 flags 36
[ 2035.237712] BTRFS info (device sda6): relocating block group 122066829312 flags 36
[ 2038.257268] BTRFS info (device sda6): relocating block group 122033274880 flags 34
[ 2039.911443] BTRFS info (device sda6): relocating block group 121496403968 flags 36
[ 2040.958106] BTRFS info (device sda6): relocating block group 120959533056 flags 36
[ 2041.841051] BTRFS info (device sda6): relocating block group 120422662144 flags 36
[ 2042.828359] BTRFS info (device sda6): relocating block group 119885791232 flags 36
[ 2044.297744] BTRFS info (device sda6): relocating block group 119348920320 flags 36
[ 2045.684932] BTRFS info (device sda6): relocating block group 118812049408 flags 36
[ 2046.761787] BTRFS info (device sda6): relocating block group 118275178496 flags 36
[ 2048.200756] BTRFS info (device sda6): relocating block group 117738307584 flags 36
[ 2049.806986] BTRFS info (device sda6): relocating block group 117201436672 flags 36
[ 2051.170470] BTRFS info (device sda6): relocating block group 116664565760 flags 36
[ 2051.910536] BTRFS info (device sda6): relocating block group 116127694848 flags 36
[ 2052.678395] BTRFS info (device sda6): relocating block group 115590823936 flags 36
[ 2053.737959] BTRFS info (device sda6): relocating block group 106363355136 flags 36
[ 2054.852065] BTRFS info (device sda6): relocating block group 105826484224 flags 36
[ 2055.911187] BTRFS info (device sda6): relocating block group 105222504448 flags 36
[ 2057.047407] BTRFS info (device sda6): 4 enospc errors during balance
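
For readers wondering what "flags 36" and "flags 34" mean in the log above: the value is a bitmask of block group type and profile bits. The sketch below decodes it using the low bit values from the kernel's btrfs on-disk format definitions as I understand them (DATA=1, SYSTEM=2, METADATA=4, RAID0=8, RAID1=16, DUP=32); `decode_flags` is a hypothetical helper name, and higher profile bits are deliberately left out.

```python
# Low block group flag bits (type bits first, then profile bits).
# Only the bits needed to read the log above are listed.
BLOCK_GROUP_BITS = {
    1 << 0: "DATA",
    1 << 1: "SYSTEM",
    1 << 2: "METADATA",
    1 << 3: "RAID0",
    1 << 4: "RAID1",
    1 << 5: "DUP",
}


def decode_flags(flags):
    """Return the names of all known bits set in a block group flags value."""
    return [name for bit, name in BLOCK_GROUP_BITS.items() if flags & bit]


print(decode_flags(36))  # metadata block groups with the DUP profile
print(decode_flags(34))  # system block group with the DUP profile
```

Under that reading, the balance was walking metadata (and one system) block groups, which is what a `-musage=1` filter would be expected to touch.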

and I have:

btrfs fi usage /
Overall:
Device size:   1.26TiB
Device allocated:     80.07GiB
Device unallocated:    1.18TiB
Device missing:  0.00B
Used:     41.95GiB
Free (estimated):      1.18TiB  (min: 603.95GiB)
Data ratio:   1.00
Metadata ratio:   2.00
Global reserve:  352.00MiB  (used: 576.00KiB)

Data,single: Size:40.01GiB, Used:39.95GiB
   /dev/sda6  40.01GiB

Metadata,DUP: Size:20.00GiB, Used:1.00GiB
   /dev/sda6  40.00GiB

System,DUP: Size:32.00MiB, Used:16.00KiB
   /dev/sda6  64.00MiB

Unallocated:
   /dev/sda6   1.18TiB
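
As a sanity check on the numbers above, "Device allocated" can be reproduced by hand: each chunk type's Size is multiplied by its ratio (1 for single, 2 for DUP) and the results are summed. The sketch below does this for the output above and for the report sent earlier the same day, where Metadata,DUP was still Size:1.50GiB. `device_allocated` is a hypothetical helper, and the figures are copied from the two reports, so they carry the same rounding.

```python
def device_allocated(chunks):
    """Sum raw device usage in GiB: logical size times the number of copies."""
    return sum(size_gib * copies for size_gib, copies in chunks)


# (logical size in GiB, copies): Data,single / Metadata,DUP / System,DUP
earlier_report = [(40.01, 1), (1.50, 2), (32 / 1024, 2)]
after_balance = [(40.01, 1), (20.00, 2), (32 / 1024, 2)]

print(round(device_allocated(earlier_report), 2))  # compare with 43.07GiB
print(round(device_allocated(after_balance), 2))   # compare with 80.07GiB
```

Both totals agree with the tool, so the whole jump in allocated space comes from the metadata chunks growing from 1.50GiB to 20.00GiB while the metadata actually used stayed at 1.00GiB.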

Hope this brings new information!

Best regards,
Ronan Arraes

Re: BTRFS constantly reports "No space left on device" even with a huge unallocated space

2016-09-02 Thread Ronan Arraes Jardim Chagas

Hi guys!

Jeff was right. I had the problem again today and quotas are disabled
now. I couldn't get any useful message in log this time. Look at the
metadata:

btrfs fi usage /
Overall:
Device size:   1.26TiB
Device allocated:     43.07GiB
Device unallocated:    1.21TiB
Device missing:  0.00B
Used:     41.94GiB
Free (estimated):      1.21TiB  (min: 622.46GiB)
Data ratio:   1.00
Metadata ratio:   2.00
Global reserve:  352.00MiB  (used: 0.00B)

Data,single: Size:40.01GiB, Used:39.94GiB
   /dev/sda6  40.01GiB

Metadata,DUP: Size:1.50GiB, Used:1.00GiB
   /dev/sda6   3.00GiB

System,DUP: Size:32.00MiB, Used:16.00KiB
   /dev/sda6  64.00MiB

Unallocated:
   /dev/sda6   1.21TiB

Any ideas to help me?

Regards,
Ronan Arraes

Re: BTRFS constantly reports "No space left on device" even with a huge unallocated space

2016-09-02 Thread Jeff Mahoney

On 9/2/16 11:20 AM, Ronan Arraes Jardim Chagas wrote:
> Hi Jeff,
> 
> Em Sex, 2016-09-02 às 10:48 -0400, Jeff Mahoney escreveu:
>> Sorry, I miscommunicated there.  The WARN_ON is annoying.  It's the
>> underlying issue that's causing you to lose work that is the one that
>> concerns me.
>>  
> 
> Oh, OK, I see, sorry about that :)
> 
> Thus, if disabling quotas does not help to fix my problem, is there any
> workaround you can think of to avoid the problem you suggested in the
> previous e-mail?

Which part?  The quota reservation race will go away with quotas
disabled, so you won't get the WARN_ON.  The ENOSPC issue needs more
investigation before I can suggest a workaround/fix.  I won't be able to
get into that until Tuesday.  (Start of a holiday weekend in the US).

-Jeff

-- 
Jeff Mahoney
SUSE Labs


Re: BTRFS constantly reports "No space left on device" even with a huge unallocated space

2016-09-02 Thread Ronan Arraes Jardim Chagas

Hi Jeff,

Em Sex, 2016-09-02 às 10:48 -0400, Jeff Mahoney escreveu:
> Sorry, I miscommunicated there.  The WARN_ON is annoying.  It's the
> underlying issue that's causing you to lose work that is the one that
> concerns me.
> 

Oh, OK, I see, sorry about that :)

Thus, if disabling quotas does not help to fix my problem, is there any
workaround you can think of to avoid the problem you suggested in the
previous e-mail?

Best regards,
Ronan Arraes

Re: BTRFS constantly reports "No space left on device" even with a huge unallocated space

2016-09-02 Thread Jeff Mahoney

On 9/2/16 10:43 AM, Ronan Arraes Jardim Chagas wrote:
> Hi Jeff,
> 
> Em Sex, 2016-09-02 às 10:26 -0400, Jeff Mahoney escreveu:
>> I explained what I think Ronan's issue is in another part of the
>> thread
>> just now.  I don't think that's a severe issue at
>> all.  Annoying?  Sure,
>> but I'm more concerned with the underlying ENOSPC issue.  Without
>> more
>> info, I don't know what the cause of it is and when it was
>> introduced.
> 
> Sorry, but I really need to humbly disagree with you. Look to what has
> already happened to me when the problem occurred (which is almost every
> day):
> 
> 1) Firefox crash;
> 2) Libreoffice crash (auto-save stop working);
> 3) Can't save my work in any text editor (vim, neovim, gedit, etc.);
> 4) Sometimes I can't even log as root (in TTY or by `su`);
> 5) Sometimes only a hard-reset solves the problem;
> 6) I was left with a broken operational system when the problem
> occurred during a `zypper dup`.
> 
> I just can't tell you how much work I lost during those situations. So,
> I think we cannot call this issue just annoying. I think it is very
> severe.

Sorry, I miscommunicated there.  The WARN_ON is annoying.  It's the
underlying issue that's causing you to lose work that is the one that
concerns me.

-Jeff

-- 
Jeff Mahoney
SUSE Labs




Re: BTRFS constantly reports "No space left on device" even with a huge unallocated space

2016-09-02 Thread Ronan Arraes Jardim Chagas

Hi Jeff,

Em Sex, 2016-09-02 às 10:26 -0400, Jeff Mahoney escreveu:
> I explained what I think Ronan's issue is in another part of the
> thread
> just now.  I don't think that's a severe issue at
> all.  Annoying?  Sure,
> but I'm more concerned with the underlying ENOSPC issue.  Without
> more
> info, I don't know what the cause of it is and when it was
> introduced.

Sorry, but I really have to humbly disagree with you. Look at what has
already happened to me when the problem occurred (which is almost every
day):

1) Firefox crashes;
2) LibreOffice crashes (auto-save stops working);
3) I can't save my work in any text editor (vim, neovim, gedit, etc.);
4) Sometimes I can't even log in as root (on a TTY or via `su`);
5) Sometimes only a hard reset solves the problem;
6) I was left with a broken operating system when the problem
occurred during a `zypper dup`.

I just can't tell you how much work I have lost in those situations. So
I think we cannot call this issue just annoying. I think it is very
severe.

Best regards,
Ronan Arraes

Re: BTRFS constantly reports "No space left on device" even with a huge unallocated space

2016-09-02 Thread Jeff Mahoney

On 9/1/16 8:12 PM, Chris Murphy wrote:
> On Thu, Sep 1, 2016 at 12:47 PM, Austin S. Hemmelgarn
>  wrote:
> 
> 
>> 2. Snapper's default snapshot creation configuration is absolutely
>> pathological in nature, generating insane amounts of background resource
>> usage and taking up huge amounts of space.  If this were changed, you would
>> be a lot less dependent on being able to free up snapshots based on space
>> usage.
> 
> That's diplomatic.
> 
> They know all of this already though, but instead of toning down
> snapper defaults, they're amping up the voluming by enabling quotas
> instead.
> 
> There is only one logical reason for this that I can thing of. They're
> trying to increase problem reports, presumably in order to smooth out
> noisy data, maybe even by getting better bug reports like Ronan's. But
> I think this is a specious policy.

There's no conspiracy to leverage the openSUSE user base to generate bug
reports any more than enabling any other feature in Tumbleweed before
SLES is.  We've enabled qgroups by default so that snapper can make sane
decisions based on space usage.  That's it.

>> It's poor choices like this that fall into the category of 'Ooh, this looks
>> cool, let's do it!' made by major distros that are most of the reason that
>> BTRFS has such a bad reputation right now.
> 
> Over on Factory list, they're trying to have this two ways. First
> they're saying quotas are stable as they've implemented them in the
> Leap 4.4 kernel. And they consider the btrfs-progs man page warning
> that quotas aren't yet stable even in 4.7, and aren't recommended
> unless the user will use them, is a bug that should be removed from
> their copy of the man page.

Yep.  That's a bug in the man page.  We do consider them stable.  I see
every btrfs bug that gets reported against SLE12 SP2, upon which the
Leap kernel is based.  Have there been qgroups bugs over the development
cycle?  You bet.  There's a reason if you look at the commit log for
qgroups over the past year, you'll see a bunch of fixes from SUSE
developers.

I explained what I think Ronan's issue is in another part of the thread
just now.  I don't think that's a severe issue at all.  Annoying?  Sure,
but I'm more concerned with the underlying ENOSPC issue.  Without more
info, I don't know what the cause of it is and when it was introduced.

We, like every other group of file system developers, run xfstests
pretty religiously.  Since qgroups are becoming a bigger part of the
btrfs experience for our products, we test them specifically.  Yes,
there are xfstests /just/ for qgroups, but we also make it a point to
run the entire xfstests suite with and without qgroups enabled.  Since
the requirement for snapper was to have accurate space tracking, that's
what we've focused on.

I obviously can't open up the SLES bugzilla to the world, so you're
going to have to take my word on this.  For our 4.4-based kernel there
are currently 3 qgroup related bugs.  The first is a report about how
annoying it is to see old qgroup items for removed subvolumes.  The
second is an accounting bug that is old and the developer just hasn't
gotten around to closing it yet.  The third is a real issue, where users
can hit the qgroup limit and are then stuck, similar to how it used to
be when you'd hit ENOSPC and couldn't remove files or subvolumes.  My
gut feeling is that it's the same kind of problem:  Removing files
involves allocating blocks to CoW the metadata and when you've hit your
quota limit, you can't allocate the blocks.  I expect the solution will
be similar to the ENOSPC issue except that rather than keeping a pool
around, we can just CoW knowing full well the intention is to release
space.  My team is working on that today and I expect a fix shortly.

-Jeff

-- 
Jeff Mahoney
SUSE Labs


Re: BTRFS constantly reports "No space left on device" even with a huge unallocated space

2016-09-02 Thread Jeff Mahoney

On 8/31/16 4:49 PM, Ronan Arraes Jardim Chagas wrote:
> Hi guys!
> 
> And the problem happened again. This time, I was only using Mozilla
> Firefox. I could get the very first message after the error. I hope it
> brings more information:

Ok, so I think this is a race that can happen when one thread is
starting a transaction and another thread is committing a transaction
that involves creating a snapshot.

We reserve blocks at the top of start_transaction and that reservation
stays with the root.  In: btrfs_commit_transaction->
create_pending_snapshots-> create_pending_snapshot->
qgroup_account_snapshot-> commit_fs_roots, we clear that reservation
from the root via btrfs_qgroup_free_meta_all, potentially while
start_transaction is waiting to join a new transaction.  Or not.  It can
happen asynchronously, which is the point of having the reservation
prior to that.

So the thing is that this error can only occur if start_transaction
fails after this race occurs.  That, combined with your report that you
were seeing ENOSPC instead of EDQUOT, leads me to believe that this is
just a side effect of whatever is causing you to hit ENOSPC.  I
expect that you'll see it again -- you just won't see the WARN_ON
anymore since quotas are disabled.  I suspect it's probably the
btrfs_block_rsv_add call immediately after the reservation, but there's
no way to tell without tracing.
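
To make the ordering easier to follow, here is a toy replay of the sequence described above. It is emphatically not kernel code: `ToyRoot`, `replay` and the 16384-byte figure are invented for illustration, the two "threads" are just three calls in the problematic order, and the clamp-to-zero behaviour is taken from Qu's description elsewhere in this thread.

```python
import warnings


class ToyRoot:
    """Toy stand-in for a root's qgroup metadata reservation counter."""

    def __init__(self):
        self.reserved = 0

    def reserve(self, nbytes):
        self.reserved += nbytes

    def free_all(self):
        # In this model, the snapshot commit path drops the whole reservation.
        self.reserved = 0

    def free(self, nbytes):
        # Error path of the starter: give back what it reserved earlier.
        if nbytes > self.reserved:
            warnings.warn("freeing more than is reserved")
            nbytes = self.reserved  # clamp instead of going negative
        self.reserved -= nbytes


def replay():
    """Run the three steps in the racy order; return (reserved, warning count)."""
    root = ToyRoot()
    root.reserve(16384)  # thread A: reserve at the top of start_transaction
    root.free_all()      # thread B: commit with a pending snapshot clears it
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        root.free(16384)  # thread A: start_transaction fails, frees its share
    return root.reserved, len(caught)
```

In this model the warning only fires when the final `free` runs, which mirrors the point above: the race alone is silent, and the WARN_ON needs start_transaction to fail afterwards.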

-Jeff


> [28039.672199] ------------[ cut here ]------------
> [28039.672253] WARNING: CPU: 3 PID: 31800 at ../fs/btrfs/qgroup.c:2667
> btrfs_qgroup_free_meta+0x88/0x90 [btrfs]
> [28039.672255] Modules linked in: fuse nf_log_ipv6 xt_pkttype
> nf_log_ipv4 nf_log_common xt_LOG xt_limit af_packet iscsi_ibft
> iscsi_boot_sysfs msr ip6t_REJECT nf_reject_ipv6 xt_tcpudp
> nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_raw ipt_REJECT nf_reject_ipv4
> iptable_raw xt_CT nvidia_drm(PO) nvidia_modeset(PO) iptable_filter
> nvidia(PO) ip6table_mangle nf_conntrack_netbios_ns
> nf_conntrack_broadcast drm_kms_helper nf_conntrack_ipv4 drm
> nf_defrag_ipv4 fb_sys_fops snd_hda_codec_hdmi joydev
> snd_hda_codec_realtek ip_tables syscopyarea snd_hda_codec_generic
> xt_conntrack snd_hda_intel sysfillrect intel_rapl sb_edac edac_core
> snd_hda_codec hp_wmi x86_pkg_temp_thermal intel_powerclamp snd_hda_core
> snd_hwdep nf_conntrack sparse_keymap sysimgblt coretemp kvm_intel kvm
> rfkill irqbypass snd_pcm snd_timer crct10dif_pclmul
> [28039.672305]  e1000e crc32_pclmul ghash_clmulni_intel snd aesni_intel
> ip6table_filter aes_x86_64 lrw gf128mul glue_helper ablk_helper
> iTCO_wdt iTCO_vendor_support mei_wdt ioatdma pcspkr cryptd ip6_tables
> ptp lpc_ich fjes i2c_i801 dca mfd_core soundcore pps_core shpchp
> tpm_infineon tpm_tis tpm mei_me mei x_tables btrfs xor raid6_pq
> hid_generic usbhid crc32c_intel serio_raw xhci_pci ehci_pci sr_mod
> firewire_ohci xhci_hcd ehci_hcd cdrom firewire_core crc_itu_t isci
> usbcore usb_common libsas ata_generic mpt3sas raid_class
> scsi_transport_sas wmi button sg
> [28039.672373] CPU: 3 PID: 31800 Comm: gnome-terminal- Tainted:
> PW  O4.7.1-1-default #1
> [28039.672375] Hardware name: Hewlett-Packard HP Z820 Workstation/158B,
> BIOS J63 v03.65 12/19/2013
> [28039.672378]   81393104 
> 
> [28039.672382]  8107ca1e 881008780800 00014000
> 881008780800
> [28039.672386]  ffe4 88100b297c00 88053b7e3540
> a02c9f58
> [28039.672390] Call Trace:
> [28039.672406]  [] dump_trace+0x5e/0x320
> [28039.672413]  [] show_stack_log_lvl+0x10c/0x180
> [28039.672419]  [] show_stack+0x21/0x40
> [28039.672425]  [] dump_stack+0x5c/0x78
> [28039.672430]  [] __warn+0xbe/0xe0
> [28039.672461]  [] btrfs_qgroup_free_meta+0x88/0x90
> [btrfs]
> [28039.672492]  [] start_transaction+0x3c3/0x4f0
> [btrfs]
> [28039.672521]  [] btrfs_create+0x38/0x1d0 [btrfs]
> [28039.672528]  [] path_openat+0x139b/0x14a0
> [28039.672535]  [] do_filp_open+0x7e/0xe0
> [28039.672541]  [] do_sys_open+0x124/0x1f0
> [28039.672547]  []
> entry_SYSCALL_64_fastpath+0x1e/0xa8
> [28039.676186] DWARF2 unwinder stuck at
> entry_SYSCALL_64_fastpath+0x1e/0xa8


-- 
Jeff Mahoney
SUSE Labs




Re: BTRFS constantly reports "No space left on device" even with a huge unallocated space

2016-09-01 Thread Qu Wenruo




At 09/01/2016 05:44 AM, Chris Murphy wrote:
> On Wed, Aug 31, 2016 at 2:49 PM, Ronan Arraes Jardim Chagas wrote:
>> Hi guys!
>>
>> And the problem happened again. This time, I was only using Mozilla
>> Firefox. I could get the very first message after the error. I hope it
>> brings more information:
>>
>> [28039.672199] ------------[ cut here ]------------
>> [28039.672253] WARNING: CPU: 3 PID: 31800 at ../fs/btrfs/qgroup.c:2667
>> btrfs_qgroup_free_meta+0x88/0x90 [btrfs]
>
> Does this file system have quota enabled?
>
> I'm testing this right now and can't even figure out how to determine
> when quota is enabled on a Btrfs file system. There's enable, disable,
> and rescan. If it's enabled or disabled, I get the same message if I
> rescan. If I mount the file system with quota previously enabled,
> there is no mount time notification that quota is enabled.
>
> I sincerely hope opensuse isn't enabling quota by default.




The kernel warning is interesting.

It means qgroup is underflowing its reserved metadata space.
However, although it's a warning, it won't really underflow the number;
it just decreases it to zero.


It shows there is something wrong with metadata allocation, but won't 
directly cause quota corruption.


Quota uses two separate, isolated systems: an extent-based one for qgroup
numbers, and a reservation-based one for reserved space.

The latter is only used to prevent users from exceeding the qgroup limit,
and if the user doesn't set a limit, it won't cause any qgroup corruption or
ENOSPC.


Furthermore, if qgroup reserved space were what is going wrong, it
would return -EDQUOT, not -ENOSPC.
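
From user space, the two error codes can be told apart by inspecting `errno`. The sketch below assumes a Linux system and Python's standard `errno` module; the helper name `classify_write_error` and its message strings are made up for illustration.

```python
import errno
import os


def classify_write_error(exc: OSError) -> str:
    """Describe a failed write by its errno (helper name is hypothetical)."""
    if exc.errno is None:
        return "unknown error (no errno set)"
    if exc.errno == errno.EDQUOT:
        # What a quota/qgroup limit violation would be expected to return.
        return "quota limit exceeded (EDQUOT)"
    if exc.errno == errno.ENOSPC:
        # What the reports in this thread show.
        return "no space left on device (ENOSPC)"
    return f"other error: {os.strerror(exc.errno)}"
```

A caller would wrap the failing `open()` or `write()` in `try`/`except OSError as exc` and pass `exc` to the helper. Seeing ENOSPC rather than EDQUOT points away from qgroup limits.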


So, just as Wang suspected, there is something wrong with metadata 
allocation, causing the problem and triggering the qgroup warning.


Thanks,
Qu


--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: BTRFS constantly reports "No space left on device" even with a huge unallocated space

2016-09-01 Thread Chris Murphy

On Thu, Sep 1, 2016 at 12:47 PM, Austin S. Hemmelgarn
 wrote:

> 2. Snapper's default snapshot creation configuration is absolutely
> pathological in nature, generating insane amounts of background resource
> usage and taking up huge amounts of space.  If this were changed, you would
> be a lot less dependent on being able to free up snapshots based on space
> usage.

That's diplomatic.

They know all of this already though, but instead of toning down
snapper defaults, they're amping up the volume by enabling quotas.

There is only one logical reason for this that I can think of. They're
trying to increase problem reports, presumably in order to smooth out
noisy data, maybe even by getting better bug reports like Ronan's. But
I think this is a specious policy.

> It's poor choices like this that fall into the category of 'Ooh, this looks
> cool, let's do it!' made by major distros that are most of the reason that
> BTRFS has such a bad reputation right now.

Over on the Factory list, they're trying to have this both ways. First,
they're saying quotas are stable as they've implemented them in the
Leap 4.4 kernel. And they consider the btrfs-progs man page warning,
that quotas aren't yet stable even in 4.7 and aren't recommended
unless the user will use them, to be a bug that should be removed from
their copy of the man page.

So, what are they using? Pulling out such warnings doesn't make
upstream code backported to their 4.4 kernel magically stable. If
they're using out of tree quota code, fine, remove the warnings. But
then, what is this code? How does it interact with upstream kernels?

-- 
Chris Murphy

Re: BTRFS constantly reports "No space left on device" even with a huge unallocated space

2016-09-01 Thread Chris Murphy

On Thu, Sep 1, 2016 at 7:21 AM, Austin S. Hemmelgarn
 wrote:

> Yes, you can just run `btrfs quota disable /` and it should work.  This
> ironically reiterates that one of the bigger problems with BTRFS is that
> distros are enabling unstable and known broken features by default on
> install.  I was pretty much dumbfounded when I first learned that OpenSUSE
> is enabling BTRFS qgroups by default since they are known to not work
> reliably and cause all kinds of issues.

Yes, I've just confirmed this on the OpenSUSE Factory mailing list.
[1] This is default on Tumbleweed (devel) and Leap (stable), and also
SLE 12 SP2.

The feature that depends on it, and is actually enabling it, is snapper:
http://snapper.io/2016/05/18/space-aware-cleanup.html

That feature says "btrfs quota support looks mature enough" which is
big news to me. If it's that mature, why not make it the mkfs default?
Just turn it on for everyone out of the gate? And if it isn't that
mature, is it really appropriate for broad, by default, silent
deployment for opensuse stable, and SUSE enterprise? I'm surprised no
one said on this list that qgroups were stable enough for widespread
testing for list regulars first. It just suddenly ends up enabled
across three major distro outputs?

Even the fucking error messages were misleading. It wasn't until the
most recent call trace that qgroups was even considered as possibly
being related to this. How is it that busting a quota limit doesn't
cause a very explicit quota related message, rather than a generic
enospc?

[1] https://lists.opensuse.org/opensuse-factory/2016-09/msg00033.html

-- 
Chris Murphy

Re: BTRFS constantly reports "No space left on device" even with a huge unallocated space

2016-09-01 Thread Ronan Arraes Jardim Chagas

Hi Jeff,

On Thu, 2016-09-01 at 13:12 -0400, Jeff Mahoney wrote:
> It's not.  We use qgroups because that's the only way we can track how
> much space each subvolume is using, regardless of whether anyone wants
> to do enforcement.  When it's working properly, snapper can make use of
> that information to make informed decisions on how much space will
> actually be released when removing old snapshots.
> 

Given that, what am I losing by disabling qgroups here? Will I still
be able to recover my machine using snapshots (this has saved me two or
three times)?

Best regards,
Ronan Arraes
