Re: Regression with crc32c selection?

2018-07-23 Thread Patrik Lundquist
$ uname -a
Linux nas 4.17.0-1-amd64 #1 SMP Debian 4.17.8-1 (2018-07-20) x86_64 GNU/Linux

$ dmesg | grep Btrfs
[8.168408] Btrfs loaded, crc32c=crc32c-intel

$ lsmod | grep crc32
crc32_pclmul   16384  0
libcrc32c  16384  1 btrfs
crc32c_generic 16384  0
crc32c_intel   24576  2

$ grep CRC /boot/config-4.17.0-1-amd64
# CONFIG_PCIE_ECRC is not set
# CONFIG_W1_SLAVE_DS2433_CRC is not set
CONFIG_CRYPTO_CRC32C=m
CONFIG_CRYPTO_CRC32C_INTEL=m
CONFIG_CRYPTO_CRC32=m
CONFIG_CRYPTO_CRC32_PCLMUL=m
CONFIG_CRYPTO_CRCT10DIF=y
CONFIG_CRYPTO_CRCT10DIF_PCLMUL=m
CONFIG_CRC_CCITT=m
CONFIG_CRC16=m
CONFIG_CRC_T10DIF=y
CONFIG_CRC_ITU_T=m
CONFIG_CRC32=y
# CONFIG_CRC32_SELFTEST is not set
CONFIG_CRC32_SLICEBY8=y
# CONFIG_CRC32_SLICEBY4 is not set
# CONFIG_CRC32_SARWATE is not set
# CONFIG_CRC32_BIT is not set
# CONFIG_CRC4 is not set
CONFIG_CRC7=m
CONFIG_LIBCRC32C=m
CONFIG_CRC8=m


On Mon, 23 Jul 2018 at 16:14, Holger Hoffstätte wrote:
>
> Hi,
>
> While backporting a bunch of fixes to my own 4.16.x tree
> (4.17 had a few too many bugs for my taste) I also ended up merging:
>
> df91f56adce1f: libcrc32c: Add crc32c_impl function
> 9678c54388b6a: btrfs: Remove custom crc32c init code
>
> ..which AFAIK went into 4.17 and seemed harmless enough; after fixing up
> a trivial context conflict it builds, runs, all good..except that btrfs
> (apparently?) no longer uses the preferred crc32c-intel module, but the
> crc32c-generic one instead.
>
> In order to rule out any mistakes on my part I built 4.18.0-rc6 and it
> seems to have the same problem:
>
> Jul 23 15:55:09 ragnarok kernel: raid6: sse2x1   gen() 11267 MB/s
> Jul 23 15:55:09 ragnarok kernel: raid6: sse2x1   xor()  8110 MB/s
> Jul 23 15:55:09 ragnarok kernel: raid6: sse2x2   gen() 13409 MB/s
> Jul 23 15:55:09 ragnarok kernel: raid6: sse2x2   xor()  9137 MB/s
> Jul 23 15:55:09 ragnarok kernel: raid6: sse2x4   gen() 15884 MB/s
> Jul 23 15:55:09 ragnarok kernel: raid6: sse2x4   xor() 10579 MB/s
> Jul 23 15:55:09 ragnarok kernel: raid6: using algorithm sse2x4 gen() 15884 MB/s
> Jul 23 15:55:09 ragnarok kernel: raid6:  xor() 10579 MB/s, rmw enabled
> Jul 23 15:55:09 ragnarok kernel: raid6: using ssse3x2 recovery algorithm
> Jul 23 15:55:09 ragnarok kernel: xor: automatically using best checksumming function   avx
> Jul 23 15:55:09 ragnarok kernel: Btrfs loaded, crc32c=crc32c-generic
>
> I understand that the new crc32c_impl() function changed from
> crypto_tfm_alg_driver_name() to crypto_shash_driver_name() - could this
> be the reason? The module is loaded just fine, but apparently not used:
>
> $lsmod | grep crc32
> crc32_pclmul   16384  0
> crc32c_intel   24576  0
>
> In other words, is this supposed to happen or is my kernel config somehow
> no longer right? It worked before and doesn't look too wrong:
>
> $grep CRC /etc/kernels/kernel-config-x86_64-4.18.0-rc6
> # CONFIG_PCIE_ECRC is not set
> CONFIG_CRYPTO_CRC32C=y
> CONFIG_CRYPTO_CRC32C_INTEL=m
> CONFIG_CRYPTO_CRC32=m
> CONFIG_CRYPTO_CRC32_PCLMUL=m
> # CONFIG_CRYPTO_CRCT10DIF is not set
> CONFIG_CRC_CCITT=m
> CONFIG_CRC16=y
> # CONFIG_CRC_T10DIF is not set
> CONFIG_CRC_ITU_T=y
> CONFIG_CRC32=y
> # CONFIG_CRC32_SELFTEST is not set
> CONFIG_CRC32_SLICEBY8=y
> # CONFIG_CRC32_SLICEBY4 is not set
> # CONFIG_CRC32_SARWATE is not set
> # CONFIG_CRC32_BIT is not set
> # CONFIG_CRC4 is not set
> # CONFIG_CRC7 is not set
> CONFIG_LIBCRC32C=y
> # CONFIG_CRC8 is not set
>
> Ultimately btrfs (and everything else) works, but the process of how
> the kernel selects a crc32c implementation seems rather mysterious to me. :/
>
> Any insights welcome. If it's a regression I can gladly test fixes.
>
> cheers
> Holger
>


Re: Ongoing Btrfs stability issues

2018-03-13 Thread Patrik Lundquist
On 9 March 2018 at 20:05, Alex Adriaanse  wrote:
>
> Yes, we have PostgreSQL databases running these VMs that put a heavy I/O load 
> on these machines.

Dump the databases and recreate them with --data-checksums and the
Btrfs No_COW attribute.

You can add this to /etc/postgresql-common/createcluster.conf in
Debian/Ubuntu if you use pg_createcluster:
initdb_options = '--data-checksums'
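
A rough sketch of the Btrfs side (paths are placeholders, not from the
original setup; the important detail is that chattr +C only affects files
created after the flag is set, so set it on the empty data directory
before initdb runs):

# mkdir /path/to/new/pgdata
# chattr +C /path/to/new/pgdata
# lsattr -d /path/to/new/pgdata

Then point initdb (or pg_createcluster with the option above) at that
directory and restore the dump into the new cluster.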


Re: btrfs-progs - failed btrfs replace on RAID1 seems to have left things in a wrong state

2017-12-01 Thread Patrik Lundquist
On 1 December 2017 at 08:18, Duncan <1i5t5.dun...@cox.net> wrote:
>
> When udev sees a device it triggers
> a btrfs device scan, which lets btrfs know which devices belong to which
> individual btrfs.  But once it associates a device with a particular
> btrfs, there's nothing to unassociate it -- the only way to do that on
> a running kernel is to successfully complete a btrfs device remove or
> replacement... and your replace didn't complete due to error.
>
> Of course the other way to do it is to reboot, fresh kernel, fresh
> btrfs state, and it learns again what devices go with which btrfs
> when the appearing devices trigger the udev rule that triggers a
> btrfs scan.

Or reload the btrfs module.
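
A minimal sketch, assuming no other Btrfs filesystem is mounted (rmmod
refuses to unload the module while any Btrfs filesystem is in use):

# umount /mnt
# rmmod btrfs
# modprobe btrfs
# btrfs device scan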


Re: A partially failing disk in raid0 needs replacement

2017-11-14 Thread Patrik Lundquist
On 14 November 2017 at 09:36, Klaus Agnoletti  wrote:
>
> How do you guys think I should go about this?

I'd clone the disk with GNU ddrescue.

https://www.gnu.org/software/ddrescue/
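
Something along these lines (device names and the mapfile path are
placeholders; run it with the filesystem unmounted, e.g. from a rescue
system):

# ddrescue -f /dev/sdOLD /dev/sdNEW /root/rescue.map
# ddrescue -f -r3 /dev/sdOLD /dev/sdNEW /root/rescue.map

The first pass grabs everything that reads cleanly, the second retries the
bad areas a few times. The mapfile lets you interrupt and resume, and -f is
needed because the destination is a block device.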


Re: Please help with exact actions for raid1 hot-swap

2017-09-10 Thread Patrik Lundquist
On 10 September 2017 at 08:33, Marat Khalili <m...@rqc.ru> wrote:
> It doesn't need replaced disk to be readable, right?

Only enough to be mountable, which it already is, so your read errors
on /dev/sdb aren't a problem.

> Then what prevents same procedure to work without a spare bay?

It is basically the same procedure, but with a bunch of gotchas due to
bugs and odd behaviour. Having only one shot at it before the filesystem
can only be mounted read-only is especially problematic (this will be
fixed in Linux 4.14).


> --
>
> With Best Regards,
> Marat Khalili
>
> On September 9, 2017 1:29:08 PM GMT+03:00, Patrik Lundquist 
> <patrik.lundqu...@gmail.com> wrote:
>>On 9 September 2017 at 12:05, Marat Khalili <m...@rqc.ru> wrote:
>>> Forgot to add, I've got a spare empty bay if it can be useful here.
>>
>>That makes it much easier since you don't have to mount it degraded,
>>with the risks involved.
>>
>>Add and partition the disk.
>>
>># btrfs replace start /dev/sdb7 /dev/sdc(?)7 /mnt/data
>>
>>Remove the old disk when it is done.
>>
>>> --
>>>
>>> With Best Regards,
>>> Marat Khalili
>>>
>>> On September 9, 2017 10:46:10 AM GMT+03:00, Marat Khalili
>><m...@rqc.ru> wrote:
>>>>Dear list,
>>>>
>>>>I'm going to replace one hard drive (partition actually) of a btrfs
>>>>raid1. Can you please spell exactly what I need to do in order to get
>>>>my
>>>>filesystem working as RAID1 again after replacement, exactly as it
>>was
>>>>before? I saw some bad examples of drive replacement in this list so
>>I
>>>>afraid to just follow random instructions on wiki, and putting this
>>>>system out of action even temporarily would be very inconvenient.
>>>>
>>>>For this filesystem:
>>>>
>>>>> $ sudo btrfs fi show /dev/sdb7
>>>>> Label: 'data'  uuid: 37d3313a-e2ad-4b7f-98fc-a01d815952e0
>>>>> Total devices 2 FS bytes used 106.23GiB
>>>>> devid1 size 2.71TiB used 126.01GiB path /dev/sda7
>>>>> devid2 size 2.71TiB used 126.01GiB path /dev/sdb7
>>>>> $ grep /mnt/data /proc/mounts
>>>>> /dev/sda7 /mnt/data btrfs
>>>>> rw,noatime,space_cache,autodefrag,subvolid=5,subvol=/ 0 0
>>>>> $ sudo btrfs fi df /mnt/data
>>>>> Data, RAID1: total=123.00GiB, used=104.57GiB
>>>>> System, RAID1: total=8.00MiB, used=48.00KiB
>>>>> Metadata, RAID1: total=3.00GiB, used=1.67GiB
>>>>> GlobalReserve, single: total=512.00MiB, used=0.00B
>>>>> $ uname -a
>>>>> Linux host 4.4.0-93-generic #116-Ubuntu SMP Fri Aug 11 21:17:51 UTC
>>>>> 2017 x86_64 x86_64 x86_64 GNU/Linux
>>>>
>>>>I've got this in dmesg:
>>>>
>>>>> [Sep 8 20:31] ata6.00: exception Emask 0x0 SAct 0x7ecaa5ef SErr 0x0
>>>>> action 0x0
>>>>> [  +0.51] ata6.00: irq_stat 0x4008
>>>>> [  +0.29] ata6.00: failed command: READ FPDMA QUEUED
>>>>> [  +0.38] ata6.00: cmd 60/70:18:50:6c:f3/00:00:79:00:00/40 tag
>>3
>>>>> ncq 57344 in
>>>>>res 41/40:00:68:6c:f3/00:00:79:00:00/40
>>Emask
>>>>> 0x409 (media error) 
>>>>> [  +0.94] ata6.00: status: { DRDY ERR }
>>>>> [  +0.26] ata6.00: error: { UNC }
>>>>> [  +0.001195] ata6.00: configured for UDMA/133
>>>>> [  +0.30] sd 6:0:0:0: [sdb] tag#3 FAILED Result:
>>hostbyte=DID_OK
>>>>> driverbyte=DRIVER_SENSE
>>>>> [  +0.05] sd 6:0:0:0: [sdb] tag#3 Sense Key : Medium Error
>>>>> [current] [descriptor]
>>>>> [  +0.04] sd 6:0:0:0: [sdb] tag#3 Add. Sense: Unrecovered read
>>>>> error - auto reallocate failed
>>>>> [  +0.05] sd 6:0:0:0: [sdb] tag#3 CDB: Read(16) 88 00 00 00 00
>>00
>>>>
>>>>> 79 f3 6c 50 00 00 00 70 00 00
>>>>> [  +0.03] blk_update_request: I/O error, dev sdb, sector
>>>>2045996136
>>>>> [  +0.47] BTRFS error (device sda7): bdev /dev/sdb7 errs: wr 0,
>>>>rd
>>>>> 1, flush 0, corrupt 0, gen 0
>>>>> [  +0.62] BTRFS error (device sda7): bdev /dev/sdb7 errs: wr 0,
>>>>rd
>>>>> 2, flush 0, corrupt 0, gen 0
>>>>> [  +0.77] ata6: EH complete
>>>>
>>>>There's still 1 in Current_Pending_Sector line of smartctl output as
>>of
>>>>
>>>>now, so it probably won't heal by itself.
>>>>
>>>>--
>>>>
>>>>With Best Regards,
>>>>Marat Khalili
>>>>--
>>>>To unsubscribe from this list: send the line "unsubscribe
>>linux-btrfs"
>>>>in
>>>>the body of a message to majord...@vger.kernel.org
>>>>More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe
>>linux-btrfs" in
>>> the body of a message to majord...@vger.kernel.org
>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>--
>>To unsubscribe from this list: send the line "unsubscribe linux-btrfs"
>>in
>>the body of a message to majord...@vger.kernel.org
>>More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Please help with exact actions for raid1 hot-swap

2017-09-09 Thread Patrik Lundquist
On 9 September 2017 at 12:05, Marat Khalili  wrote:
> Forgot to add, I've got a spare empty bay if it can be useful here.

That makes it much easier since you don't have to mount it degraded,
with the risks involved.

Add and partition the disk.

# btrfs replace start /dev/sdb7 /dev/sdc(?)7 /mnt/data

Remove the old disk when it is done.

> --
>
> With Best Regards,
> Marat Khalili
>
> On September 9, 2017 10:46:10 AM GMT+03:00, Marat Khalili  wrote:
>>Dear list,
>>
>>I'm going to replace one hard drive (partition actually) of a btrfs
>>raid1. Can you please spell exactly what I need to do in order to get
>>my
>>filesystem working as RAID1 again after replacement, exactly as it was
>>before? I saw some bad examples of drive replacement in this list so I
>>afraid to just follow random instructions on wiki, and putting this
>>system out of action even temporarily would be very inconvenient.
>>
>>For this filesystem:
>>
>>> $ sudo btrfs fi show /dev/sdb7
>>> Label: 'data'  uuid: 37d3313a-e2ad-4b7f-98fc-a01d815952e0
>>> Total devices 2 FS bytes used 106.23GiB
>>> devid1 size 2.71TiB used 126.01GiB path /dev/sda7
>>> devid2 size 2.71TiB used 126.01GiB path /dev/sdb7
>>> $ grep /mnt/data /proc/mounts
>>> /dev/sda7 /mnt/data btrfs
>>> rw,noatime,space_cache,autodefrag,subvolid=5,subvol=/ 0 0
>>> $ sudo btrfs fi df /mnt/data
>>> Data, RAID1: total=123.00GiB, used=104.57GiB
>>> System, RAID1: total=8.00MiB, used=48.00KiB
>>> Metadata, RAID1: total=3.00GiB, used=1.67GiB
>>> GlobalReserve, single: total=512.00MiB, used=0.00B
>>> $ uname -a
>>> Linux host 4.4.0-93-generic #116-Ubuntu SMP Fri Aug 11 21:17:51 UTC
>>> 2017 x86_64 x86_64 x86_64 GNU/Linux
>>
>>I've got this in dmesg:
>>
>>> [Sep 8 20:31] ata6.00: exception Emask 0x0 SAct 0x7ecaa5ef SErr 0x0
>>> action 0x0
>>> [  +0.51] ata6.00: irq_stat 0x4008
>>> [  +0.29] ata6.00: failed command: READ FPDMA QUEUED
>>> [  +0.38] ata6.00: cmd 60/70:18:50:6c:f3/00:00:79:00:00/40 tag 3
>>> ncq 57344 in
>>>res 41/40:00:68:6c:f3/00:00:79:00:00/40 Emask
>>> 0x409 (media error) 
>>> [  +0.94] ata6.00: status: { DRDY ERR }
>>> [  +0.26] ata6.00: error: { UNC }
>>> [  +0.001195] ata6.00: configured for UDMA/133
>>> [  +0.30] sd 6:0:0:0: [sdb] tag#3 FAILED Result: hostbyte=DID_OK
>>> driverbyte=DRIVER_SENSE
>>> [  +0.05] sd 6:0:0:0: [sdb] tag#3 Sense Key : Medium Error
>>> [current] [descriptor]
>>> [  +0.04] sd 6:0:0:0: [sdb] tag#3 Add. Sense: Unrecovered read
>>> error - auto reallocate failed
>>> [  +0.05] sd 6:0:0:0: [sdb] tag#3 CDB: Read(16) 88 00 00 00 00 00
>>
>>> 79 f3 6c 50 00 00 00 70 00 00
>>> [  +0.03] blk_update_request: I/O error, dev sdb, sector
>>2045996136
>>> [  +0.47] BTRFS error (device sda7): bdev /dev/sdb7 errs: wr 0,
>>rd
>>> 1, flush 0, corrupt 0, gen 0
>>> [  +0.62] BTRFS error (device sda7): bdev /dev/sdb7 errs: wr 0,
>>rd
>>> 2, flush 0, corrupt 0, gen 0
>>> [  +0.77] ata6: EH complete
>>
>>There's still 1 in Current_Pending_Sector line of smartctl output as of
>>
>>now, so it probably won't heal by itself.
>>
>>--
>>
>>With Best Regards,
>>Marat Khalili


Re: Please help with exact actions for raid1 hot-swap

2017-09-09 Thread Patrik Lundquist
On 9 September 2017 at 09:46, Marat Khalili  wrote:
>
> Dear list,
>
> I'm going to replace one hard drive (partition actually) of a btrfs raid1. 
> Can you please spell exactly what I need to do in order to get my filesystem 
> working as RAID1 again after replacement, exactly as it was before? I saw 
> some bad examples of drive replacement in this list so I afraid to just 
> follow random instructions on wiki, and putting this system out of action 
> even temporarily would be very inconvenient.


I recently replaced both disks in a two disk Btrfs raid1 to increase
capacity and took some notes.

Using systemd? systemd will automatically unmount a degraded disk and
ruin your one chance to replace it, as long as Btrfs still has the bug
where it notes single chunks plus one missing disk and then refuses to
mount degraded again.

Comment out your mount in fstab and run "systemctl daemon-reload". The
mount file in /var/run/systemd/generator/ will be removed. (Is there a
better way?)
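
Roughly like this (mount point and options are just an example):

In /etc/fstab:
#UUID=... /mnt/data btrfs noatime 0 0

# systemctl daemon-reload
# systemctl list-units --type=mount

The last command is just to confirm the generated mnt-data.mount unit is
gone.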

Unmount the volume.

# hdparm -Y /dev/sdb
# echo 1 > /sys/block/sdb/device/delete

Replace the disk. Create partitions etc. You might have to restart
smartd, if using it.

Make Btrfs forget the old device; otherwise it will think the old disk is
still there. (Is there a better way?)
# rmmod btrfs; modprobe btrfs
# btrfs device scan

# mount -o degraded /dev/sda7 /mnt/data
# btrfs device usage /mnt/data

# btrfs replace start  /dev/sdbX /mnt/data
# btrfs replace status /mnt/data

Convert single or dup chunks to raid1
# btrfs balance start -fv -dconvert=raid1,soft -mconvert=raid1,soft
-sconvert=raid1,soft /mnt/data

Unmount, restore fstab, reload systemd again, mount.

>
> For this filesystem:
>
>> $ sudo btrfs fi show /dev/sdb7
>> Label: 'data'  uuid: 37d3313a-e2ad-4b7f-98fc-a01d815952e0
>> Total devices 2 FS bytes used 106.23GiB
>> devid1 size 2.71TiB used 126.01GiB path /dev/sda7
>> devid2 size 2.71TiB used 126.01GiB path /dev/sdb7
>> $ grep /mnt/data /proc/mounts
>> /dev/sda7 /mnt/data btrfs 
>> rw,noatime,space_cache,autodefrag,subvolid=5,subvol=/ 0 0
>> $ sudo btrfs fi df /mnt/data
>> Data, RAID1: total=123.00GiB, used=104.57GiB
>> System, RAID1: total=8.00MiB, used=48.00KiB
>> Metadata, RAID1: total=3.00GiB, used=1.67GiB
>> GlobalReserve, single: total=512.00MiB, used=0.00B
>> $ uname -a
>> Linux host 4.4.0-93-generic #116-Ubuntu SMP Fri Aug 11 21:17:51 UTC 2017 
>> x86_64 x86_64 x86_64 GNU/Linux
>
>
> I've got this in dmesg:
>
>> [Sep 8 20:31] ata6.00: exception Emask 0x0 SAct 0x7ecaa5ef SErr 0x0 action 
>> 0x0
>> [  +0.51] ata6.00: irq_stat 0x4008
>> [  +0.29] ata6.00: failed command: READ FPDMA QUEUED
>> [  +0.38] ata6.00: cmd 60/70:18:50:6c:f3/00:00:79:00:00/40 tag 3 ncq 
>> 57344 in
>>res 41/40:00:68:6c:f3/00:00:79:00:00/40 Emask 0x409 
>> (media error) 
>> [  +0.94] ata6.00: status: { DRDY ERR }
>> [  +0.26] ata6.00: error: { UNC }
>> [  +0.001195] ata6.00: configured for UDMA/133
>> [  +0.30] sd 6:0:0:0: [sdb] tag#3 FAILED Result: hostbyte=DID_OK 
>> driverbyte=DRIVER_SENSE
>> [  +0.05] sd 6:0:0:0: [sdb] tag#3 Sense Key : Medium Error [current] 
>> [descriptor]
>> [  +0.04] sd 6:0:0:0: [sdb] tag#3 Add. Sense: Unrecovered read error - 
>> auto reallocate failed
>> [  +0.05] sd 6:0:0:0: [sdb] tag#3 CDB: Read(16) 88 00 00 00 00 00 79 f3 
>> 6c 50 00 00 00 70 00 00
>> [  +0.03] blk_update_request: I/O error, dev sdb, sector 2045996136
>> [  +0.47] BTRFS error (device sda7): bdev /dev/sdb7 errs: wr 0, rd 1, 
>> flush 0, corrupt 0, gen 0
>> [  +0.62] BTRFS error (device sda7): bdev /dev/sdb7 errs: wr 0, rd 2, 
>> flush 0, corrupt 0, gen 0
>> [  +0.77] ata6: EH complete
>
>
> There's still 1 in Current_Pending_Sector line of smartctl output as of now, 
> so it probably won't heal by itself.
>
> --
>
> With Best Regards,
> Marat Khalili


[PATCH] btrfs-progs: device usage: don't calculate slack on missing device

2017-08-31 Thread Patrik Lundquist
Print "Device slack:  0.00B"
instead of "Device slack:   16.00EiB"

Signed-off-by: Patrik Lundquist <patrik.lundqu...@gmail.com>
---
 cmds-fi-usage.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/cmds-fi-usage.c b/cmds-fi-usage.c
index 101a0c4..6c846c1 100644
--- a/cmds-fi-usage.c
+++ b/cmds-fi-usage.c
@@ -1040,6 +1040,7 @@ void print_device_sizes(struct device_info *devinfo, unsigned unit_mode)
pretty_size_mode(devinfo->device_size, unit_mode));
printf("   Device slack: %*s%10s\n",
(int)(20 - strlen("Device slack")), "",
-   pretty_size_mode(devinfo->device_size - devinfo->size,
+   pretty_size_mode(devinfo->device_size > 0 ?
+   devinfo->device_size - devinfo->size : 0,
unit_mode));
 }
-- 
2.14.1



Re: checksum error in metadata node - best way to move root fs to new drive?

2016-08-12 Thread Patrik Lundquist
On 10 August 2016 at 23:21, Chris Murphy  wrote:
>
> I'm using LUKS, aes xts-plain64, on six devices. One is using mixed-bg
> single device. One is dsingle mdup. And then 2x2 mraid1 draid1. I've
> had zero problems. The two computers these run on do have aesni
> support. Aging wise, they're all at least a  year old. But I've been
> using Btrfs on LUKS for much longer than that.

FWIW:
I've had 5 spinning disks with LUKS + Btrfs raid1 for 1.5 years.
Also xts-plain64 with AES-NI acceleration.
No problems so far. Not using Btrfs compression.


Re: Status of SMR with BTRFS

2016-07-21 Thread Patrik Lundquist
On 21 July 2016 at 15:34, Chris Murphy  wrote:
>
> Do programs have a way to communicate what portion of a data file is
> modified, so that only changed blocks are COW'd? When I change a
> single pixel in a 400MiB image and do a save (to overwrite the
> original file), it takes just as long to overwrite as to write it out
> as a new file. It'd be neat if that could be optimized but I don't see
> it being the case at the moment.

Programs can choose to seek within a file and only overwrite changed
parts, like BitTorrent (use NOCOW or defrag files like that).

Paint programs usually compress the changed image on save, so most of
the file is changed anyway. But if it's a raw image file, just writing
the changed pixels should work, but that would require a comparison
with the original image (or a pixel change history), so I doubt
anyone cares to implement it at the application level.
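
To illustrate the NOCOW suggestion above, a sketch with a made-up path
(new files inherit the flag from the directory; existing files have to be
copied in to get it):

$ mkdir ~/torrents
$ chattr +C ~/torrents
$ lsattr -d ~/torrents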


Re: btrfs ate my data in just two days, after a fresh install. ram and disk are ok. it still mounts, but I cannot repair

2016-05-08 Thread Patrik Lundquist
On 7 May 2016 at 18:11, Niccolò Belli  wrote:

> Which kind of hardware issue? I did a full memtest86 check, a full 
> smartmontools extended check and even a badblocks -wsv.
> If this is really an hardware issue that we can identify I would be more than 
> happy because Dell will replace my laptop and this nightmare will be finally 
> over. I'm open to suggestions.


Well, your hardware differs from a lot of successful installations.
Are you using any power management tweaks?


Re: scrub: Tree block spanning stripes, ignored

2016-04-07 Thread Patrik Lundquist
On 7 April 2016 at 17:33, Ivan P  wrote:
>
> After running btrfsck --readonly again, the output is:
>
> ===
> Checking filesystem on /dev/sdb
> UUID: 013cda95-8aab-4cb2-acdd-2f0f78036e02
> checking extents
> checking free space cache
> block group 632463294464 has wrong amount of free space
> failed to load free space cache for block group 632463294464

Mount once with option "clear_cache" and check again.
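
Something like this, using the device from your output above:

# mount -o clear_cache /dev/sdb /mnt
# umount /mnt
# btrfsck --readonly /dev/sdb

clear_cache invalidates the free space cache and it is rebuilt as the
filesystem is used.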


[PATCH] btrfs-progs: device stats: Print devid instead of null

2016-04-05 Thread Patrik Lundquist
Print e.g. "[devid:4].write_io_errs   6" instead of
"[(null)].write_io_errs   6" when device is missing.

Signed-off-by: Patrik Lundquist <patrik.lundqu...@gmail.com>
---
 cmds-device.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/cmds-device.c b/cmds-device.c
index b17b6c6..7616c43 100644
--- a/cmds-device.c
+++ b/cmds-device.c
@@ -447,6 +447,13 @@ static int cmd_device_stats(int argc, char **argv)
 
canonical_path = canonicalize_path((char *)path);
 
+   /* No path when device is missing. */
+   if (!canonical_path) {
+   canonical_path = malloc(32);
+   snprintf(canonical_path, 32,
+"devid:%llu", args.devid);
+   }
+
if (args.nr_items >= BTRFS_DEV_STAT_WRITE_ERRS + 1)
printf("[%s].write_io_errs   %llu\n",
   canonical_path,
-- 
2.8.0.rc3



Re: bad metadata crossing stripe boundary

2016-04-02 Thread Patrik Lundquist
On 2 April 2016 at 20:31, Kai Krakow  wrote:
> Am Sat, 2 Apr 2016 11:44:32 +0200
> schrieb Marc Haber :
>
>> On Sat, Apr 02, 2016 at 11:03:53AM +0200, Kai Krakow wrote:
>> > Am Fri, 1 Apr 2016 07:57:25 +0200
>> > schrieb Marc Haber :
>> > > On Thu, Mar 31, 2016 at 11:16:30PM +0200, Kai Krakow wrote:
>>  [...]
>>  [...]
>>  [...]
>> > >
>> > > I cryptsetup luksFormat'ted the partition before I mkfs.btrfs'ed
>> > > it. That should do a much better job than wipefsing it, shouldnt
>> > > it?
>> >
>> > Not sure how luksFormat works. If it encrypts what is already on the
>> > device, it would also encrypt orphan superblocks.
>>
>> It overwrites the LUKS metadata including the symmetric key that was
>> used to encrypt the existing data. Short of Shor's Algorithm and
>> Quantum Computers, after that operation it is no longer possible to
>> even guess what was on the disk before.
>
> If it was encrypted before... ;-)

What does wipefs -n find?


Re: attempt to mount after crash during rebalance hard crashes server

2016-03-29 Thread Patrik Lundquist
On 29 March 2016 at 22:46, Chris Murphy  wrote:
> On Tue, Mar 29, 2016 at 2:21 PM, Warren, Daniel
>  wrote:
>> Greetings all,
>>
>> I'm running 4.4.0 from deb sid
>>
>> btrfs fi sh http://pastebin.com/QLTqSU8L
>> kernel panic http://pastebin.com/aBF6XmzA
>
> Panic shows:
> CPU: 0 PID: 153 Comm: kworker/u8:13 Not tainted 3.16-2-amd64 #1 Debian 
> 3.16.3-2

That kernel is from 2014-09-20, long before even Jessie was released.

Current Sid is 4.4.6.


Re: Possible Raid Bug

2016-03-28 Thread Patrik Lundquist
On 28 March 2016 at 05:54, Anand Jain <anand.j...@oracle.com> wrote:
>
> On 03/26/2016 07:51 PM, Patrik Lundquist wrote:
>>
>> # btrfs device stats /mnt
>>
>> [/dev/sde].write_io_errs   11
>> [/dev/sde].read_io_errs0
>> [/dev/sde].flush_io_errs   2
>> [/dev/sde].corruption_errs 0
>> [/dev/sde].generation_errs 0
>>
>> The old counters are back. That's good, but wtf?
>
>
>  No. I doubt if they are old counters. The steps above didn't
>  show old error counts, but since you have created a file
>  test3 so there will be some write_io_errors, which we don't
>  see after the balance. So I doubt if they are old counter
>  but instead they are new flush errors.

No, /mnt/test3 doesn't generate errors, only 'single' block groups.
The old counters seem to be cached somewhere and replace doesn't reset
them everywhere.

One more time with more device stats and I've upgraded the kernel to
Linux debian 4.5.0-trunk-amd64 #1 SMP Debian 4.5-1~exp1 (2016-03-20)
x86_64 GNU/Linux

# mkfs.btrfs -m raid10 -d raid10 /dev/sdb /dev/sdc /dev/sdd /dev/sde

# mount /dev/sdb /mnt; dmesg | tail
# touch /mnt/test1; sync; btrfs device usage /mnt

Only raid10 profiles.

# echo 1 >/sys/block/sde/device/delete; dmesg | tail

[  426.831037] sd 5:0:0:0: [sde] Synchronizing SCSI cache
[  426.831517] sd 5:0:0:0: [sde] Stopping disk
[  426.845199] ata6.00: disabled

We lost a disk.

# touch /mnt/test2; sync; dmesg | tail

[  467.126471] BTRFS error (device sde): bdev /dev/sde errs: wr 1, rd
0, flush 0, corrupt 0, gen 0
[  467.127386] BTRFS error (device sde): bdev /dev/sde errs: wr 2, rd
0, flush 0, corrupt 0, gen 0
[  467.128125] BTRFS error (device sde): bdev /dev/sde errs: wr 3, rd
0, flush 0, corrupt 0, gen 0
[  467.128640] BTRFS error (device sde): bdev /dev/sde errs: wr 4, rd
0, flush 0, corrupt 0, gen 0
[  467.129215] BTRFS error (device sde): bdev /dev/sde errs: wr 4, rd
0, flush 1, corrupt 0, gen 0
[  467.129331] BTRFS warning (device sde): lost page write due to IO
error on /dev/sde
[  467.129334] BTRFS error (device sde): bdev /dev/sde errs: wr 5, rd
0, flush 1, corrupt 0, gen 0
[  467.129420] BTRFS warning (device sde): lost page write due to IO
error on /dev/sde
[  467.129422] BTRFS error (device sde): bdev /dev/sde errs: wr 6, rd
0, flush 1, corrupt 0, gen 0

We've got write errors on the lost disk.

# btrfs device usage /mnt

No 'single' profiles because we haven't remounted yet.

# btrfs device stat /mnt

[/dev/sde].write_io_errs   6
[/dev/sde].read_io_errs0
[/dev/sde].flush_io_errs   1
[/dev/sde].corruption_errs 0
[/dev/sde].generation_errs 0

# reboot
# wipefs -a /dev/sde; reboot

# mount -o degraded /dev/sdb /mnt; dmesg | tail

[   52.876897] BTRFS info (device sdb): allowing degraded mounts
[   52.876901] BTRFS info (device sdb): disk space caching is enabled
[   52.876902] BTRFS: has skinny extents
[   52.878008] BTRFS warning (device sdb): devid 4 uuid
231d7892-3f31-40b5-8dff-baf8fec1a8aa is missing
[   52.879057] BTRFS info (device sdb): bdev (null) errs: wr 6, rd 0,
flush 1, corrupt 0, gen 0

# btrfs device usage /mnt

Still only raid10 profiles.

# btrfs device stat /mnt

[(null)].write_io_errs   6
[(null)].read_io_errs0
[(null)].flush_io_errs   1
[(null)].corruption_errs 0
[(null)].generation_errs 0

/dev/sde is now called "(null)". Print device id instead? E.g.
"[devid:4].write_io_errs   6"

# touch /mnt/test3; sync; btrfs device usage /mnt
/dev/sdb, ID: 1
   Device size: 2.00GiB
   Data,single:   624.00MiB
   Data,RAID10:   102.38MiB
   Metadata,RAID10:   102.38MiB
   System,RAID10:   4.00MiB
   Unallocated: 1.19GiB

/dev/sdc, ID: 2
   Device size: 2.00GiB
   Data,RAID10:   102.38MiB
   Metadata,RAID10:   102.38MiB
   System,single:  32.00MiB
   System,RAID10:   4.00MiB
   Unallocated: 1.76GiB

/dev/sdd, ID: 3
   Device size: 2.00GiB
   Data,RAID10:   102.38MiB
   Metadata,single:   256.00MiB
   Metadata,RAID10:   102.38MiB
   System,RAID10:   4.00MiB
   Unallocated: 1.55GiB

missing, ID: 4
   Device size:   0.00B
   Data,RAID10:   102.38MiB
   Metadata,RAID10:   102.38MiB
   System,RAID10:   4.00MiB
   Unallocated: 1.80GiB

Now we've got 'single' profiles on all devices except the missing one.
Replace missing device before unmount or get stuck with a read-only
filesystem.

# btrfs device stat /mnt

Same as before. Only old errors on the missing device.

# btrfs replace start -B 4 /dev/sde /mnt; dmesg | tail

[ 1268.598652] BTRFS info (device sdb): dev_replace from  (devid 4) to /dev/sde started
[ 1268.615601] BTRFS info (device sdb): dev_replace from  (devid 4) to /dev/sde finished

# btrfs device stats /mnt

[/dev/sde].write_io_errs   0
[/dev/sde].read_io_errs0
[/dev/sde].flush_io_errs   0

Re: Possible Raid Bug

2016-03-26 Thread Patrik Lundquist
So with the lessons learned:

# mkfs.btrfs -m raid10 -d raid10 /dev/sdb /dev/sdc /dev/sdd /dev/sde

# mount /dev/sdb /mnt; dmesg | tail
# touch /mnt/test1; sync; btrfs device usage /mnt

Only raid10 profiles.

# echo 1 >/sys/block/sde/device/delete

We lost a disk.

# touch /mnt/test2; sync; dmesg | tail

We've got write errors.

# btrfs device usage /mnt

No 'single' profiles because we haven't remounted yet.

# reboot
# wipefs -a /dev/sde; reboot

# mount -o degraded /dev/sdb /mnt; dmesg | tail
# btrfs device usage /mnt

Still only raid10 profiles.

# touch /mnt/test3; sync; btrfs device usage /mnt

Now we've got 'single' profiles. Replace now or get hosed.

# btrfs replace start -B 4 /dev/sde /mnt; dmesg | tail

# btrfs device stats /mnt

[/dev/sde].write_io_errs   0
[/dev/sde].read_io_errs0
[/dev/sde].flush_io_errs   0
[/dev/sde].corruption_errs 0
[/dev/sde].generation_errs 0

We didn't inherit the /dev/sde error count. Is that a bug?

# btrfs balance start -dconvert=raid10,soft -mconvert=raid10,soft
-sconvert=raid10,soft -vf /mnt; dmesg | tail

# btrfs device usage /mnt

Back to only 'raid10' profiles.

# umount /mnt; mount /dev/sdb /mnt; dmesg | tail

# btrfs device stats /mnt

[/dev/sde].write_io_errs   11
[/dev/sde].read_io_errs0
[/dev/sde].flush_io_errs   2
[/dev/sde].corruption_errs 0
[/dev/sde].generation_errs 0

The old counters are back. That's good, but wtf?

# btrfs device stats -z /dev/sde

Give /dev/sde a clean bill of health. Won't warn when mounting again.


Re: Possible Raid Bug

2016-03-25 Thread Patrik Lundquist
On 25 March 2016 at 18:20, Stephen Williams  wrote:
>
> Your information below was very helpful and I was able to recreate the
> Raid array. However my initial question still stands - What if the
> drives dies completely? I work in a Data center and we see this quite a
> lot where a drive is beyond dead - The OS will literally not detect it.

That's currently a weakness of Btrfs. I don't know how people deal
with it in production. I think Anand Jain is working on improving it.

> At this point would the Raid10 array be beyond repair? As you need the
> drive present in order to mount the array in degraded mode.

Right... let's try it again but a little bit differently.

# mount /dev/sdb /mnt

Let's drop the disk.

# echo 1 >/sys/block/sde/device/delete

[ 3669.024256] sd 5:0:0:0: [sde] Synchronizing SCSI cache
[ 3669.024934] sd 5:0:0:0: [sde] Stopping disk
[ 3669.037028] ata6.00: disabled

# touch /mnt/test3
# sync

[ 3845.960839] BTRFS error (device sdb): bdev /dev/sde errs: wr 1, rd
0, flush 0, corrupt 0, gen 0
[ 3845.961525] BTRFS error (device sdb): bdev /dev/sde errs: wr 2, rd
0, flush 0, corrupt 0, gen 0
[ 3845.962738] BTRFS error (device sdb): bdev /dev/sde errs: wr 3, rd
0, flush 0, corrupt 0, gen 0
[ 3845.963038] BTRFS error (device sdb): bdev /dev/sde errs: wr 4, rd
0, flush 0, corrupt 0, gen 0
[ 3845.963422] BTRFS error (device sdb): bdev /dev/sde errs: wr 4, rd
0, flush 1, corrupt 0, gen 0
[ 3845.963686] BTRFS warning (device sdb): lost page write due to IO
error on /dev/sde
[ 3845.963691] BTRFS error (device sdb): bdev /dev/sde errs: wr 5, rd
0, flush 1, corrupt 0, gen 0
[ 3845.963932] BTRFS warning (device sdb): lost page write due to IO
error on /dev/sde
[ 3845.963941] BTRFS error (device sdb): bdev /dev/sde errs: wr 6, rd
0, flush 1, corrupt 0, gen 0

# umount /mnt

[ 4095.276831] BTRFS error (device sdb): bdev /dev/sde errs: wr 7, rd
0, flush 1, corrupt 0, gen 0
[ 4095.278368] BTRFS error (device sdb): bdev /dev/sde errs: wr 8, rd
0, flush 1, corrupt 0, gen 0
[ 4095.279152] BTRFS error (device sdb): bdev /dev/sde errs: wr 8, rd
0, flush 2, corrupt 0, gen 0
[ 4095.279373] BTRFS warning (device sdb): lost page write due to IO
error on /dev/sde
[ 4095.279377] BTRFS error (device sdb): bdev /dev/sde errs: wr 9, rd
0, flush 2, corrupt 0, gen 0
[ 4095.279609] BTRFS warning (device sdb): lost page write due to IO
error on /dev/sde
[ 4095.279612] BTRFS error (device sdb): bdev /dev/sde errs: wr 10, rd
0, flush 2, corrupt 0, gen 0

# mount -o degraded /dev/sdb /mnt

[ 4608.113751] BTRFS info (device sdb): allowing degraded mounts
[ 4608.113756] BTRFS info (device sdb): disk space caching is enabled
[ 4608.113757] BTRFS: has skinny extents
[ 4608.116557] BTRFS info (device sdb): bdev /dev/sde errs: wr 6, rd
0, flush 1, corrupt 0, gen 0

# touch /mnt/test4
# sync

Writing to the filesystem works while the device is missing.
No new errors in dmesg after re-mounting degraded. Reboot to get back /dev/sde.

[4.329852] BTRFS: device fsid 75737bea-d76c-42f5-b0e6-7d346e38610d
devid 4 transid 26 /dev/sde
[4.330157] BTRFS: device fsid 75737bea-d76c-42f5-b0e6-7d346e38610d
devid 3 transid 31 /dev/sdd
[4.330511] BTRFS: device fsid 75737bea-d76c-42f5-b0e6-7d346e38610d
devid 2 transid 31 /dev/sdc
[4.330865] BTRFS: device fsid 75737bea-d76c-42f5-b0e6-7d346e38610d
devid 1 transid 31 /dev/sdb

/dev/sde transid is lagging behind, of course.

# wipefs -a /dev/sde
# btrfs device scan

# mount -o degraded /dev/sdb /mnt

[  507.248621] BTRFS info (device sdb): allowing degraded mounts
[  507.248626] BTRFS info (device sdb): disk space caching is enabled
[  507.248628] BTRFS: has skinny extents
[  507.252815] BTRFS info (device sdb): bdev /dev/sde errs: wr 6, rd
0, flush 1, corrupt 0, gen 0
[  507.252919] BTRFS: missing devices(1) exceeds the limit(0),
writeable mount is not allowed
[  507.278277] BTRFS: open_ctree failed

Well, that was unexpected! Reboot again.

# mount -o degraded /dev/sdb /mnt

[   94.368514] BTRFS info (device sdd): allowing degraded mounts
[   94.368519] BTRFS info (device sdd): disk space caching is enabled
[   94.368521] BTRFS: has skinny extents
[   94.370909] BTRFS warning (device sdd): devid 4 uuid
8549a275-f663-4741-b410-79b49a1d465f is missing
[   94.372170] BTRFS info (device sdd): bdev (null) errs: wr 6, rd 0,
flush 1, corrupt 0, gen 0
[   94.372284] BTRFS: missing devices(1) exceeds the limit(0),
writeable mount is not allowed
[   94.395021] BTRFS: open_ctree failed

No go.

# mount -o degraded,ro /dev/sdb /mnt
# btrfs device stats /mnt
[/dev/sdb].write_io_errs   0
[/dev/sdb].read_io_errs0
[/dev/sdb].flush_io_errs   0
[/dev/sdb].corruption_errs 0
[/dev/sdb].generation_errs 0
[/dev/sdc].write_io_errs   0
[/dev/sdc].read_io_errs0
[/dev/sdc].flush_io_errs   0
[/dev/sdc].corruption_errs 0
[/dev/sdc].generation_errs 0
[/dev/sdd].write_io_errs   0
[/dev/sdd].read_io_errs0
[/dev/sdd].flush_io_errs   0
[/dev/sdd].corruption_errs 0

Re: Possible Raid Bug

2016-03-25 Thread Patrik Lundquist
On Debian Stretch with Linux 4.4.6, btrfs-progs 4.4 in VirtualBox
5.0.16 with 4*2GB VDIs:

# mkfs.btrfs -m raid10 -d raid10 /dev/sdb /dev/sdc /dev/sdd /dev/sde

# mount /dev/sdb /mnt
# touch /mnt/test
# umount /mnt

Everything fine so far.

# wipefs -a /dev/sde

*reboot*

# mount /dev/sdb /mnt
mount: wrong fs type, bad option, bad superblock on /dev/sdb,
   missing codepage or helper program, or other error

   In some cases useful info is found in syslog - try
   dmesg | tail or so.

# dmesg | tail
[   85.979655] BTRFS info (device sdb): disk space caching is enabled
[   85.979660] BTRFS: has skinny extents
[   85.982377] BTRFS: failed to read the system array on sdb
[   85.996793] BTRFS: open_ctree failed

Not very informative! An information regression?

# mount -o degraded /dev/sdb /mnt

# dmesg | tail
[  919.899071] BTRFS info (device sdb): allowing degraded mounts
[  919.899075] BTRFS info (device sdb): disk space caching is enabled
[  919.899077] BTRFS: has skinny extents
[  919.903216] BTRFS warning (device sdb): devid 4 uuid
8549a275-f663-4741-b410-79b49a1d465f is missing

# touch /mnt/test2
# ls -l /mnt/
total 0
-rw-r--r-- 1 root root 0 mar 25 15:17 test
-rw-r--r-- 1 root root 0 mar 25 15:42 test2

# btrfs device remove missing /mnt
ERROR: error removing device 'missing': unable to go below four
devices on raid10

As expected.

# btrfs replace start -B missing /dev/sde /mnt
ERROR: source device must be a block device or a devid

Would have been nice if missing worked here too. Maybe it does in
btrfs-progs 4.5?

# btrfs replace start -B 4 /dev/sde /mnt

# dmesg | tail
[ 1618.170619] BTRFS info (device sdb): dev_replace from  (devid 4) to /dev/sde started
[ 1618.184979] BTRFS info (device sdb): dev_replace from  (devid 4) to /dev/sde finished

Repaired!

# umount /mnt
# mount /dev/sdb /mnt
# dmesg | tail
[ 1729.917661] BTRFS info (device sde): disk space caching is enabled
[ 1729.917665] BTRFS: has skinny extents

All in all it works just fine with Linux 4.4.6.


Re: RAID-1 refuses to balance large drive

2016-03-25 Thread Patrik Lundquist
On 23 March 2016 at 20:33, Chris Murphy  wrote:
>
> On Wed, Mar 23, 2016 at 1:10 PM, Brad Templeton  wrote:
> >
> > I am surprised to hear it said that having the mixed sizes is an odd
> > case.
>
> Not odd as in wrong, just uncommon compared to other arrangements being 
> tested.

I think mixed drive sizes in raid1 is a killer feature for a home NAS,
where you replace an old smaller drive with the latest and largest
when you need more storage.

My raid1 currently consists of 6TB+3TB+3*2TB.


Re: Possible Raid Bug

2016-03-25 Thread Patrik Lundquist
On 25 March 2016 at 12:49, Stephen Williams  wrote:
>
> So catch 22, you need all the drives otherwise it won't let you mount,
> But what happens if a drive dies and the OS doesn't detect it? BTRFS
> wont allow you to mount the raid volume to remove the bad disk!

Version of Linux and btrfs-progs?

You can't have a raid10 with less than 4 devices so you need to add a
new device before deleting the missing. That is of course still a
problem with a read-only fs.

btrfs replace is also the recommended way to replace a failed device
nowadays. The wiki is outdated.
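
As a sketch (devid and device name are placeholders; take the devid of the
failed disk from "btrfs filesystem show"):

# btrfs replace start -B 4 /dev/sdNEW /mnt
# btrfs replace status /mnt

If the old disk is still attached but flaky, adding -r makes replace avoid
reading from it unless no other copy exists.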


Re: Major HDD performance degradation on btrfs receive

2016-02-24 Thread Patrik Lundquist
On 23 February 2016 at 18:26, Marc MERLIN  wrote:
>
> I'm currently doing a very slow defrag to see if it'll help (looks like
> it's going to take days).
> I'm doing this:
> for i in dir1 dir2 debian32 debian64 ubuntu dir4 ; do echo $i; time btrfs fi 
> defragment -v -r $i; done
[snip]
> Also, should I try running defragment -r from cron from time to time?

I find the default threshold a bit low and defragment daily with "-t
1m" to combat heavy random write fragmentation.

Once in a while I defrag e.g. VM disk images with "-t 128m" but find
higher thresholds mostly a waste of time.

YMMV.
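
A sketch of what such a daily run looks like (paths are just examples):

# btrfs filesystem defragment -r -t 1m /srv/data

and the occasional VM image pass:

# btrfs filesystem defragment -t 128m /srv/vm/disk.img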


> But, just to be clear, is there a way I missed to see how fragmented my
> filesystem is without running filefrag on millions of files and parsing
> the output?

I don't think so, and filefrag is slow with heavily fragmented files
because ioctl(FS_IOC_FIEMAP) is called many times with a buffer which
only fits 292 fiemap_extents.


Re: RAID1 disk upgrade method

2016-02-04 Thread Patrik Lundquist
On 30 January 2016 at 15:50, Patrik Lundquist
<patrik.lundqu...@gmail.com> wrote:
> On 29 January 2016 at 13:14, Austin S. Hemmelgarn <ahferro...@gmail.com> 
> wrote:
>>
>> Last I checked, Seagate's 'NAS' drives and whatever they've re-branded their 
>> other enterprise line as, as well as WD's 'Red' drives support both SCT ERC 
>> and FUA, but I don't know about any other brands (most of the Hitachi, 
>> Toshiba, and Samsung drives I've seen do not support FUA).
>
> I don't know about WD Red Pro but my WD Reds don't support FUA.
>
> Can I list supported commands with something like hdparm? I'm curious
> about a WD Re in a LSI RAID.

No FUA in WD Re either.

[20312.701155] scsi 4:0:0:0: Direct-Access ATA  WDC
WD5003ABYZ-0 1S03 PQ: 0 ANSI: 5
[20312.701453] sd 4:0:0:0: [sdb] 976773168 512-byte logical blocks:
(500 GB/465 GiB)
[20312.701454] sd 4:0:0:0: Attached scsi generic sg2 type 0
[20312.701603] sd 4:0:0:0: [sdb] Write Protect is off
[20312.701609] sd 4:0:0:0: [sdb] Mode Sense: 00 3a 00 00
[20312.701663] sd 4:0:0:0: [sdb] Write cache: enabled, read cache:
enabled, doesn't support DPO or FUA
[20312.712396] sd 4:0:0:0: [sdb] Attached SCSI disk
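
For what it's worth, a rough way to check from userspace (hdparm -I dumps
the drive's identify data; whether a FUA line shows up there is up to the
drive, so treat it as a hint rather than the final word):

# hdparm -I /dev/sdb | grep -i fua

The kernel's own conclusion is the dmesg line above ("doesn't support DPO
or FUA").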


Re: "WARNING: device 0 not present" during scrub?

2016-02-01 Thread Patrik Lundquist
On 30 January 2016 at 12:59, Christian Pernegger  wrote:
>
> This is on a 1-month-old Debian stable (jessie) install and yes, I
> know that means the kernel and btrfs-progs are ancient

apt-get install -t jessie-backports linux-image-4.3.0-0.bpo.1-amd64

Or something like that for the image name. Unfortunately there's no
stable backport of btrfs-tools (as they call btrfs-progs).

https://tracker.debian.org/pkg/linux


Re: cannot repair filesystem

2016-01-06 Thread Patrik Lundquist
On 1 January 2016 at 16:44, Jan Koester  wrote:
>
> Hi,
>
> if I try to repair filesystem got I'am assert. I use Raid6.
>
> Linux dibsi 3.16.0-0.bpo.4-amd64 #1 SMP Debian 3.16.7-ckt4-3~bpo70+1 
> (2015-02-12) x86_64 GNU/Linux

Raid6 wasn't completed until Linux 3.19 and I wouldn't call it stable yet.

https://btrfs.wiki.kernel.org/index.php/RAID56

I suggest you upgrade from Wheezy to Jessie and install the latest
backports kernel and latest btrfs-progs from Git (there's no
stable-bpo for btrfs-tools) if you want to use raid56.


Re: Kernel 3.19 and still "disk full" even though 'btrfs fi df" reports enough room left?

2015-11-19 Thread Patrik Lundquist
On 19 November 2015 at 06:58, Roman Mamedov  wrote:
>
> On Wed, 18 Nov 2015 19:53:03 +0100
> linux-btrfs.tebu...@xoxy.net wrote:
>
> >   $ uname -a
> >   Linux neptun 3.19.0-31-generic #36~14.04.1-Ubuntu SMP Thu Oct 8
> > 10:21:08 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
[...]
>
> So my suggestion would be to try a newer kernel from www.kernel.org: if the
> problem disappears at 4.1 then just keep on using that, or 4.3 if you have to,
> but otherwise that one might be a bit too new to start using right away.

Give http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.1.13-wily/ a try.

wget -e robots=off -r -l1 -np -nd -A '*all.deb','*generic*amd64.deb'
http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.1.13-wily/


Re: More memory more jitters?

2015-11-15 Thread Patrik Lundquist
On 14 November 2015 at 15:11, CHENG Yuk-Pong, Daniel  wrote:
>
> Background info:
>
> I am running a heavy-write database server with 96GB ram. In the worse
> case it cause multi minutes of high cpu loads. Systemd keeping kill
> and restarting services, and old job don't die because they stuck in
> uninterruptable wait... etc.
>
> Tried with nodatacow, but it seems only affect new file. It is not an
> subvolume option either...

How about nocow (chattr +C) on the database directories? You will have
to copy the files to make nocow versions of them.
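
A rough sketch with made-up paths (stop the database first; the flag only
applies to files created after it is set, hence the copy):

# mv /srv/db /srv/db.cow
# mkdir /srv/db
# chattr +C /srv/db
# cp -a --reflink=never /srv/db.cow/. /srv/db/
# lsattr /srv/db | head

--reflink=never forces a real copy so the new files don't share the old
COW extents.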


Re: Btrfs/RAID5 became unmountable after SATA cable fault

2015-11-06 Thread Patrik Lundquist
On 6 November 2015 at 10:03, Janos Toth F.  wrote:
>
> Although I updated the firmware of the drives. (I found an IMPORTANT
> update when I went there to download SeaTools, although there was no
> change log to tell me why this was important). This might changed the
> error handling behavior of the drive...?

I've had Seagate drives not reporting errors until I updated the
firmware. They tended to timeout instead. Got a shitload of SMART
errors after I updated, but they still didn't handle errors very well
(became unresponsive).


Re: Removing bad hdd from btrfs volume

2015-08-07 Thread Patrik Lundquist
On 7 August 2015 at 00:17, Peter Foley pefol...@pefoley.com wrote:
 Hi,

 I have an btrfs volume that spans multiple disks (no raid, just
 single), and earlier this morning I hit some hardware problems with
 one of the disks.
 I tried btrfs dev del /dev/sda1 /, but btrfs was unable to migrate the
 1gb that appears to be causing the read errors.
 See http://sprunge.us/aeZC

You might want to try to save as much as possible from the failing
disk with the help of GNU ddrescue. Either by copying sda to a
replacement disk or by copying sda1 to a file for loopback mounting.

Unmount the filesystem before copying, and remove sda before you mount the copy.
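
An untested sketch of the image-file variant (paths are placeholders; the
image must live on a different, healthy filesystem):

# umount /dev/sda1
# ddrescue /dev/sda1 /mnt/backup/sda1.img /mnt/backup/sda1.map
# losetup -f --show /mnt/backup/sda1.img
# btrfs device scan
# mount /dev/loop0 /mnt/restore

losetup prints the loop device it picked (assumed /dev/loop0 here).
Disconnect or wipe sda before mounting, as said above, so Btrfs doesn't
see two devices with the same fsid.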


Re: Inappropriate ioctl for device

2015-07-25 Thread Patrik Lundquist
On 25 July 2015 at 10:56, Mojtaba ker...@rp2.org wrote:

 System is debian wheezy or Jessie.
 This is Debian Jessie:

 root@s2:/# uname -a
 Linux s2 3.2.0-4-amd64 #1 SMP Debian 3.2.60-1+deb7u3 x86_64 GNU/Linux

That's a way too old kernel to be running Btrfs on. You should be
running on at least the Jessie 3.16 kernel.


[PATCH] btrfs-progs: defrag: fix threshold overflow again

2015-07-24 Thread Patrik Lundquist
Commit dedb1ebeee847e3c4d71e14d0c1077887630e44a broke commit
96cfbbf0ea9fce7ecaa9e03964474f407f6e76ab.

Casting thresh value greater than (u32)-1 simply truncates bits while
desired value is (u32)-1 for max defrag threshold.

I.e. btrfs fi defrag -t 4g is trimmed/truncated to 0
and -t 5g to 1073741824.

Also added a missing newline.

Signed-off-by: Patrik Lundquist patrik.lundqu...@gmail.com
---
 cmds-filesystem.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/cmds-filesystem.c b/cmds-filesystem.c
index 800aa4d..00a3f78 100644
--- a/cmds-filesystem.c
+++ b/cmds-filesystem.c
@@ -1172,8 +1172,9 @@ static int cmd_defrag(int argc, char **argv)
thresh = parse_size(optarg);
if (thresh > (u32)-1) {
fprintf(stderr,
-   "WARNING: target extent size %llu too big, trimmed to %u",
+   "WARNING: target extent size %llu too big, trimmed to %u\n",
thresh, (u32)-1);
+   thresh = (u32)-1;
}
defrag_global_fancy_ioctl = 1;
break;
-- 
2.1.4



[PATCH] btrfs-progs: defrag: remove unused variable

2015-07-24 Thread Patrik Lundquist
A leftover from when recursive defrag was added.

Signed-off-by: Patrik Lundquist patrik.lundqu...@gmail.com
---
 cmds-filesystem.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/cmds-filesystem.c b/cmds-filesystem.c
index 00a3f78..1b7b4c1 100644
--- a/cmds-filesystem.c
+++ b/cmds-filesystem.c
@@ -1131,7 +1131,6 @@ static int cmd_defrag(int argc, char **argv)
int i;
int recursive = 0;
int ret = 0;
-   struct btrfs_ioctl_defrag_range_args range;
int e = 0;
int compress_type = BTRFS_COMPRESS_NONE;
DIR *dirstream;
@@ -1189,7 +1188,7 @@ static int cmd_defrag(int argc, char **argv)
if (check_argc_min(argc - optind, 1))
usage(cmd_defrag_usage);
 
-   memset(&defrag_global_range, 0, sizeof(range));
+   memset(&defrag_global_range, 0, sizeof(defrag_global_range));
defrag_global_range.start = start;
defrag_global_range.len = len;
defrag_global_range.extent_thresh = (u32)thresh;
-- 
2.1.4



Re: counting fragments takes more time than defragmenting

2015-07-21 Thread Patrik Lundquist
On 14 July 2015 at 21:15, Hugo Mills h...@carfax.org.uk wrote:
 On Tue, Jul 14, 2015 at 09:09:00PM +0200, Patrik Lundquist wrote:
 On 14 July 2015 at 20:41, Hugo Mills h...@carfax.org.uk wrote:
  On Tue, Jul 14, 2015 at 01:57:07PM +0200, Patrik Lundquist wrote:
  On 24 June 2015 at 12:46, Duncan 1i5t5.dun...@cox.net wrote:
  
   Regardless of whether 1 or huge -t means maximum defrag, however, the
   nominal data chunk size of 1 GiB means that 30 GiB file you mentioned
   should be considered ideally defragged at 31 extents.  This is a
   departure from ext4, which AFAIK in theory has no extent upper limit, so
   should be able to do that 30 GiB file in a single extent.
  
   But btrfs or ext4, 31 extents ideal or a single extent ideal, 150 
   extents
   still indicates at least some remaining fragmentation.
 
  So I converted the VMware VMDK file to a VirtualBox VDI file:
 
  -rw--- 1 plu plu 28845539328 jul 13 13:36 Windows7-disk1.vmdk
  -rw--- 1 plu plu 28993126400 jul 13 14:04 Windows7.vdi
 
  $ filefrag Windows7.vdi
  Windows7.vdi: 15 extents found
 
  $ btrfs filesystem defragment -t 3g Windows7.vdi
  $ filefrag Windows7.vdi
  Windows7.vdi: 24 extents found
 
  How can it be less than 28 extents with a chunk size of 1 GiB?
 
 I _think_ the fragment size will be limited by the block group
  size. This is not the same as the chunk size for some RAID levels --
  for example, RAID-0, a block group can be anything from 2 to n chunks
  (across the same number of devices), where each chunk is 1 GiB, so
  potentially you could have arbitrary-sized block groups. The same
  would apply to RAID-10, -5 and -6.
 
 (Note, I haven't verified this, but it makes sense based on what I
  know of the internal data structures).

 It's a raid1 filesystem, so the block group ought to be the same size
 as the chunk, right?

Yes.

 A 2GiB block group would suffice to explain it though.

Not with RAID-1 -- I'd expect the block group size to be 1 GiB.

So I had a look at the filefrag source, and filefrag actually doesn't
print the number of extents but the number of disk fragments.
Contiguously allocated extents count as one fragment.

"Windows7.vdi: 47 extents found" is really 213 extents over 47 disk fragments.

But I have one 2GiB extent, according to filefrag -v, so the question
remains. :-)
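
For reference, the verbose listing mentioned above is just (file name from
the example earlier):

$ filefrag -v Windows7.vdi

which prints one row per extent plus the summary line, so the per-extent
layout and the fragment count can be compared directly.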


Re: counting fragments takes more time than defragmenting

2015-07-14 Thread Patrik Lundquist
On 24 June 2015 at 12:46, Duncan 1i5t5.dun...@cox.net wrote:

 Regardless of whether 1 or huge -t means maximum defrag, however, the
 nominal data chunk size of 1 GiB means that 30 GiB file you mentioned
 should be considered ideally defragged at 31 extents.  This is a
 departure from ext4, which AFAIK in theory has no extent upper limit, so
 should be able to do that 30 GiB file in a single extent.

 But btrfs or ext4, 31 extents ideal or a single extent ideal, 150 extents
 still indicates at least some remaining fragmentation.

So I converted the VMware VMDK file to a VirtualBox VDI file:

-rw--- 1 plu plu 28845539328 jul 13 13:36 Windows7-disk1.vmdk
-rw--- 1 plu plu 28993126400 jul 13 14:04 Windows7.vdi

$ filefrag Windows7.vdi
Windows7.vdi: 15 extents found

$ btrfs filesystem defragment -t 3g Windows7.vdi
$ filefrag Windows7.vdi
Windows7.vdi: 24 extents found

How can it be less than 28 extents with a chunk size of 1 GiB?

E2fsprogs version 1.42.12


Re: counting fragments takes more time than defragmenting

2015-07-14 Thread Patrik Lundquist
On 14 July 2015 at 20:41, Hugo Mills h...@carfax.org.uk wrote:
 On Tue, Jul 14, 2015 at 01:57:07PM +0200, Patrik Lundquist wrote:
 On 24 June 2015 at 12:46, Duncan 1i5t5.dun...@cox.net wrote:
 
  Regardless of whether 1 or huge -t means maximum defrag, however, the
  nominal data chunk size of 1 GiB means that 30 GiB file you mentioned
  should be considered ideally defragged at 31 extents.  This is a
  departure from ext4, which AFAIK in theory has no extent upper limit, so
  should be able to do that 30 GiB file in a single extent.
 
  But btrfs or ext4, 31 extents ideal or a single extent ideal, 150 extents
  still indicates at least some remaining fragmentation.

 So I converted the VMware VMDK file to a VirtualBox VDI file:

 -rw--- 1 plu plu 28845539328 jul 13 13:36 Windows7-disk1.vmdk
 -rw--- 1 plu plu 28993126400 jul 13 14:04 Windows7.vdi

 $ filefrag Windows7.vdi
 Windows7.vdi: 15 extents found

 $ btrfs filesystem defragment -t 3g Windows7.vdi
 $ filefrag Windows7.vdi
 Windows7.vdi: 24 extents found

 How can it be less than 28 extents with a chunk size of 1 GiB?

I _think_ the fragment size will be limited by the block group
 size. This is not the same as the chunk size for some RAID levels --
 for example, RAID-0, a block group can be anything from 2 to n chunks
 (across the same number of devices), where each chunk is 1 GiB, so
 potentially you could have arbitrary-sized block groups. The same
 would apply to RAID-10, -5 and -6.

(Note, I haven't verified this, but it makes sense based on what I
 know of the internal data structures).

It's a raid1 filesystem, so the block group ought to be the same size
as the chunk, right?

A 2GiB block group would suffice to explain it though.


Re: Can't remove missing device

2015-07-13 Thread Patrik Lundquist
On 10 July 2015 at 06:05, None None whocares0...@freemail.hu wrote:
 According to dmesg sda returns bad data but the smart values for it seem fine.

 # smartctl -a /dev/sda
...
 SMART Self-test log structure revision number 1
 No self-tests have been logged.  [To run self-tests, use: smartctl -t]

Run smartctl -t long /dev/sda


[PATCH] btrfs-progs: inspect: Fix out of bounds string termination.

2015-06-26 Thread Patrik Lundquist
Signed-off-by: Patrik Lundquist patrik.lundqu...@gmail.com
---
 cmds-inspect.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/cmds-inspect.c b/cmds-inspect.c
index 053cf8e..aafe37d 100644
--- a/cmds-inspect.c
+++ b/cmds-inspect.c
@@ -293,7 +293,7 @@ static int cmd_subvolid_resolve(int argc, char **argv)
goto out;
}
 
-   path[PATH_MAX] = '\0';
+   path[PATH_MAX-1] = '\0';
printf("%s\n", path);
 
 out:
-- 
2.1.4



Re: counting fragments takes more time than defragmenting

2015-06-25 Thread Patrik Lundquist
On 25 June 2015 at 06:01, Duncan 1i5t5.dun...@cox.net wrote:

 Patrik Lundquist posted on Wed, 24 Jun 2015 14:05:57 +0200 as excerpted:

  On 24 June 2015 at 12:46, Duncan 1i5t5.dun...@cox.net wrote:

 If it's uint32 limited, either kill everything above that in both the
 documentation and code, or alias everything above that to 3G (your next
 paragraph) or whatever.

My simple overflow patch yesterday fixes the problem, so 4G or larger
is max instead of 0.


  But btrfs or ext4, 31 extents ideal or a single extent ideal, 150
  extents still indicates at least some remaining fragmentation.
 
  I gave it another shot but I've now got 154 extents instead. :-)

 Is it possible there's simply no gig-size free-space holes in the
 filesystem allocation, so it simply /can't/ defrag further than that,
 because there's no place to allocate whole-gig data chunks at a time?

I would guess so, without allocating new chunks. Defrag can probably
be smarter and avoid rewriting extents if it means splitting them
(unless the compression flag is set and it must rewrite everything).


Re: counting fragments takes more time than defragmenting

2015-06-24 Thread Patrik Lundquist
On 24 June 2015 at 05:20, Marc MERLIN m...@merlins.org wrote:

 Hello again,

 Just curious, is anyone seeing similar things with big VM images or other
 DBs?
 I forgot to mention that my vdi file is 88GB.

 It's surprising that it took longer to count the fragments than to actually
 defragment the file.
 Or that it took 3 defrag runs to get down to 11K extents from 104K.

 Are others seeing similar things?

Filefrag is pretty much instant for my 30GB (150 extents) virtual
disk, no CoW on file, no snapshots on volume.

But what doesn't make sense to me is btrfs fi defrag; the -t option says

   -t size
   defragment only files at least size bytes big

The -t value goes into struct
btrfs_ioctl_defrag_range_args.extent_thresh which is documented as

   /*
    * any extent bigger than this will be considered
    * already defragged.  Use 0 to take the kernel default
    * Use 1 to say every single extent must be rewritten
    */

Default extent_thresh is 256K. I can't see how 1 would say every
single extent must be rewritten. On the contrary; 1 skips every
extent. The compress flag even sets extent_thresh=(u32)-1 to force a
rewrite.

Marc, try btrfs fi defrag -t 4294967295 Win7.vdi for maximum defrag
and time filefrag again with fewer extents.

/Patrik


 Marc

 On Thu, Jun 04, 2015 at 05:42:45PM +0900, Marc MERLIN wrote:
  Hi Chris,
 
  After our quick chat, I gave it a shot on 3.19.6, and things are better
  than last time I tried.
 
  legolas:/var/local/nobck/VirtualBox VMs# lsattr Win7/
  ---C Win7/Logs
  ---C Win7/Snapshots
  ---C Win7/Win7.vdi
  ---C Win7/Win7.png
  ---C Win7/autotune1.png
  ---C Win7/new_autotune2.png
  ---C Win7/Win7.vbox-prev
  ---C Win7/Win7.vbox
 
  But I have snapshots of that subvolume, so obviously that gets
  in the way of disabling COW.
 
  I had a look, and I have 100K fragments. That took 10mn to figure out:
 
  legolas:/var/local/nobck/VirtualBox VMs/Win7# filefrag Win7.vdi
  Win7.vdi: 104306 extents found
 
  This first filefrag run took about 10mn to count all the fragments on my
  SSD. That feels a bit slow, but maybe the userland tool is doing things
  in suboptimal ways.
 
  Defrag actually worked (mostly) and wasn't too slow. It used to take hours
  not to finish, and now it worked in 3mn:
  legolas:/var/local/nobck/VirtualBox VMs/Win7# time btrfs fi defrag Win7.vdi
  real  3m43.807s
  user  0m0.000s
  sys   0m44.044s
 
  This is defintely better than before.
  Note that it's not fully defragged, but close enough. Each subsequent
  run, filefrag is faster, and defrag is still faster than filefrag:
 
  legolas:/var/local/nobck/VirtualBox VMs/Win7# time filefrag Win7.vdi
  Win7.vdi: 11428 extents found
  real  2m42.090s
  user  0m0.000s
  sys   2m37.308s
 
  legolas:/var/local/nobck/VirtualBox VMs/Win7# time btrfs fi defrag Win7.vdi
  real  0m7.483s
  user  0m0.000s
  sys   0m2.672s
 
  legolas:/var/local/nobck/VirtualBox VMs/Win7# time filefrag Win7.vdi
  Win7.vdi: 11132 extents found
  real  0m22.525s
  user  0m0.000s
  sys   0m22.264s
 
  It's a bit unexpected that I still have 10k fragments after 2 defrag
  runs, but it's better than 100k :)
 
  Marc
  --
  A mouse is a device used to point at the xterm you want to type in - A.S.R.
  Microsoft is to operating systems what McDonalds is to gourmet cooking
  Home page: http://marc.merlins.org/ | PGP 1024R/763BE901
 

 --
 A mouse is a device used to point at the xterm you want to type in - A.S.R.
 Microsoft is to operating systems what McDonalds is to gourmet cooking
 Home page: http://marc.merlins.org/


Re: counting fragments takes more time than defragmenting

2015-06-24 Thread Patrik Lundquist
On 24 June 2015 at 12:46, Duncan 1i5t5.dun...@cox.net wrote:
 Patrik Lundquist posted on Wed, 24 Jun 2015 10:28:09 +0200 as excerpted:

 AFAIK, it's set huge to defrag everything,

It's set to 256K by default.


 Assuming set a huge -t to defrag to the maximum extent possible is
 correct, that means -t 1G should be exactly as effective as -t 1T...

1G is actually more effective because 1T overflows the uint32
extent_thresh field, so 1T, 0, and 256K are currently the same.

3G is the largest value that works with -t as expected (disregarding
the man page) and is easy to type.
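
A quick way to see the truncation (a sketch using shell arithmetic; the
option parser produces a 64-bit byte count that is then stuffed into the
32-bit extent_thresh field):

$ echo $(( (1 << 40) & 0xFFFFFFFF ))    # -t 1T truncated to u32 -> 0, i.e. the kernel default
0
$ echo $(( (3 << 30) & 0xFFFFFFFF ))    # -t 3G still fits in 32 bits
3221225472
$ echo $(( (4 << 30) & 0xFFFFFFFF ))    # -t 4G wraps to 0 again without the patch
0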


 But btrfs or ext4, 31 extents ideal or a single extent ideal, 150 extents
 still indicates at least some remaining fragmentation.

I gave it another shot but I've now got 154 extents instead. :-)


[PATCH] btrfs-progs: Fix defrag threshold overflow

2015-06-24 Thread Patrik Lundquist
btrfs fi defrag -t 1T overflows the u32 thresh variable, so the default
threshold is used instead of the maximum.

Signed-off-by: Patrik Lundquist patrik.lundqu...@gmail.com
---
 cmds-filesystem.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/cmds-filesystem.c b/cmds-filesystem.c
index 530f815..72bb45b 100644
--- a/cmds-filesystem.c
+++ b/cmds-filesystem.c
@@ -1127,7 +1127,7 @@ static int cmd_defrag(int argc, char **argv)
int flush = 0;
u64 start = 0;
u64 len = (u64)-1;
-   u32 thresh = 0;
+   u64 thresh = 0;
int i;
int recursive = 0;
int ret = 0;
@@ -1186,7 +1186,7 @@ static int cmd_defrag(int argc, char **argv)
memset(defrag_global_range, 0, sizeof(range));
defrag_global_range.start = start;
defrag_global_range.len = len;
-   defrag_global_range.extent_thresh = thresh;
+   defrag_global_range.extent_thresh = thresh > (u32)-1 ? (u32)-1 : (u32)thresh;
if (compress_type) {
defrag_global_range.flags |= BTRFS_DEFRAG_RANGE_COMPRESS;
defrag_global_range.compress_type = compress_type;
-- 
2.1.4



Re: btrfs performance - ssd array

2015-01-12 Thread Patrik Lundquist
On 12 January 2015 at 15:54, Austin S Hemmelgarn ahferro...@gmail.com wrote:

 Another thing to consider is that the kernel's default I/O scheduler and the 
 default parameters for that I/O scheduler are almost always suboptimal for 
 SSD's, and this tends to show far more with BTRFS than anything else.  
 Personally I've found that using the CFQ I/O scheduler with the following 
 parameters works best for a majority of SSD's:
 1. slice_idle=0
 2. back_seek_penalty=1
 3. back_seek_max set equal to the size in sectors of the device
 4. nr_requests and quantum set to the hardware command queue depth

 You can easily set these persistently for a given device with a udev rule 
 like this:
   KERNEL=='sda', SUBSYSTEM=='block', ACTION=='add', 
 ATTR{queue/scheduler}='cfq', ATTR{queue/iosched/back_seek_penalty}='1', 
 ATTR{queue/iosched/back_seek_max}='device_size', 
 ATTR{queue/iosched/quantum}='128', ATTR{queue/iosched/slice_idle}='0', 
 ATTR{queue/nr_requests}='128'

 Make sure to replace '128' in the rule with whatever the command queue depth 
 is for the device in question (It's usually 128 or 256, occasionally more), 
 and device_size with the size of the device in kibibytes.


So is it size in sectors of the device or size of the device in
kibibytes for back_seek_max? :-)
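
For comparison (a sketch, assuming /dev/sda with CFQ active; the sysfs size
attribute is in 512-byte sectors, so halving it gives KiB):

$ cat /sys/block/sda/size                         # device size in 512-byte sectors
$ echo $(( $(cat /sys/block/sda/size) / 2 ))      # the same size in KiB
$ cat /sys/block/sda/queue/iosched/back_seek_max  # current CFQ value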


Btrfs on top of LUKS (dm-crypt)

2015-01-12 Thread Patrik Lundquist
Hi,

I've been looking at recommended cryptsetup options for Btrfs and I
have one question:

Marc uses cryptsetup luksFormat --align-payload=1024 directly on a
disk partition and not on e.g. a striped mdraid. Is there a Btrfs
reason for that alignment?

http://marc.merlins.org/perso/btrfs/post_2014-04-27_Btrfs-Multi-Device-Dmcrypt.html

Thanks,
Patrik


Re: BTRFS free space handling still needs more work: Hangs again

2014-12-28 Thread Patrik Lundquist
On 28 December 2014 at 13:03, Martin Steigerwald mar...@lichtvoll.de wrote:

 BTW, I found that the Oracle blog didn't work at all for me. I completed
 a cycle of defrag, sdelete -c and VBoxManage compact, [...] and it
 apparently did *nothing* to reduce the size of the file.

They've changed the argument to -z; sdelete -z.
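
For reference, the cycle looks roughly like this nowadays (a sketch; the
image name and drive letter are assumptions):

# In the Windows guest first: defragment the drive, then zero free space
# with "sdelete -z C:" (the flag that replaced the old -c).
# Then, on the host, with the VM powered off:
$ VBoxManage modifyhd Windows7.vdi --compact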


Re: A note on spotting bugs [Was: ENOSPC after conversion]

2014-12-12 Thread Patrik Lundquist
On 12 December 2014 at 14:29, Robert White rwh...@pobox.com wrote:

 You yourself even found the annotation in the wiki that said you should have
 e4defragged the system before conversion.

There's no mention of e4defrag on the Btrfs wiki; it says to run btrfs
defrag before balance to avoid ENOSPC, as the last step of conversion.


 What you are experiencing is a little vexing, but it's not a bug. It's not
 even a huge problem. And if you'd stop banging your head against it it
 wouldn't be any sort of problem at all. Neither of us can change these
 facts.

I stopped banging my head several emails ago. I understand the problem
and I will start over.


 I feel your pain man, but thats about it.

I'm in no pain, it has been interesting. No data loss. No hurry.


 What more can I do?

The conversion wiki is lacking. It would be great if someone (maybe
you?) could expand upon the drawbacks of conversion.


 What is it that you want?

Nothing more.


ENOSPC after conversion [Was: Fixing Btrfs Filesystem Full Problems typo?]

2014-12-11 Thread Patrik Lundquist
I'll reboot the thread with a recap and my latest findings.

* Half full 3TB disk converted from ext4 to Btrfs, after first verifying it with fsck.
* Undo subvolume deleted after being happy with the conversion.
* Recursive defrag.
* Full balance, which ended with 98 enospc errors during balance.

In that order, nothing in between. No snapshots or other subvolumes.
Loads of real free space.
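
For context, the steps above amount to roughly the following (a sketch;
device, mount point and subvolume name are assumptions from later in this
thread):

$ fsck.ext4 -f /dev/sdc1
$ btrfs-convert /dev/sdc1
$ mount /dev/sdc1 /mnt
$ btrfs subvolume delete /mnt/ext2_saved
$ btrfs filesystem defragment -r /mnt
$ btrfs balance start /mnt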

Btrfs check reports a clean filesystem.

Btrfs balance -musage=100 -dusage=99 works, but not -dusage=100.

Conversion of metadata (~1.55 GiB) to DUP worked fine.

A theory, based on the error messages, is that some of the converted
files, even after defrag, still have extents larger than 1GiB and
hence don't fit in a native Btrfs extent.

Running defrag several more times and balance again doesn't help.

An error looks like:
BTRFS info (device sdc1): relocating block group 1821099687936 flags 1
BTRFS error (device sdc1): allocation failed flags 1, wanted 2013265920
BTRFS: space_info 1 has 4773171200 free, is not full
BTRFS: space_info total=1494648619008, used=1489775505408, pinned=0,
reserved=99700736, may_use=2102390784, readonly=241664


The following script returned 46 filenames (looking up the block group
in the error):
grep -B 1 "BTRFS error" /var/log/syslog | grep relocating | cut -d ' ' -f 14 | \
while read block
do
    echo "Block group: $block"
    btrfs inspect-internal logical-resolve "$block" /mnt
done

The files range from 41KiB to 6.6GiB in size, which doesn't seem
to support the theory of too-large extents.

Moving the 46 files to another disk (no errors reported) and running
balance again resulted in 64 enospc errors during balance - down
from 98 errors.

Running the above script again gives this error for about half of the
block groups:
ioctl ret=-1, error: No such file or directory

I had no such errors the first time I looked up block groups.

What's the next step in zeroing in on the bug, before I start over?
And I will start over.


Re: Fixing Btrfs Filesystem Full Problems typo?

2014-12-11 Thread Patrik Lundquist
On 11 December 2014 at 09:42, Robert White rwh...@pobox.com wrote:
 On 12/10/2014 05:36 AM, Patrik Lundquist wrote:

 On 10 December 2014 at 13:17, Robert White rwh...@pobox.com wrote:

 On 12/09/2014 11:19 PM, Patrik Lundquist wrote:


 BUT FIRST UNDERSTAND: you do _not_ need to balance a newly converted
 filesystem. That is, the recommended balance (and recursive defrag) is
 _not_
 a useability issue, its an efficiency issue.


 But if I can't start with an efficient filesystem I'd rather start
 over now/soon. I intend to add four more old disks for a RAID1 and it
 will be problematic to start over later on (I'd have to buy new, large
 disks).


 Nope, not an issue.

 When you add the space and rebalance with the conversions by adding all
 those other disks and such it will _completely_ _obliterate_ the current
 balance.

But if the issue is too large extents, why would they fit on any added
btrfs space?


 You are cleaning the house before the maid comes.

Indeed, as a health check. And the patient is slightly ill.


 If you are going to add four more volumes, if those volumes are big enough
 just make a new filesystem on them then copy the files over.

As it looks now, I will, but I also think there's a bug which I'm
trying to zero in on.


 I deleted the subvolume after being satisfied with the conversion,
 defragged recursively, and balanced. In that order.

 Yea, but your file system is full and you are out of space so get on with
 the adding space.

I don't think it is full. balance -musage=100 -dusage=99 completes
with ~1.5TB free space. The remaining unbalanced data is using full or
close to full blocks. Still can't speak for contiguous space though.


 (looking back through my mail spool) You haven't sent the output of /bin/df
 or btrfs fi df yet, I'd like to see what those two commands say.

I have posted these before, but not /bin/df (no access at the moment).

btrfs fi show
Label: none  uuid: 770fe01d-6a45-42b9-912e-
e8f8b413f6a4
Total devices 1 FS bytes used 1.35TiB
devid1 size 2.73TiB used 1.36TiB path /dev/sdc1


btrfs fi df /mnt
Data, single: total=1.35TiB, used=1.35TiB
System, single: total=32.00MiB, used=112.00KiB
Metadata, single: total=3.00GiB, used=1.55GiB
GlobalReserve, single: total=512.00MiB, used=0.00B


btrfs check /dev/sdc1
Checking filesystem on /dev/sdc1
UUID: 770fe01d-6a45-42b9-912e-e8f8b413f6a4
found 825003219475 bytes used err is 0
total csum bytes: 1452612464
total tree bytes: 1669943296
total fs tree bytes: 39600128
total extent tree bytes: 52903936
btree space waste bytes: 79921034
file data blocks allocated: 1487627730944
 referenced 1487627730944



 This would
 be quadruply true if you'd tweaked the block group ratios when you made
 the original file system.

 Ext4 created with defaults, but I think it has been completely full at one
 time.

 Did you use e4defrag before you did the conversion or is this the result of
 converting chaos most profound?

Didn't use e4defrag.



 Think of the time and worry you'd have saved if you'd copied the thing in
 the first place. 8-)

 But then I wouldn't learn as much. :-)

 Learning not to cut corners is a lesson... 8-)

This is more of an experiment than cutting corners, but yeah.


 TRUTH BE TOLD :: After two very eventful conversions not too long ago I
 just don't do those any more. The total amount of time I saved by not
 copying the files was in the negative numbers before I just copied the files
 onto an external media and reformatted and restored.

Conversion probably should be discouraged on the wiki then.


 It's like a choose-your-own-adventure book! 8-)

I like that! :-)


Re: Fixing Btrfs Filesystem Full Problems typo?

2014-12-11 Thread Patrik Lundquist
On 11 December 2014 at 05:13, Duncan 1i5t5.dun...@cox.net wrote:

 Patrik correct me if I have this wrong, but filling in the history as I
 believe I have it...

You're right Duncan, except it began as a private question about an
error in a blog and went from there. Not that it matters, except the
subject is not very fitting anymore and I tried to reboot the thread
with a summary since it's getting a bit hard to find the facts.


Re: ENOSPC after conversion [Was: Fixing Btrfs Filesystem Full Problems typo?]

2014-12-11 Thread Patrik Lundquist
On 11 December 2014 at 11:18, Robert White rwh...@pobox.com wrote:
 So far I don't see a bug.

Fair enough, let's call it a huge problem with btrfs convert. I think
it warrants a note in the wiki.


 On 12/11/2014 12:18 AM, Patrik Lundquist wrote:

 Running defrag several more times and balance again doesn't help.

 That sounds correct as defrag defrags files, it does not reallocate extents.

From https://btrfs.wiki.kernel.org/index.php/Conversion_from_Ext3

A notable caveat is that a balance can fail with ENOSPC if the
defragment is skipped. This is usually due to large extents on ext
being larger than the maximum size btrfs normally operates with (1
GB). A defrag of all large files will avoid this:

I interpreted it as breaking down large extents and reallocating them,
thus avoiding my current situation.
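
The wiki's advice boils down to something like this (a sketch, not the
wiki's literal command; the mount point and the 1G size cutoff are
assumptions):

$ find /mnt -xdev -type f -size +1G -exec btrfs filesystem defragment '{}' \;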


 There's a good chance that if you balanced again and again the number of no
 space errors might decrease. With only one 2-ish gig empty slot sliding
 around like one of those puzzles where you have to sort the numbers from 1
 to 15 by sliding them around in the 4x4=16 element grid.

I was never fond of those puzzles.


 The first step is admitting that you _don't_ have a problem.

I've got 99 problems and balance is one of them (the others are block
groups). :-)

Of course the filesystem is in a problematic state after the
conversion, even if it's not a bug. ~1.5TB of free space and yet out
of space and it can't be fixed with a balance. It might not be wrong
per se but it's very problematic from a user perspective.

Anyway, this thread has turned up lots of good information.


 You are _not_ out of space in which to create files. (or so I presume, you
 still haven't posted the output of /bin/df or btrfs filesystem df).

I'm not; creating new files works.

$ df
Filesystem  1K-blocks   Used  Available Use% Mounted on
/dev/sdc1  2930265088 1402223656 1526389096  48% /mnt

$ btrfs fi df /mnt
Data, single: total=1.41TiB, used=1.30TiB
System, DUP: total=32.00MiB, used=124.00KiB
Metadata, DUP: total=2.50GiB, used=1.49GiB
GlobalReserve, single: total=512.00MiB, used=0.00B


 Your next step is to either add storage in accordance with your plan of
 adding four more volumes to make a RAID (as expressed elsewhere), or make a
 clean filesystem and copy your files over.

I've already decided to start over with a clean filesystem to get rid
of the ext4 legacy. I'm only curious about how to solve the balance
problem, and now I know how.


Re: A note on spotting bugs [Was: ENOSPC after conversion]

2014-12-11 Thread Patrik Lundquist
On 11 December 2014 at 23:00, Robert White rwh...@pobox.com wrote:
 On 12/11/2014 12:18 AM, Patrik Lundquist wrote:

 * Full balance, that ended with 98 enospc errors during balance.

 Assuming that quote is an actual quote from the output of the balance...

It is, from dmesg.


 Bugs are unexpected things that cause failures and/or damage.

Not all errors are as pretty as

BTRFS info (device sdc1): relocating block group 1756675178496 flags 1
BTRFS error (device sdc1): allocation failed flags 1, wanted 1272844288
BTRFS: space_info 1 has 13703077888 free, is not full
BTRFS: space_info total=1504312295424, used=1487622750208, pinned=0,
reserved=2986196992, may_use=1308749824, readonly=270336

some are

BTRFS info (device sdc1): relocating block group 1780297498624 flags 1
[ cut here ]
WARNING: CPU: 2 PID: 11094 at
/build/linux-Y9HjRe/linux-3.16.7/fs/btrfs/extent-tree.c:7280
btrfs_alloc_free_block+0x219/0x450 [btrfs]()
BTRFS: block rsv returned -28
Modules linked in: nfsd auth_rpcgss oid_registry nfs_acl nfs lockd
fscache sunrpc btrfs xor nls_utf8 nls_cp437 vfat fat kvm_intel
raid6_pq kvm crc32_pclmul jc42 coretemp ghash_clmulni_intel iTCO_wdt
ipmi_watchdog iTCO_vendor_support aesni_intel joydev aes_x86_64
efi_pstore lrw gf128mul evdev glue_helper ast ablk_helper lpc_ich
cryptd ttm pcspkr efivars mfd_core i2c_i801 drm_kms_helper drm tpm_tis
tpm acpi_cpufreq i2c_ismt shpchp button processor thermal_sys ipmi_si
ipmi_poweroff ipmi_devintf ipmi_msghandler autofs4 ext4 crc16 mbcache
jbd2 sg sd_mod crc_t10dif crct10dif_generic hid_generic usbhid hid
ahci libahci crct10dif_pclmul crct10dif_common crc32c_intel igb libata
ehci_pci i2c_algo_bit xhci_hcd ehci_hcd i2c_core dca scsi_mod ptp
usbcore pps_core usb_common
CPU: 2 PID: 11094 Comm: btrfs Tainted: GW 3.16.0-4-amd64
#1 Debian 3.16.7-2
Hardware name: Supermicro A1SAi/A1SAi, BIOS 1.0c 02/27/2014
 0009 81506b43 88032779f780 81065717
 88032d68a640 88032779f7d0 1000 8803117df480
  8106577c a0536338 0020
Call Trace:
 [81506b43] ? dump_stack+0x41/0x51
 [81065717] ? warn_slowpath_common+0x77/0x90
 [8106577c] ? warn_slowpath_fmt+0x4c/0x50
 [a04a8b09] ? btrfs_alloc_free_block+0x219/0x450 [btrfs]
 [81142bf6] ? free_hot_cold_page_list+0x46/0x90
 [a04dc5c8] ? read_extent_buffer+0xc8/0x120 [btrfs]
 [a0492c31] ? btrfs_copy_root+0x101/0x2e0 [btrfs]
 [a05032d1] ? create_reloc_root+0x201/0x2d0 [btrfs]
 [a0509398] ? btrfs_init_reloc_root+0x98/0xb0 [btrfs]
 [a04b9564] ? record_root_in_trans+0xa4/0xf0 [btrfs]
 [a04ba95f] ? btrfs_record_root_in_trans+0x3f/0x70 [btrfs]
 [a04bb940] ? start_transaction+0x90/0x560 [btrfs]
 [a04c605a] ? btrfs_evict_inode+0x33a/0x4d0 [btrfs]
 [811bf0ec] ? evict+0xac/0x170
 [a04c0762] ? btrfs_run_delayed_iputs+0xd2/0xf0 [btrfs]
 [a04bb812] ? btrfs_commit_transaction+0x922/0x9c0 [btrfs]
 [a04bb940] ? start_transaction+0x90/0x560 [btrfs]
 [a0504ea4] ? prepare_to_relocate+0xf4/0x1b0 [btrfs]
 [a0509e72] ? relocate_block_group+0x42/0x670 [btrfs]
 [a050a667] ? btrfs_relocate_block_group+0x1c7/0x2d0 [btrfs]
 [a04e0432] ? btrfs_relocate_chunk.isra.27+0x62/0x700 [btrfs]
 [a04928d1] ? btrfs_set_path_blocking+0x31/0x70 [btrfs]
 [a0497d8d] ? btrfs_search_slot+0x4ad/0xad0 [btrfs]
 [a04d1fd5] ? btrfs_get_token_64+0x55/0xf0 [btrfs]
 [a04e355b] ? btrfs_balance+0x82b/0xe80 [btrfs]
 [a04eaba4] ? btrfs_ioctl_balance+0x154/0x500 [btrfs]
 [a04ef89c] ? btrfs_ioctl+0x58c/0x2b10 [btrfs]
 [811670f1] ? handle_mm_fault+0xa91/0x11a0
 [810562a1] ? __do_page_fault+0x1d1/0x4e0
 [8116afc1] ? vma_link+0xb1/0xc0
 [811b788f] ? do_vfs_ioctl+0x2cf/0x4b0
 [811b7af1] ? SyS_ioctl+0x81/0xa0
 [8150ecc8] ? page_fault+0x28/0x30
 [8150cc2d] ? system_call_fast_compare_end+0x10/0x15
---[ end trace 880987d36ae50245 ]---
BTRFS error (device sdc1): allocation failed flags 1, wanted 2013265920
BTRFS: space_info 1 has 8384299008 free, is not full
BTRFS: space_info total=1500017328128, used=1491533037568, pinned=0,
reserved=99807232, may_use=2147475456, readonly=184320


Re: Fixing Btrfs Filesystem Full Problems typo?

2014-12-10 Thread Patrik Lundquist
On 10 December 2014 at 13:17, Robert White rwh...@pobox.com wrote:
 On 12/09/2014 11:19 PM, Patrik Lundquist wrote:

 BUT FIRST UNDERSTAND: you do _not_ need to balance a newly converted
 filesystem. That is, the recommended balance (and recursive defrag) is _not_
 a useability issue, its an efficiency issue.

But if I can't start with an efficient filesystem I'd rather start
over now/soon. I intend to add four more old disks for a RAID1 and it
will be problematic to start over later on (I'd have to buy new, large
disks).

I deleted the subvolume after being satisfied with the conversion,
defragged recursively, and balanced. In that order.


 Because you made a backup and everything yes?

Shh!


 So anyway. Your system isn't bugged or broken it's full but its a
 fragmented fullness that has lots of free sectors but insufficent contiguous
 free sectors, so it cannot satisfy the request.

It's a half full 3TB disk. There _is_ space, somewhere. I can't speak
for contiguous space though.


 I don't know how to interpret the space_info error. Why is only
 4773171200 (4,4GiB) free?
 Can I inspect block group 1821099687936 to try to find out what makes
 it problematic?

 BTRFS info (device sdc1): relocating block group 1821099687936 flags 1
 BTRFS error (device sdc1): allocation failed flags 1, wanted 2013265920
 BTRFS: space_info 1 has 4773171200 free, is not full
 BTRFS: space_info total=1494648619008, used=1489775505408, pinned=0,
 reserved=99700736, may_use=2102390784, readonly=241664


 So it was looking for a single chunk 2013265920 bytes long and it couldn't
 find one because all the spaces were smaller and there was no room to make a
 new suitable space.

 The problem is that it wanted 2013265920 bytes and while the system as a
 whole had no way to satisfy that desire. It asked for something just shy of
 two gigs as a single extent. That's a tough order on a full platter.

 Since your entire free size is 2102390784 that is an attempt to allocate
 about 80% of your free space as one contiguous block. That's never going to
 happen. 8-)

What about "space_info 1 has 4773171200 free"? Besides the other 1.5TB
of free space.


 I don't even know if 2GiB is normally a legal size for an extent. My
 understanding is that data is allocated in 1G chunks, so I'd expect all
 extents to be smaller than 1G.

The 'summary' after the failed balances is always something like "98
enospc errors", which now makes me suspect that I have 98 files with
extents larger than 1GiB that the defrag didn't take care of.

So if I can find out which files have 1GiB extents I can then copy
them back and forth to solve the problem.

Maybe running defrag more times can also solve it? Can I get a list of
fragmented files?
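
One rough way to hunt for files that still carry a ~1GiB extent (a sketch;
it assumes 4 KiB blocks, so 262144 blocks = 1 GiB, the /mnt mount point,
and the filefrag -v column layout of e2fsprogs 1.42 where the length is
the sixth field):

find /mnt -xdev -type f -size +1G -print0 | \
while IFS= read -r -d '' f
do
    max=$(filefrag -v "$f" | awk '/^ *[0-9]+:/ { if ($6+0 > m) m = $6+0 } END { print m+0 }')
    [ "$max" -ge 262144 ] && echo "$f: largest extent $max blocks"
done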

Suppose an old file with a 2GiB extent isn't fragmented; will btrfs
defrag still try to defrag it?


 After a quick glance at the btrfs-convert, it looks like it might make some
 pretty atypical extents if the underlying donor filesystem needed needed
 them. It wouldn't have had a choice. So it's easily within the realm of
 reason that you'd have some really fascinating data as a result of
 converting a nearly full EXT4 file system of the Terabyte+ size.

It was about half full at conversion.


 This would
 be quadruply true if you'd tweaked the block group ratios when you made the
 original file system.

Ext4 created with defaults, but I think it has been completely full at one time.


 So since you have nice backups... you should probably drop the ext2_saved
 subvolume and then get on with your life for good or ill.

Done before defrag and balance attempts.


 Think of the time and worry you'd have saved if you'd copied the thing in
 the first place. 8-)

But then I wouldn't learn as much. :-)


 P.S. you should re-balance your System and Metadata as DUP for now. Two
 copies of that stuff is better than one as right now you have no real
 recovery path for that stuff. If you didn't make that change on purpose
 it
 probably got down-revved from DUP automagically when you tired to RAID
 it.


 Good point. Maybe btrfs-convert should do that by default? I don't
 think it has ever been DUP.

 Eyup.

And the metadata is now DUP. That's ~1.5GB extra metadata that was
allocated just fine after the failed balance.


Re: Fixing Btrfs Filesystem Full Problems typo?

2014-12-10 Thread Patrik Lundquist
On 10 December 2014 at 14:11, Duncan 1i5t5.dun...@cox.net wrote:

 From there... I've never used it but I /think/ btrfs inspect-internal
 logical-resolve should let you map the 182109... address to a filename.
 From there, moving that file out of the filesystem and back in should
 eliminate that issue.

btrfs inspect-internal logical-resolve 1821099687936 /mnt gives me the
filename, and it's only a 54175-byte file.


 Assuming no snapshots still contain the file, of course, and that the
 ext* saved subvolume has already been deleted.

Got no snapshots or subvolumes. Keeping it simple for now.


Re: Fixing Btrfs Filesystem Full Problems typo?

2014-12-10 Thread Patrik Lundquist
On 10 December 2014 at 13:47, Duncan 1i5t5.dun...@cox.net wrote:

 The recursive btrfs defrag after deleting the saved ext* subvolume
 _should_ have split up any such  1 GiB extents so balance could deal
 with them, but either it failed for some reason on at least one such
 file, or there's some other weird corner-case going on, very likely
 something else having to do with the conversion.

I've run defrag several times again and it doesn't do anything additional.


 Patrik, assuming no btrfs snapshots yet, can you do a du --all --block-
 size=1M | sort -n (or similar), then take a look at all results over 1024
 (1 GiB since the du specified 1 MiB blocks), and see if it's reasonable
 to move all those files out of the filesystem and back?

Good idea, but it's quite a lot of files. I'd rather start over.
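
For reference, the suggestion above as a concrete pipeline (a sketch; the
1024 cutoff is 1 GiB expressed in the 1 MiB block size, and /mnt is an
assumption):

$ du --all --block-size=1M /mnt | sort -n | awk '$1 > 1024'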

But I've identified 46 files from Btrfs errors in syslog and will try
to move them to another disk. They're ranging from 41KiB to 6.6GiB in
size.

Is btrfs-debug-tree -e useful in finding problematic files?


Re: Fixing Btrfs Filesystem Full Problems typo?

2014-12-10 Thread Patrik Lundquist
On 10 December 2014 at 23:28, Robert White rwh...@pobox.com wrote:
 On 12/10/2014 10:56 AM, Patrik Lundquist wrote:

 On 10 December 2014 at 14:11, Duncan 1i5t5.dun...@cox.net wrote:

 Assuming no snapshots still contain the file, of course, and that the
 ext* saved subvolume has already been deleted.

 Got no snapshots or subvolumes. Keeping it simple for now.

 Does that mean that you have already manually removed the subvolume that was
 automatically created by btrfs-convert?

Yes.


Re: Fixing Btrfs Filesystem Full Problems typo?

2014-12-09 Thread Patrik Lundquist
On 24 November 2014 at 13:35, Patrik Lundquist
patrik.lundqu...@gmail.com wrote:
 On 24 November 2014 at 05:23, Duncan 1i5t5.dun...@cox.net wrote:
 Patrik Lundquist posted on Sun, 23 Nov 2014 16:12:54 +0100 as excerpted:

 The balance run now finishes without errors with usage=99 and I think
 I'll leave it at that. No RAID yet but will convert to RAID1.

 Converting between raid modes is done with a balance, so if you can't get
 that last bit to balance, you can't do a full conversion to raid1.

 Good point! It slipped my mind. I'll report back if incremental
 balances eventually solves the balance after conversion ENOSPC
 problem.

I'm having no luck with a full balance of the converted filesystem.
Tried it again with Linux v3.18.0 and btrfs-progs v3.17.3.

What conclusions can be drawn from the following?

BTRFS info (device sdc1): relocating block group 1821099687936 flags 1
BTRFS error (device sdc1): allocation failed flags 1, wanted 2013265920
BTRFS: space_info 1 has 4773171200 free, is not full
BTRFS: space_info total=1494648619008, used=1489775505408, pinned=0,
reserved=99700736, may_use=2102390784, readonly=241664
BTRFS: block group 234109272064 has 5368709120 bytes, 5368709120 used
0 pinned 0 reserved
BTRFS info (device sdc1): block group has cluster?: no
BTRFS info (device sdc1): 0 blocks of free space at or bigger than bytes is
BTRFS: block group 242699206656 has 5368709120 bytes, 5368709120 used
0 pinned 0 reserved
BTRFS info (device sdc1): block group has cluster?: no
BTRFS info (device sdc1): 0 blocks of free space at or bigger than bytes is
BTRFS: block group 339335970816 has 5368709120 bytes, 5368705024 used
0 pinned 0 reserved
BTRFS critical (device sdc1): entry offset 344704675840, bytes 4096, bitmap no


Label: none  uuid: 770fe01d-6a45-42b9-912e-e8f8b413f6a4
Total devices 1 FS bytes used 1.35TiB
devid1 size 2.73TiB used 1.36TiB path /dev/sdc1


Data, single: total=1.35TiB, used=1.35TiB
System, single: total=32.00MiB, used=112.00KiB
Metadata, single: total=3.00GiB, used=1.55GiB
GlobalReserve, single: total=512.00MiB, used=0.00B


Checking filesystem on /dev/sdc1
UUID: 770fe01d-6a45-42b9-912e-e8f8b413f6a4
found 825003219475 bytes used err is 0
total csum bytes: 1452612464
total tree bytes: 1669943296
total fs tree bytes: 39600128
total extent tree bytes: 52903936
btree space waste bytes: 79921034
file data blocks allocated: 1487627730944
 referenced 1487627730944


Re: Fixing Btrfs Filesystem Full Problems typo?

2014-12-09 Thread Patrik Lundquist
On 10 December 2014 at 00:13, Robert White rwh...@pobox.com wrote:
 On 12/09/2014 02:29 PM, Patrik Lundquist wrote:

 Label: none  uuid: 770fe01d-6a45-42b9-912e-e8f8b413f6a4
  Total devices 1 FS bytes used 1.35TiB
  devid1 size 2.73TiB used 1.36TiB path /dev/sdc1


 Data, single: total=1.35TiB, used=1.35TiB
 System, single: total=32.00MiB, used=112.00KiB
 Metadata, single: total=3.00GiB, used=1.55GiB
 GlobalReserve, single: total=512.00MiB, used=0.00B


 Are you trying to convert a filesystem on a single device/partition to RAID
 1?

Not yet. I'm stuck at the full balance after the conversion from ext4.
I haven't added the disks for RAID1 and might need them for starting
over instead.

A balance with -musage=100 -dusage=99 works but a full balance fails. It
would be nice to nail the bug, since the fs passes btrfs check and it
seems to be a clear ENOSPC bug.


I don't know how to interpret the space_info error. Why is only
4773171200 (4,4GiB) free?
Can I inspect block group 1821099687936 to try to find out what makes
it problematic?

BTRFS info (device sdc1): relocating block group 1821099687936 flags 1
BTRFS error (device sdc1): allocation failed flags 1, wanted 2013265920
BTRFS: space_info 1 has 4773171200 free, is not full
BTRFS: space_info total=1494648619008, used=1489775505408, pinned=0,
reserved=99700736, may_use=2102390784, readonly=241664


 P.S. you should re-balance your System and Metadata as DUP for now. Two
 copies of that stuff is better than one as right now you have no real
 recovery path for that stuff. If you didn't make that change on purpose it
 probably got down-revved from DUP automagically when you tired to RAID it.

Good point. Maybe btrfs-convert should do that by default? I don't
think it has ever been DUP.


Re: scrub implies failing drive - smartctl blissfully unaware

2014-11-28 Thread Patrik Lundquist
On 25 November 2014 at 22:34, Phillip Susi ps...@ubuntu.com wrote:
 On 11/19/2014 7:05 PM, Chris Murphy wrote:
  I'm not a hard drive engineer, so I can't argue either point. But
  consumer drives clearly do behave this way. On Linux, the kernel's
  default 30 second command timer eventually results in what look
  like link errors rather than drive read errors. And instead of the
  problems being fixed with the normal md and btrfs recovery
  mechanisms, the errors simply get worse and eventually there's data
  loss. Exhibits A, B, C, D - the linux-raid list is full to the brim
  of such reports and their solution.

 I have seen plenty of error logs of people with drives that do
 properly give up and return an error instead of timing out so I get
 the feeling that most drives are properly behaved.  Is there a
 particular make/model of drive that is known to exhibit this silly
 behavior?

I had a couple of Seagate Barracuda 7200.11 (codename Moose) drives
with seriously retarded firmware.

They never reported a read error AFAIK but began to time out instead.
They wouldn't even respond after a link reset. I had to power cycle
the disks.

Funny days with ddrescue. Got almost everything off them.


Re: scrub implies failing drive - smartctl blissfully unaware

2014-11-28 Thread Patrik Lundquist
On 25 November 2014 at 23:14, Phillip Susi ps...@ubuntu.com wrote:
 On 11/19/2014 6:59 PM, Duncan wrote:

 The paper specifically mentioned that it wasn't necessarily the
 more expensive devices that were the best, either, but the ones
 that faired best did tend to have longer device-ready times.  The
 conclusion was that a lot of devices are cutting corners on
 device-ready, gambling that in normal use they'll work fine,
 leading to an acceptable return rate, and evidently, the gamble
 pays off most of the time.

 I believe I read the same study and don't recall any such conclusion.
  Instead the conclusion was that the badly behaving drives aren't
 ordering their internal writes correctly and flushing their metadata
 from ram to flash before completing the write request.  The problem
 was on the power *loss* side, not the power application.

I've found:

http://www.usenix.org/conference/fast13/technical-sessions/presentation/zheng
http://lkcl.net/reports/ssd_analysis.html

Are there any more studies?


Re: Fixing Btrfs Filesystem Full Problems typo?

2014-11-23 Thread Patrik Lundquist
On 23 November 2014 at 08:52, Duncan 1i5t5.dun...@cox.net wrote:
 [a whole lot]

Thanks for the long post, Duncan.

My venture into the finer details of balance began with converting an
ext4 fs to btrfs and, after an initial defrag, having a full balance fail
with about a third to go.

Consecutive full balances further reduced the number of chunks and got
me closer to finishing without the infamous ENOSPC. After 3-4 full
balance runs it failed with less than 8% to go.

The balance run now finishes without errors with usage=99 and I think
I'll leave it at that. No RAID yet but will convert to RAID1.

Is it correct that there is no reason to ever do a 100% balance as
routine maintenance? I mean if you really need that last 1% space you
actually need a disk upgrade instead.

How about running a monthly maintenance job that uses bytes_used and
dev_item.bytes_used from btrfs-show-super to approximate the balance
need?

(dev_item.bytes_used - bytes_used) / bytes_used == extra device space used

The extra device space used after my balance usage=99 is 0.15%. It was
7.0% before I began tinkering with usage and ran into ENOSPC, and I
think it is safe to assume that it was a lot more right after the fs
conversion.

So let's iterate balance runs, beginning with usage=0, increasing in
steps of 5 or 10, and stopping at 90 or 99, or when the extra device
space used is less than 1%.
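
A minimal sketch of that job (untested; it assumes a single-device
filesystem, the field names printed by btrfs-show-super in this era of
btrfs-progs, and that whole-percent integer math is good enough):

#!/bin/bash
dev=/dev/sdc1
mnt=/mnt

overhead() {    # extra device space used, in whole percent
    used=$(btrfs-show-super "$dev" | awk '$1 == "bytes_used" { print $2 }')
    alloc=$(btrfs-show-super "$dev" | awk '$1 == "dev_item.bytes_used" { print $2 }')
    echo $(( (alloc - used) * 100 / used ))
}

for usage in 0 10 20 30 40 50 60 70 80 90 99
do
    [ "$(overhead)" -lt 1 ] && break
    btrfs balance start -dusage=$usage -musage=$usage "$mnt"
done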

Does it make sense?


Re: Fixing Btrfs Filesystem Full Problems typo?

2014-11-22 Thread Patrik Lundquist
On 22 November 2014 at 23:26, Marc MERLIN m...@merlins.org wrote:

 This one hurts my brain every time I think about it :)

I'm new to Btrfs so I may very well be wrong, since I haven't really
read up on it. :-)


 So, the bigger the -dusage number, the more work btrfs has to do.

Agreed.


 -dusage=0 does almost nothing
 -dusage=100 effectively rebalances everything

And -dusage=0 effectively reclaims empty chunks, right?


 But saying saying less than 95% full for -dusage=95 would mean
 rebalancing everything that isn't almost full,

But isn't that what rebalance does? Rewriting chunks <=95% full into
completely full chunks, effectively defragmenting chunks and most
likely reducing their number.

A -dusage=0 rebalance reduced my number of chunks from 1173 to 998 and
dev_item.bytes_used went from 1593466421248 to 1491460947968.
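
(For reference, a sketch of that dusage=0 run and the before/after check
of dev_item.bytes_used; device and mount point are assumptions:)

$ btrfs balance start -dusage=0 /mnt
$ btrfs-show-super /dev/sdc1 | grep bytes_used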


 Now, just to be sure, if I'm getting this right, if your filesystem is
 55% full, you could rebalance all blocks that have less than 55% space
 free, and use -dusage=55

I realize that I interpret the usage parameter as operating on blocks
(chunks? are they the same in this case?) that are <= 55% full, while
you interpret it as < 55% free.

Which is correct?