Re: Kernel crash on mount after SMR disk trouble

2016-06-10 Thread Jukka Larja

On 10.6.2016 at 23:20, Henk Slager wrote:

On Sat, May 14, 2016 at 10:19 AM, Jukka Larja  wrote:

In short:

I added two 8TB Seagate Archive SMR disks to a btrfs pool and tried to delete
one of the old disks. After some errors I ended up with a file system that can
be mounted read-only, but crashes the kernel if mounted normally. I tried
btrfs check --repair (which noted that the space cache needs to be zeroed) and
zeroing the space cache (via mount parameter), but that didn't change anything.
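
For reference, the recovery attempts described above correspond to something like the following (a sketch; the device node and mount point are placeholders, not from the original report):

```shell
# A read-only mount still works:
sudo mount -o ro /dev/sdc1 /mnt/pool

# Offline check; --repair reported that the space cache needed zeroing:
sudo umount /mnt/pool
sudo btrfs check --repair /dev/sdc1

# Throw away and rebuild the free-space cache on the next writable mount:
sudo mount -o clear_cache /dev/sdc1 /mnt/pool
```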

Longer version:

I was originally running Debian Jessie with some pretty recent kernel (maybe
4.4), but somewhat older btrfs tools. After the trouble started, I tried


You should have at least kernel 4.4; the critical patch for supporting
this drive was added in 4.4-rc3 or 4.4-rc4, I don't remember exactly.
It might work only if you somehow disable NCQ completely in your Linux
system (kernel and more) or use a HW chipset/bridge that does that for
you.


After the crash I tracked the issue somewhat and found a discussion about a 
very similar issue (starting with drives failing under dd or badblocks and 
ending, after several patches, with drives working everywhere except maybe 
in Btrfs in certain cases). As far as I could tell, the 4.5 kernel has all 
the patches from that discussion, but I may have missed something that 
wasn't mentioned there.



updating (now running kernel 4.5.1 and tools 4.4.1). I checked the new disks
with badblocks (no problems found), but based on some googling, Seagate's
SMR disks seem to have various problems, so the root cause is probably one
type of disk error or another.


Seagate provides a special variant of the Linux ext4 filesystem that
should play well with their SMR drives. The advice is also not to
use this drive in an array setup; the risk is way too high that they
can't keep up with the demands of the higher layers and then get
resets or their FW crashes. You should also have had a look at
your system's and drive's timeouts (see scterc). To summarize: adding
those drives to a btrfs raid array is asking for trouble.
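
The scterc/timeout advice translates into commands roughly like these (a sketch; /dev/sdX is a placeholder, and the 7-second/180-second values are common rules of thumb, not figures from this thread):

```shell
# Ask the drive to give up on an unreadable sector quickly
# (SCT ERC, value in tenths of a second); query first, then set:
sudo smartctl -l scterc /dev/sdX
sudo smartctl -l scterc,70,70 /dev/sdX

# Make sure the kernel's SCSI command timeout is longer than the
# drive's internal error recovery time:
cat /sys/block/sdX/device/timeout
echo 180 | sudo tee /sys/block/sdX/device/timeout
```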


Increasing timeouts didn't help with the drive. The array freezes when the 
drive drops out, then there's a crash when the timeout occurs. It doesn't 
matter if the drive has come back in the meantime (the drive doesn't return 
with the same /dev/sdX, though I don't know if that matters for Btrfs).


I always thought that the problem with these drives was supposed to be bad 
performance and a worse-than-usual ability to handle power going out. My use 
case is quite light from a bytes-written point of view, so I didn't expect 
trouble. Of course, doing the initial add + balance isn't light at all.


What I didn't expect was essentially write errors. A pity, since the 
disks are dirt cheap compared to the alternatives and I really don't care 
about performance.



I am using one such drive with an Intel J1900 SoC (Atom, SATA2) and it
works, although I still get the typical error occasionally. As it is
just a btrfs receive target, with just one dup/dup/single filesystem for
the whole drive, all CoW, it survives those lockups or crashes; I just
restart the board+drive. In general, reading back multi-TB ro snapshots
works fine and is on par with Gbps LAN speeds.


I'll probably test those drives as a target for DVR backups, when I get them 
out of the array (I'm still waiting for new drives with which to start over; 
then I'll just tear down the old array).



Indeed, the kernel should not crash in such a case. It is not clear whether
you run a 4.5.1 or 4.5.0 kernel in kernel.org terminology, but
newer than 4.5.x probably does not help in this case.
You could try to mount with usebackuproot and then see if you can get
it writable, after setting long timeout values for the drive. If it
works, then remove those 2 SMRs from the array ASAP.


I understand that usebackuproot requires kernel >= 4.6. I probably won't be 
installing a custom kernel, but if I still have the array in its current 
state when 4.6 becomes available in Debian Stretch, I'll give it a try.
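
When that kernel is available, the suggested attempt would look roughly like this (a sketch; the device node and mount point are placeholders):

```shell
# Try the most recent usable backup tree root, read-only first:
sudo mount -o ro,usebackuproot /dev/sdc1 /mnt/pool

# If that mounts cleanly, remount writable and get the SMR drives
# out of the array as soon as possible:
sudo mount -o remount,rw /mnt/pool
sudo btrfs device delete /dev/sdc1 /dev/sdd1 /mnt/pool
```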


--
 ...Activity alien to life...
 Jukka Larja, jla...@iki.fi, 0407679919

"... on paper looked like a great chip (10 GFs at 1.2 GHZ whith 35W"
"It's a mystery to me why people continue to use silicon - processors on 
paper are always faster and cooler :-)"

- lubemark and Richard Cownie on RWT forums -

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: recent complete stalls of btrfs (4.6.0-rc4+) -- any advice?

2016-06-10 Thread Chris Murphy
On Fri, Jun 10, 2016 at 5:41 PM, Yaroslav Halchenko <y...@onerussian.com> wrote:
> Dear BTRFS developers,
>
> First of all -- thanks for developing BTRFS!  So far it has served really
> well, while others fell (or failed) behind in my initial evaluation
> (http://datalad.org/test_fs_analysis.html).  With btrbk, backups are a
> breeze.  But unfortunately it still fails completely for me at times.
>
> I know that I should upgrade the kernel, and I will now...  but I
> thought to share this incident report since it might be of
> some value.  Running Debian jessie but with a manually built kernel.
> btrfs is used extensively for a metadata-heavy partition (lots of
> symlinks, lots of directories with a single file in them -- heavy use of
> git-annex); snapshots are taken regularly, etc.
>
> Setup -- btrfs on top of software raids:
>
> # btrfs fi show /mnt/btrfs/
> Label: 'tank'  uuid: b5fe7f5e-3478-4293-a42c-bf9ca26ea724
> Total devices 4 FS bytes used 21.07TiB
> devid2 size 10.92TiB used 5.30TiB path /dev/md10
> devid3 size 10.92TiB used 5.30TiB path /dev/md11
> devid4 size 10.92TiB used 5.30TiB path /dev/md12
> devid5 size 10.92TiB used 5.30TiB path /dev/md13
>
>
> Within the last 5 days, the beast has stalled twice.  The last signs
> were:
>
> * 20160605 -- kernel kaboomed at btrfs level
>
> smaug login: [3675876.734400] Kernel panic - not syncing: stack-protector: 
> Kernel stack is corrupted in: a03d0354
> [3675876.734400]
> [3675876.745680] CPU: 9 PID: 651474 Comm: git Tainted: GW IO
> 4.6.0-rc4+ #1
> [3675876.753272] Hardware name: Supermicro X10DRi/X10DRI-T, BIOS 1.0b 
> 09/17/2014
> [3675876.760431]  0086 5e62edd4 813098f5 
> 817cd080
> [3675876.768104]  880036f23da8 811701af 881e0010 
> 880036f23db8
> [3675876.775763]  880036f23d50 5e62edd4 880036f23d88 
> a03d0354
> [3675876.783426] Call Trace:
> [3675876.786057]  [] ? dump_stack+0x5c/0x77
> [3675876.791575]  [] ? panic+0xdf/0x226
> [3675876.796812]  [] ? btrfs_add_link+0x384/0x3e0 [btrfs]
> [3675876.803549]  [] ? __stack_chk_fail+0x17/0x30
> [3675876.809610]  [] ? btrfs_add_link+0x384/0x3e0 [btrfs]
> [3675876.816391]  [] ? btrfs_link+0x143/0x220 [btrfs]
> [3675876.822802]  [] ? vfs_link+0x1af/0x280
> [3675876.828331]  [] ? SyS_link+0x22a/0x260
> [3675876.833859]  [] ? entry_SYSCALL_64_fastpath+0x1e/0xa8
> [3675876.840740] Kernel Offset: disabled
> [3675876.854050] ---[ end Kernel panic - not syncing: stack-protector: Kernel 
> stack is corrupted in: a03d0354
> [3675876.854050]
>
> * 20160610 -- again, different kaboom
>
> [443370.085059] CPU: 10 PID: 1044513 Comm: git-annex Tainted: GW IO   
>  4.6.0-rc4+ #1
> [443370.093268] Hardware name: Supermicro X10DRi/X10DRI-T, BIOS 1.0b 
> 09/17/2014
> [443370.100356] task: 8806c463d0c0 ti: 8808f9dc8000 task.ti: 
> 8808f9dc8000
> [443370.107953] RIP: 0010:[]  [] 
> 0x88090f67be10
> [443370.115761] RSP: 0018:8808f9dcbe18  EFLAGS: 00010292
> [443370.121187] RAX: 88103fd95fc0 RBX: 8808f9dcc000 RCX: 
> 
> [443370.128438] RDX:  RSI: 8806c463d0c0 RDI: 
> 88103fd95fc0
> [443370.135693] RBP: 8808f9dcbe30 R08: 8808f9dc8000 R09: 
> 
> [443370.142940] R10: 000a R11:  R12: 
> 881035beedc8
> [443370.150184] R13: 880ff1106800 R14: 88123d6c R15: 
> 88123d6c0068
> [443370.157432] FS:  7f0ab3d83740() GS:88103fd8() 
> knlGS:
> [443370.165645] CS:  0010 DS:  ES:  CR0: 80050033
> [443370.171512] CR2: 88090f67be10 CR3: 000cf7516000 CR4: 
> 001406e0
> [443370.178758] Stack:
> [443370.180880]  88069dda93c0 a0358700 88069dda93c0 
> 880f
> [443370.188490]  8806c463d0c0 810bb560 8808f9dcbe48 
> 8808f9dcbe48
> [443370.196107]  d5ce3509 88069dda93c0 0001 
> 8806a64835c8
> [443370.203726] Call Trace:
> [443370.206310]  [] ? btrfs_commit_transaction+0x350/0xa30 
> [btrfs]
> [443370.213826]  [] ? wait_woken+0x90/0x90
> [443370.219280]  [] ? btrfs_sync_file+0x2fb/0x3d0 [btrfs]
> [443370.226012]  [] ? do_fsync+0x38/0x60
> [443370.231267]  [] ? SyS_fdatasync+0xf/0x20
> [443370.236870]  [] ? entry_SYSCALL_64_fastpath+0x1e/0xa8
> [443370.243604] Code: 88 ff ff 21 67 5b 81 ff ff ff ff 00 00 6c 3d 12 88 ff 
> ff dd 77 35 a0 ff ff ff ff 00 00 00 00 00 00 00 00 40 e0 91 4b 08 88 ff ff 
> <60> b5 0b 81 ff ff ff ff f0 fd 61 8a 0c 88 ff ff 18 7c 79 3e 

recent complete stalls of btrfs (4.6.0-rc4+) -- any advice?

2016-06-10 Thread Yaroslav Halchenko
Dear BTRFS developers,

First of all -- thanks for developing BTRFS!  So far it has served really
well, while others fell (or failed) behind in my initial evaluation
(http://datalad.org/test_fs_analysis.html).  With btrbk, backups are a
breeze.  But unfortunately it still fails completely for me at times.

I know that I should upgrade the kernel, and I will now...  but I
thought to share this incident report since it might be of
some value.  Running Debian jessie but with a manually built kernel.
btrfs is used extensively for a metadata-heavy partition (lots of
symlinks, lots of directories with a single file in them -- heavy use of
git-annex); snapshots are taken regularly, etc.

Setup -- btrfs on top of software raids:

# btrfs fi show /mnt/btrfs/
Label: 'tank'  uuid: b5fe7f5e-3478-4293-a42c-bf9ca26ea724
Total devices 4 FS bytes used 21.07TiB
devid2 size 10.92TiB used 5.30TiB path /dev/md10
devid3 size 10.92TiB used 5.30TiB path /dev/md11
devid4 size 10.92TiB used 5.30TiB path /dev/md12
devid5 size 10.92TiB used 5.30TiB path /dev/md13


Within the last 5 days, the beast has stalled twice.  The last signs
were:

* 20160605 -- kernel kaboomed at btrfs level

smaug login: [3675876.734400] Kernel panic - not syncing: stack-protector: 
Kernel stack is corrupted in: a03d0354
[3675876.734400]
[3675876.745680] CPU: 9 PID: 651474 Comm: git Tainted: GW IO
4.6.0-rc4+ #1
[3675876.753272] Hardware name: Supermicro X10DRi/X10DRI-T, BIOS 1.0b 09/17/2014
[3675876.760431]  0086 5e62edd4 813098f5 
817cd080
[3675876.768104]  880036f23da8 811701af 881e0010 
880036f23db8
[3675876.775763]  880036f23d50 5e62edd4 880036f23d88 
a03d0354
[3675876.783426] Call Trace:
[3675876.786057]  [] ? dump_stack+0x5c/0x77
[3675876.791575]  [] ? panic+0xdf/0x226
[3675876.796812]  [] ? btrfs_add_link+0x384/0x3e0 [btrfs]
[3675876.803549]  [] ? __stack_chk_fail+0x17/0x30
[3675876.809610]  [] ? btrfs_add_link+0x384/0x3e0 [btrfs]
[3675876.816391]  [] ? btrfs_link+0x143/0x220 [btrfs]
[3675876.822802]  [] ? vfs_link+0x1af/0x280
[3675876.828331]  [] ? SyS_link+0x22a/0x260
[3675876.833859]  [] ? entry_SYSCALL_64_fastpath+0x1e/0xa8
[3675876.840740] Kernel Offset: disabled
[3675876.854050] ---[ end Kernel panic - not syncing: stack-protector: Kernel 
stack is corrupted in: a03d0354
[3675876.854050]

* 20160610 -- again, different kaboom

[443370.085059] CPU: 10 PID: 1044513 Comm: git-annex Tainted: GW IO
4.6.0-rc4+ #1
[443370.093268] Hardware name: Supermicro X10DRi/X10DRI-T, BIOS 1.0b 09/17/2014
[443370.100356] task: 8806c463d0c0 ti: 8808f9dc8000 task.ti: 
8808f9dc8000
[443370.107953] RIP: 0010:[]  [] 
0x88090f67be10
[443370.115761] RSP: 0018:8808f9dcbe18  EFLAGS: 00010292
[443370.121187] RAX: 88103fd95fc0 RBX: 8808f9dcc000 RCX: 

[443370.128438] RDX:  RSI: 8806c463d0c0 RDI: 
88103fd95fc0
[443370.135693] RBP: 8808f9dcbe30 R08: 8808f9dc8000 R09: 

[443370.142940] R10: 000a R11:  R12: 
881035beedc8
[443370.150184] R13: 880ff1106800 R14: 88123d6c R15: 
88123d6c0068
[443370.157432] FS:  7f0ab3d83740() GS:88103fd8() 
knlGS:
[443370.165645] CS:  0010 DS:  ES:  CR0: 80050033
[443370.171512] CR2: 88090f67be10 CR3: 000cf7516000 CR4: 
001406e0
[443370.178758] Stack:
[443370.180880]  88069dda93c0 a0358700 88069dda93c0 
880f
[443370.188490]  8806c463d0c0 810bb560 8808f9dcbe48 
8808f9dcbe48
[443370.196107]  d5ce3509 88069dda93c0 0001 
8806a64835c8
[443370.203726] Call Trace:
[443370.206310]  [] ? btrfs_commit_transaction+0x350/0xa30 
[btrfs]
[443370.213826]  [] ? wait_woken+0x90/0x90
[443370.219280]  [] ? btrfs_sync_file+0x2fb/0x3d0 [btrfs]
[443370.226012]  [] ? do_fsync+0x38/0x60
[443370.231267]  [] ? SyS_fdatasync+0xf/0x20
[443370.236870]  [] ? entry_SYSCALL_64_fastpath+0x1e/0xa8
[443370.243604] Code: 88 ff ff 21 67 5b 81 ff ff ff ff 00 00 6c 3d 12 88 ff ff 
dd 77 35 a0 ff ff ff ff 00 00 00 00 00 00 00 00 40 e0 91 4b 08 88 ff ff <60> b5 
0b 81 ff ff ff ff f0 fd 61 8a 0c 88 ff ff 18 7c 79 3e 00
[443370.264107] RIP  [] 0x88090f67be10
[443370.271044]  RSP 
[443370.276177] CR2: 88090f67be10
[443370.284979] ---[ end trace 2c4b690b49d17ebd ]---

and for the last case here are more details, with dmesg showing apparently 
other tracebacks and errors logged before, so it might be of help:

http://www.onerussian.com/tmp/dmesg-nonet.20160610.txt

Are those issues something that was fixed since 4.6.0-rc4, or should I
be on the lookout for them to come back?  What other information should I
provide if I run into them again, to help you troubleshoot/fix them?

P.S. Please CC me the replies

-- 
Yaro

Re: Cannot balance FS (No space left on device)

2016-06-10 Thread Hans van Kranenburg

On 06/11/2016 12:10 AM, ojab // wrote:

On Fri, Jun 10, 2016 at 9:56 PM, Hans van Kranenburg
 wrote:

You can work around it by either adding two disks (like Henk said), or by
temporarily converting some chunks to single. Just enough to get some free
space on the first two disks to get a balance going that can fill the third
one. You don't have to convert all of your data or metadata to single!

Something like:

btrfs balance start -v -dconvert=single,limit=10 /mnt/xxx/


Unfortunately it fails even if I set limit=1:

$ sudo btrfs balance start -v -dconvert=single,limit=1 /mnt/xxx/
Dumping filters: flags 0x1, state 0x0, force is off
  DATA (flags 0x120): converting, target=281474976710656, soft is off, limit=1
ERROR: error during balancing '/mnt/xxx/': No space left on device
There may be more info in syslog - try dmesg | tail


Ah, apparently the balance operation *always* wants to allocate some new 
empty space before starting to look more closely at the task you give it...


This means that it's trying to allocate a new set of RAID0 chunks 
first... and that's exactly the opposite of what we want to accomplish here.


If you really can add only one extra device now, there's always a dirtier 
way to get the job done.


What you can do for example is:
- partition the new disk in two partitions
- add them both to the filesystem (btrfs doesn't know both block devices 
are on the same physical disk, ghehe)

- convert a small number of data blocks to single
- then device delete the third disk again so the single chunks move back 
to the two first disks

- add the third disk back as one whole block device
- etc...

:D
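
Spelled out as commands, the trick might look like this (a rough, untested sketch; /dev/sdd and /mnt/xxx are placeholders):

```shell
# 1. Split the new disk into two partitions and add both to the fs;
#    btrfs treats them as two independent devices:
sudo parted -s /dev/sdd mklabel gpt mkpart p1 0% 50% mkpart p2 50% 100%
sudo btrfs device add /dev/sdd1 /dev/sdd2 /mnt/xxx

# 2. Convert a few data chunks to single; the new chunks can land on
#    the two new "devices" even though the old disks are full:
sudo btrfs balance start -dconvert=single,limit=10 /mnt/xxx

# 3. Remove the partitions again, so the single chunks move back to
#    the space freed on the first two disks:
sudo btrfs device delete /dev/sdd2 /mnt/xxx
sudo btrfs device delete /dev/sdd1 /mnt/xxx

# 4. Re-add the disk as one whole device:
sudo parted -s /dev/sdd mklabel gpt mkpart p1 0% 100%
sudo btrfs device add /dev/sdd1 /mnt/xxx
```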

Moo,

--
Hans van Kranenburg - System / Network Engineer
T +31 (0)10 2760434 | hans.van.kranenb...@mendix.com | www.mendix.com
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Allocator behaviour during device delete

2016-06-10 Thread Hans van Kranenburg

On 06/10/2016 09:58 PM, Hans van Kranenburg wrote:

On 06/10/2016 09:26 PM, Henk Slager wrote:

On Thu, Jun 9, 2016 at 3:54 PM, Brendan Hide
 wrote:


On 06/09/2016 03:07 PM, Austin S. Hemmelgarn wrote:


OK, I'm pretty sure I know what was going on in this case.  Your
assumption that device delete uses the balance code is correct, and that
is why you see what's happening.  There are two key bits that
are missing though:
1. Balance will never allocate chunks when it doesn't need to.


In relation to discussions w.r.t. enospc and devices full of chunks, I
saw this statement 1., and I see different behavior with kernel 4.6.0,
tools 4.5.3.
On an idle fs with some fragmentation, I did balance -dusage=5; it
completed successfully and left a new empty chunk (highest vaddr).
Then balance -dusage=6 processed the 2 chunks with that usage level:
- the zero-filled last chunk is replaced with a new empty chunk
(higher vaddr)
- the 2 usage=6 chunks are gone
- one chunk with the lowest vaddr saw its usage increase from 47 to 60
- several metadata chunks have changed slightly in usage


I noticed the same thing, kernel 4.5.4, progs 4.4.1.

When balance starts doing anything, (so relocating >= 1 chunks, not when
relocating 0), it first creates a new empty chunk. Even if all data that
is balanced away is added to already existing chunks, the new empty one
is still always left behind.

When doing balance again with dusage=0, or doing so repeatedly, each
time a new empty chunk is created and then the previous empty one is
removed, bumping up the start vaddr of the new chunk by 1GB each time.
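
That bump can be watched directly (a sketch; it needs root, /dev/sdb and /mnt/pool are placeholders, and in progs of this era the chunk tree dump lives in btrfs-debug-tree, tree id 3):

```shell
# Relocate only completely empty data chunks, twice; after each run the
# trailing empty chunk reappears at a start address ~1GiB higher.
sudo btrfs balance start -dusage=0 /mnt/pool
sudo btrfs-debug-tree -t 3 /dev/sdb | grep 'CHUNK_ITEM' | tail -n 3
sudo btrfs balance start -dusage=0 /mnt/pool
sudo btrfs-debug-tree -t 3 /dev/sdb | grep 'CHUNK_ITEM' | tail -n 3
```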



Well, there it is:

commit 2c9fe835525896077e7e6d8e416b97f2f868edef

http://www.spinics.net/lists/linux-btrfs/msg47679.html

First the "I find it somewhat awkward that we always allocate a new data 
block group no matter what." section, and then the answer below:


"2: for filesystem with data, we have to create target-chunk in balance 
operation, this patch only make "creating-chunk" earlier"


^^ This overlooks the case in which creating a new chunk is not 
necessary at all, because all data can be appended to existing ones?


This also prevents ojab, in the latest thread here, from converting some 
chunks to single when his RAID0 devices are full, because balance forcibly 
tries to create new empty RAID0 space first, which is not going to be 
used at all, and which is the opposite of the intended behaviour...


--
Hans van Kranenburg - System / Network Engineer
T +31 (0)10 2760434 | hans.van.kranenb...@mendix.com | www.mendix.com


Re: Cannot balance FS (No space left on device)

2016-06-10 Thread ojab //
On Fri, Jun 10, 2016 at 9:56 PM, Hans van Kranenburg
 wrote:
> You can work around it by either adding two disks (like Henk said), or by
> temporarily converting some chunks to single. Just enough to get some free
> space on the first two disks to get a balance going that can fill the third
> one. You don't have to convert all of your data or metadata to single!
>
> Something like:
>
> btrfs balance start -v -dconvert=single,limit=10 /mnt/xxx/

Unfortunately it fails even if I set limit=1:
>$ sudo btrfs balance start -v -dconvert=single,limit=1 /mnt/xxx/
>Dumping filters: flags 0x1, state 0x0, force is off
>  DATA (flags 0x120): converting, target=281474976710656, soft is off, limit=1
>ERROR: error during balancing '/mnt/xxx/': No space left on device
>There may be more info in syslog - try dmesg | tail

//wbr ojab


Re: Cannot balance FS (No space left on device)

2016-06-10 Thread Hans van Kranenburg

On 06/10/2016 11:33 PM, ojab // wrote:

On Fri, Jun 10, 2016 at 9:00 PM, Henk Slager  wrote:

I have seldom seen an fs so full, very regular numbers :)

But can you provide the output of this script:
https://github.com/knorrie/btrfs-heatmap/blob/master/show_usage.py

It gives better info w.r.t. devices and it is then easier to say what
has to be done.

But you have btrfs raid0 data (2 stripes) and raid1 metadata, and they
both want 2 devices currently, and there is only one device with room
for your 2G chunks. So in theory you need 2 empty devices added for a
balance to succeed. If you can accept reduced redundancy for some time,
you could shrink the fs used space on hdd1 to half, do the same for the
partition itself, add an hdd2 partition and add that to the fs. Or
just add another HDD.
Then your 50Gb of deletions could take effect if you start
balancing. Also have a look at the balance stripe filters, I would say.


Output of show_usage.py:
https://gist.githubusercontent.com/ojab/850276af6ff3aa566b8a3ce6ec444521/raw/4d77e02d556ed0edb0f9823259f145f65e80bc66/gistfile1.txt
Looks like I only have smaller spare drives at the moment (the largest is
100GB); is it OK to use one, or is there some minimal drive size needed
for my setup?


You can work around it by either adding two disks (like Henk said), or 
by temporarily converting some chunks to single. Just enough to get some 
free space on the first two disks to get a balance going that can fill 
the third one. You don't have to convert all of your data or metadata to 
single!


Something like:

btrfs balance start -v -dconvert=single,limit=10 /mnt/xxx/

New allocated chunks will go to the third disk, because it has the most 
free space.


After this, you can convert the single data back to raid0:

btrfs balance start -v -dconvert=raid0,soft /mnt/xxx/

soft is important, because it only touches chunks that are not raid0 yet.

And in the end there should be a few GB of free space on the first two 
disks, so you can do the big balance to spread all data over the three 
disks: just btrfs balance start -v -dusage=100 /mnt/xxx/


Review the commands before doing anything, as I haven't tested this 
here. The man page for btrfs-balance contains all info :)


Looking at btrfs balance status, btrfs fi show, etc. in another terminal 
while it's working is always nice, so you see what's happening, and you 
can always stop it with btrfs balance cancel when you think it has moved 
around enough data.
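
Collected in one place, the sequence is (same caveat as above: review before running; /mnt/xxx as in the thread):

```shell
# 1. Free space on the two full disks by converting a few data chunks
#    to single; new chunks land on the third (emptiest) disk:
sudo btrfs balance start -v -dconvert=single,limit=10 /mnt/xxx/

# 2. Convert those single chunks back to raid0; 'soft' skips chunks
#    that already have the target profile:
sudo btrfs balance start -v -dconvert=raid0,soft /mnt/xxx/

# 3. Full rebalance to spread data over all three disks:
sudo btrfs balance start -v -dusage=100 /mnt/xxx/

# Meanwhile, in another terminal:
watch -n 10 'btrfs balance status /mnt/xxx/; btrfs fi show /mnt/xxx/'
```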


Moo,

--
Hans van Kranenburg - System / Network Engineer
T +31 (0)10 2760434 | hans.van.kranenb...@mendix.com | www.mendix.com


Re: Cannot balance FS (No space left on device)

2016-06-10 Thread ojab //
On Fri, Jun 10, 2016 at 9:00 PM, Henk Slager  wrote:
> I have seldom seen an fs so full, very regular numbers :)
>
> But can you provide the output of this script:
> https://github.com/knorrie/btrfs-heatmap/blob/master/show_usage.py
>
> It gives better info w.r.t. devices and it is then easier to say what
> has to be done.
>
> But you have btrfs raid0 data (2 stripes) and raid1 metadata, and they
> both want 2 devices currently, and there is only one device with room
> for your 2G chunks. So in theory you need 2 empty devices added for a
> balance to succeed. If you can accept reduced redundancy for some time,
> you could shrink the fs used space on hdd1 to half, do the same for the
> partition itself, add an hdd2 partition and add that to the fs. Or
> just add another HDD.
> Then your 50Gb of deletions could take effect if you start
> balancing. Also have a look at the balance stripe filters, I would say.

Output of show_usage.py:
https://gist.githubusercontent.com/ojab/850276af6ff3aa566b8a3ce6ec444521/raw/4d77e02d556ed0edb0f9823259f145f65e80bc66/gistfile1.txt
Looks like I only have smaller spare drives at the moment (the largest is
100GB); is it OK to use one, or is there some minimal drive size needed
for my setup?

//wbr ojab


Re: Cannot balance FS (No space left on device)

2016-06-10 Thread Henk Slager
On Fri, Jun 10, 2016 at 8:04 PM, ojab //  wrote:
> [Please CC me since I'm not subscribed to the list]
> Hi,
> I've tried to `/usr/bin/btrfs fi defragment -r` my btrfs partition,
> but it failed with "No space left on device" and now I can't get any
> free space on that partition (deleting some files or adding a new device
> doesn't help). During the defrag I used the `space_cache=v2` mount option,
> but have remounted the FS with the `clear_cache` flag since then. I've also
> deleted about 50Gb of files and added a new 250Gb disk since then:
>
>>$ df -h /mnt/xxx/
>>Filesystem  Size  Used Avail Use% Mounted on
>>/dev/sdc1   2,1T  1,8T   37G  99% /mnt/xxx
>>$ sudo /usr/bin/btrfs fi show
>>Label: none  uuid: 8a65465d-1a8c-4f80-abc6-c818c38567c3
>>Total devices 3 FS bytes used 1.78TiB
>>devid1 size 931.51GiB used 931.51GiB path /dev/sdc1
>>devid2 size 931.51GiB used 931.51GiB path /dev/sdb1
>>devid3 size 230.41GiB used 0.00B path /dev/sdd1
>>$ sudo /usr/bin/btrfs fi usage /mnt/xxx/
>>Overall:
>>Device size:   2.04TiB
>>Device allocated:  1.82TiB
>>Device unallocated:230.41GiB
>>Device missing:0.00B
>>Used:  1.78TiB
>>Free (estimated):  267.23GiB  (min: 152.03GiB)
>>Data ratio:1.00
>>Metadata ratio:2.00
>>Global reserve:512.00MiB  (used: 0.00B)
>>
>>Data,RAID0: Size:1.81TiB, Used:1.78TiB
>>   /dev/sdb1   928.48GiB
>>   /dev/sdc1   928.48GiB
>>
>>Metadata,RAID1: Size:3.00GiB, Used:2.30GiB
>>   /dev/sdb1   3.00GiB
>>   /dev/sdc1   3.00GiB
>>
>>System,RAID1: Size:32.00MiB, Used:176.00KiB
>>   /dev/sdb132.00MiB
>>   /dev/sdc132.00MiB
>>
>>Unallocated:
>>   /dev/sdb1   1.01MiB
>>   /dev/sdc1   1.00MiB
>>   /dev/sdd1   230.41GiB
>>$ sudo /usr/bin/btrfs balance start -dusage=66 /mnt/xxx/
>>Done, had to relocate 0 out of 935 chunks
>>$ sudo /usr/bin/btrfs balance start -dusage=67 /mnt/xxx/
>>ERROR: error during balancing '/mnt/xxx/': No space left on device
>>There may be more info in syslog - try dmesg | tail
>
> I assume that there is something wrong with the metadata, since I can
> still copy files to the FS.
> I'm on a 4.6.2 vanilla kernel and using btrfs-progs-4.6; btrfs-debugfs
> output can be found here:
> https://gist.githubusercontent.com/ojab/1a8b1f83341403a169a8e66995c7c3da/raw/61621d22f706d7543a93a3d005415543af9a0db0/gistfile1.txt.
> Any hint what else can I try to fix the issue?

I have seldom seen an fs so full, very regular numbers :)

But can you provide the output of this script:
https://github.com/knorrie/btrfs-heatmap/blob/master/show_usage.py

It gives better info w.r.t. devices and it is then easier to say what
has to be done.

But you have btrfs raid0 data (2 stripes) and raid1 metadata, and they
both want 2 devices currently, and there is only one device with room
for your 2G chunks. So in theory you need 2 empty devices added for a
balance to succeed. If you can accept reduced redundancy for some time,
you could shrink the fs used space on hdd1 to half, do the same for the
partition itself, add an hdd2 partition and add that to the fs. Or
just add another HDD.
Then your 50Gb of deletions could take effect if you start
balancing. Also have a look at the balance stripe filters, I would say.


Re: [PATCH] Btrfs-progs: add check-only option for balance

2016-06-10 Thread Hans van Kranenburg

Hi,

Correct me if I'm wrong,

On 06/09/2016 11:46 PM, Ashish Samant wrote:

+/* return 0 if balance can remove a data block group, otherwise return 1 */
+static int search_data_bgs(const char *path)
+{
+   struct btrfs_ioctl_search_args args;
+   struct btrfs_ioctl_search_key *sk;
+   struct btrfs_ioctl_search_header *header;
+   struct btrfs_block_group_item *bg;
+   unsigned long off = 0;
+   DIR *dirstream = NULL;
+   int e;
+   int fd;
+   int i;
+   u64 total_free = 0;
+   u64 min_used = (u64)-1;
+   u64 free_of_min_used = 0;
+   u64 bg_of_min_used = 0;
+   u64 flags;
+   u64 used;
+   int ret = 0;
+   int nr_data_bgs = 0;
+
+   fd = btrfs_open_dir(path, &dirstream, 1);
+   if (fd < 0)
+   return 1;
+
+   memset(&args, 0, sizeof(args));
+   sk = &args.key;
+
+   sk->tree_id = BTRFS_EXTENT_TREE_OBJECTID;
+   sk->min_objectid = sk->min_offset = sk->min_transid = 0;
+   sk->max_objectid = sk->max_offset = sk->max_transid = (u64)-1;
+   sk->max_type = sk->min_type = BTRFS_BLOCK_GROUP_ITEM_KEY;
+   sk->nr_items = 65536;


This search returns not only block group information, but also 
everything else. You're first retrieving the complete extent tree into 
userspace buffers...



+
+   while (1) {
+   ret = ioctl(fd, BTRFS_IOC_TREE_SEARCH, &args);
+   e = errno;
+   if (ret < 0) {
+   fprintf(stderr, "ret %d error '%s'\n", ret,
+   strerror(e));
+   return ret;
+   }
+   /*
+* it should not happen.
+*/
+   if (sk->nr_items == 0)
+   break;
+
+   off = 0;
+   for (i = 0; i < sk->nr_items; i++) {
+   header = (struct btrfs_ioctl_search_header *)(args.buf
+ + off);
+
+   off += sizeof(*header);
+   if (header->type == BTRFS_BLOCK_GROUP_ITEM_KEY) {


...and then just throwing 99.99% of the results away again. This is 
going to take a phenomenal amount of effort on a huge filesystem, 
copying unnecessary data around between the kernel and your program.


The first thing I learned myself when starting to play with the search 
ioctl is that the search doesn't happen in some kind of 3-dimensional 
space. You can't just filter on one type of object when walking the tree.


http://logs.tvrrug.org.uk/logs/%23btrfs/2016-02-13.html#2016-02-13T22:32:52

The sk->max_type = sk->min_type = BTRFS_BLOCK_GROUP_ITEM_KEY only makes 
the search space start somewhere halfway into objid 0 and end halfway into 
objid max, including all other possible values of the type field for all 
objids in between.



+   bg = (struct btrfs_block_group_item *)
+   (args.buf + off);
+   flags = btrfs_block_group_flags(bg);
+   if (flags & BTRFS_BLOCK_GROUP_DATA) {
+   nr_data_bgs++;
+   used = btrfs_block_group_used(bg);
+   printf(
+   "block_group %15llu (len %11llu used 
%11llu)\n",
+   header->objectid,
+   header->offset, used);
+   total_free += header->offset - used;
+   if (min_used >= used) {
+   min_used = used;
+   free_of_min_used =
+   header->offset - used;
+   bg_of_min_used =
+   header->objectid;
+   }
+   }
+   }
+
+   off += header->len;
+   sk->min_objectid = header->objectid;
+   sk->min_type = header->type;
+   sk->min_offset = header->offset;


When the following is a part of your extent tree...

key (289406976 EXTENT_ITEM 19193856) itemoff 15718 itemsize 53
extent refs 1 gen 11 flags DATA
extent data backref root 5 objectid 258 offset 0 count 1

key (289406976 BLOCK_GROUP_ITEM 1073741824) itemoff 15694 itemsize 24
block group used 24612864 chunk_objectid 256 flags DATA

...and when the extent_item just manages to squeeze in as the last result 
into the current result buffer from the ioctl...

...then your search key looks like (289406976 168 19193856) after 
copying the values from the last seen object...



+   }
+   sk->nr_items = 65536;
+
+ 

Re: Kernel crash on mount after SMR disk trouble

2016-06-10 Thread Henk Slager
On Sat, May 14, 2016 at 10:19 AM, Jukka Larja  wrote:
> In short:
>
> I added two 8TB Seagate Archive SMR disks to a btrfs pool and tried to delete
> one of the old disks. After some errors I ended up with a file system that can
> be mounted read-only, but crashes the kernel if mounted normally. I tried
> btrfs check --repair (which noted that the space cache needs to be zeroed) and
> zeroing the space cache (via mount parameter), but that didn't change anything.
>
> Longer version:
>
> I was originally running Debian Jessie with some pretty recent kernel (maybe
> 4.4), but somewhat older btrfs tools. After the trouble started, I tried

You should have at least kernel 4.4; the critical patch for supporting
this drive was added in 4.4-rc3 or 4.4-rc4, I don't remember exactly.
It might work only if you somehow disable NCQ completely in your Linux
system (kernel and more) or use a HW chipset/bridge that does that for
you.

> updating (now running Kernel 4.5.1 and tools 4.4.1). I checked the new disks
> with badblocks (no problems found), but based on some googling, Seagate's
> SMR disks seem to have various problems, so the root cause is probably one
> type or another of disk errors.

Seagate provides a special variant of the Linux ext4 filesystem that
should play well with their SMR drives. The advice is also not to use
this drive in an array setup; the risk is way too high that they can't
keep up with the demands of the higher layers and then get resets or
their FW crashes. You should also have had a look at your system's and
drive's timeouts (see scterc). To summarize: adding those drives to a
btrfs raid array is asking for trouble.

I am using one such drive with an Intel J1900 SoC (Atom, SATA2) and it
works, although I still get the typical error occasionally. As it is
just a btrfs receive target, just one fs (dup/dup/single) for the whole
drive, all CoW, it survives those lockups or crashes; I just restart
the board+drive. In general, reading back multi-TB ro snapshots works
fine and is on par with Gbps LAN speeds.

> Here's the output of btrfs fi show:
>
> Label: none  uuid: 8b65962d-0982-449b-ac6f-1acc8397ceb9
> Total devices 12 FS bytes used 13.15TiB
> devid1 size 3.64TiB used 3.36TiB path /dev/sde1
> devid2 size 3.64TiB used 3.36TiB path /dev/sdg1
> devid3 size 3.64TiB used 3.36TiB path /dev/sdh1
> devid4 size 3.64TiB used 3.34TiB path /dev/sdf1
> devid5 size 1.82TiB used 1.44TiB path /dev/sdi1
> devid6 size 1.82TiB used 1.54TiB path /dev/sdl1
> devid7 size 1.82TiB used 1.51TiB path /dev/sdk1
> devid8 size 1.82TiB used 1.54TiB path /dev/sdj1
> devid9 size 3.64TiB used 3.31TiB path /dev/sdb1
> devid   10 size 3.64TiB used 3.36TiB path /dev/sda1
> devid   11 size 7.28TiB used 168.00GiB path /dev/sdc1
> devid   12 size 7.28TiB used 168.00GiB path /dev/sdd1
>
> Last two devices (11 and 12) are the new disks. After adding them, I first
> copied some new data in (about 130 GBs), which seemed to go fine. Then I
> tried to remove disk 5. After some time (about 30 GiBs written to 11 and
> 12), there were some errors and disk 11 or 12 dropped out and fs went
> read-only. After some trouble-shooting (googling), I decided the new disks
> were too iffy to trust and tried to remove them.
>
> I don't remember exactly what errors I got, but device delete operation was
> interrupted due to errors at least once or twice, before more serious
> trouble began. In between the attempts I updated the HBA's (an LSI 9300)
> firmware. After final device delete attempt the end result was that
> attempting to mount causes kernel to crash. I then tried updating kernel and
> running check --repair, but that hasn't helped. Mounting read-only seems to
> work perfectly, but I haven't tried copying everything to /dev/null or
> anything like that (just few files).
>
> The log of the crash (it is very repeatable) can be seen here:
> http://jane.aarghimedes.fi/~jlarja/tempe/btrfs-trouble/btrfs_crash_log.txt
>
> Snipped from start of that:
>
> touko 12 06:41:22 jane kernel: BTRFS info (device sda1): disk space caching
> is enabled
> touko 12 06:41:24 jane kernel: BTRFS info (device sda1): bdev /dev/sdd1
> errs: wr 0, rd 0, flush 1, corrupt 0, gen 0
> touko 12 06:41:39 jane kernel: BUG: unable to handle kernel NULL pointer
> dereference at 01f0
> touko 12 06:41:39 jane kernel: IP: []
> can_overcommit+0x1e/0xf0 [btrfs]
> touko 12 06:41:39 jane kernel: PGD 0
> touko 12 06:41:39 jane kernel: Oops:  [#1] SMP
>
> My dmesg log is here:
> http://jane.aarghimedes.fi/~jlarja/tempe/btrfs-trouble/dmesg.log
>
> Other information:
> Linux jane 4.5.0-1-amd64 #1 SMP Debian 4.5.1-1 (2016-04-14) x86_64 GNU/Linux
> btrfs-progs v4.4.1
>
> btrfs fi df /mnt/Allosaurus/
> Data, RAID1: total=13.13TiB, used=13.07TiB
> Data, single: total=8.00MiB, used=0.00B
> System, RAID1: 

[GIT PULL] Btrfs

2016-06-10 Thread Chris Mason
Hi Linus

My for-linus-4.7 branch:

git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs.git 
for-linus-4.7

Has some fixes and some new self tests for btrfs.  The self tests are
usually disabled in the .config file (unless you're doing btrfs dev
work), and this bunch is meant to find problems with the 64K page
size patches.

Jeff has a patch to help people see if they are using the hardware
assist crc32c module, which really helps us nail down problems when
people ask why crcs are using so much CPU.

Otherwise, it's small fixes.

Feifei Xu (8) commits (+475/-361):
Btrfs: test_check_exists: Fix infinite loop when searching for free space 
entries (+2/-2)
Btrfs: self-tests: Execute page straddling test only when nodesize < 
PAGE_SIZE (+30/-19)
Btrfs: self-tests: Use macros instead of constants and add missing newline 
(+31/-18)
Btrfs: self-tests: Support testing all possible sectorsizes and nodesizes 
(+32/-22)
Btrfs: self-tests: Fix extent buffer bitmap test fail on BE system (+11/-1)
Btrfs: Fix integer overflow when calculating bytes_per_bitmap (+7/-7)
Btrfs: self-tests: Fix test_bitmaps fail on 64k sectorsize (+7/-1)
Btrfs: self-tests: Support non-4k page size (+355/-291)

Liu Bo (3) commits (+104/-15):
Btrfs: clear uptodate flags of pages in sys_array eb (+2/-0)
Btrfs: add validadtion checks for chunk loading (+67/-15)
Btrfs: add more validation checks for superblock (+35/-0)

Josef Bacik (1) commits (+1/-0):
Btrfs: end transaction if we abort when creating uuid root

Jeff Mahoney (1) commits (+9/-2):
btrfs: advertise which crc32c implementation is being used at module load

Vinson Lee (1) commits (+1/-1):
btrfs: Use __u64 in exported linux/btrfs.h.

Total: (14) commits (+590/-379)

 fs/btrfs/ctree.c   |   6 +-
 fs/btrfs/disk-io.c |  20 +-
 fs/btrfs/disk-io.h |   2 +-
 fs/btrfs/extent_io.c   |  10 +-
 fs/btrfs/extent_io.h   |   4 +-
 fs/btrfs/free-space-cache.c|  18 +-
 fs/btrfs/hash.c|   5 +
 fs/btrfs/hash.h|   1 +
 fs/btrfs/super.c   |  57 --
 fs/btrfs/tests/btrfs-tests.c   |   6 +-
 fs/btrfs/tests/btrfs-tests.h   |  27 +--
 fs/btrfs/tests/extent-buffer-tests.c   |  13 +-
 fs/btrfs/tests/extent-io-tests.c   |  86 ++---
 fs/btrfs/tests/free-space-tests.c  |  76 +---
 fs/btrfs/tests/free-space-tree-tests.c |  30 +--
 fs/btrfs/tests/inode-tests.c   | 344 ++---
 fs/btrfs/tests/qgroup-tests.c  | 111 ++-
 fs/btrfs/volumes.c | 109 +--
 include/uapi/linux/btrfs.h |   2 +-
 19 files changed, 569 insertions(+), 358 deletions(-)
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Allocator behaviour during device delete

2016-06-10 Thread Hans van Kranenburg

On 06/10/2016 09:26 PM, Henk Slager wrote:

On Thu, Jun 9, 2016 at 3:54 PM, Brendan Hide  wrote:


On 06/09/2016 03:07 PM, Austin S. Hemmelgarn wrote:


OK, I'm pretty sure I know what was going on in this case.  Your
assumption that device delete uses the balance code is correct, and
that is why you see what you're seeing.  There are two key bits that
are missing though:
1. Balance will never allocate chunks when it doesn't need to.


In relation to discussions w.r.t. enospc and device full of chunks, I
saw this statement 1., but I see different behavior with kernel 4.6.0,
tools 4.5.3.
On an idle fs with some fragmentation, I did balance -dusage=5; it
completes successfully and leaves a new empty chunk (highest vaddr).
Then balance -dusage=6 does 2 chunks with that usage level:
- the zero-filled last chunk is replaced with a new empty chunk (higher vaddr)
- the 2 usage=6 chunks are gone
- one chunk with the lowest vaddr saw its usage increase from 47 to 60
- several metadata chunks have changed slightly in usage


I noticed the same thing, kernel 4.5.4, progs 4.4.1.

When balance starts doing anything, (so relocating >= 1 chunks, not when 
relocating 0), it first creates a new empty chunk. Even if all data that 
is balanced away is added to already existing chunks, the new empty one 
is still always left behind.


When doing balance again with dusage=0, or repeatedly doing so, each 
time a new empty chunk is created, and then the previous empty one is 
removed, bumping up the start vaddr of the new chunk with 1GB each time.


--
Hans van Kranenburg - System / Network Engineer
T +31 (0)10 2760434 | hans.van.kranenb...@mendix.com | www.mendix.com


Re: Allocator behaviour during device delete

2016-06-10 Thread Henk Slager
On Thu, Jun 9, 2016 at 3:54 PM, Brendan Hide  wrote:
>
>
> On 06/09/2016 03:07 PM, Austin S. Hemmelgarn wrote:
>>
>> On 2016-06-09 08:34, Brendan Hide wrote:
>>>
>>> Hey, all
>>>
>>> I noticed this odd behaviour while migrating from a 1TB spindle to SSD
>>> (in this case on a LUKS-encrypted 200GB partition) - and am curious if
>>> this behaviour I've noted below is expected or known. I figure it is a
>>> bug. Depending on the situation, it *could* be severe. In my case it was
>>> simply annoying.
>>>
>>> ---
>>> Steps
>>>
>>> After having added the new device (btrfs dev add), I deleted the old
>>> device (btrfs dev del)
>>>
>>> Then, whilst waiting for that to complete, I started a watch of "btrfs
>>> fi show /". Note that the below is very close to the output at the time
>>> - but is not actually copy/pasted from the output.
>>>
 Label: 'tricky-root'  uuid: bcbe47a5-bd3f-497a-816b-decb4f822c42
 Total devices 2 FS bytes used 115.03GiB
 devid1 size 0.00GiB used 298.06GiB path /dev/sda2
 devid2 size 200.88GiB used 0.00GiB path
 /dev/mapper/cryptroot
>>>
>>>
>>>
>>> devid1 is the old disk while devid2 is the new SSD
>>>
>>> After a few minutes, I saw that the numbers have changed - but that the
>>> SSD still had no data:
>>>
 Label: 'tricky-root'  uuid: bcbe47a5-bd3f-497a-816b-decb4f822c42
 Total devices 2 FS bytes used 115.03GiB
 devid1 size 0.00GiB used 284.06GiB path /dev/sda2
 devid2 size 200.88GiB used 0.00GiB path
 /dev/mapper/cryptroot
>>>
>>>
>>> The "FS bytes used" amount was changing a lot - but mostly stayed near
>>> the original total, which is expected since there was very little
>>> happening other than the "migration".
>>>
>>> I'm not certain of the exact point where it started using the new disk's
>>> space. I figure that may have been helpful to pinpoint. :-/
>>
>> OK, I'm pretty sure I know what was going on in this case.  Your
>> assumption that device delete uses the balance code is correct, and
>> that is why you see what you're seeing.  There are two key bits that
>> are missing though:
>> 1. Balance will never allocate chunks when it doesn't need to.

In relation to discussions w.r.t. enospc and device full of chunks, I
saw this statement 1., but I see different behavior with kernel 4.6.0,
tools 4.5.3.
On an idle fs with some fragmentation, I did balance -dusage=5; it
completes successfully and leaves a new empty chunk (highest vaddr).
Then balance -dusage=6 does 2 chunks with that usage level:
- the zero-filled last chunk is replaced with a new empty chunk (higher vaddr)
- the 2 usage=6 chunks are gone
- one chunk with the lowest vaddr saw its usage increase from 47 to 60
- several metadata chunks have changed slightly in usage

It could be a 2-step datamove, but from just the states before and
after balance I can't prove that.

>> 2. The space usage listed in fi show is how much space is allocated to
>> chunks, not how much is used in those chunks.
>>
>> In this case, based on what you've said, you had a lot of empty or
>> mostly empty chunks.  As a result of this, the device delete was both
>> copying data, and consolidating free space.  If you have a lot of empty
>> or mostly empty chunks, it's not unusual for a device delete to look
>> like this until you start hitting chunks that have actual data in them.
>> The primary point of this behavior is that it makes it possible to
>> directly switch to a smaller device without having to run a balance and
>> then a resize before replacing the device, and then resize again
>> afterwards.
>
>
> Thanks, Austin. Your explanation is along the lines of my thinking though.
>
> The new disk should have had *some* data written to it at that point, as it
> started out at over 600GiB in allocation (should have probably mentioned
> that already). Consolidating or not, I would consider data being written to
> the old disk to be a bug, even if it is considered minor.
>
> I'll set up a reproducible test later today to prove/disprove the theory. :)
>
> --
> __
> Brendan Hide
> http://swiftspirit.co.za/
> http://www.webafrica.co.za/?AFF1E97


Re: Replacing drives with larger ones in a 4 drive raid1

2016-06-10 Thread Jukka Larja

This is somewhat off topic but...

9.6.2016, 18.20, Duncan kirjoitti:


Are those the 8 TB SMR "archive" drives?

I haven't been following the issue very closely, but be aware that there
were serious issues with those drives a few kernels back, and that while
those issues are now fixed, the drives themselves operate rather
differently than normal drives, and simply don't work well in normal
usage.


Either the issues were not fixed or LSI Logic / Symbios Logic SAS3008 is 
incompatible with the drives (and an older model of theirs, which I don't 
have anymore) as well as Intel Corporation 8 Series/C220 Series Chipset 
Family 6-port SATA Controller 1 [AHCI mode] (rev 05).


I haven't been able to get the disks to fail with any other load but Btrfs. 
However, with that they fail spectacularly. They drop out and make enough 
mess to corrupt things beyond repair. (See 
https://www.spinics.net/lists/linux-btrfs/msg55218.html for more info.)


There's a slight chance that I missed some relevant kernel update. When I 
get new disks and can get the array fixed (it still only mounts read-only), 
I'll do some testing with the SMR drives. If they work, that's great, but at 
the moment I wouldn't buy them for Btrfs use even if the workload or 
environmental characteristics weren't a problem.


--
 ...Elämälle vierasta toimintaa...
Jukka Larja, roskak...@aarghimedes.fi

"Are we feeling better then?"
"I'm naming all the stars."
"You can't see the stars, love. That's the ceiling. Also, it's day."
"I can see them. But I've named them all the same name, and there's terrible 
confusion..."

- Spike & Drusilla, Buffy the Vampire Slayer -



Cannot balance FS (No space left on device)

2016-06-10 Thread ojab //
[Please CC me since I'm not subscribed to the list]
Hi,
I've tried to `/usr/bin/btrfs fi defragment -r` my btrfs partition,
but it failed w/ "No space left on device" and now I can't get any
free space on that partition (deleting some files or adding a new device
doesn't help). During defrag I used the `space_cache=v2` mount option,
but have remounted the FS w/ the `clear_cache` flag since then. Also I've
deleted about 50GB of files and added a new 250GB disk since then:

>$ df -h /mnt/xxx/
>Filesystem  Size  Used Avail Use% Mounted on
>/dev/sdc1   2,1T  1,8T   37G  99% /mnt/xxx
>$ sudo /usr/bin/btrfs fi show
>Label: none  uuid: 8a65465d-1a8c-4f80-abc6-c818c38567c3
>Total devices 3 FS bytes used 1.78TiB
>devid1 size 931.51GiB used 931.51GiB path /dev/sdc1
>devid2 size 931.51GiB used 931.51GiB path /dev/sdb1
>devid3 size 230.41GiB used 0.00B path /dev/sdd1
>$ sudo /usr/bin/btrfs fi usage /mnt/xxx/
>Overall:
>Device size:   2.04TiB
>Device allocated:  1.82TiB
>Device unallocated:230.41GiB
>Device missing:0.00B
>Used:  1.78TiB
>Free (estimated):  267.23GiB  (min: 152.03GiB)
>Data ratio:1.00
>Metadata ratio:2.00
>Global reserve:512.00MiB  (used: 0.00B)
>
>Data,RAID0: Size:1.81TiB, Used:1.78TiB
>   /dev/sdb1   928.48GiB
>   /dev/sdc1   928.48GiB
>
>Metadata,RAID1: Size:3.00GiB, Used:2.30GiB
>   /dev/sdb1   3.00GiB
>   /dev/sdc1   3.00GiB
>
>System,RAID1: Size:32.00MiB, Used:176.00KiB
>   /dev/sdb132.00MiB
>   /dev/sdc132.00MiB
>
>Unallocated:
>   /dev/sdb1   1.01MiB
>   /dev/sdc1   1.00MiB
>   /dev/sdd1   230.41GiB
>$ sudo /usr/bin/btrfs balance start -dusage=66 /mnt/xxx/
>Done, had to relocate 0 out of 935 chunks
>$ sudo /usr/bin/btrfs balance start -dusage=67 /mnt/xxx/
>ERROR: error during balancing '/mnt/xxx/': No space left on device
>There may be more info in syslog - try dmesg | tail

I assume that there is something wrong with the metadata, since I can
still copy files to the FS.
I'm on a 4.6.2 vanilla kernel and using btrfs-progs-4.6; btrfs-debugfs
output can be found here:
https://gist.githubusercontent.com/ojab/1a8b1f83341403a169a8e66995c7c3da/raw/61621d22f706d7543a93a3d005415543af9a0db0/gistfile1.txt.
Any hints on what else I can try to fix the issue?

//wbr ojab


Re: [PATCH] Btrfs-progs: add check-only option for balance

2016-06-10 Thread Goffredo Baroncelli
Hi all,

On 2016-06-09 23:46, Ashish Samant wrote:
> From: Liu Bo 
> 
> This aims to decide whether a balance can reduce the number of
> data block groups and if it can, this shows the '-dvrange' block
> group's objectid.
> 
> With this, you can run
> 'btrfs balance start -c mnt' or 'btrfs balance start --check-only mnt'
> 
>  --
> $ btrfs balance start -c /mnt/btrfs
> Checking data block groups...
> block_group12582912 (len 8388608 used  786432)
> block_group  1103101952 (len  1073741824 used   536870912)
> block_group  2176843776 (len  1073741824 used  1073741824)
> total bgs 3 total_free 544473088 min_used bg 12582912 has (min_used 786432 free 7602176)
> run 'btrfs balance start -dvrange=12582912..12582913 your_mnt'
> 
> $ btrfs balance start -dvrange=12582912..12582913 /mnt/btrfs
> Done, had to relocate 1 out of 5 chunks
> 
> $ btrfs balance start -c /mnt/btrfs
> Checking data block groups...
> block_group  1103101952 (len  1073741824 used   537395200)
> block_group  2176843776 (len  1073741824 used  1073741824)
> total bgs 2 total_free 536346624 min_used bg 1103101952 has (min_used 537395200 free 536346624)
>  --
> 
> So you now know how to babysit your btrfs in a smart way.

I think that it is an excellent tool. However, I have some suggestions, most 
of them from a user interface POV:

1) this should be a real command; it doesn't make sense at all that this 
command is a "sub command" of "btrfs bal start". I have two suggestions about 
that:
a) we could add a new sub-command to the "balance" family, something like 
"btrfs bal analysis", where we could put some suggestions for a good balance
b) we could add a new sub-command to the "inspect" family. We could also add 
some features, like showing the other block_group types (system and metadata) 
and printing their profile: i.e.

  # btrfs inspect block-group-analysis /
  Type   Mode  Start LenUsed
  Data   single  83806388224 1.00GiB   945.64MiB
  Data   single  84880130048 1.00GiB   890.60MiB
  Data   single  85953871872 1.00GiB   818.18MiB
  Data   single  87027613696 1.00GiB   835.58MiB
  Data   single  88101355520 1.00GiB  1023.91MiB
  System single  8917509734432.00MiB16.00KiB
  Metadata   single  89208651776 1.00GiB   614.88MiB
  [...]

further options could be added like showing only the most empty chunks, sorting 
by the Used value, filtering by type and/or profile

2) From a readability POV, I suggest using the pretty_size() function to 
display a more readable "len" and "used".
3) For the same reason, I suggest switching to a "tabular" format, like in my 
example: it doesn't make sense to write "block_group/len/used" on every line...
4) when the usual balance command fails because of ENOSPC, we could suggest 
using this new command

more notes below

BR
G.Baroncelli

> 
> Signed-off-by: Liu Bo 
> Signed-off-by: Ashish Samant 
> ---
>  cmds-balance.c |  127 +++-
>  1 files changed, 126 insertions(+), 1 deletions(-)
> 
> diff --git a/cmds-balance.c b/cmds-balance.c
> index 8f3bf5b..e2aab6c 100644
> --- a/cmds-balance.c
> +++ b/cmds-balance.c
> @@ -493,6 +493,116 @@ out:
>   return ret;
>  }
>  
> +/* return 0 if balance can remove a data block group, otherwise return 1 */
> +static int search_data_bgs(const char *path)
> +{
> + struct btrfs_ioctl_search_args args;
> + struct btrfs_ioctl_search_key *sk;
> + struct btrfs_ioctl_search_header *header;
> + struct btrfs_block_group_item *bg;
> + unsigned long off = 0;
> + DIR *dirstream = NULL;
> + int e;
> + int fd;
> + int i;
> + u64 total_free = 0;
> + u64 min_used = (u64)-1;
> + u64 free_of_min_used = 0;
> + u64 bg_of_min_used = 0;
> + u64 flags;
> + u64 used;
> + int ret = 0;
> + int nr_data_bgs = 0;
> +
> + fd = btrfs_open_dir(path, &dirstream, 1);
> + if (fd < 0)
> + return 1;
> +
> + memset(&args, 0, sizeof(args));
> + sk = &args.key;
> +
> + sk->tree_id = BTRFS_EXTENT_TREE_OBJECTID;
> + sk->min_objectid = sk->min_offset = sk->min_transid = 0;
> + sk->max_objectid = sk->max_offset = sk->max_transid = (u64)-1;
> + sk->max_type = sk->min_type = BTRFS_BLOCK_GROUP_ITEM_KEY;
> + sk->nr_items = 65536;
> +
> + while (1) {
> + ret = ioctl(fd, BTRFS_IOC_TREE_SEARCH, &args);
> + e = errno;
> + if (ret < 0) {
> + fprintf(stderr, "ret %d error '%s'\n", ret,
> + strerror(e));
> + return ret;
> + }
> + /*
> +  * it should not happen.
> +  */
> 

Re: fsck: to repair or not to repair

2016-06-10 Thread Henk Slager
On Fri, Jun 10, 2016 at 7:22 PM, Adam Borowski  wrote:
> On Fri, Jun 10, 2016 at 01:12:42PM -0400, Austin S. Hemmelgarn wrote:
>> On 2016-06-10 12:50, Adam Borowski wrote:
>> >And, as of coreutils 8.25, the default is no reflink, with "never" not being
>> >recognized even as a way to avoid an alias.  As far as I remember, this
>> >applies to every past version with support for reflinks too.
>> >
>> Odd, I could have sworn that was an option...
>>
>> And I do know there was talk at least at one point of adding it and
>> switching to reflink=auto by default.
>
> Yes please!
>
> It's hard to come up with a good reason for not reflinking when it's possible
> -- the only one I see is if you have a nocow VM and want to slightly improve
> speed at a cost of lots of disk space.  And even then, there's cat a >b for
> that.

For a nocow VM imagefile, reflink anyhow does not work, so cp
--reflink=auto would then just duplicate the whole thing, effectively
doing a 'cp --reflink=never' (the way 'never' works for --sparse),
either silently or with a warning/note.

For a cow VM imagefile, the only thing I do and want w.r.t. cp is
reflink=always, so I also vote for auto on by default.

If you want to 'defrag' a VM imagefile, using cat or dd and enough RAM
does a better and faster job than cp or btrfs manual defrag.

> And the cost on non-btrfs non-unmerged-xfs is a single syscall per file,
> that's utterly negligible compared to actually copying the data.
>
> --
> An imaginary friend squared is a real enemy.


Re: fsck: to repair or not to repair

2016-06-10 Thread Austin S. Hemmelgarn

On 2016-06-10 13:22, Adam Borowski wrote:

On Fri, Jun 10, 2016 at 01:12:42PM -0400, Austin S. Hemmelgarn wrote:

On 2016-06-10 12:50, Adam Borowski wrote:

And, as of coreutils 8.25, the default is no reflink, with "never" not being
recognized even as a way to avoid an alias.  As far as I remember, this
applies to every past version with support for reflinks too.


Odd, I could have sworn that was an option...

And I do know there was talk at least at one point of adding it and
switching to reflink=auto by default.


Yes please!

It's hard to come up with a good reason for not reflinking when it's possible
-- the only one I see is if you have a nocow VM and want to slightly improve
speed at a cost of lots of disk space.  And even then, there's cat a >b for
that.
There are other arguments, the most common one being not changing 
user-visible behavior.  There are (misguided) people who expect copying 
a file to mean you have two distinct copies of that file.


OTOH, it's not too hard to set up a system to do this, you just put:
alias cp='cp --reflink=auto'
into your bashrc (or something similar into whatever other shell you 
use).  I've been doing this since cp added support for it.


And the cost on non-btrfs non-unmerged-xfs is a single syscall per file,
that's utterly negligible compared to actually copying the data.
Actually, IIRC, it's an ioctl, not a syscall, which can be kind of 
expensive (I don't know how much more expensive, but ioctls are usually 
more expensive than syscalls).


Other things to keep in mind though that may impact this (either way):
1. There are other filesystems that support reflinks (OCFS2 and ZFS come 
immediately to mind).
2. Most of the filesystems that support reflinks are used more in 
enterprise situations, where the bit about not changing user visible 
behavior is a much stronger argument.
3. Even in enterprise situations, reflink capable filesystems are still 
unusual outside of petabyte scale data storage.
4. Last I checked, the most widely used filesystem that supports 
reflinks (ZFS) uses a different ioctl interface for them than most other 
Linux filesystems, which means more checking is needed than just calling 
one ioctl.



Re: fsck: to repair or not to repair

2016-06-10 Thread Adam Borowski
On Fri, Jun 10, 2016 at 01:12:42PM -0400, Austin S. Hemmelgarn wrote:
> On 2016-06-10 12:50, Adam Borowski wrote:
> >And, as of coreutils 8.25, the default is no reflink, with "never" not being
> >recognized even as a way to avoid an alias.  As far as I remember, this
> >applies to every past version with support for reflinks too.
> >
> Odd, I could have sworn that was an option...
> 
> And I do know there was talk at least at one point of adding it and
> switching to reflink=auto by default.

Yes please!

It's hard to come up with a good reason for not reflinking when it's possible
-- the only one I see is if you have a nocow VM and want to slightly improve
speed at a cost of lots of disk space.  And even then, there's cat a >b for
that.

And the cost on non-btrfs non-unmerged-xfs is a single syscall per file,
that's utterly negligible compared to actually copying the data.

-- 
An imaginary friend squared is a real enemy.


Re: fsck: to repair or not to repair

2016-06-10 Thread Austin S. Hemmelgarn

On 2016-06-10 12:50, Adam Borowski wrote:

On Fri, Jun 10, 2016 at 08:54:36AM -0700, Nikolaus Rath wrote:

On Jun 10 2016, "Austin S. Hemmelgarn"  wrote:

JFYI, if you've using GNU cp, you can pass '--reflink=never' to avoid
it making reflinks.


I would have expected so, but at least in coreutils 8.23 the only valid
options are "never" and "auto" (at least according to cp --help and the
manpage).


Where do you get "never" from?

.--
cp: invalid argument ‘never’ for ‘--reflink’
Valid arguments are:
  - ‘auto’
  - ‘always’
Try 'cp --help' for more information.
`

And, as of coreutils 8.25, the default is no reflink, with "never" not being
recognized even as a way to avoid an alias.  As far as I remember, this
applies to every past version with support for reflinks too.


Odd, I could have sworn that was an option...

And I do know there was talk at least at one point of adding it and 
switching to reflink=auto by default.



Re: btrfs filesystem keeps allocating new chunks for no apparent reason

2016-06-10 Thread Henk Slager
On Thu, Jun 9, 2016 at 5:41 PM, Duncan <1i5t5.dun...@cox.net> wrote:
> Hans van Kranenburg posted on Thu, 09 Jun 2016 01:10:46 +0200 as
> excerpted:
>
>> The next question is what files these extents belong to. To find out, I
>> need to open up the extent items I get back and follow a backreference
>> to an inode object. Might do that tomorrow, fun.
>>
>> To be honest, I suspect /var/log and/or the file storage of mailman to
>> be the cause of the fragmentation, since there's logging from postfix,
>> mailman and nginx going on all day long in a slow but steady tempo.
>> While using btrfs for a number of use cases at work now, we normally
>> don't use it for the root filesystem. And the cases where it's used as
>> root filesystem don't do much logging or mail.
>
> FWIW, that's one reason I have a dedicated partition (and filesystem) for
> logs, here.  (The other reason is that should something go runaway log-
> spewing, I get a warning much sooner when my log filesystem fills up, not
> much later, with much worse implications, when the main filesystem fills
> up!)
>
>> And no, autodefrag is not in the mount options currently. Would that be
>> helpful in this case?
>
> It should be helpful, yes.  Be aware that autodefrag works best with
> smaller (sub-half-gig) files, however, and that it used to cause
> performance issues with larger database and VM files, in particular.

I don't know why you relate filesize and autodefrag. Maybe because you
say '... used to cause ...'.

autodefrag detects random writes and then tries to defrag a certain
range. Its scope size is 256K as far as I can see from the code, and over
time you see VM images that are on a btrfs fs (CoW, hourly ro
snapshots) ending up with a lot of 256K (or slightly smaller) extents
according to what filefrag reports. I once wanted to try changing the
256K to 1M or even 4M, but I haven't gotten around to that.
A 32G VM image would consist of 131072 extents at 256K, 32768 extents
at 1M, 8192 extents at 4M.

> There used to be a warning on the wiki about that, that was recently
> removed, so apparently it's not the issue that it was, but you might wish
> to monitor any databases or VMs with gig-plus files to see if it's going
> to be a performance issue, once you turn on autodefrag.

For very active databases, I don't know what the effects are, with or
without autodefrag (either on SSD and/or HDD).
At least on HDD-only, so no persistent SSD caching and no autodefrag,
VMs will soon show unacceptable performance.

> The other issue with autodefrag is that if it hasn't been on and things
> are heavily fragmented, it can at first drive down performance as it
> rewrites all these heavily fragmented files, until it catches up and is
> mostly dealing only with the normal refragmentation load.

I assume you mean that one only gets a performance drop if one
actually does new writes to the fragmented files after turning
autodefrag on. It shouldn't start defragging by itself, AFAIK.

> Of course the
> best way around that is to run autodefrag from the first time you mount
> the filesystem and start writing to it, so it never gets overly
> fragmented in the first place.  For a currently in-use and highly
> fragmented filesystem, you have two choices, either backup and do a fresh
> mkfs.btrfs so you can start with a clean filesystem and autodefrag from
> the beginning, or doing manual defrag.
>
> However, be aware that if you have snapshots locking down the old extents
> in their fragmented form, a manual defrag will copy the data to new
> extents without releasing the old ones as they're locked in place by the
> snapshots, thus using additional space.  Worse, if the filesystem is
> already heavily fragmented and snapshots are locking most of those
> fragments in place, defrag likely won't help a lot, because the free
> space as well will be heavily fragmented.   So starting off with a clean
> and new filesystem and using autodefrag from the beginning really is your
> best bet.

If it is about a multi-TB fs, I think the most important thing is to have
enough unfragmented free space available, preferably at the beginning of
the device if it is a plain HDD. Maybe a balance -ddrange=1M..<20% of
device> can do that; I haven't tried.
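
For illustration, that untried idea could look like the following sketch.
The device size, mount point and the 20% figure are placeholders, and the
command is only printed, not run, since whether drange actually packs data
toward the start of the device is exactly the untested part:

```shell
#!/bin/sh
# Sketch: rebalance only data chunks in roughly the first 20% of the
# device. DEV_BYTES would normally come from
# `blockdev --getsize64 /dev/sdX` (placeholder device).
DEV_BYTES=$((8 * 1024 * 1024 * 1024 * 1024))  # pretend an 8 TiB disk
START=$((1024 * 1024))                        # 1M, as in the mail
END=$((DEV_BYTES / 5))                        # ~20% of the device
# Echoed instead of executed -- this is a dry sketch only.
echo "btrfs balance start -ddrange=${START}..${END} /mnt/pool"
```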
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: fsck: to repair or not to repair

2016-06-10 Thread Nikolaus Rath
On Jun 10 2016, Adam Borowski  wrote:
> On Fri, Jun 10, 2016 at 08:54:36AM -0700, Nikolaus Rath wrote:
>> On Jun 10 2016, "Austin S. Hemmelgarn"  wrote:
>> > JFYI, if you're using GNU cp, you can pass '--reflink=never' to avoid
>> > it making reflinks.
>> 
>> I would have expected so, but at least in coreutils 8.23 the only valid
>> options are "never" and "auto" (at least according to cp --help and the
>> manpage).
>
> Where do you get "never" from?

I meant to write "always" (as in my second mail; I thought I had hit
"cancel" quickly enough).


Best,
-Nikolaus

-- 
GPG encrypted emails preferred. Key id: 0xD113FCAC3C4E599F
Fingerprint: ED31 791B 2C5C 1613 AF38 8B8A D113 FCAC 3C4E 599F

 »Time flies like an arrow, fruit flies like a Banana.«


Re: fsck: to repair or not to repair

2016-06-10 Thread Adam Borowski
On Fri, Jun 10, 2016 at 08:54:36AM -0700, Nikolaus Rath wrote:
> On Jun 10 2016, "Austin S. Hemmelgarn"  wrote:
> > JFYI, if you're using GNU cp, you can pass '--reflink=never' to avoid
> > it making reflinks.
> 
> I would have expected so, but at least in coreutils 8.23 the only valid
> options are "never" and "auto" (at least according to cp --help and the
> manpage).

Where do you get "never" from?

.--
cp: invalid argument ‘never’ for ‘--reflink’
Valid arguments are:
  - ‘auto’
  - ‘always’
Try 'cp --help' for more information.
`

And, as of coreutils 8.25, the default is no reflink, with "never" not being
recognized even as a way to avoid an alias.  As far as I remember, this
applies to every past version with support for reflinks too.
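
A quick way to see what the local cp accepts (a throwaway sketch; on the
coreutils releases discussed in this thread only 'auto' and 'always' are
valid, though later releases may behave differently, so the 'never' probe
reports rather than assumes the outcome):

```shell
#!/bin/sh
# Probe GNU cp's --reflink handling with throwaway files.
set -e
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
echo hello > "$tmp/src"

# 'auto' falls back to a normal copy on filesystems without reflink support
cp --reflink=auto "$tmp/src" "$tmp/dst"
cmp -s "$tmp/src" "$tmp/dst" && echo "auto: ok"

# 'never' was rejected by the coreutils versions discussed here
if cp --reflink=never "$tmp/src" "$tmp/dst2" 2>/dev/null; then
    echo "never: accepted by this cp"
else
    echo "never: rejected by this cp"
fi
```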

-- 
An imaginary friend squared is a real enemy.


Re: [PATCH 2/2] btrfs: prefix fsid to all trace events

2016-06-10 Thread Liu Bo
On Thu, Jun 09, 2016 at 07:48:01PM -0400, je...@suse.com wrote:
> From: Jeff Mahoney 
> 
> When using trace events to debug a problem, it's impossible to determine
> which file system generated a particular event.  This patch adds a
> macro to prefix standard information to the head of a trace event.
> 
> The extent_state alloc/free events are all that's left without an
> fs_info available.

Looks good to me.

Reviewed-by: Liu Bo 

Thanks,

-liubo

> 
> Signed-off-by: Jeff Mahoney 
> ---
>  fs/btrfs/delayed-ref.c   |   9 +-
>  fs/btrfs/extent-tree.c   |  10 +-
>  fs/btrfs/qgroup.c|  19 +--
>  fs/btrfs/qgroup.h|   9 +-
>  fs/btrfs/super.c |   2 +-
>  include/trace/events/btrfs.h | 282 ---
>  6 files changed, 182 insertions(+), 149 deletions(-)
> 
> diff --git a/fs/btrfs/delayed-ref.c b/fs/btrfs/delayed-ref.c
> index 430b368..e7b1ec0 100644
> --- a/fs/btrfs/delayed-ref.c
> +++ b/fs/btrfs/delayed-ref.c
> @@ -606,7 +606,8 @@ add_delayed_ref_head(struct btrfs_fs_info *fs_info,
>   qrecord->num_bytes = num_bytes;
>   qrecord->old_roots = NULL;
>  
> - qexisting = btrfs_qgroup_insert_dirty_extent(delayed_refs,
> + qexisting = btrfs_qgroup_insert_dirty_extent(fs_info,
> +  delayed_refs,
>qrecord);
>   if (qexisting)
>   kfree(qrecord);
> @@ -615,7 +616,7 @@ add_delayed_ref_head(struct btrfs_fs_info *fs_info,
>   spin_lock_init(&head_ref->lock);
>   mutex_init(&head_ref->mutex);
>  
> - trace_add_delayed_ref_head(ref, head_ref, action);
> + trace_add_delayed_ref_head(fs_info, ref, head_ref, action);
>  
>   existing = htree_insert(&delayed_refs->href_root,
>   &head_ref->href_node);
> @@ -682,7 +683,7 @@ add_delayed_tree_ref(struct btrfs_fs_info *fs_info,
>   ref->type = BTRFS_TREE_BLOCK_REF_KEY;
>   full_ref->level = level;
>  
> - trace_add_delayed_tree_ref(ref, full_ref, action);
> + trace_add_delayed_tree_ref(fs_info, ref, full_ref, action);
>  
>   ret = add_delayed_ref_tail_merge(trans, delayed_refs, head_ref, ref);
>  
> @@ -739,7 +740,7 @@ add_delayed_data_ref(struct btrfs_fs_info *fs_info,
>   full_ref->objectid = owner;
>   full_ref->offset = offset;
>  
> - trace_add_delayed_data_ref(ref, full_ref, action);
> + trace_add_delayed_data_ref(fs_info, ref, full_ref, action);
>  
>   ret = add_delayed_ref_tail_merge(trans, delayed_refs, head_ref, ref);
>  
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index 689d25a..ecb68bb 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -2194,7 +2194,7 @@ static int run_delayed_data_ref(struct btrfs_trans_handle *trans,
>   ins.type = BTRFS_EXTENT_ITEM_KEY;
>  
>   ref = btrfs_delayed_node_to_data_ref(node);
> - trace_run_delayed_data_ref(node, ref, node->action);
> + trace_run_delayed_data_ref(root->fs_info, node, ref, node->action);
>  
>   if (node->type == BTRFS_SHARED_DATA_REF_KEY)
>   parent = ref->parent;
> @@ -2349,7 +2349,7 @@ static int run_delayed_tree_ref(struct btrfs_trans_handle *trans,
>SKINNY_METADATA);
>  
>   ref = btrfs_delayed_node_to_tree_ref(node);
> - trace_run_delayed_tree_ref(node, ref, node->action);
> + trace_run_delayed_tree_ref(root->fs_info, node, ref, node->action);
>  
>   if (node->type == BTRFS_SHARED_BLOCK_REF_KEY)
>   parent = ref->parent;
> @@ -2413,7 +2413,8 @@ static int run_one_delayed_ref(struct btrfs_trans_handle *trans,
>*/
>   BUG_ON(extent_op);
>   head = btrfs_delayed_node_to_head(node);
> - trace_run_delayed_ref_head(node, head, node->action);
> + trace_run_delayed_ref_head(root->fs_info, node, head,
> +node->action);
>  
>   if (insert_reserved) {
>   btrfs_pin_extent(root, node->bytenr,
> @@ -8316,7 +8317,8 @@ static int record_one_subtree_extent(struct btrfs_trans_handle *trans,
>  
>   delayed_refs = &trans->transaction->delayed_refs;
>   spin_lock(&delayed_refs->lock);
> - if (btrfs_qgroup_insert_dirty_extent(delayed_refs, qrecord))
> + if (btrfs_qgroup_insert_dirty_extent(trans->root->fs_info,
> +  delayed_refs, qrecord))
>   kfree(qrecord);
>   spin_unlock(&delayed_refs->lock);
>  
> diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
> index 9d4c05b..13e28d8 100644
> --- a/fs/btrfs/qgroup.c
> +++ b/fs/btrfs/qgroup.c
> @@ -1453,9 +1453,10 @@ int btrfs_qgroup_prepare_account_extents(struct btrfs_trans_handle *trans,
>   return ret;
>  }
>  
> -struct 

Re: fsck: to repair or not to repair

2016-06-10 Thread Nikolaus Rath
On Jun 10 2016, "Austin S. Hemmelgarn"  wrote:
> JFYI, if you're using GNU cp, you can pass '--reflink=never' to avoid
> it making reflinks.

I would have expected so, but at least in coreutils 8.23 the only valid
options are "always" and "auto" (at least according to cp --help and the
manpage).

Best,
-Nikolaus

-- 
GPG encrypted emails preferred. Key id: 0xD113FCAC3C4E599F
Fingerprint: ED31 791B 2C5C 1613 AF38 8B8A D113 FCAC 3C4E 599F

 »Time flies like an arrow, fruit flies like a Banana.«


Re: fsck: to repair or not to repair

2016-06-10 Thread Nikolaus Rath
On Jun 10 2016, "Austin S. Hemmelgarn"  wrote:
> JFYI, if you're using GNU cp, you can pass '--reflink=never' to avoid
> it making reflinks.

I would have expected so, but at least in coreutils 8.23 the only valid
options are "never" and "auto" (at least according to cp --help and the
manpage).

Best,
-Nikolaus

-- 
GPG encrypted emails preferred. Key id: 0xD113FCAC3C4E599F
Fingerprint: ED31 791B 2C5C 1613 AF38 8B8A D113 FCAC 3C4E 599F

 »Time flies like an arrow, fruit flies like a Banana.«


[PATCH] btrfs-progs: doc: add missing newline in btrfs-convert

2016-06-10 Thread Noah Massey
Signed-off-by: Noah Massey 
---
 Documentation/btrfs-convert.asciidoc | 1 +
 1 file changed, 1 insertion(+)

diff --git a/Documentation/btrfs-convert.asciidoc b/Documentation/btrfs-convert.asciidoc
index 28f9a39..ab3577d 100644
--- a/Documentation/btrfs-convert.asciidoc
+++ b/Documentation/btrfs-convert.asciidoc
@@ -90,6 +90,7 @@ are supported by old kernels. To disable a feature, prefix it with '^'.
 To see all available features that btrfs-convert supports run:
 +
 +btrfs-convert -O list-all+
++
 -p|--progress::
 show progress of conversion, on by default
 --no-progress::
-- 
2.8.1



Btrfs progs release 4.6

2016-06-10 Thread David Sterba
Hi,

btrfs-progs 4.6 has been released (no change since rc1). The
biggest change is the btrfs-convert rewrite. The release was delayed
by extra testing, as there were some late fixes to the code even
though the patchset had been in the development branch for a long
time.

Apart from that, usual load of small fixes and improvements.

* convert - major rewrite:
  * fix a long-standing bug that led to mixing data blocks into metadata block
groups
  * the workaround was to do a full balance after conversion, which was
recommended practice anyway
  * explicitly set the lowest supported version of e2fsprogs to 1.41
* provide and install a udev rules file that addresses problems with device
  mapper devices being renamed after removal
* send: new option: quiet
* dev usage: report slack space (device size minus filesystem area on the dev)
* image: support DUP
* build: short options to enable debugging builds
* other:
  * code cleanups
  * build fixes
  * more tests and other enhancements

Tarballs: https://www.kernel.org/pub/linux/kernel/people/kdave/btrfs-progs/
Git: git://git.kernel.org/pub/scm/linux/kernel/git/kdave/btrfs-progs.git

Shortlog:

Anand Jain (2):
  btrfs-progs: makefile: add clean-all to the usage
  btrfs-progs: clean up commands.h

David Sterba (22):
  btrfs-progs: build: add support for debugging builds
  btrfs-progs: docs: compression is disabled with nodatasum/nodatacow
  btrfs-progs: device usage: report slack space
  btrfs-progs: makefile: add target for testing installation
  btrfs-progs: drop O_CREATE from open_ctree_fs_info
  btrfs-progs: fix type mismatch in backtrace dumping functions
  btrfs-progs: switch to common message helpers in utils.c
  btrfs-progs: tests: convert, run md5sum with sudo helper
  btrfs-progs: tests: run rollback after conversion
  btrfs-progs: tests: convert: dump all superblocks after conversion
  btrfs-progs: tests: document cli-tests in readme
  btrfs-progs: use wider int type in btrfs_min_global_blk_rsv_size
  btrfs-progs: tests: move convert helpers to a separate file
  btrfs-progs: tests: convert: separate ext2 tests
  btrfs-progs: tests: convert: separate ext3 tests
  btrfs-progs: tests: convert: separate ext4 tests
  btrfs-progs: tests: clean up the test driver of convert tests
  btrfs-progs: tests: convert: set common variables
  btrfs-progs: tests: unify test drivers
  btrfs-progs: tests: 004-ext2-backup-superblock-ranges, drop unnecessary root privs
  btrfs-progs: tests: 004-ext2-backup-superblock-ranges, use common helpers for image loop
  Btrfs progs v4.6

Jeff Mahoney (1):
  btrfs-progs: udev: add rules for dm devices

Lu Fengqi (2):
  btrfs-progs: tests: add 020-extent-ref-cases
  btrfs-progs: make btrfs-image restore to support dup

M G Berberich (1):
  btrfs-progs: send: add quiet option

Merlin Hartley (1):
  btrfs-progs: doc: fix typo in btrfs-subvolume

Nicholas D Steeves (1):
  btrfs-progs: typo review of strings and comments

Qu Wenruo (36):
  btrfs-progs: Enhance tree block check by checking empty leaf or node
  btrfs-progs: Return earlier for previous item
  btrfs-progs: convert-tests: Add test for backup superblock migration
  btrfs-progs: corrupt-block: Add support to corrupt extent for skinny metadata
  btrfs-progs: utils: Introduce new pseudo random API
  btrfs-progs: Use new random number API
  btrfs-progs: convert-tests: Add support for custom test scripts
  btrfs-progs: convert-tests: Add test case for backup superblock migration
  btrfs-progs: convert: add compatibility layer for e2fsprogs < 1.42
  btrfs-progs: convert: Introduce functions to read used space
  btrfs-progs: convert: Introduce new function to remove reserved ranges
  btrfs-progs: convert: Introduce function to calculate the available space
  btrfs-progs: utils: Introduce new function for convert
  btrfs-progs: Introduce function to setup temporary superblock
  btrfs-progs: Introduce function to setup temporary tree root
  btrfs-progs: Introduce function to setup temporary chunk root
  btrfs-progs: Introduce function to initialize device tree
  btrfs-progs: Introduce function to initialize fs tree
  btrfs-progs: Introduce function to initialize csum tree
  btrfs-progs: Introduce function to setup temporary extent tree
  btrfs-progs: Introduce function to create convert data chunks
  btrfs-progs: extent-tree: Introduce function to find the first overlapping extent
  btrfs-progs: extent-tree: Enhance btrfs_record_file_extent
  btrfs-progs: convert: Introduce new function to create converted image
  btrfs-progs: convert: Introduce function to migrate reserved ranges
  btrfs-progs: convert: Enhance record_file_blocks to handle reserved ranges
  btrfs-progs: convert: Introduce init_btrfs_v2 function.
  btrfs-progs: Introduce 

Re: fsck: to repair or not to repair

2016-06-10 Thread Austin S. Hemmelgarn

On 2016-06-09 23:40, Nikolaus Rath wrote:

On May 11 2016, Nikolaus Rath  wrote:

Hello,

I recently ran btrfsck on one of my file systems, and got the following
messages:

checking extents
checking free space cache
checking fs roots
root 5 inode 3149867 errors 400, nbytes wrong
root 5 inode 3150237 errors 400, nbytes wrong
root 5 inode 3150238 errors 400, nbytes wrong
root 5 inode 3150242 errors 400, nbytes wrong
root 5 inode 3150260 errors 400, nbytes wrong
[ lots of similar message with different inode numbers ]
root 5 inode 15595011 errors 400, nbytes wrong
root 5 inode 15595016 errors 400, nbytes wrong
Checking filesystem on /dev/mapper/vg0-nikratio_crypt
UUID: 8742472d-a9b0-4ab6-b67a-5d21f14f7a38
found 263648960636 bytes used err is 1
total csum bytes: 395314372
total tree bytes: 908644352
total fs tree bytes: 352735232
total extent tree bytes: 95039488
btree space waste bytes: 156301160
file data blocks allocated: 675209801728
 referenced 410351722496
Btrfs v3.17


Can someone explain to me the risk that I run by attempting a repair,
and (conversely) what I put at stake when continuing to use this file
system as-is?


To follow-up on this: after finding out which files were affected (using
btrfs inspect-internal), I was able to fix the problem without using
btrfsck by simply copying the data, deleting the file, and restoring it:

cat affected-files.txt | while read -r name; do
rsync -a "${name}" "/backup/location/${name}"
rm -f "${name}"
cp -a "/backup/location/${name}" "${name}"
done

(I used rsync to avoid cp making use of reflinks.) After this procedure,
btrfsck reported no more problems.
JFYI, if you're using GNU cp, you can pass '--reflink=never' to avoid it 
making reflinks.




Managing storage (incl. Btrfs) on Linux with openATTIC

2016-06-10 Thread Lenz Grimmer
Hi there,

if you're using Btrfs on Linux for file serving purposes, I'd like to invite 
you to take a look at our open source storage management project "openATTIC":

  http://openattic.org/

We provide a web UI and RESTful API to create CIFS/NFS shares on top of Btrfs 
and other file systems, including monitoring and snapshots. Other file systems 
like ext4, XFS or ZFS are supported, too. We also support sharing block volumes 
via iSCSI and Fibre Channel via LIO and are currently working on adding Ceph 
Management and Monitoring support as well.

openATTIC 2.0 is currently in development and we're looking for more testers 
and feedback. Packages for Debian/Ubuntu, RHEL/CentOS and SUSE are 
available via apt/yum repos.

For the time being, we don't yet support all the nifty Btrfs features (e.g. 
RAID levels), but you can already use openATTIC to manage (e.g. creating and 
snapshotting) and monitor Btrfs file systems via the WebUI. We plan to further 
extend the Btrfs functionality incrementally with each release. Some use cases 
we have in mind are documented here: 
https://wiki.openattic.org/display/OP/openATTIC+Storage+Management+Use+Cases

So if you're looking for a free (GPLv2) storage management tool that supports 
your favorite file system, we'd be glad if you give openATTIC a try!

Thanks and sorry for the noise,

Lenz
-- 
 Lenz Grimmer  - http://www.lenzg.net/


Re: How to map extents to files

2016-06-10 Thread Qu Wenruo



At 06/02/2016 10:56 PM, Nikolaus Rath wrote:

On Jun 02 2016, Qu Wenruo  wrote:

At 06/02/2016 11:06 AM, Nikolaus Rath wrote:

Hello,

For one of my btrfs volumes, btrfsck reports a lot of the following
warnings:

[...]
checking extents
bad extent [138477568, 138510336), type mismatch with chunk
bad extent [140091392, 140148736), type mismatch with chunk
bad extent [140148736, 140201984), type mismatch with chunk
bad extent [140836864, 140865536), type mismatch with chunk
[...]

Is there a way to discover which files are affected by this (in
particular so that I can take a look at them before and after a btrfsck
--repair)?


Which version of the progs are you using? If the fs was not converted
from ext2/3/4, it may be a false alert.


Version is 4.4.1. The fs may very well have been converted from ext4,
but I can't tell for sure.


Best,
-Nikolaus



Sorry for the late reply.

For such a case, btrfsck --repair is unable to fix it, as btrfs-progs is 
not able to balance extents.


Normally, a full balance would fix it.


I would try to update btrfs-progs to 4.5 and recheck, to see if it's a 
false alert.

If not, then remove unused snapshots and then do the full balance.

It's recommended to delete unused snapshots first: with too many 
snapshots, balance may be quite slow.
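
That sequence could be sketched as a dry run like this. The mount point
and snapshot path are hypothetical, and the commands are only echoed, not
executed, since a full balance on a large converted filesystem can run
for hours:

```shell
#!/bin/sh
# Dry-run sketch of: drop unused snapshots, then run a full balance.
MNT=/mnt/pool                      # placeholder mount point
run() { echo "would run: $*"; }    # echo instead of executing

# 1. List read-only snapshots to decide which ones are unused
run btrfs subvolume list -s "$MNT"
# 2. Delete the ones no longer needed (path is made up)
run btrfs subvolume delete "$MNT/snapshots/old-2016-01-01"
# 3. Full balance, which rewrites extents into correctly typed chunks
run btrfs balance start --full-balance "$MNT"
```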


Thanks,
Qu

