Re: [PATCH 0/4] Lowmem mode btrfs fixes exposed by complex tree

2017-11-13 Thread Qu Wenruo


On 2017-11-13 15:34, Qu Wenruo wrote:
> The patchset (along with "backref lost" bug fixes and test cases) can be
> fetched from github:
> https://github.com/adam900710/btrfs-progs/tree/lowmem_fix

The branch has been updated with new fixes for the "referencer count mismatch"
false alerts. Thanks to Chris Murphy for his image.

If nothing goes wrong, the branch should fix all the problems Chris
Murphy found.

Thanks,
Qu
> 
> Besides the "backref lost" false alerts reported by Chris Murphy, there
> are still some other bugs to be fixed.
> 
> One was also exposed by Chris Murphy: the btrfs-progs backref walk can't
> handle shared block refs for metadata. It is fixed by the 1st patch.
> 
> Two more bugs were exposed by the test image, which was originally designed
> for the bug fixed by the 1st patch.
> 
> Last but not least, here comes the test image.
> It is an image with a lot of metadata, taken in the middle of a relocation.
> It is definitely a bomb for the old lowmem check.
> 
> Qu Wenruo (4):
>   btrfs-progs: backref: Allow backref walk to handle direct parent ref
>   btrfs-progs: lowmem check: Fix function call stack overflow caused by wrong tree reloc tree detection
>   btrfs-progs: lowmem check: Fix false alerts for image with shared block ref only backref
>   btrfs-progs: fsck-test: Add new image with shared block ref only metadata backref
> 
>  backref.c                                          |   3 ++
>  cmds-check.c                                       |  35
>  .../020-extent-ref-cases/shared_block_ref_only.img | Bin 0 -> 304128 bytes
>  3 files changed, 32 insertions(+), 6 deletions(-)
>  create mode 100644 tests/fsck-tests/020-extent-ref-cases/shared_block_ref_only.img
> 





[PATCH 2/2] btrfs-progs: fsck-tests: Introduce test case with keyed data backref with shared tree blocks

2017-11-13 Thread Qu Wenruo
For tree blocks that a snapshot shares with its source subvolume, the keyed
backref counter only counts the exclusively owned references.

In the following case, 258 is a snapshot of 257, which inherits all the
references to this data extent.
--
item 4 key (12582912 EXTENT_ITEM 524288) itemoff 3741 itemsize 140
refs 179 gen 9 flags DATA
extent data backref root 257 objectid 258 offset 0 count 49
extent data backref root 257 objectid 257 offset 0 count 1
extent data backref root 256 objectid 258 offset 0 count 128
extent data backref root 256 objectid 257 offset 0 count 1
--

However, lowmem mode used to iterate over the whole inode to find all
references, without caring whether a reference had already been counted
through the shared tree block.

This patch adds a test case to check it.

Signed-off-by: Qu Wenruo 
---
 .../keyed_data_ref_with_shared_leaf.img | Bin 0 -> 19456 bytes
 1 file changed, 0 insertions(+), 0 deletions(-)
 create mode 100644 tests/fsck-tests/020-extent-ref-cases/keyed_data_ref_with_shared_leaf.img

diff --git a/tests/fsck-tests/020-extent-ref-cases/keyed_data_ref_with_shared_leaf.img b/tests/fsck-tests/020-extent-ref-cases/keyed_data_ref_with_shared_leaf.img
new file mode 100644
index ..2ce5068f01c693e577de8b9a536fdeb360029e5f
GIT binary patch
literal 19456

[PATCH 1/2] btrfs-progs: lowmem check: Fix false alerts of referencer count mismatch for snapshot

2017-11-13 Thread Qu Wenruo
Btrfs lowmem check reports false alerts such as:
--
ERROR: extent[366498091008, 134217728] referencer count mismatch (root: 827, owner: 73782, offset: 134217728) wanted: 4, have: 26
ERROR: extent[366498091008, 134217728] referencer count mismatch (root: 818, owner: 73782, offset: 134217728) wanted: 4, have: 26
ERROR: extent[366498091008, 134217728] referencer count mismatch (root: 870, owner: 73782, offset: 134217728) wanted: 4, have: 26
--

Meanwhile, in the extent tree, the extent item is:
--
item 81 key (366498091008 EXTENT_ITEM 134217728) itemoff 9008 itemsize 169
refs 39 gen 224 flags DATA
extent data backref root 827 objectid 73782 offset 134217728 count 4
extent data backref root 818 objectid 73782 offset 134217728 count 4
extent data backref root 259 objectid 73482 offset 134217728 count 1
extent data backref root 644 objectid 73782 offset 134217728 count 26
extent data backref root 870 objectid 73782 offset 134217728 count 4
--

In root 827, there is one leaf, owned by 827, with 4 references to that
extent:
--
leaf 714964992 items 68 free space 10019 generation 641 owner 827
leaf 714964992 flags 0x1(WRITTEN) backref revision 1
..
item 64 key (73782 EXTENT_DATA 134217728) itemoff 11878 itemsize 53
generation 224 type 1 (regular)
extent data disk byte 366498091008 nr 134217728
extent data offset 0 nr 6410240 ram 134217728
extent compression 0 (none)
item 65 key (73782 EXTENT_DATA 140627968) itemoff 11825 itemsize 53
generation 224 type 1 (regular)
extent data disk byte 366498091008 nr 134217728
extent data offset 6410240 nr 512 ram 134217728
extent compression 0 (none)
item 66 key (73782 EXTENT_DATA 145747968) itemoff 11772 itemsize 53
generation 224 type 1 (regular)
extent data disk byte 366498091008 nr 134217728
extent data offset 11530240 nr 7675904 ram 134217728
extent compression 0 (none)
item 67 key (73782 EXTENT_DATA 153423872) itemoff 11719 itemsize 53
generation 224 type 1 (regular)
extent data disk byte 366498091008 nr 134217728
extent data offset 19206144 nr 6397952 ram 134217728
extent compression 0 (none)
--

Starting from the next leaf, there are 22 more references to the data extent:
--
leaf 894861312 items 208 free space 59 generation 261 owner 644
leaf 894861312 flags 0x1(WRITTEN) backref revision 1
item 0 key (73782 EXTENT_DATA 159821824) itemoff 16230 itemsize 53
generation 224 type 1 (regular)
extent data disk byte 366498091008 nr 134217728
extent data offset 25604096 nr 8192 ram 134217728
extent compression 0 (none)
item 1 key (73782 EXTENT_DATA 159830016) itemoff 16177 itemsize 53
generation 224 type 1 (regular)
extent data disk byte 366498091008 nr 134217728
extent data offset 25612288 nr 7675904 ram 134217728
extent compression 0 (none)
..
--

However, the next leaf is owned by another subvolume, normally (part of)
the snapshot source.

Fix it by also checking the leaf's owner before increasing the reference
counter.

Reported-by: Chris Murphy 
Signed-off-by: Qu Wenruo 
---
 cmds-check.c | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/cmds-check.c b/cmds-check.c
index 4805e11b752b..d782567c77ee 100644
--- a/cmds-check.c
+++ b/cmds-check.c
@@ -10909,11 +10909,17 @@ static int check_extent_data_backref(struct btrfs_fs_info *fs_info,
 * Except normal disk bytenr and disk num bytes, we still
 * need to do extra check on dbackref offset as
 * dbackref offset = file_offset - file_extent_offset
+*
+* Also, we must check the leaf owner.
+* In case of shared tree blocks (snapshots) we can inherit
+* leaves from source snapshot.
+* In that case, reference from source snapshot should not
+* count.
 */
if (btrfs_file_extent_disk_bytenr(leaf, fi) == bytenr &&
btrfs_file_extent_disk_num_bytes(leaf, fi) == len &&
(u64)(key.offset - btrfs_file_extent_offset(leaf, fi)) ==
-   offset)
+   offset && btrfs_header_owner(leaf) == root_id)
found_count++;
 
 next:
-- 
2.15.0

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Read before you deploy btrfs + zstd

2017-11-13 Thread Martin Steigerwald
Hello David.

David Sterba - 13.11.17, 23:50:
> while 4.14 is still fresh, let me address some concerns I've seen on linux
> forums already.
> 
> The newly added ZSTD support is a feature that has broader impact than
> just the runtime compression. The btrfs-progs understand filesystem with
> ZSTD since 4.13. The remaining key part is the bootloader.
> 
> Up to now, there are no bootloaders supporting ZSTD. This could lead to an
> unmountable filesystem if the critical files under /boot get accidentally
> or intentionally compressed by ZSTD.

But otherwise ZSTD is safe to use? Are you aware of any other issues?

I am considering switching from LZO to ZSTD on this ThinkPad T520 with Sandybridge.

Thank you,
-- 
Martin


Re: Need help with incremental backup strategy (snapshots, defragmenting & performance)

2017-11-13 Thread Marat Khalili

On 14/11/17 06:39, Dave wrote:

> My rsync command currently looks like this:
> 
> rsync -axAHv --inplace --delete-delay --exclude-from="/some/file"
> "$source_snapshop/" "$backup_location"
As I learned from Kai Krakow on this mailing list, you should also add
--no-whole-file if both sides are local. Otherwise target space usage
can be much worse (but fragmentation much better).


I wonder what your justification for --delete-delay is; I just use --delete.

Here's what I use: --verbose --archive --hard-links --acls --xattrs
--numeric-ids --inplace --delete --delete-excluded --stats. Since in my
case the source is always remote, there's no --no-whole-file, but there is
--numeric-ids.



> In particular, I want to know if I should or should not be using these options:
> 
>  -H, --hard-links            preserve hard links
>  -A, --acls                  preserve ACLs (implies -p)
>  -X, --xattrs                preserve extended attributes
>  -x, --one-file-system       don't cross filesystem boundaries
I don't know of any semantic use of hard links on modern systems. There are
ACLs on some files in /var/log/journal on systems with systemd. Synology
actively uses ACLs, but its implementation is sadly incompatible with
rsync. There can always be some ACLs or xattrs set manually by a sysadmin.
As a result, I always specify the first three options where possible, just in
case (even though the man page says that --hard-links may affect performance).



> I had to use the "x" option to prevent rsync from deleting files in
> snapshots in the backup location (as the source location does not
> retain any snapshots). Is there a better way?
Don't keep snapshots under the rsync target; place them under ../snapshots
instead (if snapper supports this):



# find . -maxdepth 2
.
./snapshots
./snapshots/2017-11-08T13:18:20+00:00
./snapshots/2017-11-08T15:10:03+00:00
./snapshots/2017-11-08T23:28:44+00:00
./snapshots/2017-11-09T23:41:30+00:00
./snapshots/2017-11-10T22:44:36+00:00
./snapshots/2017-11-11T21:48:19+00:00
./snapshots/2017-11-12T21:27:41+00:00
./snapshots/2017-11-13T23:29:49+00:00
./rsync
Or, specify them in --exclude and avoid using --delete-excluded. Or keep 
using -x if it works, why not?


--

With Best Regards,
Marat Khalili


Re: btrfs check fails with: btrfs_alloc_chunk: BUG_ON `ret` triggered, value -28

2017-11-13 Thread Qu Wenruo


On 2017-11-14 11:50, Chris Murphy wrote:
> On Mon, Nov 13, 2017 at 8:45 PM, Chris Murphy  wrote:
>> On Mon, Nov 13, 2017 at 8:08 PM, Ben Hooper  wrote:
>>
>>> [28205.454029] Code: 79 ff ff ff 49 8b 7c 24 60 89 da 48 c7 c6 68 c7 be a0 31 c0 e8 12 4e fe ff eb 9b 89 de 48 c7 c7 38 c7 be a0 31 c0 e8 53 11 5a e0 <0f> ff eb 87 66 90 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90
>>> [28205.456407] ---[ end trace 77358f42ce65a0d0 ]---
>>> [28205.457579] BTRFS: error (device sdl) in btrfs_create_pending_block_groups:10254: errno=-27 unknown
>>> [28206.172366] BTRFS: error (device sdl) in btrfs_create_pending_block_groups:10254: errno=-27 unknown
>>> [28206.178599] BTRFS warning (device sdl): Skipping commit of aborted transaction.
>>> [28206.179840] BTRFS: error (device sdl) in cleanup_transaction:1873: errno=-5 IO failure
>>> [28206.256720] BTRFS error (device sdl): pending csums is 1368064
>>
>> The mysterious -27. But then also IO failure. Are there any other
>> storage related kernel messages in the ~2 to 5 minutes prior to this
>> trace? I'm wondering if there's something misbehaving: cable,
>> controller, drive, other.
>>
> 
> Found a similar trace from July 2015 that Omar ran into and Filipe
> thought he was close to tracking it down. It was also a pretty big
> file system. Maybe they have an idea here. But btrfs check blowing up
> isn't good in any case.

The cause of the BUG_ON in btrfs check is pretty obvious: btrfs check --repair
runs out of space.

Of course, you won't hit it with the default btrfs check, which is read-only.

IIRC I have submitted a patch to make it exit more gracefully.

Thanks,
Qu
> 
> 
> 





Re: btrfs check fails with: btrfs_alloc_chunk: BUG_ON `ret` triggered, value -28

2017-11-13 Thread Ben Hooper


> On 14 Nov 2017, at 11:45 am, Chris Murphy  wrote:
> 
> On Mon, Nov 13, 2017 at 8:08 PM, Ben Hooper  wrote:
> 
>> [28205.454029] Code: 79 ff ff ff 49 8b 7c 24 60 89 da 48 c7 c6 68 c7 be a0 31 c0 e8 12 4e fe ff eb 9b 89 de 48 c7 c7 38 c7 be a0 31 c0 e8 53 11 5a e0 <0f> ff eb 87 66 90 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90
>> [28205.456407] ---[ end trace 77358f42ce65a0d0 ]---
>> [28205.457579] BTRFS: error (device sdl) in btrfs_create_pending_block_groups:10254: errno=-27 unknown
>> [28206.172366] BTRFS: error (device sdl) in btrfs_create_pending_block_groups:10254: errno=-27 unknown
>> [28206.178599] BTRFS warning (device sdl): Skipping commit of aborted transaction.
>> [28206.179840] BTRFS: error (device sdl) in cleanup_transaction:1873: errno=-5 IO failure
>> [28206.256720] BTRFS error (device sdl): pending csums is 1368064
> 
> The mysterious -27. But then also IO failure. Are there any other
> storage related kernel messages in the ~2 to 5 minutes prior to this
> trace? I'm wondering if there's something misbehaving: cable,
> controller, drive, other.

Good question. I thought the IO error here was a response to the filesystem 
being remounted RO. 

Unfortunately I cannot see any other IO errors in the dmesg, and the mpt2sas 
driver is not reporting any errors. I do see a few blocked tasks:

[15238.872017] INFO: task kworker/u50:21:12403 blocked for more than 120 seconds.
[15238.873173]   Not tainted 4.14.0-rc7+ #1
[15238.874251] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[15238.875773] kworker/u50:21  D0 12403  2 0x8080
[15238.877313] Workqueue: events_unbound btrfs_async_reclaim_metadata_space [btrfs]

Complete dmesg here if interested - https://pastebin.com/raw/wFasQrWs (long)

Cheers,

Ben



Re: btrfs check fails with: btrfs_alloc_chunk: BUG_ON `ret` triggered, value -28

2017-11-13 Thread Chris Murphy
On Mon, Nov 13, 2017 at 8:45 PM, Chris Murphy  wrote:
> On Mon, Nov 13, 2017 at 8:08 PM, Ben Hooper  wrote:
>
>> [28205.454029] Code: 79 ff ff ff 49 8b 7c 24 60 89 da 48 c7 c6 68 c7 be a0 31 c0 e8 12 4e fe ff eb 9b 89 de 48 c7 c7 38 c7 be a0 31 c0 e8 53 11 5a e0 <0f> ff eb 87 66 90 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90
>> [28205.456407] ---[ end trace 77358f42ce65a0d0 ]---
>> [28205.457579] BTRFS: error (device sdl) in btrfs_create_pending_block_groups:10254: errno=-27 unknown
>> [28206.172366] BTRFS: error (device sdl) in btrfs_create_pending_block_groups:10254: errno=-27 unknown
>> [28206.178599] BTRFS warning (device sdl): Skipping commit of aborted transaction.
>> [28206.179840] BTRFS: error (device sdl) in cleanup_transaction:1873: errno=-5 IO failure
>> [28206.256720] BTRFS error (device sdl): pending csums is 1368064
>
> The mysterious -27. But then also IO failure. Are there any other
> storage related kernel messages in the ~2 to 5 minutes prior to this
> trace? I'm wondering if there's something misbehaving: cable,
> controller, drive, other.
>

Found a similar trace from July 2015 that Omar ran into and Filipe
thought he was close to tracking it down. It was also a pretty big
file system. Maybe they have an idea here. But btrfs check blowing up
isn't good in any case.



-- 
Chris Murphy


Re: btrfs check fails with: btrfs_alloc_chunk: BUG_ON `ret` triggered, value -28

2017-11-13 Thread Chris Murphy
On Mon, Nov 13, 2017 at 8:08 PM, Ben Hooper  wrote:

> [28205.454029] Code: 79 ff ff ff 49 8b 7c 24 60 89 da 48 c7 c6 68 c7 be a0 31 c0 e8 12 4e fe ff eb 9b 89 de 48 c7 c7 38 c7 be a0 31 c0 e8 53 11 5a e0 <0f> ff eb 87 66 90 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90
> [28205.456407] ---[ end trace 77358f42ce65a0d0 ]---
> [28205.457579] BTRFS: error (device sdl) in btrfs_create_pending_block_groups:10254: errno=-27 unknown
> [28206.172366] BTRFS: error (device sdl) in btrfs_create_pending_block_groups:10254: errno=-27 unknown
> [28206.178599] BTRFS warning (device sdl): Skipping commit of aborted transaction.
> [28206.179840] BTRFS: error (device sdl) in cleanup_transaction:1873: errno=-5 IO failure
> [28206.256720] BTRFS error (device sdl): pending csums is 1368064

The mysterious -27. But then also IO failure. Are there any other
storage related kernel messages in the ~2 to 5 minutes prior to this
trace? I'm wondering if there's something misbehaving: cable,
controller, drive, other.




-- 
Chris Murphy


Re: Need help with incremental backup strategy (snapshots, defragmenting & performance)

2017-11-13 Thread Dave
On Wed, Nov 1, 2017 at 1:15 AM, Roman Mamedov  wrote:
> On Wed, 1 Nov 2017 01:00:08 -0400
> Dave  wrote:
>
>> To reconcile those conflicting goals, the only idea I have come up
>> with so far is to use btrfs send-receive to perform incremental
>> backups as described here:
>> https://btrfs.wiki.kernel.org/index.php/Incremental_Backup .
>
> Another option is to just use the regular rsync to a designated destination
> subvolume on the backup host, AND snapshot that subvolume on that host from
> time to time (or on backup completions, if you can synchronize that).
>
> rsync --inplace will keep space usage low as it will not reupload entire files
> in case of changes/additions to them.
>
> Yes rsync has to traverse both directory trees to find changes, but that's
> pretty fast (couple of minutes at most, for a typical root filesystem),
> especially if you use SSD or SSD caching.

Hello. I am implementing this suggestion. So far, so good. However, I
need some further recommendations on rsync options to use for this
purpose.

My rsync command currently looks like this:

rsync -axAHv --inplace --delete-delay --exclude-from="/some/file"
"$source_snapshop/" "$backup_location"

In particular, I want to know if I should or should not be using these options:

-H, --hard-links            preserve hard links
-A, --acls                  preserve ACLs (implies -p)
-X, --xattrs                preserve extended attributes
-x, --one-file-system       don't cross filesystem boundaries

I had to use the "x" option to prevent rsync from deleting files in
snapshots in the backup location (as the source location does not
retain any snapshots). Is there a better way?

I have my live system on one block device and a backup snapshot of it
on another block device. I am keeping them in sync with hourly rsync
transfers.

Here's how this system works in a little more detail:

1. I establish the baseline by sending a full snapshot to the backup
block device using btrfs send-receive.
2. Next, on the backup device I immediately create a rw copy of that
baseline snapshot.
3. I delete the source snapshot to keep the live filesystem free of
all snapshots (so it can be optimally defragmented, etc.)
4. Hourly, I take a snapshot of the live system, rsync all changes to
the backup block device, and then delete the source snapshot. This
hourly process currently takes less than a minute. (My test system has
only moderate usage.)
5. Hourly, following the above step, I use snapper to take a snapshot
of the backup subvolume to create/preserve a history of changes. For
example, I can find the version of a file from 30 hours earlier.

The backup volume contains up to 100 snapshots while the live volume
has no snapshots. Best of both worlds? I guess I'll find out over
time.


btrfs check fails with: btrfs_alloc_chunk: BUG_ON `ret` triggered, value -28

2017-11-13 Thread Ben Hooper
I received another balance error, which made the filesystem RO.

Running btrfs check errors out with:

volumes.c:1035: btrfs_alloc_chunk: BUG_ON `ret` triggered, value -28

I am still able to mount the filesystem RW but it takes over 10 minutes to 
mount.

So far I have tried different kernel versions (4.9, 4.13, 4.14) and adding 
additional disks but I still can't get the balance operation to complete.

Any suggestions on how to get this filesystem balanced? Or should I just invest 
in a new filesystem and transfer the data over?

# ./btrfs check --repair --progress /dev/sda
enabling repair mode
Checking filesystem on /dev/sda
UUID: 3742056f-7ff0-4ce1-9131-ab2cfd7b8736
checking extents [o]
Fixed 0 roots.
cache and super generation don't match, space cache will be invalidated
checking fs roots [O]


volumes.c:1035: btrfs_alloc_chunk: BUG_ON `ret` triggered, value -28
./btrfs[0x424b3a]
./btrfs(btrfs_alloc_chunk+0x982)[0x4262f2]
./btrfs[0x41b08d]
./btrfs(btrfs_reserve_extent+0x142)[0x41b2cb]
./btrfs(btrfs_alloc_free_block+0x55)[0x41bd3b]
./btrfs(__btrfs_cow_block+0xdd)[0x40c62d]
./btrfs(btrfs_cow_block+0x124)[0x40d12e]
./btrfs(btrfs_search_slot+0x1cf)[0x40fa58]
./btrfs(btrfs_insert_empty_items+0xcf)[0x411390]
./btrfs(btrfs_insert_file_extent+0xa1)[0x421358]
./btrfs(btrfs_punch_hole+0x79)[0x43721f]
./btrfs[0x456a3a]
./btrfs(cmd_check+0x2c08)[0x45fb44]
./btrfs(main+0x15d)[0x40b514]
/lib64/libc.so.6(__libc_start_main+0xf5)[0x7f0c376e9c05]
./btrfs[0x40afe9]
Aborted


# ./btrfs --version
btrfs-progs v4.13.3


# uname -a
Linux nas 4.14.0-rc7+ #1 SMP Thu Nov 2 00:35:35 HKT 2017 x86_64 x86_64 x86_64 GNU/Linux


[28205.406171] ------------[ cut here ]------------
[28205.406214] BTRFS: error (device sdl) in btrfs_create_pending_block_groups:10254: errno=-27 unknown
[28205.406216] BTRFS info (device sdl): forced readonly
[28205.407857] BTRFS info (device sdl): 2 enospc errors during balance
[28205.410456] WARNING: CPU: 13 PID: 10879 at fs/btrfs/extent-tree.c:10254 btrfs_create_pending_block_groups+0x250/0x260 [btrfs]
[28205.411485] Modules linked in: ipt_REJECT nf_reject_ipv4 xt_conntrack ip_set 
nfnetlink iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat 
nf_conntrack iptable_mangle iptable_security iptable_raw iptable_filter veth 
w83795 jc42 ext4 mbcache jbd2 intel_powerclamp coretemp kvm_intel kvm iTCO_wdt 
gpio_ich iTCO_vendor_support irqbypass crct10dif_pclmul crc32_pclmul 
ghash_clmulni_intel pcbc btrfs aesni_intel crypto_simd glue_helper cryptd 
intel_cstate xor zstd_decompress zstd_compress xxhash ipmi_si ipmi_devintf 
pcspkr ipmi_msghandler input_leds i2c_i801 lpc_ich mfd_core ioatdma i7core_edac 
i5500_temp dca ses enclosure raid6_pq sg shpchp acpi_cpufreq nfsd auth_rpcgss 
nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c sd_mod ata_generic mgag200 
pata_acpi i2c_algo_bit crc32c_intel drm_kms_helper
[28205.419253]  syscopyarea sysfillrect sysimgblt fb_sys_fops ata_piix 
serio_raw ttm e1000e mpt3sas drm libata ptp raid_class pps_core 
scsi_transport_sas dm_mirror dm_region_hash dm_log dm_mod dax
[28205.421641] CPU: 13 PID: 10879 Comm: kworker/u49:14 Not tainted 4.14.0-rc7+ #1
[28205.422815] Hardware name: Penguin Computing Relion 4724/X8DT6, BIOS 2.0c    05/15/2012
[28205.424058] Workqueue: events_unbound btrfs_async_reclaim_metadata_space [btrfs]
[28205.425297] task: 880620912c00 task.stack: c900277f4000
[28205.426533] RIP: 0010:btrfs_create_pending_block_groups+0x250/0x260 [btrfs]
[28205.427773] RSP: 0018:c900277f7c50 EFLAGS: 00010246
[28205.429020] RAX: 0026 RBX: ffe5 RCX: 
[28205.430248] RDX:  RSI: 880627bce098 RDI: 880627bce098
[28205.431450] RBP: c900277f7cd0 R08:  R09: 0892
[28205.432665] R10: 0007 R11: 0891 R12: 8805c52948e8
[28205.433856] R13: 8801d243e400 R14: 8801d2438400 R15: 8801d243e520
[28205.435048] FS:  () GS:880627bc() knlGS:
[28205.436279] CS:  0010 DS:  ES:  CR0: 80050033
[28205.437462] CR2: 7f5985a7fff0 CR3: 01c09004 CR4: 000206e0
[28205.438648] Call Trace:
[28205.439875]  __btrfs_end_transaction+0x93/0x2f0 [btrfs]
[28205.441106]  btrfs_end_transaction+0x10/0x20 [btrfs]
[28205.442327]  flush_space+0x137/0x550 [btrfs]
[28205.443519]  ? pick_next_task_fair+0x48b/0x5d0
[28205.444706]  ? __switch_to+0x1ff/0x440
[28205.445892]  btrfs_async_reclaim_metadata_space+0xfc/0x4b0 [btrfs]
[28205.447079]  process_one_work+0x149/0x360
[28205.448285]  worker_thread+0x4d/0x3e0
[28205.449441]  kthread+0x109/0x140
[28205.450586]  ? rescuer_thread+0x380/0x380
[28205.451741]  ? kthread_park+0x60/0x60
[28205.452906]  ret_from_fork+0x25/0x30
[28205.454029] Code: 79 ff ff ff 49 8b 7c 24 60 89 da 48 c7 c6 68 c7 be a0 31 c0 e8 12 4e fe ff eb 9b 89 de 48 c7 c7 38 c7 be a0 31 c0 e8 53 11 5a e0 <0f> ff eb 87 66 90 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90

Re: [GIT PULL] Btrfs changes for 4.15

2017-11-13 Thread Qu Wenruo


On 2017-11-13 23:35, David Sterba wrote:
> Hi,
> 
> please pull the following btrfs changes. There are some new user features and
> the usual load of invisible enhancements or cleanups. The branch merges
> cleanly, has been frozen in case rc7 was the last one, so I send out the pull
> request early. Thanks.
> 
> 
> New features:
> 
> - extend mount options to specify zlib compression level, -o compress=zlib:9

However, the support for it has a big problem: it causes wild memory
access with the "-o compress" mount option.

Kernel ASAN (KASAN) can detect it easily, and we already have a user report
about it. The fstests case btrfs/026 can also easily trigger it.

The fix was submitted a few days ago:
https://patchwork.kernel.org/patch/10042553/

Also, when no level is specified, the default compression level ends up as
zero, which means no compression at all, just a direct memory copy.

Thanks,
Qu

> 
> - v2 of ioctl "extent to inode mapping", addressing a usecase where we want to
>   retrieve more but inaccurate results and do the postprocessing in userspace,
>   aiding defragmentation or deduplication tools
> 
> - populate compression heuristics logic, do data sampling and try to guess
>   compressibility by: looking for repeated patterns, counting unique byte
>   values and distribution, calculating Shannon entropy;
>   this will need more benchmarking and possibly fine tuning, but the base
>   should be good enough
> 
> - enable indexing for btrfs as lower filesystem in overlayfs
> 
> - speedup page cache readahead during send on large files
> 
> 
> Internal enhancements:
> 
> - more sanity checks of b-tree items when reading them from disk
> 
> - more EINVAL/EUCLEAN fixups, missing BLK_STS_* conversion, other errno or
>   error handling fixes
> 
> - remove some homegrown IO-related logic, that's been obsoleted by core block
>   layer changes (batching, plug/unplug, own counters)
> 
> - add ref-verify, optional debugging feature to verify extent reference
>   accounting
> 
> - simplify code handling outstanding extents, make it more clear where and how
>   the accounting is done
> 
> - make delalloc reservations per-inode, simplify the code and make the logic
>   more straightforward
> 
> - extensive cleanup of delayed refs code
> 
> 
> Notable fixes:
> 
> - fix send ioctl on 32bit with 64bit kernel
> 
> 
> The branch top commit matches the signed tag for-4.15-tag.
> 
> 
> The following changes since commit 0b07194bb55ed836c2cc7c22e866b87a14681984:
> 
>   Linux 4.14-rc7 (2017-10-29 13:58:38 -0700)
> 
> are available in the Git repository at:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux.git for-4.15
> 
> for you to fetch changes up to d28e649a5c58b779b303c252c66ee84a0f2c3b32:
> 
>   btrfs: Fix bug for misused dev_t when lookup in dev state hash table. 
> (2017-11-01 20:45:36 +0100)
> 
> 
> Adam Borowski (1):
>   btrfs: allow setting zlib compression level via :9
> 
> Allen Pais (1):
>   btrfs: return -ENOMEM on allocation failure in btrfsic
> 
> Anand Jain (13):
>   btrfs: declare TRACE_DEFINE_ENUM for each of show_flush_state enum
>   btrfs: copy fsid to super_block s_uuid
>   btrfs: undo writable superblocke when sprouting fails
>   btrfs: fix BUG_ON in btrfs_init_new_device()
>   btrfs: error out if btrfs_attach_transaction() fails
>   btrfs: add_missing_dev() should return the actual error
>   btrfs: fix EIO misuse to report missing degraded option
>   btrfs: declare btrfs_report_missing_device() static
>   btrfs: fix use of error or warning for missing device
>   btrfs: use BLK_STS defines where needed
>   btrfs: use need_full_stripe() in __btrfs_map_block()
>   btrfs: fix false EIO for missing device
>   btrfs: remove BUG_ON in btrfs_rm_dev_replace_free_srcdev()
> 
> Arnd Bergmann (1):
>   btrfs: tree-checker: use %zu format string for size_t
> 
> Christophe JAILLET (1):
>   btrfs: tests: Fix a memory leak in error handling path in 'run_test()'
> 
> Christos Gkekas (2):
>   btrfs: Clean up dead code in root-tree
>   btrfs: Clean up unused variables in free-space-tree.c
> 
> Colin Ian King (2):
>   btrfs: avoid null pointer dereference on fs_info when calling btrfs_crit
>   btrfs: make array types static const, reduces object code size
> 
> David Sterba (4):
>   btrfs: scrub: get rid of sector_t
>   btrfs: rename page offset parameter in submit_extent_page
>   btrfs: get rid of sector_t and use u64 offset in submit_extent_page
>   btrfs: allow to set compression level for zlib
> 
> Goldwyn Rodrigues (1):
>   btrfs: cleanup extent locking sequence
> 
> Gu JinXiang (2):
>   btrfs: Use bd_dev to generate index when dev_state_hashtable add items.
>   btrfs: Fix bug for misused dev_t when lookup in dev state hash table.
> 
> Hans van Kranenburg (1):
>   btrfs: prefix sysfs attribute struct 

Read before you deploy btrfs + zstd

2017-11-13 Thread David Sterba
Hi,

while 4.14 is still fresh, let me address some concerns I've seen on linux
forums already.

The newly added ZSTD support is a feature that has broader impact than
just the runtime compression. btrfs-progs has understood filesystems with
ZSTD since 4.13. The remaining key part is the bootloader.

So far, there are no bootloaders supporting ZSTD. This could lead to an
unmountable filesystem if the critical files under /boot get accidentally
or intentionally compressed by ZSTD.

There are several ways to get around that:

- separate boot partition, no zstd there

- reset potential compression of /boot/* files to something supported, e.g.
  $ btrfs filesystem defrag -r -czlib /boot/*
  or
  $ btrfs filesystem defrag -r -clzo /boot/*

To see if there are zstd files:

$ find /boot -print -exec sudo btrfs prop get '{}' compression \; | grep -B1 zstd

There might be other workarounds but I want to keep the advice simple.

d.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: btrfs list corruption and soft lockups while testing writeback error handling

2017-11-13 Thread Liu Bo
On Thu, May 11, 2017 at 03:56:35PM -0400, Chris Mason wrote:
> On 05/11/2017 03:52 PM, Jeff Layton wrote:
> > On Thu, 2017-05-11 at 07:13 -0400, Jeff Layton wrote:
> > > I finally got my writeback error handling test to work on btrfs (thanks,
> > > Chris!), by making the filesystem stripe the data and mirror the
> > > metadata across two devices. The test passes now, but on one run, I got
> > > the following list corruption warning and then a soft lockup (which is
> > > probably fallout from the list corruption).
> > > 
> > > I ran the test several times before and since then without this failure,
> > > so I don't have a clear reproducer. The kernel in this instance is
> > > basically a v4.11 kernel with my pile of writeback error handling
> > > patches on top:
> > > 
> > > 
> > > https://git.samba.org/?p=jlayton/linux.git;a=shortlog;h=refs/heads/wberr
> > > 
> > > It may be that they are a contributing factor, but this smells more like
> > > a bug down in btrfs. Let me know if you need other info:
> 
> [ btrfs inode logging ]
> 
> > (cc'ing Liu Bo since we were discussing this earlier this week)
> > 
> > I can't reproduce this on stock v4.11, so I think this is a bug in my
> > series.
> > 
> > I think this is due to the differences in how errors are being reported
> > from filemap_fdatawait_range now causing some transactions to end up
> > being freed while they're still on the log_ctxs list. I'm working on
> > hunting down the problem now.
> > 
> > Sorry for the noise!
> > 
> 
> There's a list in the inode logging code that we consistently seem to find
> list debugging assertions with.  We've fixed up all the known issues, but I
> wouldn't be surprised if we've got a goto fail in there.
> 
> I'll take a look ;)

FYI, I've nailed this down, and it turns out to be a bug on the btrfs side[1].

[1] https://patchwork.kernel.org/patch/10056535/
"Btrfs: fix list_add corruption and soft lockups in fsync"

Thanks,

-liubo


[PATCH] Btrfs: fix list_add corruption and soft lockups in fsync

2017-11-13 Thread Liu Bo
xfstests btrfs/146 revealed this corruption:

[   58.138831] Buffer I/O error on dev dm-0, logical block 2621424, async page read
[   58.151233] BTRFS error (device sdf): bdev /dev/mapper/error-test errs: wr 1, rd 0, flush 0, corrupt 0, gen 0
[   58.152403] list_add corruption. prev->next should be next (88005e6775d8), but was c9000189be88. (prev=c9000189be88).
[   58.153518] [ cut here ]
[   58.153892] WARNING: CPU: 1 PID: 1287 at lib/list_debug.c:31 __list_add_valid+0x169/0x1f0
...
[   58.157379] RIP: 0010:__list_add_valid+0x169/0x1f0
...
[   58.161956] Call Trace:
[   58.162264]  btrfs_log_inode_parent+0x5bd/0xfb0 [btrfs]
[   58.163583]  btrfs_log_dentry_safe+0x60/0x80 [btrfs]
[   58.164003]  btrfs_sync_file+0x4c2/0x6f0 [btrfs]
[   58.164393]  vfs_fsync_range+0x5f/0xd0
[   58.164898]  do_fsync+0x5a/0x90
[   58.165170]  SyS_fsync+0x10/0x20
[   58.165395]  entry_SYSCALL_64_fastpath+0x1f/0xbe
...

It turns out that log_one_extent() records the IO error in
btrfs_log_ctx:io_err when IO fails, but returns '0' instead of -EIO,
so the error is not seen by the callers, i.e. btrfs_log_inode_parent(),
which on error would remove btrfs_log_ctx:list from the list head
'root->log_ctxs'.  Since btrfs_log_ctx is allocated on the stack, it
gets freed while still linked on the list, and a later list_add then
throws the above warning.

This returns the correct error in the above case.
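A userspace model of the failure mode, assuming a minimal re-implementation of the kernel's circular `list_head`: a stack object is linked onto a shared list, and only an error return lets the caller unlink it before the object goes out of scope. `struct log_ctx`, `log_one_extent_model()` and `ctx_left_on_list()` are simplified, hypothetical stand-ins, not the real btrfs functions.

```c
#include <errno.h>
#include <stdbool.h>

/* Minimal circular doubly linked list, modelled on the kernel's list_head. */
struct list_head { struct list_head *next, *prev; };

static void list_init(struct list_head *h) { h->next = h->prev = h; }
static bool list_empty(const struct list_head *h) { return h->next == h; }

static void list_add(struct list_head *n, struct list_head *head)
{
	n->next = head->next;
	n->prev = head;
	head->next->prev = n;
	head->next = n;
}

static void list_del_init(struct list_head *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	list_init(n);
}

struct log_ctx { int io_err; struct list_head list; };

/* Stand-in for log_one_extent(): records the error; only the fixed version returns it. */
static int log_one_extent_model(struct log_ctx *ctx, bool ordered_io_err, bool fixed)
{
	if (ordered_io_err) {
		ctx->io_err = -EIO;
		return fixed ? ctx->io_err : 0;
	}
	return 0;
}

/*
 * Stand-in for the caller: ctx lives on this stack frame and is linked
 * onto log_ctxs; the caller only unlinks it when it sees an error.
 * Returns true if ctx would still be linked when the frame goes away.
 */
static bool ctx_left_on_list(struct list_head *log_ctxs, bool fixed)
{
	struct log_ctx ctx = { 0 };
	bool leaked;

	list_add(&ctx.list, log_ctxs);
	if (log_one_extent_model(&ctx, true, fixed) < 0)
		list_del_init(&ctx.list);
	leaked = !list_empty(&ctx.list);
	if (leaked) /* unlink anyway so the model itself never dangles */
		list_del_init(&ctx.list);
	return leaked;
}
```

With `fixed == false` the context stays on the shared list after the caller returns, which is the dangling entry a later `list_add` trips over.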

Jeff also reported this while testing against his fsync error
patch set[1].

[1]: https://www.spinics.net/lists/linux-btrfs/msg65308.html
"btrfs list corruption and soft lockups while testing writeback error handling"

Signed-off-by: Liu Bo 
---
 fs/btrfs/file.c | 1 +
 fs/btrfs/tree-log.c | 2 +-
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 063180b..db70eaa 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -2241,6 +2241,7 @@ int btrfs_sync_file(struct file *file, loff_t start, loff_t end, int datasync)
if (ctx.io_err) {
btrfs_end_transaction(trans);
ret = ctx.io_err;
+   ASSERT(list_empty(&ctx.list));
goto out;
}
 
diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
index c800d06..d300284 100644
--- a/fs/btrfs/tree-log.c
+++ b/fs/btrfs/tree-log.c
@@ -4100,7 +4100,7 @@ static int log_one_extent(struct btrfs_trans_handle *trans,
 
if (ordered_io_err) {
ctx->io_err = -EIO;
-   return 0;
+   return ctx->io_err;
}
 
btrfs_init_map_token(&token);
-- 
2.9.4


Re: btrfs check lowmem vs original

2017-11-13 Thread Chris Murphy
On Mon, Nov 13, 2017 at 11:40 AM, Chris Murphy  wrote:
> On Sun, Nov 12, 2017 at 9:37 PM, Qu Wenruo  wrote:
>
>>
>> The final bug about backref counts mismatch, without the image I'm not
>> really sure what's going on.
>>
>> (But extent tree verification is really buggy though)
>
> Ahh I imaged the wrong file system! Sorry about that:
>
>
>>>ERROR: extent[366498091008, 134217728] referencer count mismatch
>>>(root: 827, owner: 73782, offset: 134217728) wanted: 4, have: 26
>
>>>Complete output, with filtered btrfs-debug-tree info for extent
>>>address 366498091008.
>>>https://drive.google.com/open?id=1LUMtLIc1LcXhYN3twFcxTyFuJ6ScMlyQ
>
> Here is the image for that two device raid1:
>
> https://drive.google.com/open?id=1LO96omP3BQh9C3ZYVXoUDT4-H8P5lZ6D
>
By the way, this one is done with -ss. But it has many failures to
compute collisions, hundreds of files/dirs with 4 characters or less.

-- 
Chris Murphy


Re: btrfs check lowmem vs original

2017-11-13 Thread Chris Murphy
On Sun, Nov 12, 2017 at 9:37 PM, Qu Wenruo  wrote:

>
> The final bug about backref counts mismatch, without the image I'm not
> really sure what's going on.
>
> (But extent tree verification is really buggy though)

Ahh I imaged the wrong file system! Sorry about that:


>>ERROR: extent[366498091008, 134217728] referencer count mismatch
>>(root: 827, owner: 73782, offset: 134217728) wanted: 4, have: 26

>>Complete output, with filtered btrfs-debug-tree info for extent
>>address 366498091008.
>>https://drive.google.com/open?id=1LUMtLIc1LcXhYN3twFcxTyFuJ6ScMlyQ

Here is the image for that two device raid1:

https://drive.google.com/open?id=1LO96omP3BQh9C3ZYVXoUDT4-H8P5lZ6D



-- 
Chris Murphy


Re: btrfs-image hash collision option, super slow

2017-11-13 Thread Chris Murphy
On Mon, Nov 13, 2017 at 1:02 AM, Piotr Pawłow  wrote:
> On 13.11.2017 at 04:42, Chris Murphy wrote:
>> Strange. I was using 4.3.3 and it had been running for over 9 hours at
>> the time I finally cancelled it.
>
> If you're compiling from source, the usual advice would be to "make clean" 
> and make sure you're using the correct executable.

Interesting, using btrfs-progs-4.13.3-1.fc28.x86_64, it goes super
fast with -ss (minutes). But when using 4.3.3 built myself (nothing
special, autogen, configure, make) it's super slow, hours.




-- 
Chris Murphy


Re: [PATCH] Btrfs: bail out gracefully rather than BUG_ON

2017-11-13 Thread David Sterba
On Mon, Oct 30, 2017 at 11:14:38AM -0600, Liu Bo wrote:
> If a file's DIR_ITEM key is invalid (due to memory errors) and gets
> written to disk, a future lookup_path can end up with kernel panic due
> to BUG_ON().
> 
> This gets rid of the BUG_ON(), meanwhile output the corrupted key and
> return ENOENT if it's invalid.
> 
> Signed-off-by: Liu Bo 

Reviewed-by: David Sterba 

> ---
> The diff doesn't show the logic well, 'goto out_err' will return with
> assigning 0 to location->objectid, and the caller already has a check
> for (location->objectid == 0) to return -ENOENT.

Feel free to send a cleanup patch.
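The control flow described in the patch notes can be modelled in a few lines of userspace C. This is a sketch, not the patch: `struct key_model`, `inode_by_name_model()` and `lookup_model()` are hypothetical names, and treating a location key as valid only when its type is an inode item or a root item is an assumption about what "invalid" means here.

```c
#include <errno.h>
#include <stdint.h>
#include <stdio.h>

/* Simplified stand-in for struct btrfs_key. */
struct key_model {
	uint64_t objectid;
	uint8_t type;
	uint64_t offset;
};

/* btrfs on-disk key type values for inode items and root items. */
#define BTRFS_INODE_ITEM_KEY 1
#define BTRFS_ROOT_ITEM_KEY 132

/*
 * Lookup step: a DIR_ITEM location should point at an inode or a
 * subvolume root. Anything else is reported and turned into
 * objectid 0 instead of hitting BUG_ON().
 */
static void inode_by_name_model(const struct key_model *found,
				struct key_model *location)
{
	if (found->type != BTRFS_INODE_ITEM_KEY &&
	    found->type != BTRFS_ROOT_ITEM_KEY) {
		fprintf(stderr, "corrupted location key (%llu %u %llu)\n",
			(unsigned long long)found->objectid,
			(unsigned)found->type,
			(unsigned long long)found->offset);
		location->objectid = 0;
		return;
	}
	*location = *found;
}

/* Caller side: objectid 0 already means "not found", so it maps to -ENOENT. */
static int lookup_model(const struct key_model *found)
{
	struct key_model location = { 0 };

	inode_by_name_model(found, &location);
	if (location.objectid == 0)
		return -ENOENT;
	return 0;
}
```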


Re: [PATCH 1/4] Btrfs: introduce device flags

2017-11-13 Thread David Sterba
On Wed, Nov 08, 2017 at 11:46:13AM -0800, Liu Bo wrote:
> On Mon, Nov 06, 2017 at 05:40:25PM +0100, David Sterba wrote:
> > > + Faulty, /* device is known to have a fault */
> > > + In_sync,/* device is in_sync with rest of array */
> > Enums usually are all caps, and with some prefix.
> Well, copied from md... I'll make it all caps, but no prefix name
> seems better to me.

There are 19 .c files including volumes.h, so this needs a prefix. We
should keep the naming consistent within btrfs; the references to MD can
go into a comment.
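One way to follow that suggestion, sketched in plain C. The `BTRFS_DEV_STATE_*` names are hypothetical, the md names move into comments, and the helpers are userspace stand-ins for the kernel's atomic `set_bit()`/`clear_bit()`/`test_bit()` on an `unsigned long` bitmap.

```c
#include <stdbool.h>

/* Hypothetical prefixed bit numbers; the md origin is kept in the comments. */
enum btrfs_dev_state_bit {
	BTRFS_DEV_STATE_FAULTY,  /* device is known to have a fault (md: Faulty) */
	BTRFS_DEV_STATE_IN_SYNC, /* device is in sync with the rest of the array (md: In_sync) */
};

/* Non-atomic userspace stand-ins for set_bit()/clear_bit()/test_bit(). */
static void dev_state_set(unsigned long *state, enum btrfs_dev_state_bit bit)
{
	*state |= 1UL << bit;
}

static void dev_state_clear(unsigned long *state, enum btrfs_dev_state_bit bit)
{
	*state &= ~(1UL << bit);
}

static bool dev_state_test(unsigned long state, enum btrfs_dev_state_bit bit)
{
	return (state >> bit) & 1UL;
}
```

A shared prefix keeps the names from colliding with anything else pulled in by the many files that include volumes.h.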


Re: [PATCH] Btrfs: set plug for fsync

2017-11-13 Thread David Sterba
On Thu, Nov 09, 2017 at 05:16:16PM -0700, Liu Bo wrote:
> Setting plug can merge adjacent IOs before dispatching IOs to the disk
> driver.
> 
> Without plug, it'd not be a problem for single disk usecases, but for
> multiple disks using raid profile, a large IO can be split to several
> IOs of stripe length, and plug can be helpful to bring them together
> for each disk so that we can save several disk access.
> 
> Moreover, fsync issues synchronous writes, so plug can really take
> effect.
> 
> Signed-off-by: Liu Bo 
> ---
>  fs/btrfs/file.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
> index e43da6c..063180b 100644
> --- a/fs/btrfs/file.c
> +++ b/fs/btrfs/file.c
> @@ -2018,10 +2018,13 @@ int btrfs_release_file(struct inode *inode, struct file *filp)
>  static int start_ordered_ops(struct inode *inode, loff_t start, loff_t end)
>  {
>   int ret;
> + struct blk_plug plug;
>  
> + blk_start_plug(&plug);

Can you please add a comment here? Essentially repeating the commit
message. The plug/unplug calls are never obvious so the expectations
could be documented.

>   atomic_inc(&BTRFS_I(inode)->sync_writers);
>   ret = btrfs_fdatawrite_range(inode, start, end);
>   atomic_dec(&BTRFS_I(inode)->sync_writers);
> + blk_finish_plug(&plug);
>  
>   return ret;
>  }
> -- 
> 2.9.4
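To make the expectation behind the plug concrete, here is a rough userspace model of the multi-disk case from the commit message. It is only an assumption-laden sketch: `struct io_req` and `flush_plug()` are made-up names, and the real block layer merge logic (elevators, size limits, per-queue plugging) is far more involved. Requests collected while plugged are sorted, and back-to-back ranges on the same device are merged before dispatch.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* One queued IO: target device, byte offset on that device, length. */
struct io_req { int dev; uint64_t off; uint64_t len; };

static int cmp_req(const void *a, const void *b)
{
	const struct io_req *x = a, *y = b;

	if (x->dev != y->dev)
		return x->dev < y->dev ? -1 : 1;
	if (x->off != y->off)
		return x->off < y->off ? -1 : 1;
	return 0;
}

/*
 * Model of what unplugging allows: sort the queued requests and merge
 * contiguous ranges on the same device. Merges in place and returns
 * the number of requests that would actually be dispatched.
 */
static size_t flush_plug(struct io_req *reqs, size_t n)
{
	size_t out = 0;

	if (n == 0)
		return 0;
	qsort(reqs, n, sizeof(*reqs), cmp_req);
	for (size_t i = 1; i < n; i++) {
		struct io_req *last = &reqs[out];

		if (reqs[i].dev == last->dev &&
		    reqs[i].off == last->off + last->len)
			last->len += reqs[i].len;
		else
			reqs[++out] = reqs[i];
	}
	return out + 1;
}
```

For example, a 256KiB write striped in 64KiB pieces across two devices arrives as four requests; after the flush, each device sees one 128KiB request.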


[GIT PULL] Btrfs changes for 4.15

2017-11-13 Thread David Sterba
Hi,

please pull the following btrfs changes. There are some new user features and
the usual load of invisible enhancements or cleanups. The branch merges
cleanly, has been frozen in case rc7 was the last one, so I send out the pull
request early. Thanks.


New features:

- extend mount options to specify zlib compression level, -o compress=zlib:9

- v2 of ioctl "extent to inode mapping", addressing a usecase where we want to
  retrieve more but inaccurate results and do the postprocessing in userspace,
  aiding defragmentation or deduplication tools

- populate compression heuristics logic, do data sampling and try to guess
  compressibility by: looking for repeated patterns, counting unique byte
  values and distribution, calculating Shannon entropy;
  this will need more benchmarking and possibly fine tuning, but the base
  should be good enough

- enable indexing for btrfs as lower filesystem in overlayfs

- speedup page cache readahead during send on large files
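The "unique byte values and distribution" part of the compression heuristic can be illustrated in userspace C. This is a model, not the kernel's code: the function names and the 90% coverage cut-off are assumptions, and the Shannon entropy step is left out. A sample dominated by a few byte values is a good compression candidate; one that needs most of the 256 values to cover 90% of it probably is not.

```c
#include <stddef.h>
#include <stdlib.h>

/* Number of distinct byte values present in the sample. */
static int byte_set_size(const unsigned char *buf, size_t len)
{
	unsigned char seen[256] = { 0 };
	int n = 0;

	for (size_t i = 0; i < len; i++) {
		if (!seen[buf[i]]) {
			seen[buf[i]] = 1;
			n++;
		}
	}
	return n;
}

/* Sort byte counts in descending order. */
static int cmp_desc(const void *a, const void *b)
{
	size_t x = *(const size_t *)a, y = *(const size_t *)b;

	return (x < y) - (x > y);
}

/* Smallest number of byte values that together cover at least ~90% of the sample. */
static int byte_core_set_size(const unsigned char *buf, size_t len)
{
	size_t counts[256] = { 0 };
	size_t covered = 0, target = len - len / 10;
	int n = 0;

	for (size_t i = 0; i < len; i++)
		counts[buf[i]]++;
	qsort(counts, 256, sizeof(counts[0]), cmp_desc);
	while (n < 256 && covered < target)
		covered += counts[n++];
	return n;
}
```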


Internal enhancements:

- more sanity checks of b-tree items when reading them from disk

- more EINVAL/EUCLEAN fixups, missing BLK_STS_* conversion, other errno or
  error handling fixes

- remove some homegrown IO-related logic, that's been obsoleted by core block
  layer changes (batching, plug/unplug, own counters)

- add ref-verify, optional debugging feature to verify extent reference
  accounting

- simplify code handling outstanding extents, make it more clear where and how
  the accounting is done

- make delalloc reservations per-inode, simplify the code and make the logic
  more straightforward

- extensive cleanup of delayed refs code


Notable fixes:

- fix send ioctl on 32bit with 64bit kernel


The branch top commit matches the signed tag for-4.15-tag.


The following changes since commit 0b07194bb55ed836c2cc7c22e866b87a14681984:

  Linux 4.14-rc7 (2017-10-29 13:58:38 -0700)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux.git for-4.15

for you to fetch changes up to d28e649a5c58b779b303c252c66ee84a0f2c3b32:

  btrfs: Fix bug for misused dev_t when lookup in dev state hash table. (2017-11-01 20:45:36 +0100)


Adam Borowski (1):
  btrfs: allow setting zlib compression level via :9

Allen Pais (1):
  btrfs: return -ENOMEM on allocation failure in btrfsic

Anand Jain (13):
  btrfs: declare TRACE_DEFINE_ENUM for each of show_flush_state enum
  btrfs: copy fsid to super_block s_uuid
  btrfs: undo writable superblocke when sprouting fails
  btrfs: fix BUG_ON in btrfs_init_new_device()
  btrfs: error out if btrfs_attach_transaction() fails
  btrfs: add_missing_dev() should return the actual error
  btrfs: fix EIO misuse to report missing degraded option
  btrfs: declare btrfs_report_missing_device() static
  btrfs: fix use of error or warning for missing device
  btrfs: use BLK_STS defines where needed
  btrfs: use need_full_stripe() in __btrfs_map_block()
  btrfs: fix false EIO for missing device
  btrfs: remove BUG_ON in btrfs_rm_dev_replace_free_srcdev()

Arnd Bergmann (1):
  btrfs: tree-checker: use %zu format string for size_t

Christophe JAILLET (1):
  btrfs: tests: Fix a memory leak in error handling path in 'run_test()'

Christos Gkekas (2):
  btrfs: Clean up dead code in root-tree
  btrfs: Clean up unused variables in free-space-tree.c

Colin Ian King (2):
  btrfs: avoid null pointer dereference on fs_info when calling btrfs_crit
  btrfs: make array types static const, reduces object code size

David Sterba (4):
  btrfs: scrub: get rid of sector_t
  btrfs: rename page offset parameter in submit_extent_page
  btrfs: get rid of sector_t and use u64 offset in submit_extent_page
  btrfs: allow to set compression level for zlib

Goldwyn Rodrigues (1):
  btrfs: cleanup extent locking sequence

Gu JinXiang (2):
  btrfs: Use bd_dev to generate index when dev_state_hashtable add items.
  btrfs: Fix bug for misused dev_t when lookup in dev state hash table.

Hans van Kranenburg (1):
  btrfs: prefix sysfs attribute struct names

Josef Bacik (22):
  btrfs: change how we decide to commit transactions during flushing
  btrfs: fix send ioctl on 32bit with 64bit kernel
  btrfs: add ref-verify mount option
  btrfs: pass root to various extent ref mod functions
  Btrfs: add a extent ref verify tool
  Btrfs: only check delayed ref usage in should_end_transaction
  btrfs: add a helper to return a head ref
  btrfs: move extent_op cleanup to a helper
  btrfs: breakout empty head cleanup to a helper
  btrfs: move ref_mod modification into the if (ref) logic
  btrfs: move all ref head cleanup to the helper function
  btrfs: remove delayed_ref_node from ref_head
  btrfs: remove type argument from 


Re: discard on SSDs quickly causes backup trees to vanish

2017-11-13 Thread Austin S. Hemmelgarn

On 2017-11-11 19:28, Qu Wenruo wrote:
>
>
> On 2017-11-12 04:12, Hans van Kranenburg wrote:
>> Hi,
>>
>> On 11/11/2017 04:48 AM, Qu Wenruo wrote:
>>>
>>> On 2017-11-11 11:13, Hans van Kranenburg wrote:
>>>> On 11/11/2017 03:30 AM, Qu Wenruo wrote:
>>>>>
>>>>
>>>>> One more chance to recover is never a bad idea.
>>>>
>>>> It is a bad idea. The *only* case you can recover from is when you
>>>> freeze the filesystem *directly* after writing the superblock. Only in
>>>> that case you have both a consistent last committed and previous
>>>> transaction on disk.
>>>
>>> You're talking about the ideal case.
>>>
>>> The truth is, we're living in a real world where every software has
>>> bugs. And that's why sometimes we get transid error.
>>>
>>> So keeps the backup root still makes sense.
>>>
>>> And further more, different trees have different update frequency.
>>> For root and extent tree, they get updated every transaction, while for
>>> chunk tree it's seldom updated.
>>>
>>> And backup roots are updated per transaction, which means we may have a
>>> high chance to recover at least chunk root and to know the chunk map and
>>> possible to grab some data.
>>
>>  That's entirely right yes. But "possible to grab some data" is a
>> whole different thing than "getting the filesystem back into a fully
>> functional consistent state..."
>>
>>  So it's about expectation management for end users. If the user
>> thinks "Ha! A backup! That's nice of btrfs, it keeps them so I can go
>> back!.", then the user will get disappointed when the backups are unusable.
>
> Without discard, user should be able to rollback to previous transaction
> (backup_root[0])
Unless BTRFS is going out of its way to ensure this, that's not
necessarily true.  I'm fairly certain that we try to reuse empty space
in already allocated chunks before allocating new ones, which would mean
that there's a reasonable chance on a filesystem that's got the proper
ratio of metadata and data chunks and has very little slack space in the
metadata chunks that the old transactions will get overwritten pretty
quickly (possibly immediately).
>
> The last transaction committed with commit_root and root->node switched,
> and as I stated in previous mail, until this swtich, commit_root must be
> fully available.
>
> And after the last transaction there is no modification (since the last
> trans is for unmount), so backuproot[0] should be fully accessible.
>
> Discard can break it unless we have method to trace tree block space
> usage for at least 2 transactions.
>
>>
>> The design of btrfs is that all metadata tree blocks and data extent
>> space that is not used by the last completed transaction are freed to be
>> reused, as soon as possible. For cow-only roots (e.g. root tree, extent
>> tree) this is already done immediately in the transaction code after
>> writing the super block (btrfs_finish_extent_commit, discard is also
>> immediately triggered), and for reference counted roots (subvolume
>> roots) the cleaner will asap do it.
>>
>> So, the design gives zero guarantee that following a backup root will
>> work. But, it's better than nothing when trying to scrape some data off
>> of the borken filesystem.
>
> Again, only for discard.
>
>>
>>  Maybe it's enough to change man 5 btrfs with the mount options with
>> a warning for the usebackuproot option to let the user know that doing
>> this might result in a mountable filesystem, but that even in case it
>> does, the result should only be used to get as much data as possible off
>> of it before doing mkfs again. Or, if it succeeds, and if also umounting
>> again and running a full btrfsck and scrub to check all metadata and
>> data succeeds, the user might be pretty confident that nothing
>> referenced by the previous backuproot was already overwritten with new
>> data, in which case the filesystem can be continued to be used.
>>
>> But it puts usebackuproot very much in the same department where tools
>> like btrfs-restore live.
>
> Isn't it the original design?
> No one sane would use it for daily usage and it's original called
> "recovery", I don't see any problem here.
I agree on this point, it's not something regular users should be using,
but we don't really need to tell most people that.  The only ones I can
see being a potential issue are those who actually read the
documentation but don't really have a good understanding of computers,
which in my experience is usually less than 1% of users in most cases.
>
>>
>> If you do new writes and then again are able to mount with -o
>> usebackuproot and if any of the
>> transaction-before-the-last-committed-transaction blocks are overwritten
>> you're in a field of land mines and time bombs. Being able to mount
>> gives a false sense of recovery to the user in that case, because either
>> you're gonna crash into transid problems for metadata, or there are
>> files in the filesystem in which different data shows up than should,
>> potentially allowing users to see data from other users etc... It's just
>> dangerous.
>
> As you can see, if metadata CoW is completely implemented as designed,
> there will be no report of transid mismatch at all.
> And btrfs should be bullet proof from the very 

Re: discard on SSDs quickly causes backup trees to vanish

2017-11-13 Thread Qu Wenruo


On 2017-11-13 20:57, Austin S. Hemmelgarn wrote:
> On 2017-11-11 19:28, Qu Wenruo wrote:
>>
>>
>> On 2017-11-12 04:12, Hans van Kranenburg wrote:
>>> Hi,
>>>
>>> On 11/11/2017 04:48 AM, Qu Wenruo wrote:
>>>>
>>>> On 2017-11-11 11:13, Hans van Kranenburg wrote:
>>>>> On 11/11/2017 03:30 AM, Qu Wenruo wrote:
>>>>>>
>>>>>
>>>>>> One more chance to recover is never a bad idea.
>>>>>
>>>>> It is a bad idea. The *only* case you can recover from is when you
>>>>> freeze the filesystem *directly* after writing the superblock. Only in
>>>>> that case you have both a consistent last committed and previous
>>>>> transaction on disk.
>>>>
>>>> You're talking about the ideal case.
>>>>
>>>> The truth is, we're living in a real world where every software has
>>>> bugs. And that's why sometimes we get transid error.
>>>>
>>>> So keeps the backup root still makes sense.
>>>>
>>>> And further more, different trees have different update frequency.
>>>> For root and extent tree, they get updated every transaction, while for
>>>> chunk tree it's seldom updated.
>>>>
>>>> And backup roots are updated per transaction, which means we may have a
>>>> high chance to recover at least chunk root and to know the chunk map
>>>> and
>>>> possible to grab some data.
>>>
>>>  That's entirely right yes. But "possible to grab some data" is a
>>> whole different thing than "getting the filesystem back into a fully
>>> functional consistent state..."
>>>
>>>  So it's about expectation management for end users. If the user
>>> thinks "Ha! A backup! That's nice of btrfs, it keeps them so I can go
>>> back!.", then the user will get disappointed when the backups are
>>> unusable.
>>
>> Without discard, user should be able to rollback to previous transaction
>> (backup_root[0])
> Unless BTRFS is going out of it's way to ensure this, that's not
> necessarily true.  I'm fairly certain that we try to reuse empty space
> in already allocated chunks before allocating new ones, which would mean
> that there's a reasonable chance on a filesystem that's got the proper
> ratio of metadata and data chunks and has very little slack space in the
> metadata chunks that the old transactions will get overwritten pretty
> quickly (possibly immediately).

Then btrfs metadata would be as fragile as butter,
since the only thing that lets btrfs survive a power loss is its
metadata CoW.
If the previous (committed) transaction gets modified before the current
one is fully committed, a power loss means losing data.

I'll add a new sanity check to see if this is true.
If it turns out btrfs already has such protection, it's just another
sanity test.
If not, at least we will find something to fix and learn why
btrfs is not bulletproof against power loss.
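A toy model of the behaviour discussed in this thread: blocks freed by COW stay pinned until the transaction commits, then become reusable (the step Hans attributes to btrfs_finish_extent_commit). It assumes a first-fit allocator over a handful of blocks and ignores discard, extents and free-space caching; `struct toy_fs`, `cow_block()` and `commit_transaction()` are made-up names.

```c
#define NBLOCKS 8

/* Each block is free, used by the current trees, or pinned until commit. */
enum blk_state { BLK_FREE, BLK_USED, BLK_PINNED };

struct toy_fs { enum blk_state st[NBLOCKS]; };

static void toy_fs_init(struct toy_fs *fs)
{
	for (int i = 0; i < NBLOCKS; i++)
		fs->st[i] = BLK_FREE;
}

/*
 * COW a tree block: write the new copy into the first free block and pin
 * the old one (old < 0 means there is no previous copy). Returns the new
 * block number, or -1 if nothing is free.
 */
static int cow_block(struct toy_fs *fs, int old)
{
	for (int i = 0; i < NBLOCKS; i++) {
		if (fs->st[i] == BLK_FREE) {
			fs->st[i] = BLK_USED;
			if (old >= 0)
				fs->st[old] = BLK_PINNED;
			return i;
		}
	}
	return -1;
}

/* After the superblock is written, pinned blocks become free for reuse. */
static void commit_transaction(struct toy_fs *fs)
{
	for (int i = 0; i < NBLOCKS; i++)
		if (fs->st[i] == BLK_PINNED)
			fs->st[i] = BLK_FREE;
}
```

Under these assumptions the root committed by the previous transaction stays intact while the next transaction runs, but the blocks of any older root are reused right away, which is one way older backup roots become unusable.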

Thanks,
Qu

>>
>> The last transaction committed with commit_root and root->node switched,
>> and as I stated in previous mail, until this swtich, commit_root must be
>> fully available.
>>
>> And after the last transaction there is no modification (since the last
>> trans is for unmount), so backuproot[0] should be fully accessible.
>>
>> Discard can break it unless we have method to trace tree block space
>> usage for at least 2 transactions.
>>
>>>
>>> The design of btrfs is that all metadata tree blocks and data extent
>>> space that is not used by the last completed transaction are freed to be
>>> reused, as soon as possible. For cow-only roots (e.g. root tree, extent
>>> tree) this is already done immediately in the transaction code after
>>> writing the super block (btrfs_finish_extent_commit, discard is also
>>> immediately triggered), and for reference counted roots (subvolume
>>> roots) the cleaner will asap do it.
>>>
>>> So, the design gives zero guarantee that following a backup root will
>>> work. But, it's better than nothing when trying to scrape some data off
>>> of the borken filesystem.
>>
>> Again, only for discard.
>>
>>>
>>>  Maybe it's enough to change man 5 btrfs with the mount options with
>>> a warning for the usebackuproot option to let the user know that doing
>>> this might result in a mountable filesystem, but that even in case it
>>> does, the result should only be used to get as much data as possible off
>>> of it before doing mkfs again. Or, if it succeeds, and if also umounting
>>> again and running a full btrfsck and scrub to check all metadata and
>>> data succeeds, the user might be pretty confident that nothing
>>> referenced by the previous backuproot was already overwritten with new
>>> data, in which case the filesystem can be continued to be used.
>>>
>>> But it puts usebackuproot very much in the same department where tools
>>> like btrfs-restore live.
>>
>> Isn't it the original design?
>> No one sane would use it for daily usage and it's original called
>> "recovery", I don't see any problem here.
> I agree on this point, it's not something regular users should be using,
> but we don't really need to tell most people that.  The only ones I can
> see being a potential issue 

Re: [PATCH 0/4] Lowmem mode btrfs fixes exposed by complex tree

2017-11-13 Thread Qu Wenruo


On 2017-11-13 15:34, Qu Wenruo wrote:
> The patchset (along with "backref lost" bug fixes and test cases) can be
> fetched from github:
> https://github.com/adam900710/btrfs-progs/tree/lowmem_fix
> 
> Despite the backref lost false alerts reported by Chris Murphy, there
> are still some other bugs to be fixed.
> 
> One is also exposed by Chris Murphy, where btrfs-progs backref can't
> handle shared block ref for metadata. Fix by 1st patch.
> 
> And 2 more bugs exposed by the test image which is originally designed
> for the bug fixed by 1st patch.
> 
> Last but not the least, here comes the test image.
> Which is an image with a lot of metadata and under a relocation.
> It is definitely a bomb for old lowmem check.
> 
> Qu Wenruo (4):
>   btrfs-progs: backref: Allow backref walk to handle direct parent ref
>   btrfs-progs: lowmem check: Fix function call stack overflow caused by
> wrong tree reloc tree detection
>   btrfs-progs: lowmem check: Fix false alerts for image with shared
> block ref only backref
>   btrfs-progs: fsck-test: Add new image with shared block ref only
> metadata backref

The last patch is a little big.

Even though the image is dumped with -c9, it still takes nearly 300KiB.

Anyway, it's a binary patch, so submitting it to the mailing list doesn't
help much with review.

If anyone wants to test it or just look at the image, please fetch it
from github.

Thanks,
Qu

> 
>  backref.c  |   3 ++
>  cmds-check.c   |  35 
> +
>  .../020-extent-ref-cases/shared_block_ref_only.img | Bin 0 -> 304128 bytes
>  3 files changed, 32 insertions(+), 6 deletions(-)
>  create mode 100644 
> tests/fsck-tests/020-extent-ref-cases/shared_block_ref_only.img
> 





Re: btrfs-image hash collision option, super slow

2017-11-13 Thread Piotr Pawłow
On 13.11.2017 at 04:42, Chris Murphy wrote:
> Strange. I was using 4.3.3 and it had been running for over 9 hours at
> the time I finally cancelled it.

If you're compiling from source, the usual advice would be to "make clean" and 
make sure you're using the correct executable.

If your fs is very large then caching may be a problem. With the brute force 
method it was a good idea to cache the results. With the "reverse crc" method 
caching is not very useful. It's only marginally faster on my root fs, and on 
larger filesystems searching the cache will be slower than finding collisions. 
You can remove the code and see if it makes any difference:

diff --git a/image/main.c b/image/main.c
index 4cffbdba..f77b1504 100644
--- a/image/main.c
+++ b/image/main.c
@@ -500,19 +500,9 @@ static char *find_collision(struct metadump_struct *md, char *name,
    u32 name_len)
 {
    struct name *val;
-   struct rb_node *entry;
-   struct name tmp;
    int found;
    int i;
 
-   tmp.val = name;
-   tmp.len = name_len;
-   entry = tree_search(&md->name_tree, &tmp.n, name_cmp, 0);
-   if (entry) {
-   val = rb_entry(entry, struct name, n);
-   free(name);
-   return val->sub;
-   }
 
    val = malloc(sizeof(struct name));
    if (!val) {
@@ -548,7 +538,6 @@ static char *find_collision(struct metadump_struct *md, char *name,
    }
    }
 
-   tree_insert(&md->name_tree, &val->n, name_cmp);
    return val->sub;
 }

>
>> Unfortunately there are no CRC32 collisions for any file name having 4 or 
>> less characters when you have to keep the same file name length, and there 
>> may be no collisions for longer file names when you limit the result to 
>> ASCII only.
> Gotcha.

Yeah, it also means for short file names an attacker can easily find the real 
name by finding all collisions and filtering out obviously nonsensical names, 
so it's more of an obfuscation than sanitization :/
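Why short names fail can be checked directly. The sketch below assumes btrfs hashes directory entry names with CRC-32C seeded with `~1` (as `btrfs_name_hash()` does, to my knowledge) and uses a plain bitwise CRC-32C; `name_hash()` and `count_two_byte_collisions()` are illustrative names. For inputs of 32 bits or fewer, CRC is an injective affine map over GF(2), so two different names of the same length up to 4 bytes can never share a hash. The helper confirms this exhaustively for all 2-byte names.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Bitwise CRC-32C (Castagnoli, reflected polynomial 0x82F63B78), no final XOR. */
static uint32_t crc32c_update(uint32_t crc, const void *data, size_t len)
{
	const unsigned char *p = data;

	while (len--) {
		crc ^= *p++;
		for (int k = 0; k < 8; k++)
			crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
	}
	return crc;
}

/* Assumed to mirror btrfs_name_hash(): CRC-32C seeded with ~1. */
static uint32_t name_hash(const void *name, size_t len)
{
	return crc32c_update(~1u, name, len);
}

static int cmp_u32(const void *a, const void *b)
{
	uint32_t x = *(const uint32_t *)a, y = *(const uint32_t *)b;

	return (x > y) - (x < y);
}

/* Hash every possible 2-byte name, sort, and count equal neighbours. */
static long count_two_byte_collisions(void)
{
	uint32_t *h = malloc(65536 * sizeof(*h));
	long dups = 0;

	if (!h)
		return -1;
	for (unsigned v = 0; v < 65536; v++) {
		unsigned char name[2] = { (unsigned char)(v >> 8), (unsigned char)(v & 0xff) };

		h[v] = name_hash(name, 2);
	}
	qsort(h, 65536, sizeof(*h), cmp_u32);
	for (size_t i = 1; i < 65536; i++)
		dups += h[i] == h[i - 1];
	free(h);
	return dups;
}
```

Names of 5 or more bytes have more input bits than hash bits, so same-length collisions must exist there, although restricting the result to printable ASCII can still rule them out, as noted above.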
