[PATCH 2/2] btrfs-progs: dump-tree: Also output log root tree

2017-03-02 Thread Qu Wenruo
In btrfs-dump-tree, we output any existing log trees, but we don't
output the log root tree, which records the root items for all log
trees.

This makes it confusing for anyone who wants to know where a log tree
comes from.

Signed-off-by: Qu Wenruo 
---
 cmds-inspect-dump-tree.c | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/cmds-inspect-dump-tree.c b/cmds-inspect-dump-tree.c
index 2c6bec7f..eca91b5e 100644
--- a/cmds-inspect-dump-tree.c
+++ b/cmds-inspect-dump-tree.c
@@ -344,6 +344,9 @@ int cmd_inspect_dump_tree(int argc, char **argv)
printf("chunk tree: %llu level %d\n",
 (unsigned long long)info->chunk_root->node->start,
 btrfs_header_level(info->chunk_root->node));
+   printf("log root tree: %llu level %d\n",
+   (unsigned long long)info->log_root_tree->node->start,
+   btrfs_header_level(info->log_root_tree->node));
} else {
if (info->tree_root->node) {
printf("root tree\n");
@@ -356,6 +359,12 @@ int cmd_inspect_dump_tree(int argc, char **argv)
btrfs_print_tree(info->chunk_root,
 info->chunk_root->node, 1);
}
+
+   if (info->log_root_tree) {
+   printf("log root tree\n");
+   btrfs_print_tree(info->log_root_tree,
+   info->log_root_tree->node, 1);
+   }
}
}
tree_root_scan = info->tree_root;
@@ -388,6 +397,17 @@ again:
goto close_root;
}
 
+   if (tree_id && tree_id == BTRFS_TREE_LOG_OBJECTID) {
+   if (!info->log_root_tree) {
+   error("cannot print log root tree, invalid pointer");
+   goto close_root;
+   }
+   printf("log root tree\n");
+   btrfs_print_tree(info->log_root_tree, info->log_root_tree->node,
+1);
+   goto close_root;
+   }
+
key.offset = 0;
key.objectid = 0;
key.type = BTRFS_ROOT_ITEM_KEY;
-- 
2.12.0
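
For illustration, a hedged sketch of what the extra output would look
like on a filesystem with an unsynced log; the device name and block
numbers below are invented:

  # btrfs inspect-internal dump-tree /dev/sdb1
  ...
  log root tree
  node 30408704 level 0 items 1 free 120 generation 7 ...
  ...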



Re: raid1 degraded mount still produce single chunks, writeable mount not allowed

2017-03-02 Thread Duncan
Peter Grandi posted on Fri, 03 Mar 2017 00:47:46 + as excerpted:

>> [ ... ] Meanwhile, the problem as I understand it is that at the first
>> raid1 degraded writable mount, no single-mode chunks exist, but without
>> the second device, they are created.  [ ... ]
> 
> That does not make any sense, unless there is a fundamental mistake in
> the design of the 'raid1' profile, which this and other situations make
> me think is a possibility: that the category of "mirrored" 'raid1' chunk
> does not exist in the Btrfs chunk manager. That is, a chunk is either
> 'raid1' if it has a mirror, or if it has no mirror it must be 'single'.
> 
> If a member device of a 'raid1' profile multidevice volume disappears
> there will be "unmirrored" 'raid1' profile chunks and some code path
> must recognize them as such, but the logic of the code does not allow
> their creation. Question: how does the code know that a specific 'raid1'
> chunk is mirrored or not? The chunk must have a link (member, offset) to
> its mirror, does it?

The problem at the surface level is that raid1 chunks MUST be created 
with two copies, one each on two different devices.  It is (currently) 
not allowed to create only a single copy of a raid1 chunk, so once you 
have only a single device, raid1 chunks cannot be created.

Which presents a problem when you're trying to recover, needing a 
writable mount in order to be able to do a device replace or add/remove 
(with the remove triggering a balance), because btrfs is COW, so any 
changes get written to new locations, which requires chunk space that 
might not be available in the currently allocated chunks.

To work around that, they allowed the chunk allocator to fall back to 
single mode when it couldn't create raid1.

Which is fine as long as the recovery is completed in the same mount.  
But if you unmount or crash and try to remount to complete the job after 
those single-mode chunks have been created, oops!  Single mode chunks on 
a multi-device filesystem with a device missing, and the logic currently 
isn't sophisticated enough to realize that all the chunks are actually 
accounted for, so it forces read-only mounting to prevent further damage.

Which means you can copy off the files to a different filesystem as 
they're still all available, including any written in single-mode, but 
you can't fix the degraded filesystem any longer, as that requires a 
writable mount you're not going to be able to get, at least not with 
mainline.
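
A minimal sketch of that copy-off path, assuming /dev/sdb is the 
surviving raid1 member and /mnt/rescue is on a separate, healthy 
filesystem (names invented):

  mount -o degraded,ro /dev/sdb /mnt/broken
  cp -a /mnt/broken/. /mnt/rescue/
  umount /mnt/broken
  # the stuck filesystem can then only be recreated from scratch:
  mkfs.btrfs -f /dev/sdb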


At a lower level, the problem is that for raid1 (and I think raid10 as 
well tho I'm not sure on it), they made a mistake in the implementation.

For raid56, the minimum allowed writable devices is lower than the 
minimum number of devices for undegraded write, by the number of parity 
devices (so raid5 will allow two devices for undegraded write, one 
parity, one data, but one device for degraded write; raid6 will allow 
three devices for undegraded write, one data, two parity, or again, one 
device for degraded write).

But for raid1, both the degraded write minimum and the undegraded write 
minimum are set to *two* devices, an implementation error since the 
degraded write minimum should arguably be one device, without a mirror.

So the degrade to single-mode is a workaround for the real problem: not 
allowing degraded raid1 writes (that is, chunk creation).

And all this is known and has been discussed right here on this list by 
the devs, but nobody has actually bothered to properly fix it, either by 
correctly setting the degraded raid1 write minimum to a single device, or 
even by working around the single-mode workaround, by correctly checking 
each chunk and allowing writable mount if all are accounted for, even if 
there's a missing device.

Or rather, the workaround for the incomplete workaround has had a patch 
submitted, but it got stuck in that long-running project and has been in 
limbo ever since, and now I guess the patch has gone stale and doesn't 
even properly apply any longer.


All of which is yet more demonstration of the point stated time and 
again on this list: btrfs should be considered stabilizing, but still 
under heavy development and not yet fully stable, and backups should be 
kept updated and at hand for any data you value more highly than the 
bother and resources necessary to make those backups.

Because if there are backups updated and at hand, then what happens to 
the working copy doesn't matter.  And in this particular case, even if 
the backups aren't fully current, the fact that they're available means 
there's space available to update them from the working copy should it 
go into read-only mode as well, which means recovery from the read-only 
formerly working copy is no big deal.

Either that, or by definition, the data wasn't of enough value to have 
backups when storing it on a widely known to be still stabilizing and 
under heavy development filesystem, where those backups 

[PATCH] Btrfs: remove ASSERT in btrfs_truncate_inode_items

2017-03-02 Thread Liu Bo
After 76b42abbf748 ("Btrfs: fix data loss after truncate when using the
no-holes feature"), for either NO_HOLES or inline extents, we've set
last_size to new_size to avoid data loss after remount or after the
inode is evicted and read again.  Thus, we don't need this check
anymore.

Signed-off-by: Liu Bo 
---
 fs/btrfs/inode.c | 7 -------
 1 file changed, 7 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index ee6978d..5652f5f 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -4603,13 +4603,6 @@ int btrfs_truncate_inode_items(struct btrfs_trans_handle *trans,
 
btrfs_free_path(path);
 
-   if (err == 0) {
-   /* only inline file may have last_size != new_size */
-   if (new_size >= fs_info->sectorsize ||
-   new_size > fs_info->max_inline)
-   ASSERT(last_size == new_size);
-   }
-
if (be_nice && bytes_deleted > SZ_32M) {
unsigned long updates = trans->delayed_ref_updates;
if (updates) {
-- 
1.8.3.1



Re: [PATCH] Btrfs: fix file corruption after cloning inline extents

2017-03-02 Thread Liu Bo
On Thu, Mar 02, 2017 at 02:18:21PM -0800, Liu Bo wrote:
> On Tue, Jul 14, 2015 at 04:34:48PM +0100, fdman...@kernel.org wrote:
> > From: Filipe Manana 
> > 
> > Using the clone ioctl (or extent_same ioctl, which calls the same extent
> > cloning function as well) we end up allowing an inline extent to be copied
> > from the source file into a non-zero offset of the destination file. This
> > is something not expected and that the btrfs code is not prepared to deal
> > with - all inline extents must be at a file offset equal to 0.
> >
> 
> Somehow I failed to reproduce the BUG_ON with this case.
> 
> > For example, the following excerpt of a test case for fstests triggers
> > a crash/BUG_ON() on a write operation after an inline extent is cloned
> > into a non-zero offset:
> > 
> >   _scratch_mkfs >>$seqres.full 2>&1
> >   _scratch_mount
> > 
> >   # Create our test files. File foo has the same 2K of data at offset 4K
> >   # as file bar has at its offset 0.
> >   $XFS_IO_PROG -f -s -c "pwrite -S 0xaa 0 4K" \
> >   -c "pwrite -S 0xbb 4k 2K" \
> >   -c "pwrite -S 0xcc 8K 4K" \
> >   $SCRATCH_MNT/foo | _filter_xfs_io
> > 
> >   # File bar consists of a single inline extent (2K size).
> >   $XFS_IO_PROG -f -s -c "pwrite -S 0xbb 0 2K" \
> >  $SCRATCH_MNT/bar | _filter_xfs_io
> > 
> >   # Now call the clone ioctl to clone the extent of file bar into file
> >   # foo at its offset 4K. This made file foo have an inline extent at
> >   # offset 4K, something which the btrfs code can not deal with in future
> >   # IO operations because all inline extents are supposed to start at an
> >   # offset of 0, resulting in all sorts of chaos.
> >   # So here we validate that clone ioctl returns an EOPNOTSUPP, which is
> >   # what it returns for other cases dealing with inlined extents.
> >   $CLONER_PROG -s 0 -d $((4 * 1024)) -l $((2 * 1024)) \
> >   $SCRATCH_MNT/bar $SCRATCH_MNT/foo
> > 
> >   # Because of the inline extent at offset 4K, the following write made
> >   # the kernel crash with a BUG_ON().
> >   $XFS_IO_PROG -c "pwrite -S 0xdd 6K 2K" $SCRATCH_MNT/foo | _filter_xfs_io
> >
> 
> On 4.10, after allowing an inline extent to be cloned to a dst file offset
> greater than zero, I followed the test case manually and got these
> 
> [root@localhost trinity]# /home/btrfs-progs/btrfs-debugfs -f /mnt/btrfs/foo 
> (257 0): ram 4096 disk 12648448 disk_size 4096
> (257 4096): ram 2048 disk 0 disk_size 2048 -- inline
> (257 8192): ram 4096 disk 12656640 disk_size 4096
> file: /mnt/btrfs/foo extents 3 disk size 10240 logical size 12288 ratio 1.20
> 
> [root@localhost trinity]# xfs_io -f -c "pwrite 6k 2k" /mnt/btrfs/foo 
> wrote 2048/2048 bytes at offset 6144
> 2 KiB, 1 ops; 0. sec (12.520 MiB/sec and 6410.2564 ops/sec)
> 
> [root@localhost trinity]# sync
> [root@localhost trinity]# /home/btrfs-progs/btrfs-debugfs -f /mnt/btrfs/foo 
> (257 0): ram 4096 disk 12648448 disk_size 4096
> (257 4096): ram 4096 disk 12582912 disk_size 4096
> (257 8192): ram 4096 disk 12656640 disk_size 4096
> file: /mnt/btrfs/foo extents 3 disk size 12288 logical size 12288 ratio 1.00
> 
> 
> Looks like we are now able to cope with these inline extents?

I went back to test against v4.1 and v4.5, and it turns out that we got the
below BUG_ON() in MM and -EIO when writing to the inline extent.  That's
because, when writing to the page that covers the inline extent, we first
read the page to get an uptodate page for writing; in readpage(), for an
inline extent, btrfs_get_extent() always goes to search the fs tree to read
the inline data out to the page and then tries to insert an em, and -EEXIST
would be returned if there is an existing one.

However, after commit 8dff9c853410 ("Btrfs: deal with duplciates during
extent_map insertion in btrfs_get_extent"), we have that fixed, so now we can
read/write inline extents even when they've been mixed with other regular
extents.

But... I'm not 100% sure whether such files (mixing inline with regular
extents) would have any other problems besides read/write.  Let me know if
you can think of a corruption due to that.

Thanks,

-liubo
> 
> 
> Thanks,
> 
> -liubo
> 
> 
> >   status=0
> >   exit
> > 
> > The stack trace of the BUG_ON() triggered by the last write is:
> > 
> >   [152154.035903] [ cut here ]
> >   [152154.036424] kernel BUG at mm/page-writeback.c:2286!
> >   [152154.036424] invalid opcode:  [#1] PREEMPT SMP DEBUG_PAGEALLOC
> >   [152154.036424] Modules linked in: btrfs dm_flakey dm_mod crc32c_generic 
> > xor raid6_pq nfsd auth_rpcgss oid_registry nfs_acl nfs lockd grace fscache 
> > sunrpc loop fuse parport_pc acpi_cpu$
> >   [152154.036424] CPU: 2 PID: 17873 Comm: xfs_io Tainted: GW   
> > 4.1.0-rc6-btrfs-next-11+ #2
> >   [152154.036424] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), 
> > BIOS rel-1.8.1-0-g4adadbd-20150316_085822-nilsson.home.kraxel.org 04/01/2014
> >   [152154.036424] task: 880429f70990 ti: 880429efc000 task.ti: 

Re: assertion failed: last_size == new_size, file: fs/btrfs/inode.c

2017-03-02 Thread Liu Bo
On Thu, Mar 02, 2017 at 07:58:01AM -0800, Liu Bo wrote:
> On Wed, Mar 01, 2017 at 03:03:19PM -0500, Dave Jones wrote:
> > On Tue, Feb 28, 2017 at 05:12:01PM -0800, Liu Bo wrote:
> >  > On Mon, Feb 27, 2017 at 11:23:42AM -0500, Dave Jones wrote:
> >  > > On Mon, Feb 27, 2017 at 07:53:48AM -0800, Liu Bo wrote:
> >  > >  > On Sun, Feb 26, 2017 at 07:18:42PM -0500, Dave Jones wrote:
> >  > >  > > Hitting this fairly frequently.. I'm not sure if this is the same 
> > bug I've
> >  > >  > > been hitting occasionally since 4.9. The assertion looks new to 
> > me at least.
> >  > >  > >
> >  > >  > 
> >  > >  > It was recently introduced by my commit and used to catch data loss 
> > at truncate.
> >  > >  > 
> >  > >  > Were you running the test with a mkfs.btrfs -O NO_HOLES?
> >  > >  > (We just queued a fix for the NO_HOLES case in btrfs-next.)
> >  > > 
> >  > > No, a fs created with default mkfs.btrfs options.
> >  > 
> >  > I have this patch[1] to fix a bug which results in file hole extent, and 
> > this
> >  > bug could lead us to hit the assertion.
> >  > 
> >  > Would you try to run the test w/ it, please?
> >  > 
> >  > [1]: https://patchwork.kernel.org/patch/9597281/
> > 
> > Made no difference. Still see the same trace & assertion.
> 
> Some updates here, I've got it reproduced; somehow a corner case ends up
> with an inline file extent followed by some pre-alloc extents, and along
> the way, isize also got updated unexpectedly.  Will try to narrow it down.
>

I realized that btrfs can now tolerate files that mix inline extents with
regular extents, so we don't need this ASSERT() anymore; will send a patch to
remove it.

Thanks,

-liubo


Re: raid1 degraded mount still produce single chunks, writeable mount not allowed

2017-03-02 Thread Chris Murphy
On Thu, Mar 2, 2017 at 6:18 PM, Qu Wenruo  wrote:
>
>
> At 03/03/2017 09:15 AM, Chris Murphy wrote:
>>
>> [1805985.267438] BTRFS info (device dm-6): allowing degraded mounts
>> [1805985.267566] BTRFS info (device dm-6): disk space caching is enabled
>> [1805985.267676] BTRFS info (device dm-6): has skinny extents
>> [1805987.187857] BTRFS warning (device dm-6): missing devices (1)
>> exceeds the limit (0), writeable mount is not allowed
>> [1805987.228990] BTRFS error (device dm-6): open_ctree failed
>> [chris@f25s ~]$ sudo mount -o noatime,degraded,ro /dev/mapper/sdb /mnt
>> [chris@f25s ~]$ sudo btrfs fi df /mnt
>> Data, RAID1: total=434.00GiB, used=432.46GiB
>> Data, single: total=1.00GiB, used=1.66MiB
>> System, RAID1: total=8.00MiB, used=48.00KiB
>> System, single: total=32.00MiB, used=32.00KiB
>> Metadata, RAID1: total=2.00GiB, used=729.17MiB
>> Metadata, single: total=1.00GiB, used=0.00B
>> GlobalReserve, single: total=495.02MiB, used=0.00B
>> [chris@f25s ~]$
>>
>>
>>
>> So the sequence is:
>> 1. mkfs.btrfs -d raid1 -m raid1
>> 2. fill it with a bunch of data over a few months, always mounted
>> normally with default options
>> 3. physically remove 1 of 2 devices, and do a degraded mount. This
>> mounts without error, and more stuff is added. Volume is umounted.
>> 4. Try to mount the same 1 of 2 devices, with degraded mount option,
>> and I get the first error, "writeable mount is not allowed".
>> 5. Try to mount the same 1 of 2 devices, with degraded,ro option, and
>> it mounts, and then I captured the 'btfs fi df' above.
>>
>> So very clearly there are single chunks added during the degraded rw
>> mount.
>>
>> But does 1.66MiB of data in that single data chunk make sense? And
>> does 0.00 MiB of metadata in that single metadata chunk make sense?
>> I'm not sure, seems unlikely. Most of what happened in that subvolume
>> since the previous snapshot was moving things around, reorganizing,
>> not adding files. So, maybe 1.66MiB data added is possible? But
>> definitely the metadata changes must be in the raid1 chunks, while the
>> newly created single profile metadata chunk is left unused.
>>
>> So I think there's more than one bug going on here, separate problems
>> for data and metadata.
>
>
> IIRC I submitted a patch a long time ago to check each chunk to see if it's
> OK to mount in degraded mode.
>
> And in your case, it will allow RW degraded mount since the stripe of that
> single chunk is not missing.
>
> That patch was later merged into the hot-spare patchset, but AFAIK it will
> be a long, long time before hot-spare gets merged.
>
> So I'll update that patch and hope it can solve the problem.
>

OK thanks. Yeah I should have said that this is not a critical
situation for me. It's just a confusing situation.

In particular, people could do a btrfs replace; or do btrfs dev add,
then btrfs dev del missing, and what happens? There's some data
that's not replicated on the replacement drive because it's single
profile, and if that happens to be metadata it's possibly
unpredictable what happens when the drive with the single chunks dies.
At the very least there is going to be some data loss. It's entirely
possible the drive that's missing these single chunks can't be mounted
degraded. And for sure it's possible that it can't be used for
replication, when doing a device replace for the 1st device holding
the only copy of these single chunks.
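
For reference, the two recovery paths mentioned above would normally 
look like this (a sketch with invented device names; both need the 
writable mount that this bug ends up forbidding):

  # path 1: replace the missing device in one step
  # (2 is the devid of the missing device)
  btrfs replace start 2 /dev/sdc /mnt

  # path 2: add a new device, then drop the missing one
  btrfs device add /dev/sdc /mnt
  btrfs device delete missing /mnt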

Again, my data is fine. The problem I'm having is this:
https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/tree/Documentation/filesystems/btrfs.txt?id=refs/tags/v4.10.1

Which says in the first line, in part, "focusing on fault tolerance,
repair and easy administration", and quite frankly this sort of
enduring bug in a file system that's nearly 10 years old now renders
that statement misleading, and possibly dishonest. How do we describe
this file system as focusing on fault tolerance when, in the identical
scenario using mdadm or LVM raid, the user's data is not mishandled
like it is on Btrfs with multiple devices?



-- 
Chris Murphy


Re: [4.7.2] btrfs_run_delayed_refs:2963: errno=-17 Object already exists

2017-03-02 Thread Qu Wenruo



At 03/02/2017 05:43 PM, Marc Joliet wrote:

On Thursday 02 March 2017 08:43:53 Qu Wenruo wrote:

At 02/02/2017 08:01 PM, Marc Joliet wrote:

On Sunday 28 August 2016 15:29:08 Kai Krakow wrote:

Hello list!


Hi list


[kernel message snipped]


Btrfs --repair refused to repair the filesystem telling me something
about compressed extents and an unsupported case, wanting me to take an
image and send it to the devs. *sigh*


I haven't tried a repair yet; it's a big file system, and btrfs-check is
still running:

# btrfs check -p /dev/sdd2
Checking filesystem on /dev/sdd2
UUID: f97b3cda-15e8-418b-bb9b-235391ef2a38
parent transid verify failed on 3829276291072 wanted 224274 found 283858
parent transid verify failed on 3829276291072 wanted 224274 found 283858
parent transid verify failed on 3829276291072 wanted 224274 found 283858
parent transid verify failed on 3829276291072 wanted 224274 found 283858


Normal transid error; can't say whether it's harmless, but at least
something went wrong.


Ignoring transid failure
leaf parent key incorrect 3829276291072
bad block 3829276291072


That's somewhat of a big problem for that tree block.

If this tree block is an extent tree block, it's no wonder the kernel
output a warning and aborted the transaction.

You could try "btrfs-debug-tree -b 3829276291072 " to show the
content of the tree block.


# btrfs-debug-tree -b 3829276291072 /dev/sdb2
btrfs-progs v4.9
node 3829276291072 level 1 items 70 free 51 generation 292525 owner 2
fs uuid f97b3cda-15e8-418b-bb9b-235391ef2a38
chunk uuid 1cee580c-3442-4717-9300-8514dd8ff297
key (3828594696192 METADATA_ITEM 0) block 3828933423104 (934798199)
gen 292523
key (3828594925568 METADATA_ITEM 0) block 3829427818496 (934918901)
gen 292525
key (3828595109888 METADATA_ITEM 0) block 3828895723520 (934788995)
gen 292523
key (3828595232768 METADATA_ITEM 0) block 3829202751488 (934863953)
gen 292524
key (3828595412992 METADATA_ITEM 0) block 3829097209856 (934838186)
gen 292523
key (3828595572736 TREE_BLOCK_REF 33178) block 3829235073024
(934871844) gen 292524
key (3828595744768 METADATA_ITEM 0) block 3829128351744 (934845789)
gen 292524
key (3828595982336 METADATA_ITEM 0) block 3829146484736 (934850216)
gen 292524
key (3828596187136 METADATA_ITEM 1) block 3829097234432 (934838192)
gen 292523
key (3828596387840 TREE_BLOCK_REF 33527) block 3829301653504
(934888099) gen 292525
key (3828596617216 METADATA_ITEM 0) block 3828885737472 (934786557)
gen 292523
key (3828596838400 METADATA_ITEM 0) block 3828885741568 (934786558)
gen 292523
key (3828597047296 METADATA_ITEM 0) block 3829320552448 (934892713)
gen 292525
key (3828597231616 METADATA_ITEM 0) block 3828945653760 (934801185)
gen 292523
key (3828597383168 METADATA_ITEM 0) block 3829276299264 (934881909)
gen 292525
key (3828597641216 METADATA_ITEM 1) block 3829349351424 (934899744)
gen 292525
key (3828597866496 METADATA_ITEM 0) block 3829364776960 (934903510)
gen 292525
key (3828598067200 METADATA_ITEM 0) block 3828598321152 (934716387)
gen 292522
key (3828598259712 METADATA_ITEM 0) block 3829422968832 (934917717)
gen 292525
key (3828598415360 TREE_BLOCK_REF 33252) block 3828885803008
(934786573) gen 292523
key (3828598665216 METADATA_ITEM 0) block 3828937863168 (934799283)
gen 292523
key (3828598829056 METADATA_ITEM 0) block 3828885811200 (934786575)
gen 292523
key (3828599054336 METADATA_ITEM 0) block 3829363744768 (934903258)
gen 292525
key (3828599246848 METADATA_ITEM 0) block 3828915838976 (934793906)
gen 292523
key (3828599504896 METADATA_ITEM 0) block 3829436194816 (934920946)
gen 292525
key (3828599672832 METADATA_ITEM 0) block 3828905140224 (934791294)
gen 292523
key (3828599771136 METADATA_ITEM 0) block 382923776 (934895831)
gen 292525
key (3828599988224 METADATA_ITEM 0) block 3829087199232 (934835742)
gen 292523
key (3828600135680 METADATA_ITEM 0) block 3828885827584 (934786579)
gen 292523
key (3828600389632 METADATA_ITEM 0) block 3829436284928 (934920968)
gen 292525
key (3828600528896 METADATA_ITEM 0) block 3829316214784 (934891654)
gen 292525
key (3828600729600 METADATA_ITEM 0) block 3828885905408 (934786598)
gen 292523
key (3828600934400 METADATA_ITEM 0) block 3829384486912 (934908322)
gen 292525
key (3828601143296 METADATA_ITEM 0) block 3829423611904 (934917874)
gen 292525
key (3828601356288 METADATA_ITEM 0) block 3829113688064 (934842209)
gen 292524
key (3828601556992 METADATA_ITEM 0) block 3829134540800 (934847300)
gen 292524
key (3828601696256 METADATA_ITEM 0) block 3829181837312 (934858847)
gen 292524
key (3828601823232 METADATA_ITEM 0) block 3829157421056 (934852886)
gen 292524
key (3828602015744 TREE_BLOCK_REF 32943) block 3829316218880
(934891655) gen 292525
  

Re: raid1 degraded mount still produce single chunks, writeable mount not allowed

2017-03-02 Thread Qu Wenruo



At 03/03/2017 09:15 AM, Chris Murphy wrote:

[1805985.267438] BTRFS info (device dm-6): allowing degraded mounts
[1805985.267566] BTRFS info (device dm-6): disk space caching is enabled
[1805985.267676] BTRFS info (device dm-6): has skinny extents
[1805987.187857] BTRFS warning (device dm-6): missing devices (1)
exceeds the limit (0), writeable mount is not allowed
[1805987.228990] BTRFS error (device dm-6): open_ctree failed
[chris@f25s ~]$ sudo mount -o noatime,degraded,ro /dev/mapper/sdb /mnt
[chris@f25s ~]$ sudo btrfs fi df /mnt
Data, RAID1: total=434.00GiB, used=432.46GiB
Data, single: total=1.00GiB, used=1.66MiB
System, RAID1: total=8.00MiB, used=48.00KiB
System, single: total=32.00MiB, used=32.00KiB
Metadata, RAID1: total=2.00GiB, used=729.17MiB
Metadata, single: total=1.00GiB, used=0.00B
GlobalReserve, single: total=495.02MiB, used=0.00B
[chris@f25s ~]$



So the sequence is:
1. mkfs.btrfs -d raid1 -m raid1 

IIRC I submitted a patch a long time ago to check each chunk to see if 
it's OK to mount in degraded mode.


And in your case, it will allow RW degraded mount since the stripe of 
that single chunk is not missing.


That patch was later merged into the hot-spare patchset, but AFAIK it 
will be a long, long time before hot-spare gets merged.


So I'll update that patch and hope it can solve the problem.

Thanks,
Qu



Chris Murphy


Re: [4.7.2] btrfs_run_delayed_refs:2963: errno=-17 Object already exists

2017-03-02 Thread Qu Wenruo



At 03/02/2017 05:44 PM, Marc Joliet wrote:

On Wednesday 01 March 2017 19:14:07 Marc Joliet wrote:

In any case, I started btrfs-check on the device itself.


OK, it's still running, but the output so far is:

# btrfs check --mode=lowmem --progress /dev/sdb2
Checking filesystem on /dev/sdb2
UUID: f97b3cda-15e8-418b-bb9b-235391ef2a38
ERROR: shared extent[3826242740224 4096] lost its parent (parent:
3827251183616, level: 0)
ERROR: shared extent[3826442825728 4096] lost its parent (parent:
3827251183616, level: 0)
ERROR: shared extent[3826744471552 4096] lost its parent (parent:
3827251183616, level: 0)
ERROR: shared extent[3827106349056 4096] lost its parent (parent:
3827251183616, level: 0)
ERROR: shared extent[3827141001216 4096] lost its parent (parent:
3827251183616, level: 0)
ERROR: shared extent[3827150958592 4096] lost its parent (parent:
3827251183616, level: 0)
ERROR: shared extent[3827251724288 4096] lost its parent (parent:
3827251183616, level: 0)
ERROR: shared extent[3827433795584 4096] lost its parent (parent:
3827251183616, level: 0)
ERROR: shared extent[3827536166912 4096] lost its parent (parent:
3827251183616, level: 0)
ERROR: shared extent[3827536183296 4096] lost its parent (parent:
3827251183616, level: 0)
ERROR: shared extent[3827621646336 4096] lost its parent (parent:
3827251183616, level: 0)
ERROR: shared extent[3828179406848 4096] lost its parent (parent:
3827251183616, level: 0)
ERROR: shared extent[3828267970560 4096] lost its parent (parent:
3827251183616, level: 0)
ERROR: shared extent[3828284530688 4096] lost its parent (parent:
3827251183616, level: 0)
ERROR: shared extent[3828714246144 4096] lost its parent (parent:
3827251183616, level: 0)
ERROR: shared extent[3828794187776 4096] lost its parent (parent:
3827251183616, level: 0)
ERROR: shared extent[3829161340928 4096] lost its parent (parent:
3827251183616, level: 0)
ERROR: shared extent[3829373693952 4096] lost its parent (parent:
3827251183616, level: 0)
ERROR: shared extent[3830252130304 4096] lost its parent (parent:
3827251183616, level: 0)
ERROR: shared extent[3830421159936 4096] lost its parent (parent:
3827251183616, level: 0)
ERROR: shared extent[3830439141376 4096] lost its parent (parent:
3827251183616, level: 0)
ERROR: shared extent[3830441398272 4096] lost its parent (parent:
3827251183616, level: 0)
ERROR: shared extent[3830785138688 4096] lost its parent (parent:
3827251183616, level: 0)
ERROR: shared extent[3831099297792 4096] lost its parent (parent:
3827251183616, level: 0)
ERROR: shared extent[3831128768512 4096] lost its parent (parent:
3827251183616, level: 0)
ERROR: shared extent[3831371513856 4096] lost its parent (parent:
3827251183616, level: 0)
ERROR: shared extent[3831535570944 4096] lost its parent (parent:
3827251183616, level: 0)
ERROR: shared extent[3831591952384 4096] lost its parent (parent:
3827251183616, level: 0)
ERROR: shared extent[3831799398400 4096] lost its parent (parent:
3827251183616, level: 0)
ERROR: shared extent[3831829250048 4096] lost its parent (parent:
3827251183616, level: 0)
ERROR: shared extent[3831829512192 4096] lost its parent (parent:
3827251183616, level: 0)
ERROR: shared extent[3832011440128 4096] lost its parent (parent:
3827251183616, level: 0)
ERROR: shared extent[3832011767808 4096] lost its parent (parent:
3827251183616, level: 0)
ERROR: shared extent[3832023920640 4096] lost its parent (parent:
3827251183616, level: 0)
ERROR: shared extent[3832024678400 4096] lost its parent (parent:
3827251183616, level: 0)
ERROR: shared extent[3832027316224 4096] lost its parent (parent:
3827251183616, level: 0)
ERROR: shared extent[3832028762112 4096] lost its parent (parent:
3827251183616, level: 0)
ERROR: shared extent[3832030236672 4096] lost its parent (parent:
3827251183616, level: 0)
ERROR: shared extent[3832030330880 4096] lost its parent (parent:
3827251183616, level: 0)
ERROR: shared extent[3832161079296 4096] lost its parent (parent:
3827251183616, level: 0)
ERROR: shared extent[3832164904960 4096] lost its parent (parent:
3827251183616, level: 0)
ERROR: shared extent[3832164945920 4096] lost its parent (parent:
3827251183616, level: 0)
ERROR: shared extent[3832613765120 4096] lost its parent (parent:
3827251183616, level: 0)
ERROR: shared extent[3833727565824 4096] lost its parent (parent:
3827251183616, level: 0)
ERROR: shared extent[3833914073088 4096] lost its parent (parent:
3827251183616, level: 0)
ERROR: shared extent[3833929310208 4096] lost its parent (parent:
3827251183616, level: 0)
ERROR: shared extent[3833930141696 4096] lost its parent (parent:
3827251183616, level: 0)


The "shared extent lost its parent" is all about the same extent, 
3827251183616.


It would be nice if you could paste the output of btrfs-debug-tree -b 
3827251183616 to check what tree it belongs to.



ERROR: extent[3837768077312, 24576] referencer count mismatch (root: 33174,
owner: 1277577, offset: 4767744) wanted: 1, have: 0
[snip many more referencer count mismatches]


Re: raid1 degraded mount still produce single chunks, writeable mount not allowed

2017-03-02 Thread Peter Grandi
> [ ... ] Meanwhile, the problem as I understand it is that at
> the first raid1 degraded writable mount, no single-mode chunks
> exist, but without the second device, they are created.  [
> ... ]

That does not make any sense, unless there is a fundamental
mistake in the design of the 'raid1' profile, which this and
other situations make me think is a possibility: that the
category of "mirrored" 'raid1' chunk does not exist in the Btrfs
chunk manager. That is, a chunk is either 'raid1' if it has a
mirror, or if it has no mirror it must be 'single'.

If a member device of a 'raid1' profile multidevice volume
disappears there will be "unmirrored" 'raid1' profile chunks and
some code path must recognize them as such, but the logic of the
code does not allow their creation. Question: how does the code
know that a specific 'raid1' chunk is mirrored or not? The chunk
must have a link (member, offset) to its mirror, does it?

What makes me think that "unmirrored" 'raid1' profile chunks are
"not a thing" is that it is impossible to explicitly remove a
member device from a 'raid1' profile volume: first one has to
'convert' to 'single', and then the 'remove' copies back to the
remaining devices the 'single' chunks that are on the explicitly
'remove'd device. Which to me seems absurd.
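
The two-step sequence described here would look roughly like this (a 
sketch; device and mount point names invented):

  # first convert everything away from 'raid1'
  btrfs balance start -dconvert=single -mconvert=single /mnt
  # only then remove the member device
  btrfs device delete /dev/sdb /mnt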

Going further in my speculation, I suspect that at the core of
the Btrfs multidevice design there is a persistent "confusion"
(to use a euphemism) between volumes having a profile, and
merely chunks having a profile.

My additional guess is that the original design concept had
multidevice volumes be merely containers for chunks of
whichever mixed profiles, so a subvolume could have 'raid1'
profile metadata and 'raid0' profile data, and another could
have 'raid10' profile metadata and data, but since handling this
turned out to be too hard, this was compromised into volumes
having all metadata chunks with the same profile and all data
chunks with the same profile, which requires special-case
handling of corner cases, like volumes being converted or
missing member devices.

So in the case of 'raid1', a volume with, say, a 'raid1' data
profile should have all-'raid1' and fully mirrored profile
chunks, and the lack of a member device fails that aim in two
ways.


Re: [PATCH v5] btrfs: Handle delalloc error correctly to avoid ordered extent hang

2017-03-02 Thread Qu Wenruo



At 03/03/2017 01:28 AM, Filipe Manana wrote:

On Tue, Feb 28, 2017 at 2:28 AM, Qu Wenruo  wrote:

[BUG]
There are reports about btrfs hanging when running btrfs/124 with the
default mount option and btrfs/125 with nospace_cache or space_cache=v2
mount options, with the following backtrace.

Call Trace:
 __schedule+0x2d4/0xae0
 schedule+0x3d/0x90
 btrfs_start_ordered_extent+0x160/0x200 [btrfs]
 ? wake_atomic_t_function+0x60/0x60
 btrfs_run_ordered_extent_work+0x25/0x40 [btrfs]
 btrfs_scrubparity_helper+0x1c1/0x620 [btrfs]
 btrfs_flush_delalloc_helper+0xe/0x10 [btrfs]
 process_one_work+0x2af/0x720
 ? process_one_work+0x22b/0x720
 worker_thread+0x4b/0x4f0
 kthread+0x10f/0x150
 ? process_one_work+0x720/0x720
 ? kthread_create_on_node+0x40/0x40
 ret_from_fork+0x2e/0x40

[CAUSE]
The problem is caused by the error handler in run_delalloc_nocow() not
handling errors from btrfs_reloc_clone_csums() well.


The cause is bad error handling in general, not specific to
btrfs_reloc_clone_csums().
Keep in mind that you're giving a cause for a specific failure scenario
while providing a solution to a broader problem.


Right, I'll update the commit message in the next update.





Error handlers in run_delalloc_nocow() and cow_file_range() will clear
dirty flags and finish writeback for remaining pages like the following:


They don't finish writeback because writeback isn't even started.
Writeback is started when a bio is about to be submitted, at
__extent_writepage_io().



|<-- delalloc range --->|
| Ordered extent 1 | Ordered extent 2  |
|Submitted OK  | reloc_clone_csums() error |
|<>|   |<--- cleanup range >|
 ||
 \_=> First page handled by end_extent_writepage() in __extent_writepage()

This behavior has two problems:
1) Ordered extent 2 will never finish


Neither will ordered extent 1.


Not always.
If ordered extent 1 is only 1 page large, then it can finish.

So here I introduced ordered extent 2 for this corner case.




   Ordered extent 2 is already submitted, which relies on endio hooks
   to wait for all its pages to finish.


submitted -> created

endio hooks don't wait for pages to finish. What you want to say is
that the ordered extent is marked as complete by the endio hooks.




   However, since we finish writeback in the error handler, ordered
   extent 2 will never finish.


finish -> complete

Again, we don't even reach the point of starting writeback. And
neither ordered extent 2 nor ordered extent 1 complete.



2) Metadata underflow
   btrfs_finish_ordered_io() for ordered extent will free its reserved
   metadata space, while error handlers will also free metadata space of
   the remaining range, which covers ordered extent 2.

   So even if problem 1) is solved, we can still underflow the metadata
   reservation, which will lead to a deadly btrfs assertion.

[FIX]
This patch will resolve the problem in two steps:
1) Introduce btrfs_cleanup_ordered_extents() to cleanup ordered extents
   Slightly modify one existing function,
   btrfs_endio_direct_write_update_ordered() to handle free space inode
   just like btrfs_writepage_endio_hook() and skip first page to
   co-operate with end_extent_writepage().

   So btrfs_cleanup_ordered_extents() will search all submitted ordered
   extents in the specified range, and clean them up except the first page.

2) Make error handlers skip any range covered by ordered extent
   For run_delalloc_nocow() and cow_file_range(), only allow error
   handlers to clean up pages/extents not covered by submitted ordered
   extent.

   For compression, it's calling writepage_end_io_hook() itself to handle
   its error, and any submitted ordered extent will have its bio
   submitted, so no need to worry about compression part.

After the fix, the clean up will happen like:

|<--- delalloc range --->|
| Ordered extent 1 | Ordered extent 2  |
|Submitted OK  | reloc_clone_csums() error |
|<>|<- Cleaned up by cleanup_ordered_extents ->|<-- old error handler--->|
 ||
 \_=> First page handled by end_extent_writepage() in __extent_writepage()

Suggested-by: Filipe Manana 
Signed-off-by: Qu Wenruo 
---
v2:
  Add BTRFS_ORDERED_SKIP_METADATA flag to avoid double reducing
  outstanding extents, which is already done by
  extent_clear_unlock_delalloc() with EXTENT_DO_ACCOUNT control bit
v3:
  Skip first page to avoid underflow ordered->bytes_left.
  Fix range passed in cow_file_range() which doesn't cover the whole
  extent.
  Expand extent_clear_unlock_delalloc() range to allow them to handle
  metadata release.
v4:
  Don't use extra bit to skip metadata freeing for ordered extent,
  but only handle btrfs_reloc_clone_csums() error just before processing
  next extent.
  This makes error handling much easier for run_delalloc_nocow().
v5:
  Various grammar and comment fixes suggested by Filipe Manana
  Enhanced 

Re: [PATCH v3] btrfs: remove btrfs_err_str function from uapi/linux/btrfs.h

2017-03-02 Thread David Sterba
On Thu, Mar 02, 2017 at 04:01:17PM +0300, Dmitry V. Levin wrote:
> On Thu, Mar 02, 2017 at 12:42:12PM +0100, David Sterba wrote:
> > On Wed, Mar 01, 2017 at 03:54:35PM +0100, David Sterba wrote:
> > > On Wed, Mar 01, 2017 at 02:12:50AM +0300, Dmitry V. Levin wrote:
> > > > btrfs_err_str function is not called from anywhere and is replicated
> > > > in the userspace headers for btrfs-progs.
> > > > 
> > > > Its removal also fixes the following linux/btrfs.h userspace
> > > > compilation error:
> > > > 
> > > > /usr/include/linux/btrfs.h: In function 'btrfs_err_str':
> > > > /usr/include/linux/btrfs.h:740:11: error: 'NULL' undeclared (first use 
> > > > in this function)
> > > > return NULL;
> > > > 
> > > > Suggested-by: Jeff Mahoney 
> > > > Signed-off-by: Dmitry V. Levin 
> > > > Reviewed-by: David Sterba 
> > > > ---
> > > > v3: the patch seems to be lost, resending with updated list of 
> > > > addressees
> > > 
> > > Indeed, I can't find how or where it got lost, sorry. Added to 4.11
> > > again.
> > 
> > So it's because you did not CC linux-btrfs@ , I have the mails in my
> > inbox but haven't found it in the other folder while picking patches.
> 
> Thanks, I thought so when I Cc'ed linux-btrfs@ the last time.
> 
> Consider updating MAINTAINERS file so that scripts/get_maintainer.pl
> would be able to print the right addressees for btrfs header files:

Good idea.

> diff --git a/MAINTAINERS b/MAINTAINERS
> index 0001835..04a758f 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -2905,4 +2905,6 @@ S:  Maintained
>  F:   Documentation/filesystems/btrfs.txt
>  F:   fs/btrfs/
> +F:   include/linux/btrfs*
> +F:   include/uapi/linux/btrfs*
>  
>  BTTV VIDEO4LINUX DRIVER

Please send a proper patch and add

Acked-by: David Sterba 


Re: [PATCH v5] btrfs: Handle delalloc error correctly to avoid ordered extent hang

2017-03-02 Thread Filipe Manana
On Tue, Feb 28, 2017 at 2:28 AM, Qu Wenruo  wrote:
> [BUG]
> There are reports about btrfs hanging when running btrfs/124 with the
> default mount option and btrfs/125 with nospace_cache or space_cache=v2
> mount options, with the following backtrace.
>
> Call Trace:
>  __schedule+0x2d4/0xae0
>  schedule+0x3d/0x90
>  btrfs_start_ordered_extent+0x160/0x200 [btrfs]
>  ? wake_atomic_t_function+0x60/0x60
>  btrfs_run_ordered_extent_work+0x25/0x40 [btrfs]
>  btrfs_scrubparity_helper+0x1c1/0x620 [btrfs]
>  btrfs_flush_delalloc_helper+0xe/0x10 [btrfs]
>  process_one_work+0x2af/0x720
>  ? process_one_work+0x22b/0x720
>  worker_thread+0x4b/0x4f0
>  kthread+0x10f/0x150
>  ? process_one_work+0x720/0x720
>  ? kthread_create_on_node+0x40/0x40
>  ret_from_fork+0x2e/0x40
>
> [CAUSE]
> The problem is caused by the error handler in run_delalloc_nocow() not
> handling errors from btrfs_reloc_clone_csums() well.

The cause is bad error handling in general, not specific to
btrfs_reloc_clone_csums().
Keep in mind that you're giving a cause for a specific failure scenario
while providing a solution to a broader problem.

>
> Error handlers in run_delalloc_nocow() and cow_file_range() will clear
> dirty flags and finish writeback for remaining pages like the following:

They don't finish writeback because writeback isn't even started.
Writeback is started when a bio is about to be submitted, at
__extent_writepage_io().

>
> |<-- delalloc range --->|
> | Ordered extent 1 | Ordered extent 2  |
> |Submitted OK  | reloc_clone_csums() error |
> |<>|   |<--- cleanup range >|
>  ||
>  \_=> First page handled by end_extent_writepage() in __extent_writepage()
>
> This behavior has two problems:
> 1) Ordered extent 2 will never finish

Neither will ordered extent 1.

>Ordered extent 2 is already submitted, which relies on endio hooks
>to wait for all its pages to finish.

submitted -> created

endio hooks don't wait for pages to finish. What you want to say is
that the ordered extent is marked as complete by the endio hooks.


>
>However, since we finish writeback in the error handler, ordered
>extent 2 will never finish.

finish -> complete

Again, we don't even reach the point of starting writeback. And
neither ordered extent 2 nor ordered extent 1 complete.

>
> 2) Metadata underflow
>btrfs_finish_ordered_io() for ordered extent will free its reserved
>metadata space, while error handlers will also free metadata space of
>the remaining range, which covers ordered extent 2.
>
>So even if problem 1) is solved, we can still underflow the metadata
>reservation, which will lead to a deadly btrfs assertion.
>
> [FIX]
> This patch will resolve the problem in two steps:
> 1) Introduce btrfs_cleanup_ordered_extents() to cleanup ordered extents
>Slightly modify one existing function,
>btrfs_endio_direct_write_update_ordered() to handle free space inode
>just like btrfs_writepage_endio_hook() and skip first page to
>co-operate with end_extent_writepage().
>
>So btrfs_cleanup_ordered_extents() will search all submitted ordered
>extents in the specified range, and clean them up except the first page.
>
> 2) Make error handlers skip any range covered by ordered extent
>For run_delalloc_nocow() and cow_file_range(), only allow error
>handlers to clean up pages/extents not covered by submitted ordered
>extent.
>
>For compression, it's calling writepage_end_io_hook() itself to handle
>its error, and any submitted ordered extent will have its bio
>submitted, so no need to worry about compression part.
>
> After the fix, the clean up will happen like:
>
> |<--- delalloc range --->|
> | Ordered extent 1 | Ordered extent 2  |
> |Submitted OK  | reloc_clone_csums() error |
> |<>|<- Cleaned up by cleanup_ordered_extents ->|<-- old error handler--->|
>  ||
>  \_=> First page handled by end_extent_writepage() in __extent_writepage()
>
> Suggested-by: Filipe Manana 
> Signed-off-by: Qu Wenruo 
> ---
> v2:
>   Add BTRFS_ORDERED_SKIP_METADATA flag to avoid double reducing
>   outstanding extents, which is already done by
>   extent_clear_unlock_delalloc() with EXTENT_DO_ACCOUNT control bit
> v3:
>   Skip first page to avoid underflow ordered->bytes_left.
>   Fix range passed in cow_file_range() which doesn't cover the whole
>   extent.
>   Expand extent_clear_unlock_delalloc() range to allow them to handle
>   metadata release.
> v4:
>   Don't use extra bit to skip metadata freeing for ordered extent,
>   but only handle btrfs_reloc_clone_csums() error just before processing
>   next extent.
>   This makes error handling much easier for run_delalloc_nocow().
> v5:
>   Various grammar and comment fixes suggested by Filipe Manana
>   Enhanced commit message to focus on the generic error handler bug,
>   pointed out 

Re: raid1 degraded mount still produce single chunks, writeable mount not allowed

2017-03-02 Thread Austin S. Hemmelgarn

On 2017-03-02 12:26, Andrei Borzenkov wrote:

02.03.2017 16:41, Duncan wrote:

Chris Murphy posted on Wed, 01 Mar 2017 17:30:37 -0700 as excerpted:


[1717713.408675] BTRFS warning (device dm-8): missing devices (1)
exceeds the limit (0), writeable mount is not allowed
[1717713.446453] BTRFS error (device dm-8): open_ctree failed

[chris@f25s ~]$ uname
-r 4.9.8-200.fc25.x86_64

I thought this was fixed. I'm still getting a one time degraded rw
mount, after that it's no longer allowed, which really doesn't make any
sense because those single chunks are on the drive I'm trying to mount.
I don't understand what problem this proscription is trying to avoid. If
it's OK to mount rw,degraded once, then it's OK to allow it twice. If
it's not OK twice, it's not OK once.


AFAIK, no, it hasn't been fixed, at least not in mainline, because the
patches to fix it got stuck in some long-running project patch queue
(IIRC, the one for on-degraded auto-device-replace), with no timeline
known to me on mainline merge.

Meanwhile, the problem as I understand it is that at the first raid1
degraded writable mount, no single-mode chunks exist, but without the
second device, they are created.


Isn't it the root cause? I would expect it to create degraded mirrored
chunks that will be synchronized when the second device is added back.

That's exactly what it should be doing, and AFAIK what the correct fix 
for this should be, but in the interim just relaxing the degraded check 
to be per-chunk makes things usable, and is arguably how it should have 
been to begin with.


 (It's not clear to me whether they are

created with the first write, that is, ignoring any space in existing
degraded raid1 chunks, or if that's used up first and the single-mode
chunks only created later, when a new chunk must be allocated to continue
writing as the old ones are full.)

So the first degraded-writable mount is allowed, because no single-mode
chunks yet exist, while after such single-mode chunks are created, the
existing dumb algorithm won't allow further writable mounts, because it
sees single-mode chunks on a multi-device filesystem, and never mind that
all the single mode chunks are there, it simply doesn't check that and
won't allow writable mount because some /might/ be on the missing device.

The patches stuck in queue would make btrfs more intelligent about that,
having it check each chunk as listed in the chunk tree, and if at least
one copy is available (as would be the case for single-mode chunks
created after the degraded mount), writable mount would still be
allowed.  But... that's stuck in a long-running project queue with no 
known timetable for merging... so the only way to get it is to go find 
and merge the patches yourself, in your own build.



Will it replicate single mode chunks when the second device is added?

Not automatically; you would need to convert them to raid1 (or whatever 
other profile).  Even with the patch, this would still be needed, but at 
least it would (technically) work sanely.  On that note, on most of my 
systems, I have a startup script that calls balance with the appropriate 
convert flags and the soft flag for every fixed (non-removable) BTRFS 
volume on the system to clean up after this.  The actual balance call 
takes no time at all unless there are actually chunks to convert, so it 
normally has very little impact on boot times.
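
A minimal sketch of such a cleanup call, assuming a raid1 volume 
mounted at /mnt; the "soft" filter skips chunks that already have the 
target profile, which is what keeps the call cheap:

  btrfs balance start -dconvert=raid1,soft -mconvert=raid1,soft /mnt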



Re: [PATCH] Btrfs: fix file corruption after cloning inline extents

2017-03-02 Thread Liu Bo
On Tue, Jul 14, 2015 at 04:34:48PM +0100, fdman...@kernel.org wrote:
> From: Filipe Manana 
> 
> Using the clone ioctl (or extent_same ioctl, which calls the same extent
> cloning function as well) we end up allowing an inline extent to be copied
> from the source file into a non-zero offset of the destination file. This
> is something not expected and that the btrfs code is not prepared to deal
> with - all inline extents must be at a file offset equal to 0.
>

Somehow I failed to reproduce the BUG_ON with this case.

> For example, the following excerpt of a test case for fstests triggers
> a crash/BUG_ON() on a write operation after an inline extent is cloned
> into a non-zero offset:
> 
>   _scratch_mkfs >>$seqres.full 2>&1
>   _scratch_mount
> 
>   # Create our test files. File foo has the same 2K of data at offset 4K
>   # as file bar has at its offset 0.
>   $XFS_IO_PROG -f -s -c "pwrite -S 0xaa 0 4K" \
>   -c "pwrite -S 0xbb 4k 2K" \
>   -c "pwrite -S 0xcc 8K 4K" \
>   $SCRATCH_MNT/foo | _filter_xfs_io
> 
>   # File bar consists of a single inline extent (2K size).
>   $XFS_IO_PROG -f -s -c "pwrite -S 0xbb 0 2K" \
>  $SCRATCH_MNT/bar | _filter_xfs_io
> 
>   # Now call the clone ioctl to clone the extent of file bar into file
>   # foo at its offset 4K. This made file foo have an inline extent at
>   # offset 4K, something which the btrfs code can not deal with in future
>   # IO operations because all inline extents are supposed to start at an
>   # offset of 0, resulting in all sorts of chaos.
>   # So here we validate that clone ioctl returns an EOPNOTSUPP, which is
>   # what it returns for other cases dealing with inlined extents.
>   $CLONER_PROG -s 0 -d $((4 * 1024)) -l $((2 * 1024)) \
>   $SCRATCH_MNT/bar $SCRATCH_MNT/foo
> 
>   # Because of the inline extent at offset 4K, the following write made
>   # the kernel crash with a BUG_ON().
>   $XFS_IO_PROG -c "pwrite -S 0xdd 6K 2K" $SCRATCH_MNT/foo | _filter_xfs_io
>

On 4.10, after allowing an inline extent to be cloned to a dst file offset
greater than zero, I followed the test case manually and got these

[root@localhost trinity]# /home/btrfs-progs/btrfs-debugfs -f /mnt/btrfs/foo 
(257 0): ram 4096 disk 12648448 disk_size 4096
(257 4096): ram 2048 disk 0 disk_size 2048 -- inline
(257 8192): ram 4096 disk 12656640 disk_size 4096
file: /mnt/btrfs/foo extents 3 disk size 10240 logical size 12288 ratio 1.20

[root@localhost trinity]# xfs_io -f -c "pwrite 6k 2k" /mnt/btrfs/foo 
wrote 2048/2048 bytes at offset 6144
2 KiB, 1 ops; 0. sec (12.520 MiB/sec and 6410.2564 ops/sec)

[root@localhost trinity]# sync
[root@localhost trinity]# /home/btrfs-progs/btrfs-debugfs -f /mnt/btrfs/foo 
(257 0): ram 4096 disk 12648448 disk_size 4096
(257 4096): ram 4096 disk 12582912 disk_size 4096
(257 8192): ram 4096 disk 12656640 disk_size 4096
file: /mnt/btrfs/foo extents 3 disk size 12288 logical size 12288 ratio 1.00


Looks like we are now able to cope with these inline extents?


Thanks,

-liubo


>   status=0
>   exit
> 
> The stack trace of the BUG_ON() triggered by the last write is:
> 
>   [152154.035903] [ cut here ]
>   [152154.036424] kernel BUG at mm/page-writeback.c:2286!
>   [152154.036424] invalid opcode:  [#1] PREEMPT SMP DEBUG_PAGEALLOC
>   [152154.036424] Modules linked in: btrfs dm_flakey dm_mod crc32c_generic 
> xor raid6_pq nfsd auth_rpcgss oid_registry nfs_acl nfs lockd grace fscache 
> sunrpc loop fuse parport_pc acpi_cpu$
>   [152154.036424] CPU: 2 PID: 17873 Comm: xfs_io Tainted: GW   
> 4.1.0-rc6-btrfs-next-11+ #2
>   [152154.036424] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
> rel-1.8.1-0-g4adadbd-20150316_085822-nilsson.home.kraxel.org 04/01/2014
>   [152154.036424] task: 880429f70990 ti: 880429efc000 task.ti: 
> 880429efc000
>   [152154.036424] RIP: 0010:[]  [] 
> clear_page_dirty_for_io+0x1e/0x90
>   [152154.036424] RSP: 0018:880429effc68  EFLAGS: 00010246
>   [152154.036424] RAX: 02000806 RBX: ea0006a6d8f0 RCX: 
> 0001
>   [152154.036424] RDX:  RSI: 81155d1b RDI: 
> ea0006a6d8f0
>   [152154.036424] RBP: 880429effc78 R08: 8801ce389fe0 R09: 
> 0001
>   [152154.036424] R10: 2000 R11:  R12: 
> 8800200dce68
>   [152154.036424] R13:  R14: 8800200dcc88 R15: 
> 8803d5736d80
>   [152154.036424] FS:  7fbf119f6700() GS:88043d28() 
> knlGS:
>   [152154.036424] CS:  0010 DS:  ES:  CR0: 80050033
>   [152154.036424] CR2: 01bdc000 CR3: 0003aa555000 CR4: 
> 06e0
>   [152154.036424] Stack:
>   [152154.036424]  8803d5736d80 0001 880429effcd8 
> a04e97c1
>   [152154.036424]  880429effd68 880429effd60 0001 
> 8800200dc9c8
>   [152154.036424]  0001 8800200dcc88  
> 

[GIT PULL] Btrfs

2017-03-02 Thread Chris Mason
Hi Linus,

My for-linus-4.11 branch:

git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs.git 
for-linus-4.11

Has Btrfs round two.  These are mostly a continuation of Dave Sterba's
collection of cleanups, but Filipe also has some bug fixes and performance
improvements.

Nikolay Borisov (42) commits (+611/-579):
btrfs: Make lock_and_cleanup_extent_if_need take btrfs_inode (+14/-14)
btrfs: Make btrfs_delalloc_reserve_metadata take btrfs_inode (+39/-38)
btrfs: Make btrfs_extent_item_to_extent_map take btrfs_inode (+10/-8)
btrfs: all btrfs_delalloc_release_metadata take btrfs_inode (+22/-19)
btrfs: make btrfs_inode_resume_unlocked_dio take btrfs_inode (+3/-4)
btrfs: make btrfs_alloc_data_chunk_ondemand take btrfs_inode (+7/-6)
btrfs: make btrfs_inode_block_unlocked_dio take btrfs_inode (+3/-3)
btrfs: Make btrfs_orphan_release_metadata take btrfs_inode (+8/-8)
btrfs: Make btrfs_orphan_reserve_metadata take btrfs_inode (+7/-7)
btrfs: Make check_parent_dirs_for_sync take btrfs_inode (+14/-14)
btrfs: make btrfs_free_io_failure_record take btrfs_inode (+9/-7)
btrfs: Make btrfs_lookup_ordered_range take btrfs_inode (+19/-18)
btrfs: Make (__)btrfs_add_inode_defrag take btrfs_inode (+17/-16)
btrfs: make btrfs_print_data_csum_error take btrfs_inode (+8/-7)
btrfs: make btrfs_is_free_space_inode take btrfs_inode (+20/-19)
btrfs: make btrfs_set_inode_index_count take btrfs_inode (+8/-8)
btrfs: Make btrfs_requeue_inode_defrag take btrfs_inode (+5/-5)
btrfs: Make clone_update_extent_map take btrfs_inode (+13/-14)
btrfs: Make btrfs_mark_extent_written take btrfs_inode (+6/-6)
btrfs: Make btrfs_drop_extent_cache take btrfs_inode (+30/-26)
btrfs: Make calc_csum_metadata_size take btrfs_inode (+12/-15)
btrfs: Make drop_outstanding_extent take btrfs_inode (+11/-12)
btrfs: Make btrfs_del_delalloc_inode take btrfs_inode (+7/-7)
btrfs: make btrfs_log_inode_parent take btrfs_inode (+24/-26)
btrfs: Make btrfs_set_inode_index take btrfs_inode (+13/-13)
btrfs: Make btrfs_clear_bit_hook take btrfs_inode (+25/-21)
btrfs: Make check_extent_to_block take btrfs_inode (+6/-5)
btrfs: make check_compressed_csum take btrfs_inode (+4/-5)
btrfs: Make btrfs_insert_dir_item take btrfs_inode (+7/-7)
btrfs: Make btrfs_log_all_parents take btrfs_inode (+5/-5)
btrfs: Make btrfs_i_size_write take btrfs_inode (+18/-19)
btrfs: make repair_io_failure take btrfs_inode (+12/-11)
btrfs: Make btrfs_orphan_add take btrfs_inode (+24/-22)
btrfs: make btrfs_orphan_del take btrfs_inode (+20/-20)
btrfs: make clean_io_failure take btrfs_inode (+15/-14)
btrfs: Make btrfs_add_nondir take btrfs_inode (+13/-9)
btrfs: make free_io_failure take btrfs_inode (+13/-11)
btrfs: Make check_can_nocow take btrfs_inode (+12/-10)
btrfs: Make btrfs_add_link take btrfs_inode (+26/-23)
btrfs: Make get_extent_t take btrfs_inode (+59/-54)
btrfs: Make hole_mergeable take btrfs_inode (+5/-4)
btrfs: Make fill_holes take btrfs_inode (+18/-19)

David Sterba (16) commits (+139/-124):
btrfs: use predefined limits for calculating maximum number of pages for compression (+6/-5)
btrfs: derive maximum output size in the compression implementation (+9/-14)
btrfs: merge nr_pages input and output parameter in compress_pages (+11/-15)
btrfs: merge length input and output parameter in compress_pages (+18/-20)
btrfs: add dummy callback for readpage_io_failed and drop checks (+10/-3)
btrfs: do proper error handling in btrfs_insert_xattr_item (+2/-1)
btrfs: drop checks for mandatory extent_io_ops callbacks (+3/-4)
btrfs: constify device path passed to relevant helpers (+22/-18)
btrfs: document existence of extent_io ops callbacks (+26/-11)
btrfs: handle allocation error in update_dev_stat_item (+2/-1)
btrfs: export compression buffer limits in a header (+15/-10)
btrfs: constify name of subvolume in creation helpers (+3/-3)
btrfs: constify buffers used by compression helpers (+3/-3)
btrfs: remove BUG_ON from __tree_mod_log_insert (+0/-2)
btrfs: constify input buffer of btrfs_csum_data (+3/-3)
btrfs: let writepage_end_io_hook return void (+6/-11)

Filipe Manana (8) commits (+163/-27):
Btrfs: do not create explicit holes when replaying log tree if NO_HOLES enabled (+5/-0)
Btrfs: try harder to migrate items to left sibling before splitting a leaf (+7/-0)
Btrfs: fix assertion failure when freeing block groups at close_ctree() (+9/-6)
Btrfs: incremental send, fix unnecessary hole writes for sparse files (+86/-2)
Btrfs: fix use-after-free due to wrong order of destroying work queues (+7/-2)
Btrfs: incremental send, do not delay rename when parent inode is new (+16/-3)
Btrfs: fix data loss after truncate when using the no-holes feature (+6/-13)
Btrfs: bulk delete checksum items in the same leaf (+27/-1)

Robbie Ko (3) commits 

[PATCH] MAINTAINERS: add btrfs file entries

2017-03-02 Thread Dmitry V. Levin
Add file entries for btrfs header files.

Signed-off-by: Dmitry V. Levin 
Acked-by: David Sterba 
---
 MAINTAINERS | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 0001835..04a758f 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2904,6 +2904,8 @@ T: git git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs.git
 S: Maintained
 F: Documentation/filesystems/btrfs.txt
 F: fs/btrfs/
+F: include/linux/btrfs*
+F: include/uapi/linux/btrfs*
 
 BTTV VIDEO4LINUX DRIVER
 M: Mauro Carvalho Chehab 
-- 
ldv


Re: raid1 degraded mount still produce single chunks, writeable mount not allowed

2017-03-02 Thread Andrei Borzenkov
02.03.2017 16:41, Duncan wrote:
> Chris Murphy posted on Wed, 01 Mar 2017 17:30:37 -0700 as excerpted:
> 
>> [1717713.408675] BTRFS warning (device dm-8): missing devices (1)
>> exceeds the limit (0), writeable mount is not allowed
>> [1717713.446453] BTRFS error (device dm-8): open_ctree failed
>>
>> [chris@f25s ~]$ uname -r
>> 4.9.8-200.fc25.x86_64
>>
>> I thought this was fixed. I'm still getting a one time degraded rw
>> mount, after that it's no longer allowed, which really doesn't make any
>> sense because those single chunks are on the drive I'm trying to mount.
>> I don't understand what problem this proscription is trying to avoid. If
>> it's OK to mount rw,degraded once, then it's OK to allow it twice. If
>> it's not OK twice, it's not OK once.
> 
> AFAIK, no, it hasn't been fixed, at least not in mainline, because the 
> patches to fix it got stuck in some long-running project patch queue 
> (IIRC, the one for on-degraded auto-device-replace), with no timeline 
> known to me on mainline merge.
> 
> Meanwhile, the problem as I understand it is that at the first raid1 
> degraded writable mount, no single-mode chunks exist, but without the 
> second device, they are created. 

Is that not the root cause? I would expect it to create degraded mirrored
chunks that would be synchronized when the second device is added back.

> (It's not clear to me whether they are
> created with the first write, that is, ignoring any space in existing 
> degraded raid1 chunks, or if that's used up first and the single-mode 
> chunks only created later, when a new chunk must be allocated to continue 
> writing as the old ones are full.)
> 
> So the first degraded-writable mount is allowed, because no single-mode 
> chunks yet exist, while after such single-mode chunks are created, the 
> existing dumb algorithm won't allow further writable mounts, because it 
> sees single-mode chunks on a multi-device filesystem, and never mind that 
> all the single mode chunks are there, it simply doesn't check that and 
> won't allow writable mount because some /might/ be on the missing device.
> 
> The patches stuck in queue would make btrfs more intelligent about that, 
> having it check each chunk as listed in the chunk tree, and if at least 
> one copy is available (as would be the case for single-mode chunks 
> created after the degraded mount), writable mount would still be 
> allowed.  But... that's stuck in a long running project queue with no 
> known timetable for merging... ... so the only way to 
> get it is to go find and merge them yourself, in your own build.
> 

Will it replicate single mode chunks when the second device is added?


Re: assertion failed: last_size == new_size, file: fs/btrfs/inode.c

2017-03-02 Thread Liu Bo
On Wed, Mar 01, 2017 at 03:03:19PM -0500, Dave Jones wrote:
> On Tue, Feb 28, 2017 at 05:12:01PM -0800, Liu Bo wrote:
>  > On Mon, Feb 27, 2017 at 11:23:42AM -0500, Dave Jones wrote:
>  > > On Mon, Feb 27, 2017 at 07:53:48AM -0800, Liu Bo wrote:
>  > >  > On Sun, Feb 26, 2017 at 07:18:42PM -0500, Dave Jones wrote:
>  > >  > > Hitting this fairly frequently.. I'm not sure if this is the same 
> bug I've
>  > >  > > been hitting occasionally since 4.9. The assertion looks new to me 
> at least.
>  > >  > >
>  > >  > 
>  > >  > It was recently introduced by my commit and used to catch data loss 
> at truncate.
>  > >  > 
>  > >  > Were you running the test with a mkfs.btrfs -O NO_HOLES?
>  > >  > (We just queued a fix for the NO_HOLES case in btrfs-next.)
>  > > 
>  > > No, a fs created with default mkfs.btrfs options.
>  > 
>  > I have this patch[1] to fix a bug which results in file hole extent, and 
> this
>  > bug could lead us to hit the assertion.
>  > 
>  > Would you try to run the test w/ it, please?
>  > 
>  > [1]: https://patchwork.kernel.org/patch/9597281/
> 
> Made no difference. Still see the same trace & assertion.

Some updates here: I've got it reproduced. Somehow a corner case ends up
with an inline file extent followed by some pre-alloc extents, and along
the way isize also got updated unexpectedly. Will try to narrow it down.
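
For anyone who wants to poke at a similar layout, here is a speculative
userspace sketch; the tiny-write-then-fallocate sequence is my guess at
how such a file could come about, not a confirmed reproducer:

#define _GNU_SOURCE
#include <fcntl.h>
#include <unistd.h>

/*
 * Leave a file with a small head that btrfs may store as an inline
 * extent, followed by preallocated space past EOF (i_size unchanged).
 */
static int make_inline_then_prealloc(const char *path)
{
	int fd = open(path, O_CREAT | O_WRONLY, 0644);

	if (fd < 0)
		return -1;
	/* A tiny buffered write is a candidate for an inline extent. */
	if (write(fd, "x", 1) != 1 ||
	    /* Preallocate beyond EOF without bumping i_size. */
	    fallocate(fd, FALLOC_FL_KEEP_SIZE, 4096, 1 << 20)) {
		close(fd);
		return -1;
	}
	fsync(fd);
	close(fd);
	return 0;
}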

Thanks,

-liubo


Re: [PATCH 3/8] nowait aio: return if direct write will trigger writeback

2017-03-02 Thread Jan Kara
On Thu 02-03-17 06:12:45, Matthew Wilcox wrote:
> On Thu, Mar 02, 2017 at 11:38:45AM +0100, Jan Kara wrote:
> > On Wed 01-03-17 07:38:57, Christoph Hellwig wrote:
> > > On Tue, Feb 28, 2017 at 07:46:06PM -0800, Matthew Wilcox wrote:
> > > > But what's going to kick these pages out of cache?  Shouldn't we rather
> > > > find the pages, kick them out if clean, start writeback if not, and 
> > > > *then*
> > > > return -EAGAIN?
> > > 
> > > As pointed out in the last round of these patches I think we really
> > > need to pass a flags argument to filemap_write_and_wait_range to
> > > communicate the non-blocking nature and only return -EAGAIN if we'd
> > > block.  As a bonus that can indeed start to kick the pages out.
> > 
> > Aren't flags to filemap_write_and_wait_range() unnecessary complication?
> > Realistically, most users wanting performance from AIO DIO so badly that
> > they bother with this API won't have any pages to write / evict. If they do
> > by some bad accident, they can fall back to standard "blocking" AIO DIO.
> > So I don't see much value in teaching filemap_write_and_wait_range() about
> > a non-blocking mode...
> 
> That lets me execute a DoS against a user using this API.  All I have
> to do is open the file they're using read-only and read a byte from it.
> Page goes into page-cache, and they'll only get -EAGAIN from calling
> this syscall until the page ages out.

It will not be a DoS. This non-blocking AIO can always return EAGAIN when
it feels like it and the caller is required to fall back to a blocking
version in that case if he wants to guarantee forward progress. It is just
a performance optimization which allows user (database) to submit IO from a
computation thread instead of having to offload it to an IO thread...

> Also, I don't understand why this is a flag.  Isn't the point of AIO to
> be non-blocking?  Why isn't this just a change to how we do AIO?

Because this is an API change and the caller has to implement some handling
to guarantee forward progress of non-blocking IO...
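
To make that calling convention concrete, here is a minimal caller-side
sketch. It assumes the per-IO nowait flag this series is heading towards
(spelled RWF_NOWAIT below); treat the flag name and availability as an
assumption for illustration, not the merged API:

#define _GNU_SOURCE
#include <sys/uio.h>
#include <errno.h>

/*
 * Try the non-blocking direct write first; on EAGAIN fall back to a
 * normal, possibly blocking submission (e.g. handed off to an I/O
 * thread).  EAGAIN only means "would block", never a hard failure.
 */
ssize_t write_nowait_or_fallback(int fd, const struct iovec *iov,
				 int iovcnt, off_t off)
{
	ssize_t ret = pwritev2(fd, iov, iovcnt, off, RWF_NOWAIT);

	if (ret < 0 && errno == EAGAIN)
		ret = pwritev2(fd, iov, iovcnt, off, 0);
	return ret;
}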

Honza
-- 
Jan Kara 
SUSE Labs, CR


Re: raid1 degraded mount still produce single chunks, writeable mount not allowed

2017-03-02 Thread Duncan
Chris Murphy posted on Wed, 01 Mar 2017 17:30:37 -0700 as excerpted:

> [1717713.408675] BTRFS warning (device dm-8): missing devices (1)
> exceeds the limit (0), writeable mount is not allowed
> [1717713.446453] BTRFS error (device dm-8): open_ctree failed
> 
> [chris@f25s ~]$ uname -r
> 4.9.8-200.fc25.x86_64
> 
> I thought this was fixed. I'm still getting a one time degraded rw
> mount, after that it's no longer allowed, which really doesn't make any
> sense because those single chunks are on the drive I'm trying to mount.
> I don't understand what problem this proscription is trying to avoid. If
> it's OK to mount rw,degraded once, then it's OK to allow it twice. If
> it's not OK twice, it's not OK once.

AFAIK, no, it hasn't been fixed, at least not in mainline, because the 
patches to fix it got stuck in some long-running project patch queue 
(IIRC, the one for on-degraded auto-device-replace), with no timeline 
known to me on mainline merge.

Meanwhile, the problem as I understand it is that at the first raid1 
degraded writable mount, no single-mode chunks exist, but without the 
second device, they are created.  (It's not clear to me whether they are 
created with the first write, that is, ignoring any space in existing 
degraded raid1 chunks, or if that's used up first and the single-mode 
chunks only created later, when a new chunk must be allocated to continue 
writing as the old ones are full.)

So the first degraded-writable mount is allowed, because no single-mode 
chunks yet exist, while after such single-mode chunks are created, the 
existing dumb algorithm won't allow further writable mounts, because it 
sees single-mode chunks on a multi-device filesystem, and never mind that 
all the single mode chunks are there, it simply doesn't check that and 
won't allow writable mount because some /might/ be on the missing device.

The patches stuck in queue would make btrfs more intelligent about that, 
having it check each chunk as listed in the chunk tree, and if at least 
one copy is available (as would be the case for single-mode chunks 
created after the degraded mount), writable mount would still be 
allowed.  But... that's stuck in a long running project queue with no 
known timetable for merging... ... so the only way to 
get it is to go find and merge them yourself, in your own build.
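
For illustration only (nothing below is from the actual series; every
identifier is invented for the sketch), the per-chunk rule those patches
implement amounts to "a writable degraded mount is allowed iff every chunk
still has at least one stripe on a present device":

struct sketch_stripe { int dev_present; };
struct sketch_chunk  { int num_stripes; struct sketch_stripe *stripes; };

static int chunk_has_live_copy(const struct sketch_chunk *chunk)
{
	int i;

	for (i = 0; i < chunk->num_stripes; i++)
		if (chunk->stripes[i].dev_present)
			return 1;
	return 0;
}

/* Walk the chunk tree; one chunk with no live copy vetoes the mount. */
static int degraded_mount_allowed(const struct sketch_chunk *chunks, int nr)
{
	int i;

	for (i = 0; i < nr; i++)
		if (!chunk_has_live_copy(&chunks[i]))
			return 0;
	return 1;
}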

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman



Btrfs progs pre-release 4.10-rc1

2017-03-02 Thread David Sterba
Hi,

a pre-release has been tagged. It contains the patches queued so far, but I
haven't gone through everything on the mailing list yet. The 4.10 release
ETA is next week, so I'll try to process the backlog and merge what seems
applicable.

Changes:
  * send: dump output fixes: missing newlines
  * check: several fixes for the lowmem mode, improved error reporting
  * build
* removed some library deps for binaries that do not use them
* ctags, cscope
* split Makefile into the autotools-generated part and the rest; no need
  to run autogen.sh after adding a file
  * shared code: sync easy parts with kernel sources
  * other
* lots of cleanups
* source file reorganization: convert, mkfs, utils
* lots of spelling fixes in docs, other updates
* more tests

ETA for 4.10 is in +7 days (2017-03-08).

Tarballs: https://www.kernel.org/pub/linux/kernel/people/kdave/btrfs-progs/
Git: git://git.kernel.org/pub/scm/linux/kernel/git/kdave/btrfs-progs.git

Shortlog:

Austin S. Hemmelgarn (1):
  btrfs-progs: better document btrfs receive security

Benedikt Morbach (1):
  btrfs-progs: send-dump: add missing newlines

David Sterba (102):
  btrfs-progs: rework option parser to use getopt for global options
  btrfs-progs: introduce global config
  btrfs-progs: find-root: rename usage helper
  btrfs-progs: move help defines to own header
  btrfs-progs: move help implemetnation to own file
  btrfs-progs: move some common definitions to own header
  btrfs-progs: move mkfs definitions to own header
  btrfs-progs: move convert definitions to own header
  btrfs-progs: mkfs: move common api implementation to own file
  btrfs-progs: convert: move common api implementation to own file
  btrfs-progs: move fs features declarations to own header from utils
  btrfs-progs: move fs features implementation to own file
  btrfs-progs: convert: move definitions for interal conversion API to own file
  btrfs-progs: convert: move ext2 definitions out of main
  btrfs-progs: convert: move ext2 conversion out of main.c
  btrfs-progs: convert: move implementation for interal conversion API to own file
  btrfs-progs: build: list convert build objects in a variable
  btrfs-progs: build: list mkfs.btrfs build objects in a variable
  btrfs-progs: build: split LIBS
  btrfs-progs: build: reorder target dependencies
  btrfs-progs: build: replace target names with automatic variable
  btrfs-progs: build: use target deps on commandline via automatic variable
  btrfs-progs: build: remove directory-specific include paths
  btrfs-progs: mkfs: make list of source fs more visible
  btrfs-progs: convert: use wider types types for inode counts for progress reports
  btrfs-progs: convert: update some forward declarations
  btrfs-progs: build: add rule for ctags
  btrfs-progs: build: split makefile to generated and stable parts
  btrfs-progs: build: add rule for building cscope index
  btrfs-progs: convert: move struct initialization to the init function
  btrfs-progs: convert: use fixed lenght array for source fs name
  btrfs-progs: convert: use on-stack buffer for subvol name dir
  btrfs-progs: convert: remove unused includes
  btrfs-progs: convert: better error handling in ext2_read_used_space
  btrfs-progs: convert: use helper for special inode number check
  btrfs-progs: convert: use bit field for convert flags
  btrfs-progs: build: add stub makefile to convert
  btrfs-progs: build: build library by default
  btrfs-progs: kerncompat: print trace from ASSERT, if enabled
  btrfs-progs: move more mkfs declarations to the common header
  btrfs-progs: move mkfs helper implementation out of utils
  btrfs-progs: convert: rename ext2 function to create a symlink
  btrfs-progs: convert: move internal bg size definition
  btrfs-progs: build: drop deprecated utility from test dependencies
  btrfs-progs: build: use MAKEOPTS where missing
  btrfs-progs: build: remove unused variables from docs makefile
  btrfs-progs: mkfs: clear whole mkfs_cfg at once
  btrfs-progs: mkfs: describe fields of btrfs_mkfs_config
  btrfs-progs: mkfs: make make_cfg::blocks an internal member
  btrfs-progs: mkfs: use const char for label
  btrfs-progs: convert: rename members that clash with other functions
  btrfs-progs: convert: improve assert in make_convert_btrfs
  btrfs-progs: move utils code out of header
  btrfs-progs: move message helpers out of utils
  btrfs-progs: move message helpers implementation out of header
  btrfs-progs: drop unused argument from btrfs_truncate_item
  btrfs-progs: drop unused argument from btrfs_extend_item
  btrfs-progs: drop unused argument from btrfs_del_ptr
  btrfs-progs: move prefixcmp helper to utils
  btrfs-progs: move ulist.[ch] to kernel-shared
  btrfs-progs: shared: copy 

Re: [PATCH 3/8] nowait aio: return if direct write will trigger writeback

2017-03-02 Thread Matthew Wilcox
On Thu, Mar 02, 2017 at 11:38:45AM +0100, Jan Kara wrote:
> On Wed 01-03-17 07:38:57, Christoph Hellwig wrote:
> > On Tue, Feb 28, 2017 at 07:46:06PM -0800, Matthew Wilcox wrote:
> > > But what's going to kick these pages out of cache?  Shouldn't we rather
> > > find the pages, kick them out if clean, start writeback if not, and *then*
> > > return -EAGAIN?
> > 
> > As pointed out in the last round of these patches I think we really
> > need to pass a flags argument to filemap_write_and_wait_range to
> > communicate the non-blocking nature and only return -EAGAIN if we'd
> > block.  As a bonus that can indeed start to kick the pages out.
> 
> Aren't flags to filemap_write_and_wait_range() unnecessary complication?
> Realistically, most users wanting performance from AIO DIO so badly that
> they bother with this API won't have any pages to write / evict. If they do
> by some bad accident, they can fall back to standard "blocking" AIO DIO.
> So I don't see much value in teaching filemap_write_and_wait_range() about
> a non-blocking mode...

That lets me execute a DoS against a user using this API.  All I have
to do is open the file they're using read-only and read a byte from it.
Page goes into page-cache, and they'll only get -EAGAIN from calling
this syscall until the page ages out.
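
For concreteness, the interfering reader needs nothing more than this
(illustrative sketch only):

#include <fcntl.h>
#include <unistd.h>

/*
 * Populate the page cache for someone else's file: open it read-only
 * and read a single byte.  A nowait writer would then keep seeing
 * -EAGAIN until the page ages out.
 */
static void touch_page_cache(const char *victim)
{
	char c;
	int fd = open(victim, O_RDONLY);

	if (fd >= 0) {
		(void)pread(fd, &c, 1, 0);
		close(fd);
	}
}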

Also, I don't understand why this is a flag.  Isn't the point of AIO to
be non-blocking?  Why isn't this just a change to how we do AIO?


Re: [PATCH v3] btrfs: remove btrfs_err_str function from uapi/linux/btrfs.h

2017-03-02 Thread David Sterba
On Wed, Mar 01, 2017 at 03:54:35PM +0100, David Sterba wrote:
> On Wed, Mar 01, 2017 at 02:12:50AM +0300, Dmitry V. Levin wrote:
> > btrfs_err_str function is not called from anywhere and is replicated
> > in the userspace headers for btrfs-progs.
> > 
> > Its removal also fixes the following linux/btrfs.h userspace
> > compilation error:
> > 
> > /usr/include/linux/btrfs.h: In function 'btrfs_err_str':
> > /usr/include/linux/btrfs.h:740:11: error: 'NULL' undeclared (first use in 
> > this function)
> > return NULL;
> > 
> > Suggested-by: Jeff Mahoney 
> > Signed-off-by: Dmitry V. Levin 
> > Reviewed-by: David Sterba 
> > ---
> > v3: the patch seems to be lost, resending with updated list of addressees
> 
> Indeed, I can't find how or where it got lost, sorry. Added to 4.11
> again.

So it's because you did not CC linux-btrfs@; I have the mails in my
inbox but didn't find them in the other folder while picking patches.


Re: [PATCH v3] btrfs: remove btrfs_err_str function from uapi/linux/btrfs.h

2017-03-02 Thread Dmitry V. Levin
On Thu, Mar 02, 2017 at 12:42:12PM +0100, David Sterba wrote:
> On Wed, Mar 01, 2017 at 03:54:35PM +0100, David Sterba wrote:
> > On Wed, Mar 01, 2017 at 02:12:50AM +0300, Dmitry V. Levin wrote:
> > > btrfs_err_str function is not called from anywhere and is replicated
> > > in the userspace headers for btrfs-progs.
> > > 
> > > Its removal also fixes the following linux/btrfs.h userspace
> > > compilation error:
> > > 
> > > /usr/include/linux/btrfs.h: In function 'btrfs_err_str':
> > > /usr/include/linux/btrfs.h:740:11: error: 'NULL' undeclared (first use in 
> > > this function)
> > > return NULL;
> > > 
> > > Suggested-by: Jeff Mahoney 
> > > Signed-off-by: Dmitry V. Levin 
> > > Reviewed-by: David Sterba 
> > > ---
> > > v3: the patch seems to be lost, resending with updated list of addressees
> > 
> > Indeed, I can't find how or where it got lost, sorry. Added to 4.11
> > again.
> 
> So it's because you did not CC linux-btrfs@ , I have the mails in my
> inbox but haven't found it in the other folder while picking patches.

Thanks, I thought so when I Cc'ed linux-btrfs@ the last time.

Consider updating the MAINTAINERS file so that scripts/get_maintainer.pl
can print the right addressees for btrfs header files:

diff --git a/MAINTAINERS b/MAINTAINERS
index 0001835..04a758f 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2905,4 +2905,6 @@ S:Maintained
 F: Documentation/filesystems/btrfs.txt
 F: fs/btrfs/
+F: include/linux/btrfs*
+F: include/uapi/linux/btrfs*
 
 BTTV VIDEO4LINUX DRIVER

-- 
ldv


Re: raid1 degraded mount still produce single chunks, writeable mount not allowed

2017-03-02 Thread Adam Borowski
On Wed, Mar 01, 2017 at 05:30:37PM -0700, Chris Murphy wrote:
> [1717713.408675] BTRFS warning (device dm-8): missing devices (1)
> exceeds the limit (0), writeable mount is not allowed
> [1717713.446453] BTRFS error (device dm-8): open_ctree failed
> 
> [chris@f25s ~]$ uname -r
> 4.9.8-200.fc25.x86_64
> 
> I thought this was fixed. I'm still getting a one time degraded rw
> mount, after that it's no longer allowed, which really doesn't make
> any sense because those single chunks are on the drive I'm trying to
> mount.

Well, there's Qu's patch at:
https://www.spinics.net/lists/linux-btrfs/msg47283.html
but it doesn't apply cleanly nor is easy to rebase to current kernels.

> I don't understand what problem this proscription is trying to
> avoid. If it's OK to mount rw,degraded once, then it's OK to allow it
> twice. If it's not OK twice, it's not OK once.

Well, yeah.  The current check is naive and wrong.  It does have a purpose,
just fails in this, very common, case.

For people needing to recover their filesystem at this moment there's
https://www.spinics.net/lists/linux-btrfs/msg62473.html
but it removes the protection you still want for other cases.

This problem pops up way too often, thus I guess that if not the devs, then
at least we in the peanut gallery should do the work of reviving the real
solution.  Obviously, I for one am shortish on tuits at the moment...

-- 
⢀⣴⠾⠻⢶⣦⠀ Meow!
⣾⠁⢠⠒⠀⣿⡁
⢿⡄⠘⠷⠚⠋⠀ Collisions shmolisions, let's see them find a collision or second
⠈⠳⣄ preimage for double rot13!


Re: [PATCH 3/8] nowait aio: return if direct write will trigger writeback

2017-03-02 Thread Jan Kara
On Wed 01-03-17 07:38:57, Christoph Hellwig wrote:
> On Tue, Feb 28, 2017 at 07:46:06PM -0800, Matthew Wilcox wrote:
> > Ugh, this is pretty inefficient.  If that's all you want to know, then
> > using the radix tree directly will be far more efficient than spinning
> > up all the pagevec machinery only to discard the pages found.
> > 
> > But what's going to kick these pages out of cache?  Shouldn't we rather
> > find the pages, kick them out if clean, start writeback if not, and *then*
> > return -EAGAIN?
> > 
> > So maybe we want to spin up the pagevec machinery after all so we can
> > do that extra work?
> 
> As pointed out in the last round of these patches I think we really
> need to pass a flags argument to filemap_write_and_wait_range to
> communicate the non-blocking nature and only return -EAGAIN if we'd
> block.  As a bonus that can indeed start to kick the pages out.

Aren't flags to filemap_write_and_wait_range() unnecessary complication?
Realistically, most users wanting performance from AIO DIO so badly that
they bother with this API won't have any pages to write / evict. If they do
by some bad accident, they can fall back to standard "blocking" AIO DIO.
So I don't see much value in teaching filemap_write_and_wait_range() about
a non-blocking mode...

Honza

-- 
Jan Kara 
SUSE Labs, CR


Re: [4.7.2] btrfs_run_delayed_refs:2963: errno=-17 Object already exists

2017-03-02 Thread Marc Joliet
On Wednesday 01 March 2017 19:14:07 Marc Joliet wrote:
> In any case, I started btrfs-check on the device itself.

OK, it's still running, but the output so far is:

# btrfs check --mode=lowmem --progress /dev/sdb2
Checking filesystem on /dev/sdb2
UUID: f97b3cda-15e8-418b-bb9b-235391ef2a38
ERROR: shared extent[3826242740224 4096] lost its parent (parent: 3827251183616, level: 0)
ERROR: shared extent[3826442825728 4096] lost its parent (parent: 3827251183616, level: 0)
ERROR: shared extent[3826744471552 4096] lost its parent (parent: 3827251183616, level: 0)
ERROR: shared extent[3827106349056 4096] lost its parent (parent: 3827251183616, level: 0)
ERROR: shared extent[3827141001216 4096] lost its parent (parent: 3827251183616, level: 0)
ERROR: shared extent[3827150958592 4096] lost its parent (parent: 3827251183616, level: 0)
ERROR: shared extent[3827251724288 4096] lost its parent (parent: 3827251183616, level: 0)
ERROR: shared extent[3827433795584 4096] lost its parent (parent: 3827251183616, level: 0)
ERROR: shared extent[3827536166912 4096] lost its parent (parent: 3827251183616, level: 0)
ERROR: shared extent[3827536183296 4096] lost its parent (parent: 3827251183616, level: 0)
ERROR: shared extent[3827621646336 4096] lost its parent (parent: 3827251183616, level: 0)
ERROR: shared extent[3828179406848 4096] lost its parent (parent: 3827251183616, level: 0)
ERROR: shared extent[3828267970560 4096] lost its parent (parent: 3827251183616, level: 0)
ERROR: shared extent[3828284530688 4096] lost its parent (parent: 3827251183616, level: 0)
ERROR: shared extent[3828714246144 4096] lost its parent (parent: 3827251183616, level: 0)
ERROR: shared extent[3828794187776 4096] lost its parent (parent: 3827251183616, level: 0)
ERROR: shared extent[3829161340928 4096] lost its parent (parent: 3827251183616, level: 0)
ERROR: shared extent[3829373693952 4096] lost its parent (parent: 3827251183616, level: 0)
ERROR: shared extent[3830252130304 4096] lost its parent (parent: 3827251183616, level: 0)
ERROR: shared extent[3830421159936 4096] lost its parent (parent: 3827251183616, level: 0)
ERROR: shared extent[3830439141376 4096] lost its parent (parent: 3827251183616, level: 0)
ERROR: shared extent[3830441398272 4096] lost its parent (parent: 3827251183616, level: 0)
ERROR: shared extent[3830785138688 4096] lost its parent (parent: 3827251183616, level: 0)
ERROR: shared extent[3831099297792 4096] lost its parent (parent: 3827251183616, level: 0)
ERROR: shared extent[3831128768512 4096] lost its parent (parent: 3827251183616, level: 0)
ERROR: shared extent[3831371513856 4096] lost its parent (parent: 3827251183616, level: 0)
ERROR: shared extent[3831535570944 4096] lost its parent (parent: 3827251183616, level: 0)
ERROR: shared extent[3831591952384 4096] lost its parent (parent: 3827251183616, level: 0)
ERROR: shared extent[3831799398400 4096] lost its parent (parent: 3827251183616, level: 0)
ERROR: shared extent[3831829250048 4096] lost its parent (parent: 3827251183616, level: 0)
ERROR: shared extent[3831829512192 4096] lost its parent (parent: 3827251183616, level: 0)
ERROR: shared extent[3832011440128 4096] lost its parent (parent: 3827251183616, level: 0)
ERROR: shared extent[3832011767808 4096] lost its parent (parent: 3827251183616, level: 0)
ERROR: shared extent[3832023920640 4096] lost its parent (parent: 3827251183616, level: 0)
ERROR: shared extent[3832024678400 4096] lost its parent (parent: 3827251183616, level: 0)
ERROR: shared extent[3832027316224 4096] lost its parent (parent: 3827251183616, level: 0)
ERROR: shared extent[3832028762112 4096] lost its parent (parent: 3827251183616, level: 0)
ERROR: shared extent[3832030236672 4096] lost its parent (parent: 3827251183616, level: 0)
ERROR: shared extent[3832030330880 4096] lost its parent (parent: 3827251183616, level: 0)
ERROR: shared extent[3832161079296 4096] lost its parent (parent: 3827251183616, level: 0)
ERROR: shared extent[3832164904960 4096] lost its parent (parent: 3827251183616, level: 0)
ERROR: shared extent[3832164945920 4096] lost its parent (parent: 3827251183616, level: 0)
ERROR: shared extent[3832613765120 4096] lost its parent (parent: 3827251183616, level: 0)
ERROR: shared extent[3833727565824 4096] lost its parent (parent: 3827251183616, level: 0)
ERROR: shared extent[3833914073088 4096] lost its parent (parent: 3827251183616, level: 0)
ERROR: shared extent[3833929310208 4096] lost its parent (parent: 3827251183616, level: 0)
ERROR: shared extent[3833930141696 4096] lost its parent (parent: 3827251183616, level: 0)
ERROR: extent[3837768077312, 24576] referencer count mismatch (root: 33174, owner: 1277577, offset: 4767744) wanted: 1, have: 0
[snip many more referencer count mismatches]
ERROR: extent[3878247383040, 8192] referencer count mismatch (root: 33495, owner: 2688918, offset: 3874816) wanted: 2, have: 3
ERROR: block group[3879328546816 1073741824] used 1072840704 but extent items 

Re: [4.7.2] btrfs_run_delayed_refs:2963: errno=-17 Object already exists

2017-03-02 Thread Marc Joliet
On Thursday 02 March 2017 08:43:53 Qu Wenruo wrote:
> At 02/02/2017 08:01 PM, Marc Joliet wrote:
> > On Sunday 28 August 2016 15:29:08 Kai Krakow wrote:
> >> Hello list!
> > 
> > Hi list
> 
> [kernel message snipped]
> 
> >> Btrfs --repair refused to repair the filesystem telling me something
> >> about compressed extents and an unsupported case, wanting me to take an
> >> image and send it to the devs. *sigh*
> > 
> > I haven't tried a repair yet; it's a big file system, and btrfs-check is
> > still running:
> > 
> > # btrfs check -p /dev/sdd2
> > Checking filesystem on /dev/sdd2
> > UUID: f97b3cda-15e8-418b-bb9b-235391ef2a38
> > parent transid verify failed on 3829276291072 wanted 224274 found 283858
> > parent transid verify failed on 3829276291072 wanted 224274 found 283858
> > parent transid verify failed on 3829276291072 wanted 224274 found 283858
> > parent transid verify failed on 3829276291072 wanted 224274 found 283858
> 
> Normal transid error; can't say much about whether it's harmless, but at
> least something went wrong.
> 
> > Ignoring transid failure
> > leaf parent key incorrect 3829276291072
> > bad block 3829276291072
> 
> That's somewhat of a big problem for that tree block.
> 
> If this tree block is an extent tree block, no wonder the kernel output a
> warning and aborted the transaction.
> 
> You could try "btrfs-debug-tree -b 3829276291072 " to show the content
> of the tree block.

# btrfs-debug-tree -b 3829276291072 /dev/sdb2 
btrfs-progs v4.9
node 3829276291072 level 1 items 70 free 51 generation 292525 owner 2
fs uuid f97b3cda-15e8-418b-bb9b-235391ef2a38
chunk uuid 1cee580c-3442-4717-9300-8514dd8ff297
key (3828594696192 METADATA_ITEM 0) block 3828933423104 (934798199) gen 292523
key (3828594925568 METADATA_ITEM 0) block 3829427818496 (934918901) gen 292525
key (3828595109888 METADATA_ITEM 0) block 3828895723520 (934788995) gen 292523
key (3828595232768 METADATA_ITEM 0) block 3829202751488 (934863953) gen 292524
key (3828595412992 METADATA_ITEM 0) block 3829097209856 (934838186) gen 292523
key (3828595572736 TREE_BLOCK_REF 33178) block 3829235073024 (934871844) gen 292524
key (3828595744768 METADATA_ITEM 0) block 3829128351744 (934845789) gen 292524
key (3828595982336 METADATA_ITEM 0) block 3829146484736 (934850216) gen 292524
key (3828596187136 METADATA_ITEM 1) block 3829097234432 (934838192) gen 292523
key (3828596387840 TREE_BLOCK_REF 33527) block 3829301653504 (934888099) gen 292525
key (3828596617216 METADATA_ITEM 0) block 3828885737472 (934786557) gen 292523
key (3828596838400 METADATA_ITEM 0) block 3828885741568 (934786558) gen 292523
key (3828597047296 METADATA_ITEM 0) block 3829320552448 (934892713) gen 292525
key (3828597231616 METADATA_ITEM 0) block 3828945653760 (934801185) gen 292523
key (3828597383168 METADATA_ITEM 0) block 3829276299264 (934881909) gen 292525
key (3828597641216 METADATA_ITEM 1) block 3829349351424 (934899744) gen 292525
key (3828597866496 METADATA_ITEM 0) block 3829364776960 (934903510) gen 292525
key (3828598067200 METADATA_ITEM 0) block 3828598321152 (934716387) gen 292522
key (3828598259712 METADATA_ITEM 0) block 3829422968832 (934917717) gen 292525
key (3828598415360 TREE_BLOCK_REF 33252) block 3828885803008 (934786573) gen 292523
key (3828598665216 METADATA_ITEM 0) block 3828937863168 (934799283) gen 292523
key (3828598829056 METADATA_ITEM 0) block 3828885811200 (934786575) gen 292523
key (3828599054336 METADATA_ITEM 0) block 3829363744768 (934903258) gen 292525
key (3828599246848 METADATA_ITEM 0) block 3828915838976 (934793906) gen 292523
key (3828599504896 METADATA_ITEM 0) block 3829436194816 (934920946) gen 292525
key (3828599672832 METADATA_ITEM 0) block 3828905140224 (934791294) gen 292523
key (3828599771136 METADATA_ITEM 0) block 382923776 (934895831) gen 292525
key (3828599988224 METADATA_ITEM 0) block 3829087199232 (934835742) gen 292523
key (3828600135680 METADATA_ITEM 0) block 3828885827584 (934786579) gen 292523
key (3828600389632 METADATA_ITEM 0) block 3829436284928 (934920968) gen 292525
key (3828600528896 METADATA_ITEM 0) block 3829316214784 (934891654) gen 292525
key (3828600729600 METADATA_ITEM 0) block 3828885905408 (934786598) gen 292523
key (3828600934400 METADATA_ITEM 0) block 3829384486912 (934908322) gen 292525
key (3828601143296 METADATA_ITEM 0) block 3829423611904 (934917874) gen 292525
key (3828601356288 METADATA_ITEM 0) block 3829113688064 (934842209) gen 292524
key (3828601556992 METADATA_ITEM 0) block 3829134540800 (934847300) gen 292524
key (3828601696256 METADATA_ITEM 0) block 3829181837312 (934858847) gen 292524
key (3828601823232 METADATA_ITEM 0) block 3829157421056 (934852886) gen