Re: free space is missing after dist upgrade on lzo compressed vol

2015-11-14 Thread Timofey Titovets
Ubuntu creates a snapshot before each release upgrade. To check, mount the real top level and look:
sudo mount /dev/sda6 /mnt -o rw,subvol=/;
ls /mnt
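
If an old pre-upgrade snapshot shows up there, deleting it frees the
space. A sketch (the snapshot name below is hypothetical; Ubuntu's
naming convention may differ):

  sudo btrfs subvolume list /mnt
  sudo btrfs subvolume delete /mnt/@apt-snapshot-release-upgrade-xyz
  sudo umount /mnt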

2015-11-14 9:16 GMT+03:00 Brenton Chapin :
> Thanks for the ideas.  Sadly, no snapshots, unless btrfs does that by
> default.  Never heard of snapper before.
>
> Don't see how open files could be a problem, since the computer has
> been rebooted several times.
>
> I wonder... could the distribution upgrade have moved all the old
> files into a hidden trash directory, rather than deleting them?  But
> du picks up hidden directories, I believe.  Doesn't seem like that
> could be it either.
>
> On Fri, Nov 13, 2015 at 4:38 PM, Hugo Mills  wrote:
>> On Fri, Nov 13, 2015 at 04:33:23PM -0600, Brenton Chapin wrote:
>>> I was running Lubuntu 14.04 on btrfs with lzo compression enabled, with
>>> the following partition scheme:
>>>
>>> sda5   232M  /boot
>>> sda6   16G   /
>>> sda7   104G /home
>>>
>>> (sda5 is ext4)
>>>
>>> I did 2 distribution upgrades, one after the other, to 15.04, then
>>> 15.10, since the upgrade utility would not go directly to the latest
>>> version.  This process did a whole lot of reading and writing to the
>>> root volume of course.  Everything seems to be working, except most of
>>> the free space I had on sda6 is gone.  Was using about 4G, now df
>>> reports that the usage is 12G.  At first, I thought Lubuntu had not
>>> removed old files, but I can't find anything old left behind.  I began
>>> to suspect btrfs, and on checking, I find that du shows only 4G used on
>>> sda6.  Where'd the other 8G go?
>>
>>Do you have snapshots? Are you running snapper, for example?
>>
>>The other place that large amounts of space can go over an upgrade
>> is in orphans -- files that are deleted, but still held open by
>> processes, and which therefore can't be reclaimed until the process is
>> restarted. I've been bitten by that one before.
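>>
>>For example, deleted-but-still-open files (and the space they pin) can
>> usually be spotted with lsof; a sketch, assuming lsof is installed:
>>
>>   lsof +L1 /        # open files on / whose link count is zero
>>
>> Restarting (or killing) the listed processes releases the space.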
>>
>>Hugo.
>>
>>> "btrfs fi df /" reports the following:
>>>
>>> Data, single: total=11.01GiB, used=10.58GiB
>>> System, DUP: total=8.00MiB, used=16.00KiB
>>> System, single: total=4.00MiB, used=0.00B
>>> Metadata, DUP: total=1.00GiB, used=397.80MiB
>>> Metadata, single: total=8.00MiB, used=0.00B
>>> GlobalReserve, single: total=144.00MiB, used=0.00B
>>>
>>> "btrfs filesystem show /" gives:
>>>
>>> Label: none  uuid: 4ea4ac08-ff37-4b51-b1a3-d8b21fd43ddd
>>> Total devices 1 FS bytes used 10.97GiB
>>> devid    1 size 15.02GiB used 13.04GiB path /dev/sda6
>>>
>>> btrfs-progs v4.0
>>>
>>> "du --max-depth=1 -h -x" on / shows:
>>>
>>> 29M     ./etc
>>> 0       ./media
>>> 16M     ./bin
>>> 354M    ./lib
>>> 4.0K    ./lib64
>>> 0       ./mnt
>>> 160K    ./root
>>> 12M     ./sbin
>>> 0       ./srv
>>> 4.0K    ./tmp
>>> 3.1G    ./usr
>>> 442M    ./var
>>> 0       ./cdrom
>>> 3.8M    ./lib32
>>> 3.9G    .
>>>
>>> And of course df:
>>>
>>> /dev/sda6   16G   12G  2.5G  83% /
>>> /dev/sda5   232M   53M  163M  25% /boot
>>> /dev/sda7   104G   46G   57G  45% /home
>>>
>>> And mount:
>>>
>>> mount |grep sda
>>> /dev/sda6 on / type btrfs
>>> (rw,relatime,compress=lzo,space_cache,subvolid=257,subvol=/@)
>>> /dev/sda5 on /boot type ext4 (rw,relatime,data=ordered)
>>> /dev/sda7 on /home type btrfs
>>> (rw,relatime,compress=lzo,space_cache,subvolid=257,subvol=/@home)
>>>
>>> uname -a
>>> Linux ichor 4.2.0-18-generic #22-Ubuntu SMP Fri Nov 6 18:25:50 UTC
>>> 2015 x86_64 x86_64 x86_64 GNU/Linux
>>>
>>> I can live with the situation, but recovering that space would be nice.
>>
>> --
>> Hugo Mills | Happiness is mandatory. Are you happy?
>> hugo@... carfax.org.uk |
>> http://carfax.org.uk/  |
>> PGP: E2AB1DE4  |  Paranoia
>
>
>
> --
> http://brentonchapin.no-ip.biz



-- 
Have a nice day,
Timofey.


Re: [RFCv3.2 00/12] xfstests: test the nfs/cifs/btrfs/xfs reflink/dedupe ioctls

2015-11-14 Thread Christoph Hellwig
Looks good,

Acked-by: Christoph Hellwig 


Using Btrfs on single drives

2015-11-14 Thread audio muze
I'm looking to make a "production copy" of my music and video library
for use in our media server.  It is not my intent to create any form
of RAID array, but rather to treat each drive independently where
filesystem is concerned and then to create a single view of the drives
using mhddfs.  As the data will remain relatively static I may also
deploy Snapraid in conjunction with mhddfs.

I'm considering using Btrfs as the underlying filesystem on each of
the individual drives, principally to take advantage of metadata
redundancy.  Am I correct in surmising that I can turn checksumming
off given it's of no utility where a Btrfs volume is comprised of a
single device only?


Re: Using Btrfs on single drives

2015-11-14 Thread Goffredo Baroncelli
On 2015-11-14 11:43, audio muze wrote:
> I can turn checksumming
> off given it's of no utility where a Btrfs volume is comprised of a
> single device only?

The checksums are used to detect data corruption; in the case of a btrfs
RAID profile, the checksums are *also* used to pick the good copy.
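
For example, even on a single device a scrub still uses the checksums to
detect (though not repair) corrupted data; a sketch, assuming the volume
is mounted at /mnt:

  sudo btrfs scrub start -B /mnt   # -B: stay in foreground and print stats
  sudo btrfs scrub status /mnt     # shows csum and read error counters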

BR
G.Baroncelli

-- 
gpg @keyserver.linux.it: Goffredo Baroncelli 
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5


[PATCH v2 2/2] Btrfs: fix the number of transaction units needed to remove a block group

2015-11-14 Thread fdmanana
From: Filipe Manana 

We were using only 1 transaction unit when attempting to delete an unused
block group but in reality we need 3 + N units, where N corresponds to the
number of stripes. We were accounting only for the addition of the orphan
item (for the block group's free space cache inode) but we were not
accounting that we need to delete one block group item from the extent
tree, one free space item from the tree of tree roots and N device extent
items from the device tree.

While one unit is not enough, it worked most of the time because for each
single unit we are too pessimistic and assume an entire tree path, with
the highest possible height (8), needs to be COWed with eventual node
splits at every possible level in the tree, so there was usually enough
reserved space for removing all the items and adding the orphan item.

However after adding the orphan item, writepages() can be called by the VM
subsystem against the btree inode when we are under memory pressure, which
causes writeback to start for the nodes we COWed before, this forces the
operation to remove the free space item to COW again some (or all of) the
same nodes (in the tree of tree roots). Even without writepages() being
called, we could fail with ENOSPC because these items are located in
multiple trees and one of them might have a greater height and require
node/leaf splits at many levels, exhausting all the reserved space before
removing all the items and adding the orphan.

In the kernel 4.0 release, commit 3d84be799194 ("Btrfs: fix BUG_ON in
btrfs_orphan_add() when delete unused block group"), we attempted to fix
a BUG_ON due to ENOSPC when trying to add the orphan item by making the
cleaner kthread reserve one transaction unit before attempting to remove
the block group, but this was not enough. We had a couple user reports
still hitting the same BUG_ON after 4.0, like Stefan Priebe's report on
a 4.2-rc6 kernel for example:

http://www.spinics.net/lists/linux-btrfs/msg46070.html

So fix this by reserving all the necessary units of metadata.

Reported-by: Stefan Priebe 
Fixes: 3d84be799194 ("Btrfs: fix BUG_ON in btrfs_orphan_add() when delete 
unused block group")
Signed-off-by: Filipe Manana 
---

V2: Added missing units to account for removing the device extent items from
the device tree (done at btrfs_remove_chunk through btrfs_free_dev_extent).

 fs/btrfs/ctree.h   |  3 ++-
 fs/btrfs/extent-tree.c | 37 ++---
 fs/btrfs/volumes.c |  3 ++-
 3 files changed, 38 insertions(+), 5 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 1573be6..d88994f 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -3480,7 +3480,8 @@ int btrfs_make_block_group(struct btrfs_trans_handle 
*trans,
   u64 type, u64 chunk_objectid, u64 chunk_offset,
   u64 size);
 struct btrfs_trans_handle *btrfs_start_trans_remove_block_group(
-   struct btrfs_fs_info *fs_info);
+   struct btrfs_fs_info *fs_info,
+   const u64 chunk_offset);
 int btrfs_remove_block_group(struct btrfs_trans_handle *trans,
 struct btrfs_root *root, u64 group_start,
 struct extent_map *em);
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 7820093..e97d6d6 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -10257,14 +10257,44 @@ out:
 }
 
 struct btrfs_trans_handle *
-btrfs_start_trans_remove_block_group(struct btrfs_fs_info *fs_info)
+btrfs_start_trans_remove_block_group(struct btrfs_fs_info *fs_info,
+const u64 chunk_offset)
 {
+   struct extent_map_tree *em_tree = &fs_info->mapping_tree.map_tree;
+   struct extent_map *em;
+   struct map_lookup *map;
+   unsigned int num_items;
+
+   read_lock(&em_tree->lock);
+   em = lookup_extent_mapping(em_tree, chunk_offset, 1);
+   read_unlock(&em_tree->lock);
+   ASSERT(em && em->start == chunk_offset);
+
/*
+* We need to reserve 3 + N units from the metadata space info in order
+* to remove a block group (done at btrfs_remove_chunk() and at
+* btrfs_remove_block_group()), which are used for:
+*
 * 1 unit for adding the free space inode's orphan (located in the tree
 * of tree roots).
+* 1 unit for deleting the block group item (located in the extent
+* tree).
+* 1 unit for deleting the free space item (located in tree of tree
+* roots).
+* N units for deleting N device extent items corresponding to each
+* stripe (located in the device tree).
+*
+* In order to remove a block group we also need to reserve units in the
+* system space info in order to update the chunk tree (update one or
+* more device items 

[PATCH v2 0/2] Btrfs: fixes for an ENOSPC issue that left a fs unusable

2015-11-14 Thread fdmanana
From: Filipe Manana 

The following pair of changes fix an issue observed in a production
environment where any file operations done by a package manager failed
with ENOSPC. Forcing a commit of the current transaction (through "sync")
didn't help, a balance operation with the filters -dusage=0 didn't help
either and the issue persisted even after rebooting the machine. There
were many data blocks groups that were unused, but they weren't getting
deleted by the cleaner kthread because whenever it tried to start a
transaction to delete a block group it got -ENOSPC error, which it silently
ignores (as it does for any other error).

So these just make sure we fallback to use the global reserve, if -ENOSPC
is encountered through the standard allocation path, to delete block groups
as we do already for inode unlink operations. Another issue fixed is hitting
a BUG_ON() when removing a block group due to -ENOSPC failure when creating
the orphan item for its free space cache inode. This second issue has
been reported by a few users in the mailing list and bugzilla (for example
at http://www.spinics.net/lists/linux-btrfs/msg46070.html).

These changes are also available at:
http://git.kernel.org/cgit/linux/kernel/git/fdmanana/linux.git/log/?h=integration-4.4

Thanks.

Changes in v2:

Updated the second patch to account for the space required to remove the
device extents from the device tree (was previously ignored).


Filipe Manana (2):
  Btrfs: use global reserve when deleting unused block group after
ENOSPC
  Btrfs: fix the number of transaction units needed to remove a block
group

 fs/btrfs/ctree.h   |  3 +++
 fs/btrfs/extent-tree.c | 45 +++--
 fs/btrfs/inode.c   | 24 +---
 fs/btrfs/transaction.c | 32 
 fs/btrfs/transaction.h |  4 
 fs/btrfs/volumes.c |  3 ++-
 6 files changed, 85 insertions(+), 26 deletions(-)

-- 
2.1.3



[PATCH v2 1/2] Btrfs: use global reserve when deleting unused block group after ENOSPC

2015-11-14 Thread fdmanana
From: Filipe Manana 

It's possible to reach a state where the cleaner kthread isn't able to
start a transaction to delete an unused block group due to lack of enough
free metadata space and due to lack of unallocated device space to allocate
a new metadata block group as well. If this happens try to use space from
the global block group reserve just like we do for unlink operations, so
that we don't reach a permanent state where starting a transaction for
filesystem operations (file creation, renames, etc) keeps failing with
-ENOSPC. Such an unfortunate state was observed on a machine where over
a dozen unused data block groups existed and the cleaner kthread was
failing to delete them due to ENOSPC error when attempting to start a
transaction, and even running balance with a -dusage=0 filter failed with
ENOSPC as well. Unmounting and remounting the filesystem didn't help
either. Allowing the cleaner kthread to use the global block reserve to
delete the unused data block groups fixed the problem.

Signed-off-by: Filipe Manana 
Signed-off-by: Jeff Mahoney 
---

V2: No changes. Only the second patch in the series was updated to
account for the space required to remove device extent items.

 fs/btrfs/ctree.h   |  2 ++
 fs/btrfs/extent-tree.c | 14 --
 fs/btrfs/inode.c   | 24 +---
 fs/btrfs/transaction.c | 32 
 fs/btrfs/transaction.h |  4 
 fs/btrfs/volumes.c |  2 +-
 6 files changed, 52 insertions(+), 26 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index a2e73f6..1573be6 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -3479,6 +3479,8 @@ int btrfs_make_block_group(struct btrfs_trans_handle 
*trans,
   struct btrfs_root *root, u64 bytes_used,
   u64 type, u64 chunk_objectid, u64 chunk_offset,
   u64 size);
+struct btrfs_trans_handle *btrfs_start_trans_remove_block_group(
+   struct btrfs_fs_info *fs_info);
 int btrfs_remove_block_group(struct btrfs_trans_handle *trans,
 struct btrfs_root *root, u64 group_start,
 struct extent_map *em);
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index acf3ed1..7820093 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -10256,6 +10256,17 @@ out:
return ret;
 }
 
+struct btrfs_trans_handle *
+btrfs_start_trans_remove_block_group(struct btrfs_fs_info *fs_info)
+{
+   /*
+* 1 unit for adding the free space inode's orphan (located in the tree
+* of tree roots).
+*/
+   return btrfs_start_transaction_fallback_global_rsv(fs_info->extent_root,
+  1, 1);
+}
+
 /*
  * Process the unused_bgs list and remove any that don't have any allocated
  * space inside of them.
@@ -10322,8 +10333,7 @@ void btrfs_delete_unused_bgs(struct btrfs_fs_info 
*fs_info)
 * Want to do this before we do anything else so we can recover
 * properly if we fail to join the transaction.
 */
-   /* 1 for btrfs_orphan_reserve_metadata() */
-   trans = btrfs_start_transaction(root, 1);
+   trans = btrfs_start_trans_remove_block_group(fs_info);
if (IS_ERR(trans)) {
btrfs_dec_block_group_ro(root, block_group);
ret = PTR_ERR(trans);
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 6e93349..f82d1f4 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -4046,9 +4046,7 @@ int btrfs_unlink_inode(struct btrfs_trans_handle *trans,
  */
 static struct btrfs_trans_handle *__unlink_start_trans(struct inode *dir)
 {
-   struct btrfs_trans_handle *trans;
struct btrfs_root *root = BTRFS_I(dir)->root;
-   int ret;
 
/*
 * 1 for the possible orphan item
@@ -4057,27 +4055,7 @@ static struct btrfs_trans_handle 
*__unlink_start_trans(struct inode *dir)
 * 1 for the inode ref
 * 1 for the inode
 */
-   trans = btrfs_start_transaction(root, 5);
-   if (!IS_ERR(trans) || PTR_ERR(trans) != -ENOSPC)
-   return trans;
-
-   if (PTR_ERR(trans) == -ENOSPC) {
-   u64 num_bytes = btrfs_calc_trans_metadata_size(root, 5);
-
-   trans = btrfs_start_transaction(root, 0);
-   if (IS_ERR(trans))
-   return trans;
-   ret = btrfs_cond_migrate_bytes(root->fs_info,
-  &root->fs_info->trans_block_rsv,
-  num_bytes, 5);
-   if (ret) {
-   btrfs_end_transaction(trans, root);
-   return ERR_PTR(ret);
-   }
-   trans->block_rsv = &root->fs_info->trans_block_rsv;
-

Re: [PATCH 00/15] btrfs: Hot spare and Auto replace

2015-11-14 Thread Goffredo Baroncelli
On 2015-11-13 11:20, Anand Jain wrote:
> 
> Thanks for comments.
> 
> On 11/13/2015 03:21 AM, Goffredo Baroncelli wrote:
>> On 2015-11-09 11:56, Anand Jain wrote:
>>> These set of patches provides btrfs hot spare and auto replace support
>>> for you review and comments.
>>
>> Hi Anand,
>>
>> is there any reason to put this kind of logic in the kernel space ?
[...]
> 
>> Another feature of this daemon could be to add a disk when the disk
>> space is too low,
> 
>  That would come at the cost of a spare device; shouldn't the user review
>  the trade-offs and do it manually? I am not sure.

If you have more than one spare, you can do both automatically: a new disk is
added when space is low, and a disk is replaced in case of failure. If you
have only one spare, you may decide to reserve it only for replacing a failed
disk. But this should be a configurable option: low space leads to an
unavailable filesystem, while a failed disk means a higher likelihood of
losing the whole filesystem. I am not sure which should be considered more
critical.

>> or to start a balance when there is no space to
>> allocate further chunk.
> 
>  Yep. As you notice, the thread created here is casualty_kthread()
>  (instead of replace_kthread()) over the long run I wish to provide
>  that feature in this thread, as it is a mutually exclusive operations
>  with replace.

Replacing a disk should be a higher-priority operation. In case of disk
failure during a balance/defrag, those operations should be stopped to allow
a replace.
If you want to start a replace, you should stop other long-running operations
like balance and defrag.
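
In practice, the manual sequence today would be roughly this (a sketch;
device names and the mount point are hypothetical):

  sudo btrfs balance pause /mnt                    # stop the long-running job
  sudo btrfs replace start /dev/sdX /dev/sdY /mnt  # failed disk -> spare
  sudo btrfs replace status /mnt
  sudo btrfs balance resume /mnt                   # once the replace finishes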

> 
>> Of course all these logic could be implemented in kernel space,
>> but I think that we should avoid that when possible.
> 
>  It's easy to handle the mutually exclusive parts within the kernel,
>  and it's better to have the important logic in one place. Two heads
>  operating on one organism, seeing and feeling different things, will
>  lead to wrong decisions.

What is the other logic you are referring to?

> 
>> Moreover, in user space the logging is easier
> 
> Thanks, Anand


-- 
gpg @keyserver.linux.it: Goffredo Baroncelli 
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5


Re: More memory more jitters?

2015-11-14 Thread Duncan
Duncan posted on Sat, 14 Nov 2015 16:37:14 + as excerpted:

> Hugo Mills posted on Sat, 14 Nov 2015 14:31:12 + as excerpted:
> 
>>> I have read the Gotcha[1] page:
>>> 
>>>Files with a lot of random writes can become heavily fragmented
>>> (1+ extents) causing thrashing on HDDs and excessive multi-second
>>> spikes of CPU load on systems with an SSD or **large amount of RAM**.
>>> 
>>> Why could large amount of memory worsen the problem?
>> 
>>Because the kernel will hang on to lots of changes in RAM for
>> longer. With less memory, there's more pressure to write out dirty
>> pages to disk, so the changes get written out in smaller pieces more
>> often. With more memory, the changes being written out get "lumpier".
>> 
>>> If **too much** memory is a problem, is it possible to limit the
>>> memory btrfs use?
>> 
>>There's some VM knobs you can twiddle, I believe, but I haven't
>> really played with them myself -- I'm sure there's more knowledgeable
>> people around here who can suggest suitable things to play with.
> 
> Yes.  Don't have time to explain now, but I will later, if nobody beats
> me to it.

And now it's later... =:^)

The official kernel documentation for this is in
$KERNELDIR/Documentation/filesystems/proc.txt, in
CHAPTER 2: MODIFYING SYSTEM PARAMETERS
(starting at line 1378 in the file as it exists in kernel 4.3), tho 
that's little more than an intro.  As it states,
$KERNELDIR/Documentation/sysctl/* contains rather more information.

Of course there's also various resources on the net covering this 
material, and if google finds this post I suppose it might become one of 
them. =:^]


So in that Documentation/sysctl dir, the README file contains an intro, 
but what we're primarily interested in is covered in vm.txt.  The files 
discussed there are found in /proc/sys/vm, tho your distro almost 
certainly has an init service, sysctl (the systemd-sysctl.service on 
systemd based systems, configured with *.conf files in /usr/lib/sysctl.d/ 
and /etc/sysctl.d/), that pokes non-kernel-default distro-configured and 
admin-configured values into the appropriate /proc/sys/vm/* files at 
boot.  Also check /etc/sysctl.conf, which at least here is symlinked 
from /etc/sysctl.d/99-sysctl.conf so systemd-sysctl loads it.  That's 
actually the file with my settings, here.

So (as root) you can poke the files directly for experimentation, and 
when you've settled on values that work for you, you can put them in /etc/
sysctl.d/*.conf or in /etc/sysctl.conf, or whatever your distro uses 
instead.  But keep in mind that (for systemd based systems anyway) the 
settings in /usr/lib/sysctl.d/*.conf will be loaded first and thus will 
apply if not overridden by your own config, so you might want to check 
there too, to see what's being applied there, before going too wild on 
your overrides.

Of course the sysctl mechanism loads various other settings as well, 
network, core-file, magic-srq, others, but what we're focused on here are 
the vm files and settings.

In particular, our files of interest are the /proc/sys/vm/dirty_* files 
and corresponding vm.dirty_* settings, tho while we're here, I'll mention 
that /proc/sys/vm/swappiness and the corresponding vm.swappiness setting 
is also quite commonly changed by users.

Basically, these dirty_* files control the amount of cached writes that 
can accumulate before the kernel will start writing them to storage at 
two different priority levels, the maximum time they are allowed to age 
before they're written back regardless, and the balance between these two 
writeback priorities.
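
For a concrete feel, inspecting and experimenting with these at runtime 
looks roughly like this (a sketch; the values are illustrations, not 
recommendations):

  # current settings
  sysctl vm.dirty_background_ratio vm.dirty_ratio vm.dirty_expire_centisecs
  # switch to byte-based limits; the *_bytes and *_ratio knobs are
  # mutually exclusive -- setting one zeroes the other
  sudo sysctl -w vm.dirty_background_bytes=$((256*1024*1024))
  sudo sysctl -w vm.dirty_bytes=$((1024*1024*1024))

Once you've settled on values, the same settings go into /etc/sysctl.d/
*.conf, e.g. vm.dirty_bytes = 1073741824.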

Now, one thing that's important to keep in mind here is that the kernel 
defaults were originally setup back when 128 MiB RAM was a *LOT* of 
memory, and they aren't necessarily appropriate for systems with the GiB 
or often double-digit GiB RAM that most non-embedded systems come with 
today, particularly where people are still using legacy spinning rust -- 
SSDs are enough faster that the problem doesn't show up to the same 
degree, tho admins may still want to tweak the defaults in some cases.

Another thing to keep in mind for mobile systems in particular is that 
writing data out will of course spin up the drives, so you might want 
rather larger caches and longer timeouts on laptops and the like, and/or 
if you spin down your drives.  But balance that against the knowledge 
that data still in the write cache will be lost if the system crashes 
before it hits storage, so don't go /too/ overboard on extending your 
timeouts.  Timeouts of an hour could well save quite a bit of power, but 
they also risk losing an hour's worth of writes!


OK, from that rather high level view, let's jump to the lower level 
actual settings, tho not yet the actual values.  I'll group the settings 
in my discussion, but you can read the description for each individual 
setting in the vm.txt file mentioned above, if you like.

Note that there's a two-dimensional parallel among the four 

Re: bad extent [5993525264384, 5993525280768), type mismatch with chunk

2015-11-14 Thread Christoph Anton Mitterer
On Sun, 2015-11-15 at 09:29 +0800, Qu Wenruo wrote:
> > > If type is wrong, all the extents inside the chunk should be
> > > reported as mismatch type with chunk.
> > Isn't that the case? At least there are so many reported extents...
> 
> If you posted all the output
Sure, I posted everything that the dump gave :)

> , that's just a little more than nothing.
> Just tens of error reported, compared to millions of extents.
> And in your case, if a chunk is really bad, it will report about 65K
> errors.
I see..


> I think it's a btrfsck issue, at least from the dump info, your
> extent 
> tree is OK.
> And if there is no other error reported from btrfsck, your filesystem
> should be OK.
Nope.. there were no further errors.



> > In any case, I'll keep the fs in question for a while, so that I
> > can do
> > verifications in case you have patches.
> 
> Nice.
Just tell me if you have something.



btw: I saw these:
Nov 15 02:01:42 heisenberg kernel: INFO: task btrfs-transacti:28379 blocked for 
more than 120 seconds.
Nov 15 02:01:42 heisenberg kernel:   Not tainted 4.2.0-1-amd64 #1
Nov 15 02:01:42 heisenberg kernel: "echo 0 > 
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
Nov 15 02:01:42 heisenberg kernel: btrfs-transacti D 8109a1b0 0 
28379  2 0x
Nov 15 02:01:42 heisenberg kernel:  88016e3e6500 0046 
007a 88040be88f00
Nov 15 02:01:42 heisenberg kernel:  2659 88013807 
88041e355840 7fff
Nov 15 02:01:42 heisenberg kernel:  815508e0 88013806fbb8 
0007 815500ff
Nov 15 02:01:42 heisenberg kernel: Call Trace:
Nov 15 02:01:42 heisenberg kernel:  [] ? 
bit_wait_timeout+0x70/0x70
Nov 15 02:01:42 heisenberg kernel:  [] ? schedule+0x2f/0x70
Nov 15 02:01:42 heisenberg kernel:  [] ? 
schedule_timeout+0x1f7/0x290
Nov 15 02:01:42 heisenberg kernel:  [] ? 
extent_write_cache_pages.isra.28.constprop.43+0x222/0x330 [btrfs]
Nov 15 02:01:42 heisenberg kernel:  [] ? read_tsc+0x5/0x10
Nov 15 02:01:42 heisenberg kernel:  [] ? 
bit_wait_timeout+0x70/0x70
Nov 15 02:01:42 heisenberg kernel:  [] ? 
io_schedule_timeout+0x9d/0x110
Nov 15 02:01:42 heisenberg kernel:  [] ? bit_wait_io+0x35/0x60
Nov 15 02:01:42 heisenberg kernel:  [] ? 
__wait_on_bit+0x5a/0x90
Nov 15 02:01:42 heisenberg kernel:  [] ? 
find_get_pages_tag+0x116/0x150
Nov 15 02:01:42 heisenberg kernel:  [] ? 
wait_on_page_bit+0xb6/0xc0
Nov 15 02:01:42 heisenberg kernel:  [] ? 
autoremove_wake_function+0x40/0x40
Nov 15 02:01:42 heisenberg kernel:  [] ? 
filemap_fdatawait_range+0xc7/0x140
Nov 15 02:01:42 heisenberg kernel:  [] ? 
btrfs_wait_ordered_range+0x73/0x110 [btrfs]
Nov 15 02:01:42 heisenberg kernel:  [] ? 
btrfs_wait_cache_io+0x5d/0x1e0 [btrfs]
Nov 15 02:01:42 heisenberg kernel:  [] ? 
btrfs_start_dirty_block_groups+0x17c/0x3f0 [btrfs]
Nov 15 02:01:42 heisenberg kernel:  [] ? 
btrfs_commit_transaction+0x1b4/0xa90 [btrfs]
Nov 15 02:01:42 heisenberg kernel:  [] ? 
start_transaction+0x90/0x580 [btrfs]
Nov 15 02:01:42 heisenberg kernel:  [] ? 
transaction_kthread+0x224/0x240 [btrfs]
Nov 15 02:01:42 heisenberg kernel:  [] ? 
btrfs_cleanup_transaction+0x510/0x510 [btrfs]
Nov 15 02:01:42 heisenberg kernel:  [] ? kthread+0xc1/0xe0
Nov 15 02:01:42 heisenberg kernel:  [] ? 
kthread_create_on_node+0x170/0x170
Nov 15 02:01:42 heisenberg kernel:  [] ? 
ret_from_fork+0x3f/0x70
Nov 15 02:01:42 heisenberg kernel:  [] ? 
kthread_create_on_node+0x170/0x170
Nov 15 02:03:42 heisenberg kernel: INFO: task btrfs-transacti:28379 blocked for 
more than 120 seconds.
Nov 15 02:03:42 heisenberg kernel:   Not tainted 4.2.0-1-amd64 #1
Nov 15 02:03:42 heisenberg kernel: "echo 0 > 
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
Nov 15 02:03:42 heisenberg kernel: btrfs-transacti D 8109a1b0 0 
28379  2 0x
Nov 15 02:03:42 heisenberg kernel:  88016e3e6500 0046 
007a 88040be88f00
Nov 15 02:03:42 heisenberg kernel:  2659 88013807 
88041e355840 7fff
Nov 15 02:03:42 heisenberg kernel:  815508e0 88013806fbb8 
0007 815500ff
Nov 15 02:03:42 heisenberg kernel: Call Trace:
Nov 15 02:03:42 heisenberg kernel:  [] ? 
bit_wait_timeout+0x70/0x70
Nov 15 02:03:42 heisenberg kernel:  [] ? schedule+0x2f/0x70
Nov 15 02:03:42 heisenberg kernel:  [] ? 
schedule_timeout+0x1f7/0x290
Nov 15 02:03:42 heisenberg kernel:  [] ? 
extent_write_cache_pages.isra.28.constprop.43+0x222/0x330 [btrfs]
Nov 15 02:03:42 heisenberg kernel:  [] ? read_tsc+0x5/0x10
Nov 15 02:03:42 heisenberg kernel:  [] ? 
bit_wait_timeout+0x70/0x70
Nov 15 02:03:42 heisenberg kernel:  [] ? 
io_schedule_timeout+0x9d/0x110
Nov 15 02:03:42 heisenberg kernel:  [] ? bit_wait_io+0x35/0x60
Nov 15 02:03:42 heisenberg kernel:  [] ? 
__wait_on_bit+0x5a/0x90
Nov 15 02:03:42 heisenberg kernel:  [] ? 
find_get_pages_tag+0x116/0x150
Nov 15 02:03:42 heisenberg kernel:  [] ? 

Re: Using Btrfs on single drives

2015-11-14 Thread audio muze
I've gone ahead and created a single drive Btrfs filesystem on a 3TB
drive and started copying content from a raid5 array to the Btrfs
volume.  Initially copy speeds were very good sustained at ~145MB/s
and I left it to run overnight.  This morning I ran btrfs fi usage
/mnt/btrfs and it reported around 700GB free.  I selected another
folder containing 204GB and started a copy operation, again from the
raid5 array to the Btrfs volume.  Copying is now materially slower and
slowing further...it started at ~105MB/s and after 141GB has slowed to
around 97MB/s.  Is this to be expected with Btrfs or have I come
across a bug of some sort?

On Sat, Nov 14, 2015 at 12:43 PM, audio muze  wrote:
> I'm looking to make a "production copy" of my music and video library
> for use in our media server.  It is not my intent to create any form
> of RAID array, but rather to treat each drive independently where
> filesystem is concerned and then to create a single view of the drives
> using mhddfs.  As the data will remain relatively static I may also
> deploy Snapraid in conjunction with mhddfs.
>
> I'm considering using Btrfs as the underlying filesystem on each of
> the individual drives, principally to take advantage of metadata
> redundancy.  Am I correct in surmising that I can turn checksumming
> off given it's of no utility where a Btrfs volume is comprised of a
> single device only?


Re: Using Btrfs on single drives

2015-11-14 Thread Duncan
audio muze posted on Sun, 15 Nov 2015 05:27:00 +0200 as excerpted:

> I've gone ahead and created a single drive Btrfs filesystem on a 3TB
> drive and started copying content from a raid5 array to the Btrfs
> volume.  Initially copy speeds were very good sustained at ~145MB/s and
> I left it to run overnight.  This morning I ran btrfs fi usage
> /mnt/btrfs and it reported around 700GB free.  I selected another folder
> containing 204GB and started a copy operation, again from the raid5
> array to the Btrfs volume.  Copying is now materially slower and slowing
> further...it started at ~105MB/s and after 141GB has slowed to around
> 97MB/s.  Is this to be expected with Btrfs or have I come across a bug
> of some sort?

That looks to /me/ like native drive limitations.

Due to the fact that a modern hard drive spins at the same speed no 
matter where the read/write head is located, when it's reading/writing to 
the first part of the drive -- the outside -- much more linear drive 
distance will pass under the read/write heads in say a tenth of a second 
than will be the case as the last part of the drive is filled -- the 
inside -- and throughput will be much higher at the first of the drive.

You report a 3 TB drive with initial/outside speeds of ~145 MB/s, then 
after copying quite some data, in the morning it had ~700 GB free, so 
presumably you had written something over 2 TB to it.  I'll leave the 
precise math to someone else, but you report that it started the second 
copy at 105 MB/s and was down to 97 MB/s after another 141 GB, so 
presumably ~550 GB free.  That's a slowdown of roughly a third from the 
initial outside edge where it was covering perhaps twice as much linear 
drive distance per unit of time, so it doesn't sound at all unreasonable 
to me.

What's the actual extended sequential write throughput rating on the 
drive?  What do the online reviews of the product say it does?  Have you 
used hdparm to test it?

It's kinda late for this test now, but if before creating a big 
filesystem out of the whole thing, if for testing you had created a small 
partition at the beginning of the drive, and another at the end, you 
could have then used hdparm to test each to see what the relative speed 
difference was between them, and further, if desired, you could have 
created small partitions at specific size locations into the drive, and 
done similar testing, to find the speed at say 1 TB into the drive, 2 TB 
in, etc.  Of course after testing you could erase those temporary 
partitions and make one big filesystem out of it, if desired.
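
With a reasonably recent hdparm the same comparison can be done without 
creating partitions at all, using its --offset option; a sketch, device 
name hypothetical:

  sudo hdparm -t /dev/sdX                  # sequential read at the start
  sudo hdparm -t --offset 2500 /dev/sdX    # read timing ~2.5 TB into the disk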

Of course this is one of the big differences with SSDs, since they aren't 
spinning any longer and have direct access to any part of the device with 
just an address change, so speeds for them, in addition to being far 
faster, should normally be the same across the device.  But of course 
they cost far more per GB or TB, and tend to be vastly more expensive in 
the TB+ size ranges, tho you can of course combine many smaller ones 
using raid technologies to create a larger logical one, but you'll still 
be paying a marked premium for the SSD technology.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman



Re: Using Btrfs on single drives

2015-11-14 Thread Marc Joliet
On Sunday 15 November 2015 04:01:57 Duncan wrote:
>audio muze posted on Sun, 15 Nov 2015 05:27:00 +0200 as excerpted:
>> I've gone ahead and created a single drive Btrfs filesystem on a 3TB
>> drive and started copying content from a raid5 array to the Btrfs
>> volume.  Initially copy speeds were very good sustained at ~145MB/s and
>> I left it to run overnight.  This morning I ran btrfs fi usage
>> /mnt/btrfs and it reported around 700GB free.  I selected another folder
>> containing 204GB and started a copy operation, again from the raid5
>> array to the Btrfs volume.  Copying is now materially slower and slowing
>> further...it started at ~105MB/s and after 141GB has slowed to around
>> 97MB/s.  Is this to be expected with Btrfs or have I come across a bug
>> of some sort?
>
>That looks to /me/ like native drive limitations.
>
[Snip nice explanation]

I'll just add that I see this with my 3TB USB3 HDD, too, but also with my 
internal HDDs.  Old drives (the oldest I had were about 10 years old) also had 
this problem, only scaled appropriately (the worst was something like 40/60 
MB/s min./max.).

You can also see this very nicely with scrub runs (I use dstat for this):  
they start out at the max., but gradually slow down as they progress.

HTH
-- 
Marc Joliet
--
"People who think they know everything really annoy those of us who know we
don't" - Bjarne Stroustrup




Re: Where is the disk space?

2015-11-14 Thread Liu Bo
Hi,

On Fri, Nov 13, 2015 at 09:41:01AM -0800, Marc MERLIN wrote:
> root@polgara:/mnt/btrfs_root# du -sh *
> 28G     @
> 28G     @_hourly.20151113_08:04:01
> 4.0K    @_last
> 4.0K    @_last_rw
> 28G     @_rw.20151113_00:02:01
> root@polgara:/mnt/btrfs_root# df -h .
> Filesystem  Size  Used Avail Use% Mounted on
> /dev/sdb5   56G   40G  5.4G  89% /mnt/btrfs_root
> 
> root@polgara:/mnt/btrfs_root# btrfs fi df .
> Data, single: total=39.85GiB, used=38.52GiB
> System, DUP: total=8.00MiB, used=16.00KiB
> System, single: total=4.00MiB, used=0.00B
> Metadata, DUP: total=6.00GiB, used=579.17MiB
> Metadata, single: total=8.00MiB, used=0.00B
> GlobalReserve, single: total=208.00MiB, used=0.00B
> 
> root@polgara:/mnt/btrfs_root# btrfs fi show .
> Label: 'btrfs_root'  uuid: a2a1ed7b-6bfe-4e83-bc10-727126ed17bf
> Total devices 1 FS bytes used 39.09GiB
> devid    1 size 55.88GiB used 51.88GiB path /dev/sdb5
> 
> btrfs-progs v4.0-dirty
> root@polgara:/mnt/btrfs_root# 
> 
> root@polgara:/mnt/btrfs_root# btrfs balance start -dusage=80 -v 
> /mnt/btrfs_root
> Dumping filters: flags 0x1, state 0x0, force is off
>   DATA (flags 0x2): balancing, usage=80
> Done, had to relocate 1 out of 55 chunks
> 
> Sadly, it's only running 3.17.8 because of complicated reasons, but still, 
> 
> 1) I have 28GB used (modulo a few files between the btrfs send snapshots and
> current status)
> 
> 2) fi show shows I'm using 39GB, not sure where the extra 11GB came from
> 
> 3) fi df agrees with fi show
> 
> 4) regular df agrees on used too, but shows 5GB free instead of 15GB despite
> the filesystem being balanced.
> 
> I did have a bunch of snapshots that I did delete a while ago now, but it
> looks like their blocks aren't being reclaimed.
> 
> Any ideas?
> 

Since you said you have some snapshots in between...I can think of one
case to prove where the space goes,

Say you have a file with size=10M on a freshly created partition (the total
used data space is 10M), and you have a snapshot which owns this file. Then you
modify the original file by overwriting the range [3M, 5M], and right now you
can find that the total used data space increases to 15M or maybe more (because
of unaligned writes and extents padded to 4K length).
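
A quick way to see this in action (a sketch; /mnt is a hypothetical
freshly created btrfs mount point):

  dd if=/dev/urandom of=/mnt/f bs=1M count=10
  sync; btrfs fi df /mnt                    # ~10M of data used
  btrfs subvolume snapshot /mnt /mnt/snap   # snapshot now pins the extent
  dd if=/dev/urandom of=/mnt/f bs=1M seek=3 count=2 conv=notrunc
  sync; btrfs fi df /mnt                    # usage grows by ~2M or more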

This comes from our COW and extent references implementation, so you get
the benefit of COW but meanwhile have to live with the unreclaimed space.

It's sort of something I was trying to fix, but I found that my approach
led to other problems so I decided to give it up.

Thanks,

-liubo

> Thanks,
> Marc
> -- 
> "A mouse is a device used to point at the xterm you want to type in" - A.S.R.
> Microsoft is to operating systems what McDonalds is to gourmet cooking
> Home page: http://marc.merlins.org/  


More memory more jitters?

2015-11-14 Thread CHENG Yuk-Pong, Daniel
Hi List,


I have read the Gotcha[1] page:

   Files with a lot of random writes can become heavily fragmented
(1+ extents) causing thrashing on HDDs and excessive multi-second
spikes of CPU load on systems with an SSD or **large amount of RAM**.

Why could large amount of memory worsen the problem?

If **too much** memory is a problem, is it possible to limit the
memory btrfs use?

Background info:

I am running a heavy-write database server with 96GB RAM. In the worst
case it causes multiple minutes of high CPU load. Systemd keeps killing
and restarting services, and old jobs don't die because they are stuck in
uninterruptible waits... etc.

Tried with nodatacow, but it seems to only affect new files. It is not a
subvolume option either...
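
For what it's worth, NOCOW can also be set per file or directory with
chattr +C, but it only takes effect on files that are still empty; a
sketch, with hypothetical paths:

  mkdir /srv/db-nocow && chattr +C /srv/db-nocow    # new files inherit +C
  cp --reflink=never /srv/db/old.db /srv/db-nocow/  # data is rewritten NOCOW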


Regards,
Daniel


[1] https://btrfs.wiki.kernel.org/index.php/Gotchas#Fragmentation


Re: Btrfs device initialisation is quite slow

2015-11-14 Thread Henk Slager
It might be that your metadata is quite scattered, and if the 320GB is
an HDD and not an SSD, then this 11s is just what it takes.
Scattered metadata might be caused by the autodefrag mount option, I
think (and by the fs getting older and changing often).

What is the output of   btrfs fi df /

You could run   btrfs balance start -musage=50 /   or a bit higher
number to compact the metadata

If this does not help, it could be that there is some error in the
filesystem that makes btrfs take time to figure out, but I don't have
an example or experience with it. The only thing that could cause even
more excessive mount delays is when you have an interrupted (full)
balance restarting, but that would not be the case every time you
boot.

Maybe a   btrfs scrub start /   could lead to identifying HDD sectors
going bad, but that is unlikely to be the case.
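
To narrow down where the mount time actually goes, something like this
may help (a sketch; unit names vary by distro):

  systemd-analyze blame | head
  systemd-analyze critical-chain local-fs.target
  # and timing the mount by hand on an otherwise idle system:
  time sudo mount -o subvol=rootvol /dev/sdc4 /mnt/test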







On Sat, Nov 14, 2015 at 5:38 AM, Robbie Smith  wrote:
> Hey all
>
> I've been trying to figure out why my system (home desktop) is taking
> so long to boot. Systemd-analyze tells me that my root filesystem
> partition (which is btrfs) takes ~11 seconds to become active, and I'm
> curious as to why and whether or not I can optimise this.
>
> The primary disk has 4 partitions: a EFI/BIOS boot partion (for GRUB);
> a /boot partition (ext4); a swap partition; and the root partition. The
> disk itself is not particularly large (320 GB), and I'm using
> subvolumes to emulate partitions in btrfs. There are three top-level
> subvolumes, for /, /home, and /var, none of which have quotas, and I'm
> not at present doing snapshots because I backup every day to an
> external drive formatted with ext4.
>
> I've got a second 5 TB drive for multimedia that is also btrfs, but it
> only takes ~3 seconds to come online. I had been using a number of bind
> mounts from the multimedia drive to my home folder, so that $HOME/music
> and $HOME/videos point to the library, and replacing them with symlinks
> reduced the time by ~3 seconds, but it still doesn't account for why
> the root device takes so long.
>
> My fstab contains the following:
>
> # /dev/sdc4 LABEL=filesystem
> UUID=4ec80601-4799-4fa8-a711-0171c180f25b /
> btrfs rw,noatime,space_cache,autodefrag,subvol=rootvol 0 0
>
> # /dev/sdc4 LABEL=filesystem
> UUID=4ec80601-4799-4fa8-a711-0171c180f25b /home btrfs 
> rw,noatime,space_cache,autodefrag,subvol=homevol 0 0
>
> # /dev/sdc4 LABEL=filesystem
> UUID=4ec80601-4799-4fa8-a711-0171c180f25b /var btrfs 
> rw,noatime,space_cache,autodefrag,subvol=var 0 0
>
> # /dev/sdc2 LABEL=boot
> UUID=ca281471-0aac-4090-8660-33b8b9fee5a3 /boot ext4 rw,relatime,data=ordered 
> 0 2
>
> # /dev/sdb1 LABEL=library
> UUID=97226949-50e0-4a78-899e-863f5b436bcc /mnt/library btrfs 
> rw,noatime,space_cache,autodefrag 0 0
>
>
> Can anyone offer any insights or advice?


Re: Using Btrfs on single drives

2015-11-14 Thread Duncan
Goffredo Baroncelli posted on Sat, 14 Nov 2015 12:09:21 +0100 as
excerpted:

> On 2015-11-14 11:43, audio muze wrote:
>> I can turn checksumming off given it's of no utility where a Btrfs
>> volume is comprised of a single device only?
> 
> The checksums are used to detect a data corruption; in case of a
> btrfs-raid, the checksums are used *also* to pick the good copy.

And yes, you can turn them off (for data, not metadata), using the 
nodatasum mount option.
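
A sketch of what that looks like (device and mount point hypothetical):

  sudo mount -o nodatasum /dev/sdX /mnt   # data unchecksummed; metadata still is

Note that nodatasum only applies to newly written data; existing extents 
keep (and keep verifying) their checksums.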

Tho personally, I prefer raid1, not just for the normal raid1 capacities, 
but for the ability to scrub corrupt data as well, and thus would never 
turn off checksumming here (except possibly in the context of nocow, for 
vm images, etc).

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman



Re: More memory more jitters?

2015-11-14 Thread Hugo Mills
On Sat, Nov 14, 2015 at 10:11:31PM +0800, CHENG Yuk-Pong, Daniel  wrote:
> Hi List,
> 
> 
> I have read the Gotcha[1] page:
> 
>Files with a lot of random writes can become heavily fragmented
> (1+ extents) causing thrashing on HDDs and excessive multi-second
> spikes of CPU load on systems with an SSD or **large amount of RAM**.
> 
> Why could large amount of memory worsen the problem?

   Because the kernel will hang on to lots of changes in RAM for
longer. With less memory, there's more pressure to write out dirty
pages to disk, so the changes get written out in smaller pieces more
often. With more memory, the changes being written out get "lumpier".
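
   This is easy to watch while a big write is in flight; a sketch:

  watch -n1 'grep -E "^(Dirty|Writeback):" /proc/meminfo'

   On a large-memory box the Dirty figure can climb into the gigabytes
before writeback kicks in.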

> If **too much** memory is a problem, is it possible to limit the
> memory btrfs use?

   There's some VM knobs you can twiddle, I believe, but I haven't
really played with them myself -- I'm sure there's more knowledgeable
people around here who can suggest suitable things to play with.

   Hugo.

> Background info:
> 
> I am running a heavy-write database server with 96GB RAM. In the worst
> case it causes multiple minutes of high CPU load. Systemd keeps killing
> and restarting services, and old jobs don't die because they are stuck in
> uninterruptible waits... etc.
> 
> Tried with nodatacow, but it seems to only affect new files. It is not a
> subvolume option either...
> 
> 
> Regards,
> Daniel
> 
> 
> [1] https://btrfs.wiki.kernel.org/index.php/Gotchas#Fragmentation

-- 
Hugo Mills | Anyone who says their system is completely secure
hugo@... carfax.org.uk | understands neither systems nor security.
http://carfax.org.uk/  |
PGP: E2AB1DE4  |  Bruce Schneier




Re: More memory more jitters?

2015-11-14 Thread Duncan
Hugo Mills posted on Sat, 14 Nov 2015 14:31:12 + as excerpted:

>> I have read the Gotcha[1] page:
>> 
>>Files with a lot of random writes can become heavily fragmented
>> (1+ extents) causing thrashing on HDDs and excessive multi-second
>> spikes of CPU load on systems with an SSD or **large amount of RAM**.
>> 
>> Why could large amount of memory worsen the problem?
> 
>Because the kernel will hang on to lots of changes in RAM for
> longer. With less memory, there's more pressure to write out dirty pages
> to disk, so the changes get written out in smaller pieces more often.
> With more memory, the changes being written out get "lumpier".
> 
>> If **too much** memory is a problem, is it possible to limit the memory
>> btrfs use?
> 
>There's some VM knobs you can twiddle, I believe, but I haven't
> really played with them myself -- I'm sure there's more knowledgeable
> people around here who can suggest suitable things to play with.

Yes.  Don't have time to explain now, but I will later, if nobody beats 
me to it.



-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman



Re: bad extent [5993525264384, 5993525280768), type mismatch with chunk

2015-11-14 Thread Qu Wenruo



On 2015-11-14 10:29, Christoph Anton Mitterer wrote:

On Sat, 2015-11-14 at 09:22 +0800, Qu Wenruo wrote:

Manually checked they all.

thanks a lot :-)



Strangely, they are all OK... although it's good news for you.

Oh man... you're so mean ;-D



They are all tree blocks and are all in metadata block group.

and I guess that's... expected/intended?


Yes, that's the expected behavior.
But it doesn't match the btrfsck error report.




It seems to be a btrfsck false alert

that's a relief (for me)

Well I've already started to copy all files from the device to a new
one... unfortunately I'll lose all older snapshots (at least on the
new fs) but instead I get skinny-metadata, which wasn't the default
back then.


Skinny metadata is quite a nice feature; it hugely reduces the size of 
metadata extent items.



(being able to copy a full fs, with all subvols/snapshots is IMHO
really something that should be worked on)



If type is wrong, all the extents inside the chunk should be reported
as mismatch type with chunk.

Isn't that the case? At least there are so many reported extents...


If you posted all the output, that's just a little more than nothing.
Just tens of error reported, compared to millions of extents.
And in your case, if a chunk is really bad, it will report about 65K errors.




And according to the dump result, the reported ones are not continuous;
even though they have adjacent extents, the adjacent ones are not reported.

I'm not so deep into btrfs... is this kinda expected and if not how
could all this happen? Or is it really just a check issue and
filesystem-wise fully as it should be?


I think it's a btrfsck issue, at least from the dump info, your extent 
tree is OK.
And if there is no other error reported from btrfsck, your filesystem 
should be OK.






Did you have any smaller btrfs with the same false alert?

Uhm... I can check, but I don't think so, especially as all other btrfs
I have are newer and already have skinny-metadata.
The only ones I had without are those two big 8TB HDDs...
Unfortunately they contain sensitive data from work, which I don't
think I can copy, otherwise  could have sent you the device or so...


I'll check the code to find what's wrong, but if you have any
small enough image, debugging will be much, much faster.

In any case, I'll keep the fs in question for a while, so that I can do
verifications in case you have patches.


Nice.

Thanks,
Qu


thanks a lot,
Chris.

