Re: [PATCH 13/20] btrfs-progs: use cmd_struct as command entry point

2018-03-11 Thread Jeff Mahoney
On 3/7/18 9:40 PM, je...@suse.com wrote:
> diff --git a/cmds-filesystem.c b/cmds-filesystem.c
> index 62112705..ec038f2f 100644
> --- a/cmds-filesystem.c
> +++ b/cmds-filesystem.c
> @@ -1075,6 +1078,7 @@ next:
>  
>   return !!defrag_global_errors;
>  }
> +static DEFINE_SIMPLE_COMMAND(filesystem_defrag, "defrag");

"defragment"

-Jeff

-- 
Jeff Mahoney
SUSE Labs



signature.asc
Description: OpenPGP digital signature


Re: [PATCH 13/20] btrfs-progs: use cmd_struct as command entry point

2018-03-11 Thread Jeff Mahoney
On 3/7/18 9:40 PM, je...@suse.com wrote:
> From: Jeff Mahoney 
> diff --git a/cmds-inspect.c b/cmds-inspect.c
> index afd7fe48..12f200b3 100644
> --- a/cmds-inspect.c
> +++ b/cmds-inspect.c
> @@ -625,33 +629,27 @@ static int cmd_inspect_min_dev_size(int argc, char 
> **argv)
>  out:
>   return !!ret;
>  }
> +static DEFINE_SIMPLE_COMMAND(inspect_min_dev_size, "min-dev-size");
>  
>  static const char inspect_cmd_group_info[] =
>  "query various internal information";
>  
> -const struct cmd_group inspect_cmd_group = {
> +static const struct cmd_group inspect_cmd_group = {
>   inspect_cmd_group_usage, inspect_cmd_group_info, {
> - { "inode-resolve", cmd_inspect_inode_resolve,
> - cmd_inspect_inode_resolve_usage, NULL, 0 },
> - { "logical-resolve", cmd_inspect_logical_resolve,
> - cmd_inspect_logical_resolve_usage, NULL, 0 },
> - { "subvolid-resolve", cmd_inspect_subvolid_resolve,
> - cmd_inspect_subvolid_resolve_usage, NULL, 0 },
> - { "rootid", cmd_inspect_rootid, cmd_inspect_rootid_usage, NULL,
> - 0 },
> - { "min-dev-size", cmd_inspect_min_dev_size,
> - cmd_inspect_min_dev_size_usage, NULL, 0 },
> - { "dump-tree", cmd_inspect_dump_tree,
> - cmd_inspect_dump_tree_usage, NULL, 0 },
> - { "dump-super", cmd_inspect_dump_super,
> - cmd_inspect_dump_super_usage, NULL, 0 },
> - { "tree-stats", cmd_inspect_tree_stats,
> - cmd_inspect_tree_stats_usage, NULL, 0 },
> - NULL_CMD_STRUCT
> + _struct_inspect_inode_resolve,
> + _struct_inspect_logical_resolve,
> + _struct_inspect_subvolid_resolve,
> + _struct_inspect_rootid,
> + _struct_inspect_min_dev_size,
> + _struct_inspect_dump_tree,
> + _struct_inspect_dump_super,
> + _struct_inspect_tree_stats,
> + NULL
>   }
>  };
>  
> -int cmd_inspect(int argc, char **argv)
> +static int cmd_inspect(int argc, char **argv)
>  {
>   return handle_command_group(_cmd_group, argc, argv);
>  }
> +DEFINE_GROUP_COMMAND(inspect, "inspect");

"inspect-internal"

-Jeff


-- 
Jeff Mahoney
SUSE Labs



signature.asc
Description: OpenPGP digital signature


Re: [PATCH v2] btrfs-progs: dump-tree: add degraded option

2018-03-11 Thread Qu Wenruo


On 2018年03月10日 21:54, Anand Jain wrote:
> btrfs inspect dump-tree cli picks the disk with the largest generation
> to read the root tree, even when all the devices were not provided in
> the cli. But in 2 disks RAID1 you may need to know what's in the disks
> individually, so this option -x | --noscan indicates to use only the
> given disk to dump.
> 
> Signed-off-by: Anand Jain 
> ---
> v1->v2: rename --degraded to --noscan
> 
>  cmds-inspect-dump-tree.c | 7 ++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/cmds-inspect-dump-tree.c b/cmds-inspect-dump-tree.c
> index df44bb635c9c..d2676ce55af7 100644
> --- a/cmds-inspect-dump-tree.c
> +++ b/cmds-inspect-dump-tree.c
> @@ -198,6 +198,7 @@ const char * const cmd_inspect_dump_tree_usage[] = {
>   "-u|--uuid  print only the uuid tree",
>   "-b|--block  print info from the specified block only",
>   "-t|--tree print only tree with the given id (string or 
> number)",
> + "-x|--noscanuse the disk in the arg, do not scan for the 
> disks (for raid1)",

Still it looks a little too restrict.

What about some equivalent just like kernel "device=" mount option?

Thanks,
Qu

>   NULL
>  };
>  
> @@ -234,10 +235,11 @@ int cmd_inspect_dump_tree(int argc, char **argv)
>   { "uuid", no_argument, NULL, 'u'},
>   { "block", required_argument, NULL, 'b'},
>   { "tree", required_argument, NULL, 't'},
> + { "noscan", no_argument, NULL, 'x'},
>   { NULL, 0, NULL, 0 }
>   };
>  
> - c = getopt_long(argc, argv, "deb:rRut:", long_options, NULL);
> + c = getopt_long(argc, argv, "deb:rRut:x", long_options, NULL);
>   if (c < 0)
>   break;
>   switch (c) {
> @@ -286,6 +288,9 @@ int cmd_inspect_dump_tree(int argc, char **argv)
>   }
>   break;
>   }
> + case 'x':
> + open_ctree_flags |= OPEN_CTREE_NO_DEVICES;
> + break;
>   default:
>   usage(cmd_inspect_dump_tree_usage);
>   }
> 



signature.asc
Description: OpenPGP digital signature


Re: [PATCH v3] kernel.h: Skip single-eval logic on literals in min()/max()

2018-03-11 Thread Tobin C. Harding
On Fri, Mar 09, 2018 at 01:10:30PM -0800, Linus Torvalds wrote:
> On Fri, Mar 9, 2018 at 12:05 PM, Kees Cook  wrote:
> > When max() is used in stack array size calculations from literal values
> > (e.g. "char foo[max(sizeof(struct1), sizeof(struct2))]", the compiler
> > thinks this is a dynamic calculation due to the single-eval logic, which
> > is not needed in the literal case. This change removes several accidental
> > stack VLAs from an x86 allmodconfig build:
> 
> Ok, looks good.
> 
> I just have a couple of questions about applying it.
> 
> In particular, if this will help people working on getting rid of
> VLA's in the short term, I can apply it directly. But if people who
> are looking at it (anybody else than Kees?) don't much care, then this
> might be a 4.17 thing or at least "random -mm queue"?

It's easy enough to work on the other VLA removals without basing on
this, no rush.


thanks,
Tobin.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Ongoing Btrfs stability issues

2018-03-11 Thread Christoph Anton Mitterer
On Sun, 2018-03-11 at 18:51 +0100, Goffredo Baroncelli wrote:
> 
> COW is needed to properly checksum the data. Otherwise is not
> possible to ensure the coherency between data and checksum (however I
> have to point out that BTRFS fails even in this case [*]).
> We could rearrange this sentence, saying that: if you want checksum,
> you need COW...

No,... not really... the meta-data is anyway always CoWed... so if you
do checksum *and* notdatacow,..., the only thing that could possibly
happen (in the worst case) is, that data that actually made it
correctly to the disk is falsely determined bad, as the metadata (i.e.
the checksums) weren't upgraded correctly.

That however is probably much less likely than the other way round,..
i.e. bad data went to disk and would be detected with checksuming.


I had lots of discussions about this here on the list, and no one ever
brought up a real argument against it... I also had an off-list
discussion with Chris Mason who IIRC confirmed that it would actually
work as I imagine it... with the only two problems:
- good data possibly be marked bad because of bad checksums
- reads giving back EIO where people would rather prefer bad data
(not really sure if this were really his two arguments,... I'd have to
look it up, so don't nail me down).


Long story short:

In any case, I think giving back bad data without EIO is unacceptable.
If someone really doesn't care (e.g. because he has higher level
checksumming and possibly even repair) he could still manually disable
checksumming.

The little chance of having a false positive weights IMO far less that
have very large amounts of data (DBs, VM images are our typical cases)
completely unprotected.

And not having checksumming with notdatacow breaks any safe raid repair
(so in that case "repair" may even overwrite good data),... which is
IMO also unacceptable.
And the typical use cases for nodatacow (VMs, DBs) are in turn not so
uncommon to want RAID.


I really like btrfs,... and it's not that other fs (which typically
have no checksumming at all) would perform better here... but not
having it for these major use case is a big disappointment for me.


Cheers,
Chris.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3] kernel.h: Skip single-eval logic on literals in min()/max()

2018-03-11 Thread Linus Torvalds
On Sun, Mar 11, 2018 at 4:05 AM, Ingo Molnar  wrote:
>
> BTW., while I fully agree with everything you said, it's not entirely correct 
> to
> claim that if a C compiler can generate VLA code it is necessarily able to 
> parse
> and evaluate constant array sizes "just fine".
>
> Constant expressions are typically parsed very early on, at the preprocessing
> stage. They can be used with some preprocessor directives as well, such as 
> '#if'
> (with some further limitations on their syntax).

Yes. But constant simplification and CSE etc is just a very
fundamental part of a compiler, and anybody who actually implements
VLA's would have to do it anyway.

So no, a message like

  warning: Array declaration is not a C90 constant expression,
resulting in VLA code generation

would be moronic. Only some completely mindless broken shit would do
"oh, it's not a parse-time constant, so it will be variable". The two
just do not follow AT ALL.

So the message might be about _possibly_ resulting in VLA code
generation, but honestly, at that point you should just add the
warning when you actually generate the code to do the stack
allocation.

Because at that point, you know whether it's variable or not.

And trust me, it won't be variable for things like (2,3), or even for
our "max()" thing with other odd builtins. Not unless the compiler
doesn't really support VLA at all (maybe some bolted-on crazy thing
that just turns a VLA at the front-end time into just an alloca), or
the user has explicitly asked to disable some fundamental optimization
phase.

   Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Ongoing Btrfs stability issues

2018-03-11 Thread Goffredo Baroncelli
On 03/10/2018 03:29 PM, Christoph Anton Mitterer wrote:
> On Sat, 2018-03-10 at 14:04 +0200, Nikolay Borisov wrote:
>> So for OLTP workloads you definitely want nodatacow enabled, bear in
>> mind this also disables crc checksumming, but your db engine should
>> already have such functionality implemented in it.
> 
> Unlike repeated claims made here on the list and other places... I
> woudln't now *any* DB system which actually does this per default and
> or in a way that would be comparable to filesystem lvl checksumming.
> 

I agree with you, also nobody warn that without checksum in case of RAID 
filesystem BTRFS is not capable anymore to check if a stripe is correct or not

> 
> Look back in the archives... when I've asked several times for
> checksumming support *with* nodatacow, I evaluated the existing status
> for the big ones (postgres,mysql,sqlite,bdb)... and all of them had
> this either not enabled per default, not at all, or requiring special
> support for the program using the DB.
> 
> 
> Similar btw: no single VM image type I've evaluated back then had any
> form of checksumming integrated.
> 
> 
> Still, one of the major deficiencies (not in comparison to other fs,
> but in comparison to how it should be) of btrfs unfortunately :-(

COW is needed to properly checksum the data. Otherwise is not possible to 
ensure the coherency between data and checksum (however I have to point out 
that BTRFS fails even in this case [*]).
We could rearrange this sentence, saying that: if you want checksum, you need 
COW...

> 
> 
> Cheers,
> Chris.
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

[*] https://www.spinics.net/lists/linux-btrfs/msg69185.html

-- 
gpg @keyserver.linux.it: Goffredo Baroncelli 
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Raid1 volume stuck as read-only: How to dump, recreate and restore its content?

2018-03-11 Thread Adam Borowski
On Sun, Mar 11, 2018 at 11:28:08PM +0700, Andreas Hild wrote:
> Following a physical disk failure of a RAID1 array, I tried to mount
> the remaining volume of a root partition with "-o degraded". For some
> reason it ended up as read-only as described here:
> https://btrfs.wiki.kernel.org/index.php/Gotchas#raid1_volumes_only_mountable_once_RW_if_degraded
> 
> 
> How to precisely do this: dump, recreate and restore its contents?
> Could someone please provided more details how to recover this volume
> safely?

> Linux debian 4.9.0-4-amd64 #1 SMP Debian 4.9.65-3 (2017-12-03) x86_64 
> GNU/Linux

> [ 1313.279140] BTRFS warning (device sdb2): missing devices (1)
> exceeds the limit (0), writeable mount is not allowed

I'd recommend instead going with kernel 4.14 or newer (available in
stretch-backports), which handles this case well without the need to
restore.  If there's no actual data loss (there shouldn't be, it's RAID1
with only a single device missing), you can mount degraded normally, then
balance the data onto the new disk.

Recovery with 4.9 is unpleasant.


Meow!
-- 
⢀⣴⠾⠻⢶⣦⠀ 
⣾⠁⢠⠒⠀⣿⡁ A dumb species has no way to open a tuna can.
⢿⡄⠘⠷⠚⠋⠀ A smart species invents a can opener.
⠈⠳⣄ A master species delegates.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Raid1 volume stuck as read-only: How to dump, recreate and restore its content?

2018-03-11 Thread Andreas Hild
Dear All,

Following a physical disk failure of a RAID1 array, I tried to mount
the remaining volume of a root partition with "-o degraded". For some
reason it ended up as read-only as described here:
https://btrfs.wiki.kernel.org/index.php/Gotchas#raid1_volumes_only_mountable_once_RW_if_degraded


How to precisely do this: dump, recreate and restore its contents?
Could someone please provided more details how to recover this volume
safely?

Device 4 is missing as shown below. /dev/sda4 is a new disk available
now. I'm currently using a live disk to recover this root volume.

Many thanks!

Best wishes,
-Andreas


  *uname -a
Linux debian 4.9.0-4-amd64 #1 SMP Debian 4.9.65-3 (2017-12-03) x86_64 GNU/Linux

  *btrfs --version
btrfs-progs v4.7.3

  *btrfs fi show
warning, device 4 is missing
Label: none  uuid: ce222f62-b31b-44d6-94ac-79de595325be
Total devices 2 FS bytes used 27.55GiB
devid1 size 50.00GiB used 35.31GiB path /dev/sdb2
*** Some devices missing

Label: none  uuid: f946ebb5-bdf9-43f4-8252-5c2d997f7021
Total devices 1 FS bytes used 146.54GiB
devid1 size 407.69GiB used 153.16GiB path /dev/sdb3

Label: none  uuid: 9e312f97-b2f5-4a53-9fd0-b4fc65653dd1
Total devices 1 FS bytes used 384.00KiB
devid1 size 50.00GiB used 2.02GiB path /dev/sda4

  *dmesg > dmesg.log
[ 1193.006992] BTRFS error (device sda4): unable to start balance with
target data profile 16
[ 1312.905264] BTRFS info (device sdb2): allowing degraded mounts
[ 1312.905271] BTRFS info (device sdb2): disk space caching is enabled
[ 1312.905275] BTRFS info (device sdb2): has skinny extents
[ 1312.907400] BTRFS warning (device sdb2): devid 4 uuid
18d9aa16-4543-41fa-90c8-560944e61b2c is missing
[ 1312.957486] BTRFS info (device sdb2): bdev (null) errs: wr
120118005, rd 38429879, flush 8815049, corrupt 0, gen 0
[ 1313.279140] BTRFS warning (device sdb2): missing devices (1)
exceeds the limit (0), writeable mount is not allowed
[ 1313.312077] BTRFS error (device sdb2): open_ctree failed
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3] kernel.h: Skip single-eval logic on literals in min()/max()

2018-03-11 Thread Ingo Molnar

* Linus Torvalds  wrote:

> So an error message like
> 
>warning: ISO C90 requires array sizes to be constant-expressions
> 
> would be technically correct and useful from a portability angle. It
> tells you when you're doing something non-portable, and should be
> automatically enabled with "-ansi -pedantic", for example.
> 
> So what's misleading is actually the name of the warning and the
> message, not that it happens. The warning isn't about "variable
> length", it's literally about the rules for what a
> "constant-expression" is.
> 
> And in C, the expression (2,3) has a constant _value_ (namely 3), but
> it isn't a constant-expression as specified by the language.
> 
> Now, the thing is that once you actually do variable length arrays,
> those old front-end rules make no sense any more (outside of the "give
> portability warnings" thing).
> 
> Because once you do variable length arrays, you obviously _parse_
> everything just fine, and you're doing to evaluate much more complex
> expressions than some limited constant-expression rule.

BTW., while I fully agree with everything you said, it's not entirely correct 
to 
claim that if a C compiler can generate VLA code it is necessarily able to 
parse 
and evaluate constant array sizes "just fine".

Constant expressions are typically parsed very early on, at the preprocessing 
stage. They can be used with some preprocessor directives as well, such as 
'#if' 
(with some further limitations on their syntax).

If VLA support is implemented in a later stage, and results in heavy-handed 
code 
generation that will technically work for constant value expressions as well 
but 
results in suboptimal code, then a warning should probably be emitted - and it 
wouldn't be pedantic.

The existing warning is still very misleading:

  warning: ISO C90 forbids variable length array ‘array’ [-Wvla]

... and if my above theory is correct then I think a better warning would be 
something like:

  warning: Array declaration is not a C90 constant expression, resulting in VLA 
code generation

... and note that in this specific case it's not misleading to talk about VLAs 
in 
the warning text, because the array size, even if it's constant value, results 
in 
VLA code generation.

I don't know whether GCC has such a limitation, but a quick experiment with GCC 
7.2 suggests that a (2,3) array size expression results in a lot more code 
being 
generated than with a constant expression.

Thanks,

Ingo
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html