Re: Hand Patching a BTRFS Superblock?

2017-12-29 Thread Qu Wenruo


On 2017-12-30 03:30, Stirling Westrup wrote:
> You were right! grep found two more signature blocks! How do I make use of 
> them?
> 
> videon:~ # LC_ALL=C grep -obUaP "\x5F\x42\x48\x52\x66\x53\x5F\x4D" /dev/sde
> 65600:_BHRfS_M

This is the correct one.
Its offset is 64K + 64.

> 26697111807:_BHRfS_M

It is a little tricky now.

Btrfs has its superblocks at:
64K (primary)
64M (backup 1)
256G (backup 2)

This one, however, is at about 25G, which is none of those locations,
and its offset within the block is not 64 (where the magic sits inside
a superblock).

Is there any btrfs image stored inside the fs?

> 26854350428:_BHRfS_M

Much like the previous one.

Even so, you could try "btrfs inspect-internal dump-super --bytenr" to
check whether it's the super you want.

The bytenr values you could pass are:
26697111743
26854350364
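
For anyone following along, the arithmetic behind those numbers can be
sketched like this. It is only an illustration: the helper names are made
up for this example, and the mirror locations and the 64-byte magic offset
are the ones quoted in this mail.

```python
# Sketch: turn grep match offsets into candidate superblock bytenrs.
# Helper names are hypothetical; the constants come from the mail above.
MAGIC_OFFSET = 64                  # the magic sits 64 bytes into a superblock
SUPER_OFFSETS = (
    64 * 1024,                     # primary, 64K
    64 * 1024**2,                  # backup 1, 64M
    256 * 1024**3,                 # backup 2, 256G
)


def candidate_bytenr(match_offset):
    """Start of the would-be superblock for a magic found at match_offset."""
    return match_offset - MAGIC_OFFSET


def is_standard_location(match_offset):
    """True if the would-be superblock starts at one of the known offsets."""
    return candidate_bytenr(match_offset) in SUPER_OFFSETS


for off in (65600, 26697111807, 26854350428):
    print(off, candidate_bytenr(off), is_standard_location(off))
```

Only the first match lands on a standard location; the other two merely
give a bytenr that can be handed to dump-super for a closer look.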

At this point, I would say the chance of recovering data is very low.

Thanks,
Qu
> 
> On Thu, Dec 28, 2017 at 11:00 PM, Qu Wenruo  wrote:
>>
>>
>> On 2017-12-29 11:35, Stirling Westrup wrote:
>>> On Thu, Dec 28, 2017 at 9:08 PM, Qu Wenruo wrote:
>>>>
>>>> I strongly recommend to do a binary search for magic number "5f42 4852
>>>> 6653 5f4d" to locate the real offset (if it's offset, not a toasted image)
>>>>
>>> I don't understand, how would I do a binary search for that signature?
>>>
>> The most stupid idea is to use xxd and grep.
>>
>> Something like:
>>
>> # xxd /dev/sde | grep 5f42 -C1
>>
> 
> 
> 





Re: btrfs balance problems

2017-12-29 Thread Hans van Kranenburg
On 12/28/2017 12:15 PM, Nikolay Borisov wrote:
> 
> On 23.12.2017 13:19, James Courtier-Dutton wrote:
>>
>> During a btrfs balance, the process hogs all CPU.
>> Or, to be exact, any other program that wishes to use the SSD during a
>> btrfs balance is blocked for long periods. Long periods being more
>> than 5 seconds.
>> Is there any way to multiplex SSD access while btrfs balance is
>> operating, so that other applications can still access the SSD with
>> relatively low latency?
>>
>> My guess is that btrfs is doing a transaction with a large number of
>> SSD blocks at a time, and thus blocking other applications.
>>
>> This makes for atrocious user interactivity as well as applications
>> failing because they cannot access the disk in a relatively low latent
>> manner.
>> For, example, this is causing a High Definition network CCTV
>> application to fail.
>>
>> What I would really like, is for some way to limit SSD bandwidths to
>> applications.
>> For example the CCTV app always gets the bandwidth it needs, and all
>> other applications can still access the SSD, but are rate limited.
>> This would fix my particular problem.
>> We have rate limiting for network applications, why not disk access also?
> 
> So how are you running btrfs balance?

Or, to again take one step further back...

*Why* are you running btrfs balance at all?

:)

> Are you using any filters
> whatsoever? The documentation
> [https://btrfs.wiki.kernel.org/index.php/Manpage/btrfs-balance] has the
> following warning:
> 
> Warning: running balance without filters will take a lot of time as it
> basically rewrites the entire filesystem and needs to update all block
> pointers.

-- 
Hans van Kranenburg
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: btrfs balance problems

2017-12-29 Thread Kai Krakow
On Thu, 28 Dec 2017 00:39:37 +0000, Duncan wrote:

>> How can I get btrfs balance to work in the background, without adversely
>> affecting other applications?
> 
> I'd actually suggest a different strategy.
> 
> What I did here way back when I was still on reiserfs on spinning rust,
> where it made more difference than on ssd, but I kept the settings when
> I switched to ssd and btrfs, and at least some others have mentioned
> that similar settings helped them on btrfs as well, is...
> 
> Problem: The kernel virtual-memory subsystem's writeback cache was
> originally configured for systems with well under a Gigabyte of RAM, and
> the defaults no longer work so well on multi-GiB-RAM systems,
> particularly above 8 GiB RAM, because they are based on a percentage of
> available RAM, and will typically let several GiB of dirty writeback
> cache accumulate before kicking off any attempt to actually write it to
> storage.  On spinning rust, when writeback /does/ finally kickoff, this
> can result in hogging the IO for well over half a minute at a time,
> where 30 seconds also happens to be the default "flush it anyway" time.

This is somewhat like the bufferbloat discussion in networking: big
buffers increase latency. There is more than one type of buffer.
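
To put rough numbers on what Duncan describes, here is a
back-of-the-envelope sketch. The 10% and 20% figures are the usual kernel
defaults for vm.dirty_background_ratio and vm.dirty_ratio; the RAM sizes
and the 100 MiB/s write speed are made-up example values, and the real
thresholds are derived from available rather than total memory, so treat
the output as an order of magnitude only.

```python
# Sketch: how much dirty data the percentage-based defaults allow, and how
# long flushing it would take. All inputs are illustrative assumptions.
GIB = 1024**3
MIB = 1024**2


def dirty_limits(ram_bytes, background_ratio=10, dirty_ratio=20):
    """Return (background, hard) dirty thresholds in bytes."""
    return (ram_bytes * background_ratio // 100,
            ram_bytes * dirty_ratio // 100)


def flush_seconds(dirty_bytes, write_bytes_per_sec):
    """Time needed to write dirty_bytes at a sustained write speed."""
    return dirty_bytes / write_bytes_per_sec


for ram_gib in (1, 8, 32):
    background, hard = dirty_limits(ram_gib * GIB)
    print(f"{ram_gib:>2} GiB RAM: background {background / MIB:.0f} MiB, "
          f"hard limit {hard / MIB:.0f} MiB, "
          f"about {flush_seconds(hard, 100 * MIB):.0f} s at 100 MiB/s")
```

The same percentages that were harmless on a small machine translate into
tens of seconds of queued writes on a large one, which is the latency
problem in a nutshell.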

In addition to what Duncan wrote (the first type of buffer), the kernel
recently got a new option to fight this "buffer bloat": writeback
throttling. It may help to enable that option.

The second type of buffer is the io queue.

So, you may also want to lower the io queue depth (nr_requests) of your 
devices. I think it defaults to 128 while most consumer drives only have 
a queue depth of 31 or 32 commands. Thus, reducing nr_requests for some 
of your devices may help you achieve better latency (but reduces 
throughput).

Especially if working with io schedulers that do not implement io 
priorities, you could simply lower nr_requests to around or below the 
native command queue depth of your devices. The device itself can handle 
it better in that case, especially on spinning rust, as the firmware 
knows when to pull certain selected commands from the queue during a 
rotation of the media. The kernel knows nothing about rotary positions, 
it can only use the queue to prioritize and reorder requests but cannot 
take advantage of rotary positions of the heads.

See

$ grep ^ /sys/block/*/queue/nr_requests
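
The same check can be scripted, which is handy when comparing several
machines. This is a minimal sketch: the function names and the default
native depth of 32 are assumptions for illustration, so use whatever your
drive actually reports.

```python
# Sketch: collect nr_requests per block device and flag the ones that are
# above an assumed native command queue depth.
import glob
import os


def read_nr_requests(sys_block="/sys/block"):
    """Map device name -> nr_requests, skipping unreadable entries."""
    result = {}
    pattern = os.path.join(sys_block, "*", "queue", "nr_requests")
    for path in glob.glob(pattern):
        dev = path.split(os.sep)[-3]
        try:
            with open(path) as f:
                result[dev] = int(f.read().strip())
        except (OSError, ValueError):
            continue
    return result


def over_native_depth(nr_requests, native_depth=32):
    """Devices whose kernel queue is deeper than the assumed drive queue."""
    return {dev: n for dev, n in nr_requests.items() if n > native_depth}


if __name__ == "__main__":
    for dev, n in sorted(over_native_depth(read_nr_requests()).items()):
        print(f"{dev}: nr_requests={n}")
```

Changing the value is done by writing the new number to the same sysfs
file as root; the sketch deliberately only reads.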


You may also get better results by increasing nr_requests instead, but
at the cost of also adjusting the write buffer sizes: with a large
nr_requests you don't want writes to block so early, at least not when
you need good latency. This probably works best with schedulers that
care about latency, like deadline or kyber.

For testing, keep in mind that all of these settings interact. So change
one at a time, run your tests, then change another and see how that
relates to the first change, even if the first change made things worse.

Another tip that's missing: put different access classes onto different
devices. That is, if you have a directory structure that's mostly written
to, put it on its own physical devices, with separate tuning and an
appropriate filesystem (log-structured and CoW filesystems are good at
streaming writes). Put read-mostly workloads on their own devices and
filesystems as well. Put realtime workloads on their own devices and
filesystems. This gives you a much better chance to succeed.


-- 
Regards,
Kai

Replies to list-only preferred.



Re: [PATCH v2] Btrfs: enchanse raid1/10 balance heuristic

2017-12-29 Thread Timofey Titovets
2017-12-29 22:14 GMT+03:00 Dmitrii Tcvetkov :
> On Fri, 29 Dec 2017 21:44:19 +0300
> Dmitrii Tcvetkov  wrote:
>> > +/**
>> > + * guess_optimal - return guessed optimal mirror
>> > + *
>> > + * Optimal expected to be pid % num_stripes
>> > + *
>> > + * That's generaly ok for spread load
>> > + * Add some balancer based on queue leght to device
>> > + *
>> > + * Basic ideas:
>> > + *  - Sequential read generate low amount of request
>> > + *so if load of drives are equal, use pid % num_stripes
>> > balancing
>> > + *  - For mixed rotate/non-rotate mirrors, pick non-rotate as
>> > optimal
>> > + *and repick if other dev have "significant" less queue lenght
>> > + *  - Repick optimal if queue leght of other mirror are less
>> > + */
>> > +static int guess_optimal(struct map_lookup *map, int optimal)
>> > +{
>> > +   int i;
>> > +   int round_down = 8;
>> > +   int num = map->num_stripes;
>>
>> num has to be initialized from map->sub_stripes if we're reading
>> RAID10, otherwise there will be NULL pointer dereference
>>
>
> Check can be like:
> if (map->type & BTRFS_BLOCK_GROUP_RAID10)
> num = map->sub_stripes;
>
>>@@ -5804,10 +5914,12 @@ static int __btrfs_map_block(struct btrfs_fs_info *fs_info,
>>                       stripe_index += mirror_num - 1;
>>               else {
>>                       int old_stripe_index = stripe_index;
>>+                      optimal = guess_optimal(map,
>>+                                      current->pid % map->num_stripes);
>>                       stripe_index = find_live_mirror(fs_info, map,
>>                                             stripe_index,
>>                                             map->sub_stripes, stripe_index +
>>-                                            current->pid % map->sub_stripes,
>>+                                            optimal,
>>                                             dev_replace_is_ongoing);
>>                       mirror_num = stripe_index - old_stripe_index + 1;
>>               }
>>--
>>2.15.1
>
> Also here calculation should be with map->sub_stripes too.

Why do you think we need such a check?
guess_optimal() is always called right before find_live_mirror(),
both in the same context, like this:

if (map->type & BTRFS_BLOCK_GROUP_RAID10) {
        u32 factor = map->num_stripes / map->sub_stripes;

        stripe_nr = div_u64_rem(stripe_nr, factor, &stripe_index);
        stripe_index *= map->sub_stripes;

        if (need_full_stripe(op))
                num_stripes = map->sub_stripes;
        else if (mirror_num)
                stripe_index += mirror_num - 1;
        else {
                int old_stripe_index = stripe_index;
                stripe_index = find_live_mirror(fs_info, map,
                                stripe_index,
                                map->sub_stripes, stripe_index +
                                current->pid % map->sub_stripes,
                                dev_replace_is_ongoing);
                mirror_num = stripe_index - old_stripe_index + 1;
        }

So it is useless to check that internally.
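
To make the index arithmetic in that excerpt easier to follow, here is a
small model of it. It is a sketch with made-up function names, not kernel
code; it only mirrors the div/rem and multiply steps shown above and
compares the two modulo choices discussed in this thread.

```python
# Sketch: RAID10 mirror selection arithmetic, modelled on the excerpt above.
def raid10_mirror_candidates(stripe_nr, num_stripes, sub_stripes):
    """Return (new stripe_nr, stripe indexes holding copies of the data)."""
    factor = num_stripes // sub_stripes
    # div_u64_rem(): quotient is returned, remainder goes to stripe_index
    stripe_nr, stripe_index = divmod(stripe_nr, factor)
    stripe_index *= sub_stripes
    return stripe_nr, list(range(stripe_index, stripe_index + sub_stripes))


def pick(first_index, pid, modulo):
    """Preferred mirror as 'first_index + pid % modulo'."""
    return first_index + pid % modulo


num_stripes, sub_stripes = 4, 2
_, mirrors = raid10_mirror_candidates(5, num_stripes, sub_stripes)
print("mirror candidates:", mirrors)
print("pid % sub_stripes:", pick(mirrors[0], 1002, sub_stripes))
print("pid % num_stripes:", pick(mirrors[0], 1002, num_stripes))
```

With four stripes and two sub-stripes the second pick falls outside the
stripe array, which is the situation Dmitrii is worried about. Whether
that really ends in a NULL dereference depends on code not shown in this
thread.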

---
Also, fio results for all hdd raid1, results from waxhead:

Original:

Disk-4k-randread-depth-32: (g=0): rw=randread, bs=(R) 4096B-512KiB,
(W) 4096B-512KiB, (T) 4096B-512KiB, ioengine=libaio, iodepth=32
Disk-4k-read-depth-8: (g=0): rw=read, bs=(R) 4096B-512KiB, (W)
4096B-512KiB, (T) 4096B-512KiB, ioengine=libaio, iodepth=8
Disk-4k-randwrite-depth-8: (g=0): rw=randwrite, bs=(R) 4096B-512KiB,
(W) 4096B-512KiB, (T) 4096B-512KiB, ioengine=libaio, iodepth=8
fio-3.1
Starting 3 processes
Disk-4k-randread-depth-32: Laying out IO file (1 file / 65536MiB)
Jobs: 3 (f=3): [r(1),R(1),w(1)][100.0%][r=120MiB/s,w=9.88MiB/s][r=998,w=96
IOPS][eta 00m:00s]
Disk-4k-randread-depth-32: (groupid=0, jobs=1): err= 0: pid=3132: Fri
Dec 29 16:16:33 2017
   read: IOPS=375, BW=41.3MiB/s (43.3MB/s)(24.2GiB/600128msec)
slat (usec): min=15, max=206039, avg=88.71, stdev=990.35
clat (usec): min=357, max=3487.1k, avg=85022.93, stdev=141872.25
 lat (usec): min=399, max=3487.2k, avg=85112.58, stdev=141880.31
clat percentiles (msec):
 |  1.00th=[5],  5.00th=[7], 10.00th=[9], 20.00th=[   13],
 | 30.00th=[   19], 40.00th=[   27], 50.00th=[   39], 60.00th=[   56],
 | 70.00th=[   83], 80.00th=[  127], 90.00th=[  209], 95.00th=[  300],
 | 99.00th=[  600], 99.50th=[  852], 99.90th=[ 1703], 99.95th=[ 2165],
 | 99.99th=[ 2937]
   bw (  KiB/s): min=  392, max=75824, per=30.46%, avg=42736.09,
stdev=12019.09, samples=1186
   iops: min=3, max=  500, avg=380.24, stdev=99.50, samples=1186
  lat (usec)   : 500=0.01%, 750=0.01%, 1000=0.01%
  lat (msec)   : 2=0.01%, 4=0.29%, 10=12.33%, 20=19.67%, 50=24.92%
  lat (msec)   : 100=17.51%, 250=18.05%, 500=5.72%, 750=0.85%, 1000=0.28%
  lat (msec)   : 2000=0.29%, >=2000=0.07%
  cpu  : usr=0.67%, sys=4.62%, ctx=215716, majf=0, minf=526
  IO depths: 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=100.0%, 

Re: WARNING: CPU: 1 PID: 3016 at fs/btrfs/ctree.h:1564 btrfs_update_device+0x189/0x190 [btrfs]

2017-12-29 Thread Nikolay Borisov


On 29.12.2017 20:17, Elimar Riesebieter wrote:
> Thanks,
> 
> * Nikolay Borisov  [2017-12-29 19:23 +0200]:
> 
> [...]
> 
>> So OP:
>>
>> Update your btrfs-progs package to latest 4.14 and run btrfs rescue :
>>
>> btrfs rescue fix-device-size 
> 
> I installed btrfs-progs 4.14. Can't run 
> 'btrfs rescue fix-device-size /dev/sd(a|b)3'. The devices are mounted
> including my root... 
> 
> How to accomplish?

Then you have to resize your fs to a multiple of 4k, either up or down.
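
For reference, the target sizes are easy to compute. The sketch below
just rounds a byte count to the neighbouring 4K multiples; the function
name is made up and the example size is not taken from the reporter's
devices. The resulting value would then be passed to an online resize
such as "btrfs filesystem resize <devid>:<size> <mountpoint>".

```python
# Sketch: nearest 4K-aligned sizes below and above a given device size.
SECTOR = 4096


def aligned_sizes(size_bytes, sector=SECTOR):
    """Return (rounded down, rounded up) multiples of sector."""
    down = size_bytes - size_bytes % sector
    up = down if down == size_bytes else down + sector
    return down, up


example = 10 * 4096 + 512          # hypothetical unaligned size in bytes
print(aligned_sizes(example))
```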
> 
> Thanks in advance
> 
> Elimar
> 


Re: Hand Patching a BTRFS Superblock?

2017-12-29 Thread Stirling Westrup
You were right! grep found two more signature blocks! How do I make use of them?

videon:~ # LC_ALL=C grep -obUaP "\x5F\x42\x48\x52\x66\x53\x5F\x4D" /dev/sde
65600:_BHRfS_M
26697111807:_BHRfS_M
26854350428:_BHRfS_M
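
In case grep -P is not available, roughly the same scan can be done with
a few lines of Python. This is a sketch, not a recovery tool: the function
name and chunk size are arbitrary choices, and "binary search" here just
means searching binary data for the magic bytes.

```python
# Sketch: report every byte offset at which the btrfs magic occurs.
MAGIC = b"_BHRfS_M"        # 5f 42 48 52 66 53 5f 4d


def find_magic(path, chunk_size=1 << 20):
    """Yield absolute offsets of MAGIC in the file or device at path."""
    overlap = len(MAGIC) - 1
    with open(path, "rb") as f:
        base = 0           # absolute offset of buf[0]
        buf = b""
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            buf += chunk
            pos = buf.find(MAGIC)
            while pos != -1:
                yield base + pos
                pos = buf.find(MAGIC, pos + 1)
            # Keep a short tail so a match spanning two chunks is found,
            # but never long enough to report the same match twice.
            keep = buf[-overlap:] if len(buf) > overlap else buf
            base += len(buf) - len(keep)
            buf = keep


# Example (needs read access to the device):
#   for offset in find_magic("/dev/sde"):
#       print(offset)
```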

On Thu, Dec 28, 2017 at 11:00 PM, Qu Wenruo  wrote:
>
>
> On 2017-12-29 11:35, Stirling Westrup wrote:
>> On Thu, Dec 28, 2017 at 9:08 PM, Qu Wenruo  wrote:
>>>
>>>
>>
>>>
>>> I strongly recommend to do a binary search for magic number "5f42 4852
>>> 6653 5f4d" to locate the real offset (if it's offset, not a toasted image)
>>>
>> I don't understand, how would I do a binary search for that signature?
>>
> The most stupid idea is to use xxd and grep.
>
> Something like:
>
> # xxd /dev/sde | grep 5f42 -C1
>



-- 
Stirling Westrup
Programmer, Entrepreneur.
https://www.linkedin.com/e/fpf/77228
http://www.linkedin.com/in/swestrup
http://technaut.livejournal.com
http://sourceforge.net/users/stirlingwestrup


Re: [PATCH v2] Btrfs: enchanse raid1/10 balance heuristic

2017-12-29 Thread Dmitrii Tcvetkov
On Fri, 29 Dec 2017 21:44:19 +0300
Dmitrii Tcvetkov  wrote:
> > +/**
> > + * guess_optimal - return guessed optimal mirror
> > + *
> > + * Optimal expected to be pid % num_stripes
> > + *
> > + * That's generaly ok for spread load
> > + * Add some balancer based on queue leght to device
> > + *
> > + * Basic ideas:
> > + *  - Sequential read generate low amount of request
> > + *so if load of drives are equal, use pid % num_stripes
> > balancing
> > + *  - For mixed rotate/non-rotate mirrors, pick non-rotate as
> > optimal
> > + *and repick if other dev have "significant" less queue lenght
> > + *  - Repick optimal if queue leght of other mirror are less
> > + */
> > +static int guess_optimal(struct map_lookup *map, int optimal)
> > +{
> > +   int i;
> > +   int round_down = 8;
> > +   int num = map->num_stripes;  
> 
> num has to be initialized from map->sub_stripes if we're reading
> RAID10, otherwise there will be NULL pointer dereference
> 

The check could look like this:
if (map->type & BTRFS_BLOCK_GROUP_RAID10)
        num = map->sub_stripes;

>@@ -5804,10 +5914,12 @@ static int __btrfs_map_block(struct btrfs_fs_info *fs_info,
>                       stripe_index += mirror_num - 1;
>               else {
>                       int old_stripe_index = stripe_index;
>+                      optimal = guess_optimal(map,
>+                                      current->pid % map->num_stripes);
>                       stripe_index = find_live_mirror(fs_info, map,
>                                             stripe_index,
>                                             map->sub_stripes, stripe_index +
>-                                            current->pid % map->sub_stripes,
>+                                            optimal,
>                                             dev_replace_is_ongoing);
>                       mirror_num = stripe_index - old_stripe_index + 1;
>               }
>-- 
>2.15.1

The calculation here should use map->sub_stripes too.


Re: [PATCH v2] Btrfs: enchanse raid1/10 balance heuristic

2017-12-29 Thread Dmitrii Tcvetkov
On Fri, 29 Dec 2017 05:09:14 +0300
Timofey Titovets  wrote:

> Currently btrfs raid1/10 balancer balance requests to mirrors,
> based on pid % num of mirrors.
> 
> Make logic understood:
>  - if one of underline devices are non rotational
>  - Queue leght to underline devices
> 
> By default try use pid % num_mirrors guessing, but:
>  - If one of mirrors are non rotational, repick optimal to it
>  - If underline mirror have less queue leght then optimal,
>repick to that mirror
> 
> For avoid round-robin request balancing,
> lets round down queue leght:
>  - By 8 for rotational devs
>  - By 2 for all non rotational devs
> 
> Changes:
>   v1 -> v2:
> - Use helper part_in_flight() from genhd.c
>   to get queue lenght
> - Move guess code to guess_optimal()
> - Change balancer logic, try use pid % mirror by default
>   Make balancing on spinning rust if one of underline devices
>   are overloaded
> 
> Signed-off-by: Timofey Titovets 
> ---
>  block/genhd.c  |   1 +
>  fs/btrfs/volumes.c | 116
>  2 files changed, 115 insertions(+), 2 deletions(-)
> 
> diff --git a/block/genhd.c b/block/genhd.c
> index 96a66f671720..a77426a7 100644
> --- a/block/genhd.c
> +++ b/block/genhd.c
> @@ -81,6 +81,7 @@ void part_in_flight(struct request_queue *q, struct hd_struct *part,
>                               atomic_read(&part->in_flight[1]);
>   }
>  }
> +EXPORT_SYMBOL_GPL(part_in_flight);
>  
>  struct hd_struct *__disk_get_part(struct gendisk *disk, int partno)
>  {
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index 9a04245003ab..1c84534df9a5 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -27,6 +27,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include "ctree.h"
>  #include "extent_map.h"
> @@ -5216,6 +5217,112 @@ int btrfs_is_parity_mirror(struct btrfs_fs_info *fs_info, u64 logical, u64 len)
>       return ret;
>  }
>  
> +/**
> + * bdev_get_queue_len - return rounded down in flight queue lenght of bdev
> + *
> + * @bdev: target bdev
> + * @round_down: round factor big for hdd and small for ssd, like 8 and 2
> + */
> +static int bdev_get_queue_len(struct block_device *bdev, int round_down)
> +{
> + int sum;
> + struct hd_struct *bd_part = bdev->bd_part;
> + struct request_queue *rq = bdev_get_queue(bdev);
> + uint32_t inflight[2] = {0, 0};
> +
> + part_in_flight(rq, bd_part, inflight);
> +
> + sum = max_t(uint32_t, inflight[0], inflight[1]);
> +
> + /*
> +  * Try prevent switch for every sneeze
> +  * By roundup output num by some value
> +  */
> + return ALIGN_DOWN(sum, round_down);
> +}
> +
> +/**
> + * guess_optimal - return guessed optimal mirror
> + *
> + * Optimal expected to be pid % num_stripes
> + *
> + * That's generaly ok for spread load
> + * Add some balancer based on queue leght to device
> + *
> + * Basic ideas:
> + *  - Sequential read generate low amount of request
> + *so if load of drives are equal, use pid % num_stripes balancing
> + *  - For mixed rotate/non-rotate mirrors, pick non-rotate as optimal
> + *and repick if other dev have "significant" less queue lenght
> + *  - Repick optimal if queue leght of other mirror are less
> + */
> +static int guess_optimal(struct map_lookup *map, int optimal)
> +{
> + int i;
> + int round_down = 8;
> + int num = map->num_stripes;

num has to be initialized from map->sub_stripes if we're reading RAID10,
otherwise there will be a NULL pointer dereference.

> + int qlen[num];
> + bool is_nonrot[num];
> + bool all_bdev_nonrot = true;
> + bool all_bdev_rotate = true;
> + struct block_device *bdev;
> +
> + if (num == 1)
> + return optimal;
> +
> + /* Check accessible bdevs */
> + for (i = 0; i < num; i++) {
> + /* Init for missing bdevs */
> + is_nonrot[i] = false;
> + qlen[i] = INT_MAX;
> + bdev = map->stripes[i].dev->bdev;
> + if (bdev) {
> + qlen[i] = 0;
> + is_nonrot[i] =
> blk_queue_nonrot(bdev_get_queue(bdev));
> + if (is_nonrot[i])
> + all_bdev_rotate = false;
> + else
> + all_bdev_nonrot = false;
> + }
> + }
> +
> + /*
> +  * Don't bother with computation
> +  * if only one of two bdevs are accessible
> +  */
> + if (num == 2 && qlen[0] != qlen[1]) {
> + if (qlen[0] < qlen[1])
> + return 0;
> + else
> + return 1;
> + }
> +
> + if (all_bdev_nonrot)
> + round_down = 2;
> +
> + for (i = 0; i < num; i++) {
> + if (qlen[i])
> + continue;
> + bdev = map->stripes[i].dev->bdev;
> + qlen[i] = bdev_get_queue_len(bdev, round_down);
> + }
> +
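
The patch text is cut off at this point in the archive, so the rest of
guess_optimal() is not visible here. As a reading aid only, the selection
policy described in the commit message can be modelled as below. This is
a sketch reconstructed from that description, not the patch's actual
code, and the argument layout is an assumption made for this example.

```python
# Sketch of the described heuristic: start from the pid-based default,
# prefer a non-rotational mirror in mixed setups, and repick when another
# mirror has a clearly shorter (rounded-down) queue.
def round_down(value, step):
    """Round value down to a multiple of step, like ALIGN_DOWN()."""
    return value - value % step


def guess_optimal(queue_len, is_nonrot, default):
    """queue_len[i] is the in-flight count of mirror i, None if missing."""
    num = len(queue_len)
    if num == 1:
        return default
    # Coarse rounding for rotational or mixed sets, fine rounding for SSDs.
    step = 2 if all(is_nonrot) else 8
    qlen = [float("inf") if q is None else round_down(q, step)
            for q in queue_len]
    best = default
    # Mixed rotational/non-rotational: start from a non-rotational mirror.
    if any(is_nonrot) and not all(is_nonrot) and not is_nonrot[best]:
        best = is_nonrot.index(True)
    # Repick only if another mirror's rounded queue is strictly shorter.
    for i in range(num):
        if qlen[i] < qlen[best]:
            best = i
    return best


print(guess_optimal([3, 5], [False, False], default=1))    # similar load
print(guess_optimal([3, 20], [False, False], default=1))   # mirror 1 busy
print(guess_optimal([0, 9], [False, True], default=0))     # SSD is busy
```

The rounding is what keeps the choice from flipping on every small
change in queue depth: small differences round to the same value and the
pid-based default stays in place.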

Re: WARNING: CPU: 1 PID: 3016 at fs/btrfs/ctree.h:1564 btrfs_update_device+0x189/0x190 [btrfs]

2017-12-29 Thread Elimar Riesebieter
Thanks,

* Nikolay Borisov  [2017-12-29 19:23 +0200]:

[...]

> So OP:
> 
> Update your btrfs-progs package to latest 4.14 and run btrfs rescue :
> 
> btrfs rescue fix-device-size 

I installed btrfs-progs 4.14. Can't run 
'btrfs rescue fix-device-size /dev/sd(a|b)3'. The devices are mounted
including my root... 

How to accomplish?

Thanks in advance

Elimar
-- 
  Excellent day for drinking heavily.
  Spike the office water cooler;-)


Re: WARNING: CPU: 1 PID: 3016 at fs/btrfs/ctree.h:1564 btrfs_update_device+0x189/0x190 [btrfs]

2017-12-29 Thread Nikolay Borisov


On 29.12.2017 19:07, Holger Hoffstätte wrote:
> 
> Apply the patch from https://patchwork.kernel.org/patch/9960893/
> and follow the logged instructions re. device resizing (or see
> https://bugzilla.kernel.org/show_bug.cgi?id=196949 for examples).
> 
> The patch is unfortunately not yet merged into 4.15rc, otherwise it
> could be sent to 4.14-stable.
> 

This is not the correct way to resolve the issue. Rather, Qu has sent a
patch for btrfs-progs which does the correct repair. The code in
question is in btrfs progs 4.14.


So OP:

Update your btrfs-progs package to latest 4.14 and run btrfs rescue :

btrfs rescue fix-device-size 
> -h
> 


Re: WARNING: CPU: 1 PID: 3016 at fs/btrfs/ctree.h:1564 btrfs_update_device+0x189/0x190 [btrfs]

2017-12-29 Thread Holger Hoffstätte

Apply the patch from https://patchwork.kernel.org/patch/9960893/
and follow the logged instructions re. device resizing (or see
https://bugzilla.kernel.org/show_bug.cgi?id=196949 for examples).

The patch is unfortunately not yet merged into 4.15rc, otherwise it
could be sent to 4.14-stable.

-h



Re: WARNING: CPU: 1 PID: 3016 at fs/btrfs/ctree.h:1564 btrfs_update_device+0x189/0x190 [btrfs]

2017-12-29 Thread Elimar Riesebieter
* Elimar Riesebieter  [2017-12-29 17:33 +0100]:

> Hi all,
> 
> I get warnings as seen in attached dmesg.log. This is on 4.14.9.
> 4.9.72 runs flawless so far.
> 
> ##
> Linux toy 4.14.9-toy-lxtec-amd64 #7 SMP Fri Dec 29 10:43:28 CET 2017 x86_64 
> GNU/Linux
> --
> btrfs-progs v4.13.3
> --
> Label: 'TOY-RAID1'  uuid: 32fc4ea0-0b26-478c-9b2e-b299d6289270
>   Total devices 2 FS bytes used 560.98GiB
>   devid1 size 3.62TiB used 566.03GiB path /dev/sda3
>   devid2 size 3.62TiB used 566.03GiB path /dev/sdb3
> --
> Data, RAID1: total=563.00GiB, used=559.43GiB
> System, RAID1: total=32.00MiB, used=96.00KiB
> Metadata, RAID1: total=3.00GiB, used=1.55GiB
> GlobalReserve, single: total=512.00MiB, used=0.00B
> #
> 
> I want to run this machine as a 24/7 server with the latest
> LT-KERNEL. So please investigate.

Ooops, dmesg attached now.

Many thanks
Elimar
-- 
  Alles, was viel bedacht wird, wird bedenklich!;-)
 Friedrich Nietzsche
[0.00] Linux version 4.14.9-toy-lxtec-amd64 (er@toy) (gcc version 7.2.0 
(Debian 7.2.0-18)) #7 SMP Fri Dec 29 10:43:28 CET 2017
[0.00] Command line: BOOT_IMAGE=/boot/vmlinuz-4.14.9-toy-lxtec-amd64 
root=UUID=32fc4ea0-0b26-478c-9b2e-b299d6289270 ro
[0.00] KERNEL supported cpus:
[0.00]   Intel GenuineIntel
[0.00] x86/fpu: Supporting XSAVE feature 0x001: 'x87 floating point 
registers'
[0.00] x86/fpu: Supporting XSAVE feature 0x002: 'SSE registers'
[0.00] x86/fpu: Supporting XSAVE feature 0x004: 'AVX registers'
[0.00] x86/fpu: Supporting XSAVE feature 0x008: 'MPX bounds registers'
[0.00] x86/fpu: Supporting XSAVE feature 0x010: 'MPX CSR'
[0.00] x86/fpu: xstate_offset[2]:  576, xstate_sizes[2]:  256
[0.00] x86/fpu: xstate_offset[3]:  832, xstate_sizes[3]:   64
[0.00] x86/fpu: xstate_offset[4]:  896, xstate_sizes[4]:   64
[0.00] x86/fpu: Enabled xstate features 0x1f, context size is 960 
bytes, using 'compacted' format.
[0.00] e820: BIOS-provided physical RAM map:
[0.00] BIOS-e820: [mem 0x-0x00057fff] usable
[0.00] BIOS-e820: [mem 0x00058000-0x00058fff] reserved
[0.00] BIOS-e820: [mem 0x00059000-0x0009efff] usable
[0.00] BIOS-e820: [mem 0x0009f000-0x0009] reserved
[0.00] BIOS-e820: [mem 0x0010-0x77e90fff] usable
[0.00] BIOS-e820: [mem 0x77e91000-0x77e91fff] ACPI NVS
[0.00] BIOS-e820: [mem 0x77e92000-0x77edbfff] reserved
[0.00] BIOS-e820: [mem 0x77edc000-0x7d09] usable
[0.00] BIOS-e820: [mem 0x7d0a-0x7d42dfff] reserved
[0.00] BIOS-e820: [mem 0x7d42e000-0x7d5ecfff] usable
[0.00] BIOS-e820: [mem 0x7d5ed000-0x7dd90fff] ACPI NVS
[0.00] BIOS-e820: [mem 0x7dd91000-0x7fe67fff] reserved
[0.00] BIOS-e820: [mem 0x7fe68000-0x7fffefff] type 20
[0.00] BIOS-e820: [mem 0x7000-0x7fff] usable
[0.00] BIOS-e820: [mem 0xe000-0xefff] reserved
[0.00] BIOS-e820: [mem 0xfe00-0xfe010fff] reserved
[0.00] BIOS-e820: [mem 0xfec0-0xfec00fff] reserved
[0.00] BIOS-e820: [mem 0xfee0-0xfee00fff] reserved
[0.00] BIOS-e820: [mem 0xff00-0x] reserved
[0.00] BIOS-e820: [mem 0x0001-0x000877bf] usable
[0.00] NX (Execute Disable) protection: active
[0.00] efi: EFI v2.40 by American Megatrends
[0.00] efi:  ESRT=0x7fc2e918  ACPI=0x7d7f5000  ACPI 2.0=0x7d7f5000  
SMBIOS=0xf05e0  SMBIOS 3.0=0x7fb79000  MPS=0xfca00 
[0.00] random: fast init done
[0.00] SMBIOS 3.0.0 present.
[0.00] DMI: Supermicro Super Server/X11SSM-F, BIOS 2.0b 07/28/2017
[0.00] e820: update [mem 0x-0x0fff] usable ==> reserved
[0.00] e820: remove [mem 0x000a-0x000f] usable
[0.00] e820: last_pfn = 0x877c00 max_arch_pfn = 0x4
[0.00] MTRR default type: write-back
[0.00] MTRR fixed ranges enabled:
[0.00]   0-9 write-back
[0.00]   A-B uncachable
[0.00]   C-F write-protect
[0.00] MTRR variable ranges enabled:
[0.00]   0 base 00C000 mask 7FC000 uncachable
[0.00]   1 base 00A000 mask 7FE000 uncachable
[0.00]   2 base 009000 mask 7FF000 uncachable
[0.00]   3 base 008C00 mask 7FFC00 uncachable
[0.00]   4 base 008A00 mask 7FFE00 uncachable
[0.00]   5 base 008900 mask 7FFF00 uncachable
[0.00]   6 base 008880 mask 7FFF80 uncachable
[0.00]   7 base 008840 mask 7FFFC0 uncachable
[  

WARNING: CPU: 1 PID: 3016 at fs/btrfs/ctree.h:1564 btrfs_update_device+0x189/0x190 [btrfs]

2017-12-29 Thread Elimar Riesebieter
Hi all,

I get warnings as seen in attached dmesg.log. This is on 4.14.9.
4.9.72 runs flawlessly so far.

##
Linux toy 4.14.9-toy-lxtec-amd64 #7 SMP Fri Dec 29 10:43:28 CET 2017 x86_64 
GNU/Linux
--
btrfs-progs v4.13.3
--
Label: 'TOY-RAID1'  uuid: 32fc4ea0-0b26-478c-9b2e-b299d6289270
Total devices 2 FS bytes used 560.98GiB
devid1 size 3.62TiB used 566.03GiB path /dev/sda3
devid2 size 3.62TiB used 566.03GiB path /dev/sdb3
--
Data, RAID1: total=563.00GiB, used=559.43GiB
System, RAID1: total=32.00MiB, used=96.00KiB
Metadata, RAID1: total=3.00GiB, used=1.55GiB
GlobalReserve, single: total=512.00MiB, used=0.00B
#

I want to run this machine as a 24/7 server with the latest
LT-KERNEL. So please investigate.

Many thanks in advance
Elimar
-- 
  355/113: Not the famous irrational number pi,
   but an incredible simulation!
-unknown


Re: [PATCH v2 02/17] btrfs-progs: lowmem check: record returned errors after walk_down_tree_v2()

2017-12-29 Thread Nikolay Borisov


On 20.12.2017 06:57, Su Yue wrote:
> In lowmem mode with '--repair', check_chunks_and_extents_v2()
> will fix accounting in block groups and clear the error
> bit BG_ACCOUNTING_ERROR.
> However, return value of check_btrfs_root() is 0 either 1 instead of
> error bits.
> 
> If extent tree is on error, lowmem repair always prints error and
> returns nonzero value even the filesystem is fine after repair.
> 
> So let @err contains bits after walk_down_tree_v2().
> 
> Introduce FATAL_ERROR for lowmem mode to represents negative return
> values since negative and positive can't not be mixed in bits operations.
> 
> Signed-off-by: Su Yue 
> ---
>  cmds-check.c | 13 +++--
>  1 file changed, 7 insertions(+), 6 deletions(-)
> 
> diff --git a/cmds-check.c b/cmds-check.c
> index 309ac9553b3a..ebede26cef01 100644
> --- a/cmds-check.c
> +++ b/cmds-check.c
> @@ -134,6 +134,7 @@ struct data_backref {
>  #define DIR_INDEX_MISMATCH  (1<<19) /* INODE_INDEX found but not match */
>  #define DIR_COUNT_AGAIN (1<<20) /* DIR isize should be recalculated 
> */
>  #define BG_ACCOUNTING_ERROR (1<<21) /* Block group accounting error */
> +#define FATAL_ERROR (1<<22) /* fatal bit for errno */
>  
>  static inline struct data_backref* to_data_backref(struct extent_backref 
> *back)
>  {
> @@ -6556,7 +6557,7 @@ static struct data_backref *find_data_backref(struct 
> extent_record *rec,
>   *otherwise means check fs tree(s) items relationship and
>   * @root MUST be a fs tree root.
>   * Returns 0  represents OK.
> - * Returns not 0  represents error.
> + * Returns > 0represents error bits.
>   */

What about the code in the 'if (!check_all)' branch? check_fs_first_inode
can return a negative value, hence check_btrfs_root can return a
negative value. A negative value can also be returned from
btrfs_search_slot.

Clearly this patch needs to be thought out better.

>  static int check_btrfs_root(struct btrfs_trans_handle *trans,
>   struct btrfs_root *root, unsigned int ext_ref,
> @@ -6607,12 +6608,12 @@ static int check_btrfs_root(struct btrfs_trans_handle 
> *trans,
>   while (1) {
>   ret = walk_down_tree_v2(trans, root, , , ,
>   ext_ref, check_all);
> -
> - err |= !!ret;
> + if (ret > 0)
> + err |= ret;
>  
>   /* if ret is negative, walk shall stop */
>   if (ret < 0) {
> - ret = err;
> + ret = err | FATAL_ERROR;
>   break;
>   }
>  
> @@ -6636,12 +6637,12 @@ out:
>   * @ext_ref: the EXTENDED_IREF feature
>   *
>   * Return 0 if no error found.
> - * Return <0 for error.
> + * Return not 0 for error.
>   */
>  static int check_fs_root_v2(struct btrfs_root *root, unsigned int ext_ref)
>  {
>   reset_cached_block_groups(root->fs_info);
> - return check_btrfs_root(NULL, root, ext_ref, 0);
> + return !!check_btrfs_root(NULL, root, ext_ref, 0);
>  }

You make the function effectively boolean, so make this explicit by
changing its return type to bool. Also, the name combined with the
boolean return makes the function REALLY confusing, i.e. when should we
return true or false? As it stands it returns "false" on success and
"true" otherwise; this is a mess...
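
A tiny model of the bit handling under discussion may help other
reviewers. The two flag values are taken from the quoted hunk; the
accumulate() helper and the use of 5 as an errno value are made up for
this sketch and are not btrfs-progs code.

```python
# Sketch: why error bits and negative errno values do not mix.
BG_ACCOUNTING_ERROR = 1 << 21
FATAL_ERROR = 1 << 22
EIO = 5                       # example errno value


def accumulate(err, ret):
    """Fold one return value into the error mask, as the patch intends."""
    if ret > 0:
        return err | ret      # keep the individual error bits
    if ret < 0:
        return err | FATAL_ERROR
    return err


# The old 'err |= !!ret' keeps only "something failed":
print(0 | int(bool(BG_ACCOUNTING_ERROR)))
# OR-ing a negative errno straight into the mask swallows the bit:
print(BG_ACCOUNTING_ERROR | -EIO)
# With a dedicated fatal bit both pieces of information survive:
print(hex(accumulate(BG_ACCOUNTING_ERROR, -EIO)))
```

This only covers the walk_down_tree_v2() path from the hunk. The negative
returns mentioned above for the !check_all branch would still need
separate handling.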


>  
>  /*
> 


Re: [PATCH 4/7] blk-mq: make blk_abort_request() trigger timeout path

2017-12-29 Thread Christoph Hellwig
On Sat, Dec 16, 2017 at 04:07:23AM -0800, Tejun Heo wrote:
> Note that this makes blk_abort_request() asynchronous - it initiates
> abortion but the actual termination will happen after a short while,
> even when the caller owns the request.  AFAICS, SCSI and ATA should be
> fine with that and I think mtip32xx and dasd should be safe but not
> completely sure.  It'd be great if people who know the drivers take a
> look.

For that you'll need to CC linux-ide and linux-scsi, and for the
SAS drivers some of the usual suspects that touch the SAS code.


Re: [PATCH 1/7] blk-mq: protect completion path with RCU

2017-12-29 Thread Christoph Hellwig
Why do you need the srcu protection?  The completion path can never
sleep.

If there is a good reason to keep it please add a comment, and
make the srcu variant a separate function only used by drivers that
need it instead of adding the conditional.
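
Roughly the shape meant here, as a userspace sketch only: every name below
is hypothetical, and the counters merely stand in for the real RCU/SRCU
read-side primitives so that the example compiles outside the kernel.

```c
/* Userspace stand-in for a request queue; not the kernel structure. */
struct example_queue {
	int srcu_readers;	/* stands in for an SRCU read-side section */
	int completed;		/* number of completions processed */
};

static void example_do_complete(struct example_queue *q)
{
	q->completed++;
}

/* Common case: the completion path never sleeps, so no SRCU is needed. */
static void example_complete(struct example_queue *q)
{
	/* kernel code would take rcu_read_lock() here */
	example_do_complete(q);
	/* ... and rcu_read_unlock() here */
}

/* Opt-in variant, called only by drivers that really need SRCU. */
static void example_complete_srcu(struct example_queue *q)
{
	q->srcu_readers++;	/* stands in for srcu_read_lock() */
	example_do_complete(q);
	q->srcu_readers--;	/* stands in for srcu_read_unlock() */
}
```

The point is that the common path carries no conditional at all, and the
SRCU cost is visible at the call site of the drivers that ask for it.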


Re: [PATCHSET v3] blk-mq: reimplement timeout handling

2017-12-29 Thread Christoph Hellwig
This seems to miss the linux-block list once again.  Please include
it in the next resend.


[PATCH] btrfs-progs: Print error on invalid extent item format during check

2017-12-29 Thread Nikolay Borisov
While performing a normal mode check, if the code comes across an invalid
extent item format it will just BUG() and exit without printing any useful
information for debugging. Improve the situation by printing the key, the
leaf bytenr and the slot, which makes it possible to quickly inspect the
tree and see what the corruption is.

Signed-off-by: Nikolay Borisov 
---
 cmds-check.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/cmds-check.c b/cmds-check.c
index a93ac2c88a38..371516709ed8 100644
--- a/cmds-check.c
+++ b/cmds-check.c
@@ -8362,7 +8362,12 @@ static int process_extent_item(struct btrfs_root *root,
if (item_size < sizeof(*ei)) {
 #ifdef BTRFS_COMPAT_EXTENT_TREE_V0
struct btrfs_extent_item_v0 *ei0;
-   BUG_ON(item_size != sizeof(*ei0));
+   if (item_size != sizeof(*ei0)) {
+   error("invalid extent item format: ITEM[%llu %u %llu] leaf: %llu slot: %d",
+ key.objectid, key.type, key.offset,
+ btrfs_header_bytenr(eb), slot);
+   BUG();
+   }
ei0 = btrfs_item_ptr(eb, slot, struct btrfs_extent_item_v0);
refs = btrfs_extent_refs_v0(eb, ei0);
 #else
-- 
2.7.4
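
For reference, a standalone sketch of the message this produces. The helper
name format_bad_extent_item() is hypothetical (the patch calls error()
directly), and the values in the usage note are made up. The arguments are
taken as explicit unsigned long long / unsigned int so that they match the
%llu and %u conversions.

```c
#include <stdio.h>

/*
 * Hypothetical helper: build the same message the patch prints, into a
 * caller-supplied buffer. Returns what snprintf() returns.
 */
static int format_bad_extent_item(char *buf, size_t len,
				  unsigned long long objectid,
				  unsigned int type,
				  unsigned long long offset,
				  unsigned long long leaf_bytenr, int slot)
{
	return snprintf(buf, len,
		"invalid extent item format: ITEM[%llu %u %llu] leaf: %llu slot: %d",
		objectid, type, offset, leaf_bytenr, slot);
}
```

Called with objectid 256, type 168, offset 4096, leaf 30408704 and slot 3,
the buffer holds "invalid extent item format: ITEM[256 168 4096] leaf:
30408704 slot: 3", which is enough to locate the offending leaf and slot.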



Re: [RFC PATCH bpf-next v2 1/4] tracing/kprobe: bpf: Check error injectable event is on function entry

2017-12-29 Thread Masami Hiramatsu
On Thu, 28 Dec 2017 17:03:24 -0800
Alexei Starovoitov  wrote:

> On 12/28/17 12:20 AM, Masami Hiramatsu wrote:
> > On Wed, 27 Dec 2017 20:32:07 -0800
> > Alexei Starovoitov  wrote:
> >
> >> On 12/27/17 8:16 PM, Steven Rostedt wrote:
> >>> On Wed, 27 Dec 2017 19:45:42 -0800
> >>> Alexei Starovoitov  wrote:
> >>>
> >>>> I don't think that's the case. My reading of current
> >>>> trace_kprobe_ftrace() -> arch_check_ftrace_location()
> >>>> is that it will not be true for old mcount case.
> >>>
> >>> In the old mcount case, you can't use ftrace to return without calling
> >>> the function. That is, no modification of the return ip, unless you
> >>> created a trampoline that could handle arbitrary stack frames, and
> >>> remove them from the stack before returning back to the function.
> >>
> >> correct. I was saying that trace_kprobe_ftrace() won't let us do
> >> bpf_override_return with old mcount.
> >
> > No, trace_kprobe_ftrace() just checks whether the given address will be
> > managed by ftrace. You can see arch_check_ftrace_location() in
> > kernel/kprobes.c.
> >
> > FYI, CONFIG_KPROBES_ON_FTRACE depends on DYNAMIC_FTRACE_WITH_REGS, and
> > DYNAMIC_FTRACE_WITH_REGS doesn't depend on CC_USING_FENTRY.
> > This means that if you compile the kernel with an old gcc and enable
> > DYNAMIC_FTRACE, kprobes uses ftrace on the mcount address, which is NOT
> > the entry point of the target function.
> 
> ok. fair enough. I think we can gate the feature to !mcount only.
> 
> > On the other hand, the feature of changing the IP was originally
> > implemented by kprobes with int3 (sw breakpoint). This means that if you
> > use kprobes at the correct address (the entry address of the function),
> > you can hijack the function, as jprobe did.
> >
> >>>> As far as the rest of your arguments go, it very much puzzles me that
> >>>> you claim that this patch is supposed to work based on historical
> >>>> reasoning whereas you did NOT test it.
> >>>
> >>> I believe that Masami is saying that the modification of the IP from
> >>> kprobes has been very well tested. But I'm guessing that you still want
> >>> a test case for using kprobes in this particular instance. It's not the
> >>> implementation of modifying the IP that you are worried about, but the
> >>> implementation of BPF using it in this case. Right?
> >>
> >> exactly. No doubt that old code works.
> >> But it doesn't mean that bpf_override_return() will continue to
> >> work in kprobes that are not ftrace based.
> >> I suspect Josef's existing test case will cover this situation.
> >> Probably only special .config is needed to disable ftrace, so
> >> "kprobe on entry but not ftrace" check will kick in.
> >
> > Right. If you need to test it, you can run Josef's test case without
> > CONFIG_DYNAMIC_FTRACE.
> 
> It should be obvious that the person who submits the patch
> must run the tests.
> 
> >> But I didn't get an impression that this situation was tested.
> >> Instead I see only logical reasoning that it's _supposed_ to work.
> >> That's not enough.
> >
> > OK, so would you just ask me to run samples/bpf ?
> 
> Please run Josef's test in the !ftrace setup.

Yes, I'll add the result of the test case.
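
For clarity, the condition under discussion boils down to something like
the sketch below. This is illustration only: example_probe_on_func_entry()
is a hypothetical name, not the helper used in the patch, and the addresses
in the note are made up.

```c
#include <stdbool.h>

/*
 * Hypothetical helper: a probe is "on function entry" only when its
 * address equals the function's start address, i.e. the offset is zero.
 */
static bool example_probe_on_func_entry(unsigned long func_addr,
					unsigned long probe_addr)
{
	return probe_addr == func_addr;
}
```

With the old mcount scheme the ftrace location sits at a non-zero offset
inside the function, so a probe at, say, func + 4 fails this check, while
an int3-based kprobe placed at func itself passes it.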

Thank you,


-- 
Masami Hiramatsu 