Re: Very slow balance / btrfs-transaction

2017-02-03 Thread Lakshmipathi.G
> Should quota support generally be disabled during balances?

If this is true and quota impacts balance throughput, there should at
least be an alert message like "Running balance with quota will affect
performance" or similar before starting.


Cheers,
Lakshmipathi.G


Re: Very slow balance / btrfs-transaction

2017-02-03 Thread Jorg Bornschein
February 4, 2017 1:07 AM, "Goldwyn Rodrigues"  wrote:

> On 02/03/2017 06:30 PM, Jorg Bornschein wrote:
> 
>> February 3, 2017 11:26 PM, "Goldwyn Rodrigues"  wrote:
>> 
>> Hi,
>> 
>> I'm currently running a balance (without any filters) on a 4 drives raid1 
>> filesystem. The array
>> contains 3 3TB drives and one 6TB drive; I'm running the rebalance because 
>> the 6TB drive recently
>> replaced a 2TB drive.
>> 
>> I know that balance is not supposed to be a fast operation, but this one is 
>> now running for ~6 days
>> and it managed to balance ~18% (754 out of about 4250 chunks balanced (755 
>> considered), 82% left)
>> -- so I expect it to take another ~4 weeks.
>> 
>> That seems excessively slow for ~8TiB of data.
>> 
>> Is this expected behavior? In case it's not: Is there anything I can do to 
>> help debug it?
>>> Do you have quotas enabled?
>> 
>> I might have activated it when playing with "snapper" -- I remember using
>> some quota command without knowing what it does.
>> 
>> How can I check if it's active? Shall I just disable it with "btrfs quota
>> disable"?
> 
> To check your quota limits:
> # btrfs qgroup show <path>
> 
> To disable
> # btrfs quota disable <path>
> 
> Yes, please check if disabling quotas makes a difference in execution
> time of btrfs balance.


Quota support was indeed active -- and it warned me that the qgroup data was
inconsistent.

Disabling quotas had an immediate impact on balance throughput -- it's *much*
faster now!
From a quick glance at iostat I would guess it's at least a factor of 100 faster.
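
For reference, the per-device numbers can be watched with something like the
following while balance runs (device names as in the lsblk output later in
the thread; iostat is from sysstat):

  $ iostat -dmx sdb sdc sdd sde 5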


Should quota support generally be disabled during balances? Or did I somehow
push my fs into a weird state where it triggered a slow path?



Thanks!   
   
   j


Re: Very slow balance / btrfs-transaction

2017-02-03 Thread Goldwyn Rodrigues


On 02/03/2017 06:30 PM, Jorg Bornschein wrote:
> February 3, 2017 11:26 PM, "Goldwyn Rodrigues"  wrote:
> 
>>> Hi,
>>>
>>> I'm currently running a balance (without any filters) on a 4 drives raid1 
>>> filesystem. The array
>>> contains 3 3TB drives and one 6TB drive; I'm running the rebalance because 
>>> the 6TB drive recently
>>> replaced a 2TB drive.
>>>
>>> I know that balance is not supposed to be a fast operation, but this one is 
>>> now running for ~6 days
>>> and it managed to balance ~18% (754 out of about 4250 chunks balanced (755 
>>> considered), 82% left)
>>> -- so I expect it to take another ~4 weeks.
>>>
>>> That seems excessively slow for ~8TiB of data.
>>>
>>> Is this expected behavior? In case it's not: Is there anything I can do to 
>>> help debug it?
>>
>> Do you have quotas enabled?
> 
> 
> I might have activated it when playing with "snapper" -- I remember using 
> some quota command without knowing what it does. 
> 
> How can I check if it's active? Shall I just disable it with "btrfs quota
> disable"?
> 

To check your quota limits:
# btrfs qgroup show <path>

To disable
# btrfs quota disable <path>

Yes, please check if disabling quotas makes a difference in execution
time of btrfs balance.

-- 
Goldwyn


Re: Very slow balance / btrfs-transaction

2017-02-03 Thread Jorg Bornschein
February 3, 2017 11:26 PM, "Goldwyn Rodrigues"  wrote:

>> Hi,
>> 
>> I'm currently running a balance (without any filters) on a 4 drives raid1 
>> filesystem. The array
>> contains 3 3TB drives and one 6TB drive; I'm running the rebalance because 
>> the 6TB drive recently
>> replaced a 2TB drive.
>> 
>> I know that balance is not supposed to be a fast operation, but this one is 
>> now running for ~6 days
>> and it managed to balance ~18% (754 out of about 4250 chunks balanced (755 
>> considered), 82% left)
>> -- so I expect it to take another ~4 weeks.
>> 
>> That seems excessively slow for ~8TiB of data.
>> 
>> Is this expected behavior? In case it's not: Is there anything I can do to 
>> help debug it?
> 
> Do you have quotas enabled?


I might have activated it when playing with "snapper" -- I remember using some 
quota command without knowing what it does. 

How can I check if it's active? Shall I just disable it with "btrfs quota disable"?


   j


Re: Very slow balance / btrfs-transaction

2017-02-03 Thread Goldwyn Rodrigues


On 02/03/2017 04:13 PM, j...@capsec.org wrote:
> Hi, 
> 
> 
> I'm currently running a balance (without any filters) on a 4 drives raid1 
> filesystem. The array contains 3 3TB drives and one 6TB drive; I'm running 
> the rebalance because the 6TB drive recently replaced a 2TB drive. 
> 
> 
> I know that balance is not supposed to be a fast operation, but this one is 
> now running for ~6 days and it managed to balance ~18% (754 out of about 4250 
> chunks balanced (755 considered),  82% left) -- so I expect it to take 
> another ~4 weeks. 
> 
> That seems excessively slow for ~8TiB of data.
> 
> 
> Is this expected behavior? In case it's not: Is there anything I can do to 
> help debug it?

Do you have quotas enabled?

-- 
Goldwyn


Re: [PATCH 0/9] Lowmem mode fsck fixes with fsck-tests framework update

2017-02-03 Thread Christoph Anton Mitterer
Hey Qu

On Fri, 2017-02-03 at 14:20 +0800, Qu Wenruo wrote:
> Great thanks for that!
You're welcome. :)


> I also added missing error message output for other places I found,
> and 
> updated the branch, the name remains as "lowmem_tests"
> 
> Please try it.
# btrfs check /dev/nbd0 ; echo $?
checking extents
checking free space cache
checking fs roots
checking csums
checking root refs
Checking filesystem on /dev/nbd0
UUID: 326d292d-f97b-43ca-b1e8-c722d3474719
found 7519512838144 bytes used, no error found
total csum bytes: 7330834320
total tree bytes: 10902437888
total fs tree bytes: 2019704832
total extent tree bytes: 1020149760
btree space waste bytes: 925714197
file data blocks allocated: 7509228494848
 referenced 7630551511040
0

# btrfs check /dev/nbd0 ; echo $?
checking extents
checking free space cache
checking fs roots
checking csums
checking root refs
Checking filesystem on /dev/nbd0
UUID: 326d292d-f97b-43ca-b1e8-c722d3474719
found 7519512838144 bytes used, no error found
total csum bytes: 7330834320
total tree bytes: 10902437888
total fs tree bytes: 2019704832
total extent tree bytes: 1020149760
btree space waste bytes: 925714197
file data blocks allocated: 7509228494848
 referenced 7630551511040
0

=> looks good this time :)
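
For reference, stock btrfs-progs selects the lowmem code path explicitly; a
sketch of the equivalent invocation, assuming the mainline --mode=lowmem flag
rather than whatever the test branch defaults to:

# btrfs check --mode=lowmem /dev/nbd0 ; echo $?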


btw: In your commit messages, please change my email to
cales...@scientia.net everywhere... I accidentally used my university
address (christoph.anton.mitte...@lmu.de) sometimes when sending mail.

Cheers,
Chris.



Re: [4.7.2] btrfs_run_delayed_refs:2963: errno=-17 Object already exists

2017-02-03 Thread Kai Krakow
On Thu, 02 Feb 2017 13:01:03 +0100, Marc Joliet wrote:

> On Sunday 28 August 2016 15:29:08 Kai Krakow wrote:
> > Hello list!  
> 
> Hi list
> 
> > It happened again. While using VirtualBox the following crash
> > happened, btrfs check found a lot of errors which it couldn't
> > repair. Earlier that day my system crashed which may already
> > introduced errors into my filesystem. Apparently, I couldn't create
> > an image (not enough space available), I only can give this trace
> > from dmesg:
> > 
> > [44819.903435] [ cut here ]
> > [44819.903443] WARNING: CPU: 3 PID: 2787 at
> > fs/btrfs/extent-tree.c:2963 btrfs_run_delayed_refs+0x26c/0x290
> > [44819.903444] BTRFS: Transaction aborted (error -17)
> > [44819.903445] Modules linked in: nls_iso8859_15 nls_cp437 vfat fat
> > fuse rfcomm veth af_packet ipt_MASQUERADE nf_nat_masquerade_ipv4
> > iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat
> > nf_conntrack bridge stp llc w83627ehf bnep hwmon_vid cachefiles
> > btusb btintel bluetooth snd_hda_codec_hdmi snd_hda_codec_realtek
> > snd_hda_codec_generic snd_hda_intel snd_hda_codec rfkill snd_hwdep
> > snd_hda_core snd_pcm snd_timer coretemp hwmon snd r8169 mii
> > kvm_intel kvm iTCO_wdt iTCO_vendor_support rtc_cmos irqbypass
> > soundcore ip_tables uas usb_storage nvidia_drm(PO) vboxpci(O)
> > vboxnetadp(O) vboxnetflt(O) vboxdrv(O) nvidia_modeset(PO)
> > nvidia(PO) efivarfs unix ipv6 [44819.903484] CPU: 3 PID: 2787 Comm:
> > BrowserBlocking Tainted: P   O4.7.2-gentoo #2
> > [44819.903485] Hardware name: To Be Filled By O.E.M. To Be Filled
> > By O.E.M./Z68 Pro3, BIOS L2.16A 02/22/2013 [44819.903487]
> >  8130af2d 8800b7d03d20 
> > [44819.903489]  810865fa 880409374428 8800b7d03d70
> > 8803bf299760 [44819.903491]  ffef
> > 8803f677f000 8108666a [44819.903493] Call Trace:
> > [44819.903496]  [] ? dump_stack+0x46/0x59
> > [44819.903500]  [] ? __warn+0xba/0xe0
> > [44819.903502]  [] ? warn_slowpath_fmt+0x4a/0x50
> > [44819.903504]  [] ?
> > btrfs_run_delayed_refs+0x26c/0x290 [44819.903507]
> > [] ? btrfs_release_path+0xe/0x80 [44819.903509]
> > [] ? btrfs_start_dirty_block_groups+0x2da/0x420
> > [44819.903511] [] ?
> > btrfs_commit_transaction+0x143/0x990 [44819.903514]
> > [] ? kmem_cache_free+0x165/0x180 [44819.903516]
> > [] ? btrfs_wait_ordered_range+0x7c/0x110
> > [44819.903518] [] ? btrfs_sync_file+0x286/0x360
> > [44819.903522] [] ? do_fsync+0x33/0x60
> > [44819.903524]  [] ? SyS_fdatasync+0xa/0x10
> > [44819.903528]  [] ?
> > entry_SYSCALL_64_fastpath+0x13/0x8f [44819.903529] ---[ end trace
> > 6944811e170a0e57 ]--- [44819.903531] BTRFS: error (device bcache2)
> > in btrfs_run_delayed_refs:2963: errno=-17 Object already exists
> > [44819.903533] BTRFS info (device bcache2): forced readonly  
> 
> I got the same error myself, with this stack trace:
> 
> -- Logs begin at Fr 2016-04-01 17:07:28 CEST, end at Mi 2017-02-01
> 22:03:57 CET. --
> Feb 01 01:46:26 diefledermaus kernel: [ cut here
> ] Feb 01 01:46:26 diefledermaus kernel: WARNING: CPU: 1
> PID: 16727 at fs/btrfs/extent-tree.c:2967
> btrfs_run_delayed_refs+0x278/0x2b0 Feb 01 01:46:26 diefledermaus
> kernel: BTRFS: Transaction aborted (error -17) Feb 01 01:46:26
> diefledermaus kernel: BTRFS: error (device sdb2) in
> btrfs_run_delayed_refs:2967: errno=-17 Object already exists Feb 01
> 01:46:27 diefledermaus kernel: BTRFS info (device sdb2): forced
> readonly Feb 01 01:46:27 diefledermaus kernel: Modules linked in: msr
> ctr ccm tun arc4 snd_hda_codec_idt applesmc snd_hda_codec_generic
> input_polldev hwmon snd_hda_intel ath5k snd_hda_codec mac80211
> snd_hda_core ath snd_pcm cfg80211 snd_timer video acpi_cpufreq snd
> backlight sky2 rfkill processor button soundcore sg usb_storage
> sr_mod cdrom ata_generic pata_acpi uhci_hcd ahci libahci ata_piix
> libata ehci_pci ehci_hcd Feb 01 01:46:27 diefledermaus kernel: CPU: 1
> PID: 16727 Comm: kworker/u4:0 Not tainted 4.9.6-gentoo #1
> Feb 01 01:46:27 diefledermaus kernel: Hardware name: Apple Inc. 
> Macmini2,1/Mac-F4208EAA, BIOS MM21.88Z.009A.B00.0706281359
> 06/28/07 Feb 01 01:46:27 diefledermaus kernel: Workqueue:
> btrfs-extent-refs btrfs_extent_refs_helper
> Feb 01 01:46:27 diefledermaus kernel:  
> 812cf739 c9000285fd60 
> Feb 01 01:46:27 diefledermaus kernel:  8104908a
> 8800428df1e0 c9000285fdb0 0020
> Feb 01 01:46:27 diefledermaus kernel:  880003c1b1b8
> 8800bb73e900  810490fa
> Feb 01 01:46:27 diefledermaus kernel: Call Trace:
> Feb 01 01:46:27 diefledermaus kernel:  [] ? 
> dump_stack+0x46/0x5d
> Feb 01 01:46:27 diefledermaus kernel:  [] ?
> __warn+0xba/0xe0 Feb 01 01:46:27 diefledermaus kernel:
> [] ? warn_slowpath_fmt+0x4a/0x50
> Feb 01 01:46:27 diefledermaus kernel:  [] ? 
> 

Very slow balance / btrfs-transaction

2017-02-03 Thread jb
Hi, 


I'm currently running a balance (without any filters) on a 4 drives raid1 
filesystem. The array contains 3 3TB drives and one 6TB drive; I'm running the 
rebalance because the 6TB drive recently replaced a 2TB drive. 


I know that balance is not supposed to be a fast operation, but this one is now 
running for ~6 days and it managed to balance ~18% (754 out of about 4250 
chunks balanced (755 considered),  82% left) -- so I expect it to take another 
~4 weeks. 

That seems excessively slow for ~8TiB of data.


Is this expected behavior? In case it's not: Is there anything I can do to help 
debug it?
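
For reference, two things worth checking here are balance progress and
whether quotas are enabled; a sketch, using the mount point from the output
below:

  # btrfs balance status -v /storage
  # btrfs qgroup show /storage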


The 4 individual devices are bcache devices with currently no SSD cache
partition attached; the bcache backing devices sit on top of LUKS encrypted
devices. Maybe a few words about the history of this fs: It used to be a 1
drive btrfs on top of a bcache partition with a 30GiB SSD cache (actively used
for >1 year). During the last month, I gradually added devices (always with
active bcaches). At some point, after adding the 4th device, I deactivated
(detached) the bcache caching device and instead activated raid1 for data and
metadata and ran a rebalance (which was reasonably fast -- I don't remember how
fast exactly, but probably <24h). The final steps that led to the current
situation: I activated "nossd" and replaced the smallest device with "btrfs dev
replace" (which was also reasonably fast, <12h).

 

Best & thanks, 

   j


--
[joerg@dorsal ~]$ lsblk
NAME              MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINT
sda                 8:0    0 111.8G  0 disk
├─sda1              8:1    0     1G  0 part  /boot
└─sda2              8:2    0 110.8G  0 part
  └─crypted       254:0    0 110.8G  0 crypt
    ├─ssd-root    254:1    0  72.8G  0 lvm   /
    ├─ssd-swap    254:2    0     8G  0 lvm   [SWAP]
    └─ssd-cache   254:3    0    30G  0 lvm
sdb                 8:16   0   2.7T  0 disk
└─sdb1              8:17   0   2.7T  0 part
  └─crypted-sdb   254:7    0   2.7T  0 crypt
    └─bcache2     253:2    0   2.7T  0 disk
sdc                 8:32   0   2.7T  0 disk
└─sdc1              8:33   0   2.7T  0 part
  └─crypted-sdc   254:4    0   2.7T  0 crypt
    └─bcache1     253:1    0   2.7T  0 disk
sdd                 8:48   0   2.7T  0 disk
└─sdd1              8:49   0   2.7T  0 part
  └─crypted-sdd   254:6    0   2.7T  0 crypt
    └─bcache0     253:0    0   2.7T  0 disk
sde                 8:64   0   5.5T  0 disk
└─sde1              8:65   0   5.5T  0 part
  └─crypted-sde   254:5    0   5.5T  0 crypt
    └─bcache3     253:3    0   5.5T  0 disk  /storage
--

joerg@dorsal ~]$ sudo btrfs  fi usage -h /storage/
Overall:
Device size:  13.64TiB
Device allocated:  8.35TiB
Device unallocated:5.29TiB
Device missing:  0.00B
Used:  8.34TiB
Free (estimated):  2.65TiB  (min: 2.65TiB)
Data ratio:   2.00
Metadata ratio:   2.00
Global reserve:  512.00MiB  (used: 15.77MiB)

Data,RAID1: Size:4.17TiB, Used:4.16TiB
   /dev/bcache0    2.38TiB
   /dev/bcache1    2.37TiB
   /dev/bcache2    2.38TiB
   /dev/bcache3    1.20TiB

Metadata,RAID1: Size:9.00GiB, Used:7.49GiB
   /dev/bcache1    8.00GiB
   /dev/bcache2    1.00GiB
   /dev/bcache3    9.00GiB

System,RAID1: Size:32.00MiB, Used:624.00KiB
   /dev/bcache1   32.00MiB
   /dev/bcache3   32.00MiB

Unallocated:
   /dev/bcache0  355.52GiB
   /dev/bcache1  356.49GiB
   /dev/bcache2  355.52GiB
   /dev/bcache3    4.25TiB
  
--
[joerg@dorsal ~]$ ps -xal | grep btrfs
1     0   227     2   0 -20      0     0 -      S<   ?          0:00 [btrfs-worker]
1     0   229     2   0 -20      0     0 -      S<   ?          0:00 [btrfs-worker-hi]
1     0   230     2   0 -20      0     0 -      S<   ?          0:00 [btrfs-delalloc]
1     0   231     2   0 -20      0     0 -      S<   ?          0:00 [btrfs-flush_del]
1     0   232     2   0 -20      0     0 -      S<   ?          0:00 [btrfs-cache]
1     0   233     2   0 -20      0     0 -      S<   ?          0:00 [btrfs-submit]
1     0   234     2   0 -20      0     0 -      S<   ?          0:00 [btrfs-fixup]
1     0   235     2   0 -20      0     0 -      S<   ?          0:00 [btrfs-endio]
1     0   236     2   0 -20      0     0 -      S<   ?          0:00 [btrfs-endio-met]
1     0   237     2   0 -20      0     0 -      S<   ?          0:00 [btrfs-endio-met]
1     0   238     2   0 -20      0     0 -      S<   ?          0:00 [btrfs-endio-rai]
1     0   239     2   0 -20      0     0 -      S<   ?          0:00 [btrfs-endio-rep]
1     0   240     2   0 -20      0     0 -      S<   ?          0:00 [btrfs-rmw]
1     0   241     2   0 -20      0     0 -      S<   ?          0:00 [btrfs-endio-wri]
1     0   242     2   0 -20      0     0 -      S<   ?          0:00 [btrfs-freespace]
1     0   243     2   0 -20      0     0 -      S<   ?          0:00 [btrfs-delayed-m]
1     0   244     2   0 -20      0

[PATCH v2] btrfs-progs: better document btrfs receive security

2017-02-03 Thread Austin S. Hemmelgarn
This adds some extra documentation to the btrfs-receive manpage that
explains some of the security related aspects of btrfs-receive.  The
first part covers the fact that the subvolume being received is writable
until the receive finishes, and the second covers the current lack of
sanity checking of the send stream.

Signed-off-by: Austin S. Hemmelgarn 
Suggested-by: Graham Cobb 

---
Changes since v1:
* Updated the description based on suggestions from Graham Cobb.

Inspired by a recent thread on the ML.

This could probably be more thorough, but I felt it was more important
to get it documented as quickly as possible, and this should cover the
basic info that most people will care about.

 Documentation/btrfs-receive.asciidoc | 20 
 1 file changed, 20 insertions(+)

diff --git a/Documentation/btrfs-receive.asciidoc b/Documentation/btrfs-receive.asciidoc
index a6838e5e..1ada396f 100644
--- a/Documentation/btrfs-receive.asciidoc
+++ b/Documentation/btrfs-receive.asciidoc
@@ -31,7 +31,7 @@ the stream, and print the stream metadata, one operation per line.
 
 3. default subvolume has changed or you didn't mount the filesystem at the toplevel subvolume
 
-A subvolume is made read-only after the receiving process finishes succesfully.
+A subvolume is made read-only after the receiving process finishes succesfully (see BUGS below).
 
 `Options`
 
@@ -68,6 +68,26 @@ dump the stream metadata, one line per operation
 +
 Does not require the 'path' parameter. The filesystem chanded.
 
+BUGS
+
+*btrfs receive* sets the subvolume read-only after it completes
+successfully.  However, while the receive is in progress, users who have
+write access to files or directories in the receiving 'path' can add,
+remove, or modify files, in which case the resulting read-only subvolume
+will not be an exact copy of the sent subvolume.
+
+If the intention is to create an exact copy, the receiving 'path'
+should be protected from access by users until the receive operation
+has completed and the subvolume is set to read-only.
+
+Additionally, receive does not currently do a very good job of validating
+that an incremental send streams actually makes sense, and it is thus
+possible for a specially crafted send stream to create a subvolume with
+reflinks to arbitrary files in the same filesystem.  Because of this,
+users are advised to not use *btrfs receive* on send streams from
+untrusted sources, and to protect trusted streams when sending them
+across untrusted networks.
+
 EXIT STATUS
 ---
 *btrfs receive* returns a zero exit status if it succeeds. Non zero is
-- 
2.11.0
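
For reference, one way to protect a trusted stream crossing an untrusted
network, as the new text advises, is to pipe it over an authenticated
channel; a sketch with hypothetical snapshot path and host name:

  # btrfs send /mnt/snapshots/home.20170203 | ssh backuphost 'btrfs receive /mnt/backups'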


Re: btrfs receive leaves new subvolume modifiable during operation

2017-02-03 Thread Austin S. Hemmelgarn

On 2017-02-03 14:17, Graham Cobb wrote:
> On 03/02/17 16:01, Austin S. Hemmelgarn wrote:
>> Ironically, I ended up having time sooner than I thought.  The message
>> doesn't appear to be in any of the archives yet, but the message ID is:
>> <20170203134858.75210-1-ahferro...@gmail.com>
> 
> Ah. I didn't notice it until after I had sent my message.
No worries.
> 
>> I actually like how you explained things a bit better though, so if you
>> are OK with it I'll update the patch I sent using your description (and
>> credit you in the commit message too of course).
> 
> You are welcome to use any of my phrasing or approach, of course!
I'll send out the new version shortly then.



Re: btrfs receive leaves new subvolume modifiable during operation

2017-02-03 Thread Graham Cobb
On 03/02/17 16:01, Austin S. Hemmelgarn wrote:
> Ironically, I ended up having time sooner than I thought.  The message
> doesn't appear to be in any of the archives yet, but the message ID is:
> <20170203134858.75210-1-ahferro...@gmail.com>

Ah. I didn't notice it until after I had sent my message.

> I actually like how you explained things a bit better though, so if you
> are OK with it I'll update the patch I sent using your description (and
> credit you in the commit message too of course).

You are welcome to use any of my phrasing or approach, of course!




Re: [PATCH 08/24] btrfs: Convert to separately allocated bdi

2017-02-03 Thread Liu Bo
On Thu, Feb 02, 2017 at 06:34:06PM +0100, Jan Kara wrote:
> Allocate struct backing_dev_info separately instead of embedding it
> inside superblock. This unifies handling of bdi among users.

Looks good.

Reviewed-by: Liu Bo 

Thanks,

-liubo

> 
> CC: Chris Mason 
> CC: Josef Bacik 
> CC: David Sterba 
> CC: linux-btrfs@vger.kernel.org
> Signed-off-by: Jan Kara 
> ---
>  fs/btrfs/ctree.h   |  1 -
>  fs/btrfs/disk-io.c | 36 +++-
>  fs/btrfs/super.c   |  7 +++
>  3 files changed, 14 insertions(+), 30 deletions(-)
> 
> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
> index 6a823719b6c5..1dc06f66dfcf 100644
> --- a/fs/btrfs/ctree.h
> +++ b/fs/btrfs/ctree.h
> @@ -801,7 +801,6 @@ struct btrfs_fs_info {
>   struct btrfs_super_block *super_for_commit;
>   struct super_block *sb;
>   struct inode *btree_inode;
> - struct backing_dev_info bdi;
>   struct mutex tree_log_mutex;
>   struct mutex transaction_kthread_mutex;
>   struct mutex cleaner_mutex;
> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> index 37a31b12bb0c..b25723e729c0 100644
> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -1810,21 +1810,6 @@ static int btrfs_congested_fn(void *congested_data, int bdi_bits)
>   return ret;
>  }
>  
> -static int setup_bdi(struct btrfs_fs_info *info, struct backing_dev_info *bdi)
> -{
> - int err;
> -
> - err = bdi_setup_and_register(bdi, "btrfs");
> - if (err)
> - return err;
> -
> - bdi->ra_pages = VM_MAX_READAHEAD * 1024 / PAGE_SIZE;
> - bdi->congested_fn   = btrfs_congested_fn;
> - bdi->congested_data = info;
> - bdi->capabilities |= BDI_CAP_CGROUP_WRITEBACK;
> - return 0;
> -}
> -
>  /*
>   * called by the kthread helper functions to finally call the bio end_io
>   * functions.  This is where read checksum verification actually happens
> @@ -2598,16 +2583,10 @@ int open_ctree(struct super_block *sb,
>   goto fail;
>   }
>  
> -	ret = setup_bdi(fs_info, &fs_info->bdi);
> - if (ret) {
> - err = ret;
> - goto fail_srcu;
> - }
> -
> 	ret = percpu_counter_init(&fs_info->dirty_metadata_bytes, 0, GFP_KERNEL);
>   if (ret) {
>   err = ret;
> - goto fail_bdi;
> + goto fail_srcu;
>   }
>   fs_info->dirty_metadata_batch = PAGE_SIZE *
>   (1 + ilog2(nr_cpu_ids));
> @@ -2715,7 +2694,6 @@ int open_ctree(struct super_block *sb,
>  
>   sb->s_blocksize = 4096;
>   sb->s_blocksize_bits = blksize_bits(4096);
> -	sb->s_bdi = &fs_info->bdi;
>  
>   btrfs_init_btree_inode(fs_info);
>  
> @@ -2912,9 +2890,12 @@ int open_ctree(struct super_block *sb,
>   goto fail_sb_buffer;
>   }
>  
> - fs_info->bdi.ra_pages *= btrfs_super_num_devices(disk_super);
> - fs_info->bdi.ra_pages = max(fs_info->bdi.ra_pages,
> - SZ_4M / PAGE_SIZE);
> + sb->s_bdi->congested_fn = btrfs_congested_fn;
> + sb->s_bdi->congested_data = fs_info;
> + sb->s_bdi->capabilities |= BDI_CAP_CGROUP_WRITEBACK;
> + sb->s_bdi->ra_pages = VM_MAX_READAHEAD * 1024 / PAGE_SIZE;
> + sb->s_bdi->ra_pages *= btrfs_super_num_devices(disk_super);
> + sb->s_bdi->ra_pages = max(sb->s_bdi->ra_pages, SZ_4M / PAGE_SIZE);
>  
>   sb->s_blocksize = sectorsize;
>   sb->s_blocksize_bits = blksize_bits(sectorsize);
> @@ -3282,8 +3263,6 @@ int open_ctree(struct super_block *sb,
> 	percpu_counter_destroy(&fs_info->delalloc_bytes);
>  fail_dirty_metadata_bytes:
> 	percpu_counter_destroy(&fs_info->dirty_metadata_bytes);
> -fail_bdi:
> -	bdi_destroy(&fs_info->bdi);
>  fail_srcu:
> 	cleanup_srcu_struct(&fs_info->subvol_srcu);
>  fail:
> @@ -4010,7 +3989,6 @@ void close_ctree(struct btrfs_fs_info *fs_info)
> 	percpu_counter_destroy(&fs_info->dirty_metadata_bytes);
> 	percpu_counter_destroy(&fs_info->delalloc_bytes);
> 	percpu_counter_destroy(&fs_info->bio_counter);
> -	bdi_destroy(&fs_info->bdi);
> 	cleanup_srcu_struct(&fs_info->subvol_srcu);
>  
>   btrfs_free_stripe_hash_table(fs_info);
> diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
> index b5ae7d3d1896..08ef08b63132 100644
> --- a/fs/btrfs/super.c
> +++ b/fs/btrfs/super.c
> @@ -1133,6 +1133,13 @@ static int btrfs_fill_super(struct super_block *sb,
>  #endif
>   sb->s_flags |= MS_I_VERSION;
>   sb->s_iflags |= SB_I_CGROUPWB;
> +
> + err = super_setup_bdi(sb);
> + if (err) {
> + btrfs_err(fs_info, "super_setup_bdi failed");
> + return err;
> + }
> +
>   err = open_ctree(sb, fs_devices, (char *)data);
>   if (err) {
>   btrfs_err(fs_info, "open_ctree failed");
> -- 
> 2.10.2
> 

Re: [PATCH 04/24] fs: Provide infrastructure for dynamic BDIs in filesystems

2017-02-03 Thread Liu Bo
On Fri, Feb 03, 2017 at 02:50:42PM +0100, Jan Kara wrote:
> On Thu 02-02-17 11:28:27, Liu Bo wrote:
> > Hi,
> > 
> > On Thu, Feb 02, 2017 at 06:34:02PM +0100, Jan Kara wrote:
> > > Provide helper functions for setting up dynamically allocated
> > > backing_dev_info structures for filesystems and cleaning them up on
> > > superblock destruction.
> > 
> > Just one concern, will this cause problems for multiple superblock cases
> > like nfs with nosharecache?
> 
> Can you elaborate a bit? I've looked for a while at what nfs with
> nosharecache does but I didn't see how it would influence anything with
> bdis...

Oh, I missed that bdi_seq was static, then it should be fine.

(I was worried about that nfs with nosharecache would have multiple
superblocks and if each superblock has a bdi using the same bdi name,
nfs-xx.)

Thanks for the reply.

Thanks,

-liubo


[PATCH] btrfs-progs: Remove unused function arg in delete_extent_records

2017-02-03 Thread Goldwyn Rodrigues
From: Goldwyn Rodrigues 

new_len is not used in delete_extent_records().

Signed-off-by: Goldwyn Rodrigues 
---
 cmds-check.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/cmds-check.c b/cmds-check.c
index 84e1d99..9fb85e4 100644
--- a/cmds-check.c
+++ b/cmds-check.c
@@ -7969,7 +7969,7 @@ out:
 static int delete_extent_records(struct btrfs_trans_handle *trans,
 struct btrfs_root *root,
 struct btrfs_path *path,
-u64 bytenr, u64 new_len)
+u64 bytenr)
 {
struct btrfs_key key;
struct btrfs_key found_key;
@@ -8975,7 +8975,7 @@ static int fixup_extent_refs(struct btrfs_fs_info *info,
 
/* step two, delete all the existing records */
	ret = delete_extent_records(trans, info->extent_root, &path,
-   rec->start, rec->max_size);
+   rec->start);
 
if (ret < 0)
goto out;
-- 
2.10.2



Re: btrfs receive leaves new subvolume modifiable during operation

2017-02-03 Thread Austin S. Hemmelgarn

On 2017-02-03 10:44, Graham Cobb wrote:
> On 03/02/17 12:44, Austin S. Hemmelgarn wrote:
>> I can look at making a patch for this, but it may be next week before I
>> have time (I'm not great at multi-tasking when it comes to software
>> development, and I'm in the middle of helping to fix a bug in Ansible
>> right now).
> 
> That would be great, Austin! It is about 15 years since I last submitted
> a patch under kernel development patch rules and things have changed a
> fair bit in that time. So if you are set up to do it that sounds good.
> 
> As a starting point, I have created a suggested text (patch attached).
Ironically, I ended up having time sooner than I thought.  The message
doesn't appear to be in any of the archives yet, but the message ID is:

<20170203134858.75210-1-ahferro...@gmail.com>

I actually like how you explained things a bit better though, so if you
are OK with it I'll update the patch I sent using your description (and
credit you in the commit message too of course).




Re: btrfs receive leaves new subvolume modifiable during operation

2017-02-03 Thread Graham Cobb
On 03/02/17 12:44, Austin S. Hemmelgarn wrote:
> I can look at making a patch for this, but it may be next week before I
> have time (I'm not great at multi-tasking when it comes to software
> development, and I'm in the middle of helping to fix a bug in Ansible
> right now).

That would be great, Austin! It is about 15 years since I last submitted
a patch under kernel development patch rules and things have changed a
fair bit in that time. So if you are set up to do it that sounds good.

As a starting point, I have created a suggested text (patch attached).




diff --git a/Documentation/btrfs-receive.asciidoc b/Documentation/btrfs-receive.asciidoc
index 6be4aa6..db525d9 100644
--- a/Documentation/btrfs-receive.asciidoc
+++ b/Documentation/btrfs-receive.asciidoc
@@ -31,7 +31,7 @@ the stream, and print the stream metadata, one operation per line.
 
 3. default subvolume has changed or you didn't mount the filesystem at the toplevel subvolume
 
-A subvolume is made read-only after the receiving process finishes succesfully.
+A subvolume is made read-only after the receiving process finishes succesfully (see BUGS below).
 
 `Options`
 
@@ -73,6 +73,16 @@ EXIT STATUS
 *btrfs receive* returns a zero exit status if it succeeds. Non zero is
 returned in case of failure.
 
+BUGS
+
+*btrfs receive* sets the subvolume read-only after it completes successfully.
+However, while the receive is in progress, users who have write access to files
+or directories in the receiving 'path' can add, remove or modify files, in which
+case the resulting read-only subvolume will not be a copy of the sending subvolume.
+
+If the intention is to create an exact copy, the receiving 'path' should be protected
+from access by users until the receive has completed and the subvolume set to read-only.
+
 AVAILABILITY
 
 *btrfs* is part of btrfs-progs.


Re: [PATCH 8/8] Revert "ext4: fix wrong gfp type under transaction"

2017-02-03 Thread Michal Hocko
On Mon 30-01-17 09:12:10, Michal Hocko wrote:
> On Fri 27-01-17 11:40:42, Theodore Ts'o wrote:
> > On Fri, Jan 27, 2017 at 10:37:35AM +0100, Michal Hocko wrote:
> > > If this ever turn out to be a problem and with the vmapped stacks we
> > > have good chances to get a proper stack traces on a potential overflow
> > > we can add the scope API around the problematic code path with the
> > > explanation why it is needed.
> > 
> > Yeah, or maybe we can automate it?  Can the reclaim code check how
> > much stack space is left and do the right thing automatically?
> 
> I am not sure how to do that. Checking for some magic value sounds quite
> fragile to me. It also sounds a bit strange to focus only on the reclaim
> while other code paths might suffer from the same problem.
> 
> What is actually the deepest possible call chain from the slab reclaim
> where I stopped? I have tried to follow that path but hit the callback
> wall quite early.
>  
> > The reason why I'm nervous is that nojournal mode is not a common
> > configuration, and "wait until production systems start failing" is
> > not a strategy that I or many SRE-types find comforting.
> 
> I understand that but I would be much more happier if we did the
> decision based on the actual data rather than a fear something would
> break down.

ping on this. I would really like to move forward here and target 4.11
merge window. Is your concern so serious to block this patch?
-- 
Michal Hocko
SUSE Labs


Re: Btrfs Heatmap - v4 ... colors!!

2017-02-03 Thread Hans van Kranenburg
On 02/03/2017 03:36 PM, Timofey Titovets wrote:
> 
> ➜  python-btrfs git:(master) ✗ tar -vxf python-btrfs-5-1-any.pkg.tar.xz
> .PKGINFO
> .BUILDINFO
> .MTREE
> usr/
> usr/lib/
> usr/lib/python3.6/
> usr/lib/python3.6/site-packages/
> usr/lib/python3.6/site-packages/btrfs-5-py3.6.egg-info/
> usr/lib/python3.6/site-packages/btrfs/
> usr/lib/python3.6/site-packages/btrfs/__pycache__/
> usr/lib/python3.6/site-packages/btrfs/utils.py
> usr/lib/python3.6/site-packages/btrfs/ioctl.py
> usr/lib/python3.6/site-packages/btrfs/ctree.py
> usr/lib/python3.6/site-packages/btrfs/crc32c.py
> usr/lib/python3.6/site-packages/btrfs/__init__.py
> usr/lib/python3.6/site-packages/btrfs/__pycache__/utils.cpython-36.pyc
> usr/lib/python3.6/site-packages/btrfs/__pycache__/ioctl.cpython-36.pyc
> usr/lib/python3.6/site-packages/btrfs/__pycache__/ctree.cpython-36.pyc
> usr/lib/python3.6/site-packages/btrfs/__pycache__/crc32c.cpython-36.pyc
> usr/lib/python3.6/site-packages/btrfs/__pycache__/__init__.cpython-36.pyc
> usr/lib/python3.6/site-packages/btrfs-5-py3.6.egg-info/installed-files.txt
> usr/lib/python3.6/site-packages/btrfs-5-py3.6.egg-info/PKG-INFO
> usr/lib/python3.6/site-packages/btrfs-5-py3.6.egg-info/dependency_links.txt
> usr/lib/python3.6/site-packages/btrfs-5-py3.6.egg-info/top_level.txt
> usr/lib/python3.6/site-packages/btrfs-5-py3.6.egg-info/SOURCES.txt
> 
> I think it's building automaticaly by:
> PIP_CONFIG_FILE=/dev/null pip install --isolated --root="$pkgdir"
> --ignore-installed --no-deps btrfs
> 

Even better :-)

Thanks,

-- 
Hans van Kranenburg


Re: Btrfs Heatmap - v4 ... colors!!

2017-02-03 Thread Timofey Titovets
2017-02-03 17:27 GMT+03:00 Hans van Kranenburg :
> On 02/03/2017 03:18 PM, Timofey Titovets wrote:
>> 2017-02-03 15:57 GMT+03:00 Hans van Kranenburg :
>>> On 02/03/2017 12:25 PM, Timofey Titovets wrote:
 Thank you for your great work:
 JFYI Packaged in AUR:
 https://aur.archlinux.org/packages/python-btrfs-heatmap/
>>>
>>> Hey, thanks.
>>>
>>> Just wondering... what is that btrfs.py you refer to in...
>>> https://aur.archlinux.org/cgit/aur.git/tree/PKGBUILD?h=python-btrfs ?
>>>
>>> It's a package, not a single file, so maybe that compile command doesn't
>>> do anything? I'm not familiar with arch, so correct me if I'm wrong.
>>>
>>> .
>>> ├── btrfs
>>> │   ├── crc32c.py
>>> │   ├── ctree.py
>>> │   ├── __init__.py
>>> │   ├── ioctl.py
>>> │   ├── utils.py
>>>
>>
>> You're right, the compile command is useless, so I removed it, thanks.
>
> -$ python2 -m compileall btrfs
> Listing btrfs ...
> Compiling btrfs/__init__.py ...
> Compiling btrfs/crc32c.py ...
> Compiling btrfs/ctree.py ...
> Compiling btrfs/ioctl.py ...
> Compiling btrfs/utils.py ...
>
> .
> ├── btrfs
> │   ├── crc32c.py
> │   ├── crc32c.pyc
> │   ├── ctree.py
> │   ├── ctree.pyc
> │   ├── __init__.py
> │   ├── __init__.pyc
> │   ├── ioctl.py
> │   ├── ioctl.pyc
> │   ├── utils.py
> │   └── utils.pyc
>
>
> -$ python3 -m compileall btrfs
> Listing 'btrfs'...
> Compiling 'btrfs/__init__.py'...
> Compiling 'btrfs/crc32c.py'...
> Compiling 'btrfs/ctree.py'...
> Compiling 'btrfs/ioctl.py'...
> Compiling 'btrfs/utils.py'...
>
> .
> ├── btrfs
> │   ├── crc32c.py
> │   ├── ctree.py
> │   ├── __init__.py
> │   ├── ioctl.py
> │   ├── __pycache__
> │   │   ├── crc32c.cpython-35.pyc
> │   │   ├── ctree.cpython-35.pyc
> │   │   ├── __init__.cpython-35.pyc
> │   │   ├── ioctl.cpython-35.pyc
> │   │   └── utils.cpython-35.pyc
> │   └── utils.py
>
>
> --
> Hans van Kranenburg

➜  python-btrfs git:(master) ✗ tar -vxf python-btrfs-5-1-any.pkg.tar.xz
.PKGINFO
.BUILDINFO
.MTREE
usr/
usr/lib/
usr/lib/python3.6/
usr/lib/python3.6/site-packages/
usr/lib/python3.6/site-packages/btrfs-5-py3.6.egg-info/
usr/lib/python3.6/site-packages/btrfs/
usr/lib/python3.6/site-packages/btrfs/__pycache__/
usr/lib/python3.6/site-packages/btrfs/utils.py
usr/lib/python3.6/site-packages/btrfs/ioctl.py
usr/lib/python3.6/site-packages/btrfs/ctree.py
usr/lib/python3.6/site-packages/btrfs/crc32c.py
usr/lib/python3.6/site-packages/btrfs/__init__.py
usr/lib/python3.6/site-packages/btrfs/__pycache__/utils.cpython-36.pyc
usr/lib/python3.6/site-packages/btrfs/__pycache__/ioctl.cpython-36.pyc
usr/lib/python3.6/site-packages/btrfs/__pycache__/ctree.cpython-36.pyc
usr/lib/python3.6/site-packages/btrfs/__pycache__/crc32c.cpython-36.pyc
usr/lib/python3.6/site-packages/btrfs/__pycache__/__init__.cpython-36.pyc
usr/lib/python3.6/site-packages/btrfs-5-py3.6.egg-info/installed-files.txt
usr/lib/python3.6/site-packages/btrfs-5-py3.6.egg-info/PKG-INFO
usr/lib/python3.6/site-packages/btrfs-5-py3.6.egg-info/dependency_links.txt
usr/lib/python3.6/site-packages/btrfs-5-py3.6.egg-info/top_level.txt
usr/lib/python3.6/site-packages/btrfs-5-py3.6.egg-info/SOURCES.txt

I think it's building automaticaly by:
PIP_CONFIG_FILE=/dev/null pip install --isolated --root="$pkgdir"
--ignore-installed --no-deps btrfs
-- 
Have a nice day,
Timofey.


Re: Btrfs Heatmap - v4 ... colors!!

2017-02-03 Thread Hans van Kranenburg
On 02/03/2017 03:18 PM, Timofey Titovets wrote:
> 2017-02-03 15:57 GMT+03:00 Hans van Kranenburg :
>> On 02/03/2017 12:25 PM, Timofey Titovets wrote:
>>> Thank you for your great work:
>>> JFYI Packaged in AUR:
>>> https://aur.archlinux.org/packages/python-btrfs-heatmap/
>>
>> Hey, thanks.
>>
>> Just wondering... what is that btrfs.py you refer to in...
>> https://aur.archlinux.org/cgit/aur.git/tree/PKGBUILD?h=python-btrfs ?
>>
>> It's a package, not a single file, so maybe that compile command doesn't
>> do anything? I'm not familiar with arch, so correct me if I'm wrong.
>>
>> .
>> ├── btrfs
>> │   ├── crc32c.py
>> │   ├── ctree.py
>> │   ├── __init__.py
>> │   ├── ioctl.py
>> │   ├── utils.py
>>
> 
> You're right, the compile command is useless, so I removed it, thanks.

-$ python2 -m compileall btrfs
Listing btrfs ...
Compiling btrfs/__init__.py ...
Compiling btrfs/crc32c.py ...
Compiling btrfs/ctree.py ...
Compiling btrfs/ioctl.py ...
Compiling btrfs/utils.py ...

.
├── btrfs
│   ├── crc32c.py
│   ├── crc32c.pyc
│   ├── ctree.py
│   ├── ctree.pyc
│   ├── __init__.py
│   ├── __init__.pyc
│   ├── ioctl.py
│   ├── ioctl.pyc
│   ├── utils.py
│   └── utils.pyc


-$ python3 -m compileall btrfs
Listing 'btrfs'...
Compiling 'btrfs/__init__.py'...
Compiling 'btrfs/crc32c.py'...
Compiling 'btrfs/ctree.py'...
Compiling 'btrfs/ioctl.py'...
Compiling 'btrfs/utils.py'...

.
├── btrfs
│   ├── crc32c.py
│   ├── ctree.py
│   ├── __init__.py
│   ├── ioctl.py
│   ├── __pycache__
│   │   ├── crc32c.cpython-35.pyc
│   │   ├── ctree.cpython-35.pyc
│   │   ├── __init__.cpython-35.pyc
│   │   ├── ioctl.cpython-35.pyc
│   │   └── utils.cpython-35.pyc
│   └── utils.py


-- 
Hans van Kranenburg


Re: Btrfs Heatmap - v4 ... colors!!

2017-02-03 Thread Timofey Titovets
2017-02-03 15:57 GMT+03:00 Hans van Kranenburg :
> On 02/03/2017 12:25 PM, Timofey Titovets wrote:
>> Thank you for your great work:
>> JFYI Packaged in AUR:
>> https://aur.archlinux.org/packages/python-btrfs-heatmap/
>
> Hey, thanks.
>
> Just wondering... what is that btrfs.py you refer to in...
> https://aur.archlinux.org/cgit/aur.git/tree/PKGBUILD?h=python-btrfs ?
>
> It's a package, not a single file, so maybe that compile command doesn't
> do anything? I'm not familiar with arch, so correct me if I'm wrong.
>
> .
> ├── btrfs
> │   ├── crc32c.py
> │   ├── ctree.py
> │   ├── __init__.py
> │   ├── ioctl.py
> │   ├── utils.py
>
> --
> Hans van Kranenburg

You're right, the compile command is useless, so I removed it, thanks.

-- 
Have a nice day,
Timofey.


Re: [PATCH 04/24] fs: Provide infrastructure for dynamic BDIs in filesystems

2017-02-03 Thread Jan Kara
On Thu 02-02-17 11:28:27, Liu Bo wrote:
> Hi,
> 
> On Thu, Feb 02, 2017 at 06:34:02PM +0100, Jan Kara wrote:
> > Provide helper functions for setting up dynamically allocated
> > backing_dev_info structures for filesystems and cleaning them up on
> > superblock destruction.
> 
> Just one concern, will this cause problems for multiple superblock cases
> like nfs with nosharecache?

Can you elaborate a bit? I've looked for a while at what nfs with
nosharecache does but I didn't see how it would influence anything with
bdis...

Honza
-- 
Jan Kara 
SUSE Labs, CR


[PATCH] btrfs-progs: better document btrfs receive security

2017-02-03 Thread Austin S. Hemmelgarn
This adds some extra documentation to the btrfs-receive manpage that
explains some of the security related aspects of btrfs-receive.  The
first part covers the fact that the subvolume being received is writable
until the receive finishes, and the second covers the current lack of
sanity checking of the send stream.

Signed-off-by: Austin S. Hemmelgarn 

---
Inspired by a recent thread on the ML.

This could probably be more thorough, but I felt it was more important
to get it documented as quickly as possible, and this should cover the
basic info that most people will care about.

 Documentation/btrfs-receive.asciidoc | 20 
 1 file changed, 20 insertions(+)

diff --git a/Documentation/btrfs-receive.asciidoc b/Documentation/btrfs-receive.asciidoc
index a6838e5e..1ada396f 100644
--- a/Documentation/btrfs-receive.asciidoc
+++ b/Documentation/btrfs-receive.asciidoc
@@ -68,6 +68,26 @@ dump the stream metadata, one line per operation
 +
 Does not require the 'path' parameter. The filesystem chanded.
 
+SECURITY
+
+Because of how *btrfs receive* works, subvolumes that are in the
+process of being received are writable.  As a result of this, there
+is a possibility that a subvolume for which the receive operation just
+completed may not be identical to the originally sent subvolume.
+
+To avoid this in cases where more than one user may have access to the
+area the received subvolumes are being stored, it is reccommended to
+receive subvolumes into a staging area which is only accessible to the
+user who is running receive, and then move then into the final destination
+when the receive command completes.
+
+Additionally, receive does not currently do a very good job of validating
+that an incremental send streams actually makes sense, and it is thus
+possible for a specially crafted send stream to create a subvolume with
+reflinks to arbitrary files in the same filesystem.  Because of this,
+users are advised to not use *btrfs receive* on send streams from
+untrusted sources.
+
 EXIT STATUS
 ---
 *btrfs receive* returns a zero exit status if it succeeds. Non zero is
-- 
2.11.0
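
For reference, a minimal sketch of the staging-area approach described above
(paths hypothetical; the staging directory is accessible only to the
receiving user, and the subvolume is moved into place once receive finishes):

  # mkdir -m 0700 /mnt/staging
  # btrfs receive /mnt/staging < /tmp/home.send
  # mv /mnt/staging/home.20170203 /mnt/backups/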



Re: btrfs_drop_snapshot "IO failure" after RAID controller reset

2017-02-03 Thread Juergen 'Louis' Fluk
On Fri, Feb 03, 2017 at 11:16:51AM +0100, Juergen 'Louis' Fluk wrote:
> Dear all,
> 
> the RAID controller underneath our 32T BTRFS container had a sudden reset,
> and after rebooting BTRFS drops to readonly after some list of messages.
> 
> I did recovery + btrfs-zero-log + recovery (using a LVM snapshot), yet
> the error persists. From "transid verify failed" I understand that journal
> and data are not in sync (data is newer). BTRFS tries to drop a snapshot
> and fails there - is there a way to ignore it or force it?
> 
> RAID controller does not signal new errors so I assume it's not a problem
> of accessing some single disk block, but possibly some information was not
> written to disk at the time of controller reset.
...
> 
>   mount -o recovery /dev/vg/snap /mnt/backup
> 
> Feb 3 08:05:57 zeus kernel: [336619.494618] BTRFS info (device dm-2): 
> enabling auto recovery
> Feb 3 08:05:57 zeus kernel: [336619.494625] BTRFS info (device dm-2): disk 
> space caching is enabled
> Feb 3 08:09:32 zeus kernel: [336834.568348] BTRFS: checking UUID tree
> Feb 3 08:10:44 zeus kernel: [336905.752787] BTRFS info (device dm-2): The 
> free space cache file (814462533632) is invalid. skip it
> Feb 3 08:10:44 zeus kernel: [336905.752787]
> Feb 3 08:11:26 zeus kernel: [336948.358199] BTRFS (device dm-2): parent 
> transid verify failed on 4052030455808 wanted 451805 found 451973
> Feb 3 08:11:26 zeus kernel: [336948.397901] BTRFS (device dm-2): parent 
> transid verify failed on 4052030455808 wanted 451805 found 451973
> Feb 3 08:11:46 zeus kernel: [336968.341996] BTRFS (device dm-2): parent 
> transid verify failed on 4052030455808 wanted 451805 found 451973
> Feb 3 08:11:46 zeus kernel: [336968.362567] BTRFS (device dm-2): parent 
> transid verify failed on 4052030455808 wanted 451805 found 451973
> Feb 3 08:11:46 zeus kernel: [336968.406344] BTRFS: error (device dm-2) in 
> btrfs_drop_snapshot:8367: errno=-5 IO failure
> Feb 3 08:11:46 zeus kernel: [336968.418816] BTRFS info (device dm-2): forced 
> readonly
>
...
> The server is running kernel 3.19.0-79-generic (ubuntu 14.04), btrfs-tools 
> 3.12-1ubuntu0.1.
> Does it make sense to use newer kernel and/or tools to recover?


Running on kernel 4.4.0-62-generic now, procedure looks quite similar:

  mount -o recovery /dev/vg/snap /mnt/backup

Feb 3 11:38:30 zeus kernel: [ 297.414369] BTRFS info (device dm-2): enabling 
auto recovery
Feb 3 11:38:30 zeus kernel: [ 297.414375] BTRFS info (device dm-2): disk space 
caching is enabled
Feb 3 11:41:54 zeus kernel: [ 501.145009] BTRFS: checking UUID tree
Feb 3 11:43:02 zeus kernel: [ 568.938947] BTRFS info (device dm-2): The free 
space cache file (814462533632) is invalid. skip it
Feb 3 11:43:02 zeus kernel: [ 568.938947]
Feb 3 11:44:57 zeus kernel: [ 683.656849] BTRFS error (device dm-2): parent 
transid verify failed on 4052030455808 wanted 451805 found 451973
Feb 3 11:44:57 zeus kernel: [ 683.718674] BTRFS error (device dm-2): parent 
transid verify failed on 4052030455808 wanted 451805 found 451973
Feb 3 11:44:59 zeus kernel: [ 686.344684] BTRFS error (device dm-2): parent 
transid verify failed on 4052030455808 wanted 451805 found 451973
Feb 3 11:44:59 zeus kernel: [ 686.370777] BTRFS error (device dm-2): parent 
transid verify failed on 4052030455808 wanted 451805 found 451973
Feb 3 11:44:59 zeus kernel: [ 686.374094] BTRFS: error (device dm-2) in 
btrfs_drop_snapshot:9008: errno=-5 IO failure
Feb 3 11:44:59 zeus kernel: [ 686.32] BTRFS info (device dm-2): forced 
readonly

  umount /mnt/backup

Feb 3 11:46:36 zeus kernel: [ 783.112240] BTRFS error (device dm-2): cleaner 
transaction attach returned -30

  btrfs-zero-log /dev/vg/snap # takes 180s, no messages
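
For reference, newer btrfs-progs also ship the same tool as a subcommand; a
sketch, assuming the same device:

  btrfs rescue zero-log /dev/vg/snap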

  mount -o recovery /dev/vg/snap /mnt/backup

Feb 3 11:49:35 zeus kernel: [ 961.805605] BTRFS info (device dm-2): enabling 
auto recovery
Feb 3 11:49:35 zeus kernel: [ 961.805611] BTRFS info (device dm-2): disk space 
caching is enabled
Feb 3 11:53:03 zeus kernel: [ 1170.373099] BTRFS: checking UUID tree
Feb 3 11:54:12 zeus kernel: [ 1238.660425] BTRFS error (device dm-2): parent 
transid verify failed on 4052030455808 wanted 451805 found 451973
Feb 3 11:54:12 zeus kernel: [ 1238.807281] BTRFS error (device dm-2): parent 
transid verify failed on 4052030455808 wanted 451805 found 451973
Feb 3 11:54:25 zeus kernel: [ 1252.132065] BTRFS error (device dm-2): parent 
transid verify failed on 4052030455808 wanted 451805 found 451973
Feb 3 11:54:25 zeus kernel: [ 1252.422404] BTRFS error (device dm-2): parent 
transid verify failed on 4052030455808 wanted 451805 found 451973
Feb 3 11:54:25 zeus kernel: [ 1252.425953] BTRFS: error (device dm-2) in 
btrfs_drop_snapshot:9008: errno=-5 IO failure
Feb 3 11:54:25 zeus kernel: [ 1252.429649] BTRFS info (device dm-2): forced 
readonly
Feb 3 11:59:14 zeus kernel: [ 1541.593077] BTRFS warning (device dm-2): 
btrfs_uuid_scan_kthread failed -30
Feb 3 12:00:28 zeus kernel: [ 1614.931233] BTRFS error (device dm-2): parent 

Re: Btrfs Heatmap - v4 ... colors!!

2017-02-03 Thread Hans van Kranenburg
On 02/03/2017 12:25 PM, Timofey Titovets wrote:
> Thank you for your great work:
> JFYI Packaged in AUR:
> https://aur.archlinux.org/packages/python-btrfs-heatmap/

Hey, thanks.

Just wondering... what is that btrfs.py you refer to in...
https://aur.archlinux.org/cgit/aur.git/tree/PKGBUILD?h=python-btrfs ?

It's a package, not a single file, so maybe that compile command doesn't
do anything? I'm not familiar with arch, so correct me if I'm wrong.

.
├── btrfs
│   ├── crc32c.py
│   ├── ctree.py
│   ├── __init__.py
│   ├── ioctl.py
│   ├── utils.py

-- 
Hans van Kranenburg


Re: btrfs receive leaves new subvolume modifiable during operation

2017-02-03 Thread Austin S. Hemmelgarn

On 2017-02-03 04:14, Duncan wrote:
> Graham Cobb posted on Thu, 02 Feb 2017 10:52:26 + as excerpted:
> 
>> On 02/02/17 00:02, Duncan wrote:
>>> If it's a workaround, then many of the Linux procedures we as admins
>>> and users use every day are equally workarounds.  Setting 007 perms on
>>> a dir that doesn't have anything immediately security vulnerable in it,
>>> simply to keep other users from even potentially seeing or being able
>>> to write to something N layers down the subdir tree, is standard
>>> practice.
>> 
>> No. There is no need to normally place a read-only snapshot below a
>> no-execute directory just to prevent write access to it. That is not
>> part of the admin's expectation.
> 
> Which is my point.  This is no different than standard security practice,
> that an admin should be familiar with and using without even having to
> think about it.  Btrfs is simply making the same assumptions that
> everyone else does, that an admin knows what they are doing and sets
> the upstream permissions with that in mind.  If they don't, how is that
> btrfs' fault?
> 
>> Because btrfs intends the receive snapshot to be read-only. That is the
>> expectation of the sysadmin.
> 
> Read-only *after* completion, yes.  But a sysadmin that believes really
> setting something read-only and then trying to write to it from
> userspace, which is what btrfs does until the receive is done, should
> work, doesn't understand the meaning of read-only.
> 
> Meanwhile, Austin said most of what I'd say, probably better than I'd say
> it, so I won't repeat it here, but I agree with him.
> 
>> Even though it is security-related, I agree it isn't the highest
>> priority btrfs bug. It can probably wait until receive is being worked
>> on for other reasons. But if it isn't going to be fixed any time soon,
>> it should be documented in the Wiki and the man page, with the suggested
>> workround for anyone who needs to make sure the receive won't be
>> tampered with.
> 
> One thing I was going to say in the previous post and forgot, is that not
> withstanding all the technical reasons, I do agree that documenting it in
> the manpage, etc, would be a good idea.  In that I agree with both you
> and Austin. =:^)
I can look at making a patch for this, but it may be next week before I
have time (I'm not great at multi-tasking when it comes to software
development, and I'm in the middle of helping to fix a bug in Ansible
right now).



Re: Btrfs Heatmap - v4 ... colors!!

2017-02-03 Thread Timofey Titovets
Thank you for your great work:
JFYI Packaged in AUR:
https://aur.archlinux.org/packages/python-btrfs-heatmap/
-- 
Have a nice day,
Timofey.


How to dump/find parity of RAID-5 file?

2017-02-03 Thread Lakshmipathi.G
Hi.

Came across this thread:
https://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg55161.html
I'm exploring the possibility of adding test scripts around this area using
dump-tree & corrupt-block, but I'm unable to figure out how to get the parity
of a file or find its location. dump-tree output gave:

item 5 key (FIRST_CHUNK_TREE CHUNK_ITEM 145096704) itemoff 15557 itemsize 144
	length 134217728 owner 2 stripe_len 65536 type DATA|RAID5
	io_align 65536 io_width 65536 sector_size 4096
	num_stripes 3 sub_stripes 0
		stripe 0 devid 3 offset 6368        # Is this parity? Seems empty?
			dev_uuid f62df114-186c-4e48-8152-9ed15aa078b4
		stripe 1 devid 2 offset 6368        # Contains file data-stripe-1
			dev_uuid c0aeaab0-e57e-4f7a-9356-db1878876d9f
		stripe 2 devid 1 offset 83034112    # Contains file data-stripe-2
			dev_uuid 637b3666-9d8f-4ec4-9969-53b0b933b9b1
thanks.
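
For reference, one way to see where a logical address physically lives is
btrfs-map-logical; a sketch, assuming one of the member devices (how it
reports RAID5 stripes may vary by btrfs-progs version):

  # btrfs-map-logical -l 145096704 /dev/sdb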

Cheers.
Lakshmipathi.G


btrfs_drop_snapshot "IO failure" after RAID controller reset

2017-02-03 Thread Juergen 'Louis' Fluk
Dear all,

the RAID controller underneath our 32T BTRFS container had a sudden reset,
and after rebooting BTRFS drops to readonly after some list of messages.

I did recovery + btrfs-zero-log + recovery (using a LVM snapshot), yet
the error persists. From "transid verify failed" I understand that journal
and data are not in sync (data is newer). BTRFS tries to drop a snapshot
and fails there - is there a way to ignore it or force it?

RAID controller does not signal new errors so I assume it's not a problem
of accessing some single disk block, but possibly some information was not
written to disk at the time of controller reset.


  mount -o recovery /dev/vg/snap /mnt/backup

Feb 3 08:05:57 zeus kernel: [336619.494618] BTRFS info (device dm-2): enabling auto recovery
Feb 3 08:05:57 zeus kernel: [336619.494625] BTRFS info (device dm-2): disk space caching is enabled
Feb 3 08:09:32 zeus kernel: [336834.568348] BTRFS: checking UUID tree
Feb 3 08:10:44 zeus kernel: [336905.752787] BTRFS info (device dm-2): The free space cache file (814462533632) is invalid. skip it
Feb 3 08:11:26 zeus kernel: [336948.358199] BTRFS (device dm-2): parent transid verify failed on 4052030455808 wanted 451805 found 451973
Feb 3 08:11:26 zeus kernel: [336948.397901] BTRFS (device dm-2): parent transid verify failed on 4052030455808 wanted 451805 found 451973
Feb 3 08:11:46 zeus kernel: [336968.341996] BTRFS (device dm-2): parent transid verify failed on 4052030455808 wanted 451805 found 451973
Feb 3 08:11:46 zeus kernel: [336968.362567] BTRFS (device dm-2): parent transid verify failed on 4052030455808 wanted 451805 found 451973
Feb 3 08:11:46 zeus kernel: [336968.406344] BTRFS: error (device dm-2) in btrfs_drop_snapshot:8367: errno=-5 IO failure
Feb 3 08:11:46 zeus kernel: [336968.418816] BTRFS info (device dm-2): forced readonly

  umount /mnt/backup

Feb 3 08:14:13 zeus kernel: [337114.733143] BTRFS warning (device dm-2): page private not zero on page 4049746657280
Feb 3 08:14:13 zeus kernel: [337114.733148] BTRFS warning (device dm-2): page private not zero on page 4049746661376
Feb 3 08:14:13 zeus kernel: [337114.733151] BTRFS warning (device dm-2): page private not zero on page 4049746665472
Feb 3 08:14:13 zeus kernel: [337114.733154] BTRFS warning (device dm-2): page private not zero on page 4049746669568

  btrfs-zero-log /dev/vg/snap # takes about 180s, no messages

  mount -o recovery /dev/vg/snap /mnt/backup

Feb 3 08:17:01 zeus kernel: [337282.701412] BTRFS info (device dm-2): enabling auto recovery
Feb 3 08:17:01 zeus kernel: [337282.701418] BTRFS info (device dm-2): disk space caching is enabled
Feb 3 08:20:30 zeus kernel: [337492.359931] BTRFS: checking UUID tree
Feb 3 08:21:01 zeus kernel: [337523.269214] BTRFS (device dm-2): parent transid verify failed on 4052030455808 wanted 451805 found 451973
Feb 3 08:21:01 zeus kernel: [337523.382927] BTRFS (device dm-2): parent transid verify failed on 4052030455808 wanted 451805 found 451973
Feb 3 08:26:06 zeus kernel: [337828.19] BTRFS (device dm-2): parent transid verify failed on 4052043694080 wanted 451805 found 451973
Feb 3 08:26:06 zeus kernel: [337828.291338] BTRFS (device dm-2): parent transid verify failed on 4052043694080 wanted 451805 found 451973
Feb 3 08:26:11 zeus kernel: [337833.611569] BTRFS (device dm-2): parent transid verify failed on 4050351652864 wanted 451804 found 451973
Feb 3 08:26:12 zeus kernel: [337833.662051] BTRFS (device dm-2): parent transid verify failed on 4050351652864 wanted 451804 found 451973
Feb 3 08:26:15 zeus kernel: [337837.077964] BTRFS (device dm-2): parent transid verify failed on 4052066533376 wanted 451806 found 451974
Feb 3 08:26:15 zeus kernel: [337837.106540] BTRFS (device dm-2): parent transid verify failed on 4052066533376 wanted 451806 found 451974
Feb 3 08:26:20 zeus kernel: [337842.595882] BTRFS (device dm-2): parent transid verify failed on 4050367676416 wanted 451804 found 451973
Feb 3 08:26:21 zeus kernel: [337842.686296] BTRFS (device dm-2): parent transid verify failed on 4050367676416 wanted 451804 found 451973
Feb 3 08:26:24 zeus kernel: [337845.666495] BTRFS (device dm-2): parent transid verify failed on 4051971883008 wanted 451804 found 451973
Feb 3 08:26:24 zeus kernel: [337845.728624] BTRFS (device dm-2): parent transid verify failed on 4051971883008 wanted 451804 found 451973
Feb 3 08:26:27 zeus kernel: [337848.780978] BTRFS (device dm-2): parent transid verify failed on 4051397328896 wanted 451804 found 451973
Feb 3 08:26:27 zeus kernel: [337848.827572] BTRFS (device dm-2): parent transid verify failed on 4051397328896 wanted 451804 found 451973
Feb 3 08:26:27 zeus kernel: [337849.116946] BTRFS (device dm-2): parent transid verify failed on 4052072022016 wanted 451806 found 451974
Feb 3 08:26:27 zeus kernel: [337849.164664] BTRFS (device dm-2): parent transid verify failed on 4052072022016 wanted 451806 found 451974
Feb 3 08:26:29 

Re: raid1: cannot add disk to replace faulty because can only mount fs as read-only.

2017-02-03 Thread Duncan
Austin S. Hemmelgarn posted on Thu, 02 Feb 2017 07:49:50 -0500 as
excerpted:

> I think (although I'm not sure about it) that this:
> http://www.spinics.net/lists/linux-btrfs/msg47283.html is the first
> posting of the patch series.

Yes.  That looks like it.  Thanks.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman



Re: btrfs receive leaves new subvolume modifiable during operation

2017-02-03 Thread Duncan
Graham Cobb posted on Thu, 02 Feb 2017 10:52:26 + as excerpted:

> On 02/02/17 00:02, Duncan wrote:
>> If it's a workaround, then many of the Linux procedures we as admins
>> and users use every day are equally workarounds.  Setting 007 perms on
>> a dir that doesn't have anything immediately security vulnerable in it,
>> simply to keep other users from even potentially seeing or being able
>> to write to something N layers down the subdir tree, is standard
>> practice.
> 
> No. There is no need to normally place a read-only snapshot below a
> no-execute directory just to prevent write access to it. That is not
> part of the admin's expectation.
> 
>> Which is my point.  This is no different than standard security
>> practice,
>> that an admin should be familiar with and using without even having to
>> think about it.  Btrfs is simply making the same assumptions that
>> everyone else does, that an admin knows what they are doing and sets
>> the upstream permissions with that in mind.  If they don't, how is that
>> btrfs' fault?
> 
> Because btrfs intends the receive snapshot to be read-only. That is the
> expectation of the sysadmin.

Read-only *after* completion, yes.  But a sysadmin who believes that truly 
setting something read-only, and then trying to write to it from userspace 
(which is what btrfs does until the receive is done), should work, doesn't 
understand the meaning of read-only.

Meanwhile, Austin said most of what I'd say, probably better than I'd say 
it, so I won't repeat it here, but I agree with him.

> Even though it is security-related, I agree it isn't the highest
> priority btrfs bug. It can probably wait until receive is being worked
> on for other reasons. But if it isn't going to be fixed any time soon,
> it should be documented in the Wiki and the man page, with the suggested
> workround for anyone who needs to make sure the receive won't be
> tampered with.

One thing I was going to say in the previous post and forgot is that, 
notwithstanding all the technical reasons, I do agree that documenting it in 
the manpage, etc, would be a good idea.  In that I agree with both you 
and Austin. =:^)

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman



[PATCH 2/5] btrfs: scrub: Fix RAID56 recovery race condition

2017-02-03 Thread Qu Wenruo
When scrubbing a RAID5 which has recoverable data corruption (only one
data stripe is corrupted), scrub sometimes reports more csum errors than
expected, and sometimes even reports an unrecoverable error.

The problem can be easily reproduced by the following steps:
1) Create a btrfs with RAID5 data profile with 3 devs
2) Mount it with nospace_cache or space_cache=v2
   To avoid extra data space usage.
3) Create a 128K file and sync the fs, unmount it
   Now the 128K file lies at the beginning of the data chunk
4) Locate the physical bytenr of data chunk on dev3
   Dev3 is the 1st data stripe.
5) Corrupt the first 64K of the data chunk stripe on dev3
6) Mount the fs and scrub it

The correct csum error count should be 16 (assuming x86_64 with 4K pages:
the corrupted 64K covers 16 4K sectors). A larger csum error count is
reported with about a 1/3 chance, and an unrecoverable error can also be
reported with about a 1/10 chance.

The root cause of the problem is a race condition in the RAID5/6 recovery
code, stemming from the fact that a full scrub is initiated per device.

For other, mirror-based profiles each mirror is independent of the others,
so the race won't cause any big problem there.

For example:

Corrupted             | Correct              | Correct
Scrub dev3 (D1)       | Scrub dev2 (D2)      | Scrub dev1 (P)
----------------------+----------------------+----------------------
Read out D1           | Read out D2          | Read full stripe
Check csum            | Check csum           | Check parity
Csum mismatch         | Csum match, continue | Parity mismatch
handle_errored_block  |                      | handle_errored_block
 Read out full stripe |                      |  Read out full stripe
 D1 csum error(err++) |                      |  D1 csum error(err++)
 Recover D1           |                      |  Recover D1

So D1's csum error is accounted twice, just because
handle_errored_block() doesn't have enough protection and the race can
happen.

In an even worse case, if D1's recovery code is re-writing D1/D2/P while
P's recovery code is just reading out the full stripe, we can cause an
unrecoverable error.

This patch uses the previously introduced lock_full_stripe() and
unlock_full_stripe() to protect the whole of scrub_handle_errored_block()
for RAID56 recovery, so no extra csum errors or unrecoverable errors are
reported.

Reported-by: Goffredo Baroncelli 
Signed-off-by: Qu Wenruo 
---
 fs/btrfs/scrub.c | 15 +++
 1 file changed, 15 insertions(+)

diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index e68369d425b0..2be7f2546e3a 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -1067,6 +1067,7 @@ static int scrub_handle_errored_block(struct scrub_block *sblock_to_check)
unsigned int have_csum;
struct scrub_block *sblocks_for_recheck; /* holds one for each mirror */
struct scrub_block *sblock_bad;
+   int lock_ret;
int ret;
int mirror_index;
int page_num;
@@ -1096,6 +1097,17 @@ static int scrub_handle_errored_block(struct scrub_block *sblock_to_check)
have_csum = sblock_to_check->pagev[0]->have_csum;
dev = sblock_to_check->pagev[0]->dev;
 
+	/*
+	 * For RAID5/6, a race can happen between different per-device
+	 * scrub threads: for data corruption, the parity and the data
+	 * threads will both try to recover the data.
+	 * The race can lead to double-counted csum errors, or even an
+	 * unrecoverable error.
+	 */
+   ret = lock_full_stripe(fs_info, logical, GFP_NOFS);
+   if (ret < 0)
+   return ret;
+
if (sctx->is_dev_replace && !is_metadata && !have_csum) {
sblocks_for_recheck = NULL;
goto nodatasum_case;
@@ -1430,6 +1442,9 @@ static int scrub_handle_errored_block(struct scrub_block *sblock_to_check)
kfree(sblocks_for_recheck);
}
 
+   lock_ret = unlock_full_stripe(fs_info, logical);
+   if (lock_ret < 0)
+   return lock_ret;
return 0;
 }
 
-- 
2.11.0





[PATCH 5/5] btrfs: replace: Use ref counts to avoid destroying target device when canceled

2017-02-03 Thread Qu Wenruo
When dev-replace and scrub are run at the same time, dev-replace can be
canceled by scrub. It's quite common for btrfs/069.

The backtrace would be like:
general protection fault:  [#1] SMP
Workqueue: btrfs-endio-raid56 btrfs_endio_raid56_helper [btrfs]
RIP: 0010:[]  [] 
generic_make_request_checks+0x198/0x5a0
Call Trace:
 [] ? generic_make_request+0xcf/0x290
 [] generic_make_request+0x24/0x290
 [] ? generic_make_request+0xcf/0x290
 [] submit_bio+0x6e/0x120
 [] ? page_in_rbio+0x4d/0x80 [btrfs]
 [] ? rbio_orig_end_io+0x80/0x80 [btrfs]
 [] finish_rmw+0x401/0x550 [btrfs]
 [] validate_rbio_for_rmw+0x36/0x40 [btrfs]
 [] raid_rmw_end_io+0x7d/0x90 [btrfs]
 [] bio_endio+0x56/0x60
 [] end_workqueue_fn+0x3c/0x40 [btrfs]
 [] btrfs_scrubparity_helper+0xef/0x610 [btrfs]
 [] btrfs_endio_raid56_helper+0xe/0x10 [btrfs]
 [] process_one_work+0x2af/0x720
 [] ? process_one_work+0x22b/0x720
 [] worker_thread+0x4b/0x4f0
 [] ? process_one_work+0x720/0x720
 [] ? process_one_work+0x720/0x720
 [] kthread+0xf3/0x110
 [] ? kthread_park+0x60/0x60
 [] ret_from_fork+0x27/0x40

In that case, the target device can be destroyed at cancel time,
leading to a use-after-free bug:

 Process A (dev-replace) | Process B(scrub)
--
 |(Any RW is OK)
 |scrub_setup_recheck_block()
 ||- btrfs_map_sblock()
 |   Got a bbio with tgtdev
btrfs_dev_replace_finishing()|
|- btrfs_destroy_dev_replace_tgtdev()|
   |- call_rcu(free_device)  |
  |- __free_device() |
 |- kfree(device)|
 | Scrub worker:
 | Access bbio->stripes[], which
 | contains tgtdev.
 | This triggers general protection.

The bug is most obvious for RAID5/6, since raid56 chooses to keep the old
rbio and rbio->bbio for later stealing; this hugely enlarges the race
window and makes it much easier to trigger the bug.

This patch introduces 'tgtdev_refs' and 'tgtdev_wait' in btrfs_device,
so we can wait until all users of the target device have released it.

Signed-off-by: Qu Wenruo 
---
 fs/btrfs/dev-replace.c |  7 ++-
 fs/btrfs/volumes.c | 36 +++-
 fs/btrfs/volumes.h | 10 ++
 3 files changed, 51 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c
index 5de280b9ad73..794a6a0bedf2 100644
--- a/fs/btrfs/dev-replace.c
+++ b/fs/btrfs/dev-replace.c
@@ -558,7 +558,6 @@ static int btrfs_dev_replace_finishing(struct btrfs_fs_info *fs_info,
  rcu_str_deref(src_device->name),
  src_device->devid,
  rcu_str_deref(tgt_device->name));
-   tgt_device->is_tgtdev_for_dev_replace = 0;
tgt_device->devid = src_device->devid;
src_device->devid = BTRFS_DEV_REPLACE_DEVID;
memcpy(uuid_tmp, tgt_device->uuid, sizeof(uuid_tmp));
@@ -579,6 +578,12 @@ static int btrfs_dev_replace_finishing(struct btrfs_fs_info *fs_info,
 
btrfs_dev_replace_unlock(dev_replace, 1);
 
+	/*
+	 * Only change the is_tgtdev_for_dev_replace flag after all of
+	 * its users have released the device.
+	 */
+	wait_target_device(tgt_device);
+	tgt_device->is_tgtdev_for_dev_replace = 0;
btrfs_rm_dev_replace_blocked(fs_info);
 
btrfs_rm_dev_replace_remove_srcdev(fs_info, src_device);
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 3c3c69c0eee4..84db9fb22b7d 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -2064,6 +2064,7 @@ void btrfs_destroy_dev_replace_tgtdev(struct btrfs_fs_info *fs_info,
	WARN_ON(!tgtdev);
	mutex_lock(&fs_info->fs_devices->device_list_mutex);

+	wait_target_device(tgtdev);
btrfs_sysfs_rm_device_link(fs_info->fs_devices, tgtdev);
 
if (tgtdev->bdev)
@@ -2598,6 +2599,8 @@ int btrfs_init_dev_replace_tgtdev(struct btrfs_fs_info *fs_info,
	device->is_tgtdev_for_dev_replace = 1;
	device->mode = FMODE_EXCL;
	device->dev_stats_valid = 1;
+	atomic_set(&device->tgtdev_refs, 0);
+	init_waitqueue_head(&device->tgtdev_wait);
set_blocksize(device->bdev, 4096);
device->fs_devices = fs_info->fs_devices;
	list_add(&device->dev_list, &fs_info->fs_devices->devices);
@@ -2624,6 +2627,8 @@ void btrfs_init_dev_replace_tgtdev_for_resume(struct btrfs_fs_info *fs_info,
	tgtdev->sector_size = sectorsize;
	tgtdev->fs_info = fs_info;
	tgtdev->in_fs_metadata = 1;
+	atomic_set(&tgtdev->tgtdev_refs, 0);
+	init_waitqueue_head(&tgtdev->tgtdev_wait);
 }
 
 static noinline int btrfs_update_device(struct btrfs_trans_handle *trans,
@@ -5302,6 +5307,32 @@ static struct btrfs_bio *alloc_btrfs_bio(int 
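
(The remainder of this patch is truncated in the archive. Judging from the
commit message and the wait_target_device()/tgtdev_refs usage above, the new
helpers in volumes.h presumably look something like the sketch below; this is
a reconstruction, not the actual patch body, and the pin/unpin names are
guesses:)

/* Reconstruction sketch, not the actual patch: bracket any use of a
 * bbio that references the target device with pin/unpin, and let
 * wait_target_device() sleep until the count drops to zero. */
static inline void btrfs_pin_tgtdev(struct btrfs_device *tgt_device)
{
	atomic_inc(&tgt_device->tgtdev_refs);
}

static inline void btrfs_unpin_tgtdev(struct btrfs_device *tgt_device)
{
	if (atomic_dec_and_test(&tgt_device->tgtdev_refs))
		wake_up(&tgt_device->tgtdev_wait);
}

static inline void wait_target_device(struct btrfs_device *tgt_device)
{
	if (!tgt_device || !tgt_device->is_tgtdev_for_dev_replace)
		return;

	wait_event(tgt_device->tgtdev_wait,
		   atomic_read(&tgt_device->tgtdev_refs) == 0);
}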

[PATCH 1/5] btrfs: scrub: Introduce full stripe lock for RAID56

2017-02-03 Thread Qu Wenruo
Unlike mirror-based profiles, RAID5/6 recovery needs to read out the
whole full stripe, and without proper protection this can easily cause
race conditions.

Introduce 2 new functions for RAID5/6: lock_full_stripe() and
unlock_full_stripe(). They store an rb_tree of mutexes for full
stripes, so scrub callers can lock a full stripe to avoid races.

Signed-off-by: Qu Wenruo 
---
 fs/btrfs/ctree.h   |   4 ++
 fs/btrfs/extent-tree.c |   3 +
 fs/btrfs/scrub.c   | 177 +
 3 files changed, 184 insertions(+)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 6a823719b6c5..0dc0b113a691 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -639,6 +639,10 @@ struct btrfs_block_group_cache {
 * Protected by free_space_lock.
 */
int needs_free_space;
+
+   /* Scrub full stripe lock tree for RAID5/6 scrub */
+   struct rb_root scrub_lock_root;
+   spinlock_t scrub_lock;
 };
 
 /* delayed seq elem */
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index dcd2e798767e..79769c168230 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -130,6 +130,7 @@ void btrfs_put_block_group(struct btrfs_block_group_cache *cache)
	if (atomic_dec_and_test(&cache->count)) {
WARN_ON(cache->pinned > 0);
WARN_ON(cache->reserved > 0);
+		WARN_ON(!RB_EMPTY_ROOT(&cache->scrub_lock_root));
kfree(cache->free_space_ctl);
kfree(cache);
}
@@ -9910,6 +9911,8 @@ btrfs_create_block_group_cache(struct btrfs_fs_info *fs_info,

	atomic_set(&cache->count, 1);
	spin_lock_init(&cache->lock);
+	spin_lock_init(&cache->scrub_lock);
+	cache->scrub_lock_root = RB_ROOT;
	init_rwsem(&cache->data_rwsem);
	INIT_LIST_HEAD(&cache->list);
	INIT_LIST_HEAD(&cache->cluster_list);
diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index 9a94670536a6..e68369d425b0 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -240,6 +240,13 @@ struct scrub_warning {
struct btrfs_device *dev;
 };
 
+struct scrub_full_stripe_lock {
+   struct rb_node node;
+   u64 logical;
+   u64 refs;
+   struct mutex mutex;
+};
+
 static void scrub_pending_bio_inc(struct scrub_ctx *sctx);
 static void scrub_pending_bio_dec(struct scrub_ctx *sctx);
 static void scrub_pending_trans_workers_inc(struct scrub_ctx *sctx);
@@ -351,6 +358,176 @@ static void scrub_blocked_if_needed(struct btrfs_fs_info *fs_info)
 }
 
+/*
+ * Caller must hold cache->scrub_lock.
+ *
+ * Return the existing full stripe lock and increase its refs,
+ * or return NULL and insert @fstripe_lock into the bg cache.
+ */
+static struct scrub_full_stripe_lock *
+add_scrub_lock(struct btrfs_block_group_cache *cache,
+  struct scrub_full_stripe_lock *fstripe_lock)
+{
+   struct rb_node **p;
+   struct rb_node *parent = NULL;
+   struct scrub_full_stripe_lock *entry;
+
+	p = &cache->scrub_lock_root.rb_node;
+   while (*p) {
+   parent = *p;
+   entry = rb_entry(parent, struct scrub_full_stripe_lock, node);
+   if (fstripe_lock->logical < entry->logical) {
+   p = &(*p)->rb_left;
+   } else if (fstripe_lock->logical > entry->logical) {
+   p = &(*p)->rb_right;
+   } else {
+   entry->refs++;
+   return entry;
+   }
+   }
+   /* Insert new one */
+	rb_link_node(&fstripe_lock->node, parent, p);
+	rb_insert_color(&fstripe_lock->node, &cache->scrub_lock_root);
+
+   return NULL;
+}
+
+static struct scrub_full_stripe_lock *
+search_scrub_lock(struct btrfs_block_group_cache *cache, u64 bytenr)
+{
+   struct rb_node *node;
+   struct scrub_full_stripe_lock *entry;
+
+   node = cache->scrub_lock_root.rb_node;
+   while (node) {
+   entry = rb_entry(node, struct scrub_full_stripe_lock, node);
+   if (bytenr < entry->logical)
+   node = node->rb_left;
+   else if (bytenr > entry->logical)
+   node = node->rb_right;
+   else
+   return entry;
+   }
+   return NULL;
+}
+
+/*
+ * Helper to get full stripe logical from a normal bytenr.
+ * Thanks to the chaos of scrub structures, we need to get it all
+ * by ourselves, using btrfs_map_sblock().
+ */
+static int get_full_stripe_logical(struct btrfs_fs_info *fs_info, u64 bytenr,
+  u64 *bytenr_ret)
+{
+   struct btrfs_bio *bbio = NULL;
+   u64 len;
+   int ret;
+
+   /* Just use map_sblock() to get full stripe logical */
+	ret = btrfs_map_sblock(fs_info, BTRFS_MAP_GET_READ_MIRRORS, bytenr,
+			       &len, &bbio, 0, 1);
+   if (ret || !bbio || !bbio->raid_map)
+   goto error;
+   *bytenr_ret = bbio->raid_map[0];
+   btrfs_put_bbio(bbio);
+   
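
(The rest of this patch is truncated in the archive. Building on
add_scrub_lock() and get_full_stripe_logical() above, lock_full_stripe()
presumably looks roughly like the sketch below; this is a reconstruction
under that assumption, not the actual patch body:)

/* Reconstruction sketch, not the actual patch. */
static int lock_full_stripe(struct btrfs_fs_info *fs_info, u64 bytenr,
			    gfp_t gfp)
{
	struct btrfs_block_group_cache *bg_cache;
	struct scrub_full_stripe_lock *fstripe_lock;
	struct scrub_full_stripe_lock *existing;
	u64 fstripe_start;
	int ret = 0;

	bg_cache = btrfs_lookup_block_group(fs_info, bytenr);
	if (!bg_cache)
		return -ENOENT;

	/* Map @bytenr to the logical start of its full stripe */
	ret = get_full_stripe_logical(fs_info, bytenr, &fstripe_start);
	if (ret < 0)
		goto out;

	fstripe_lock = kmalloc(sizeof(*fstripe_lock), gfp);
	if (!fstripe_lock) {
		ret = -ENOMEM;
		goto out;
	}
	fstripe_lock->logical = fstripe_start;
	fstripe_lock->refs = 1;
	mutex_init(&fstripe_lock->mutex);

	spin_lock(&bg_cache->scrub_lock);
	existing = add_scrub_lock(bg_cache, fstripe_lock);
	spin_unlock(&bg_cache->scrub_lock);
	if (existing) {
		/* Someone already inserted a lock for this full stripe */
		kfree(fstripe_lock);
		fstripe_lock = existing;
	}
	mutex_lock(&fstripe_lock->mutex);
out:
	btrfs_put_block_group(bg_cache);
	return ret;
}

unlock_full_stripe() would then presumably do the reverse: look the entry up
under cache->scrub_lock, mutex_unlock() it, drop one ref, and free the node
when the last ref goes away.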

[PATCH 3/5] btrfs: raid56: Use correct stolen pages to calculate P/Q

2017-02-03 Thread Qu Wenruo
In the following situation, scrub will calculate a wrong parity and use it
to overwrite the correct one:

RAID5 full stripe:

Before:
| Dev 1          | Dev 2          | Dev 3          |
| Data stripe 1  | Data stripe 2  | Parity stripe  |
---------------------------------------------------- 0
| 0x0000 (Bad)   | 0xcdcd         | 0x0000         |
---------------------------------------------------- 4K
| 0xcdcd         | 0xcdcd         | 0x0000         |
...
| 0xcdcd         | 0xcdcd         | 0x0000         |
---------------------------------------------------- 64K

After scrubbing dev3 only:

| Dev 1          | Dev 2          | Dev 3          |
| Data stripe 1  | Data stripe 2  | Parity stripe  |
---------------------------------------------------- 0
| 0xcdcd (Good)  | 0xcdcd         | 0xcdcd (Bad)   |
---------------------------------------------------- 4K
| 0xcdcd         | 0xcdcd         | 0x0000         |
...
| 0xcdcd         | 0xcdcd         | 0x0000         |
---------------------------------------------------- 64K

The calltrace of such corruption is as follows:

scrub_bio_end_io_worker() gets called for each extent read out
|- scrub_block_complete()
   |- Data extent csum mismatch
      |- scrub_handle_errored_block()
         |- scrub_recheck_block()
            |- scrub_submit_raid56_bio_wait()
               |- raid56_parity_recover()

Now we have an rbio with the correct data stripe 1 recovered.
Let's call it "good_rbio".

scrub_parity_check_and_repair()
|- raid56_parity_submit_scrub_rbio()
   |- lock_stripe_add()
   |  |- steal_rbio()
   |     |- Recovered data are stolen from "good_rbio" and stored into
   |        rbio->stripe_pages[].
   |        Now rbio->bio_pages[] are the bad data read from disk.
   |- async_scrub_parity()
      |- scrub_parity_work() (delayed call to scrub_parity_work)

scrub_parity_work()
|- raid56_parity_scrub_stripe()
   |- validate_rbio_for_parity_scrub()
      |- finish_parity_scrub()
         |- Recalculate parity using *BAD* pages in rbio->bio_pages[],
            so the good parity is overwritten with a *BAD* one

The fix is to introduce 2 new members, bad_ondisk_a/b, in struct
btrfs_raid_bio, to tell the scrub code to use the correct data pages to
re-calculate parity.

Reported-by: Goffredo Baroncelli 
Signed-off-by: Qu Wenruo 
---
 fs/btrfs/raid56.c | 62 +++
 1 file changed, 58 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/raid56.c b/fs/btrfs/raid56.c
index d2a9a1ee5361..453eefdcb591 100644
--- a/fs/btrfs/raid56.c
+++ b/fs/btrfs/raid56.c
@@ -133,6 +133,16 @@ struct btrfs_raid_bio {
/* second bad stripe (for raid6 use) */
int failb;
 
+	/*
+	 * steal_rbio() can steal the recovered correct pages, but
+	 * finish_parity_scrub() would still use the bad on-disk pages
+	 * to calculate parity.
+	 * Use these members to tell finish_parity_scrub() to use the
+	 * correct pages.
+	 */
+   int bad_ondisk_a;
+   int bad_ondisk_b;
+
int scrubp;
/*
 * number of pages needed to represent the full
@@ -310,6 +320,12 @@ static void steal_rbio(struct btrfs_raid_bio *src, struct btrfs_raid_bio *dest)
	if (!test_bit(RBIO_CACHE_READY_BIT, &src->flags))
return;
 
+   /* Record recovered stripe number */
+   if (src->faila != -1)
+   dest->bad_ondisk_a = src->faila;
+   if (src->failb != -1)
+   dest->bad_ondisk_b = src->failb;
+
for (i = 0; i < dest->nr_pages; i++) {
s = src->stripe_pages[i];
if (!s || !PageUptodate(s)) {
@@ -999,6 +1015,8 @@ static struct btrfs_raid_bio *alloc_rbio(struct btrfs_fs_info *fs_info,
	rbio->stripe_npages = stripe_npages;
rbio->faila = -1;
rbio->failb = -1;
+   rbio->bad_ondisk_a = -1;
+   rbio->bad_ondisk_b = -1;
	atomic_set(&rbio->refs, 1);
	atomic_set(&rbio->error, 0);
	atomic_set(&rbio->stripes_pending, 0);
@@ -2261,6 +2279,9 @@ static int alloc_rbio_essential_pages(struct btrfs_raid_bio *rbio)
int bit;
int index;
struct page *page;
+   struct page *bio_page;
+   void *ptr;
+   void *bio_ptr;
 
for_each_set_bit(bit, rbio->dbitmap, rbio->stripe_npages) {
for (i = 0; i < rbio->real_stripes; i++) {
@@ -2271,6 +2292,23 @@ static int alloc_rbio_essential_pages(struct btrfs_raid_bio *rbio)
page = alloc_page(GFP_NOFS | __GFP_HIGHMEM);
if (!page)
return -ENOMEM;
+
+			/*
+			 * It's possible that only several pages need
+			 * recovery, and the rest are all good.
+			 * In that case we need to copy the good bio pages
+			 * into stripe_pages[], as we will use stripe_pages[]
+			 * rather than bio_pages[] in finish_parity_scrub().
+			 */
+  
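
(The hunk is cut off here in the archive. Based on the bio_page/ptr/bio_ptr
variables declared above, the missing body presumably copies the bio page
into the newly allocated stripe page, roughly like this sketch -- a guess,
not the actual patch body:)

			/* Sketch: seed the new stripe page with the
			 * data that was read via the bio. */
			bio_page = page_in_rbio(rbio, i, bit, 0);
			if (bio_page) {
				ptr = kmap(page);
				bio_ptr = kmap(bio_page);
				memcpy(ptr, bio_ptr, PAGE_SIZE);
				kunmap(bio_page);
				kunmap(page);
			}
			rbio->stripe_pages[index] = page;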

[PATCH 4/5] btrfs: raid56: Don't keep rbio for later steal

2017-02-03 Thread Qu Wenruo
Before this patch, btrfs raid56 keeps the rbio around even after all its
IO is done.
This may save some time allocating rbios, but it can cause a deadly
use-after-free bug in the following case:

Original fs: 4 devices RAID5

   Process A |  Process B
--
 |  Start device replace
 |Now the fs has 5 devices
 |devid 0: replace device
 |devid 1~4: old devices
btrfs_map_bio()  |
|- __btrfs_map_block()   |
|bbio has 5 stripes  |
|including devid 0   |
|- raid56_parity_write() |
 |
raid_write_end_io()  |
|- rbio_orig_end_io()|
   |- unlock_stripe()|
   Keeps the old rbio for|
   later steal, which has|
   stripe for devid 0|
 |  Cancel device replace
 |Now the fs has 4 devices
 |devid 0 is freed
Some IO happens  |
raid_write_end_io()  |
|- rbio_orig_end_io()|
   |- unlock_stripe()|
  |- steal_rbio()|
   Use old rbio, whose   |
   bbio has freed devid 0|
   stripe|
Any access to rbio->bbio will|
cause general protection or NULL |
pointer dereference  |

Such a bug can already be triggered by fstests btrfs/069 for RAID5/6
profiles.

Fix it by not keeping the old rbio in unlock_stripe(): we just free the
finished rbio and rbio->bbio, so the above problem won't happen.

Signed-off-by: Qu Wenruo 
---
 fs/btrfs/raid56.c | 18 +-
 1 file changed, 1 insertion(+), 17 deletions(-)

diff --git a/fs/btrfs/raid56.c b/fs/btrfs/raid56.c
index 453eefdcb591..aba82b95ec73 100644
--- a/fs/btrfs/raid56.c
+++ b/fs/btrfs/raid56.c
@@ -776,7 +776,6 @@ static noinline void unlock_stripe(struct btrfs_raid_bio *rbio)
	int bucket;
	struct btrfs_stripe_hash *h;
	unsigned long flags;
-	int keep_cache = 0;

	bucket = rbio_bucket(rbio);
	h = rbio->fs_info->stripe_hash_table->table + bucket;
@@ -788,19 +787,6 @@ static noinline void unlock_stripe(struct btrfs_raid_bio *rbio)
	spin_lock(&rbio->bio_list_lock);

	if (!list_empty(&rbio->hash_list)) {
-		/*
-		 * if we're still cached and there is no other IO
-		 * to perform, just leave this rbio here for others
-		 * to steal from later
-		 */
-		if (list_empty(&rbio->plug_list) &&
-		    test_bit(RBIO_CACHE_BIT, &rbio->flags)) {
-			keep_cache = 1;
-			clear_bit(RBIO_RMW_LOCKED_BIT, &rbio->flags);
-			BUG_ON(!bio_list_empty(&rbio->bio_list));
-			goto done;
-		}
-
		list_del_init(&rbio->hash_list);
		atomic_dec(&rbio->refs);

@@ -848,13 +834,11 @@ static noinline void unlock_stripe(struct btrfs_raid_bio *rbio)
			goto done_nolock;
		}
	}
-done:
	spin_unlock(&rbio->bio_list_lock);
	spin_unlock_irqrestore(&h->lock, flags);
 
 done_nolock:
-	if (!keep_cache)
-		remove_rbio_from_cache(rbio);
+	remove_rbio_from_cache(rbio);
 }
 
 static void __free_raid_bio(struct btrfs_raid_bio *rbio)
-- 
2.11.0





[PATCH 0/5] raid56: variant bug fixes

2017-02-03 Thread Qu Wenruo
This patchset can be fetched from my github repo:
https://github.com/adam900710/linux.git raid56_fixes

It's based on v4.10-rc6, and none of the patches has been modified since
its first appearance on the mailing list.

The patchset fixes the following bugs:

1) False alert or wrong csum error number when scrubbing RAID5/6
   The bug itself won't cause any damage to the fs; it's a pure race.

2) Corrupted data stripe rebuild corrupts P/Q
   So scrub turns one error into another, not really fixing anything.

3) Use-after-free caused by cancelling dev-replace
   This is quite a deadly bug, since cancelling dev-replace can easily
   cause a kernel panic, and thanks to raid bio stealing, the race
   window is quite large.

   Can be triggered by btrfs/069.

With all the fixes applied, no scrub/replace-related regressions can be
detected.

Qu Wenruo (5):
  btrfs: scrub: Introduce full stripe lock for RAID56
  btrfs: scrub: Fix RAID56 recovery race condition
  btrfs: raid56: Use correct stolen pages to calculate P/Q
  btrfs: raid56: Don't keep rbio for later steal
  btrfs: replace: Use ref counts to avoid destroying target device when
canceled

 fs/btrfs/ctree.h   |   4 ++
 fs/btrfs/dev-replace.c |   7 +-
 fs/btrfs/extent-tree.c |   3 +
 fs/btrfs/raid56.c  |  80 +++--
 fs/btrfs/scrub.c   | 192 +
 fs/btrfs/volumes.c |  36 +-
 fs/btrfs/volumes.h |  10 +++
 7 files changed, 309 insertions(+), 23 deletions(-)

-- 
2.11.0


