Re: How to handle a RAID5 array with a failing drive? -> raid5 mostly works, just no rebuilds

2014-03-18 Thread Chris Murphy

On Mar 19, 2014, at 12:09 AM, Marc MERLIN  wrote:
> 
> 7) you can remove a drive from an array, add files, and then if you plug
>    the drive in, it apparently gets auto sucked back into the array.
>    There is no rebuild that happens, you now have an inconsistent array where
>    one drive is not at the same level as the other ones (I lost all files I
>    added after the drive was removed when I added the drive back).

That seems worthy of a dedicated bug report and worth keeping an eye on in the
future. Not good.

>> 
>> polgara:/mnt/btrfs_backupcopy# df -h .
>> Filesystem  Size  Used Avail Use% Mounted on
>> /dev/mapper/crypt_sdb1  4.1T  3.0M  4.1T   1% /mnt/btrfs_backupcopy
> 
> Let's add one drive
>> polgara:/mnt/btrfs_backupcopy# btrfs device add -f /dev/mapper/crypt_sdm1 /mnt/btrfs_backupcopy/
>> polgara:/mnt/btrfs_backupcopy# df -h .
>> Filesystem  Size  Used Avail Use% Mounted on
>> /dev/mapper/crypt_sdb1  4.6T  3.0M  4.6T   1% /mnt/btrfs_backupcopy
> 
> Oh look it's bigger now. We need to manually rebalance to use the new drive:

You don't have to. As soon as you add the additional drive, newly allocated 
chunks will stripe across all available drives. e.g. 1 GB allocations striped 
across 3x drives, if I add a 4th drive, initially any additional writes are 
only to the first three drives but once a new data chunk is allocated it gets 
striped across 4 drives.
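
The allocation behaviour described above can be sketched as a toy model in C.
This is not btrfs code: the struct and function names are made up for
illustration, and the model simply assumes that a chunk takes one equal-sized
stripe from every device present at allocation time, with one stripe's worth
of space going to RAID5 parity.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model, not btrfs code: a chunk takes one equal-sized stripe from every
 * device that is present when the chunk is allocated. */
struct chunk_layout {
	unsigned num_stripes;   /* devices the chunk is striped across */
	unsigned data_stripes;  /* stripes left for data after RAID5 parity */
};

/* Layout of a RAID5 chunk allocated while `devices` devices are available. */
struct chunk_layout raid5_chunk_layout(unsigned devices)
{
	struct chunk_layout l;

	assert(devices >= 2);   /* model assumption: one data + one parity stripe */
	l.num_stripes = devices;
	l.data_stripes = devices - 1;
	return l;
}

/* Data capacity of one chunk, given the per-device stripe size in bytes. */
uint64_t chunk_data_bytes(struct chunk_layout l, uint64_t stripe_bytes)
{
	return (uint64_t)l.data_stripes * stripe_bytes;
}
```

Under this model a chunk allocated with 3 drives keeps its 3-stripe layout
after a 4th drive is added; only chunks allocated afterwards span 4 drives,
unless a balance rewrites the older ones.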


> 
> In other words, btrfs happily added my device that was way behind and gave me
> an incomplete filesystem instead of noticing that sdj1 was behind and giving
> me a degraded filesystem.
> Moral of the story: do not ever re-add a device that got kicked out if you
> wrote new data after that, or you will end up with an older version of your
> filesystem (on the plus side, it's consistent and apparently without data
> corruption). That said, btrfs scrub complained loudly of many errors it didn't
> know how to fix.

Sure the whole thing isn't corrupt. But if anything written while degraded 
vanishes once the missing device is reattached, and you remount normally 
(non-degraded), that's data loss. Yikes!


> There you go, hope this helps.

Yes. Thanks!

Chris Murphy
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/3] btrfs-progs: Fix minor problems in man page of btrfs

2014-03-18 Thread quwen...@cn.fujitsu.com
The man page of btrfs has some minor problems, such as:
1. Duplicate entry for "filesystem df"
2. Inconsistent parameters
3. Unpaired parentheses
4. Missing options
5. Wrong parameters

This patch fixes these minor bugs.
Signed-off-by: Qu Wenruo 
---
 man/btrfs.8.in | 184 -
 1 file changed, 102 insertions(+), 82 deletions(-)

diff --git a/man/btrfs.8.in b/man/btrfs.8.in
index 7fbde82..3846f19 100644
--- a/man/btrfs.8.in
+++ b/man/btrfs.8.in
@@ -23,9 +23,9 @@ btrfs \- control a btrfs filesystem
 \fBbtrfs\fP \fBsubvolume show\fP\fI \fP
 .PP
 .PP
-\fBbtrfs\fP \fBfilesystem df\fP\fI \fP
+\fBbtrfs\fP \fBfilesystem df\fP\fI [-b] \fIpath [path..]\fR\fP
 .PP
-\fBbtrfs\fP \fBfilesystem show\fP 
[\fI--mounted\fP|\fI--all-devices\fP|\fI\fP]\fP
+\fBbtrfs\fP \fBfilesystem show\fP 
[\fI--mounted\fP|\fI--all-devices\fP|\fI\fP|\fI\fP|\fI\fP|\fI\fP]\fP
 .PP
 \fBbtrfs\fP \fBfilesystem sync\fP\fI  \fP
 .PP
@@ -35,12 +35,11 @@ btrfs \- control a btrfs filesystem
 .PP
 \fBbtrfs\fP \fBfilesystem label\fP [\fI\fP|\fI\fP] 
[\fI\fP]
 .PP
-\fBbtrfs\fP \fBfilesystem filesystem disk-usage [-t][-b]\fP\fI  
+\fBbtrfs\fP \fBfilesystem disk-usage [-tb]\fP\fI  
 [path..]\fP
 .PP
-\fBbtrfs\fP \fBfilesystem df\fP\fI [-b] \fIpath [path..]\fR\fP
 .PP
-\fBbtrfs\fP \fBfilesystem balance\fP\fI  \fP
+\fBbtrfs\fP \fB[filesystem] balance\fP\fI  \fP
 .PP
 \fBbtrfs\fP \fB[filesystem] balance start\fP [\fIoptions\fP] \fI\fP
 .PP
@@ -57,11 +56,10 @@ btrfs \- control a btrfs filesystem
 .PP
 \fBbtrfs\fP \fBdevice delete\fP \fI\fP [\fI...\fP] \fI\fP
 .PP
-\fBbtrfs\fP \fBdevice scan\fP [\fI--all-devices\fP|\fI 
\P[\fI...\fP]
+\fBbtrfs\fP \fBdevice scan\fP [(\fI-d\fP|\fI--all-devices\fP)|\fI\fP 
[\fI...\fP]]
 .PP
 \fBbtrfs\fP \fBdevice disk-usage\fP\fI [-b]  [...] \fP
 .PP
-.PP
 \fBbtrfs\fP \fBdevice ready\fP\fI \fP
 .PP
 \fBbtrfs\fP \fBdevice stats\fP [-z] {\fI\fP|\fI\fP}
@@ -78,11 +76,11 @@ btrfs \- control a btrfs filesystem
 .PP
 \fBbtrfs\fP \fBcheck\fP [\fIoptions\fP] \fI\fP
 .PP
-\fBbtrfs\fP \fBrescue chunk-recover\fP [\fIoptions\fP] \fI\fP
+\fBbtrfs\fP \fBrescue chunk-recover\fP [\fIoptions\fP] \fI\fP
 .PP
-\fBbtrfs\fP \fBrescue super-recover\fP [\fIoptions\fP] \fI\fP
+\fBbtrfs\fP \fBrescue super-recover\fP [\fIoptions\fP] \fI\fP
 .PP
-\fBbtrfs\fP \fBrestore\fP [\fIoptions\fP] \fI\fP
+\fBbtrfs\fP \fBrestore\fP [\fIoptions\fP] \fI\fP \fI\fP | -l 
\fI\fP
 .PP
 .PP
 \fBbtrfs\fP \fBinspect-internal inode-resolve\fP [-v] \fI\fP 
\fI\fP
@@ -103,7 +101,7 @@ btrfs \- control a btrfs filesystem
 .PP
 \fBbtrfs\fP \fBquota disable\fP\fI \fP
 .PP
-\fBbtrfs\fP \fBquota rescan\fP [-s] \fI\fP
+\fBbtrfs\fP \fBquota rescan\fP [-sw] \fI\fP
 .PP
 .PP
 \fBbtrfs\fP \fBqgroup assign\fP \fI\fP \fI\fP \fI\fP
@@ -114,7 +112,7 @@ btrfs \- control a btrfs filesystem
 .PP
 \fBbtrfs\fP \fBqgroup destroy\fP \fI\fP \fI\fP
 .PP
-\fBbtrfs\fP \fBqgroup show\fP \fI\fP
+\fBbtrfs\fP \fBqgroup show\fP [\fIoptions\fP] \fI\fP
 .PP
 \fBbtrfs\fP \fBqgroup limit\fP [\fIoptions\fP] \fI\fP|\fBnone\fP 
[\fI\fP] \fI\fP
 .PP
@@ -286,12 +284,55 @@ List the recently modified files in a subvolume, after 
\fI\fR ID.
 Show information of a given subvolume in the \fI\fR.
 .TP
 
-\fBfilesystem df\fP\fI \fR
+\fBfilesystem df\fP [-b] \fIpath [path..]\fR
+
 Show space usage information for a mount point.
+
+\fB-b\fP Set byte as unit
+
+The command \fBbtrfs filesystem df\fP is used to query how many space on the 
+disk(s) are used and an estimation of the free
+space of the filesystem.
+The output of the command \fBbtrfs filesystem df\fP shows:
+
+.RS
+.IP \fBDisk\ size\fP
+the total size of the disks which compose the filesystem.
+
+.IP \fBDisk\ allocated\fP
+the size of the area of the disks used by the chunks.
+
+.IP \fBDisk\ unallocated\fP 
+the size of the area of the disks which is free (i.e.
+the differences of the values above).
+
+.IP \fBUsed\fP
+the portion of the logical space used by the file and metadata.
+
+.IP \fBFree\ (estimated)\fP
+the estimated free space available: i.e. how many space can be used
+by the user. The evaluation 
+cannot be rigorous because it depends by the allocation policy (DUP, Single,
+RAID1...) of the metadata and data chunks. If every chunk is stored as
+"Single" the sum of the \fBfree (estimated)\fP space and the \fBused\fP 
+space  is equal to the \fBdisk size\fP.
+Otherwise if all the chunk are mirrored (raid1 or raid10) or duplicated
+the sum of the \fBfree (estimated)\fP space and the \fBused\fP space is
+half of the \fBdisk size\fP. Normally the \fBfree (estimated)\fP is between
+these two limits.
+
+.IP \fBData\ to\ disk\ ratio\fP
+the ratio betwen the \fBlogical size\fP (i.e. the space available by
+the chunks) and the \fBdisk allocated\fP (by the chunks). Normally it is 
+lower than 100% because the metadata is duplicated for security reasons.
+If all the data and metadata are duplicated (or have a profile like RAID1)
+the \fBData\ to\ disk\ ratio\fP could be 50%.
+.RE
 .TP
 
-\fBfilesystem show\fR [\fI--mounted\fP|\
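
The "Free (estimated)" bounds described in the man page text above can be
written out as a small sketch. It only covers the two limit cases the text
names (every chunk "Single", or every chunk mirrored/duplicated); the function
name and the `copies` parameter are assumptions for illustration and are not
taken from btrfs-progs.

```c
#include <stdint.h>

/* Sketch of the two limit cases only.
 * copies = 1: every chunk is "Single", so free + used == disk size.
 * copies = 2: every chunk is DUP/RAID1/RAID10, so free + used == disk size / 2.
 * Returns 0 for an invalid `copies` or if `used` exceeds the logical space. */
uint64_t sketch_free_estimated(uint64_t disk_size, uint64_t used,
			       unsigned copies)
{
	uint64_t logical;

	if (copies == 0)
		return 0;
	logical = disk_size / copies;
	return logical > used ? logical - used : 0;
}
```

For a filesystem with mixed profiles the real estimate would fall somewhere
between the results for copies = 1 and copies = 2, as the man page text says.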

[PATCH 3/3] btrfs-progs: Modify the help string to keep consistent with man page.

2014-03-18 Thread quwen...@cn.fujitsu.com
Help string of "btrfs dev scan" is inconsistent with man page,
which lacks the fact that -d|--all-device is conflict with .
This patch fixes the description

Signed-off-by: Qu Wenruo 
---
 cmds-device.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/cmds-device.c b/cmds-device.c
index 58a336f..a9b4a38 100644
--- a/cmds-device.c
+++ b/cmds-device.c
@@ -188,9 +188,8 @@ static int cmd_rm_dev(int argc, char **argv)
 }
 
 static const char * const cmd_scan_dev_usage[] = {
-   "btrfs device scan [options] [ [...]]",
+   "btrfs device scan [(-d|--all-devices)| [...]]",
"Scan devices for a btrfs filesystem",
-   "-d|--all-devicesscan all devices under /dev",
NULL
 };
 
-- 
1.9.0


[PATCH 1/3] btrfs-progs: Fix memleak in get_raid56_used()

2014-03-18 Thread quwen...@cn.fujitsu.com
Fix memleak in get_raid56_used().

Signed-off-by: Qu Wenruo 
---
 cmds-fi-disk_usage.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/cmds-fi-disk_usage.c b/cmds-fi-disk_usage.c
index a3b06be..2bd591d 100644
--- a/cmds-fi-disk_usage.c
+++ b/cmds-fi-disk_usage.c
@@ -352,6 +352,7 @@ static int get_raid56_used(int fd, u64 *raid5_used, u64 
*raid6_used)
if (p->type & BTRFS_BLOCK_GROUP_RAID6)
(*raid6_used) += p->size / (p->num_stripes -2);
}
+   free(info_ptr);
 
return 0;
 
-- 
1.9.0
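
For readers without the full source at hand, the shape of the fix can be
sketched in a self-contained form. Everything below is a stand-in: the struct,
the flag values, and the copied buffer (which plays the role of info_ptr) are
hypothetical, and only the two divisions and the added free() mirror the hunk
above.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in flag values, not the real BTRFS_BLOCK_GROUP_* constants. */
#define SKETCH_RAID5 (1u << 0)
#define SKETCH_RAID6 (1u << 1)

struct sketch_chunk {
	uint64_t type;
	uint64_t size;
	unsigned num_stripes;
};

/* Sums size / (num_stripes - parity stripes) per RAID level, as the hunk does. */
int sketch_raid56_used(const struct sketch_chunk *chunks, size_t n,
		       uint64_t *raid5_used, uint64_t *raid6_used)
{
	struct sketch_chunk *info;
	size_t i;

	*raid5_used = 0;
	*raid6_used = 0;
	if (n == 0)
		return 0;

	/* Heap buffer standing in for the chunk info loaded by the real code. */
	info = malloc(n * sizeof(*info));
	if (!info)
		return -1;
	memcpy(info, chunks, n * sizeof(*info));

	for (i = 0; i < n; i++) {
		const struct sketch_chunk *p = &info[i];

		/* The stripe-count guards avoid a division by zero in this sketch. */
		if ((p->type & SKETCH_RAID5) && p->num_stripes > 1)
			*raid5_used += p->size / (p->num_stripes - 1);
		if ((p->type & SKETCH_RAID6) && p->num_stripes > 2)
			*raid6_used += p->size / (p->num_stripes - 2);
	}

	free(info);	/* the release that the patch adds before returning */
	return 0;
}
```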


Re: How to handle a RAID5 array with a failing drive? -> raid5 mostly works, just no rebuilds

2014-03-18 Thread Marc MERLIN
On Tue, Mar 18, 2014 at 09:02:07AM +, Duncan wrote:
> First just a note that you hijacked Mr Manana's patch thread.  Replying 
(...)
I did, I use mutt, I know about In-Reply-To, I was tired, I screwed up,
sorry, and there was no undo :)

> Since you don't have to worry about the data I'd suggest blowing it away 
> and starting over.  Btrfs raid5/6 code is known to be incomplete at this 
> point, to work in normal mode and write everything out, but with 
> incomplete recovery code.  So I'd treat it like the raid-0 mode it 
> effectively is, and consider it lost if a device drops.
>
> Which I haven't.  My use-case wouldn't be looking at raid5/6 (or raid0) 
> anyway, but even if it were, I'd not touch the current code unless it 
> /was/ just for something I'd consider risking on a raid0.  Other than 

Thank you for the warning, and yes I know the risk and the data I'm putting
on it is ok with that risk :)

So, I was a bit quiet because I diagnosed problems with the underlying
hardware.
My disk array was creating disk faults due to insufficient power coming in.

Now that I fixed that and made sure the drives work with a full run of
hdrecover of all the drives in parallel (exercises the drives while making
sure all their blocks work), I did tests again:

Summary:
1) You can grow and shrink a raid5 volume while it's mounted => very cool
2) shrinking causes a rebalance
3) growing requires you to run rebalance
4) btrfs cannot replace a drive in raid5, whether it's there or not
   that's the biggest thing missing: just no rebuilds in any way
5) you can mount a raid5 with a missing device with -o degraded
6) adding a drive to a degraded array will grow the array, not rebuild
   the missing bits
7) you can remove a drive from an array, add files, and then if you plug
   the drive in, it apparently gets auto sucked back into the array.
   There is no rebuild that happens, you now have an inconsistent array where
   one drive is not at the same level as the other ones (I lost all files I
   added after the drive was removed when I added the drive back).

In other words, everything seems to work except there is no rebuild that I
could see anywhere.

Here are all the details:

Creation
> polgara:/dev/disk/by-id# mkfs.btrfs -f -d raid5 -m raid5 -L backupcopy /dev/mapper/crypt_sd[bdfghijkl]1
> 
> WARNING! - Btrfs v3.12 IS EXPERIMENTAL
> WARNING! - see http://btrfs.wiki.kernel.org before using
> 
> Turning ON incompat feature 'extref': increased hardlink limit per file to 65536
> Turning ON incompat feature 'raid56': raid56 extended format
> adding device /dev/mapper/crypt_sdd1 id 2
> adding device /dev/mapper/crypt_sdf1 id 3
> adding device /dev/mapper/crypt_sdg1 id 4
> adding device /dev/mapper/crypt_sdh1 id 5
> adding device /dev/mapper/crypt_sdi1 id 6
> adding device /dev/mapper/crypt_sdj1 id 7
> adding device /dev/mapper/crypt_sdk1 id 8
> adding device /dev/mapper/crypt_sdl1 id 9
> fs created label backupcopy on /dev/mapper/crypt_sdb1
> nodesize 16384 leafsize 16384 sectorsize 4096 size 4.09TiB
> polgara:/dev/disk/by-id# mount -L backupcopy /mnt/btrfs_backupcopy/
> polgara:/mnt/btrfs_backupcopy# df -h .
> Filesystem  Size  Used Avail Use% Mounted on
> /dev/mapper/crypt_sdb1  4.1T  3.0M  4.1T   1% /mnt/btrfs_backupcopy

Let's add one drive
> polgara:/mnt/btrfs_backupcopy# btrfs device add -f /dev/mapper/crypt_sdm1 /mnt/btrfs_backupcopy/
> polgara:/mnt/btrfs_backupcopy# df -h .
> Filesystem  Size  Used Avail Use% Mounted on
> /dev/mapper/crypt_sdb1  4.6T  3.0M  4.6T   1% /mnt/btrfs_backupcopy

Oh look it's bigger now. We need to manually rebalance to use the new drive:
> polgara:/mnt/btrfs_backupcopy# btrfs balance start . 
> Done, had to relocate 6 out of 6 chunks
> 
> polgara:/mnt/btrfs_backupcopy#  btrfs device delete /dev/mapper/crypt_sdm1 .
> BTRFS info (device dm-9): relocating block group 23314563072 flags 130
> BTRFS info (device dm-9): relocating block group 22106603520 flags 132
> BTRFS info (device dm-9): found 6 extents
> BTRFS info (device dm-9): relocating block group 12442927104 flags 129
> BTRFS info (device dm-9): found 1 extents
> polgara:/mnt/btrfs_backupcopy# df -h .
> Filesystem  Size  Used Avail Use% Mounted on
> /dev/mapper/crypt_sdb1  4.1T  4.7M  4.1T   1% /mnt/btrfs_backupcopy

Ah, it's smaller again. Note that it's not degraded, you can just keep removing
drives and it'll do a forced rebalance to fit the data in the remaining drives.

Ok, I've unmounted the filesystem, and will manually remove a device:
> polgara:~# dmsetup remove crypt_sdl1
> polgara:~# mount -L backupcopy /mnt/btrfs_backupcopy/
> mount: wrong fs type, bad option, bad superblock on /dev/mapper/crypt_sdk1,
>missing codepage or helper program, or other error
>In some cases useful info is found in syslog - try
>dmesg | tail  or so
> BTRFS: open /dev/dm-9 failed
> BTRFS info (device dm-7): disk space caching is enabled
> BTRFS: failed to read chunk tree on dm

Please advise on repair action

2014-03-18 Thread Adam Khan
Hello,

I have a simple btrfs located on a dm-crypt volume. I'm getting a general
protection fault when I attempt to access a specific directory in the Thunar
file manager and in a Python program.

The trace is attached for Thunar.

btrfsck returns this:

Checking filesystem on /dev/mapper/xyz_crypt
UUID: ...
found 88316880601 bytes used err is 1
total csum bytes: 180423792
total tree bytes: 291459072
total fs tree bytes: 50192384
total extent tree bytes: 12898304
btree space waste bytes: 55087032
file data blocks allocated: 352826490880
 referenced 184697802752
Btrfs v3.12

How should I proceed to repair this fs?

Best regards,

Adam
[  313.491347] general protection fault:  [#1] SMP 
[  313.491387] Modules linked in: ccm xt_conntrack xt_LOG xt_limit xt_tcpudp 
iptable_mangle iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat 
nf_conntrack iptable_filter ip_tables x_tables rfcomm bnep deflate ctr 
twofish_generic twofish_x86_64_3way twofish_x86_64 twofish_common 
camellia_generic camellia_x86_64 serpent_sse2_x86_64 xts serpent_generic lrw 
gf128mul glue_helper blowfish_generic blowfish_x86_64 blowfish_common 
cast5_generic cast_common ablk_helper cryptd des_generic cmac xcbc rmd160 
sha512_ssse3 sha512_generic hmac crypto_null af_key xfrm_algo nfsd auth_rpcgss 
oid_registry nfs_acl nfs lockd fscache sunrpc ext4 mbcache jbd2 fuse parport_pc 
ppdev lp parport hid_generic joydev hid_lenovo_tpkbd usbhid hid sg btusb 
bluetooth crc16 usb_storage iTCO_wdt iTCO_vendor_support snd_hda_codec_conexant 
coretemp kvm_intel kvm psmouse serio_raw pcspkr evdev i2c_i801 lpc_ich mfd_core 
arc4 iwldvm mac80211 iwlwifi cfg80211 wmi battery thinkpad_acpi nvram rfkill ac 
snd_hda_intel snd_hda_codec tpm_tis snd_hwdep snd_pcm tpm snd_page_alloc 
snd_seq snd_seq_device snd_timer i915 snd video uhci_hcd ehci_pci 
drm_kms_helper button acpi_cpufreq ehci_hcd drm i2c_algo_bit e1000e i2c_core 
mei_me processor mei ptp pps_core soundcore usbcore usb_common btrfs crc32c 
libcrc32c xor raid6_pq sha256_ssse3 sha256_generic cbc dm_crypt dm_mod sd_mod 
crc_t10dif crct10dif_common ahci libahci libata scsi_mod thermal thermal_sys
[  313.492281] CPU: 1 PID: 3946 Comm: Thunar Not tainted 3.13-1-amd64 #1 Debian 
3.13.5-1
[  313.492313] Hardware name: LENOVO 7454CTO/7454CTO, BIOS 6DET71WW (3.21 ) 
12/13/2011
[  313.492345] task: 88022fe1c010 ti: 88022f6d8000 task.ti: 
88022f6d8000
[  313.492376] RIP: 0010:[]  [] 
memcpy+0xd/0x110
[  313.492414] RSP: 0018:88022f6d9970  EFLAGS: 00010206
[  313.492438] RAX: 8800aa2528b5 RBX: 034b RCX: 0069
[  313.492467] RDX: 0003 RSI: db738800 RDI: 8800aa2528b5
[  313.492496] RBP: 880225b9e9c0 R08:  R09: 1000
[  313.492525] R10:  R11:  R12: 6db6db6db6db6db7
[  313.492554] R13: 1600 R14: 8800aa252c00 R15: 034b
[  313.492584] FS:  7fe3282f7a00() GS:88023bc8() 
knlGS:
[  313.492620] CS:  0010 DS:  ES:  CR0: 80050033
[  313.492643] CR2: 7fe2e0029228 CR3: b7625000 CR4: 000407e0
[  313.492673] Stack:
[  313.492683]  a013f168  8800b8289000 
880225ac8c40
[  313.492724]   0c00 880225615330 
880227448658
[  313.492764]  a0125064 880225b9e8f0 1000 
8800aa252000
[  313.492804] Call Trace:
[  313.492836]  [] ? read_extent_buffer+0xc8/0x120 [btrfs]
[  313.492877]  [] ? btrfs_get_extent+0x8f4/0x950 [btrfs]
[  313.492917]  [] ? set_state_bits+0x34/0x70 [btrfs]
[  313.492957]  [] ? __do_readpage+0x378/0x730 [btrfs]
[  313.492995]  [] ? lock_extent_bits+0x6d/0x1c0 [btrfs]
[  313.493034]  [] ? btrfs_real_readdir+0x550/0x550 [btrfs]
[  313.493075]  [] ? 
__extent_readpages.constprop.42+0x2d2/0x2f0 [btrfs]
[  313.493119]  [] ? btrfs_real_readdir+0x550/0x550 [btrfs]
[  313.493160]  [] ? extent_readpages+0x182/0x190 [btrfs]
[  313.493201]  [] ? btrfs_real_readdir+0x550/0x550 [btrfs]
[  313.493234]  [] ? alloc_pages_current+0x97/0x150
[  313.493264]  [] ? __do_page_cache_readahead+0x193/0x240
[  313.493293]  [] ? ondemand_readahead+0x14a/0x280
[  313.493322]  [] ? generic_file_aio_read+0x4be/0x6e0
[  313.493350]  [] ? do_sync_read+0x57/0x90
[  313.493376]  [] ? vfs_read+0x8b/0x160
[  313.493399]  [] ? SyS_read+0x43/0xa0
[  313.493424]  [] ? system_call_fastpath+0x16/0x1b
[  313.493451] Code: fc ff ff 48 8b 43 58 48 2b 43 50 88 43 4e eb e9 90 90 90 
90 90 90 90 90 90 90 90 90 90 90 48 89 f8 48 89 d1 48 c1 e9 03 83 e2 07  48 
a5 89 d1 f3 a4 c3 20 4c 8b 06 4c 8b 4e 08 4c 8b 56 10 4c 
[  313.493686] RIP  [] memcpy+0xd/0x110
[  313.493713]  RSP 
[  313.500471] ---[ end trace a08695abfe727a2b ]---


Re: [PATCH 5/6] Btrfs-progs: fsck: reduce memory usage of extent record struct

2014-03-18 Thread Wang Shilong

On 03/19/2014 02:18 AM, David Sterba wrote:
> On Tue, Mar 18, 2014 at 08:02:46PM +0800, Wang Shilong wrote:
>> @@ -2742,7 +2742,10 @@ static int add_extent_rec(struct cache_tree *extent_cache,
>> -   rec->found_rec = extent_rec;
>> +   if (extent_rec)
>> +   rec->found_rec = 1;
>> +   else
>> +   rec->found_rec = 0;
>
> I've modified this to avoid 'if'
>
> rec->found_rec = !!extent_rec;

Dave, thanks for doing this.:-)




Re: [PATCH 5/6] Btrfs-progs: fsck: reduce memory usage of extent record struct

2014-03-18 Thread David Sterba
On Tue, Mar 18, 2014 at 08:02:46PM +0800, Wang Shilong wrote:
> @@ -2742,7 +2742,10 @@ static int add_extent_rec(struct cache_tree 
> *extent_cache,
> - rec->found_rec = extent_rec;
> + if (extent_rec)
> + rec->found_rec = 1;
> + else
> + rec->found_rec = 0;

I've modified this to avoid 'if'

rec->found_rec = !!extent_rec;
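
A short sketch of why the value is normalized at all, assuming (from the patch
title, not from the hunk itself) that found_rec becomes a one-bit bitfield.
The struct and function names below are hypothetical. Assigning an arbitrary
int to an unsigned one-bit field keeps only the lowest bit, so an even
non-zero value would be stored as 0, while `!!` first maps every non-zero
value to 1.

```c
/* Hypothetical record: found_rec is assumed to be a one-bit field. */
struct sketch_rec {
	unsigned int found_rec:1;
};

/* Plain assignment: only the lowest bit of extent_rec survives. */
unsigned sketch_store_raw(int extent_rec)
{
	struct sketch_rec rec = { 0 };

	rec.found_rec = extent_rec;
	return rec.found_rec;
}

/* Normalized assignment: any non-zero extent_rec is stored as 1. */
unsigned sketch_store_normalized(int extent_rec)
{
	struct sketch_rec rec = { 0 };

	rec.found_rec = !!extent_rec;
	return rec.found_rec;
}
```

Both the if/else form and the `!!` form give the same result; the second is
just the shorter spelling.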



[PATCH v3] xfstests: add test for btrfs send regarding directory moves/renames

2014-03-18 Thread Filipe David Borba Manana
Regression test for a btrfs incremental send issue where the kernel entered
an infinite loop building a path string. This happened when either of the 2
following cases happened:

1) A directory was made a child of another directory which has a lower inode
   number and has a pending move/rename operation;

2) A directory was made a child of another directory which has a higher inode
   number, but the new parent wasn't moved nor renamed. Instead some other
   ancestor higher in the hierarchy, with a higher inode number too, was
   moved/renamed too.

This issue is fixed by the following linux kernel btrfs patches:

   Btrfs: fix incremental send's decision to delay a dir move/rename
   Btrfs: part 2, fix incremental send's decision to delay a dir move/rename

Signed-off-by: Filipe David Borba Manana 
---

V2: Added more tests.
V3: Added more tests for more complex cases.

 tests/btrfs/045 |  214 +++
 tests/btrfs/045.out |1 +
 tests/btrfs/group   |1 +
 3 files changed, 216 insertions(+)
 create mode 100755 tests/btrfs/045
 create mode 100644 tests/btrfs/045.out

diff --git a/tests/btrfs/045 b/tests/btrfs/045
new file mode 100755
index 000..85201e3
--- /dev/null
+++ b/tests/btrfs/045
@@ -0,0 +1,214 @@
+#! /bin/bash
+# FS QA Test No. btrfs/045
+#
+# Regression test for a btrfs incremental send issue where the kernel entered
+# an infinite loop building a path string. This happened when either of the
+# 2 following cases happened:
+#
+# 1) A directory was made a child of another directory which has a lower inode
+#number and has a pending move/rename operation;
+#
+# 2) A directory was made a child of another directory which has a higher inode
+#number, but the new parent wasn't moved nor renamed. Instead some other
+#ancestor higher in the hierarchy, with an higher inode number too, was
+#moved/renamed too.
+#
+# This issue is fixed by the following linux kernel btrfs patch:
+#
+#   Btrfs: fix incremental send's decision to delay a dir move/rename
+#   Btrfs: part 2, fix incremental send's decision to delay a dir move/rename
+#
+#---
+# Copyright (c) 2014 Filipe Manana.  All Rights Reserved.
+#
+# This program is free software; you can redistribute it and/or
+# modify it under the terms of the GNU General Public License as
+# published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it would be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write the Free Software Foundation,
+# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+#---
+#
+
+seq=`basename $0`
+seqres=$RESULT_DIR/$seq
+echo "QA output created by $seq"
+
+tmp=`mktemp -d`
+status=1   # failure is the default!
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+_cleanup()
+{
+rm -fr $tmp
+}
+
+# get standard environment, filters and checks
+. ./common/rc
+. ./common/filter
+
+# real QA test starts here
+_supported_fs btrfs
+_supported_os Linux
+_require_scratch
+_require_fssum
+_need_to_be_root
+
+rm -f $seqres.full
+
+_scratch_mkfs >/dev/null 2>&1
+_scratch_mount
+
+# case 1), mentioned above
+mkdir -p $SCRATCH_MNT/a/b
+mkdir $SCRATCH_MNT/a/c
+mkdir $SCRATCH_MNT/a/b/d
+touch $SCRATCH_MNT/a/file1
+touch $SCRATCH_MNT/a/b/file2
+mv $SCRATCH_MNT/a/file1 $SCRATCH_MNT/a/b/d/file3
+ln $SCRATCH_MNT/a/b/d/file3 $SCRATCH_MNT/a/b/file4
+mkdir $SCRATCH_MNT/a/b/f
+mv $SCRATCH_MNT/a/b $SCRATCH_MNT/a/c/b2
+touch $SCRATCH_MNT/a/c/b2/d/file5
+
+# case 2), mentioned above
+mkdir -p $SCRATCH_MNT/a/x1/x2
+mkdir $SCRATCH_MNT/a/Z
+mkdir -p $SCRATCH_MNT/a/x1/x2/x3/x4/x5
+
+# case 2) again, but a more complex scenario
+mkdir -p $SCRATCH_MNT/_a/_b/_c/_d
+mkdir $SCRATCH_MNT/_a/_b/_c/_d/_e
+mkdir $SCRATCH_MNT/_a/_b/_c/_d/_f
+mv $SCRATCH_MNT/_a/_b/_c/_d/_e $SCRATCH_MNT/_a/_b/_c/_d/_f/_E2
+mkdir $SCRATCH_MNT/_a/_b/_c/_g
+mv $SCRATCH_MNT/_a/_b/_c/_d $SCRATCH_MNT/_a/_b/_D2
+
+# Filesystem looks like:
+#
+# .   (ino 256)
+# |-- a/  (ino 257)
+# |   |-- c/  (ino 259)
+# |   |   |-- b2/ (ino 258)
+# |   |   |-- d/  (ino 260)
+# |   |   |   |-- file3   (ino 261)
+# |   |   |   |-- file5   (ino 264)
+# |   |   |
+# |   |   |-- file2   (ino 262)
+# |   |   |-- file4   (ino 261)
+# |   |   |-- f/  (ino 263)
+# |   |
+# |   |-- x1/ (ino 265)
+# |   |   |-- x2/ (ino 266)
+# |   |   |-- x3/ (ino 268)
+# |   |   |-- x4/ (ino 269)
+# |   |   |--

[PATCH] Btrfs: remove unnecessary inode generation lookup in send

2014-03-18 Thread Filipe David Borba Manana
No need to search in the send tree for the generation number of the inode,
we already have it in the recorded_ref structure passed to us.

Signed-off-by: Filipe David Borba Manana 
---
 fs/btrfs/send.c |9 ++---
 1 file changed, 2 insertions(+), 7 deletions(-)

diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
index 5d757ee..db4b10c 100644
--- a/fs/btrfs/send.c
+++ b/fs/btrfs/send.c
@@ -3179,7 +3179,7 @@ static int wait_for_parent_move(struct send_ctx *sctx,
int ret;
u64 ino = parent_ref->dir;
u64 parent_ino_before, parent_ino_after;
-   u64 new_gen, old_gen;
+   u64 old_gen;
struct fs_path *path_before = NULL;
struct fs_path *path_after = NULL;
int len1, len2;
@@ -3198,12 +3198,7 @@ static int wait_for_parent_move(struct send_ctx *sctx,
else if (ret < 0)
return ret;
 
-   ret = get_inode_info(sctx->send_root, ino, NULL, &new_gen,
-NULL, NULL, NULL, NULL);
-   if (ret < 0)
-   return ret;
-
-   if (new_gen != old_gen)
+   if (parent_ref->dir_gen != old_gen)
return 0;
 
path_before = fs_path_alloc();
-- 
1.7.10.4



[PATCH v3] Btrfs: part 2, fix incremental send's decision to delay a dir move/rename

2014-03-18 Thread Filipe David Borba Manana
For an incremental send, fix the process of determining whether the directory
inode we're currently processing needs to have its move/rename operation
delayed.

We were ignoring the fact that if the inode's new immediate ancestor has a
higher inode number than ours but wasn't renamed/moved, we might still need to
delay our move/rename, because some other ancestor directory higher in the
hierarchy might have an inode number higher than ours *and* was renamed/moved
too - in this case we have to wait for rename/move of that ancestor to happen
before our current directory's rename/move operation.
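
The decision described above can be illustrated with a toy model. This is not
the kernel code: it represents each snapshot as a plain array mapping an inode
number to its parent's inode number (0 meaning "does not exist"), ignores
names and generations, and treats a changed parent as the only sign of a
move. The function name and the array representation are assumptions made for
the sketch.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy model: parent_before[i] / parent_after[i] hold the parent inode number
 * of inode i in the parent and send snapshots, or 0 if i does not exist.
 * Returns 1 if the move of cur_ino should be delayed, 0 otherwise. */
int sketch_must_delay_move(const uint64_t *parent_before,
			   const uint64_t *parent_after,
			   size_t n, uint64_t cur_ino)
{
	uint64_t ino;
	size_t steps = 0;

	if (cur_ino >= n)
		return 0;

	/* Walk up the new ancestry while the ancestors have higher inode numbers. */
	ino = parent_after[cur_ino];
	while (ino > cur_ino && ino < n && steps++ < n) {
		if (parent_before[ino] == 0)
			break;		/* ancestor is new, nothing to wait for */
		if (parent_before[ino] != parent_after[ino])
			return 1;	/* a higher-numbered ancestor was moved */
		ino = parent_after[ino];
	}
	return 0;
}
```

With the directory layout from the first reproducer below (assuming inode
numbers are handed out in creation order starting at 257), x2 gets a new
parent x5 that was not moved, but the walk reaches x3, which has a higher
inode number than x2 and was moved, so the model reports that x2's move has to
wait.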

Simple steps to reproduce this issue:

  $ mkfs.btrfs -f /dev/sdd
  $ mount /dev/sdd /mnt

  $ mkdir -p /mnt/a/x1/x2
  $ mkdir /mnt/a/Z
  $ mkdir -p /mnt/a/x1/x2/x3/x4/x5

  $ btrfs subvolume snapshot -r /mnt /mnt/snap1
  $ btrfs send /mnt/snap1 -f /tmp/base.send

  $ mv /mnt/a/x1/x2/x3 /mnt/a/Z/X33
  $ mv /mnt/a/x1/x2 /mnt/a/Z/X33/x4/x5/X22

  $ btrfs subvolume snapshot -r /mnt /mnt/snap2
  $ btrfs send -p /mnt/snap1 /mnt/snap2 -f /tmp/incremental.send

The incremental send caused the kernel code to enter an infinite loop when
building the path string for directory Z after its references are processed.

A more complex scenario:

  $ mkfs.btrfs -f /dev/sdd
  $ mount /dev/sdd /mnt

  $ mkdir -p /mnt/a/b/c/d
  $ mkdir /mnt/a/b/c/d/e
  $ mkdir /mnt/a/b/c/d/f
  $ mv /mnt/a/b/c/d/e /mnt/a/b/c/d/f/E2
  $ mkdir /mnt/a/b/c/g
  $ mv /mnt/a/b/c/d /mnt/a/b/D2

  $ btrfs subvolume snapshot -r /mnt /mnt/snap1
  $ btrfs send /mnt/snap1 -f /tmp/base.send

  $ mkdir /mnt/a/o
  $ mv /mnt/a/b/c/g /mnt/a/b/D2/f/G2
  $ mv /mnt/a/b/D2 /mnt/a/b/dd
  $ mv /mnt/a/b/c /mnt/a/C2
  $ mv /mnt/a/b/dd/f /mnt/a/o/FF
  $ mv /mnt/a/b /mnt/a/o/FF/E2/BB

  $ btrfs subvolume snapshot -r /mnt /mnt/snap2
  $ btrfs send -p /mnt/snap1 /mnt/snap2 -f /tmp/incremental.send

A test case for xfstests follows.

Signed-off-by: Filipe David Borba Manana 
---

V2: Added missing error handling and fixed typo in commit message.
V3: Updated the algorithm to deal with more complex cases, hopefully all
cases are nailed down now.

 fs/btrfs/send.c |   56 ---
 1 file changed, 53 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
index d869079..5d757ee 100644
--- a/fs/btrfs/send.c
+++ b/fs/btrfs/send.c
@@ -2916,7 +2916,7 @@ static void free_waiting_dir_move(struct send_ctx *sctx,
kfree(dm);
 }
 
-static int add_pending_dir_move(struct send_ctx *sctx, u64 parent_ino)
+static int add_pending_dir_move(struct send_ctx *sctx, u64 ino, u64 parent_ino)
 {
struct rb_node **p = &sctx->pending_dir_moves.rb_node;
struct rb_node *parent = NULL;
@@ -2929,7 +2929,7 @@ static int add_pending_dir_move(struct send_ctx *sctx, 
u64 parent_ino)
if (!pm)
return -ENOMEM;
pm->parent_ino = parent_ino;
-   pm->ino = sctx->cur_ino;
+   pm->ino = ino;
pm->gen = sctx->cur_inode_gen;
INIT_LIST_HEAD(&pm->list);
INIT_LIST_HEAD(&pm->update_refs);
@@ -3183,6 +3183,7 @@ static int wait_for_parent_move(struct send_ctx *sctx,
struct fs_path *path_before = NULL;
struct fs_path *path_after = NULL;
int len1, len2;
+   int register_upper_dirs;
 
if (is_waiting_for_move(sctx, ino))
return 1;
@@ -3242,6 +3243,54 @@ static int wait_for_parent_move(struct send_ctx *sctx,
}
ret = 0;
 
+   /*
+* Ok, our new most direct ancestor has a higher inode number but
+* wasn't moved/renamed. So maybe some of the new ancestors higher in
+* the hierarchy have an higher inode number too *and* were renamed
+* or moved - in this case we need to wait for the ancestor's rename
+* or move operation before we can do the move/rename for the current
+* inode.
+*/
+   register_upper_dirs = 0;
+again:
+   while ((ret == 0 || register_upper_dirs) &&
+  parent_ino_after > sctx->cur_ino) {
+   ino = parent_ino_after;
+   fs_path_reset(path_before);
+   fs_path_reset(path_after);
+
+   ret = get_first_ref(sctx->send_root, ino, &parent_ino_after,
+   NULL, path_after);
+   if (ret < 0)
+   goto out;
+   ret = get_first_ref(sctx->parent_root, ino, &parent_ino_before,
+   NULL, path_before);
+   if (ret == -ENOENT) {
+   ret = 0;
+   break;
+   } else if (ret < 0) {
+   goto out;
+   }
+
+   len1 = fs_path_len(path_before);
+   len2 = fs_path_len(path_after);
+   if (parent_ino_before != parent_ino_after || len1 != len2 ||
+   

Re: Please help me to contribute to btrfs project

2014-03-18 Thread Ben Gamari
Ajesh js  writes:

> Hi,
>
> I have used the btrfs filesystem in one of my projects and I have
> added a small feature to it. I feel that the same feature will be
> useful for others too. Hence I would like to contribute the same to
> open source.
>
Excellent!

> If everything works fine and this feature is not already added by
> somebody else, this will be my first contribution to the opensource &
> I am excited to join the huge family of opensource :)
>
> Please help me with a precise steps to do the same.
>
In general the way to contribute is to send a patch for review. You
should have a look at the code style guidelines[1] and patch submission
guidelines[2] in the kernel tree. For nontrivial changes the patch
should be accompanied by a cover letter describing the change and the
motivations for any non-obvious design decisions.

It is possible that your change is acceptable as-is. More likely,
however, is that there will be some discussion and requests for
changes. Eventually the review process will produce a merge-worthy
patch. The first step, however, is sending something concrete for
community review.

Cheers,

- Ben


[1] https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/Documentation/CodingStyle
[2] https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/Documentation/SubmittingPatches





[PATCH 2/6] Btrfs-progs: fsck: fix possible memory leaks in run_next_block()

2014-03-18 Thread Wang Shilong
We still need to free the allocated cache memory if an error happens.

Signed-off-by: Wang Shilong 
---
 cmds-check.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/cmds-check.c b/cmds-check.c
index c0b7f8c..b3f7e22 100644
--- a/cmds-check.c
+++ b/cmds-check.c
@@ -5909,6 +5909,9 @@ out:
free_block_group_tree(&block_group_cache);
free_device_extent_tree(&dev_extent_cache);
free_extent_cache_tree(&seen);
+   free_extent_cache_tree(&pending);
+   free_extent_cache_tree(&reada);
+   free_extent_cache_tree(&nodes);
return ret;
 }
 
-- 
1.9.0
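
The fix above follows the usual single-exit cleanup pattern in C: every cache
that the function sets up is released under one "out" label, so error paths
that jump there do not leak. A minimal sketch, assuming a toy struct cache and
hypothetical helpers cache_init() and cache_free() (none of these are
btrfs-progs APIs):

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for a cache tree; not the btrfs-progs type. */
struct cache {
	void *mem;
};

/* Counts live allocations so a leak is easy to observe. */
static int live_caches;

static int cache_init(struct cache *c)
{
	c->mem = malloc(64);
	if (!c->mem)
		return -ENOMEM;
	live_caches++;
	return 0;
}

static void cache_free(struct cache *c)
{
	if (c->mem) {
		free(c->mem);
		c->mem = NULL;
		live_caches--;
	}
}

/*
 * Set up three caches, then optionally fail half-way through the work.
 * Both the success path and the error path leave through "out", which
 * frees everything that was set up.
 */
static int check_blocks(int fail)
{
	struct cache seen = { NULL };
	struct cache pending = { NULL };
	struct cache nodes = { NULL };
	int ret;

	ret = cache_init(&seen);
	if (ret)
		goto out;
	ret = cache_init(&pending);
	if (ret)
		goto out;
	ret = cache_init(&nodes);
	if (ret)
		goto out;

	if (fail) {
		ret = -EIO;	/* simulated failure while walking blocks */
		goto out;
	}
	ret = 0;
out:
	/* cache_free() tolerates caches that were never initialised. */
	cache_free(&nodes);
	cache_free(&pending);
	cache_free(&seen);
	return ret;
}
```

Calling check_blocks(1) takes the simulated error path; live_caches should be
back at zero afterwards, just as it is after a successful run.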

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 4/6] Btrfs-progs: fsck: add ability to rebuild extent tree with snapshots

2014-03-18 Thread Wang Shilong
This patch lets us rebuild a badly corrupted extent tree that has snapshots.
To implement this, we have to determine whether a block uses FULL BACKREF.

This idea comes from Josef Bacik:

1) We walk down the original tree, every eb we encounter has
btrfs_header_owner(eb) == root->objectid.  We add normal references
(BTRFS_TREE_BLOCK_REF_KEY) for this root.  World peace
is achieved.

2) We walk down the snapshotted tree.  Say we didn't change anything
at all, it was just a clean snapshot and then boom.  So the
btrfs_header_owner(root->node) == root->objectid, so normal backref.
We walk down to the next level, where btrfs_header_owner(eb) !=
root->objectid, but the level above did, so we add normal refs for all
of these blocks.  We go down the next level, now our
btrfs_header_owner(parent) != root->objectid and
btrfs_header_owner(eb) != root->objectid.  This is where we need to
now go back and see if btrfs_header_owner(eb) currently has a ref on
eb.  If it does we are done, move on to the next block in this same
level, we don't have to go further down.

3) Harder case, we snapshotted and then changed things in the original
root.  Do the same thing as in step 2, but now we get down to
btrfs_header_owner(eb) != root->objectid && btrfs_header_owner(parent)
!= root->objectid.  We look up the references we have for eb and notice
that btrfs_header_owner(eb) no longer refers to eb.  So now we must
set FULL_BACKREF on this extent reference and add a
SHARED_BLOCK_REF_KEY for this eb using the parent->start as the
offset.  And we need to keep walking down and doing the same thing
until we either hit level 0 or btrfs_header_owner(eb) has a ref on the
block.
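
The three cases above boil down to one decision per tree block. A minimal
sketch, assuming the caller has already looked up both header owners and
whether the owner root still holds a reference on the block. The names
enum backref_action, struct block_view and pick_backref_action() are
hypothetical and only for illustration; they are not functions from
cmds-check.c:

```c
#include <stdbool.h>
#include <stdint.h>

enum backref_action {
	ADD_NORMAL_REF,		/* add a TREE_BLOCK_REF for the root being walked */
	ALREADY_COVERED,	/* owner still references eb: stop descending here */
	ADD_SHARED_REF,		/* set FULL_BACKREF, add a SHARED_BLOCK_REF with
				 * parent->start as the offset, keep walking down */
};

/* What the walker knows about one extent buffer (eb) when it visits it. */
struct block_view {
	uint64_t eb_owner;	/* btrfs_header_owner(eb) */
	uint64_t parent_owner;	/* btrfs_header_owner(parent) */
	bool owner_has_ref;	/* does the root eb_owner still reference eb? */
};

static enum backref_action pick_backref_action(uint64_t root_objectid,
					       const struct block_view *b)
{
	/* Case 1: the block belongs to the root we are walking. */
	if (b->eb_owner == root_objectid)
		return ADD_NORMAL_REF;
	/* Case 2, first shared level: the parent is still owned by this root. */
	if (b->parent_owner == root_objectid)
		return ADD_NORMAL_REF;
	/* Case 2, deeper: neither is ours, but the owner still has a ref. */
	if (b->owner_has_ref)
		return ALREADY_COVERED;
	/* Case 3: the original root changed and no longer refers to eb. */
	return ADD_SHARED_REF;
}
```

In case 3 the walker would repeat the same check on each child until it hits
level 0 or finds a block that its owner still references.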

Signed-off-by: Wang Shilong 
---
 cmds-check.c | 132 +--
 1 file changed, 129 insertions(+), 3 deletions(-)

diff --git a/cmds-check.c b/cmds-check.c
index e40b806..e1238d7 100644
--- a/cmds-check.c
+++ b/cmds-check.c
@@ -107,6 +107,7 @@ struct extent_record {
unsigned int owner_ref_checked:1;
unsigned int is_root:1;
unsigned int metadata:1;
+   unsigned int flag_block_full_backref:1;
 };
 
 struct inode_backref {
@@ -3829,6 +3830,127 @@ static int is_dropped_key(struct btrfs_key *key,
return 0;
 }
 
+static int calc_extent_flag(struct btrfs_root *root,
+  struct cache_tree *extent_cache,
+  struct extent_buffer *buf,
+  struct root_item_record *ri,
+  u64 *flags)
+{
+   int i;
+   int nritems = btrfs_header_nritems(buf);
+   struct btrfs_key key;
+   struct extent_record *rec;
+   struct cache_extent *cache;
+   struct data_backref *dback;
+   struct tree_backref *tback;
+   struct extent_buffer *new_buf;
+   u64 owner = 0;
+   u64 bytenr;
+   u64 offset;
+   u64 ptr;
+   int size;
+   int ret;
+   u8 level;
+
+   /*
+* Except file/reloc tree, we can not have
+* FULL BACKREF MODE
+*/
+   if (ri->objectid < BTRFS_FIRST_FREE_OBJECTID)
+   goto normal;
+   /*
+* root node
+*/
+   if (buf->start == ri->bytenr)
+   goto normal;
+   if (btrfs_is_leaf(buf)) {
+   /*
+* we are searching from original root, world
+* peace is achieved, we use normal backref.
+*/
+   owner = btrfs_header_owner(buf);
+   if (owner == ri->objectid)
+   goto normal;
+   /*
+* we check every eb here, and if any of
+* eb doesn't have original root refers
+* to this eb, we set full backref flag for
+* this extent, otherwise normal backref.
+*/
+   for (i = 0; i < nritems; i++) {
+   struct btrfs_file_extent_item *fi;
+   btrfs_item_key_to_cpu(buf, &key, i);
+
+   if (key.type != BTRFS_EXTENT_DATA_KEY)
+   continue;
+   fi = btrfs_item_ptr(buf, i,
+   struct btrfs_file_extent_item);
+   if (btrfs_file_extent_type(buf, fi) ==
+   BTRFS_FILE_EXTENT_INLINE)
+   continue;
+   if (btrfs_file_extent_disk_bytenr(buf, fi) == 0)
+   continue;
+   bytenr = btrfs_file_extent_disk_bytenr(buf, fi);
+   cache = lookup_cache_extent(extent_cache, bytenr, 1);
+   if (!cache)
+   goto full_backref;
+   offset = btrfs_file_extent_offset(buf, fi);
+   rec = container_of(cache, struct extent_record, cache);
+   dback = find_data_backref(rec, 0, ri->objectid, owner,
+ 

[PATCH 5/6] Btrfs-progs: fsck: reduce memory usage of extent record struct

2014-03-18 Thread Wang Shilong
Two changes:
1. Use a bit field for @found_rec.
2. u32 is enough to count duplicate extents.

Signed-off-by: Wang Shilong 
---
 cmds-check.c | 9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/cmds-check.c b/cmds-check.c
index e1238d7..34f8fa6 100644
--- a/cmds-check.c
+++ b/cmds-check.c
@@ -92,7 +92,6 @@ struct extent_record {
struct list_head list;
struct cache_extent cache;
struct btrfs_disk_key parent_key;
-   unsigned int found_rec;
u64 start;
u64 max_size;
u64 nr;
@@ -101,8 +100,9 @@ struct extent_record {
u64 generation;
u64 parent_generation;
u64 info_objectid;
-   u64 num_duplicates;
+   u32 num_duplicates;
u8 info_level;
+   unsigned int found_rec:1;
unsigned int content_checked:1;
unsigned int owner_ref_checked:1;
unsigned int is_root:1;
@@ -2742,7 +2742,10 @@ static int add_extent_rec(struct cache_tree *extent_cache,
rec->start = start;
rec->max_size = max_size;
rec->nr = max(nr, max_size);
-   rec->found_rec = extent_rec;
+   if (extent_rec)
+   rec->found_rec = 1;
+   else
+   rec->found_rec = 0;
rec->content_checked = 0;
rec->owner_ref_checked = 0;
rec->num_duplicates = 0;
-- 
1.9.0
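
The two changes in the patch above, and the reason the assignment to
found_rec had to change with them, can be sketched as follows. The structs
are simplified stand-ins with only a few of the real members, and
set_found_rec() is a hypothetical helper, not a function from cmds-check.c:

```c
#include <stdint.h>

/* Simplified layout before the patch: wide flag and 64-bit counter. */
struct rec_before {
	unsigned int found_rec;
	uint64_t start;
	uint64_t num_duplicates;
	uint8_t info_level;
	unsigned int content_checked:1;
	unsigned int owner_ref_checked:1;
};

/* Simplified layout after the patch: 32-bit counter, flag as one bit. */
struct rec_after {
	uint64_t start;
	uint32_t num_duplicates;
	uint8_t info_level;
	unsigned int found_rec:1;
	unsigned int content_checked:1;
	unsigned int owner_ref_checked:1;
};

/*
 * A 1-bit unsigned field keeps only the lowest bit of what is assigned,
 * so an even non-zero value such as 2 would be stored as 0. Normalising
 * to 0 or 1 first is what the if/else in the patch does.
 */
static void set_found_rec(struct rec_after *rec, int extent_rec)
{
	rec->found_rec = extent_rec ? 1 : 0;
}
```

On common 64-bit ABIs sizeof(struct rec_after) should come out smaller than
sizeof(struct rec_before), but the exact sizes are implementation-defined, so
none are quoted here.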



[PATCH 6/6] Btrfs-progs: fsck: fix wrong index in pick_next_pending()

2014-03-18 Thread Wang Shilong
Though all tree blocks have the same size, we should still use the right
index here.

Signed-off-by: Wang Shilong 
---
 cmds-check.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/cmds-check.c b/cmds-check.c
index 34f8fa6..ebdb643 100644
--- a/cmds-check.c
+++ b/cmds-check.c
@@ -2928,7 +2928,7 @@ static int pick_next_pending(struct cache_tree *pending,
cache = search_cache_extent(reada, 0);
if (cache) {
bits[0].start = cache->start;
-   bits[1].size = cache->size;
+   bits[0].size = cache->size;
*reada_bits = 1;
return 1;
}
-- 
1.9.0



[PATCH 1/6] Btrfs-progs: fsck: don't free @seen cache until we finish searching

2014-03-18 Thread Wang Shilong
The @seen cache is used to avoid iterating over the same block more than once,
so we cannot free its entries until we have finished searching.

Signed-off-by: Wang Shilong 
---
 cmds-check.c | 7 +--
 1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/cmds-check.c b/cmds-check.c
index d1cafe1..c0b7f8c 100644
--- a/cmds-check.c
+++ b/cmds-check.c
@@ -3892,12 +3892,6 @@ static int run_next_block(struct btrfs_trans_handle *trans,
remove_cache_extent(nodes, cache);
free(cache);
}
-   cache = lookup_cache_extent(seen, bytenr, size);
-   if (cache) {
-   remove_cache_extent(seen, cache);
-   free(cache);
-   }
-
cache = lookup_cache_extent(extent_cache, bytenr, size);
if (cache) {
struct extent_record *rec;
@@ -5914,6 +5908,7 @@ out:
free_device_cache_tree(&dev_cache);
free_block_group_tree(&block_group_cache);
free_device_extent_tree(&dev_extent_cache);
+   free_extent_cache_tree(&seen);
return ret;
 }
 
-- 
1.9.0
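
Why dropping @seen entries early can matter is easiest to see on a tiny tree
where two roots share a block. A minimal sketch, assuming a toy walker that
pops pending blocks last-in first-out (the real code picks the next block
differently); struct walk, add_pending() as written here and walk_roots() are
illustrative only:

```c
#include <stdbool.h>

#define NBLOCKS 3

/*
 * Toy tree: blocks 0 and 1 are two roots (say, a subvolume and its
 * snapshot) that both point at the shared block 2. -1 means no child.
 */
static const int child_of[NBLOCKS] = { 2, 2, -1 };

struct walk {
	bool seen[NBLOCKS];
	int stack[8];		/* pending blocks, popped last-in first-out */
	int top;
	int visits[NBLOCKS];	/* how often each block was processed */
};

static void add_pending(struct walk *w, int block)
{
	if (w->seen[block])
		return;		/* already queued or processed */
	w->seen[block] = true;
	w->stack[w->top++] = block;
}

/*
 * Walk both roots. With drop_seen_early set, a block is removed from
 * "seen" as soon as it is processed, which mimics the lookup-and-remove
 * that the patch deletes.
 */
static void walk_roots(struct walk *w, bool drop_seen_early)
{
	int i;

	w->top = 0;
	for (i = 0; i < NBLOCKS; i++) {
		w->seen[i] = false;
		w->visits[i] = 0;
	}
	add_pending(w, 0);
	add_pending(w, 1);

	while (w->top > 0) {
		int block = w->stack[--w->top];

		w->visits[block]++;
		if (drop_seen_early)
			w->seen[block] = false;
		if (child_of[block] >= 0)
			add_pending(w, child_of[block]);
	}
}
```

With drop_seen_early set, the shared block is queued again when the second
root is processed; keeping the entries until the walk ends avoids the repeat
visit.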



[PATCH 3/6] Btrfs-progs: fsck: deal with snapshot one by one when rebuilding extent tree

2014-03-18 Thread Wang Shilong
Previously, we dealt with node blocks first and then leaf blocks, which
maximizes readahead. However, to rebuild the extent tree, we need to deal with
snapshots one by one.

This patch makes us deal with snapshots one by one if we need to rebuild the
extent tree; otherwise we fall back to the previous way.

Signed-off-by: Wang Shilong 
---
 cmds-check.c | 248 +--
 1 file changed, 158 insertions(+), 90 deletions(-)

diff --git a/cmds-check.c b/cmds-check.c
index b3f7e22..e40b806 100644
--- a/cmds-check.c
+++ b/cmds-check.c
@@ -123,10 +123,14 @@ struct inode_backref {
char name[0];
 };
 
-struct dropping_root_item_record {
+struct root_item_record {
struct list_head list;
-   struct btrfs_root_item ri;
-   struct btrfs_key found_key;
+   u64 objectid;
+   u64 bytenr;
+   u8 level;
+   u8 drop_level;
+   int level_size;
+   struct btrfs_key drop_key;
 };
 
 #define REF_ERR_NO_DIR_ITEM	(1 << 0)
@@ -3839,7 +3843,7 @@ static int run_next_block(struct btrfs_trans_handle *trans,
  struct rb_root *dev_cache,
  struct block_group_tree *block_group_cache,
  struct device_extent_tree *dev_extent_cache,
- struct btrfs_root_item *ri)
+ struct root_item_record *ri)
 {
struct extent_buffer *buf;
u64 bytenr;
@@ -4072,11 +4076,8 @@ static int run_next_block(struct btrfs_trans_handle *trans,
size = btrfs_level_size(root, level - 1);
btrfs_node_key_to_cpu(buf, &key, i);
if (ri != NULL) {
-   struct btrfs_key drop_key;
-   btrfs_disk_key_to_cpu(&drop_key,
- &ri->drop_progress);
if ((level == ri->drop_level)
-   && is_dropped_key(&key, &drop_key)) {
+   && is_dropped_key(&key, &ri->drop_key)) {
continue;
}
}
@@ -4117,7 +4118,7 @@ static int add_root_to_pending(struct extent_buffer *buf,
   struct cache_tree *pending,
   struct cache_tree *seen,
   struct cache_tree *nodes,
-  struct btrfs_key *root_key)
+  u64 objectid)
 {
if (btrfs_header_level(buf) > 0)
add_pending(nodes, seen, buf->start, buf->len);
@@ -4126,13 +4127,12 @@ static int add_root_to_pending(struct extent_buffer *buf,
add_extent_rec(extent_cache, NULL, 0, buf->start, buf->len,
   0, 1, 1, 0, 1, 0, buf->len);
 
-   if (root_key->objectid == BTRFS_TREE_RELOC_OBJECTID ||
+   if (objectid == BTRFS_TREE_RELOC_OBJECTID ||
btrfs_header_backref_rev(buf) < BTRFS_MIXED_BACKREF_REV)
add_tree_backref(extent_cache, buf->start, buf->start,
 0, 1);
else
-   add_tree_backref(extent_cache, buf->start, 0,
-root_key->objectid, 1);
+   add_tree_backref(extent_cache, buf->start, 0, objectid, 1);
return 0;
 }
 
@@ -5695,6 +5695,99 @@ static int check_devices(struct rb_root *dev_cache,
return ret;
 }
 
+static int add_root_item_to_list(struct list_head *head,
+ u64 objectid, u64 bytenr,
+ u8 level, u8 drop_level,
+ int level_size, struct btrfs_key *drop_key)
+{
+
+   struct root_item_record *ri_rec;
+   ri_rec = malloc(sizeof(*ri_rec));
+   if (!ri_rec)
+   return -ENOMEM;
+   ri_rec->bytenr = bytenr;
+   ri_rec->objectid = objectid;
+   ri_rec->level = level;
+   ri_rec->level_size = level_size;
+   ri_rec->drop_level = drop_level;
+   if (drop_key)
+   memcpy(&ri_rec->drop_key, drop_key, sizeof(*drop_key));
+   list_add_tail(&ri_rec->list, head);
+
+   return 0;
+}
+
+static int deal_root_from_list(struct list_head *list,
+  struct btrfs_trans_handle *trans,
+  struct btrfs_root *root,
+  struct block_info *bits,
+  int bits_nr,
+  struct cache_tree *pending,
+  struct cache_tree *seen,
+  struct cache_tree *reada,
+  struct cache_tree *nodes,
+  struct cache_tree *extent_cache,
+  struct cache_tree *chunk_cache,
+  struct rb_root *dev_cache,
+  struct block_group_tree

Re: [PATCH] Btrfs: fix a crash of clone with inline extents's split

2014-03-18 Thread Liu Bo
On Mon, Mar 17, 2014 at 03:41:31PM +0100, David Sterba wrote:
> On Mon, Mar 10, 2014 at 06:56:07PM +0800, Liu Bo wrote:
> > xfstests's btrfs/035 triggers a BUG_ON, which we use to detect the split
> > of inline extents in __btrfs_drop_extents().
> > 
> > For inline extents, we cannot duplicate another EXTENT_DATA item, because
> > it breaks the rule of inline extents, that is, 'start offset' needs to be 0.
> > 
> > We have set limitations for the source inode's compressed inline extents,
> > because it needs to decompress and recompress.  Now the destination inode's
> > inline extents also need similar limitations.
> 
> The limitation (by lack of implementation, not by design) of compressed
> inline extents is there, but it's impossible to reach. The inline
> extents are never longer than the 'inline limit' (the ~3916 size), so
> the comment is more a note to the future.
> 
> You're adding another limitation to avoid a crash, but I don't agree
> that EINVAL is right here, due to the fact that it's lack of
> implementation, not a real error.
> 
> There are enough EINVAL's that verify correctness of the input
> parameters and it's not always clear which one fails. The EOPNOTSUPP
> error code is close to the true reason of the failure, but it could be
> misinterpreted as if the whole clone operation is not supported, so it's
> not all correct but IMO better than EINVAL.

Yep, I was hesitating on these two errors while making the patch, but I
prefer EINVAL rather than EOPNOTSUPP because of the reason you've stated.

I think it'd be good to add one more btrfs_printk message to clarify what's
happening here, agree?

> 
> The most common case of 'cp --reflink' is not affected by this.
> 
> > 
> > With this, xfstests btrfs/035 doesn't run into panic.
> > 
> > Signed-off-by: Liu Bo 
> > ---
> >  fs/btrfs/file.c  | 15 ---
> >  fs/btrfs/ioctl.c | 10 ++
> >  2 files changed, 18 insertions(+), 7 deletions(-)
> > 
> > diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
> > index 0165b86..2c34a04 100644
> > --- a/fs/btrfs/ioctl.c
> > +++ b/fs/btrfs/ioctl.c
> > @@ -3090,8 +3090,9 @@ process_slot:
> >  new_key.offset + datal,
> >  1);
> > if (ret) {
> > -   btrfs_abort_transaction(trans, root,
> > -   ret);
> > +   if (ret != -EINVAL)
> > +   btrfs_abort_transaction(trans,
> > +   root, ret);
> 
> The error comes from __btrfs_drop_extents and all callers would need to
> be updated (or at least reviewed) with the 'ret != ...' check as well,
> because it changes the semantics. And I'm not sure if to the right
> direction.

Good point, Dave. I missed this part before. I just checked the callers of
__btrfs_drop_extents() and btrfs_drop_extents(); luckily EINVAL is not a
special one at these places, and the error is just returned to upper callers.

> 
> > btrfs_end_transaction(trans, root);
> > goto out;
> > }
> > @@ -3175,8 +3176,9 @@ static noinline long btrfs_ioctl_clone(struct file *file, unsigned long srcfd,
> >  *   decompress into destination's address_space (the file offset
> >  *   may change, so source mapping won't do), then recompress (or
> >  *   otherwise reinsert) a subrange.
> 
> > -* - allow ranges within the same file to be cloned (provided
> > -*   they don't overlap)?
> 
> True, but unrelated.

yep, that's right, will clean it up.

Thanks for the comments!

-liubo


Re: How to handle a RAID5 array with a failing drive?

2014-03-18 Thread Duncan
Marc MERLIN posted on Sun, 16 Mar 2014 15:20:26 -0700 as excerpted:

> Do I have other options?
> (data is not important at all, I just want to learn how to deal with
> such a case with the current code)

First just a note that you hijacked Mr Manana's patch thread.  Replying 
to a post and changing the topic (the usual cause of such hijacks) does 
NOT change the thread, as the References and In-Reply-To headers still 
include the Message-IDs from the original thread, and that's what good 
clients thread by since the subject line isn't a reliable means of 
threading.  To start a NEW thread, don't reply to an existing thread, 
compose a NEW message, starting a NEW thread. =:^)

Back on topic...

Since you don't have to worry about the data I'd suggest blowing it away 
and starting over.  Btrfs raid5/6 code is known to be incomplete at this 
point: it works in normal mode and writes everything out, but the 
recovery code is incomplete.  So I'd treat it like the raid-0 mode it 
effectively is, and consider it lost if a device drops.

There *IS* a post from an earlier thread where someone mentioned a 
recovery under some specific circumstance that worked for him, but I'd 
consider that the exception not the norm since the code is known to be 
incomplete and I think he just got lucky and didn't hit the particular 
missing code in his specific case.  Certainly you could try to go back 
and see what he did and under what conditions, and that might actually be 
worth doing if you had valuable data you'd be losing otherwise, but since 
you don't, while of course it's up to you, I'd not bother were it me.

Which I haven't.  My use-case wouldn't be looking at raid5/6 (or raid0) 
anyway, but even if it were, I'd not touch the current code unless it 
/was/ just for something I'd consider risking on a raid0.  Other than 
pure testing, the /only/ case I'd consider btrfs raid5/6 for right now 
would be something that I'd consider raid0 riskable currently, but with 
the bonus of it upgrading "for free" to raid5/6 when the code is complete, 
without any further effort on my part, since it's actually being written 
as raid5/6 ATM.  The recovery simply can't be relied upon as raid5/6 yet, 
so in recovery terms you're effectively running raid0 until it can be.  
Other than that and for /pure/ testing, I just don't see the point of 
even thinking about raid5/6 at this point.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman



Re: [PATCH 0/6 EARLY RFC] Btrfs: Get rid of whole page I/O.

2014-03-18 Thread chandan
Hello David,

> I looked at previous postings of this patchset, but haven't found what
> are the expected supported block sizes.
> 
> I assume powers of two starting with 512b, until 64k.

The earlier patchset posted by Chandra Seethraman was to get 4k
blocksize to work with ppc64's 64k PAGE_SIZE. I chose to do 2k
blocksize on x86_64's 4k PAGE_SIZE since that would allow others in
the community to work/experiment with subpagesize-blocksize feature.

The root node of "tree root" tree has 1957 bytes being written by
make_btrfs() (in btrfs-progs).  Hence I chose to do 2k blocksize for
the initial subpagesize-blocksize work. So with this patchset the
supported blocksizes would be in the range 2k-64k.

Thanks,
chandan
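
The block size constraints discussed in the message above can be written down
as a small check. A minimal sketch, assuming the 2k-64k range stated by
chandan and the power-of-two requirement that David assumed in the quoted
text; blocksize_supported() and smallest_pow2_holding() are hypothetical
helpers, not kernel or btrfs-progs functions:

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed bounds taken from the message above: 2k up to 64k. */
#define MIN_BLOCKSIZE 2048u
#define MAX_BLOCKSIZE 65536u

/* Bytes written for the root node of the "tree root" tree by make_btrfs(). */
#define TREE_ROOT_NODE_BYTES 1957u

static bool is_power_of_two(uint32_t v)
{
	return v != 0 && (v & (v - 1)) == 0;
}

/* True if blocksize is a power of two within the assumed range. */
static bool blocksize_supported(uint32_t blocksize)
{
	return is_power_of_two(blocksize) &&
	       blocksize >= MIN_BLOCKSIZE &&
	       blocksize <= MAX_BLOCKSIZE;
}

/* Smallest power of two that can hold nbytes (0 < nbytes <= 2^31). */
static uint32_t smallest_pow2_holding(uint32_t nbytes)
{
	uint32_t size = 1;

	while (size < nbytes)
		size <<= 1;
	return size;
}
```

Passing TREE_ROOT_NODE_BYTES to smallest_pow2_holding() gives the lower bound:
1957 bytes do not fit in 1k, so the smallest usable power of two is 2k.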



Please help me to contribute to btrfs project

2014-03-18 Thread Ajesh js
Hi,

I have used the btrfs filesystem in one of my projects and I have
added a small feature to it. I feel that the same feature will be
useful for others too. Hence I would like to contribute the same to
open source.

If everything works fine and this feature is not already added by
somebody else, this will be my first contribution to the opensource &
I am excited to join the huge family of opensource :)

Please help me with the precise steps to do the same.

Thank you,
Ajesh