Re: [PATCH v2 2/4] Btrfs: fix protection between send and root deletion

2014-01-22 Thread Wang Shilong

Hi David,

On 01/22/2014 02:16 AM, David Sterba wrote:

On Thu, Jan 16, 2014 at 10:32:38AM +0800, Miao Xie wrote:

Your fix makes sure that the deleted root will not get cleaned and stays
during the send. Only after it finishes it will be cleaned. Now, what if
send fails or is interrupted? There's no way to redo it. Yes, the user
can be blamed for the mistake, or the tools will prevent him from doing it.

I don't think so. The users should be responsible for their behavior if they
destroy the subvolume.

Right now it's not possible to determine if a subvolume is involved in a
send (other than the user knowing that he started the send himself). Send
or subvolume cleaning can be performed in the background. Although the
user is responsible for his actions, the consequence here is not
obvious, and it is silent and irreversible.


I see the latter as more user-friendly. Doing a 'send and forget' where
I don't care if the data will be sent properly does not fit the primary
purpose of send/receive with backups.

My idea to fix that:
- add an internal root_item flag to denote a dead root
- set this flag in btrfs_add_dead_root()
- check the flag in send similar to the btrfs_root_readonly checks, for
   all involved roots
- in 'destroy subvolume', check if send_in_progress is set and refuse
   to delete
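
The four steps in this list can be modelled roughly as follows. This is a toy model in shell, not kernel code: send_in_progress is the name used in the list, while root_dead and the three function names are stand-ins invented for this sketch, and a single pair of variables stands for the state of one root.

```shell
#!/bin/bash
# Toy model only: shell variables stand in for per-root kernel state.
# root_dead and the function names are hypothetical, chosen for illustration.
root_dead=0            # stand-in for the proposed internal "dead root" flag
send_in_progress=0     # number of running sends that involve this root

# Send refuses to start on a root that is already marked dead.
start_send() {
	[ "$root_dead" -eq 0 ] || return 1
	send_in_progress=$((send_in_progress + 1))
}

finish_send() {
	[ "$send_in_progress" -gt 0 ] && send_in_progress=$((send_in_progress - 1))
	return 0
}

# Deletion is refused while a send is running; otherwise the root is
# marked dead so that later sends are rejected.
destroy_subvolume() {
	[ "$send_in_progress" -eq 0 ] || return 1
	root_dead=1
}

start_send && echo "send started"
destroy_subvolume || echo "delete refused: send in progress"
finish_send
destroy_subvolume && echo "subvolume deleted"
start_send || echo "send refused: root is dead"
```

The point of the model is that both directions are covered: a running send blocks deletion, and a deleted root blocks a new send.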

It is similar to our approach. But I think our idea is better because
- we needn't add a new flag

Adding the flag is cheap.


- Subvolumes are special directories, so most operations on them should
   behave as they do on ordinary directories. Since we can remove a directory
   while someone is accessing it, it is better that we can destroy a subvolume
   while we are using it as a send parent.

Yes they're similar, but subvolumes have additional features that need
to be handled appropriately. One cannot send a directory.

So we disagree, I see a reason for the deletion protection and will do
the patch myself. Let's see if we can get more user feedback then.

I'm NAKing this patch in current state, if it helps anything.

Both ways are ok for me actually, don't be annoyed anyway,

You and Miao are really doing a good job to Btrfs, just go ahead, i
am ok with dropping this patch.^_^

Thanks,
Wang



david
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html





[PATCH v6] Btrfs: fix infinite path build loops in incremental send

2014-01-22 Thread Filipe David Borba Manana
The send operation processes inodes by their ascending number, and assumes
that any rename/move operation can be successfully performed (sent to the
caller) once all previous inodes (those with a smaller inode number than the
one we're currently processing) were processed.

This is not true when an incremental send had to process a hierarchical change
between 2 snapshots where the parent-children relationship between directory
inodes was reversed - that is, parents became children and children became
parents. This situation made the path building code go into an infinite loop,
which kept allocating more and more memory that eventually led to a krealloc
warning being displayed in dmesg:

  WARNING: CPU: 1 PID: 5705 at mm/page_alloc.c:2477 
__alloc_pages_nodemask+0x365/0xad0()
  Modules linked in: btrfs raid6_pq xor pci_stub vboxpci(O) vboxnetadp(O) 
vboxnetflt(O) vboxdrv(O) snd_hda_codec_hdmi snd_hda_codec_realtek joydev radeon 
snd_hda_intel snd_hda_codec snd_hwdep snd_seq_midi snd_pcm psmouse i915 
snd_rawmidi serio_raw snd_seq_midi_event lpc_ich snd_seq snd_timer ttm 
snd_seq_device rfcomm drm_kms_helper parport_pc bnep bluetooth drm ppdev snd 
soundcore i2c_algo_bit snd_page_alloc binfmt_misc video lp parport r8169 mii 
hid_generic usbhid hid
  CPU: 1 PID: 5705 Comm: btrfs Tainted: G   O 
3.13.0-rc7-fdm-btrfs-next-18+ #3
  Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./Z77 Pro4, BIOS 
P1.50 09/04/2012
  [ 5381.660441]  09ad 8806f6f2f4e8 81777434 
0007
  [ 5381.660447]   8806f6f2f528 8104a9ec 
8807038f36f0
  [ 5381.660452]   0206 8807038f2490 
8807038f36f0
  [ 5381.660457] Call Trace:
  [ 5381.660464]  [] dump_stack+0x4e/0x68
  [ 5381.660471]  [] warn_slowpath_common+0x8c/0xc0
  [ 5381.660476]  [] warn_slowpath_null+0x1a/0x20
  [ 5381.660480]  [] __alloc_pages_nodemask+0x365/0xad0
  [ 5381.660487]  [] ? local_clock+0x4f/0x60
  [ 5381.660491]  [] ? free_one_page+0x98/0x440
  [ 5381.660495]  [] ? local_clock+0x4f/0x60
  [ 5381.660502]  [] ? __get_free_pages+0x14/0x50
  [ 5381.660508]  [] ? trace_hardirqs_off_caller+0x28/0xd0
  [ 5381.660515]  [] alloc_pages_current+0x10f/0x1f0
  [ 5381.660520]  [] ? __get_free_pages+0x14/0x50
  [ 5381.660524]  [] __get_free_pages+0x14/0x50
  [ 5381.660530]  [] kmalloc_order_trace+0x3e/0x100
  [ 5381.660536]  [] __kmalloc_track_caller+0x220/0x230
  [ 5381.660560]  [] ? fs_path_ensure_buf.part.12+0x6b/0x200 
[btrfs]
  [ 5381.660564]  [] ? retint_restore_args+0xe/0xe
  [ 5381.660569]  [] krealloc+0x6f/0xb0
  [ 5381.660586]  [] fs_path_ensure_buf.part.12+0x6b/0x200 
[btrfs]
  [ 5381.660601]  [] fs_path_prepare_for_add+0x98/0xb0 [btrfs]
  [ 5381.660615]  [] fs_path_add_path+0x2c/0x60 [btrfs]
  [ 5381.660628]  [] get_cur_path+0x7c/0x1c0 [btrfs]

Even without this loop, the incremental send couldn't succeed, because it would
attempt to send a rename/move operation for the lower inode before the inode
with the higher number was renamed/moved. This issue is easy to trigger with
the following steps:

  $ mkfs.btrfs -f /dev/sdb3
  $ mount /dev/sdb3 /mnt/btrfs
  $ mkdir -p /mnt/btrfs/a/b/c/d
  $ mkdir /mnt/btrfs/a/b/c2
  $ btrfs subvol snapshot -r /mnt/btrfs /mnt/btrfs/snap1
  $ mv /mnt/btrfs/a/b/c/d /mnt/btrfs/a/b/c2/d2
  $ mv /mnt/btrfs/a/b/c /mnt/btrfs/a/b/c2/d2/cc
  $ btrfs subvol snapshot -r /mnt/btrfs /mnt/btrfs/snap2
  $ btrfs send -p /mnt/btrfs/snap1 /mnt/btrfs/snap2 > /tmp/incremental.send

The structure of the filesystem when the first snapshot is taken is:

 .                   (ino 256)
 |-- a               (ino 257)
     |-- b           (ino 258)
         |-- c       (ino 259)
         |   |-- d   (ino 260)
         |
         |-- c2      (ino 261)

And its structure when the second snapshot is taken is:

 .                           (ino 256)
 |-- a                       (ino 257)
     |-- b                   (ino 258)
         |-- c2              (ino 261)
             |-- d2          (ino 260)
                 |-- cc      (ino 259)

Before the move/rename operation is performed for the inode 259, the
move/rename for inode 260 must be performed, since 259 is now a child
of 260.
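
The ordering dependency can be seen without btrfs at all. The sketch below is only an illustration: plain directories stand in for the receiving side, the function names are made up, and it does not model how the patch defers moves inside the kernel. It replays the two renames once in ascending inode order and once in dependency order.

```shell
#!/bin/bash
# Sketch only: plain directories stand in for the receiving side, no btrfs
# is involved. The function names are invented for this illustration.

# Recreate the layout of the first snapshot under $1.
make_snap1_tree() {
	mkdir -p "$1/a/b/c/d" "$1/a/b/c2"
}

# Ascending inode order: try to move inode 259 (c) first.
# Fails, because its new parent a/b/c2/d2 (inode 260) has not been moved yet.
replay_inode_order() {
	mv "$1/a/b/c" "$1/a/b/c2/d2/cc" 2>/dev/null
}

# Dependency order: move inode 260 (d -> d2) first, then inode 259 (c -> cc).
replay_dependency_order() {
	mv "$1/a/b/c/d" "$1/a/b/c2/d2" &&
	mv "$1/a/b/c" "$1/a/b/c2/d2/cc"
}

work=$(mktemp -d)
make_snap1_tree "$work/inode"
make_snap1_tree "$work/dep"
replay_inode_order "$work/inode" || echo "inode order: rename of ino 259 failed"
replay_dependency_order "$work/dep" && echo "dependency order: ok"
rm -rf "$work"
```

In the second run the tree ends up as a/b/c2/d2/cc, which matches the layout of the second snapshot.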

A test case for xfstests, with a more complex scenario, will follow soon.

Signed-off-by: Filipe David Borba Manana 
---

V2: Removed some non ascii characters in directory hierarchy diagrams,
which were generated by the 'tree' command.
V3: Make sure stack list entries are freed on error path in function
apply_children_dir_moves.
V4: Simplified function apply_children_dir_moves, removed repeated and
confusing code.
V5: Cleaner error path in add_pending_dir_move function.
V6: Properly deal with multiple directories waiting for the same parent
directory move/rename to happen. Updated test case for xfstests to
exercise that code path too.

 fs/btrfs/send.c |

[PATCH v2] xfstests: add test for btrfs incremental send infinite loop issue

2014-01-22 Thread Filipe David Borba Manana
Regression test for btrfs' incremental send feature:

1) Create several nested directories;

2) Create a read only snapshot;

3) Change the parentship of some of the deepest directories in a reverse
   way, so that parents become children and children become parents;

4) Create another read only snapshot and use it for an incremental send
   relative to the first snapshot.

At step 4 btrfs' send entered an infinite loop, increasing the memory it
used while building path strings until a krealloc was unable to allocate
more memory, which caused a warning dump in dmesg.

The following linux kernel patch fixes this issue.

   Btrfs: fix infinite path build loops in incremental send
   (https://patchwork.kernel.org/patch/3522361/)

Signed-off-by: Filipe David Borba Manana 
---

V2: Updated test to trigger one more code path in the corresponding
btrfs linux kernel patch that fixes this issue.

 tests/btrfs/030 |  144 +++
 tests/btrfs/030.out |9 
 tests/btrfs/group   |1 +
 3 files changed, 154 insertions(+)
 create mode 100755 tests/btrfs/030
 create mode 100644 tests/btrfs/030.out

diff --git a/tests/btrfs/030 b/tests/btrfs/030
new file mode 100755
index 000..5e1b4fc
--- /dev/null
+++ b/tests/btrfs/030
@@ -0,0 +1,144 @@
+#! /bin/bash
+# FS QA Test No. btrfs/030
+#
+# Regression test for btrfs' incremental send feature:
+#
+# 1) Create several nested directories;
+# 2) Create a read only snapshot;
+# 3) Change the parentship of some of the deepest directories in a reverse
+#    way, so that parents become children and children become parents;
+# 4) Create another read only snapshot and use it for an incremental send
+#    relative to the first snapshot.
+#
+# At step 4 btrfs' send entered an infinite loop, increasing the memory it
+# used while building path strings until a krealloc was unable to allocate
+# more memory, which caused a warning dump in dmesg.
+#
+#---
+# Copyright (c) 2014 Filipe Manana.  All Rights Reserved.
+#
+# This program is free software; you can redistribute it and/or
+# modify it under the terms of the GNU General Public License as
+# published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it would be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write the Free Software Foundation,
+# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+#---
+#
+
+seq=`basename $0`
+seqres=$RESULT_DIR/$seq
+echo "QA output created by $seq"
+
+here=`pwd`
+tmp=`mktemp -d`
+status=1   # failure is the default!
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+_cleanup()
+{
+	rm -fr $tmp
+}
+
+# get standard environment, filters and checks
+. ./common/rc
+. ./common/filter
+
+# real QA test starts here
+_supported_fs btrfs
+_supported_os Linux
+_require_scratch
+_need_to_be_root
+
+FSSUM_PROG=$here/src/fssum
+[ -x $FSSUM_PROG ] || _notrun "fssum not built"
+
+rm -f $seqres.full
+
+_scratch_mkfs >/dev/null 2>&1
+_scratch_mount
+
+
+mkdir -p $SCRATCH_MNT/a/b/c
+echo "hello" > $SCRATCH_MNT/a/b/c/file.txt
+mkdir $SCRATCH_MNT/a/b/c/d
+mkdir $SCRATCH_MNT/a/b/c2
+mkdir $SCRATCH_MNT/a/b/www
+mkdir -p $SCRATCH_MNT/a/b/c3/x/y
+
+# Directory tree looks like:
+#
+# .                                 (ino 256)
+# |-- a/                            (ino 257)
+#     |-- b/                        (ino 258)
+#         |-- c/                    (ino 259)
+#         |   |-- file.txt          (ino 260)
+#         |   |-- d/                (ino 261)
+#         |
+#         |-- c2/                   (ino 262)
+#         |-- www/                  (ino 263)
+#         |
+#         |-- c3/                   (ino 264)
+#             |-- x/                (ino 265)
+#                 |-- y/            (ino 266)
+
+$BTRFS_UTIL_PROG subvol snapshot -r $SCRATCH_MNT $SCRATCH_MNT/mysnap1 | \
+_filter_scratch
+
+echo " world" >> $SCRATCH_MNT/a/b/c/file.txt
+mv $SCRATCH_MNT/a/b/c/d $SCRATCH_MNT/a/b/c2/d2
+mv $SCRATCH_MNT/a/b/c $SCRATCH_MNT/a/b/c2/d2/cc
+mv $SCRATCH_MNT/a/b/c3/x/y $SCRATCH_MNT/a/b/c2/y2
+mv $SCRATCH_MNT/a/b/c3/x $SCRATCH_MNT/a/b/c2/y2/x2
+mv $SCRATCH_MNT/a/b/c3 $SCRATCH_MNT/a/b/c2/y2/x2/Z
+mv $SCRATCH_MNT/a/b/www $SCRATCH_MNT/a/b/c2/y2/x2/WWW
+ln $SCRATCH_MNT/a/b/c2/d2/cc/file.txt $SCRATCH_MNT/a/b/c2/y2/x2/Z/file_link.txt
+mv $SCRATCH_MNT/a/b/c2/d2/cc/file.txt $SCRATCH_MNT/a/b/c2/y2/x2
+
+# Directory tree now looks like:
+#
+# .                                 (ino 256)
+# |-- a/                            (ino 257)
+#     |-- b/                        (ino 258)
+#         |-- c2/                   (ino 262)
+#             |-- d2/               (ino 261)
+# | 

Re: [PATCH] Btrfs: fix snprintf usage by send's gen_unique_name

2014-01-22 Thread David Sterba
On Tue, Jan 21, 2014 at 11:36:38PM +, Filipe David Borba Manana wrote:
> The buffer size argument passed to snprintf must account for the
> trailing null byte added by snprintf, and it returns a value >=
> sizeof(buffer) when the string can't fit in the buffer.
> 
> Since our buffer has a size of 64 characters, and the maximum orphan
> name we can generate is 63 characters wide, we must pass 64 as the
> buffer size to snprintf, and not 63.
> 
> Signed-off-by: Filipe David Borba Manana 

JFYI, I have a patch that does the same and also cleans up the code around
it, but it's part of a bigger series that's in testing atm, so I haven't
sent it yet.

Consider it
Reviewed-by: David Sterba 

I'll update my version.


Re: Working on Btrfs as topic for master thesis

2014-01-22 Thread David Sterba
On Sat, Jan 18, 2014 at 12:50:54PM +, Toggenburger Lukas wrote:
> Hello Tomasz
> 
> > Have you considered per-file/per-directory selection of raid level?
> 
> Sounds great, I haven't thought about it before.
> 
> Do you or someone else know what the current state of development is?
> Is someone working on this?

The feature lacks an interface to specify the raid flags per-object. This
is WIP, keyword is 'properties', you'll find some preliminary patches in
the list. This is the ground work for all sorts of fancy tuning.

The filesystem split into areas with different raid levels will bring
interesting problems regarding free space and operations that cross the
raid levels. But I think it's doable.


Re: kernel BUG at fs/btrfs/inode.c:1593! with 3.13.0-rc7

2014-01-22 Thread Tomasz Chmielewski
I could still see the bug (below) with 3.13 and tried to apply the patch.

It did apply:

patching file fs/btrfs/ctree.c
Hunk #1 succeeded at 39 with fuzz 2.
Hunk #2 succeeded at 475 (offset 1 line).
Hunk #3 succeeded at 485 (offset 1 line).
Hunk #4 succeeded at 505 (offset 1 line).
Hunk #5 succeeded at 527 (offset 1 line).
Hunk #6 succeeded at 568 (offset 1 line).
Hunk #7 succeeded at 578 (offset 1 line).
Hunk #8 succeeded at 606 (offset 1 line).
Hunk #9 succeeded at 703 (offset 1 line).
Hunk #10 succeeded at 742 (offset 1 line).
Hunk #11 succeeded at 834 (offset 1 line).
Hunk #12 succeeded at 927 (offset 1 line).
Hunk #13 succeeded at 1230 (offset 1 line).
Hunk #14 succeeded at 3216 (offset -42 lines).
Hunk #15 succeeded at 3291 (offset -42 lines).
Hunk #16 succeeded at 3497 (offset -42 lines).


however, the kernel fails to compile:

  LD  fs/btrfs/built-in.o
  CC [M]  fs/btrfs/super.o
  CC [M]  fs/btrfs/ctree.o
fs/btrfs/ctree.c: In function ‘tree_mod_log_set_node_key’:
fs/btrfs/ctree.c:924:2: error: implicit declaration of function 
‘__tree_mod_log_insert_key’ [-Werror=implicit-function-declaration]
cc1: some warnings being treated as errors
make[4]: *** [fs/btrfs/ctree.o] Error 1


Is there a patch which works with 3.13?


[130583.552477] [ cut here ]
[130583.552596] WARNING: CPU: 0 PID: 9052 at fs/btrfs/ctree.c:1321 
btrfs_search_old_slot+0x322/0x7ea [btrfs]()
[130583.552718] Modules linked in: ipt_MASQUERADE iptable_nat nf_conntrack_ipv4 
nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack ip_tables x_tables 
cpufreq_ondemand cpufreq_conservative cpufreq_powersave cpufreq_stats bridge 
stp llc ipv6 btrfs xor raid6_pq zlib_deflate loop ehci_pci ehci_hcd video 
button lpc_ich mfd_core i2c_i801 i2c_core pcspkr acpi_cpufreq ext4 crc16 jbd2 
mbcache raid1 sg sd_mod ahci libahci libata r8169 scsi_mod mii
[130583.553167] CPU: 0 PID: 9052 Comm: btrfs-endio-wri Tainted: GW
3.13.0 #1
[130583.553287] Hardware name: System manufacturer System Product Name/P8H77-M 
PRO, BIOS 1101 02/04/2013
[130583.553409]  0009 880043697908 8138998a 
0006
[130583.553533]   880043697948 810370b5 
044684a45000
[130583.553688]  a025fe5a 8807ee7b1510 880741080800 
8802e3746000
[130583.553816] Call Trace:
[130583.553880]  [] dump_stack+0x46/0x58
[130583.553945]  [] warn_slowpath_common+0x77/0x91
[130583.554018]  [] ? btrfs_search_old_slot+0x322/0x7ea 
[btrfs]
[130583.554134]  [] warn_slowpath_null+0x15/0x17
[130583.554205]  [] btrfs_search_old_slot+0x322/0x7ea [btrfs]
[130583.554286]  [] __resolve_indirect_refs+0x10f/0x48d 
[btrfs]
[130583.554416]  [] find_parent_nodes+0x337/0x5d2 [btrfs]
[130583.554493]  [] iterate_extent_inodes+0xc9/0x1d6 [btrfs]
[130583.554590]  [] ? record_extent_backrefs+0xc3/0xc3 [btrfs]
[130583.554687]  [] ? record_extent_backrefs+0xc3/0xc3 [btrfs]
[130583.554764]  [] iterate_inodes_from_logical+0x7f/0x95 
[btrfs]
[130583.554891]  [] record_extent_backrefs+0x5b/0xc3 [btrfs]
[130583.554968]  [] btrfs_finish_ordered_io+0x77a/0x877 
[btrfs]
[130583.555105]  [] ? kmem_cache_free+0x164/0x17a
[130583.555171]  [] ? mempool_free_slab+0x12/0x14
[130583.555245]  [] finish_ordered_fn+0x10/0x12 [btrfs]
[130583.555322]  [] worker_loop+0x15e/0x495 [btrfs]
[130583.555398]  [] ? btrfs_queue_worker+0x269/0x269 [btrfs]
[130583.555465]  [] kthread+0xcd/0xd5
[130583.28]  [] ? kthread_freezable_should_stop+0x43/0x43
[130583.94]  [] ret_from_fork+0x7c/0xb0
[130583.555658]  [] ? kthread_freezable_should_stop+0x43/0x43
[130583.555723] ---[ end trace 29066b81af8a4336 ]---
[130583.555802] BTRFS critical (device sdd1): unable to find logical 
3472310704041439232 len 4096
[130583.555926] [ cut here ]
[130583.555987] kernel BUG at fs/btrfs/inode.c:1593!
[130583.556047] invalid opcode:  [#1] SMP 
[130583.556108] Modules linked in: ipt_MASQUERADE iptable_nat nf_conntrack_ipv4 
nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack ip_tables x_tables 
cpufreq_ondemand cpufreq_conservative cpufreq_powersave cpufreq_stats bridge 
stp llc ipv6 btrfs xor raid6_pq zlib_deflate loop ehci_pci ehci_hcd video 
button lpc_ich mfd_core i2c_i801 i2c_core pcspkr acpi_cpufreq ext4 crc16 jbd2 
mbcache raid1 sg sd_mod ahci libahci libata r8169 scsi_mod mii
[130583.556524] CPU: 0 PID: 9052 Comm: btrfs-endio-wri Tainted: GW
3.13.0 #1
[130583.556637] Hardware name: System manufacturer System Product Name/P8H77-M 
PRO, BIOS 1101 02/04/2013
[130583.556751] task: 8806615adc40 ti: 880043696000 task.ti: 
880043696000
[130583.556862] RIP: 0010:[]  [] 
btrfs_merge_bio_hook+0x53/0x68 [btrfs]
[130583.556991] RSP: 0018:880043697588  EFLAGS: 00010282
[130583.557051] RAX: ffea RBX: 1000 RCX: 
0046
[130583.557161] RDX: 0006 RSI: 0046 RDI: 
88081fa0d040
[130583.557272] RBP: 8800436975a8 R08:  R09: 

Re: Working on Btrfs as topic for master thesis

2014-01-22 Thread David Sterba
On Sun, Jan 19, 2014 at 09:44:39PM -0800, Roger Binns wrote:
> If you are more interested in the theoretical side then looking into
> compression would be interesting.  ie how close to the theoretical best
> compression are we.  Various filesystems like btrfs and NTFS make all
> sorts of compromises in algorithm choices but also especially in the size
> of blocks they compress.  How much better could be done?

I have done some work here, so far it's stalled due to more important
work.

https://btrfs.wiki.kernel.org/index.php/Project_ideas#Compression_enhancements

Do you have other suggestions beyond what's proposed there?


Re: kernel BUG at fs/btrfs/inode.c:1593! with 3.13.0-rc7

2014-01-22 Thread Filipe David Manana
On Wed, Jan 22, 2014 at 12:07 PM, Tomasz Chmielewski  wrote:
> I could still see the bug (below) with 3.13 and tried to apply the patch.
>
> It did apply:
>
> patching file fs/btrfs/ctree.c
> Hunk #1 succeeded at 39 with fuzz 2.
> Hunk #2 succeeded at 475 (offset 1 line).
> Hunk #3 succeeded at 485 (offset 1 line).
> Hunk #4 succeeded at 505 (offset 1 line).
> Hunk #5 succeeded at 527 (offset 1 line).
> Hunk #6 succeeded at 568 (offset 1 line).
> Hunk #7 succeeded at 578 (offset 1 line).
> Hunk #8 succeeded at 606 (offset 1 line).
> Hunk #9 succeeded at 703 (offset 1 line).
> Hunk #10 succeeded at 742 (offset 1 line).
> Hunk #11 succeeded at 834 (offset 1 line).
> Hunk #12 succeeded at 927 (offset 1 line).
> Hunk #13 succeeded at 1230 (offset 1 line).
> Hunk #14 succeeded at 3216 (offset -42 lines).
> Hunk #15 succeeded at 3291 (offset -42 lines).
> Hunk #16 succeeded at 3497 (offset -42 lines).
>
>
> however, the kernel fails to compile:
>
>   LD  fs/btrfs/built-in.o
>   CC [M]  fs/btrfs/super.o
>   CC [M]  fs/btrfs/ctree.o
> fs/btrfs/ctree.c: In function ‘tree_mod_log_set_node_key’:
> fs/btrfs/ctree.c:924:2: error: implicit declaration of function 
> ‘__tree_mod_log_insert_key’ [-Werror=implicit-function-declaration]
> cc1: some warnings being treated as errors
> make[4]: *** [fs/btrfs/ctree.o] Error 1
>
>
> Is there a patch which works with 3.13?


Just apply this one before that:

http://git.kernel.org/cgit/linux/kernel/git/josef/btrfs-next.git/commit/?id=a220765e8f936883d9968dd79cba0e230729f70e

thanks


Re: Working on Btrfs as topic for master thesis

2014-01-22 Thread David Sterba
On Tue, Jan 21, 2014 at 04:52:00PM +, Hugo Mills wrote:
> On Tue, Jan 21, 2014 at 07:25:43AM -0500, Austin S Hemmelgarn wrote:
> > > Maybe this happens already: Might a similar effect be automatically
> > > achieved by tracking per-device I/O load averages and distributing
> > > reads based on the I/O loads of possible read devices?
> > > 
> > That might be the case, it depends on how the I/O load averages are
> > calculated.  I actually hadn't realized BTRFS did this, I thought it
> > behaved more like MD RAID (that is, distributing the reads among devices
> > in an unweighted round-robin fashion).
> 
>I think David tried that a while ago, and the benchmarks were
> actually worse. I'm not sure how much investigation he did into why,
> though.

I haven't done any extensive testing, only streaming writes, and the
heuristic just batched writes by a given threshold before switching to
another mirror.  Load balancing was done without any logic that would
look at actual IO load of the devices. For me, the result was that
there's room for improvement.
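
A threshold-batched mirror switch of the kind described here might look like the sketch below. Everything in it is an assumption made for illustration: the names THRESHOLD, NUM_MIRRORS and pick_mirror, the values 4 and 2, and the use of shell at all. It is not taken from the actual patches, performs no I/O, and, like the heuristic described, never looks at device load.

```shell
#!/bin/bash
# Sketch only: a threshold-batched mirror picker. All names and values are
# hypothetical and do not come from the btrfs patches.
THRESHOLD=4       # requests sent to one mirror before switching
NUM_MIRRORS=2
current_mirror=0
batched=0

# Sets the global variable "mirror" to the device index for the next request.
# Call it directly, not in a subshell, so that the counters are kept.
pick_mirror() {
	if [ "$batched" -ge "$THRESHOLD" ]; then
		current_mirror=$(( (current_mirror + 1) % NUM_MIRRORS ))
		batched=0
	fi
	batched=$((batched + 1))
	mirror=$current_mirror
}

# Ten requests with THRESHOLD=4 are spread as 0000111100.
sequence=""
for _ in 1 2 3 4 5 6 7 8 9 10; do
	pick_mirror
	sequence="$sequence$mirror"
done
echo "$sequence"
```

Because the switch depends only on a request counter, a slow or busy device gets the same share as an idle one, which is the gap that load-aware balancing would have to close.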


Re: Working on Btrfs as topic for master thesis

2014-01-22 Thread David Sterba
On Wed, Jan 22, 2014 at 01:20:10PM +0100, David Sterba wrote:
> I haven't done any extensive testing, only streaming writes, and the
> heuristic just batched writes by a given threshold before switching to
> another mirror.  Load balancing was done without any logic that would
> look at actual IO load of the devices. For me, the result was that
> there's room for improvement.

http://www.spinics.net/lists/linux-btrfs/msg12745.html
http://www.spinics.net/lists/linux-btrfs/msg17228.html


Re: [PATCH v2 2/4] Btrfs: fix protection between send and root deletion

2014-01-22 Thread David Sterba
On Wed, Jan 22, 2014 at 04:44:14PM +0800, Wang Shilong wrote:
> >So we disagree, I see a reason for the deletion protection and will do
> >the patch myself. Let's see if we can get more user feedback then.
> >
> >I'm NAKing this patch in current state, if it helps anything.
> Both ways are ok for me actually, don't be annoyed anyway,

Nah, this is a message to maintainers that the patch discussion has
reached some conclusion and should help deciding if the patch should be
merged or not. We've seen in the past that after a moderate discussion
against patch inclusion, the patch ended up merged as if nothing
happened. _This_ can be annoying.

> You and Miao are really doing a good job to Btrfs, just go ahead, i
> am ok with dropping this patch.^_^

Ok, thanks.


Re: [PATCH RESEND] xfstests: btrfs: cross-subvolume sparse copy

2014-01-22 Thread David Sterba
On Tue, Jan 21, 2014 at 12:40:48PM +0100, Koen De Wit wrote:
> +btrfs subvol delete $SUBVOL1 >/dev/null 2>&1
> +btrfs subvol delete $SUBVOL2 >/dev/null 2>&1

Please use $BTRFS_UTIL_PROG instead of 'btrfs' and don't shorten the
command names, i.e. use 'subvolume'.

> +cp --reflink $TESTDIR1/file1 $SUBVOL1
> +cp --reflink $TESTDIR1/file1 $SUBVOL2
> +cp --reflink $SUBVOL1/file2 $TESTDIR1/
> +cp --reflink $SUBVOL1/file2 $SUBVOL2
> +cp --reflink $SUBVOL2/file3 $TESTDIR1/
> +cp --reflink $SUBVOL2/file3 $SUBVOL1

--reflink without any parameter means 'always', that's what we want, but
can we possibly make it explicit? 'cp' is an external tool and if the
default changes, the test would not work as expected.
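
A minimal sketch of the explicit form, assuming GNU cp from coreutils. The helper name reflink_copy is hypothetical and not part of xfstests. With "always" spelled out, the copy fails on a filesystem without reflink support instead of quietly falling back to a regular copy.

```shell
#!/bin/bash
# Hypothetical helper, not part of xfstests. Assumes GNU cp.
# Spelling out "always" keeps the behaviour independent of cp's default mode.
reflink_copy() {
	local src=$1 dst=$2
	cp --reflink=always "$src" "$dst" 2>/dev/null
}

# Demo in a temporary directory; the outcome depends on the filesystem.
demo_dir=$(mktemp -d)
echo "hello" > "$demo_dir/file1"
if reflink_copy "$demo_dir/file1" "$demo_dir/file1.copy"; then
	echo "reflink copy created"
else
	echo "reflink not supported here (or cp lacks --reflink)"
fi
rm -rf "$demo_dir"
```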

Otherwise ok,
Reviewed-by: David Sterba 


Re: Working on Btrfs as topic for master thesis

2014-01-22 Thread Hugo Mills
On Wed, Jan 22, 2014 at 01:05:33PM +0100, David Sterba wrote:
> On Sat, Jan 18, 2014 at 12:50:54PM +, Toggenburger Lukas wrote:
> > Hello Tomasz
> > 
> > > Have you considered per-file/per-directory selection of raid level?
> > 
> > Sounds great, I haven't thought about it before.
> > 
> > Do you or someone else know what the current state of development is?
> > Is someone working on this?
> 
> The feature lacks an interface to specify the raid flags per-object. This
> is WIP, keyword is 'properties', you'll find some preliminary patches in
> the list. This is the ground work for all sorts of fancy tuning.
> 
> The filesystem split into areas with different raid levels will bring
> interesting problems regarding free space and operations that cross the
> raid levels. But I think it's doable.

   There's some potentially horrible ENOSPC cases with uneven-sized
devices. I need to run some simulations to see how it'll behave...

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
   --- The early bird gets the worm,  but the second mouse ---   
gets the cheese. 




[PATCH v2] xfstests: btrfs: cross-subvolume sparse copy

2014-01-22 Thread Koen De Wit
This testscript creates reflinks to files on different subvolumes,
overwrites the original files and reflinks, and moves reflinked files
between subvolumes.

Signed-off-by: Koen De Wit 
Reviewed-by: David Sterba 
---
v1: Resend (originally submitted as test 302, btrfs/316)
v2: - use $BTRFS_UTIL_PROG instead of btrfs command
    - use full subcommands
    - explicitly define the "always" parameter to cp --reflink
    - define $seqres

diff --git a/tests/btrfs/030 b/tests/btrfs/030
new file mode 100644
index 000..3a1b970
--- /dev/null
+++ b/tests/btrfs/030
@@ -0,0 +1,138 @@
+#! /bin/bash
+# FS QA Test No. 030
+#
+# Testing cross-subvolume sparse copy on btrfs
+#    - Create two subvolumes, mount one of them
+#    - Create a file on each (sub/root)volume,
+#      reflink them on the other volumes
+#    - Change one original and two reflinked files
+#    - Move reflinked files between subvolumes
+#
+#---
+# Copyright (c) 2014, Oracle and/or its affiliates.  All Rights Reserved.
+#
+# This program is free software; you can redistribute it and/or
+# modify it under the terms of the GNU General Public License as
+# published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it would be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write the Free Software Foundation,
+# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+#---
+
+seq=`basename $0`
+seqres=$RESULT_DIR/$seq
+echo "QA output created by $seq"
+
+here=`pwd`
+tmp=/tmp/$$
+status=1	# failure is the default!
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+_cleanup()
+{
+	umount $SCRATCH_MNT
+	rm -rf $TESTDIR1
+	rm -rf $TESTDIR2
+	$BTRFS_UTIL_PROG subvolume delete $SUBVOL1 >> $seqres.full
+	$BTRFS_UTIL_PROG subvolume delete $SUBVOL2 >> $seqres.full
+	cd /
+	rm -f $tmp.*
+}
+
+# get standard environment, filters and checks
+. ./common/rc
+. ./common/filter
+
+# real QA test starts here
+_supported_fs btrfs
+_supported_os Linux
+
+_require_scratch
+_require_cp_reflink
+
+_checksum_files() {
+	for F in file1 file2 file3
+	do
+		echo "$F:"
+		for D in $TESTDIR1 $SCRATCH_MNT $SUBVOL2
+		do
+			_md5_checksum $D/$F
+		done
+	done
+}
+
+TESTDIR1=$TEST_DIR/test-$seq-1
+TESTDIR2=$TEST_DIR/test-$seq-2
+SUBVOL1=$TEST_DIR/subvol-$seq-1
+SUBVOL2=$TEST_DIR/subvol-$seq-2
+
+_scratch_unmount 2>/dev/null
+rm -rf $seqres.full
+rm -rf $TESTDIR1 $TESTDIR2
+$BTRFS_UTIL_PROG subvolume delete $SUBVOL1 >/dev/null 2>&1
+$BTRFS_UTIL_PROG subvolume delete $SUBVOL2 >/dev/null 2>&1
+
+mkdir $TESTDIR1
+mkdir $TESTDIR2
+$BTRFS_UTIL_PROG subvolume create $SUBVOL1 >> $seqres.full
+$BTRFS_UTIL_PROG subvolume create $SUBVOL2 >> $seqres.full
+_mount -t btrfs -o subvol=subvol-$seq-1 $TEST_DEV $SCRATCH_MNT
+
+echo "Create initial files"
+# TESTDIR1/file1 is very small and will be inlined
+$XFS_IO_PROG -f -c 'pwrite -S 0x61 0 10' $TESTDIR1/file1 \
+	>> $seqres.full
+$XFS_IO_PROG -f -c 'pwrite -S 0x62 0 13000' $SCRATCH_MNT/file2 \
+	>> $seqres.full
+$XFS_IO_PROG -f -c 'pwrite -S 0x63 0 17000' $SUBVOL2/file3 \
+	>> $seqres.full
+
+echo "Create reflinks to the initial files on other subvolumes"
+cp --reflink=always $TESTDIR1/file1 $SUBVOL1
+cp --reflink=always $TESTDIR1/file1 $SUBVOL2
+cp --reflink=always $SUBVOL1/file2 $TESTDIR1/
+cp --reflink=always $SUBVOL1/file2 $SUBVOL2
+cp --reflink=always $SUBVOL2/file3 $TESTDIR1/
+cp --reflink=always $SUBVOL2/file3 $SUBVOL1
+
+echo "Verify the reflinks"
+_verify_reflink $SCRATCH_MNT/file2 $TESTDIR1/file2
+_verify_reflink $SCRATCH_MNT/file2 $SUBVOL2/file2
+_verify_reflink $SUBVOL2/file3 $TESTDIR1/file3
+_verify_reflink $SUBVOL2/file3 $SCRATCH_MNT/file3
+echo "Verify the file contents:"
+_checksum_files
+
+echo -e "---\nOverwrite some files with new content"
+$XFS_IO_PROG -c 'pwrite -S 0x64 0 20' $TESTDIR1/file1 >> $seqres.full
+$XFS_IO_PROG -c 'pwrite -S 0x66 0 21000' $SUBVOL2/file2 >> $seqres.full
+$XFS_IO_PROG -c 'pwrite -S 0x65 5000 5000' $SCRATCH_MNT/file3 \
+	>> $seqres.full
+
+echo -n "Verify that non-overwritten reflinks "
+echo "still have the same data blocks"
+_verify_reflink $TESTDIR1/file2 $SCRATCH_MNT/file2
+_verify_reflink $TESTDIR1/file3 $SUBVOL2/file3
+echo "Verify the file contents:"
+_checksum_files
+
+echo -e "---\nShuffle files between directories"
+mv $TESTDIR1/file* $TESTDIR2
+mv $SCRATCH_MNT/file* $TESTDIR1/
+mv $SUBVOL2/file* $SCRATCH_MNT/
+mv $TESTDIR2/file* $SUBVOL2/
+
+# No _verify_reflink here as data is copied when moving files
+# between subvols
+echo "Verify the file contents:"
+_checksum_files
+
+# success, all done
+status=0

Re: Scrubbing with BTRFS Raid 5

2014-01-22 Thread Duncan
Graham Fleming posted on Tue, 21 Jan 2014 10:03:26 -0800 as excerpted:

> I want to keep playing around with BTRFS RAID 5 and testing with it...
> assuming I have a drive with bad blocks, or let's say some inconsistent
> parity am I right in assuming that a) a btrfs scrub operation will not
> fix the stripes with bad parity

What I know is that it is said btrfs scrub doesn't work with btrfs raid5/6 
yet.  I don't know how it actually fails (tho I'd hope it simply returns 
an error to the effect that it doesn't work with raid5/6 yet) as I've not 
actually tried that mode, here.

> and b) a balance operation will not be
> successful? Or would a balance operation work to re-write parity?

Balance actually rewrites everything (well, everything matching its 
filters if a filtered balance is used, everything, if not), so it should 
rewrite parity correctly.

AFAIK, all the writing works and routine read works.  It's the error 
recovery that's still only partially implemented.  Since reading just 
reads data, not parity unless there's a dropped device or the like to 
recover from, as long as all devices are active and there's a good copy 
of the data (based on btrfs checksumming) to read, the rebalance should 
just use and rewrite that, ignoring the bad parity.



-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Scrubbing with BTRFS Raid 5

2014-01-22 Thread Duncan
Jim Salter posted on Tue, 21 Jan 2014 12:18:01 -0500 as excerpted:

> Would it be reasonably accurate to say "btrfs' RAID5 implementation is
> likely working well enough and safe enough if you are backing up
> regularly and are willing and able to restore from backup if necessary
> if a device failure goes horribly wrong", then?

I'd say (and IIRC I did say somewhere, but don't remember if it was this 
thread) that in reliability terms btrfs raid5 should be treated like 
btrfs raid0 at this point.  Raid0 is well known to have absolutely no 
failover -- if a device fails, the raid is toast.  It's possible so-
called "extreme measures" may recover data from the surviving bits (think 
the $expen$ive$ $ervice$ of data recovery firms), but the idea is that 
either no data that's not easily replaced is stored on a raid0 in the 
first place, or if it is, there's (tested recoverable) backup to the 
level that you're fully comfortable with losing EVERYTHING not backed up.

Examples of good data for raid0 are the kernel sources (as a user, not a 
dev, so you're not hacking on them), your distro's local package cache, 
browser cache, etc.  This is because by definition all those examples have 
the net as their backup, so loss of a local copy means a bit more to 
download, at worst.

That's what btrfs raid5/6 are at the moment, effectively raid0 from a 
recovery perspective.

Now the parity /is/ being written; it simply can't be treated as 
available for recovery.  So supposing you do /not/ lose a device (or 
suffer a bad checksum) on the raid5 until after the recovery code is 
complete and available, you've effectively "free" upgraded from raid0 
reliability to raid5 reliability as soon as recovery is possible, which 
will be nice, and meanwhile you can test the operational functionality, 
so there /are/ reasons you might want to run the btrfs raid5 mode now.  
As long as you remember it's currently effectively raid0 should something 
go wrong, and you either don't use it for valuable data in the first 
place, or you're willing to do without any updates to that data since the 
last tested backup, should it come to that.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman



Re: failed to read the system array on sdX

2014-01-22 Thread Duncan
Hans-Kristian Bakke posted on Tue, 21 Jan 2014 23:09:58 +0100 as
excerpted:

> 2. There is no uninstall target in the btrfs-tools Makefile. How am I
> supposed to uninstall btrfs-progs if wanting to go back to older
> versions (or newer)?

As I run gentoo not debian, I won't try to answer the other.  However I 
can answer this, to /some/ extent anyway.

a) (probably not all that interesting to you) On gentoo there's a live-
git ebuild available.  The PM will track the files it installs just as it 
would for any other package.  (A fake-install to a temp location is done 
first and the files that appear there are automatically registered for 
tracking and later uninstall.  Then the fake install is copied to the 
live filesystem.)  If your distro has something similar...

b) As is often the case with in-development "leaf" packages that are 
mostly executables and documentation (that is, no libraries/headers/etc 
that other packages will need), a standard recommendation is to use the 
files directly from the build dir, without actually installing.

So you'd build btrfs, and any time you wanted to use it you'd simply cd 
to its build dir and do ./btrfs (or other command).  Then you'd simply 
delete the build dir to uninstall.

Of course you could manually copy individual files out of the build dir, 
say to your initr* build, if desired, and track the ones you did just as 
manually.
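
A minimal shell sketch of option (b), assuming the sources were built in
a directory of your own choosing.  BTRFS_BUILD_DIR and btrfs_dev are
hypothetical names used only for illustration; they are not part of
btrfs-progs.

```shell
#!/bin/sh
# Hypothetical location of the built (but not installed) btrfs-progs tree.
BTRFS_BUILD_DIR=${BTRFS_BUILD_DIR:-$HOME/src/btrfs-progs}

# btrfs_dev TOOL [ARGS...]: run TOOL straight from the build directory.
# The helper name is made up for this example.
btrfs_dev() {
    tool=$1
    shift
    "$BTRFS_BUILD_DIR/$tool" "$@"
}

# Example (commented out so nothing runs by accident):
#   btrfs_dev btrfs filesystem show
# "Uninstalling" is then just removing the build directory:
#   rm -rf "$BTRFS_BUILD_DIR"
```

Since nothing is copied into the live filesystem, there is nothing for
the package manager to track or conflict with.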

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman



ERROR: error removing the device '/dev/sdf' - Input/output error

2014-01-22 Thread G. Michael Carter
How do I get around this?  The drive /dev/sdf has bad sectors.

Label: Store_01  uuid: ae612523-63cf-4860-a2cb-83a26d907e43
    Total devices 5 FS bytes used 7.51TiB
    devid    1 size 0.00 used 77.00GiB path /dev/sdf
    devid    3 size 1.82TiB used 1.41TiB path /dev/sdd
    devid    4 size 2.73TiB used 2.32TiB path /dev/sda
    devid    5 size 2.73TiB used 2.32TiB path /dev/sdb
    devid    6 size 1.82TiB used 1.41TiB path /dev/sdc

Btrfs v3.12
Data, RAID0: total=1.86TiB, used=1.85TiB
Data, single: total=5.65TiB, used=5.64TiB
System, RAID1: total=32.00MiB, used=732.00KiB
Metadata, RAID1: total=10.00GiB, used=8.69GiB

btrfs device delete /dev/sdf /mnt/Store
ERROR: error removing the device '/dev/sdf' - Input/output error

I've tried rebalancing as much of the data off the drive as I can.  But
there are still bits in that 77GB that are good data.

Is there a way of having btrfs skip around the input/output error and
then force the drive to remove?


Re: Scrubbing with BTRFS Raid 5

2014-01-22 Thread Chris Mason
On Tue, 2014-01-21 at 17:08 +, Duncan wrote:
> Graham Fleming posted on Tue, 21 Jan 2014 01:06:37 -0800 as excerpted:
> 
> > Thanks for all the info guys.
> > 
> > I ran some tests on the latest 3.12.8 kernel. I set up 3 1GB files and
> > attached them to /dev/loop{1..3} and created a BTRFS RAID 5 volume with
> > them.
> > 
> > I copied some data (from /dev/urandom) into two test files and got their
> > MD5 sums and saved them to a text file.
> > 
> > I then unmounted the volume, trashed Disk3 and created a new Disk4 file,
> > attached to /dev/loop4.
> > 
> > I mounted the BTRFS RAID 5 volume degraded and the md5 sums were fine. I
> > added /dev/loop4 to the volume and then deleted the missing device and
> > it rebalanced. I had data spread out on all three devices now. MD5 sums
> > unchanged on test files.
> > 
> > This, to me, implies BTRFS RAID 5 is working quite well and I can, in
> > fact, replace a dead drive.
> > 
> > Am I missing something?
> 
> What you're missing is that device death and replacement rarely happens 
> as neatly as your test (clean unmounts and all, no middle-of-process 
> power-loss, etc).  You tested best-case, not real-life or worst-case.
> 
> Try that again, setting up the raid5, setting up a big write to it, 
> disconnect one device in the middle of that write (I'm not sure if just 
> dropping the loop works or if the kernel gracefully shuts down the loop 
> device), then unplugging the system without unmounting... and /then/ see 
> what sense btrfs can make of the resulting mess.  In theory, with an 
> atomic write btree filesystem such as btrfs, even that should work fine, 
> minus perhaps the last few seconds of file-write activity, but the 
> filesystem should remain consistent on degraded remount and device add, 
> device remove, and rebalance, even if another power-pull happens in the 
> middle of /that/.
> 
> But given btrfs' raid5 incompleteness, I don't expect that will work.
> 

raid5/6 deals with IO errors from one or two drives, and it is able to
reconstruct the parity from the remaining drives and give you good data.

If we hit a crc error, the raid5/6 code will try a parity reconstruction
to make good data, and if we find good data from the other copy, it'll
return that up to userland.

In other words, for those cases it works just like raid1/10.  What it
won't do (yet) is write that good data back to the storage.  It'll stay
bad until you remove the device or run balance to rewrite everything.

Balance will reconstruct parity to get good data as it balances.  This
isn't as useful as scrub, but that work is coming.
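
To illustrate the reconstruction described above, here is a toy sketch
of single-parity (raid5-style) recovery using XOR.  It only shows the
arithmetic and is not the btrfs implementation: xor_blocks is a
hypothetical helper, each "block" is just four byte values, and the
checksum verification of the rebuilt data is left out.

```shell
#!/bin/sh
# xor_blocks "a1 a2 ..." "b1 b2 ...": print the bytewise XOR of two
# equally long lists of byte values (hypothetical helper for this sketch).
xor_blocks() {
    a=$1
    b=$2
    out=""
    for x in $a; do
        y=${b%% *}                  # first remaining value of b
        case $b in
            *" "*) b=${b#* } ;;     # drop that value from b
            *)     b="" ;;
        esac
        out="$out$(( $x ^ $y )) "
    done
    echo "${out% }"
}

d1="97 98 99 100"                   # data block on device 1
d2="1 2 3 4"                        # data block on device 2
parity=$(xor_blocks "$d1" "$d2")    # parity block on device 3

# Pretend device 2 returned an IO error: rebuild its block from the rest.
rebuilt=$(xor_blocks "$d1" "$parity")

echo "parity:  $parity"             # prints "parity:  96 96 96 96"
echo "rebuilt: $rebuilt"            # prints "rebuilt: 1 2 3 4"
```

The rebuilt block is what gets handed to userland; as noted above, it is
not written back to the bad location until a balance rewrites it.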

-chris





Re: Working on Btrfs as topic for master thesis

2014-01-22 Thread Roger Binns
On 22/01/14 04:12, David Sterba wrote:
> I have done some work here, so far it's stalled due to more important 
> work.
> 
> https://btrfs.wiki.kernel.org/index.php/Project_ideas#Compression_enhancements
>
>  Do you have other suggestions beyond what's proposed there?

There is also the theoretical side, i.e. coming up with a definition of
the theoretical best that results can then be measured against.  For
example, you list going up to a 128K block size, but without knowing the
theoretical best we don't know if that is a stopgap or very good.

That also feeds into questions such as whether it would be a good idea
to go back afterwards (perhaps as part of defrag) and spend more effort
on (re)compression.

Another consideration is perhaps having the compression dictionary kept
separate from the compressed blocks thereby allowing it to be used across
blocks and potentially files.  Compressors like smaz (very good on short
pieces of text) work by having a precomputed dictionary - perhaps those
can be used too.

Roger



Re: Scrubbing with BTRFS Raid 5

2014-01-22 Thread ronnie sahlberg
On Wed, Jan 22, 2014 at 12:45 PM, Chris Mason wrote:
> raid5/6 deals with IO errors from one or two drives, and it is able to
> reconstruct the parity from the remaining drives and give you good data.
>
> If we hit a crc error, the raid5/6 code will try a parity reconstruction
> to make good data, and if we find good data from the other copy, it'll
> return that up to userland.
>
> In other words, for those cases it works just like raid1/10.  What it
> won't do (yet) is write that good data back to the storage.  It'll stay
> bad until you remove the device or run balance to rewrite everything.
>
> Balance will reconstruct parity to get good data as it balances.  This
> isn't as useful as scrub, but that work is coming.
>

That is awesome!

What about online conversion from not-raid5/6 to raid5/6? What is the
status of that code? For example, what happens if there is a failure
or a reboot during the conversion?





Re: Scrubbing with BTRFS Raid 5

2014-01-22 Thread Chris Mason
On Wed, 2014-01-22 at 13:06 -0800, ronnie sahlberg wrote:
> That is awesome!
> 
> What about online conversion from not-raid5/6 to raid5/6  what is the
> status for that code, for example
> what happens if there is a failure during the conversion or a reboot ?

The conversion code uses balance, so that works normally.  If there is a
failure during the conversion you'll end up with some things raid5/6 and
some things at whatever other level you used.

The data will still be there, but you are more prone to enospc
problems ;)
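
As a rough sketch of what such a conversion, and a rerun after an
interrupted one, might look like from the command line.  The wrapper
function convert_to_raid5 and the BTRFS override variable are made up
for this example; the "soft" modifier is assumed to behave as documented
for balance convert filters, i.e. to skip chunks that already have the
target profile.

```shell
#!/bin/sh
# Set BTRFS=echo to print the command instead of running it (dry run).
BTRFS=${BTRFS:-btrfs}

# convert_to_raid5 MOUNTPOINT: convert data and metadata chunks to raid5.
# With "soft", chunks already converted are skipped, so running this again
# after a failure or reboot only touches what is still at the old profile.
convert_to_raid5() {
    mnt=$1
    $BTRFS balance start -dconvert=raid5,soft -mconvert=raid5,soft "$mnt"
}

# Example (commented out; needs a mounted multi-device btrfs and root):
#   convert_to_raid5 /mnt/test
```

Until the rerun finishes, the filesystem simply carries chunks at both
profiles, which is the mixed state described above.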

-chris



Re: Scrubbing with BTRFS Raid 5

2014-01-22 Thread ronnie sahlberg
On Wed, Jan 22, 2014 at 1:16 PM, Chris Mason wrote:
> On Wed, 2014-01-22 at 13:06 -0800, ronnie sahlberg wrote:
>> That is awesome!
>>
>> What about online conversion from not-raid5/6 to raid5/6  what is the
>> status for that code, for example
>> what happens if there is a failure during the conversion or a reboot ?
>
> The conversion code uses balance, so that works normally.  If there is a
> failure during the conversion you'll end up with some things raid5/6 and
> some things at whatever other level you used.
>
> The data will still be there, but you are more prone to enospc
> problems ;)
>

Ok, but if there is enough space,  you could just restart the balance
and it will eventually finish and all should, with some luck, be ok?

Awesome. This sounds like things are a lot closer to raid5/6 being
fully operational than I realized.




Re: ERROR: error removing the device '/dev/sdf' - Input/output error

2014-01-22 Thread Chris Murphy

On Jan 22, 2014, at 11:41 AM, G. Michael Carter wrote:

> How do I get around this.  The drive /dev/sdf has bad sectors.
> 
> Label: Store_01  uuid: ae612523-63cf-4860-a2cb-83a26d907e43
>     Total devices 5 FS bytes used 7.51TiB
>     devid    1 size 0.00 used 77.00GiB path /dev/sdf

size 0.00 used 77.00 GB? Does this even make sense?

>     devid    3 size 1.82TiB used 1.41TiB path /dev/sdd
>     devid    4 size 2.73TiB used 2.32TiB path /dev/sda
>     devid    5 size 2.73TiB used 2.32TiB path /dev/sdb
>     devid    6 size 1.82TiB used 1.41TiB path /dev/sdc
> 
> Btrfs v3.12
> Data, RAID0: total=1.86TiB, used=1.85TiB
> Data, single: total=5.65TiB, used=5.64TiB
> System, RAID1: total=32.00MiB, used=732.00KiB
> Metadata, RAID1: total=10.00GiB, used=8.69GiB
> 
> btrfs device delete /dev/sdf /mnt/Store
> ERROR: error removing the device '/dev/sdf' - Input/output error

Seems it can only be removed if all of the data on that device are successfully 
migrated to other devices.

> 
> I've tried rebalancing as much of the data off the drive I can.  But
> there's still bits in that 77GB that's good data.
> 
> Is there a way of having btrfs skip around the input/output error and
> then force the drive to remove?

It's a valid question for both raid0 and single data profiles: will there 
one day be a way to tolerate read errors, migrate what can be migrated, 
and then permit removal of the (bad) device? A scrub would already identify 
corrupt files. An additional feature would be a way to easily delete the 
corrupted files.

In a case of multiple device raid0, without a regular balance being a 
requirement, I could very easily start with two disks, add two more disks, and 
so on, and end up with a significant amount of completely valid data that 
survives a one disk failure. Clearly the file system itself is OK due to 
metadata raid1.
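
For the scrub part, a read-only run could be sketched as follows.
scrub_readonly and the BTRFS override variable are hypothetical names
for this example; /mnt/Store is the mount point from the report above.
That checksum failures found by the scrub show up in the kernel log is
an assumption worth checking on your kernel version.

```shell
#!/bin/sh
# Set BTRFS=echo to print the commands instead of running them (dry run).
BTRFS=${BTRFS:-btrfs}

# scrub_readonly MOUNTPOINT: run a foreground (-B), read-only (-r) scrub,
# then print the summary regardless of how the scrub itself exited.
scrub_readonly() {
    mnt=$1
    $BTRFS scrub start -B -r "$mnt"
    $BTRFS scrub status "$mnt"
}

# Example (commented out; needs root and a mounted btrfs):
#   scrub_readonly /mnt/Store
#   dmesg | grep -i btrfs    # look for the reported checksum errors
```

Read-only mode is used here so that nothing is rewritten on a device
that is already returning IO errors.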


Chris Murphy


[PATCH] btrfs: fix warning while merging two adjacent extents

2014-01-22 Thread Gui Hecheng
When we have two adjacent extents in relink_extent_backref,
we try to merge them. When we use btrfs_search_slot to locate the
slot for the current extent, we shouldn't set "ins_len = 1",
because we will merge it into the previous extent rather than
insert a new item. Otherwise, btrfs_search_slot may create a new
leaf, and path->slots[0] will be 0. Then we try to
fetch the previous item using "path->slots[0]--", and it will cause
a warning as follows:

[  145.713385] WARNING: CPU: 3 PID: 1796 at fs/btrfs/extent_io.c:5043 map_private_extent_buffer+0xd4/0xe0
[  145.713387] btrfs bad mapping eb start 5337088 len 4096, wanted 167772306 8
...
[  145.713462]  [] map_private_extent_buffer+0xd4/0xe0
[  145.713476]  [] ? btrfs_free_path+0x2a/0x40
[  145.713485]  [] btrfs_get_token_64+0x64/0xf0
[  145.713498]  [] relink_extent_backref+0x41c/0x820
[  145.713508]  [] btrfs_finish_ordered_io+0x239/0xa80

I hit this warning when running defrag on a filesystem created by
mkfs.btrfs with option -M, while reads/writes and snapshots were
running in the background.

Signed-off-by: Gui Hecheng 
---
 fs/btrfs/inode.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 1ea19ce..7f955d6 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -2314,7 +2314,7 @@ again:
 		u64 extent_len;
 		struct btrfs_key found_key;
 
-		ret = btrfs_search_slot(trans, root, &key, path, 1, 1);
+		ret = btrfs_search_slot(trans, root, &key, path, 0, 1);
 		if (ret < 0)
 			goto out_free_path;
 
-- 
1.8.0.1



Re: Bug#736227: linux-headers-3.12-1-amd64: general protection fault when using aptitude

2014-01-22 Thread Ben Hutchings
I'm sending this on to the btrfs developers to see if they can help.

On Tue, 2014-01-21 at 11:00 +0200, Giorgos Pallas wrote:
[...]
> I just installed 3.12-amd64 stock kernel. It booted OK, I opened a konsole
> and just tried to installed the kernel headers.  Just as aptitude tried to
> start downloading packages, I got:
> E: Method http has died unexpectedly!
> E: Sub-process http received a segmentation fault.
> and btrfs has crashed as seen in dmesg.
> 
> It should be noted that I use btrfs without problems with 3.7-trunk-amd64
> for almost a year.
[...]

The full message can be seen at ; here is
the crash log:

> [  322.368186] general protection fault:  [#1] SMP 
> [  322.368233] Modules linked in: cpufreq_userspace cpufreq_stats 
> cpufreq_powersave cpufreq_conservative xt_multiport uinput ib_iser rdma_cm 
> ib_addr iw_cm ib_cm ib_sa ib_mad ib_core iscsi_tcp libiscsi_tcp libiscsi 
> scsi_transport_iscsi fuse ip6table_filter ip6table_mangle ip6_tables 
> xt_tcpudp iptable_filter iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 
> nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle ip_tables x_tables ext4 crc16 
> mbcache jbd2 loop firewire_sbp2 uvcvideo arc4 videobuf2_vmalloc 
> videobuf2_memops videobuf2_core videodev media b43 joydev bcma 
> snd_hda_codec_hdmi mac80211 cfg80211 snd_hda_codec_idt ir_lirc_codec rfkill 
> snd_hda_intel snd_hda_codec rng_core iTCO_wdt snd_hwdep snd_pcm r852 lirc_dev 
> iTCO_vendor_support ir_mce_kbd_decoder snd_page_alloc snd_seq ir_sony_decoder 
> ir_sanyo_decoder dell_wmi ir_jvc_decoder sm_common nand ir_nec_decoder 
> ir_rc6_decoder ir_rc5_decoder nand_ecc rc_rc6_mce nand_ids mtd lpc_ich 
> i2c_i801 dell_laptop coretemp acpi_cpufreq snd_seq_device 
> r592 snd_timer mfd_core sparse_keymap dcdbas psmouse snd ite_cir rc_core 
> memstick soundcore processor pcspkr wmi serio_raw battery ac evdev btrfs xor 
> raid6_pq crc32c libcrc32c sha256_generic cbc dm_crypt dm_mod sg sd_mod 
> crct10dif_generic sr_mod crc_t10dif cdrom crct10dif_common firewire_ohci 
> sdhci_pci sdhci firewire_core crc_itu_t thermal ahci libahci tg3 libata ssb 
> mmc_core pcmcia i915 ptp pps_core libphy pcmcia_core video scsi_mod ehci_pci 
> uhci_hcd ehci_hcd button i2c_algo_bit drm_kms_helper drm i2c_core thermal_sys 
> usbcore usb_common
> [  322.369458] CPU: 0 PID: 4353 Comm: http Not tainted 3.12-1-amd64 #1 Debian 
> 3.12.6-2
> [  322.369493] Hardware name: Dell Inc. Studio 1737/0P792H, BIOS A09 
> 04/14/2011
> [  322.369534] task: 880036e80840 ti: 88009bf7e000 task.ti: 
> 88009bf7e000
> [  322.369569] RIP: 0010:[]  [] 
> memcpy+0x12/0x110
> [  322.369615] RSP: 0018:88009bf7f968  EFLAGS: 00010202
> [  322.369646] RAX: 88008e2eaf39 RBX: 0001 RCX: 
> 0001
> [  322.369679] RDX: 0001 RSI: db738800 RDI: 
> 88008e2eaf39
> [  322.369711] RBP: 880131ca9770 R08: 1000 R09: 
> 88009bf7f978
> [  322.369744] R10:  R11:  R12: 
> 6db6db6db6db6db7
> [  322.369776] R13: 1600 R14: 88008e2eaf3a R15: 
> 0001
> [  322.369810] FS:  7f00804a4720() GS:88013fc0() 
> knlGS:
> [  322.369855] CS:  0010 DS:  ES:  CR0: 80050033
> [  322.369881] CR2: ff600400 CR3: 9d52b000 CR4: 
> 07f0
> [  322.369914] Stack:
> [  322.369928]  a034a568  8801378a 
> 
> [  322.369980]  88013b3a9250 88008e2ea000 8800aa637d50 
> 8800925c3260
> [  322.370032]  a032f510 1000 880131ca96a0 
> ea0001f1a330
> [  322.370085] Call Trace:
> [  322.370119]  [] ? read_extent_buffer+0xc8/0x120 [btrfs]
> [  322.370164]  [] ? btrfs_get_extent+0x910/0x990 [btrfs]
> [  322.370214]  [] ? __do_readpage+0x398/0x780 [btrfs]
> [  322.370256]  [] ? btrfs_real_readdir+0x550/0x550 [btrfs]
> [  322.370307]  [] ? 
> __extent_readpages.constprop.43+0x2d2/0x2f0 [btrfs]
> [  322.370355]  [] ? btrfs_real_readdir+0x550/0x550 [btrfs]
> [  322.370406]  [] ? extent_readpages+0x182/0x190 [btrfs]
> [  322.370457]  [] ? btrfs_real_readdir+0x550/0x550 [btrfs]
> [  322.370499]  [] ? kmem_getpages+0x15b/0x1a0
> [  322.370527]  [] ? alloc_pages_current+0x9d/0x160
> [  322.370565]  [] ? __do_page_cache_readahead+0x193/0x240
> [  322.370598]  [] ? ondemand_readahead+0x14a/0x280
> [  322.370636]  [] ? generic_file_aio_read+0x4a6/0x6f0
> [  322.370668]  [] ? do_sync_read+0x57/0x90
> [  322.370701]  [] ? vfs_read+0x94/0x160
> [  322.370726]  [] ? SyS_read+0x43/0xa0
> [  322.370759]  [] ? system_call_fastpath+0x16/0x1b
> [  322.370795] Code: 43 58 48 2b 43 50 88 43 4e eb e9 90 90 90 90 90 90 90 90 
> 90 90 90 90 90 90 48 89 f8 48 89 d1 48 c1 e9 03 83 e2 07 f3 48 a5 89 d1  
> a4 c3 20 4c 8b 06 4c 8b 4e 08 4c 8b 56 10 4c 8b 5e 18 48 8d 
> [  322.371118] RIP  [] memcpy+0x12/0x110
> [  322.371154]  RSP 
> [  322.378460] ---[ end trace 4d0836c03317ee9d ]---

[PATCH 1/2] Btrfs: fix protection between walking backrefs and root deletion

2014-01-22 Thread Wang Shilong
There is a race condition between resolving an indirect ref and root
deletion. We must guarantee that the root cannot be destroyed while we are
walking it, otherwise we may access a broken tree.

Fix it by holding @subvol_srcu and releasing it as soon as we hold the
root node lock.

Signed-off-by: Wang Shilong 
---
 fs/btrfs/backref.c | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/backref.c b/fs/btrfs/backref.c
index 964679c..fd9ae72 100644
--- a/fs/btrfs/backref.c
+++ b/fs/btrfs/backref.c
@@ -299,23 +299,34 @@ static int __resolve_indirect_ref(struct btrfs_fs_info *fs_info,
 	int ret = 0;
 	int root_level;
 	int level = ref->level;
+	int index;
 
 	root_key.objectid = ref->root_id;
 	root_key.type = BTRFS_ROOT_ITEM_KEY;
 	root_key.offset = (u64)-1;
+
+	index = srcu_read_lock(&fs_info->subvol_srcu);
+
 	root = btrfs_read_fs_root_no_name(fs_info, &root_key);
 	if (IS_ERR(root)) {
+		srcu_read_unlock(&fs_info->subvol_srcu, index);
 		ret = PTR_ERR(root);
 		goto out;
 	}
 
 	root_level = btrfs_old_root_level(root, time_seq);
 
-	if (root_level + 1 == level)
+	if (root_level + 1 == level) {
+		srcu_read_unlock(&fs_info->subvol_srcu, index);
 		goto out;
+	}
 
 	path->lowest_level = level;
 	ret = btrfs_search_old_slot(root, &ref->key_for_search, path, time_seq);
+
+	/* root node has been locked, we can release @subvol_srcu safely here */
+	srcu_read_unlock(&fs_info->subvol_srcu, index);
+
 	pr_debug("search slot in root %llu (level %d, ref count %d) returned "
 		 "%d for key (%llu %u %llu)\n",
 		 ref->root_id, level, ref->count, ret,
-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
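
The ordering that patch 1/2 relies on (take the outer guard, look up the
root, take the finer-grained lock, then drop the guard on every exit path)
can be sketched in userspace. This is only a toy model under stated
assumptions: every toy_* name is hypothetical, the reader counter merely
stands in for the SRCU read-side critical section, and a real SRCU writer
waits for readers with synchronize_srcu() instead of checking a counter.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Toy stand-in for fs_info->subvol_srcu: just counts active readers. */
struct toy_guard {
	int readers;
};

/* Toy stand-in for a subvolume root whose root node can be locked. */
struct toy_root {
	int node_locked;
};

static void toy_guard_lock(struct toy_guard *g)
{
	g->readers++;
}

static void toy_guard_unlock(struct toy_guard *g)
{
	assert(g->readers > 0);
	g->readers--;
}

/*
 * Look up a root and lock its node. The guard is taken before the lookup
 * and dropped on every exit path: right away on failure, and only after
 * the node lock is held on success.
 */
static int toy_lock_root(struct toy_guard *g, struct toy_root *root)
{
	toy_guard_lock(g);

	if (root == NULL) {		/* models a failed root lookup */
		toy_guard_unlock(g);
		return -ENOENT;
	}

	root->node_locked = 1;		/* finer-grained lock is now held */
	toy_guard_unlock(g);		/* guard is no longer needed */
	return 0;
}

/* In this model a root may be torn down only when nothing protects it. */
static int toy_root_can_delete(const struct toy_guard *g,
			       const struct toy_root *root)
{
	return g->readers == 0 && !root->node_locked;
}
```

After a successful toy_lock_root() the guard count is back to zero, yet
toy_root_can_delete() still reports 0 because the node lock has taken over
the protection. That is the hand-off the patch comment describes.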


[PATCH 2/2] Btrfs: fix to catch all errors when resolving indirect ref

2014-01-22 Thread Wang Shilong
We can only tolerate ENOENT here; any other error should be returned
to the caller directly.

Signed-off-by: Wang Shilong 
---
 fs/btrfs/backref.c | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/backref.c b/fs/btrfs/backref.c
index fd9ae72..3512437 100644
--- a/fs/btrfs/backref.c
+++ b/fs/btrfs/backref.c
@@ -387,10 +387,16 @@ static int __resolve_indirect_refs(struct btrfs_fs_info *fs_info,
 			continue;
 		err = __resolve_indirect_ref(fs_info, path, time_seq, ref,
 					     parents, extent_item_pos);
-		if (err == -ENOMEM)
-			goto out;
-		if (err)
+		/*
+		 * we can only tolerate ENOENT,otherwise,we should catch error
+		 * and return directly.
+		 */
+		if (err == -ENOENT) {
 			continue;
+		} else if (err) {
+			ret = err;
+			goto out;
+		}
 
 		/* we put the first parent into the ref at hand */
 		ULIST_ITER_INIT(&uiter);
-- 
1.8.3.1
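
The error policy of patch 2/2 (skip a ref that no longer exists, stop on
anything else) can be shown in isolation. The sketch below is a
self-contained illustration, not the kernel code: resolve_all(),
toy_resolve() and the integer "refs" are hypothetical stand-ins for
__resolve_indirect_refs(), __resolve_indirect_ref() and the real ref list.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical per-ref resolver: returns 0 on success or a negative errno. */
typedef int (*resolve_fn)(int ref);

/*
 * Walk all refs. A missing ref (-ENOENT) is skipped; any other error
 * aborts the walk and is returned to the caller unchanged.
 * *resolved counts the refs that resolved before the walk ended.
 */
static int resolve_all(const int *refs, size_t n, resolve_fn resolve,
		       size_t *resolved)
{
	size_t i;

	assert(resolved != NULL);
	*resolved = 0;

	for (i = 0; i < n; i++) {
		int err = resolve(refs[i]);

		if (err == -ENOENT)
			continue;	/* tolerated: ref vanished */
		if (err)
			return err;	/* everything else is fatal */
		(*resolved)++;
	}
	return 0;
}

/* Toy resolver: ref 2 is missing, ref 4 hits an I/O error. */
static int toy_resolve(int ref)
{
	if (ref == 2)
		return -ENOENT;
	if (ref == 4)
		return -EIO;
	return 0;
}
```

With refs {1, 2, 3} the walk succeeds and counts two resolved refs; with
{1, 4, 3} it stops at the second ref and returns -EIO, which the old
"if (err) continue;" logic would have silently dropped.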



Re: Bug#736227: linux-headers-3.12-1-amd64: general protection fault when using aptitude

2014-01-22 Thread Γιώργος Πάλλας


In case this may help: Today the hard disk has reported unreadable 
sectors, so the issue reported could be related to some kind of emerging 
disk failure.


Giorgos

On 23/01/2014 07:47 πμ, Ben Hutchings wrote:

I'm sending this on to the btrfs developers to see if they can help.

On Tue, 2014-01-21 at 11:00 +0200, Giorgos Pallas wrote:
[...]

I just installed the 3.12-amd64 stock kernel. It booted OK, I opened a konsole
and just tried to install the kernel headers.  Just as aptitude tried to
start downloading packages, I got:
E: Method http has died unexpectedly!
E: Sub-process http received a segmentation fault.
and btrfs has crashed as seen in dmesg.

It should be noted that I have used btrfs without problems with
3.7-trunk-amd64 for almost a year.

[...]


Re: [PATCH] btrfs: fix warning while merging two adjacent extents

2014-01-22 Thread Liu Bo
On Thu, Jan 23, 2014 at 01:41:09PM +0800, Gui Hecheng wrote:
> When we have two adjacent extents in relink_extent_backref,
> we try to merge them. When we use btrfs_search_slot to locate the
> slot for the current extent, we shouldn't set "ins_len = 1",
> because we will merge it into the previous extent rather than
> insert a new item. Otherwise, we may happen to create a new leaf
> in btrfs_search_slot and path->slots[0] will be 0. Then we try to
> fetch the previous item using "path->slots[0]--", and it will cause
> a warning as follows:
> 
>   [  145.713385] WARNING: CPU: 3 PID: 1796 at fs/btrfs/extent_io.c:5043 
> map_private_extent_buffer+0xd4/0xe0
>   [  145.713387] btrfs bad mapping eb start 5337088 len 4096, wanted 
> 167772306 8
>   ...
>   [  145.713462]  [] map_private_extent_buffer+0xd4/0xe0
>   [  145.713476]  [] ? btrfs_free_path+0x2a/0x40
>   [  145.713485]  [] btrfs_get_token_64+0x64/0xf0
>   [  145.713498]  [] relink_extent_backref+0x41c/0x820
>   [  145.713508]  [] btrfs_finish_ordered_io+0x239/0xa80
> 
> I hit this warning when running defrag on a filesystem created by
> mkfs.btrfs with option -M, while reads/writes and snapshots were
> running in the background.

Looks good.

Reviewed-by: Liu Bo 

> 
> Signed-off-by: Gui Hecheng 
> ---
>  fs/btrfs/inode.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index 1ea19ce..7f955d6 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -2314,7 +2314,7 @@ again:
>   u64 extent_len;
>   struct btrfs_key found_key;
>  
> - ret = btrfs_search_slot(trans, root, &key, path, 1, 1);
> + ret = btrfs_search_slot(trans, root, &key, path, 0, 1);
>   if (ret < 0)
>   goto out_free_path;
>  
> -- 
> 1.8.0.1
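
Why ins_len matters for the "path->slots[0]--" step can be seen in a toy
model. The sketch below is not btrfs code: toy_leaf, toy_path and
toy_search_slot() are hypothetical, and the "split" is reduced to switching
to an empty spare leaf when room must be reserved at the end of a full
leaf. It only illustrates that a search which reserves space can land on
slot 0, where no previous item exists in the same leaf.

```c
#include <assert.h>
#include <stddef.h>

#define LEAF_CAP 4

struct toy_leaf {
	int keys[LEAF_CAP];
	int nritems;
};

struct toy_path {
	struct toy_leaf *leaf;
	int slot;
};

/*
 * Position `path` where `key` would go in `leaf`.
 * With ins_len != 0 the search reserves room for a new item; in this toy
 * model a full leaf whose insert position is past the last item is "split"
 * by switching to the empty `spare` leaf, so the slot becomes 0.
 * With ins_len == 0 the leaf layout is never changed.
 */
static void toy_search_slot(struct toy_leaf *leaf, struct toy_leaf *spare,
			    int key, int ins_len, struct toy_path *path)
{
	int slot = 0;

	assert(leaf != NULL && spare != NULL && path != NULL);

	while (slot < leaf->nritems && leaf->keys[slot] < key)
		slot++;

	if (ins_len && leaf->nritems == LEAF_CAP && slot == leaf->nritems) {
		path->leaf = spare;
		path->slot = 0;
		return;
	}

	path->leaf = leaf;
	path->slot = slot;
}

/* The previous item is reachable in the same leaf only when slot > 0. */
static int toy_has_prev_item(const struct toy_path *path)
{
	return path->slot > 0;
}
```

Searching a full leaf {10, 20, 30, 40} for key 50 with ins_len = 1 ends on
slot 0 of the spare leaf, so decrementing the slot would step outside it.
With ins_len = 0 the same search ends on slot 4 of the original leaf and
the previous item is still there.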


Re: [PATCH] btrfs-progs: enclose uuid tree compat code with ifdefs

2014-01-22 Thread Wang Shilong

Hi David,

On 01/21/2014 11:56 PM, David Sterba wrote:

Commit "Btrfs-progs: make send/receive compatible with older kernels"
adds code that will become deprecated, let's clearly mark it in the
sources.

CC: Stefan Behrens 
CC: Wang Shilong 
Signed-off-by: David Sterba 
---
  send-utils.c |   28 ++++++++++++++++++++++++++++
  send-utils.h |   10 ++++++++++
  2 files changed, 38 insertions(+), 0 deletions(-)

diff --git a/send-utils.c b/send-utils.c
index 1772d2c5c0f3..8d4f46e3dd04 100644
--- a/send-utils.c
+++ b/send-utils.c
@@ -159,6 +159,7 @@ static int btrfs_read_root_item(int mnt_fd, u64 root_id,
return 0;
  }
  
+#ifdef BTRFS_COMPAT_SEND_NO_UUID_TREE

  static struct rb_node *tree_insert(struct rb_root *root,
   struct subvol_info *si,
   enum subvol_search_type type)
@@ -223,6 +224,7 @@ static struct rb_node *tree_insert(struct rb_root *root,
}
return NULL;
  }
+#endif
  
  int btrfs_subvolid_resolve(int fd, char *path, size_t path_len, u64 subvol_id)

  {
@@ -320,6 +322,7 @@ static int btrfs_subvolid_resolve_sub(int fd, char *path, 
size_t *path_len,
return 0;
  }
  
+#ifdef BTRFS_COMPAT_SEND_NO_UUID_TREE

  static int count_bytes(void *buf, int len, char b)
  {
int cnt = 0;
@@ -416,6 +419,16 @@ static struct subvol_info *subvol_uuid_search_old(struct 
subvol_uuid_search *s,
return NULL;
return tree_search(root, root_id, uuid, transid, path, type);
  }
+#else
+void subvol_uuid_search_add(struct subvol_uuid_search *s,
+   struct subvol_info *si)
+{
+   if (si) {
+   free(si->path);
+   free(si);
+   }
+}
+#endif

I noticed the subvol_uuid_search_add() function before. It is not called
anywhere, and IMO it is also a little strange that we free memory here.

Thanks,
Wang
  
  struct subvol_info *subvol_uuid_search(struct subvol_uuid_search *s,

   u64 root_id, const u8 *uuid, u64 transid,
@@ -426,9 +439,11 @@ struct subvol_info *subvol_uuid_search(struct 
subvol_uuid_search *s,
struct btrfs_root_item root_item;
struct subvol_info *info = NULL;
  
+#ifdef BTRFS_COMPAT_SEND_NO_UUID_TREE

if (!s->uuid_tree_existed)
return subvol_uuid_search_old(s, root_id, uuid, transid,
 path, type);
+#endif
switch (type) {
case subvol_search_by_received_uuid:
ret = btrfs_lookup_uuid_received_subvol_item(s->mnt_fd, uuid,
@@ -481,6 +496,7 @@ out:
return info;
  }
  
+#ifdef BTRFS_COMPAT_SEND_NO_UUID_TREE

  static int is_uuid_tree_supported(int fd)
  {
int ret;
@@ -679,6 +695,18 @@ void subvol_uuid_search_finit(struct subvol_uuid_search *s)
s->received_subvols = RB_ROOT;
s->path_subvols = RB_ROOT;
  }
+#else
+int subvol_uuid_search_init(int mnt_fd, struct subvol_uuid_search *s)
+{
+   s->mnt_fd = mnt_fd;
+
+   return 0;
+}
+
+void subvol_uuid_search_finit(struct subvol_uuid_search *s)
+{
+}
+#endif
  
  char *path_cat(const char *p1, const char *p2)

  {
diff --git a/send-utils.h b/send-utils.h
index 943b0277cf7e..f451c1cb6071 100644
--- a/send-utils.h
+++ b/send-utils.h
@@ -30,6 +30,12 @@
  extern "C" {
  #endif
  
+/*

+ * Compatibility code for kernels < 3.12; the UUID tree is not available there
+ * and we have to do the slow search. This should be deprecated someday.
+ */
+#define BTRFS_COMPAT_SEND_NO_UUID_TREE 1
+
  enum subvol_search_type {
subvol_search_by_root_id,
subvol_search_by_uuid,
@@ -38,10 +44,12 @@ enum subvol_search_type {
  };
  
  struct subvol_info {

+#ifdef BTRFS_COMPAT_SEND_NO_UUID_TREE
struct rb_node rb_root_id_node;
struct rb_node rb_local_node;
struct rb_node rb_received_node;
struct rb_node rb_path_node;
+#endif
  
  	u64 root_id;

u8 uuid[BTRFS_UUID_SIZE];
@@ -57,12 +65,14 @@ struct subvol_info {
  
  struct subvol_uuid_search {

int mnt_fd;
+#ifdef BTRFS_COMPAT_SEND_NO_UUID_TREE
int uuid_tree_existed;
  
  	struct rb_root root_id_subvols;

struct rb_root local_subvols;
struct rb_root received_subvols;
struct rb_root path_subvols;
+#endif
  };
  
  int subvol_uuid_search_init(int mnt_fd, struct subvol_uuid_search *s);
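
The pattern used in the patch above (one macro fences off the fallback
path, its helper and the struct fields only the fallback needs) can be
reduced to a small standalone example. Everything here is hypothetical and
only mirrors the shape of the change: TOY_COMPAT_SLOW_SEARCH stands in for
BTRFS_COMPAT_SEND_NO_UUID_TREE, and the two lookups stand in for the old
rb-tree search and the UUID-tree search.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical compat switch; delete this line to compile the fallback out. */
#define TOY_COMPAT_SLOW_SEARCH 1

struct toy_search {
	int fd;
#ifdef TOY_COMPAT_SLOW_SEARCH
	int fast_index_available;	/* only needed while the fallback exists */
#endif
};

#ifdef TOY_COMPAT_SLOW_SEARCH
/* Fallback: linear scan, works without any index. */
static int toy_lookup_slow(const int *ids, size_t n, int id)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (ids[i] == id)
			return (int)i;
	return -1;
}
#endif

/* Preferred path: binary search over ids sorted in ascending order. */
static int toy_lookup_fast(const int *ids, size_t n, int id)
{
	size_t lo = 0, hi = n;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (ids[mid] < id)
			lo = mid + 1;
		else
			hi = mid;
	}
	return (lo < n && ids[lo] == id) ? (int)lo : -1;
}

/* Public entry point: the caller never sees which path was compiled in. */
static int toy_lookup(const struct toy_search *s, const int *ids, size_t n,
		      int id)
{
	assert(s != NULL);
#ifdef TOY_COMPAT_SLOW_SEARCH
	if (!s->fast_index_available)
		return toy_lookup_slow(ids, n, id);
#endif
	return toy_lookup_fast(ids, n, id);
}
```

Keeping the entry point's signature identical in both configurations means
callers need no #ifdef of their own, so the fallback can later be dropped
by removing the macro definition and the fenced blocks.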

