Re: Can btrfs silently repair read-error in raid1

2012-05-09 Thread Atila

On 08-05-2012 18:47, Hubert Kario wrote:

On Tuesday 08 of May 2012 04:45:51 cwillu wrote:

On Tue, May 8, 2012 at 1:36 AM, Fajar A. Nugraha <l...@fajar.net> wrote:

On Tue, May 8, 2012 at 2:13 PM, Clemens Eisserer <linuxhi...@gmail.com>

wrote:

Hi,

I have a quite unreliable SSD here which develops some bad blocks from
time to time, which result in read-errors.
Once a block is written to again, it's remapped internally and
everything is fine again for that block.

Would it be possible to create 2 btrfs partitions on that drive and
use them in RAID1 - with btrfs silently repairing read-errors when they
occur?
Would it require special settings to not fall back to read-only mode
when a read-error occurs?

The problem would be how the SSD (and linux) behaves when it
encounters bad blocks (not bad disks, which is easier).

If it does "oh, I can't read this block, I'll just return an error
immediately", then it's good.

However, in most situations it would be more like "hmmm, I can't read this
block, let me retry that again. What? Still an error? Then let's retry it
again, and again.", which could take several minutes for a single bad
block. And during that time Linux (the kernel) would do something like
"hey, the disk is not responding. Why don't we try some stuff? Let's
try resetting the link. If that doesn't work, try downgrading the link
speed."

In short, if you KNOW the SSD is already showing signs of bad blocks,
better just throw it away.

The excessive number of retries (basically, the kernel repeating the
work the drive already attempted) is being addressed in the block
layer.

"[PATCH] libata-eh don't waste time retrying media errors (v3)" - I
believe this is queued for 3.5.

I just hope they don't remove retries completely; I've seen the second or
third try return correct data on multiple disks from different vendors.
(Which allowed me to use dd to write the data back to force relocation)
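
(For anyone curious, the dd round-trip I mean looks roughly like this - the
device and sector number are purely an example, and the data written back has
to come from a known-good copy, e.g. the other RAID-1 member:)

# dd if=/dev/sda of=/dev/null bs=512 skip=123456789 count=1
(fails with a read error on the bad sector)
# dd if=good-copy.bin of=/dev/sda bs=512 seek=123456789 count=1
(the write makes the drive remap the sector; re-read it afterwards to verify)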

But yes, Linux is a bit too overzealous with regards to retries...

Regards,
I hope they do. If you wish, you can force the retry just by trying your
command again. This decision should happen at a higher level.



failed disk (was: kernel 3.3.4 damages filesystem (?))

2012-05-09 Thread Helmut Hullen
Hello, Hugo,

You wrote on 07.05.12:

mkfs.btrfs -m raid1 -d single should give you that.

 What's the difference to

  mkfs.btrfs -m raid1 -d raid0

  - RAID-0 stripes each piece of data across all the disks.
  - single puts data on one disk at a time.

[...]


In fact, this is probably a good argument for having the option to
 put back the old allocator algorithm, which would have ensured that
 the first disk would fill up completely first before it touched the
 next one...

The current version seems to oscillate from disk to disk:

Copying about 160 GiByte shows

Label: none  uuid: fd0596c6-d819-42cd-bb4a-420c38d2a60b
Total devices 2 FS bytes used 155.64GB
devid2 size 136.73GB used 114.00GB path /dev/sdl1
devid1 size 68.37GB used 45.04GB path /dev/sdk1

Btrfs Btrfs v0.19



Watching the usage showed that both disks are being filled nearly
simultaneously.

That would be more difficult to restore ...

Best regards!
Helmut


failed disk (was: kernel 3.3.4 damages filesystem (?))

2012-05-09 Thread Helmut Hullen
Hello, Hugo,

You wrote on 07.05.12:

[...]

 With a file system like ext2/3/4 I can work with several directories
 which are mounted together, but (as said before) one broken disk
 doesn't disturb the others.

mkfs.btrfs -m raid1 -d single should give you that.

Just a small bug, perhaps:

created a system with

mkfs.btrfs -m raid1 -d single /dev/sdl1
mount /dev/sdl1 /mnt/Scsi
btrfs device add /dev/sdk1 /mnt/Scsi
btrfs device add /dev/sdm1 /mnt/Scsi
(filling with data)

and

btrfs fi df /mnt/Scsi

now tells

Data, RAID0: total=183.18GB, used=76.60GB
Data: total=80.01GB, used=79.83GB
System, DUP: total=8.00MB, used=32.00KB
System: total=4.00MB, used=0.00
Metadata, DUP: total=1.00GB, used=192.74MB
Metadata: total=8.00MB, used=0.00

--

"Data, RAID0" confuses me (though not very much ...), and the RAID1
profile for the metadata is not shown.


Best regards!
Helmut


Re: failed disk (was: kernel 3.3.4 damages filesystem (?))

2012-05-09 Thread Hugo Mills
On Wed, May 09, 2012 at 04:25:00PM +0200, Helmut Hullen wrote:
 You wrote on 07.05.12:
 
 [...]
 
  With a file system like ext2/3/4 I can work with several directories
  which are mounted together, but (as said before) one broken disk
  doesn't disturb the others.
 
 mkfs.btrfs -m raid1 -d single should give you that.
 
 Just a small bug, perhaps:
 
 created a system with
 
 mkfs.btrfs -m raid1 -d single /dev/sdl1
 mount /dev/sdl1 /mnt/Scsi
 btrfs device add /dev/sdk1 /mnt/Scsi
 btrfs device add /dev/sdm1 /mnt/Scsi
 (filling with data)
 
 and
 
 btrfs fi df /mnt/Scsi
 
 now tells
 
 Data, RAID0: total=183.18GB, used=76.60GB
 Data: total=80.01GB, used=79.83GB
 System, DUP: total=8.00MB, used=32.00KB
 System: total=4.00MB, used=0.00
 Metadata, DUP: total=1.00GB, used=192.74MB
 Metadata: total=8.00MB, used=0.00
 
 --
 
 Data, RAID0 confuses me (not very much ...), and the system for  
 metadata (RAID1) is not told.

   DUP is two copies of each block, but it allows the two copies to
live on the same device. It's done this because you started with a
single device, and you can't do RAID-1 on one device. The first bit of
metadata you write to it should automatically upgrade the DUP chunk to
RAID-1.

   As to the spurious upgrade of single to RAID-0, I thought Ilya
had stopped it doing that. What kernel version are you running?

   Out of interest, why did you do the device adds separately, instead
of just this?

# mkfs.btrfs -m raid1 -d single /dev/sdl1 /dev/sdk1 /dev/sdm1

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
   --- Comic Sans goes into a bar,  and the barman says, "We don't ---
 serve your type here."




Re: [PATCH 1/5] btrfs: extend readahead interface

2012-05-09 Thread David Sterba
On Thu, Apr 12, 2012 at 05:54:38PM +0200, Arne Jansen wrote:
 @@ -97,30 +119,87 @@ struct reada_machine_work {
 +/*
 + * this is the default callback for readahead. It just descends into the
 + * tree within the range given at creation. if an error occurs, just cut
 + * this part of the tree
 + */
 +static void readahead_descend(struct btrfs_root *root, struct reada_control *rc,
 +   u64 wanted_generation, struct extent_buffer *eb,
 +   u64 start, int err, struct btrfs_key *top,
 +   void *ctx)
 +{
 + int nritems;
 + u64 generation;
 + int level;
 + int i;
 +
 + BUG_ON(err == -EAGAIN); /* FIXME: not yet implemented, don't cancel
 +  * readahead with default callback */
 +
 + if (err || eb == NULL) {
 + /*
 +  * this is the error case, the extent buffer has not been
 +  * read correctly. We won't access anything from it and
 +  * just cleanup our data structures. Effectively this will
 +  * cut the branch below this node from read ahead.
 +  */
 + return;
 + }
 +
 + level = btrfs_header_level(eb);
 + if (level == 0) {
 + /*
 +  * if this is a leaf, ignore the content.
 +  */
 + return;
 + }
 +
 + nritems = btrfs_header_nritems(eb);
 + generation = btrfs_header_generation(eb);
 +
 + /*
 +  * if the generation doesn't match, just ignore this node.
 +  * This will cut off a branch from prefetch. Alternatively one could
 +  * start a new (sub-) prefetch for this branch, starting again from
 +  * root.
 +  */
 + if (wanted_generation != generation)
 + return;

I think I saw wanted_generation = 0 being passed somewhere, but cannot
find it again now. Is it an expected value for the default RA callback,
meaning e.g. 'any generation I find'?

 +
 + for (i = 0; i < nritems; i++) {
 + u64 n_gen;
 + struct btrfs_key key;
 + struct btrfs_key next_key;
 + u64 bytenr;
 +
 + btrfs_node_key_to_cpu(eb, &key, i);
 + if (i + 1 < nritems)
 + btrfs_node_key_to_cpu(eb, &next_key, i + 1);
 + else
 + next_key = *top;
 + bytenr = btrfs_node_blockptr(eb, i);
 + n_gen = btrfs_node_ptr_generation(eb, i);
 +
 + if (btrfs_comp_cpu_keys(&key, &rc->key_end) < 0 &&
 +     btrfs_comp_cpu_keys(&next_key, &rc->key_start) > 0)
 + reada_add_block(rc, bytenr, &next_key,
 + level - 1, n_gen, ctx);
 + }
 +}
  
 @@ -142,65 +221,21 @@ static int __readahead_hook(struct btrfs_root *root, 
 struct extent_buffer *eb,
   re->scheduled_for = NULL;
   spin_unlock(&re->lock);
  
 - if (err == 0) {
 - nritems = level ? btrfs_header_nritems(eb) : 0;
 - generation = btrfs_header_generation(eb);
 - /*
 -  * FIXME: currently we just set nritems to 0 if this is a leaf,
 -  * effectively ignoring the content. In a next step we could
 -  * trigger more readahead depending from the content, e.g.
 -  * fetch the checksums for the extents in the leaf.
 -  */
 - } else {
 + /*
 +  * call hooks for all registered readaheads
 +  */
 + list_for_each_entry(rec, &list, list) {
 + btrfs_tree_read_lock(eb);
   /*
 -  * this is the error case, the extent buffer has not been
 -  * read correctly. We won't access anything from it and
 -  * just cleanup our data structures. Effectively this will
 -  * cut the branch below this node from read ahead.
 +  * we set the lock to blocking, as the callback might want to
 +  * sleep on allocations.

What about giving finer control to the callbacks? The blocking
lock may be unnecessary if the callback does not sleep.

My idea is to add a field to 'struct reada_uptodate_ctx', preset with
BTRFS_READ_LOCK by default, but let the RA user set it to its needs.

*/
 - nritems = 0;
 - generation = 0;
 + btrfs_set_lock_blocking_rw(eb, BTRFS_READ_LOCK);
 + rec->rc->callback(root, rec->rc, rec->generation, eb, start,
 +   err, &re->top, rec->ctx);
 + btrfs_tree_read_unlock_blocking(eb);
   }
  
 @@ -521,12 +593,87 @@ static void reada_control_release(struct kref *kref)
 +/*
 + * context to pass from reada_add_block to worker in case the extent is
 + * already uptodate in memory
 + */
 +struct reada_uptodate_ctx {
 + struct btrfs_key top;
 + struct extent_buffer *eb;
 + struct reada_control *rc;
 + u64 logical;
 + u64 

Re: kernel 3.3.4 damages filesystem (?)

2012-05-09 Thread Kaspar Schleiser

Hi,

On 05/08/2012 10:56 PM, Roman Mamedov wrote:

Regarding btrfs, AFAIK even btrfs -d single suggested above works not per
file, but per allocation extent, so in case of one disk failure you will lose
random *parts* (extents) of random files, which in effect could mean no file
in your whole file system will remain undamaged.
Maybe we should evaluate the possibility of such a "one file goes on
one disk" feature.


Helmut Hullen has the use case: many disks, totally non-critical but 
nice-to-have data. If one disk dies, some *files* should be lost, not some 
*random parts of all files*.


This could be accomplished by some userspace tool that moves stuff 
around, combined with file-pinning support that lets the user make 
sure a specific file is on a specific disk.


Cheers
Kaspar





Re: failed disk

2012-05-09 Thread Helmut Hullen
Hello, Hugo,

You wrote on 09.05.12:

mkfs.btrfs -m raid1 -d single should give you that.

 Just a small bug, perhaps:

 created a system with

 mkfs.btrfs -m raid1 -d single /dev/sdl1
 mount /dev/sdl1 /mnt/Scsi
 btrfs device add /dev/sdk1 /mnt/Scsi
 btrfs device add /dev/sdm1 /mnt/Scsi
 (filling with data)

 and

 btrfs fi df /mnt/Scsi

 now tells

 Data, RAID0: total=183.18GB, used=76.60GB
 Data: total=80.01GB, used=79.83GB
 System, DUP: total=8.00MB, used=32.00KB
 System: total=4.00MB, used=0.00
 Metadata, DUP: total=1.00GB, used=192.74MB
 Metadata: total=8.00MB, used=0.00

 --

 Data, RAID0 confuses me (not very much ...), and the system for
 metadata (RAID1) is not told.

DUP is two copies of each block, but it allows the two copies to
 live on the same device. It's done this because you started with a
 single device, and you can't do RAID-1 on one device. The first bit
 of metadata you write to it should automatically upgrade the DUP
 chunk to RAID-1.

Ok.

Sounds familiar - did you explain that to me many months ago?

As to the spurious upgrade of single to RAID-0, I thought Ilya
 had stopped it doing that. What kernel version are you running?

3.2.9, self-built.
I could test the message with 3.3.4, but not today (if it's only an
interpretation of the same data anyway).

Out of interest, why did you do the device adds separately,
 instead of just this?

a) making the first 2 devices: I have tested both versions (one line
with 2 devices, or 2 lines with 1 device each); no big difference.

But I had also tested the option -L (labelling), and that goes wrong with
the one-liner: both devices get the same label, and then findfs
finds neither of them.

The really safe way would be to drop this option from the mkfs.btrfs
command and only use

btrfs fi label <device> [newlabel]

b) third device: that's my usual test:
make a cluster of 2 devices
fill them with data
add a third device
delete the smallest device

Best regards!
Helmut


Re: failed disk

2012-05-09 Thread Hugo Mills
On Wed, May 09, 2012 at 05:14:00PM +0200, Helmut Hullen wrote:
 Hello, Hugo,
 
 You wrote on 09.05.12:
 
 DUP is two copies of each block, but it allows the two copies to
  live on the same device. It's done this because you started with a
  single device, and you can't do RAID-1 on one device. The first bit
  of metadata you write to it should automatically upgrade the DUP
  chunk to RAID-1.
 
 Ok.
 
 Sounds familiar - have you explained that to me many months ago?

   Probably. I tend to explain this kind of thing a lot to people.

 As to the spurious upgrade of single to RAID-0, I thought Ilya
  had stopped it doing that. What kernel version are you running?
 
 3.2.9, self made.

   OK, I'm pretty sure that's too old -- it will upgrade single to
RAID-0. You can probably turn it back to single using balance
filters:

# btrfs fi balance -dconvert=single /mountpoint

(You may want to write at least a little data to the FS first --
balance has some slightly odd behaviour on empty filesystems).

 I could test the message with 3.3.4, but not today (if it's only an  
 interpretation of always the same data).
 
 Out of interest, why did you do the device adds separately,
  instead of just this?
 
 a) making the first 2 devices: I have tested both versions (one line  
 with 2 devices or 2 lines with 1 device); no big difference.
 
 But I had tested the option -L (labelling) too, and that makes shit  
 for the oneliner: both devices get the same label, and then findfs  
 finds none of them.

   Umm... Yes, of course both devices will get the same label --
you're labelling the filesystem, not the devices. (Didn't we have this
argument some time ago?).

   I don't know what findfs is doing, that it can't find the
filesystem by label: you may need to run sync after mkfs, possibly.

 The really safe way would be: deleting this option for the mkfs.btrfs  
 command and only using
 
 btrfs fi label device [newlabel]

   ... except that it'd have to take a filesystem as parameter, not a
device (see above).

 b) third device: that's my usual test:
 make a cluster of 2 deivces
 fill them with data
 add a third device
 delete the smallest device

   What are you testing? And by "delete" do you mean "btrfs dev
delete" or pull the cable out?

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
   --- Quidquid latine dictum sit,  altum videtur. ---   




Re: failed disk (was: kernel 3.3.4 damages filesystem (?))

2012-05-09 Thread Ilya Dryomov
On Wed, May 09, 2012 at 03:37:35PM +0100, Hugo Mills wrote:
 On Wed, May 09, 2012 at 04:25:00PM +0200, Helmut Hullen wrote:
  You wrote on 07.05.12:
  
  [...]
  
   With a file system like ext2/3/4 I can work with several directories
   which are mounted together, but (as said before) one broken disk
   doesn't disturb the others.
  
  mkfs.btrfs -m raid1 -d single should give you that.
  
  Just a small bug, perhaps:
  
  created a system with
  
  mkfs.btrfs -m raid1 -d single /dev/sdl1
  mount /dev/sdl1 /mnt/Scsi
  btrfs device add /dev/sdk1 /mnt/Scsi
  btrfs device add /dev/sdm1 /mnt/Scsi
  (filling with data)
  
  and
  
  btrfs fi df /mnt/Scsi
  
  now tells
  
  Data, RAID0: total=183.18GB, used=76.60GB
  Data: total=80.01GB, used=79.83GB
  System, DUP: total=8.00MB, used=32.00KB
  System: total=4.00MB, used=0.00
  Metadata, DUP: total=1.00GB, used=192.74MB
  Metadata: total=8.00MB, used=0.00
  
  --
  
  Data, RAID0 confuses me (not very much ...), and the system for  
  metadata (RAID1) is not told.
 
DUP is two copies of each block, but it allows the two copies to
 live on the same device. It's done this because you started with a
 single device, and you can't do RAID-1 on one device. The first bit of

What Hugo said.  Newer mkfs.btrfs will error out if you try to do this.

 metadata you write to it should automatically upgrade the DUP chunk to
 RAID-1.

We don't upgrade chunks in place, only during balance.

 
As to the spurious upgrade of single to RAID-0, I thought Ilya
 had stopped it doing that. What kernel version are you running?

I did, but again, we were doing it only as part of balance, not as part
of normal operation.
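
(For reference, converting the existing chunks explicitly is done with balance
filters - a rough sketch, mount point assumed, and the exact syntax depends on
the btrfs-progs version in use:)

# btrfs fi balance start -dconvert=single -mconvert=raid1 /mnt/Scsi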

Helmut, do you have any additional data points - the output of btrfs fi
df right after you created the FS, or somewhere in the middle of filling it?

Also, could you please paste the output of btrfs fi show and tell us what
kernel version you are running?

Thanks,

Ilya


Re: [PATCH] Btrfs: use ALIGN macro instead of open-coded expression

2012-05-09 Thread David Sterba
On Tue, May 08, 2012 at 04:16:24PM +0800, Yuanhan Liu wrote:
 According to section 'Find open-coded helpers or macros' at
 https://btrfs.wiki.kernel.org/index.php/Cleanup_ideas, here
 in the patch we use ALIGN macro to do the alignment.

Well, I wrote this section and some time later also the patches,
http://www.spinics.net/lists/linux-btrfs/msg12747.html

but did not update the section to reflect this, sorry
that you duplicated the work.


david


Re: Subdirectory creation on snapshot

2012-05-09 Thread David Sterba
On Mon, May 07, 2012 at 05:10:08PM -0700, Brendan Smithyman wrote:
 I'm experiencing some odd-seeming behaviour with btrfs on Ubuntu
 12.04, using the Ubuntu x86-64 generic 3.2.0-24 kernel and btrfs-tools
 0.19+20120328-1~precise1 (backported from the current Debian version
 using Ubuntu's backportpackage).  When I snapshot a subvolume on some
 of my drives, it creates an empty directory inside the snapshot with
 the same name as the original subvolume.  Example case and details
 below:

This is known and it's not a problem, though I was surprised when I
first saw it myself. Snapshotting is not recursive: the file->file and
directory->directory cases are straightforward, but when a subvolume is
encountered, a new file sub-type is created. It is identified internally by
BTRFS_EMPTY_SUBVOL_DIR_OBJECTID, so it's a kind of stub subvolume, and it
shows up with inode number 2 in stat output. The object cannot be modified
and just sits there.

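(For illustration, such a stub entry can be spotted from userspace by its inode
number - the path here is just an example and the output is abridged:)

# stat /mnt/snap/@working
  File: '/mnt/snap/@working'
  Size: 0         Blocks: 0          IO Block: 4096   directory
Device: 1bh/27d   Inode: 2   Links: 1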

david


Re: [PATCH 3/3] btrfs: extended inode refs

2012-05-09 Thread Chris Mason
On Tue, May 08, 2012 at 03:57:39PM -0700, Mark Fasheh wrote:
 Hi Jan, comments inline as usual!
 
  This function must not call free_extent_buffer(eb) in line 1306 after
  applying your patch set (immediately before the break). Second, I think
  we'd better add a blocking read lock on eb after incrementing it's
  refcount, because we need the current content to stay as it is. Both
  isn't part of your patches, but it might be easier if you make that
  bugfix change as a 3/4 patch within your set and turn this one into 4/4.
  If you don't like that, I'll send a separate patch for it. Don't miss
  the unlock if you do it ;-)
 
 Ok, I think I was able to figure out and add the correct locking calls.
 
 Basically I believe I need to wrap access around:
 
 btrfs_tree_read_lock(eb);
 btrfs_set_lock_blocking_rw(eb, BTRFS_READ_LOCK);
 
 read eb contents
 
 btrfs_tree_read_unlock_blocking(eb);

You only need a blocking lock if you're scheduling.  Otherwise the
spinlock variant is fine.

   +
   + while (1) {
  + ret = btrfs_find_one_extref(fs_root, inum, offset, path, &iref2,
  + &offset);
  + if (ret < 0)
   + break;
   + if (ret) {
   + ret = found ? 0 : -ENOENT;
   + break;
   + }
   + ++found;
   +
  + slot = path->slots[0];
  + eb = path->nodes[0];
   + /* make sure we can use eb after releasing the path */
  + atomic_inc(&eb->refs);
  
  You need a blocking read lock here, too. Grab it before releasing the path.

If you're calling btrfs_search_slot, it will give you a blocking lock
on the leaf.  If you set path->leave_spinning before the call, you'll
have a spinning lock on the leaf.

If you unlock a block that you got from a path (like eb =
path->nodes[0]), note that the path structure has a flag for each level that
indicates whether that block was locked or not.  See btrfs_release_path().
So, don't fiddle with the locks without fiddling with the paths.

You can switch between spinning and blocking without touching the path; it
figures that out.

-chris



Re: [PATCH v2 1/5] btrfs: add command to zero out superblock

2012-05-09 Thread David Sterba
On Thu, May 03, 2012 at 03:11:45PM +0200, Hubert Kario wrote:
 nice, didn't know about this. Such functionality would be nice to have.
 But then I don't think that a "recreate the array if the parameters are the
 same" approach is actually a good idea, lots of room for error. A pair of functions:
 
 btrfs dev zero-superblock
 btrfs dev restore-superblock

As a user, I'm not sure what I can expect from the restore command. Where
does it restore from? E.g. a file?

As a tester I have a use for temporarily clearing the superblock on a
device, then mounting with -o degraded, doing some work, and then undoing
the clearing. So my idea is something like

  btrfs device zero-superblock --undo

with the obvious sanity checks. A regular user would never need to call
this.
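
As a sketch of that tester workflow (note: zero-superblock and its --undo
switch are only the proposal above, they do not exist yet, and the device
names are purely an example):

# btrfs device zero-superblock /dev/sdb1
# mount -o degraded /dev/sdc1 /mnt
  ... work work ...
# umount /mnt
# btrfs device zero-superblock --undo /dev/sdb1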


david


Re: [PATCH v2 1/5] btrfs: add command to zero out superblock

2012-05-09 Thread Hubert Kario
On Wednesday 09 of May 2012 19:18:07 David Sterba wrote:
 On Thu, May 03, 2012 at 03:11:45PM +0200, Hubert Kario wrote:
  nice, didn't know about this. Such functionality would be nice to have.
  But then I don't think that a recreate the array if the parameters are
  the
  same is actually a good idea, lots of space for error. A pair of
  functions:
  
  btrfs dev zero-superblock
  btrfs dev restore-superblock
 
 As a user, I'm not sure what can I expect from the restore command. From
 where does it restore? Eg. a file?
 
 As a tester I have use for a temporary clearing of a superblock on a
 device, then mount it with -o degraded, work work, and then undo
 clearing. So, my idea is like
 
   btrfs device zero-superblock --undo
 
 with the obvious sanity checks. A regular user would never need to call
 this.

Yes, that's a better idea.

-- 
Hubert Kario
QBS - Quality Business Software
02-656 Warszawa, ul. Ksawerów 30/85
tel. +48 (22) 646-61-51, 646-74-24
www.qbs.com.pl


Re: [RFC PATCH v2] Btrfs: improve space count for files with fragments

2012-05-09 Thread David Sterba
On Fri, Apr 27, 2012 at 09:44:13AM +0800, Liu Bo wrote:
 Let's take the above case:
 0k                           20k
 |---------- extent ----------|
     |--------- A --------|
     1k                  19k
 
 And we assume that this extent starts from disk_bytenr on its FS logical 
 offset.
 
 By splitting the [0k, 20k) extent, we'll get three delayed refs into the 
 delayed-ref rbtree:
 a) [0k, 20k),  in which only [disk_bytenr+1k, disk_bytenr+19k) will be freed 
 at the end.
 b) [0k, 1k),   which will _not_ allocate a new extent but use the remained 
 space of [0k, 20k).
 c) [19k, 20k), ditto.
 
 And another ref [1k,19k) will get a new allocated space by our normal endio 
 routine.
 
 What I want is
 free  [0k, 20k),  set this range DIRTY in the pinned_extents tree.
 alloc [0k, 1k),   clear this range DIRTY in the pinned_extents tree.
 alloc [19k, 20k), ditto.
 
 However, in my stress test, this three refs may not be ordered by a)-b)-c), 
 but b)-a)-c) instead.
 That would be a problem, because it will confuse our space_info's counter: 
 bytes_reserved, bytes_pinned.

Do you have an idea why the ordering may become broken? If it's a race,
it might be better to fix it instead of adding a new bit to extent
flags.


david


Re: kernel 3.3.4 damages filesystem (?)

2012-05-09 Thread Duncan
Helmut Hullen posted on Mon, 07 May 2012 12:46:00 +0200 as excerpted:

 The 3 btrfs disks are connected via a SiI 3114 SATA-PCI-Controller.
 Only 1 of the 3 disks seems to be damaged.

I don't plan to rehash the raid0/single discussion here, but here's some 
perhaps useful additional information on that hardware:


For some years I've been running that same hardware, SiI 3114 SATA PCI, 
on an old dual-socket 3-digit Opteron system, running for some years now 
dual dual-core Opteron 290s (the highest they went, 2.8 GHz, 4 cores in 
two sockets).  However, I *WAS* running them in RAID-1, 4-disk md RAID-1, 
to be exact (with reiserfs, FWIW).


What's VERY interesting is that I've just returned from being offline for 
several days due to severe disk-I/O hardware issues of my own -- again, 
on that Sil-SATA 3114.

Most of the time I was getting full system crashes, but perhaps 25-33% of 
the time it didn't fully crash the system, but simply errored out with an 
eventual ATA reset.  When the system didn't crash immediately, most of 
the time (about 80%, I'd say) the reset would be good and I'd be back up, 
but sometimes it'd repeatedly reset, occasionally never becoming 
usable again.

As the drives are all the same quite old Seagate 300 gig drives, at about 
half their rated SMART operating hours but I think well beyond the 5 year 
warranty, I originally thought I'd just learned my lesson on the "don't 
use all the same model or you're risking them all going out at once" rule. 
But I bought a new drive (half-TB Seagate 2.5" drive; I've been thinking 
about going 2.5" for a while now and this was the chance; I'll RAID it 
later with at least one more, preferably a different run at least if not 
a different model) and have been SLOWLY, PAINFULLY, RESETTINGLY copying 
stuff over from one or another of the four RAID-1 drives.

The reset problem, however, hasn't gone away, tho it's rather reduced on 
the newer hardware.

I also happened to have a 4x3.5"-in-3x5.25"-slot drive enclosure that 
seemed to be making the problem worse: when I first tried the new 2.5 
inch drive retrofitted into it, the reset problem was as bad as with 
the old drives, but when I ran it loose, just cabled into the mobo and 
power supply directly, resets went down significantly but did NOT go away.


So... I've now concluded that I need a new controller and will probably 
buy one in a day or two.

Meanwhile, I THOUGHT it was just me with the SIL-SATA controller, until 
I happened to see the same hardware mentioned on this thread.


Now, I'm beginning to suspect that there's some new kernel DMA or storage 
or perhaps xorg/mesa problem (AMD AGPGART, after all, handling the DMA using 
half the aperture: if either the graphics or storage try writing to the wrong 
half...) that stressed what was already aging hardware, 
triggering the problem.  It's worth noting that I tried running an older 
kernel and rebuilding (on Gentoo) most of X/mesa/anything-else-I-could-
think-might-be-related, between older versions that WERE working fine 
before and newer versions, and reverting to older didn't help, so it's 
apparently NOT a direct software-only bug.  However, what I'm wondering 
now is whether, as I said, software upgrades added stress to already aging 
hardware, such that it tipped over the edge, and by the time I tried 
reverting, I'd already had enough crashes etc. that my entire system 
was unstable, and reverting to older software didn't help because now the 
hardware was unstable as well.

I'd still chalk it up to simply failing hardware, except that it's a 
rather interesting coincidence that both you and I had our SiI-SATA 
3114s go bad at very close to the same time.


Meanwhile, I did recently see an interesting kernel commit, either late 
3.4-rc5+ or early 3.4-rc6+.  I don't want to try to track it down and 
lose this post to a crash on a less than stable system, but it did 
mention that AMD AGPGARTs sometimes poked holes in memory allocations and 
the commit was to try to allow for that.  I'm not sure how long the bad 
code had been in the kernel, but if it was introduced in, say, the 3.2 or 
3.3 kernel, it could be that this is what first started triggering the lockups 
that led to more and more system instability, until now I've bought a 
new drive and it looks like I'm going to need to replace the onboard SiI-
SATA.

So, some questions:

* Do you run OpenGL/Mesa at all on that system, possibly with an OpenGL 
compositing window manager?

* If so, how new is your mesa and xorg-server, and what is your video 
card/driver?

* Do you run quite new kernels, say 3.3/3.4?

* What libffi and cairo versions? (I did notice reverting libffi seemed to 
lessen the crashing a bit, especially with Firefox on my bank's SSL site, which 
was where the problem first became ugly for me as I kept crashing trying 
to get in to pay bills, etc., but I'm not positive that's related; it might 
be that the likely otherwise-separate bug's crashes advanced the ATA-
resets issue 

Re: [ANN] btrfs.wiki.kernel.org with up-to-date content again

2012-05-09 Thread Duncan
David Sterba posted on Mon, 07 May 2012 17:44:16 +0200 as excerpted:

 Hi,
 
 the time of temporary wiki hosted at btrfs.ipv5.de is over, the content
 has been migrated back to official site at
 
  http://btrfs.wiki.kernel.org
 
 (ipv5.de wiki is set to redirect there).

Thanks.  I was checking it a couple of days ago and noticed the "migrating 
back" notice.  Then last night I was looking up something else, and 
noticed the redirect back to kernel.org. =:^)

-- 
Duncan - List replies preferred.   No HTML msgs.
Every nonfree program has a lord, a master --
and if you use the program, he is your master.  Richard Stallman



Re: Subdirectory creation on snapshot

2012-05-09 Thread Brendan Smithyman
Thanks David,

If I understand you correctly, this would be the case with nested subvolumes; 
i.e., if subvolume A exists within the directory tree of subvolume B, and B is 
snapshotted.  I expected this, and it sounds totally consistent with my 
understanding of how btrfs subvolumes work.  However, the behaviour I'm seeing 
seems to be a different thing, so I just want to double-check:

In my case I am executing the btrfs subvolume snapshot @working newsnapshot 
command (or something like it).  The @working subvolume exists in the 
filesystem root, and does not contain any other subvolumes within its own 
subdirectory tree.  In the new subvolume, newsnapshot, there is an entry 
called @working that is identified as inode number 2 as you say.  But this 
isn't due to a subvolume in the directory tree of the original @working, 
since it still happens, e.g., if it is the only subvolume on the system (apart 
from the root, of course).

The naive assumption is that (excepting nested subvolumes), the snapshot should 
be indistinguishable from the original.  Additionally, I'm a bit perplexed by 
the behaviour on some of my volumes and not others.  It's not a big deal, and 
I'm happy to take your word for it (or look at the code, if you'd be willing to 
point me in the right direction; I'm not averse to learning).  I just wanted to 
double-check that we're talking about the same thing.

I appreciate the help!

Cheers,
Brendan

On 2012-05-09, at 9:58 AM, David Sterba wrote:

 On Mon, May 07, 2012 at 05:10:08PM -0700, Brendan Smithyman wrote:
 I'm experiencing some odd-seeming behaviour with btrfs on Ubuntu
 12.04, using the Ubuntu x86-64 generic 3.2.0-24 kernel and btrfs-tools
 0.19+20120328-1~precise1 (backported from the current Debian version
 using Ubuntu's backportpackage).  When I snapshot a subvolume on some
 of my drives, it creates an empty directory inside the snapshot with
 the same name as the original subvolume.  Example case and details
 below:
 
 This is known and it's not a problem, though I was surprised when I had
 first seen it myself. Snapshotting is not recursive, the case of
 file-file, directory-directory is straightforward, and when a
 subvolume is encountered, a new file sub-type is created, it's
 identified by BTRFS_EMPTY_SUBVOL_DIR_OBJECTID internally, so it's a kind
 of a stub subvolume.  It is identified by inode number 2 in stat
 output. The object cannot be modified and just sits there.
 
 
 david





Re: kernel 3.3.4 damages filesystem (?)

2012-05-09 Thread Atila
I don't know if this is related or not, but I updated two different 
computers to Ubuntu 12.04, which uses kernel 3.2, and on both I had the 
same problem: using btrfs with compress-force=lzo, after some IO stress 
the filesystem became unusable, stuck in some sort of "busy" state.

I'm using kernel 3.0 right now, with no such problem.

On 09-05-2012 14:32, Duncan wrote:

[...]

Re: failed disk

2012-05-09 Thread Helmut Hullen
Hello, Hugo,

You wrote on 09.05.12:

As to the spurious upgrade of single to RAID-0, I thought Ilya
 had stopped it doing that. What kernel version are you running?

 3.2.9, self made.

OK, I'm pretty sure that's too old -- it will upgrade single to
 RAID-0. You can probably turn it back to single using balance
 filters:

 # btrfs fi balance -dconvert=single /mountpoint

 (You may want to write at least a little data to the FS first --
 balance has some slightly odd behaviour on empty filesystems).

Mañana ... the system is still running the balance after the device delete.
And that may still need 4 ... 5 hours.

Out of interest, why did you do the device adds separately,
 instead of just this?

 a) making the first 2 devices: I have tested both versions (one line
 with 2 devices or 2 lines with 1 device); no big difference.

 But I had tested the option -L (labelling) too, and that makes
 shit for the oneliner: both devices get the same label, and then
 findfs finds none of them.

Umm... Yes, of course both devices will get the same label --
 you're labelling the filesystem, not the devices. (Didn't we have
 this argument some time ago?).

Not with that special case (and that led me to misinterpret the error
...).

I don't know what findfs is doing, that it can't find the
 filesystem by label: you may need to run sync after mkfs, possibly.

No - findfs works quite simply: if it finds exactly 1 matching label, it
tells you the partition.
If it finds more or fewer, it tells you nothing.

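(For example - the label is assumed, and the more-than-one-match behaviour is
as described above:)

# findfs LABEL=Scsi
/dev/sdl1
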
 b) third device: that's my usual test:
 make a cluster of 2 deivces
 fill them with data
 add a third device
 delete the smallest device

What are you testing? And by delete do you mean btrfs dev
 delete or pull the cable out?

First a pure software delete. Tomorrow I'll reboot the system and look at
the results with

btrfs fi show

It should show only 2 devices (that's the part which seems to work as
described, at least since kernel 3.2).

By the way: it seems to be necessary to run

btrfs fi balance ...

after "btrfs device add ..." and after "btrfs device delete ..."

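As a sketch of that sequence (mount point and device names assumed; device
delete already relocates its data by itself, the extra balance is what seems
to be needed in practice here):

# btrfs device add /dev/sdm1 /mnt/Scsi
# btrfs fi balance /mnt/Scsi
# btrfs device delete /dev/sdk1 /mnt/Scsi
# btrfs fi balance /mnt/Scsi
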
Best regards!
Helmut


Re: Ceph on btrfs 3.4rc

2012-05-09 Thread Josef Bacik
On Fri, May 04, 2012 at 10:24:16PM +0200, Christian Brunner wrote:
 2012/5/3 Josef Bacik <jo...@redhat.com>:
  On Thu, May 03, 2012 at 09:38:27AM -0700, Josh Durgin wrote:
  On Thu, 3 May 2012 11:20:53 -0400, Josef Bacik <jo...@redhat.com>
  wrote:
   On Thu, May 03, 2012 at 08:17:43AM -0700, Josh Durgin wrote:
  
   Yeah all that was in the right place, I rebooted and I magically
   stopped getting
   that error, but now I'm getting this
  
   http://fpaste.org/OE92/
  
   with that ping thing repeating over and over.  Thanks,
 
  That just looks like the osd isn't running. If you restart the
  osd with 'debug osd = 20' the osd log should tell us what's going on.
 
  Ok that part was my fault, Duh I need to redo the tmpfs and mkcephfs stuff 
  after
  reboot.  But now I'm back to my original problem
 
  http://fpaste.org/PfwO/
 
  I have the osd class dir = /usr/lib64/rados-classes thing set and 
  libcls_rbd is
  in there, so I'm not sure what is wrong.  Thanks,
 
 Thats really strange. Do you have the osd logs in /var/log/ceph? If
 so, can you look if you find anything about rbd or class loading
 in there?
 
 Another thing you should try is, whether you can access ceph with rados:
 
 # rados -p rbd ls
 # rados -p rbd -i /proc/cpuinfo put testobj
 # rados -p rbd -o - get testobj


Ok, weirdly ceph was trying to dlopen /usr/lib64/rados-classes/libcls_rbd.so but
all I had was libcls_rbd.so.1 and libcls_rbd.so.1.0.0.  A symlink fixed that part;
I'll see if I can reproduce now.  Thanks,

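(For reference, the symlink in question - paths as above:)

# ln -s libcls_rbd.so.1.0.0 /usr/lib64/rados-classes/libcls_rbd.so
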
Josef


btrfs RAID with enterprise SATA or SAS drives

2012-05-09 Thread Daniel Pocock


There is various information about
- enterprise-class drives (either SAS or just enterprise SATA)
- the SCSI/SAS protocols themselves vs SATA
having more advanced features (e.g. for dealing with error conditions)
than the average block device

For example, Adaptec recommends that such drives will work better with
their hardware RAID cards:

http://ask.adaptec.com/cgi-bin/adaptec_tic.cfg/php/enduser/std_adp.php?p_faqid=14596
"Desktop class disk drives have an error recovery feature that will
result in a continuous retry of the drive (read or write) when an error
is encountered, such as a bad sector. In a RAID array this can cause the
RAID controller to time-out while waiting for the drive to respond."

and this blog:
http://www.adaptec.com/blog/?p=901
"major advantages to enterprise drives (TLER for one) ... opt for the
enterprise drives in a RAID environment no matter what the cost of the
drive over the desktop drive"

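(As an aside: on drives that support it, the error-recovery timeout behind
TLER can be inspected and adjusted from Linux with smartctl - device name
assumed, values are in tenths of a second:)

# smartctl -l scterc /dev/sda
# smartctl -l scterc,70,70 /dev/sda
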
My questions:

- does btrfs RAID1 actively use the more advanced features of these
drives, e.g. to work around errors without getting stuck on a bad block?

- if a non-RAID SAS card is used, does it matter which card is chosen?
Does btrfs work equally well with all of them?

- ignoring the better MTBF and seek times of these drives, do any of the
other features passively contribute to a better RAID experience when
using btrfs?

- for someone using SAS or enterprise SATA drives with Linux, I
understand btrfs gives the extra benefit of checksums; are there any
other specific benefits over using mdadm or dmraid?



Re: [RFC PATCH v2] Btrfs: improve space count for files with fragments

2012-05-09 Thread Liu Bo
On 05/10/2012 01:29 AM, David Sterba wrote:

 On Fri, Apr 27, 2012 at 09:44:13AM +0800, Liu Bo wrote:
 Let's take the above case:
  0k                           20k
  |---------- extent ----------|
      |--------- A --------|
      1k                  19k

 And we assume that this extent starts from disk_bytenr on its FS logical 
 offset.

 By splitting the [0k, 20k) extent, we'll get three delayed refs into the 
 delayed-ref rbtree:
 a) [0k, 20k),  in which only [disk_bytenr+1k, disk_bytenr+19k) will be freed 
 at the end.
 b) [0k, 1k),   which will _not_ allocate a new extent but use the remained 
 space of [0k, 20k).
 c) [19k, 20k), ditto.

 And another ref [1k,19k) will get a new allocated space by our normal endio 
 routine.

 What I want is
 free  [0k, 20k),  set this range DIRTY in the pinned_extents tree.
 alloc [0k, 1k),   clear this range DIRTY in the pinned_extents tree.
 alloc [19k, 20k), ditto.

 However, in my stress test, this three refs may not be ordered by 
 a)-b)-c), but b)-a)-c) instead.
 That would be a problem, because it will confuse our space_info's counter: 
 bytes_reserved, bytes_pinned.
 
 Do you have an idea why the ordering may become broken? If it's a race,
 it might be better to fix it instead of adding a new bit to extent
 flags.
 


These refs are well managed in the delayed_ref rbtree, but they can be 
processed by multiple threads, so the ordering is not guaranteed to be 
sequential, since the original design assumes each ref is independent.

Any thoughts? :)

thanks,
liubo

 
 david
 




Re: [PATCH] Btrfs: use ALIGN macro instead of open-coded expression

2012-05-09 Thread Yuanhan Liu
On Wed, May 09, 2012 at 06:45:49PM +0200, David Sterba wrote:
 On Tue, May 08, 2012 at 04:16:24PM +0800, Yuanhan Liu wrote:
  According to section 'Find open-coded helpers or macros' at
  https://btrfs.wiki.kernel.org/index.php/Cleanup_ideas, here
  in the patch we use ALIGN macro to do the alignment.
 
 Well, I wrote this section and some time later also the patches,
 http://www.spinics.net/lists/linux-btrfs/msg12747.html
 
 but did not update the section with the status reflecting this, sorry
 that you duplicated work.

It's OK. I just didn't notice those issues had been fixed in mainline, so I
thought they still existed.

Thanks,
Yuanhan Liu



Re: failed disk

2012-05-09 Thread Helmut Hullen
Hello, Hugo,

You wrote on 09.05.12:

 btrfs fi df /mnt/Scsi

 now tells

 Data, RAID0: total=183.18GB, used=76.60GB
 Data: total=80.01GB, used=79.83GB
 System, DUP: total=8.00MB, used=32.00KB
 System: total=4.00MB, used=0.00
 Metadata, DUP: total=1.00GB, used=192.74MB
 Metadata: total=8.00MB, used=0.00

 --

 Data, RAID0 confuses me (not very much ...), and the system for
 metadata (RAID1) is not told.

DUP is two copies of each block, but it allows the two copies to
 live on the same device. It's done this because you started with a
 single device, and you can't do RAID-1 on one device. The first bit
 of metadata you write to it should automatically upgrade the DUP
 chunk to RAID-1.

It has done so - OK. Adding and removing disks/partitions works as
expected.

Best regards!
Helmut