Re: No/bad auto-detection of fs type for small volumes (related to mixed metadata/data?)

2012-07-25 Thread Hugo Mills
On Tue, Jul 24, 2012 at 08:39:36PM -0400, Marios Titas wrote:
> When I create a btrfs volume of size strictly less than 256 MiB then if I do
> mount /dev/sdb1 /mnt/test
> the kernel tries unsuccessfully to do the mount with many other file systems
> before successfully trying with btrfs. For volumes of size larger than
> or equal to
> 256 MiB it just mounts the volume without doing that. Why is this discrepancy?

   Are you using the --mixed option when creating the filesystem? If
not, you should do with something that small.

   Hugo.

> Another possibly related symptom is that the volume does not appear in
> /dev/disk/by-label and /dev/disk/by-uuid at all. This means that it is
> impossible
> to mount the volume by uuid or label.
> 
> To make sure that this isn't a udev bug, I booted my system with 
> init=/bin/bash
> in the kernel command line, and then I tried again to mount the
> volume. This time
> it would not mount it at all unless I explicitly specified the fs
> type. On the other
> hand, it could mount larger volumes without any issues.
> 
> All the experiments were done in an initially zeroed out disk. I am
> using 3.4.6 kernel
> with btrfs from 3.5 and the latest btrfs-progs from git.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
 --- In theory, theory and practice are the same. In --- 
  practice,  they're different.  


signature.asc
Description: Digital signature


Re: No/bad auto-detection of fs type for small volumes (related to mixed metadata/data?)

2012-07-25 Thread cwillu
On Tue, Jul 24, 2012 at 6:39 PM, Marios Titas  wrote:
> When I create a btrfs volume of size strictly less than 256 MiB then if I do
> mount /dev/sdb1 /mnt/test
> the kernel tries unsuccessfully to do the mount with many other file systems
> before successfully trying with btrfs. For volumes of size larger than
> or equal to
> 256 MiB it just mounts the volume without doing that. Why is this discrepancy?

If I understand correctly, the kernel does not implement any fs
detection; this is performed by the mount utility, which indeed may
try a bunch of different filesystems until it finds one that works.

>From man mount:
  If no -t option is given, or if  the  auto  type  is  specified,
  mount  will try to guess the desired type.  Mount uses the blkid
  or volume_id library for guessing the filesystem type;  if  that
  does not turn up anything that looks familiar, mount will try to
  read the file /etc/filesystems, or,  if  that  does  not  exist,
  /proc/filesystems.   All  of  the  filesystem types listed there
  will be tried, except for those that are labeled "nodev"  (e.g.,
  devpts,  proc and nfs).  If /etc/filesystems ends in a line with
  a single * only, mount will read /proc/filesystems afterwards.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH 6/6] Btrfs-progs: add btrfs send/receive commands

2012-07-25 Thread Alex Lyakas
Thanks!

So now:
A_PATH -> path -> full_path -> newpath
A_PATH_LINK  -> lnk -> full_link_path -> oldpath

while I viewed it the other way around.

I guess it's not important what is left/right, old/new :) as long as
it's consistent.

Alex.


On Tue, Jul 24, 2012 at 11:27 PM, Alexander Block
 wrote:
> On Thu, Jul 19, 2012 at 3:25 PM, Alex Lyakas
>  wrote:
>> +static int process_link(const char *path, const char *lnk, void *user)
>> +{
>> +   int ret;
>> +   struct btrfs_receive *r = user;
>> +   char *full_path = path_cat(r->full_subvol_path, path);
>> +
>> +   if (g_verbose >= 1)
>> +   fprintf(stderr, "link %s -> %s\n", path, lnk);
>> +
>> +   ret = link(lnk, full_path);
>> +   if (ret < 0) {
>> +   ret = -errno;
>> +   fprintf(stderr, "ERROR: link %s -> %s failed. %s\n", path,
>> +   lnk, strerror(-ret));
>> +   }
>>
>> Actually it has to be:
>> char *full_link_path = path_cat(r->full_subvol_path, lnk);
>> ...
>> ret = link(full_path/*oldpath*/, full_link_path/*newpath*/);
>> ...
>> free(full_link_path);
>>
>> Thanks,
>> Alex.
>
> Actually, the pathes got mixed up in-kernel. You'll find a pushed fix
> in the kernel repo. I also pushed a fix to btrfs-progs containing the
> full_link_path. Thanks again :)
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: No/bad auto-detection of fs type for small volumes (related to mixed metadata/data?)

2012-07-25 Thread Marios Titas
On Wed, Jul 25, 2012 at 5:09 AM, cwillu  wrote:
> On Tue, Jul 24, 2012 at 6:39 PM, Marios Titas  wrote:
>> When I create a btrfs volume of size strictly less than 256 MiB then if I do
>> mount /dev/sdb1 /mnt/test
>> the kernel tries unsuccessfully to do the mount with many other file systems
>> before successfully trying with btrfs. For volumes of size larger than
>> or equal to
>> 256 MiB it just mounts the volume without doing that. Why is this 
>> discrepancy?
>
> If I understand correctly, the kernel does not implement any fs
> detection; this is performed by the mount utility, which indeed may
> try a bunch of different filesystems until it finds one that works.
>
> From man mount:
>   If no -t option is given, or if  the  auto  type  is  specified,
>   mount  will try to guess the desired type.  Mount uses the blkid
>   or volume_id library for guessing the filesystem type;  if  that
>   does not turn up anything that looks familiar, mount will try to
>   read the file /etc/filesystems, or,  if  that  does  not  exist,
>   /proc/filesystems.   All  of  the  filesystem types listed there
>   will be tried, except for those that are labeled "nodev"  (e.g.,
>   devpts,  proc and nfs).  If /etc/filesystems ends in a line with
>   a single * only, mount will read /proc/filesystems afterwards.

Thanks, that was helpful. It was a blkid bug. It was fixed  [1] in
util-linux 2.21.

[1] https://git.kernel.org/?p=utils/util-linux/util-linux.git;a=commit;h=04f7020
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH 0/6] Experimental btrfs send/receive (btrfs-progs)

2012-07-25 Thread Alexander Block
On Mon, Jul 23, 2012 at 2:29 PM, Arne Jansen  wrote:
> On 04.07.2012 15:39, Alexander Block wrote:
>> Hello all,
>>
>> This is the user space side of btrfs send/receive.
>>
>> You can apply them manually or use my git repo:
>>
>> git://github.com/ablock84/btrfs-progs.git (branch send)
>>
>> The branch is based on Hugo's integration-20120605 branch. I had to add a 
>> temporary
>> commit to fix a bug introduced in one of the strncpy/overflow patches that 
>> got into
>> btrfs-progs. This fix is not part of the btrfs send/receive patchset, but 
>> you'll
>> probably need it if you want to base on the integration branch. I hope this 
>> is not
>> required in the future when a new integration branch comes out.
>>
>> Example usage:
>>
>> Multiple snapshots at once:
>> btrfs send /mnt/snap[123] > snap123.btrfs
>
> a) Do we really want a single token command here, not
> btrfs filesystem send or subvol send?
In my opinion the single token is easier to type and remember. But if
enough speaks for normal subcommands this can be changed (but by
someone else as I'm running out of time).
> b) zfs makes sure stdout is not a tty, to prevent flooding
> your console. This kinda makes sense.
This makes sense. But again, this has to be done by someone else.
>
>>
>> Single snapshot with manual parent:
>> btrfs send -p /mnt/snap3 /mnt/snap4 > snap4.btrfs
>>
>> Receive both streams:
>> btrfs receive /mnt2 < snap123.btrfs
>> btrfs receive /mnt2 < snap4.btrfs
>>
>> (Please give suggestions for a file extension)
>>
>> Please read the kernel side email as well, especially the warnings!
>>
>> Alex.
>>
>> Alexander Block (6):
>>   Btrfs-progs: add BTRFS_IOC_SUBVOL_GET/SETFLAGS to ioctl.h
>>   Btrfs-progs: update ioctl.h to support clone range ioctl
>>   Btrfs-progs: print inode transid and dir item data field in
>> debug-tree
>>   Btrfs-progs: update btrfs-progs for subvol uuid+times support
>>   Btrfs-progs: update ioctl.h to support btrfs send ioctl
>>   Btrfs-progs: add btrfs send/receive commands
>>
>>  Makefile   |7 +-
>>  btrfs.c|2 +
>>  cmds-receive.c |  910 
>> 
>>  cmds-send.c|  677 +
>>  commands.h |4 +
>>  ctree.h|   40 ++-
>>  ioctl.h|   35 ++-
>>  print-tree.c   |   88 --
>>  send-stream.c  |  480 ++
>>  send-stream.h  |   58 
>>  send-utils.c   |  337 +
>>  send-utils.h   |   69 +
>>  send.h |  132 
>>  13 files changed, 2815 insertions(+), 24 deletions(-)
>>  create mode 100644 cmds-receive.c
>>  create mode 100644 cmds-send.c
>>  create mode 100644 send-stream.c
>>  create mode 100644 send-stream.h
>>  create mode 100644 send-utils.c
>>  create mode 100644 send-utils.h
>>  create mode 100644 send.h
>>
>
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH 4/7] Btrfs: introduce subvol uuids and times

2012-07-25 Thread Alexander Block
On Tue, Jul 24, 2012 at 7:55 AM, Arne Jansen  wrote:
> On 23.07.2012 21:41, Alexander Block wrote:
>> On Mon, Jul 16, 2012 at 4:56 PM, Arne Jansen  wrote:
>>> On 04.07.2012 15:38, Alexander Block wrote:
 +
 + ret = btrfs_update_root(trans, root->fs_info->tree_root,
 + &root->root_key, &root->root_item);
 + if (ret < 0) {
 + goto out;
>>>
>>> are you leaking a trans handle here?
>>>
>> btrfs_update_root is aborting the transaction in case of failure. Do I
>> still need to call end_transaction?
>
> It's your handle, you should free it.
Ahh...I thought abort_transaction already frees the handle. Fixed.
>
> ...
>

 +struct btrfs_ioctl_received_subvol_args {
 + charuuid[BTRFS_UUID_SIZE];  /* in */
 + __u64   stransid;   /* in */
 + __u64   rtransid;   /* out */
 + struct timespec stime;  /* in */
 + struct timespec rtime;  /* out */
 + __u64   reserved[16];
>>>
>>> What is this reserved used for? I don't see a mechanism that could be
>>> used to signal that there are useful information here, other than
>>> using a different ioctl.
>>>
>> The reserved is a result of a suggestion made by David. I can remove
>> it again if you want...
>
> I don't argue against some reserved space, I only have problems to
> see how you can make use of them in the future when there's no way
> to signal that they contain valid information. I should be sufficient
> to define the reserved values to be 0 at the moment.
Misunderstood that. Now I see the problem :) I've added a flags field
before the reserved fields. It's unused for now but can later be used
to signal that new fields are used.
>
 +};
 +
  #define BTRFS_IOC_SNAP_CREATE _IOW(BTRFS_IOCTL_MAGIC, 1, \
  struct btrfs_ioctl_vol_args)
  #define BTRFS_IOC_DEFRAG _IOW(BTRFS_IOCTL_MAGIC, 2, \
 @@ -359,6 +368,10 @@ struct btrfs_ioctl_get_dev_stats {
   struct btrfs_ioctl_ino_path_args)
  #define BTRFS_IOC_LOGICAL_INO _IOWR(BTRFS_IOCTL_MAGIC, 36, \
   struct btrfs_ioctl_ino_path_args)
 +
 +#define BTRFS_IOC_SET_RECEIVED_SUBVOL _IOWR(BTRFS_IOCTL_MAGIC, 37, \
 + struct btrfs_ioctl_received_subvol_args)
 +
  #define BTRFS_IOC_GET_DEV_STATS _IOWR(BTRFS_IOCTL_MAGIC, 52, \
 struct btrfs_ioctl_get_dev_stats)
  #define BTRFS_IOC_GET_AND_RESET_DEV_STATS _IOWR(BTRFS_IOCTL_MAGIC, 53, \
 diff --git a/fs/btrfs/root-tree.c b/fs/btrfs/root-tree.c
 index 24fb8ce..17d638e 100644
 --- a/fs/btrfs/root-tree.c
 +++ b/fs/btrfs/root-tree.c
 @@ -16,6 +16,7 @@
   * Boston, MA 021110-1307, USA.
   */

 +#include 
  #include "ctree.h"
  #include "transaction.h"
  #include "disk-io.h"
 @@ -25,6 +26,9 @@
   * lookup the root with the highest offset for a given objectid.  The key 
 we do
   * find is copied into 'key'.  If we find something return 0, otherwise 
 1, < 0
   * on error.
 + * We also check if the root was once mounted with an older kernel. If we 
 detect
 + * this, the new fields coming after 'level' get overwritten with zeros 
 so to
 + * invalidate the fields.
>>>
>>> ... "This is detected by a mismatch of the 2 generation fields" ... or 
>>> something
>>> like that.
>>>
>> The current version (found in git only) has this new function which is
>> called in find_last_root:
>> void btrfs_read_root_item(struct btrfs_root *root,
>>struct extent_buffer *eb, int slot,
>>struct btrfs_root_item *item)
>>
>> The comment above this function explains what happens.
>
> ok. Please regard most of my comments as an expression of my thoughts while
> reading it. So they mark places where it might be useful to add comments
> to make it easier for the next reader :)
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Question about btrfs metadata structure

2012-07-25 Thread Guenther Rasch

Hi!

I'm writing my bsc thesis about btrfs and I'm analyzing 
the

btrfs metadata structure at the moment.

I'm using fedora 17 with kernel 3.4 and btrfs-prog from 
your
git-repository. For an overview I use "btrfs-debug-tree" 
and

here are my questions:

The root tree contains some inode_items, but I don't 
understand

the meaning of them:
item 4 key (ROOT_TREE_DIR INODE_ITEM 0) itemoff 3101 
itemsize 160

inode generation 3 size 0 block group 0 mode 40555 links 1

item 5 key (ROOT_TREE_DIR INODE_REF 6) itemoff 3089 
itemsize 12

inode ref index 0 namelen 2 name: ..

The key for an inode_item is objectid=ROOT_TREE_DIR. 
ROOT-TREE-DIR is the inode-Nr.?

What does offset=6 regarding INODE_REF mean? --> item 5

Also placed in the root tree, I've found entries of
(FREE_SPACE UNTYPED )
What does UNTYPED mean in this context and the offset is
the logical address of a block group (I guess...)? This
item has a pointer to an inode_item, is that right?

And last question about this entries:
item 2 key (FS_TREE INODE_REF 6) itemoff 3500 itemsize 17
 inode ref index 0 namelen 7 name: default
...
item 6 key (ROOT_TREE_DIR DIR_ITEM 2378154706) itemoff 
3052 itemsize 37
 location key (FS_TREE ROOT_ITEM 18446744073709551615) 
type 2

 namelen 7 datalen 0 name: default

There is an INODE_REF (again, offset 6...?) pointing to
which inode? I cannot find some...
What's about that offset in location key (FS_TREE)

Thx in advance for your patience and answers!

Regards
Guenther
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Btrfs-progs: segfault when re-creating filesystem with 3+ devices

2012-07-25 Thread Cyril B.
Hello,

Running on Linux 3.5.0, with the latest btrfs-progs from today. I was doing 
some tests with my 7 devices, and mkfs.btrfs segfaults when executed twice, 
creating a filesystem with 3+ devices:


# ./mkfs.btrfs /dev/sdc /dev/sdd /dev/sde

WARNING! - Btrfs Btrfs v0.19 IS EXPERIMENTAL
WARNING! - see http://btrfs.wiki.kernel.org before using

adding device /dev/sdd id 2
adding device /dev/sde id 3
fs created label (null) on /dev/sdc
nodesize 4096 leafsize 4096 sectorsize 4096 size 8.19TB
Btrfs Btrfs v0.19


# ./mkfs.btrfs /dev/sdc /dev/sdd /dev/sde

WARNING! - Btrfs Btrfs v0.19 IS EXPERIMENTAL
WARNING! - see http://btrfs.wiki.kernel.org before using

adding device /dev/sdd id 2
Segmentation fault


If executed a third time, everything is fine and the filesystem is created 
normally. The segfault doesn't happen with 2 or less devices (but still 
happens with 4 or more devices).

Backtrace:

#0  0x7fba810faaf0 in strcmp () from /lib/libc.so.6
#1  0x0041e17c in device_list_add (path=0x7fff10a8a810 "/dev/sdd", 
disk_super=0xac9570, devid=2, fs_devices_ret=0x7fff10a8ac88) at 
volumes.c:119
#2  0x0041e556 in btrfs_scan_one_device (fd=11, path=0x7fff10a8a810 
"/dev/sdd", fs_devices_ret=0x7fff10a8ac88, total_devs=0x7fff10a8ac80, 
super_offset=65536)
at volumes.c:221
#3  0x0042607b in btrfs_scan_block_devices (run_ioctl=1) at 
utils.c:1209
#4  0x00425bf0 in btrfs_scan_for_fsid (fs_devices=0xab5020, 
total_devs=7, run_ioctls=1) at utils.c:1057
#5  0x0042551e in check_mounted_where (fd=9, file=0x7fff10a8ca01 
"/dev/sde", where=0x0, size=0, fs_dev_ret=0x0) at utils.c:843
#6  0x00425494 in check_mounted (file=0x7fff10a8ca01 "/dev/sde") at 
utils.c:820
#7  0x0042be74 in main (ac=4, av=0x7fff10a8b058) at mkfs.c:1402

Thanks.

-- 
Cyril B.

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


mkfs devices ordering relevant with devices of different sizes?

2012-07-25 Thread Cyril B.
Hello,

When creating a filesystem with devices of different sizes, the resulting 
filesystem total size depends on the device order specified to mkfs. When 
the smaller device is specified first, the second (larger) device is seen as 
the same size as the first. This doesn't occur when the order is reversed.

It's confusing. Is this expected? I'm using the latest btrfs-progs and Linux 
3.5.


# ./mkfs.btrfs /dev/sda4 /dev/sdc

WARNING! - Btrfs Btrfs v0.19 IS EXPERIMENTAL
WARNING! - see http://btrfs.wiki.kernel.org before using

adding device /dev/sdc id 2
fs created label (null) on /dev/sda4
nodesize 4096 leafsize 4096 sectorsize 4096 size 3.97TB
Btrfs Btrfs v0.19
backup6:~/btrfs-progs# ./btrfs fi show /dev/sdc
Label: none  uuid: 806b237e-53ed-409e-a7c1-02f101798384
Total devices 2 FS bytes used 28.00KB
devid2 size 1.98TB used 2.01GB path /dev/sdc
devid1 size 1.98TB used 2.03GB path /dev/sda4

Btrfs Btrfs v0.19


# ./mkfs.btrfs /dev/sdc /dev/sda4

WARNING! - Btrfs Btrfs v0.19 IS EXPERIMENTAL
WARNING! - see http://btrfs.wiki.kernel.org before using

adding device /dev/sda4 id 2
fs created label (null) on /dev/sdc
nodesize 4096 leafsize 4096 sectorsize 4096 size 4.71TB
Btrfs Btrfs v0.19
backup6:~/btrfs-progs# ./btrfs fi show /dev/sdc
Label: none  uuid: 8f99c072-521b-4827-a2be-41de6ab11b4f
Total devices 2 FS bytes used 28.00KB
devid1 size 2.73TB used 2.03GB path /dev/sdc
devid2 size 1.98TB used 2.01GB path /dev/sda4

Btrfs Btrfs v0.19


Thanks.

-- 
Cyril B.

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH 7/7] Btrfs: introduce BTRFS_IOC_SEND for btrfs send/receive (part 2)

2012-07-25 Thread Alexander Block
On Tue, Jul 10, 2012 at 5:26 PM, Alex Lyakas
 wrote:
> Alexander,
> this focuses on area of sending file extents:
>
>> +static int is_extent_unchanged(struct send_ctx *sctx,
>> +  struct btrfs_path *left_path,
>> +  struct btrfs_key *ekey)
>> +{
>> +   int ret = 0;
>> +   struct btrfs_key key;
>> +   struct btrfs_path *path = NULL;
>> +   struct extent_buffer *eb;
>> +   int slot;
>> +   struct btrfs_key found_key;
>> +   struct btrfs_file_extent_item *ei;
>> +   u64 left_disknr;
>> +   u64 right_disknr;
>> +   u64 left_offset;
>> +   u64 right_offset;
>> +   u64 left_len;
>> +   u64 right_len;
>> +   u8 left_type;
>> +   u8 right_type;
>> +
>> +   path = alloc_path_for_send();
>> +   if (!path)
>> +   return -ENOMEM;
>> +
>> +   eb = left_path->nodes[0];
>> +   slot = left_path->slots[0];
>> +
>> +   ei = btrfs_item_ptr(eb, slot, struct btrfs_file_extent_item);
>> +   left_type = btrfs_file_extent_type(eb, ei);
>> +   left_disknr = btrfs_file_extent_disk_bytenr(eb, ei);
>> +   left_len = btrfs_file_extent_num_bytes(eb, ei);
>> +   left_offset = btrfs_file_extent_offset(eb, ei);
>> +
>> +   if (left_type != BTRFS_FILE_EXTENT_REG) {
>> +   ret = 0;
>> +   goto out;
>> +   }
>> +
>> +   key.objectid = ekey->objectid;
>> +   key.type = BTRFS_EXTENT_DATA_KEY;
>> +   key.offset = ekey->offset;
>> +
>> +   while (1) {
>> +   ret = btrfs_search_slot_for_read(sctx->parent_root, &key, 
>> path,
>> +   0, 0);
>> +   if (ret < 0)
>> +   goto out;
>> +   if (ret) {
>> +   ret = 0;
>> +   goto out;
>> +   }
>> +   btrfs_item_key_to_cpu(path->nodes[0], &found_key,
>> +   path->slots[0]);
>> +   if (found_key.objectid != key.objectid ||
>> +   found_key.type != key.type) {
>> +   ret = 0;
>> +   goto out;
>> +   }
>> +
>> +   eb = path->nodes[0];
>> +   slot = path->slots[0];
>> +
>> +   ei = btrfs_item_ptr(eb, slot, struct btrfs_file_extent_item);
>> +   right_type = btrfs_file_extent_type(eb, ei);
>> +   right_disknr = btrfs_file_extent_disk_bytenr(eb, ei);
>> +   right_len = btrfs_file_extent_num_bytes(eb, ei);
>> +   right_offset = btrfs_file_extent_offset(eb, ei);
>> +   btrfs_release_path(path);
>> +
>> +   if (right_type != BTRFS_FILE_EXTENT_REG) {
>> +   ret = 0;
>> +   goto out;
>> +   }
>> +
>> +   if (left_disknr != right_disknr) {
>> +   ret = 0;
>> +   goto out;
>> +   }
>> +
>> +   key.offset = found_key.offset + right_len;
>> +   if (key.offset >= ekey->offset + left_len) {
>> +   ret = 1;
>> +   goto out;
>> +   }
>> +   }
>> +
>> +out:
>> +   btrfs_free_path(path);
>> +   return ret;
>> +}
>> +
>
> Should we always treat left extent with bytenr==0 as not changed?
No, as we may have bytenr!=0 on the right side.
> Because right now, it simply reads and sends data of such extent,
> while bytenr==0 means "no data allocated here". Since we always do
> send_truncate() afterwards, file size will always be correct, so we
> can just skip bytenr==0 extents.
This is something that could be done for full sends only. Full sends
however do not call is_extent_unchanged, so the optimization is
something for process_extent.
In the incremental case, it may happen that left_disknr==0 and
right_disknr!=0 or vice versa, so we need to do the compare no matter
if one of them is ==0. process_extents could then again do some
optimization and send a special command to instruct a preallocated
zero block.
> Same is true for BTRFS_FILE_EXTENT_PREALLOC extents, I think. Those
> also don't contain real data.
> So something like:
> if (left_disknr == 0 || left_type == BTRFS_FILE_EXTENT_REG) {
> ret = 1;
> goto out;
> }
Do you mean "|| left_type == BTRFS_FILE_EXTENT_PREALLOC"?
> before we check for BTRFS_FILE_EXTENT_REG.
>
> Now I have a question about the rest of the logic that decides that
> extent is unchanged. I understand that if we see the same extent (same
> disk_bytenr) shared between parent_root and send_root, then it must
> contain the same data, even in nodatacow mode, because on a first
> write to such shared extent, it is cow'ed even with nodatacow.
>
> However, shouldn't we check btrfs_file_extent_offset(), to make sure
> that both send_root and parent_root point at the same offset into
> extent from the same file offset? Because if extent_of

Re: [RFC PATCH 0/6] Experimental btrfs send/receive (btrfs-progs)

2012-07-25 Thread Hugo Mills
On Wed, Jul 25, 2012 at 12:41:56PM +0200, Alexander Block wrote:
> On Mon, Jul 23, 2012 at 2:29 PM, Arne Jansen  wrote:
> > On 04.07.2012 15:39, Alexander Block wrote:
> >> Hello all,
> >>
> >> This is the user space side of btrfs send/receive.
> >>
> >> You can apply them manually or use my git repo:
> >>
> >> git://github.com/ablock84/btrfs-progs.git (branch send)
> >>
> >> The branch is based on Hugo's integration-20120605 branch. I had to add a 
> >> temporary
> >> commit to fix a bug introduced in one of the strncpy/overflow patches that 
> >> got into
> >> btrfs-progs. This fix is not part of the btrfs send/receive patchset, but 
> >> you'll
> >> probably need it if you want to base on the integration branch. I hope 
> >> this is not
> >> required in the future when a new integration branch comes out.
> >>
> >> Example usage:
> >>
> >> Multiple snapshots at once:
> >> btrfs send /mnt/snap[123] > snap123.btrfs
> >
> > a) Do we really want a single token command here, not
> > btrfs filesystem send or subvol send?
> In my opinion the single token is easier to type and remember. But if
> enough speaks for normal subcommands this can be changed (but by
> someone else as I'm running out of time).

   Since everything else is two commands, yes, I think we need it for
consistency. (And, since it's a publically-visible interface, for
acceptance of the patches -- we don't want to be changing the way the
commands work after the fact).

> > b) zfs makes sure stdout is not a tty, to prevent flooding
> > your console. This kinda makes sense.
> This makes sense. But again, this has to be done by someone else.

   Can you keep a brief list of such cleanups/features and dump it on
the wiki as a proposed project when your time does run out, please.
That way the details don't get lost, and they can be found by other
people and dealt with independently.

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
--- Turning,  pages turning in the widening bath, / The spine ---
cannot bear the humidity. / Books fall apart; the binding
cannot hold. / Page 129 is loosed upon the world.


signature.asc
Description: Digital signature


[PATCH 07/16] btrfs: nuke write_super from comments

2012-07-25 Thread Artem Bityutskiy
From: Artem Bityutskiy 

The '->write_super' superblock method is gone, and this patch removes all the
references to 'write_super' from btrfs.

Cc: Chris Mason 
Cc: linux-btrfs@vger.kernel.org
Signed-off-by: Artem Bityutskiy 
---

I expect this patch to be merged via Al Viro's VFS tree.

 fs/btrfs/super.c   |4 
 fs/btrfs/volumes.c |4 
 2 files changed, 8 deletions(-)

diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index e239915..ad31627 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -100,10 +100,6 @@ static void __save_error_info(struct btrfs_fs_info 
*fs_info)
fs_info->fs_state = BTRFS_SUPER_FLAG_ERROR;
 }
 
-/* NOTE:
- * We move write_super stuff at umount in order to avoid deadlock
- * for umount hold all lock.
- */
 static void save_error_info(struct btrfs_fs_info *fs_info)
 {
__save_error_info(fs_info);
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index ecaad40..9f2416c 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -1738,10 +1738,6 @@ int btrfs_init_new_device(struct btrfs_root *root, char 
*device_path)
 
device->fs_devices = root->fs_info->fs_devices;
 
-   /*
-* we don't want write_supers to jump in here with our device
-* half setup
-*/
mutex_lock(&root->fs_info->fs_devices->device_list_mutex);
list_add_rcu(&device->dev_list, &root->fs_info->fs_devices->devices);
list_add(&device->dev_alloc_list,
-- 
1.7.10

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 08/16] btrfs: nuke pdflush from comments

2012-07-25 Thread Artem Bityutskiy
From: Artem Bityutskiy 

The pdflush thread is long gone, so this patch removes references to pdflush
from btrfs comments.

Cc: Chris Mason 
Cc: linux-btrfs@vger.kernel.org
Signed-off-by: Artem Bityutskiy 
---

I expect this patch to be merged via Al Viro's VFS tree.

 fs/btrfs/inode.c|3 ++-
 fs/btrfs/ordered-data.c |2 +-
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index a7d1921..ca8b759 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -324,7 +324,8 @@ static noinline int add_async_extent(struct async_cow *cow,
  * If this code finds it can't get good compression, it puts an
  * entry onto the work queue to write the uncompressed bytes.  This
  * makes sure that both compressed inodes and uncompressed inodes
- * are written in the same order that pdflush sent them down.
+ * are written in the same order that the flusher thread sent them
+ * down.
  */
 static noinline int compress_file_range(struct inode *inode,
struct page *locked_page,
diff --git a/fs/btrfs/ordered-data.c b/fs/btrfs/ordered-data.c
index 643335a..051c7fe 100644
--- a/fs/btrfs/ordered-data.c
+++ b/fs/btrfs/ordered-data.c
@@ -596,7 +596,7 @@ void btrfs_start_ordered_extent(struct inode *inode,
/*
 * pages in the range can be dirty, clean or writeback.  We
 * start IO on any dirty ones so the wait doesn't stall waiting
-* for pdflush to find them
+* for the flusher thread to find them
 */
if (!test_bit(BTRFS_ORDERED_DIRECT, &entry->flags))
filemap_fdatawrite_range(inode->i_mapping, start, end);
-- 
1.7.10

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH 0/6] Experimental btrfs send/receive (btrfs-progs)

2012-07-25 Thread Chris Mason
On Wed, Jul 25, 2012 at 08:00:36AM -0600, Hugo Mills wrote:
> On Wed, Jul 25, 2012 at 12:41:56PM +0200, Alexander Block wrote:
> > On Mon, Jul 23, 2012 at 2:29 PM, Arne Jansen  wrote:
> > > On 04.07.2012 15:39, Alexander Block wrote:
> > >> Hello all,
> > >>
> > >> This is the user space side of btrfs send/receive.
> > >>
> > >> You can apply them manually or use my git repo:
> > >>
> > >> git://github.com/ablock84/btrfs-progs.git (branch send)
> > >>
> > >> The branch is based on Hugo's integration-20120605 branch. I had to add 
> > >> a temporary
> > >> commit to fix a bug introduced in one of the strncpy/overflow patches 
> > >> that got into
> > >> btrfs-progs. This fix is not part of the btrfs send/receive patchset, 
> > >> but you'll
> > >> probably need it if you want to base on the integration branch. I hope 
> > >> this is not
> > >> required in the future when a new integration branch comes out.
> > >>
> > >> Example usage:
> > >>
> > >> Multiple snapshots at once:
> > >> btrfs send /mnt/snap[123] > snap123.btrfs
> > >
> > > a) Do we really want a single token command here, not
> > > btrfs filesystem send or subvol send?
> > In my opinion the single token is easier to type and remember. But if
> > enough speaks for normal subcommands this can be changed (but by
> > someone else as I'm running out of time).
> 
>Since everything else is two commands, yes, I think we need it for
> consistency. (And, since it's a publically-visible interface, for
> acceptance of the patches -- we don't want to be changing the way the
> commands work after the fact).

I've been sending and receiving while getting this code integrated.
These are really first class operations, and I'd prefer they not be
sub-commands.

I'm afraid there isn't a lot of logic here, just what feels good to
type.

-chris
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 07/16] btrfs: nuke write_super from comments

2012-07-25 Thread cwillu
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index ecaad40..9f2416c 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -1738,10 +1738,6 @@ int btrfs_init_new_device(struct btrfs_root *root, 
> char *device_path)
>
> device->fs_devices = root->fs_info->fs_devices;
>
> -   /*
> -* we don't want write_supers to jump in here with our device
> -* half setup
> -*/
> mutex_lock(&root->fs_info->fs_devices->device_list_mutex);
> list_add_rcu(&device->dev_list, &root->fs_info->fs_devices->devices);
> list_add(&device->dev_alloc_list,

Is the locking still required for approximately the same reason?
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 07/16] btrfs: nuke write_super from comments

2012-07-25 Thread Artem Bityutskiy
On Wed, 2012-07-25 at 09:46 -0600, cwillu wrote:
> > mutex_lock(&root->fs_info->fs_devices->device_list_mutex);
> > list_add_rcu(&device->dev_list, 
> > &root->fs_info->fs_devices->devices);
> > list_add(&device->dev_alloc_list,
> 
> Is the locking still required for approximately the same reason?

I do not know, I assume Chris would check that.

-- 
Best Regards,
Artem Bityutskiy


signature.asc
Description: This is a digitally signed message part


Re: [RFC PATCH 0/6] Experimental btrfs send/receive (btrfs-progs)

2012-07-25 Thread Alexander Block
On Wed, Jul 25, 2012 at 4:00 PM, Hugo Mills  wrote:
> On Wed, Jul 25, 2012 at 12:41:56PM +0200, Alexander Block wrote:
>> On Mon, Jul 23, 2012 at 2:29 PM, Arne Jansen  wrote:
>> > On 04.07.2012 15:39, Alexander Block wrote:
>> >> Hello all,
>> >>
>> >> This is the user space side of btrfs send/receive.
>> >>
>> >> You can apply them manually or use my git repo:
>> >>
>> >> git://github.com/ablock84/btrfs-progs.git (branch send)
>> >>
>> >> The branch is based on Hugo's integration-20120605 branch. I had to add a 
>> >> temporary
>> >> commit to fix a bug introduced in one of the strncpy/overflow patches 
>> >> that got into
>> >> btrfs-progs. This fix is not part of the btrfs send/receive patchset, but 
>> >> you'll
>> >> probably need it if you want to base on the integration branch. I hope 
>> >> this is not
>> >> required in the future when a new integration branch comes out.
>> >>
>> >> Example usage:
>> >>
>> >> Multiple snapshots at once:
>> >> btrfs send /mnt/snap[123] > snap123.btrfs
>> >
>> > a) Do we really want a single token command here, not
>> > btrfs filesystem send or subvol send?
>> In my opinion the single token is easier to type and remember. But if
>> enough speaks for normal subcommands this can be changed (but by
>> someone else as I'm running out of time).
>
>Since everything else is two commands, yes, I think we need it for
> consistency. (And, since it's a publically-visible interface, for
> acceptance of the patches -- we don't want to be changing the way the
> commands work after the fact).
>
>> > b) zfs makes sure stdout is not a tty, to prevent flooding
>> > your console. This kinda makes sense.
>> This makes sense. But again, this has to be done by someone else.
>
>Can you keep a brief list of such cleanups/features and dump it on
> the wiki as a proposed project when your time does run out, please.
> That way the details don't get lost, and they can be found by other
> people and dealt with independently.
Added a page to the wiki:
https://btrfs.wiki.kernel.org/index.php/Btrfs_Send/Receive
>
>Hugo.
>
> --
> === Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
>   PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
> --- Turning,  pages turning in the widening bath, / The spine ---
> cannot bear the humidity. / Books fall apart; the binding
> cannot hold. / Page 129 is loosed upon the world.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH 0/6] Experimental btrfs send/receive (btrfs-progs)

2012-07-25 Thread Alex Lyakas
Alexander,
can you pls let know like a day or two before you run out of time?
I have compiled a list of questions, but also want to do more testing
before I publish them all.

Thanks for your work,
Alex.

On Wed, Jul 25, 2012 at 7:56 PM, Alexander Block
 wrote:
> On Wed, Jul 25, 2012 at 4:00 PM, Hugo Mills  wrote:
>> On Wed, Jul 25, 2012 at 12:41:56PM +0200, Alexander Block wrote:
>>> On Mon, Jul 23, 2012 at 2:29 PM, Arne Jansen  wrote:
>>> > On 04.07.2012 15:39, Alexander Block wrote:
>>> >> Hello all,
>>> >>
>>> >> This is the user space side of btrfs send/receive.
>>> >>
>>> >> You can apply them manually or use my git repo:
>>> >>
>>> >> git://github.com/ablock84/btrfs-progs.git (branch send)
>>> >>
>>> >> The branch is based on Hugo's integration-20120605 branch. I had to add 
>>> >> a temporary
>>> >> commit to fix a bug introduced in one of the strncpy/overflow patches 
>>> >> that got into
>>> >> btrfs-progs. This fix is not part of the btrfs send/receive patchset, 
>>> >> but you'll
>>> >> probably need it if you want to base on the integration branch. I hope 
>>> >> this is not
>>> >> required in the future when a new integration branch comes out.
>>> >>
>>> >> Example usage:
>>> >>
>>> >> Multiple snapshots at once:
>>> >> btrfs send /mnt/snap[123] > snap123.btrfs
>>> >
>>> > a) Do we really want a single token command here, not
>>> > btrfs filesystem send or subvol send?
>>> In my opinion the single token is easier to type and remember. But if
>>> enough speaks for normal subcommands this can be changed (but by
>>> someone else as I'm running out of time).
>>
>>Since everything else is two commands, yes, I think we need it for
>> consistency. (And, since it's a publically-visible interface, for
>> acceptance of the patches -- we don't want to be changing the way the
>> commands work after the fact).
>>
>>> > b) zfs makes sure stdout is not a tty, to prevent flooding
>>> > your console. This kinda makes sense.
>>> This makes sense. But again, this has to be done by someone else.
>>
>>Can you keep a brief list of such cleanups/features and dump it on
>> the wiki as a proposed project when your time does run out, please.
>> That way the details don't get lost, and they can be found by other
>> people and dealt with independently.
> Added a page to the wiki:
> https://btrfs.wiki.kernel.org/index.php/Btrfs_Send/Receive
>>
>>Hugo.
>>
>> --
>> === Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
>>   PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
>> --- Turning,  pages turning in the widening bath, / The spine ---
>> cannot bear the humidity. / Books fall apart; the binding
>> cannot hold. / Page 129 is loosed upon the world.
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH 0/6] Experimental btrfs send/receive (btrfs-progs)

2012-07-25 Thread Alexander Block
On Wed, Jul 25, 2012 at 7:10 PM, Alex Lyakas
 wrote:
> Alexander,
> can you pls let know like a day or two before you run out of time?
> I have compiled a list of questions, but also want to do more testing
> before I publish them all.
My flight goes on 6. August...after that I don't know when I'm back. I
try my best to work on the most important issues until then, but time
is very limited already now due to the preparations I need to take
care of.
>
> Thanks for your work,
> Alex.
>
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH 7/7] Btrfs: introduce BTRFS_IOC_SEND for btrfs send/receive (part 2)

2012-07-25 Thread Alex Lyakas
Alexander,

>> Same is true for BTRFS_FILE_EXTENT_PREALLOC extents, I think. Those
>> also don't contain real data.
>> So something like:
>> if (left_disknr == 0 || left_type == BTRFS_FILE_EXTENT_REG) {
>> ret = 1;
>> goto out;
>> }
> Do you mean "|| left_type == BTRFS_FILE_EXTENT_PREALLOC"?

I see your point about bytenr==0, I missed that on the parent tree it
can be something else.

As for PREALLOC: can it happen that on differential send we see extent
of type BTRFS_FILE_EXTENT_PREALLOC? And can it happen that parent had
some real data extent in that place? I don't know the answer, but if
yes, then we must treat PREALLOC as normal extent. So this case is
similar to bytenr==0.

Thanks,
Alex.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH 6/7] Btrfs: introduce BTRFS_IOC_SEND for btrfs send/receive (part 1)

2012-07-25 Thread Alexander Block

Thanks for the review :)

On 07/18/2012 08:59 AM, Arne Jansen wrote:

On 04.07.2012 15:38, Alexander Block wrote:

This patch introduces the BTRFS_IOC_SEND ioctl that is
required for send. It allows btrfs-progs to implement
full and incremental sends. Patches for btrfs-progs will
follow.

I had to split the patch as it got larger then 100k which is
the limit for the mailing list. The first part only contains
the send.h header and the helper functions for TLV handling
and long path name handling and some other helpers. The second
part contains the actual send logic from send.c

Signed-off-by: Alexander Block
---
  fs/btrfs/Makefile |2 +-
  fs/btrfs/ioctl.h  |   10 +
  fs/btrfs/send.c   | 1009 +
  fs/btrfs/send.h   |  126 +++
  4 files changed, 1146 insertions(+), 1 deletion(-)
  create mode 100644 fs/btrfs/send.c
  create mode 100644 fs/btrfs/send.h

diff --git a/fs/btrfs/Makefile b/fs/btrfs/Makefile
index 0c4fa2b..f740644 100644
--- a/fs/btrfs/Makefile
+++ b/fs/btrfs/Makefile
@@ -8,7 +8,7 @@ btrfs-y += super.o ctree.o extent-tree.o print-tree.o 
root-tree.o dir-item.o \
   extent_io.o volumes.o async-thread.o ioctl.o locking.o orphan.o \
   export.o tree-log.o free-space-cache.o zlib.o lzo.o \
   compression.o delayed-ref.o relocation.o delayed-inode.o scrub.o \
-  reada.o backref.o ulist.o
+  reada.o backref.o ulist.o send.o

  btrfs-$(CONFIG_BTRFS_FS_POSIX_ACL) += acl.o
  btrfs-$(CONFIG_BTRFS_FS_CHECK_INTEGRITY) += check-integrity.o
diff --git a/fs/btrfs/ioctl.h b/fs/btrfs/ioctl.h
index c9e3fac..282bc64 100644
--- a/fs/btrfs/ioctl.h
+++ b/fs/btrfs/ioctl.h
@@ -304,6 +304,15 @@ struct btrfs_ioctl_received_subvol_args {
__u64   reserved[16];
  };

+struct btrfs_ioctl_send_args {
+   __s64 send_fd;  /* in */
+   __u64 clone_sources_count;  /* in */
+   __u64 __user *clone_sources;/* in */
+   __u64 parent_root;  /* in */
+   __u64 flags;/* in */
+   __u64 reserved[4];  /* in */
+};
+
  #define BTRFS_IOC_SNAP_CREATE _IOW(BTRFS_IOCTL_MAGIC, 1, \
   struct btrfs_ioctl_vol_args)
  #define BTRFS_IOC_DEFRAG _IOW(BTRFS_IOCTL_MAGIC, 2, \
@@ -371,6 +380,7 @@ struct btrfs_ioctl_received_subvol_args {

  #define BTRFS_IOC_SET_RECEIVED_SUBVOL _IOWR(BTRFS_IOCTL_MAGIC, 37, \
struct btrfs_ioctl_received_subvol_args)
+#define BTRFS_IOC_SEND _IOW(BTRFS_IOCTL_MAGIC, 38, struct 
btrfs_ioctl_send_args)

  #define BTRFS_IOC_GET_DEV_STATS _IOWR(BTRFS_IOCTL_MAGIC, 52, \
  struct btrfs_ioctl_get_dev_stats)
diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
new file mode 100644
index 000..47a2557
--- /dev/null
+++ b/fs/btrfs/send.c
@@ -0,0 +1,1009 @@
+/*
+ * Copyright (C) 2012 Alexander Block.  All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public
+ * License v2 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public
+ * License along with this program; if not, write to the
+ * Free Software Foundation, Inc., 59 Temple Place - Suite 330,
+ * Boston, MA 021110-1307, USA.
+ */
+
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+
+#include "send.h"
+#include "backref.h"
+#include "locking.h"
+#include "disk-io.h"
+#include "btrfs_inode.h"
+#include "transaction.h"
+
+static int g_verbose = 0;
+
+#define verbose_printk(...) if (g_verbose) printk(__VA_ARGS__)


Maybe pr_debug is interesting to you.

The advantage of this solution was that I could enable verbose output 
while I was in the debugger and single stepping. Not sure if I could do 
that with pr_debug. When send/receive gets stable and less debugging is 
needed, we can change this to pr_debug.

+
+/*
+ * A fs_path is a helper to dynamically build path names with unknown size.
+ * It reallocates the internal buffer on demand.
+ * It allows fast adding of path elements on the right side (normal path) and
+ * fast adding to the left side (reversed path). A reversed path can also be
+ * unreversed if needed.
+ */
+struct fs_path {
+   union {
+   struct {
+   char *start;
+   char *end;
+   char *prepared;
+
+   char *buf;
+   int buf_len;
+   int reversed:1;
+   int virtual_mem:1;


s/int/unsigned int/


Changed for the bitfields but not for buf_len.

+   char inline_buf[];
+

Re: [RFC PATCH 7/7] Btrfs: introduce BTRFS_IOC_SEND for btrfs send/receive (part 2)

2012-07-25 Thread Alexander Block
On Wed, Jul 25, 2012 at 7:20 PM, Alex Lyakas
 wrote:
> Alexander,
>
>>> Same is true for BTRFS_FILE_EXTENT_PREALLOC extents, I think. Those
>>> also don't contain real data.
>>> So something like:
>>> if (left_disknr == 0 || left_type == BTRFS_FILE_EXTENT_REG) {
>>> ret = 1;
>>> goto out;
>>> }
>> Do you mean "|| left_type == BTRFS_FILE_EXTENT_PREALLOC"?
>
> I see your point about bytenr==0, I missed that on the parent tree it
> can be something else.
>
> As for PREALLOC: can it happen that on differential send we see extent
> of type BTRFS_FILE_EXTENT_PREALLOC? And can it happen that parent had
> some real data extent in that place? I don't know the answer, but if
> yes, then we must treat PREALLOC as normal extent. So this case is
> similar to bytenr==0.
>
I also don't know if that may happen. Currently, only REG extents are
checked by is_extent_unchanged. All other types are regarded as
changed and will be sent. So in the worst case the stream gets larget
then it should be, but we won't loose data. I need to leave in a few
minutes and will continue working on btrfs send/receive v2 later
today. We should probably postpone "optimizations" (actually bug
fixing) here for later...don't know if I find enough time to
investigate more.

> Thanks,
> Alex.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/2] Btrfs: fix error path in create_pending_snapshot()

2012-07-25 Thread Miao Xie
This patch fixes the following problem:
- If we failed to deal with the delayed dir items, we should abort transaction,
  just as its comment said. Fix it.
- If root reference or root back reference insertion failed, we should
  abort transaction. Fix it.
- Do not restore the trans->rsv if we doesn't change it.
- make the error path more clearly.

Signed-off-by: Miao Xie 
---
 fs/btrfs/transaction.c |   38 +-
 1 files changed, 17 insertions(+), 21 deletions(-)

diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index b72b068..6d89603 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -932,18 +932,16 @@ static noinline int create_pending_snapshot(struct 
btrfs_trans_handle *trans,
u64 objectid;
u64 root_flags;
 
-   rsv = trans->block_rsv;
-
new_root_item = kmalloc(sizeof(*new_root_item), GFP_NOFS);
if (!new_root_item) {
ret = pending->error = -ENOMEM;
-   goto fail;
+   goto root_item_alloc_fail;
}
 
ret = btrfs_find_free_objectid(tree_root, &objectid);
if (ret) {
pending->error = ret;
-   goto fail;
+   goto no_free_objectid;
}
 
btrfs_reloc_pre_snapshot(trans, pending, &to_reserve);
@@ -953,7 +951,7 @@ static noinline int create_pending_snapshot(struct 
btrfs_trans_handle *trans,
  to_reserve);
if (ret) {
pending->error = ret;
-   goto fail;
+   goto no_free_objectid;
}
}
 
@@ -961,6 +959,7 @@ static noinline int create_pending_snapshot(struct 
btrfs_trans_handle *trans,
key.offset = (u64)-1;
key.type = BTRFS_ROOT_ITEM_KEY;
 
+   rsv = trans->block_rsv;
trans->block_rsv = &pending->block_rsv;
 
dentry = pending->dentry;
@@ -980,17 +979,16 @@ static noinline int create_pending_snapshot(struct 
btrfs_trans_handle *trans,
BTRFS_FT_DIR, index);
if (ret == -EEXIST) {
pending->error = -EEXIST;
-   dput(parent);
goto fail;
} else if (ret) {
-   goto abort_trans_dput;
+   goto abort_trans;
}
 
btrfs_i_size_write(parent_inode, parent_inode->i_size +
 dentry->d_name.len * 2);
ret = btrfs_update_inode(trans, parent_root, parent_inode);
if (ret)
-   goto abort_trans_dput;
+   goto abort_trans;
 
/*
 * pull in the delayed directory update
@@ -999,10 +997,8 @@ static noinline int create_pending_snapshot(struct 
btrfs_trans_handle *trans,
 * snapshot
 */
ret = btrfs_run_delayed_items(trans, root);
-   if (ret) { /* Transaction aborted */
-   dput(parent);
-   goto fail;
-   }
+   if (ret)/* Transaction aborted */
+   goto abort_trans;
 
record_root_in_trans(trans, root);
btrfs_set_root_last_snapshot(&root->root_item, trans->transid);
@@ -1021,7 +1017,7 @@ static noinline int create_pending_snapshot(struct 
btrfs_trans_handle *trans,
if (ret) {
btrfs_tree_unlock(old);
free_extent_buffer(old);
-   goto abort_trans_dput;
+   goto abort_trans;
}
 
btrfs_set_lock_blocking(old);
@@ -1031,7 +1027,7 @@ static noinline int create_pending_snapshot(struct 
btrfs_trans_handle *trans,
btrfs_tree_unlock(old);
free_extent_buffer(old);
if (ret)
-   goto abort_trans_dput;
+   goto abort_trans;
 
/* see comments in should_cow_block() */
root->force_cow = 1;
@@ -1044,7 +1040,7 @@ static noinline int create_pending_snapshot(struct 
btrfs_trans_handle *trans,
btrfs_tree_unlock(tmp);
free_extent_buffer(tmp);
if (ret)
-   goto abort_trans_dput;
+   goto abort_trans;
 
/*
 * insert root back/forward references
@@ -1053,9 +1049,8 @@ static noinline int create_pending_snapshot(struct 
btrfs_trans_handle *trans,
 parent_root->root_key.objectid,
 btrfs_ino(parent_inode), index,
 dentry->d_name.name, dentry->d_name.len);
-   dput(parent);
if (ret)
-   goto fail;
+   goto abort_trans;
 
key.offset = (u64)-1;
pending->snap = btrfs_read_fs_root_no_name(root->fs_info, &key);
@@ -1067,16 +1062,17 @@ static noinline int create_pending_snapshot(struct 
btrfs_trans_handle *trans,
ret = btrfs_reloc_post_snapshot(trans, pending);
if (ret)
goto abort_trans;
-   ret = 0;
 fail:
-   kfree(new_root_item);
+   dput(parent);
trans->block_rsv = rs

[PATCH 2/2] Btrfs: fix the snapshot that should not exist

2012-07-25 Thread Miao Xie
The snapshot should be the image of the fs tree before it was created,
so the metadata of the snapshot should not exist in the its tree. But now, we
found the directory item and directory name index is in both the snapshot tree
and the fs tree. It introduces some problem and makes the users feel strange:

 # mkfs.btrfs /dev/sda1
 # mount /dev/sda1 /mnt
 # mkdir /mnt/1
 # cd /mnt/1
 # btrfs subvolume snapshot /mnt snap0
 # ll /mnt/1
 total 0
 drwxr-xr-x 1 root root 10 Ju1 24 12:11 1
^^^
 # ll /mnt/1/snap0/
 total 0
 drwxr-xr-x 1 root root 10 Ju1 24 12:11 1
^^^
It is also 10, but...
 # ll /mnt/1/snap0/1
 total 0
 [None]
 # cd /mnt/1/snap0/1/snap0
 [Enter a unexisted directory successfully...]

There is nothing in the directory 1 in snap0, but btrfs told the length of
this directory is 10. Beside that, we can enter an unexisted directory, it is
very strange to the users.

 # btrfs subvolume snapshot /mnt/1/snap0 /mnt/snap1
 # ll /mnt/1/snap0/1/
 total 0
 [None]
 # ll /mnt/snap1/1/
 total 0
 drwxr-xr-x 1 root root 0 Ju1 24 12:14 snap0

And the source of snap1 did have any directory in Directory 1, but snap1 have
a snap0, it is different between the source and the snapshot.

So I think we should insert directory item and directory name index and update
the parent inode as the last step of snapshot creation, and do not leave the
useless metadata in the tree.

Signed-off-by: Miao Xie 
---
 fs/btrfs/transaction.c |   52 ++-
 1 files changed, 37 insertions(+), 15 deletions(-)

diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index 6d89603..e9eceee 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -922,6 +922,8 @@ static noinline int create_pending_snapshot(struct 
btrfs_trans_handle *trans,
struct btrfs_root *parent_root;
struct btrfs_block_rsv *rsv;
struct inode *parent_inode;
+   struct btrfs_path *path;
+   struct btrfs_dir_item *dir_item;
struct dentry *parent;
struct dentry *dentry;
struct extent_buffer *tmp;
@@ -932,6 +934,12 @@ static noinline int create_pending_snapshot(struct 
btrfs_trans_handle *trans,
u64 objectid;
u64 root_flags;
 
+   path = btrfs_alloc_path();
+   if (!path) {
+   ret = pending->error = -ENOMEM;
+   goto path_alloc_fail;
+   }
+
new_root_item = kmalloc(sizeof(*new_root_item), GFP_NOFS);
if (!new_root_item) {
ret = pending->error = -ENOMEM;
@@ -973,22 +981,20 @@ static noinline int create_pending_snapshot(struct 
btrfs_trans_handle *trans,
 */
ret = btrfs_set_inode_index(parent_inode, &index);
BUG_ON(ret); /* -ENOMEM */
-   ret = btrfs_insert_dir_item(trans, parent_root,
-   dentry->d_name.name, dentry->d_name.len,
-   parent_inode, &key,
-   BTRFS_FT_DIR, index);
-   if (ret == -EEXIST) {
+
+   /* check if there is a file/dir which has the same name. */
+   dir_item = btrfs_lookup_dir_item(NULL, parent_root, path,
+btrfs_ino(parent_inode),
+dentry->d_name.name,
+dentry->d_name.len, 0);
+   if (dir_item != NULL && !IS_ERR(dir_item)) {
pending->error = -EEXIST;
goto fail;
-   } else if (ret) {
+   } else if (IS_ERR(dir_item)) {
+   ret = PTR_ERR(dir_item);
goto abort_trans;
}
-
-   btrfs_i_size_write(parent_inode, parent_inode->i_size +
-dentry->d_name.len * 2);
-   ret = btrfs_update_inode(trans, parent_root, parent_inode);
-   if (ret)
-   goto abort_trans;
+   btrfs_release_path(path);
 
/*
 * pull in the delayed directory update
@@ -1062,17 +1068,33 @@ static noinline int create_pending_snapshot(struct 
btrfs_trans_handle *trans,
ret = btrfs_reloc_post_snapshot(trans, pending);
if (ret)
goto abort_trans;
+
+   ret = btrfs_insert_dir_item(trans, parent_root,
+   dentry->d_name.name, dentry->d_name.len,
+   parent_inode, &key,
+   BTRFS_FT_DIR, index);
+   /* We have check the name at the beginning, so it is impossible. */
+   BUG_ON(ret == -EEXIST);
+   if (ret)
+   goto abort_trans;
+
+   btrfs_i_size_write(parent_inode, parent_inode->i_size +
+dentry->d_name.len * 2);
+   ret = btrfs_update_inode(trans, parent_root, parent_inode);
+   if (ret)
+   goto abort_trans;
 fail:
dput(parent);
trans->block_rsv = rsv;
 no_free_objectid:
kfree(new_root_item);
 root_item_a