On Thu, Jan 24, 2013 at 07:39:17PM +0800, Miao Xie wrote:
> On thu, 24 Jan 2013 10:53:17 +0200, Alex Lyakas wrote:
> > Hi Chen,
> > with all due respect, what do you mean by "I see" and "OK for users"?
> > The semantics of the fallocate API with/without the
> > FALLOC_FL_PUNCH_HOLE flag is not defined by me or you. As far as I
> > understand, the file system *must* adhere to the semantics of this
> > API, as defined elsewhere.
> > Looking, for example, here:
> > http://man7.org/linux/man-pages/man2/fallocate.2.html
> > 
> > "Allocating disk space
> > The default operation (i.e., mode is zero) of fallocate() allocates
> > and initializes to zero the disk space within the range specified by
> > offset and len."
> > 
> > "Deallocating file space
> > Specifying the FALLOC_FL_PUNCH_HOLE flag (available since Linux
> > 2.6.38) in mode deallocates space (i.e., creates a hole) in the byte
> > range starting at offset and continuing for len bytes.  Within the
> > specified range, partial file system blocks are zeroed, and whole file
> > system blocks are removed from the file."
> > 
> > These are clearly two different modes of operation, and I don't think
> > you or me can decide otherwise, at this point.
> 
> I think send/receive commands just need ensure the content of the file is
> right, not the disk format. That is also the reason why the developers just
> send out the zero data when they meet a hole or pre-allocated extent. From
> this viewpoint, they are the same.

   I disagree on the "content vs representation" comment above. I
would very much hope that the reference receive implementation can
turn a string of zeroes (or a hole) back into a sparse file. As a
user, I'd be quite irritated if, say, one of my sparse VM images ended
up 5 times the size when I backed it up with send/receive, simply
because it's gone from holey to a huge bunch of zeroes. That
particular issue can be dealt with at the receiver level, though.

   Hugo.

> > However, I may be not knowledgeable enough to confirm this.
> > Jan/Alexander, can you perhaps comment on this?
> > 
> > Thanks,
> > Alex.
> > 
> > 
> > 
> > 
> > 
> > 
> > 
> > On Thu, Jan 24, 2013 at 3:54 AM, Chen Yang <chenyang.f...@cn.fujitsu.com> 
> > wrote:
> >> 于 2013-1-23 19:56, Alex Lyakas 写道:
> >>> Hi,
> >>>
> >>> On Wed, Jan 23, 2013 at 1:04 PM, Chen Yang <chenyang.f...@cn.fujitsu.com> 
> >>> wrote:
> >>>> From: Chen Yang <chenyang.f...@cn.fujitsu.com>
> >>>> Date: Wed, 23 Jan 2013 11:21:51 +0800
> >>>> Subject: [PATCH] Btrfs/send: sparse and pre-allocated file support for
> >>>>  btrfs-send mechanism
> >>>>
> >>>> When sending a file with sparse or pre-allocated part,
> >>>> these parts will be sent as ZERO streams, and it's unnecessary.
> >>>>
> >>>> There are two ways to improve this, one is just skip the EMPTY parts,
> >>>> and the other one is to add a punch command to send, when an EMPTY parts
> >>>> was detected. But considering a case of incremental sends, if we choose
> >>>> the first one, when a hole got punched into the file after the initial
> >>>> send, the data will be unchanged on the receiving side when received
> >>>> incrementally. So the second choice is right.
> >>>>
> >>>> Signed-off-by: Cheng Yang <chenyang.f...@cn.fujitsu.com>
> >>>> ---
> >>>>  fs/btrfs/send.c |   60 
> >>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++-
> >>>>  fs/btrfs/send.h |    3 +-
> >>>>  2 files changed, 61 insertions(+), 2 deletions(-)
> >>>>
> >>>> diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
> >>>> index 5445454..31e9aef 100644
> >>>> --- a/fs/btrfs/send.c
> >>>> +++ b/fs/btrfs/send.c
> >>>> @@ -3585,6 +3585,52 @@ out:
> >>>>         return ret;
> >>>>  }
> >>>>
> >>>> +static int send_punch(struct send_ctx *sctx, u64 offset, u32 len)
> >>>> +{
> >>>> +       int ret = 0;
> >>>> +       struct fs_path *p;
> >>>> +       mm_segment_t old_fs;
> >>>> +
> >>>> +       p = fs_path_alloc(sctx);
> >>>> +       if (!p)
> >>>> +               return -ENOMEM;
> >>>> +
> >>>> +       /*
> >>>> +        * vfs normally only accepts user space buffers for security 
> >>>> reasons.
> >>>> +        * we only read from the file and also only provide the read_buf 
> >>>> buffer
> >>>> +        * to vfs. As this buffer does not come from a user space call, 
> >>>> it's
> >>>> +        * ok to temporary allow kernel space buffers.
> >>>> +        */
> >>>> +       old_fs = get_fs();
> >>>> +       set_fs(KERNEL_DS);
> >>>> +
> >>>> +verbose_printk("btrfs: send_fallocate offset=%llu, len=%d\n", offset, 
> >>>> len);
> >>>> +
> >>>> +       ret = open_cur_inode_file(sctx);
> >>>> +       if (ret < 0)
> >>>> +               goto out;
> >>>> +
> >>>> +       ret = begin_cmd(sctx, BTRFS_SEND_C_PUNCH);
> >>>> +       if (ret < 0)
> >>>> +               goto out;
> >>>> +
> >>>> +       ret = get_cur_path(sctx, sctx->cur_ino, sctx->cur_inode_gen, p);
> >>>> +       if (ret < 0)
> >>>> +               goto out;
> >>>> +
> >>>> +       TLV_PUT_PATH(sctx, BTRFS_SEND_A_PATH, p);
> >>>> +       TLV_PUT_U64(sctx, BTRFS_SEND_A_FILE_OFFSET, offset);
> >>>> +       TLV_PUT_U64(sctx, BTRFS_SEND_A_SIZE, len);
> >>>> +
> >>>> +       ret = send_cmd(sctx);
> >>>> +
> >>>> +tlv_put_failure:
> >>>> +out:
> >>>> +       fs_path_free(sctx, p);
> >>>> +       set_fs(old_fs);
> >>>> +       return ret;
> >>>> +}
> >>>> +
> >>>>  /*
> >>>>   * Read some bytes from the current inode/file and send a write command 
> >>>> to
> >>>>   * user space.
> >>>> @@ -3718,6 +3764,7 @@ static int send_write_or_clone(struct send_ctx 
> >>>> *sctx,
> >>>>         u64 pos = 0;
> >>>>         u64 len;
> >>>>         u32 l;
> >>>> +       u64 bytenr;
> >>>>         u8 type;
> >>>>
> >>>>         ei = btrfs_item_ptr(path->nodes[0], path->slots[0],
> >>>> @@ -3731,8 +3778,19 @@ static int send_write_or_clone(struct send_ctx 
> >>>> *sctx,
> >>>>                  * sure to send the whole thing
> >>>>                  */
> >>>>                 len = PAGE_CACHE_ALIGN(len);
> >>>> -       } else {
> >>>> +       } else if (type == BTRFS_FILE_EXTENT_REG) {
> >>>>                 len = btrfs_file_extent_num_bytes(path->nodes[0], ei);
> >>>> +               bytenr = btrfs_file_extent_disk_bytenr(path->nodes[0], 
> >>>> ei);
> >>>> +               if (bytenr == 0) {
> >>>> +                       ret = send_punch(sctx, offset, len);
> >>>> +                       goto out;
> >>>> +               }
> >>>> +       } else if (type == BTRFS_FILE_EXTENT_PREALLOC) {
> >>>> +               len = btrfs_file_extent_num_bytes(path->nodes[0], ei);
> >>>> +               ret = send_punch(sctx, offset, len);
> >>>> +               goto out;
> >>>> +       } else {
> >>>> +               BUG();
> >>>>         }
> >>>
> >>> Are these two cases really the same? In the bytenr == 0 we want to
> >>> deallocate the range. While in the prealloc case, we want to ensure
> >>> disk space allocation. Or am I mistaken?
> >>> Looking at the receive side, you use the same command for both:
> >>> ret = fallocate(r->write_fd,
> >>>                FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
> >>>                offset, len);
> >>>
> >>> Looking at btrfs_fallocate code, in that case it will always punch the 
> >>> hole:
> >>> static long btrfs_fallocate(struct file *file, int mode,
> >>>                           loff_t offset, loff_t len)
> >>> ...
> >>>       if (mode & FALLOC_FL_PUNCH_HOLE)
> >>>               return btrfs_punch_hole(inode, offset, len);
> >>>
> >>> So maybe you should have two different commands, or add a flag to
> >>> distinguish between the two cases?
> >>>
> >>> Thanks,
> >>> Alex.
> >>>
> >>
> >> I see sparse and pre-allocated parts both as EMPTY hole,
> >> and although different for a file, it's OK for users.
> >>
> >> Thanks
> >>
> >>>
> >>>
> >>>>
> >>>>         if (offset + len > sctx->cur_inode_size)
> >>>> diff --git a/fs/btrfs/send.h b/fs/btrfs/send.h
> >>>> index 1bf4f32..659ac8f 100644
> >>>> --- a/fs/btrfs/send.h
> >>>> +++ b/fs/btrfs/send.h
> >>>> @@ -20,7 +20,7 @@
> >>>>  #include "ctree.h"
> >>>>
> >>>>  #define BTRFS_SEND_STREAM_MAGIC "btrfs-stream"
> >>>> -#define BTRFS_SEND_STREAM_VERSION 1
> >>>> +#define BTRFS_SEND_STREAM_VERSION 2
> >>>>
> >>>>  #define BTRFS_SEND_BUF_SIZE (1024 * 64)
> >>>>  #define BTRFS_SEND_READ_SIZE (1024 * 48)
> >>>> @@ -80,6 +80,7 @@ enum btrfs_send_cmd {
> >>>>         BTRFS_SEND_C_WRITE,
> >>>>         BTRFS_SEND_C_CLONE,
> >>>>
> >>>> +       BTRFS_SEND_C_PUNCH,
> >>>>         BTRFS_SEND_C_TRUNCATE,
> >>>>         BTRFS_SEND_C_CHMOD,
> >>>>         BTRFS_SEND_C_CHOWN,

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
  --- I gave up smoking, drinking and sex once. It was the scariest ---  
                         20 minutes of my life.                          

Attachment: signature.asc
Description: Digital signature

Reply via email to