Hi Jim, Sorry for the delay.
This is the new patch to isolate the stuff regarding to extents reading to a new module. and teach cp(1) to make use of it. Changes: ======== extent-scan.h/extent-scan.c: 1. Removed all the *static* variables from this module. I'd like to introduce two new data structures "struct extent_scan" and "struct extent_info" which are used to reserve the information goes through a file reading and the info(logical offset/length/flags) regarding to each extent scan-returned separately, it is intended to work as a wrap for either fiemap or SEEK_DATA/SEEK_HOLE info. /* Structure used to reserve extent information. */ struct extent_info { /* Logical offset of an extent. */ off_t ext_logical; /* Extent length. */ uint64_t ext_length; /* Extent flags, use it for FIEMAP only. */ uint32_t ext_flags; }; /* Structure used to reserve extent scan information. */ struct extent_scan { /* File descriptor of extent scan run against. */ int fd; /* File name for debugging purpose. */ char *fname; /* Next extent scan start offset. */ off_t scan_start; /* How many extents info returned for a scan. */ uint32_t ei_count; /* If true, fall back to a normal copy, either set by the failure of ioctl(2) for FIEMAP or lseek(2) with SEEK_DATA. */ bool initial_scan_failed; /* If ture, the total extent scan per file has been finished. */ bool hit_last_extent; /* Extent information. */ struct extent_info *ext_info; }; 2. Functions in the new module: void open_extent_scan(int fd, char const *fname, struct extent_scan **scan); Call this function first, it do allocate the spaces for "extent_scan" and initialize some of entries in this structure if necessary. The returned variable will be used as the input argument of get_extents_info() which shown as below. bool get_extents_info(struct extent_scan *scan); maybe need to figure out a better name? This function accept an initialized extent_scan and allocated the spaces for extent_info based on the number of extents through either call fiemap ioctl(2) or lseek(2) SEEK_DATA/SEEK_HOLE stuff. Fill the extent_info with the logical offset and length per extent, and set the extent_info->ext_flags for fiemap, or just set it to ZERO for lseek. void close_extent_scan(struct extent_scan *scan); Call this function after every get_extents_info() done to release the memory of extent_info, but keep the extent_scan space. After a file copy done successfully, call it again to release the extent_scan. copy.c: ====== 1. Replace fill_with_holes_ok() with write_zeros (int fd, uint64_t len), but is it better to add an char const *src_name to this function for debugging purpose? 2. Replace fiemap_copy() with extent_copy() According to my tryout, it works fine, both make syntax-check and make distcheck also run passed. I hope you guys would like this implementation. :) Any comments are appreciated! >From 8ca00082e68b3bef4ee0ed042cb7a70dcaf154e8 Mon Sep 17 00:00:00 2001 From: Jie Liu <jeff....@oracle.com> Date: Sun, 26 Sep 2010 16:03:48 +0800 Subject: [PATCH 1/1] cp: add a new module for scanning extents * src/extent-scan.c: Source code for scanning extents. Call open_extent_scan() to allocate and return an initialized extent scan. Call get_extents_info() to get a number of extents for each iteration. Call close_extent_scan() to release the space allocated extent_info per extent scan, and extent_scan per file scan. * src/extent-scan.h: Header file of extent-scan.c. * src/Makefile.am: Reference it and link it to copy_source. * src/copy.c: Make use of the new module, replace fiemap_copy() with extent_copy(). Signed-off-by: Jie Liu <jeff....@oracle.com> --- src/Makefile.am | 2 +- src/copy.c | 186 +++++++++++++++++++++++++++++------------------------ src/extent-scan.c | 137 +++++++++++++++++++++++++++++++++++++++ src/extent-scan.h | 71 ++++++++++++++++++++ 4 files changed, 312 insertions(+), 84 deletions(-) create mode 100644 src/extent-scan.c create mode 100644 src/extent-scan.h diff --git a/src/Makefile.am b/src/Makefile.am index 7d56312..2412b16 100644 --- a/src/Makefile.am +++ b/src/Makefile.am @@ -459,7 +459,7 @@ uninstall-local: fi; \ fi -copy_sources = copy.c cp-hash.c +copy_sources = copy.c cp-hash.c extent-scan.c # Use `ginstall' in the definition of PROGRAMS and in dependencies to avoid # confusion with the `install' target. The install rule transforms `ginstall' diff --git a/src/copy.c b/src/copy.c index f48c74d..0c54365 100644 --- a/src/copy.c +++ b/src/copy.c @@ -35,6 +35,7 @@ #include "buffer-lcm.h" #include "copy.h" #include "cp-hash.h" +#include "extent-scan.h" #include "error.h" #include "fcntl--.h" #include "file-set.h" @@ -63,10 +64,6 @@ #include <sys/ioctl.h> -#ifndef HAVE_FIEMAP -# include "fiemap.h" -#endif - #ifndef HAVE_FCHOWN # define HAVE_FCHOWN false # define fchown(fd, uid, gid) (-1) @@ -153,74 +150,67 @@ clone_file (int dest_fd, int src_fd) #endif } -#ifdef __linux__ -# ifndef FS_IOC_FIEMAP -# define FS_IOC_FIEMAP _IOWR ('f', 11, struct fiemap) -# endif -/* Perform a FIEMAP copy, if possible. - Call ioctl(2) with FS_IOC_FIEMAP (available in linux 2.6.27) to - obtain a map of file extents excluding holes. This avoids the - overhead of detecting holes in a hole-introducing/preserving copy, - and thus makes copying sparse files much more efficient. Upon a - successful copy, return true. If the initial ioctl fails, set - *NORMAL_COPY_REQUIRED to true and return false. Upon any other - failure, set *NORMAL_COPY_REQUIRED to false and return false. */ static bool -fiemap_copy (int src_fd, int dest_fd, size_t buf_size, - off_t src_total_size, char const *src_name, - char const *dst_name, bool *normal_copy_required) +write_zeros (int fd, uint64_t n_bytes) { - bool last = false; - union { struct fiemap f; char c[4096]; } fiemap_buf; - struct fiemap *fiemap = &fiemap_buf.f; - struct fiemap_extent *fm_ext = &fiemap->fm_extents[0]; - enum { count = (sizeof fiemap_buf - sizeof *fiemap) / sizeof *fm_ext }; - verify (count != 0); + char *zeros = calloc (IO_BUFSIZE, sizeof (char)); + if (! zeros) + { + /* Try a small buffer. */ + static char zeros[1024]; + } + while (n_bytes) + { + uint64_t n = MIN (sizeof zeros, n_bytes); + if ((full_write (fd, zeros, n)) != n) + return false; + n_bytes -= n; + } + + return true; +} + +/* Perform an efficient extent copy, if possible. This avoids + the overhead of detecting holes in hole-introducing/preserving + copy, and thus makes copying sparse files much more efficient. + Upon a successful copy, return true. If the initial extent scan + fails, set *NORMAL_COPY_REQUIRED to true and return false. + Upon any other failure, set *NORMAL_COPY_REQUIRED to false and + return false. */ +static bool +extent_copy (int src_fd, int dest_fd, size_t buf_size, + off_t src_total_size, bool make_holes, + char const *src_name, char const *dst_name, + bool *require_normal_copy) +{ + struct extent_scan *scan; off_t last_ext_logical = 0; uint64_t last_ext_len = 0; uint64_t last_read_size = 0; - unsigned int i = 0; - *normal_copy_required = false; + unsigned int i; + bool ok = true; - /* This is required at least to initialize fiemap->fm_start, - but also serves (in mid 2010) to appease valgrind, which - appears not to know the semantics of the FIEMAP ioctl. */ - memset (&fiemap_buf, 0, sizeof fiemap_buf); + open_extent_scan (src_fd, src_name, &scan); do { - fiemap->fm_length = FIEMAP_MAX_OFFSET; - fiemap->fm_flags = FIEMAP_FLAG_SYNC; - fiemap->fm_extent_count = count; - - /* When ioctl(2) fails, fall back to the normal copy only if it - is the first time we met. */ - if (ioctl (src_fd, FS_IOC_FIEMAP, fiemap) < 0) + ok = get_extents_info (scan); + if (! ok) { - /* If the first ioctl fails, tell the caller that it is - ok to proceed with a normal copy. */ - if (i == 0) - *normal_copy_required = true; - else - { - /* If the second or subsequent ioctl fails, diagnose it, - since it ends up causing the entire copy/cp to fail. */ - error (0, errno, _("%s: FIEMAP ioctl failed"), quote (src_name)); - } + if (scan->hit_last_extent) + break; + + if (scan->initial_scan_failed) + *require_normal_copy = true; + return false; } - /* If 0 extents are returned, then more ioctls are not needed. */ - if (fiemap->fm_mapped_extents == 0) - break; - - for (i = 0; i < fiemap->fm_mapped_extents; i++) + for (i = 0; i < scan->ei_count; i++) { - assert (fm_ext[i].fe_logical <= OFF_T_MAX); - - off_t ext_logical = fm_ext[i].fe_logical; - uint64_t ext_len = fm_ext[i].fe_length; + off_t ext_logical = scan->ext_info[i].ext_logical; + uint64_t ext_len = scan->ext_info[i].ext_length; if (lseek (src_fd, ext_logical, SEEK_SET) < 0) { @@ -228,19 +218,30 @@ fiemap_copy (int src_fd, int dest_fd, size_t buf_size, return false; } - if (lseek (dest_fd, ext_logical, SEEK_SET) < 0) + if (make_holes) { - error (0, errno, _("cannot lseek %s"), quote (dst_name)); - return false; + if (lseek (dest_fd, ext_logical, SEEK_SET) < 0) + { + error (0, errno, _("cannot lseek %s"), quote (dst_name)); + return false; + } } - - if (fm_ext[i].fe_flags & FIEMAP_EXTENT_LAST) + else { - last_ext_logical = ext_logical; - last_ext_len = ext_len; - last = true; + /* If not making a sparse file, write zeros to the destination + file if there is a hole between the last and current extent. */ + if (last_ext_logical + last_ext_len < ext_logical) + { + uint64_t holes_len = ext_logical - last_ext_logical - last_ext_len; + if (! write_zeros (dest_fd, holes_len)) + return false; + } } + last_ext_logical = ext_logical; + last_ext_len = ext_len; + last_read_size = 0; + while (ext_len) { char buf[buf_size]; @@ -258,12 +259,12 @@ fiemap_copy (int src_fd, int dest_fd, size_t buf_size, continue; #endif error (0, errno, _("reading %s"), quote (src_name)); - return false; + return false; } if (n_read == 0) { - /* Figure out how many bytes read from the last extent. */ + /* Figure out how many bytes read from the previous extent. */ last_read_size = last_ext_len - ext_len; break; } @@ -278,27 +279,44 @@ fiemap_copy (int src_fd, int dest_fd, size_t buf_size, } } - fiemap->fm_start = fm_ext[i - 1].fe_logical + fm_ext[i - 1].fe_length; + /* Release the space allocated to scan->ext_info. */ + close_extent_scan (scan); + } while (! scan->hit_last_extent); - } while (! last); + /* Release the space allocated to scan->fname and scan. */ + close_extent_scan (scan); /* If a file ends up with holes, the sum of the last extent logical offset - and the read-returned size will be shorter than the actual size of the - file. Use ftruncate to extend the length of the destination file. */ - if (last_ext_logical + last_read_size < src_total_size) + and the read-returned size or the last extent length will be shorter than + the actual size of the file. Use ftruncate to extend the length of the + destination file if make_holes, or write zeros up to the actual size of the + file. */ + if (make_holes) { - if (ftruncate (dest_fd, src_total_size) < 0) + if (last_ext_logical + last_read_size < src_total_size) { - error (0, errno, _("failed to extend %s"), quote (dst_name)); - return false; + if (ftruncate (dest_fd, src_total_size) < 0) + { + error (0, errno, _("failed to extend %s"), quote (dst_name)); + return false; + } + } + } + else + { + if (last_ext_logical + last_ext_len < src_total_size) + { + uint64_t holes_len = src_total_size - last_ext_logical - last_ext_len; + if (0 < holes_len) + { + if (! write_zeros (dest_fd, holes_len)) + return false; + } } } return true; } -#else -static bool fiemap_copy (ignored) { errno == ENOTSUP; return false; } -#endif /* FIXME: describe */ /* FIXME: rewrite this to use a hash table so we avoid the quadratic @@ -833,11 +851,13 @@ copy_reg (char const *src_name, char const *dst_name, if (make_holes) { bool require_normal_copy; - /* Perform efficient FIEMAP copy for sparse files, fall back to the - standard copy only if the ioctl(2) fails. */ - if (fiemap_copy (source_desc, dest_desc, buf_size, - src_open_sb.st_size, src_name, - dst_name, &require_normal_copy)) + /* Perform efficient extent copy for sparse file, fall back to the + standard copy only if the initial extent scan fails. If the + '--sparse=never' option was specified, we writing all data but + use extent copy if available to efficiently read. */ + if (extent_copy (source_desc, dest_desc, buf_size, + src_open_sb.st_size, make_holes, + src_name, dst_name, &require_normal_copy)) goto preserve_metadata; else { diff --git a/src/extent-scan.c b/src/extent-scan.c new file mode 100644 index 0000000..ff7d622 --- /dev/null +++ b/src/extent-scan.c @@ -0,0 +1,137 @@ +/* extent-scan.c -- core functions for scanning extents + Copyright (C) 2010 Free Software Foundation, Inc. + + This program is free software: you can redistribute it and/or modify + it under the terms of the GNU General Public License as published by + the Free Software Foundation, either version 3 of the License, or + (at your option) any later version. + + This program is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + You should have received a copy of the GNU General Public License + along with this program. If not, see <http://www.gnu.org/licenses/>. + + Written by Jie Liu (jeff....@oracle.com). */ + +#include <config.h> +#include <stdio.h> +#include <sys/types.h> +#include <sys/ioctl.h> +#include <assert.h> + +#include "system.h" +#include "extent-scan.h" +#include "error.h" +#include "quote.h" + +#ifndef HAVE_FIEMAP +# include "fiemap.h" +#endif + +/* Allocate space for struct extent_scan, initialize the entries if + necessary and return it as the input argument of get_extents_info(). */ +extern void +open_extent_scan (int src_fd, char const *src_name, + struct extent_scan **scan) +{ + struct extent_scan *ext_scan = xmalloc (sizeof *ext_scan); + + ext_scan->fd = src_fd; + ext_scan->fname = xstrdup (src_name); + ext_scan->ei_count = 0; + ext_scan->scan_start = 0; + ext_scan->initial_scan_failed = false; + ext_scan->hit_last_extent = false; + + *scan = ext_scan; +} + +#ifdef __linux__ +# ifndef FS_IOC_FIEMAP +# define FS_IOC_FIEMAP _IOWR ('f', 11, struct fiemap) +# endif +/* Call ioctl(2) with FS_IOC_FIEMAP (available in linux 2.6.27) to + obtain a map of file extents excluding holes. */ +extern bool +get_extents_info (struct extent_scan *scan) +{ + union { struct fiemap f; char c[4096]; } fiemap_buf; + struct fiemap *fiemap = &fiemap_buf.f; + struct fiemap_extent *fm_extents = &fiemap->fm_extents[0]; + enum { count = (sizeof fiemap_buf - sizeof *fiemap) / sizeof *fm_extents }; + verify (count != 0); + unsigned int i; + + /* This is required at least to initialize fiemap->fm_start, + but also serves (in mid 2010) to appease valgrind, which + appears not to know the semantics of the FIEMAP ioctl. */ + memset (&fiemap_buf, 0, sizeof fiemap_buf); + + fiemap->fm_start = scan->scan_start; + fiemap->fm_flags = FIEMAP_FLAG_SYNC; + fiemap->fm_extent_count = count; + fiemap->fm_length = FIEMAP_MAX_OFFSET - scan->scan_start; + + /* Fall back to the standard copy if call ioctl(2) failed for the + the first time. */ + if (ioctl (scan->fd, FS_IOC_FIEMAP, fiemap) < 0) + { + error (0, errno, _("%s: FIEMAP ioctl failed"), quote (scan->fname)); + + if (scan->scan_start == 0) + scan->initial_scan_failed = true; + return false; + } + + /* If 0 extents are returned, then more get_extent_table() are not needed. */ + if (fiemap->fm_mapped_extents == 0) + { + scan->hit_last_extent = true; + return false; + } + + scan->ei_count = fiemap->fm_mapped_extents; + scan->ext_info = xcalloc (scan->ei_count, sizeof scan->ext_info[0]); + + for (i = 0; i < scan->ei_count; i++) + { + assert (fm_extents[i].fe_logical <= OFF_T_MAX); + + scan->ext_info[i].ext_logical = fm_extents[i].fe_logical; + scan->ext_info[i].ext_length = fm_extents[i].fe_length; + scan->ext_info[i].ext_flags = fm_extents[i].fe_flags; + } + + i--; + if (scan->ext_info[i].ext_flags & FIEMAP_EXTENT_LAST) + { + scan->hit_last_extent = true; + return true; + } + + scan->scan_start = fm_extents[i].fe_logical + fm_extents[i].fe_length; + + return true; +} +#else +extern bool get_extents_info (ignored) { errno = ENOTSUP; return false; } +#endif + +extern void +close_extent_scan (struct extent_scan *scan) +{ + if (scan) + { + /* If not hit the last extent, only release extent_info. */ + if (scan->ext_info && ! scan->hit_last_extent) + { + free (scan->ext_info); + return; + } + free (scan->fname); + free (scan); + } +} diff --git a/src/extent-scan.h b/src/extent-scan.h new file mode 100644 index 0000000..3caae88 --- /dev/null +++ b/src/extent-scan.h @@ -0,0 +1,71 @@ +/* core functions for efficient reading sparse files + Copyright (C) 2010 Free Software Foundation, Inc. + + This program is free software: you can redistribute it and/or modify + it under the terms of the GNU General Public License as published by + the Free Software Foundation, either version 3 of the License, or + (at your option) any later version. + + This program is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + You should have received a copy of the GNU General Public License + along with this program. If not, see <http://www.gnu.org/licenses/>. + + Written by Jie Liu (jeff....@oracle.com). */ + +#ifndef EXTENT_SCAN_H +# define EXTENT_SCAN_H + +/* Structure used to reserve information of each extent. */ +struct extent_info +{ + /* Logical offset of an extent. */ + off_t ext_logical; + + /* Extent length. */ + uint64_t ext_length; + + /* Extent flags, use it for FIEMAP only, or set it to zero. */ + uint32_t ext_flags; +}; + +/* Structure used to reserve extent scan information per file. */ +struct extent_scan +{ + /* File descriptor of extent scan run against. */ + int fd; + + /* File name for debugging purpose. */ + char *fname; + + /* Next scan start offset. */ + off_t scan_start; + + /* How many extent info returned for a scan. */ + uint32_t ei_count; + + /* If true, fall back to a normal copy, either + set by the failure of ioctl(2) for FIEMAP or + lseek(2) with SEEK_DATA. */ + bool initial_scan_failed; + + /* If ture, the total extent scan per file has been finished. */ + bool hit_last_extent; + + /* Extent information. */ + struct extent_info *ext_info; +}; + +void +open_extent_scan (int src_fd, char const *src_name, + struct extent_scan **scan); + +bool +get_extents_info (struct extent_scan *scan); + +void +close_extent_scan (struct extent_scan *scan); +#endif /* EXTENT_SCAN_H */ -- 1.5.4.3 Regards, -Jeff Jim Meyering wrote: > jeff.liu wrote: >> Hi Jim and All, >> >> Do you have any comments for the current implementation? > > Hi Jeff, > > Thanks for the reminder. > I've just gone back and looked at those patches: > > http://thread.gmane.org/gmane.comp.gnu.coreutils.bugs/20534/focus=21008 > > There are some superficial problems. > > First, whenever you move code from one place to another, > the commit that performs the move should be careful to > induce no other change. In this case, the change should > remove code from copy.c and create the new .c file with > code that is essentially identical. You'll have to remove > a "static" attribute on the primary function(s), and add > #include directives in the new file, but those are inevitable. > Also, in copy.c, you will remove the function body > and associated #include directives, adding an #include > of the new .h file. Of course, this change must also update > src/Makefile.am, and the result should pass "make distcheck". > > But perhaps you require new functions like init_extent_scan > in order to use the abstracted function properly. > In that case, your first commit would add the new functions > in copy.c and make use of them there. Then you would move > all of the functions to their new file in the next commit, > making no semantic change. > > Note however, that this copying code is intended to be usable in a > multi-threaded application, and thus must avoid using internal static > state. Your patch adds a few new static variables, each of which > would cause trouble in such an application: > > +static size_t current_scanned_extents_count = 0; > + static uint64_t next_map_start = 0; > + static size_t i = 0; > > While you're rearranging your patch along the lines above, > please eliminate those static variables, too. > > Also, this new function should be adjusted: > > +/* Write zeros as holes to the destination file. */ > +static bool > +fill_with_holes_ok (int dest_fd, const char *dst_name, > + char *buf, size_t buf_size, > + uint64_t holes_len) > > Its signature is unnecessarily complicated for a function > that does nothing more than write N zero bytes to a file descriptor. > Also, the function name is misleading (as is its holes_len parameter), > since it certainly does not create holes. > > Hmm... now I'm suspicious: could the second use of your fill_with_holes_ok > write from an uninitialized "buf"? It appears that is possible, > but I confess not to have applied the patch. > > What do you think of this signature, > > static bool > write_zeros (int fd, uint64_t n_bytes) > > That would require a buffer full of zeros, preferably of optimal size. > It could use a body like this, > > { > static char zero[32 * 1024]; > while (n_bytes) > { > uint64_t n = MIN (sizeof zero, n_bytes); > if (full_write (fd, zero, n)) != n) > return false; > n_bytes -= n; > } > return true; > } > > or even calloc an IO_BUFSIZE-byte buffer on the first call > and use that. Yes, using calloc appears better, since this code > will end up being used relatively infrequently. Or perhaps better > still, do use calloc, but if the allocation fails, fall back on > using a smaller static buffer, of size say 1024 bytes. > > Of course, simplifying the function means each caller > will have to diagnose failure, but imho, that's preferable > in this case. > > Jim > > >