Hello community, here is the log from the commit of package duperemove for openSUSE:Factory checked in at 2014-12-09 09:14:30 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ Comparing /work/SRC/openSUSE:Factory/duperemove (Old) and /work/SRC/openSUSE:Factory/.duperemove.new (New) ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Package is "duperemove" Changes: -------- --- /work/SRC/openSUSE:Factory/duperemove/duperemove.changes 2014-11-19 20:30:29.000000000 +0100 +++ /work/SRC/openSUSE:Factory/.duperemove.new/duperemove.changes 2014-12-09 09:14:07.000000000 +0100 @@ -1,0 +2,13 @@ +Tue Dec 9 04:12:43 UTC 2014 - mfas...@suse.com + +- Update to duperemove v0.09.beta5 + - Documentation updates + - FAQ and README are more relevant now + - added man pages for show-shared-extents and hashstats programs + - updated duperemove man page, and duperemove usage() function + - Have show-shared-extents take a file list as arguments. + - Change default of --lookup-extents option back to 'no' + - Write hash type into hashfile header, check against what hash we were + compiled with. + +------------------------------------------------------------------- Old: ---- duperemove-v0.09.beta3.tar.gz New: ---- duperemove-v0.09.beta5.tar.gz ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ Other differences: ------------------ ++++++ duperemove.spec ++++++ --- /var/tmp/diff_new_pack.s8n353/_old 2014-12-09 09:14:09.000000000 +0100 +++ /var/tmp/diff_new_pack.s8n353/_new 2014-12-09 09:14:09.000000000 +0100 @@ -17,13 +17,13 @@ %define modname duperemove -%define tar_version v0.09.beta3 +%define tar_version v0.09.beta5 Name: duperemove BuildRequires: gcc-c++ BuildRequires: glib2-devel BuildRequires: libgcrypt-devel -Version: 0.09~beta3 +Version: 0.09~beta5 Release: 0 Summary: Software to find duplicate extents in files and remove them License: GPL-2.0 @@ -50,8 +50,6 @@ %build make CFLAGS="%optflags" -make hashstats CFLAGS="%optflags" -make btrfs-extent-same CFLAGS="%optflags" %install mkdir -p %{buildroot}/%{_sbindir} @@ -61,6 +59,8 @@ cp %{_builddir}/%{modname}-%{tar_version}/%{samename} %{buildroot}/%{_sbindir} mkdir -p %{buildroot}%{_mandir}/man8 cp %{_builddir}/%{modname}-%{tar_version}/%{modname}.8 %{buildroot}/%{_mandir}/man8/ +cp %{_builddir}/%{modname}-%{tar_version}/hashstats.8 
%{buildroot}/%{_mandir}/man8/ +cp %{_builddir}/%{modname}-%{tar_version}/show-shared-extents.8 %{buildroot}/%{_mandir}/man8/ cp %{_builddir}/%{modname}-%{tar_version}/%{samename}.8 %{buildroot}/%{_mandir}/man8/ %files -n btrfs-extent-same @@ -70,10 +70,12 @@ %files %defattr(-, root, root) -%doc LICENSE README +%doc LICENSE README.md FAQ.md %{_sbindir}/duperemove %{_sbindir}/hashstats %{_sbindir}/show-shared-extents %{_mandir}/man?/%{modname}.8.gz +%{_mandir}/man?/hashstats.8.gz +%{_mandir}/man?/show-shared-extents.8.gz %changelog ++++++ duperemove-v0.09.beta3.tar.gz -> duperemove-v0.09.beta5.tar.gz ++++++ diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/duperemove-v0.09.beta3/FAQ.md new/duperemove-v0.09.beta5/FAQ.md --- old/duperemove-v0.09.beta3/FAQ.md 1970-01-01 01:00:00.000000000 +0100 +++ new/duperemove-v0.09.beta5/FAQ.md 2014-12-09 05:05:30.000000000 +0100 @@ -0,0 +1,53 @@ +# Duperemove: Frequently Asked Questions + +### Is there an upper limit to the amount of data duperemove can process? + +v0.08 of duperemove has been tested on small numbers of VMs or iso +files (5-10); it can probably scale up to 50 or so. + +v0.09 is much faster at hashing and cataloging extents and therefore +can handle a larger data set. My own testing is typically with a +filesystem of about 750 gigabytes and millions of files. + + +### Why does it not print out all duplicate extents? + +Internally duperemove is classifying extents based on various criteria +like length, number of identical extents, etc. The printout we give is +based on the results of that classification. + + +### How can I find out my space savings after a dedupe? + +Duperemove will print out an estimate of the saved space after a +dedupe operation for you. You can also do a df before the dedupe +operation, then a df about 60 seconds after the operation. 
It is +common for btrfs space reporting to be 'behind' while delayed updates +get processed, so an immediate df after deduping might not show any +savings. + + +### Why is the total deduped data report an estimate? + +At the moment duperemove can detect that some underlying extents are +shared with other files, but it can not resolve which files those +extents are shared with. + +Imagine duperemove is examining a series of files and it notes a shared +data region in one of them. That data could be shared with a file +outside of the series. Since duperemove can't resolve that information +it will account the shared data against our dedupe operation while in +reality, the kernel might deduplicate it further for us. + + +### Why are my files showing dedupe but my disk space is not shrinking? + +This is a little complicated, but it comes down to a feature in Btrfs +called _bookending_. The Btrfs wiki explains this in [detail] +(http://en.wikipedia.org/wiki/Btrfs#Extents). + +Essentially though, the underlying representation of an extent in +Btrfs can not be split (with small exception). So sometimes we can end +up in a situation where a file extent gets partially deduped (and the +extents marked as shared) but the underlying extent item is not freed +or truncated. 
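The "df before, df after" savings check described in the FAQ diff above can be sketched as a short script. This is an illustrative sketch, not part of the package: the duperemove invocation is commented out so the arithmetic can be demonstrated against any filesystem, and the real run would target your own btrfs mount point.

```shell
# Sketch of the FAQ's df-based savings estimate.
# In real use, set mnt to your btrfs mount point.
mnt=/
before=$(df -k --output=used "$mnt" | tail -n 1 | tr -d ' ')
# duperemove -dhr "$mnt/data"   # the actual dedupe run would go here
sleep 1                          # in practice, wait ~60s for btrfs delayed updates
after=$(df -k --output=used "$mnt" | tail -n 1 | tr -d ' ')
echo "approximate KiB freed: $((before - after))"
```

Without a real dedupe in between, the two df readings will be nearly identical; the point is only that an immediate reading can lag, so the second df should come well after the operation.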
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/duperemove-v0.09.beta3/Makefile new/duperemove-v0.09.beta5/Makefile --- old/duperemove-v0.09.beta3/Makefile 2014-11-17 20:07:48.000000000 +0100 +++ new/duperemove-v0.09.beta5/Makefile 2014-12-09 05:05:30.000000000 +0100 @@ -1,9 +1,9 @@ -RELEASE=v0.09.beta3 +RELEASE=v0.09.beta5 CC = gcc CFLAGS = -Wall -ggdb -MANPAGES=duperemove.8 btrfs-extent-same.8 +MANPAGES=duperemove.8 btrfs-extent-same.8 hashstats.8 show-shared-extents.8 CFILES=duperemove.c hash-tree.c results-tree.c rbtree.c dedupe.c filerec.c \ btrfs-util.c util.c serialize.c memstats.c @@ -16,8 +16,8 @@ HEADERS=csum.h hash-tree.h results-tree.h kernel.h list.h rbtree.h dedupe.h \ btrfs-ioctl.h filerec.h btrfs-util.h debug.h util.h serialize.h \ memstats.h -DIST_SOURCES:=$(DIST_CFILES) $(HEADERS) LICENSE Makefile rbtree.txt README \ - TODO $(MANPAGES) SubmittingPatches +DIST_SOURCES:=$(DIST_CFILES) $(HEADERS) LICENSE Makefile rbtree.txt README.md \ + TODO $(MANPAGES) SubmittingPatches FAQ.md DIST=duperemove-$(RELEASE) DIST_TARBALL=$(DIST).tar.gz TEMP_INSTALL_DIR:=$(shell mktemp -du -p .) diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/duperemove-v0.09.beta3/README new/duperemove-v0.09.beta5/README --- old/duperemove-v0.09.beta3/README 2014-11-17 20:07:48.000000000 +0100 +++ new/duperemove-v0.09.beta5/README 1970-01-01 01:00:00.000000000 +0100 @@ -1,95 +0,0 @@ -Duperemove - -Duperemove is a simple tool for finding duplicated extents and -submitting them for deduplication. When given a list of files it will -hash their contents on a block by block basis and compare those hashes -to each other, finding and categorizing extents that match each -other. When given the -d option, duperemove will submit those -extents for deduplication using the btrfs-extent-same ioctl. - -Duperemove has two major modes of operation one of which is a subset -of the other. 
- - -Readonly / Non-deduplicating Mode - -When run without -d (the default) duperemove will print out one or -more tables of matching extents it has determined would be ideal -candidates for deduplication. As a result, readonly mode is useful for -seeing what duperemove might do when run with '-d'. The output could -also be used by some other software to submit the extents for -deduplication at a later time. - -It is important to note that this mode will not print out *all* -instances of matching extents, just those it would consider for -deduplication. - -Generally, duperemove does not concern itself with the underlying -representation of the extents it processes. Some of them could be -compressed, undergoing I/O, or even have already been deduplicated. In -dedupe mode, the kernel handles those details and therefore we try not -to replicate that work. - - -Deduping Mode - -This functions similarly to readonly mode with the exception that the -duplicated extents found in our "read, hash, and compare" step will -actually be submitted for deduplication. An estimate of the total data -deduplicated will be printed after the operation is complete. This -estimate is calculated by comparing the total amount of shared bytes -in each file before and after the dedupe. - - -See the duperemove man page for further details about running duperemove. - - -REQUIREMENTS - -Kernel: Duperemove needs a kernel version equal to or greater than 2.6.33. - -Libraries: Duperemove uses libgcrypt for hashing. - - -FAQ - -* Is there an upper limit to the amount of data duperemove can process? - -Right now duperemove has been tested on small numbers of VMS or iso -files (5-10). I don't believe there should be a major problem scaling -that up to 50 or so. - - -* Why does it not print out all duplicate extents? - -Internally duperemove is classifying extents based on various criteria -like length, number of identical extents, etc. 
The printout we give is -based on the results of that classification. - - -* How can I find out my space savings after a dedupe? - -Duperemove will print out an estimate of the saved space after a -dedupe operation for you. You can also do a df before the dedupe -operation, then a df about 60 seconds after the operation. It is -common for btrfs space reporting to be 'behind' while delayed updates -get processed, so an immediate df after deduping might not show any -savings. - - -* Why is the total deduped data report an estimate? - -At the moment duperemove can detect that some underlying extents are -shared with other files, but it can not resolve which files those -extents are shared with. - -Imagine duperemove is examing a series of files and it notes a shared -data region in one of them. That data could be shared with a file -outside of the series. Since duperemove can't resolve that information -it will account the shared data against our dedupe operation while in -reality, the kernel might deduplicate it further for us. - - -USAGE EXAMPLES - -TODO diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/duperemove-v0.09.beta3/README.md new/duperemove-v0.09.beta5/README.md --- old/duperemove-v0.09.beta3/README.md 1970-01-01 01:00:00.000000000 +0100 +++ new/duperemove-v0.09.beta5/README.md 2014-12-09 05:05:30.000000000 +0100 @@ -0,0 +1,121 @@ +# Duperemove + +Duperemove is a simple tool for finding duplicated extents and +submitting them for deduplication. When given a list of files it will +hash their contents on a block by block basis and compare those hashes +to each other, finding and categorizing extents that match each +other. When given the -d option, duperemove will submit those +extents for deduplication using the btrfs-extent-same ioctl. + +Duperemove has two major modes of operation one of which is a subset +of the other. 
+ + +## Readonly / Non-deduplicating Mode + +When run without -d (the default) duperemove will print out one or +more tables of matching extents it has determined would be ideal +candidates for deduplication. As a result, readonly mode is useful for +seeing what duperemove might do when run with '-d'. The output could +also be used by some other software to submit the extents for +deduplication at a later time. + +It is important to note that this mode will not print out *all* +instances of matching extents, just those it would consider for +deduplication. + +Generally, duperemove does not concern itself with the underlying +representation of the extents it processes. Some of them could be +compressed, undergoing I/O, or even have already been deduplicated. In +dedupe mode, the kernel handles those details and therefore we try not +to replicate that work. + + +## Deduping Mode + +This functions similarly to readonly mode with the exception that the +duplicated extents found in our "read, hash, and compare" step will +actually be submitted for deduplication. An estimate of the total data +deduplicated will be printed after the operation is complete. This +estimate is calculated by comparing the total amount of shared bytes +in each file before and after the dedupe. + + +See the duperemove man page for further details about running duperemove. + + +# Requirements + +The latest stable code can be found in [v0.09-branch](https://github.com/markfasheh/duperemove/tree/v0.09-branch). + +Kernel: Duperemove needs a kernel version equal to or greater than 3.13 + +Libraries: Duperemove uses glib2 and optionally libgcrypt for hashing. + + +# FAQ + +Please see the FAQ file [provided in the duperemove +source](https://github.com/markfasheh/duperemove/blob/master/FAQ.md) + +# Usage Examples + +Duperemove takes a list of files and directories to scan for +dedupe. If a directory is specified, all regular files within it will +be scanned. 
Duperemove can also be told to recursively scan +directories with the '-r' switch. If '-h' is provided, duperemove will +print numbers in powers of 1024 (e.g., "128K"). + +Assume this arbitrary layout for the following examples. + + . + ├── dir1 + │ ├── file3 + │ ├── file4 + │ └── subdir1 + │ └── file5 + ├── file1 + └── file2 + +This will dedupe files 'file1' and 'file2': + + duperemove -dh file1 file2 + +This does the same but adds any files in dir1 (file3 and file4): + + duperemove -dh file1 file2 dir1 + +This will dedupe exactly the same as above but will recursively walk +dir1, thus adding file5: + + duperemove -dhr file1 file2 dir1/ + + +An actual run; output will differ according to duperemove version. + + duperemove -dhr file1 file2 dir1 + Using 128K blocks + Using hash: SHA256 + Using 2 threads for file hashing phase + csum: file1 [1/5] + csum: file2 [2/5] + csum: dir1/file3 [3/5] + csum: dir1/subdir1/file5 [4/5] + csum: dir1/file4 [5/5] + Hashed 80 blocks, resulting in 17 unique hashes. Calculating duplicate + extents - this may take some time. + [########################################] + Search completed with no errors. + Simple read and compare of file data found 2 instances of extents that might + benefit from deduplication. 
+ Start Length Filename (2 extents) + 0.0 2.0M "file2" + 0.0 2.0M "dir1//file4" + Start Length Filename (3 extents) + 0.0 2.0M "file1" + 0.0 2.0M "dir1//file3" + 0.0 2.0M "dir1//subdir1/file5" + Dedupe 1 extents with target: (0.0, 2.0M), "file2" + Dedupe 2 extents with target: (0.0, 2.0M), "file1" + Kernel processed data (excludes target files): 6.0M + Comparison of extent info shows a net change in shared extents of: 10.0M diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/duperemove-v0.09.beta3/csum-gcrypt.c new/duperemove-v0.09.beta5/csum-gcrypt.c --- old/duperemove-v0.09.beta3/csum-gcrypt.c 2014-11-17 20:07:48.000000000 +0100 +++ new/duperemove-v0.09.beta5/csum-gcrypt.c 2014-12-09 05:05:30.000000000 +0100 @@ -29,6 +29,8 @@ GCRY_THREAD_OPTION_PTHREAD_IMPL; unsigned int digest_len = 0; +#define HASH_TYPE "SHA256 " +char hash_type[8]; void checksum_block(char *buf, int len, unsigned char *digest) { @@ -59,6 +61,8 @@ if (!digest_len) return 1; + strncpy(hash_type, HASH_TYPE, 8); + abort_on(digest_len == 0 || digest_len > DIGEST_LEN_MAX); return 0; diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/duperemove-v0.09.beta3/csum-mhash.c new/duperemove-v0.09.beta5/csum-mhash.c --- old/duperemove-v0.09.beta3/csum-mhash.c 2014-11-17 20:07:48.000000000 +0100 +++ new/duperemove-v0.09.beta5/csum-mhash.c 2014-12-09 05:05:30.000000000 +0100 @@ -27,6 +27,8 @@ #define HASH_FUNC MHASH_SHA256 uint32_t digest_len = 0; +#define HASH_TYPE "SHA256 " +char hash_type[8]; void checksum_block(char *buf, int len, unsigned char *digest) { @@ -43,6 +45,8 @@ if (!digest_len) return 1; + strncpy(hash_type, HASH_TYPE, 8); + abort_on(digest_len == 0 || digest_len > DIGEST_LEN_MAX); return 0; diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/duperemove-v0.09.beta3/csum-test.c new/duperemove-v0.09.beta5/csum-test.c --- old/duperemove-v0.09.beta3/csum-test.c 2014-11-17 
20:07:48.000000000 +0100 +++ new/duperemove-v0.09.beta5/csum-test.c 2014-12-09 05:05:30.000000000 +0100 @@ -38,7 +38,7 @@ { char *fname = argv[1]; int fd, ret; - size_t len; + ssize_t len; struct stat s; struct running_checksum *csum; diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/duperemove-v0.09.beta3/csum.h new/duperemove-v0.09.beta5/csum.h --- old/duperemove-v0.09.beta3/csum.h 2014-11-17 20:07:48.000000000 +0100 +++ new/duperemove-v0.09.beta5/csum.h 2014-12-09 05:05:30.000000000 +0100 @@ -6,6 +6,7 @@ #define DIGEST_LEN_MAX 32 extern unsigned int digest_len; +extern char hash_type[8]; /* Init / debug */ int init_hash(void); diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/duperemove-v0.09.beta3/duperemove.8 new/duperemove-v0.09.beta5/duperemove.8 --- old/duperemove-v0.09.beta3/duperemove.8 2014-11-17 20:07:48.000000000 +0100 +++ new/duperemove-v0.09.beta5/duperemove.8 2014-12-09 05:05:30.000000000 +0100 @@ -93,22 +93,26 @@ .TP \fB\--lookup-extents=[yes|no]\fR -While checksumming a file, duperemove will lookup file extent -state. This information is later used to optimize the search for -duplicate extents. This defaults to on. However, if you use duperemove on a -subvolume that has been snapshotted you will want to read below. - -On btrfs, extents which have been snapshotted are reported as -shared. Internally duperemove considers shared extents as -deduped. When run on a subvolume with snapshots then, duperemove may -skip some or all extents, depending on when the most recent snapshot -was taken. - -The workaround is to run duperemove at least once with -\fB\--lookup-extents=no\fR so that it considers all extents for -dedupe. You can then run with extent lookups on until your next snapshot. +Defaults to no. While checksumming a file, duperemove can optionally +lookup file extent state to see whether a given file block is already +shared. 
This information can later be used to optimize the search for +duplicate extents. There are some caveats to this, so please read +below. + +On btrfs, extents which have been snapshotted are reported as shared, +as more than one inode points to them. A deduped extent also gets +reported as shared for the same reasons. Internally duperemove can not +yet make the distinction between the two. If \fB--lookup-extents\fR is +turned on, duperemove will consider a shared extent to have already +been deduped. On a snapshotted file system this might cause all or +most of the extents to be skipped for dedupe. + +If you are not making snapshots on the fs you are deduping, this +option will allow duperemove to make better decisions on which extents +to dedupe. -We plan to remove this restriction in a future version of duperemove. +A future version of duperemove will remove this restriction, allowing +us to default this option to on. .TP \fB\-?, --help\fR @@ -116,39 +120,7 @@ .SH "FAQ" -.B "Is there an upper limit to the amount of data duperemove can process?" - -Right now duperemove has been tested on small numbers of VMS or iso -files (5-10). I don't believe there should be a major problem scaling -that up to 50 or so. - -.B "Why does it not print out all duplicate extents?" - -Internally duperemove is classifying extents based on various criteria -like length, number of identical extents, etc. The printout we give is -based on the results of that classification. - -.B "How can I find out my space savings after a dedupe?" - -\fBDuperemove\fR will print out an estimate of the saved space after a -dedupe operation. You can also do a \fBdf\fR before the dedupe -operation, then a \fBdf\fR about 60 seconds after the operation. It is -common for \fIbtrfs\fR space reporting to be 'behind' while delayed -updates get processed, so an immediate df after deduping might not -show any savings. - -.B "Why is the total deduped data report an estimate?" 
- -At the moment \fBduperemove\fR can detect that some underlying extents are -shared with other files, but it can not resolve which files those -extents are shared with. - -Imagine \fBduperemove\fR is examing a series of files and it notes a -shared data region in one of them. That data could be shared with a -file outside of the series. Since \fBduperemove\fR can't resolve that -information it will account the shared data against our dedupe -operation while in reality, the kernel might deduplicate it further -for us. +Please see the \fBFAQ.md\fR file which should have been included with your duperemove package. .SH "NOTES" Deduplication is currently only supported by the \fIbtrfs\fR filesystem. diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/duperemove-v0.09.beta3/duperemove.c new/duperemove-v0.09.beta5/duperemove.c --- old/duperemove-v0.09.beta3/duperemove.c 2014-11-17 20:07:48.000000000 +0100 +++ new/duperemove-v0.09.beta5/duperemove.c 2014-12-09 05:05:30.000000000 +0100 @@ -70,11 +70,10 @@ static dev_t one_fs_dev = 0; static int write_hashes = 0; -static int scramble_filenames = 0; static int read_hashes = 0; static char *serialize_fname = NULL; static unsigned int hash_threads = 0; -static int do_lookup_extents = 1; +static int do_lookup_extents = 0; static int fancy_status = 0; @@ -597,16 +596,9 @@ printf("\t-h\t\tPrint numbers in human-readable format.\n"); printf("\t-x\t\tDon't cross filesystem boundaries.\n"); printf("\t-v\t\tBe verbose.\n"); - printf("\t--hash-threads=N\n\t\t\tUse N threads for hashing phase. " - "Default is automatically detected.\n"); - printf("\t--read-hashes=hashfile\n\t\t\tRead hashes from a hashfile. " - "A file list is not required with this option.\n"); - printf("\t--write-hashes=hashfile\n\t\t\tWrite hashes to a hashfile. " - "These can be read in at a later date and deduped from.\n"); - printf("\t--lookup-extents=[yes|no]\n\t\t\tLookup extent info during " - "checksum phase. 
Defaults to yes.\n"); printf("\t--debug\t\tPrint debug messages, forces -v if selected.\n"); printf("\t--help\t\tPrints this help text.\n"); + printf("\nPlease see the duperemove(8) manpage for more options.\n"); } static int add_file(const char *name, int dirfd); @@ -786,7 +778,6 @@ HELP_OPTION, VERSION_OPTION, WRITE_HASHES_OPTION, - WRITE_HASHES_SCRAMBLE_OPTION, READ_HASHES_OPTION, HASH_THREADS_OPTION, LOOKUP_EXTENTS_OPTION, @@ -804,7 +795,6 @@ { "help", 0, 0, HELP_OPTION }, { "version", 0, 0, VERSION_OPTION }, { "write-hashes", 1, 0, WRITE_HASHES_OPTION }, - { "write-hashes-scramble", 1, 0, WRITE_HASHES_SCRAMBLE_OPTION }, { "read-hashes", 1, 0, READ_HASHES_OPTION }, { "hash-threads", 1, 0, HASH_THREADS_OPTION }, { "lookup-extents", 1, 0, LOOKUP_EXTENTS_OPTION }, @@ -846,8 +836,6 @@ case 'h': human_readable = 1; break; - case WRITE_HASHES_SCRAMBLE_OPTION: - scramble_filenames = 1; case WRITE_HASHES_OPTION: write_hashes = 1; serialize_fname = strdup(optarg); @@ -1236,7 +1224,8 @@ fancy_status = 1; if (read_hashes) { - ret = read_hash_tree(serialize_fname, &tree, &blocksize, NULL); + ret = read_hash_tree(serialize_fname, &tree, &blocksize, NULL, + 0); if (ret == FILE_VERSION_ERROR) { fprintf(stderr, "Hash file \"%s\": " @@ -1250,6 +1239,12 @@ "Bad magic.\n", serialize_fname); goto out; + } else if (ret == FILE_HASH_TYPE_ERROR) { + fprintf(stderr, + "Hash file \"%s\": Unkown hash type \"%.*s\".\n" + "(we use \"%.*s\").\n", serialize_fname, + 8, unknown_hash_type, 8, hash_type); + goto out; } else if (ret) { fprintf(stderr, "Hash file \"%s\": " "Error %d while reading: %s.\n", @@ -1259,6 +1254,7 @@ } printf("Using %uK blocks\n", blocksize/1024); + printf("Using hash: %.*s\n", 8, hash_type); if (!read_hashes) { ret = populate_hash_tree(&tree); @@ -1271,8 +1267,7 @@ debug_print_tree(&tree); if (write_hashes) { - ret = serialize_hash_tree(serialize_fname, &tree, blocksize, - scramble_filenames); + ret = serialize_hash_tree(serialize_fname, &tree, blocksize); if (ret) 
fprintf(stderr, "Error %d while writing to hash file\n", ret); goto out; diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/duperemove-v0.09.beta3/filerec.c new/duperemove-v0.09.beta5/filerec.c --- old/duperemove-v0.09.beta3/filerec.c 2014-11-17 20:07:48.000000000 +0100 +++ new/duperemove-v0.09.beta5/filerec.c 2014-12-09 05:05:30.000000000 +0100 @@ -477,8 +477,10 @@ memset(fiemap, 0, sizeof(struct fiemap)); do { +#ifndef FILEREC_TEST dprintf("(fiemap) %s: start: %"PRIu64", len: %"PRIu64"\n", file->filename, start, len); +#endif /* * Do search from 0 to EOF. btrfs was doing some weird @@ -651,42 +653,40 @@ int main(int argc, char **argv) { - int ret; + int ret, i; struct filerec *file; - uint64_t loff, len; uint64_t shared = 0; init_filerec(); - /* test_filerec filename loff len */ - if (argc < 4) { - printf("Usage: filerec_test filename loff len\n"); - return 1; - } - - file = filerec_new(argv[1], 500, 1); /* Use made up ino */ - if (!file) { - fprintf(stderr, "filerec_new(): malloc error\n"); + if (argc < 2) { + printf("Usage: show_shared_extents filename1 filename2 ...\n"); return 1; } - ret = filerec_open(file, 0); - if (ret) - goto out; + for (i = 1; i < argc; i++) { + file = filerec_new(argv[i], 500 + i, 1); /* Use made up ino */ + if (!file) { + fprintf(stderr, "filerec_new(): malloc error\n"); + return 1; + } - loff = atoll(argv[2]); - len = atoll(argv[3]); + ret = filerec_open(file, 0); + if (ret) + goto out; + + ret = filerec_count_shared(file, 0, -1ULL, &shared); + filerec_close(file); + if (ret) { + fprintf(stderr, "fiemap error %d: %s\n", ret, strerror(ret)); + goto out; + } - ret = filerec_count_shared(file, loff, len, &shared); - if (ret) { - fprintf(stderr, "fiemap error %d: %s\n", ret, strerror(ret)); - goto out_close; + printf("%s: %"PRIu64" shared bytes\n", file->filename, shared); + filerec_free(file); + file = NULL; } - printf("%s: %"PRIu64" shared bytes\n", file->filename, shared); - -out_close: - 
filerec_close(file); out: filerec_free(file); return ret; diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/duperemove-v0.09.beta3/hashstats.8 new/duperemove-v0.09.beta5/hashstats.8 --- old/duperemove-v0.09.beta3/hashstats.8 1970-01-01 01:00:00.000000000 +0100 +++ new/duperemove-v0.09.beta5/hashstats.8 2014-12-09 05:05:30.000000000 +0100 @@ -0,0 +1,30 @@ +.TH "hashstats" "8" "March 2014" "Version 0.09" +.SH "NAME" +hashstats \- Print information about a duperemove hashfile +.SH "SYNOPSIS" +\fBhashstats\fR \fI[options]\fR \fIhashfile\fR +.SH "DESCRIPTION" +.PP +\fIhashfile\fR should be a file generated by running duperemove with +the --write-hashes option. + +.SH "OPTIONS" + +.TP +\fB\-n NUM\fR +Print top \fINUM\fR hashes, sorted by bucket size. Default is 10. + +.TP +\fB\-a\fR +Print all hashes (overrides \fB-n\fR, above) +.TP + +\fB\-b\fR +Print info on each block within our hash buckets. + +.TP +\fB\-l\fR +Print a list of all files + +.SH "SEE ALSO" +.BR duperemove(8) diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/duperemove-v0.09.beta3/hashstats.c new/duperemove-v0.09.beta5/hashstats.c --- old/duperemove-v0.09.beta3/hashstats.c 2014-11-17 20:07:48.000000000 +0100 +++ new/duperemove-v0.09.beta5/hashstats.c 2014-12-09 05:05:30.000000000 +0100 @@ -264,7 +264,7 @@ return EINVAL; } - ret = read_hash_tree(serialize_fname, &tree, &blocksize, &h); + ret = read_hash_tree(serialize_fname, &tree, &blocksize, &h, 0); if (ret == FILE_VERSION_ERROR) { fprintf(stderr, "Hash file \"%s\": " diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/duperemove-v0.09.beta3/serialize.c new/duperemove-v0.09.beta5/serialize.c --- old/duperemove-v0.09.beta3/serialize.c 2014-11-17 20:07:48.000000000 +0100 +++ new/duperemove-v0.09.beta5/serialize.c 2014-12-09 05:05:30.000000000 +0100 @@ -39,6 +39,9 @@ #include "serialize.h" +char unknown_hash_type[8]; +#define 
hash_type_v1_0 "\0\0\0\0\0\0\0\0" + #if __BYTE_ORDER == __LITTLE_ENDIAN #define swap16(_x) ((uint16_t)_x) #define swap32(_x) ((uint32_t)_x) @@ -60,6 +63,7 @@ dprintf("num_files: %"PRIu64"\t", h->num_files); dprintf("num_hashes: %"PRIu64"\t", h->num_hashes); dprintf("block_size: %u\t", h->block_size); + dprintf("hash_type: %.*s\t", 8, h->hash_type); dprintf(" ]\n"); } @@ -91,6 +95,7 @@ disk.num_files = swap64(h->num_files); disk.num_hashes = swap64(h->num_hashes); disk.block_size = swap32(h->block_size); + memcpy(&disk.hash_type, hash_type, 8); ret = lseek(fd, 0, SEEK_SET); if (ret == (loff_t)-1) @@ -104,28 +109,10 @@ return 0; } -/* name scrambling code taken from e2fsprogs */ -static int name_id[256]; -static void scramble_name(char *name, int len) -{ - int id; - char *cp = name; - - memset(cp, 'A', len); - id = name_id[len]++; - while ((len > 0) && (id > 0)) { - *cp += id % 26; - id = id / 26; - cp++; - len--; - } -} - -static int write_file_info(int fd, struct filerec *file, int scramble) +static int write_file_info(int fd, struct filerec *file) { int written, name_len; struct file_info finfo = { 0, }; - char fname[PATH_MAX+1]; char *n; finfo.ino = swap64(file->inum); @@ -144,11 +131,6 @@ return EIO; n = file->filename; - if (scramble) { - strcpy(fname, file->filename); - n = fname; - scramble_name(n, name_len); - } written = write(fd, n, name_len); if (written == -1) @@ -179,7 +161,7 @@ } int serialize_hash_tree(char *filename, struct hash_tree *tree, - unsigned int block_size, int scramble) + unsigned int block_size) { int ret, fd; struct hash_file_header *h = calloc(1, sizeof(*h)); @@ -206,7 +188,7 @@ if (list_empty(&file->block_list)) continue; - ret = write_file_info(fd, file, scramble); + ret = write_file_info(fd, file); if (ret) goto out; tot_files++; @@ -332,12 +314,14 @@ h->num_files = swap64(disk.num_files); h->num_hashes = swap64(disk.num_hashes); h->block_size = swap32(disk.block_size); + memcpy(&h->hash_type, &disk.hash_type, 8); return 0; } int 
read_hash_tree(char *filename, struct hash_tree *tree, - unsigned int *block_size, struct hash_file_header *ret_hdr) + unsigned int *block_size, struct hash_file_header *ret_hdr, + int ignore_hash_type) { int ret, fd; uint32_t i; @@ -361,6 +345,22 @@ goto out; } + if (!ignore_hash_type) { + /* + * v1.0 hash files were SHA256 but wrote out hash_type + * as nulls + */ + if (h.minor == 0 && memcmp(hash_type_v1_0, h.hash_type, 8)) { + ret = FILE_HASH_TYPE_ERROR; + memcpy(unknown_hash_type, hash_type_v1_0, 8); + goto out; + } else if (h.minor > 0 && memcmp(h.hash_type, hash_type, 8)) { + ret = FILE_HASH_TYPE_ERROR; + memcpy(unknown_hash_type, h.hash_type, 8); + goto out; + } + } + *block_size = h.block_size; dprintf("Load %"PRIu64" files from \"%s\"\n", diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/duperemove-v0.09.beta3/serialize.h new/duperemove-v0.09.beta5/serialize.h --- old/duperemove-v0.09.beta3/serialize.h 2014-11-17 20:07:48.000000000 +0100 +++ new/duperemove-v0.09.beta5/serialize.h 2014-12-09 05:05:30.000000000 +0100 @@ -17,7 +17,7 @@ #define __SERIALIZE__ #define HASH_FILE_MAJOR 1 -#define HASH_FILE_MINOR 0 +#define HASH_FILE_MINOR 1 #define HASH_FILE_MAGIC "dupehash" struct hash_file_header { @@ -28,7 +28,8 @@ /*20*/ uint64_t num_hashes; uint32_t block_size; /* In bytes */ uint32_t pad0; - uint64_t pad1[10]; + char hash_type[8]; + uint64_t pad1[9]; }; #define DISK_DIGEST_LEN 32 @@ -53,11 +54,14 @@ }; int serialize_hash_tree(char *filename, struct hash_tree *tree, - unsigned int block_size, int scramble); + unsigned int block_size); #define FILE_VERSION_ERROR 1001 #define FILE_MAGIC_ERROR 1002 +#define FILE_HASH_TYPE_ERROR 1003 +extern char unknown_hash_type[8]; int read_hash_tree(char *filename, struct hash_tree *tree, - unsigned int *block_size, struct hash_file_header *ret_hdr); + unsigned int *block_size, struct hash_file_header *ret_hdr, + int ignore_hash_type); #endif /* __SERIALIZE__ */ diff -urN 
'--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/duperemove-v0.09.beta3/show-shared-extents.8 new/duperemove-v0.09.beta5/show-shared-extents.8 --- old/duperemove-v0.09.beta3/show-shared-extents.8 1970-01-01 01:00:00.000000000 +0100 +++ new/duperemove-v0.09.beta5/show-shared-extents.8 2014-12-09 05:05:30.000000000 +0100 @@ -0,0 +1,15 @@ +.TH "show-shared-extents" "8" "December 2014" "Version 0.09" +.SH "NAME" +show-shared-extents \- Show extents that are shared. +.SH "SYNOPSIS" +\fBshow-shared-extents\fR \fIfiles...\fR +.SH "DESCRIPTION" +.PP +Print all the extents in \fIfiles\fR that are shared. A sum of shared +extents is also printed. + +On btrfs, an extent is reported as shared if it has more than one reference. + +.SH "SEE ALSO" +.BR duperemove(8) +.BR btrfs(8) -- To unsubscribe, e-mail: opensuse-commit+unsubscr...@opensuse.org For additional commands, e-mail: opensuse-commit+h...@opensuse.org