Re: du man page -- --separate-dirs explanation

2013-06-12 Thread Pádraig Brady
On 06/06/2013 10:00 PM, C de-Avillez wrote:
 Hello,
 
 This comes from https://launchpad.net/bugs/1187044. The OP (although
 completely confused initially) raises a point with which I sort of agree.
 The man page for 'du' states, for --separate-dirs, "do not include size of
 subdirectories"; but it is not really clear what that means -- a first read
 left me a bit confused as well.
 
 On the other hand, I am having problems trying to convey something
 clearer.
 
 separate directory size from directory contents
 output size of directory separate from size of directory contents

Well that's a bit ambiguous too, since non-directory entries
are included with --separate-dirs.

 
 I think something like this could be used, but I would like some input before 
 going on.

I see the original confusion in the man page,
also the texinfo was a bit confusing to me.
So hopefully the attached clarifies both appropriately.

thanks,
Pádraig.

From 245206d430a9b06eebf9bd3d55cff7a341bdeee7 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?P=C3=A1draig=20Brady?= p...@draigbrady.com
Date: Wed, 12 Jun 2013 11:40:25 +0100
Subject: [PATCH] doc: clarify the description of du --separate-dirs

* src/du.c (usage): Clarify that --separate-dirs doesn't exclude
all directories.
* doc/coreutils.texi (du invocation): Avoid implying that -S
excludes the size of any non directory entries for a directory.
Also don't mention st_size as it's dependent on --apparent-size.
Reported by C de-Avillez in https://launchpad.net/bugs/1187044
---
 doc/coreutils.texi |    3 +--
 src/du.c           |    2 +-
 2 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/doc/coreutils.texi b/doc/coreutils.texi
index d607eaf..a325bd0 100644
--- a/doc/coreutils.texi
+++ b/doc/coreutils.texi
@@ -11436,8 +11436,7 @@ Normally, in the output of @command{du} (when not using @option{--summarize}),
 the size listed next to a directory name, @var{d}, represents the sum
 of sizes of all entries beneath @var{d} as well as the size of @var{d} itself.
 With @option{--separate-dirs}, the size reported for a directory name,
-@var{d}, is merely the @code{stat.st_size}-derived size of the directory
-entry, @var{d}.
+@var{d}, will exclude the size of any subdirectories.
 
 @optSi
 
diff --git a/src/du.c b/src/du.c
index a80a177..1aa5a16 100644
--- a/src/du.c
+++ b/src/du.c
@@ -315,7 +315,7 @@ Summarize disk usage of each FILE, recursively for directories.\n\
 ), stdout);
   fputs (_(\
   -P, --no-dereference  don't follow any symbolic links (this is the default)\n\
-  -S, --separate-dirs   do not include size of subdirectories\n\
+  -S, --separate-dirs   for directories do not include size of subdirectories\n\
   --si  like -h, but use powers of 1000 not 1024\n\
   -s, --summarize   display only a total for each argument\n\
 ), stdout);
-- 
1.7.7.6
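To make the -S semantics concrete, here is a small illustration (the demo/ layout is made up for this example; the exact numbers reported depend on the filesystem's block size):

```shell
# Hypothetical layout: one file directly under demo/, one in a subdirectory.
mkdir -p demo/sub
dd if=/dev/urandom of=demo/top.dat bs=4096 count=256 2>/dev/null      # 1 MiB under demo/ itself
dd if=/dev/urandom of=demo/sub/deep.dat bs=4096 count=256 2>/dev/null # 1 MiB under demo/sub/

du -k demo    # the demo line sums demo itself, top.dat, and all of demo/sub
du -kS demo   # the demo line drops demo/sub, but still includes top.dat
```

So -S excludes subdirectories from a directory's total, while non-directory entries such as top.dat are still counted -- which is exactly the distinction the patch tries to spell out.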



Re: du man page -- --separate-dirs explanation

2013-06-12 Thread C de-Avillez
On Wed, Jun 12, 2013 at 7:00 AM, Pádraig Brady p...@draigbrady.com wrote:

 On 06/06/2013 10:00 PM, C de-Avillez wrote:
  Hello,
 
  This comes from https://launchpad.net/bugs/1187044. The OP (although
 completely confused initially) raises a point with which I sort of agree.
 The man page for 'du' states, for --separate-dirs, "do not include size of
 subdirectories"; but it is not really clear what that means -- a first read
 left me a bit confused as well.
 
  On the other hand, I am having problems trying to convey something
 clearer.
 
  separate directory size from directory contents
  output size of directory separate from size of directory contents

 Well that's a bit ambiguous too, since non-directory entries
 are included with --separate-dirs.

 
  I think something like this could be used, but I would like some input
 before going on.

 I see the original confusion in the man page,
 also the texinfo was a bit confusing to me.
 So hopefully the attached clarifies both appropriately.

 thanks,
 Pádraig.


I think it does -- at least also for me. And, yes, my tries were also
confused (as, I guess, I was).

Thank you, Pádraig.

-- 
..hggdh..


Re: [PATCH] Add wipename option to shred

2013-06-12 Thread Joseph D. Wagner

On 06/11/2013 4:36 pm, Pádraig Brady wrote:


On 06/11/2013 07:20 AM, Joseph D. Wagner wrote:


Currently, when --remove (-u) is specified, shred overwrites the file
name once for each character, so a file name of 0123456789 would be
overwritten 10 times. While this may be the most secure, it is also the
most time consuming, as each of the 10 renames has its own fsync. Also,
renaming may not be as effective on some journaled file systems. This
patch adds the option --wipename (-w), which accepts the options:

* perchar - overwrite the file name once per character; same as current
  behavior.
* once - overwrite the file name once in total.
* none - skip overwriting the file name entirely; just unlink.

If --remove is specified but not --wipename, perchar is assumed,
preserving current behavior. Specifying --wipename implies --remove. In
theory, this should provide improved performance for those who choose
it, especially when deleting many small files. I am currently testing
performance on my system, but I wanted to get the ball rolling by
soliciting your comments and your receptiveness to accepting this
patch. Thoughts?


Thanks for the patch.
While on the face of it, the extra control seems beneficial,
I'm not convinced. The main reason is that this gives
some extra credence to per-file shreds, which TBH are
not guaranteed due to journalling etc.

I see performance as an important consideration when
shredding large amounts of data like a disk device.
However single file performance is less of a concern.
The normal use case for shred would be for single files,
or at the device level. Shredding many files is not the
normal use case to worry about IMHO. If one was worried
about securely shredding many files, it's probably best
to have those files on a separate file system, and shred
that at a lower level.

In any case if you really were OK with just unlinking files
after shredding the data, that can be done in a separate operation:
find . -type f -print0 | xargs -0 shred
find . -type f -print0 | xargs -0 rm

So I'm 60:40 against adding this option.

thanks,
Pádraig.


I thought about running two separate operations, as you suggested.
However, my problem with that would be the loss of an atomic
transaction.  What if something happens midway through the shred?  I
would not know which files were shredded, and I would have to start
over.  Worse, if running from a script, it might execute the unlinking
without having completed the shred.  While I could create all sorts of
sophisticated code to check these things, it would be a lot easier if I
could simply rely on the mechanisms already built into shred.
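For what it's worth, the two separate find | xargs passes can be collapsed into a single find invocation, since -exec also acts as a predicate: the second -exec runs on a file only when the first one exits successfully. A sketch (the path is hypothetical):

```shell
# Shred each file, then unlink it only if its shred succeeded.
# An interrupted run leaves not-yet-shredded files in place, so nothing
# gets removed unshredded (per file, though still not one atomic transaction).
find /path/to/files -type f -exec shred -n1 {} \; -exec rm -- {} \;
```

This does not give the single-pass performance of a built-in option, but it does give the "unlink only after a successful shred" ordering per file.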

I can understand your concern about a tool being misused.  If adding a
warning to the usage output would help alleviate your concerns, I would
be happy to draft one and add it to my patch.  However, I do not believe
people should be denied a tool due to its potential misuse.  Would you
deny people the use of an iron due to its risk of misuse or injury?  My
personal philosophy is to give them the tool with instructions and
warnings.  If the user disregards this information, it is not my
problem.  In my case, I am using shred to purge information from file
systems that cannot be taken offline.  Given the specific file system,
its configuration, and modest sensitivity of the information, the
decision was made that this is an acceptable risk.  I believe I should
be able to assume those risks, without being denied optimizations
because they are not considered best practices for the majority of use
cases.

As for the performance improvement itself, the result is measurably
significant.  I wrote a script that creates 100,000 files, and then
measures the performance of shredding those files using the different
wipename options in my patch.  Exact results and the exact script are
below.

I am hoping these hard numbers and my kind, persuasive arguments will
convince you to change your mind, and accept my patch.

Thank you for your time and consideration.

Joseph D. Wagner

## perchar ##
real    678m33.468s
user    0m9.450s
sys     3m20.001s

## once ##
real    151m54.655s
user    0m3.336s
sys     0m32.357s

## none ##
real    107m34.307s
user    0m2.637s
sys     0m21.825s

perchar: 11 hours 18 minutes 33.468 seconds
once: 2 hours 31 minutes 54.655 seconds
 * a 346% improvement over perchar
none: 1 hour 47 minutes 34.307 seconds
 * a 530% improvement over perchar
 * a 41% improvement over once
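(The percentages can be re-derived from the real times above; this just redoes the arithmetic, truncating to whole percent as the summary does:)

```shell
# Recompute the quoted improvements from the wall-clock times.
awk 'BEGIN {
  perchar = 678*60 + 33.468   # 40713.468 s
  once    = 151*60 + 54.655   #  9114.655 s
  none    = 107*60 + 34.307   #  6454.307 s
  printf "once vs perchar: %d%%\n", (perchar/once - 1) * 100   # 346%
  printf "none vs perchar: %d%%\n", (perchar/none - 1) * 100   # 530%
  printf "none vs once:    %d%%\n", (once/none    - 1) * 100   # 41%
}'
```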

Exact script below.

#!/bin/bash
# Generate random files (100,000 total, topping up whatever already exists)
FILESNEEDED=$((100000 - `ls -1 | wc -l`))
for (( i=1; i <= ${FILESNEEDED}; i++ )); do
    FILENAME=`tr -dc '[:alpha:]' < /dev/urandom | head -c 16`
    truncate -s 4k ${FILENAME}
done
# Wipename perchar
echo "## perchar ##"
time find . -type f -print0 | xargs -0 -n16384 ../shred -n1 --wipename=perchar

# Generate random files
FILESNEEDED=$((100000 - `ls -1 | wc -l`))
for (( i=1; i <= ${FILESNEEDED}; i++ )); do
    FILENAME=`tr -dc '[:alpha:]' < /dev/urandom | head -c 16`
    truncate -s 4k ${FILENAME}
done
# Wipename once
echo "## once ##"
time find .