Regarding block size versus apparent size: like Padraig, I think it's
okay to let --size simply filter on whichever size the user has
selected (block or apparent). One can combine it with --apparent-size.

I think using special characters in --size to denote max and min sizes is
confusing. Why not have separate --max-size and --min-size arguments? That
way you can filter by a range, and it's obvious what the flags mean. It's
also more consistent in style with the --max-depth flag.

If you absolutely want just one arg, what about --size=[minsize]-[maxsize]?
E.g. --size=4K- filters output to entries greater than 4K, --size=-8K
filters output to those less than 8K, and --size=4K-8K filters output to
those between 4K and 8K.

liulk


On Thu, Jan 17, 2013 at 6:51 AM, Pádraig Brady <p...@draigbrady.com> wrote:

> On 01/17/2013 07:19 AM, Bernhard Voelker wrote:
> > On 01/17/2013 02:46 AM, Pádraig Brady wrote:
> >> On 01/17/2013 01:23 AM, Bernhard Voelker wrote:
> >>> I was pretty sure that this slipped also from Padraig's list.
> >>
> >> Sorry for the delay in this.
> >>
> >> Note it's still on the list:
> >> http://www.pixelbeat.org/patches/coreutils/inbox_dec_2012.html
> >>
> >> You can browse older news and subscribe to new updates at:
> >> http://www.pixelbeat.org/patches/coreutils/
> >
> > Thanks for the links.
> >
> >>> Therefore, I took Jakob's patch and amended it with documentation
> >>> and a comprehensive test. ;-)
> >>
> >> Wow great work on the test.
> >
> > Well, that test just grew and grew. It's actually a result of
> > me not being 100% happy with the --size option as in some
> > situations it might confuse people more than it may help:
> >
> > E.g. users usually tend to "think in apparent sizes" for their
> > files instead of block sizes.
> >
> > Having a directory like this:
> >
> >    $ find tmp -exec ls -dog '{}' +
> >    drwxr-xr-x 5      4096 Jan 17 07:28 tmp
> >    drwxr-xr-x 2      4096 Jan 17 07:29 tmp/big_dir
> >    -rw-r--r-- 1 104857600 Jan 17 07:29 tmp/big_dir/big_file
> >    drwxr-xr-x 2      4096 Jan 17 07:25 tmp/empty_dir
> >    drwxr-xr-x 2      4096 Jan 17 07:28 tmp/small_dir
> >    -rw-r--r-- 1         6 Jan 17 07:26 tmp/small_dir/small_file
> >    -rw-r--r-- 1         0 Jan 17 07:22 tmp/x0
> >    -rw-r--r-- 1         1 Jan 17 07:22 tmp/x1
> >    -rw-r--r-- 1        10 Jan 17 07:22 tmp/x2
> >    -rw-r--r-- 1       100 Jan 17 07:22 tmp/x3
> >    -rw-r--r-- 1      1000 Jan 17 07:22 tmp/x4
> >    -rw-r--r-- 1     10000 Jan 17 07:22 tmp/x5
> >    -rw-r--r-- 1    100000 Jan 17 07:22 tmp/x6
> >    -rw-r--r-- 1   1000000 Jan 17 07:22 tmp/x7
> >
> > Then filter files and directories greater than or equal to 4000:
> >
> >    $ src/du -B1 -a --size=4000 tmp | sort -k2
> >    106012672  tmp
> >    104861696  tmp/big_dir
> >    104857600  tmp/big_dir/big_file
> >    4096       tmp/empty_dir
> >    8192       tmp/small_dir
> >    4096       tmp/small_dir/small_file
> >    4096       tmp/x1
> >    4096       tmp/x2
> >    4096       tmp/x3
> >    4096       tmp/x4
> >    12288      tmp/x5
> >    102400     tmp/x6
> >    1003520    tmp/x7
> >
> > This also included small files such as tmp/x1, while it left out
> > the empty file tmp/x0 ... but still included the empty directory
> > tmp/empty_dir. This feels somewhat counter-intuitive.
> >
> > Now let's use the "apparent size":
> >    $ src/du -B1 -a --size=4000 --app tmp | sort -k2
> >    105985101 tmp
> >    104861696 tmp/big_dir
> >    104857600 tmp/big_dir/big_file
> >    4096      tmp/empty_dir
> >    4102      tmp/small_dir
> >    10000     tmp/x5
> >    100000    tmp/x6
> >    1000000   tmp/x7
> >
> > This is much better. Well, the empty directory still shows up
> > here (which might be different on a different file system),
> > but at least the small files have gone.
> >
> > That said, it seems that automatically applying --apparent
> > when -a and --size are specified would give a more "natural"
> > result.
> >
> > In practice, users will probably only search for huge files
> > and directories, i.e. much greater than the file system's
> > block size, but even then they'd be caught out by forgetting the
> > --app option when it comes to sparse files:
> >
> >    $ src/truncate --size=1T tmp/sparse-1T
> >
> >    $ src/du -h -a --size=100M tmp
> >    100M    tmp/big_dir/big_file
> >    101M    tmp/big_dir
> >    102M    tmp
> >
> >    $ src/du -h -a --size=100M --app tmp
> >    100M    tmp/big_dir/big_file
> >    101M    tmp/big_dir
> >    1.0T    tmp/sparse-1T
> >    1.1T    tmp
> >
> > The only way out of this - probably only my - confusion would
> > be to prevent the use of the -a and the --size option together.
> > But this would artificially restrict the user's flexibility.
> >
> > Does anyone else have such a feeling, too?
>
> I think it's fine to have --size filtering what du outputs,
> i.e. have it just honor -a. Your info on the subject is clear enough:
>
>
>  +Please note that the @option{--size} option can be combined with the above
>  +@option{--apparent-size} option, and in this case would elide entries based on
>  +its apparent size.  This makes most sense for files, i.e. when the @option{-a}
>  +is specified, too.
>
> I'd remove the last sentence above actually.
> The user may want to operate on the cumulative apparent size for dirs.
>
>
> >> I wonder whether it would make sense to have consistent --size
> >> handling for du and truncate, i.e. have --size='<10M'
> >> specify the max size and --size='>10M' specify the min size?
> >
> > I personally do not like shell-special characters in optargs
> > too much, as many users will forget to put it into quotes;
> > --size=<10M may not be a great problem, but --size=>10M
> > may destroy data.
>
> Yes I agree. Maybe we should enforce the '+',
> but then again maybe not since it means '>' in `find`,
> rather than '>='. For comparison as it stands:
>
> find -size +1233  ≍ du -B512 -a --size '1234'
> find -size +1233c ≍ du -a --size '1234'
>
>
> > I was rather thinking of making it more consistent with
> > "find tmp -size +10M", or even of teaching find a new -csize
> > (cumulative size) option ... as finding big directories was
> > the original problem. On the other hand, 'find' doesn't offer
> > the flexibility to filter based on the block size, i.e. it
> > would always include huge sparse files although these do
> > not fill up the file system.
> >
> > Maybe the current implementation is still the better way ...
>
> +1
>
> thanks,
> Pádraig.
>
