Update of bug #46815 (project findutils):

                  Status:                 Invalid => Need Info              
             Open/Closed:                  Closed => Open                   

    _______________________________________________________

Follow-up Comment #4:

Yes, both Sebastian and Dale are correct.  The 4.1.7 version of the manual
page is much less clear, though consistent with the current behaviour:

-size n[bckw]
File uses n units of space.  The units are 512-byte blocks by default or if
`b' follows n, bytes if `c' follows n, kilobytes if `k' follows n, or 2-byte
words if `w' follows n.  The size does not count indirect blocks, but it does
count blocks in sparse files that are not actually allocated.

The current and previous code behave similarly.  Here is the 4.1.7 code for
illustration:

boolean
pred_size (pathname, stat_buf, pred_ptr)
     char *pathname;
     struct stat *stat_buf;
     struct predicate *pred_ptr;
{
  unsigned long f_val;

  f_val = (stat_buf->st_size + pred_ptr->args.size.blocksize - 1)
    / pred_ptr->args.size.blocksize;
  switch (pred_ptr->args.size.kind)
    {
    case COMP_GT:
      if (f_val > pred_ptr->args.size.size)
        return (true);
      break;
    case COMP_LT:
      if (f_val < pred_ptr->args.size.size)
        return (true);
      break;
    case COMP_EQ:
      if (f_val == pred_ptr->args.size.size)
        return (true);
      break;
    }
  return (false);
}

As you can see, this does round up: adding blocksize - 1 before the integer
division turns any partial unit into a whole one.
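To make the rounding concrete, here is a minimal sketch that isolates the arithmetic from pred_size above.  The helper names size_in_units and check_rounding are made up for this illustration and do not appear in findutils; only the expression itself is taken from the 4.1.7 code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: the same ceiling division as the 4.1.7 pred_size().
   Any partial unit is counted as a whole one.  */
static uintmax_t
size_in_units (uintmax_t st_size, uintmax_t blocksize)
{
  return (st_size + blocksize - 1) / blocksize;
}

/* A few worked cases with the default 512-byte unit.  */
static void
check_rounding (void)
{
  assert (size_in_units (0, 512) == 0);    /* empty file: 0 blocks */
  assert (size_in_units (1, 512) == 1);    /* 1 byte already counts as 1 block */
  assert (size_in_units (512, 512) == 1);  /* exactly one block */
  assert (size_in_units (513, 512) == 2);  /* one byte over: 2 blocks */
}
```

Calling check_rounding() from a test program should pass silently; the interesting case is that a 1-byte file and a 512-byte file are indistinguishable after the conversion.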

Let's take a quick look at the POSIX requirements for the -size test at
http://pubs.opengroup.org/onlinepubs/009695399/utilities/find.html :-

-size  n[c]
The primary shall evaluate as true if the file size in bytes, divided by 512
and rounded up to the next integer, is n. If n is followed by the character
'c', the size shall be in bytes.

So POSIX requires that -size -1 be false for a 500-byte file (500 bytes
rounds up to 1 block, and 1 is not less than 1), and it also introduces a
suffix c for bytes.

As is quite common for GNU tools, every chance to make a potentially-useful
extension is eventually taken; in this case, by introducing alternative
suffixes beyond the mandatory "c".  The "k" suffix denotes units of 1024
bytes, for example.  It's not surprising that somebody thought that the
behavior for dealing in k should be quite similar to that for dealing in units
of 512 bytes (especially if they were using one of the several systems where
the system block size for things like ls -s is in fact 1024 bytes).

But this is clearly surprising for unit suffixes like M, for which it is
pretty clear that the user is not thinking in terms of how many blocks the
file occupies on the storage layer.  So the existing behavior is kind of
understandable, but it is obviously confusing for most users.
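A small sketch of why the larger units surprise people, assuming a unit of 1048576 bytes and the same round-then-compare logic as the 4.1.7 pred_size.  The names size_test, comparison_type, MIB and check_mebibyte_cases are invented for this example; the COMP_* constants mirror the ones in the code quoted earlier.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum comparison_type { COMP_GT, COMP_LT, COMP_EQ };

/* Hypothetical stand-in for pred_size(): round the size up to whole
   units first, then compare the unit count with n.  */
static bool
size_test (uintmax_t st_size, uintmax_t blocksize,
           enum comparison_type kind, uintmax_t n)
{
  uintmax_t f_val = (st_size + blocksize - 1) / blocksize;

  switch (kind)
    {
    case COMP_GT: return f_val > n;
    case COMP_LT: return f_val < n;
    case COMP_EQ: return f_val == n;
    }
  return false;
}

#define MIB ((uintmax_t) 1024 * 1024)

static void
check_mebibyte_cases (void)
{
  /* "-size -1M": only an empty file rounds up to fewer than 1 unit.  */
  assert (size_test (0, MIB, COMP_LT, 1));
  assert (!size_test (1, MIB, COMP_LT, 1));

  /* "-size -20M": the file must fit within 19 units to match.  */
  assert (size_test (19 * MIB, MIB, COMP_LT, 20));
  assert (!size_test (19 * MIB + 1, MIB, COMP_LT, 20));
}
```

Under these assumptions a one-byte file does not satisfy "less than 1M", and a file one byte over 19 units is already not "less than 20M", which is the behaviour users keep reporting.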

The canonical version of the bug report on this, one might say, is
https://savannah.gnu.org/bugs/?12162.  There are others (see
https://savannah.gnu.org/bugs/?group=findutils&func=browse&set=custom&msort=0&status_id[]=3&resolution_id[]=0&submitted_by[]=0&assigned_to[]=0&category_id[]=0&bug_group_id[]=0&severity[]=0&summary[]=-size&details[]=&advsrch=0&msort=0&chunksz=50&spamscore=5&report_id=101&sumORdet=0&morder=severity%3C&sumOrdet=0&order=date#results).

I think that particular discussion got side-tracked, at the end, by the
introduction of time tests.  The problems of rounding with those tests have
largely been obviated by the introduction of tests like -newermt, where the
timestamp is specified directly in absolute not relative terms (and no
rounding occurs).

I've been in favour of providing a more sensible test for a long time; the
problem has always been how to spell the new usage and describe its semantics.
The use of > and < prefixes is attractive, but doomed by the use of those
characters by the shell.  Yes, the user could quote them to avoid redirection,
but this would clearly be a source of confusion for less experienced users.

The alternatives that seem attractive to me are Nigel McNie's proposal to use
a new test, '-filesize', or something along the lines of Martin Steigerwald's
three-word variant (i.e. -size lt 20M being the sane, no-rounding, replacement
for -size -20M).
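For discussion purposes only, here is one way the no-rounding semantics could look, assuming the comparison is made on the byte count against n times the unit.  Nothing here is an agreed design: exact_size_test, exact_cmp and the EXACT_* names are hypothetical, and saturating on overflow is an arbitrary choice for the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical operators for a "-size lt 20M" style test.  */
enum exact_cmp { EXACT_LT, EXACT_GT, EXACT_EQ };

/* Compare the size in bytes directly with n * unit, with no rounding.
   If n * unit would overflow, the limit saturates at UINTMAX_MAX.  */
static bool
exact_size_test (uintmax_t st_size, enum exact_cmp op,
                 uintmax_t n, uintmax_t unit)
{
  uintmax_t limit;

  if (unit != 0 && n > UINTMAX_MAX / unit)
    limit = UINTMAX_MAX;
  else
    limit = n * unit;

  switch (op)
    {
    case EXACT_LT: return st_size < limit;
    case EXACT_GT: return st_size > limit;
    case EXACT_EQ: return st_size == limit;
    }
  return false;
}
```

With this sketch, a file of 19 units plus one byte is "lt 20M" and a 500-byte file is "lt 1" in 512-byte units, which is what most users appear to expect.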

Both of those options have the nice property that they're likely
POSIX-compliant, in the sense that POSIX provides no required meaning for
those constructs (though -filesize is, I suppose, more obviously a GNU
extension).

Let's re-open the discussion about what to call the "sane" alternative to
-size, and implement it this time.

    _______________________________________________________

Reply to this item at:

  <http://savannah.gnu.org/bugs/?46815>

_______________________________________________
  Message sent via/by Savannah
  http://savannah.gnu.org/

