bug#22128: dirname enhancement

2015-12-11 Thread Stephane Chazelas
2015-12-10 10:40:30 -0700, Bob Proulx:
[...]
> In this instance the first thing I thought of when I read your dirname
> -f request was a loop.
> 
>    while read dir; do dirname $dir; done < list

"read dir" expects the input in a very specific format and
depends on the current value of IFS (like a dir called "my\dir "
has to be input as "my\\dir\ " with the default value of IFS)
and can't accept dir names with newline characters.

Invoking the split+glob operator on $dir doesn't make sense here
unless you mean the input to be treated as a $IFS delimited list
of patterns.

If the intention was to treat the input as a list of file
paths, one per line (so can't do file paths with newline
characters), then that would rather be:

 while IFS= read -r dir; do dirname -- "$dir"; done < list
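
A minimal sketch of the difference between the two forms of read. The sample path (a backslash plus a trailing space) is hypothetical and only there to trigger both effects:

```shell
# Hypothetical sample path: contains a backslash and a trailing space.
line='my\dir '

# Plain "read": the backslash is treated as an escape character and
# trailing IFS whitespace is stripped, so the value comes out mangled.
plain=$(printf '%s\n' "$line" | { read dir; printf '%s' "[$dir]"; })

# "IFS= read -r": the line is stored verbatim.
raw=$(printf '%s\n' "$line" | { IFS= read -r dir; printf '%s' "[$dir]"; })

printf 'plain: %s\nraw:   %s\n' "$plain" "$raw"
```

The brackets only make the trailing space visible; neither form can represent a path that contains a newline.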

> 
> Pádraig suggested xargs which was even shorter.
> 
>   xargs dirname < filename

That expects yet another input format. That time, it can cope
with any file path, since newline can be specified using quotes
like:

"my dir
with newline"

The output of dirname however won't be post-processable.
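
As a rough illustration of that input format, the sketch below feeds xargs two lines written in its own quoting syntax. It assumes GNU dirname (which accepts several operands); the paths are made up:

```shell
# Two input lines in xargs' own syntax: the double quotes protect the
# space in the first path, and a/b is passed through unchanged.
out=$(printf '%s\n' '"my dir/sub"/file' 'a/b' | xargs dirname)

# One line of output per operand.
printf '%s\n' "$out"
```

A list that was not written with this quoting in mind (for example names containing a bare quote character) would be misparsed or rejected by xargs.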


> Both of those directly do exactly what you had asked to do.  The
> technique works not only with dirname but with every other command on
> the system too.  A technique that works with everything is much better
> than something that only works in one small place.

The while loop you can't reasonably do for large file lists as
running one dirname invocation per file is going to be
prohibitive in terms of performance.

The xargs approach, you can do only with GNU dirname as it
supports passing more than one string as an extension over the
standard.

I think here we're seeing the limits of shell scripting. OK,
dirname is the tool to get a dirname, but doing it in a loop is
not practical/efficient and produces an ambiguous output (not to
mention that file names are not necessarily valid text so the
passing of that data through text utilities can be a problem).

Extending all the utilities so that they can take their list of
arguments from stdin instead of from the command line is one
solution (one already applied by several GNU utilities, like
--files0-from in du/sort/wc), but I agree xargs -r0 is a more
generic solution and good enough for things like dirname since
the number of invocations is minimised.

The --files0-from options of du/sort/wc are justified because
xargs -r0 wouldn't work there (several invocations of the utilities
could end up being made, which wouldn't work for them), but such
an option isn't justified for dirname. (I'd argue ls would need one
for its sorting though, and an option to output NUL-delimited.)

That can't be applied for commands that take only one argument
like basename though.

GNU xargs addresses the problem of the stdin of the command
being redirected (like for rm -i) with its --arg-file option

The problem with dirname is that OK, GNU dirname can take
several paths as arguments but then its output is not
post-processable reliably ("dirname a/b a/c" and "dirname
$'a\na/b'" produce the same output for instance).
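
That ambiguity can be checked with a short sketch, assuming GNU dirname for the two-operand call. The newline-containing path is built with printf so that the snippet does not depend on $'...' support:

```shell
# A path whose first component contains a newline: "a<newline>a/b".
nl_path=$(printf 'a\na/b')

two_paths=$(dirname a/b a/c)       # two operands, two output lines
one_path=$(dirname -- "$nl_path")  # one operand, output spans two lines

if [ "$two_paths" = "$one_path" ]; then
  echo "outputs are identical"
fi
```

A consumer reading the newline-delimited output cannot tell the two cases apart.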

Here using another programming language/paradigm that has the
"dirname" capability and can deal with list of strings reliably
within the same command (like perl or zsh) would be a more
reliable and efficient approach.


zsh:

files=(${(z)

bug#22128: dirname enhancement

2015-12-11 Thread Pádraig Brady
On 11/12/15 14:46, Stephane Chazelas wrote:
> 2015-12-10 10:40:30 -0700, Bob Proulx:
> [...]
>> In this instance the first thing I thought of when I read your dirname
>> -f request was a loop.
>>
>>    while read dir; do dirname $dir; done < list
> 
> "read dir" expects the input in a very specific format and
> depends on the current value of IFS (like a dir called "my\dir "
> has to be input as "my\\dir\ " with the default value of IFS)
> and can't accept dir names with newline characters.
> 
> Invoking the split+glob operator on $dir doesn't make sense here
> unless you mean the input to be treated as a $IFS delimited list
> of patterns.
> 
> If the intention was to treat the input as a list of file
> paths, one per line (so can't do file paths with newline
> characters), then that would rather be:
> 
>  while IFS= read -r dir; do dirname -- "$dir"; done < list
> 
>>
>> Pádraig suggested xargs which was even shorter.
>>
>>   xargs dirname < filename
> 
> That expects yet another input format. That time, it can cope
> with any file path, since newline can be specified using quotes
> like:
> 
> "my dir
> with newline"
> 
> The output of dirname however won't be post-processable.

Both GNU basename and dirname since 8.16 (2012) got
the -z option to make the _output_ post-processable,
along with support for processing multiple inputs.

xargs splits arguments on the _input_ appropriately.
In general xargs is fine for this when the tool
doesn't need to process all inputs at once
(like sorting or generating a total for example).
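
A small sketch of that combination, assuming GNU xargs and GNU dirname 8.16 or later. The paths are hypothetical, and tr is only used to make the NUL delimiters visible:

```shell
# Hypothetical directory name containing a newline.
nl_dir=$(printf 'dir\nwith newline')

# NUL-delimited input for xargs -0, NUL-delimited output from dirname -z;
# tr turns each NUL into "|" so the record boundaries can be seen.
out=$(printf '%s\0' "$nl_dir/file" 'a/b' | xargs -r0 dirname -z | tr '\0' '|')

printf '%s\n' "$out"
```

Each result ends at a "|", so the embedded newline stays inside its record instead of looking like a separator.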

cheers,
Pádraig.






bug#22128: dirname enhancement

2015-12-10 Thread Bob Proulx
Nellis, Kenneth wrote:
> Still, my -f suggestion would be easier to type,
> but I welcome your alternatives.

Here is the problem.  You would like dirname to read a list from a
file.  Someone else will want it to read a file list of files listing
files.  Another will want to skip one header line.  Another will want
to skip multiple header lines.  Another will want the exact same
feature in basename too.  Another will want file name modification so
that it can be used to rename directories.  And on and on and on.
Trying to put every possible combination of feature into every utility
leads to unmanageable code bloat.

What do all of those have in common?  They are all specific features
that are easily available by using the features of the operating
system.  That is the entire point of a Unix-like operating system.  It
already has all of the tools needed.  You tell it what you want it to
do using those features.  That is the way the operating system is
designed.  Utilities such as dirname are simply small pieces in the
complete solution.

In this instance the first thing I thought of when I read your dirname
-f request was a loop.

   while read dir; do dirname $dir; done < list

Pádraig suggested xargs which was even shorter.

  xargs dirname < filename

Both of those directly do exactly what you had asked to do.  The
technique works not only with dirname but with every other command on
the system too.  A technique that works with everything is much better
than something that only works in one small place.

Want to get the basename instead?

   while read dir; do basename $dir; done < list

Want to modify the result to add a suffix?

   while read dir; do echo $dir.myaddedsuffix; done < list

Want to modify the name in some custom way?

   while read dir; do echo $dir | sed 's/foo/bar/'; done < list

Want a sorted unique list modified in some custom way?

   while read dir; do echo $dir | sed 's/foo/bar/'; done < list | sort -u

The possibilities are endless and as they say limited only by your
imagination.  Anything you can think of doing you can tell the system
to do it for you.  Truly a marvelous thing to be so empowered.

Note that in order to be completely general and work with arbitrary
names that have embedded newlines then proper quoting is required and
the wisdom of today says always use null terminated strings.  But if
you are using a file of names then I assume you are operating on a
restricted and sane set of characters so this won't matter to you.
I do that all of the time.

Bob





bug#22128: dirname enhancement

2015-12-10 Thread Nellis, Kenneth
I got it. You don't like the idea. That's fine. Please close the ticket.
--Ken


> -----Original Message-----
> From: Bob Proulx [mailto:b...@proulx.com]
> Sent: Thursday, December 10, 2015 12:41 PM
> To: Nellis, Kenneth
> Cc: 22...@debbugs.gnu.org
> Subject: Re: bug#22128: dirname enhancement
> 
> Nellis, Kenneth wrote:
> > Still, my -f suggestion would be easier to type,
> > but I welcome your alternatives.
> 
> Here is the problem.  You would like dirname to read a list from a
> file.  Someone else will want it to read a file list of files listing
> files.  Another will want to skip one header line.  Another will want
> to skip multiple header lines.  Another will want the exact same
> feature in basename too.  Another will want file name modification so
> that it can be used to rename directories.  And on and on and on.
> Trying to put every possible combination of feature into every utility
> leads to unmanageable code bloat.
> 
> What do all of those have in common?  They are all specific features
> that are easily available by using the features of the operating
> system.  That is the entire point of a Unix-like operating system.  It
> already has all of the tools needed.  You tell it what you want it to
> do using those features.  That is the way the operating system is
> designed.  Utilities such as dirname are simply small pieces in the
> complete solution.
> 
> In this instance the first thing I thought of when I read your dirname
> -f request was a loop.
> 
>    while read dir; do dirname $dir; done < list
> 
> Pádraig suggested xargs which was even shorter.
> 
>   xargs dirname < filename
> 
> Both of those directly do exactly what you had asked to do.  The
> technique works not only with dirname but with every other command on
> the system too.  A technique that works with everything is much better
> than something that only works in one small place.
> 
> Want to get the basename instead?
> 
>    while read dir; do basename $dir; done < list
> 
> Want to modify the result to add a suffix?
> 
>    while read dir; do echo $dir.myaddedsuffix; done < list
> 
> Want to modify the name in some custom way?
> 
>    while read dir; do echo $dir | sed 's/foo/bar/'; done < list
> 
> Want a sorted unique list modified in some custom way?
> 
>    while read dir; do echo $dir | sed 's/foo/bar/'; done < list | sort -u
> 
> The possibilities are endless and as they say limited only by your
> imagination.  Anything you can think of doing you can tell the system
> to do it for you.  Truly a marvelous thing to be so empowered.
> 
> Note that in order to be completely general and work with arbitrary
> names that have embedded newlines then proper quoting is required and
> the wisdom of today says always use null terminated strings.  But if
> you are using a file of names then I assume you are operating on a
> restricted and sane set of characters so this won't matter to you.
> I do that all of the time.
> 
> Bob






bug#22128: dirname enhancement

2015-12-10 Thread Bob Proulx
Pádraig Brady wrote:
> Nellis, Kenneth wrote:
> > E.g., to get a list of directories that contain a specific file: 
> > 
> > find -name "xyz.dat" | dirname -f -
> 
> find -name "xyz.dat" -print0 | xargs -r0 dirname

Also, if using GNU find, you can use its -printf action with %h to
print the directory of the matching item.  This is not portable to
non-GNU systems.

  find . -name xyz.dat -printf "%h\n"

It can also generate NUL-terminated output for further use with xargs -0.

  find . -name xyz.dat -printf "%h\0" | xargs -0 ...otherstuff...
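
For illustration, here is a self-contained sketch of the -printf approach on a throwaway tree. It assumes GNU find and GNU sort, and the directory and file names are invented for the example:

```shell
# Build a temporary tree (hypothetical names) with xyz.dat in two directories.
tmp=$(mktemp -d)
mkdir -p "$tmp/a b" "$tmp/c"
touch "$tmp/a b/xyz.dat" "$tmp/c/xyz.dat" "$tmp/c/other.dat"

# %h prints the directory part of each match; \0 keeps the list safe for
# NUL-aware tools such as sort -z.  tr converts to newlines for display only.
dirs=$(cd "$tmp" && find . -name xyz.dat -printf '%h\0' | sort -z | tr '\0' '\n')
printf '%s\n' "$dirs"

rm -rf "$tmp"
```

No dirname process is started at all here, since find itself emits the directory part.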

Bob





bug#22128: dirname enhancement

2015-12-09 Thread Pádraig Brady
tag 22128 notabug
close 22128
stop

On 09/12/15 17:31, Nellis, Kenneth wrote:
> I frequently need to extract the `dirname's from a list of files,
> so dirname should have an option to take its input from a
> file, e.g.:
> 
> dirname -f <file>

xargs dirname < filename

> where <file> could be "-" for stdin.
> 
> E.g., to get a list of directories that contain a specific
> file: 
> 
> find -name "xyz.dat" | dirname -f -

find -name "xyz.dat" -print0 | xargs -r0 dirname

> The same would be good for `basename' as well.

xargs basename -a < filename
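
A quick sketch of the basename variant, assuming GNU basename (the -a option is a GNU extension) and made-up paths:

```shell
# -a makes basename treat every operand as a name; without it, a second
# operand would be taken as a suffix to strip from the first.
out=$(printf '%s\n' 'src/main.c' 'docs/readme.txt' | xargs basename -a)

printf '%s\n' "$out"
```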

thanks,
Pádraig.