bug#22128: dirname enhancement
2015-12-10 10:40:30 -0700, Bob Proulx:
[...]
> In this instance the first thing I thought of when I read your dirname
> -f request was a loop.
>
>   while read dir; do dirname $dir; done < list

"read dir" expects the input in a very specific format and
depends on the current value of IFS (for instance, a dir called
"my\dir " has to be input as "my\\dir\ " with the default value
of IFS), and it can't accept dir names with newline characters.

Invoking the split+glob operator on $dir doesn't make sense here
unless you mean the input to be treated as a $IFS-delimited list
of patterns.

If the intention was to treat the input as a list of file
paths, one per line (so no file paths with newline
characters), then that would rather be:

  while IFS= read -r dir; do dirname -- "$dir"; done < list

> Pádraig suggested xargs which was even shorter.
>
>   xargs dirname < filename

That expects yet another input format. This time, it can cope
with any file path, since a newline can be specified using quotes
like:

"my dir
with newline"

The output of dirname however won't be post-processable.

> Both of those directly do exactly what you had asked to do. The
> technique works not only with dirname but with every other command on
> the system too. A technique that works with everything is much better
> than something that only works in one small place.

The while loop isn't reasonable for large file lists, as
running one dirname invocation per file is going to be
prohibitive in terms of performance.

The xargs approach works only with GNU dirname, as it supports
passing more than one string as an extension over the standard.

I think here we're seeing the limits of shell scripting. OK,
dirname is the tool to get a dirname, but doing it in a loop is
not practical/efficient and produces an ambiguous output (not to
mention that file names are not necessarily valid text, so
passing that data through text utilities can be a problem).

Extending all the utilities so that they can take a list of
arguments from stdin instead of arguments is one solution (and
one already applied by several GNU utilities, like
--files0-from in du/sort/wc), but I agree xargs -r0 is a more
generic solution and good enough for things like dirname, since
the number of invocations is minimised.

The --files0-from option of du/sort/wc is justified because
xargs -r0 wouldn't work there (several invocations of the
utilities could end up being made, which wouldn't work for
them), but that is not the case for dirname. (I'd argue ls would
need one for its sorting though, and an option to output
NUL-delimited records.)

That can't be applied to commands that take only one argument,
like basename, though.

GNU xargs addresses the problem of the stdin of the command
being redirected (like for rm -i) with its --arg-file option.

The problem with dirname is that, OK, GNU dirname can take
several paths as arguments, but then its output is not
post-processable reliably ("dirname a/b a/c" and
"dirname $'a\na/b'" produce the same output, for instance).

Here, using another programming language/paradigm that has the
"dirname" capability and can deal with lists of strings reliably
within the same command (like perl or zsh) would be a more
reliable and efficient approach.

zsh:

files=(${(z)

> Want a sorted unique list modified in some custom way?
>
>   while read dir; do echo $dir | sed 's/foo/bar/'; done < list | sort -u
[...]

I would recommend reading
https://unix.stackexchange.com/questions/169716/why-is-using-a-shell-loop-to-process-text-considered-bad-practice

Here, I'd do:

  < list sed -z 's/foo/bar/' | LC_ALL=C sort -zu

Assuming a NUL-delimited list in "list".

-- 
Stephane
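
The ambiguity mentioned above ("dirname a/b a/c" versus a single path containing a newline) can be checked with a small sketch. It assumes GNU dirname, since accepting several operands is an extension over POSIX; the paths and variable names are made up for illustration.

```shell
#!/bin/sh
# Assumes GNU dirname (multiple operands are a GNU extension).
nl='
'
# Two operands, each with directory part "a":
two_paths=$(dirname a/b a/c)
# One operand whose directory part is "a<newline>a":
one_path=$(dirname "a${nl}a/b")

if [ "$two_paths" = "$one_path" ]; then
    echo "identical output: cannot tell one path from two"
fi
```

With newline-terminated output there is nothing left to distinguish the two cases, which is why the line-based result cannot be post-processed reliably.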
bug#22128: dirname enhancement
On 11/12/15 14:46, Stephane Chazelas wrote:
> 2015-12-10 10:40:30 -0700, Bob Proulx:
> [...]
>> In this instance the first thing I thought of when I read your dirname
>> -f request was a loop.
>>
>>   while read dir; do dirname $dir; done < list
>
> "read dir" expects the input in a very specific format and
> depends on the current value of IFS (like a dir called "my\dir "
> has to be input as "my\\dir\ " with the default value of IFS)
> and can't accept dir names with newline characters.
>
> Invoking the split+glob operator on $dir doesn't make sense here
> unless you mean the input to be treated as a $IFS delimited list
> of patterns.
>
> If the intention was to treat the input as a list of file
> paths, one per line (so can't do file paths with newline
> characters), then that would rather be:
>
> while IFS= read -r dir; do dirname -- "$dir"; done < list
>
>>
>> Pádraig suggested xargs which was even shorter.
>>
>>   xargs dirname < filename
>
> That expects yet another input format. That time, it can cope
> with any file path, since newline can be specified using quotes
> like:
>
> "my dir
> with newline"
>
> The output of dirname however won't be post-processable.

Both GNU basename and dirname since 8.16 (2012) got the -z option
to make the _output_ post-processable, along with support for
processing multiple inputs. xargs splits arguments on the _input_
appropriately.

In general xargs is fine for this when the tool doesn't need
to process all inputs at once (like sorting or generating a total
for example).

cheers,
Pádraig.
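
A minimal sketch of the NUL-delimited round trip described above, assuming GNU dirname 8.16 or later (-z), GNU xargs (-r, -0) and GNU sort (-z). The scratch file and the sample paths are hypothetical.

```shell
#!/bin/sh
# Assumes GNU dirname >= 8.16 (-z), GNU xargs (-r, -0) and GNU sort (-z).
nl='
'
list=$(mktemp)   # hypothetical scratch file holding a NUL-delimited path list
printf '%s\0' 'a/b' 'a/c' "a${nl}a/b" > "$list"

# Each result ends in NUL, so counting NUL bytes counts results,
# even though one of the directory names contains a newline.
total=$(xargs -r0 dirname -z < "$list" | tr -cd '\0' | wc -c)
unique=$(xargs -r0 dirname -z < "$list" | LC_ALL=C sort -zu | tr -cd '\0' | wc -c)

echo "results: $total, distinct directories: $unique"
rm -f "$list"
```

Because both the input and the output use NUL as the record terminator, the directory containing a newline stays a single record all the way through the pipeline.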
bug#22128: dirname enhancement
Nellis, Kenneth wrote:
> Still, my -f suggestion would be easier to type,
> but I welcome your alternatives.

Here is the problem. You would like dirname to read a list from a
file. Someone else will want it to read a file list of files listing
files. Another will want to skip one header line. Another will want
to skip multiple header lines. Another will want the exact same
feature in basename too. Another will want file name modification so
that it can be used to rename directories. And on and on and on.
Trying to put every possible combination of features into every utility
leads to unmanageable code bloat.

What do all of those have in common? They are all specific features
that are easily available by using the features of the operating
system. That is the entire point of a Unix-like operating system. It
already has all of the tools needed. You tell it what you want it to
do using those features. That is the way the operating system is
designed. Utilities such as dirname are simply small pieces in the
complete solution.

In this instance the first thing I thought of when I read your dirname
-f request was a loop.

  while read dir; do dirname $dir; done < list

Pádraig suggested xargs which was even shorter.

  xargs dirname < filename

Both of those directly do exactly what you had asked to do. The
technique works not only with dirname but with every other command on
the system too. A technique that works with everything is much better
than something that only works in one small place.

Want to get the basename instead?

  while read dir; do basename $dir; done < list

Want to modify the result to add a suffix?

  while read dir; do echo $dir.myaddedsuffix; done < list

Want to modify the name in some custom way?

  while read dir; do echo $dir | sed 's/foo/bar/'; done < list

Want a sorted unique list modified in some custom way?

  while read dir; do echo $dir | sed 's/foo/bar/'; done < list | sort -u

The possibilities are endless and, as they say, limited only by your
imagination. Anything you can think of doing, you can tell the system
to do for you. Truly a marvelous thing to be so empowered.

Note that in order to be completely general and work with arbitrary
names that have embedded newlines, proper quoting is required, and
the wisdom of today says always use null-terminated strings. But if
you are using a file of names then I assume you are operating on a
restricted and sane set of characters, so this won't matter to you.
I do that all of the time.

Bob
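
The quoting caveat above can be sketched as follows. This is a variant of the loop for a one-path-per-line list (so still no embedded newlines) that keeps blanks and backslashes intact; the scratch file and sample names are hypothetical.

```shell
#!/bin/sh
list=$(mktemp)   # hypothetical one-path-per-line list
printf '%s\n' 'my dir/file one' 'back\slash/x' '  lead/y' > "$list"

# IFS= keeps leading/trailing blanks, -r keeps backslashes literal,
# and "--" plus the quotes pass each line to dirname as exactly one operand.
dirs=$(while IFS= read -r dir; do dirname -- "$dir"; done < "$list")

printf '%s\n' "$dirs"
rm -f "$list"
```

The unquoted form would split 'my dir/file one' into two words and strip the leading blanks of the third name, so it is only safe on the restricted set of names mentioned above.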
bug#22128: dirname enhancement
I got it. You don't like the idea. That's fine.
Please close the ticket.

--Ken

> -----Original Message-----
> From: Bob Proulx [mailto:b...@proulx.com]
> Sent: Thursday, December 10, 2015 12:41 PM
> To: Nellis, Kenneth
> Cc: 22...@debbugs.gnu.org
> Subject: Re: bug#22128: dirname enhancement
>
> Nellis, Kenneth wrote:
> > Still, my -f suggestion would be easier to type,
> > but I welcome your alternatives.
>
> Here is the problem. You would like dirname to read a list from a
> file. Someone else will want it to read a file list of files listing
> files. Another will want to skip one header line. Another will want
> to skip multiple header lines. Another will want the exact same
> feature in basename too. Another will want file name modification so
> that it can be used to rename directories. And on and on and on.
> Trying to put every possible combination of feature into every utility
> leads to unmanageable code bloat.
>
> What do all of those have in common? They are all specific features
> that are easily available by using the features of the operating
> system. That is the entire point of a Unix-like operating system. It
> already has all of the tools needed. You tell it what you want it to
> do using those features. That is the way the operating system is
> designed. Utilities such as dirname are simply small pieces in the
> complete solution.
>
> In this instance the first thing I thought of when I read your dirname
> -f request was a loop.
>
>   while read dir; do dirname $dir; done < list
>
> Pádraig suggested xargs which was even shorter.
>
>   xargs dirname < filename
>
> Both of those directly do exactly what you had asked to do. The
> technique works not only with dirname but with every other command on
> the system too. A technique that works with everything is much better
> than something that only works in one small place.
>
> Want to get the basename instead?
>
>   while read dir; do basename $dir; done < list
>
> Want to modify the result to add a suffix?
>
>   while read dir; do echo $dir.myaddedsuffix; done < list
>
> Want to modify the name in some custom way?
>
>   while read dir; do echo $dir | sed 's/foo/bar/'; done < list
>
> Want a sorted unique list modified in some custom way?
>
>   while read dir; do echo $dir | sed 's/foo/bar/'; done < list | sort -u
>
> The possibilities are endless and as they say limited only by your
> imagination. Anything you can think of doing you can tell the system
> to do it for you. Truly a marvelous thing to be so empowered.
>
> Note that in order to be completely general and work with arbitrary
> names that have embedded newlines then proper quoting is required and
> the wisdom of today says always use null terminated strings. But if
> you are using a file of names then I assume you are operating on a
> restricted and sane set of characters so this won't matter to you.
> I do that all of the time.
>
> Bob
bug#22128: dirname enhancement
Pádraig Brady wrote:
> Nellis, Kenneth wrote:
> > E.g., to get a list of directories that contain a specific file:
> >
> >   find -name "xyz.dat" | dirname -f -
>
>   find -name "xyz.dat" -print0 | xargs -r0 dirname

Also, if using GNU find, you can use its -printf action with %h to
print the directory of the matching item. This is not portable to
non-GNU systems.

  find . -name xyz.dat -printf "%h\n"

It can also generate null-terminated output for further xargs -0 use.

  find . -name xyz.dat -printf "%h\0" | xargs -0 ...otherstuff...

Bob
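
A small sketch of the -printf "%h" approach on a throwaway tree, assuming GNU find (-printf is not in POSIX). The directory layout and the variable names are invented for illustration.

```shell
#!/bin/sh
# Assumes GNU find (-printf is not in POSIX).
tmp=$(mktemp -d)   # hypothetical scratch tree
mkdir -p "$tmp/one" "$tmp/two/deep" "$tmp/three"
: > "$tmp/one/xyz.dat"
: > "$tmp/two/deep/xyz.dat"
: > "$tmp/three/other.dat"

# %h is the directory part of each match; sort is only there for a stable order.
found=$(cd "$tmp" && find . -name xyz.dat -printf '%h\n' | LC_ALL=C sort)

printf '%s\n' "$found"
rm -rf "$tmp"
```

No dirname process is started at all here, which sidesteps the one-invocation-per-file cost of a shell loop.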
bug#22128: dirname enhancement
tag 22128 notabug
close 22128
stop

On 09/12/15 17:31, Nellis, Kenneth wrote:
> I frequently need to extract the `dirname's from a list of files,
> so dirname should have an option to take its input from a
> file, e.g.:
>
>   dirname -f

  xargs dirname < filename

> where could be "-" for stdin.
>
> E.g., to get a list of directories that contain a specific
> file:
>
>   find -name "xyz.dat" | dirname -f -

  find -name "xyz.dat" -print0 | xargs -r0 dirname

> The same would be good for `basename' as well.

  xargs basename -a < filename

thanks,
Pádraig.
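
For the basename case, a minimal sketch assuming GNU basename (its -a option allows several operands) and GNU xargs (-r, -0). The sample paths are hypothetical, and the input is NUL-delimited here rather than the quoted-line format that plain xargs expects.

```shell
#!/bin/sh
# Assumes GNU basename (-a / --multiple) and GNU xargs (-r, -0).
names=$(printf '%s\0' 'a/b.txt' 'c d/e.txt' | xargs -r0 basename -a)

printf '%s\n' "$names"
```

Without -a, basename would treat a second operand as a suffix to strip, so the option is what makes the xargs form work.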