Re: Unix-ify File Names
* Masatran, R. Deepak <[EMAIL PROTECTED]> 2007-04-17
> Since I frequently receive files from Microsoft Windows users, is there any
> utility to unix-ify file names, that is, use lower case exclusively, use
> hyphen as separator, etc.?

I wrote the script below just now. Kindly give comments. This script is
also available at <http://research.iiit.ac.in/~masatran/scripts/unixify>.

#!/usr/bin/perl

# Unix-ify name of file given as argument.
# For multiple files, use along with the "find" program of Unix.
# Supports Unicode file names
# Correctly handles dangerous characters in file names.
# How to get confirmation before over-writing a file, without sacrificing portability?

use strict;
use warnings;
use utf8;

my $name_original = shift;
my $name_lowercase = lc $name_original;
my $name_unix = join("-" => split(/\W/ => $name_lowercase));
rename $name_original => $name_unix;

--
Masatran, R. Deepak <http://research.iiit.ac.in/~masatran/>
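On the open question in the script's comments (not overwriting an existing file), one possible approach is to check for the target before renaming. The following is only a sketch in POSIX sh, not the poster's method: the helper names unixify_name and unixify_file are made up for illustration, it handles ASCII only (so it does not match the Unicode claim of the Perl script), and unlike the Perl version it keeps dots so that extensions survive. It assumes dirname and basename accept "--", as the GNU and BSD versions do.

```shell
#!/bin/sh
# Hypothetical helper: print the unix-ified form of one file name.
# Lower-cases ASCII letters, turns each run of bytes outside
# [a-z0-9._-] into a single hyphen, and trims leading/trailing hyphens.
unixify_name() {
    printf '%s\n' "$1" |
        LC_ALL=C tr '[:upper:]' '[:lower:]' |
        LC_ALL=C sed -e 's/[^a-z0-9._-]\{1,\}/-/g' -e 's/^-*//' -e 's/-*$//'
}

# Hypothetical helper: rename one file in place, refusing to overwrite.
unixify_file() {
    dir=$(dirname -- "$1")
    old=$(basename -- "$1")
    new=$(unixify_name "$old")
    # Nothing to do if the name is already in the desired form.
    [ "$old" = "$new" ] && return 0
    if [ -z "$new" ] || [ -e "$dir/$new" ]; then
        echo "skipping '$1': target '$dir/$new' is empty or already exists" >&2
        return 1
    fi
    mv -- "$1" "$dir/$new"
}
```

For an interactive prompt instead of a refusal, "mv -i" is specified by POSIX and asks before overwriting. For many files, the function body could be placed in a small script and called from find.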
Re: Unix-ify File Names
Daniel Barclay <[EMAIL PROTECTED]>:
> Frank Terbeck wrote:
>> Daniel Barclay <[EMAIL PROTECTED]>:
>>> Frank Terbeck wrote:
>>>> Daniel B. <[EMAIL PROTECTED]>:
[...]
>>> For example, Emacs' tags files use commas as delimiters, and (last I
>>> knew) don't have an escape/encoding mechanism for representing a comma
>>> _in_ a file name, so (again, last I knew) a Linux kernel file with
>>> a comma in its name doesn't get processed right.
>> So? Just because there are programs that limit the namespace of the
>> files they are working with (which is _absolutely_ okay), does not
>> mean, that shell scripts must obey to these programs' behaviours.
>
> How did you infer that I was arguing that the shells should follow
> those programs' behaviors? I wasn't arguing for that.
>
> I was pointing out that using shell-special characters in filenames
> was (somewhat) bad--it triggers problems with non-robust programs.

Then what is the point of the discussion? I am not telling anyone to
bring filenames with weird characters into their system. But it is
possible to support them if they are there. If you do not know about
the data you are dealing with, limiting the code does not make sense.
But I said that before.

[...]
>> Btw: xargs is not needed if your find binary is reasonably POSIX
>> compliant. Just use '+' instead of ';' with the -exec option. (Yes, I
>> know that GNU find didn't support this for quite some time.)
>
> Which version of find supports that? My (Sarge) system's man page
> for find doesn't seem to mention it yet.

I said GNU find didn't support it for quite some time. But
nevertheless, SUSv3 defined it before (and there is not just the GNU
version of find in this world). The GNU find in etch supports it.

> Does the "+" make find invoke the command with multiple filenames at
> once?

Yes.

>>> However, what about the general case?
>>>
>>> It sounds like for i in `...` doesn't have an escaping/encoding
>>> mechanism that is sufficient to handle both (unescaped) asterisks
>>> that represent wildcards and escaped/encoded asterisks that represent
>>> literal asterisks.
>> I don't think you really understand, what is happening here.
>> [snip]
>> % foo='bar\ baz' ;
>> % for i in `echo "$foo"` ; do echo "($i)" ; done
>> (bar\)
>> (baz)
>> [snap]
>> You _cannot_ escape things there.
>
> So how am I misunderstanding it? (I said it sounds like the shell
> for loop doesn't support escaping. You said one cannot escape
> things there. Those statements are consistent with each other.
> So how am I not understanding it?)

See end of mail.

>> You see, this is not the type of thing, you want to teach beginners.
>> Hence, 'for i in `...`' loops should be avoided by beginners (did you
>> realize, that you dropped 'ls *glob*' from the backtick expression?
>
> Yes. Did you realize that I was trying to talk about cases that are
> more general than just globbing done by the shell?

Yes. Otherwise you would have left the glob in there. But you are
narrowing the subject until it fits your argumentation.

> (By the way, why do you keep sticking extraneous commas in the middle
> of your sentences?)

No native English speaker here. Want to recommend a book about grammar?

[...]
>>> Is there any such command (or, say, built-in function)?
>> It sounds like you are looking for 'eval'.
>
> Yes, that does seem like the easier (and safer) ("right") way.

No. 'eval' is a great tool and has its uses. But it does not make
loops easier, nor safer.

>> But this has got nothing to do with the original subject.
>> And this misunderstanding leads me to the conclusion, that you should
>> read up on how various expansions in POSIX shells work (and probably
>> on a few common pitfalls, like maximum size of arguments for external
>> processes, too.);
>
> Yeah, I know about that one (well, that there is a limit, if not
> details).

You do not know it. Otherwise you would know how the expansion in
for var in `foobar baz` works, and not argue about 'loops do not
support escaping'. Escaping is a different topic, that does not apply
here.

I will not continue the discussion just for the sake of it. I think I
have made my point clear by now.

Regards, Frank

--
In protocol design, perfection has been reached not when there is
nothing left to add, but when there is nothing left to take away.
                                                  -- RFC 1925

--
To UNSUBSCRIBE, email to [EMAIL PROTECTED]
with a subject of "unsubscribe". Trouble? Contact [EMAIL PROTECTED]
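The expansion behaviour argued about in this message can be checked with a few lines of POSIX sh. This is a small sketch, not taken from the thread: it uses $(...) in place of backticks and printf in place of echo (to avoid echo's varying backslash handling), and the counter names are illustrative. It shows that the output of a command substitution is only split into fields, so a backslash in it is ordinary data.

```shell
#!/bin/sh
foo='bar\ baz'

# Unquoted: the substituted text is split on $IFS. The backslash is
# plain data and protects nothing, so the loop body runs twice.
unquoted=0
for i in $(printf '%s\n' "$foo"); do
    unquoted=$((unquoted + 1))
    printf '(%s)\n' "$i"
done

# Quoted: no field splitting at all, so the loop body runs once.
quoted=0
for i in "$(printf '%s\n' "$foo")"; do
    quoted=$((quoted + 1))
    printf '(%s)\n' "$i"
done

echo "unquoted: $unquoted, quoted: $quoted"
```

Neither form can deliver two file names of which one contains a space, which is why a glob ('for i in *') or a find-fed while loop is the usual recommendation.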
Re: Unix-ify File Names
Frank Terbeck wrote: Daniel Barclay <[EMAIL PROTECTED]>: Frank Terbeck wrote: Daniel B. <[EMAIL PROTECTED]>: Frank Terbeck wrote: Mike McClain <[EMAIL PROTECTED]>: Frank Terbeck <[EMAIL PROTECTED]> wrote: ... ... people think spaces are bad in filenames. (They are not bad, ... In what sense are they not bad? ... However, they and other special characters do make it more difficult to handle arbitrary file names. No. They are never bad. It just takes a bit of practice to get used to do things in a robust way. But some common Unix tools aren't robust enough, in the sense of providing consistent escape/encoding mechanisms to handle special characters. For example, Emacs' tags files use commas as delimiters, and (last I knew) don't have an escape/encoding mechansim for representing a comma _in_ a file name, so (again, last I knew) a Linux kernel file with a comma in its name doesn't get processed right. So? Just because there are programs that limit the namespace of the files they are working with (which is _absolutely_ okay), does not mean, that shell scripts must obey to these programs' behaviours. How did you infer that I was arguing that the shells should follow those programs' behaviors? I wasn't arguing for that. I was pointing out that using shell-special characters in filenames was (somewhat) bad--it triggers problems with non-robust programs. Some commands do provide fully general mechanisms. (For example, find's -print0 and xargs' -0 option can handle any possible file pathname, including one with newline characters.) However, many commands do not. That typically makes it very difficult to handle "special" characters. ... Btw: xargs is not needed if your find binary is reasonably POSIX compliant. Just use '+' instead of ';' with the -exec option. (Yes, I know that GNU find didn't support this for quite some time.) Which version of find supports that? My (Sarge) system's man page for find doesn't seem to mention it yet. 
Does the "+" make find invoke the command with multiple filenames at once? However, what about the general case? It sounds like for i in `...` doesn't have an escaping/encoding mechanism that is sufficient to handle both (unescaped) asterisks that represent wildcards and escaped/encoded asterisks that represent literal asterists. I don't think you really understand, what is happening here. [snip] % foo='bar\ baz' ; % for i in `echo "$foo"` ; do echo "($i)" ; done (bar\) (baz) [snap] You _cannot_ escape things there. So how am I misunderstanding it? (I said it sounds like the shell for loop doesn't support escaping. You said one cannot escape things there. Those statements are consistent with each other. So how am I not understanding it? You see, this is not the type of thing, you want to teach beginners. Hence, 'for i in `...`' loops should be avoided by beginners (did you realize, that you dropped 'ls *glob*' from the backtick expression? Yes. Did you realize that I was trying to talk about cases that are more general that just globbing done by the shell? (By the way, why do keep sticking extraneous commas in the middle your sentences?) What about when one is building up a command string in a variable, say CMD, and then executing the assembled command via "$CMD"? The string contained in the variable is parsed as a normal command, right? So any logical string values that contain shell-special characters needs to be encoded with the usual shell escape-sequence syntax, right? (E.g., if I want to delete a file named "xx*yy", I would have to type something like: rm xx\*yy on a manual command line, so if I wanted the command line $CMD to execute that same rm command, CMD would have to contain the string "rm xx\*yy" (e.g., set by the command line: CMD="rm xx\\*yy" ) [...] Is there any such command (or, say, built-in function)? It sounds like you are looking for 'eval'. Yes, that does seem like the easier (and safer) ("right") way. 
But this has got noting to do with the original subject. And this misunderstanding leads me to the conclusion, that you should read up on how various expansions in POSIX shells work (and probably on a few common pitfalls, like maximum size of arguments for external processes, too.); Yeah, I know about that one (well, that there is a limit, if not details). > No offence. Next time, you might want to avoid telling something they don't understand for the things you then immediately proceed to show they have already understood. Daniel -- -- To UNSUBSCRIBE, email to [EMAIL PROTECTED] with a subject of "unsubscribe". Trouble? Contact [EMAIL PROTECTED]
Re: Unix-ify File Names
Octavio Alvarez <[EMAIL PROTECTED]>:
> On Sat, 21 Apr 2007 05:26:40 -0700, Thomas Jollans <[EMAIL PROTECTED]>
> wrote:
>>>> FS="
>>>> "
>>>
>>> IFS, I suppose. But: Why do you set it?
>> ugh... good question. I wrote this ages ago ;-)
>
> To make sure spaces in filenames don't break them apart?

Not an issue in default zsh setup.

Regards, Frank

--
In protocol design, perfection has been reached not when there is
nothing left to add, but when there is nothing left to take away.
                                                  -- RFC 1925
Re: Unix-ify File Names
On Sat, 21 Apr 2007 05:26:40 -0700, Thomas Jollans <[EMAIL PROTECTED]> wrote:
>>> FS="
>>> "
>>
>> IFS, I suppose. But: Why do you set it?
> ugh... good question. I wrote this ages ago ;-)

To make sure spaces in filenames don't break them apart?

--
Octavio.
Re: Unix-ify File Names
Frank Terbeck wrote:
> Thomas Jollans <[EMAIL PROTECTED]>:
> [...]
>
> zsh, yay! :-)
> Just a few remarks.
>
>> #!/bin/zsh
>>
>> FS="
>> "
>
> IFS, I suppose. But: Why do you set it?

ugh... good question. I wrote this ages ago ;-)

>
>> for f in **/*
>
> for f in ./**/*   # makes f=./$f unneeded below.
>
>> do
>> #required for files in the current dir.
>> f=./$f
>> #dir of file
>> fp1=${f%/*}/
>
> fp1=${f:h}/   # (think (h)ead)
>
>> #name of file
>> fp2=${f##*/}
>
> fp2=${f:t}   # (think (t)ail)
>
>> #dir should already be anti-spaced and lower-cased
>> f=$fp1:gs/\ /_/:l$fp2
>> #the new name; anti-spaced and lower-cased
>> f2=$f:gs/\ /_/:l
>>
>> if ! [[ $f = $f2 ]]
>> then
>> mv -v "$f" "$f2"
>> fi
>> done
>
> Of course, your expansions do work (and they are portable, as they
> work in every POSIX shell), but if you use zsh already, why not ':t'
> and ':h', as they are easier to read, IMHO. :-)
>
> Recursive globbing is just a wonderful feature, isn't it? :-)

definitely.

Thanks for the comments :-)

Thomas
Re: Unix-ify File Names
Thomas Jollans <[EMAIL PROTECTED]>:
[...]

zsh, yay! :-)
Just a few remarks.

> #!/bin/zsh
>
> FS="
> "

IFS, I suppose. But: Why do you set it?

> for f in **/*

for f in ./**/*   # makes f=./$f unneeded below.

> do
> #required for files in the current dir.
> f=./$f
> #dir of file
> fp1=${f%/*}/

fp1=${f:h}/   # (think (h)ead)

> #name of file
> fp2=${f##*/}

fp2=${f:t}   # (think (t)ail)

> #dir should already be anti-spaced and lower-cased
> f=$fp1:gs/\ /_/:l$fp2
> #the new name; anti-spaced and lower-cased
> f2=$f:gs/\ /_/:l
>
> if ! [[ $f = $f2 ]]
> then
> mv -v "$f" "$f2"
> fi
> done

Of course, your expansions do work (and they are portable, as they
work in every POSIX shell), but if you use zsh already, why not ':t'
and ':h', as they are easier to read, IMHO. :-)

Recursive globbing is just a wonderful feature, isn't it? :-)

Regards, Frank

--
In protocol design, perfection has been reached not when there is
nothing left to add, but when there is nothing left to take away.
                                                  -- RFC 1925

--
To UNSUBSCRIBE, email to [EMAIL PROTECTED]
with a subject of "unsubscribe". Trouble? Contact [EMAIL PROTECTED]
Re: Unix-ify File Names
Masatran, R. Deepak wrote:
> Since I frequently receive files from Microsoft Windows users, is there any
> utility to unix-ify file names, that is, use lower case exclusively, use
> hyphen as separator, etc.?
>

I wrote this little zsh script once; it unixifies all file names in the
current directory and its subdirectories. This may or may not work in
other shells (I believe bash is quite feature-rich as well, but I don't
use it).

#!/bin/zsh

FS="
"

for f in **/*
do
#required for files in the current dir.
f=./$f
#dir of file
fp1=${f%/*}/
#name of file
fp2=${f##*/}
#dir should already be anti-spaced and lower-cased
f=$fp1:gs/\ /_/:l$fp2
#the new name; anti-spaced and lower-cased
f2=$f:gs/\ /_/:l

if ! [[ $f = $f2 ]]
then
mv -v "$f" "$f2"
fi
done
Re: Unix-ify File Names
Frank Terbeck <[EMAIL PROTECTED]> wrote:
>
> find is just the tool you want to use for recursive actions on files
> (or specialized actions, like sorting). find is an external program,
> but it does not take a file list as argument, which makes it the
> ultimate choice.
>

I took your advice to heart and started rewriting a portion of a script
I use. It started out like so:

for f in `find /etc/ \( -type f -o -type l \) -name "*"`; do
    fsum=$(/usr/bin/md5sum -b $f | cut -c-32) ;
    fls=$(ls -li --full-time $f) ;
    echo $fsum $fls ;
done

At your instigation I tried this next:

find /etc/ \( -type f -o -type l \) -name "*" -exec {
    fsum=$(/usr/bin/md5sum -b {} | cut -c-32) ;
    fls=$(ls -li --full-time {} ) ;
    echo $fsum $fls ; } /;

This produced screens full of junk that appeared to be the output I
expected, except that:
1) there were no newlines,
2) it was data relating to files in my home directory,
3) it only appeared after I hit ^C thinking the job was hung.

While trying various quoting, escaping and other flails, I also
received messages like the following:

-bash: }: command not found
-bash: syntax error near unexpected token `}'
find: missing argument to `-exec'

Failing that I tried a function:

md5ls () {
    local fsum=$(/usr/bin/md5sum -b $1 | cut -c-32 2> /dev/null ) ;
    local fls=$(ls -li --full-time $1) ;
    echo $fsum $fls ;
}
find /etc/ \( -type f -o -type l \) -name "*" -exec md5ls {} \;
find: md5ls: No such file or directory

It did finally perform correctly when I converted the function to a
script, but it's slow. I kept playing and came up with this, which is
much quicker and returns the same data, though in a different format:

find /etc/ \( -type f -o -type l \) \
    -printf '%8i %y %#m %n %u %g %10s %TY-%Tm-%Td %TT ' \
    -name "*" -exec /usr/bin/md5sum -b {} \;

The only problem with this one is that it fails when it tries to handle
a link to a directory. There's no newline in the printf format because
the output of md5sum provides one for files and links to files but not
for links to directories or links to missing files.

For some reason find hides md5sum's failure so that 'set -e' at the top
of the script doesn't work. Paste this into a script of your own and
you'll see what I mean.

--- cut here ---
#!/bin/sh
# testof find ... -exec
set -e # exit on error
find /usr/bin/ -name "X11" -print -exec /usr/bin/md5sum {} \;
echo $?;
/usr/bin/md5sum /usr/bin/X11;
echo $?;
--- cut here ---

With the set statement in there the second echo isn't reached, but
'echo $?' entered on the command line after the script finishes
prints 1.

Whew, sorry I got a little long-winded. I guess the question I have is
how to make either the second or third form work:

find ... -exec {command {}; command {}; command;} \;
or
find ... -exec function {} \;

Still learning,
Mike
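Both forms asked about fail for the same reason: find runs the -exec command directly as a program, so neither shell syntax like { ...; } nor a shell function is available to it. A common way around this is to hand the file names to a small inline sh script. The following is a hedged sketch rather than a tested drop-in for the script above; the wrapper name md5ls_tree is made up, and md5sum -b and ls --full-time are the GNU tools already used in the message.

```shell
#!/bin/sh
# Sketch: run several commands per file by passing find's results
# to an inline sh script. "$1" is the directory to scan.
md5ls_tree() {
    find "$1" \( -type f -o -type l \) -exec sh -c '
        for f in "$@"; do
            # md5sum fails on links to directories and on dangling
            # links; report a placeholder and carry on.
            fsum=$(md5sum -b -- "$f" 2>/dev/null | cut -c-32)
            fls=$(ls -li --full-time -- "$f")
            printf "%s %s\n" "${fsum:-<no checksum>}" "$fls"
        done
    ' sh {} +
}

# Example call: md5ls_tree /etc/
```

The lone "sh" after the script text becomes $0 of the inline script, so the file names start at $1. With "+" the inline script is started once per batch of files rather than once per file, which addresses the speed problem too. As for 'set -e': with the ';' form find treats a failing command only as a false test and still exits 0, whereas the POSIX description of the '+' form says find itself returns non-zero if any invocation does.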
Re: Unix-ify File Names
On Thu, Apr 19, 2007 at 09:05:56PM +0200, Frank Terbeck wrote:
> Daniel Barclay <[EMAIL PROTECTED]>:
> > Some commands do provide fully general mechanisms. (For example,
> > find's -print0 and xargs' -0 option can handle any possible file
> > pathname, including one with newline characters.) However, many
> > commands do not. That typically makes it very difficult to
> > handle "special" characters.
>
> Most programs do support filenames with special characters (if they
> don't it is clearly a bug). They just depend on the shell giving them
> the correct string.
>
> Btw: xargs is not needed if your find binary is reasonably POSIX
> compliant. Just use '+' instead of ';' with the -exec option. (Yes, I
> know that GNU find didn't support this for quite some time.)

Wow... never heard of this and was going to ask more about it, but I
see it's in the find(1) manpage post-sarge. I use find a lot, xargs
only when it seems necessary, but the standard response to someone
using find has been that it's bad due to spawning umpteen processes.
Looks like that's no longer the case!

Hmm, -execdir looks new as well, and very useful...

Thanks!

Ken

--
Ken Irving
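The difference between the two -exec terminators is easy to observe by counting how often the command is started. A minimal sketch, assuming a find that implements the POSIX '+' form and a system with mktemp; the directory and file names are throwaway examples:

```shell
#!/bin/sh
demo=$(mktemp -d)
touch "$demo/a" "$demo/b c" "$demo/d"

# ';' starts the command once per file found.
per_file=$(find "$demo" -type f -exec sh -c 'echo run' sh {} \; | wc -l)

# '+' collects file names and starts the command once per batch.
batched=$(find "$demo" -type f -exec sh -c 'echo run' sh {} + | wc -l)

echo "with ';': $per_file invocations"
echo "with '+': $batched invocation(s)"
rm -rf "$demo"
```

With only three files everything fits into a single batch. For very long file lists find splits the batch to stay under the system's argument-size limit, much as xargs does.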
Re: Unix-ify File Names
Daniel Barclay <[EMAIL PROTECTED]>: > Frank Terbeck wrote: >> Daniel B. <[EMAIL PROTECTED]>: >>> Frank Terbeck wrote: Mike McClain <[EMAIL PROTECTED]>: > Frank Terbeck <[EMAIL PROTECTED]> wrote: > >>> for FILE in `ls *$1` ; do >>> ... b) it breaks on filenames with spaces (and other special characters). >>> ...> Using 'for i in `ls *`'-type loops breaks this and is one of the main reasons why people think spaces are bad in filenames. (They are not bad, ... >>> In what sense are they not bad? Yes, they're certainly legal per the >>> filesystem and most tools that take filenames. However, they and other >>> special characters do make it more difficult to handle arbitrary file >>> names. >> No. They are never bad. It just takes a bit of practice to get used to >> do things in a robust way. > > But some common Unix tools aren't robust enough, in the sense of > providing consistent escape/encoding mechanisms to handle special > characters. > > For example, Emacs' tags files use commas as delimiters, and (last I > knew) don't have an escape/encoding mechansim for representing a comma > _in_ a file name, so (again, last I knew) a Linux kernel file with > a comma in its name doesn't get processed right. So? Just because there are programs that limit the namespace of the files they are working with (which is _absolutely_ okay), does not mean, that shell scripts must obey to these programs' behaviours. The shell itself can handle whitespace in filenames just fine. No need to not use robust techniques, at all. It would be even worse, to use techniques that will potentially break. > Some commands do provide fully general mechanisms. (For example, > find's -print0 and xargs' -0 option can handle any possible file > pathname, including one with newline characters.) However, many > commands do not. That typically makes it very difficult to > handle "special" characters. Most programs do support filenames with special characters (if they don't it is clearly a bug). 
They just depend that the shell gives them the correct string. Btw: xargs is not needed if your find binary is reasonably POSIX compliant. Just use '+' instead of ';' with the -exec option. (Yes, I know that GNU find didn't support this for quite some time.) >>> For example, if someone wants to use ls's feature of sorting by date >>> (e.g., "ls -t *$1"), they cant combine it with the for-loop construct >>> above (reliably). >> Okay, I admit that sorting is one of the rare cases where >> [snip] >> find . -printf '%Ts:%p\n' | sort -rn | cut -d: -f2 | while IFS= read -r ; >> do >> ... >> done >> [snap] >> or > ... >> loops are justified. > > I think you missed my point--the question of how to (or whether one > can) use for i in `...` to loop over a list of file names that are > output by some arbitrary program. You just do not need that. And if you do, it's very hard to get right. 'for i in *' supports _every_ filename you could think of (including filenames with newlines. Why use something, that is more expensive, more error-prone and less powerful? > The particular example of starting with sorting by date with "ls -d" > has the solution of changing to an entirely different solution (using > find and sorting as above). find is just the tool you want to use for recursive actions on files (or specialized actions, like sorting). find is an external program, but it does not take a file list as argument, which makes it the ultimate choice. > However, what about the general case? > > It sounds like for i in `...` doesn't have an escaping/encoding > mechanism that is sufficient to handle both (unescaped) asterisks > that represent wildcards and escaped/encoded asterisks that represent > literal asterists. I don't think you really understand, what is happening here. [snip] % foo='bar\ baz' ; % for i in `echo "$foo"` ; do echo "($i)" ; done (bar\) (baz) [snap] You _cannot_ escape things there. 
If you want to know what's going on, consult the manual of your shell (or the respective POSIX document). Normally, a section about 'Word Expansions' will describe what happens in detail. You see, this is not the type of thing, you want to teach beginners. Hence, 'for i in `...`' loops should be avoided by beginners (did you realize, that you dropped 'ls *glob*' from the backtick expression? If you really know what you are doing, you can get these types of loops more or less right. But _never_ _ever_ if the command is getting a file list via globbing.) >>> Hey, is there any command for taking a filename and escaping/encoding >>> shell-special characters to make a string that, when parsed by the >>> shell, specifies that filename? I'm thinking of something that would >>> work like this: >>> >>>for i in `encode_for_shell *` ; ... >> [...] >> No, that is not how shells work. > > Maybe I gave the wrong kind of example (a for loop, which apparently > doesn't parse and interpreting things enough) for asking about an
Re: Unix-ify File Names
Frank Terbeck wrote: Daniel B. <[EMAIL PROTECTED]>: Frank Terbeck wrote: Mike McClain <[EMAIL PROTECTED]>: Frank Terbeck <[EMAIL PROTECTED]> wrote: for FILE in `ls *$1` ; do ... b) it breaks on filenames with spaces (and other special characters). ...> Using 'for i in `ls *`'-type loops breaks this and is one of the main reasons why people think spaces are bad in filenames. (They are not bad, ... In what sense are they not bad? Yes, they're certainly legal per the filesystem and most tools that take filenames. However, they and other special characters do make it more difficult to handle arbitrary file names. No. They are never bad. It just takes a bit of practice to get used to do things in a robust way. But some common Unix tools aren't robust enough, in the sense of providing consistent escape/encoding mechanisms to handle special characters. For example, Emacs' tags files use commas as delimiters, and (last I knew) don't have an escape/encoding mechansim for representing a comma _in_ a file name, so (again, last I knew) a Linux kernel file with a comma in its name doesn't get processed right. Some commands do provide fully general mechanisms. (For example, find's -print0 and xargs' -0 option can handle any possible file pathname, including one with newline characters.) However, many commands do not. That typically makes it very difficult to handle "special" characters. For example, if someone wants to use ls's feature of sorting by date (e.g., "ls -t *$1"), they cant combine it with the for-loop construct above (reliably). Okay, I admit that sorting is one of the rare cases where [snip] find . -printf '%Ts:%p\n' | sort -rn | cut -d: -f2 | while IFS= read -r ; do ... done [snap] or ... loops are justified. I think you missed my point--the question of how to (or whether one can) use for i in `...` to loop over a list of file names that are output by some arbitrary program. 
The particular example of starting with sorting by date with "ls -d" has the solution of changing to an entirely different solution (using find and sorting as above). However, what about the general case? It sounds like for i in `...` doesn't have an escaping/encoding mechanism that is sufficient to handle both (unescaped) asterisks that represent wildcards and escaped/encoded asterisks that represent literal asterists. Hey, is there any command for taking a filename and escaping/encoding shell-special characters to make a string that, when parsed by the shell, specifies that filename? I'm thinking of something that would work like this: for i in `encode_for_shell *` ; ... [...] No, that is not how shells work. Maybe I gave the wrong kind of example (a for loop, which apparently doesn't parse and interpreting things enough) for asking about an encode command. What about when one is building up a command string in a variable, say CMD, and then executing the assembled command via "$CMD"? The string contained in the variable is parsed as a normal command, right? So any logical string values that contain shell-special characters needs to be encoded with the usual shell escape-sequence syntax, right? (E.g., if I want to delete a file named "xx*yy", I would have to type something like: rm xx\*yy on a manual command line, so if I wanted the command line $CMD to execute that same rm command, CMD would have to contain the string "rm xx\*yy" (e.g., set by the command line: CMD="rm xx\\*yy" ) So if I were listing file names (e.g., with file -print0 and maybe some further filtering with, say grep) and I wanted to assemble a command that operated on the named files without interpreting any shell-special characters in the file names when the assembled command line was parsed by the shell and executed, I would need to map the actual file names to the shell represention of those file names. 
For example, if the list included the name "xx*xx", an encoder could map that string to the string "xx\*xx" (probably written "xx\\*xx" as a literal in sh/bash/etc.), which could be appended to the command string being assembled (and surrounded by whatever separators were needed to separate it from earlier and later tokens in the command line). My encode_for_shell command would be applicable to that case. Is there any such command (or, say, built-in function)? ... But if you are writing real scripts, that are supposed to work (with data, you potentially don't know in the first place), you will need to do things in a proper and robust way. Definitely. Sorry for the lengthy mail. I hope I could make myself a little clearer and didn't spread buggy code. :-) No problem. I agree with counteracting error-prone suggestions. Daniel -- To UNSUBSCRIBE, email to [EMAIL PROTECTED] with a subject of "unsubscribe". Trouble? Contact [EMAIL PROTECTED]
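For the "build a command string, then run it" case described in this message, a quoting helper of the kind asked for can be written in a few lines of POSIX sh. This is a sketch only: the name encode_for_shell is simply borrowed from the message, no standard command of that name is implied, and the assembled string has to be run with eval, because a plain $CMD is only field-split and never re-parsed for quotes.

```shell
#!/bin/sh
# Hypothetical encode_for_shell: wrap the argument in single quotes
# and rewrite every embedded single quote as '\'' so that the result,
# when parsed by a shell, yields the original string as one word.
encode_for_shell() {
    printf '%s\n' "$1" | sed "s/'/'\\\\''/g; 1s/^/'/; \$s/\$/'/"
}

# Assemble a command line containing a name with a literal asterisk.
CMD="printf '%s\n' $(encode_for_shell 'xx*yy')"

# eval re-parses the string, so the quoting added above takes effect
# and the asterisk is not expanded as a wildcard.
eval "$CMD"
```

bash users can get a similar effect from the builtin printf with the %q format. In scripts that do not need a command string at all, keeping file names in "$@" (via set --) avoids the encoding step entirely.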
Re: Unix-ify File Names
Frank Terbeck <[EMAIL PROTECTED]> wrote:
> a) `ls *` is an _external_ process.
> b) it breaks on filenames with spaces (and other special characters).
> c) people commonly use 'ls --color' or 'ls -F' aliases for ls.
> There is _no_ reason why 'ls' should ever be used to generate file
> lists for loops of any kind.

Thanks, Frank, for the clear and comprehensive answer.

Just a little better educated now,
Mike
Re: Unix-ify File Names
H.S. <[EMAIL PROTECTED]>:
> Frank Terbeck wrote:
>
>> b) it breaks on filenames with spaces (and other special characters).
>> While newlines and other special characters might be rather weird
>> for filenames, spaces are perfectly okay and normal in filenames.
>> Using 'for i in `ls *`'-type loops breaks this and is one of the
>> main reasons why people think spaces are bad in filenames.
>> (They are not bad, some people just do not know how to handle
>> them properly.)
>
> I usually get by this problem by enclosing the variable in double quotes
> within the for loop. A basic example:
>
> $> for f in *.jpg; do ls "$f"; done

Yeah, you are using the for-loop construct absolutely right.

I was arguing about

for i in `ls *.jpg` ; do whatever "$i" ; done

And you are, of course, right, that in POSIX shells, parameters should
be double-quoted when used in almost every case, unless you know that
you want splitting by $IFS.

Regards, Frank

--
In protocol design, perfection has been reached not when there is
nothing left to add, but when there is nothing left to take away.
                                                  -- RFC 1925
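The two loop styles contrasted here can be compared directly on a directory that contains one name with a space. A small sketch; the file names and counter names are illustrative and mktemp is assumed to be available:

```shell
#!/bin/sh
demo=$(mktemp -d)
touch "$demo/holiday photo.jpg" "$demo/plain.jpg"
cd "$demo" || exit 1

# Fragile: the output of ls is split on whitespace, so the name with
# a space arrives as two separate words.
ls_words=0
for i in `ls *.jpg`; do
    ls_words=$((ls_words + 1))
done

# Robust: the glob yields one word per file, and "$f" keeps it whole.
glob_files=0
for f in *.jpg; do
    [ -e "$f" ] || continue    # the pattern matched nothing
    glob_files=$((glob_files + 1))
done

echo "ls loop saw $ls_words words, glob loop saw $glob_files files"
cd / && rm -rf "$demo"
```

This prints "ls loop saw 3 words, glob loop saw 2 files": quoting "$i" inside the first loop cannot help, because the name has already been split before the loop body runs.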
Re: Unix-ify File Names
Daniel B. <[EMAIL PROTECTED]>:
> Frank Terbeck wrote:
>> Mike McClain <[EMAIL PROTECTED]>:
>>> Frank Terbeck <[EMAIL PROTECTED]> wrote:
>>>
>>>> for FILE in `ls *$1` ; do
> ...
>> b) it breaks on filenames with spaces (and other special characters).
> ...
>> Using 'for i in `ls *`'-type loops breaks this and is one of the
>> main reasons why people think spaces are bad in filenames.
>> (They are not bad, ...
>
> In what sense are they not bad? Yes, they're certainly legal per the
> filesystem and most tools that take filenames. However, they and other
> special characters do make it more difficult to handle arbitrary file
> names.

No. They are never bad. It just takes a bit of practice to get used to
doing things in a robust way.

> For example, if someone wants to use ls's feature of sorting by date
> (e.g., "ls -t *$1"), they can't combine it with the for-loop construct
> above (reliably).

Okay, I admit that sorting is one of the rare cases where

[snip]
find . -printf '%Ts:%p\n' | sort -rn | cut -d: -f2- | while IFS= read -r file ; do
    ...
done
[snap]

or

[snip]
IFS='
'
for i in `find . -printf '%Ts:%p\n' | sort -rn | cut -d: -f2-` ; do
    ...
done
[snap]

loops are justified. At least in POSIX shell. I really didn't think of
sorting in my original mail. Thanks for noting. (But still you don't
use broken for loops.)

Note, that the for loop does _not_ use an external program with
globbing. And it only works with spaces, because of the changed $IFS
parameter (it is set to a newline). This may lead to unexpected results
if it is not reset to its old value inside of the loop. However, you
can still overcome this:

[snip]
oifs="$IFS"
IFS='
'
set -- x $(find . -printf '%Ts:%p\n' | sort -rn | cut -d: -f2-)
IFS="$oifs"
shift
while [ $# -gt 0 ] ; do
    echo file: "$1"
    shift
done
[snap]

In a POSIX shell only $1 through $9 can be written without braces, but
since the loop always reads "$1" and then shifts, that is not a problem
here. What remains fragile is the $IFS juggling, so a while loop fed by
find(1) (like I noted above) is still the safer choice.
Of course, this breaks with newline characters in filenames, but
newlines are really uncommon (probably only left on a system by users
who don't want their files to be deleted. :-)).

And in zsh, you would actually do:

[snip]
for i in **/*(om) ; do foobar $i ; done
[snap]

Yes, zsh does recursive globbing and lets you define the sorting of the
generated file list. It's really a pity that find(1) does not allow
sorting by itself (even if it was only by a handful of criteria).

But we are slowly leaving the topic, here. I just wanted to make sure
that beginners are not confronted with problematic for-loop constructs
like in the first mail I was replying to. Manipulating $IFS is probably
not something to confront beginners with either.

> Hey, is there any command for taking a filename and escaping/encoding
> shell-special characters to make a string that, when parsed by the
> shell, specifies that filename? I'm thinking of something that would
> work like this:
>
>    for i in `encode_for_shell *` ; ...
[...]

No, that is not how shells work.

Just to repeat this once and for all:
_Never_ do 'for i in `ls *`'. Never. It's broken.

>> some people just do not know how to handle them properly.)
>
> You might not be, but it sounds like you're blaming users. Sometimes
> it's developers of tools (including designers of formats) that don't
> have an escape mechanism to handle spaces or other special characters
> (or don't provide support for encoding special characters) who are to
> blame.

Well, the shell is really really old. It has its flaws. That is why it
is not that easy to use and understand for beginners. Especially, if
they are taught how to do things wrong, that often.

I admit that it can be quite difficult to do things right[tm]. I'm
making mistakes when scripting in 'sh' all the time (at least if the
script is a little more than trivial).

[...]
>> Some people use things like this instead:
>> [snip]
>> ls * | while read file ; do whatever_command "$file" ; done
>> [snap]
>> This is just a little better than the for loop. It still breaks in
>> some situations.
>
> I see how it would break with a newline character in a file name.
> What other cases break?

Broken aliases. Argument lists that are too long. Yeah, 'ls | while ...'
does not have the argument problem, but as soon as you start globbing,
it's there.

>> There is _no_ reason why 'ls' should ever be used to generate file
>> lists for loops of any kind.
>
> What about things that ls does that the shell's expansion of wildcards
> does not do (e.g., sorting by date or size)?
>
> (Maybe ls should have an equivalent to find's "-print0" option.)

In these cases, you use find(1) (in conjunction with other standard
tools, like sort, cut etc.).

Please note that what I am writing here are not must-dos, of course. I
do not intend to attack anybody. I mean, there are people who know
POSIX shell scripting far better than I do, so who am I to judge
others? But 'for i in `ls *`' is really annoyingly wron
Re: Unix-ify File Names
Frank Terbeck wrote:
> b) it breaks on filenames with spaces (and other special characters).
> While newlines and other special characters might be rather weird
> for filenames, spaces are perfectly okay and normal in filenames.
> Using 'for i in `ls *`'-type loops breaks this and is one of the
> main reasons why people think spaces are bad in filenames.
> (They are not bad, some people just do not know how to handle them
> properly.)

I usually get around this problem by enclosing the variable in double
quotes within the for loop. A basic example:

$> for f in *.jpg; do ls "$f"; done

->HS

-- 
To UNSUBSCRIBE, email to [EMAIL PROTECTED]
with a subject of "unsubscribe". Trouble? Contact [EMAIL PROTECTED]
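
The difference between the two loop styles discussed above can be made
visible with a small experiment. The following is only a sketch: the
function names and the temporary directory are made up for the
illustration, and it assumes a POSIX sh and a temporary directory whose
own path contains no whitespace.

```shell
#!/bin/sh
# Hypothetical helpers, for illustration only.

# Count files matching DIR/*SUFFIX using the shell's own globbing.
count_with_glob() {
    n=0
    for f in "$1"/*"$2"; do
        [ -e "$f" ] || continue    # an unmatched pattern stays a literal string
        n=$((n + 1))
    done
    echo "$n"
}

# Count "files" the way the criticized loop sees them: the output of ls
# is split on whitespace, so every space in a name starts a new word.
count_with_ls() {
    n=0
    for f in `ls "$1"/*"$2"`; do
        n=$((n + 1))
    done
    echo "$n"
}

demo_dir=$(mktemp -d)
: > "$demo_dir/New Files.TXT"
: > "$demo_dir/SOME New Files.TXT"
echo "glob loop sees: $(count_with_glob "$demo_dir" .TXT)"
echo "ls loop sees:   $(count_with_ls "$demo_dir" .TXT)"
rm -rf "$demo_dir"
```

The glob loop iterates once per file, while the ls-based loop iterates
once per whitespace-separated word, which is why quoting "$f" inside
the loop body only helps when the list itself comes from a glob.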
Re: Unix-ify File Names
Frank Terbeck wrote:
> Mike McClain <[EMAIL PROTECTED]>:
>> Frank Terbeck <[EMAIL PROTECTED]> wrote:
>> > for FILE in `ls *$1` ; do
...
> b) it breaks on filenames with spaces (and other special characters).
...
> Using 'for i in `ls *`'-type loops breaks this and is one of the
> main reasons why people think spaces are bad in filenames.
> (They are not bad, ...

In what sense are they not bad? Yes, they're certainly legal per the
filesystem and most tools that take filenames. However, they and other
special characters do make it more difficult to handle arbitrary file
names.

For example, if someone wants to use ls's feature of sorting by date
(e.g., "ls -t *$1"), they can't combine it with the for-loop construct
above (reliably).

Hey, is there any command for taking a filename and escaping/encoding
shell-special characters to make a string that, when parsed by the
shell, specifies that filename? I'm thinking of something that would
work like this:

   for i in `encode_for_shell *` ; ...

(mapping each argument to a shell string for the argument's value) or

   for i in `find ... -print0 | xargs -0 encode_for_shell` ; ...

or

   cmd="some_command"
   cmd="${cmd} `encode_for_shell $file_name_with_special_chars`"
   $cmd

(I'm thinking of something like Java's
java.util.regex.Pattern.quote(String) (see
http://java.sun.com/javase/6/docs/api/java/util/regex/Pattern.html#quote(java.lang.String) )
or Ruby's Regexp.escape(...) (see
http://www.ruby-doc.org/core/classes/Regexp.html#M001216 ), but
escaping/encoding for shell parsing instead of for regular-expression
parsing.)

> some people just do not know how to handle them properly.)

You might not be, but it sounds like you're blaming users. Sometimes
it's developers of tools (including designers of formats) that don't
have an escape mechanism to handle spaces or other special characters
(or don't provide support for encoding special characters) who are to
blame.
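
As an illustration of what such an encoding step could look like, here
is a minimal sketch in POSIX sh. The name quote_for_shell is made up for
this example (it is not an existing command); the technique is ordinary
single-quoting, where the string is wrapped in '...' and every embedded
single quote becomes '\''. Bash users may also know the builtin
printf '%q', which serves a similar purpose.

```shell
#!/bin/sh
# quote_for_shell is a hypothetical name. It wraps its argument in
# single quotes and rewrites each embedded ' as '\'' (close the quote,
# add an escaped quote, reopen the quote).
quote_for_shell() {
    printf '%s\n' "$1" | sed "s/'/'\\\\''/g; 1s/^/'/; \$s/\$/'/"
}

name="it's a \$weird  *name*.txt"
quoted=$(quote_for_shell "$name")
printf 'quoted form: %s\n' "$quoted"

# The quoted form is only meaningful where the shell parses it again,
# for example through eval or inside a generated script.
eval "roundtrip=$quoted"
[ "$roundtrip" = "$name" ] && echo "round trip ok"
```

Note that the result of a command substitution is not parsed for quotes
again, so a quoted string of this kind would not help inside
'for i in `...`'; it needs eval or a second shell to be interpreted.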
> I am aware that there are HOWTOs and other documents out there that
> propagate 'for i in `ls *foobar*`' loops. I don't know why their
> authors do this. If they didn't know better they shouldn't have
> written a shell scripting HOWTO in the first place.

Unfortunately for those they mislead, those authors don't know enough
to know they don't know better. (They must not be the type to dig into
things (e.g., shell syntax) to really understand them, or at least
enough to notice that they don't fully understand them yet.)

> Some people use things like this instead:
> [snip]
> ls * | while read file ; do whatever_command "$file" ; done
> [snap]
> This is just a little better than the for loop. It still breaks in
> some situations.

I see how it would break with a newline character in a file name.
What other cases break?

> There is _no_ reason why 'ls' should ever be used to generate file
> lists for loops of any kind.

What about things that ls does that the shell's expansion of wildcards
does not do (e.g., sorting by date or size)?

(Maybe ls should have an equivalent to find's "-print0" option.)

Daniel
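
For the sorting-by-date case raised above, one possible approach is to
keep the records NUL-terminated all the way through the pipeline. This
is a sketch, not a portable recipe: it assumes GNU find (for -printf),
GNU sort and cut (for -z) and an xargs that understands -0. The
function name newest_first is invented for the example.

```shell
#!/bin/sh
# newest_first is a hypothetical helper. It assumes GNU find, sort and
# cut, plus an xargs that accepts -0.
newest_first() {
    # Emit "mtime<TAB>path" records terminated by NUL instead of newline,
    # sort them numerically (newest first), strip the timestamp field and
    # hand the paths to a small inner loop as separate arguments.
    find "$1" -type f -printf '%T@\t%p\0' |
        sort -z -rn |
        cut -z -f2- |
        xargs -0 sh -c 'for f in "$@"; do printf "file: %s\n" "$f"; done' sh
}

demo_dir=$(mktemp -d)
touch -t 200704160000 "$demo_dir/older file.txt"
touch -t 200704170000 "$demo_dir/newer file.txt"
newest_first "$demo_dir"
rm -rf "$demo_dir"
```

Because the file names never pass through word splitting or a
newline-delimited stream, spaces and newlines in names do not change
how many times the inner loop runs.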
Re: Unix-ify File Names
Mike McClain <[EMAIL PROTECTED]>:
> Frank Terbeck <[EMAIL PROTECTED]> wrote:
> > > for FILE in `ls *$1` ; do
> >
> > Please don't teach beginners to do for loops like this. It's broken in
> > various ways. Just do:
> >
> >   for FILE in *"$1" ; do
> >
>
> Being a self taught script writer I just have to ask what are the
> 'various ways' in which the first form is broken?

a) `ls *` is an _external_ process.
   And performance is not the main reason why this is bad. It would
   take a lot of forks to make a notable difference. However, the big
   problem here is that you can only pass a limited number of
   arguments to an external program. This limit is reached quicker
   than you might think, and it is one of the features of for-loops
   to _overcome_ this limitation.

b) it breaks on filenames with spaces (and other special characters).
   While newlines and other special characters might be rather weird
   for filenames, spaces are perfectly okay and normal in filenames.
   Using 'for i in `ls *`'-type loops breaks this and is one of the
   main reasons why people think spaces are bad in filenames.
   (They are not bad, some people just do not know how to handle them
   properly.)

c) people commonly use 'ls --color' or 'ls -F' aliases for ls.
   This is not a bad thing in the first place because it helps to
   simplify the overview you get from ls. However, it has a bad
   impact on scripting:

[snip]
% for i in `ls -F /bin/*sh` ; do stat "$i" ; done
stat: cannot stat `/bin/sh@': No such file or directory
% for i in `ls --color /bin/sh` ; do stat "$i" ; done
stat: cannot stat `\033[00m\033[01;36m/bin/sh\033[00m': No such file or directory
stat: cannot stat `\033[m': No such file or directory
[snap]

   Using '--color=auto' instead of just '--color' helps a little with
   this problem, but not everyone is aware of it. Yes, I know that
   aliases are normally not enabled in scripts, but for-loops are
   very handy as one-liners, so this _is_ indeed a problem.

These are the main reasons that come to my mind.
There might be others.

I am aware that there are HOWTOs and other documents out there that
propagate 'for i in `ls *foobar*`' loops. I don't know why their
authors do this. If they didn't know better they shouldn't have
written a shell scripting HOWTO in the first place.

Some people use things like this instead:

[snip]
ls * | while read file ; do whatever_command "$file" ; done
[snap]

This is just a little better than the for loop. It still breaks in
some situations. Also, 'for i in * ; do foo "$i" ; done' is much
clearer, shorter and simpler to understand.

There is _no_ reason why 'ls' should ever be used to generate file
lists for loops of any kind. Whoever created this myth should be
hung. :-)

Oh, and doing the following will break in certain situations as well
(which is something people do for recursive actions):

[snip]
find . -name '*' | while read file ; do foobar "$file" ; done
[snap]

If you need to do recursive actions, learn to use find(1) properly
(with its '-exec' option), or switch to a shell that can do it by
itself (like zsh, for example).

Regards, Frank

-- 
In protocol design, perfection has been reached not when there is
nothing left to add, but when there is nothing left to take away.
                                                  -- RFC 1925
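
A minimal sketch of the find(1) '-exec' approach recommended above,
using the '+' terminator so that many file names are handed to a single
invocation. The helper name each_file and the demo files are made up
for the illustration; the inner 'sh -c' loop stands in for whatever
command should run per file.

```shell
#!/bin/sh
# each_file is a hypothetical helper name.
# find passes the file names to the inner shell as separate arguments,
# so spaces and even newlines in names never go through word splitting.
each_file() {
    find "$1" -type f -exec sh -c '
        for f in "$@"; do
            printf "got: %s\n" "$f"
        done
    ' sh {} +
}

demo_dir=$(mktemp -d)
mkdir "$demo_dir/sub dir"
: > "$demo_dir/top level.txt"
: > "$demo_dir/sub dir/nested  file.txt"
each_file "$demo_dir"
rm -rf "$demo_dir"
```

The trailing 'sh' after the script text fills $0 of the inner shell, so
that the file names start at $1 and "$@" covers all of them.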
Re: Unix-ify File Names
On Tue, 17 Apr 2007, Roberto C. Sánchez wrote:
> On Tue, Apr 17, 2007 at 03:36:26PM -0700, Mike McClain wrote:
>> Frank Terbeck <[EMAIL PROTECTED]> wrote:
>>> > for FILE in `ls *$1` ; do
>>>
>>> Please don't teach beginners to do for loops like this. It's broken in
>>> various ways. Just do:
>>>
>>>   for FILE in *"$1" ; do
>>
>> Being a self taught script writer I just have to ask what are the
>> 'various ways' in which the first form is broken?
>
> The biggest one I can see is that it spawns an entire process when none
> is needed.
>
> Regards,
>
> -Roberto
> --
> Roberto C. Sánchez
> http://people.connexer.com/~roberto
> http://www.connexer.com

Duly noted, makes perfect sense. Bad habits are hard ones to break.

-+-
8 out of 10 Owners who Expressed a Preference said Their Cats Preferred Techno.
Re: Unix-ify File Names
On Tue, Apr 17, 2007 at 03:36:26PM -0700, Mike McClain wrote:
> Frank Terbeck <[EMAIL PROTECTED]> wrote:
> > > for FILE in `ls *$1` ; do
> >
> > Please don't teach beginners to do for loops like this. It's broken in
> > various ways. Just do:
> >
> >   for FILE in *"$1" ; do
> >
>
> Being a self taught script writer I just have to ask what are the
> 'various ways' in which the first form is broken?
>
The biggest one I can see is that it spawns an entire process when none
is needed.

Regards,

-Roberto

-- 
Roberto C. Sánchez
http://people.connexer.com/~roberto
http://www.connexer.com
Re: Unix-ify File Names
Frank Terbeck <[EMAIL PROTECTED]> wrote:
> > for FILE in `ls *$1` ; do
>
> Please don't teach beginners to do for loops like this. It's broken in
> various ways. Just do:
>
>   for FILE in *"$1" ; do
>

Being a self taught script writer I just have to ask what are the
'various ways' in which the first form is broken?

Anticipating enlightenment,
Mike
Re: Unix-ify File Names
On Tue, Apr 17, 2007 at 09:23:19AM -0700, Leonid Grinberg wrote:
> >opendir(DIR, system('pwd'));
>
> Sorry, that should be:
>
> opendir(DIR, `pwd`);
>
> ` returns output. system() does not.

Or just use '.' as the directory name.

-- 
Ken Irving
Re: Unix-ify File Names
>opendir(DIR, system('pwd'));

Sorry, that should be:

opendir(DIR, `pwd`);

` returns output. system() does not.
Re: Unix-ify File Names
Or, in Perl (might as well):

#!/usr/bin/perl -w
use strict;

opendir(DIR, system('pwd'));
my @files = readdir(DIR);
closedir(DIR);

my $new_name;
foreach (@files) {
    $new_name = lc($_);
    $new_name =~ s/\ /\-/g;
    # Nothing to do for names that are already in the desired form.
    next if $new_name eq $_;
    # List form of system(): the variables are passed as separate
    # arguments, so spaces in file names survive.
    system('mv', '-i', $_, $new_name);
}
Re: Unix-ify File Names
On Mon, 16 Apr 2007 20:24:57 -0700, Masatran, R. Deepak <[EMAIL PROTECTED]> wrote:
> Since I frequently receive files from Microsoft Windows users, is there any
> utility to unix-ify file names, that is, use lower case exclusively, use
> hyphen as separator, etc.?

I use something like this:

#!/bin/bash

OLD_FILE_NAME=$1

# You might want to add a y/ rule to remove accents.
# Spaces become hyphens, upper case becomes lower case.
NEW_FILE_NAME=`echo "$OLD_FILE_NAME" | sed -e '
    s/ /-/g;
    y/ABCDEFGHIJKLMNOPQRSTUVWXYZ/abcdefghijklmnopqrstuvwxyz/;
'`

# Quote both names so that spaces in the old name survive.
mv "$OLD_FILE_NAME" "$NEW_FILE_NAME"

-- 
Octavio.
Re: Unix-ify File Names
Jeff D <[EMAIL PROTECTED]>:
> On Tue, 17 Apr 2007, Masatran, R. Deepak wrote:
> > Since I frequently receive files from Microsoft Windows users, is there any
> > utility to unix-ify file names, that is, use lower case exclusively, use
> > hyphen as separator, etc.?
[...]
> #!/bin/sh
> #change spaces to hyphens
> rename 's/\ /-/g' *$1
>
> #uppercase to lower
> for FILE in `ls *$1` ; do

Please don't teach beginners to do for loops like this. It's broken in
various ways. Just do:

  for FILE in *"$1" ; do

>     filename=`basename $FILE`
>     newfile=`echo $filename | tr A-Z a-z`
>     if [ "$filename" != "$newfile" ] ; then
>         mv $filename $newfile

Maybe adding '-i' to the mv call would be a good idea to avoid
accidentally overwriting existing files.

>     fi
> done
[...]

Regards, Frank

-- 
In protocol design, perfection has been reached not when there is
nothing left to add, but when there is nothing left to take away.
                                                  -- RFC 1925
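
Putting the suggestions from this message together (a glob instead of
`ls`, quoted variables, 'mv -i'), a lowercase-and-hyphenate pass over
one directory might look like the sketch below. The function names
unix_name and unixify_dir are invented for this example, and only
spaces and upper-case letters are handled; it is an illustration of the
idea, not a replacement for the scripts posted in the thread.

```shell
#!/bin/sh
# unix_name and unixify_dir are hypothetical names for this sketch.

# Print the "unix-ified" form of one file name:
# lower case only, spaces replaced by hyphens.
unix_name() {
    printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]' | tr ' ' '-'
}

# Rename every entry directly inside DIR whose name would change.
unixify_dir() {
    for f in "$1"/*; do
        [ -e "$f" ] || continue          # empty directory: pattern left unexpanded
        old=${f##*/}                     # strip the directory part
        new=$(unix_name "$old")
        if [ "$old" != "$new" ]; then
            mv -i -- "$f" "$1/$new"      # -i asks before overwriting
        fi
    done
}

demo_dir=$(mktemp -d)
: > "$demo_dir/New Files.TXT"
: > "$demo_dir/SOME New Files.TXT"
unixify_dir "$demo_dir"
ls "$demo_dir"
rm -rf "$demo_dir"
```

The glob hands each name to the loop as one word, so no special
handling of spaces is needed beyond quoting the variables.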
Re: Unix-ify File Names
On Tue, 17 Apr 2007, Masatran, R. Deepak wrote:
> Since I frequently receive files from Microsoft Windows users, is there any
> utility to unix-ify file names, that is, use lower case exclusively, use
> hyphen as separator, etc.?
>
> --
> Masatran, R. Deepak <http://research.iiit.ac.in/~masatran/>

Not directly, but it's easy enough to do:

#!/bin/sh
#change spaces to hyphens
rename 's/\ /-/g' *$1

#uppercase to lower
for FILE in `ls *$1` ; do
    filename=`basename $FILE`
    newfile=`echo $filename | tr A-Z a-z`
    if [ "$filename" != "$newfile" ] ; then
        mv $filename $newfile
    fi
done

---
$ ls
New Files.TXT  SOME New Files.TXT
$ sh ~/unixfy.sh TXT
$ ls
new-files.txt  some-new-files.txt

-+-
8 out of 10 Owners who Expressed a Preference said Their Cats Preferred Techno.
Unix-ify File Names
Since I frequently receive files from Microsoft Windows users, is there any
utility to unix-ify file names, that is, use lower case exclusively, use
hyphen as separator, etc.?

-- 
Masatran, R. Deepak <http://research.iiit.ac.in/~masatran/>