Re: Unix-ify File Names

2007-04-28 Thread Masatran, R. Deepak
* Masatran, R. Deepak [EMAIL PROTECTED] 2007-04-17
 Since I frequently receive files from Microsoft Windows users, is there any
 utility to unix-ify file names, that is, use lower case exclusively, use
 hyphen as separator, etc.?

I wrote the script below just now. Kindly give comments. This script is also
available at http://research.iiit.ac.in/~masatran/scripts/unixify.

#!/usr/bin/perl
# Unix-ify name of file given as argument.
# For multiple files, use along with the find program of Unix.
# Supports Unicode file names
# Correctly handles dangerous characters in file names.
# How to get confirmation before over-writing a file, without sacrificing portability?

use strict;
use warnings;
use utf8;

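use File::Basename qw(fileparse);

die "usage: unixify FILE\n" unless @ARGV;
my $name_original = shift;
# Only rewrite the last path component, so "./Dir/My File" stays in ./Dir/.
my ($base, $dir) = fileparse($name_original);
my $name_unix = $dir . join("-", split(/\W/, lc $base));
rename $name_original, $name_unix
    or die "unixify: cannot rename $name_original to $name_unix: $!\n";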
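For comparison, a rough POSIX sh sketch of the same idea, which also answers the over-writing question with `mv -i` (a POSIX option that only prompts when the target already exists). This is an illustration under assumptions, not a drop-in replacement for the Perl script: `unixify` is a hypothetical function name, only the last path component is renamed, and each run of characters other than lower-case letters, digits, `.` and `_` becomes a single hyphen, so the extension survives (`My File.TXT` becomes `my-file.txt`). Depending on the tr implementation, non-ASCII characters may be replaced byte by byte.

```shell
#!/bin/sh
# unixify PATH...: lower-case the last component of each PATH and squeeze
# every run of characters other than lower-case letters, digits, "." and "_"
# into a single "-". mv -i asks before overwriting an existing file.
unixify() {
    path=$1
    case $path in
        */*) dir=${path%/*} ;;
        *)   dir=. ;;
    esac
    base=${path##*/}
    new=$(printf '%s' "$base" | tr '[:upper:]' '[:lower:]' |
          tr -cs '[:lower:][:digit:]._' '-')
    [ "$new" = "$base" ] && return 0    # already Unix-style, nothing to do
    mv -i -- "$path" "$dir/$new"
}

for f in "$@"; do
    unixify "$f"
done
```

Used with find as the Perl script's comment suggests, something like `find . -depth -exec ./unixify.sh {} +` (script name hypothetical) would rename entries before their parent directories, so the paths find hands over stay valid.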

-- 
Masatran, R. Deepak http://research.iiit.ac.in/~masatran/




Re: Unix-ify File Names

2007-04-24 Thread Frank Terbeck
Daniel Barclay [EMAIL PROTECTED]:
 Frank Terbeck wrote:
 Daniel Barclay [EMAIL PROTECTED]:
 Frank Terbeck wrote:
 Daniel B. [EMAIL PROTECTED]:
[...]
 For example, Emacs' tags files use commas as delimiters, and (last I
 knew) don't have an escape/encoding mechanism for representing a comma
 _in_ a file name, so (again, last I knew) a Linux kernel file with
 a comma in its name doesn't get processed right.
 So? Just because there are programs that limit the namespace of the
 files they are working with (which is _absolutely_ okay), does not
 mean, that shell scripts must obey these programs' behaviours.

 How did you infer that I was arguing that the shells should follow
 those programs' behaviors?  I wasn't arguing for that.

 I was pointing out that using shell-special characters in filenames
 was (somewhat) bad--it triggers problems with non-robust programs.

Then what is the point of the discussion? I am not telling anyone
to bring filenames with weird characters into their system. But it is
possible to support them if they are there.

If you do not know about the data you are dealing with, limiting the
code does not make sense. But I said that before.

[...]
 Btw: xargs is not needed if your find binary is reasonably POSIX
 compliant. Just use '+' instead of ';' with the -exec option. (Yes, I
 know that GNU find didn't support this for quite some time.)

 Which version of find supports that?  My (Sarge) system's man page
 for find doesn't seem to mention it yet.

I said GNU find didn't support it for quite some time.
But nevertheless, SUSv3 defined it before (and there is not just the
GNU version of find in this world).

The GNU find in etch supports it.

 Does the + make find invoke the command with multiple filenames at
 once?

Yes.

 However, what about the general case?

 It sounds like for i in `...` doesn't have an escaping/encoding
 mechanism that is sufficient to handle both (unescaped) asterisks
 that represent wildcards and escaped/encoded asterisks that represent
 literal asterisks.
 I don't think you really understand, what is happening here.
 [snip]
 % foo='bar\ baz'
 % for i in `echo $foo` ; do echo "($i)" ; done
 (bar\)
 (baz)
 [snap]
 You _cannot_ escape things there. 

 So how am I misunderstanding it?  (I said it sounds like the shell
 for loop doesn't support escaping.  You said one cannot escape
 things there.  Those statements are consistent with each other.
 So how am I not understanding it?

See end of mail.

 You see, this is not the type of thing, you want to teach beginners.
 Hence, 'for i in `...`' loops should be avoided by beginners (did you
 realize, that you dropped 'ls *glob*' from the backtick expression? 

 Yes.  Did you realize that I was trying to talk about cases that are
 more general than just globbing done by the shell?

Yes. Otherwise you would have left the glob in there.
But you are narrowing the subject until it fits your argumentation.

 (By the way, why do you keep sticking extraneous commas in the middle
 of your sentences?)

No native English speaker here. Want to recommend a book about
grammar?

[...]
 Is there any such command (or, say, built-in function)?
 It sounds like you are looking for 'eval'.

 Yes, that does seem like the easier (and safer) (right) way.

No. 'eval' is a great tool and has its uses. But it does not make
loops easier, nor safer.
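To illustrate that point with a sketch (assuming a POSIX sh; `print_names` is a hypothetical helper): when the goal is to build up an argument list, the positional parameters can collect file names without ever turning them into a command string, so nothing has to be escaped and nothing has to go through eval.

```shell
#!/bin/sh
# print_names DIR: gather the entries of DIR into "$@" and print each one
# in angle brackets. The ( ) body is a subshell, so cd and set -- do not
# leak out of the function.
print_names() (
    cd "$1" || exit 1
    set --                          # start with an empty list
    for f in *; do
        # In an empty directory the glob stays a literal "*"; skip it.
        [ -e "$f" ] || [ -L "$f" ] || continue
        set -- "$@" "$f"            # append as one word, no escaping needed
    done
    if [ "$#" -gt 0 ]; then
        printf '<%s>\n' "$@"        # or: some_command "$@"
    fi
)
```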

 But this has got nothing to do with the original subject.
 And this misunderstanding leads me to the conclusion, that you should
 read up on how various expansions in POSIX shells work (and probably
 on a few common pitfalls, like maximum size of arguments for external
 processes, too.);

 Yeah, I know about that one (well, that there is a limit, if not
 details).

You do not know it. Otherwise you would know how the expansion in
  for var in `foobar baz`
works, and not argue about 'loops do not support escaping'. Escaping
is a different topic, that does not apply here.

I will not continue the discussion just for the sake of it.
I think I have made my point clear by now.

Regards, Frank

-- 
In protocol design, perfection has been reached not when there is
nothing left to add, but when there is nothing left to take away.
  -- RFC 1925


-- 
To UNSUBSCRIBE, email to [EMAIL PROTECTED] 
with a subject of unsubscribe. Trouble? Contact [EMAIL PROTECTED]



Re: Unix-ify File Names

2007-04-23 Thread Daniel Barclay

Frank Terbeck wrote:

Daniel Barclay [EMAIL PROTECTED]:

Frank Terbeck wrote:

Daniel B. [EMAIL PROTECTED]:

Frank Terbeck wrote:

Mike McClain [EMAIL PROTECTED]:

Frank Terbeck [EMAIL PROTECTED] wrote:

...

... people think spaces are bad in filenames.
(They are not bad, ...

In what sense are they not bad?  ... However, they and other
special characters do make it more difficult to handle arbitrary file
names.

No. They are never bad. It just takes a bit of practice to get used to
doing things in a robust way.

But some common Unix tools aren't robust enough, in the sense of
providing consistent escape/encoding mechanisms to handle special
characters.

For example, Emacs' tags files use commas as delimiters, and (last I
knew) don't have an escape/encoding mechanism for representing a comma
_in_ a file name, so (again, last I knew) a Linux kernel file with
a comma in its name doesn't get processed right.


So? Just because there are programs that limit the namespace of the
files they are working with (which is _absolutely_ okay), does not
mean, that shell scripts must obey these programs' behaviours.


How did you infer that I was arguing that the shells should follow
those programs' behaviors?  I wasn't arguing for that.


I was pointing out that using shell-special characters in filenames
was (somewhat) bad--it triggers problems with non-robust programs.




Some commands do provide fully general mechanisms.  (For example,
find's -print0 and xargs' -0 option can handle any possible file
pathname, including one with newline characters.)  However, many
commands do not.  That typically makes it very difficult to
handle special characters.



...

Btw: xargs is not needed if your find binary is reasonably POSIX
compliant. Just use '+' instead of ';' with the -exec option. (Yes, I
know that GNU find didn't support this for quite some time.)


Which version of find supports that?  My (Sarge) system's man page
for find doesn't seem to mention it yet.

Does the + make find invoke the command with multiple filenames at
once?




However, what about the general case?

It sounds like for i in `...` doesn't have an escaping/encoding
mechanism that is sufficient to handle both (unescaped) asterisks
that represent wildcards and escaped/encoded asterisks that represent
literal asterisks.


I don't think you really understand, what is happening here.

[snip]
% foo='bar\ baz'
% for i in `echo $foo` ; do echo "($i)" ; done
(bar\)
(baz)
[snap]

You _cannot_ escape things there. 


So how am I misunderstanding it?  (I said it sounds like the shell
for loop doesn't support escaping.  You said one cannot escape
things there.  Those statements are consistent with each other.
So how am I not understanding it?


You see, this is not the type of thing, you want to teach beginners.
Hence, 'for i in `...`' loops should be avoided by beginners (did you
realize, that you dropped 'ls *glob*' from the backtick expression? 


Yes.  Did you realize that I was trying to talk about cases that are
more general than just globbing done by the shell?

(By the way, why do you keep sticking extraneous commas in the middle
of your sentences?)



What about when one is building up a command string in a variable,
say CMD, and then executing the assembled command via $CMD?

The string contained in the variable is parsed as a normal command,
right?  So any logical string values that contain shell-special
characters need to be encoded with the usual shell escape-sequence
syntax, right?

(E.g., if I want to delete a file named xx*yy, I would have to type
something like:

rm xx\*yy

on a manual command line, so if I wanted the command line

$CMD

to execute that same rm command, CMD would have to contain the
string "rm xx\*yy" (e.g., set by the command line:

   CMD="rm xx\\*yy"

)

[...]

Is there any such command (or, say, built-in function)?


It sounds like you are looking for 'eval'.


Yes, that does seem like the easier (and safer) (right) way.



But this has got nothing to do with the original subject.
And this misunderstanding leads me to the conclusion, that you should
read up on how various expansions in POSIX shells work (and probably
on a few common pitfalls, like maximum size of arguments for external
processes, too.);   


Yeah, I know about that one (well, that there is a limit, if not
details).


 No offence.

Next time, you might want to avoid telling someone they don't
understand the very things you then immediately proceed to show
they have already understood.




Daniel

--







Re: Unix-ify File Names

2007-04-21 Thread Mike McClain
Frank Terbeck [EMAIL PROTECTED] wrote:

 find is just the tool you want to use for recursive actions on files
 (or specialized actions, like sorting). find is an external program,
 but it does not take a file list as argument, which makes it the
 ultimate choice.

I took your advice to heart and started rewriting a portion of a 
script I use. It started out like so:
for f in `find /etc/ \( -type f -o -type l \) -name *`; do
fsum=$(/usr/bin/md5sum -b $f | cut -c-32) ;
fls=$(ls -li --full-time $f) ;
echo $fsum $fls ;
done
At your instigation I tried this next:
find /etc/ \( -type f -o -type l \) -name * -exec {
fsum=$(/usr/bin/md5sum -b {} | cut -c-32) ;
fls=$(ls -li --full-time {} ) ;
echo $fsum $fls ;
} /;

Along with screens full of junk that appeared to be the output I
expected except that:
1) there were no newlines, 
2) it was data relating to files in my home directory,
3) it only appeared after I hit ^C thinking the job was hung.
While trying various quoting, escaping and other flails, I also 
received messages like the following:
-bash: }: command not found
-bash: syntax error near unexpected token `}'
find: missing argument to `-exec'

Failing that I tried a function:

md5ls () {
local fsum=$(/usr/bin/md5sum -b $1 | cut -c-32 2>/dev/null ) ;
local fls=$(ls -li --full-time $1) ;
echo $fsum $fls ;
}
find /etc/ \( -type f -o -type l \) -name * -exec md5ls {} \;
find: md5ls: No such file or directory

It did finally perform correctly when I converted the function to a 
script but it's slow.

I kept playing and came up with this which is much quicker and returns
the same data though in a different format:

find /etc/ \( -type f -o -type l \) \
-printf '%8i %y %#m %n %u %g %10s %TY-%Tm-%Td %TT  ' \
-name * -exec /usr/bin/md5sum -b {} \;

The only problem with this one is that it fails when it tries to 
handle a link to a directory. There's no newline in the printf
format because the output of md5sum provides one for files and
links to files but not for links to directories or links to 
missing files.

For some reason find hides md5sum's failure so that 'set -e' at
the top of the script doesn't work. Paste this into a script of
your own and you'll see what I mean.

--- cut here ---
#!/bin/sh
#   testof find ... -exec

set -e  # exit on error

find /usr/bin/ -name X11 -print -exec /usr/bin/md5sum {} \;
echo $?;
/usr/bin/md5sum /usr/bin/X11;
echo $?;
--- cut here ---

With the set statement in there, the second echo isn't reached,
but `echo $?` entered on the command line after the script finishes prints 1.
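What seems to be happening: with the `-exec ... \;` form, find uses the command's exit status only as the true/false result of that test for that file, so a failing md5sum does not change find's own exit status. For the `{} +` form, POSIX says find shall return a non-zero status if any invocation fails, and current GNU find behaves that way (Sarge's find may not support `+` at all, as discussed elsewhere in the thread). A small sketch, assuming mktemp is available:

```shell
#!/bin/sh
t=$(mktemp -d) || exit 1
touch "$t/file"

# ';' form: the failing command only makes this test false for the file.
find "$t" -type f -exec false {} \;
echo "';' form: find exited with $?"     # 0

# '+' form: a failing invocation makes find itself fail.
find "$t" -type f -exec false {} +
echo "'+' form: find exited with $?"     # non-zero

rm -rf "$t"
```

So under `set -e`, the `+` form would stop the script where the `\;` form does not.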

Whew, sorry, I got a little long-winded. I guess the question I
have is how to make either the second or third form work:
find ... -exec {command {}; command {}; command;} \;
or
find ... -exec function {} \;
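One common way to get both forms working, sketched here assuming GNU find and coreutils (for `ls --full-time`): find can only start programs, not shell syntax or shell functions, so the commands are handed to a child `sh -c`, and find passes the file names as that shell's positional parameters. `md5ls_tree` is a hypothetical name. The `-name *` test is dropped because it matches every name anyway, and `[ -f ]` skips links to directories and dangling links, which is where md5sum failed in the version above.

```shell
#!/bin/sh
# md5ls_tree DIR: for every regular file (or symlink to one) under DIR,
# print its MD5 sum followed by its `ls -li --full-time` line.
# The word "sh" after the script fills $0; find appends the file names,
# which the child shell sees as "$@".
md5ls_tree() {
    find "$1" \( -type f -o -type l \) -exec sh -c '
        for f in "$@"; do
            [ -f "$f" ] || continue   # skip links to directories and dangling links
            fsum=$(md5sum -b "$f" | cut -c-32)
            fls=$(ls -li --full-time "$f")
            printf "%s %s\n" "$fsum" "$fls"
        done
    ' sh {} +
}
```

Usage would be `md5ls_tree /etc/`. In bash, `export -f md5ls` followed by `-exec bash -c 'md5ls "$1"' bash {} \;` is another option, but it is bash-specific.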

Still learning,
Mike





Re: Unix-ify File Names

2007-04-21 Thread Thomas Jollans
Masatran, R. Deepak wrote:
 Since I frequently receive files from Microsoft Windows users, is there any
 utility to unix-ify file names, that is, use lower case exclusively, use
 hyphen as separator, etc.?
 

I wrote this little zsh script once; it unixifies all file names in the
current directory and its subdirectories. This may or may not work in other shells (I
believe bash is quite feature-rich as well, but I don't use it.)



#!/bin/zsh

FS=


for f in **/*
do
  #required for files in the current dir.
  f=./$f
  #dir of file
  fp1=${f%/*}/
  #name of file
  fp2=${f##*/}
  #dir should already be anti-spaced and lower-cased
  f=$fp1:gs/\ /_/:l$fp2
  #the new name; anti-spaced and lower-cased
  f2=$f:gs/\ /_/:l

  if ! [[ $f = $f2 ]]
  then
mv -v $f $f2
  fi
done






Re: Unix-ify File Names

2007-04-21 Thread Frank Terbeck
Thomas Jollans [EMAIL PROTECTED]:
[...]

zsh, yay! :-)
Just a few remarks.

 #!/bin/zsh
 
 FS=
 

IFS, I suppose. But: Why do you set it?

 for f in **/*

  for f in ./**/* # make f=./$f unneeded below.

 do
   #required for files in the current dir.
   f=./$f
   #dir of file
   fp1=${f%/*}/

fp1=${f:h}/  # (think (h)ead)

   #name of file
   fp2=${f##*/}

fp2=${f:t}  # (think (t)ail)

   #dir should already be anti-spaced and lower-cased
   f=$fp1:gs/\ /_/:l$fp2
   #the new name; anti-spaced and lower-cased
   f2=$f:gs/\ /_/:l
 
   if ! [[ $f = $f2 ]]
   then
 mv -v $f $f2
   fi
 done

Of course, your expansions do work (and they are portable, as they
work in every POSIX shell), but if you use zsh already, why not ':t'
and ':h', as they are easier to read, IMHO. :-)
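For readers not using zsh, here is a small sketch of what the two POSIX expansions in the script do (the path is made up), which is what zsh's `:h` and `:t` modifiers abbreviate:

```shell
#!/bin/sh
f='./Some Dir/My File.TXT'
dir=${f%/*}     # remove the shortest suffix matching "/*"  -> ./Some Dir
name=${f##*/}   # remove the longest prefix matching "*/"   -> My File.TXT
printf '%s\n' "$dir" "$name"
```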

Recursive globbing is just a wonderful feature, isn't it? :-)

Regards, Frank




Re: Unix-ify File Names

2007-04-21 Thread Thomas Jollans
Frank Terbeck wrote:
 Thomas Jollans [EMAIL PROTECTED]:
 [...]
 
 zsh, yay! :-)
 Just a few remarks.
 
 #!/bin/zsh

 FS=
 
 
 IFS, I suppose. But: Why do you set it?
ugh... good question. I wrote this ages ago ;-)

 
 for f in **/*
 
   for f in ./**/* # make f=./$f unneeded below.
 
 do
   #required for files in the current dir.
   f=./$f
   #dir of file
   fp1=${f%/*}/
 
 fp1=${f:h}/  # (think (h)ead)
 
   #name of file
   fp2=${f##*/}
 
 fp2=${f:t}  # (think (t)ail)
 
   #dir should already be anti-spaced and lower-cased
   f=$fp1:gs/\ /_/:l$fp2
   #the new name; anti-spaced and lower-cased
   f2=$f:gs/\ /_/:l

   if ! [[ $f = $f2 ]]
   then
 mv -v $f $f2
   fi
 done
 
 Of course, your expansions do work (and they are portable, as they
 work in every POSIX shell), but if you use zsh already, why not ':t'
 and ':h', as they are easier to read, IMHO. :-)
 
 Recursive globbing is just a wonderful feature, isn't it? :-)
definitely.

Thanks for the comments :-)

Thomas






Re: Unix-ify File Names

2007-04-21 Thread Octavio Alvarez
On Sat, 21 Apr 2007 05:26:40 -0700, Thomas Jollans [EMAIL PROTECTED]  
wrote:

FS=



IFS, I suppose. But: Why do you set it?

ugh... good question. I wrote this ages ago ;-)


To make sure spaces in filenames don't break them apart?

--
Octavio.



Re: Unix-ify File Names

2007-04-21 Thread Frank Terbeck
Octavio Alvarez [EMAIL PROTECTED]:
 On Sat, 21 Apr 2007 05:26:40 -0700, Thomas Jollans [EMAIL PROTECTED] 
 wrote:
 FS=
 

 IFS, I suppose. But: Why do you set it?
 ugh... good question. I wrote this ages ago ;-)

 To make sure spaces in filenames don't break them apart?

Not an issue in default zsh setup.

Regards, Frank




Re: Unix-ify File Names

2007-04-19 Thread Mike McClain
Frank Terbeck [EMAIL PROTECTED] wrote:
 a) `ls *` is an _external_ process.
 b) it breaks on filenames with spaces (and other special characters).
 c) people commonly use 'ls --color' or 'ls -F' aliases for ls.
 There is _no_ reason why 'ls' should ever be used to generate file
 lists for loops of any kind.

Thanks, Frank, for the clear and comprehensive answer.

Just a little better educated now,
Mike





Re: Unix-ify File Names

2007-04-19 Thread Daniel Barclay

Frank Terbeck wrote:

Daniel B. [EMAIL PROTECTED]:

Frank Terbeck wrote:

Mike McClain [EMAIL PROTECTED]:

Frank Terbeck [EMAIL PROTECTED] wrote:


 for FILE in `ls *$1` ; do

...

b) it breaks on filenames with spaces (and other special characters).

... Using 'for i in `ls *`'-type loops breaks this and is one of the
main reasons why people think spaces are bad in filenames.
(They are not bad, ...

In what sense are they not bad?  Yes, they're certainly legal per the
filesystem and most tools that take filenames.  However, they and other
special characters do make it more difficult to handle arbitrary file
names.


No. They are never bad. It just takes a bit of practice to get used to
doing things in a robust way.


But some common Unix tools aren't robust enough, in the sense of
providing consistent escape/encoding mechanisms to handle special
characters.

For example, Emacs' tags files use commas as delimiters, and (last I
knew) don't have an escape/encoding mechanism for representing a comma
_in_ a file name, so (again, last I knew) a Linux kernel file with
a comma in its name doesn't get processed right.


Some commands do provide fully general mechanisms.  (For example,
find's -print0 and xargs' -0 option can handle any possible file
pathname, including one with newline characters.)  However, many
commands do not.  That typically makes it very difficult to
handle special characters.
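A small demonstration of why the NUL-delimited pair matters, assuming GNU or BSD find and xargs (neither `-print0` nor `-0` was in POSIX at the time of this thread):

```shell
#!/bin/sh
t=$(mktemp -d) || exit 1
touch "$t/one" "$t/two words" "$t/line
break"

# Newline-separated output cannot tell the newline inside a name from the
# separator between names: wc counts 4 lines for 3 files.
find "$t" -type f | wc -l

# A path can never contain a NUL byte, so with -print0 | xargs -0 every name
# arrives intact: the child shell reports 3 arguments.
find "$t" -type f -print0 | xargs -0 sh -c 'echo "$#"' sh

rm -rf "$t"
```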








For example, if someone wants to use ls's feature of sorting by date
(e.g., ls -t *$1), they can't combine it with the for-loop construct
above (reliably).


Okay, I admit that sorting is one of the rare cases where

[snip]
find . -printf '%Ts:%p\n' | sort -rn | cut -d: -f2- | while IFS= read -r ; do
  ...
done
[snap]

or

...
loops are justified. 


I think you missed my point--the question of how to (or whether one
can) use for i in `...` to loop over a list of file names that are
output by some arbitrary program.

The particular example of starting with sorting by date with ls -t
has the solution of changing to an entirely different solution (using
find and sorting as above).

However, what about the general case?

It sounds like for i in `...` doesn't have an escaping/encoding
mechanism that is sufficient to handle both (unescaped) asterisks
that represent wildcards and escaped/encoded asterisks that represent
literal asterisks.



Hey, is there any command for taking a filename and escaping/encoding
shell-special characters to make a string that, when parsed by the
shell, specifies that filename?  I'm thinking of something that would
work like this:

   for i in `encode_for_shell *` ; ...

[...]

No, that is not how shells work.


Maybe I gave the wrong kind of example (a for loop, which apparently
doesn't parse and interpret things enough) for asking about an
encode command.

What about when one is building up a command string in a variable,
say CMD, and then executing the assembled command via $CMD?

The string contained in the variable is parsed as a normal command,
right?  So any logical string values that contain shell-special
characters needs to be encoded with the usual shell escape-sequence
syntax, right?

(E.g., if I want to delete a file named xx*yy, I would have to type
something like:

rm xx\*yy

on a manual command line, so if I wanted the command line

$CMD

to execute that same rm command, CMD would have to contain the
string "rm xx\*yy" (e.g., set by the command line:

   CMD="rm xx\\*yy"

)
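A quick experiment suggests the premise needs one adjustment: an unquoted `$CMD` is not parsed again as a command line. It only undergoes field splitting and pathname expansion, so quotes and backslashes inside the value stay ordinary characters; only eval re-parses the string. A sketch, assuming a POSIX sh run in a directory with no file literally named `xx*yy`, and using printf instead of rm so that nothing gets deleted:

```shell
#!/bin/sh
CMD="printf '(%s)' xx\\*yy"   # CMD now holds: printf '(%s)' xx\*yy

# Unquoted $CMD: split into printf, '(%s)' and xx\*yy; nothing is re-parsed.
$CMD; echo          # prints '(xx\*yy)' with the quotes and backslash intact

# eval parses the string as shell input, like a typed command line.
eval "$CMD"; echo   # prints (xx*yy)
```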

So if I were listing file names (e.g., with find -print0 and maybe some
further filtering with, say grep) and I wanted to assemble a command
that operated on the named files without interpreting any shell-special
characters in the file names when the assembled command line was parsed
by the shell and executed, I would need to map the actual file names to
the shell representation of those file names.

For example, if the list included the name xx*xx, an encoder could
map that string to the string xx\*xx (probably written xx\\*xx as
a literal in sh/bash/etc.), which could be appended to the command
string being assembled (and surrounded by whatever separators were
needed to separate it from earlier and later tokens in the command line).

My encode_for_shell command would be applicable to that case.

Is there any such command (or, say, built-in function)?
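bash's builtin printf has a `%q` format that does roughly this, but it is bash-specific. For plain sh, a common technique is to wrap the value in single quotes and write every embedded `'` as `'\''`. The function below is a hypothetical sketch of that technique (`shell_quote` is not a standard command); its output is meant to be handed to eval, and like any POSIX sh function it uses global variables (`rest`, `out`).

```shell
#!/bin/sh
# shell_quote STRING: print STRING in single quotes, with each embedded '
# written as '\'' (close quote, escaped quote, reopen quote), so that the
# shell parses the result back to exactly STRING.
shell_quote() {
    rest=$1
    out=
    while :; do
        case $rest in
            *\'*)
                out=$out${rest%%\'*}"'\\''"   # text before the quote, then '\''
                rest=${rest#*\'}              # continue after that quote
                ;;
            *)
                out=$out$rest
                break
                ;;
        esac
    done
    printf "'%s'" "$out"
}

name="it's a *weird* name"
cmd="printf '%s\n' $(shell_quote "$name")"
eval "$cmd"    # prints: it's a *weird* name
```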



... But if you are writing real scripts, that are
supposed to work (with data, you potentially don't know in the first
place), you will need to do things in a proper and robust way.


Definitely.



Sorry for the lengthy mail. I hope I could make myself a little
clearer and didn't spread buggy code. :-)


No problem.  I agree with counteracting error-prone suggestions.

Daniel







Re: Unix-ify File Names

2007-04-19 Thread Frank Terbeck
Daniel Barclay [EMAIL PROTECTED]:
 Frank Terbeck wrote:
 Daniel B. [EMAIL PROTECTED]:
 Frank Terbeck wrote:
 Mike McClain [EMAIL PROTECTED]:
 Frank Terbeck [EMAIL PROTECTED] wrote:

  for FILE in `ls *$1` ; do
 ...
 b) it breaks on filenames with spaces (and other special characters).
 ... Using 'for i in `ls *`'-type loops breaks this and is one of the
 main reasons why people think spaces are bad in filenames.
 (They are not bad, ...
 In what sense are they not bad?  Yes, they're certainly legal per the
 filesystem and most tools that take filenames.  However, they and other
 special characters do make it more difficult to handle arbitrary file
 names.
 No. They are never bad. It just takes a bit of practice to get used to
 doing things in a robust way.

 But some common Unix tools aren't robust enough, in the sense of
 providing consistent escape/encoding mechanisms to handle special
 characters.

 For example, Emacs' tags files use commas as delimiters, and (last I
 knew) don't have an escape/encoding mechanism for representing a comma
 _in_ a file name, so (again, last I knew) a Linux kernel file with
 a comma in its name doesn't get processed right.

So? Just because there are programs that limit the namespace of the
files they are working with (which is _absolutely_ okay), does not
mean, that shell scripts must obey these programs' behaviours. The
shell itself can handle whitespace in filenames just fine. No need to
not use robust techniques, at all. It would be even worse, to use
techniques that will potentially break.

 Some commands do provide fully general mechanisms.  (For example,
 find's -print0 and xargs' -0 option can handle any possible file
 pathname, including one with newline characters.)  However, many
 commands do not.  That typically makes it very difficult to
 handle special characters.

Most programs do support filenames with special characters (if they
don't it is clearly a bug). They just depend on the shell giving them
the correct string.

Btw: xargs is not needed if your find binary is reasonably POSIX
compliant. Just use '+' instead of ';' with the -exec option. (Yes, I
know that GNU find didn't support this for quite some time.)
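A sketch of the difference, assuming a find that supports the POSIX `{} +` form (current GNU find does) and GNU md5sum:

```shell
#!/bin/sh
t=$(mktemp -d) || exit 1
touch "$t/plain" "$t/with space" "$t/star*name"

# ";" form: find starts one md5sum process per file.
find "$t" -type f -exec md5sum -b {} \;

# "+" form: find collects the names and starts md5sum as few times as the
# argument-length limit allows (here once), much like xargs would.
find "$t" -type f -exec md5sum -b {} +

# Count the names each invocation received; "sh" fills $0, so "$@" is the list.
find "$t" -type f -exec sh -c 'echo "$# names in this batch"' sh {} +

rm -rf "$t"
```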

 For example, if someone wants to use ls's feature of sorting by date
 (e.g., ls -t *$1), they can't combine it with the for-loop construct
 above (reliably).
 Okay, I admit that sorting is one of the rare cases where
 [snip]
 find . -printf '%Ts:%p\n' | sort -rn | cut -d: -f2- | while IFS= read -r ; do
   ...
 done
 [snap]
 or
 ...
 loops are justified. 

 I think you missed my point--the question of how to (or whether one
 can) use for i in `...` to loop over a list of file names that are
 output by some arbitrary program.

You just do not need that. And if you do, it's very hard to get right.
'for i in *' supports _every_ filename you could think of (including
filenames with newlines). Why use something, that is more expensive,
more error-prone and less powerful?
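A minimal way to see the difference, assuming a POSIX sh and a scratch directory made with mktemp:

```shell
#!/bin/sh
t=$(mktemp -d) || exit 1
touch "$t/plain" "$t/with space"
(
    cd "$t" || exit 1
    n=0
    for f in `ls`; do n=$((n + 1)); done
    echo "for f in \`ls\`: $n iterations"   # 3: "with space" was split in two
    n=0
    for f in *; do n=$((n + 1)); done
    echo "for f in *: $n iterations"        # 2: one per file
)
rm -rf "$t"
```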

 The particular example of starting with sorting by date with ls -t
 has the solution of changing to an entirely different solution (using
 find and sorting as above).

find is just the tool you want to use for recursive actions on files
(or specialized actions, like sorting). find is an external program,
but it does not take a file list as argument, which makes it the
ultimate choice.

 However, what about the general case?

 It sounds like for i in `...` doesn't have an escaping/encoding
 mechanism that is sufficient to handle both (unescaped) asterisks
 that represent wildcards and escaped/encoded asterisks that represent
 literal asterisks.

I don't think you really understand, what is happening here.

[snip]
% foo='bar\ baz'
% for i in `echo $foo` ; do echo "($i)" ; done
(bar\)
(baz)
[snap]

You _cannot_ escape things there. If you want to know what's going on,
consult the manual of your shell (or the respective POSIX document).
Normally, a section about 'Word Expansions' will describe what happens
in detail.
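Roughly, the result of a command substitution is split into fields on $IFS, and each field then goes through pathname expansion; quote removal does not apply to it, so a backslash coming out of the command is just a character. The sketch below, assuming a POSIX sh and a scratch directory holding two files, shows both halves: the backslash cannot protect the space, while an asterisk in the output still acts as a wildcard, so such a loop cannot tell a literal asterisk from a wildcard.

```shell
#!/bin/sh
t=$(mktemp -d) || exit 1
touch "$t/xa" "$t/xb"
(
    cd "$t" || exit 1
    # The backslash is not an escape any more; the space still splits.
    for i in $(printf '%s\n' 'bar\ baz'); do printf '(%s)\n' "$i"; done   # (bar\) then (baz)
    # An asterisk in the output is still a live wildcard.
    for i in $(printf '%s\n' 'x*'); do printf '(%s)\n' "$i"; done        # (xa) then (xb)
)
rm -rf "$t"
```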

You see, this is not the type of thing, you want to teach beginners.
Hence, 'for i in `...`' loops should be avoided by beginners (did you
realize, that you dropped 'ls *glob*' from the backtick expression? If
you really know what you are doing, you can get these types of loops
more or less right. But _never_ _ever_ if the command is getting a
file list via globbing.)

 Hey, is there any command for taking a filename and escaping/encoding
 shell-special characters to make a string that, when parsed by the
 shell, specifies that filename?  I'm thinking of something that would
 work like this:

for i in `encode_for_shell *` ; ...
 [...]
 No, that is not how shells work.

 Maybe I gave the wrong kind of example (a for loop, which apparently
 doesn't parse and interpret things enough) for asking about an
 encode command.

The parsing is done according to the normal rules of the shell; whether
you use a loop or not does not matter.

 What about when one is 

Re: Unix-ify File Names

2007-04-19 Thread Ken Irving
On Thu, Apr 19, 2007 at 09:05:56PM +0200, Frank Terbeck wrote:
 Daniel Barclay [EMAIL PROTECTED]:
  Some commands do provide fully general mechanisms.  (For example,
  find's -print0 and xargs' -0 option can handle any possible file
  pathname, including one with newline characters.)  However, many
  commands do not.  That typically makes it very difficult to
  handle special characters.
 
 Most programs do support filenames with special characters (if they
 don't it is clearly a bug). They just depend on the shell giving them
 the correct string.
 
 Btw: xargs is not needed if your find binary is reasonably POSIX
 compliant. Just use '+' instead of ';' with the -exec option. (Yes, I
 know that GNU find didn't support this for quite some time.)
 
Wow... never heard of this and was going to ask more about it, but I see
it's in the find(1) manpage post-sarge. I use find a lot, xargs only
when it seems necessary, but the standard response to someone using find
has been that it's bad due to spawning umpteen processes. Looks like
that's no longer the case! 

Hmm, -execdir looks new as well, and very useful...

Thanks!

Ken

-- 
Ken Irving





Re: Unix-ify File Names

2007-04-18 Thread Frank Terbeck
Mike McClain [EMAIL PROTECTED]:
 Frank Terbeck [EMAIL PROTECTED] wrote:

for FILE in `ls *$1` ; do
 
  Please don't teach beginners to do for loops like this. It's broken in
  various ways. Just do:
 
for FILE in *$1 ; do
 

 Being a self taught script writer I just have to ask what are the
 'various ways' in which the first form is broken?

a) `ls *` is an _external_ process.

And performance is not the main reason why this is bad.
It would have to be a lot of forks to make a notable difference.

However, the big problem here is, that you can only pass a limited
number of arguments to an external program. This limit is reached
quicker than you might think, and it is one of the features of
for-loops to _overcome_ this limitation.
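To see the limit on a given system, a short sketch (assuming getconf is available and /usr/bin exists):

```shell
#!/bin/sh
# Upper limit, in bytes, on the arguments plus environment for one exec().
echo "ARG_MAX: $(getconf ARG_MAX)"

# The glob is expanded and looped over inside the shell; the full list is
# never handed to exec() in one piece.
n=0
for f in /usr/bin/*; do n=$((n + 1)); done
echo "looped over $n names in /usr/bin"
```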

b) it breaks on filenames with spaces (and other special characters).

While newlines and other special characters might be rather weird
for filenames, spaces are perfectly okay and normal in filenames.

Using 'for i in `ls *`'-type loops breaks this and is one of the
main reasons why people think spaces are bad in filenames.
(They are not bad, some people just do not know how to handle
them properly.)

c) people commonly use 'ls --color' or 'ls -F' aliases for ls.

This is not a bad thing in the first place because it helps to
simplify the overview you get from ls.

However, it has a bad impact on scripting:
[snip]
% for i in `ls -F /bin/*sh` ; do stat $i ; done
stat: cannot stat `/bin/sh@': No such file or directory
% for i in `ls --color /bin/sh` ; do stat $i ; done
stat: cannot stat `\033[00m\033[01;36m/bin/sh\033[00m': No such file or directory
stat: cannot stat `\033[m': No such file or directory
[snap]

Using '--color=auto' instead of just '--color' helps a little
with the problem, but not everyone is aware of it.

Yes, I know that aliases are normally not enabled in scripts, but
for-loops are very handy as one-liners, so this _is_ indeed a
problem.

These are the main reasons that come to my mind. There might be
others. I am aware that there are HOWTOs and other documents out there
that propagate 'for i in `ls *foobar*`' loops. I don't know why their
authors do this. If they didn't know better they shouldn't have
written a shell scripting HOWTO in the first place.

Some people use things like this instead:
[snip]
ls * | while read file ; do whatever_command $file ; done
[snap]

This is just a little better than the for loop. It still breaks in
some situations. Also 'for i in * ; do foo $i ; done' is much
clearer, shorter and simpler to understand.
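One of those situations, sketched in POSIX sh with a made-up name: plain `read` strips leading blanks and treats backslashes as escapes. `IFS= read -r` fixes both, although a newline inside a name still breaks any line-oriented loop.

```shell
#!/bin/sh
# A made-up file name with a leading blank and a backslash, piped the way
# `ls * | while read file` would feed it.
name=' back\slash'

# Plain read: IFS strips the leading blank, and the backslash is eaten as
# an escape character.
printf '%s\n' "$name" | while read file; do printf '[%s]\n' "$file"; done          # [backslash]

# IFS= keeps the blank, -r keeps the backslash.
printf '%s\n' "$name" | while IFS= read -r file; do printf '[%s]\n' "$file"; done  # [ back\slash]
```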

There is _no_ reason why 'ls' should ever be used to generate file
lists for loops of any kind.
Whoever created this myth should be hung. :-)

Oh, and doing the following will break in certain situations as well
(which is something people do for recursive actions):
[snip]
find . -name '*' | while read file ; do foobar $file ; done
[snap]

If you need to do recursive actions, learn to use find(1) properly
(with its '-exec' option), or switch to a shell that can do it by
itself (like zsh, for example).
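
For the recursive case, here is a sketch of the find(1) approach. It uses '-exec ... {} +' to hand batches of names to a small inline sh script. It assumes a POSIX find and sh. The scratch files are made up, and one of them deliberately has a newline in its name:

```shell
#!/bin/sh
demo_dir=$(mktemp -d)
mkdir "$demo_dir/sub dir"
# The second file name deliberately contains a newline.
touch "$demo_dir/a b.txt" "$demo_dir/sub dir/c
d.txt"

# find passes each name to sh as a separate argument ("{} +" batches them
# so the argument list stays within the system limit); the inner script
# loops over "$@" and here just prints one x per regular file.
processed=$(find "$demo_dir" -type f -exec sh -c '
  for f do
    [ -f "$f" ] && printf x
  done
' sh {} +)

echo "files processed: ${#processed}"
rm -rf "$demo_dir"
```

No name is ever split or re-parsed on the way, so spaces and newlines pass through unharmed.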

Regards, Frank

-- 
In protocol design, perfection has been reached not when there is
nothing left to add, but when there is nothing left to take away.
  -- RFC 1925


-- 
To UNSUBSCRIBE, email to [EMAIL PROTECTED] 
with a subject of unsubscribe. Trouble? Contact [EMAIL PROTECTED]



Re: Unix-ify File Names

2007-04-18 Thread Daniel B.

Frank Terbeck wrote:
 Mike McClain [EMAIL PROTECTED]:
 Frank Terbeck [EMAIL PROTECTED] wrote:
   for FILE in `ls *$1` ; do
 ...
 b) it breaks on filenames with spaces (and other special characters).
 ... Using 'for i in `ls *`'-type loops breaks this and is one of the
 main reasons why people think spaces are bad in filenames.
 (They are not bad, ...

In what sense are they not bad?  Yes, they're certainly legal per the
filesystem and most tools that take filenames.  However, they and other
special characters do make it more difficult to handle arbitrary file
names.

For example, if someone wants to use ls's feature of sorting by date
(e.g., ls -t *$1), they can't (reliably) combine it with the for-loop
construct above.



Hey, is there any command for taking a filename and escaping/encoding
shell-special characters to make a string that, when parsed by the
shell, specifies that filename?  I'm thinking of something that would
work like this:

   for i in `encode_for_shell *` ; ...

(mapping each argument to a shell string for the argument's value)
or

   for i in `find ... -print0 | xargs -0 encode_for_shell` ; ...

or

   cmd=some_command
   cmd=${cmd} `encode_for_shell $file_name_with_special_chars`
   $cmd

(I'm thinking of something like Java's java.util.regex.Pattern.quote(String)
(see
http://java.sun.com/javase/6/docs/api/java/util/regex/Pattern.html#quote(java.lang.String)
) or Ruby's Regexp.escape(...)
(see http://www.ruby-doc.org/core/classes/Regexp.html#M001216 ), but
escaping/encoding for shell parsing instead of for regular-expression
parsing.)



 some people just do not know how to handle them properly.)

You might not be, but it sounds like you're blaming users.  Sometimes
it's developers of tools (including designers of formats) that don't
have an escape mechanism to handle spaces or other special characters
(or don't provide support for encoding special characters) who are to
blame.



 I am aware that there are HOWTOs and other documents out there
 that propagate 'for i in `ls *foobar*`' loops. I don't know why their
 authors do this. If they didn't know better they shouldn't have
 written a shell scripting HOWTO in the first place.


Unfortunately for those they mislead, those authors don't know enough
to know they don't know better.  (They must not be the type to dig
into things (e.g., shell syntax) to really understand them, or at
least enough to notice that they don't fully understand them yet.)



 Some people use things like this instead:
 [snip]
 ls * | while read file ; do whatever_command $file ; done
 [snap]

 This is just a little better than the for loop. It still breaks in
 some situations.


I see how it would break with a newline character in a file name.
What other cases break?


 There is _no_ reason why 'ls' should ever be used to generate file
 lists for loops of any kind.


What about things that ls does that the shell's expansion of wildcards
does not do (e.g., sorting by date or size)?

(Maybe ls should have an equivalent to find's -print0 option.)



Daniel






Re: Unix-ify File Names

2007-04-18 Thread H.S.

Frank Terbeck wrote:
 b) it breaks on filenames with spaces (and other special characters).

 While newlines and other special characters might be rather weird
 for filenames, spaces are perfectly okay and normal in filenames.

 Using 'for i in `ls *`'-type loops breaks this and is one of the
 main reasons why people think spaces are bad in filenames.
 (They are not bad, some people just do not know how to handle
 them properly.)

I usually get around this problem by enclosing the variable in double
quotes within the for loop. A basic example:

$ for f in *.jpg; do ls "$f"; done

-HS








Re: Unix-ify File Names

2007-04-18 Thread Frank Terbeck
Daniel B. [EMAIL PROTECTED]:
 Frank Terbeck wrote:
 Mike McClain [EMAIL PROTECTED]:
 Frank Terbeck [EMAIL PROTECTED] wrote:

  for FILE in `ls *$1` ; do
 ...
 b) it breaks on filenames with spaces (and other special characters).
 ... Using 'for i in `ls *`'-type loops breaks this and is one of the
 main reasons why people think spaces are bad in filenames.
 (They are not bad, ...

 In what sense are they not bad?  Yes, they're certainly legal per the
 filesystem and most tools that take filenames.  However, they and other
 special characters do make it more difficult to handle arbitrary file
 names.

No. They are never bad. It just takes a bit of practice to get used to
doing things in a robust way.

 For example, if someone wants to use ls's feature of sorting by date
 (e.g., ls -t *$1), they cant combine it with the for-loop construct
 above (reliably).

Okay, I admit that sorting is one of the rare cases where

[snip]
find . -printf '%Ts:%p\n' | sort -rn | cut -d: -f2- | while IFS= read -r file ; do
  ...
done
[snap]

or

[snip]
IFS='
'
for i in `find . -printf '%Ts:%p\n' | sort -rn | cut -d: -f2-` ; do
  ...
done
[snap]

loops are justified, at least in a POSIX shell. I really didn't think of
sorting in my original mail. Thanks for pointing that out. (But you
still don't use broken for loops.)

Note that the for loop does _not_ use an external program with
globbing. And it only works with spaces because of the changed $IFS
parameter. This may lead to unexpected results if $IFS is not reset to
its old value inside the loop.

However, Bash, ksh and zsh users may still overcome this:

[snip]
oifs=$IFS
IFS='
'
set -- x $(find . -printf '%Ts:%p\n' | sort -rn | cut -d: -f2-)
IFS=$oifs
shift
while [ -n "$1" ] ; do
  echo "file: $1"
  shift
done
[snap]

This should also work in a pure POSIX shell like dash. POSIX does not
limit the number of positional parameters (beyond $9 they just have to
be written as ${10}, ${11} and so on), and the loop above only ever
reads $1. A while loop fed by find(1), as noted above, works as well.

Of course, this breaks with newline characters in filenames, but
newlines are really uncommon (probably only left on a system by users
who don't want their files to be deleted. :-)).

And in zsh, you would actually do:
[snip]
for i in **/*(om) ; do foobar $i ; done
[snap]

Yes, zsh does recursive globbing and lets you define the sorting of
the generated file list.

It's really a pity that find(1) cannot sort by itself (even if only
by a handful of criteria).

But we are slowly drifting off topic here. I just wanted to make sure
that beginners are not confronted with problematic for-loop constructs
like in the first mail I was replying to. Manipulating $IFS is
probably not something to confront beginners with either.

 Hey, is there any command for taking a filename and escaping/encoding
 shell-special characters to make a string that, when parsed by the
 shell, specifies that filename?  I'm thinking of something that would
 work like this:

for i in `encode_for_shell *` ; ...
[...]

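No, that is not how shells work. The output of `...` is split on $IFS
and globbed, but quote characters in it are not interpreted again, so
an encoded name would not be turned back into the original name.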
Just to repeat this once and for all:
_Never_ do 'for i in `ls *`'. Never. It's broken.
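
If such an encoding is wanted anyway, it only helps together with eval, which does parse its argument again. Here is a sketch in POSIX sh. Note that shell_quote is a hypothetical helper written for this example, not an existing command. (bash users may also look at the %q format of bash's printf builtin.)

```shell
#!/bin/sh
# shell_quote (hypothetical helper): wrap the argument in single quotes
# and turn each embedded ' into '\'' so that sh, when it parses the
# result again, gets back exactly the original string.
shell_quote() {
  rest=$1
  out=
  while : ; do
    case $rest in
      *\'*)
        out=$out${rest%%\'*}"'\\''"
        rest=${rest#*\'}
        ;;
      *)
        out=$out$rest
        break
        ;;
    esac
  done
  printf "'%s'" "$out"
}

# Build a command string and let eval re-parse it; plain `...` or $(...)
# output would never have its quotes interpreted.
cmd="printf '[%s]\n'"
for name in 'New Files.TXT' "Bob's file" ; do
  cmd="$cmd $(shell_quote "$name")"
done
eval "$cmd"
```

Usually it is simpler to avoid building command strings at all and keep the names in "$@" instead.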

  some people just do not know how to handle them properly.)

 You might not be, but it sounds like you're blaming users.  Sometimes
 it's developers of tools (including designers of formats) that don't
 have an escape mechanism to handle spaces or other special characters
 (or don't provide support for encoding special characters) who are to
 blame.

Well, the shell is really, really old. It has its flaws. That is why it
is not that easy to use and understand for beginners, especially when
they are so often taught to do things the wrong way. I admit that it
can be quite difficult to do things right[tm]. I make mistakes when
scripting in 'sh' all the time (at least if the script is a little
more than trivial).

[...]
 Some people use things like this instead:
 [snip]
 ls * | while read file ; do whatever_command $file ; done
 [snap]
 This is just a little better than the for loop. It still breaks in
 some situations. 

 I see how it would break with a newline character in a file name.
 What other cases break?

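Broken aliases.
Too long argument lists. Yes, 'ls | while ...' does not have the
argument problem, but as soon as you start globbing ('ls * | while
...'), it's there: every matching name is passed to ls as an argument,
and the whole list can exceed the system's limit.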

 There is _no_ reason why 'ls' should ever be used to generate file
 lists for loops of any kind.

 What about things that ls does that the shell's expansion of wildcards
 does not do (e.g., sorting by date or size)?

 (Maybe ls should have an equivalent to find's -print0 option.)

In these cases, you use find(1) (in conjunction with other standard
tools, like sort, cut etc.).
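
When only the most recent file is needed, this sketch avoids parsing `ls -t` altogether by using test's -nt operator. It assumes a shell whose test supports -nt (bash, dash, ksh and zsh do). The file names and dates are made up:

```shell
#!/bin/sh
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
# Two files with fixed modification times (touch -t CCYYMMDDhhmm).
touch -t 200704160000 'old file.txt'
touch -t 200704180000 'new file.txt'

newest=
for f in ./* ; do
  [ -f "$f" ] || continue              # skip non-files and an unmatched glob
  if [ -z "$newest" ] || [ "$f" -nt "$newest" ] ; then
    newest=$f
  fi
done
echo "newest: $newest"

cd / && rm -rf "$demo_dir"
```

A full sort by date still needs find(1) with sort and cut, as described above.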


Please note that what I am writing here is not a list of must-dos, of
course. I do not intend to attack anybody. I mean, there are people who
know POSIX shell scripting far better than I do, so who am I to judge
others? But 'for i in `ls *`' is really annoyingly wrong, even in my
eyes. :-)

So, sometimes, when you are writing one-liners, at the 

Re: Unix-ify File Names

2007-04-18 Thread Frank Terbeck
H.S. [EMAIL PROTECTED]:
 Frank Terbeck wrote:

 b) it breaks on filenames with spaces (and other special characters).
 While newlines and other special characters might be rather weird
 for filenames, spaces are perfectly okay and normal in filenames.
 Using 'for i in `ls *`'-type loops breaks this and is one of the
 main reasons why people think spaces are bad in filenames.
 (They are not bad, some people just do not know how to handle
 them properly.)

 I usually get by this problem by enclosing the variable in double quotes 
 within the for loop. A basic example:

 $ for f in *.jpg; do ls "$f"; done

Yeah, you are using the for-loop construct absolutely right.
I was arguing about

  for i in `ls *.jpg` ; do whatever $i ; done

And you are, of course, right, that in POSIX shells, parameters should
be double-quoted when used in almost every case, unless you know that
you want splitting by $IFS.
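
Here is a small sketch of what the double quotes change. It uses a throwaway helper, count_args (hypothetical, defined right there), which reports how many arguments it received:

```shell
#!/bin/sh
# count_args (throwaway helper): print how many arguments it got.
count_args() {
  echo "$#"
}

name='New Files.TXT'
empty=

n_unquoted=$(count_args $name)     # split on $IFS: 2 arguments
n_quoted=$(count_args "$name")     # kept in one piece: 1 argument
e_unquoted=$(count_args $empty)    # disappears entirely: 0 arguments
e_quoted=$(count_args "$empty")    # still one (empty) argument: 1

echo "$n_unquoted $n_quoted $e_unquoted $e_quoted"
```

The empty case is also why '[ -n $1 ]' with an empty $1 collapses to '[ -n ]', which is true. Unquoted expansions are globbed as well, which this example avoids by using a name without wildcard characters.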

Regards, Frank




Re: Unix-ify File Names

2007-04-17 Thread Frank Terbeck
Jeff D [EMAIL PROTECTED]:
  On Tue, 17 Apr 2007, Masatran, R. Deepak wrote:
  Since I frequently receive files from Microsoft Windows users, is there any
  utility to unix-ify file names, that is, use lower case exclusively, use
  hyphen as separator, etc.?
[...]
  #!/bin/sh
  #change spaces to hyphens
  rename 's/\ /-/g' *$1
 
  #uppercase to lower
  for FILE in `ls *$1` ; do

Please don't teach beginners to do for loops like this. It's broken in
various ways. Just do:

  for FILE in *$1 ; do

  filename=`basename "$FILE"`
  newfile=`echo "$filename" | tr A-Z a-z`
  if [ "$filename" != "$newfile" ] ; then
  mv "$filename" "$newfile"

Maybe adding '-i' to the mv call would be a good idea to avoid
accidentally overwriting existing files.

  fi
  done
[...]

Regards, Frank




Re: Unix-ify File Names

2007-04-17 Thread Octavio Alvarez
On Mon, 16 Apr 2007 20:24:57 -0700, Masatran, R. Deepak
[EMAIL PROTECTED] wrote:
 Since I frequently receive files from Microsoft Windows users, is there any
 utility to unix-ify file names, that is, use lower case exclusively, use
 hyphen as separator, etc.?

I use something like this:

#!/bin/bash

OLD_FILE_NAME=$1

# Replace spaces with hyphens, then map upper- to lower-case letters.
# You might want to add a y/ rule to remove accents.
NEW_FILE_NAME=`printf '%s\n' "$OLD_FILE_NAME" | sed -e '
s/ /-/g
y/ABCDEFGHIJKLMNOPQRSTUVWXYZ/abcdefghijklmnopqrstuvwxyz/
'`

mv "$OLD_FILE_NAME" "$NEW_FILE_NAME"




--
Octavio.



Re: Unix-ify File Names

2007-04-17 Thread Leonid Grinberg

Or, in Perl (might as well):

#!/usr/bin/perl -w

use strict;

opendir(DIR, system('pwd'));
my @files = readdir(DIR);
closedir(DIR);

my $new_name;

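foreach (@files)
{
 $new_name = lc($_);
 $new_name =~ s/\ /\-/g;
 # Skip names that need no change (this also skips . and ..).
 next if $new_name eq $_;
 # List form: runs mv directly, without a shell, so names with spaces stay intact.
 system('mv', '-i', $_, $new_name);
}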






Re: Unix-ify File Names

2007-04-17 Thread Leonid Grinberg

opendir(DIR, system('pwd'));


Sorry, that should be:

opendir(DIR, `pwd`);

Backticks return the command's output; system() returns its exit status.






Re: Unix-ify File Names

2007-04-17 Thread Ken Irving
On Tue, Apr 17, 2007 at 09:23:19AM -0700, Leonid Grinberg wrote:
 opendir(DIR, system('pwd'));
 
 Sorry, that should be:
 
 opendir(DIR, `pwd`);
 
 ` returns output. system() does not.

Or just use '.' as the directory name.

-- 
Ken Irving





Re: Unix-ify File Names

2007-04-17 Thread Mike McClain
Frank Terbeck [EMAIL PROTECTED] wrote:

   for FILE in `ls *$1` ; do
 
 Please don't teach beginners to do for loops like this. It's broken in
 various ways. Just do:
 
   for FILE in *$1 ; do


Being a self taught script writer I just have to ask what are the
'various ways' in which the first form is broken?

Anticipating enlightenment,
Mike





Re: Unix-ify File Names

2007-04-17 Thread Roberto C . Sánchez
On Tue, Apr 17, 2007 at 03:36:26PM -0700, Mike McClain wrote:
 Frank Terbeck [EMAIL PROTECTED] wrote:
 
for FILE in `ls *$1` ; do
  
  Please don't teach beginners to do for loops like this. It's broken in
  various ways. Just do:
  
for FILE in *$1 ; do
 
 
 Being a self taught script writer I just have to ask what are the
 'various ways' in which the first form is broken?
 
The biggest one I can see is that it spawns an entire process when none
is needed.

Regards,

-Roberto

-- 
Roberto C. Sánchez
http://people.connexer.com/~roberto
http://www.connexer.com




Re: Unix-ify File Names

2007-04-17 Thread Jeff D

On Tue, 17 Apr 2007, Roberto C. Sánchez wrote:
 On Tue, Apr 17, 2007 at 03:36:26PM -0700, Mike McClain wrote:
  Frank Terbeck [EMAIL PROTECTED] wrote:
     for FILE in `ls *$1` ; do

   Please don't teach beginners to do for loops like this. It's broken in
   various ways. Just do:

     for FILE in *$1 ; do

  Being a self taught script writer I just have to ask what are the
  'various ways' in which the first form is broken?

 The biggest one I can see is that it spawns an entire process when none
 is needed.

 Regards,

 -Roberto

 --
 Roberto C. Sánchez
 http://people.connexer.com/~roberto
 http://www.connexer.com

Duly noted, makes perfect sense.  Bad habits are hard ones to break.

-+-
8 out of 10 Owners who Expressed a Preference said Their Cats Preferred Techno.

Unix-ify File Names

2007-04-16 Thread Masatran, R. Deepak
Since I frequently receive files from Microsoft Windows users, is there any
utility to unix-ify file names, that is, use lower case exclusively, use
hyphen as separator, etc.?

-- 
Masatran, R. Deepak http://research.iiit.ac.in/~masatran/




Re: Unix-ify File Names

2007-04-16 Thread Jeff D

On Tue, 17 Apr 2007, Masatran, R. Deepak wrote:
 Since I frequently receive files from Microsoft Windows users, is there any
 utility to unix-ify file names, that is, use lower case exclusively, use
 hyphen as separator, etc.?

 --
 Masatran, R. Deepak http://research.iiit.ac.in/~masatran/

Not directly, but it's easy enough to do:

#!/bin/sh
#change spaces to hyphens
rename 's/\ /-/g' *$1

#uppercase to lower
for FILE in `ls *$1` ; do
filename=`basename "$FILE"`
newfile=`echo "$filename" | tr A-Z a-z`
if [ "$filename" != "$newfile" ] ; then
mv "$filename" "$newfile"
fi
done


---
$ ls
New Files.TXT  SOME New Files.TXT
$ sh ~/unixfy.sh TXT
$ ls
new-files.txt  some-new-files.txt
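
For comparison, here is a sketch of the same lower-casing and hyphenating done with a glob instead of ls, in POSIX sh. The function name unixify is made up. Following the earlier point about accidental overwrites, it skips a rename when the target already exists instead of prompting like mv -i:

```shell
#!/bin/sh
# unixify (hypothetical name): for each plain file name in the current
# directory given as an argument, lower-case it and turn spaces into
# hyphens (tr -s also squeezes runs of hyphens into one).
unixify() {
  for f do
    new=$(printf '%s\n' "$f" | tr '[:upper:]' '[:lower:]' | tr -s ' ' '-')
    [ "$new" = "$f" ] && continue     # nothing to change
    # Refuse instead of prompting; this check-then-move is not race-free.
    if [ -e "$new" ] ; then
      echo "unixify: not renaming '$f': '$new' already exists" >&2
      continue
    fi
    mv -- "$f" "$new"
  done
}

# Try it out in a scratch directory (remove "$demo_dir" when done).
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
touch 'New Files.TXT' 'SOME New Files.TXT' 'ok.txt' 'Clash.txt' 'clash.txt'
unixify *
printf '%s\n' *
```

tr works character by character, so whether accented letters are lower-cased depends on the tr implementation and locale.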

