Re: unix or mac-style text files?

2002-12-03 Thread Ken Williams

On Monday, November 25, 2002, at 07:23  AM, Chris Nandor wrote:


In article [EMAIL PROTECTED],
 [EMAIL PROTECTED] (Dan Kogai) wrote:


On Monday, Nov 25, 2002, at 01:05 Asia/Tokyo, Chris Nandor wrote:

The bottom line was that it'd be nice to have a PerlIO filter for perl
5.8.x, so that MacPerl can execute Unix and Windows text files, and
Mac OS X
perl can execute Mac OS text files, etc.  Patches are surely welcome!
:-)


One good question may be how to handle newlines in heretext, the only
part that really matters because that's the only exception to the fact
that newlines are nothing but whitespace from perl compiler's point of
view -- oops, shebang is another.  When you feed MacPerl *.pl to MacOS
X, should linefeeds in heretext emit \015 or \012?


I am talking here about taking (for example) a perl program with Mac OS
newlines, and making it run under Unix perl.  In order for that to 
happen,
you need to translate all the CRs to LFs.  That would include the CRs 
in the
heretext, as well as in every literal string.

[revisiting an old thread]

I don't think it's really a good idea to translate newlines in string 
literals (let's lump heretext in with string literals, since that's how 
they function).  That stuff is part of the data of a program, not part 
of the instruction set.

So by doing one mass CR-LF conversion blindly, you'd get the program to 
run, but it would run differently given the exact same data input.  I 
don't think that's desirable.  It's quite useful to have \n and 
File::Spec->catfile() and so on mean different things on different 
platforms, but literal characters changing themselves seems like quite 
another matter.
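
To make that concrete, a tiny sketch (the MacPerl values here are the ones 
perlport documents; I haven't re-checked them):

#!/usr/bin/perl
# What "\n" and a native path look like depends on the platform perl
# was built for.
use File::Spec;

printf "ord(\"\\n\") is %d on %s\n", ord("\n"), $^O;  # 10 on Unix perl, 13 under MacPerl
print File::Spec->catfile('dir', 'file'), "\n";       # dir/file on Unix, :dir:file on Mac OS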

 -Ken



Re: unix or mac-style text files?

2002-12-03 Thread Chris Nandor
In article [EMAIL PROTECTED],
 [EMAIL PROTECTED] (Ken Williams) wrote:

 I don't think it's really a good idea to translate newlines in string 
 literals (let's lump heretext in with string literals, since that's how 
 they function).  That stuff is part of the data of a program, not part 
 of the instruction set.

 So by doing one mass CR-LF conversion blindly, you'd get the program to 
 run, but it would run differently given the exact same data input.  I 
 don't think that's desirable.

I disagree.  We've been doing this for years on Mac OS without problem.  
Whenever I unpack a tarball or fetch a file via FTP or HTTP, my programs are 
doing mass/blind newline conversions on text files.  It's long been accepted 
as the Right Thing, and it only rarely causes problems.

And on the contrary, it would cause major problems to do it the other way, 
not only in terms of effort (Yes, you downloaded the file via FTP as text, 
and it converted the newlines from Unix to Mac, but you need to go back and 
convert the newlines in string literals back into Unix newlines), but also 
in the simple fact that it would rarely be what we want.  When you do a here 
doc, 99.99% of the time you want native newlines in there.

The basic tenet is that if you embed an actual newline anywhere at all in 
your code, it is a logical newline, no matter where it is or what it is 
doing, and it should be converted to the native format of whatever the 
target platform is.  If you want a literal \012, then you should encode it 
as \012 or \x0A or \cJ.
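
A tiny sketch of that convention, for the record (nothing assumed here beyond 
what the escapes already mean):

#!/usr/bin/perl
# An embedded line break is the *logical* (native) newline; an explicit
# escape pins down the exact byte you want, on every platform.
my $native = <<"EOT";    # the line breaks in here are whatever "\n" is locally
first line
second line
EOT

my $lf_only = "first line\012second line\012";   # always ASCII 10
my $cr_only = "first line\015second line\015";   # always ASCII 13

printf "the native newline is byte %d here\n", ord("\n");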

-- 
Chris Nandor  [EMAIL PROTECTED]http://pudge.net/
Open Source Development Network[EMAIL PROTECTED] http://osdn.com/



Re: unix or mac-style text files?

2002-11-27 Thread Tels

Moin,

such as sharing code between Windows and Unix perl over NFS

Uh, I've been doing this for years and ActiveState Perl doesn't seem to have
any problem with my Unix files. Where exactly is the problem? *puzzled*

(Granted, I never used Mac OS, but I got a report that my stuff works on
Mac OS, and somehow I doubt the user in question converted all the
source code beforehand... hm, must ask him. Good point.)

Cheers,

Tels


-- 
 perl -MDev::Bollocks -le'print Dev::Bollocks->rand()'
 challengingly facilitate synergistic models

 http://bloodgate.com/perl   My current Perl projects
 PGP key available on http://bloodgate.com/tels.asc or via email




Re: unix or mac-style text files?

2002-11-27 Thread Tels

Moin,

On 27-Nov-02 Chris Nandor carved into stone:
 At 00:03 +0100 2002.11.27, Tels wrote:
such as sharing code between Windows and Unix perl over NFS

Uh, I've been doing this for years and ActiveState Perl doesn't seem to have
any problem with my Unix files. Where exactly is the problem? *puzzled*
 
 As noted in previous posts, there is an (unfinished) feature in older
 versions of perl that allows Unix perl to execute CRLF files, and Windows
 perl to execute LF files.  But it does not allow MacPerl to execute CRLF
 or LF files, and does not allow Unix or Windows perl to execute CR files.

Ah, thanx for the explanation.

(Granted, I never used Mac OS, but I got a report that my stuff works on
Mac OS, and somehow I doubt the user in question converted all the
source code beforehand... hm, must ask him. Good point.)
 
 Almost all tools for Mac OS unpack source with the proper newlines,
 either automatically or on request.  If the user used Stuffit or
 somesuch, they can set an option to translate newlines.  If they used
 MacPerl's module installation tools, then it asks you if you want to
 convert newlines.

Okay, but so sharing the stuff over a drive (is this easily possible with
Mac?) would be one problem case?

Cheers,

Tels

-- 
 perl -MDev::Bollocks -le'print Dev::Bollocks->rand()'
 enthusiastically compete proactive appliances

 http://bloodgate.com/perl   My current Perl projects
 PGP key available on http://bloodgate.com/tels.asc or via email




Re: unix or mac-style text files?

2002-11-25 Thread Michael Maibaum


On Sunday, November 24, 2002, at 05:21 PM, Ken Williams wrote:



On Monday, November 25, 2002, at 07:34  AM, Heather Madrone wrote:


Administrivia question:  I'm getting a lot of duplicate responses
because the Reply-to on the list is set to sender.  On moderated
lists, this can be a good idea because the approval cycle causes
a lag between posting and mail reflection.

Is the Reply-to merely a hint that we should consider taking topics
offline, or is there some reason I should be leaving redundant 
addresses
in the headers?

The extra copies are more for your convenience - I appreciate when 
people send them to me, because one copy goes to my list mailbox and 
the other goes to my inbox.  The one in my inbox will be read faster.

I wish there were a standard way to indicate in your own mail headers 
I do/don't wish to receive a direct copy of replies to this message. 
 This can be done on usenet pretty effectively, but not really in 
email lists.



There is the Mail-Followup-To header; unfortunately, AFAIK mutt [1] is 
the only client to respect it, or provide methods to set it according 
to your preference.


FWIW, I object to Reply-To munging [2]

Michael

[1] http://www.mutt.org/doc/manual/manual-6.html#followup_to
[2] http://www.unicom.com/pw/reply-to-harmful.html



Re: unix or mac-style text files?

2002-11-25 Thread Charles Albrecht
On Monday, November 25, 2002, at 07:34  AM, Heather Madrone wrote:

Administrivia question:  I'm getting a lot of duplicate responses
because the Reply-to on the list is set to sender.  On moderated
lists, this can be a good idea because the approval cycle causes
a lag between posting and mail reflection.

Is the Reply-to merely a hint that we should consider taking topics
offline, or is there some reason I should be leaving redundant addresses
in the headers?

More to the point, this list doesn't set Reply-To at all. There's a great 
deal of discussion at large about whether this is a good idea or not, but 
by-and-large, the From, To and Cc that come through are the same ones the 
Sender originally used.

At 12:21 PM +1100 11/25/2002, Ken Williams replied:

The extra copies are more for your convenience - I appreciate when people send them to 
me, because one copy goes to my list mailbox and the other goes to my inbox.  The one 
in my inbox will be read faster.

I wish there were a standard way to indicate in your own mail headers I do/don't wish 
to receive a direct copy of replies to this message.  This can be done on usenet 
pretty effectively, but not really in email lists.

Well, on lists like this one that don't munge the Reply-To header, if you 
designate a Reply-To on the outgoing mail, it should remain intact all 
the way to the end recipients.

-Charles
 Euonymic Solutions
 [EMAIL PROTECTED]



Re: unix or mac-style text files?

2002-11-25 Thread Ronald J Kimball
On Mon, Nov 25, 2002 at 02:43:46AM +0900, Dan Kogai wrote:
 On Monday, Nov 25, 2002, at 01:05 Asia/Tokyo, Chris Nandor wrote:
 The bottom line was that it'd be nice to have a PerlIO filter for perl
 5.8.x, so that MacPerl can execute Unix and Windows text files, and 
 Mac OS X
 perl can execute Mac OS text files, etc.  Patches are surely welcome!  
 :-)
 
 One good question may be how to handle newlines in heretext, the only 
 part that really matters because that's the only exception to the fact 
 that newlines are nothing but whitespace from perl compiler's point of 
 view -- oops, shebang is another.

Newlines also serve as comment terminators; the perl compiler must
recognize either \012 or \015 as the end of a comment.

(I recall struggling to demonstrate some code during a Perl Mongers
meeting.  The script ran, but it didn't produce any output, nor did it
produce any error messages!  I finally figured out that perl on the OS X
machine was seeing the whole script as one long comment, starting with
#!/usr/bin/perl ...  :)
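
If anyone wants to reproduce that on a Unix perl, here's a throwaway sketch 
(the file name cr_test.pl is made up):

#!/usr/bin/perl
# Write a tiny script with CR-only line endings, then run it.
open my $fh, '>', 'cr_test.pl' or die $!;
print $fh "#!/usr/bin/perl\015print \"hello\\n\";\015";
close $fh;

system($^X, 'cr_test.pl');   # prints nothing: the CR is not a line ending,
                             # so everything after the # is still the comment
unlink 'cr_test.pl';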

Ronald



Re: unix or mac-style text files?

2002-11-25 Thread Slaven Rezic
Chris Nandor [EMAIL PROTECTED] writes:

 In article [EMAIL PROTECTED],
  [EMAIL PROTECTED] (Dan Kogai) wrote:
 
  On Monday, Nov 25, 2002, at 01:05 Asia/Tokyo, Chris Nandor wrote:
   The bottom line was that it'd be nice to have a PerlIO filter for perl
   5.8.x, so that MacPerl can execute Unix and Windows text files, and 
   Mac OS X
   perl can execute Mac OS text files, etc.  Patches are surely welcome!  
   :-)
  
  One good question may be how to handle newlines in heretext, the only 
  part that really matters because that's the only exception to the fact 
  that newlines are nothing but whitespace from perl compiler's point of 
  view -- oops, shebang is another.  When you feed MacPerl *.pl to MacOS 
  X, should linefeeds in heretext emit \015 or \012?
 
 I am talking here about taking (for example) a perl program with Mac OS 
 newlines, and making it run under Unix perl.  In order for that to happen, 
 you need to translate all the CRs to LFs.  That would include the CRs in the 
 heretext, as well as in every literal string.
 
 
 
  I am not sure which is lazier to simply apply
  
  # Any -> Unix
  perl -i.bak -ple 's/\015\012|\015|\012/\012/g' *.pl
  # Any -> Mac
  perl -i.bak -ple 's/\015\012|\015|\012/\015/g' *.pl
  
  or teach camel the same trick
 
 One of the main points of this is that some people will want the same files 
 to be used in more than one context, such as sharing code between Windows 
 and Unix perl over NFS, or sharing code between perl on Mac OS X and MacPerl 
 under Mac OS or Classic.  Right now, the only solution is to make copies, as 
 you suggest.
 

Or use source filters:

package Filter::Any2Unix;
use Filter::Util::Call;
sub import {
    if ($^O ne 'MacOS') {
        filter_add(sub {
            my($status) = filter_read();
            if ($status > 0) {
                s/\015\012|\015|\012/\012/g;
            }
            $status;    # a source filter must return the read status
        });
    }
}

and then call the script with 

   perl -MFilter::Any2Unix script.pl

or embed "use Filter::Any2Unix" into the script.

Regards,
Slaven

-- 
Slaven Rezic - [EMAIL PROTECTED]

Tk-AppMaster: a perl/Tk module launcher designed for handhelds
http://tk-appmaster.sf.net



Re: unix or mac-style text files?

2002-11-25 Thread Chris Nandor
In article [EMAIL PROTECTED],
 [EMAIL PROTECTED] (Slaven Rezic) wrote:

Could this be made even more generic, by translating to \n instead of \012?

 Or use source filters:
 
 package Filter::Any2Unix;

Any2Native?

 use Filter::Util::Call;
 sub import {
 if ($^O ne 'MacOS') {

#?

   filter_add(sub {
  my($status) = filter_read();
  if ($status > 0) {
  s/\015\012|\015|\012/\012/g;

/\n/g ?

  }
  }
 );
 }

#?

 }
 
 and then call the script with 
 
perl -MFilter::Any2Unix script.pl
 
 or embed "use Filter::Any2Unix" into the script.

That shouldn't work.  By the time you get to it in the script, if you have a 
#! line, then the entire script is one long comment, and the use() line 
won't ever be executed.
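
Putting those tweaks together, a rough sketch of what a Filter::Any2Native 
could look like (untested; the name is just the suggestion above, and per 
the above it would have to be loaded with -M rather than with use() inside 
the script):

package Filter::Any2Native;
use strict;
use Filter::Util::Call;

sub import {
    filter_add(sub {
        my $status = filter_read();
        if ($status > 0) {
            # map CRLF, CR or LF to whatever "\n" is on this platform,
            # so no $^O check is needed
            s/\015\012|\015|\012/\n/g;
        }
        $status;    # a source filter must return the read status
    });
}

1;

# hypothetical usage:  perl -MFilter::Any2Native script.pl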

-- 
Chris Nandor  [EMAIL PROTECTED]http://pudge.net/
Open Source Development Network[EMAIL PROTECTED] http://osdn.com/



Re: unix or mac-style text files?

2002-11-25 Thread Rafael Garcia-Suarez
Chris Nandor [EMAIL PROTECTED] wrote:
 
 That shouldn't work.  By the time you get to it in the script, if you have a 
 #! line, then the entire script is one long comment, and the use() line 
 won't ever be executed.

That would be an argument for allowing -M/-m on the #! line.



Re: unix or mac-style text files?

2002-11-25 Thread Chris Nandor
In article [EMAIL PROTECTED],
 [EMAIL PROTECTED] (Heather Madrone) wrote:

 At 11:05 AM 11/24/2002 -0500, Chris Nandor wrote:
 But back to the point: there's been some discussion in this thread on 
 workarounds, but my personal feeling is that this is a bug, or at best a 
 broken feature, in perl.  Some time ago, the capability was added to perl to 
 recognize and filter CRLF files to work on Unix and LF to work on Windows 
 (grep for PERL_STRICT_CR in toke.c).  However, this functionality was not 
 extended to CR files, as it should have been, IMO.  
 
 I think you're right.  It's easier to move back and forth from
 Windows to Solaris than it is to move from one side of the Mac
 house to the other.  This is undoubtedly broken, not just in
 perl, but on the Macintosh in general.

Well, I'd say it is only broken in perl because there is some support for 
it, but it is limited only to certain platforms.  Otherwise I'd call it a 
woefully missing feature.

I don't think it is, in the general case, broken on the Mac, however.  They 
can't just abandon CR, and they shouldn't have stuck with CR instead of 
moving to LF.  And CR itself wasn't broken to begin with.  They really 
didn't have many options; that is to say, the brokenness we encounter 
because of the CR/LF differences is not indicative of a brokenness in the 
OS, but just an unfortunate confluence of events.


 Personally, I think that Apple would be wise to move to the Unix
 standard for text files.  It would take several releases of
 confusion to do it, but that would be better than carrying 
 forward this schizophrenia to future OS generations.

It has moved to the Unix standard.  Many apps, however, have not entirely 
made the adjustment.


 While they're at it, they might drop file resource forks.

Again, they essentially have.  They are still supported because, as with the 
CR issue, they cannot just abandon them.  But most apps do not have them; 
instead, the resource data is in separate files inside the packages.  I 
don't imagine support for resource forks will be dropped any time soon, but 
resource forks aren't really used by new apps.

  [pudge@bourque]$ perl -MFile::Find -MMac::Files -e 'find(sub{my $f = 
  $File::Find::name; return if ! -f || -l; my $catf = FSpGetCatInfo($_); 
  printf "%s : %d\n", $f, $catf->ioFlRLgLen if $catf->ioFlRLgLen}, shift)' 
  /Applications/

The above one-liner prints out the size of the resource fork in every file 
under /Applications/ (ioFlRLgLen is the logical length of the resource fork, 
while ioFlLgLen is the logical length of the data fork; the -s file test 
operator and other file utilities, in perl and in Unix, only display the 
data fork size, so it should always be the case that -s $f == 
$catf->ioFlLgLen).

Out of all my apps in there, I got hits in maybe a dozen or so, and the only 
*Apple* apps were iMovie and DVD Player.  It's fairly clear that resource 
forks are being used less, and I imagine Apple is discouraging their use, 
since they are no longer needed.


 If Apple doesn't want to give up its own peculiar file formats,
 then they ought to fix their Unix so it handles Macintosh files
 sensibly.

Apple assumes -- for right or wrong -- that people who use the Unix side of 
things will be able to figure out how to deal with the resource forks, the 
newlines, etc. (with tools such as CpMac, ditto)  Let's face it: the Unix 
user side of things is relatively minor in priority to most other things in 
the OS.  And really, it should be that way: it is used relatively little and 
its users are smart enough to figure out workarounds.  Life sucks sometimes.  
;-)

-- 
Chris Nandor  [EMAIL PROTECTED]http://pudge.net/
Open Source Development Network[EMAIL PROTECTED] http://osdn.com/



Re: unix or mac-style text files?

2002-11-25 Thread Nicholas Clark
On Mon, Nov 25, 2002 at 02:33:45PM +0100, Rafael Garcia-Suarez wrote:
 Chris Nandor [EMAIL PROTECTED] wrote:
  
  That shouldn't work.  By the time you get to it in the script, if you have a 
  #! line, then the entire script is one long comment, and the use() line 
  won't ever be executed.
 
 That would be an argument for allowing -M/-m on the #! line.

Er, except that the #! line would all have been read by then, and treated
as a comment. Or have I got things confused?

...

 (that's 3 dots, perl.org smtp daemon)

the kernel parses the -M line and invokes perl with those -M options.
then perl runs and reaches the -M line again, and now we just need it not
to complain like it currently does.

I hoped it would be possible to hack round it in some way, relying on
\r being whitespace, so that

#!/usr/local/bin/perl -w
-MFilter if 0;

would behave as a no-op on a system with matching \n, and as 

#!/usr/local/bin/perl -w -MFilter if 0;

on a system where \n and \r are transposed, but I can't make it work.

Nicholas Clark

PS I need to dig it out of the archives for a second time, but nothing came
   of my #! line \r\n protector that works on everything it was tested on
   (Linux, FreeBSD, Solaris - so hopefully all SysV, BSD* and Linux)



Re: unix or mac-style text files?

2002-11-25 Thread Sébastien Aperghis-Tramoni
On Monday, November 25, 2002, at 02:09 PM, Chris Nandor wrote:


In article [EMAIL PROTECTED],
 [EMAIL PROTECTED] (Slaven Rezic) wrote:

Could this be made even more generic, but translating to \n instead of 
\012?

Or use source filters:

package Filter::Any2Unix;


Any2Native?


When I had LinuxPPC on my Mac (when Mac OS X was still Rhapsody),
I had written this little module to run my Perl scripts with Mac 
end-of-lines
under Linux. Not pretty but still functional.

package Eol;
use Carp;

=head1 NAME

Eol - Perl module to execute scripts with foreign end-of-line character

=head1 SYNOPSIS

perl -MEol foo.pl

=head1 DESCRIPTION

This module allows one to execute Perl programs with foreign end-of-line
characters. It's primarily intended to be used on Unix systems as it
must be passed as an argument to C<perl>, but it may also work on Win32
systems.

=cut

if($0 ne '-e') {
    my $script;

    READ: {
        local $/;
        open(FILE, $0) or croak "can't read file '$0': $!";
        undef $/;
        $script = <FILE>;  ## read whole file
        close(FILE);
    }

    $script =~ s@(\015\012|\015|\012)@$/@g;

    eval $script;
    print $@ if $@;

} else {
    carp "warning: module Eol can't be used on -e scripts";
}

1;
__END__


Sébastien Aperghis-Tramoni
 -- - --- -- - -- - --- -- - --- -- - --[ http://maddingue.org ]



Re: unix or mac-style text files?

2002-11-25 Thread William H. Magill
On Monday, November 25, 2002, at 08:50 AM, Chris Nandor wrote:

While they're at it, they might drop file resource forks.


Again, they essentially have.  They are still supported because, as 
with the
CR issue, they cannot just abandon them.  But most apps do not have 
them;
instead, the resource data is in separate files inside the packages.  I
don't imagine support for resource forks will be dropped any time 
soon, but
resource forks aren't really used by new apps.

  ...


Out of all my apps in there, I got hits in maybe a dozen or so, and 
the only
*Apple* apps were iMovie and DVD Player.  It's fairly clear that 
resource
forks are being used less, and I imagine Apple is discouraging their 
use,
since they are no longer needed.

I believe that you will find Chris' explanation to be correct -- OS X 
does
NOT use resource forks. It is only OS 9 compatibility which maintains 
their existence.

T.T.F.N.
William H. Magill
# Beige G3 - Rev A motherboard
# Flat-panel iMac (2.1) 800MHz - Super Drive - 768 Meg
[EMAIL PROTECTED]
[EMAIL PROTECTED]
[EMAIL PROTECTED]



Re: unix or mac-style text files?

2002-11-25 Thread Rob Barris


On Monday, November 25, 2002, at 03:27 PM, William H. Magill wrote:


On Monday, November 25, 2002, at 08:50 AM, Chris Nandor wrote:

While they're at it, they might drop file resource forks.


Again, they essentially have.  They are still supported because, as 
with the
CR issue, they cannot just abandon them.  But most apps do not have 
them;
instead, the resource data is in separate files inside the packages.  
I
don't imagine support for resource forks will be dropped any time 
soon, but
resource forks aren't really used by new apps.

  ...


Out of all my apps in there, I got hits in maybe a dozen or so, and 
the only
*Apple* apps were iMovie and DVD Player.  It's fairly clear that 
resource
forks are being used less, and I imagine Apple is discouraging their 
use,
since they are no longer needed.

I believe that you will find Chris' explanation to be correct -- OS X 
does
NOT use resource forks. It is only OS 9 compatibility which maintains 
their existence.


OS X uses a resource fork every time it launches a CFM application so 
it can find the 'cfrg' resource in the app :)

But Chris' point makes sense.

Rob



Re: unix or mac-style text files?

2002-11-25 Thread Peter N Lewis
At 13:51 + 25/11/02, Nicholas Clark wrote:

On Mon, Nov 25, 2002 at 02:33:45PM +0100, Rafael Garcia-Suarez wrote:

 Chris Nandor [EMAIL PROTECTED] wrote:
 
  That shouldn't work.  By the time you get to it in the script, 
if you have a
  #! line, then the entire script is one long comment, and the use() line
  won't ever be executed.

 That would be an argument for allowing -M/-m on the #! line.

Er, except that the #! line would all have been read by then, and treated
as a comment. Or have I got things confused?


But is there any reason the # comments are not terminated by the 
first occurrence of *either* \012 or \015?

I can't see how this would affect any perl script, since presumably 
no unix script has a cr hidden in a comment (and similarly for Mac 
scripts and lf), and even for DOS, the cr will terminate the comment 
and the lf will be irrelevant white space (comments can't be 
inside anything that is storing white space, right?)

This would solve the #! commenting out the entire file issue, and 
allow the -M flag on the #! line to work.

Enjoy,
   Peter.

--
http://www.interarchy.com/  http://download.interarchy.com/


Re: unix or mac-style text files?

2002-11-25 Thread Ken Williams

On Tuesday, November 26, 2002, at 12:49  PM, Peter N Lewis wrote:


At 13:51 + 25/11/02, Nicholas Clark wrote:

On Mon, Nov 25, 2002 at 02:33:45PM +0100, Rafael Garcia-Suarez wrote:

 Chris Nandor [EMAIL PROTECTED] wrote:
 
  That shouldn't work.  By the time you get to it in the 
script, if you have a
  #! line, then the entire script is one long comment, and 
the use() line
  won't ever be executed.

 That would be an argument for allowing -M/-m on the #! line.

Er, except that the #! line would all have been read by then, 
and treated
as a comment. Or have I got things confused?

But is there any reason the # comments are not terminated by 
the first occurrence of *either* \012 or \015?

There's nothing perl can do about this - the OS (in fact, the 
kernel, I think) reads that shebang line in order to know it 
should call perl.  By the time perl gets to look at it, it's too 
late.

 -Ken



Re: unix or mac-style text files?

2002-11-25 Thread Chris

On Monday, November 25, 2002, at 10:09  PM, Ken Williams wrote:


There's nothing perl can do about this - the OS (in fact, the kernel, 
I think) reads that shebang line in order to know it should call perl. 
 By the time perl gets to look at it, it's too late.

Kernel not involved.  Shell looks to determine with which application 
to launch executable.  I kinda doubt the shell clips the first line and 
feeds only the later fragment to the executing file;  I suspect Perl 
gets the file and can parse as it likes.

Take care,
	Chris



Re: unix or mac-style text files?

2002-11-25 Thread Ken Williams

On Tuesday, November 26, 2002, at 03:38  PM, Chris wrote:


On Monday, November 25, 2002, at 10:09  PM, Ken Williams wrote:


There's nothing perl can do about this - the OS (in fact, the 
kernel, I think) reads that shebang line in order to know it 
should call perl.  By the time perl gets to look at it, it's 
too late.

Kernel not involved.  Shell looks to determine with which 
application to launch executable.

The following source says otherwise, as do some knowledgeable 
unix geeks I've asked about it.

  http://www.faqs.org/faqs/unix-faq/faq/part3/section-16.html


I kinda doubt the shell clips the first line and feeds only the 
later fragment to the executing file;  I suspect Perl gets the 
file and can parse as it likes.

True, perl gets the whole file, but before perl enters the 
picture at all, the kernel has to figure out whether to call 
perl, python, sh, or whatever.  That's the process that we don't 
have the ability to correct inside perl.

perl does indeed do some processing of the arguments in the 
shebang line - that's why it honors shebang switches (except 
ones it can't, like -T) even when you invoke it as "perl 
filename.pl" and the shebang mechanism isn't invoked.
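
A quick way to see that for yourself (the file name is made up):

#!/usr/bin/perl -l
# Save as switch_demo.pl and run it both ways:
#     ./switch_demo.pl
#     perl switch_demo.pl
# Either way the -l from the shebang line takes effect (print gains a
# trailing newline), because perl re-parses the #! line itself.
print "this line ends with a newline courtesy of -l";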

 -Ken



Re: unix or mac-style text files?

2002-11-25 Thread Peter N Lewis
At 15:09 +1100 26/11/02, Ken Williams wrote:

On Tuesday, November 26, 2002, at 12:49  PM, Peter N Lewis wrote:



But is there any reason the # comments are not terminated by the 
first occurrence of *either* \012 or \015?

There's nothing perl can do about this - the OS (in fact, the 
kernel, I think) reads that shebang line in order to know it should 
call perl.  By the time perl gets to look at it, it's too late.

Ahh, yes, good point.

Except - the shell reads the file and executes the program that is 
the first word after the #! so perl will indeed get called for a file 
with the wrong line endings, although it might get called with the 
entire file inserted into the ARGV.  But then perl does all sorts of 
wacky emulation at that point anyway, so don't ask me what goes on 
there, I couldn't figure it out.

For example, a file containing

#!/bin/ps auxw

when executed does the ps with those flags.

A file containing

#!/bin/echo -e 'foreach (@ARGV) { print "$_\n"; }'
blah blah blah

displays 'foreach (@ARGV) { print "$_\n"; }'

But a file containing

#!/usr/bin/perl -e 'foreach (@ARGV) { print "$_\n"; }'
foreach (@ARGV) { print "$_\n"; }

displays nothing, presumably because of perl doing some wacky 
emulation on the command line.

Enjoy,
   Peter.

--
http://www.interarchy.com/  http://download.interarchy.com/


mea cupla (was Re: unix or mac-style text files?)

2002-11-25 Thread Chris

On Monday, November 25, 2002, at 11:20  PM, Ken Williams wrote:

On Tuesday, November 26, 2002, at 03:38  PM, Chris wrote:

On Monday, November 25, 2002, at 10:09  PM, Ken Williams wrote:

There's nothing perl can do about this - the OS (in fact, the 
kernel, I think) reads that shebang line in order to know it should 
call perl.  By the time perl gets to look at it, it's too late.
Kernel not involved.  Shell looks to determine with which application 
to launch executable.
The following source says otherwise, as do some knowledgeable unix 
geeks I've asked ...

I appear to have grossly misunderstood discussion on a different list 
regarding what the kernel did vs what the shell did.  I apologize for 
worsening the S/N ratio here.

Mea cupla!

--Chris



Re: mea cupla (was Re: unix or mac-style text files?)

2002-11-25 Thread Ken Williams

On Tuesday, November 26, 2002, at 04:40  PM, Chris wrote:



On Monday, November 25, 2002, at 11:20  PM, Ken Williams wrote:

On Tuesday, November 26, 2002, at 03:38  PM, Chris wrote:

On Monday, November 25, 2002, at 10:09  PM, Ken Williams wrote:

There's nothing perl can do about this - the OS (in fact, the 
kernel, I think) reads that shebang line in order to know it 
should call perl.  By the time perl gets to look at it, it's 
too late.
Kernel not involved.  Shell looks to determine with which 
application to launch executable.
The following source says otherwise, as do some knowledgeable 
unix geeks I've asked ...

I appear to have grossly misunderstood discussion on a 
different list regarding what the kernel did vs what the shell 
did.  I apologize for worsening the S/N ratio here.

Mea cupla!

No problem, this is the way people learn.  Also, the situation 
is somewhat blurry for historical reasons - the first support 
for shebang lines *was* in shells, but it's much better to do it 
at the system level (see a 1980 message from Dennis Ritchie: 
http://www.uni-ulm.de/~s_smasch/various/shebang/sys1.c.html), so 
current shells usually don't do this.

On OS X, when I look at 'man tcsh' (most OS X users' default 
shell) and search for '#!', I see some discussion of it.  The 
shell *can* be compiled with the 'hb' option to emulate the 
kernel's shebang processing, but on OS X as Apple ships it, this 
option is not activated.

So you've got some good reasons for being confused. ;-)

[Note that I've trimmed p5p from the recipient list, since most 
people there probably already know this stuff...]

 -Ken



Re: mea cupla (was Re: unix or mac-style text files?)

2002-11-25 Thread William H. Magill

On Tuesday, November 26, 2002, at 01:38 AM, Ken Williams wrote:



On Tuesday, November 26, 2002, at 04:40  PM, Chris wrote:



On Monday, November 25, 2002, at 11:20  PM, Ken Williams wrote:

On Tuesday, November 26, 2002, at 03:38  PM, Chris wrote:

On Monday, November 25, 2002, at 10:09  PM, Ken Williams wrote:

There's nothing perl can do about this - the OS (in fact, the 
kernel, I think) reads that shebang line in order to know it 
should call perl.  By the time perl gets to look at it, it's too 
late.
Kernel not involved.  Shell looks to determine with which 
application to launch executable.
The following source says otherwise, as do some knowledgeable unix 
geeks I've asked ...

I appear to have grossly misunderstood discussion on a different list 
regarding what the kernel did vs what the shell did.  I apologize for 
worsening the S/N ratio here.

Mea cupla!

No problem, this is the way people learn.  Also, the situation is 
somewhat blurry for historical reasons - the first support for shebang 
lines *was* in shells, but it's much better to do it at the system 
level (see a 1980 message from Dennis Ritchie: 
http://www.uni-ulm.de/~s_smasch/various/shebang/sys1.c.html), so 
current shells usually don't do this.

On OS X, when I look at 'man tcsh' (most OS X users' default shell) 
and search for '#!', I see some discussion of it.  The shell *can* be 
compiled with the 'hb' option to emulate the kernel's shebang 
processing, but on OS X as Apple ships it, this option is not 
activated.

Use: http://www.uni-ulm.de/~s_smasch/various/shebang/

As the original URL does not work...

Also the man page for execve describes what is happening.

One hopes that Darwin does not have the old 32 character limit from 
Tahoe! (BSD 4.2) [See the footnotes to the table for the 3 possible 
events when the line is too long.]

The page is quite informative... Thanks for the pointer! [although the 
table is not complete by any means, it brings back memories of LONG 
nights caused by long path names.]

T.T.F.N.
William H. Magill
# Beige G3 - Rev A motherboard
# Flat-panel iMac (2.1) 800MHz - Super Drive - 768 Meg
[EMAIL PROTECTED]
[EMAIL PROTECTED]
[EMAIL PROTECTED]



Re: unix or mac-style text files?

2002-11-24 Thread Chris Nandor
In article [EMAIL PROTECTED],
 [EMAIL PROTECTED] (Wiggins D'Anconia) wrote:

 There is some discussion of this issue in the docs, check out:
 
 perldoc perlport

Note that perlport does not discuss this issue -- executing a non-native 
text file with perl -- at all, really.


 I guess the real question I have is does Perl on OS X qualify as MacPerl 
 or Unix perl ... I defer to the mac os x experts, but would guess Unix perl.

MacPerl is perl for Mac OS.  Mac OS X is not Mac OS; they are two different 
operating systems.  perl for Mac OS (MacPerl) uses Mac newlines, perl for 
Mac OS X (Unix perl) uses Unix newlines.


But back to the point: there's been some discussion in this thread on 
workarounds, but my personal feeling is that this is a bug, or at best a 
broken feature, in perl.  Some time ago, the capability was added to perl to 
recognize and filter CRLF files to work on Unix and LF to work on Windows 
(grep for PERL_STRICT_CR in toke.c).  However, this functionality was not 
extended to CR files, as it should have been, IMO.  OK, so I am a little 
bitter about it.

The last discussion about how to deal with this was on p5p in July:

http://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/2002-07/msg00871.html


The bottom line was that it'd be nice to have a PerlIO filter for perl 
5.8.x, so that MacPerl can execute Unix and Windows text files, and Mac OS X 
perl can execute Mac OS text files, etc.  Patches are surely welcome!  :-)
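
For the record, a very rough sketch of what such a layer might look like on 
5.8 with PerlIO::via (untested; the layer name AnyNL is made up, and it does 
nothing for the shebang problem discussed elsewhere in this thread):

package PerlIO::via::AnyNL;

sub PUSHED {
    my ($class, $mode, $fh) = @_;
    bless \(my $self), $class;
}

sub FILL {
    my ($obj, $fh) = @_;
    local $/;                  # slurp everything from the layer below
    my $data = <$fh>;
    return undef unless defined $data;
    $data =~ s/\015\012|\015|\012/\n/g;   # normalize to the native newline
    return $data;
}

1;

# hypothetical usage:
#   open my $in, '<:via(AnyNL)', 'script_with_cr.pl' or die $!;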

-- 
Chris Nandor  [EMAIL PROTECTED]http://pudge.net/
Open Source Development Network[EMAIL PROTECTED] http://osdn.com/



Re: unix or mac-style text files?

2002-11-24 Thread Dan Kogai
On Monday, Nov 25, 2002, at 01:05 Asia/Tokyo, Chris Nandor wrote:

The bottom line was that it'd be nice to have a PerlIO filter for perl
5.8.x, so that MacPerl can execute Unix and Windows text files, and 
Mac OS X
perl can execute Mac OS text files, etc.  Patches are surely welcome!  
:-)

One good question may be how to handle newlines in heretext, the only 
part that really matters because that's the only exception to the fact 
that newlines are nothing but whitespace from perl compiler's point of 
view -- oops, shebang is another.  When you feed MacPerl *.pl to MacOS 
X, should linefeeds in heretext emit \015 or \012?

I am not sure which is lazier to simply apply

# Any -> Unix
perl -i.bak -ple 's/\015\012|\015|\012/\012/g' *.pl
# Any -> Mac
perl -i.bak -ple 's/\015\012|\015|\012/\015/g' *.pl

or teach camel the same trick

Dan the Man with Too Many Kinds of Line Endings to Deal With



Re: unix or mac-style text files?

2002-11-24 Thread Heather Madrone
Administrivia question:  I'm getting a lot of duplicate responses
because the Reply-to on the list is set to sender.  On moderated
lists, this can be a good idea because the approval cycle causes
a lag between posting and mail reflection.

Is the Reply-to merely a hint that we should consider taking topics
offline, or is there some reason I should be leaving redundant addresses 
in the headers?

At 11:05 AM 11/24/2002 -0500, Chris Nandor wrote:
But back to the point: there's been some discussion in this thread on 
workarounds, but my personal feeling is that this is a bug, or at best a 
broken feature, in perl.  Some time ago, the capability was added to perl to 
recognize and filter CRLF files to work on Unix and LF to work on Windows 
(grep for PERL_STRICT_CR in toke.c).  However, this functionality was not 
extended to CR files, as it should have been, IMO.  

I think you're right.  It's easier to move back and forth from
Windows to Solaris than it is to move from one side of the Mac
house to the other.  This is undoubtedly broken, not just in
perl, but on the Macintosh in general.

Personally, I think that Apple would be wise to move to the Unix
standard for text files.  It would take several releases of
confusion to do it, but that would be better than carrying 
forward this schizophrenia to future OS generations.

The text file issue is one among many that make the Mac look
like a machine running two independent operating systems (you
can get this effect with Linux and Windows, without the
confusion of thinking that you're running on a single integrated
system).  The right half of its brain does not know what the left 
hand is doing.

While they're at it, they might drop file resource forks.
The Unix side of the house quietly drops them in any file
manipulation, but most Mac-native applications depend on 
them.

If Apple doesn't want to give up its own peculiar file formats,
then they ought to fix their Unix so it handles Macintosh files
sensibly.


Heather Madrone  ([EMAIL PROTECTED])  http://www.madrone.com
Reality: deeper than I dreamed.




Re: unix or mac-style text files?

2002-11-24 Thread William H. Magill

On Sunday, November 24, 2002, at 03:34 PM, Heather Madrone wrote:

If Apple doesn't want to give up its own peculiar file formats,
then they ought to fix their Unix so it handles Macintosh files
sensibly.


In the /Developer/Tools directory, you will find CpMac and MvMac -- 
these are the standard Unix cp and mv tools modified to deal with 
resource forks. ... they do have man pages.

There are a number of other similar tools in that directory.
Another useful tool is /usr/bin/ditto.

T.T.F.N.
William H. Magill
# Beige G3 - Rev A motherboard
# Flat-panel iMac (2.1) 800MHz - Super Drive - 768 Meg
[EMAIL PROTECTED]
[EMAIL PROTECTED]
[EMAIL PROTECTED]



Re: unix or mac-style text files?

2002-11-24 Thread Chris Nandor
In article [EMAIL PROTECTED],
 [EMAIL PROTECTED] (Dan Kogai) wrote:

 On Monday, Nov 25, 2002, at 01:05 Asia/Tokyo, Chris Nandor wrote:
  The bottom line was that it'd be nice to have a PerlIO filter for perl
  5.8.x, so that MacPerl can execute Unix and Windows text files, and 
  Mac OS X
  perl can execute Mac OS text files, etc.  Patches are surely welcome!  
  :-)
 
 One good question may be how to handle newlines in heretext, the only 
 part that really matters because that's the only exception to the fact 
 that newlines are nothing but whitespace from perl compiler's point of 
 view -- oops, shebang is another.  When you feed MacPerl *.pl to MacOS 
 X, should linefeeds in heretext emit \015 or \012?

I am talking here about taking (for example) a perl program with Mac OS 
newlines, and making it run under Unix perl.  In order for that to happen, 
you need to translate all the CRs to LFs.  That would include the CRs in the 
heretext, as well as in every literal string.



 I am not sure which is lazier to simply apply
 
 # Any -> Unix
 perl -i.bak -ple 's/\015\012|\015|\012/\012/g' *.pl
 # Any -> Mac
 perl -i.bak -ple 's/\015\012|\015|\012/\015/g' *.pl
 
 or teach camel the same trick

One of the main points of this is that some people will want the same files 
to be used in more than one context, such as sharing code between Windows 
and Unix perl over NFS, or sharing code between perl on Mac OS X and MacPerl 
under Mac OS or Classic.  Right now, the only solution is to make copies, as 
you suggest.

-- 
Chris Nandor  [EMAIL PROTECTED]http://pudge.net/
Open Source Development Network[EMAIL PROTECTED] http://osdn.com/



Re: unix or mac-style text files?

2002-11-24 Thread Ken Williams

On Monday, November 25, 2002, at 07:34  AM, Heather Madrone wrote:


Administrivia question:  I'm getting a lot of duplicate responses
because the Reply-to on the list is set to sender.  On moderated
lists, this can be a good idea because the approval cycle causes
a lag between posting and mail reflection.

Is the Reply-to merely a hint that we should consider taking topics
offline, or is there some reason I should be leaving redundant 
addresses
in the headers?

The extra copies are more for your convenience - I appreciate 
when people send them to me, because one copy goes to my list 
mailbox and the other goes to my inbox.  The one in my inbox 
will be read faster.

I wish there were a standard way to indicate in your own mail 
headers I do/don't wish to receive a direct copy of replies to 
this message.  This can be done on usenet pretty effectively, 
but not really in email lists.

 -Ken



Re: unix or mac-style text files?

2002-11-20 Thread John Delacour
At 4:30 pm -0800 19/11/02, Heather Madrone wrote:


Is perl on the Mac going to care whether source files are Mac-style 
or Unix-style? Is it going to have difficulty reading and operating 
on either kind of file?  What kind of text files will it write?

You can do a routine like the one below to discover what line endings 
are used in the file and set $/ accordingly.  This script 
establishes, by reading 1000 bytes, that a Eudora mailbox file uses 
carriage returns only, changes $/ to cr and loops through the file 
printing all the From: lines terminated with line feeds.


#!/usr/bin/perl
use Fcntl;   # for O_RDONLY
$f = "$ENV{HOME}/Documents/Eudora Folder/Mail Folder/Manningham";
sysopen F, $f, O_RDONLY;
sysread F, $_, 1000;
if (/\015\012/) {
  $/ = "\015\012";
} elsif (/\015/) {
  $/ = "\015";
} else {
  $/ = "\012";
}
open F, $f;
for (<F>) {
  /^From: / and chomp and print "$_\n"
}

-- JD






Re: unix or mac-style text files?

2002-11-20 Thread Chris Devers
On Tue, 19 Nov 2002, Heather Madrone wrote:

 I've already encountered a few text file anomalies on OS X. Most GUI
 applications seem to default to Mac-style text files (linefeeds only),
 but shell programs such as vi do not handle Mac-style text files
 gracefully.

If vi fails, try vim -- it handles line endings very gracefully, and
converting among different formats is just a matter of:

:set fileformat=unix
:set fileformat=mac
:set fileformat=dos

vim rox :)



-- 
Chris Devers[EMAIL PROTECTED]

Q: How does a hacker fix a function which doesn't
   work for all of the elements in its domain?
A: He changes the domain.




Re: unix or mac-style text files?

2002-11-20 Thread Peter N Lewis
At 13:22 + 20/11/02, John Delacour wrote:


 if (/\015\012/) {
   $/ = "\015\012";
 } elsif (/\015/) {
   $/ = "\015";
 } else {
   $/ = "\012";
 }


You can do this with one regular expression which will pick up the 
first line ending:

 $/ = /(\015\012|\015|\012)/ ? $1 : "\n";

Note that because Perl picks the first match location, and after that 
picks the first alternative of an | set, it will find the first line 
ending in the file, and will match the \015\012 if it is there in 
preference to the \015 by itself.

Enjoy,
   Peter.

--
http://www.interarchy.com/  http://download.interarchy.com/


Re: unix or mac-style text files?

2002-11-19 Thread Wiggins d'Anconia
There is some discussion of this issue in the docs, check out:

perldoc perlport

And page through to a section called Newlines...
I guess the real question I have is does Perl on OS X qualify as MacPerl 
or Unix perl ... I defer to the mac os x experts, but would guess Unix perl.

http://danconia.org


Heather Madrone wrote:
I've already encountered a few text file anomalies on OS X. Most GUI 
applications
seem to default to Mac-style text files (linefeeds only), but shell 
programs such as
vi do not handle Mac-style text files gracefully.

Is perl on the Mac going to care whether source files are Mac-style or 
Unix-style?
Is it going to have difficulty reading and operating on either kind of 
file?  What
kind of text files will it write?

Thanks in advance for any illumination.

-hmm
[EMAIL PROTECTED]






Re: unix or mac-style text files?

2002-11-19 Thread Ken Williams

On Wednesday, November 20, 2002, at 01:45  AM, Wiggins d'Anconia wrote:

Heather Madrone wrote:

I've already encountered a few text file anomalies on OS X. Most GUI 
applications
seem to default to Mac-style text files (linefeeds only), but shell 
programs such as
vi do not handle Mac-style text files gracefully.
Is perl on the Mac going to care whether source files are Mac-style or 
Unix-style?
Is it going to have difficulty reading and operating on either kind of 
file?  What
kind of text files will it write?
Thanks in advance for any illumination.
-hmm
[EMAIL PROTECTED]

There is some discussion of this issue in the docs, check out:

perldoc perlport

And page through to a section called Newlines...
I guess the real question I have is does Perl on OS X qualify as 
MacPerl or Unix perl ... I defer to the mac os x experts, but would 
guess Unix perl.


Yes, Unix perl.

Of course, perl of any sort can read or write text (or non-text) files 
of any sort.  It's just that the default line endings differ on 
different platforms, in the interest of convenience.

 -Ken



Re: unix or mac-style text files?

2002-11-19 Thread Doug McNutt
At 16:30 -0800 11/19/02, Heather Madrone wrote:
I've already encountered a few text file anomalies on OS X. Most GUI applications
seem to default to Mac-style text files (linefeeds only),

I think that's returns only for Mac style. Don't be fooled by MPW's and perhaps 
MacPerl's redefinition of \n and \r in the reverse sense from the rest of the world.

I recommend use of linefeed only - ASCII 10 - for all future work in perl. BBEdit has 
no trouble with that. If you're even a little bit involved with moving perl scripts to 
some UNIX server where your web pages are based you'll find that your scripts move 
effortlessly with any kind of file transfer.

<OT>
The internet norm is a linefeed-return pair which is really strange because in the 
days of teletype one sent the return first because it took longer than the linefeed 
and one needed a few null characters to be sure the operation completed at 100 baud. 
Of course you had the option of sending the return only and repeating the line to get 
a bold appearance. If you don't believe that look at a UNIX man page with repeated 
characters and backspaces.
</OT>


-- 
--  There are 10 kinds of people:  those who understand binary, and those who don't 
--



Re: unix or mac-style text files?

2002-11-19 Thread Heather Madrone
At 05:55 PM 11/19/2002 -0700, Doug McNutt wrote:
At 16:30 -0800 11/19/02, Heather Madrone wrote:
I've already encountered a few text file anomalies on OS X. Most GUI applications
seem to default to Mac-style text files (linefeeds only),

I think that's returns only for Mac style. Don't be fooled by MPW's and perhaps 
MacPerl's redefinition of \n and \r in the reverse sense from the rest of the world.

Yes.  ASCII 13.  \015.  ^M.

I recommend use of linefeed only - ASCII 10 - for all future work in perl. 
BBEdit has no trouble with that. If you're even a little bit involved with 
moving perl scripts to some UNIX server where your web pages are based 
you'll find that your scripts move effortlessly with any kind of file transfer.

Makes sense to me.  The input files are more problematic.
I don't necessarily know whether they will be created on the Mac
side of the house or the Unix side of the house.  It sounds like
I'm going to have to replace $FileHandle->getline with something 
that can handle either kind of line break.
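
Something along these lines might do as a replacement (a rough, untested 
sketch; the name any_getline and the file name are made up):

#!/usr/bin/perl
use strict;
use warnings;

# Like getline(), but accepts CR, LF or CRLF and returns the line
# without its terminator.
sub any_getline {
    my ($fh) = @_;
    my $line = '';
    while (defined(my $ch = getc($fh))) {
        if ($ch eq "\015") {                  # CR: might be half of CRLF
            my $next = getc($fh);
            # put back anything that wasn't the LF of a CRLF pair
            seek($fh, -1, 1) if defined $next and $next ne "\012";
            return $line;
        }
        return $line if $ch eq "\012";        # plain LF
        $line .= $ch;
    }
    return length($line) ? $line : undef;     # last line, or EOF
}

open my $fh, '<', 'input.txt' or die $!;
while (defined(my $line = any_getline($fh))) {
    print "$line\n";
}

It reads a character at a time, so it is not fast, but it copes with 
whichever ending shows up.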

--  There are 10 kinds of people:  
 those who understand binary, and those who don't --

This is one of my favorite jokes.


Heather Madrone  ([EMAIL PROTECTED])  http://www.madrone.com
Reality: deeper than I dreamed.




Re: unix or mac-style text files?

2002-11-19 Thread Peter N Lewis
At 16:30 -0800 19/11/02, Heather Madrone wrote:

I've already encountered a few text file anomalies on OS X. Most GUI 
applications seem to default to Mac-style text files (linefeeds 
only), but shell programs such as vi do not handle Mac-style text 
files gracefully.

Is perl on the Mac going to care whether source files are Mac-style 
or Unix-style?
Is it going to have difficulty reading and operating on either kind 
of file?  What kind of text files will it write?

Thanks in advance for any illumination.

Definitely read the perlport section of the documentation at:

http://www.perldoc.com/perl5.6.1/pod/perlport.html

Traditionally on Mac OS, line endings have been carriage return (cr) only.

Unix uses just linefeed line (lf) endings.

DOS/Windows uses carriage-linefeed (crlf) line endings.

Under Mac OS X, it is quite schizophrenic - some applications will 
handle only Mac line endings, some applications handle only Unix line 
endings, some applications will handle Unix or Mac (or even DOS) line 
endings.

Ignoring MacPerl (running under Mac OS X), and looking only at Mac OS 
X's /usr/bin/perl (or wherever you've installed perl), which is a 
Unix perl, not a Mac perl, we have:

Perl source files must have Unix line endings (lf only).  If the 
source file has Mac line endings, then it will usually run and do 
absolutely nothing (if you run it as "perl script.pl"), or it will 
complain "script.pl: Command not found." if you run it as 
"./script.pl".  This is because the first line is #!/usr/bin/perl - but 
after that the cr is not a line ending and so the entire source file 
appears as a single line.  If you run it with perl, then it will 
ignore the entire file as a comment.  If you run it yourself, then it 
will try to use the entire file as a command and won't be able to find 
"/usr/bin/perl<cr><cr>use ..." (for example) as a command to run.

By default, Perl will read and write unix line ending files.  You can 
change the input separator with $/ = "\r" for Mac line endings, 
"\r\n" for DOS line endings (and back to "\n" for Unix, although 
saving and restoring is better practice).  You can change the output 
by just printing the appropriate line ending.  In this case, a nice 
practice might be to do:

our $eol = "\015\012"; # Windows line ending

print "First Line$eol";

My suggestion for Mac OS X users is to switch to using Unix line 
endings as soon as possible, and wherever possible support reading 
files with any line ending.  One simple thing I almost always do is:

while (<>) {
  s/\015?\012$//; # instead of chomp
}

Yes, chomp is probably faster, but most of the time it makes no 
difference.  Note that the above code will not help you with Mac files 
because the <> will read the entire file in one go :-(

It's really unfortunate that there is no special case value for $/ 
(like  perhaps) that handles \015\012|\015|\012 as a line ending. 
There is talk of making $/ a regex which would allow that, but that's 
huge overkill just to handle this one particular very special case.

An alternative is to read the entire file in (undef $/) and then split it:

local( $/ ) = undef;
my $file = <>; # read in entire file
my @lines = split( /\015\012|\015|\012/, $file );
foreach my $line (@lines) {
  print "'$line'\n";
}

Which is ok, but not great for big files.
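
For the big-file case, a buffered sketch that only ever holds one chunk plus 
a partial line in memory (untested; make_line_reader and the file name are 
made up):

#!/usr/bin/perl
use strict;

sub make_line_reader {
    my ($fh, $chunk) = @_;
    $chunk ||= 8192;
    my $buf = '';
    return sub {
        for (;;) {
            # a \015 at the very end of the buffer might be half of a
            # \015\012, so only accept it once we know what follows
            if ($buf =~ s/^(.*?)(?:\015\012|\015(?!\z)|\012)//s) {
                return $1;
            }
            my $n = read($fh, my $more, $chunk);
            if (!$n) {                         # EOF (or read error)
                return undef unless length $buf;
                (my $last = $buf) =~ s/\015\z//;
                $buf = '';
                return $last;
            }
            $buf .= $more;
        }
    };
}

open my $fh, '<', 'big_file.txt' or die $!;
my $next_line = make_line_reader($fh);
while (defined(my $line = $next_line->())) {
    print "$line\n";
}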

Enjoy,
   Peter.


--
http://www.interarchy.com/  http://download.interarchy.com/


Re: unix or mac-style text files?

2002-11-19 Thread gene

An alternative is to read the entire file in (undef $/) and then split 
it:


My suggestion is to put some code like this in your script:

local $/ = get_line_ending($fh);

sub get_line_ending {
	my $fh = shift;
	my $char;
	while (read $fh, $char, 1) {
		if ($char eq "\n") {
			seek $fh, 0, 0;
			return "\n";
		} elsif ($char eq "\r") {
			if (read $fh, $char, 1 and $char eq "\n") {
				seek $fh, 0, 0;
				return "\r\n";
			} else {
				seek $fh, 0, 0;
				return "\r";
			}
		}
	}
	## what, no line ending?
	## return a reasonable default
	seek $fh, 0, 0;
	return "\n";
}

This, of course assumes that you don't have some oddball case
where you have \r's in a unix file or something like that, but
if you're dealing with text files (which is the only place where
line endings should matter), that's unlikely.

Suggestions for the above code:
Move the sub into a module.
Put a byte counter in, so that you're not reading through a 5 Gig
file looking for a line ending.
I assume it's more efficient to read small chunks of bytes rather
than byte by byte.  For most text files this shouldn't matter, but
you may want to alter the reads and also the comparisons if you care.




Re: unix or mac-style text files?

2002-11-19 Thread Peter N Lewis
At 19:01 -0800 19/11/02, gene wrote:

An alternative is to read the entire file in (undef $/) and then split it:


My suggestion is to put some code like this in your script:


It's a good solution.  Probably for files less than a few hundred k 
it makes no difference (since you'll need to read the entire file 
anyway; until the memory usage of storing the whole thing becomes an 
issue it won't affect anything).

For portability, you should use \012 and \015 explicitly, except for 
the final default value which should be \n.  Here is the code, with 
the fail counter added to avoid it reading forever in a file with no 
line endings (not that it is likely to help anyway since you'll 
presumably follow this up with reading a line...)

# Usage: local $/ = get_line_ending($fh);
# By gene

sub get_line_ending {
  my ($fh) = @_;

  my $failcount = 33000;
  my $char;
  while (read $fh, $char, 1 and $failcount-- > 0) {
    if ($char eq "\012") {
      seek $fh, 0, 0;
      return "\012";
    } elsif ($char eq "\015") {
      if (read $fh, $char, 1 and $char eq "\012") {
        seek $fh, 0, 0;
        return "\015\012";
      } else {
        seek $fh, 0, 0;
        return "\015";
      }
    }
  }
  ## what, no line ending?
  ## return a reasonable default
  seek $fh, 0, 0;
  return "\n";
}

Suggestions for the above code:
Move the sub into a module.


I have ;-).  whether it's worth publishing a CPAN module, I don't 
know.  Perhaps adding it to some existing module?

I assume it's more efficient to read small chunks of bytes rather
than byte by byte.  For most text files this shouldn't matter, but
you may want to alter the reads and also the comparisons if you care.


It would require some timing to figure out if reading a block of 
characters would be better, possibly something like:

read 256 characters, look for the first \012 or \015 and see what's 
up (being careful not to accept a \015 as the 256th character as an 
answer), then try again with a larger read

would be more efficient, but then again, possibly not.  It would 
depend on a lot of things and might vary from OS to OS, so it's 
probably not worth worrying too much about.
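
In case anyone does want to try it, a rough sketch of that block-read 
variant (untested; it follows the 256-byte scheme above, and the sub name 
is made up):

sub get_line_ending_blocks {
  my ($fh) = @_;

  my $max  = 32768;      # give up after this many bytes
  my $seen = 0;
  my $tail = '';         # last byte of the previous chunk
  while ($seen < $max and read($fh, my $chunk, 256)) {
    $seen += length $chunk;
    my $data = $tail . $chunk;
    # a \015 that is the very last byte read might be half of a
    # \015\012, so defer judging it until the next chunk arrives
    if ($data =~ /(\015\012|\015(?!\z)|\012)/) {
      seek $fh, 0, 0;
      return $1;
    }
    $tail = substr($data, -1);
  }
  seek $fh, 0, 0;
  return $tail eq "\015" ? "\015" : "\n";   # lone trailing CR, else default
}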

Enjoy,
   Peter.


--
http://www.interarchy.com/  http://download.interarchy.com/