Re: unix or mac-style text files?
On Monday, November 25, 2002, at 07:23 AM, Chris Nandor wrote: In article [EMAIL PROTECTED], [EMAIL PROTECTED] (Dan Kogai) wrote: On Monday, Nov 25, 2002, at 01:05 Asia/Tokyo, Chris Nandor wrote: The bottom line was that it'd be nice to have a PerlIO filter for perl 5.8.x, so that MacPerl can execute Unix and Windows text files, and Mac OS X perl can execute Mac OS text files, etc. Patches are surely welcome! :-) One good question may be how to handle newlines in heretext, the only part that really matters, because that's the only exception to the fact that newlines are nothing but whitespace from the perl compiler's point of view -- oops, shebang is another. When you feed MacPerl *.pl to Mac OS X, should linefeeds in heretext emit \015 or \012?

I am talking here about taking (for example) a perl program with Mac OS newlines, and making it run under Unix perl. In order for that to happen, you need to translate all the CRs to LFs. That would include the CRs in the heretext, as well as in every literal string.

[revisiting an old thread] I don't think it's really a good idea to translate newlines in string literals (let's lump heretext in with string literals, since that's how they function). That stuff is part of the data of a program, not part of the instruction set. So by doing one mass CR-LF conversion blindly, you'd get the program to run, but it would run differently given the exact same data input. I don't think that's desirable. It's quite useful to have \n and File::Spec->catfile() and so on mean different things on different platforms, but literal characters changing themselves seems like quite another matter.

-Ken
Re: unix or mac-style text files?
In article [EMAIL PROTECTED], [EMAIL PROTECTED] (Ken Williams) wrote: I don't think it's really a good idea to translate newlines in string literals (let's lump heretext in with string literals, since that's how they function). That stuff is part of the data of a program, not part of the instruction set. So by doing one mass CR-LF conversion blindly, you'd get the program to run, but it would run differently given the exact same data input. I don't think that's desirable.

I disagree. We've been doing this for years on Mac OS without problem. Whenever I unpack a tarball or fetch a file via FTP or HTTP, my programs are doing mass/blind newline conversions on text files. It's long been accepted as the Right Thing, and it only rarely causes problems. And on the contrary, it would cause major problems to do it the other way, not only in terms of effort (yes, you downloaded the file via FTP as text, and it converted the newlines from Unix to Mac, but you need to go back and convert the newlines in string literals back into Unix newlines), but also in the simple fact that it would rarely be what we want. When you do a here-doc, 99.99% of the time you want native newlines in there.

The basic tenet is that if you embed an actual newline anywhere at all in your code, it is a logical newline, no matter where it is or what it is doing, and it should be converted to the native format of whatever the target platform is. If you want a literal \012, then you should encode it as \012 or \x0A or \cJ.

--
Chris Nandor [EMAIL PROTECTED] http://pudge.net/
Open Source Development Network [EMAIL PROTECTED] http://osdn.com/
Re: unix or mac-style text files?
Moin,

such as sharing code between Windows and Unix perl over NFS

Uh, I've been doing this for years and ActiveState Perl doesn't seem to have any problem with my Unix files. Where exactly is the problem? *puzzled* (Granted, I never used Mac OS, but I got a report that my stuff works on Mac OS, and somehow I doubt that the user in question converted all the source code beforehand... hm, must ask him. Good point.)

Cheers, Tels

--
perl -MDev::Bollocks -le'print Dev::Bollocks->rand()'
challengingly facilitate synergistic models
http://bloodgate.com/perl My current Perl projects
PGP key available on http://bloodgate.com/tels.asc or via email
Re: unix or mac-style text files?
Moin,

On 27-Nov-02 Chris Nandor carved into stone: At 00:03 +0100 2002.11.27, Tels wrote: such as sharing code between Windows and Unix perl over NFS Uh, I've been doing this for years and ActiveState Perl doesn't seem to have any problem with my Unix files. Where exactly is the problem? *puzzled*

As noted in previous posts, there is an (unfinished) feature in older versions of perl that allows Unix perl to execute CRLF files, and Windows perl to execute LF files. But it does not allow MacPerl to execute CRLF or LF files, and does not allow Unix or Windows perl to execute CR files.

Ah, thanx for the explanation.

(Granted, I never used Mac OS, but I got a report that my stuff works on Mac OS, and somehow I doubt that the user in question converted all the source code beforehand... hm, must ask him. Good point)

Almost all tools for Mac OS unpack source with the proper newlines, either automatically or on request. If the user used Stuffit or somesuch, they can set an option to translate newlines. If they used MacPerl's module installation tools, then it asks you if you want to convert newlines.

Okay, but so sharing the stuff over a drive (is this easily possible with Mac?) would be one problem case?

Cheers, Tels
Re: unix or mac-style text files?
On Sunday, November 24, 2002, at 05:21 PM, Ken Williams wrote: On Monday, November 25, 2002, at 07:34 AM, Heather Madrone wrote: Administrivia question: I'm getting a lot of duplicate responses because the Reply-to on the list is set to sender. On moderated lists, this can be a good idea because the approval cycle causes a lag between posting and mail reflection. Is the Reply-to merely a hint that we should consider taking topics offline, or is there some reason I should be leaving redundant addresses in the headers?

The extra copies are more for your convenience - I appreciate when people send them to me, because one copy goes to my list mailbox and the other goes to my inbox. The one in my inbox will be read faster. I wish there were a standard way to indicate in your own mail headers I do/don't wish to receive a direct copy of replies to this message. This can be done on usenet pretty effectively, but not really in email lists.

There is the Mail-Followup-To header; unfortunately, AFAIK mutt [1] is the only client to respect it, or provide methods to set it according to your preference. FWIW, I object to reply-to munging [2]

Michael

[1] http://www.mutt.org/doc/manual/manual-6.html#followup_to
[2] http://www.unicom.com/pw/reply-to-harmful.html
Re: unix or mac-style text files?
On Monday, November 25, 2002, at 07:34 AM, Heather Madrone wrote: Administrivia question: I'm getting a lot of duplicate responses because the Reply-to on the list is set to sender. On moderated lists, this can be a good idea because the approval cycle causes a lag between posting and mail reflection. Is the Reply-to merely a hint that we should consider taking topics offline, or is there some reason I should be leaving redundant addresses in the headers?

More to the point, this list doesn't set Reply-To at all. There's a great deal of discussion at large about whether this is a good idea or not, but by-and-large, the From, To and Cc that come through are the same ones the Sender originally used.

At 12:21 PM +1100 11/25/2002, Ken Williams replied: The extra copies are more for your convenience - I appreciate when people send them to me, because one copy goes to my list mailbox and the other goes to my inbox. The one in my inbox will be read faster. I wish there were a standard way to indicate in your own mail headers I do/don't wish to receive a direct copy of replies to this message. This can be done on usenet pretty effectively, but not really in email lists.

Well, on lists like this one that don't munge the Reply-To header, if you designate a Reply-To on the outgoing mail, it should remain intact all the way to the end recipients.

-Charles Euonymic Solutions [EMAIL PROTECTED]
Re: unix or mac-style text files?
On Mon, Nov 25, 2002 at 02:43:46AM +0900, Dan Kogai wrote: On Monday, Nov 25, 2002, at 01:05 Asia/Tokyo, Chris Nandor wrote: The bottom line was that it'd be nice to have a PerlIO filter for perl 5.8.x, so that MacPerl can execute Unix and Windows text files, and Mac OS X perl can execute Mac OS text files, etc. Patches are surely welcome! :-) One good question may be how to handle newlines in heretext, the only part that really matters because that's the only exception to the fact that newlines are nothing but whitespace from perl compiler's point of view -- oops, shebang is another. Newlines also serve as comment terminators; the perl compiler must recognize either \012 or \015 as the end of a comment. (I recall struggling to demonstrate some code during a Perl Mongers meeting. The script ran, but it didn't produce any output, nor did it produce any error messages! I finally figured out that perl on the OS X machine was seeing the whole script as one long comment, starting with #!/usr/bin/perl ... :) Ronald
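Ronald's one-long-comment effect is easy to reproduce on a Unix perl. The file name below is hypothetical, and depending on the perl build the CR may instead draw an "Illegal character" complaint on stderr, but either way nothing is printed:

```shell
# Write a script whose only line terminators are CRs (\015).
# Unix perl ends comments at \n, so the shebang comment runs to EOF
# and the print statement is never tokenized.
printf '#!/usr/bin/perl\015print "hello\\n";\015' > crscript.pl
perl crscript.pl   # stdout stays empty
```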
Re: unix or mac-style text files?
Chris Nandor [EMAIL PROTECTED] writes: In article [EMAIL PROTECTED], [EMAIL PROTECTED] (Dan Kogai) wrote: On Monday, Nov 25, 2002, at 01:05 Asia/Tokyo, Chris Nandor wrote: The bottom line was that it'd be nice to have a PerlIO filter for perl 5.8.x, so that MacPerl can execute Unix and Windows text files, and Mac OS X perl can execute Mac OS text files, etc. Patches are surely welcome! :-) One good question may be how to handle newlines in heretext, the only part that really matters because that's the only exception to the fact that newlines are nothing but whitespace from the perl compiler's point of view -- oops, shebang is another. When you feed MacPerl *.pl to MacOS X, should linefeeds in heretext emit \015 or \012?

I am talking here about taking (for example) a perl program with Mac OS newlines, and making it run under Unix perl. In order for that to happen, you need to translate all the CRs to LFs. That would include the CRs in the heretext, as well as in every literal string.

I am not sure which is lazier: to simply apply

  # Any -> Unix
  perl -i.bak -ple 's/\015\012|\015|\012/\012/g' *.pl
  # Any -> Mac
  perl -i.bak -ple 's/\015\012|\015|\012/\015/g' *.pl

or teach camel the same trick

One of the main points of this is that some people will want the same files to be used in more than one context, such as sharing code between Windows and Unix perl over NFS, or sharing code between perl on Mac OS X and MacPerl under Mac OS or Classic. Right now, the only solution is to make copies, as you suggest.

Or use source filters:

  package Filter::Any2Unix;
  use Filter::Util::Call;

  sub import {
      if ($^O ne 'MacOS') {
          filter_add(sub {
              my ($status) = filter_read();
              if ($status > 0) {
                  s/\015\012|\015|\012/\012/g;
              }
              $status;
          });
      }
  }

  1;

and then call the script with perl -MFilter::Any2Unix script.pl or embed use Filter::Any2Unix into the script.

Regards, Slaven

--
Slaven Rezic - [EMAIL PROTECTED]
Tk-AppMaster: a perl/Tk module launcher designed for handhelds
http://tk-appmaster.sf.net
Re: unix or mac-style text files?
In article [EMAIL PROTECTED], [EMAIL PROTECTED] (Slaven Rezic) wrote:

Could this be made even more generic, by translating to \n instead of \012?

Or use source filters:

  package Filter::Any2Unix;        # Any2Native?
  use Filter::Util::Call;

  sub import {
      if ($^O ne 'MacOS') {        #?
          filter_add(sub {
              my ($status) = filter_read();
              if ($status > 0) {
                  s/\015\012|\015|\012/\012/g;   # /\n/g ?
              }
          });
      }                            #?
  }

and then call the script with perl -MFilter::Any2Unix script.pl or embed use Filter::Any2Unix into the script.

That shouldn't work. By the time you get to it in the script, if you have a #! line, then the entire script is one long comment, and the use() line won't ever be executed.
Re: unix or mac-style text files?
Chris Nandor [EMAIL PROTECTED] wrote: That shouldn't work. By the time you get to it in the script, if you have a #! line, then the entire script is one long comment, and the use() line won't ever be executed. That would be an argument for allowing -M/-m on the #! line.
Re: unix or mac-style text files?
In article [EMAIL PROTECTED], [EMAIL PROTECTED] (Heather Madrone) wrote: At 11:05 AM 11/24/2002 -0500, Chris Nandor wrote: But back to the point: there's been some discussion in this thread on workarounds, but my personal feeling is that this is a bug, or at best a broken feature, in perl. Some time ago, the capability was added to perl to recognize and filter CRLF files to work on Unix and LF to work on Windows (grep for PERL_STRICT_CR in toke.c). However, this functionality was not extended to CR files, as it should have been, IMO. I think you're right. It's easier to move back and forth from Windows to Solaris than it is to move from one side of the Mac house to the other. This is undoubtedly broken, not just in perl, but on the Macintosh in general.

Well, I'd say it is only broken in perl because there is some support for it, but it is limited only to certain platforms. Otherwise I'd call it a woefully missing feature. I don't think it is, in the general case, broken on the Mac, however. They can't just abandon CR, and they shouldn't have stuck with CR instead of moving to LF. And CR itself wasn't broken to begin with. They really didn't have many options; that is to say, the brokenness we encounter because of the CR/LF differences is not indicative of a brokenness in the OS, but just an unfortunate confluence of events.

Personally, I think that Apple would be wise to move to the Unix standard for text files. It would take several releases of confusion to do it, but that would be better than carrying forward this schizophrenia to future OS generations.

It has moved to the Unix standard. Many apps, however, have not entirely made the adjustment.

While they're at it, they might drop file resource forks.

Again, they essentially have. They are still supported because, as with the CR issue, they cannot just abandon them. But most apps do not have them; instead, the resource data is in separate files inside the packages.
I don't imagine support for resource forks will be dropped any time soon, but resource forks aren't really used by new apps.

  [pudge@bourque]$ perl -MFile::Find -MMac::Files -e 'find(sub{my $f = $File::Find::name; return if ! -f || -l; my $catf = FSpGetCatInfo($_); printf "%s : %d\n", $f, $catf->ioFlRLgLen if $catf->ioFlRLgLen}, shift)' /Applications/

The above one-liner prints out the size of the resource fork in every file under /Applications/ (ioFlRLgLen is the logical length of the resource fork, while ioFlLgLen is the logical length of the data fork; the -s file test operator and other file utilities, in perl and in Unix, only display the data fork size, so it should always be the case that -s $f == $catf->ioFlLgLen). Out of all my apps in there, I got hits in maybe a dozen or so, and the only *Apple* apps were iMovie and DVD Player. It's fairly clear that resource forks are being used less, and I imagine Apple is discouraging their use, since they are no longer needed.

If Apple doesn't want to give up its own peculiar file formats, then they ought to fix their Unix so it handles Macintosh files sensibly.

Apple assumes -- for right or wrong -- that people who use the Unix side of things will be able to figure out how to deal with the resource forks, the newlines, etc. (with tools such as CpMac, ditto). Let's face it: the Unix user side of things is relatively minor in priority to most other things in the OS. And really, it should be that way: it is used relatively little and its users are smart enough to figure out workarounds. Life sucks sometimes. ;-)
Re: unix or mac-style text files?
On Mon, Nov 25, 2002 at 02:33:45PM +0100, Rafael Garcia-Suarez wrote: Chris Nandor [EMAIL PROTECTED] wrote: That shouldn't work. By the time you get to it in the script, if you have a #! line, then the entire script is one long comment, and the use() line won't ever be executed. That would be an argument for allowing -M/-m on the #! line. Er, except that the #! line would all have been read by then, and treated as a comment. Or have I got things confused? (that's 3 dots, perl.org smtp daemon) the kernel parses the -M line and invokes perl with those -M options. then perl runs and reaches the -M line again, and now we just need it not to complain like it currently does. I hoped it would be possible to hack round it in some way, relying on \r being whitespace, so that #!/usr/local/bin/perl -w -MFilter if 0; would behave as a no-op on a system with matching \n, and as #!/usr/local/bin/perl -w -MFilter if 0; on a system where \n and \r are transposed, but I can't make it work. Nicholas Clark PS I need to dig it out of the archives for a second time, but nothing came of my #! line \r\n protector that works on everything it was tested on (Linux, FreeBSD, Solaris - so hopefully all SysV, BSD* and Linux)
Re: unix or mac-style text files?
On Monday, November 25, 2002, at 02:09 PM, Chris Nandor wrote: In article [EMAIL PROTECTED], [EMAIL PROTECTED] (Slaven Rezic) wrote: Could this be made even more generic, by translating to \n instead of \012? Or use source filters: package Filter::Any2Unix; Any2Native?

When I had LinuxPPC on my Mac (when Mac OS X was still Rhapsody), I had written this little module to run my Perl scripts with Mac end-of-lines under Linux. Not pretty but still functional.

package Eol;
use Carp;

=head1 NAME

Eol - Perl module to execute scripts with foreign end-of-line character

=head1 SYNOPSIS

    perl -MEol foo.pl

=head1 DESCRIPTION

This module allows one to execute Perl programs with foreign end-of-line
characters. It's primarily intended to be executed on Unix systems, as it
must be passed in argument to C<perl>, but it may also work on Win32
systems.

=cut

if ($0 ne '-e') {
    my $script;
    READ: {
        local $/;
        open(FILE, $0) or croak "can't read file '$0': $!";
        $script = <FILE>;    # read whole file
        close(FILE);
    }
    $script =~ s@(\015\012|\015|\012)@$/@g;
    eval $script;
    print $@ if $@;
} else {
    carp "warning: module Eol can't be used on -e scripts";
}

1;
__END__

Sébastien Aperghis-Tramoni
-- - --- -- - -- - --- -- - --- -- - --[ http://maddingue.org ]
Re: unix or mac-style text files?
On Monday, November 25, 2002, at 08:50 AM, Chris Nandor wrote: While they're at it, they might drop file resource forks. Again, they essentially have. They are still supported because, as with the CR issue, they cannot just abandon them. But most apps do not have them; instead, the resource data is in separate files inside the packages. I don't imagine support for resource forks will be dropped any time soon, but resource forks aren't really used by new apps. ... Out of all my apps in there, I got hits in maybe a dozen or so, and the only *Apple* apps were iMovie and DVD Player. It's fairly clear that resource forks are being used less, and I imagine Apple is discouraging their use, since they are no longer needed. I believe that you will find Chris' explanation to be correct -- OS X does NOT use resource forks. It is only OS 9 compatibility which maintains their existence. T.T.F.N. William H. Magill # Beige G3 - Rev A motherboard # Flat-panel iMac (2.1) 800MHz - Super Drive - 768 Meg [EMAIL PROTECTED] [EMAIL PROTECTED] [EMAIL PROTECTED]
Re: unix or mac-style text files?
On Monday, November 25, 2002, at 03:27 PM, William H. Magill wrote: On Monday, November 25, 2002, at 08:50 AM, Chris Nandor wrote: While they're at it, they might drop file resource forks. Again, they essentially have. They are still supported because, as with the CR issue, they cannot just abandon them. But most apps do not have them; instead, the resource data is in separate files inside the packages. I don't imagine support for resource forks will be dropped any time soon, but resource forks aren't really used by new apps. ... Out of all my apps in there, I got hits in maybe a dozen or so, and the only *Apple* apps were iMovie and DVD Player. It's fairly clear that resource forks are being used less, and I imagine Apple is discouraging their use, since they are no longer needed. I believe that you will find Chris' explanation to be correct -- OS X does NOT use resource forks. It is only OS 9 compatibility which maintains their existence. OS X uses a resource fork every time it launches a CFM application so it can find the 'cfrg' resource in the app :) But Chris' point makes sense. Rob
Re: unix or mac-style text files?
At 13:51 + 25/11/02, Nicholas Clark wrote: On Mon, Nov 25, 2002 at 02:33:45PM +0100, Rafael Garcia-Suarez wrote: Chris Nandor [EMAIL PROTECTED] wrote: That shouldn't work. By the time you get to it in the script, if you have a #! line, then the entire script is one long comment, and the use() line won't ever be executed. That would be an argument for allowing -M/-m on the #! line. Er, except that the #! line would all have been read by then, and treated as a comment. Or have I got things confused?

But is there any reason the # comments are not terminated by the first occurrence of *either* \012 or \015? I can't see how this would affect any perl script, since presumably no unix script has a CR hidden in a comment (and similarly for Mac scripts and LF), and even for DOS, the CR will terminate the comment and the LF will be an irrelevant white space (comments can't be inside anything that is storing white space, right?) This would solve the #! commenting out the entire file issue, and allow the -M flag on the #! line to work.

Enjoy, Peter.
--
http://www.interarchy.com/ http://download.interarchy.com/
Re: unix or mac-style text files?
On Tuesday, November 26, 2002, at 12:49 PM, Peter N Lewis wrote: At 13:51 + 25/11/02, Nicholas Clark wrote: On Mon, Nov 25, 2002 at 02:33:45PM +0100, Rafael Garcia-Suarez wrote: Chris Nandor [EMAIL PROTECTED] wrote: That shouldn't work. By the time you get to it in the script, if you have a #! line, then the entire script is one long comment, and the use() line won't ever be executed. That would be an argument for allowing -M/-m on the #! line. Er, except that the #! line would all have been read by then, and treated as a comment. Or have I got things confused? But is there any reason the # comments are not terminated by the first occurrence of *either* \012 or \015? There's nothing perl can do about this - the OS (in fact, the kernel, I think) reads that shebang line in order to know it should call perl. By the time perl gets to look at it, it's too late. -Ken
Re: unix or mac-style text files?
On Monday, November 25, 2002, at 10:09 PM, Ken Williams wrote: There's nothing perl can do about this - the OS (in fact, the kernel, I think) reads that shebang line in order to know it should call perl. By the time perl gets to look at it, it's too late. Kernel not involved. Shell looks to determine with which application to launch executable. I kinda doubt the shell clips the first line and feeds only the later fragment to the executing file; I suspect Perl gets the file and can parse as it likes. Take care, Chris
Re: unix or mac-style text files?
On Tuesday, November 26, 2002, at 03:38 PM, Chris wrote: On Monday, November 25, 2002, at 10:09 PM, Ken Williams wrote: There's nothing perl can do about this - the OS (in fact, the kernel, I think) reads that shebang line in order to know it should call perl. By the time perl gets to look at it, it's too late. Kernel not involved. Shell looks to determine with which application to launch executable. The following source says otherwise, as do some knowledgeable unix geeks I've asked about it. http://www.faqs.org/faqs/unix-faq/faq/part3/section-16.html I kinda doubt the shell clips the first line and feeds only the later fragment to the executing file; I suspect Perl gets the file and can parse as it likes. True, perl gets the whole file, but before perl enters the picture at all, the kernel has to figure out whether to call perl, python, sh, or whatever. That's the process that we don't have the ability to correct inside perl. perl does indeed do some processing of the arguments in the shebang line - that's why it honors shebang switches (except ones it can't, like -T) even when you invoke it as perl filename.pl and the shebang mechanism isn't invoked. -Ken
Re: unix or mac-style text files?
At 15:09 +1100 26/11/02, Ken Williams wrote: On Tuesday, November 26, 2002, at 12:49 PM, Peter N Lewis wrote: But is there any reason the # comments are not terminated by the first occurrence of *either* \012 or \015? There's nothing perl can do about this - the OS (in fact, the kernel, I think) reads that shebang line in order to know it should call perl. By the time perl gets to look at it, it's too late.

Ahh, yes, good point. Except - the shell reads the file and executes the program that is the first word after the #!, so perl will indeed get called for a file with the wrong line endings, although it might get called with the entire file inserted into the ARGV. But then perl does all sorts of wacky emulation at that point anyway, so don't ask me what goes on there, I couldn't figure it out.

For example, a file containing

  #!/bin/ps auxw

when executed does the ps with those flags. A file containing

  #!/bin/echo -e 'foreach (@ARGV) { print "$_\n"; }'
  blah blah blah

displays 'foreach (@ARGV) { print "$_\n"; }' But a file containing

  #!/usr/bin/perl -e 'foreach (@ARGV) { print "$_\n"; }'
  foreach (@ARGV) { print "$_\n"; }

displays nothing, presumably because of perl doing some wacky emulation on the command line.

Enjoy, Peter.
mea cupla (was Re: unix or mac-style text files?)
On Monday, November 25, 2002, at 11:20 PM, Ken Williams wrote: On Tuesday, November 26, 2002, at 03:38 PM, Chris wrote: On Monday, November 25, 2002, at 10:09 PM, Ken Williams wrote: There's nothing perl can do about this - the OS (in fact, the kernel, I think) reads that shebang line in order to know it should call perl. By the time perl gets to look at it, it's too late. Kernel not involved. Shell looks to determine with which application to launch executable. The following source says otherwise, as do some knowledgeable unix geeks I've asked ... I appear to have grossly misunderstood discussion on a different list regarding what the kernel did vs what the shell did. I apologize for worsening the S/N ratio here. Mea cupla! --Chris
Re: mea cupla (was Re: unix or mac-style text files?)
On Tuesday, November 26, 2002, at 04:40 PM, Chris wrote: On Monday, November 25, 2002, at 11:20 PM, Ken Williams wrote: On Tuesday, November 26, 2002, at 03:38 PM, Chris wrote: On Monday, November 25, 2002, at 10:09 PM, Ken Williams wrote: There's nothing perl can do about this - the OS (in fact, the kernel, I think) reads that shebang line in order to know it should call perl. By the time perl gets to look at it, it's too late. Kernel not involved. Shell looks to determine with which application to launch executable. The following source says otherwise, as do some knowledgeable unix geeks I've asked ... I appear to have grossly misunderstood discussion on a different list regarding what the kernel did vs what the shell did. I apologize for worsening the S/N ratio here. Mea cupla! No problem, this is the way people learn. Also, the situation is somewhat blurry for historical reasons - the first support for shebang lines *was* in shells, but it's much better to do it at the system level (see a 1980 message from Dennis Ritchie: http://www.uni-ulm.de/~s_smasch/various/shebang/sys1.c.html), so current shells usually don't do this. On OS X, when I look at 'man tcsh' (most OS X users' default shell) and search for '#!', I see some discussion of it. The shell *can* be compiled with the 'hb' option to emulate the kernel's shebang processing, but on OS X as Apple ships it, this option is not activated. So you've got some good reasons for being confused. ;-) [Note that I've trimmed p5p from the recipient list, since most people there probably already know this stuff...] -Ken
Re: mea cupla (was Re: unix or mac-style text files?)
On Tuesday, November 26, 2002, at 01:38 AM, Ken Williams wrote: On Tuesday, November 26, 2002, at 04:40 PM, Chris wrote: I appear to have grossly misunderstood discussion on a different list regarding what the kernel did vs what the shell did. I apologize for worsening the S/N ratio here. Mea cupla! No problem, this is the way people learn. Also, the situation is somewhat blurry for historical reasons - the first support for shebang lines *was* in shells, but it's much better to do it at the system level (see a 1980 message from Dennis Ritchie: http://www.uni-ulm.de/~s_smasch/various/shebang/sys1.c.html), so current shells usually don't do this. On OS X, when I look at 'man tcsh' (most OS X users' default shell) and search for '#!', I see some discussion of it. The shell *can* be compiled with the 'hb' option to emulate the kernel's shebang processing, but on OS X as Apple ships it, this option is not activated.

Use: http://www.uni-ulm.de/~s_smasch/various/shebang/ as the original URL does not work... Also the man page for execve describes what is happening. One hopes that Darwin does not have the old 32-character limit from Tahoe! (BSD 4.2) [See the footnotes to the table for the 3 possible events when the line is too long.] The page is quite informative... Thanks for the pointer! [although the table is not complete by any means, it brings back memories of LONG nights caused by long path names.]

T.T.F.N. William H. Magill
Re: unix or mac-style text files?
In article [EMAIL PROTECTED], [EMAIL PROTECTED] (Wiggins D'Anconia) wrote: There is some discussion of this issue in the docs, check out: perldoc perlport

Note that perlport does not discuss this issue -- executing a non-native text file with perl -- at all, really.

I guess the real question I have is does Perl on OS X qualify as MacPerl or Unix perl ... I defer to the mac os x experts, but would guess Unix perl.

MacPerl is perl for Mac OS. Mac OS X is not Mac OS; they are two different operating systems. perl for Mac OS (MacPerl) uses Mac newlines; perl for Mac OS X (Unix perl) uses Unix newlines.

But back to the point: there's been some discussion in this thread on workarounds, but my personal feeling is that this is a bug, or at best a broken feature, in perl. Some time ago, the capability was added to perl to recognize and filter CRLF files to work on Unix and LF to work on Windows (grep for PERL_STRICT_CR in toke.c). However, this functionality was not extended to CR files, as it should have been, IMO. OK, so I am a little bitter about it.

The last discussion about how to deal with this was on p5p in July: http://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/2002-07/msg00871.html The bottom line was that it'd be nice to have a PerlIO filter for perl 5.8.x, so that MacPerl can execute Unix and Windows text files, and Mac OS X perl can execute Mac OS text files, etc. Patches are surely welcome! :-)
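For the curious, a 5.8-style PerlIO filter along the lines Chris describes might look roughly like this untested sketch using the PerlIO::via API. The package name PerlIO::via::AnyEOL is hypothetical, and a real patch would need to hook the layer onto the filehandle perl reads the script from, not just onto files the user opens:

```perl
package PerlIO::via::AnyEOL;
use strict;

# Called when the layer is pushed onto a handle; return any object.
sub PUSHED { my ($class) = @_; bless \(my $self), $class }

# FILL supplies data to the layer above.  Here we slurp what the
# lower layer has and map any CRLF/CR/LF sequence to the native "\n".
sub FILL {
    my ($self, $fh) = @_;
    local $/;                 # read everything available at once
    my $data = <$fh>;
    return undef unless defined $data;
    $data =~ s/\015\012|\015|\012/\n/g;
    return $data;
}

1;

# usage (hypothetical):
#   open my $fh, '<:via(AnyEOL)', 'script.pl' or die $!;
```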
Re: unix or mac-style text files?
On Monday, Nov 25, 2002, at 01:05 Asia/Tokyo, Chris Nandor wrote: The bottom line was that it'd be nice to have a PerlIO filter for perl 5.8.x, so that MacPerl can execute Unix and Windows text files, and Mac OS X perl can execute Mac OS text files, etc. Patches are surely welcome! :-) One good question may be how to handle newlines in heretext, the only part that really matters because that's the only exception to the fact that newlines are nothing but whitespace from perl compiler's point of view -- oops, shebang is another. When you feed MacPerl *.pl to MacOS X, should linefeeds in heretext emit \015 or \012? I am not sure which is lazier -- to simply apply

# Any -> Unix
perl -i.bak -ple 's/\015\012|\015|\012/\012/g' *.pl
# Any -> Mac
perl -i.bak -ple 's/\015\012|\015|\012/\015/g' *.pl

or teach camel the same trick. Dan the Man with Too Many Kinds of Line Endings to Deal With
Re: unix or mac-style text files?
Administrivia question: I'm getting a lot of duplicate responses because the Reply-to on the list is set to sender. On moderated lists, this can be a good idea because the approval cycle causes a lag between posting and mail reflection. Is the Reply-to merely a hint that we should consider taking topics offline, or is there some reason I should be leaving redundant addresses in the headers? At 11:05 AM 11/24/2002 -0500, Chris Nandor wrote: But back to the point: there's been some discussion in this thread on workarounds, but my personal feeling is that this is a bug, or at best a broken feature, in perl. Some time ago, the capability was added to perl to recognize and filter CRLF files to work on Unix and LF to work on Windows (grep for PERL_STRICT_CR in toke.c). However, this functionality was not extended to CR files, as it should have been, IMO. I think you're right. It's easier to move back and forth from Windows to Solaris than it is to move from one side of the Mac house to the other. This is undoubtedly broken, not just in perl, but on the Macintosh in general. Personally, I think that Apple would be wise to move to the Unix standard for text files. It would take several releases of confusion to do it, but that would be better than carrying forward this schizophrenia to future OS generations. The text file issue is one among many that make the Mac look like a machine running two independent operating systems (you can get this effect with Linux and Windows, without the confusion of thinking that you're running on a single integrated system). The right half of its brain does not know what the left hand is doing. While they're at it, they might drop file resource forks. The Unix side of the house quietly drops them in any file manipulation, but most Mac-native applications depend on them. If Apple doesn't want to give up its own peculiar file formats, then they ought to fix their Unix so it handles Macintosh files sensibly.
Heather Madrone ([EMAIL PROTECTED]) http://www.madrone.com Reality: deeper than I dreamed.
Re: unix or mac-style text files?
On Sunday, November 24, 2002, at 03:34 PM, Heather Madrone wrote: If Apple doesn't want to give up its own peculiar file formats, then they ought to fix their Unix so it handles Macintosh files sensibly. In the /Developer/Tools directory, you will find CpMac and MvMac -- these are the standard Unix cp and mv tools modified to deal with resource forks. ... they do have man pages. There are a number of other similar tools in that directory. Another useful tool is /usr/bin/ditto. T.T.F.N. William H. Magill # Beige G3 - Rev A motherboard # Flat-panel iMac (2.1) 800MHz - Super Drive - 768 Meg [EMAIL PROTECTED] [EMAIL PROTECTED] [EMAIL PROTECTED]
Re: unix or mac-style text files?
In article [EMAIL PROTECTED], [EMAIL PROTECTED] (Dan Kogai) wrote: On Monday, Nov 25, 2002, at 01:05 Asia/Tokyo, Chris Nandor wrote: The bottom line was that it'd be nice to have a PerlIO filter for perl 5.8.x, so that MacPerl can execute Unix and Windows text files, and Mac OS X perl can execute Mac OS text files, etc. Patches are surely welcome! :-) One good question may be how to handle newlines in heretext, the only part that really matters because that's the only exception to the fact that newlines are nothing but whitespace from perl compiler's point of view -- oops, shebang is another. When you feed MacPerl *.pl to MacOS X, should linefeeds in heretext emit \015 or \012? I am talking here about taking (for example) a perl program with Mac OS newlines, and making it run under Unix perl. In order for that to happen, you need to translate all the CRs to LFs. That would include the CRs in the heretext, as well as in every literal string. I am not sure which is lazier -- to simply apply

# Any -> Unix
perl -i.bak -ple 's/\015\012|\015|\012/\012/g' *.pl
# Any -> Mac
perl -i.bak -ple 's/\015\012|\015|\012/\015/g' *.pl

or teach camel the same trick. One of the main points of this is that some people will want the same files to be used in more than one context, such as sharing code between Windows and Unix perl over NFS, or sharing code between perl on Mac OS X and MacPerl under Mac OS or Classic. Right now, the only solution is to make copies, as you suggest. -- Chris Nandor [EMAIL PROTECTED] http://pudge.net/ Open Source Development Network [EMAIL PROTECTED] http://osdn.com/
Re: unix or mac-style text files?
On Monday, November 25, 2002, at 07:34 AM, Heather Madrone wrote: Administrivia question: I'm getting a lot of duplicate responses because the Reply-to on the list is set to sender. On moderated lists, this can be a good idea because the approval cycle causes a lag between posting and mail reflection. Is the Reply-to merely a hint that we should consider taking topics offline, or is there some reason I should be leaving redundant addresses in the headers? The extra copies are more for your convenience - I appreciate when people send them to me, because one copy goes to my list mailbox and the other goes to my inbox. The one in my inbox will be read faster. I wish there were a standard way to indicate in your own mail headers "I do/don't wish to receive a direct copy of replies to this message." This can be done on usenet pretty effectively, but not really in email lists. -Ken
Re: unix or mac-style text files?
At 4:30 pm -0800 19/11/02, Heather Madrone wrote: Is perl on the Mac going to care whether source files are Mac-style or Unix-style? Is it going to have difficulty reading and operating on either kind of file? What kind of text files will it write? You can do a routine like the one below to discover what line endings are used in the file and set $/ accordingly. This script establishes, by reading 1000 bytes, that a Eudora mailbox file uses carriage returns only, changes $/ to cr and loops through the file printing all the From: lines terminated with line feeds.

#!/usr/bin/perl
use Fcntl;   # supplies the O_RDONLY constant
$f = "$ENV{HOME}/Documents/Eudora Folder/Mail Folder/Manningham";
sysopen F, $f, O_RDONLY;
sysread F, $_, 1000;
if    (/\015\012/) { $/ = "\015\012"; }
elsif (/\015/)     { $/ = "\015"; }
else               { $/ = "\012"; }
open F, $f;
for (<F>) { /^From: / and chomp and print "$_\n" }

-- JD
Re: unix or mac-style text files?
On Tue, 19 Nov 2002, Heather Madrone wrote: I've already encountered a few text file anomalies on OS X. Most GUI applications seem to default to Mac-style text files (linefeeds only), but shell programs such as vi do not handle Mac-style text files gracefully. If vi fails, try vim -- it handles line endings very gracefully, and converting among different formats is just a matter of: :set fileformat=unix :set fileformat=mac :set fileformat=dos vim rox :) -- Chris Devers[EMAIL PROTECTED] Q: How does a hacker fix a function which doesn't work for all of the elements in its domain? A: He changes the domain.
Re: unix or mac-style text files?
At 13:22 + 20/11/02, John Delacour wrote: if (/\015\012/) { $/ = "\015\012"; } elsif (/\015/) { $/ = "\015"; } else { $/ = "\012"; } You can do this with one regular expression which will pick up the first line ending:

$/ = /(\015\012|\015|\012)/ ? $1 : "\n";

Note that because Perl picks the first match location, and after that picks the first alternative of an or (|) set, it will find the first location, and will find the \015\012 if it is there in preference to the \015 by itself. Enjoy, Peter. -- http://www.interarchy.com/ http://download.interarchy.com/
Re: unix or mac-style text files?
There is some discussion of this issue in the docs, check out: perldoc perlport And page through to a section called Newlines... I guess the real question I have is does Perl on OS X qualify as MacPerl or Unix perl ... I defer to the mac os x experts, but would guess Unix perl. http://danconia.org Heather Madrone wrote: I've already encountered a few text file anomalies on OS X. Most GUI applications seem to default to Mac-style text files (linefeeds only), but shell programs such as vi do not handle Mac-style text files gracefully. Is perl on the Mac going to care whether source files are Mac-style or Unix-style? Is it going to have difficulty reading and operating on either kind of file? What kind of text files will it write? Thanks in advance for any illumination. -hmm [EMAIL PROTECTED]
Re: unix or mac-style text files?
On Wednesday, November 20, 2002, at 01:45 AM, Wiggins d'Anconia wrote: Heather Madrone wrote: I've already encountered a few text file anomalies on OS X. Most GUI applications seem to default to Mac-style text files (linefeeds only), but shell programs such as vi do not handle Mac-style text files gracefully. Is perl on the Mac going to care whether source files are Mac-style or Unix-style? Is it going to have difficulty reading and operating on either kind of file? What kind of text files will it write? Thanks in advance for any illumination. -hmm [EMAIL PROTECTED] There is some discussion of this issue in the docs, check out: perldoc perlport And page through to a section called Newlines... I guess the real question I have is does Perl on OS X qualify as MacPerl or Unix perl ... I defer to the mac os x experts, but would guess Unix perl. Yes, Unix perl. Of course, perl of any sort can read or write text (or non-text) files of any sort. It's just that the default line endings differ on different platforms, in the interest of convenience. -Ken
Re: unix or mac-style text files?
At 16:30 -0800 11/19/02, Heather Madrone wrote: I've already encountered a few text file anomalies on OS X. Most GUI applications seem to default to Mac-style text files (linefeeds only), I think that's returns only for Mac style. Don't be fooled by MPW's and perhaps MacPerl's redefinition of \n and \r in the reverse sense from the rest of the world. I recommend use of linefeed only - ASCII 10 - for all future work in perl. BBEdit has no trouble with that. If you're even a little bit involved with moving perl scripts to some UNIX server where your web pages are based you'll find that your scripts move effortlessly with any kind of file transfer. OT The internet norm is a linefeed-return pair which is really strange because in the days of teletype one sent the return first because it took longer than the linefeed and one needed a few null characters to be sure the operation completed at 100 baud. Of course you had the option of sending the return only and repeating the line to get a bold appearance. If you don't believe that look at a UNIX man page with repeated characters and backspaces. /OT -- -- There are 10 kinds of people: those who understand binary, and those who don't --
Re: unix or mac-style text files?
At 05:55 PM 11/19/2002 -0700, Doug McNutt wrote: At 16:30 -0800 11/19/02, Heather Madrone wrote: I've already encountered a few text file anomalies on OS X. Most GUI applications seem to default to Mac-style text files (linefeeds only), I think that's returns only for Mac style. Don't be fooled by MPW's and perhaps MacPerl's redefinition of \n and \r in the reverse sense from the rest of the world. Yes. ASCII 13. \015. ^M. I recommend use of linefeed only - ASCII 10 - for all future work in perl. BBEdit has no trouble with that. If you're even a little bit involved with moving perl scripts to some UNIX server where your web pages are based you'll find that your scripts move effortlessly with any kind of file transfer. Makes sense to me. The input files are more problematic. I don't necessarily know whether they will be created on the Mac side of the house or the Unix side of the house. It sounds like I'm going to have to replace $FileHandle->getline with something that can handle either kind of line break. -- There are 10 kinds of people: those who understand binary, and those who don't -- This is one of my favorite jokes. Heather Madrone ([EMAIL PROTECTED]) http://www.madrone.com Reality: deeper than I dreamed.
Re: unix or mac-style text files?
At 16:30 -0800 19/11/02, Heather Madrone wrote: I've already encountered a few text file anomalies on OS X. Most GUI applications seem to default to Mac-style text files (linefeeds only), but shell programs such as vi do not handle Mac-style text files gracefully. Is perl on the Mac going to care whether source files are Mac-style or Unix-style? Is it going to have difficulty reading and operating on either kind of file? What kind of text files will it write? Thanks in advance for any illumination. Definitely read the perlport section of the documentation at: http://www.perldoc.com/perl5.6.1/pod/perlport.html Traditionally on Mac OS, line endings have been carriage return (cr) only. Unix uses just linefeed (lf) line endings. DOS/Windows uses carriage return-linefeed (crlf) line endings. Under Mac OS X, it is quite schizophrenic - some applications will handle only Mac line endings, some applications handle only Unix line endings, some applications will handle Unix or Mac (or even DOS) line endings. Ignoring MacPerl (running under Mac OS X), and looking only at Mac OS X's /usr/bin/perl (or wherever you've installed perl), which is a Unix perl, not a Mac perl, we have: Perl source files must have Unix line endings (lf only). If the source file has Mac line endings, then it will usually run and do absolutely nothing (if you run it as perl script.pl), or it will complain "script.pl: Command not found." if you run it as ./script.pl. This is because the first line is #!/usr/bin/perl - but after that the cr is not a line ending and so the entire source file appears as a single line. If you run it with perl, then it will ignore the entire file as a comment. If you run it yourself, then it will try to use the entire file as a command and won't be able to find /usr/bin/perl<cr><cr>use (for example) as a command to run. By default, Perl will read and write Unix line ending files.
You can change the input separator with $/ = "\r" for Mac line endings, "\r\n" for DOS line endings (and back to "\n" for Unix, although saving and restoring is better practice). You can change the output by just printing the appropriate line ending. In this case, a nice practice might be to do:

our $eol = "\015\012";   # Windows line ending
print "First Line$eol";

My suggestion for Mac OS X users is to switch to using Unix line endings as soon as possible, and wherever possible support reading files with any line ending. One simple thing I almost always do is:

while (<>) {
    s/\015?\012$//;   # instead of chomp
}

Yes, chomp is probably faster, but most of the time it makes no difference. Note that the above code will not help you with Mac files because the <> will read the entire file in one go :-( It's really unfortunate that there is no special case value for $/ that handles \015\012|\015|\012 as a line ending. There is talk of making $/ a regex which would allow that, but that's huge overkill just to handle this one particular very special case. An alternative is to read the entire file in (undef $/) and then split it:

local( $/ ) = undef;
my $file = <>;   # read in entire file
my @lines = split( /\015\012|\015|\012/, $file );
foreach my $line (@lines) {
    print "'$line'\n";
}

Which is ok, but not great for big files. Enjoy, Peter. -- http://www.interarchy.com/ http://download.interarchy.com/
Re: unix or mac-style text files?
An alternative is to read the entire file in (undef $/) and then split it: My suggestion is to put some code like this in your script:

local $/ = get_line_ending($fh);

sub get_line_ending {
    my $fh = shift;
    my $char;
    while (read $fh, $char, 1) {
        if ($char eq "\n") {
            seek $fh, 0, 0;
            return "\n";
        }
        elsif ($char eq "\r") {
            if (read $fh, $char, 1 and $char eq "\n") {
                seek $fh, 0, 0;
                return "\r\n";
            }
            else {
                seek $fh, 0, 0;
                return "\r";
            }
        }
    }
    ## what, no line ending?
    ## return a reasonable default
    seek $fh, 0, 0;
    return "\n";
}

This, of course, assumes that you don't have some oddball case where you have \r's in a unix file or something like that, but if you're dealing with text files (which is the only place where line endings should matter), that's unlikely. Suggestions for the above code: Move the sub into a module. Put a byte counter in, so that you're not reading through a 5 Gig file looking for a line ending. I assume it's more efficient to read small chunks of bytes rather than byte by byte. For most text files this shouldn't matter, but you may want to alter the reads and also the comparisons if you care.
Re: unix or mac-style text files?
At 19:01 -0800 19/11/02, gene wrote: An alternative is to read the entire file in (undef $/) and then split it: My suggestion is to put some code like this in your script: It's a good solution. Probably for files less than a few hundred k it makes no difference (since you'll need to read the entire file anyway, until the memory usage of storing the whole thing becomes an issue it won't affect anything). For portability, you should use \012 and \015 explicitly, except for the final default value which should be "\n". Here is the code, with the fail counter added to avoid it reading forever in a file with no line endings (not that it is likely to help anyway since you'll presumably follow this up with reading a line...)

# Usage: local $/ = get_line_ending($fh);
# By gene
sub get_line_ending {
    my ($fh) = @_;
    my $failcount = 33000;
    my $char;
    while (read $fh, $char, 1 and $failcount-- > 0) {
        if ($char eq "\012") {
            seek $fh, 0, 0;
            return "\012";
        }
        elsif ($char eq "\015") {
            if (read $fh, $char, 1 and $char eq "\012") {
                seek $fh, 0, 0;
                return "\015\012";
            }
            else {
                seek $fh, 0, 0;
                return "\015";
            }
        }
    }
    ## what, no line ending?
    ## return a reasonable default
    seek $fh, 0, 0;
    return "\n";
}

Suggestions for the above code: Move the sub into a module. I have ;-). Whether it's worth publishing a CPAN module, I don't know. Perhaps adding it to some existing module? I assume it's more efficient to read small chunks of bytes rather than byte by byte. For most text files this shouldn't matter, but you may want to alter the reads and also the comparisons if you care. It would require some timing to figure out whether reading a block of characters would be better; possibly something like: read 256 characters, look for the first \012 or \015 and see what's up (being careful not to accept a \015 as the 256th character as an answer), then try again with a larger read. That might be more efficient, but then again, possibly not.
It would depend on a lot of things and might vary from OS to OS, so it's probably not worth worrying too much about. Enjoy, Peter. -- http://www.interarchy.com/ http://download.interarchy.com/