Re: Regex and Mac vs UNIX line endings
At 19:25 +0200 20/7/06, kurtz le pirate wrote:

> hum... is the 'end of line' character important? If not, you can do something like this:
>
>     while (<FILE>) {
>         chomp;
>         if (/.../) { ... }
>     }
>
> yes? no?

Not really, because if the file has Mac line endings, then that will read the entire file in a single gulp. Also, if the file has DOS line endings, then the chomp will remove only the linefeed (unless you have changed $/ to CRLF, in which case it will not remove a lone linefeed).

If you first check the file and determine the line endings (and the file has consistent line endings, which is not always the case) and set $/ appropriately, then what you suggest will work.

Enjoy,
   Peter.

-- 
Check out Interarchy 8.1.1, just released, now with Amazon S3 support.
http://www.stairways.com/  http://download.stairways.com/
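A minimal sketch of Peter's two points, assuming the line ending is already known (it is hard-coded here) and using an in-memory filehandle in place of a real file. LF is passed explicitly to stand in for the default $/ on Mac OS X; `read_lines` is a hypothetical helper, not part of any library.

```perl
use strict;
use warnings;

# Hypothetical helper: read every record from $text using $eol as the
# record separator ($/), chomping each one. An in-memory filehandle
# stands in for a real file.
sub read_lines {
    my ($text, $eol) = @_;
    open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
    binmode $fh;              # no CRLF translation: see the raw bytes
    local $/ = $eol;          # chomp removes whatever $/ holds
    my @lines;
    while (my $line = <$fh>) {
        chomp $line;
        push @lines, $line;
    }
    close $fh;
    return @lines;
}

my $dos = "first\015\012second\015\012";   # DOS (CRLF) endings
my $mac = "first\015second\015";           # classic Mac (CR) endings

my @dos_default = read_lines($dos, "\012");      # "first\r", "second\r": CR survives chomp
my @dos_crlf    = read_lines($dos, "\015\012");  # "first", "second"
my @mac_default = read_lines($mac, "\012");      # one record: the whole string
my @mac_cr      = read_lines($mac, "\015");      # "first", "second"

print scalar(@mac_default), " record(s) with \$/ = LF\n";   # 1 record(s) with $/ = LF
```

Once $/ matches the file, the `while (<FILE>) { chomp; ... }` loop behaves as intended; the hard part is knowing which ending to use, and a file with mixed endings defeats any single choice.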
Re: Regex and Mac vs UNIX line endings
> I'm processing a string with embedded newlines. For testing I was storing the text in __DATA__ and slurping it into a string. This works fine. However, when I read in a file, I'm having trouble with the line endings. Matching the beginning/end of logical lines is not working as I expect. Regexes like the one below match when using the DATA filehandle, but don't when opening other text files on my Mac.
>
>     $text =~ s/^Text to match.*$//m;
>
> Is this due to UNIX '\n' vs. Mac '\r' line endings? I assumed the 'm' modifier would recognize any line ending. Oh what to do?

You have several possibilities, depending on what you are trying to do. You could explicitly match either line ending, as in:

    $text =~ s/(\012|\015|\A)Text to match[^\012\015]*(\012|\015|\z)/$1$2/;

or use lookbehind/lookahead assertions:

    $text =~ s/(?:\A|(?<=\012|\015))Text to match[^\012\015]*(?=\012|\015|\z)//;

(the convoluted lookbehind is required because lookbehind assertions must be fixed length, so the zero-width \A cannot simply be added as another branch inside it)

Or you could convert $text to \n line endings first:

    $text =~ s/(\015\012|\012|\015)/\n/g;
    $text =~ s/^Text to match.*$//m;

Or you could detect the line ending and explicitly use it.

Enjoy,
   Peter.
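The two regex-only approaches can be checked on a small string. This sketch assumes a made-up sample with classic Mac (CR) endings; the backward assertion is written as the lookbehind `(?<=\012|\015)`, which is what it must be for the pattern to match at all. Variable names are illustrative only.

```perl
use strict;
use warnings;

# Made-up sample text with classic Mac (CR) line endings.
my $sample = "keep this\015Text to match and the rest\015keep that\015";

# Approach 1: capture whichever line ending (or string edge) surrounds the
# line, and put it back so the neighbouring lines stay separated.
(my $v1 = $sample) =~ s/(\012|\015|\A)Text to match[^\012\015]*(\012|\015|\z)/$1$2/;

# Approach 2: zero-width assertions, so nothing outside the line is consumed.
# \A sits outside the lookbehind because a lookbehind whose branches have
# different lengths is not allowed.
(my $v2 = $sample) =~ s/(?:\A|(?<=\012|\015))Text to match[^\012\015]*(?=\012|\015|\z)//;

# Show the CRs as \r so the result is readable.
(my $show = $v1) =~ s/\015/\\r/g;
print "$show\n";   # keep this\r\rkeep that\r
```

Both forms empty the matching line but leave its line ending in place, just as `s/^Text to match.*$//m` does on \n-terminated text.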
Re: Regex and Mac vs UNIX line endings
Peter gave some good examples, so I shortened this to supplement his suggestions.

I prefer to determine what the end-of-line (eol) character is using something less slippery than \r and \n. In Perl, \n is the native eol for the OS that Perl is running under, so depending on the platform it may stand for LF, CR, or (through I/O translation on Windows) CRLF. Instead, use the octal characters, which for this are:

    Mac (classic)       CR    (Carriage Return)   \015
    UNIX, Linux, VMS    LF    (Line Feed)         \012
    Win                 CRLF                      \015\012

BTW, many apps in Mac OS X (Excel, FileMaker Pro) continue to use the eol used in OS 9 and before (CR), not the UNIX eol (LF).

Here's my favorite way to find the eol and convert it to native, no matter what's in the original file (at least for the popular OSes):

    $text =~ s/(\015?\012|\015)/\n/gs;

You could also specify what you want, if that isn't simply the native eol:

    my $new_eol = "\015";   # or "\012" or "\015\012"
    $text =~ s/(\015?\012|\015)/$new_eol/gs;

If the file is large, then you may need to use a heuristic (that is, test some of the text to detect a pattern), as Doug suggests: test the first x characters of the file to find one of the above eol constructs, see whether it shows up again, and then back up and process the whole file. Or use the lookahead/lookbehind approaches that Peter suggests.

1;
- Bruce

__bruce__van_allen__santa_cruz__ca__
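Bruce's conversion can be wrapped in a small function that takes the target ending as a double-quoted string. This is a sketch; `convert_eol` is a hypothetical name, and the sample string is made up to mix all three conventions.

```perl
use strict;
use warnings;

# Hypothetical helper: rewrite every line ending in $text (CRLF, lone LF,
# or lone CR) as $new_eol.
sub convert_eol {
    my ($text, $new_eol) = @_;
    # \015?\012 is tried first, so a CRLF pair becomes one ending, not two.
    $text =~ s/\015?\012|\015/$new_eol/g;
    return $text;
}

# A made-up string mixing DOS, UNIX and classic Mac endings.
my $mixed = "dos\015\012unix\012mac\015end";

my $as_lf   = convert_eol($mixed, "\012");       # "dos\012unix\012mac\012end"
my $as_crlf = convert_eol($mixed, "\015\012");   # every ending becomes CRLF
```

The capture group and the /s flag are dropped here because the pattern contains no `.` and the matched text is not reused; the result is the same.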
Re: Regex and Mac vs UNIX line endings
In article [EMAIL PROTECTED], [EMAIL PROTECTED] (Andrew Brosnan) wrote:

> I'm processing a string with embedded newlines. For testing I was storing the text in __DATA__ and slurping it into a string. This works fine. However, when I read in a file, I'm having trouble with the line endings. Matching the beginning/end of logical lines is not working as I expect. Regexes like the one below match when using the DATA filehandle, but don't when opening other text files on my Mac.
>
>     $text =~ s/^Text to match.*$//m;
>
> Is this due to UNIX '\n' vs. Mac '\r' line endings? I assumed the 'm' modifier would recognize any line ending. Oh what to do?
>
> Andrew

hum... is the 'end of line' character important? If not, you can do something like this:

    while (<FILE>) {
        chomp;
        if (/.../) { ... }
    }

yes? no?

-- 
klp
Solved - Re: Regex and Mac vs UNIX line endings
All set with this. Converting the line endings worked fine. Thanks.

Andrew
Regex and Mac vs UNIX line endings
I'm processing a string with embedded newlines. For testing I was storing the text in __DATA__ and slurping it into a string. This works fine. However, when I read in a file, I'm having trouble with the line endings. Matching the beginning/end of logical lines is not working as I expect. Regexes like the one below match when using the DATA filehandle, but don't when opening other text files on my Mac.

    $text =~ s/^Text to match.*$//m;

Is this due to UNIX '\n' vs. Mac '\r' line endings? I assumed the 'm' modifier would recognize any line ending. Oh what to do?

Andrew
Re: Regex and Mac vs UNIX line endings
Andrew Brosnan wrote:

> I'm processing a string with embedded newlines. For testing I was storing the text in __DATA__ and slurping it into a string. This works fine. However, when I read in a file, I'm having trouble with the line endings. Matching the beginning/end of logical lines is not working as I expect. Regexes like the one below match when using the DATA filehandle, but don't when opening other text files on my Mac.
>
>     $text =~ s/^Text to match.*$//m;
>
> Is this due to UNIX '\n' vs. Mac '\r' line endings? I assumed the 'm' modifier would recognize any line ending. Oh what to do?
>
> Andrew

What version of the Mac? Anything in the OS X family is Unix and uses the standard \n line ending/newline. If you brought the files over, then yes, you are going to have the '\r' line ending.

:Robert
Re: Regex and Mac vs UNIX line endings
On 7/19/06 at 9:51 PM, [EMAIL PROTECTED] (Robert Hicks) wrote:

> What version of the Mac?

10.3.9

> Anything in the OS X family is Unix and uses the standard \n line ending

I don't think that is the case. These are text files created on 10.3.9 and they use \r for line endings. The problem is that /^.*$/ won't match lines ending with \r, even with the m modifier.

Andrew
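A quick check of Andrew's observation, assuming a Perl where "\n" is LF (as on Mac OS X): under /m, ^ and $ match only at the string edges and around "\n", so CR-only text is one long logical line until its endings are converted. The sample text is made up.

```perl
use strict;
use warnings;

# Made-up text saved with classic Mac (CR) line endings.
my $text = "Header\015Text to match here\015Footer\015";

# CR is not a line boundary for /m, so ^ never lands before "Text".
my $before = ($text =~ /^Text to match.*$/m) ? 1 : 0;

# Normalize every ending (CRLF, LF or lone CR) to "\n", then retry.
(my $fixed = $text) =~ s/\015\012|\012|\015/\n/g;
my $after = ($fixed =~ /^Text to match.*$/m) ? 1 : 0;

print "before: $before, after: $after\n";   # before: 0, after: 1
```

Note also that `.` matches CR, so on unconverted CR text a `.*` runs straight across what are visually several lines.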
Re: Regex and Mac vs UNIX line endings
If you want to adjust the line ends in the files, have a look at:

ftp://ftp.macnauchtan.com/Software/LineEnds/FixEndsFolder.sit  (52 kB)
ftp://ftp.macnauchtan.com/Software/LineEnds/ReadMe_fixends.txt  (4 kB)

Yeah. It's pretty easy in Perl too. I have on occasion read the first few hundred characters of a file and then searched for \n and \r and \r\n. From that I make a guess, and reopen the file for line-by-line reading after setting $/ to what I found.

If you slurp in the whole string you can play with:

    my @option1 = split /\n/, $thedata;
    my @option2 = split /\r/, $thedata;

Which option has the most elements?

    split /(\r|\n)/, $thedata;   # is an idea I just had. I wonder?

-- 
Science is the business of discovering and codifying the rules and methods employed by the Intelligent Designer. Religions provide myths to mollify the anxiety experienced by those who choose not to participate.
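Doug's heuristic (sample the start of the file, guess the ending, then read line by line with $/ set) might look like this. It is a sketch under stated assumptions: `guess_eol` and `read_lines_guessing_eol` are hypothetical names, the 500-character sample size is arbitrary, and the file is rewound with `seek` rather than reopened.

```perl
use strict;
use warnings;

# Hypothetical helper: guess the line ending from a sample of text by
# counting CRLF pairs, lone CRs and lone LFs, and picking the most common.
sub guess_eol {
    my ($sample) = @_;
    my $crlf = () = $sample =~ /\015\012/g;
    my $cr   = () = $sample =~ /\015(?!\012)/g;   # CR not followed by LF
    my $lf   = () = $sample =~ /(?<!\015)\012/g;  # LF not preceded by CR
    return "\015\012" if $crlf > 0 && $crlf >= $cr && $crlf >= $lf;
    return "\015"     if $cr > 0 && $cr >= $lf;
    return "\012";    # LF, also the fallback when no ending was seen
}

# Hypothetical helper: read a file (or a scalar reference, for testing)
# line by line after guessing the ending from its first 500 characters.
sub read_lines_guessing_eol {
    my ($source) = @_;
    open my $fh, '<', $source or die "Cannot open: $!";
    binmode $fh;                                   # keep CR/LF bytes as-is
    my $sample = '';
    read($fh, $sample, 500);
    seek($fh, 0, 0) or die "Cannot rewind: $!";   # instead of reopening
    local $/ = guess_eol($sample);
    my @lines;
    while (my $line = <$fh>) {
        chomp $line;
        push @lines, $line;
    }
    close $fh;
    return @lines;
}

my $mac_text = "alpha\015beta\015gamma\015";
my @lines = read_lines_guessing_eol(\$mac_text);
print join(',', @lines), "\n";   # alpha,beta,gamma
```

Counting CRLF pairs separately keeps a DOS file from also scoring as CR and LF. A sample that happens to end between a CR and its LF will count one stray CR, and a file with mixed endings still needs the conversion approach instead.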