Re: [Chicken-users] Regex fail?
On Fri, Oct 30, 2015 at 10:01 PM, John Cowan wrote:

> Peter Bex scripsit:
>
> > Note the nonl, which the manual states is equivalent to ".", but of
> > course nonl means "no newline".
>
> Dot in regular expressions has *always* meant "match any character but a
> newline". It doesn't come up that much in Unix commands, which typically
> process their input line by line anyway. But if you look at the Posix
> definition or the Perl one, you see that dot is indeed equivalent to
> "nonl".

Indeed, "nonl" exists in order to have an SRE equivalent for dot.

> > Maybe Alex can give us some info about why this is the case? I think this
> > may have something to do with the multi-line / single-line distinction
> > (which, to be honest, I never really understood).
>
> Multi-line and single-line mean totally different things: you can use one
> of them or both or neither. Multi-line mode means that ^ and $ will match
> the beginning and end of a line as well as the beginning and the end of
> the string. In non-multi-line mode, they match only the beginning and
> the end of the string. Single-line mode means that dot matches newline;
> non-single-line mode means that it does not.

Yes, exactly. The terms "single-line" (/s) and "multi-line" (/m) come from
Perl, though, and I think they are confusing. But these flags exist only for
PCRE compatibility, so I don't think it's worth changing them.

With SREs there is no confusion: you always say explicitly `any' or `nonl',
`bol/eol' or `bos/eos'. Note this is the same in Ruby regexen (which also
allows a /m flag).

--
Alex
___
Chicken-users mailing list
Chicken-users@nongnu.org
https://lists.nongnu.org/mailman/listinfo/chicken-users
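To connect Alex's point with the Ruby comparison elsewhere in this thread, here is a rough sketch of how the six SRE primitives might be spelled in Ruby regex syntax. It assumes Ruby 2.4 or later (for String#match?) and standard Ruby Regexp behavior: ^ and $ always match at line boundaries, \A and \z match only at the ends of the string, and the m option (scoped here as (?m:...)) makes dot match newline, which is what Perl calls single-line mode. SRE_IN_RUBY and sre_like are hypothetical helpers written only for this illustration; they are not part of irregex or Ruby.

```ruby
# Hypothetical helper table: Ruby spellings of the SRE primitives above.
SRE_IN_RUBY = {
  any:  "(?m:.)", # any character, newline included (Ruby's m option)
  nonl: ".",      # any character except newline
  bol:  "^",      # beginning of a line (including the start of the string)
  eol:  "$",      # end of a line (including the end of the string)
  bos:  "\\A",    # beginning of the string only
  eos:  "\\z"     # end of the string only
}.freeze

# Hypothetical helper: symbols name SRE primitives, strings match literally.
def sre_like(*parts)
  source = parts.map do |part|
    part.is_a?(Symbol) ? SRE_IN_RUBY.fetch(part) : Regexp.escape(part)
  end
  Regexp.new(source.join)
end

text = "ab\ncd"

p text.match?(sre_like("b", :nonl, "c"))  # false: nonl stops at the newline
p text.match?(sre_like("b", :any, "c"))   # true: any crosses it
p text.match?(sre_like(:bol, "cd"))       # true: a line starts after "\n"
p text.match?(sre_like(:bos, "cd"))       # false: the string starts with "ab"
p text.match?(sre_like("ab", :eol))       # true: a line ends before "\n"
p text.match?(sre_like("ab", :eos))       # false: the string ends with "cd"
```

The point of the sketch is that every newline decision is written out where it applies, as Alex describes for SREs, instead of being switched on globally by a flag.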
Re: [Chicken-users] Regex fail?
Peter Bex scripsit:

> Note the nonl, which the manual states is equivalent to ".", but of
> course nonl means "no newline".

Dot in regular expressions has *always* meant "match any character but a
newline". It doesn't come up that much in Unix commands, which typically
process their input line by line anyway. But if you look at the Posix
definition or the Perl one, you see that dot is indeed equivalent to
"nonl".

> Maybe Alex can give us some info about why this is the case? I think this
> may have something to do with the multi-line / single-line distinction
> (which, to be honest, I never really understood).

Multi-line and single-line mean totally different things: you can use one
of them or both or neither. Multi-line mode means that ^ and $ will match
the beginning and end of a line as well as the beginning and the end of
the string. In non-multi-line mode, they match only the beginning and
the end of the string. Single-line mode means that dot matches newline;
non-single-line mode means that it does not.

--
John Cowan          http://www.ccil.org/~cowan        co...@ccil.org
You know, you haven't stopped talking since I came here. You must
have been vaccinated with a phonograph needle.
        --Rufus T. Firefly
Re: [Chicken-users] Regex fail?
On Thu, Oct 29, 2015 at 09:12:44PM -0700, Matt Welland wrote:
> (string-match "^([^\n]*)(\n.*|).*$" "This\nis \n")
> => #f
>
> Using Ruby as comparison:
>
> irb(main):001:0> "This\nis \n".match(/^([^\n]*)(\n.*|)$/)
> => #

Interesting! This seems to be a problem in the way string->sre works:

#;10> (string->sre "^([^\n]*)(\n.*|).*$")
(seq bos
     (submatch (* (/ #\xe000 #\x10 #\vtab #\xd7ff #\null #\tab)))
     (submatch (or (seq "\n" (* nonl)) epsilon))
     (* nonl)
     eos)

Note the nonl, which the manual states is equivalent to ".", but of course
nonl means "no newline".

You can work around this by using the SRE directly:

#;12> (irregex-match '(seq bos
                           (submatch (* (~ "\n")))
                           (submatch (or (seq "\n" (* any)) epsilon))
                           (* any)
                           eos)
                     "This\nis \n")
#
#;13> (irregex-match-substring #12 1)
"This"
#;14> (irregex-match-substring #12 2)
"\nis \n"

Fixing this in irregex would be trivial, but I guess there's a *reason* why
"." is considered the same as 'nonl.

Maybe Alex can give us some info about why this is the case? I think this
may have something to do with the multi-line / single-line distinction
(which, to be honest, I never really understood).

Cheers,
Peter
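Peter's workaround can be mirrored in Ruby as a rough illustration. This is a hand transliteration, not something irregex produces: \A and \z stand in for bos and eos, [^\n] for the negated character set, . for nonl, and (?m:.) for any. It assumes standard Ruby Regexp semantics, where dot excludes newline unless the m option is on.

```ruby
# Ruby approximation of the irregex patterns above (not irregex itself).
SUBJECT = "This\nis \n"

# Like the string->sre output: bos ... (* nonl) eos.
# "." cannot consume the final newline, and \z (eos) forbids stopping before it.
TRANSLATED = /\A([^\n]*)(\n.*|).*\z/

# Like Peter's hand-written SRE: each (* nonl) becomes (* any), spelled (?m:.)*.
WORKAROUND = /\A([^\n]*)(\n(?m:.)*|)(?m:.)*\z/

p SUBJECT.match(TRANSLATED)  # nil, analogous to string-match returning #f

m = SUBJECT.match(WORKAROUND)
p m[1]  # "This"
p m[2]  # "\nis \n"
```

Only the two nonl positions change; the anchors stay string anchors, which is why the fix has to come from letting those repetitions cross the trailing newline.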
[Chicken-users] Regex fail?
(string-match "^([^\n]*)(\n.*|).*$" "This\nis \n")
=> #f

Using Ruby as comparison:

irb(main):001:0> "This\nis \n".match(/^([^\n]*)(\n.*|)$/)
=> #
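A plausible reason the Ruby comparison succeeds while string-match returns #f: Ruby's ^ and $ are always line anchors, whereas the string->sre output in Peter's reply turns them into bos and eos. The sketch below, which assumes standard Ruby Regexp semantics, checks both readings of the Ruby pattern against the same input; it does not attempt to reconstruct the truncated irb output above.

```ruby
SUBJECT = "This\nis \n"

# The Ruby pattern from the report. In Ruby, ^ and $ are line anchors,
# so $ can succeed just before the trailing newline without consuming it.
MATT_RUBY = /^([^\n]*)(\n.*|)$/

# The same pattern with string anchors, which is how irregex translated
# ^ and $ (bos/eos) in the string->sre output shown in Peter's reply.
STRING_ANCHORED = /\A([^\n]*)(\n.*|)\z/

m = SUBJECT.match(MATT_RUBY)
p m[0]  # "This\nis " (the final newline is not part of the match)
p m[2]  # "\nis "

p SUBJECT.match(STRING_ANCHORED)  # nil
```

So the two engines are not being asked the same question: Ruby is satisfied at the end of the second line, while irregex has to reach the end of the string.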