Re: [Chicken-users] Regex fail?

2015-11-01 Thread Alex Shinn
On Fri, Oct 30, 2015 at 10:01 PM, John Cowan  wrote:

> Peter Bex scripsit:
>
> > Note the nonl, which the manual states is equivalent to ".", but of
> > course nonl means "no newline".
>
> Dot in regular expressions has *always* meant "match any character but a
> newline".  It doesn't come up that much in Unix commands, which typically
> process their input line by line anyway.  But if you look at the POSIX
> definition or the Perl one, you see that dot is indeed equivalent to
> "nonl".
> Indeed, "nonl" exists in order to have an SRE equivalent for dot.
>
> > Maybe Alex can give us some info about why this is the case?  I think this
> > may have something to do with the multi-line / single-line distinction
> > (which, to be honest, I never really understood).
>
> Multi-line and single-line mean totally different things: you can use one
> of them or both or neither.  Multi-line mode means that ^ and $ will match
> the beginning and end of a line as well as the beginning and the end of
> the string.  In non-multi-line mode, they match only the beginning and
> the end of the string.  Single-line mode means that dot matches newline;
> non-single-line mode means that it does not.
>

Yes, exactly.  The terms "single-line" (/s) and "multi-line" (/m)
come from Perl, though, and I think they are confusing.  But these flags
exist only for PCRE compatibility, so I don't think it's worth changing
them.  With SREs there is no confusion: you always say explicitly
`any' or `nonl', `bol/eol' or `bos/eos'.

Note that this is the same in Ruby regexen (which also allow a /m flag).
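
The explicit style can be approximated in Ruby by building the pattern from fragments whose meaning does not depend on the /m flag. This is only a sketch: the variable names `nonl`, `any`, `bos` and `eos` are hypothetical, chosen to echo the SRE terms, and are not part of any Ruby or irregex API.

```ruby
# Hypothetical fragments named after the SRE terms they resemble.
nonl = '[^\n]'    # any character except newline
any  = '(?m:.)'   # any character, newline included (inline /m for the dot only)
bos  = '\A'       # beginning of string
eos  = '\z'       # end of string

# Same shape as the pattern from the original post, with every choice spelled out.
pattern = Regexp.new("#{bos}(#{nonl}*)(\\n#{any}*|)#{eos}")

m = pattern.match("This\nis \n")
p m[1]   # => "This"
p m[2]   # => "\nis \n"
```

Because each fragment states which behaviour it wants, the result does not change if flags are added to or removed from the surrounding regexp.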

-- 
Alex
___
Chicken-users mailing list
Chicken-users@nongnu.org
https://lists.nongnu.org/mailman/listinfo/chicken-users


Re: [Chicken-users] Regex fail?

2015-10-30 Thread John Cowan
Peter Bex scripsit:

> Note the nonl, which the manual states is equivalent to ".", but of
> course nonl means "no newline".

Dot in regular expressions has *always* meant "match any character but a
newline".  It doesn't come up that much in Unix commands, which typically
process their input line by line anyway.  But if you look at the POSIX
definition or the Perl one, you see that dot is indeed equivalent to "nonl".
Indeed, "nonl" exists in order to have an SRE equivalent for dot.

> Maybe Alex can give us some info about why this is the case?  I think this
> may have something to do with the multi-line / single-line distinction
> (which, to be honest, I never really understood).

Multi-line and single-line mean totally different things: you can use one
of them or both or neither.  Multi-line mode means that ^ and $ will match
the beginning and end of a line as well as the beginning and the end of
the string.  In non-multi-line mode, they match only the beginning and
the end of the string.  Single-line mode means that dot matches newline;
non-single-line mode means that it does not.
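
A short Ruby sketch of the two independent choices described above. Ruby spells them differently from Perl: its ^ and $ always match at line boundaries, \A and \z match only at the string boundaries, and its /m flag is the one that makes dot match newline. The sample string is made up for illustration.

```ruby
text = "one\ntwo"   # made-up sample input

# Anchors: line-based (^ $) versus string-based (\A \z).
p text.scan(/^\w+$/)       # => ["one", "two"]
p text.match?(/\A\w+\z/)   # => false, \w+ cannot cross the newline

# Dot: excludes newline by default, includes it with Ruby's /m flag.
p text.match?(/one.two/)   # => false
p text.match?(/one.two/m)  # => true

# The two choices combine freely: string anchors with either kind of dot.
p text.match?(/\A.+\z/)    # => false
p text.match?(/\A.+\z/m)   # => true
```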

-- 
John Cowan  http://www.ccil.org/~cowan  co...@ccil.org
You know, you haven't stopped talking since I came here. You must
have been vaccinated with a phonograph needle.
--Rufus T. Firefly



Re: [Chicken-users] Regex fail?

2015-10-30 Thread Peter Bex
On Thu, Oct 29, 2015 at 09:12:44PM -0700, Matt Welland wrote:
> (string-match "^([^\n]*)(\n.*|).*$" "This\nis \n")
> => #f
> 
> Using Ruby as comparison:
> 
> irb(main):001:0> "This\nis \n".match(/^([^\n]*)(\n.*|)$/)
> => #

Interesting!  This seems to be a problem in the way string->sre works:

#;10> (string->sre  "^([^\n]*)(\n.*|).*$")
(seq bos
     (submatch (* (/ #\xe000 #\x10 #\vtab #\xd7ff #\null #\tab)))
     (submatch (or (seq "\n" (* nonl)) epsilon))
     (* nonl)
     eos)

Note the nonl, which the manual states is equivalent to ".", but of
course nonl means "no newline".
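
To see how the anchors and the dot interact on this input, here is a Ruby sketch that runs the same pattern three ways. It assumes, going by the string->sre output above, that irregex reads ^ and $ as bos/eos, which Ruby spells \A and \z. Ruby's own ^ and $ match at line boundaries, and its /m flag lets the dot match a newline.

```ruby
input = "This\nis \n"

# String anchors, dot excludes newline (mirrors the bos/eos + nonl SRE).
strict = /\A([^\n]*)(\n.*|).*\z/
# Ruby's usual reading: line anchors, dot excludes newline.
by_line = /^([^\n]*)(\n.*|).*$/
# String anchors, dot matches newline (Ruby's /m flag).
dot_all = /\A([^\n]*)(\n.*|).*\z/m

p strict.match(input)            # => nil, nothing can consume the final "\n"
p by_line.match(input).captures  # => ["This", "\nis "]
p dot_all.match(input).captures  # => ["This", "\nis \n"]
```

The second result suggests why the Ruby comparison succeeded: $ matched just before the trailing newline rather than at the end of the string.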

You can work around this by using the SRE directly:

#;12> (irregex-match '(seq bos
                           (submatch (* (~ "\n")))
                           (submatch (or (seq "\n" (* any)) epsilon))
                           (* any)
                           eos)
                     "This\nis \n")
#
#;13> (irregex-match-substring #12 1)
"This"
#;14> (irregex-match-substring #12 2)
"\nis \n"

Fixing this in irregex would be trivial, but I guess there's a *reason*
why "." is considered the same as 'nonl.

Maybe Alex can give us some info about why this is the case?  I think this
may have something to do with the multi-line / single-line distinction
(which, to be honest, I never really understood).

Cheers,
Peter




[Chicken-users] Regex fail?

2015-10-29 Thread Matt Welland
(string-match "^([^\n]*)(\n.*|).*$" "This\nis \n")
=> #f

Using Ruby as comparison:

irb(main):001:0> "This\nis \n".match(/^([^\n]*)(\n.*|)$/)
=> #