Re: S05 question

2004-12-08 Thread Alexey Trofimenko
On Wed, 8 Dec 2004 16:07:43 -0700, Luke Palmer <[EMAIL PROTECTED]> wrote:
> Ashley Winters writes:
> > In a one-liner, I'd rather just use:
> > $datetime ~~ /$year := (\d+) -? $month := (\d+) -? ./
I'm starting to think that this '$year := ' syntax is an obfuscator. We  
couldn't refer to that capture with $year even inside a regex, right? We  
should use $ instead. Maybe $ := (\d+) would be less  
obfuscating.. but it's longer :)
(year:= \d+) and [year:= \d+] are somewhat better, IMHO, but I'm not sure  
if : in := is unambiguous here.
but if // and /$year:=.../ both capture to $, why not make  
those two more similar? things like  or  or   
[\d+] come to mind. or that (now unused) <> [\d+]

> Then go ahead and use that.  If you're going to use subrules, you can
> either use the  form or just the regular old  form
> and ignore the result.  There's nothing forcing you to pay attention to
> those.  The number variables only get incremented when you use
> parentheses.  I'd suspect that the return value of a rule only accounts
> for parenthesized captures as well.
.."and ignore the result"? hm. what if someone lazy puts $a ~~  
// instead of $a ~~ //, would there be any copying overhead  
after $a = "something else" (to keep $, which he isn't even going to  
use)?
(Some perl5 programmers use (...) where (?:...) would be sufficient, just  
because they are too lazy to put extra two characters, and because it's  
noisier.  is better than <> for noncapturing behaviour in  
that sense, but I could imagine those  everywhere.. um, just  
moaning..  maybe old, nonswapped behaviour, was better:   to not  
capture, <> to capture (I don't think  and  are appropriate.
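
For readers who want to see the Perl 5 point about (...) versus (?:...) concretely, here is a minimal sketch. It uses Python's `re` module as a stand-in, since the Perl 6 syntax debated in this thread was still in flux; the pattern and the variable names are illustrative assumptions, not anything from S05.

```python
import re

# Capturing group: the year is saved and gets a group number.
capturing = re.compile(r"(\d{4})-\d{2}")
# Non-capturing group: same text is matched, but nothing is saved.
non_capturing = re.compile(r"(?:\d{4})-\d{2}")

text = "2004-12"
m1 = capturing.match(text)
m2 = non_capturing.match(text)

capturing_groups = m1.groups()        # one saved substring
non_capturing_groups = m2.groups()    # nothing saved
same_overall_match = m1.group(0) == m2.group(0)
```

Both patterns accept the same strings; the only difference is whether the engine keeps the substring around, which is the overhead being worried about above.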



Re: S05 question

2004-12-08 Thread Ashley Winters
On Wed, 8 Dec 2004 16:07:43 -0700, Luke Palmer <[EMAIL PROTECTED]> wrote:
> Ashley Winters writes:
> > For a grammar, that works perfectly!
> 
> Yep.
> 
> > In a one-liner, I'd rather just use:
> >
> > $datetime ~~ /$year := (\d+) -? $month := (\d+) -? ./
> 
> Then go ahead and use that.  If you're going to use subrules, you can
> either use the  form or just the regular old  form
> and ignore the result.  There's nothing forcing you to pay attention to
> those.  The number variables only get incremented when you use
> parentheses.  I'd suspect that the return value of a rule only accounts
> for parenthesized captures as well.

I was working on the (possibly misguided) assumption that there's a
cost to capturing, and that perhaps aggressive capturing isn't worth
having "on" in a one-liner. Some deep part of my mind remembers $`
being bad, I think. If there's no consequence to having capture being
on, then ignoring it is fine. I don't have a problem with that. As I
said before,  reads fine to me.

I'm still going to prefer using :=, simply as a good programming
practice. My mind sees a big difference between building a parse-tree
object and just grepping for some word I want in a string. Within a
rule{} block, there is no place except the rule object to keep your
data (hypothetically -- haha), so it makes sense to have everything
capture unless otherwise specified. There's no such limitation in a
regular code block, so I don't see the need.

I may change my mind after using $/[2]

Ashley Winters


Re: S05 question

2004-12-08 Thread Larry Wall
On Wed, Dec 08, 2004 at 11:09:30AM -0700, Patrick R. Michaud wrote:
: On Wed, Dec 08, 2004 at 08:19:17AM -0800, Larry Wall wrote:
: > And people would have to get used to seeing ? as non-capturing assertions:
: > 
: > 
: > 
: > 
: > 
: > This has a rather Ruby-esque "I am a boolean" feeling to it.  I think
: > I like it.  It's pretty easy to type, at least on my keyboard.
: 
: FWIW, for some reason in rule contexts I tend to conflate 
: "I am a boolean" feelings with "zero-width assertion", so that each
: of those look vaguely to me as though I'm testing a zero-width 
: proposition and not consuming any text.  And I still tend to think of
: '?' in its "zero or one matches" or "minimal match" connotations.
: Oh well, I suppose I could get used to that.

Yes, there are those interferences, which was one of the reasons for
removing ? the last time we had it in that position (albeit on the
captures rather than the non-captures).  I think we'll have to let
it set a while to see how it feels in this role.  For the purpose of
being a non-alpha no-op, any other non-alpha character would do as well,
so maybe the "I am a boolean" feeling is not that useful.

: > Now suppose that we extend that "I am a boolean" feeling to
: > 
: > which might take the place of the confusing <(...)>, and make consistent
: > the notion that we always use {...} to invoke "real" code.
: 
: Hmm, this is nice, however.

In some ways, and not so nice in others, as Luke pointed out.

: > Another problem we've run into is naming if there are multiple assertions
: > of the same name.  If the capture name is just the alpha part of the
: > assertion, then we could allow an optional number, and still recognize
: > it as a "ws":
: >   
: > Except I can well imagine people wanting numbered rules.  Drat.  Could
: > force people to say  if they want that, I suppose.
: 
: I had been thinking that 
: 
: /   /
: 
: would simply cause $ to be a list of captured elements, similar to 
: what might happen for $1 in 
: 
: / [ (.*?) , ]* /

That's what happens by default whenever there is a name conflict.  This
would just be a way of giving a rule a "long name" as well as a short one,
much like &abs is the long name of &abs when dispatched on a
complex number, whereas &abs is just the set of all abs() multis, if
there is such a beastie.
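
To illustrate the "capture inside a quantified group becomes a list" idea from the quoted / [ (.*?) , ]* / example, here is a small sketch in Python. This is an analogue only: Python's `re` does not build the list automatically (it keeps just the last repetition), so `findall` is used to collect the elements. The sample string is an assumption.

```python
import re

text = "a,b,c,"

# A capture inside a repeated group: Python keeps only the last repetition.
m = re.match(r"(?:(.*?),)*", text)
last_only = m.group(1)

# Collecting every repetition, which is the list-valued behaviour
# proposed in the thread for a capture under a quantifier.
all_items = re.findall(r"(.*?),", text)
```

The proposal discussed here would make the list the default result whenever the same capture name or number is hit more than once.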

: If someone really needs the contents of the first and second , they
: could do
: 
:()  ()
: 
: and get them as $1 and $2.  But, seeing this tells me that perhaps
: <(rule)> should be used for capturing rules, analogous to the
: capturing parens, and leave  to be the non-capturing version.
: But maybe that's anti-Huffman overall.  Maybe the parens could also
: help for disambiguating
: 
:<(ws)>  <(ws)>
: 
: so that we end up with $/[1], $/[2], etc.  But then we might
: have to always subscript our named captures, which is icky, or maybe 
: we'd only make $/ act like list when there's more than one 
: capturing <(ws)> in the rule.
: 
: I dunno.  I kinda like <(rule)> for capturing, but maybe it just
: doesn't work.

I thought about that a long time, which was part of the reason I also
thought about freeing up <(...)>.  But it just seems a little icky
to mix together the named captures and numbered captures visually if
not semantically.  It starts not being at all clear which parentheses
count and which ones not.  Which is perhaps another reason for changing
current <(...)> to .

We could, I suppose use a subscript inside:

  
  

but then you'd reference it as

$[0]
$

which is a gratuitous difference, and suffers the same problem as
the parentheses in confusing real arrays/hashes with sorta fake ones.
So I think we'll stick with the hyphen names for now, which have the
benefit of looking the same and not sending us to bracket heaven.

  
  

$
$

Larry


Re: S05 question

2004-12-08 Thread Larry Wall
On Wed, Dec 08, 2004 at 11:50:51AM -0700, Luke Palmer wrote:
: > Now suppose that we extend that "I am a boolean" feeling to
: > 
: > 
: > 
: > which might take the place of the confusing <(...)>, and make consistent
: > the notion that we always use {...} to invoke "real" code.
: 
: Hmm...  I'm just so attached to <(...)>.  I find it quite beautiful.  It
: also somehow communicates the feeling "you shouldn't be putting
: side-effects here".

Well, there is that.  On the other hand, <{...}> is usually just as
side-effect free.  I'm still of two minds about  vs <(...)>.
Course, if we used «...» to interpolate something then «{...}»
might interpolate a rule, which would free up <{...}> for the code
assertion.  Doesn't have your side-effectlessness feeling, but it is
at least symmetrical.

: > I think I'm leaning toward the idea that anything in angles that
: > begins alpha is a capture to just the alpha part, so the ? prefix is
: > merely a no-op that happens to make the assertion not start with an
: > alpha.  Interestingly, that gives these implicit bindings:
: > 
: >  $$`
: > $   $'
: 
: I don't quite follow.  Wouldn't that mean that these guys would get
: clobbered if you used lookaheads or lookbehinds in your rules?

The point is that you don't get the $`/$' equivalents unless you
explicitly put a lookbehind/lookahead assertion in your pattern:

/ foo /

That has the benefit of telling the rule engine when it has to worry
about saving the prefix/postfix.  Not knowing that is part of why
we had the sawampersand problem in Perl 5.
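
A rough sketch of the "only pay for prefix/postfix when asked" idea, written in Python as an analogy. The function name `match_with_context` and the `want_context` flag are hypothetical; in the proposal above the request would be expressed by assertions in the pattern itself rather than by a flag.

```python
import re

def match_with_context(pattern, text, want_context=False):
    """Return the match, plus prematch/postmatch only when requested."""
    m = re.search(pattern, text)
    if m is None:
        return None
    result = {"match": m.group(0)}
    if want_context:
        # The extra substrings are built only on explicit request,
        # so ordinary matches never pay for them.
        result["prematch"] = text[:m.start()]
        result["postmatch"] = text[m.end():]
    return result

plain = match_with_context(r"foo", "xxfooyy")
with_context = match_with_context(r"foo", "xxfooyy", want_context=True)
```

The Perl 5 problem was that a single use of $&, $` or $' anywhere in a program made every match save this context; an explicit opt-in per pattern avoids that.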

My other point is that the Perl 6 names of $` and $' fall out naturally
if we name the assertions appropriately.  Unfortunately, $ and
$ don't work as well for variable names as they do for assertion
names.  Maybe we just have  and  forms that really mean 
and .

: > Or we could use some standard delim for that:
: > 
: >   
: > 
: > which is vaguely reminiscent of our "version" syntax.  Indeed, if we
: > had quantifications, you might well want to have wildcards  and
: > let the name be filled in rather than autogenerating a list.  But
: > maybe we just stick with lists in that case.
: 
: I can imagine this being a lot cleaner if the thing after the dash can
: be any sort of identifier:
: 
:  if   

Funny thing, I just wrote that into S05.pod.

: On the other hand, it could be misleading, since the standard naming of
: BNF uses dashes instead of underscores.  I don't think it should be a
: big problem though. 

Me either, since it's difficult to define a rule with a hyphen in the name.
And other delimiter candidates run into various problems too.

Larry


Re: Is object representation "per class" or "per object"?

2004-12-08 Thread Larry Wall
On Tue, Dec 07, 2004 at 12:32:50PM -0500, Abhijit Mahabal wrote:
: According to S12, it is possible to supply the object layout to bless(), 
: like so:
: 
: $object = $class.bless(:CREATE[:repr] :k1($v1) :k2($v2))
: 
: But in the section "Introspection", "layout" is a class trait. Does this 
: mean that classes have a default layout that can be overridden for 
: individual objects?

Er, no.  It's probably just a braino.  If it works at all, I think
it's probably for when the class doesn't specify a layout, or has a
meta-layout that can handle multiple layouts. It might not even make
sense for that.  In general, a class should have a consistent layout.

I think I was thinking about the fact that Perl 5's bless can just use
whatever data structure you hand it.  So maybe

$object = $class.bless(:CREATE[:repr] :k1($v1) :k2($v2))

is equivalent to

$object = $class.bless({}, :k1($v1) :k2($v2))

But mostly I was just looking for an example option to pass to :CREATE.
Perhaps :repr is a bit too violent for that.

Larry


Re: S05 question

2004-12-08 Thread Juerd

Warning: excessive nitpicking ahead.


Ashley Winters skribis 2004-12-08 10:51 (-0800):
> rule year { \d<4> }

\d**{4}

Or, well, \d**{2,4}

> rule month { \d<2> }

\d**{2}

> rule date {  -?  -?  }

rule week { \d**{2} }
rule yday { \d**{3} }
rule date {
 
[
-? 
[ 
 
|
[ [ W |  ] [ -?  ]? ] 
]
]?
}  # :)

> rule time {  \:?  \:?  [\. ]? }

Likewise making parts optional, and "." can also be ",".


> rule datetime {  T  }

rule timezone { Z | <[+-]>  [ \:?  ]? }

rule datetime {  [ T  ? ]? }


And still this isn't a full ISO8601 grammar. But I think it now covers every
notation that I have seen in the wild so far. A useful source of
information, apart from the ISO standard itself, is
DateTime-Format-ISO8601.
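
As a rough companion to the rules above, here is a sketch of the same shape (optional separators, optional time, "." or "," before the fraction, optional timezone) as a single Python regex. It is an approximation under stated assumptions: it covers calendar dates only, not the week and day-of-year forms, and the group names and the `parse_iso` helper are made up for this example.

```python
import re

ISO_DATETIME = re.compile(r"""
    (?P<year>\d{4}) -? (?P<month>\d{2}) -? (?P<day>\d{2})
    (?:
        T (?P<hour>\d{2}) :? (?P<minute>\d{2}) :? (?P<second>\d{2})
        (?: [.,] (?P<fraction>\d+) )?                   # "." can also be ","
        (?P<tz> Z | [+-] \d{2} (?: :? \d{2} )? )?       # optional timezone
    )?
""", re.VERBOSE)

def parse_iso(text):
    """Return the named parts that were present, or None on no match."""
    m = ISO_DATETIME.fullmatch(text)
    if m is None:
        return None
    return {k: v for k, v in m.groupdict().items() if v is not None}
```

For example, `parse_iso("20041208")` yields only the year, month and day keys, while a full timestamp also fills in the time and timezone.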


Juerd


Re: S05 question

2004-12-08 Thread Luke Palmer
Ashley Winters writes:
> I'm thinking capturing rules should be default in rules, where they're
> downright useful. Your hour/minute/second comment brings up parsing
> ISO time:
> 
> grammar ISO8601::DateTime {
> rule year { \d<4> }
> rule month { \d<2> }
> rule day { \d<2> }
> rule hour { \d<2> }
> rule minute { \d<2> }
> rule second { \d<2> }
> rule fraction { \d+ }
> 
> rule date {  -?  -?  }
> rule time {  \:?  \:?  [\. ]? }
> rule datetime {  T  }
> }
> 
> For a grammar, that works perfectly!

Yep. 

> In a one-liner, I'd rather just use:
> 
> $datetime ~~ /$year := (\d+) -? $month := (\d+) -? ./

Then go ahead and use that.  If you're going to use subrules, you can
either use the  form or just the regular old  form
and ignore the result.  There's nothing forcing you to pay attention to
those.  The number variables only get incremented when you use
parentheses.  I'd suspect that the return value of a rule only accounts
for parenthesized captures as well.

Or are you asking something different than that?

Luke


Re: S05 question

2004-12-08 Thread Ashley Winters
On Wed, 8 Dec 2004 08:19:17 -0800, Larry Wall <[EMAIL PROTECTED]> wrote:
> / $ := [ () = (\N+) ]* /

You know, to be honest I don't know that I want rules in one-liners to
capture by default. I certainly want them to capture in rules, though.

> And people would have to get used to seeing ? as non-capturing assertions:
> 
> 
> 
> 
> 
> 
> 
> This has a rather Ruby-esque "I am a boolean" feeling to it.  I think
> I like it.  It's pretty easy to type, at least on my keyboard.

I like it. It reads to me as "if before ...", "if null". Sounds good.

> I think I'm leaning toward the idea that anything in angles that
> begins alpha is a capture to just the alpha part, so the ? prefix is
> merely a no-op that happens to make the assertion not start with an
> alpha.  Interestingly, that gives these implicit bindings:
> 
>  $$`
> $   $'

Again, I don't see the utility of that in a one-liner. In a grammar,
you would create a real rule which would assert  and
capture the result in a reasonable name.

> Anyway, that's where I am this week/day/hour/minute/second.

I'm thinking capturing rules should be default in rules, where they're
downright useful. Your hour/minute/second comment brings up parsing
ISO time:

grammar ISO8601::DateTime {
rule year { \d<4> }
rule month { \d<2> }
rule day { \d<2> }
rule hour { \d<2> }
rule minute { \d<2> }
rule second { \d<2> }
rule fraction { \d+ }

rule date {  -?  -?  }
rule time {  \:?  \:?  [\. ]? }
rule datetime {  T  }
}

For a grammar, that works perfectly!

In a one-liner, I'd rather just use:

$datetime ~~ /$year := (\d+) -? $month := (\d+) -? ./

and specify the vars I want to save directly in my own scope.
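
For comparison, the "save straight into my own variables" style looks roughly like this in Python. This is only an analogue of the := binding: the groups are positional and unpacked after the match. The pattern mirrors the one-liner above, and the sample input is an assumption.

```python
import re

datetime_text = "2004-12-08"

# Two positional captures, unpacked directly into local variables.
m = re.match(r"(\d+)-?(\d+)-?.", datetime_text)
year, month = m.groups() if m else (None, None)
```

Nothing but the two requested substrings is kept, which is the point of preferring explicit binding in a one-liner.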

Ashley Winters


Re: S05 question

2004-12-08 Thread Luke Palmer
Larry Wall writes:
> If we're going to stick with the notion that  captures and
> something else doesn't, I'm beginning to think that the other thing
> isn't «foo» for a couple of reasons.

I just sat down to say the exact same thing.  I'm glad you beat me to
it.

> And people would have to get used to seeing ? as non-capturing assertions:
> 
> 
> 
> 
> 
> 
> 
> This has a rather Ruby-esque "I am a boolean" feeling to it.  I think
> I like it.  It's pretty easy to type, at least on my keyboard.

Yeah, I like it pretty well too.  Better than the French quotes for
sure.

> Now suppose that we extend that "I am a boolean" feeling to
> 
> 
> 
> which might take the place of the confusing <(...)>, and make consistent
> the notion that we always use {...} to invoke "real" code.

Hmm...  I'm just so attached to <(...)>.  I find it quite beautiful.  It
also somehow communicates the feeling "you shouldn't be putting
side-effects here".

> I think I'm leaning toward the idea that anything in angles that
> begins alpha is a capture to just the alpha part, so the ? prefix is
> merely a no-op that happens to make the assertion not start with an
> alpha.  Interestingly, that gives these implicit bindings:
> 
>$$`
>   $   $'

I don't quite follow.  Wouldn't that mean that these guys would get
clobbered if you used lookaheads or lookbehinds in your rules?

> Or we could use some standard delim for that:
> 
>   
> 
> which is vaguely reminiscent of our "version" syntax.  Indeed, if we
> had quantifications, you might well want to have wildcards  and
> let the name be filled in rather than autogenerating a list.  But
> maybe we just stick with lists in that case.

I can imagine this being a lot cleaner if the thing after the dash can
be any sort of identifier:

 if   

On the other hand, it could be misleading, since the standard naming of
BNF uses dashes instead of underscores.  I don't think it should be a
big problem though. 

> I'm still thinking about what «...» might mean, if anything.  Bonus
> points for interpolative and/or word-splitty.

Yeah... umm... nope.  I got nothin.

Luke


Re: S05 question

2004-12-08 Thread Patrick R. Michaud
On Wed, Dec 08, 2004 at 08:19:17AM -0800, Larry Wall wrote:
> And people would have to get used to seeing ? as non-capturing assertions:
> 
> 
> 
> 
> 
> This has a rather Ruby-esque "I am a boolean" feeling to it.  I think
> I like it.  It's pretty easy to type, at least on my keyboard.

FWIW, for some reason in rule contexts I tend to conflate 
"I am a boolean" feelings with "zero-width assertion", so that each
of those look vaguely to me as though I'm testing a zero-width 
proposition and not consuming any text.  And I still tend to think of
'?' in its "zero or one matches" or "minimal match" connotations.
Oh well, I suppose I could get used to that.

> Now suppose that we extend that "I am a boolean" feeling to
> 
> which might take the place of the confusing <(...)>, and make consistent
> the notion that we always use {...} to invoke "real" code.

Hmm, this is nice, however.

> Another problem we've run into is naming if there are multiple assertions
> of the same name.  If the capture name is just the alpha part of the
> assertion, then we could allow an optional number, and still recognize
> it as a "ws":
>   
> Except I can well imagine people wanting numbered rules.  Drat.  Could
> force people to say  if they want that, I suppose.

I had been thinking that 

/   /

would simply cause $ to be a list of captured elements, similar to 
what might happen for $1 in 

/ [ (.*?) , ]* /

If someone really needs the contents of the first and second , they
could do

   ()  ()

and get them as $1 and $2.  But, seeing this tells me that perhaps
<(rule)> should be used for capturing rules, analogous to the
capturing parens, and leave  to be the non-capturing version.
But maybe that's anti-Huffman overall.  Maybe the parens could also
help for disambiguating

   <(ws)>  <(ws)>

so that we end up with $/[1], $/[2], etc.  But then we might
have to always subscript our named captures, which is icky, or maybe 
we'd only make $/ act like list when there's more than one 
capturing <(ws)> in the rule.

I dunno.  I kinda like <(rule)> for capturing, but maybe it just
doesn't work.

Pm


Re: S05 question

2004-12-08 Thread Austin Hastings
Larry Wall wrote:
> Another problem we've run into is naming if there are multiple assertions
> of the same name.  If the capture name is just the alpha part of the
> assertion, then we could allow an optional number, and still recognize
> it as a "ws":
>
> Except I can well imagine people wanting numbered rules.  Drat.  Could
> force people to say  if they want that, I suppose.
>
> Or we could use some standard delim for that:
>
> which is vaguely reminiscent of our "version" syntax.  Indeed, if we
> had quantifications, you might well want to have wildcards  and
> let the name be filled in rather than autogenerating a list.  But maybe
> we just stick with lists in that case.
>
> For captures of non-alpha assertions, we could say that ? is the same
> as "true" (just as with regular operators), and so
>
>    -[aeiou]>
>
> would capture to $.  (And one could always do an explicit binding
> for a different name.)
>
> Actually, I think people would find $ more meaningful than
> C.

PHP's use of $array[] as "push" might work for this:
-[aeiou]>
or
<@true +-[aeiou]>
or
 +-[aeiou]>
or
-[aeiou]>
I like the idea of being able to "continue" versus "chunk" patterns. How 
do you say  "This is a continuation of the other " versus "This 
is a separate " ?

=Austin


Re: S05 question

2004-12-08 Thread Larry Wall
On Tue, Dec 07, 2004 at 10:36:53PM -0800, Larry Wall wrote:
: But somehow I expect that when someone writes () they probably
: usually meant («foo»).

If we're going to stick with the notion that  captures and something
else doesn't, I'm beginning to think that the other thing isn't «foo» for
a couple of reasons.  First, if other languages are going to borrow this
notation, they're probably not going to buy into the French quotes.  Second,
I can think of several other possible uses for the French quotes to cure
perceived ills such as the <(...)> vs <{...}> confusion.  Third, it now
bothers me to have a ! without a ?.  So what if «foo» is instead written
, meaning you only want to evaluate its success.  (Unlike ,
it's not zero-width, but that's just how success/failure works.)  So we'd
get things like

/ $ := [ () = (\N+) ]* /

And people would have to get used to seeing ? as non-capturing assertions:







This has a rather Ruby-esque "I am a boolean" feeling to it.  I think
I like it.  It's pretty easy to type, at least on my keyboard.

Now suppose that we extend that "I am a boolean" feeling to



which might take the place of the confusing <(...)>, and make consistent
the notion that we always use {...} to invoke "real" code.

: : Or is it that hypotheticals only bind to things captured by parens?
: : If so, it might need clarification (or perhaps I'm overlooking the part
: : that makes it clear).
: 
: No, I think you just found a blind spot in the design.

I think I'm leaning toward the idea that anything in angles that
begins alpha is a capture to just the alpha part, so the ? prefix is 
merely a no-op that happens to make the assertion not start with an
alpha.  Interestingly, that gives these implicit bindings:

 $$`
$   $'

Though that's an argument for changing them to  and ,
I suppose, since if users are going to refer to $ in their main
program, it doesn't look like a declarative assertion anymore.

Another problem we've run into is naming if there are multiple assertions
of the same name.  If the capture name is just the alpha part of the
assertion, then we could allow an optional number, and still recognize
it as a "ws":

  

Except I can well imagine people wanting numbered rules.  Drat.  Could
force people to say  if they want that, I suppose.

Or we could use some standard delim for that:

  

which is vaguely reminiscent of our "version" syntax.  Indeed, if we
had quantifications, you might well want to have wildcards  and
let the name be filled in rather than autogenerating a list.  But maybe
we just stick with lists in that case.

For captures of non-alpha assertions, we could say that ? is the same
as "true" (just as with regular operators), and so

-[aeiou]>

would capture to $.  (And one could always do an explicit binding
for a different name.)

Actually, I think people would find $ more meaningful than
C.

I'm still thinking about what «...» might mean, if anything.  Bonus points
for interpolative and/or word-splitty.

Anyway, that's where I am this week/day/hour/minute/second.

Larry