Re: S05 question
On Wed, 8 Dec 2004 16:07:43 -0700, Luke Palmer <[EMAIL PROTECTED]> wrote:
> Ashley Winters writes:
> > In a one-liner, I'd rather just use:
> >
> > $datetime ~~ /$year := (\d+) -? $month := (\d+) -? ./

I'm starting to think that this '$year := ' syntax is an obfuscator. We couldn't refer to that capture with $year even inside a regex, right? We should use $<year> instead. Maybe $<year> := (\d+) would be less obfuscating... but it's longer :)

(year:= \d+) and [year:= \d+] are somewhat better, IMHO, but I'm not sure whether the : in := is unambiguous here.

But if /<year>/ and /$year:=.../ both capture to $<year>, why not make those two more similar? Things like or or [\d+] come to mind, or that (now unused) <> [\d+]

> Then go ahead and use that. If you're going to use subrules, you can
> either use the form or just the regular old form
> and ignore the result. There's nothing forcing you to pay attention to
> those. The number variables only get incremented when you use
> parentheses. I'd suspect that the return value of a rule only accounts
> for parenthesized captures as well.

.."and ignore the result"? Hm. What if someone lazy writes $a ~~ // instead of $a ~~ //? Would there be any copying overhead after $a = "something else" (to keep $, which he isn't even going to use)?

(Some Perl 5 programmers use (...) where (?:...) would be sufficient, just because they are too lazy to type the extra two characters, and because (?:...) is noisier.) is better than <> for non-capturing behaviour in that sense, but I could imagine those everywhere... Um, just moaning... Maybe the old, non-swapped behaviour was better: to not capture, <> to capture. (I don't think  and  are appropriate.)
Re: S05 question
On Wed, 8 Dec 2004 16:07:43 -0700, Luke Palmer <[EMAIL PROTECTED]> wrote:
> Ashley Winters writes:
> > For a grammar, that works perfectly!
>
> Yep.
>
> > In a one-liner, I'd rather just use:
> >
> > $datetime ~~ /$year := (\d+) -? $month := (\d+) -? ./
>
> Then go ahead and use that. If you're going to use subrules, you can
> either use the form or just the regular old form
> and ignore the result. There's nothing forcing you to pay attention to
> those. The number variables only get incremented when you use
> parentheses. I'd suspect that the return value of a rule only accounts
> for parenthesized captures as well.

I was working on the (possibly misguided) assumption that there's a cost to capturing, and that perhaps aggressive capturing isn't worth having "on" in a one-liner. Some deep part of my mind remembers $` being bad, I think. If there's no consequence to having capture on, then ignoring it is fine. I don't have a problem with that.

As I said before, reads fine to me. I'm still going to prefer using :=, simply as a good programming practice. My mind sees a big difference between building a parse-tree object and just grepping for some word I want in a string. Within a rule{} block, there is no place except the rule object to keep your data (hypothetically -- haha), so it makes sense to have everything capture unless otherwise specified. There's no such limitation in a regular code block, so I don't see the need.

I may change my mind after using $/[2]

Ashley Winters
Re: S05 question
On Wed, Dec 08, 2004 at 11:09:30AM -0700, Patrick R. Michaud wrote:
: On Wed, Dec 08, 2004 at 08:19:17AM -0800, Larry Wall wrote:
: > And people would have to get used to seeing ? as non-capturing assertions:
: >
: >
: >
: > This has a rather Ruby-esque "I am a boolean" feeling to it. I think
: > I like it. It's pretty easy to type, at least on my keyboard.
:
: FWIW, for some reason in rule contexts I tend to conflate
: "I am a boolean" feelings with "zero-width assertion", so that each
: of those looks vaguely to me as though I'm testing a zero-width
: proposition and not consuming any text. And I still tend to think of
: '?' in its "zero or one matches" or "minimal match" connotations.
: Oh well, I suppose I could get used to that.

Yes, there are those interferences, which was one of the reasons for removing ? the last time we had it in that position (albeit on the captures rather than the non-captures). I think we'll have to let it set a while to see how it feels in this role. For the purpose of being a non-alpha no-op, any other non-alpha character would do as well, so maybe the "I am a boolean" feeling is not that useful.

: > Now suppose that we extend that "I am a boolean" feeling to
: >
: > which might take the place of the confusing <(...)>, and make consistent
: > the notion that we always use {...} to invoke "real" code.
:
: Hmm, this is nice, however.

In some ways, and not so nice in others, as Luke pointed out.

: > Another problem we've run into is naming if there are multiple assertions
: > of the same name. If the capture name is just the alpha part of the
: > assertion, then we could allow an optional number, and still recognize
: > it as a "ws":
: >
: > Except I can well imagine people wanting numbered rules. Drat. Could
: > force people to say if they want that, I suppose.
:
: I had been thinking that
:
: / /
:
: would simply cause $ to be a list of captured elements, similar to
: what might happen for $1 in
:
: / [ (.*?) , ]* /

That's what happens by default whenever there is a name conflict. This would just be a way of giving a rule a "long name" as well as a short one, much like &abs is the long name of &abs when dispatched on a complex number, whereas &abs is just the set of all abs() multis, if there is such a beastie.

: If someone really needs the contents of the first and second , they
: could do
:
: () ()
:
: and get them as $1 and $2. But, seeing this tells me that perhaps
: <(rule)> should be used for capturing rules, analogous to the
: capturing parens, and leave to be the non-capturing version.
: But maybe that's anti-Huffman overall. Maybe the parens could also
: help for disambiguating
:
: <(ws)> <(ws)>
:
: so that we end up with $/[1], $/[2], etc. But then we might
: have to always subscript our named captures, which is icky, or maybe
: we'd only make $/ act like a list when there's more than one
: capturing <(ws)> in the rule.
:
: I dunno. I kinda like <(rule)> for capturing, but maybe it just
: doesn't work.

I thought about that a long time, which was part of the reason I also thought about freeing up <(...)>. But it just seems a little icky to mix together the named captures and numbered captures visually if not semantically. It starts not being at all clear which parentheses count and which ones don't. Which is perhaps another reason for changing current <(...)> to . We could, I suppose, use a subscript inside:

but then you'd reference it as

$[0] $

which is a gratuitous difference, and suffers the same problem as the parentheses in confusing real arrays/hashes with sorta fake ones. So I think we'll stick with the hyphen names for now, which have the benefit of looking the same and not sending us to bracket heaven.

$ $

Larry
Re: S05 question
On Wed, Dec 08, 2004 at 11:50:51AM -0700, Luke Palmer wrote:
: > Now suppose that we extend that "I am a boolean" feeling to
: >
: > which might take the place of the confusing <(...)>, and make consistent
: > the notion that we always use {...} to invoke "real" code.
:
: Hmm... I'm just so attached to <(...)>. I find it quite beautiful. It
: also somehow communicates the feeling "you shouldn't be putting
: side-effects here".

Well, there is that. On the other hand, <{...}> is usually just as side-effect free. I'm still of two minds about vs <(...)>. Course, if we used «...» to interpolate something then «{...}» might interpolate a rule, which would free up <{...}> for the code assertion. Doesn't have your side-effectlessness feeling, but it is at least symmetrical.

: > I think I'm leaning toward the idea that anything in angles that
: > begins alpha is a capture to just the alpha part, so the ? prefix is
: > merely a no-op that happens to make the assertion not start with an
: > alpha. Interestingly, that gives these implicit bindings:
: >
: > $$`
: > $ $'
:
: I don't quite follow. Wouldn't that mean that these guys would get
: clobbered if you used lookaheads or lookbehinds in your rules?

The point is that you don't get the $`/$' equivalents unless you explicitly put a lookbehind/lookahead assertion in your pattern:

/ foo /

That has the benefit of telling the rule engine when it has to worry about saving the prefix/postfix. Not knowing that is part of why we had the sawampersand problem in Perl 5.

My other point is that the Perl 6 names of $` and $' fall out naturally if we name the assertions appropriately. Unfortunately, $ and $ don't work as well for variable names as they do for assertion names. Maybe we just have and forms that really mean and .

: > Or we could use some standard delim for that:
: >
: > which is vaguely reminiscent of our "version" syntax. Indeed, if we
: > had quantifications, you might well want to have wildcards and
: > let the name be filled in rather than autogenerating a list. But
: > maybe we just stick with lists in that case.
:
: I can imagine this being a lot cleaner if the thing after the dash can
: be any sort of identifier:
:
: if

Funny thing, I just wrote that into S05.pod.

: On the other hand, it could be misleading, since the standard naming of
: BNF uses dashes instead of underscores. I don't think it should be a
: big problem though.

Me either, since it's difficult to define a rule with a hyphen in the name. And other delimiter candidates run into various problems too.

Larry
Re: Is object representation "per class" or "per object"?
On Tue, Dec 07, 2004 at 12:32:50PM -0500, Abhijit Mahabal wrote:
: According to S12, it is possible to supply the object layout to bless(),
: like so:
:
: $object = $class.bless(:CREATE[:repr] :k1($v1) :k2($v2))
:
: But in the section "Introspection", "layout" is a class trait. Does this
: mean that classes have a default layout that can be overridden for
: individual objects?

Er, no. It's probably just a braino. If it works at all, I think it's probably for when the class doesn't specify a layout, or has a meta-layout that can handle multiple layouts. It might not even make sense for that. In general, a class should have a consistent layout.

I think I was thinking about the fact that Perl 5's bless can just use whatever data structure you hand it. So maybe

$object = $class.bless(:CREATE[:repr] :k1($v1) :k2($v2))

is equivalent to

$object = $class.bless({}, :k1($v1) :k2($v2))

But mostly I was just looking for an example option to pass to :CREATE. Perhaps :repr is a bit too violent for that.

Larry
Re: S05 question
Warning: excessive nitpicking ahead.

Ashley Winters wrote on 2004-12-08 10:51 (-0800):
> rule year { \d<4> }

\d**{4}

Or, well, \d**{2,4}

> rule month { \d<2> }

\d**{2}

> rule date { <year> -? <month> -? <day> }

rule week { \d**{2} }
rule yday { \d**{3} }
rule date { [ -? [ | [ [ W | ] [ -? ]? ] ] ]? } # :)

> rule time { <hour> \:? <minute> \:? <second> [\. <fraction>]? }

Likewise making parts optional, and "." can also be ",".

> rule datetime { <date> T <time> }

rule timezone { Z | <[+-]> [ \:? ]? }
rule datetime { [ T ? ]? }

And still this isn't a full ISO8601 grammar. But it now covers every notation that I have seen in the wild so far. A useful source of information, apart from the ISO standard itself, is DateTime-Format-ISO8601.

Juerd
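For anyone who wants to experiment with the relaxed notation Juerd describes, here is a rough Python `re` analogue. It is not Perl 6 syntax. Like his rules, it is not a full ISO 8601 grammar. It is a sketch with these assumptions: only four-digit years are accepted, and the group names (`week`, `wday`, `yday`, `tz` and so on) were chosen here for illustration. It covers optional separators, week and ordinal dates, "." or "," before the fraction, and a `Z` or `+hh[:mm]` zone.

```python
import re

# Rough Python analogue of the relaxed date/time rules above (not Perl 6 syntax).
# Group names are illustrative; this is NOT a complete ISO 8601 grammar.
DATE = r"""
    (?P<year>\d{4})
    (?:
        -?
        (?:
            W(?P<week>\d{2})(?:-?(?P<wday>\d))?      # week date, e.g. 2004-W50-3
          | (?P<month>\d{2})(?:-?(?P<day>\d{2}))?    # calendar date, e.g. 2004-12-08
          | (?P<yday>\d{3})                          # ordinal date, e.g. 2004-343
        )
    )?
"""

TIME = r"""
    (?P<hour>\d{2})
    (?::?(?P<minute>\d{2})(?::?(?P<second>\d{2}))?)?
    (?:[.,](?P<fraction>\d+))?                       # "." or "," before the fraction
"""

TIMEZONE = r"(?P<tz>Z|[+-]\d{2}(?::?\d{2})?)"

# The time part (and its optional zone) is optional as a whole, as in Juerd's datetime rule.
DATETIME = re.compile(DATE + r"(?:T" + TIME + TIMEZONE + r"?)?", re.VERBOSE)


def parse(text):
    """Return only the named captures that matched, or None if text doesn't parse."""
    m = DATETIME.fullmatch(text)
    if m is None:
        return None
    return {name: value for name, value in m.groupdict().items() if value is not None}
```

For example, `parse("2004-W50-3")` returns `{'year': '2004', 'week': '50', 'wday': '3'}`. `fullmatch` lets the engine backtrack out of the calendar-date branch, so `"2004-343"` falls through to the ordinal-date alternative.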
Re: S05 question
Ashley Winters writes:
> I'm thinking capturing rules should be default in rules, where they're
> downright useful. Your hour/minute/second comment brings up parsing
> ISO time:
>
> grammar ISO8601::DateTime {
>     rule year { \d<4> }
>     rule month { \d<2> }
>     rule day { \d<2> }
>     rule hour { \d<2> }
>     rule minute { \d<2> }
>     rule second { \d<2> }
>     rule fraction { \d+ }
>
>     rule date { <year> -? <month> -? <day> }
>     rule time { <hour> \:? <minute> \:? <second> [\. <fraction>]? }
>     rule datetime { <date> T <time> }
> }
>
> For a grammar, that works perfectly!

Yep.

> In a one-liner, I'd rather just use:
>
> $datetime ~~ /$year := (\d+) -? $month := (\d+) -? ./

Then go ahead and use that. If you're going to use subrules, you can either use the form or just the regular old form and ignore the result. There's nothing forcing you to pay attention to those. The number variables only get incremented when you use parentheses. I'd suspect that the return value of a rule only accounts for parenthesized captures as well.

Or are you asking something different than that?

Luke
Re: S05 question
On Wed, 8 Dec 2004 08:19:17 -0800, Larry Wall <[EMAIL PROTECTED]> wrote:
> / $ := [ () = (\N+) ]* /

You know, to be honest I don't know that I want rules in one-liners to capture by default. I certainly want them to capture in rules, though.

> And people would have to get used to seeing ? as non-capturing assertions:
>
> This has a rather Ruby-esque "I am a boolean" feeling to it. I think
> I like it. It's pretty easy to type, at least on my keyboard.

I like it. It reads to me as "if before ...", "if null". Sounds good.

> I think I'm leaning toward the idea that anything in angles that
> begins alpha is a capture to just the alpha part, so the ? prefix is
> merely a no-op that happens to make the assertion not start with an
> alpha. Interestingly, that gives these implicit bindings:
>
> $$`
> $ $'

Again, I don't see the utility of that in a one-liner. In a grammar, you would create a real rule which would assert and capture the result in a reasonable name.

> Anyway, that's where I am this week/day/hour/minute/second.

I'm thinking capturing rules should be default in rules, where they're downright useful. Your hour/minute/second comment brings up parsing ISO time:

grammar ISO8601::DateTime {
    rule year { \d<4> }
    rule month { \d<2> }
    rule day { \d<2> }
    rule hour { \d<2> }
    rule minute { \d<2> }
    rule second { \d<2> }
    rule fraction { \d+ }

    rule date { <year> -? <month> -? <day> }
    rule time { <hour> \:? <minute> \:? <second> [\. <fraction>]? }
    rule datetime { <date> T <time> }
}

For a grammar, that works perfectly! In a one-liner, I'd rather just use:

$datetime ~~ /$year := (\d+) -? $month := (\d+) -? ./

and specify the vars I want to save directly in my own scope.

Ashley Winters
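To make the two styles Ashley contrasts concrete outside Perl 6, here is a rough Python `re` sketch. It is an analogy under stated assumptions, not how Perl 6 rules work. Python named groups `(?P<name>...)` stand in for rules, and plain string concatenation stands in for composing rules. The function names `parse_datetime` and `year_and_month` are hypothetical. The grammar version captures every piece under its own name. The one-liner version captures only the two values the caller wants and unpacks them into local variables.

```python
import re

# Hypothetical Python analogue of the ISO8601::DateTime grammar above.
# Each Perl 6 "rule" becomes a named group; composing rules is string concatenation.
YEAR = r"(?P<year>\d{4})"
MONTH = r"(?P<month>\d{2})"
DAY = r"(?P<day>\d{2})"
HOUR = r"(?P<hour>\d{2})"
MINUTE = r"(?P<minute>\d{2})"
SECOND = r"(?P<second>\d{2})"
FRACTION = r"(?P<fraction>\d+)"

DATE = YEAR + "-?" + MONTH + "-?" + DAY
TIME = HOUR + ":?" + MINUTE + ":?" + SECOND + r"(?:\." + FRACTION + r")?"
DATETIME = re.compile(DATE + "T" + TIME)


def parse_datetime(text):
    """Grammar style: every sub-rule is captured under its own name."""
    m = DATETIME.fullmatch(text)
    return m.groupdict() if m else None


def year_and_month(text):
    """One-liner style: capture only the two pieces we want, into local names."""
    m = re.match(r"(\d+)-?(\d+)-?", text)
    if m is None:
        return None
    year, month = m.groups()
    return year, month
```

Like the original one-liner, the greedy `\d+` pieces only split sensibly when the dashes are present. `year_and_month("2004-12-08T16:07:43")` gives `("2004", "12")`, but an undelimited `"20041208"` does not split into year and month.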
Re: S05 question
Larry Wall writes:
> If we're going to stick with the notion that <foo> captures and
> something else doesn't, I'm beginning to think that the other thing
> isn't «foo» for a couple of reasons.

I just sat down to say the exact same thing. I'm glad you beat me to it.

> And people would have to get used to seeing ? as non-capturing assertions:
>
> This has a rather Ruby-esque "I am a boolean" feeling to it. I think
> I like it. It's pretty easy to type, at least on my keyboard.

Yeah, I like it pretty well too. Better than the French quotes for sure.

> Now suppose that we extend that "I am a boolean" feeling to
>
> which might take the place of the confusing <(...)>, and make consistent
> the notion that we always use {...} to invoke "real" code.

Hmm... I'm just so attached to <(...)>. I find it quite beautiful. It also somehow communicates the feeling "you shouldn't be putting side-effects here".

> I think I'm leaning toward the idea that anything in angles that
> begins alpha is a capture to just the alpha part, so the ? prefix is
> merely a no-op that happens to make the assertion not start with an
> alpha. Interestingly, that gives these implicit bindings:
>
> $$`
> $ $'

I don't quite follow. Wouldn't that mean that these guys would get clobbered if you used lookaheads or lookbehinds in your rules?

> Or we could use some standard delim for that:
>
> which is vaguely reminiscent of our "version" syntax. Indeed, if we
> had quantifications, you might well want to have wildcards and
> let the name be filled in rather than autogenerating a list. But
> maybe we just stick with lists in that case.

I can imagine this being a lot cleaner if the thing after the dash can be any sort of identifier:

if

On the other hand, it could be misleading, since the standard naming of BNF uses dashes instead of underscores. I don't think it should be a big problem though.

> I'm still thinking about what «...» might mean, if anything. Bonus
> points for interpolative and/or word-splitty.

Yeah... umm... nope. I got nothin.

Luke
Re: S05 question
On Wed, Dec 08, 2004 at 08:19:17AM -0800, Larry Wall wrote:
> And people would have to get used to seeing ? as non-capturing assertions:
>
> This has a rather Ruby-esque "I am a boolean" feeling to it. I think
> I like it. It's pretty easy to type, at least on my keyboard.

FWIW, for some reason in rule contexts I tend to conflate "I am a boolean" feelings with "zero-width assertion", so that each of those looks vaguely to me as though I'm testing a zero-width proposition and not consuming any text. And I still tend to think of '?' in its "zero or one matches" or "minimal match" connotations. Oh well, I suppose I could get used to that.

> Now suppose that we extend that "I am a boolean" feeling to
>
> which might take the place of the confusing <(...)>, and make consistent
> the notion that we always use {...} to invoke "real" code.

Hmm, this is nice, however.

> Another problem we've run into is naming if there are multiple assertions
> of the same name. If the capture name is just the alpha part of the
> assertion, then we could allow an optional number, and still recognize
> it as a "ws":
>
> Except I can well imagine people wanting numbered rules. Drat. Could
> force people to say if they want that, I suppose.

I had been thinking that

/ /

would simply cause $ to be a list of captured elements, similar to what might happen for $1 in

/ [ (.*?) , ]* /

If someone really needs the contents of the first and second , they could do

() ()

and get them as $1 and $2. But, seeing this tells me that perhaps <(rule)> should be used for capturing rules, analogous to the capturing parens, and leave to be the non-capturing version. But maybe that's anti-Huffman overall. Maybe the parens could also help for disambiguating

<(ws)> <(ws)>

so that we end up with $/[1], $/[2], etc. But then we might have to always subscript our named captures, which is icky, or maybe we'd only make $/ act like a list when there's more than one capturing <(ws)> in the rule.

I dunno. I kinda like <(rule)> for capturing, but maybe it just doesn't work.

Pm
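Patrick's `[ (.*?) , ]*` example touches on a point where regex engines differ. As a contrast (Python, not Perl 6), Python's `re` does not turn a quantified capturing group into a list. The match object keeps only the group's last iteration, so collecting every repetition takes a separate `findall` scan. That is the limitation a list-valued `$1` (or list-valued named capture), as floated above, would avoid. The helper names below are hypothetical.

```python
import re


def last_repetition(text):
    """A quantified capturing group keeps only its LAST iteration in Python's re."""
    m = re.fullmatch(r"(?:(.*?),)*", text)
    return m.group(1) if m else None


def all_repetitions(text):
    """Collecting every iteration takes a separate scan: one findall item per match."""
    return re.findall(r"(.*?),", text)
```

With the input `"a,b,c,"`, `last_repetition` sees only `"c"`, while `all_repetitions` recovers `["a", "b", "c"]`.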
Re: S05 question
Larry Wall wrote:
> Another problem we've run into is naming if there are multiple assertions
> of the same name. If the capture name is just the alpha part of the
> assertion, then we could allow an optional number, and still recognize
> it as a "ws":
>
> Except I can well imagine people wanting numbered rules. Drat. Could
> force people to say if they want that, I suppose.
>
> Or we could use some standard delim for that:
>
> which is vaguely reminiscent of our "version" syntax. Indeed, if we
> had quantifications, you might well want to have wildcards and
> let the name be filled in rather than autogenerating a list. But
> maybe we just stick with lists in that case.
>
> For captures of non-alpha assertions, we could say that ? is the same
> as "true" (just as with regular operators), and so -[aeiou]> would
> capture to $. (And one could always do an explicit binding for a
> different name.) Actually, I think people would find $ more
> meaningful than C.

PHP's use of $array[] as "push" might work for this:

-[aeiou]>

or

<@true +-[aeiou]>

or

+-[aeiou]>

or

-[aeiou]>

I like the idea of being able to "continue" versus "chunk" patterns. How do you say "This is a continuation of the other " versus "This is a separate "?

=Austin
Re: S05 question
On Tue, Dec 07, 2004 at 10:36:53PM -0800, Larry Wall wrote:
: But somehow I expect that when someone writes () they probably
: usually meant («foo»).

If we're going to stick with the notion that <foo> captures and something else doesn't, I'm beginning to think that the other thing isn't «foo» for a couple of reasons. First, if other languages are going to borrow this notation, they're probably not going to buy into the French quotes. Second, I can think of several other possible uses for the French quotes to cure perceived ills such as the <(...)> vs <{...}> confusion. Third, it now bothers me to have a ! without a ?.

So what if «foo» is instead written <?foo>, meaning you only want to evaluate its success. (Unlike , it's not zero-width, but that's just how success/failure works.) So we'd get things like

/ $ := [ () = (\N+) ]* /

And people would have to get used to seeing ? as non-capturing assertions:

This has a rather Ruby-esque "I am a boolean" feeling to it. I think I like it. It's pretty easy to type, at least on my keyboard.

Now suppose that we extend that "I am a boolean" feeling to

which might take the place of the confusing <(...)>, and make consistent the notion that we always use {...} to invoke "real" code.

: : Or is it that hypotheticals only bind to things captured by parens?
: : If so, it might need clarification (or perhaps I'm overlooking the part
: : that makes it clear).
:
: No, I think you just found a blind spot in the design.

I think I'm leaning toward the idea that anything in angles that begins alpha is a capture to just the alpha part, so the ? prefix is merely a no-op that happens to make the assertion not start with an alpha. Interestingly, that gives these implicit bindings:

$$`
$ $'

Though that's an argument for changing them to and , I suppose, since if users are going to refer to $ in their main program, it doesn't look like a declarative assertion anymore.
Another problem we've run into is naming if there are multiple assertions of the same name. If the capture name is just the alpha part of the assertion, then we could allow an optional number, and still recognize it as a "ws":

Except I can well imagine people wanting numbered rules. Drat. Could force people to say if they want that, I suppose.

Or we could use some standard delim for that:

which is vaguely reminiscent of our "version" syntax. Indeed, if we had quantifications, you might well want to have wildcards and let the name be filled in rather than autogenerating a list. But maybe we just stick with lists in that case.

For captures of non-alpha assertions, we could say that ? is the same as "true" (just as with regular operators), and so -[aeiou]> would capture to $. (And one could always do an explicit binding for a different name.) Actually, I think people would find $ more meaningful than C.

I'm still thinking about what «...» might mean, if anything. Bonus points for interpolative and/or word-splitty.

Anyway, that's where I am this week/day/hour/minute/second.

Larry