Re: Hypothetical synonyms
Don't forget you can parameterize rules with subrules. I don't see any reason you couldn't write a kind of rule and do whatever you like with the submatched bits. Larry
Re: Hypothetical synonyms
Luke Palmer wrote at Thu, 29 Aug 2002 15:21:57 +0200:

>> The ° character doesn't have any special meaning,
>> that's why I chose it in the above example.
>> However, it also symbolizes a little capturing
>> and as it isn't filled,
>> it could really symbolize an uncapturing.
>
> Interesting idea. I'm not sure if I agree with it yet. However, I don't
> agree with your syntax, as I can't type that character.

Yeah, that's of course a problem. But I can't think of any other typeable character with no other meaning that could be chosen.

> Is it possible to
> modify what was captured?
>
> /" ([ \\ . { chop; chop } | <[^\\]> ]*?) "/
>
> Or is that just too ugly?

IMHO, that looks as ugly as the other workaround solutions :-)

I think the greatest strength of Perl is that it expresses simple things in a simple, short and natural way. Such a regexp behaviour would simplify a lot of jobs where we currently need workarounds for the simple task "Match it, capture the relevant parts and ignore some irrelevant subparts". It's always possible to implement that with

- more captures, joined together later, or
- a substitution regexp/transliteration on the captured part to remove the irrelevant subparts

IMHO it's comparable to the problem "Group it, but don't capture it", which was solved with the (?:) syntax.

In that regard, a (?_...) (question mark, underscore) syntax could also be an idea, with the meaning "Group it, and don't capture it, not even in surrounding captures". With it, the OP's problem would look like:

    /\s*((?_").*?"(?_°)|\S+)/;

(I chose the underscore, as it is typeable and could have the mnemonic meaning of some underlying unimportant background group.)

But perhaps I'm only dreaming.

Cheerio,
Janek
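The second workaround listed above (capture first, then clean up the captured text with a substitution) can be sketched in plain Perl 5 roughly as follows. This is only an illustration under assumptions: the helper name `unquote` and the sample input are made up here, and the pattern is an ordinary Perl 5 regex, not any of the syntaxes proposed in this thread.

```perl
use strict;
use warnings;

# Hypothetical helper: capture the body of a double-quoted string,
# then strip the backslashes from escaped characters in a second step.
sub unquote {
    my ($input) = @_;
    # Body is any run of "backslash + any char" or "neither backslash nor quote".
    return unless $input =~ /"((?:\\.|[^\\"])*)"/;
    # Copy $1 first; the substitution below would otherwise clobber it.
    (my $text = $1) =~ s/\\(.)/$1/g;
    return $text;
}

print unquote('"say \"hi\" now"'), "\n";   # say "hi" now
```

The two-step shape is exactly what the proposal wants to avoid: the match cannot drop the backslashes on its own, so the caller has to post-process the capture.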
Re: Hypothetical synonyms
> The ° character doesn't have any special meaning,
> that's why I chose it in the above example.
> However, it also symbolizes a little capturing
> and as it isn't filled,
> it could really symbolize an uncapturing.

Interesting idea. I'm not sure if I agree with it yet. However, I don't agree with your syntax, as I can't type that character. Is it possible to modify what was captured?

    /" ([ \\ . { chop; chop } | <[^\\]> ]*?) "/

Or is that just too ugly?

Luke
Re: Capturing alternations (was Re: Hypothetical synonyms)
Piers wrote:

> Not exactly DWIM, but how about:
>
>     my $stuff = /^\s* [ "(.*?)" | (\S+) ] : { $foo := $+ }/;
>
> Assuming $+ means 'the last capture group matched' as it does now.

Or just:

    my $stuff = /^\s* [ "$foo:=(.*?)" | $foo:=(\S+) ]/;

BTW, that doesn't actually *do* the match. It merely puts a reference to a rule object into $stuff. Perhaps we all actually meant variants on:

    my $stuff = m/^\s* [ "$0:=(.*?)" | $0:=(\S+) ]/;

???

Damian
Capturing alternations (was Re: Hypothetical synonyms)
In a message dated Thu, 29 Aug 2002, Janek Schleicher writes:

> Aaron Sherman wrote at Wed, 28 Aug 2002 00:34:15 +0200:
>
> > $stuff = (defined($1)?$1:$2) if /^\s*(?:"(.*?)"|(\S+))/;
>
> It gives me the idea of a missing feature:
>
> What really should be expressed is:
>
> my ($stuff) = /^\s*("°.*?"°|\S+)/;
>
> where the ° character would mean,
> "Don't capture the previous element".

Hmm. One thing that has always bothered me about regexes is capturing parentheses in alternations. It seems to me that:

    my ($stuff) = /^\s* [ "(.*?)" | (\S+) ]/;

should DWIM somehow, since it's impossible that both parens will capture. So when the same number of capturing parens appear in each branch of an alternation, they should factor out to being a single return value.

Is this possible in the general case?

Trey
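For comparison only: Perl 5.10 and later have a branch-reset group, `(?|...)`, which gives roughly the "factor out" behaviour asked about here. It is not one of the syntaxes proposed in this thread. A minimal sketch, assuming a perl of at least 5.10 and a made-up helper name `first_field_reset`:

```perl
use 5.010;
use strict;
use warnings;

# Hypothetical helper. Inside (?|...) each alternative numbers its
# captures from the same starting point, so whichever branch matches,
# the field ends up in $1.
sub first_field_reset {
    my ($line) = @_;
    my ($stuff) = $line =~ /^\s*(?|"(.*?)"|(\S+))/;
    return $stuff;
}

say first_field_reset('  "two words" tail');   # two words
say first_field_reset('  single tail');        # single
```

Because both branches share one capture slot, the list assignment receives a single value and no `defined($1) ? $1 : $2` test is needed.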
Re: Hypothetical synonyms
Aaron Sherman wrote at Wed, 28 Aug 2002 00:34:15 +0200:

> $stuff = (defined($1)?$1:$2) if /^\s*(?:"(.*?)"|(\S+))/;

It gives me the idea of a missing feature:

What really should be expressed is:

    my ($stuff) = /^\s*("°.*?"°|\S+)/;

where the ° character would mean, "Don't capture the previous element".

I think that such a meaning of "uncapturing" elements from a regexp would be really nice, as it would help to express things directly, instead of going complicated ways.

The ° character doesn't have any special meaning, that's why I chose it in the above example. However, it also symbolizes a little capturing and as it isn't filled, it could really symbolize an uncapturing.

I don't know how hard it would be to implement or whether it has already been discussed.

Greetings,
Janek
Re: Hypothetical synonyms
On Thu, 29 Aug 2002, Steffen Mueller wrote:

> Nicholas Clark wrote:
> > On Thu, Aug 29, 2002 at 12:00:55AM +0300, Markus Laire wrote:
> >> And I'm definitely going to try any future PerlGolf challenges also
> >> in perl6.
> >
> > Is it considered better if perl6 use more characters than perl5? (ie
> > implying probably less line noise)
> > or less (getting your job done more tersely?)
>
> From the bit of Perl6 information I've gathered from the Apocalypses, the
> Exegesises (is that really the plural? Sounds horrible.), and my

Exegeses (like parentheses)

> perl6-language reading, I'd say Perl6 is not only going to be a bit more
> verbose (unless you use the dreaded "use Perl5;" pragma ;) ), but it'll also
> be a Good Thing.

No, not necessarily. If you do a line-by-line translation, yes. But the fact is, Perl 6 will be able to do more in a single line (cleanly) than Perl 5. For instance, hyper-operators.

So, Perl 6 will contain less line-noise and more whitespace than Perl 5, but code will end up being shorter, too. You can see that in Exegesis 4 (or 3, not sure), where Damian takes Perl5ish Perl6 code, and then writes it back out in idiomatic Perl 6. You see how much shorter it becomes.

Luke
Re: Hypothetical synonyms
On Thu, 29 Aug 2002, Markus Laire wrote:

> (only 32bit numbers, modulo not fully working, no capturing regexps,
> )

Where does modulo break?

/s
Re: Hypothetical synonyms
Nicholas Clark wrote:
> On Thu, Aug 29, 2002 at 12:00:55AM +0300, Markus Laire wrote:
>> And I'm definitely going to try any future PerlGolf challenges also
>> in perl6.
>
> Is it considered better if perl6 use more characters than perl5? (ie
> implying probably less line noise)
> or less (getting your job done more tersely?)

From the bit of Perl6 information I've gathered from the Apocalypses, the Exegesises (is that really the plural? Sounds horrible.), and my perl6-language reading, I'd say Perl6 is not only going to be a bit more verbose (unless you use the dreaded "use Perl5;" pragma ;) ), but it'll also be a Good Thing.

Applying that to Perl Golf, however, isn't possible. It doesn't make sense to ask whether less line noise is better in golf. Anybody who has seen any of the winning solutions should realize that whoever wrote them either used some random string generator or tried to create ASCII art from a color scan of bird droppings.

Maybe I am just a bit frustrated that I had such a hard time understanding some of the solutions. :)

> It would be interesting to see whether there are classes of problems
> that go in different directions.

I guess over 90 percent of problems will be longer; possibly about 60 percent being significantly longer. (Mainly because of the changes of A5.)

Steffen
--
@n=(544290696690,305106661574,116357),$b=16,@c=' ,JPacehklnorstu'=~
/./g;for$n(@n){map{$h=int$n/$b**$_;$n-=$b**$_*$h;$c[@c]=$h}c(0..9);
push@p,map{$c[$_]}@c[c($b..$#c)];$#c=$b-1}print@p;sub'c{reverse @_}
Re: Hypothetical synonyms
On Thu, Aug 29, 2002 at 12:00:55AM +0300, Markus Laire wrote:
> And I'm definitely going to try any future PerlGolf challenges also
> in perl6.

Is it considered better if perl6 use more characters than perl5? (ie implying probably less line noise) or less (getting your job done more tersely?)

It would be interesting to see whether there are classes of problems that go in different directions.

Nicholas Clark
--
Even better than the real thing: http://nms-cgi.sourceforge.net/
Re: Hypothetical synonyms
On Tue, Aug 27, 2002 at 08:59:09PM -0400, Uri Guttman wrote:
> > "LW" == Larry Wall <[EMAIL PROTECTED]> writes:
>
>   LW> On 27 Aug 2002, Uri Guttman wrote:
>   LW> : and quoteline might even default to " for its delim which would make
>   LW> : that line:
>   LW> :
>   LW> : my ($fields) = /(|\S+)/;
>
>   LW> That just looks like:
>
>   LW> my $field = //;
>
> and it would be nice to have a dictionary of builtin rules. :)

    my $data = //;

It would make 1 liners very powerful. How long before someone writes that and ships it with parrot?

And the $64,000 question - will the perl regexp engine be faster than calling expat? Or will they be the same (because the regexp compiler has certain builtin rules that are actually implemented as calls to C code (unless they are over-ridden))?

Nicholas Clark
--
Even better than the real thing: http://nms-cgi.sourceforge.net/
Re: Hypothetical synonyms
On 28 Aug 2002 at 16:04, Steffen Mueller wrote:

> Piers Cawley wrote:
> > Uri Guttman <[EMAIL PROTECTED]> writes:
> >> ... regex code ...
> >
> > Hmm... is this the first Perl 6 golf post?
>
> Well, no, for two reasons:
> a) There's whitespace.
> b) The time's not quite ready for Perl6 golf because Larry's the only one
> who would qualify as a referee.

I think that the time is just right for starting to golf in perl6. Parrot with languages/perl6 already supports a working subset of perl6.

I'm currently trying to get the factorial problem from the last Perl Golf working in perl6, and it has proven to be quite a challenge... (only 32bit numbers, modulo not fully working, no capturing regexps, )

And I'm definitely going to try any future PerlGolf challenges also in perl6.

--
Markus Laire 'malaire' <[EMAIL PROTECTED]>
Re: Hypothetical synonyms
Piers Cawley wrote:
> Uri Guttman <[EMAIL PROTECTED]> writes:
[...]
>> couldn't that be reduced to:
>>
>> m{^\s* $stuff := [ "(.*?)" | (\S+) ] };
>>
>> the | will only return one of the grabbed chunks and the result of
>> the [] group would be assigned to $stuff.
>
> Hmm... is this the first Perl 6 golf post?

Well, no, for two reasons:
a) There's whitespace.
b) The time's not quite ready for Perl6 golf because Larry's the only one who would qualify as a referee. And we all know that's not a recreational task :)

Steffen
--
@n=(544290696690,305106661574,116357),$b=16,@c=' ,JPacehklnorstu'=~
/./g;for$n(@n){map{$h=int$n/$b**$_;$n-=$b**$_*$h;$c[@c]=$h}c(0..9);
push@p,map{$c[$_]}@c[c($b..$#c)];$#c=$b-1}print@p;sub'c{reverse @_}
Re: Hypothetical synonyms
In a message dated 28 Aug 2002, Aaron Sherman writes:

> Ok, just to be certain:
>
> $_ = "0";
> my $zilch = /0/ || 1;
>
> Is $zilch C<"0"> or 8?

8? How do you get 8? You'd get a result object which stringified was "0" and booleanfied was true. So here, you'd get a result object vaguely isomorphic to "0 but true".

> If C<"0">, does it continue to be "true"? What about:
>
> $_ = "0";
> my $zilch = /0/ || 1;
> die "Failed to match zero" unless $zilch;
>
> Is that a bug?

Yes, it's a bug, as I don't see any way to actually die there. I don't understand the presence of the C<|| 1> there. I think you'd just write C. If you really truly wanted it to be one if it failed, but you still wanted the die to work, you'd write:

    $_ = "0";
    my $zilch = /0/ || 1 but false;
    die "Failed to match zero" unless $zilch;

Or, more comprehensibly, just

    $_ = "0";
    my $zilch = /0/ or die "Failed to match zero";

Trey
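To see why the truth value of a matched "0" matters, here is how a plain captured string behaves in Perl 5. This is a sketch of existing Perl 5 semantics only (the `//` operator needs perl 5.10 or later); it says nothing definite about how the Perl 6 result object discussed above will behave, and the variable names are made up for the example.

```perl
use 5.010;
use strict;
use warnings;

# In list context the match returns its captures, so $zilch is the
# plain string "0" here, not a result object.
my ($zilch) = "0" =~ /(0)/;

my $with_or  = $zilch || 1;   # "0" is false, so || falls through to 1
my $with_dor = $zilch // 1;   # "0" is defined, so // keeps it

say "|| gives $with_or";      # || gives 1
say "// gives $with_dor";     # // gives 0
```

A result object that stays true even when it stringifies to "0" would make the `||` form safe, which is the behaviour described in the reply above.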
Re: Hypothetical synonyms
On Wed, 2002-08-28 at 03:23, Trey Harris wrote:

> Note--no parens around $field. We're not "capturing" here, not in the
> Perl 5 sense, anyway.
>
> When a pattern consisting of only a named rule invocation (possibly
> quantified) matches, it returns the result object, which in boolean
> context returns true, but in string context returns the entire captured
> text from the named rule (so, one hopes that the C rule
> captures only the quoted text, not the quotes surrounding it).

Ok, just to be certain:

    $_ = "0";
    my $zilch = /0/ || 1;

Is $zilch C<"0"> or 8?

If C<"0">, does it continue to be "true"? What about:

    $_ = "0";
    my $zilch = /0/ || 1;
    die "Failed to match zero" unless $zilch;

Is that a bug?
Re: Hypothetical synonyms
In a message dated 27 Aug 2002, Uri Guttman writes:

> > "LW" == Larry Wall <[EMAIL PROTECTED]> writes:
>
>   LW> On 27 Aug 2002, Uri Guttman wrote:
>   LW> : and quoteline might even default to " for its delim which would make
>   LW> : that line:
>   LW> :
>   LW> : my ($fields) = /(|\S+)/;
>
>   LW> That just looks like:
>
>   LW> my $field = //;
>
> where is the grabbing there? if there was more than just shellword would
> you have to () it for a grab? wouldn't that assign a boolean like perl5
> or is the boolean result only returned in a boolean context?

Note--no parens around $field. We're not "capturing" here, not in the Perl 5 sense, anyway.

When a pattern consisting of only a named rule invocation (possibly quantified) matches, it returns the result object, which in boolean context returns true, but in string context returns the entire captured text from the named rule (so, one hopes that the C rule captures only the quoted text, not the quotes surrounding it).

I think this is more generalizable. I believe that if one matches an arbitrary rule which does not contain capturing parentheses, it returns the result object as well, which should contain the entire match (as if one put parens around the entire thing). Correct? So:

    my $vers = _ / 6/;

should cause $vers to contain either "6" or "". A successful match object is true in boolean context, so

    my $vers = / \d/;

would cause $vers to be true, even if the digit matched was zero.

Here's an interesting one:

    my $vers = _ / \d/;         # Stringify...
    print "yes!" if $vers;      # ... and booleanize

If $vers contained "0", would it still be true? That is, does the "is true" property of the result object survive stringification? It might be useful if it did. On the other hand, of course, one can also imagine:

    my $flag = _ (/ <[01]>/ or die "No debug setting!");
    print "yes!" if $flag;

where one would want the truth value to follow old conventions.

Perhaps you could write:

    my $flag = / [ 0 :: { $0 is false } | 1 ]/;

But then you have no way short of another string comparison for teasing out the difference between a failed match and a zero match, which is what we were trying to get away from. Maybe I'm just making this too complicated.

> what happens to $field if no match was found? undef? the old boolean
> false of a null string wouldn't be good as that could be the result of a
> match. i assume undef could never be the result of a match unless some
> included perl code returned undef to the match object. then coder emptor
> would be the rule.

If the pattern doesn't match... will it return the undefined value, or will it return a false (and stringwise empty) result object? I could see it going either way, but a failed pattern result object is fairly useless, isn't it?

> this is gonna make all the groups that copied perl5 regexes blow their
> lids. just think about all the neat canned regexes that will be
> done. like Regex::Common but even more so. we will need a CPAN just for
> these alone. full blown *ML parsers, email verifiers, formatted data
> extractors, etc.

More and more lately, I've been finding myself getting syntax errors when I've wishfully put Perl 6 into my code. :-)

Trey
Re: Hypothetical synonyms
> "LW" == Larry Wall <[EMAIL PROTECTED]> writes:

  LW> On 27 Aug 2002, Uri Guttman wrote:
  LW> : and quoteline might even default to " for its delim which would make
  LW> : that line:
  LW> :
  LW> : my ($fields) = /(|\S+)/;

  LW> That just looks like:

  LW> my $field = //;

where is the grabbing there? if there was more than just shellword would you have to () it for a grab? wouldn't that assign a boolean like perl5 or is the boolean result only returned in a boolean context?

what happens to $field if no match was found? undef? the old boolean false of a null string wouldn't be good as that could be the result of a match. i assume undef could never be the result of a match unless some included perl code returned undef to the match object. then coder emptor would be the rule.

and it would be nice to have a dictionary of builtin rules. :)

also i assume i was correct in that we won't need CORE:: for those? unless something we inherit had the same name and we wanted the CORE:: version.

this is gonna make all the groups that copied perl5 regexes blow their lids. just think about all the neat canned regexes that will be done. like Regex::Common but even more so. we will need a CPAN just for these alone. full blown *ML parsers, email verifiers, formatted data extractors, etc.

uri

--
Uri Guttman -- [EMAIL PROTECTED] http://www.stemsystems.com
- Stem and Perl Development, Systems Architecture, Design and Coding
Search or Offer Perl Jobs http://jobs.perl.org
Re: Hypothetical synonyms
On 27 Aug 2002, Uri Guttman wrote:
: and quoteline might even default to " for its delim which would make
: that line:
:
: my ($fields) = /(|\S+)/;

That just looks like:

    my $field = //;

Larry
Re: Hypothetical synonyms
On 27 Aug 2002, Uri Guttman wrote:
: > "LW" == Larry Wall <[EMAIL PROTECTED]> writes:
:   LW> m{^\s*[
:   LW>     "$stuff:=(.*?)" |
:   LW>     $stuff:=(\S+)
:   LW> ]};
:
: couldn't that be reduced to:
:
:     m{^\s* $stuff := [ "(.*?)" | (\S+) ] };
:
: the | will only return one of the grabbed chunks and the result of the
: [] group would be assigned to $stuff.

That too.

Larry
Re: Hypothetical synonyms
> "TH" == Trey Harris <[EMAIL PROTECTED]> writes:

  TH> In a message dated 27 Aug 2002, Uri Guttman writes:
  >> m{^\s* $stuff := [ "(.*?)" | (\S+) ] };

  TH> Or, how about

  TH> my ($fields) = /( '"')>|\S+)/;

wouldn't quotelike automatically be inherited from the CORE:: rules like UNIVERSAL is? i have seen and others mentioned as not being hardwired builtins but just rules declared elsewhere and inherited.

and quoteline might even default to " for its delim which would make that line:

    my ($fields) = /(|\S+)/;

uri

--
Uri Guttman -- [EMAIL PROTECTED] http://www.stemsystems.com
- Stem and Perl Development, Systems Architecture, Design and Coding
Search or Offer Perl Jobs http://jobs.perl.org
Re: Hypothetical synonyms
In a message dated 27 Aug 2002, Uri Guttman writes:
> m{^\s* $stuff := [ "(.*?)" | (\S+) ] };

Or, how about

    my ($fields) = /( '"')>|\S+)/;

? :-)

Trey
Re: Hypothetical synonyms
> "LW" == Larry Wall <[EMAIL PROTECTED]> writes:

  LW> That seems like a lot of extra work. I'd prefer to see something like:

  LW> my $stuff;
  LW> m{^\s*[
  LW>     "$stuff:=(.*?)" |
  LW>     $stuff:=(\S+)
  LW> ]};

couldn't that be reduced to:

    m{^\s* $stuff := [ "(.*?)" | (\S+) ] };

the | will only return one of the grabbed chunks and the result of the [] group would be assigned to $stuff.

uri

--
Uri Guttman -- [EMAIL PROTECTED] http://www.stemsystems.com
- Stem and Perl Development, Systems Architecture, Design and Coding
Search or Offer Perl Jobs http://jobs.perl.org
Re: Hypothetical synonyms
On 27 Aug 2002, Aaron Sherman wrote:
: I just wrote this code in Perl5:
:
:     $stuff = (defined($1)?$1:$2) if /^\s*(?:"(.*?)"|(\S+))/;
:
: This is a common practice for me when I parse configuration and data
: files whose formats I define. It's nice to be able to quote fields that
: have spaces, and this is an easy way to parse the result.
:
: In Perl6, it looks like what I would like here is very close, but I'm
: not sure. Certainly, I could do:
:
:     $stuff = ($1 // $2) if m{^\s*["(.*?)"|(\S+)]};
:
: But I would far prefer:
:
:     $stuff = $field if m{^\s*[
:         "(.*?)" {let $field=$1} |
:         (\S+) {let $field=$2}]};
:
: even though it's longer.

That seems like a lot of extra work. I'd prefer to see something like:

    my $stuff;
    m{^\s*[
        "$stuff:=(.*?)" |
        $stuff:=(\S+)
    ]};

: Is this possible, or does the underlying implementation of hypothetical
: variables pretty much rule it out?

I don't see any particular reason why a top-level regex can't refer to variables in the surrounding scope, either by default, or via a :modifier of some sort. It's only down in the sub-rules that we have to make sure there's a hash to poke such hypotheticals into.

Larry
Hypothetical synonyms
I just wrote this code in Perl5:

    $stuff = (defined($1)?$1:$2) if /^\s*(?:"(.*?)"|(\S+))/;

This is a common practice for me when I parse configuration and data files whose formats I define. It's nice to be able to quote fields that have spaces, and this is an easy way to parse the result.

In Perl6, it looks like what I would like here is very close, but I'm not sure. Certainly, I could do:

    $stuff = ($1 // $2) if m{^\s*["(.*?)"|(\S+)]};

But I would far prefer:

    $stuff = $field if m{^\s*[
        "(.*?)" {let $field=$1} |
        (\S+) {let $field=$2}]};

even though it's longer.

Is this possible, or does the underlying implementation of hypothetical variables pretty much rule it out?
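For readers who want to try the Perl 5 idiom that opens this thread, a minimal runnable sketch follows. The regex is the one quoted above; the wrapper sub `first_field` and the sample lines are assumptions added purely for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: return the first field of a line, quoted or bare.
sub first_field {
    my ($line) = @_;
    # $1 is set by the quoted branch, $2 by the bare-word branch;
    # exactly one of them is defined after a successful match.
    if ($line =~ /^\s*(?:"(.*?)"|(\S+))/) {
        return defined($1) ? $1 : $2;
    }
    return;    # no field found
}

print first_field('  "hello world" rest'), "\n";   # hello world
print first_field('  bare rest'), "\n";            # bare
```

The `defined($1) ? $1 : $2` test is the part the rest of the thread tries to eliminate, since only one branch of the alternation can ever capture.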