By the way, another target of opportunity is a regex engine that
detects "hard" and "easy" regexes. It would handle most regexes in the
ordinary way, with a conventional regex engine, but hand the hard ones
over to Marpa. This might prove popular because people *want* to do
everything with a regex, and this would let them. It would make a great
Perl extension.
-- jeffrey
On 06/09/2014 10:03 AM, Steven Haryanto wrote:
Thanks for the answer and explanation. I see that the second approach
is about 50% faster on my PC. Although speed-wise it's not on par with
regex for this simple case[*], it's interesting nevertheless and will
be useful in certain cases.
[*] I did a simple benchmark on the string
("a" x 1000) . " 1+2 " . ("a" x 1000).
With the regex search
while ($input =~ /(\d+(\s*\+\s*\d+)*)/g) { ... }
I get around 250k searches/sec. With the two Marpa grammars I get
roughly 200/sec and 300/sec.
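For readers who want to reproduce the regex side of this comparison, here is a minimal sketch using only core Perl and its Benchmark module. The helper name find_exprs is hypothetical, and no timing figures are implied: the reported rate will vary by machine. Note that passing the string into a sub copies it on each call, which adds a little overhead compared with the inline loop above.

```perl
use strict;
use warnings;
use Benchmark qw(timethis);

# The benchmark input: one expression buried in 2000 filler characters.
my $input = ("a" x 1000) . " 1+2 " . ("a" x 1000);

# Regex search, as in the benchmark: collect every "num (+ num)*" match.
sub find_exprs {
    my ($str) = @_;
    my @found;
    while ($str =~ /(\d+(\s*\+\s*\d+)*)/g) {
        push @found, $1;
    }
    return @found;
}

my @matches = find_exprs($input);
print "Found: @matches\n";    # prints "Found: 1+2"

# Run the search repeatedly for at least one CPU second and report the rate.
timethis(-1, sub { my @m = find_exprs($input) });
```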
Regards,
Steven
On Sunday, 08 June 2014 23:24:21 UTC+7, Jeffrey Kegler wrote:
I've done a second version of this <http://scsys.co.uk:8002/391796>,
which I think should be faster and, especially, use less memory.
This technique is, I hope, of wide interest. To do an
"unanchored" search, it uses Marpa's :discard mechanism.
Essentially, it treats strings that are not part of the search
target as whitespace.
The SLIF is quite compact, but an explanation may help. I set the
grammar up to discard any single character:
:discard ~ [\d\D]
Discard will always be the last choice. Even in LATM, the longest
match wins and, since it has length 1, a discard lexeme can at most
tie for the longest match. Whenever there is a non-discard match,
that is preferred. Bottom line: discard always loses unless there is
no other choice.
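To make the tie-breaking rule concrete, here is a toy Perl model of it. This is not Marpa's implementation: it treats every lexeme as acceptable at every position (real LATM only considers lexemes the grammar expects next), and the lexeme names num, plus and discard, as well as the subs choose_lexeme and tokenize, are made up for the illustration. The discard lexeme is deliberately listed first to show that the outcome does not depend on declaration order.

```perl
use strict;
use warnings;

# Toy model of the lexer rule above (not Marpa's code): every lexeme is
# tried at the current position, the longest match wins, and on a tie a
# non-discard lexeme beats a discard lexeme.
my @lexemes = (
    { name => 'discard', re => qr/[\d\D]/, discard => 1 },  # any single char
    { name => 'num',     re => qr/\d+/,    discard => 0 },
    { name => 'plus',    re => qr/\+/,     discard => 0 },
);

sub choose_lexeme {
    my ($str, $pos) = @_;
    my $rest = substr $str, $pos;
    my $best;
    for my $lex (@lexemes) {
        my $re = $lex->{re};
        next unless $rest =~ /\A($re)/;    # match anchored at $pos
        my $text = $1;
        my $len  = length $text;
        if (   !defined $best
            || $len > $best->{len}
            || ($len == $best->{len} && $best->{discard} && !$lex->{discard}))
        {
            $best = { %$lex, len => $len, text => $text };
        }
    }
    return $best;    # always defined: the discard lexeme matches any char
}

sub tokenize {
    my ($str) = @_;
    my @tokens;
    my $pos = 0;
    while ($pos < length $str) {
        my $best = choose_lexeme($str, $pos);
        push @tokens, "$best->{name}($best->{text})" unless $best->{discard};
        $pos += $best->{len};
    }
    return @tokens;
}

print join(' ', tokenize("a 12+3 b")), "\n";   # prints "num(12) plus(+) num(3)"
```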
Then, for your search patterns, you define other lexemes. As just
shown, when they match they will always be preferred. If you want
all matches, you make your top rule
string ::= target+
where <target> is the pattern you are searching for.
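The grammar at the paste link is not reproduced here. As a rough illustration only, here is a minimal sketch of the same idea applied to the expressions from the original question. It assumes Marpa::R2's Scanless interface (Marpa::R2::Scanless::G and Marpa::R2::Scanless::R); the action do_plus, the package My_Actions and the helper find_exprs are hypothetical names chosen for this example. The expr rule is written left-recursively (expr '+' num) rather than as expr '+' expr, so that 1+2+4 has only one parse.

```perl
use strict;
use warnings;
use Marpa::R2;

# Sketch of the :discard search technique (an illustration, not the
# pasted code). Every character is discardable, so anything that is not
# part of a <target> is skipped as if it were whitespace.
my $dsl = <<'END_OF_DSL';
lexeme default = latm => 1
:default ::= action => ::first
:start ::= string
string ::= target+ action => ::array
target ::= expr
expr ::= num
       | expr '+' num action => do_plus
num ~ [\d]+
:discard ~ any_char
any_char ~ [\d\D]
END_OF_DSL

my $grammar = Marpa::R2::Scanless::G->new( { source => \$dsl } );

# Rebuild the expression text, e.g. ("1", "+", "2") -> "1+2".
sub My_Actions::do_plus {
    my ( undef, $left, undef, $right ) = @_;
    return "$left+$right";
}

sub find_exprs {
    my ($input) = @_;
    my $recce = Marpa::R2::Scanless::R->new(
        { grammar => $grammar, semantics_package => 'My_Actions' } );
    $recce->read( \$input );
    my $value_ref = $recce->value() or return;    # undef: no parse
    return @{ ${$value_ref} };                    # one string per target
}

for my $s ( '1+2', 'This is an expression: 1+2, and this is another 1+2+4',
    '1 + 2 is the expression' )
{
    print "$s => ", join( ', ', find_exprs($s) ), "\n";
}
```

One consequence of making every character discardable: text between two pieces of an expression is skipped silently, so an input such as "1+2, +4" would be read as the single expression 1+2+4.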
-- jeffrey
On 06/07/2014 10:48 PM, Steven Haryanto wrote:
Hi all,
I wonder if it's feasible to use Marpa, like a regular expression,
to detect a pattern inside a string. As an example, I'm trying to
extract numeric expressions from these strings:
"1+2"
"This is an expression: 1+2, and this is another 1+2+4"
"1+2 is the expression"
I want to recognize and extract 1+2 and 1+2+4 from the above.
Here's my current (and failing) attempt:
---
use strict;
use warnings;
use feature 'say';
use MarpaX::Simple qw(gen_parser);
my $p = gen_parser(
grammar => <<'_',
lexeme default = latm => 1
:default ::= action=>::first
:start ::= answer
answer ::= expr
| expr any
| any expr action=>get1
| any expr any action=>get1
expr ::= num
| expr '+' expr
num ~ [\d]+
any ~ [\d\D]+
:discard ~ ws
ws ~ [\s]+
_
trace_terminals => 1,
trace_values => 1,
actions => {
get0 => sub { $_[1] },
get1 => sub { $_[2] },
},
);
sub check {
    say "Input: $_[0]";
    say "Output: ", $p->($_[0]);
    say "=" x 20;
}
check('1');
check('1 + 2');
check('1 + 2 is the expression');
check('This is an expression: 1 + 2 and another 1+2+4');
--
Regards,
Steven
--
You received this message because you are subscribed to the
Google Groups "marpa parser" group.
To unsubscribe from this group and stop receiving emails from it,
send an email to marpa-parser...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.