Re: Finding a parse inside a (potentially long) string?

2016-12-23 Thread hovercraft-google

Yes, this is exactly what I do, and it works.

> IMO, just make sure your BNF produces a parser that parses all the inputs
> correctly and then use parens to weed out the uninteresting parts as needed.

-- 
You received this message because you are subscribed to the Google Groups 
"marpa parser" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to marpa-parser+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Re: Finding a parse inside a (potentially long) string?

2016-12-23 Thread Ruslan Shvedov

IMO, just make sure your BNF produces a parser that parses all the inputs
correctly and then use parens to weed out the uninteresting parts as needed.

Re: Finding a parse inside a (potentially long) string?

2016-12-22 Thread Ron Savage

Glad to help. But as for the uninteresting parts, my fear is that you'll 
need to more-or-less define them just to be able to tell Marpa what to 
skip. I wonder if using priorities would be best: telling Marpa the 
interesting parts have high priority and the other parts have low priority 
should work. Perhaps the rules to match the uninteresting parts don't have 
to be too precise, but they must be sufficiently precise so as to stop 
Marpa matching them to the patterns of the interesting parts.

Re: Finding a parse inside a (potentially long) string?

2016-12-22 Thread hovercraft-google

In my previous post I accidentally misused the term DSL - I just meant 
'parsers', sorry.

Re: Finding a parse inside a (potentially long) string?

2016-12-22 Thread hovercraft-google

Thank you very much for your attention and references. Actually, I have 
already seen some of it and continue to study it. In essence, what I am 
looking for is the most efficient development tool for DSLs in several 
application areas. The specific point is that these kinds of DSLs don't 
require strict parsing of the entire input. Instead, they must discover a 
known distinctive pattern, like the global structure of a document, and 
extract only pieces of information based on it. A simple pattern search is 
not helpful there. I need a BNF grammar, but it should be able to bypass 
areas of 'not interesting' input. Naturally, writing a strict grammar for 
those uninteresting parts would be overkill. 
My current approach with Marpa: I just make Marpa believe it parses the 
entire input by substituting simplified dummy parts of the grammar for the 
uninteresting places, and I find it more elegant than making Perl code 
exits.
But I think Marpa could have special means to work with such grammars.
Regards,
Al

Re: Finding a parse inside a (potentially long) string?

2016-12-21 Thread Ron Savage

There are many Perl packages to study at 
http://savage.net.au/Marpa.html#Perl_Packages.

In particular, Text::Balanced::Marpa and Text::Delimited::Marpa have the 
concept of delimited strings, where you may define the delimiters to be 
anything.

Also, GraphViz2::Marpa contains 2 BNFs, one for the overall language (DOT) 
and one for its embedded HTML-like language (which is not HTML).

Re: Finding a parse inside a (potentially long) string?

2016-12-21 Thread Jeffrey Kegler

There is also this:
http://jeffreykegler.github.io/Ocean-of-Awareness-blog/individual/2012/11/pattern_search.html
It has several problems -- it uses external lexing, won't set speed
records, and is written for an older version of Marpa::R2, so it may
need to be translated to run.

In Marpa::R3 I will be adding "eager" lexemes and other features that will
be helpful with this.

Re: Finding a parse inside a (potentially long) string?

2016-12-21 Thread hovercraft-google

Well, such is the fate of those examples :) I have a generic question: 
what is the easiest way to skip part of the input stream until some keyword 
or lexeme appears? Is it possible to do this in Marpa without using an 
external lexer and/or events? Maybe putting an undesired_part symbol inside 
parentheses will work, something like 
some_rule ::= desired_part ( undesired_part ) end_mark_symbol? 
But how to define this undesired_part lexeme?
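
One way the parenthesized-symbol idea could look is sketched below. This is 
only an illustration under assumptions, not code from the thread: the input 
format ("field: text;"), the symbol names entries, entry, field_name and 
filler, and the helper entry_names are all made up for the example. The 
uninteresting part is a loose lexeme that swallows everything up to the end 
mark, and the parentheses hide it from the semantics. It relies on LATM so 
that filler is only tried where the grammar expects it.

```perl
use strict;
use warnings;
use Marpa::R2;

# Hypothetical input format: "field: anything at all;" repeated.
my $dsl = <<'END_OF_DSL';
:default ::= action => [values]
lexeme default = latm => 1

entries ::= entry+
# Parenthesized symbols are parsed but hidden from the semantics.
entry   ::= field_name (':') (filler) (';')

field_name ~ [\w]+
# Deliberately imprecise: everything up to the end mark.
filler     ~ [^;]+

:discard   ~ whitespace
whitespace ~ [\s]+
END_OF_DSL

my $grammar = Marpa::R2::Scanless::G->new({ source => \$dsl });

# Return the field names, ignoring the filler text after each one.
sub entry_names {
    my ($input) = @_;
    my $recce = Marpa::R2::Scanless::R->new({ grammar => $grammar });
    $recce->read(\$input);
    my $value_ref = $recce->value();
    return () if not defined $value_ref;
    return map { $_->[0] } @{ ${$value_ref} };
}

my $text = "title: some long text we do not care about;\n"
         . "author: more ignorable text;\n";
print join(", ", entry_names($text)), "\n";
```

The filler lexeme only has to be precise enough not to run past the end 
mark; it does not describe the structure of what it skips.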

Re: Finding a parse inside a (potentially long) string?

2016-12-20 Thread Jeffrey Kegler

Looks like I decided to use a pasting service for my examples.  Bad
decision.  Either it no longer exists or they expired them.  I don't know
of a way to recover them.

Sorry, jeffrey

On Mon, Dec 19, 2016 at 9:55 PM,  wrote:

> This is quite an old topic, but in my opinion the use case discussed there
> is very important. Unfortunately, the links to both examples are broken now.
> Could somebody fix them, please?

Re: Finding a parse inside a (potentially long) string?

2016-12-19 Thread hovercraft

This is quite an old topic, but in my opinion the use case discussed there 
is very important. Unfortunately, the links to both examples are broken now. 
Could somebody fix them, please?

Re: Finding a parse inside a (potentially long) string?

2014-06-09 Thread Ron Savage


>
> I grew up in a world where BNF was better known than regular
> expressions.  That world is like bell-bottom jeans, with one difference: 
> bell-bottoms might come back. 
>

They have, Jeffrey, a couple of years ago :-(.
 

Re: Finding a parse inside a (potentially long) string?

2014-06-09 Thread Jeffrey Kegler

I agree with Christopher's post just about word-for-word, but with one
exception, which is this:

Wishing Marpa's audience is something other than it is, is pointless.
Marpa has to be presented to the programming profession we have, not the
one we'd like, even if the one we'd like is arguably better.

And you might view a regex processor which internally picks the engine
for you (that is, Marpa vs. regular expression) as like the change from
manual shift to automatic in cars. I avoided buying a manual up to the
point where it'd cost me serious $$ to stay manual. But at this point I
find the automatic makes fairly reasonable decisions, and I may be
better off watching the road than the gearbox. Most of my fellow
drivers don't know how to manual shift. Some don't even know what a
gearbox does. But are they really worse drivers than the ones I grew up
with?

I grew up in a world where BNF was better known than regular
expressions. That world is like bell-bottom jeans, with one difference:
bell-bottoms might come back.

-- jeffrey

Re: Finding a parse inside a (potentially long) string?

2014-06-09 Thread Christopher Layne

Personally I think people need to know when not to use a regex and when to 
switch to a proper grammar/parser. I've been in that boat myself and made the 
mistake more than a few times.

When the regex itself is actually more complicated and difficult to understand 
vs the grammar, the value of continuing to use it is gone. At that point it's 
unlikely the particular regex will even outperform a grammar based solution as 
it is. Self-forcing of regex in all cases because it's familiar is pretty 
irrational but probably a consequence of the vast majority of people using them 
not having a lot of experience with proper parsers. At the end of the day, 
we're all writing programmatic scrolls for little state machines as encoded by 
the particular language chosen (regex, grammar, etc) but the language chosen to 
write them should be sane. Why are regexes chosen so often? Familiarity and 
a false sense of programmer efficiency. Any non-trivial regex eventually turns 
into a pretty ridiculously large pattern with multiple alternatives that 
wouldn't even be readable without an /x flag. Individuals keep adding more to 
said patterns, all the while just sinking costs into something where they 
should just stop the madness and switch to a classic grammar/parser approach.

From a technical perspective, when multiple alternative, but valid, patterns 
show up that require stateful logic is when grammars should be considered. The 
splitting of rules vs tokens as a generalized parsing approach is quite clean 
from an abstraction POV as well. We have rules, they define the way something 
should look and the order of elements that fit into the rules. We have tokens, 
they define what something actually is as coming from a sequence of 
bits/bytes. In a sense, such languages separate "code" from "data" and will 
always win from a maintainability standpoint because the approach is 
inherently structured, organized, and with less baked in data.

It also helps that in most of the non-trivial cases they're usually faster too.

Re: Finding a parse inside a (potentially long) string?

2014-06-09 Thread Durand Jean-Damien

Definitely ;-)

Btw, there is a full regexp engine written with Marpa (well, the JavaScript 
one, which differs from Perl's in quite subtle ways) here.

JD.

Re: Finding a parse inside a (potentially long) string?

2014-06-09 Thread Jeffrey Kegler

By the way, another target of opportunity is a regex engine which 
detects "hard" and "easy" regexes.  Most regexes it would handle in the 
ordinary way, with a regex engine, but the hard ones it hands over to 
Marpa.  This might prove popular because people *want* to do everything 
with a regex.  This would allow them to.  It'd make a great Perl extension.


-- jeffrey

Re: Finding a parse inside a (potentially long) string?

2014-06-09 Thread Jeffrey Kegler

Where the search target *is* a regular expression, Marpa will never be 
competitive with regexes.  But regexes get used for a lot of things 
which are NOT regular expressions, and on these Marpa can and does win.


I've used matching parentheses as an example.  These are not regular 
expressions, but regexes get used for them anyway.  And in "easy" cases, 
regexes still win.  But in "hard" cases Marpa is 10x faster or more.  I 
did a detailed write-up on my blog twice: the 2nd version is here. 
Basically, the story is that when the regex has to do back-tracking, 
Marpa wins.  Marpa does all its parsing without back-tracking.
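
To make the "not regular, but regexes get used anyway" point concrete, here 
is a small sketch of how Perl regexes are typically stretched to cover 
balanced parentheses. It uses the recursive subpattern extension (?1), which 
goes beyond regular expressions in the formal sense. The helper name 
is_balanced_group is made up for this illustration, and the example says 
nothing about relative speed.

```perl
use strict;
use warnings;

# True if $text is exactly one balanced parenthesized group.
# Group 1 matches '(' ... ')' whose body is a mix of non-paren runs
# and nested occurrences of group 1 itself, via the (?1) recursion.
sub is_balanced_group {
    my ($text) = @_;
    return $text =~ /\A ( \( (?: [^()]++ | (?1) )* \) ) \z/x ? 1 : 0;
}

for my $sample ('(a(b)c)', '((()))', '(a(b)c', '(a))') {
    printf "%-10s %s\n", $sample,
        is_balanced_group($sample) ? 'balanced' : 'not balanced';
}
```

Taking strings and comments into account, as suggested for a real 
programming language, would make a pattern like this considerably harder 
to read, which is where a grammar starts to look attractive.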


Interesting applications for Marpa pattern searching might be things 
like finding unmatched parens, brackets, etc. in a programming language, 
taking into account strings, comments, etc.  You can't do that with a 
pure regular expression and a regex will be unreadable and slow.


You can think of it as a tortoise and hare thing.  Marpa's a good steady 
predictable tortoise, and it will win if the course is difficult.  But 
for a simple regular expression, pick the hare.


-- jeffrey

Re: Finding a parse inside a (potentially long) string?

2014-06-09 Thread Steven Haryanto

Thanks for the answer and explanation. I see that the second approach is 
about 50% faster on my PC. Although speed-wise it's not on par with regex 
for this simple case[*], it's interesting nevertheless and will be useful 
in certain cases.

*) Did a simple benchmark for string: ("a" x 1000) . " 1+2 " . ("a" x 
1000). With regex search: while ($input =~ /(\d+(\s*\+\s*\d+)*)/g) { ... } 
I get around 250k searches/sec. With the two Marpa grammars I get roughly 
200/sec and 300/sec.
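
For anyone who wants to repeat the regex side of this measurement, a minimal 
sketch follows. It is an assumption about how such a benchmark could be set 
up, not the script that produced the numbers above: the helper name 
find_expressions and the iteration count are made up, the inner group is made 
non-capturing, and the core Benchmark module does the timing.

```perl
use strict;
use warnings;
use Benchmark qw(timethis);

# Return every additive expression (e.g. "1+2" or "1 + 2 + 4") in $input.
sub find_expressions {
    my ($input) = @_;
    my @found;
    while ($input =~ /(\d+(?:\s*\+\s*\d+)*)/g) {
        push @found, $1;
    }
    return @found;
}

# The benchmark string from the post: one expression buried in filler.
my $input = ("a" x 1000) . " 1+2 " . ("a" x 1000);
print join(", ", find_expressions($input)), "\n";

# Time a fixed number of full scans of the benchmark string.
timethis(20_000, sub { my @hits = find_expressions($input) });
```

Absolute numbers will differ from machine to machine, so only the ratio 
between the regex and the Marpa versions is worth comparing.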

Regards,
Steven


Re: Finding a parse inside a (potentially long) string?

2014-06-08 Thread Jeffrey Kegler

I've done a 2nd version of this, which I 
think should be faster and, especially, take less memory.


This technique is, I hope, of wide interest.  To do an "unanchored" 
search, it uses Marpa's :discard mechanism.  Essentially, it treats 
strings that are not part of the search target as whitespace.


The SLIF is quite compact, but an explanation may help.  I set the 
grammar up to discard any single character, that is, a lexeme of length 1:

:discard ~ [\d\D]

Discard will always be the last choice.  Even in LATM, longest match 
wins and, with length 1, a discard lexeme can at worst tie for longest 
match.  Whenever there is a non-discard match, that is preferred.  
Bottom line: discard always loses, unless there is no other choice.


Then, for your search patterns, you define other lexemes.  As just 
shown, when they match they will always be preferred.  If you want all 
matches, you make your top rule

string ::= target+

where target is the pattern you are searching for.
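
Since the pasted example is no longer available, here is a minimal sketch of 
how the technique described above might be put together for the numeric 
expressions in the question. It is a reconstruction under assumptions, not 
the original code: the symbol names targets, more_terms, plus_term and 
opt_ws and the helper find_targets are made up, and the search pattern is 
defined entirely as a lexeme.

```perl
use strict;
use warnings;
use Marpa::R2;

my $dsl = <<'END_OF_DSL';
:default ::= action => [values]
lexeme default = latm => 1

targets ::= target+

# The search pattern lives entirely in the lexer.
target     ~ number more_terms
more_terms ~ plus_term*
plus_term  ~ opt_ws '+' opt_ws number
number     ~ [\d]+
opt_ws     ~ [\s]*

# Every other character is discarded, one at a time.
:discard ~ [\d\D]
END_OF_DSL

my $grammar = Marpa::R2::Scanless::G->new({ source => \$dsl });

# Return all target lexemes found in $input (empty list if none).
sub find_targets {
    my ($input) = @_;
    my $recce = Marpa::R2::Scanless::R->new({ grammar => $grammar });
    $recce->read(\$input);
    my $value_ref = $recce->value();
    return defined $value_ref ? @{ ${$value_ref} } : ();
}

print join(" | ", find_targets('This is an expression: 1 + 2 and another 1+2+4')), "\n";
```

Because the top rule requires at least one target, an input with no match 
yields no parse, which the helper reports as an empty list.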

-- jeffrey

On 06/07/2014 10:48 PM, Steven Haryanto wrote:

Hi all,

I wonder if it's feasible to use Marpa, like a regular expression, to 
detect some pattern inside a string. An example of what I'm trying to 
do is to extract some numeric expressions from these strings:


"1+2"
"This is an expression: 1+2, and this is another 1+2+4"
"1+2 is the expression"

I want to recognize and extract 1+2 and 1+2+4 from the above. Here's 
my current (and failing) attempt:


---
use feature 'say';    # needed for say() in check() below
use MarpaX::Simple qw(gen_parser);

my $p = gen_parser(
    grammar => <<'_',
lexeme default  = latm => 1
:default      ::= action=>::first
:start        ::= answer

answer        ::= expr
                | expr any
                | any expr        action=>get1
                | any expr any    action=>get1

expr          ::= num
                | expr '+' expr

num             ~ [\d]+

any             ~ [\d\D]+

:discard        ~ ws
ws              ~ [\s]+
_
    trace_terminals => 1,
    trace_values => 1,
    actions => {
        get0 => sub { $_[1] },
        get1 => sub { $_[2] },
    },
);

sub check { say "Input: $_[0]"; say "Output: ", $p->($_[0]); say "=" x 20 }


check('1');
check('1 + 2');
check('1 + 2 is the expression');
check('This is an expression: 1 + 2 and another 1+2+4');
--

Regards,
Steven

Re: Finding a parse inside a (potentially long) string?

2014-06-08 Thread Jeffrey Kegler

I've pasted an example that passes your test cases.  It's just one of 
the several possible approaches.  This one works by moving the 
expressions down into the lexer, and searching for them as lexemes.


I've thought, and written in my blog, a lot on the topic of 
Marpa-powered searches.  But it was long ago and at that point there 
wasn't much interest, so I moved on.


-- jeffrey

