A grammar for quoted strings with escaped chars

2014-09-22 Thread Ron Savage
I've developed a grammar (with help from various people of course) for 
quoted strings: http://scsys.co.uk:8002/424926

Requirements:

o Strings must be quoted

o Strings are either single or double quoted

o The escape character is \

o If the string is single quoted, internal single quotes must be escaped

o If the string is double quoted, internal double quotes must be escaped

o Any other character may be escaped

o If a character is escaped, the escape character is preserved in the output

o Empty strings are accepted

ToDo: Make it work with utf8.

Does anyone see problems, or other input strings which should be tested?
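
One way to probe the requirements above with extra input strings, independent of Marpa, is a plain regex check. This is only a sketch: the regex is a stand-in for the grammar rather than the grammar itself, and the name parse_quoted and the sample list are assumptions made for illustration.

```perl
use strict;
use warnings;

# Assumption: a regex stand-in for the grammar, not the Marpa grammar itself.
# A string is a quote, then any run of ordinary characters or
# backslash-escaped characters, then the same kind of quote.
my $quoted = qr/\A(?:"((?:[^"\\]|\\.)*)"|'((?:[^'\\]|\\.)*)')\z/s;

# Returns the string body with escape characters preserved, or undef
# if the input is not a well-formed quoted string.
sub parse_quoted {
    my ($input) = @_;
    return undef unless $input =~ $quoted;
    return defined $1 ? $1 : $2;
}

# Hypothetical test inputs: empty strings, escaped quotes, the other kind
# of quote left unescaped, an escaped backslash, and several malformed cases.
my @samples = (
    q{""}, q{''}, q{"a \"b\" c"}, q{'it\'s'}, q{"it's"}, q{"a\\\\"},
    q{"unterminated}, q{"bad " quote"}, q{bare}, q{"trailing\\"},
);

for my $s (@samples) {
    my $body = parse_quoted($s);
    printf "%-16s => %s\n", $s, defined $body ? "[$body]" : 'rejected';
}
```

The last sample ends in an escaped quote, so it has no closing quote and should be rejected; cases like that are easy to miss when testing by hand.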

Jeffrey: This is one of the plug-in grammars Jean-Damien and I talked about 
recently. Any chance you can implement:

my $source = <<'END_OF_GRAMMAR';
...
:include /my/grammars/quoted.strings.bnf
...
END_OF_GRAMMAR

to include a suitable[*] grammar in situ within a grammar declaration?

[*] Obviously, here that just means the prefix:

:default ::= action => [values]

lexeme default = latm => 1 # Longest Acceptable Token Match.

:start ::= string_token

and the suffix:

# Boilerplate.

:discard ~ whitespace
whitespace ~ [\s]+

END_OF_GRAMMAR

would not be present in the include file.
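
Until something like this exists in Marpa, the include could be approximated by expanding the grammar text before it is handed to the parser. The following is a minimal sketch under that assumption: expand_includes is a hypothetical helper, and ":include" is the directive proposed above, not an existing Marpa feature.

```perl
use strict;
use warnings;

# Hypothetical helper: textually replaces each ":include <path>" line with
# the contents of the named file. Included files may themselves contain
# ":include" lines; the depth limit guards against include cycles.
sub expand_includes {
    my ($source, $depth) = @_;
    $depth //= 0;
    die "Includes nested too deeply\n" if $depth > 10;

    my @out;
    for my $line (split /\n/, $source) {
        if ($line =~ /\A\s*:include\s+(\S+)\s*\z/) {
            my $path = $1;
            open my $fh, '<', $path or die "Cannot open $path: $!\n";
            my $included = do { local $/; <$fh> } // '';
            close $fh;

            # Expand nested includes, then drop the trailing newline so
            # the join below does not produce an extra blank line.
            my $expanded = expand_includes($included, $depth + 1);
            chomp $expanded;
            push @out, $expanded;
        }
        else {
            push @out, $line;
        }
    }
    return join("\n", @out) . "\n";
}
```

The expanded text would then be passed to the grammar constructor in place of the original source, with the prefix and suffix kept in the including file as described above.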

-- 
You received this message because you are subscribed to the Google Groups 
marpa parser group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to marpa-parser+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: A grammar for quoted strings with escaped chars

2014-09-22 Thread Christopher Layne
I've posted some things previously on this topic - but in short, you don't 
really need to use events to do this. It's possible to do it in a 
semi-straightforward fashion without a lot of jumping through hoops (just a 
bunch of rules).

Here are some grammar fragments demonstrating what I'm talking about. They 
handle single-quoted, double-quoted, and quote-like parsing (e.g. q%%, q||), 
efficiently handling simple quoted strings that have no escape sequences 
while falling back to an escape-aware mode when they're present.

my ($dsl, $grammar) = <<'===';

:default ::= action => [values]
lexeme default = latm => 1

[...]

# Normal, bare, unquoted
value   ::= value_n
value_n ::= valword_n

# Quoted but not escaped                                # reassemble action
value       ::= value_qd                                action => val_qd
              | value_qs                                action => val_qs
              | value_ql0                               action => val_ql0
              | value_ql1                               action => val_ql1
value_qd    ::= valword_qd
value_qs    ::= valword_qs
value_ql0   ::= valword_ql0
value_ql1   ::= valword_ql1

# Quoted and escaped                                    # reassemble action
value       ::= (g_quote_d) value_eqd (g_quote_d)       action => val_eqd
              | (g_quote_s) value_eqs (g_quote_s)       action => val_eqs
              | (g_quote_ls0) value_eql0 (g_quote_le0)  action => val_eql0
              | (g_quote_ls1) value_eql1 (g_quote_le1)  action => val_eql1
value_eqd   ::= valword_eqd*
value_eqs   ::= valword_eqs*
value_eql0  ::= valword_eql0*
value_eql1  ::= valword_eql1*

# Normal, bare, unquoted
valword_n       ~ valword_n_c
valword_n_c     ~ [\w_\@:.\/\*-]+

# Quoted but not escaped
valword_qd      ~ quote_d valword_qd_c quote_d
valword_qs      ~ quote_s valword_qs_c quote_s
valword_ql0     ~ quote_ls0 valword_ql0_c quote_le0
valword_ql1     ~ quote_ls1 valword_ql1_c quote_le1
valword_qd_c    ~ [^"\\]*
valword_qs_c    ~ [^'\\]*
valword_ql0_c   ~ [^|\\]*
valword_ql1_c   ~ [^%\\]*

# Quoted and escaped
valword_eqd     ~ valword_eqd_c
valword_eqs     ~ valword_eqs_c
valword_eql0    ~ valword_eql0_c
valword_eql1    ~ valword_eql1_c
valword_eqd_c   ~ [^"] | whitespace | escape ["]
valword_eqs_c   ~ [^'] | whitespace | escape [']
valword_eql0_c  ~ [^|] | whitespace | escape [|]
valword_eql1_c  ~ [^%] | whitespace | escape [%]

# These do translation, but cannot be enabled yet as the expectation is no translation.
# valword_eqd ~ [^\a\b\e\f\r\n\t\\"] | whitespace | escape valword_esc
# valword_eqs ~ [^\a\b\e\f\r\n\t\\'] | whitespace | escape valword_esc
# valword_esc ~ [abefrnt\\']

# The same base lexemes cannot be directly used by both the lexer and grammar *at the same time*.
# Work around it by providing wrapper lexeme rules for the grammar which end up at the same terminal.
g_quote_d       ~ quote_d
g_quote_s       ~ quote_s
g_quote_ls0     ~ quote_ls0
g_quote_le0     ~ quote_le0
g_quote_ls1     ~ quote_ls1
g_quote_le1     ~ quote_le1

quote_d         ~ ["]
quote_s         ~ [']
quote_ls0       ~ 'q|'
quote_le0       ~ '|'
quote_ls1       ~ 'q%'
quote_le1       ~ '%'
escape          ~ '\'

:discard        ~ whitespace
whitespace      ~ [\s]+
===

# Deescaping table
my $xtab = {
    'eqd'  => { q(\") => qq(") },
    'eqs'  => { q(\') => qq(') },
    'eql0' => { q(\|) => qq(|) },
    'eql1' => { q(\%) => qq(%) },

#   # Not presently used.
#   'eqx'  => {
#       q(\a)   => qq(\a),
#       q(\b)   => qq(\b),
#       q(\e)   => qq(\e),
#       q(\f)   => qq(\f),
#       q(\n)   => qq(\n),
#       q(\r)   => qq(\r),
#       q(\t)   => qq(\t),
#       q(\")   => qq("),
#       q(\')   => qq('),
#       q()     => qq(\\),
#   },
};

# Deescaping functions
sub val_eqd  { return [ join '', map +($xtab->{'eqd'}{$_} || $_), @{$_[1]} ] }
sub val_eqs  { return [ join '', map +($xtab->{'eqs'}{$_} || $_), @{$_[1]} ] }
sub val_eql0 { return [ join '', map +($xtab->{'eql0'}{$_} || $_), @{$_[1]} ] }
sub val_eql1 { return [ join '', map +($xtab->{'eql1'}{$_} || $_), @{$_[1]} ] }
#sub val_eqx  { return [ join '', map +($xtab->{'eqx'}{$_} || $_), @{$_[1]} ] }

# Dequoting functions
sub val_qd  { return [ substr($_[1]->[0], 1, -1) ] }
sub val_qs  { return [ substr($_[1]->[0], 1, -1) ] }
sub val_ql0 { return [ substr($_[1]->[0], 2, -1) ] }
sub val_ql1 { return [ substr($_[1]->[0], 2, -1) ] }
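
To see what these action functions compute, they can be called directly with hand-built arguments. This is a sketch only: the token lists below are assumed stand-ins for what the lexer would deliver, and undef takes the place of the per-parse argument that Marpa passes first. Only the double-quote variants are repeated here so the snippet runs on its own.

```perl
use strict;
use warnings;

# Deescaping table, double-quoted case only.
my $xtab = { 'eqd' => { q(\") => qq(") } };

# Same bodies as the corresponding functions above.
sub val_eqd { return [ join '', map +($xtab->{'eqd'}{$_} || $_), @{$_[1]} ] }
sub val_qd  { return [ substr($_[1]->[0], 1, -1) ] }

# Escaped path: the surrounding quotes are hidden by the parentheses in
# the rule, so the action sees only the inner tokens (assumed values).
my $escaped = val_eqd(undef, [ 'a', ' ', q(\"), 'b', q(\"), ' ', 'c' ]);

# Simple path: the whole quoted string arrives as a single lexeme and
# substr() trims one character from each end.
my $plain = val_qd(undef, [ q("hello world") ]);

print "$escaped->[0]\n";
print "$plain->[0]\n";
```

Each function returns an array reference holding one string, which matches the [values] default action used by the other rules.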



The table that deescapes anything back to its original form isn't used in the 
above, but simply commented out.

Re: A grammar for quoted strings with escaped chars

2014-09-22 Thread Ron Savage
Thanx for the link.

I did not consider that case, since I'm really interested in the Graphviz 
DOT file format, where quotes if any must be double quotes, and internal 
quotes must be escaped.

However, I will examine the code you link to, since every such example is 
interesting.



Re: A grammar for quoted strings with escaped chars

2014-09-22 Thread Ron Savage
I think I'll release such samples (I have encountered a few) as 
MarpaX::Demo::SampleGrammars. It'll be basically a dummy module with the 
good stuff in scripts/*.pl.

I've been thinking about a script collection for many months now.

Any other suggestions (module name, code to include)?



Re: A grammar for quoted strings with escaped chars

2014-09-22 Thread Ruslan Shvedov
On Tue, Sep 23, 2014 at 2:14 AM, Ron Savage <r...@savage.net.au> wrote:

> Thanx for the link.
>
> 2 of those 3 samples (the 2nd & 3rd) produce ambiguous parses. Is that
> what you find too?

Yes, the code warns about it. I was actually planning to deal with it as
part of my current work on ASF-based disambiguation, so the code will be more
usable once I finish. For now it can serve just as an idea/illustration.





