I've posted some things previously on this topic - but in short, you don't
really need to use events to do this. It's possible to do it in a
semi-straightforward fashion without a lot of jumping through hoops (just a
bunch of rules).
Here are some grammar fragments demonstrating what I'm talking about. This
handles single-quoted, double-quoted, and quote-like parsing (e.g. q%%, q||),
efficiently handling simple quoted strings that have no escape sequences
while falling back to an escape-aware mode when they're present.
my ($dsl, $grammar) = <<'===';
:default ::= action => [values]
lexeme default = latm => 1
[...]
# Normal, bare, unquoted
value ::= value_n
value_n ::= valword_n
# Quoted but not escaped       # reassemble action
value ::= value_qd             action => val_qd
        | value_qs             action => val_qs
        | value_ql0            action => val_ql0
        | value_ql1            action => val_ql1
value_qd ::= valword_qd
value_qs ::= valword_qs
value_ql0 ::= valword_ql0
value_ql1 ::= valword_ql1
# Quoted and escaped                               # reassemble action
value ::= (g_quote_d) value_eqd (g_quote_d)        action => val_eqd
        | (g_quote_s) value_eqs (g_quote_s)        action => val_eqs
        | (g_quote_ls0) value_eql0 (g_quote_le0)   action => val_eql0
        | (g_quote_ls1) value_eql1 (g_quote_le1)   action => val_eql1
value_eqd ::= valword_eqd*
value_eqs ::= valword_eqs*
value_eql0 ::= valword_eql0*
value_eql1 ::= valword_eql1*
# Normal, bare, unquoted
valword_n ~ valword_n_c
valword_n_c ~ [\w_\@:.\/\*-]+
# Quoted but not escaped
valword_qd ~ quote_d valword_qd_c quote_d
valword_qs ~ quote_s valword_qs_c quote_s
valword_ql0 ~ quote_ls0 valword_ql0_c quote_le0
valword_ql1 ~ quote_ls1 valword_ql1_c quote_le1
valword_qd_c ~ [^"\\]*
valword_qs_c ~ [^'\\]*
valword_ql0_c ~ [^|\\]*
valword_ql1_c ~ [^%\\]*
# Quoted and escaped
valword_eqd ~ valword_eqd_c
valword_eqs ~ valword_eqs_c
valword_eql0 ~ valword_eql0_c
valword_eql1 ~ valword_eql1_c
valword_eqd_c ~ [^"] | whitespace | escape ["]
valword_eqs_c ~ [^'] | whitespace | escape [']
valword_eql0_c ~ [^|] | whitespace | escape [|]
valword_eql1_c ~ [^%] | whitespace | escape [%]
# These do translation, but cannot be enabled yet as the expectation is no
# translation.
# valword_eqd ~ [^\a\b\e\f\r\n\t\\"] | whitespace | escape valword_esc
# valword_eqs ~ [^\a\b\e\f\r\n\t\\'] | whitespace | escape valword_esc
# valword_esc ~ [abefrnt\\"']
# The same base lexemes cannot be directly used by both the lexer and grammar
# *at the same time*.
# Work around it by providing wrapper lexeme rules for the grammar which end up
# at the same terminal.
g_quote_d ~ quote_d
g_quote_s ~ quote_s
g_quote_ls0 ~ quote_ls0
g_quote_le0 ~ quote_le0
g_quote_ls1 ~ quote_ls1
g_quote_le1 ~ quote_le1
quote_d ~ ["]
quote_s ~ [']
quote_ls0 ~ 'q|'
quote_le0 ~ '|'
quote_ls1 ~ 'q%'
quote_le1 ~ '%'
escape ~ '\'
:discard ~ whitespace
whitespace ~ [\s]+
===
# Deescaping table
my $xtab = {
'eqd' => { q(\") => qq(") },
'eqs' => { q(\') => qq(') },
'eql0' => { q(\|) => qq(|) },
'eql1' => { q(\%) => qq(%) },
# # Not presently used.
# 'eqx' => {
# q(\a) => qq(\a),
# q(\b) => qq(\b),
# q(\e) => qq(\e),
# q(\f) => qq(\f),
# q(\n) => qq(\n),
# q(\r) => qq(\r),
# q(\t) => qq(\t),
# q(\") => qq("),
# q(\') => qq('),
# q(\\\\) => qq(\\),
# },
};
# Deescaping functions
sub val_eqd { return [ join '', map +($xtab->{'eqd'}{$_} || $_), @{$_[1]} ] }
sub val_eqs { return [ join '', map +($xtab->{'eqs'}{$_} || $_), @{$_[1]} ] }
sub val_eql0 { return [ join '', map +($xtab->{'eql0'}{$_} || $_), @{$_[1]} ] }
sub val_eql1 { return [ join '', map +($xtab->{'eql1'}{$_} || $_), @{$_[1]} ] }
#sub val_eqx { return [ join '', map +($xtab->{'eqx'}{$_} || $_), @{$_[1]} ] }
# Dequoting functions
sub val_qd { return [ substr($_[1]->[0], 1, -1) ] }
sub val_qs { return [ substr($_[1]->[0], 1, -1) ] }
sub val_ql0 { return [ substr($_[1]->[0], 2, -1) ] }
sub val_ql1 { return [ substr($_[1]->[0], 2, -1) ] }
The table entry that deescapes everything back to its original form ('eqx')
isn't used in the above; it is simply commented out.
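
For what it's worth, here is a minimal sketch of what the dequoting and deescaping actions do, run by hand without Marpa. The argument lists are made up for illustration. I'm assuming each action gets the per-parse object as its first argument and an array ref of child values as its second, which is what the @{$_[1]} and $_[1]->[0] accesses in the functions rely on. Only the double-quote variants are shown.

```perl
use strict;
use warnings;

# Deescaping table, reduced to the double-quote entry for this sketch.
my $xtab = { 'eqd' => { q(\") => qq(") } };

# Same bodies as the val_eqd and val_qd actions in the listing.
sub val_eqd { return [ join '', map +($xtab->{'eqd'}{$_} || $_), @{$_[1]} ] }
sub val_qd  { return [ substr($_[1]->[0], 1, -1) ] }

# Hypothetical lexeme list for the escape-aware path: one entry per
# valword_eqd lexeme, with each escaped quote arriving as the two
# characters backslash + quote.
my $escaped = val_eqd(undef, [ 's', 'a', 'y', ' ', q(\"), 'h', 'i', q(\") ]);

# Hypothetical value for the simple path: the whole quoted string is a
# single lexeme, so only the surrounding quotes need stripping.
my $plain = val_qd(undef, [ q("hello world") ]);

print "$escaped->[0]\n";   # prints: say "hi"
print "$plain->[0]\n";     # prints: hello world
```

The simple path costs one substr per string, while the escape-aware path does a table lookup per lexeme, which is why it only makes sense as a fallback.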