Re : Re : Re : Re : [Caml-list] Re: camlp4 stream parser syntax

Matthieu Wipliez Sun, 08 Mar 2009 06:33:38 -0700

> > In this case, here is a possible solution, you have your hash table
> > associate a lowercase version of the token with what you'd like to use in
> > the grammar:
> > "buytocover" => "BuyToCover"
> > "sellshort" => "SellShort"
> > ...
> 
> 
> I'm doing this already but I don't think it will do the trick with a camlp4 
> parser since it goes through is_kwd to find a match when you use "delay".


I've just tested the idea with my lexer, in the rule identifier:
  | identifier as ident {
    (* compare case-insensitively, emit the spelling used in the grammar *)
    if String.lowercase ident = "action" then
      IDENT "ActioN"
    else
      IDENT ident
  }

I also replaced the entries in the grammar that match against "action" so 
that they match against "ActioN".
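
The same normalization can be driven by a hash table rather than a chain of 
if/else tests. What follows is only a minimal sketch in plain OCaml, not code 
from my lexer: the table name "canonical", the function "normalize" and the 
keyword list are made up for illustration, and String.lowercase_ascii is the 
name used by recent OCaml versions (4.03 and later) for what is written 
String.lowercase above.

```ocaml
(* Hypothetical table: lowercase spelling -> spelling used in the grammar. *)
let canonical : (string, string) Hashtbl.t = Hashtbl.create 16

let () =
  List.iter
    (fun kwd -> Hashtbl.replace canonical (String.lowercase_ascii kwd) kwd)
    [ "BuyToCover"; "SellShort"; "ActioN" ]

(* Return the grammar spelling for a keyword, whatever its case in the
   source; any other identifier is returned unchanged. *)
let normalize ident =
  match Hashtbl.find_opt canonical (String.lowercase_ascii ident) with
  | Some kwd -> kwd
  | None -> ident
```

In the lexer action, the result would then be wrapped as IDENT (normalize ident).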

In the source code, I have
reload: ActIon in8:[i]
shift: acTIon

And Camlp4 parses it correctly. I have a tentative explanation as to why it 
works, below:

> I think that the internal keyword hash table in the grammar needs to be 
> populated with lowercase keywords (by invoking 'using'). I don't know how to 
> get to the 'using' function yet, though.

I don't think so, here is what happens:
  1) You preprocess your grammar with camlp4of. This transforms the EXTEND 
statements (and a lot of other stuff) into calls to Camlp4 modules/functions.
The grammar parser is in the Camlp4GrammarParser module.
In the rule "symbol", the entry | s = STRING -> matches strings (literal 
tokens) and produces a TXkwd s.
This is later transformed by make_expr to an expression 
Camlp4Grammar__.Skeyword s (quotation <:expr< $uid:gm$.Skeyword $str:kwd$ >>)
What this means is that at compile time an entry
  my_rule : [ [ "BuyOrSell"; .. ] ]
gets transformed to an AST node
  Skeyword "BuyOrSell"

You can see that by running "camlp4of" on the parser. Every rule gets 
transformed into a call to the Gram.extend function, with Gram.Sopt, 
Gram.Snterm, Gram.Skeyword etc.

  2) At runtime, when you start your program, all the Gram.extend calls are 
executed (because they are top-level). Your parser is kind of configured.
It turns out that extend is just a synonym for Insert.extend
  (last line of Static module)

  value extend = Insert.extend

This function will insert rules and tokens into Camlp4. The insert_tokens 
function tells us that whenever a Skeyword is seen, "using gram kwd" is called.
I believe this is the function you're referring to?

This function calls Structure.using, which basically adds the keyword if 
necessary and increases its reference count. (I think this is so that unused 
keywords can be removed automatically; remember that Camlp4 can also delete 
rules, not only insert them.)
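
To make the reference-counting idea concrete, here is a toy model in plain 
OCaml. It is not the Camlp4 source: the table "keywords" and the function 
"removing" are names I made up for the sketch, and the real Structure.using 
takes a grammar argument and does more than this.

```ocaml
(* Toy keyword table: keyword -> number of rules currently using it. *)
let keywords : (string, int ref) Hashtbl.t = Hashtbl.create 16

(* Called when an inserted rule mentions the keyword. *)
let using kwd =
  match Hashtbl.find_opt keywords kwd with
  | Some count -> incr count
  | None -> Hashtbl.add keywords kwd (ref 1)

(* Called when a rule mentioning the keyword is deleted (hypothetical). *)
let removing kwd =
  match Hashtbl.find_opt keywords kwd with
  | Some count ->
      decr count;
      if !count = 0 then Hashtbl.remove keywords kwd
  | None -> ()

(* Exact string lookup, hence case-sensitive. *)
let is_kwd kwd = Hashtbl.mem keywords kwd
```

With this model, a keyword stays known as long as at least one rule still 
uses it, and "mytoken" is never confused with "MyToken".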



So to sum up: when you declare a rule with a token "MyToken", the grammar is 
configured to recognize a "MyToken" keyword.

Now the lexer produces IDENTs (or SYMBOLs, for that matter). SYMBOLs are 
KEYWORDs by default. IDENTs become KEYWORDs if their content matches a keyword.

So in our case, the lexer recognizes identifiers. If an identifier equals 
(case-insensitively speaking) "mytoken", we emit an IDENT "MyToken", which 
will later be recognized as the "MyToken" keyword (the is_kwd test is 
case-sensitive, so the lexer has to produce exactly the spelling used in the 
grammar).
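
The whole chain can be sketched as follows. Again this is a simplified model 
in plain OCaml and not the Camlp4 API: the token type is reduced to two 
constructors, and "grammar_keywords", "lex_ident" and "filter" are 
hypothetical names standing for the grammar's keyword table, the lexer rule 
and the token filter.

```ocaml
type token = IDENT of string | KEYWORD of string

(* Keywords declared by the grammar rules (hypothetical contents). *)
let grammar_keywords = [ "MyToken" ]

(* Case-sensitive membership test, like is_kwd. *)
let is_kwd s = List.mem s grammar_keywords

(* Lexer side: map any casing of a keyword to the grammar's spelling. *)
let lex_ident ident =
  let same_ignoring_case kwd =
    String.lowercase_ascii kwd = String.lowercase_ascii ident
  in
  match List.find_opt same_ignoring_case grammar_keywords with
  | Some kwd -> IDENT kwd
  | None -> IDENT ident

(* Filter side: an IDENT whose content is a keyword becomes a KEYWORD. *)
let filter = function
  | IDENT s when is_kwd s -> KEYWORD s
  | tok -> tok
```

Without the normalization in lex_ident, IDENT "mytoken" would go through the 
filter unchanged and the grammar rule would not match.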

Cheers,
Matthieu

> 
>     Thanks, Joel
> 
> ---
> http://tinyco.de
> Mac, C++, OCaml





