Hi Jakob,

Thank you for getting in touch. I had been meaning to after someone pointed me to your repo in a reddit thread, but you beat me to it. Replies in line.

Best!
Tom

PS ccing this back to the list for the record.

On Tue, Jun 1, 2021 at 1:56 AM Jakob Schöttl <jscho...@gmail.com> wrote:
>
> Hi Tom,
>
> I came to your post at the mailing list from here:
> https://github.com/gagbo/LuaOrgParser/issues/1
> Sorry, I don't know, how I can answer on the mailing list when I don't have
> received the original mail.

No worries, I never managed to figure that out either, so I just subscribed. Maybe by matching the subject as you do here and ccing the list (attempting it in this email to see what happens)?

> We have a pretty similar project, org-parser[1]. It's also written in a Lisp
> dialect, Clojure, but it uses instaparse instead of brag as parser library.

https://github.com/tgbugs/laundry/tree/next#similar-projects
I managed to get it into my README as a reminder to myself to have a thorough look at it, but have been occupied with other work since then.

> My idea was, to transform the formal grammar to a grammar.js for tree-sitter.
> It would be so cool, if it could be generated from one formal specification.

Yes, that would be great. It would be a major step to have a couple of grammars for org that can be used for stuff like this and compared to each other, along with test cases that we can use to define correct behavior.

One issue that I don't have a full understanding of at the moment is how certain ambiguous forms will impact the ability to transform directly into the tree-sitter grammar. The reason I mention this is that I have had to move to a two-phase parser in order to deal with ambiguous parses. Having not looked carefully at your approach, I don't know whether you have encountered similar issues. For the tree-sitter use case in particular I'm not entirely sure that the ambiguity matters, but I haven't had a chance to look at it yet.

> Do you plan, in your parser, to do a transformation step from the raw parser
> AST to a higher-level AST? E.g. the raw parser AST would parse a (:date
> "2021-06-01") and the transformed AST would transform this to a higher-level
> timestamp object.

Yes. I already do that to a certain extent in the expander https://github.com/tgbugs/laundry/blob/next/laundry/expander.rkt (the raw AST is hard to work with directly), but there will be more. I also expect that I will add an intermediate step where the AST is rearranged to account for aspects of org semantics that cannot be captured by the context-free part of the grammar. After that step there are a number of potential conversions, one of which will transform the AST into Racket structs, but I haven't made it quite that far yet.

That said, in terms of defining a canonical parse, I am aiming to do that in the transformed intermediate s-expression representation, because I think it will be easier to define the correctness of certain user interactions on that form than on the higher-level object representation, even if the higher-level objects are ultimately used to actually implement that behavior.

> Do you have any automated tests for your parser?

Yes. See https://github.com/tgbugs/laundry/blob/next/laundry/test.rkt; you can run them from the working directory via =raco test laundry=. I haven't fully specified the expected AST (and transforms) in most cases because I'm still hammering out details. In some cases I do specify the parse that I expect, e.g. for headings I specify when tags are expected in cases where there might be some ambiguity.

If you are looking for edge cases, there are a number that are not yet in the automated tests but that are in https://github.com/tgbugs/laundry/blob/next/laundry/cursed.org because they hit on some cases of extreme ambiguity and internal inconsistency in the elisp implementation or on weird behavior under user interaction (I also have some other test cases that haven't been committed to the repo yet).

It would be great to align the grammars and the behavior using a set of common test cases.
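
To make the raw-AST to higher-level-AST step from the timestamp question concrete, here is a minimal sketch in Python. It is only an illustration and is not how laundry's expander or org-parser works: the tuple encoding of the raw AST, the =Timestamp= class, and the =transform= function are all hypothetical names invented for this example.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Timestamp:
    """Hypothetical higher-level node; not a type from either project."""
    day: date


def transform(node):
    """Recursively rewrite raw (tag, *children) tuples into higher-level objects."""
    # Leaves (plain strings) and empty tuples pass through unchanged.
    if not isinstance(node, tuple) or not node:
        return node
    tag, *children = node
    # Assumption: the raw parser emits ("date", "YYYY-MM-DD") for dates.
    if tag == "date" and len(children) == 1 and isinstance(children[0], str):
        return Timestamp(date.fromisoformat(children[0]))
    # Any other node keeps its tag; only its children are transformed.
    return (tag, *(transform(child) for child in children))


raw = ("paragraph", "Due on ", ("date", "2021-06-01"))
higher = transform(raw)
```

A shared test case could then pin down the expected value of =higher= for a given input, independent of which grammar produced the raw tree.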