Hi Devin,

Partly agree.
Yes, we should think about the intended use. For me, the primary use would
be to automatically import an NM-Tran file into Simulo.
But of course, having a proper modern editor for NM-TRAN code would be a
huge added bonus.

*Regarding your concerns:*
1) Parsing Fortran is not needed, as NM-TRAN keeps it quite strict: each
line of verbatim code needs to start with a double quote (") in the first
column. The verbatim Fortran can then just be a single token in the lexer;
lexing/parsing it as Fortran is optional (this is a rarely used feature
anyway). See the sketch after this list.
2) Writing the bindings for the ANTLR targets is indeed annoying, which is
why I am focusing on XText instead.
1 bis) XText comes with the XTend language, which makes it easy to
serialize program code from an AST. It would be ideal to build an
alternative implementation of NM-Tran in XText that writes out the F*
files itself. An editor could then lex/parse the resulting file (including
verbatim code) again and detect any issues properly (e.g. use of undefined
variables in verbatim code, which would require building half of NM-Tran
to detect properly anyway).
3) NONMEM does not have a formal spec, but the HTML NM-Tran record
documentation combined with IV.pdf comes close. I do agree we would need a
validation set of NM-Tran scripts; I'm hoping to scrape the public DDMoRe
repository for that.
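
To illustrate point 1, here is a minimal sketch (Python, purely
illustrative; the token names and the helper function are my own
invention, not part of any existing tool) of treating each verbatim line
as one opaque token:

    from typing import Iterator, NamedTuple

    class Token(NamedTuple):
        kind: str   # "VERBATIM" or "OTHER" in this sketch
        text: str
        line: int

    def lex_verbatim_lines(source: str) -> Iterator[Token]:
        """Treat every line starting with a double quote in column 1 as a
        single opaque verbatim-Fortran token; leave all other lines to the
        normal NM-TRAN lexer."""
        for lineno, line in enumerate(source.splitlines(), start=1):
            if line.startswith('"'):
                # Keep the Fortran text untouched; the parser only needs
                # to carry it along, not understand it.
                yield Token("VERBATIM", line, lineno)
            else:
                yield Token("OTHER", line, lineno)

Lexing or parsing the captured Fortran any further (e.g. to flag undefined
variables) can then remain a separate, optional pass.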

Yes, it is easy to construct the lexer by hand, and this may be what we
should do (although an ANTLR lexer is easier to understand and port).
However, the parser should be defined in a parser-generator language, as a
hand-written parser is difficult to understand or port (thank you so much,
Mike, for providing the code; it will still be useful!).
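
As a rough sketch of what such a hand-built lexing pass could look like
(Python again, and the record handling reflects my own simplifying
assumptions, e.g. that record names start at the beginning of a line and
that record-name abbreviations can be ignored):

    import re
    from typing import List, Tuple

    # Assumption: a record name such as $PROBLEM or $PK starts at the
    # beginning of a line; a '$' appearing mid-line (e.g. in a $PROBLEM
    # title or inside a comment) is plain text.
    RECORD_RE = re.compile(r"^(\$[A-Z][A-Z0-9]*)", re.IGNORECASE)

    def split_records(source: str) -> List[Tuple[str, List[str]]]:
        """First pass: group lines under the record that opens them,
        stripping ';' end-of-line comments outside $PROBLEM."""
        records: List[Tuple[str, List[str]]] = []
        for line in source.splitlines():
            match = RECORD_RE.match(line)
            if match:
                records.append((match.group(1).upper(), []))
                line = line[match.end():]
            if not records:
                continue  # text before the first record; NM-Tran rejects this
            name, body = records[-1]
            if name != "$PROBLEM":
                line = line.split(";", 1)[0]  # ';' starts an end-of-line comment
            if line.strip():
                body.append(line.strip())
        return records

On the $PROBLEM/$PK example quoted further down in this thread, even this
crude pass yields three records ($PROBLEM, $PK, $ERROR) and keeps the
mid-line '$PK' and '$ERROR' as plain text; the record bodies would then go
to a parser generated from a proper grammar.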

Can I admit I'm quite impressed by the number of technical people on this
mailing list? I am amazed that a lack of time/funding, rather than a lack
of technical knowledge, seems to be the main reason we do not have nice
tooling yet.

/ Ruben

On Thu, Jun 14, 2018 at 5:39 PM Devin Pastoor <devin.past...@gmail.com>
wrote:

> Hi all,
>
> A couple thoughts on this. First, I would suggest constructing a parse
> tree rather than an abstract syntax tree, since it will likely be important
> to retain additional metadata such as comments. I presume such a tool would
> provide a foundation for automated tasks such as refactoring, reformatting,
> updating parameter values, etc., and would thus need to retain the state of
> the initial document.
>
> I don't know if ANTLR is the right approach, as:
>
> 1) you'd also need to parse FORTRAN snippets that can be embedded in a
> control stream, would you not?
> 2) you'd have to write bindings for the ANTLR targets (e.g. to get into R,
> I guess you would need to go through the C++ ANTLR target, blah)
> 3) does NONMEM even have a formal specification to compare the grammar
> against?
>
> Rather than focusing on formally defining the language to be compatible
> with ANTLR, and instead treating it as a DSL, a simpler lexer could be
> constructed "by hand", allowing flexibility in handling some of the
> particularities that the DSL imposes (mix of Fortran dialects, etc.).
>
> Perhaps taking a step back and figuring out what the most common uses
> would be, if such a tool existed, would help drive the discussion about
> the most reasonable implementation objective for achieving the major
> outcomes. For example: a CLI tool that automatically checks for grammar
> errors, vs. a tool for updating parameters, vs. a tool that integrates
> with developer tools such as RStudio/VS Code, etc.
>
> Would also be happy to continue discussing on github/otherwise.
>
> Devin
>
> On Thu, Jun 14, 2018 at 10:59 AM Ruben Faelens <ruben.fael...@gmail.com>
> wrote:
>
>> Hi Bill,
>>
>> Nice to see you're interested. I have something basic working in XText
>> that can already create an AST and editor for an example NONMEM file.
>> However, it suffers from the context-free aspect of the lexer, and
>> therefore errors out in some cases...
>>
>> See http://github.com/rfaelens/nmparser/demo/
>> Feel free to git pull and continue to implement the full language.
>> Although note that the lexing is currently the weak part...
>>
>> / Ruben
>>
>> On Thu, Jun 14, 2018 at 3:18 PM Bill Denney <wden...@humanpredictions.com>
>> wrote:
>>
>>> Hi Ruben,
>>>
>>>
>>>
>>> I’m also interested in a lexer-parser for NONMEM.  The regexp-based ones
>>> that I’ve used have typically had issues (I’ve tried about 4 different ones
>>> including one that I wrote), and they work for many but not all
>>> models.  I’m unaware of a reasonably complete lexer-parser for NONMEM
>>> (though I know of at least one non-public effort; I’ve contacted that
>>> author to see if he is interested in joining this conversation).
>>>
>>>
>>>
>>> I’ve wanted to build the abstract syntax tree for NONMEM to help with
>>> computational model-building, and I’ve been looking into ANTLR as well.
>>> Three questions:  Are you interested in collaborating on the parser (can
>>> you create a GitHub project for it)?  Why ANTLRv3 instead of v4?  Do you
>>> have a way to get an ANTLR parse tree into R?
>>>
>>>
>>>
>>> Thanks
>>>
>>>
>>>
>>> Bill
>>>
>>>
>>>
>>> *From:* owner-nmus...@globomaxnm.com <owner-nmus...@globomaxnm.com> *On
>>> Behalf Of *Ruben Faelens
>>>
>>> *Sent:* Thursday, June 14, 2018 8:55 AM
>>> *To:* Tim Bergsma <tim.berg...@certara.com>
>>> *Cc:* nmusers@globomaxnm.com
>>> *Subject:* Re: [NMusers] Context-free lexer for NM-TRAN
>>>
>>>
>>>
>>> Hi Tim,
>>>
>>>
>>>
>>> Thanks for pointing to that.
>>>
>>> Unfortunately, nonmemica uses regular expressions to simply split the
>>> character stream into subsections.
>>>
>>> This is not the way to go. As an example, nonmemica would get confused
>>> by the following input:
>>>
>>> $PROBLEM This is a problem with special $PK section
>>> $PK ;Refer to $ERROR for more information
>>> CL=THETA(1)
>>> $ERROR
>>> Y = W*F
>>>
>>>
>>>
>>> Probably a contextual lexer is the way to go; fortunately ANTLRv3 has
>>> functionality for this.
>>>
>>>
>>>
>>> Kind regards,
>>>
>>> Ruben
>>>
>>>
>>>
>>> On Thu, Jun 14, 2018 at 12:42 PM Tim Bergsma <tim.berg...@certara.com>
>>> wrote:
>>>
>>>
>>>
>>> Hi Ruben.
>>>
>>>
>>>
>>> Related: the CRAN package “nonmemica” has a function as.model() that
>>> parses NONMEM control streams. Type “?nonmemica” at the R prompt after
>>> loading.  See also https://github.com/MikeKSmith/rspeaksnonmem .  Happy
>>> to discuss further.
>>>
>>>
>>>
>>> Kind regards,
>>>
>>>
>>>
>>> Tim
>>>
>>>
>>>
>>> *Tim Bergsma, PhD*
>>>
>>> Associate Director
>>>
>>> Certara Strategic Consulting
>>>
>>>
>>> m. 860.930.9931
>>>
>>> tim.berg...@certara.com
>>>
>>>
>>>
>>> *From:* owner-nmus...@globomaxnm.com <owner-nmus...@globomaxnm.com> *On
>>> Behalf Of *Ruben Faelens
>>> *Sent:* Thursday, June 14, 2018 4:33 AM
>>> *To:* nmusers@globomaxnm.com
>>> *Subject:* [NMusers] Context-free lexer for NM-TRAN
>>>
>>>
>>>
>>> Hi all,
>>>
>>>
>>>
>>> Calling all computer scientists and computer language experts.
>>>
>>> In my spare time, I am working on a lexer and parser for NM-Tran.
>>> Primarily to teach myself about grammars and DSLs, but perhaps something
>>> useful will come out of this (e.g. a context-sensitive editor with code
>>> completion).
>>>
>>>
>>>
>>> When lexing, I am having a hard time describing the keywords used by
>>> nm-tran.
>>>
>>> Let us take '.EQ.' as an example.
>>>
>>> 1) It seems that *.EQ.* is a keyword used to describe a comparison.
>>>
>>> 2) However, a filename could also be 'foo.eq.bar'
>>>
>>> The same applies to keywords on the '$ESTIMATION' record. These
>>> keywords could also be used as variable names.
>>>
>>>
>>>
>>> Am I right in saying that NM-TRAN cannot be tokenized with a
>>> context-free lexer? And that I should focus my efforts on building a
>>> lexer-less parser? (Or building my own lexer-parser, see
>>> https://en.wikipedia.org/wiki/The_lexer_hack )
>>>
>>> I assume building a parser for NM-TRAN was already done in the DDMoRe
>>> project, but I failed to find the source code...
>>>
>>>
>>>
>>> Kind regards,
>>>
>>> Ruben Faelens
>>>
>>>
>>>
