I am not familiar with instaparse, but the parser may be reasoning as
follows:
- "Exposure" matches "PK", which TOKEN prefers over "WORD", so it is
parsed as a "PK" TOKEN
- "to" matches only "WORD", which is not preferred, but there is no other
choice, so it is parsed as a "WORD" TOKEN
- ...
I don't think instaparse will consider aggregating words into a more
complex TOKEN when it can parse each word as a TOKEN of its own (at worst
as a "WORD" TOKEN, if none of the preferred one-word TOKENs match).
I'd say you have to rework your grammar to be less ambiguous (i.e., only
one way to decompose a sentence), but unfortunately I don't know exactly
how you should fix it for instaparse.
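
One way to check the guess above is to ask instaparse for every parse it
can find, not just the first one. The sketch below is only an
illustration: the toy grammar and the names tiny, PHRASE and WORD are made
up and are not Jim's grammar. It assumes instaparse is on the classpath
and uses insta/parses, insta/failure? and the :start option.

```clojure
(require '[instaparse.core :as insta])

;; Hypothetical toy grammar: a TOKEN is either a three-word PHRASE
;; ("<word> to <word>") or a single WORD, so some inputs can be
;; decomposed in more than one way.
(def tiny
  (insta/parser
    "S = TOKEN (<SPACE> TOKEN)*
     TOKEN = PHRASE | WORD
     PHRASE = WORD <SPACE> 'to' <SPACE> WORD
     WORD = #'[a-z]+'
     SPACE = #'\\s+'"))

;; insta/parses returns a lazy seq of all parses of the input.
;; More than one result means the grammar is ambiguous for it.
(def all-parses (insta/parses tiny "exposure to drug"))

;; Two decompositions are expected here: one PHRASE covering the
;; whole input, and three separate WORD tokens.
(println (count all-parses))

;; A composite rule can also be tried on its own with :start, to see
;; whether it can match the fragment it is meant to cover at all.
(def phrase-only (tiny "exposure to drug" :start :PHRASE))
(println (insta/failure? phrase-only))
```

If the composite rule fails even when used as the start rule, the problem
is in that rule itself (for example a mandatory SPACE next to an optional
element) rather than in how TOKEN chooses between its alternatives.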
HTH
-Gianluca
On Monday, November 18, 2013 2:47:22 PM UTC+1, Jim foo.bar wrote:
>
> Hi all,
>
> I'm having a small problem composing smaller matches in instaparse. Here
> is what I'm trying...just observe the bold bits:
>
> (def parsePK
> (insta/parser
> "S = TOKEN (SPACE TOKEN PUNCT?)* END
> TOKEN = (NUM | DRUG | PK | DRUGPK | MECH | SIGN | EFF | ENCLOSED) /
> WORD
> <WORD> = #'\\w+' | PUNCT
> <PUNCT> = #'\\p{Punct}'
> ENCLOSED = PAREN | SQBR
> <PAREN> = #'\\[.*\\]'
> <SQBR> = #'\\(.*\\)'
> NUM = #'[0-9]+'
> ADV = #'[a-z]+ly'
> <SPACE> = #'\\s+'
> DRUG = #'(?i)didanosine|quinidine|tenofovir'
> PK = #'(?i)exposure|bioavailability|lower?[\\s|\\-]?clearance'
> *DRUGPK = PK SPACE TO SPACE DRUG SPACE EFF? SPACE *
> MECH = #'[a-z]+e(s|d)'
> *EFF = BE? SPACE SIGN? SPACE MECH | BE? SPACE MECH SPACE ADV? *
> SIGN = ADV | NEG
> NEG = 'not'
> <TO> = 'to' | 'of'
> <BE> = 'is' | 'are' | 'was' | 'were'
> END = '.' " ))
>
> Running the parser returns the following. It seems that the 2 bigger
> composite rules DRUGPK & EFF are not recognised at all. Only the smaller
> pieces are actually shown. I would expect [:TOKEN [:DRUGPK "Exposure to
> didanosine is increased"]] and [:TOKEN [:EFF "is increased"]] entries.
> (pprint
> (parsePK "Exposure to didanosine is increased when coadministered with
> tenofovir disoproxil fumarate [Table 5 and see Clinical Pharmacokinetics
> (12.3, Tables 9 and 10)]."))
>
>
> [:S
> [:TOKEN [:PK "Exposure"]]
> " "
> [:TOKEN "to"]
> " "
> [:TOKEN [:DRUG "didanosine"]]
> " "
> [:TOKEN "is"]
> " "
> [:TOKEN [:MECH "increased"]]
> " "
> [:TOKEN "when"]
> " "
> [:TOKEN [:MECH "coadministered"]]
> " "
> [:TOKEN "with"]
> " "
> [:TOKEN [:DRUG "tenofovir"]]
> ","
> " "
> [:TOKEN "disoproxil"]
> " "
> [:TOKEN "fumarate"]
> [:END "."]]
>
> Am I thinking about it the wrong way? Can anyone shed some light?
>
> many thanks in advance,
>
> Jim
>
>
>
>
>
>