Hi Kean,

> Has anybody done this? I've glanced at the cTAKES wiki, but the install 
> instructions don't seem to address this level of customization;

cTAKES basically uses LVG as a third-party "black box", so you probably won't 
find any in-depth LVG discussion or answers in the cTAKES documentation or on 
the mailing list.  However, it never hurts to ask ...

Does anybody out there know of a really good source of information for the NLM 
Lexical Tools (LVG)?

Sean

-----Original Message-----
From: Kean Kaufmann [mailto:[email protected]] 
Sent: Friday, July 14, 2017 4:25 PM
To: [email protected]
Subject: LVG questions [EXTERNAL]

I'm having issues with LVG overgenerating and producing false positives.

I've tried leaving it out altogether, but there are some stemming stumbling 
blocks in the dictionary, e.g.

sql> select cui, tui, text, prefterm
     from cui_terms c
     join tui t on t.cui = c.cui
     join prefterm p on p.cui = c.cui and cui=11849;

  CUI  TUI  TEXT              PREFTERM
-----  ---  ----------------  -----------------
11849   47  dm                Diabetes Mellitus
11849   47  diabetes          Diabetes Mellitus
11849   47  diabete mellitus  Diabetes Mellitus


I've also added some particularly problematic words to the LvgAnnotator's 
ExclusionSet, e.g.

    <!-- oth = C0449210, "OTH tumor staging notation" -->
    <string>other</string><string>Other</string><string>OTHER</string>
    <!-- moth = C1445661, "Moth antigen" -->
    <string>mother</string><string>Mother</string><string>MOTHER</string>
    <!-- plan = C0270724, "Infantile Neuroaxonal Dystrophy" -->
    <string>planning</string><string>Planning</string><string>PLANNING</string>
    <!-- not Attention Deficit Disorder -->
    <string>adding</string><string>Adding</string><string>ADDING</string>
    <!-- pas = C0030125, "p-Aminosalicylic acid" -->
    <string>pass</string><string>Pass</string><string>PASS</string>
    <string>passing</string><string>Passing</string><string>PASSING</string>
    <!-- bre = C2363129, "Benign Rolandic Epilepsy" ?! -->
    <string>bring</string><string>Bring</string><string>BRING</string>

But what I'd really like is more control over LVG's behavior: for instance, 
blocking the "-er" suffixing rule completely, not letting the "-ing" rule apply 
to a stem without vowels, and not letting the plural rule add "-s" to stems 
ending in "s".  I've fiddled with the LVG rules under ctakes-lvg-res, e.g. 
data/rules/dm.rul, but to no apparent effect.
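
To make those three constraints concrete, here is a rough sketch of the 
filter I have in mind. It is plain Python, not LVG or cTAKES code: the 
function name allow_variant and the way the rules are encoded are 
hypothetical, and treating "y" as a vowel is my own assumption. It only 
decides whether a (surface word, candidate stem) pair should be kept.

```python
# Hypothetical post-filter for (word, stem) pairs; not part of LVG or cTAKES.
VOWELS = set("aeiouy")  # assumption: count "y" as a vowel ("trying" -> "try")


def allow_variant(word: str, stem: str) -> bool:
    """Return False if the stem looks like the output of a rule I want blocked."""
    w, s = word.lower(), stem.lower()

    # Rule 1: block "-er" stripping completely (mother -> moth, other -> oth).
    # Covers both plain stripping and stripping only the "r" (larger -> large).
    if w.endswith("er") and s in (w[:-2], w[:-1]):
        return False

    # Rule 2: block "-ing" stripping when what is left has no vowel.
    # The stem may have had an "e" restored (bring -> br -> bre).
    if w.endswith("ing"):
        base = w[:-3]
        if s in (base, base + "e") and not (VOWELS & set(base)):
            return False

    # Rule 3: block the plural rule when the stem already ends in "s"
    # (pass -> pas).
    if w == s + "s" and s.endswith("s"):
        return False

    return True


if __name__ == "__main__":
    for word, stem in [("mother", "moth"), ("bring", "bre"),
                       ("pass", "pas"), ("diabetes", "diabete")]:
        print(word, stem, allow_variant(word, stem))
```

Note that pairs such as adding/add and planning/plan pass all three checks, 
so a filter like this would reduce, not replace, the ExclusionSet entries 
above.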

Has anybody done this? I've glanced at the cTAKES wiki, but the install 
instructions don't seem to address this level of customization; and I've 
skimmed the NLM documentation, but it doesn't seem to be intended for 
developers.  Can anyone point me to more detailed docs?

And: Has anyone tried plugging in another stemmer? To play nicely with the 
ctakes-dictionary-lookup-fast annotators, it seems as if all it would have to 
do would be to populate canonicalForm.

Happy Friday, and thanks for any help you can provide!

Kean Kaufmann
NLP Developer
RecordsOne, Inc.