Having issues with LVG overgenerating false positives. I've tried leaving it out altogether, but there are some stemming stumbling blocks in the dictionary, e.g.
sql> select cui, tui, text, prefterm from cui_terms c join tui t on t.cui =
> c.cui join prefterm p on p.cui = c.cui and cui=11849;
> CUI   TUI TEXT             PREFTERM
> ----- --- ---------------- -----------------
> 11849 47  dm               Diabetes Mellitus
> 11849 47  diabetes         Diabetes Mellitus
> 11849 47  diabete mellitus Diabetes Mellitus

I've also added some particularly problematic words to the LvgAnnotator's ExclusionSet, e.g.

> <!-- oth = C0449210, "OTH tumor staging notation" -->
> <string>other</string><string>Other</string><string>OTHER</string>
> <!-- moth = C1445661, "Moth antigen" -->
> <string>mother</string><string>Mother</string><string>MOTHER</string>
> <!-- plan = C0270724, "Infantile Neuroaxonal Dystrophy" -->
> <string>planning</string><string>Planning</string><string>PLANNING</string>
> <!-- not Attention Deficit Disorder -->
> <string>adding</string><string>Adding</string><string>ADDING</string>
> <!-- pas = C0030125, "p-Aminosalicylic acid" -->
> <string>pass</string><string>Pass</string><string>PASS</string>
> <string>passing</string><string>Passing</string><string>PASSING</string>
> <!-- bre = C2363129, "Benign Rolandic Epilepsy" ?! -->
> <string>bring</string><string>Bring</string><string>BRING</string>

But what I'd really like is more control over LVG's behavior: for instance, blocking the "-er" suffixing rule completely, not letting the "-ing" rule apply to a stem without vowels, and not letting the plural rule add "-s" to stems ending in "s". I've fiddled with the LVG rules under ctakes-lvg-res, e.g. data/rules/dm.rul, but to no apparent effect.

Has anybody done this? I've glanced at the cTAKES wiki, but the install instructions don't seem to address this level of customization; and I've skimmed the NLM documentation, but it doesn't seem to be intended for developers. Can anyone point me to more detailed docs?

And: Has anyone tried plugging in another stemmer?
To play nicely with the ctakes-dictionary-lookup-fast annotators, it seems as if all it would have to do would be to populate canonicalForm.

Happy Friday, and thanks for any help you can provide!

Kean Kaufmann
NLP Developer
RecordsOne, Inc.
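
For concreteness, the three restrictions asked for above can be written down as a small veto function over (surface form, proposed stem) pairs. This is only a sketch of the desired behavior, not LVG's or cTAKES's actual rule machinery: the function name allow_variant is hypothetical, and treating "y" as a vowel is an assumption made here so that forms like "trying" are not vetoed.

```python
# Hypothetical post-filter: decides whether a proposed stem should be accepted.
# Not part of LVG or cTAKES; it only illustrates the three restrictions.

VOWELS = set("aeiouy")  # assumption: count "y" as a vowel ("trying" -> "try")


def allow_variant(surface: str, stem: str) -> bool:
    """Return False if mapping surface -> stem breaks one of the restrictions."""
    surface, stem = surface.lower(), stem.lower()
    if surface == stem:
        return True  # nothing was stripped, so there is nothing to veto

    # 1. Block "-er" suffixing completely: other -> oth, mother -> moth.
    if surface == stem + "er":
        return False

    # 2. No "-ing" stripping when what remains has no vowel: bring -> br -> bre.
    if surface.endswith("ing") and not (VOWELS & set(surface[:-3])):
        return False

    # 3. No plural "-s" on a stem that already ends in "s": pass -> pas.
    if surface == stem + "s" and stem.endswith("s"):
        return False

    return True
```

Under these rules "other"/"oth", "mother"/"moth", "bring"/"bre" and "pass"/"pas" are all vetoed, while "diabetes"/"diabete" is still accepted. Cases such as "planning"/"plan" or "adding"/"add" pass all three checks, so they would still need ExclusionSet entries.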