There is a lot of config handling; maybe PostLemmas is being set to true, or configInit() is not setting up the NLM wrapper correctly.

ctakes-lvg *README*
Note: as distributed, PostLemmas is set to false. This is done to reduce the size of the CAS. Set PostLemmas to true to have org.apache.ctakes.typesystem.type.Lemma annotations added to the CAS.

*LvgAnnotator.xml*
PostLemmas = True

*LvgAnnotator.java*
if (postLemmas) { lvgResource.getLvgLex() }

On Thu, Apr 17, 2014 at 3:23 PM, Masanz, James J. <masanz.ja...@mayo.edu> wrote:

> The normalizedForm field is filled in. It is used by dictionary lookup.
>
> So, for example, if the dictionary would contain "lymph node" but not
> "lymph nodes", a document with text of "lymph nodes" would match the
> dictionary entry "lymph node" because "node", being the normalized form of
> "nodes", would be used when searching dictionary entries (in addition to
> searching dictionary entries for "nodes")
>
> -----Original Message-----
> From: Miller, Timothy [mailto:timothy.mil...@childrens.harvard.edu]
> Sent: Thursday, April 17, 2014 4:33 PM
> To: dev@ctakes.apache.org
> Subject: Re: lvg entries
>
> Quick follow-up since I was interested. The current dependency parser
> does have the option to use ctakes lemmas or do its own lemmatizing, but
> that doesn't use the lemma field, it uses the normalizedForm field. I'm
> not sure if that field is actually ever filled in -- on my example data
> it is always null.
>
> Tim
>
> On 04/17/2014 01:57 PM, Masanz, James J. wrote:
> > Offhand I recall at least one of the dependency parsers used the Lemma
> > annotations at one point.
> > Not sure if still does.
> >
> > There is an option for turning off the posting of the lemmas to the cas.
> >
> > Hope that helps
> >
> > -----Original Message-----
> > From: Miller, Timothy [mailto:timothy.mil...@childrens.harvard.edu]
> > Sent: Thursday, April 17, 2014 11:27 AM
> > To: dev@ctakes.apache.org
> > Subject: lvg entries
> >
> > The LVG annotator creates an enormous number of "lemmas" for every
> > WordToken in the CAS, and I'm wondering what the original purpose was? I
> > think this is probably a minor bottleneck for speed but mostly a pretty
> > big space hog (at least 50% of the space of xmi files in my tests).
> >
> > As of right now I'm not sure if any downstream components are using
> > these lemmas, and on a manual inspection the precision seems to be
> > pretty abysmal (meaning most of them are nonsensical as lexical
> > variants), so as I said, just wondering if we can revisit why cTAKES
> > generates so many and whether that component can be optimized.
> >
> > Thanks
> > Tim
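
For anyone reading the thread later, the "lymph nodes" lookup James describes can be sketched roughly as below. This is only an illustration of the idea, not the cTAKES implementation: the class name NormalizedLookupSketch, the NORMALIZED map, and the DICTIONARY set are all hypothetical stand-ins (the real normalized forms come from LVG, and the real dictionary lookup is more involved).

```java
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: not cTAKES code. Shows how a normalized form can let
// "lymph nodes" match a dictionary that only contains "lymph node".
public class NormalizedLookupSketch {

    // Assumed stand-in for the normalizedForm that LVG would supply per token.
    static final Map<String, String> NORMALIZED = Map.of("nodes", "node");

    // Assumed stand-in for the dictionary; it has the singular entry only.
    static final Set<String> DICTIONARY = Set.of("lymph node");

    // Return the normalized form of a token, or the token itself if none is known.
    static String normalize(String token) {
        return NORMALIZED.getOrDefault(token, token);
    }

    // Try the text as written first, then again with each token normalized.
    static boolean matches(String text) {
        if (DICTIONARY.contains(text)) {
            return true;
        }
        StringBuilder normalizedText = new StringBuilder();
        for (String token : text.split(" ")) {
            if (normalizedText.length() > 0) {
                normalizedText.append(' ');
            }
            normalizedText.append(normalize(token));
        }
        return DICTIONARY.contains(normalizedText.toString());
    }

    public static void main(String[] args) {
        System.out.println(matches("lymph nodes")); // true, via "node"
        System.out.println(matches("lymph gland")); // false, no entry either way
    }
}
```

The point of the sketch is that only one normalized form per token is needed for this kind of lookup, which is separate from the many Lemma annotations that PostLemmas controls.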