Thanks James. Does it ring a bell to you that the original intention was something like query expansion for a dictionary lookup?

Tim

On 04/17/2014 01:57 PM, Masanz, James J. wrote:
> Offhand I recall at least one of the dependency parsers used the Lemma
> annotations at one point.
> Not sure if still does.
>
> There is an option for turning off the posting of the lemmas to the cas.
>
> Hope that helps
>
> -----Original Message-----
> From: Miller, Timothy [mailto:timothy.mil...@childrens.harvard.edu]
> Sent: Thursday, April 17, 2014 11:27 AM
> To: dev@ctakes.apache.org
> Subject: lvg entries
>
> The LVG annotator creates an enormous number of "lemmas" for every
> WordToken in the CAS, and I'm wondering what the original purpose was? I
> think this is probably a minor bottleneck for speed but mostly a pretty
> big space hog (at least 50% of the space of xmi files in my tests).
>
> As of right now I'm not sure if any downstream components are using
> these lemmas, and on a manual inspection the precision seems to be
> pretty abysmal (meaning most of them are nonsensical as lexical
> variants), so as I said, just wondering if we can revisit why cTAKES
> generates so many and whether that component can be optimized.
>
> Thanks
> Tim