Tim, this is a very interesting observation. Could you please send a few examples of what LVG generates? Both sensical and non :)
Dima

On Apr 17, 2014, at 11:28, Miller, Timothy <timothy.mil...@childrens.harvard.edu> wrote:

> The LVG annotator creates an enormous number of "lemmas" for every
> WordToken in the CAS, and I'm wondering what the original purpose was? I
> think this is probably a minor bottleneck for speed but mostly a pretty
> big space hog (at least 50% of the space of xmi files in my tests).
>
> As of right now I'm not sure if any downstream components are using
> these lemmas, and on a manual inspection the precision seems to be
> pretty abysmal (meaning most of them are nonsensical as lexical
> variants), so as I said, just wondering if we can revisit why cTAKES
> generates so many and whether that component can be optimized.
>
> Thanks
> Tim