Sure. As an example, I gave it a note of about 1,000 words, and it generated 11,500 NonEmptyFSList elements (each is basically one lexical variant), or roughly 11.5 per word.
For the word "symptomatic", these are the first 10 of 20 lexical variants:

  Symptomaticer/JJ
  Symptomaticer/RB
  Symptomaticed/VB
  Symptomaticcing/VB
  Symptomatics/VB
  Symptomatics/NN
  Symptomaticked/VB
  Symptomatic/VB
  Symptomatic/JJ
  Symptomatic/RB

Tim

On 04/17/2014 12:31 PM, Dligach, Dmitriy wrote:
> Tim, this is a very interesting observation. Could you please send a few
> examples of what LVG generates? Both sensical and non :)
>
> Dima
>
> On Apr 17, 2014, at 11:28, Miller, Timothy
> <timothy.mil...@childrens.harvard.edu> wrote:
>
>> The LVG annotator creates an enormous number of "lemmas" for every
>> WordToken in the CAS, and I'm wondering what the original purpose was? I
>> think this is probably a minor bottleneck for speed but mostly a pretty
>> big space hog (at least 50% of the space of xmi files in my tests).
>>
>> As of right now I'm not sure if any downstream components are using
>> these lemmas, and on a manual inspection the precision seems to be
>> pretty abysmal (meaning most of them are nonsensical as lexical
>> variants), so as I said, just wondering if we can revisit why cTAKES
>> generates so many and whether that component can be optimized.
>>
>> Thanks
>> Tim