Sure. As an example, I gave it a note of about 1,000 words, and it
generated 11,500 NonEmptyFSList elements (each is basically one lexical
variant), so roughly 11.5 per word.

For the word "symptomatic", these are the first 10 of 20 lexical variants:
Symptomaticer/JJ
Symptomaticer/RB
Symptomaticed/VB
Symptomaticcing/VB
Symptomatics/VB
Symptomatics/NN
Symptomaticked/VB
Symptomatic/VB
Symptomatic/JJ
Symptomatic/RB

Tim


On 04/17/2014 12:31 PM, Dligach, Dmitriy wrote:
> Tim, this is a very interesting observation. Could you please send a few 
> examples of what LVG generates? Both sensical and non :)
>
> Dima
>
> On Apr 17, 2014, at 11:28, Miller, Timothy 
> <timothy.mil...@childrens.harvard.edu> wrote:
>
>> The LVG annotator creates an enormous number of "lemmas" for every
>> WordToken in the CAS, and I'm wondering what the original purpose was. I
>> think this is probably a minor speed bottleneck but mostly a pretty
>> big space hog (at least 50% of the size of the XMI files in my tests).
>>
>> As of right now I'm not sure whether any downstream components use
>> these lemmas, and on manual inspection the precision seems pretty
>> abysmal (most of them are nonsensical as lexical variants). So, as I
>> said, I'm just wondering whether we can revisit why cTAKES generates
>> so many and whether that component can be optimized.
>>
>> Thanks
>> Tim
>>
>
