Hi Jeff,

The short answer: No, LVG is not in the pipeline created by DefaultFastPipeline.piper.
Longer answer: In older versions of the dictionary lookup, the Lexical Variant Generator (LVG) module was recommended to capture lexical variants of terms. However, the dictionary resource already contains variants, so the LVG module should not make much of a difference. When the fast lookup was new several years ago, I ran a test with and without LVG on two datasets, and the difference was along the lines of +1-2% recall, -1% precision.

I think that ClinicalPipelineFactory.getFastPipeline() was a copy-paste of the previous .getClinicalPipeline() but with the dictionary module replaced. So LVG is still in the pipeline created by that method. When I (more recently) wrote the piper file that you reference, I left out LVG because the added burden didn't seem to warrant its presence. By burden I don't just mean the speed decrease and memory footprint. There have been a lot of configuration problems with LVG on various systems, which made cTAKES difficult to use.

The diagram that you reference places LVG after the dictionary lookup and after the part-of-speech tagger, while the page on LVG https://cwiki.apache.org/confluence/display/CTAKES/cTAKES+4.0+-+LVG lists those as the two modules that may benefit from its presence. That diagram is very old and should definitely be updated. Both the diagram and the page on LVG include information that predates (and so does not account for) the existence of the fast dictionary lookup.
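For a rough sense of what a change of +1-2% recall, -1% precision means for F1, here is a small sketch. The baseline precision and recall are made up for illustration and are not the figures from the test mentioned above. It also assumes the changes are absolute percentage points, which the numbers above do not specify.

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical baseline without LVG (NOT measured values).
base_p, base_r = 0.80, 0.70

# Assumed effect of adding LVG, read as absolute points:
# recall up by 1.5 points (middle of the 1-2 range), precision down by 1 point.
lvg_p, lvg_r = base_p - 0.01, base_r + 0.015

f1_base = f1(base_p, base_r)
f1_lvg = f1(lvg_p, lvg_r)

print(f"F1 without LVG: {f1_base:.4f}")
print(f"F1 with LVG:    {f1_lvg:.4f}")
print(f"Difference:     {f1_lvg - f1_base:+.4f}")
```

With these assumed baselines the F1 difference is well under one point, which is in line with LVG not making much of a difference. Other baselines will shift the result, so treat this only as a way to reason about the trade-off.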
Sean

________________________________________
From: Jeffrey Miller <[email protected]>
Sent: Tuesday, February 19, 2019 10:53 AM
To: [email protected]
Subject: DefaultFastPipeline.piper and LVG Annotator [EXTERNAL]

Hi,

I was wondering if the LVG Annotator is included in DefaultFastPipeline.piper <https://urldefense.proofpoint.com/v2/url?u=https-3A__svn.apache.org_repos_asf_ctakes_trunk_ctakes-2Dclinical-2Dpipeline-2Dres_src_main_resources_org_apache_ctakes_clinical_pipeline_DefaultFastPipeline.piper&d=DwIBaQ&c=qS4goWBT7poplM69zy_3xhKwEW14JZMSdioCoppxeFU&r=fs67GvlGZstTpyIisCYNYmQCP6r0bcpKGd4f7d4gTao&m=TrXJiUUghmeYrvrV21K68pCfJk5KnG-xwBfzwVbxoRo&s=3Sgs1Jc-C37kcy1efCEhU_3RV4aFipAt1lbTO0Wu_Ns&e=>. I have tried to trace through all the includes, but I cannot find it. However, when I look at the code for ClinicalPipelineFactory.getFastPipeline(), it seems to be included. <https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_apache_ctakes_blob_513bb49ebb98c4ac63f690c7b88a82aff18947b8_ctakes-2Dclinical-2Dpipeline_src_main_java_org_apache_ctakes_clinicalpipeline_ClinicalPipelineFactory.java-23L98&d=DwIBaQ&c=qS4goWBT7poplM69zy_3xhKwEW14JZMSdioCoppxeFU&r=fs67GvlGZstTpyIisCYNYmQCP6r0bcpKGd4f7d4gTao&m=TrXJiUUghmeYrvrV21K68pCfJk5KnG-xwBfzwVbxoRo&s=kmZDExXBOyXg84kix__UvgD3LniSHa8MgL8K5fK3XC4&e=>

From the documentation in this flow diagram <https://urldefense.proofpoint.com/v2/url?u=https-3A__cwiki.apache.org_confluence_download_attachments_68718172_ctakes-2D3.1-2Ddependencies.png-3Fversion-3D1-26modificationDate-3D1488992146000-26api-3Dv2&d=DwIBaQ&c=qS4goWBT7poplM69zy_3xhKwEW14JZMSdioCoppxeFU&r=fs67GvlGZstTpyIisCYNYmQCP6r0bcpKGd4f7d4gTao&m=TrXJiUUghmeYrvrV21K68pCfJk5KnG-xwBfzwVbxoRo&s=4yYVqkyLiodAWATji1EjSwoMh-YpU7qTz2J8tZvRT6I&e=> from the components documentation page <https://urldefense.proofpoint.com/v2/url?u=https-3A__cwiki.apache.org_confluence_display_CTAKES_cTAKES-2B4.0-2BComponent-2BUse-2BGuide&d=DwIBaQ&c=qS4goWBT7poplM69zy_3xhKwEW14JZMSdioCoppxeFU&r=fs67GvlGZstTpyIisCYNYmQCP6r0bcpKGd4f7d4gTao&m=TrXJiUUghmeYrvrV21K68pCfJk5KnG-xwBfzwVbxoRo&s=m-9MenhmNTr2vdVAhCvKgBt48OUiQB8R2TkR7fEYtsY&e=>, it seems to be a recommended component for the dictionary annotator.

Thanks for your help,
Jeff
