Hi everyone, I've spent the last few months working on a clinical NLP project using cTAKES. It's a very complex system to me, and every time I dig into it I discover something new. Since last week, I have been trying to figure out which analysis engines do a good job of handling cases like negation, family history, uncertainty, etc. I now have some experience with this and would like to share it with the community.
The best combination for me is to use assertionMiniPipelineAnalysisEngine for negation, uncertainty, generic and subject detection, and HistoryCleartkAnalysisEngine for history detection. Both engines are in the desc/ctakes-assertion folder. The assertionMiniPipelineAnalysisEngine also claims to be useful for conditional detection, which I haven't verified with my test files yet.

I'm using AggregatePlaintextFastUMLSProcessor as the top-level pipeline. The default analysis engines in AggregatePlaintextFastUMLSProcessor for negation, uncertainty, generic, etc. are StatusAnnotator + NegationAnnotator + PolarityCleartkAnalysisEngine + SubjectCleartkAnalysisEngine + UncertaintyCleartkAnalysisEngine + GenericCleartkAnalysisEngine + HistoryCleartkAnalysisEngine. It looks like StatusAnnotator and NegationAnnotator are commented out in the node part of the descriptor, so only the remaining five analysis engines are actually used, and all of them are in the same desc/ctakes-assertion folder.

These five analysis engines were not effective on my test files, and I'm still confused about their relationship to the assertionAnalysisEngine, conceptConverterAnalysisEngine, GenericAttributeAnalysisEngine and SubjectAttributeAnalysisEngine used in assertionMiniPipelineAnalysisEngine. It looks to me like the "Cleartk" in their names indicates something, but I couldn't figure it out without going through the Java code, which I'd rather not do at this stage.

That's pretty much all of it for now. Anyone familiar with this topic is welcome to jump in with insights or corrections. Hopefully we can have a nice discussion that is useful to other users and developers.

P.S. The reason for using AggregatePlaintextFastUMLSProcessor rather than AggregatePlaintextProcessor is that I find the preferred words property in the former very useful, and I can't get it using the latter.

Best,
Yiming

--
Yiming Zuo <https://sites.google.com/site/yimingzuo/>
Georgetown U. Medical Center: Dr. Ressom's Omics Lab <http://omics.georgetown.edu/>
ECE Department of Virginia Tech: Computational Bioinformatics & Bio-imaging Laboratory <http://www.cbil.ece.vt.edu/>
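
P.P.S. For anyone who wants to check which delegates an aggregate descriptor really runs without opening the Java code, here is a minimal sketch using only the JDK's XML parser. It relies on the fact that a UIMA aggregate descriptor lists its active delegates as <node> elements under flowConstraints/fixedFlow, and that a commented-out <node> is an XML comment rather than an element. The class name ActiveFlowNodes and the embedded SAMPLE string are my own assumptions for illustration: SAMPLE is a heavily simplified, hypothetical fragment based on the engine names above, not a copy of the real AggregatePlaintextFastUMLSProcessor descriptor.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

public class ActiveFlowNodes {

    // Hypothetical, simplified stand-in for the flow section of an aggregate
    // descriptor. The real file contains much more and may differ in detail.
    static final String SAMPLE =
        "<flowConstraints><fixedFlow>"
        + "<!-- <node>StatusAnnotator</node> -->"
        + "<!-- <node>NegationAnnotator</node> -->"
        + "<node>PolarityCleartkAnalysisEngine</node>"
        + "<node>SubjectCleartkAnalysisEngine</node>"
        + "<node>UncertaintyCleartkAnalysisEngine</node>"
        + "<node>GenericCleartkAnalysisEngine</node>"
        + "<node>HistoryCleartkAnalysisEngine</node>"
        + "</fixedFlow></flowConstraints>";

    // Returns the text of every <node> element in document order.
    // XML comments are not elements, so commented-out nodes are skipped.
    static List<String> activeNodes(String descriptorXml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document doc = builder.parse(
            new ByteArrayInputStream(descriptorXml.getBytes(StandardCharsets.UTF_8)));
        NodeList nodes = doc.getElementsByTagName("node");
        List<String> names = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            names.add(nodes.item(i).getTextContent().trim());
        }
        return names;
    }

    public static void main(String[] args) throws Exception {
        // Print one active delegate name per line.
        for (String name : activeNodes(SAMPLE)) {
            System.out.println(name);
        }
    }
}
```

For the sample string this lists only the five Cleartk engines. To try it on a real descriptor, you would read the file contents into a string and pass that to activeNodes instead of SAMPLE. Note that it only reports the flow order, not what each delegate does.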