To get predicate-argument structure, the best approach is probably the SRL (Semantic Role Labeling) annotator, which is part of the ctakes-dependency-parser module. Check the desc/ directory in that module for some sample pipelines to see its dependencies. Once you have that running, look for these types in the CVD (CAS Visual Debugger) to get a feel for how predicate arguments are represented in the CAS:

    org.apache.ctakes.typesystem.type.textsem.Predicate
    org.apache.ctakes.typesystem.type.textsem.SemanticArgument
    org.apache.ctakes.typesystem.type.textsem.SemanticRoleRelation

If you are not familiar with SRL, check out this demo:
http://cogcomp.org/page/demo_view/SRL
and these slides (specifically the PropBank material, since that is the style cTAKES uses):
https://nlp.stanford.edu/kristina/papers/SRL-Tutorial-post-HLT-NAACL-06.pdf

I believe Stanford NLP has a module to do this too, but of course it is not trained on clinical data and does not use the augmented set of verb senses that the PropBank team created for the clinical domain.

Tim

________________________________________
From: Don Flinn <fl...@alum.mit.edu>
Sent: Monday, June 18, 2018 5:40 AM
To: dev@ctakes.apache.org
Subject: Parse Medical Research Papers [EXTERNAL]

I want to parse medical research papers and am looking at using cTAKES. I realize that cTAKES is aimed at clinical reports, but I would like to see whether I can use it for my purposes. Initially I am looking to get a tuple of Subject, Predicate, Object for each sentence, and later additional semantic information.

I modified ClinicalPipelineFactory.java to use the following portion of a research report:

"A research team based in Houston has developed a prototype for a “bionic” heart replacement. Other designs all mimic the beating of a heart, but due to many moving parts, the mechanical hearts would quickly wear out. The heart developed by BiVACOR does not beat, and instead has one moving part which propels the blood throughout the body. The bionic heart has been safely and successfully transplanted into animals leading to very promising results."

I got the following result:

Entity: heart === Polarity: 1 === Uncertain? false === Subject: patient === Generic? false === Conditional? false === History? false
Entity: replacement === Polarity: 1 === Uncertain? false === Subject: patient === Generic? false === Conditional? false === History? false
Entity: mimic === Polarity: 0 === Uncertain? false === Subject: null === Generic? false === Conditional? false === History? false
Entity: heart === Polarity: 1 === Uncertain? false === Subject: patient === Generic? false === Conditional? false === History? false
Entity: heart === Polarity: 1 === Uncertain? false === Subject: patient === Generic? false === Conditional? false === History? false
Entity: heart === Polarity: 1 === Uncertain? false === Subject: patient === Generic? false === Conditional? false === History? false

I assume my problem is related to the SNOMED database, which is not built for what I want.

My questions:
- Is my assumption correct?
- Should I attempt to modify/extend SNOMED?
- Is there a better/different way to query SNOMED to meet my needs?
- Is there an existing database that I could use with cTAKES that would better meet my needs?
- Should I instead use the Stanford Java NLP system or Apache OpenNLP? I'll still need a database.

Thank you for any suggestions.

Don
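
For the Subject, Predicate, Object tuples asked about in the original message, here is a minimal, library-free Java sketch of how PropBank-style role labels might be collapsed into a triple. It does not call cTAKES. The Arg, Frame and Triple classes and the toTriple method are hypothetical stand-ins, the label strings "ARG0" and "ARG1" are an assumption (check the actual label values on SemanticArgument in the CVD), and the sample frames are hand-written from the sample text, not real annotator output.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

public class SpoSketch {

    // Hypothetical stand-in for one labeled argument of a predicate.
    static final class Arg {
        final String label;
        final String text;
        Arg(String label, String text) { this.label = label; this.text = text; }
    }

    // Hypothetical stand-in for a predicate together with its arguments.
    static final class Frame {
        final String predicate;
        final List<Arg> args;
        Frame(String predicate, List<Arg> args) { this.predicate = predicate; this.args = args; }
    }

    // The (subject, predicate, object) tuple the question asks for.
    static final class Triple {
        final String subject;
        final String predicate;
        final String object;
        Triple(String subject, String predicate, String object) {
            this.subject = subject;
            this.predicate = predicate;
            this.object = object;
        }
        @Override
        public String toString() {
            return "(" + subject + ", " + predicate + ", " + object + ")";
        }
    }

    // Text of the first argument carrying the given role label, if any.
    static Optional<String> firstWithLabel(Frame frame, String label) {
        return frame.args.stream()
                .filter(a -> a.label.equals(label))
                .map(a -> a.text)
                .findFirst();
    }

    // Assumption: ARG0 (agent-like) is treated as the subject and ARG1
    // (patient-like) as the object. Frames missing either role give no triple.
    static Optional<Triple> toTriple(Frame frame) {
        Optional<String> subject = firstWithLabel(frame, "ARG0");
        Optional<String> object = firstWithLabel(frame, "ARG1");
        if (!subject.isPresent() || !object.isPresent()) {
            return Optional.empty();
        }
        return Optional.of(new Triple(subject.get(), frame.predicate, object.get()));
    }

    public static void main(String[] args) {
        // Hand-written frames for illustration only.
        Frame active = new Frame("developed", Arrays.asList(
                new Arg("ARG0", "A research team based in Houston"),
                new Arg("ARG1", "a prototype for a bionic heart replacement")));
        Frame passive = new Frame("transplanted", Arrays.asList(
                new Arg("ARG1", "The bionic heart")));

        System.out.println(toTriple(active).map(Triple::toString).orElse("no triple"));
        System.out.println(toTriple(passive).map(Triple::toString).orElse("no triple"));
    }
}
```

The passive frame has no agent-like argument, so this sketch yields no triple for it. A real pipeline would need to decide how to handle such sentences, for example by emitting a triple with an empty subject.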