No, SHARPn was a later project. I'm not sure if there is any overlap in the datasets.
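As for inspecting the weights: ClearTK's LIBLINEAR-backed classifiers ultimately store a LIBLINEAR model, and LIBLINEAR's plain-text model format (header lines followed by one weight row per feature) can be read directly. A minimal sketch, assuming the model is in that plain-text format — note that mapping weight indices back to feature names would additionally require the serialized feature encoders, which this does not cover, and the inline example content is made up for illustration:

```python
def parse_liblinear_model(lines):
    """Parse LIBLINEAR plain-text model content into (header, weights)."""
    header = {}
    weights = []
    in_weights = False
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if in_weights:
            # One row per feature; multiclass models have one column per class.
            weights.append([float(tok) for tok in line.split()])
        elif line == "w":
            # Everything after the "w" marker line is the weight matrix.
            in_weights = True
        else:
            key, _, value = line.partition(" ")
            header[key] = value
    return header, weights

# Toy model content in LIBLINEAR's format (values are invented):
example = """\
solver_type L2R_L2LOSS_SVC_DUAL
nr_class 2
label 1 -1
nr_feature 3
bias -1
w
0.25
-0.8
0.1
"""

header, weights = parse_liblinear_model(example.splitlines())
print(header["nr_feature"])  # "3"
print(weights[1][0])         # -0.8
```

In practice you would read the lines from the extracted model file rather than an inline string; large positive or negative weights then point at the features most strongly driving the polarity decision.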
There are two ways to look at the features: one is to read this paper: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0112774 and the other is to look at the source: http://svn.apache.org/viewvc/ctakes/trunk/ctakes-assertion/src/main/java/org/apache/ctakes/assertion/medfacts/cleartk/AssertionCleartkAnalysisEngine.java?view=markup

Tim

-----Original Message-----
From: ouyeyu panyu <ouy...@gmail.com>
Reply-to: <u...@ctakes.apache.org>
To: u...@ctakes.apache.org
Cc: dev@ctakes.apache.org
Subject: Re: Question about negation [EXTERNAL]
Date: Wed, 16 Jan 2019 08:09:06 -0800

Hi Timothy,

Thank you very much for the quick response.

https://pdfs.semanticscholar.org/8f2c/a8b638d216a3e9ec10cd1c21bdaeaa74a229.pdf says:

"The Mayo-derived linguistically annotated corpus (Mayo) was developed in-house and consisted of 273 clinical notes (100,650 tokens; 7,299 sentences; 61 consult; 1 discharge summary; 4 educational visit; 4 general medical examination; 48 limited exam; 19 multi-system evaluation; 43 miscellaneous; 1 preoperative medical evaluation; 3 report; 3 specialty evaluation; 5 dismissal summary; 73 subsequent visit; 5 therapy; 3 test-oriented miscellaneous)."

Is SHARPn based on the aforementioned 273 clinical notes? Also, is there a way for me to look into the trained SVM model, say, what the features are and what their weights are?
Best,
Yu Pan

On Wed, Jan 16, 2019 at 7:58 AM Miller, Timothy <timothy.mil...@childrens.harvard.edu> wrote:

It uses an SVM model. The training data is from a project called SHARPn; it is notes from Mayo Clinic, with a variety of note types and specialties represented.

As for the example, is it a real example that someone wrote "Deny hepatitis"? That sounds more like a command than documentation of a negated concept ("denies" or "denied" would seem more common?). Even if it is a real example, I think it's unusual enough that there are probably no examples of "Deny X" in the training data.

Tim

-----Original Message-----
From: ouyeyu panyu <ouy...@gmail.com>
Reply-to: <u...@ctakes.apache.org>
To: u...@ctakes.apache.org, dev@ctakes.apache.org
Subject: Question about negation [EXTERNAL]
Date: Wed, 16 Jan 2019 07:51:20 -0800

Hi cTAKES dev team,

I have one question and hope someone can help me with it.

For negation, "Denies hepatitis" returns polarity=-1, but "Deny hepatitis" returns polarity=1. It is said that cTAKES uses ClearTK's PolarityCleartkAnalysisEngine for negation, which is machine-learning based. It seems this issue is caused by the training data. Is this true? And what is the training data, and which machine learning algorithm is used: LogisticRegression, SVM, RandomForest, or something else?

Thanks.