Re: polarity tag in output for mention/concept. [EXTERNAL] [SUSPICIOUS] [SUSPICIOUS]

2017-11-28 Thread Kathy Ferro
Tim and Sean,

I think I spoke too soon. I don't know why it didn't work the first few
times I ran it.

I changed both scope sizes to 20.  It looks like we are safe here: the scope
appears to be 20 to the left/right within the same sentence, because the
term "right breast" still comes out positive.








My sample text:

A Regional Med Center This should not interfere with ROS
section: denies fatigue, malaise, fever, weight loss
MDM/ED Course
Imaging: The patient underwent an ultrasound-guilded core needle
biopsy with clip pacement of the 2.3cm mass in the upper outer quadrant of
the right breast.
ROS: Heme/Lymphatic: denies easy or excessive bruising,
history of blood transfusions, anemia, bleeding disorders, adenopathy,
chills, sweats
Allergic/Immunologic: denies urticaria, hay fever, frequent
UTIs; denies HIV high risk behaviors

Here's how I run it.  Is there a better way to run it?

I made a copy of runPipeFile.bat and added:

@rem -
set FAST_PIPER=resources\org\apache\ctakes\clinical\pipeline\FastPipeline1.piper
java -Dctakes.umlsuser="myUser" -Dctakes.umlspw="myPW" -cp "%CLASS_PATH%" %LOG4J_PARM% -Xms512M -Xmx3g %PIPE_RUNNER% -p %FAST_PIPER% %* -i C:\Projects\NLPinbound --xmiOut C:\Projects\NLPoutbound -l org/apache/ctakes/dictionary/lookup/fast/aa_5.xml
:end


I am thrilled with this discovery.
Thanks for your help.
Kathy



RE: polarity tag in output for mention/concept. [EXTERNAL] [SUSPICIOUS] [SUSPICIOUS]

2017-11-28 Thread Finan, Sean
Thanks Tim,

"Negation's Not Solved" :^)


Re: polarity tag in output for mention/concept. [EXTERNAL] [SUSPICIOUS]

2017-11-28 Thread Miller, Timothy
I'll just point out -- the kind of examples Kathy gave were the bane of
our existence while working on the ML-based assertion system. Even
though it is obvious to a human what is going on, it was hard to encode
as a feature in a way that was learnable. But I think most rule-based
algorithms will also run into problems with this type of example
eventually if they have a hard-coded scoping mechanism (e.g., scope
extends up to 10 words to the right). If you make it larger, then you
may increase the number of false positives your algorithm finds
(confusingly, here a false positive is an example the algorithm calls
negated that is not actually being negated).
Tim
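
To make the trade-off above concrete, here is a minimal sketch of a
fixed-window negation scope. It is a toy illustration only, not the cTAKES
ContextAnnotator implementation: the trigger list, the tokenizer, and the
function names (negated_indices, is_negated) are all assumptions made up for
this example. It shows a small right-hand window missing the end of a
"denies" list, and a large window without a sentence boundary negating a term
in the next sentence.

```python
import re

# Hypothetical trigger list, for illustration only.
NEGATION_TRIGGERS = {"denies", "denied", "deny"}


def tokenize(text):
    """Split text into lowercase word tokens, keeping '.' as a sentence marker."""
    return re.findall(r"[a-z0-9]+|\.", text.lower())


def negated_indices(tokens, max_right, stop_at_sentence=True):
    """Return indices of word tokens within max_right words after any trigger."""
    negated = set()
    for i, tok in enumerate(tokens):
        if tok not in NEGATION_TRIGGERS:
            continue
        words_seen = 0
        for j in range(i + 1, len(tokens)):
            if tokens[j] == ".":
                if stop_at_sentence:
                    break  # scope never crosses a sentence boundary
                continue  # boundary ignored; '.' does not count as a word
            words_seen += 1
            if words_seen > max_right:
                break
            negated.add(j)
    return negated


def is_negated(text, term, max_right, stop_at_sentence=True):
    """True if every token of the first occurrence of term lies in a negation scope."""
    tokens = tokenize(text)
    term_tokens = tokenize(term)
    scope = negated_indices(tokens, max_right, stop_at_sentence)
    n = len(term_tokens)
    for start in range(len(tokens) - n + 1):
        if tokens[start:start + n] == term_tokens:
            return all(k in scope for k in range(start, start + n))
    return False


note = "Denies fatigue, malaise, fever, weight loss. Mass in the right breast."

# Small window: the tail of the list falls outside the scope (a miss).
print(is_negated(note, "weight loss", max_right=3))
# Larger window, bounded by the sentence: the list is covered, the next sentence is not.
print(is_negated(note, "weight loss", max_right=10))
print(is_negated(note, "right breast", max_right=10))
# Larger window with no sentence boundary: "right breast" is wrongly negated.
print(is_negated(note, "right breast", max_right=10, stop_at_sentence=False))
```

In this toy version the sentence boundary is what keeps the larger window
from producing the false positive; whether a given annotator applies such a
boundary is something to confirm by testing, as Kathy did with "right breast".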


On Tue, 2017-11-28 at 17:22 +, Finan, Sean wrote:
> Hi Kathy,
> 
> I am glad that you checked the wiki!  I should have pointed to it ...
> 
> In the example I sent the "relevant distance" between trigger terms
> and events would be 10.  There isn't any maximum as far as I know,
> but I think that 10 is the most that I've ever used.  The default is
> 7, and you can try with that (remove "*=*") before increasing the
> number(s).
> 
> The piper files aren't source code, they are just plain text and
> don't require compiling, etc.  How are you running the pipeline right
> now?  From a binary with a bin/run* script?
> 
> Sean
> 
> 
> -Original Message-
> From: Kathy Ferro [mailto:healthcare1...@gmail.com] 
> Sent: Tuesday, November 28, 2017 12:11 PM
> To: dev@ctakes.apache.org
> Subject: Re: polarity tag in output for mention/concept. [EXTERNAL]
> 
> Sean,
> 
> Thank you for the information.
> 
> I was reading the document.  So, are MaxLeftScopeSize and
> MaxRightScopeSize limited to 10?  Is there any way to adjust them
> without modifying the source code?
> 
> https://cwiki.apache.org/confluence/display/CTAKES/cTAKES+4.0+-+NE+Contexts
> 
> 
> Thanks again,
> Kathy
> 
> 
> 
> On Tue, Nov 28, 2017 at 9:31 AM, Finan, Sean <Sean.Finan@childrens.harvard.edu> wrote:
> 
> > 
> > Hi Kathy,
> > 
> > The negation annotator used in the default clinical pipeline is based
> > upon machine learning and trained on real data.  It is possible that
> > such "denies" lists were underrepresented in the training data.  One
> > thing that you can try is adding another negation annotator.  The
> > ContextAnnotator in ctakes-ne-contexts will add negation to terms
> > without removing existing negation.  It also has configurable
> > scope/distance that may be helpful.
> > 
> > To use this, create a new piper file containing the two lines
> > 
> > load DefaultFastPipeline
> > add ContextAnnotator MaxLeftScopeSize=10 MaxRightScopeSize=10
> > 
> > The default scope sizes are 7, but increasing the MaxRight* might
> > help with your "denies" discoveries.  7 might be ok for the left, so
> > feel free to remove "MaxLeftScopeSize=10" from the line.
> > 
> > Then run your piper file (command line, gui, maven profile, etc.)
> > https://cwiki.apache.org/confluence/display/CTAKES/Piper+Files
> > 
> > Sean
> > 
> > -Original Message-
> > From: Kathy Ferro [mailto:healthcare1...@gmail.com]
> > Sent: Monday, November 27, 2017 8:10 PM
> > To: dev@ctakes.apache.org
> > Subject: polarity tag in output for mention/concept. [EXTERNAL]
> > 
> > Good evening,
> > 
> > I ran a few sentences through the default clinical pipeline.
> > 
> > It is really reliable if there is only one term after the negation,
> > but I am getting inconsistent polarity values for lists of terms.
> > Please see the examples below.
> > 
> > 1.   denies fatigue, malaise, fever, weight loss
> > SignSymptomMention:
> > polarity = -1: fatigue, malaise, fever
> > polarity = 1: weight loss
> > Why does weight loss get singled out?
> > 
> > 2.   denies ear pain or discharge, nasal obstruction or discharge, sore throat
> > polarity = -1: ear pain or discharge
> > polarity = 1: nasal obstruction or discharge, obstruction, sore throat
> > It doesn't even acknowledge the list.
> > 
> > 3.   denies back pain, joint swelling, joint stiffness, joint pain
> > polarity = -1: back pain, Swelling
> > polarity = 1: Joint swelling, Stiffness, pain
> > What! This totally messes up the pattern.
> > 
> > 4.   denied back pain, joint swelling, joint stiffness, joint pain
> > OK, maybe it doesn't like the word "denies"; I changed it to "denied",
> > "deny", etc.
> > polarity = -1: Swelling
> > Everything else is 1.
> > 
> > 
> > My question is:
> > How do I handle the negative claims in the document?