RE: neural negation model in ctakes

2022-02-10 Thread Thomas W Loehfelm
Thanks for the BERT negation classifier, Tim. I tested it on a small sample of 
radiology notes and didn't see a big improvement in accuracy (kudos to the 
original cTAKES pipeline!), but your framework of a FastAPI component that 
loosely interfaces with cTAKES was really interesting.

@Peter - I'd love to try your improved Negex Annotator too - is it available 
somewhere?

I made a relative of Tim's FastAPI component, based instead on the medSpaCy 
ConText annotator, and am sharing it here in case it is useful to anyone else. 
I haven't gotten around to the head-to-head-to-head comparison between cTAKES, 
RoBERTa, and medSpaCy ConText, but I will someday and can update the list.

Check it out here: https://github.com/twloehfelm/medSpaCy_Context

The FastAPI docs serve as a useful intro (localhost:8000/docs once up and 
running), but you basically pass it a dictionary of

{
  accnum: [accession number],
  report: [report text],
  annotations: List[Annotation]
}

where Annotation is an object with:
  first_pos: int
  last_pos: int
  is_negated: bool
  is_uncertain: bool
  is_conditional: bool
  is_historic: bool
  subject: str
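
For illustration, that schema maps naturally onto Pydantic models on the FastAPI 
side. The sketch below is just my reading of the field list above, not the repo's 
actual code, and the defaults are assumptions:

from typing import List
from pydantic import BaseModel

class Annotation(BaseModel):
    first_pos: int                 # start character offset of the entity span
    last_pos: int                  # end character offset of the entity span
    is_negated: bool = False
    is_uncertain: bool = False
    is_conditional: bool = False
    is_historic: bool = False
    subject: str = "patient"       # assumed default

class ReportRequest(BaseModel):
    accnum: str                    # accession number
    report: str                    # full report text
    annotations: List[Annotation]  # cTAKES-derived spans to check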

I parse cTAKES annotations into a database in a form similar to the Annotation 
object above, so with this FastAPI endpoint I pass in the cTAKES output and try 
to find the matching contexts using the medSpaCy ConText pipeline. I return a 
list of exact matches, or overlapping spans if there is no exact match, with each 
attribute set to True if either cTAKES or medSpaCy thought it was true. Again, I 
haven't done the testing yet to figure out the best way to ensemble these together.
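
For a usage illustration, calling the service from Python might look like the 
sketch below. The route name, port, and response shape are guesses on my part 
(check localhost:8000/docs on the running service for the real ones); only the 
request fields come from the description above.

import requests

report = "No evidence of pneumothorax. Small right pleural effusion."

payload = {
    "accnum": "A0000001",          # made-up accession number
    "report": report,
    "annotations": [{              # a cTAKES-derived span covering "pneumothorax"
        "first_pos": 15, "last_pos": 27,   # assumes end-exclusive character offsets
        "is_negated": False, "is_uncertain": False,
        "is_conditional": False, "is_historic": False,
        "subject": "patient",
    }],
}

# "/context" is a placeholder route; see the FastAPI docs page for the real path.
resp = requests.post("http://localhost:8000/context", json=payload, timeout=60)
resp.raise_for_status()
print(resp.json())  # merged spans, with flags True if either cTAKES or medSpaCy set them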

Definite limitations:
* The annotations medSpaCy finds are not linked to any knowledge base. It is 
not hard to add that using sciSpaCy, but I haven't found it to be helpful yet, 
and it adds significantly to the startup and document processing time.
* Using overlapping matches is probably a bad idea, since some of the named 
entities identified by medSpaCy include the negation term, which really messes 
with the is_negated accuracy.
* Probably a million others

Sharing in case it is useful to bootstrap someone else.


Tom


Re: neural negation model in ctakes

2021-01-25 Thread Finan, Sean
Hi Tim,

This is really exciting!  

Just having this code available to use as a template is extremely useful.

Cheers,
Sean


Re: neural negation model in ctakes

2021-01-24 Thread Miller, Timothy
Peter, I'd be happy to try it, especially if it's made easy with a cTAKES 
module! At the very least, that sounds like it would be a good baseline 
comparison to use if we are benchmarking new ML methods. We have several 
datasets available internally that are not widely available in the research 
community.
Tim



Re: neural negation model in ctakes

2021-01-24 Thread Peter Abramowitsch
That's great, Tim - it sounds very sophisticated!

In fact I had made some changes to the Negex Annotator last fall which I
hadn't checked in; I was waiting for Sean to test them. In a great deal of my
own testing I discovered that Negex, which is easily expandable to
accommodate new constructions, had only a couple of serious flaws, and I
believe I have fixed these, as well as a performance issue it had. If
you're interested in testing it against yours, that would be great.
Reading your description above, I wondered how it would do in the case of
strings of entities negated by a single trigger phrase either ahead of or
behind the series (e.g., "no evidence of pneumothorax, pleural effusion, or
consolidation"). Or what happens when a series of entities that begins as
all negated has one expressed in a way that stops the negation pattern.
These are the weaknesses I addressed in my changes.

Regards
Peter



neural negation model in ctakes

2021-01-24 Thread Miller, Timothy
Hi all,
I just checked in a usable proof of concept for a neural (RoBERTa-based, to be 
specific) negation classifier. The way it works is that a tiny bit of Python code 
(using FastAPI) sets up a REST interface that runs the classifier:
ctakes-assertion/src/main/python/negation_rest.py

It runs a default model that I trained and uploaded to the Hugging Face model hub. 
It will download automatically the first time the server is run.
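
For anyone who wants the shape of it without opening the file, the general pattern 
is roughly the sketch below. This is not the actual negation_rest.py: the model id 
is a placeholder for the one I uploaded, and the request/response shapes are 
assumptions; it just shows the FastAPI-plus-transformers pattern.

from typing import List
from fastapi import FastAPI
from pydantic import BaseModel
from transformers import pipeline

app = FastAPI()
classifier = None  # loaded lazily by /negation/initialize

class ProcessRequest(BaseModel):
    # One text instance per entity to classify; the real payload shape may differ.
    instances: List[str]

@app.post("/negation/initialize")
def initialize():
    """Load the model (downloads from the Hugging Face hub on first use)."""
    global classifier
    # "some-org/negation-model" is a placeholder, not the actual model id.
    classifier = pipeline("text-classification", model="some-org/negation-model")
    return {"status": "initialized"}

@app.post("/negation/process")
def process(req: ProcessRequest):
    """Classify each instance and return cTAKES-style polarities."""
    results = classifier(req.instances)
    # Label names are assumptions; map negated -> -1, affirmed -> 1.
    return {"polarities": [-1 if r["label"] == "NEGATED" else 1 for r in results]}

@app.post("/negation/collection_process_complete")
def collection_process_complete():
    """Unload the model to free GPU memory."""
    global classifier
    classifier = None
    return {"status": "unloaded"}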

There is a startup script there too:
ctakes-assertion/src/main/python/start_negation_rest.sh

The idea would be to run this on whatever machine you have with the appropriate 
GPU resources. To mirror UIMA workflows, it creates three REST endpoints:
/negation/initialize -- to load the model (takes longer the first time, as it 
will download)
/negation/process -- to classify the data and return negation values
/negation/collection_process_complete -- to unload the model

Then, the UIMA analysis engine sits in:
ctakes-assertion/src/main/java/org/apache/ctakes/assertion/ae/PolarityBertRestAnnotator.java

The main work here is converting the cTAKES entities/events into a simpler data 
structure that gets sent to the Python REST server, making the REST call, and 
then converting the classifier output into the polarity property.
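
For concreteness, the Python side of that round trip might look like the sketch 
below. The payload shape is an assumption (consistent with the hypothetical server 
sketch above), not the actual wire format used by PolarityBertRestAnnotator.

import requests

base = "http://localhost:8000/negation"

requests.post(f"{base}/initialize", timeout=600)   # slow the first time: downloads the model

# Hypothetical payload shape: one text instance per entity to classify.
resp = requests.post(f"{base}/process",
                     json={"instances": ["No evidence of pneumothorax."]},
                     timeout=60)
print(resp.json())   # e.g. {"polarities": [-1]} with the sketch server above

requests.post(f"{base}/collection_process_complete", timeout=60)   # unload the model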

Performance:
The accuracy of this classifier is much better in my testing. I am also hoping 
this makes the path to improving performance easier, since an upgrade can 
potentially be just a change to the model string so that it grabs a new model 
from the model hub.

The speed is marginally slower if we do a 1-for-1 swap, but that's a little bit 
misleading, because we currently run two parsers to generate features for the 
default ML negation module. If we don't need those parsers, we can dramatically 
cut processing time even with the neural negation module. I tested this with the 
Python code running on a machine with a 1070 Ti. Going forward, if we want to 
scale, the goal for these methods should be to have the neural call do several 
things in a single pass, especially if we are using large transformer models. But 
this proof of concept of a single task will hopefully make it easier for other 
folks to do that if they wish.

FYI, another way of doing this is by using Python libraries like cassis and 
actually having Python functions be essentially UIMA AEs -- I think there will 
be a place for both approaches and I'm not trying to wall off work in that 
direction.
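
For anyone curious, that route would look roughly like the sketch below: load the 
cTAKES type system and an XMI CAS with cassis, classify each mention, and write 
polarity back. The file paths and the classifier stub are placeholders, not a 
worked-out design.

from cassis import load_typesystem, load_cas_from_xmi

def looks_negated(text: str) -> bool:
    # Stand-in for the real classifier; replace with the REST call above
    # (which would typically see the surrounding sentence, not just the span).
    return False

# Paths are placeholders: the cTAKES type system and an XMI-serialized CAS.
with open("TypeSystem.xml", "rb") as f:
    typesystem = load_typesystem(f)
with open("note.xmi", "rb") as f:
    cas = load_cas_from_xmi(f, typesystem=typesystem)

# Walk the cTAKES event mentions, classify each, and write polarity back in place.
for mention in cas.select("org.apache.ctakes.typesystem.type.textsem.EventMention"):
    text = cas.sofa_string[mention.begin:mention.end]
    mention.polarity = -1 if looks_negated(text) else 1

cas.to_xmi("note_out.xmi")  # serialize the updated CAS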

Tim