FW: Facing issues with cTakes confidence score [EXTERNAL]

Finan, Sean Tue, 02 Jan 2018 09:35:31 -0800

Just in case somebody else has already done this or has any ideas, I am 
forwarding the question and one answer to:
> How do we find out which entity has more relevance to the document?  I need 
> this, as we need to limit our outputs to max 10 terms for one clinical 
> document.

Sean

From: Finan, Sean
Sent: Tuesday, January 02, 2018 12:31 PM
To: 'Ratan Sharma'
Subject: RE: Facing issues with cTakes confidence score [EXTERNAL]

Hi Ratan,

There are a couple of things that you can do, but getting down to 10 terms per 
note will be difficult.

The first thing to do is go into your resources and edit your dictionary’s 
setup xml file.  It is either in ctakes-dictionary-lookup-fast-res/ or 
resources/ depending upon how you are running.  Go all the way down to the end of 
org/ctakes/dictionary/lookup/fast/ and open the xml file there.  At the bottom of that file you will see 
a couple of commented lines, one with “PrecisionTermConsumer”.  Uncomment that 
line and comment out the line with “DefaultTermConsumer”.  This will limit 
mentions so that you will get things like “lung cancer” instead of both “lung 
cancer” and “cancer” – “lung cancer” being the more specific disease.  You will 
still get “lung” in each case as an anatomical site.

The second thing that you can do is build up a map of counts per CUI.  You can 
get a map of CUIs to the number of times each appears in the document 
(Map<String,Long>) with the following call:
OntologyConceptUtil.getCuiCounts( jCas )

You can sort by the number of appearances and grab the top 10.  Another thing 
that might help is filtering out the negated concepts.  Something like:
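// Count CUIs over the non-negated annotations only.
Map<String,Long> cuiCountsYes =
   JCasUtil.select( jCas, IdentifiedAnnotation.class ).stream()
      .filter( ia -> ia.getPolarity() != CONST.NE_POLARITY_NEGATION_PRESENT )
      .map( OntologyConceptUtil::getCuis )
      .flatMap( Collection::stream )
      .collect( Collectors.groupingBy( Function.identity(), Collectors.counting() ) );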
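
The "sort by the number of appearances and grab the top 10" step could look 
roughly like the sketch below.  It uses only plain Java and takes any 
Map<String,Long> of CUI counts as input.  The class name TopCuiSketch and the 
method topCuis are made up for illustration and are not part of ctakes; 
breaking ties alphabetically is also just an assumption to keep the output 
stable.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper, not a ctakes class.
final class TopCuiSketch {
   private TopCuiSketch() {}

   // Returns up to 'limit' CUIs, most frequent first.
   // Ties are broken alphabetically by CUI so the result is deterministic.
   static List<String> topCuis( final Map<String,Long> cuiCounts, final int limit ) {
      return cuiCounts.entrySet().stream()
            .sorted( Map.Entry.<String,Long>comparingByValue( Comparator.reverseOrder() )
                  .thenComparing( Map.Entry.comparingByKey() ) )
            .limit( limit )
            .map( Map.Entry::getKey )
            .collect( Collectors.toList() );
   }
}
```

Calling TopCuiSketch.topCuis( counts, 10 ) on the counts map would then give 
at most 10 CUIs for the document.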

Another thing to do would be to filter by subject.  For each identified 
annotation check .getSubject().equals( CONST.ATTR_SUBJECT_PATIENT ) and keep 
only the annotations for which it is true.
Related to subject, you can filter out identified annotations in sections like 
family history.  Use JCasUtil.indexCovered( jCas, Segment.class, 
IdentifiedAnnotation.class ) to get the annotations covered by each segment, 
and filter by checking each segment’s .getPreferredText().  If the preferred 
text is “Family Medical History” then you can probably discount everything in 
that section.
Likewise, if the mentions are in things like “Patient History” then they may 
not have to do with the current encounter.  You can find section names in the 
ctakes-core-res DefaultSectionRegex.bsv file.  You will need to have the 
BsvRegexSectionizer in your pipeline.  I would use the SectionedFastPipeline 
piper in ctakes-clinical-pipeline-res and add your custom filtering annotator 
to the end of it.
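
The negation, subject and section checks above can be combined into one 
filter.  The following is only a sketch in plain Java: the Mention class is a 
hypothetical stand-in for the values you would read from an 
IdentifiedAnnotation and its covering Segment, the subject value "patient" is 
assumed to match CONST.ATTR_SUBJECT_PATIENT, and the skipped section name is 
the one mentioned above.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical sketch, not ctakes code.
final class MentionFilterSketch {
   private MentionFilterSketch() {}

   // Stand-in for the values read from an annotation and its section.
   static final class Mention {
      final String cui;
      final String subject;
      final String sectionName;
      final boolean negated;

      Mention( final String cui, final String subject,
               final String sectionName, final boolean negated ) {
         this.cui = cui;
         this.subject = subject;
         this.sectionName = sectionName;
         this.negated = negated;
      }
   }

   // Assumed value of CONST.ATTR_SUBJECT_PATIENT.
   static final String PATIENT = "patient";

   // Sections to discount; extend this with names from DefaultSectionRegex.bsv.
   static final Set<String> SKIPPED_SECTIONS
         = new HashSet<>( Arrays.asList( "Family Medical History" ) );

   // Keeps affirmed mentions about the patient that are outside skipped sections.
   static boolean isRelevant( final Mention mention ) {
      return !mention.negated
            && PATIENT.equals( mention.subject )
            && !SKIPPED_SECTIONS.contains( mention.sectionName );
   }

   // Returns the CUIs of the mentions that pass isRelevant, in input order.
   static List<String> relevantCuis( final List<Mention> mentions ) {
      return mentions.stream()
            .filter( MentionFilterSketch::isRelevant )
            .map( m -> m.cui )
            .collect( Collectors.toList() );
   }
}
```

In a real annotator the same three checks would run against the jCas 
annotations directly instead of against the stand-in class.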

Lastly, if you use the temporal modules you can filter by the time relative to 
the document time (doc time rel) being overlap or before/overlap.  Use the 
SectionedTemporalPipeline piper in ctakes-temporal-res.  Then some code like 
the following:
if ( annotation instanceof EventMention ) {
   final Event event = ((EventMention)annotation).getEvent();
   if ( event != null ) {
      final EventProperties properties = event.getProperties();
      if ( properties != null ) {
         final String doctimerel = properties.getDocTimeRel();
         // Keep the annotation only when it overlaps the document time.
         final boolean keepThisAnnotation = doctimerel != null
               && doctimerel.toUpperCase().contains( "OVERLAP" );
      }
   }
}

That should give you a start.  I am not sure how much each will help, but they 
are suggestions of things that you can try.

Sean

From: Ratan Sharma [mailto:ratanc...@gmail.com]
Sent: Tuesday, January 02, 2018 11:20 AM
To: Finan, Sean
Subject: Re: Facing issues with cTakes confidence score [EXTERNAL]

Thanks Sean for the reply.

So is there no way we can assign relevance/confidence to entities?  How do we 
find out which entity has more relevance to the document?  I need this, as we 
need to limit our outputs to max 10 terms for one clinical document.

Thanks for your time on this. Really appreciate it.

On Tue, Jan 2, 2018 at 9:00 PM, Finan, Sean 
<sean.fi...@childrens.harvard.edu> wrote:
Hi Ratan,

What Tim said is absolutely correct.  Those mentions are all discovered by 
dictionary lookup procedures.  The default procedure is strict lookup against a 
term in the dictionary database and no lookup has any more validity than any 
other, so “confidence” is pretty meaningless.

Confidence can be introduced by other modules and for various reasons, but for 
creation of mentions using standard ctakes that value is never set.

Sean

From: Ratan Sharma [mailto:ratanc...@gmail.com]
Sent: Saturday, December 30, 2017 5:23 AM
To: Finan, Sean
Subject: Facing issues with cTakes confidence score [EXTERNAL]

Hi Sean,

Can you please add your thoughts to this query :

http://ctakes.markmail.org/search/?q=#query:+page:1+mid:czbrt7itywvvjqnm+state:results

I am looking for a way to distinguish which entity has a higher weight than 
the others, like a relevance score for each entity.

Would it be possible to have a meeting to discuss this?  Any time that suits 
you is fine with me.

Thank you.
Ratan
