Re: Disambiguation --alignment with SNOMED [EXTERNAL]

Finan, Sean Wed, 09 Dec 2020 08:49:09 -0800

Hi Eugenia,

I think that I actually have code scattered about that can help with a lot of 
this.  It isn't checked in and I will need to shove some things around to make 
it fully-ctakes-compatible.

I can't do anything right now, but since this seems to be pretty urgent for you 
I will start putting things together after work.

Sean
________________________________________
From: Monogyiou, Eugenia <eugenia.monogy...@nttdata.com>
Sent: Wednesday, December 9, 2020 11:32 AM
To: dev@ctakes.apache.org
Subject: RE: Disambiguation --alignment with SNOMED [EXTERNAL]

* External Email - Caution *

Many thanks for the support, Sean. Let me explain myself a bit.

Ideally I would need all the key entities (Medications, Diseases, 
Sign/Symptom, Procedures and Labs) extracted from a set of clinical letters. 
These letters are not clinical notes, i.e. they have paragraphs with titles 
that tell you what type of content to expect in the paragraph, but there is no 
timeline of events described in the letter and you cannot associate entities 
or events via cause-and-effect relationships. They refer to heart 
attack/heart disease cohorts only.

The absolute priorities at the moment are Medications and Diseases. It would be 
great if I could get the medication dosages which, when actually described, 
are always in different formats, and 99% of these formats cannot be found in 
the measurement files in the source code.
The other key concern is disambiguation (since I can't get YTEX to work), so I 
am thinking of other ways to apply disambiguation (e.g. lexicon expansion, 
pre-processing of certain acronyms etc.).
And lastly, but very importantly, I need to create an evaluation process 
against a gold standard dataset (are there templates/code included in the 
trunk that do evaluation?).
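
A minimal sketch of what such an evaluation could look like, independent of 
anything that may or may not exist in the cTAKES trunk. It assumes that both 
the gold standard and the pipeline output have already been reduced to 
(document, begin offset, end offset, CUI) tuples and that only exact matches 
count. The Mention type, the evaluate function and the identifiers in the 
usage example are hypothetical names introduced for illustration.

```python
from typing import Dict, NamedTuple, Set


class Mention(NamedTuple):
    """One annotated entity: document, character span and concept identifier."""
    doc_id: str
    begin: int
    end: int
    cui: str


def evaluate(gold: Set[Mention], predicted: Set[Mention]) -> Dict[str, float]:
    """Exact-match precision, recall and F1 over two sets of mentions."""
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Made-up mentions, for illustration only (the CUIs are placeholders).
gold = {Mention("letter1", 10, 17, "C1111111"), Mention("letter1", 40, 61, "C2222222")}
predicted = {Mention("letter1", 10, 17, "C1111111"), Mention("letter1", 70, 78, "C3333333")}
print(evaluate(gold, predicted))
```

Exact matching is the strictest choice. Relaxing it (overlapping spans, or 
matching on entity type instead of CUI) only changes how the two sets are 
built, not the arithmetic.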

Any feedback is greatly appreciated  :)

Kind Regards,
Eugenia

-----Original Message-----
From: Finan, Sean <sean.fi...@childrens.harvard.edu>
Sent: 09 December 2020 14:23
To: dev@ctakes.apache.org
Subject: Re: Disambiguation --alignment with SNOMED [EXTERNAL]

Hi Eugenia,

I don't know that anybody on the devlist regularly uses the 
org.hsqldb.util.DatabaseManager tool and there might be a better forum for 
questions on that topic.

We could take a step back here and see if there might be more direct ways to 
address your efforts.  By that I mean perhaps we can look at the larger picture 
and come up with some single solution to many smaller problems.

What exactly are you trying to extract from your documents?  Do you have a 
certain clinical domain or certain clinical elements that interest you the 
most?  Are you only interested in entities from a single vocabulary?

Thanks,
Sean
________________________________________
From: Monogyiou, Eugenia <eugenia.monogy...@nttdata.com>
Sent: Wednesday, December 9, 2020 5:52 AM
To: dev@ctakes.apache.org
Subject: RE: Disambiguation --alignment with SNOMED [EXTERNAL]

Many thanks for the suggestion. Before using the sample tool I tried the hsqldb 
manager and the results were surprising. Please bear with me because I am 
really confused...

I copied the hsqldb jar to where my dict script and properties files are, then 
navigated there and ran the following command:

java -cp hsqldb-2.3.4.jar org.hsqldb.util.DatabaseManager

The GUI launched successfully and the user was set to SA. I then set the URL to

jdbc:hsqldb:\apache-ctakes-4.0.0\resources\org\apache\ctakes\dictionary\lookup\fast\sno_rx_16ab\sno_rx_16ab

and the connection was established successfully.

However, the tree is empty! No schema, no tables. If I open the script file, 
all the relevant SQL commands are there to create the cui_terms, prefterm 
tables etc. Any idea what I am doing wrong? Is it possible that my installation 
was problematic, or is it a matter of configuration? I have used many pipelines 
and I am getting many annotations from different coding schemes across all the 
key entities. I can even see in the command line when the hsqldb is accessed 
when I run the pipelines, so I must be missing something here.
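
One possible explanation, not confirmed anywhere in this thread, is that the 
URL above has no drive letter and no "file:" prefix, so HSQLDB may have 
resolved it to a different location and created a fresh, empty database there. 
The sketch below builds an explicit file URL from the absolute path of the 
.script file and appends the HSQLDB connection property ifexists=true, which 
makes the connection fail instead of silently creating a new database. The 
helper name hsqldb_file_url and the C: drive in the example are assumptions.

```python
from pathlib import PureWindowsPath


def hsqldb_file_url(script_path: str, must_exist: bool = True) -> str:
    """Build a jdbc:hsqldb:file: URL from the Windows path of a .script file."""
    path = PureWindowsPath(script_path)
    if path.suffix.lower() != ".script":
        raise ValueError("expected the path of a .script file")
    if not path.is_absolute():
        # A path without a drive letter is not absolute on Windows.
        raise ValueError("use an absolute path that includes the drive letter")
    # The database name is the file path without the .script extension.
    url = "jdbc:hsqldb:file:" + path.with_suffix("").as_posix()
    if must_exist:
        # ifexists=true: refuse to create a new, empty database.
        url += ";ifexists=true"
    return url


# Hypothetical install location on drive C:.
print(hsqldb_file_url(
    r"C:\apache-ctakes-4.0.0\resources\org\apache\ctakes\dictionary"
    r"\lookup\fast\sno_rx_16ab\sno_rx_16ab.script"
))
```

If the connection then fails with a "database does not exist" style error, 
that would suggest the path rather than the installation is the problem.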

Thank you for your patience with me !

Kind Regards,

Eugenia Monogyiou | NTT Data UK
Consulting & IT Solutions Ltd. 1 Royal Exchange, London EC3V 3DG

Mob: +44 (0)7971623683 Email: eugenia.monogy...@nttdata.com

-----Original Message-----
From: Finan, Sean <sean.fi...@childrens.harvard.edu>
Sent: 08 December 2020 19:23
To: dev@ctakes.apache.org
Subject: Re: Disambiguation --alignment with SNOMED [EXTERNAL]

Hi Eugenia,

Within the past few years people have made some user-friendly tools.  Check out:

https://razorsql.com/features/hsqldb_gui_tools.html

After you launch it (30 day trial is free), create a new connection as the 
first panel indicates.

- Type any name for the connection (line #1)
- For the Administrator, type "SA"
- point to the sno_rx_16ab.script file (or better yet, a copy that you can play 
with)

You are done with those settings.  Opening the db will take a few seconds.

In the new main panel, in the tree on the left select Project > PUBLIC > Tables.
You should see the tables that are relevant to ctakes.

Right-click on one of the tables and select "Edit".
A panel should pop up with the table.  You should be able to edit the columns 
and rows.

There is also a gui that the hsqldb people make, but it is a little primitive.
http://www.hsqldb.org/doc/2.0/guide/running-chapt.html#rgc_access_tools
You can find instructions online such as:
https://waqasaslam.me/2019/06/24/how-to-view-hsql-db-in-a-gui-hsql-database-manager-for-sap-hybris/

Sean

________________________________________
From: Monogyiou, Eugenia <eugenia.monogy...@nttdata.com>
Sent: Tuesday, December 8, 2020 1:08 PM
To: dev@ctakes.apache.org
Subject: RE: Disambiguation --alignment with SNOMED [EXTERNAL]

Many thanks for your support thus far. I am finally shaping a decent pipeline 
for the data I am working on -- Chunk in the overlap annotator actually works 
great!

I have never tried to access the hsqldb before, so I will have to ask for 
instructions for that please (signal processing person here)!
I looked up some advice from Sean, i.e. to copy the hsql***.jar from 
[ctakes_root]/lib/, navigate to the 
cTakes\apache-ctakes-4.0.0\resources\org\apache\ctakes\dictionary\lookup\fast\sno_rx_16ab
folder on the command line and type

java -cp hsqldb-2.3.4.jar org.hsqldb.util.SqlTool --rcfile [sno_rx_16ab].rc [sno_rx_16ab]

to open the SqlTool.

The sno_rx_16ab dictionary was created with the Dictionary Creator GUI of 
ctakes 4.0.0. The problem is that there is no .rc file there; there are 
.script, .properties and .lck files. I know the .script file contains the SQL 
commands, but I prefer not to improvise and change something. Any instructions 
on how I can inspect the tables in hsqldb, please?
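
Since the .script file is plain text containing the SQL commands, one 
read-only way to get an overview without opening the database at all is to 
scan it for CREATE TABLE and INSERT INTO statements. This is a rough sketch: 
the function name summarize_script is hypothetical, and it assumes each 
statement starts at the beginning of a line.

```python
import re
from collections import Counter
from typing import Dict, Iterable

# Table names may carry a schema prefix (PUBLIC.NAME) and/or double quotes.
CREATE_RE = re.compile(
    r'^CREATE\s+(?:MEMORY\s+|CACHED\s+|TEXT\s+)?TABLE\s+([\w."]+)', re.IGNORECASE
)
INSERT_RE = re.compile(r'^INSERT\s+INTO\s+([\w."]+)', re.IGNORECASE)


def _table_name(raw: str) -> str:
    """Drop quotes and any schema prefix, and upper-case the name."""
    return raw.replace('"', "").split(".")[-1].upper()


def summarize_script(lines: Iterable[str]) -> Dict[str, int]:
    """Map each table created in the script to its number of INSERT rows."""
    tables = []
    rows = Counter()
    for line in lines:
        created = CREATE_RE.match(line)
        if created:
            tables.append(_table_name(created.group(1)))
            continue
        inserted = INSERT_RE.match(line)
        if inserted:
            rows[_table_name(inserted.group(1))] += 1
    return {name: rows[name] for name in tables}
```

It can be called on an open file handle, for example 
summarize_script(open(path, encoding="utf-8", errors="replace")) with path 
pointing at a copy of sno_rx_16ab.script. Nothing is written, so the 
dictionary itself is not changed.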

Thank you!

Kind Regards,

Eugenia

-----Original Message-----
From: Peter Abramowitsch <pabramowit...@gmail.com>
Sent: 04 December 2020 20:03
To: dev@ctakes.apache.org
Subject: Re: Disambiguation --alignment with SNOMED

Hi Eugenia.  I may be wrong, but that XML definition is out of date (which is 
why it is commented out).  Through the piper mechanism you have a different 
choice.  Here follows a bit more.  I hope some of it is useful.

Highly specific identification of terms is difficult and I am working on some 
infrastructure to help in really capturing values - not only lab values, but it 
will take a long time as I'm just doing it for fun.  But your problem seems 
more like a dictionary issue.

I won't pretend to be an expert or to have tried out every possibility, but 
I'll give you a few tips.  The important thing is to know that, for me at 
least,  Ctakes is not a finished product but an eternal work in progress.
It takes years of experimentation and configuration.

First you need to understand what specific terms and contexts your physicians 
are using and whether the punctuation is clean enough that you can work with 
sentences or need to go down to the chunk level.

In the UMLS Dictionary Lookup mechanism, the WindowAnnotation param is 
probably something you can supply in a piper file, and it is the FQN of a 
class that extends Annotation.  You could create your own Annotation & 
Annotator, or you could try using a Chunk annotator upstream of the UMLS 
lookup.  The piper creator helps you do that.  Then you would add the FQN 
of a Chunk to the window param of your UMLS lookup annotator.  I used it a 
long time ago and from what I remember it basically tries to identify clauses 
within sentences.  By doing this, especially with the Overlap Annotator, you'd 
prevent the lookup from spilling across clauses within a sentence.

You may want to play with the SentenceDetectorAnnotatorBIO instead of the 
SentenceDetector to see which gets you the most workable sentences.  And you 
may want to look at this file: EndOfSentenceScannerImpl.java.

Customizing the dictionary usually means adding a synonym for each wording that 
represents a context in which your term will be found.  Now in your specific 
example about a monocyte procedure vs a monocyte count result, these are not 
just distinct in SNOMED terms but also have distinct CUIs.  Here are the two 
canonical terms with their CUIs as I found them, each with its synonyms.  As 
you can see, these synonyms are woefully insufficient: not only have they 
blurred the distinction you were looking for, but the SNOMED mapping overlaps 
the two concepts.  This was probably done as an expedient, but from an 
informatics perspective you are right.  This is incorrect.

INSERT INTO PREFTERM VALUES(750880,'Monocyte count result')    (TUI 34)
SYNONYMS count monocytes
SNO *365631001*

INSERT INTO PREFTERM VALUES(200637,'Monocyte count procedure')    (TUI 59)
SYNONYMS monos, monocyte count
SNO 67776007, *365631001*
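
The overlap marked with asterisks above can be found mechanically once the 
(CUI, SNOMED code) pairs are available. A minimal sketch, assuming the pairs 
have already been extracted somehow (how they are read out of the dictionary 
is left open here); the function name codes_shared_across_cuis is 
hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def codes_shared_across_cuis(rows: Iterable[Tuple[int, str]]) -> Dict[str, List[int]]:
    """Return every code that is mapped to more than one CUI."""
    cuis_by_code = defaultdict(set)
    for cui, code in rows:
        cuis_by_code[code].add(cui)
    return {
        code: sorted(cuis)
        for code, cuis in cuis_by_code.items()
        if len(cuis) > 1
    }


# The three mappings quoted in the listing above.
rows = [
    (750880, "365631001"),
    (200637, "67776007"),
    (200637, "365631001"),
]
print(codes_shared_across_cuis(rows))
```

For these three rows only 365631001 is reported, because it appears under 
both 750880 and 200637.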

Check out how a row like this works.
INSERT INTO CUI_TERMS VALUES(CUI,INDEX,COUNT,'<context with keyword>','<keyword>')

You can add these rows to match the language used by your physicians or in your 
forms.
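
A rough sketch of generating such rows from a list of local wordings, 
following the column order shown above. Several things here are assumptions 
rather than documented cTAKES behaviour: that the text is stored lower-cased, 
that tokens are separated by single spaces (the real tokenizer may split 
punctuation differently), and that INDEX is the zero-based position of the 
keyword among the tokens. The function name cui_terms_insert is hypothetical. 
Comparing the output against existing rows in the dictionary before adding 
anything would be a sensible check.

```python
def cui_terms_insert(cui: int, text: str, keyword: str) -> str:
    """Build one INSERT INTO CUI_TERMS statement for a synonym text."""
    tokens = text.lower().split()
    keyword = keyword.lower()
    if keyword not in tokens:
        raise ValueError(f"keyword {keyword!r} is not a token of {text!r}")

    def quote(value: str) -> str:
        # SQL string literal: single quotes are escaped by doubling them.
        return "'" + value.replace("'", "''") + "'"

    index = tokens.index(keyword)  # assumption: zero-based token position
    count = len(tokens)
    normalized = " ".join(tokens)
    return (
        f"INSERT INTO CUI_TERMS VALUES("
        f"{cui},{index},{count},{quote(normalized)},{quote(keyword)})"
    )


print(cui_terms_insert(750880, "Monocyte count result", "monocyte"))
```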

I had to do a fair bit of juggling to get what we needed and it's a job 
that's never finished.  The way I save my changes is to produce sed files 
of the deletions, changes and additions made to the standard dictionary, and 
archive those rather than the dictionary itself, which is quite large.
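
This is not the sed workflow described above, but the same idea of archiving 
only the differences can be sketched by comparing the standard and the 
customized script files line by line. It assumes one SQL statement per line 
and treats a changed row as one removal plus one addition; the function name 
dictionary_delta is hypothetical.

```python
from typing import Iterable, List, Tuple


def dictionary_delta(
    standard: Iterable[str], customized: Iterable[str]
) -> Tuple[List[str], List[str]]:
    """Return (added, removed) statements between two dictionary scripts."""
    standard_lines = [line.rstrip("\n") for line in standard]
    customized_lines = [line.rstrip("\n") for line in customized]
    standard_set = set(standard_lines)
    customized_set = set(customized_lines)
    # Keep file order so the archived delta stays readable.
    added = [line for line in customized_lines if line not in standard_set]
    removed = [line for line in standard_lines if line not in customized_set]
    return added, removed
```

The two lists are usually tiny compared with the dictionary and can be stored 
as plain text next to the version of the standard dictionary they apply to.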

I hope this helps.

Peter

On Fri, Dec 4, 2020 at 5:07 PM Monogyiou, Eugenia < 
eugenia.monogy...@nttdata.com> wrote:

> Thank you all for the support!
> Sean, Kean, the labValueFinder works as described, so thanks for
> pointing that out!
>
> Peter, I will ask for your help with the LookupWindow if you could
> please spare a bit more time... I have located the
> UmlsOverlapLookupAnnotator file, thank you for that.
>
> I have located in the UmlsLookupAnnotator (in
> ctakes-dictionary-lookup-fast)
> <name>windowAnnotations</name>
>             <value>
>                <!--  LookupWindowAnnotation is supposed to be a refined Noun Phrase  -->
>                <!--<string>org.apache.ctakes.typesystem.type.textspan.LookupWindowAnnotation</string>-->
>                <!--  In some instances LookupWindowAnnotation is missing tokens and Sentence can be used -->
>                <string>org.apache.ctakes.typesystem.type.textspan.Sentence</string>
>             </value>
>
> I have gone through various java and typesystem files but I am not
> sure where I can find all the potential options for the Lookup Window
> and where/how I can set these. Also, if you could please let me know
> where in the code it is possible to see what symbols are considered
> "end-of-sentence". I have noticed that ":" sometimes defines the end
> of a sentence but I haven't located anything relevant in the code ...
>
> Peter says:
> > > Sometimes you need to further customize your dictionary.
> (Can you please elaborate?)
>
> Many thanks in advance,
>
> Kind Regards,
>
> Eugenia Monogyiou | NTT Data UK
> Consulting & IT Solutions Ltd. 1 Royal Exchange, London EC3V 3DG
>
> Mob: +44 (0)7971623683 Email: eugenia.monogy...@nttdata.com
>
>
> -----Original Message-----
> From: Peter Abramowitsch <pabramowit...@gmail.com>
> Sent: 03 December 2020 18:54
> To: dev@ctakes.apache.org
> Subject: Re: Disambiguation --alignment with SNOMED
>
> I have this issue a lot.  There are many moving parts.   Sometimes it can
> be resolved by using the widest window in the DictionaryLookup or
> sometimes the TermOverlap lookup annotator.  Sometimes you need to
> further customize your dictionary.
>
> The problem arises when there isn't enough context to whittle down the
> lookup to the correct SNOMED entity. Or there isn't a synonym entry in the
> Dictionary that maps to the widest context in your texts.    If you look at
> how the UMLS SNO_RX dictionary is structured you'll see how it can happen.
>
> For starters, look at the raw XMI and see all the entries in the
> UmlsArray that were selected, even if later only the one wrong entry surfaced.
>
> Another issue is the LabValueFinder.  It has settings that allow it to
> clone procedures into lab values or vice versa (I can't remember).
> This can lead to a lot of duplication.
>
> Peter
>
> On Thu, Dec 3, 2020 at 2:23 PM Monogyiou, Eugenia <
> eugenia.monogy...@nttdata.com> wrote:
>
> > Hello,
> >
> > I think I have hit a wall in terms of applying disambiguation in the
> > cTakes context. I have come across the following example where what
> > I consider to be a lab result (Monocyte Count) is picked up as a
> > procedure, apparently, in alignment with UMLS
> > coding Scheme = SNOMED    Code = 67776007,    CUI = C0200637,  TUI = T059,
> > preferredText = "Monocyte Count Procedure"
> > coding Scheme = SNOMED    Code = 365631001,   CUI = C0200637,  TUI = T059,
> > preferredText = "Monocyte Count Procedure"
> >
> > While they share the CUI (at UMLS level, due to the reconciliation
> > of different ontologies), they are quite different concepts.
> > 67776007 stands for "Monocyte count (procedure)" while 365631001
> > stands for "Finding of monocyte count (finding)". So is it fair to
> > say that cTakes is not fully aligned with SNOMED?  Is there a rule
> > on how such concepts may be merged under the same CUI? Would using
> > YTEX resolve similar issues?
> >
> > And also I'm using cTakes 4.0.0 and the YTEX installation guide
> > appears to be outdated: the patch download is missing, names of
> > files are missing etc.
> > If YTEX is the answer, are there any updated instructions? If it is
> > not, are you using other UIMA-friendly solutions?
> >
> > Many thanks in advance,
> > Eugenia
> >
> > Disclaimer: This email and any attachments are sent in strictest
> > confidence for the sole use of the addressee and may contain legally
> > privileged, confidential, and proprietary data. If you are not the
> > intended recipient, please advise the sender by replying promptly to
> > this email and then delete and destroy this email and any
> > attachments without any further use, copying or forwarding.
> >
