
Re: Building 2023AA Snomed RxNorm dictionary fails [EXTERNAL]

2023-08-07 Thread Finan, Sean

Hi Akram,

The first thing that I'll mention is that there are a lot of updates to the 
ctakes Dictionary builder in the unreleased version 5.0, so I am going to talk 
about its use.  https://github.com/apache/ctakes

>  1. Combining SNOMED and RxNorm in Dictionary Creation:
> I extracted data from umls-2023AA-full and RxNorm_full. After utilizing NLM 
> Metamorphosys to install UMLS, the conversion of SNOMED from umls-2023AA-full 
> into RRF files was successfully accomplished.
- For clarity, are you saying that you created RRF files for SNOMED from 
umls-2023AA-full and separate RRF files from the RxNorm_full source?  If so, 
are you sure that the UMLS 2023AA full release doesn't already contain all of 
the RxNorm information that you need?

> I can only select one "UMLS Installation" source, limiting me to either 
> SNOMED or RxNorm.
- This is correct.  Normally a dictionary is built from RRF files created using 
metamorphosys on a single source.
- There are two possible clobberings to combine dictionaries from disparate 
sources:

  1.  Concatenate the source RRF files from both sources.  You should only need 
to do this with the MRCONSO RRF files.  Then select the directory containing 
the concatenated RRF (and other RRF files) as the umls source for the 
dictionary creator gui.
  2.   Build 2 ctakes dictionaries, one from each source.  Then concatenate all 
"INSERT" lines into one dictionary file.
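The two approaches above can be sketched in Java. This is only an illustration and rests on several assumptions. Both .script files are assumed to come from the same dictionary creator version, so they declare the same tables. Both RRF files are assumed to share the MRCONSO column layout and identifiers. The class and method names are hypothetical and are not part of ctakes.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class MergeDictionaryScripts {

    // Approach 1: concatenate two MRCONSO.RRF files, keeping the first copy of any identical row.
    static List<String> concatRrf(List<String> first, List<String> second) {
        Set<String> rows = new LinkedHashSet<>(first);
        rows.addAll(second);
        return new ArrayList<>(rows);
    }

    // Approach 2: append the INSERT lines of the second .script to the first one.
    // CREATE/SET lines of the second script are ignored, so the first script must
    // already declare every table that the appended rows insert into.
    static List<String> mergeInserts(List<String> base, List<String> extra) {
        List<String> merged = new ArrayList<>(base);
        Set<String> seen = new HashSet<>(base);
        for (String line : extra) {
            if (line.startsWith("INSERT INTO ") && seen.add(line)) {
                merged.add(line);
            }
        }
        return merged;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 3) {
            System.err.println("usage: MergeDictionaryScripts <base.script> <extra.script> <out.script>");
            return;
        }
        List<String> base = Files.readAllLines(Paths.get(args[0]), StandardCharsets.UTF_8);
        List<String> extra = Files.readAllLines(Paths.get(args[1]), StandardCharsets.UTF_8);
        Files.write(Paths.get(args[2]), mergeInserts(base, extra), StandardCharsets.UTF_8);
    }
}
```

For example: java MergeDictionaryScripts snomed.script rxnorm.script merged.script (the file names are hypothetical). Exact duplicate rows are skipped, so concepts present in both sources are not inserted twice.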

- A cleaner method for your situation is to create one ctakes dictionary for 
snomed and a separate ctakes dictionary for rxnorm.  Then create a dictionary 
descriptor file for multiple dictionaries.  Tim Miller has a great example of 
one here:  
https://github.com/tmills/ctakes-docker/blob/master/ctakes-as-pipeline/MultipleDictionaryLookupSpecExample.xml
- The multiple dictionary approach is more flexible, but try not to use 
multiple dictionaries with a lot of overlap.

> 2. Error Message During Dictionary Build:
> Log Message: user lacks privilege or object not found: MED in statement 
> [insert into MED-RT (CUI,MED-RT)  values (?,?)]
- I think that vocabularies containing a dash in the name, such as "MED-RT", were 
problematic in older versions of the dictionary creator.  It should be OK with 
v5.0.
- The problem stemmed from SQL not allowing dash characters in table names 
without special treatment.  ctakes gets around it by converting the dash 
character to an underscore.
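The failing statement fits this explanation. The SQL parser presumably reads the unquoted name MED-RT as the identifier MED followed by a minus sign, which would produce "object not found: MED". Below is a minimal sketch of the dash-to-underscore workaround described above. The class name and the statement shape are assumptions modeled on the error message, not the dictionary creator's actual code.

```java
public class SqlSafeNames {

    // Replaces dashes so that a vocabulary name such as "MED-RT" becomes a valid
    // unquoted SQL identifier ("MED_RT"). Sketch only; the real conversion may do more.
    static String toTableName(String vocabulary) {
        return vocabulary.replace('-', '_');
    }

    // Parameterized insert for a per-vocabulary code table, shaped like the failing statement.
    static String insertSql(String vocabulary) {
        String name = toTableName(vocabulary);
        return "insert into " + name + " (CUI," + name + ") values (?,?)";
    }

    public static void main(String[] args) {
        System.out.println(insertSql("MED-RT")); // insert into MED_RT (CUI,MED_RT) values (?,?)
    }
}
```

Quoting the identifier ("MED-RT") is the other standard SQL option. However, quoted identifiers become case sensitive, so the underscore form is easier to use consistently.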

Sean


From: Akram 
Sent: Saturday, August 5, 2023 11:34 AM
To: dev@ctakes.apache.org 
Subject: Building 2023AA Snomed RxNorm dictionary fails [EXTERNAL]

* External Email - Caution *

Hi All

I've been working on creating a dictionary for the 2023AA UMLS, specifically 
incorporating SNOMED and RxNorm. However, I've encountered two main challenges 
that I'm hoping to get assistance with:

1. Combining SNOMED and RxNorm in Dictionary Creation:
   To initiate the process, I extracted data from umls-2023AA-full and 
RxNorm_full. After utilizing NLM Metamorphosys to install UMLS, the conversion 
of SNOMED from umls-2023AA-full into RRF files was successfully accomplished. 
However, when I used the cTAKES Dictionary Creator to combine UMLS SNOMED and 
RxNorm into a single dictionary, I ran into an issue: I can only select one 
"UMLS Installation" source, which limits me to either SNOMED or RxNorm. Is there 
a viable way to incorporate both SNOMED and RxNorm into the dictionary 
generation process?

2. Error Message During Dictionary Build:
   Following the selection of the NLM Metamorphosys output folder as the "UMLS 
Installation" source and checking all relevant boxes for Vocabulary and 
Semantic Type, I clicked on "Build Dictionary". Unfortunately, this action 
resulted in an error message being displayed. I'm seeking guidance on how to 
address and resolve this error in order to successfully complete the dictionary 
creation process.

Error Message: Dictionary ctakesdictionary could not be built in F:\cTAKES
Log Message: user lacks privilege or object not found: MED in statement [insert 
into MED-RT (CUI,MED-RT)  values (?,?)]

I truly appreciate any assistance that can be provided in overcoming these 
challenges.

Thank you very much.


Re: Building a new custom dictionary or Updating/Adding values to the existing dictionary in cTAKES [EXTERNAL]

2020-09-17 Thread Finan, Sean

Hi Abad,

If I am following you, this is a different problem.  

Previously you had the ICD code (for instance) in the text itself.  You wanted 
ctakes to identify the ICD code in the text and annotate it.
For this, I have no idea why a number in the dictionary would not be 
discovered.  I think that you have removed all of the filters that would 
prevent such a thing and you are practically left with pure string matching.
Did you add the PrettyText writer to your pipeline?  Did you check its output?  
This kind of data can really help debugging.

Now it seems that you are asking about assigning ICD codes to some annotation 
discovered in the text, like "cancer".  For this second problem:

1.  You must make sure that you copy not only the INSERT lines, but also the 
CREATE TABLE and CREATE INDEX (on cui) statements.  I am guessing that you did, 
because otherwise hsql should throw an error and ctakes should exit.  I am 
writing this to attempt a complete answer.

2.  You must modify your dictionary parameters .xml file.  If you are using 
sno_rx_16ab then it is sno_rx_16ab.xml, in the parent directory of the .script 
file.  Within sno_rx_16abConcepts you should see declared properties

You need to create properties for your codes.  For instance

The value is one of "long", "double", "text" if I remember correctly.  Text can 
be used for long and double as well as text - but you will want to match your 
table's column type.
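Point 1 above can be automated. The rough sketch below assumes the generated .script uses HSQLDB-style lines such as CREATE MEMORY TABLE PUBLIC.ICD10CM(...), CREATE INDEX ... ON PUBLIC.ICD10CM(CUI) and INSERT INTO ICD10CM VALUES(...). The table name ICD10CM and the class name are hypothetical. The code collects the definition lines and the data rows for one table so that they can be copied into another script.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class TableStatements {

    // Collects what a copied code table needs: its CREATE TABLE line, any CREATE INDEX
    // on it, and its INSERT rows. Names may carry a PUBLIC. prefix; matching ignores case.
    static List<String> statementsFor(List<String> scriptLines, String table) {
        String name = "(?:PUBLIC\\.)?" + Pattern.quote(table);
        int flags = Pattern.CASE_INSENSITIVE;
        Pattern create = Pattern.compile("^CREATE\\s+(?:\\w+\\s+)*TABLE\\s+" + name + "\\s*\\(.*", flags);
        Pattern index = Pattern.compile("^CREATE\\s+(?:UNIQUE\\s+)?INDEX\\s+\\S+\\s+ON\\s+" + name + "\\s*\\(.*", flags);
        Pattern insert = Pattern.compile("^INSERT\\s+INTO\\s+" + name + "\\s+VALUES\\s*\\(.*", flags);

        List<String> definitions = new ArrayList<>();
        List<String> rows = new ArrayList<>();
        for (String line : scriptLines) {
            if (create.matcher(line).matches() || index.matcher(line).matches()) {
                definitions.add(line);
            } else if (insert.matcher(line).matches()) {
                rows.add(line);
            }
        }
        definitions.addAll(rows); // definitions first, then data
        return definitions;
    }
}
```

When pasting, the CREATE lines presumably belong with the other CREATE statements near the top of sno_rx_16ab.script, ahead of the INSERT section, because the script is executed in order. The properties change in point 2 is still needed separately.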


As an aside, since you obtained your icd and cpt codes separately from the cuis 
used in sno_rx_16ab, they won't match up 1:1.  There may be cuis that don't have 
a code in your icd table, but I would bet that there are a lot of icd codes in 
that table with cuis that are not in the main table; those rows will never be 
used and just slow down select calls.  You could try to filter your copied 
insert statements by the cuis that exist in the cui_terms table, but that is 
up to you.
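If you do want to filter, here is a minimal sketch. It assumes, as in the INSERT examples elsewhere in this thread, that the first value of every row is the numeric CUI. The code table name ICD10CM and the class name are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FilterByCui {

    // Captures the table name and the first value (assumed to be the numeric CUI).
    private static final Pattern INSERT =
            Pattern.compile("^INSERT INTO (\\w+) VALUES\\((\\d+),.*", Pattern.CASE_INSENSITIVE);

    // CUIs that appear in the CUI_TERMS rows of a dictionary script.
    static Set<Long> termCuis(List<String> scriptLines) {
        Set<Long> cuis = new HashSet<>();
        for (String line : scriptLines) {
            Matcher m = INSERT.matcher(line);
            if (m.matches() && m.group(1).equalsIgnoreCase("CUI_TERMS")) {
                cuis.add(Long.parseLong(m.group(2)));
            }
        }
        return cuis;
    }

    // Keeps only the code-table INSERT lines whose CUI already exists in CUI_TERMS.
    static List<String> keepKnownCuis(List<String> codeInserts, Set<Long> knownCuis) {
        List<String> kept = new ArrayList<>();
        for (String line : codeInserts) {
            Matcher m = INSERT.matcher(line);
            if (m.matches() && knownCuis.contains(Long.parseLong(m.group(2)))) {
                kept.add(line);
            }
        }
        return kept;
    }
}
```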


Sean



RE: Building a new custom dictionary or Updating/Adding values to the existing dictionary in cTAKES [EXTERNAL]

2020-09-17 Thread Abad.Ayyub

Hi Sean,

We tried this and unfortunately it wasn't working. Our main goal was to have 
cTAKES extract/detect the respective code (SNOMED/RxNorm/ICD/CPT). Meanwhile, we 
saw an attribute named "code" in the OntologyConceptArray, and that attribute 
held the respective SNOMEDCT and RxNorm code. Can we expect this attribute to be 
populated for the other newly configured codes as well? The reason I am asking 
is that we couldn't see the "code" attribute being populated for the newly 
configured ICD/CPT codes. Could you please advise why "code" would not be 
populated for them? Please find below the steps that we took to add ICD/CPT to 
our profile:

1. Generated a custom dictionary using the MetamorphoSys UMLS installation tool 
(where we selected ICD10 and CPT as sources) and used the full set of .rrf files 
in the META folder.
2. Using the same .rrf files, we generated the .script file (which has the INSERT 
statements for the ICD10 and CPT tables).
3. Copied the INSERT statements from the newly generated .script file and merged 
them into the existing sno_rx_16ab.script file.
4. Then restarted cTAKES.

We could see that cTAKES was detecting the newly configured CUIs for ICD/CPT, 
but found that the "code" attribute in the OntologyConceptArray was null for 
the detected ICD and CPT concepts. It would be helpful for us if cTAKES returned 
that "code" attribute for the newly configured ICD and CPT codes. Could you 
please advise whether the steps we followed are correct for adding ICD and 
CPT? Are any other configuration changes required on our end to get the 
"code" attribute populated as expected for ICD and CPT?

Thanks & Regards

Abad Ayyub
Vnet: 406170 | Cell : +91-9447379028



Re: Building a new custom dictionary or Updating/Adding values to the existing dictionary in cTAKES [EXTERNAL]

2020-09-16 Thread Finan, Sean

Hi Abad,

Since you are changing code ...


Line #320 of AbstractJcasTermAnnotator:

 final boolean isNonLookup = baseToken instanceof PunctuationToken
 || baseToken instanceof NumToken
 || baseToken instanceof ContractionToken
 || baseToken instanceof SymbolToken;

Comment out:
 || baseToken instanceof NumToken
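To see the effect of that edit in isolation, here is a toy model of the condition. The token classes below are simplified stand-ins defined only for this sketch; the real ones are UIMA types. This illustrates the logic and does not reproduce AbstractJcasTermAnnotator.

```java
public class NonLookupSketch {

    // Simplified stand-ins for the cTAKES token types (hypothetical, for this sketch only).
    interface BaseToken {}
    static final class PunctuationToken implements BaseToken {}
    static final class NumToken implements BaseToken {}
    static final class ContractionToken implements BaseToken {}
    static final class SymbolToken implements BaseToken {}
    static final class WordToken implements BaseToken {}

    // The condition after the suggested edit: number tokens are no longer "non-lookup".
    static boolean isNonLookup(BaseToken baseToken) {
        return baseToken instanceof PunctuationToken
                // || baseToken instanceof NumToken
                || baseToken instanceof ContractionToken
                || baseToken instanceof SymbolToken;
    }

    public static void main(String[] args) {
        System.out.println(isNonLookup(new NumToken()));         // false
        System.out.println(isNonLookup(new PunctuationToken())); // true
    }
}
```

This thread suggests that a token-type check and a part-of-speech exclusion check both apply. If so, clearing exclusionTags alone would still leave a bare number such as 97112 blocked until the NumToken clause is removed.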


Sean



RE: Building a new custom dictionary or Updating/Adding values to the existing dictionary in cTAKES [EXTERNAL]

2020-09-16 Thread Abad.Ayyub

Hi Sean,

We tried setting exclusionTags="". Please find below the changes we made:

1. Set the value of String DEFAULT_EXCLUSION_TAGS = ""; in the 
JCasTermAnnotator.java file.
2. Removed the values of  tag of   with  tag as 
'exclusionTags' in UmlsLookupAnnotator.xml & UmlsOverlapLookupAnnotator.xml

But "97112" was still not being picked up from the dictionary. Is there 
anywhere else we need to make changes? Kindly advise.

Thanks & Regards

Abad Ayyub
Vnet: 406170 | Cell : +91-9447379028


-Original Message-
From: Finan, Sean 
Sent: Tuesday, September 15, 2020 9:06 PM
To: dev@ctakes.apache.org
Subject: Re: Building a new custom dictionary or Updating/Adding values to the 
existing dictionary in cTAKES [EXTERNAL]

[External]


Hi Abad,

The first thing that I would try for the "97112" problem is changing the parts 
of speech that are ignored for lookup.  Right now a pure number is ignored - it 
is not a word.  So, similar to what I said in my previous email, change the 
dictionary lookup parameter exclusionTags.  But to make sure that you get 
everything, you can first try no exclusions:
set exclusionTags=""
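Here is a small sketch of what an empty exclusionTags does. It assumes the parameter is a comma-separated list of part-of-speech tags. It also assumes a bare number such as 97112 is tagged CD, the Penn Treebank cardinal-number tag. The tag list "CD,DT,IN" is illustrative only and is not the cTAKES default. The class is hypothetical.

```java
import java.util.HashSet;
import java.util.Set;

public class ExclusionTagsSketch {

    // Parses a comma-separated exclusionTags value; "" yields an empty set (nothing excluded).
    static Set<String> parseTags(String exclusionTags) {
        Set<String> tags = new HashSet<>();
        for (String tag : exclusionTags.split(",")) {
            String trimmed = tag.trim();
            if (!trimmed.isEmpty()) {
                tags.add(trimmed);
            }
        }
        return tags;
    }

    // A token passes this check only if its part-of-speech tag is not excluded.
    static boolean posAllowsLookup(String partOfSpeech, Set<String> excluded) {
        return !excluded.contains(partOfSpeech);
    }

    public static void main(String[] args) {
        Set<String> illustrative = parseTags("CD,DT,IN"); // not the cTAKES default list
        Set<String> none = parseTags("");
        System.out.println(posAllowsLookup("CD", illustrative)); // false
        System.out.println(posAllowsLookup("CD", none));         // true
    }
}
```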

My guess with the F84.1 problem is that your sentence splitter is splitting 
"F84.1" but not splitting "F84 . 1".

I think that the best way to start debugging is adding the PrettyTextWriter to 
the end of the piper and looking at its output (see my previous email).   It 
will print each sentence on a line and indicate the part of speech for each 
token.  If you can quickly and easily see what the system is doing then you 
might start to understand what needs to be changed to fit your data.

Sean

From: abad.ay...@cognizant.com 
Sent: Tuesday, September 15, 2020 11:15 AM
To: dev@ctakes.apache.org
Subject: RE: Building a new custom dictionary or Updating/Adding values to the 
existing dictionary in cTAKES [EXTERNAL]

* External Email - Caution *


Thank you Sean for the detailed response.  I think there was a miscommunication 
from our end about the requirement. Your solution of adding spaces between the 
tokens of the entries worked, but it required the input text to have the spaces 
as well. If the text came in as 'F84.1', cTAKES didn't recognize the term, but if 
the text came in as 'F84 . 1', cTAKES recognized the tokens for the INSERT 
statement below.

INSERT INTO CUI_TERMS VALUES(4352,0,3,'F84 . 1','F84')

But we encountered a similar issue when we configured an INSERT entry for CPT 
codes as below,

INSERT INTO CUI_TERMS VALUES(41154,0,1,'97112','97112')

where 97112 is a CPT code (which usually doesn't have decimals or '.'). We 
expected cTAKES to recognize the CPT code '97112' as a separate token, but it 
didn't. Could you please advise us on why this issue came up?

Is there something wrong in the configuration? Do we need something additional 
for cTAKES to recognize the code alone as a separate token? Is there any other 
way we can get the respective ICD/CPT code of the identified annotation from 
cTAKES, such as querying the CPT/ICD table using the
fetched CUI? Kindly advise.
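The last question is about looking up the ICD/CPT code yourself from the CUI that cTAKES returns. One offline option is to index the code-table rows by CUI. This sketch assumes two-column rows such as INSERT INTO CPT VALUES(41154,'97112'). It also assumes that a CUI string like C0041154 maps to the number stored in the table once the leading C is dropped. The table names and the class are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CodeLookup {

    // Two-column rows such as INSERT INTO CPT VALUES(41154,'97112'); '' is an escaped quote.
    private static final Pattern ROW =
            Pattern.compile("^INSERT INTO (\\w+) VALUES\\((\\d+),'((?:[^']|'')*)'\\)$");

    // Assumed CUI encoding: "C0041154" -> 41154 (drop the leading C, parse the digits).
    static long cuiToLong(String cui) {
        return Long.parseLong(cui.substring(1));
    }

    // Builds CUI -> codes for one table from the .script INSERT lines.
    static Map<Long, List<String>> index(List<String> scriptLines, String table) {
        Map<Long, List<String>> byCui = new HashMap<>();
        for (String line : scriptLines) {
            Matcher m = ROW.matcher(line);
            if (m.matches() && m.group(1).equalsIgnoreCase(table)) {
                String code = m.group(3).replace("''", "'");
                byCui.computeIfAbsent(Long.parseLong(m.group(2)), k -> new ArrayList<>()).add(code);
            }
        }
        return byCui;
    }

    public static void main(String[] args) {
        List<String> script = Arrays.asList(
                "INSERT INTO CPT VALUES(41154,'97112')",
                "INSERT INTO ICD10CM VALUES(4352,'F84.1')");
        Map<Long, List<String>> cpt = index(script, "CPT");
        List<String> codes = cpt.getOrDefault(cuiToLong("C0041154"), Collections.emptyList());
        System.out.println(codes); // [97112]
    }
}
```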


Thanks & Regards

Abad Ayyub
Vnet: 406170 | Cell : +91-9447379028



-Original Message-
From: Finan, Sean 
Sent: Monday, September 14, 2020 9:35 PM
To: dev@ctakes.apache.org
Subject: Re: Building a new custom dictionary or Updating/Adding values to the 
existing dictionary in cTAKES [EXTERNAL]

[External]


Hi Abad,


I think that you need to make only one minor change.


ctakes uses "tokens" for identification and not the actual text.  Tokenization 
turns text such as "F84.1" into "F84 . 1": the first token is F84, followed by 
a token encompassing '.' and another with '1'.  The way this is indicated in 
the .script file is by adding a space between each token.  This makes the full 
entry:


INSERT INTO CUI_TERMS VALUES(4352,0,3,'F84 . 1','F84')


Notice that the token count is now 3 and the full text contains the 
between-token spaces.  This would carry forward for the other entries, such as:


INSERT INTO CUI_TERMS VALUES(4352,3,4,'F84 . 1 pdd','pdd')


Sean
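The row layout in the two entries above can be generated from raw text. This sketch assumes the columns are (cui, rindex, tcount, text, rword), which is what those entries suggest: tcount is the number of tokens and rword is the token at position rindex. The regex only approximates the cTAKES tokenizer. Check the real tokenization, for example with the PrettyTextWriter, before relying on it for codes like F84.1. The class is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CuiTermsRow {

    // Rough tokenizer approximation: runs of letters/digits stay together and every
    // other non-space character becomes its own token ("F84.1" -> F84 . 1).
    private static final Pattern TOKEN = Pattern.compile("[A-Za-z0-9]+|[^\\sA-Za-z0-9]");

    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    // Builds INSERT INTO CUI_TERMS VALUES(cui,rindex,tcount,'text','rword') from raw text.
    static String insertFor(long cui, String rawText, int rindex) {
        List<String> tokens = tokenize(rawText);
        if (rindex < 0 || rindex >= tokens.size()) {
            throw new IllegalArgumentException("rindex out of range: " + rindex);
        }
        String text = String.join(" ", tokens).replace("'", "''"); // escape SQL quotes
        String rword = tokens.get(rindex).replace("'", "''");
        return String.format("INSERT INTO CUI_TERMS VALUES(%d,%d,%d,'%s','%s')",
                cui, rindex, tokens.size(), text, rword);
    }

    public static void main(String[] args) {
        System.out.println(insertFor(4352, "F84.1", 0));     // INSERT INTO CUI_TERMS VALUES(4352,0,3,'F84 . 1','F84')
        System.out.println(insertFor(4352, "F84.1 pdd", 3)); // INSERT INTO CUI_TERMS VALUES(4352,3,4,'F84 . 1 pdd','pdd')
    }
}
```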


____
From: abad.ay...@cognizant.com 
Sent: Monday, September 14, 2020 11:32 AM
To: dev@ctakes.apache.org
Subject: RE: Building a new custom dictionary or Updating/Adding values to the 
existing dictionary in cTAKES [EXTERNAL]

* External Email - Caution *


Hi Team,

I hope you are all doing well. With your support, we were able to successfully 
add our required synonyms to the existing dictionary and could see that they 
were being picked up by cTAKES. Now we have a requirement to configure ICD and 
CPT as well, so we followed the steps mentioned in the cTAKES wiki and 
generated the respective .script file.

The newly created dictionary which comprises of SNOMEDCT_US,RxNORM,ICD10,CPT 
are identifying the descriptions as ex

Re: Building a new custom dictionary or Updating/Adding values to the existing dictionary in cTAKES [EXTERNAL]

2020-09-15 Thread Peter Abramowitsch

Thanks Tim.

I've been experimenting with the Penn Treebank and see some potential for
using it as a powerful disambiguation tool.  The complex part is to find a
heuristic that minimizes the number of cases where the "big guns" need to
be brought in -- because, yes, it would really slow things down.

Peter

On Tue, Sep 15, 2020 at 12:54 PM Miller, Timothy <
timothy.mil...@childrens.harvard.edu> wrote:

> Peter,
> The parts of speech come from the ctakes-pos-tagger module, which uses
> the OpenNLP pos tagger trained on clinical data. There is a
> constituency parser as well, which I think in theory can tag even
> better (that might be able to get you a unary branch in a tree from NN
> -> CD -> .), but is a lot slower than the pos tagger and we
> probably don't want to make it necessary to run for simple dictionary
> pipelines.
> Tim

Re: Building a new custom dictionary or Updating/Adding values to the existing dictionary in cTAKES [EXTERNAL]

2020-09-15 Thread Peter Abramowitsch

Sean, this conversation raises for me a question that I've had for a while.
Does the term finding mechanism actually use a treebank to find the POS, or
does it use another, less rigorous approach?  If it were rigorous,
wouldn't it be able to tag a pure number as an NN in the role of object if
it played the corresponding role in the sentence?

I've not had the same problem as Ayyub,  but I have been wondering why one
needed to disable the identification of cm as a genetic acronym because of
situations where clearly cm is part of a unit of measure and would show up
as an entity's modifier in a treebank.

Does the question make sense?

Peter


Re: Building a new custom dictionary or Updating/Adding values to the existing dictionary in cTAKES [EXTERNAL]

2020-09-15 Thread Finan, Sean

I should mention that going the Paragraph route would only impact term lookup.

Re: Building a new custom dictionary or Updating/Adding values to the existing dictionary in cTAKES [EXTERNAL]

2020-09-15 Thread Finan, Sean

Hi Abad,

Changing the sentence detector will change the output. In terms of term lookup, however, I wouldn't call the impact "huge". I would still spot-check a series of notes to see how it impacts your data specifically.

Sean

RE: Building a new custom dictionary or Updating/Adding values to the existing dictionary in cTAKES [EXTERNAL]

2020-09-15 Thread Abad.Ayyub

Thank you Sean for the response. We will definitely try that. I have one question on the "F84.1" problem: since we have now developed a lot of features based on the output from cTAKES, will the impact of changing the sentenceDetectorAnnotator be huge?

Thanks & Regards

Abad Ayyub
Vnet: 406170 | Cell : +91-9447379028



Re: Building a new custom dictionary or Updating/Adding values to the existing dictionary in cTAKES [EXTERNAL]

2020-09-15 Thread Finan, Sean

Hi Abad,

The first thing that I would try for the "97112" problem is changing the parts of speech that are ignored for lookup. Right now a pure number is ignored because it is not a word. So, similar to what I said in my previous email, change the dictionary lookup parameter exclusionTags. To make sure that you get everything, you can first try no exclusions:
set exclusionTags=""
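
A minimal Python sketch of why "97112" never reaches lookup. It assumes, as described in this thread, that any token whose Penn Treebank tag is in exclusionTags is skipped and that a bare number is tagged CD. The function name and the tags on the sample tokens are hypothetical; this is not the cTAKES code.

```python
# Default exclusion tags, copied from the earlier message in this thread.
DEFAULT_EXCLUSION_TAGS = (
    "VB,VBD,VBG,VBN,VBP,VBZ,CC,CD,DT,EX,IN,LS,MD,PDT,POS,PP,PP$,"
    "PRP,PRP$,RP,TO,WDT,WP,WPS,WRB"
)


def lookup_candidates(tagged_tokens, exclusion_tags):
    """Return token texts that survive POS filtering.

    tagged_tokens: list of (text, penn_tag) pairs.
    exclusion_tags: comma-separated tags, as in the piper parameter;
    an empty string excludes nothing.
    """
    excluded = {tag for tag in exclusion_tags.split(",") if tag}
    return [text for text, tag in tagged_tokens if tag not in excluded]


# Hypothetical tagging of "therapeutic exercise 97112".
tokens = [("therapeutic", "JJ"), ("exercise", "NN"), ("97112", "CD")]

print(lookup_candidates(tokens, DEFAULT_EXCLUSION_TAGS))  # ['therapeutic', 'exercise']
print(lookup_candidates(tokens, ""))                       # ['therapeutic', 'exercise', '97112']
```

With the default list the number is filtered out; with an empty exclusionTags it becomes a lookup candidate.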

My guess with the F84.1 problem is that your sentence splitter is splitting 
"F84.1" but not splitting "F84 . 1".

I think that the best way to start debugging is adding the PrettyTextWriter to the end of the piper and looking at its output (see my previous email). It will print each sentence on a line and indicate the part of speech for each token. If you can quickly and easily see what the system is doing, then you might start to understand what needs to be changed to fit your data.

Sean

RE: Building a new custom dictionary or Updating/Adding values to the existing dictionary in cTAKES [EXTERNAL]

2020-09-15 Thread Abad.Ayyub

Thank you Sean for the detailed response. I think there was a miscommunication on our end about the requirement. Your solution of adding spaces between the tokens worked, but it required the input text to have the spaces as well. If the text came in as 'F84.1', cTAKES did not recognize the term, but if the text came in as 'F84 . 1', cTAKES recognized it using the INSERT script below.

INSERT INTO CUI_TERMS VALUES(4352,0,3, 'F84 . 1','F84')

We ran into a similar issue when we configured an INSERT entry for CPT codes:

INSERT INTO CUI_TERMS VALUES(41154,0,1, '97112','97112')

Here 97112 is a CPT code (CPT codes usually don't contain a '.'). We expected cTAKES to recognize the CPT code '97112' as a separate token, but it didn't. Could you please advise why this happens?

Is there something wrong in our configuration? Do we need something additional for cTAKES to recognize the code alone as a separate token? Alternatively, is there another way to get the corresponding ICD/CPT code for an identified annotation from cTAKES, such as querying the CPT/ICD table using the fetched CUI? Kindly advise.


Thanks & Regards

Abad Ayyub
Vnet: 406170 | Cell : +91-9447379028



Re: Building a new custom dictionary or Updating/Adding values to the existing dictionary in cTAKES [EXTERNAL] [SUSPICIOUS]

2020-09-14 Thread Finan, Sean

I should also mention:

By default ctakes uses Sentences as the "range" in which to find terms. For instance, in the text "There was a lesion in the stomach. Cancer was not diagnosed." ctakes will (most likely) split the text into two sentences. Within the first sentence it could discover "stomach", and in the second sentence it could discover "cancer". However, it will not even try to discover the term "stomach cancer". For the text "F84.1" ctakes may determine there to be two sentences: "F84" and "1".
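
A toy Python illustration of why the lookup window matters, using the "F84.1" case. It assumes the sentence detector puts "F84 ." and "1 pervasive developmental disorders" in separate sentences. The matcher below is a hypothetical contiguous-token matcher, not the cTAKES lookup algorithm, and the windows are plain token lists rather than Sentence or Paragraph annotations.

```python
# Hypothetical dictionary: each term is a tuple of lower-cased tokens.
DICTIONARY = {("f84", ".", "1"), ("pervasive", "developmental", "disorders")}


def find_terms(windows, dictionary, max_len=3):
    """Report every dictionary term whose tokens occur contiguously inside a
    single window. Terms that cross a window boundary are never seen."""
    found = []
    for window in windows:
        tokens = [t.lower() for t in window]
        for start in range(len(tokens)):
            for length in range(1, max_len + 1):
                candidate = tuple(tokens[start:start + length])
                if len(candidate) == length and candidate in dictionary:
                    found.append(" ".join(candidate))
    return found


# Sentence windows, as a "splitter" sentence detector might produce them (assumed).
sentences = [["F84", "."], ["1", "pervasive", "developmental", "disorders"]]
# One paragraph window covering the same tokens.
paragraphs = [sentences[0] + sentences[1]]

print(find_terms(sentences, DICTIONARY))   # ['pervasive developmental disorders']
print(find_terms(paragraphs, DICTIONARY))  # ['f84 . 1', 'pervasive developmental disorders']
```

The code term only becomes visible once a single window spans all three of its tokens, which is the effect the options below aim for.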

There are a couple of ways to "correct" this.
1.  Use the SentenceDetectorAnnotatorBIO instead of SentenceDetector.  The BIO 
version is more of a "lumper" while the other is more of a "splitter".  In the 
piper:
// add SentenceDetector
add SentenceDetectorAnnotatorBIO classifierJarPath=/org/apache/ctakes/core/sentdetect/model.jar

2.  Use paragraphs as the discovery range for the dictionary lookup.  In the 
piper:
add ParagraphAnnotator
set windowAnnotations=org.apache.ctakes.typesystem.type.textspan.Paragraph
// -- lines for cli if you use them
add DefaultJCasTermAnnotator

3.  Create an annotator that joins the possibly erroneous splits. You can use MrsDrSentenceJoiner as an example, but check the last characters of a sentence and the first characters of the next sentence for digits, making sure that there is no whitespace between their offsets.
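
The join test in option 3 can be sketched in plain Python over (begin, end) character offsets. This is a hypothetical illustration of the check only: a real annotator would iterate over Sentence annotations in the JCas, and this sketch does not reproduce MrsDrSentenceJoiner. Ignoring trailing periods on the first sentence is an assumption, made so that a split like "F84." | "1" is caught.

```python
def join_digit_splits(text, sentences):
    """Merge neighbouring sentence spans that split a code such as "F84.1".

    sentences: (begin, end) character offsets in document order.
    Two neighbours are merged when the first (ignoring trailing periods) ends
    with a digit, the second starts with a digit, and no whitespace separates them.
    """
    merged = []
    for begin, end in sentences:
        if merged:
            prev_begin, prev_end = merged[-1]
            gap = text[prev_end:begin]
            prev_text = text[prev_begin:prev_end].rstrip(".")
            no_whitespace = not any(ch.isspace() for ch in gap)
            if (no_whitespace and prev_text and prev_text[-1].isdigit()
                    and text[begin:end][:1].isdigit()):
                merged[-1] = (prev_begin, end)
                continue
        merged.append((begin, end))
    return merged


text = "F84.1 pervasive developmental disorders. Seen today."
# Assumed splitter output: "F84." | "1 pervasive developmental disorders." | "Seen today."
split = [(0, 4), (4, 40), (41, 52)]
for b, e in join_digit_splits(text, split):
    print(repr(text[b:e]))
```

The first two spans touch and straddle a digit, so they are joined; the third is separated by a space and stays a sentence of its own.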

There may be another issue: the part of speech assigned to a non-word such as "F84" may cause it to be ignored as a candidate for lookup. The default exclusion (Penn Treebank) tags are:
"VB,VBD,VBG,VBN,VBP,VBZ,CC,CD,DT,EX,IN,LS,MD,PDT,POS,PP,PP$,PRP,PRP$,RP,TO,WDT,WP,WPS,WRB"
You would want to remove (at least) "CD" and "LS".  In your piper:
set exclusionTags="VB,VBD,VBG,VBN,VBP,VBZ,CC,DT,EX,IN,MD,PDT,POS,PP,PP$,PRP,PRP$,RP,TO,WDT,WP,WPS,WRB"
// -- other parameters for dictionary lookup
add DefaultJCasTermAnnotator
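
Because it is easy to drop the wrong tag when editing that long list by hand, a small Python helper can generate the piper line. The helper name is hypothetical, and the default list is copied from this message rather than read from cTAKES.

```python
# Default list copied from the message above.
DEFAULT_EXCLUSION_TAGS = (
    "VB,VBD,VBG,VBN,VBP,VBZ,CC,CD,DT,EX,IN,LS,MD,PDT,POS,PP,PP$,"
    "PRP,PRP$,RP,TO,WDT,WP,WPS,WRB"
)


def without_tags(tag_string, *tags_to_allow):
    """Return the comma-separated tag list minus the given tags, order preserved."""
    allow = set(tags_to_allow)
    return ",".join(tag for tag in tag_string.split(",") if tag not in allow)


trimmed = without_tags(DEFAULT_EXCLUSION_TAGS, "CD", "LS")
print(f'set exclusionTags="{trimmed}"')
```

The printed line matches the exclusionTags value suggested above; pass further tags (for whatever PrettyTextWriter shows "F84" being tagged as) to drop them too.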

It could be that ctakes is tagging things like "F84" as other parts of speech, 
so you would have to check on that and modify the exclusionTags accordingly.  
You can check by adding at the end of your piper:
add pretty.plaintext.PrettyTextWriterFit
and checking the output files that it creates.

I realize that this seems like a lot to check, but dictionary lookup is not a 
simple beast.

Sean


Re: Building a new custom dictionary or Updating/Adding values to the existing dictionary in cTAKES [EXTERNAL]

2020-09-14 Thread Finan, Sean

Hi Abad,

I think that you need to make only one minor change.

ctakes uses tokens, not the actual text, for identification. Tokenization turns text such as "F84.1" into "F84 . 1": the first token is F84, followed by a token for '.' and another for '1'. In the .script file this is indicated by adding a space between each token. This makes the full entry:


INSERT INTO CUI_TERMS VALUES(4352,0,3, 'F84 . 1','F84')


Notice that the token length is now 3 and the full text contains the 
between-token spaces.  This would carry forward for the other entries, such as:


INSERT INTO CUI_TERMS VALUES(4352,3,4, 'F84 . 1 pdd', 'pdd')
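
A minimal Python sketch for generating such rows, assuming the column order implied by the two examples above (CUI, index of the rare word among the tokens, token count, space-separated text, rare word). The regex tokenizer is a hypothetical stand-in that only approximates the tokenization described here; it is not the cTAKES tokenizer, and the caller chooses the rare word.

```python
import re

# Assumption: keep runs of letters/digits together and make every other
# non-space character its own token. Approximates, but is not, cTAKES.
TOKEN_PATTERN = re.compile(r"[A-Za-z0-9]+|[^\sA-Za-z0-9]")


def cui_terms_insert(cui, text, rare_word):
    """Build a CUI_TERMS row with single spaces between tokens.
    rare_word must be one of the tokens of text."""
    tokens = TOKEN_PATTERN.findall(text)
    rindex = tokens.index(rare_word)  # raises ValueError if it is not a token
    spaced = " ".join(tokens).replace("'", "''")  # escape quotes for SQL
    rare = rare_word.replace("'", "''")
    return f"INSERT INTO CUI_TERMS VALUES({cui},{rindex},{len(tokens)},'{spaced}','{rare}')"


print(cui_terms_insert(4352, "F84.1", "F84"))
print(cui_terms_insert(4352, "F84.1 pdd", "pdd"))
```

Both calls reproduce the index and token count of the hand-written entries above; note that the script file needs straight ASCII quotes, not typographic ones.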


Sean



From: abad.ay...@cognizant.com 
Sent: Monday, September 14, 2020 11:32 AM
To: dev@ctakes.apache.org
Subject: RE: Building a new custom dictionary or Updating/Adding values to the 
existing dictionary in cTAKES [EXTERNAL]



Hi Team,

I hope you are all doing well. With your support, we were able to add our required synonyms to the existing dictionary and confirmed that cTAKES picks them up. Now we also need to configure ICD and CPT, so we followed the steps in the cTAKES wiki and generated the corresponding .script file.

The newly created dictionary, which comprises SNOMEDCT_US, RxNORM, ICD10 and CPT, identifies the descriptions as expected, but we also need to extract the ICD code for each description. For example, given text like the following:

‘F84.1 pervasive developmental disorders’

we need cTAKES to recognize F84.1 as a token, or at least as an attribute of an 'IdentifiedAnnotation'. To achieve this, based on our prior experience, we tweaked the dictionary by adding synonyms for the existing CUI as below:

INSERT INTO CUI_TERMS VALUES(4352,1,4, ‘F84.1 pervasive developmental 
disorders’, ‘pervasive’)
INSERT INTO CUI_TERMS VALUES(4352,1,2, ‘F84.1 pdd’, ‘pdd’)
INSERT INTO CUI_TERMS VALUES(4352,0,1, ‘F84.1’,’F84.1’)

Though we have seen cTAKES can identify ‘F84’ alone as a token but it won’t 
consider whenever a ‘.’ Has been encountered. As an end result cTAKES won’t be 
able to give the ICD codes like F84.1,M25.6 as separate tokens. Since almost 
all of the ICD codes have  a ‘.’ Associated with it, this way of tweaking the 
dictionary is not working. Infact cTAKES is recognizing the digit after decimal 
within the ‘FractionAnnotation’

Does cTAKES have the capability to return the code like ICD code while 
retrieving  the token as an individual token or as an attribute in any of the 
tokens

Is there any other way in which the dictionary can be tweaked , so that a 
synonym addition as below will recognize the ICD code as a token and will be 
returned from cTAKES

INSERT INTO CUI_TERMS VALUES(4352,0,1, ‘F84.1’,’F84.1’)


Kindly check and advise us on how to proceed on this situation

Thanks & Regards
[cid:D3145E69-CD94-48C1-877F-5134EEAFB598]

Abad Ayyub
Vnet: 406170 | Cell : +91-9447379028



From: Remy Sanouillet 
Sent: Tuesday, June 2, 2020 7:23 AM
To: dev@ctakes.apache.org
Subject: Re: Building a new custom dictionary or Updating/Adding values to the 
existing dictionary in cTAKES

[External]
Hi Abad,

•   How can we point cTAKES application to multiple dictionaries. Currently 
only sno_rx_16ab is pointed to the application, how can I tweak it to point 
that to multiple dictionary simultaneously. Or you meant to say create a fresh 
dictionary with all the vocabularies and point just that in cTAKES.

If you go back in the archive a bit, you should find a thread where I went into 
detail on how to add multiple dictionaries. Combining all dictionaries into a 
fresh dictionary is not recommended for obvious reasons. If you can't find the 
thread, I will dig it up.

•   So for these edits I will have to add INSERT queries to respective 
tables in the sno_rx_16ab.script file right? Do I need to make any more changes 
for these tokens to get reflected in cTAKES.

Nope! That is all that is needed and next time you launch cTakes, it should 
recognize your new entries.

•   If it is a non-existing CUI , I can get the respective CUI,TUI from 
here  
https://uts.nlm.nih.gov//metathesaurus.html<https://urldefense.proofpoint.com/v2/url?u=https-3A__apc01.safelinks.protection.outlook.com_-3Furl-3Dhttps-253A-252F-252Futs.nlm.nih.gov-252Fmetathesaurus.html-26data-3D02-257C01-257CAbad.Ayyub-2540cognizant.com-257Cc8b0b69302014cff91ac08d80697c6a7-257Cde08c40719b9427d9fe8edf254300ca7-257C0-257C0-257C637266596246493365-26sdata-3DhNixbxffJ9-252Fx-252Bho9J41gjonaT9IGLsxIqABKq1dpzG8-253D-26reserved-3D0=DwMGaQ=qS4goWBT7poplM69zy_3xhKwEW14JZMSdioCoppxeFU=fs67GvlGZstTpyIisCYNYmQCP6r0bcpKGd4f7d4gTao=YKrpBhJM7sqCBY3Ow1jSUhu5QBdlnoqFGZbsVZIHH8U=Aks7ZCfU7hTRPTyJJdrrdupKbd1n1TpuFdf-10yQtrA=>
  right?

Correct!

RE: Building a new custom dictionary or Updating/Adding values to the existing dictionary in cTAKES

2020-09-14 Thread Abad.Ayyub

Hi Team,

I hope you are all doing well. With your support, we were able to add our 
required synonyms to the existing dictionary and could see them being picked 
up by cTAKES. Now we also need to configure ICD and CPT, so we followed the 
steps in the cTAKES wiki and generated the corresponding .script file.

The newly created dictionary, which comprises SNOMEDCT_US, RxNORM, ICD10 and 
CPT, identifies the descriptions as expected, but we also need to extract the 
ICD code for the respective description. For example, given the text:

'F84.1 pervasive developmental disorders'

We need cTAKES to recognize F84.1 as a token, or at least as an attribute of 
one of the 'IdentifiedAnnotation's. To achieve this, based on our prior 
experience, we tried to tweak the dictionary by adding synonyms for the 
existing CUI as below:

INSERT INTO CUI_TERMS VALUES(4352,1,4,'F84.1 pervasive developmental disorders','pervasive')
INSERT INTO CUI_TERMS VALUES(4352,1,2,'F84.1 pdd','pdd')
INSERT INTO CUI_TERMS VALUES(4352,0,1,'F84.1','F84.1')

Although cTAKES can identify 'F84' alone as a token, it does not match once a 
'.' is encountered. As a result, cTAKES cannot return ICD codes such as F84.1 
or M25.6 as separate tokens. Since almost all ICD codes contain a '.', this 
way of tweaking the dictionary is not working. In fact, cTAKES recognizes the 
digit after the decimal point within a 'FractionAnnotation'.

Does cTAKES have the capability to return a code such as an ICD code, either 
as an individual token or as an attribute of one of the annotations?

Is there any other way to tweak the dictionary so that a synonym addition like 
the one below recognizes the ICD code as a token and returns it from cTAKES?

INSERT INTO CUI_TERMS VALUES(4352,0,1,'F84.1','F84.1')


Kindly check and advise us on how to proceed in this situation.

Thanks & Regards
Abad Ayyub
Vnet: 406170 | Cell : +91-9447379028


From: Remy Sanouillet 
Sent: Tuesday, June 2, 2020 7:23 AM
To: dev@ctakes.apache.org
Subject: Re: Building a new custom dictionary or Updating/Adding values to the 
existing dictionary in cTAKES

[External]
Hi Abad,

•   How can we point the cTAKES application to multiple dictionaries? Currently 
only sno_rx_16ab is configured; how can I point it to multiple dictionaries 
simultaneously? Or did you mean creating a fresh dictionary with all the 
vocabularies and pointing cTAKES to just that one?

If you go back in the archive a bit, you should find a thread where I went into 
detail on how to add multiple dictionaries. Combining all dictionaries into a 
fresh dictionary is not recommended for obvious reasons. If you can't find the 
thread, I will dig it up.

•   So for these edits I will have to add INSERT queries to the respective 
tables in the sno_rx_16ab.script file, right? Do I need to make any other 
changes for these tokens to be reflected in cTAKES?

Nope! That is all that is needed, and the next time you launch cTAKES it 
should recognize your new entries.

•   If it is a non-existing CUI, I can get the respective CUI and TUI from 
here, right?  https://uts.nlm.nih.gov//metathesaurus.html

Correct! Remember that the ontology has multiple inheritance, so you need to 
grab all the TUIs for a given CUI.

•   Based on the source, I will have to add an entry to the respective table, 
right? Like SNOMED, RxNORM or ICD-10, where a CUI belongs to only one of them 
and not to all. Correct me if I am wrong on this understanding.

That is also correct. Most of the time, the dictionaries only contain one 
CODE table, so it is not even a question. However, sno_rx_16ab is an exception, 
with CODE tables for both SNOMEDCT_US and RXNORM. They mostly do not overlap. 
I do remember a couple of exceptions but, where that happens, the 
metathesaurus will show it.
For example: 'Acebutolol' (CUI: C946) has two SNOMEDCT_US codes (372815001 
and 68088000) *and* an RXNORM code of 149.

•   The PREFTERM table will have only one entry for each CUI, right? 
Basically it's a one-to-one mapping between CUI and PREFTERM. Correct me if I 
am wrong on this understanding.

You are correct here also. It is a one-to-one mapping although the system 
appears to tolerate when the PREFTERM is missing.
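Since PREFTERM is expected to be one row per CUI, and every CUI should carry all of its TUIs, a quick sanity check over a hand-edited .script file can catch mistakes before they stop recognition. A minimal sketch, assuming rows are written exactly like the INSERT lines quoted in this thread (no space before the parenthesis, numeric CUI first); `check_script_lines` is a hypothetical helper, not part of cTAKES. A missing PREFTERM is deliberately not flagged, because the system appears to tolerate it.

```python
import re
from collections import Counter

# Assumed row format, matching the INSERT lines quoted in this thread.
PREFTERM_RE = re.compile(r"^INSERT INTO PREFTERM VALUES\((\d+),")
TUI_RE = re.compile(r"^INSERT INTO TUI VALUES\((\d+),")
TERMS_RE = re.compile(r"^INSERT INTO CUI_TERMS VALUES\((\d+),")


def check_script_lines(lines):
    """Return human-readable problems found in the INSERT rows of a .script file."""
    prefterm_counts = Counter()
    cuis_with_tui = set()
    cuis_with_terms = set()
    for raw in lines:
        line = raw.strip()
        m = PREFTERM_RE.match(line)
        if m:
            prefterm_counts[int(m.group(1))] += 1
            continue
        m = TUI_RE.match(line)
        if m:
            cuis_with_tui.add(int(m.group(1)))
            continue
        m = TERMS_RE.match(line)
        if m:
            cuis_with_terms.add(int(m.group(1)))
    problems = []
    # PREFTERM should be one-to-one with CUI.
    for cui, count in sorted(prefterm_counts.items()):
        if count > 1:
            problems.append(f"CUI {cui}: {count} PREFTERM rows (expected one)")
    # Every CUI that has synonyms should have at least one TUI row.
    for cui in sorted(cuis_with_terms - cuis_with_tui):
        problems.append(f"CUI {cui}: CUI_TERMS rows but no TUI row")
    return problems
```

Usage would be along the lines of `with open("sno_rx_16ab.script", encoding="utf-8") as f: print(check_script_lines(f))`, run against a copy of the file.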

Rémy Sanouillet
NLP Engineer
re...@foreseemed.com


ForeSee Medical, Inc.
12555 High Bluff Drive, Suite 100
San Diego, CA 92130

NOTICE: Thi

Re: Building a new custom dictionary or Updating/Adding values to the existing dictionary in cTAKES

2020-06-01 Thread Remy Sanouillet

RE: Building a new custom dictionary or Updating/Adding values to the existing dictionary in cTAKES

2020-06-01 Thread Abad.Ayyub

Thank you Remy and Peter for your responses. I hope you are doing well and 
staying safe during this lockdown period. Could you please help me with my 
queries below on creating an additional dictionary?


·   How do we create an additional dictionary? Do you mean using the UMLS 
tool, so that we create .script files from the .RRF files?

·   How can we point the cTAKES application to multiple dictionaries? Currently 
only sno_rx_16ab is configured; how can I point it to multiple dictionaries 
simultaneously? Or did you mean creating a fresh dictionary with all the 
vocabularies and pointing cTAKES to just that one?

I believe Remy was explaining how to edit the existing dictionary, which 
involves two scenarios: one with an existing CUI and the other with a 
non-existing CUI. Could you please resolve the queries below regarding that?


·   So for these edits I will have to add INSERT queries to the respective 
tables in the sno_rx_16ab.script file, right? Do I need to make any other 
changes for these tokens to be reflected in cTAKES?

·   If it is a non-existing CUI, I can get the respective CUI and TUI from 
here, right?  https://uts.nlm.nih.gov//metathesaurus.html

·   Based on the source, I will have to add an entry to the respective table, 
right? Like SNOMED, RxNORM or ICD-10, where a CUI belongs to only one of them 
and not to all. Correct me if I am wrong on this understanding.

·   The PREFTERM table will have only one entry for each CUI, right? 
Basically it's a one-to-one mapping between CUI and PREFTERM. Correct me if I 
am wrong on this understanding.


Thanks & Regards
Abad Ayyub
Vnet: 406170 | Cell : +91-9447379028


Re: Building a new custom dictionary or Updating/Adding values to the existing dictionary in cTAKES

2020-05-29 Thread Peter Abramowitsch

I'm using the UMLS fast dictionary out of the box and mammography certainly
appears:

{
  "_type": "UmlsConcept",
  "codingScheme": "SNOMEDCT_US",
  "code": "71651007",
  "score": 0.0,
  "disambiguated": false,
  "cui": "C0024671",
  "tui": "T060",
  "preferredText": "Mammography"
},

The problem with pap smear is not that a concept isn't found, but that PAP
is also an acronym for something else, prostatic acid phosphatase:
{
  "_type": "UmlsConcept",
  "codingScheme": "SNOMEDCT_US",
  "code": "59518007",
  "score": 0.0,
  "disambiguated": false,
  "cui": "C0523444",
  "tui": "T059",
  "preferredText": "Prostatic acid phosphatase measurement"
}

Oddly enough, I can't get it to recognize any of its forms except for
"cervical smear test".


Re: Building a new custom dictionary or Updating/Adding values to the existing dictionary in cTAKES

2020-05-29 Thread Remy Sanouillet

Hello Abad,

The short answer is, yes, the sno_rx_16ab can be "hacked". A couple of
caveats are that any mistake can stop all recognition and you will lose all
your mods on updates. So an additional dictionary is a recommended approach.

There are two cases. Either the CUI you are adding already exists and you
are just adding a synonym. In that case, you only need to add one line:

> INSERT INTO CUI_TERMS VALUES(CUI,RINDEX,TCOUNT,TEXT,RWORD)

where:

   - CUI is the CUI, 'nuff said
   - TEXT is the tokenized lowercase string for the entry. In your case
   'pap smear'. Most punctuation is a separate token. Single quotes are
   escaped by doubling them
   - RWORD is the one token in TEXT that is the most indicative (least
   common) which will be used as the index in the lookup. In your case
   probably 'pap' since it is not as common as 'smear'
   - RINDEX is the index of RWORD in TEXT. First token is 0 which is the
   case for 'pap'
   - TCOUNT is the token count for TEXT. In your case, 2

So you would want to add:

> INSERT INTO CUI_TERMS VALUES(200845,0,2,'pap smear','pap')
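A minimal sketch of how the five values fit together for a punctuation-free term. RWORD is picked as the least common token according to a word-frequency table you supply; the frequencies in the example are made up for illustration (not cTAKES data), and `build_cui_terms_row` is a hypothetical helper.

```python
def build_cui_terms_row(cui, text, freq):
    """Return an INSERT for CUI_TERMS(CUI, RINDEX, TCOUNT, TEXT, RWORD).

    Assumes a whitespace split is enough (no punctuation in the term);
    freq maps lowercase tokens to how common they are.
    """
    tokens = text.lower().split()
    # RWORD: the least common token; tokens absent from freq count as rarest (0).
    rword = min(tokens, key=lambda t: freq.get(t, 0))
    rindex = tokens.index(rword)  # 0-based position of RWORD in TEXT
    tcount = len(tokens)
    term_sql = " ".join(tokens).replace("'", "''")  # escape quotes by doubling
    rword_sql = rword.replace("'", "''")
    return (f"INSERT INTO CUI_TERMS VALUES({cui},{rindex},{tcount},"
            f"'{term_sql}','{rword_sql}')")


# Made-up frequencies: 'pap' is rarer than 'smear', so it becomes RWORD.
print(build_cui_terms_row(200845, "Pap Smear", {"pap": 120, "smear": 4800}))
# INSERT INTO CUI_TERMS VALUES(200845,0,2,'pap smear','pap')
```

Terms containing punctuation need the punctuation split into separate tokens first, as described for most punctuation above.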

If the CUI does not exist yet, you will need to add a few more lines.
Their positions are unimportant as long as they are below the header lines
(below the final "SET SCHEMA PUBLIC" line).

   1. INSERT INTO TUI VALUES(CUI,TUI)
   One line for each TUI in the taxonomy
   2. INSERT INTO SNOMEDCT_US VALUES(CUI,SNOMED)
   assuming you are adding a SNOMED
   3. INSERT INTO PREFTERM VALUES(CUI,PREFTERM)
   where PREFTERM is the pretty string to describe the entry. It need not
   correspond to any indexed entry. It is used for display once the lookup has
   been successful.

That's it. Use at your own discretion. No guarantees.
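The extra rows for a brand-new concept can be sketched as below. This assumes, based on the numeric CUIs used in this thread (200845, 4352, C946), that the .script stores a CUI as its bare number, and, as a further unchecked assumption, that it stores a TUI as its number too (T060 becomes 60); check the CREATE TABLE lines in the header of your .script before relying on either. The RXNORM table only applies to dictionaries such as sno_rx_16ab that have one. The sample values come from Peter's Mammography output and are used purely as illustration, since that concept already exists; all function names are hypothetical. These rows go alongside the CUI_TERMS synonym row(s) described above.

```python
def cui_to_number(cui):
    """'C0024671' -> 24671 (assumes CUIs are stored without the 'C')."""
    if not (cui.startswith("C") and cui[1:].isdigit()):
        raise ValueError(f"unexpected CUI format: {cui!r}")
    return int(cui[1:])


def tui_to_number(tui):
    """'T060' -> 60 (assumption: verify against your TUI table definition)."""
    if not (tui.startswith("T") and tui[1:].isdigit()):
        raise ValueError(f"unexpected TUI format: {tui!r}")
    return int(tui[1:])


def sql_string(text):
    """Quote a SQL string literal, doubling any single quotes."""
    return "'" + text.replace("'", "''") + "'"


def new_concept_rows(cui, tuis, snomed_codes, prefterm, rxnorm_codes=()):
    """Rows for a new CUI: one TUI row per TUI, code rows, one PREFTERM row."""
    n = cui_to_number(cui)
    rows = [f"INSERT INTO TUI VALUES({n},{tui_to_number(t)})" for t in tuis]
    rows += [f"INSERT INTO SNOMEDCT_US VALUES({n},{c})" for c in snomed_codes]
    rows += [f"INSERT INTO RXNORM VALUES({n},{c})" for c in rxnorm_codes]
    rows.append(f"INSERT INTO PREFTERM VALUES({n},{sql_string(prefterm)})")
    return rows


for row in new_concept_rows("C0024671", ["T060"], [71651007], "Mammography"):
    print(row)
# INSERT INTO TUI VALUES(24671,60)
# INSERT INTO SNOMEDCT_US VALUES(24671,71651007)
# INSERT INTO PREFTERM VALUES(24671,'Mammography')
```

Passing every TUI of the concept in `tuis` produces one TUI row each, which matters because a CUI can have several TUIs.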

*Rémy Sanouillet*
NLP Engineer
re...@foreseemed.com 

ForeSee Medical, Inc.
12555 High Bluff Drive, Suite 100
San Diego, CA 92130

On Fri, May 29, 2020 at 7:34 AM  wrote:

> Hi Team,
>
> We set up cTAKES 4.0.0 as our NLP engine for our profile recently. We have
> faced situations where some of the expected tokens are not picked up by
> cTAKES during clinical text extraction, so our first thought was to
> identify where the dictionary is configured and how it can be updated.
> After some code analysis, we found that the dictionary is configured in
> the path below under ctakes/resources for the sources RxNorm and SNOMEDCT_US.
>
> We were able to open the hsqldb using the HSQLDB GUI and found that
> some of our required entries are already there. Coming specifically
> to our current problem: Pap Smear and Mammogram are two clinical terms
> which are not currently recognized by cTAKES in our profile.
>
> ·   If I look into the .script file, Pap Smear and
> Mammogram/Mammography are already present in the .script file and in the
> respective tables. Please find a snapshot below.
>
> But this still was not recognised by cTAKES. I see there are some filters
> working on top of the available entries in the dictionary (ctakes-gui and
> ctakes-gui-res). Could these filters be the reason the tokens are not
> recognized as expected? Could you please tell us what exactly these filters
> do? This will also help us in future when we try to add new terms
> to the dictionary.
>
> ·   What are the steps to add/edit entries in the existing dictionaries?
> I see we can add/edit the existing values in the .script files, but our
> primary doubt is this: if I have a term 'xyz' to be added to the
> dictionary, how can I get the CUI and other values like TUI, RINDEX,
> TCOUNT and PREFTERM? Is it fine to give any random value for the
> TUI/CUI/RINDEX/TCOUNT? I could also see options to create custom bsv
> dictionaries but couldn't find much documentation for them. Kindly advise
> which is the better option of the 3 below.
>
> o   Generate a custom dictionary using the MetamorphoSys UMLS installation
> tool (where we provide the sources ICD10, RxNORM, SNOMEDCT_US) and leverage
> the full set of .rrf files in the meta folder. Is this approach better if
> the number of entries to be populated is large?
>
> o   Add/edit the available dictionary sno_rx_16ab, and in that case how do we
> provide valid values for each column like CUI, TUI, RINDEX, TCOUNT and
> PREFTERM? If the entries to be populated are

RE: Building a dictionary from ontologies [EXTERNAL]

2018-01-24 Thread Finan, Sean

I think that there is an example lookup xml file in the 
dictionary/fast/examples/ directory.

Basically you want to copy one of those and just point it to your bsv file.
Then in your pipeline you want to specify the "LookupXml" to point to that xml 
file.  You can do this with -l if you are running the default pipeline 
command-line script.  Or, if you run the Piper GUI, you can point to it there.  
That is probably the easiest option for a new user, and it also allows you to 
save your setup.

Sean

-----Original Message-----
From: Erick Velazquez [mailto:erick.lero...@gmail.com] 
Sent: Wednesday, January 24, 2018 11:23 AM
To: dev@ctakes.apache.org
Subject: Re: Building a dictionary from ontologies [EXTERNAL]

Hi Sean, 
Thanks for your help. I now have my BSV file, but I can't find the 
documentation that explains how to include it in the cTAKES analysis. Is 
there any document that can help me?
Kind regards, 

Erick 

> On Jan 23, 2018, at 5:21 PM, Finan, Sean <sean.fi...@childrens.harvard.edu> 
> wrote:
> 
> Hi Erick,
> 
> Each synonym gets a single line in the bsv file.  So:
> 
> OWL00770 | T000 | Right parietal lobe
> OWL00770 | T000 | parietal lobe, right
> OWL00770 | T000 | parietal lobe on the right
> 
> If you don't have a tui then you can simplify the lines by leaving the second 
> column empty:
> 
> OWL00770 | | Right parietal lobe
> OWL00770 | | parietal lobe, right
> OWL00770 | | parietal lobe on the right
> 
> 
> Sean
> 
> -----Original Message-----
> From: Erick Velazquez [mailto:erick.lero...@gmail.com] 
> Sent: Tuesday, January 23, 2018 5:15 PM
> To: dev@ctakes.apache.org
> Subject: Re: Building a dictionary from ontologies [EXTERNAL]
> 
> Hi Sean,
> 
> Thank you for your answer!
> 
> I would like to show you my results. As an example, I got this:
> 
> 
> 
> OWL00770 | T000 | Right parietal lobe
> 
> 
> 
> The third column is the text.
> 
> You suggested that I use the uri, but what I get from the ontology is only a 
> web link, so I don't use the preferred text option.
> 
> When you say that text should also contain synonyms, what do you mean? Does 
> it mean that every token in the text column is considered a synonym? Then in 
> my example, would "right" be interpreted as a synonym of "parietal" and "lobe"?
> 
> 
> 
> Kind regards,
> 
> 
> 
> Erick Velazquez 
> 
> 
>> On Jan 22, 2018, at 2:47 PM, Finan, Sean <sean.fi...@childrens.harvard.edu> 
>> wrote:
>> 
>> Hi Erick,
>> 
>> There is a fourth option that should work:
>> 
>> cui | tui | text | preferredText
>> 
>> I would create an importer that creates a -fake- cui.  The cui need not (in 
>> this case should not) start with 'C'.  So, I would import each owl uri with 
>> an id like OWL1.  
>> 
>> tui can be empty, in which case "T000" will be used, forcing ctakes to 
>> create annotations of unknown semantic type.  
>> 
>> text(s) should contain your synonym(s).
>> 
>> preferredText can be your owl uri.
>> 
>> This should allow you to fake it with an imported owl.  When reading the 
>> cas, you will want to look at the preferredText for each annotation 
>> and ignore the cui and tui.
>> 
>> Sean 
>> 
>> 
>> From: Erick Velazquez <erick.lero...@gmail.com>
>> Sent: Monday, January 22, 2018 11:14 AM
>> To: dev@ctakes.apache.org
>> Subject: Building a dictionary from ontologies  [EXTERNAL]
>> 
>> Hello,
>> 
>> I'm building a dictionary from an ontology (OWL), but the information 
>> contains neither a CUI nor a TUI. Since the format of a dictionary in cTAKES 
>> is CUI | TUI | TEXT, or CUI | TEXT, is there any specification for creating 
>> CUIs for terms?
>> Thanks,
>> 
>> Erick
>
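Sean's Jan 22 suggestion can be sketched as a tiny BSV writer: a made-up CUI that does not start with 'C' for each OWL class, an empty TUI column (so T000 is used), one line per synonym, and optionally the OWL URI as the fourth (preferredText) column. The input dict, the `OWL` prefix with five-digit padding, and the example URI are assumptions for illustration; extracting class labels from the .owl file itself is not shown, and `owl_to_bsv_lines` is a hypothetical name.

```python
def owl_to_bsv_lines(classes, prefix="OWL", width=5, include_uri=True):
    """classes maps an OWL class URI to its list of synonyms.

    Returns 'cui | tui | text' lines, plus ' | preferredText' when include_uri.
    """
    if prefix.upper().startswith("C"):
        raise ValueError("fake CUIs should not start with 'C'")
    lines = []
    # Sort by URI so the fake CUIs are stable from run to run.
    for i, (uri, synonyms) in enumerate(sorted(classes.items()), start=1):
        fake_cui = f"{prefix}{i:0{width}d}"  # e.g. OWL00001
        for text in synonyms:
            if "|" in text:
                raise ValueError(f"'|' would break the bsv columns: {text!r}")
            # Empty tui column: per the thread, cTAKES then uses T000.
            line = f"{fake_cui} | | {text}"
            if include_uri:
                line += f" | {uri}"
            lines.append(line)
    return lines


example = {"http://example.org/onto#RightParietalLobe":
           ["Right parietal lobe", "parietal lobe, right", "parietal lobe on the right"]}
print("\n".join(owl_to_bsv_lines(example, include_uri=False)))
# OWL00001 | | Right parietal lobe
# OWL00001 | | parietal lobe, right
# OWL00001 | | parietal lobe on the right
```

The resulting lines can be saved as a .bsv file and referenced from a copy of the example lookup xml, as Sean describes at the top of his Jan 24 reply.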

Re: Building a dictionary from ontologies [EXTERNAL]

2018-01-24 Thread Erick Velazquez

Hi Sean, 
Thanks for your help. I now have my BSV file, but I can't find the 
documentation that explains how to include it in the cTAKES analysis. Is 
there any document that can help me?
Kind regards, 

Erick 

> On Jan 23, 2018, at 5:21 PM, Finan, Sean <sean.fi...@childrens.harvard.edu> 
> wrote:
> 
> Hi Erick,
> 
> Each synonym gets a single line in the bsv file.  So:
> 
> OWL00770 | TOOO | Right parietal lobe
> OWL00770 | TOOO | parietal lobe, right
> OWL00770 | TOOO | parietal lobe on the right
> 
> If you don't have a tui then you can simplify the lines by leaving the second 
> column empty:
> 
> OWL00770 | | Right parietal lobe
> OWL00770 | | parietal lobe, right
> OWL00770 | | parietal lobe on the right
> 
> 
> Sean
> 

RE: Building a dictionary from ontologies [EXTERNAL]

2018-01-23 Thread Finan, Sean

Hi Erick,

Each synonym gets a single line in the bsv file.  So:

OWL00770 | TOOO | Right parietal lobe
OWL00770 | TOOO | parietal lobe, right
OWL00770 | TOOO | parietal lobe on the right

If you don't have a tui then you can simplify the lines by leaving the second 
column empty:

OWL00770 | | Right parietal lobe
OWL00770 | | parietal lobe, right
OWL00770 | | parietal lobe on the right
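
To make the one-phrase-per-line rule concrete, here is a hypothetical reader (a sketch, not cTAKES' own parser) that groups lines like the ones above by cui. It assumes columns are split on '|' and trimmed, treats a two-column line as cui | text, and fills an empty tui with "T000" as Sean's January 22 reply describes. The point it illustrates: "Right parietal lobe" is one whole synonym, and its tokens are not synonyms of one another.

```python
from typing import Dict, Iterable, List, Tuple

DEFAULT_TUI = "T000"  # used when the tui column is empty, per Sean's reply


def group_synonyms(lines: Iterable[str]) -> Dict[str, Tuple[str, List[str]]]:
    """Map each cui to (tui, synonyms) from bar-separated dictionary lines.

    A line's text column is one complete synonym; it is never split into
    tokens. Accepts "cui | text" and "cui | tui | text [| preferredText]".
    """
    concepts: Dict[str, Tuple[str, List[str]]] = {}
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        fields = [field.strip() for field in line.split("|")]
        if len(fields) == 2:
            cui, tui, text = fields[0], "", fields[1]
        elif len(fields) >= 3:
            cui, tui, text = fields[0], fields[1], fields[2]
        else:
            raise ValueError(f"not a bar-separated dictionary line: {raw!r}")
        # In this sketch, the first line seen for a cui decides its tui.
        _, synonyms = concepts.setdefault(cui, (tui or DEFAULT_TUI, []))
        synonyms.append(text)
    return concepts


if __name__ == "__main__":
    example = [
        "OWL00770 | | Right parietal lobe",
        "OWL00770 | | parietal lobe, right",
        "OWL00770 | | parietal lobe on the right",
    ]
    print(group_synonyms(example))
```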


Sean

-Original Message-
From: Erick Velazquez [mailto:erick.lero...@gmail.com] 
Sent: Tuesday, January 23, 2018 5:15 PM
To: dev@ctakes.apache.org
Subject: Re: Building a dictionary from ontologies [EXTERNAL]

Hi Sean,

Thank you for your answer!

I would like to show you my results. As an example, I got this:

 

OWL00770 | TOOO | Right parietal lobe

 

The third column is the text.

You suggested using the uri, but what I get from the ontology is only a 
web link, so I am not using the preferred text option.

When you say that text should also contain synonyms, what do you mean? Does 
that mean every token in the text column is considered a synonym? Then in 
my example, would "right" be interpreted as a synonym of "parietal" and "lobe"?

 

Kind regards,

 

Erick Velazquez 



Re: Building a dictionary from ontologies [EXTERNAL]

2018-01-23 Thread Erick Velazquez

Hi Sean,

Thank you for your answer!

I would like to show you my results. As an example, I got this:

 

OWL00770 | TOOO | Right parietal lobe

 

The third column is the text.

You suggested using the uri, but what I get from the ontology is only a 
web link, so I am not using the preferred text option.

When you say that text should also contain synonyms, what do you mean? Does 
that mean every token in the text column is considered a synonym? Then in 
my example, would "right" be interpreted as a synonym of "parietal" and "lobe"?

 

Kind regards,

 

Erick Velazquez 


> On Jan 22, 2018, at 2:47 PM, Finan, Sean  
> wrote:
> 
> Hi Erick,
> 
> There is a fourth option that should work
> 
> cui | tui | text | preferredText
> 
> I would create an importer that creates a -fake- cui.  The cui need not (in 
> this case should not) start with 'C'.  So, I would import per-owl uri using 
> something like OWL1.  
> 
> tui can be empty, in which case "T000" will be used, forcing ctakes to 
> create annotations of unknown semantic type.  
> 
> text(s) should contain your synonym(s).
> 
> preferredText can be your owl uri.
> 
> This should allow you to fake it with an imported owl.  Upon deconstruction 
> of the cas you will want to look at the preferredTerm for each annotation and 
> ignore the cui and tui.
> 
> Sean 
> 

Re: Building a dictionary from ontologies [EXTERNAL]

2018-01-22 Thread Finan, Sean

Hi Erick,

There is a fourth option that should work

cui | tui | text | preferredText

I would create an importer that creates a -fake- cui.  The cui need not (in 
this case should not) start with 'C'.  So, I would import per-owl uri using 
something like OWL1.  
 
tui can be empty, in which case "T000" will be used, forcing ctakes to create 
annotations of unknown semantic type.  

text(s) should contain your synonym(s).

preferredText can be your owl uri.

This should allow you to fake it with an imported owl.  Upon deconstruction of 
the cas you will want to look at the preferredTerm for each annotation and 
ignore the cui and tui.
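
A minimal sketch of the importer described above, assuming the ontology has already been parsed into a mapping from each OWL URI to its label(s). The function name `owl_to_bsv_rows`, the zero-padded `OWL00001` numbering and the example URI are illustrative assumptions, not part of cTAKES. Each URI gets one fake cui that does not start with 'C', the tui column is left empty (so "T000" should be used), each synonym gets its own line, and the URI is kept as the preferredText. Lines are written without padding around the bars; the hand-written examples in this thread add spaces for readability.

```python
from typing import Dict, Iterable, List


def owl_to_bsv_rows(labels_by_uri: Dict[str, Iterable[str]],
                    prefix: str = "OWL", width: int = 5) -> List[str]:
    """Return cui|tui|text|preferredText lines, one line per synonym.

    Every OWL URI gets one fake cui (prefix + zero-padded counter), so no cui
    starts with 'C'. The tui column is left empty and the URI becomes the
    preferredText.
    """
    rows: List[str] = []
    for number, (uri, labels) in enumerate(labels_by_uri.items(), start=1):
        fake_cui = f"{prefix}{number:0{width}d}"
        seen = set()
        for label in labels:
            # '|' is the column separator, so it cannot appear inside a label;
            # runs of whitespace are collapsed to single spaces.
            text = " ".join(label.replace("|", " ").split())
            key = text.lower()
            if not text or key in seen:
                continue  # skip empty labels and case-insensitive duplicates
            seen.add(key)
            rows.append("|".join([fake_cui, "", text, uri]))
    return rows


if __name__ == "__main__":
    # Hypothetical ontology content: one URI with three labels.
    labels = {
        "http://example.org/onto#RightParietalLobe": [
            "Right parietal lobe",
            "parietal lobe, right",
            "parietal lobe on the right",
        ],
    }
    for row in owl_to_bsv_rows(labels):
        print(row)
```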

Sean 


From: Erick Velazquez 
Sent: Monday, January 22, 2018 11:14 AM
To: dev@ctakes.apache.org
Subject: Building a dictionary from ontologies  [EXTERNAL]

Hello,

I’m building a dictionary from an ontology (OWL), but there is neither a CUI 
nor a TUI in the information. Since the format of a dictionary in cTAKES is 
CUI | TUI | TEXT, or CUI | TEXT, is there any specification for creating CUIs 
for terms?
Thanks,

Erick

Re: building cTAKES (discussion transferred from CTAKES-445 [EXTERNAL]

2017-10-06 Thread Hadrian Zbarcea

Personally, I would prefer Sean's idea and I think it'd be easy to 
implement using org.junit.Assume [1]. The drawback is that one *must* 
remember to set up the UMLS credentials to run the tests during a 
release, whereas the maven profiles enforce that. But either is fine.


My $0.02,
Hadrian

[1] http://junit.org/junit4/javadoc/4.12/org/junit/Assume.html

On 10/06/2017 02:47 PM, James Masanz wrote:

Alex, I like the idea of "2 profiles (in pom.xml), one *with* UMLS account
and one *without*".  However, I would make only the one without the
credentials part of the Jenkins job, and someone would manually run the
other one as part of the release process (or whenever someone felt it was
warranted).



Re: building cTAKES (discussion transferred from CTAKES-445 [EXTERNAL]

2017-10-06 Thread James Masanz

Alex,
I forgot to add: thanks for all your work on these changes and improvements.

regarding your email Sean,
I think a small non-umls custom dictionary for a pipeline test would be
great.
And test(s) with combined hsql and bsv would also be great. I don't know that
they necessarily need to be outside a full pipeline, though that would be a
good start.
I also like the idea of using something like $JENKINS_HOME and/or
$BUILD_ID, if it can be done, to decide whether to run the tests that need UMLS
credentials, if someone wants to look into it or knows how offhand.


On Fri, Oct 6, 2017 at 11:23 AM, Finan, Sean <
sean.fi...@childrens.harvard.edu> wrote:

> Hi Alex,
>
> I think that it goes against the umls license to have credentials
> available to the public.  That might be what you were saying  in a previous
> email:
> > Exporting the ctakes_umlsuser, ctakes_umlspw makes the whole build
> > succeed. But having some credentials in the Jenkins job (official)
> > doesn't make much sense.
>
> This might be a dumb question, but is it possible to disable a single test
> in Jenkins depending upon the run environment?  Can something like
> $JENKINS_HOME  and/or $BUILD_ID be used?  If they are in the environment
> then it should be easy to check in a unit test and log a warning instead of
> running the test.
>
> One thing that we can do is use a small non-umls custom dictionary for a
> pipeline test.
>
> One thing that has long been on my plate is smaller hsql, bsv and combined
> component tests.  They should be tests outside a full pipeline; just simple
> segment, sentence, pos and dictionary,  and a created cas.
>
> What do you think?  Anybody else?
>
> Sean

Re: building cTAKES (discussion transferred from CTAKES-445 [EXTERNAL]

2017-10-06 Thread James Masanz

Alex, I like the idea of "2 profiles (in pom.xml), one *with* UMLS account
and one *without*".  However, I would make only the one without the
credentials part of the Jenkins job, and someone would manually run the
other one as part of the release process (or whenever someone felt it was
warranted).


On Fri, Oct 6, 2017 at 11:23 AM, Finan, Sean <
sean.fi...@childrens.harvard.edu> wrote:

> Hi Alex,
>
> I think that it goes against the umls license to have credentials
> available to the public.  That might be what you were saying  in a previous
> email:
> > Exporting the ctakes_umlsuser, ctakes_umlspw makes the whole build
> > succeed. But having some credentials in the Jenkins job (official)
> > doesn't make much sense.
>
> This might be a dumb question, but is it possible to disable a single test
> in Jenkins depending upon the run environment?  Can something like
> $JENKINS_HOME  and/or $BUILD_ID be used?  If they are in the environment
> then it should be easy to check in a unit test and log a warning instead of
> running the test.
>
> One thing that we can do is use a small non-umls custom dictionary for a
> pipeline test.
>
> One thing that has long been on my plate is smaller hsql, bsv and combined
> component tests.  They should be tests outside a full pipeline; just simple
> segment, sentence, pos and dictionary,  and a created cas.
>
> What do you think?  Anybody else?
>
> Sean

RE: building cTAKES (discussion transferred from CTAKES-445 [EXTERNAL]

2017-10-06 Thread Finan, Sean

Hi Alex,

I think that it goes against the umls license to have credentials available to 
the public.  That might be what you were saying in a previous email:
> Exporting the ctakes_umlsuser, ctakes_umlspw makes the whole build 
> succeed. But having some credentials in the Jenkins job (official)
> doesn't make much sense.

This might be a dumb question, but is it possible to disable a single test in 
Jenkins depending upon the run environment?  Can something like $JENKINS_HOME  
and/or $BUILD_ID be used?  If they are in the environment then it should be 
easy to check in a unit test and log a warning instead of running the test.
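
The environment check asked about above could look roughly like this. It is written in Python purely to show the decision logic; the variable names $JENKINS_HOME, $BUILD_ID, ctakes_umlsuser, ctakes_umlspw and the CHANGE_ME placeholder come from this thread, while the function `umls_test_skip_reason` and the extra "credentials missing" rule are assumptions. In cTAKES' JUnit 4 tests, the equivalent boolean would be passed to `org.junit.Assume.assumeTrue(...)`, which reports the test as skipped rather than failed (Hadrian's suggestion in this thread).

```python
import os
from typing import Mapping, Optional

CI_MARKERS = ("JENKINS_HOME", "BUILD_ID")              # named in this thread
CREDENTIAL_VARS = ("ctakes_umlsuser", "ctakes_umlspw")  # exported for the build
PLACEHOLDER = "CHANGE_ME"  # the value UmlsUserApprover rejects


def umls_test_skip_reason(env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return why a UMLS-dependent test should be skipped, or None to run it."""
    env = os.environ if env is None else env
    for marker in CI_MARKERS:
        if env.get(marker):
            return f"{marker} is set, assuming a Jenkins run; skipping UMLS test"
    for var in CREDENTIAL_VARS:
        value = env.get(var, "")
        if not value or value == PLACEHOLDER:
            return f"{var} does not hold a real UMLS credential"
    return None


if __name__ == "__main__":
    reason = umls_test_skip_reason()
    # Log a warning instead of running the test, as suggested above.
    print(f"WARNING: {reason}" if reason else "running UMLS-dependent test")
```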

One thing that we can do is use a small non-umls custom dictionary for a 
pipeline test.  

One thing that has long been on my plate is writing smaller hsql, bsv and combined 
component tests.  They should be tests outside a full pipeline: just simple 
segment, sentence, pos and dictionary, and a created cas.  

What do you think?  Anybody else?

Sean

-Original Message-
From: Alexandru Zbarcea [mailto:al...@apache.org] 
Sent: Friday, October 06, 2017 10:41 AM
To: Apache cTAKES Dev
Subject: Re: building cTAKES (discussion transferred from CTAKES-445 [EXTERNAL]

I started to look for ways to make the build stable. After applying the patch for 
CTAKES-334, the only remaining issue is:

testCPE(org.apache.ctakes.regression.test.RegressionPipelineTest):
Initialization of CAS Processor with name "RegressionPipelineAggregateTest"
failed.

which is caused by:

ERROR UmlsUserApprover -   User CHANGE_ME not allowed.  It is a placeholder
reminder.

The first thought was to implement 2 profiles (in pom.xml), one *with* UMLS 
account and one *without*. A successful release would have to pass a test 
execution for both profiles, though. That means the official Jenkins job would 
need a reference to the UMLS credentials.

What do you think?

Alex


Re: building cTAKES (discussion transferred from CTAKES-445

2017-10-06 Thread Alexandru Zbarcea

I started to look for ways to make the build stable. After applying the patch
for CTAKES-334, the only remaining issue is:

testCPE(org.apache.ctakes.regression.test.RegressionPipelineTest):
Initialization of CAS Processor with name "RegressionPipelineAggregateTest"
failed.

which is caused by:

ERROR UmlsUserApprover -   User CHANGE_ME not allowed.  It is a placeholder
reminder.

The first thought was to implement 2 profiles (in pom.xml), one *with* UMLS
account and one *without*. A successful release would have to pass a test
execution for both profiles, though. That means the official Jenkins job
would need a reference to the UMLS credentials.

What do you think?

Alex

On Thu, Oct 5, 2017 at 1:47 PM, Alexandru Zbarcea  wrote:

> Hi James,
>
> I have been working on stabilizing the build for 4.0.0, and I discovered
> the following:
> * CTAKES-445 (committed)
> * CTAKES-334 (patch provided - NOT committed by the community, but ready,
> tested)
> * UMLS credential UTest (work-in-progress)
>
> Exporting the ctakes_umlsuser, ctakes_umlspw makes the whole build
> succeed. But having some credentials in the Jenkins job (official) doesn't
> make much sense.
>
> With all these patches, cTAKES would have a stable build, bringing it closer
> to being releasable from the official Apache repository.
>
> I also started cleaning up some WARNING(s): see CTAKES-463, CTAKES-465,
> issues that would improve the quality of the binaries.
>
> I look forward to your feedback,
> Alex

Re: building cTAKES (discussion transferred from CTAKES-445

2017-10-05 Thread Alexandru Zbarcea

Hi James,

I have been working on stabilizing the build for 4.0.0, and I discovered
the following:
* CTAKES-445 (committed)
* CTAKES-334 (patch provided - NOT committed by the community, but ready,
tested)
* UMLS credential UTest (work-in-progress)

Exporting the ctakes_umlsuser, ctakes_umlspw makes the whole build succeed.
But having some credentials in the Jenkins job (official) doesn't make much
sense.

With all these patches, cTAKES would have a stable build, bringing it closer
to being releasable from the official Apache repository.

I also started cleaning up some WARNING(s): see CTAKES-463, CTAKES-465,
issues that would improve the quality of the binaries.

I look forward to your feedback,
Alex





On Tue, Oct 3, 2017 at 5:56 PM, James Masanz  wrote:

> A question was asked within JIRA issue CTAKES-445
> <https://issues.apache.org/jira/browse/CTAKES-445> about building cTAKES
> that is more general than the topic of CTAKES-445, so I'm transferring that
> to this mailing list. It started with the following question:
>
> how someone is able to provide complete Apache cTakes 4.0 binaries @
> http://archive.apache.org/dist/ctakes/ctakes-4.0.0/apache-ctakes-4.0.0-bin.tar.gz
> while we struggle to build it from official Apache repository because of issues
> like this one [CTAKES-445]
>
>
> If you are trying to build a binary of cTAKES, I suggest you follow the
> instructions from the cTAKES 4.0 Developer Install Guide
> <https://cwiki.apache.org/confluence/display/CTAKES/cTAKES+4.0+Developer+Install+Guide>
> to get a copy of cTAKES from trunk, but when checking out the source, be sure
> to specify the revision you are interested in. By checking out from trunk,
> you will get pom files that have a SNAPSHOT version.
>
> Then use the command line version of maven to do something like the
> following:
> mvn clean install -DskipTests=true
> You should find that the binaries have been built somewhere under
> ctakes-distribution.
>
> -- James
>
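
After `mvn clean install -DskipTests=true` finishes, a short script can list the archives that landed under ctakes-distribution. This is a sketch: the `*-bin.tar.gz` pattern follows the released file name apache-ctakes-4.0.0-bin.tar.gz mentioned in this thread, while the `*-bin.zip` pattern and the function name `find_binary_archives` are assumptions. A trunk checkout would likely produce SNAPSHOT-versioned file names.

```python
from pathlib import Path
from typing import List

# "-bin.tar.gz" matches the released apache-ctakes-4.0.0-bin.tar.gz;
# "-bin.zip" is an assumed companion format.
ARCHIVE_PATTERNS = ("*-bin.tar.gz", "*-bin.zip")


def find_binary_archives(checkout: Path) -> List[Path]:
    """Return binary archives found anywhere under <checkout>/ctakes-distribution."""
    dist = checkout / "ctakes-distribution"
    if not dist.is_dir():
        return []  # nothing built yet, or not a cTAKES checkout
    found = set()
    for pattern in ARCHIVE_PATTERNS:
        found.update(path for path in dist.rglob(pattern) if path.is_file())
    return sorted(found)


if __name__ == "__main__":
    for archive in find_binary_archives(Path(".")):
        print(archive)
```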

RE: Building a Custom cTAKES Dictionary [EXTERNAL]

2017-07-02 Thread Finan, Sean

Hi Andrew,

I am glad that it worked for you.  I agree that the umls/MetamorphoSys 
installation could use a few hints here and there, but I don't want to be the 
pot calling the kettle ...

Thanks,
Sean

-Original Message-
From: Andrew Phillips [mailto:aphilli...@luc.edu] 
Sent: Friday, June 30, 2017 10:29 PM
To: dev@ctakes.apache.org
Subject: Re: Building a Custom cTAKES Dictionary [EXTERNAL]

Hi Sean,

It took a while to figure out how to set up everything and run tests, but I have 
now successfully generated CUI files containing the terms that were missing 
before.

Thanks for your help.

Andrew


*Andrew Phillips*
GitHub: github.com/skeledrew
LinkedIn: www.linkedin.com/in/aphillipstech

On 28 June 2017 at 19:31, Finan, Sean <sean.fi...@childrens.harvard.edu>
wrote:

> Hi Andrew,
>
> You will need to download the umls data from the nlm.  Go to their 
> website https://www.nlm.nih.gov/research/umls/
> and use the "Downloads" button in the "Access" panel.
>
> I will put some more instructions on the wiki page when I get a chance.
>
> After you download and build a local copy of the umls, run the gui.  
> In your ctakes resources/ directory, go to 
> org/apache/ctakes/dictionary/lookup/fast/
> and you will see an xml file and a directory named after your custom 
> dictionary.  Copy those to the headless server in that ctakes'
> resources/org/apache/../fast/ directory.
>
> See the (info) panel at the bottom of 
> https://cwiki.apache.org/confluence/display/CTAKES/Dictionary+Creator+GUI
> To see how to point ctakes to your custom gui.
>
> By the way, if you have a small number of terms and don't need the 
> umls you can manually create a bar separated value (bsv) file.  I am 
> guessing that you have more than a few terms.
>
> Sean
>
> -Original Message-
> From: Andrew Phillips [mailto:skeled...@gmail.com]
> Sent: Wednesday, June 28, 2017 6:28 PM
> To: dev@ctakes.apache.org
> Subject: Re: Building a Custom cTAKES Dictionary [EXTERNAL]
>
> Hi Sean,
>
> I haven't found a good way to launch GUIs remotely (BTW I'm also using
> Linux on my machine). However, I also have a local cTAKES install and
> I'm trying to do the modification there, but I'm not sure what to
> enter into the UMLS installation field, as I cannot find a META
> directory or RRF files. Also, what file(s) would I transfer to the
> server once I have built the new dictionary?
>
> Thanks,
> Andrew
>
> *Andrew Phillips*
> Computer Technician / Programmer and Mobile Experience Consultant
> Phone: (678) 753-5313
> Email: skeled...@gmail.com
> LinkedIn: www.linkedin.com/in/aphillipstech
>
> "A man may imagine things that are false, but he can only understand 
> things that are true, for if the things be false, the apprehension of 
> them is not understanding." - Isaac Newton
>
> On 28 June 2017 at 12:35, Finan, Sean 
> <sean.fi...@childrens.harvard.edu>
> wrote:
>
> > Hi Andrew,
> >
> > Can you xWin (or other) to the server to launch gui applications?
> > If so, try the dictionary creator gui:
> > https://cwiki.apache.org/confluence/display/CTAKES/Dictionary+Creator+GUI
> >
> >
> > Sean
> >
> > -Original Message-
> > From: Andrew Phillips [mailto:aphilli...@luc.edu]
> > Sent: Wednesday, June 28, 2017 1:14 PM
> > To: dev@ctakes.apache.org
> > Subject: Building a Custom cTAKES Dictionary [EXTERNAL]
> >
> > Hello,
> >
> > I am new to cTAKES and I'm trying to create a custom dictionary of 
> > additional terms related to alcohol. How can I go about adding 
> > semantic types such as food (T168), etc that aren't available by 
> > default to the dictionary? The cTAKES install is located on a 
> > headless Linux server that I access via SSH.
> >
> > Thank you,
> > Andrew
> >
> > *Andrew Phillips*
> > GitHub: github.com/skeledrew
> > LinkedIn: www.linkedin.com/in/aphillipstech
> >
>

Re: Building a Custom cTAKES Dictionary [EXTERNAL]

2017-06-30 Thread Andrew Phillips

Hi Sean,

It took a while to figure out how to set up everything and run tests, but I
have now successfully generated CUI files containing the terms that were
missing before.

Thanks for your help.

Andrew


*Andrew Phillips*
GitHub: github.com/skeledrew
LinkedIn: www.linkedin.com/in/aphillipstech

On 28 June 2017 at 19:31, Finan, Sean <sean.fi...@childrens.harvard.edu>
wrote:

> Hi Andrew,
>
> You will need to download the umls data from the nlm.  Go to their website
> https://www.nlm.nih.gov/research/umls/
> and use the "Downloads" button in the "Access" panel.
>
> I will put some more instructions on the wiki page when I get a chance.
>
> After you download and build a local copy of the umls, run the gui.  In
> your ctakes resources/ directory, go to 
> org/apache/ctakes/dictionary/lookup/fast/
> and you will see an xml file and a directory named after your custom
> dictionary.  Copy those to the headless server in that ctakes'
> resources/org/apache/../fast/ directory.
>
> See the (info) panel at the bottom of
> https://cwiki.apache.org/confluence/display/CTAKES/Dictionary+Creator+GUI
> To see how to point ctakes to your custom gui.
>
> By the way, if you have a small number of terms and don't need the umls
> you can manually create a bar separated value (bsv) file.  I am guessing
> that you have more than a few terms.
>
> Sean
>
> -Original Message-
> From: Andrew Phillips [mailto:skeled...@gmail.com]
> Sent: Wednesday, June 28, 2017 6:28 PM
> To: dev@ctakes.apache.org
> Subject: Re: Building a Custom cTAKES Dictionary [EXTERNAL]
>
> Hi Sean,
>
> I haven't found a good way to launch GUIs remotely (BTW I'm also using
> Linux on my machine). However, I also have a local cTAKES install and I'm
> trying to do the modification there, but I'm not sure what to enter
> into the UMLS installation field, as I cannot find a META directory or RRF
> files. Also, what file(s) would I transfer to the server once I have built
> the new dictionary?
>
> Thanks,
> Andrew
>
> *Andrew Phillips*
> Computer Technician / Programmer and Mobile Experience Consultant
> Phone: (678) 753-5313
> Email: skeled...@gmail.com
> LinkedIn: www.linkedin.com/in/aphillipstech
>
> "A man may imagine things that are false, but he can only understand
> things that are true, for if the things be false, the apprehension of them
> is not understanding." - Isaac Newton
>
> On 28 June 2017 at 12:35, Finan, Sean <sean.fi...@childrens.harvard.edu>
> wrote:
>
> > Hi Andrew,
> >
> > Can you xWin (or other) to the server to launch gui applications?
> > If so, try the dictionary creator gui:
> > https://cwiki.apache.org/confluence/display/CTAKES/Dictionary+Creator+GUI
> >
> >
> > Sean
> >
> > -Original Message-
> > From: Andrew Phillips [mailto:aphilli...@luc.edu]
> > Sent: Wednesday, June 28, 2017 1:14 PM
> > To: dev@ctakes.apache.org
> > Subject: Building a Custom cTAKES Dictionary [EXTERNAL]
> >
> > Hello,
> >
> > I am new to cTAKES and I'm trying to create a custom dictionary of
> > additional terms related to alcohol. How can I go about adding
> > semantic types such as food (T168), etc that aren't available by
> > default to the dictionary? The cTAKES install is located on a headless
> > Linux server that I access via SSH.
> >
> > Thank you,
> > Andrew
> >
> > *Andrew Phillips*
> > GitHub: github.com/skeledrew
> > LinkedIn: www.linkedin.com/in/aphillipstech
> >
>

RE: Building a Custom cTAKES Dictionary [EXTERNAL]

2017-06-28 Thread Finan, Sean

Hi Andrew,

You will need to download the umls data from the nlm.  Go to their website
https://www.nlm.nih.gov/research/umls/
and use the "Downloads" button in the "Access" panel.

I will put some more instructions on the wiki page when I get a chance.

After you download and build a local copy of the umls, run the gui.  In your 
ctakes resources/ directory, go to org/apache/ctakes/dictionary/lookup/fast/ 
and you will see an xml file and a directory named after your custom 
dictionary.  Copy those to the headless server in that ctakes' 
resources/org/apache/../fast/ directory.
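The copy step Sean describes can be scripted: copy the dictionary's xml file and directory from the local resources/ tree into a staging tree that mirrors the server, then push that tree (for example with scp -r or rsync). This Python sketch assumes the xml file is named after the dictionary (`<name>.xml`), which the message does not state outright; `stage_custom_dictionary` and the dictionary name used below are hypothetical.

```python
import shutil
from pathlib import Path

# Relative location given in the message above.
FAST_LOOKUP_DIR = Path("org", "apache", "ctakes", "dictionary", "lookup", "fast")


def stage_custom_dictionary(src_resources, dest_resources, dictionary_name):
    """Copy <name>.xml and the <name>/ directory between two resources/ trees.

    Assumes the xml file is named after the dictionary; check the actual
    file name the Dictionary Creator GUI wrote before relying on this.
    """
    src = Path(src_resources) / FAST_LOOKUP_DIR
    dest = Path(dest_resources) / FAST_LOOKUP_DIR
    dest.mkdir(parents=True, exist_ok=True)
    shutil.copy2(src / f"{dictionary_name}.xml", dest)
    # dirs_exist_ok (Python 3.8+) allows re-running over an existing copy;
    # files are overwritten, but stale files are not removed.
    shutil.copytree(src / dictionary_name, dest / dictionary_name, dirs_exist_ok=True)
    return dest
```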

See the (info) panel at the bottom of 
https://cwiki.apache.org/confluence/display/CTAKES/Dictionary+Creator+GUI 
to see how to point cTAKES to your custom dictionary.

By the way, if you have a small number of terms and don't need the umls you can 
manually create a bar separated value (bsv) file.  I am guessing that you have 
more than a few terms.
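For the small-dictionary case, a bar-separated value (BSV) file can be generated from a term list. This Python sketch assumes a three-column `CUI|TUI|text` layout; that column order is an assumption, so confirm what your cTAKES version expects for BSV dictionaries before using it. The CUIs in the usage line are made-up placeholders.

```python
def bsv_lines(terms):
    """Yield one 'CUI|TUI|text' line per (cui, tui, text) tuple.

    The three-column layout is an assumption; confirm the column order your
    cTAKES version expects for BSV dictionaries before using the output.
    """
    for cui, tui, text in terms:
        for field in (cui, tui, text):
            # A bar or newline inside a field would silently shift columns.
            if "|" in field or "\n" in field:
                raise ValueError(f"field {field!r} contains a delimiter")
        yield f"{cui}|{tui}|{text}"


def write_bsv(path, terms):
    """Write the terms to path as UTF-8, one BSV line each."""
    with open(path, "w", encoding="utf-8") as out:
        for line in bsv_lines(terms):
            out.write(line + "\n")
```

For example, `write_bsv("alcohol.bsv", [("C9000001", "T168", "craft beer")])` writes the single line `C9000001|T168|craft beer`.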

Sean

-Original Message-
From: Andrew Phillips [mailto:skeled...@gmail.com] 
Sent: Wednesday, June 28, 2017 6:28 PM
To: dev@ctakes.apache.org
Subject: Re: Building a Custom cTAKES Dictionary [EXTERNAL]

Hi Sean,

I haven't found a good way to launch GUIs remotely (BTW I'm also using Linux on 
my machine). However, I also have a local cTAKES install and I'm trying to do 
the modification there, but I'm not sure what to enter into the UMLS 
installation field, as I cannot find a META directory or RRF files. Also, what 
file(s) would I transfer to the server once I have built the new dictionary?

Thanks,
Andrew

*Andrew Phillips*
Computer Technician / Programmer and Mobile Experience Consultant
Phone: (678) 753-5313
Email: skeled...@gmail.com
LinkedIn: www.linkedin.com/in/aphillipstech

"A man may imagine things that are false, but he can only understand things 
that are true, for if the things be false, the apprehension of them is not 
understanding." - Isaac Newton

On 28 June 2017 at 12:35, Finan, Sean <sean.fi...@childrens.harvard.edu>
wrote:

> Hi Andrew,
>
> Can you xWin (or other) to the server to launch gui applications?
> If so, try the dictionary creator gui:  
> https://cwiki.apache.org/confluence/display/CTAKES/Dictionary+Creator+GUI
>
>
> Sean
>
> -Original Message-
> From: Andrew Phillips [mailto:aphilli...@luc.edu]
> Sent: Wednesday, June 28, 2017 1:14 PM
> To: dev@ctakes.apache.org
> Subject: Building a Custom cTAKES Dictionary [EXTERNAL]
>
> Hello,
>
> I am new to cTAKES and I'm trying to create a custom dictionary of 
> additional terms related to alcohol. How can I go about adding 
> semantic types such as food (T168), etc that aren't available by 
> default to the dictionary? The cTAKES install is located on a headless 
> Linux server that I access via SSH.
>
> Thank you,
> Andrew
>
> *Andrew Phillips*
> GitHub: github.com/skeledrew
> LinkedIn: www.linkedin.com/in/aphillipstech
>

Re: Building a Custom cTAKES Dictionary [EXTERNAL]

2017-06-28 Thread Andrew Phillips

Hi Sean,

I haven't found a good way to launch GUIs remotely (BTW I'm also using
Linux on my machine). However, I also have a local cTAKES install and I'm
trying to do the modification there, but I'm not sure what to enter
into the UMLS installation field, as I cannot find a META directory or RRF
files. Also, what file(s) would I transfer to the server once I have built
the new dictionary?

Thanks,
Andrew

*Andrew Phillips*
Computer Technician / Programmer and Mobile Experience Consultant
Phone: (678) 753-5313
Email: skeled...@gmail.com
LinkedIn: www.linkedin.com/in/aphillipstech

"A man may imagine things that are false, but he can only understand things
that are true, for if the things be false, the apprehension of them is not
understanding." - Isaac Newton

On 28 June 2017 at 12:35, Finan, Sean 
wrote:

> Hi Andrew,
>
> Can you xWin (or other) to the server to launch gui applications?
> If so, try the dictionary creator gui:
> https://cwiki.apache.org/confluence/display/CTAKES/Dictionary+Creator+GUI
>
>
> Sean
>
> -Original Message-
> From: Andrew Phillips [mailto:aphilli...@luc.edu]
> Sent: Wednesday, June 28, 2017 1:14 PM
> To: dev@ctakes.apache.org
> Subject: Building a Custom cTAKES Dictionary [EXTERNAL]
>
> Hello,
>
> I am new to cTAKES and I'm trying to create a custom dictionary of
> additional terms related to alcohol. How can I go about adding semantic
> types such as food (T168), etc that aren't available by default to the
> dictionary? The cTAKES install is located on a headless Linux server that I
> access via SSH.
>
> Thank you,
> Andrew
>
> *Andrew Phillips*
> GitHub: github.com/skeledrew
> LinkedIn: www.linkedin.com/in/aphillipstech
>

RE: Building a Custom cTAKES Dictionary [EXTERNAL]

2017-06-28 Thread Finan, Sean

Hi Andrew,

Can you xWin (or other) to the server to launch gui applications?
If so, try the dictionary creator gui:  
https://cwiki.apache.org/confluence/display/CTAKES/Dictionary+Creator+GUI


Sean

-Original Message-
From: Andrew Phillips [mailto:aphilli...@luc.edu] 
Sent: Wednesday, June 28, 2017 1:14 PM
To: dev@ctakes.apache.org
Subject: Building a Custom cTAKES Dictionary [EXTERNAL]

Hello,

I am new to cTAKES and I'm trying to create a custom dictionary of additional 
terms related to alcohol. How can I go about adding semantic types such as food 
(T168), etc that aren't available by default to the dictionary? The cTAKES 
install is located on a headless Linux server that I access via SSH.
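Before building a dictionary that includes an extra semantic type such as Food (T168), it can help to see which concepts in a local UMLS install carry that type. This Python sketch reads MRSTY.RRF, whose rows are `CUI|TUI|STN|STY|ATUI|CVF|` per the UMLS Rich Release Format; it is only an inspection aid, and the test rows use made-up CUIs.

```python
# Column positions in MRSTY.RRF: CUI|TUI|STN|STY|ATUI|CVF|
CUI, TUI, STY = 0, 1, 3


def concepts_with_semantic_types(rrf_lines, wanted_tuis):
    """Map each CUI whose TUI is in wanted_tuis to the set of matching type names."""
    wanted = set(wanted_tuis)
    hits = {}
    for line in rrf_lines:
        fields = line.rstrip("\n").split("|")
        if len(fields) > STY and fields[TUI] in wanted:
            # A concept can carry several semantic types, hence a set.
            hits.setdefault(fields[CUI], set()).add(fields[STY])
    return hits
```

Usage against a MetamorphoSys output directory might look like `with open("META/MRSTY.RRF", encoding="utf-8") as f: food = concepts_with_semantic_types(f, {"T168"})`.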

Thank you,
Andrew

*Andrew Phillips*
GitHub: github.com/skeledrew
LinkedIn: www.linkedin.com/in/aphillipstech

Re: building a real sample dictionary without UMLS login

2015-10-02 Thread Mattmann, Chris A (3980)

Hi,

I would be extremely interested in a sample dictionary that
doesn’t require a UMLS login.

How would I use this?

Thanks,
Chris

++
Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-527
Email: chris.a.mattm...@nasa.gov
WWW:  http://sunset.usc.edu/~mattmann/
++
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++





-Original Message-
From: "and...@apache.org (forwarding)" 
Reply-To: "dev@ctakes.apache.org" 
Date: Friday, October 2, 2015 at 12:43 AM
To: "dev@ctakes.apache.org" 
Subject: building a *real sample dictionary* without UMLS login

>Greetings ctakes-dev!
>
>I have been polishing MedGen (UMLS) dictionaries for over a year now and
>I am confident in saying "this is solid".
>As a reminder, the medgen-mysql package contains a large subset of the
>UMLS that can be downloaded without UMLS login, greatly simplifying the
>creation of an example dictionary.
>
>QUESTION: 
>Would you like me to integrate this into ctakes to simplify installations
>for new-users, and if so, what would be your preferred method?
>
>Source Vocabularies (SAB)
>+-------------+--------+
>| SourceVocab | cnt    |
>+-------------+--------+
>| MSH         | 245435 | Medical Subject Headings
>| SNOMEDCT_US | 156105 | SNOMED Clinical Terms
>| NCI         | 136888 | NCI Cancer Terms
>| ...         |    ... |
>+-------------+--------+
>
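A per-source summary like the SAB table can be reproduced from a UMLS-style concept file. The message does not say whether `cnt` counts concepts or atom rows, nor exactly how it was queried (medgen-mysql loads the data into MySQL), so this Python sketch is only an approximation: it assumes standard MRCONSO.RRF columns (CUI first, SAB twelfth) and counts distinct CUIs per SAB.

```python
from collections import Counter, defaultdict

# Column positions in MRCONSO.RRF (0-based): CUI is first, SAB is twelfth.
CUI, SAB = 0, 11


def concepts_per_source(rrf_lines):
    """Count distinct CUIs per source vocabulary (SAB) in MRCONSO-style rows."""
    cuis_by_sab = defaultdict(set)
    for line in rrf_lines:
        fields = line.rstrip("\n").split("|")
        if len(fields) > SAB:
            cuis_by_sab[fields[SAB]].add(fields[CUI])
    return Counter({sab: len(cuis) for sab, cuis in cuis_by_sab.items()})
```

Calling `.most_common()` on the result gives rows in the same descending order as the table; to count atom rows instead, increment a Counter per line rather than collecting CUIs.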
>Semantic Types (STY)
>+---------------------------------+--------+
>| SemanticType                    | cnt    |
>+---------------------------------+--------+
>| Pharmacologic Substance         | 102511 |
>| Finding                         |  90413 |
>| Organic Chemical                |  81329 |
>| Disease or Syndrome             |  47223 |
>| Neoplastic Process              |  16151 |
>| Amino Acid, Peptide, or Protein |   9383 |
>| Congenital Abnormality          |   6536 |
>| Pathologic Function             |   5655 |
>| Steroid                         |   3919 |
>| Sign or Symptom                 |   2909 |
>| ...                             |    ... |
>+---------------------------------+--------+
>
>
>What would you like to see?
>and...@apache.org  
>
>
>On Nov 12, 2014, at 6:14 AM, "Dligach, Dmitriy"
> wrote:
>
>> Andy, thank you for this resource!
>> 
>> Do you have an estimate of what percentage of UMLS concepts were left
>> out?
>> 
>> Dima
>> 
>> 
>> 
>> 
>> On Nov 11, 2014, at 16:02, andy mcmurry  wrote:
>> 
>>> Hello!
>>> 
>>> https://bitbucket.org/invitae/medgen-mysql (Apache Licensed ASL2)
>>> 
>>> We just released a new library containing a huge chunk of UMLS concepts
>>> which are available without registering accounts/username/passwords.
>>> LEGALLY. Yes, really!
>>> 
>>> The subset is from NCBI and it contains *thousands of concepts from
>>> SNOMED and other vocabularies*.
>>> 
>>> The code is essentially
>>> 1. a list of WGET targets to various NCBI FTP site mirrors
>>> 2. Makefile for building the databases of interest
>>> 
>>> Our legal team has approved distribution for Open Access work, ASL2
>>> LICENSE.
>>> 
>>> I recommend we use this opportunity to make this the default distribution
>>> for cTAKES UMLS connections, because it obviates the need for so much
>>> painful credentialing and back-and-forth agreements with the US National
>>> Library of Medicine.
>>> 
>>> Cheers!
>>> --Andy
>>> 
>>> 
>>> On Wed, Sep 10, 2014 at 12:13 PM, Masanz, James J.
>>>
>>> wrote:
>>> 
 
>>>> I would love to see the install be as simple as apt-get install to end
>>>> up with some working dictionary that has more than a handful of entries
>>>> to get them started.
>>>>
>>>> Regards,
>>>> James Masanz
>>>>
>>>> -Original Message-
>>>> From: andy mcmurry [mailto:mcmurry.a...@gmail.com]
>>>> Sent: Tuesday, September 09, 2014 4:32 PM
>>>> To: ctakes-...@incubator.apache.org
>>>> Subject: Recommendation for ctakes default (UMLS) dictionaries
>>>>
>>>> Greetings ctakes-dev:
>>>>
>>>> *UMLS license restrictions have been getting more lax over the years* --
>>>> much of the UMLS can be downloaded directly from the NCBI official FTP
>>>> site.
>>>>
>>>> In fact, the NIH (and implicitly the NLM) *have already made the standard
>>>> terms public for some medical specialities*.
>>>>
>>>> For example: Here is the UMLS subset specific to Medical Genetics (MedGen)
>>>> and Genetic Testing (GTR) complete with

Re: Building

2014-07-14 Thread John Green

Hi Vijay - The queries are all returning odd errors except the commands that
list which graphs are available. I will try to sleuth out a better report
than that.

Thank you for your continued help,
JG

On Sat, Jul 5, 2014 at 12:53 AM, vijay garla vnga...@gmail.com wrote:

When you run the webapp, the RESTful services run as well.

On Friday, July 4, 2014, John Green john.travis.gr...@gmail.com wrote:

Vijay - Ha! Ok. Works perfect with cuis.

Is there a way to run the web application as a RESTful API? You mention
this as a service on your Yale box, but I don't see a way to deploy it this
way locally.

Thanks again,
JG

On Wed, Jul 2, 2014 at 10:58 PM, vijay garla vnga...@gmail.com
wrote:

The ytexWeb application tries to look up concepts from terms using the ytex
dictionary lookup table, which is a small subset of the UMLS. Can you try
specifying cuis? That skips the lookup - if the concepts are in the
concept graph, this will work.

Best,

On Sun, Jun 29, 2014 at 6:10 PM, John Green
john.travis.gr...@gmail.com
wrote:

Hi Vijay, thank you for your time.

Your documentation was quite good. I had no problem setting up ytex with
UMLS running on my local mysql server. Where I ran into problems was
understanding how to launch the web service (also, is there any way to run
this in a RESTful mode? Btw, the informatics.yale links return 502). After
I did get it launched, and the confusion was probably all my fault, the
concepts available to the similarity fields seemed very sparse; I just
started typing randomly, hematochezia, choledocholithiasis, etc., and
nothing would come up. The best I got was gallbladder function test, which,
if I'm understanding it right, would be an alkphos, but alkaline phosphatase
didn't come up, which led me to believe they were smaller sets of the
snomed, mesh, etc. compilations (as I checked the UMLS db and these concepts
are there).

I think I got that execution command from the code.google, which is
probably why it was stale. I did not see the ytex semantic similarity guide
under the ctakes components part (sorry, thanks for pointing me there, I'll
get to work on reading it).

So bottom line: are the ones that shipped watered down versions? And if
not, why are my concepts coming up short? If you give me a hint at where to
check, I'll investigate.

Thanks!
JG

On Sun, Jun 29, 2014 at 8:56 PM, vijay garla vnga...@gmail.com
wrote:

Hi John,

YTEX ships with 3 concept graphs (see

https://cwiki.apache.org/confluence/display/CTAKES/cTAKES+3.1.2+-+Semantic+Similarity
):

- sct-rxnorm: concepts from SNOMED-CT and RXNORM. This is the default.
- sct-msh-csp-aod: concepts from the SNOMED-CT, MeSH, CRISP, and Alcohol
  and Drug thesaurus
- umls: concepts from all restriction free (level 0) UMLS source
  vocabularies and SNOMED-CT

These concept graphs are included in ytex resources zip (see

https://cwiki.apache.org/confluence/display/CTAKES/YTEX+Installation
):
3) Unzip YTEX Resources (Optional - UTS login required)

Download and unzip ctakes-ytex-resources-3.1.2-SNAPSHOT.zip
(http://www.ytex-nlp.org/umls.download/secure/3.1/ctakes-ytex-resources-3.1.2-SNAPSHOT.zip)
'over' your installation. This contains:

- Concept Graphs derived from the UMLS2013AA used to compute semantic
  similarity measures
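Unzipping the resources archive 'over' an installation silently replaces any files with the same paths. This Python sketch extracts an archive into an install directory and reports which existing files were overwritten; it assumes the archive's member paths are relative to the installation root, which is worth checking against the real zip before running it on a working install. `unzip_over` is a hypothetical helper name.

```python
import zipfile
from pathlib import Path


def unzip_over(archive, install_dir):
    """Extract archive into install_dir; return member names that replaced existing files."""
    install_dir = Path(install_dir)
    overwritten = []
    with zipfile.ZipFile(archive) as zf:
        for info in zf.infolist():
            if not info.is_dir() and (install_dir / info.filename).exists():
                overwritten.append(info.filename)
        # extractall() strips absolute paths and ".." components from member names
        zf.extractall(install_dir)
    return overwritten
```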

All YTEX packages moved from the ytex namespace into org.apache.ctakes.ytex.
Can you tell me which document you were looking at that mentioned
ytex.kernel.dao.ConceptDaoImpl? I thought I had fixed this in the
documentation.

HTH,

-vj

On Sun, Jun 29, 2014 at 2:25 PM, John Green
john.travis.gr...@gmail.com

wrote:

I got the semantic similarity web app running in ytex. I'm still learning
umls terminology, but I believe it says that out of the box its concept
graphs are limited to the free set from umls? Does this mean without
permissions? Similar to ctakes with umls rights? The concepts available
seem limited so this would make sense.

So, to take full advantage I would need to rebuild the concept graph,
correct? I'm in the process of doing this but getting classpath errors. I
used java a bit ten years ago, so you can probably guess these will take me
a minute to resolve. Notably, it is complaining about
ytex.kernel.dao.ConceptDaoImpl.

Thanks all,

—
Sent from Mailbox for iPhone

Re: Building

2014-07-04 Thread John Green

Vijay - Ha! Ok. Works perfect with cuis.

Is there a way to run the web application as a RESTful API? You mention
this as a service on your Yale box, but I don't see a way to deploy it this
way locally.

Thanks again,
JG

On Wed, Jul 2, 2014 at 10:58 PM, vijay garla vnga...@gmail.com wrote:

The ytexWeb application tries to look up concepts from terms using the ytex
dictionary lookup table, which is a small subset of the UMLS. Can you try
specifying cuis? That skips the lookup - if the concepts are in the
concept graph, this will work.

Best,

On Sun, Jun 29, 2014 at 6:10 PM, John Green john.travis.gr...@gmail.com
wrote:

Hi Vijay, thank you for your time.

Your documentation was quite good. I had no problem setting up ytex with
UMLS running on my local mysql server. Where I ran into problems was
understanding how to launch the web service (also, is there any way to run
this in a RESTful mode? Btw, the informatics.yale links return 502). After
I did get it launched, and the confusion was probably all my fault, the
concepts available to the similarity fields seemed very sparse; I just
started typing randomly, hematochezia, choledocholithiasis, etc., and
nothing would come up. The best I got was gallbladder function test, which,
if I'm understanding it right, would be an alkphos, but alkaline phosphatase
didn't come up, which led me to believe they were smaller sets of the
snomed, mesh, etc. compilations (as I checked the UMLS db and these concepts
are there).

So bottom line: are the ones that shipped watered down versions? And if
not, why are my concepts coming up short? If you give me a hint at where to
check, I'll investigate.

Thanks!
JG

On Sun, Jun 29, 2014 at 8:56 PM, vijay garla vnga...@gmail.com wrote:

Hi John,

YTEX ships with 3 concept graphs (see

https://cwiki.apache.org/confluence/display/CTAKES/cTAKES+3.1.2+-+Semantic+Similarity
):

These concept graphs are included in ytex resources zip (see
https://cwiki.apache.org/confluence/display/CTAKES/YTEX+Installation):
3) Unzip YTEX Resources (Optional - UTS login required)

Download and unzip ctakes-ytex-resources-3.1.2-SNAPSHOT.zip
(http://www.ytex-nlp.org/umls.download/secure/3.1/ctakes-ytex-resources-3.1.2-SNAPSHOT.zip)
'over' your installation. This contains:

- Concept Graphs derived from the UMLS2013AA used to compute semantic
  similarity measures

HTH,

-vj

On Sun, Jun 29, 2014 at 2:25 PM, John Green
john.travis.gr...@gmail.com

wrote:

I got the semantic similarity web app running in ytex. I'm still learning
umls terminology, but I believe it says that out of the box its concept
graphs are limited to the free set from umls? Does this mean without
permissions? Similar to ctakes with umls rights? The concepts available
seem limited so this would make sense.

So, to take full advantage I would need to rebuild the concept graph,
correct? I'm in the process of doing this but getting classpath errors. I
used java a bit ten years ago, so you can probably guess these will take me
a minute to resolve. Notably, it is complaining about
ytex.kernel.dao.ConceptDaoImpl.

Thanks all,


Re: Building

2014-07-04 Thread vijay garla

When you run the webapp, the RESTful services run as well.

On Friday, July 4, 2014, John Green john.travis.gr...@gmail.com wrote:

Vijay - Ha! Ok. Works perfect with cuis.

Is there a way to run the web application as a RESTful API? You mention
this as a service on your Yale box, but I don't see a way to deploy it this
way locally.

Thanks again,
JG

On Wed, Jul 2, 2014 at 10:58 PM, vijay garla vnga...@gmail.com
wrote:

Best,

On Sun, Jun 29, 2014 at 6:10 PM, John Green john.travis.gr...@gmail.com
wrote:

Hi Vijay, thank you for your time.

Your documentation was quite good. I had no problem setting up ytex with
UMLS running on my local mysql server. Where I ran into problems was
understanding how to launch the web service (also, is there any way to run
this in a RESTful mode? Btw, the informatics.yale links return 502). After
I did get it launched, and the confusion was probably all my fault, the
concepts available to the similarity fields seemed very sparse; I just
started typing randomly, hematochezia, choledocholithiasis, etc., and
nothing would come up. The best I got was gallbladder function test, which,
if I'm understanding it right, would be an alkphos, but alkaline phosphatase
didn't come up, which led me to believe they were smaller sets of the
snomed, mesh, etc. compilations (as I checked the UMLS db and these concepts
are there).

So bottom line: are the ones that shipped watered down versions? And if
not, why are my concepts coming up short? If you give me a hint at where to
check, I'll investigate.

Thanks!
JG

On Sun, Jun 29, 2014 at 8:56 PM, vijay garla vnga...@gmail.com
wrote:

Hi John,

YTEX ships with 3 concept graphs (see

https://cwiki.apache.org/confluence/display/CTAKES/cTAKES+3.1.2+-+Semantic+Similarity
):

These concept graphs are included in ytex resources zip (see
https://cwiki.apache.org/confluence/display/CTAKES/YTEX+Installation
):
3) Unzip YTEX Resources (Optional - UTS login required)

Download and unzip ctakes-ytex-resources-3.1.2-SNAPSHOT.zip
(http://www.ytex-nlp.org/umls.download/secure/3.1/ctakes-ytex-resources-3.1.2-SNAPSHOT.zip)
'over' your installation. This contains:

- Concept Graphs derived from the UMLS2013AA used to compute semantic
  similarity measures

HTH,

-vj

On Sun, Jun 29, 2014 at 2:25 PM, John Green
john.travis.gr...@gmail.com

wrote:

So, to take full advantage I would need to rebuild the concept graph,
correct? I'm in the process of doing this but getting classpath errors. I
used java a bit ten years ago, so you can probably guess these will take me
a minute to resolve. Notably, it is complaining about
ytex.kernel.dao.ConceptDaoImpl.

Thanks all,


Re: Building

2014-07-02 Thread vijay garla

The concept graph used by the webapp is defined in ytex.properties. You
can also override it using the ytex.conceptGraph system property (add
-Dytex.conceptGraph=xxx to the beginning of the ytexweb.bat java command
line).
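The -Dytex.conceptGraph override only takes effect if it sits among the JVM options, before the main class; anything after the main class is passed to the program as an ordinary argument. This Python sketch inserts a system property right after the java executable in an existing command line. The webapp's actual main class is not shown in this thread, so the test uses a hypothetical placeholder class name.

```python
import os


def with_system_property(argv, name, value):
    """Return a copy of a java command line with -D<name>=<value> inserted first.

    JVM options such as -D must come before the main class (or -jar);
    anything placed after the main class is passed to the program instead.
    """
    if not argv or not os.path.basename(argv[0]).lower().startswith("java"):
        raise ValueError("expected a command line that starts with the java executable")
    return [argv[0], f"-D{name}={value}", *argv[1:]]
```

For example, `with_system_property(cmd, "ytex.conceptGraph", "sct-rxnorm")` would select the sct-rxnorm graph without editing ytex.properties.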

I'm not sure why you don't see any log output. The general form of the
command is:

java -cp %CLASSPATH% -Dlog4j.configuration=file:/%CTAKES_HOME%/config/log4j.xml -Xmx1g org.apache.ctakes.ytex.kernel.dao.ConceptDaoImpl -name <concept graph name>

When I run it specifying an invalid concept graph name:

C:\java\apache-ctakes-3.1.2-SNAPSHOT>java -cp %CLASSPATH% -Dlog4j.configuration=file:/%CTAKES_HOME%/config/log4j.xml -Xmx1g org.apache.ctakes.ytex.kernel.dao.ConceptDaoImpl -name test

I get this output (indicating that the corresponding properties file can't
be found):
log4j: reset attribute= false.
log4j: Threshold =null.
log4j: Level value for root is [INFO].
log4j: root level set to INFO
log4j: Class name: [org.apache.log4j.ConsoleAppender]
log4j: Parsing layout of class: org.apache.log4j.PatternLayout
log4j: Setting property [conversionPattern] to [%d{dd MMM HH:mm:ss} %5p %c{1} - %m%n].
log4j: Adding appender named [consoleAppender] to category [root].
*properties file could not be located: org/apache/ctakes/ytex/conceptGraph/test.xml*

If you're on Linux, can you play around with the file URL for log4j?
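On Linux, one way to avoid hand-writing the log4j file URL is to let the standard library build it. This Python sketch assembles a Linux/macOS equivalent of the ConceptDaoImpl command above, using `Path.as_uri()` for a well-formed `file:///...` URL and `os.pathsep` (`:` on Linux) for the classpath. The install location and classpath entries in the test are illustrative assumptions; the class name and flags come from the message.

```python
import os
from pathlib import Path


def concept_dao_command(ctakes_home, classpath_entries, graph_name, heap="1g"):
    """Build a Linux/macOS argv equivalent to the ConceptDaoImpl command above."""
    log4j_xml = Path(os.path.abspath(os.path.join(ctakes_home, "config", "log4j.xml")))
    return [
        "java",
        "-cp", os.pathsep.join(classpath_entries),
        # as_uri() yields a well-formed URL, e.g. file:///opt/ctakes/config/log4j.xml
        "-Dlog4j.configuration=" + log4j_xml.as_uri(),
        "-Xmx" + heap,
        "org.apache.ctakes.ytex.kernel.dao.ConceptDaoImpl",
        "-name", graph_name,
    ]
```

The resulting list can be passed directly to `subprocess.run(cmd, check=True)`, which also sidesteps shell quoting of the `*` in classpath entries.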

Best,

On Sun, Jun 29, 2014 at 6:30 PM, John Green john.travis.gr...@gmail.com
wrote:

I successfully ran the command to build the concept graph; however, it seems
to be failing silently. The version issued with ytex is 10 MB. I expected,
worst case, for mine to be the same size, but it was 400 bytes (the .gz
output). I can't find anything logged. log4j is complaining that it isn't set
up correctly, even though it is pointed at the correct config file. I'm not
familiar with this logging library, so perhaps the errors are ending up in
some kind of /dev/null.
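When a build fails silently like this, comparing the generated .gz file with the shipped one is a quick sanity check. This Python sketch reports compressed and decompressed sizes (reading the whole stream also raises `EOFError` if the gzip data is truncated) and flags an output that is far smaller than a known-good reference. The one-tenth threshold is an arbitrary assumption, not anything YTEX defines.

```python
import gzip
import os


def gz_sizes(path):
    """Return (compressed_bytes, uncompressed_bytes) for a .gz file.

    Reading the whole stream also surfaces truncation: gzip raises EOFError
    if the file ends before the compressed stream is complete.
    """
    compressed = os.path.getsize(path)
    uncompressed = 0
    with gzip.open(path, "rb") as stream:
        for chunk in iter(lambda: stream.read(1 << 20), b""):
            uncompressed += len(chunk)
    return compressed, uncompressed


def looks_suspiciously_small(path, reference_path):
    """Flag a built file whose content is under a tenth of a known-good reference."""
    return gz_sizes(path)[1] < gz_sizes(reference_path)[1] // 10
```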

Also, the web app is only loading sct-msh-csp-aod. I see that the others you
spoke of are in the same dir. The web app doesn't give an option for using
them (this makes sense, as the command line output makes no mention of
loading them), but I can't find where what is loaded is defined.

I hope that wasn't too poorly explained,
Thanks,
John

On Sun, Jun 29, 2014 at 9:10 PM, John Green john.travis.gr...@gmail.com
wrote:

Hi Vijay, thank you for your time.

Your documentation was quite good. I had no problem setting up ytex with
UMLS running on my local mysql server. Where I ran into problems was
understanding how to launch the web service (also, is there any way to run
this in a RESTful mode? Btw, the informatics.yale links return 502). After
I did get it launched, and the confusion was probably all my fault, the
concepts available to the similarity fields seemed very sparse; I just
started typing randomly, hematochezia, choledocholithiasis, etc., and
nothing would come up. The best I got was gallbladder function test, which,
if I'm understanding it right, would be an alkphos, but alkaline phosphatase
didn't come up, which led me to believe they were smaller sets of the
snomed, mesh, etc. compilations (as I checked the UMLS db and these concepts
are there).

So bottom line: are the ones that shipped watered down versions? And if
not, why are my concepts coming up short? If you give me a hint at where to
check, I'll investigate.

Thanks!
JG

On Sun, Jun 29, 2014 at 8:56 PM, vijay garla vnga...@gmail.com wrote:

Hi John,

YTEX ships with 3 concept graphs (see
https://cwiki.apache.org/confluence/display/CTAKES/cTAKES+3.1.2+-+Semantic+Similarity):

These concept graphs are included in the ytex resources zip (see
https://cwiki.apache.org/confluence/display/CTAKES/YTEX+Installation):

3) Unzip YTEX Resources (Optional - UTS login required)

Download ctakes-ytex-resources-3.1.2-SNAPSHOT.zip from
http://www.ytex-nlp.org/umls.download/secure/3.1/ctakes-ytex-resources-3.1.2-SNAPSHOT.zip
and unzip it 'over' your installation. This contains:

- Concept Graphs derived from the UMLS2013AA used to compute semantic
similarity measures

HTH,

-vj

On Sun, Jun 29, 2014 at 2:25 PM, John Green

Re: Building

2014-07-02 Thread vijay garla

The ytexWeb application tries to look up concepts from terms using the ytex
dictionary lookup table, which is a small subset of the UMLS. Can you try
specifying CUIs instead? That skips the lookup; if the concepts are in the
concept graph, this will work.
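A minimal Python sketch of the lookup behaviour described above, under stated assumptions: an in-memory SQLite table stands in for a local MySQL copy of the UMLS MRCONSO table (only the CUI, LAT, SAB and STR columns are modelled), a plain dict stands in for the ytex dictionary lookup table, and a set stands in for the concept graph. All CUIs and helper names (build_db, cuis_for_term, resolve_by_term, resolve_by_cui) are hypothetical and are not part of the ytex API. The sketch shows how a term can fail to resolve even though its concept is in both the UMLS and the concept graph, and how looking up its CUI in MRCONSO first sidesteps that.

```python
import sqlite3

# Hypothetical stand-in for a local UMLS MRCONSO table: only the CUI, LAT,
# SAB and STR columns are modelled, and all CUIs below are made up.
MRCONSO_ROWS = [
    ("C9000001", "ENG", "SNOMEDCT_US", "Alkaline phosphatase"),
    ("C9000001", "ENG", "MSH", "Alkaline Phosphatase"),
    ("C9000002", "ENG", "SNOMEDCT_US", "Hematochezia"),
]

# Hypothetical small term -> CUI table, standing in for the ytex dictionary
# lookup table (a small subset of the UMLS).
LOOKUP_TABLE = {"gallbladder function test": "C9000003"}

# Hypothetical set of CUIs present in the concept graph.
CONCEPT_GRAPH_CUIS = {"C9000001", "C9000002", "C9000003"}


def build_db():
    """Create the in-memory MRCONSO stand-in and load the sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE MRCONSO (CUI TEXT, LAT TEXT, SAB TEXT, STR TEXT)")
    conn.executemany("INSERT INTO MRCONSO VALUES (?, ?, ?, ?)", MRCONSO_ROWS)
    return conn


def cuis_for_term(conn, term):
    """Return the distinct CUIs whose English string matches term, ignoring case."""
    rows = conn.execute(
        "SELECT DISTINCT CUI FROM MRCONSO "
        "WHERE LAT = 'ENG' AND lower(STR) = lower(?) ORDER BY CUI",
        (term,),
    )
    return [cui for (cui,) in rows]


def resolve_by_term(term):
    """Term-based lookup: succeeds only if the term is in the small lookup table."""
    cui = LOOKUP_TABLE.get(term.lower())
    return cui if cui in CONCEPT_GRAPH_CUIS else None


def resolve_by_cui(cui):
    """CUI-based lookup: skips the dictionary and checks the concept graph directly."""
    return cui if cui in CONCEPT_GRAPH_CUIS else None


if __name__ == "__main__":
    conn = build_db()
    term = "alkaline phosphatase"
    print(resolve_by_term(term))  # None: the term is not in the lookup table
    cuis = cuis_for_term(conn, term)
    print(cuis)  # ['C9000001']
    print([resolve_by_cui(c) for c in cuis])  # ['C9000001']
```

Against a real installation, the same SELECT should work against a local MySQL copy of MRCONSO apart from the driver's placeholder syntax, and the resulting CUIs could then be entered in the web app in place of the terms.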

Best,

On Sun, Jun 29, 2014 at 6:10 PM, John Green john.travis.gr...@gmail.com
wrote:

Hi Vijay, thank you for your time.

Your documentation was quite good. I had no problem setting up ytex with
UMLS running on my local MySQL server. Where I ran into problems was
understanding how to launch the web service (also, is there any way to run
it in a RESTful mode? By the way, the informatics.yale links return a 502
error). After I did get it launched (and the confusion was probably all my
fault), the concepts available to the similarity fields seemed very sparse.
I just started typing terms at random (hematochezia, choledocholithiasis,
etc.), and nothing would come up. The best I got was gallbladder function
test, which, if I'm understanding it right, would be an alkphos, but
alkaline phosphatase didn't come up. That led me to believe they were
smaller subsets of the SNOMED, MeSH, etc. compilations (I checked the UMLS
db and these concepts are there).

I think I got that execution command from the code.google site, which is
probably why it was stale. I did not see the ytex semantic similarity guide
under the cTAKES components section (sorry, and thanks for pointing me
there; I'll get to work on reading it).

So, bottom line: are the ones that shipped watered-down versions? And if
not, why are my concepts coming up short? If you give me a hint about where
to check, I'll investigate.

Thanks!
JG

On Sun, Jun 29, 2014 at 8:56 PM, vijay garla vnga...@gmail.com wrote:

Hi John,

YTEX ships with 3 concept graphs (see
https://cwiki.apache.org/confluence/display/CTAKES/cTAKES+3.1.2+-+Semantic+Similarity):

These concept graphs are included in the ytex resources zip (see
https://cwiki.apache.org/confluence/display/CTAKES/YTEX+Installation):

3) Unzip YTEX Resources (Optional - UTS login required)

Download ctakes-ytex-resources-3.1.2-SNAPSHOT.zip from
http://www.ytex-nlp.org/umls.download/secure/3.1/ctakes-ytex-resources-3.1.2-SNAPSHOT.zip
and unzip it 'over' your installation. This contains:

- Concept Graphs derived from the UMLS2013AA used to compute semantic
similarity measures

HTH,

-vj

On Sun, Jun 29, 2014 at 2:25 PM, John Green john.travis.gr...@gmail.com
wrote:

I got the semantic similarity web app running in ytex. I'm still learning
UMLS terminology, but I believe it says that out of the box its concept
graphs are limited to the free set from UMLS? Does this mean without
permissions, similar to cTAKES with UMLS rights? The concepts available
seem limited, so this would make sense.

So, to take full advantage I would need to rebuild the concept graph,
correct? I'm in the process of doing this but am getting classpath errors.
I used Java a bit ten years ago, so you can probably guess these will take
me a minute to resolve. Notably, it is complaining about
ytex.kernel.dao.ConceptDaoImpl.

Thanks all,

—
Sent from Mailbox for iPhone
