Hi,

The built-in cTAKES Fast Dictionary (UMLS 2011) contains about 490,000 rows 
(each synonym of the same CUI counted as a row), while the 2015 UMLS Fast 
Dictionary created via the dictionarytool has about 660,000 rows.

I noticed that CUIs tend to have more synonyms in the 2015 UMLS Fast 
Dictionary, which I suppose is what increases the row count. For instance, 
the CUI C0231749 "knee pain" has 15 rows in the default cTAKES UMLS 
dictionary but 24 rows in the 2015 one.

How can I control which subset of synonyms is taken by the dictionarytool per 
CUI when the Fast Dictionary is created?

The UMLS Metathesaurus itself has (many) more than 24 synonyms for C0231749, so 
I imagine the subset can be set up somewhere in the dictionarytool?
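
To make the comparison concrete, here is a minimal sketch (Python, not part of 
the dictionarytool) that counts the distinct strings per CUI in MRCONSO.RRF 
for a chosen list of sources. The column positions follow the published 
MRCONSO layout; the function name, the lower-casing of strings, and the sample 
rows are my own assumptions for illustration, and the sketch does not claim to 
reproduce whatever filtering the dictionarytool actually applies.

```python
from collections import defaultdict

# MRCONSO.RRF is pipe-delimited; positions follow the UMLS file layout.
CUI, LAT, SAB, STR = 0, 1, 11, 14


def synonyms_by_cui(lines, sources, language="ENG"):
    """Map each CUI to the set of lower-cased strings from the wanted sources."""
    wanted = set(sources)
    result = defaultdict(set)
    for line in lines:
        fields = line.rstrip("\n").split("|")
        if len(fields) <= STR:
            continue  # skip malformed rows
        if fields[LAT] == language and fields[SAB] in wanted:
            result[fields[CUI]].add(fields[STR].lower())
    return result


# Made-up rows in MRCONSO layout (hypothetical CUI and codes, not real UMLS content).
sample = [
    "C0000001|ENG|P|L1|PF|S1|Y|A1||||SNOMEDCT_US|PT|111|Knee pain|0|N||",
    "C0000001|ENG|S|L2|PF|S2|Y|A2||||SNOMEDCT_US|SY|111|Pain in knee|0|N||",
    "C0000001|ENG|S|L3|PF|S3|Y|A3||||MSH|ET|D01|Knee ache|0|N||",
    "C0000001|SPA|P|L4|PF|S4|Y|A4||||SCTSPA|PT|111|Dolor de rodilla|0|N||",
]

snomed_only = synonyms_by_cui(sample, ["SNOMEDCT_US"])
with_mesh = synonyms_by_cui(sample, ["SNOMEDCT_US", "MSH"])
print(len(snomed_only["C0000001"]), len(with_mesh["C0000001"]))
```

Running the same function over the 2011 and 2015 MRCONSO.RRF files with the 
same source list would show whether the extra rows come from the sources 
themselves or from a different selection of sources.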

Thanks,
Tomasz 


________________________________________
From: Finan, Sean [sean.fi...@childrens.harvard.edu]
Sent: Monday, October 19, 2015 9:02 AM
To: dev@ctakes.apache.org
Subject: RE: Fast Dictionary Update

Hi Brandon,

Good catch, and thanks for letting me know.  Feel free to check in a fix, 
otherwise it will probably be a while before I get to it.

Thanks,
Sean

-----Original Message-----
From: Geise, Brandon D. [mailto:bdge...@geisinger.edu]
Sent: Monday, October 19, 2015 8:50 AM
To: dev@ctakes.apache.org
Subject: RE: Fast Dictionary Update

Hi Sean,

I finally had a chance to look further at the SNOMEDCT issue, where the 
codingScheme is populated with the default value.  What I found is that when 
the dictionary tool runs the CodeMapCreator and the CuiCodesDbWriter is called, 
the collection uses the name passed into the method, which is SNOMEDCT.  
However, if you are using SNOMEDCT_US, the collection name is SNOMEDCT_US 
instead of SNOMEDCT, so the hsqldb is never populated.  Obviously an easy 
change to make, but I thought it might be helpful feedback.
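
As an illustration of the mismatch only (this is a Python sketch, not the 
cTAKES code; the alias table, function names, and code values are 
hypothetical), an exact-name lookup for SNOMEDCT finds nothing when the codes 
were collected under SNOMEDCT_US, while a lookup that treats the two names as 
aliases does:

```python
# Hypothetical alias table: treat SNOMEDCT_US as the same scheme as SNOMEDCT.
SOURCE_ALIASES = {"SNOMEDCT_US": "SNOMEDCT"}


def canonical_source(name):
    """Return the canonical scheme name for a source abbreviation."""
    return SOURCE_ALIASES.get(name, name)


def codes_for(collections, wanted):
    """Gather codes from every collection whose canonical name matches."""
    target = canonical_source(wanted)
    codes = []
    for name, values in collections.items():
        if canonical_source(name) == target:
            codes.extend(values)
    return codes


# Made-up collections keyed by the source name found in the UMLS files.
collections = {"SNOMEDCT_US": ["30989003"], "RXNORM": ["161"]}

exact = collections.get("SNOMEDCT", [])        # exact-name lookup: empty
tolerant = codes_for(collections, "SNOMEDCT")  # alias-aware lookup
print(exact, tolerant)
```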

Thanks,
Brandon

-----Original Message-----
From: Finan, Sean [mailto:sean.fi...@childrens.harvard.edu]
Sent: Monday, September 21, 2015 10:39 AM
To: dev@ctakes.apache.org
Subject: RE: Fast Dictionary Update

Hi Brandon,

Sorry for the late reply - I've been out for an extended weekend.

The coding scheme change is fairly simply explained (imo).  The plain old CUI 
is not a snomed code.  If the snomed codes are reported by ctakes (uncomment 
the snomed line in cTakesHsql.xml), then their UmlsConcept entries in the 
ontology array have the coding scheme name "SNOMEDCT".
            <!-- Optional tables for optional term info.
            Uncommenting these lines alone may not persist term information;
            persistence depends upon the TermConsumer.  -->
            <property key="snomedTable" value="snomedct"/>

Basically, the "CTAKES" name indicates that the scheme only contains Umls Cuis 
that have TUIs of the default ctakes configuration.  ctakes does not use all 
umls tuis, therefore I did not name the scheme "UMLS".  If you make a custom 
scheme (etc.) you can change the name in cTakesHsql.xml or in a custom .xml
          <!-- Depending upon the consumer, the value of codingScheme may or may not be used.
          With the packaged consumers, codingScheme is a default value used only for cuis
          that do not have secondary codes (snomed, rxnorm, etc.)  -->
          <property key="codingScheme" value="CTAKES"/>


The "RelationsExtractor" in the dictionary creator tool is completely 
experimental and unfinished - but perhaps some day it will throw umls relations 
into a format that ctakes can directly use.  For the time being it should be 
avoided.

Sean

-----Original Message-----
From: Geise, Brandon D. [mailto:bdge...@geisinger.edu]
Sent: Thursday, September 17, 2015 10:23 PM
To: dev@ctakes.apache.org
Subject: RE: Fast Dictionary Update

You can disregard my question about the relation extraction as I fixed this by 
building the new dictionary with the default data files in the dictionarytool.  
I am curious about the SNOMED change still though.

Thanks,
Brandon

-----Original Message-----
From: Geise, Brandon D.
Sent: Thursday, September 17, 2015 9:40 PM
To: cTAKES Developer list <dev@ctakes.apache.org>
Subject: RE: Fast Dictionary Update

Thanks Dmitriy.  I was referring to the RelationsExtractor class found in the 
dictionarytool.  On a similar note, the coding scheme for all SNOMEDCT codes 
for the new dictionary is CTAKES compared to SNOMED with the UMLS version 
packaged with cTakes.  Is there something else I need to run for the dictionary 
creation that I'm missing?

Thanks,
Brandon

-----Original Message-----
From: Dligach, Dmitriy [mailto:dmitriy.dlig...@childrens.harvard.edu]
Sent: Thursday, September 17, 2015 8:42 PM
To: cTAKES Developer list <dev@ctakes.apache.org>
Subject: Re: Fast Dictionary Update

Hi Brandon,

Relation extraction at the moment only handles two specific relation types: 
LocationOf and DegreeOf. You are welcome to run it if you need these specific 
relations.


Dima

--
Dmitriy (Dima) Dligach, Ph.D.
Boston Children's Hospital and Harvard Medical School
(617) 651-0397



On Sep 17, 2015, at 17:08, Geise, Brandon D. 
<bdge...@geisinger.edu> wrote:

Does the RelationsExtractor need to be run in order to generate information on 
relationships from cTakes?  When running with the 2011 UMLS dictionary I'm able 
to get relationships for BodyLocationMentions, but with the dictionary I 
created I am not getting this information.  Any advice?

Thanks,
Brandon

-----Original Message-----
From: Finan, Sean [mailto:sean.fi...@childrens.harvard.edu]
Sent: Thursday, September 17, 2015 1:18 PM
To: dev@ctakes.apache.org
Subject: RE: Fast Dictionary Update

It claims that the database is connected, and the preceding line of dots is 
printed during loading, which took ~3-4 seconds (so something was there):
............
17 Sep 2015 12:58:58  INFO JdbcConnectionFactory -  Database connected

Strange.  I don't really know what to tell you right now.  Perhaps something 
will click with me later ...


Did you also run org.apache.ctakes.dictionarytool.CodeMapCreator ?  It isn't 
strictly necessary but it stores the tuis in the database so that cTakes can 
identify the semantic group of a mention.




-----Original Message-----
From: Geise, Brandon D. [mailto:bdge...@geisinger.edu]
Sent: Thursday, September 17, 2015 1:02 PM
To: dev@ctakes.apache.org
Subject: RE: Fast Dictionary Update

Not specifically loaded.  Here's what I see when loading the pipeline:

17 Sep 2015 12:58:54  INFO JdbcConnectionFactory - Connecting to 
jdbc:hsqldb:file:path/to/ctakes/ctakes-dictionary-lookup-fast-res/src/main/resources/org/apache/ctakes/dictionary/lookup/fast/UMLS2015/snorx2015:
............
17 Sep 2015 12:58:58  INFO JdbcConnectionFactory -  Database connected

-----Original Message-----
From: Finan, Sean [mailto:sean.fi...@childrens.harvard.edu]
Sent: Thursday, September 17, 2015 12:57 PM
To: dev@ctakes.apache.org
Subject: RE: Fast Dictionary Update

Making an alternate copy of cTakesHsql.xml and pointing to the new dictionary 
is all that is necessary.  Do you see a message in the initialization output 
indicating that the dictionary db has been loaded?

-----Original Message-----
From: Geise, Brandon D. [mailto:bdge...@geisinger.edu]
Sent: Thursday, September 17, 2015 12:54 PM
To: dev@ctakes.apache.org
Subject: RE: Fast Dictionary Update

Great, thanks. Both seemed to work for populating the script table.

Besides the path to the new dictionary needing to be changed in cTakesHsql.xml, 
does anything else need to be modified to use the new dictionary?  My pipeline 
runs; however, there aren't any annotations related to the UMLS concepts.  The 
only annotations I'm seeing are date, roman numeral, or modifier related. (My 
pipeline is UMLSFastProcessor with additions for modifiers and templatefiller.) 
Any suggestions would be appreciated.

Thanks,
Brandon

-----Original Message-----
From: Finan, Sean [mailto:sean.fi...@childrens.harvard.edu]
Sent: Thursday, September 17, 2015 10:40 AM
To: dev@ctakes.apache.org
Subject: RE: Fast Dictionary Update

Correct, Hsql should automatically read the .log file upon first use, and then 
perform the inserts into the .script file.

In case you want to play it safe, check the README in the resource/ directory 
(where you got the hsqldb template).  The last paragraph indicates how you can 
launch a simple sql tool to play with the db.  You will need to change the name 
of the db accordingly.  Upon first launch of the sql tool everything should be 
moved from the .log to the .script file.   It is a strange setup/workflow, but 
it seems to work.

Sean

-----Original Message-----
From: Geise, Brandon D. [mailto:bdge...@geisinger.edu]
Sent: Thursday, September 17, 2015 10:31 AM
To: dev@ctakes.apache.org
Subject: RE: Fast Dictionary Update

When I run the tool it outputs a file with a .log extension that has all the 
insert statements.  Do I copy this to the .script template from memcachedb in 
the dictionarytool project or should the inserts be put into the .script file 
by default on the program execution?

Thanks,
Brandon

-----Original Message-----
From: Finan, Sean [mailto:sean.fi...@childrens.harvard.edu]
Sent: Wednesday, September 16, 2015 9:59 PM
To: dev@ctakes.apache.org
Subject: RE: Fast Dictionary Update

Excellent!

-----Original Message-----
From: Geise, Brandon D. [mailto:bdge...@geisinger.edu]
Sent: Wednesday, September 16, 2015 9:55 PM
To: dev@ctakes.apache.org
Subject: RE: Fast Dictionary Update

No, I had changed it on the Tiny source file.  I just changed the default file 
and it looks to be running as expected now.

Thank you for all your help and patience,
Brandon

-----Original Message-----
From: Finan, Sean [mailto:sean.fi...@childrens.harvard.edu]
Sent: Wednesday, September 16, 2015 9:35 PM
To: dev@ctakes.apache.org
Subject: RE: Fast Dictionary Update

Did you add it to data/default/CtakesSources.txt ?

If not then you need to specify -src ./data/tiny/CtakesSources.txt

Sorry for any confusion.

As soon as my inet isn't overloaded I'll download 2015AA and see if I can build 
a dictionary.

-----Original Message-----
From: Geise, Brandon D. [mailto:bdge...@geisinger.edu]
Sent: Wednesday, September 16, 2015 8:14 PM
To: dev@ctakes.apache.org
Subject: RE: Fast Dictionary Update

Sean,

I added that and still had the same issue.

Thanks,
Brandon
_____________________________
From: Finan, Sean <sean.fi...@childrens.harvard.edu>
Sent: Wednesday, September 16, 2015 7:56 PM
Subject: RE: Fast Dictionary Update
To: dev@ctakes.apache.org


And you added "SNOMEDCT_US" to data/tiny/CtakesSources.txt ?

-----Original Message-----
From: Tomasz Oliwa [mailto:ol...@uchicago.edu]
Sent: Wednesday, September 16, 2015 7:13 PM
To: dev@ctakes.apache.org
Subject: RE: Fast Dictionary Update

I have exactly the same problem with the tool.

A grep on MRCONSO.RRF for "SNOMEDCT" or for "SNOMEDCT_US" shows many lines.
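
One caveat with the grep: the pattern "SNOMEDCT" also matches lines whose 
source is SNOMEDCT_US, so the two cannot be told apart that way. A minimal 
sketch (Python; the function name and the sample rows are assumptions for 
illustration) that tallies the exact source abbreviation column instead:

```python
from collections import Counter

SAB = 11  # source abbreviation column in the pipe-delimited MRCONSO.RRF


def count_sources(lines, prefix="SNOMEDCT"):
    """Count rows per exact source abbreviation that starts with the prefix."""
    counts = Counter()
    for line in lines:
        fields = line.split("|")
        if len(fields) > SAB and fields[SAB].startswith(prefix):
            counts[fields[SAB]] += 1
    return counts


# Made-up rows in MRCONSO layout (not real UMLS content).
sample = [
    "C0000001|ENG|P|L1|PF|S1|Y|A1||||SNOMEDCT_US|PT|111|Knee pain|0|N||",
    "C0000001|ENG|S|L2|PF|S2|Y|A2||||SNOMEDCT_US|SY|111|Pain in knee|0|N||",
    "C0000002|ENG|P|L3|PF|S3|Y|A3||||RXNORM|IN|161|Acetaminophen|0|N||",
]

counts = count_sources(sample)
print(dict(counts))

# On a real file (path is a placeholder):
# with open("/path/to/UMLS/META/MRCONSO.RRF", encoding="utf-8") as handle:
#     print(count_sources(handle))
```

If the tally shows only SNOMEDCT_US and no plain SNOMEDCT, a source list that 
names only SNOMEDCT would select no rows.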

________________________________________
From: Geise, Brandon D. [bdge...@geisinger.edu]
Sent: Wednesday, September 16, 2015 5:05 PM
To: dev@ctakes.apache.org
Subject: RE: Fast Dictionary Update

Yes, it finds "SNOMEDCT_US".

-----Original Message-----
From: Finan, Sean [mailto:sean.fi...@childrens.harvard.edu]
Sent: Wednesday, September 16, 2015 5:17 PM
To: dev@ctakes.apache.org
Subject: RE: Fast Dictionary Update

Ah, now I see what you mean. Can you do a grep on your MRCONSO.RRF for 
"SNOMEDCT" ?

-----Original Message-----
From: Geise, Brandon D. [mailto:bdge...@geisinger.edu]
Sent: Wednesday, September 16, 2015 4:04 PM
To: dev@ctakes.apache.org
Subject: RE: Fast Dictionary Update

I tried changing as suggested.

Below is what I see for the snomed piece, but for RXNorm it writes terms at the 
end.

Reading list of Source Types from ./data/default/CtakesSources.txt
File Lines 1 list of Source Types 1
Reading list of Tuis from ./data/tiny/CtakesSnomedTuis.txt
File Lines 24 list of Tuis 24
Compiling list of Cuis with wanted Tuis using /patto/UMLS_Current_Version/META/MRSTY.RRF
File Line 200000 Cuis 60895
File Line 300000 Cuis 85750
File Line 400000 Cuis 135098
File Line 600000 Cuis 183925
File Line 1700000 Cuis 376338
File Line 1800000 Cuis 471009
File Line 1900000 Cuis 568375
File Line 2100000 Cuis 674715
File Line 2800000 Cuis 903583
File Line 3300000 Cuis 973791
File Lines 3370173 Cuis 999451
..................................................File Line 100000 Valid Cuis 0
..................................................File Line 200000 Valid Cuis 0
..................................................File Line 300000 Valid Cuis 0
..................................................File Line 400000 Valid Cuis 0
..................................................File Line 500000 Valid Cuis 0
..................................................File Line 600000 Valid Cuis 0
..................................................File Line 700000 Valid Cuis 0
..................................................File Line 800000 Valid Cuis 0
..................................................File Line 900000 Valid Cuis 0
..................................................File Line 1000000 Valid Cuis 0
..................................................File Line 1100000 Valid Cuis 0
..................................................File Line 1200000 Valid Cuis 0
..................................................File Line 1300000 Valid Cuis 0
..................................................File Line 1400000 Valid Cuis 0
..................................................File Line 1500000 Valid Cuis 0
..................................................File Line 1600000 Valid Cuis 0
..................................................File Line 1700000 Valid Cuis 0
..................................................File Line 1800000 Valid Cuis 0
..................................................File Line 1900000 Valid Cuis 0
..................................................File Line 2000000 Valid Cuis 0
..................................................File Line 2100000 Valid Cuis 0
..................................................File Line 2200000 Valid Cuis 0
..................................................File Line 2300000 Valid Cuis 0
..................................................File Line 2400000 Valid Cuis 0
..................................................File Line 2500000 Valid Cuis 0
..................................................File Line 2600000 Valid Cuis 0
..................................................File Line 2700000 Valid Cuis 0
..................................................File Line 2800000 Valid Cuis 0
..................................................File Line 2900000 Valid Cuis 0
..................................................File Line 3000000 Valid Cuis 0
..................................................File Line 3100000 Valid Cuis 0
..................................................File Line 3200000 Valid Cuis 0
..................................................File Line 3300000 Valid Cuis 0
..................................................File Line 3400000 Valid Cuis 0
..................................................File Line 3500000 Valid Cuis 0
..................................................File Line 3600000 Valid Cuis 0
..................................................File Line 3700000 Valid Cuis 0
..................................................File Line 3800000 Valid Cuis 0
..................................................File Line 3900000 Valid Cuis 0
..................................................File Line 4000000 Valid Cuis 0
..................................................File Line 4100000 Valid Cuis 0
..................................................File Line 4200000 Valid Cuis 0
..................................................File Line 4300000 Valid Cuis 0
..................................................File Line 4400000 Valid Cuis 0
..................................................File Line 4500000 Valid Cuis 0
..................................................File Line 4600000 Valid Cuis 0
..................................................File Line 4700000 Valid Cuis 0
..................................................File Line 4800000 Valid Cuis 0
..................................................File Line 4900000 Valid Cuis 0
..................................................File Line 5000000 Valid Cuis 0
..................................................File Line 5100000 Valid Cuis 0
..................................................File Line 5200000 Valid Cuis 0
..................................................File Line 5300000 Valid Cuis 0
..................................................File Line 5400000 Valid Cuis 0
..................................................File Line 5500000 Valid Cuis 0
..................................................File Line 5600000 Valid Cuis 0
..................................................File Line 5700000 Valid Cuis 0
..................................................File Line 5800000 Valid Cuis 0
..................................................File Line 5900000 Valid Cuis 0
..................................................File Line 6000000 Valid Cuis 0
..................................................File Line 6100000 Valid Cuis 0
..................................................File Line 6200000 Valid Cuis 0
..................................................File Line 6300000 Valid Cuis 0
..................................................File Line 6400000 Valid Cuis 0
..................................................File Line 6500000 Valid Cuis 0
..................................................File Line 6600000 Valid Cuis 0
..................................................File Line 6700000 Valid Cuis 0
..................................................File Line 6800000 Valid Cuis 0
..................................................File Line 6900000 Valid Cuis 0
..................................................File Line 7000000 Valid Cuis 0
..................................................File Line 7100000 Valid Cuis 0
..................................................File Line 7200000 Valid Cuis 0
..................................................File Line 7300000 Valid Cuis 0
..................................................File Line 7400000 Valid Cuis 0
..................................................File Line 7500000 Valid Cuis 0
..................................................File Line 7600000 Valid Cuis 0
..................................................File Line 7700000 Valid Cuis 0
..................................................File Line 7800000 Valid Cuis 0
..................................................File Line 7900000 Valid Cuis 0
..................................................File Line 8000000 Valid Cuis 0
..................................................File Line 8100000 Valid Cuis 0
..................................................File Line 8200000 Valid Cuis 0
..................................................File Line 8300000 Valid Cuis 0
..................................................File Line 8400000 Valid Cuis 0
..................................................File Line 8500000 Valid Cuis 0
..................................................File Line 8600000 Valid Cuis 0
..................................................File Line 8700000 Valid Cuis 0
..................................................File Line 8800000 Valid Cuis 0
.............File Lines 8827152 Valid Cuis 0
Compiling map of Umls Cuis and Texts
..................................................File Line 100000 Terms 0
..................................................File Line 200000 Terms 0
..................................................File Line 300000 Terms 0
..................................................File Line 400000 Terms 0
..................................................File Line 500000 Terms 0
..................................................File Line 600000 Terms 0
..................................................File Line 700000 Terms 0
..................................................File Line 800000 Terms 0
..................................................File Line 900000 Terms 0
..................................................File Line 1000000 Terms 0
..................................................File Line 1100000 Terms 0
..................................................File Line 1200000 Terms 0
..................................................File Line 1300000 Terms 0
..................................................File Line 1400000 Terms 0
..................................................File Line 1500000 Terms 0
..................................................File Line 1600000 Terms 0
..................................................File Line 1700000 Terms 0
..................................................File Line 1800000 Terms 0
..................................................File Line 1900000 Terms 0
..................................................File Line 2000000 Terms 0
..................................................File Line 2100000 Terms 0
..................................................File Line 2200000 Terms 0
..................................................File Line 2300000 Terms 0
..................................................File Line 2400000 Terms 0
..................................................File Line 2500000 Terms 0
..................................................File Line 2600000 Terms 0
..................................................File Line 2700000 Terms 0
..................................................File Line 2800000 Terms 0
..................................................File Line 2900000 Terms 0
..................................................File Line 3000000 Terms 0
..................................................File Line 3100000 Terms 0
..................................................File Line 3200000 Terms 0
..................................................File Line 3300000 Terms 0
..................................................File Line 3400000 Terms 0
..................................................File Line 3500000 Terms 0
..................................................File Line 3600000 Terms 0
..................................................File Line 3700000 Terms 0
..................................................File Line 3800000 Terms 0
..................................................File Line 3900000 Terms 0
..................................................File Line 4000000 Terms 0
..................................................File Line 4100000 Terms 0
..................................................File Line 4200000 Terms 0
..................................................File Line 4300000 Terms 0
..................................................File Line 4400000 Terms 0
..................................................File Line 4500000 Terms 0
..................................................File Line 4600000 Terms 0
..................................................File Line 4700000 Terms 0
..................................................File Line 4800000 Terms 0
..................................................File Line 4900000 Terms 0
..................................................File Line 5000000 Terms 0
..................................................File Line 5100000 Terms 0
..................................................File Line 5200000 Terms 0
..................................................File Line 5300000 Terms 0
..................................................File Line 5400000 Terms 0
..................................................File Line 5500000 Terms 0
..................................................File Line 5600000 Terms 0
..................................................File Line 5700000 Terms 0
..................................................File Line 5800000 Terms 0
..................................................File Line 5900000 Terms 0
..................................................File Line 6000000 Terms 0
..................................................File Line 6100000 Terms 0
..................................................File Line 6200000 Terms 0
..................................................File Line 6300000 Terms 0
..................................................File Line 6400000 Terms 0
..................................................File Line 6500000 Terms 0
..................................................File Line 6600000 Terms 0
..................................................File Line 6700000 Terms 0
..................................................File Line 6800000 Terms 0
..................................................File Line 6900000 Terms 0
..................................................File Line 7000000 Terms 0
..................................................File Line 7100000 Terms 0
..................................................File Line 7200000 Terms 0
..................................................File Line 7300000 Terms 0
..................................................File Line 7400000 Terms 0
..................................................File Line 7500000 Terms 0
..................................................File Line 7600000 Terms 0
..................................................File Line 7700000 Terms 0
..................................................File Line 7800000 Terms 0
..................................................File Line 7900000 Terms 0
..................................................File Line 8000000 Terms 0
..................................................File Line 8100000 Terms 0
..................................................File Line 8200000 Terms 0
..................................................File Line 8300000 Terms 0
..................................................File Line 8400000 Terms 0
..................................................File Line 8500000 Terms 0
..................................................File Line 8600000 Terms 0
..................................................File Line 8700000 Terms 0
..................................................File Line 8800000 Terms 0
.............File Line 8827152 Terms 0
Writing map of Cuis and Texts to pathtoUmls2015.bsv

-----Original Message-----
From: Finan, Sean [mailto:sean.fi...@childrens.harvard.edu]
Sent: Wednesday, September 16, 2015 4:00 PM
To: dev@ctakes.apache.org
Subject: RE: Fast Dictionary Update

Thank you! I believe that was a change post 2011! You should actually be ok 
with both SNOMEDCT and SNOMEDCT_US in CtakesSources.txt

Cheers,
Sean

-----Original Message-----
From: Maite Meseure Hugues [mailto:meseure.ma...@gmail.com]
Sent: Wednesday, September 16, 2015 3:43 PM
To: dev@ctakes.apache.org
Subject: Re: Fast Dictionary Update

If this helps, I had to replace 'SNOMEDCT' with 'SNOMEDCT_US' in 
CtakesSources.txt.

On Wed, Sep 16, 2015 at 2:33 PM, Finan, Sean 
<sean.fi...@childrens.harvard.edu> wrote:

I'm not sure that I understand your question. As I sent it, the anat, snomed 
and rxnorm are not separate runs. The args line I sent earlier is for a single 
run that will create a dictionary with snomed and rxnorm terms. The anatomy tui 
list has a special use in correctly processing snomed codes.

-----Original Message-----
From: Geise, Brandon D. [mailto:bdge...@geisinger.edu]
Sent: Wednesday, September 16, 2015 3:27 PM
To: dev@ctakes.apache.org
Subject: RE: Fast Dictionary Update

Ok, hopefully one last question.

Based on your example everything runs; however, the Anat and Snomed runs don't 
produce any valid CUIs, but RXNorm does. I'm not sure if this has anything to 
do with it, but every UMLS source read is against MRSTY.

Here's my command

java -cp dictionarytool.jar;lib/*
org.apache.ctakes.dictionarytool.DictionaryCreator2 -umls /path/to/UMLS/META 
-fd ./data/tiny -atui ./data/tiny/CtakesAnatTuis.txt -tui 
./data/tiny/CtakesSnomedTuis.txt -ol path o ileUmls2015.bsv

Any suggestions?

Thanks again,
Brandon


-----Original Message-----
From: Finan, Sean [mailto:sean.fi...@childrens.harvard.edu]
Sent: Wednesday, September 16, 2015 3:05 PM
To: dev@ctakes.apache.org
Subject: RE: Fast Dictionary Update

Yes, that will make the rare word dictionary in a memory-based hsql database - 
the same as the default for the dictionary-lookup-fast module.

-----Original Message-----
From: Geise, Brandon D. [mailto:bdge...@geisinger.edu]
Sent: Wednesday, September 16, 2015 2:42 PM
To: dev@ctakes.apache.org
Subject: RE: Fast Dictionary Update

Thanks Sean, much appreciated. To clarify, the example below would create the 
dictionary for use with the rare word approach?

Thanks,
Brandon

-----Original Message-----
From: Finan, Sean [mailto:sean.fi...@childrens.harvard.edu]
Sent: Wednesday, September 16, 2015 2:16 PM
To: dev@ctakes.apache.org
Subject: RE: Fast Dictionary Update

Hi Brandon,

I just checked in a bin/dictionarytool.zip. It should have everything that you 
need (.jar, lib/, data/).
java -cp dictionarytool.jar;lib/* org.apache.ctakes.dictionarytool.DictionaryCreator2 [args]
should do the trick.

To recreate a 2015 version of the current ctakes dictionary, the arguments
are:
-umls my/path/to/2015AA/META -fd ./data/tiny -atui 
./data/tiny/CtakesAnatTuis.txt -tui ./data/tiny/CtakesSnomedTuis.txt -db
jdbc:hsqldb:file:my/path/to/snorx2015 -tbl CUI_TERMS

Create my/path/to/snorx2015 by copying
resources/memdbtemplate/ctakesumls.properties to 
my/path/to/snorx2015.properties - there is a resources/README about this.

Before populating a DB, I usually do a trial run first, writing to a flat file. 
Replace "-db ... -tbl ..." with "-ol my/path/to/testout.bsv"
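
For checking such a trial file, a minimal sketch (Python, not part of the 
dictionarytool) that counts rows per CUI. It assumes the .bsv file is 
bar-separated with the CUI in the first column; that layout, the function 
name, and the sample rows are assumptions for illustration, so adjust them to 
what the file actually contains.

```python
from collections import Counter


def rows_per_cui(lines, separator="|"):
    """Count rows per value of the first column (assumed to be the CUI)."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        cui = line.split(separator, 1)[0]
        counts[cui] += 1
    return counts


# Made-up rows in the assumed layout (hypothetical CUIs and terms).
sample = ["C0000001|knee pain", "C0000001|pain in knee", "", "C0000002|aspirin"]

counts = rows_per_cui(sample)
total_rows = sum(counts.values())
print(total_rows, len(counts))

# On a real trial file (path is a placeholder):
# with open("my/path/to/testout.bsv", encoding="utf-8") as handle:
#     print(sum(rows_per_cui(handle).values()))
```

An empty or near-empty trial file is a cheap signal that the source or TUI 
lists selected nothing, before any database is populated.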


Sean

-----Original Message-----
From: Geise, Brandon D. [mailto:bdge...@geisinger.edu]
Sent: Wednesday, September 16, 2015 1:49 PM
To: dev@ctakes.apache.org
Subject: RE: Fast Dictionary Update

Hi Sean,

That'd be great.

I think I'm building it incorrectly because after I build the jar and try to 
run specifying DictionaryCreator2 as the main class it says it can't find it. 
I'm not too familiar with Java and building projects/jars so it could be my 
ignorance causing the problem.

Thanks,
Brandon

-----Original Message-----
From: Finan, Sean [mailto:sean.fi...@childrens.harvard.edu]
Sent: Wednesday, September 16, 2015 1:45 PM
To: dev@ctakes.apache.org
Subject: RE: Fast Dictionary Update

Hi Brandon,

I can send you a jar or commit one pre-built. What goes wrong when you try to 
build the tool?

Sean

-----Original Message-----
From: Geise, Brandon D. [mailto:bdge...@geisinger.edu]
Sent: Wednesday, September 16, 2015 1:23 PM
To: dev@ctakes.apache.org
Subject: Fast Dictionary Update

Does someone have the DictionaryTool jar available? I'm having trouble creating 
the jar file from the project and would like to be able to create an updated 
UMLS fast dictionary for 2015.

Thanks,
Brandon

