Hi Sean,

I finally had a chance to look further at the SNOMEDCT issue, where the codingScheme is populated with the default value. Here is what I found in the dictionary tool: when CodeMapCreator runs and CuiCodesDbWriter is called, the collection uses the name passed into the method, which is SNOMEDCT. However, if you are using SNOMEDCT_US, the collection name is SNOMEDCT_US instead of SNOMEDCT, so the hsqldb is never populated. Obviously an easy change to make, but I thought it might be helpful feedback.

Thanks,
Brandon

-----Original Message-----
From: Finan, Sean [mailto:sean.fi...@childrens.harvard.edu]
Sent: Monday, September 21, 2015 10:39 AM
To: dev@ctakes.apache.org
Subject: RE: Fast Dictionary Update

Hi Brandon,

Sorry for the late reply - I've been out for an extended weekend.

The coding scheme change is fairly simply explained (imo). The plain old CUI is not a snomed code. If the snomed codes are reported by ctakes (uncomment the snomed line in cTakesHsql.xml) then their UmlsConcept entries in the ontology array have the coding scheme name "SNOMEDCT".

    <!-- Optional tables for optional term info.
         Uncommenting these lines alone may not persist term information;
         persistence depends upon the TermConsumer. -->
    <property key="snomedTable" value="snomedct"/>

Basically, the "CTAKES" name indicates that the scheme only contains Umls Cuis that have TUIs of the default ctakes configuration. ctakes does not use all umls tuis, therefore I did not name the scheme "UMLS". If you make a custom scheme (etc.) you can change the name in cTakesHsql.xml or in a custom .xml:

    <!-- Depending upon the consumer, the value of codingScheme may or may not be used.
         With the packaged consumers, codingScheme is a default value used only
         for cuis that do not have secondary codes (snomed, rxnorm, etc.) -->
    <property key="codingScheme" value="CTAKES"/>

The "RelationsExtractor" in the dictionary creator tool is completely experimental and unfinished - but perhaps some day it will throw umls relations into a format that ctakes can directly use. For the time being it should be avoided.

Sean

-----Original Message-----
From: Geise, Brandon D. [mailto:bdge...@geisinger.edu]
Sent: Thursday, September 17, 2015 10:23 PM
To: dev@ctakes.apache.org
Subject: RE: Fast Dictionary Update

You can disregard my question about the relation extraction, as I fixed this by building the new dictionary with the default data files in the dictionarytool. I am still curious about the SNOMED change though.

Thanks,
Brandon

-----Original Message-----
From: Geise, Brandon D.
Sent: Thursday, September 17, 2015 9:40 PM
To: cTAKES Developer list <dev@ctakes.apache.org>
Subject: RE: Fast Dictionary Update

Thanks Dmitriy. I was referring to the RelationsExtractor class found in the dictionarytool.

On a similar note, the coding scheme for all SNOMEDCT codes for the new dictionary is CTAKES, compared to SNOMED with the UMLS version packaged with cTakes. Is there something else I need to run for the dictionary creation that I'm missing?

Thanks,
Brandon

-----Original Message-----
From: Dligach, Dmitriy [mailto:dmitriy.dlig...@childrens.harvard.edu]
Sent: Thursday, September 17, 2015 8:42 PM
To: cTAKES Developer list <dev@ctakes.apache.org>
Subject: Re: Fast Dictionary Update

Hi Brandon,

Relation extraction at the moment only handles two specific relation types: LocationOf and DegreeOf. You are welcome to run it if you need these specific relations.

Dima

--
Dmitriy (Dima) Dligach, Ph.D.
Boston Children's Hospital and Harvard Medical School
(617) 651-0397

On Sep 17, 2015, at 17:08, Geise, Brandon D. <bdge...@geisinger.edu> wrote:

Does the RelationsExtractor need to be run in order to generate information on relationships from cTakes? When running with the 2011 UMLS dictionary I'm able to get relationships for BodyLocationMentions, but with the dictionary I created I am not getting this information. Any advice?

Thanks,
Brandon

-----Original Message-----
From: Finan, Sean [mailto:sean.fi...@childrens.harvard.edu]
Sent: Thursday, September 17, 2015 1:18 PM
To: dev@ctakes.apache.org
Subject: RE: Fast Dictionary Update

It claims that the database is connected, and the preceding line of dots is spat out during loading, which took ~3-4 seconds (so something was there):

    ............
    17 Sep 2015 12:58:58 INFO JdbcConnectionFactory - Database connected

Strange. I don't really know what to tell you right now.
Perhaps something will click with me later ...

Did you also run org.apache.ctakes.dictionarytool.CodeMapCreator? It isn't strictly necessary, but it stores the tuis in the database so that cTakes can identify the semantic group of a mention.

-----Original Message-----
From: Geise, Brandon D. [mailto:bdge...@geisinger.edu]
Sent: Thursday, September 17, 2015 1:02 PM
To: dev@ctakes.apache.org
Subject: RE: Fast Dictionary Update

Not specifically loaded. Here's what I see when loading the pipeline:

    17 Sep 2015 12:58:54 INFO JdbcConnectionFactory - Connecting to jdbc:hsqldb:file:path/to/ctakes/ctakes-dictionary-lookup-fast-res/src/main/resources/org/apache/ctakes/dictionary/lookup/fast/UMLS2015/snorx2015:
    ............
    17 Sep 2015 12:58:58 INFO JdbcConnectionFactory - Database connected

-----Original Message-----
From: Finan, Sean [mailto:sean.fi...@childrens.harvard.edu]
Sent: Thursday, September 17, 2015 12:57 PM
To: dev@ctakes.apache.org
Subject: RE: Fast Dictionary Update

Making an alternate copy of cTakesHsql.xml and pointing to the new dictionary is all that is necessary. Do you see a message in the initialization output indicating that the dictionary db has been loaded?

-----Original Message-----
From: Geise, Brandon D. [mailto:bdge...@geisinger.edu]
Sent: Thursday, September 17, 2015 12:54 PM
To: dev@ctakes.apache.org
Subject: RE: Fast Dictionary Update

Great, thanks. Both seemed to work for populating the script table. Besides the path to the new dictionary needing to be changed in cTakesHsql.xml, does anything else need to be modified to use the new dictionary? My pipeline runs; however, there aren't any annotations related to the UMLS concepts. The only annotations I'm seeing are date, roman numeral, or modifier related. (My pipeline is UMLSFastProcessor with additions for modifiers and templatefiller.) Any suggestions would be appreciated.

Thanks,
Brandon

-----Original Message-----
From: Finan, Sean [mailto:sean.fi...@childrens.harvard.edu]
Sent: Thursday, September 17, 2015 10:40 AM
To: dev@ctakes.apache.org
Subject: RE: Fast Dictionary Update

Correct, Hsql should automatically read the .log file upon first use, and then perform the inserts into the .script file.

In case you want to play it safe, check the README in the resource/ directory (where you got the hsqldb template). The last paragraph indicates how you can launch a simple sql tool to play with the db. You will need to change the name of the db accordingly. Upon first launch of the sql tool everything should be moved from the .log to the .script file.

It is a strange setup/workflow, but it seems to work.

Sean

-----Original Message-----
From: Geise, Brandon D. [mailto:bdge...@geisinger.edu]
Sent: Thursday, September 17, 2015 10:31 AM
To: dev@ctakes.apache.org
Subject: RE: Fast Dictionary Update

When I run the tool it outputs a file with a .log extension that has all the insert statements. Do I copy this to the .script template from memcachedb in the dictionarytool project or should the inserts be put into the .script file by default on the program execution?

Thanks,
Brandon

-----Original Message-----
From: Finan, Sean [mailto:sean.fi...@childrens.harvard.edu]
Sent: Wednesday, September 16, 2015 9:59 PM
To: dev@ctakes.apache.org
Subject: RE: Fast Dictionary Update

Excellent!

-----Original Message-----
From: Geise, Brandon D. [mailto:bdge...@geisinger.edu]
Sent: Wednesday, September 16, 2015 9:55 PM
To: dev@ctakes.apache.org
Subject: RE: Fast Dictionary Update

No, I had changed it on the Tiny source file. I just changed the default file and it looks to be running as expected now.

Thank you for all your help and patience,
Brandon

-----Original Message-----
From: Finan, Sean [mailto:sean.fi...@childrens.harvard.edu]
Sent: Wednesday, September 16, 2015 9:35 PM
To: dev@ctakes.apache.org
Subject: RE: Fast Dictionary Update

Did you add it to data/default/CtakesSources.txt? If not then you need to specify

    -src ./data/tiny/CtakesSources.txt

Sorry for any confusion. As soon as my inet isn't overloaded I'll download 2015AA and see if I can build a dictionary.

-----Original Message-----
From: Geise, Brandon D. [mailto:bdge...@geisinger.edu]
Sent: Wednesday, September 16, 2015 8:14 PM
To: dev@ctakes.apache.org; dev@ctakes.apache.org
Subject: RE: Fast Dictionary Update

Sean,

I added that and still had the same issue.

Thanks,
Brandon

_____________________________
From: Finan, Sean <sean.fi...@childrens.harvard.edu>
Sent: Wednesday, September 16, 2015 7:56 PM
Subject: RE: Fast Dictionary Update
To: <dev@ctakes.apache.org>

And you added "SNOMEDCT_US" to data/tiny/CtakesSources.txt?

-----Original Message-----
From: Tomasz Oliwa [mailto:ol...@uchicago.edu]
Sent: Wednesday, September 16, 2015 7:13 PM
To: dev@ctakes.apache.org
Subject: RE: Fast Dictionary Update

I have exactly the same problem with the tool. A grep on MRCONSO.RRF for "SNOMEDCT" or for "SNOMEDCT_US" shows many lines.

________________________________________
From: Geise, Brandon D. [bdge...@geisinger.edu]
Sent: Wednesday, September 16, 2015 5:05 PM
To: dev@ctakes.apache.org
Subject: RE: Fast Dictionary Update

Yes, it finds "SNOMEDCT_US".

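A rough cross-check on the grep results discussed in this thread: grep matches "SNOMEDCT" anywhere in a row, so it cannot tell SNOMEDCT apart from SNOMEDCT_US. The sketch below instead tallies the source-abbreviation (SAB) column by exact value. It assumes the standard pipe-delimited MRCONSO.RRF layout with SAB as the twelfth field; the function name `count_sources` and the two sample rows are made up for illustration and are not part of the dictionary tool.

```python
from collections import Counter
from typing import Iterable

# Assumption: 0-based position of the SAB (source abbreviation) field in a
# pipe-delimited MRCONSO.RRF row, following the standard UMLS column order.
SAB_INDEX = 11


def count_sources(lines: Iterable[str], prefix: str = "SNOMEDCT") -> Counter:
    """Count rows per exact SAB value for SABs that start with `prefix`."""
    counts = Counter()
    for line in lines:
        fields = line.rstrip("\n").split("|")
        # Skip malformed rows that are too short to contain a SAB field.
        if len(fields) > SAB_INDEX and fields[SAB_INDEX].startswith(prefix):
            counts[fields[SAB_INDEX]] += 1
    return counts


# Two invented rows in MRCONSO layout (hypothetical values, not real UMLS data).
sample = [
    "C0000001|ENG|P|L0000001|PF|S0000001|Y|A0000001||1001||SNOMEDCT_US|PT|1001|example term one|9|N||",
    "C0000002|ENG|P|L0000002|PF|S0000002|Y|A0000002||2002||RXNORM|IN|2002|example term two|0|N||",
]
result = count_sources(sample)
print(dict(result))
```

To run it against a real release, pass an open file handle for MRCONSO.RRF instead of `sample`. If the only key that comes back is SNOMEDCT_US, that would be consistent with the reports elsewhere in this thread that CtakesSources.txt needs a SNOMEDCT_US entry.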
-----Original Message-----
From: Finan, Sean [mailto:sean.fi...@childrens.harvard.edu]
Sent: Wednesday, September 16, 2015 5:17 PM
To: dev@ctakes.apache.org
Subject: RE: Fast Dictionary Update

Ah, now I see what you mean. Can you do a grep on your MRCONSO.RRF for "SNOMEDCT"?

-----Original Message-----
From: Geise, Brandon D. [mailto:bdge...@geisinger.edu]
Sent: Wednesday, September 16, 2015 4:04 PM
To: dev@ctakes.apache.org
Subject: RE: Fast Dictionary Update

I tried changing as suggested. Below is what I see for the snomed piece, but for RXNorm it writes terms at the end.

Reading list of Source Types from ./data/default/CtakesSources.txt
File Lines 1 list of Source Types 1
Reading list of Tuis from ./data/tiny/CtakesSnomedTuis.txt
File Lines 24 list of Tuis 24
Compiling list of Cuis with wanted Tuis using /patto/UMLS_Current_Version/META/MRSTY.RRF
File Line 200000 Cuis 60895
File Line 300000 Cuis 85750
File Line 400000 Cuis 135098
File Line 600000 Cuis 183925
File Line 1700000 Cuis 376338
File Line 1800000 Cuis 471009
File Line 1900000 Cuis 568375
File Line 2100000 Cuis 674715
File Line 2800000 Cuis 903583
File Line 3300000 Cuis 973791
File Lines 3370173 Cuis 999451
..................................................File Line 100000 Valid Cuis 0
..................................................File Line 200000 Valid Cuis 0
[... the same progress line repeats every 100000 lines, always with Valid Cuis 0 ...]
..................................................File Line 8800000 Valid Cuis 0
.............File Lines 8827152 Valid Cuis 0
Compiling map of Umls Cuis and Texts
..................................................File Line 100000 Terms 0
..................................................File Line 200000 Terms 0
[... the same progress line repeats every 100000 lines, always with Terms 0 ...]
..................................................File Line 8800000 Terms 0
.............File Line 8827152 Terms 0
Writing map of Cuis and Texts to pathtoUmls2015.bsv

-----Original Message-----
From: Finan, Sean [mailto:sean.fi...@childrens.harvard.edu]
Sent: Wednesday, September 16, 2015 4:00 PM
To: dev@ctakes.apache.org
Subject: RE: Fast Dictionary Update

Thank you!
I believe that was a change post 2011! You should actually be ok with both SNOMEDCT and SNOMEDCT_US in CtakesSources.txt

Cheers,
Sean

-----Original Message-----
From: Maite Meseure Hugues [mailto:meseure.ma...@gmail.com]
Sent: Wednesday, September 16, 2015 3:43 PM
To: dev@ctakes.apache.org
Subject: Re: Fast Dictionary Update

If this helps, I had to replace 'SNOMEDCT' with 'SNOMEDCT_US' in CtakesSources.txt.

On Wed, Sep 16, 2015 at 2:33 PM, Finan, Sean <sean.fi...@childrens.harvard.edu> wrote:

I'm not sure that I understand your question. As I sent it, the anat, snomed and rxnorm are not separate runs. The args line I sent earlier is for a single run that will create a dictionary with snomed and rxnorm terms. The anatomy tui list has a special use in correctly processing snomed codes.

-----Original Message-----
From: Geise, Brandon D. [mailto:bdge...@geisinger.edu]
Sent: Wednesday, September 16, 2015 3:27 PM
To: dev@ctakes.apache.org
Subject: RE: Fast Dictionary Update

Ok, hopefully one last question. Based on your example everything runs; however, the Anat and Snomed runs don't produce any valid CUIs but RXNorm does. I'm not sure if this has anything to do with it, but every UMLS source read is against MRSTY. Here's my command:

    java -cp dictionarytool.jar;lib/* org.apache.ctakes.dictionarytool.DictionaryCreator2 -umls /path/to/UMLS/META -fd ./data/tiny -atui ./data/tiny/CtakesAnatTuis.txt -tui ./data/tiny/CtakesSnomedTuis.txt -ol path o ileUmls2015.bsv

Any suggestions?

Thanks again,
Brandon

-----Original Message-----
From: Finan, Sean [mailto:sean.fi...@childrens.harvard.edu]
Sent: Wednesday, September 16, 2015 3:05 PM
To: dev@ctakes.apache.org
Subject: RE: Fast Dictionary Update

Yes, that will make the rare word dictionary in a memory-based hsql database - the same as the default for the dictionary-lookup-fast module.

-----Original Message-----
From: Geise, Brandon D. [mailto:bdge...@geisinger.edu]
Sent: Wednesday, September 16, 2015 2:42 PM
To: dev@ctakes.apache.org
Subject: RE: Fast Dictionary Update

Thanks Sean, much appreciated. To clarify, the example below would create the dictionary for use with the rare word approach?

Thanks,
Brandon

-----Original Message-----
From: Finan, Sean [mailto:sean.fi...@childrens.harvard.edu]
Sent: Wednesday, September 16, 2015 2:16 PM
To: dev@ctakes.apache.org
Subject: RE: Fast Dictionary Update

Hi Brandon,

I just checked in a bin/dictionarytool.zip. It should have everything that you need (.jar, lib/, data/).

    java -cp dictionarytool.jar;lib/* org.apache.ctakes.dictionarytool.DictionaryCreator2 [args]

Should do the trick.

To recreate a 2015 version of the current ctakes dictionary, the arguments are:

    -umls my/path/to/2015AA/META -fd ./data/tiny -atui ./data/tiny/CtakesAnatTuis.txt -tui ./data/tiny/CtakesSnomedTuis.txt -db jdbc:hsqldb:file:my/path/to/snorx2015 -tbl CUI_TERMS

Create my/path/to/snorx2015 by copying resources/memdbtemplate/ctakesumls.properties to my/path/to/snorx2015.properties - there is a resources/README about this.

Before populating a DB, I usually do a trial run first, writing to a flat file. Replace "-db ... -tbl ..." with "-ol my/path/to/testout.bsv"

Sean

-----Original Message-----
From: Geise, Brandon D. [mailto:bdge...@geisinger.edu]
Sent: Wednesday, September 16, 2015 1:49 PM
To: dev@ctakes.apache.org
Subject: RE: Fast Dictionary Update

Hi Sean,

That'd be great. I think I'm building it incorrectly because after I build the jar and try to run specifying DictionaryCreator2 as the main class it says it can't find it. I'm not too familiar with Java and building projects/jars so it could be my ignorance causing the problem.

Thanks,
Brandon

-----Original Message-----
From: Finan, Sean [mailto:sean.fi...@childrens.harvard.edu]
Sent: Wednesday, September 16, 2015 1:45 PM
To: dev@ctakes.apache.org
Subject: RE: Fast Dictionary Update

Hi Brandon,

I can send you a jar or commit one pre-built. What goes wrong when you try to build the tool?

Sean

-----Original Message-----
From: Geise, Brandon D. [mailto:bdge...@geisinger.edu]
Sent: Wednesday, September 16, 2015 1:23 PM
To: 'dev@ctakes.apache.org'
Subject: Fast Dictionary Update

Does someone have the DictionaryTool jar available? I'm having trouble creating the jar file from the project and would like to be able to create an updated UMLS fast dictionary for 2015.

Thanks,
Brandon

IMPORTANT WARNING: The information in this message (and the documents attached to it, if any) is confidential and may be legally privileged. It is intended solely for the addressee. Access to this message by anyone else is unauthorized. If you are not the intended recipient, any disclosure, copying, distribution or any action taken, or omitted to be taken, in reliance on it is prohibited and may be unlawful. If you have received this message in error, please delete all electronic copies of this message (and the documents attached to it, if any), destroy any hard copies you may have created and notify me immediately by replying to this email. Thank you.

Geisinger Health System utilizes an encryption process to safeguard Protected Health Information and other confidential data contained in external e-mail messages. If email is encrypted, the recipient will receive an e-mail instructing them to sign on to the Geisinger Health System Secure E-mail Message Center to retrieve the encrypted e-mail.