Hi Chris,

I use bsv to denote "bar separated value", also known as "pipe delimited".  I 
typically name the files with a ".bsv" extension, and they are just plain old 
boring ASCII flat files.
There should be multiple columns in the bsv file separated by the '|' 
character.  The following are all valid per-line formats:
CUI|text
CUI|TUI|text
CUI|TUI|text|preferredText
It doesn't matter which format you choose; the parser auto-detects the format 
of each line.  Starting a line with "//" or "#" indicates that it is a comment 
and should be ignored.
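
To make the per-line auto-detection concrete, here is a minimal Python sketch 
of the idea.  It is not the cTAKES parser (which is Java); the function name 
parse_bsv_line and the dictionary keys are made up for illustration, and 
skipping blank lines and lines with an unexpected column count is my own 
assumption.

```python
from typing import Optional


def parse_bsv_line(line: str) -> Optional[dict]:
    """Parse one BSV line; return None for comments and unusable lines."""
    stripped = line.strip()
    # Comment lines start with "//" or "#". Skipping blank lines is an assumption.
    if not stripped or stripped.startswith("//") or stripped.startswith("#"):
        return None
    columns = [column.strip() for column in stripped.split("|")]
    if len(columns) == 2:    # CUI|text
        cui, text = columns
        return {"cui": cui, "tui": None, "text": text, "preferred_text": None}
    if len(columns) == 3:    # CUI|TUI|text
        cui, tui, text = columns
        return {"cui": cui, "tui": tui, "text": text, "preferred_text": None}
    if len(columns) == 4:    # CUI|TUI|text|preferredText
        cui, tui, text, preferred = columns
        return {"cui": cui, "tui": tui, "text": text, "preferred_text": preferred}
    # Any other column count is treated as unusable (assumption).
    return None
```

Because the format is decided per line, a single file can mix two-, three- and 
four-column lines.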


To add the bsv dictionary to your pipeline you just need to edit the 
resources/org/apache/ctakes/dictionary/lookup/fast/cTakesHsql.xml file and add 
a couple of new sections.
Within the <dictionaries> section, add:
      <dictionary>
         <name>CustomCuiRareWord</name>
         <implementationName>org.apache.ctakes.dictionary.lookup2.dictionary.BsvRareWordDictionary</implementationName>
         <properties>
            <property key="bsvPath" value="org/apache/ctakes/dictionary/fast/example/custom_cui_tui_bsv.bsv"/>
         </properties>
      </dictionary>
Within the <conceptFactories> section, add:
      <conceptFactory>
         <name>CustomCuiConcept</name>
         <implementationName>org.apache.ctakes.dictionary.lookup2.concept.BsvConceptFactory</implementationName>
         <properties>
            <property key="bsvPath" value="org/apache/ctakes/dictionary/fast/example/custom_cui_tui_bsv.bsv"/>
         </properties>
      </conceptFactory>
Within the <dictionaryConceptPairs> section, add:
      <dictionaryConceptPair>
         <name>CustomPair</name>
         <dictionaryName>CustomCuiRareWord</dictionaryName>
         <conceptFactoryName>CustomCuiConcept</conceptFactoryName>
      </dictionaryConceptPair>
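You can change all of the Custom* names, as long as the dictionaryName and 
conceptFactoryName in the pair still match the names given to the dictionary 
and the concept factory.  You should also point bsvPath to the actual path of 
your bsv file.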
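
If you rename things, a quick sanity check can confirm the three sections 
still refer to each other.  The following Python sketch uses only the standard 
library; it is not part of cTAKES, the function name check_pairs is 
hypothetical, and the <config> wrapper in SAMPLE is a placeholder rather than 
the real root element of cTakesHsql.xml.

```python
import xml.etree.ElementTree as ET

# Trimmed-down stand-in for the edited file; <config> is a placeholder root.
SAMPLE = """
<config>
  <dictionaries>
    <dictionary><name>CustomCuiRareWord</name></dictionary>
  </dictionaries>
  <conceptFactories>
    <conceptFactory><name>CustomCuiConcept</name></conceptFactory>
  </conceptFactories>
  <dictionaryConceptPairs>
    <dictionaryConceptPair>
      <name>CustomPair</name>
      <dictionaryName>CustomCuiRareWord</dictionaryName>
      <conceptFactoryName>CustomCuiConcept</conceptFactoryName>
    </dictionaryConceptPair>
  </dictionaryConceptPairs>
</config>
"""


def check_pairs(xml_text: str) -> list:
    """Return a list of problems; an empty list means every pair resolves."""
    root = ET.fromstring(xml_text)
    dictionaries = {e.findtext("name") for e in root.iter("dictionary")}
    factories = {e.findtext("name") for e in root.iter("conceptFactory")}
    problems = []
    for pair in root.iter("dictionaryConceptPair"):
        pair_name = pair.findtext("name")
        dictionary_name = pair.findtext("dictionaryName")
        factory_name = pair.findtext("conceptFactoryName")
        if dictionary_name not in dictionaries:
            problems.append(f"{pair_name}: unknown dictionary {dictionary_name}")
        if factory_name not in factories:
            problems.append(f"{pair_name}: unknown concept factory {factory_name}")
    return problems
```

To run it against the real file, pass the file's contents as xml_text; the 
function does not depend on the root element's name.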

In addition to detecting your column count/style, upon loading the text will be 
lower-cased and tokenized, and the terms will be indexed by rare word (for fast 
lookup).   Also, you do not need to write out the whole "C1234567" or "T123" 
CUI and TUI codes.  The default prefix characters and padding zeros are 
added automatically.   CUIs "1", "01", "C1" and "C01" will all be stored as 
"C0000001", and TUIs are handled likewise.  If you have custom CUIs then it will 
honor non-"C" prefixes and still pad zeros automatically based upon the longest 
entry.  For instance, if your bsv has "CAM1", "CAM12" and "CAM12345" then the 
stored custom CUIs should be "CAM00001", "CAM00012" and "CAM12345".
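
The padding rule above can be sketched in a few lines of Python.  This is only 
an illustration of the behavior as described, not the cTAKES code; the names 
split_code and normalize_codes are hypothetical, and computing one common width 
across the whole list is an assumption.

```python
import re


def split_code(code: str, default_prefix: str) -> tuple:
    """Split a code into (prefix, digits), using the default prefix if none is given."""
    match = re.fullmatch(r"([A-Za-z]*)(\d+)", code.strip())
    if match is None:
        raise ValueError(f"not a prefix-plus-digits code: {code!r}")
    prefix, digits = match.groups()
    return (prefix or default_prefix), digits


def normalize_codes(codes, default_prefix="C", min_length=8):
    """Zero-pad every code to a common total length.

    The length is the longest entry, but never shorter than min_length
    (8 for "C1234567"-style CUIs, 4 for "T123"-style TUIs).
    """
    parts = [split_code(code, default_prefix) for code in codes]
    total = max([min_length] + [len(prefix) + len(digits) for prefix, digits in parts])
    return [prefix + digits.zfill(total - len(prefix)) for prefix, digits in parts]
```

For TUIs the same function would be called with default_prefix="T" and 
min_length=4.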

I think that is about all that there is to it ...

Sean

-----Original Message-----
From: Mattmann, Chris A (3980) [mailto:[email protected]] 
Sent: Tuesday, October 06, 2015 4:31 PM
To: [email protected]
Subject: Re: How to update cTAKES so that new top level categories come out 
based on local dictionary?

Hi Sean,

Thanks so much for your reply. For now I don’t care about the secondary
codes and I for sure have < 1000 terms. Can you tell me how to wire up
the BSV file by editing specific places in cTAKES? What specific commands
should I run or what format should the BSV file look like? I must admit
I have never heard of BSV files and the Internet varies on these between
Bluespec System Verilog and BASIC bsave files.

Then after I make the BSV file, what steps next? Recompile cTAKES? Can
I take the BSV file and simply point to it from a binary installation of
cTAKES? Thank you!

Cheers,
Chris

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-527
Email: [email protected]
WWW:  http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

-----Original Message-----
From: "Finan, Sean" <[email protected]>
Reply-To: "[email protected]" <[email protected]>
Date: Tuesday, October 6, 2015 at 8:05 AM
To: "[email protected]" <[email protected]>
Subject: RE: How to update cTAKES so that new top level categories come
out based on local dictionary?

>Hi Chris,
>
>There are a few ways to do this:
>1.  Create an additional dictionary with the terms of interest and add it
>as a source
>2.  Create a new dictionary hsqldb that contains everything, old and new
>3.  Add to the existing hsqldb dictionary
>
>The best approach for you would probably depend upon
>1.  How many new terms you have
>2.  Whether or not you desire additional codes, i.e. rxnorm, snomed
>
>If you don't have many new terms (<1000) and you don't care about
>secondary codes then the easiest thing would be to create a BSV file with
>the new terms and cuis.
>
>If you have a lot of new terms or do care about secondary codes, then a
>less facile solution would be to create a new hsqldb with only the new
>info or a complete replacement with new and old/existing terms.  Of the
>two hsql options creating a new all-inclusive database would probably be
>easier unless you want to learn the ins and outs of hsql.  If all of the
>terms are in the umls, then the new all-inclusive hsqldb would definitely
>be easiest (I think) as you could use the dictionary tool to create it.
>
>If you let me know your exact situation then I may be able to better
>expound.
>
>Sean
>
>-----Original Message-----
>From: Mattmann, Chris A (3980) [mailto:[email protected]]
>Sent: Monday, October 05, 2015 7:36 PM
>To: [email protected]
>Subject: How to update cTAKES so that new top level categories come out
>based on local dictionary?
>
>Hi cTAKES team,
>
>Hope you’re well! I had a quick question. I was wondering if someone
>could provide me a step-by-step guide to updating cTAKES to be based
>off a local dictionary, so that in addition to e.g.,
>
>ProceduralMention
>  Value1 position etc
>  Value2 position etc
>
>MedicationMention
>  Value1 position etc
>  Value2 position etc
>
>NewTopLevelCategoryFromMyDictionary
>  FoundValue1 position etc
>  FoundValue2 position etc
>
>I realize this has something to do with updating the annotation
>descriptions etc in XML, so if someone could just tell me what
>to update I’d really appreciate it.
>
>Thank you!
>
>Cheers,
>Chris