Gandhi, Thanks again for your response.
I am pretty new to cTAKES myself, and my Java knowledge is not up to date. I am looking at the sample source code from https://github.com/healthnlp/examples/tree/master/ctakes-temporal-demo. In Pipeline.java, it looks like only the dictionary name changes:

builder.add( AnalysisEngineFactory.createEngineDescription( DefaultJCasTermAnnotator.class,
      AbstractJCasTermAnnotator.PARAM_WINDOW_ANNOT_KEY,
      "org.apache.ctakes.typesystem.type.textspan.Sentence",
      JCasTermAnnotator.DICTIONARY_DESCRIPTOR_KEY,
      "org/apache/ctakes/dictionary/lookup/fast/sno_rx_16ab.xml") );

1. Do I change to the MySQL driver in (dictionary).xml? The code snippet is below.
2. What do I do with the blue highlight?
3. If I leave hsqldb, would that just use the hsqldb script file?
4. If I change it, do you have a sample?

Right now, I run the pipeline using the new dictionary with the option "-l org/apache/ctakes/dictionary/lookup/fast/(dictionary name).xml", which loads the dictionary into hsqldb memory.

<property key="jdbcDriver" value="org.hsqldb.jdbcDriver"/>
<property key="jdbcUrl" value="jdbc:hsqldb:file:src/main/resources/org/apache/ctakes/dictionary/lookup/fast/sno_rx_16ab/sno_rx_16ab"/>

I very much appreciate your help.

Kathy

On Wed, Oct 11, 2017 at 5:14 PM, Kathy Ferro <[email protected]> wrote:

> Gandhi and Matthew,
>
> Thank you for the information.
>
> Kathy
>
> On Wed, Oct 11, 2017 at 1:35 AM, Gandhi Rajan Natarajan <
> [email protected]> wrote:
>
>> Hi Matthew,
>>
>> Please check out my response to Kathy. I feel that has the required info
>> to start off. Please let me know if you are looking for any specific
>> additional info.
>>
>> Regards,
>> Gandhi
>>
>>
>> -----Original Message-----
>> From: Matthew Vita [mailto:[email protected]]
>> Sent: Wednesday, October 11, 2017 11:00 AM
>> To: [email protected]
>> Subject: Re: HSQLDB out of memory with custom dictionary
>>
>> Hi Kathy and Gandhi,
>>
>> I started to put together a more formal solution for this here:
>> https://github.com/GoTeamEpsilon/cTAKES-HSQLDB-to-MySQL-Dictionary - It
>> is not perfect but it makes things a bit easier. I was able to load
>> millions of records into MySQL, which is awesome!
>>
>> *If you have a non-trivial dictionary, chances are you will exhaust
>> HSQLDB's capabilities. By using this solution, you will have a MySQL schema
>> filled up with what would have been the HSQLDB data.*
>>
>> *This solution uses lazy lists and streams to keep memory usage low when
>> the script files are huge.*
>>
>> I have not got it working with the XML jdbc configuration yet, so if you
>> (or anyone else) could share an example, that would be amazing.
>>
>> Thanks,
>>
>> Matthew Vita
>> www.matthewvita.com
>>
>> On Tue, Oct 10, 2017 at 9:57 PM, Gandhi Rajan Natarajan <
>> [email protected]> wrote:
>>
>> > Hi Kathy,
>> >
>> > Good to hear from you. Please find the response below.
>> >
>> > NOTE: This is based on my experience with cTAKES so far. Please
>> > correct me if anyone finds the answers to be wrong.
>> >
>> > 1. Does the name of the database matter?
>> >
>> > The name of the database really doesn’t matter. But the name you have
>> > created should be mapped in the 'jdbcUrl' property of the XML file
>> > generated by the Dictionary GUI.
>> >
>> > 2. What configuration file do I change to switch to use the new
>> > database?
>> >
>> > If you are using the example downloaded from
>> > https://github.com/healthnlp/examples/tree/master/ctakes-temporal-demo ,
>> > then in Pipeline.java you have to map the name of the XML file
>> > generated by the Dictionary GUI instead of 'sno_rx_16ab.xml'.
>> >
>> > If you want to use the new database for CVD, then you have to change
>> > 'DEFAULT_DICT_DESC_PATH' in JCasTermAnnotator.java to point to the new
>> > XML file, rebuild the ctakes-dictionary-lookup-fast module, and use
>> > the resulting jar file.
>> >
>> > 3) Do you think I can use SQL server instead of MySQL? My SQL seems
>> > to run faster.
>> >
>> > This choice is user specific, and I can't comment on a performance
>> > comparison as I have no experience with it.
>> >
>> >
>> >
>> > Regards,
>> > Gandhi
>> >
>> >
>> > -----Original Message-----
>> > From: Kathy Ferro [mailto:[email protected]]
>> > Sent: Tuesday, October 10, 2017 9:26 PM
>> > To: [email protected]
>> > Subject: Re: HSQLDB out of memory with custom dictionary
>> >
>> > Gandhi,
>> >
>> > My name is Kathy Ferro.
>> >
>> > Matthew and I are trying to accomplish the same thing. I got the
>> > scripts loaded into both SQL server and MySQL. I did it in two ways.
>> > 1. Manually modified the scripts to be DB specific and ran them in a
>> > query analyzer window as you described. Works fine if the data is
>> > small enough. For a bigger file, it locks up.
>> > 2. I wrote a C# program to read the scripts and insert the records one
>> > by one to reload them.
>> >
>> > My questions for you are:
>> >
>> > 2. What configuration file do I change to switch to use the new
>> > database?
>> > 3. Do you think I can use SQL server instead of MySQL? My SQL seems
>> > to run faster.
>> >
>> > Thanks
>> > Kathy
>> >
>> >
>> >
>> >
>> > On Tue, Oct 10, 2017 at 2:34 AM, Gandhi Rajan Natarajan <
>> > [email protected]> wrote:
>> >
>> > > Hi Matthew,
>> > >
>> > > The SQL looks fine.
>> > > The only additional table I'm using apart from the tables mentioned
>> > > below is the MDR table (MEDDRA related), and I don’t use the AIR
>> > > table.
>> > >
>> > > Do you really think you need a Java program to convert those insert
>> > > statements to work with MySQL? I just opened the script file in a
>> > > text editor like EditPlus, searched for `[\)]\n`, and replaced it
>> > > with `);\n` using the find-and-replace-all option with regex, and
>> > > that was all the scripts needed.
>> > >
>> > > The only other thing is that you can load the data in parallel by
>> > > splitting the script files as mentioned earlier, which saves time,
>> > > and maybe you can write a Java program to split the file. I feel
>> > > this is the easiest approach.
>> > >
>> > > Regards,
>> > > Gandhi
>> > >
>> > >
>> > > -----Original Message-----
>> > > From: Matthew Vita [mailto:[email protected]]
>> > > Sent: Tuesday, October 10, 2017 10:47 AM
>> > > To: [email protected]
>> > > Subject: Re: HSQLDB out of memory with custom dictionary
>> > >
>> > > Gandhi,
>> > >
>> > > I really appreciate this information. I have started working out the
>> > > schema and plan on writing a program that will automatically prepare
>> > > a script to work with MySQL. Work in progress. Can you do a quick
>> > > review of my MySQL schema so far?
>> > >
>> > > CREATE SCHEMA CTAKES_DATA;
>> > >
>> > > use CTAKES_DATA;
>> > >
>> > > CREATE TABLE CUI_TERMS (
>> > >   CUI BIGINT NOT NULL,
>> > >   RINDEX INT(128) NOT NULL,
>> > >   TCOUNT INT(128) NOT NULL,
>> > >   TEXT VARCHAR(255) NOT NULL,
>> > >   RWORD VARCHAR(48) NOT NULL
>> > > );
>> > > CREATE INDEX IDX_CUI_TERMS ON CUI_TERMS (RWORD);
>> > >
>> > > CREATE TABLE TUI (
>> > >   CUI BIGINT NOT NULL,
>> > >   TUI INT(128) NOT NULL
>> > > );
>> > > CREATE INDEX IDX_TUI ON TUI (CUI);
>> > >
>> > > CREATE TABLE PREFTERM (
>> > >   CUI BIGINT NOT NULL,
>> > >   PREFTERM VARCHAR(511) NOT NULL
>> > > );
>> > > CREATE INDEX IDX_PREFTERM ON PREFTERM (CUI);
>> > >
>> > > CREATE TABLE RXNORM (
>> > >   CUI BIGINT NOT NULL,
>> > >   RXNORM BIGINT NOT NULL
>> > > );
>> > > CREATE INDEX IDX_RXNORM ON RXNORM (CUI);
>> > >
>> > > CREATE TABLE SNOMEDCT_US (
>> > >   CUI BIGINT NOT NULL,
>> > >   SNOMEDCT_US BIGINT NOT NULL
>> > > );
>> > > CREATE INDEX IDX_SNOMEDCT_US ON SNOMEDCT_US (CUI);
>> > >
>> > > Quick question: do you use the AIR table?
>> > >
>> > > Thanks,
>> > >
>> > > Matthew Vita
>> > > www.matthewvita.com
>> > >
>> > > On Mon, Oct 9, 2017 at 1:14 AM, Gandhi Rajan Natarajan <
>> > > [email protected]> wrote:
>> > >
>> > > > Hi Matthew,
>> > > >
>> > > > First, I would like to tell you that I am also a newbie with
>> > > > cTAKES. Unfortunately, I couldn't find any documentation on this.
>> > > > I followed a crude way to accomplish it, as this is a one-time
>> > > > activity. This is what I did:
>> > > >
>> > > > 1) Used the dictionary generator GUI to generate Snomed, RxNorm and
>> > > > MEDDRA dictionary data, which resulted in a '.script' file under my
>> > > > <ctakes_home>\resources\org\apache\ctakes\dictionary\lookup\fast\<project_name>
>> > > > folder.
>> > > > 2) The '.script' file has HSQLDB specific queries.
>> > > > I removed the HSQLDB-specific statements I did not need from the
>> > > > file and manually converted the rest to MySQL specific queries.
>> > > > 3) I added semicolons at the end of each line in the script using a
>> > > > text editor and split the file into five parts. Then I ran those
>> > > > five script files in five different mysql command lines. It took
>> > > > me approximately 4 hours to pump all the data into the MySQL DB.
>> > > >
>> > > > As I mentioned earlier, I'm not sure whether it is the right way to
>> > > > proceed. But with no documentation available for MySQL DB with
>> > > > cTAKES, this is the approach that worked for me. Hope it will be
>> > > > helpful.
>> > > >
>> > > > Regards,
>> > > > Gandhi
>> > > >
>> > > >
>> > > > -----Original Message-----
>> > > > From: Matthew Vita [mailto:[email protected]]
>> > > > Sent: Monday, October 09, 2017 10:41 AM
>> > > > To: [email protected]
>> > > > Subject: Re: HSQLDB out of memory with custom dictionary
>> > > >
>> > > > Gandhi,
>> > > >
>> > > > Thank you for the reply. Do you have any documentation on how to
>> > > > accomplish this?
>> > > >
>> > > > Thanks,
>> > > >
>> > > > Matthew Vita
>> > > > www.matthewvita.com
>> > > >
>> > > > On Sun, Oct 8, 2017 at 3:14 AM, Gandhi Rajan Natarajan <
>> > > > [email protected]> wrote:
>> > > >
>> > > > > Hi Matthew,
>> > > > >
>> > > > > I feel using a MySQL DB would be a better idea than using
>> > > > > in-memory HSQLDB. In fact, this also comes in handy when you are
>> > > > > planning to deploy cTAKES as a web application, as in our case.
>> > > > >
>> > > > > Regards,
>> > > > > Gandhi
>> > > > >
>> > > > > -----Original Message-----
>> > > > > From: Matthew Vita [mailto:[email protected]]
>> > > > > Sent: Sunday, October 08, 2017 6:02 AM
>> > > > > To: [email protected]
>> > > > > Subject: HSQLDB out of memory with custom dictionary
>> > > > >
>> > > > > Hi Sean, Tim, cTAKES Community,
>> > > > >
>> > > > > I have put together what I am considering a pretty standard
>> > > > > dictionary with sources from the following:
>> > > > >
>> > > > > - MEDLINEPLUS
>> > > > > - MSH
>> > > > > - NCI
>> > > > > - NDFRT
>> > > > > - CHV
>> > > > > - CSP
>> > > > > - ICPC2P
>> > > > > - MEDCIN
>> > > > > - SNOMED
>> > > > > - RXNORM
>> > > > > - ICD10
>> > > > >
>> > > > > However, when copied over to cTAKES (handled by the handy
>> > > > > Dictionary Creator GUI) HSQLDB runs out of memory.
>> > > > >
>> > > > > This is my first experience with HSQLDB so you’ll have to excuse
>> > > > > my limited knowledge here. I do understand that it can run
>> > > > > either in-memory or on disk, but I’m not sure how to configure
>> > > > > this.
>> > > > >
>> > > > > Here is how I am connecting to it:
>> > > > >
>> > > > > <dictionary>
>> > > > >   <name>sno_rx_16abTerms</name>
>> > > > >   <implementationName>org.apache.ctakes.dictionary.lookup2.dictionary.UmlsJdbcRareWordDictionary</implementationName>
>> > > > >   <properties>
>> > > > >     <property key="jdbcDriver" value="org.hsqldb.jdbcDriver" />
>> > > > >     <property key="jdbcUrl" value="jdbc:hsqldb:file:resources/org/apache/ctakes/dictionary/lookup/fast/sno_rx_16ab/sno_rx_16ab" />
>> > > > >     <property key="jdbcUser" value="sa" />
>> > > > >     <property key="jdbcPass" value="" />
>> > > > >     <property key="rareWordTable" value="cui_terms" />
>> > > > >     <property key="umlsUrl" value="https://uts-ws.nlm.nih.gov/restful/isValidUMLSUser" />
>> > > > >     <property key="umlsVendor" value="NLM-6515182895" />
>> > > > >     <property key="umlsUser" value="CHANGE_ME" />
>> > > > >     <property key="umlsPass" value="CHANGE_ME" />
>> > > > >   </properties>
>> > > > > </dictionary>
>> > > > >
>> > > > > <dictionary>
>> > > > >
>> > > > > Can I configure HSQLDB to be used on disk? If this is not a good
>> > > > > approach, can I spin up MySQL in its place?
>> > > > >
>> > > > > Sorry if this has been asked before.
>> > > > >
>> > > > > Thanks,
>> > > > >
>> > > > > Matthew Vita
>> > > > > www.matthewvita.com
>> > > > > This email and any files transmitted with it are confidential
>> > > > > and intended solely for the use of the individual or entity to
>> > > > > whom they are addressed.
>> > > > > If you are not the named addressee you should not disseminate,
>> > > > > distribute or copy this e-mail.
>> > > > > Please notify the sender or
>> > > > > system manager by email immediately if you have received this
>> > > > > e-mail by mistake and delete this e-mail from your system. If
>> > > > > you are not the intended recipient you are notified that
>> > > > > disclosing, copying, distributing or taking any action in
>> > > > > reliance on the contents of this information is strictly
>> > > > > prohibited and against the law.
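
For anyone following the conversion Gandhi describes earlier in the thread (adding a semicolon to the end of each statement in the HSQLDB '.script' file and splitting it into parts that can be loaded from separate mysql command lines), here is a minimal Java sketch of that step. It is not the code from Matthew's repository. The class name ScriptSplitter, both method names, and the ".partN.sql" output naming are made up for illustration. It assumes each INSERT statement sits on a single line starting with "INSERT INTO", that the file can be read as UTF-8, and that the MySQL tables were already created separately (for example with the CREATE TABLE statements above), so every non-INSERT line is simply skipped.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of cTAKES or HSQLDB.
class ScriptSplitter {

    // Returns the line terminated with a semicolon if it is an INSERT
    // statement, or null for any other line (assumed to be HSQLDB-specific).
    static String toMySqlInsert(String line) {
        String trimmed = line.trim();
        if (!trimmed.startsWith("INSERT INTO ")) {
            return null;
        }
        return trimmed.endsWith(";") ? trimmed : trimmed + ";";
    }

    // Streams the script line by line and deals the INSERT statements
    // round-robin into <script>.part1.sql .. <script>.partN.sql.
    // Returns the number of statements written.
    static long split(Path script, int parts) throws IOException {
        if (parts < 1) {
            throw new IllegalArgumentException("parts must be >= 1");
        }
        List<BufferedWriter> writers = new ArrayList<>();
        long count = 0;
        try (BufferedReader reader =
                 Files.newBufferedReader(script, StandardCharsets.UTF_8)) {
            for (int i = 0; i < parts; i++) {
                Path out = Paths.get(script.toString() + ".part" + (i + 1) + ".sql");
                writers.add(Files.newBufferedWriter(out, StandardCharsets.UTF_8));
            }
            String line;
            while ((line = reader.readLine()) != null) {
                String statement = toMySqlInsert(line);
                if (statement == null) {
                    continue; // skip CREATE/SET and other non-INSERT lines
                }
                BufferedWriter writer = writers.get((int) (count % parts));
                writer.write(statement);
                writer.newLine();
                count++;
            }
        } finally {
            for (BufferedWriter writer : writers) {
                writer.close();
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java ScriptSplitter <file.script> <parts>");
            return;
        }
        long written = split(Paths.get(args[0]), Integer.parseInt(args[1]));
        System.out.println("Wrote " + written + " INSERT statements");
    }
}
```

Reading line by line keeps memory use flat for large script files. The round-robin split relies on the INSERT statements being independent of each other, so the order in which the parts are loaded should not matter.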
