Gandhi and Matthew, Thank you for the information.
Kathy

On Wed, Oct 11, 2017 at 1:35 AM, Gandhi Rajan Natarajan <[email protected]> wrote:

> Hi Matthew,
>
> Please check out my response to Kathy. I feel that has the required info
> to start off. Please let me know if you are looking for any specific
> additional info.
>
> Regards,
> Gandhi
>
>
> -----Original Message-----
> From: Matthew Vita [mailto:[email protected]]
> Sent: Wednesday, October 11, 2017 11:00 AM
> To: [email protected]
> Subject: Re: HSQLDB out of memory with custom dictionary
>
> Hi Kathy and Gandhi,
>
> I started to put together a more formal solution for this here:
> https://github.com/GoTeamEpsilon/cTAKES-HSQLDB-to-MySQL-Dictionary - It
> is not perfect but it makes things a bit easier. I was able to load
> millions of records into MySQL, which is awesome!
>
> *If you have a non-trivial dictionary, chances are you will exhaust
> HSQLDB's capabilities. By using this solution, you will have a MySQL schema
> filled up with what would have been the HSQLDB data.*
>
> *This solution uses lazy lists and streams to keep memory usage low when
> the script files are huge.*
>
> I have not got it working with the XML jdbc configuration yet so if you
> (or anyone else) could share an example that would be amazing.
>
> Thanks,
>
> Matthew Vita
> www.matthewvita.com
>
> On Tue, Oct 10, 2017 at 9:57 PM, Gandhi Rajan Natarajan <
> [email protected]> wrote:
>
> > Hi Kathy,
> >
> > Good to hear from you. Please find the response below.
> >
> > NOTE: This is based on my experience with cTAKES so far. Please
> > correct me if someone finds the answers to be wrong.
> >
> > 1. Does it matter what the name of the database is?
> >
> > The name of the database really doesn’t matter. But the name you have
> > created should be mapped in the 'jdbcUrl' property of the XML file
> > generated by the Dictionary GUI.
> >
> > 2. What configuration file do I change to switch to use the new database?
> >
> > If you are using the example downloaded from
> > https://github.com/healthnlp/examples/tree/master/ctakes-temporal-demo ,
> > then in Pipeline.java you have to map the XML file name generated using
> > the Dictionary GUI instead of 'sno_rx_16ab.xml'.
> >
> > If you want to use the new database for CVD, then you have to change
> > 'DEFAULT_DICT_DESC_PATH' in JCasTermAnnotator.java to point to the new
> > XML file, rebuild the ctakes-dictionary-lookup-fast module, and use the
> > resulting jar file.
> >
> > 3) Do you think I can use SQL server instead of MySQL? My SQL seems
> > to run faster.
> >
> > This choice is user specific and I can't comment on performance
> > comparison as I have no clue on this.
> >
> > Regards,
> > Gandhi
> >
> >
> > -----Original Message-----
> > From: Kathy Ferro [mailto:[email protected]]
> > Sent: Tuesday, October 10, 2017 9:26 PM
> > To: [email protected]
> > Subject: Re: HSQLDB out of memory with custom dictionary
> >
> > Gandhi,
> >
> > My name is Kathy Ferro.
> >
> > Matthew and I are trying to accomplish the same thing. I got the scripts
> > loaded into both SQL Server and MySQL. I did it in two ways.
> > 1. Manually modified the scripts for DB specifics and ran them in the
> > query analyzer window as you described. Works fine if the data is small
> > enough. For bigger files, it locks up.
> > 2. I wrote a C# program to read the scripts and insert records one by
> > one I re-load them.
> >
> > My questions for you are:
> >
> > 2. What configuration file do I change to switch to use the new database?
> > 3. Do you think I can use SQL server instead of MySQL? My SQL seems
> > to run faster.
> >
> > Thanks
> > Kathy
> >
> > On Tue, Oct 10, 2017 at 2:34 AM, Gandhi Rajan Natarajan <
> > [email protected]> wrote:
> >
> > > Hi Matthew,
> > >
> > > The SQL looks fine. The only additional table I'm using apart from
> > > the tables mentioned below is the MDR table (MEDDRA related), and I
> > > don’t use the AIR table.
> > >
> > > Do you really think you need a Java program to convert those insert
> > > statements to work with MySQL? I just opened the script file in a text
> > > editor like EditPlus, did a find for `[\)]\n` and replaced it
> > > with `);\n` using the find and replace all option with regex, and we
> > > are done with the scripts.
> > >
> > > The only thing is that you can load the data in parallel by splitting
> > > the script files as mentioned earlier, which saves time for you, and
> > > maybe you can write a Java program to split the file. This is the
> > > easiest approach I feel.
> > >
> > > Regards,
> > > Gandhi
> > >
> > >
> > > -----Original Message-----
> > > From: Matthew Vita [mailto:[email protected]]
> > > Sent: Tuesday, October 10, 2017 10:47 AM
> > > To: [email protected]
> > > Subject: Re: HSQLDB out of memory with custom dictionary
> > >
> > > Gandhi,
> > >
> > > I really appreciate this information. I have started working out the
> > > schema and plan on writing a program that will automatically prepare
> > > a script to work with MySQL. Work in progress. Can you do a quick
> > > review of my MySQL schema so far?
> > >
> > > CREATE SCHEMA CTAKES_DATA;
> > >
> > > use CTAKES_DATA;
> > >
> > > CREATE TABLE CUI_TERMS (
> > >     CUI BIGINT NOT NULL,
> > >     RINDEX INT(128) NOT NULL,
> > >     TCOUNT INT(128) NOT NULL,
> > >     TEXT VARCHAR(255) NOT NULL,
> > >     RWORD VARCHAR(48) NOT NULL
> > > );
> > > CREATE INDEX IDX_CUI_TERMS ON CUI_TERMS (RWORD);
> > >
> > > CREATE TABLE TUI (
> > >     CUI BIGINT NOT NULL,
> > >     TUI INT(128) NOT NULL
> > > );
> > > CREATE INDEX IDX_TUI ON TUI (CUI);
> > >
> > > CREATE TABLE PREFTERM (
> > >     CUI BIGINT NOT NULL,
> > >     PREFTERM VARCHAR(511) NOT NULL
> > > );
> > > CREATE INDEX IDX_PREFTERM ON PREFTERM (CUI);
> > >
> > > CREATE TABLE RXNORM (
> > >     CUI BIGINT NOT NULL,
> > >     RXNORM BIGINT NOT NULL
> > > );
> > > CREATE INDEX IDX_RXNORM ON RXNORM (CUI);
> > >
> > > CREATE TABLE SNOMEDCT_US (
> > >     CUI BIGINT NOT NULL,
> > >     SNOMEDCT_US BIGINT NOT NULL
> > > );
> > > CREATE INDEX IDX_SNOMEDCT_US ON SNOMEDCT_US (CUI);
> > >
> > > Quick question: do you use the AIR table?
> > >
> > > Thanks,
> > >
> > > Matthew Vita
> > > www.matthewvita.com
> > >
> > > On Mon, Oct 9, 2017 at 1:14 AM, Gandhi Rajan Natarajan <
> > > [email protected]> wrote:
> > >
> > > > Hi Matthew,
> > > >
> > > > First I would like to tell you that even I am a newbie in cTAKES.
> > > > Unfortunately I don’t find any documentation on this. I have
> > > > followed a crude way to accomplish this, as it is a one-time
> > > > activity. This is what I did:
> > > >
> > > > 1) Used the dictionary generator GUI to generate Snomed, RxNorm and
> > > > MEDDRA dictionary data, which resulted in a '.script' file under my
> > > > <ctakes_home>\resources\org\apache\ctakes\dictionary\lookup\fast\<project_name>
> > > > folder
> > > > 2) The '.script' file has HSQLDB specific queries. I removed the
> > > > HSQLDB-specific statements I did not need from the file and
> > > > converted the rest to MySQL specific queries manually.
> > > > 3) I added semicolons at the end of each line in the script
> > > > using a text editor and split the file into five parts. Then I
> > > > ran those five script files in five different mysql command
> > > > lines. It took me approximately 4 hours to pump all the data into
> > > > the MySQL DB.
> > > >
> > > > I'm not sure whether it is the right way to proceed, as I mentioned
> > > > earlier. But with no documentation available for MySQL DB with
> > > > cTAKES, this is the approach that worked for me. Hope it will be
> > > > helpful.
> > > >
> > > > Regards,
> > > > Gandhi
> > > >
> > > >
> > > > -----Original Message-----
> > > > From: Matthew Vita [mailto:[email protected]]
> > > > Sent: Monday, October 09, 2017 10:41 AM
> > > > To: [email protected]
> > > > Subject: Re: HSQLDB out of memory with custom dictionary
> > > >
> > > > Gandhi,
> > > >
> > > > Thank you for the reply. Do you have any documentation on how to
> > > > accomplish this?
> > > >
> > > > Thanks,
> > > >
> > > > Matthew Vita
> > > > www.matthewvita.com
> > > >
> > > > On Sun, Oct 8, 2017 at 3:14 AM, Gandhi Rajan Natarajan <
> > > > [email protected]> wrote:
> > > >
> > > > > Hi Matthew,
> > > > >
> > > > > I feel using a MySQL DB would be a better idea than using in-memory
> > > > > HSQLDB. In fact, this also comes in handy when you are planning to
> > > > > deploy cTAKES as a web application, as in our case.
> > > > >
> > > > > Regards,
> > > > > Gandhi
> > > > >
> > > > > -----Original Message-----
> > > > > From: Matthew Vita [mailto:[email protected]]
> > > > > Sent: Sunday, October 08, 2017 6:02 AM
> > > > > To: [email protected]
> > > > > Subject: HSQLDB out of memory with custom dictionary
> > > > >
> > > > > Hi Sean, Tim, cTAKES Community,
> > > > >
> > > > > I have put together what I am considering a pretty standard
> > > > > dictionary with sources from the following:
> > > > >
> > > > > - MEDLINEPLUS
> > > > > - MSH
> > > > > - NCI
> > > > > - NDFRT
> > > > > - CHV
> > > > > - CSP
> > > > > - ICPC2P
> > > > > - MEDCIN
> > > > > - SNOMED
> > > > > - RXNORM
> > > > > - ICD10
> > > > >
> > > > > However, when copied over to cTAKES (handled by the handy
> > > > > Dictionary Creator GUI) HSQLDB runs out of memory.
> > > > >
> > > > > This is my first experience with HSQLDB so you’ll have to excuse
> > > > > my limited knowledge here. I do understand that it can run
> > > > > either in-memory or on disk, but I’m not sure how to configure this.
> > > > >
> > > > > Here is how I am connecting to it:
> > > > >
> > > > > <dictionary>
> > > > >     <name>sno_rx_16abTerms</name>
> > > > >     <implementationName>org.apache.ctakes.dictionary.lookup2.dictionary.UmlsJdbcRareWordDictionary</implementationName>
> > > > >     <properties>
> > > > >         <property key="jdbcDriver" value="org.hsqldb.jdbcDriver" />
> > > > >         <property key="jdbcUrl" value="jdbc:hsqldb:file:resources/org/apache/ctakes/dictionary/lookup/fast/sno_rx_16ab/sno_rx_16ab" />
> > > > >         <property key="jdbcUser" value="sa" />
> > > > >         <property key="jdbcPass" value="" />
> > > > >         <property key="rareWordTable" value="cui_terms" />
> > > > >         <property key="umlsUrl" value="https://uts-ws.nlm.nih.gov/restful/isValidUMLSUser" />
> > > > >         <property key="umlsVendor" value="NLM-6515182895" />
> > > > >         <property key="umlsUser" value="CHANGE_ME" />
> > > > >         <property key="umlsPass" value="CHANGE_ME" />
> > > > >     </properties>
> > > > > </dictionary>
> > > > >
> > > > > <dictionary>
> > > > >
> > > > > Can I configure HSQLDB to be used on disk? If this is not a good
> > > > > approach, can I spin up MySQL in its place?
> > > > >
> > > > > Sorry if this has been asked before.
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Matthew Vita
> > > > > www.matthewvita.com
> > > > > This email and any files transmitted with it are confidential
> > > > > and intended solely for the use of the individual or entity to
> > > > > whom they are addressed.
> > > > > If you are not the named addressee you should not disseminate,
> > > > > distribute or copy this e-mail.
> > > > > Please notify the sender or
> > > > > system manager by email immediately if you have received this
> > > > > e-mail by mistake and delete this e-mail from your system. If
> > > > > you are not the intended recipient you are notified that
> > > > > disclosing, copying, distributing or taking any action in
> > > > > reliance on the contents of this information is strictly
> > > > > prohibited and against the law.
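
Note on the conversion steps described in this thread: Gandhi's approach (append a semicolon to each statement in the HSQLDB '.script' file, then split the file so the parts can be loaded in parallel) can be sketched in a few lines of Python. This is only a minimal sketch under stated assumptions, not the code from Matthew's repository. It assumes every statement sits on a single line, that only INSERT statements should be kept, and that the file can be read as UTF-8 text. The function names, the output file names and the default of five parts are hypothetical.

```python
import sys
from typing import Iterable, Iterator, List


def to_mysql_inserts(lines: Iterable[str]) -> Iterator[str]:
    """Yield the INSERT statements from an HSQLDB .script, each ending in ';'.

    Assumption: one statement per line, as described in the thread.
    """
    for line in lines:
        stmt = line.strip()
        # Skip blank lines and HSQLDB-specific statements (SET, CREATE, GRANT, ...).
        if not stmt.upper().startswith("INSERT INTO"):
            continue
        yield stmt if stmt.endswith(";") else stmt + ";"


def split_script(src_path: str, out_prefix: str, parts: int = 5) -> List[str]:
    """Stream src_path into `parts` files, distributing INSERTs round-robin.

    Lines are processed one at a time, so the whole script is never held
    in memory. Output names (<out_prefix>_1.sql, ...) are hypothetical.
    """
    out_paths = [f"{out_prefix}_{i + 1}.sql" for i in range(parts)]
    outs = [open(p, "w", encoding="utf-8") for p in out_paths]
    try:
        with open(src_path, encoding="utf-8") as src:
            for i, stmt in enumerate(to_mysql_inserts(src)):
                outs[i % parts].write(stmt + "\n")
    finally:
        for out in outs:
            out.close()
    return out_paths


if __name__ == "__main__" and len(sys.argv) >= 3:
    # Hypothetical usage: python split_script.py my_dict.script my_dict_part 5
    count = int(sys.argv[3]) if len(sys.argv) > 3 else 5
    for path in split_script(sys.argv[1], sys.argv[2], count):
        print(path)
```

Each output file can then be run in its own mysql session once the tables have been created. Distributing statements round-robin across files is reasonable here only because the schema posted in the thread defines no foreign keys between tables, so insert order does not matter.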
