Hi, I would be extremely interested in a sample dictionary that doesn’t require a UMLS login.
How would I use this?

Thanks,
Chris

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-527
Email: chris.a.mattm...@nasa.gov
WWW: http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

-----Original Message-----
From: "and...@apache.org (forwarding)" <mcmurry.a...@gmail.com>
Reply-To: "dev@ctakes.apache.org" <dev@ctakes.apache.org>
Date: Friday, October 2, 2015 at 12:43 AM
To: "dev@ctakes.apache.org" <dev@ctakes.apache.org>
Subject: building a *real sample dictionary* without UMLS login

>Greetings ctakes-dev!
>
>I have been polishing MedGen (UMLS) dictionaries for over a year now and
>I am confident in saying "this is solid".
>As a reminder, the medgen-mysql package contains a large subset of the
>UMLS that can be downloaded without UMLS login, greatly simplifying the
>creation of an example dictionary.
>
>QUESTION:
>Would you like me to integrate this into ctakes to simplify installations
>for new-users, and if so, what would be your preferred method?
>
>Source Vocabularies (SAB)
>+-------------+--------+
>| SourceVocab | cnt    |
>+-------------+--------+
>| MSH         | 245435 | Medical Subject Headings
>| SNOMEDCT_US | 156105 | SNOMED Clinical Terms
>| NCI         | 136888 | NCI Cancer Terms
>| ...         | ...    |
>+-------------+--------+
>
>Semantic Types (STY)
>+-------------------------------------------+--------+
>| SemanticType                              | cnt    |
>+-------------------------------------------+--------+
>| Pharmacologic Substance                   | 102511 |
>| Finding                                   |  90413 |
>| Organic Chemical                          |  81329 |
>| Disease or Syndrome                       |  47223 |
>| Neoplastic Process                        |  16151 |
>| Amino Acid, Peptide, or Protein           |   9383 |
>| Congenital Abnormality                    |   6536 |
>| Pathologic Function                       |   5655 |
>| Steroid                                   |   3919 |
>| Sign or Symptom                           |   2909 |
>| ...                                       | ...    |
>
>
>What would you like to see?
>and...@apache.org
>
>
>On Nov 12, 2014, at 6:14 AM, "Dligach, Dmitriy"
><dmitriy.dlig...@childrens.harvard.edu> wrote:
>
>> Andy, thank you for this resource!
>>
>> Do you have an estimate of what percentage of UMLS concepts were left
>> out?
>>
>> Dima
>>
>>
>> On Nov 11, 2014, at 16:02, andy mcmurry <mcmurry.a...@gmail.com> wrote:
>>
>>> Hello!
>>>
>>> https://bitbucket.org/invitae/medgen-mysql (Apache Licensed ASL2)
>>>
>>> We just released a new library containing a huge chunk of UMLS concepts
>>> which are available without registering accounts/username/passwords.
>>> LEGALLY. Yes, really!
>>>
>>> The subset is from NCBI and it contains *thousands of concepts from
>>> SNOMED and other vocabularies*.
>>>
>>> The code is essentially
>>> 1. a list of WGET targets to various NCBI FTP site mirrors
>>> 2. Makefile for building the databases of interest
>>>
>>> Our legal team has approved distribution for Open Access work, ASL2
>>> LICENSE.
>>>
>>> I recommend we use this opportunity to make this the default
>>> distribution for CTAKES UMLS connections, because it obviates the need
>>> for so much painful credentialing and back and forth agreements with
>>> the US National Library of Medicine.
>>>
>>> Cheers!
>>> --Andy
>>>
>>>
>>> On Wed, Sep 10, 2014 at 12:13 PM, Masanz, James J.
>>> <masanz.ja...@mayo.edu> wrote:
>>>
>>>>
>>>> I would love to see the install be as simple as apt-get install to
>>>> end up with some working dictionary that has more than a handful of
>>>> entries to get them started.
>>>>
>>>> Regards,
>>>> James Masanz
>>>>
>>>> -----Original Message-----
>>>> From: andy mcmurry [mailto:mcmurry.a...@gmail.com]
>>>> Sent: Tuesday, September 09, 2014 4:32 PM
>>>> To: ctakes-...@incubator.apache.org
>>>> Subject: Recommendation for ctakes default (UMLS) dictionaries
>>>>
>>>> Greetings ctakes-dev:
>>>>
>>>> *UMLS license restrictions have been getting more lax over the years --
>>>> *much of the UMLS can be downloaded directly from the NCBI official
>>>> FTP site.
>>>>
>>>> In fact, the NIH (and implicitly the NLM) *have already made the
>>>> standard terms public for some medical specialities*.
>>>>
>>>> For example: Here is the UMLS subset specific to Medical Genetics
>>>> (MedGen) and Genetic Testing (GTR) complete with SNOMED-CT concept
>>>> CUI(s) and names, etc :
>>>>
>>>> [ ftp://ftp.ncbi.nlm.nih.gov/pub/medgen/README.html ]
>>>>
>>>> My team has developed a JVM based wrapper for MetaMap 2013AB which I
>>>> intend to open source soon (Clojure). It includes REST support for
>>>> invoking MetaMap with any or all of the command line arguments.
>>>> We do not integrate with UIMA, we are basically a wrapper around the
>>>> binary installation of MetaMap. The emphasis is on publication text
>>>> not clinical text, still, some services are common (such as LVG).
>>>>
>>>> Strangely, the NLM still requires UMLS licenses to download MetaMap
>>>> execution binaries. The MetaMap binary install is better but
>>>> customizing dictionaries (DataFileBuilder) is not as easy to use as
>>>> CTAKES with YTEX
>>>>
>>>> [ https://cwiki.apache.org/confluence/display/CTAKES/YTEX+Installation ]
>>>>
>>>> *** Hence, there is a real opportunity here to enable Apache cTAKES to
>>>> have a stronger default dictionary. ***
>>>>
>>>> Imagine if we could
>>>> *$ apt-get install apache-ctakes *
>>>>
>>>> and instantly have a working package for SOME problem domain.
>>>> In my case (Medical Genetics) the UMLS definitions are already
>>>> available and the UMLS license problem becomes a non issue, at least
>>>> for many first time users
>>>>
>>>> Your thoughts?
>>>> AndyMC
>>>>