Pei: Yes. Specifically, the source code was released by Invitae under the Apache License 2.0 (ASL 2.0) at my request, with the full blessing of our legal counsel and software team. I also reviewed the idea in principle with John Wilbanks of Sage Bionetworks (formerly of Creative Commons). This is legit, or I wouldn't have spent tons of hours on it.
The "raw content" is a set of scripts that wget a list of URLs from the NCBI public FTP repositories. The code DOES NOT redistribute any content whatsoever; it is just a list of URLs to download, unzip, and insert into a local MySQL database. To repeat: I am NOT circulating any content, just URL links -- you must download the content yourself. And that is the beauty of it -- all content is downloaded BY THE USER, and that content is publicly available per the NCBI policy and license for MedGen sources.

On Thu, Nov 13, 2014 at 11:18 AM, Chen, Pei <pei.c...@childrens.harvard.edu> wrote:

> John- I believe that was the thinking.
> Andy- Just to confirm- Is the raw content of this dataset released under
> ASL2.0? i.e. can you contribute it as a CSV or similar so that cTAKES may
> re-tokenize it using the same PTB rules, format it for cTAKES' dictionary
> lookup, etc., and then redistribute it under the same License.
>
> > -----Original Message-----
> > From: John Green [mailto:john.travis.gr...@gmail.com]
> > Sent: Thursday, November 13, 2014 1:55 PM
> > To: dev@ctakes.apache.org
> > Cc: dev@ctakes.apache.org
> > Subject: Re: Announcement: UMLS MedGen-MySQL dataset now available
> > as open access download
> >
> > The old licensed setup would be kept as a packaged option? Much as it is
> > now.... With the unlicensed going out in place of the current "free"
> > dictionary? Am I understanding that right?
> >
> > JG
> > --
> > Sent from Mailbox
> >
> > On Thu, Nov 13, 2014 at 1:40 PM, andy mcmurry <mcmurry.a...@gmail.com>
> > wrote:
> >
> > > I'll crunch the numbers -- in the meantime I can tell you that
> > > phenotypes vary by semantic type. Clinical attributes from SNOMED are
> > > abundant, and many concepts in MeSH are mapped to diseases. Tons of
> > > "pharmacological substances".
> > >
> > > On Nov 12, 2014 6:19 AM, "Dligach, Dmitriy"
> > > <dmitriy.dlig...@childrens.harvard.edu> wrote:
> > >
> > >> Andy, thank you for this resource!
> > >>
> > >> Do you have an estimate of what percentage of UMLS concepts were
> > >> left out?
> > >>
> > >> Dima
> > >>
> > >> On Nov 11, 2014, at 16:02, andy mcmurry <mcmurry.a...@gmail.com>
> > >> wrote:
> > >>
> > >> > Hello!
> > >> >
> > >> > https://bitbucket.org/invitae/medgen-mysql (Apache Licensed ASL2)
> > >> >
> > >> > We just released a new library containing a huge chunk of UMLS
> > >> > concepts which are available without registering
> > >> > accounts/username/passwords.
> > >> > LEGALLY. Yes, really!
> > >> >
> > >> > The subset is from NCBI and it contains *thousands of concepts
> > >> > from SNOMED and other vocabularies*.
> > >> >
> > >> > The code is essentially
> > >> > 1. a list of WGET targets to various NCBI FTP site mirrors
> > >> > 2. Makefile for building the databases of interest
> > >> >
> > >> > Our legal team has approved distribution for Open Access work,
> > >> > ASL2 LICENSE.
> > >> >
> > >> > I recommend we use this opportunity to make this the default
> > >> > distribution for CTAKES UMLS connections, because it obviates the
> > >> > need for so much painful credentialing and back and forth
> > >> > agreements with the US National Library of Medicine.
> > >> >
> > >> > Cheers!
> > >> > --Andy
> > >> >
> > >> > On Wed, Sep 10, 2014 at 12:13 PM, Masanz, James J.
> > >> > <masanz.ja...@mayo.edu> wrote:
> > >> >
> > >> >> I would love to see the install be as simple as apt-get install
> > >> >> to end up with some working dictionary that has more than a
> > >> >> handful of entries to get them started.
> > >> >>
> > >> >> Regards,
> > >> >> James Masanz
> > >> >>
> > >> >> -----Original Message-----
> > >> >> From: andy mcmurry [mailto:mcmurry.a...@gmail.com]
> > >> >> Sent: Tuesday, September 09, 2014 4:32 PM
> > >> >> To: ctakes-...@incubator.apache.org
> > >> >> Subject: Recommendation for ctakes default (UMLS) dictionaries
> > >> >>
> > >> >> Greetings ctakes-dev:
> > >> >>
> > >> >> *UMLS license restrictions have been getting more lax over the
> > >> >> years* -- much of the UMLS can be downloaded directly from the
> > >> >> NCBI official FTP site.
> > >> >>
> > >> >> In fact, the NIH (and implicitly the NLM) *have already made the
> > >> >> standard terms public for some medical specialities*.
> > >> >>
> > >> >> For example: Here is the UMLS subset specific to Medical Genetics
> > >> >> (MedGen) and Genetic Testing (GTR) complete with SNOMED-CT
> > >> >> concept CUI(s) and names, etc :
> > >> >>
> > >> >> [ ftp://ftp.ncbi.nlm.nih.gov/pub/medgen/README.html ]
> > >> >>
> > >> >> My team has developed a JVM based wrapper for MetaMap 2013AB
> > >> >> which I intend to open source soon (Clojure). It includes REST
> > >> >> support for invoking MetaMap with any or all of the command line
> > >> >> arguments. We do not integrate with UIMA, we are basically a
> > >> >> wrapper around the binary installation of MetaMap. The emphasis
> > >> >> is on publication text not clinical text, still, some services
> > >> >> are common (such as LVG).
> > >> >>
> > >> >> Strangely, the NLM still requires UMLS licenses to download
> > >> >> MetaMap execution binaries. The MetaMap binary install is better
> > >> >> but customizing dictionaries (DataFileBuilder) is not as easy to
> > >> >> use as CTAKES with YTEX
> > >> >>
> > >> >> [ https://cwiki.apache.org/confluence/display/CTAKES/YTEX+Installation ]
> > >> >>
> > >> >> *** Hence, there is a real opportunity here to enable Apache
> > >> >> cTAKES to have a stronger default dictionary. ***
> > >> >>
> > >> >> Imagine if we could
> > >> >> *$ apt-get install apache-ctakes*
> > >> >>
> > >> >> and instantly have a working package for SOME problem domain.
> > >> >> In my case (Medical Genetics) the UMLS definitions are already
> > >> >> available and the UMLS license problem becomes a non-issue, at
> > >> >> least for many first time users.
> > >> >>
> > >> >> Your thoughts?
> > >> >> AndyMC
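
For anyone skimming the thread, the download-then-load flow described at the top of this message can be sketched roughly as follows. This is a minimal illustration in Python, not the medgen-mysql code itself (which is a wget URL list plus a Makefile). Everything named here is an assumption made for illustration: the file name in URLS is a placeholder, local_name, download_all, and load_gzipped_rows are hypothetical helpers, the pipe-delimited row layout is assumed, and sqlite3 stands in for the local MySQL database so the sketch runs with only the standard library.

```python
import gzip
import sqlite3
from pathlib import Path
from urllib.parse import urlparse
from urllib.request import urlretrieve

# Hypothetical URL list; the real list ships with the medgen-mysql repository.
URLS = [
    "ftp://ftp.ncbi.nlm.nih.gov/pub/medgen/EXAMPLE.txt.gz",  # placeholder file name
]


def local_name(url: str) -> str:
    """Return the file name a URL would be saved under."""
    return Path(urlparse(url).path).name


def download_all(urls, dest: Path) -> list:
    """Fetch every URL into dest. The user, not the package, pulls the data."""
    dest.mkdir(parents=True, exist_ok=True)
    paths = []
    for url in urls:
        target = dest / local_name(url)
        urlretrieve(url, str(target))
        paths.append(target)
    return paths


def load_gzipped_rows(path: Path, conn: sqlite3.Connection, table: str, ncols: int) -> int:
    """Decompress a pipe-delimited .gz file and insert its rows into a table.

    Rows with fewer than ncols fields are skipped; extra fields are dropped.
    The table name is interpolated directly, so pass only trusted names.
    """
    cols = ", ".join(f"c{i} TEXT" for i in range(ncols))
    conn.execute(f"CREATE TABLE IF NOT EXISTS {table} ({cols})")
    marks = ", ".join("?" * ncols)
    count = 0
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        for line in fh:
            fields = line.rstrip("\n").split("|")[:ncols]
            if len(fields) < ncols:
                continue
            conn.execute(f"INSERT INTO {table} VALUES ({marks})", fields)
            count += 1
    conn.commit()
    return count
```

The point of the split is the same as in the thread: the package carries only the URL list and the loading logic, while download_all runs on the user's machine, so no NCBI content is redistributed.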