Hi Andy,

Great stuff!  I think that I understand the method, but I have a question about 
the statement:

>the content is publicly available per the NCBI policy and license for MedGen 
>sources

Does this mean that I, Joe Anybody, could download the content, place some of 
the content in a database structured in my own fashion, package the -new- 
database, and include it in a cTAKES distribution?
Or, does it mean that content downloaded by script is usable as-is and only 
as-is?  The whole "if I'd known you were going to do that I wouldn't have 
given it to you ..."

Thanks,
Sean

________________________________________
From: andy mcmurry [mcmurry.a...@gmail.com]
Sent: Thursday, November 13, 2014 6:59 PM
To: dev@ctakes.apache.org
Subject: Re: Announcement: UMLS MedGen-MySQL dataset now available as open 
access download

Pei: Yes, specifically:

The source code was released by Invitae under Apache ASL 2.0 per my request
and with full blessing from our legal counsel and software team. I also
reviewed the idea in principle with John Wilbanks of Sage Bionetworks (and
formerly Creative Commons). This is legit, or I wouldn't have spent tons of
hours doing it.

The raw content is not part of the release: the code is a set of scripts that
wget a list of URLs from the NCBI public FTP repositories. This code DOES NOT
redistribute any content whatsoever, just a list of URLs to download, unzip,
and insert into a local MySQL database. To repeat: I am NOT circulating any
content, just URL links
-- you must download the content yourself. And that is the beauty -- all
content is downloaded BY THE USER and the content is publicly available per
the NCBI policy and license for MedGen sources.
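
The download-it-yourself approach described above can be sketched roughly as
follows. This is a minimal illustration in Python, not the actual
medgen-mysql code (which uses wget and a Makefile). The function names, the
example URL list, and the placeholder FTP host are all assumptions made for
this sketch.

```python
import gzip
import os
import shutil
import urllib.request
from urllib.parse import urlparse

# Hypothetical URL list for illustration only. The real project keeps its own
# list of NCBI FTP targets; the host and file names below are placeholders.
EXAMPLE_URL_LIST = """
# one URL per line; blank lines and '#' comments are ignored
ftp://ftp.example.org/pub/demo/concepts.txt.gz
ftp://ftp.example.org/pub/demo/names.txt.gz
"""


def parse_url_list(text):
    """Return the URLs in text, skipping blank lines and '#' comments."""
    urls = []
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            urls.append(line)
    return urls


def local_name(url):
    """Derive a local file name from the last path segment of the URL."""
    return os.path.basename(urlparse(url).path)


def fetch(url, dest_dir):
    """Download one URL into dest_dir on the user's machine; return the path."""
    dest = os.path.join(dest_dir, local_name(url))
    urllib.request.urlretrieve(url, dest)
    return dest


def gunzip(src):
    """Decompress src next to itself and return the path of the new file."""
    dst = src[:-3] if src.endswith(".gz") else src + ".out"
    with gzip.open(src, "rb") as fin, open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    return dst


def download_all(url_list_text, dest_dir):
    """Fetch and unzip every listed file; no content ships with the code."""
    os.makedirs(dest_dir, exist_ok=True)
    paths = []
    for url in parse_url_list(url_list_text):
        path = fetch(url, dest_dir)
        if path.endswith(".gz"):
            path = gunzip(path)
        paths.append(path)
    return paths
```

The point of the design is that only the URL list travels with the code; each
user runs something like download_all() on their own machine. Loading the
unzipped files into a local MySQL database is left out of this sketch.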


On Thu, Nov 13, 2014 at 11:18 AM, Chen, Pei <pei.c...@childrens.harvard.edu>
wrote:

> John- I believe that was the thinking.
> Andy- Just to confirm- Is the raw content of this dataset released under
> ASL2.0?  i.e. can you contribute it as a CSV or similar so that cTAKES may
> re-tokenize it using the same PTB rules, format it for cTAKES' dictionary
> lookup, etc., and then redistribute it under the same License.
>
> > -----Original Message-----
> > From: John Green [mailto:john.travis.gr...@gmail.com]
> > Sent: Thursday, November 13, 2014 1:55 PM
> > To: dev@ctakes.apache.org
> > Cc: dev@ctakes.apache.org
> > Subject: Re: Announcement: UMLS MedGen-MySQL dataset now available
> > as open access download
> >
> > The old licensed setup would be kept as a packaged option? Much as it is
> > now.... With the unlicensed going out in place of the current "free"
> > dictionary? Am I understanding that right?
> >
> >
> > JG
> > —
> > Sent from Mailbox
> >
> > On Thu, Nov 13, 2014 at 1:40 PM, andy mcmurry
> > <mcmurry.a...@gmail.com>
> > wrote:
> >
> > > > I'll crunch the numbers -- in the meantime I can tell you that
> > > > phenotypes vary by semantic type. Clinical attributes from SNOMED are
> > > > abundant, and many concepts in MeSH are mapped to diseases. There are
> > > > tons of "pharmacological substances".
> > > On Nov 12, 2014 6:19 AM, "Dligach, Dmitriy" <
> > > dmitriy.dlig...@childrens.harvard.edu> wrote:
> > >> Andy, thank you for this resource!
> > >>
> > >> Do you have an estimate of what percentage of UMLS concepts were
> > >> left out?
> > >>
> > >> Dima
> > >>
> > >>
> > >>
> > >>
> > >> On Nov 11, 2014, at 16:02, andy mcmurry <mcmurry.a...@gmail.com> wrote:
> > >>
> > >> > Hello!
> > >> >
> > >> > https://bitbucket.org/invitae/medgen-mysql (Apache Licensed ASL2)
> > >> >
> > >> > We just released a new library containing a huge chunk of UMLS
> > >> > concepts which are available without registering
> > >> > accounts/username/passwords. LEGALLY. Yes, really!
> > >> >
> > >> > The subset is from NCBI and it contains *thousands of concepts from
> > >> > SNOMED and other vocabularies*.
> > >> >
> > >> > The code is essentially
> > >> > 1. a list of WGET targets to various NCBI FTP site mirrors
> > >> > 2. Makefile for building the databases of interest
> > >> >
> > >> > Our legal team has approved distribution for Open Access work, ASL2
> > >> > LICENSE.
> > >> >
> > >> > I recommend we use this opportunity to make this the default
> > >> > distribution for CTAKES UMLS connections, because it obviates the
> > >> > need for so much painful credentialing and back and forth
> > >> > agreements with the US National Library of Medicine.
> > >> >
> > >> > Cheers!
> > >> > --Andy
> > >> >
> > >> >
> > >> > On Wed, Sep 10, 2014 at 12:13 PM, Masanz, James J.
> > >> > <masanz.ja...@mayo.edu> wrote:
> > >> >
> > >> >>
> > >> >> I would love to see the install be as simple as apt-get install to
> > >> >> end up with some working dictionaries that have more than a handful
> > >> >> of entries to get them started.
> > >> >>
> > >> >> Regards,
> > >> >> James Masanz
> > >> >>
> > >> >> -----Original Message-----
> > >> >> From: andy mcmurry [mailto:mcmurry.a...@gmail.com]
> > >> >> Sent: Tuesday, September 09, 2014 4:32 PM
> > >> >> To: ctakes-...@incubator.apache.org
> > >> >> Subject: Recommendation for ctakes default (UMLS) dictionaries
> > >> >>
> > >> >> Greetings ctakes-dev:
> > >> >>
> > >> >> *UMLS license restrictions have been getting more lax over the
> > >> >> years* -- much of the UMLS can be downloaded directly from the
> > >> >> NCBI official FTP site.
> > >> >>
> > >> >> In fact, the NIH (and implicitly the NLM) *have already made the
> > >> >> standard terms public for some medical specialities*.
> > >> >>
> > >> >> For example: Here is the UMLS subset specific to Medical Genetics
> > >> >> (MedGen) and Genetic Testing (GTR) complete with SNOMED-CT concept
> > >> >> CUI(s) and names, etc.:
> > >> >>
> > >> >> [  ftp://ftp.ncbi.nlm.nih.gov/pub/medgen/README.html  ]
> > >> >>
> > >> >> My team has developed a JVM based wrapper for MetaMap 2013AB which
> > >> >> I intend to open source soon (Clojure).  It includes REST support
> > >> >> for invoking MetaMap with any or all of the command line arguments.
> > >> >> We do not integrate with UIMA, we are basically a wrapper around
> > >> >> the binary installation of MetaMap. The emphasis is on publication
> > >> >> text not clinical text, still, some services are common (such as
> > >> >> LVG).
> > >> >>
> > >> >> Strangely, the NLM still requires UMLS licenses to download
> > >> >> MetaMap execution binaries. The MetaMap binary install is better
> > >> >> but customizing dictionaries (DataFileBuilder) is not as easy to
> > >> >> use as CTAKES with YTEX
> > >> >>
> > >> >> [ https://cwiki.apache.org/confluence/display/CTAKES/YTEX+Installation ]
> > >> >>
> > >> >> *** Hence, there is a real opportunity here to enable Apache
> > >> >> cTAKES to have a stronger default dictionary. ** *
> > >> >>
> > >> >> Imagine if we could
> > >> >> *$ apt-get install apache-ctakes *
> > >> >>
> > >> >> and instantly have a working package for SOME problem domain.
> > >> >> In my case (Medical Genetics) the UMLS definitions are already
> > >> >> available and the UMLS license problem becomes a non-issue, at
> > >> >> least for many first-time users.
> > >> >>
> > >> >> Your thoughts?
> > >> >> AndyMC
> > >> >>
> > >>
> > >>
>
