Good dependency parsers are hard to find; moreover, good dependency parsers trained on clinical data are impossible to find. I don't think there is a dependency parser trained on clinical data other than cTAKES's. In general, state-of-the-art dependency parsing is resource-intensive, and the models are also fairly large.

--
Guergana Savova, PhD, FACMI
Associate Professor
PI Natural Language Processing Lab
Boston Children's Hospital and Harvard Medical School
300 Longwood Avenue
Mailstop: BCH3092
Enders 144.1
Boston, MA 02115
Tel: (617) 919-2972
Fax: (617) 730-0817
[email protected]
Harvard Scholar: http://scholar.harvard.edu/guergana_k_savova/biocv
http://ctakes.apache.org
http://thyme.healthnlp.org
http://cancer.healthnlp.org
http://share.healthnlp.org
http://center.healthnlp.org

-----Original Message-----
From: Finan, Sean [mailto:[email protected]]
Sent: Tuesday, June 27, 2017 4:07 PM
To: [email protected]
Subject: RE: Proposed improvements [EXTERNAL] [SUSPICIOUS]

Hi all,

> I would like to have (and work on it) much leaner distribution

One bigfoot is the clearparser_models.jar in ctakes-dependency-parser-res. As far as I know, it is not used by default or in any checked-in non-default configuration. As it is 1/4 GB, I would like to move it to its own module to keep it out of projects that use ctakes "as a library". I hunted the net to see if a duplicate is available elsewhere for alternative inclusion methods but couldn't find one.

Thoughts?

Thanks,
Sean

-----Original Message-----
From: Andrey Kurdumov [mailto:[email protected]]
Sent: Sunday, June 25, 2017 1:52 AM
To: cTakes developers list
Subject: Re: Proposed improvements [EXTERNAL]

Just want to note that the ASF PMC wants to make GitHub the primary repository and the Apache servers secondary soon.

Regarding improvements: I personally want better support for embedding. Right now the cTakes distribution comes with LVG and the UMLS dictionary, and the size of cTakes thus becomes very large. I would like to have (and work on) a much leaner distribution, let's name it cTakes Core, which would just provide the cTakes executable without the need for data. Right now I constantly have to strip out that data after the cTakes build, which slows down my build significantly.

Personally, I support Hadrian's initiative to have better logging, since cTakes setup has some quirks that could be resolved faster with better logging.

2017-06-23 17:38 GMT+06:00 Miller, Timothy < [email protected]>:

> Thanks Hadrian, I hadn't heard of OSEHRA but it looks interesting and
> like something where we should be making people aware of cTAKES!
>
> svn vs. git -- I'm with you on preferring git, but not by so much that
> it's worth spending time on an argument if it turns into an argument
> :). As far as I know we've never really had a discussion about it.
> It's probably getting to the point where new developers have _only_
> used git and would find it a complete roadblock to use svn but for me
> it's just a mild annoyance.
>
> All others you mentioned -- if you are willing to contribute a patch
> we are happy to accept one-off contributions, and we are also
> interested in growing the developer community with people who are
> interested in contributing regularly over time.
>
> Tim
>
> ________________________________________
> From: Hadrian Zbarcea <[email protected]>
> Sent: Thursday, June 22, 2017 9:14 PM
> To: [email protected]
> Subject: Proposed improvements [EXTERNAL]
>
> Last week I presented at the OSEHRA Summit about ActiveMQ (and a few
> other projects) and the ASF in general.
>
> I was surprised that most didn't know much about the ASF and more
> importantly that nobody knew about cTakes, the only (directly)
> healthcare related project at the ASF. There was no cTakes talk at
> ApacheCon in Miami, but at OSEHRA, which is all about healthcare, we
> should have had a presence. I will probably submit a talk for next
> year, but until then, because I think I created a bit of interest in
> cTakes, I went to build cTakes myself and try a few things.
>
> Some of my findings are:
> * test failures with openjdk; granted the docs mention oracle jdk as a
> prerequisite, but I think it's easy to support openjdk
> * use of svn vs git; this is a debatable topic, but by now everybody
> and their uncles are on git so moving to git (which I'd recommend)
> would probably foster adoption (yes, I know about the github mirror)
> * no support for OSGi, many large players use it
> * improvements in logging could go a long way, starting with moving to
> slf4j
>
> Suggesting improvements implies that I volunteer to do a good chunk of
> the work, but before that I'm interested more in how much the
> community would welcome such improvements.
> I am curious which items are
> considered lower-hanging fruit; for the more controversial topics
> we could take them to [discuss] threads. Because every community has
> its own culture and I am not that familiar with the cTakes one,
> although I went through the mail archives, I thought a prudent first step
> would be to start with this.
>
> Feedback appreciated,
> Hadrian
>
