Re: Proposed improvements [EXTERNAL] [SUSPICIOUS]

2017-06-27 Thread Miller, Timothy
Yeah, actually, I have no idea why that's there. All the actual default parser 
models are in their own directories (dependency, srl, etc.). This almost looks 
like just a collection of additional models, which the average user would have 
no idea how to use and which take up a lot of space.
Tim


From: Finan, Sean 
Sent: Tuesday, June 27, 2017 10:07 PM
To: dev@ctakes.apache.org
Subject: RE: Proposed improvements [EXTERNAL] [SUSPICIOUS]

Hi all,

> I would like to have (and work on) a much leaner distribution
One bigfoot is the clearparser_models.jar in ctakes-dependency-parser-res.  As 
far as I know, it is not used by default or in any checked-in non-default 
configuration.  Since it is 1/4 GB, I would like to move it to its own module to 
keep it out of projects that use ctakes "as a library".  I searched the net to 
see whether a copy is available elsewhere, which would allow alternative ways of 
including it, but couldn't find one.

Thoughts?

Thanks,
Sean

-----Original Message-----
From: Andrey Kurdumov [mailto:kant2...@googlemail.com]
Sent: Sunday, June 25, 2017 1:52 AM
To: cTakes developers list
Subject: Re: Proposed improvements [EXTERNAL]

Just want to note that the ASF PMC wants to make GitHub the primary repository, 
with Apache servers secondary, soon.

Regarding improvements:
I personally want better support for embedding. Right now the cTakes distribution 
comes with LVG and the UMLS dictionary, so cTakes has become very large.
I would like to have (and work on) a much leaner distribution, let's name it 
cTakes Core, which would just provide the cTakes executable without the data.
Right now I constantly have to rip out that data after each cTakes build, which 
slows down my build significantly.

Personally, I support Hadrian's initiative for better logging, since cTakes 
setup has some quirks that could be resolved faster with better logging.


2017-06-23 17:38 GMT+06:00 Miller, Timothy <timothy.mil...@childrens.harvard.edu>:

> Thanks Hadrian, I hadn't heard of OSEHRA but it looks interesting and
> like something where we should be making people aware of cTAKES!
>
> svn vs. git -- I'm with you on preferring git, but not by so much that
> it's worth spending time on an argument if it turns into an argument
> :). As far as I know we've never really had a discussion about it.
> It's probably getting to the point where new developers have _only_
> used git and would find having to use svn a complete roadblock, but for
> me it's just a mild annoyance.
>
> As for the others you mentioned -- if you are willing to contribute a
> patch, we are happy to accept one-off contributions, and we are also
> interested in growing the developer community with people who are
> interested in contributing regularly over time.
>
> Tim
>
> 
> From: Hadrian Zbarcea 
> Sent: Thursday, June 22, 2017 9:14 PM
> To: dev@ctakes.apache.org
> Subject: Proposed improvements [EXTERNAL]
>
> Last week I presented at the OSEHRA Summit about ActiveMQ (and a few
> other projects) and the ASF in general.
>
> I was surprised that most didn't know much about the ASF and, more
> importantly, that nobody knew about cTakes, the only (directly)
> healthcare-related project at the ASF. There was no cTakes talk at
> ApacheCon in Miami, but at OSEHRA, which is all about healthcare, we
> should have had a presence. I will probably submit a talk for next
> year, but until then, because I think I created a bit of interest in
> cTakes, I built cTakes myself to try a few things.
>
> Some of my findings are:
> * test failures with OpenJDK; granted, the docs list the Oracle JDK as a
> prerequisite, but I think it would be easy to support OpenJDK
> * use of svn vs git; this is a debatable topic, but by now everybody
> and their uncles are on git, so moving to git (which I'd recommend)
> would probably foster adoption (yes, I know about the GitHub mirror)
> * no support for OSGi, which many large players use
> * improvements in logging could go a long way, starting with moving to
> slf4j
>
> Suggesting improvements implies that I volunteer to do a good chunk of
> the work, but before that I'm more interested in how much the
> community would welcome such improvements. I am curious which ones are
> considered low-hanging fruit; the more controversial topics we could
> take to [discuss] threads. Because every community has its own culture
> and I am not that familiar with the cTakes one, although I went through
> the mail archives, I thought a prudent first step would be to start
> with this.
>
> Feedback appreciated,
> Hadrian
>


Re: Proposed improvements [EXTERNAL]

2017-06-27 Thread Hadrian Zbarcea
Speaking of lvg, does anybody know where the source code for 
lvgdist-2016.0.jar is?


Thanks,
Hadrian



On 06/26/2017 11:04 AM, Finan, Sean wrote:

Hi Andrey,

Thank you for the input.  Thank you also Hadrian.

With regard to a smaller ctakes, I know that a couple of people (including 
yours truly) are currently working on trimming some fat.  A few areas have been 
targeted, with the old/huge umls dictionary being at the top of the list.  It 
is deprecated and only used in a few tests.  Lvg is also used in a few test 
configurations, but I am unsure of its necessity.

As far as a "ctakes core" ... I have been trying to figure out a smart way to separate 
the default clinical pipeline modules from the others, making the others optional.  I already have a 
pom for clinical that does not include relation, temporal, coref and, very importantly, ytex ... as 
those are not part of the default clinical pipeline.  What has me stalled is figuring out 
how and where to provide a simple mechanism for people to grab the more advanced modules.  A while ago 
I put a project pom in sandbox under "ctakes the api" or something to that effect.  It is 
basically a pom with the advanced modules commented out.  A developer could start with that pom as 
their project main, then uncomment modules as needed.  It was a first ten-minute attempt at 
something simple and, while worth a try, not an ideal solution.

Another idea that I have been tossing around is moving tests into separate modules, 
and possibly "training" as well.  It is standard practice to 
keep parallel src/ and test/ directories in a repository, and this kind of follows that 
thinking.  Many of the tests (such as those mentioned above) require/use modules and resources 
that are not actually required to build the source.  The same goes for possible examples. 
I think that the same could be true for training - if not now, perhaps in the future. 
Again, I am held up on the best way to actually do this while keeping things simple with respect to Maven 
and avoiding excess complexity.  The last thing that I want to do is make ctakes more 
difficult to use.

Maybe OSGi can help with the above, but I'm honestly not sure how.  If anybody else 
thinks that it can, then I am going to let them handle it.  Perhaps I am just 
jaded.  Years ago my previous company had great hopes for OSGi and invested a 
lot of time (=money) into applying it to our applications.  Over a million 
dollars later, the consensus was that OSGi couldn't be applied to our applications 
without completely rewriting infrastructure - which was an absolute no-go - and 
that even if it could just be slapped on overnight, it would do nothing for us or our 
customers.

With regard to better logging, I think that James added some more detailed logging for 
the 4.0 release, and I think that he has a few more areas slated.  More logging 
statements exist at finer levels than "info", and they can be seen by changing 
the log4j configuration.  As for changing the entire codebase to slf4j, I may be alone in this, 
but I'm not sure how that change by itself would make ctakes any more transparent.

With regard to ctakes setup having some quirks ... yup.  Known issue to a lot of us.  Documentation 
was improved for the 4.0 release, but "run anywhere" documentation is difficult to both 
create and maintain.  Several ideas have been tossed around, including installation scripts, an 
"environment/setup investigation/confirmation" GUI, or something like a running FAQ/blog 
devoted to installation problems and solutions.

Sean


RE: Proposed improvements [EXTERNAL]

2017-06-27 Thread Finan, Sean
Hi all,

> I would like to have (and work on) a much leaner distribution
One bigfoot is the clearparser_models.jar in ctakes-dependency-parser-res.  As 
far as I know, it is not used by default or in any checked-in non-default 
configuration.  Since it is 1/4 GB, I would like to move it to its own module to 
keep it out of projects that use ctakes "as a library".  I searched the net to 
see whether a copy is available elsewhere, which would allow alternative ways of 
including it, but couldn't find one.

Thoughts?

Thanks,
Sean



RE: Annotating Lab data [EXTERNAL]

2017-06-27 Thread Finan, Sean
Hi Tanmay,

Good question.  The short answer to
>Does this AE contain annotator to annotate Lab data?
is "no".

> can someone suggest any different annotator that could identify lab values.

Kean Kaufmann was kind enough to contribute an annotator that creates 
labmentions.  A great description plus a .zip with the code can be found at 
https://issues.apache.org/jira/browse/CTAKES-441
I need to check it in to trunk for the next release ... I just assigned it to 
myself so that I don't forget.  :^(

Somebody out there may have another engine or post-processor that does.  

It should also be pretty easy to make your own "post-processor" annotator if 
Kean's doesn't fit your purposes.  All of the concepts are stored with a 
semantic type that can be used to determine whether a medication, 
finding, etc. is what you consider a LabMention.  Use 
OntologyConceptUtil.getTuis(..) or OntologyConceptUtil.getAnnotationsByTui(..) 
to help you with this.  
http://ctakes.apache.org/apidocs/4.0.0/ 
A semantic type to tui map is here:
https://mmtx.nlm.nih.gov/MMTx/semanticTypes.shtml
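A minimal sketch of the filtering step such a post-processor might perform, assuming that "lab" means the UMLS semantic types T059 (Laboratory Procedure) and T034 (Laboratory or Test Result). It deliberately avoids cTAKES classes so it runs standalone: the Mention class is a hypothetical stand-in for an IdentifiedAnnotation whose TUIs you would get from OntologyConceptUtil.getTuis(..), and turning the matches into real LabMention annotations in the CAS is not shown.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class LabTuiFilter {

    // Hypothetical stand-in for an IdentifiedAnnotation: covered text plus its semantic type ids (TUIs).
    static final class Mention {
        final String text;
        final Set<String> tuis;

        Mention(String text, String... tuis) {
            this.text = text;
            this.tuis = new HashSet<>(Arrays.asList(tuis));
        }
    }

    // Assumption: the UMLS semantic types treated as "lab" in this sketch.
    static final Set<String> LAB_TUIS = Collections.unmodifiableSet(new HashSet<>(Arrays.asList(
            "T059",    // Laboratory Procedure
            "T034"))); // Laboratory or Test Result

    // Returns the covered text of every mention that carries at least one lab TUI.
    static List<String> labMentions(List<Mention> mentions) {
        List<String> labs = new ArrayList<>();
        for (Mention m : mentions) {
            if (!Collections.disjoint(m.tuis, LAB_TUIS)) {
                labs.add(m.text);
            }
        }
        return labs;
    }

    public static void main(String[] args) {
        // TUIs below are illustrative, not actual dictionary output.
        List<Mention> mentions = Arrays.asList(
                new Mention("hemoglobin", "T116", "T059"),
                new Mention("magnesium", "T196", "T121"),
                new Mention("INR", "T059"));
        System.out.println(labMentions(mentions)); // [hemoglobin, INR]
    }
}
```

Which TUIs count as "lab" is a judgment call; the semantic type map linked above is a reasonable place to settle on a set.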

Sean


-----Original Message-----
From: Das, Tanmay [mailto:tanmay@optum.com] 
Sent: Tuesday, June 27, 2017 3:39 AM
To: dev@ctakes.apache.org
Subject: Annotating Lab data [EXTERNAL]

Hi,

When using the CVD bundled with cTAKES along with 
AggregatePlainTextFastUMLSProcessor, I found that no laboratory data was 
annotated, even though the input contained lab values.
For an input like:
LABORATORY DATA:
Hemoglobin 10.6, hematocrit 31.7, white cell count 5.8, platelet 377.
Magnesium 2.6, glucose 98, BUN 13, creatinine 0.5, sodium 138, potassium 3.9, 
chloride 103. INR is 1.5.
The IdentifiedAnnotations classified them as Medication, Procedures, etc., but not 
as LabMention.
Does this AE contain an annotator to annotate lab data? If not, can someone 
suggest a different annotator that could identify lab values?


This e-mail, including attachments, may include confidential and/or proprietary 
information, and may be used only by the person or entity to which it is 
addressed. If the reader of this e-mail is not the intended recipient or his or 
her authorized agent, the reader is hereby notified that any dissemination, 
distribution or copying of this e-mail is prohibited. If you have received this 
e-mail in error, please notify the sender by replying to this message and 
delete this e-mail immediately.


Annotating Lab data

2017-06-27 Thread Das, Tanmay
Hi,

When using the CVD bundled with cTAKES along with 
AggregatePlainTextFastUMLSProcessor, I found that no laboratory data was 
annotated, even though the input contained lab values.
For an input like:
LABORATORY DATA:
Hemoglobin 10.6, hematocrit 31.7, white cell count 5.8, platelet 377.
Magnesium 2.6, glucose 98, BUN 13, creatinine 0.5, sodium 138, potassium 3.9,
chloride 103. INR is 1.5.
The IdentifiedAnnotations classified them as Medication, Procedures, etc., but not 
as LabMention.
Does this AE contain an annotator to annotate lab data? If not, can someone 
suggest a different annotator that could identify lab values?
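For illustration only, a naive sketch of pulling "name value" pairs out of text like the example above with a regular expression. This is not Kean's CTAKES-441 annotator or any cTAKES component; the class name and pattern are hypothetical. A pattern this loose will also match non-lab phrases (for example "age 45"), so in practice it would need to be paired with a concept lookup such as the semantic-type check discussed in this thread.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LabValueSketch {

    // A name (starts with a letter, then letters/spaces), an optional "is", then a number like 98 or 10.6.
    private static final Pattern NAME_VALUE =
            Pattern.compile("([A-Za-z][A-Za-z ]*?)\\s+(?:is\\s+)?(\\d+(?:\\.\\d+)?)");

    // Returns name -> numeric value for each "name value" pair found, in order of appearance.
    static Map<String, Double> extract(String text) {
        Map<String, Double> values = new LinkedHashMap<>();
        Matcher m = NAME_VALUE.matcher(text);
        while (m.find()) {
            values.put(m.group(1).trim(), Double.parseDouble(m.group(2)));
        }
        return values;
    }

    public static void main(String[] args) {
        String labs = "Hemoglobin 10.6, hematocrit 31.7, white cell count 5.8, platelet 377.\n"
                + "Magnesium 2.6, glucose 98, BUN 13, creatinine 0.5, sodium 138, potassium 3.9,\n"
                + "chloride 103. INR is 1.5.";
        extract(labs).forEach((name, value) -> System.out.println(name + " = " + value));
    }
}
```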

