RE: cTakes Annotation Comparison

2014-12-19 Thread Savova, Guergana
Several thoughts:
1. The ShARe corpus annotates only mentions of type Diseases/Disorders and only
Anatomical Sites associated with a Disease/Disorder. This is by design. cTAKES 
annotates all mentions of types Diseases/Disorders, Signs/Symptoms, Procedures, 
Medications and Anatomical Sites. Therefore you will get MANY more annotations 
with cTAKES. Eventually the ShARe corpus will be expanded to the other types.

2. Keeping (1) in mind, you can estimate the approximate precision/recall/F1
of cTAKES on the ShARe corpus if you output only mentions of type
Disease/Disorder.

3. Could you send us the list of ShARe files you use for testing? We have the
corpus and would like to run against them as well.
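Point 2 amounts to filtering the cTAKES output down to Disease/Disorder mentions before scoring, so that correct Sign/Symptom, Procedure or Medication mentions are not all counted as false positives. A minimal sketch (Java 16+), assuming the annotations have already been exported into a hypothetical `Mention` record and that a true positive requires the same type, CUI and character offsets; the type label passed in is illustrative, not a cTAKES type name:

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

/** Hypothetical annotation record: semantic type label, UMLS CUI and character offsets. */
record Mention(String type, String cui, int begin, int end) {}

/** Scores system output against a gold standard after keeping only one semantic type. */
final class TypeFilteredScorer {

    record Scores(double precision, double recall, double f1) {}

    static Scores score(List<Mention> gold, List<Mention> system, String keepType) {
        // Drop system mentions of types the gold standard never annotates;
        // otherwise every Procedure or Medication mention would count as a false positive.
        Set<Mention> sys = system.stream()
                .filter(m -> m.type().equals(keepType))
                .collect(Collectors.toSet());
        Set<Mention> ref = new HashSet<>(gold);

        // Record equality means same type, CUI, begin and end (an exact match).
        Set<Mention> truePositives = new HashSet<>(sys);
        truePositives.retainAll(ref);

        double tp = truePositives.size();
        double precision = sys.isEmpty() ? 0.0 : tp / sys.size();
        double recall = ref.isEmpty() ? 0.0 : tp / ref.size();
        double f1 = (precision + recall) == 0.0 ? 0.0 : 2 * precision * recall / (precision + recall);
        return new Scores(precision, recall, f1);
    }
}
```

The estimate stays approximate, as point 2 says; the filter only removes the most obvious source of spurious false positives.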

Hope this makes sense...
--Guergana

-Original Message-
From: Bruce Tietjen [mailto:bruce.tiet...@perfectsearchcorp.com] 
Sent: Friday, December 19, 2014 1:16 PM
To: dev@ctakes.apache.org
Subject: Re: cTakes Annotation Comparison

Our analysis against the human-adjudicated gold standard from the ShARe corpus
uses a simple check to see whether the cTAKES output includes the annotation
specified by the gold standard. The initial results I reported were for exact
matches of CUI and text span. Only exact matches were counted.

If we also count as matches those cTAKES annotations with a matching CUI and
a text span that overlaps the gold standard text span, the matches increase
to 224 for the FastUMLS pipeline and 2319 for the old pipeline.
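The two counting rules described above can be sketched as follows: an exact match needs the same CUI and identical offsets, while the relaxed rule accepts the same CUI with any overlap of the half-open spans [begin, end). The `Span` record and `GoldMatcher` class are hypothetical names, not the evaluation code used in this thread (Java 16+):

```java
import java.util.List;

/** Hypothetical annotation: a UMLS CUI over the half-open character span [begin, end). */
record Span(String cui, int begin, int end) {

    /** Same CUI and identical offsets. */
    boolean exactMatch(Span other) {
        return cui.equals(other.cui) && begin == other.begin && end == other.end;
    }

    /** Same CUI and the two half-open spans share at least one character. */
    boolean overlapMatch(Span other) {
        return cui.equals(other.cui) && begin < other.end && other.begin < end;
    }
}

final class GoldMatcher {

    /** Counts gold annotations matched by at least one system annotation. */
    static long countMatched(List<Span> gold, List<Span> system, boolean allowOverlap) {
        return gold.stream()
                .filter(g -> system.stream().anyMatch(
                        s -> allowOverlap ? g.overlapMatch(s) : g.exactMatch(s)))
                .count();
    }
}
```

Because an exact match is also an overlap for non-empty spans, the relaxed count can only stay the same or grow.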

The question was also asked about annotations in the cTAKES output that were
not in the human-adjudicated gold standard. The answer is yes: cTAKES made a lot
of additional annotations that don't appear to be in the gold standard. We
haven't analyzed that yet, but it looks like the gold standard we are using may
only have Disease_Disorder annotations.



IMAT Solutions <http://imatsolutions.com>
Bruce Tietjen, Senior Software Engineer
Mobile: 801.634.1547
bruce.tiet...@imatsolutions.com

On Fri, Dec 19, 2014 at 9:54 AM, Miller, Timothy < 
timothy.mil...@childrens.harvard.edu> wrote:
>
> Thanks Kim,
> This sounds interesting, though I don't totally understand it. Are you
> saying that extraction performance for a given note depends on where
> the note was in the processing queue? If so, that's pretty bad!
> If you (or anyone else who understands this issue) have a concrete
> example, I think that might help me understand what the problem is/was.
>
> Even though, as Pei mentioned, we are going to try moving the
> community to the faster dictionary, I would like to understand this better,
> both to help myself avoid issues of this type going forward and to
> verify that the new dictionary doesn't use similar logic.
>
> Also, when we finish annotating the sample notes, might we use them as
> a point of comparison for the two dictionaries? That would get around
> the issue that not everyone has access to the datasets we used for
> validation, and others are likely not able to share theirs either. And
> maybe we can replicate the notes if we want to simulate the scenario
> Kim is talking about with thousands or more notes.
>
> Tim
>
>
> On 12/19/2014 10:24 AM, Kim Ebert wrote:
> Guergana,
>
> I'm curious about the number of records in your gold standard
> sets, and whether your gold standard set was run through a long-running
> cTAKES process.
> I know at some point we fixed a bug in the old dictionary lookup that
> caused the permutations to become corrupted over time. Typically this
> isn't seen in the first few records, but over time, as patterns are
> used, the permutations become corrupted. This caused documents
> that were fed through cTAKES more than once to have fewer codes
> returned than the first time.
>
> For example, if a permutation of 4,2,3,1 was found, the permutation
> would be corrupted to 1,2,3,4. It would no longer be possible to
> detect permutations of 4,2,3,1 until cTAKES was restarted. We got the
> fix in after the cTAKES 3.2.0 release.
> https://issues.apache.org/jira/browse/CTAKES-310
> Depending upon the corpus size, I could see the permutation engine
> eventually having only a single permutation of 1,2,3,4.
>
> Typically, though, this isn't easily detected in the first 100 or
> so documents.
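The actual change behind CTAKES-310 is not reproduced here. As a hypothetical illustration of the bug class Kim describes, the sketch below caches permutations and shares them across documents; a consumer that sorts a cached permutation in place turns 4,2,3,1 into 1,2,3,4 for every later caller until the cache (i.e., the process) is recreated, while sorting a copy leaves the cache intact. All names are invented for the example:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical illustration of the bug class (not the cTAKES source): permutations
 * of token positions are generated once per window size and shared across documents.
 */
final class PermutationCache {
    private final Map<Integer, List<List<Integer>>> cache = new HashMap<>();

    List<List<Integer>> permutations(int n) {
        return cache.computeIfAbsent(n, PermutationCache::generate);
    }

    /** Buggy: sorts the shared cached permutation in place, so every later document sees 1,2,...,n. */
    List<Integer> sortedInPlace(int n, int index) {
        List<Integer> perm = permutations(n).get(index);
        Collections.sort(perm);
        return perm;
    }

    /** Fixed: sorts a private copy and leaves the cache untouched. */
    List<Integer> sortedCopy(int n, int index) {
        List<Integer> copy = new ArrayList<>(permutations(n).get(index));
        Collections.sort(copy);
        return copy;
    }

    /** All permutations of 1..n in lexicographic order. */
    private static List<List<Integer>> generate(int n) {
        List<List<Integer>> out = new ArrayList<>();
        permute(new ArrayList<>(), new boolean[n + 1], n, out);
        return out;
    }

    private static void permute(List<Integer> prefix, boolean[] used, int n, List<List<Integer>> out) {
        if (prefix.size() == n) {
            out.add(new ArrayList<>(prefix));
            return;
        }
        for (int i = 1; i <= n; i++) {
            if (!used[i]) {
                used[i] = true;
                prefix.add(i);
                permute(prefix, used, n, out);
                prefix.remove(prefix.size() - 1);
                used[i] = false;
            }
        }
    }
}
```

This also suggests why the problem only shows up in long-running processes: each corrupted entry persists for the lifetime of the cache.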
>
> We discovered this issue when we worked on making cTAKES produce consistent
> output of codes in our system.
>
> IMAT Solutions <http://imatsolutions.com>
> Kim Ebert
> Software Engineer
> Office: 801.669.7342
> kim.eb...@imatsolutions.com
> On 12/19/2014 07:05 AM, Savova, Guergana wrote:
>
> We are doing a similar kind of evaluation and will report the results.
>
> Before we release

RE: cTakes polarity problem

2014-12-31 Thread Savova, Guergana
cTAKES also implements a rule-based approach to the negation/polarity problem. 
It was the default until the latest release. You are free to use the rule-based 
implementation and compare results with the ML approach.
--Guergana

-Original Message-
From: Michael J Gurley [mailto:m-gur...@northwestern.edu] 
Sent: Wednesday, December 31, 2014 11:22 AM
To: dev@ctakes.apache.org
Subject: Re: cTakes polarity problem

I think this demonstrates that machine learning is not the right approach to 
the negation/polarity problem.


Michael Gurley
m-gur...@northwestern.edu
312 925 3268
Northwestern University Clinical and Translational Sciences Institute
(NUCATS)
http://www.nucats.northwestern.edu
Rubloff Building
750 N Lake Shore Drive, 11th Floor
Chicago, IL 60611

On 12/31/14 9:13 AM, "Miller, Timothy" wrote:

>Hi Yu,
>
>The new polarity module is machine-learning based so it is not always 
>easy to diagnose accuracy issues. But generally it might mean there was 
>no example like that in the training data. It was trained on multiple 
>corpora, but sometimes certain phrases slip through the cracks, and 
>"Deny hepatitis," while possible in the truncated language of clinical 
>notes, seems like an unlikely phrase and so it may not be in our data.
>Is that a real example you saw or just a minimum (not) working example?
>If not, do you have a real example (i.e., a whole sentence) where "deny"
>should cause a negation but does not? If so, I will look into it. We
>have had a few reports like this, so it may be worth keeping track of
>missed examples for future iterations of the module. It is important
>that they be real examples "from the wild," though.
>
>(As an aside, machine learning methods don't understand language the 
>way people do so even if it seems obvious to a human that "Deny ."
>should be negated, if it looks different enough from the context of an 
>example from the training data the ML will sometimes fall back to the 
>majority class of "Not negated".)
>
>Tim
>
>
>On 12/31/2014 10:03 AM, Yu Liang wrote:
>> I have a quick question about CTAKES.
>> I am using the AE "AggregatePlaintextUMLSProcessor.xml" and want to get
>>some negation results by referring to the polarity attribute.
>> However, it turns out that, for example, "Negative for hepatitis" is not
>>negated. I think that is weird. I tried "No hepatitis" and "Denies
>>hepatitis", which return "polarity=-1", but "Deny hepatitis." returns
>>"polarity=1".
>>
>> Could anyone give me a clue as to what is wrong? Thank you!
>
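For comparison with the ML module, the rule-based idea (as in NegEx) is to look for a negation trigger shortly before the concept. The following is a deliberately tiny sketch, not the cTAKES rule-based implementation: the trigger list and the five-token window are assumptions chosen for illustration, and the return values mimic the polarity values quoted above:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

/** Tiny NegEx-style sketch: a concept is negated if a trigger phrase appears shortly before it. */
final class TriggerNegation {
    // Hypothetical, minimal pre-negation triggers; real NegEx and cTAKES lists are much larger.
    private static final List<String> PRE_TRIGGERS =
            List.of("negative for", "no", "denies", "deny", "denied", "without");
    // Assumed scope: only the last five tokens before the concept are inspected.
    private static final int WINDOW_TOKENS = 5;

    /** Returns -1 if negated, 1 otherwise (the polarity values used in this thread). */
    static int polarity(String sentence, String concept) {
        String lower = sentence.toLowerCase(Locale.ROOT);
        int conceptStart = lower.indexOf(concept.toLowerCase(Locale.ROOT));
        if (conceptStart < 0) {
            throw new IllegalArgumentException("concept not found: " + concept);
        }
        String[] before = lower.substring(0, conceptStart).trim().split("\\s+");
        int from = Math.max(0, before.length - WINDOW_TOKENS);
        // Pad with spaces so triggers only match whole tokens ("no" must not match "known").
        String window = " " + String.join(" ", Arrays.copyOfRange(before, from, before.length)) + " ";
        for (String trigger : PRE_TRIGGERS) {
            if (window.contains(" " + trigger + " ")) {
                return -1;
            }
        }
        return 1;
    }
}
```

Real NegEx-style rules also cover post-concept triggers, pseudo-triggers such as "no increase", and scope terminators such as "but", all of which this sketch omits. Unlike an ML model, it negates "Deny hepatitis." simply because "deny" is on the list.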



RE: Negex

2015-01-05 Thread Savova, Guergana
Yes, they were added in the rule-based implementation. You can still use it if 
you'd like.
--Guergana


-Original Message-
From: Green, John [mailto:john.gr...@usuhs.edu] 
Sent: Monday, January 05, 2015 12:59 PM
To: dev@ctakes.apache.org
Subject: Negex

Hi all - Does anyone know off the top of their head whether the NegEx trigger
rules included in the original 2009 Python script were added to when NegEx was
implemented in cTAKES?

Thanks,
John


RE: Medical de-identification

2015-03-22 Thread Savova, Guergana
Agreed - sounds very good!
--guergana

From: britt fitch [mailto:britt.fi...@wiredinformatics.com]
Sent: Sunday, March 22, 2015 11:59 AM
To: dev@ctakes.apache.org
Cc: Rohit Shinde
Subject: Re: Medical de-identification

Sounds good.

Starting with some references:
Docs: https://open.med.harvard.edu/wiki/display/SCRUBBER/3.X
Publication: 
http://www.biomedcentral.com/1472-6947/13/112/abstract
  (check out the supplemental material as well for additional details on 
running and improvements)
SVN (old, standalone, Scrubber v.3.x): 
https://open.med.harvard.edu/wiki/display/SCRUBBER/Software
SVN (initial apache port to ctakes sandbox): 
https://svn.apache.org/repos/asf/ctakes/sandbox/ctakes-scrubber-deid/

The project started off as a standalone process and became a UIMA pipeline 
(outside of ctakes).
The plan had always been to port this to an optional ctakes module but we never 
got that fully implemented.

Some of the parts that need the most attention to get going:

  *   working with the cTAKES type system
  *   replacing Weka (ML library) with an Apache License 2.0-friendly library
  *   a simpler process for building the models.

Regarding knowledge, it's good to be familiar with Java, UIMA, decision trees,
and cTAKES, likely in that order.

While this is still in the sandbox and you are still getting familiar with
running it as a standalone app, feel free to ping me and Andy off-list if that's
more convenient.
Then we can definitely bring it back to the dev list while getting it running
in cTAKES.

Cheers,

Britt

Britt Fitch
Wired Informatics
265 Franklin St Ste 1702
Boston, MA 02110
http://wiredinformatics.com
britt.fi...@wiredinformatics.com

On Mar 20, 2015, at 7:57 PM, andy mcmurry <mcmurry.a...@gmail.com> wrote:

Britt et al.: here is a student named Rohit who is interested in getting the
de-identification pipeline running again. Hoping there is still interest in
getting this going in cTAKES for real. Comments?
-- Forwarded message --
From: "Rohit Shinde" <rohit.shinde12...@gmail.com>
Date: Mar 20, 2015 5:02 AM
Subject: Re: Medical de-identification
To: "andy mcmurry" <mcmurry.a...@gmail.com>
Cc:

I would certainly be interested in "production grade code". The project
also sounds interesting. How do I start working on it? I know Java well.
What else would I need to know before starting on this project?

On Fri, Mar 20, 2015 at 12:44 PM, andy mcmurry <mcmurry.a...@gmail.com>
wrote:


Yes, the project is in Java; the code was written for a research project
and never made into "production grade code". If you are interested, we
would like to turn the scrubber into a solid pipeline. It is 100% Java
programming, with the Colt statistical library.
On Mar 19, 2015 7:52 PM, "Rohit Shinde" <rohit.shinde12...@gmail.com>
wrote:


Hi Andy,

Could you please tell me more about that project? I would really like a
reply.

Thank you,
Rohit Shinde

On Wed, Mar 18, 2015 at 5:51 PM, Rohit Shinde <
rohit.shinde12...@gmail.com> wrote:


Hi Andy,

I am interested in medical de-identification. I would like to know what
this project consists of. Is it partially implemented, or does the
implementation need to start?

What languages would I need to know? What theoretical background would I
need? Also, how complex would this task be? What parts of OpenNLP does this
project use?

Thank you,
Rohit Shinde




RE: Request for help:: NCBO Ontology Extraction Tool for i2b2

2015-04-27 Thread Savova, Guergana
Hi Sekhar,
You'd want to be on the i2b2 mailing list, not the cTAKES mailing list.
--Guergana

-Original Message-
From: Miller, Timothy [mailto:timothy.mil...@childrens.harvard.edu] 
Sent: Monday, April 27, 2015 7:57 AM
To: dev@ctakes.apache.org
Subject: RE: Request for help:: NCBO Ontology Extraction Tool for i2b2

Sekhar,
You seem to be on the wrong email list.
Tim


From: Hari, Sekhar [sekhar.h...@cgi.com]
Sent: Monday, April 27, 2015 7:50 AM
To: dev@ctakes.apache.org
Subject: RE: Request for help:: NCBO Ontology Extraction Tool for i2b2

Hello there - Any luck on extracting and processing these ontologies,
particularly OAE, SSE, and OVAE?

Many thanks,
Sekhar H.

-Original Message-
From: Hari, Sekhar
Sent: Friday, April 24, 2015 11:45 AM
To: dev@ctakes.apache.org
Subject: RE: Request for help:: NCBO Ontology Extraction Tool for i2b2

I checked only the 4 ontologies that I mentioned in my email. On this site,
https://urldefense.proofpoint.com/v2/url?u=http-3A__i2b2.bioontology.org_&d=BQIGaQ&c=qS4goWBT7poplM69zy_3xhKwEW14JZMSdioCoppxeFU&r=Heup-IbsIg9Q1TPOylpP9FE4GTK-OqdTDRRNQXipowRLRjx0ibQrHEo8uYx6674h&m=qiYabyWg17ViMroLBEPKvb0M8-0Z-JpASggJzpQMMNE&s=o9OW8Ggj5Sf1Qj9MP74B-vyof_EDsrOWZhRPNHFNsh0&e=
I see that you have submitted a number of final metadata files for different
ontologies. I am not familiar enough with the Extraction and Processing programs
to modify them, so I asked the group in the hope that somebody can extract and
process the final metadata files for these ontologies.

WHO-ART:

For this one, the problem is that the Processing program dies with the "GC
overhead limit exceeded" error exactly when the output file size reaches 11GB
(if I set the pathFormat to 'Medium'; it dies at 9.4GB if the pathFormat is
'Short'). The Extraction program worked very well.

I contacted Lori, and here is the reply:

"The problem with WHO-ART is that it's circular... I don't have a solution for
this problem. ..

Traverse down one of the AV Block paths / Retinal Oedema / Fungal ../ Thyroid
... / Aspiration / and AV Block again... it goes on and on..."
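Lori's description suggests the WHO-ART hierarchy contains a cycle, so any traversal that enumerates paths keeps extending them until the JVM runs out of heap. A common guard is to track the nodes on the current path and stop descending when one repeats. A minimal sketch, assuming the hierarchy has been loaded into a hypothetical parent-to-children map; the class name and the `/` path separator are also assumptions:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Enumerates root-to-node paths while refusing to revisit a node already on the current path. */
final class CycleSafePaths {

    static List<String> paths(Map<String, List<String>> children, String root) {
        List<String> out = new ArrayList<>();
        walk(children, root, new ArrayDeque<>(), new HashSet<>(), out);
        return out;
    }

    private static void walk(Map<String, List<String>> children, String node,
                             Deque<String> path, Set<String> onPath, List<String> out) {
        if (!onPath.add(node)) {
            return;  // node is already on this path: a cycle, so stop instead of recursing forever
        }
        path.addLast(node);
        out.add(String.join("/", path));
        for (String child : children.getOrDefault(node, List.of())) {
            walk(children, child, path, onPath, out);
        }
        path.removeLast();
        onPath.remove(node);  // the node may legitimately appear again on a different branch
    }
}
```

Note that this only prevents infinite loops; in a large acyclic hierarchy with many shared children the number of distinct paths can still grow quickly.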



OAE, SSE, OVAE:

For this one, the problem is different. There is no "GC overhead limit
exceeded" error, but when the Extraction program runs, a Java
NullPointerException is thrown after each page. Lori asked me to modify the
program. Below is Lori's response:

"I see the problem.

My code assumes the following format for each concept (example from ICD9):

https://urldefense.proofpoint.com/v2/url?u=http-3A__bioportal.bioontology.org_ontologies_umls_tui&d=BQIGaQ&c=qS4goWBT7poplM69zy_3xhKwEW14JZMSdioCoppxeFU&r=Heup-IbsIg9Q1TPOylpP9FE4GTK-OqdTDRRNQXipowRLRjx0ibQrHEo8uYx6674h&m=qiYabyWg17ViMroLBEPKvb0M8-0Z-JpASggJzpQMMNE&s=pJ0tV9QPzp3YPIl85qnP8S4zaxpEE7m8auQPWFGkvNA&e= : T061
https://urldefense.proofpoint.com/v2/url?u=http-3A__www.w3.org_2004_02_skos_core-23notation&d=BQIGaQ&c=qS4goWBT7poplM69zy_3xhKwEW14JZMSdioCoppxeFU&r=Heup-IbsIg9Q1TPOylpP9FE4GTK-OqdTDRRNQXipowRLRjx0ibQrHEo8uYx6674h&m=qiYabyWg17ViMroLBEPKvb0M8-0Z-JpASggJzpQMMNE&s=Khia9dB2IyUg57GmBl39USHjHqNrPovzCUP3ivBAOR4&e= : 83.72
https://urldefense.proofpoint.com/v2/url?u=http-3A__bioportal.bioontology.org_ontologies_umls_cui&d=BQIGaQ&c=qS4goWBT7poplM69zy_3xhKwEW14JZMSdioCoppxeFU&r=Heup-IbsIg9Q1TPOylpP9FE4GTK-OqdTDRRNQXipowRLRjx0ibQrHEo8uYx6674h&m=qiYabyWg17ViMroLBEPKvb0M8-0Z-JpASggJzpQMMNE&s=PrdG0tWcPXszObCrai-Xn-pVAVhIbmw8-jvCfImP2zA&e= : C0185466
https://urldefense.proofpoint.com/v2/url?u=http-3A__www.w3.org_2004_02_skos_core-23prefLabel&d=BQIGaQ&c=qS4goWBT7poplM69zy_3xhKwEW14JZMSdioCoppxeFU&r=Heup-IbsIg9Q1TPOylpP9FE4GTK-OqdTDRRNQXipowRLRjx0ibQrHEo8uYx6674h&m=qiYabyWg17ViMroLBEPKvb0M8-0Z-JpASggJzpQMMNE&s=p0y5VnSnfBTe2SjT5xQGpY4PoWOKqMQTkuTV1bY1O9M&e= : Recession of tendon

It's expecting to see  to obtain the basecode of the term.

In your case, there is no  entry (which is why you are seeing null pointers).

It does have (which I assume is the basecode):
https://urldefense.proofpoint.com/v2/url?u=http-3A__data.bioontology.org_metadata_prefixIRI&d=BQIGaQ&c=qS4goWBT7poplM69zy_3xhKwEW14JZMSdioCoppxeFU&r=Heup-IbsIg9Q1TPOylpP9FE4GTK-OqdTDRRNQXipowRLRjx0ibQrHEo8uYx6674h&m=qiYabyWg17ViMroLBEPKvb0M8-0Z-JpASggJzpQMMNE&s=TQXj3b4O_PVIR5VZnzWgq2RzphBc3LeKpnI2LPFih40&e= : OAE:0001620

Your problem is going to need a custom solution that unfortunately I don't
have the bandwidth for. I can tell you where/how to modify the code to fit your
needs. Let me know if you need assistance in modifying the code."

Thanks,
Sekhar H.
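The failure Lori describes is an unconditional lookup of a property that OAE concepts do not have. A defensive sketch, assuming each concept's properties have already been parsed into a hypothetical map from property IRI to value: try the property used in the ICD9 example first, fall back to the prefixIRI property seen in OAE, and let the caller skip concepts with neither instead of throwing a NullPointerException. Which property the extraction program actually reads for the basecode is not stated in the thread; treating it as skos:notation is an assumption based on the 83.72 value above, and `BasecodeResolver` is an invented name:

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;

/** Hypothetical basecode lookup with a fallback chain instead of an unchecked dereference. */
final class BasecodeResolver {
    // Assumption: the ICD9 example suggests the basecode normally comes from skos:notation.
    static final String SKOS_NOTATION = "http://www.w3.org/2004/02/skos/core#notation";
    // OAE concepts carry this BioPortal property instead (e.g. "OAE:0001620").
    static final String PREFIX_IRI = "http://data.bioontology.org/metadata/prefixIRI";

    private static final List<String> CANDIDATES = List.of(SKOS_NOTATION, PREFIX_IRI);

    /** Returns the first non-blank candidate value, or empty if the concept has none. */
    static Optional<String> basecode(Map<String, String> conceptProperties) {
        for (String property : CANDIDATES) {
            String value = conceptProperties.get(property);
            if (value != null && !value.isBlank()) {
                return Optional.of(value);
            }
        }
        return Optional.empty();  // caller can log and skip the concept rather than crash
    }
}
```

A caller would then do something like `basecode(props).ifPresentOrElse(this::emit, () -> log(conceptIri))`, where `emit` and `log` stand for whatever the extraction program already does.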



-Original Message-
From: Pei Chen [mailto:chen...@apache.org]
Sent: Thursday, April 23, 2015 9:22 PM
To: dev@ctakes.apache.org
Subject: Re: Request for help:: NCBO Ontology Extraction Tool for i2b2

Sekhar,

Is it happening to all of the ontologies you mentioned or just one?  Those 
ontologies do not seem very big or deep.  Did you notice in the logs if 
something in the ontology havi

RE: OpenNLP VS UIMA, general question.

2015-05-19 Thread Savova, Guergana
I cannot answer your question, but I thought I'd point out that the stand-alone
OpenNLP models are trained on general-domain data, while the models in cTAKES
are trained on a combination of general and clinical domain data. This makes a
significant difference in parser performance.
--Guergana

-Original Message-
From: Damir Olejar [mailto:olejar.da...@gmail.com] 
Sent: Tuesday, May 19, 2015 10:08 AM
To: dev@ctakes.apache.org
Subject: OpenNLP VS UIMA, general question.

To whom it may concern,

I would like to ask whether it is possible to write code for OpenNLP
and then, if necessary, integrate it with UIMA. Furthermore, is it possible to
go from UIMA to OpenNLP? For example, I am interested in medical analysis
with cTAKES, but I cannot find a way to do it using only OpenNLP.

The reason I want to rely on OpenNLP as much as possible is simply the
complexity of the applications I am developing; UIMA would complicate
everything unnecessarily.

Thank you kindly for your answers!

Damir Olejar


RE: Content extract and classification of Patient's note

2015-05-20 Thread Savova, Guergana
You would probably need to write an annotator for PSA and TNM staging, as
cTAKES currently does not extract these cancer-specific variables. We are
working on such annotators, but it will be some time before we release them.

However, it should not be difficult for you to add a new annotator.
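Before wrapping anything in a UIMA annotator, the extraction logic itself can be prototyped with regular expressions. A rough sketch; the class name and patterns are hypothetical and only cover simple phrasings such as "PSA 4.2 ng/mL", "Gleason 3+4=7" and "pT2cN0M0", so real notes will need many more variants (and the surrounding annotator code) before this is useful:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Rough regex prototype for PSA values, Gleason sums and TNM stage strings (hypothetical helper). */
final class ProstateFindings {
    // "PSA", up to 20 non-digit characters (": ", " of ", ...), then a number.
    private static final Pattern PSA =
            Pattern.compile("\\bPSA\\b[^0-9]{0,20}?(\\d+(?:\\.\\d+)?)", Pattern.CASE_INSENSITIVE);
    // "Gleason", then primary + secondary pattern, e.g. "3+4".
    private static final Pattern GLEASON =
            Pattern.compile("\\bGleason\\b[^0-9]{0,20}?(\\d)\\s*\\+\\s*(\\d)", Pattern.CASE_INSENSITIVE);
    // Optional c/p/y prefix, then T, N and M categories, e.g. "pT2cN0M0".
    private static final Pattern TNM =
            Pattern.compile("\\b[cpy]?T(?:[0-4][a-c]?|is|X)\\s*N[0-3X][a-c]?\\s*M[01X]\\b");

    static List<Double> psaValues(String text) {
        List<Double> values = new ArrayList<>();
        Matcher m = PSA.matcher(text);
        while (m.find()) {
            values.add(Double.parseDouble(m.group(1)));
        }
        return values;
    }

    /** Returns primary + secondary pattern sums, e.g. 3+4 gives 7. */
    static List<Integer> gleasonScores(String text) {
        List<Integer> scores = new ArrayList<>();
        Matcher m = GLEASON.matcher(text);
        while (m.find()) {
            scores.add(Integer.parseInt(m.group(1)) + Integer.parseInt(m.group(2)));
        }
        return scores;
    }

    static List<String> tnmStages(String text) {
        List<String> stages = new ArrayList<>();
        Matcher m = TNM.matcher(text);
        while (m.find()) {
            stages.add(m.group());
        }
        return stages;
    }
}
```

Usage: `ProstateFindings.psaValues(noteText)`. Phrasings like "Gleason score 7 (3+4)" are not matched by this pattern.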
--Guergana Savova

From: Sangram Patil [mailto:sangr...@figmd.com]
Sent: Wednesday, May 20, 2015 10:02 AM
To: dev@ctakes.apache.org
Subject: Re: Content extract and classification of Patient's note

Thanks for the information!!!

I have notes of cancer patients and would like to extract values such as PSA,
Gleason, and TNM staging mentioned in the text note. Could you please provide
some input?

On Mon, May 18, 2015 at 6:40 PM, Chen, Pei
<pei.c...@childrens.harvard.edu> wrote:
Sangram,
cTAKES is modular by design, so it is composed of a collection of
components/modules with some default or sample aggregate pipelines. There could
be an endless number of combinations depending on the task at hand. I would
suggest checking out the components guide (even though it may be outdated) to
figure out what would be needed for a specific task...



On May 18, 2015, at 2:18 AM, Sangram Patil <sangr...@figmd.com> wrote:

Hi,

I am pretty new to cTAKES and need to extract fields from patients' text notes.
Could you please tell me which AE is suitable? Also, could I have some sample
code examples for my task? Attached is the sample patient note.

--

Sincerely,
Sangram Patil




--
* Please include your practice ID in your correspondence.

Sincerely,

Sangram Patil
Team Lead-Core Dev
Cell: +91 992 293 9497

figmd.com


RE: Exploiting the power of cTakes, using OpenNLP only

2015-05-22 Thread Savova, Guergana
Yes, you are correct. cTAKES does named entity recognition and
normalization, i.e., mapping to an ontology (through the UMLS). The
normalization part is what differs from what is usually done in the general
domain (where mentions of several semantic types are discovered but not
necessarily normalized to a concept within an ontology). In the general domain,
there is a recent trend to normalize to Wikipedia (wikification).

In short, to do the NER in cTAKES you do need a UMLS license. BTW, that
license is free for level 0 vocabularies.

Hope this information helps.
--Guergana

-Original Message-
From: Damir Olejar [mailto:olejar.da...@gmail.com] 
Sent: Friday, May 22, 2015 7:51 AM
To: dev@ctakes.apache.org
Subject: Re: Exploiting the power of cTakes, using OpenNLP only

To answer my own question, it all comes down to UMLS licensing, and which files 
are being downloaded from the server.
The files that are downloaded are compressed *.model files that can be 
integrated with cTakes.
However, there are (or might be in the near future) restrictions on which users
can download which files, and there might also be a copyright issue if the
UMLS procedure is not followed.

So, yes, there is no need for UIMA, but then, for any serious work, the 
copyrights need to be respected.


On Thu, May 21, 2015 at 12:10 PM, Damir Olejar 
wrote:

> To whom it may concern,
>
> First, I would like to apologize if my question is vague, since I am
> new and unaccustomed to the cTAKES terminology. To keep my question simple
> and to the point, let us assume that I am working only with Apache
> OpenNLP. I do not have any UIMA-specific JAR files included, and let
> us assume that I do not want to include any of them (or keep them to a
> minimum), thus keeping the project confined to OpenNLP as much as possible.
>
> As far as I know, UIMA is just a framework that does not provide any 
> specific NLP tools (source:
> https://urldefense.proofpoint.com/v2/url?u=http-3A__stackoverflow.com_questions_24186742_is-2Duima-2Dprovides-2Donly-2Da-2Dwrapper-2Dor-2Dis-2Dit-2Dlike-2Dstandfordcore-2Dnlp-2Dand-2Dgate&d=BQIFaQ&c=qS4goWBT7poplM69zy_3xhKwEW14JZMSdioCoppxeFU&r=SeLHlpmrGNnJ9mI2WCgf_wwQk9zL4aIrVmfBoSi-j0kfEcrO4yRGmRCJNAr-rCmP&m=umFvmAvfVN2FIHuugFp5H33UdNyy-mxG3U3yDPRMp9I&s=uM0wOUdg63NBJRXD3JRZeU0fx-jT8ide6bcZdx_-WY8&e=
>  ).
> This means that there should be a way of integrating the cTakes 
> components with OpenNLP.
>
> What I would like to do is simply have Named Entity Recognition
> (NER) applied to a text, so I know which word in an input
> sentence is a medical term. The perfect option would be if I could
> have a *.bin file such as "en-ner-person.bin", but I think that cTAKES
> does not give us such an option, since there are no *.bin files.
>
> How would I accomplish such a task? Are there any code samples, examples,
> tutorials, documentation, pseudo-code, or ideas to take a look at?
>
> Thank you kindly for your time, understanding, and patience.
>
> Damir
>


RE: TimeLanes

2015-06-22 Thread Savova, Guergana
The cTAKES temporal component is in the main release. You can get the system 
output, but as Sean said TimeLanes does not consume it yet.

A demo of the cTAKES temporal component can be found in Getting Started -> 
Demos. Pei just put it up there, thank you very much, Pei!
--Guergana


-Original Message-
From: Finan, Sean [mailto:sean.fi...@childrens.harvard.edu] 
Sent: Monday, June 22, 2015 11:36 AM
To: dev@ctakes.apache.org
Subject: RE: TimeLanes

Hi Maashu,

TimeLanes is currently a prototype GUI under development, and there is probably
no information about it on the web. It is in the sandbox because it isn't part
of the cTAKES release and is missing much-needed functionality. For instance,
it should display basic information about the patient and note (name, birth
date, note date), but such things are often in structured data or some custom
header of the note. Right now TimeLanes does not fetch them at all (that will
require custom readers) and just displays "Dan Testing".

If you want to run it, the main class is
org.chboston.cnlp.timeline.gui.main.TimelineMain. Upon startup it will
display "open a note". You can use the "Open" button or drag a file into the
box. Unfortunately, it does not yet run cTAKES (coming soon), so you need to
give it an annotated (Protégé or Anafora) note or an .xmi file. Using an .xmi
would probably be easiest, as you can create it with cTAKES. You can watch an
outdated video here:

https://urldefense.proofpoint.com/v2/url?u=https-3A__www.youtube.com_watch-3Fv-3DKp9YE0o3urU-26feature-3Dyoutu.be&d=BQIGaQ&c=qS4goWBT7poplM69zy_3xhKwEW14JZMSdioCoppxeFU&r=SeLHlpmrGNnJ9mI2WCgf_wwQk9zL4aIrVmfBoSi-j0kfEcrO4yRGmRCJNAr-rCmP&m=P2Q3bVKBdvXziFnahfApZEyBbj-eR-wV-TfEZfTtl0Q&s=1HETvigL__bzBXBpv2jLdRJMvJ3CI77UQZORumsBJIM&e=

Sean

-Original Message-
From: maa...@gmail.com [mailto:maa...@gmail.com]
Sent: Friday, June 12, 2015 1:18 PM
To: dev@ctakes.apache.org
Subject: TimeLanes

Hi All,

I've just started working with cTAKES and was curious about TimeLanes. I found
it in the sandbox here:

https://urldefense.proofpoint.com/v2/url?u=https-3A__svn.apache.org_repos_asf_ctakes_sandbox_timelanes_&d=BQIBaQ&c=qS4goWBT7poplM69zy_3xhKwEW14JZMSdioCoppxeFU&r=fs67GvlGZstTpyIisCYNYmQCP6r0bcpKGd4f7d4gTao&m=qneEArWy0QvCgMGCuF8-DwG3kslsrGAKWFtmP174uO4&s=iZj-v0HJjZccezixIOmlTFwyIGFf9OqImfSv-aMKdgI&e=

But I'm lost on how to actually use it. I've googled around but there seems to
be very little information on it.

Can anyone point me in the right direction?

Thanks in advance!

Cheers,
-Maashu

--
"If you are immune to boredom, there is literally nothing you cannot
accomplish."
-David Foster Wallace

RE: Including gene mappings in UMLS

2015-09-01 Thread Savova, Guergana
Hi Chris,
We have not focused on gene mappings because gene mentions were not very
frequent in the clinical narrative until several years ago. However, as
genes/mutations and proteins have become actionable items in both diagnosis
and treatment, we do plan to add a module for gene/mutation/protein
mentions in the very near future. We are likely to start with cancer, since
that domain offers the most actionable information and it is consistently
recorded in pathology notes. Contributions are more than welcome.

Does this help?
Cheers,
--Guergana

-Original Message-
From: Mattmann, Chris A (3980) [mailto:chris.a.mattm...@jpl.nasa.gov] 
Sent: Tuesday, September 1, 2015 7:16 PM
To: dev@ctakes.apache.org
Subject: Including gene mappings in UMLS

Hey cTAKES community,

Is there any particular reason that you guys didn't do the mappings for genes,
which exist in the UMLS but don't seem to exist as cTAKES concept identifiers
(CIDs)?

Are there any plans to include this in a future release of Apache cTAKES?
We were just wondering.

Thanks all!

Cheers,
Chris

++
Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-527
Email: chris.a.mattm...@nasa.gov
WWW: https://urldefense.proofpoint.com/v2/url?u=http-3A__sunset.usc.edu_-7Emattmann_&d=BQIGaQ&c=qS4goWBT7poplM69zy_3xhKwEW14JZMSdioCoppxeFU&r=SeLHlpmrGNnJ9mI2WCgf_wwQk9zL4aIrVmfBoSi-j0kfEcrO4yRGmRCJNAr-rCmP&m=Dq_pRYxsnlvTlbBj114Dbfhjvu-Ps3J0sV5YYJj-S3o&s=NWXiQ1BQ4rFG345QMhfFcVwsFIiy9cmd_8KD9rlGh1w&e=
++
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++

RE: End to end ctakes example app

2015-09-18 Thread Savova, Guergana
Jay,
Do you mean something like this:
http://52.26.219.218:8080/index.jsp

--guergana

-Original Message-
From: Jay Vyas [mailto:jayunit100.apa...@gmail.com] 
Sent: Friday, September 18, 2015 10:43 AM
To: dev@ctakes.apache.org
Subject: End to end ctakes example app

Is there an end-to-end cTAKES app that can be used to demonstrate all of its
core portions?

We built the bigpetstore app in Bigtop to do this, and we have a data generator.

Maybe it's time to do something similar for cTAKES.


RE: cTAKES in Spanish

2015-10-30 Thread Savova, Guergana
This book cites some work on adaptation of cTAKES to Spanish:
Bioinformatics (Pacbb 2014). (2014) Saez-Rodriguez J. Springer.

Also this:
http://link.springer.com/chapter/10.1007%2F978-3-319-09891-3_18

--Guergana

-Original Message-
From: Santiago Esteban [mailto:santiagoeste...@gmail.com] 
Sent: Friday, October 30, 2015 8:10 AM
To: dev@ctakes.apache.org
Subject: cTAKES in Spanish

Hi everyone,
I'm new to cTAKES and to this mailing list, so this might have already been
answered before, but does anyone know of any projects using cTAKES on clinical
text in Spanish?

Thanks!

Santiago


RE: ctakes with icd10

2015-12-08 Thread Savova, Guergana
Hi Alaa,
You need to create a resource from the terminology/ontology you want to use (in
this case ICD9 or ICD10). Then use that resource with the cTAKES fast
dictionary lookup. There is cTAKES code and some documentation on how to create
that resource. By default, cTAKES runs with a resource created from the English
version of SNOMED CT and RxNorm.
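The cTAKES dictionary-creation tooling mentioned above is the supported route. Purely to illustrate what such a resource boils down to, the sketch below reads UMLS MRCONSO.RRF rows (pipe-delimited; CUI in column 0, LAT in column 1, SAB in column 11, STR in column 14), keeps English rows from the chosen source vocabularies, and builds a lowercase term-to-CUI table. `RrfTermIndex` is a hypothetical name, and `ICD10CM`/`ICD9CM` are the UMLS source abbreviations for the US clinical modifications; check your UMLS release for the exact sources you are licensed to use:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch: builds a lowercase term -> CUIs table from MRCONSO.RRF rows for selected sources. */
final class RrfTermIndex {
    // MRCONSO.RRF column positions (pipe-delimited): CUI, LAT (language), SAB (source), STR (string).
    private static final int CUI = 0, LAT = 1, SAB = 11, STR = 14;

    static Map<String, Set<String>> build(Reader mrconso, Set<String> sources) throws IOException {
        Map<String, Set<String>> termToCuis = new HashMap<>();
        try (BufferedReader in = new BufferedReader(mrconso)) {
            String line;
            while ((line = in.readLine()) != null) {
                String[] f = line.split("\\|", -1);
                // Skip malformed, non-English, or out-of-scope rows.
                if (f.length <= STR || !"ENG".equals(f[LAT]) || !sources.contains(f[SAB])) {
                    continue;
                }
                termToCuis.computeIfAbsent(f[STR].toLowerCase(Locale.ROOT), k -> new HashSet<>())
                        .add(f[CUI]);
            }
        }
        return termToCuis;
    }
}
```

Usage: `RrfTermIndex.build(new FileReader("MRCONSO.RRF"), Set.of("ICD10CM", "ICD9CM"))`. Tokenizing note text and matching it against these terms are left out of the sketch.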
Hope this helps.
--Guergana

-Original Message-
From: Alaa al Barari [mailto:alaa.albar...@gmail.com] 
Sent: Tuesday, December 8, 2015 10:01 AM
To: dev@ctakes.apache.org
Subject: ctakes with icd10

Hi,

I downloaded the latest UMLS version, and I want to know how to make cTAKES work
with ICD-10 and ICD-9.


Thanks


RE: ctakes blog

2015-12-09 Thread Savova, Guergana
Excellent idea!
+1
--guergana

-Original Message-
From: Miller, Timothy [mailto:timothy.mil...@childrens.harvard.edu] 
Sent: Tuesday, December 8, 2015 4:41 PM
To: dev@ctakes.apache.org
Subject: ctakes blog

The recent discussion over dictionary building, and someone pointing out that it
has come up several times, made me think that maybe we should use the blog space
that Apache provides. It could be used for write-ups of things that are not
quite as formal as "documentation" but would benefit from being written down
somewhere that is easier to search and link to than a mailing list.
Any thoughts?
Tim



RE: 2007 CMC Challenge data set

2016-01-28 Thread Savova, Guergana
Including Dr. John Pestian, who led the creation of the dataset.
--Guergana

Guergana Savova, PhD, FACMI
Associate Professor
PI Natural Language Processing Lab
Boston Children's Hospital and Harvard Medical School
300 Longwood Avenue
Mailstop: BCH3092
Enders 144.1
Boston, MA 02115
Tel: (617) 919-2972
Fax: (617) 730-0817
Harvard Scholar: http://scholar.harvard.edu/guergana_k_savova/biocv
ctakes.apache.org
thyme.healthnlp.org
cancer.healthnlp.org
share.healthnlp.org

-Original Message-
From: John Mongan [mailto:john.mon...@ucsf.edu] 
Sent: Wednesday, January 27, 2016 6:25 PM
To: dev@ctakes.apache.org
Subject: 2007 CMC Challenge data set

Does anyone have a copy of the 2007 Computational Medicine Center Challenge 
data?

The website for the Computational Medicine Center seems to have disappeared and 
I can't find a place to download the data anymore.

Thanks,

John



RE: Clinical Element Model normalization component

2016-02-09 Thread Savova, Guergana
Hi Phuc,
Did you create a dictionary to run your pipeline with?
Did you check under IdentifiedAnnotations where there are annotations of type 
Drugs?
--Guergana


-Original Message-
From: Phan Hồng Phúc [mailto:phanhongphu...@gmail.com] 
Sent: Tuesday, February 9, 2016 2:19 PM
To: dev@ctakes.apache.org
Subject: Re: Clinical Element Model normalization component

Hi all,

I tried to run the sample to get the drug name.
Let me describe my steps:
- runctakesCVD.bat
- Load AE:
apache-ctakes-3.2.2\desc\ctakes-clinical-pipeline\desc\analysis_engine\AggregatePlaintextFastUMLSProcessor
- Add sample text: "He is taking 500mg of tylenol, twice a day"
- After running, I see a list of results, but all
org.apache.ctakes.drugner.type.* types are [0]

I think the result should contain at least the drug name "tylenol".

Am I missing something?

Thank you,
- Phúc


RE: Contributing to documentation

2016-02-10 Thread Savova, Guergana
Hi Jessica,
Thank you very much for offering to contribute to the documentation! Indeed 
this is our weak link and any help there will be greatly appreciated.
A warm welcome to the community!
--Guergana


-Original Message-
From: Pei Chen [mailto:chen...@apache.org] 
Sent: Wednesday, February 10, 2016 1:41 PM
To: dev@ctakes.apache.org
Subject: Re: Contributing to documentation

We've been generally following the C-T-R model [1] 
But feel free to discuss on dev@ whenever in doubt...

[1] 
https://urldefense.proofpoint.com/v2/url?u=http-3A__www.apache.org_foundation_glossary.html-23CommitThenReview&d=BQIFaQ&c=qS4goWBT7poplM69zy_3xhKwEW14JZMSdioCoppxeFU&r=SeLHlpmrGNnJ9mI2WCgf_wwQk9zL4aIrVmfBoSi-j0kfEcrO4yRGmRCJNAr-rCmP&m=yprBottjMZmd-5h2kun5_56ITgboOGhRiM1FrbJtLiE&s=BiPUyRARC7nrVJaM2ajjNaANac3AbCc0l25_hWVUCQU&e=
 

On Wed, Feb 10, 2016 at 1:31 PM, Jessica Glover wrote:
> Thank you. I'm excited to contribute.
>
> Is there a process by which my contributions should get "voted in" or 
> am I free to just start editing?
>
> - Jessica
>
> On Feb 10, 2016 9:28 AM, "Pei Chen"  wrote:
>
>> User Jessica Glover (jgloves) Added to:
>> https://urldefense.proofpoint.com/v2/url?u=https-3A__cwiki.apache.org_confluence_display_CTAKES_cTAKES&d=BQIFaQ&c=qS4goWBT7poplM69zy_3xhKwEW14JZMSdioCoppxeFU&r=SeLHlpmrGNnJ9mI2WCgf_wwQk9zL4aIrVmfBoSi-j0kfEcrO4yRGmRCJNAr-rCmP&m=yprBottjMZmd-5h2kun5_56ITgboOGhRiM1FrbJtLiE&s=LVLCQGevx3dGn1G-IoKWfyFMl6ZQThSi90BoERcRp6w&e=
>> Enjoy!
>> —Pei
>>
>> On Feb 10, 2016, at 8:43 AM, Jessica Glover wrote:
>>
>> Hi Pei,
>> I'm not sure what my confluence ID is. I log in with this email 
>> address, and I can be found under Jessica Glover in a People search.
>>
>> - Jessica
>> This would be great. What is your Confluence ID (anyone should be
>> able to create an account)?
>> --Pei
>>
>> On Tue, Feb 9, 2016 at 7:49 AM, Jessica Glover wrote:
>>
>> Hello,
>>
>> I am a cTAKES user, but I am interested in development and especially 
>> interested in contributing to the documentation. I have some ideas 
>> for making the component use guides more user-friendly for first-time 
>> UIMAers, but I'm also eager to hear what the dev community would like 
>> to see. I am happy to write as well as create diagrams.
>>
>> Thanks,
>>
>> Jessica Glover
>>
>>
>>


RE: Combining Knowledge- and Data-driven Methods for De-identification of Clinical Narratives

2016-03-10 Thread Savova, Guergana
Yes, the current release of cTAKES has a module for the temporal expressions 
which includes dates. The normalizer for the temporal expressions is Steven 
Bethard's timenorm code.

However, if you do de-identification of dates/temporal expressions, you run the 
risk of creating incorrect timelines as many of the relative temporal 
expressions (e.g. spring of this year, x-mas time, etc.) are unlikely to be 
correctly shifted by any de-identification tool.
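To make the timeline risk concrete, here is a minimal sketch, assuming a naive de-identifier that recognizes only absolute dates in a hypothetical MM/DD/YYYY form and shifts them by a fixed per-patient offset. The pattern, the 200-day offset, and the name `shift_absolute_dates` are illustrative assumptions, not part of cTAKES, timenorm or MIST.

```python
import re
from datetime import datetime, timedelta

# Hypothetical pattern: this naive de-identifier only recognizes MM/DD/YYYY dates.
ABSOLUTE_DATE = re.compile(r"\b\d{2}/\d{2}/\d{4}\b")


def shift_absolute_dates(text, offset_days):
    """Shift each MM/DD/YYYY date by offset_days; all other text is left unchanged."""
    def _shift(match):
        original = datetime.strptime(match.group(0), "%m/%d/%Y")
        return (original + timedelta(days=offset_days)).strftime("%m/%d/%Y")
    return ABSOLUTE_DATE.sub(_shift, text)


note = "Seen on 03/10/2016. Symptoms began in the spring of this year."
print(shift_absolute_dates(note, 200))
# Seen on 09/26/2016. Symptoms began in the spring of this year.
```

The visit date moves from March to late September, while "spring of this year" still reads as spring, so the shifted note now tells an inconsistent story.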

One de-identification tool is MIST -- http://mist-deid.sourceforge.net/ . 

Hope this helps with the de-identification items
--Guergana

Guergana Savova, PhD, FACMI
Associate Professor
PI Natural Language Processing Lab
Boston Children's Hospital and Harvard Medical School
300 Longwood Avenue
Mailstop: BCH3092
Enders 144.1
Boston, MA 02115
Tel: (617) 919-2972
Fax: (617) 730-0817
Harvard Scholar: http://scholar.harvard.edu/guergana_k_savova/biocv

-Original Message-
From: Azad Dehghan [mailto:azad.dehg...@gmail.com] 
Sent: Thursday, March 10, 2016 3:42 PM
To: dev@ctakes.apache.org
Subject: Re: Combining Knowledge- and Data-driven Methods for De-identification 
of Clinical Narratives

> This means both training data folders? I have access to the data but
> not to the challenge description.

Yes. Is there any specific information that you are missing?
>
>
>> It would be good to incorporate/refactor the two-pass recognition
>> method for cTAKES (basically, the GATE API needs to be replaced with the
>> UIMA API to generate annotations), which has a wider application on
>> longitudinal data.
>> This method is used on top of a number of NERs.
>
>
> I'll take a look.
>
> I do not know how much time I can invest this month. Let's see how
> many phases I can translate.
>
> I added the rules for age. Are there JAPE rules for creating date
> annotations?
>

No. I believe cTAKES has existing component(s) to capture dates?

> After all rules are translated, they need some major refactoring. JAPE
> and Ruta are quite different in some aspects.
>
Ok.

>
>
>
>
>
>> Please let me know where I can help. I will be available again in April.
>>
>> Cheers,
>> Azad
>>
>> On 10 March 2016 at 13:13, Peter Klügl  wrote:
>>
>>> Hi,
>>>
>>> sorry, I was quite busy last month.
>>>
>>> I added a new patch, which needs to be applied.
>>>
>>> No new rules, but it's possible now to evaluate everything against 
>>> the labelled data of the challenge.
>>>
>>> @Azad:
>>> Which documents exactly did you use to develop the rules?
>>> training-PHI-Gold-Set1, training-PHI-Gold-Set2, or testing-PHI-Gold-fixed?
>>>
>>> Best,
>>>
>>> Peter
>>>
>>> Am 03.02.2016 um 09:05 schrieb Peter Klügl:

 Hi,

 the last patch fixed almost all problems.

 I added another one that adds the csv file for the unit test and extends
 svn-ignore.

 Best,

 Peter

 Am 02.02.2016 um 09:16 schrieb Peter Klügl:
>
> Hi,
>
> I added another patch. I forgot to manually add one test file to version
> control, and there are still duplicate lines.
> I hope this patch fixes the remaining problems.
>
> Best,
>
> Peter
>
>
> Am 29.01.2016 um 10:34 schrieb Peter Klügl:
>>
>> Hi,
>>
>> the problems were caused by the SVN client in my Eclipse. Sorry for the
>> trouble; I should have looked more closely at the complete patch.
>>
>> I attached a new patch, created with command-line tools, which looks
>> correct now.
>>
>> Pei, can you apply the new patch?
>>
>> Best,
>>
>> Peter
>>
>> Am 28.01.2016 um 15:57 schrieb Peter Klügl:
>>>
>>> Thanks Pei.
>>>
>>> I fear there was again a problem with the patch. All new files 
>>> are missing (and also the svn-ignore settings).
>>>
>>> Can you take a look?
>>>
>>> Best,
>>>
>>> Peter
>>>
>>> Am 28.01.2016 um 14:43 schrieb Pei Chen:

 patch applied.
 Thanks,
 Pei

 On Thu, Jan 28, 2016 at 4:14 AM, Peter Klügl <peter.klu...@averbis.com> wrote:
>
> Hi Pei,
>
> can you commit the recent patch for us?
>
> CTAKES-384-20160120.patch
>
> Best,
>
> Peter
>
> Am 20.01.2016 um 19:35 schrieb Pei Chen:
>>
>> Hi,
>> Sorry I was swamped recently.
>> But yeah, we can even create an extended type system to store
>> these items temporarily and add them to the main/core type system
>> afterwards.
>>
>> There was an existing item to upgrade UIMA, but agreed, it will
>> require much more testing. If it works, we can upgrade it in our sandbox
>> area or create a branch if necessary.
>>
>> —Pei
>>
>>> On Jan 18, 2016, at 9:06 AM, Peter Klügl <peter.klu...@averbis.com> wrote:
>>>
>>> Hi,
>>>
>>
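
Azad's message in this thread proposes porting a two-pass recognition method, used on top of several NERs, from GATE to UIMA. The sketch below is one reading of the idea, assuming the first pass runs one or more recognizers over all of a patient's notes and collects every string they tag, and the second pass re-tags every whole-word occurrence of those strings across the same notes (useful on longitudinal data, where a name may be introduced once and repeated bare later). The toy recognizer, the function names, and the per-patient grouping are hypothetical; this is neither the cDeid code nor a UIMA component.

```python
import re


def first_pass(notes, recognizers):
    """Run each recognizer on each note; collect spans and a patient-level lexicon."""
    spans = {i: set() for i in range(len(notes))}
    lexicon = {}  # surface string -> label, learned from first-pass hits
    for i, text in enumerate(notes):
        for recognize in recognizers:
            for begin, end, label in recognize(text):
                spans[i].add((begin, end, label))
                lexicon[text[begin:end]] = label
    return spans, lexicon


def second_pass(notes, spans, lexicon):
    """Tag every whole-word occurrence of a lexicon string in every note."""
    for i, text in enumerate(notes):
        for term, label in lexicon.items():
            for m in re.finditer(r"\b" + re.escape(term) + r"\b", text):
                spans[i].add((m.start(), m.end(), label))
    return spans


def title_name_recognizer(text):
    """Toy first-pass NER: a capitalized word right after 'Mr.' or 'Ms.' is a name."""
    return [(m.start(1), m.end(1), "NAME")
            for m in re.finditer(r"\b(?:Mr|Ms)\. ([A-Z][a-z]+)", text)]


notes = ["Mr. Jones was admitted on Monday.", "Jones reports less pain today."]
spans, lexicon = first_pass(notes, [title_name_recognizer])
spans = second_pass(notes, spans, lexicon)
print(sorted(spans[1]))  # [(0, 5, 'NAME')]
```

The second note has no "Mr." cue, so only the propagated lexicon finds "Jones" there. The same mechanism also propagates first-pass false positives, which is a reason to keep the lexicon scoped to one patient.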

RE: Combining Knowledge- and Data-driven Methods for De-identification of Clinical Narratives

2016-03-10 Thread Savova, Guergana
You can re-build the models that feed into MIST. I personally would not use the 
default model that MIST comes with, as it is not trained on clinical data. In 
our previous work we found that hand-annotating about 200 docs for PHI 
(representative of the sample you are going to run the models on) is enough to 
build a pretty good model, with precision, recall and F1 in the 90s. However, 
even with that high performance, the institution that owns the data might still 
be reluctant to share it, as it might pose a HIPAA violation through potential 
PHI leaks. In cTAKES our approach has been to de-couple de-identification 
from NLP/information extraction. If a user needs de-identified 
data, they can choose their method -- manual or otherwise -- and then process 
it through cTAKES. Our focus is the NLP/IE space, while de-identification is a 
blend of that plus policy.

--Guergana

-Original Message-
From: Azad Dehghan [mailto:azad.dehg...@gmail.com] 
Sent: Thursday, March 10, 2016 4:19 PM
To: dev@ctakes.apache.org
Subject: RE: Combining Knowledge- and Data-driven Methods for De-identification 
of Clinical Narratives

Thanks Guergana.

> Yes, the current release of cTAKES has a module for the temporal
> expressions which includes dates. The normalizer for the temporal expressions 
> is Steven Bethard's timenorm code.
>

Great.

> However, if you do de-identification of dates/temporal expressions, you
> run the risk of creating incorrect timelines as many of the relative temporal 
> expressions (e.g. spring of this year, x-mas time, etc.) are unlikely to be 
> correctly shifted by any de-identification tool.
>
Indeed, that is one reason I have not included the dates component.

> One de-identification tool is MIST -- 
> https://urldefense.proofpoint.com/v2/url?u=http-3A__mist-2Ddeid.sourceforge.net_&d=BQIFaQ&c=qS4goWBT7poplM69zy_3xhKwEW14JZMSdioCoppxeFU&r=SeLHlpmrGNnJ9mI2WCgf_wwQk9zL4aIrVmfBoSi-j0kfEcrO4yRGmRCJNAr-rCmP&m=FlURWGr18rKbgM76o8Hxoo1rbC2D2h-kk611lbKnPik&s=5awdXn2I-hRE0-161tqFDGgmYgQQviQg360uHI4fs2s&e=
>   .
>
I don't remember them doing well in the community-held evaluation in 2014.
Hence, cDeid :)

RE: cTAKES scale-out with DUCC and Shangridocs

2016-03-14 Thread Savova, Guergana
WOW, this is fantastic, Chris! Thank you so very much!! We will start using the 
DUCC implementation.
Cheers,
--Guergana

-Original Message-
From: Mattmann, Chris A (3980) [mailto:chris.a.mattm...@jpl.nasa.gov] 
Sent: Monday, March 14, 2016 2:58 PM
To: dev@ctakes.apache.org
Subject: cTAKES scale-out with DUCC and Shangridocs

Hi Team,

Just wanted to let you know that my team has completed a deployment
of cTAKES Scale-out with DUCC. Thanks to a number of contributors,
in particular Yi-Wen, my Directed Research student at USC, and to all
the help she has received on this list and on the UIMA list. Thanks
much.

We also have an app, Shangridocs, that we are building on top of
this scale-out. You can find it here:

https://urldefense.proofpoint.com/v2/url?u=http-3A__github.com_chrismattmann_shangridocs.git&d=BQIGaQ&c=qS4goWBT7poplM69zy_3xhKwEW14JZMSdioCoppxeFU&r=SeLHlpmrGNnJ9mI2WCgf_wwQk9zL4aIrVmfBoSi-j0kfEcrO4yRGmRCJNAr-rCmP&m=tztXfa7c6KSvjdoVfp-B7Ay9zQwsexm2d65LcSCk2_0&s=T4I20dU3kYj8AGStMEcJJfenlBpq2C7aoe48mrmfH7Y&e=

We are actively working on making it more robust and scalable. Feedback
is always welcome. Apache cTAKES is at the core and is awesome.

Cheers,

Chris



++
Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-527
Email: chris.a.mattm...@nasa.gov
WWW:  
https://urldefense.proofpoint.com/v2/url?u=http-3A__sunset.usc.edu_-7Emattmann_&d=BQIGaQ&c=qS4goWBT7poplM69zy_3xhKwEW14JZMSdioCoppxeFU&r=SeLHlpmrGNnJ9mI2WCgf_wwQk9zL4aIrVmfBoSi-j0kfEcrO4yRGmRCJNAr-rCmP&m=tztXfa7c6KSvjdoVfp-B7Ay9zQwsexm2d65LcSCk2_0&s=q0GA0RJZysioUTuiGvbhfooG__KOATqEgvWCchiacnM&e=
++
Director, Information Retrieval and Data Science Group (IRDS)
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
WWW: 
https://urldefense.proofpoint.com/v2/url?u=http-3A__irds.usc.edu_&d=BQIGaQ&c=qS4goWBT7poplM69zy_3xhKwEW14JZMSdioCoppxeFU&r=SeLHlpmrGNnJ9mI2WCgf_wwQk9zL4aIrVmfBoSi-j0kfEcrO4yRGmRCJNAr-rCmP&m=tztXfa7c6KSvjdoVfp-B7Ay9zQwsexm2d65LcSCk2_0&s=NfkeQsPLWF_QJ_fCr98_n4Lsajm_BJ54VNbQ3Zq-uw0&e=
++

RE: incorporating an outside ontology in cTAKES pipeline

2016-04-26 Thread Savova, Guergana
Hi Joshua,
You can use any ontology to run cTAKES as long as it is formatted in the 
format cTAKES expects. I believe Sean Finan gave you some pointers, but here 
they are again:
The fast dictionary module in cTAKES can use flat files with _bar-separated 
values_ (.bsv). Check out 
*ctakes-dictionary-lookup-fast-res/src/main/resources/org/apache/ctakes/dictionary/lookup/fast/example/bsv*
for some example formats. Start with *custom_cui_tui_bsv.bsv* as an example. 
*CustomCuiTuiBsv.xml* is an example property .xml for configuration, but right 
now it is missing the last *bsv/* subdirectory.
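
As a starting point for converting an outside ontology, here is a minimal sketch that writes concept terms as bar-separated lines and reads them back. It assumes each line holds a concept ID, a semantic type (TUI) and one term, in that order; that column order is an assumption, so open *custom_cui_tui_bsv.bsv* and confirm the real layout first. Extracting the terms from the .owl file itself is not shown (that needs an OWL/RDF library), and the concept IDs below are made up.

```python
def to_bsv_lines(concepts):
    """Emit one CUI|TUI|text line per synonym (column order is an assumption)."""
    lines = []
    for cui, tui, synonyms in concepts:
        for text in synonyms:
            # A '|' inside a term would shift the columns, so turn it into a space
            # and collapse runs of whitespace.
            clean = " ".join(text.replace("|", " ").split())
            if clean:
                lines.append(f"{cui}|{tui}|{clean}")
    return lines


def parse_bsv(lines):
    """Split CUI|TUI|text lines back into tuples, skipping blank lines."""
    rows = []
    for line in lines:
        line = line.strip()
        if line:
            cui, tui, text = line.split("|", 2)
            rows.append((cui, tui, text))
    return rows


# Hypothetical concepts pulled from a custom ontology (IDs are made up).
concepts = [("C9000001", "T047", ["my custom disorder", "custom | disorder"])]
lines = to_bsv_lines(concepts)
print("\n".join(lines))
```

Writing one line per synonym keeps the file flat, which is the point of the .bsv route: no database is needed, only a text file that the configuration .xml points to.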

--Guergana

-Original Message-
From: Joshua Valdez [mailto:jd...@case.edu] 
Sent: Tuesday, April 26, 2016 8:59 AM
To: dev@ctakes.apache.org
Subject: incorporating an outside ontology in cTAKES pipeline

Hello,

So, I am trying to extend the UMLSlookupannotator with my own .owl ontology, 
but I am having difficulty finding any resources on how this might be done.  
Has anyone here tried something like this before?  If so, could you point me in 
the direction of any resources that may help in my efforts?  I am having 
trouble locating any material on this subject.

Thanks.





*Joshua Valdez*
*Computational Linguist : Cognitive Scientist : Data Scientist*

Home:  (440)-231-0479 | Work: (216)-368-7560 jd...@case.edu  | 
j...@uw.edu | jo...@armsandanchors.com 



RE: cTAKES false positives, case-insensitivity

2016-06-01 Thread Savova, Guergana
This is the very interesting topic of Word Sense Disambiguation. Currently 
there are no generalizable, large-scale solutions for it. One can, in a way, 
"hack" it if the domain is constrained. For example, if your extraction focuses 
on the use of hearing aids, you can have a rule that says: if "hearing" occurs in 
proximity to "aid"/"aids", tag it with the code for a hearing aid and remove all 
other ontology mappings.
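
A minimal sketch of such a constrained-domain rule, assuming mentions are simple (begin, end, text, codes) records rather than cTAKES CAS annotations. The 20-character window, the `Mention` record and the `HEARING_AID_CODE` placeholder are hypothetical; substitute the code your project actually uses.

```python
import re
from dataclasses import dataclass, field

# Placeholder: substitute the hearing-aid code your project actually uses.
HEARING_AID_CODE = "HEARING_AID_CODE"


@dataclass
class Mention:
    begin: int
    end: int
    text: str
    codes: list = field(default_factory=list)


def apply_hearing_aid_rule(doc, mentions, window=20):
    """If 'hearing' appears within `window` characters before an 'aid'/'aids'
    mention, replace all of that mention's codes with the hearing-aid code."""
    for m in mentions:
        if m.text.lower() not in ("aid", "aids"):
            continue
        preceding = doc[max(0, m.begin - window):m.begin]
        if re.search(r"\bhearing\b", preceding, re.IGNORECASE):
            m.codes = [HEARING_AID_CODE]
    return mentions


doc = "Pt uses hearing aids. History of AIDS."
mentions = [Mention(16, 20, "aids", ["C0001175"]),
            Mention(33, 37, "AIDS", ["C0001175"])]
for m in apply_hearing_aid_rule(doc, mentions):
    print(m.text, m.codes)
# aids ['HEARING_AID_CODE']
# AIDS ['C0001175']
```

The window size is the fragile part: widen it enough and "hearing" in the first sentence would also capture the second, genuine AIDS mention, which is why this only works when the domain is narrow.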

In general, the topic is an excellent candidate for PhD thesis work.

Hope this helps.
--Guergana



-Original Message-
From: Tomasz Oliwa [mailto:ol...@uchicago.edu] 
Sent: Wednesday, June 1, 2016 11:28 AM
To: dev@ctakes.apache.org
Subject: cTAKES false positives, case-insensitivity

Hi,

I have encountered false positives annotated with cTAKES that seem to come from 
case-insensitivity of the annotation lookup, such as:

Pt uses hearing aids. -> "aids" is found as DiseaseDisorderMention 
cui=C0001175, Acquired Immunodeficiency Syndrome

Pt values are all stable. -> "all" is found as DiseaseDisorderMention 
cui=C1961102, Precursor Cell Lymphoblastic Leukemia Lymphoma

Are there ways in cTAKES to approach or to resolve such issues?

How do you deal with such false positives, so that they are not matched?

Regards,
Tomasz
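
One common workaround for the two examples above is a post-filter that keeps case-insensitive matching for ordinary terms but requires short all-caps dictionary terms (likely acronyms such as AIDS or ALL) to match the note text exactly. This is a minimal sketch, assuming the dictionary keeps each term's original casing; `keep_match` and the 5-character acronym limit are hypothetical, not a cTAKES setting.

```python
def keep_match(dictionary_term, text_span, max_acronym_len=5):
    """Decide whether a dictionary hit should be kept.

    Ordinary terms match case-insensitively, but short all-caps dictionary
    terms (likely acronyms such as AIDS or ALL) must match the text exactly.
    """
    if dictionary_term.isupper() and len(dictionary_term) <= max_acronym_len:
        return text_span == dictionary_term
    return text_span.lower() == dictionary_term.lower()


for term, span in [("AIDS", "aids"), ("ALL", "all"), ("AIDS", "AIDS"),
                   ("Hypertension", "hypertension")]:
    print(term, span, keep_match(term, span))
# AIDS aids False
# ALL all False
# AIDS AIDS True
# Hypertension hypertension True
```

The trade-off: notes typed entirely in upper case, or authors who write the disease as "aids", will lose true matches, so it is worth checking how your own notes are cased before adopting a rule like this.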


RE: Best combination of analysis engines to consider negation, family history, uncertainty, etc.

2016-10-19 Thread Savova, Guergana
Hi Yiming,
Re your question about gold standard datasets: in parallel with releasing best 
performing methods in cTAKES, we have generated several gold standard datasets. 
Our plan is to start distributing them through a unified effort -- a health NLP 
Center. See attached exec summary. We hope to have the Center running in the 
very near future.

Cheers,
--Guergana

-Original Message-
From: Zuo Yiming [mailto:yiming...@gmail.com] 
Sent: Wednesday, October 19, 2016 12:22 PM
To: dev@ctakes.apache.org
Subject: Re: Best combination of analysis engines to consider negation, family 
history, uncertainty, etc.

Hi Sean and Timothy,

Thanks for your clarification about the ClearTK tools. I'm amazed by the power of 
cTAKES and by the resources and community you have worked to build. I will 
certainly be happy to provide more feedback as my project moves on.

For Timothy,

By rule-based system, do you mean the assertion annotator? What about the 
old negation annotator and the status annotator; are they also rule-based 
systems? I get the feeling that the assertion annotator and the ClearTK system are 
favored over the negation annotator and the status annotator in cTAKES right 
now, for some reason.

Regarding the ClearTK system on my test files, the negation, history and 
uncertainty modules work just as well as the assertion annotator. I only have a 
few test files, so it's really hard to tell which one is better. The main 
difference comes when detecting the subject and generic properties. On my limited 
test files, the ClearTK system doesn't work at all for these. It assigns patient 
as the subject for all detected phrases, even when it's the patient's family 
member who has diabetes. The same problem occurs with the generic property: the 
ClearTK system assigns false as the generic property for all detected phrases. 
The paper mentioned by you and Sean seems interesting; I will take a look later.

As a further question, can you give me some suggestions on where to find 
public gold standard datasets so I can conduct an independent 
evaluation of cTAKES with metrics like precision, recall and F1 score?
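
Whatever gold standard is used, the scoring itself is small. Below is a minimal sketch, assuming both gold and system annotations have been reduced to (begin, end, CUI) tuples and that system output is already restricted to the annotation types the gold standard covers (otherwise precision is meaningless). It supports exact span matching and the more lenient "same CUI, overlapping span" matching; the tuples and CUI labels are made up.

```python
def overlaps(a, b):
    """True if half-open character spans a=(begin, end, ...) and b overlap."""
    return a[0] < b[1] and b[0] < a[1]


def evaluate(gold, system, mode="exact"):
    """Score (begin, end, cui) annotations; mode is 'exact' or 'overlap'.

    A pair matches when the CUIs agree and the spans are identical ('exact')
    or share at least one character ('overlap').
    """
    def match(g, s):
        if g[2] != s[2]:
            return False
        if mode == "exact":
            return g[0] == s[0] and g[1] == s[1]
        return overlaps(g, s)

    correct_system = sum(any(match(g, s) for g in gold) for s in system)
    found_gold = sum(any(match(g, s) for s in system) for g in gold)
    precision = correct_system / len(system) if system else 0.0
    recall = found_gold / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Made-up annotations: (begin, end, cui)
gold = [(10, 20, "C_A"), (30, 42, "C_B")]
system = [(10, 20, "C_A"), (30, 38, "C_B"), (50, 55, "C_C")]
print(evaluate(gold, system, "exact"))    # about (0.333, 0.5, 0.4)
print(evaluate(gold, system, "overlap"))  # about (0.667, 1.0, 0.8)
```

Reporting both modes is useful because span-boundary disagreements (e.g. whether a modifier is included) can dominate exact-match errors even when the concept is right.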

Lastly, a minor suggestion from the user perspective would be to add the 
preferred words property to the AggregatePlaintextUMLSProcessor. As I pointed 
out briefly in my first email, using AggregatePlaintextFastUMLSProcessor we can 
get the preferred words for detected phrases, but not with 
AggregatePlaintextUMLSProcessor. This is very helpful when the detected phrases 
are acronyms, such as pt for patient. In my experience, 
AggregatePlaintextUMLSProcessor tends to detect more clinically relevant phrases 
than AggregatePlaintextFastUMLSProcessor. It would be really nice to have the 
same preferred words property in AggregatePlaintextUMLSProcessor in a future 
cTAKES release.

Best,
Yiming

On Wed, Oct 19, 2016 at 11:11 AM, Miller, Timothy < 
timothy.mil...@childrens.harvard.edu> wrote:

> I can second Sean's thank you, it is good to have this feedback. The 
> ClearTK machine learning models were made the default after we ran 
> some experiments that found it performed better across a range of 
> standard datasets than rule-based algorithms or the existing cTAKES 
> module ( 
> https://urldefense.proofpoint.com/v2/url?u=http-3A__journals.plos.org_plosone_article-3Fid-3D10.1371_journal.pone.0112774&d=DQIBaQ&c=qS4goWBT7poplM69zy_3xhKwEW14JZMSdioCoppxeFU&r=SeLHlpmrGNnJ9mI2WCgf_wwQk9zL4aIrVmfBoSi-j0kfEcrO4yRGmRCJNAr-rCmP&m=h2xGj7JrNP5pTj6fU4IE9EdNfbJZ0FkOk3swxGR91E4&s=9b891QWT_DEckn4f25-xn3W32qkz8UoOw61qKAOqpK0&e=
>  ).
> Since making them the default, though, we have heard from people and 
> had our own experience conflict with those experiments. And certainly 
> the errors in the rule-based system are easier to understand.
>
> Just curious, are you able to characterize the errors you see from the 
> ClearTK system? I did some experiments recently on a new dataset 
> comparing negex with the cleartk negation module and found that there 
> was a precision/recall tradeoff but almost identical F1 scores. But 
> for that dataset the tradeoff negex provided was preferred by our 
> collaborators. (I think negex had better recall of negated terms but worse 
> precision).
>
> Tim
>
>
>
> 
> From: Finan, Sean 
> Sent: Wednesday, October 19, 2016 10:53 AM
> To: dev@ctakes.apache.org
> Subject: RE: Best combination of analysis engines to consider 
> negation, family history, uncertainty, etc.
>
> Hi Yiming,
>
>
>
> Thank you very much for letting the community know what has and has 
> not worked for you.  I have also had better results with the Assertion 
> annotators than the ClearTk alternatives, but that could be because of 
> the note types/formats that I am using.
>
>
>
> Regarding the "Clear" in names, it is because ClearTk (Clear ToolKit) 
> is used to train machine learning models for detection of the 
> indicated property.  You can find information on ClearTk starting here:
> https://urldefen
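
Tim's note in this thread, that negex and the ClearTK negation module traded precision against recall yet had almost identical F1, follows from F1 being the harmonic mean F1 = 2PR / (P + R): quite different operating points can share one F1 value. A minimal sketch with hypothetical precision/recall pairs (not the numbers from that experiment):

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical operating points, not the figures from the negex/ClearTK comparison.
for p, r in [(0.90, 0.72), (0.80, 0.80), (0.72, 0.90)]:
    print(f"P={p:.2f} R={r:.2f} F1={f1(p, r):.3f}")
# Each line prints F1=0.800
```

Because F1 hides which side of the trade-off a system sits on, the collaborators' preference (for example, higher recall of negated terms) has to be judged from precision and recall directly.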

RE: Best combination of analysis engines to consider negation, family history, uncertainty, etc.

2016-10-20 Thread Savova, Guergana
: dev@ctakes.apache.org
Subject: Re: Best combination of analysis engines to consider negation, family 
history, uncertainty, etc.

Hi Sean and Guergana,

Thanks for your reply about the fast and non-fast dictionary look-up, and the 
testing dataset. Originally, I thought the fast annotator is fast because it 
only takes a portion of the whole dictionary. Now I realize the fast annotator 
is the more powerful one. That's very helpful.

For Guergana,

Were you also trying to attach the exec summary? I couldn't see it from the 
email.

Best,
Yiming


RE: New to CTAKES [SUSPICIOUS]

2017-01-17 Thread Savova, Guergana
As Sean mentioned, we, the NLP lab at Boston Children's Hospital/Harvard Medical 
School, will be dedicating significant effort (=FTE) over the next several months 
to make a solid release happen as soon as possible. We expect the release within 
3 months. As Sean mentioned, help from the broader cTAKES community is welcome.
Thank you!
--Guergana

Guergana Savova, PhD, FACMI
Associate Professor
PI Natural Language Processing Lab
Boston Children's Hospital and Harvard Medical School
300 Longwood Avenue
Mailstop: BCH3092
Enders 144.1
Boston, MA 02115
Tel: (617) 919-2972
Fax: (617) 730-0817
guergana.sav...@childrens.harvard.edu
Harvard Scholar: http://scholar.harvard.edu/guergana_k_savova/biocv
ctakes.apache.org
thyme.healthnlp.org
cancer.healthnlp.org
share.healthnlp.org





-Original Message-
From: Finan, Sean [mailto:sean.fi...@childrens.harvard.edu] 
Sent: Tuesday, January 17, 2017 12:54 PM
To: dev@ctakes.apache.org
Subject: RE: New to CTAKES [SUSPICIOUS]

> 3.2.3 is still considered a snapshot - do you have any feeling when/if it 
> will be released?

Good question.

One person has volunteered to be a release manager and we at the Boston 
Children's Hospital nlp group are trying to get some additional hands on the 
task.  There are still outstanding bugs.  Ctakes-core is undergoing some 
changes and should be tested before release.  I think that the state is good, 
but in my opinion the whole app needs to have some end-to-end testing before a 
major release.  At a recent hackathon, 50% of those present could not by 
themselves get ctakes installed and running even with written instructions.  In 
my mind a release that is not usable is not a release at all, so I think that 
we need to devote a little effort to usability.  Again, that is just my 
opinion.  I have put in time over the past few months to work on making ctakes 
easier for newcomers and non-developers.  As you noticed, a fair amount of 
online documentation is stale, and it would be great if people volunteered to 
update it before a release.  After that there are just the matters of updating 
the main website links, publicizing the release, release notes, and a parade 
with balloons.

I think that everybody out there would be happy if there were a new official, 
stable and usable release. I also think that we can get one of good quality 
together within the next 3 months, more quickly if there are volunteers from 
the community.

Sean


-Original Message-
From: Dunlop, Joyce (HP) [mailto:joyce.dun...@va.gov] 
Sent: Tuesday, January 17, 2017 12:32 PM
To: dev@ctakes.apache.org
Subject: RE: New to CTAKES 

Thanks Sean,

3.2.3 is still considered a snapshot - do you have any feeling when/if it will 
be released?

Thanks,
Joyce

-Original Message-
From: Finan, Sean [mailto:sean.fi...@childrens.harvard.edu] 
Sent: Tuesday, January 17, 2017 10:57 AM
To: dev@ctakes.apache.org
Subject: [EXTERNAL] RE: New to CTAKES 

Hi Joyce,

If you are building from source then you should not need to manually download 
the resources.  Maven should be doing it for you.  Well, that is the behavior 
of 3.2.3 ... I honestly cannot remember what 3.2.2 did ...

Otherwise, I think that if the latest resources bundle was 3.2.1.1, then that is 
probably the most appropriate one for the 3.2.2 release if you want all of the 
resources.

As for building and deploying ytex, I don't have any advice.  Perhaps some ytex 
power-user out there can help.

Sean

-Original Message-
From: Dunlop, Joyce (HP) [mailto:joyce.dun...@va.gov] 
Sent: Tuesday, January 17, 2017 11:25 AM
To: dev@ctakes.apache.org
Cc: Dorner, Andrew J. (PSI); Rustrian, Armando (Liberty ITS)
Subject: New to CTAKES 

Good Morning,

I am trying to set up a development environment using the source release of 
3.2.2.

Reading through the documentation on

https://urldefense.proofpoint.com/v2/url?u=https-3A__cwiki.apache.org_confluence_display_CTAKES_cTAKES-2B3.2-2BDeveloper-2BInstall-2BGuide&d=DwIFAg&c=qS4goWBT7poplM69zy_3xhKwEW14JZMSdioCoppxeFU&r=fs67GvlGZstTpyIisCYNYmQCP6r0bcpKGd4f7d4gTao&m=yBXENVQKpWjVraf6Zf7uY5l9LJxxrRiiE-yjyFID6d8&s=iqpkHc0kT5mucNnxYyc1mczXXlbmSVJlX-8dxeJvp2o&e=
 .

Merge the version-matching resources ZIP file from 
https://urldefense.proofpoint.com/v2/url?u=http-3A__sourceforge.net_projects_ctakesresources_files_&d=DwIFAg&c=qS4goWBT7poplM69zy_3xhKwEW14JZMSdioCoppxeFU&r=fs67GvlGZstTpyIisCYNYmQCP6r0bcpKGd4f7d4gTao&m=yBXENVQKpWjVraf6Zf7uY5l9LJxxrRiiE-yjyFID6d8&s=wPUG8d9qpl_kQBPP5xI9y84mwMEXfaB2cdbkHvWaa0Y&e=
  into your ctakes-dictionary-lookup-res project.

ctakes-resources-3.2.1.1-bin.zip is available for download.  Is there a 3.2.2 version of the resources?

Aft

gold standard annotations for cTAKES

2017-01-31 Thread Savova, Guergana
A while ago our physician colleague John Green created 16 realistic-looking 
(but fake) clinical notes. Many thanks again, John!

These notes are in ctakes-examples/data/notes. We now volunteer to annotate 
them with gold annotations. The main elements with their attributes are:
Medications, Attributes ::= span associatedCode change_status_model 
conditional dosage_model duration_model end_date form_model frequency_model 
generic negation_indicator route_model start_date strength_model subject 
uncertainty_indicator

Signs/Symptoms, Attributes ::= associated_code body_location conditional course 
duration end_time generic historyOf negation_indicator 
relative_temporal_context severity start_time subject uncertainty_indicator

Anatomical Sites, Attributes ::= associatedCode conditional generic 
negation_indicator subject uncertainty_indicator

Disease/Disorders, Attributes ::= associated_code body_location conditional 
course duration end_time generic historyOf negation_indicator 
relative_temporal_context severity start_time subject uncertainty_indicator

Procedures, Attributes ::= associated_code body_location conditional duration 
end_time generic historyOf method negation_indicator relative_temporal_context 
start_time subject uncertainty_indicator



We expect to have the gold annotations by the end of March. We are using the
Anafora annotation tool (https://github.com/weitechen/anafora) and will
release the annotations in Anafora's XML format.
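
Since the annotations will be released as Anafora XML, here is a minimal Python sketch for loading them with the standard library. It assumes each annotation is an `<entity>` element with `<id>`, `<span>`, `<type>` and `<properties>` children, and that a span is written as `start,end` with `;` separating discontinuous pieces. The sample document, ids and type name are made up for illustration, so please check the layout against the released files and schema.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple

# Hypothetical sample in the assumed Anafora-style layout (not a real note).
SAMPLE = """<data>
  <annotations>
    <entity>
      <id>1@e@note01@gold</id>
      <span>10,18</span>
      <type>Disease_Disorder</type>
      <properties>
        <negation_indicator>true</negation_indicator>
      </properties>
    </entity>
    <entity>
      <id>2@e@note01@gold</id>
      <span>30,35;40,44</span>
      <type>Disease_Disorder</type>
      <properties/>
    </entity>
  </annotations>
</data>"""


def parse_spans(span_text: str) -> List[Tuple[int, int]]:
    """Turn '30,35;40,44' into [(30, 35), (40, 44)] (assumed span syntax)."""
    spans = []
    for piece in span_text.split(";"):
        if piece.strip():
            start, end = piece.split(",")
            spans.append((int(start), int(end)))
    return spans


def read_entities(xml_text: str) -> List[Dict]:
    """Collect id, type, character spans and properties of every <entity>."""
    root = ET.fromstring(xml_text)
    entities = []
    for ent in root.iter("entity"):
        props_el = ent.find("properties")
        props = {} if props_el is None else {p.tag: (p.text or "") for p in props_el}
        entities.append({
            "id": ent.findtext("id"),
            "type": ent.findtext("type"),
            "spans": parse_spans(ent.findtext("span", default="")),
            "properties": props,
        })
    return entities


for e in read_entities(SAMPLE):
    print(e["id"], e["type"], e["spans"], e["properties"])
```

Keeping spans as a list of pairs, rather than a single start/end, preserves discontinuous mentions so they can later be compared against system output piece by piece.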



Regards,

--Guergana

Guergana Savova, PhD, FACMI
Associate Professor
PI Natural Language Processing Lab
Boston Children's Hospital and Harvard Medical School
300 Longwood Avenue
Mailstop: BCH3092
Enders 144.1
Boston, MA 02115
Tel: (617) 919-2972
Fax: (617) 730-0817
guergana.sav...@childrens.harvard.edu
Harvard Scholar: http://scholar.harvard.edu/guergana_k_savova/biocv
ctakes.apache.org
thyme.healthnlp.org
cancer.healthnlp.org
share.healthnlp.org





RE: gold standard annotations for cTAKES [SUSPICIOUS]

2017-01-31 Thread Savova, Guergana
Thank you, Sean!

Yes, absolutely -- we welcome volunteers for the gold annotations!
Regards,
--Guergana

-Original Message-
From: Finan, Sean [mailto:sean.fi...@childrens.harvard.edu] 
Sent: Tuesday, January 31, 2017 4:08 PM
To: dev@ctakes.apache.org
Subject: RE: gold standard annotations for cTAKES [SUSPICIOUS]

Hi all,

I just have a couple of notes to expand upon what Guergana wrote.

Anafora requires a schema for annotation, and it requires text files to be in a
certain structure. I just checked the text files for annotation, and the schema
that we plan to use, into ctakes-examples-res under
src/main/resources/org/apache/ctakes/examples/annotation/

Everybody is obviously welcome to use the schema and notes, or to create 
annotations using another tool for all to share.

As a disclaimer: Anafora is not associated with cTAKES. In my opinion, the
cTAKES dev list should not be overused for Anafora Q&A.

Thanks,
Sean

-Original Message-
From: Savova, Guergana [mailto:guergana.sav...@childrens.harvard.edu] 
Sent: Tuesday, January 31, 2017 3:42 PM
To: dev@ctakes.apache.org
Subject: gold standard annotations for cTAKES [SUSPICIOUS]

A while ago our physician colleague John Green created 16 realistic-looking
(but fake) clinical notes. Many thanks again, John!

These notes are in ctakes-examples/data/notes. We now volunteer to annotate 
them with gold annotations. The main elements with their attributes are:
Medications, Attributes ::= span associatedCode change_status_model
conditional dosage_model duration_model end_date form_model frequency_model
generic negation_indicator route_model start_date strength_model subject
uncertainty_indicator

Signs/Symptoms, Attributes ::= associated_code body_location conditional course
duration end_time generic historyOf negation_indicator
relative_temporal_context severity start_time subject uncertainty_indicator

Anatomical Sites, Attributes ::= associatedCode conditional generic
negation_indicator subject uncertainty_indicator

Disease/Disorders, Attributes ::= associated_code body_location conditional
course duration end_time generic historyOf negation_indicator
relative_temporal_context severity start_time subject uncertainty_indicator

Procedures, Attributes ::= associated_code body_location conditional duration
end_time generic historyOf method negation_indicator relative_temporal_context
start_time subject uncertainty_indicator

We expect to have the gold annotations by the end of March. We are using the
Anafora annotation tool (https://github.com/weitechen/anafora) and will
release the annotations in Anafora's XML format.



Regards,

--Guergana

Guergana Savova, PhD, FACMI
Associate Professor
PI Natural Language Processing Lab
Boston Children's Hospital and Harvard Medical School
300 Longwood Avenue
Mailstop: BCH3092
Enders 144.1
Boston, MA 02115
Tel: (617) 919-2972
Fax: (617) 730-0817
guergana.sav...@childrens.harvard.edu
Harvard Scholar: http://scholar.harvard.edu/guergana_k_savova/biocv
ctakes.apache.org
thyme.healthnlp.org
cancer.healthnlp.org
share.healthnlp.org





RE: Phenotype-specific entities

2017-02-15 Thread Savova, Guergana
Hi Erin,
Yes, creating your customized dictionary is the way to go. You can prune by 
semantic types of interest and then remove branches that are not relevant to 
your specific phenotype. I am not aware of cTAKES implementing such a tool for 
a very customized dictionary.
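
One way to do the semantic-type pruning is to filter concepts by their UMLS semantic type identifiers (TUIs) in MRSTY.RRF, the pipe-delimited table whose rows begin `CUI|TUI|STN|STY|...`. This minimal Python sketch is not a cTAKES tool; it assumes you already have `(CUI, term)` pairs for your dictionary, and it models "removing branches" simply as an explicit exclusion set. The CUIs, terms and rows are invented for illustration.

```python
from typing import AbstractSet, Iterable, List, Set, Tuple

# MRSTY.RRF is pipe-delimited: CUI|TUI|STN|STY|ATUI|CVF|
CUI_COL, TUI_COL = 0, 1


def cuis_with_types(mrsty_lines: Iterable[str], wanted_tuis: Set[str]) -> Set[str]:
    """Return the CUIs that carry at least one of the wanted semantic types."""
    keep = set()
    for line in mrsty_lines:
        fields = line.rstrip("\n").split("|")
        if fields[TUI_COL] in wanted_tuis:
            keep.add(fields[CUI_COL])
    return keep


def prune_terms(terms: Iterable[Tuple[str, str]], keep_cuis: AbstractSet[str],
                exclude_cuis: AbstractSet[str] = frozenset()) -> List[Tuple[str, str]]:
    """Keep (CUI, term) pairs whose CUI is wanted and not explicitly excluded."""
    return [(cui, text) for cui, text in terms
            if cui in keep_cuis and cui not in exclude_cuis]


# Invented rows: T047 = Disease or Syndrome, T184 = Sign or Symptom,
# T121 = Pharmacologic Substance.
rows = [
    "C0000001|T047|x|Disease or Syndrome|AT1|0|",
    "C0000002|T184|x|Sign or Symptom|AT2|0|",
    "C0000003|T121|x|Pharmacologic Substance|AT3|0|",
]
terms = [("C0000001", "disease a"), ("C0000002", "symptom b"), ("C0000003", "drug c")]
kept = cuis_with_types(rows, {"T047", "T184"})
print(prune_terms(terms, kept, exclude_cuis={"C0000002"}))
```

The exclusion set is where phenotype-specific knowledge would go: after the coarse semantic-type cut, concepts known to be irrelevant to the phenotype are dropped explicitly.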

You can also start with a few terms that you know are relevant to your
phenotype and then find their synonyms in the UMLS. Then, you can further walk
a specific ontology and take siblings and parents if you think they are relevant.
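
The synonym step can be sketched against MRCONSO.RRF, where every string the UMLS records for a concept shares that concept's CUI. This is a minimal in-memory Python sketch assuming the standard pipe-delimited MRCONSO layout (CUI at field 0, LAT at field 1, STR at field 14). Matching seed terms by exact, case-insensitive string equality is a simplification; a real dictionary build would also need normalization and source (SAB) filtering.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set

# MRCONSO.RRF columns:
# CUI|LAT|TS|LUI|STT|SUI|ISPREF|AUI|SAUI|SCUI|SDUI|SAB|TTY|CODE|STR|SRL|SUPPRESS|CVF|
CUI, LAT, STR = 0, 1, 14


def load_strings(mrconso_lines: Iterable[str], language: str = "ENG") -> Dict[str, Set[str]]:
    """Map each CUI to the set of its strings in one language."""
    cui_to_strings: Dict[str, Set[str]] = defaultdict(set)
    for line in mrconso_lines:
        fields = line.rstrip("\n").split("|")
        if fields[LAT] == language:
            cui_to_strings[fields[CUI]].add(fields[STR])
    return cui_to_strings


def synonyms_for(seed_terms: Iterable[str],
                 cui_to_strings: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Find CUIs having a seed term among their strings; return all their strings."""
    seeds = {t.lower() for t in seed_terms}
    return {cui: strings for cui, strings in cui_to_strings.items()
            if any(s.lower() in seeds for s in strings)}
```

Loading the whole table into memory is only practical after restricting it (for example to one language, as here); for a full UMLS release a database load of MRCONSO is the more usual route.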

Then, there is the whole field of using word embeddings to find
synonyms/related terms from unlabeled data, if you want to become really fancy
:-) At this point, cTAKES does not implement any deep learning algorithms; in
the future, we are planning to release a bridge to Keras.

I hope this makes sense.

--
Guergana Savova, PhD, FACMI
Associate Professor
PI Natural Language Processing Lab
Boston Children's Hospital and Harvard Medical School
300 Longwood Avenue
Mailstop: BCH3092
Enders 144.1
Boston, MA 02115
Tel: (617) 919-2972
Fax: (617) 730-0817
guergana.sav...@childrens.harvard.edu
Harvard Scholar: http://scholar.harvard.edu/guergana_k_savova/biocv
ctakes.apache.org
thyme.healthnlp.org
cancer.healthnlp.org
share.healthnlp.org


-Original Message-
From: Erin Nicole Gustafson [mailto:erin.gustaf...@northwestern.edu] 
Sent: Wednesday, February 15, 2017 1:38 PM
To: dev@ctakes.apache.org
Subject: Phenotype-specific entities

Hi all,

I would like to be able to identify only entities that are relevant to some
specific phenotype. One step towards achieving this would be to build a custom
dictionary with a limited set of semantic types. However, this is not quite
specific enough to identify only mentions related to one disease while ignoring
those related to some other disease, for example.

Does cTAKES currently have a way to do this sort of filtering? Or, has anyone 
developed their own tools that they'd be willing to share?

Thanks,
Erin


RE: Phenotype-specific entities

2017-02-15 Thread Savova, Guergana
I don't believe there is a tool for walking the UMLS ontology, Dima. But Sean 
should confirm that his dictionary building tool does not have that 
functionality.

I think you can use the UMLS tables to get that information. It has been quite
a while since I used these tables, but I remember I was able to get that
information from them...
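
As a rough illustration of getting hierarchy information from the UMLS tables, this Python sketch reads MRREL.RRF rows (CUI1 at field 0, REL at field 3, CUI2 at field 4, SAB at field 10), builds parent/child maps from the PAR and CHD relationships, and derives siblings and ancestors. It assumes the reading that REL describes the relationship of CUI2 to CUI1 (so PAR means CUI2 is a parent of CUI1); please verify that against the UMLS Reference Manual before relying on it, and restrict to one source vocabulary (SAB), since hierarchies differ between sources. The source name used as a filter is only an example.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Set, Tuple

# MRREL.RRF columns:
# CUI1|AUI1|STYPE1|REL|CUI2|AUI2|STYPE2|RELA|RUI|SRUI|SAB|SL|RG|DIR|SUPPRESS|CVF|
CUI1, REL, CUI2, SAB = 0, 3, 4, 10

Graph = Dict[str, Set[str]]


def load_hierarchy(mrrel_lines: Iterable[str],
                   source: Optional[str] = None) -> Tuple[Graph, Graph]:
    """Build parent and child maps from PAR/CHD rows, optionally for one SAB only.

    Assumption: REL = "PAR" means CUI2 is a parent of CUI1; "CHD" means CUI2 is a child.
    """
    parents: Graph = defaultdict(set)
    children: Graph = defaultdict(set)
    for line in mrrel_lines:
        f = line.rstrip("\n").split("|")
        if source is not None and f[SAB] != source:
            continue
        if f[REL] == "PAR":
            parents[f[CUI1]].add(f[CUI2])
            children[f[CUI2]].add(f[CUI1])
        elif f[REL] == "CHD":
            children[f[CUI1]].add(f[CUI2])
            parents[f[CUI2]].add(f[CUI1])
    return parents, children


def siblings(cui: str, parents: Graph, children: Graph) -> Set[str]:
    """Other children of any of this concept's parents."""
    return {s for p in parents.get(cui, set()) for s in children.get(p, set()) if s != cui}


def ancestors(cui: str, parents: Graph) -> Set[str]:
    """All transitive parents, found with an iterative depth-first walk."""
    seen: Set[str] = set()
    stack = [cui]
    while stack:
        for p in parents.get(stack.pop(), set()):
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen
```

Recording both PAR and CHD rows in both maps means the sketch still works if a source lists a relationship in only one direction.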

Sean,
Does your dictionary building tool implement ontology walking?

--Guergana

-Original Message-
From: Dligach, Dmitriy [mailto:ddlig...@luc.edu] 
Sent: Wednesday, February 15, 2017 1:50 PM
To: dev@ctakes.apache.org
Subject: Re: Phenotype-specific entities

Guergana, thank you. 

Is there anything in cTAKES now for walking the UMLS ontology (e.g. for finding 
hypernyms, synonyms, etc.)?

Dima



> On Feb 15, 2017, at 12:45, Savova, Guergana 
>  wrote:
> 
> Hi Erin,
> Yes, creating your customized dictionary is the way to go. You can prune by 
> semantic types of interest and then remove branches that are not relevant to 
> your specific phenotype. I am not aware of cTAKES implementing such a tool 
> for a very customized dictionary.
> 
> You can also start with a few terms that you know are relevant to your
> phenotype and then find their synonyms in the UMLS. Then, you can further
> walk a specific ontology and take siblings and parents if you think they are
> relevant.
>
> Then, there is the whole field of using word embeddings to find
> synonyms/related terms from unlabeled data, if you want to become really
> fancy :-) At this point, cTAKES does not implement any deep learning
> algorithms; in the future, we are planning to release a bridge to Keras.
> 
> I hope this makes sense.
> 
> --
> Guergana Savova, PhD, FACMI
> Associate Professor
> PI Natural Language Processing Lab
> Boston Children's Hospital and Harvard Medical School
> 300 Longwood Avenue
> Mailstop: BCH3092
> Enders 144.1
> Boston, MA 02115
> Tel: (617) 919-2972
> Fax: (617) 730-0817
> guergana.sav...@childrens.harvard.edu
> Harvard Scholar: http://scholar.harvard.edu/guergana_k_savova/biocv
> ctakes.apache.org
> thyme.healthnlp.org
> cancer.healthnlp.org
> share.healthnlp.org
> 
> 
> -Original Message-
> From: Erin Nicole Gustafson [mailto:erin.gustaf...@northwestern.edu] 
> Sent: Wednesday, February 15, 2017 1:38 PM
> To: dev@ctakes.apache.org
> Subject: Phenotype-specific entities
> 
> Hi all,
> 
> I would like to be able to identify only entities that are relevant to some
> specific phenotype. One step towards achieving this would be to build a
> custom dictionary with a limited set of semantic types. However, this is not
> quite specific enough to identify only mentions related to one disease while
> ignoring those related to some other disease, for example.
> 
> Does cTAKES currently have a way to do this sort of filtering? Or, has anyone 
> developed their own tools that they'd be willing to share?
> 
> Thanks,
> Erin



RE: Ctakes relation extraction

2017-02-20 Thread Savova, Guergana
A word of caution: the definition of a causal relation in the clinical
narrative is much vaguer than in the general domain. Even if the clinical
narrative asserts a relation between a medication and a sign/symptom (a.k.a.
adverse event), the relation may not actually hold. Moreover, the absence of an
explicitly asserted causal relation does not mean that no such relation exists.
Regards,
--Guergana

Guergana Savova, PhD, FACMI
Associate Professor
PI Natural Language Processing Lab
Boston Children's Hospital and Harvard Medical School
300 Longwood Avenue
Mailstop: BCH3092
Enders 144.1
Boston, MA 02115
Tel: (617) 919-2972
Fax: (617) 730-0817
guergana.sav...@childrens.harvard.edu
Harvard Scholar: http://scholar.harvard.edu/guergana_k_savova/biocv

-Original Message-
From: Abramowitsch, Peter [mailto:pabramowit...@hearst.com] 
Sent: Monday, February 20, 2017 1:19 PM
To: dev@ctakes.apache.org
Subject: Re: Ctakes relation extraction

There is another, more generic NLP engine, StanfordCoreNLP, which does have
more advanced CORel annotation capabilities, but it is not specifically tuned
to clinical concepts and relationships. So, for instance, causal relationships
might be detected if expressed in standard English, but certainly not if
expressed with clinical acronyms. But you might play with it just to get a
sense of what is possible in the open-source space.

Regards,  Peter



On 2/19/17, 11:31 PM, "Oleg Bogatiryov"  wrote:

>Hello to everyone.
>
>I am pleased to join the group.
>
>I am trying to extract relations from documents.
>Ideally I'd like to get a graph or tree of dependencies/relations
>from the clinical documents.
>
>Could you please let me know how I can achieve this?
>
>I am able to run the CVD and the RelationExtractorAggregate analysis engine,
>but there is no useful information in the results that can be used to build
>a relation graph.
>
>Thanks in advance,
>Oleg.



RE: wiki wishlist [SUSPICIOUS]

2017-03-01 Thread Savova, Guergana
Thank you, James!

One suggestion (more to come): post the pamphlet that Sean Finan created for 
the cTAKES hackathon in Chicago in Nov 2016. 

--Guergana


-Original Message-
From: Finan, Sean [mailto:sean.fi...@childrens.harvard.edu] 
Sent: Wednesday, March 1, 2017 1:31 PM
To: dev@ctakes.apache.org
Subject: RE: wiki wishlist [SUSPICIOUS]

Virge!  Thanks James!

Sean

-Original Message-
From: James Masanz [mailto:masanz.ja...@gmail.com] 
Sent: Wednesday, March 01, 2017 1:22 PM
To: dev@ctakes.apache.org
Subject: wiki wishlist

In an earlier post I mentioned I was interested in moving away from Confluence 
for the cTAKES wiki, but the only new wikis Infra will create are Confluence 
ones.

I suggest we use this thread + a JIRA item to compile a list of wiki changes
people would like - formatting, content, anything to do with updating the
cTAKES wiki, which is
https://cwiki.apache.org/confluence/display/CTAKES/cTAKES

First, if you have a quick update, please just go ahead and make it!

I'll start the list with these items:
   - make the sidebar show the most recent cTAKES release at the top (reverse 
chronological order) (Done - Just did it!)
   - incorporate any comments made within the Wiki, such as this one
https://cwiki.apache.org/confluence/display/CTAKES/cTAKES+3.0+User+Install+Guide?focusedCommentId=34013875#comment-34013875

To add to the wish list, either comment on CTAKES-420 or reply to this
thread.

Thanks!
James


FW: ASF Board Report for cTAKES - Initial Reminder for March 2017

2017-03-02 Thread Savova, Guergana
Some items to include in the report:
1. actively working on cTAKES 4.0, a major release scheduled for the end of
March
2. actively working on updating the cTAKES Confluence website
3. human-tagged gold annotations of 18 mockup clinical notes are done. The notes
were generated by cTAKES committer and physician John Green. The format is
Anafora (https://github.com/weitechen/anafora); annotations are for
signs/symptoms, diseases/disorders, procedures, anatomical sites and
medications, with relevant attributes and mappings to ontology concept codes.
The human-tagged annotations were done by Dave Harris at Boston Children's
Hospital.

--
Guergana Savova, PhD, FACMI
Associate Professor
PI Natural Language Processing Lab
Boston Children's Hospital and Harvard Medical School
300 Longwood Avenue
Mailstop: BCH3092
Enders 144.1
Boston, MA 02115
Tel: (617) 919-2972
Fax: (617) 730-0817
guergana.sav...@childrens.harvard.edu
Harvard Scholar: http://scholar.harvard.edu/guergana_k_savova/biocv
ctakes.apache.org
thyme.healthnlp.org
cancer.healthnlp.org
share.healthnlp.org


-Original Message-
From: Brett Porter [mailto:br...@apache.org] 
Sent: Wednesday, March 1, 2017 6:27 AM
To: Pei J Chen 
Cc: priv...@ctakes.apache.org
Subject: ASF Board Report for cTAKES - Initial Reminder for March 2017

This email was sent on behalf of the ASF Board.  It is an initial reminder to 
give you plenty of time to prepare the report.

According to board records, you are listed as the chair of a committee that is 
due to submit a report this month. [1] [2]

The meeting is scheduled for Wed, 15 Mar 2017 at 10:30 PDT and the deadline for 
submitting your report is 1 full week prior to that (Wed Mar 8th)!

Meeting times in other time zones:

  http://timeanddate.com/s/3773

Please submit your report with sufficient time to allow the board members to
review and digest. Again, the very latest you should submit your report is 1
full week (7 days) prior to the board meeting (Wed Mar 8th).

If you feel that an error has been made, please consult [1] and if there is 
still an issue then contact the board directly.

As always, PMC chairs are welcome to attend the board meeting.

Thanks,
The ASF Board

[1] - https://svn.apache.org/repos/private/committers/board/committee-info.txt
[2] - https://svn.apache.org/repos/private/committers/board/calendar.txt
[3] - https://svn.apache.org/repos/private/committers/board/templates
[4] - https://reporter.apache.org/


Submitting your Report
--

Full details about the process and schedule are in [1].

The report should be committed to the meeting agenda in the board directory in 
the foundation repository, trying to keep a similar format to the others.
This can be found at:

  https://svn.apache.org/repos/private/foundation/board

Reports can also be posted using the online agenda tool:

  https://whimsy.apache.org/board/agenda/2017-03-15/cTAKES

Your report should also be sent in plain-text format to bo...@apache.org with a 
Subject line that follows the below format:

  Subject: [REPORT] cTAKES - March 2017

Cutting and pasting directly from a Wiki is not acceptable due to formatti

Apache cTAKES 4.0 and soliciting testimonials

2017-04-03 Thread Savova, Guergana
Dear Apache cTAKES community,

As you know, the Apache cTAKES 4.0 release candidate will be ready for your
testing sometime this week. A big round of applause goes to James Masanz and
Sean Finan for leading this milestone release -- thank you, James and Sean!

Sally Khudairi from Apache generously offered to help us craft the announcement
for the cTAKES 4.0 release. She suggested we solicit quotes/testimonials from
the Apache cTAKES community to demonstrate the project's robustness and breadth
of deployment. We are now asking you to send us your testimonials to include in
the announcement. We very much look forward to your input!

Kindest regards,
--Guergana

Guergana Savova, PhD, FACMI
Associate Professor
PI Natural Language Processing Lab
Boston Children's Hospital and Harvard Medical School
300 Longwood Avenue
Mailstop: BCH3092
Enders 144.1
Boston, MA 02115
Tel: (617) 919-2972
Fax: (617) 730-0817
guergana.sav...@childrens.harvard.edu
Harvard Scholar: http://scholar.harvard.edu/guergana_k_savova/biocv
http://ctakes.apache.org
http://thyme.healthnlp.org
http://cancer.healthnlp.org
http://share.healthnlp.org
http://center.healthnlp.org





RE: cTAKES confluence wiki

2017-04-13 Thread Savova, Guergana
I am sorry but I am not seeing documentation for v4 on the confluence Wiki: 
https://cwiki.apache.org/confluence/display/CTAKES/cTAKES 
Could you please send the relevant link? Also, I think it would be very helpful 
to include the link in the README distributed with the release.

Thanks!
--Guergana

-Original Message-
From: Miller, Timothy [mailto:timothy.mil...@childrens.harvard.edu] 
Sent: Thursday, April 13, 2017 12:49 PM
To: dev@ctakes.apache.org
Subject: Re: testing release candidates Re: Release Apache cTAKES 4.0.0 (rc2) 
[SUSPICIOUS] [SUSPICIOUS]

OK. By logging into confluence I found the draft of version 4.0 documentation, 
but maybe it's worth sending an email to dev with a few pages that need help 
and people can improve as they test?

I will do the same.

Thanks
Tim

On Thu, 2017-04-13 at 12:24 -0400, James Masanz wrote:
> I agree.
> There are (or were) some places marked TBD, and the part about
> unzipping resources needs to be expanded to include what to do if you
> just download the fast dictionary and not the entire set of
> dictionaries. If no one beats me to it, I will improve those sections
> by the time we announce.
> But the documentation is ready for comments and can continually be
> improved, even past an announced release if needed.
> 
> On Thu, Apr 13, 2017 at 12:14 PM, Finan, Sean < 
> sean.fi...@childrens.harvard.edu> wrote:
> 
> > 
> > Hi Tim,
> > 
> > Excellent question/point.
> > 
> > I think that you are welcome to follow any online instructions.  We 
> > are aware that the wiki is far from complete, and one thing that I 
> > welcome everybody to do is become active on documentation.
> > 
> > So, if you find instructions for installation, workflow, etc., please
> > "test" the instructions. If there are none, then comment on the
> > absence. However, I think that a paucity of documentation should not
> > hold up the code/bin release. I could be in the minority opinion.
> > 
> > Sean
> > 
> > -Original Message-
> > From: Miller, Timothy [mailto:timothy.mil...@childrens.harvard.edu]
> > Sent: Thursday, April 13, 2017 11:55 AM
> > To: dev@ctakes.apache.org
> > Subject: Re: testing release candidates Re: Release Apache cTAKES
> > 4.0.0
> > (rc2) [SUSPICIOUS]
> > 
> > Thanks all for your hard work. I added some minor instructions to 
> > the spreadsheet that are hopefully helpful.
> > 
> > I want to test the CVD for standard dictionary lookup with the
> > separate resources. Am I meant to be testing documentation as well?
> > As in, something I can follow along and make sure it's correct? Or 
> > should I just do it the way I know how to do it?
> > Tim
> > 
> > On Wed, 2017-04-12 at 20:21 -0400, James Masanz wrote:
> > > 
> > > Hi Everyone,
> > > 
> > > We could use a google spreadsheet to end up with a sense of 
> > > testing coverage and maybe reduce duplicate testing effort too.
> > > https://docs.google.com/spreadsheets/d/1FK-kEhwewLJaVCBgWsSAMhL2KNCD6L8AMfFM33oKR2Y/edit?usp=sharing
> > > And we can compare future releases to this one.
> > > 
> > > I put a few example lines of the first things I plan to test.
> > > I'll
> > > start testing tomorrow and add more lines for myself then.
> > > If you don't want to update the spreadsheet twice, it would still 
> > > be helpful to list what you've done after you do testing, without 
> > > listing what you plan to do ahead of time.
> > > 
> > > Thanks,
> > > -- James
> > > 
> > > 
> > > On Wed, Apr 12, 2017 at 5:31 PM, Pei Chen 
> > > wrote:
> > > 
> > > > 
> > > > 
> > > > This is a call for a vote on releasing the following candidate
> > > > (rc2) as Apache cTAKES 4.0.0.
> > > >
> > > > For more detailed information on the changes/release notes,
> > > > please visit:
> > > > https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12313621&version=12340211
> > > > 
> > > > The release was made using the cTAKES release process documented
> > > > here:
> > > > https://ctakes.apache.org/ctakes-release-guide.html

RE: cTAKES confluence wiki

2017-04-13 Thread Savova, Guergana
Actually, James did send an email with the Confluence details -- my bad for not 
seeing it.
--Guergana

-Original Message-
From: Savova, Guergana 
Sent: Thursday, April 13, 2017 2:05 PM
To: dev@ctakes.apache.org
Subject: RE: cTAKES confluence wiki

I am sorry but I am not seeing documentation for v4 on the confluence Wiki: 
https://cwiki.apache.org/confluence/display/CTAKES/cTAKES
Could you please send the relevant link? Also, I think it would be very helpful 
to include the link in the README distributed with the release.

Thanks!
--Guergana

-Original Message-
From: Miller, Timothy [mailto:timothy.mil...@childrens.harvard.edu]
Sent: Thursday, April 13, 2017 12:49 PM
To: dev@ctakes.apache.org
Subject: Re: testing release candidates Re: Release Apache cTAKES 4.0.0 (rc2) 
[SUSPICIOUS] [SUSPICIOUS]

OK. By logging into confluence I found the draft of version 4.0 documentation, 
but maybe it's worth sending an email to dev with a few pages that need help 
and people can improve as they test?

I will do the same.

Thanks
Tim

On Thu, 2017-04-13 at 12:24 -0400, James Masanz wrote:
> I agree.
> There are (or were) some places marked TBD, and the part about
> unzipping resources needs to be expanded to include what to do if you
> just download the fast dictionary and not the entire set of
> dictionaries. If no one beats me to it, I will improve those sections
> by the time we announce.
> But the documentation is ready for comments and can continually be
> improved, even past an announced release if needed.
> 
> On Thu, Apr 13, 2017 at 12:14 PM, Finan, Sean < 
> sean.fi...@childrens.harvard.edu> wrote:
> 
> > 
> > Hi Tim,
> > 
> > Excellent question/point.
> > 
> > I think that you are welcome to follow any online instructions.  We 
> > are aware that the wiki is far from complete, and one thing that I 
> > welcome everybody to do is become active on documentation.
> > 
> > So, if you find instructions for installation, workflow, etc., please
> > "test" the instructions. If there are none, then comment on the
> > absence. However, I think that a paucity of documentation should not
> > hold up the code/bin release. I could be in the minority opinion.
> > 
> > Sean
> > 
> > -Original Message-
> > From: Miller, Timothy [mailto:timothy.mil...@childrens.harvard.edu]
> > Sent: Thursday, April 13, 2017 11:55 AM
> > To: dev@ctakes.apache.org
> > Subject: Re: testing release candidates Re: Release Apache cTAKES
> > 4.0.0
> > (rc2) [SUSPICIOUS]
> > 
> > Thanks all for your hard work. I added some minor instructions to 
> > the spreadsheet that are hopefully helpful.
> > 
> > I want to test the CVD for standard dictionary lookup with the
> > separate resources. Am I meant to be testing documentation as well?
> > As in, something I can follow along and make sure it's correct? Or 
> > should I just do it the way I know how to do it?
> > Tim
> > 
> > On Wed, 2017-04-12 at 20:21 -0400, James Masanz wrote:
> > > 
> > > Hi Everyone,
> > > 
> > > We could use a google spreadsheet to end up with a sense of 
> > > testing coverage and maybe reduce duplicate testing effort too.
> > > https://docs.google.com/spreadsheets/d/1FK-kEhwewLJaVCBgWsSAMhL2KNCD6L8AMfFM33oKR2Y/edit?usp=sharing
> > > And we can compare future releases to this one.
> > > 
> > > I put a few example lines of the first things I plan to test.
> > > I'll start testing tomorrow and add more lines for myself then.
> > > If you don't want to update the spreadsheet twice, it would still 
> > > be helpful to list what you've done after you do testing, without 
> > > listing what you plan to do ahead of time.
> > > 
> > > Thanks,
> > > -- James
> > > 
> > > 
> > > On Wed, Apr 12, 2017 at 5:31 PM, Pei Chen 
> > > wrote:
> > > 
> > > > 
> > > > 
> > > > This is a call for a vote on releasing the following candidate
> > > > (rc2) as Apache cTAKES 4.0.0.
> > > > 
> > > > For more detailed information on

RE: Release Apache cTAKES 4.0.0 (rc2) [SUSPICIOUS]

2017-04-15 Thread Savova, Guergana
Agreed that we need rc3 asap.
I am planning to test rc3 this weekend. 
--Guergana


-Original Message-
From: Finan, Sean [mailto:sean.fi...@childrens.harvard.edu] 
Sent: Friday, April 14, 2017 11:38 PM
To: dev@ctakes.apache.org
Subject: RE: Release Apache cTAKES 4.0.0 (rc2) [SUSPICIOUS]

> I'd rather not get into the definition of "basic", just like I'd rather not 
> discuss the definition of obvious with another mathematician.
--> Lol.  My wife can't stand it when I say "obviously".

Fwiw, I think that cutting a new rc sooner rather than later is little work
compared to the benefit for testers. It needs to be done anyway, as what is in
rc2 is not releasable. I don't want to vote -1 on the rc, but will if it is
necessary to get an rc3 cut.

Sean

-Original Message-
From: James Masanz [mailto:masanz.ja...@gmail.com]
Sent: Friday, April 14, 2017 9:36 PM
To: dev@ctakes.apache.org
Subject: Re: Release Apache cTAKES 4.0.0 (rc2)

These are all the fixes I plan to make. Last I talked to Sean, he had all his
changes in. I assume there will be more testing up until the final vote; I
certainly will be doing more testing and working more on documentation. But why
not have people test on the latest now that we have fixed some issues that seem
like showstoppers? I'd rather not get into the definition of "basic", just
like I'd rather not discuss the definition of obvious with another
mathematician.

On Fri, Apr 14, 2017 at 8:23 PM, Pei Chen  wrote:

> James,
> Happy to create another rc3, but can I suggest we bundle all of the 
> fixes before creating another candidate?  Are there other remaining 
> items to test? This just seems like basic functionality?
>
> On Fri, Apr 14, 2017 at 8:04 PM, James Masanz 
> wrote:
> > -1 from me for rc2 because of various issues found:
> > - old dictionary lookup didn't work in an IDE unless you manually
> >   download the latest zip; pom files needed updating (checked into
> >   trunk today) (more of the ctakesresources from SourceForge need to be
> >   put onto Maven Central for ctakes to work as a Maven dependency)
> > - Sean fixed some issues today (I saw commit notices today) which
> >   I'd like to see included in 4.0 before it's released
> >
> > -- James
> >
> >
> > On Wed, Apr 12, 2017 at 5:31 PM, Pei Chen  wrote:
> >
> >> This is a call for a vote on releasing the following candidate
> >> (rc2) as Apache cTAKES 4.0.0.
> >>
> >> For more detailed information on the changes/release notes, please visit:
> >> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12313621&version=12340211
> >>
> >> The release was made using the cTAKES release process documented here:
> >> https://ctakes.apache.org/ctakes-release-guide.html
> >>
> >> The candidate is available at:
> >> https://dist.apache.org/repos/dist/dev/ctakes/ctakes-4.0.0-rc2/apache-ctakes-4.0.0-src.tar.gz
> >> /.zip
> >>
> >> The tag to be voted on:
> >> http://svn.apache.org/repos/asf/ctakes/tags/ctakes-4.0.0-rc2
> >> The MD5 checksum of the tarball can be found at:
> >> https://dist.apache.org/repos/dist/dev/ctakes/ctakes-4.0.0-rc2/apache-ctakes-4.0.0-src.tar.gz.md5
> >> /.zip.md5
> >>
> >> The signature of the tarball can be found at:
> >> https://dist.apache.org/repos/dist/dev/ctakes/ctakes-4.0.0-rc2/apache-ctakes-4.0.0-src.tar.gz.asc
> >> /.zip.asc
> >>
> >> Apache

RE: Release Apache cTAKES 4.0.0 (rc2)

2017-04-15 Thread Savova, Guergana
Not sure what is meant by "this week". Today, Sat, April 15 by 5 pm?
--Guergana

Guergana Savova, PhD, FACMI
Associate Professor
PI Natural Language Processing Lab
Boston Children's Hospital and Harvard Medical School
300 Longwood Avenue
Mailstop: BCH3092
Enders 144.1
Boston, MA 02115
Tel: (617) 919-2972
Fax: (617) 730-0817
guergana.sav...@childrens.harvard.edu
Harvard Scholar: http://scholar.harvard.edu/guergana_k_savova/biocv
http://ctakes.apache.org  
http://thyme.healthnlp.org 
http://cancer.healthnlp.org 
http://share.healthnlp.org
http://center.healthnlp.org  


-Original Message-
From: Pei Chen [mailto:pei.c...@wiredinformatics.com] 
Sent: Saturday, April 15, 2017 9:13 AM
To: dev@ctakes.apache.org
Subject: Re: Release Apache cTAKES 4.0.0 (rc2)

Let us recut 4.0.0 from trunk this week.  I just saw a note from Sean that he 
would like to integrate changes from trunk as well.

   Pei Chen
Wired Informatics 

265 Franklin St Ste 1702
Boston, MA 02110
tel: (617) 433-7544
pei.c...@wiredinformatics.com


RE: Release Apache cTAKES 4.0.0 (rc2)

2017-04-17 Thread Savova, Guergana
Pei/Murali,
Let us know if you could cut release candidate 3 by Monday, April 17, 5 pm ET. 
We would understand if you are very busy and unavailable to do so -- life 
happens. Sean Finan and James Masanz volunteered to prepare rc3 if we do not 
hear from you. 

Dear cTAKES community,
Thank you for testing rc2; your contributions are so valuable! RC3 will 
be made available on Tuesday, April 18 or Wednesday, April 19 for another round 
of testing and voting.
We are all looking forward to the v4 release!

Cheers,
 --Guergana


RE: Release Apache cTAKES 4.0.0 (rc2) [SUSPICIOUS]

2017-04-17 Thread Savova, Guergana
Never rule out unplanned family events and emergencies... Glad to hear that 
does not appear to be the case. 

More clarity on when the rc3 will be ready would be appreciated.

--Guergana Savova, PhD, FACMI


-Original Message-
From: Finan, Sean [mailto:sean.fi...@childrens.harvard.edu] 
Sent: Monday, April 17, 2017 10:40 AM
To: dev@ctakes.apache.org
Subject: RE: Release Apache cTAKES 4.0.0 (rc2) [SUSPICIOUS]

Hi Pei,

I don't think that Guergana was imposing a deadline.  I think that she is 
indicating that James and/or I will make rc3 if you are offline.  I think that 
she was actually trying to relieve any pressure that may be upon you by 
volunteering BCH time.
Guergana is very eager to get a successful 4.0 out as soon as possible.

Sean

-Original Message-
From: Pei Chen [mailto:chen...@apache.org]
Sent: Monday, April 17, 2017 10:30 AM
To: dev@ctakes.apache.org
Subject: Re: Release Apache cTAKES 4.0.0 (rc2)

Guergana,
Sean and James sent us a private message to request an rc3 to include the most 
recent changes in trunk after rc2 was created.
We are more than happy to create another release candidate.  That was the 
reason that rc2 was vetoed and an rc3 was requested.  The only differences 
between rc3 and rc2 are whatever minor changes went into trunk since Friday, over 
the Easter and Patriots' Day holiday weekend.  You're more than welcome to create 
the rc yourself-- but I don't think it will make it any more efficient.  I 
rarely see anyone threaten dates/deadlines upon other ASF volunteers.  What 
gives?


RE: cTAKES 4.0.0 Release [SUSPICIOUS]

2017-04-24 Thread Savova, Guergana
Excellent work, cTAKES team! We are already looking forward to v4.1... Release 
soon, release early.

The formal announcement will go out tomorrow -- we are very appreciative of 
Sally's expertise and efforts (and the ASF PR office) in promoting this major 
release!

--Guergana



-Original Message-
From: Miller, Timothy [mailto:timothy.mil...@childrens.harvard.edu] 
Sent: Monday, April 24, 2017 9:33 AM
To: u...@ctakes.apache.org; dev@ctakes.apache.org; annou...@apache.org
Subject: Re: cTAKES 4.0.0 Release [SUSPICIOUS]

Congrats cTAKES team! This is an important milestone!

Tim





On Mon, 2017-04-24 at 09:02 -0400, Murali Minnah wrote:

> The Apache cTAKES team is pleased to announce the availability of the
> 4.0.0 release.
>
> For the complete release notes, please visit
> https://s.apache.org/ctakes-4.0.0-release-notes
>
> Apache clinical Text Analysis and Knowledge Extraction System (cTAKES) is
> an open-source natural language processing system for information
> extraction from electronic medical record clinical free-text.
>
> The release can be downloaded from
> http://ctakes.apache.org/downloads.cgi
>
> For further information, please visit the project website at
> http://ctakes.apache.org/
>
> -- The Apache cTAKES Team


paper describing the cTAKES coreference module

2017-04-27 Thread Savova, Guergana
http://www.sciencedirect.com/science/article/pii/S1532046417300850
--Guergana



paper describing the cTAKES temporal module

2017-04-27 Thread Savova, Guergana
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5009920/
Enjoy!
--Guergana



RE: cTAKES as a dependency

2017-05-01 Thread Savova, Guergana
We probably need to reach out to Sally Khudairi for guidance (copied).

Sally,
What would be your recommendation?
Thanks,
--Guergana





-Original Message-
From: Kean Kaufmann [mailto:k...@recordsone.com] 
Sent: Monday, May 1, 2017 8:46 AM
To: dev@ctakes.apache.org
Subject: Re: cTAKES as a dependency

>
> On Fri, Apr 28, 2017 at 9:53 PM, Finan, Sean < 
> sean.fi...@childrens.harvard.edu> wrote:
>
> Hey Kean,
>
> It is great to know that your project is out there!
>

Hey Sean!  Very kind of you.  Speaking of which, our BizDevVeep would like to 
see RecordsOne listed under "Companies" on the "Users of cTAKES" page:
http://ctakes.apache.org/usedby.html
Who should I ask about that?

Many thanks...


RE: Visit segregation and extraction [EXTERNAL]

2017-06-26 Thread Savova, Guergana
You probably have to add some logic on top of the cTAKES extracted information 
to distinguish inpatient v outpatient text. 
--Guergana




-Original Message-
From: Hari, Sekhar [mailto:sekhar.h...@cgi.com] 
Sent: Monday, June 26, 2017 12:44 AM
To: u...@ctakes.apache.org; dev@ctakes.apache.org
Subject: RE: Visit segregation and extraction [EXTERNAL]

These are already readable PDFs and not images. The clinical documents came 
through to me as scanned images. We then converted those images into readable 
PDFs using OCR. cTAKES is able to read the texts. But I want to understand if 
it can distinguish BP test results obtained during an outpatient visit from those obtained in a 
non-outpatient visit (such as inpatient stay, ED visit, diagnostic test, or 
surgical procedure). The texts are cluttered with different types of clinical 
documents (progress notes, radiology notes, H&P notes etc.).

Thanks
Sekhar Hari | Program Lead
Health Sciences Business Innovation
ASDC CGI Health Solutions
Electronic City, Bangalore
Karnataka, India 560100

814 7027 779 (C)
080 6642 2536 (D)

-Original Message-
From: Chris Mattmann [mailto:mattm...@apache.org] 
Sent: 26 June 2017 10:03
To: dev@ctakes.apache.org; u...@ctakes.apache.org
Subject: Re: Visit segregation and extraction

Maybe start out with Apache Tika for text extraction from the PDFs, then run 
Apache cTAKES on the resultant text?



On 6/25/17, 5:30 PM, "Hari, Sekhar"  wrote:

Hello there -

I have a task in hand to process 7,000,000 patient records (PDF files) 
containing different clinical documents. Each PDF has 20 pages and one PDF = 
one patient.

The information to retrieve from these documents is like this for a patient 
quality measure namely 'Controlling High Blood Pressure' -

"Extract most recently documented blood pressure occurring after the 
diagnosis of hypertension (Do not use BP readings from inpatient stay, ED 
visit, diagnostic test, or surgical procedure). Blood pressure should be 
routinely assessed as part of a physical exam at each outpatient visit."

Can cTAKES identify non-outpatient visits and outpatient visits separately? 
Are there specific pipelines that we should use to solve this problem?

Many thanks,
Sekhar H.
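
Following Guergana's suggestion to add logic on top of the cTAKES output, here is a minimal Python sketch of one rule-based approach to the blood pressure measure described above. It assumes the OCR text has already been split into individual notes, each with a title and a date, and that readings appear in forms like "BP 142/88" or "blood pressure: 130/80". The keyword lists, the `Note` class, and the helper functions are hypothetical illustrations, not part of cTAKES; in a real pipeline, section or document-type annotations and normalized dates would likely be more robust than title keywords.

```python
import re
from dataclasses import dataclass
from datetime import date
from typing import List, Optional, Tuple

# Hypothetical title keywords; these would need tuning for a real corpus.
NON_OUTPATIENT_KEYWORDS = ("inpatient", "admission", "discharge", "emergency",
                           "operative", "procedure", "radiology")
OUTPATIENT_KEYWORDS = ("progress note", "office visit", "clinic", "outpatient")

# Matches readings such as "BP 142/88" or "blood pressure: 130/80".
BP_PATTERN = re.compile(
    r"\b(?:BP|blood pressure)\s*:?\s*(\d{2,3})\s*/\s*(\d{2,3})", re.IGNORECASE)


@dataclass
class Note:
    title: str        # e.g. "Progress Note" or "Discharge Summary"
    note_date: date
    text: str


def is_outpatient(note: Note) -> bool:
    """Classify a note by its title; excluded keywords take precedence."""
    title = note.title.lower()
    if any(k in title for k in NON_OUTPATIENT_KEYWORDS):
        return False
    return any(k in title for k in OUTPATIENT_KEYWORDS)


def latest_outpatient_bp(notes: List[Note],
                         htn_diagnosis_date: date) -> Optional[Tuple[date, int, int]]:
    """Return (date, systolic, diastolic) for the most recent BP reading found in
    an outpatient note dated strictly after the hypertension diagnosis, or None."""
    best = None
    for note in notes:
        if note.note_date <= htn_diagnosis_date or not is_outpatient(note):
            continue
        for m in BP_PATTERN.finditer(note.text):
            reading = (note.note_date, int(m.group(1)), int(m.group(2)))
            # ">=" means that, among readings with the same date,
            # the last one encountered wins.
            if best is None or reading[0] >= best[0]:
                best = reading
    return best


if __name__ == "__main__":
    notes = [
        Note("Progress Note", date(2016, 3, 1), "Vitals: BP 150/95, HR 80."),
        Note("Discharge Summary", date(2016, 5, 2), "BP 170/100 on admission."),
        Note("Office Visit", date(2016, 4, 10), "Blood pressure: 138/86."),
    ]
    print(latest_outpatient_bp(notes, date(2016, 1, 15)))
```

Title keywords are only a crude proxy for the visit setting. Excluded keywords take precedence, so a title that matches both lists (for example, a hypothetical "Clinic Procedure Note") is treated as non-outpatient.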





RE: Annotating Lab data [EXTERNAL]

2017-07-10 Thread Savova, Guergana
Correct, cTAKES does not annotate lab data. The basic components are there (the 
lab name and the value), but the link between the two is not. One could do the 
linking through rules or a classifier.
I hope this helps.
--Guergana




-Original Message-
From: Das, Tanmay [mailto:tanmay@optum.com] 
Sent: Tuesday, June 27, 2017 3:39 AM
To: dev@ctakes.apache.org
Subject: Annotating Lab data [EXTERNAL]

Hi,

When using the CVD (CAS Visual Debugger) bundled with cTAKES along with 
AggregatePlainTextFastUMLSProcessor, I found that no laboratory data was 
annotated, even though the input contained lab results.
For an input like:
LABORATORY DATA:
Hemoglobin 10.6, hematocrit 31.7, white cell count 5.8, platelet 377.
Magnesium 2.6, glucose 98, BUN 13, creatinine 0.5, sodium 138, potassium 3.9, 
chloride 103. INR is 1.5.
The IdentifiedAnnotations classified them as Medication, Procedures, etc., but not 
as LabMention.
Does this AE contain an annotator to annotate lab data? If not, can someone 
suggest a different annotator that could identify lab values?
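
The reply above notes that cTAKES finds the lab names and values but does not link them, and that the linking could be done with rules or a classifier. Below is a minimal rule-based sketch in Python, run on the lab paragraph from this message: each lab name is paired with the first number that starts within a small character window after it and before the next lab name. The lab lexicon, the `link_labs` function, and the 10-character window are hypothetical; in a cTAKES pipeline the lab and value spans would come from existing annotations rather than a hand-written regex, and a trained classifier could replace the proximity rule.

```python
import re
from typing import List, Tuple

# Hypothetical mini-lexicon of lab names; a cTAKES-based version would take
# these spans from existing annotations instead.
LAB_NAMES = ["white cell count", "hemoglobin", "hematocrit", "platelet",
             "magnesium", "glucose", "bun", "creatinine", "sodium",
             "potassium", "chloride", "inr"]

# Longest names first so multi-word names win over any shorter overlapping name.
LAB_RE = re.compile(
    r"\b(" + "|".join(re.escape(n) for n in sorted(LAB_NAMES, key=len, reverse=True))
    + r")\b",
    re.IGNORECASE)
VALUE_RE = re.compile(r"\d+(?:\.\d+)?")


def link_labs(text: str, max_gap: int = 10) -> List[Tuple[str, float]]:
    """Pair each lab mention with the first number that starts within max_gap
    characters after it and before the next lab mention."""
    labs = list(LAB_RE.finditer(text))
    pairs = []
    for i, lab in enumerate(labs):
        # Only search up to the next lab mention (or the end of the text).
        stop = labs[i + 1].start() if i + 1 < len(labs) else len(text)
        value = VALUE_RE.search(text, lab.end(), stop)
        if value and value.start() - lab.end() <= max_gap:
            pairs.append((lab.group(1).lower(), float(value.group())))
    return pairs


if __name__ == "__main__":
    note = ("Hemoglobin 10.6, hematocrit 31.7, white cell count 5.8, platelet 377. "
            "Magnesium 2.6, glucose 98, BUN 13, creatinine 0.5, sodium 138, "
            "potassium 3.9, chloride 103. INR is 1.5.")
    for name, value in link_labs(note):
        print(f"{name}: {value}")
```

Bounding the search at the next lab mention keeps a lab with no reported value (e.g. "Sodium pending.") from borrowing its neighbor's number.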




RE: Proposed improvements [EXTERNAL] [SUSPICIOUS]

2017-07-10 Thread Savova, Guergana
Good dependency parsers are hard to find; moreover, good dependency parsers 
trained on clinical data are impossible to find. I don't think there is another 
dependency parser trained on clinical data other than cTAKES's. In general, 
state-of-the-art dependency parsing requires resource-intensive computing, and 
the models are also fairly large.
--
Guergana Savova, PhD, FACMI






-Original Message-
From: Finan, Sean [mailto:sean.fi...@childrens.harvard.edu] 
Sent: Tuesday, June 27, 2017 4:07 PM
To: dev@ctakes.apache.org
Subject: RE: Proposed improvements [EXTERNAL] [SUSPICIOUS]

Hi all,

> I would like to have (and work on it) much leaner distribution
One bigfoot is the clearparser_models.jar in ctakes-dependency-parser-res.  As 
far as I know this is not used by default or in any checked-in non-default 
configuration.  As it is 1/4 GB, I would like to move it to its own module to 
keep it out of projects that use ctakes "as a library".  I hunted the net to 
see if a duplicate is available elsewhere for alternative inclusion methods but 
couldn't find one.

Thoughts?

Thanks,
Sean

-Original Message-
From: Andrey Kurdumov [mailto:kant2...@googlemail.com]
Sent: Sunday, June 25, 2017 1:52 AM
To: cTakes developers list
Subject: Re: Proposed improvements [EXTERNAL]

Just want to note that the ASF PMC wants to make GitHub the primary repository 
and the Apache servers secondary soon.

Regarding improvements:
I personally want better support for embedding. Right now the cTakes distribution 
comes with LVG and the UMLS dictionary, and the size of cTakes thus becomes very large.
I would like to have (and work on it) much leaner distribution, let's name it 
cTakes Core, which would just provide the cTakes executable without the need for data.
Right now I constantly have to strip that data out after each cTakes build, which 
slows down my build significantly.

Personally I support Hadrian's initiative to have better logging, since cTakes 
setup has some quirks which could be resolved faster with better logging.


2017-06-23 17:38 GMT+06:00 Miller, Timothy <
timothy.mil...@childrens.harvard.edu>:

> Thanks Hadrian, I hadn't heard of OSEHRA but it looks interesting and 
> like something where we should be making people aware of cTAKES!
>
> svn vs. git -- I'm with you on preferring git, but not by so much that 
> it's worth spending time on an argument if it turns into an argument 
> :). As far as I know we've never really had a discussion about it.
> It's probably getting to the point where new developers have _only_ 
> used git and would find it a complete roadblock to use svn but for me 
> it's just a mild annoyance.
>
> All others you mentioned -- if you are willing to contribute a patch 
> we are happy to accept one-off contributions, and we are also 
> interested in growing the developer community with people who are 
> interested in contributing regularly over time.
>
> Tim
>
> 
> From: Hadrian Zbarcea 
> Sent: Thursday, June 22, 2017 9:14 PM
> To: dev@ctakes.apache.org
> Subject: Proposed improvements [EXTERNAL]
>
> Last week I presented at the OSEHRA Summit about ActiveMQ (and a few 
> other projects) and the ASF in general.
>
> I was surprised that most didn't know much about the ASF and more 
> importantly that nobody knew about cTakes, the only (directly) 
> healthcare related project at the ASF. There was no cTakes talk at 
> ApacheCon in Miami, but at OSEHRA, which is all about healthcare we 
> should have had a presence. I will probably submit a talk for next 
> year, but until then, because I think I created a bit of interest in 
> cTakes I went to build cTakes myself and try a few things.
>
> Some of my findings are:
> * test failures with openjdk; granted the docs mention oracle jdk as a 
> prerequisite, but I think it's easy to support openjdk
> * use of svn vs git; this is a debatable topic, but by now everybody 
> and their uncles are on git, so moving to git (which I'd recommend) 
> would probably foster adoption (yes, I know about the github mirror)
> * no support for OSGi, many large players use it
> * improvements in logging could go a long way, starting with moving to 
> slf4j
>
> Suggesting improvements implies that I volunteer to do a good chunk of 
> the work, but before that I'm more interested in how much the 
> community would welcome such improvements. I am curious what is 
> considered low-hanging fruit; the more controversial topics 
> we could take to [discuss] threads. Because every community has 
> its own cultur

RE: Deep learning [EXTERNAL]

2017-08-18 Thread Savova, Guergana
Not at this point...
Thanks for your question,
--Guergana






-Original Message-
From: abilash.mat...@cognizant.com [mailto:abilash.mat...@cognizant.com] 
Sent: Friday, August 18, 2017 1:46 AM
To: dev@ctakes.apache.org
Subject: Deep learning [EXTERNAL]

Just a basic question: are we using any deep learning techniques in cTAKES? If 
yes, in which module?

Thanks,
Abilash Mathew
This e-mail and any files transmitted with it are for the sole use of the 
intended recipient(s) and may contain confidential and privileged information. 
If you are not the intended recipient(s), please reply to the sender and 
destroy all copies of the original message. Any unauthorized review, use, 
disclosure, dissemination, forwarding, printing or copying of this email, 
and/or any action taken in reliance on the contents of this e-mail is strictly 
prohibited and may be unlawful. Where permitted by applicable law, this e-mail 
and other e-mail communications sent to and from Cognizant e-mail addresses may 
be monitored.


RE: Looking for literature [EXTERNAL]

2020-01-29 Thread Savova, Guergana
Hi Greg,

A link to our JAMIA publication describing the MiPACQ corpus and its usage:
https://academic.oup.com/jamia/article/20/5/922/2909262

I believe the "Development and evaluation of NLP components" section provides 
the details you are looking for.

Best,
--
Guergana Savova, PhD, FACMI
Associate Professor
Boston Children's Hospital and Harvard Medical School
guergana.sav...@childrens.harvard.edu
Harvard Scholar: http://scholar.harvard.edu/guergana_k_savova


-Original Message-
From: Greg Silverman [mailto:g...@umn.edu] 
Sent: Wednesday, January 29, 2020 2:54 PM
To: dev@ctakes.apache.org
Subject: Looking for literature [EXTERNAL]

* External Email - Caution *


I'm digging around for literature on the relationship between cTAKES and 
MiPACQ, and of course found the paper "The MiPACQ Clinical Question Answering 
System," which describes how cTAKES was used with respect to the question-answering 
component of MiPACQ.

However, I'm more interested in how cTAKES was used with the deidentified 
MiPACQ corpus of pathology and colorectal cancer notes. All I'm able to find are 
references to the MiPACQ Treebank, and with them, very little about the use of 
cTAKES. Any information about this, especially in the literature, would be most 
welcome.

Thanks in advance!

Greg--

--
Greg M. Silverman
Senior Systems Developer
NLP/IE 
 Department of Surgery University of Minnesota g...@umn.edu

 ›  evaluate-it.org  ‹


RE: Missing body side and laterality attribute in AnatomicalSiteMention [EXTERNAL]

2020-02-17 Thread Savova, Guergana
Hi Abad,

Methods for populating these two attributes have not been implemented in 
cTAKES. However, cTAKES does include a method for linking anatomical sites to 
diseases/disorders, signs/symptoms or procedures:
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3994852/

best,
--
Guergana Savova, PhD, FACMI
Associate Professor
Boston Children's Hospital and Harvard Medical School



From: abad.ay...@cognizant.com [mailto:abad.ay...@cognizant.com]
Sent: Monday, February 17, 2020 3:47 AM
To: u...@ctakes.apache.org; dev@ctakes.apache.org
Subject: Missing body side and laterality attribute in AnatomicalSiteMention 
[EXTERNAL]


Hello Team,

We recently introduced cTAKES as the NLP engine for parsing clinical data in our 
profile. Although we are able to parse the clinical data at a high level, we are 
not able to get values for attributes like bodySide and bodyLaterality. For example, 
for the text below:

"He had a slight fracture in the proximal right fibula"

Ideally, it should have populated 'bodySide' and 'bodyLaterality' in 
the 'AnatomicalSiteMention' with "right" and "proximal", respectively. These 
attributes are critical information in our profile. We tried different 
approaches, but it is still not working. We are new to cTAKES, so we would like 
to know what the probable fix is. Do we need to make specific changes to our 
piper file to add the AnalysisEngine needed for this? I tried to unit test 
using the 'RelationExtractorAnnotatorsTest' in the 'ctakes-relation-extractor' 
module but couldn't find an annotator.xml for it. Please advise on how to 
proceed.


Thanks & Regards

Abad Ayyub
Vnet: 406170 | Cell : +91-9447379028





RE: Values missing for few attributes and how to populate demographic details [EXTERNAL]

2020-03-09 Thread Savova, Guergana
Abad,
This was pointed out to you - a large portion (3-11) of these attributes have 
not been implemented. If you have a solution for one or all of them, we would 
welcome your contribution to cTAKES.
Regards,
--
Guergana Savova, PhD, FACMI
Associate Professor
PI Natural Language Processing Lab
Computational Health Informatics Program (CHIP)
Boston Children's Hospital and Harvard Medical School
401 Park, 5th floor East, 5523.3
Boston, MA 02215
Tel: (617) 919-2972
Fax: (617) 730-0817
guergana.sav...@childrens.harvard.edu
Harvard Scholar: http://scholar.harvard.edu/guergana_k_savova
http://ctakes.apache.org
http://thyme.healthnlp.org
http://cancer.healthnlp.org
http://share.healthnlp.org
http://center.healthnlp.org

From: abad.ay...@cognizant.com [mailto:abad.ay...@cognizant.com]
Sent: Monday, March 9, 2020 1:01 AM
To: dev@ctakes.apache.org
Subject: RE: Values missing for few attributes and how to populate demographic 
details [EXTERNAL]


Hi Team,

A gentle reminder on this. Since this is critical data for our profile, it 
would be of great help if you could confirm whether these data are available 
in cTAKES. This is now a decision-making factor for us, so kindly advise.


Thanks & Regards

Abad Ayyub
Vnet: 406170 | Cell : +91-9447379028



From: Ayyub, Abad (Cognizant)
Sent: Wednesday, February 26, 2020 12:28 PM
To: dev@ctakes.apache.org
Subject: RE: Values missing for few attributes and how to populate demographic 
details

Hi Team,


A gentle reminder on this. Any advice will be of great help to us.


Thanks & Regards

Abad Ayyub
Vnet: 406170 | Cell : +91-9447379028



From: Ayyub, Abad (Cognizant)
Sent: Friday, February 21, 2020 5:50 PM
To: dev@ctakes.apache.org
Subject: Values missing for few attributes and how to populate demographic 
details

Hello Team,

We are working on extracting clinical data from medical documents, and some 
of the attributes listed below are critical for our profile.
Could you please advise on how we can get valid values for the attributes below, 
which appear under every Event/EntityMention?

Sl. No.  AttributeName            Remarks
1        conditional              Always got this as FALSE
2        originalText             Always got this as NULL
3        event                    Always got this as NULL
4        alleviatingFactor        Always got this as NULL
5        associatedSignSymptom    Always got this as NULL
6        course                   Always got this as NULL
7        duration                 Always got this as NULL
8        endTime                  Always got this as NULL
9        exacerbatingFactor       Always got this as NULL
10       startTime                Always got this as NULL
11       relativeTemporalContext  Always got this as NULL


We also need to know how we can extract the following patient details from 
the document:


  *   Demographic details - patient age, weight, BMI, BP, race and gender, as 
present in the medical records
  *   Patient history - the patient's past medical history and social history 
will be displayed. Social history includes smoking, alcohol use, drug use and 
other details
  *   Family history - the patient's family history will also be displayed in 
this section
Kindly advise whether we need to add any new AEs (analysis engines) to our 
existing piper file so that cTAKES will provide the output requested above. 
Please find below the AEs configured in our piper file.



// Load a simple token processing pipeline from another pipeline file
load DefaultTokenizerPipeline.piper
add ContextDependentTokenizerAnnotator
addDescription POSTagger
// Add Chunkers
load ChunkerSubPipe.piper
// Commands and parameters to create a default chunker processing sub-pipeline.  This is not a full pipeline.
// Default fast dictionary lookup
add DefaultJCasTermAnnotator
load AttributeCleartkSubPipe.piper
# Parameters for AssertionAnalysisEngine and ConceptConverterAnalysisEngine
set assertionModelResource=file: org/apache/ctakes/assertion/models/i2b2.model
set scopeModelResource=file: org/apache/ctakes/assertion/models/scope.model
set cueModelResource=file: org/apache/ctakes/assertion/models/cue.model
set enabledFeaturesResource=file: org/apache/ctakes/assertion/models/featureFile11b
set posModelResource=file: org/apache/ctakes/assertion/models/pos.model
package org.apache.ctakes.assertion.medfacts
package org.apache.ctakes.assertion.attributes
add ConceptConverterAnalysisEngine
add GenericAttributeAnalysisEngine
add SubjectAttributeA

RE: how to activate inactive features in cTAKES? [EXTERNAL] [SUSPICIOUS]

2020-04-30 Thread Savova, Guergana
To add to Tim's clarification: this also enables you (or anyone, for that 
matter) to implement your/their own method for these types.

--Guergana

-Original Message-
From: Miller, Timothy [mailto:timothy.mil...@childrens.harvard.edu] 
Sent: Thursday, April 30, 2020 7:53 AM
To: dev@ctakes.apache.org
Subject: Re: how to activate inactive features in cTAKES? [EXTERNAL] 
[SUSPICIOUS]


Akram, the type system in ctakes was created by a project with the aim of 
specifying things that are useful, without specifying implementations for them 
all. There are many items in the data model that no ctakes module fills. 
The idea was that when people bring things online there are placeholders 
for that information, so that new functionality is not added in a completely ad 
hoc way. So, of the examples you describe:

- discoveryTechnique is always the same because you are running the same 
pipeline
- confidence is not filled in by the dictionary lookup -- the current method 
used does not generate a confidence score
- disambiguated is not filled, which is technically correct because no 
disambiguation algorithm is running
- polarity, uncertainty, conditional, generic and historyOf can be filled in by 
certain pipelines. You will have to add those pipelines after the 
DictionarySubPipe to see them filled in.
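
A minimal piper sketch of that ordering follows. It assumes that the sub-pipe files named below (ChunkerSubPipe, DictionarySubPipe, AttributeCleartkSubPipe) ship with your cTAKES version; the layout is modeled on the default fast pipeline and has not been checked against every release, so verify the file names in your installation:

```
// Sketch only: assumes these piper files exist in your cTAKES version
// Tokens and sentences
load DefaultTokenizerPipeline
add ContextDependentTokenizerAnnotator
// Part-of-speech tags and chunks
addDescription POSTagger
load ChunkerSubPipe
// Dictionary lookup creates the *Mention annotations
load DictionarySubPipe
// Assumed: attribute engines that set polarity, uncertainty, conditional,
// generic and historyOf on the mentions created above
load AttributeCleartkSubPipe
// Write the results
writeHtml
writeXmis
```

Ordering is the point of the sketch: engines in a piper file run in the order they are listed, so an attribute engine placed before DictionarySubPipe would have no mentions to set values on. confidence and disambiguated stay at their defaults either way, since nothing in this pipeline sets them.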

Tim


From: Akram 
Sent: Thursday, April 30, 2020 4:37 AM
To: dev@ctakes.apache.org
Subject: how to activate inactive features in cTAKES? [EXTERNAL]


Hi
I can extract many tags when I use the default .piper in cTakes. Tags such as 
LabMention, AnatomicalSiteMention, ProcedureMention, etc. are all extracted 
by applying this piper:

load DefaultTokenizerPipeline

load DictionarySubPipe

writeHtml
writeXmis

The problem is that some features do not change no matter how the text 
changes. Most importantly, confidence is always 0. How can I get the confidence 
of each term?
Other features that never change:
discoveryTechnique is always 1

polarity always 0

uncertainty always 0

conditional always false

generic always false

historyOf always 0

score always 0

disambiguated always false

How can I get these features working, and where can I find more information 
about these features and what they mean?
Thanks



RE: ApacheCon 2020 and cTAKES

2020-06-29 Thread Savova, Guergana
Hi Sean,

Thank you for bringing ApacheCon to the attention of cTAKES-ers!

In my opinion, your list of ideas for presentations/videos captures topics of 
high interest in our community, topics we have seen many discussions of on the 
cTAKES lists. Thank you for volunteering to be the point of contact!

It is a short two-week timeline, but we as a community can pull it off.

Looking forward to engaging discussions on the list. I am including the user 
list as well, as there are many there who might be interested.

--
Guergana Savova, PhD, FACMI
Associate Professor
PI Natural Language Processing Lab
Computational Health Informatics Program (CHIP)
Boston Children's Hospital and Harvard Medical School
401 Park, 5th floor East, 5523.3
Boston, MA 02215
Tel: (617) 919-2972
Fax: (617) 730-0817
guergana.sav...@childrens.harvard.edu


-Original Message-
From: Finan, Sean [mailto:sean.fi...@childrens.harvard.edu] 
Sent: Monday, June 29, 2020 11:02 AM
To: dev@ctakes.apache.org
Subject: ApacheCon 2020 [Bulk] [EXTERNAL] [SUSPICIOUS] [Bulk]


Hi all,


General admission to ApacheCon 2020 is free:  
https://hopin.to/events/apachecon-home


I think that the price of admission and travel costs have held back ctakes users 
from attending past conferences, and the lack of a sizable audience has diminished 
the comparative value of ctakes presentations in the eyes of ApacheCon 
planners.  Because of the "at home" nature of this year's conference, an app 
with smaller presence and less hip buzz has a better chance of grabbing some 
time on the schedule.


The predetermined tracks are still an ill fit when it comes to the nature of 
ctakes.  
https://apachecon.com/acah2020/cfp.html

However, I think that we can still use this opportunity to deliver some 
powerful introduction and training videos, as well as user stories and clinical 
project applications.  Perhaps we can argue for an NLP track and do some 
coordination with projects like OpenNLP and UIMA.


There are a scant two weeks to come up with presentations, and less time to 
propose a track/topic.  The call for presentations ends July 13th.  That is a 
deadline that requires immediate attention by anybody who wants to show off 
their project or expertise.


Apache wants to have a single point of contact for each project, and I am 
volunteering to be that person for ctakes.   I am volunteering, not laying 
claim, so if you think that you are a better fit for the position please let me 
know.


I have written some ideas for presentations below.  If you want to take one 
(modify as you like) then please write me and post to the devlist.  If you have 
ideas for another presentation topic, please let me and the devlist know; even 
if you aren't volunteering to do the presentation yourself, perhaps somebody 
else will.  Again ... two weeks.


Thank you,

Sean



*  The following talk ideas are by and large directed toward training.  That 
does not mean that topics should stay within that scope.


=


Customizing cTAKES: First Principles

Built using Apache UIMA, cTAKES is modular and extensible.  Why is it 
frequently treated as a black box?  Is it lack of need, sparsity of resources, 
or simply fear of the unknown?

This is a quick start tutorial on adding custom elements to cTAKES.  We 
illustrate creating simple classes to input, process and output data.  This 
involves a concise overview of Apache uimaFIT and the cTAKES type system, as 
well as building a UIMA pipeline using piper files.


=


Loading a shippable with cTAKES DockHand

Customizing a simple pipeline need not be left to cTAKES experts.  Making a 
cTAKES installation need not be confined to source code checkouts or lengthy 
multi-stage binary downloads.

We introduce cTAKES DockHand, a compact single-file installation tool that 
allows one to construct custom pipelines as well as local installations, Rest 
Services and Dockerfiles.


==


Secret Engines of cTAKES

The cTAKES default natural language processing pipeline is a standard in the 
clinical research community.  What is past that standard?  While the default 
clinical pipeline uses almost 20 engines, there are dozens more in various 
cTAKES modules.

We present and discuss the top 10 annotation engines you never kne

RE: ApacheCon 2020 [EXTERNAL] [SUSPICIOUS]

2020-07-07 Thread Savova, Guergana
A fantastic set of presentations that will be of broad interest to the Apache 
community!
Amazing work, cTAKES community!
Stay safe and healthy all,
--
Guergana Savova, PhD, FACMI
Associate Professor
PI Natural Language Processing Lab
Computational Health Informatics Program (CHIP)
Boston Children's Hospital and Harvard Medical School
401 Park, 5th floor East, 5523.3
Boston, MA 02215
Tel: (617) 919-2972
Fax: (617) 730-0817


-Original Message-
From: Finan, Sean [mailto:sean.fi...@childrens.harvard.edu] 
Sent: Monday, July 6, 2020 9:21 AM
To: dev@ctakes.apache.org; u...@ctakes.apache.org
Subject: Fw: ApacheCon 2020 [EXTERNAL] [SUSPICIOUS]


I can't believe that I forgot to mention ...


There will also be a presentation (maybe two?) by a group that has adapted 
ctakes to work with two other languages.  They have also integrated ctakes with 
other tools such as FreeLing and HeidelTime.  So cool ...


Cheers,

Sean



From: Finan, Sean
Sent: Monday, July 6, 2020 9:08 AM
To: dev@ctakes.apache.org; u...@ctakes.apache.org
Subject: ApacheCon 2020


Hi all,


The ctakes representation at ApacheCon 2020 is looking good!


ApacheCon 2020 runs September 29 through October 1.

Submissions are open through Sunday, July 12.  Technically the deadline is 8:00 a.m. 
Eastern time Monday, but please don't procrastinate.

Registration is free.


I am excited to announce that we have three groups interested in giving 
presentations on their configuration and use of ctakes at a large scale!

We also have a presentation on the installation of the ctakes Rest service 
using the ctakes-rest module!


Knowledge on these topics is always extremely valuable to our users, and I for 
one really want to see how sites use ctakes when given different resources, 
requirements and restrictions.  Because of that, I am trying to put together 
(technology allowing) a roundtable discussion with those presenters.  That 
should be of value to every user no matter what your situation.


We still need more presentations!  To encourage you, here is a little 
information:


1.  What you do is interesting!  If you think that nobody out there cares about 
what you've done and how, then you probably aren't fully aware of how large and 
diverse our user base really is.  People want to know about things like 
integration, customization, clinical specialty application, augmentation and 
favorite capability fascination.

2.  Submission is very simple.  This is not like a scientific conference that 
requires a complete paper describing your work.  You only need to submit a 
blurb that loosely covers your topic and major talking point(s).  Half a dozen 
sentences will suffice.  In fact, what I sent last week (far below) could pass 
muster for a submission.  Go for something that will be on a brochure / 
schedule.

3.  The audience is made up of people just like you.  Developers, 
Bioinformaticians, IT Specialists, Students, Medical Researchers, AI Explorers 
and far more Hackers than Rock Stars.

4.  Slick presentation skills are not necessary.  Don't worry if you have never 
spoken to a room full of listeners.  Don't worry if English isn't your first 
language.  Don't worry if your slides are "sloppy".  Your presentation will not 
be graded.

5.  You don't need to prepare your whole talk before submitting.  Idea now, 
details later.

6.  Registration is FREE.


Right now the speaking time is anything up to 50 minutes.  If you don't want to 
present a full 50 minutes then that is ok ... The rest can be filled with extra 
question/answer or somebody else may fill the remaining time with a 
presentation on a similar topic.


I am going to put together a lightning round.  If you think that you can cover 
some material in five to fifteen minutes then this is for you!  Lightning 
rounds can be fun as you can make an impact with two or three slides and barely 
enough speaking to run out of breath.  This is really a free-for-all.  You can 
pack the time with data, give a short demonstration, compare using ctakes to 
breaking a mustang, or even do some on-topic (ctakes, nlp, AI, bioinformatics) 
stand-up.  Anything goes.  This was an interesting (full) talk last year: 
https://aceu19.apachecon.com/session/confessions-middle-aged-coder-turned-gravel-grinder
If you want to be in the lightning round, just write me a couple of 
sentences on your strike and I will put together the full submission for 
ApacheCon.  Does it get any easier?


I will present one or two things, but to maximize impact I would like to know 
what most interests / would help all of you.  So, please write me a topic or 
two that would best apply to your work.


Some links 

RE: Current thinking on new UMLS authentication [EXTERNAL]

2020-09-18 Thread Savova, Guergana
I have not received that email either. Could you share it with us?
--Guergana

Guergana Savova, PhD, FACMI
Associate Professor
PI Natural Language Processing Lab
Computational Health Informatics Program (CHIP)
Boston Children's Hospital and Harvard Medical School
401 Park, 5th floor East, 5523.3
Boston, MA 02215


-Original Message-
From: Greg Silverman [mailto:g...@umn.edu.INVALID] 
Sent: Friday, September 18, 2020 1:46 PM
To: dev@ctakes.apache.org
Subject: Re: Current thinking on new UMLS authentication [EXTERNAL]


I never received the email you mentioned.

I assume this will affect the API call to NLM for UMLS validation? If it does, 
why not take the NLM's model for UMLS and only require UMLS credentials at the 
time of download?

Greg--



On Fri, Sep 18, 2020 at 12:33 PM Peter Abramowitsch 
wrote:

> Hi All
>
> Probably all of you have received an email from Patrick McLaughlin at 
> the NLM regarding upcoming changes to which UMLS authentication methods 
> they will support and which they will retire.  This will have implications for all cTakes users
> in different ways depending on how cTakes is implemented in your
> community.   To me, there were some ambiguities in his email regarding
> usage situations as a registered content provider that needed to be 
> spelled out.
>
> I was wondering if any of you have had further conversations with him 
> which might clarify whether, for instance, users within a registered 
> content provider installation would still need to be individually 
> authenticated, or which might clarify any other authentication scenario.
>
> I'm trying to contact him or his team at the moment to ask about our 
> particular architecture.
>
> Regards,  Peter
>


--
Greg M. Silverman
Senior Systems Developer
NLP/IE 
 Department of Surgery University of Minnesota g...@umn.edu


RE: Apache cTAKES 4.0.0.1 : UMLS Authentication Patch [EXTERNAL] [SUSPICIOUS]

2021-01-21 Thread Savova, Guergana
+1
Amazing effort by the community led by Sean and Peter, thank you!
--Guergana

-Original Message-
From: Miller, Timothy [mailto:timothy.mil...@childrens.harvard.edu] 
Sent: Thursday, January 21, 2021 7:16 AM
To: dev@ctakes.apache.org
Subject: Re: Apache cTAKES 4.0.0.1 : UMLS Authentication Patch [EXTERNAL] 
[SUSPICIOUS]


Seconded, thanks a lot Sean and Peter for getting this working and turned 
around so quickly! 
Tim

On Wed, 2021-01-20 at 23:13 +0100, Peter Abramowitsch wrote:
> 
> Thanks Sean!
> 
> Peter
> 
> On Wed, Jan 20, 2021 at 4:25 PM Finan, Sean < 
> sean.fi...@childrens.harvard.edu> wrote:
> 
> > As some have experienced, the U.S.A. National Library of Medicine 
> > (NLM) has changed the authentication method for using the Unified 
> > Medical Language System (UMLS).
> > 
> > https://www.nlm.nih.gov/research/umls/index.html
> > 
> > 
> > Though a bit late in its arrival, Apache cTAKES now has a patch 
> > release that supports the new UMLS authentication method.
> > 
> > 
> > The release number is 4.0.0.1, an update of the previous release 
> > version 4.0.0 with a single change to enable the new UMLS authentication.
> > 
> > No other code or functionality has been modified, and there are no 
> > enhancements to the previous release 4.0.0.
> > 
> > 
> > There are instructions for use on the Apache cTAKES wiki.
> > 
> > https://cwiki.apache.org/confluence/display/CTAKES/cTAKES+4.0.0.1
> > 
> > 
> > The source code is available in the 4.0.0.1 tag of the Subversion (svn) 
> > repository.
> > 
> > https://svn.apache.org/repos/asf/ctakes/tags/ctakes-4.0.0.1/
> > 
> > 
> > The jar and pom files are available from Maven Central, and any 
> > applications utilizing Apache cTAKES as an Apache Maven dependency 
> > should update their pom files.
> > 
> > https://search.maven.org/search?q=ctakes
> > 
> > 
> > At this time the Apache infra script that points mirror download 
> > servers to the pre-built zip/archive files has not run.  I hope that 
> > the mirror servers are updated in a day or two.
> > 
> > When the mirror servers are updated, the buttons on the "Downloads" page of 
> > ctakes.apache.org should trigger a download of the patch version.  
> > Until then you will get a "page not found" error.
> > 
> > Until the pre-built archive downloads are available through the 
> > website, you can find them in the release repository.
> > 
> > 
> > https://repository.apache.org/content/repositories/releases/org/apache/ctakes/ctakes-core/4.0.0.1/
> > 
> > 
> > For more information please visit the wiki page on the Apache cTAKES
> > 4.0.0.1 patch release.
> > 
> > https://cwiki.apache.org/confluence/display/CTAKES/cTAKES+4.0.0.1
> > 
> > 
> > 
> > A very special thanks goes to Peter Abramowitsch for conception and 
> > original implementation of the authentication code and workflow.
> > 
> > 
> > Many thanks to those who boldly tested, documented and otherwise 
> > made this patch and its trunk equivalent possible, including
> > 
> > Kean Kaufmann
> > 
> > Gandhi Rajan
> > 
> > Eugenia Monogyiou
> > 
> > Timothy Miller
> > 
> > and anybody else that I have forgotten (apologies).
> > 
> > 
> > And for those of you who gave me a bit of prodding to get this 
> 

RE: Apache cTAKES is now on GitHub ! [EXTERNAL] [SUSPICIOUS]

2023-01-03 Thread Savova, Guergana
Fantastic development, thank you very much for making this happen, Sean! 

Happy New Year to all.
--
Guergana Savova, PhD, FACMI
Patricia F. Brennan Professor
Computational Health Informatics Program (CHIP)
Boston Children's Hospital and Harvard Medical School


-Original Message-
From: Finan, Sean  
Sent: Friday, December 30, 2022 1:49 PM
To: dev@ctakes.apache.org; u...@ctakes.apache.org
Subject: Apache cTAKES is now on GitHub ! [EXTERNAL] [SUSPICIOUS]


Hi all,

I am pleased to announce that the cTAKES source code is now on GitHub at 
https://github.com/apache/ctakes

All current and future code development should be performed on the source in 
GitHub.


   Changes ( vs. Subversion Repository )
   =

  *   VERSION:   The project in GitHub has been versioned 5.0.0-SNAPSHOT.
  *   STRUCTURE:   The project has been slightly restructured at a high level.  
The typical user should not notice the difference.
  *   CODE API:   All package, class, method and constant names remain the 
same, so your code should not need to be refactored.
  *   DEPENDENCIES:   If you include cTAKES modules as dependencies in your 
maven project, you can simply change the version to obtain new 5.0.0-SNAPSHOT 
builds. *
  *   BINARY PACKAGE:   The binary package has some minor differences, but the 
typical user should not notice them.

* If you use maven dependency exclusions for resource ('-res') modules because 
of unwanted ML models, you need to change the excluded name extension from 
'-res' to '-model'.
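If you have many poms to update, the two changes above (new version, and excluded '-res' modules renamed to '-model') can be scripted. The following is only a minimal Python sketch, assuming the pom declares the standard Maven POM namespace and that cTAKES modules use the org.apache.ctakes groupId; migrate_pom is a hypothetical helper, not part of cTAKES or Maven. Editing the pom by hand works just as well.

```python
import xml.etree.ElementTree as ET

# Standard Maven POM namespace (assumption: declared as the pom's default xmlns).
POM_NS = "http://maven.apache.org/POM/4.0.0"
NS = {"m": POM_NS}
CTAKES_GROUP = "org.apache.ctakes"  # assumed groupId of cTAKES modules


def migrate_pom(pom_xml: str, new_version: str = "5.0.0-SNAPSHOT") -> str:
    """Hypothetical helper: bump cTAKES dependency versions and rename
    excluded cTAKES '-res' modules to '-model' in a pom.xml string."""
    ET.register_namespace("", POM_NS)  # serialize without an "ns0:" prefix
    root = ET.fromstring(pom_xml)

    for dep in root.iter(f"{{{POM_NS}}}dependency"):
        if (dep.findtext("m:groupId", namespaces=NS) or "").strip() != CTAKES_GROUP:
            continue

        # Only literal versions are replaced; "${...}" property references are left alone.
        version = dep.find("m:version", NS)
        if version is not None and not (version.text or "").strip().startswith("${"):
            version.text = new_version

        # Excluded cTAKES resource modules: ctakes-xyz-res -> ctakes-xyz-model
        for excl in dep.findall("m:exclusions/m:exclusion", NS):
            if (excl.findtext("m:groupId", namespaces=NS) or "").strip() != CTAKES_GROUP:
                continue
            artifact = excl.find("m:artifactId", NS)
            if artifact is not None and artifact.text and artifact.text.endswith("-res"):
                artifact.text = artifact.text[: -len("-res")] + "-model"

    return ET.tostring(root, encoding="unicode")
```

Read the pom into a string, pass it through migrate_pom, and write the result back. Note that ElementTree drops XML comments and the XML declaration on output, so review the diff before committing.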


   Moving forward from the Subversion Repository
   =============================================

  *   VERSION:   The project in the SVN repository was versioned 4.0.1-SNAPSHOT.
  *   DEPRECATION:   The code and resources in the 4.0.1-SNAPSHOT Subversion 
(SVN) repository will remain available for checkout, but should be considered 
read-only.  4.0.1-SNAPSHOT built modules will remain available for maven 
dependencies.  All current and future code development should be performed on 
the source in GitHub.
  *   RELEASE:   There is no cTAKES 4.0.1 release.

   Next Anticipated Release
   ========================

  *   VERSION:   As you might guess from the snapshot version change, we are 
gearing up for a version 5.0.0 release.
  *   WHY 5.0.0:   There are so many new features over cTAKES 4.0.0, including 
completely new modules, that the version number was bumped up.
  *   DOCUMENTATION:   All of the new toys will be documented in the confluence 
wiki at the time of the 5.0.0 release.
  *   DATE:   There is no release date yet, but hopefully it will be very very 
soon ...

Happy New Year,

Sean




RE: It is Official! Steps toward a cTAKES 5.0 release. [EXTERNAL]

2023-03-08 Thread Savova, Guergana
+1 on waiting.
--Guergana

From: Bethard, Steven - (bethard) 
Sent: Wednesday, March 8, 2023 1:19 PM
To: dev@ctakes.apache.org; u...@ctakes.apache.org
Subject: Re: It is Official! Steps toward a cTAKES 5.0 release. [EXTERNAL]

+1 on waiting for the new distribution platform (and continuing to search for a 
primary Release Manager).

On 3/8/23, 06:29, "Finan, Sean" wrote:

Hi all Apache cTAKES developers and users,

I have news on the release front ...

The Apache Infrastructure team is working on a new Artifact Distribution 
Platform.  It will be used to upload and promote release artifacts, sign keys, 
and host distributions in a fashion that is informative and attractive to a 
user.

Some of the old/current items that are part of an Apache project release are 
going to be "legacy" and there are some new metadata items that go with a 
release artifact.

I see two paths moving forward:


  1.   We push on with a release of cTAKES 5.0 and release in the current style.
  2.   We wait a couple of months until the Apache Infrastructure team has the 
new Artifact Distribution Platform ready and use it to release.

For #1 please keep in mind that we still haven't had a volunteer for the 
primary Release Manager.  Gandhi Rajan has volunteered to be co-RM but it will 
be a two-person job.

Either way, we can create Release Candidate source branches on GitHub to be tested 
and have issues posted on the cTAKES GitHub issues list.

This manner of Release Candidate testing would be a deviation from the method 
of creating Release Candidate artifacts including binary installations and 
putting them in a Subversion (svn) repository online.
We can probably place "binary installation" artifacts on GitHub, but somebody 
will need to check on space limits and other rules before we can make any 
promises there.  If there is some barrier there, then testers would need to test 
binary installations by building/packaging locally on their system - which is a 
good thing to have tested anyway.

So, please post any thoughts or questions in reply to this email and we can try 
to figure out where to go from here.

Many thanks,

Sean


From: Finan, Sean
Sent: Monday, February 20, 2023 5:12 PM
To: dev@ctakes.apache.org; u...@ctakes.apache.org
Subject: It is Official! Steps toward a cTAKES 5.0 release. [EXTERNAL] [SUSPICIOUS]

Hi all,

The cTAKES Project Management Committee has voted that it is time to officially 
begin the release process for cTAKES 5.0.

It has been almost 6 years since version 4.0.0 was released, and with a 
worldwide user count estimated in the thousands, a new release will be 
extremely valuable.

Releasing cTAKES 5.0 will involve some work, and the project needs volunteers 
to assist in the process.

The most important thing right now is the appointment of a Release Manager (RM).
While the position is not to be taken lightly and does involve work, it can be 
a great experience (and a resume builder).

We need a cTAKES committer to be the RM, but I am going to split the general 
responsibilities below.
I am doing this because I believe that any user familiar with cTAKES can be a 
co-RM.

Requiring a committer:
1.  Creating Release Candidates of the code.
2.  Deploying and Signing the actual Official Release.

Not requiring a committer:
1.  Coordinating people performing documentation, testing and bug fixing.
2.  Communicating progress with the developer list.

I am sure that I am forgetting something, but those are the 4 tasks that I can 
think of right now.

If you would like to be the Release Manager (or a co-RM), please volunteer on 
the dev@ctakes.apache.org mailing list.

Other tasks that must be performed for a release include:
1.  Testing the release candidates.
2.  Contributing documentation.
3.  Writing fixes for bugs that can be fixed for the release.
4.  Updating the release information on ctakes.apache.org

Anybody can test release candidates.  There are countless pipelines that can be 
built and tested, but I think that we should try to cover the 'most commonly 
used' pipelines.  If you run any pipeline, please report success - even if you 
don't run it specifically for release testing.
Documentation can be contributed by any user.  A cTAKES committer is required 
to actually push the documentation to the wiki, readme, release notes, etc. 
Sending out markdown, images, plain text or just recommendations is open to all 
users.
While only committers can actually push changes to cTAKES code, any user can 
contribute fixes by creating code patches or even just copy-pasting code in an 
email.
Updating the ctakes.apache.org website will require a committer, but 
non-committer assistance is possible just

RE: Please test the Apache cTAKES 5.1.0 release candidate [EXTERNAL] [SUSPICIOUS] [SUSPICIOUS] [SUSPICIOUS] [SUSPICIOUS] [SUSPICIOUS] [SUSPICIOUS]

2024-05-15 Thread Savova, Guergana
+1 from me.
--Guergana

-Original Message-
From: Finan, Sean  
Sent: Wednesday, May 15, 2024 1:32 PM
To: dev@ctakes.apache.org
Subject: Re: Please test the Apache cTAKES 5.1.0 release candidate [EXTERNAL] 
[SUSPICIOUS] [SUSPICIOUS] [SUSPICIOUS] [SUSPICIOUS] [SUSPICIOUS] [SUSPICIOUS]

Thanks Tim!


From: Miller, Timothy 
Sent: Wednesday, May 15, 2024 11:38 AM
To: dev@ctakes.apache.org 
Subject: Re: Please test the Apache cTAKES 5.1.0 release candidate [EXTERNAL] 
[SUSPICIOUS] [SUSPICIOUS] [SUSPICIOUS] [SUSPICIOUS] [SUSPICIOUS]

Thanks Sean,
I was able to get it working – definitely a user/documentation issue and not an 
issue with the code. Looks like a great release. I’m happy to vote for release 
+1.
Tim


From: Finan, Sean 
Date: Tuesday, May 14, 2024 at 10:35 AM
To: dev@ctakes.apache.org 
Subject: Re: Please test the Apache cTAKES 5.1.0 release candidate [EXTERNAL] 
[SUSPICIOUS] [SUSPICIOUS] [SUSPICIOUS] [SUSPICIOUS]

Ah - are you just running the class within IntelliJ?  If so, you need to set 
the classpath in the run configuration to be ctakes-examples.  Otherwise the 
classpath doesn't contain anything from modules outside ctakes-gui and 
ctakes-core.

Alternatively, run the maven compile step with the "runPiperGui" profile 
selected.  That will also run the piper file submitter gui with the correct 
classpath.

Using a binary build, after running bin/getUmlsDictionary, running 
bin/runPiperSubmitter also works.

I don't want to do it for 5.1.0, but I should make the names of the class, profile 
and script match.

I will check the wiki instructions and make sure that -exact- details are in 
there.

Sean


From: Miller, Timothy 
Sent: Tuesday, May 14, 2024 12:55 PM
To: dev@ctakes.apache.org 
Subject: Re: Please test the Apache cTAKES 5.1.0 release candidate [EXTERNAL] 
[SUSPICIOUS] [SUSPICIOUS] [SUSPICIOUS]

I can check out and build successfully with mvn from the command line. I can 
successfully open in intellij and run the piper file submitter. I get an error 
trying to run the default fast pipeline piper file:

Loading Piper File DefaultTokenizerPipeline ...


Error: MESSAGE LOCALIZATION FAILED: Can't find resource for bundle 
java.util.PropertyResourceBundle, key No Analysis Component found for 
ContextDependentTokenizerAnnotator



It doesn’t seem to be able to find the ContextDependentTokenizerAnnotator.

Tim



From: Miller, Timothy 
Date: Tuesday, May 14, 2024 at 9:25 AM
To: dev@ctakes.apache.org 
Subject: Re: Please test the Apache cTAKES 5.1.0 release candidate [EXTERNAL] 
[SUSPICIOUS] [SUSPICIOUS]

What would you recommend for testing? Download the release tag to a clean 
system and try to do mvn compile and run some tests?
Tim


From: Finan, Sean 
Date: Thursday, May 2, 2024 at 6:57 AM
To: dev@ctakes.apache.org 
Subject: Re: Please test the Apache cTAKES 5.1.0 release candidate [EXTERNAL] 
[SUSPICIOUS]

Hi Gandhi,


  *   So post release I would be able to run mvn clean
install on web rest module rather than relying on resources in .m2 folder

The opposite.  Pre-release there are no jars on maven central, post-release 
there are.
Running 'mvn package' directly on the ctakes-web-rest project (in its 
directory) or running 'mvn package' on the ctakes -main- project (in the main 
ctakes root directory) with the web-rest-build profile enabled 
'-Pweb-rest-build'
will build the ctakes-web-rest.war web package.
That profile is defined in the main ctakes pom.
https://github.com/apache/ctakes/blob/main/pom.xml#L1074

RE: roadmap for Apache cTakes "big data" processing

2013-04-28 Thread Savova, Guergana
+1 
--guergana

-Original Message-
From: Kaggal, Vinod C. [mailto:kaggal.vi...@mayo.edu] 
Sent: Saturday, April 27, 2013 11:21 PM
To: 
Cc: 
Subject: Re: roadmap for Apache cTakes "big data" processing

+1


On Apr 27, 2013, at 9:05 PM, "Chen, Pei"  wrote:

> +1 for UIMA-AS
> 
> 
> On Apr 27, 2013, at 9:25 PM, "Andy McMurry"  wrote:
> 
>> I'm writing to gauge community interest and intent for parallel processing 
>> with cTakes. 
>> 
>> Apache UIMA is planning "Async Scaleout" as a replacement for CPM. 
>> http://uima.apache.org/doc-uimaas-what.html
>> 
>> Apache Mahout is likely to become the de facto Apache package for machine 
>> learning. 
>> http://mahout.apache.org/
>> 
>> I believe cTakes will embrace both of these in due time.  
>> Do you agree or do you have a different view? 
>> 
>> 
>> 
>> 
>> 


RE: sentence detector newline behavior

2013-05-21 Thread Savova, Guergana
In the clinical narrative there are many sections that are enumerations and 
where a new line character must be treated as a sentence break. For example, 
Current Medications in which each line contains a medication and its signature.

The format of the MIMIC notes is a bit strange: there are many newline 
characters in the middle of sentences, imposed by the native application the 
notes were created in (cannot remember the name of the app), which has a 
character window and inserts a newline at the end of that window. I believe we 
have a pre-processing script that deals with this issue.
--Guergana
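The pre-processing script mentioned above is not shown here. Purely as an illustration, this is a minimal Python sketch of one plausible heuristic for notes hard-wrapped at a character window, assuming the window width is roughly known; unwrap_fixed_width, width, slack and the list-line regex are all hypothetical. A newline is dropped only when the line before it is close to the window width and the next line does not look like an enumeration entry (e.g. "Heart Rate: normal"), so list-style sections such as Current Medications keep their line breaks.

```python
import re

# Lines that look like enumeration entries keep their line break:
# bullets, "1." / "1)" numbering, or "Label: value" (e.g. "Heart Rate: normal").
LIST_LINE = re.compile(r"^\s*(?:[-*]|\d+[.)]|[A-Za-z][A-Za-z /]{0,40}:)\s")


def unwrap_fixed_width(text: str, width: int = 80, slack: int = 10) -> str:
    """Heuristically undo hard wrapping at roughly `width` characters.

    A newline becomes a space when the original line before it is "full"
    (at least width - slack characters long), neither line is blank, and
    the next line does not look like a list entry. Blank lines are kept.
    """
    lines = text.split("\n")
    out = [lines[0]]
    prev_raw = lines[0]  # previous line as it appeared in the input
    for line in lines[1:]:
        wrapped = (
            bool(prev_raw.strip())
            and bool(line.strip())
            and len(prev_raw) >= width - slack
            and not LIST_LINE.match(line)
        )
        if wrapped:
            out[-1] = out[-1].rstrip() + " " + line.lstrip()
        else:
            out.append(line)
        prev_raw = line
    return "\n".join(out)
```

The length test uses the original previous line rather than the already-joined one, so a short final line of a wrapped sentence correctly stops the joining. A real note would still need the wrap width of the source application.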

-Original Message-
From: Steven Bethard [mailto:steven.beth...@colorado.edu] 
Sent: Tuesday, May 21, 2013 9:59 AM
To: dev@ctakes.apache.org
Subject: Re: sentence detector newline behavior

On May 21, 2013, at 6:07 AM, "Miller, Timothy" 
 wrote:
> The sentence detector always ends a sentence where there are newlines.
> This is a problem for some notes (e.g. MIMIC radiology notes) where a 
> line can wrap in the  middle of a sentence at specified character 
> offsets. In the comments for SentenceDetector, it seems to be split up 
> very logically in that it first runs the opennlp sentence detector, 
> then breaks any detected sentence wherever there is a newline. Questions:
> 1) Would it be good to add a boolean parameter for breaking on newlines?
> 2) If that section was removed/avoided, does the opennlp sentence 
> detector give good results given our model? Or is the model trained on 
> text that always breaks at carriage returns?

For what it's worth, in the ClearTK wrapper for the OpenNLP sentence detector, 
we only add extra sentences when there are *multiple* newlines in a row, i.e. 
"\\s*\\n\\s*\\n\\s*".

And it certainly seems like a good idea to me to have some way of disabling the 
"every newline is the end of a sentence" behavior. That seems like a 
particularly bad default behavior for most real text.

Steve
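To make the two newline policies concrete, here is a minimal Python sketch of the post-processing step Tim describes (sentences found by the statistical detector are cut again at newlines), with the boolean switch he proposes. The function name and the break_on_every_newline flag are hypothetical, not the cTAKES SentenceDetector API; the second regex is the ClearTK pattern quoted above. Sentence spans are (start, end) character offsets into the note.

```python
import re

# Hard rule: any newline (with surrounding whitespace) ends a sentence.
EVERY_NEWLINE = re.compile(r"\s*\n\s*")
# ClearTK-style rule: only two or more newlines in a row end a sentence.
BLANK_LINE = re.compile(r"\s*\n\s*\n\s*")


def split_sentence_spans(text, spans, break_on_every_newline=True):
    """Cut (start, end) sentence spans from a statistical detector at newlines.

    Which newlines count is controlled by the hypothetical
    break_on_every_newline flag. Returned offsets still index into `text`.
    """
    pattern = EVERY_NEWLINE if break_on_every_newline else BLANK_LINE
    result = []
    for start, end in spans:
        piece_start = start
        for m in pattern.finditer(text, start, end):
            if m.start() > piece_start:
                result.append((piece_start, m.start()))
            piece_start = m.end()
        if end > piece_start:
            result.append((piece_start, end))
    return result
```

On a list like "Heart Rate: normal / ENT: negative / EXTRAVASCULAR FINDINGS: ..." (one entry per line), the every-newline policy yields one sentence per line, while the blank-line policy keeps the whole list as a single sentence, which is exactly where a negation cue can leak into the next entry. On a sentence wrapped mid-line the outcomes flip. Neither fixed rule handles both cases, which is the argument for letting a model weigh the evidence.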


RE: sentence detector newline behavior

2013-05-21 Thread Savova, Guergana
The OpenNLP sentence segmenter is trained on clinical data (cannot remember 
exactly how many sentences were in the training corpus). This is the model 
distributed with cTAKES. The only hard rule is the new line.
--Guergana

-Original Message-
From: Steven Bethard [mailto:steven.beth...@colorado.edu] 
Sent: Tuesday, May 21, 2013 11:38 AM
To: dev@ctakes.apache.org
Subject: Re: sentence detector newline behavior

On May 21, 2013, at 9:02 AM, Tim Miller  
wrote:
> I think the whole reason to use a machine learning approach for 
> sentence detection should be to help weigh evidence with these cases 
> where hard rules cause problems, mainly 1) when a period does not end 
> a sentence, but also 2) where a newline does and does not mean end of 
> sentence.

Perhaps we should consider re-training the OpenNLP sentence segmenter on some 
clinical data? Presumably we can get sentences from the TreeBank annotations.

I don't know much about the OpenNLP sentence segmenter though. Does it only 
classify on periods? We'd want to classify all periods and newlines. And we'd 
want to add features that capture patterns like "XXX: YYY".

Steve

> It
> is of course bad that in your example if you don't put a sentence 
> break you will think that "extravascular findings" is negated. But it 
> is also bad if you put a sentence break immediately after the word 
> "and" at the end of a line and then you find that your language model 
> thinks that "and" followed by a sentence end is a good bigram.
> 
> I will create a jira for the parameter thing, and try to implement it 
> and see if it gets ok results with the existing model.
> Tim
> 
> On 05/21/2013 10:11 AM, Masanz, James J. wrote:
>> +1 for adding a boolean parameter, or perhaps instead a list of 
>> section IDs
>> 
>> The sentence detector model was trained on data that always breaks at 
>> carriage returns.
>> 
>> It is important for text that is a list, something like this:
>> 
>> Heart Rate: normal
>> ENT: negative
>> EXTRAVASCULAR FINDINGS: Severe prostatic enlargement.
>> 
>> And without breaking on the line ending, the word negative would 
>> negate extravascular findings
>> 
>> 
>> -Original Message-
>> From: dev-return-1605-Masanz.James=mayo@ctakes.apache.org 
>> [mailto:dev-return-1605-Masanz.James=mayo@ctakes.apache.org] On 
>> Behalf Of Miller, Timothy
>> Sent: Tuesday, May 21, 2013 7:07 AM
>> To: dev@ctakes.apache.org
>> Subject: sentence detector newline behavior
>> 
>> The sentence detector always ends a sentence where there are newlines.
>> This is a problem for some notes (e.g. MIMIC radiology notes) where a 
>> line can wrap in the  middle of a sentence at specified character 
>> offsets. In the comments for SentenceDetector, it seems to be split 
>> up very logically in that it first runs the opennlp sentence 
>> detector, then breaks any detected sentence wherever there is a newline. 
>> Questions:
>> 1) Would it be good to add a boolean parameter for breaking on newlines?
>> 2) If that section was removed/avoided, does the opennlp sentence 
>> detector give good results given our model? Or is the model trained 
>> on text that always breaks at carriage returns?
>> 
>> Tim
> 



RE: sentence detector newline behavior

2013-05-21 Thread Savova, Guergana
The model is trained to disambiguate punctuation characters, which in most cases 
are periods.
--Guergana
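Steve's question comes down to which positions the classifier is asked to score. The following is a minimal Python sketch, not OpenNLP's actual implementation: the function name, the feature names and the "Label: value" regex are assumptions. It lists candidates at sentence-final punctuation and, optionally, at newlines, with a few features of the kind discussed in the thread: the preceding word (to catch "Dr." or a line-final "and") and whether the next line looks like an "XXX: YYY" entry.

```python
import re
from dataclasses import dataclass

PUNCT = ".?!"  # sentence-final punctuation treated as candidates
# A line that starts like "XXX: YYY", e.g. "ENT: negative" (assumed pattern).
LABEL_VALUE = re.compile(r"[A-Za-z][A-Za-z /]{0,40}:\s")


@dataclass
class Candidate:
    offset: int     # character offset of the candidate in the text
    features: dict  # features a classifier would see for this position


def eos_candidates(text, include_newlines=False):
    """List end-of-sentence candidates with a few illustrative features.

    Slices the text per candidate, so it is quadratic; fine for a sketch.
    """
    cands = []
    for i, ch in enumerate(text):
        if ch not in PUNCT and not (include_newlines and ch == "\n"):
            continue
        words = text[:i].split()
        after = text[i + 1:].lstrip(" \t")
        cands.append(Candidate(i, {
            "char": "NEWLINE" if ch == "\n" else ch,
            "prev_word": words[-1].lower() if words else "",
            "next_is_upper": after[:1].isupper(),
            "next_is_label_value": bool(LABEL_VALUE.match(after)),
        }))
    return cands
```

A classifier trained on such candidates could learn, rather than hard-code, that a newline before "ENT: negative" ends a sentence while a newline after a line-final "and" usually does not.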

-Original Message-
From: Steven Bethard [mailto:steven.beth...@colorado.edu] 
Sent: Tuesday, May 21, 2013 12:07 PM
To: dev@ctakes.apache.org
Subject: Re: sentence detector newline behavior

On May 21, 2013, at 9:53 AM, "Savova, Guergana" 
 wrote:
> The OpenNLP sentence segmenter is trained on clinical data (cannot remember 
> exactly how many sentences were in the training corpus). This is the model 
> distributed with cTAKES. The only hard rule is the new line.

If it's trained on clinical data, why does it need a hard rule for that? Why 
isn't the model able to learn when to break on a newline or not?

Steve

> --Guergana
> 
> -Original Message-
> From: Steven Bethard [mailto:steven.beth...@colorado.edu]
> Sent: Tuesday, May 21, 2013 11:38 AM
> To: dev@ctakes.apache.org
> Subject: Re: sentence detector newline behavior
> 
> On May 21, 2013, at 9:02 AM, Tim Miller 
>  wrote:
>> I think the whole reason to use a machine learning approach for 
>> sentence detection should be to help weigh evidence with these cases 
>> where hard rules cause problems, mainly 1) when a period does not end 
>> a sentence, but also 2) where a newline does and does not mean end of 
>> sentence.
> 
> Perhaps we should consider re-training the OpenNLP sentence segmenter on some 
> clinical data? Presumably we can get sentences from the TreeBank annotations.
> 
> I don't know much about the OpenNLP sentence segmenter though. Does it only 
> classify on periods? We'd want to classify all periods and newlines. And we'd 
> want to add features that capture patterns like "XXX: YYY".
> 
> Steve
> 
>> It
>> is of course bad that in your example if you don't put a sentence 
>> break you will think that "extravascular findings" is negated. But it 
>> is also bad if you put a sentence break immediately after the word 
>> "and" at the end of a line and then you find that your language model 
>> thinks that "and" followed by a sentence end is a good bigram.
>> 
>> I will create a jira for the parameter thing, and try to implement it 
>> and see if it gets ok results with the existing model.
>> Tim
>> 
>> On 05/21/2013 10:11 AM, Masanz, James J. wrote:
>>> +1 for adding a boolean parameter, or perhaps instead a list of 
>>> section IDs
>>> 
>>> The sentence detector model was trained on data that always breaks at 
>>> carriage returns.
>>> 
>>> It is important for text that is a list, something like this:
>>> 
>>> Heart Rate: normal
>>> ENT: negative
>>> EXTRAVASCULAR FINDINGS: Severe prostatic enlargement.
>>> 
>>> And without breaking on the line ending, the word negative would 
>>> negate extravascular findings
>>> 
>>> 
>>> -Original Message-
>>> From: dev-return-1605-Masanz.James=mayo@ctakes.apache.org
>>> [mailto:dev-return-1605-Masanz.James=mayo@ctakes.apache.org] On 
>>> Behalf Of Miller, Timothy
>>> Sent: Tuesday, May 21, 2013 7:07 AM
>>> To: dev@ctakes.apache.org
>>> Subject: sentence detector newline behavior
>>> 
>>> The sentence detector always ends a sentence where there are newlines.
>>> This is a problem for some notes (e.g. MIMIC radiology notes) where 
>>> a line can wrap in the  middle of a sentence at specified character 
>>> offsets. In the comments for SentenceDetector, it seems to be split 
>>> up very logically in that it first runs the opennlp sentence 
>>> detector, then breaks any detected sentence wherever there is a newline. 
>>> Questions:
>>> 1) Would it be good to add a boolean parameter for breaking on newlines?
>>> 2) If that section was removed/avoided, does the opennlp sentence 
>>> detector give good results given our model? Or is the model trained 
>>> on text that always breaks at carriage returns?
>>> 
>>> Tim
>> 
> 



RE: how do you feel about putting public presentations on ctakes.apache.org ?

2013-07-08 Thread Savova, Guergana
+1

There are already a number of publications in the Publications and Acknowledgements 
section under History. Of course, the content can be re-organized.
--Guergana

-Original Message-
From: Masanz, James J. [mailto:masanz.ja...@mayo.edu] 
Sent: Wednesday, July 03, 2013 10:04 AM
To: 'dev@ctakes.apache.org'
Subject: RE: how do you feel about putting public presentations on 
ctakes.apache.org ?

+1

-Original Message-
From: dev-return-1727-Masanz.James=mayo@ctakes.apache.org 
[mailto:dev-return-1727-Masanz.James=mayo@ctakes.apache.org] On Behalf Of 
Mattmann, Chris A (398J)
Sent: Tuesday, July 02, 2013 9:54 PM
To: dev@ctakes.apache.org
Subject: Re: how do you feel about putting public presentations on 
ctakes.apache.org ?

+1 makes total sense, should be a great way to show off the project.

++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: chris.a.mattm...@nasa.gov
WWW:  http://sunset.usc.edu/~mattmann/
++
Adjunct Assistant Professor, Computer Science Department University of Southern 
California, Los Angeles, CA 90089 USA
++






-Original Message-
From: , Pei 
Reply-To: "dev@ctakes.apache.org" 
Date: Tuesday, July 2, 2013 7:45 PM
To: "dev@ctakes.apache.org" 
Subject: RE: how do you feel about putting public presentations on 
ctakes.apache.org ?

>+1 for a Related Resources with public links.
>There have been a couple of recently accepted/published papers that can 
>illustrate what could be done with cTAKES which would fit nicely into 
>that page...
>
>From: Girivaraprasad Nambari [girinamb...@gmail.com]
>Sent: Tuesday, July 02, 2013 10:15 PM
>To: dev@ctakes.apache.org
>Subject: Re: how do you feel about putting public presentations on 
>ctakes.apache.org ?
>
>I think it would be good to have a page on the Wiki with a title something
>like "Related resources" (Apache Mahout has a similar page, I guess) and
>add relevant "public" links there. I know we can't pull everything into
>this page, but at least whatever the core team thinks is valuable (or
>referred to while implementing the code) could go there.
>
>I think this page will give a high level overview of what frameworks
>developers need to be aware of while using ctakes.
>
>For example, we can add the following links:
>
>https://code.google.com/p/uimafit/wiki/GettingStarted
>http://knowtator.sourceforge.net/quickstart.shtml
>
>And some links related to "SVM" and "MaxEnt" information,
>etc. (which were referred to by the original implementer).
>
>This way people who want to extend/add new features and refer to the
>existing implementation will be able to go through these to get an
>understanding of what is happening inside ctakes.
>
>Thank you,
>Giri
>
>
>
>
>
>On Tue, Jul 2, 2013 at 7:26 PM, Andy McMurry 
>wrote:
>
>> Argument FOR:
>>
>> Videos can also be very educational for new users!
>> For example, this cTAKES description by Guergana:
>> https://vimeo.com/24829353
>>
>> Publishing our slides or video -- for example the recent NLP
>> presentations from the i2b2 user group -- gives folks a very real
>> sense of the kinds of problems we are currently working on.
>> A lot of people can't make it to these events. The slides are on the
>> web anyway, just harder to find if you don't know already.
>>
>> Argument AGAINST:
>>
>> Sharing your ideas before they are published in a journal can be bad
>> for academic credit!
>> Make every effort to separate the scientific research from the
>> engineering product.
>> There is little value in sharing an idea without an implementation
>> anyway; this is already complicated enough with stable software.
>>
>> ~~~
>> I can see both perspectives.
>> Curious what others think about this.
>>
>> --Andy
>>
>>
>>
>>



RE: Next cTAKES release (3.1)?

2013-07-18 Thread Savova, Guergana
We have 5-6 clinical notes that we got from the web (=publicly available to 
anyone). We can include them as samples in the 3.1 release. We have been using 
these notes for demo purposes.
--Guergana

-Original Message-
From: Andy McMurry [mailto:mcmurry.a...@gmail.com] 
Sent: Friday, June 28, 2013 10:15 AM
To: dev@ctakes.apache.org
Subject: Re: Next cTAKES release (3.1)?

iDash and others have medical NLP datasets that could be used for ctakes 
"Getting Started" examples http://idash.ucsd.edu/nlp-and-data-modeling
http://idash.ucsd.edu/nlp/umls-vm

the GOOD: iDash already includes ctakes 
the BAD: iDash references old versions of ctakes and points to cabig (which is now 
defunct)

Recommendation: we should talk to iDash, create "hello medical world" training 
examples, and request that iDash point to the cTakes Apache home page.

Disclaimer: I'm not involved with iDash 

On Jun 27, 2013, at 10:58 PM, Girivaraprasad Nambari  
wrote:

> Hi Vijay and Andy,
> 
> Thanks for sharing those examples.
> 
> "Trouble is, privacy requires that these examples be made up by hand"
> 
> Agree with this statement and this is very valid concern.
> 
> In "getting started examples", I think we should just have a couple of 
> entries (5-10 small entries), not more than that (with an explicit 
> statement like "ONLY EXAMPLE", NOT GOOD FOR REAL USAGE). I understand 
> handcrafting these may not be easy because we are not medical domain 
> experts, but I feel it is worth the time, because it brings in a larger 
> user community.
> 
> Thank you,
> Giri
> 
> 
> 
> 
> 
> On Thu, Jun 27, 2013 at 10:25 PM, Andy McMurry wrote:
> 
>> GREAT !
>> 
>> The i2b2 data though isn't publicly distributable, you still need to 
>> request access to it since it is "semi private"
>> 
>> 
>> On Jun 27, 2013, at 9:52 PM, vijay garla  wrote:
>> 
>>> We released code on using cTAKES to annotate clinical text and SVMs 
>>> that use the annotations to classify clinical text from the CMC 2007 
>>> and I2B2
>>> 2008 challenges:
>>> 
>>> We did the CMC 2007 challenge with cTAKES 2.5:
>>>
>>> https://code.google.com/p/ytex/wiki/WordSenseDisambiguation_V08#Reproducing_results_on_CMC_2007_challenge
>>>
>>> 
>>> 
>>> And the i2b2 2008 with the version of cTAKES distributed with the 
>>> first version of ARC:
>>> https://code.google.com/p/ytex/wiki/FeatEng_V05#i2b2_2008
>>> 
>>> These are both publicly available datasets, and represent real-world 
>>> problems (in general I believe when publishing a paper the code 
>>> should be reproducible and made publicly available, but that's a different 
>>> issue).
>>> 
>>> When we get around to upgrading YTEX to cTAKES 3.1, we would like to 
>>> upgrade these samples as well.
>>> 
>>> Best,
>>> 
>>> VJ
>>> 
>>> 
>>> 
>>> On Thu, Jun 27, 2013 at 8:32 PM, Andy McMurry 
>>> >> wrote:
>>> 
 +1 suggestion for documenting many examples of "getting started" NLP 
 datasets.
 
 I have at least one we can use that was created by our lead 
 Pathologist
 
 
 https://open.med.harvard.edu/svn/scrubber/releases/3.0/data/input/cases/train/traincase.xml
 
 We should provide at least one sample for each domain.
 Trouble is, privacy requires that these examples be made up by hand 
 and not copy-pasted from EMR systems.
 
 --Andy
 
 On Jun 27, 2013, at 5:32 PM, Girivaraprasad Nambari <
>> girinamb...@gmail.com>
 wrote:
 
> +1 for this observation Andy!
> 
> Lowering time will motivate users to write blogs about features, how-tos,
> etc., which reduces the core team's workload on documentation.
> 
> I have been trying to write a small "how to write a standalone client for
> ctakes" with my experience (I saw at least 4 users post similar questions
> in the last 2 months), but I am not getting enough time because ctakes
> depends on a lot of other frameworks (UimaFit, cleartk, UIMA Framework
> etc.), and most of my spare time is being spent juggling between these
> frameworks, posting to and browsing those forums, and relating observations
> to ctakes code. I think we need to have some high level documentation about
> these (with links to corresponding forums).
> 
> The above case is for developers (I think this will be more of the user base
> as ctakes progresses); for users I think the documentation is a lot better,
> though some improvements need to be done.
> 
> As a developer I found the lack of sample training data tough (I am still
> struggling in this area even though I browsed all relevant code), though
> the training classes are there. I understand that there are licensing issues
> with REAL data, but at least some handmade example sentences, which may not
> be real, would help developers understand the type/structure of input the
> TRAINING classes expect. This way people who b

RE: Next cTAKES release (3.1)?

2013-07-18 Thread Savova, Guergana
Actually, MTsamples is what iDASH downloaded for their notes repository.
--Guergana

-Original Message-
From: andy mcmurry [mailto:mcmurry.a...@gmail.com] 
Sent: Wednesday, July 03, 2013 7:26 PM
To: dev@ctakes.apache.org
Subject: Re: Next cTAKES release (3.1)?

Mtsamples has lots of free public examples already but we aren't using them 
yet.  This is probably because mtsamples don't have the annotations we need to 
use them as training examples.
On Jul 3, 2013 2:46 PM, "Hephaestus Studio" 
wrote:

> @Andy - Not a doctor yet, but soon! Thanks for the promotion though, 
> one more year!
>
> - Apropos meds or clinical type questions: any developer on here can 
> feel free to shoot me a quick question via the list anytime, I'd be 
> happy to confirm that a drug or anything else makes sense given a 
> particular clinical/note context.
>
> - "I wonder if there is some way in which you could guide us in making 
> better use of the medical knowledge sources (ontologies) that are 
> available." - I'd be happy to brainstorm about using existing 
> resources to help in decision making. We use these all the time in the clinic.
>
> @ Tim+Andy+Chen - I haven't had a chance to really start chewing into 
> the code, though I hope to over the next year; so, what kind of 
> examples would be most helpful?
> - Any particular disease processes?
> - Are you all familiar with the ubiquitous SOAP style presentation 
> that doctors use to write free notes? The few examples I clicked 
> through in the repository that Chen pointed me to are very sparse. 
> Would we want gradations? E.g., a scale for "well done" notes to "very 
> quick I-dont-care-because-I'm-in-a-rush" notes?
>
> @ Chen - Thank you for the kind words. It's nice to be welcomed by a 
> community in which you hope to integrate. And thank you for pointing 
> me to the directory with the current sample notes. This was very 
> helpful in determining where those are at in there development. I know 
> that each of your hospitals have a wealth of HIPAA-closed notes, but 
> I'll see what I can do to make some "stereotypical" open-notes for 
> common disease presentations. Again: maybe a scale, not necessarily 
> just on brevity but some other metric, whose continuum represented 
> various permutations of degrees of something, maybe of difficulty in 
> processing? Apropos code,
> Chen: I will help where I can but where I want to be is elbow deep in 
> the code :)
>
> Finally: I haven't had a chance to look into some of the links from 
> earlier in this thread regarding open access repositories of free text 
> clinical notes: what do you all feel the quality of these resources are?
> Abundant but low quality? Paucity but those that are there are high quality?
>
> Bottom line: no problem answering contextual questions (can
> afib be associated with a lower GI bleed??) and no problem writing
> some notes. My only question, before I put in any time, would be: what
> disease/specialty domain?
> And would we want some system that puts them on a continuum of some
> variable, say, brevity or "readability"?
>
> Just thinking before leaping,
>
> Thanks,
> JG
>
> Sent from my iPhone
>
> On Jul 2, 2013, at 21:23, "Chen, Pei" 
> wrote:
>
> > Hi John,
> > Welcome!  There are actually many ways to contribute and it's not
> limited to just code.  It's always great to hear new ideas and 
> suggestions on how to improve the software.  Therefore even, things 
> like user feedback, documentation, new use cases, essentially anything 
> that will make things better would be awesome!
> >
> > To get started, I would suggest subscribing to the email lists.  If you
> > would like to contribute anything, just create a Jira account (anyone
> > should be able to do this), and add/review Jira items (add attachments
> > if you like) and we can even help integrate it.
> >
> > We normally use Jira to keep track of issues:
> > [1] https://issues.apache.org/jira/browse/ctakes
> >
> > Current collection of sample test notes that have been collected over
> > the years:
> >
> > https://svn.apache.org/repos/asf/ctakes/trunk/ctakes-regression-test/testdata/input/plaintext/
> >
> > 
> > From: Tim Miller [timothy.mil...@childrens.harvard.edu]
> > Sent: Tuesday, July 02, 2013 6:31 PM
> > To: dev@ctakes.apache.org
> > Subject: Re: Next cTAKES release (3.1)?
> >
> > Agreed that you could definitely help out, and that would be a great 
> > way to do so. We don't really have "examples" right now, more like 
> > just short test sentences for showing simple results and verifying 
> > that nothing has been broken by changes. I think regular length fake 
> > but realistic notes would be very useful.
> > Tim
> >
> > On 07/02/2013 05:19 PM, John Green wrote:
> >> Hi all,
> >>
> >> I've been following this mail list for a couple of months. I'm a
> >> third year medical student rounding the bend toward my MD. I used to be a
> >> computer programmer, however, and co

RE: Next cTAKES release (3.1)?

2013-07-18 Thread Savova, Guergana
+1 for Dr. Green generating fake but realistically looking notes.

Dr. Green,
If you can generate a few notes that could go in the 3.1 release, that would be 
wonderful! Thanking you!
--Guergana

-Original Message-
From: Tim Miller [mailto:timothy.mil...@childrens.harvard.edu] 
Sent: Tuesday, July 02, 2013 6:31 PM
To: dev@ctakes.apache.org
Subject: Re: Next cTAKES release (3.1)?

Agreed that you could definitely help out, and that would be a great way to do 
so. We don't really have "examples" right now, more like just short test 
sentences for showing simple results and verifying that nothing has been broken 
by changes. I think regular length fake but realistic notes would be very 
useful.
Tim

On 07/02/2013 05:19 PM, John Green wrote:
> Hi all,
>
> I've been following this mail list for a couple of months. I'm a third year
> medical student rounding the bend toward my MD. I used to be a computer
> programmer, however, and continue my own projects. I'm very interested in
> contributing eventually to cTakes development. In the meantime, given the
> current talk of examples, if any domain-specific examples need to be generated, I
> am knowledgeable enough in the domain that I could pound out a few free-text notes
> made to order.
>
> Let me know; you all may already have docs on hand willing to do this, but if
> not...
>
> John Green
>
> Sent from my iPhone
>
> On Jun 28, 2013, at 8:59, "Chen, Pei"  wrote:
>
>> I completely agree with making cTAKES easier to use.  I think it is exciting to
>> hear the different use cases here and to understand some of the areas
>> that need improvement (which we haven't thought about earlier).
>> I think Tim's suggestions and the 3 concrete actionable items make a lot of
>> sense.  Hopefully they will attract new users, adopters, and perhaps more
>> committers.
>>
>>> i) Make the typesystem forefront in documentation -- generate 
>>> javadocs and have as a link on the ctakes frontpage/sidebar
>>> ii) Similar to the way that we are aiming to have tests in every 
>>> module, also have clearly labeled examples in every module that set 
>>> up a pipeline, run on sample notes (could be the same sample notes 
>>> from the tests), and do something with the results.
>>> iii) Follow Giri's recommendation to have example training data for 
>>> people who want to take the next step and train their own models
>> I think Java developers are accustomed to including a library as a
>> dependency/jar, calling an API to pass input, and getting the results back as POJOs.
>> So the examples could initially shield users from the complexity of wiring a
>> pipeline together, etc.
>> If we can improve the APIs and how cTAKES gets integrated with other apps, we
>> can add any GUI/CLI tools on top of this afterwards.
>>
>> --Pei
>>
>>> -Original Message-
>>> From: Miller, Timothy [mailto:timothy.mil...@childrens.harvard.edu]
>>> Sent: Friday, June 28, 2013 8:00 AM
>>> To: dev@ctakes.apache.org
>>> Subject: Re: Next cTAKES release (3.1)?
>>>
>>> Very interesting discussion. I think Giri is right about giving 
>>> example training data in the format that our training code can read. 
>>> While our ultimate goal would be to build and release models that
>>> are completely domain-independent, in the real world it is almost
>>> always better to use some domain-specific data and we should think more 
>>> about how to facilitate that.
>>>
>>> As for making it easier to get started, it is not totally clear to 
>>> me what this means/how to do it so it might be useful to get 
>>> specific about what this means. I think our biggest hurdle is
>>>
>>> 1) Prerequisite of understanding UIMA/UIMAFit
>>>
>>> Since UIMAFit is officially becoming part of UIMA that will be 
>>> easier, and hopefully people will just learn the easier (in my 
>>> opinion) UIMAFit way than the standard UIMA way of doing things. Is 
>>> there something we can be doing to make understanding UIMA easier? 
>>> Or do we just need to say upfront that this is a prerequisite and 
>>> hope that people don't give up due to this thing that is out of our control?
>>>
>>> Another hurdle is:
>>>
>>> 2) cTAKES is a multi-purpose developer-aimed tool
>>>
>>> So it's not just a matter of hiding complexity -- at some point 
>>> people have to understand their problem, understand cTAKES' capabilities, 
>>> and start coding.
>>> Pei's GUI will help for some common use cases but will not remove 
>>> the requirement that someone at the organization knows cTAKES.
>>> I think one part of this problem is the fact that the typesystem is 
>>> not well documented. A developer needs to know what the output is 
>>> (objects from the typesystem), how to get them (which 
>>> modules/pipelines), and what information is in them. So maybe on this end 
>>> my recommendation would be:
>>> i) Make the typesystem forefront in documentation -- generate 
>>> javadocs and have as a link on the ctakes frontpage/sidebar
>>> ii) Similar to the way that we are aiming to have tests 

RE: Examples

2013-08-17 Thread Savova, Guergana
I second Pei. Excellent development, John! Thank you!

I can ask my annotators to create gold annotations for several layers according 
to the annotation guidelines. How many notes do you have, John?
--Guergana

-Original Message-
From: Pei Chen [mailto:chen...@apache.org] 
Sent: Saturday, August 17, 2013 2:49 PM
To: dev@ctakes.apache.org
Subject: Re: Examples

Hi John,
This is great news.
I would suggest creating a Jira item and attaching the docs to it (Additional
Actions>Attach). [One can create a Jira account on
https://issues.apache.org/jira/browse/ctakes.]
Then we can make the commits for you.
This is awesome... does anyone have the gold standard annotation guideline(s)
handy?  It would be great to have those test notes annotated as well, as examples.

--Pei



On Sat, Aug 17, 2013 at 11:41 AM, John Green wrote:

> Just got some free time. I have a number of example free-text notes, per
> previous discussions, to upload. They're quality notes but not annotated. Do
> I need someone to commit for me?
>
> Thanks,
> J Green
>


RE: ctakes-examples project?

2013-08-21 Thread Savova, Guergana
We will look at the metadata info. Some of it is critical for the annotations 
(e.g. docTime).
Thank you, John.
--Guergana

-Original Message-
From: John Green [mailto:john.travis.gr...@gmail.com] 
Sent: Wednesday, August 21, 2013 12:56 PM
To: dev@ctakes.apache.org
Subject: Re: ctakes-examples project?

So far I have about 15 notes done. I'm submitting them slowly because, after I
said they were done, I decided one last review of each for gross errors and
completeness would be in order. I'm slowly working through the proofreading of
each now. Just FYI if anyone was wondering.

I don't know what the coders can do with the metadata I included at the top of 
each note. I thought it would be useful, however, to attempt to describe the 
data with these additional metrics. Maybe they are already included vectors in 
the gold-standard annotation.

JG


On Wed, Aug 21, 2013 at 12:00 PM, Pei Chen  wrote:

> John is creating example clinical notes [1]...  and I believe Guergana's 
> group and co. will create gold standard annotations for them (as 
> training examples)?
>
> Where would be a good home for something like this?
>
> What do folks think about creating a separate project called 
> ctakes-examples?  An alternative might be to put it in one of the test 
> projects?
> [1] https://issues.apache.org/jira/browse/CTAKES-223
>
> --Pei
>


RE: Happy thanksgiving !

2013-11-29 Thread Savova, Guergana
Thank you, Andy! Happy Thanksgiving, cTAKES team!
--Guergana

-Original Message-
From: andy mcmurry [mailto:mcmurry.a...@gmail.com] 
Sent: Thursday, November 28, 2013 7:19 PM
To: dev@ctakes.apache.org
Subject: Happy thanksgiving !

Sharing is caring! Happy Thanksgiving to the ctakes crew. Thanks for
everything; you are appreciated.


RE: ctakes-pad-term-spotter component?

2014-02-18 Thread Savova, Guergana
I vote to deprecate.
--Guergana

From: Chen, Pei [mailto:pei.c...@childrens.harvard.edu]
Sent: Tuesday, February 18, 2014 10:45 AM
To: u...@ctakes.apache.org; dev@ctakes.apache.org
Subject: ctakes-pad-term-spotter component?

Hi,
Is anyone still using the pad-term-spotter component?
Deprecating this module if it's no longer used will simplify the codebase and 
reduce the effort in support...

--Pei



RE: temporal assertion module

2014-03-26 Thread Savova, Guergana
The temporal module is still in development. We are working on a release, but it
will take a couple of months. We will email the cTAKES community once the
temporal system is ready to go.
--Guergana

-Original Message-
From: digital paula [mailto:cybersat...@hotmail.com] 
Sent: Tuesday, March 25, 2014 6:42 PM
To: dev@ctakes.apache.org
Subject: temporal assertion module

Hello cTAKES Developer Community,

There are about 10 packages under temporal with a total of maybe 80 or 90 files,
but no XML descriptor.  I see that the temporal module is not part of the
released version of cTAKES, since there is no documentation for it on the
component use guide page:
https://cwiki.apache.org/confluence/display/CTAKES/cTAKES+3.1+Component+Use+Guide#cTAKES3.1ComponentUseGuide-Components

I'd like to use the temporal module, but I'm lost without the XML descriptors,
which are how I integrate modules into the clinical pipeline.  If anyone is using
the temporal module, I'd appreciate any help getting started, such as links
to more info on usage, or maybe even an XML descriptor or two for the temporal
module.

Thanks.

Regards,
Paula


RE: Allergy Indication

2014-04-09 Thread Savova, Guergana
Hi Manu,

cTAKES does not have a module for allergy discovery. The annotation of this
sentence will span "Penicillin" with a semantic type of a drug and attributes
associated with a drug (aka drug signature).

Hope this helps.
--Guergana

-Original Message-
From: Manu Sikka [mailto:manusi...@hotmail.com] 
Sent: Wednesday, April 09, 2014 4:31 AM
To: dev@ctakes.apache.org
Subject: RE: Allergy Indication

It was actually like this:
Patient has asthma
Allergies : Penicillin
I have also tried other sentences like
- "Patient is allergic to Penicillin"
but it would not associate the Medication with an allergy.
Thanks

> From: manusi...@hotmail.com
> To: dev@ctakes.apache.org
> Subject: Allergy Indication
> Date: Tue, 8 Apr 2014 17:34:41 -0400
> 
> Hello,
> I am running the ctakes GUI with the following text:
> Patient has asthma
> Allergies : Penicillin
> It does catch Asthma and Allergies as a disease and Penicillin as a
> medication. It does not, however, classify Penicillin as an allergic
> component. Under
> 
>   ontologyConceptArr = uima.cas.FSArray[1]
> 
>   medicationAllergy = 
> and
> 
>   org.apache.ctakes.typesystem.type.textsem.MedicationAllergyModifier [0]
> is Zero.
> Any help is appreciated.
> Thanks
> Manu
  


RE: Preparing for an Apache cTAKES 3.2 Release?

2014-06-16 Thread Savova, Guergana
+1

Guergana

-Original Message-
From: Dligach, Dmitriy [mailto:dmitriy.dlig...@childrens.harvard.edu] 
Sent: Monday, June 16, 2014 10:56 AM
To: cTAKES Developer list
Subject: Re: Preparing for an Apache cTAKES 3.2 Release?

+1

Dima




On Jun 16, 2014, at 9:42, Miller, Timothy 
 wrote:

> Sorry to weigh in so late on this -- just returned from vacation. If 
> we want to have a one release delay before making dictionary2 default 
> for testing/documentation/configuration purposes, and there isn't an 
> obvious function-related name, and the main difference is speed, maybe 
> we could call it dictionary-lookup-fast? Besides being accurate and 
> more descriptive than "2", it might lure people into trying it and 
> give us some feedback.
> 
> Tim
> 
> 
> On 06/16/2014 10:34 AM, Chen, Pei wrote:
>> I'm making some significant updates to trunk that may cause some instability 
>> for this release.
>> It should be mostly transparent, but let me know if you encounter any issues 
>> with trunk.
>> 
>> Also, regarding the dictionary-lookup2: if there are no strong objections,
>> we can leave the default as-is (old behavior).  Folks who wish to give the
>> new one a try are welcome to do so and we can change the default behavior in
>> a future release.
>> 
>> [ducks for cover now]
>> --Pei
>> 
>>> -Original Message-
>>> From: ksa...@gmail.com [mailto:ksa...@gmail.com] On Behalf Of 
>>> Karthik Sarma
>>> Sent: Wednesday, June 11, 2014 9:58 AM
>>> To: dev@ctakes.apache.org
>>> Subject: Re: Preparing for an Apache cTAKES 3.2 Release?
>>> 
>>> Agreed
>>> 
>>> On Wednesday, June 11, 2014, vijay garla  wrote:
>>> 
 regardless of the name, I think it would be incredibly helpful to 
 have thorough documentation on the dictionary lookup, how to 
 configure it, and how to create new dictionaries.  I would venture 
 to say that this is the most important component in cTAKES, and 
 probably the one that has generated the most questions on the newsgroup.
 
 
 
 On Wed, Jun 11, 2014 at 9:21 AM, Finan, Sean < 
 sean.fi...@childrens.harvard.edu> wrote:
 
>> . The newer NER should have in its name the Behavior...
> I agree, but the *2 module is a complete replacement for the
> current lookup.  It does not (really) have any different behavior,
> just a different implementation and performance.  We plan to swap out the old with
> the new in the next release and get rid of the *2 suffix.  So, any 
> name provided now is just temporary - unless people don't like the 
> name "dictionary-lookup" at all.
> 
> In my original sandbox it was named "RareWordLookup", a nod to its 
> implementation.  However, this doesn't help any users.
> 
> Sean
> 
> -Original Message-
> From: andy mcmurry [mailto:mcmurry.a...@gmail.com]
> Sent: Wednesday, June 11, 2014 3:09 AM
> To: dev@ctakes.apache.org
> Subject: Re: Preparing for an Apache cTAKES 3.2 Release?
> 
> "2" doesn't mean much. The newer NER should have in its name the 
> Behavior...
> 
> Perhaps something like MetaMap Usage "--allow_overmatches"
> or "--allow_concept_gaps" or .other?
> 
> Since yTex already provides a pluggable *DictionaryLookup*, that
> seems like the best place to define the differing Behavior / Usage.
> 
> https://cwiki.apache.org/confluence/display/CTAKES/User's+Guide
> https://code.google.com/p/ytex/wiki/DictionaryLookup_V05
> 
> 
> AndyMC
> 
> On Tue, Jun 10, 2014 at 9:55 AM, britt fitch 
> 
> wrote:
> 
>> I don't have an issue with the *-2 name. I also don't have any 
>> objections to renaming it.
>> 
>> It might be nice to keep the old dictionary code around for a
>> release-worth of time, but after that I would vote for purging it.
>> If someone needs it after that it'll be accessible in the 
>> archived releases.
>> 
>> 
>> 
>> On Jun 10, 2014, at 12:48 PM, Chen, Pei 
>> 
>> wrote:
>> 
>>> I think James has a fair point here.
>>> It may be worthwhile biting the bullet here and pushing forward.
>>> 
>>> Since this essentially will be a full replacement of the
>>> ctakes-dictionary-lookup module, a good option may be to just
>>> replace the entire module now and rename the existing module to
>>> *_deprecated.
>>> How do folks feel about that?  In a nutshell,
>>> ctakes-dictionary-lookup-2 is a faster algorithm with a simpler code
>>> base and comparable results (Sean has a full comparison in the
>>> documentation for those who are curious).
>>> --Pei
>>> 
 -Original Message-
 From: britt fitch [mailto:britt.fi...@gmail.com]
 Sent: Monday, June 09, 2014 5:42 PM
 To: dev@ctakes.apache.org
 Subject: Re: Preparing for 

RE: [DISCUSS] cTAKES BigTop/Hadoop integration

2014-09-22 Thread Savova, Guergana
+1. Fantastic project!!
--Guergana

From: britt fitch [mailto:britt.fi...@wiredinformatics.com]
Sent: Monday, September 22, 2014 11:32 AM
To: u...@ctakes.apache.org
Cc: dev@ctakes.apache.org
Subject: Re: [DISCUSS] cTAKES BigTop/Hadoop integration

This is really exciting. I can't wait!

Britt Fitch
Wired Informatics
265 Franklin St Ste 1702
Boston, MA 02110
http://wiredinformatics.com
britt.fi...@wiredinformatics.com

On Sep 22, 2014, at 11:23 AM, Murali Nagendranath
<mmin...@gmail.com> wrote:


+1

On Mon, Sep 22, 2014 at 8:50 PM, Kaggal, Vinod C.
<kaggal.vi...@mayo.edu> wrote:
+1

Pei, what do you have in mind for hackathon?

On 9/22/14, 10:02 AM, "Pei Chen"
<chen...@apache.org> wrote:

Jay proposed an interesting idea of creating an app that takes in
different streams of data sources and processes text with cTAKES under the
BigTop/Hadoop ecosystem...

Initial thoughts were to have a hackathon, have something for Dec
2014, and a joint demo/effort at the next ApacheCon (04/2015).

https://issues.apache.org/jira/browse/CTAKES-314

--Pei




RE: sentence detector model

2014-09-29 Thread Savova, Guergana
How about pairing it with THYME and MiPACQ? Perhaps you are using them 
already...
--Guergana

-Original Message-
From: Miller, Timothy [mailto:timothy.mil...@childrens.harvard.edu] 
Sent: Monday, September 29, 2014 1:38 PM
To: dev@ctakes.apache.org
Subject: Re: sentence detector model

Some of them are a bit artificial for this task, with notes being annotated as 
one sentence per line and offset punctuation. I think maybe the 2008 and 2009 
data might have original formatting though, with newlines not always breaking 
sentences. That has certain advantages over raw MIMIC for training since the 
PHI isn't so weirdly formatted, but then again is not a mix of styles (that is, 
the styles of newline always terminates sentence vs. sometimes terminates 
sentence). I think it would still have to be paired with another dataset to be 
a representative sample.
Tim

On 09/29/2014 01:24 PM, vijay garla wrote:
> Why not use the i2b2 corpora?
>
> On Monday, September 29, 2014, Dligach, Dmitriy < 
> dmitriy.dlig...@childrens.harvard.edu> wrote:
>
>> Maybe creating a made-up set of sentences would be an option? That 
>> way we could agree on the annotation of concrete cases. Although this 
>> would be more of a unit test than a corpus.
>>
>> Dima
>>
>>
>>
>>
>> On Sep 27, 2014, at 12:15, Miller, Timothy < 
>> timothy.mil...@childrens.harvard.edu > wrote:
>>
>>> I've just been using the OpenNLP command line cross validator on the
>>> small dataset I annotated (along with some eyeballing). It would be
>>> cool if there was a standard clinical resource available for this
>>> task, but I hadn't considered it much because the data I annotated
>>> pulls from multiple datasets, and the process of arranging with
>>> different institutions to make something like that available would
>>> probably be a nightmare.
>>> Tim
>>>
>>> Sent from my iPad. Sorry about the typos.
>>>
 On Sep 27, 2014, at 12:16 PM, "Dligach, Dmitriy" <
>> dmitriy.dlig...@childrens.harvard.edu > wrote:
 Tim, thanks for working on this!

 Question: do we have some formal way of evaluating the sentence detector?
 Maybe we should come up with some dev set that would include examples from MIMIC...
 Dima




> On Sep 27, 2014, at 8:57, Miller, Timothy <
>> timothy.mil...@childrens.harvard.edu > wrote:
> I have been working on the sentence detector newline issue, training a
> model to probabilistically split sentences on newlines rather than
> forcing sentence breaks. I have checked in a model to the repo under
> ctakes-core-res. I also attached a patch to ctakes-core to the Jira issue:
> https://issues.apache.org/jira/browse/CTAKES-41
>
> for people to test. The status of my testing is that it doesn't seem
> to break on notes where ctakes worked well before (those where
> newlines are always sentence breaks), and it is a slight improvement on
> notes where newlines may or may not be sentence breaks. Once the
> change is checked in, we can continue improving the model by adding
> more data and features, but the first hurdle I'd like to get past is
> making sure it runs well enough on the type of data that the old
> model worked well on. Let me know if you have any questions.
> Thanks
> Tim
>>
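Dima's question about a formal way to evaluate the sentence detector could be answered with boundary-level precision and recall against a small gold dev set. The sketch below is a minimal illustration under stated assumptions: each sentence is represented only by its exclusive end offset, a predicted boundary counts only on an exact offset match, and the class name `SentenceBoundaryScorer` is hypothetical (it is not part of cTAKES or of the OpenNLP cross-validator Tim mentions).

```java
import java.util.Locale;
import java.util.Set;
import java.util.TreeSet;

/**
 * Hypothetical helper (not part of cTAKES or OpenNLP): scores predicted
 * sentence-end offsets against gold sentence-end offsets.
 */
public class SentenceBoundaryScorer {

    /** Precision, recall and F1 for one comparison. */
    public static final class Score {
        public final double precision;
        public final double recall;
        public final double f1;

        Score(double precision, double recall, double f1) {
            this.precision = precision;
            this.recall = recall;
            this.f1 = f1;
        }
    }

    /** Offsets are exclusive end positions of each sentence in the note text. */
    public static Score score(int[] goldEnds, int[] predictedEnds) {
        Set<Integer> gold = new TreeSet<>();
        for (int end : goldEnds) {
            gold.add(end);
        }
        Set<Integer> predicted = new TreeSet<>();
        for (int end : predictedEnds) {
            predicted.add(end);
        }
        // A predicted boundary counts as a true positive only if it matches a gold offset exactly.
        int truePositives = 0;
        for (int end : predicted) {
            if (gold.contains(end)) {
                truePositives++;
            }
        }
        double precision = predicted.isEmpty() ? 0.0 : (double) truePositives / predicted.size();
        double recall = gold.isEmpty() ? 0.0 : (double) truePositives / gold.size();
        double f1 = (precision + recall == 0.0) ? 0.0 : 2 * precision * recall / (precision + recall);
        return new Score(precision, recall, f1);
    }

    public static void main(String[] args) {
        // Toy note "Pt seen today.\nNo fever\nor chills. BP stable." (45 chars).
        // Gold sentences end at 14, 34 and 45; a detector that always breaks
        // on newlines also (wrongly) ends a sentence at 23, after "No fever".
        int[] gold = {14, 34, 45};
        int[] newlineAlwaysBreaks = {14, 23, 34, 45};
        Score s = score(gold, newlineAlwaysBreaks);
        System.out.println(String.format(Locale.ROOT, "P=%.2f R=%.2f F1=%.2f",
                s.precision, s.recall, s.f1));
        // prints P=0.75 R=1.00 F1=0.86
    }
}
```

In this toy case the forced newline break costs precision but not recall, which is the failure mode the probabilistic newline model is meant to reduce.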



RE: De-identified lab tests dataset

2014-09-29 Thread Savova, Guergana
Ajay,
cTAKES currently does not implement a method to discover labs from the text. 
The motivation is that you can get that easily from the structured part of the 
EMR (what Pete explained below). Hope this makes sense!
--Guergana

-Original Message-
From: Peter Szolovits [mailto:p...@mit.edu] 
Sent: Monday, September 29, 2014 2:32 PM
To: dev@ctakes.apache.org
Subject: Re: De-identified lab tests dataset

Ajay, I'm confused by your query.  cTakes is good at interpreting text, but 
most lab test results are reported in tabular form that is most appropriately 
searched by SQL queries.  Sometimes lab results are also reported in narrative 
notes, but parsing those is often more a matter of deciphering the text 
structure of tables than of parsing real English text.  What am I 
misunderstanding?

--Pete Sz.

On Sep 29, 2014, at 2:25 PM, Ajay Jain  wrote:

> Hello All,
> 
> I am working on a use case for lab test data using cTAKES, and my
> online search to find a test dataset has been futile.  I'd greatly
> appreciate it if someone could share such a dataset or point me in the
> right direction to go looking for one.
> 
> Best,
> Ajay
> 
> --
> Founder & CEO
> Mobile Insights, Inc.
> (630) 408-8623



RE: YTEX depends on trove4j? LGPL issue

2014-10-15 Thread Savova, Guergana
At one point (some long time ago), I remember LGPL was compatible with Apache. 
What version of LGPL is this dependency using?
--Guergana

-Original Message-
From: Chen, Pei [mailto:pei.c...@childrens.harvard.edu] 
Sent: Wednesday, October 15, 2014 11:43 AM
To: dev@ctakes.apache.org
Subject: RE: YTEX depends on trove4j? LGPL issue

Steve,
This is a good catch!  I was pretty sure 3rd party libs were checked but 
somehow this may have been missed.
I noticed it's in the convenience binary distro as well.  We need to remove 
this; I'll create a Jira.
VJ, could you confirm? I actually don't think we use trove4j in ytex.
ctakes-ytex/pom.xml

--Pei

> -Original Message-
> From: Steven Bethard [mailto:steven.beth...@gmail.com]
> Sent: Wednesday, October 15, 2014 10:40 AM
> To: dev@ctakes.apache.org
> Subject: YTEX depends on trove4j? LGPL issue
> 
> It seems that YTEX depends on trove4j which is LGPL [1], but 
> "LGPL-licensed works must not be included in Apache products" [2].
> Have the YTEX dependencies been reviewed for licensing issues? (I only 
> stumbled upon the trove issue via a version conflict in other code.)
> 
> Steve
> 
> [1] http://trove4j.sourceforge.net/html/license.html
> [2] http://www.apache.org/legal/resolved.html


RE: Announcement: UMLS MedGen-MySQL dataset now available as open access download

2014-11-11 Thread Savova, Guergana
This is great! Thank you so much, Andy!!!
I agree that it will make life for many users MUCH easier.
--guergana

-Original Message-
From: Jay Vyas [mailto:jayunit100.apa...@gmail.com] 
Sent: Tuesday, November 11, 2014 5:31 PM
To: dev@ctakes.apache.org
Subject: Re: Announcement: UMLS MedGen-MySQL dataset now available as open 
access download

+1000 on this!  Great lets make a jira!!!

> On Nov 11, 2014, at 5:02 PM, andy mcmurry  wrote:
> 
> Hello!
> 
> https://bitbucket.org/invitae/medgen-mysql (Apache Licensed ASL2)
> 
> We just released a new library containing a huge chunk of UMLS 
> concepts which are available without registering accounts/username/passwords.
> LEGALLY. Yes, really!
> 
> The subset is from NCBI and it contains *thousands of concepts from 
> SNOMED and other vocabularies*.
> 
> The code is essentially:
> 1. a list of WGET targets to various NCBI FTP site mirrors
> 2. a Makefile for building the databases of interest
> 
> Our legal team has approved distribution for Open Access work, ASL2 
> LICENSE.
> 
> I recommend we use this opportunity to make this the default 
> distribution for CTAKES UMLS connections, because it obviates the need 
> for so much painful credentialing and back and forth agreements with 
> the US National Library of Medicine.
> 
> Cheers!
> --Andy
> 
> 
> On Wed, Sep 10, 2014 at 12:13 PM, Masanz, James J. 
> 
> wrote:
> 
>> 
>> I would love to see the install be as simple as apt-get install, to
>> end up with a working dictionary that has more than a handful of
>> entries to get them started.
>> 
>> Regards,
>> James Masanz
>> 
>> -Original Message-
>> From: andy mcmurry [mailto:mcmurry.a...@gmail.com]
>> Sent: Tuesday, September 09, 2014 4:32 PM
>> To: ctakes-...@incubator.apache.org
>> Subject: Recommendation for ctakes default (UMLS) dictionaries
>> 
>> Greetings ctakes-dev:
>> 
>> *UMLS license restrictions have been getting more lax over the years* --
>> much of the UMLS can be downloaded directly from the NCBI
>> official FTP site.
>> 
>> In fact, the NIH (and implicitly the NLM) *have already made the 
>> standard terms public for some medical specialities*.
>> 
>> For example: Here is the UMLS subset specific to Medical Genetics 
>> (MedGen) and Genetic Testing (GTR) complete with SNOMED-CT concept 
>> CUI(s) and names, etc :
>> 
>> [  ftp://ftp.ncbi.nlm.nih.gov/pub/medgen/README.html  ]
>> 
>> My team has developed a JVM based wrapper for MetaMap 2013AB which I 
>> intend to open source soon (Clojure).  It includes REST support for 
>> invoking MetaMap with any or all of the command line arguments.
>> We do not integrate with UIMA, we are basically a wrapper around the 
>> binary installation of MetaMap. The emphasis is on publication text 
>> not clinical text, still, some services are common (such as LVG).
>> 
>> Strangely, the NLM still requires UMLS licenses to download MetaMap
>> execution binaries. The MetaMap binary install is better, but
>> customizing dictionaries (DataFileBuilder) is not as easy as it is in
>> CTAKES with YTEX.
>> 
>> [ https://cwiki.apache.org/confluence/display/CTAKES/YTEX+Installation ]
>> 
>> *Hence, there is a real opportunity here to enable Apache cTAKES
>> to have a stronger default dictionary.*
>> 
>> Imagine if we could
>> *$ apt-get install apache-ctakes *
>> 
>> and instantly have a working package for SOME problem domain.
>> In my case (Medical Genetics), the UMLS definitions are already
>> available and the UMLS license problem becomes a non-issue, at least
>> for many first-time users.
>> 
>> Your thoughts?
>> AndyMC
>> 


revamping the Apache cTAKES website

2014-12-05 Thread Savova, Guergana
cTAKES-ers,

we would like to start working on updating the Apache cTAKES website - some of 
the information there is already stale and needs refreshing. Do you have ideas 
on website design, content, etc.? Would you like to contribute to the effort? 
We are planning to start working on the website the week of Dec 15.

Cheers,
--Guergana



gold standard annotations for Apache cTAKES sample notes

2014-12-05 Thread Savova, Guergana
Thanks to John Green, we now have sample clinical notes in cTAKES. Many thanks, 
John, for your effort!

We will take these notes and start generating gold annotations that could
then be used as a reference to compare cTAKES output against. We are planning
to include annotations for:

1. Entities with the attributes
2. LocationOf and DegreeOf relations
3. Within-document coreference
4. Events
5. Temporal expressions
6. Temporal relations

Effort permitting, we also have on the list annotations for:

1. Syntactic trees
2. Dependency links
3. Semantic roles

It will take us some time to generate the gold annotations; we will keep you
posted on the progress.
Cheers,
--Guergana
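Comparing cTAKES output to gold annotations like these needs an explicit matching rule. The following is a hypothetical sketch, not a cTAKES API: the `Mention` class, the placeholder CUIs, and the two rules (strict: same CUI and identical span; lenient: same CUI and overlapping span) are assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: count gold mentions matched by system output under strict or lenient span rules. */
public class GoldComparison {

    /** Minimal stand-in for an annotated concept mention; begin is inclusive, end is exclusive. */
    public static final class Mention {
        final String cui;
        final int begin;
        final int end;

        public Mention(String cui, int begin, int end) {
            this.cui = cui;
            this.begin = begin;
            this.end = end;
        }
    }

    static boolean exact(Mention a, Mention b) {
        return a.cui.equals(b.cui) && a.begin == b.begin && a.end == b.end;
    }

    static boolean overlapping(Mention a, Mention b) {
        // Half-open spans [begin, end) overlap when each one starts before the other ends.
        return a.cui.equals(b.cui) && a.begin < b.end && b.begin < a.end;
    }

    /** Number of gold mentions that have at least one matching system mention. */
    public static int matchedGold(List<Mention> gold, List<Mention> system, boolean lenient) {
        int matched = 0;
        for (Mention g : gold) {
            for (Mention s : system) {
                if (lenient ? overlapping(g, s) : exact(g, s)) {
                    matched++;
                    break; // count each gold mention at most once
                }
            }
        }
        return matched;
    }

    public static void main(String[] args) {
        // Placeholder CUIs and offsets, invented for illustration.
        List<Mention> gold = new ArrayList<>();
        gold.add(new Mention("CUI_A", 12, 18));
        gold.add(new Mention("CUI_B", 30, 38));
        List<Mention> system = new ArrayList<>();
        system.add(new Mention("CUI_A", 12, 18)); // exact hit
        system.add(new Mention("CUI_B", 25, 38)); // same CUI, wider span
        System.out.println("strict=" + matchedGold(gold, system, false)
                + " lenient=" + matchedGold(gold, system, true));
        // prints strict=1 lenient=2
    }
}
```

Dividing the matched count by the number of gold mentions gives recall under the chosen rule; reporting both rules side by side shows how much of the gap is due only to span boundaries.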


RE: revamping the Apache cTAKES website

2014-12-05 Thread Savova, Guergana
Wonderful, thank you, Michelle! There will be a flurry of emails the week of 
Dec 15 followed by actual work, so book your calendar if possible...
--Guergana

-Original Message-
From: Michelle Chen [mailto:michelle1919c...@gmail.com] 
Sent: Friday, December 05, 2014 11:48 AM
To: dev@ctakes.apache.org
Subject: Re: revamping the Apache cTAKES website

Hello Guergana,

I don't know that much about cTakes, but would be interested in contributing to 
the effort.

I'm not sure if there is an interest in matching the website design of other
Apache projects, but from my arbitrary search on
http://projects.apache.org/indexes/alpha.html it seems that the two main designs
in use are 1. the current design that cTakes is using and 2. a Bootstrap approach.

I've done a little bit of work on Bootstrap and would be interested in helping 
with that. Let me know how I can be helpful.

Sincerely,
Michelle Chen :)

"Be strong and of good courage; do not be afraid, nor be dismayed, for the Lord 
your God is with you wherever you go." ~Joshua 1:9


On Fri, Dec 5, 2014 at 11:21 AM, Savova, Guergana < 
guergana.sav...@childrens.harvard.edu> wrote:

> cTAKES-ers,
>
> we would like to start working on updating the Apache cTAKES website - 
> some of the information there is already stale and needs refreshing. 
> Do you have ideas on website design, content, etc.? Would you like to 
> contribute to the effort? We are planning to start working on the 
> website the week of Dec 15.
>
> Cheers,
> --Guergana
>
>


RE: revamping the Apache cTAKES website

2014-12-05 Thread Savova, Guergana
There are now 4 volunteers:
Michelle Chen
Pei Chen
Sean Finan
Guergana Savova

--Guergana

-Original Message-
From: Savova, Guergana [mailto:guergana.sav...@childrens.harvard.edu] 
Sent: Friday, December 05, 2014 11:56 AM
To: dev@ctakes.apache.org
Subject: RE: revamping the Apache cTAKES website

Wonderful, thank you, Michelle! There will be a flurry of emails the week of 
Dec 15 followed by actual work, so book your calendar if possible...
--Guergana



RE: Scaling cTakes

2014-12-05 Thread Savova, Guergana
Hi Brandon,
I believe our estimate for processing a document with the fast dictionary 
lookup is under a second. Sean can provide more details. 
--Guergana
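For scale, the ~3.5-4 notes a minute that Brandon reports below can be converted to seconds per note and set against this under-a-second estimate. A minimal Python sketch of the arithmetic; the helper name `seconds_per_note` is hypothetical and not part of cTAKES or UIMA.

```python
# Convert a throughput in notes per minute into average seconds per note.
# The rates are the ones Brandon reports in this thread; the helper name
# is hypothetical and not part of cTAKES or UIMA.


def seconds_per_note(notes_per_minute: float) -> float:
    """Average wall-clock seconds spent on each note."""
    return 60.0 / notes_per_minute


for rate in (3.5, 4.0):
    # prints "3.5 notes/min -> 17.1 s/note", then "4.0 notes/min -> 15.0 s/note"
    print(f"{rate} notes/min -> {seconds_per_note(rate):.1f} s/note")
```

At roughly 15-17 seconds per note, that setup is more than ten times slower than the under-a-second estimate, which is the gap Sean's suggestion of swapping in the fast dictionary lookup is aimed at.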

-Original Message-
From: Finan, Sean [mailto:sean.fi...@childrens.harvard.edu] 
Sent: Friday, December 05, 2014 1:21 PM
To: dev@ctakes.apache.org
Subject: RE: Scaling cTakes

Hi Brandon,

It sounds like you've got a decent pipeline set up. To increase the speed, you 
could try swapping ctakes-dictionary-lookup for ctakes-dictionary-lookup-fast 
in the AE. Check 
ctakes-clinical-pipeline/desc/[ae]/AggregatePlaintextFastUMLSProcessor.xml for 
an example. As for the CASPool, I don't think it will make any difference 
for cTakes.

Sean

From: Geise, Brandon D. [bdge...@geisinger.edu]
Sent: Friday, December 05, 2014 12:40 PM
To: dev@ctakes.apache.org
Subject: Scaling cTakes

Hi,

I'm new to cTakes and the UIMA framework. I've read most of the UIMA 
documentation and was able to take the BagofCUIGenerator example and modify it 
to read notes from a DB, process them with the UMLS AE in the clinical-pipeline 
using a local DB version of UMLS, and output the CUIs to a DB. However, the 
problem I'm having is that it's extremely slow: ~3.5-4 notes a minute. I was 
hoping I could get some hints or advice on speeding up the process. I read 
there's a patch for LVG, but wasn't quite sure how to apply it. Also, from 
testing with the CPE GUI, I don't notice any difference in processing time when 
adjusting the CASPool setting. Some advice on the CASPool would be appreciated 
as well.

Thanks,
Brandon


RE: Links Not Working

2014-12-12 Thread Savova, Guergana
Thank you for pointing it out to us! Could you please send the inactive links?
We will be working on the website next week.
--Guergana


-Original Message-
From: kasie.allen [mailto:kasie.al...@world.edu] 
Sent: Friday, December 12, 2014 11:39 AM
To: dev@ctakes.apache.org
Subject: Links Not Working

Hi!

I came across a few links that aren't working on your website. Do you mind 
telling me who I should contact about them?

Thanks! :)
Kasie

-- 

Kasie Allen


RE: revamping the Apache cTAKES website

2014-12-15 Thread Savova, Guergana
Very nice, Pei, thank you!!!

For now, don't pay too much attention to the content and images; they are just 
placeholders that we are working on.
--guergana

-Original Message-
From: Finan, Sean [mailto:sean.fi...@childrens.harvard.edu] 
Sent: Monday, December 15, 2014 5:03 PM
To: dev@ctakes.apache.org
Subject: RE: revamping the Apache cTAKES website

Anyway, a pretty amazing fresh start, thanks Pei

-Original Message-
From: Chen, Pei [mailto:pei.c...@childrens.harvard.edu]
Sent: Monday, December 15, 2014 4:33 PM
To: dev@ctakes.apache.org
Subject: RE: revamping the Apache cTAKES website

Check out a mockup of a new website proposal:
http://svn.apache.org/repos/asf/ctakes/site/new/index.html
It is based on Bootstrap (an idea borrowed from the Spark folks).

A couple of key pieces of info:
- 10% of visitors are on mobile devices or tablets
- The most visited pages are currently downloads.cgi and gettingstarted.html. I 
suggest we focus our attention on those two pages (putting a Downloads link 
right on the front page, etc.).

To check out the code of the site:
svn co http://svn.apache.org/repos/asf/ctakes/site/new

--Pei

-Original Message-
From: John Green [mailto:john.travis.gr...@gmail.com]
Sent: Friday, December 05, 2014 6:34 PM
To: dev@ctakes.apache.org
Cc: dev@ctakes.apache.org
Subject: RE: revamping the Apache cTAKES website

I would like to second the Bootstrap recommendation, with the additional 
recommendation of Django for the backend. It is an amazing platform for rapid 
development and easy updating.


JG

On Fri, Dec 5, 2014 at 12:15 PM, Savova, Guergana wrote:

> There are now 4 volunteers:
> Michelle Chen
> Pei Chen
> Sean Finan
> Guergana Savova
> --Guergana
> -Original Message-
> From: Savova, Guergana [mailto:guergana.sav...@childrens.harvard.edu]
> Sent: Friday, December 05, 2014 11:56 AM
> To: dev@ctakes.apache.org
> Subject: RE: revamping the Apache cTAKES website
> Wonderful, thank you, Michelle! There will be a flurry of emails the week of
> Dec 15 followed by actual work, so book your calendar if possible...
> --Guergana
> -Original Message-
> From: Michelle Chen [mailto:michelle1919c...@gmail.com]
> Sent: Friday, December 05, 2014 11:48 AM
> To: dev@ctakes.apache.org
> Subject: Re: revamping the Apache cTAKES website
> Hello Guergana,
> I don't know that much about cTakes, but would be interested in contributing
> to the effort.
> I'm not sure if there is an interest in matching the website design of other
> Apache projects, but from my arbitrary search on
> http://projects.apache.org/indexes/alpha.html, the two main designs in use
> seem to be 1. the current design that cTakes is using and 2. a Bootstrap
> approach.
> I've done a little bit of work on Bootstrap and would be interested in
> helping with that. Let me know how I can be helpful.
> Sincerely,
> Michelle Chen :)
> "Be strong and of good courage; do not be afraid, nor be dismayed, for
> the Lord your God is with you wherever you go." ~Joshua 1:9
> On Fri, Dec 5, 2014 at 11:21 AM, Savova, Guergana <
> guergana.sav...@childrens.harvard.edu> wrote:
>> cTAKES-ers,
>>
>> we would like to start working on updating the Apache cTAKES website
>> - some of the information there is already stale and needs refreshing.
>> Do you have ideas on website design, content, etc.? Would you like to 
>> contribute to the effort? We are planning to start working on the 
>> website the week of Dec 15.
>>
>> Cheers,
>> --Guergana
>>
>>


RE: cTakes Annotation Comparison

2014-12-19 Thread Savova, Guergana
We are doing a similar kind of evaluation and will report the results.

Before we released the Fast lookup, we did a systematic evaluation across three 
gold standard sets. We did not see the trend that Bruce reported below. The 
precision (P), recall (R) and F1 results from the old dictionary lookup and the 
fast one were similar.
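For reference, P, R and F1 are computed from true positives (system annotations that match the gold standard), false positives (system annotations absent from the gold standard) and false negatives (gold annotations the system missed). A minimal Python sketch of these standard definitions; the function name `prf1` and the counts in the example call are hypothetical, not results from any evaluation in this thread.

```python
def prf1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Return (precision, recall, F1); each is 0.0 if its denominator is 0."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# Hypothetical counts, for illustration only.
print(prf1(tp=80, fp=20, fn=40))
```

Note that recall only needs the gold standard's coverage, while precision also depends on which mention types the gold standard annotates at all.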

Thank you everyone!
--Guergana

-Original Message-
From: David Kincaid [mailto:kincaid.d...@gmail.com] 
Sent: Friday, December 19, 2014 9:02 AM
To: dev@ctakes.apache.org
Subject: Re: cTakes Annotation Comparison

Thanks for this, Bruce! Very interesting work. It confirms what I've seen in 
small, non-systematic tests of my own. Did you happen to capture the number of 
false positives yet (annotations made by cTAKES that are not in the human 
adjudicated standard)? I've seen a lot of dictionary hits that are not actually 
entity mentions, but I haven't had a chance to do a systematic analysis (we're 
working on our annotated gold standard now). One great example is the 
antibiotic "Today". Every time the word "today" appears in any text, it is 
annotated as a medication mention, when it is almost never used in that 
sense.
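Counting false positives in this sense is set arithmetic over annotations keyed by text span and CUI, which is how Bruce's comparison below defines a 'match'. A minimal Python sketch, assuming each annotation has been reduced to a (begin, end, cui) tuple; the `Annotation` type and `compare` helper are hypothetical (not the tool described in this thread), and the CUI strings are placeholders, not real UMLS identifiers.

```python
from typing import NamedTuple


class Annotation(NamedTuple):
    begin: int  # character offset where the mention starts
    end: int    # character offset just past the mention
    cui: str    # UMLS concept identifier (placeholders below)


def compare(system: set[Annotation], gold: set[Annotation]) -> dict[str, int]:
    """Exact-match comparison: span and CUI must both agree."""
    return {
        "matches": len(system & gold),          # in both sets
        "false_positives": len(system - gold),  # system only
        "misses": len(gold - system),           # gold only
    }


gold = {Annotation(0, 5, "CUI_A"), Annotation(10, 18, "CUI_B")}
system = {Annotation(0, 5, "CUI_A"), Annotation(10, 18, "CUI_C"),
          Annotation(25, 30, "CUI_D")}
print(compare(system, gold))  # prints {'matches': 1, 'false_positives': 2, 'misses': 1}
```

Under this exact-match rule, a correct span tagged with the wrong CUI counts as both a false positive and a miss. The same arithmetic underlies Bruce's pipeline-to-pipeline counts below: 3,940 + 16,425 = 20,365 and 3,940 + 4,344 = 8,284.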

These results by themselves are quite disappointing to me. Both the 
UMLSProcessor and especially the FastUMLSProcessor seem to have pretty poor 
recall. It seems like the trade-off for more speed is a ten-fold (or more) 
decrease in entity recognition.
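The "ten-fold" figure can be checked against Bruce's counts quoted below: 4,591 adjudicated annotations (excluding CUI-less ones), with 2,245 exact matches for UMLSProcessor and 215 for FastUMLSProcessor. A minimal Python sketch; it treats recall as exact matches divided by gold annotations, so it says nothing about precision, and the variable names are illustrative.

```python
GOLD = 4591  # human-adjudicated annotations, excluding CUI-less
matches = {"UMLSProcessor": 2245, "FastUMLSProcessor": 215}

for pipeline, hits in matches.items():
    # prints "UMLSProcessor: recall = 48.9%", then "FastUMLSProcessor: recall = 4.7%"
    print(f"{pipeline}: recall = {hits / GOLD:.1%}")

# prints "ratio = 10.4x"
print(f"ratio = {matches['UMLSProcessor'] / matches['FastUMLSProcessor']:.1f}x")
```

These are exact-match figures only; a looser criterion such as overlapping spans would change them.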

Thanks again for sharing your results with us. I think they are very useful to 
the project.

- Dave

On Thu, Dec 18, 2014 at 5:06 PM, Bruce Tietjen < 
bruce.tiet...@perfectsearchcorp.com> wrote:
>
> Actually, we are working on a similar tool to compare the output to the human
> adjudicated standard for the set we tested against. I didn't mention
> it before because the tool isn't complete yet, but initial results for
> the set (excluding annotations marked as "CUI-less") were as follows:
>
> Human adjudicated annotations: 4591 (excluding CUI-less)
>
> Annotations found matching the human adjudicated standard:
> UMLSProcessor        2245
> FastUMLSProcessor     215
>
> Bruce Tietjen
> Senior Software Engineer
> IMAT Solutions
> Mobile: 801.634.1547
> bruce.tiet...@imatsolutions.com
>
> On Thu, Dec 18, 2014 at 3:37 PM, Chen, Pei wrote:
> >
> > Bruce,
> > Thanks for this-- very useful.
> > Perhaps Sean Finan can comment more,
> > but it's also probably worth comparing to an adjudicated
> > human-annotated gold standard.
> >
> > --Pei
> >
> > -Original Message-
> > From: Bruce Tietjen [mailto:bruce.tiet...@perfectsearchcorp.com]
> > Sent: Thursday, December 18, 2014 1:45 PM
> > To: dev@ctakes.apache.org
> > Subject: cTakes Annotation Comparison
> >
> > With the recent release of cTakes 3.2.1, we were very interested in
> > checking for any differences in annotations between the
> > AggregatePlaintextUMLSProcessor pipeline and the
> > AggregatePlaintextFastUMLSProcessor pipeline within this release of
> > cTakes with its associated set of UMLS resources.
> >
> > We chose to use the SHARE 14-a-b Training data, which consists of 199
> > documents (Discharge 61, ECG 54, Echo 42 and Radiology 42), as the
> > basis for the comparison.
> >
> > We decided to share a summary of the results with the development 
> > community.
> >
> > Documents Processed: 199
> >
> > Processing Time:
> > UMLSProcessor        2,439 seconds
> > FastUMLSProcessor    1,837 seconds
> >
> > Total Annotations Reported:
> > UMLSProcessor        20,365 annotations
> > FastUMLSProcessor     8,284 annotations
> >
> >
> > Annotation Comparisons:
> > Annotations common to both sets:                      3,940
> > Annotations reported only by the UMLSProcessor:      16,425
> > Annotations reported only by the FastUMLSProcessor:   4,344
> >
> >
> > If anyone is interested, our test procedure was as follows:
> >
> > We used the UIMA CPE to process the document set twice, once using 
> > the AggregatePlaintextUMLSProcessor pipeline and once using the 
> > AggregatePlaintextFastUMLSProcessor pipeline. We used the 
> > WriteCAStoFile CAS consumer to write the results to output files.
> >
> > We used a tool we recently developed to analyze and compare the 
> > annotations generated by the two pipelines. The tool compares the 
> > two outputs for each file and reports any differences in the 
> > annotations (MedicationMention, SignSymptomMention, 
> > ProcedureMention, AnatomicalSiteMention, and
> > DiseaseDisorderMention) between the two output sets. The tool 
> > reports the number of 'matches' and 'misses' between the two annotation sets.
> > A 'match' is defined as the presence of an identified source text interval with
> > its associated CUI appearing in both annotation sets. A 'miss' is 
> > defined as the presence of an identified source text interval and 
> > its associated CUI in one annotation set, but no matching iden