Re: Missing resources for script that extracts markables from a corpus for analysis [EXTERNAL]

2017-10-04 Thread Miller, Timothy
I had in mind the notes in:
/ctakes-examples-res/src/main/resources/org/apache/ctakes/examples/notes/rtf

which I believe are the fake notes Dr. John Green wrote for us. I don't know 
why they are rtf but they are nice, non-toy-length notes.
Tim


From: Alexandru Zbarcea 
Sent: Tuesday, October 3, 2017 5:32 PM
To: Apache cTAKES Dev
Subject: Re: Missing resources for script that extracts markables from a corpus 
for analysis [EXTERNAL]

Hi Tim,

That's great news. If you think there are sample notes that can be used, I
can start working on the Lucene index and gradually build unit tests for them.
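
As a rough illustration of what an index-backed unit test over the sample
notes could look like, here is a minimal sketch in plain Python. It is a toy
in-memory inverted index standing in for Lucene, not the Lucene API and not
the cTAKES code; the names build_index and search are hypothetical.

```python
import re
from collections import defaultdict


def build_index(notes):
    """Map each lowercase token to the set of note names that contain it.

    `notes` is a dict of {note_name: note_text}. This is a toy stand-in
    for a real Lucene index (assumption: simple alphanumeric tokenization).
    """
    index = defaultdict(set)
    for name, text in notes.items():
        for token in re.findall(r"[a-z0-9]+", text.lower()):
            index[token].add(name)
    return index


def search(index, term):
    """Return the sorted names of notes containing `term` (case-insensitive)."""
    return sorted(index.get(term.lower(), set()))
```

A unit test would then load a few of the sample notes into a dict, call
build_index on it, and assert that search returns the expected note names
for a handful of known terms.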

I have created CTAKES-462[1] where we can track this work.

Looking in ctakes-examples-res, this is what I can find:
$ find . -type f | grep -v "\.class" | grep -v "\.iml" | grep -v "\.jar" |
grep -v "\.rtf" | grep -v "\.xml" | grep -v "\.bsv" | grep -v "\.piper"
./main/resources/org/apache/ctakes/examples/notes/pain_no_swelling.txt
./main/resources/org/apache/ctakes/examples/notes/claudication
./main/resources/org/apache/ctakes/examples/notes/shark_bite.txt
./main/resources/org/apache/ctakes/examples/notes/edge_cases_plaintext_1.txt
./main/resources/org/apache/ctakes/examples/notes/dr_nutritious_1.txt
./main/resources/org/apache/ctakes/examples/notes/right_knee_arthroscopy
./main/resources/org/apache/ctakes/examples/notes/SampleInputRadiologyNotes.txt
./main/resources/org/apache/ctakes/examples/notes/smoker/doc1_07543210_sample_past_smoker.txt
./main/resources/org/apache/ctakes/examples/notes/smoker/doc2_07543210_sample_past_smoker.txt
./main/resources/org/apache/ctakes/examples/notes/smoker/doc2_07543210_sample_current.txt
./main/resources/org/apache/ctakes/examples/notes/smoker/doc1_07543210_sample_unknown.txt
./main/resources/org/apache/ctakes/examples/notes/smoker/doc1_07543210_sample_current.txt
./main/resources/org/apache/ctakes/examples/notes/mother_goose/README
./main/resources/org/apache/ctakes/examples/notes/mother_goose/OneMistyMoistyMorning.txt
./main/resources/org/apache/ctakes/examples/notes/dr_nutritious_2.txt
./main/resources/org/apache/ctakes/examples/annotation/anafora_annotated/Peds_RoutBirthNote_1/Peds_RoutBirthNote_1
./main/resources/org/apache/ctakes/examples/annotation/anafora_annotated/VascSurg_AAA_Leak_1/VascSurg_AAA_Leak_1
./main/resources/org/apache/ctakes/examples/annotation/anafora_annotated/Peds_Dysphagia_1/Peds_Dysphagia_1
./main/resources/org/apache/ctakes/examples/annotation/anafora_annotated/OBGYN_LaborProgressNote_1/OBGYN_LaborProgressNote_1
./main/resources/org/apache/ctakes/examples/annotation/anafora_annotated/OBGYN_IUD_1/OBGYN_IUD_1
./main/resources/org/apache/ctakes/examples/annotation/anafora_annotated/OBGYN_HysterectomyAndBSO_1/OBGYN_HysterectomyAndBSO_1
./main/resources/org/apache/ctakes/examples/annotation/anafora_annotated/VascSurg_FollowUp_1/VascSurg_FollowUp_1
./main/resources/org/apache/ctakes/examples/annotation/anafora_annotated/OBGYN_PROMCheck_1/OBGYN_PROMCheck_1
./main/resources/org/apache/ctakes/examples/annotation/anafora_annotated/OBGYN_Gen_Abscess_1/OBGYN_Gen_Abscess_1
./main/resources/org/apache/ctakes/examples/annotation/anafora_annotated/Peds_FebrileSez_1/Peds_FebrileSez_1
./main/resources/org/apache/ctakes/examples/annotation/anafora_annotated/VascSurg_RO_AAA_1/VascSurg_RO_AAA_1
./main/resources/org/apache/ctakes/examples/annotation/anafora_annotated/VascSurg_RO_DVT_1/VascSurg_RO_DVT_1
./main/resources/org/apache/ctakes/examples/annotation/anafora_annotated/GenSurg_UmbilicalHernia_1/GenSurg_UmbilicalHernia_1
./main/resources/org/apache/ctakes/examples/annotation/anafora_annotated/VascSurg_PVD_1/VascSurg_PVD_1
./main/resources/org/apache/ctakes/examples/annotation/anafora_annotated/OBGYN_MVAPrego_1/OBGYN_MVAPrego_1
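
For anyone who wants to reproduce a listing like the one above from a script
or test setup, here is a minimal Python sketch. It is an approximation of the
find/grep pipeline, not an exact equivalent: it filters on the final file
suffix only, whereas grep -v drops any path that contains the pattern
anywhere. The function name list_candidate_notes is hypothetical.

```python
from pathlib import Path

# Suffixes excluded by the grep -v chain in the command above.
EXCLUDED_SUFFIXES = {".class", ".iml", ".jar", ".rtf", ".xml", ".bsv", ".piper"}


def list_candidate_notes(root, excluded=EXCLUDED_SUFFIXES):
    """Return sorted POSIX-style paths, relative to `root`, of every regular
    file under `root` whose final suffix is not in `excluded`.

    Files without a suffix (for example README) are kept.
    """
    root = Path(root)
    return sorted(
        p.relative_to(root).as_posix()
        for p in root.rglob("*")
        if p.is_file() and p.suffix not in excluded
    )
```

Calling list_candidate_notes("src") from the module directory would return
relative paths such as main/resources/..., without the leading "./" that
find prints.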

Which notes do you think I should start with (all of them)?

Alex

[1] - https://issues.apache.org/jira/browse/CTAKES-462


On Mon, Oct 2, 2017 at 6:46 PM, Miller, Timothy wrote:

> Yeah, it might be nice to build a lucene index of all the sample notes in
> the ctakes-example module. I'll create a jira for it but probably won't be
> able to get to it right away.
> Tim
>
> 
> From: Alexandru Zbarcea 
> Sent: Monday, October 2, 2017 5:31 PM
> To: Apache cTAKES Dev
> Subject: Re: Missing resources for script that extracts markables from a
> corpus for analysis [EXTERNAL]
>
> Hi Tim,
>
> I understand, makes sense. Is it possible to anonymize the data you have or
> come up with a separate body of test data to generate a Lucene index and
> unit test the code? I think this would have the double benefit of the code
> being tested and showing dev/users how the code is supposed to be used.
>
> What 

Re: Missing resources for script that extracts markables from a corpus for analysis [EXTERNAL]

2017-10-04 Thread Alexandru Zbarcea
Thanks Tim,

I will let you know about the progress.

Alex

Re: Missing resources for script that extracts markables from a corpus for analysis [EXTERNAL]

2017-10-04 Thread Alexandru Zbarcea
Hi Tim,

Because LuceneIndex is touched in several places in the code, I started
by refactoring LuceneIndexReaderResourceImpl (see CTAKES-464 [1]).

If you have time, could you also check CTAKES-334 [2]? I am treating it as a
prerequisite, because the patch provided there makes the tests pass (given
UMLS credentials as well).

Alex

[1] - https://issues.apache.org/jira/browse/CTAKES-464
[2] - https://issues.apache.org/jira/browse/CTAKES-334

IBM MRTAS

2017-10-04 Thread Abilash.Mathew
Hi All,

I recently came across the YouTube video and the documentation below on IBM 
Watson Medical Records Text Analytics Solution (MRTAS). Based on my analysis, 
it follows the same approach as cTAKES (it uses the same architecture and 
dictionary lookups) and performs the same kind of data extraction. I just 
wanted to check whether anyone else has noticed this similarity.

https://www.youtube.com/watch?v=7c4kxYnuBNk

http://www.medtechmedia.com/files/medtech_images/IBM_110615_Final_PPT.pdf


Thanks,
Abilash Mathew