Re: Missing resources for script that extracts markables from a corpus for analysis [EXTERNAL]
I had in mind the notes in:
/ctakes-examples-res/src/main/resources/org/apache/ctakes/examples/notes/rtf

which I believe are the fake notes Dr. John Green wrote for us. I don't know why they are rtf but they are nice, non-toy-length notes.
Tim

From: Alexandru Zbarcea
Sent: Tuesday, October 3, 2017 5:32 PM
To: Apache cTAKES Dev
Subject: Re: Missing resources for script that extracts markables from a corpus for analysis [EXTERNAL]

Hi Tim,

That's great news. If you think there are sample notes that can be used, I can start working on the Lucene index and slowly build the UTest for them.

I have created CTAKES-462[1] where we can track this work.

Looking into the ctakes-examples-res, what I can find is:
$ find . -type f | grep -v "\.class" | grep -v "\.iml" | grep -v "\.jar" | grep -v "\.rtf" | grep -v "\.xml" | grep -v "\.bsv" | grep -v "\.piper"
./main/resources/org/apache/ctakes/examples/notes/pain_no_swelling.txt
./main/resources/org/apache/ctakes/examples/notes/claudication
./main/resources/org/apache/ctakes/examples/notes/shark_bite.txt
./main/resources/org/apache/ctakes/examples/notes/edge_cases_plaintext_1.txt
./main/resources/org/apache/ctakes/examples/notes/dr_nutritious_1.txt
./main/resources/org/apache/ctakes/examples/notes/right_knee_arthroscopy
./main/resources/org/apache/ctakes/examples/notes/SampleInputRadiologyNotes.txt
./main/resources/org/apache/ctakes/examples/notes/smoker/doc1_07543210_sample_past_smoker.txt
./main/resources/org/apache/ctakes/examples/notes/smoker/doc2_07543210_sample_past_smoker.txt
./main/resources/org/apache/ctakes/examples/notes/smoker/doc2_07543210_sample_current.txt
./main/resources/org/apache/ctakes/examples/notes/smoker/doc1_07543210_sample_unknown.txt
./main/resources/org/apache/ctakes/examples/notes/smoker/doc1_07543210_sample_current.txt
./main/resources/org/apache/ctakes/examples/notes/mother_goose/README
./main/resources/org/apache/ctakes/examples/notes/mother_goose/OneMistyMoistyMorning.txt
./main/resources/org/apache/ctakes/examples/notes/dr_nutritious_2.txt
./main/resources/org/apache/ctakes/examples/annotation/anafora_annotated/Peds_RoutBirthNote_1/Peds_RoutBirthNote_1
./main/resources/org/apache/ctakes/examples/annotation/anafora_annotated/VascSurg_AAA_Leak_1/VascSurg_AAA_Leak_1
./main/resources/org/apache/ctakes/examples/annotation/anafora_annotated/Peds_Dysphagia_1/Peds_Dysphagia_1
./main/resources/org/apache/ctakes/examples/annotation/anafora_annotated/OBGYN_LaborProgressNote_1/OBGYN_LaborProgressNote_1
./main/resources/org/apache/ctakes/examples/annotation/anafora_annotated/OBGYN_IUD_1/OBGYN_IUD_1
./main/resources/org/apache/ctakes/examples/annotation/anafora_annotated/OBGYN_HysterectomyAndBSO_1/OBGYN_HysterectomyAndBSO_1
./main/resources/org/apache/ctakes/examples/annotation/anafora_annotated/VascSurg_FollowUp_1/VascSurg_FollowUp_1
./main/resources/org/apache/ctakes/examples/annotation/anafora_annotated/OBGYN_PROMCheck_1/OBGYN_PROMCheck_1
./main/resources/org/apache/ctakes/examples/annotation/anafora_annotated/OBGYN_Gen_Abscess_1/OBGYN_Gen_Abscess_1
./main/resources/org/apache/ctakes/examples/annotation/anafora_annotated/Peds_FebrileSez_1/Peds_FebrileSez_1
./main/resources/org/apache/ctakes/examples/annotation/anafora_annotated/VascSurg_RO_AAA_1/VascSurg_RO_AAA_1
./main/resources/org/apache/ctakes/examples/annotation/anafora_annotated/VascSurg_RO_DVT_1/VascSurg_RO_DVT_1
./main/resources/org/apache/ctakes/examples/annotation/anafora_annotated/GenSurg_UmbilicalHernia_1/GenSurg_UmbilicalHernia_1
./main/resources/org/apache/ctakes/examples/annotation/anafora_annotated/VascSurg_PVD_1/VascSurg_PVD_1
./main/resources/org/apache/ctakes/examples/annotation/anafora_annotated/OBGYN_MVAPrego_1/OBGYN_MVAPrego_1

What notes do you consider I should start with (all)?

Alex

[1] - https://issues.apache.org/jira/browse/CTAKES-462

On Mon, Oct 2, 2017 at 6:46 PM, Miller, Timothy wrote:
> Yeah, it might be nice to build a lucene index of all the sample notes in
> the ctakes-example module. I'll create a jira for it but probably won't be
> able to get to it right away.
> Tim
>
> From: Alexandru Zbarcea
> Sent: Monday, October 2, 2017 5:31 PM
> To: Apache cTAKES Dev
> Subject: Re: Missing resources for script that extracts markables from a
> corpus for analysis [EXTERNAL]
>
> Hi Tim,
>
> I understand, makes sense. Is it possible to anonymize the data you have or
> come up with a separate body of test data to generate a Lucene index and
> unit test the code? I think this would have the double benefit of the code
> being tested and showing dev/users how the code is supposed to be used.
>
> What
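For anyone repeating the listing in the message above, the chain of grep -v filters can probably be folded into a single find expression. The sketch below is only an illustration: the function name list_candidate_notes is hypothetical and is not part of cTAKES. Note one behavioural difference: find's -name test matches the file's base name only, whereas grep -v "\.xml" drops any path containing ".xml" anywhere, so the two can disagree on unusual directory names.

```shell
#!/bin/sh
# Hypothetical helper (not part of cTAKES): list regular files under a
# directory, skipping the extensions excluded by the grep -v chain above.
list_candidate_notes() {
    # "${1:-.}" defaults to the current directory when no argument is given.
    find "${1:-.}" -type f \
        ! -name '*.class' ! -name '*.iml' ! -name '*.jar' \
        ! -name '*.rtf'   ! -name '*.xml' ! -name '*.bsv' \
        ! -name '*.piper' | sort
}
```

Run from the module's src directory, list_candidate_notes . should print the remaining files in sorted order; the rtf notes Tim mentions are deliberately excluded by this filter, so drop the '*.rtf' test to include them.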
Re: Missing resources for script that extracts markables from a corpus for analysis [EXTERNAL]
Thanks Tim,

I will let you know about the progress.

Alex

On Oct 4, 2017 06:34, "Miller, Timothy" <timothy.mil...@childrens.harvard.edu> wrote:
> I had in mind the notes in:
> /ctakes-examples-res/src/main/resources/org/apache/ctakes/examples/notes/rtf
>
> which I believe are the fake notes Dr. John Green wrote for us. I don't
> know why they are rtf but they are nice, non-toy-length notes.
> Tim
Re: Missing resources for script that extracts markables from a corpus for analysis [EXTERNAL]
Hi Tim,

Because LuceneIndex is touched in several places within the code, I started by refactoring LuceneIndexReaderResourceImpl (see CTAKES-464 [1]).

If you have time, could you also check CTAKES-334 [2]? I have started treating it as a prerequisite, because the patch provided there actually makes the tests pass (given UMLS credentials as well).

Alex

[1] - https://issues.apache.org/jira/browse/CTAKES-464
[2] - https://issues.apache.org/jira/browse/CTAKES-334

On Wed, Oct 4, 2017 at 8:15 AM, Alexandru Zbarcea wrote:
> Thanks Tim,
>
> I will let you know about the progress.
>
> Alex
IBM MRTAS
Hi All,

I recently came across the YouTube video and the documentation below on IBM Watson Medical Records Text Analytics Solution (MRTAS). Based on my analysis, it follows the same approach as cTAKES (it uses the same architecture and dictionary lookups) and performs the same kind of data extraction. I just wanted to check whether anyone else has noticed this similarity.

https://www.youtube.com/watch?v=7c4kxYnuBNk
http://www.medtechmedia.com/files/medtech_images/IBM_110615_Final_PPT.pdf

Thanks,
Abilash Mathew