Re: NPE after upgrade in DefaultJCASTermAnnotator [EXTERNAL]
Thank you Sean. That helped us figure out what we did. I'm not quite sure where we went wrong, but now at least we know the cause.

A long time ago in our project using ctakes, we emptied out the tables CUI_TERMS, RXNORM, PREFTERM, and TUI and then loaded them with the values we wanted. Worked great. Now in the new version the /desc/ctakes-clinical-pipeline/desc/analysis_engine/AggregatePlaintextFastUMLSProcessor.xml engine seems to be using /resources/org/apache/ctakes/dictionary/lookup/fast/sno_rx_16ab/sno_rx_16ab and that seems to be where things went sideways. If I don't mess with the db and keep the original, no errors.

So the problem is somewhere in this if statement at line 102 in DefaultJCasTermAnnotator:

if ( hitTokens[ hit ].equals( allTokens.get( i ).getText() ) || hitTokens[ hit ].equals( allTokens.get( i ).getVariant() ) ) {

It expects never to see a null, and I suspect we are leaving something null somewhere that really shouldn't be null. If it's obvious where I've gone wrong, the assistance would be appreciated. Otherwise, I'll get it figured out eventually. I suspect it's possibly because we never did anything with the SNOMEDCT_US in the prior version.

On Mon, Oct 2, 2017 at 10:47 AM, Finan, Sean <sean.fi...@childrens.harvard.edu> wrote:
> Hi Jeff,
>
> I have no problem running on your example "DIDANOSINE, 250MG (PO Capsule Delayed Release)" or any other text.
>
> I don't know how you are running ctakes through com.clientproject.ctakes.processors.CommandLineProcessor, so I don't know how closely the standard pipeline approximates yours.
>
> Sean
>
> -Original Message-
> From: Jeff Headley [mailto:jeffun...@gmail.com]
> Sent: Sunday, October 01, 2017 11:31 PM
> To: dev@ctakes.apache.org
> Subject: NPE after upgrade in DefaultJCASTermAnnotator [EXTERNAL]
>
> After upgrading our project to version 4, we are getting a NPE from cTAKES.
> The text that was being processed was DIDANOSINE, 250MG (PO Capsule Delayed Release), though it seems to be happening to us no matter what text we submit. The stack trace is below. Any help would be appreciated as I'm at a loss as to what we might be doing wrong if this is not a bug in cTAKES.
>
> Thank you,
> Jeff
>
> Oct 01, 2017 11:10:16 PM org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl processAndOutputNewCASes(273)
> SEVERE: Exception occurred
> org.apache.uima.analysis_engine.AnalysisEngineProcessException: Annotator processing failed.
>     at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.callAnalysisComponentProcess(PrimitiveAnalysisEngine_impl.java:412)
>     at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.processAndOutputNewCASes(PrimitiveAnalysisEngine_impl.java:314)
>     at org.apache.uima.analysis_engine.asb.impl.ASB_impl$AggregateCasIterator.processUntilNextOutputCas(ASB_impl.java:570)
>     at org.apache.uima.analysis_engine.asb.impl.ASB_impl$AggregateCasIterator.<init>(ASB_impl.java:412)
>     at org.apache.uima.analysis_engine.asb.impl.ASB_impl.process(ASB_impl.java:344)
>     at org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.processAndOutputNewCASes(AggregateAnalysisEngine_impl.java:265)
>     at org.apache.uima.analysis_engine.impl.AnalysisEngineImplBase.process(AnalysisEngineImplBase.java:269)
>     at org.apache.uima.analysis_engine.impl.AnalysisEngineImplBase.process(AnalysisEngineImplBase.java:284)
>     at com.clientproject.ctakes.processors.CommandLineProcessor.processLine(CommandLineProcessor.java:163)
>     at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1374)
>     at java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:580)
>     at com.clientproject.ctakes.processors.CommandLineProcessor.run(CommandLineProcessor.java:114)
>     at com.clientproject.ctakes.App.main(App.java:109)
> Caused by: java.lang.NullPointerException
>     at org.apache.ctakes.dictionary.lookup2.ae.DefaultJCasTermAnnotator.isTermMatch(DefaultJCasTermAnnotator.java:102)
>     at org.apache.ctakes.dictionary.lookup2.ae.DefaultJCasTermAnnotator.findTerms(DefaultJCasTermAnnotator.java:79)
>     at org.apache.ctakes.dictionary.lookup2.ae.AbstractJCasTermAnnotator.findTerms(AbstractJCasTermAnnotator.java:236)
>     at org.apache.ctakes.dictionary.lookup2.ae.AbstractJCasTermAnnotator.processWindow(AbstractJCasTermAnnotator.java:219)
>     at org.apache.ctakes.dictionary.lookup2.ae.AbstractJCasTermAnnotator.process(AbstractJCasTermAnnotator.java:156)
>     at org.apache.uima.analysis_component.JCasAnnotator_ImplBase.process(JCasAnnotator_ImplBase.java:48)
>     at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.callAnalysisComponentProcess(PrimitiveAnalysisEngine_impl.java:396)
>     ... 12 more
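
The condition quoted from line 102 calls equals on hitTokens[ hit ], so a null entry in that array would throw before either comparison runs. The sketch below illustrates that failure mode next to a null-tolerant comparison. Token, isTermMatchStrict, and isTermMatchNullSafe are hypothetical stand-ins written for this illustration, not cTAKES classes, and whether the reloaded tables really produce a null token is an assumption the thread does not confirm.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

// Hypothetical stand-in for a lookup token; not a cTAKES class.
final class Token {
    private final String text;
    private final String variant;

    Token(String text, String variant) {
        this.text = text;
        this.variant = variant;
    }

    String getText() { return text; }
    String getVariant() { return variant; }
}

final class TermMatchSketch {
    // Same shape as the quoted condition: throws NullPointerException
    // when hitTokens[hit] is null, because equals is invoked on it.
    static boolean isTermMatchStrict(String[] hitTokens, List<Token> allTokens, int start) {
        for (int hit = 0; hit < hitTokens.length; hit++) {
            Token token = allTokens.get(start + hit);
            if (!(hitTokens[hit].equals(token.getText())
                    || hitTokens[hit].equals(token.getVariant()))) {
                return false;
            }
        }
        return true;
    }

    // Treats a null dictionary token as "no match" instead of throwing.
    static boolean isTermMatchNullSafe(String[] hitTokens, List<Token> allTokens, int start) {
        for (int hit = 0; hit < hitTokens.length; hit++) {
            String expected = hitTokens[hit];
            Token token = allTokens.get(start + hit);
            if (expected == null
                    || !(Objects.equals(expected, token.getText())
                         || Objects.equals(expected, token.getVariant()))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<Token> window = Arrays.asList(
                new Token("didanosine", "didanosine"),
                new Token("250mg", null));
        String[] good = {"didanosine", "250mg"};
        String[] withNull = {"didanosine", null};

        System.out.println(isTermMatchNullSafe(good, window, 0));
        System.out.println(isTermMatchNullSafe(withNull, window, 0));
        try {
            isTermMatchStrict(withNull, window, 0);
        } catch (NullPointerException e) {
            System.out.println("strict check threw NullPointerException");
        }
    }
}
```

A guard like this only hides the symptom. If a null token does come from custom rows, checking the reloaded tables for null or empty text columns would be the place to look for the cause.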
Re: Missing resources for script that extracts markables from a corpus for analysis [EXTERNAL]
Yeah, it might be nice to build a lucene index of all the sample notes in the ctakes-example module. I'll create a jira for it but probably won't be able to get to it right away.
Tim

From: Alexandru Zbarcea
Sent: Monday, October 2, 2017 5:31 PM
To: Apache cTAKES Dev
Subject: Re: Missing resources for script that extracts markables from a corpus for analysis [EXTERNAL]

Hi Tim,

I understand, makes sense. Is it possible to anonymize the data you have or come up with a separate body of test data to generate a Lucene index and unit test the code? I think this would have the double benefit of the code being tested and showing dev/users how the code is supposed to be used. What do you think?

Alex

On Mon, Oct 2, 2017 at 9:45 AM, Miller, Timothy <timothy.mil...@childrens.harvard.edu> wrote:
> Thanks Alex,
> This code is for processing a clinical text data corpus stored as a lucene index -- data that cannot be redistributed for privacy reasons. Since it's so related to the coref stuff I thought it should go alongside the coreference module. But maybe it makes more sense as an external project since it can't really function without externally created resources -- what do you think?
> Tim
>
> On Sun, 2017-10-01 at 19:54 -0400, Alexandru Zbarcea wrote:
> > Hi,
> >
> > I was trying to do a UTest for the org.apache.ctakes.coreference.data.PrintMimicMarkables (recently added), but I couldn't find any of the existing resources that can be used for this. Can anyone help me pointing to a resource (Lucene index) folder.
> >
> > org.apache.ctakes.coreference.data.PrintMimicMarkables \
> >   /home/alex/projects/apache/ctakes/ctakes-dictionary-lookup-res/target/classes/org/apache/ctakes/dictionary/lookup/rxnorm_index \
> >   index.out
> >
> > I was trying with the following lucene folder/resource:
> > ./ctakes-coreference-res/src/main/resources/org/apache/ctakes/coreference/models/index_med_5k
> >
> > And also the dictionaries:
> > ./ctakes-dictionary-lookup-res/src/main/resources/org/apache/ctakes/dictionary/lookup/snomed-like_codes_sample
> > ./ctakes-dictionary-lookup-res/src/main/resources/org/apache/ctakes/dictionary/lookup/assertion_cue_phrase_index
> > ./ctakes-dictionary-lookup-res/src/main/resources/org/apache/ctakes/dictionary/lookup/OrangeBook
> > ./ctakes-dictionary-lookup-res/src/main/resources/org/apache/ctakes/dictionary/lookup/snomed-like_sample
> > ./ctakes-dictionary-lookup-res/src/main/resources/org/apache/ctakes/dictionary/lookup/drug_index
> >
> > Any execution looks like:
> > 01 Oct 2017 19:50:19 INFO ConstituencyParser - Initializing parser...
> > Oct 01, 2017 7:50:20 PM org.apache.uima.collection.impl.cpm.engine.ArtifactProducer process
> > WARNING: Got Exception. (Thread Name: [CollectionReader Thread]::)
> > Message: docID must be >= 0 and < maxDoc=5000 (got docID=5000)
> > Oct 01, 2017 7:50:20 PM org.apache.uima.collection.impl.cpm.engine.ArtifactProducer run(820)
> > WARNING: docID must be >= 0 and < maxDoc=5000 (got docID=5000)
> > java.lang.IllegalArgumentException: docID must be >= 0 and < maxDoc=5000 (got docID=5000)
> >     at org.apache.lucene.index.BaseCompositeReader.readerIndex(BaseCompositeReader.java:152)
> >     at org.apache.lucene.index.BaseCompositeReader.document(BaseCompositeReader.java:115)
> >     at org.apache.lucene.index.IndexReader.document(IndexReader.java:436)
> >     at org.apache.ctakes.core.cr.LuceneCollectionReader.getNext(LuceneCollectionReader.java:90)
> >     at org.apache.uima.collection.impl.cpm.engine.ArtifactProducer.readNext(ArtifactProducer.java:494)
> >     at org.apache.uima.collection.impl.cpm.engine.ArtifactProducer.run(ArtifactProducer.java:711)
> >
> > Collection process complete called, closing file writer.
> >
> > I appreciate any of your help,
> > Alex
LuceneDictionaryImpl for cTAKES-4.0.0
Hi,

We have a decent-sized custom dictionary (~1.4M concepts) in a Lucene index. I'd like to have an implementation of AbstractJCasTermAnnotator, e.g. DefaultJCas, that finds terms from the Lucene index directly. I can think of two options, but I'd like to get everyone's input:

1- Create an hsql db containing a dictionary using an approach similar to org.apache.ctakes.gui.dictionary.DictionaryBuilder, and then some sort of LuceneConceptFactory extending AbstractConceptFactory.

2- Create a new Dictionary Lookup, e.g. LuceneJCasTermAnnotation, similar to DefaultJCasTermAnnotator, with the signature of the findTerms method something like this:

void findTerms( IndexSearcher searcher, List allTokens)

I've seen that for cTAKES v3 there was something similar in the LuceneDictionaryImpl, but that doesn't seem to work with the Fast Dictionary Lookup approach for cTAKES-4.0.0.

Thanks in advance for any ideas or suggestions!

Iker
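
Option 2 can be pictured as a loop over the window's tokens that asks an index for candidate terms and then verifies each candidate token by token. The sketch below is a plain-Java illustration under that assumption. TermIndex, CandidateTerm, and this findTerms are hypothetical names, the map-backed index stands in for a query against a Lucene IndexSearcher, the CUI string is made up, and none of it is the cTAKES or Lucene API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical candidate term: its tokens plus the position of the token it is indexed under.
final class CandidateTerm {
    final String cui;
    final String[] tokens;
    final int keyIndex;

    CandidateTerm(String cui, String[] tokens, int keyIndex) {
        this.cui = cui;
        this.tokens = tokens;
        this.keyIndex = keyIndex;
    }
}

// Hypothetical lookup abstraction; a Lucene-backed version would run a query here.
interface TermIndex {
    List<CandidateTerm> candidatesFor(String token);
}

final class LookupSketch {
    // Returns the CUI of every candidate whose tokens all line up with the window.
    static List<String> findTerms(TermIndex index, List<String> allTokens) {
        List<String> found = new ArrayList<>();
        for (int i = 0; i < allTokens.size(); i++) {
            for (CandidateTerm term : index.candidatesFor(allTokens.get(i))) {
                int start = i - term.keyIndex;
                int end = start + term.tokens.length;
                if (start < 0 || end > allTokens.size()) {
                    continue; // the term would run off the window
                }
                if (Arrays.asList(term.tokens).equals(allTokens.subList(start, end))) {
                    found.add(term.cui);
                }
            }
        }
        return found;
    }

    public static void main(String[] args) {
        Map<String, List<CandidateTerm>> byKey = new HashMap<>();
        byKey.put("carcinoma", Collections.singletonList(
                new CandidateTerm("C-EXAMPLE-1", new String[] {"hepatocellular", "carcinoma"}, 1)));
        TermIndex index = token -> byKey.getOrDefault(token, Collections.emptyList());

        List<String> window = Arrays.asList("treatment", "of", "hepatocellular", "carcinoma");
        System.out.println(findTerms(index, window));
    }
}
```

Whether one index query per token is fast enough against ~1.4M concepts is a separate question that this sketch does not address.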
Re: CTAKES-460: coreference Test should not be part of main [EXTERNAL]
Thank you Tim

Alex

On Oct 2, 2017 10:43, "Miller, Timothy" <timothy.mil...@childrens.harvard.edu> wrote:

Thanks Alex, I've committed this patch. I unfortunately looked at the wrong tab when typing my commit message and committed it with the wrong issue number (459).
Tim

On Mon, 2017-10-02 at 08:17 -0400, Alexandru Zbarcea wrote:
> Hi,
>
> I have refactor a main class that should have been a UTest.
> https://issues.apache.org/jira/browse/CTAKES-460
>
> This moves the test code from src/main to src/test and also added some refactoring.
>
> No impact. Can easily be merged.
>
> Alex
RE: Enabling drugner pipeline and identifying dates [EXTERNAL] [SUSPICIOUS] [SUSPICIOUS]
Hi Tim,

The coreference question (just a question) was for a different item altogether. Sorry for any confusion. The reason that I CC:d you ...

From Gandhi:
> Interestingly even I was able to generate [Sean's coref output] using piper GUI by having only that single line - " The patient started study treatment of Thalomid 200mg (days 1-21), and Epirubicin, 20 mg/m2 (days 1, 8, and 15) on 06/07/02 for the treatment of hepatocellular carcinoma. " in the input file.
> But when I change the input file content with the following lines: [Full paragraph (below), single-sentence in middle] The co-reference superscript is lost by then.

Sean's answer:
> Ctakes is a system with many moving parts. Things that precede or follow your original example sentence will change the evaluation of that sentence.
> With the pipeline you are using and the full note, you should see a number (mine is 4) next to the first "thalomid" in the original example sentence. If you click that number you should see (to the right) 4 instances of "thalomid".
> Tim can correct me here, but maybe the coreference module ranked the links between "thalomid" as much higher than the rank between "study treatment of thalomid 200mg" and "the treatment of hepatocellular carcinoma" and discarded the encapsulating treatment texts from markables? It is probably more complex than that.

Sean

"This patient is participating in a Non-IND study; Protocol CG-000424: "Phase I/II of Thalidomide and Epirubicin in Patients with Unresectable or Metastatic Hepatocellular Carcinoma".Information has been received from the investigator regarding an 82 year-old male patient who had gastrointestinal bleeding while on Thalomid, Epirubicin, and Coumadin. He had a past medical history of diverticulosis in 03/02 and a right atrial clot from intraventricular catheter (IVC) for which he was started on Coumadin.
During the hospitalization for a right atrial clot in 03/02 hepatocellular carcinoma was first noted and he was referred to an oncologist. The patient started study treatment of Thalomid 200mg (days 1-21), and Epirubicin, 20 mg/m2 (days 1, 8, and 15) on 06/07/02 for the treatment of hepatocellular carcinoma. He was concomitantly receiving Cardura, Ambien (for insomnia), Megace, Coumadin, and Oxycodone. This patient presented to the emergency room with the chief complaint of hematochezia. He reported noticing bright red blood and small clots mixed in with his stool. On 07/13/02, he was admitted due to gastrointestinal bleed. The physician ordered 2 large bore intravenous lines and planned to transfuse for hematocrit less than 30%. Due to the INR (international normalized ratio) level of 3.0, Coumadin was held. He was also noted to have bilateral lower extremity edema with dyspnea on exertion. On 07/13/02, he had a chest X-ray PA and lateral done that showed no evidence of acute pneumonia or congestive heart failure. On 07/14/02, he underwent an ultrasound which was negative for deep vein thrombosis. This patient did not take Thalomid on the day of his admittance to the hospital, but resumed treatment shortly after with no return of symptoms. On 07/15/02, he was discharged in stable condition. There have been no further reports of bleeding at this time. Thedoctor has assessed the hematochezia as related to Coumadin treatment and previously diagnosed diverticulosis, and not to protocol therapy with Thalomid and Epirubicin.Additional information received from the investigator on 27Aug02 reveals that this male patient began on 07Jun02 two cycles of therapy with Thalidomide and Epirubicin. His post cycle two computed tomography scans revealed increase in size of liver lesion with development of multiple new satellite nodules. On 29Jul02, the investigator removed this patient from protocol for progressive disease and recommended hospice care. 
After seeking a second opinion from two other institutions, this patient was admitted to hospice on 05Aug02. On 20Aug02, the investigator noted that this patient was suffering worsening fatigue and got tired getting out of his chair. On 25Aug02, this patient died due to disease progression. The investigator assessed the death as not related to study treatment and expected"

-Original Message-
From: Miller, Timothy [mailto:timothy.mil...@childrens.harvard.edu]
Sent: Monday, October 02, 2017 10:36 AM
To: dev@ctakes.apache.org
Subject: Re: Enabling drugner pipeline and identifying dates [EXTERNAL] [SUSPICIOUS] [SUSPICIOUS]

My bad, I didn't read too closely and thought this was going to be a coreference patch. I don't know this FSM code that well, so I am not an expert. My biggest concern at a glance is that these additions help find more true positives (as in your examples), can we verify that they won't create false positives?
Tim

On Fri, 2017-09-29 at 06:25 +, Gandhi Rajan Natarajan wrote:
> Hi Sean,
>
> Thanks again for
Re: Enabling drugner pipeline and identifying dates [EXTERNAL] [SUSPICIOUS]
My bad, I didn't read too closely and thought this was going to be a coreference patch. I don't know this FSM code that well, so I am not an expert. My biggest concern at a glance is that these additions help find more true positives (as in your examples), but can we verify that they won't create false positives?
Tim

On Fri, 2017-09-29 at 06:25 +, Gandhi Rajan Natarajan wrote:
> Hi Sean,
>
> Thanks again for the response. I guess it's a mistake on my side that I didn't send the complete text. Did you mean that with the text I sent, the co-reference superscript-1 will be lost?
>
> Also, as per your advice, we have created an issue - https://issues.apache.org/jira/browse/CTAKES-459 for measurement FSM changes and attached the modified file changes. Could someone have a look and let us know your thoughts please?
>
> Regards,
> Gandhi
>
> -Original Message-
> From: Finan, Sean [mailto:sean.fi...@childrens.harvard.edu]
> Sent: Thursday, September 28, 2017 8:21 PM
> To: dev@ctakes.apache.org
> Cc: Miller, Timothy
> Subject: RE: Enabling drugner pipeline and identifying dates [EXTERNAL] [SUSPICIOUS]
>
> Hi Gandhi,
>
> I don't recall you sending me that entire snippet of text. I think that I only had your single example sentence.
> You have discovered one of the quirks of software: "change the data, change the result."
> Ctakes is a system with many moving parts. Things that precede or follow your original example sentence will change the evaluation of that sentence.
> With the pipeline you are using and the full note, you should see a number (mine is 4) next to the first "thalomid" in the original example sentence. If you click that number you should see (to the right) 4 instances of "thalomid".
> Tim can correct me here, but maybe the coreference module ranked the links between "thalomid" as much higher than the rank between "study treatment of thalomid 200mg" and "the treatment of hepatocellular carcinoma" and discarded the encapsulating treatment texts from markables? It is probably more complex than that.
>
> > we have also made some code changes in MeasurementFSM.java to identify certain measurements like '20 mg/m2' which was not identified out of the box. Should we send the code changes to you so that you can consider the same to be productized ? Please advise."
>
> I don't know if you've noticed the recent emails on the dev list involving Alexandru Zbarcea. Alex has been creating or commenting on Jira items and attaching code for fixes and enhancements. This is a widely used process and is fairly easy to follow. I think that the following links are relevant:
> Working with issues: https://confluence.atlassian.com/jiracoreserver073/working-with-issues-861257307.html
> Creating patches: https://confluence.atlassian.com/crucible/creating-patch-files-for-pre-commit-reviews-298977458.html
> Attaching files: https://confluence.atlassian.com/jiracorecloud/attaching-files-and-screenshots-to-issues-765593805.html
>
> I don't know if you have a jira account and permissions for the ctakes project. An administrator may need to set that up for you.
>
> Thanks,
> Sean
>
> -Original Message-
> From: Gandhi Rajan Natarajan [mailto:gandhi.natara...@arisglobal.com]
> Sent: Thursday, September 28, 2017 4:09 AM
> To: dev@ctakes.apache.org
> Subject: RE: Enabling drugner pipeline and identifying dates [EXTERNAL] [SUSPICIOUS]
>
> Hi Sean,
>
> Thanks for the response. I was able to see the co-reference superscript using the html file that you sent. Interestingly even I was able to generate the sample HTML using piper GUI by having only that single line - " The patient started study treatment of Thalomid 200mg (days 1-21), and Epirubicin, 20 mg/m2 (days 1, 8, and 15) on 06/07/02 for the
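
The question about false positives can be made concrete with a small harness that runs a candidate recognizer over both the sentence from this thread and some text it should ignore. The sketch below uses a regular expression rather than the finite state machine in MeasurementFSM.java, and it is not the patch attached to CTAKES-459, which is not shown here. The class name, the unit list, and the negative examples are assumptions chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class DoseMeasurementSketch {
    // A number, an optional space, a unit, and an optional "/unit" denominator.
    // The unit lists are illustrative assumptions, not the set used by cTAKES.
    private static final Pattern MEASUREMENT = Pattern.compile(
            "\\b(\\d+(?:\\.\\d+)?)\\s?(mg|g|mcg|ml)(?:/(m2|kg|ml))?\\b",
            Pattern.CASE_INSENSITIVE);

    static List<String> findMeasurements(String text) {
        List<String> found = new ArrayList<>();
        Matcher matcher = MEASUREMENT.matcher(text);
        while (matcher.find()) {
            found.add(matcher.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String sentence = "The patient started study treatment of Thalomid 200mg (days 1-21), "
                + "and Epirubicin, 20 mg/m2 (days 1, 8, and 15) on 06/07/02";
        System.out.println(findMeasurements(sentence));   // spans that look like doses

        // Text that should yield nothing: a date, day ranges, a bare area, a spelled-out unit.
        System.out.println(findMeasurements("on 06/07/02, days 1-21, room of 20 m2, 5 grams"));
    }
}
```

Keeping the negative cases next to the positive ones gives a quick regression check whenever the recognizer is widened to accept a new unit.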
Re: Missing resources for script that extracts markables from a corpus for analysis [EXTERNAL]
Thanks Alex,
This code is for processing a clinical text data corpus stored as a lucene index -- data that cannot be redistributed for privacy reasons. Since it's so related to the coref stuff I thought it should go alongside the coreference module. But maybe it makes more sense as an external project since it can't really function without externally created resources -- what do you think?
Tim

On Sun, 2017-10-01 at 19:54 -0400, Alexandru Zbarcea wrote:
> Hi,
>
> I was trying to do a UTest for the org.apache.ctakes.coreference.data.PrintMimicMarkables (recently added), but I couldn't find any of the existing resources that can be used for this. Can anyone help me pointing to a resource (Lucene index) folder.
>
> org.apache.ctakes.coreference.data.PrintMimicMarkables \
>   /home/alex/projects/apache/ctakes/ctakes-dictionary-lookup-res/target/classes/org/apache/ctakes/dictionary/lookup/rxnorm_index \
>   index.out
>
> I was trying with the following lucene folder/resource:
> ./ctakes-coreference-res/src/main/resources/org/apache/ctakes/coreference/models/index_med_5k
>
> And also the dictionaries:
> ./ctakes-dictionary-lookup-res/src/main/resources/org/apache/ctakes/dictionary/lookup/snomed-like_codes_sample
> ./ctakes-dictionary-lookup-res/src/main/resources/org/apache/ctakes/dictionary/lookup/assertion_cue_phrase_index
> ./ctakes-dictionary-lookup-res/src/main/resources/org/apache/ctakes/dictionary/lookup/OrangeBook
> ./ctakes-dictionary-lookup-res/src/main/resources/org/apache/ctakes/dictionary/lookup/snomed-like_sample
> ./ctakes-dictionary-lookup-res/src/main/resources/org/apache/ctakes/dictionary/lookup/drug_index
>
> Any execution looks like:
> 01 Oct 2017 19:50:19 INFO ConstituencyParser - Initializing parser...
> Oct 01, 2017 7:50:20 PM org.apache.uima.collection.impl.cpm.engine.ArtifactProducer process
> WARNING: Got Exception. (Thread Name: [CollectionReader Thread]::)
> Message: docID must be >= 0 and < maxDoc=5000 (got docID=5000)
> Oct 01, 2017 7:50:20 PM org.apache.uima.collection.impl.cpm.engine.ArtifactProducer run(820)
> WARNING: docID must be >= 0 and < maxDoc=5000 (got docID=5000)
> java.lang.IllegalArgumentException: docID must be >= 0 and < maxDoc=5000 (got docID=5000)
>     at org.apache.lucene.index.BaseCompositeReader.readerIndex(BaseCompositeReader.java:152)
>     at org.apache.lucene.index.BaseCompositeReader.document(BaseCompositeReader.java:115)
>     at org.apache.lucene.index.IndexReader.document(IndexReader.java:436)
>     at org.apache.ctakes.core.cr.LuceneCollectionReader.getNext(LuceneCollectionReader.java:90)
>     at org.apache.uima.collection.impl.cpm.engine.ArtifactProducer.readNext(ArtifactProducer.java:494)
>     at org.apache.uima.collection.impl.cpm.engine.ArtifactProducer.run(ArtifactProducer.java:711)
>
> Collection process complete called, closing file writer.
>
> I appreciate any of your help,
> Alex
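
The exception text says the reader was asked for document 5000 of an index whose valid IDs run from 0 through 4999. Why LuceneCollectionReader reached that ID is not established in this thread. The sketch below only illustrates the bound being enforced and a read loop that stays inside it, using a hypothetical in-memory DocStore in place of a Lucene reader; DocStore and BoundedReaderSketch are names invented for this example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for an index reader; not a Lucene class.
final class DocStore {
    private final List<String> docs;

    DocStore(List<String> docs) { this.docs = docs; }

    int maxDoc() { return docs.size(); }

    String document(int docId) {
        if (docId < 0 || docId >= maxDoc()) {
            throw new IllegalArgumentException(
                    "docID must be >= 0 and < maxDoc=" + maxDoc() + " (got docID=" + docId + ")");
        }
        return docs.get(docId);
    }
}

final class BoundedReaderSketch {
    private final DocStore store;
    private int next = 0;

    BoundedReaderSketch(DocStore store) { this.store = store; }

    // Strict upper bound: the last valid ID is maxDoc() - 1.
    boolean hasNext() { return next < store.maxDoc(); }

    String getNext() { return store.document(next++); }

    public static void main(String[] args) {
        List<String> notes = new ArrayList<>();
        for (int i = 0; i < 5000; i++) {
            notes.add("note " + i);
        }
        DocStore store = new DocStore(notes);
        BoundedReaderSketch reader = new BoundedReaderSketch(store);

        int count = 0;
        while (reader.hasNext()) {
            reader.getNext();
            count++;
        }
        System.out.println("read " + count + " documents");

        try {
            store.document(5000); // one past the last valid ID
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```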
CTAKES-460: coreference Test should not be part of main
Hi,

I have refactored a main class that should have been a UTest.
https://issues.apache.org/jira/browse/CTAKES-460

This moves the test code from src/main to src/test and also adds some refactoring.

No impact. Can easily be merged.

Alex