RE: Allergy Annotator

Finan, Sean Tue, 17 Jan 2017 06:33:46 -0800

Hi Shyam,

I just checked an example into 3.2.3 trunk.
org.apache.ctakes.examples.pipeline.ProcessLinesClinicalRunner.java


It reads one of the example files and logs some simple entity properties.  The 
code for the collection reader:

         // Add the Lines from File reader
         final File inputFile = FileLocator.locateFile( INPUT_FILE_PATH );
         final CollectionReader linesFromFileReader
               = CollectionReaderFactory.createReader(
                     LinesFromFileCollectionReader.class,
                     LinesFromFileCollectionReader.PARAM_INPUT_FILE_NAME,
                     inputFile.getAbsolutePath() );

Sean
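
Since LinesFromFileCollectionReader is described later in this thread as treating every newline as a separator rather than reading csv, a csv file like the one discussed below would probably first need its text column written out one record per line. The following is only a minimal sketch in plain Java under stated assumptions: the class and method names (CsvColumnToLines, splitCsvLine, extractColumn) are hypothetical, the column index 4 is taken from the nextLine[4] usage quoted below, and the csv parsing is deliberately simplified (no records spanning several physical lines).

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of cTAKES: turns one csv column into a
// one-record-per-line text file that a line-based reader could consume.
class CsvColumnToLines {

    // Splits one csv record into fields, honoring double-quoted fields and ""
    // escapes. Assumption: a record never spans more than one physical line.
    static List<String> splitCsvLine(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"' && i + 1 < line.length() && line.charAt(i + 1) == '"') {
                    current.append('"'); // escaped quote inside a quoted field
                    i++;
                } else if (c == '"') {
                    inQuotes = false;    // closing quote
                } else {
                    current.append(c);
                }
            } else if (c == '"') {
                inQuotes = true;         // opening quote
            } else if (c == ',') {
                fields.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());
        return fields;
    }

    // Keeps the chosen column of each record, skipping rows where the column
    // is missing or has at most one character (mirrors the length() > 1 check).
    static List<String> extractColumn(List<String> csvLines, int columnIndex) {
        List<String> out = new ArrayList<>();
        for (String line : csvLines) {
            List<String> fields = splitCsvLine(line);
            if (columnIndex < fields.size() && fields.get(columnIndex).length() > 1) {
                out.add(fields.get(columnIndex).trim());
            }
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: CsvColumnToLines <input.csv> <output.txt>");
            return;
        }
        List<String> rows = Files.readAllLines(Paths.get(args[0]), StandardCharsets.UTF_8);
        // Column index 4 is an assumption based on nextLine[4] in the quoted code.
        Files.write(Paths.get(args[1]), extractColumn(rows, 4), StandardCharsets.UTF_8);
    }
}
```

The resulting text file could then be passed as the input file of the reader shown above, so the pipeline is created once for the whole file instead of once per record.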

-----Original Message-----
From: Ks Sunder [mailto:[email protected]] 
Sent: Monday, January 16, 2017 1:23 AM
To: [email protected]
Subject: Re: Allergy Annotator

Thank you, Sean.

Could you please share a LinesFromFileCollectionReader example?



regards,
shyam k.

On Fri, Jan 13, 2017 at 8:19 PM, Finan, Sean < 
[email protected]> wrote:

> Hi Shyam,
>
> I'm not sure what the [4] is doing in your nextLine String processing.
>
> That aside, are you seeing the pipeline being initiated multiple times?
> This could be the problem.
>
> Your file reader looks nice, but as I advised in my last email, give 
> LinesFromFileCollectionReader a try.  Instead of creating a new cas 
> object and initializing the pipeline once per line, this will allow 
> ctakes to reuse a single cas object and initialize the pipeline only once.
>
> Sean
>
> -----Original Message-----
> From: Ks Sunder [mailto:[email protected]]
> Sent: Friday, January 13, 2017 1:11 AM
> To: [email protected]
> Subject: Re: Allergy Annotator
>
> Thank you, Sean.
>
>    I have written the code for this. I am using Java to read the csv 
> file, and for the cTAKES UML dictionary I am using the function below.
>
>
>  public AnalysisEngineDescription getUMLPipeline()
>        throws ResourceInitializationException, URISyntaxException {
>    AggregateBuilder builder = new AggregateBuilder();
>    builder.add(SimpleSegmentAnnotator.createAnnotatorDescription());
>    builder.add(SentenceDetector.createAnnotatorDescription());
>    builder.add(TokenizerAnnotatorPTB.createAnnotatorDescription());
>    builder.add(POSTagger.createAnnotatorDescription());
>    builder.add(ClinicalPipelineFactory.getNpChunkerPipeline());
>    builder.add(LvgAnnotator.createAnnotatorDescription());
>
>      try {
>          builder.add( AnalysisEngineFactory.createEngineDescription(
>               DefaultJCasTermAnnotator.class,
>               AbstractJCasTermAnnotator.PARAM_WINDOW_ANNOT_PRP,
>               "org.apache.ctakes.typesystem.type.textspan.Sentence",
>               JCasTermAnnotator.DICTIONARY_DESCRIPTOR_KEY,
>               ExternalResourceFactory.createExternalResourceDescription(
>                     FileResourceImpl.class,
>                     FileLocator.locateFile(
>                           "org/apache/ctakes/dictionary/lookup/fast/cTakesHsql.xml" )
>               )
>         ) );
>      } catch ( FileNotFoundException e ) {
>         e.printStackTrace();
>         throw new ResourceInitializationException( e );
>      }
>
>    return builder.createAggregateDescription();
>  }
>
>
> Next, I am calling this function from here:
>
>
>
>  reader = new CSVReader(new FileReader(ExelReadJava.NarrativeFile));
>  String [] nextLine;
>  int lineNumber = 0;
>
>
>  while ((nextLine = reader.readNext()) != null) {
>    lineNumber++;
>    System.out.println("Line # " + lineNumber);
>
>     // UML code start
>     try {
>       if ( nextLine[4].length() > 1 ) {
>
>         final JCas jcas = JCasFactory.createJCas();
>         jcas.setDocumentText( nextLine[4] );
>         SimplePipeline.runPipeline( jcas, pipelineTesting.getUMLPipeline() );
>
>         for ( IdentifiedAnnotation entity : JCasUtil.select( jcas,
>               IdentifiedAnnotation.class ) ) {
>           if ( entity.getOntologyConceptArr() != null ) {
>             add.append( entity.getCoveredText() + "," );
>           }
>         }
>
>
> This function works properly, but processing takes about 40 seconds per 
> line. How can I decrease the processing time?
>
> I have 1 lakh (100,000) records (lines) in a csv file.
>
> Please give me a solution and an example.
>
>
>
>
>
> regards,
> shyam k.
>
> On Thu, Jan 12, 2017 at 8:48 PM, Finan, Sean < 
> [email protected]> wrote:
>
> > Hi Shyam,
> >
> > Have a look at the LinesFromFileCollectionReader class in ctakes-core.
> > It doesn't use csv files, but instead treats every newline character 
> > as a separator.
> >
> > Sean
> >
> > -----Original Message-----
> > From: Ks Sunder [mailto:[email protected]]
> > Sent: Wednesday, January 11, 2017 1:29 AM
> > To: [email protected]
> > Subject: Re: Allergy Annotator
> >
> > Hi All,
> >
> > My scenario is to read string content from a csv file and find the 
> > medical terms in that content using cTAKES UML.
> >
> > As you suggested, I tried to find a CollectionReader in ctakes-core, 
> > but I did not find a clear solution. Please suggest a solution and 
> > an example.
> >
> >
> > regards,
> > shyam k.
> >
> > On Thu, Dec 22, 2016 at 9:16 PM, Finan, Sean < 
> > [email protected]> wrote:
> >
> > > Hi Shyam,
> > >
> > > I think that the key to your first question
> > > >   how can execute the single function to run all this jobs in 
> > > > short
> > > time...
> > > Is in your code here:
> > >
> > > 1       final JCas jcas = JCasFactory.createJCas();
> > > 2       jcas.setDocumentText( nextLine[0] );
> > > 3       SimplePipeline.runPipeline(jcas, getUMLPipeline());
> > >
> > > > What you probably want to do is replace lines #1 and #2 with a 
> > > > CollectionReader, and then in #3 use a different SimplePipeline 
> > > > call that runs the pipeline using the CollectionReader instead of 
> > > > a static cas.
> > >
> > > There are commonly used CollectionReaders in ctakes-core.  The 
> > > most widely applicable is probably the FileTreeReader*, which 
> > > reads a tree of ascii files.  If you have some other source of 
> > > text data then look around the code for something that might fit 
> > > and let the devlist know if you can't find anything that fits your needs.
> > >
> > > I don't understand your second question:
> > > > > how can i find sentence vised Dictionary words from string, give me a solution for this..
> > > Can you rephrase it and post to the devlist again?
> > >
> > > * one advantage that the FileTreeReader has is that it stores 
> > > metadata on the input file tree placement, which can then be 
> > > reproduced by output file writers like the html writer.
> > >
> > > Sean
> > >
> > >
> > > -----Original Message-----
> > > From: Ks Sunder [mailto:[email protected]]
> > > Sent: Thursday, December 22, 2016 2:33 AM
> > > To: [email protected]
> > > Subject: Re: Allergy Annotator
> > >
> > > Hi All,
> > >
> > > I have done the below code for finding medical terms from String 
> > > information.
> > >
> > > step 1 :
> > > > public static AnalysisEngineDescription getUMLPipeline()
> > > >       throws ResourceInitializationException, URISyntaxException {
> > >    AggregateBuilder builder = new AggregateBuilder();
> > >    builder.add(SimpleSegmentAnnotator.createAnnotatorDescription());
> > >    builder.add(SentenceDetector.createAnnotatorDescription());
> > >    builder.add(TokenizerAnnotatorPTB.createAnnotatorDescription());
> > >    builder.add(POSTagger.createAnnotatorDescription());
> > >    builder.add(ClinicalPipelineFactory.getNpChunkerPipeline());
> > >    builder.add(LvgAnnotator.createAnnotatorDescription());
> > >
> > >      try {
> > > >          builder.add( AnalysisEngineFactory.createEngineDescription(
> > > >               DefaultJCasTermAnnotator.class,
> > > >               AbstractJCasTermAnnotator.PARAM_WINDOW_ANNOT_PRP,
> > > >               "org.apache.ctakes.typesystem.type.textspan.Sentence",
> > > >               JCasTermAnnotator.DICTIONARY_DESCRIPTOR_KEY,
> > > >               ExternalResourceFactory.createExternalResourceDescription(
> > > >                     FileResourceImpl.class,
> > > >                     FileLocator.locateFile(
> > > >                           "org/apache/ctakes/dictionary/lookup/fast/cTakesHsql.xml" ) )
> > > >         ) );
> > >      } catch ( FileNotFoundException e ) {
> > >         e.printStackTrace();
> > >         throw new ResourceInitializationException( e );
> > >      }
> > >
> > >    return builder.createAggregateDescription();
> > >  }
> > > step 2:
> > >
> > > > final JCas jcas = JCasFactory.createJCas();
> > > > jcas.setDocumentText( nextLine[0] );
> > > > SimplePipeline.runPipeline( jcas, getUMLPipeline() );
> > > >
> > > > for ( IdentifiedAnnotation entity : JCasUtil.select( jcas,
> > > >       IdentifiedAnnotation.class ) ) {
> > > >    if ( entity.getOntologyConceptArr() != null ) {
> > > >       add.append( entity.getCoveredText() + "," );
> > > >    }
> > > > }
> > >
> > >
> > >
> > >
> > >
> > > > It's working fine.
> > > >
> > > > But I have two queries.
> > > >
> > > > 1. In step 1, I am adding the annotators step by step, and it takes 
> > > > a long time to load all the functions.
> > > >    how can execute the single function to run all this jobs in 
> > > > short time...
> > > >
> > > > 2. how can i find sentence vised Dictionary words from string, 
> > > > give me a solution for this..
> > > >
> > > >
> > > > Please give me solutions for these issues.
> > >
> > >
> > >
> > > regards,
> > > shyam k.
> > >
> > > On Thu, Dec 8, 2016 at 1:59 AM, Mullane, Sean *HS < 
> > > [email protected]> wrote:
> > >
> > > > I'm reviving this thread with reference to negation detection. I 
> > > > previously posted about this to the User list but this is 
> > > > probably a more appropriate venue.
> > > >
> > > > The way the sentences are split on ":" makes the negation 
> > > > annotator miss negation in lists of this form:
> > > >
> > > > Hyperlipidemia:  Yes
> > > > Hypercholesterolemia:  No
> > > > Chronic Renal Insufficiency:  N/A
> > > >
> > > > I tried reversing order and removing ":"s and found that the 
> > > > negation for Hypercholesterolemia is detected when in this form:
> > > >
> > > > Yes Hyperlipidemia
> > > > No Hypercholesterolemia
> > > > N/A Chronic Renal Insufficiency
> > > >
> > > > > Our notes have quite a few places with this sort of list where 
> > > > > good negation detection is important, but I haven't had very good 
> > > > > results. The sentence segmentator sees this as 12 separate 
> > > > > sentences, but I would think proper behavior would be to 
> > > > > consider this as 6 sentences (breaking sentences on line break 
> > > > > but not on colons). I see previous discussion on the list about 
> > > > > the sentence segmentator breaking on newlines but little 
> > > > > regarding colons. I would think in most cases it would be more 
> > > > > useful not to break on ":". Or is there an overriding reason for 
> > > > > the current behavior?
> > > > If changing the sentence segmentator isn't an option is there a 
> > > > different way to configure the negation detection annotator that 
> > > > would avoid this issue?
> > > >
> > > > Thanks,
> > > > Sean
> > > >
> > > >
> > > >
> > > > Hi,
> > > >
> > > > I am interested in the design decision of the sentence detector.
> > > >
> > > > Why does it split a sentence of the form "WORD1: WORD2 WORD3."
> > > > into two sentences "WORD1:" and "WORD2 WORD3."? Do other 
> > > > components of cTAKES require such a sentence splitting?
> > > >
> > > > It would seem to me that it should remain one sentence. For 
> > > > example, the smoking status detector has its own 
> > > > SentenceAdjuster that merges some of such sentences back into 
> > > > > one, because of this design.
> > > >
> > > > Thanks, Tomasz
> > > >
> > > > > ________________________________________
> > > > > From: Finan, Sean [[email protected]]
> > > > > Sent: Friday, July 10, 2015 3:20 PM
> > > > > To: [email protected]
> > > > > Subject: RE: Allergy Annotator
> > > >
> > > > Hi Tom,
> > > >
> > > > > It is exactly because the sentence detector splits "KEY:" from 
> > > > > "VALUE" that I didn't suggest using sentences. Instead, I would 
> > > > > just iterate over the whole cas collection of medication events 
> > > > > and attempt to match allergy phrases ("allergic to medication") 
> > > > > with text in the note spanning from event.begin-15 to 
> > > > > event.end+15 or whatever window size you prefer.
> > > >
> > > > Sean
> > > >
> > > > > -----Original Message-----
> > > > > From: Tom Devel [mailto:[email protected]]
> > > > > Sent: Friday, July 10, 2015 4:12 PM
> > > > > To: [email protected]
> > > > > Subject: Re: Allergy Annotator
> > > >
> > > > Sean and Dima, these are great suggestions, thanks so far.
> > > >
> > > > Sean, when looping over medication events as you say, I can see 
> > > > how it is possible to take the textspan.Sentence of this 
> > > > MedicationMention, and then do a regex check for the phrase 
> > > > > structure as Dima said.
> > > >
> > > > But instead of textspan.Sentence, you mention "see any is 
> > > > included in a phrase".
> > > > What cTAKES/UIMA class is related to this?
> > > >
> > > > > Because if I would use textspan.Sentence, it would work for "The 
> > > > > patient is allergic to penicillin.", but cTAKES splits 
> > > > > "ALLERGIES: PENICILLIN, WHEAT" into two sentences, so that the 
> > > > > MedicationMentions here would not be in the same sentence as the 
> > > > > word "ALLERGIES".
> > > >
> > > > Thanks again, Tom
> > > >
> > > > On Fri, Jul 10, 2015 at 2:12 PM, Finan, Sean < 
> > > > [email protected]>
> > > > wrote:
> > > >
> > > > Hi Dima, Tom,
> > > >
> > > > I was thinking the same as Dima's first solution. Iterate 
> > > > through the medication events and see any is included in a 
> > > > phrase as mentioned in Tom's original email. Each phrase 
> > > > structure would have to be specified beforehand. However, 
> > > > assigning appropriate CUIs would require having a lookup table 
> > > > for each medication allergy. I think that would be the simplest 
> > > > solution.
> > > >
> > > > Sean
> > > >
> > > > > -----Original Message-----
> > > > > From: Dligach, Dmitriy [mailto:[email protected]]
> > > > > Sent: Friday, July 10, 2015 2:50 PM
> > > > > To: cTAKES Developer list
> > > > > Subject: Re: Allergy Annotator
> > > >
> > > > Hi Tom,
> > > >
> > > > > If the patterns are pretty simple, you could just add a few rules 
> > > > on top of the cTAKES dictionary lookup output. Something of the 
> > > > kind "allergic to <medication>" or "allergies: <medication1>, 
> > > > <medication2>, <substance1>, ...".
> > > >
> > > > If these patterns are hard to express as rules, you should 
> > > > consider a machine learning based sequence labeling route (e.g.
> > > > something similar to the cTAKES chunker).
> > > >
> > > > Dima
> > > >
> > > > -- Dmitriy (Dima) Dligach, Ph.D. Boston Children's Hospital and 
> > > > Harvard Medical School (617) 651-0397
> > > >
> > > > > On Jul 10, 2015, at 13:40, Tom Devel <[email protected]> wrote:
> > > >
> > > > Sean,
> > > >
> > > > It would be a wider net, such that if an allergy is mentioned in 
> > > > the clinical note, this is captured in the corresponding 
> > > > IdentifiedAnnotation (or alternatively, if the 
> > > > IdentifiedAnnotation class should not be changed with a new 
> > > > attribute, in a separate allergy annotation).
> > > >
> > > > This annotator would then have to of course run after the 
> > > > clinical pipeline has run and discovered all IdentifiedAnnotations.
> > > >
> > > > I am familiar with writing UIMA/cTAKES annotators, but not sure 
> > > > how a new ML method could be integrated here for detecting 
> > > > > allergies. Do you have any thoughts about how to approach this 
> > > > > in general?
> > > >
> > > > Thanks, Tom
> > > >
> > > > > On Fri, Jul 10, 2015 at 11:54 AM, Finan, Sean <[email protected]>
> > > > > wrote:
> > > >
> > > > Hi Tom,
> > > >
> > > > Are you interested in catching all allergies or just a few 
> > > > > specific allergies for a study? If you are only concerned with a 
> > > > > few then there is a (possibly) simple solution. If you are 
> > > > > interested in throwing a wider net then I think that a new 
> > > > > module would need to be created; does anybody reading this have 
> > > > > an ML or regex style module?
> > > >
> > > > Sean
> > > >
> > > > > -----Original Message-----
> > > > > From: Tom Devel [mailto:[email protected]]
> > > > > Sent: Friday, July 10, 2015 12:42 PM
> > > > > To: [email protected]
> > > > > Subject: Allergy Annotator
> > > >
> > > > Hi,
> > > >
> > > > I would like to use/extend cTAKES to detect allergies.
> > > >
> > > > In the cTAKES publication (2010)
> > > >
> > > > > http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2995668/
> > > > >
> > > > > there is the mention that: "Allergies to a given medication are 
> > > > > handled by setting the negation attribute of that medication to 
> > > > > 'is negated'."
> > > >
> > > > However, in a post here in 2014 (RE: Allergy Indication) it is 
> > > > said that cTAKES does not have a module for allergy discovery.
> > > >
> > > > 1. What is the current status of allergy detection in cTAKES?
> > > >
> > > > > 2. I did some testing. While cTAKES discovers concepts about 
> > > > > allergies ("wheat allergy" is found as C0949570), using 
> > > > > "ALLERGIES: PENICILLIN, WHEAT" or "The patient is allergic to 
> > > > > penicillin." does not give the penicillin or wheat annotations 
> > > > > an allergy status.
> > > >
> > > > How would I go about detecting these allergy mentions?
> > > >
> > > > Thanks, Tom
> > > >
> > > >
> > >
> >
>
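
The window check suggested in the July 2015 reply above (match allergy phrases in the note text from event.begin-15 to event.end+15) can be sketched in plain Java. This is only an illustration under assumptions, not cTAKES code: Span is a hypothetical stand-in for the begin and end offsets of a medication annotation, the class and method names are made up for the example, and the cue pattern is a small illustrative regex rather than a complete rule set.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch: flags medication spans that have an allergy cue nearby.
class AllergyWindowSketch {

    // Stand-in for the begin/end character offsets of a medication annotation.
    static final class Span {
        final int begin;
        final int end;

        Span(int begin, int end) {
            this.begin = begin;
            this.end = end;
        }
    }

    // Illustrative cue words only; real notes would need more phrases.
    private static final Pattern ALLERGY_CUE =
            Pattern.compile("\\ballerg(?:y|ies|ic)\\b", Pattern.CASE_INSENSITIVE);

    // True if a cue occurs within `window` characters before or after the span.
    static boolean hasAllergyCue(String note, Span med, int window) {
        int from = Math.max(0, med.begin - window);
        int to = Math.min(note.length(), med.end + window);
        return ALLERGY_CUE.matcher(note.substring(from, to)).find();
    }

    // Returns the covered text of every span that has a nearby allergy cue.
    static List<String> flagAllergies(String note, List<Span> meds, int window) {
        List<String> flagged = new ArrayList<>();
        for (Span med : meds) {
            if (hasAllergyCue(note, med, window)) {
                flagged.add(note.substring(med.begin, med.end));
            }
        }
        return flagged;
    }

    public static void main(String[] args) {
        String note = "ALLERGIES: PENICILLIN, WHEAT";
        List<Span> meds = new ArrayList<>();
        int p = note.indexOf("PENICILLIN");
        meds.add(new Span(p, p + "PENICILLIN".length()));
        int w = note.indexOf("WHEAT");
        meds.add(new Span(w, w + "WHEAT".length()));
        // Compare a narrow and a wider window on the list example from the thread.
        System.out.println(flagAllergies(note, meds, 15));
        System.out.println(flagAllergies(note, meds, 30));
    }
}
```

Because the check works on character offsets rather than sentences, it is not affected by the sentence detector splitting "ALLERGIES:" from the list that follows. The window size is a tuning choice: in the list example a 15-character window reaches back to the cue from PENICILLIN but not from WHEAT, so longer lists would likely need a wider window.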
