Re: TermConsumers

Miller, Timothy Fri, 20 Nov 2015 05:45:47 -0800

I believe most of the Xmi/Xcas reader classes are just wrappers for UIMA
utilities; look at XCASDeserializer's static method deserialize:
https://uima.apache.org/d/uimaj-2.6.0/apidocs/


Tim


On 11/19/2015 06:48 PM, Tomasz Oliwa wrote:
> Sean,
>
> I tested this and the Annotator itself works, great. The only change I had to 
> make when writing the Annotator class with the code below was to provide 
> generics in:
>
> static private final Collection<Class<? extends IdentifiedAnnotation>> EVENT_CLASSES
>       = Arrays.<Class<? extends IdentifiedAnnotation>>asList(
>             MedicationMention.class, DiseaseDisorderMention.class,
>             SignSymptomMention.class, LabMention.class, ProcedureMention.class );
>
> At least on a small example XMI CAS I see the behavior is as expected for the 
> IdentifiedAnnotations.
>
> However, for my use case, I have XCAS files, not XMI CAS files. I can use 
> XCasWriterCasConsumer to write the CAS files, but I cannot find any XCAS 
> Collection Reader to initially read them in.
>
> Is such a reader available?
>
> Regards,
> Tomasz
>
>
> ________________________________________
> From: Finan, Sean [[email protected]]
> Sent: Thursday, November 19, 2015 4:03 PM
> To: [email protected]
> Subject: RE: TermConsumers
>
> Hi Tomasz,
>
> I don't know that anybody has done this.  However, you could try running a 
> pipeline with items in ctakes-core:
> XmiCollectionReaderCtakes       to read your existing CAS XMI files from a directory
> -- custom refiner AE below --   to remove unwanted UMLS annotations
> XmiWriterCasConsumerCtakes      to write the new CAS XMI files
>
>
> The refiner AE would basically do what the PrecisionTermConsumer of the fast 
> lookup does, but over a pre-populated CAS.  This is mostly cut and paste from 
> other code with a little bit of editing to get it compiling - I haven't tested 
> it at all!  If you do give it a run-through and it works then let me know and 
> I'll clean it up and check it into sandbox.
>
>
> static private final Collection<Class<? extends IdentifiedAnnotation>> EVENT_CLASSES
>       = Arrays.asList(
>          MedicationMention.class, DiseaseDisorderMention.class,
>          SignSymptomMention.class, LabMention.class, ProcedureMention.class );
>    // Don't forget AnatomicalSiteMention.class and generic EntityMention.class!
>
> static private final Function<Annotation,TextSpan> createTextSpan
>          = annotation -> new DefaultTextSpan( annotation.getBegin(), annotation.getEnd() );
>
> static private final Function<IdentifiedAnnotation,IdentifiedAnnotation> returnSelf
>          = annotation -> annotation;
>
>    @Override
>    public void process( final JCas jcas ) throws AnalysisEngineProcessException {
>       LOGGER.info( "Starting processing" );
>       for ( Class<? extends IdentifiedAnnotation> eventClass : EVENT_CLASSES ) {
>          refineForClass( jcas, eventClass );
>       }
>       final Collection<AnatomicalSiteMention> anatomicals
>             = JCasUtil.select( jcas, AnatomicalSiteMention.class );
>       final Collection<EntityMention> entityMentions
>             = new ArrayList<>( JCasUtil.select( jcas, EntityMention.class ) );
>       entityMentions.removeAll( anatomicals );
>       refineForAnnotations( jcas, anatomicals );
>       refineForAnnotations( jcas, entityMentions );
>       LOGGER.info( "Finished processing" );
>    }
>
>    static private <T extends IdentifiedAnnotation> void refineForClass( final JCas jcas,
>                                                                         final Class<T> eventClass ) {
>       refineForAnnotations( jcas, JCasUtil.select( jcas, eventClass ) );
>    }
>
>    static private <T extends IdentifiedAnnotation> void refineForAnnotations( final JCas jcas,
>                                                                               final Collection<T> annotations ) {
>       final Map<TextSpan,IdentifiedAnnotation> annotationTextSpans
>             = annotations.stream().collect( Collectors.toMap( createTextSpan, returnSelf ) );
>       final Collection<TextSpan> unwantedSpans = getUnwantedSpans( annotationTextSpans.keySet() );
>       unwantedSpans.stream().map( annotationTextSpans::get ).forEach( t -> t.removeFromIndexes( jcas ) );
>    }
>
>    static private Collection<TextSpan> getUnwantedSpans( final Collection<TextSpan> originalTextSpans ) {
>       final List<TextSpan> textSpans = new ArrayList<>( originalTextSpans );
>       final Collection<TextSpan> discardSpans = new HashSet<>();
>       final int count = textSpans.size();
>       for ( int i = 0; i < count; i++ ) {
>          final TextSpan spanKeyI = textSpans.get( i );
>          for ( int j = i + 1; j < count; j++ ) {
>             final TextSpan spanKeyJ = textSpans.get( j );
>             if ( (spanKeyJ.getBegin() <= spanKeyI.getBegin() && spanKeyJ.getEnd() > spanKeyI.getEnd())
>                  || (spanKeyJ.getBegin() < spanKeyI.getBegin() && spanKeyJ.getEnd() >= spanKeyI.getEnd()) ) {
>                // J contains I, discard less precise concepts for span I and move on to next span I
>                discardSpans.add( spanKeyI );
>                break;
>             }
>             if ( ((spanKeyI.getBegin() <= spanKeyJ.getBegin() && spanKeyI.getEnd() > spanKeyJ.getEnd())
>                   || (spanKeyI.getBegin() < spanKeyJ.getBegin() && spanKeyI.getEnd() >= spanKeyJ.getEnd())) ) {
>                // I contains J, discard less precise concepts for span J and move on to next span J
>                discardSpans.add( spanKeyJ );
>             }
>          }
>       }
>       return discardSpans;
>    }
>
>
> Good luck,
> Sean
>
>
> -----Original Message-----
> From: Tomasz Oliwa [mailto:[email protected]]
> Sent: Thursday, November 19, 2015 12:08 PM
> To: [email protected]
> Subject: TermConsumers
>
> Hi,
>
> How can I run a different TermConsumer on already generated CAS files?
>
> I have CAS files created by the AggregatePlaintextFastUMLSProcessor with the 
> DefaultTermConsumer set in cTakesHsql.xml.
>
> Now I would like to apply the PrecisionTermConsumer on these CAS files 
> without having to do the whole annotation process again. The 
> IdentifiedAnnotations are all there, it is only a matter of removing them 
> according to the TermConsumer's logic.
>
> Is there a way to create a passthrough Processor that simply reads the CAS, 
> applies a different TermConsumer and writes it to disk?
>
> Or is there a different way to go about this?
>
> Thanks for any help,
> Tomasz
>
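
For anyone who wants to try the span-containment rule from the quoted
getUnwantedSpans without a cTAKES pipeline, here is a minimal standalone
sketch. The Span class and the names SpanRefinerSketch, strictlyContains and
unwantedSpans are hypothetical stand-ins introduced only for illustration;
they are not part of cTAKES. The comparison is meant to mirror the quoted
condition: a span is discarded when another span covers it and extends past
it on at least one side.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class SpanRefinerSketch {

   // Hypothetical stand-in for a cTAKES TextSpan: begin/end character offsets only.
   static final class Span {
      final int begin;
      final int end;

      Span( final int begin, final int end ) {
         this.begin = begin;
         this.end = end;
      }

      @Override
      public boolean equals( final Object other ) {
         return other instanceof Span
                && ((Span)other).begin == begin && ((Span)other).end == end;
      }

      @Override
      public int hashCode() {
         return 31 * begin + end;
      }

      @Override
      public String toString() {
         return begin + "-" + end;
      }
   }

   // True when outer covers inner and is longer on at least one side.
   // Identical spans do not contain each other under this rule.
   static boolean strictlyContains( final Span outer, final Span inner ) {
      return outer.begin <= inner.begin && outer.end >= inner.end
             && (outer.begin < inner.begin || outer.end > inner.end);
   }

   // Same pairwise loop as the quoted code: collect every span nested in a longer one.
   static Set<Span> unwantedSpans( final Collection<Span> spans ) {
      final List<Span> list = new ArrayList<>( spans );
      final Set<Span> discard = new HashSet<>();
      for ( int i = 0; i < list.size(); i++ ) {
         final Span spanI = list.get( i );
         for ( int j = i + 1; j < list.size(); j++ ) {
            final Span spanJ = list.get( j );
            if ( strictlyContains( spanJ, spanI ) ) {
               // J contains I: discard I and move on to the next I
               discard.add( spanI );
               break;
            }
            if ( strictlyContains( spanI, spanJ ) ) {
               // I contains J: discard J and keep scanning
               discard.add( spanJ );
            }
         }
      }
      return discard;
   }

   public static void main( final String[] args ) {
      final List<Span> spans = Arrays.asList(
            new Span( 0, 12 ),    // longer mention
            new Span( 6, 12 ),    // nested inside 0-12
            new Span( 20, 25 ),   // unrelated mention
            new Span( 0, 12 ) );  // exact duplicate of the first
      System.out.println( unwantedSpans( spans ) );   // prints [6-12]
   }
}
```

Note that spans with identical offsets are never discarded by this rule. In
the quoted refiner the map is built with the two-argument Collectors.toMap,
which throws an IllegalStateException on duplicate keys, so if DefaultTextSpan
compares by offsets, two annotations of the same class over the same text
would need a merge function or a multimap there.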
