RE: TermConsumers

Finan, Sean Thu, 19 Nov 2015 16:54:07 -0800

Holy cattle, it worked ?!?

I don't know of a specific xcas reader offhand ... have you tried running with 
the xmi reader?  Some of the readers lying around will handle both.


-----Original Message-----
From: Tomasz Oliwa [mailto:ol...@uchicago.edu] 
Sent: Thursday, November 19, 2015 6:48 PM
To: dev@ctakes.apache.org
Subject: RE: TermConsumers

Sean,

I tested this and the Annotator itself works, great. The only change I had to 
make when writing the Annotator class from the code below was to provide 
explicit generic type arguments in:

static private final Collection<Class<? extends IdentifiedAnnotation>> 
EVENT_CLASSES = Arrays.<Class<? extends IdentifiedAnnotation>>asList(
            MedicationMention.class, DiseaseDisorderMention.class,
            SignSymptomMention.class, LabMention.class, ProcedureMention.class 
);
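
For illustration, here is a stand-alone sketch of the same explicit type 
argument that compiles without cTAKES on the classpath. The class name 
TypeWitnessDemo and the use of Number subclasses as stand-ins for the mention 
classes are assumptions made only for this example. Depending on the compiler 
and language level, inference without the explicit argument may settle on a 
narrower element type than the declared one, which is why spelling it out helps.

```java
import java.util.Arrays;
import java.util.Collection;

public class TypeWitnessDemo {

    // Number subclasses stand in for MedicationMention.class etc. (assumption).
    // The explicit <Class<? extends Number>> fixes the element type of the list
    // instead of leaving it to inference.
    static final Collection<Class<? extends Number>> NUMBER_CLASSES
            = Arrays.<Class<? extends Number>>asList(
                    Integer.class, Double.class, Long.class );

    public static void main( String[] args ) {
        for ( Class<? extends Number> numberClass : NUMBER_CLASSES ) {
            System.out.println( numberClass.getSimpleName() );
        }
    }
}
```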

At least on a small example XMI CAS I see the behavior is as expected for the 
IdentifiedAnnotations.
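
One case a small example may not exercise: refineForAnnotations in the quoted 
message below collects with the two-argument Collectors.toMap, which throws an 
IllegalStateException when two elements map to the same key. Whether a given 
CAS ever holds two annotations of one type with identical offsets is not 
something this thread establishes, so the following is only a sketch of the 
hazard and one possible guard. Mention, spanKey, bySpanStrict and bySpanGrouped 
are hypothetical names used for illustration; they are not cTAKES API.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

public class DuplicateSpanDemo {

    // Hypothetical stand-in for an annotation: a label plus begin/end offsets.
    static final class Mention {
        final String label;
        final int begin;
        final int end;

        Mention( String label, int begin, int end ) {
            this.label = label;
            this.begin = begin;
            this.end = end;
        }

        String spanKey() {
            return begin + "-" + end;
        }
    }

    // Same shape as the quoted code: fails if two mentions share a span.
    static Map<String, Mention> bySpanStrict( List<Mention> mentions ) {
        return mentions.stream()
                .collect( Collectors.toMap( Mention::spanKey, Function.identity() ) );
    }

    // Possible guard: keep every mention for a span so all can be removed later.
    static Map<String, List<Mention>> bySpanGrouped( List<Mention> mentions ) {
        return mentions.stream()
                .collect( Collectors.groupingBy( Mention::spanKey ) );
    }

    public static void main( String[] args ) {
        List<Mention> mentions = Arrays.asList(
                new Mention( "colon", 0, 5 ),
                new Mention( "colon (second concept)", 0, 5 ),
                new Mention( "cancer", 6, 12 ) );
        try {
            bySpanStrict( mentions );
        } catch ( IllegalStateException e ) {
            System.out.println( "toMap rejected a duplicate span" );
        }
        System.out.println( "Mentions at 0-5: " + bySpanGrouped( mentions ).get( "0-5" ).size() );
    }
}
```

Grouping rather than supplying a keep-first merge function means no annotation 
on an unwanted span is silently left behind in the CAS.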

However, for my use case, I have XCAS files, not XMI CAS files. I can use 
XCasWriterCasConsumer to write the CAS files, but I cannot find any XCAS 
Collection Reader to initially read them in. 

Is such a reader available?

Regards,
Tomasz


________________________________________
From: Finan, Sean [sean.fi...@childrens.harvard.edu]
Sent: Thursday, November 19, 2015 4:03 PM
To: dev@ctakes.apache.org
Subject: RE: TermConsumers

Hi Tomasz,

I don't know that anybody has done this.  However, you could try running a 
pipeline with items in ctakes-core:
XmiCollectionReaderCtakes       to read your existing cas xmi files in a directory
-- custom refiner AE below --   to remove unwanted umls annotations
XmiWriterCasConsumerCtakes      to write the new cas xmi files


The refiner AE would basically do what the PrecisionTermConsumer of the fast 
lookup does, but over a pre-populated cas.  This is mostly cut and paste from 
other code with a little bit of lookompiling  - I haven't tested it at all!  If 
you do give it a run-through and it works, then let me know and I'll clean it up 
and check it into sandbox.


static private final Collection<Class<? extends IdentifiedAnnotation>> 
EVENT_CLASSES = Arrays.asList(
         MedicationMention.class, DiseaseDisorderMention.class,
         SignSymptomMention.class, LabMention.class, ProcedureMention.class );
   // Don't forget AnatomicalSiteMention.class and generic EntityMention.class!

static private final Function<Annotation,TextSpan> createTextSpan
         = annotation -> new DefaultTextSpan( annotation.getBegin(), annotation.getEnd() );

static private final Function<IdentifiedAnnotation,IdentifiedAnnotation> returnSelf
         = annotation -> annotation;

   @Override
   public void process( final JCas jcas ) throws AnalysisEngineProcessException {
      LOGGER.info( "Starting processing" );
      for ( Class<? extends IdentifiedAnnotation> eventClass : EVENT_CLASSES ) {
         refineForClass( jcas, eventClass );
      }
      final Collection<AnatomicalSiteMention> anatomicals
            = JCasUtil.select( jcas, AnatomicalSiteMention.class );
      final Collection<EntityMention> entityMentions
            = new ArrayList<>( JCasUtil.select( jcas, EntityMention.class ) );
      entityMentions.removeAll( anatomicals );
      refineForAnnotations( jcas, anatomicals );
      refineForAnnotations( jcas, entityMentions );
      LOGGER.info( "Finished processing" );
   }

   static private <T extends IdentifiedAnnotation> void refineForClass( final JCas jcas,
                                                                        final Class<T> eventClass ) {
      refineForAnnotations( jcas, JCasUtil.select( jcas, eventClass ) );
   }

   static private <T extends IdentifiedAnnotation> void refineForAnnotations( final JCas jcas,
                                                                              final Collection<T> annotations ) {
      final Map<TextSpan,IdentifiedAnnotation> annotationTextSpans
            = annotations.stream().collect( Collectors.toMap( createTextSpan, returnSelf ) );
      final Collection<TextSpan> unwantedSpans = getUnwantedSpans( annotationTextSpans.keySet() );
      unwantedSpans.stream().map( annotationTextSpans::get ).forEach( t -> t.removeFromIndexes( jcas ) );
   }

   static private Collection<TextSpan> getUnwantedSpans( final Collection<TextSpan> originalTextSpans ) {
      final List<TextSpan> textSpans = new ArrayList<>( originalTextSpans );
      final Collection<TextSpan> discardSpans = new HashSet<>();
      final int count = textSpans.size();
      for ( int i = 0; i < count; i++ ) {
         final TextSpan spanKeyI = textSpans.get( i );
         for ( int j = i + 1; j < count; j++ ) {
            final TextSpan spanKeyJ = textSpans.get( j );
            if ( (spanKeyJ.getBegin() <= spanKeyI.getBegin() && spanKeyJ.getEnd() > spanKeyI.getEnd())
                 || (spanKeyJ.getBegin() < spanKeyI.getBegin() && spanKeyJ.getEnd() >= spanKeyI.getEnd()) ) {
               // J contains I, discard less precise concepts for span I and move on to next span I
               discardSpans.add( spanKeyI );
               break;
            }
            if ( ((spanKeyI.getBegin() <= spanKeyJ.getBegin() && spanKeyI.getEnd() > spanKeyJ.getEnd())
                  || (spanKeyI.getBegin() < spanKeyJ.getBegin() && spanKeyI.getEnd() >= spanKeyJ.getEnd())) ) {
               // I contains J, discard less precise concepts for span J and move on to next span J
               discardSpans.add( spanKeyJ );
            }
         }
      }
      return discardSpans;
   }
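
To see what the containment test in getUnwantedSpans does without a CAS, here 
is a self-contained sketch of the same double loop. Span is a hypothetical 
stand-in for the cTAKES TextSpan that holds only begin and end offsets, and the 
class name PrecisionSpanDemo and the sample offsets are likewise made up for 
illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class PrecisionSpanDemo {

    // Hypothetical stand-in for a text span: begin and end offsets only.
    static final class Span {
        final int begin;
        final int end;

        Span( int begin, int end ) {
            this.begin = begin;
            this.end = end;
        }

        // True when this span covers the other and is strictly longer,
        // the same condition as the two if-tests in the listing above.
        boolean contains( Span other ) {
            return (begin <= other.begin && end > other.end)
                   || (begin < other.begin && end >= other.end);
        }

        @Override
        public boolean equals( Object o ) {
            return o instanceof Span && ((Span) o).begin == begin && ((Span) o).end == end;
        }

        @Override
        public int hashCode() {
            return 31 * begin + end;
        }

        @Override
        public String toString() {
            return begin + "-" + end;
        }
    }

    // Collect every span that is contained in some other, longer span.
    static Set<Span> unwantedSpans( Collection<Span> original ) {
        List<Span> spans = new ArrayList<>( original );
        Set<Span> discard = new HashSet<>();
        for ( int i = 0; i < spans.size(); i++ ) {
            Span spanI = spans.get( i );
            for ( int j = i + 1; j < spans.size(); j++ ) {
                Span spanJ = spans.get( j );
                if ( spanJ.contains( spanI ) ) {
                    discard.add( spanI );
                    break;
                }
                if ( spanI.contains( spanJ ) ) {
                    discard.add( spanJ );
                }
            }
        }
        return discard;
    }

    public static void main( String[] args ) {
        // Offsets as for "colon cancer" (0-12), "colon" (0-5), "cancer" (6-12)
        // and an unrelated "pain" (20-24).
        List<Span> spans = Arrays.asList(
                new Span( 0, 12 ), new Span( 0, 5 ), new Span( 6, 12 ), new Span( 20, 24 ) );
        System.out.println( "Discarded: " + unwantedSpans( spans ) );
    }
}
```

With these inputs the two shorter spans nested inside 0-12 are discarded, while 
0-12 and 20-24 are kept. Spans that overlap without one containing the other 
both survive, and two identical spans do not discard each other.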


Good luck,
Sean


-----Original Message-----
From: Tomasz Oliwa [mailto:ol...@uchicago.edu]
Sent: Thursday, November 19, 2015 12:08 PM
To: dev@ctakes.apache.org
Subject: TermConsumers

Hi,

How can I run a different TermConsumer on already generated CAS files?

I have CAS files created by the AggregatePlaintextFastUMLSProcessor with the 
DefaultTermConsumer set in cTakesHsql.xml.

Now I would like to apply the PrecisionTermConsumer on these CAS files without 
having to do the whole annotation process again. The IdentifiedAnnotations are 
all there; it is only a matter of removing them according to the TermConsumer's 
logic.

Is there a way to create a passthrough Processor that simply reads the CAS, 
applies a different TermConsumer and writes it to disk?

Or is there a different way to go about this?

Thanks for any help,
Tomasz
