I believe most of the Xmi/Xcas reader classes are just wrappers for UIMA utilities; look at XCASDeserializer's static method deserialize: https://uima.apache.org/d/uimaj-2.6.0/apidocs/
Tim

On 11/19/2015 06:48 PM, Tomasz Oliwa wrote:
> Sean,
>
> I tested this, the Annotator itself works, great. The only change I had to do
> when writing the Annotator class with the code below is to provide generics
> in:
>
> static private final Collection<Class<? extends IdentifiedAnnotation>> EVENT_CLASSES
>       = Arrays.<Class<? extends IdentifiedAnnotation>>asList(
>             MedicationMention.class, DiseaseDisorderMention.class,
>             SignSymptomMention.class, LabMention.class,
>             ProcedureMention.class );
>
> At least on a small example XMI CAS I see the behavior is as expected for the
> IdentifiedAnnotations.
>
> However, for my usecase, I have XCAS files, not XMI CAS files. I can use
> XCasWriterCasConsumer to write the CAS files, but I cannot find any XCAS
> Collection Reader to initially read them in.
>
> Is such a reader available?
>
> Regards,
> Tomasz
>
>
> ________________________________________
> From: Finan, Sean [[email protected]]
> Sent: Thursday, November 19, 2015 4:03 PM
> To: [email protected]
> Subject: RE: TermConsumers
>
> Hi Tomasz,
>
> I don't know that anybody has done this. However, you could try running a
> pipeline with items in ctakes-core:
> XmiCollectionReaderCtakes to read your existing cas xmi files in directory
> -- custom refiner AE below -- to remove unwanted umls annotations
> XmiWriterCasConsumerCtakes to write the new cas xmi files
>
>
> The refiner AE would basically do what the PrecisionTermConsumer of the fast
> lookup does, but over a pre-populated cas. This is mostly cut and paste from
> other code with a little bit of lookompiling - I haven't tested it at all!
> If you do give it a run-through and it works then let me know and I'll clean
> it up and check into sandbox.
>
>
> static private final Collection<Class<? extends IdentifiedAnnotation>> EVENT_CLASSES
>       = Arrays.asList( MedicationMention.class, DiseaseDisorderMention.class,
>                        SignSymptomMention.class, LabMention.class, ProcedureMention.class );
> // Don't forget AnatomicalSiteMention.class and generic EntityMention.class!
>
> static private final Function<Annotation,TextSpan> createTextSpan
>       = annotation -> new DefaultTextSpan( annotation.getBegin(), annotation.getEnd() );
>
> static private final Function<IdentifiedAnnotation,IdentifiedAnnotation> returnSelf
>       = annotation -> annotation;
>
> @Override
> public void process( final JCas jcas ) throws AnalysisEngineProcessException {
>    LOGGER.info( "Starting processing" );
>    for ( Class<? extends IdentifiedAnnotation> eventClass : EVENT_CLASSES ) {
>       refineForClass( jcas, eventClass );
>    }
>    final Collection<AnatomicalSiteMention> anatomicals
>          = JCasUtil.select( jcas, AnatomicalSiteMention.class );
>    final Collection<EntityMention> entityMentions
>          = new ArrayList<>( JCasUtil.select( jcas, EntityMention.class ) );
>    entityMentions.removeAll( anatomicals );
>    refineForAnnotations( jcas, anatomicals );
>    refineForAnnotations( jcas, entityMentions );
>    LOGGER.info( "Finished processing" );
> }
>
> static private <T extends IdentifiedAnnotation> void refineForClass( final JCas jcas,
>                                                                      final Class<T> eventClass ) {
>    refineForAnnotations( jcas, JCasUtil.select( jcas, eventClass ) );
> }
>
> static private <T extends IdentifiedAnnotation> void refineForAnnotations( final JCas jcas,
>                                                                            final Collection<T> annotations ) {
>    final Map<TextSpan,IdentifiedAnnotation> annotationTextSpans
>          = annotations.stream().collect( Collectors.toMap( createTextSpan, returnSelf ) );
>    final Collection<TextSpan> unwantedSpans = getUnwantedSpans( annotationTextSpans.keySet() );
>    unwantedSpans.stream().map( annotationTextSpans::get ).forEach( t -> t.removeFromIndexes( jcas ) );
> }
>
> static private Collection<TextSpan> getUnwantedSpans( final Collection<TextSpan> originalTextSpans ) {
>    final List<TextSpan> textSpans = new ArrayList<>( originalTextSpans );
>    final Collection<TextSpan> discardSpans = new HashSet<>();
>    final int count = textSpans.size();
>    for ( int i = 0; i < count; i++ ) {
>       final TextSpan spanKeyI = textSpans.get( i );
>       for ( int j = i + 1; j < count; j++ ) {
>          final TextSpan spanKeyJ = textSpans.get( j );
>          if ( (spanKeyJ.getBegin() <= spanKeyI.getBegin() && spanKeyJ.getEnd() > spanKeyI.getEnd())
>               || (spanKeyJ.getBegin() < spanKeyI.getBegin() && spanKeyJ.getEnd() >= spanKeyI.getEnd()) ) {
>             // J contains I, discard less precise concepts for span I and move on to next span I
>             discardSpans.add( spanKeyI );
>             break;
>          }
>          if ( ((spanKeyI.getBegin() <= spanKeyJ.getBegin() && spanKeyI.getEnd() > spanKeyJ.getEnd())
>                || (spanKeyI.getBegin() < spanKeyJ.getBegin() && spanKeyI.getEnd() >= spanKeyJ.getEnd())) ) {
>             // I contains J, discard less precise concepts for span J and move on to next span J
>             discardSpans.add( spanKeyJ );
>          }
>       }
>    }
>    return discardSpans;
> }
>
>
> Good luck,
> Sean
>
>
> -----Original Message-----
> From: Tomasz Oliwa [mailto:[email protected]]
> Sent: Thursday, November 19, 2015 12:08 PM
> To: [email protected]
> Subject: TermConsumers
>
> Hi,
>
> How can I run a different TermConsumer on already generated CAS files?
>
> I have CAS files created by the AggregatePlaintextFastUMLSProcessor with the
> DefaultTermConsumer set in cTakesHsql.xml.
>
> Now I would like to apply the PrecisionTermConsumer on these CAS files
> without having to do the whole annotation process again. The
> IdentifiedAnnotations are all there, it is only a matter of removing them
> according to the TermConsumers logic.
>
> Is there a way to create a passthrough Processor that simply reads the CAS,
> applies a different TermConsumer and writes it to disk?
>
> Or is there a different way to go on about this?
>
> Thanks for any help,
> Tomasz
>
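
The containment rule in getUnwantedSpans above (drop any span that lies inside a strictly larger one) can be tried without a CAS or any cTAKES dependency. What follows is only a minimal sketch under stated assumptions: Span, Mention and PrecisionFilterSketch are hypothetical stand-ins invented for illustration, not cTAKES or UIMA classes, and the pairwise loop is a simplified variant that is intended to select the same spans as the quoted code. It groups mentions by span with Collectors.groupingBy because the two-argument Collectors.toMap throws IllegalStateException when two elements produce equal keys, which could happen if several annotations share a span and the span type defines value equality.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical stand-in for a text span: begin/end offsets with value equality.
record Span(int begin, int end) {
    // True when this span covers the other and is longer on at least one side.
    boolean strictlyContains(Span other) {
        return (begin() <= other.begin() && end() > other.end())
            || (begin() < other.begin() && end() >= other.end());
    }
}

// Hypothetical stand-in for an identified annotation: a span plus a label.
record Mention(Span span, String label) {}

final class PrecisionFilterSketch {

    // Collect every span that is strictly contained in another span of the input.
    static Set<Span> unwantedSpans(Collection<Span> spans) {
        Set<Span> discard = new HashSet<>();
        for (Span inner : spans) {
            for (Span outer : spans) {
                if (outer.strictlyContains(inner)) {
                    discard.add(inner);
                    break; // one containing span is enough to discard
                }
            }
        }
        return discard;
    }

    // Keep only mentions whose span is not strictly contained in another mention's span.
    static List<Mention> refine(Collection<Mention> mentions) {
        // groupingBy tolerates several mentions on the same span.
        Map<Span, List<Mention>> bySpan = mentions.stream()
            .collect(Collectors.groupingBy(Mention::span));
        Set<Span> discard = unwantedSpans(bySpan.keySet());
        return mentions.stream()
            .filter(m -> !discard.contains(m.span()))
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Offsets for the text "chest pain"; the labels are made up for the example.
        List<Mention> mentions = List.of(
            new Mention(new Span(0, 5), "chest"),
            new Mention(new Span(6, 10), "pain"),
            new Mention(new Span(0, 10), "chest pain (concept A)"),
            new Mention(new Span(0, 10), "chest pain (concept B)"));
        refine(mentions).forEach(System.out::println);
    }
}
```

In this example the two shorter mentions are filtered out and both mentions covering the full 0-10 span are kept, because spans that are merely equal do not strictly contain each other. In a real annotator the filtered-out annotations would be removed from the CAS indexes rather than dropped from a list.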
