Hi Tomasz,
I don't know that anybody has done this. However, you could try running a
pipeline built from items in ctakes-core:
  1. XmiCollectionReaderCtakes, to read your existing CAS XMI files from a
     directory
  2. a custom refiner AE (below), to remove the unwanted UMLS annotations
  3. XmiWriterCasConsumerCtakes, to write the new CAS XMI files
The refiner AE would basically do what the PrecisionTermConsumer of the fast
lookup does, but over a pre-populated CAS. This is mostly cut and paste from
other code, compiled only by eye. I haven't tested it at all! If you do give
it a run-through and it works, let me know and I'll clean it up and check it
into sandbox.
static private final Collection<Class<? extends IdentifiedAnnotation>> EVENT_CLASSES
      = Arrays.asList( MedicationMention.class, DiseaseDisorderMention.class,
                       SignSymptomMention.class, LabMention.class, ProcedureMention.class );
// Don't forget AnatomicalSiteMention.class and generic EntityMention.class!

static private final Function<Annotation,TextSpan> createTextSpan
      = annotation -> new DefaultTextSpan( annotation.getBegin(), annotation.getEnd() );

static private final Function<IdentifiedAnnotation,IdentifiedAnnotation> returnSelf
      = annotation -> annotation;

@Override
public void process( final JCas jcas ) throws AnalysisEngineProcessException {
   LOGGER.info( "Starting processing" );
   for ( Class<? extends IdentifiedAnnotation> eventClass : EVENT_CLASSES ) {
      refineForClass( jcas, eventClass );
   }
   // Anatomical sites are refined separately from the remaining entity mentions
   final Collection<AnatomicalSiteMention> anatomicals
         = JCasUtil.select( jcas, AnatomicalSiteMention.class );
   final Collection<EntityMention> entityMentions
         = new ArrayList<>( JCasUtil.select( jcas, EntityMention.class ) );
   entityMentions.removeAll( anatomicals );
   refineForAnnotations( jcas, anatomicals );
   refineForAnnotations( jcas, entityMentions );
   LOGGER.info( "Finished processing" );
}

static private <T extends IdentifiedAnnotation> void refineForClass( final JCas jcas,
                                                                     final Class<T> eventClass ) {
   refineForAnnotations( jcas, JCasUtil.select( jcas, eventClass ) );
}

static private <T extends IdentifiedAnnotation> void refineForAnnotations( final JCas jcas,
                                                                           final Collection<T> annotations ) {
   // Map each span to its annotation.
   // Note: Collectors.toMap throws IllegalStateException if two annotations share a span.
   final Map<TextSpan,IdentifiedAnnotation> annotationTextSpans
         = annotations.stream().collect( Collectors.toMap( createTextSpan, returnSelf ) );
   final Collection<TextSpan> unwantedSpans = getUnwantedSpans( annotationTextSpans.keySet() );
   // Remove the annotation of every span that is contained in a longer span
   unwantedSpans.stream()
                .map( annotationTextSpans::get )
                .forEach( t -> t.removeFromIndexes( jcas ) );
}

static private Collection<TextSpan> getUnwantedSpans( final Collection<TextSpan> originalTextSpans ) {
   final List<TextSpan> textSpans = new ArrayList<>( originalTextSpans );
   final Collection<TextSpan> discardSpans = new HashSet<>();
   final int count = textSpans.size();
   for ( int i = 0; i < count; i++ ) {
      final TextSpan spanKeyI = textSpans.get( i );
      for ( int j = i + 1; j < count; j++ ) {
         final TextSpan spanKeyJ = textSpans.get( j );
         if ( (spanKeyJ.getBegin() <= spanKeyI.getBegin() && spanKeyJ.getEnd() > spanKeyI.getEnd())
              || (spanKeyJ.getBegin() < spanKeyI.getBegin() && spanKeyJ.getEnd() >= spanKeyI.getEnd()) ) {
            // J contains I, discard less precise concepts for span I and move on to next span I
            discardSpans.add( spanKeyI );
            break;
         }
         if ( (spanKeyI.getBegin() <= spanKeyJ.getBegin() && spanKeyI.getEnd() > spanKeyJ.getEnd())
              || (spanKeyI.getBegin() < spanKeyJ.getBegin() && spanKeyI.getEnd() >= spanKeyJ.getEnd()) ) {
            // I contains J, discard less precise concepts for span J and move on to next span J
            discardSpans.add( spanKeyJ );
         }
      }
   }
   return discardSpans;
}
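
Since none of the above has been run, it may help to check the containment
logic on its own before wiring it into a pipeline. Below is a minimal
standalone sketch of the same pairwise scan as getUnwantedSpans. It is plain
Java with no cTAKES or UIMA dependency: the Span record and the
SpanRefinerSketch class are made up for illustration and are not cTAKES
classes, and the offsets are invented. It needs Java 16 or later for the
record.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class SpanRefinerSketch {

   // Hypothetical stand-in for a text span; NOT a cTAKES class.
   record Span( int begin, int end ) {}

   // True if outer covers inner and extends past it on at least one side.
   static boolean contains( final Span outer, final Span inner ) {
      return (outer.begin() <= inner.begin() && outer.end() > inner.end())
             || (outer.begin() < inner.begin() && outer.end() >= inner.end());
   }

   // Same pairwise scan as getUnwantedSpans: collect every span that lies
   // inside a longer span.
   static Set<Span> getUnwantedSpans( final Collection<Span> original ) {
      final List<Span> spans = new ArrayList<>( original );
      final Set<Span> discard = new HashSet<>();
      for ( int i = 0; i < spans.size(); i++ ) {
         final Span spanI = spans.get( i );
         for ( int j = i + 1; j < spans.size(); j++ ) {
            final Span spanJ = spans.get( j );
            if ( contains( spanJ, spanI ) ) {
               // J contains I: discard I and move on to the next I
               discard.add( spanI );
               break;
            }
            if ( contains( spanI, spanJ ) ) {
               // I contains J: discard J and keep scanning
               discard.add( spanJ );
            }
         }
      }
      return discard;
   }

   public static void main( final String[] args ) {
      // Invented offsets for the text "chest pain" plus one unrelated span
      final List<Span> spans = List.of(
            new Span( 0, 5 ),     // "chest", inside (0,10)
            new Span( 0, 10 ),    // "chest pain", the longest span
            new Span( 6, 10 ),    // "pain", inside (0,10)
            new Span( 20, 25 ) ); // overlaps nothing, so it is kept
      System.out.println( "Discarded: " + getUnwantedSpans( spans ) );
   }
}
```

With these inputs the two shorter spans inside (0,10) end up in the discard
set, while (0,10) and (20,25) are kept. Two spans with identical offsets are
never treated as containing each other, so neither is discarded.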
Good luck,
Sean
-----Original Message-----
From: Tomasz Oliwa [mailto:[email protected]]
Sent: Thursday, November 19, 2015 12:08 PM
To: [email protected]
Subject: TermConsumers
Hi,
How can I run a different TermConsumer on already generated CAS files?
I have CAS files created by the AggregatePlaintextFastUMLSProcessor with the
DefaultTermConsumer set in cTakesHsql.xml.
Now I would like to apply the PrecisionTermConsumer to these CAS files without
having to do the whole annotation process again. The IdentifiedAnnotations are
all there; it is only a matter of removing them according to the TermConsumer's
logic.
Is there a way to create a passthrough Processor that simply reads the CAS,
applies a different TermConsumer and writes it to disk?
Or is there a different way to go about this?
Thanks for any help,
Tomasz