Hi Vighnesh,
1. Does ctakes depend upon exact word match?
By default, yes. The fast clinical pipeline uses
"DefaultJCasTermAnnotator" or some such horribly named class, which matches
dictionary terms word for word. There is also an "OverlapJCasTermAnnotator".
Equally horrible name, slightly different functionality. Given "Blood, urine
test", the Default will identify "blood", "urine" and "urine test". The
Overlap will identify "blood", "urine", "urine test" and "blood test".
Obviously this requires all four terms to be in the dictionary.
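To make the difference concrete, here is a toy sketch in plain Java. It is
not the cTAKES implementation: the class name, the hard-coded dictionary and
the maxSkips parameter are all made up for illustration. With maxSkips = 0
the words of a term must be adjacent, which is roughly the Default behavior;
allowing skipped tokens between the words is roughly what the Overlap
annotator adds.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical toy matcher, only meant to illustrate exact vs. overlap lookup.
class MatchSketch {

   // Toy dictionary (assumption): all four terms must be present for
   // "blood test" to be discoverable at all.
   static final List<String> DICTIONARY
         = Arrays.asList( "blood", "urine", "urine test", "blood test" );

   // Returns the dictionary terms found in the tokens, allowing at most
   // maxSkips non-matching tokens between consecutive words of a term.
   static Set<String> findTerms( List<String> tokens, int maxSkips ) {
      Set<String> found = new LinkedHashSet<>();
      for ( String term : DICTIONARY ) {
         String[] words = term.split( " " );
         for ( int start = 0; start < tokens.size(); start++ ) {
            if ( matchesFrom( tokens, start, words, maxSkips ) ) {
               found.add( term );
               break;
            }
         }
      }
      return found;
   }

   // True if the term's words occur in order beginning at index start,
   // with no more than maxSkips tokens skipped before each following word.
   static boolean matchesFrom( List<String> tokens, int start,
                               String[] words, int maxSkips ) {
      if ( !tokens.get( start ).equals( words[ 0 ] ) ) {
         return false;
      }
      int position = start;
      for ( int w = 1; w < words.length; w++ ) {
         int skips = 0;
         position++;
         while ( position < tokens.size()
                 && !tokens.get( position ).equals( words[ w ] ) ) {
            position++;
            skips++;
            if ( skips > maxSkips ) {
               return false;
            }
         }
         if ( position >= tokens.size() ) {
            return false;
         }
      }
      return true;
   }

   public static void main( String[] args ) {
      // "Blood, urine test", lower-cased and tokenized by hand.
      List<String> tokens = Arrays.asList( "blood", ",", "urine", "test" );
      System.out.println( findTerms( tokens, 0 ) ); // [blood, urine, urine test]
      System.out.println( findTerms( tokens, 2 ) ); // [blood, urine, urine test, blood test]
   }
}
```

Either way, a word like "preterm" on its own is found only if "preterm" itself
is a dictionary entry.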
2. How to get all nouns in a document not covered by an IdentifiedAnnotation?
JCasUtil.select( jcas, BaseToken.class ).stream()
   .filter( b -> "NN".equals( b.getPartOfSpeech() ) )  // null-safe tag check
   .map( Annotation::getCoveredText )
   .forEach( System.out::println );
Something like that should work. Filtering by discovered IdentifiedAnnotations
is another step. Something like:
// Spans of everything the dictionary lookup (or anything else) identified.
Collection<TextSpan> identifiedSpans
   = JCasUtil.select( jcas, IdentifiedAnnotation.class ).stream()
      .<TextSpan>map( a -> new DefaultTextSpan( a, 0 ) )
      .collect( Collectors.toList() );
// True if the token overlaps any identified span.
Predicate<BaseToken> overlapped = bt -> {
   TextSpan ts = new DefaultTextSpan( bt, 0 );
   return identifiedSpans.stream().anyMatch( s -> s.overlaps( ts ) );
};
Then add .filter( overlapped.negate() ) before the original .map(
Annotation::getCoveredText ). I am not debugging this email, so you may need
to check my stream methods.
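If you want to check the filtering logic without a running pipeline, here is
a self-contained sketch in plain Java. The Token class and the int[] spans
are hypothetical stand-ins for BaseToken and IdentifiedAnnotation offsets,
not cTAKES types. It also assumes Penn Treebank tags and uses
startsWith( "NN" ), so plural and proper nouns (NNS, NNP, NNPS) are kept as
well as NN.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hypothetical stand-alone sketch; none of these types come from cTAKES.
class UncoveredNounsSketch {

   // Stand-in for a token annotation: character offsets, text and POS tag.
   static final class Token {
      final int begin;
      final int end;
      final String text;
      final String pos;

      Token( int begin, int end, String text, String pos ) {
         this.begin = begin;
         this.end = end;
         this.text = text;
         this.pos = pos;
      }
   }

   // Two half-open spans [begin, end) overlap if each starts before the other ends.
   static boolean overlaps( int begin1, int end1, int begin2, int end2 ) {
      return begin1 < end2 && begin2 < end1;
   }

   // Returns the text of every noun token that no identified span overlaps.
   static List<String> uncoveredNouns( List<Token> tokens,
                                       List<int[]> identifiedSpans ) {
      Predicate<Token> isNoun = t -> t.pos != null && t.pos.startsWith( "NN" );
      Predicate<Token> overlapped = t -> identifiedSpans.stream()
            .anyMatch( s -> overlaps( t.begin, t.end, s[ 0 ], s[ 1 ] ) );
      return tokens.stream()
            .filter( isNoun )
            .filter( overlapped.negate() )
            .map( t -> t.text )
            .collect( Collectors.toList() );
   }

   public static void main( String[] args ) {
      // "patient has fever", where only "fever" was identified.
      List<Token> tokens = Arrays.asList(
            new Token( 0, 7, "patient", "NN" ),
            new Token( 8, 11, "has", "VBZ" ),
            new Token( 12, 17, "fever", "NN" ) );
      List<int[]> identified = Arrays.asList( new int[]{ 12, 17 } );
      System.out.println( uncoveredNouns( tokens, identified ) ); // [patient]
   }
}
```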
Sean
-----Original Message-----
From: Sparsh K [mailto:[email protected]]
Sent: Thursday, January 12, 2017 7:31 AM
To: [email protected]; [email protected]
Subject: Question on ctakes
Hi
I am new to ctakes and have a few questions. Please guide me with your inputs.
1. When a clinical note is input to ctakes, it processes the text in
multiple stages.
Let us take an example of a clinical note: SINGLE/PRETERM (35 WEEKS 5
DAYS)/MALE/AGA.
Here the word "preterm" is not in the dictionary, though "preterm infant",
"premature baby" etc. are. So ctakes does not identify that word as coveredText.
My question is: does ctakes processing depend mainly on exact word match with
the dictionary? If so, and I give it a one-page clinical note explaining a
disease that does not contain words exactly matching the dictionary, then
ctakes will not identify those words. Is that true?
2. Ctakes does POS tagging and performs named entity recognition on the noun
terms. How can I pull out a list of the nouns that were not matched to a named
disorder code at the named entity recognition level?
Regards
Vighnesh