[jira] [Commented] (CTAKES-316) "I do not see any" returns different ContextAnnotations in drugner pipeline

Kim Ebert (JIRA) Tue, 07 Oct 2014 15:11:47 -0700

    [ 
https://issues.apache.org/jira/browse/CTAKES-316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14162657#comment-14162657
 ]


Kim Ebert commented on CTAKES-316:
----------------------------------

Sean said the following:


If I understand the code correctly (it could use some doc), it runs the negation 
engines and then, if any negation exists, creates a single hit signifying 
negation, like a heavyweight Boolean.  Unfortunately, as you know, the 
collection "s" is a Set, so its iteration order is not guaranteed and the hit 
is built from whichever token happens to come out first.

An isolated change here would probably be better than going through the entire 
code base and switching to LinkedHashMaps, Lists, etc. - plus it would fix your 
problem.

You could (for reuse by others, assuming that one doesn't already exist) create 
a singleton BaseTokenComparator implements Comparator<BaseToken>  with 
something like:
   public int compare( final BaseToken textSpan1, final BaseToken textSpan2 ) {
      if ( textSpan1.getStartOffset() != textSpan2.getStartOffset() ) {
         return textSpan1.getStartOffset() - textSpan2.getStartOffset();
      }
      return textSpan1.getEndOffset() - textSpan2.getEndOffset();
   }

And in NegationContextAnalyzer line ~48:
final List<NegationIndicator> negatorsList = new ArrayList<NegationIndicator>( 
      _negIndicatorFSM.execute( fsmTokenList ) );
if ( !negatorsList.isEmpty() ) {
   // Sort so the earliest negator is always the one reported.
   Collections.sort( negatorsList, BaseTokenComparator.getInstance() );
   return new ContextHit( negatorsList.get( 0 ).getStartOffset(), 
         negatorsList.get( 0 ).getEndOffset() );
}

Or you could write a (faster) method to use in place of the List and Sort like:
// The wildcard lets a Set<NegationIndicator> be passed in directly.
BaseToken getFirstTextSpan( final Iterable<? extends BaseToken> tokens ) {
   BaseToken firstToken = null;
   for ( BaseToken token : tokens ) {
      if ( firstToken == null 
            || token.getStartOffset() < firstToken.getStartOffset() ) {
         firstToken = token;
         continue;
      }
      // Same start offset: prefer the shorter span.
      if ( token.getStartOffset() == firstToken.getStartOffset() 
            && token.getEndOffset() < firstToken.getEndOffset() ) {
         firstToken = token;
      }
   }
   return firstToken;
}
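
Another option that avoids both the copy and the hand-written loop is 
java.util.Collections.min with the same kind of comparator.  The sketch below 
is self-contained so it can be run outside cTAKES: FirstSpanDemo, Span and 
SpanComparator are hypothetical stand-ins (Span for the FSM token type, 
SpanComparator for the suggested BaseTokenComparator), and the offsets are the 
three begin/end pairs from the report.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashSet;
import java.util.Set;

final class FirstSpanDemo {

    // Hypothetical stand-in for a cTAKES FSM token; only the offsets matter.
    static final class Span {
        private final int start;
        private final int end;

        Span(final int start, final int end) {
            this.start = start;
            this.end = end;
        }

        int getStartOffset() { return start; }
        int getEndOffset() { return end; }
    }

    // Orders spans by start offset, then by end offset.
    static final class SpanComparator implements Comparator<Span> {
        private static final SpanComparator INSTANCE = new SpanComparator();

        private SpanComparator() {}

        static SpanComparator getInstance() { return INSTANCE; }

        @Override
        public int compare(final Span a, final Span b) {
            if (a.getStartOffset() != b.getStartOffset()) {
                return Integer.compare(a.getStartOffset(), b.getStartOffset());
            }
            return Integer.compare(a.getEndOffset(), b.getEndOffset());
        }
    }

    // Returns the earliest span, or null when there is none.
    static Span firstSpan(final Set<Span> spans) {
        if (spans.isEmpty()) {
            return null;
        }
        return Collections.min(spans, SpanComparator.getInstance());
    }

    public static void main(final String[] args) {
        // HashSet iteration order is unspecified; min() does not depend on it.
        final Set<Span> spans = new HashSet<Span>(Arrays.asList(
                new Span(13, 16), new Span(5, 16), new Span(5, 8)));
        final Span first = firstSpan(spans);
        // prints 5-8
        System.out.println(first.getStartOffset() + "-" + first.getEndOffset());
    }
}
```

Whatever order the Set hands its elements back in, the result is the same 
span, which is the property the original s.iterator().next() call lacks.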
                

Of course, a perfectly reasonable question to pose to the community is 
something like "Is the best stored negation context the first or largest or 
???"  Perhaps the first negator span isn't the most wanted for later use - 
perhaps it is the most-encompassing span so that multiple words can be reused.  
You could throw that out under a new thread title and perhaps the original 
authors or current users would speak up as to what might be best.  Personally I 
have no idea.
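
If the most-encompassing span turned out to be the preferred answer, only the 
selection rule would change.  A minimal sketch of that variant, using a 
hypothetical Span class rather than the real cTAKES types: it picks the longest 
span and breaks ties in favor of the earlier start offset.  The tie-break is an 
assumption made for illustration, not something the existing code defines.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.Comparator;

final class WidestSpanDemo {

    // Hypothetical stand-in for a cTAKES FSM token; only the offsets matter.
    static final class Span {
        private final int start;
        private final int end;

        Span(final int start, final int end) {
            this.start = start;
            this.end = end;
        }

        int getStartOffset() { return start; }
        int getEndOffset() { return end; }
        int length() { return end - start; }
    }

    // Ranks longer spans higher; for equal lengths the earlier start ranks higher.
    static final Comparator<Span> BY_COVERAGE = new Comparator<Span>() {
        @Override
        public int compare(final Span a, final Span b) {
            if (a.length() != b.length()) {
                return Integer.compare(a.length(), b.length());
            }
            return Integer.compare(b.getStartOffset(), a.getStartOffset());
        }
    };

    // Returns the most-encompassing span, or null when there is none.
    static Span widestSpan(final Collection<Span> spans) {
        if (spans.isEmpty()) {
            return null;
        }
        return Collections.max(spans, BY_COVERAGE);
    }

    public static void main(final String[] args) {
        // The three begin/end pairs seen in the report.
        final Span widest = widestSpan(Arrays.asList(
                new Span(13, 16), new Span(5, 16), new Span(5, 8)));
        // prints 5-16
        System.out.println(widest.getStartOffset() + "-" + widest.getEndOffset());
    }
}
```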

Anyway, great catch!

Sean


> "I do not see any" returns different ContextAnnotations in drugner pipeline
> ---------------------------------------------------------------------------
>
>                 Key: CTAKES-316
>                 URL: https://issues.apache.org/jira/browse/CTAKES-316
>             Project: cTAKES
>          Issue Type: Bug
>          Components: ctakes-drug-ner
>    Affects Versions: 3.2.0
>            Reporter: Kim Ebert
>
> "I do not see any"
> Can result in the following ContextAnnotations:
> <org.apache.ctakes.typesystem.type.textsem.ContextAnnotation _indexed="1" 
> _id="130" _ref_sofa="1" begin="13" end="16" id="0" typeID="0" 
> discoveryTechnique="0" confidence="0.0" polarity="0" uncertainty="0" 
> conditional="false" generic="false" historyOf="0" FocusText="I" 
> Scope="RIGHT"/>
> or
> <org.apache.ctakes.typesystem.type.textsem.ContextAnnotation _indexed="1" 
> _id="130" _ref_sofa="1" begin="5" end="16" id="0" typeID="0" 
> discoveryTechnique="0" confidence="0.0" polarity="0" uncertainty="0" 
> conditional="false" generic="false" historyOf="0" FocusText="I" 
> Scope="RIGHT"/>
> or
> <org.apache.ctakes.typesystem.type.textsem.ContextAnnotation _indexed="1" 
> _id="130" _ref_sofa="1" begin="5" end="8" id="0" typeID="0" 
> discoveryTechnique="0" confidence="0.0" polarity="0" uncertainty="0" 
> conditional="false" generic="false" historyOf="0" FocusText="I" 
> Scope="RIGHT"/>
> Well, after doing some digging it turns out that 
> org.apache.ctakes.necontexts.negation.NegationContextAnalyzer is to blame.
> The code looks like the following:
> public ContextHit analyzeContext(List<? extends Annotation> contextTokens, 
>         int scopeOrientation)
>         throws AnalysisEngineProcessException {
>     List<TextToken> fsmTokenList = wrapAsFsmTokens(contextTokens);
>     try {
>         Set<NegationIndicator> s = _negIndicatorFSM.execute(fsmTokenList);
>         if (s.size() > 0) {
>             NegationIndicator neg = s.iterator().next();
>             return new ContextHit(neg.getStartOffset(), neg.getEndOffset());
>         } else {
>             return null;
>         }
>     } catch (Exception e) {
>         throw new AnalysisEngineProcessException(e);
>     }
> }



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
