Hi Pei,

I'm not sure that would solve the problem: the change in the ytex branch causes newlines to be ignored (i.e., not treated as tokens). Trunk's sentence splitter splits sentences on newlines, so a newline would never be found inside a sentence there. However, if we had a reproducer, we could check it fairly easily in the ytex branch.
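To make the ytex change concrete, here is a minimal standalone sketch of the filtering idea, not the actual cTAKES code. `Token` and `NewlineTok` are hypothetical stand-ins for `BaseToken` and `NewlineToken`, and `collectBoundaries` is an illustrative name; the real converter iterates a UIMA annotation index rather than a plain list.

```java
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

class NewlineFilterSketch {

    // Hypothetical stand-in for cTAKES BaseToken: just a character span.
    static class Token {
        final int begin;
        final int end;
        Token(int begin, int end) { this.begin = begin; this.end = end; }
    }

    // Hypothetical stand-in for cTAKES NewlineToken.
    static class NewlineTok extends Token {
        NewlineTok(int begin, int end) { super(begin, end); }
    }

    // Collect the sorted begin/end offsets of all tokens, skipping newline tokens.
    static TreeSet<Integer> collectBoundaries(List<Token> tokens) {
        TreeSet<Integer> boundaries = new TreeSet<>();
        for (Token t : tokens) {
            if (!(t instanceof NewlineTok)) {
                boundaries.add(t.begin);
                boundaries.add(t.end);
            }
        }
        return boundaries;
    }

    public static void main(String[] args) {
        // Offsets for the text "pain\n\nresolved": word, newline, newline, word.
        List<Token> tokens = Arrays.asList(
            new Token(0, 4),
            new NewlineTok(4, 5),
            new NewlineTok(5, 6),
            new Token(6, 14));
        // Offset 5 only belongs to newline tokens, so it is left out.
        System.out.println(collectBoundaries(tokens)); // prints [0, 4, 6, 14]
    }
}
```

If a sentence never contains a newline token, as should be the case with trunk's splitter, the filter is a no-op, which is why I doubt it explains the NPEs there.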

Best,
VJ

On Thu, Dec 19, 2013 at 10:15 AM, Chen, Pei <pei.c...@childrens.harvard.edu> wrote:
> Vj,
> Do you think this is what was causing the NPE's [1]?
> If so, shall we make the same fix in trunk?
> --Pei
>
> [1] http://mail-archives.apache.org/mod_mbox/ctakes-dev/201309.mbox/%3C924DE05C19409B438EB81DE683A942D9105A93CB%40CHEXMBX1A.CHBOSTON.ORG%3E
>
> -----Original Message-----
> From: vjapa...@apache.org [mailto:vjapa...@apache.org]
> Sent: Tuesday, December 17, 2013 9:15 PM
> To: comm...@ctakes.apache.org
> Subject: svn commit: r1551805 - /ctakes/branches/ytex/ctakes-assertion/src/main/java/org/apache/ctakes/assertion/medfacts/i2b2/api/CharacterOffsetToLineTokenConverterCtakesImpl.java
>
> Author: vjapache
> Date: Wed Dec 18 02:14:13 2013
> New Revision: 1551805
>
> URL: http://svn.apache.org/r1551805
> Log:
> add support for sentences that contain newline tokens.
>
> Modified:
>     ctakes/branches/ytex/ctakes-assertion/src/main/java/org/apache/ctakes/assertion/medfacts/i2b2/api/CharacterOffsetToLineTokenConverterCtakesImpl.java
>
> Modified: ctakes/branches/ytex/ctakes-assertion/src/main/java/org/apache/ctakes/assertion/medfacts/i2b2/api/CharacterOffsetToLineTokenConverterCtakesImpl.java
> URL: http://svn.apache.org/viewvc/ctakes/branches/ytex/ctakes-assertion/src/main/java/org/apache/ctakes/assertion/medfacts/i2b2/api/CharacterOffsetToLineTokenConverterCtakesImpl.java?rev=1551805&r1=1551804&r2=1551805&view=diff
> ==============================================================================
> --- ctakes/branches/ytex/ctakes-assertion/src/main/java/org/apache/ctakes/assertion/medfacts/i2b2/api/CharacterOffsetToLineTokenConverterCtakesImpl.java (original)
> +++ ctakes/branches/ytex/ctakes-assertion/src/main/java/org/apache/ctakes/assertion/medfacts/i2b2/api/CharacterOffsetToLineTokenConverterCtakesImpl.java Wed Dec 18 02:14:13 2013
> @@ -32,8 +32,8 @@ import org.apache.uima.jcas.tcas.Annotat
>  import org.mitre.medfacts.i2b2.api.ApiConcept;
>  import org.mitre.medfacts.zoner.CharacterOffsetToLineTokenConverter;
>  import org.mitre.medfacts.zoner.LineAndTokenPosition;
> -
>  import org.apache.ctakes.typesystem.type.syntax.BaseToken;
> +import org.apache.ctakes.typesystem.type.syntax.NewlineToken;
>  import org.apache.ctakes.typesystem.type.textspan.Sentence;
>
>  public class CharacterOffsetToLineTokenConverterCtakesImpl implements CharacterOffsetToLineTokenConverter
> @@ -78,11 +78,13 @@ public class CharacterOffsetToLineTokenC
>        for (Annotation current : annotationIndex)
>        {
>          BaseToken bt = (BaseToken)current;
> -        int begin = bt.getBegin();
> -        int end = bt.getEnd();
> -
> -        tokenBeginEndTreeSet.add(begin);
> -        tokenBeginEndTreeSet.add(end);
> +        // filter out NewlineToken
> +        if (!(bt instanceof NewlineToken)) {
> +          int begin = bt.getBegin();
> +          int end = bt.getEnd();
> +          tokenBeginEndTreeSet.add(begin);
> +          tokenBeginEndTreeSet.add(end);
> +        }
>        }
>      }
>