Sorry Paula, it's been a busy few weeks. I'm sure everyone else has been busy as well.
I'm sorry to say that at this point it might be difficult to get the exact fix you want out of the module. I believe it works in two parts:

1) Identify cue words.
2) Classify entities given the identified cue words.

You fixed 1) to recognize your cue word, but if 2) uses a machine learning model, it may sometimes get the wrong outcome, and that can be hard to fix. The model obviously wouldn't have seen any training examples using that keyword, though I would have expected it to get some cases right using other features.

If you've tried a bunch of different examples and it can't seem to get any of them right with the new cue words, here are a few next steps you might consider:

1) Write your own rule-based analysis engine to run after the existing assertion module, using some simple algorithm to link your cue words with nearby entities.
2) Acquire training data and try to re-train the assertion module with your cue word additions. I believe they used the 2010 i2b2 challenge concept assertion dataset, which is available under a data use agreement.

Hope this helps,
Tim

On 01/17/2014 10:46 PM, digital paula wrote:
>
> Hello again cTAKES Community,
>
> I thought that adding the sentence splitter (with newline-sentence-continuation recognition) would be as simple as adding the sectionizer annotator to the Eclipse environment was. I see from VJ's note that it's not that simple; my understanding is that the standard clinical pipeline requires the assertion and dependency parsers. I've explored some of the changes needed, and at least for Assertion it looks like SentenceDetector, SentenceSpan, and likely the SingleDocumentProcessor from the MITRE jar will need to be modified to recognize multi-line sentences, so that the assertion and dependency parsers can be kept in the pipeline.
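Tim's first suggestion, a rule-based engine that links cue words to nearby entities, might look something like the minimal sketch below. It is plain Java rather than a UIMA analysis engine: `CueWordLinker`, `Span`, `negatedEntities` and the `maxGapChars` window are hypothetical names chosen for illustration, not cTAKES APIs, and a real annotator would read Sentence, cue-word and entity annotations from the JCas. The linking rule (cue in the same sentence, before the entity, within a character window) is just one simple choice. Because it requires the cue and the entity to share a sentence, it also reproduces Paula's observation further down the thread: once the sentence is split at the newline, "not" can no longer reach "diabetes".

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical rule-based cue-word linker (not a cTAKES or UIMA API). Spans are plain
// character offsets; a real analysis engine would read Sentence, cue-word and entity
// annotations from the JCas instead.
class CueWordLinker {

    // Half-open character span [begin, end).
    record Span(int begin, int end) {
        boolean contains(Span other) {
            return begin <= other.begin() && other.end() <= end;
        }
    }

    // Returns the entities linked to a cue word: the cue lies in the same sentence as the
    // entity, ends before the entity begins, and is at most maxGapChars characters away.
    static List<Span> negatedEntities(List<Span> sentences, List<Span> cues,
                                      List<Span> entities, int maxGapChars) {
        List<Span> negated = new ArrayList<>();
        for (Span entity : entities) {
            for (Span cue : cues) {
                boolean precedes = cue.end() <= entity.begin();
                boolean nearby = entity.begin() - cue.end() <= maxGapChars;
                boolean sameSentence = sentences.stream()
                        .anyMatch(s -> s.contains(cue) && s.contains(entity));
                if (precedes && nearby && sameSentence) {
                    negated.add(entity);
                    break; // one linked cue word is enough
                }
            }
        }
        return negated;
    }

    public static void main(String[] args) {
        String text = "patient did not have\ndiabetes";
        Span cue = new Span(text.indexOf("not"), text.indexOf("not") + "not".length());
        Span entity = new Span(text.indexOf("diabetes"), text.length());
        int newline = text.indexOf('\n');

        // One sentence spanning the newline vs. a split at the newline (the 3.1.x behavior).
        List<Span> oneSentence = List.of(new Span(0, text.length()));
        List<Span> splitAtNewline = List.of(new Span(0, newline), new Span(newline + 1, text.length()));

        System.out.println(negatedEntities(oneSentence, List.of(cue), List.of(entity), 20).size());    // 1
        System.out.println(negatedEntities(splitAtNewline, List.of(cue), List.of(entity), 20).size()); // 0
    }
}
```

A fixed character window keeps the sketch short; counting tokens between cue and entity, or stopping at intervening punctuation, are equally plausible rules to try.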
>
> I would love to devote the time needed to fix the sentence splitter to recognize multi-line sentences, but I need to focus on hacking my way through the cue word issue because I've been left in the lurch with no response to my posts :-(((((
>
> Regards,
> Paula
>
>> Date: Wed, 15 Jan 2014 14:53:17 -0500
>> Subject: Re: sentence splitter & forks/branches
>> From: [email protected]
>> To: [email protected]
>>
>> It is unfortunately not that trivial, as allowing newlines within sentences requires changes to the assertion and dependency parser modules.
>>
>> If you're not using those AEs, you could theoretically build the ytex branch and just add ctakes-ytex-uima.jar and ctakes-ytex-uima\desc\analysis_engine\SentenceDetectorAnnotator.xml to your existing ctakes install (haven't tried it, but it should work).
>>
>> -vj
>>
>>
>> On Wed, Jan 15, 2014 at 1:57 PM, Lingren, Todd <[email protected]> wrote:
>>
>>> I have a general question about forks, specifically the YTEX branch that Vijay mentions.
>>> If I wanted to implement just the sentence splitter from YTEX into a currently existing 3.1 install, how would I do that? Is it possible? Or do I have to switch over completely to run from the YTEX branch?
>>>
>>> Todd Lingren
>>> Biomedical Informatics
>>> Cincinnati Children's Hospital
>>> [email protected]
>>> 513-803-9032
>>>
>>>
>>> -----Original Message-----
>>> From: vijay garla [mailto:[email protected]]
>>> Sent: Wednesday, January 15, 2014 11:34 AM
>>> To: [email protected]
>>> Subject: Re: svn commit: r1551805 - /ctakes/branches/ytex/ctakes-assertion/src/main/java/org/apache/ctakes/assertion/medfacts/i2b2/api/CharacterOffsetToLineTokenConverterCtakesImpl.java
>>>
>>> The issue is indeed the sentence splitter: negation is limited to words within the sentence, and if newlines are considered sentence boundaries, it doesn't work properly (splitting on newlines breaks many other things as well).
>>> The YTEX branch includes a sentence splitter that does not automatically split sentences on newlines.
>>>
>>> best,
>>>
>>> vj
>>>
>>>
>>> On Wed, Jan 15, 2014 at 10:03 AM, Masanz, James J. <[email protected]> wrote:
>>>> Hi Paula,
>>>>
>>>> The sentence detector in 3.1.0 and 3.1.1 (and previous releases) assumes sentences don't cross line boundaries.
>>>> OpenNLP is used to find sentence breaks, but then if newlines are found, those are also set (within cTAKES, not OpenNLP) to be sentence breaks.
>>>> (Just FYI, I haven't had a chance to look at the ytex branch, which the subject commit is about.)
>>>>
>>>> -- James
>>>>
>>>> -----Original Message-----
>>>> From: [email protected] [mailto:[email protected]] On Behalf Of digital paula
>>>> Sent: Tuesday, January 14, 2014 10:25 PM
>>>> To: [email protected]
>>>> Subject: RE: svn commit: r1551805 - /ctakes/branches/ytex/ctakes-assertion/src/main/java/org/apache/ctakes/assertion/medfacts/i2b2/api/CharacterOffsetToLineTokenConverterCtakesImpl.java
>>>>
>>>> Hello cTAKES Developer Community,
>>>> I'm a little behind on reading posts; this one is from last month. Is this issue already addressed in the current release? I'm still running the previous release, 3.1.0.
>>>> I just noticed something interesting: the negation didn't take when it is on a different line. I removed all carriage returns from the narratives, and negation picked it up as long as the text is treated as one long string. To better explain what I mean, here are two narrative comments:
>>>>
>>>> 1. patient did not have diabetes
>>>> 2. patient did not have
>>>> diabetes
>>>>
>>>> Number 1 above got negated but number 2 did not. This might be related to the issue with the sectionizer. I noticed that when I treated the narrative as one string, the sectionizer never crashed with the NPE.
>>>> Well, the sectionizer is pointless if the narrative is one string, but it's helping me pinpoint the problem.
>>>>
>>>> Regards,
>>>> Paula
>>>>
>>>>
>>>>> Date: Thu, 19 Dec 2013 11:04:57 -0500
>>>>> Subject: Re: FW: svn commit: r1551805 - /ctakes/branches/ytex/ctakes-assertion/src/main/java/org/apache/ctakes/assertion/medfacts/i2b2/api/CharacterOffsetToLineTokenConverterCtakesImpl.java
>>>>> From: [email protected]
>>>>> To: [email protected]
>>>>>
>>>>> Hi Pei,
>>>>>
>>>>> I'm not sure if that would solve the problem: the change in the ytex branch causes newlines to be ignored (i.e. not treated as a token). Trunk's sentence splitter splits sentences on newlines, so newlines would never be found in a sentence. However, if we had a reproducer, we could check it fairly easily in the ytex branch.
>>>>>
>>>>> Best,
>>>>>
>>>>> VJ
>>>>>
>>>>>
>>>>> On Thu, Dec 19, 2013 at 10:15 AM, Chen, Pei <[email protected]> wrote:
>>>>>
>>>>>> Vj,
>>>>>> Do you think this is what was causing the NPEs [1]?
>>>>>> If so, shall we make the same fix in trunk?
>>>>>> --Pei
>>>>>>
>>>>>> [1] http://mail-archives.apache.org/mod_mbox/ctakes-dev/201309.mbox/%3C924DE05C19409B438EB81DE683A942D9105A93CB%40CHEXMBX1A.CHBOSTON.ORG%3E
>>>>>>
>>>>>> -----Original Message-----
>>>>>> From: [email protected] [mailto:[email protected]]
>>>>>> Sent: Tuesday, December 17, 2013 9:15 PM
>>>>>> To: [email protected]
>>>>>> Subject: svn commit: r1551805 - /ctakes/branches/ytex/ctakes-assertion/src/main/java/org/apache/ctakes/assertion/medfacts/i2b2/api/CharacterOffsetToLineTokenConverterCtakesImpl.java
>>>>>>
>>>>>> Author: vjapache
>>>>>> Date: Wed Dec 18 02:14:13 2013
>>>>>> New Revision: 1551805
>>>>>>
>>>>>> URL: http://svn.apache.org/r1551805
>>>>>> Log:
>>>>>> add support for sentences that contain newline tokens.
>>>>>>
>>>>>> Modified:
>>>>>>     ctakes/branches/ytex/ctakes-assertion/src/main/java/org/apache/ctakes/assertion/medfacts/i2b2/api/CharacterOffsetToLineTokenConverterCtakesImpl.java
>>>>>>
>>>>>> Modified: ctakes/branches/ytex/ctakes-assertion/src/main/java/org/apache/ctakes/assertion/medfacts/i2b2/api/CharacterOffsetToLineTokenConverterCtakesImpl.java
>>>>>> URL: http://svn.apache.org/viewvc/ctakes/branches/ytex/ctakes-assertion/src/main/java/org/apache/ctakes/assertion/medfacts/i2b2/api/CharacterOffsetToLineTokenConverterCtakesImpl.java?rev=1551805&r1=1551804&r2=1551805&view=diff
>>>>>> ==============================================================================
>>>>>> --- ctakes/branches/ytex/ctakes-assertion/src/main/java/org/apache/ctakes/assertion/medfacts/i2b2/api/CharacterOffsetToLineTokenConverterCtakesImpl.java (original)
>>>>>> +++ ctakes/branches/ytex/ctakes-assertion/src/main/java/org/apache/ctakes/assertion/medfacts/i2b2/api/CharacterOffsetToLineTokenConverterCtakesImpl.java Wed Dec 18 02:14:13 2013
>>>>>> @@ -32,8 +32,8 @@ import org.apache.uima.jcas.tcas.Annotat
>>>>>>  import org.mitre.medfacts.i2b2.api.ApiConcept;
>>>>>>  import org.mitre.medfacts.zoner.CharacterOffsetToLineTokenConverter;
>>>>>>  import org.mitre.medfacts.zoner.LineAndTokenPosition;
>>>>>> -
>>>>>>  import org.apache.ctakes.typesystem.type.syntax.BaseToken;
>>>>>> +import org.apache.ctakes.typesystem.type.syntax.NewlineToken;
>>>>>>  import org.apache.ctakes.typesystem.type.textspan.Sentence;
>>>>>>
>>>>>>  public class CharacterOffsetToLineTokenConverterCtakesImpl implements CharacterOffsetToLineTokenConverter
>>>>>> @@ -78,11 +78,13 @@ public class CharacterOffsetToLineTokenC
>>>>>>        for (Annotation current : annotationIndex)
>>>>>>        {
>>>>>>          BaseToken bt = (BaseToken)current;
>>>>>> -        int begin = bt.getBegin();
>>>>>> -        int end = bt.getEnd();
>>>>>> -
>>>>>> -        tokenBeginEndTreeSet.add(begin);
>>>>>> -        tokenBeginEndTreeSet.add(end);
>>>>>> +        // filter out NewlineToken
>>>>>> +        if (!(bt instanceof NewlineToken)) {
>>>>>> +          int begin = bt.getBegin();
>>>>>> +          int end = bt.getEnd();
>>>>>> +          tokenBeginEndTreeSet.add(begin);
>>>>>> +          tokenBeginEndTreeSet.add(end);
>>>>>> +        }
>>>>>>        }
>>>>>>      }
>>>>>>
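The r1551805 change skips NewlineToken annotations when walking a sentence's tokens, which matches VJ's description of the ytex change: newlines inside a sentence are "not treated as a token". A minimal, self-contained Java sketch of that filtering step follows. The `BaseToken` and `NewlineToken` classes here are hypothetical plain-Java stand-ins for the cTAKES JCas types of the same names, and `NewlineTokenFilter`/`withoutNewlines` are illustrative names, not part of cTAKES.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical plain-Java stand-ins for the cTAKES BaseToken / NewlineToken JCas types,
// used only so this sketch runs without UIMA on the classpath.
class BaseToken {
    private final int begin;
    private final int end;

    BaseToken(int begin, int end) {
        this.begin = begin;
        this.end = end;
    }

    int getBegin() { return begin; }
    int getEnd() { return end; }

    @Override
    public String toString() { return "[" + begin + "," + end + ")"; }
}

class NewlineToken extends BaseToken {
    NewlineToken(int begin, int end) { super(begin, end); }
}

class NewlineTokenFilter {
    // Keeps every token except NewlineTokens, using the same instanceof test as r1551805.
    static List<BaseToken> withoutNewlines(List<BaseToken> tokensInSentence) {
        List<BaseToken> kept = new ArrayList<>();
        for (BaseToken bt : tokensInSentence) {
            if (!(bt instanceof NewlineToken)) {
                kept.add(bt);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Tokens of "did not have\ndiabetes" as one sentence that spans the newline.
        List<BaseToken> twoLines = List.of(
                new BaseToken(0, 3),      // did
                new BaseToken(4, 7),      // not
                new BaseToken(8, 12),     // have
                new NewlineToken(12, 13), // "\n"
                new BaseToken(13, 21));   // diabetes
        // Prints [[0,3), [4,7), [8,12), [13,21)]: the same four tokens as "did not have diabetes".
        System.out.println(withoutNewlines(twoLines));
    }
}
```

As VJ notes, this only matters once the sentence splitter lets a sentence cross a line; with trunk's newline-splitting detector, no NewlineToken ever appears inside a sentence.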
