The 7 lines I referred to as "ending with apostrophe" do indeed have an 
apostrophe followed immediately by a newline.

In the training data it is indeed very rare for a sentence to end on an 
apostrophe: 7 out of more than 400K sentences.
I second your suggestion of removing the apostrophe from the list of 
sentence-breaking characters. It is straightforward and cleaner. Thanks.
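
For anyone who wants to repeat the count on their own one-sentence-per-line 
file, here is a minimal sketch of the check, not the script actually used. 
The function name and the sample sentences are made up for illustration. It 
counts a line only when the apostrophe is the very last character before the 
newline.

```python
def count_lines_ending_with(lines, char="'"):
    """Return (hits, total): lines whose last character before the newline is `char`."""
    total = 0
    hits = 0
    for line in lines:
        total += 1
        # Strip only the line terminator, so "abc' " (trailing space) is not counted.
        if line.rstrip("\r\n").endswith(char):
            hits += 1
    return hits, total


# Made-up sample standing in for a one-sentence-per-line training file.
sample = [
    "He denies pain.\n",
    "Patient states 'I feel fine'\n",
    "No known allergies.\n",
]
hits, total = count_lines_ending_with(sample)
print(f"{hits} of {total} lines end with an apostrophe")
```

An open file handle can be passed in place of the list, since the function 
only iterates over lines.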

-----Original Message-----
From: [email protected] 
[mailto:[email protected]] On Behalf Of 
Tim Miller
Sent: Monday, August 26, 2013 11:35 AM
To: [email protected]
Subject: Re: apostrophe and sentence detector

Ah, so we might suspect that some of those 7 lines in the file were 
indeed followed by newlines in the original training data. In the 
absence of more/better training data which would help us learn this I 
think it would be reasonable to restore the list of sentence-breaking 
characters to not include apostrophe. Seems like it is rare for a 
sentence to end on it, and my preference is to accidentally call 2 
sentences one sentence, rather than splitting one sentence in the 
middle. I think it's probably better for downstream processing.
Just my .02,
Tim
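
To illustrate the trade-off, here is a toy sketch of how the list of 
sentence-breaking characters limits where a break can happen. It is an 
assumption-laden simplification: the real detector scores each candidate 
character with a trained model, while this hypothetical naive_split breaks 
at every candidate that is followed by whitespace. The example text is made 
up.

```python
import re


def naive_split(text, break_chars):
    """Toy splitter: break after any character in break_chars followed by whitespace.

    Not the trained model; it only shows that a character left out of
    break_chars can never produce a break.
    """
    if not break_chars:
        return [text.strip()]
    # Look behind for one candidate character, then consume the whitespace run.
    pattern = "(?<=[" + re.escape("".join(break_chars)) + r"])\s+"
    return [s for s in re.split(pattern, text.strip()) if s]


text = "Pt reports 'no pain' since Monday. Will follow up."

with_apostrophe = naive_split(text, ".!?'")
without_apostrophe = naive_split(text, ".!?")

print(with_apostrophe)
print(without_apostrophe)
```

With the apostrophe in the list, the toy splitter cuts the first sentence in 
the middle, right after the closing quote. Without it, that sentence stays 
whole, and the worst case is that two sentences get merged into one.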

On 08/26/2013 12:29 PM, Masanz, James J. wrote:
> The training data is one sentence per line.
> That's how you feed data to the sentence detector.
>
> -----Original Message-----
> From: [email protected] 
> [mailto:[email protected]] On Behalf Of 
> Tim Miller
> Sent: Monday, August 26, 2013 11:12 AM
> To: [email protected]
> Subject: Re: apostrophe and sentence detector
>
>
> On 08/26/2013 12:05 PM, Masanz, James J. wrote:
>> The recently rebuilt sentence detector (currently in trunk and the 3.1.0 
>> branch) is sometimes taking the apostrophe as a sentence break where the 
>> ctakes-3.0.0-incubating model didn't.
>>
>> The training data used for the recently rebuilt model contains only 7 
>> lines that end with an apostrophe (single quote).
> Do you mean 7 sentences that end in a single apostrophe or 7 lines? The
> sentence detector will currently break on newlines no matter what, so
> the important number is how many sentences end mid-line with an
> apostrophe, right?
> Tim
