On Sun, Feb 14, 2010 at 4:47 PM, Ted Dunning <ted.dunn...@gmail.com> wrote:
> Benson,
>
> One more thing.  I forget the actual reference, but the best Chinese
> segmenter that I have seen in practice (whose name I also forget) was
> able to get away with a simple unweighted lexicon and a 2-3 word
> look-ahead, using average word length as the score.

That's more or less what our existing 'legacy' algorithmic code does.
It has very good accuracy overall, but there are important cases where
that approach runs into trouble.
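
For the archives, here's a rough sketch, in Java just to make the idea
concrete, of that kind of lexicon-plus-look-ahead scheme: at each
position, enumerate the lexicon words that can start there, look one
word further ahead, and commit the first word of the pair with the
largest average word length. The class name, the two-word depth, and
the single-character fallback are placeholders of mine, not what our
legacy code actually does.

import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Toy sketch of an unweighted-lexicon segmenter with a short look-ahead
// scored by average word length.  Not our legacy code; names and the
// two-word look-ahead depth are illustrative only.
public class LookaheadSegmenter {

  private final Set<String> lexicon;
  private final int maxWordLen;

  public LookaheadSegmenter(Set<String> lexicon, int maxWordLen) {
    this.lexicon = lexicon;
    this.maxWordLen = maxWordLen;
  }

  public List<String> segment(String text) {
    List<String> out = new ArrayList<String>();
    int pos = 0;
    while (pos < text.length()) {
      String best = null;
      double bestScore = -1.0;
      // Two-word look-ahead: commit the first word of the pair whose
      // average length is largest.
      for (String first : candidatesAt(text, pos)) {
        for (String second : candidatesAt(text, pos + first.length())) {
          double avg = (first.length() + second.length()) / 2.0;
          if (avg > bestScore) {
            bestScore = avg;
            best = first;
          }
        }
      }
      out.add(best);
      pos += best.length();
    }
    return out;
  }

  // Lexicon words starting at pos; a single character is the fallback so
  // unknown characters still advance the scan.  At end of input we return
  // an empty continuation so the outer loop's scoring still works.
  private List<String> candidatesAt(String text, int pos) {
    List<String> cands = new ArrayList<String>();
    if (pos >= text.length()) {
      cands.add("");
      return cands;
    }
    int limit = Math.min(text.length(), pos + maxWordLen);
    for (int end = pos + 1; end <= limit; end++) {
      String w = text.substring(pos, end);
      if (lexicon.contains(w)) {
        cands.add(w);
      }
    }
    if (cands.isEmpty()) {
      cands.add(text.substring(pos, pos + 1));
    }
    return cands;
  }
}

The greedy commit of 'best' is, of course, exactly where a garden-path
segmentation can go wrong.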

The thing we're currently working with is
http://acl.ldc.upenn.edu/P/P07/p07-1106.pdf.
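
For anyone following along, the decoding there is a beam search over
partial segmentations: hypotheses grow one character at a time, either
starting a new word or extending the last one, each partial
segmentation is scored by the learned model, and only the top few
survive. Here's a toy Java rendering of just the search; the Scorer
interface and the beam width stand in for the model and are my
placeholders, not the paper's feature set.

import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Toy sketch of beam-search decoding over partial segmentations.  The
// Scorer is a stand-in for a learned model; the beam width and all
// names here are placeholders, not the paper's.
public class BeamSegmenter {

  public interface Scorer {
    double score(List<String> words);
  }

  private final Scorer scorer;
  private final int beamWidth;

  public BeamSegmenter(Scorer scorer, int beamWidth) {
    this.scorer = scorer;
    this.beamWidth = beamWidth;
  }

  public List<String> segment(String text) {
    List<List<String>> beam = new ArrayList<List<String>>();
    beam.add(new ArrayList<String>());
    for (int i = 0; i < text.length(); i++) {
      String c = String.valueOf(text.charAt(i));
      List<List<String>> next = new ArrayList<List<String>>();
      for (List<String> hyp : beam) {
        // Option 1: start a new word with this character.
        List<String> sep = new ArrayList<String>(hyp);
        sep.add(c);
        next.add(sep);
        // Option 2: append the character to the last word.
        if (!hyp.isEmpty()) {
          List<String> app = new ArrayList<String>(hyp);
          app.set(app.size() - 1, app.get(app.size() - 1) + c);
          next.add(app);
        }
      }
      // Prune: keep only the top-scoring hypotheses.
      Collections.sort(next, new Comparator<List<String>>() {
        public int compare(List<String> a, List<String> b) {
          return Double.compare(scorer.score(b), scorer.score(a));
        }
      });
      beam = new ArrayList<List<String>>(
          next.subList(0, Math.min(beamWidth, next.size())));
    }
    return beam.get(0);
  }
}

With a wide enough beam this can recover from local missteps that a
greedy look-ahead cannot.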

I follow the argument you are making here, though, and I'm thinking
about how to detect what I'd call 'word sausage boundaries.'



> This indicates to me that you can depth bound your beam
> search and turn it into an exhaustive search.  The lesson of their success
> is that garden path sentences (with regard to segmentation) are rare in
> Chinese.
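
Concretely, I read that as: if garden-path sentences are rare, then
rather than carrying a pruned beam across the whole sentence, we can
exhaustively enumerate every lexicon segmentation of a small look-ahead
window, score the window, commit its first word, and slide forward. A
toy sketch, with the window size and the single-character OOV fallback
being my own choices:

import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Toy sketch: exhaustively enumerate every lexicon segmentation of a
// small look-ahead window.  With the depth bounded to a handful of
// characters the enumeration stays cheap, so no beam pruning is needed.
// Names and the OOV single-character fallback are illustrative only.
public class BoundedExhaustiveSegmenter {

  public static List<List<String>> enumerate(String window, Set<String> lexicon) {
    List<List<String>> results = new ArrayList<List<String>>();
    if (window.isEmpty()) {
      results.add(new ArrayList<String>());
      return results;
    }
    for (int end = 1; end <= window.length(); end++) {
      String w = window.substring(0, end);
      // Unknown single characters are allowed so every window has at
      // least one segmentation.
      if (!lexicon.contains(w) && end > 1) {
        continue;
      }
      for (List<String> rest : enumerate(window.substring(end), lexicon)) {
        List<String> seg = new ArrayList<String>();
        seg.add(w);
        seg.addAll(rest);
        results.add(seg);
      }
    }
    return results;
  }
}

Score each of those candidate windows with whatever model is at hand,
keep the winner's first word, advance, and repeat.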
