On Apr 5, 2007, at 6:42 PM, Tolkin, Steve wrote:

> Also, this is somewhat more complicated because sometimes
> spaces can be removed, although occasionally with much lower  
> frequency.
> For example "Arti factrefers" ought to be "Artifact refers"

How is the program supposed to select from variants such as

   Artifact refers
   Art I fact refers

   documents and
   document sand

?

It almost seems like you can't trust the spaces at all, so you might  
as well just throw them all out and then look for valid word chains  
in the remaining text.
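
A minimal sketch of that idea, in Python for illustration: drop every space, then enumerate each way the remaining letters can be split into known words. The WORDS set and the segmentations() name are hypothetical stand-ins, not anything from the original program, and a real run would need a full dictionary.

```python
from functools import lru_cache

# Hypothetical toy word list; a real run would load a full dictionary.
WORDS = {"art", "i", "fact", "artifact", "refers",
         "document", "documents", "and", "sand"}

def segmentations(text, words=WORDS):
    """Return every way to split text into a chain of known words."""
    # Throw out all whitespace and ignore case.
    letters = "".join(text.split()).lower()

    @lru_cache(maxsize=None)
    def split_from(start):
        # The empty remainder has exactly one segmentation: the empty chain.
        if start == len(letters):
            return [[]]
        chains = []
        for end in range(start + 1, len(letters) + 1):
            word = letters[start:end]
            if word in words:
                # Extend this word with every valid chain for the rest.
                for rest in split_from(end):
                    chains.append([word] + rest)
        return chains

    return [" ".join(chain) for chain in split_from(0)]

for sample in ("Arti factrefers", "documents and"):
    print(sample, "->", segmentations(sample))
```

With this toy word list, both sample inputs come back with two readings each, which is exactly the ambiguity above: a dictionary lookup alone cannot choose between them, so some extra signal (word frequency, say) would still be needed to rank the candidates.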

If nothing else, that would also solve the ancillary problem of a  
space before punctuation marks...



-- 
Chris Devers
 
_______________________________________________
Boston-pm mailing list
Boston-pm@mail.pm.org
http://mail.pm.org/mailman/listinfo/boston-pm
