On Apr 5, 2007, at 6:42 PM, Tolkin, Steve wrote:

> Also, this is somewhat more complicated because sometimes
> spaces can be removed, although occasionally with much lower
> frequency.
> For example "Arti factrefers" ought to be "Artifact refers"

How is the program supposed to select from variants such as

    Artifact refers
    Art I fact refers

    documents and
    document sand

?

It almost seems like you can't trust the spaces at all, so you
might as well just throw them all out and then look for valid
word chains in the remaining text.

If nothing else, that would also solve the ancillary problem of
a space before punctuation marks...

--
Chris Devers
_______________________________________________
Boston-pm mailing list
Boston-pm@mail.pm.org
http://mail.pm.org/mailman/listinfo/boston-pm
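
To make the "throw out the spaces and look for valid word chains" idea concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the function name `segmentations` and the tiny `WORDS` set are hypothetical, and a real run would need a full word list. It strips all whitespace and then enumerates every way to cover the remaining letters with dictionary words.

```python
def segmentations(text, words):
    """Return every way to split text (whitespace ignored) into dictionary words."""
    letters = "".join(text.lower().split())  # discard the untrusted spaces
    max_len = max(map(len, words))           # no word can be longer than this
    memo = {}                                # start index -> chains covering the rest

    def chains(start):
        # All word chains that exactly cover letters[start:].
        if start == len(letters):
            return [[]]
        if start not in memo:
            found = []
            last = min(len(letters), start + max_len)
            for end in range(start + 1, last + 1):
                word = letters[start:end]
                if word in words:
                    found.extend([word] + rest for rest in chains(end))
            memo[start] = found
        return memo[start]

    return [" ".join(chain) for chain in chains(0)]


# Hypothetical toy dictionary, just large enough for the examples in this thread.
WORDS = {"art", "i", "fact", "artifact", "refers",
         "document", "documents", "and", "sand"}

if __name__ == "__main__":
    for sample in ("Arti factrefers", "documents and"):
        print(sample, "->", segmentations(sample, WORDS))
```

With this toy dictionary, both inputs come back with two readings each, which is exactly the ambiguity in question: finding the valid chains is the easy part, and picking one of them would need some extra signal that this sketch does not attempt to supply.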