As you suggest, the easiest way is to ignore all the blanks and then try to find words, probably with a greedy approach that backs off when it hits a dead end. However, the original email explained that extra spaces are much more likely than missing spaces. That information could be used to get better results.
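To make that idea concrete, here is a minimal sketch in Python. It is not Richard's program, and it swaps the greedy search for a small dynamic program so that the spacing evidence can be scored. The function name `respace`, the toy word list `WORDS`, and the two cost values are all assumptions chosen for illustration: removing a space that was in the input is cheap, inserting one that was not there is expensive. Punctuation is not handled.

```python
# Toy word list (an assumption for illustration; a real run would load a dictionary).
WORDS = {"art", "i", "fact", "artifact", "refers",
         "document", "documents", "and", "sand"}


def respace(text, words, extra_cost=1, missing_cost=3):
    """Re-segment text into dictionary words, treating the original
    spaces as weak evidence. Returns None if no segmentation exists."""
    # Drop the blanks, but remember where they were: had_space holds each
    # index in the stripped string that was preceded by a space.
    chars = []
    had_space = set()
    for ch in text:
        if ch == " ":
            if chars:
                had_space.add(len(chars))
        else:
            chars.append(ch)
    s = "".join(chars)
    n = len(s)
    max_len = max(len(w) for w in words)

    inf = float("inf")
    best = [inf] * (n + 1)   # best[i] = lowest cost to segment s[:i]
    back = [None] * (n + 1)  # back[j] = start index of the word ending at j
    best[0] = 0
    for i in range(n):
        if best[i] == inf:
            continue
        for j in range(i + 1, min(n, i + max_len) + 1):
            if s[i:j].lower() not in words:
                continue
            cost = best[i]
            # Every original space inside this word is an "extra" space removed.
            cost += extra_cost * sum(1 for k in range(i + 1, j) if k in had_space)
            # A word boundary with no original space is a "missing" space added.
            if j < n and j not in had_space:
                cost += missing_cost
            if cost < best[j]:
                best[j] = cost
                back[j] = i

    if best[n] == inf:
        return None
    out = []
    j = n
    while j > 0:
        i = back[j]
        out.append(s[i:j])
        j = i
    return " ".join(reversed(out))


if __name__ == "__main__":
    print(respace("Arti factrefers", WORDS))
    print(respace("documents and", WORDS))
    print(respace("document sand", WORDS))
```

With this toy word list, "Arti factrefers" comes back as "Artifact refers" (cost 4, against 6 for "Art i fact refers"), while "documents and" and "document sand" are each left as typed, because keeping the existing space costs nothing. The costs are guesses and would need tuning against real text.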
Thanks to Richard Barbalace for sending his program. I can run it, and now I need to look at how to revise it.

Thanks,
Steve

-----Original Message-----
From: Chris Devers [mailto:[EMAIL PROTECTED]
Sent: Thursday, April 05, 2007 10:44 PM
To: Tolkin, Steve
Cc: boston perl mongers
Subject: Re: [Boston.pm] Program wanted to recover text that has spaces inserted or deleted

On Apr 5, 2007, at 6:42 PM, Tolkin, Steve wrote:

> Also, this is somewhat more complicated because sometimes
> spaces can be removed, although occasionally with much lower
> frequency.
> For example "Arti factrefers" ought to be "Artifact refers"

How is the program supposed to select from variants such as

    Artifact refers
    Art I fact refers

    documents and
    document sand

?

It almost seems like you can't trust the spaces at all, so you might as well just throw them all out and then look for valid word chains in the remaining text.

If nothing else, that would also solve the ancillary problem of a space before punctuation marks...

--
Chris Devers
_______________________________________________
Boston-pm mailing list
Boston-pm@mail.pm.org
http://mail.pm.org/mailman/listinfo/boston-pm