----- Original Message -----
> From: "Jerry Schwartz" <je...@gii.co.jp>
> 
> I'm not sure that I could easily build a dictionary of non-junk
> words, since

The traditional way is to build a database of junk words. The list tends to be 
shorter :-)

Think and/or/it/the/with/like/...
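
Something like this rough Python sketch would do the filtering, assuming titles are plain strings. The JUNK_WORDS set and the significant_words helper are made-up names for illustration, and the list only holds the handful of words mentioned above:

```python
import re

# Hypothetical junk-word list; a real one would be longer and
# would probably live in its own table.
JUNK_WORDS = {"and", "or", "it", "the", "with", "like"}

def significant_words(title):
    """Return the set of lower-cased words in title that are not junk words."""
    words = re.findall(r"[a-z0-9']+", title.lower())
    return {w for w in words if w not in JUNK_WORDS}

print(significant_words("The Market for Polypropylene and the Like"))
```

Note that "for" survives here only because it is not in the sample list; the list needs tuning against real titles.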

The percentages of mutual and non-mutual words between two titles should be a 
reasonable indicator of likeness. You could conceivably even assign a weight to 
individual words, so that "polypropylbutanate" counts for more than "synergy" 
for comparison purposes.
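
One possible reading of that, as an untested-in-anger Python sketch: take the mutual words as a share of all distinct non-junk words in both titles (this is the Jaccard index, swapped in here as a convenient way to express "percentage of mutual words"), with an optional per-word weight. All names, the junk list, and the weights are assumptions for illustration:

```python
import re

# Hypothetical junk-word list and helper, for illustration only.
JUNK_WORDS = {"and", "or", "it", "the", "with", "like", "of", "for", "in"}

def significant_words(title):
    """Return the set of lower-cased words in title that are not junk words."""
    words = re.findall(r"[a-z0-9']+", title.lower())
    return {w for w in words if w not in JUNK_WORDS}

def likeness(title_a, title_b, weights=None, default_weight=1.0):
    """Weighted share of mutual words among all non-junk words (0.0 to 1.0)."""
    weights = weights or {}
    a = significant_words(title_a)
    b = significant_words(title_b)

    def total(words):
        # Words without an explicit weight count as default_weight.
        return sum(weights.get(w, default_weight) for w in words)

    union = total(a | b)
    return total(a & b) / union if union else 0.0

t1 = "Synergy in the Polypropylbutanate Market"
t2 = "Polypropylbutanate Market Outlook"
print(likeness(t1, t2))
print(likeness(t1, t2, weights={"polypropylbutanate": 5.0, "synergy": 0.2}))
```

Unweighted, the two titles share 2 of 4 distinct words. With the made-up weights, the rare chemical name pulls the score up and "synergy" barely counts against it.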

All very theoretical, though; I haven't actually done much of this at this 
level. My experience in data mangling is mostly limited to 
should-be-fixed-format data like sports results.


-- 
Beer with grenadine
Is like mustard with wine
She who drinks it is a prude
He who drinks it is soon an ass

-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    http://lists.mysql.com/mysql?unsub=arch...@jab.org
