No takers? -----Original Message----- From: Jerry Schwartz [mailto:je...@gii.co.jp] Sent: Monday, April 25, 2011 2:34 PM To: 'Mailing-List mysql' Subject: Join based upon LIKE
I have to match lists of new publications against our database, so that I can replace the existing publications in our catalog. For example, "The UK Market for Puppies in February 2011" would be a replacement for "The UK Market for Puppies in December 2010" Unfortunately, the publishers aren't particularly careful with their titles. One might even say they are perverse. I am likely to get "UK Market: Puppies - Feb 2011" as replacement for "The UK Market for Puppies in December 2010" You can see that a straight match by title is not going to work. Here's what I've been doing: ===== SET @PUBID = (SELECT pub.pub_id FROM pub WHERE pub.pub_code = 'GD'); CREATE TEMPORARY TABLE new_titles ( new_title VARCHAR(255), INDEX (new_title), new_title_like VARCHAR(255), INDEX (new_title_like) ); INSERT INTO new_titles VALUES ('Alternative Energy Monthly Deal Analysis - M&A and Investment Trends, April 2011', 'Alternative Energy Monthly Deal Analysis%M&A%Investment Trends%'), ('Asia Pacific Propylene Industry Outlook to 2015 - Market Size, Company Share, Price Trends, Capacity Forecasts of All Active and Planned Plants', 'Asia Pacific Propylene Industry Outlook to%Market Size%Company Share%Price Trends%Capacity Forecasts of All Active%Planned Plants'), ... ('Underground Gas Storage Industry Outlook in North America, 2011 - Details of All Operating and Planned Gas Storage Sites to 2014', 'Underground Gas Storage Industry Outlook%North America%Details of All Operating%Planned Gas Storage Sites to%'), ('Uveitis Therapeutics - Pipeline Assessment and Market Forecasts to 2017', 'Uveitis Therapeutics%Pipeline Assessment%Market Forecasts to%'); SELECT prod.prod_title AS `Title IN Database`, new_titles.new_title AS `Title IN Feed`, prod.prod_num AS `ID` FROM new_titles JOIN prod ON prod.prod_title LIKE (new_titles.new_title_like) AND prod.pub_id = @PUBID AND prod.prod_discont = 0 ORDER BY new_titles.new_title; ====== (I've written code that substitutes "%" for certain strings that I specify, and there is some trial and error involved.) Here's how MySQL handles that SELECT: *************************** 1. row *************************** id: 1 select_type: SIMPLE table: new_titles type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: 47 Extra: Using filesort *************************** 2. row *************************** id: 1 select_type: SIMPLE table: prod type: ref possible_keys: pub_id key: pub_id key_len: 48 ref: const rows: 19607 Extra: Using where ===== Here's the important part of the table `prod`: ===== Table: prod Create Table: CREATE TABLE `prod` ( `prod_id` varchar(15) NOT NULL DEFAULT '', `prod_num` mediumint(6) unsigned DEFAULT NULL, `prod_title` varchar(255) DEFAULT NULL, `prod_type` varchar(2) DEFAULT NULL, `prod_vat_pct` decimal(5,2) DEFAULT NULL, `prod_discont` tinyint(1) DEFAULT NULL, `prod_replacing` mediumint(6) unsigned DEFAULT NULL, `prod_replaced_by` mediumint(6) unsigned DEFAULT NULL, `prod_ready` tinyint(1) DEFAULT NULL, `pub_id` varchar(15) DEFAULT NULL, ... PRIMARY KEY (`prod_id`), UNIQUE KEY `prod_num` (`prod_num`), KEY `prod_pub_prod_id` (`prod_pub_prod_id`), KEY `pub_id` (`pub_id`), KEY `prod_title` (`prod_title`), FULLTEXT KEY `prod_title_fulltext` (`prod_title`) ) ENGINE=MyISAM DEFAULT CHARSET=utf8 ===== This works reasonably well for a small number (perhaps 200-300) of new products; but now I've been handed a list of over 15000 to stuff into the table `new_titles`! This motivates me to wonder if there is a better way, since I expect this to take a very long time. Suggestions? Regards, Jerry Schwartz Global Information Incorporated 195 Farmington Ave. Farmington, CT 06032 860.674.8796 / FAX: 860.674.8341 E-mail: je...@gii.co.jp Web site: www.the-infoshop.com -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe: http://lists.mysql.com/mysql?unsub=arch...@jab.org