Assistance with SQL syntax: pulling duplicates back
I think this is an easy question...I've set up a SQL statement like so: SELECT resume.Section_Value, candidate.Location FROM resume, candidate WHERE resume.Section_ID = '1' AND MATCH (resume.Section_Value) AGAINST ('html') AND candidate.Location LIKE '%CA%' OR 'California' -- And where 'html' should come up in 1 entry, I get duplicates when printing out the field to the screen: -- html unix network php Over 10 years of HTML experience. 2 years networking administration. html unix network php Over 10 years of HTML experience. 2 years networking administration. -- I can't decide if this is my code, or the SQL syntax. Would it be possible, based on this statement, to have pulled back duplicates from the same record? - Eve -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]
Re: Assistance with SQL syntax: pulling duplicates back
You are joining two tables, resume and candidate. Without a join condition, you get a Cartesian product, each row of the first table paired with each and every row of the second table. (Some on this list would go so far as to say that's not even a join.) You need to specify how rows in resume should be lined up with rows in candidate. You are filtering the resulting rows with your WHERE conditions, but that's not the same thing. I would expect that you have a relationship between resumes and candidates. One of them should have a column which holds a key with the ID value in the other. In the first case, you would add something like resume.candidate_id = candidate.id to your WHERE clause, and in the second case you would add something like candidate.resume_id = resume.id to your WHERE clause. I'd expect one candidate per resume, but possibly more than one resume per candidate, so I'd expect the first case. Michael Eve Atley wrote: I think this is an easy question...I've set up a SQL statement like so: SELECT resume.Section_Value, candidate.Location FROM resume, candidate WHERE resume.Section_ID = '1' AND MATCH (resume.Section_Value) AGAINST ('html') AND candidate.Location LIKE '%CA%' OR 'California' -- And where 'html' should come up in 1 entry, I get duplicates when printing out the field to the screen: -- html unix network php Over 10 years of HTML experience. 2 years networking administration. html unix network php Over 10 years of HTML experience. 2 years networking administration. -- I can't decide if this is my code, or the SQL syntax. Would it be possible, based on this statement, to have pulled back duplicates from the same record? - Eve -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]
RE: Assistance with SQL syntax: pulling duplicates back
You have a cartesian join because you do not have join criteria between the resume and candidate tables. -Original Message- From: Eve Atley To: [EMAIL PROTECTED] Sent: 8/17/04 12:22 PM Subject: Assistance with SQL syntax: pulling duplicates back I think this is an easy question...I've set up a SQL statement like so: SELECT resume.Section_Value, candidate.Location FROM resume, candidate WHERE resume.Section_ID = '1' AND MATCH (resume.Section_Value) AGAINST ('html') AND candidate.Location LIKE '%CA%' OR 'California' -- And where 'html' should come up in 1 entry, I get duplicates when printing out the field to the screen: -- html unix network php Over 10 years of HTML experience. 2 years networking administration. html unix network php Over 10 years of HTML experience. 2 years networking administration. -- I can't decide if this is my code, or the SQL syntax. Would it be possible, based on this statement, to have pulled back duplicates from the same record? - Eve -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe: http://lists.mysql.com/[EMAIL PROTECTED] -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]
Re: Assistance with SQL syntax: pulling duplicates back
Hi Eve, You have made a very common mistake while using the comma-join method. I think if I translate your implicit inner join to an explicit inner join you will spot your own mistake: SELECT resume.Section_Value, candidate.Location FROM resume INNER JOIN candidate WHERE resume.Section_ID = '1' AND MATCH (resume.Section_Value) AGAINST ('html') AND candidate.Location LIKE '%CA%' OR 'California' You did not link your two tables. You didn't say that this column in resume matches up with this column in candidate so the query engine put together what is known as a Cartesian product. You are finding all of the possible combinations of rows from both tables where your WHERE clause is true. You said you only get one row from MATCH ... ('HTML') (that's the contribution from the resume table). However, you are getting two rows from the candidate table based on location like. That's why you had two rows in your results. Imagine if you had gotten 3 rows back from the Match... clause... You would have had 6 records in your results and been really confused, eh? You can cure this by somehow equating the two tables. Depending on what form of INNER JOIN you want to write you ether need another WHERE condition or an ON clause. FORM 1(I prefer this form): SELECT resume.Section_Value, candidate.Location FROM resume INNER JOIN candidate ON candidate.id = resume.candidate_id WHERE resume.Section_ID = '1' AND MATCH (resume.Section_Value) AGAINST ('html') AND candidate.Location LIKE '%CA%' OR 'California' FORM 2 (in comma-joined format): SELECT resume.Section_Value, candidate.Location FROM resume, candidate WHERE resume.Section_ID = '1' AND MATCH (resume.Section_Value) AGAINST ('html') AND candidate.Location LIKE '%CA%' OR 'California' AND candidate.id = resume.candidate_id Like I said, it's a common mistake when you write your queries that way (comma-join) to leave out the relationship condition. Best Wishes, Shawn Green Database Administrator Unimin Corporation - Spruce Pine Eve Atley [EMAIL PROTECTED] wrote on 08/17/2004 01:22:45 PM: I think this is an easy question...I've set up a SQL statement like so: SELECT resume.Section_Value, candidate.Location FROM resume, candidate WHERE resume.Section_ID = '1' AND MATCH (resume.Section_Value) AGAINST ('html') AND candidate.Location LIKE '%CA%' OR 'California' -- And where 'html' should come up in 1 entry, I get duplicates when printing out the field to the screen: -- html unix network php Over 10 years of HTML experience. 2 years networking administration. html unix network php Over 10 years of HTML experience. 2 years networking administration. -- I can't decide if this is my code, or the SQL syntax. Would it be possible, based on this statement, to have pulled back duplicates from the same record? - Eve -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]
Re: Assistance with SQL syntax: pulling duplicates back
Eve, From your earlier post, I see it should be resume.Candidate_ID = candidate.Candidate_ID === I should also point out that there are several problems with your Location matching. You have candidate.Location LIKE '%CA%' OR 'California' First, this evaluates as (candidate.Location LIKE '%CA%') OR ('California') 'California' evaluates as false, so only the first part can match. You probably meant candidate.Location LIKE '%CA%' OR candidate.Location LIKE 'California' In any case, '%CA%' matches 'California', so the latter part is still redundant. '%CA%' also matches 'Ocala', or any other string which contains 'ca'. I don't think that's what you want. Also, if the LIKE comparison string starts with a wildcard, an index on Location can't be used. If candidate.Location contains only the state, then there is no need for the wildcards: candidate.Location LIKE 'CA' OR candidate.Location LIKE 'California' In fact, you wouldn't even need LIKE then: candidate.Location IN ('CA', 'California') So, now your query would be SELECT resume.Section_Value, candidate.Location FROM resume JOIN candidate ON resume.Candidate_ID = candidate.Candidate_ID WHERE resume.Section_ID = '1' AND MATCH (resume.Section_Value) AGAINST ('html') AND candidate.Location IN ('CA', 'California'); You could improve this still further by changing all the states to the 2-letter form in your table and requiring the 2-letter state codes in the future. Then Location could be changed to the smaller, faster CHAR(2), and the last part of the WHERE clause would be candidate.Location = 'CA' On the other hand, if candidate.Location contains more than just the state, you're in trouble. It will be difficult to reliably separate rows which contain CA meaning California from rows which contain ca as part of something else. Possible, but difficult, and the solution will almost certainly prevent use of an index on Location. Michael Michael Stassen wrote: You are joining two tables, resume and candidate. Without a join condition, you get a Cartesian product, each row of the first table paired with each and every row of the second table. (Some on this list would go so far as to say that's not even a join.) You need to specify how rows in resume should be lined up with rows in candidate. You are filtering the resulting rows with your WHERE conditions, but that's not the same thing. I would expect that you have a relationship between resumes and candidates. One of them should have a column which holds a key with the ID value in the other. In the first case, you would add something like resume.candidate_id = candidate.id to your WHERE clause, and in the second case you would add something like candidate.resume_id = resume.id to your WHERE clause. I'd expect one candidate per resume, but possibly more than one resume per candidate, so I'd expect the first case. Michael Eve Atley wrote: I think this is an easy question...I've set up a SQL statement like so: SELECT resume.Section_Value, candidate.Location FROM resume, candidate WHERE resume.Section_ID = '1' AND MATCH (resume.Section_Value) AGAINST ('html') AND candidate.Location LIKE '%CA%' OR 'California' -- And where 'html' should come up in 1 entry, I get duplicates when printing out the field to the screen: -- html unix network php Over 10 years of HTML experience. 2 years networking administration. html unix network php Over 10 years of HTML experience. 2 years networking administration. -- I can't decide if this is my code, or the SQL syntax. Would it be possible, based on this statement, to have pulled back duplicates from the same record? - Eve -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]
Re: Assistance with SQL syntax: pulling duplicates back
Eve, Best to keep threads on the list. Others may have better ideas, and future readers may benefit. The comparison candidate.Location IN ('CA', 'California') will match 'CA' and 'California', but will not match 'Cupertino, CA' because it isn't either of those strings. To match that row as well, you would need to use LIKE and a wildcard (or RLIKE) instead. Something like: candidate.Location LIKE '%CA' OR candidate.Location LIKE '%California' Unfortunately, the index can't be used then because of the wildcard. Mysql will have to look at every row. The more rows you have, the worse the impact will be. That may be OK if the rest of your WHERE criteria sufficiently pare down the number of rows first, but you'd have to test to be sure. Even then, this method will generally yield incorrect matches. For example, consider candidate.Location LIKE '%NE' OR candidate.Location LIKE '%Nebraska' That would match 'Bangor, Maine' because it ends with 'ne'. Or how about candidate.Location LIKE '%IA' OR candidate.Location LIKE '%Iowa' That would match 'California' because it ends with 'ia'. See the problem? We could reduce these by making the comparisons case-sensitive with the BINARY keyword: candidate.Location LIKE BINARY '%IA' OR candidate.Location LIKE '%Iowa' That would no longer match 'California', but it would still match 'CALIFORNIA'. Finally, consider that candidate.Location LIKE BINARY '%CA' OR candidate.Location LIKE '%California' will not match 'Pasadena, California, USA'. I doubt it's what you wanted to hear, but the problem is that the Location column is poorly designed. It contains the answers to different questions. That is, multiple/different kinds of data are crammed into one column. The only sure-fire way to perform searches by state is to have a state column. You need to fix the db, and its data. If you really cannot fix the db and data, you will have to live with slow queries and imperfect results. Michael Eve Atley wrote: Thanks for helping me out, Michael! I've learned *several* things today. I have ended up using: candidate.Location IN ('CA', 'California') However, you're correct: candidate.Location can contain more than just the state, at times. It was the way the database was previously designed, unfortunately. It can include: 'Cupertino, CA' or 'CA' or 'California' Will this still work for what I require? -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]