Assistance with SQL syntax: pulling duplicates back

2004-08-17 Thread Eve Atley

I think this is an easy question...I've set up a SQL statement like so:

SELECT resume.Section_Value, candidate.Location FROM resume, candidate WHERE
resume.Section_ID = '1' AND MATCH (resume.Section_Value) AGAINST ('html')
AND candidate.Location LIKE '%CA%' OR 'California'

--
And where 'html' should come up in 1 entry, I get duplicates when printing
out the field to the screen:

--
html unix network php Over 10 years of HTML experience. 2 years networking
administration.

html unix network php Over 10 years of HTML experience. 2 years networking
administration.

--

I can't decide if this is my code, or the SQL syntax. Would it be possible,
based on this statement, to have pulled back duplicates from the same
record?



- Eve



-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]



Re: Assistance with SQL syntax: pulling duplicates back

2004-08-17 Thread Michael Stassen
You are joining two tables, resume and candidate.  Without a join condition, 
you get a Cartesian product, each row of the first table paired with each 
and every row of the second table.  (Some on this list would go so far as to 
say that's not even a join.)  You need to specify how rows in resume should 
be lined up with rows in candidate.  You are filtering the resulting rows 
with your WHERE conditions, but that's not the same thing.

I would expect that you have a relationship between resumes and candidates. 
 One of them should have a column which holds a key with the ID value in 
the other.  In the first case, you would add something like

  resume.candidate_id = candidate.id
to your WHERE clause, and in the second case you would add something like
  candidate.resume_id = resume.id
to your WHERE clause.  I'd expect one candidate per resume, but possibly 
more than one resume per candidate, so I'd expect the first case.

Michael
Eve Atley wrote:
I think this is an easy question...I've set up a SQL statement like so:
SELECT resume.Section_Value, candidate.Location FROM resume, candidate WHERE
resume.Section_ID = '1' AND MATCH (resume.Section_Value) AGAINST ('html')
AND candidate.Location LIKE '%CA%' OR 'California'
--
And where 'html' should come up in 1 entry, I get duplicates when printing
out the field to the screen:
--
html unix network php Over 10 years of HTML experience. 2 years networking
administration.
html unix network php Over 10 years of HTML experience. 2 years networking
administration.
--
I can't decide if this is my code, or the SQL syntax. Would it be possible,
based on this statement, to have pulled back duplicates from the same
record?

- Eve

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]


RE: Assistance with SQL syntax: pulling duplicates back

2004-08-17 Thread Victor Pendleton
You have a cartesian join because you do not have join criteria between the
resume and candidate tables. 

-Original Message-
From: Eve Atley
To: [EMAIL PROTECTED]
Sent: 8/17/04 12:22 PM
Subject: Assistance with SQL syntax: pulling duplicates back


I think this is an easy question...I've set up a SQL statement like so:

SELECT resume.Section_Value, candidate.Location FROM resume, candidate
WHERE
resume.Section_ID = '1' AND MATCH (resume.Section_Value) AGAINST
('html')
AND candidate.Location LIKE '%CA%' OR 'California'

--
And where 'html' should come up in 1 entry, I get duplicates when
printing
out the field to the screen:

--
html unix network php Over 10 years of HTML experience. 2 years
networking
administration.

html unix network php Over 10 years of HTML experience. 2 years
networking
administration.

--

I can't decide if this is my code, or the SQL syntax. Would it be
possible,
based on this statement, to have pulled back duplicates from the same
record?



- Eve



-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:
http://lists.mysql.com/[EMAIL PROTECTED]

-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]



Re: Assistance with SQL syntax: pulling duplicates back

2004-08-17 Thread SGreen
Hi Eve,

You have made a very common mistake while using the comma-join method. I 
think if I translate your implicit inner join to an explicit inner join 
you will spot your own mistake:

SELECT resume.Section_Value, candidate.Location 
FROM resume
INNER JOIN candidate 

WHERE resume.Section_ID = '1' 
AND MATCH (resume.Section_Value) AGAINST ('html')
AND candidate.Location LIKE '%CA%' OR 'California'

You did not link your two tables. You didn't say that this column in 
resume matches up with this column in candidate so the query engine put 
together what is known as a Cartesian product. You are finding all of the 
possible combinations of rows from both tables where your WHERE clause is 
true.

You said you only get one row from MATCH ... ('HTML') (that's the 
contribution from the resume table). However, you are getting two rows 
from the candidate table based on location like. That's why you had 
two rows in your results. Imagine if you had gotten 3 rows back from the 
Match... clause... You would have had 6 records in your results and been 
really confused, eh?

You can cure this by somehow equating the two tables. Depending on what 
form of INNER JOIN you want to write you ether need another WHERE 
condition or an ON clause.

FORM 1(I prefer this form):

SELECT resume.Section_Value, candidate.Location 
FROM resume
INNER JOIN candidate 
ON candidate.id = resume.candidate_id
WHERE resume.Section_ID = '1' 
AND MATCH (resume.Section_Value) AGAINST ('html')
AND candidate.Location LIKE '%CA%' OR 'California'


FORM 2 (in comma-joined format):

SELECT resume.Section_Value, candidate.Location 
FROM resume, candidate 
WHERE resume.Section_ID = '1' 
AND MATCH (resume.Section_Value) AGAINST ('html')
AND candidate.Location LIKE '%CA%' OR 'California'
AND candidate.id = resume.candidate_id

Like I said, it's a common mistake when you write your queries that way 
(comma-join) to leave out the relationship condition.

Best Wishes,

Shawn Green
Database Administrator
Unimin Corporation - Spruce Pine

Eve Atley [EMAIL PROTECTED] wrote on 08/17/2004 01:22:45 PM:

 
 I think this is an easy question...I've set up a SQL statement like so:
 
 SELECT resume.Section_Value, candidate.Location FROM resume, candidate 
WHERE
 resume.Section_ID = '1' AND MATCH (resume.Section_Value) AGAINST 
('html')
 AND candidate.Location LIKE '%CA%' OR 'California'
 
 --
 And where 'html' should come up in 1 entry, I get duplicates when 
printing
 out the field to the screen:
 
 --
 html unix network php Over 10 years of HTML experience. 2 years 
networking
 administration.
 
 html unix network php Over 10 years of HTML experience. 2 years 
networking
 administration.
 
 --
 
 I can't decide if this is my code, or the SQL syntax. Would it be 
possible,
 based on this statement, to have pulled back duplicates from the same
 record?
 
 
 
 - Eve
 
 
 
 -- 
 MySQL General Mailing List
 For list archives: http://lists.mysql.com/mysql
 To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]
 


Re: Assistance with SQL syntax: pulling duplicates back

2004-08-17 Thread Michael Stassen
Eve,
From your earlier post, I see it should be
  resume.Candidate_ID = candidate.Candidate_ID
===
I should also point out that there are several problems with your Location 
matching.  You have

  candidate.Location LIKE '%CA%' OR 'California'
First, this evaluates as
  (candidate.Location LIKE '%CA%') OR ('California')
'California' evaluates as false, so only the first part can match.  You 
probably meant

  candidate.Location LIKE '%CA%' OR candidate.Location LIKE 'California'
In any case, '%CA%' matches 'California', so the latter part is still 
redundant.  '%CA%' also matches 'Ocala', or any other string which contains 
'ca'.  I don't think that's what you want.  Also, if the LIKE comparison 
string starts with a wildcard, an index on Location can't be used.

If candidate.Location contains only the state, then there is no need for the 
wildcards:

  candidate.Location LIKE 'CA' OR candidate.Location LIKE 'California'
In fact, you wouldn't even need LIKE then:
  candidate.Location IN ('CA', 'California')
So, now your query would be
  SELECT resume.Section_Value, candidate.Location
  FROM resume JOIN candidate ON resume.Candidate_ID = candidate.Candidate_ID
  WHERE resume.Section_ID = '1'
  AND MATCH (resume.Section_Value) AGAINST ('html')
  AND candidate.Location IN ('CA', 'California');
You could improve this still further by changing all the states to the 
2-letter form in your table and requiring the 2-letter state codes in the 
future.  Then Location could be changed to the smaller, faster CHAR(2), and 
the last part of the WHERE clause would be

  candidate.Location = 'CA'
On the other hand, if candidate.Location contains more than just the state, 
you're in trouble.  It will be difficult to reliably separate rows which 
contain CA meaning California from rows which contain ca as part of 
something else.  Possible, but difficult, and the solution will almost 
certainly prevent use of an index on Location.

Michael
Michael Stassen wrote:
You are joining two tables, resume and candidate.  Without a join 
condition, you get a Cartesian product, each row of the first table 
paired with each and every row of the second table.  (Some on this list 
would go so far as to say that's not even a join.)  You need to specify 
how rows in resume should be lined up with rows in candidate.  You are 
filtering the resulting rows with your WHERE conditions, but that's not 
the same thing.

I would expect that you have a relationship between resumes and 
candidates.  One of them should have a column which holds a key with the 
ID value in the other.  In the first case, you would add something like

  resume.candidate_id = candidate.id
to your WHERE clause, and in the second case you would add something like
  candidate.resume_id = resume.id
to your WHERE clause.  I'd expect one candidate per resume, but possibly 
more than one resume per candidate, so I'd expect the first case.

Michael
Eve Atley wrote:
I think this is an easy question...I've set up a SQL statement like so:
SELECT resume.Section_Value, candidate.Location FROM resume, candidate 
WHERE
resume.Section_ID = '1' AND MATCH (resume.Section_Value) AGAINST ('html')
AND candidate.Location LIKE '%CA%' OR 'California'

--
And where 'html' should come up in 1 entry, I get duplicates when 
printing
out the field to the screen:

--
html unix network php Over 10 years of HTML experience. 2 years 
networking
administration.

html unix network php Over 10 years of HTML experience. 2 years 
networking
administration.

--
I can't decide if this is my code, or the SQL syntax. Would it be 
possible,
based on this statement, to have pulled back duplicates from the same
record?


- Eve



--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]


Re: Assistance with SQL syntax: pulling duplicates back

2004-08-17 Thread Michael Stassen
Eve,
Best to keep threads on the list.  Others may have better ideas, and future 
readers may benefit.

The comparison
  candidate.Location IN ('CA', 'California')
will match 'CA' and 'California', but will not match 'Cupertino, CA' because 
it isn't either of those strings.  To match that row as well, you would need 
to use LIKE and a wildcard (or RLIKE) instead.  Something like:

  candidate.Location LIKE '%CA' OR candidate.Location LIKE '%California'
Unfortunately, the index can't be used then because of the wildcard.  Mysql 
will have to look at every row.  The more rows you have, the worse the 
impact will be.  That may be OK if the rest of your WHERE criteria 
sufficiently pare down the number of rows first, but you'd have to test to 
be sure.

Even then, this method will generally yield incorrect matches.  For example, 
consider

  candidate.Location LIKE '%NE' OR candidate.Location LIKE '%Nebraska'
That would match 'Bangor, Maine' because it ends with 'ne'.  Or how about
  candidate.Location LIKE '%IA' OR candidate.Location LIKE '%Iowa'
That would match 'California' because it ends with 'ia'.  See the problem? 
We could reduce these by making the comparisons case-sensitive with the 
BINARY keyword:

  candidate.Location LIKE BINARY '%IA'
  OR candidate.Location LIKE '%Iowa'
That would no longer match 'California', but it would still match 'CALIFORNIA'.
Finally, consider that
candidate.Location LIKE BINARY '%CA'
OR candidate.Location LIKE '%California'
will not match 'Pasadena, California, USA'.
I doubt it's what you wanted to hear, but the problem is that the Location 
column is poorly designed.  It contains the answers to different questions. 
 That is, multiple/different kinds of data are crammed into one column. 
The only sure-fire way to perform searches by state is to have a state 
column.  You need to fix the db, and its data.  If you really cannot fix the 
db and data, you will have to live with slow queries and imperfect results.

Michael
Eve Atley wrote:
Thanks for helping me out, Michael! I've learned *several* things today.
I have ended up using:
candidate.Location IN ('CA', 'California')
However, you're correct: candidate.Location can contain more than just the
state, at times. It was the way the database was previously designed,
unfortunately. It can include:
'Cupertino, CA'
or
'CA'
or
'California'
Will this still work for what I require?


--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]