Re: select where multiple joined records match

AM Thomas Wed, 16 Feb 2005 14:23:38 -0800

Hi there,

I'll answer your questions below, but I should tell you that it looks like even three or four "joins" seem to make this too slow. Theoretically, the user could select up to 15 or so GoalNumber values, 12 for Grade, and about 20 possibilities for Subject - clearly it wouldn't be useful to specify that many items in a search, but that's the theoretical maximum; 4 or 5 values isn't unreasonable, though.

Four "AND"ed goal numbers plus a subject and a grade slowed the search (on the shared commercial web host I'm using) into the 3 minute range, and that's with a regular join, not a left join. This is the SELECT that took about 3 minutes (3 trials, simplifying slightly each time, simplest given here):

select r.id from resources as r
  join resource_goals as g0 on (r.id=g0.ResourceID)
  join resource_goals as g1 on (r.id=g1.ResourceID and g1.GoalNumber=1)
  join resource_goals as g2 on (r.id=g2.ResourceID and g2.GoalNumber=2)
  join resource_goals as g3 on (r.id=g3.ResourceID and g3.GoalNumber=3)
  join resource_goals as g4 on (r.id=g4.ResourceID and g4.GoalNumber=4)
where ((g0.Subject='Social_Studies')
  and (g0.Grade='4th'))
group by r.id;

The fastest time was 2 min 48 sec.
Last time (simplest query) was 3 min 2 sec.

I'm really running out of time on this project, so I just went ahead and made the user interface such that users can only select one subject, grade, and/or goal number at a time. It's probably a sound decision from a usability perspective, so I'm not too sad. If I decide to make this work in the future, I'd probably have to just do a SELECT for each "AND"ed field, get the list of resource id's for each SELECT, then find the intersection of the lists in Perl.

If I could speed this up with some kind of indexing, I'd love to know about it.

The GROUP BY phrase is because I wanted just one row per resource. It seemed like I'd get a row for each condition/resource (didn't test it with the final ). I'm actually doing SELECT * FROM... in my code, and not using the resource_goals information in my output (that's a separate view at present, generated by different Perl code).

It seems to work fine without the NOT NULL parts, you're right. I was wondering about that, but was sleepy enough at the time that I didn't trust my thinking.

Sorry about not including my table defs :-( . I guess I was just hoping for a general approach, and didn't realize that anyone would be interested enough to read all that detail and provide and exact solution for me. Of course, now I realize that it would have simplified our discussion. Anyway, late but not never, and for help to whoever finds this in the list archives someday, here are my table defs (you're right - the goal number is a TINYINT):

(Below is an abridged version of the resources table ; it also contains about 60 more TINYINT fields which are essentially used as booleans, some of which I hope to eliminate. Yes, I could have used SET or something, but I didn't for various reasons.)

CREATE TABLE resources (
  id INT UNSIGNED PRIMARY KEY,
  Title TEXT,
  ResourceType_THJHArticle TINYINT,
  ResourceType_NIEArticle TINYINT,
  DataEntryName TINYTEXT,
  Date DATETIME,
  Notes TEXT,
  Made_Keywords TEXT);

CREATE TABLE  resource_goals (
  goal_id INT UNSIGNED PRIMARY KEY,
  ResourceID INT,
  Grade TINYTEXT,
  Subject TINYTEXT,
  GoalNumber TINYINT,
  NumericGrade TINYINT);

Thanks a bunch for your help; I'm finding this more interesting than I thought I would.

On Wed, 16 Feb 2005 11:08:20 -0500, Michael Stassen <[EMAIL PROTECTED]> wrote:

AM Thomas wrote:
Guarded exclamations of success! This seems to be working for me so far - thank you! Here's an actual example of a working query on my database; the field names are a little different (sorry, I was trying to make my earlier example more readable). The main change, though, is that I did plain 'join' instead of 'left join', which seems to make it much faster. It was pretty slow at first.
Yes, LEFT JOIN does extra work, and it wasn't needed here.
mysql> select r.id, ga.Grade, ga.GoalNumber, gb.GoalNumber from resources as r join resource_goals as ga on r.id=ga.ResourceID and ga.Grade='4th' and ga.GoalNumber='1' join resource_goals as gb on r.id=gb.ResourceID and gb.Grade='4th' and gb.GoalNumber='2' where ga.goal_id IS NOT NULL and gb.goal_id IS NOT NULL group by r.id;
There are some strange things here, I think.
* You've never shown us your table definitions, but I would have expected GoalNumber to be an integer, not a string. If so, you shouldn't quote the numbers you compare it to.

* I think it unlikely that you have rows with values in Grade, GoalNumber, and ResourceID which have NULL for goal_id. Isn't goal_id the primary key? If I'm right, you don't need your WHERE clause.

* Why have you added "GROUP BY r.id"? Was that an attempt to fix something? If we've got the query right, there should be no need for grouping. If you don't get the result you want without the GROUP BY, then you should let us know, because that would mean we've missed something.
Putting those together, I'd expect
   SELECT r.id, ga.Grade, ga.GoalNumber, gb.GoalNumber
   FROM resources AS r
   JOIN resource_goals AS ga
      ON r.id = ga.ResourceID
     AND ga.Grade = '4th'
     AND ga.GoalNumber = 1
   JOIN resource_goals AS gb
      ON r.id = gb.ResourceID
     AND gb.Grade = '4th'
     AND gb.GoalNumber = 2
to do the job.  Does it?
Michael


--
Virtue of the Small / (919) 929-8687

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    http://lists.mysql.com/[EMAIL PROTECTED]

Re: select where multiple joined records match

Reply via email to