Re: LEFT JOINS same data twice?

Chris Knipe Fri, 15 Apr 2005 10:20:31 -0700

I eliminated your GROUP BY clause because you weren't actually GROUPing
anything. You weren't looking for a MIN(), a MAX(), an AVG(), or anything
else that GROUP BY was intended to be used for. That method of
de-duplicating results will return potentially misleading information as
it will randomly pick just one record from all of the records that fit
within that "group". This way, by using DISTINCT, you can see if things
change mid-flight or not (which is either an indication of bad data or
some other troubles).

Ok, again I should have explained this. The reason why FlightData is so huge and growing so rapidly, is because we take a snapshot every minute of every single flight we monitor. I therefore group by a time stamp and/or FlightTracker (Depending what data I need), so that I can only get the data returned each time the data (plane's position) actually changed. A quick example:

mysql> SELECT * FROM FlightData LIMIT 2\G
*************************** 1. row ***************************
     EntryID: 3471
     Network: V
   TimeStamp: 2005-04-14 07:27:42
     Tracker: df0363ea-fe06-1027-b1dd-00e01884a424
PilotCallSign: xxx
    PilotCID: xxx
PilotRealName: Abdul Henderson KDFW
  CurHeading: 124
      CurAlt: 34887
       CurGS: 469
       Plane: H/DC10/W
 Transponder: 7207
       QNHHg: 29.80
       QNHMb: 1009.14
     Enroute: 06:00:00
        Feul: 10:00:00
  AirportDep: KLAX
  AirportDes: MMMX
  AirportAlt:
 PositionCur: 32.06073/-114.70269
 PositionDep: 33.942536/-118.408075
 PositionDes: 19.435278/-099.072222
        PLGS: 480
       PLAlt: 35000
   PLDepTime: 640
PLFlightType: I
*************************** 2. row ***************************
     EntryID: 3472
     Network: V
   TimeStamp: 2005-04-14 07:37:42
     Tracker: df0363ea-fe06-1027-b1dd-00e01884a424
PilotCallSign: xxx
    PilotCID: xxx
PilotRealName: xxx
  CurHeading: 255
      CurAlt: 35707
       CurGS: 470
       Plane: B738
 Transponder: 7207
       QNHHg: 29.96
       QNHMb: 1014.56
     Enroute: 02:12:00
        Feul: 05:30:00
  AirportDep: KPDX
  AirportDes: KLAX
  AirportAlt:
 PositionCur: 33.94045/-118.40567
 PositionDep: 45.588722/-122.597500
 PositionDes: 33.942536/-118.408075
        PLGS: 350
       PLAlt: 34000
   PLDepTime: 220
PLFlightType: I
2 rows in set (0.00 sec)

FlightTracker therefore appears in the database a couple of hundred times for each flight. It is inserted each and every time we poll the flight, whether the data has changed or not. If I do not group therefore, my original query would return the Deparutre / Arrival for each and every occurance for FlightTracker. I do however understand what you are saying, and I was not aware that it would Group the GROUP / JOIN the entire table before adding the exceptions. This is obviously my problem as well.

Mainly, I use the TimeStamp to determine WHEN last the position of a flight was changed (and thus I know when and what change was made in the flight), whilst FlightTracker is used to "group" all the occurances of the changes in the flight.

> This command would fit between the CREATE...SELECT... and the second

SELECT... Even though the docs say that only 1 index is ever used per
query, I believe that each JOIN can also make use of an index if it exists
(this belief is based on the results of the EXPLAIN(s) of queries
containing JOINed tables. However, I could be completely wrong in this
belief.).

I was seriously hoping to be able to do this effeciently with one query. But what you are saying does make sense however - especially when the database gets bigger and we're sitting with 1GB of flight data for example. One question though, if we sit with (for example) with 1,000,000 records instead of the odd 350K records in the database now. Will it not take just as long to populate the temp table?

I do not think this technique is faster if you eliminate the WHERE clause
from the first query. But if you do have a WHERE, it should return much
faster than if you simply added a WHERE to your original query.

Ok. That is something that I was not sure about either yes. WHERE would obviously make the JOINs / GROUP BY smaller in regards to the amount of data examined and processed. I thought perhaps LIMIT would do the same, in that it would use only the 30 records specified in my example query. Obviously it JOIN / GROUP the entire table first, and only then return the first 30 records - which is indeed a big difference.

mysql> SELECT FlightData.PilotCallSign AS FLNum, FlightLogs.Tracker AS Tracker, FlightLogs.TimeStart AS TimeDep, Departure.IATA AS AirpDep, Arrival.IATA AS AirpDes, FlightLogs.Plane AS Plane FROM FlightLogs LEFT JOIN Airports AS Departure ON FlightLogs.AirpDep=Departure.ICAO LEFT JOIN Airports AS Arrival ON FlightLogs.AirpDes=Arrival.ICAO LEFT JOIN FlightData ON FlightLogs.Tracker=FlightData.Tracker GROUP BY FlightData.Tracker ORDER BY FlightLogs.TimeStart LIMIT 30; +---------+--------------------------------------+---------------------+---------+---------+----------+ | FLNum | Tracker | TimeDep | AirpDep | AirpDes | Plane | +---------+--------------------------------------+---------------------+---------+---------+----------+ <SNIP> +---------+--------------------------------------+---------------------+---------+---------+----------+ 30 rows in set (14.43 sec)

mysql> SELECT FlightData.PilotCallSign AS FLNum, FlightLogs.Tracker AS Tracker, FlightLogs.TimeStart AS TimeDep, Departure.IATA AS AirpDep, Arrival.IATA AS AirpDes, FlightLogs.Plane AS Plane FROM FlightLogs LEFT JOIN Airports AS Departure ON FlightLogs.AirpDep=Departure.ICAO LEFT JOIN Airports AS Arrival ON FlightLogs.AirpDes=Arrival.ICAO LEFT JOIN FlightData ON FlightLogs.Tracker=FlightData.Tracker WHERE FlightLogs.CCID='927968' GROUP BY FlightData.Tracker ORDER BY FlightLogs.TimeStart; +--------+--------------------------------------+---------------------+---------+---------+--------+ | FLNum | Tracker | TimeDep | AirpDep | AirpDes | Plane | +--------+--------------------------------------+---------------------+---------+---------+--------+ <SNIP> +--------+--------------------------------------+---------------------+---------+---------+--------+ 2 rows in set (0.02 sec)

MAJOR difference indeed ;) I'll check out the temp tables a bit later after I had some sleep. I think I'll need it for what I'm going to be exploring here... hehehe One last question... A sub select may not perhaps speed things up as well?

--
Chris.


--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    http://lists.mysql.com/[EMAIL PROTECTED]

Re: LEFT JOINS same data twice?

Reply via email to