Re: [GENERAL] JOIN issues (Left vs Right for sorting), and "Nested Loop" problem

Alban Hertroys Sat, 01 Sep 2007 05:17:16 -0700


On Sep 1, 2007, at 11:46, Phoenix Kiula wrote:

Hello,

I have a simple query as follows. It joins two very straightforwardtables.



SELECT
  trades.id,
  trades.url,
  trades.alias,
  tradecount.t_count,
  tradecount.u_count
FROM trades
LEFT JOIN tradecount ON trades.id = tradecount.id
WHERE trades.user_id = 'jondoe' and trades.status = 'Y'
ORDER BY
  tradecount.u_count desc
OFFSET 20 LIMIT 10


Both the tables have a bigint "id" field that connects them. The table
definitions are included below:




                                Table "public.trades"

Column | Type |Modifiers-----------------------+-----------------------------+------------------------------

 id                    | bigint                      | not null
 user_id               | character varying(45)       | not null
 url                   | text                        | not null
 alias                 | character varying(20)       | not null
 title                 | character varying(500)      |
 private               | character(1)                |

status | character(1) | default'Y'::bpchar

 modify_date           | timestamp without time zone |
 disable_in_statistics | character(1)                | not null
default 'N'::bpchar
Indexes:
    "trades_pkey" PRIMARY KEY, btree (id)
    "trades_unique_alias" UNIQUE, btree (alias)
    "idx_trades_mdate" btree (modify_date)
    "idx_trades_userid" btree (user_id)
Check constraints:
    "trades_alias_valid" CHECK (alias::text ~ '[-A-Za-z0-9_]'::text)
    "trades_id_check" CHECK (id > 0)
    "trades_url_check" CHECK (url <> ''::text)
    "trades_user_id_check" CHECK (user_id::text <> ''::text)





                    Table "public.tradecount"

    Column    |            Type             |     Modifiers
--------------+-----------------------------+--------------------
 id           | bigint                      | not null
 t_count  | integer                     | not null default 0
 u_count | integer                     | not null default 0
 modify_date  | timestamp without time zone | default now()
Indexes:
    "tradecount_pkey" PRIMARY KEY, btree (id)
    "i_tradecount_uc" btree (u_count)
    "i_tradecount_vc" btree (t_count)
Foreign-key constraints:
    "fk_tradecount_trades_id" FOREIGN KEY (id) REFERENCES trades(id)
ON DELETE CASCADE
Rules:
    replace_tradecount_on_duplicate_insert AS
    ON INSERT TO tradecount
   WHERE (EXISTS ( SELECT 1
           FROM tradecount
          WHERE tradecount.id = new.id)) DO INSTEAD  UPDATE tradecount
SET t_count = tradecount.t_count, u_count = tradecount.u_count
  WHERE tradecount.id = new.id




Now I have two problems:


1. The above query takes more time to fire up that an index should
really take. I have bitmap heap scan off in conf file, and indexscan
on, otherwise this was going into a bitmap heap thing.

As you will see from the SQL above, the trades.user_id index should be
limiting the number of rows to a few hundred (or thousand at max) and
then we are trying to get only 10 tuples based on the OFFSET and LIMIT
clauses.

However, there's a nested loop in there as the EXPLAIN ANALYZE shows
below. What is causing this nested loop?

It looks like it's used to match trades to tradecounts. I think thatmakes sense, as the number of matching records from both tables isn'tnecessarily equal. The query is looping over trades until eachtradecount has all its trades (for user 'jondoe' with status 'Y')associated.

It is kind of confusing that you're using the id column intradecounts for both primary key and foreign key, and I'm not surewhat that implies to the query planner. It suggests that there can beonly (up to) one tradecounts record for each trade count, but itappears that either the planner doesn't realise that...


Is 10 ms problematic for this query?

QUERY PLAN

------------------------------------------------------------------------------------------------------------------------

 Limit  (cost=4829.70..4829.73 rows=10 width=125) (actual
time=9.784..9.835 rows=10 loops=1)
   ->  Sort  (cost=4829.65..4830.61 rows=385 width=125) (actual
time=9.703..9.757 rows=30 loops=1)
         Sort Key: tradecount.u_count
         ->  Nested Loop Left Join  (cost=0.00..4813.12 rows=385
width=125) (actual time=0.075..8.662 rows=386 loops=1)
               ->  Index Scan using idx_trades_userid on trades
(cost=0.00..1556.08 rows=385 width=117) (actual time=0.05
0..1.225 rows=386 loops=1)
                     Index Cond: ((user_id)::text = 'jondoe'::text)
                     Filter: (status = 'Y'::bpchar)
               ->  Index Scan using tradecount_pkey on tradecount
(cost=0.00..8.45 rows=1 width=16) (actual time=0.006.
.0.008 rows=1 loops=386)
                     Index Cond: (trades.id = tradecount.id)
 Total runtime: 9.963 ms
(10 rows)




2. Secondly, if I want to sort the join by a column on the second
table, then the rows returned are not really sorted unless I do a
RIGHT JOIN (my sql above shows a LEFT JOIN). Getting results from a
right join is fine as long as the column is not null in the second
table, but if it is null, then nothing is returned. This is why I do a
LEFT join in the first place! So my question: how can I do a left
join, which is the logic that I wish to accomplish, but get the
sorting to work from the second table and if a column is null then
just return as 0 instead of nothing at all? (The LEFT JOIN used to
work in Mysql).


That's very odd, the right join should work fine.

You constrain tradecounts to require a matching record in trades, soa right join with tradecounts can not return NULL values for columnsin trades; Except where status is null (which is possible), in whichcase the record doesn't match your WHERE-clause. Are you sure youdon't have NULL values for statuses?

You don't say what "not really sorted" means in your left joins, butI expect the rows with NULL values for u_count to be grouped togetherat the top (in no particular order, you didn't specify any otherorder than by u_count) and the rest ordered by u_count as expected.

You could use ORDER BY COALESCE(tradecount.u_count, 0) desc if youwant it to behave like you say mysql sorted it.

Apparently mysql treats NULL values as 0 when ordering? Or do theyjust order them first instead of last like PG does (which is just amatter of preference, really)?

You should realize that NULL means 'unknown', so theoretically youcould encounter databases that put them at "random" places in yourotherwise sorted result set, not touching their position among otherrecords because they can't know how to compare them.

If you want certain behaviour from NULL values you'll need to specifywhat you want or expect surprises ;)


--
Alban Hertroys
[EMAIL PROTECTED]

magproductions b.v.

T: ++31(0)534346874
F: ++31(0)534346876
M:
I: www.magproductions.nl
A: Postbus 416
   7500 AK Enschede

// Integrate Your World //




!DSPAM:737,46d95276289901944772347!



---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
      choose an index scan if your joining column's datatypes do not
      match

Re: [GENERAL] JOIN issues (Left vs Right for sorting), and "Nested Loop" problem

Reply via email to