Hello, it's me, a Postgres n00b again. I'm dealing with a query that scans a rather large table (94,000,000 tuples or so) and just picks out certain rows and sums them:
select dci.snapshot_time as "time", round(sum(dci.todays_pl)::numeric,0) as pl from dbo._pl_data_cache_intraday dci where dci.snapshot_time between '2014-03-25 11:32:40.004552-05'::timestamptz and '2014-03-25 12:02:40.015177-05'::timestamptz and dci.symbol in (select sec.symbol from dbo.security_underliers sec where sec.ultimate_underlier = 'SPY') and dci.manager = 'BJC' and dci.account in (select account from pl2.visible_accounts where is_fund) group by dci.snapshot_time order by dci.snapshot_time; For the most part, Postgres is doing the right thing: snapshot_time is the lead column in all of the table's indexes, so it's able to pick up the source rows fairly quickly in its index scan. It's also enforcing "dci.manager = 'BJC'" in the same scan, and does a Hash Semi Join for "dci.symbol in (...)". The trouble comes when enforcing the "dci.account in (...)" search condition: pl2.visible_accounts is a view that determines which accounts the current user can see, which, depending on who you are, can be several hundred or none at all. Postgres estimates the output of this query as two rows, but in my case, it's actually 240. Unfortunately, that leads the query planner to try to think a nested loop is cheap enough to enforce this, when actually it's really expensive. If I hard-code the results from pl2.visible_accounts, Postgres will do a hash semi join for me, which is much faster, but then I have to wrap up this whole query as a function in order to preserve its security properties. Not only is that the situation I was trying to avoid, it means I can't use EXPLAIN for my query anymore. I've noticed I can also do the really sneaky "dci.account in (select unnest(array_agg(account)) from pl2.visible_accounts)", which tricks the estimator into thinking there will be 100 rows. That _really_ feels like cheating. Besides the above, is there anything I can do to get Postgres to do a hash instead of a nested loop? --Brian -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general