Re: [PERFORM] query of partitioned object doesnt use index in qa
On 09/16/2017 12:05 AM, Tom Lane wrote: > Andres Freundwrites: >> I'm wondering if we should extend the new CREATE STATISTICS >> framework to be able to do that without requiring an index. > > I think that's already on the roadmap --- it's one of the reasons we > ended up with a SELECT-like syntax for CREATE STATISTICS. But it > didn't get done for v10. > Right. It's one of the things I'd like to be working on after getting in the more complex statistics types (MCV & histograms). > If we do look at that as a substitute for "make an expression index > just so you get some stats", it would be good to have a way to > specify that you only want the standard ANALYZE stats on that value > and not the extended ones. > Not sure I understand what you mean by "extended" - the statistics we collect for expression indexes, or the CREATE STATISTICS stuff? I assume the former, because if you don't want the latter then just don't create the statistics. Or am I missing something? regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] query of partitioned object doesnt use index in qa
Andres Freundwrites: > I'm wondering if we should extend the new CREATE STATISTICS framework to > be able to do that without requiring an index. I think that's already on the roadmap --- it's one of the reasons we ended up with a SELECT-like syntax for CREATE STATISTICS. But it didn't get done for v10. If we do look at that as a substitute for "make an expression index just so you get some stats", it would be good to have a way to specify that you only want the standard ANALYZE stats on that value and not the extended ones. regards, tom lane -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] query of partitioned object doesnt use index in qa
On September 15, 2017 1:42:23 PM PDT, Tom Lanewrote: >One thing you could consider doing about this is creating an index >on (body ->> 'SID'::text), which would prompt ANALYZE to gather >statistics >about that expression. Even if the index weren't actually used in the >plan, this might improve the estimates and the resulting planning >choices >enough to make it worth maintaining such an index. I'm wondering if we should extend the new CREATE STATISTICS framework to be able to do that without requiring an index. I.e. allow expressions and add a new type of stats that just correspond to what normal columns have. Could even create that implicitly for expression indexes, but allow to drop it, if the overtrading isn't worth it. Andres -- Sent from my Android device with K-9 Mail. Please excuse my brevity. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] query of partitioned object doesnt use index in qa
That makes a lot of sense, thanks for taking a look. An index like you suggest would probably further improve the query. Is that suggestion sidestepping the original problem that production is evaluating the landing_id bit with the partition index and qa is sequence scanning instead? AND exists (select 1 from t_sap where e.landing_id = t_sap.landing_id)) as rankings; Based on the difference in row estimate I am attempting an analyze with a higher default_statistic_target (currently 100) to see if that helps. On Fri, Sep 15, 2017 at 3:42 PM, Tom Lanewrote: > Mike Broers writes: > > If Im reading this correctly postgres thinks the partition will return > 6.5 > > million matching rows but actually comes back with 162k. Is this a case > > where something is wrong with the analyze job? > > You've got a lot of scans there that're using conditions like > > > │ -> Seq Scan on event__ e_1 > (cost=0.00..2527828.05 rows=11383021 width=778) (actual > time=25522.389..747238.885 rows=42 loops=1) > > │ Filter: (((body ->> 'SID'::text) IS > NOT NULL) AND (validation_status_code = 'P'::bpchar)) > > │ Rows Removed by Filter: 12172186 > > While I'd expect the planner to be pretty solid on estimating the > validation_status_code condition, it's not going to have any idea about > that JSON field test. That's apparently very selective, but you're just > getting a default estimate, which is not going to think that a NOT NULL > test will exclude lots of rows. > > One thing you could consider doing about this is creating an index > on (body ->> 'SID'::text), which would prompt ANALYZE to gather statistics > about that expression. Even if the index weren't actually used in the > plan, this might improve the estimates and the resulting planning choices > enough to make it worth maintaining such an index. > > Or you could think about pulling that field out and storing it on its own. > JSON columns are great for storing random unstructured data, but they are > less great when you want to do relational-ish things on subfields. > > regards, tom lane >
[PERFORM] Indexing an array of two separate columns
I have a user who is trying to match overlapping duplicate phone info but for different customer_ids. The intended conditional could be expressed: IF the intersection of the sets {c.main_phone, c.secondary_phone} and {c1.main_phone, c1.secondary_phone} is not empty THEN join EXCEPT where the intersection of the sets = {'00'} He wants a join like this: FROM customers c INNER JOIN customers c1 on (array[c.main_phone, c.secondary_phone] && array[nullif(c1.main_phone, '00') , nullif(c1.secondary_phone, '00')]) (array[c.main_phone, c.secondary_phone] && array[nullif(c1.main_phone, '00') , nullif(c1.secondary_phone, '00')]) WHERE c.customer_id = 1; I want to index this part: array[nullif(c1.main_phone, '00') , nullif(c1.secondary_phone, '00')] First of all I see I can't create a btree index on an array. And with btree_gin, this index is not being used: CREATE INDEX ON customers USING gin ((NULLIF(main_phone, '00'::text)), (NULLIF(secondary_phone, '00'::text))); What am I missing here? Is there a way to support a condition like this? Thank you!
Re: [PERFORM] query of partitioned object doesnt use index in qa
Mike Broerswrites: > If Im reading this correctly postgres thinks the partition will return 6.5 > million matching rows but actually comes back with 162k. Is this a case > where something is wrong with the analyze job? You've got a lot of scans there that're using conditions like > │ -> Seq Scan on event__ e_1 > (cost=0.00..2527828.05 rows=11383021 width=778) (actual > time=25522.389..747238.885 rows=42 loops=1) > │ Filter: (((body ->> 'SID'::text) IS NOT > NULL) AND (validation_status_code = 'P'::bpchar)) > │ Rows Removed by Filter: 12172186 While I'd expect the planner to be pretty solid on estimating the validation_status_code condition, it's not going to have any idea about that JSON field test. That's apparently very selective, but you're just getting a default estimate, which is not going to think that a NOT NULL test will exclude lots of rows. One thing you could consider doing about this is creating an index on (body ->> 'SID'::text), which would prompt ANALYZE to gather statistics about that expression. Even if the index weren't actually used in the plan, this might improve the estimates and the resulting planning choices enough to make it worth maintaining such an index. Or you could think about pulling that field out and storing it on its own. JSON columns are great for storing random unstructured data, but they are less great when you want to do relational-ish things on subfields. regards, tom lane -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] query of partitioned object doesnt use index in qa
Query finally came back with an explain analyze :) If Im reading this correctly postgres thinks the partition will return 6.5 million matching rows but actually comes back with 162k. Is this a case where something is wrong with the analyze job? Seq Scan on event__00071000 e_4 (cost=0.00..2204374.94 rows=6523419 width=785) (actual time=7020.509..448368.247 rows=162912 loops=1) ┌───┐ │ QUERY PLAN │ ├───┤ │ Subquery Scan on rankings (cost=45357272.27..47351629.37 rows=39887142 width=24) (actual time=6117566.189..6117619.805 rows=25190 loops=1) │ │ -> WindowAgg (cost=45357272.27..46952757.95 rows=39887142 width=772) (actual time=6117566.101..6117611.266 rows=25190 loops=1) │ │ -> Sort (cost=45357272.27..45456990.12 rows=39887142 width=772) (actual time=6117566.054..6117572.121 rows=25190 loops=1) │ │ Sort Key: ((e.body ->> 'SID'::text)), ((e.body ->> 'Timestamp'::text)) DESC │ │ Sort Method: quicksort Memory: 13757kB │ │ -> Hash Join (cost=46.38..24740720.18 rows=39887142 width=772) (actual time=1511499.761..6117335.382 rows=25190 loops=1) │ │ Hash Cond: (e.landing_id = t_sap.landing_id) │ │ -> Append (cost=0.00..24387085.38 rows=79774283 width=776) (actual time=25522.442..6116672.504 rows=2481659 loops=1) │ │ -> Seq Scan on event e (cost=0.00..1.36 rows=1 width=97) (actual time=0.049..0.049 rows=0 loops=1) │ │ Filter: (((body ->> 'SID'::text) IS NOT NULL) AND (validation_status_code = 'P'::bpchar)) │ │ Rows Removed by Filter: 24 │ │ -> Seq Scan on event__ e_1 (cost=0.00..2527828.05 rows=11383021 width=778) (actual time=25522.389..747238.885 rows=42 loops=1) │ │ Filter: (((body ->> 'SID'::text) IS NOT NULL) AND (validation_status_code = 'P'::bpchar)) │ │ Rows Removed by Filter: 12172186 │ │ -> Seq Scan on event__00069000 e_2 (cost=0.00..1462613.93 rows=5957018 width=771) (actual time=4486.295..370098.760 rows=183696 loops=1) │ │ Filter: (((body ->> 'SID'::text) IS NOT NULL) AND (validation_status_code = 'P'::bpchar)) │ │ Rows Removed by Filter: 6956029 │ │ -> Seq Scan on event__0007 e_3 (cost=0.00..1534702.41 rows=5991507 width=787) (actual time=3415.907..361606.800 rows=199081 loops=1) │ │ Filter: (((body ->> 'SID'::text) IS NOT NULL) AND (validation_status_code = 'P'::bpchar)) │ │ Rows Removed by Filter: 7177444 │ │ -> Seq Scan on event__00071000 e_4 (cost=0.00..2204374.94 rows=6523419 width=785) (actual time=7020.509..448368.247 rows=162912 loops=1) │ │ Filter: (((body ->> 'SID'::text) IS NOT NULL) AND (validation_status_code = 'P'::bpchar)) │ │ Rows Removed by Filter: 8091470 │ │ -> Seq Scan on event__00072000 e_5 (cost=0.00..1531430.89 rows=5814704 width=792) (actual time=25.304..343612.826 rows=214891 loops=1) │ │ Filter: (((body ->> 'SID'::text) IS NOT NULL) AND (validation_status_code = 'P'::bpchar)) │ │ Rows Removed by Filter: 7301151 │ │ -> Seq Scan on event__00073000 e_6 (cost=0.00..1384865.48 rows=5876959 width=767) (actual time=1631.133..424827.603 rows=163959 loops=1) │ │ Filter: (((body ->> 'SID'::text) IS NOT NULL) AND (validation_status_code = 'P'::bpchar)) │ │ Rows Removed by Filter: 6523673 │ │ -> Seq Scan on event__00074000 e_7 (cost=0.00..1289048.37 rows=4747343 width=801) (actual time=3287.286..280317.057 rows=204394 loops=1) │ │ Filter: (((body ->> 'SID'::text) IS NOT NULL) AND (validation_status_code = 'P'::bpchar)) │ │ Rows Removed by