Re: R: R: [GENERAL] Slow queries on very big (and partitioned) table

2017-02-21 Thread Tom Lane
Stephen Frost  writes:
> * Job (j...@colliniconsulting.it) wrote:
>> We are thinking of implementing one partition per day.
>> Do you think that would be fine?

> Really depends on what you're doing.  If you're running very short
> queries that pull out just a record or a few records, then you're going
> to be unhappy with the planning time required when you have hundreds or
> thousands of partitions, which is why I typically recommend against
> using partitions-by-day unless you're only keeping a few months' worth of
> data.

Or to put it more simply: if you have more than O(100) partitions,
you're doing it wrong.  There is a cost to subdividing things too finely.

The improved partitioning support that's going into v10 will probably
allow more partitions before it really starts to groan, but it'll still
not be a great idea to create more than the minimum number of partitions
you really need.

regards, tom lane


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: R: R: [GENERAL] Slow queries on very big (and partitioned) table

2017-02-21 Thread Stephen Frost
Greetings,

* Job (j...@colliniconsulting.it) wrote:
> >If this is really what you're mostly doing, having constraint exclusion and
> >an index on 'profile' would probably be enough, if you insist on continuing
> >to have the table partitioned by day (which I continue to argue is a bad
> >idea- based on the number of total rows you mentioned and the number of
> >partitions, you have partitions with less than 20M rows each and that's
> >really small; month-based partitions with a BRIN would probably work
> >better).  If you get to the point of having years' worth of daily
> >partitions, you're going to see increases in planning time.
> 
> Based on our simulation, we can reach up to 5 GB of data for a single day,
> with a few million rows.

5 GB and a few million rows isn't actually all that much.

> We are thinking of implementing one partition per day.
> Do you think that would be fine?

Really depends on what you're doing.  If you're running very short
queries that pull out just a record or a few records, then you're going
to be unhappy with the planning time required when you have hundreds or
thousands of partitions, which is why I typically recommend against
using partitions-by-day unless you're only keeping a few months' worth of
data.

Thanks!

Stephen




Re: R: [GENERAL] Slow queries on very big (and partitioned) table

2017-02-21 Thread Rakesh Kumar
> We are thinking of implementing one partition per day.

That would be 365 partitions in a year.

In our experience INSERTs suffer the most in a partitioned table, because
triggers are the only way to route each row to the proper child (inherited)
table.

Question: what is your insert pattern? Do you always insert rows for the
current date? In that case you can write the trigger code to check the
current date first, so that the insert trigger finds the matching partition
as early as possible (see the sketch below).

SELECTs, UPDATEs and DELETEs fare reasonably well by comparison.
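
For example, a routing trigger along these lines is one way to do that. This
is only a sketch: the parent table, the daily child-table naming convention
(webtraffic_archive_day_YYYY_MM_DD) and the "timestamp" column are assumptions
taken from the schema posted elsewhere in this thread.

CREATE OR REPLACE FUNCTION webtraffic_archive_insert_trigger()
RETURNS trigger AS $$
BEGIN
    -- Most rows are for "today", so test that case first.
    IF NEW."timestamp" >= date_trunc('day', now())
       AND NEW."timestamp" <  date_trunc('day', now()) + interval '1 day' THEN
        EXECUTE format('INSERT INTO %I SELECT ($1).*',
                       'webtraffic_archive_day_' || to_char(now(), 'YYYY_MM_DD'))
          USING NEW;
    ELSE
        -- Otherwise route by the row's own timestamp.
        EXECUTE format('INSERT INTO %I SELECT ($1).*',
                       'webtraffic_archive_day_' || to_char(NEW."timestamp", 'YYYY_MM_DD'))
          USING NEW;
    END IF;
    RETURN NULL;   -- the row has already been written to the child table
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER webtraffic_archive_insert
    BEFORE INSERT ON webtraffic_archive
    FOR EACH ROW EXECUTE PROCEDURE webtraffic_archive_insert_trigger();

Building the child-table name dynamically avoids maintaining a long IF/ELSIF
chain, at the cost of dynamic SQL on every insert; a hard-coded chain with the
current date tested first is the faster (if more tedious) alternative.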

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


R: R: [GENERAL] Slow queries on very big (and partitioned) table

2017-02-21 Thread Job
Hi Stephen,

Thank you for your excellent advice!

>If this is really what you're mostly doing, having constraint exclusion and an
>index on 'profile' would probably be enough, if you insist on continuing to
>have the table partitioned by day (which I continue to argue is a bad idea-
>based on the number of total rows you mentioned and the number of partitions,
>you have partitions with less than 20M rows each and that's really small;
>month-based partitions with a BRIN would probably work better).  If you get to
>the point of having years' worth of daily partitions, you're going to see
>increases in planning time.

Based on our simulation, we can reach up to 5 GB of data for a single day,
with a few million rows.
We are thinking of implementing one partition per day.
Do you think that would be fine?

Thank you!
/F

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: R: [GENERAL] Slow queries on very big (and partitioned) table

2017-02-20 Thread Steven Winfield
> Unfortunately, that'll require locking each table and scanning it to make 
> sure that the CHECK constraint isn't violated.

Actually, CHECK constraints can be added with the NOT VALID clause. 
New tuples will be checked immediately, while the validation of existing tuples 
can be done later using ALTER TABLE ... VALIDATE CONSTRAINT ... which takes a 
less invasive lock than if you'd omitted NOT VALID.
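
For example (a sketch only; the constraint name is made up, and the table is
the one posted elsewhere in the thread):

ALTER TABLE webtraffic_archive_day_2017_02_20
    ADD CONSTRAINT webtraffic_archive_day_2017_02_20_ts_check
    CHECK ("timestamp" >= DATE '2017-02-20' AND "timestamp" < DATE '2017-02-21')
    NOT VALID;

-- Later, scan the existing rows under the less invasive lock:
ALTER TABLE webtraffic_archive_day_2017_02_20
    VALIDATE CONSTRAINT webtraffic_archive_day_2017_02_20_ts_check;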

Steve.



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: R: [GENERAL] Slow queries on very big (and partitioned) table

2017-02-20 Thread Stephen Frost
Greetings,

* Job (j...@colliniconsulting.it) wrote:
> here is, first, one of the partition tables (for the 20/02/2017 logs):
> flashstart=# \d webtraffic_archive_day_2017_02_20;
>             Table "public.webtraffic_archive_day_2017_02_20"
>   Column   |            Type             |                            Modifiers
> -----------+-----------------------------+----------------------------------------------------------------------------------
>  id        | numeric(1000,1)             | not null default function_get_next_sequence('webtraffic_archive_id_seq'::text)
>  timestamp | timestamp without time zone |
>  domain    | character varying(255)      |
>  action    | integer                     |
>  profile   | character varying(50)       |
>  accessi   | integer                     |
>  url       | text                        |
> Indexes:
>     "webtraffic_archive_day_2017_02_20_action_wbidx" btree (action)
>     "webtraffic_archive_day_2017_02_20_domain_wbidx" btree (domain)
>     "webtraffic_archive_day_2017_02_20_profile_wbidx" btree (profile)
>     "webtraffic_archive_day_2017_02_20_timestamp_date_wbidx" brin (("timestamp"::date))
>     "webtraffic_archive_day_2017_02_20_timestamp_time_wbidx" btree (("timestamp"::time without time zone))
>     "webtraffic_archive_day_2017_02_20_timestamp_wbidx" brin ("timestamp")

You don't have a CHECK constraint on your individual partitions, so PG
has no idea which of the partitions contains data from what time-period.
To have PostgreSQL's constraint exclusion work to eliminate entire
partitions, you need to add a CHECK constraint on the partition key.  In
this case, it looks like you'd want something like:

CHECK (timestamp >= partition_date AND timestamp < partition_date+1)
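
Concretely, for the 2017-02-20 partition shown above that would be something
along the lines of (the constraint name is purely illustrative):

ALTER TABLE webtraffic_archive_day_2017_02_20
    ADD CONSTRAINT webtraffic_archive_day_2017_02_20_ts_check
    CHECK ("timestamp" >= DATE '2017-02-20' AND "timestamp" < DATE '2017-02-21');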

Unfortunately, that'll require locking each table and scanning it to
make sure that the CHECK constraint isn't violated.

A couple of other notes- you probably don't need both a BRIN and a btree
index on the same column.  If the BRIN works well enough for you then
you can drop the btree index.  If it doesn't, then you might as well
just keep the btree index and drop the BRIN.  It seems equally
unnecessary to have a BRIN on a cast of the column.  You should also be
using timestamp w/ timezone, really.

> This is the query:
> explain analyze
> select * from webtraffic_archive where timestamp::date='2017-02-20' and profile='f62467'

If this is really what you're mostly doing, having constraint exclusion
and an index on 'profile' would probably be enough, if you insist on
continuing to have the table partitioned by day (which I continue to
argue is a bad idea- based on the number of total rows you mentioned and
the number of partitions, you have partitions with less than 20M rows
each and that's really small; month-based partitions with a BRIN would
probably work better).  If you get to the point of having years' worth of
daily partitions, you're going to see increases in planning time.
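
For illustration, a month-based partition could look something like this (a
sketch only; the child-table name, index names and the inheritance setup are
assumptions based on the schema you posted):

CREATE TABLE webtraffic_archive_month_2017_02 (
    CHECK ("timestamp" >= DATE '2017-02-01' AND "timestamp" < DATE '2017-03-01')
) INHERITS (webtraffic_archive);

CREATE INDEX webtraffic_archive_month_2017_02_ts_brin
    ON webtraffic_archive_month_2017_02 USING brin ("timestamp");

CREATE INDEX webtraffic_archive_month_2017_02_profile_idx
    ON webtraffic_archive_month_2017_02 (profile);

That keeps the total partition count at twelve per year, and the BRIN keeps
the timestamp index very small even with a whole month of data per child.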

Thanks!

Stephen




R: [GENERAL] Slow queries on very big (and partitioned) table

2017-02-20 Thread Job
Hi,



here is, first, one of the partition tables (for the 20/02/2017 logs):
flashstart=# \d webtraffic_archive_day_2017_02_20;
            Table "public.webtraffic_archive_day_2017_02_20"
  Column   |            Type             |                            Modifiers
-----------+-----------------------------+----------------------------------------------------------------------------------
 id        | numeric(1000,1)             | not null default function_get_next_sequence('webtraffic_archive_id_seq'::text)
 timestamp | timestamp without time zone |
 domain    | character varying(255)      |
 action    | integer                     |
 profile   | character varying(50)       |
 accessi   | integer                     |
 url       | text                        |
Indexes:
    "webtraffic_archive_day_2017_02_20_action_wbidx" btree (action)
    "webtraffic_archive_day_2017_02_20_domain_wbidx" btree (domain)
    "webtraffic_archive_day_2017_02_20_profile_wbidx" btree (profile)
    "webtraffic_archive_day_2017_02_20_timestamp_date_wbidx" brin (("timestamp"::date))
    "webtraffic_archive_day_2017_02_20_timestamp_time_wbidx" btree (("timestamp"::time without time zone))
    "webtraffic_archive_day_2017_02_20_timestamp_wbidx" brin ("timestamp")


and here is the "master" table:
  Column   |            Type             |                            Modifiers
-----------+-----------------------------+----------------------------------------------------------------------------------
 id        | numeric(1000,1)             | not null default function_get_next_sequence('webtraffic_archive_id_seq'::text)
 timestamp | timestamp without time zone |
 domain    | character varying(255)      |
 action    | integer                     |
 profile   | character varying(50)       |
 accessi   | integer                     |
 url       | text                        |
Indexes:
    "keywebrecord_archive" PRIMARY KEY, btree (id)


This is the query:
explain analyze
select * from webtraffic_archive where timestamp::date='2017-02-20' and profile='f62467'

And here are the results (after many seconds). The index seems to work. A query
against a single partition is very fast; the problem is with the entire table.
Thank you!
/F



QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..12.57 rows=10 width=71) (actual time=1319111.913..1319111.973 rows=10 loops=1)
   ->  Append  (cost=0.00..144674.15 rows=115133 width=71) (actual time=1319111.909..1319111.964 rows=10 loops=1)
         ->  Seq Scan on webtraffic_archive  (cost=0.00..0.00 rows=1 width=953) (actual time=0.006..0.006 rows=0 loops=1)
               Filter: (((profile)::text = 'f62467'::text) AND (("timestamp")::date = '2017-02-20'::date))
         ->  Index Scan using webtraffic_archive_day_2016_12_25_profile_composed_wbidx on webtraffic_archive_day_2016_12_25  (cost=0.56..58.08 rows=1 width=71) (actual time=0.109..0.109 rows=0 loops=1)
               Index Cond: ((("timestamp")::date = '2017-02-20'::date) AND ((profile)::text = 'f62467'::text))
         ->  Index Scan using webtraffic_archive_day_2016_12_26_profile_composed_wbidx on webtraffic_archive_day_2016_12_26  (cost=0.56..58.08 rows=1 width=70) (actual time=0.084..0.084 rows=0 loops=1)
               Index Cond: ((("timestamp")::date = '2017-02-20'::date) AND ((profile)::text = 'f62467'::text))
         ->  Index Scan using webtraffic_archive_day_2016_12_27_profile_composed_wbidx on webtraffic_archive_day_2016_12_27  (cost=0.56..58.08 rows=1 width=70) (actual time=0.076..0.076 rows=0 loops=1)
               Index Cond: ((("timestamp")::date = '2017-02-20'::date) AND ((profile)::text = 'f62467'::text))
         ->  Index Scan using webtraffic_archive_day_2016_12_28_profile_composed_wbidx on webtraffic_archive_day_2016_12_28  (cost=0.56..58.08 rows=1 width=70) (actual time=0.069..0.069 rows=0 loops=1)
               Index Cond: ((("timestamp")::date = '2017-02-20'::date) AND ((profile)::text = 'f62467'::text))
         ->  Index Scan using webtraffic_archive_day_2016_12_29_profile_composed_wbidx on webtraffic_archive_day_2016_12_29  (cost=0.56..58.08 rows=1 width=70) (actual time=0.069..0.069 rows=0 loops=1)
               Index Cond: ((("timestamp")::date = '2017-02-20'::date) AND ((profile)::text = 'f62467'::text))
         ->  Index Scan using webtraffic_archive_day_2016_12_30_profile_composed_wbidx on webtraffic_archive_day_2016_12_30  (cost=0.56..58.08 rows=1 width=69) (actual time=0.105..0.105 rows=0 loops=1)
               Index Cond: ((("timestamp")::date = '2017-02-20'::date) AND ((profile)::text = 'f62467'::text))
         ->  Index Scan using