Re: RAM usage of PostgreSql

2019-04-04 Thread Keith
On Thu, Apr 4, 2019 at 10:48 AM Prasad wrote:

> Hi,
>
> I have installed the PostgreSQL 9.4 (open source) version on my CentOS
> Linux 7 (Red Hat) production server and kept the default parameters in the
> postgresql.conf file. So my basic question is: once I start using Postgres,
> how much RAM do the Postgres processes consume (Postgres-related processes
> only)?
>
> There are a lot of allocations in the postgresql.conf file, for example
> shared_buffers, work_mem, etc.
>
> As per my knowledge, all Postgres processes together should not consume
> more RAM than the value assigned in shared_buffers. Please clarify, and
> let me know if I have misunderstood the concept.
>
> --
> Thanks,
> Venkata Prasad
>
shared_buffers is just the shared memory segment. work_mem and
maintenance_work_mem are allocated in addition to shared_buffers, and there
can be multiple instances of those allocations in use at the same time
(work_mem, for example, applies per sort or hash operation, per backend, so a
busy server can have many copies in flight). The PostgreSQL wiki has some
good guidance on what the different memory settings mean and how to tune
them:

https://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server
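
To make the "multiples of those values" point concrete, here is a rough
back-of-the-envelope sketch; the figures are illustrative assumptions, not
tuning recommendations:

    -- Suppose (assumed values):
    --   shared_buffers  = 2GB    -- one shared segment, allocated once
    --   work_mem        = 16MB   -- per sort/hash operation, per backend
    --   max_connections = 100
    --
    -- Worst case, private memory alone can reach roughly
    --   max_connections * sort/hash nodes per query * work_mem
    --   = 100 * 4 * 16MB = ~6.4GB, on top of shared_buffers.
    --
    -- Inspect the current values:
    SHOW shared_buffers;
    SHOW work_mem;
    SHOW maintenance_work_mem;
    SHOW max_connections;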


Partitioning Optimizer Questions and Issues

2019-02-08 Thread keith anderson
So to summarise the findings/questions from above:
- It seems like the Postgres optimizer sometimes uses the partition-level
statistics and sometimes the global table-level statistics? Or is it using
something else?
- With partitioned tables with a unique identifier, and retrieving explicitly
on those identifiers, at present the optimizer will always underestimate the
selectivity and overestimate the rows returned. This inaccuracy increases in
proportion to the number of partitions.
- As a result, when joining to other tables, you are liable to hit sequential
scans. This becomes more likely as you have more partitions or if you join to
more partitioned tables (note I am aware I could try and tune
random_page_cost to try and prevent this).
- To me, in the example queries described above, it makes sense to use the
partition statistics for the partition-level access strategy, but the global
statistics when estimating the actual rows returned by all the individual
partition queries. Is there a reason not to do this? Or do others believe the
optimizer is doing the right thing here?
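
A minimal sketch of the kind of schema where this shows up, reusing the
tablea name from the test case; the partition bounds are assumptions, and
PostgreSQL 10+ declarative partitioning is used for illustration:

    -- Date-partitioned table carrying a unique id column.
    CREATE TABLE tablea (id bigint, created date)
        PARTITION BY RANGE (created);
    CREATE TABLE tablea_2019_01 PARTITION OF tablea
        FOR VALUES FROM ('2019-01-01') TO ('2019-02-01');
    CREATE TABLE tablea_2019_02 PARTITION OF tablea
        FOR VALUES FROM ('2019-02-01') TO ('2019-03-01');
    -- ...load data, then:
    ANALYZE tablea;

    -- The filter is not on the partition key, so no partition can be
    -- eliminated; with n_distinct = -1 in every child, the planner
    -- estimates ~1 row per partition, and the total estimate grows with
    -- the partition count even though at most one row matches overall.
    EXPLAIN SELECT * FROM tablea WHERE id = 12345;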
And then some general questions:
- How do other people use partitioning without a significant performance
disadvantage on reading the data? Is there something else I should be doing
here to achieve the same thing without the overhead? At present my reads have
increased optimization cost (as the planner needs to plan access to each
partition) and also increased execution cost (accessing the index on every
partition). Even without the optimizer issues described above, the cost of
reading simple data is extremely high relative to non-partitioned data,
unless you use the partition key as a filter for each table to eliminate
those partitions (see the sketch below).
- Is there any chance/plan to add global indexes to Postgres? If so, would
that significantly impact the cost of dropping a partition, e.g. to clean up
the index?
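
A hedged sketch of the partition-key workaround mentioned above, reusing the
tablea example (the date bounds are assumptions):

    -- No filter on the partition key: every partition's index is probed.
    EXPLAIN SELECT * FROM tablea WHERE id = 12345;

    -- A redundant filter on the partition key lets the planner eliminate
    -- all partitions except the one(s) covering that date range.
    EXPLAIN SELECT * FROM tablea
     WHERE id = 12345
       AND created >= DATE '2019-01-01'
       AND created <  DATE '2019-02-01';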
Thanks in advance for any feedback/support,
Keith



Re: Partitioning Optimizer Questions and Issues

2019-02-11 Thread keith anderson
Thanks for the feedback, Justin.

You are right, the most-common-values list is empty for my test case, so it
is using n_distinct for the 'id IN ()' scenario. And I can see from the
pg_class.reltuples and pg_stats.histogram_bounds values how the optimizer
concludes for my range query that only 1 in 5 of the requested entries falls
within each individual partition.

However, I can also see that the pg_stats.n_distinct value for tablea shows
-1 (i.e. every value distinct), as do all the individual child partitions. In
my opinion it makes sense for the optimizer, when using n_distinct on
partitioned tables, to use the n_distinct value of the parent table when
estimating row counts, rather than a sum of the partition-level statistics.
Or can someone come up with a good reason to avoid this?
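
For anyone reproducing this, the planner inputs in question can be inspected
directly (a sketch against the tablea test tables):

    -- Negative n_distinct is a fraction of the row count: -1 means every
    -- value is distinct. Parent and children all report -1 here.
    SELECT tablename, attname, n_distinct
      FROM pg_stats
     WHERE tablename LIKE 'tablea%'
       AND attname = 'id';

    -- The row counts the planner combines with n_distinct:
    SELECT relname, reltuples
      FROM pg_class
     WHERE relname LIKE 'tablea%';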
A couple of examples of different data in a partitioned table:
- Unique identifier -> providing a single value in a query -> using
n_distinct from the parent estimates 1 row; using the child tables estimates
1 * (number of partitions). Use of the parent table would be correct.
- Date of activity, with 1000 records per day -> providing a single day to
the query -> using n_distinct from the parent estimates 1000 rows returned;
using the child tables estimates 1000 * (number of partitions). Use of the
parent table n_distinct is correct.

Perhaps when querying on columns that are part of the partition logic you
could use the partition-level stats, but I think the vast majority of the
time, using the parent statistics would be much more reliable/accurate than
summing across partitions.
In terms of the partition strategy, I agree that it should be chosen with a
view to improving performance. I will look at your suggestions in more
detail, but in general partitioning is very hard to use effectively because
there are competing priorities:
- I would like to not have to manage massive numbers of partitions.
- I would like to be able to archive data easily by date (a big plus point of
the existing date partitioning strategy).
- It is hard in most cases to come up with a partition strategy that allows
for partition elimination. Consider a common 'transaction record' table with
a primary key, an account identifier, and a date -> it is natural to want to
be able to query on any one of these, but as things stand that cannot be
achieved performantly with partitioning (see the sketch below).
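
A hedged illustration of that last point (hypothetical table and column
names):

    -- Partitioning by date keeps archiving cheap: dropping or detaching a
    -- partition removes a whole month at once.
    CREATE TABLE txn (
        txn_id     bigint,
        account_id bigint,
        txn_date   date
    ) PARTITION BY RANGE (txn_date);
    CREATE TABLE txn_2019_01 PARTITION OF txn
        FOR VALUES FROM ('2019-01-01') TO ('2019-02-01');
    CREATE TABLE txn_2019_02 PARTITION OF txn
        FOR VALUES FROM ('2019-02-01') TO ('2019-03-01');

    -- But lookups by the other two natural keys cannot be pruned, so each
    -- one probes an index in every partition:
    EXPLAIN SELECT * FROM txn WHERE txn_id = 42;
    EXPLAIN SELECT * FROM txn WHERE account_id = 7;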
Global index support feels like it has the potential to resolve many of the
issues I have with partitioning (beyond the optimizer concern above). I
assume this has been discussed and rejected by the community, though?
I've attached the original test script as a file.
Keith


On Friday, 8 February 2019, 13:05:04 GMT, Justin Pryzby wrote:
On Fri, Feb 08, 2019 at 11:13:51AM +0000, keith anderson wrote:
> So to summarise the findings/questions from above:
> - It seems like the Postgres optimizer sometimes uses the partition-level
> statistics and sometimes the global table-level statistics? Or is it using
> something else?
> - With partitioned tables with a unique identifier, and retrieving
> explicitly on those identifiers, at present the optimizer will always
> underestimate the selectivity and overestimate the rows returned. This
> inaccuracy increases in proportion to the number of partitions.
> - As a result, when joining to other tables, you are liable to hit
> sequential scans. This becomes more likely as you have more partitions or
> if you join to more partitioned tables (note I am aware I could try and
> tune random_page_cost to try and prevent this).
> - To me, in the example queries described above, it makes sense to use the
> partition statistics for the partition-level access strategy, but the
> global statistics when estimating the actual rows returned by all the
> individual partition queries. Is there a reason not to do this? Or do
> others believe the optimizer is doing the right thing here?
> And then some general questions:
> - How do other people use partitioning without a significant performance
> disadvantage on reading the data? Is there something else I should be
> doing here to achieve the same thing without the overhead? At present my
> reads have increased optimization cost (as the planner needs to plan
> access to each partition) and also increased execution cost (accessing the
> index on every partition). Even without the optimizer issues described
> above, the cost of reading simple data is extremely high relative to
> non-partitioned data, unless you use the partition key as a filter for
> each table to eliminate those partitions.
> - Is there any chance/plan to add global indexes to Postgres? If so, would
> that significantly impact the cost of dropping a partition, e.g. to clean
> up the index?
> Thanks in advance for any feedback/support,
> Thanks in advance for any feedback/support,

An equality or IN() query will use the pg_stats most-common-values list,
whereas a range query will use the histogram.
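
Those statistics can be inspected directly (a sketch against the tablea test
table; when a literal is absent from the MCV list, equality selectivity falls
back to roughly (1 - sum of MCV frequencies) / n_distinct):

    SELECT attname, n_distinct, most_common_vals, histogram_bounds
      FROM pg_stats
     WHERE tablename = 'tablea'
       AND attname = 'id';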