Re: [HACKERS] Proposing pg_hibernate

2014-06-07 Thread Cédric Villemain
Le lundi 3 février 2014 19:18:54 Gurjeet Singh a écrit :
 Possible enhancements:
 - Ability to save/restore only specific databases.
 - Control how many BlockReaders are active at a time, to avoid I/O storms.
 - Be smart about lowered shared_buffers across the restart.
 - Different modes of reading like pg_prewarm does.
 - Include PgFincore functionality, at least for Linux platforms.

Please note that pgfincore works on any system where PostgreSQL prefetch
works, exactly like pg_prewarm. This includes Linux, BSD and many other
Unix-likes. It *is not* limited to Linux.
I have never had a single request for Windows, but Windows does provide an
API for that too (however, I have no Windows system at hand to test).

Another side note is that BSD (at least FreeBSD) currently has a more
advanced mincore() syscall than Linux, offering a better analysis (the
dirty status of pages is known), and they have implemented posix_fadvise...
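For reference, the prefetch primitive that both pg_prewarm and pgfincore build on is available directly from Python's standard library; a minimal sketch (the scratch file and sizes are made up for illustration; Unix-only):

```python
import os
import tempfile

# Create a small scratch file to issue cache advice on.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"x" * (1024 * 1024))
    path = f.name

fd = os.open(path, os.O_RDONLY)
try:
    # Ask the kernel to read the whole file (offset 0, length 0 = to EOF)
    # into the page cache -- the prefetch style used by pg_prewarm/pgfincore.
    os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_WILLNEED)
    # The reverse hint asks the kernel to evict the pages again.
    os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
    result = "advice calls succeeded"
finally:
    os.close(fd)
    os.unlink(path)

print(result)
```

The calls are advisory only: the kernel may ignore them, which is why tools like pgfincore pair them with mincore() to check what actually resides in cache.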

PS:
There is a previous thread about that hibernation feature. Mitsuru IWASAKI
did a patch, and it triggered some interesting discussions.
Some notes in that thread are outdated now, but it's worth having a look
at it:

http://www.postgresql.org/message-id/20110504.231048.113741617.iwas...@jp.freebsd.org

https://commitfest.postgresql.org/action/patch_view?idT9

--
Cédric Villemain +33 (0)6 20 30 22 52
http://2ndQuadrant.fr/
PostgreSQL: Support 24x7 - Développement, Expertise et Formation




[HACKERS] Using Index-only scans to speed up count(*)

2014-06-07 Thread Gurjeet Singh
While reading [1] in the context of Postgres Hibernator, I see that
Mitsuru mentioned one of the ways another RDBMS allows count(*) to be
driven by an index.

 'select /*+ INDEX(emp emp_pk) */ count(*) from emp;' to load index blocks

I am not sure if the Postgres planner already allows this, but it would be
great if the planner considered driving a count(*) query using a
non-partial index, in the hope that it turns into an index-only scan
and hence returns the count(*) result faster. The non-partial index need
not be the primary key index; it could be chosen purely based on
size, favouring smaller indexes.

This may alleviate some of the concerns of people migrating
applications from other DBMSs that perform count(*) in the blink of an
eye.

[1]: 
http://www.postgresql.org/message-id/20110507.08.83883502.iwas...@jp.freebsd.org

Best regards,

PS: Please note that I am not proposing to add support for the
optimizer hint embedded in Mitsuru's query.
-- 
Gurjeet Singh http://gurjeet.singh.im/

EDB www.EnterpriseDB.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Using Index-only scans to speed up count(*)

2014-06-07 Thread Cédric Villemain
Le samedi 7 juin 2014 08:35:27 Gurjeet Singh a écrit :
 While reading [1] in context of Postgres Hibernator, I see that
 Mitsuru mentioned one of the ways other RDBMS allows count(*) to be
 driven by an index.
 
  'select /*+ INDEX(emp emp_pk) */ count(*) from emp;' to load index
  blocks
 I am not sure if Postgres planner already allows this, but it would be
 great if the planner considered driving a count(*) query using a
 non-partial index, in the hopes that it turns into an index-only
 scan, and hence returns count(*) result faster. The non-partial index
 may not necessarily be the primary key index, it can be chosen purely
 based on size, favouring smaller indexes.

IIRC it is not (yet) possible to switch from an index scan to an index-only 
scan on the fly, because the structures used are different (an index-only 
scan needs to prepare a tuple structure to hold the data; I'm not sure of 
the details).
Index-only scans are already used to answer count(*), but the decision is 
made during planning.

That said, this is an interesting idea which has already been discussed, 
if not on pgsql-hackers then at least at pre/post-conference social 
events: being able to switch the plan during execution if things are not 
working as expected (in the same vein there is the 'progress bar' for 
query execution; at least some mechanisms should be shared by both 
features).

 PS: Please note that I am not proposing to add support for the
 optimizer hint embedded in Mitsuru's query.

:-) 

-- 
Cédric Villemain +33 (0)6 20 30 22 52
http://2ndQuadrant.fr/
PostgreSQL: Support 24x7 - Développement, Expertise et Formation



Re: [HACKERS] Using Index-only scans to speed up count(*)

2014-06-07 Thread Gurjeet Singh
On Sat, Jun 7, 2014 at 8:56 AM, Cédric Villemain ced...@2ndquadrant.com wrote:
 Le samedi 7 juin 2014 08:35:27 Gurjeet Singh a écrit :

 PS: Please note that I am not proposing to add support for the
 optimizer hint embedded in Mitsuru's query.

 :-)

Even though I (sometimes) favor hints, and developed the optimizer
hints feature in EDB (PPAS), I know how much Postgres **hates** [1]
optimizer hints :) So I am just trying to ward off potential flamewar-ish
comments.

[1]: http://wiki.postgresql.org/wiki/Todo#Features_We_Do_Not_Want

Best regards,
-- 
Gurjeet Singh http://gurjeet.singh.im/

EDB www.EnterpriseDB.com




Re: [HACKERS] Using Index-only scans to speed up count(*)

2014-06-07 Thread Cédric Villemain
Le samedi 7 juin 2014 09:09:00 Gurjeet Singh a écrit :
 On Sat, Jun 7, 2014 at 8:56 AM, Cédric Villemain 
ced...@2ndquadrant.com wrote:
  Le samedi 7 juin 2014 08:35:27 Gurjeet Singh a écrit :
  PS: Please note that I am not proposing to add support for the
  optimizer hint embedded in Mitsuru's query.
  
  :-)
 
 Even though I (sometimes) favor hints, and developed the optimizer
 hints feature in EDB (PPAS), I know how much Postgres **hates** [1]
 optimizer hints :) So just trying to ward off potential flamewar-ish
 comments.

There is a large benefit to users in keeping hints out of core: it makes 
it mandatory for PostgreSQL to keep improving, and it makes it mandatory 
for developers to find solutions within this constraint. There is at 
least the planner_hint contribution (search Oleg's website), and combined 
with a PostgreSQL hook on the parser you are done implementing hints in 
userland.

I'm not arguing pro/con about the feature (as you said, the question has 
been answered already) but arguing that arbitrary constraints challenge 
us and produce good things for PostgreSQL in return.

-- 
Cédric Villemain +33 (0)6 20 30 22 52
http://2ndQuadrant.fr/
PostgreSQL: Support 24x7 - Développement, Expertise et Formation



Re: [HACKERS] Using Index-only scans to speed up count(*)

2014-06-07 Thread Noah Misch
On Sat, Jun 07, 2014 at 08:35:27AM -0400, Gurjeet Singh wrote:
 While reading [1] in context of Postgres Hibernator, I see that
 Mitsuru mentioned one of the ways other RDBMS allows count(*) to be
 driven by an index.
 
  'select /*+ INDEX(emp emp_pk) */ count(*) from emp;' to load index blocks
 
 I am not sure if Postgres planner already allows this, 

It does:

create table t (c, junk) as select *, repeat('a', 100)
  from generate_series(1,1);
alter table t add primary key (c);
vacuum t;
analyze t;
explain select count(*) from t;

-- 
Noah Misch
EnterpriseDB http://www.enterprisedb.com




Re: [HACKERS] PG Manual: Clarifying the repeatable read isolation example

2014-06-07 Thread Kevin Grittner
David G Johnston david.g.johns...@gmail.com wrote:

   For example, even a read only transaction at this level may see a
 control record updated to show that a batch has been completed but
 not see one of the detail records which is logically part of the
 batch because it read an earlier revision of the control record.

 Hmm, that seems to be a super-summarized description of what Kevin & Dan
 called the receipts problem. There's an example of that in the
 isolation test suite, see src/test/isolation/specs/receipt-report.spec.

It is also one of the examples I provided on the SSI Wiki page:

https://wiki.postgresql.org/wiki/SSI#Deposit_Report
 
 Googling for it, I also found an academic paper written by Kevin & Dan
 that illustrates it: http://arxiv.org/pdf/1208.4179.pdf, 2.1.2 Example
 2: Batch Processing. (Nice work, I didn't know of that paper until now!)

There were links to drafts of the paper in July, 2012, but I guess
the official location in the Proceedings of the VLDB Endowment was
never posted to the community lists.  That's probably worth having
on record here:

http://vldb.org/pvldb/vol5/p1850_danrkports_vldb2012.pdf

 I agree that's too terse. I think it would be good to actually spell out
 a complete example of the Receipt problem in the manual. That chapter in
 the manual contains examples of anomalies in Read Committed mode, so
 it would be good to give a concrete example of an anomaly in Repeatable
 Read mode too.

I found it hard to decide how far to go in the docs versus the Wiki
page.  Any suggestions or suggested patches welcome.

 While this is not a doc patch I decided to give it some thought.  The bank
 example was understandable enough for me so I simply tried to make it more
 accessible.  I also didn't go and try to get it to conform to other,
 existing, examples.  This is intended to replace the entire "For example..."
 paragraph noted above.

 While Repeatable Read provides stable in-transaction reads, logical query
 anomalies can result because commit order is not restricted and
 serialization errors occur only if two transactions attempt to modify the
 same record.

 Consider a rule that, upon updating r1 or r2, if r1 + r2 < 0 then an
 additional 1 is subtracted from the corresponding row.
 Initial State: r1 = 0; r2 = 0
 Transaction 1 Begins: reads (0,0); adds -10 to r1, notes r1 + r2 will be -10
 and subtracts an additional 1
 Transaction 2 Begins: reads (0,0); adds 20 to r2, notes r1 + r2 will be +20;
 no further action needed
 Commit 2
 Transaction 3: reads (0,20) and commits
 Commit 1
 Transaction 4: reads (-11,20) and commits

 However, if Transaction 2 commits first then, logically, the calculation of
 r1 + r2 in Transaction 1 should result in a false outcome and the additional
 subtraction of 1 should not occur - leaving T4 reading (-10,20). 

 The ability for out-of-order commits is what allows T3 to read the pair
 (0,20) which is logically impossible in the T2-before-T1 commit order with
 T4 reading (-11,20).

 Neither transaction fails, since a serialization failure only occurs if a
 concurrent update touches [ r1 (in T1) ] or [ r2 (in T2) ]; the update
 of r2 is invisible to T1 - i.e., no failure occurs merely because a value
 that was read undergoes a change.

 Inspired by:
 http://www.sigmod.org/publications/sigmod-record/0409/2.ROAnomONeil.pdf -
 Example 1.3
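The quoted schedule can be replayed in a few lines of Python (a sketch: the snapshot rules are modeled by hand here, not taken from PostgreSQL):

```python
def run_t1(snapshot):
    """Add -10 to r1; subtract an extra 1 if the snapshot's sum goes below 0."""
    r1, r2 = snapshot
    r1 -= 10
    if r1 + r2 < 0:
        r1 -= 1
    return r1

def run_t2(snapshot):
    """Add 20 to r2; subtract an extra 1 only if the snapshot's sum goes below 0."""
    r1, r2 = snapshot
    r2 += 20
    if r1 + r2 < 0:
        r2 -= 1
    return r2

# Concurrent schedule: both transactions start from snapshot (0, 0) and
# write disjoint rows, so neither sees a serialization failure.
new_r1 = run_t1((0, 0))      # commits second
new_r2 = run_t2((0, 0))      # commits first
t3_view = (0, new_r2)        # T3 reads between the two commits
t4_view = (new_r1, new_r2)   # T4 reads after both commits
assert t3_view == (0, 20) and t4_view == (-11, 20)

# Serial T2-then-T1: T1's snapshot already contains r2 = 20, the sum
# stays >= 0, and the extra -1 never happens, leaving (-10, 20).
serial_r1 = run_t1((0, run_t2((0, 0))))
assert serial_r1 == -10
# So no single serial order yields T3 seeing (0, 20) *and* T4 seeing (-11, 20).
```

The assertions encode exactly the anomaly described above: the concurrent schedule produces a pair of observations that no serial execution order can reproduce.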

I know this is subjective, but that seems to me a little too much
in an academic style for the docs.  In the Wiki page examples I
tried to use a style more accessible to DBAs and application
programmers.  Don't get me wrong, I found various papers by Alan
Fekete and others very valuable while working on the feature, but
they are often geared more toward those developing such features
than those using them.

That said, I know I'm not the best word-smith in the community, and
would very much welcome suggestions from others on the best way to
cover this.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] PG Manual: Clarifying the repeatable read isolation example

2014-06-07 Thread Gavin Flower

On 08/06/14 05:03, Kevin Grittner wrote:
[...]
I found it hard to decide how far to go in the docs versus the Wiki 
page.  Any suggestions or suggested patches welcome.

[...]
I know this is subjective, but that seems to me a little too much in 
an academic style for the docs.  In the Wiki page examples I tried to 
use a style more accessible to DBAs and application programmers.  
Don't get me wrong, I found various papers by Alan Fekete and others 
very valuable while working on the feature, but they are often geared 
more toward those developing such features than those using them.

That said, I know I'm not the best word-smith in the community, and 
would very much welcome suggestions from others on the best way to 
cover this.


I know that I look at the docs first and seldom look at the Wiki - in fact 
it was only recently that I became aware of the Wiki, and it is still 
not the first place I think of when I want to know something; I 
often forget it exists.  I suspect many people are like me in this!


Also, the docs have a more authoritative air, and are probably automatically 
assumed to be more up-to-date and relevant to the version of Postgres in use.


So I suggest that the docs should have appropriate coverage of such 
topics (possibly mostly in an appendix, with brief references in the 
affected parts of the main docs) if it does not quite fit into the rest 
of the documentation - it affects many different features, so no one 
place in the main docs is appropriate, or it may be too detailed or too 
much.  Links to the Wiki, and to the more academic papers, could also be 
provided for the really keen.



Cheers,
Gavin





Re: [HACKERS] Providing catalog view to pg_hba.conf file - Patch submission

2014-06-07 Thread Jaime Casanova
On Fri, Mar 14, 2014 at 12:30 AM, Prabakaran, Vaishnavi
vaishna...@fast.au.fujitsu.com wrote:
 Hi,

 In connection to my previous proposal about providing catalog view to
 pg_hba.conf file contents , I have developed the attached patch .

[...]

 [What this Patch does]

 Functionality of the attached patch is that it provides a new view
 pg_hba_settings to admin users. Public access to the view is restricted.
 This view displays basic information about the HBA settings of the
 postgresql cluster.  The information shown is taken from the parsed hba
 lines and not read directly from the pg_hba.conf file. Documentation files
 are also updated to include details of this new view under Chapter 47,
 System Catalogs. Also, a new note is added in Chapter 19.1, The pg_hba.conf File.


A normal user can see all the info the view provides once you GRANT
permissions on it. How much info should a non-superuser see from this
view? Currently a non-superuser can't see pg_hba info; with this view they
can.

This view should be superuser-only, or should only show info related to
current_user if the user is not a superuser.

Also, I think you should use lowercase values, just as they are in
pg_hba.conf (i.e. local not Local, host not Host, etc.)

-- 
Jaime Casanova www.2ndQuadrant.com
Professional PostgreSQL: Soporte 24x7 y capacitación
Phone: +593 4 5107566 Cell: +593 987171157




Re: [HACKERS] Suppressing unused subquery output columns

2014-06-07 Thread David Rowley
On Fri, Jun 6, 2014 at 2:27 PM, Tom Lane t...@sss.pgh.pa.us wrote:

 I'm not entirely convinced that it's worth the extra planning cycles,
 though.  Given the small number of complaints to date, it might not
 be worth doing this.  Thoughts?


That's a difficult question for sure. Obviously it's going to depend on the
query that we're planning. If it's a very simple query and we don't reduce
the subquery's target list at all, then we'll see a small performance
hit... But if we do manage to prune down the target list and later allow
a left join on a 1-billion-row table to be removed, then the cost of the
extra few cycles performed by this patch would be totally unmeasurable and
completely insignificant.

At work we use [insert name of popular commercial relational database
product here], and on a daily basis I come across poorly written queries by
other developers.

The things I can think of off hand are:

1. All primary key fields of a table are in a DISTINCT clause
2. Someone had written a query like: SELECT some,1 as type FROM table UNION
SELECT thing,2 as type from othertable

The database in question managed to remove the DISTINCT in case 1 because
it's simply not needed, as each group could only ever have one row. In case
2 it managed to see that, because column 2 of the UNION query was a
constant and a different constant on each side of the UNION, the query
could not produce duplicates, and it changed this to UNION ALL.
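Both rewrites are easy to sanity-check; a sketch using Python's bundled sqlite3 with made-up table names (this only demonstrates that the rewritten queries return the same rows, not how any particular planner handles them):

```python
import sqlite3

cur = sqlite3.connect(":memory:").cursor()
cur.executescript("""
    CREATE TABLE emp (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE contractor (id INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO emp VALUES (1, 'ann'), (2, 'bob');
    INSERT INTO contractor VALUES (1, 'carol');
""")

# Case 1: DISTINCT over a key column can never remove rows, so a planner
# could drop it without changing the result.
with_distinct = cur.execute("SELECT DISTINCT id FROM emp ORDER BY id").fetchall()
plain = cur.execute("SELECT id FROM emp ORDER BY id").fetchall()
assert with_distinct == plain

# Case 2: each branch carries a different constant tag, so the branches
# can never produce duplicate rows and UNION is equivalent to UNION ALL.
union = cur.execute("""
    SELECT name, 1 AS type FROM emp
    UNION
    SELECT name, 2 AS type FROM contractor
    ORDER BY type, name""").fetchall()
union_all = cur.execute("""
    SELECT name, 1 AS type FROM emp
    UNION ALL
    SELECT name, 2 AS type FROM contractor
    ORDER BY type, name""").fetchall()
assert union == union_all
```

The payoff of the second rewrite is that UNION ALL skips the duplicate-elimination step (typically a sort or hash) that UNION requires.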

At home I checked how PostgreSQL handled both of these cases, and saw that
it failed to see through the nonsense in the poorly written queries on
both counts. This is fine for anyone who posts on the performance list
asking "why is this slow?": we'd tell them to write their query another
way. But what about all the companies who consider porting their
application over to PostgreSQL and get to the stage of importing all the
data onto a test server and trying a few of their (poorly written)
queries? They see a 10-times slowdown and immediately think PostgreSQL
is just not for them. It's a shame that they'd turn us down so early, but
if we went ahead and added code to detect both of these situations then
we'd end up increasing planning time for everyone else who writes their
queries properly.

I don't really have an answer for this, but I do think it needs more
discussion. The only thing I've thought of so far is a planner_strength
GUC that tries these optimisations once it reaches a certain level,
but that just introduces a surprise factor and a whole bunch of threads on
performance saying... "Why is this query on pg10.4 slower than on pg10.2?",
to which our reply goes, "is planner_strength set to the same value on
both?" That does not sound pretty. Another option is to replan a query,
with the strength level turned to maximum, when its cost is over a certain
threshold, but that seems like it could go along the same lines as far as
surprise factor is concerned.

It's pretty hard to get the best of both worlds here. I know my post does
not have many answers, but I posted it anyway just in case someone else
comes up with a good idea that perhaps we could all work towards.

Regards

David Rowley