[HACKERS] Parse more than bind and execute when connect to database by jdbc
Hi everyone,

We finally found that the JDBC function we had modified contributed to this phenomenon. Thanks to all.

Yours,
Wang Shuo
HighGo Software Co., Ltd.
November 18, 2013

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] CLUSTER FREEZE
On Wed, Oct 30, 2013 at 3:32 AM, Andres Freund <and...@2ndquadrant.com> wrote:

> On 2013-10-25 09:26:29 -0400, Robert Haas wrote:
>> In any case, it's very far from obvious to me that CLUSTER ought to throw away information by default, which is what you're proposing.
>>
>> I find it odd to refer to this as throwing away information. I know that you have a general concern about throwing away XIDs that are still needed for forensic purposes, but that is clearly the ONLY purpose that those XIDs serve, and the I/O advantages of freezing by default could be massive for many of our users. What's going to happen in practice is that experienced users will simply recommend CLUSTER FREEZE rather than plain CLUSTER, and you won't have the forensic information *anyway*.
>
> I think we should just apply your "preserve forensic information when freezing" patch. Then we're good to go without big arguments ;)

Ok, here's a recap of the thread as I see it now:

1. Thomas wrote the patch with the idea that FREEZE could be an option for CLUSTER to freeze the tuples during the cluster operation, which would save a vacuum freeze somewhere down the line. Sounds like a good idea.
2. Robert introduced the idea that this could perhaps be the default behaviour for CLUSTER.
3. Tom highlighted that making freeze the default would wipe out xmin values, so they wouldn't be available for any forensics which might take place in the event of a disaster.
4. Andres mentioned that we might want to wait for a patch which Robert has been working on. Currently I don't know much about it, but it sounds like it freezes without setting xmin to frozenXid?
5. Robert stated that he's not had much time to work on this patch due to having other commitments. In the meantime Thomas sent in a patch which gets rid of the FREEZE option from CLUSTER and makes it the default.

So now I'm wondering what the next move should be for this patch?

a. Are we waiting on Robert's patch to be committed before we can apply Thomas's cluster-with-freeze-as-default patch?
b. Are we waiting on me to review one or both of the patches? Which one?

I think it probably sounds safer not to make freeze the default, but if we release 9.4 with CLUSTER FREEZE and then in 9.5/10 decide it should be the default, we are left with a redundant syntax to support for ever and ever. Decision time?

Regards

David Rowley
Re: [HACKERS] writable FDWs / update targets confusion
Tom Lane wrote:
>> Tom, could you show us a rope if there is one?
>
> What is it you actually need to fetch? IIRC, the idea was that most FDWs would do the equivalent of fetching the primary-key columns to use in an update. If that's what you need, then AddForeignUpdateTargets should identify those columns and generate Vars for them. postgres_fdw is probably not a good model since it's using ctid (a non-portable thing) and piggybacking on the existence of a tuple header field to put that in.
>
> If you're dealing with some sort of hidden tuple identity column that works like CTID but doesn't fit in six bytes, there may not be any good solution in the current state of the FDW support. As I mentioned, we'd batted around the idea of letting FDWs define a system column with some other datatype besides TID, but we'd not figured out all the nitty-gritty details in time for 9.3.

I was hoping for the latter (a hidden column), but I'll have to settle for primary keys, which is also OK. Thanks for taking the time.

Yours,
Laurenz Albe
Re: [HACKERS] freeze cannot be finished
Committed, thanks for the report!

On 16.11.2013 22:05, Миша Тюрин wrote:
> Hello! Could anyone review the patch suggested by Jeff Janes? Initial thread:
> http://www.postgresql.org/message-id/flat/1384356585.995240612%40f50.i.mail.ru
> Thanks in advance!
>
> On Wed, Nov 13, 2013 at 3:53 PM, Sergey Burladyan <eshkin...@gmail.com> wrote:
>> If I'm not mistaken, it looks like lazy_scan_heap(), called from lazy_vacuum_rel() (see [1]), skips pages even when it runs with scan_all == true. lazy_scan_heap() does not increment scanned_pages if lazy_check_needs_freeze() returns false, so if this occurs during a wraparound vacuum it cannot update pg_class, because pg_class is updated via this code:
>>
>>     new_frozen_xid = FreezeLimit;
>>     if (vacrelstats->scanned_pages < vacrelstats->rel_pages)
>>         new_frozen_xid = InvalidTransactionId;
>>     vac_update_relstats(onerel, new_rel_pages, new_rel_tuples,
>>                         new_rel_allvisible, vacrelstats->hasindex,
>>                         new_frozen_xid);
>>
>> So I think in our prevent-wraparound vacuum vacrelstats->scanned_pages is always less than vacrelstats->rel_pages, and pg_class.relfrozenxid is never updated.
>
> Jeff Janes <jeff.ja...@gmail.com> writes:
>> Yeah, I think that is a bug. If the clean-up lock is unavailable but the page is inspected without it and found not to need freezing, then the page needs to be counted as scanned, but is not so counted.
>>
>>     commit bbb6e559c4ea0fb4c346beda76736451dc24eb4e
>>     Date: Mon Nov 7 21:39:40 2011 -0500
>>
>> But this was introduced in 9.2.0, so unless the OP didn't upgrade to 9.2 until recently, I don't know why it just started happening. It looks like a simple fix (to HEAD, attached), but I don't know how to test it. Also, it seems like it might be worth issuing a warning if scan_all is true but not all pages were scanned.
Cheers,

Jeff

- Heikki
Re: [HACKERS] Sequence Access Method WIP
On 15.11.2013 21:00, Simon Riggs wrote:
> On 15 November 2013 15:48, Peter Eisentraut <pete...@gmx.net> wrote:
>> Also, you set this to "returned with feedback" in the CF app. Please verify whether that was intentional.
>
> Not sure that was me; if so, corrected.

It was me, sorry. I figured this needs such a large restructuring that it's not going to be committable in this commitfest. But I'm happy to keep the discussion going if you think otherwise.

- Heikki
Re: [HACKERS] Sequence Access Method WIP
On 15.11.2013 20:21, Andres Freund wrote:
> On 2013-11-15 20:08:30 +0200, Heikki Linnakangas wrote:
>> It's pretty hard to review this without seeing the other implementation you're envisioning to use this API. But I'll try:
>
> We've written a distributed sequence implementation against it, so it works at least for that use case. While certainly not release-worthy, it still might be interesting to look at:
> http://git.postgresql.org/gitweb/?p=users/andresfreund/postgres.git;a=blob;f=contrib/bdr/bdr_seq.c;h=c9480c8021882f888e35080f0e7a7494af37ae27;hb=bdr
>
> bdr_sequencer_fill_sequence pre-allocates ranges of values, bdr_sequence_alloc returns them upon nextval().

Thanks. That pokes into the low-level details of sequence tuples, just like I was afraid it would.

I wonder if the level of abstraction is right. The patch assumes that the on-disk storage of all sequences is the same, so the access method can't change that. But then it leaves the details of actually updating the on-disk block, WAL-logging and all, as the responsibility of the access method. Surely that's going to look identical in all the seqams, if they all use the same on-disk format. That also means that the sequence access methods can't be implemented as plugins, as plugins can't do WAL-logging.

> Well, it exposes log_sequence_tuple() - together with the added "am private" column of pg_sequence that allows to do quite a bit of different things. I think unless we really implement pluggable RMGRs - which I don't really see coming - we cannot be radically different.

The proposed abstraction leaks like a sieve. The plugin should not need to touch anything but the private amdata field. log_sequence_tuple() is way too low-level.
Just as a thought-experiment, imagine that we decided to re-implement sequences so that all the sequence values are stored in one big table, or in a flat file in the data directory, instead of the current implementation where we have a one-block relation file for each sequence. If the sequence access method API is any good, it should stay unchanged. That's clearly not the case with the proposed API.

>> The comment in seqam.c says that there's a private column reserved for access-method-specific data, called am_data, but that seems to be the only mention of am_data in the patch.
>
> It's amdata, not am_data :/. Directly at the end of pg_sequence.

Ah, got it.

Stepping back a bit and looking at this problem from a higher level, why do you need to hack this stuff into sequences? Couldn't you just define a new set of functions, say bdr_currval() and bdr_nextval(), to operate on these new kinds of sequences? You're not making much use of the existing sequence infrastructure anyway, so it might be best to just keep the implementation completely separate. If you need it for compatibility with applications, you could create facade currval()/nextval() functions that call the built-in version or the bdr version depending on the argument.

- Heikki
Re: [HACKERS] additional json functionality
On 11/18/2013 05:19 AM, Andrew Dunstan wrote:
> On 11/17/2013 08:49 PM, Josh Berkus wrote:
>> Now, if it turns out that the new hstore is not dealing with json input and output, we could have json, jstore and hstore. Jstore isn't the worst name suggestion I've heard on this thread. The reason I prefer JSONB though, is that a new user looking for a place to put JSON data will clearly realize that JSON and JSONB are alternatives and related in some way. They won't necessarily expect that jstore has anything to do with JSON, especially when there is another type called JSON.
>
> Quite a few people are liable to think it's something to do with Java. Besides, we might get sued by these people: http://www.jstor.org/ ;-)
>
> I don't think any name that doesn't begin with "json" is acceptable. I could live with jsonb.

It has the merit of brevity, but maybe it's a tad too close to json to be the right answer. How about jsondoc, or jsonobj? Either is still reasonably "json" but not too easy to confuse with the existing json type when typing, and it perhaps hints better at the main difference from string-json, namely that it is an object and not textual source code / notation / processing info.

Cheers

--
Hannu Krosing
PostgreSQL Consultant
Performance, Scalability and High Availability
2ndQuadrant Nordic OÜ
Re: [HACKERS] COPY table FROM STDIN doesn't show count tag
On 18 October 2013 17:07, Rajeev rastogi <rajeev.rast...@huawei.com> wrote:
> From the following mail, COPY behaviour between stdin and a normal file has some inconsistency:
> http://www.postgresql.org/message-id/ce85a517.4878e%tim.k...@gmail.com
>
> The issue was that if COPY executes from stdin, the client goes to the server to execute the command, the server requests the input, and control is sent back to the client to enter the data. Once the client sends the input to the server, the server executes the COPY command and sends back the result, but the client does not print the result; it just clears it out. The changes ensure the final result from the server gets printed before clearing the result. Please find the patch for the same and let me know your suggestions.

In this call:

    success = handleCopyIn(pset.db, pset.cur_cmd_source,
                           PQbinaryTuples(*results), &intres) && success;
    if (success && intres)
        success = PrintQueryResults(intres);

Instead of handling the result status this way, what if we use ProcessResult()'s 'results' argument to pass back the COPY result status to the caller? We already call PrintQueryResults(results) after the ProcessResult() call, so we wouldn't need a COPY-specific PrintQueryResults() call.

Also, if there is a subsequent SQL command in the same query string, the consequence of the patch is that the client prints both the COPY output and the last command's output. So my suggestion would also keep us consistent with the general behaviour that only the last SQL command's output is printed in case of multiple SQL commands. Here is how it gets printed with your patch:

    psql -d postgres -c "\copy tab from '/tmp/st.sql' delimiter ' '; insert into tab values ('lll', 3)"
    COPY 1
    INSERT 0 1

This is not harmful, just a matter of consistency.

Thanks and Regards,
Kumar Rajeev Rastogi
Re: [HACKERS] New option for pg_basebackup, to specify a different directory for pg_xlog
On 18 November 2013 11:19 Haribabu kommi wrote:
> On 17 November 2013 00:55 Fujii Masao wrote:
>> On Sat, Nov 16, 2013 at 2:27 PM, Haribabu kommi <haribabu.ko...@huawei.com> wrote:
>>> On 15 November 2013 17:26 Magnus Hagander wrote:
>>>> On Fri, Nov 15, 2013 at 12:10 PM, Haribabu kommi <haribabu.ko...@huawei.com> wrote:
>>>>> On 14 November 2013 23:59 Fujii Masao wrote:
>>>>>> On Thu, Nov 14, 2013 at 9:08 PM, Haribabu kommi <haribabu.ko...@huawei.com> wrote:
>>>>>>> Please find attached the patch, adding a new option for pg_basebackup to specify a different directory for pg_xlog.
>>>>>>
>>>>>> Sounds good! Here are the review comments: Don't we need to prevent users from specifying the same directory in both --pgdata and --xlogdir?
>>>>>
>>>>> I feel there is no need to prevent it; even if the user specifies both --pgdata and --xlogdir as the same directory, all the transaction log files will be created in the base directory instead of the pg_xlog directory.
>>>>
>>>> Given how easy it would be to prevent that, I think we should. It would be an easy misunderstanding to specify that when you actually want it in wherever/pg_xlog. Specifying that would be redundant in the first place, but people can do that; it would also be very easy to do it by mistake, and you'd end up with something that's really bad, including a recursive symlink.
>>>
>>> Presently with initdb also the user can specify both data and xlog directories as the same. To prevent the data directory and the xlog directory being the same, there is a way on Windows (the _fullpath API) to get an absolute path from a relative path, but I didn't find any such possibility on Linux. I didn't find any other way to check it; if anyone has an idea regarding this, please let me know.
>>
>> What about make_absolute_path() in miscinit.c?
>
> The make_absolute_path() function gets the current working directory and adds the relative path to the CWD; this is not giving a proper absolute path.
I have added a new function verify_data_and_xlog_dir_same() which changes the current working directory to the data directory and gets the CWD, and does the same for the transaction log directory; it then compares the two and throws an error if they are the same. Please check it once. Is there any other way to identify that both data and xlog directories point to the same place, instead of comparing both absolute paths?

Failure when the xlogdir doesn't exist is fixed in the updated patch attached in the mail.

Regards,
Hari babu.
Re: [HACKERS] Sequence Access Method WIP
On 2013-11-18 10:54:42 +0200, Heikki Linnakangas wrote:
> On 15.11.2013 20:21, Andres Freund wrote:
>> Well, it exposes log_sequence_tuple() - together with the added "am private" column of pg_sequence that allows to do quite a bit of different things. I think unless we really implement pluggable RMGRs - which I don't really see coming - we cannot be radically different.
>
> The proposed abstraction leaks like a sieve. The plugin should not need to touch anything but the private amdata field. log_sequence_tuple() is way too low-level.

Why? It's *less* low-level than a good part of other crash recovery code we have. I have quite some doubts about the layering, but it's imo pretty sensible to centralize the WAL-logging code that plugins couldn't write.

If you look at what e.g. the _alloc callback currently gets passed:
a) Relation: important for metadata like TupleDesc, name etc.
b) SeqTable entry: important to set up state for cur/lastval
c) Buffer of the tuple: LSN etc.
d) HeapTuple itself: a place to store the tuple's changed state

And if you then look at what gets modified in that callback:

    Form_pg_sequence->amdata, ->is_called, ->last_value, ->log_cnt;
    SeqTable->last, SeqTable->cached, SeqTable->last_valid

You need is_called and last_valid because of our current representation of sequence state (which we could change, to remove that need). The rest contains values that need to be set depending on how you want your sequence to behave:
* amdata is obvious.
* You need log_cnt to influence/remember how big the chunks are that you WAL-log at once. Which e.g. you need to control if you want a sequence that doesn't leak values in crashes.
* cached is needed to control the per-backend caching.

> Just as a thought-experiment, imagine that we decided to re-implement sequences so that all the sequence values are stored in one big table, or in a flat file in the data directory, instead of the current implementation where we have a one-block relation file for each sequence. If the sequence access method API is any good, it should stay unchanged. That's clearly not the case with the proposed API.

I don't think we can entirely abstract away the reality that they are now based on relations and might not be at some later point. Otherwise we'd have to invent an API that copied all the data that's stored in struct RelationData. Like name, owner, ... Which non-SQL-accessible API we provide has a chance of providing that level of consistency in the face of fundamental refactoring? I'd say none.

> Stepping back a bit and looking at this problem from a higher level, why do you need to hack this stuff into sequences? Couldn't you just define a new set of functions, say bdr_currval() and bdr_nextval(), to operate on these new kinds of sequences?

Basically two things:
a) User interface. For one, everyone would need to reimplement the entire sequence DDL from scratch. For another, it means it's hard to write (client) code that doesn't depend on a specific implementation.
b) It's not actually easy to get similar semantics in userspace. How would you emulate the crash-safe but non-transactional semantics of sequences without copying most of sequence.c? Without writing XLogInsert()s, which we cannot do from out-of-tree code?

> You're not making much use of the existing sequence infrastructure, anyway, so it might be best to just keep the implementation completely separate. If you need it for compatibility with applications, you could create facade currval/nextval() functions that call the built-in version or the bdr version depending on the argument.

That doesn't get you very far:
a) The default functions created by sequences are pg_catalog-prefixed. So you'd need to hack up the catalog to get your own functions used if you want the application to work transparently. And then you need to remember the former function because you now cannot call it normally anymore. Yuck.
b) One would need nearly all of sequence.c again. You need the state across calls, the locking, the WAL logging, the DDL support. Pretty much the only thing *not* used would be nextval_internal() and do_setval().

Greetings,

Andres Freund

--
Andres Freund                     http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] pg_stat_statements: calls under-estimation propagation
Hello,

Please find v10 of the patch attached. This patch addresses the following review comments:

1. Removed errcode and used elog for the error "pg_stat_statements schema is not supported by its binary".
2. Removed comments and other code formatting not directly relevant to the patch functionality.
3. Changed the position of query_id in the view to userid, dbid, query_id, ...
4. Cleaned up the patch some more to avoid unnecessary whitespace and newlines.

I assume the usage of PGSS_TUP_LATEST is fine after the explanation given. Also, the mixing of PG_VERSION_NUM with query_id is OK after the explanation given.

regards
Sameer
Re: [HACKERS] Sequence Access Method WIP
On 18.11.2013 11:48, Andres Freund wrote:
> On 2013-11-18 10:54:42 +0200, Heikki Linnakangas wrote:
>> On 15.11.2013 20:21, Andres Freund wrote:
>>> Well, it exposes log_sequence_tuple() - together with the added "am private" column of pg_sequence that allows to do quite a bit of different things. I think unless we really implement pluggable RMGRs - which I don't really see coming - we cannot be radically different.
>>
>> The proposed abstraction leaks like a sieve. The plugin should not need to touch anything but the private amdata field. log_sequence_tuple() is way too low-level.
>
> Why? It's *less* low-level than a good part of other crash recovery code we have. I have quite some doubts about the layering, but it's imo pretty sensible to centralize the WAL-logging code that plugins couldn't write.

It doesn't go far enough; it's still too *low*-level. The sequence AM implementation shouldn't need direct access to the buffer page at all.

> If you look at what e.g. the _alloc callback currently gets passed:
> a) Relation: important for metadata like TupleDesc, name etc.
> b) SeqTable entry: important to set up state for cur/lastval
> c) Buffer of the tuple: LSN etc.
> d) HeapTuple itself: a place to store the tuple's changed state
>
> And if you then look at what gets modified in that callback:
>
>     Form_pg_sequence->amdata, ->is_called, ->last_value, ->log_cnt;
>     SeqTable->last, SeqTable->cached, SeqTable->last_valid
>
> You need is_called and last_valid because of our current representation of sequence state (which we could change, to remove that need). The rest contains values that need to be set depending on how you want your sequence to behave:
> * amdata is obvious.
> * You need log_cnt to influence/remember how big the chunks are that you WAL-log at once. Which e.g. you need to control if you want a sequence that doesn't leak values in crashes.
> * cached is needed to control the per-backend caching.

I don't think the sequence AM should be in control of 'cached'. The caching is done outside the AM. And log_cnt probably should be passed to the _alloc function directly as an argument, i.e. the server code asks the AM to allocate N new values in one call.

I'm thinking that the alloc function should look something like this:

    seqam_alloc(Relation seqrel, int nrequested, Datum am_private)

And it should return:

    int64 value  - the first value allocated
    int nvalues  - the number of values allocated
    am_private   - updated private data

The backend code handles the caching and logging of values. When it has exhausted all the cached values (or doesn't have any yet), it calls the AM's alloc function to get a new batch. The AM returns the new batch and updates its private state as necessary. Then the backend code updates the relation file with the new values and the AM's private data. WAL-logging and checkpointing are the backend's responsibility.

>> Just as a thought-experiment, imagine that we decided to re-implement sequences so that all the sequence values are stored in one big table, or in a flat file in the data directory, instead of the current implementation where we have a one-block relation file for each sequence. If the sequence access method API is any good, it should stay unchanged. That's clearly not the case with the proposed API.
>
> I don't think we can entirely abstract away the reality that they are now based on relations and might not be at some later point. Otherwise we'd have to invent an API that copied all the data that's stored in struct RelationData. Like name, owner, ... Which non-SQL-accessible API we provide has a chance of providing that level of consistency in the face of fundamental refactoring? I'd say none.

I'm not thinking that we'd change sequences to not be relations. I'm thinking that we might not want to store the state as a one-page file anymore. In fact, that was just discussed in the other thread, titled "init_sequence spill to hash table".

> b) It's not actually easy to get similar semantics in userspace.
> How would you emulate the crash-safe but non-transactional semantics of sequences without copying most of sequence.c? Without writing XLogInsert()s, which we cannot do from out-of-tree code?

heap_inplace_update()

- Heikki
Re: [HACKERS] Improvement of pg_stat_statement usage about buffer hit ratio
On 18 October 2013 13:35 KONDO Mitsumasa wrote:
> Hi, I submit an improvement of the pg_stat_statements usage patch in CF3. In pg_stat_statements, I think the buffer hit ratio is a very important value. However, it is difficult to calculate, and it needs complicated SQL. This patch makes the usage and documentation simpler:
>
>     -bench=# SELECT query, calls, total_time, rows, 100.0 * shared_blks_hit /
>     -        nullif(shared_blks_hit + shared_blks_read, 0) AS hit_percent
>     +bench=# SELECT query, calls, total_time, rows,
>     +        shared_blks_hit_percent
>              FROM pg_stat_statements ORDER BY total_time DESC LIMIT 5;
>
> It will be very simple :-)
>
> This patch conflicts with the pg_stat_statement_min_max_exectime patch which I submitted, and that patch also adds new columns, min_time and max_time. So I'd like to change it in this opportunity.

This patch adds another column, shared_blks_hit_percent, to the pg_stat_statements view, which is very beneficial for the user to know what percentage of blocks are hit. All changes are fine and working as described. Marked as ready for committer.

Regards,
Hari babu
Re: [HACKERS] Improvement of pg_stat_statement usage about buffer hit ratio
(2013/11/18 20:16), Haribabu kommi wrote:
> On 18 October 2013 13:35 KONDO Mitsumasa wrote:
>> This patch conflicts with the pg_stat_statement_min_max_exectime patch which I submitted, and that patch also adds new columns, min_time and max_time. So I'd like to change it in this opportunity.
>
> This patch adds another column, shared_blks_hit_percent, to the pg_stat_statements view, which is very beneficial for the user to know what percentage of blocks are hit. All changes are fine and working as described. Marked as ready for committer.

Thank you for your review! However, I'd like to add the average time per statement, too. The attached patch is my latest one, adding shared_blks_hit_percent and avg_time. This is the main added code:

    + total_time / calls::float AS avg_time,

If this patch and the min/max and stddev patch are committed, we can see more detailed and simpler information in pg_stat_statements, with light-weight coding.

Regards,
--
Mitsumasa KONDO
NTT Open Source Software Center

diff --git a/contrib/pg_stat_statements/pg_stat_statements--1.1--1.2.sql b/contrib/pg_stat_statements/pg_stat_statements--1.1--1.2.sql
new file mode 100644
index 000..f0a8e0f
--- /dev/null
+++ b/contrib/pg_stat_statements/pg_stat_statements--1.1--1.2.sql
@@ -0,0 +1,63 @@
+/* contrib/pg_stat_statements/pg_stat_statements--1.1--1.2.sql */
+
+-- complain if script is sourced in psql, rather than via ALTER EXTENSION
+\echo Use "ALTER EXTENSION pg_stat_statements UPDATE" to load this file. \quit
+
+/* First we have to remove them from the extension */
+ALTER EXTENSION pg_stat_statements DROP VIEW pg_stat_statements;
+ALTER EXTENSION pg_stat_statements DROP FUNCTION pg_stat_statements();
+
+/* Then we can drop them */
+DROP VIEW pg_stat_statements;
+DROP FUNCTION pg_stat_statements();
+
+/* Now redefine */
+CREATE FUNCTION pg_stat_statements(
+    OUT userid oid,
+    OUT dbid oid,
+    OUT query text,
+    OUT calls int8,
+    OUT total_time float8,
+    OUT rows int8,
+    OUT shared_blks_hit int8,
+    OUT shared_blks_read int8,
+    OUT shared_blks_dirtied int8,
+    OUT shared_blks_written int8,
+    OUT local_blks_hit int8,
+    OUT local_blks_read int8,
+    OUT local_blks_dirtied int8,
+    OUT local_blks_written int8,
+    OUT temp_blks_read int8,
+    OUT temp_blks_written int8,
+    OUT blk_read_time float8,
+    OUT blk_write_time float8
+)
+RETURNS SETOF record
+AS 'MODULE_PATHNAME'
+LANGUAGE C;
+
+CREATE VIEW pg_stat_statements AS
+  SELECT userid,
+         dbid,
+         query,
+         calls,
+         total_time,
+         rows,
+         CASE WHEN shared_blks_hit + shared_blks_read > 0
+              THEN 100.0 * (shared_blks_hit::float / (shared_blks_hit + shared_blks_read))
+              ELSE 0 END AS shared_blks_hit_percent,
+         shared_blks_hit,
+         shared_blks_read,
+         shared_blks_dirtied,
+         shared_blks_written,
+         local_blks_hit,
+         local_blks_read,
+         local_blks_dirtied,
+         local_blks_written,
+         temp_blks_read,
+         temp_blks_written,
+         blk_read_time,
+         blk_write_time
+  FROM pg_stat_statements();
+
+GRANT SELECT ON pg_stat_statements TO PUBLIC;
diff --git a/contrib/pg_stat_statements/pg_stat_statements--1.2.sql b/contrib/pg_stat_statements/pg_stat_statements--1.2.sql
new file mode 100644
index 000..41dc16b
--- /dev/null
+++ b/contrib/pg_stat_statements/pg_stat_statements--1.2.sql
@@ -0,0 +1,65 @@
+/* contrib/pg_stat_statements/pg_stat_statements--1.2.sql */
+
+-- complain if script is sourced in psql, rather than via CREATE EXTENSION
+\echo Use "CREATE EXTENSION pg_stat_statements" to load this file. \quit
+
+-- Register functions.
+CREATE FUNCTION pg_stat_statements_reset()
+RETURNS void
+AS 'MODULE_PATHNAME'
+LANGUAGE C;
+
+CREATE FUNCTION pg_stat_statements(
+    OUT userid oid,
+    OUT dbid oid,
+    OUT query text,
+    OUT calls int8,
+    OUT total_time float8,
+    OUT rows int8,
+    OUT shared_blks_hit int8,
+    OUT shared_blks_read int8,
+    OUT shared_blks_dirtied int8,
+    OUT shared_blks_written int8,
+    OUT local_blks_hit int8,
+    OUT local_blks_read int8,
+    OUT local_blks_dirtied int8,
+    OUT local_blks_written int8,
+    OUT temp_blks_read int8,
+    OUT temp_blks_written int8,
+    OUT blk_read_time float8,
+    OUT blk_write_time float8
+)
+RETURNS SETOF record
+AS 'MODULE_PATHNAME'
+LANGUAGE C;
+
+-- Register a view on the function for ease of use.
+CREATE VIEW pg_stat_statements AS
+  SELECT userid,
+         dbid,
+         query,
+         calls,
+         total_time,
+         total_time / calls::float AS avg_time,
+         rows,
+         CASE WHEN shared_blks_hit + shared_blks_read > 0
+              THEN 100.0 * (shared_blks_hit::float / (shared_blks_hit + shared_blks_read))
+              ELSE 0 END AS shared_blks_hit_percent,
+         shared_blks_hit,
+         shared_blks_read,
+         shared_blks_dirtied,
+         shared_blks_written,
+
Re: [HACKERS] Sequence Access Method WIP
On 2013-11-18 12:50:21 +0200, Heikki Linnakangas wrote: On 18.11.2013 11:48, Andres Freund wrote: I don't think the sequence AM should be in control of 'cached'. The caching is done outside the AM. And log_cnt probably should be passed to the _alloc function directly as an argument, ie. the server code asks the AM to allocate N new values in one call. Sounds sane. I'm thinking that the alloc function should look something like this: seqam_alloc(Relation seqrel, int nrequested, Datum am_private) I don't think we can avoid giving access to the other columns of pg_sequence; stuff like increment and the limits needs to be available for reading, so that'd also need to get passed in. And we need to signal that am_private can be NULL, otherwise we'd end up with ugly ways to signal that. So I'd say to pass in the entire tuple, and return a copy? Alternatively we can return am_private as a 'struct varlena *', so we can properly signal empty values. We also need a way to set am_private from outside seqam_alloc/setval/... Many of the fancier sequences that people talked about will need preallocation somewhere in the background. As proposed that's easy enough to write using log_sequence_tuple(); this way we'd need something that calls a callback with the appropriate buffer lock held. So maybe a seqam_update(Relation seqrel, ...) callback that gets called when somebody executes pg_sequence_update(oid)? It'd probably be a good idea to provide a generic function for checking whether a new value falls within the boundaries of the sequence's min and max, plus error handling. I'm not thinking that we'd change sequences to not be relations. I'm thinking that we might not want to store the state as a one-page file anymore. In fact that was just discussed in the other thread titled "init_sequence spill to hash table". Yes, I read and even commented in that thread...
But nothing in the current proposed API would prevent you from going in that direction, you'd just get passed in a different tuple/buffer. b) It's not actually easy to get similar semantics in userspace. How would you emulate the crash-safe but non-transactional semantics of sequences without copying most of sequence.c? Without writing XLogInsert()s, which we cannot do from out-of-tree code? heap_inplace_update() That gets the crash-safe part partially (it doesn't allow making the tuple wider than before), but not the caching/stateful part et al. The point is that you need most of sequence.c again. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
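Andres's sketch of a seqam_alloc() callback that takes a request size and opaque am_private state can be illustrated in miniature. Everything below is an illustrative stand-in, not PostgreSQL's actual headers: a real AM would deal with Relation, Datum, buffer locks, and WAL-logging.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Toy model of the discussed allocation API: the backend asks the AM for
 * nrequested values in one call, and the AM returns the first value plus
 * how many consecutive values it granted.  am_private is opaque state
 * owned by the AM.  All names here are illustrative stand-ins.
 */
typedef struct SeqAmAllocResult
{
	int64_t		first;		/* first value handed out */
	int			ngranted;	/* how many consecutive values were granted */
} SeqAmAllocResult;

typedef struct SeqAm
{
	SeqAmAllocResult (*alloc) (void *am_private, int nrequested);
} SeqAm;

/* A trivial "local" AM whose private state is just the last value used. */
static SeqAmAllocResult
local_alloc(void *am_private, int nrequested)
{
	int64_t    *last = (int64_t *) am_private;
	SeqAmAllocResult r;

	r.first = *last + 1;
	r.ngranted = nrequested;	/* no MAXVALUE/cache limits in this sketch */
	*last += nrequested;
	return r;
}

/* Grab a batch of 32 values (like log_cnt batching) and return the last. */
int64_t
demo_nextval_batch(void)
{
	int64_t		state = 0;
	SeqAm		am = {local_alloc};
	SeqAmAllocResult r = am.alloc(&state, 32);

	return r.first + r.ngranted - 1;
}
```

The point of the batched shape is that caching and WAL-batching decisions stay in the backend, while the AM only decides what the next values are.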
Re: [HACKERS] Sequence Access Method WIP
On 18 November 2013 07:50, Heikki Linnakangas hlinnakan...@vmware.com wrote: It doesn't go far enough, it's still too *low*-level. The sequence AM implementation shouldn't need to have direct access to the buffer page at all. I don't think the sequence AM should be in control of 'cached'. The caching is done outside the AM. And log_cnt probably should be passed to the _alloc function directly as an argument, ie. the server code asks the AM to allocate N new values in one call. I can't see what the rationale of your arguments is. All index AMs write WAL and control buffer locking etc. Do you have a new use case that shows why changes should happen? We can't just redesign things based upon arbitrary decisions about what things should or should not be possible via the API. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Sequence Access Method WIP
On 18.11.2013 13:48, Simon Riggs wrote: On 18 November 2013 07:50, Heikki Linnakangas hlinnakan...@vmware.com wrote: It doesn't go far enough, it's still too *low*-level. The sequence AM implementation shouldn't need to have direct access to the buffer page at all. I don't think the sequence AM should be in control of 'cached'. The caching is done outside the AM. And log_cnt probably should be passed to the _alloc function directly as an argument, ie. the server code asks the AM to allocate N new values in one call. I can't see what the rationale of your arguments is. All index AMs write WAL and control buffer locking etc. Index AMs are completely responsible for the on-disk structure, while with the proposed API, both the AM and the backend are intimately aware of the on-disk representation. Such a shared responsibility is not a good thing in an API. I would also be fine with going 100% in the index AM direction, removing all knowledge of the on-disk layout from the backend code and moving it into the AMs. Then you could actually implement the discussed "store all sequences in a single file" change by writing a new sequence AM for it. - Heikki
Re: [HACKERS] REINDEX CONCURRENTLY 2.0
On 2013-11-18 19:52:29 +0900, Michael Paquier wrote: On Sat, Nov 16, 2013 at 5:09 AM, Andres Freund and...@2ndquadrant.com wrote: On 2013-11-15 11:40:17 +0900, Michael Paquier wrote: - 20131114_3_reindex_concurrently.patch, providing the core feature. Patch 3 needs to have patch 2 applied first. Regression tests, isolation tests and documentation are included with the patch. Have you addressed my concurrency concerns from the last version? I have added documentation in the patch with a better explanation about why those choices of implementation are made. The dropping still isn't safe: After phase 4 we are in the state:

  old index: valid, live, !isdead
  new index: !valid, live, !isdead

Then you do an index_concurrent_set_dead() from that state on in phase 5. There you do WaitForLockers(locktag, AccessExclusiveLock) before index_set_state_flags(INDEX_DROP_SET_DEAD). That's not sufficient. Consider what happens with the following sequence:

1) WaitForLockers(locktag, AccessExclusiveLock) -> GetLockConflicts() => virtualxact 1 -> VirtualXactLock(1)
2) virtualxact 2 starts, opens the *old* index since it's currently the only valid one.
3) virtualxact 1 finishes
4) index_concurrent_set_dead() does index_set_state_flags(DROP_SET_DEAD)
5) another transaction (vxid 3) starts inserting data into the relation, updates only the new index, the old index is dead
6) vxid 2 inserts data, updates only the old index. Since it had the index already open the cache invalidations won't be processed.

Now the indexes are out of sync. There are entries only in the old index and entries only in the new index. Not good. I hate to repeat myself, but you really need to follow the current protocol for concurrently dropping indexes, which involves *first* marking the index as invalid so it won't be used for querying anymore, then waiting for everyone possibly still seeing that entry to finish, and only *after* that marking the index as dead.
You cannot argue away correctness concerns with potential deadlocks. c.f. http://www.postgresql.org/message-id/20130926103400.ga2471...@alap2.anarazel.de I am also still unconvinced that the logic in index_concurrent_swap() is correct. It very much needs to explain why no backend can see values that are inconsistent. E.g. what prevents a backend thinking the old and new indexes have the same relfilenode? MVCC snapshots don't seem to protect you against that. I am not sure there's a problem, but there certainly need to be more comments explaining why there are none. Something like the following might be possible:

Backend 1: start reindex concurrently, till phase 4
Backend 2: ExecOpenIndices() -> RelationGetIndexList (that list is consistent due to mvcc snapshots!)
Backend 2: -> index_open(old_index) (old relfilenode)
Backend 1: index_concurrent_swap() -> CommitTransaction() -> ProcArrayEndTransaction() (changes visible to others henceforth!)
Backend 2: -> index_open(new_index) => no cache invalidations yet, gets the old relfilenode
Backend 2: ExecInsertIndexTuples() => updates the same relation twice, corrupt!
Backend 1: still in CommitTransaction() -> AtEOXact_Inval() sends out invalidations

Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Sequence Access Method WIP
On 14.11.2013 22:10, Simon Riggs wrote: Includes test extension which allows sequences without gaps - gapless. I realize this is just for demonstration purposes, but it's worth noting that it doesn't actually guarantee that when you use the sequence to populate a column in the table, the column would not have gaps. Sequences are not transactional, so rollbacks will still produce gaps. The documentation is misleading on that point. Without a strong guarantee, it's a pretty useless extension. - Heikki
Re: ALTER SYSTEM SET command to change postgresql.conf parameters (RE: [HACKERS] Proposal for Allow postgresql.conf values to be changed via SQL [review])
On 17 November 2013 12:25 Amit Kapila wrote: On Sat, Nov 16, 2013 at 4:35 PM, Haribabu kommi haribabu.ko...@huawei.com wrote: On 16 November 2013 09:43 Amit Kapila wrote: On Fri, Nov 15, 2013 at 10:18 PM, Peter Eisentraut pete...@gmx.net wrote: On 11/14/13, 2:50 AM, Amit Kapila wrote: Find the rebased version attached with this mail.

  ereport(ERROR,
          (errcode(ERRCODE_CONFIG_FILE_ERROR),
           errmsg("configuration file \"%s\" contains errors",
-                 ConfigFileName)));
+                 ErrorConfFile)));

The ErrorConfFile prints postgresql.auto.conf only if there is any parsing problem with postgresql.auto.conf; otherwise it always prints postgresql.conf because of any other error. Changed to ensure ErrorConfFile contains the proper config file name. Note: I have not assigned the file name in case of error in the below loop, as the file name in gconf is NULL in most cases, and moreover this loops over guc_variables, which doesn't contain values/parameters from auto.conf. So I don't think it is required to assign ErrorConfFile in this loop.

  ProcessConfigFile(GucContext context)
  {
  ..
      for (i = 0; i < num_guc_variables; i++)
      {
          struct config_generic *gconf = guc_variables[i];
  ..
  }

Code changes are fine. If two or three errors are present in the configuration file, it prints the configuration file name of the last error only. Is it required to be mentioned in the documentation? If any postmaster setting changed by ALTER SYSTEM leads to failure of server start, what is the solution for the user to proceed further and start the server, given that it is mentioned that the auto.conf file shouldn't be edited manually? Yeah, but in case of emergency the user can change it to get the server started. Now the question is whether to mention it in the documentation; I think we can leave this decision to the committer. If he thinks that it is better to document then I will update it. Ok fine. Regards, Hari babu.
Re: [HACKERS] New option for pg_basebackup, to specify a different directory for pg_xlog
On Mon, Nov 18, 2013 at 6:31 PM, Haribabu kommi haribabu.ko...@huawei.com wrote: On 18 November 2013 11:19 Haribabu kommi wrote: On 17 November 2013 00:55 Fujii Masao wrote: On Sat, Nov 16, 2013 at 2:27 PM, Haribabu kommi haribabu.ko...@huawei.com wrote: on 15 November 2013 17:26 Magnus Hagander wrote: On Fri, Nov 15, 2013 at 12:10 PM, Haribabu kommi haribabu.ko...@huawei.com wrote: On 14 November 2013 23:59 Fujii Masao wrote: On Thu, Nov 14, 2013 at 9:08 PM, Haribabu kommi haribabu.ko...@huawei.com wrote: Please find attached the patch, for adding a new option for pg_basebackup, to specify a different directory for pg_xlog. Sounds good! Here are the review comments: Don't we need to prevent users from specifying the same directory in both --pgdata and --xlogdir? I feel no need to prevent it; even if the user specifies both --pgdata and --xlogdir as the same directory, all the transaction log files will be created in the base directory instead of the pg_xlog directory. Given how easy it would be to prevent that, I think we should. It would be an easy misunderstanding to specify that when you actually want it in wherever/pg_xlog. Specifying that would be redundant in the first place, but people can do that, and it would also be very easy to do it by mistake, and you'd end up with something that's really bad, including a recursive symlink. Presently with initdb also the user can specify both data and xlog directories as the same. To prevent the data directory and xlog directory from being the same, there is a way in Windows (the _fullpath API) to get an absolute path from a relative path, but I didn't find any such possibility on Linux. I didn't find any other way to check it; if anyone has any idea regarding this please let me know. What about make_absolute_path() in miscinit.c? The make_absolute_path() function gets the current working directory and adds the relative path to the CWD; this does not give a proper absolute path.
I have added a new function verify_data_and_xlog_dir_same() which changes the current working directory to the data directory and gets the CWD, does the same for the transaction log directory, then compares the two and throws an error if they match. Please check it once. Is there any other way to identify that both data and xlog directories are pointing to the same location, instead of comparing both absolute paths? Updated patch attached in the mail. Failure when the xlogdir doesn't exist is fixed in the updated patch attached in the mail. Thanks for updating the patch! With the patch, when I specify the same directory in both -D and --xlogdir, I got the following error.

  $ cd /tmp
  $ pg_basebackup -D hoge --xlogdir=/tmp/hoge -X fetch
  pg_basebackup: could not change directory to "hoge": No such file or directory

Regards, -- Fujii Masao
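As an aside, on POSIX systems one way to test whether two paths name the same directory, without canonicalizing either path, is to compare the device and inode numbers from stat(2). The sketch below is only that alternative idea under stated assumptions; it is not what the proposed patch does, and it would need a different method on Windows, where inode numbers from stat() are not reliable.

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/stat.h>

/*
 * Return true if the two paths refer to the same existing directory (or
 * file).  POSIX-only sketch: instead of building absolute paths, compare
 * device and inode numbers, which also catches symlinks and relative
 * spellings of the same location.
 */
bool
same_directory(const char *path1, const char *path2)
{
	struct stat st1,
				st2;

	if (stat(path1, &st1) != 0 || stat(path2, &st2) != 0)
		return false;			/* at least one path doesn't exist */

	return st1.st_dev == st2.st_dev && st1.st_ino == st2.st_ino;
}
```

For example, same_directory("/tmp", "/tmp/../tmp") is true even though the two strings differ, which is exactly the case that string comparison of user-supplied paths gets wrong.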
Re: [HACKERS] CREATE TABLE IF NOT EXISTS AS
On Sun, Nov 17, 2013 at 6:05 PM, David E. Wheeler da...@justatheory.com wrote: On Nov 16, 2013, at 4:40 PM, Tom Lane t...@sss.pgh.pa.us wrote: Co-worker asked a question I could not answer: Why is IF NOT EXISTS not supported by CREATE TABLE AS? That's an even worse idea than plain CREATE IF NOT EXISTS (which was put in over vocal objections from me and some other people). Not only would you not have the faintest clue as to the properties of the table afterwards, but no clue as to its contents either. You mean that, after running it, one cannot tell whether or not a new table was created without looking at it? I guess that makes sense, though sometimes I like to tell the system to assume that I know what I'm doing -- e.g., that either outcome works for me. Not essential as a core feature, mind you; I can use DO to accomplish the same thing. It's just a bit more work that way. And perhaps that's for the best. I'm planning to implement it for the next commit fest (2014-01)... Regards, -- Fabrízio de Royes Mello Consultoria/Coaching PostgreSQL Timbira: http://www.timbira.com.br Blog sobre TI: http://fabriziomello.blogspot.com Perfil Linkedin: http://br.linkedin.com/in/fabriziomello Twitter: http://twitter.com/fabriziomello
Re: [HACKERS] appendPQExpBufferVA vs appendStringInfoVA
On Mon, Nov 18, 2013 at 06:18:01PM +1300, David Rowley wrote: On Mon, Nov 18, 2013 at 1:01 AM, Marko Kreen mark...@gmail.com wrote: I am a bit suspicious of the performance impact of this patch, but think that it's still worthwhile as it cleans up a code style issue where a single string argument is given to a printf-style function without %s. This thread probably did not explain very well the point of this patch. All this kicked up from an earlier patch which added support for alignment in the log_line_prefix GUC. After some benchmarks were done on the proposed patch for that, it was discovered that replacing appendStringInfoString with appendStringInfo gave a big enough slowdown to matter in high-volume logging scenarios. That patch was revised so that appendStringInfo() was only used where it was really needed, and performance increased again. I then ran a few benchmarks, seen here: http://www.postgresql.org/message-id/caaphdvp2ulkpuhjnckonbgg2vxpvxolopzhrgxbs-m0r0v4...@mail.gmail.com to compare appendStringInfo(si, "%s", str) with appendStringInfoString(si, str) and appendStringInfo(si, str). The conclusion of those benchmarks was that appendStringInfoString() was the best function to use when no formatting was required, so I submitted a patch which replaced appendStringInfo() with appendStringInfoString() where that was possible, and that was accepted. appendPQExpBuffer() and appendPQExpBufferStr() are the front-end versions of appendStringInfo, so I spent an hour or so replacing these calls too, as it should show a similar speedup. Though in this case the performance is likely less critical, my thinking was more along the lines of: it increases performance a little bit with a total of 0 increase in code complexity. I was actually praising the patch in that it reduces complexity, so it's worth applying even if there is no performance win. With a performance win, it's doubleplus good. The patch applies and regtests work fine. So I mark it as ready for committer.
The findings in the benchmarks in the link above also showed that we might want to look into turning appendStringInfoString into a macro around appendBinaryStringInfo() to allow us to skip the strlen() call for string constants. It's unclear at the moment if this would be a good idea or much of an improvement, so it was left as something to think about for the future. In any case it should be a separate patch. Perhaps applying the same optimization for all such functions. -- marko
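The "macro around appendBinaryStringInfo()" idea can be sketched with a toy buffer type. Everything here is a simplified stand-in for the real StringInfo API, just to show how sizeof on a string literal avoids the runtime strlen():

```c
#include <assert.h>
#include <string.h>

/*
 * Minimal stand-in for StringInfo, just enough to show the idea; this is
 * not the real API from stringinfo.h.
 */
typedef struct Buf
{
	char		data[256];
	int			len;
} Buf;

static void
append_binary(Buf *buf, const char *s, int n)
{
	memcpy(buf->data + buf->len, s, n);
	buf->len += n;
	buf->data[buf->len] = '\0';
}

static void
append_string(Buf *buf, const char *s)
{
	append_binary(buf, s, (int) strlen(s));		/* runtime strlen() */
}

/*
 * For a string literal, sizeof() - 1 yields its length at compile time,
 * skipping strlen() entirely.  The sharp edge: applied to a plain
 * "char *" variable, sizeof() would wrongly give the pointer size, so
 * such a macro has to be used with care.
 */
#define append_literal(buf, lit) append_binary((buf), (lit), sizeof(lit) - 1)

const char *
demo_append(void)
{
	static Buf	buf;

	buf.len = 0;
	append_literal(&buf, "LOG: ");
	append_string(&buf, "duration: 1.234 ms");
	return buf.data;
}
```

That pointer-vs-literal pitfall is presumably part of why the idea was deferred rather than applied immediately.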
Re: [HACKERS] additional json functionality
On 11/15/13, 6:15 PM, Josh Berkus wrote: Thing is, I'm not particularly concerned about *Merlin's* specific use case, which there are ways around. What I am concerned about is that we may have users who have years of data stored in JSON text fields which won't survive an upgrade to binary JSON, because we will stop allowing certain things (ordering, duplicate keys) which are currently allowed in those columns. At the very least, if we're going to have that kind of backwards compatibility break we'll want to call the new version 10.0. We could do something like SQL/XML and specify the level of validity in a typmod, e.g., json(loose), json(strict), etc.
[HACKERS] Force optimizer to use hash/nl/merge join?
Hi All, Is there any way to force the optimizer to use a specific join operator? For example, in SQL Server, I can do this way select * from (A inner hash join B on A.a = B.b) inner loop join C on A.a = C.c I did some search but didn't find PostgreSQL had similar join hints except for enable_* setting which doesn't help in my case. Best Regards, Zhan
Re: ALTER SYSTEM SET command to change postgresql.conf parameters (RE: [HACKERS] Proposal for Allow postgresql.conf values to be changed via SQL [review])
On Mon, Nov 18, 2013 at 6:28 PM, Haribabu kommi haribabu.ko...@huawei.com wrote: On 17 November 2013 12:25 Amit Kapila wrote: On Sat, Nov 16, 2013 at 4:35 PM, Haribabu kommi Find the rebased version attached with this mail.

  ereport(ERROR,
          (errcode(ERRCODE_CONFIG_FILE_ERROR),
           errmsg("configuration file \"%s\" contains errors",
-                 ConfigFileName)));
+                 ErrorConfFile)));

The ErrorConfFile prints postgresql.auto.conf only if there is any parsing problem with postgresql.auto.conf; otherwise it always prints postgresql.conf because of any other error. Changed to ensure ErrorConfFile contains the proper config file name. Note: I have not assigned the file name in case of error in the below loop, as the file name in gconf is NULL in most cases, and moreover this loops over guc_variables, which doesn't contain values/parameters from auto.conf. So I don't think it is required to assign ErrorConfFile in this loop.

  ProcessConfigFile(GucContext context)
  {
  ..
      for (i = 0; i < num_guc_variables; i++)
      {
          struct config_generic *gconf = guc_variables[i];
  ..
  }

Code changes are fine. If two or three errors are present in the configuration file, it prints the configuration file name of the last error only. Is it required to be mentioned in the documentation? Do you mean parsing errors or some run-time error? Could you explain with an example? With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Force optimizer to use hash/nl/merge join?
On Mon, Nov 18, 2013 at 7:58 PM, Zhan Li zhanl...@gmail.com wrote: Hi All, Is there any way to force the optimizer to use a specific join operator? For example, in SQL Server, I can do this way select * from (A inner hash join B on A.a = B.b) inner loop join C on A.a = C.c I did some search but didn't find PostgreSQL had similar join hints except for enable_* setting which doesn't help in my case. PostgreSQL doesn't have join hints. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] inherit support for foreign tables
On Thu, Nov 14, 2013 at 12:28 AM, Tom Lane t...@sss.pgh.pa.us wrote: 2) Allow foreign tables to have CHECK constraints Like NOT NULL, I think we don't need to enforce the check during INSERT/UPDATE against the foreign table. Really? It's one thing to say that somebody who adds a CHECK constraint to a foreign table is responsible to make sure that the foreign data will satisfy the constraint. It feels like a different thing to say that ALTER TABLE ADD CONSTRAINT applied to a parent table will silently assume that some child table that happens to be foreign doesn't need any enforcement. Perhaps more to the point, inheritance trees are the main place where the planner depends on the assumption that CHECK constraints represent reality. Are we really prepared to say that it's the user's fault if the planner generates an incorrect plan on the strength of a CHECK constraint that's not actually satisfied by the foreign data? If so, that had better be documented by this patch. But for a project that refuses to let people create a local CHECK or FOREIGN KEY constraint without mechanically checking it, it seems pretty darn weird to be so laissez-faire about constraints on foreign data. I can see both sides of this issue. We certainly have no way to force the remote side to enforce CHECK constraints defined on the local side, because the remote side could also be accepting writes from other sources that don't have any matching constraint. But having said that, I can't see any particularly principled reason why we shouldn't at least check the new rows we insert ourselves. After all, we could be in the situation proposed by KaiGai Kohei, where the foreign data wrapper API is being used as a surrogate storage engine API - i.e. there are no writers to the foreign side except ourselves. In that situation, it would seem odd to randomly fail to enforce the constraints. On the other hand, the performance costs of checking every row bound for the remote table could be quite steep.
Consider an update on an inheritance hierarchy that sets a = a + 1 for every row. If we don't worry about verifying that the resulting rows satisfy all local-side constraints, we can potentially ship a single update statement to the remote server and let it do all the work there. But if we DO have to worry about that, then we're going to have to ship every updated row over the wire in at least one direction, if not both. If the purpose of adding CHECK constraints was to enable constraint exclusion, that's a mighty steep price to pay for it. I think it's been previously proposed that we have some version of a CHECK constraint that effectively acts as an assertion for query optimization purposes, but isn't actually enforced by the system. I can see that being useful in a variety of situations, including this one. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] additional json functionality
On Sun, Nov 17, 2013 at 10:19 PM, Andrew Dunstan and...@dunslane.net wrote: I don't think any name that doesn't begin with json is acceptable. I could live with jsonb. It has the merit of brevity, but maybe it's a tad too close to json to be the right answer. I think that seems right. Couple thoughts: *) Aside from the text in and out routines, how is 'jsonb' different from the coming 'nested hstore'? Enough to justify two code bases? *) How much of the existing json API has to be copied over to the jsonb type and how exactly is that going to happen? For example, I figure we'd need a record_to_jsonb etc. for sure, but do we also need a jsonb_each()...can't we overload instead? Maybe we can cheat a little bit and overload the functions so that one of the existing APIs (hstore or json) can be recovered -- only adding what minimal functionality needs to be added to handle the type distinction (mostly on serialization routines and casts). What I'm driving at here is that it would be nice if the API was not strongly bifurcated: this would cause quite a bit of mindspace confusion. merlin
[HACKERS] Review: pre-commit triggers
Review for Andrew Dunstan's patch in CF 2013-11:

  https://commitfest.postgresql.org/action/patch_view?id=1312

This review is more from a usage point of view; I would like to spend more time looking at the code but only so many hours in a day, etcetera. I hope this is useful as-is.

Submission review
-----------------

* Is the patch in a patch format which has context?

  Yes

* Does it apply cleanly to the current git master?

  Yes. No new files are introduced by this patch.

* Does it include reasonable tests, necessary doc patches, etc?

  Documentation patches included, but no tests. (I have a feeling it might
  be necessary to add a FAQ somewhere as to why there's no transaction
  rollback trigger).

Usability review
----------------

* Does the patch actually implement that?

  Yes.

* Do we want that?

  There is an item in the todo-list "Add database and transaction-level
  triggers"; the linked thread from 2008:

    http://archives.postgresql.org/pgsql-hackers/2008-05/msg00620.php

  doesn't seem too receptive to the idea, but this time round there don't
  seem to be any particular objections. Personally I don't have a use-case,
  but it certainly looks like a useful compatibility feature when porting
  from other databases. Andrew mentions porting from Firebird; for
  reference this is the relevant Firebird documentation:

    http://www.firebirdsql.org/refdocs/langrefupd21-ddl-trigger.html

* Do we already have it?

  No.

* Does it follow SQL spec, or the community-agreed behavior?

  Not sure about the SQL spec. The Compatibility section of the CREATE
  TRIGGER doc page doesn't mention anything along these lines:

    http://www.postgresql.org/docs/9.3/static/sql-createtrigger.html#SQL-CREATETRIGGER-COMPATIBILITY

* Does it include pg_dump support (if applicable)?

  Yes

Feature test
------------

* Does the feature work as advertised?

  Yes.

* Are there corner cases the author has failed to consider?
  I'm not sure whether it's intended behaviour, but if the commit trigger
  itself fails, there's an implicit rollback, e.g.:

    postgres=# BEGIN ;
    BEGIN
    postgres=*# INSERT INTO foo (id) VALUES (1);
    INSERT 0 1
    postgres=*# COMMIT ;
    NOTICE:  Pre-commit trigger called
    ERROR:  relation "bar" does not exist
    LINE 1: SELECT foo FROM bar
                            ^
    QUERY:  SELECT foo FROM bar
    CONTEXT:  PL/pgSQL function pre_commit_trigger() line 4 at EXECUTE statement
    postgres=#

  I'd expect this to lead to a failed transaction block, or at least some
  sort of notice that the transaction itself has been rolled back.

  Note that 'DROP FUNCTION broken_trigger_function() CASCADE' will remove
  the broken function without having to resort to single user mode, so it
  doesn't seem like an error in the commit trigger itself will necessarily
  lead to an intractable situation.

* Are there any assertion failures or crashes?

  No.

Performance review
------------------

* Does the patch slow down simple tests?

  No.

* If it claims to improve performance, does it?

  n/a

* Does it slow down other things?

  No.

Coding review
-------------

* Does it follow the project coding guidelines?

  Yes.

* Are there portability issues?

  I don't see any.

* Will it work on Windows/BSD etc?

  Tested on OS X and Linux. I don't see anything, e.g. system calls, which
  might prevent it working on Windows.

* Does it do what it says, correctly?

  As far as I can tell, yes.

* Does it produce compiler warnings?

  No.

* Can you make it crash?

  So far, no.

Architecture review
-------------------

* Is everything done in a way that fits together coherently with other features/modules?

  The changes are not all that complex and I don't see any issues; however,
  I'm not very familiar with that area of the code (but hey, that's why I'm
  taking a look) so I might be missing something.

* Are there interdependencies that can cause problems?

  I can't see any.
Additional notes
----------------

A sample transaction commit trigger:

    CREATE OR REPLACE FUNCTION pre_commit_trigger()
      RETURNS EVENT_TRIGGER
      LANGUAGE 'plpgsql'
    AS $$
      BEGIN
        RAISE NOTICE 'Pre-commit trigger called';
      END;
    $$;

    CREATE EVENT TRIGGER trg_pre_commit
      ON transaction_commit
      EXECUTE PROCEDURE pre_commit_trigger();

Some relevant links:

  http://www.postgresql.org/docs/9.3/interactive/event-triggers.html
  http://www.postgresql.org/docs/9.3/interactive/sql-createeventtrigger.html
  http://www.postgresql.org/docs/9.3/interactive/sql-prepare-transaction.html
  http://en.wikipedia.org/wiki/Database_trigger

Regards

Ian Barwick
Re: [HACKERS] additional json functionality
On 11/18/2013 09:38 AM, Merlin Moncure wrote: On Sun, Nov 17, 2013 at 10:19 PM, Andrew Dunstan and...@dunslane.net wrote: I don't think any name that doesn't begin with json is acceptable. I could live with jsonb. It has the merit of brevity, but maybe it's a tad too close to json to be the right answer. I think that seems right. Couple thoughts: *) Aside from the text in and out routines, how is 'jsonb' different from the coming 'nested hstore'? Enough to justify two code bases? The discussion has been around making a common library that would be used for both. *) How much of the existing json API has to be copied over to the jsonb type and how exactly is that going to happen? For example, I figure we'd need a record_to_jsonb etc. for sure, but do we also need a jsonb_each()...can't we overload instead? Overloading is what I was planning to do. cheers andrew
Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE
On 14.10.2013 07:12, Peter Geoghegan wrote: On Wed, Oct 9, 2013 at 1:11 PM, Peter Geoghegan p...@heroku.com wrote: Unfortunately, I have a very busy schedule in the month ahead, including travelling to Ireland and Japan, so I don't think I'm going to get the opportunity to work on this too much. I'll try and produce a V4 that formally proposes some variant of my ideas around visibility of locked tuples. V4 is attached. Most notably, this adds the modifications to HeapTupleSatisfiesMVCC(), though they're neater than in the snippet I sent earlier. There is also some clean-up around row-level locking. That code has been simplified. I also try and handle serialization failures in a better way, though that really needs the attention of a subject matter expert. There are a few additional XXX comments highlighting areas of concern, particularly around serializable behavior. I've deferred making higher isolation levels care about wrongfully relying on the special HeapTupleSatisfiesMVCC() exception (e.g. they won't throw a serialization failure, mostly because I couldn't decide on where to do the test on time prior to travelling tomorrow). I've added code to do heap_prepare_insert before value locks are held. Whatever our eventual value locking implementation, that's going to be a useful optimization. Though unfortunately I ran out of time to give this the scrutiny it really deserves, I suppose that it's something that we can return to later. I ask that reviewers continue to focus on concurrency issues and broad design issues, and continue to defer discussion about an eventual value locking implementation. I continue to think that that's the most useful way of proceeding for the time being. My earlier points about probable areas of concern [1] remain a good place for reviewers to start. I think it's important to recap the design goals of this. 
I don't think these have been listed before, so let me try: * It should be usable and perform well for both large batch updates and small transactions. * It should perform well both when there are no duplicates, and when there are lots of duplicates And from that follows some finer requirements: * Performance when there are no duplicates should be close to raw INSERT performance. * Performance when all rows are duplicates should be close to raw UPDATE performance. * We should not leave behind large numbers of dead tuples in either case. Anything else I'm missing? What about exclusion constraints? I'd like to see this work for them as well. Currently, exclusion constraints are checked after the tuple is inserted, and you abort if the constraint was violated. We could still insert the heap and index tuples first, but instead of aborting on violation, we would kill the heap tuple we already inserted and retry. There are some complications there, like how to wake up any other backends that are waiting to grab a lock on the tuple we just killed, but it seems doable. That would, however, perform badly and leave garbage behind if there are duplicates. A refinement of that would be to first check for constraint violations, then insert the tuple, and then check again. That would avoid the garbage in most cases, but would perform much more poorly when there are no duplicates, because it needs two index scans for every insertion. A further refinement would be to keep track of how many duplicates there have been recently, and switch between the two strategies based on that. That cost of doing two scans could be alleviated by using markpos/restrpos to do the second scan. That is presumably cheaper than starting a whole new scan with the same key. (markpos/restrpos don't currently work for non-MVCC snapshots, so that'd need to be fixed, though) And that detour with exclusion constraints takes me back to the current patch :-). 
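The insert-then-check-and-retry scheme discussed here is essentially what applications already do today at the SQL level; a minimal sketch of that well-known retry loop (similar to the UPSERT example in the PL/pgSQL docs; table and column names invented for illustration):

```sql
-- Illustrative only: the classic application-level retry loop that an
-- INSERT ... ON DUPLICATE KEY feature aims to replace.
CREATE OR REPLACE FUNCTION merge_kv(k text, v text) RETURNS void AS $$
BEGIN
    LOOP
        UPDATE kv SET val = v WHERE key = k;
        IF FOUND THEN
            RETURN;
        END IF;
        BEGIN
            INSERT INTO kv (key, val) VALUES (k, v);
            RETURN;
        EXCEPTION WHEN unique_violation THEN
            -- someone inserted concurrently; loop and try the UPDATE again
        END;
    END LOOP;
END;
$$ LANGUAGE plpgsql;
```

Each failed INSERT in this loop leaves a dead tuple behind, which is exactly the garbage-accumulation concern raised above for the naive insert-then-kill approaches.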
What if you implemented the unique check in a similar fashion too (when doing INSERT ON DUPLICATE KEY ...)? First, scan for a conflicting key, and mark the position. Then do the insertion to that position. If the insertion fails because of a duplicate key (which was inserted after we did the first scan), mark the heap tuple as dead, and start over. The indexam changes would be quite similar to the changes you made in your patch, but instead of keeping the page locked, you'd only hold a pin on the target page (if even that). The first indexam call would check that the key doesn't exist, and remember the insert position. The second call would re-find the previous position, and insert the tuple, checking again that there really wasn't a duplicate key violation. The locking aspects would be less scary than your current patch. I'm not sure if that would perform as well as your current patch. I must admit your current approach is pretty optimal performance-wise. But I'd like to see it, and that would be a solution for exclusion constraints in any case. One fairly limitation with your current approach
Re: [HACKERS] inherit support for foreign tables
Robert Haas robertmh...@gmail.com writes: On Thu, Nov 14, 2013 at 12:28 AM, Tom Lane t...@sss.pgh.pa.us wrote: 2) Allow foreign tables to have CHECK constraints Like NOT NULL, I think we don't need to enforce the check during INSERT/UPDATE against a foreign table. Really? I think it's been previously proposed that we have some version of a CHECK constraint that effectively acts as an assertion for query optimization purposes, but isn't actually enforced by the system. I can see that being useful in a variety of situations, including this one. Yeah, I think it would be much smarter to provide a different syntax to explicitly represent the notion that we're only assuming the condition is true, and not trying to enforce it. regards, tom lane
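Tom's suggestion can be illustrated with invented syntax (nothing below is implemented; the NOT ENFORCED keyword is made up purely to show the idea):

```sql
-- Invented syntax, for illustration only: a CHECK-like assertion the
-- planner could trust for constraint exclusion, but that the server
-- would never verify on INSERT/UPDATE against the foreign side.
CREATE FOREIGN TABLE orders_2013 (
    id        int,
    placed_on date
) SERVER remote_srv OPTIONS (table_name 'orders_2013');

ALTER FOREIGN TABLE orders_2013
    ADD CHECK (placed_on >= DATE '2013-01-01'
           AND placed_on <  DATE '2014-01-01')
    NOT ENFORCED;  -- hypothetical keyword marking an "assumed" constraint
```

A distinct syntax makes the trade-off explicit: if the remote data violates the assumption, queries may silently return wrong answers, which is why overloading plain CHECK for this was considered undesirable.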
Re: [HACKERS] New option for pg_basebackup, to specify a different directory for pg_xlog
On 18 November 2013 18:45 Fujii Masao wrote: On Mon, Nov 18, 2013 at 6:31 PM, Haribabu kommi haribabu.ko...@huawei.com wrote: On 18 November 2013 11:19 Haribabu kommi wrote: On 17 November 2013 00:55 Fujii Masao wrote: On Sat, Nov 16, 2013 at 2:27 PM, Haribabu kommi haribabu.ko...@huawei.com wrote: on 15 November 2013 17:26 Magnus Hagander wrote: On Fri, Nov 15, 2013 at 12:10 PM, Haribabu kommi haribabu.ko...@huawei.com wrote: On 14 November 2013 23:59 Fujii Masao wrote: On Thu, Nov 14, 2013 at 9:08 PM, Haribabu kommi haribabu.ko...@huawei.com wrote: Please find attached the patch, for adding a new option for pg_basebackup, to specify a different directory for pg_xlog. Sounds good! Here are the review comments: Don't we need to prevent users from specifying the same directory in both --pgdata and --xlogdir? I feel no need to prevent it; even if the user specifies both --pgdata and --xlogdir as the same directory, all the transaction log files will be created in the base directory instead of the pg_xlog directory. Given how easy it would be to prevent that, I think we should. It would be an easy misunderstanding to specify that when you actually want it in wherever/pg_xlog. Specifying that would be redundant in the first place, but people can do that; it would also be very easy to do it by mistake, and you'd end up with something that's really bad, including a recursive symlink. Presently with initdb also the user can specify both data and xlog directories as the same. To prevent the data directory and xlog directory being the same, there is a way in windows (_fullpath api) to get an absolute path from a relative path, but I didn't find any such possibility in linux. I didn't find any other way to check it; if anyone has any idea regarding this please let me know. What about make_absolute_path() in miscinit.c? The make_absolute_path() function gets the current working directory and adds the relative path to the CWD; this is not giving a proper absolute path.
I have added a new function verify_data_and_xlog_dir_same() which changes the current working directory to the data directory and gets the CWD, and does the same for the transaction log directory. It compares both data and xlog directories and throws an error. Please check it once. Is there any other way to identify that both data and xlog directories are pointing to the same location, instead of comparing both absolute paths? Updated patch attached in the mail. Failure when the xlogdir doesn't exist is fixed in the updated patch attached in the mail. Thanks for updating the patch! With the patch, when I specify the same directory in both -D and --xlogdir, I got the following error.

$ cd /tmp
$ pg_basebackup -D hoge --xlogdir=/tmp/hoge -X fetch
pg_basebackup: could not change directory to "hoge": No such file or directory

Thanks. After finding the xlog directory's absolute path, returning back to the current working directory was missed; because of this it was failing while finding the absolute path for the data directory. Apart from this change, the code is reorganized a bit. Updated patch is attached in the mail. Please review it once. Regards, Hari babu.

[Attachment: UserSpecifiedxlogDir_v5.patch]
Re: ALTER SYSTEM SET command to change postgresql.conf parameters (RE: [HACKERS] Proposal for Allow postgresql.conf values to be changed via SQL [review])
On 18 November 2013 20:01 Amit Kapila wrote: On Mon, Nov 18, 2013 at 6:28 PM, Haribabu kommi haribabu.ko...@huawei.com wrote: On 17 November 2013 12:25 Amit Kapila wrote: On Sat, Nov 16, 2013 at 4:35 PM, Haribabu kommi Find the rebased version attached with this mail.

 ereport(ERROR,
         (errcode(ERRCODE_CONFIG_FILE_ERROR),
          errmsg("configuration file \"%s\" contains errors",
-                ConfigFileName)));
+                ErrorConfFile)));

ErrorConfFile prints postgresql.auto.conf only if there is a parsing problem with postgresql.auto.conf; otherwise it always prints postgresql.conf for any other error. Changed to ensure ErrorConfFile contains the proper config file name. Note: I have not assigned the file name in case of error in the below loop, as the file name in gconf is NULL in most cases, and moreover this loops over guc_variables, which doesn't contain values/parameters from auto.conf. So I don't think it is required to assign ErrorConfFile in this loop.

ProcessConfigFile(GucContext context)
{
    ..
    for (i = 0; i < num_guc_variables; i++)
    {
        struct config_generic *gconf = guc_variables[i];
        ..
    }

Code changes are fine. If two or three errors are present in the configuration file, it prints only the configuration file of the last error. Is it required to be mentioned in the documentation? Do you mean to say parsing errors or some run-time error? Could you explain with an example?

LOG: parameter "shared_buffers" cannot be changed without restarting the server
LOG: parameter "port" cannot be changed without restarting the server
LOG: configuration file "/home/hari/installation/bin/../../data/postgresql.auto.conf" contains errors; unaffected changes were applied

The shared_buffers parameter is changed in postgresql.conf and port is changed in postgresql.auto.conf. The error message reports only the file of the last error that occurred. Is it required to mention the above behavior in the document?
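For readers following the thread: the feature under review records settings in postgresql.auto.conf, which is applied on top of postgresql.conf at (re)load time, which is why a single reload can surface errors from either file:

```sql
-- ALTER SYSTEM writes the value into postgresql.auto.conf; it takes
-- effect on the next reload (or restart, for parameters such as port
-- that cannot be changed while the server is running).
ALTER SYSTEM SET port = '5433';
SELECT pg_reload_conf();
```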
Regards, Hari babu
Re: [HACKERS] Small improvement to json out functions by using cstring_to_text_with_len instead of cstring_to_text
On Thu, Nov 14, 2013 at 2:18 AM, David Rowley dgrowle...@gmail.com wrote: Hi, Here's a small patch which should speed up json out functions a little bit by removing a call to strlen() on what could be a long string. The length of the string is already known by the StringInfoData, so there's no point in letting cstring_to_text() loop over the whole string again. Committed. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Can we add sample systemd service file to git repo?
On 12 November 2013 20:54, Nigel Heron nhe...@querymetrics.com wrote: On Tue, Nov 12, 2013 at 11:47 AM, Devrim GÜNDÜZ dev...@gunduz.org wrote: ... http://svn.pgrpms.org/browser/rpm/redhat/9.3/postgresql/F-19/postgresql-9.3.service is an example of what we use in the RPMs. (if website fails, please just reload) Attached is a modified version that will work with the compile defaults. Hi, should we put PGPORT in the systemd file without an easy way to override it? As an example, when yum updating minor versions on fedora 18 (using the yum.postgresql.org rpms), any changes to the systemd service file are overwritten by the new rpm and the port is forced back to 5432. This makes having pg9.2 and pg9.3 on the same box conflict after each minor version update. IIRC there's a %config(noreplace) directive in the .spec that can cause the new file to be saved with an extension (.rpmnew?) instead of overwriting the old one.
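The %config(noreplace) behaviour Nigel recalls can be sketched as a .spec fragment (the path macro here is illustrative; the actual spec layout may differ):

```
# Hypothetical .spec fragment: with %config(noreplace), rpm leaves a
# locally modified unit file in place on upgrade, and installs the
# packaged copy as postgresql-9.3.service.rpmnew instead of
# overwriting the admin's edits (such as a changed PGPORT).
%files
%config(noreplace) %{_unitdir}/postgresql-9.3.service
```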
Re: [HACKERS] Freezing without write I/O
On 2013-09-25 12:31:20 +0300, Heikki Linnakangas wrote: Hmm, some of those are trivial, but others, rewrite_heap_tuple() are currently only passed the HeapTuple, with no reference to the page where the tuple came from. Instead of changing code to always pass that along with a tuple, I think we should add a boolean to HeapTupleData, to indicate if the tuple came from a mature page. If the flag is set, the tuple should be considered visible to everyone, without looking at the xmin/xmax. This might affect extensions, although usually external C code that has to deal with HeapTuples will use functions like heap_form_tuple to do so, and won't need to deal with low-level stuff or visibility themselves. Attached is a new version, which adds that field to HeapTupleData. Most of the issues you listed above have been fixed, plus a bunch of other bugs I found myself. The bug that Jeff ran into with his count.pl script has also been fixed. This seems a bit hacky to me. Change a widely used struct because a few functions don't get passed enough information? Visibility checks are properly done with a buffer passed along; that some places have cut corners around that doesn't mean we have to continue to do so. The pullups around rs_pagemature are imo indicative of this (there's even a bug because a page can be all visible before it's mature, right? That could then cause an assert somewhere down the line when we check page and tuple are coherent). Ok, making another scan through this:

* Why can we do PageSetLSN(page, RangeSwitchLSN) in heap_* when the action doesn't need WAL? And why is that correct? I can see doing that in vacuum itself, but doing it when we write *new* data to the page?
* heap_freeze_page(): The PageSetLSN(page, RangeSwitchLSN) if there's nothing to log is not WAL logged. Which means that if we crash it won't necessarily be set, so the VM and the heap lsn might get out of sync. That probably doesn't have any bad effects, but imo deserves a comment.
* heap_freeze_page(): PageUpdateNeedsFreezing() can now be true before and after. That's a bit confusing :/
* GetSafeClogTruncationPoint truncates the xid-lsn ranges, but there's also an, uncommented, TruncateXidLSNRanges. At least how they work together should be described better.
* There's quite some FIXMEs around
* Let's move the xid-lsn ranges code from GetNewTransactionId() into its own function.
* PageMatureLSN and RangeSwitchLSN should be documented somewhere. They are implicitly, but they are used widely enough that that doesn't seem sufficient.
* pg_controldata should output pageMatureLSN

Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] nested hstore patch
Hi, On 2013-11-12 22:35:31 +0400, Teodor Sigaev wrote: Attached patch adds nesting feature, types (string, bool and numeric values), arrays and scalar to hstore type. I took a quick peek at this:

* You cannot simply catch and ignore errors by doing

+ PG_TRY();
+ {
+     n = DatumGetNumeric(DirectFunctionCall3(numeric_in, CStringGetDatum(s->val), 0, -1));
+ }
+ PG_CATCH();
+ {
+     n = NULL;
+ }
+ PG_END_TRY();

That skips cleanup and might ignore some errors (think memory allocation failures). But why do you even want to silently ignore errors there?
* Shouldn't the checks for v->size be done before filling the datastructures in makeHStoreValueArray() and makeHStoreValuePairs()?
* could you make ORDER_PAIRS() a function instead of a macro? It's pretty long and there's no reason not to use a function.
* You call numeric_recv via recvHStoreValue via recvHStore without checks on the input length. That seems - without having checked it in detail - a good way to read unrelated memory. Generally ISTM the input needs to be more carefully checked in the whole recv function.
* There's quite some new, completely uncommented, code. Especially in hstore_op.c.
* the _PG_init you added should probably do a EmitWarningsOnPlaceholders().
* why does hstore need its own atoi?
* shouldn't all the function prototypes be marked as externs?
* Lots of trailing whitespaces, quite some long lines, cuddly braces, ...
* I think hstore_compat.c's header should be updated.

Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] appendPQExpBufferVA vs appendStringInfoVA
On 18.11.2013 15:40, Marko Kreen wrote: On Mon, Nov 18, 2013 at 06:18:01PM +1300, David Rowley wrote: On Mon, Nov 18, 2013 at 1:01 AM, Marko Kreen mark...@gmail.com wrote: I am a bit suspicious of the performance impact of this patch, but think that it's still worthwhile as it improves code style where a single string argument is given to a printf-style function without %s. This thread probably did not explain very well the point of this patch. All this kicked up from an earlier patch which added alignment in the log_line_prefix GUC. After some benchmarks were done on the proposed patch for that, it was discovered that replacing appendStringInfoString with appendStringInfo gave a big enough slowdown to matter in high volume logging scenarios. That patch was revised and the appendStringInfo()'s were only used when they were really needed and performance increased again. I then ran a few benchmarks seen here: http://www.postgresql.org/message-id/CAApHDvp2uLKPuHJnCkonBGG2VXPvxoLOPzhrGXBS-M0r0v4wSA%40mail.gmail.com to compare appendStringInfo(si, "%s", str); with appendStringInfoString(si, str); and appendStringInfo(si, str);. The conclusion of those benchmarks was that appendStringInfoString() was the best function to use when no formatting was required, so I submitted a patch which replaced appendStringInfo() with appendStringInfoString() where that was possible, and that was accepted. appendPQExpBuffer() and appendPQExpBufferStr() are the frontend versions of appendStringInfo, so I spent an hour or so replacing these calls too as it should show a similar speedup; though in this case the performance is likely less critical, my thinking was more along the lines of: it increases performance a little bit with a total of 0 increase in code complexity. I was actually praising the patch that it reduces complexity, so it's worth applying even if there is no performance win. With a performance win, it's doubleplus good. The patch applies and regtests work fine.
So I mark it as ready for committer. Ok, committed. PS. I'm not 100% convinced that this kind of code churn is worthwhile. It doesn't make any difference to readability in my eyes, in general. In some cases it does, but in others it messes with indentation (describeOneTables in src/bin/psql/describe.c). It also makes backpatching more laborious. But it's done now, hopefully this is a one-off thing. - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] CLUSTER FREEZE
On Mon, Nov 18, 2013 at 09:22:58PM +1300, David Rowley wrote: So now I'm wondering what the next move should be for this patch? a. Are we waiting on Robert's patch to be committed before we can apply Thomas's cluster with freeze as default? b. Are we waiting on me reviewing one or both of the patches? Which one? I think probably it sounds safer not to make freeze the default, but then if we release 9.4 with CLUSTER FREEZE and then in 9.5/10 decide it should be the default, then we have a redundant syntax to support for ever and ever. Decision time? Yes, we probably should make a decision, unless Robert's idea can be implemented. We have to balance the ease of debugging MVCC failures with the interface we give to the user community. From my perspective, I don't see how we can justify the addition of a FREEZE option to CLUSTER. If we explain that you would always use FREEZE unless you want to preserve MVCC information for later debugging, users will reply with Huh, why would I want that? MVCC debugging is just too rare an event for them to understand its usefulness. If we do add FREEZE, all we would be doing is delaying the time when all CLUSTER operations will use FREEZE, and hence debugging will be just as difficult. My point is that with full knowledge, everyone would use FREEZE unless they expect MVCC bugs, which is going to be an almost zero population. Many MVCC failures are reproducible, so if we provide a C define that disables the freezing so MVCC information can be preserved, that might be good enough. Developers could enable the macro, and the macro could be used in other places where MVCC information is lost. This will make some MVCC debugging harder, and will perhaps allow some MVCC bugs to exist longer. I believe there are other cases where rows could be frozen but we have avoided it for MVCC debugging reasons. Another idea would be the addition of a GUC that can disable optimistic freezing. This could be enabled by sites that suspect MVCC bugs.
-- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. +
Re: [HACKERS] CLUSTER FREEZE
On 2013-11-18 11:39:44 -0500, Bruce Momjian wrote: On Mon, Nov 18, 2013 at 09:22:58PM +1300, David Rowley wrote: So now I'm wondering what the next move should be for this patch? a. Are we waiting on Robert's patch to be committed before we can apply Thomas's cluster with freeze as default? b. Are we waiting on me reviewing one or both of the patches? Which one? I think probably it sounds safer not to make freeze the default, but then if we release 9.4 with CLUSTER FREEZE and then in 9.5/10 decide it should be the default, then we have a redundant syntax to support for ever and ever. Decision time? Yes, we probably should make a decision, unless Robert's idea can be implemented. We have to balance the ease of debugging MVCC failures with the interface we give to the user community. Imo that patch really doesn't need too much further work. From my perspective, I don't see how we can justify the addition of a FREEZE option to CLUSTER. If we explain that you would always use FREEZE unless you want to preserve MVCC information for later debugging, users will reply with Huh, why would I want that? I tend to agree that we should work towards not needing that option. Many MVCC failures are reproducible, so if we provide a C define that disables the freezing so MVCC information can be preserved, that might be good enough. Developers could enable the macro, and the macro could be used in other places where MVCC information is lost. This will make some MVCC debugging harder, and will perhaps allow some MVCC bugs to exist longer. I believe there are other cases where rows could be frozen but we have avoided it for MVCC debugging reasons. Another idea would be the addition of a GUC that can disable optimistic freezing. This could be enabled by sites that suspect MVCC bugs. I think that'd be an enormous failure. You usually don't need to look at those because there's a bug in postgres' mvcc logic, but because there's one somewhere else (application, other postgres code).
And looking at the mvcc information helps you to narrow it down, so you have a chance of actually finding a reproducible bug. Besides, in many of the !rewrite cases it's far from clear in which cases it's a benefit to freeze earlier. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Review: pre-commit triggers
On 11/18/2013 09:39 AM, Ian Lawrence Barwick wrote: Review for Andrew Dunstan's patch in CF 2013-11: https://commitfest.postgresql.org/action/patch_view?id=1312 This review is more from a usage point of view, I would like to spend more time looking at the code but only so many hours in a day, etcetera; I hope this is useful as-is. Many thanks for the fast review. * Does it include reasonable tests, necessary doc patches, etc? Documentation patches included, but no tests. (I have a feeling it might be necessary to add a FAQ somewhere as to why there's no transaction rollback trigger). I'll add some tests very shortly, and see about adding that to the docs. * Are there corner cases the author has failed to consider? I'm not sure whether it's intended behaviour, but if the commit trigger itself fails, there's an implicit rollback, e.g.: postgres=# BEGIN ; BEGIN postgres=*# INSERT INTO foo (id) VALUES (1); INSERT 0 1 postgres=*# COMMIT ; NOTICE: Pre-commit trigger called ERROR: relation bar does not exist LINE 1: SELECT foo FROM bar ^ QUERY: SELECT foo FROM bar CONTEXT: PL/pgSQL function pre_commit_trigger() line 4 at EXECUTE statement postgres=# I'd expect this to lead to a failed transaction block, or at least some sort of notice that the transaction itself has been rolled back. I'll check on this. Note that 'DROP FUNCTION broken_trigger_function() CASCADE' will remove the broken function without having to resort to single user mode so it doesn't seem like an error in the commit trigger itself will necessarily lead to an intractable situation. Given that, do we want to keep the bar on these operating in single user mode? I can easily drop it and just document this way out of difficulty. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Turning recovery.conf into GUCs
Hi, On 2013-11-15 22:38:05 -0500, Jaime Casanova wrote: sorry, i clearly misunderstood you. attached a rebased patch with those functions removed.

* --write-standby-enable seems to lose quite some functionality in comparison to --write-recovery-conf since it doesn't seem to set primary_conninfo, standby anymore.
* CheckRecoveryReadyFile() doesn't seem to be a very descriptive function name.
* Why does StartupXLOG() now use ArchiveRecoveryRequested && StandbyModeRequested instead of just the former?
* I am not sure I like recovery.trigger as a name. It seems too close to what I've seen people use to trigger failover, and too close to trigger_file.
* You check for a trigger file in a way that's not compatible with it being NULL. Why did you change that?
- if (TriggerFile == NULL)
+ if (!trigger_file[0])
* You made recovery_target_inclusive/pause_at_recovery_target PGC_SIGHUP - did you review that actually works? Imo that should be changed in a separate commit.
* Maybe we should rename names like pause_at_recovery_target to recovery_pause_at_target? Since we already force everyone to bother changing their setup...
* the description of archive_cleanup_command seems wrong to me.
* Why did you change some of the recovery gucs to lowercase names, but left out XLogRestoreCommand?
* Why the PG_TRY/PG_CATCH in check_recovery_target_time? Besides being really strangely formatted (multiline ?: inside a function?) it doesn't a) seem to be correct to ignore potential memory allocation errors by just switching back into the context that just errored out, and continue to work there b) forgo cleanup by just continuing as if nothing happened. That's unlikely to be acceptable.
* You access recovery_target_name[0] unconditionally, although it's initialized to NULL.
* Generally, ISTM there's too much logic in the assign hooks.
* Why do you include xlog_internal.h in guc.c and not xlog.h?
Ok, I think that's enough for now ;) Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] additional json functionality
On 11/18/2013 06:13 AM, Peter Eisentraut wrote: On 11/15/13, 6:15 PM, Josh Berkus wrote: Thing is, I'm not particularly concerned about *Merlin's* specific use case, which there are ways around. What I am concerned about is that we may have users who have years of data stored in JSON text fields which won't survive an upgrade to binary JSON, because we will stop allowing certain things (ordering, duplicate keys) which are currently allowed in those columns. At the very least, if we're going to have that kind of backwards compatibility break we'll want to call the new version 10.0. We could do something like SQL/XML and specify the level of validity in a typmod, e.g., json(loose), json(strict), etc. Doesn't work; with XML, the underlying storage format didn't change. With JSONB, it will ... so changing the typmod would require a total rewrite of the table. That's a POLS violation if I ever saw one. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Re: [HACKERS] New option for pg_basebackup, to specify a different directory for pg_xlog
On 2013-11-18 15:01:42 +, Haribabu kommi wrote:

/*
 * Returns a malloced string containing the current working directory.
 * The caller has to take care of freeing the memory.
 * On failure exits with error code.
 */
static char *
get_current_working_dir()
{
	char	   *buf;
	size_t		buflen;

	buflen = MAXPGPATH;
	for (;;)
	{
		buf = pg_malloc(buflen);
		if (getcwd(buf, buflen))
			break;
		else if (errno == ERANGE)
		{
			pg_free(buf);
			buflen *= 2;
			continue;
		}
		else
		{
			pg_free(buf);
			fprintf(stderr,
					_("%s: could not get current working directory: %s\n"),
					progname, strerror(errno));
			exit(1);
		}
	}

	return buf;
}

/*
 * Calculates the absolute path for the given path. The output absolute path
 * is a malloced string. The caller needs to take care of freeing the memory.
 */
static char *
get_absolute_path(const char *input_path)
{
	char	   *pwd = NULL;
	char	   *abspath = NULL;

	/* Get the present working directory */
	pwd = get_current_working_dir();

	if (chdir(input_path) < 0)
	{
		/* Directory doesn't exist */
		if (errno == ENOENT)
			return NULL;

		fprintf(stderr, _("%s: could not change directory to \"%s\": %s\n"),
				progname, input_path, strerror(errno));
		exit(1);
	}

	abspath = get_current_working_dir();

	/* Return to the old working directory */
	if (chdir(pwd) < 0)
	{
		fprintf(stderr, _("%s: could not change directory to \"%s\": %s\n"),
				progname, pwd, strerror(errno));
		exit(1);
	}

	pg_free(pwd);
	return abspath;
}

This looks like it should be replaced by moving make_absolute_path from pg_regress.c to path.[ch] and then use that. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] New option for pg_basebackup, to specify a different directory for pg_xlog
On Tue, Nov 19, 2013 at 12:01 AM, Haribabu kommi haribabu.ko...@huawei.com wrote: On 18 November 2013 18:45 Fujii Masao wrote: On Mon, Nov 18, 2013 at 6:31 PM, Haribabu kommi haribabu.ko...@huawei.com wrote: On 18 November 2013 11:19 Haribabu kommi wrote: On 17 November 2013 00:55 Fujii Masao wrote: On Sat, Nov 16, 2013 at 2:27 PM, Haribabu kommi haribabu.ko...@huawei.com wrote: on 15 November 2013 17:26 Magnus Hagander wrote: On Fri, Nov 15, 2013 at 12:10 PM, Haribabu kommi haribabu.ko...@huawei.com wrote: On 14 November 2013 23:59 Fujii Masao wrote: On Thu, Nov 14, 2013 at 9:08 PM, Haribabu kommi haribabu.ko...@huawei.com wrote: Please find attached the patch, for adding a new option for pg_basebackup, to specify a different directory for pg_xlog. Sounds good! Here are the review comments: Don't we need to prevent users from specifying the same directory in both --pgdata and --xlogdir? I feel no need to prevent it; even if the user specifies both --pgdata and --xlogdir as the same directory, all the transaction log files will be created in the base directory instead of the pg_xlog directory. Given how easy it would be to prevent that, I think we should. It would be an easy misunderstanding to specify that when you actually want it in wherever/pg_xlog. Specifying that would be redundant in the first place, but people can do that; it would also be very easy to do it by mistake, and you'd end up with something that's really bad, including a recursive symlink. Presently with initdb also the user can specify the data and xlog directories as the same. To prevent the data directory and xlog directory from being the same, there is a way on Windows (the _fullpath API) to get an absolute path from a relative path, but I didn't find any such possibility on Linux. I didn't find any other way to check it; if anyone has any idea regarding this please let me know. What about make_absolute_path() in miscinit.c? 
The make_absolute_path() function gets the current working directory and adds the relative path to the CWD; this does not give a proper absolute path. I have added a new function verify_data_and_xlog_dir_same() which changes the current working directory to the data directory and gets the CWD, and does the same for the transaction log directory. It compares the data and xlog directories and throws an error. Please check it once. Is there any other way to identify that both data and xlog directories are pointing to the same directory, instead of comparing both absolute paths? Updated patch attached in the mail. Failure when the xlogdir doesn't exist is fixed in the updated patch attached in the mail. Thanks for updating the patch! With the patch, when I specify the same directory in both -D and --xlogdir, I got the following error. $ cd /tmp $ pg_basebackup -D hoge --xlogdir=/tmp/hoge -X fetch pg_basebackup: could not change directory to hoge: No such file or directory Thanks. After finding the xlog directory absolute path, returning back to the current working directory was missed; because of this the patch was failing while finding the absolute path for the data directory. Apart from this change, the code is reorganized a bit. Updated patch is attached in the mail. Please review it once. Thanks for the newer version of the patch! I found that the empty base directory is created and remains even when the same directory is specified in both -D and --xlogdir and the error occurs. I think that it's better to throw an error in that case before creating any new directory. Thought? +xlogdir = get_absolute_path(xlog_dir); xlog_dir must be an absolute path. ISTM we don't do the above. No? Regards, -- Fujii Masao -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
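Fujii's question above — detecting that the data directory and xlogdir name the same directory without comparing textual absolute paths — can be illustrated in shell by comparing resolved physical paths. This is only a sketch, not the patch's code: the `same_dir` helper is an invented name, and a C implementation would more likely compare `st_dev`/`st_ino` from stat(2).

```shell
# Hypothetical helper: two existing paths name the same directory when their
# resolved physical paths match. `pwd -P` follows symlinks, so symlinked
# aliases are caught too. Assumes both arguments exist and are directories.
same_dir() {
    [ "$(cd "$1" 2>/dev/null && pwd -P)" = "$(cd "$2" 2>/dev/null && pwd -P)" ]
}
```

A caller could then refuse early, e.g. `same_dir "$pgdata" "$xlogdir" && echo "xlogdir must differ from the data directory" >&2`, before creating any directory — which also addresses the "empty base directory remains" complaint above.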
Re: [HACKERS] additional json functionality
Merlin, *) Aside from the text in and out routines, how is 'jsonb' different from the coming 'nested hstore'? Enough to justify two code bases? In/out functions and defaults are all different. Otherwise, the two types will be accessing the same code base, so no duplication. Think of it as XML vs. TEXT. Maybe we can cheat a little bit and overload the functions so that one of the existing APIs (hstore or json) can be recovered -- only adding what minimal functionality needs to be added to handle the type distinction (mostly on serialization routines and casts). What I'm driving at here is that it would be nice if the API was not strongly bifurcated: this would cause quite a bit of mindspace confusion. I'll also note that for functions designed for output to the client, it doesn't make much of a difference whether the result is JSON or JSONB, since the string representation will be identical. However, it makes a difference if the data is going to be stored, since a double conversion on a large JSON value would be expensive. In other words, we need a version of each function which outputs JSONB, but that version doesn't have to be the default if users don't specify. Note that this raises the issue of first alternate data type ambiguity again for overloading builtin functions. We really need that method of preferring a specific version of the function ... -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] CLUSTER FREEZE
On 11/18/2013 08:39 AM, Bruce Momjian wrote: If we do add FREEZE, all we would be doing is delaying the time when all CLUSTER operations will use FREEZE, and hence debugging will be just as difficult. My point is that with full knowledge, everyone would use FREEZE unless they expect MVCC bugs, which is going to be an almost zero population. +1 None of our users would willingly choose worse performance over the 0.1% possibility of needing to analyze a transaction failure. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review: pre-commit triggers
Andrew Dunstan and...@dunslane.net writes: Given that, do we want to keep the bar on these operating in single user mode? I can easily drop it and just document this way out of difficulty. Currently Event Triggers are disabled in single user mode, in part because operating them requires access to a catalog index, which might be corrupted and the reason why you're in single user mode in the first place. So please keep your new event trigger disabled in single user mode. http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=cd3413ec3683918c9cb9cfb39ae5b2c32f231e8b Regards, and thanks for this new Event Trigger! -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Improvement of pg_stat_statement usage about buffer hit ratio
On Mon, Nov 18, 2013 at 8:36 PM, KONDO Mitsumasa kondo.mitsum...@lab.ntt.co.jp wrote: (2013/11/18 20:16), Haribabu kommi wrote: On 18 October 2013 13:35 KONDO Mitsumasa wrote: This patch conflicts with the pg_stat_statement_min_max_exectime patch which I submitted, and the pg_stat_statement_min_max_exectime patch also adds new columns, min_time and max_time. So I'd like to change it in this opportunity. This patch adds another column, shared_blks_hit_percent, to the pg_stat_statements view, which is very beneficial for the user to know what percentage of blocks are hit. The same idea was proposed before but not committed because Itagaki thought that the pg_stat_statements view should report only raw values. Please read the following thread. I have the same feeling as him. Anyway we should listen to more opinions from other people, though. http://www.postgresql.org/message-id/20091222172719.8b86.52131...@oss.ntt.co.jp Regards, -- Fujii Masao -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] additional json functionality
On Mon, Nov 18, 2013 at 12:10 PM, Josh Berkus j...@agliodbs.com wrote: Merlin, *) Aside from the text in and out routines, how is 'jsonb' different from the coming 'nested hstore'? Enough to justify two code bases? In/out functions and defaults are all different. Otherwise, the two types will be accessing the same code base, so no duplication. Think of it as XML vs. TEXT. Maybe we can cheat a little bit and overload the functions so that one of the existing APIs (hstore or json) can be recovered -- only adding what minimal functionality needs to be added to handle the type distinction (mostly on serialization routines and casts). What I'm driving at here is that it would be nice if the API was not strongly bifurcated: this would cause quite a bit of mindspace confusion. I'll also note that for functions designed for output to the client, it doesn't make much of a difference whether the result is JSON or JSONB, since the string representation will be identical. However, it makes a difference if the data is going to be stored, since a double conversion on a large JSON value would be expensive. Hm, but it would matter, wouldn't it... the jsonb representation would give output with the record fields reordered, deduplicated, etc. Also, presumably, the jsonb serialization would be necessarily slower for exactly that reason, although perhaps not enough to matter much. In other words, we need a version of each function which outputs JSONB, but that version doesn't have to be the default if users don't specify. Note that this raises the issue of first alternate data type ambiguity again for overloading builtin functions. We really need that method of preferring a specific version of the function ... You'd need explicit jsonb-creating functions: record_to_jsonb, array_to_jsonb etc. AFAIK, these functions would be the only ones that would have to explicitly reference the jsonb type if you don't count casts. 
It's tempting to *not* make those functions as that would only require the user to specify jsonb for table columns...you'd then go from json to jsonb with a cast, perhaps even an implicit one. The disadvantage there is that you'd then get unsimplified json always. Hm -- on that note, is it technically feasible to *not* duplicate the json API implementations, but just (ab)use implicit casting between the APIs? That way the text API would own all the serialization routines as it does now but you'd run mutation and searching through jsonb... merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Data corruption issues using streaming replication on 9.0.14/9.2.5/9.3.1
Three times in the last two weeks, we have experienced data corruption on secondary servers using streaming replication on client systems. The versions involved are 9.0.14, 9.2.5, and 9.3.1. Each incident was separate; in two cases they were for the same client (9.0.14 and 9.3.1), one for a different client (9.2.5). The details of each incident are similar, but not identical. The details of each incident are: INCIDENT #1: 9.0.14 -- A new secondary (S1) was initialized using rsync off of an existing, correct primary (P1) for the base backup, and using WAL-E for WAL segment shipping. Both the primary and secondary were running 9.0.14. S1 properly connected to the primary once it was caught up on WAL segments, and S1 was then promoted as a primary using the trigger file. No errors in the log files on either system. After promotion, it was discovered that there was significant data loss on S1. Rows that were present on P1 were missing on S1, and some rows were duplicated (including duplicates that violated primary key and other unique constraints). The indexes were corrupt, in that they seemed to think that the duplicates were not duplicated, and that the missing rows were still present. Because the client's schema included a last_updated field, we were able to determine that all of the rows that were either missing or duplicated had been updated on P1 shortly (3-5 minutes) before S1 was promoted. It's possible, but not confirmed, that there were active queries (including updates) running on P1 at the moment of S1's promotion. As a note, P1 was created from another system (let's call it P0) using just WAL shipping (no streaming replication), and no data corruption was observed. P1 and S1 were both AWS instances running Ubuntu 12.04, using EBS (with xfs as the file system) as the data volume. P1 and S1 have been destroyed at this point. 
INCIDENT #2: 9.3.1 -- In order to repair the database, a pg_dump was taken of S1, after having dropped the primary and unique constraints, and restored into a new 9.3.1 server, P2. Duplicate rows were purged, and missing rows were added again. The database, a new primary, was then put back into production, and ran without incident. A new secondary, S2, was created off of the primary. This secondary was created using pg_basebackup using --xlog-method=stream, although the WAL-E archiving was still present. S2 attached to P2 without incident and no errors in the logs, but nearly-identical corruption was discovered (although this time without the duplicated rows, just missing rows). At this point, it's not clear if there was some clustering in the last_updated timestamp for the rows that are missing from S2. No duplicated rows were observed. P2 and S2 are both AWS instances running Ubuntu 12.04, using EBS (with xfs as the file system) as the data volume. No errors in the log files on either system. P2 and S2 are still operational. INCIDENT #3: 9.2.5 -- A client was migrating a large database from a 9.2.2 system (P3) to a new 9.2.5 system (S3) using streaming replication. As I personally didn't do the steps on this one, I don't have quite as much information, but the basics are close to incident #2: When S3 was promoted using the trigger file, no errors were observed and the database came up normally, but rows were missing from S3 that were present on P3. P3 is running Centos 6.3 with ext4 as the file system. S3 is running Centos 6.4 with ext3 as the file system. Log shipping in this case was done via rsync. P3 and S3 are still operational. No errors in the log files on either system. -- Obviously, we're very concerned that a bug was introduced in the latest minor release. We're happy to gather data as required to assist in diagnosing this. 
-- -- Christophe Pettus x...@thebuild.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Improvement of pg_stat_statement usage about buffer hit ratio
On Mon, Nov 18, 2013 at 10:49 AM, Fujii Masao masao.fu...@gmail.com wrote: The same idea was proposed before but not committed because Itagaki thought that pg_stat_statements view should report only raw values. Please read the following thread. I have the same feeling with him. Anyway we should listen to more opinions from other people, though. http://www.postgresql.org/message-id/20091222172719.8b86.52131...@oss.ntt.co.jp +1 from me. I think that a higher level tool needs to be built on pg_stat_statements to make things easy for those that want a slick, pre-packaged solution. As I said on the min/max thread, if we're not doing enough to help people who would like to build such a tool, we should discuss how we can do better. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Data corruption issues using streaming replication on 9.0.14/9.2.5/9.3.1
On 11/18/2013 10:58 AM, Christophe Pettus wrote: Three times in the last two weeks, we have experienced data corruption on secondary servers using streaming replication on client systems. The versions involved are 9.0.14, 9.2.5, and 9.3.1. Each incident was separate; in two cases they were for the same client (9.0.14 and 9.3.1), one for a different client (9.2.5). To emphasize a salient point: we have not previously seen data corruption with these exact symptoms prior to the recent patch release. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Data corruption issues using streaming replication on 9.0.14/9.2.5/9.3.1
On Nov 18, 2013, at 10:58 AM, Christophe Pettus x...@thebuild.com wrote: As a note, P1 was created from another system (let's call it P0) using just WAL shipping (no streaming replication), and no data corruption was observed. As another data point, P0 was running 9.0.13, rather than 9.0.14. -- -- Christophe Pettus x...@thebuild.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Data corruption issues using streaming replication on 9.0.14/9.2.5/9.3.1
Hi, On 2013-11-18 10:58:26 -0800, Christophe Pettus wrote: INCIDENT #1: 9.0.14 -- A new secondary (S1) was initialized using rsync off of an existing, correct primary (P1) for the base backup, and using WAL-E for WAL segment shipping. Both the primary and secondary were running 9.0.14. S1 properly connected to the primary once it was caught up on WAL segments, and S1 was then promoted as a primary using the trigger file. Could you detail how exactly the base backup was created? Including the *exact* logic for copying? No errors in the log files on either system. Do you have the log entries for the startup after the base backup? Because the client's schema included a last_updated field, we were able to determine that all of the rows that were either missing or duplicated had been updated on P1 shortly (3-5 minutes) before S1 was promoted. It's possible, but not confirmed, that there were active queries (including updates) running on P1 at the moment of S1's promotion. Any chance you have log_checkpoints enabled? If so, could you check whether there was a checkpoint around the time of the base backup? This server is gone, right? If not, could you do: SELECT ctid, xmin, xmax, * FROM whatever WHERE duplicate_row? INCIDENT #2: 9.3.1 -- In order to repair the database, a pg_dump was taken of S1, after having dropped the primary and unique constraints, and restored into a new 9.3.1 server, P2. Duplicate rows were purged, and missing rows were added again. The database, a new primary, was then put back into production, and ran without incident. A new secondary, S2, was created off of the primary. This secondary was created using pg_basebackup using --xlog-method=stream, although the WAL-E archiving was still present. S2 attached to P2 without incident and no errors in the logs, but nearly-identical corruption was discovered (although this time without the duplicated rows, just missing rows). 
At this point, it's not clear if there was some clustering in the last_updated timestamp for the rows that are missing from S2. No duplicated rows were observed. P2 and S2 are both AWS instances running Ubuntu 12.04, using EBS (with xfs as the file system) as the data volume. No errors in the log files on either system. Could you list the *exact* steps you did to startup the cluster? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Data corruption issues using streaming replication on 9.0.14/9.2.5/9.3.1
On Nov 18, 2013, at 11:28 AM, Andres Freund and...@2ndquadrant.com wrote: Could you detail how exactly the base backup was created? Including the *exact* logic for copying? 0. Before any of this began, P1 was archiving WAL segments to AWS-S3. 1. pg_start_backup('', true) on P1. 2. Using rsync -av on P1, the entire $PGDATA directory was pushed from P1 to S1. 3. Once the rsync was complete, pg_stop_backup() on P1. 4. Create appropriate recovery.conf on S1. 5. Bring up PostgreSQL on S1. 6. PostgreSQL recovers normally (pulling WAL segments from WAL-E), and eventually connects to P1. Do you have the log entries for the startup after the base backup? Sadly, not anymore. This server is gone, right? Correct. Could you list the *exact* steps you did to start up the cluster? 0. Before any of this began, P2 was archiving WAL segments to AWS-S3. 1. Initial (empty) data directory deleted on S2. 2. New data directory created with: /usr/lib/postgresql/9.3/bin/pg_basebackup --verbose --progress --xlog-method=stream --host=ip --user=repluser --pgdata=/data/9.3/main 3. Once the pg_basebackup completed, create appropriate recovery.conf on S2. 4. Bring up PostgreSQL on S2. 5. PostgreSQL recovers normally (pulling a small number of WAL segments from WAL-E), and eventually connects to P2. -- -- Christophe Pettus x...@thebuild.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
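The rsync-based procedure Christophe describes can be collected into one sketch. This is a reconstruction of the reported steps, not the actual commands used: hostnames, paths, users, and the WAL-E restore invocation are placeholders, and it is a config/procedure fragment that assumes a live 9.0-era cluster rather than something to run as-is.

```shell
# 1. On P1: enter backup mode (second argument forces an immediate checkpoint).
psql -h p1.example.com -U postgres -c "SELECT pg_start_backup('base', true);"

# 2. Push the entire $PGDATA directory to the standby, without exclusions
#    (as reported in the thread -- nothing was deleted before the copy).
rsync -av /var/lib/postgresql/9.0/main/ s1.example.com:/var/lib/postgresql/9.0/main/

# 3. On P1: leave backup mode.
psql -h p1.example.com -U postgres -c "SELECT pg_stop_backup();"

# 4. On S1: clear the copied pg_xlog contents and write a minimal recovery.conf
#    (restore_command and primary_conninfo, per the follow-up messages).
rm -f /var/lib/postgresql/9.0/main/pg_xlog/0*
cat > /var/lib/postgresql/9.0/main/recovery.conf <<'EOF'
standby_mode = 'on'
primary_conninfo = 'host=p1.example.com user=repluser'
restore_command = 'wal-e wal-fetch "%f" "%p"'
EOF

# 5. Start PostgreSQL on S1; it replays archived WAL, then streams from P1.
pg_ctl -D /var/lib/postgresql/9.0/main start
```

Promotion, as described, was then done via the trigger file (or `pg_ctl promote` on later releases).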
Re: [HACKERS] Data corruption issues using streaming replication on 9.0.14/9.2.5/9.3.1
On 2013-11-18 11:38:43 -0800, Christophe Pettus wrote: On Nov 18, 2013, at 11:28 AM, Andres Freund and...@2ndquadrant.com wrote: Could you detail how exactly the base backup was created? Including the *exact* logic for copying? 0. Before any of this began, P1 was archiving WAL segments to AWS-S3. 1. pg_start_backup('', true) on P1. 2. Using rsync -av on P1, the entire $PGDATA directory was pushed from P1 to S2. Without deleting any data, including pg_xlog/, backup.label, anything? Did you have hot_standby enabled on all of those machines? Even on the 9.0.13 cluster? Could you list the *exact* steps you did to startup the cluster? 0. Before any of this began, P2 was archiving WAL segments to AWS-S3. 1. Initial (empty) data directory deleted on S2. 2. New data directory created with: /usr/lib/postgresql/9.3/bin/pg_basebackup --verbose --progress --xlog-method=stream --host=ip --user=repluser --pgdata=/data/9.3/main 3. Once the pg_basebackup completed, create appropriate recovery.conf on S1. That was just recovery command and primary conninfo? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Data corruption issues using streaming replication on 9.0.14/9.2.5/9.3.1
On Nov 18, 2013, at 11:47 AM, Andres Freund and...@2ndquadrant.com wrote: Without deleting any data, including pg_xlog/, backup.label, anything? Correct. Did you have hot_standby enabled on all of those machines? Even on the 9.0.13 cluster? Yes. That was just recovery command and primary conninfo? Correct. -- -- Christophe Pettus x...@thebuild.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Data corruption issues using streaming replication on 9.0.14/9.2.5/9.3.1
On Nov 18, 2013, at 11:47 AM, Andres Freund and...@2ndquadrant.com wrote: Did you have hot_standby enabled on all of those machines? Even on the 9.0.13 cluster? Actually, it's a bit more complex than this: 1. We don't know about P0, the 9.0.13 machine. I assume it was, but it was managed elsewhere. 2. P1 never had hot_standby = 'on', as it was never intended to be a hot standby. 3. S1 did have hot_standby = 'on'. -- -- Christophe Pettus x...@thebuild.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Data corruption issues using streaming replication on 9.0.14/9.2.5/9.3.1
On Nov 18, 2013, at 11:47 AM, Andres Freund and...@2ndquadrant.com wrote: Without deleting any data, including pg_xlog/, backup.label, anything? One more correction: After rsync finished and the pg_base_backup() was issued, the contents of pg_xlog/ on S1 were deleted. -- -- Christophe Pettus x...@thebuild.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Data corruption issues using streaming replication on 9.0.14/9.2.5/9.3.1
On Nov 18, 2013, at 12:00 PM, Christophe Pettus x...@thebuild.com wrote: One more correction: After rsync finished and the pg_base_backup() was issued, the contents of pg_xlog/ on S1 were deleted. pg_stop_backup(), sorry. -- -- Christophe Pettus x...@thebuild.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] CLUSTER FREEZE
Josh Berkus j...@agliodbs.com wrote: On 11/18/2013 08:39 AM, Bruce Momjian wrote: If we do add FREEZE, all we would be doing is delaying the time when all CLUSTER operations will use FREEZE, and hence debugging will be just as difficult. My point is that will full knowledge, everyone would use FREEZE unless they expect MVCC bugs, which is going to be an almost zero population. +1 +1 None of our users would willingly choose worse performance over the 0.1% possibility of needing to analyze a transaction failure. I assume that's intended to represent the lifetime probability that a typical user would ever benefit, not per year or per transaction. Even as a lifetime number, it seems high unless we're specifically talking about hackers. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Changing pg_dump default file format
On Thu, Nov 7, 2013 at 02:40:17PM -0500, Peter Eisentraut wrote: On 11/7/13, 1:00 PM, Josh Berkus wrote: If we wanted to change the defaults, I think it would be easier to create a separate bin name (e.g. pg_backup) than to change the existing parameters for pg_dump. Note the following code in pg_dump.c: /* Set default options based on progname */ if (strcmp(progname, pg_backup) == 0) format = c; Wow, when was that added? git blame says 2002: 9f0ae0c8 (Tom Lane 2002-05-10 22:36:27 + 387) /* Set default options based on progname */ 9f0ae0c8 (Tom Lane 2002-05-10 22:36:27 + 388) if (strcmp(progname, pg_backup) == 0) 9f0ae0c8 (Tom Lane 2002-05-10 22:36:27 + 389) format = c; However, git log 9f0ae0c82060e3dcd1fa7ac8bbe35a3f9a44dbba does not show that line being added in the diff. -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
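Bruce's puzzle — blame attributing a line to a commit whose own diff doesn't show it — typically means the line was moved rather than written there, and the pickaxe can chase that: `git log -S` lists only commits that change the number of occurrences of a string. A self-contained demo with an invented throwaway repository (file names and messages here are made up, not the PostgreSQL history):

```shell
# Build a throwaway repo and locate the commit that introduced a string
# using the pickaxe (-S).
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git -c user.email=t@example.com -c user.name=t commit -q --allow-empty -m base
printf 'format = c;\n' > pg_dump.c
git add pg_dump.c
git -c user.email=t@example.com -c user.name=t commit -q -m 'add default format'
# -S reports only the commits that add or remove the string:
git log -S'format = c' --format=%s -- pg_dump.c
```

Against the real tree, something like `git log -S'pg_backup' -- src/bin/pg_dump/pg_dump.c` (path assumed) would find where the check truly originated, and `git blame -C` follows code moved between files.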
Re: [HACKERS] GIN improvements part2: fast scan
On Fri, Nov 15, 2013 at 11:42 PM, Alexander Korotkov aekorot...@gmail.com wrote: On Fri, Nov 15, 2013 at 11:39 PM, Rod Taylor rod.tay...@gmail.com wrote: On Fri, Nov 15, 2013 at 2:26 PM, Alexander Korotkov aekorot...@gmail.com wrote: On Fri, Nov 15, 2013 at 11:18 PM, Rod Taylor rod.tay...@gmail.com wrote: 2%. It's essentially sentence fragments from 1 to 5 words in length. I wasn't expecting it to be much smaller. 10 recent value selections: white vinegar reduce color running vinegar cure uti cane vinegar acidity depends parameter how remedy fir clogged shower use vinegar sensitive skin home remedies removing rust heating does non raw apple cider home remedies help maintain healthy can vinegar mess up your apple cide vineger ph balance I didn't get why it's not significantly smaller. Is it possible to share a dump? Sorry, I reported that incorrectly. It's not something I was actually looking for and didn't pay much attention to at the time. The patched index is 58% of the 9.4 master size: 212 MB instead of 365 MB. Good. That meets my expectations :) You mention that both master and patched versions were compiled with debug. Was cassert enabled? In the attached version of the patch the tsvector opclass is enabled to use fastscan. You can retry your tests. -- With best regards, Alexander Korotkov. gin-fast-scan.7.patch.gz Description: GNU Zip compressed data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Suggestion: Issue warning when calling SET TRANSACTION outside transaction block
On Sat, Nov 9, 2013 at 02:15:52PM -0500, Robert Haas wrote: On Fri, Nov 8, 2013 at 5:36 PM, Tom Lane t...@sss.pgh.pa.us wrote: [ I'm so far behind ... ] Bruce Momjian br...@momjian.us writes: Applied. Thank you for all your suggestions. I thought the suggestion had been to issue a *warning*. How did that become an error? This patch seems likely to break applications that may have just been harmlessly sloppy about when they were issuing SETs and/or what flavor of SET they use. We don't for example throw an error for START TRANSACTION with an open transaction or COMMIT or ROLLBACK without one --- how can it possibly be argued that these operations are more dangerous than those cases? I'd personally have voted for using NOTICE. Well, LOCK TABLE throws an error, so it's not without precedent. Yeah, I just copied the LOCK TABLE usage. You could argue that SET SESSION ISOLATION only affects later commands and doesn't do something like LOCK, so it should be a warning. Not sure how to interpret the COMMIT/ROLLBACK behavior you mentioned. Considering we are doing this outside of a transaction, a WARNING or ERROR is pretty much the same from a behavioral perspective. Should we change this and LOCK to be a warning? -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Data corruption issues using streaming replication on 9.0.14/9.2.5/9.3.1
On 2013-11-18 10:58:26 -0800, Christophe Pettus wrote: After promotion, it was discovered that there was significant data loss on S1. Rows that were present on P1 were missing on S1, and some rows were duplicated (including duplicates that violated primary key and other unique constraints). The indexes were corrupt, in that they seemed to think that the duplicates were not duplicated, and that the missing rows were still present. Were there any kind of patterns in the lost data? What kind of workload are they running? I have an idea what the issue might be... Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] unaccent module - two params function should be immutable
On Fri, Nov 8, 2013 at 06:00:53PM -0500, Tom Lane wrote: Bruce Momjian br...@momjian.us writes: [ mark unaccent functions immutable ] Applied. This patch is flat out wrong and needs to be reverted. The functions were correctly marked (by you!) in commit c0577c92a84cc477a88fe6868c16c4a7e3348b11 on the basis of the discussion of bug #5781, http://www.postgresql.org/message-id/201012021544.ob2fitn1041...@wwwmaster.postgresql.org which was a request exactly like this one and was denied for good and sufficient reasons. There was absolutely no reasoning given in this thread that explained why we should ignore the previous objections. In particular, marking the single-argument version of unaccent() as immutable is the height of folly because its behavior depends on the setting of search_path. Changing the two-argument function is maybe a bit more debatable, but that's not what you did. If we were going to change the behavior, this patch would still be wrong because it fails to provide an upgrade path. The objections saying you needed a 1.1 migration script were completely correct. Thanks, patch reverted. If people still want this, it needs to be resubmitted with this feedback in mind. -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Data corruption issues using streaming replication on 9.0.14/9.2.5/9.3.1
On Nov 18, 2013, at 12:57 PM, Andres Freund and...@2ndquadrant.com wrote: Were there any kind of patterns in the lost data? What kind of workload are they running? I have an idea what the issue might be... On the P1 S1 case, the data corrupted was data modified in the last few minutes before the switchover. I don't want to over-analyze, but it was within the checkpoint_timeout value for that server. On the P2 S2 case, it's less obvious what the pattern is, since there was no cutover. Insufficient information on the P3 S3 case. Each of them is a reasonably high-volume OLTP-style workload. The P1/P2 client has a very high level of writes; the P3 is more read-heavy, but still has a fair number of writes. -- Christophe Pettus x...@thebuild.com
Re: [HACKERS] Data corruption issues using streaming replication on 9.0.14/9.2.5/9.3.1
Hi, Afaics it's likely a combination/interaction of bugs and fixes between: * the initial HS code * 5a031a5556ff83b8a9646892715d7fef415b83c3 * f44eedc3f0f347a856eea8590730769125964597 But that'd mean nobody noticed it during 9.3's beta... Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Data corruption issues using streaming replication on 9.0.14/9.2.5/9.3.1
Great! If there's any further data I can supply to help, let me know. On Nov 18, 2013, at 2:15 PM, Andres Freund and...@2ndquadrant.com wrote: Hi, Afaics it's likely a combination/interaction of bugs and fixes between: * the initial HS code * 5a031a5556ff83b8a9646892715d7fef415b83c3 * f44eedc3f0f347a856eea8590730769125964597 But that'd mean nobody noticed it during 9.3's beta... Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Christophe Pettus x...@thebuild.com
Re: [HACKERS] Data corruption issues using streaming replication on 9.0.14/9.2.5/9.3.1
On 2013-11-18 14:25:58 -0800, Christophe Pettus wrote: Great! If there's any further data I can supply to help, let me know. Trying to reproduce the issue with and without hot_standby=on would be very helpful, but I guess that's time consuming? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Data corruption issues using streaming replication on 9.0.14/9.2.5/9.3.1
On Nov 18, 2013, at 2:26 PM, Andres Freund and...@2ndquadrant.com wrote: Trying to reproduce the issue with and without hot_standby=on would be very helpful, but I guess that's time consuming? I've been working on it, but I haven't gotten it to fail yet. I'll keep at it. -- Christophe Pettus x...@thebuild.com
[HACKERS] Review: HStore Gin Speedup
Following are my initial comments for https://commitfest.postgresql.org/action/patch_view?id=1203 1. It no longer applies, probably due to commit a18167510f4c385329697588ce5132cbf95779c3 error: contrib/hstore/hstore--1.1.sql: No such file or directory 2. Compatibility with HStore v2.0 (https://commitfest.postgresql.org/action/patch_view?id=1289) As long as the separate operator class gin_hstore_combined_ops is used, there should be no conflict. I'll verify as soon as the $subject patch is applicable to master. 3. Regression tests Shouldn't EXPLAIN be there too? I suggest a test that creates 2 indexes on the hstore column - one using the existing GIN (default) opclass and one using the new one (combined). The execution plan will then show if an index is used and if it's the correct one. // Antonin Houska (Tony)
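A sketch of the regression test suggested in point 3 might look like the following (table and index names here are hypothetical; only the opclass name gin_hstore_combined_ops comes from the patch under review):

```sql
-- Hypothetical test: two GIN indexes on the same hstore column, one with
-- the default opclass and one with the patch's combined opclass.
CREATE TABLE testhstore_combined (h hstore);
CREATE INDEX hidx_default  ON testhstore_combined USING gin (h);
CREATE INDEX hidx_combined ON testhstore_combined USING gin (h gin_hstore_combined_ops);

SET enable_seqscan = off;
-- COSTS OFF keeps the expected output stable across machines; the plan
-- line names whichever index the planner actually picked.
EXPLAIN (COSTS OFF) SELECT * FROM testhstore_combined WHERE h ? 'key';
```

Whether the planner prefers the combined opclass here would itself be part of what the test verifies.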
Re: Custom Scan APIs (Re: [HACKERS] Custom Plan node)
On Mon, Nov 18, 2013 at 7:25 AM, Kohei KaiGai kai...@kaigai.gr.jp wrote: The attached patches are the revised custom-scan APIs. My initial review on this feature: - The patches apply and build, but produce a warning: ctidscan.c: In function ‘CTidInitCustomScanPlan’: ctidscan.c:362:9: warning: unused variable ‘scan_relid’ [-Wunused-variable] I'd recommend that you split the part1 patch containing the ctidscan contrib into its own patch. It is more than half of the patch and it certainly stands on its own. IMO, I think ctidscan fits a very specific use case and would be better off being an extension instead of in contrib. - Custom-scan.sgml was added to introduce the way to write a custom-scan provider in the official documentation. - Much code duplication in postgres_fdw.c was eliminated. I split some fdw handlers into two parts: a common portion and an fdw-specific one. The executor callbacks of the custom-scan code utilize the common portion above because most of the implementations are equivalent. I'd like to see comments regarding the way to handle a Var reference onto a custom-scan that replaced a relations join. The varno of a Var that references a join relation is the rtindex of either the right or left relation, and setrefs.c adjusts it well; INNER_VAR or OUTER_VAR shall be set instead. However, it does not work well if a custom-scan that just references the result of a remote join query was chosen instead of a local join, because its result is usually set in the ps_ResultTupleSlot of PlanState, thus ExecEvalScalarVar references neither the inner nor the outer slot. Instead of the existing solution, I added one more special varno, CUSTOM_VARNO, that just references the result-tuple-slot of the target relation. If CUSTOM_VARNO is given, EXPLAIN(verbose) generates column names from the TupleDesc of the underlying ps_ResultTupleSlot. I'm not 100% certain whether it is the best approach for us, but it works well.
Also, I'm uncertain about the usage of param_info in the Path structure, even though I followed the manner of other portions. So, please point out if my usage is not appropriate. Thanks, 2013/11/11 Kohei KaiGai kai...@kaigai.gr.jp: Hi, I tried to write up a wiki page to introduce how custom-scan works. https://wiki.postgresql.org/wiki/CustomScanAPI Any comments please. 2013/11/6 Kohei KaiGai kai...@kaigai.gr.jp: The attached patches provide a feature to implement a custom scan node that allows an extension to replace a part of the plan tree with its own code instead of the built-in logic. In addition to the previous proposition, it enables us to integrate a custom scan as one of the candidate paths to be chosen by the optimizer. Here are two patches. The first one (pgsql-v9.4-custom-scan-apis) offers a set of API stuff and a simple demonstration module that implements a regular table scan using an inequality operator on the ctid system column. The second one (pgsql-v9.4-custom-scan-remote-join) enhances postgres_fdw to support remote join capability. Below is an example to show how custom-scan works. We usually run a sequential scan even if the clause has an inequality operator that references the ctid system column. postgres=# EXPLAIN SELECT ctid,* FROM t1 WHERE ctid < '(10,0)'::tid; QUERY PLAN Seq Scan on t1 (cost=0.00..209.00 rows= width=43) Filter: (ctid < '(10,0)'::tid) (2 rows) An extension that performs as a custom-scan provider suggests an alternative path, and its cost is less than the sequential scan, thus the optimizer chooses it. postgres=# LOAD 'ctidscan'; LOAD postgres=# EXPLAIN SELECT ctid,* FROM t1 WHERE ctid < '(10,0)'::tid; QUERY PLAN -- Custom Scan (ctidscan) on t1 (cost=0.00..100.00 rows= width=43) Filter: (ctid < '(10,0)'::tid) (2 rows) Of course, a more cost-effective plan will win if one exists.
postgres=# EXPLAIN SELECT ctid,* FROM t1 WHERE ctid < '(10,0)'::tid AND a = 200; QUERY PLAN --- Index Scan using t1_pkey on t1 (cost=0.29..8.30 rows=1 width=43) Index Cond: (a = 200) Filter: (ctid < '(10,0)'::tid) (3 rows) One other worthwhile example is the remote-join enhancement on postgres_fdw as follows. Both ft1 and ft2 are foreign tables being managed by the same foreign server. postgres=# EXPLAIN (verbose) SELECT * FROM ft1 JOIN ft2 ON a = x WHERE f_leak(b) AND y like '%aaa%'; QUERY PLAN -- Custom Scan (postgres-fdw) (cost=100.00..100.01 rows=0 width=72) Output: a, b, x, y Filter:
Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE
On Mon, Nov 18, 2013 at 6:44 AM, Heikki Linnakangas hlinnakan...@vmware.com wrote: I think it's important to recap the design goals of this. Seems reasonable to list them out. * It should be usable and perform well for both large batch updates and small transactions. I think that that's a secondary goal, a question to be considered but perhaps deferred during this initial effort. I agree that it certainly is important. * It should perform well both when there are no duplicates, and when there are lots of duplicates I think this is very important. And from that follows some finer requirements: * Performance when there are no duplicates should be close to raw INSERT performance. * Performance when all rows are duplicates should be close to raw UPDATE performance. * We should not leave behind large numbers of dead tuples in either case. I agree with all that. Anything else I'm missing? I think so, yes. I'll add: * Should not deadlock unreasonably. If the UPDATE case is to work and perform almost as well as a regular UPDATE, that must mean that it has essentially the same characteristics as plain UPDATE. In particular, I feel fairly strongly that it is not okay for upserts to deadlock with each other unless the possibility of each transaction locking multiple rows (in an inconsistent order) exists. I don't want to repeat the mistakes of MySQL here. This is a point that I stressed to Robert on a previous occasion [1]. It's why value locks and row locks cannot be held at the same time. Incidentally, that implies that all alternative schemes involving bloat will bloat once per attempt, I believe. I'll also add: * Should anticipate a day when Postgres needs plumbing for SQL MERGE, which is still something we want, particularly for batch operations. 
I realize that the standard doesn't strictly require MERGE to handle the concurrency issues, but even still I don't think that an implementation that doesn't is practicable - does such an implementation currently exist in any other system? What about exclusion constraints? I'd like to see this work for them as well. Currently, exclusion constraints are checked after the tuple is inserted, and you abort if the constraint was violated. We could still insert the heap and index tuples first, but instead of aborting on violation, we would kill the heap tuple we already inserted and retry. There are some complications there, like how to wake up any other backends that are waiting to grab a lock on the tuple we just killed, but it seems doable. I agree that it's at least doable. That would, however, perform badly and leave garbage behind if there are duplicates. A refinement of that would be to first check for constraint violations, then insert the tuple, and then check again. That would avoid the garbage in most cases, but would perform much more poorly when there are no duplicates, because it needs two index scans for every insertion. A further refinement would be to keep track of how many duplicates there have been recently, and switch between the two strategies based on that. Seems like an awful lot of additional mechanism. That cost of doing two scans could be alleviated by using markpos/restrpos to do the second scan. That is presumably cheaper than starting a whole new scan with the same key. (markpos/restrpos don't currently work for non-MVCC snapshots, so that'd need to be fixed, though) Well, it seems like we could already use a pick up where you left off mechanism in the case of regular btree index tuple insertions into unique indexes -- after all, we don't do that in the event of blocking pending the outcome of the other transaction (that inserted a duplicate that we need to conclusively know has or has not committed) today. 
The fact that this doesn't already exist leaves me less than optimistic about the prospect of making it work to facilitate a scheme such as the one you describe here. (Today we still need to catch a committed version of the tuple that would make our tuple a duplicate from a fresh index scan, only *after* waiting for a transaction to commit/abort at the end of our original index scan). So we're already pretty naive about this, even though it would pay to not be. Making something like markpos work for the purposes of an upsert implementation seems not only hard, but also like a possible modularity violation. Are we not unreasonably constraining the implementation going forward? My patch respects the integrity of the am abstraction, and doesn't really add any knowledge to the core system about how amcanunique index methods might go about implementing the new amlock method. The core system worries a bit about the low level locks (as it naively refers to value locks), and doesn't consider that it has the right to hold on to them for more than an instant, but that's about it. Plus we don't have to worry about whether something does or does not work for a certain snapshot type with my approach,
Re: [HACKERS] GIN improvements part2: fast scan
On Fri, Nov 15, 2013 at 2:42 PM, Alexander Korotkov aekorot...@gmail.com wrote: On Fri, Nov 15, 2013 at 11:39 PM, Rod Taylor rod.tay...@gmail.com wrote: The patched index is 58% of the 9.4 master size. 212 MB instead of 365 MB. Good. That meets my expectations :) You mention that both master and patched versions were compiled with debug. Was cassert enabled? Just debug. I try not to do performance tests with assertions on. Patch 7 gives the results I was looking for on this small sampling of data. gin-fast-scan.6.patch/9.4 master performance = the: 1147.413 ms 1159.360 ms 1122.549 ms and: 1035.540 ms 999.514 ms 1003.042 ms hotel: 57.670 ms 61.152 ms 58.862 ms and the hotel: 266.121 ms 256.711 ms 267.011 ms hotel and the: 260.213 ms 254.055 ms 255.611 ms gin-fast-scan.7.patch = the: 1091.735 ms 1068.909 ms 1076.474 ms and: 985.690 ms 972.833 ms 948.286 ms hotel: 60.756 ms 59.028 ms 57.836 ms and the hotel: 50.391 ms 38.715 ms 46.168 ms hotel and the: 45.395 ms 40.880 ms 43.978 ms Thanks, Rod
[HACKERS] Re: Suggestion: Issue warning when calling SET TRANSACTION outside transaction block
Bruce Momjian wrote Considering we are doing this outside of a transaction, and WARNING or ERROR is pretty much the same, from a behavioral perspective. Should we change this and LOCK to be a warning? From the calling application's perspective an error and a warning are definitely behaviorally different. For this I'd vote for a warning (haven't pondered the LOCK scenario) as using SET out of context means the user has a fairly serious mis-understanding of the code path they have written (accidentally or otherwise). Notice makes sense (speaking generally and without much research here) for stuff where the ultimate outcome matches the statement but the statement itself didn't actually do anything. Auto-sequence and index generation fell into this but even notice was too noisy. In this case we'd expect that the no-op statement was issued in error and thus should be changed, making a warning the level of incorrect-ness to communicate. A notice would be more appropriate if there were valid use-cases for the user doing this and we just want to make sure they are conscious of the unusualness of the situation. I dislike error for backward compatibility reasons. And saving the user from this kind of mistake doesn't warrant breaking what could be properly functioning code. Just because PostgreSQL isn't in a transaction does not mean the client is expecting the current code to work correctly - even if by accident - as part of a sequence of queries. David J. -- View this message in context: http://postgresql.1045698.n5.nabble.com/Suggestion-Issue-warning-when-calling-SET-TRANSACTION-outside-transaction-block-tp5743139p5778994.html Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.
[HACKERS] More legacy code: pg_ctl
Folks, Speaking of legacy code with bad default behaviors: pg_ctl. The current utility is designed to faithfully reproduce the rather hackish shell script we originally had for postgres startup. This results in a couple of unintuitive defaults which give sysadmins and config management developers headaches: a) by default, it returns to the caller without waiting for postgres to actually start/stop/restart. In this mode, it also always returns success regardless of result. b) by default, it remains attached to the caller's tty for stdout and stderr, even after it has switched to the regular postgres log. Yes, one can work around both of these with -w and -l, but the only reason those are non-default settings is that's the way the 7.X era shell script behaved. So at this point we're preserving unintuitive default behavior in order to be backwards compatible with a 1999-era shell script. I don't know if the answer is to rename the utility like we're discussing with pg_dump/pg_backup, or something else. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Re: [HACKERS] More legacy code: pg_ctl
On 11/18/2013 05:09 PM, Josh Berkus wrote: Folks, Speaking of legacy code with bad default behaviors: pg_ctl. The current utility is designed to faithfully reproduce the rather hackish shell script we originally had for postgres startup. This results in a couple of unintuitive defaults which give sysadmins and config management developers headaches: a) by default, it returns to the caller without waiting for postgres to actually start/stop/restart. In this mode, it also always returns success regardless of result. b) by default, it remains attached to the caller's tty for stdout and stderr, even after it has switched to the regular postgres log. Oh, and one more: c) that stop defaults to smart mode, instead of fast mode. Yes, one can work around both of these with -w and -l, but the only reason those are non-default settings is that's the way the 7.X era shell script behaved. So at this point we're preserving unintuitive default behavior in order to be backwards compatible with a 1999-era shell script. I don't know if the answer is to rename the utility like we're discussing with pg_dump/pg_backup, or something else. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
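For reference, the workarounds being discussed look like this in practice (the data and log paths are placeholders; -w, -l, and -m are existing pg_ctl options):

```shell
# Wait for startup to actually complete and report real success/failure (-w),
# and send stdout/stderr to a log file (-l) instead of holding the caller's tty.
pg_ctl -D /path/to/data -w -l /path/to/logfile start

# Use fast shutdown instead of the default "smart" mode (point c above).
pg_ctl -D /path/to/data -w -m fast stop
```

This is exactly the boilerplate every init script and config management tool ends up carrying, which is the complaint about the defaults.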
Re: [HACKERS] [PATCH] configure: allow adding a custom string to PG_VERSION
On Tue, 2013-11-05 at 18:29 +0200, Oskari Saarenmaa wrote: This can be used to tag custom built packages with an extra version string such as the git describe id or distribution package release version. I think this is a reasonable feature, and the implementation is OK, but I don't see why the format of the extra version information needs to be exactly PG_VERSION=$PACKAGE_VERSION ($withval) I'd imagine, for example, that someone will want to do -something or +something. So I'd just make this PG_VERSION=$PACKAGE_VERSION$withval Comments? +# Allow adding a custom string to PG_VERSION +PGAC_ARG_REQ(with, extra-version, [NAME], [extra version information], +[PG_VERSION=$PACKAGE_VERSION ($withval)], [PG_VERSION=$PACKAGE_VERSION]) This could be indented better. It was a bit confusing at first.
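To illustrate the difference Peter is pointing at (version strings below are made-up examples): with the submitted format the parentheses and space are forced, while with the proposed format the packager supplies whatever separator they want.

```shell
# Submitted patch: PG_VERSION=$PACKAGE_VERSION ($withval)
./configure --with-extra-version="Debian 9.4.0-1"
# -> version reported as: 9.4.0 (Debian 9.4.0-1)

# Proposed change: PG_VERSION=$PACKAGE_VERSION$withval
./configure --with-extra-version="-git20131119"
# -> version reported as: 9.4.0-git20131119
```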
Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE
On Mon, Nov 18, 2013 at 4:37 PM, Peter Geoghegan p...@heroku.com wrote: You're right that the value locking is scary. I think we need to very carefully consider it, once I have buy-in on the basic approach. I really do think it's the least-worst approach described to date. It isn't like we can't discuss making it inherently less scary, but I hesitate to do that now, given that I don't know if that discussion will go anywhere. One possible compromise would be promise tuples where we know we'll be able to keep our promise. In other words: 1. We lock values in the first phase, in more or less the manner of the extant patch. 2. When a consensus exists that heap tuple insertion proceeds, we proceed with insertion of these promise index tuples (and probably keep just a pin on the relevant pages). 3. Proceed with insertion of the heap tuple (with no value locks of any kind held). 4. Go back to the unique indexes, update the heap tid and unset the index tuple flag (that indicates that the tuples are in this promise state). Probably we can even be bright about re-finding the existing promise tuples with their proper heap tid (e.g. maybe we can avoid doing a regular index scan at least some of the time - chances are pretty good that the index tuple is on the same page as before, so it's generally well worth a shot looking there first). As with the earlier promise tuple proposals, we store our xid in the ItemPointer. 5. Finally, insertion of non-unique index tuples occurs in the regular manner. Obviously the big advantage here is that we don't have to worry about value locking across heap tuple insertion at all, and yet we don't have to worry about bloating, because we really do know that insertion proper will proceed when inserting *this* type of promise index tuple. Maybe that even makes it okay to just use buffer locks, if we think some more about the other edge cases.
Regular index scans take the aforementioned flag as a kind of visibility hint, perhaps, so we don't have to worry about them. And VACUUM would kill any dead promise tuples - this would be much less of a concern than with the earlier promise tuple proposals, because it is extremely non routine. Maybe it's fine to not make autovacuum concerned about a whole new class of (index-only) bloat, which seemed like a big problem with those earlier proposals, simply because crashes within this tiny window are hopefully so rare that it couldn't possibly amount to much bloat in the grand scheme of things (at least before a routine VACUUM - UPDATEs tend to necessitate those). If you have 50 upserting backends in this tiny window during a crash, that would be only 50 dead index tuples. Given the window is so tiny, I doubt it would be much of a problem at all - even 50 seems like a very high number. The track_counts counts that drive autovacuum here are already not crash safe, so I see no regression. Now, you still have to value lock across multiple btree unique indexes, and I understand there are reservations about this. But the surface area is made significantly smaller at reasonably low cost. Furthermore, doing TOASTing out-of-line and so on ceases to be necessary. The LOCK FOR UPDATE case is the same as before. Nothing else changes. FWIW, without presuming anything about value locking implementation, I'm not too worried about making the implementation scale to very large numbers of unique indexes, with very low shared_buffer settings. We already have a fairly similar situation with max_locks_per_transaction and so on, no? -- Peter Geoghegan
Re: [HACKERS] Improvement of pg_stat_statement usage about buffer hit ratio
(2013/11/19 3:56), Peter Geoghegan wrote: On Mon, Nov 18, 2013 at 10:49 AM, Fujii Masao masao.fu...@gmail.com wrote: The same idea was proposed before but not committed because Itagaki thought that the pg_stat_statements view should report only raw values. Please read the following thread. I have the same feeling as him. Anyway we should listen to more opinions from other people, though. http://www.postgresql.org/message-id/20091222172719.8b86.52131...@oss.ntt.co.jp I read that discussion between Itagaki-san and Mr Cerdic. He said that raw values are just simple. However, were his own changes just simple? I cannot understand his aesthetic sense, or yours either :-( PG8.4 before: 012 CREATE FUNCTION pg_stat_statements( 013 OUT userid oid, 014 OUT dbid oid, 015 OUT query text, 016 OUT calls int8, 017 OUT total_time float8, 018 OUT rows int8 019 ) PG9.0 after: 012 CREATE FUNCTION pg_stat_statements( 013 OUT userid oid, 014 OUT dbid oid, 015 OUT query text, 016 OUT calls int8, 017 OUT total_time float8, 018 OUT rows int8, 019 OUT shared_blks_hit int8, 020 OUT shared_blks_read int8, 021 OUT shared_blks_written int8, 022 OUT local_blks_hit int8, 023 OUT local_blks_read int8, 024 OUT local_blks_written int8, 025 OUT temp_blks_read int8, 026 OUT temp_blks_written int8 027 ) It's too complicated - do you know how to tune PG from the local_* and temp_* information? At least, I think most users cannot tune from this information, and only part of it might be useful. +1 from me. I think that a higher level tool needs to be built on pg_stat_statements to make things easy for those that want a slick, pre-packaged solution. No. It's not only for geek tools and people with pre-packaged solutions in big companies, but also for common DBA tools. By the way, MySQL and Oracle, which are very popular databases, have these statistics.
I think that your argument might discourage people who want to migrate from those databases, and will only make those databases more popular. As I said on the min/max thread, if we're not doing enough to help people who would like to build such a tool, we should discuss how we can do better. Could you tell me how to get min/max statistics at low cost? I'm not sure about the details of your patch in the CF, but it seems very high cost. To repeat myself, I think that if we want drastically more detailed statistics, we have to create a separate statistics tool. Your patch would be such a tool. However, pg_stat_statements should stay lightweight. Regards, -- Mitsumasa KONDO NTT Open Source Software Center
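For what it's worth, the buffer hit ratio being debated in this thread can already be derived from the raw columns the view exposes today; a sketch of such a query (not part of any patch) might be:

```sql
-- Derive a per-statement buffer hit ratio from pg_stat_statements'
-- raw counters; NULLIF avoids division by zero for statements that
-- touched no shared buffers.
SELECT query,
       calls,
       shared_blks_hit,
       shared_blks_read,
       round(100.0 * shared_blks_hit /
             NULLIF(shared_blks_hit + shared_blks_read, 0), 2) AS hit_pct
FROM pg_stat_statements
ORDER BY shared_blks_read DESC
LIMIT 20;
```

This is essentially the "higher level tool on top of raw values" position: the ratio is computable client-side, so the question is whether the view itself should also precompute it.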
Re: [HACKERS] Re: Suggestion: Issue warning when calling SET TRANSACTION outside transaction block
On Mon, Nov 18, 2013 at 05:05:45PM -0800, David Johnston wrote: Bruce Momjian wrote Considering we are doing this outside of a transaction, and WARNING or ERROR is pretty much the same, from a behavioral perspective. Should we change this and LOCK to be a warning? From the calling application's perspective an error and a warning are definitely behaviorally different. For this I'd vote for a warning (haven't pondered the LOCK scenario) as using SET out of context means the user has a fairly serious mis-understanding of the code path they have written (accedentially or otherwise). Notice makes sense (speaking generally and without much research here) for stuff where the ultimate outcome matches the statement but the statement itself didn't actually do anything. Auto-sequence and index generation fell into this but even notice was too noisy. In this case we'd expect that the no-op statement was issued in error and thus should be changed making a warning the level of incorrect-ness to communicate. A notice would be more appropriate if there were valid use-cases for the user doing this and we just want to make sure they are conscious of the unusualness of the situation. I dislike error for backward compatibility reasons. And saving the user from this kind of mistake doesn't warrant breaking what could be properly functioning code. Just because PostgreSQL isn't in a transaction does not mean the client is expecting the current code to work correctly - even if by accident - as part of a sequence of queries. Well, ERROR is what LOCK returns, so if we change SET TRANSACTION to be WARNING, we should change LOCK too, so on backward-compatibility grounds, ERROR makes more sense. Personally, I am fine with changing them all to WARNING. -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. 
+
Re: [HACKERS] [PATCH] configure: allow adding a custom string to PG_VERSION
On Tue, Nov 19, 2013 at 10:48 AM, Peter Eisentraut pete...@gmx.net wrote: I think this is a reasonable feature, and the implementation is OK, but I don't see why the format of the extra version information needs to be exactly PG_VERSION=$PACKAGE_VERSION ($withval) I'd imagine, for example, that someone will want to do -something or +something. So I'd just make this PG_VERSION=$PACKAGE_VERSION$withval Comments? It makes sense and brings more flexibility. So +1 for this modification. -- Michael
[HACKERS] Re: Suggestion: Issue warning when calling SET TRANSACTION outside transaction block
Bruce Momjian wrote On Mon, Nov 18, 2013 at 05:05:45PM -0800, David Johnston wrote: Bruce Momjian wrote Considering we are doing this outside of a transaction, and WARNING or ERROR is pretty much the same, from a behavioral perspective. Should we change this and LOCK to be a warning? From the calling application's perspective an error and a warning are definitely behaviorally different. For this I'd vote for a warning (haven't pondered the LOCK scenario) as using SET out of context means the user has a fairly serious mis-understanding of the code path they have written (accedentially or otherwise). Notice makes sense (speaking generally and without much research here) for stuff where the ultimate outcome matches the statement but the statement itself didn't actually do anything. Auto-sequence and index generation fell into this but even notice was too noisy. In this case we'd expect that the no-op statement was issued in error and thus should be changed making a warning the level of incorrect-ness to communicate. A notice would be more appropriate if there were valid use-cases for the user doing this and we just want to make sure they are conscious of the unusualness of the situation. I dislike error for backward compatibility reasons. And saving the user from this kind of mistake doesn't warrant breaking what could be properly functioning code. Just because PostgreSQL isn't in a transaction does not mean the client is expecting the current code to work correctly - even if by accident - as part of a sequence of queries. Well, ERROR is what LOCK returns, so if we change SET TRANSACTION to be WARNING, we should change LOCK too, so on backward-compatibility grounds, ERROR makes more sense. Personally, I am fine with changing them all to WARNING. Error makes more sense if the goal is internal consistency. That goal should be subservient to backward compatibility. 
Changing LOCK to a warning is less problematic, since it does not seem probable that current code functions in such a way that, after upgrade, it would begin working differently in the absence of an error; basically, someone would have to be trapping on the error and conditionally branching their logic. That said, if this were a day-0 decision I'd likely raise an error. Weakening LOCK doesn't make sense since erroring is its day-0 behavior. Document the warning for SET as being weaker than ideal because of backward compatibility and call it a day (i.e. leave LOCK at error). The documentation, not the code, then enforces the feeling that such usage is considered wrong, without possibly breaking wrong-but-working code. David J.
Re: [HACKERS] Improvement of pg_stat_statement usage about buffer hit ratio
(2013/11/19 11:12), KONDO Mitsumasa wrote: (2013/11/19 3:56), Peter Geoghegan wrote: On Mon, Nov 18, 2013 at 10:49 AM, Fujii Masao masao.fu...@gmail.com wrote: The same idea was proposed before but not committed, because Itagaki thought that the pg_stat_statements view should report only raw values. Please read the following thread. I have the same feeling as him. Anyway, we should listen to more opinions from other people. http://www.postgresql.org/message-id/20091222172719.8b86.52131...@oss.ntt.co.jp

I went over the discussion between Itagaki-san and Mr. Cerdic. He said that raw values are simply better. But were his changes really that simple? I cannot understand his aesthetic sense, nor yours :-(

PG 8.4 and before:

CREATE FUNCTION pg_stat_statements(
    OUT userid oid,
    OUT dbid oid,
    OUT query text,
    OUT calls int8,
    OUT total_time float8,
    OUT rows int8
)

PG 9.0 and after:

CREATE FUNCTION pg_stat_statements(
    OUT userid oid,
    OUT dbid oid,
    OUT query text,
    OUT calls int8,
    OUT total_time float8,
    OUT rows int8,
    OUT shared_blks_hit int8,
    OUT shared_blks_read int8,
    OUT shared_blks_written int8,
    OUT local_blks_hit int8,
    OUT local_blks_read int8,
    OUT local_blks_written int8,
    OUT temp_blks_read int8,
    OUT temp_blks_written int8
)

That is already complicated, and do you know how to tune PG from the local_* and temp_* information? I think most users cannot tune from it, and only part of it may be useful.

+1 from me. I think that a higher-level tool needs to be built on pg_stat_statements to make things easy for those that want a slick, pre-packaged solution.

No. It's not only for geek tools and people with pre-packaged solutions at big companies, but also for common DBA tools. By the way, MySQL and Oracle, which are very popular databases, have these statistics.
I think that your argument might discourage people who want to migrate from those databases, and will only accelerate their popularity further.

As I said on the min/max thread, if we're not doing enough to help people who would like to build such a tool, we should discuss how we can do better.

Could you tell me how to get min/max statistics at low cost? I'm not sure about the details of your patch in the CF, but its cost seems very high. To repeat myself: if we want drastically more detailed statistics, we should create separate statistics tools; your patch could become one of those tools. pg_stat_statements itself should stay lightweight. Regards,

Oh, I forgot to say: I believe the essence of information technology is to convey the most informative picture with the least information, at low cost. If that is wrong, please let me know why. Regards, -- Mitsumasa KONDO NTT Open Source Software Center
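For what it's worth, the per-statement buffer hit ratio the patch reports can already be derived in userspace from the raw shared_blks_hit and shared_blks_read columns, which is the position Itagaki-san took. A minimal sketch (Python; the helper name is mine, not from any patch):

```python
def shared_hit_percent(shared_blks_hit, shared_blks_read):
    """Buffer hit ratio for one pg_stat_statements row.

    shared_blks_hit + shared_blks_read is the total number of
    shared-buffer accesses; returns None when the statement
    touched no shared blocks at all.
    """
    total = shared_blks_hit + shared_blks_read
    if total == 0:
        return None
    return 100.0 * shared_blks_hit / total
```

A monitoring query can compute the same expression directly in SQL over the view, which lets the extension keep reporting only raw, cumulative values.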
[HACKERS] Remove use of obsolescent Autoconf macros
According to the Autoconf documentation, the following macros are obsolescent because no reasonably current system has the defect that they are testing for. - AC_C_CONST - AC_C_STRINGIZE - AC_C_VOLATILE - AC_FUNC_MEMCMP Therefore, I propose to remove their use, with the attached patch. From 21de25975e7ea66e562d9b320c85cef53d0b Mon Sep 17 00:00:00 2001 From: Peter Eisentraut pete...@gmx.net Date: Tue, 12 Nov 2013 06:46:01 -0500 Subject: [PATCH] Remove use of obsolescent Autoconf macros Remove the use of the following macros, which are obsolescent according to the Autoconf documentation: - AC_C_CONST - AC_C_STRINGIZE - AC_C_VOLATILE - AC_FUNC_MEMCMP --- configure | 297 - configure.in | 6 - src/include/c.h| 22 +--- src/include/pg_config.h.in | 10 -- src/port/memcmp.c | 70 --- 5 files changed, 3 insertions(+), 402 deletions(-) delete mode 100644 src/port/memcmp.c diff --git a/configure b/configure index 74e34eb..0165c3c 100755 --- a/configure +++ b/configure @@ -15326,113 +15326,6 @@ $as_echo $as_me: error: unknown endianness { (exit 1); exit 1; }; } ;; esac -{ $as_echo $as_me:$LINENO: checking for an ANSI C-conforming const 5 -$as_echo_n checking for an ANSI C-conforming const... 6; } -if test ${ac_cv_c_const+set} = set; then - $as_echo_n (cached) 6 -else - cat conftest.$ac_ext _ACEOF -/* confdefs.h. */ -_ACEOF -cat confdefs.h conftest.$ac_ext -cat conftest.$ac_ext _ACEOF -/* end confdefs.h. */ - -int -main () -{ -/* FIXME: Include the comments suggested by Paul. */ -#ifndef __cplusplus - /* Ultrix mips cc rejects this. */ - typedef int charset[2]; - const charset cs; - /* SunOS 4.1.1 cc rejects this. */ - char const *const *pcpcc; - char **ppc; - /* NEC SVR4.0.2 mips cc rejects this. */ - struct point {int x, y;}; - static struct point const zero = {0,0}; - /* AIX XL C 1.02.0.0 rejects this. 
- It does not let you subtract one const X* pointer from another in - an arm of an if-expression whose if-part is not a constant - expression */ - const char *g = string; - pcpcc = g + (g ? g-g : 0); - /* HPUX 7.0 cc rejects these. */ - ++pcpcc; - ppc = (char**) pcpcc; - pcpcc = (char const *const *) ppc; - { /* SCO 3.2v4 cc rejects this. */ -char *t; -char const *s = 0 ? (char *) 0 : (char const *) 0; - -*t++ = 0; -if (s) return 0; - } - { /* Someone thinks the Sun supposedly-ANSI compiler will reject this. */ -int x[] = {25, 17}; -const int *foo = x[0]; -++foo; - } - { /* Sun SC1.0 ANSI compiler rejects this -- but not the above. */ -typedef const int *iptr; -iptr p = 0; -++p; - } - { /* AIX XL C 1.02.0.0 rejects this saying - k.c, line 2.27: 1506-025 (S) Operand must be a modifiable lvalue. */ -struct s { int j; const int *ap[3]; }; -struct s *b; b-j = 5; - } - { /* ULTRIX-32 V3.1 (Rev 9) vcc rejects this */ -const int foo = 10; -if (!foo) return 0; - } - return !cs[0] !zero.x; -#endif - - ; - return 0; -} -_ACEOF -rm -f conftest.$ac_objext -if { (ac_try=$ac_compile -case (($ac_try in - *\* | *\`* | *\\*) ac_try_echo=\$ac_try;; - *) ac_try_echo=$ac_try;; -esac -eval ac_try_echo=\\$as_me:$LINENO: $ac_try_echo\ -$as_echo $ac_try_echo) 5 - (eval $ac_compile) 2conftest.er1 - ac_status=$? - grep -v '^ *+' conftest.er1 conftest.err - rm -f conftest.er1 - cat conftest.err 5 - $as_echo $as_me:$LINENO: \$? = $ac_status 5 - (exit $ac_status); } { - test -z $ac_c_werror_flag || - test ! 
-s conftest.err - } test -s conftest.$ac_objext; then - ac_cv_c_const=yes -else - $as_echo $as_me: failed program was: 5 -sed 's/^/| /' conftest.$ac_ext 5 - - ac_cv_c_const=no -fi - -rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext -fi -{ $as_echo $as_me:$LINENO: result: $ac_cv_c_const 5 -$as_echo $ac_cv_c_const 6; } -if test $ac_cv_c_const = no; then - -cat confdefs.h \_ACEOF -#define const /**/ -_ACEOF - -fi - { $as_echo $as_me:$LINENO: checking for inline 5 $as_echo_n checking for inline... 6; } if test ${ac_cv_c_inline+set} = set; then @@ -15572,40 +15465,6 @@ _ACEOF fi -{ $as_echo $as_me:$LINENO: checking for preprocessor stringizing operator 5 -$as_echo_n checking for preprocessor stringizing operator... 6; } -if test ${ac_cv_c_stringize+set} = set; then - $as_echo_n (cached) 6 -else - cat conftest.$ac_ext _ACEOF -/* confdefs.h. */ -_ACEOF -cat confdefs.h conftest.$ac_ext -cat conftest.$ac_ext _ACEOF -/* end confdefs.h. */ -#define x(y) #y - -char *s = x(teststring); -_ACEOF -if (eval $ac_cpp conftest.$ac_ext) 25 | - $EGREP #teststring /dev/null 21; then - ac_cv_c_stringize=no -else - ac_cv_c_stringize=yes -fi -rm -f conftest* - -fi -{ $as_echo $as_me:$LINENO: result: $ac_cv_c_stringize 5 -$as_echo $ac_cv_c_stringize 6; } -if test $ac_cv_c_stringize = yes; then - -cat confdefs.h \_ACEOF -#define HAVE_STRINGIZE 1 -_ACEOF - -fi - { $as_echo
Re: [HACKERS] Transaction-lifespan memory leak with plpgsql DO blocks
On 13 November 2013 03:17, David Johnston wrote: Having had this same thought WRT the FOR UPDATE in LOOP bug posting, the lack of a listing of outstanding bugs does leave some gaps. I would imagine people would appreciate something like:

Frequency: Rare
Severity: Low
Fix Complexity: Moderate
Work Around: Easy - create an actual function; create some form of loop
Status: Confirmed - Awaiting Volunteers to Fix

This problem is fixed as explained below:
1. Created an own simple_eval_estate for every DO block in plpgsql_inline_handler and freed it after the execution is finished.
2. While executing a simple expression, if func->simple_eval_estate is not null (meaning it is a DO block) then use it, otherwise use the global one.

The patch is attached to this mail. Please let me know whether this approach is fine. Regards, Dilip transaction-lifespan_memory_leak_fix.patch Description: transaction-lifespan_memory_leak_fix.patch
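The idea behind the fix, abstracted away from the plpgsql internals: give each anonymous DO block its own short-lived evaluation state instead of the shared, transaction-lifetime one. A rough Python analogy (names are illustrative, not the actual C symbols; run_do_block returns its state only so it can be inspected):

```python
class EvalState:
    """Stand-in for a plpgsql simple_eval_estate."""
    def __init__(self):
        self.allocations = []

    def alloc(self, obj):
        self.allocations.append(obj)

    def free(self):
        self.allocations.clear()

# Shared state that lives until the end of the transaction.
GLOBAL_ESTATE = EvalState()

def run_do_block(body):
    """DO block: private state, freed as soon as the block finishes,
    so repeated blocks no longer accumulate memory within a transaction."""
    estate = EvalState()
    try:
        body(estate)
    finally:
        estate.free()  # released immediately, even on error
    return estate

def run_function(body):
    """Regular function: keeps using the shared transaction-lifetime state."""
    body(GLOBAL_ESTATE)
```

The key property is the `finally`: the per-block state is released when the block exits, whereas allocations in the global state persist until the transaction ends.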
Re: [HACKERS] Improvement of pg_stat_statement usage about buffer hit ratio
On Mon, Nov 18, 2013 at 6:12 PM, KONDO Mitsumasa kondo.mitsum...@lab.ntt.co.jp wrote: I went over the discussion between Itagaki-san and Mr. Cerdic. He said that raw values are simply better. But were his changes really that simple? I cannot understand his aesthetic sense, nor yours :-( It's too complicated, and do you know how to tune PG from the local_* and temp_* information? I think most users cannot tune from it, and only part of it may be useful.

All of those costs are cumulative aggregates. If we didn't aggregate them, then the user couldn't possibly determine them on their own, to any approximation. That's the difference. If you think the local_* and temp_* aren't very useful, I'm inclined to agree, but it's too late to do anything about that now.

No. It's not only for geek tools and people with pre-packaged solutions at big companies, but also for common DBA tools.

I don't think that the tool needs to be expensive. If selecting from the pg_stat_statements view every 1-3 seconds is too expensive for such a tool, we can have a discussion about being smarter, because there certainly are ways to optimize it. Regarding your min/max patch: I'm opposed to adding even more to the spinlock-protected counters struct so that we can get an exact answer to a question where an approximate answer would very likely be good enough. And as Itagaki-san said 4 years ago, who is to say that what you've done here for buffers (or equally, what you've done in your min/max patch) is more interesting than the same thing but for another cost? The point of having what you've removed from the pg_stat_statements docs about calculating averages is that it is an example that can be generalized from. I certainly think there should be better tooling to make displaying costs over time easier, or characterizing the distribution, but unfortunately this isn't it. Something like pg_stat_statements is allowed to be approximate.
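The kind of consumer tool being discussed - one that polls the view every few seconds - only needs to difference successive snapshots of the cumulative counters. A sketch of that idea (a hypothetical helper, not part of pg_stat_statements):

```python
def snapshot_delta(prev, curr):
    """Per-interval activity from two snapshots of cumulative counters.

    Each snapshot maps an entry key, e.g. (userid, dbid, query), to a
    dict of counters. Unchanged entries are dropped, which is also what
    makes "only send changed entries" attractive on the server side.
    """
    delta = {}
    for key, counters in curr.items():
        old = prev.get(key, {})
        diff = {name: val - old.get(name, 0) for name, val in counters.items()}
        if any(diff.values()):
            delta[key] = diff
    return delta
```

Repeated differencing like this recovers per-interval rates (and, in userspace, approximate min/max over intervals) from purely cumulative raw values.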
That's considered an appropriate trade-off. Most obviously, today there can be hash table collisions, and some of the entries can therefore be plain wrong. Previously, I put the probability of 1 collision in the hash table at about 1% when pg_stat_statements.max is set to 10,000. So if your min/max patch was implemented in userspace, and an outlier is lost in the noise with just one second of activity, I'm not terribly concerned about that. It's a trade-off, and if you don't think it's the correct one, well then I'm afraid that's just where you and I differ. As I've said many times, if you want to have a discussion about making aggressive snapshotting of the pg_stat_statements view more practical, I think that would be very useful. By the way, MySQL and Oracle database which are very popular have these statistics. I think that your argument might disturb people who wants to migration from these database and will accelerate popularity of these database more. I think that there should be better tooling built on top of pg_stat_statements. I don't know what Oracle does, but I'm pretty sure that MySQL has nothing like pg_stat_statements. Please correct me if I'm mistaken. As I said on the min/max thread, if we're not doing enough to help people who would like to build such a tool, we should discuss how we can do better. Could you tell me how to get min/max statistics with low cost? See my previous comments on the other thread about making pg_stat_statements only return changed entries, and only sending the query text once. I'm not sure about detail of your patch in CF, but it seems very high cost. I think you should actually go and read the code and read my explanation of it, and refrain from making uninformed remarks like this. 
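The roughly-1% figure for pg_stat_statements.max = 10,000 is consistent with a birthday-problem estimate over a 32-bit query hash; a quick check (the standard approximation, not code from the extension, and the 32-bit hash width is my assumption):

```python
import math

def collision_probability(n_entries, hash_bits=32):
    """Birthday-problem approximation: chance that at least two of
    n_entries distinct queries collide in a hash_bits-bit hash space."""
    space = 2.0 ** hash_bits
    return 1.0 - math.exp(-n_entries * (n_entries - 1) / (2.0 * space))
```

With n_entries = 10,000 this comes out just over 1.1%, matching the quoted estimate.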
Whatever overhead my patch may add, the important point is that it obviously and self-evidently adds exactly zero overhead to maintaining statistics for existing entries - we only care about the query text when first creating an entry, or when viewing an entry when the view is selected from. With the reduced shared memory consumption, more entries can be created, making the cost of creating new entries (and thus whatever might have been added to that cost) matter less. Having said that, the additional cost is thought to be very low anyway. If you read my mail to the list about this, you'd know that I specifically worried about the implications of what I'd proposed for tools like your tool. That's part of the reason I keep urging you to think about making pg_stat_statements cheaper for consumer tools. There is no reason to send query texts to such tools more than once per entry, and no reason to send unchanged entries. Repeatedly, I think that if we want to get drastic detail statistics, we have to create another tools of statistics. Your patch will be these statistics tools. However, pg_stat_statement sholud be just light weight. This is incomprehensible. As with the cumulative aggregate costs, how is a consumer
[HACKERS] Standalone synchronous master
This patch implements the following TODO item:

Add a new "eager" synchronous mode that starts out synchronous but reverts to asynchronous after a failure timeout period. This would require some type of command to be executed to alert administrators of this change. http://archives.postgresql.org/pgsql-hackers/2011-12/msg01224.php

This patch implementation is along the same lines as given in the earlier thread. Some of the important additional changes are:

1. Added two GUC variables to take commands from the user to be executed:
   a. master_to_standalone_cmd: executed before the master switches from sync mode to standalone mode.
   b. master_to_sync_cmd: executed before the master switches from standalone mode back to sync mode.
2. A master mode switch will happen only if the corresponding command executes successfully.
3. Took care of the replication timeout to decide whether the synchronous standby has gone down, i.e. only after expiry of wal_sender_timeout will the master switch from sync mode to standalone mode.

Please provide your opinion or any other expectations for this patch. I will add it to the November commitfest. Thanks and Regards, Kumar Rajeev Rastogi replication_new_modeV1.patch Description: replication_new_modeV1.patch
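Abstracting the proposal, the mode switching reduces to a small state machine: stay synchronous while standby replies arrive within wal_sender_timeout, and switch modes only when the corresponding user command succeeds. A Python sketch under those assumptions, with callables standing in for the two GUC shell commands (the method names are mine, not from the patch):

```python
class SyncMaster:
    """Sketch of the proposed sync <-> standalone mode switching."""

    def __init__(self, wal_sender_timeout, to_standalone_cmd, to_sync_cmd):
        self.timeout = wal_sender_timeout
        self.to_standalone_cmd = to_standalone_cmd  # master_to_standalone_cmd
        self.to_sync_cmd = to_sync_cmd              # master_to_sync_cmd
        self.mode = "sync"
        self.last_standby_reply = 0.0

    def standby_reply(self, now):
        """A synchronous standby responded (or reconnected)."""
        self.last_standby_reply = now
        if self.mode == "standalone" and self.to_sync_cmd():
            self.mode = "sync"  # switch back only if the command succeeded

    def tick(self, now):
        """Periodic check: has wal_sender_timeout expired with no reply?"""
        if (self.mode == "sync"
                and now - self.last_standby_reply > self.timeout
                and self.to_standalone_cmd()):  # switch only on success
            self.mode = "standalone"
```

If the user command fails, the master deliberately stays in its current mode, which is the behavior point 2 of the mail describes.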
Re: [HACKERS] Add \i option to bring in the specified file as a quoted literal
On Wed, 2013-11-13 at 10:26 -0500, Bruce Momjian wrote: On Wed, Nov 13, 2013 at 08:58:07AM +0530, Amit Kapila wrote: On Tue, Nov 12, 2013 at 9:37 PM, Bruce Momjian br...@momjian.us wrote: On Wed, Oct 23, 2013 at 10:31:39AM +0530, Amit Kapila wrote: On Tue, Oct 22, 2013 at 3:04 AM, Piotr Marcinczyk pmarc...@gmail.com wrote: Hi, I would like to implement an item from the TODO marked as easy: "Add \i option to bring in the specified file as a quoted literal". I understand the intent of this item: to be able to have parts of a query written in separate files (now it is impossible, because \i tries to execute the content of the file as a separate command via the function process_file). For the use case discussed in the mail chain of that TODO item, Robert Haas has provided an alternative way to achieve it; please check the link below: http://www.postgresql.org/message-id/AANLkTi=7c8xfyf7uqw0y+si8ondkoy2nx8jc4bu0g...@mail.gmail.com If you think that alternative is not sufficient for the use case, then adding a new option/syntax is worthwhile; otherwise it would just be a shortcut or another form of some existing facility, which can be useful depending on how frequently users need it. So, can we remove this TODO item? The TODO item was created before Robert Haas provided an alternative way to achieve the same thing. In some cases there are multiple ways to achieve the same thing (example: shortcut options in psql) when something is used quite frequently and people want an easy way of doing it. In this case I don't think this is used frequently, so I don't see the need for it. We should remove this TODO item. OK, removed.

Well, I wrote it a few days ago. I'm sure it is not critical, but I suppose it may be useful. This is my first patch, so I think it is a good idea to send it and have it reviewed anyway. The first reason to send it is to see what kinds of errors I made, so as not to repeat them in the next patches. Second, if it (flexibly appending a file to the buffer) appears interesting to a reviewer, it may be committed.
*** a/doc/src/sgml/ref/psql-ref.sgml --- b/doc/src/sgml/ref/psql-ref.sgml *** *** 1731,1736 hello 10 --- 1731,1749 varlistentry + termliteral\ib replaceable class=parameterfilename/replaceable [ replaceable class=parameterquote_string/replaceable ] /literal/term + listitem + para + The literal\ib/ command appends content of file literalfilename/literal + to current query buffer. If parameter literalquote_string/literal + is not set, no quotation is used. If it is set, content of file will be + quoted by literalquote_string/literal enclosed in literal$/literal. + /para + /listitem + /varlistentry + + + varlistentry termliteral\ir replaceable class=parameterfilename/replaceable/literal/term listitem para *** a/src/bin/psql/command.c --- b/src/bin/psql/command.c *** *** 59,64 static backslashResult exec_command(const char *cmd, --- 59,65 PQExpBuffer query_buf); static bool do_edit(const char *filename_arg, PQExpBuffer query_buf, int lineno, bool *edited); + static bool do_append(const char *filename_arg, const char *quote_string, PQExpBuffer query_buf); static bool do_connect(char *dbname, char *user, char *host, char *port); static bool do_shell(const char *command); static bool do_watch(PQExpBuffer query_buf, long sleep); *** *** 798,806 exec_command(const char *cmd, } ! /* \i and \ir include files */ else if (strcmp(cmd, i) == 0 || strcmp(cmd, include) == 0 ! || strcmp(cmd, ir) == 0 || strcmp(cmd, include_relative) == 0) { char *fname = psql_scan_slash_option(scan_state, OT_NORMAL, NULL, true); --- 799,808 } ! /* \i, \ib and \ir include files */ else if (strcmp(cmd, i) == 0 || strcmp(cmd, include) == 0 ! || strcmp(cmd, ir) == 0 || strcmp(cmd, include_relative) == 0 ! || strcmp(cmd, ib) == 0 || strcmp(cmd, include_buffer) == 0) { char *fname = psql_scan_slash_option(scan_state, OT_NORMAL, NULL, true); *** *** 812,823 exec_command(const char *cmd, } else { ! 
bool include_relative; - include_relative = (strcmp(cmd, ir) == 0 - || strcmp(cmd, include_relative) == 0); - expand_tilde(fname); - success = (process_file(fname, false, include_relative) == EXIT_SUCCESS); free(fname); } } --- 814,845 } else { ! bool include_buffer; ! ! include_buffer = (strcmp(cmd, ib) == 0 ! || strcmp(cmd, include_buffer) == 0); ! ! ! if (include_buffer) ! { ! char *quote_string = psql_scan_slash_option(scan_state, ! OT_NORMAL, NULL, true); ! expand_tilde(fname); ! success = !do_append(fname, quote_string, query_buf); ! if
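The quoting behaviour the \ib patch documents - the file's content wrapped in quote_string enclosed in $ - can be sketched independently of psql. A minimal Python analogy (the helper name and buffer representation are mine):

```python
def append_file_quoted(query_buf, filename, quote_string=None):
    """Append a file's content to the query buffer, optionally as a
    dollar-quoted literal: $tag$...$tag$ survives embedded quotes."""
    with open(filename) as f:
        content = f.read()
    if quote_string is None:
        query_buf.append(content)
    else:
        tag = "$" + quote_string + "$"
        query_buf.append(tag + content + tag)
```

With quote_string set to "body", a file containing O'Reilly is appended as $body$O'Reilly$body$, which can sit directly inside an SQL statement without escaping.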
Re: [HACKERS] GIN improvements part2: fast scan
I checked out master and put together a test case using a small percentage of production data for a known problem we have with Pg 9.2 and text search scans. A small percentage in this case means 10 million records randomly selected; has a few billion records. Tests ran for master successfully and I recorded timings. Applied the patch included here to master along with gin-packed-postinglists-14.patch. Run make clean; ./configure; make; make install. make check (All 141 tests passed.) initdb, import dump The GIN index fails to build with a segfault. DETAIL: Failed process was running: CREATE INDEX textsearch_gin_idx ON kp USING gin (to_tsvector('simple'::regconfig, string)) WHERE (score1 IS NOT NULL); #0 XLogCheckBuffer (holdsExclusiveLock=1 '\001', lsn=lsn@entry=0x7fffcf341920, bkpb=bkpb@entry=0x7fffcf341960, rdata=0x468f11 ginFindLeafPage+529, rdata=0x468f11 ginFindLeafPage+529) at xlog.c:2339 #1 0x004b9ddd in XLogInsert (rmid=rmid@entry=13 '\r', info=info@entry=16 '\020', rdata=rdata@entry=0x7fffcf341bf0) at xlog.c:936 #2 0x00468a9e in createPostingTree (index=0x7fa4e8d31030, items=items@entry=0xfb55680, nitems=nitems@entry=762, buildStats=buildStats@entry=0x7fffcf343dd0) at gindatapage.c:1324 #3 0x004630c0 in buildFreshLeafTuple (buildStats=0x7fffcf343dd0, nitem=762, items=0xfb55680, category=optimized out, key=34078256, attnum=optimized out, ginstate=0x7fffcf341df0) at gininsert.c:281 #4 ginEntryInsert (ginstate=ginstate@entry=0x7fffcf341df0, attnum=optimized out, key=34078256, category=optimized out, items=0xfb55680, nitem=762, buildStats=buildStats@entry=0x7fffcf343dd0) at gininsert.c:351 #5 0x004635b0 in ginbuild (fcinfo=optimized out) at gininsert.c:531 #6 0x00718637 in OidFunctionCall3Coll (functionId=functionId@entry=2738, collation=collation@entry=0, arg1=arg1@entry=140346257507968, arg2=arg2@entry=140346257510448, arg3=arg3@entry=32826432) at fmgr.c:1649 #7 0x004ce1da in index_build (heapRelation=heapRelation@entry=0x7fa4e8d30680, 
indexRelation=indexRelation@entry=0x7fa4e8d31030, indexInfo=indexInfo@entry=0x1f4e440, isprimary=isprimary@entry=0 '\000', isreindex=isreindex@entry=0 '\000') at index.c:1963 #8 0x004ceeaa in index_create (heapRelation=heapRelation@entry=0x7fa4e8d30680, indexRelationName=indexRelationName@entry=0x1f4e660 textsearch_gin_knn_idx, indexRelationId=16395, indexRelationId@entry=0, relFileNode=optimized out, indexInfo=indexInfo@entry=0x1f4e440, indexColNames=indexColNames@entry=0x1f4f728, accessMethodObjectId=accessMethodObjectId@entry=2742, tableSpaceId=tableSpaceId@entry=0, collationObjectId=collationObjectId@entry=0x1f4fcc8, classObjectId=classObjectId@entry=0x1f4fce0, coloptions=coloptions@entry=0x1f4fcf8, reloptions=reloptions@entry=0, isprimary=0 '\000', isconstraint=0 '\000', deferrable=0 '\000', initdeferred=0 '\000', allow_system_table_mods=0 '\000', skip_build=0 '\000', concurrent=0 '\000', is_internal=0 '\000') at index.c:1082 #9 0x00546a78 in DefineIndex (stmt=optimized out, indexRelationId=indexRelationId@entry=0, is_alter_table=is_alter_table@entry=0 '\000', check_rights=check_rights@entry=1 '\001', skip_build=skip_build@entry=0 '\000', quiet=quiet@entry=0 '\000') at indexcmds.c:594 #10 0x0065147e in ProcessUtilitySlow (parsetree=parsetree@entry=0x1f7fb68, queryString=0x1f7eb10 CREATE INDEX textsearch_gin_idx ON kp USING gin (to_tsvector('simple'::regconfig, string)) WHERE (score1 IS NOT NULL);, context=optimized out, params=params@entry=0x0, completionTag=completionTag@entry=0x7fffcf344c10 , dest=optimized out) at utility.c:1163 #11 0x0065079e in standard_ProcessUtility (parsetree=0x1f7fb68, queryString=optimized out, context=optimized out, params=0x0, dest=optimized out, completionTag=0x7fffcf344c10 ) at utility.c:873 #12 0x0064de61 in PortalRunUtility (portal=portal@entry=0x1f4c350, utilityStmt=utilityStmt@entry=0x1f7fb68, isTopLevel=isTopLevel@entry=1 '\001', dest=dest@entry=0x1f7ff08, completionTag=completionTag@entry=0x7fffcf344c10 ) at pquery.c:1187 
#13 0x0064e9e5 in PortalRunMulti (portal=portal@entry=0x1f4c350, isTopLevel=isTopLevel@entry=1 '\001', dest=dest@entry=0x1f7ff08, altdest=altdest@entry=0x1f7ff08, completionTag=completionTag@entry=0x7fffcf344c10 ) at pquery.c:1318 #14 0x0064f459 in PortalRun (portal=portal@entry=0x1f4c350, count=count@entry=9223372036854775807, isTopLevel=isTopLevel@entry=1 '\001', dest=dest@entry=0x1f7ff08, altdest=altdest@entry=0x1f7ff08, completionTag=completionTag@entry=0x7fffcf344c10 ) at pquery.c:816 #15 0x0064d2d5 in exec_simple_query ( query_string=0x1f7eb10 CREATE INDEX textsearch_gin_idx ON kp USING gin (to_tsvector('simple'::regconfig, string)) WHERE (score1 IS NOT NULL);) at postgres.c:1048 #16 PostgresMain (argc=optimized out, argv=argv@entry=0x1f2ad40, dbname=0x1f2abf8 rbt, username=optimized out) at
Re: [HACKERS] GIN improvements part2: fast scan
I tried again this morning using gin-packed-postinglists-16.patch and gin-fast-scan.6.patch. No crashes. It is about a 0.1% random sample of production data (10,000,000 records) with the below structure. Pg was compiled with debug enabled in both cases.

        Table "public.kp"
 Column |  Type   | Modifiers
--------+---------+-----------
 id     | bigint  | not null
 string | text    | not null
 score1 | integer |
 score2 | integer |
 score3 | integer |
 score4 | integer |
Indexes:
    "kp_pkey" PRIMARY KEY, btree (id)
    "kp_string_key" UNIQUE CONSTRAINT, btree (string)
    "textsearch_gin_idx" gin (to_tsvector('simple'::regconfig, string)) WHERE (score1 IS NOT NULL)

This is the query tested. All data is in the Pg buffer cache for these timings. Words like 'the' and 'and' are very common (~9% of entries, each) and a word like 'hotel' is much less common (~0.2% of entries).

SELECT id, string
  FROM kp
 WHERE score1 IS NOT NULL
   AND to_tsvector('simple', string) @@ to_tsquery('simple', ?) -- ? is substituted with the query strings
 ORDER BY score1 DESC, score2 ASC
 LIMIT 1000;

 Limit  (cost=56.04..56.04 rows=1 width=37) (actual time=250.010..250.032 rows=142 loops=1)
   ->  Sort  (cost=56.04..56.04 rows=1 width=37) (actual time=250.008..250.017 rows=142 loops=1)
         Sort Key: score1, score2
         Sort Method: quicksort  Memory: 36kB
         ->  Bitmap Heap Scan on kp  (cost=52.01..56.03 rows=1 width=37) (actual time=249.711..249.945 rows=142 loops=1)
               Recheck Cond: ((to_tsvector('simple'::regconfig, string) @@ '''hotel'' & ''and'' & ''the'''::tsquery) AND (score1 IS NOT NULL))
               ->  Bitmap Index Scan on textsearch_gin_idx  (cost=0.00..52.01 rows=1 width=0) (actual time=249.681..249.681 rows=142 loops=1)
                     Index Cond: (to_tsvector('simple'::regconfig, string) @@ '''hotel'' & ''and'' & ''the'''::tsquery)
 Total runtime: 250.096 ms

Times are from \timing on.

MASTER
======
the:            888.436 ms   926.609 ms   885.502 ms
and:            944.052 ms   937.732 ms   920.050 ms
hotel:           53.992 ms    57.039 ms    65.581 ms
and the hotel:  260.308 ms   248.275 ms   248.098 ms

These numbers roughly match what we get with Pg 9.2.
The time savings between 'the' and 'and the hotel' is mostly heap lookups for the score and the final sort. The size of the index on disk is about 2% smaller in the patched version.

PATCHED
=======
the:            1055.169 ms  1081.976 ms  1083.021 ms
and:             912.173 ms   949.364 ms   965.261 ms
hotel:            62.591 ms    64.341 ms    62.923 ms
and the hotel:   268.577 ms   259.293 ms   257.408 ms
hotel and the:   253.574 ms   258.071 ms   250.280 ms

I was hoping that the 'and the hotel' case would improve with this patch to be closer to the 'hotel' search, as I thought that was the kind of thing it targeted. Unfortunately, it did not. I actually applied the patches, compiled, ran initdb, loaded the data, and ran it again, thinking I had made a mistake. Reordering the terms to 'hotel and the' doesn't change the result.

On Fri, Nov 15, 2013 at 1:51 AM, Alexander Korotkov aekorot...@gmail.com wrote: On Fri, Nov 15, 2013 at 3:25 AM, Rod Taylor r...@simple-knowledge.com wrote: I checked out master and put together a test case using a small percentage of production data for a known problem we have with Pg 9.2 and text search scans. A small percentage in this case means 10 million records randomly selected; the full data set has a few billion records. Tests ran for master successfully and I recorded timings. Applied the patch included here to master along with gin-packed-postinglists-14.patch. Ran make clean; ./configure; make; make install. make check (All 141 tests passed.) initdb, import dump. The GIN index fails to build with a segfault.

Thanks for testing. See the fixed version in the thread about packed posting lists. -- With best regards, Alexander Korotkov.
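A back-of-envelope check on why fast scan ought to help the 'and the hotel' case: under a term-independence assumption (mine, for illustration), the expected number of matching rows is tiny compared to the posting lists of the common terms, so most of the work spent on 'the' and 'and' is wasted.

```python
def expected_matches(n_rows, *term_frequencies):
    """Independence estimate of rows matching an AND-of-terms tsquery."""
    estimate = float(n_rows)
    for f in term_frequencies:
        estimate *= f
    return estimate

# 10M rows; 'the' ~9%, 'and' ~9%, 'hotel' ~0.2% of entries each.
```

That predicts about 162 matching rows for 'hotel & and & the', in line with the 142 rows the plan actually returned, while a scan driven by the rare term alone would touch roughly 20,000 postings instead of roughly 900,000 for each common term.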
Re: [HACKERS] pg_upgrade: delete_old_cluster.sh issues
On Tue, Nov 12, 2013 at 10:35:58AM +, Marc Mamin wrote: Hello, IMHO there is a serious issue in the script to clean the old data directory when running pg_upgrade in link mode. In short: when working with symbolic links, the first step in delete_old_cluster.sh is to delete the old $PGDATA folder, which may contain tablespaces used by the new instance. In long, our use case:

Rather than removing files/directories individually, which would be difficult to maintain, we decided in pg_upgrade 9.3 to detect tablespaces in the old data directory, report that, and not create a delete script. Here is the commit: http://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=4765dd79219b9697d84f5c2c70f3fe00455609a1

The problem with your setup is that while you didn't pass symbolic links to pg_upgrade, you did use symbolic links when defining the tablespaces, so pg_upgrade couldn't recognize that the symbolic links pointed inside the old data directory. We could use readlink() to walk over all symbolic links, but that seems quite complex. We could use stat() and make sure there are no matching inodes in the old data directory, or that they are on a different file system. We could look for a directory named after the PG catversion in the old data directory. We could update the docs. I am not sure what to do. We never expected people would put tablespaces in the data directory.

-- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. +
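Of the options Bruce lists, the symlink-resolution check is easy to sketch in userspace terms (Python; os.path.realpath resolves every link the way a readlink() walk would, and the function name is mine):

```python
import os

def tablespace_inside_datadir(tablespace_path, old_datadir):
    """True when the tablespace ultimately resolves to somewhere inside
    the old data directory, even if it was defined through symlinks."""
    ts = os.path.realpath(tablespace_path)
    dd = os.path.realpath(old_datadir)
    return os.path.commonpath([ts, dd]) == dd
```

pg_upgrade could refuse to emit delete_old_cluster.sh whenever any tablespace passes this check; the stat()/inode comparison Bruce mentions is the equivalent test one filesystem level down.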
Re: [HACKERS] Re: Suggestion: Issue warning when calling SET TRANSACTION outside transaction block
On Mon, Nov 18, 2013 at 06:30:32PM -0800, David Johnston wrote: Personally, I am fine with changing them all to WARNING. Error makes more sense if the goal is internal consistency. That goal should be subservient to backward compatibility. Changing LOCK to a warning is less problematic, since it does not seem probable that current code functions in such a way that, after upgrade, it would begin working differently in the absence of an error; basically, someone would have to be trapping on the error and conditionally branching their logic. That said, if this were a day-0 decision I'd likely raise an error. Weakening LOCK doesn't make sense since erroring is its day-0 behavior. Document the warning for SET as being weaker than ideal because of backward compatibility and call it a day (i.e. leave LOCK at error). The documentation, not the code, then enforces the feeling that such usage is considered wrong, without possibly breaking wrong-but-working code.

We normally don't approach warts with documentation --- we usually just fix them and document them in the release notes. If we did, our docs would be a whole lot uglier.

-- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. +
Re: [HACKERS] COPY table FROM STDIN doesn't show count tag
On 18 November 2013, Amit Khandekar wrote:
>> On 18 October 2013 17:07, Rajeev Rastogi rajeev.rast...@huawei.com wrote:
>> From the following mail, COPY behaviour between stdin and a normal file
>> has some inconsistency:
>> http://www.postgresql.org/message-id/ce85a517.4878e%tim.k...@gmail.com
>> The issue was that when COPY executes from stdin, the command goes to the
>> server, the server requests the input, and control returns to the client
>> to enter the data. Once the client sends the input, the server executes
>> the COPY command and sends back the result, but the client does not print
>> the result; it just clears it out. The changes ensure the final result
>> from the server gets printed before the result is cleared. Please find
>> the patch for the same and let me know your suggestions.
>
> In this call:
>
>     success = handleCopyIn(pset.db, pset.cur_cmd_source,
>                            PQbinaryTuples(*results), &intres) && success;
>     if (success && intres)
>         success = PrintQueryResults(intres);
>
> instead of handling the result status this way, what if we use the
> ProcessResult() argument 'result' to pass back the COPY result status to
> the caller? We already call PrintQueryResults(results) after the
> ProcessResult() call, so we don't have to have a COPY-specific
> PrintQueryResults() call. Also, if there is a subsequent SQL command in
> the same query string, the consequence of the patch is that the client
> prints both the COPY output and the last command's output. So my
> suggestion would also allow us to be consistent with the general behaviour
> that only the last SQL command's output is printed in case of multiple SQL
> commands. Here is how it gets printed with your patch:
>
>     psql -d postgres -c "\copy tab from '/tmp/st.sql' delimiter ' '; insert into tab values ('lll', 3)"
>     COPY 1
>     INSERT 0 1
>
> This is not harmful, but just a matter of consistency.

Thank you for the valuable comments. Your suggestion is absolutely correct. I hope you meant to write the test case as psql -d postgres -c "\copy tab from stdin; insert into tab values ('lll', 3)", since if we are reading from a file, the above issue does not come up. I have modified the patch as per your comment, and the same is attached with this mail. Please let me know in case of any other issues or suggestions.

Thanks and Regards,
Kumar Rajeev Rastogi

copydefectV2.patch
Description: copydefectV2.patch
Re: [HACKERS] LISTEN / NOTIFY enhancement request for Postgresql
Thank you all for considering my feature request. Dimitri's suggestion is a very good one -- I feel it will accomplish the goal of allowing more granularity in LISTEN. We might also want to add a flag in postgresql.conf to disable this enhancement so that we don't break existing code.

On 11/15/2013 8:19 AM, Pavel Golub wrote:
> Hello, Dimitri. You wrote:
> DF> Bruce Momjian br...@momjian.us writes:
> >> . is used to separate names in a path
> >> * is used to match any name in a path
> >> > is used to recursively match any destination starting from this name
> >>
> >> For example, using the example above, these subscriptions are possible:
> >>
> >>   Subscription          Meaning
> >>   PRICE.>               Any price for any product on any exchange
> >>   PRICE.STOCK.>         Any price for a stock on any exchange
> >>   PRICE.STOCK.NASDAQ.*  Any stock price on NASDAQ
> >>   PRICE.STOCK.*.IBM     Any IBM stock price on any exchange
> >>
> >> My request is to implement the same or similar feature in Postgresql.
> >
> > This does seem useful and pretty easy to implement. Should we add a TODO?
>
> DF> I think we should consider the ltree syntax in that case, as documented
> DF> in the following link:
> DF> http://www.postgresql.org/docs/9.3/interactive/ltree.html
>
> Great idea! Thanks for the link.
>
> DF> Regards,
> DF> --
> DF> Dimitri Fontaine
> DF> http://2ndQuadrant.fr  PostgreSQL : Expertise, Formation et Support