Re: [HACKERS] Enabling Checksums

2013-03-13 Thread Jeff Davis
On Thu, 2013-03-07 at 13:45 -0800, Jeff Davis wrote:
 I need to do another self-review after these changes and some more
 extensive testing, so I might have missed a couple things.

New patch attached.

Aside from rebasing, I also found a problem with temp tables. At first I
was going to fix it by continuing to exclude temp tables from checksums
entirely. But then I re-thought it and decided to just checksum temp
tables, too.

Excluding temp tables from checksums means more special cases in the
code, and more documentation. After thinking about it, there is no huge
benefit to excluding temp tables:
  * small temp tables will be in memory only, and never checksummed
  * no WAL for temp tables, so the biggest cost of checksums is
non-existent
  * there are good reasons to want to checksum temp tables, because they
can be used to stage data for permanent tables (sketched below)
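
A minimal sketch of that staging pattern (hypothetical table and file
names):

  CREATE TEMP TABLE staging (LIKE target);
  COPY staging FROM '/tmp/load.csv' WITH (FORMAT csv);
  -- without checksums on temp tables, page corruption here could
  -- silently propagate into the permanent table:
  INSERT INTO target SELECT * FROM staging;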

However, I'm willing to be convinced to exclude temp tables again.

Regards,
Jeff Davis



checksums-20130312.patch.gz
Description: GNU Zip compressed data


replace-tli-with-checksums-20130312.patch.gz
Description: GNU Zip compressed data



Re: [HACKERS] Re: Proposal for Allow postgresql.conf values to be changed via SQL [review]

2013-03-13 Thread Craig Ringer
On 03/12/2013 06:27 AM, Craig Ringer wrote:
  Think also about the case where someone wants to change multiple
  values together and having just some set and not others would be
  inconsistent.
 Yeah, that's a killer. The reload would need to be scheduled for COMMIT
 time, it can't be done by `SET PERSISTENT` directly.
Thinking about this some more, I'm not sure this is a good idea.

Right now, SET takes effect immediately. Always, without exception. 
Delaying SET PERSISTENT's effects until commit would make it
inconsistent with SET's normal behaviour.

However, *not* delaying it would make it another quirky
not-transactional not-atomic command. That's OK, but if it's not going
to follow transactional semantics it should not be allowed to run within
a transaction, like VACUUM.

Writing the changes immediately but deferring the reload until commit
seems to be the worst of those two worlds.
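
To make the inconsistency concrete, a sketch using the SET PERSISTENT
syntax proposed in this thread:

  BEGIN;
  SET work_mem = '64MB';             -- takes effect immediately, mid-transaction
  SHOW work_mem;                     -- already reports 64MB
  SET PERSISTENT work_mem = '64MB';  -- deferred to COMMIT under this proposal
  COMMIT;                            -- the reload happens here, unlike plain SET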

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services





Re: [HACKERS] Statistics and selectivity estimation for ranges

2013-03-13 Thread Alexander Korotkov
Hi!

Thanks for your work on this patch!

On Tue, Mar 12, 2013 at 8:03 PM, Heikki Linnakangas hlinnakan...@vmware.com
 wrote:

 So, after some hacking, I ended up with this version. I find it more
 readable, I hope I didn't miss anything. This seems to produce results that
 are close, but not identical, to the original patch. I'm not sure where the
 discrepancy is coming from, or which patch is more correct in that respect.
 I'll continue from this tomorrow, but if you have the time, please take a
 look and let me know what you think.


I've read your explanation and version of the patch. In general it seems
correct to me.
The one point where I broke up the abstraction in some functions is
infinities. For example, in calc_length_hist_frac one or both of length1
and length2 can be infinity. In the line
 frac = area / (length2 - length1);
you can get a NaN result. I specifically adjusted the code to get a more
or less correct result in this case.

Another minor note about this line:
 bin_width *= get_position(typcache, lower, hist_lower[i],
  hist_lower[i + 1]);
ISTM it should look like:
 bin_width -= 1.0 - get_position(typcache, lower, hist_lower[i],
   hist_lower[i + 1]);
Imagine lower and upper bounds falling into the same histogram bin. In this
case we should subtract the lengths of both parts which were cut off, on
the left and on the right.
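
To illustrate with numbers (a sketch, assuming get_position returns the
fraction of the bin lying below the given bound): if both bounds fall into
one bin with get_position(lower) = 0.3 and get_position(upper) = 0.8, the
surviving fraction is 0.8 - 0.3 = 0.5. Subtracting each cut-off part from
the full width, 1.0 - (1.0 - 0.8) - 0.3 = 0.5, gives exactly that, whereas
applying the two corrections multiplicatively would not.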

--
With best regards,
Alexander Korotkov.


Re: [HACKERS] leaking lots of unreferenced inodes (pg_xlog files?), maybe after moving tables and indexes to tablespace on different volume

2013-03-13 Thread Magnus Hagander
On Mar 13, 2013 3:04 AM, Tom Lane t...@sss.pgh.pa.us wrote:

 Palle Girgensohn gir...@freebsd.org writes:
  ... I got lots of space freed
  up, but it seems that after that the disk usage grows linearly (it seems
  to leave many inodes unreferenced).

 Hm.  We've seen issues in the past with PG processes failing to close
 no-longer-useful files promptly, but ...

  Strange thing is I cannot find any open files.

 ... that suggests there's something else going on.

  The unreferenced inodes are almost exclusively around 16 MB in size,
  i.e. they would most probably all be pg_xlog files.

 Have you got any sort of WAL archiving active, and if so maybe that's
 holding onto WAL files?  Not that it's clear how come lsof wouldn't
 tattle on an archiving process either.

  Stopping postgresql briefly did not help, I tried that.

 That seems to point the finger at some non-postgres cause.  I confess
 I can't guess what.


Yeah, unreferenced inodes with no open files, and only discoverable with
fsck, sounds like a filesystem bug to me. Particularly since it showed up
just after an operating system upgrade, and doesn't go away with a postgres
restart...

/Magnus


Re: Column defaults for foreign tables (was Re: [HACKERS] [v9.3] writable foreign tables)

2013-03-13 Thread Albe Laurenz
Tom Lane wrote:
 Yeah, I'm drifting towards the position that we should just define the
 defaults as being whatever they are locally, rather than trying to be
 cute about supporting remotely-executed defaults.  It looks to me like
 if we try to do the latter, we're going to have pitfalls and weird
 corner cases that will never be quite transparent.  There's also the
 argument that this'd be a lot of work that benefits only some FDWs,
 since the whole concept of remote column defaults doesn't apply when
 the FDW's data source isn't a traditional RDBMS.

That was my first thought on the topic, to have a solution that
is simple (if not perfect).
Your argument that it would be unpleasant to lose the ability
to use sequence-generated remote default values made me reconsider.

But there is a workaround, namely to use a before-insert trigger
to generate an automatic primary key (e.g. if the inserted value is
NULL).
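
A minimal sketch of that workaround, presumably defined on the remote
side (hypothetical table and sequence names):

  CREATE FUNCTION t_fill_id() RETURNS trigger AS $$
  BEGIN
      IF NEW.id IS NULL THEN
          NEW.id := nextval('t_id_seq');
      END IF;
      RETURN NEW;
  END;
  $$ LANGUAGE plpgsql;

  CREATE TRIGGER t_fill_id_trg BEFORE INSERT ON t
      FOR EACH ROW EXECUTE PROCEDURE t_fill_id();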
Maybe it would be good to add a few hints at workarounds like that
to the documentation if it's going to be local defaults.

Yours,
Laurenz Albe




Re: [HACKERS] Re: Proposal for Allow postgresql.conf values to be changed via SQL [review]

2013-03-13 Thread Boszormenyi Zoltan

2013-03-13 07:42 keltezéssel, Craig Ringer írta:

On 03/12/2013 06:27 AM, Craig Ringer wrote:

Think also about the case where someone wants to change multiple
values together and having just some set and not others would be
inconsistent.

Yeah, that's a killer. The reload would need to be scheduled for COMMIT
time, it can't be done by `SET PERSISTENT` directly.

Thinking about this some more, I'm not sure this is a good idea.

Right now, SET takes effect immediately. Always, without exception.
Delaying SET PERSISTENT's effects until commit would make it
inconsistent with SET's normal behaviour.

However, *not* delaying it would make it another quirky
not-transactional not-atomic command. That's OK, but if it's not going
to follow transactional semantics it should not be allowed to run within
a transaction, like VACUUM.

Writing the changes immediately but deferring the reload until commit
seems to be the worst of those two worlds.


I was thinking about it a little. There is a hook that runs at the end
of (sub-)transactions. It can be abused for this purpose to make
SET PERSISTENT transactional. The subtransactions can also stack
these settings, by forgetting settings upon ROLLBACK [ TO SAVEPOINT ]
and overwriting previous settings upon COMMIT and RELEASE SAVEPOINT.
All it needs is a list and maintaining intermediate pointers when entering
into a new level of SAVEPOINT. The functions to register such hooks are
in src/include/access/xact.h:

extern void RegisterXactCallback(XactCallback callback, void *arg);
extern void UnregisterXactCallback(XactCallback callback, void *arg);
extern void RegisterSubXactCallback(SubXactCallback callback, void *arg);
extern void UnregisterSubXactCallback(SubXactCallback callback, void *arg);

The per-session local effect can also be immediate, it just needs
the normal SET and SET PERSISTENT code paths to be unified.

On the other hand, it would make a lot of sense to implement it
as one setting per file, or to extend the code to allow modifying
settings in bulk. One setting per file should be easier, and
it would also allow extensions to drop their settings automatically
into the automatic config directory. I don't know who mentioned
this idea about extensions, but it also came up a while back.

Best regards,
Zoltán Böszörményi

--
--
Zoltán Böszörményi
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de
 http://www.postgresql.at/





Re: [HACKERS] Add some regression tests for SEQUENCE

2013-03-13 Thread Albe Laurenz
robins wrote:
 Attached is a small patch to test corner cases related to Sequences 
 (basically aimed at increasing
 code-coverage of sequence.sql in regression tests).
 
 Look forward to any and all feedback.

Looks OK except that the patch is backwards
(all added lines start with -).  I found a typo:
"exit" instead of "exist".

You should add the patch to the next commitfest
(http://wiki.postgresql.org/wiki/Submitting_a_Patch).

Yours,
Laurenz Albe




Re: [HACKERS] scanner/parser minimization

2013-03-13 Thread Simon Riggs
On 2 March 2013 18:47, Heikki Linnakangas hlinnakan...@vmware.com wrote:

 Uh ... no.  I haven't looked into why the flex tables are so large,
 but this theory is just wrong.  See ScanKeywordLookup().


 Interestingly, the yy_transition array generated by flex used to be much
 smaller:

 8.3: 22072 elements
 8.4: 62623 elements
 master: 64535 elements

 The big jump between 8.3 and 8.4 was caused by the introduction of the
 unicode escapes: U&'foo' [UESCAPE 'x']. And in particular, the error rule
 for the UESCAPE, which we use to avoid backtracking.

 I experimented with a patch that uses two extra flex states to shorten the
 error rules, see attached. The idea is that after lexing a unicode literal
 like U&'foo', you enter a new state, in which you check whether a
 UESCAPE 'x' follows. This slashes the size of the array to 36581 elements.

+1 to do this sooner rather than later

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Re: Proposal for Allow postgresql.conf values to be changed via SQL [review]

2013-03-13 Thread Boszormenyi Zoltan

2013-03-13 09:09 keltezéssel, Boszormenyi Zoltan írta:

2013-03-13 07:42 keltezéssel, Craig Ringer írta:

On 03/12/2013 06:27 AM, Craig Ringer wrote:

Think also about the case where someone wants to change multiple
values together and having just some set and not others would be
inconsistent.

Yeah, that's a killer. The reload would need to be scheduled for COMMIT
time, it can't be done by `SET PERSISTENT` directly.

Thinking about this some more, I'm not sure this is a good idea.

Right now, SET takes effect immediately. Always, without exception.
Delaying SET PERSISTENT's effects until commit would make it
inconsistent with SET's normal behaviour.

However, *not* delaying it would make it another quirky
not-transactional not-atomic command. That's OK, but if it's not going
to follow transactional semantics it should not be allowed to run within
a transaction, like VACUUM.

Writing the changes immediately but deferring the reload until commit
seems to be the worst of those two worlds.


I was thinking about it a little. There is a hook that runs at the end
of (sub-)transactions. It can be abused for this purpose to make
SET PERSISTENT transactional. The subtransactions can also stack
these settings, by forgetting settings upon ROLLBACK [ TO SAVEPOINT ]
and overwriting previous settings upon COMMIT and RELEASE SAVEPOINT.


By overwriting at RELEASE SAVEPOINT I meant only the memory copy
so the value that was set last is remembered at the same level of subxact.
The final settings make it to the configuration file at COMMIT time.


All it needs is a list and maintaining intermediate pointers when entering
into a new level of SAVEPOINT. The functions to register such hooks are
in src/include/access/xact.h:

extern void RegisterXactCallback(XactCallback callback, void *arg);
extern void UnregisterXactCallback(XactCallback callback, void *arg);
extern void RegisterSubXactCallback(SubXactCallback callback, void *arg);
extern void UnregisterSubXactCallback(SubXactCallback callback, void *arg);

The per-session local effect can also be immediate, it just needs
the normal SET and SET PERSISTENT code paths to be unified.

On the other hand, it would make a lot of sense to implement it
as one setting per file, or to extend the code to allow modifying
settings in bulk. One setting per file should be easier, and
it would also allow extensions to drop their settings automatically
into the automatic config directory. I don't know who mentioned
this idea about extensions, but it also came up a while back.

Best regards,
Zoltán Böszörményi




--
--
Zoltán Böszörményi
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de
 http://www.postgresql.at/





Re: [HACKERS] Add some regression tests for SEQUENCE

2013-03-13 Thread robins
Thanks Laurenz.

Would correct these (and readup) before submitting next patch.

--
Robins
Tharakan


On 13 March 2013 13:49, Albe Laurenz laurenz.a...@wien.gv.at wrote:

 robins wrote:
  Attached is a small patch to test corner cases related to Sequences
 (basically aimed at increasing
  code-coverage of sequence.sql in regression tests).
 
  Look forward to any and all feedback.

 Looks OK except that the patch is backwards
 (all added lines start with -).  I found a typo:
 "exit" instead of "exist".

 You should add the patch to the next commitfest
 (http://wiki.postgresql.org/wiki/Submitting_a_Patch).

 Yours,
 Laurenz Albe



Re: [HACKERS] Duplicate JSON Object Keys

2013-03-13 Thread Robert Haas
On Fri, Mar 8, 2013 at 4:42 PM, Andrew Dunstan and...@dunslane.net wrote:
 So my order of preference for the options would be:

 1. Have the JSON type collapse objects so the last instance of a key wins
 and is actually stored

 2. Throw an error when a JSON type has duplicate keys

 3. Have the accessors find the last instance of a key and return that
 value

 4. Let things remain as they are now

 On second thought, I don't like 4 at all. It means that the JSON type
 thinks a value is valid while the accessor does not. They contradict one
 another.

 You can forget 1. We are not going to have the parser collapse anything.
 Either the JSON it gets is valid or it's not. But the parser isn't going to
 try to MAKE it valid.

Why not?  Because it's the wrong thing to do, or because it would be slower?

What I think is tricky here is that there's more than one way to
conceptualize what the JSON data type really is.  Is it a key-value
store of sorts, or just a way to store text values that meet certain
minimalist syntactic criteria?  I had imagined it as the latter, in
which case normalization isn't sensible.  But if you think of it the
first way, then normalization is not only sensible, but almost
obligatory.  For example, we don't feel bad about this:

rhaas=# select '1e1'::numeric;
 numeric
---------
      10
(1 row)

I think Andrew and I had envisioned this as basically a text data type
that enforces some syntax checking on its input, hence the current
design.  But I'm not sure that's the ONLY sensible design.
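
A sketch of how the two readings diverge on a duplicate-key literal:

  select '{"k":1,"k":2}'::json;  -- "validated text" reading: stored verbatim
  -- a normalizing ("key-value store") reading would instead collapse this
  -- to {"k":2}, much as '1e1'::numeric above is normalized to 10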

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Writable foreign tables: how to identify rows

2013-03-13 Thread Robert Haas
On Wed, Mar 6, 2013 at 12:06 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 On the other hand, I don't have a problem with decreeing that
 non-Postgres FDWs need to use PK row identification in the first
 release; which would be the consequence if we don't do anything about
 allowing new system columns in 9.3.  We will certainly need that style
 of row identification to be written and tested anyway.  It won't stop
 us from extending things later.

Oh, I didn't realize that was how it was going to work out.  That
seems very reasonable to me.  There is a performance problem with
forcing DELETE FROM ft WHERE nonkey = 5 to be pushed to the remote
side as SELECT pk FROM ft WHERE nonkey = 5 followed by DELETE FROM ft
WHERE pk = $1 for each pk value returned by the SELECT, which sounds
like it's what will happen under this system.  But I don't have any
problem leaving that as future work.
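
Spelled out, the remote traffic under PK row identification would look
roughly like this (a sketch; the remote statements are illustrative, not
taken from the patch):

  -- local statement:
  DELETE FROM ft WHERE nonkey = 5;
  -- what the FDW would send to the remote side:
  SELECT pk FROM remote_table WHERE nonkey = 5 FOR UPDATE;
  DELETE FROM remote_table WHERE pk = $1;  -- once per row returned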

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




[HACKERS] Re: Proposal for Allow postgresql.conf values to be changed via SQL [review]

2013-03-13 Thread Andres Freund
On 2013-03-12 10:46:53 +0530, Amit Kapila wrote:
 Do you mean to say that, because some variables can only be set after
 restart, this can lead to inconsistency, or is it because of the
 asynchronous nature of pg_reload_conf()?

As long as SET PERSISTENT cannot be executed inside a transaction - or
only takes effect after its end - there doesn't seem to be any problem
executing ProcessConfigFile() directly.

The reason it's not executed directly is that it is normally called
during query execution, and it wouldn't be nice to change stuff after half
of a query has been processed.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Re: Proposal for Allow postgresql.conf values to be changed via SQL [review]

2013-03-13 Thread Andres Freund
On 2013-03-13 09:09:24 +0100, Boszormenyi Zoltan wrote:
 2013-03-13 07:42 keltezéssel, Craig Ringer írta:
 On 03/12/2013 06:27 AM, Craig Ringer wrote:
 Think also about the case where someone wants to change multiple
 values together and having just some set and not others would be
 inconsistent.
 Yeah, that's a killer. The reload would need to be scheduled for COMMIT
 time, it can't be done by `SET PERSISTENT` directly.
 Thinking about this some more, I'm not sure this is a good idea.
 
 Right now, SET takes effect immediately. Always, without exception.
 Delaying SET PERSISTENT's effects until commit would make it
 inconsistent with SET's normal behaviour.
 
 However, *not* delaying it would make it another quirky
 not-transactional not-atomic command. That's OK, but if it's not going
 to follow transactional semantics it should not be allowed to run within
 a transaction, like VACUUM.
 
 Writing the changes immediately but deferring the reload until commit
 seems to be the worst of those two worlds.
 
 I was thinking about it a little. There is a hook that runs at the end
 of (sub-)transactions. It can be abused for this purpose to make
 SET PERSISTENT transactional. The subtransactions can also stack
 these settings, by forgetting settings upon ROLLBACK [ TO SAVEPOINT ]
 and overwriting previous settings upon COMMIT and RELEASE SAVEPOINT.
 All it needs is a list and maintaining intermediate pointers when entering
 into a new level of SAVEPOINT. The functions to register such hooks are
 in src/include/access/xact.h:
 
 extern void RegisterXactCallback(XactCallback callback, void *arg);
 extern void UnregisterXactCallback(XactCallback callback, void *arg);
 extern void RegisterSubXactCallback(SubXactCallback callback, void *arg);
 extern void UnregisterSubXactCallback(SubXactCallback callback, void *arg);

(sub)xact commit/abort already calls AtEOXact_GUC(commit, nestlevel). So you
wouldn't even need that. It seems we could add another value to enum
GucStackState, like GUC_SET_PERSISTENT - and process those only if commit &&
nestlevel == 1.
Every time you see one with commit && nestlevel > 1 you put them into
the stack one level up.

This seems like it's somewhat in line with the way SET LOCAL is implemented?

 On the other hand, it would make a lot of sense to implement it
 as one setting per file, or to extend the code to allow modifying
 settings in bulk. One setting per file should be easier, and
 it would also allow extensions to drop their settings automatically
 into the automatic config directory. I don't know who mentioned
 this idea about extensions, but it also came up a while back.

I still think one-setting-per-file is the right way to go, yes.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Duplicate JSON Object Keys

2013-03-13 Thread Craig Ringer
On 03/13/2013 08:17 PM, Robert Haas wrote:
 I think Andrew and I had envisioned this as basically a text data type
 that enforces some syntax checking on its input, hence the current
 design.  But I'm not sure that's the ONLY sensible design.
We're probably stuck with it at this point, but it may well be worth
considering the later introduction of a compatible `jsonobj` that stores
parsed and normalized json objects in some internal format the client
doesn't have to care about, like serialized V8 JS VM objects.

I suspect that such a type is better offered by a contrib module until/unless
PL/V8 or similar becomes a core language. It'd be nuts to try to
re-implement all of the JSON and javascript object functionality in a
javascript engine when we can just plug an existing one in and use its
JSON and javascript object manipulation. The minimalist approach makes
sense for the json type precisely because it's just validated text, but
I don't think it makes sense to continually extend it and slowly
reinvent a whole javascript engine in Pg.

If we're going to do things like normalizing json I think that's a job
for a real JavaScript engine that understands Javascript objects.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services





Re: [HACKERS] Duplicate JSON Object Keys

2013-03-13 Thread Andrew Dunstan


On 03/13/2013 08:17 AM, Robert Haas wrote:

On Fri, Mar 8, 2013 at 4:42 PM, Andrew Dunstan and...@dunslane.net wrote:

So my order of preference for the options would be:

1. Have the JSON type collapse objects so the last instance of a key wins
and is actually stored

2. Throw an error when a JSON type has duplicate keys

3. Have the accessors find the last instance of a key and return that
value

4. Let things remain as they are now

On second thought, I don't like 4 at all. It means that the JSON type
thinks a value is valid while the accessor does not. They contradict one
another.

You can forget 1. We are not going to have the parser collapse anything.
Either the JSON it gets is valid or it's not. But the parser isn't going to
try to MAKE it valid.

Why not?  Because it's the wrong thing to do, or because it would be slower?

What I think is tricky here is that there's more than one way to
conceptualize what the JSON data type really is.  Is it a key-value
store of sorts, or just a way to store text values that meet certain
minimalist syntactic criteria?  I had imagined it as the latter, in
which case normalization isn't sensible.  But if you think of it the
first way, then normalization is not only sensible, but almost
obligatory.  For example, we don't feel bad about this:

rhaas=# select '1e1'::numeric;
 numeric
---------
      10
(1 row)

I think Andrew and I had envisioned this as basically a text data type
that enforces some syntax checking on its input, hence the current
design.  But I'm not sure that's the ONLY sensible design.




I think we've moved on from this point, because a) other implementations 
allow duplicate keys, b) it's trivially easy to make Postgres generate 
such json, and c) there is some dispute about exactly what the spec 
mandates.


I'll be posting a revised patch shortly that doesn't error out but 
simply uses the value for the later key lexically.


cheers

andrew






Re: [HACKERS] Proposal for Allow postgresql.conf values to be changed via SQL [review]

2013-03-13 Thread Amit Kapila
On Wednesday, March 13, 2013 6:10 PM Andres Freund wrote:
 On 2013-03-12 10:46:53 +0530, Amit Kapila wrote:
  Do you mean to say that because some variables can only be set after
 restart
  can lead to
  inconsistency, or is it because of asynchronous nature of
 pg_reload_conf()?
 
 As long as SET PERSISTENT cannot be executed inside a transaction - or
 only takes effect after its end - there doesn't seem to be any problem
 executing ProcessConfigFile() directly.

Do you mean to say we call ProcessConfigFile() directly at the end of SET
PERSISTENT instead of pg_reload_conf()? But in that case, would it load
the variables for other backends?


With Regards,
Amit Kapila.






Re: [HACKERS] Duplicate JSON Object Keys

2013-03-13 Thread Andres Freund
On 2013-03-13 20:48:57 +0800, Craig Ringer wrote:
 On 03/13/2013 08:17 PM, Robert Haas wrote:
  I think Andrew and I had envisioned this as basically a text data type
  that enforces some syntax checking on its input, hence the current
  design.  But I'm not sure that's the ONLY sensible design.
 We're probably stuck with it at this point, but it may well be worth
 considering the later introduction of a compatible `jsonobj` that stores
 parsed and normalized json objects in some internal format the client
 doesn't have to care about, like serialized V8 JS VM objects.
 
 I suspect that such a type is better offered by a contrib module until/unless
 PL/V8 or similar becomes a core language. It'd be nuts to try to
 re-implement all of the JSON and javascript object functionality in a
 javascript engine when we can just plug an existing one in and use its
 JSON and javascript object manipulation. The minimalist approach makes
 sense for the json type precisely because it's just validated text, but
 I don't think it makes sense to continually extend it and slowly
 reinvent a whole javascript engine in Pg.

While I am not convinced - but not the contrary either - that using
something like V8 is a good idea, I wish the patch adding json had
reserved the first byte in the varlena for the 'json encoding' or
something similar. That would have left the road open for easily adding
different encodings in the future. Now you're left with marking it with a
null byte at the beginning or similar atrocities...

Just wanted to say that, since we preserve cross-version compatibility of
on-disk data, we might want to think about such stuff the next time we
add a type.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Proposal for Allow postgresql.conf values to be changed via SQL [review]

2013-03-13 Thread Andres Freund
On 2013-03-13 18:38:12 +0530, Amit Kapila wrote:
 On Wednesday, March 13, 2013 6:10 PM Andres Freund wrote:
  On 2013-03-12 10:46:53 +0530, Amit Kapila wrote:
   Do you mean to say that because some variables can only be set after
  restart
   can lead to
   inconsistency, or is it because of asynchronous nature of
  pg_reload_conf()?
  
  As long as SET PERSISTENT cannot be executed inside a transaction - or
  only takes effect after its end - there doesn't seem to be any problem
  executing ProcessConfigFile() directly.
 
 Do you mean to say we call ProcessConfigFile() directly at the end of SET
 PERSISTENT instead of pg_reload_conf()? But in that case, would it load
 the variables for other backends?

I'd say do both. Yes, we would evaluate config potentially twice. Who
cares. Messages inside non-postmaster environments are only output at DEBUG2
anyway.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Proposal for Allow postgresql.conf values to be changed via SQL [review]

2013-03-13 Thread Amit Kapila
On Wednesday, March 13, 2013 6:44 PM Andres Freund wrote:
 On 2013-03-13 18:38:12 +0530, Amit Kapila wrote:
  On Wednesday, March 13, 2013 6:10 PM Andres Freund wrote:
   On 2013-03-12 10:46:53 +0530, Amit Kapila wrote:
Do you mean to say that because some variables can only be set
 after
   restart
can lead to
inconsistency, or is it because of asynchronous nature of
   pg_reload_conf()?
  
   As long as SET PERSISTENT cannot be executed inside a transaction -
 or
   only takes effect after its end - there doesn't seem to be any
 problem
   executing ProcessConfigFile() directly.
 
  Do you mean to say we call ProcessConfigFile() directly at the end of SET
  PERSISTENT instead of pg_reload_conf()? But in that case, would it load
  the variables for other backends?
 
 I'd say do both. Yes, we would evaluate config potentially twice. Who
 cares. Messages inside non-postmaster environments are only output at
 DEBUG2
 anyway.

I can see your point: when you say do both, you mean that the values
will be immediately available in the current session, which can make the
user happy.
However, if there is any error during ProcessConfigFile(), it could be a
little inconvenient for the user, as the setting would have been written
to the file but the in-memory processing has failed.
Ideally there should not be any error in ProcessConfigFile(), but even a
failed palloc or some such call would be sufficient to make the user
unhappy.

With Regards,
Amit Kapila.





Re: [HACKERS] Writable foreign tables: how to identify rows

2013-03-13 Thread Merlin Moncure
On Wed, Mar 6, 2013 at 11:06 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Shigeru Hanada shigeru.han...@gmail.com writes:
 I'm not sure whether postgres_fdw should support, but updatable views
 have no system column including ctid.  So, we need magic identifier,
 perhaps it would be set of primary key value(s), to support updating
 remote updatable views via foreign tables.

 Yeah, I considered that.  I thought seriously about proposing that we
 forget magic row identifiers altogether, and instead make postgres_fdw
 require a remote primary key for a foreign table to be updatable.

IMO, utilizing anything but this for remote record identification is
an implementation-specific optimization.  Aren't the semantics
different though?  If you go:

update foo set id = 1 where id = 1;

the primary key would not change, but the ctid would.  Or is that
already handled?

merlin




Re: [HACKERS] Writable foreign tables: how to identify rows

2013-03-13 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Wed, Mar 6, 2013 at 12:06 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 On the other hand, I don't have a problem with decreeing that
 non-Postgres FDWs need to use PK row identification in the first
 release; which would be the consequence if we don't do anything about
 allowing new system columns in 9.3.  We will certainly need that style
 of row identification to be written and tested anyway.  It won't stop
 us from extending things later.

 Oh, I didn't realize that was how it was going to work out.  That
 seems very reasonable to me.  There is a performance problem with
 forcing DELETE FROM ft WHERE nonkey = 5 to be pushed to the remote
 side as SELECT pk FROM ft WHERE nonkey = 5 followed by DELETE FROM ft
 WHERE pk = $1 for each pk value returned by the SELECT, which sounds
 like it's what will happen under this system.  But I don't have any
 problem leaving that as future work.

That performance issue exists for *all* foreign updates/deletes at the
moment, with or without a ctid-ish column.  As you say, fixing it is
material for 9.4 or later.  (It's possible that an FDW could dodge this
by itself without any additional help from the core code, but I'm not
sure.  Anyway postgres_fdw hasn't tried yet, and I think there are a
number of things that are higher priority to worry about than bulk
update performance.)

regards, tom lane




Re: [HACKERS] Writable foreign tables: how to identify rows

2013-03-13 Thread Tom Lane
Merlin Moncure mmonc...@gmail.com writes:
 On Wed, Mar 6, 2013 at 11:06 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Yeah, I considered that.  I thought seriously about proposing that we
 forget magic row identifiers altogether, and instead make postgres_fdw
 require a remote primary key for a foreign table to be updatable.

 IMO, Utilizing anything but this for remote record identification is
 an implementation specific optimization.  Aren't the semantics
 different though?  If you go:

 update foo set id = 1 where id = 1;

  the primary key would not change, but the ctid would.  Or is that
  already handled?

In postgres_fdw as it currently stands, the remote ctid would change.
I'm not sure we should posit that as a universal property of FDWs
though.  It's not even a meaningful question for FDWs with no underlying
rowid concept.

regards, tom lane




Re: [HACKERS] Proposal for Allow postgresql.conf values to be changed via SQL [review]

2013-03-13 Thread Andres Freund
On 2013-03-13 18:52:48 +0530, Amit Kapila wrote:
 On Wednesday, March 13, 2013 6:44 PM Andres Freund wrote:
  On 2013-03-13 18:38:12 +0530, Amit Kapila wrote:
   On Wednesday, March 13, 2013 6:10 PM Andres Freund wrote:
On 2013-03-12 10:46:53 +0530, Amit Kapila wrote:
 Do you mean to say that because some variables can only be set
  after
restart
 can lead to
 inconsistency, or is it because of asynchronous nature of
pg_reload_conf()?
   
As long as SET PERSISTENT cannot be executed inside a transaction -
  or
only takes effect after its end - there doesn't seem to be any
  problem
executing ProcessConfigFile() directly.
  
   Do you mean to say we call ProcessConfigFile() directly at the end of SET
   PERSISTENT instead of pg_reload_conf()? But in that case, would it load
   the variables for other backends?
  
  I'd say do both. Yes, we would evaluate config potentially twice. Who
  cares. Messages inside non-postmaster environments are only output at
  DEBUG2
  anyway.
 
 I could see your point, when you say do both that you want that in current
 session,
 the values will be immediately available which can make user happy.
 However if there is any error during function ProcessConfigFile(), it could
 be little  inconvenient for user as the setting would have been done in file 
 but memory
 processing has created problem.

But that's pretty independent from this? If anything it allows for
*better* reporting of problems, since you could convert the log level to
WARNING if ProcessConfigFile() is executed in the foreground - which, at
least for interactive sessions, will normally be displayed to the user.

If you don't do it immediately you're in the same situation after the
pg_reload_config(), just that the user won't see any error messages.

There is something I am more worried about, which is that it might be bad
if a postmaster child adopts new values before the postmaster does. Right
now I can't think of any new dangers, since the reverse is already true...

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Writable foreign tables: how to identify rows

2013-03-13 Thread Merlin Moncure
On Wed, Mar 13, 2013 at 9:30 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Merlin Moncure mmonc...@gmail.com writes:
 On Wed, Mar 6, 2013 at 11:06 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Yeah, I considered that.  I thought seriously about proposing that we
 forget magic row identifiers altogether, and instead make postgres_fdw
 require a remote primary key for a foreign table to be updatable.

 IMO, Utilizing anything but this for remote record identification is
 an implementation specific optimization.  Aren't the semantics
 different though?  If you go:

 update foo set id = 1 where id = 1;

  the primary key would not change, but the ctid would.  Or is that
  already handled?

 In postgres_fdw as it currently stands, the remote ctid would change.
 I'm not sure we should posit that as a universal property of FDWs
 though.  It's not even a meaningful question for FDWs with no underlying
 rowid concept.

I just find it odd that the rowid concept is used at all without a strong
guarantee that the record you are referencing is the one you are
supposed to be referencing.  Basically I'm saying PKEY semantics are
the correct ones and that ctid is OK to use iff they match the pkey
ones.  I don't think this is possible unless you maintain a remote
lock on the ctid between when you fetch it and when you do some other
operation.

merlin




Re: [HACKERS] Writable foreign tables: how to identify rows

2013-03-13 Thread Tom Lane
Merlin Moncure mmonc...@gmail.com writes:
 I just find it odd that rowid concept is used at all without strong
 guarantee that the record you are referencing is the one you are
 supposed to be referencing.  Basically I'm saying PKEY semantics are
 the correct ones and that ctid is ok to use iff they match the pkey
 ones.  I don't think this is possible unless you maintain a remote
 lock on the ctid between when you fetch it and do some other
 operation.

postgres_fdw does maintain such a lock (it does SELECT FOR UPDATE when
scanning an update/delete target table).  However, I'm not exactly sure
why you think a pkey-based design would be free of such hazards.  pkeys
aren't immutable either ...
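
For the ctid-based scheme postgres_fdw currently uses, the remote
conversation looks roughly like this (a sketch; table and column names are
illustrative):

  SELECT ctid, * FROM remote_table WHERE nonkey = 5 FOR UPDATE;  -- scan + lock
  UPDATE remote_table SET nonkey = 6 WHERE ctid = $1;  -- same remote transaction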

regards, tom lane




Re: [HACKERS] Writable foreign tables: how to identify rows

2013-03-13 Thread Andres Freund
On 2013-03-13 09:51:59 -0500, Merlin Moncure wrote:
 On Wed, Mar 13, 2013 at 9:30 AM, Tom Lane t...@sss.pgh.pa.us wrote:
  Merlin Moncure mmonc...@gmail.com writes:
  On Wed, Mar 6, 2013 at 11:06 AM, Tom Lane t...@sss.pgh.pa.us wrote:
  Yeah, I considered that.  I thought seriously about proposing that we
  forget magic row identifiers altogether, and instead make postgres_fdw
  require a remote primary key for a foreign table to be updatable.
 
  IMO, Utilizing anything but this for remote record identification is
  an implementation specific optimization.  Aren't the semantics
  different though?  If you go:
 
  update foo set id = 1 where id = 1;
 
  the primary key would not change, but the ctid would.  Or is that
  already handled?
 
  In postgres_fdw as it currently stands, the remote ctid would change.
  I'm not sure we should posit that as a universal property of FDWs
  though.  It's not even a meaningful question for FDWs with no underlying
  rowid concept.
 
 I just find it odd that the rowid concept is used at all without a strong
 guarantee that the record you are referencing is the one you are
 supposed to be referencing.  Basically I'm saying PKEY semantics are
 the correct ones and that ctid is ok to use iff they match the pkey
 ones.  I don't think this is possible unless you maintain a remote
 lock on the ctid between when you fetch it and do some other
 operation.

ISTM that's mostly the same semantics as you have locally. If you
locally do an UPDATE it will use the ctid to locate the updated
row.
If it got updated since (which you can detect by checking whether xmax
has committed), the behaviour depends on the transaction isolation level. At
REPEATABLE READ and above it errors out; otherwise it will do the
EvalPlanQual magic, which is to make sure the WHERE condition still fits
the row. That's obviously where the cases start to differ. But that's also
the case if you use the primary key, since you obviously can have more
quals than just that on the origin side.
Perhaps pgsql-fdw should make sure the update was performed *without*
following the ctid chain to a new valid tuple?

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] leaking lots of unreferenced inodes (pg_xlog files?), maybe after moving tables and indexes to tablespace on different volume

2013-03-13 Thread Dan Thomas
We're seeing similar behaviour on several of our FreeBSD servers too.
It doesn't look like open files, or filesystem snapshots. Rebooting
does reset it, but restarting PG makes no difference.

We've got an assortment of different versions of both FreeBSD and
PostgreSQL, some of which are demonstrating this behaviour, some
aren't. Here's a quick breakdown of versions and what we've got
running:

FreeBSD   PostgreSQL   Leaking?
8.0       8.4.4        no
8.2       9.0.4        no
8.3       9.1.4        yes
8.3       9.2.3        yes
9.1       9.2.3        yes

All of these machines are under similar load patterns and (apart from
the version differences) are set up essentially the same and are
doing the same job. They all have hot standbys, yet this problem
doesn't exist on any of the standby servers. We haven't done anything
with tablespaces, the database has its own dedicated partition
(although pg_log/pg_xlog are both symlinked out to /usr).

However (just to throw a spanner in the works) we do have another
server running fbsd8.3/pg9.1.4 which ISN'T showing this behaviour -
although its load patterns are quite different.

I'm not sure if this is going to help, but here's a graph of this disk
space disparity over the last few days (Y axis is in gigabytes). The
flat-ish part in the middle is the weekend where we have little
traffic, so we can at least say it's not constant:
http://i.imgur.com/jlbgzNI.png

Up until now we've been upgrading things in the hope that the problem
will go away, but since we've got one server up to fbsd9.1/pg9.2.3 and
still seeing the problem we're a little stumped. Any ideas about how
we can go about debugging this would be appreciated.

Thanks,

Dan

On 13 March 2013 07:39, Magnus Hagander mag...@hagander.net wrote:

 On Mar 13, 2013 3:04 AM, Tom Lane t...@sss.pgh.pa.us wrote:

 Palle Girgensohn gir...@freebsd.org writes:
  ... I got lots of space freed
  up, but it seems that after that the disk usage grows linearly (it seems
  to leave many inodes unreferenced).

 Hm.  We've seen issues in the past with PG processes failing to close
 no-longer-useful files promptly, but ...

  Strange thing is I cannot find any open files.

 ... that suggests there's something else going on.

  The unreferenced inodes are almost exclusively around 16 MB in size,
  i.e. they would most probably all be pg_xlog files.

 Have you got any sort of WAL archiving active, and if so maybe that's
 holding onto WAL files?  Not that it's clear how come lsof wouldn't
 tattle on an archiving process either.

  Stopping postgresql briefly did not help, I tried that.

 That seems to point the finger at some non-postgres cause.  I confess
 I can't guess what.


 Yeah, unreferenced inodes with no open files, and only discoverable with
 fsck, sounds like a filesystem bug to me. Particularly since it showed up
 just after an operating system upgrade, and doesn't go away with a postgres
 restart...

 /Magnus




Re: [HACKERS] Writable foreign tables: how to identify rows

2013-03-13 Thread Tom Lane
Andres Freund and...@2ndquadrant.com writes:
 Perhaps pgsql-fdw should make sure the update was performed *without*
 following the ctid chain to a new valid tuple?

I did think about these issues before committing the patch ;-)

The basic theory in PG's existing design is to postpone locking rows as
long as possible; which means that when we do finally lock a target row,
we have to check if it's changed since we scanned it, and that leads
into the whole EvalPlanQual mess.  I considered trying to make FDWs
duplicate that behavior, but gave up on it.  In the first place, it's
hard to see how you even define "did the row change" unless you have
something exactly like ctids (including forward update chains).  And
in the second place, this would mandate yet another round trip to the
remote server for each row to be updated.

In the patch as committed, the expectation (which is satisfied by
postgres_fdw) is that FDWs should lock rows that are candidates for
update/delete during the initial scan.  This avoids an extra round trip
and justifies leaving EvalPlanQual out of the picture altogether.
The cost is that we may lock rows that we end up not updating, because
they fail locally-checked restriction or join conditions.  I think on
balance that's a good trade-off.

regards, tom lane




Re: [HACKERS] Writable foreign tables: how to identify rows

2013-03-13 Thread Merlin Moncure
On Wed, Mar 13, 2013 at 10:15 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Andres Freund and...@2ndquadrant.com writes:
 Perhaps pgsql-fdw should make sure the update was performed *without*
 following the ctid chain to a new valid tuple?

 I did think about these issues before committing the patch ;-)

 The basic theory in PG's existing design is to postpone locking rows as
 long as possible; which means that when we do finally lock a target row,
 we have to check if it's changed since we scanned it, and that leads
 into the whole EvalPlanQual mess.  I considered trying to make FDWs
 duplicate that behavior, but gave up on it.  In the first place, it's
 hard to see how you even define "did the row change" unless you have
 something exactly like ctids (including forward update chains).  And
 in the second place, this would mandate yet another round trip to the
 remote server for each row to be updated.

 In the patch as committed, the expectation (which is satisfied by
 postgres_fdw) is that FDWs should lock rows that are candidates for
 update/delete during the initial scan.  This avoids an extra round trip
 and justifies leaving EvalPlanQual out of the picture altogether.
 The cost is that we may lock rows that we end up not updating, because
 they fail locally-checked restriction or join conditions.  I think on
 balance that's a good trade-off.

agreed.

As long as a lock is held between ctid examination and row modification
you are OK.  I didn't read the patch, just pointing that out (history is
full of client-side drivers that did not do this properly).

I also might have missed some of the finer contextual points of the
discussion here: I was thinking that you are identifying rows on the
client in fetch transaction A to write back in transaction B.  If
that is the case, ctid-based identification to me is full of issues
and probably best used to handle awkward cases where there is no
identified primary key.  But sure, if you are holding locks you can
base it on physical position.

With pkey row identification, yes, rows can change from under you, but
you can employ application tricks to guard for that just as you
would with any optimistic locking strategy in SQL.  This is an
expected and well-understood problem (use transactions etc).
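
For instance, the standard optimistic-locking guard (a sketch, assuming a
version column maintained by the application):

  UPDATE foo SET val = 'new', version = version + 1
   WHERE id = 42 AND version = 7;
  -- zero rows affected means someone else changed the row first: retry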

merlin




Re: [HACKERS] Writable foreign tables: how to identify rows

2013-03-13 Thread Tom Lane
Merlin Moncure mmonc...@gmail.com writes:
 As long as lock as held between ctid examination and row modification
 you are ok.  didn't read the patch, just pointing that out (history is
 full of client side drivers that did not do this properly).

 I also might have missed some of the finer contextual points of the
 discussion here: I was thinking that you are identifying rows on the
 client in fetch transaction A to write back in transaction B.  If
 that is the case, ctid-based identification to me is full of issues

Absolutely --- you can't rely on ctid across transactions.  postgres_fdw
isn't doing that though, just using it to update or delete rows that it
locked earlier in the same remote transaction.

regards, tom lane




Re: [HACKERS] Duplicate JSON Object Keys

2013-03-13 Thread David E. Wheeler
On Mar 13, 2013, at 5:17 AM, Robert Haas robertmh...@gmail.com wrote:

 What I think is tricky here is that there's more than one way to
 conceptualize what the JSON data type really is.  Is it a key-value
 store of sorts, or just a way to store text values that meet certain
 minimalist syntactic criteria?  I had imagined it as the latter, in
 which case normalization isn't sensible.  But if you think of it the
 first way, then normalization is not only sensible, but almost
 obligatory.

That makes a lot of sense. Given the restrictions I tend to prefer in my 
database data types, I had imagined it as the former. And since I'm using it 
now to store key/value pairs (killing off some awful EAV implementations in the 
process, BTW), I certainly think of it more formally as an object.


But I can live with the other interpretation, as long as the differences are 
clearly understood and documented. Perhaps a note could be added to the docs 
explaining this difference, and what one can do to adapt for it. A normalizing 
function would certainly help.

Best,

David





Re: [HACKERS] Duplicate JSON Object Keys

2013-03-13 Thread Gavin Flower

On 14/03/13 02:02, Andrew Dunstan wrote:


On 03/13/2013 08:17 AM, Robert Haas wrote:
On Fri, Mar 8, 2013 at 4:42 PM, Andrew Dunstan and...@dunslane.net 
wrote:

So my order of preference for the options would be:

1. Have the JSON type collapse objects so the last instance of a key wins
and is actually stored

2. Throw an error when a JSON type has duplicate keys

3. Have the accessors find the last instance of a key and return that
value

4. Let things remain as they are now

On second thought, I don't like 4 at all. It means that the JSON type
thinks a value is valid while the accessor does not. They contradict one
another.

You can forget 1. We are not going to have the parser collapse anything.
Either the JSON it gets is valid or it's not. But the parser isn't going to
try to MAKE it valid.

Why not?  Because it's the wrong thing to do, or because it would be
slower?


What I think is tricky here is that there's more than one way to
conceptualize what the JSON data type really is.  Is it a key-value
store of sorts, or just a way to store text values that meet certain
minimalist syntactic criteria?  I had imagined it as the latter, in
which case normalization isn't sensible.  But if you think of it the
first way, then normalization is not only sensible, but almost
obligatory.  For example, we don't feel bad about this:

rhaas=# select '1e1'::numeric;
 numeric
---------
      10
(1 row)

I think Andrew and I had envisioned this as basically a text data type
that enforces some syntax checking on its input, hence the current
design.  But I'm not sure that's the ONLY sensible design.




I think we've moved on from this point, because a) other 
implementations allow duplicate keys, b) it's trivially easy to make 
Postgres generate such json, and c) there is some dispute about 
exactly what the spec mandates.


I'll be posting a revised patch shortly that doesn't error out but 
simply uses the value for the later key lexically.


cheers

andrew




How about adding a new function with '_strict' added to the existing 
name, with an extra parameter 'coalesce' - or using other names, if 
considered more appropriate!


That way, slower but more stringent functionality can be added where
required, and the existing function need not be changed.


If coalesce = true,
then: the last duplicate is used
else: an error is returned when the new key is a duplicate.
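
Under that proposal, usage might look like this (a sketch; the function
name and signature are hypothetical):

  SELECT json_get_strict('{"k":1,"k":2}'::json, 'k', true);   -- returns 2
  SELECT json_get_strict('{"k":1,"k":2}'::json, 'k', false);  -- error: duplicate key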


Cheers,
Gavin





Re: [HACKERS] transforms

2013-03-13 Thread Dimitri Fontaine
Peter Eisentraut pete...@gmx.net writes:
 At run time, this will sort itself out, because all the required modules
 will be loaded.  The problem is when you create the glue extension and
 haven't invoked any code from any of the dependent extension in the
 session.  Abstractly, the possible solutions are either not to check the
 functions when the extension is created (possibly settable by a flag) or
 to somehow force a load of all dependent extensions when the new
 extension is created.  (I say extension here even though dynamically
 loadable modules are attached to functions, which makes this even more
 confusing.)

I think CREATE EXTENSION should be LOADing all distinct modules
referenced from all required extensions' functions. It does sound easy
enough to code, and I can't imagine why we would want to not do that.

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support




Re: [HACKERS] Enabling Checksums

2013-03-13 Thread Josh Berkus
Jeff,

 However, I'm willing to be convinced to exclude temp tables again.


Those reasons sound persuasive.  Let's leave them in for 9.3.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Duplicate JSON Object Keys

2013-03-13 Thread Andrew Dunstan


On 03/13/2013 12:51 PM, Gavin Flower wrote:

On 14/03/13 02:02, Andrew Dunstan wrote:


On 03/13/2013 08:17 AM, Robert Haas wrote:
On Fri, Mar 8, 2013 at 4:42 PM, Andrew Dunstan and...@dunslane.net 
wrote:

So my order of preference for the options would be:

1. Have the JSON type collapse objects so the last instance of a 
key wins

and is actually stored

2. Throw an error when a JSON type has duplicate keys

3. Have the accessors find the last instance of a key and return that
value

4. Let things remain as they are now

On second thought, I don't like 4 at all. It means that the JSON type
thinks a value is valid while the accessor does not. They contradict one
another.
You can forget 1. We are not going to have the parser collapse 
anything.
Either the JSON it gets is valid or it's not. But the parser isn't 
going to

try to MAKE it valid.
Why not?  Because it's the wrong thing to do, or because it would be 
slower?


What I think is tricky here is that there's more than one way to
conceptualize what the JSON data type really is.  Is it a key-value
store of sorts, or just a way to store text values that meet certain
minimalist syntactic criteria?  I had imagined it as the latter, in
which case normalization isn't sensible.  But if you think of it the
first way, then normalization is not only sensible, but almost
obligatory.  For example, we don't feel bad about this:

rhaas=# select '1e1'::numeric;
 numeric
---------
      10
(1 row)

I think Andrew and I had envisioned this as basically a text data type
that enforces some syntax checking on its input, hence the current
design.  But I'm not sure that's the ONLY sensible design.




I think we've moved on from this point, because a) other 
implementations allow duplicate keys, b) it's trivially easy to make 
Postgres generate such json, and c) there is some dispute about 
exactly what the spec mandates.


I'll be posting a revised patch shortly that doesn't error out but 
simply uses the value for the later key lexically.


cheers

andrew




How about adding a new function with '_strict' added to the existing 
name, with an extra parameter 'coalesce' - or using other names, if 
considered more appropriate!


That way slower more stringent functionality can be added where 
required.  This way, the existing function need not be changed.


If coalesce = true,
then: the last duplicate is used
else: an error is returned when the new key is a duplicate.






For good or ill, we now already have a json type that will accept 
strings with duplicate keys, and generator functions which can now 
generate such strings. If someone wants functions to enforce a stricter 
validity check (e.g. via a check constraint on a domain), or to convert 
json to a canonical version which strips out prior keys of the same name 
and their associated values, then these should be relatively simple to 
implement given the parser API in the current patch. But they aren't 
part of the current patch, and I think it's way too late to be adding 
such things. I have been persuaded by arguments made upthread that the 
best thing to do is exactly what other well known json-accepting 
implementations do (e.g. V8), which is to accept json with duplicate 
keys and to treat the later key/value as overriding the former 
key/value. If I'd done that from the start nobody would now be talking 
about this at all.
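
Concretely, under that rule the accessors would behave like this (an
illustrative example, assuming the revised patch's last-key-wins behaviour):

    SELECT '{"k": 1, "k": 2}'::json -> 'k';
    -- expected result: 2, the lexically later value for "k"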


cheers

andrew


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Duplicate JSON Object Keys

2013-03-13 Thread David E. Wheeler
On Mar 13, 2013, at 10:45 AM, Andrew Dunstan and...@dunslane.net wrote:

 If someone wants functions to enforce a stricter validity check (e.g. via a 
 check constraint on a domain), or to convert json to a canonical version 
 which strips out prior keys of the same name and their associated values, 
 then these should be relatively simple to implement given the parser API in 
 the current patch. But they aren't part of the current patch, and I think 
 it's way too late to be adding such things.

I think it might be good to get something like this into core eventually, 
otherwise I suspect that there will be a different version of it for every 
JSON-using project out there. And my first cut at it won’t descend into 
sub-objects.

 I have been persuaded by arguments made upthread that the best thing to do is 
 exactly what other well known json-accepting implementations do (e.g. V8), 
 which is to accept json with duplicate keys and to treat the later key/value 
 as overriding the former key/value. If I'd done that from the start nobody 
 would now be talking about this at all.

That’s true, though I might have started thinking about a canonicalizing 
function before long. :-)

Best,

David



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Duplicate JSON Object Keys

2013-03-13 Thread Andrew Dunstan


On 03/13/2013 01:50 PM, David E. Wheeler wrote:

On Mar 13, 2013, at 10:45 AM, Andrew Dunstan and...@dunslane.net wrote:


If someone wants functions to enforce a stricter validity check (e.g. via a 
check constraint on a domain), or to convert json to a canonical version which 
strips out prior keys of the same name and their associated values, then these 
should be relatively simple to implement given the parser API in the current 
patch. But they aren't part of the current patch, and I think it's way too late 
to be adding such things.

I think it might be good to get something like this into core eventually, 
otherwise I suspect that there will be a different version of it for every 
JSON-using project out there. And my first cut at it won’t descend into 
sub-objects.



Then you wouldn't be doing it right. The whole thing about a recursive
descent parser is that it's, well, recursive.


cheers

andrew


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Duplicate JSON Object Keys

2013-03-13 Thread David E. Wheeler
On Mar 13, 2013, at 10:59 AM, Andrew Dunstan and...@dunslane.net wrote:

 And my first cut at it won’t descend into sub-objects.
 
 
 Then you wouldn't be doing it right. The whole thing about a recursive descent 
 parser is that it's, well, recursive.

Right, but it would serve my immediate needs. I have a column that just stores 
key/value pairs with no nesting. So I know I can write something like this and 
have it be good enough:

create or replace function json_smash(
json
) RETURNS JSON language SQL STRICT IMMUTABLE AS $$
SELECT format('{%s}', array_to_string(ARRAY(
SELECT format('%s: %s', to_json(key), value)
  FROM (
  SELECT key, value, row_number() OVER (
 partition by key order by rnum desc
 ) AS rnum
FROM (
SELECT key, value, row_number() OVER (
   partition by key
   ) AS rnum
  FROM json_each($1)
) a
 ) b
 WHERE rnum = 1
), ','))::json;
$$;

And do you really want to see that unloosed on the world? :-P (Yes, I know 
there is no guarantee on the order of rows returned by json_each()).

Best,

David

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: Column defaults for foreign tables (was Re: [HACKERS] [v9.3] writable foreign tables)

2013-03-13 Thread Tom Lane
Albe Laurenz laurenz.a...@wien.gv.at writes:
 Yeah, I'm drifting towards the position that we should just define the
 defaults as being whatever they are locally, rather than trying to be
 cute about supporting remotely-executed defaults.

 That was my first thought on the topic, to have a solution that
 is simple (if not perfect).
 Your argument that it would be unpleasant to lose the ability
 to use sequence-generated remote default values made me reconsider.

 But there is a workaround, namely to use a trigger before insert
 to generate an automatic primary key (e.g. if the inserted value is
 NULL).

Another attack is to set up a different foreign table pointing to the
same remote table, but lacking the sequence column.  When you insert via
that table, you'll get the remote's default for the hidden column.
This doesn't require any weird triggers on the remote side, but it could
be hard to persuade existing apps to use the second foreign table.
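
A hedged sketch of both workarounds with postgres_fdw (all object names
hypothetical):

    -- on the remote server: fill the key only when it arrives as NULL
    CREATE FUNCTION fill_order_id() RETURNS trigger LANGUAGE plpgsql AS $$
    BEGIN
        IF NEW.id IS NULL THEN
            NEW.id := nextval('orders_id_seq');
        END IF;
        RETURN NEW;
    END;
    $$;
    CREATE TRIGGER orders_fill_id BEFORE INSERT ON orders
        FOR EACH ROW EXECUTE PROCEDURE fill_order_id();

    -- locally: a second foreign table that simply omits the sequence column
    CREATE FOREIGN TABLE orders_noid (note text)
        SERVER remote_srv OPTIONS (table_name 'orders');
    INSERT INTO orders_noid (note) VALUES ('widget');  -- remote default fills "id"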

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Statistics and selectivity estimation for ranges

2013-03-13 Thread Heikki Linnakangas

On 01.03.2013 16:22, Alexander Korotkov wrote:

On Tue, Mar 12, 2013 at 8:03 PM, Heikki Linnakangas hlinnakan...@vmware.com
wrote:



So, after some hacking, I ended up with this version. I find it more
readable, I hope I didn't miss anything. This seems to produce results that
are close, but not identical, to the original patch. I'm not sure where the
discrepancy is coming from, or which patch is more correct in that respect.
I'll continue from this tomorrow, but if you have the time, please take a
look and let me know what you think.


I've read your explanation and version of the patch. In general it seems
correct to me.
There is one point where I broke up the abstraction in some functions:
infinities. For example, in calc_length_hist_frac one or both of length1
and length2 can be infinity. In the line

frac = area / (length2 - length1);

you can get a NaN result. I specifically adjusted the code to get a more or
less correct result in this case.


Hmm, good point. I think I managed to fix those cases in the attached 
version. Is there any other corner case that I missed?
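
For illustration, the hazard calls for a guard of roughly this shape (a
sketch only, not necessarily what the attached version does; the 0.5
fallback is an arbitrary assumption):

    #include <math.h>

    static double
    length_frac(double area, double length1, double length2)
    {
        /* an infinite bound would make the division below produce NaN */
        if (isinf(length1) || isinf(length2))
            return 0.5;
        /* also guard the degenerate zero-width bin */
        if (length2 <= length1)
            return 0.0;
        return area / (length2 - length1);
    }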



Another minor note about this line

bin_width *= get_position(typcache, lower, &hist_lower[i],
  &hist_lower[i + 1]);

ISTM it should look like

bin_width -= 1.0 - get_position(typcache, lower, &hist_lower[i],
   &hist_lower[i + 1]);

Imagine the lower and upper bounds fall into the same histogram bin. In this
case we should subtract the lengths of both parts that were cut off on the
left and on the right.


Yes, true. There's one negation too many above, though; should be:

bin_width -= get_position(typcache, lower, &hist_lower[i],
   &hist_lower[i + 1]);

Fixed that. Barring any more issues, I'll read through this once more 
tomorrow and commit.


- Heikki
*** a/src/backend/utils/adt/rangetypes_selfuncs.c
--- b/src/backend/utils/adt/rangetypes_selfuncs.c
***
*** 20,25 
--- 20,26 
  #include access/htup_details.h
  #include catalog/pg_operator.h
  #include catalog/pg_statistic.h
+ #include utils/builtins.h
  #include utils/lsyscache.h
  #include utils/rangetypes.h
  #include utils/selfuncs.h
***
*** 39,44  static int rbound_bsearch(TypeCacheEntry *typcache, RangeBound *value,
--- 40,60 
  			   RangeBound *hist, int hist_length, bool equal);
  static float8 get_position(TypeCacheEntry *typcache, RangeBound *value,
  			 RangeBound *hist1, RangeBound *hist2);
+ static float8 get_len_position(double value, double hist1, double hist2);
+ static float8 get_distance(TypeCacheEntry *typcache, RangeBound *bound1,
+ 			RangeBound *bound2);
+ static int length_hist_bsearch(Datum *length_hist_values,
+ 	int length_hist_nvalues, double value, bool equal);
+ static double calc_length_hist_frac(Datum *length_hist_values,
+ 	int length_hist_nvalues, double length1, double length2, bool equal);
+ static double calc_hist_selectivity_contained(TypeCacheEntry *typcache,
+ RangeBound *lower, RangeBound *upper,
+ RangeBound *hist_lower, int hist_nvalues,
+ 			Datum *length_hist_values, int length_hist_nvalues);
+ static double calc_hist_selectivity_contains(TypeCacheEntry *typcache,
+ 			   RangeBound *lower, RangeBound *upper,
+ 			   RangeBound *hist_lower, int hist_nvalues,
+ 			Datum *length_hist_values, int length_hist_nvalues);
  
  /*
   * Returns a default selectivity estimate for given operator, when we don't
***
*** 213,219  calc_rangesel(TypeCacheEntry *typcache, VariableStatData *vardata,
  		/* Try to get fraction of empty ranges */
  		if (get_attstatsslot(vardata->statsTuple,
  			 vardata->atttype, vardata->atttypmod,
! 			 STATISTIC_KIND_RANGE_EMPTY_FRAC, InvalidOid,
  			 NULL,
  			 NULL, NULL,
  			 &numbers, &nnumbers))
--- 229,235 
  		/* Try to get fraction of empty ranges */
  		if (get_attstatsslot(vardata->statsTuple,
  			 vardata->atttype, vardata->atttypmod,
! 			 STATISTIC_KIND_RANGE_LENGTH_HISTOGRAM, InvalidOid,
  			 NULL,
  			 NULL, NULL,
  			 &numbers, &nnumbers))
***
*** 332,337  calc_hist_selectivity(TypeCacheEntry *typcache, VariableStatData *vardata,
--- 348,355 
  {
  	Datum	   *hist_values;
  	int			nhist;
+ 	Datum	   *length_hist_values;
+ 	int			length_nhist;
  	RangeBound *hist_lower;
  	RangeBound *hist_upper;
  	int			i;
***
*** 365,370  calc_hist_selectivity(TypeCacheEntry *typcache, VariableStatData *vardata,
--- 383,403 
  			elog(ERROR, "bounds histogram contains an empty range");
  	}
  
+ 	/* @ and @ also need a histogram of range lengths */
+ 	if (operator == OID_RANGE_CONTAINS_OP ||
+ 		operator == OID_RANGE_CONTAINED_OP)
+ 	{
+ 		if (!(HeapTupleIsValid(vardata->statsTuple) &&
+ 			  get_attstatsslot(vardata->statsTuple,
+ 			   vardata->atttype, vardata->atttypmod,
+ 			   STATISTIC_KIND_RANGE_LENGTH_HISTOGRAM,
+ 			   InvalidOid,
+ 			   NULL,
+ 			   &length_hist_values, &length_nhist,
+ 			   NULL, NULL)))
+ 			return 

Re: [HACKERS] matview join view error

2013-03-13 Thread Kevin Grittner
Erikjan Rijkers e...@xs4all.nl wrote:

 With 9.3devel, I can't seem to join a matview to a view; surely
 that should be allowed?

Yes.

 Here is an example:

I expect to add a regression test based on that, once the bug is
fixed.

Thanks!

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Materialized View patch broke pg_dump

2013-03-13 Thread Kevin Grittner
Andrew Dunstan and...@dunslane.net wrote:
 On 03/11/2013 12:30 PM, Fujii Masao wrote:
 Andrew Dunstan and...@dunslane.net wrote:
 On 03/11/2013 10:43 AM, Andrew Dunstan wrote:

 I noticed this morning that I am still getting failures on
 9.0, 9.1 and 9.2 which cause my cross-version upgrade testing
 to fail for git tip. For
 all I know this might apply to all back branches, but these
 are the only ones tested for upgrade, so that's all I can
 report on reliably.

 I'm chasing it up to find out exactly what's going on, but
 figured some extra eyeballs would help.


 The problem is that pg_dump is sending an empty query in
 versions less than 9.3, and choking on that. Suggested fix
 attached - there's really no reason to be doing anything re mat
 views in versions < 9.3. This is the same problem that I
 reported in another thread.



http://www.postgresql.org/message-id/CAHGQGwH+4vtyq==l6hrupxtggfqrnlf0mwj75bfisoske28...@mail.gmail.com
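
For illustration, the version gate amounts to something like this (a sketch
using pg_dump's remoteVersion convention, not the committed fix):

    /* pre-9.3 servers have no matviews, so don't even build the query */
    static bool
    server_has_matviews(Archive *fout)
    {
        return fout->remoteVersion >= 90300;
    }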



 Oh, I missed that. Yes, either of these would work.

I've been out of town and not able to keep up here for a few days. 
Will push a fix today unless I find that someone has beaten me to
it as I work through the rest of the messages.

-- 
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_dump selectively ignores extension configuration tables

2013-03-13 Thread Robert Haas
On Fri, Mar 8, 2013 at 5:07 PM, Joe Conway m...@joeconway.com wrote:
 (reposting; apparently I used a verboten word the first
  time around, sigh. Sorry for any duplicates)

 The -t and -n options of pg_dump do not dump anything from an extension
 configuration table, whereas normal pg_dump will dump the user data.

 To see what I mean, in psql do (tested on pg9.2):
 8<--
 create extension postgis;
 insert into spatial_ref_sys
   values(42,'test',42,'foo','bar');
 8<--

 Then in bash do:
 8<--
 pg_dump test|grep spatial_ref_sys
 pg_dump -t spatial_ref_sys test|grep spatial_ref_sys
 pg_dump -n public test|grep spatial_ref_sys
 8<--

 Is this intentional, or oversight, or missing feature?

Hmm.  It doesn't seem right to me.  It seems like it should either
dump everything, or dump just the user data portion, when the name
matches.  Not entirely sure which - probably the latter?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_dump selectively ignores extension configuration tables

2013-03-13 Thread Joe Conway
On 03/13/2013 03:07 PM, Robert Haas wrote:
 Is this intentional, or oversight, or missing feature?
 
 Hmm.  It doesn't seem right to me.  It seems like it should either
 dump everything, or dump just the user data portion, when the name
 matches.  Not entirely sure which - probably the latter?

+1

I think it should dump the user data portion, especially since that
matches what pg_dump would do if you did not specify the table or schema.

Joe

-- 
Joe Conway
credativ LLC: http://www.credativ.us
Linux, PostgreSQL, and general Open Source
Training, Service, Consulting, & 24x7 Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: Column defaults for foreign tables (was Re: [HACKERS] [v9.3] writable foreign tables)

2013-03-13 Thread Thom Brown
On 13 March 2013 19:04, Tom Lane t...@sss.pgh.pa.us wrote:
 Albe Laurenz laurenz.a...@wien.gv.at writes:
 Yeah, I'm drifting towards the position that we should just define the
 defaults as being whatever they are locally, rather than trying to be
 cute about supporting remotely-executed defaults.

 That was my first thought on the topic, to have a solution that
 is simple (if not perfect).
 Your argument that it would be unpleasant to lose the ability
 to use sequence-generated remote default values made me reconsider.

 But there is a workaround, namely to use a trigger before insert
 to generate an automatic primary key (e.g. if the inserted value is
 NULL).

 Another attack is to set up a different foreign table pointing to the
 same remote table, but lacking the sequence column.  When you insert via
 that table, you'll get the remote's default for the hidden column.
 This doesn't require any weird triggers on the remote side, but it could
 be hard to persuade existing apps to use the second foreign table.

How about:

CREATE FOREIGN TABLE tablename (id int DEFAULT PASSTHROUGH) SERVER pg_server;

That way it will pass DEFAULT through to the remote table as it's
defined on the table.  Users can then explicitly insert values, or
select the default, which will be configured to ensure the default on the
remote server is used... although I suspect I'm overlooking something
here.

-- 
Thom


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: Proposal for Allow postgresql.conf values to be changed via SQL [review]

2013-03-13 Thread Boszormenyi Zoltan

On 2013-03-13 13:45, Andres Freund wrote:

On 2013-03-13 09:09:24 +0100, Boszormenyi Zoltan wrote:

On 2013-03-13 07:42, Craig Ringer wrote:

On 03/12/2013 06:27 AM, Craig Ringer wrote:

Think also about the case where someone wants to change multiple
values together and having just some set and not others would be
inconsistent.

Yeah, that's a killer. The reload would need to be scheduled for COMMIT
time, it can't be done by `SET PERSISTENT` directly.

Thinking about this some more, I'm not sure this is a good idea.

Right now, SET takes effect immediately. Always, without exception.
Delaying SET PERSISTENT's effects until commit would make it
inconsistent with SET's normal behaviour.

However, *not* delaying it would make it another quirky
not-transactional not-atomic command. That's OK, but if it's not going
to follow transactional semantics it should not be allowed to run within
a transaction, like VACUUM .

Writing the changes immediately but deferring the reload until commit
seems to be the worst of those two worlds.

I was thinking about it a little. There is a hook that runs at the end
of (sub-)transactions. It can be abused for this purpose to make
SET PERSISTENT transactional. The subtransactions can also stack
these settings, by forgetting settings upon ROLLBACK [ TO SAVEPOINT ]
and overwriting previous settings upon COMMIT and RELEASE SAVEPOINT.
All it needs is a list and maintaining intermediate pointers when entering
into a new level of SAVEPOINT. The functions to register such hooks are
in src/include/access/xact.h:

extern void RegisterXactCallback(XactCallback callback, void *arg);
extern void UnregisterXactCallback(XactCallback callback, void *arg);
extern void RegisterSubXactCallback(SubXactCallback callback, void *arg);
extern void UnregisterSubXactCallback(SubXactCallback callback, void *arg);
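
For illustration, a minimal shape of that approach (a sketch only; the queue
of pending SET PERSISTENT values is assumed, not shown):

    #include "access/xact.h"

    static void
    set_persistent_xact_callback(XactEvent event, void *arg)
    {
        if (event == XACT_EVENT_COMMIT)
        {
            /* write the queued SET PERSISTENT values to the config file */
        }
        else if (event == XACT_EVENT_ABORT)
        {
            /* throw the queued values away */
        }
    }

    void
    set_persistent_init(void)
    {
        RegisterXactCallback(set_persistent_xact_callback, NULL);
    }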

(sub)xact commit/abort already calls AtEOXact_GUC(commit, nestlevel). So you
wouldn't even need that.


Yes, thanks.


  It seems we could add another value to enum
GucStackState, like GUC_SET_PERSISTENT - and process those only if commit &&
nestlevel == 1.


Maybe it's not needed, only enum GucAction needs a new
GUC_ACTION_PERSISTENT value since that's what has any
business in push_old_value(). Adding two new members to
GucStack like these are enough
bool has_persistent;
config_var_value persistent;
and SET PERSISTENT can be treated as GUC_SET. push_old_value()
can merge GUC values set in the same transaction level.


Every time you see one with commit && nestlevel > 1, you put them into
the stack one level up.

This seems like it's somewhat in line with the way SET LOCAL is implemented?


On the other hand, it would make a lot of sense to implement it
as one setting per file, or to extend the code to allow modifying
settings in bulk. One setting per file should be easier, and
it would also allow extensions to drop their settings automatically
into the automatic config directory. I don't know who mentioned
this idea about extensions, but it also came up a while back.

I still think one-setting-per-file is the right way to go, yes.

Greetings,

Andres Freund




--
--
Zoltán Böszörményi
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de
 http://www.postgresql.at/



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Statistics and selectivity estimation for ranges

2013-03-13 Thread Alexander Korotkov
On Wed, Mar 13, 2013 at 11:10 PM, Heikki Linnakangas 
hlinnakan...@vmware.com wrote:

 On 01.03.2013 16:22, Alexander Korotkov wrote:

 On Tue, Mar 12, 2013 at 8:03 PM, Heikki Linnakangas hlinnakan...@vmware.com

 wrote:


  So, after some hacking, I ended up with this version. I find it more
 readable, I hope I didn't miss anything. This seems to produce results
 that
 are close, but not identical, to the original patch. I'm not sure where
 the
 discrepancy is coming from, or which patch is more correct in that
 respect.
 I'll continue from this tomorrow, but if you have the time, please take a
 look and let me know what you think.


 I've read your explanation and version of the patch. In general it seems
 correct to me.
 There is one point where I broke up the abstraction in some functions:
 infinities. For example, in calc_length_hist_frac one or both of length1
 and length2 can be infinity. In the line

 frac = area / (length2 - length1);

 you can get a NaN result. I specifically adjusted the code to get a more or
 less correct result in this case.


 Hmm, good point. I think I managed to fix those cases in the attached
 version. Is there any other corner case that I missed?


Did you try the test case by Jeff Davis in this thread?
http://www.postgresql.org/message-id/1355167304.3896.37.camel@jdavis
I tried it with the attached version of the patch and got a NaN estimate.

--
With best regards,
Alexander Korotkov.


Re: [HACKERS] Display output file name in psql prompt?

2013-03-13 Thread Alex

Josh Berkus j...@agliodbs.com writes:

 Tom,

 If you're proposing changing the contents of the default prompt, I think
 that has very little chance of passing.  A new option for something to
 add into a custom prompt might get accepted.  I'm not sure that that
 approach would do much for the scenario you describe, since it's
 unlikely that anyone would add it to their prompt (at least not before
 they'd gotten burnt...).  However, I don't recall hearing of anyone
 getting confused like this before, and you say you figured it out pretty
 quickly, so it doesn't seem like a big problem.

 Well, I think having a \ command to show the current \o setting would be
 fine.  Actually, it might be better to show *all* psql settings with a
 general command, including things like \timing and \pset.

Actually in my case, I've also used \t and \a psql commands to produce
undecorated machine-readable output and now I think it would be nice to
also show the status of these in the prompt.  What if we add '%\'
substitute to expand to the list of all settings with non-default
values?  Like this:

postgres=# \set PROMPT1 '%/%\%R%# '
postgres=# -- Nothing is set, so '%\' expands to an empty string.

postgres=# \t
Showing only tuples.
postgres\t=# \a
Output format is unaligned.
postgres\a\t=# \o /path/to/output/file.out
postgres\a\o=/path/to/output/file.out\t=#

Not sure how \pset could fit there, it might be a bit excessive.

Another idea is we could add \O and \G commands that will act like the
plain \o and \g, but will set \a and \t automatically.  I mean it must
be quite often that you don't need all the decoration when you save
query results to a file, so instead of doing \a, \t, \g (then setting \a
and \t back) you can just do \G and move on.

--
Alex


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Duplicate JSON Object Keys

2013-03-13 Thread Hannu Krosing

On 03/13/2013 12:40 PM, David E. Wheeler wrote:

On Mar 13, 2013, at 5:17 AM, Robert Haas robertmh...@gmail.com wrote:


What I think is tricky here is that there's more than one way to
conceptualize what the JSON data type really is.  Is it a key-value
store of sorts, or just a way to store text values that meet certain
minimalist syntactic criteria?  I had imagined it as the latter, in
which case normalization isn't sensible.  But if you think of it the
first way, then normalization is not only sensible, but almost
obligatory.

That makes a lot of sense. Given the restrictions I tend to prefer in my 
database data types, I had imagined it as the former. And since I'm using it 
now to store key/value pairs (killing off some awful EAV implementations in the 
process, BTW), I certainly think of it more formally as an object.


But I can live with the other interpretation, as long as the differences are 
clearly understood and documented. Perhaps a note could be added to the docs 
explaining this difference, and what one can do to adapt for it. A normalizing 
function would certainly help.
I guess the easiest and most generic way to normalize is to actually 
convert to some internal representation and back.


in pl/python this would look like this:

hannu=# create function normalize(IN ij json, OUT oj json) language 
plpythonu as $$

import json
return json.dumps(json.loads(ij))
$$;
CREATE FUNCTION
hannu=# select normalize('{"a":1, "a":"b", "a":true}');
  normalize
-------------
 {"a": true}
(1 row)

If we wanted to be really fancy we could start storing our json in
some format that is faster to parse, like tnetstrings, but it is probably
too late in the release cycle to change this now.


Hannu


Best,

David





--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_dump selectively ignores extension configuration tables

2013-03-13 Thread Dimitri Fontaine
Joe Conway m...@joeconway.com writes:
 I think it should dump the user data portion, especially since that
 matches what pg_dump would do if you did not specify the table or schema.

+1

If you don't have time to fix that by then, I will have a look at
fixing it during beta.

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] matview patch readability/correctness gripe

2013-03-13 Thread Kevin Grittner
Tom Lane t...@sss.pgh.pa.us wrote:

 diff --git a/src/backend/rewrite/rewriteDefine.c b/src/backend/rewrite/rewriteDefine.c
 index a1a9808e5d94959218b415ed34c46579c478c177..896326615753f2344b466eb180080174ddeda31d 100644
 *** a/src/backend/rewrite/rewriteDefine.c
 --- b/src/backend/rewrite/rewriteDefine.c
 *** DefineQueryRewrite(char *rulename,
 *** 356,362 
   */
   checkRuleResultList(query-targetList,
   RelationGetDescr(event_relation),
 !    true);
  
   /*
   * ... there must not be another ON SELECT rule already ...
 --- 357,364 
   */
   checkRuleResultList(query-targetList,
   RelationGetDescr(event_relation),
 !    event_relation->rd_rel->relkind !=
 !    RELKIND_MATVIEW);
  
   /*
   * ... there must not be another ON SELECT rule already ...

 IMO this is either flat-out wrong, or an unmaintainable crock, or both.

 So this either needs to be reverted, or refactored into two arguments
 not one, with checkRuleResultList being updated to account honestly
 and readably for whatever it's supposed to be doing here.

Will review in light of your comments.

Thanks!

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Enabling Checksums

2013-03-13 Thread Jim Nasby

On 3/7/13 9:31 PM, Bruce Momjian wrote:

1 storage
2 storage controller
3 file system
4 RAM
5 CPU


I would add 2.5 in there: storage interconnect. iSCSI, FC, what-have-you. 
Obviously not everyone has that.


My guess is that storage checksums only cover layer 1, while our patch
covers layers 1-3, and probably not 4-5 because we only compute the
checksum on write.


Actually, it depends. In our case, we run 512GB servers and 8GB shared buffers 
(previous testing has shown that anything much bigger than 8G hurts 
performance).

So in our case, PG checksums protect a very significant portion of #4.


If that is correct, the open question is what percentage of corruption
happens in layers 1-3?


The last bout of corruption we had was entirely coincident with memory 
failures. IIRC we had 3-4 corruption events on more than one server. Everything 
was running standard ECC (sadly, not 4-bit ECC).


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Using indexes for partial index builds

2013-03-13 Thread Jim Nasby

On 3/12/13 9:10 AM, Ants Aasma wrote:

I have a feeling this is an increasingly widespread pattern with a
proliferation of mobile devices that need syncing.


If you're doing that with timestamps you're asking for a slew of problems, not 
all of which can be solved by just adding some random amount of fluff to your 
criteria. A queue-based solution is often a more robust solution, even if it is 
harder to implement.
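
For illustration, the queue-based shape is roughly this (names hypothetical;
a trigger on the synced table would populate it):

    CREATE TABLE sync_queue (
        id   bigserial PRIMARY KEY,
        tbl  text    NOT NULL,
        pk   text    NOT NULL,
        op   char(1) NOT NULL CHECK (op IN ('I', 'U', 'D'))
    );

    -- each client remembers the last id it consumed and resumes from there,
    -- so clock skew never enters the picture:
    -- SELECT * FROM sync_queue WHERE id > $last_seen ORDER BY id;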


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: Column defaults for foreign tables (was Re: [HACKERS] [v9.3] writable foreign tables)

2013-03-13 Thread Tom Lane
Thom Brown t...@linux.com writes:
 How about:

 CREATE FOREIGN TABLE tablename (id int DEFAULT PASSTHROUGH) SERVER pg_server;

 That way it will pass DEFAULT through to the remote table as it's
 defined on the table.  Users can then explicitly insert values, or
 select the default, which will configured to ensure the default on the
 remote server is used... although I suspect I'm overlooking something
 here.

Yeah ... how to implement it.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Temporal features in PostgreSQL

2013-03-13 Thread Jim Nasby

On 2/13/13 10:06 PM, Vlad Arkhipov wrote:

 - I don't need to deal with update conflicts, because I use 
clock_timestamp() instead of current_timestamp.

You can still come across a conflict even with clock_timestamp(). What if
clocks go back during time synchronization? Even if you have absolutely
precise clocks, there may be clock skew on different CPUs, low system
clock time resolution, etc.


Sorry for the late reply, catching up on email...

If you want to track the history of something, "measured time" is absolutely NOT the way
to do it. I use the term "measured time" to differentiate it from the real-world concept of
time that is forever flowing forward from one instant to the next. The problem with measured time
is that it's incredibly easy to screw up. Clocks out of sync, clocks running backwards, etc. Heck,
it's not even clear what time you should actually use: transaction start, wallclock, or transaction
end.

For any kind of history tracking to actually be robust you have no choice but 
to link one history record to another so that you can actually walk down a 
chain. Of course you might want to capture a timestamp as part of your history 
metadata, but you better be ready to deal with a history record with a 
timestamp that is *earlier* than the prior history record.
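
For illustration, chain-linked history in the sense above (names hypothetical;
the timestamp is kept as metadata only):

    CREATE TABLE item_history (
        hist_id   bigserial PRIMARY KEY,
        item_id   integer NOT NULL,
        prev_hist bigint REFERENCES item_history (hist_id),
        recorded  timestamptz NOT NULL DEFAULT clock_timestamp(),
        payload   json
    );

    -- the true order comes from walking the chain, not from "recorded":
    WITH RECURSIVE h AS (
        SELECT * FROM item_history WHERE item_id = 42 AND prev_hist IS NULL
        UNION ALL
        SELECT ih.* FROM item_history ih JOIN h ON ih.prev_hist = h.hist_id
    )
    SELECT * FROM h;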

BTW, we've been working on a generic history implementation at work; hopefully 
we'll be able to release it relatively soon.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Using indexes for partial index builds

2013-03-13 Thread Ants Aasma
On Thu, Mar 14, 2013 at 1:51 AM, Jim Nasby j...@nasby.net wrote:
 On 3/12/13 9:10 AM, Ants Aasma wrote:
 I have a feeling this is an increasingly widespread pattern with a
 proliferation of mobile devices that need syncing.

 If you're doing that with timestamps you're asking for a slew of problems,
 not all of which can be solved by just adding some random amount of fluff to
 your criteria. A queue-based solution is often a more robust solution, even
 if it is harder to implement.

Do you know of anything else besides the obvious issues with having to
use one clocksource and ensure that it produces monotonic timestamps?
My first reaction was also that this is what queues are meant for, but
the proposed solution seems to work surprisingly well. Unless you can
point at some glaring hole that I'm missing I would say that it is
good enough for a rather wide range of syncing problems.

Ants Aasma
-- 
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Increasing code-coverage of 'make check'

2013-03-13 Thread robins
Thanks Alvaro!

I thought of psql_help purely because it was the easiest at that time.
Since I've just begun, my understanding of the code is negligible.

I began working on SEQUENCE-related tests thereafter and hopefully will
move to more complicated tests in time. Peter's link is obviously helpful,
but since I end up running make check ~100 times a day, for now it's useful
only to cross-check how much code is uncommitted :)

Robins


On 11 March 2013 09:16, Alvaro Herrera alvhe...@2ndquadrant.com wrote:


 I think increasing coverage is a good thing.  But psql help?  *shrug*
 backend code is far more interesting and useful.

 Another thing to keep in mind is that there are some corner cases that
 are interesting to test that might not necessarily show up in a coverage
 chart -- for example how stuff behaves in the face of concurrent
 processes, or when various counters wrap around.

 Peter Eisentraut has set up a Jenkins instance that publishes coverage
 info.
 http://pgci.eisentraut.org/jenkins/job/postgresql_master_coverage/Coverage/
 (I think he only has it running make check; doing the isolation tests
 probably raises percentages a bit).




[HACKERS] TupleTable like data structure

2013-03-13 Thread Luma
I'm writing my own Group By operator (a non-hashed implementation) and I'm
currently looking for a good data structure to store the result groups. There
are two requirements in my mind: the table needs to be efficiently expandable,
and the tuples within each group need to be sorted.

I carefully examined the implementation of the standard Group By (both the
hashed and sorted implementations); the closest I got is to leverage parts of
the TupleHashTable, but before going through with this plan, I need to make
sure I'm not overlooking an existing data structure.

Thanks



--
View this message in context: 
http://postgresql.1045698.n5.nabble.com/TupleTable-like-data-structure-tp5748432.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Materialized View patch broke pg_dump

2013-03-13 Thread Kevin Grittner
Kevin Grittner kgri...@ymail.com wrote:
 Andrew Dunstan and...@dunslane.net wrote:
 On 03/11/2013 12:30 PM, Fujii Masao wrote:
 Andrew Dunstan and...@dunslane.net wrote:

 The problem is that pg_dump is sending an empty query in
 versions less than 9.3, and choking on that. Suggested fix
 attached - there's really no reason to be doing anything re mat
 views in versions < 9.3.

 This is the same problem that I reported in another thread.
 
 http://www.postgresql.org/message-id/CAHGQGwH+4vtyq==l6hrupxtggfqrnlf0mwj75bfisoske28...@mail.gmail.com

 Oh, I missed that. Yes, either of these would work.

 Will push a fix today unless I find that someone has beaten me to
 it as I work through the rest of the messages.

I agree that either would work.  I preferred Andrew's patch because
it kept knowledge of this issue more localized.

Pushed.  Thanks to you both, and apologies for the error.

Next time I do anything with pg_dump I will know better what
constitutes decent testing.

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] is it bug? - printing boolean domains in sql/xml function

2013-03-13 Thread Peter Eisentraut
On Mon, 2013-03-04 at 08:35 +0100, Pavel Stehule wrote:
 in this use case I think some regression test is important - it
 need not be mine, but the missing explicit regression test is the
 reason why this bug went undetected for some years.

I've added the tests.



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers